This project takes a systematic approach to speeding up a Java log scanner. By identifying and resolving one bottleneck at a time, it turns a naive solution into an efficient one, cutting scan time from roughly 870 ms to 78 ms. It is aimed at developers who want to improve performance in Java applications.
Java Performance Engineering: A Log Scanner Case Study
This repository is a step-by-step study of performance improvements to a Java log scanner. It starts from a clear, readable baseline and applies optimizations gradually, with each iteration targeting one specific bottleneck. The goal is not only fast processing times but also an understanding of why the naive implementation is slow and how each change affects the hardware and the Java Virtual Machine (JVM).
Test Environment
All benchmarks run single-threaded, with no concurrency or parallelism, so each measurement reflects one isolated scan on a single CPU core. Tests were run on the following machine:
| Component | Specification |
|---|---|
| CPU | Intel Core i5-1035G1 @ 1.00GHz (1.19 GHz boost) |
| RAM | 8.00 GB |
| OS | Windows x64 (64-bit) |
| JDK | Java 25 |
The Task
The task is to process a large log file (the benchmark input has about 1M rows, roughly 120MB) in which each line is one log event, such as an error, a warning, or a debug message. The scanner must pick out the lines marked ERROR and count how many occur in each hour of the day. The expected output groups errors by hour:
Hour 00 → 143 errors
Hour 01 → 311 errors
Hour 02 → 87 errors
...
Hour 23 → 204 errors
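Internally, the result is an array such as [143, 311, 87, ...], where each index is the hour of the day (0 to 23) and each value is the error count for that hour.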
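The task described above can be sketched in a few lines of plain Java. This is a minimal illustration, not the repository's V01 code. The source does not show the log line format, so the layout `yyyy-MM-dd HH:mm:ss LEVEL message` and the names `HourlyErrorCounter` and `countErrorsByHour` are assumptions made for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class HourlyErrorCounter {

    // Assumed (hypothetical) line layout: "2024-01-15 13:45:02 ERROR message text"
    static int[] countErrorsByHour(BufferedReader reader) throws IOException {
        int[] counts = new int[24]; // index = hour of day, value = number of ERROR lines
        String line;
        while ((line = reader.readLine()) != null) {
            // Split into date, time, level, and the rest of the message.
            String[] parts = line.split(" ", 4);
            if (parts.length < 3 || !parts[2].equals("ERROR") || parts[1].length() < 2) {
                continue; // not an ERROR line, or not in the assumed layout
            }
            int hour = Integer.parseInt(parts[1].substring(0, 2));
            counts[hour]++;
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        String sample = String.join("\n",
                "2024-01-15 00:01:10 ERROR disk full",
                "2024-01-15 00:05:42 INFO service started",
                "2024-01-15 13:45:02 ERROR request timeout",
                "2024-01-15 13:59:59 ERROR request timeout");
        int[] counts = countErrorsByHour(new BufferedReader(new StringReader(sample)));
        for (int hour = 0; hour < counts.length; hour++) {
            if (counts[hour] > 0) {
                System.out.printf("Hour %02d \u2192 %d errors%n", hour, counts[hour]);
            }
        }
    }
}
```

A version like this allocates a `String` per line plus an array of substrings per split, which is the kind of heap allocation and garbage collection pressure the later versions are meant to measure and reduce.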
Why This Problem?
Log analysis is a common job in backend systems, which makes a log scanner a useful case study in performance engineering. Key reasons include:
- The input is large enough that the cost of a naive approach shows up clearly in measurements.
- The simplicity of the algorithm means that performance issues arise primarily from I/O operations and memory handling rather than the logic itself.
- Each optimization is focused, yielding measurable results in terms of heap allocation, garbage collection pressure, and CPU instruction counts.
Repository Structure
This repository organizes each optimized version into its own package, accompanied by documentation detailing both the implementations and the improvements:
src/
└── main/
    ├── java/
    │   ├── generator/
    │   │   └── LogGenerator.java      ← Generates the input benchmark file
    │   └── logscanner/
    │       └── version0X/
    │           ├── LogScannerV0X.java ← JMH benchmark class
    │           ├── Tester.java        ← Algorithm test class
    │           └── overview.md        ← Documentation for each version
    └── resources/
        ├── example.txt                ← Small sample log file for initial testing
        ├── input.txt                  ← Real benchmark input (~1M rows, ~120MB)
        └── runOptions.txt             ← JVM flags and run commands
Performance Evolution
Each version contains documentation structured consistently to facilitate understanding:
- What changed - A table summarizing the differences from the previous version.
- How it works - An in-depth examination of the code with clarifications.
- Why it matters - The motivations behind the changes, focusing on practical implications rather than mere outcomes.
- Benchmark results - Actual performance metrics gathered using the Java Microbenchmark Harness (JMH), analyzed for clarity.
- What comes next - Insights derived from the current results and suggested avenues for further optimization.
Benchmark Results Overview
Across versions, execution time falls steadily, and the allocation rate and GC pause count drop to near zero:
| Version | Execution Time | GC Alloc Rate | GC Pauses |
|---|---|---|---|
| V01 | 872ms ± 346ms | 156 MB/sec | 19/op |
| V02 | 394ms ± 402ms | 13.4 MB/sec | 2/op |
| V03 | 194ms ± 24ms | 0.011 MB/sec | 0/op |
| V04 | 78ms ± 9ms | 0.019 MB/sec | 0/op |
Execution time drops from about 872ms with heavy garbage collection in V01 to about 78ms with no GC pauses in V04, roughly an 11x speedup. The gains come from careful optimization of data handling and access patterns.
Requirements
- Java 22+ (for FFM API in V04)
- Maven
- JMH (included via pom.xml)
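The Java 22+ requirement comes from the FFM (Foreign Function & Memory) API used in V04. The source does not show how V04 uses it, so the following is only a sketch of one common use: mapping a file into a `MemorySegment` and scanning its bytes without creating a `String` per line. The class name `MappedLineCounter` and the line-counting task are hypothetical and chosen only to keep the example short.

```java
import java.io.IOException;
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedLineCounter {

    // Counts '\n' bytes in a file by reading a memory-mapped segment directly.
    // Illustrative only; this is not the repository's V04 implementation.
    static long countLines(Path file) throws IOException {
        // The confined arena bounds the mapping's lifetime: closing it unmaps the file.
        try (Arena arena = Arena.ofConfined();
             FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MemorySegment segment =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size(), arena);
            long lines = 0;
            for (long offset = 0; offset < segment.byteSize(); offset++) {
                if (segment.get(ValueLayout.JAVA_BYTE, offset) == '\n') {
                    lines++;
                }
            }
            return lines;
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the path of any text file, for example the sample log in resources.
        System.out.println(countLines(Path.of(args[0])));
    }
}
```

Unlike `ByteBuffer`-based mapping, a `MemorySegment` is indexed with `long` offsets, so a single mapping can cover files larger than 2 GB.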
Each version builds on the previous one rather than starting over, so reading them in order gives the clearest view of the performance engineering process.