Crystal Unified is a compression library that lets you search compressed logs without decompressing them first. Indexes built during compression take a search directly to the matching data blocks. It also detects common data types (logs, DNA sequences, time series, firmware) and applies a type-specific encoding to each, which brings log files down to roughly 6-11% of their original size.
Crystal Unified
Crystal Unified is a compression library for storing and searching compressed data. Unlike conventional compressors, it can search compressed logs without decompressing them, so critical records can be found faster than with a decompress-then-search workflow.
Key Features
- Search Without Decompressing: Traditional tools have to decompress a file in full before it can be searched. Crystal instead builds indexes while compressing, so a search goes directly to the data blocks that can contain a match and skips the rest. This speeds up searches and reduces memory usage.

    # Traditional approach: decompress the whole file, then search it
    zstd -dc huge_logs.zst | grep "error"

    # Crystal: indexed search directly on the compressed file
    cuz search huge_logs.cuz "error"

- Smart Compression Techniques: Rather than treating input as an opaque stream of bytes, Crystal recognizes the structure of common data types and encodes each one accordingly:

| Data Type | What Crystal Sees | Result |
|---|---|---|
| Log files | Repeating patterns with variable fields | 6-11% of original size |
| DNA sequences | 4-letter nucleotide alphabet (ACGT) | 2-bit encoding + reference compression |
| Time series | Sequential numeric values | Delta encoding |
| Firmware | Binary data with small changes between versions | Block-level random access |

- Real-World Performance: Tested on log files from the Loghub benchmark dataset, Crystal achieved the following compression ratios and speeds:

| Log File | Original | Compressed | Ratio (compressed / original) | Speed |
|---|---|---|---|---|
| BGL.log | 709 MB | 62 MB | 8.7% | 476 MB/s |
| Android.log | 183 MB | 19 MB | 10.2% | 400 MB/s |
| SSH.log | 70 MB | 4.7 MB | 6.7% | 383 MB/s |
| Mac.log | 16 MB | 1.0 MB | 6.2% | 253 MB/s |
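The idea behind Search Without Decompressing can be illustrated with a small per-block token index. The Rust sketch below is not Crystal's implementation, whose index format is not described here: `Block`, `build_index`, and `search` are hypothetical names, blocks are kept as plain text instead of being compressed, and matching is on whole whitespace-separated tokens rather than grep-style substrings. What it shows is the core property: a query only opens the blocks whose index lists the term.

```rust
use std::collections::HashSet;

/// One block of log lines together with the set of tokens it contains.
/// In a real archive the lines would be stored compressed; they are kept as
/// plain text here so the sketch stays self-contained.
struct Block {
    tokens: HashSet<String>,
    lines: Vec<String>,
}

/// Split log text into blocks of `block_size` lines (must be non-zero) and
/// index each block's tokens. This is the step that would run at compression time.
fn build_index(text: &str, block_size: usize) -> Vec<Block> {
    let lines: Vec<&str> = text.lines().collect();
    lines
        .chunks(block_size)
        .map(|chunk| Block {
            tokens: chunk
                .iter()
                .flat_map(|line| line.split_whitespace())
                .map(str::to_string)
                .collect(),
            lines: chunk.iter().map(|line| line.to_string()).collect(),
        })
        .collect()
}

/// Return the lines containing `term` as a whole token, plus the number of
/// blocks that had to be opened to find them.
fn search(blocks: &[Block], term: &str) -> (Vec<String>, usize) {
    let mut hits = Vec::new();
    let mut opened = 0;
    // Consult the index first: blocks that cannot match are never opened.
    for block in blocks.iter().filter(|b| b.tokens.contains(term)) {
        opened += 1; // stands in for decompressing this one block
        hits.extend(
            block
                .lines
                .iter()
                .filter(|line| line.split_whitespace().any(|t| t == term))
                .cloned(),
        );
    }
    (hits, opened)
}
```

For a six-line log split into three two-line blocks, searching for a term that appears in only one block opens just that block, and a term missing from every index opens none.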
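The log-file row of the Smart Compression Techniques table ("repeating patterns with variable fields") matches a common log-compression technique: split each line into a shared template and the fields that vary. The Rust sketch below assumes a deliberately simple heuristic, treating any token that contains a digit as a variable; Crystal's actual detection rules are not given here, and `encode_lines`, `decode_line`, and `Encoded` are illustrative names. Because each distinct template is stored only once, repetitive logs reduce to a small template table plus per-line variable lists.

```rust
use std::collections::HashMap;

/// One log line reduced to a template id plus the variable fields cut out of it.
struct Encoded {
    template_id: usize,
    vars: Vec<String>,
}

const PLACEHOLDER: &str = "<*>";

/// Assumed heuristic: a token is a variable field if it contains a digit.
/// A literal "<*>" is also treated as a variable so decoding stays lossless.
fn is_variable(token: &str) -> bool {
    token == PLACEHOLDER || token.chars().any(|c| c.is_ascii_digit())
}

/// Split each line into a shared template and its variable fields.
/// Returns the template table and one record per input line.
fn encode_lines(lines: &[&str]) -> (Vec<String>, Vec<Encoded>) {
    let mut templates: Vec<String> = Vec::new();
    let mut ids: HashMap<String, usize> = HashMap::new();
    let mut records = Vec::new();
    for line in lines {
        let mut vars = Vec::new();
        // Split on single spaces (not any whitespace) so join(" ") restores the line exactly.
        let template = line
            .split(' ')
            .map(|tok| {
                if is_variable(tok) {
                    vars.push(tok.to_string());
                    PLACEHOLDER
                } else {
                    tok
                }
            })
            .collect::<Vec<&str>>()
            .join(" ");
        // Each distinct template is stored once and referenced by id.
        let template_id = match ids.get(&template) {
            Some(&id) => id,
            None => {
                templates.push(template.clone());
                ids.insert(template, templates.len() - 1);
                templates.len() - 1
            }
        };
        records.push(Encoded { template_id, vars });
    }
    (templates, records)
}

/// Rebuild a line by filling the template's placeholders with its variables in order.
fn decode_line(templates: &[String], record: &Encoded) -> String {
    let mut vars = record.vars.iter();
    templates[record.template_id]
        .split(' ')
        .map(|tok| {
            if tok == PLACEHOLDER {
                vars.next().expect("fewer variables than placeholders").as_str()
            } else {
                tok
            }
        })
        .collect::<Vec<&str>>()
        .join(" ")
}
```

Two connection lines that differ only in address and port share the template `conn from <*> port <*>`, so the template text is stored once and each line keeps only its two variable values.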
Use Cases
- Log Management: Compress months of log files into a single archive and still search it quickly:

    cuz compress /var/log/app-2026-01-*.log -j -o january-logs.cuz
    cuz search january-logs.cuz "OutOfMemoryError"

- Genomic Data Compression: Compress DNA sequences losslessly against a reference genome. First index the reference, then compress samples against that index:

    cuz dna-index reference.fa reference.cdni
    cuz dna-compress sample.fa -r reference.cdni

- Firmware Updates: Build compact firmware packages and delta patches between versions for efficient over-the-air updates:

    cuz firmware v1.0.bin -b 4096
    cuz delta v1.0.bin v1.1.bin -o update.cuzd

- Time Series Data: Store time series and IoT sensor data compactly using delta encoding of sequential numeric values:

    cuz compress sensor_readings.csv -t numeric
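The 2-bit encoding used for DNA works because the nucleotide alphabet has only four letters: each base fits in two bits, so four bases share one byte, a quarter of the space of one ASCII byte per base. The Rust sketch below covers only that packing step, not reference-based compression. It rejects N and lowercase bases, which real FASTA files can contain; how Crystal handles them is not stated here. `base_code`, `pack`, and `unpack` are hypothetical names.

```rust
/// Map a nucleotide to its 2-bit code; anything other than A, C, G, T is rejected.
fn base_code(base: u8) -> Option<u8> {
    match base {
        b'A' => Some(0b00),
        b'C' => Some(0b01),
        b'G' => Some(0b10),
        b'T' => Some(0b11),
        _ => None,
    }
}

/// Pack a sequence four bases per byte (first base in the lowest two bits).
/// Returns None if the sequence contains any symbol outside ACGT.
fn pack(seq: &[u8]) -> Option<Vec<u8>> {
    let mut packed = vec![0u8; (seq.len() + 3) / 4];
    for (i, &base) in seq.iter().enumerate() {
        packed[i / 4] |= base_code(base)? << (2 * (i % 4));
    }
    Some(packed)
}

/// Unpack `len` bases. The length must be stored separately because the
/// last byte may be only partly used.
fn unpack(packed: &[u8], len: usize) -> Vec<u8> {
    const BASES: [u8; 4] = [b'A', b'C', b'G', b'T'];
    (0..len)
        .map(|i| BASES[((packed[i / 4] >> (2 * (i % 4))) & 0b11) as usize])
        .collect()
}
```

A 10-base sequence packs into 3 bytes, and unpacking those 3 bytes with the stored length of 10 restores it exactly.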
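The delta patching in the Firmware Updates example can be pictured as a block-level diff: split both images into fixed-size blocks and ship only the blocks that changed. The Rust sketch below assumes that model and uses 4096-byte blocks to mirror the -b 4096 argument in the example, on the unconfirmed assumption that this argument sets the block size. The real .cuzd format is not documented here, and `Patch`, `diff`, and `apply` are hypothetical names.

```rust
/// A block-level patch: the new image length plus every block that changed.
struct Patch {
    block_size: usize,
    new_len: usize,
    changed: Vec<(usize, Vec<u8>)>, // (block index, new contents of that block)
}

/// Block `i` of `data`, truncated at the end of the data (empty if past the end).
fn block_at(data: &[u8], i: usize, block_size: usize) -> &[u8] {
    let start = (i * block_size).min(data.len());
    let end = (start + block_size).min(data.len());
    &data[start..end]
}

/// Compare `old` and `new` block by block and keep only the blocks that differ.
/// `block_size` must be non-zero.
fn diff(old: &[u8], new: &[u8], block_size: usize) -> Patch {
    let changed = new
        .chunks(block_size)
        .enumerate()
        .filter(|&(i, block)| block_at(old, i, block_size) != block)
        .map(|(i, block)| (i, block.to_vec()))
        .collect();
    Patch { block_size, new_len: new.len(), changed }
}

/// Rebuild the new image from the old one plus the patch.
fn apply(old: &[u8], patch: &Patch) -> Vec<u8> {
    let mut image = old.to_vec();
    image.resize(patch.new_len, 0); // grow or shrink to the new length
    for (i, contents) in &patch.changed {
        let start = i * patch.block_size;
        image[start..start + contents.len()].copy_from_slice(contents);
    }
    image
}
```

Flipping one byte in the middle of a 10,000-byte image and appending a 4-byte version tag produces a patch containing two of the three blocks; the unchanged first block is not sent. A fixed-block diff like this cannot detect data that shifts across block boundaries, so it is a simplification of what a production delta tool would do.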
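Delta encoding, used for time series, stores each reading as its difference from the previous one, so slowly changing sensor values become small numbers that need only a few bytes each. The Rust sketch below pairs deltas with zigzag mapping and LEB128 variable-length integers, a common combination; whether Crystal uses this byte layout is an assumption, and `delta_encode` and `delta_decode` are illustrative names.

```rust
/// Delta-encode a series: each value is stored as its difference from the
/// previous one (the first value as its difference from 0). Each delta is
/// zigzag-mapped and written as an LEB128 varint.
fn delta_encode(values: &[i64]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut prev = 0i64;
    for &v in values {
        let delta = v.wrapping_sub(prev);
        // Zigzag: 0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ... so small deltas stay small.
        let mut z = ((delta << 1) ^ (delta >> 63)) as u64;
        // LEB128: 7 bits per byte, high bit set on every byte except the last.
        loop {
            let byte = (z & 0x7f) as u8;
            z >>= 7;
            if z == 0 {
                out.push(byte);
                break;
            }
            out.push(byte | 0x80);
        }
        prev = v;
    }
    out
}

/// Invert `delta_encode`: read varints, undo zigzag, and take running sums.
fn delta_decode(bytes: &[u8]) -> Vec<i64> {
    let mut values = Vec::new();
    let mut prev = 0i64;
    let mut z = 0u64;
    let mut shift = 0u32;
    for &byte in bytes {
        z |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 != 0 {
            shift += 7; // more bytes follow for this varint
            continue;
        }
        let delta = ((z >> 1) as i64) ^ -((z & 1) as i64);
        prev = prev.wrapping_add(delta);
        values.push(prev);
        z = 0;
        shift = 0;
    }
    values
}
```

For 100 readings that start at 20,000 and rise by 3 per step, the encoding takes 102 bytes (3 for the first value, 1 for each later delta) instead of 800 bytes as raw 64-bit integers.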
Library API
Crystal Unified can also be embedded as a Rust library. The simplest API is a pair of functions, compress and decompress, that round-trip a byte slice; options for compression levels and transformations are also available but not shown in this example:
    use crystal_unified::{compress, decompress};

    fn main() -> crystal_unified::Result<()> {
        let data = b"Hello, World!";
        // Compress the bytes, then decompress the result
        let compressed = compress(data)?;
        let original = decompress(&compressed)?;
        // The round trip is lossless: the output matches the input exactly
        assert_eq!(data.as_slice(), original.as_slice());
        Ok(())
    }
Documentation and Support
Detailed documentation of features, usage examples, and advanced options is available in the project repository.