Crystal Unified
Efficiently search compressed logs without decompression.
Pitch

Crystal Unified is a compression library that lets you search compressed logs without decompressing them: a search jumps directly to the data blocks that contain matches. It also detects the type of data it is given (logs, DNA sequences, time series, firmware) and applies a suitable encoding to each, achieving significant size reductions and better performance, which makes log management more efficient.

Description


Crystal Unified is a compression library for storing and searching compressed data. Because it can search compressed logs without decompressing them first, it finds critical information faster than the usual decompress-then-search workflow.

Key Features

  • Search Without Decompressing: Traditional tools must decompress an entire file before it can be searched. Crystal instead builds an index during compression, so a search can go straight to the compressed blocks that contain the term, which is significantly faster and uses less memory.

    # Example of traditional decompression followed by search
    zstd -dc huge_logs.zst | grep "error"
    
    # Example using Crystal for indexed search
    cuz search huge_logs.cuz "error"
    
  • Smart Compression Techniques: Crystal goes beyond generic byte-level compression by recognizing the structure of its input and choosing an encoding suited to it:

    Data type       What Crystal sees                          Result
    Log files       Repeating patterns with variable fields    6-11% of original size
    DNA sequences   4-letter nucleotide alphabet (ACGT)        2-bit encoding + reference compression
    Time series     Sequential numeric values                  Delta encoding
    Firmware        Binary data with minimal changes           Block-level random access
  • Exceptional Real-World Performance: On the Loghub benchmark dataset, Crystal achieves the following compression ratios and speeds:

    Log file      Original   Compressed   Ratio   Speed
    BGL.log       709 MB     62 MB        8.7%    476 MB/s
    Android.log   183 MB     19 MB        10.2%   400 MB/s
    SSH.log       70 MB      4.7 MB       6.7%    383 MB/s
    Mac.log       16 MB      1.0 MB       6.2%    253 MB/s
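
The index-then-search idea behind "Search Without Decompressing" can be sketched in a few lines of Rust. This is a minimal illustration under stated assumptions, not Crystal's actual format: it assumes the log is split into fixed-size blocks of lines, that each block would be compressed independently (the sketch keeps blocks as plain text so it needs only the standard library), and that the index maps every whitespace-separated token to the set of blocks containing it. The names BlockIndex, build and search are hypothetical.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical block-indexed log store. In a real compressor each block
/// would be stored compressed; here blocks are plain lines for simplicity.
struct BlockIndex {
    blocks: Vec<Vec<String>>,                   // stand-in for independently compressed blocks
    postings: HashMap<String, BTreeSet<usize>>, // token -> IDs of blocks containing it
}

impl BlockIndex {
    /// Split the input into blocks of `lines_per_block` lines (must be > 0)
    /// and record which blocks contain each token.
    fn build(text: &str, lines_per_block: usize) -> Self {
        let lines: Vec<String> = text.lines().map(str::to_string).collect();
        let blocks: Vec<Vec<String>> = lines
            .chunks(lines_per_block)
            .map(|chunk| chunk.to_vec())
            .collect();
        let mut postings: HashMap<String, BTreeSet<usize>> = HashMap::new();
        for (id, block) in blocks.iter().enumerate() {
            for line in block {
                for token in line.split_whitespace() {
                    postings.entry(token.to_string()).or_default().insert(id);
                }
            }
        }
        BlockIndex { blocks, postings }
    }

    /// Consult the index, then scan only the candidate blocks.
    /// Returns the matching lines and how many blocks had to be opened.
    fn search(&self, term: &str) -> (Vec<&str>, usize) {
        let Some(ids) = self.postings.get(term) else {
            return (Vec::new(), 0); // unknown term: no block needs decompressing
        };
        let mut hits = Vec::new();
        for &id in ids {
            // A real implementation would decompress block `id` here.
            for line in &self.blocks[id] {
                if line.split_whitespace().any(|t| t == term) {
                    hits.push(line.as_str());
                }
            }
        }
        (hits, ids.len())
    }
}

fn main() {
    let log = "INFO start\nWARN disk low\nINFO tick\nERROR OutOfMemoryError heap\nINFO tick\nINFO stop";
    let index = BlockIndex::build(log, 2);
    let (hits, opened) = index.search("OutOfMemoryError");
    println!("{hits:?} (opened {opened} of {} blocks)", index.blocks.len());
}
```

Note that a token index like this matches whole tokens, whereas grep matches substrings; how Crystal handles partial-term queries is not described here.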

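The "repeating patterns with variable fields" that Crystal sees in log files correspond to a well-known log-compression idea: reduce each line to a template with placeholders, store each distinct template once, and keep only the variable values per line. The sketch below is an assumption for illustration, not Crystal's documented method; in particular, its rule that any token containing a digit is a variable field is a crude, hypothetical heuristic, and TemplateStore, add_line and line are invented names.

```rust
use std::collections::HashMap;

/// Hypothetical template dictionary: each distinct template is stored once,
/// and every log line becomes (template ID, variable values).
#[derive(Default)]
struct TemplateStore {
    templates: Vec<String>,
    ids: HashMap<String, usize>,
    records: Vec<(usize, Vec<String>)>,
}

impl TemplateStore {
    /// Heuristic (an assumption of this sketch): a token containing any digit
    /// is a variable field and is replaced by "<*>" in the template.
    fn add_line(&mut self, line: &str) {
        let mut vars = Vec::new();
        let template: Vec<&str> = line
            .split_whitespace()
            .map(|tok| {
                if tok.chars().any(|c| c.is_ascii_digit()) {
                    vars.push(tok.to_string());
                    "<*>"
                } else {
                    tok
                }
            })
            .collect();
        let template = template.join(" ");
        let next_id = self.templates.len();
        let id = *self.ids.entry(template.clone()).or_insert(next_id);
        if id == next_id {
            self.templates.push(template); // first time this template is seen
        }
        self.records.push((id, vars));
    }

    /// Rebuild line `i` by substituting its variables back into the template.
    /// Whitespace is normalised to single spaces in this sketch.
    fn line(&self, i: usize) -> String {
        let (id, vars) = &self.records[i];
        let mut vars = vars.iter();
        self.templates[*id]
            .split(' ')
            .map(|tok| if tok == "<*>" { vars.next().unwrap().as_str() } else { tok })
            .collect::<Vec<_>>()
            .join(" ")
    }
}

fn main() {
    let mut store = TemplateStore::default();
    for line in [
        "Accepted password for root from 10.0.0.5 port 22",
        "Accepted password for root from 10.0.0.9 port 2222",
        "Connection closed by 10.0.0.5",
    ] {
        store.add_line(line);
    }
    println!("{} lines, {} templates", store.records.len(), store.templates.len());
    println!("{}", store.line(1));
}
```

Because templates repeat across millions of lines, the per-line cost shrinks to a small ID plus the variable values, which a general-purpose back end can then compress further.
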
Use Cases

  • Log Management: Compress months of log files into a single archive while keeping them quickly searchable:

    cuz compress /var/log/app-2026-01-*.log -j -o january-logs.cuz
    cuz search january-logs.cuz "OutOfMemoryError"
    
  • Genomic Data Compression: Crystal supports lossless, reference-based compression of genomic data: index a reference genome once, then compress samples against it:

    cuz dna-index reference.fa reference.cdni
    cuz dna-compress sample.fa -r reference.cdni
    
  • Firmware Updates: Package firmware images compactly and generate delta patches between versions for efficient over-the-air (OTA) updates:

    cuz firmware v1.0.bin -b 4096
    cuz delta v1.0.bin v1.1.bin -o update.cuzd
    
  • Time Series Data: Use delta encoding of sequential values to store time series and IoT sensor data compactly:

    cuz compress sensor_readings.csv -t numeric
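
For the genomic use case, the "2-bit encoding" listed for DNA rests on simple arithmetic: with only four symbols (A, C, G, T), each base needs 2 bits instead of an 8-bit ASCII character, so four bases fit in one byte, a 4x reduction before any reference compression. The sketch below shows only that packing step and assumes the input contains nothing but uppercase ACGT; real FASTA files also contain N, lowercase and other IUPAC codes, and how Crystal handles those is not described here. The functions code, pack and unpack are hypothetical.

```rust
/// Map a nucleotide to its 2-bit code (A=00, C=01, G=10, T=11).
fn code(base: u8) -> Option<u8> {
    match base {
        b'A' => Some(0),
        b'C' => Some(1),
        b'G' => Some(2),
        b'T' => Some(3),
        _ => None, // N, lowercase, IUPAC codes: out of scope for this sketch
    }
}

/// Pack an ACGT sequence four bases per byte (first base in the high bits).
/// Returns None if any other symbol appears.
fn pack(seq: &[u8]) -> Option<Vec<u8>> {
    let mut out = Vec::with_capacity((seq.len() + 3) / 4);
    for chunk in seq.chunks(4) {
        let mut byte = 0u8;
        for (i, &b) in chunk.iter().enumerate() {
            byte |= code(b)? << (6 - 2 * i);
        }
        out.push(byte);
    }
    Some(out)
}

/// Unpack `len` bases. The length must be stored separately because the
/// last byte may be only partly used.
fn unpack(packed: &[u8], len: usize) -> Vec<u8> {
    const BASES: [u8; 4] = [b'A', b'C', b'G', b'T'];
    (0..len)
        .map(|i| {
            let shift = 6 - 2 * (i % 4);
            BASES[((packed[i / 4] >> shift) & 0b11) as usize]
        })
        .collect()
}

fn main() {
    let seq = b"GATTACA";
    let packed = pack(seq).expect("only ACGT expected");
    println!("{} bases -> {} bytes", seq.len(), packed.len());
    assert_eq!(unpack(&packed, seq.len()), seq.to_vec());
}
```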
    
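The firmware delta commands above are described only at the CLI level. One common way to build a block-level patch, offered here as an assumption rather than Crystal's documented .cuzd format, is to split both images into fixed-size blocks and record only the blocks that differ. The -b 4096 flag looks like a block size, but the source does not say so; the 4096 used below is just an illustrative choice, and BlockPatch, diff and apply are hypothetical names.

```rust
/// One entry of a hypothetical block-level patch: overwrite block `index`
/// of the target image with `data`.
#[derive(Debug)]
struct BlockPatch {
    index: usize,
    data: Vec<u8>,
}

/// Compare `old` and `new` block by block and keep only the changed blocks.
/// Also returns the new image length so growth or shrinkage is handled.
fn diff(old: &[u8], new: &[u8], block_size: usize) -> (Vec<BlockPatch>, usize) {
    assert!(block_size > 0, "block size must be non-zero");
    let old_blocks: Vec<&[u8]> = old.chunks(block_size).collect();
    let patches = new
        .chunks(block_size)
        .enumerate()
        .filter(|(i, chunk)| old_blocks.get(*i) != Some(chunk)) // changed or new block
        .map(|(index, chunk)| BlockPatch { index, data: chunk.to_vec() })
        .collect();
    (patches, new.len())
}

/// Rebuild the new image from the old image plus the patch.
fn apply(old: &[u8], patches: &[BlockPatch], new_len: usize, block_size: usize) -> Vec<u8> {
    let mut out = old.to_vec();
    out.resize(new_len, 0); // truncate, or grow with zeros that patches overwrite
    for p in patches {
        let start = p.index * block_size;
        out[start..start + p.data.len()].copy_from_slice(&p.data);
    }
    out
}

fn main() {
    let v1: Vec<u8> = (0..=255u8).cycle().take(10_000).collect();
    let mut v2 = v1.clone();
    v2[5000] ^= 0xFF; // flip one byte in the middle of the image
    let (patches, len) = diff(&v1, &v2, 4096);
    let total = (v2.len() + 4095) / 4096;
    println!("{} of {} blocks changed", patches.len(), total);
    assert_eq!(apply(&v1, &patches, len, 4096), v2);
}
```

Fixed-size blocks keep the patch format simple and line up with block-level random access, at the cost of shipping a whole block even when a single byte changes.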

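The delta encoding mentioned for time series stores the first value and then the differences between neighbouring readings; for slowly changing sensor data those differences are small numbers that take very few bytes. The sketch below assumes integer readings (for example fixed-point temperatures) and adds a zigzag plus LEB128 varint step to show where the savings come from; what -t numeric actually does internally is not documented here, and deltas, undeltas and write_varint are hypothetical names.

```rust
/// Delta-encode a series: the first value as-is, then neighbour differences.
fn deltas(values: &[i64]) -> Vec<i64> {
    let mut prev = 0i64;
    values
        .iter()
        .map(|&v| {
            let d = v.wrapping_sub(prev);
            prev = v;
            d
        })
        .collect()
}

/// Undo `deltas` with a running sum.
fn undeltas(deltas: &[i64]) -> Vec<i64> {
    let mut acc = 0i64;
    deltas
        .iter()
        .map(|&d| {
            acc = acc.wrapping_add(d);
            acc
        })
        .collect()
}

/// Zigzag-map a signed delta to unsigned (0, -1, 1, -2, ... -> 0, 1, 2, 3, ...)
/// and append it as a LEB128 varint, so small deltas take a single byte.
fn write_varint(out: &mut Vec<u8>, d: i64) {
    let mut z = ((d << 1) ^ (d >> 63)) as u64;
    loop {
        let byte = (z & 0x7F) as u8;
        z >>= 7;
        if z == 0 {
            out.push(byte);
            break;
        }
        out.push(byte | 0x80); // high bit set: more bytes follow
    }
}

fn main() {
    let readings: Vec<i64> = vec![21_500, 21_503, 21_501, 21_504, 21_510, 21_507];
    let d = deltas(&readings);
    let mut packed = Vec::new();
    for &x in &d {
        write_varint(&mut packed, x);
    }
    println!("deltas {d:?} -> {} bytes (vs {} bytes as raw i64)", packed.len(), readings.len() * 8);
    assert_eq!(undeltas(&d), readings);
}
```

Here six readings shrink from 48 bytes to 8: three bytes for the first value and one byte for each small delta.
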
Library API

Crystal Unified can also be used as a Rust library. A basic round trip takes two calls, compress and decompress; the library also offers options for compression levels and transformations, which are not shown in this example:

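use crystal_unified::{compress, decompress};

fn main() -> crystal_unified::Result<()> {
    let data = b"Hello, World!";
    // Compress the input bytes, then decompress the result.
    let compressed = compress(data)?;
    let original = decompress(&compressed)?;
    // The round trip is lossless: the output matches the input byte for byte.
    assert_eq!(data.as_slice(), original.as_slice());
    Ok(())
}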

Documentation and Support

Detailed documentation of features, usage examples, and advanced options is available in the project repository. Crystal Unified is intended for developers and data scientists alike who need compact, searchable storage.
