PitchHut logo
compact-dict
A customizable and efficient open-addressing dictionary library in Rust.
Pitch

compact-dict offers an innovative approach to dictionary data structures in Rust, focusing on customizable layouts and cache-friendly performance. By utilizing linear probing and ensuring contiguous memory layout, it enhances CPU cache efficiency while minimizing pointer chasing, making it ideal for high-performance applications.

Description

compact-dict is an experimental, highly customizable, open-addressing dictionary implemented in Rust. It is designed to optimize performance and memory usage, heavily inspired by the principles of Mojo's Dict. This dictionary utilizes linear probing and maintains a continuous memory layout for string keys, significantly enhancing speed and cache performance for suitable workloads.

Key Advantages of compact-dict

Modern processors tend to perform poorly when faced with pointer chasing, which can lead to cache misses. By ensuring that data remains contiguous, compact-dict minimizes fragmentation and enhances retrieval efficiency.

Features

  • Customizable Layout: Offers generic integer types for key-count (KC) and key-offset (KO) sizes, allowing users to optimize memory based on specific requirements.
  • Cache Friendly: A linear probing mechanism coupled with optional cached hashes enhances performance.
  • SIMD Preparation: Ready for SIMD-accelerated probing using portable_simd.
  • Configurable Mutability: Supports optional destructive operations through a constant generic parameter for efficient deletion masking.
  • Performant Hash Function: Utilizes a high-performance internal hasher based on aHash by default.

Usage Example

Here’s how to create and use a compact-dict instance:

use compact_dict::dict::Dict;

fn main() {
    let mut map: Dict<u32> = Dict::new(8);

    // Inserting key-value pairs
    map.put("hello", 42);
    map.put("world", 100);

    // Retrieving values
    assert_eq!(map.get_or("hello", 0), 42);
    assert_eq!(map.get_or("world", 0), 100);
    assert_eq!(map.get_or("missing", 0), 0);

    // Checking the count of elements
    assert_eq!(map.contains("hello"), true);
    assert_eq!(map.len(), 2);
    map.clear();
    assert_eq!(map.len(), 0);
}

Performance Analysis

compact-dict excels in scenarios where rapid string ingestion is required, such as data analysis or short-lived workloads. Its design is specifically optimized for high throughput, particularly in read-heavy environments.

Strengths

  1. Continuous Memory Management: It minimizes heap allocations for each unique key by utilizing a single dense Vec<u8> for string storage.
  2. Optimized Lookups: Leverages SIMD capabilities for effective hash comparisons.
  3. Ideal for Static Datasets: Preserves speed in append-only or static workloads, making it less suited for applications that require frequent deletions.

Trade-offs

While compact-dict outperforms many other hashing implementations in specific contexts, it also comes with certain constraints:

  • Limited Deletion Capability: Designed primarily for append-only operations, making it less suitable for cases requiring regular deletions.
  • Generic Key Limitations: Focused on string keys (&str), which may not be compatible with all data types.
  • Nightly Rust Dependency: Requires the nightly Rust toolchain for certain features and optimizations, which may limit accessibility for some users.

In summary, compact-dict is engineered for high performance in specialized scenarios and serves as a compelling alternative in the Rust ecosystem for developers seeking an efficient dictionary solution.

0 comments

No comments yet.

Sign in to be the first to comment.