A minimal and efficient CSV parsing library in Zig.
Pitch

csv-zero is a low-level, zero-allocation CSV parsing library for performance-sensitive applications. By iterating over individual fields and using SIMD acceleration, it gives precise control over CSV processing without unnecessary abstractions or memory allocation. It is aimed at systems engineers who prioritize efficiency.

Description

csv-zero is an efficient CSV parsing library written in Zig, designed for performance-sensitive applications. This library features a zero-allocation and SIMD-accelerated field iterator, intentionally omitting common abstractions for maximum control and transparency.

Key Features

  • Field-by-field iteration: Unlike typical CSV libraries, csv-zero exposes a direct field iteration model rather than a higher-level record abstraction.
  • Zero allocations by default: The library does not allocate memory unless the user explicitly requests it, which keeps resource usage predictable.
  • Explicit ownership and lifetimes: Fields are returned as slices into an internal buffer, so the caller decides when, and whether, data needs to be copied.
  • SIMD acceleration: Where the target supports it, Single Instruction, Multiple Data (SIMD) techniques speed up scanning of the input.
  • RFC 4180 compliance: csv-zero follows RFC 4180 strictly, so parsing behavior is consistent and predictable.
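The zero-allocation and slice-ownership points can be made concrete with a small sketch. The function below is a hypothetical illustration, not csv-zero's implementation: it assumes the content of a quoted field (the bytes between its outer quotes) sits in a mutable buffer, and it collapses each doubled quote in place, returning a sub-slice of that same buffer instead of allocating a new string.

```zig
const std = @import("std");

/// Hypothetical helper (not csv-zero's API): collapses each doubled quote ("")
/// in a quoted field's inner bytes into a single quote, in place.
/// Returns a sub-slice of `inner`; nothing is allocated, so the result is
/// only valid for as long as the underlying buffer is.
pub fn unescapeInPlace(inner: []u8) []u8 {
    var read: usize = 0;
    var write: usize = 0;
    while (read < inner.len) : (read += 1) {
        inner[write] = inner[read];
        write += 1;
        // For an escaped pair, skip the second quote.
        if (inner[read] == '"' and read + 1 < inner.len and inner[read + 1] == '"') read += 1;
    }
    return inner[0..write];
}
```

Because the result aliases the input buffer, a caller that needs the value after the buffer is reused would have to copy it, for example with `allocator.dupe(u8, field)`.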

Functional Overview

csv-zero focuses on simplicity and high performance. It:

  • Parses CSV data incrementally from a *std.Io.Reader.
  • Iterates over individual fields with clear row boundaries.
  • Supports LF and CRLF line endings, quoted fields, and escaped quotes per RFC 4180.
  • Offers configurable SIMD-accelerated scanning that speeds up parsing without compromising correctness.
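To give a rough idea of what SIMD-accelerated scanning can look like, the sketch below uses Zig's `@Vector` to compare 16 bytes at a time against the delimiter, the quote character, CR, and LF, turns each comparison into a bitmask, and falls back to a scalar loop for the tail. The names `findSpecial`, `eqMask`, and `lanes` are hypothetical, and csv-zero's actual scanner may work quite differently.

```zig
const std = @import("std");

// Hypothetical sketch: 16-byte lanes; csv-zero's real vector width and API may differ.
const lanes = 16;
const Chunk = @Vector(lanes, u8);

/// Bit i is set when chunk[i] == byte (bit order as on little-endian targets).
fn eqMask(chunk: Chunk, byte: u8) u16 {
    const splat: Chunk = @splat(byte);
    return @bitCast(chunk == splat);
}

/// Returns the index of the first delimiter, quote, CR, or LF at or after `start`.
pub fn findSpecial(input: []const u8, start: usize, delim: u8) ?usize {
    var i = start;
    // Vector path: test 16 bytes per iteration.
    while (i + lanes <= input.len) : (i += lanes) {
        const chunk: Chunk = input[i..][0..lanes].*;
        const mask = eqMask(chunk, delim) | eqMask(chunk, '"') |
            eqMask(chunk, '\n') | eqMask(chunk, '\r');
        if (mask != 0) return i + @as(usize, @ctz(mask));
    }
    // Scalar tail for the last bytes that do not fill a whole chunk.
    while (i < input.len) : (i += 1) {
        const c = input[i];
        if (c == delim or c == '"' or c == '\n' or c == '\r') return i;
    }
    return null;
}
```

A real parser would call a routine like this repeatedly, handling each special byte it reports (end of field, start of quoting, end of row) before resuming the scan.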

Notable Exclusions

  • Does not allocate memory for fields or records automatically.
  • Does not provide built-in row or record iterators.
  • Does not build complex data structures such as structs or maps.
  • Does not tolerate malformed or ambiguous CSV data; it applies strict validity checks instead.
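As an illustration of the kind of strict quoting rules RFC 4180 implies, the sketch below validates a single raw field, assuming it has already been split out of its row. The function `checkField` and its error names are hypothetical and are not csv-zero's error set.

```zig
const std = @import("std");

// Hypothetical error set for this sketch only.
const FieldError = error{ StrayQuote, UnterminatedQuote, LoneQuote };

/// Checks one already-delimited raw field against RFC 4180 quoting rules.
fn checkField(raw: []const u8) FieldError!void {
    if (raw.len == 0 or raw[0] != '"') {
        // Unquoted fields may not contain quotes at all.
        if (std.mem.indexOfScalar(u8, raw, '"') != null) return error.StrayQuote;
        return;
    }
    // Quoted field: it must end with a closing quote...
    if (raw.len < 2 or raw[raw.len - 1] != '"') return error.UnterminatedQuote;
    // ...and every quote inside it must be doubled.
    const inner = raw[1 .. raw.len - 1];
    var i: usize = 0;
    while (i < inner.len) : (i += 1) {
        if (inner[i] == '"') {
            if (i + 1 < inner.len and inner[i + 1] == '"') {
                i += 1; // escaped quote pair
            } else return error.LoneQuote;
        }
    }
}
```

Rejecting such input outright, rather than guessing at the writer's intent, is what keeps parsing behavior predictable.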

Design Philosophy

The design choice to use a field iterator means that csv-zero:

  • Avoids the overhead of tracking record state, keeping memory use and processing cost low.
  • Allows a single field to span the entire buffer, accommodating large rows when necessary.
  • Leaves user-defined structures such as records and columns to the caller, without imposing extra layers of abstraction.
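Because records and columns are left to the caller, they can be built on top of a field stream with a little bookkeeping. The sketch below assumes a hypothetical `Field` type carrying a value and a `last_column` flag (mirroring the flag used in the usage examples) and sums one numeric column while skipping a header row; csv-zero's own field type may look different.

```zig
const std = @import("std");

/// Hypothetical stand-in for a parsed field; csv-zero's own field type may differ.
const Field = struct {
    value: []const u8,
    last_column: bool,
};

/// Sums the integer values in column `col`, skipping the first (header) row.
/// The row/column position is derived purely from the `last_column` flags.
fn sumColumn(fields: []const Field, col: usize) !i64 {
    var sum: i64 = 0;
    var row: usize = 0;
    var c: usize = 0;
    for (fields) |f| {
        if (row > 0 and c == col) sum += try std.fmt.parseInt(i64, f.value, 10);
        // The last column closes the current record.
        if (f.last_column) {
            row += 1;
            c = 0;
        } else c += 1;
    }
    return sum;
}
```

No record object is ever materialized: two counters are enough to know where each field belongs.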

Usage Examples

The examples below illustrate basic usage. Count the number of rows and fields:

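// `it` is assumed to be an already-initialized csvz.Iterator.
var fields: usize = 0;
var rows: usize = 0;
while (true) {
    const field = it.next() catch |err| switch (err) {
        csvz.Iterator.Error.EOF => break, // end of input
        else => return err,
    };
    fields += 1;
    // The last field of a row closes that row.
    rows += @intFromBool(field.last_column);
}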

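Print every parsed field along with its row and column index:

var row: usize = 0;
var col: usize = 0;
while (true) {
    var field = it.next() catch |err| switch (err) {
        error.EOF => break,
        else => return err,
    };
    std.debug.print("field[{d}][{d}] = {s}\n", .{ row, col, field.unescaped() });
    // After the last column of a row, move to the next row.
    if (field.last_column) {
        row += 1;
        col = 0;
    } else col += 1;
}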

Performance Benchmarking

For performance context, csv-zero has been benchmarked against other high-quality CSV parsers in the companion project csv-race, which compares many CSV libraries. In those results, csv-zero consistently ranks among the top performers, which makes it a strong choice when CSV processing speed matters.

Additional Considerations

The library also offers a C API, but the Zig API is recommended for full performance, since it avoids cross-language overhead.

In short, csv-zero is tailored for developers who need precise control over CSV parsing with minimal overhead, and it provides a solid foundation for high-performance applications. For further details, refer to the project's documentation and examples.
