csv-zero - A minimal and efficient CSV parsing library in Zig.

Pitch

csv-zero is a low-level, zero-allocation CSV parsing library for performance-sensitive applications. It iterates field by field and uses SIMD acceleration, giving precise control over CSV processing without extra abstractions or hidden memory allocation. It is aimed at systems engineers who prioritize efficiency.

Description

csv-zero is an efficient CSV parsing library written in Zig, designed for performance-sensitive applications. This library features a zero-allocation and SIMD-accelerated field iterator, intentionally omitting common abstractions for maximum control and transparency.

Key Features

Field-by-field iteration: Unlike typical CSV libraries, csv-zero emphasizes a direct field iteration model rather than a higher-level record abstraction.
Zero allocations by default: The library allocates no memory unless the user explicitly requests it, which keeps resource usage predictable.
Explicit ownership and lifetimes: Fields are returned as slices into an internal buffer, so memory management stays under the caller's control.
Support for SIMD: Single Instruction, Multiple Data (SIMD) scanning speeds up parsing on targets where it is available.
RFC 4180 compliance: csv-zero adheres strictly to the CSV specification, ensuring that the parsing behavior is consistent and reliable.
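
To make the SIMD feature above more concrete, here is a minimal sketch of vectorized delimiter scanning. It is not csv-zero's implementation: the function names findSpecial and laneMask, the 16-byte chunk width, and the set of bytes searched for are all assumptions chosen for illustration.

```zig
const std = @import("std");

// Hypothetical helpers, not part of csv-zero.
const lanes = 16;
const Chunk = @Vector(lanes, u8);
const Mask = u16; // one bit per lane

// Returns a bitmask with a bit set for every lane equal to `byte`.
fn laneMask(chunk: Chunk, byte: u8) Mask {
    const needle: Chunk = @splat(byte);
    return @bitCast(chunk == needle);
}

// Returns the index of the first comma, double quote, CR, or LF in `buf`,
// or null if none is present.
fn findSpecial(buf: []const u8) ?usize {
    var i: usize = 0;
    // Vector path: test 16 bytes per iteration.
    while (i + lanes <= buf.len) : (i += lanes) {
        const chunk: Chunk = buf[i..][0..lanes].*;
        const mask = laneMask(chunk, ',') | laneMask(chunk, '"') |
            laneMask(chunk, '\r') | laneMask(chunk, '\n');
        // Assumes lane 0 maps to the lowest bit (little-endian targets).
        if (mask != 0) return i + @ctz(mask);
    }
    // Scalar tail for the remaining bytes.
    while (i < buf.len) : (i += 1) {
        switch (buf[i]) {
            ',', '"', '\r', '\n' => return i,
            else => {},
        }
    }
    return null;
}

pub fn main() void {
    const line = "alpha beta gamma delta,next field here\n";
    if (findSpecial(line)) |idx| {
        std.debug.print("first special byte at index {d}\n", .{idx});
    }
}
```

The scalar tail keeps the result identical to a plain byte-by-byte scan, which is one way to get speed without changing parsing behavior.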

Functional Overview

csv-zero is all about simplicity and high performance:

Incrementally parse CSV data from a *std.Io.Reader.
Iterate over individual fields with clear row boundaries.
Support for LF and CRLF line endings, quoted fields, and escaped quotes per RFC 4180.
Configurable SIMD-accelerated scanning enhances parsing speed without compromising correctness.
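
As an illustration of the escaped-quote rule mentioned above, RFC 4180 writes a double quote inside a quoted field as two consecutive double quotes. The sketch below shows one way to undo that escaping into a caller-provided buffer. The helper unescapeQuotes is hypothetical and is not csv-zero's API; it assumes the input is the content between the field's outer quotes.

```zig
const std = @import("std");

// Hypothetical helper, not csv-zero's implementation. `raw` is the content
// between a quoted field's outer quotes; `out` must be at least as long.
fn unescapeQuotes(raw: []const u8, out: []u8) []const u8 {
    std.debug.assert(out.len >= raw.len);
    var n: usize = 0;
    var i: usize = 0;
    while (i < raw.len) : (i += 1) {
        out[n] = raw[i];
        n += 1;
        // RFC 4180: a quote inside a quoted field is written as two quotes,
        // so skip the second quote of each pair.
        if (raw[i] == '"' and i + 1 < raw.len and raw[i + 1] == '"') i += 1;
    }
    return out[0..n];
}

pub fn main() void {
    var buf: [64]u8 = undefined;
    const field = unescapeQuotes("say \"\"hi\"\", then leave", &buf);
    std.debug.print("{s}\n", .{field}); // prints: say "hi", then leave
}
```

Writing into a buffer owned by the caller keeps the operation allocation-free, in line with the library's stated goals.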

Notable Exclusions

Does not allocate memory for fields or records automatically.
Lacks built-in support for row or record iterators.
Does not construct complex data structures like structs or maps.
Does not tolerate malformed or ambiguous CSV data; it performs strict validity checks instead.

Design Philosophy

The design choice to use a field iterator means that csv-zero:

Avoids the overhead of tracking record state, which lowers memory use and runtime cost.
Allows a single field to span the entire buffer, which accommodates large rows when necessary.
Facilitates user-defined structures such as records and columns without imposing extra layers of abstraction.
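
The last point can be sketched as follows: a caller-defined row type filled from a stream of fields, using the last_column flag to detect row boundaries. Field here is a stand-in type, not csv-zero's; only last_column mirrors the usage shown in this document, while the data member, the Row type, and its capacity of 8 columns are assumptions. With the real iterator, a field slice may not stay valid across calls to next(), so a row builder may need to copy data; check the library's documentation for the actual lifetime rules.

```zig
const std = @import("std");

// Stand-in for a parsed field. Only `last_column` mirrors the usage shown in
// this document; the `data` member is an assumption for illustration.
const Field = struct {
    data: []const u8,
    last_column: bool,
};

// Caller-owned row storage with a fixed column capacity, so no allocation.
const Row = struct {
    cols: [8][]const u8 = undefined,
    len: usize = 0,

    fn push(self: *Row, value: []const u8) !void {
        if (self.len == self.cols.len) return error.TooManyColumns;
        self.cols[self.len] = value;
        self.len += 1;
    }

    fn slice(self: *const Row) []const []const u8 {
        return self.cols[0..self.len];
    }
};

pub fn main() !void {
    // In real use these would come from the field iterator one at a time.
    const fields = [_]Field{
        .{ .data = "id", .last_column = false },
        .{ .data = "name", .last_column = true },
        .{ .data = "1", .last_column = false },
        .{ .data = "Ada", .last_column = true },
    };

    var row = Row{};
    var row_index: usize = 0;
    for (fields) |field| {
        try row.push(field.data);
        if (field.last_column) {
            std.debug.print("row {d}: {d} columns, first = {s}\n", .{ row_index, row.len, row.slice()[0] });
            row_index += 1;
            row.len = 0; // reuse the same storage for the next row
        }
    }
}
```

Because the record layer lives in user code, its capacity limits and error handling are chosen by the application rather than by the parser.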

Usage Examples

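Count the number of rows and fields:

// `it` is an already-initialized csvz.Iterator; its setup is not shown here.
var fields: usize = 0;
var rows: usize = 0;
while (true) {
    const field = it.next() catch |err| switch (err) {
        csvz.Iterator.Error.EOF => break, // end of input
        else => return err,
    };
    fields += 1;
    // last_column is true for the final field of a row.
    rows += @intFromBool(field.last_column);
}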

Print every parsed field:

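var row: usize = 0;
var col: usize = 0;
while (true) {
    var field = it.next() catch |err| switch (err) {
        error.EOF => break,
        else => return err,
    };
    std.debug.print("field[{d}][{d}] = {s}\n", .{ row, col, field.unescaped() });
    // Advance the position; a row ends at its last column.
    col += 1;
    if (field.last_column) {
        row += 1;
        col = 0;
    }
}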

Performance Benchmarking

For performance context, csv-zero is benchmarked against other CSV parsers in the companion project csv-race, which compares a number of CSV libraries. In those results csv-zero consistently ranks among the fastest parsers.

Additional Considerations

The library offers a C API, though it's recommended to use the Zig implementation for full performance benefits, minimizing cross-language overhead.

In conclusion, csv-zero is tailored for developers needing precise control over CSV parsing with minimal overhead, establishing an effective foundation for high-performance applications. For further details, refer to the associated documentation and examples.
