csv-zero is a low-level, zero-allocation CSV parsing library designed for performance-sensitive applications. By focusing on field-based iteration and SIMD acceleration, it provides precise control over CSV data processing without unnecessary abstractions or memory allocation. It is aimed at systems engineers who prioritize efficiency.
csv-zero is an efficient CSV parsing library written in Zig, designed for performance-sensitive applications. This library features a zero-allocation and SIMD-accelerated field iterator, intentionally omitting common abstractions for maximum control and transparency.
Key Features
- Field-by-field iteration: Unlike typical CSV libraries, csv-zero emphasizes a direct field iteration model rather than a higher-level record abstraction.
- Zero allocations by default: The library does not allocate memory unless the user explicitly requests it, which keeps resource usage predictable.
- Explicit ownership and lifetimes: Memory management stays under the caller's control, with fields provided as slices into an internal buffer.
- Support for SIMD: Single Instruction, Multiple Data (SIMD) techniques speed up CSV parsing where the target supports them.
- RFC 4180 compliance: csv-zero adheres strictly to the CSV specification, ensuring that the parsing behavior is consistent and reliable.
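To make the SIMD point above concrete, here is a minimal sketch of vectorized scanning for the bytes a CSV parser cares about. It is not csv-zero's actual implementation: the function name `findSpecial` and the 16-byte lane width are assumptions chosen for illustration, and the bit order of the mask assumes a little-endian target.

```zig
const std = @import("std");

// Hypothetical helper, not part of csv-zero's API: returns the index of the
// first ',', '"', '\n' or '\r' in `data`, or null if none is present.
fn findSpecial(data: []const u8) ?usize {
    const lanes = 16;
    const Vec = @Vector(lanes, u8);
    var i: usize = 0;

    // Vector path: compare 16 bytes at once against each special byte and
    // turn the per-lane results into one bit mask (lane 0 = lowest bit on
    // little-endian targets).
    while (i + lanes <= data.len) : (i += lanes) {
        const chunk: Vec = data[i..][0..lanes].*;
        const comma: u16 = @bitCast(chunk == @as(Vec, @splat(',')));
        const quote: u16 = @bitCast(chunk == @as(Vec, @splat('"')));
        const lf: u16 = @bitCast(chunk == @as(Vec, @splat('\n')));
        const cr: u16 = @bitCast(chunk == @as(Vec, @splat('\r')));
        const mask = comma | quote | lf | cr;
        if (mask != 0) return i + @ctz(mask);
    }

    // Scalar tail for the remaining bytes (fewer than 16).
    while (i < data.len) : (i += 1) {
        switch (data[i]) {
            ',', '"', '\n', '\r' => return i,
            else => {},
        }
    }
    return null;
}
```

The scalar tail keeps the result identical to a byte-by-byte scan, so the vector path only changes speed, not behavior.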
Functional Overview
csv-zero is all about simplicity and high performance:
- Incremental parsing of CSV data from a *std.Io.Reader.
- Iteration over individual fields with clear row boundaries.
- Support for LF and CRLF line endings, quoted fields, and escaped quotes per RFC 4180.
- Configurable SIMD-accelerated scanning enhances parsing speed without compromising correctness.
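RFC 4180 escapes a double quote inside a quoted field by doubling it. The sketch below shows that rule in isolation, writing into a caller-provided buffer so that nothing is allocated. `unescapeInto` is a hypothetical helper written for this illustration, not the library's `unescaped()` method, and it assumes `raw` is the field content with the surrounding quotes already removed.

```zig
const std = @import("std");

// Hypothetical helper, not part of csv-zero's API: copies `raw` (the content
// between the outer quotes of a quoted field) into `dst`, collapsing each
// doubled quote "" into a single ". Returns the written part of `dst`.
fn unescapeInto(dst: []u8, raw: []const u8) error{NoSpaceLeft}![]const u8 {
    var out: usize = 0;
    var i: usize = 0;
    while (i < raw.len) : (i += 1) {
        if (out == dst.len) return error.NoSpaceLeft;
        dst[out] = raw[i];
        out += 1;
        // If this byte starts an escaped pair, skip its second quote.
        if (raw[i] == '"' and i + 1 < raw.len and raw[i + 1] == '"') i += 1;
    }
    return dst[0..out];
}
```

For example, the raw content `say ""hi""` unescapes to `say "hi"`.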
Notable Exclusions
- Does not allocate memory for fields or records automatically.
- Lacks built-in support for row or record iterators.
- Does not construct complex data structures like structs or maps.
- Avoids tolerating malformed or ambiguous CSV data, focusing instead on strict validity checks.
Design Philosophy
The design choice to use a field iterator means that csv-zero:
- Avoids the overhead of tracking record states, which contributes to lower memory and performance costs.
- Ensures that fields can span the entire buffer, accommodating large rows when necessary.
- Facilitates user-defined structures such as records and columns without imposing extra layers of abstraction.
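As an illustration of building records on top of a field iterator, the following sketch collects fields into a fixed-size, caller-owned record without allocating. The `Record` type, its capacities (256 bytes, 8 columns), and its method names are assumptions made up for this example and are not part of csv-zero. Field bytes are copied because slices into a parser's internal buffer may not stay valid once parsing continues.

```zig
const std = @import("std");

// Hypothetical record type, not part of csv-zero's API. It owns a copy of
// each field's bytes so it stays valid after the parser moves on.
const Record = struct {
    bytes: [256]u8 = undefined,
    ends: [8]usize = undefined, // end offset of each column within `bytes`
    used: usize = 0,
    count: usize = 0,

    // Copies one field into the record's own storage.
    fn push(self: *Record, data: []const u8) error{NoSpaceLeft}!void {
        if (self.count == self.ends.len or data.len > self.bytes.len - self.used)
            return error.NoSpaceLeft;
        @memcpy(self.bytes[self.used..][0..data.len], data);
        self.used += data.len;
        self.ends[self.count] = self.used;
        self.count += 1;
    }

    // Returns the bytes of column `i`; `i` must be less than `count`.
    fn column(self: *const Record, i: usize) []const u8 {
        const start: usize = if (i == 0) 0 else self.ends[i - 1];
        return self.bytes[start..self.ends[i]];
    }

    // Resets the record so it can be reused for the next row.
    fn clear(self: *Record) void {
        self.used = 0;
        self.count = 0;
    }
};
```

In a parsing loop, one would call `push` for every field and, whenever the field's `last_column` flag is set, process the completed record and then call `clear`.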
Usage Examples
The following examples illustrate how to use the library.
Count the number of rows and fields:
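    // `it` is an already-initialized csvz.Iterator.
    var fields: usize = 0;
    var rows: usize = 0;
    while (true) {
        const field = it.next() catch |err| switch (err) {
            csvz.Iterator.Error.EOF => break,
            else => return err,
        };
        fields += 1;
        // last_column is set on the final field of each row.
        rows += @intFromBool(field.last_column);
    }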
Print every parsed field:
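    var row: usize = 0;
    var col: usize = 0;
    while (true) {
        var field = it.next() catch |err| switch (err) {
            error.EOF => break,
            else => return err,
        };
        std.debug.print("field[{d}][{d}] = {s}\n", .{ row, col, field.unescaped() });
        // Advance the position: last_column marks the end of a row.
        col = if (field.last_column) 0 else col + 1;
        row += @intFromBool(field.last_column);
    }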
Performance Benchmarking
For performance context, csv-zero has been benchmarked against other high-quality CSV parsers in the complementary project csv-race, which evaluates numerous CSV libraries. Results consistently show csv-zero achieving top performance metrics, making it an ideal choice for anyone requiring speed and efficiency in CSV processing.
Additional Considerations
The library offers a C API, though it's recommended to use the Zig implementation for full performance benefits, minimizing cross-language overhead.
In conclusion, csv-zero is tailored for developers needing precise control over CSV parsing with minimal overhead, establishing an effective foundation for high-performance applications. For further details, refer to the associated documentation and examples.