csvql brings the power of SQL to CSV files with exceptional speed, leveraging Zig's capabilities for SIMD parsing and parallel execution. This command-line interface tool enables efficient data querying and manipulation, making it significantly faster than conventional options like DuckDB, even for large datasets.
csvql: Fast SQL Queries on CSV Files
csvql is a powerful command-line interface that allows users to execute SQL queries directly against CSV files, offering exceptional performance and flexibility. Designed with advanced features such as SIMD parsing and parallel execution in Zig, csvql significantly outperforms traditional database engines in handling large datasets.
Key Features
-
High Performance: Achieves remarkable speeds, proven to be up to 9 times faster than DuckDB for queries on large datasets.
-
SQL Syntax: Supports a familiar SQL dialect, enabling users to seamlessly query data. Example query:
$ csvql "SELECT name, city, salary FROM 'employees.csv' WHERE salary > 100000 ORDER BY salary DESC LIMIT 5" name,city,salary Alice,San Francisco,185000 Bob,New York,172000 Carol,Seattle,168000 Dave,Austin,155000 Eve,Boston,142000 -
Flexible Query Modes: Automatically switches between SQL and simple modes based on input, allowing for concise and effective queries. For instance:
csvql data.csv "name,salary" "age>30" 10 "salary:desc" -
Streamlined Integration: Easily works with Unix pipes, making it a versatile tool for processing data in the command line environment.
cat data.csv | csvql "SELECT name, age FROM '-' WHERE age > 25" -
Smart Flags: Comes with various command line flags for customized outputs, including JSON formatting and delimiter options.
Performance Highlights
csvql is built for speed and efficiency, with benchmarks indicating significant improvements over competing tools like DuckDB:
| Query | csvql | DuckDB | Speedup |
|---|---|---|---|
| WHERE + ORDER BY LIMIT 10 | 0.020s | 0.179s | 9x |
| ORDER BY LIMIT 10 | 0.041s | 0.165s | 4x |
| Full scan (1M rows) | 0.196s | 1.163s | 5.9x |
Supported SQL Functions
- Selection and Filtering: Utilize
SELECT,WHERE,LIKE, and more to extract and filter data effectively. - Aggregation Operations: Count, sum, and average with
GROUP BYand powerful aggregate functions. - Date Functions: Perform date arithmetic and comparisons with built-in functions like
DATEDIFFandDATEADD.
Integrating with AI Assistants
csvql also supports serving queries to AI assistants through the Model Context Protocol (MCP) server, facilitating dynamic and context-aware querying capabilities.
csvql --mcp
Conclusion
With its lightning-fast performance, robust SQL support, and extensibility, csvql is an essential tool for anyone working with large CSV datasets. Its combination of speed and ease of use makes data analysis efficient and straightforward. For more insights and usage details, users can refer to the comprehensive documentation available in the repository.
No comments yet.
Sign in to be the first to comment.