PitchHut logo
Ultra-fast SQL queries for CSV files using Zig and SIMD.
Pitch

csvql brings the power of SQL to CSV files with exceptional speed, leveraging Zig's capabilities for SIMD parsing and parallel execution. This command-line interface tool enables efficient data querying and manipulation, making it significantly faster than conventional options like DuckDB, even for large datasets.

Description

csvql: Fast SQL Queries on CSV Files

csvql is a powerful command-line interface that allows users to execute SQL queries directly against CSV files, offering exceptional performance and flexibility. Designed with advanced features such as SIMD parsing and parallel execution in Zig, csvql significantly outperforms traditional database engines in handling large datasets.

Key Features

  • High Performance: Achieves remarkable speeds, proven to be up to 9 times faster than DuckDB for queries on large datasets.

  • SQL Syntax: Supports a familiar SQL dialect, enabling users to seamlessly query data. Example query:

    $ csvql "SELECT name, city, salary FROM 'employees.csv' WHERE salary > 100000 ORDER BY salary DESC LIMIT 5"  
    
    name,city,salary  
    Alice,San Francisco,185000  
    Bob,New York,172000  
    Carol,Seattle,168000  
    Dave,Austin,155000  
    Eve,Boston,142000  
    
  • Flexible Query Modes: Automatically switches between SQL and simple modes based on input, allowing for concise and effective queries. For instance:

    csvql data.csv "name,salary" "age>30" 10 "salary:desc"  
    
  • Streamlined Integration: Easily works with Unix pipes, making it a versatile tool for processing data in the command line environment.

    cat data.csv | csvql "SELECT name, age FROM '-' WHERE age > 25"  
    
  • Smart Flags: Comes with various command line flags for customized outputs, including JSON formatting and delimiter options.

Performance Highlights

csvql is built for speed and efficiency, with benchmarks indicating significant improvements over competing tools like DuckDB:

QuerycsvqlDuckDBSpeedup
WHERE + ORDER BY LIMIT 100.020s0.179s9x
ORDER BY LIMIT 100.041s0.165s4x
Full scan (1M rows)0.196s1.163s5.9x

Supported SQL Functions

  • Selection and Filtering: Utilize SELECT, WHERE, LIKE, and more to extract and filter data effectively.
  • Aggregation Operations: Count, sum, and average with GROUP BY and powerful aggregate functions.
  • Date Functions: Perform date arithmetic and comparisons with built-in functions like DATEDIFF and DATEADD.

Integrating with AI Assistants

csvql also supports serving queries to AI assistants through the Model Context Protocol (MCP) server, facilitating dynamic and context-aware querying capabilities.

csvql --mcp  

Conclusion

With its lightning-fast performance, robust SQL support, and extensibility, csvql is an essential tool for anyone working with large CSV datasets. Its combination of speed and ease of use makes data analysis efficient and straightforward. For more insights and usage details, users can refer to the comprehensive documentation available in the repository.

0 comments

No comments yet.

Sign in to be the first to comment.