Luna is an open-source, in-memory SQL server for querying data held in object storage. Built on DuckDB and Apache Arrow, it integrates with cloud storage services and supports multiple data formats. It can also process larger-than-memory workloads by spilling to disk, which makes it suitable for scalable data applications.
Luna is an open-source, in-memory SQL server designed specifically for working with object storage data. Built on DuckDB and Apache Arrow, it reads from data sources including S3, GCS, and local file systems. Luna supports multiple data formats such as CSV, JSON, and Parquet, and is optimized for in-memory operations while also being able to spill to disk for larger datasets.
Key Features
- In-Memory Database: Keeps data in memory for fast access and processing, and can spill to disk when a dataset does not fit.
- Broad Compatibility: Works with object storage such as S3 and GCS as well as local file systems, and reads CSV, JSON, and Parquet files.
- Flexible API: Communicates via a TCP-based API, enabling users to execute SQL commands and retrieve results in a structured format.
API Overview
Luna maintains a single in-memory database accessible through a TCP-based API, which listens on port 7688 by default. It uses a variant of Redis' RESP protocol for requests and responses, with Arrow's IPC format used for efficient data interchange.
Request Format
Requests to the server are structured as follows:
$<length>\r\n<data>\r\n
For example, to load a CSV file from GCS into a table and then query it:
$125\r\nx:CREATE TABLE tmpcur AS FROM read_csv('gs://bucket/yourdata.csv', header = true, union_by_name = true, files_to_sniff = -1);\r\n
$39\r\nq:SELECT uuid, date, payer FROM tmpcur;\r\n
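The framing above can be sketched in a few lines of Python. This is only an illustration, not part of Luna: the function name encode_request is hypothetical, and it assumes that <length> is the byte length of <data> (as in RESP bulk strings) and that the two-character prefix seen in the examples (x: or q:) counts toward that length, which matches the 39 in the query example.

```python
def encode_request(prefix: str, sql: str) -> bytes:
    # <data> is the prefix ("x:" or "q:" in the examples) followed by the SQL text.
    data = (prefix + sql).encode("utf-8")
    # Assumption: <length> is the byte length of <data>, as in RESP bulk strings.
    header = b"$" + str(len(data)).encode("ascii") + b"\r\n"
    return header + data + b"\r\n"


print(encode_request("q:", "SELECT uuid, date, payer FROM tmpcur;"))
# prints b'$39\r\nq:SELECT uuid, date, payer FROM tmpcur;\r\n'
```

Computing the length from the encoded bytes rather than typing it by hand keeps the prefix correct when the SQL text changes.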
Response Structure
Responses use a similarly structured format. A successful response either reports the changes made or returns results described by an Arrow schema. An error response states the specific issue encountered during processing.
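As a rough sketch of a round trip over the TCP API, the Python below sends one framed request and returns whatever bytes come back. It is an illustration under assumptions rather than a real client: send_request and frame are hypothetical names, the host default is a guess, and because the exact response layout is not spelled out here the reply is returned as raw bytes instead of being decoded.

```python
import socket

DEFAULT_PORT = 7688  # default port stated in the API overview


def frame(data: str) -> bytes:
    # $<length>\r\n<data>\r\n, assuming <length> is the byte length of <data>.
    body = data.encode("utf-8")
    return b"$" + str(len(body)).encode("ascii") + b"\r\n" + body + b"\r\n"


def send_request(data: str, host: str = "localhost",
                 port: int = DEFAULT_PORT, timeout: float = 5.0) -> bytes:
    # Open a TCP connection, send one framed request, and collect the reply
    # until the peer closes the connection or the timeout expires.
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(frame(data))
        chunks = []
        try:
            while True:
                chunk = conn.recv(65536)
                if not chunk:
                    break
                chunks.append(chunk)
        except socket.timeout:
            pass  # no more data within the timeout; return what was read
        return b"".join(chunks)


# Example (requires a running server):
# raw = send_request("q:SELECT uuid, date, payer FROM tmpcur;")
```

Reading until close or timeout is a deliberately simple choice; a real client would parse the length-prefixed reply and hand any Arrow IPC payload to an Arrow library.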
Development and Support
Luna is currently in alpha development, actively supported by Alphaus, Inc. for both external and internal usage. It presently runs on a single machine; support for distributed cluster environments is a future goal.
Build and Test Instructions
The following is a brief overview of how to build Luna and test it with lunactl:
# Build the binary:
$ cargo build
# Run Luna on the default port:
$ RUST_LOG=info ./target/debug/luna
# Example command to create a table from CSV:
$ lunactl -type 'x:' -p "CREATE TABLE customers AS FROM read_csv('/path/to/data.csv', header = true, files_to_sniff = -1);"
Future Directions
The development roadmap for Luna includes plans for creating client SDKs for popular programming languages, enhancing authentication methods, and establishing comprehensive documentation. There are also goals to expand testing capabilities and add distributed computing support.