OctaneDB is a lightweight, high-performance vector database library written in Python. The project reports roughly 10x faster performance than existing solutions such as Pinecone, ChromaDB, and Qdrant, with sub-millisecond query response times and insertion rates above 3,000 vectors per second. With advanced indexing, text embedding support, and flexible storage options, it targets AI/ML applications that need fast similarity search, efficient data management, and low memory usage.
Key Features
Performance Optimizations
- Achieves roughly 10x faster operations than established vector databases.
- Delivers sub-millisecond query response times.
- Sustains an insertion rate of 3,000+ vectors per second.
- Uses HDF5 compression to reduce memory usage.
Advanced Indexing Capabilities
- Implements HNSW (Hierarchical Navigable Small World) for rapid approximate search functionality.
- Supports FlatIndex for precise similarity searches.
- Exposes configurable index parameters for fine-tuning performance.
- Features automatic index optimization for enhanced efficiency.
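The two index types above trade speed against accuracy: HNSW navigates a layered proximity graph to find approximate neighbors quickly, while a flat index compares the query against every stored vector and therefore returns exact results. The following is a minimal sketch of the flat (exact) approach in plain Python, assuming cosine distance; `FlatIndexSketch` is a hypothetical class for illustration, not OctaneDB's FlatIndex.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity (0.0 = same direction, 2.0 = opposite)."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (norm_a * norm_b)


class FlatIndexSketch:
    """Hypothetical exact index: stores raw vectors and scans all of them per query."""

    def __init__(self) -> None:
        self.ids: List[str] = []
        self.vectors: List[List[float]] = []

    def add(self, vec_id: str, vector: Sequence[float]) -> None:
        self.ids.append(vec_id)
        self.vectors.append(list(vector))

    def search(self, query: Sequence[float], k: int) -> List[Tuple[str, float]]:
        # Score every stored vector (O(n * d) work), then keep the k closest.
        scored = (
            (cosine_distance(query, vec), vec_id)
            for vec_id, vec in zip(self.ids, self.vectors)
        )
        return [(vec_id, dist) for dist, vec_id in heapq.nsmallest(k, scored)]


index = FlatIndexSketch()
index.add("a", [1.0, 0.0])
index.add("b", [0.0, 1.0])
index.add("c", [0.7, 0.7])
print(index.search([1.0, 0.1], k=2))  # "a" ranks first, then "c"
```

The full scan costs O(n·d) per query for n vectors of dimension d, which is why approximate graph indexes such as HNSW are usually preferred once collections grow large.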
Text Embedding Support
- Provides a ChromaDB-compatible API for easy migration.
- Automates text-to-vector conversion through the integration of sentence-transformers.
- Supports numerous embedding models, including all-MiniLM-L6-v2 and all-mpnet-base-v2, with GPU acceleration (CUDA).
- Facilitates batch processing to enhance performance.
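Batch processing means the encoder receives documents in fixed-size groups rather than one at a time, which amortizes per-call overhead and, on a GPU, keeps the device busy. The sketch below shows that batching pattern with a hypothetical stand-in encoder, `toy_hash_embed`, which feature-hashes tokens into a fixed number of buckets; it is not a semantic model and not how OctaneDB embeds text. With sentence-transformers, the `encode` argument would instead be something like `SentenceTransformer("all-MiniLM-L6-v2").encode`.

```python
import hashlib
from typing import Callable, List, Sequence

Vector = List[float]


def toy_hash_embed(texts: Sequence[str], dim: int = 8) -> List[Vector]:
    """Hypothetical stand-in encoder: feature-hashes lowercase tokens into `dim` buckets."""
    vectors: List[Vector] = []
    for text in texts:
        vec = [0.0] * dim
        for token in text.lower().split():
            # blake2b gives a stable hash across runs (built-in hash() is salted per process).
            digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
            vec[int.from_bytes(digest, "little") % dim] += 1.0
        vectors.append(vec)
    return vectors


def embed_in_batches(
    texts: Sequence[str],
    encode: Callable[[Sequence[str]], List[Vector]],
    batch_size: int = 32,
) -> List[Vector]:
    """Call `encode` once per slice of up to `batch_size` texts, preserving input order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    embeddings: List[Vector] = []
    for start in range(0, len(texts), batch_size):
        embeddings.extend(encode(texts[start:start + batch_size]))
    return embeddings


docs = [
    "This is a document about pineapple",
    "This is a document about oranges",
    "Citrus fruit is rich in vitamin C",
]
vectors = embed_in_batches(docs, toy_hash_embed, batch_size=2)  # two encode calls: 2 + 1 texts
print(len(vectors), len(vectors[0]))
```

Because each batch is encoded independently, batching changes how many encoder calls are made but not the resulting vectors.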
Flexible Storage Options
- Offers in-memory storage for maximum speed, along with persistent file-based storage.
- Utilizes a hybrid mode to harness the advantages of both storage types.
- Employs HDF5 format for efficient compression of data.
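One common way to combine in-memory speed with persistence is a write-through design: reads are served from memory, and every write is also appended to a file that can be replayed after a restart. The sketch below illustrates that general pattern with JSON Lines from the standard library; OctaneDB itself uses HDF5, and `WriteThroughStore` is a hypothetical class, not its hybrid-mode implementation.

```python
import json
import os
import tempfile
from typing import Dict, List, Optional, Sequence


class WriteThroughStore:
    """Hypothetical hybrid store: dict for reads, append-only JSON Lines file for persistence."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.vectors: Dict[str, List[float]] = {}
        # On startup, rebuild the in-memory view by replaying the log (later writes win).
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                for line in f:
                    record = json.loads(line)
                    self.vectors[record["id"]] = record["vector"]

    def put(self, vec_id: str, vector: Sequence[float]) -> None:
        values = [float(x) for x in vector]
        # Append to the file first, then update memory, so memory never gets ahead of the log.
        # No fsync is issued, so this is not crash-safe; it only illustrates the pattern.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"id": vec_id, "vector": values}) + "\n")
        self.vectors[vec_id] = values

    def get(self, vec_id: str) -> Optional[List[float]]:
        # Reads are served from memory only.
        return self.vectors.get(vec_id)


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "vectors.jsonl")
    store = WriteThroughStore(log_path)
    store.put("doc1", [0.1, 0.2, 0.3])
    reopened = WriteThroughStore(log_path)  # simulates a restart
    print(reopened.get("doc1"))
```

The trade-off is the usual one: writes pay for a disk append, while reads stay as fast as a pure in-memory store.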
Versatile Search Features
- Supports multiple distance metrics, including Cosine, Euclidean, Dot Product, Manhattan, Chebyshev, and Jaccard.
- Provides advanced metadata filtering with logical operators.
- Enables batch search operations for increased efficiency.
- Includes text-based search functionality with automatic embedding.
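The list above names six metrics without defining them. Under common conventions (an assumption, since OctaneDB's exact formulas and score orientation are not stated here), each can be written as a distance where smaller means more similar, as in this sketch. The dot product is negated to fit that convention, Jaccard treats each vector as the set of its nonzero positions, and the `METRICS` keys are hypothetical names, not OctaneDB parameter values.

```python
import math
from typing import Callable, Dict, Sequence

Vec = Sequence[float]


def cosine(a: Vec, b: Vec) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def euclidean(a: Vec, b: Vec) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dot_product(a: Vec, b: Vec) -> float:
    # Negated so that, like the other metrics, a smaller value means more similar.
    return -sum(x * y for x, y in zip(a, b))


def manhattan(a: Vec, b: Vec) -> float:
    # Sum of absolute coordinate differences (L1 distance).
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a: Vec, b: Vec) -> float:
    # Largest absolute coordinate difference (L-infinity distance).
    return max(abs(x - y) for x, y in zip(a, b))


def jaccard(a: Vec, b: Vec) -> float:
    # 1 - |intersection| / |union| over the sets of nonzero positions.
    set_a = {i for i, x in enumerate(a) if x != 0}
    set_b = {i for i, y in enumerate(b) if y != 0}
    union = set_a | set_b
    return 0.0 if not union else 1.0 - len(set_a & set_b) / len(union)


METRICS: Dict[str, Callable[[Vec, Vec], float]] = {
    "cosine": cosine,
    "euclidean": euclidean,
    "dot": dot_product,
    "manhattan": manhattan,
    "chebyshev": chebyshev,
    "jaccard": jaccard,
}

a, b = [1.0, 0.0, 2.0], [0.0, 1.0, 2.0]
for name, fn in METRICS.items():
    print(f"{name}: {fn(a, b):.4f}")
```

Cosine ignores vector length, Euclidean, Manhattan, and Chebyshev are sensitive to it, and Jaccard only looks at which dimensions are present, so the right choice depends on how the embeddings were produced.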
Developer-Friendly Experience
- Features a simple and intuitive API similar to ChromaDB.
- Comprehensive documentation enriched with practical examples.
- Provides type hints throughout the code for better usability.
- Comes with an extensive suite of tests to ensure reliability.
Example Usage
The following example shows a typical workflow: initializing the database, creating a collection, adding documents, and running a filtered text search:
from octanedb import OctaneDB
# Initialize with a sentence-transformers model (all-MiniLM-L6-v2 produces 384-dimensional vectors)
db = OctaneDB(dimensions=384, embedding_model="all-MiniLM-L6-v2")
# Create a new collection to hold the documents
collection = db.create_collection("documents")
# Add documents with metadata; embeddings are generated automatically from the text
result = db.add(
    ids=["doc1", "doc2"],
    documents=["This is a document about pineapple", "This is a document about oranges"],
    metadatas=[{"category": "tropical", "color": "yellow"}, {"category": "citrus", "color": "orange"}],
)
# Search by text, keeping only documents whose metadata matches the filter
results = db.search_text(query_text="fruit", k=2, filter="category == 'tropical'", include_metadata=True)
# Each result is a (doc_id, distance, metadata) tuple
for doc_id, distance, metadata in results:
    print(f"Document: {db.get_document(doc_id)}")
    print(f"Distance: {distance:.4f}")
    print(f"Metadata: {metadata}")
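In the example, the `filter` argument keeps only documents whose metadata satisfies `category == 'tropical'`. How OctaneDB parses and applies that string is not described here, so the following is only a sketch of the general idea: hypothetical predicate helpers (`eq`, `and_`, `or_`, `not_`) compose conditions over metadata dicts, and `filtered_top_k` applies the filter before ranking candidates by distance.

```python
from typing import Any, Callable, Dict, List, Tuple

Metadata = Dict[str, Any]
Predicate = Callable[[Metadata], bool]
Candidate = Tuple[str, float, Metadata]  # (doc_id, distance, metadata)


def eq(field: str, value: Any) -> Predicate:
    return lambda meta: meta.get(field) == value


def and_(*preds: Predicate) -> Predicate:
    return lambda meta: all(p(meta) for p in preds)


def or_(*preds: Predicate) -> Predicate:
    return lambda meta: any(p(meta) for p in preds)


def not_(pred: Predicate) -> Predicate:
    return lambda meta: not pred(meta)


def filtered_top_k(candidates: List[Candidate], predicate: Predicate, k: int) -> List[Candidate]:
    """Drop non-matching candidates first, then return the k closest by distance."""
    matching = [c for c in candidates if predicate(c[2])]
    return sorted(matching, key=lambda c: c[1])[:k]


candidates = [
    ("doc1", 0.30, {"category": "tropical", "color": "yellow"}),
    ("doc2", 0.10, {"category": "citrus", "color": "orange"}),
    ("doc3", 0.20, {"category": "tropical", "color": "green"}),
]
tropical_not_green = and_(eq("category", "tropical"), not_(eq("color", "green")))
print(filtered_top_k(candidates, tropical_not_green, k=2))
```

Filtering before truncating to `k` matters: if a system instead took the top `k` by distance and filtered afterwards, a selective filter could return fewer than `k` results even when more matching documents exist.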
Performance Benchmarks
The project reports the following benchmark comparison (lower is better in every row except insert throughput):
Operation | OctaneDB | ChromaDB | Pinecone | Qdrant |
---|---|---|---|---|
Insert throughput (vectors/s) | 3,200 | 320 | 280 | 450 |
Search latency (ms) | 0.8 | 8.2 | 15.1 | 12.3 |
Memory usage (GB) | 1.2 | 2.8 | 3.1 | 2.5 |
Index build time (s) | 45 | 180 | 120 | 95 |
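Dividing the reported figures gives the per-system speedups behind the "10x" headline. This sketch only does that arithmetic on the numbers in the benchmark table; it does not re-run any benchmark, and the test conditions behind the figures are not stated.

```python
from typing import Dict, Tuple

# Figures copied from the benchmark table (as reported by the project).
INSERT_PER_SEC = {"OctaneDB": 3200, "ChromaDB": 320, "Pinecone": 280, "Qdrant": 450}
SEARCH_MS = {"OctaneDB": 0.8, "ChromaDB": 8.2, "Pinecone": 15.1, "Qdrant": 12.3}


def speedups(baseline: str = "OctaneDB") -> Dict[str, Tuple[float, float]]:
    """Return {system: (insert_speedup, search_speedup)} of `baseline` over each other system."""
    result = {}
    for name in INSERT_PER_SEC:
        if name == baseline:
            continue
        insert_ratio = INSERT_PER_SEC[baseline] / INSERT_PER_SEC[name]  # higher throughput is better
        search_ratio = SEARCH_MS[name] / SEARCH_MS[baseline]  # lower latency is better
        result[name] = (insert_ratio, search_ratio)
    return result


for system, (ins, srch) in speedups().items():
    print(f"vs {system}: {ins:.1f}x insert throughput, {srch:.1f}x lower search latency")
```

On these figures, insert throughput is about 7x (vs Qdrant) to 11x (vs Pinecone) higher and search latency about 10x (vs ChromaDB) to 19x (vs Pinecone) lower, while the memory and index-build gaps are smaller, roughly 2x to 4x.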
This project is designed for applications including AI/ML, document search, recommendation systems, image search, and NLP tasks.
For more information and to get started with OctaneDB, check out the comprehensive documentation and examples provided in the repository.