Retrievo
A hybrid search library for .NET combining BM25 and vector search.
Pitch

Retrievo is an in-memory search library for .NET that combines BM25 lexical matching with vector similarity search. It needs no external servers or databases, which suits small-scale applications, offline scenarios, and lightweight developer tools with corpora of up to about 10,000 documents.

Description

Retrievo: A Hybrid Search Solution for .NET

Retrievo is an open-source search library for .NET that combines BM25 lexical matching with vector similarity search and merges the two result lists through Reciprocal Rank Fusion (RRF). It runs entirely in process, with no external servers or databases. It targets corpora of up to approximately 10,000 documents, such as local agent memory, small retrieval-augmented generation (RAG) pipelines, developer tools, and offline or edge scenarios.
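To illustrate how RRF merges two ranked lists, here is a minimal sketch. It is not Retrievo's implementation: the `FuseRrf` helper, the document ids, and the constant k = 60 (a commonly used default) are assumptions made for this example. Each document receives 1 / (k + rank) from every list it appears in, and the summed scores determine the fused order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank),
// with 1-based ranks. k = 60 is an assumed default, not a confirmed
// Retrievo setting.
static List<(string Id, double Score)> FuseRrf(
    IEnumerable<IReadOnlyList<string>> rankedLists, int k = 60)
{
    var scores = new Dictionary<string, double>();
    foreach (var list in rankedLists)
    {
        for (int i = 0; i < list.Count; i++)
        {
            // Missing ids start at 0.0 because TryGetValue leaves the default.
            scores.TryGetValue(list[i], out double current);
            scores[list[i]] = current + 1.0 / (k + i + 1);
        }
    }

    // Highest fused score first; ties broken by id for a stable order.
    return scores
        .OrderByDescending(p => p.Value)
        .ThenBy(p => p.Key, StringComparer.Ordinal)
        .Select(p => (Id: p.Key, Score: p.Value))
        .ToList();
}

// Hypothetical rankings from a lexical and a vector retriever.
var lexical = new[] { "doc1", "doc3", "doc2" };
var vector = new[] { "doc2", "doc1", "doc4" };

var fused = FuseRrf(new IReadOnlyList<string>[] { lexical, vector });
foreach (var (id, score) in fused)
    Console.WriteLine($"{id}: {score:F4}");
```

Because RRF uses only ranks, the BM25 and cosine scores never need to be normalized against each other. A document that ranks well in both lists (here `doc1`) ends up ahead of one that appears in only one list.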

Key Features

  • Hybrid Retrieval: Combines BM25 lexical search with cosine-similarity vector search and fuses the two rankings with RRF for better result relevance.
  • Standalone Modes: Run purely lexical or purely vector-based searches when a use case calls for only one.
  • Explain Mode: Provides a detailed score breakdown for each search result, showing how it was derived.
  • Fielded Search: Per-field weight boosts for title and body control how documents are ranked.
  • Metadata Filters: Supports exact-match, range, and contains filters, applied after result fusion.
  • Field Definitions: Field types are declared at index time, so the matching filter semantics are applied automatically.
  • Finite Vector Validation: Rejects queries whose embeddings contain NaN or Infinity values, to maintain search integrity.
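The finite-vector check in the last item can be sketched as follows. This is an illustration only: `FirstNonFiniteIndex` and `EnsureFinite` are hypothetical helpers, and the exception type Retrievo actually throws is not stated in this description. The sketch relies on the standard `float.IsFinite` method, which returns false for NaN and for both infinities.

```csharp
using System;

// Returns the index of the first NaN or infinite component,
// or -1 when every component is finite.
static int FirstNonFiniteIndex(ReadOnlySpan<float> embedding)
{
    for (int i = 0; i < embedding.Length; i++)
    {
        if (!float.IsFinite(embedding[i]))
            return i;
    }
    return -1;
}

// Hypothetical guard a caller might run before issuing a vector query.
static void EnsureFinite(float[] embedding)
{
    int bad = FirstNonFiniteIndex(embedding);
    if (bad >= 0)
        throw new ArgumentException(
            $"Embedding component {bad} is not finite.", nameof(embedding));
}

var valid = new float[] { 0.1f, -0.4f, 0.9f };
var invalid = new float[] { 0.1f, float.NaN, 0.9f };

EnsureFinite(valid); // passes silently

try
{
    EnsureFinite(invalid);
}
catch (ArgumentException ex)
{
    Console.WriteLine($"Rejected: {ex.Message}");
}
```

Rejecting such vectors up front matters because a single NaN component makes every dot product with that vector NaN, which would leave the similarity ranking undefined.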

Index Management

Retrievo builds and updates its index entirely in process:

  • Fluent Builder: Provides a clean API for batch construction and document ingestion, making setup straightforward.
  • Mutable Index: Supports incremental updates and deletions while ensuring thread-safe commits.
  • Zero Infrastructure: Fully functional within the process, eliminating the need for external dependencies.
  • Auto-Embedding: Automatically handles document embedding during the indexing process, simplifying setup.

Developer-Driven Enhancements

  • SIMD Accelerated Operations: Utilizes hardware intrinsics to boost performance in brute-force vector computations.
  • Query Diagnostics: Offers in-depth timing breakdowns for every stage of the query pipeline, facilitating performance analysis.
  • Pluggable Providers: Easily integrates with various embedding models or APIs, enhancing versatility.
  • CLI Tool: Provides a command-line interface for efficient indexing and querying operations.
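As an illustration of SIMD-accelerated brute-force vector scoring, the sketch below computes cosine similarity with the portable `System.Numerics.Vector<T>` type. This is an assumption for the example: Retrievo's own code may use different hardware intrinsics, and `CosineSimilarity`, the corpus, and the query are all hypothetical.

```csharp
using System;
using System.Numerics;

// Cosine similarity with a vectorized main loop and a scalar tail.
static float CosineSimilarity(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same length.");

    float dot = 0f, normA = 0f, normB = 0f;
    int width = Vector<float>.Count; // lanes per hardware vector
    int i = 0;

    // Process `width` components per iteration.
    for (; i <= a.Length - width; i += width)
    {
        var va = new Vector<float>(a.Slice(i, width));
        var vb = new Vector<float>(b.Slice(i, width));
        dot += Vector.Dot(va, vb);
        normA += Vector.Dot(va, va);
        normB += Vector.Dot(vb, vb);
    }

    // Handle the components that do not fill a whole vector.
    for (; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }

    float denom = MathF.Sqrt(normA) * MathF.Sqrt(normB);
    return denom == 0f ? 0f : dot / denom;
}

// Brute-force scan: score every stored embedding against the query.
var corpus = new (string Id, float[] Embedding)[]
{
    ("a", new float[] { 1, 0, 0, 0, 0, 0, 0, 0, 1 }),
    ("b", new float[] { 0, 1, 0, 0, 0, 0, 0, 0, 0 }),
    ("c", new float[] { 1, 1, 0, 0, 0, 0, 0, 0, 1 }),
};
var query = new float[] { 1, 0, 0, 0, 0, 0, 0, 0, 1 };

string bestId = "";
float bestScore = float.NegativeInfinity;
foreach (var (id, embedding) in corpus)
{
    float score = CosineSimilarity(query, embedding);
    Console.WriteLine($"{id}: {score:F4}");
    if (score > bestScore) { bestScore = score; bestId = id; }
}
Console.WriteLine($"Best match: {bestId}");
```

A linear scan like this costs one pass over every stored vector per query, which is why it fits corpora in the range of thousands of documents rather than millions.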

Quick Start Example

A minimal index can be built and queried in a few lines:

using Retrievo;
using Retrievo.Models;

// Build an in-memory index from two documents.
var index = new HybridSearchIndexBuilder()
    .AddDocument(new Document { Id = "1", Body = "Neural networks learn complex patterns." })
    .AddDocument(new Document { Id = "2", Body = "Kubernetes orchestrates container deployments." })
    .Build();

// Dispose the index when it goes out of scope.
using var _ = index;

// Run a hybrid query and keep the five best matches.
var response = index.Search(new HybridQuery { Text = "neural network training", TopK = 5 });

// Print each result's document id and score.
foreach (var r in response.Results)
    Console.WriteLine($"  {r.Id}: {r.Score:F4}");

Azure OpenAI Integration

Retrievo can use Azure OpenAI as its embedding provider. Documents are embedded at build time, and queries are searched using embeddings:

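using Retrievo;
using Retrievo.Models;
using Retrievo.AzureOpenAI;

// Configure the provider with an endpoint, API key, and embedding deployment.
var provider = new AzureOpenAIEmbeddingProvider(
    new Uri("https://your-resource.openai.azure.com/"),
    "your-api-key",
    "text-embedding-3-small");

// Load documents from disk and embed them while the index is built.
using var index = await new HybridSearchIndexBuilder()
    .AddFolder("./docs")  // loads *.md and *.txt recursively
    .WithEmbeddingProvider(provider)
    .BuildAsync();

// Query asynchronously and keep the five best matches.
var response = await index.SearchAsync(new HybridQuery { Text = "how to deploy", TopK = 5 });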

Performance and Benchmarks

Retrievo has been evaluated on the BEIR benchmark. Reported retrieval scores for the default hybrid configuration:

Dataset     Hybrid (default)
NFCorpus    0.392
SciFact     0.756

Hybrid query latency remains below 10 ms. Retrievo is optimized for document collections of moderate size.

Roadmap

Future developments include:

  • Expansion of support for larger corpora using approximate nearest neighbor (ANN) solutions.
  • Implementing snapshot export and import features.

Retrievo brings hybrid search to .NET in a single in-process library, with features useful to both developers and end users.
