Production RAG Pipeline
Your solution for self-hosted, Perplexity-style web search.
Pitch

Production RAG Pipeline is a library for building search pipelines tailored to local LLMs. It integrates Bing and DuckDuckGo, refines search results through semantic filtering and reranking, and delivers well-structured, cited answers without API costs or usage limits.

Description


The Production RAG Pipeline provides a free, self-hosted solution for building search pipelines that produce Perplexity-style answers with local LLMs, with no API costs or usage limits.

The library searches the web via Bing and DuckDuckGo, filters results semantically before fetching, reranks them by relevance, and assembles LLM-ready prompts with inline citations for informed responses.
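To make the "LLM-ready prompts with inline citations" idea concrete, here is a minimal sketch of how such a prompt can be assembled. The function, field names, and prompt wording are illustrative assumptions, not the library's actual API.

```python
# Hypothetical sketch: build a prompt that asks the LLM to answer with
# numbered [n] citations pointing back to the fetched sources.
def assemble_prompt(question: str, sources: list[dict]) -> str:
    context_lines = []
    for i, src in enumerate(sources, start=1):
        # Each source is numbered so the model can cite it inline as [i].
        context_lines.append(f"[{i}] {src['title']} ({src['url']})\n{src['text']}")
    context = "\n\n".join(context_lines)
    return (
        "Answer the question using only the sources below. "
        "Cite sources inline as [1], [2], etc.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

sources = [
    {"title": "Example A", "url": "https://a.example", "text": "Fact one."},
    {"title": "Example B", "url": "https://b.example", "text": "Fact two."},
]
prompt = assemble_prompt("What is fact one?", sources)
```

Numbering sources in the prompt itself lets the model emit citations the pipeline can later map back to URLs.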

Features

  • Dual Search Engines: Search Bing and DuckDuckGo in parallel for a comprehensive mix of factual and news queries. The pipeline switches DuckDuckGo into news mode when news-related keywords are detected, surfacing recent articles for those queries.
  • Semantic Pre-Filtering: Automatically assesses search results for relevance prior to fetching the page. This pre-filtering technique significantly reduces unnecessary fetches, optimizing response times and resource usage.
  • Context-Aware Content Detection: Implements a structured, two-stage evaluation to identify relevant content such as price lists and tables, ensuring only pertinent information is included.
  • Freshness Tracking: For news queries, content older than specific thresholds is automatically flagged, ensuring that provided information is current.
  • Multilingual Intelligence: The library can detect the language of the input query and switch models accordingly, supporting multilingual capabilities without requiring manual switches.
  • Quality Control: The pipeline checks citation accuracy and strips irrelevant boilerplate and navigation elements from fetched content.
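The semantic pre-filtering feature can be sketched as a similarity cutoff applied to result snippets before any page is fetched. A toy bag-of-words vector stands in here for a real embedding model, and the threshold value is illustrative, not the library's default.

```python
# Minimal sketch of semantic pre-filtering: drop search results whose
# cosine similarity to the query falls below a threshold, so only
# promising pages are ever fetched.
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def prefilter(query: str, snippets: list[str], threshold: float = 0.2) -> list[str]:
    q = Counter(query.lower().split())
    return [s for s in snippets if cosine(q, Counter(s.lower().split())) >= threshold]

kept = prefilter(
    "gpu price list",
    ["Current GPU price list for 2024", "How to bake sourdough bread"],
)
# Only the GPU snippet survives; the unrelated page is never fetched.
```

In a real deployment the bag-of-words vectors would be replaced by dense embeddings, but the filtering logic is the same: score before fetch, fetch only above threshold.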

How It Works

  1. Query → Detects the input language.
  2. Search → Runs searches on Bing and DuckDuckGo.
  3. Semantic Pre-Filter → Filters results based on cosine similarity thresholds.
  4. Fetch → Retrieves content using parallel workers to optimize time.
  5. Content Quality Check → Validates the relevance of fetched content.
  6. Chunking → Organizes the content based on semantic topics.
  7. Hybrid Reranking → Combines different ranking techniques to ensure the best results.
  8. Freshness Penalties → Adjusts scoring based on the age of the content for news queries.
  9. Context Building → Structures the final input for the LLM with clean sources and warnings.
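Steps 7 and 8 above can be sketched together: a hybrid score mixes a keyword score with a semantic score, and news queries apply a penalty to stale documents. The weights, age threshold, penalty factor, and field names are assumptions for illustration, not the library's actual parameters.

```python
# Illustrative sketch of hybrid reranking with a freshness penalty.
def hybrid_score(doc: dict, is_news: bool,
                 w_keyword: float = 0.4, w_semantic: float = 0.6,
                 max_age_days: int = 7, penalty: float = 0.5) -> float:
    # Weighted mix of the two ranking signals (step 7).
    score = w_keyword * doc["keyword_score"] + w_semantic * doc["semantic_score"]
    # Down-weight stale articles for news queries (step 8).
    if is_news and doc["age_days"] > max_age_days:
        score *= penalty
    return score

docs = [
    {"url": "https://old.example", "keyword_score": 0.9,
     "semantic_score": 0.8, "age_days": 30},
    {"url": "https://new.example", "keyword_score": 0.7,
     "semantic_score": 0.8, "age_days": 1},
]
ranked = sorted(docs, key=lambda d: hybrid_score(d, is_news=True), reverse=True)
# For a news query the fresher page outranks the older, higher-keyword one.
```

Note how the penalty can invert the order: the older page scores higher on raw relevance, but the freshness multiplier pushes the recent article to the top for news queries.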

Quick Start Example

```python
from production_rag_pipeline import build_llm_prompt

prompt = build_llm_prompt("latest AI news", lang="en")
print(prompt)
```

Comprehensive Package Structure

  • The package is organized into modules for search, fetching, extraction, reranking, and prompt assembly, so each pipeline stage can be configured or swapped independently.

Community and Support

For contribution guidelines, refer to the CONTRIBUTING.md. Community and security expectations can be found in CODE_OF_CONDUCT.md and SECURITY.md. Questions and feedback can be directed to Artem KK, the project maintainer.
