Kaelum takes a novel approach to AI inference, combining a reward model with an online policy router. A mixture-of-experts architecture directs queries to specialized domain workers, and the system learns in real time from each query, so no predefined labeled dataset is needed.
Kaelum aims to improve the reasoning capabilities of language models through inference-time compute scaling: rather than relying on an instantaneous output, it explores multiple reasoning paths before delivering a solution, significantly improving accuracy on complex problems.
Key Features
- Reward Modeling and Dynamic Routing: The project explores whether the routing and reward-modeling layers can adapt and learn solely from live queries, removing the need for a pre-collected labeled dataset. The underlying language model, Qwen, alongside the sentence encoder all-MiniLM-L6-v2, forms a robust foundation, with an added layer of Policy Reward Models (PRMs) that function without an offline training phase.
- Mixture-of-Experts (MoE) Style Architecture: Kaelum routes queries to specialized workers in six domains: math, code, logic, factual, creative, and analysis. This hierarchical approach sharpens responses and optimizes the reasoning process, while a semantic cache lets similar queries reuse previous computations.
- Human Feedback Integration: Users can influence routing decisions by rating worker performance and answer quality. This feedback feeds the learning loop, continuously improving routing accuracy and overall response quality.
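The semantic cache mentioned above can be pictured as a nearest-neighbor lookup over query embeddings: if a past query is similar enough, its answer is reused. A minimal sketch in plain Python, where the class name, threshold, and bag-of-characters embedding are illustrative stand-ins (a real deployment would embed with a sentence encoder such as all-MiniLM-L6-v2):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    """Toy semantic cache: reuse an answer when a past query is close enough."""
    def __init__(self, embed, threshold=0.95):
        self.embed = embed          # callable: str -> list[float]
        self.threshold = threshold  # minimum similarity to count as a hit
        self.entries = []           # list of (embedding, answer) pairs

    def lookup(self, query):
        q = self.embed(query)
        best, best_sim = None, 0.0
        for emb, answer in self.entries:
            sim = cosine(q, emb)
            if sim > best_sim:
                best, best_sim = answer, sim
        return best if best_sim >= self.threshold else None

    def store(self, query, answer):
        self.entries.append((self.embed(query), answer))

# Stand-in embedding: bag-of-characters counts (illustrative only).
def toy_embed(text):
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

cache = SemanticCache(toy_embed, threshold=0.95)
cache.store("what is the integral of x^2", "x^3/3 + C")
print(cache.lookup("What is the integral of x^2?"))  # near-identical query hits
print(cache.lookup("write a binary search"))         # unrelated query misses
```

The threshold trades freshness for savings: a higher value only reuses answers for near-duplicate queries, while a lower one risks returning a cached answer to a genuinely different question.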
How It Works
The system operates through a structured process of forward and backward passes:
- Embedding Layer: The input query is transformed into a vector representation, enriched with additional features.
- Cache Lookup: A semantic cache is consulted to check whether a similar query has already been processed, avoiding redundant computation.
- Routing: A two-layer neural network assigns the query to an appropriate worker based on the input vector.
- LATS Search: A tree is constructed through a series of simulations using LATS (Language Agent Tree Search), where each node represents a reasoning step for the query. The model evaluates each step and selects the optimal reasoning path based on a reward signal.
- Policy Reward Model (PRM): Assesses the effectiveness of each reasoning step, ensuring optimal choices are made at every level of the tree.
- Feedback Loop: After the query, feedback is used to adjust routing choices and PRM outputs based on performance, feeding into the learning process for future interactions.
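The embedding and routing steps above can be sketched as a small two-layer network that scores each worker domain and picks the highest-probability one. Everything here is an illustrative assumption rather than Kaelum's actual code: the hashed bag-of-words embedding stands in for a sentence encoder, and the layer sizes and weight initialization are arbitrary:

```python
import math
import random

WORKERS = ["math", "code", "logic", "factual", "creative", "analysis"]

def embed(query, dim=16):
    """Stand-in embedding: hashed bag-of-words (a real system would use
    a sentence encoder such as all-MiniLM-L6-v2)."""
    vec = [0.0] * dim
    for tok in query.lower().split():
        vec[hash(tok) % dim] += 1.0
    return vec

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

class Router:
    """Toy two-layer policy network: embedding -> hidden ReLU -> worker logits."""
    def __init__(self, dim=16, hidden=8, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(hidden)]
        self.w2 = [[rng.gauss(0, 0.1) for _ in range(hidden)]
                   for _ in range(len(WORKERS))]

    def route(self, query):
        x = embed(query)
        # Hidden layer with ReLU, then a linear layer producing one logit
        # per worker domain.
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in self.w1]
        logits = [sum(w * hi for w, hi in zip(row, h)) for row in self.w2]
        probs = softmax(logits)
        best = max(range(len(WORKERS)), key=lambda i: probs[i])
        return WORKERS[best], probs

router = Router()
worker, probs = router.route("What is the integral of x^2?")
print(worker, round(max(probs), 3))
```

With untrained random weights the choice is arbitrary; the point of the sketch is the shape of the forward pass, with learning happening in the backward pass described later.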
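The LATS search and PRM scoring steps can be pictured as a reward-guided tree search: partial reasoning paths are kept in a priority queue, and the highest-scoring path is repeatedly expanded. The scoring function, expansion callback, and node encoding below are illustrative assumptions, not Kaelum's implementation:

```python
import heapq

def best_first_search(root_steps, expand, score, max_nodes=20):
    """Toy PRM-guided tree search: keep a priority queue of partial
    reasoning paths and repeatedly expand the highest-scoring one."""
    counter = 0  # tie-breaker so the heap never compares paths directly
    frontier = []
    for step in root_steps:
        path = [step]
        heapq.heappush(frontier, (-score(path), counter, path))
        counter += 1
    best_path, best_value = None, float("-inf")
    expanded = 0
    while frontier and expanded < max_nodes:
        neg, _, path = heapq.heappop(frontier)
        value = -neg
        if value > best_value:
            best_path, best_value = path, value
        for step in expand(path):
            new_path = path + [step]
            heapq.heappush(frontier, (-score(new_path), counter, new_path))
            counter += 1
        expanded += 1
    return best_path, best_value

# Stand-in PRM: rewards paths whose steps are marked as verified.
def toy_prm(path):
    return sum(1.0 if step.endswith("ok") else 0.2 for step in path)

# Stand-in expansion: each path may grow by one strong or one weak step,
# up to depth 3.
def toy_expand(path):
    if len(path) >= 3:
        return []
    return [f"step{len(path) + 1}-ok", f"step{len(path) + 1}-weak"]

path, value = best_first_search(["step1-ok", "step1-weak"], toy_expand, toy_prm)
print(path, value)  # the all-verified path wins with value 3.0
```

The `max_nodes` budget is where inference-time compute scaling shows up: a larger budget explores more reasoning paths before committing to an answer.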
Example Usage
To run Kaelum from the command line, queries can be issued directly, adjusting parameters as necessary:
# Basic query
python kaelum.py "What is the integral of x^2?"
# Stream output token by token
python kaelum.py "Write a binary search in Python" --stream
# Output raw JSON
python kaelum.py "What is entropy?" --json
Backward Pass Updates
After each query, several updates occur to refine the model:
- Router and policy weights are adjusted based on recent outcomes to improve future routing decisions.
- The PRM is fine-tuned using outcomes from the most successful reasoning paths.
- Human feedback is folded into the reward model, strengthening the learning signal and improving performance iteratively.
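The backward-pass updates above can be sketched as a simple online rule: after each query, the chosen worker's value estimate is nudged toward the observed reward, with human ratings blended into the automatic outcome signal. The moving-average scheme and blending weight are illustrative assumptions:

```python
WORKERS = ["math", "code", "logic", "factual", "creative", "analysis"]

class OnlineRouterUpdate:
    """Toy backward pass: keep a running value estimate per worker and
    nudge it toward each new reward signal."""
    def __init__(self, lr=0.1, feedback_weight=0.5):
        self.values = {w: 0.0 for w in WORKERS}  # value estimate per worker
        self.lr = lr                              # step size of each update
        self.feedback_weight = feedback_weight    # how much human ratings count

    def update(self, worker, outcome_reward, human_rating=None):
        reward = outcome_reward
        if human_rating is not None:
            # Blend the automatic outcome signal with the human rating.
            reward = ((1 - self.feedback_weight) * outcome_reward
                      + self.feedback_weight * human_rating)
        # Exponential-moving-average step toward the blended reward.
        self.values[worker] += self.lr * (reward - self.values[worker])
        return self.values[worker]

updater = OnlineRouterUpdate()
updater.update("math", outcome_reward=1.0)                    # good outcome
updater.update("math", outcome_reward=0.8, human_rating=1.0)  # human confirms
print(round(updater.values["math"], 4))  # 0.18
```

Because the update runs after every query, no offline training phase is needed; the step size `lr` controls how quickly the router forgets old evidence in favor of recent outcomes.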
Project Structure
Kaelum's repository is organized around its core components:
Kaelum/
├── core/
│   ├── learning/
│   ├── search/
│   ├── verification/
│   └── workers/
└── kaelum.py        # CLI entry-point and library API
Kaelum represents a step forward in AI reasoning, blending real-time learning and specialized domain processing with human interaction.