aludel - Evaluate LLMs seamlessly with real-time metrics and comparisons.

Pitch

Aludel is an evaluation workbench for Phoenix apps that compares language models from providers such as OpenAI, Anthropic, and Ollama in real time. It tracks output quality, latency, token usage, and cost, and manages prompts under version control. It also visualizes performance trends and applies the same evaluation across providers.

Description

Aludel: LLM Evaluation Workbench for Phoenix Apps

Aludel is an LLM evaluation workbench built for Phoenix applications. It runs prompts against models from providers such as OpenAI, Anthropic, and Ollama and lets users compare their performance in real time.

Key Features

Multi-provider Comparison: Run the same prompt across different providers side by side, and track latency, token usage, and cost per run.
Prompt Management: Version-controlled templates with {{variable}} interpolation. Each edit creates a new, immutable version of the prompt, so its history stays clear and reproducible. Tags and descriptions help organize prompts.
Evolution Tracking: Visualize how prompt versions perform over time. Track pass rate, cost, and latency trends across versions and providers.
Evaluation Suites: Use a visual test case editor to create automated assertions of type contains, regex, exact_match, and json_field. Suites track pass rates and detect regressions over time.
Dashboard: View live metrics as runs execute, including cost, latency, and performance trends for each provider.
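
To make the four assertion types listed under Evaluation Suites concrete, here is a minimal Python sketch of how such checks could be evaluated against a model output and rolled up into a pass rate. It is an illustration only, not Aludel's implementation: the function names check and pass_rate, the dictionary shape of an assertion, and the reading of json_field as "a top-level JSON field equals a value" are all assumptions.

```python
import json
import re


def check(assertion: dict, output: str) -> bool:
    """Evaluate one assertion against a model output (hypothetical shape)."""
    kind = assertion["type"]
    expected = assertion["value"]
    if kind == "contains":
        return expected in output
    if kind == "regex":
        return re.search(expected, output) is not None
    if kind == "exact_match":
        return output == expected
    if kind == "json_field":
        # Assumption: the output must be a JSON object whose named
        # top-level field equals the expected value.
        try:
            data = json.loads(output)
        except json.JSONDecodeError:
            return False
        return isinstance(data, dict) and data.get(assertion["field"]) == expected
    raise ValueError(f"unknown assertion type: {kind}")


def pass_rate(assertions: list[dict], output: str) -> float:
    """Fraction of assertions that pass; 0.0 for an empty suite."""
    if not assertions:
        return 0.0
    results = [check(a, output) for a in assertions]
    return sum(results) / len(results)


output = '{"sentiment": "positive", "score": 0.9}'
suite = [
    {"type": "contains", "value": "positive"},
    {"type": "regex", "value": r'"score":\s*0\.\d+'},
    {"type": "exact_match", "value": "positive"},
    {"type": "json_field", "field": "sentiment", "value": "positive"},
]
print(pass_rate(suite, output))  # 0.75 (exact_match fails, the rest pass)
```

Comparing this pass rate across prompt versions is one way a regression could show up: a new version that drops the rate on the same suite is a candidate regression.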

Usage

A typical workflow:

Create a prompt using the {{variable}} syntax:

Explain {{topic}} in exactly 3 sentences.

Run tests across providers: Start a new run, fill in the required variables, select the providers, and monitor the results.
Build evaluation suites: Create suites with test cases and assertions to run regression tests.
Track prompt evolution: Use the Evolution tab to compare versions over time on pass rate, cost, and latency.
Add new providers: Open the Providers section to configure parameters and API keys.
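
The {{variable}} syntax from the first step can be pictured as plain placeholder substitution. The Python sketch below renders the example template above; it is a conceptual illustration rather than Aludel's code, and the render function, its tolerance of spaces inside the braces, and its behavior on missing variables are assumptions.

```python
import re

# Matches {{name}}, allowing optional spaces inside the braces (an assumption).
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, variables: dict[str, str]) -> str:
    """Replace each {{name}} placeholder with its value; fail on missing names."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"missing variable: {name}")
        return variables[name]

    return _PLACEHOLDER.sub(substitute, template)


prompt = render("Explain {{topic}} in exactly 3 sentences.", {"topic": "recursion"})
print(prompt)  # Explain recursion in exactly 3 sentences.
```

Raising on a missing variable, instead of leaving the placeholder in place, is a deliberate choice in this sketch: an unfilled {{topic}} sent to every provider would skew a side-by-side comparison without any visible error.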

Installation Overview

Aludel can be embedded in any Phoenix LiveView application as a self-contained dashboard. Setup involves:

Adding the necessary dependencies in mix.exs
Configuring the repository in config/config.exs
Running migrations and setting environment variables for provider API keys

For standalone usage, Aludel can be run from the standalone/ directory, without embedding it in a Phoenix app.

Community and Support

Use Discussions on GitHub for questions and ideas, and report bugs in the Issues section.

Contribution Guidelines

Contributions are welcome. See CONTRIBUTING.md for instructions on how to participate in the Aludel project.
