Effortlessly compare and benchmark models in your terminal.
Pitch

Prompter evaluates and compares multiple AI models side by side from your terminal. It has zero dependencies and a simple setup, runs the same prompt across several models in real time, and reports the results in a structured format.

Description

Prompter is a terminal-based tool for multi-model comparison, benchmarking, and evaluation, built specifically for Ollama. It ships as a single file and relies exclusively on the Python standard library, so users can assess and compare models without installing additional dependencies.

Key Features

  • Multi-Model Evaluation: Execute the same prompt on multiple models simultaneously, enabling a side-by-side comparison that highlights differences in reasoning, accuracy, and tone.
  • Structured Evaluation Modes: Prompter supports diverse evaluation formats including:
    • Default Mode: Compare multiple models effortlessly and watch their responses stream live.
    • Ralph Mode: Initiate a self-review loop where a model critiques and refines its own responses over several rounds.
    • Council Mode: Engage three distinct personas (Domain Expert, Skeptic, and Devil's Advocate) in a debate to derive insights on complex issues.
    • Tribunal Mode: Challenge claims through adversarial interrogation, allowing one model to defend while another critiques.
    • Benchmark Mode: Run a series of 20 standardized tests ranging from arithmetic to web searches to evaluate model capabilities thoroughly.
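The multi-model idea above can be illustrated with a minimal sketch that uses only the Python standard library. It is not Prompter's actual code: `compare` and `ollama_generate` are hypothetical helper names, the request goes to Ollama's default local `/api/generate` endpoint, and it asks for a non-streaming reply to keep the example short (the tool itself streams responses live).

```python
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def ollama_generate(model, prompt, url=OLLAMA_URL):
    """Send one non-streaming request to Ollama and return the parsed JSON reply."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


def compare(models, prompt, generate=ollama_generate):
    """Run the same prompt on every model in parallel threads (hypothetical helper)."""

    def run_one(model):
        started = time.perf_counter()
        reply = generate(model, prompt)
        return {
            "model": model,
            "text": reply.get("response", ""),
            "tokens": reply.get("eval_count"),  # output token count, if reported
            "seconds": time.perf_counter() - started,
        }

    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        # pool.map returns results in the same order as `models`
        return list(pool.map(run_one, models))
```

Calling `compare(["model-a", "model-b"], "Explain recursion")` with a running Ollama server would return one result dictionary per model, in the order given. The `generate` parameter is injectable so the fan-out logic can be exercised without a server.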

Output Organization

Evaluation results are saved as collapsible markdown files in the responses/ directory. Each file records timing, token counts, tool call traces, and response text, so runs are easy to revisit and analyze.
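As a rough illustration of what a collapsible markdown report can look like, the sketch below wraps each model's response in an HTML `<details>` element, which most markdown renderers display as a fold-out section. The function names, the result dictionary keys, and the `report.md` file name are assumptions made for this example, not Prompter's actual file layout.

```python
from pathlib import Path


def render_result(result):
    """Render one model's result as a collapsible Markdown section."""
    summary = f"{result['model']} ({result['seconds']:.2f}s, {result['tokens']} tokens)"
    return (
        "<details>\n"
        f"<summary>{summary}</summary>\n\n"
        f"{result['text']}\n\n"
        "</details>\n"
    )


def build_report(prompt, results):
    """Combine the prompt and every rendered result into one Markdown document."""
    sections = "\n".join(render_result(r) for r in results)
    return f"# Prompt\n\n{prompt}\n\n{sections}"


def save_report(prompt, results, directory="responses", name="report.md"):
    """Write the report to <directory>/<name> (hypothetical file name) and return the path."""
    folder = Path(directory)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / name
    path.write_text(build_report(prompt, results), encoding="utf-8")
    return path
```

Keeping the statistics in the `<summary>` line means timing and token counts stay visible even while the full response text is folded away.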

Tools and Configuration

Prompter can optionally give models access to external tools such as web search and shell command execution. Configuration is handled through environment variables, so the setup can be adapted without editing the tool itself.
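Environment-variable configuration of this kind usually comes down to reading each setting with a fallback default. The sketch below shows the pattern; every variable name in it (`PROMPTER_OLLAMA_HOST`, `PROMPTER_OUTPUT_DIR`, `PROMPTER_ENABLE_WEB_SEARCH`, `PROMPTER_ENABLE_SHELL`) is a hypothetical placeholder, since the real names are not listed here.

```python
import os


def load_config(env=None):
    """Build settings from environment variables, falling back to defaults."""
    env = os.environ if env is None else env
    return {
        # All variable names below are hypothetical placeholders.
        "host": env.get("PROMPTER_OLLAMA_HOST", "http://localhost:11434"),
        "output_dir": env.get("PROMPTER_OUTPUT_DIR", "responses"),
        # Optional tools are off unless explicitly set to "1".
        "web_search": env.get("PROMPTER_ENABLE_WEB_SEARCH", "0") == "1",
        "shell": env.get("PROMPTER_ENABLE_SHELL", "0") == "1",
    }
```

In this sketch the optional tools default to disabled, which seems a sensible choice for something like shell command execution; whether Prompter itself uses the same defaults is not stated.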

Ideal User Base

Prompter is tailored for developers running local LLMs, teams assessing open-weight models pre-deployment, researchers comparing model behaviors, and anyone utilizing Ollama who needs a more organized and efficient way to manage multiple prompts within their terminal environment.

Examples

Prompter showcases clear and well-structured outputs, including:

  • Tribunal Mode: Fact-checking scenarios in which one model defends a claim while another critiques it, identifying and correcting misinformation.
  • Benchmark Mode: Detailed results from 20 capability tests, presenting clear pass/fail outcomes alongside metrics like token usage and response times, promoting informed model selection.

Prompter gives local LLM evaluation a single terminal interface, covering side-by-side model comparison as well as automated, structured assessments.
