PromptProof GitHub Action
The PromptProof GitHub Action runs deterministic tests of language model outputs within CI/CD pipelines. It evaluates recorded outputs against defined contracts and fails pull requests when violations are detected, providing quality assurance without any live model calls.
Key Features
- Zero Network Calls: Tests run exclusively on recorded fixtures, making CI runs fast and reproducible.
- Comprehensive Reporting: Generates detailed reports in HTML, JUnit, and JSON formats for easy integration and analysis.
- Automated PR Comments: Posts summary comments on pull requests, highlighting any violations found during evaluation.
- Budget Tracking: Monitors cost and latency metrics against configured limits.
- Flexible Validation Checks: Supports JSON schema validation, regex patterns, numeric bounds, string comparisons, list/set equality, file diffs, and custom functions (see the sketch after this list).
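Each check pairs a type with a target field in the fixture record. The regex_forbidden type appears in the configuration example below; as an illustration of a numeric-bounds check, the type name numeric_bounds and its min/max keys here are assumptions and may differ from the action's actual identifiers:

checks:
  - id: price_in_range
    type: numeric_bounds   # assumed type name; consult the action's docs for the exact identifier
    target: price
    min: 0                 # assumed bound keys, shown for illustration only
    max: 100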
Example Usage
Include the action in your workflows to evaluate LLM outputs:
name: PromptProof
on: [pull_request]
jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: geminimir/promptproof-action@v0
        with:
          config: promptproof.yaml
          baseline-ref: origin/main
          runs: 3
          seed: 1337
          max-run-cost: 2.50
          report-artifact: promptproof-report
          mode: gate
Configuration Example
Create a promptproof.yaml file to define your checks:
schema_version: pp.v1
fixtures: fixtures/outputs.jsonl
checks:
  - id: no_pii
    type: regex_forbidden
    target: text
    patterns:
      - "[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,}"
budgets:
  cost_usd_per_run_max: 0.50
  latency_ms_p95_max: 2000
mode: fail
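Fixtures are recorded outputs stored as JSONL, one record per line. The exact record schema is not shown in this document; a plausible record, assuming the completion text lives under the text field referenced by target, and that recorded cost and latency feed the budget checks, might look like this (field names other than text are assumptions):

{"text": "Thanks for reaching out! A teammate will reply within one business day.", "cost_usd": 0.012, "latency_ms": 850}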
Outputs
The action provides key outputs to analyze testing results, including:
- violations: Number of violations detected.
- passed: Number of tests that passed successfully.
- failed: Number of tests that failed.
- total-cost: Overall cost in USD for the evaluation.
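To act on these outputs in a downstream step, give the action step an id and read values through the steps context. A minimal sketch, where the id: eval label and the echo step are illustrative additions rather than part of the action:

      - id: eval
        uses: geminimir/promptproof-action@v0
        with:
          config: promptproof.yaml
      # Illustrative follow-up step: print the outputs exposed by the action
      - name: Summarize results
        run: echo "Violations: ${{ steps.eval.outputs.violations }}, passed: ${{ steps.eval.outputs.passed }}, total cost: $${{ steps.eval.outputs.total-cost }}"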
Report Artifacts
Generated reports include:
- HTML Reports: For human-friendly display and review.
- JSON Reports: For automated processing and access.
- JUnit XML: For rendering test results in CI tools that consume JUnit output.
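Assuming the action uploads the report under the name given by the report-artifact input (promptproof-report in the workflow above), a later job could fetch it with the standard download-artifact action:

      - uses: actions/download-artifact@v4
        with:
          name: promptproof-report   # matches the report-artifact input above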
This action streamlines the integration of LLM output checks into existing development workflows, improving the reliability of CI/CD processes while minimizing external dependencies.