PromptProof GitHub Action
The PromptProof GitHub Action runs deterministic tests of language model outputs within CI/CD pipelines. It evaluates recorded outputs against defined contracts and fails pull requests when violations are detected, providing quality assurance without any live model calls.
Key Features
- Zero Network Calls: Tests run exclusively on recorded fixtures, making CI runs fast and reproducible.
- Comprehensive Reporting: Generates detailed reports in HTML, JUnit, and JSON formats for easy integration and analysis.
- Automated PR Comments: Posts summary comments on pull requests, highlighting any violations found during evaluation.
- Budget Tracking: Monitors cost and latency metrics against configured limits.
- Flexible Validation Checks: Supports JSON schema validation, regex patterns, numeric bounds, string comparisons, list/set equality, file diffs, and custom functions (see the sketch after this list).
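Each check pairs a type with a target field in the fixture record. The regex_forbidden type appears in the configuration example below; as an illustration of a numeric-bounds check, the type name numeric_bounds and its min/max keys here are assumptions and may differ from the action's actual identifiers:

checks:
  - id: price_in_range
    type: numeric_bounds   # assumed type name; consult the action's docs for the exact identifier
    target: price
    min: 0                 # assumed bound keys, shown for illustration only
    max: 100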
Example Usage
Include the action in your workflows to evaluate LLM outputs:
name: PromptProof
on: [pull_request]
jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: geminimir/promptproof-action@v0
        with:
          config: promptproof.yaml
          baseline-ref: origin/main
          runs: 3
          seed: 1337
          max-run-cost: 2.50
          report-artifact: promptproof-report
          mode: gate
Configuration Example
Create a promptproof.yaml file to define your checks:
schema_version: pp.v1
fixtures: fixtures/outputs.jsonl
checks:
  - id: no_pii
    type: regex_forbidden
    target: text
    patterns:
      - "[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,}"
budgets:
  cost_usd_per_run_max: 0.50
  latency_ms_p95_max: 2000
mode: fail
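Fixtures are recorded outputs stored as JSONL, one record per line. The exact record schema is not shown in this document; a plausible record, assuming the completion text lives under the text field referenced by target, and that recorded cost and latency feed the budget checks, might look like this (field names other than text are assumptions):

{"text": "Thanks for reaching out! A teammate will reply within one business day.", "cost_usd": 0.012, "latency_ms": 850}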
Outputs
The action provides key outputs to analyze testing results, including:
- violations: Number of violations detected.
- passed: Number of tests that passed successfully.
- failed: Number of tests that failed.
- total-cost: Overall cost in USD for the evaluation.
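To act on these outputs in a downstream step, give the action step an id and read values through the steps context. A minimal sketch, where the id: eval label and the echo step are illustrative additions rather than part of the action:

      - id: eval
        uses: geminimir/promptproof-action@v0
        with:
          config: promptproof.yaml
      # Illustrative follow-up step: print the outputs exposed by the action
      - name: Summarize results
        run: echo "Violations: ${{ steps.eval.outputs.violations }}, passed: ${{ steps.eval.outputs.passed }}, total cost: $${{ steps.eval.outputs.total-cost }}"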
Report Artifacts
Generated reports include:
- HTML Reports: For human-friendly display and review.
- JSON Reports: For automated processing and access.
- JUnit XML: For rendering test results in CI tools that consume JUnit output.
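Assuming the action uploads the report under the name given by the report-artifact input (promptproof-report in the workflow above), a later job could fetch it with the standard download-artifact action:

      - uses: actions/download-artifact@v4
        with:
          name: promptproof-report   # matches the report-artifact input above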
This action streamlines the integration of LLM output checks into existing development workflows, improving the reliability of CI/CD processes while minimizing external dependencies.