Legal Action Boundary Eval (LABE)
The Legal Action Boundary Eval (LABE) is a public proxy evaluation for legal AI workflows that operate at critical action boundaries. It covers legal processes such as negotiation moves, compliance clearance, review routing, and supervised and sub-agent flows.
Overview of LABE
LABE is built on workflow categories that Luminance markets publicly, spanning negotiation, compliance, collaboration, and supervisor-led legal agent interactions. Importantly, LABE is an independent evaluation; it is not an internal Luminance benchmark.
Purpose
Traditional evaluations in legal AI often focus on fundamental aspects such as:
- Clause extraction
- Answer quality
- Markup quality
- Summarization quality
LABE instead targets the high-impact decisions a legal system makes at the action boundary, such as:
- Accepting or modifying a clause
- Routing documents for signature
- Resolving issues within agreements
- Ensuring compliance clearance
- Managing escalations or rerouting agreements
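The distinction between analysis and guarded action can be sketched as a small gate function. This is a minimal illustration of the action-boundary idea, not code from the suite or from any real SDK; all names here are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical action names; in a real system these would come from the
# agent's tool registry. High-impact actions require justification to run.
HIGH_IMPACT = {"accept_clause", "route_for_signature", "clear_compliance"}

@dataclass
class ProposedAction:
    name: str
    justification: str = ""
    evidence_ids: list = field(default_factory=list)

def gate(action: ProposedAction) -> str:
    """Return 'execute' or 'block' for a proposed agent action."""
    if action.name not in HIGH_IMPACT:
        return "execute"  # low-impact analysis steps pass through
    # A guarded action must cite both a rationale and supporting evidence.
    if action.justification and action.evidence_ids:
        return "execute"
    return "block"

print(gate(ProposedAction("route_for_signature")))  # blocked: no justification
print(gate(ProposedAction("route_for_signature", "approved redraft", ["c-12"])))
```

An "unjustified high-impact action" in the metrics below corresponds to a case where a system executes an action that a gate like this should have blocked; a "false block" is the reverse.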
Key Findings
The current suite comprises 12 scenarios, each run in both TypeScript and Python. Headline results:
- Baseline systems recorded 18 unjustified high-impact actions
- VerifiedX demonstrated 0 unjustified actions
- No false blocks occurred with VerifiedX
- Goal completion rates improved from 41.7% to 100%
Full results are in RESULTS.md, with supporting artifacts in the artifacts directory.
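The headline rates follow directly from the scenario counts. Note the baseline count of completed scenarios is inferred from the rounded figure (41.7% of 12 is 5), not stated explicitly in the source:

```python
# Sanity-check the headline goal-completion rates from scenario counts.
scenarios = 12
baseline_completed = 5    # inferred: 5 / 12 ≈ 41.7%
verifiedx_completed = 12  # all scenarios completed

baseline_rate = round(100 * baseline_completed / scenarios, 1)
verifiedx_rate = round(100 * verifiedx_completed / scenarios, 1)

print(baseline_rate)   # 41.7
print(verifiedx_rate)  # 100.0
```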
Description
LABE serves as:
- A public proxy evaluation focused on specific workflow categories made publicly available by Luminance
- A side-by-side evaluation of baseline versus the VerifiedX system
- An evaluation focused on action-boundary behavior rather than general model quality
- An evaluation suite that maintains consistency across TypeScript and Python scenarios
Exclusions
LABE is not intended for:
- Replicating Luminance's internal products or their associated traffic
- Making claims about overall legal reasoning quality
- Benchmarking tools such as Word UI, OCR processes, diligence reviews, or repository searches
- Serving as a substitute for evaluations specific to customer needs
Evaluation Tracks
- Negotiation: Handling counterparty positions, applying approved redrafts, facilitating signature routing, and resolving clause disputes.
- Compliance: Ensuring agreements meet compliance standards, applying remedial markups, escalating failures in checks, and preventing false clearance.
- Composed Workflows: Managing the hand-off from an intake agent to execution agents, through legal/compliance review, with potential re-dispatch or lane changes.
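A scenario in any of these tracks can be described by a small record: its track, the guarded action under test, and the behavior the harness expects. The field names and example values below are illustrative assumptions, not the suite's actual schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Scenario:
    scenario_id: str      # hypothetical identifier
    track: str            # "negotiation", "compliance", or "composed"
    guarded_action: str   # the high-impact action under test
    expected: str         # "execute" when justified, "block" otherwise

# Illustrative negotiation scenario: routing for signature must be
# blocked while a counterparty redline remains unresolved.
example = Scenario(
    scenario_id="neg-03",
    track="negotiation",
    guarded_action="route_for_signature",
    expected="block",
)
print(example.track, example.expected)
```

Keeping scenario definitions declarative like this is what lets the suite run the same 12 cases consistently across the TypeScript and Python harnesses.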
Repository Structure
The repository contains the following documents supporting the evaluation:
- EVAL_CARD.md: A concise overview detailing scope, intended use, metrics, and limitations.
- METHODOLOGY.md: Outlines the foundational sources, design of the harness, and scoring policies alongside limitations.
- SCENARIOS.md: A complete catalog of 12 scenarios, including guarded actions and expected behaviors.
- RESULTS.md: A comprehensive scorecard linked with raw artifact files.
- REPRODUCE.md: Instructions to rerun the evaluation suite and recreate the public report artifacts.
- EXECUTIVE_BRIEF.md: A summary tailored for legal AI operators, product managers, and governance stakeholders.
Current Run Details
- Run Date: 2026-04-19
- Model: gpt-5.4-mini
- Run Environment: Real production run against the VerifiedX API
- VerifiedX API: https://api.verifiedx.me
- TypeScript SDK: @verifiedx-core/sdk@0.1.17
- Python SDK: verifiedx==0.1.8
Supporting Evidence
Raw data and artifact files are available in the artifacts directory.
Public Workflow References
For additional context on the workflows referenced by this evaluation, see:
- Luminance Homepage
- Luminance Negotiate
- Luminance Compliance
- Luminance Collaborate
- Blog on Legal-Grade AI
- Press Release on Autonomous Negotiation
This overview covers the scope and findings of the Legal Action Boundary Eval (LABE), a public proxy evaluation for legal AI systems operating at high-impact decision boundaries.