DSPydantic uses DSPy to automatically optimize the field descriptions and prompts of Pydantic models. Given just a few examples, it improves the accuracy of structured data extraction from Large Language Models (LLMs) with minimal manual intervention.
Features
- Auto-Optimization: Leverage DSPy's algorithms to automatically generate improved field descriptions from input examples.
- Comprehensive Input Support: Handle multiple types of input including text, images, and PDFs seamlessly.
- Built-in Evaluation Methods: Utilize multiple evaluation techniques including exact matching and Levenshtein distance, or create custom evaluators.
- LLM Judge Integration: Evaluate inputs without needing ground truth data by using an LLM as a judge, offering more flexibility in assessments.
- Prompt Optimization: Enhance both system and instruction prompts for improved interaction with LLMs.
- Nested Model Support: Automatically manage complex nested Pydantic models, optimizing them as well.
- Smart Optimizer Selection: The system automatically selects a DSPy optimizer based on the size of the dataset.
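To make the built-in evaluation methods more concrete, here is a minimal, standalone sketch of what exact matching and a Levenshtein-based score compute when comparing an expected record with an extracted one. It does not call DSPydantic: the function names, the normalization by the longer string, and the per-field averaging are all assumptions made for illustration, and the library's own evaluators may differ.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn `a` into `b`."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def exact_score(expected, predicted) -> float:
    """1.0 if the two values are equal, otherwise 0.0."""
    return 1.0 if expected == predicted else 0.0


def similarity_score(expected, predicted) -> float:
    """Levenshtein distance mapped to [0, 1] (assumed normalization:
    divide by the length of the longer string)."""
    a, b = str(expected), str(predicted)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def score_record(expected: dict, predicted: dict, field_score) -> float:
    """Average a per-field score over the expected fields (hypothetical helper)."""
    if not expected:
        return 1.0
    total = sum(field_score(value, predicted.get(key))
                for key, value in expected.items())
    return total / len(expected)


expected = {"name": "John Doe", "age": 30}
predicted = {"name": "Jon Doe", "age": 30}
print(score_record(expected, predicted, exact_score))
print(score_record(expected, predicted, similarity_score))
```

Exact matching gives no credit for the near-miss on `name`, while the Levenshtein-based score gives partial credit, which can be a more forgiving signal for free-text fields.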
Getting Started
To illustrate how DSPydantic functions, consider the following example:
from pydantic import BaseModel, Field
from dspydantic import PydanticOptimizer, Example, create_optimized_model

# Define a Pydantic model
class User(BaseModel):
    name: str = Field(description="User name")
    age: int = Field(description="User age")
    email: str = Field(description="Email address")

# Example data for optimization
examples = [
    Example(
        text="John Doe, 30 years old, john@example.com",
        expected_output=User(name="John Doe", age=30, email="john@example.com"),
    ),
    Example(
        text="Jane Smith, 25, jane.smith@email.com",
        expected_output=User(name="Jane Smith", age=25, email="jane.smith@email.com"),
    ),
]

# Optimize the model
optimizer = PydanticOptimizer(
    model=User,
    examples=examples,
    evaluate_fn="exact",
    model_id="gpt-4o",
)
result = optimizer.optimize()

# Display optimized descriptions
print("Optimized descriptions:")
for field, description in result.optimized_descriptions.items():
    print(f"  {field}: {description}")

# Create a new optimized model
OptimizedUser = create_optimized_model(User, result.optimized_descriptions)
This code sample demonstrates how to set up a Pydantic model, provide example data, and use DSPydantic to optimize the model automatically.
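Field descriptions matter because they become part of the model's JSON schema, which structured-output tooling typically passes to the LLM. The sketch below uses only Pydantic (version 2 is assumed) to show where a description ends up. The `field_descriptions` helper is hypothetical and not part of DSPydantic; the same inspection should work on any Pydantic model, which is a convenient way to check what descriptions a model carries.

```python
from pydantic import BaseModel, Field


class User(BaseModel):
    name: str = Field(description="User name")
    age: int = Field(description="User age")


def field_descriptions(model: type[BaseModel]) -> dict:
    """Map each field name to its description (hypothetical helper)."""
    return {name: info.description for name, info in model.model_fields.items()}


# The descriptions as stored on the model class
print(field_descriptions(User))

# The same descriptions as they appear in the generated JSON schema
schema = User.model_json_schema()
print(schema["properties"]["age"])
```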
Examples Directory
For additional use cases and complete working examples, visit the examples directory:
- Text Extraction: Extract structured data from veterinary EHR text.
- Image Classification: Classify handwritten digits from designated image files.
- Sentiment Analysis: Classify movie reviews based on sentiment.
- Human-in-the-loop Evaluation: Interactive evaluation using graphical user interfaces.
For detailed API references, including descriptions of the PydanticOptimizer and Example classes, please refer to the full documentation provided in the README.