Effortlessly extract database subsets for local development and debugging.
Pitch

dbslice simplifies debugging by extracting minimal, referentially intact subsets from production databases. Instead of copying an entire database, it retrieves only the records you need and preserves the relationships between them, giving developers the precise data required to reproduce and fix bugs.

Description

dbslice

dbslice helps developers extract minimal, referentially intact database subsets for local development and debugging. It addresses the challenge of reproducing bugs that depend on specific production data, without the cost of copying an entire database.

The Problem

Reproducing a bug often requires the exact records that caused it, and those records are hard to obtain from a large production database. dbslice extracts only the necessary records by following foreign key relationships, which keeps the subset referentially intact.

Quick Start

To get started:

# Extract an order and all related records
dbslice extract postgres://localhost/myapp --seed "orders.id=12345" > subset.sql

# Import into local database
psql -d localdb < subset.sql

Key Features

  • Zero-Configuration Setup: dbslice introspects the database schema automatically, so no data model file is needed.
  • Effortless Data Extraction: A single command extracts a complete data subset.
  • Sensitive Data Handling: Automatically detects and anonymizes sensitive fields such as emails, phone numbers, and social security numbers by default.
  • Multiple Output Formats: Exports data as SQL, JSON, or CSV.
  • Efficient Streaming: Streams large datasets (100K+ rows) to keep memory use low during extraction.
  • Virtual Foreign Keys: Handles Django GenericForeignKeys and implicit relationships through configuration.
  • Configurable Extractions: Optional YAML-based configuration files make extractions repeatable.
  • Data Validation: Validates the referential integrity of the extracted dataset.
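
To make the Data Validation feature concrete, here is a minimal sketch of a referential integrity check over an extracted subset. It is an illustration only, not dbslice's implementation: the function name `find_dangling_references`, the in-memory `subset` layout, and the shape of the `foreign_keys` tuples are all assumptions made for this example.

```python
def find_dangling_references(subset, foreign_keys):
    """Return (child_table, column, value) for each foreign key value
    that has no matching parent row in the subset.

    subset: dict mapping table name -> list of row dicts (hypothetical layout).
    foreign_keys: list of (child_table, child_column, parent_table, parent_column).
    """
    dangling = []
    for child, column, parent, parent_column in foreign_keys:
        # All key values actually present in the parent table of the subset.
        parent_values = {row[parent_column] for row in subset.get(parent, [])}
        for row in subset.get(child, []):
            value = row.get(column)
            # NULL foreign keys reference nothing, so they cannot dangle.
            if value is not None and value not in parent_values:
                dangling.append((child, column, value))
    return dangling


subset = {
    "users": [{"id": 1}],
    "orders": [{"id": 10, "user_id": 1}, {"id": 11, "user_id": 2}],
}
foreign_keys = [("orders", "user_id", "users", "id")]
problems = find_dangling_references(subset, foreign_keys)
print(problems)
```

In this toy subset, order 11 points at a user that was not extracted, so the check reports it. An empty result means every foreign key in the subset resolves.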

Database Support

Database     Status
PostgreSQL   Fully supported
MySQL        Planned (not yet implemented)
SQLite       Planned (not yet implemented)

Example Usages

Basic Extraction:

# Extract by primary key
dbslice extract postgres://user:pass@host:5432/db --seed "orders.id=12345"

# Extract with WHERE clause
dbslice extract postgres://localhost/db --seed "orders:status='failed' AND created_at > '2024-01-01'"
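
The two seed forms above (`table.column=value` and `table:WHERE clause`) can be told apart with very little logic. The sketch below shows one way to do it; the `Seed` class and `parse_seed` function are hypothetical names for this illustration, and dbslice's real parser may behave differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Seed:
    table: str
    column: Optional[str] = None  # set for the table.column=value form
    value: Optional[str] = None   # set for the table.column=value form
    where: Optional[str] = None   # set for the table:WHERE form


def parse_seed(seed: str) -> Seed:
    """Classify a seed string by whichever separator (':' or '.') comes first."""
    colon = seed.find(":")
    dot = seed.find(".")
    if colon != -1 and (dot == -1 or colon < dot):
        # table:WHERE clause, kept verbatim after the first colon.
        return Seed(table=seed[:colon].strip(), where=seed[colon + 1:].strip())
    if dot != -1:
        # table.column=value
        column, sep, value = seed[dot + 1:].partition("=")
        if not sep:
            raise ValueError(f"expected table.column=value, got {seed!r}")
        return Seed(table=seed[:dot].strip(), column=column.strip(), value=value.strip())
    raise ValueError(f"unrecognized seed: {seed!r}")


print(parse_seed("orders.id=12345"))
print(parse_seed("orders:status='failed' AND created_at > '2024-01-01'"))
```

Checking which separator appears first means that a dot or colon inside a value or WHERE clause is not mistaken for the table separator.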

Anonymization Example:

# Auto-anonymize detected sensitive fields
dbslice extract postgres://... --seed "users.id=1" --anonymize
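
As a rough idea of what anonymizing detected fields can involve, the sketch below flags columns by name and replaces their values with deterministic pseudonyms. The name pattern, the `pseudonym` and `anonymize_row` helpers, and the output format are assumptions made for illustration; how dbslice actually detects and rewrites sensitive fields is not described here.

```python
import hashlib
import re

# Assumed heuristic: a column is sensitive if its name mentions one of these terms.
SENSITIVE_NAME = re.compile(r"email|phone|ssn", re.IGNORECASE)


def pseudonym(column, value):
    """Map a value to a stable fake value, so equal inputs stay equal after anonymization."""
    digest = hashlib.sha256(f"{column}:{value}".encode("utf-8")).hexdigest()[:12]
    if "email" in column.lower():
        return f"user_{digest}@example.com"
    return f"anon_{digest}"


def anonymize_row(row):
    """Return a copy of the row with sensitive, non-NULL values replaced."""
    return {
        column: (
            pseudonym(column, value)
            if value is not None and SENSITIVE_NAME.search(column)
            else value
        )
        for column, value in row.items()
    }


row = {"id": 1, "name": "Ada", "email": "ada@corp.test", "phone": "555-0100"}
print(anonymize_row(row))
```

Hashing rather than randomizing keeps the mapping deterministic, so a value that appears in several tables is replaced consistently and joins on it still line up.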

Output Formats:

# SQL (default)
dbslice extract postgres://... --seed "orders.id=1" --output sql

# JSON fixtures
dbslice extract postgres://... --seed "orders.id=1" --output json --out-file fixtures/

How It Works

  1. Introspection: Reads the database schema to discover tables and foreign key relationships.
  2. Traversal: Begins extraction from designated seed records, following foreign key relationships.
  3. Data Extraction: Collects all identified records.
  4. Sorting: Orders the tables correctly for insertion.
  5. Output Generation: Produces the final dataset in the specified formats with appropriate data handling.
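
The traversal and sorting steps above can be sketched on a toy in-memory schema. This is a simplified illustration under stated assumptions, not dbslice's actual algorithm: the `DB` and `FOREIGN_KEYS` structures and both function names are made up for the example, every parent table is assumed to be keyed by `id`, and the traversal policy (children of the seed first, then parents of everything selected) is one reasonable choice among several. It needs Python 3.9+ for `graphlib`.

```python
from collections import deque
from graphlib import TopologicalSorter

# Toy database: table name -> {primary key -> row}.
DB = {
    "users": {1: {"id": 1}, 2: {"id": 2}},
    "orders": {10: {"id": 10, "user_id": 1}, 11: {"id": 11, "user_id": 2}},
    "order_items": {
        100: {"id": 100, "order_id": 10},
        101: {"id": 101, "order_id": 10},
        102: {"id": 102, "order_id": 11},
    },
}

# (child_table, child_column, parent_table)
FOREIGN_KEYS = [("orders", "user_id", "users"), ("order_items", "order_id", "orders")]


def extract_subset(seed_table, seed_pk):
    """Select the seed row, the rows that reference it, and all required parents."""
    selected = {table: set() for table in DB}
    selected[seed_table].add(seed_pk)

    # Phase 1: walk down from the seed to rows that reference it.
    queue = deque([(seed_table, seed_pk)])
    while queue:
        table, pk = queue.popleft()
        for child, column, parent in FOREIGN_KEYS:
            if parent != table:
                continue
            for child_pk, child_row in DB[child].items():
                if child_row[column] == pk and child_pk not in selected[child]:
                    selected[child].add(child_pk)
                    queue.append((child, child_pk))

    # Phase 2: walk up from every selected row so each foreign key resolves.
    queue = deque((t, pk) for t, pks in selected.items() for pk in pks)
    while queue:
        table, pk = queue.popleft()
        row = DB[table][pk]
        for child, column, parent in FOREIGN_KEYS:
            if child != table or row[column] is None:
                continue
            if row[column] not in selected[parent]:
                selected[parent].add(row[column])
                queue.append((parent, row[column]))
    return selected


def insertion_order():
    """Order tables so that each parent table comes before its children."""
    sorter = TopologicalSorter({table: set() for table in DB})
    for child, _column, parent in FOREIGN_KEYS:
        sorter.add(child, parent)  # the child depends on the parent
    return list(sorter.static_order())


print(extract_subset("orders", 10))
print(insertion_order())
```

Seeding on order 10 selects that order, its two items, and user 1, while order 11 and user 2 are left out. The topological sort then places `users` before `orders` before `order_items`, so inserts never reference a row that does not exist yet.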

With dbslice, preparing a data subset for debugging becomes a streamlined process, so developers can focus on resolving issues rather than on managing data extraction.
