DBMerge is a database-agnostic Python library that streamlines UPSERT (Insert/Update/Delete) operations with ease. Built on SQLAlchemy, it simplifies complex SQL commands into an intuitive interface, supporting various data sources and database types. Optimize data synchronization effortlessly.
DBMerge is a powerful Python library designed for seamless execution of UPSERT (Insert/Update/Delete) operations across various database systems through the use of SQLAlchemy. It simplifies the complexities involved in crafting engine-specific SQL MERGE or ON CONFLICT statements, allowing developers to focus on synchronizing data efficiently.
Overview
DBMerge supports a variety of input data sources, including:
- Pandas DataFrames
- Lists of dictionaries
- Other database tables or views
The library automates the data update process by comparing source data against the target table, performing necessary operations with efficiency:
- Insert new records that are not present in the target table.
- Update existing records when their values have changed.
- Delete or mark existing records in the target table that are missing from the source data.
To enhance performance, DBMerge first loads data into a temporary table and executes bulk synchronization queries, ensuring faster operations even with significant datasets.
Supported Databases
DBMerge has been tested and verified with:
- PostgreSQL
- MariaDB / MySQL
- SQLite
- MS SQL Server
Key Features
- Database Agnostic: Write synchronization logic once and utilize it across various SQL databases without code modifications.
- High Performance: Utilizes temporary staging tables for rapid bulk operations, avoiding slow row-by-row updates.
- Smart Deletion: Implement scoped deletions by passing SQLAlchemy logical expressions for specific timeframes or subsets.
- Auto-Schema Management: Automatically creates any missing tables or columns within the database.
- Audit Support: Include optional parameters to automatically log
merged_onandinserted_ontimestamps, tracking changes made to rows.
Quick Start Example
The library employs a context manager to manage database connections effectively. Here is a brief usage example:
from sqlalchemy import create_engine
from datetime import date
from dbmerge import dbmerge
# Initialize DB engine
engine = create_engine("sqlite://")
# Prepare source data
data = [
{'Shop': '123', 'Product': 'A1', 'Date': date(2025, 1, 1), 'Qty': 2, 'Price': 50.10},
{'Shop': '124', 'Product': 'A1', 'Date': date(2025, 1, 1), 'Qty': 1, 'Price': 100.50}
]
# Execute the merge operation
with dbmerge(engine=engine, data=data, table_name="Facts",
key=['Shop', 'Product', 'Date']) as merge:
merge.exec()
Performance Benchmark
DBMerge efficiently handles the entire reconciliation process, demonstrating excellent performance even with large amounts of data. For example, synchronization times on a standard developer laptop are as follows:
| Database | DBMerge (100k rows) | DBMerge (1mil rows) |
|---|---|---|
| PostgreSQL | ~2.0s | ~19.8s |
| MySQL / MariaDB | ~1.0s | ~11.1s |
| SQLite | ~0.7s | ~7.6s |
| MS SQL Server | ~22.4s | ~4m 23s |
Documentation and Resources
For further details, see the full module documentation and advanced examples available in the repository. DBMerge is ideal for developers seeking a streamlined, efficient approach for database synchronization operations.
No comments yet.
Sign in to be the first to comment.