PitchHut logo
Effortless UPSERT operations using SQLAlchemy in Python.
Pitch

DBMerge is a database-agnostic Python library that streamlines UPSERT (Insert/Update/Delete) operations with ease. Built on SQLAlchemy, it simplifies complex SQL commands into an intuitive interface, supporting various data sources and database types. Optimize data synchronization effortlessly.

Description

DBMerge is a powerful Python library designed for seamless execution of UPSERT (Insert/Update/Delete) operations across various database systems through the use of SQLAlchemy. It simplifies the complexities involved in crafting engine-specific SQL MERGE or ON CONFLICT statements, allowing developers to focus on synchronizing data efficiently.

Overview

DBMerge supports a variety of input data sources, including:

  • Pandas DataFrames
  • Lists of dictionaries
  • Other database tables or views

The library automates the data update process by comparing source data against the target table, performing necessary operations with efficiency:

  • Insert new records that are not present in the target table.
  • Update existing records when their values have changed.
  • Delete or mark existing records in the target table that are missing from the source data.

To enhance performance, DBMerge first loads data into a temporary table and executes bulk synchronization queries, ensuring faster operations even with significant datasets.

Supported Databases

DBMerge has been tested and verified with:

  • PostgreSQL
  • MariaDB / MySQL
  • SQLite
  • MS SQL Server

Key Features

  • Database Agnostic: Write synchronization logic once and utilize it across various SQL databases without code modifications.
  • High Performance: Utilizes temporary staging tables for rapid bulk operations, avoiding slow row-by-row updates.
  • Smart Deletion: Implement scoped deletions by passing SQLAlchemy logical expressions for specific timeframes or subsets.
  • Auto-Schema Management: Automatically creates any missing tables or columns within the database.
  • Audit Support: Include optional parameters to automatically log merged_on and inserted_on timestamps, tracking changes made to rows.

Quick Start Example

The library employs a context manager to manage database connections effectively. Here is a brief usage example:

from sqlalchemy import create_engine
from datetime import date
from dbmerge import dbmerge

# Initialize DB engine
engine = create_engine("sqlite://")

# Prepare source data
data = [
    {'Shop': '123', 'Product': 'A1', 'Date': date(2025, 1, 1), 'Qty': 2, 'Price': 50.10},
    {'Shop': '124', 'Product': 'A1', 'Date': date(2025, 1, 1), 'Qty': 1, 'Price': 100.50}
]

# Execute the merge operation
with dbmerge(engine=engine, data=data, table_name="Facts", 
             key=['Shop', 'Product', 'Date']) as merge:
    merge.exec()

Performance Benchmark

DBMerge efficiently handles the entire reconciliation process, demonstrating excellent performance even with large amounts of data. For example, synchronization times on a standard developer laptop are as follows:

DatabaseDBMerge (100k rows)DBMerge (1mil rows)
PostgreSQL~2.0s~19.8s
MySQL / MariaDB~1.0s~11.1s
SQLite~0.7s~7.6s
MS SQL Server~22.4s~4m 23s

Documentation and Resources

For further details, see the full module documentation and advanced examples available in the repository. DBMerge is ideal for developers seeking a streamlined, efficient approach for database synchronization operations.

0 comments

No comments yet.

Sign in to be the first to comment.