Drarpa - High-performance Change Data Capture sync engine in Go.

Pitch

Drarpa is a production-grade CDC sync engine for exactly-once database replication, with saga orchestration, schema evolution, and PII masking built in. It hides much of the complexity of traditional ETL tools and streaming platforms, so teams can build reliable data pipelines without giving up transactional consistency.

Description

Drarpa: High-Performance Change Data Capture Engine

Drarpa is a production-grade, high-performance Change Data Capture (CDC) synchronization engine written in Go. It targets the gap between traditional ETL tools and end-to-end transactional consistency in data replication.

Key Problem Addressed

Traditional ETL tools often break transactional integrity across distributed systems, hide schema changes in ways that can corrupt data, and offer no exactly-once delivery guarantee, which leaves data consistency a recurring problem for development teams. Streaming platforms, in turn, need additional orchestration to manage retries and rollbacks, so developers end up either maintaining brittle, poorly documented glue code or accepting the risk of inconsistent data.

Drarpa addresses these issues comprehensively by offering:

Exactly-once delivery through idempotent UPSERT operations and automatic saga checkpointing.
Built-in saga orchestration that runs compensating rollbacks when a write fails.
Schema-aware pipelines that adapt to schema changes without application downtime.
PII masking that protects sensitive information throughout the synchronization process, supporting compliance and security requirements.
A single static binary with no external coordination services, which simplifies deployment and reduces operational complexity.
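
The combination of idempotent writes and checkpointing can be illustrated with a minimal in-memory sketch. This is not Drarpa's implementation: the `Event` and `Target` types, their fields, and the use of a log sequence number (LSN) as the checkpoint are assumptions made for the example. It shows why redelivered events do not change the target a second time.

```go
package main

import "fmt"

// Event is a hypothetical change event; the field names are illustrative only.
type Event struct {
	LSN   uint64 // monotonically increasing position in the source log
	Key   string // primary key of the affected row
	Value string // new value of the row
}

// Target is an in-memory stand-in for a SQL target table and its checkpoint.
type Target struct {
	Rows       map[string]string
	Checkpoint uint64 // highest LSN that has already been applied
}

// NewTarget returns an empty target with checkpoint 0.
func NewTarget() *Target {
	return &Target{Rows: make(map[string]string)}
}

// Apply upserts the event by key and advances the checkpoint.
// Events at or below the checkpoint are skipped, so redelivery is harmless.
// It reports whether the event was actually applied.
func (t *Target) Apply(e Event) bool {
	if e.LSN <= t.Checkpoint {
		return false
	}
	t.Rows[e.Key] = e.Value // upsert: insert or overwrite by primary key
	t.Checkpoint = e.LSN
	return true
}

func main() {
	tgt := NewTarget()
	// The last two events simulate at-least-once redelivery after a retry.
	events := []Event{
		{LSN: 1, Key: "user:1", Value: "alice"},
		{LSN: 2, Key: "user:1", Value: "alice2"},
		{LSN: 2, Key: "user:1", Value: "alice2"},
		{LSN: 1, Key: "user:1", Value: "alice"},
	}
	applied := 0
	for _, e := range events {
		if tgt.Apply(e) {
			applied++
		}
	}
	fmt.Println(applied, tgt.Rows["user:1"], tgt.Checkpoint) // prints "2 alice2 2"
}
```

Against a real SQL target, the row write and the checkpoint update would presumably need to commit in the same transaction for the guarantee to hold across crashes.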

Ideal User Base

Drarpa is tailored for:

Backend engineers managing consistent cross-database replication in distributed systems.
Data engineers orchestrating real-time data pipelines from transactional systems to data warehouses.
Platform teams handling multi-regional database architectures with strict consistency demands.
Any developer who has previously tackled CDC synchronization and seeks a more streamlined, robust solution.

Primary Use Cases

Drarpa shines in various scenarios, including:

Real-time data warehouse synchronization: Replicating from PostgreSQL to BigQuery with exactly-once guarantees and automatic schema evolution.
Zero-downtime database migrations: Performing shadow writes during transitions, ensuring consistent and atomic cutovers with no data loss.
Multi-region replication: Providing automatic failover and replication lag monitoring.
Event-driven microservices: Using Kafka or NATS as a durable backbone for downstream applications.
Compliance-safe PII pipelines: Masking sensitive columns before they reach their targets.
Cross-database synchronization: Supporting migrations from MySQL to PostgreSQL with built-in conflict resolution.
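
As an illustration of the PII use case, the sketch below masks selected columns of a row before it would be written to a target. The `Row` type, the `MaskColumns` function, and the choice of a salted, truncated SHA-256 digest are assumptions for this example; the source does not say which masking strategies Drarpa supports.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Row is a hypothetical representation of a changed row: column name -> value.
type Row map[string]string

// MaskColumns returns a copy of row in which every listed column that exists
// is replaced by the first 8 bytes (16 hex characters) of a salted SHA-256
// digest. The input row is not modified.
func MaskColumns(row Row, sensitive []string, salt string) Row {
	out := make(Row, len(row))
	for k, v := range row {
		out[k] = v
	}
	for _, col := range sensitive {
		v, ok := out[col]
		if !ok {
			continue // column absent from this row: nothing to mask
		}
		sum := sha256.Sum256([]byte(salt + v))
		out[col] = hex.EncodeToString(sum[:8])
	}
	return out
}

func main() {
	row := Row{"id": "42", "email": "alice@example.com"}
	masked := MaskColumns(row, []string{"email", "ssn"}, "demo-salt")
	// The id is untouched; the email is replaced by a 16-character digest.
	fmt.Println(masked["id"], len(masked["email"]), masked["email"] != row["email"])
}
```

Because the digest is deterministic for a given salt, equal source values map to equal masked values, so joins on the masked column still work downstream. That is a design trade-off of this sketch, not a documented property of Drarpa.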

Robust Feature Set

Drarpa includes a wealth of features, such as:

Exactly-once delivery for SQL targets, ensuring data integrity through idempotent operations.
Causal ordering that preserves the event sequence per table.
Schema evolution capabilities enabling seamless adaptation to changes in database structure.
Saga orchestration that leverages automatic compensating rollback for distributed transactions.
Built-in support for PII masking, allowing organizations to handle sensitive information securely.
Full observability with OpenTelemetry, Prometheus metrics, and structured logging for monitoring performance.
Declarative configuration in HCL, so deployment setups can be managed and version-controlled the same way Terraform configurations are.
Extensibility through a Plugin SDK for custom CDC sources and target integrations to meet specific business needs.

Architecture Overview

The architecture of Drarpa is structured across multiple layers:

CDC Source Layer: Supporting various sources like PostgreSQL, MySQL, and MongoDB.
Event Bus Layer: Utilizing Kafka, Redpanda, or NATS for high-throughput, durable message streaming.
Processing Pipeline: Incorporating essential components for schema mapping and conflict resolution.
Saga Orchestrator: Handling atomic operations, rollback, backpressure, and dead-letter queues for events.
Target Adapters Layer: Allowing connectivity to databases and storage solutions such as PostgreSQL, BigQuery, and S3.

For a detailed architectural overview, refer to the architecture documentation.
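
To make the layering concrete, here is a minimal sketch of how a source, a processing step, and a target adapter could be wired together. Every name in it (`Change`, `Source`, `Processor`, `Target`, `Run`, and the in-memory implementations) is an assumption for illustration and is not taken from Drarpa's Plugin SDK. A Go channel stands in for the event bus layer, and the saga orchestrator is left out.

```go
package main

import "fmt"

// Change is a hypothetical change event passed between layers.
type Change struct {
	Table string
	Key   string
}

// Source, Processor, and Target are illustrative interfaces, one per layer.
type Source interface{ Changes() <-chan Change }
type Processor interface{ Process(Change) (Change, error) }
type Target interface{ Write(Change) error }

// Run drains the source, passes each change through the processors in order,
// and writes the result to the target. It stops at the first error.
func Run(src Source, procs []Processor, dst Target) error {
	for c := range src.Changes() {
		var err error
		for _, p := range procs {
			if c, err = p.Process(c); err != nil {
				return err
			}
		}
		if err = dst.Write(c); err != nil {
			return err
		}
	}
	return nil
}

// sliceSource emits a fixed list of changes; the buffered channel plays the
// role of the event bus in this sketch.
type sliceSource []Change

func (s sliceSource) Changes() <-chan Change {
	ch := make(chan Change, len(s))
	for _, c := range s {
		ch <- c
	}
	close(ch)
	return ch
}

// tablePrefix is a trivial schema-mapping step that renames the target table.
type tablePrefix string

func (p tablePrefix) Process(c Change) (Change, error) {
	c.Table = string(p) + c.Table
	return c, nil
}

// memTarget collects written changes in memory.
type memTarget struct{ rows []Change }

func (m *memTarget) Write(c Change) error {
	m.rows = append(m.rows, c)
	return nil
}

func main() {
	src := sliceSource{{Table: "users", Key: "1"}, {Table: "users", Key: "2"}}
	dst := &memTarget{}
	if err := Run(src, []Processor{tablePrefix("dw_")}, dst); err != nil {
		fmt.Println("pipeline failed:", err)
		return
	}
	fmt.Println(len(dst.rows), dst.rows[0].Table, dst.rows[1].Key) // prints "2 dw_users 2"
}
```

Because a single loop reads from one channel, changes reach the target in the order the source emitted them, which is the simplest way to keep per-table ordering in a sketch like this.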

Integration and Management

Drarpa exposes a REST API for management, which lets users monitor job statuses, run health checks, and handle checkpoints. It also provides a CLI for routine operational tasks. All configuration is declared in HCL, and the provided example configurations and templates let developers start a CDC pipeline quickly.

For those interested in extending Drarpa's functionalities, the Plugin SDK is available for creating custom CDC sources and target adapters.

Drarpa aims for reliable data synchronization with a focus on performance, flexibility, and operational simplicity. Documentation, installation instructions, and contribution information are available on the Drarpa GitHub page.
