Transform production databases into privacy-safe shadow copies.
Pitch

Mimicry is a deterministic database anonymization tool that securely transforms production databases into privacy-safe shadow copies while preserving relational integrity. With automatic PII detection, it ensures consistent anonymization across tables, making it ideal for secure analytics and testing without exposing real identities.

Description

Mimicry is a deterministic database anonymization tool that transforms production databases into privacy-safe shadow copies while maintaining relational integrity. It relies on deterministic hashing, so identical input values are always anonymized to the same output value across the database. For instance, if "John Doe" appears in multiple tables, it is replaced with "Alice Smith" in every one of them, so joins and analytics keep working without exposing real identities.
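
The consistency described above can be illustrated with keyed hashing. The following is a minimal sketch, not Mimicry's actual implementation: the function name `pseudonymize_name`, the small name pools, and the choice of HMAC-SHA256 are all assumptions made for illustration.

```python
import hashlib
import hmac

# Hypothetical replacement pools; a real tool would need far larger ones
# to keep distinct people from colliding on the same fake name.
FIRST_NAMES = ["Alice", "Morgan", "Riley", "Casey", "Jordan", "Taylor"]
LAST_NAMES = ["Smith", "Davis", "Johnson", "Williams", "Brown", "Lee"]


def pseudonymize_name(full_name: str, secret_key: bytes) -> str:
    """Map a real name to a fake one; same input and key give the same output."""
    # Normalize case and whitespace so "John Doe" and " john  doe" match.
    normalized = " ".join(full_name.lower().split())
    # A keyed hash means the mapping cannot be recomputed without the key.
    digest = hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).digest()
    # Use separate digest bytes to pick the first and last name.
    first = FIRST_NAMES[digest[0] % len(FIRST_NAMES)]
    last = LAST_NAMES[digest[1] % len(LAST_NAMES)]
    return f"{first} {last}"


if __name__ == "__main__":
    key = b"example-secret"  # assumption: a per-project secret
    # The same person found in two different tables gets the same replacement.
    print(pseudonymize_name("John Doe", key))
    print(pseudonymize_name("john  doe", key))
```

Because the output depends only on the input value and the key, no lookup table has to be shared between tables for the replacements to line up.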

Key Features

  • Relational Consistency: Ensures that foreign keys and joins are preserved, meaning the same entity is transformed uniformly throughout all tables, which is crucial for maintaining the integrity of relational data.

  • Smart PII Detection: Automatically identifies sensitive columns by common naming conventions, such as email, phone, and first_name, and applies the appropriate transformation to each. Detection also accommodates various international data formats.

  • Customizable Transformers: Offers a variety of built-in transformers for different data types like Email, Phone, and Address. Users can also define their own transformation logic through a plugin system or YAML-based configuration for specific column needs.

  • Statistical Preservation: Implements Gaussian blurring for numeric data, which maintains the overall distribution patterns while obscuring individual values, making it ideal for analytics and testing scenarios.

  • Stream Processing: Processes large datasets as a stream rather than loading them into memory, allowing it to handle multi-terabyte databases.

  • Delta Mode: Anonymizes only the changes made since the last run, which makes it suitable for CI/CD pipelines.

  • Subset Extraction: Enables users to extract a specific portion of the production database, including all related data entries, which promotes flexibility in testing environments.
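
As an illustration of the name-based detection listed under Smart PII Detection, here is a minimal sketch. It is an assumption about how such detection could work, not Mimicry's code: the pattern list, the transformer labels, and the function `detect_pii_columns` are hypothetical.

```python
import re

# Hypothetical column-name patterns, each mapped to a transformer label.
PII_PATTERNS = [
    (re.compile(r"e[-_]?mail", re.IGNORECASE), "email"),
    (re.compile(r"phone|mobile", re.IGNORECASE), "phone"),
    (re.compile(r"(first|last|full)[-_]?name", re.IGNORECASE), "name"),
    (re.compile(r"address|street", re.IGNORECASE), "address"),
]


def detect_pii_columns(columns):
    """Return {column_name: transformer_label} for columns that look sensitive."""
    detected = {}
    for column in columns:
        for pattern, label in PII_PATTERNS:
            if pattern.search(column):
                detected[column] = label
                break  # first matching pattern wins
    return detected


if __name__ == "__main__":
    cols = ["id", "first_name", "last_name", "email", "phone", "salary"]
    print(detect_pii_columns(cols))
```

Name matching of this kind misses sensitive columns with unusual names, which is one reason per-column configuration is useful alongside automatic detection.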

Supported Databases

Mimicry currently supports PostgreSQL, MySQL, and SQLite, with additional support for MongoDB planned in future updates.

Usage Example

The following example shows a users table before and after running Mimicry:

Original Data

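id | first_name | last_name | email                    | phone           | salary
---|------------|-----------|--------------------------|-----------------|----------
1  | John       | Smith     | john.smith@acmecorp.com  | +1 212 555 1234 | 85000.00
2  | Jane       | Doe       | jane.doe@techstartup.io  | +1 415 555 5678 | 120000.00
3  | Maria      | Garcia    | maria.garcia@bigbank.com | +1 305 555 3456 | 110000.00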

After Running Mimicry

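id | first_name | last_name | email                | phone           | salary
---|------------|-----------|----------------------|-----------------|----------
1  | Morgan     | Davis     | alex7f3b@example.com | +1 555 867 5309 | 81450.00
2  | Riley      | Johnson   | quinn42a@example.com | +1 555 432 1098 | 114000.00
3  | Casey      | Williams  | river8c2@example.com | +1 555 219 8734 | 104500.00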

With Mimicry, sensitive user information is anonymized to protect privacy, while the statistical characteristics of the original data are preserved.
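
The shift in the salary column can be illustrated with Gaussian noise. This is a minimal sketch under stated assumptions, not Mimicry's implementation: the function `blur_numeric`, the `relative_sigma` parameter, and the choice to scale noise by each value's magnitude are hypothetical.

```python
import random


def blur_numeric(values, relative_sigma=0.05, seed=None):
    """Add zero-mean Gaussian noise to each value, scaled to its magnitude.

    relative_sigma is the noise standard deviation as a fraction of the value
    (assumption: 5% by default). Passing a seed makes the run repeatable.
    """
    rng = random.Random(seed)
    return [
        round(value + rng.gauss(0.0, abs(value) * relative_sigma), 2)
        for value in values
    ]


if __name__ == "__main__":
    salaries = [85000.00, 120000.00, 110000.00]
    print(blur_numeric(salaries, seed=42))
```

Because the noise has zero mean, aggregates such as the average stay close to the original over many rows, while any single value no longer reveals the exact figure.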
