LeakCTL
Automated detection and analysis of data leaks from Telegram.
Pitch

LeakCTL automatically captures and analyzes .txt files shared in Telegram channels. It combines Telegram monitoring with Elasticsearch to provide full-text search and a secure web interface, so security researchers and SOC teams can track and manage data leaks efficiently.

Description

LeakCTL: Automated Data Leak Detection and Analysis Platform

LeakCTL is a monitoring system that automatically captures .txt files shared in Telegram channels. It extracts each file's contents and indexes them into Elasticsearch, and its web interface provides full-text search and analysis of the indexed leaks.

Designed for security researchers and Security Operations Center (SOC) teams, LeakCTL makes tracking and analyzing data leaks scalable, automated, and centralized.

Key Features

  • Telegram Agent (Userbot)

    • Monitors Telegram groups and channels for .txt file leaks.
    • Automatically downloads, parses, and indexes contents into Elasticsearch.
    • Avoids duplication by skipping already-processed files.
    • Supports batch uploads with progress tracking and retry logic.
    • Includes Telegram-based logging and alerting features.
  • Web Interface

    • Secure user login system with session management and rate limiting.
    • Full-text wildcard search across indexed leak contents via Elasticsearch.
    • One-click export of search results as .txt files.
    • Live pagination for navigating large result sets.
    • Streamlined user interface with role-based access restrictions.
  • Elasticsearch Integration

    • Manages custom Index Lifecycle Management (ILM) policies and index templates.
    • Features daily index rollover and automatic deletion of outdated indices.
    • Optimized for fast ingestion and search, even with millions of indexed entries.
  • Docker-Ready Deployment

    • Easily set up with docker-compose, including installation scripts for Elasticsearch and Docker.
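
The ILM behaviour listed under Elasticsearch Integration (daily rollover, automatic deletion of old indices) could be expressed roughly as follows. This is a minimal sketch, not LeakCTL's actual configuration: the policy name, the leaks-* index pattern, the rollover alias, and the 30-day retention are all assumptions made for illustration. The dictionaries follow the request-body shape of Elasticsearch's PUT _ilm/policy/<name> and PUT _index_template/<name> endpoints.

```python
POLICY_NAME = "leaks-policy"   # hypothetical policy name
ROLLOVER_ALIAS = "leaks"       # assumption: write alias matching the index name
INDEX_PATTERN = "leaks-*"      # assumption: pattern of the rolled-over indices


def build_ilm_policy(retention_days: int = 30) -> dict:
    """Return an ILM policy body: roll over daily, delete after retention_days.

    retention_days is an illustrative default; the source does not state
    how long indices are kept.
    """
    return {
        "policy": {
            "phases": {
                # Hot phase: start a new index once the current one is a day old.
                "hot": {"actions": {"rollover": {"max_age": "1d"}}},
                # Delete phase: remove an index retention_days after rollover.
                "delete": {
                    "min_age": f"{retention_days}d",
                    "actions": {"delete": {}},
                },
            }
        }
    }


def build_index_template() -> dict:
    """Return an index template body that attaches the policy to new indices."""
    return {
        "index_patterns": [INDEX_PATTERN],
        "template": {
            "settings": {
                "index.lifecycle.name": POLICY_NAME,
                "index.lifecycle.rollover_alias": ROLLOVER_ALIAS,
            }
        },
    }
```

Both bodies would be sent to the cluster once at setup time; after that, Elasticsearch applies the rollover and deletion on its own schedule.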

System Architecture

LeakCTL consists of three primary components that work together for automated data leak detection:

  1. Telegram Userbot (user_bot.py)

    • Operates as a headless user on Telegram, scanning target chats for any .txt file leaks.
    • Downloads and parses each file line-by-line, preparing it for upload.
    • Uploads the parsed lines to an Elasticsearch index named leaks.
  2. Elasticsearch Backend

    • Stores every extracted line together with metadata such as file_name, line_number, and @timestamp.
    • Implements ILM to automate daily index rollover and deletion of obsolete indices, ensuring optimal performance for full-text search.
  3. Flask Web Interface (app.py)

    • Provides a secure frontend for analysts to search and view leaked contents in detail.
    • Supports full-text wildcard search and pagination to manage extensive datasets efficiently.
    • Exports results as downloadable .txt files, and protects against abuse through rate limiting and session expiry.
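
The data flow between the three components can be sketched in a few lines of Python. This is an illustrative outline under stated assumptions, not the code in user_bot.py or app.py: the function names and the content field are hypothetical, while the leaks index and the file_name, line_number, and @timestamp fields come from the description above. The first function turns a downloaded file into bulk actions in the shape accepted by elasticsearch.helpers.bulk; the second builds a wildcard search body with from/size pagination.

```python
from datetime import datetime, timezone
from typing import Iterable, Iterator

INDEX_NAME = "leaks"       # index name given in the description
CONTENT_FIELD = "content"  # assumption: the source does not name this field


def build_bulk_actions(file_name: str, lines: Iterable[str]) -> Iterator[dict]:
    """Yield one bulk action per non-blank line of a downloaded .txt file."""
    timestamp = datetime.now(timezone.utc).isoformat()
    for line_number, raw in enumerate(lines, start=1):
        text = raw.rstrip("\r\n")
        if not text.strip():
            continue  # skip blank lines; numbering still follows the file
        yield {
            "_index": INDEX_NAME,
            "_source": {
                "file_name": file_name,
                "line_number": line_number,
                "@timestamp": timestamp,
                CONTENT_FIELD: text,
            },
        }


def build_search_body(term: str, page: int = 1, page_size: int = 50) -> dict:
    """Build a wildcard query body; page is 1-based."""
    page = max(page, 1)
    return {
        "from": (page - 1) * page_size,  # offset of the first hit on this page
        "size": page_size,
        "query": {
            "wildcard": {
                CONTENT_FIELD: {"value": f"*{term}*", "case_insensitive": True}
            }
        },
    }
```

In use, the userbot would pass build_bulk_actions(...) to a bulk indexing call, and the web interface would send build_search_body(...) as the search request for the page being viewed. Leading wildcards are expensive in Elasticsearch, so a real deployment may choose a different query type or field mapping.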

UML Diagram Summary

  • uml_use_case_diagram.png illustrates user interactions with the system, including authentication and search operations.
  • uml_class_diagram.png outlines the class architecture of key components, such as User, the search logic, and file processing mechanisms.
  • uml_class_academic_diagram.png offers a higher-level overview of the modular interactions between components.

For complete architectural diagrams, refer to /uml_*_diagram.png in the repository.
