Ship of Theseus
Explore the evolution of a codebase through its Git history.
Pitch

Ship of Theseus traces how the code in a Git repository has evolved and measures how much of the original code remains. It is fast, tracks changes thoroughly, and presents its results visually, putting numbers to the philosophical question of identity amid change for developers curious about their code's lineage.

Description

Ship of Theseus is a tool for analyzing how a codebase evolves over time, inspired by the ancient paradox about identity and change. It traces each line of code from its introduction to the present in order to answer one question: is this still the same codebase?

The Philosophical Question

The Ship of Theseus paradox asks: if every part of a ship is replaced over time, is it still the same ship? Likewise, if the lines of a software project are modified one by one, how much of the original code still exists? This tool answers that question quantitatively by analyzing the repository's Git history.

Core Features

  • Performance: By calling the Git CLI directly and processing files in parallel, Ship of Theseus runs 10 to 100 times faster than approaches built on traditional Git libraries.
  • Thorough Analysis: Every line of code is traced through the Git history, with rename detection so that files keep their history when they are moved or renamed.
  • Visual Insights: Results are presented with ASCII art, detailed graphs, and philosophical commentary that helps interpret the findings.
  • Intelligent Similarity Measurement: Levenshtein distance measures how closely each current line resembles its original, so lightly edited lines can still count as original.
  • Historical Timeline Generation: A timeline visualization shows how the codebase has changed over time.
  • Robust Filtering: Comments, blank lines, generated files, and vendored dependencies are excluded, so the analysis focuses on the code itself.

How It Works

The algorithm employed by Ship of Theseus consists of several steps:

  1. Collect the tracked files with git ls-files, so files ignored via .gitignore are excluded.
  2. Read file content from Git rather than from disk, so uncommitted changes in the working directory do not affect the results.
  3. Find the first commit for each file with git log --follow, which follows the file across renames.
  4. Compare each line of the current version with the corresponding line of the file's original version.
  5. Search a window of ±10 lines around that position to account for common refactorings such as moving lines or extracting methods.
  6. Compute a similarity percentage from the Levenshtein distance between the two lines.
  7. Classify a line as "original" if its similarity is 25% or higher.
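
Steps 1 to 3 map onto ordinary Git commands. The Go sketch below shows one way to run them; the helper names (runGit, trackedFiles, contentAt, firstCommit) are hypothetical, and it assumes the given path is the repository root so that the paths from git ls-files are root-relative.

```go
package main

import (
	"os/exec"
	"strings"
)

// runGit runs `git -C repo <args...>` and returns its standard output.
func runGit(repo string, args ...string) (string, error) {
	out, err := exec.Command("git", append([]string{"-C", repo}, args...)...).Output()
	return string(out), err
}

// trackedFiles implements step 1: only files in Git's index are listed,
// so anything matched by .gitignore (and never committed) is skipped.
func trackedFiles(repo string) ([]string, error) {
	out, err := runGit(repo, "ls-files", "-z")
	if err != nil {
		return nil, err
	}
	return splitNUL(out), nil
}

// splitNUL splits NUL-separated output (as produced by `ls-files -z`),
// dropping empty entries such as the one after the trailing NUL.
func splitNUL(s string) []string {
	var files []string
	for _, f := range strings.Split(s, "\x00") {
		if f != "" {
			files = append(files, f)
		}
	}
	return files
}

// contentAt implements step 2: it reads path as stored in commit rev
// (for example "HEAD"), so uncommitted edits in the working tree are ignored.
// path must be the name the file had in that commit.
func contentAt(repo, rev, path string) (string, error) {
	return runGit(repo, "show", rev+":"+path)
}

// firstCommit implements step 3: `git log --follow` lists commits newest
// first, so the last hash printed is the oldest one reached across renames.
func firstCommit(repo, path string) (string, error) {
	out, err := runGit(repo, "log", "--follow", "--format=%H", "--", path)
	if err != nil {
		return "", err
	}
	return lastLine(out), nil
}

// lastLine returns the final line of s after trimming, or "" if s is blank.
func lastLine(s string) string {
	lines := strings.Split(strings.TrimSpace(s), "\n")
	return lines[len(lines)-1]
}
```

If a file was renamed, its name in the first commit differs from its current name; this sketch does not resolve the old name before calling contentAt.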

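Steps 4 to 7 can be sketched in Go as follows. The normalization (similarity = 1 − distance / length of the longer line) and trimming indentation before comparing are assumptions of this sketch, as are the names bestMatch and isOriginal; the ±10 window and the 25% threshold come from the steps above.

```go
package main

import "strings"

const (
	window    = 10   // step 5: look up to 10 lines above and below
	threshold = 0.25 // step 7: minimum similarity for an "original" line
)

// levenshtein returns the edit distance between a and b: the minimum number
// of single-rune insertions, deletions, and substitutions turning a into b.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1) // distances for the previous row
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

// similarity maps edit distance to [0, 1]: 1 - distance/len(longer line).
func similarity(a, b string) float64 {
	a, b = strings.TrimSpace(a), strings.TrimSpace(b)
	longest := len([]rune(a))
	if n := len([]rune(b)); n > longest {
		longest = n
	}
	if longest == 0 {
		return 1 // two blank lines are identical
	}
	return 1 - float64(levenshtein(a, b))/float64(longest)
}

// bestMatch compares current[i] with the original line at the same index
// (step 4) and with its neighbours within ±window (step 5), returning the
// highest similarity found (step 6).
func bestMatch(current, original []string, i int) float64 {
	lo, hi := i-window, i+window
	if lo < 0 {
		lo = 0
	}
	if hi > len(original)-1 {
		hi = len(original) - 1
	}
	best := 0.0
	for j := lo; j <= hi; j++ {
		if s := similarity(current[i], original[j]); s > best {
			best = s
		}
	}
	return best
}

// isOriginal applies the 25% threshold from step 7.
func isOriginal(current, original []string, i int) bool {
	return bestMatch(current, original, i) >= threshold
}
```

Because each line is compared with at most 21 original lines, the cost per line stays bounded; the trade-off is that a line moved more than 10 lines away is no longer recognized as original.
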
Output Example

A sample report looks like this:

═══════════════════════════════════════════════════════════════════
                    🚢 SHIP OF THESEUS
                 Codebase Evolution Analysis
═══════════════════════════════════════════════════════════════════

📊 OVERALL STATISTICS
   Total Lines of Code:    45,234
   Original Lines:         12,456 (27.5%)
   Average Similarity:     68.3%

💭 INTERPRETATION
   ⚡ This codebase has undergone substantial evolution.
   Like Theseus's ship, many planks have been replaced.

📈 ORIGINAL CODE REMAINING
   [████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] 27.5%

Command-Line Options

Users can customize their analysis with several command-line options:

--path string     Path to git repository (default: ".")
--workers int     Number of parallel workers (default: NumCPU)
--sample int      Sample every Nth commit for timeline (default: 50)
--version         Show version information

Contribution Opportunities

Contributions are welcome. Areas for improvement include:

  • Developing JSON/CSV export options for data analysis.
  • Creating HTML report generation with interactive visualizations.
  • Implementing comparison features between branches or tags.

Philosophical Insight

Beyond its metrics, the project is meant to prompt reflection on the nature of software evolution and identity. As code keeps changing, the Ship of Theseus paradox invites developers to consider not only the technical side of development but also what it means for a codebase to remain the same.
