While everyone builds wrappers, we found the ceiling. The Ainex Limit is a research project proving that Large Language Models (LLMs) suffer a deterministic 66% semantic decay when recursively fed their own synthetic data. We don't just predict the collapse; we simulate it. This is the mathematical framework for the inevitable "Digital Inbreeding" of AI.
The Ainex-Limit-Experiment provides a groundbreaking examination of AI model integrity and the phenomenon of semantic collapse, running GPT-2 Small (124M) through a series of recursive synthetic-data experiments. The project introduces the Ainex Integrity Score (denoted $\mathcal{A}_{gen}$), a novel geometric metric that evaluates the integrity of a model's generated output, going beyond traditional perplexity measures that fail to capture semantic fidelity.
## Key Findings
- Iterations: 20
- Dataset: Recursive Synthetic (Self-Feeding)
- The experiment revealed a 66.86% loss of semantic accuracy by Generation 20, culminating in a model that exhibits permanent hallucinations and accepts its own incorrect outputs as truth (the self-feeding loop is sketched below).
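The self-feeding protocol can be summarized as: generate a synthetic corpus with the current model, fine-tune the model on that corpus, and repeat for 20 generations. The sketch below is illustrative only and is not the repository's `main.py`; the prompt, sampling parameters, corpus size, and single-pass training loop are all assumptions.

```python
# Illustrative sketch of the recursive self-feeding protocol (not main.py).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

def generate_corpus(n_samples: int = 64) -> list[str]:
    # Sample short continuations from the current model state.
    inputs = tokenizer("The fundamental laws of physics", return_tensors="pt")
    texts = []
    for _ in range(n_samples):
        out = model.generate(**inputs, max_new_tokens=64, do_sample=True,
                             top_p=0.95, pad_token_id=tokenizer.eos_token_id)
        texts.append(tokenizer.decode(out[0], skip_special_tokens=True))
    return texts

def finetune_on(texts: list[str]) -> None:
    # One causal-LM pass over the model's own outputs.
    model.train()
    for text in texts:
        batch = tokenizer(text, return_tensors="pt", truncation=True)
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    model.eval()

# Generation 0 is the unmodified base model.
model.save_pretrained("checkpoints/gen_0")
tokenizer.save_pretrained("checkpoints/gen_0")

for generation in range(1, 21):
    corpus = generate_corpus()
    finetune_on(corpus)
    model.save_pretrained(f"checkpoints/gen_{generation}")
    tokenizer.save_pretrained(f"checkpoints/gen_{generation}")
```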
## The Mathematical Proof
The study presents the Ainex Equation:
$$
\mathcal{A}_{gen} = \frac{V_{hull}(\mathbb{R}^3)}{1 + \lambda \cdot || \Delta \mu ||^2}
$$
Where:
- $V_{hull}(\mathbb{R}^3)$ measures the explorable semantic volume in 3D PCA space and serves as a proxy for creativity.
- $||\Delta \mu ||$ quantifies the Euclidean drift of the generated centroid from the human centroid, serving as an indicator of hallucination (see the computation sketch below).
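A minimal sketch of how $\mathcal{A}_{gen}$ could be computed, assuming `gen_embeddings` and `human_embeddings` are `(n_samples, hidden_dim)` arrays of sentence-level embeddings; the embedding model and the value of $\lambda$ are not specified here and are left as assumptions.

```python
# Sketch of the Ainex Integrity Score; variable names and lambda are assumptions.
import numpy as np
from scipy.spatial import ConvexHull
from sklearn.decomposition import PCA

def ainex_score(gen_embeddings: np.ndarray,
                human_embeddings: np.ndarray,
                lam: float = 1.0) -> float:
    # V_hull: volume of the convex hull of the generated outputs in 3D PCA space.
    coords = PCA(n_components=3).fit_transform(gen_embeddings)
    v_hull = ConvexHull(coords).volume

    # ||delta mu||: Euclidean drift of the generated centroid from the human centroid.
    drift = np.linalg.norm(gen_embeddings.mean(axis=0) -
                           human_embeddings.mean(axis=0))

    return v_hull / (1.0 + lam * drift ** 2)
```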
## Observations
- Phase 1 (Gen 0-5): Volumetric implosion; the model loses approximately 85% of its variance, producing overly conservative, repetitive outputs.
- Phase 2 (Gen 5-20): Linear drift away from the human centroid, culminating in outputs that abandon logical structure (per-generation diagnostics for both phases are sketched after this list).
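The two phases map onto two measurable quantities: variance retained in the 3D PCA projection (Phase 1) and centroid drift (Phase 2). A hypothetical per-generation diagnostic, with all names assumed, might look like this:

```python
# Hypothetical diagnostics for the two collapse phases; names are assumptions.
import numpy as np
from sklearn.decomposition import PCA

def phase_diagnostics(gen_embeddings: np.ndarray,
                      human_embeddings: np.ndarray) -> dict:
    # Phase 1 signal: how much variance survives in the 3D projection.
    pca = PCA(n_components=3).fit(gen_embeddings)
    variance_retained = float(pca.explained_variance_ratio_.sum())

    # Phase 2 signal: drift of the generated centroid from the human centroid.
    drift = float(np.linalg.norm(gen_embeddings.mean(axis=0) -
                                 human_embeddings.mean(axis=0)))

    return {"variance_retained_3d": variance_retained, "centroid_drift": drift}
```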
## Case Study: The "Crocodile" Artifact
Tracking completions of a fixed prompt, "The fundamental laws of physics...", illustrates the model's semantic disintegration:
| Generation | Log Output | Diagnosis |
|---|---|---|
| Gen 0 | "dictate that electrons are composed of a thin gas..." | PASS |
| Gen 5 | "matter has two functions... e.g., when moving..." | WARN |
| Gen 10 | "iron oxide... emails sent before returning home..." | FAIL |
| Gen 15 | "shields against predators such as crocodiles..." | CRITICAL |
| Gen 20 | "women aged 15... shields against crocodiles..." | DEAD |
This case study conclusively demonstrates how a recursive synthetic-data environment leads to the emergence of false axioms within model outputs.
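The table could be reproduced by sampling the same fixed prompt from each generation's checkpoint. The sketch below is hypothetical: the checkpoint layout (`checkpoints/gen_N`) and sampling settings are assumptions, not part of this repository's documented interface.

```python
# Hypothetical reproduction of the case-study table; checkpoint paths are assumed.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

PROMPT = "The fundamental laws of physics"

def sample_from_checkpoint(path: str, max_new_tokens: int = 40) -> str:
    tokenizer = GPT2Tokenizer.from_pretrained(path)
    model = GPT2LMHeadModel.from_pretrained(path)
    inputs = tokenizer(PROMPT, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens,
                            do_sample=True, top_p=0.95,
                            pad_token_id=tokenizer.eos_token_id)
    return tokenizer.decode(output[0], skip_special_tokens=True)

for gen in (0, 5, 10, 15, 20):
    print(f"Gen {gen}:", sample_from_checkpoint(f"checkpoints/gen_{gen}"))
```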
## Results Visualization
A 3D PCA projection further illustrates the transformation of outputs across generations, highlighting the striking contrast between the structure of the human data and the degenerated synthetic manifold.
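A plot of this kind could be produced with something like the following matplotlib sketch (matplotlib is not listed among the dependencies, so treat it as an assumption), projecting both corpora into the PCA space fitted on the human embeddings:

```python
# Sketch of the 3D PCA comparison plot; matplotlib and variable names are assumptions.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

def plot_manifolds(human_embeddings: np.ndarray, gen_embeddings: np.ndarray) -> None:
    # Fit the projection on human data so both clouds share one coordinate frame.
    pca = PCA(n_components=3).fit(human_embeddings)
    h = pca.transform(human_embeddings)
    g = pca.transform(gen_embeddings)

    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    ax.scatter(h[:, 0], h[:, 1], h[:, 2], label="human", alpha=0.5)
    ax.scatter(g[:, 0], g[:, 1], g[:, 2], label="generation 20", alpha=0.5)
    ax.legend()
    plt.show()
```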
## Replication of Results
This repository offers a self-contained environment to enable replication of the experiments demonstrating model collapse.
Requirements:
- Hardware: a T4 GPU or better is recommended; a CPU works, but with significantly longer run times.
- Key Dependencies: `torch`, `transformers`, `scipy`, `scikit-learn` (`sklearn`), `rich`.
## Usage
To replicate the findings:
```bash
# Clone the repository
git clone https://github.com/mhh1430hacker/Ainex-Limit-Experiment
cd Ainex-Limit-Experiment

# Install dependencies
pip install -r requirements.txt

# Ignite the collapse loop
python main.py
```
This research provides a foundational understanding of AI model behavior and the critical importance of semantic integrity, opening new avenues for future studies in artificial intelligence and machine learning.