Cybernetic Entropy Control
A 4th-order feedback controller that adjusts LLM sampling parameters in real-time based on token-level entropy. Using velocity-form actuation, it improves MATH benchmark accuracy from 75% to 83% over an uncontrolled baseline on small-scale experiments, and from 60.9% to 63.0% on a full 5000-problem sweep.
What it does
At each token generation step, the controller:
- Computes Shannon entropy over the top-64 softmaxed logits
- Feeds the error (target − actual) through a tanh dampener
- Updates a 4th-order state vector: integral, error, velocity (Δe), acceleration (Δ²e)
- Applies velocity-form actuation: each actuator integrates K·x over time, accumulating corrections rather than computing them from scratch
The acceleration term is the key contribution — it catches the upward curvature in entropy that precedes a hallucination spike, enabling intervention before it peaks. The velocity form gives the controller memory — by integrating corrections over time, it maintains persistent state that adapts to the trajectory of the generation rather than reacting only to instantaneous error.
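The per-token steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `topk_entropy` and `FourthOrderState`, the choice of nats as the entropy unit, and the first-difference definitions of velocity and acceleration are all assumptions.

```python
import numpy as np

def topk_entropy(logits: np.ndarray, k: int = 64) -> float:
    """Shannon entropy (in nats, an assumption) of the softmax over the k largest logits."""
    top = np.sort(np.asarray(logits, dtype=float))[-k:]
    top = top - top.max()              # stabilise the exponentials
    p = np.exp(top)
    p /= p.sum()
    return float(-(p * np.log(p + 1e-12)).sum())

class FourthOrderState:
    """Hypothetical tracker for [integral, error, velocity, acceleration]."""

    def __init__(self, h_target: float):
        self.h_target = h_target
        self.integral = 0.0
        self.prev_e = 0.0
        self.prev_de = 0.0

    def update(self, h: float) -> np.ndarray:
        e = float(np.tanh(self.h_target - h))   # tanh-dampened error (target - actual)
        de = e - self.prev_e                    # velocity: first difference of the error
        d2e = de - self.prev_de                 # acceleration: second difference
        self.integral += e
        self.prev_e, self.prev_de = e, de
        return np.array([self.integral, e, de, d2e])

state = FourthOrderState(h_target=0.1)
uniform_h = topk_entropy(np.zeros(64))          # flat distribution over 64 tokens
x = state.update(uniform_h)                     # state vector passed on to the actuators
```

When entropy jumps and then holds steady, the velocity component returns to zero on the next step while the acceleration component changes sign, which is the kind of curvature information the 4th-order term adds on top of PID.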
Results
Full benchmark sweep (5000 problems)
All results: Qwen 3.5 2B (Q4_K_M), 5000 problems from MATH (Hendrycks et al.), 4096 token budget, llama.cpp with CUDA.
| Setup | Accuracy | Δ |
|---|---|---|
| Baseline (uncontrolled) | 60.9% (3047/5000) | — |
| Hybrid (4th order + QEWS, w_H=1, w_Q=1) | 63.0% (3152/5000) | +2.1pp |
Token efficiency: The controller generates ~7% fewer tokens on average (mean 1852 → 1722 tokens).
Failure mode analysis
The dominant failure mode is spinning: the model exhausts the token budget without converging. Cap hits are nearly always wrong (5.4% accuracy on baseline, 6.2% on hybrid). The controller's gain comes almost entirely from reducing cap hits rather than correcting confident wrong answers.
| Metric | Baseline | Hybrid | Δ |
|---|---|---|---|
| Cap hit rate | 28.2% (1409/5000) | 25.3% (1267/5000) | −2.9pp |
| Under-cap accuracy | 82.7% (2971/3591) | 82.3% (3073/3733) | −0.4pp |
| Cap-hit accuracy | 5.4% (76/1409) | 6.2% (79/1267) | +0.8pp |
| Overall accuracy | 60.9% | 63.0% | +2.1pp |
Under-cap accuracy is nearly unchanged (−0.4pp), confirming the controller is not interfering with problems it has no business touching. The entire gain is cap hit reduction.
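The decomposition can be checked directly from the counts in the table. The sketch below is illustrative only (the helper name `decompose` is hypothetical); it recomputes the four rates and then estimates how much gain the drop in cap-hit rate alone would produce if the baseline's per-group accuracies were held fixed.

```python
def decompose(under_correct, under_total, cap_correct, cap_total):
    """Split overall accuracy into under-cap and cap-hit components."""
    total = under_total + cap_total
    return {
        "cap_rate": cap_total / total,
        "under_cap_acc": under_correct / under_total,
        "cap_hit_acc": cap_correct / cap_total,
        "overall_acc": (under_correct + cap_correct) / total,
    }

# Counts taken from the failure mode table above.
baseline = decompose(2971, 3591, 76, 1409)
hybrid = decompose(3073, 3733, 79, 1267)

# Counterfactual: shift problems out of the capped group while keeping
# the baseline's under-cap and cap-hit accuracies unchanged.
cap_shift_gain = (baseline["cap_rate"] - hybrid["cap_rate"]) * (
    baseline["under_cap_acc"] - baseline["cap_hit_acc"]
)
actual_gain = hybrid["overall_acc"] - baseline["overall_acc"]

print(f"cap-rate shift alone: {100 * cap_shift_gain:+.1f}pp")
print(f"observed gain:        {100 * actual_gain:+.1f}pp")
```

The cap-rate shift alone accounts for about +2.2pp, slightly more than the observed +2.1pp; the small difference is the net of the −0.4pp under-cap change and the +0.8pp cap-hit change.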
Entropy analysis
Mean entropy is a strong predictor of outcome. Capped problems have systematically higher entropy than uncapped ones, and wrong answers have higher entropy than correct ones across both conditions.
| Condition | Group | mean H | std H |
|---|---|---|---|
| Baseline | Uncapped + Correct | 0.225 | 0.419 |
| Baseline | Uncapped + Wrong | 0.256 | 0.455 |
| Baseline | Capped + Wrong | 0.344 | 0.540 |
| Hybrid | Uncapped + Correct | 0.210 | 0.404 |
| Hybrid | Uncapped + Wrong | 0.244 | 0.443 |
| Hybrid | Capped + Wrong | 0.285 | 0.496 |
The controller reduces mean entropy across all groups. The largest reduction is in capped wrong answers (0.344 → 0.285), consistent with it having the most authority over high-entropy spinning problems.
This suggests the +2.1pp gain is limited by actuator authority rather than by the entropy signal: stronger actuators with more control over spinning may produce larger gains. The current actuators (min_p, top_p, repeat penalty) are insufficient to fully break a spin once it begins.
Small-scale experiments (v2: velocity-form controller)
All results: Qwen 3.5 2B (Q4_K_M), 100 problems from MATH, 2048 token budget, llama.cpp with CUDA.
Multi-actuator control (Min-P + Top-P + Frequency Penalty)
| Setup | Accuracy | Δ |
|---|---|---|
| Baseline (uncontrolled) | 75.0% | — |
| PID (3rd order) | 78.0% | +3.0 |
| 4th order, 2x acceleration | 79.0% | +4.0 |
| 4th order | 82.0% | +7.0 |
Min-P only
| Setup | Accuracy | Δ |
|---|---|---|
| Baseline (uncontrolled) | 75.0% | — |
| 4th order | 77.0% | +2.0 |
| 4th order, 2x acceleration | 77.0% | +2.0 |
| PID | 82.0% | +7.0 |
QEWS (Quantum Early Warning Signal)
| Setup | Accuracy | Δ |
|---|---|---|
| Baseline (uncontrolled) | 75.0% | — |
| QEWS hybrid (w_H=1, w_Q=2) | 74.0% | −1.0 |
| QEWS replace (4th order) | 76.0% | +1.0 |
| QEWS replace (4th order 2x) | 79.0% | +4.0 |
| QEWS replace (PID) | 82.0% | +7.0 |
| QEWS hybrid (w_H=1, w_Q=1) | 83.0% | +8.0 |
v1: Position-form controller (earlier results, different parser)
All results: Qwen 3.5 2B (Q4_K_M), 200 problems from MATH, 4096 token budget, llama.cpp with CUDA.
Note: these experiments used an earlier answer parser and are not directly comparable to results above.
Multi-actuator control
| Setup | Accuracy | Δ |
|---|---|---|
| Baseline (uncontrolled) | 55.0% | — |
| PID (3rd order) | 58.0% | +3.0 |
| 4th order | 59.5% | +4.5 |
| 4th order, 2x acceleration | 56.5% | +1.5 |
Min-P only
| Setup | Accuracy | Δ |
|---|---|---|
| Baseline (uncontrolled) | 55.0% | — |
| PID | 56.0% | +1.0 |
| 4th order | 55.5% | +0.5 |
| 4th order, 2x acceleration | 59.5% | +4.5 |
QEWS
| Setup | Accuracy | Δ |
|---|---|---|
| Baseline (uncontrolled) | 55.0% | — |
| QEWS replace (density operator only) | 51.5% | −3.5 |
| Hybrid (w_H=1, w_Q=1) | 54.0% | −1.0 |
| Hybrid (w_H=1, w_Q=2) | 58.0% | +3.0 |
Key findings
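- Velocity form is a major improvement. Switching from position-form (v1) to velocity-form (v2) actuation raised the best result over baseline from +4.5pp to +8.0pp on small-scale experiments. The two sets of runs used different answer parsers, sample sizes, and token budgets, so the comparison is indicative rather than exact. The integrating actuator maintains persistent corrections that compound over the generation trajectory.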
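- The acceleration term matters in multi-actuator control. There, the 4th-order controller outperforms PID by 1.5 points (v1) and 4.0 points (v2). The advantage does not hold in every configuration: in the v2 Min-P-only runs, PID scored 82% against 77% for the 4th-order controller. The acceleration signal appears to catch entropy dynamics that lower-order controllers miss, but the benefit depends on the actuator set.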
- QEWS hybrid is the best configuration. Combining Shannon entropy and quantum density operator signals at equal weighting (w_H=1, w_Q=1) achieves 83%, the single best result across all small-scale experiments.
- The spinning failure mode is the primary target. In the full sweep, cap hits account for roughly two-thirds of wrong answers (1333 of 1953 on baseline). The controller's gain comes almost entirely from reducing cap hits, not from correcting confident wrong answers. Confident wrong answers (short responses, low entropy) are outside the controller's observable domain.
- Entropy predicts outcome. Mean entropy cleanly separates correct from wrong answers and uncapped from capped problems across both conditions, validating entropy as a control signal.
- Temperature is a harmful actuator. Earlier experiments using temperature control degraded accuracy by up to 8 points. Temperature directly modifies the logit distribution, corrupting the entropy signal the controller is responding to. Min-P does not have this problem.
- The controller has real authority. Aggressive overtuning (2x acceleration gains) can degrade performance, confirming the controller meaningfully steers generation — not a null effect.
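The temperature finding can be illustrated with a toy example. The sketch below assumes entropy is measured on the temperature-scaled distribution; the function name and the logit values are made up for illustration. With the logits held fixed, the measured entropy moves with temperature alone, so a controller that actuates temperature also changes its own sensor reading.

```python
import numpy as np

def entropy_at_temperature(logits, temperature: float) -> float:
    """Entropy (nats) of softmax(logits / temperature)."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max()                    # stabilise the exponentials
    p = np.exp(z)
    p /= p.sum()
    return float(-(p * np.log(p + 1e-12)).sum())

# Hypothetical logits; the model's output is identical in all three cases.
logits = np.array([4.0, 2.0, 1.0, 0.5, 0.0])
readings = {t: entropy_at_temperature(logits, t) for t in (0.5, 1.0, 1.5)}

for t, h in readings.items():
    print(f"T={t}: H={h:.3f}")
```

Truncation-style actuators such as Min-P change which tokens can be sampled without rescaling the logits that the entropy is computed from, which is one way to read why they avoid this coupling.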
Controller architecture
Velocity-form actuation
Unlike the position-form controller where actuators were computed fresh each step, the v2 velocity-form controller integrates corrections over time:
actuator(k) = actuator(k-1) + K · x(k)
If entropy has been consistently too high, the actuator accumulates tighter sampling — it doesn't snap back to default the moment entropy briefly dips.
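A minimal sketch of the difference between the two forms, under stated assumptions: the class and function names are hypothetical, the clamp bounds and default value are invented, the gain vector is borrowed from the `--K-M` values in the usage examples with an assumed ordering of [integral, error, velocity, acceleration], and the state vectors are toy inputs rather than real controller output.

```python
import numpy as np

class VelocityActuator:
    """Velocity form: actuator(k) = actuator(k-1) + K . x(k), clamped to a range."""

    def __init__(self, gains, default, lo, hi):
        self.K = np.asarray(gains, dtype=float)
        self.value = float(default)
        self.lo, self.hi = lo, hi

    def step(self, x) -> float:
        delta = float(self.K @ np.asarray(x, dtype=float))
        self.value = float(np.clip(self.value + delta, self.lo, self.hi))
        return self.value

def position_actuator(gains, default, x, lo, hi) -> float:
    """Position form: recomputed from the default at every step."""
    delta = float(np.dot(gains, x))
    return float(np.clip(default + delta, lo, hi))

K_M = [0.005, 0.03, 0.04, 0.08]        # assumed ordering of the four gains
min_p = VelocityActuator(K_M, default=0.05, lo=0.0, hi=0.5)

pushing = [0.0, 1.0, 0.0, 0.0]         # toy state: constant error, nothing else
quiet = [0.0, 0.0, 0.0, 0.0]           # toy state: error has vanished

for _ in range(3):
    min_p.step(pushing)                # three steps of sustained correction
held = min_p.step(quiet)               # velocity form keeps what it accumulated
snapped = position_actuator(K_M, 0.05, quiet, 0.0, 0.5)  # position form resets
```

With these toy inputs the velocity-form value stays at its accumulated level once the state goes quiet, while the position-form value returns to the default. In a real run the integral component of the state would not drop to zero this abruptly, so the contrast here is deliberately exaggerated.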
QEWS: Quantum Early Warning Signal
A secondary observation channel inspired by the density operator formalism from quantum information theory. A rolling window of L2-normalized top-K logit vectors forms a density matrix:
$$\boldsymbol{\rho}_k = \frac{1}{W} \sum_{i=k-W+1}^{k} \boldsymbol{\psi}_i \boldsymbol{\psi}_i^\top$$
The von Neumann entropy of ρ tracks structural shifts in the distribution of distributions over time, rather than instantaneous entropy at a single token. The QEWS signal is the deviation from a running exponential moving average of this entropy.
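One way to compute this signal is sketched below. It is an illustration under several assumptions, not the project's implementation: the class name `QEWSSketch` is hypothetical, each ψ is taken to be the descending-sorted top-K logit values, the window size and EMA smoothing factor are arbitrary defaults, and the density matrix is normalised by the number of vectors seen so far until the window fills.

```python
import numpy as np
from collections import deque

class QEWSSketch:
    """Hypothetical density-operator early warning signal."""

    def __init__(self, top_k: int = 64, window: int = 16, ema_alpha: float = 0.1):
        self.top_k = top_k
        self.states = deque(maxlen=window)   # rolling window of psi vectors
        self.alpha = ema_alpha
        self.ema = None

    def update(self, logits):
        # psi: L2-normalised top-K logits, sorted descending (assumed ordering)
        top = np.sort(np.asarray(logits, dtype=float))[-self.top_k:][::-1]
        psi = top / (np.linalg.norm(top) + 1e-12)
        self.states.append(psi)

        # rho = mean of outer products psi psi^T over the window (trace 1)
        stacked = np.stack(self.states)
        rho = stacked.T @ stacked / len(stacked)

        # von Neumann entropy: -sum(lambda * log(lambda)) over nonzero eigenvalues
        eig = np.linalg.eigvalsh(rho)
        eig = eig[eig > 1e-12]
        s = float(-(eig * np.log(eig)).sum())

        # signal: deviation from the running exponential moving average
        if self.ema is None:
            self.ema = s
        signal = s - self.ema
        self.ema = (1 - self.alpha) * self.ema + self.alpha * s
        return signal, s

qews = QEWSSketch()
for _ in range(5):
    steady_signal, steady_s = qews.update(np.arange(64.0))      # unchanged shape
shift_signal, shift_s = qews.update(np.arange(64.0) ** 2)       # shape changes
```

While the logit shape stays the same the window holds a single repeated state, the density matrix has rank one, and the entropy stays near zero. A change in shape adds a second direction to the window and lifts the entropy above its moving average, which is what the signal reports.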
How it's different from Entropix
Entropix uses entropy thresholds to switch between predefined sampling strategies (e.g., "high entropy → increase temperature"). This is open-loop control: a lookup table with no feedback.
Cybernetic Entropy Control is closed-loop. The controller continuously tracks the error signal and its derivatives, adjusting actuators proportionally to how fast and in which direction entropy is moving. The acceleration term gives it a predictive edge — it responds to the onset of an entropy spike, not the spike itself. The velocity form gives it memory — it accumulates corrections rather than reacting from scratch at each step.
Usage
```bash
# Baseline (no control)
python run.py -m 2b -d data/hendrycks_math.parquet \
    --limit 200 --max-tokens 4096 -o results/baseline.jsonl

# 4th-order entropy control (velocity form)
python run.py -m 2b -d data/hendrycks_math.parquet \
    --limit 200 --max-tokens 4096 --control --H-target 0.1 \
    --K-M 0.005 0.03 0.04 0.08 \
    --K-P 0.005 0.02 0.03 0.05 \
    --K-F 0.002 0.01 0.015 0.025 \
    -o results/controlled.jsonl

# QEWS hybrid mode (best configuration)
python run.py -m 2b -d data/hendrycks_math.parquet \
    --limit 200 --max-tokens 4096 --control --H-target 0.1 \
    --K-M 0.005 0.03 0.04 0.08 \
    --K-P 0.005 0.02 0.03 0.05 \
    --K-F 0.002 0.01 0.015 0.025 \
    --qews-mode hybrid --qews-target 0.0 --w-H 1.0 --w-Q 1.0 \
    --qews-K-M 0.005 0.03 0.04 0.08 \
    --qews-K-P 0.005 0.02 0.03 0.05 \
    --qews-K-F 0.002 0.01 0.015 0.025 \
    -o results/qews_hybrid.jsonl

# Run full sweep
bash run_full_math_ab.sh

# Analyze results
python analyze.py results/*.jsonl --sort accuracy --md

# Interactive mode
python run.py -m 2b --control --H-target 0.1
```
Requirements
- Python 3.10+
- llama-cpp-python with CUDA
- numpy, torch, pyarrow
- A GGUF model (tested with Qwen 3.5 family)
Roadmap
- Velocity form controller (done, v2)
- Multi-actuator + QEWS hybrid combined run (done, best result)
- Full 5000-problem benchmark sweep (done)
- Single-actuator ablation (min_p, top_p, repeat_penalty): in progress
- Token position as secondary sensor (budget pressure b(t) = t/t_max)
- TECA (Token Entropy Cumulative Average) as sensor
- Decoupled temperature: measure H(t) at fixed temperature, sample at dynamic temperature
- QEWS window size ablation (W=8, W=16)
- KV cache spectral reshaping as actuator (post GPU upgrade)
- Dual-model dynamic setpoint via speculative decoding
- Automated gain tuning
- Additional benchmarks (GSM8K, TruthfulQA, MuSiQue)
Acknowledgments
Motivated by the semantic entropy work of Farquhar et al. (2024), the QEWS density operator framework of Gong, Sedai, and Medda (2025), and the Token Entropy Cumulative Average (TECA) introduced by Bin et al. (2025).
License
MIT
Funding
This project is self-funded. If you'd like to support the research, the Manifund project covers a GPU upgrade that would significantly accelerate experimentation.