Making AI reliability a quantitative engineering discipline
We develop rigorous information-theoretic foundations for understanding and controlling large language model behavior—transforming unpredictable failures into quantifiable, preventable phenomena.
Why do AI systems hallucinate?
Most teams treat hallucinations as unpredictable bugs—random failures that can only be caught after the fact. We reject this premise.
Our research reveals two fundamentally different failure modes:
Factual Hallucinations
The model lacks the relevant knowledge. A retrieval or capacity problem that can potentially be fixed with more data or better retrieval.
Procedural Hallucinations
The model possesses the information but fails to route it to the output. A binding/gating problem that cannot be fixed by scaling or retrieval—the information is already present.
"On error trials where the model outputs the wrong answer, linear probes recover the correct answer from the model's own hidden states with 74% accuracy. The information is present but not routed."
The key insight
Transformers operate as "Bayesian in expectation, not in realization"—achieving optimal compression when averaged over input orderings while necessarily deviating for any fixed sequence.
This resolves a fundamental paradox: models can simultaneously exhibit near-Bayesian inference and systematically violate permutation invariance.
From theory to deployment
Every theoretical result yields deployable tools. We derive closed-form metrics that practitioners can compute from API logprobs.
Bits-to-Trust
How many bits of information are required to shift from prior uncertainty to target reliability level.
Risk-of-Hallucination
Probability that the model will hallucinate given current evidence, computed before generation.
Information Sufficiency Ratio
Ratio of observed information gain to required information, determining answer vs. abstain.
The EDFL Theorem
Our Expectation-level Decompression Law provides exact bounds: for fixed reliability 1−ε, achieving that reliability for rare events requires at least:
Empirically, hallucination rates drop by approximately 0.9% per additional nat of relevant information.
Where we're pushing
Compression-Learning Equivalence
Positional encodings induce O(log n) martingale violations with explicit constants, explaining "lost-in-the-middle" as compression failure.
Causal Interventions
Using null interventions do(E = ∅) to isolate model biases with tight Bernoulli-projected bounds on required information.
Optimal CoT Length
Closed-form expressions: k* = Θ(√n · log(1/ε)) for when extended reasoning helps vs. hurts.
Mechanistic Interpretability
Activation patching finds late attention restores correct bindings while late MLPs corrupt them.
Prompt Injection Audit
Distributional testing across wrapper formats, placements, and encodings. Move beyond single-payload testing to systematic vulnerability assessment.
Explore methodology →Optimal Chain-of-Thought
When does extended reasoning help vs. hurt? Understand the information-theoretic bounds on CoT length and avoid the "longer is better" trap.
Learn the framework →Our methodology
We combine rigorous theory with practical validation:
Mathematical Proofs
Fano inequalities, data processing inequalities, Girsanov theorem, change-of-measure arguments
Empirical Validation
Controlled experiments across GPT, Qwen, Llama, and Gemma with pre-specified analysis plans
Practical Toolkits
Everything deployable from standard API access with logprobs—no fine-tuning required
Our north star
Make model reliability a quantitative engineering discipline rather than an empirical guessing game.
Three tools, one goal: catch models that know but don't use.
Ready to make your AI systems reliable?
Join teams using information-theoretic tools to deploy AI with confidence.