Research

Making AI reliability a quantitative engineering discipline

We develop rigorous information-theoretic foundations for understanding and controlling large language model behavior—transforming unpredictable failures into quantifiable, preventable phenomena.

The Problem

Why do AI systems hallucinate?

Most teams treat hallucinations as unpredictable bugs—random failures that can only be caught after the fact. We reject this premise.

Our research reveals two fundamentally different failure modes:

Factual Hallucinations

The model lacks the relevant knowledge. A coverage or capacity problem that can potentially be addressed with more training data or better retrieval.

Procedural Hallucinations

The model possesses the information but fails to route it to the output. A binding/gating problem that cannot be fixed by scaling or retrieval—the information is already present.

"On error trials where the model outputs the wrong answer, linear probes recover the correct answer from the model's own hidden states with 74% accuracy. The information is present but not routed."

The key insight

Transformers operate as "Bayesian in expectation, not in realization"—achieving optimal compression when averaged over input orderings while necessarily deviating for any fixed sequence.

This resolves a fundamental paradox: models can simultaneously exhibit near-Bayesian inference and systematically violate permutation invariance.
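A toy numerical illustration of the distinction, not the theorem itself: an order-invariant (Laplace) predictor assigns the same code length to every ordering of a data multiset, while a predictor perturbed by a position-dependent term does not. The sketch prints the order-invariant code length, the perturbed predictor's average over all orderings, and its spread across individual orderings; the data and the perturbation are illustrative.

```python
# Toy illustration of "in expectation vs. in realization" for sequence coding.
import itertools
import numpy as np

data = np.array([1, 1, 0, 1, 0, 0, 1, 0])  # a fixed multiset of observations

def code_length(seq, position_bias=0.0):
    """Sequential log-loss (nats) of a Laplace-rule predictor, optionally
    perturbed by a position-dependent term that breaks permutation invariance."""
    nats, ones = 0.0, 0
    for t, x in enumerate(seq):
        p1 = (ones + 1) / (t + 2)                        # Laplace estimate
        p1 = np.clip(p1 + position_bias * (-1) ** t, 0.05, 0.95)
        nats += -np.log(p1 if x == 1 else 1 - p1)
        ones += x
    return nats

perms = list(itertools.permutations(data))
laplace = code_length(data)                              # same for every ordering
perturbed = np.array([code_length(p, 0.15) for p in perms])

print(f"order-invariant (Laplace) code length: {laplace:.3f} nats")
print(f"position-perturbed, averaged over orderings: {perturbed.mean():.3f} nats")
print(f"position-perturbed, single orderings: "
      f"[{perturbed.min():.3f}, {perturbed.max():.3f}] nats")
```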

Our Approach

From theory to deployment

Every theoretical result yields deployable tools. We derive closed-form metrics that practitioners can compute directly from API logprobs; a worked sketch of how they combine appears after the EDFL statement below.

B2T

Bits-to-Trust

How many bits of information are required to shift from prior uncertainty to a target reliability level.

RoH

Risk-of-Hallucination

Probability that the model will hallucinate given current evidence, computed before generation.

ISR

Information Sufficiency Ratio

Ratio of observed information gain to required information, which determines whether to answer or abstain.

The EDFL Theorem

Our Expectation-level Decompression Law provides exact bounds: achieving a fixed reliability of 1−ε on a rare event with prior probability p₀ requires at least:

(1−ε)·log(1/p₀) + O(p₀) nats

Empirically, hallucination rates drop by approximately 0.9% per additional nat of relevant information.
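Here is that worked sketch. It uses only the leading term of the bound above as the information requirement; the function names, the choice ε = 0.05, and the answer-if-ISR-reaches-one rule are illustrative stand-ins for the exact closed forms, and RoH is omitted because its closed form is not reproduced here.

```python
# Hypothetical sketch of the answer/abstain recipe, using only the leading
# term of the EDFL bound above. Names and thresholds are illustrative.
import math

def bits_to_trust(p0: float, epsilon: float) -> float:
    """Leading term of the EDFL requirement, in nats: (1 - eps) * log(1 / p0)."""
    return (1 - epsilon) * math.log(1 / p0)

def decide(prior_logprob: float, posterior_logprob: float, epsilon: float = 0.05) -> dict:
    """Compare the observed information lift on a candidate answer against the
    EDFL-style requirement and answer only when the ratio reaches one."""
    p0 = math.exp(prior_logprob)               # prior probability of the answer
    b2t = bits_to_trust(p0, epsilon)           # required information (nats)
    gain = posterior_logprob - prior_logprob   # observed lift once evidence is added
    isr = gain / b2t                           # information sufficiency ratio
    return {"B2T_nats": round(b2t, 3), "ISR": round(isr, 3),
            "action": "answer" if isr >= 1.0 else "abstain"}

# Worked numbers: the leading term for p0 = 1e-3 and eps = 0.05 is about 6.56 nats.
print(bits_to_trust(1e-3, 0.05))

# Example decision: a candidate answer moves from logprob -4.0 (no evidence)
# to -0.1 (with evidence), a lift of 3.9 nats against a requirement of 3.8.
print(decide(prior_logprob=-4.0, posterior_logprob=-0.1))
```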

Active Research

Where we're pushing

01

Compression-Learning Equivalence

Positional encodings induce O(log n) martingale violations with explicit constants, explaining "lost-in-the-middle" as a compression failure.

02

Causal Interventions

Using null interventions do(E = ∅) to isolate model biases with tight Bernoulli-projected bounds on required information.

03

Optimal CoT Length

A closed-form expression, k* = Θ(√n · log(1/ε)), for the chain-of-thought length at which extended reasoning stops helping and starts hurting.

04

Mechanistic Interpretability

Activation patching shows that late attention layers restore correct bindings while late MLP layers corrupt them (sketched below).
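For concreteness, a generic single-site activation patch on GPT-2 small, not our exact experimental setup: cache one late attention block's output at the final position on a clean prompt, splice it into a run on a corrupted prompt, and compare the target token's logit. The prompts, layer index, and target token are illustrative.

```python
# Generic activation-patching sketch (illustrative, not our exact setup).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

clean = "The Eiffel Tower is located in the city of"
corrupt = "The Colosseum is located in the city of"
target = tok.encode(" Paris")[0]   # logit tracked at the final position
layer = 10                         # a late layer in GPT-2 small (12 layers)
attn = model.transformer.h[layer].attn

cache = {}

def _tensor(output):
    # The attention module may return a tuple; its first element is the output.
    return output[0] if isinstance(output, tuple) else output

def save_hook(module, inputs, output):
    cache["clean_attn"] = _tensor(output).detach().clone()

def patch_hook(module, inputs, output):
    out = _tensor(output).clone()
    out[:, -1, :] = cache["clean_attn"][:, -1, :]   # splice in clean activation
    return (out,) + output[1:] if isinstance(output, tuple) else out

def target_logit(prompt):
    logits = model(**tok(prompt, return_tensors="pt")).logits
    return logits[0, -1, target].item()

with torch.no_grad():
    handle = attn.register_forward_hook(save_hook)
    print("clean:  ", target_logit(clean))
    handle.remove()

    print("corrupt:", target_logit(corrupt))

    handle = attn.register_forward_hook(patch_hook)
    print("patched:", target_logit(corrupt))
    handle.remove()
```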

Our methodology

We combine rigorous theory with practical validation:

Mathematical Proofs

Fano inequalities, data processing inequalities, Girsanov's theorem, and change-of-measure arguments

Empirical Validation

Controlled experiments across GPT, Qwen, Llama, and Gemma with pre-specified analysis plans

Practical Toolkits

Everything deployable from standard API access with logprobs, with no fine-tuning required (sketched below)
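A minimal sketch of the raw ingredient, assuming the OpenAI Python SDK's chat-completions logprobs support; the model name and prompts are illustrative, and a rigorous comparison would score the same candidate answer token in both calls rather than whichever token happens to be sampled first.

```python
# Reading per-token logprobs from a standard chat-completions API
# (assumes the OpenAI Python SDK >= 1.0; model and prompts are illustrative).
from openai import OpenAI

client = OpenAI()

def first_token_logprob(prompt: str) -> float:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=1,
        logprobs=True,
    )
    return resp.choices[0].logprobs.content[0].logprob

question = "In what year was the Eiffel Tower completed? Answer with the year only."
evidence = "Context: The Eiffel Tower was completed in 1889.\n\n"

# Lift, in nats, of the first answer token once the evidence is added:
# the raw quantity the metrics above consume.
prior = first_token_logprob(question)
posterior = first_token_logprob(evidence + question)
print(f"information lift: {posterior - prior:.3f} nats")
```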

Our north star

Make model reliability a quantitative engineering discipline rather than an empirical guessing game.

Ready to make your AI systems reliable?

Join teams using information-theoretic tools to deploy AI with confidence.