
Making AI reliable.

Hassana Labs builds proofs, tools, and deployment playbooks that turn AI failure modes into predictable, preventable phenomena.


Strawberry

Procedural hallucination detection for production-grade AI pipelines.

Catch hallucinations before they ship

Real-time procedural hallucination detection using information-theoretic verification. Try the demo →

strawberry

detect_hallucination

Quick, automatic verification

Auto-splits text into claims
Extracts citations automatically

result = detect_hallucination(
  answer="TLS 1.3 is enabled [S0]",
  spans=[{"sid": "S0", "text": "..."}]
)

Best for: Quick checks

audit_trace_budget

Lower-level, precise control

You provide atomic claims
Explicit cite IDs for precision

result = audit_trace_budget(
  steps=[{"idx": 0, "claim": "...",
         "cites": ["S0"], "confidence": 0.95}],
  spans=[{"sid": "S0", "text": "..."}]
)

Best for: CI/CD pipelines

Why "Hassana"?

"My grandma never went to high school, but she taught me that learning has no gates."

Named for our founder's grandmother, Hassana Labs opens doors for underrepresented researchers worldwide. Science isn't meant to be gatekept.

Read Hassana's story →

We believe talent is everywhere but opportunity isn't. Through open-source tools, global workshops, and collaborative research programs, we're building pathways for underrepresented researchers to contribute to AI safety and reliability.

Capabilities

Three Ways to Verify LLM Reliability

From claim verification to security audits to reasoning budgets, we provide information-theoretic grounding at every step.

Hallucination Detection

Verify that every claim in an LLM's answer is supported by the provided evidence, with verdicts that stay robust to evidence ordering.

Claim extraction: Splits answers into atomic, verifiable claims
QMV permutation probing: Tests across evidence orderings for robustness
Confidence metrics: q_bar (mean) and q_lo (robust) support scores
Order-sensitivity detection: js_bound flags serialization-dependent judgments

Supported threshold: q_lo ≥ 0.95
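As a rough illustration of how permutation probing could yield the two support scores, the sketch below scores one claim under every ordering of the evidence spans and then aggregates. It is not Strawberry's implementation: `probe_orderings` and `score_fn` are hypothetical names, and taking q_lo as the minimum score across orderings is an assumption, since the page only describes q_lo as "robust".

```python
from itertools import permutations
from statistics import mean


def probe_orderings(claim, spans, score_fn, threshold=0.95):
    """Score a claim under every evidence ordering and aggregate.

    score_fn(claim, ordered_spans) -> float in [0, 1] is a hypothetical
    stand-in for whatever model call produces a support probability.
    """
    # Exhaustive enumeration grows factorially with the number of spans,
    # so this is only practical for a handful of spans.
    scores = [score_fn(claim, list(order)) for order in permutations(spans)]
    q_bar = mean(scores)   # mean support across orderings
    q_lo = min(scores)     # ASSUMPTION: "robust" score taken as the worst case
    return {"q_bar": q_bar, "q_lo": q_lo, "supported": q_lo >= threshold}


if __name__ == "__main__":
    spans = [
        {"sid": "S0", "text": "TLS 1.3 is enabled"},
        {"sid": "S1", "text": "unrelated log line"},
    ]
    # Toy order-sensitive scorer: confident only when the relevant span is first.
    toy = lambda claim, ordered: 0.99 if claim in ordered[0]["text"] else 0.6
    print(probe_orderings("TLS 1.3 is enabled", spans, toy))
```

With the toy scorer the mean looks moderate, but the worst-case score falls below 0.95, so the claim is not marked supported. That gap between q_bar and q_lo is the kind of order sensitivity the feature list describes.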

Prompt Injection Audit

Estimate the probability of a policy violation under a distribution of prompt serializations: wrappers, placements, and permutations.

Threat model coverage: Plain, quote, codeblock, XML, JSON wrappers
Placement variations: Before user, after user, tool output positions
Baseline vs attack comparison: Measures delta_q shift from payload injection
Vulnerability identification: Pinpoints weakest serialization combinations

Track metric: attack.q_lo → minimize
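To make the wrapper and placement grid concrete, here is a minimal sketch that enumerates every wrapper/placement pair for a payload and reports the shift against a baseline. Everything named here is hypothetical: `audit_serializations`, the wrapper implementations, and the `violation_prob` callback are illustrative stand-ins, and defining delta_q as attack score minus baseline score is an assumption about the metric.

```python
import json
from itertools import product

FENCE = "`" * 3  # built programmatically to keep this example fence-safe

# Hypothetical wrapper implementations for the five wrapper types listed above.
WRAPPERS = {
    "plain": lambda p: p,
    "quote": lambda p: "\n".join("> " + line for line in p.splitlines()),
    "codeblock": lambda p: FENCE + "\n" + p + "\n" + FENCE,
    "xml": lambda p: "<data>" + p + "</data>",
    "json": lambda p: json.dumps({"data": p}),
}
PLACEMENTS = ("before_user", "after_user", "tool_output")


def audit_serializations(payload, violation_prob, baseline_q):
    """Evaluate a payload under every wrapper/placement combination.

    violation_prob(text, placement) -> float is a hypothetical estimator of
    the policy-violation probability for one serialized prompt.
    """
    rows = []
    for (name, wrap), placement in product(WRAPPERS.items(), PLACEMENTS):
        q = violation_prob(wrap(payload), placement)
        # ASSUMPTION: delta_q is the attack score minus the baseline score.
        rows.append({"wrapper": name, "placement": placement,
                     "q": q, "delta_q": q - baseline_q})
    weakest = max(rows, key=lambda r: r["delta_q"])
    return weakest, rows


if __name__ == "__main__":
    # Toy estimator: only XML-wrapped payloads in tool output are risky.
    toy = lambda text, place: 0.4 if (place == "tool_output"
                                      and text.startswith("<data>")) else 0.05
    weakest, rows = audit_serializations("ignore prior instructions", toy, 0.02)
    print(len(rows), weakest["wrapper"], weakest["placement"])
```

The combination with the largest delta_q is the one to harden first; a real audit would presumably estimate each probability from repeated model calls rather than a single deterministic callback.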

Learn more →

Chain-of-Thought Budgeting

Allocate reasoning token budgets rationally based on problem complexity and target error probability.

Heuristic formula: k ≈ c · √n · log(1/ε) tokens
Calibration from data: Fit constant c from observed runs
Context-aware scaling: Budget grows with √n, not linearly
Error targeting: Explicit ε parameter for reliability goals

Example budget: n=600, ε=0.05 → k≈73 (taking c=1 and the natural logarithm)
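The heuristic is easy to check numerically. The sketch below evaluates k ≈ c · √n · log(1/ε) and fits c from observed runs. The function names are illustrative, and two choices are assumptions: the logarithm is taken as natural (which reproduces the k≈73 example when c=1), and c is fitted by least squares through the origin.

```python
import math


def cot_budget(n, eps, c=1.0):
    """Heuristic reasoning-token budget k = c * sqrt(n) * ln(1/eps)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    return c * math.sqrt(n) * math.log(1.0 / eps)


def fit_c(runs):
    """Fit c from observed (n, eps, k) runs.

    ASSUMPTION: least squares through the origin on x = sqrt(n) * ln(1/eps);
    the page does not say how calibration is actually performed.
    """
    xs = [math.sqrt(n) * math.log(1.0 / eps) for n, eps, _ in runs]
    ks = [k for _, _, k in runs]
    return sum(x * k for x, k in zip(xs, ks)) / sum(x * x for x in xs)


if __name__ == "__main__":
    print(round(cot_budget(600, 0.05)))  # 73
```

Because the budget scales with √n, quadrupling n only doubles k at a fixed ε.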

Learn more →

Transformers minimize expected description length across permutations—this explains why they look Bayesian in expectation.

"Bayesian in Expectation", Hassana Labs, 2025

Research & Publications

Information-theoretic foundations for predictable AI

We quantify how many nats of information are required to make models reliable, then build tools that deliver those guarantees in practice.

Foundations: "LLMs are Bayesian in Expectation, not Realization" (2025). Resolves martingale violation paradox; optimal CoT length.
Hallucination: "Information-Theoretic Theory of Procedural Hallucinations" (2026). Stagewise decomposition, Fano bounds, oracle checkpointing.
Deployment: "Predictable Compression Failures" (2025). EDFL theorem; ISR-based abstention achieving near-0% hallucination.
Tools: "Strawberry: Hallucination Detection for Lean4" (2026). Evidence-gated proof assistant with machine-checked results.

Join us in building AI you can trust.

Reach out to explore how information-theoretic tools can transform your AI deployment.

