
Making AI reliable.

Hassana Labs builds proofs, tools, and deployment playbooks that turn AI failure modes into predictable, preventable phenomena.

[Screenshot: Strawberry Hallucination Detector - Claude Code integration showing guaranteed error bounds]
Strawberry

Procedural hallucination detection for production-grade AI pipelines.

Strawberry

Catch hallucinations before they ship

Real-time procedural hallucination detection using information-theoretic verification. Try the demo →

strawberry

detect_hallucination

Quick, automatic verification

  • Auto-splits text into claims
  • Extracts citations automatically
result = detect_hallucination(
  answer="TLS 1.3 is enabled [S0]",
  spans=[{"sid": "S0", "text": "..."}]
)
Best for: Quick checks

audit_trace_budget

Lower-level, precise control

  • You provide atomic claims
  • Explicit cite IDs for precision
result = audit_trace_budget(
  steps=[{"idx": 0, "claim": "...",
          "cites": ["S0"], "confidence": 0.95}],
  spans=[{"sid": "S0", "text": "..."}]
)
Best for: CI/CD pipelines
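
A minimal sketch of the CI/CD use case, assuming the audit result exposes a q_lo support score like the metrics described under Capabilities below; the field name, the 0.95 cutoff, and the gate() helper are illustrative, not part of the documented API.

import sys

Q_LO_THRESHOLD = 0.95  # mirrors the q_lo ≥ 0.95 threshold quoted under Capabilities

def gate(result: dict) -> None:
    """Fail the CI job when robust support falls below the threshold."""
    q_lo = result.get("q_lo", 0.0)  # assumed field name, not a documented schema
    if q_lo < Q_LO_THRESHOLD:
        print(f"Reliability gate failed: q_lo={q_lo:.3f} < {Q_LO_THRESHOLD}")
        sys.exit(1)  # non-zero exit blocks the pipeline
    print(f"Reliability gate passed: q_lo={q_lo:.3f}")

# result = audit_trace_budget(steps=[...], spans=[...])  # as in the card above
# gate(result)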
Why "Hassana"?
"My grandma never went to high school, but she taught me that learning has no gates."

Named for our founder's grandmother, Hassana Labs opens doors for underrepresented researchers worldwide. Science isn't meant to be gatekept.

Read Hassana's story
We believe talent is everywhere, but opportunity isn't. Through open-source tools, global workshops, and collaborative research programs, we're building pathways for underrepresented researchers to contribute to AI safety and reliability.
Capabilities

Three Ways to Verify LLM Reliability

From claim verification to security audits to reasoning budgets—we provide information-theoretic grounding at every step.

01

Hallucination Detection

Verify that every claim in an LLM's answer is actually supported by provided evidence—robust to evidence ordering.

  • Claim extraction: Splits answers into atomic, verifiable claims
  • QMV permutation probing: Tests across evidence orderings for robustness
  • Confidence metrics: q_bar (mean) and q_lo (robust) support scores
  • Order-sensitivity detection: js_bound flags serialization-dependent judgments
Supported threshold: q_lo ≥ 0.95 (illustrative sketch below)
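
To make the permutation-probing idea concrete, here is a schematic sketch; it is not Strawberry's implementation. The score function is a placeholder support judge, and the mean/min/spread aggregation only mirrors the spirit of q_bar, q_lo, and the js_bound order-sensitivity flag.

import itertools
from statistics import mean
from typing import Callable, Sequence

def permutation_probe(
    answer: str,
    spans: Sequence[dict],
    score: Callable[[str, Sequence[dict]], float],  # placeholder judge returning a value in [0, 1]
    max_perms: int = 24,
) -> dict:
    """Score one answer under several evidence orderings, then aggregate."""
    orderings = itertools.islice(itertools.permutations(spans), max_perms)
    scores = [score(answer, list(p)) for p in orderings]
    return {
        "q_bar": mean(scores),                      # mean support across orderings
        "q_lo": min(scores),                        # worst-case (robust) support
        "order_spread": max(scores) - min(scores),  # crude order-sensitivity proxy
    }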
02

Prompt Injection Audit

Estimate policy violation probability under a distribution of prompt serializations—wrappers, placements, and permutations.

  • Threat model coverage: Plain, quote, codeblock, XML, JSON wrappers
  • Placement variations: Before user, after user, tool output positions
  • Baseline vs. attack comparison: Measures delta_q shift from payload injection
  • Vulnerability identification: Pinpoints weakest serialization combinations
Track metric: attack.q_lo → minimize (illustrative sketch below)
Learn more →
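
A rough sketch of the serialization grid such an audit sweeps. The wrapper and placement names come from the bullets above; render_prompt and violation_rate are illustrative placeholders rather than the audit's real API, and delta_q is simply the attack-minus-baseline shift described above.

from typing import Callable

WRAPPERS = ["plain", "quote", "codeblock", "xml", "json"]
PLACEMENTS = ["before_user", "after_user", "tool_output"]

def serialization_grid(
    user_msg: str,
    payload: str,
    render_prompt: Callable[[str, str, str, str], str],  # placeholder renderer
    violation_rate: Callable[[str], float],              # placeholder P(policy violation)
) -> list[dict]:
    """Compare baseline vs. attacked prompts across wrapper x placement combos."""
    rows = []
    for wrapper in WRAPPERS:
        for placement in PLACEMENTS:
            baseline = violation_rate(render_prompt(user_msg, "", wrapper, placement))
            attack = violation_rate(render_prompt(user_msg, payload, wrapper, placement))
            rows.append({"wrapper": wrapper, "placement": placement,
                         "delta_q": attack - baseline})  # shift caused by the injected payload
    # The weakest serializations are the combinations with the largest delta_q.
    return sorted(rows, key=lambda r: r["delta_q"], reverse=True)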
03

Chain-of-Thought Budgeting

Allocate reasoning token budgets rationally based on problem complexity and target error probability.

  • Heuristic formula: k ≈ c · √n · log(1/ε) tokens
  • Calibration from data: Fit constant c from observed runs
  • Context-aware scaling: Budget grows with √n, not linearly
  • Error targeting: Explicit ε parameter for reliability goals
Example budget: n=600, ε=0.05 → k≈73 (helper sketch below)
Learn more →
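
The heuristic above as a small helper; c defaults to 1 purely for illustration and would in practice be fitted from observed runs, as the calibration bullet notes.

import math

def reasoning_budget(n: int, eps: float, c: float = 1.0) -> int:
    """Token budget k ≈ c · sqrt(n) · log(1/eps), rounded to the nearest token."""
    return round(c * math.sqrt(n) * math.log(1.0 / eps))

print(reasoning_budget(600, 0.05))  # 73, matching the example above (with c = 1)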
"

Transformers minimize expected description length across permutations—this explains why they look Bayesian in expectation.

"
Bayesian in Expectation Hassana Labs, 2025
From the Blog

Latest research notes


Join us in building AI you can trust.

Reach out to explore how information-theoretic tools can transform your AI deployment.