CausalLens is a middleware layer that sits between any LLM and any application to verify causal claims in the model's output. Drop it in front of your chatbot, agent, RAG pipeline, or evaluation harness and get back a structured report of every causal claim the model made along with a verdict for each one.
Large language models are trained to predict the next token given everything that came before. That objective rewards producing text that looks like the kinds of sentences humans write, including sentences of the form "X causes Y". What it does not reward is checking whether X actually causes Y. LLMs therefore emit causal claims that are:
- confounded - a hidden variable drives both sides,
- reversed - the model states effect-to-cause as cause-to-effect,
- spurious - the two concepts merely co-occur in training data,
- or contradictory - two claims in the same response imply a cycle.
In domains like medicine, law, science, and finance, acting on a correlation as though it were causation is dangerous. CausalLens catches those claims before they reach a user.
Consider a medical assistant that replies:
"Drinking coffee causes cancer because it is frequently consumed by cancer patients."
The grammar is fine, the sentence is confident, and the shape of the argument is familiar. The claim is also wrong: heavy coffee drinkers happen to smoke more, and smoking is the confounder driving the cancer association. A user who takes the answer at face value may give up their morning coffee for no benefit; a clinician who relies on it may mislead a patient.
CausalLens extracts the claim coffee -> cancer, builds a causal
graph from the surrounding text, runs the claim through DoWhy, and
returns a verdict:

    [CORRELATION] coffee -> cancer
    Estimated effect is close to zero or does not survive refutation.

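The confounding in this example can be reproduced with a small simulation. The sketch below is not part of CausalLens and uses only the standard library; the population, the probabilities, and the function names are invented for illustration. By construction, smoking raises both coffee drinking and cancer risk, while coffee has no effect on cancer.

```python
import random

def simulate(n=50_000, seed=0):
    """Synthetic population (made-up probabilities): smoking drives both
    coffee drinking and cancer; coffee itself never affects cancer."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        smoker = rng.random() < 0.3
        coffee = rng.random() < (0.8 if smoker else 0.4)
        cancer = rng.random() < (0.20 if smoker else 0.02)
        rows.append((smoker, coffee, cancer))
    return rows

def risk_difference(rows):
    """Cancer rate among coffee drinkers minus the rate among non-drinkers."""
    drinkers = [cancer for _, coffee, cancer in rows if coffee]
    others = [cancer for _, coffee, cancer in rows if not coffee]
    return sum(drinkers) / len(drinkers) - sum(others) / len(others)

rows = simulate()

# Naive comparison: ignores smoking, so the confounder leaks into the estimate.
naive = risk_difference(rows)

# Backdoor adjustment by stratification: compare within each smoking stratum,
# then average the per-stratum differences weighted by stratum size.
adjusted = 0.0
for is_smoker in (True, False):
    stratum = [row for row in rows if row[0] == is_smoker]
    adjusted += risk_difference(stratum) * len(stratum) / len(rows)

print(f"naive risk difference:    {naive:.3f}")
print(f"adjusted risk difference: {adjusted:.3f}")
```

The naive comparison shows a clear gap in cancer rates between coffee drinkers and non-drinkers, while the stratified estimate is close to zero. That is the pattern the [CORRELATION] verdict describes.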

Install the package with pip, then download the spaCy English model:

    pip install causallens
    python -m spacy download en_core_web_sm

CausalLens depends on spaCy, NetworkX, and DoWhy. These are installed automatically.

    from causallens import CausalLens

    cl = CausalLens()
    result = cl.verify("smoking causes cancer and also causes yellow teeth")
    print(result.claims)
    print(result.report)

result.claims is a list of CausalClaim objects:

    [
        CausalClaim(cause='smoking', effect='cancer', cue='causes', confidence=0.8),
        CausalClaim(cause='smoking', effect='yellow teeth', cue='causes', confidence=0.8),
    ]

result.report is a human-readable summary of the verification:

    CausalLens verification report
    ========================================
    Input length: 51 characters
    Claims found: 2

    Summary:
      causal 1
      correlation 1
      contradicted 0
      unverifiable 0

    Claim-by-claim:
      1. [CAUSAL] smoking -> cancer (cue='causes', estimate=1.035)
      2. [CORRELATION] smoking -> yellow teeth (cue='causes', estimate=0.031)


To verify the output of an existing LLM callable, wrap it:

    from causallens import CausalLens

    def my_llm(prompt: str) -> str:
        ...  # call OpenAI, Anthropic, Ollama, etc.

    cl = CausalLens()
    safe_llm = cl.wrap(my_llm)
    result = safe_llm("Why is smoking harmful?")
    for v in result.verdicts:
        print(v.verdict, v.claim.cause, "->", v.claim.effect)

CausalLens runs a three-step pipeline.
causallens.extractor.ClaimExtractor parses the text with spaCy and
looks for causal-language patterns: causes, leads to, results
in, because of, due to, triggers, produces, and friends. Each
match yields a CausalClaim(cause, effect, cue, sentence, confidence).
The extractor handles forward cues (A causes B), backward cues (B because of A), and coordinated phrases (A causes B and C).
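To make the three cue shapes concrete, here is a deliberately simplified sketch that uses regular expressions instead of spaCy. It is not the ClaimExtractor implementation: the `Claim` dataclass, the `extract` function, and the cue lists are illustrative stand-ins, and a real parser handles far more sentence structures than this.

```python
import re
from dataclasses import dataclass

@dataclass
class Claim:  # illustrative stand-in, not the library's CausalClaim
    cause: str
    effect: str
    cue: str

FORWARD_CUES = ("causes", "leads to", "results in", "triggers", "produces")
BACKWARD_CUES = ("because of", "due to")

def _cue_pattern(cues):
    """Match '<left> <cue> <right>' for any cue in the given list."""
    alternatives = "|".join(re.escape(cue) for cue in cues)
    return re.compile(rf"^(.+?)\s+({alternatives})\s+(.+)$", re.IGNORECASE)

FORWARD_RE = _cue_pattern(FORWARD_CUES)
BACKWARD_RE = _cue_pattern(BACKWARD_CUES)

# Splits "B and C" or "B, C, and D" into separate effect phrases.
COORDINATION_RE = re.compile(r"\s*,\s*(?:and\s+)?|\s+and\s+")

def extract(sentence):
    """Return the causal claims found in a single sentence."""
    text = sentence.strip().rstrip(".")
    match = FORWARD_RE.match(text)
    if match:
        # Forward cue: the cause is on the left, one or more effects on the right.
        cause, cue, effects = match.groups()
        return [
            Claim(cause.strip(), effect.strip(), cue.lower())
            for effect in COORDINATION_RE.split(effects)
            if effect.strip()
        ]
    match = BACKWARD_RE.match(text)
    if match:
        # Backward cue: the effect is on the left, the cause on the right.
        effect, cue, cause = match.groups()
        return [Claim(cause.strip(), effect.strip(), cue.lower())]
    return []

for claim in extract("smoking causes cancer and yellow teeth"):
    print(claim)
print(extract("the flight was delayed due to fog"))
```

A pattern-only approach like this cannot tell a noun phrase from a clause, which is one reason to rely on a dependency parse for real extraction.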
causallens.graph.CausalGraph builds a networkx.MultiDiGraph in
which every cause/effect phrase is a node and every claim is an edge
with a confidence score. The graph exposes a GML export that feeds
directly into DoWhy and a DOT export for Graphviz visualization. It
also detects cycles, which are automatically surfaced as
unverifiable.
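The cycle check can be illustrated without NetworkX. The following sketch is a standard-library stand-in, not the CausalGraph class: it stores claims in an adjacency dictionary that keeps parallel edges, finds a cycle with a depth-first search, and writes a DOT string. The function names and the claim tuples are assumptions made for this example.

```python
from collections import defaultdict

def build_graph(claims):
    """claims: iterable of (cause, effect, confidence) tuples.
    Parallel edges are kept, as they would be in a multigraph."""
    graph = defaultdict(list)
    for cause, effect, confidence in claims:
        graph[cause].append((effect, confidence))
        graph.setdefault(effect, [])  # make sure sink nodes exist too
    return graph

def find_cycle(graph):
    """Return one cycle as a list of nodes (first node repeated at the end),
    or None if the graph is acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for target, _ in graph[node]:
            if color[target] == GREY:  # back edge: target is on the current path
                return path[path.index(target):] + [target]
            if color[target] == WHITE:
                found = visit(target)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in list(graph):
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None

def to_dot(graph):
    """Render the graph as a Graphviz DOT string with confidence labels."""
    lines = ["digraph claims {"]
    for cause, edges in graph.items():
        for effect, confidence in edges:
            lines.append(f'  "{cause}" -> "{effect}" [label="{confidence}"];')
    lines.append("}")
    return "\n".join(lines)

acyclic = build_graph([("smoking", "cancer", 0.8), ("smoking", "yellow teeth", 0.8)])
cyclic = build_graph([("stress", "insomnia", 0.8), ("insomnia", "stress", 0.8)])
print(find_cycle(acyclic))
print(find_cycle(cyclic))
print(to_dot(acyclic))
```

A cycle matters here because backdoor identification assumes a directed acyclic graph, so claims that sit on a cycle cannot be estimated as written.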
causallens.verifier.CausalVerifier pushes the graph through DoWhy:
- identify an estimand for every (cause, effect) pair,
- estimate the effect with linear-regression backdoor adjustment,
- refute the estimate with a random-common-cause perturbation,

and returns a Verdict per claim: causal, correlation,
contradicted, or unverifiable. If you do not have observational
data, the verifier synthesizes a small dataset that is consistent
with the extracted DAG so that DoWhy has something to fit; pass your
own pandas.DataFrame via CausalLens(data=...) to use real data.
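The estimate and refute steps can be sketched by hand to show what they compute. The code below does not call DoWhy; it is a standard-library approximation of the idea behind DoWhy's `backdoor.linear_regression` estimator and `random_common_cause` refuter. The synthetic data, the `ols` and `classify` helpers, and both thresholds are assumptions for illustration, not CausalLens defaults.

```python
import random

def ols(columns, y):
    """Least-squares coefficients for y ~ 1 + columns, via the normal equations.
    Returns [intercept, coef_for_columns[0], coef_for_columns[1], ...]."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
        + [sum(X[r][i] * y[r] for r in range(n))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for i in range(k):
        pivot = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[pivot] = A[pivot], A[i]
        for r in range(k):
            if r != i:
                factor = A[r][i] / A[i][i]
                A[r] = [a - factor * b for a, b in zip(A[r], A[i])]
    return [A[i][k] / A[i][i] for i in range(k)]

def classify(effect, refuted_effect, zero_tol=0.1, drift_tol=0.1):
    """Hypothetical verdict rule; both tolerances are illustrative."""
    if abs(effect) < zero_tol:
        return "correlation"  # estimated effect is close to zero
    if abs(refuted_effect - effect) > drift_tol * abs(effect):
        return "correlation"  # estimate does not survive refutation
    return "causal"

rng = random.Random(0)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]                 # confounder
t = [zi + rng.gauss(0, 1) for zi in z]                  # treatment, driven by z
y_real = [1.5 * ti + 2.0 * zi + rng.gauss(0, 1) for ti, zi in zip(t, z)]
y_spurious = [2.0 * zi + rng.gauss(0, 1) for zi in z]   # t has no effect
random_cause = [rng.gauss(0, 1) for _ in range(n)]      # refuter's extra variable

def verify(outcome):
    effect = ols([t, z], outcome)[1]                    # adjust for z (backdoor)
    refuted = ols([t, z, random_cause], outcome)[1]     # re-estimate with noise added
    return effect, classify(effect, refuted)

naive_spurious = ols([t], y_spurious)[1]                # no adjustment at all
effect_real, verdict_real = verify(y_real)
effect_spurious, verdict_spurious = verify(y_spurious)
print(f"unadjusted slope on spurious outcome: {naive_spurious:.2f}")
print(f"real outcome:     {effect_real:.2f} -> {verdict_real}")
print(f"spurious outcome: {effect_spurious:.2f} -> {verdict_spurious}")
```

Without adjustment, the spurious outcome shows a sizeable slope on the treatment. Conditioning on the confounder brings it close to zero, and adding an unrelated random variable barely moves the estimate for the real effect.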
| Scenario | Raw LLM output | CausalLens-verified output |
|---|---|---|
| Medical chatbot | "Drinking coffee causes cancer." Emitted confidently. | Same text, plus [CORRELATION] coffee -> cancer - estimate does not survive refutation. Downstream app warns user. |
| Science explainer | "Higher CO2 causes rising sea levels." No traceable reasoning. | Same text, plus a verified DAG CO2 -> temperature -> sea_level with every edge labelled [CAUSAL]. |
| Business report | "Raising prices led to higher revenue." Based on one quarter. | Same text, plus [CORRELATION] price -> revenue - estimate not robust to random common cause. Reviewer digs deeper. |
| Policy analysis | "Minimum-wage hikes cause unemployment." Contested claim. | Same text, plus [UNVERIFIABLE] if no supporting structure is provided, preventing the app from treating it as settled. |
| Code assistant | "Using recursion causes stack overflow." True in context. | Same text, plus [CAUSAL] recursion -> stack_overflow - estimate survives refutation. |
The original text is always returned unchanged; CausalLens adds a report, never rewrites the model's words.
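How a downstream application reacts to the verdicts is up to the application. As one possibility, the sketch below appends a warning block to the untouched model text. The two dataclasses are stand-ins that mirror only the `verdict`, `claim.cause`, and `claim.effect` attributes used earlier; treating `verdict` as a lowercase string is an assumption, and `annotate` is a hypothetical helper rather than part of CausalLens.

```python
from dataclasses import dataclass

@dataclass
class Claim:  # stand-in for CausalClaim
    cause: str
    effect: str

@dataclass
class ClaimVerdict:  # stand-in for Verdict; `verdict` assumed to be a string
    claim: Claim
    verdict: str

def annotate(text, verdicts):
    """Return the model's text unchanged, followed by a note for every
    claim that was not verified as causal."""
    flagged = [v for v in verdicts if v.verdict != "causal"]
    if not flagged:
        return text
    notes = [
        f"[{v.verdict.upper()}] {v.claim.cause} -> {v.claim.effect}"
        for v in flagged
    ]
    return text + "\n\nUnverified causal claims:\n" + "\n".join(notes)

reply = "Drinking coffee causes cancer."
verdicts = [ClaimVerdict(Claim("coffee", "cancer"), "correlation")]
print(annotate(reply, verdicts))
```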

    from causallens import (
        CausalLens,          # top-level wrapper
        VerificationResult,  # what cl.verify() returns
        CausalClaim,         # one extracted claim
        ClaimExtractor,      # spaCy-based extractor
        CausalGraph,         # NetworkX graph of claims
        CausalVerifier,      # DoWhy-based verifier
        Verdict,             # per-claim result
    )

CausalLens(spacy_model="en_core_web_sm", data=None) constructs a
ready-to-use pipeline. cl.verify(text) returns a
VerificationResult with .text, .claims, .verdicts, .graph,
.summary, and .report.

Run the bundled examples:

    python examples/basic_usage.py
    python examples/medical_example.py
    python examples/scientific_example.py

To run the test suite, install the development extras and invoke pytest:

    pip install -e ".[dev]"
    pytest

The suite covers extraction, graph construction, verifier verdicts, and the top-level wrapper.
Pull requests are welcome. If you are adding a feature, please:
- Open an issue describing the use case first.
- Add unit tests under tests/ and keep the suite green.
- Keep the public API (CausalLens, VerificationResult, CausalClaim, Verdict) stable; extend rather than reshape.
- Run pytest locally before pushing.
Ideas that would help the project:
- richer cue-phrase library (multilingual, domain-specific),
- support for user-supplied observational data with sensible column auto-mapping,
- additional DoWhy refuters surfaced as verdict qualifiers,
- plug-ins for popular LLM SDKs (OpenAI, Anthropic, LangChain).
CausalLens is released under the MIT License.