Why Refract?
Refract is the open claim-history layer for public knowledge. If you need to know what changed, when, and how — not just what a page says right now — Refract is the answer.
Who this is for
Journalists and researchers
You need to trace how a claim about a person, company, event, or policy evolved across Wikipedia's revision history. Refract gives you a deterministic timeline of every change — when a sentence first appeared, when sources were added or removed, when edits were reverted, and when talk page discussions accompanied the changes.
refract claim "Theranos" --text "revolutionary blood testing" -c
Result: a verifiable timeline of when the claim was added, when it was softened, when the sources disappeared, and when the article was tagged with dispute templates.
Data scientists and OSINT analysts
You need structured, queryable data about page revision histories — no scraping, no custom parsers, no fragile heuristics. Refract outputs standard NDJSON that DuckDB, Python pandas, and any JSON tool can consume directly.
refract analyze "Bitcoin" --depth forensic --since 2024-01-01 > bitcoin.jsonl
duckdb -c "SELECT event_type, count(*) FROM 'bitcoin.jsonl' GROUP BY 1 ORDER BY 2 DESC"
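The same count can be sketched without DuckDB, since NDJSON is one JSON object per line. This is a minimal standard-library version; it assumes only the `event_type` field used in the query above, and `revert_detected` in the sample is a made-up event name for illustration.

```python
import json
from collections import Counter
from typing import Iterable


def count_event_types(lines: Iterable[str]) -> list[tuple[str, int]]:
    """Count records per event_type, most frequent first."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:  # tolerate blank lines between records
            continue
        record = json.loads(line)
        # Records without the field are grouped under "unknown".
        counts[record.get("event_type", "unknown")] += 1
    return counts.most_common()


# In-memory sample; with a real export, pass an open file handle instead,
# e.g. count_event_types(open("bitcoin.jsonl", encoding="utf-8")).
sample_lines = [
    '{"event_type": "sentence_first_seen"}',
    '{"event_type": "revert_detected"}',  # hypothetical event name
    '{"event_type": "sentence_first_seen"}',
]
for event_type, n in count_event_types(sample_lines):
    print(f"{event_type}\t{n}")
```

Because the function accepts any iterable of lines, it works equally on a file handle, a pipe from stdin, or a list in a test.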
ML engineers building RAG or training data
You need provenance signals for retrieved text. Is this claim stable? Has it been contested? Is the supporting source still in the article? Refract attaches claim stability metadata to every chunk so your retrieval system can weight results by provenance quality, not just embedding similarity.
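# Filter training data to stable, well-sourced claims
# df: a pandas DataFrame loaded from Refract's NDJSON output,
# e.g. pd.read_json(path, lines=True)
stable_claims = df[(df["event_type"] == "sentence_first_seen")
                   & (df["is_contested"] == False)]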
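Weighting retrieval results by provenance rather than similarity alone could look roughly like the sketch below. Everything here is an assumption made for illustration: the `Chunk` fields, the `source_present` flag, and the penalty factors are not part of Refract's documented output, so treat them as placeholders to map onto whatever metadata your pipeline actually carries.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    similarity: float     # embedding similarity from your retriever, in [0, 1]
    is_contested: bool    # assumed flag derived from claim-history events
    source_present: bool  # assumed flag: supporting citation still in the article


def provenance_weight(chunk: Chunk) -> float:
    """Return a multiplier in (0, 1]; the penalty values are arbitrary examples."""
    weight = 1.0
    if chunk.is_contested:
        weight *= 0.5
    if not chunk.source_present:
        weight *= 0.7
    return weight


def rerank(chunks: list[Chunk]) -> list[Chunk]:
    """Sort chunks by similarity scaled by provenance weight, best first."""
    return sorted(
        chunks,
        key=lambda c: c.similarity * provenance_weight(c),
        reverse=True,
    )
```

A multiplicative weight keeps the retriever's own ranking intact among chunks with equal provenance and only demotes contested or unsourced text; the factors would need tuning against your own evaluation set.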
Regulatory and policy monitors
You need to track changes to drug safety pages, guideline entries, or policy language for early signals of institutional shifts. Refract's cron mode re-observes pages on a schedule and notifies you when new events fire — citation removal, template disputes, section reorganization.
refract cron pages.txt --notify-webhook https://hooks.example.com/refract
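On the receiving side, the webhook endpoint has to turn each notification into something actionable. The payload format is not described here, so the sketch below assumes a JSON object with a `page` string and an `events` list whose items carry `event_type`; those field names and the event type names in `ALERT_TYPES` are hypothetical and should be replaced with what your Refract version actually sends.

```python
import json

# Hypothetical event type names; check the names in your own Refract output.
ALERT_TYPES = {"citation_removed", "template_added", "section_reorganized"}


def alerts_from_payload(body: bytes) -> list[str]:
    """Return one alert line per event of interest in a webhook body.

    Assumes the body is a JSON object shaped like
    {"page": "...", "events": [{"event_type": "..."}, ...]}.
    """
    payload = json.loads(body)
    page = payload.get("page", "<unknown page>")
    return [
        f"{page}: {event['event_type']}"
        for event in payload.get("events", [])
        if event.get("event_type") in ALERT_TYPES
    ]
```

Keeping the parsing in a pure function like this makes it easy to unit-test separately from whichever HTTP framework hosts the endpoint.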
AI agent developers
You need an MCP-native tool to give your agent claim-awareness. refract mcp starts a JSON-RPC server that any MCP client can connect to. Your agent can call analyze, claim, and export directly, with no API key and no setup beyond registering the server in your MCP client's configuration:
{
  "mcpServers": {
    "refract": {
      "command": "npx",
      "args": ["@refract-org/cli", "mcp"]
    }
  }
}
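For a sense of what travels over that connection, MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` request. The sketch below builds one such message for the analyze tool. The argument names `page` and `depth` are assumptions rather than Refract's confirmed tool schema, and a real client would first complete the MCP initialization handshake and normally discover the schema via `tools/list`.

```python
import json
from typing import Any


def build_tool_call(request_id: int, tool: str, arguments: dict[str, Any]) -> str:
    """Serialize an MCP tools/call request as a single JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# "page" and "depth" are assumed argument names, used here for illustration only.
request = build_tool_call(1, "analyze", {"page": "Earth", "depth": "brief"})
print(request)
```

In practice an MCP client library handles this framing for you; the sketch is only meant to show that the agent-facing surface is plain JSON-RPC.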
Fan wiki maintainers
Your wiki's canon is contested. Headcanon, fan edits, and retcons overwrite established lore. Refract tracks every change to every page — when a character's backstory changed, when a category was reclassified, when one wiki's canon diverged from another's.
refract diff "Darth_Vader" \
--wiki-a https://starwars.fandom.com/api.php \
--wiki-b https://clone-wars.fandom.com/api.php
What makes Refract different
| Refract | Alternatives |
|---|---|
| Deterministic — same input, same output, every run. Byte-for-byte identical. | Most tools depend on model calls, heuristic sampling, or non-reproducible computation |
| Provenance-tagged — every event records which analyzer, what version, what parameters | Most tools produce output without an audit trail |
| 26 event types — sentence lifecycle, citations, templates, reverts, sections, categories, wikilinks, talk pages, edit clusters, protection changes | Most tools track 1–3 signal types |
| BYO-inference — every analyzer threshold is a pluggable function. Defaults work offline. Plug a model where you need one. | Most tools are either all-model or no-model, not selectable per boundary |
| Merkle-verifiable — signed bundles and replay manifests for audit trail integrity | No comparable tool offers cryptographic verification |
| Zero-install — npx @refract-org/cli analyze "Earth" works with no download, no config | Most tools require installation, API keys, or account setup |
| MCP-native — AI agents connect via built-in MCP server | Most tools have no AI agent integration |
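The determinism claim in the table is something you can check yourself: save the output of two runs over the same input and compare their digests. The sketch below uses only the Python standard library; the file names in the usage comment are placeholders, and both runs would need identical arguments and the same revision range for the comparison to be meaningful.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def outputs_identical(first: Path, second: Path) -> bool:
    """True when both files have the same SHA-256 digest."""
    return file_sha256(first) == file_sha256(second)


# Usage (placeholder file names), after redirecting two runs to files:
#   outputs_identical(Path("run1.jsonl"), Path("run2.jsonl"))
```

Comparing digests rather than parsed JSON is deliberate: a byte-for-byte guarantee should survive a raw hash comparison, not only a structural one.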
When Refract is not the right tool
- You need real-time monitoring: Refract is polling-based (via cron or watch mode), not push-based. Use a MediaWiki webhook if you need sub-second latency.
- You need sentiment analysis: Refract is deterministic — it doesn't score, rank, or classify editor intent. Use a downstream tool for sentiment.
- You need editor profiling: Refract never records editor identities. Use a diff analytics tool if you need per-editor attribution.
- You need a truth-verification system: Refract reports what changed, not whether the change is correct. Truth assessment lives downstream.
Quick evaluation
# 30 seconds to first result
npx @refract-org/cli analyze "Earth" --depth brief
# 2 minutes to a full analysis
npx @refract-org/cli analyze "Bitcoin" --depth detailed
# See it all in a browser
refract explore "Bitcoin"
The complete workflow walks through a realistic use case, from zero to insight, in 5 steps.