Refract vs. Wikipedia's page history

This page compares what Wikipedia's built-in tools give you with what Refract adds, to help you decide whether Refract is worth adopting.

Wikipedia's page history

Capability	Wikipedia UI	Refract
View a single revision diff	Yes — click "prev" on any revision	Yes — every event carries `before`/`after` snapshots
See who edited what	Yes — username + timestamp per revision	No — Refract observes document change, not editor identity
Find when a sentence first appeared	Manual — search each revision sequentially	`refract claim "Page" --text "sentence"` → exact revision + timestamp
Track a sentence across its entire lifecycle	Manual — follow the page history	Automatic — first seen, modified, removed, reintroduced, all timestamped
Detect citation swapping	Manual — compare each diff's reference section	`citation_replaced` event — `before`/`after` show the old and new source
Detect edit wars	Manual — look for back-and-forth in history	`revert_detected` + `edit_cluster_detected` — automatic structural detection
Correlate article edits with talk page discussion	Manual — check Talk tab separately	`talk_page_correlated` — Refract checks 7 days before / 3 days after each edit
Compare the same topic across language editions	Manual — open each wiki separately	`refract diff` — cross-wiki comparison with z-score outlier detection
Query with SQL	No	DuckDB: `SELECT "eventType", count(*) FROM 'events.jsonl' GROUP BY 1`
Cryptographic audit trail	No — screenshots are the only proof	`refract export --bundle` → Merkle root, reproducible by anyone
Automated monitoring	No — you check manually	`refract cron` + `refract watch` → Slack, email, webhook alerts
AI agent integration	No	`refract mcp` → Claude Code, Cursor, VS Code can call Refract tools directly
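
The SQL row above can also be reproduced without DuckDB. Here is a minimal sketch, assuming `events.jsonl` holds one JSON object per line and that each object carries an `eventType` field (as the DuckDB query suggests). The helper name `count_event_types` is hypothetical and not part of Refract.

```python
import json
from collections import Counter
from typing import Iterable


def count_event_types(lines: Iterable[str]) -> Counter:
    """Count events per eventType from JSON Lines input."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        event = json.loads(line)
        # "eventType" is the field name used in the DuckDB query above;
        # events without it are grouped under "unknown" (an assumption).
        counts[event.get("eventType", "unknown")] += 1
    return counts


if __name__ == "__main__":
    # Illustrative data only, not real Refract output.
    sample = [
        '{"eventType": "citation_replaced"}',
        '{"eventType": "revert_detected"}',
        '{"eventType": "citation_replaced"}',
    ]
    for event_type, n in count_event_types(sample).most_common():
        print(event_type, n)
```

To run it against a real export, pass an open file handle instead of the sample list, for example `count_event_types(open("events.jsonl", encoding="utf-8"))`.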

Refract vs. other tools

Tool	What it does	Refract's difference
Wikipedia API	Raw revision data	Refract adds deterministic analysis, event typing, provenance metadata
WikiWho	Editor-level authorship attribution	Refract tracks claim lifecycle, not editor attribution. Different question.
WhoColor / WikiBlame	Authorship highlighting (WhoColor) and text search across revisions (WikiBlame)	Refract structures the data for querying, not just viewing
Wikimedia Enterprise	Bulk API access, commercial licensing	Refract is open-source, deterministic, and runs locally
Internet Archive	Historical snapshots	Refract produces structured, queryable event streams, not page captures
Custom scrapers	Ad-hoc revision analysis	Refract has 26 event types, deterministic hashing, and a published SDK

What Refract deliberately doesn't do

Capability	Why not
Truth/fact-checking	Refract observes change, not correctness. It answers "what changed?", not "is this true?"
Sentiment analysis	Refract doesn't judge editor intent or tone
Editor scoring	Refract tracks document change, not editor behavior
Prediction	Refract reports what happened, not what might happen
Automated editing	Refract is read-only observation

When to use Refract

You need to prove when a claim appeared, not just screenshot it
You're analyzing patterns across many revisions (citation churn, edit clusters, talk correlation)
You need a cryptographic audit trail (Merkle-provable bundles)
You want to monitor pages automatically (cron + notifications)
You're building a RAG pipeline that needs claim stability signals
You want AI agents to reason about page history with structured data
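
For the audit-trail case, it helps to see what a Merkle root buys you. The sketch below is a generic SHA-256 Merkle tree, not Refract's actual bundle format. The leaf ordering, the rule of duplicating the last node on odd-sized levels, and the function name `merkle_root` are all assumptions made for illustration.

```python
import hashlib
from typing import List


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaf_hashes: List[bytes]) -> bytes:
    """Fold a list of leaf hashes into a single root hash."""
    if not leaf_hashes:
        raise ValueError("need at least one leaf")
    level = list(leaf_hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumption: duplicate the last node on odd-sized levels.
            level.append(level[-1])
        # Hash each adjacent pair to build the next level up.
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Hypothetical event payloads standing in for exported events.
events = [b"event-1", b"event-2", b"event-3"]
leaves = [sha256(e) for e in events]
root = merkle_root(leaves)

# Same inputs give the same root; changing any event changes it.
tampered = [sha256(e) for e in [b"event-1", b"event-2", b"event-X"]]
print(root == merkle_root(leaves))    # True
print(root == merkle_root(tampered))  # False
```

The point of the root is that a reviewer only needs one short hash to confirm that an entire set of events is unchanged.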

Refract vs. AI evaluation tools

Refract's model evaluation capability — temporal leakage detection, provenance hallucination checking, retrieval quality scoring — has no direct competitor. Existing tools evaluate models on accuracy, safety, or reasoning. None evaluate models against deterministic ground truth about what was public knowledge and when.

Capability	Existing tools	Refract
Temporal leakage detection	Heuristic: compare model output to training cutoff dates. No deterministic proof.	`refract_eval.build_leakage_benchmark()` — exact revision ID, timestamp, SHA-256 hash. Proves leakage deterministically.
Provenance hallucination	Manual: check model citations against sources one at a time.	`refract_eval.check_provenance()` — query citation_added/removed/replaced events. Classify: verified, outdated, hallucinated.
Retrieval quality (stability-weighted)	Embedding similarity only. Contested and stable passages score identically.	`refract_eval.score_retrieval_quality()` — each passage scored by revert count, citation churn, talk activity.
Knowledge recency	No standard tooling. Ad-hoc: "ask the model what date it thinks it is."	`refract snapshot "Page" --at <date>` — deterministic page state at any point. Compare model answer against ground truth.
Standard benchmark	No open benchmark for temporal ground truth.	`BENCHMARK.md` — 10 standard pages, submission format, reproducibility requirements.
Reproducibility	Most eval suites: "run our script, trust our numbers."	Every event has a deterministic SHA-256 hash. Reviewer runs same command, gets same hash.
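
The reproducibility row depends on hashing being deterministic. One common way to get that is to hash a canonical serialization. The sketch below assumes canonical JSON with sorted keys and compact separators. Refract's real canonicalization rules are not described on this page, so treat `event_hash` and the sample field values as hypothetical.

```python
import hashlib
import json


def event_hash(event: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the event."""
    # Sorted keys and fixed separators make the byte string independent
    # of dict insertion order and of whitespace choices.
    canonical = json.dumps(
        event, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two dicts with the same content but different key order.
a = {"eventType": "citation_replaced", "before": "old source", "after": "new source"}
b = {"after": "new source", "before": "old source", "eventType": "citation_replaced"}

print(event_hash(a) == event_hash(b))  # True: key order does not affect the hash
```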

The gap Refract fills: every eval suite tests whether a model is accurate. None test whether a model knows things it shouldn't. Refract provides the ground truth for that test — and makes it reproducible.
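
Once you know when a claim first became public, the test described above reduces to a date comparison. Here is a minimal sketch, assuming you already have a first-seen timestamp for the claim (for example from `refract claim`) and a stated training cutoff for the model. `is_temporal_leak` is a hypothetical helper, not the `refract_eval` API, and the dates are made up.

```python
from datetime import datetime, timezone


def is_temporal_leak(first_seen: datetime, training_cutoff: datetime) -> bool:
    """True if the claim first appeared after the model's stated cutoff.

    A model that states such a claim knows something it should not
    know given that cutoff.
    """
    return first_seen > training_cutoff


cutoff = datetime(2023, 4, 30, tzinfo=timezone.utc)      # hypothetical cutoff
first_seen = datetime(2023, 9, 12, tzinfo=timezone.utc)  # hypothetical first-seen time

print(is_temporal_leak(first_seen, cutoff))  # True
```

Using timezone-aware datetimes on both sides avoids comparison errors between naive and aware values.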

When Wikipedia's UI is enough

You're checking one revision diff quickly
You need to see who made an edit
You're browsing page history casually

Refract doesn't replace Wikipedia's UI. It adds capabilities that the UI can't provide — deterministic reproducibility, SQL queryability, cryptographic verification, and automated monitoring.