Refract vs. Wikipedia's page history
If you're evaluating whether Refract is worth adopting, this page compares what you get from Wikipedia's built-in tools vs. what Refract adds.
Wikipedia's page history
| Capability | Wikipedia UI | Refract |
|---|---|---|
| View a single revision diff | Yes: click "prev" on any revision | Yes: every event carries before/after snapshots |
| See who edited what | Yes: username and timestamp per revision | No: Refract observes document change, not editor identity |
| Find when a sentence first appeared | Manual: search each revision in turn | `refract claim "Page" --text "sentence"` returns the exact revision and timestamp |
| Track a sentence across its entire lifecycle | Manual: follow the page history | Automatic: first seen, modified, removed, and reintroduced, all timestamped |
| Detect citation swapping | Manual: compare the reference section of each diff | `citation_replaced` event; before/after show the old and new source |
| Detect edit wars | Manual: look for back-and-forth in the history | `revert_detected` and `edit_cluster_detected`: automatic structural detection |
| Correlate article edits with talk page discussion | Manual: check the Talk tab separately | `talk_page_correlated`: Refract checks talk page activity from 7 days before to 3 days after each edit |
| Compare the same topic across language editions | Manual: open each wiki separately | `refract diff`: cross-wiki comparison with z-score outlier detection |
| Query with SQL | No | DuckDB: `SELECT "eventType", count(*) FROM 'events.jsonl' GROUP BY 1` |
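| Cryptographic audit trail | No: revision permalinks and screenshots, but nothing cryptographically verifiable | `refract export --bundle` produces a Merkle root that anyone can reproduce |
| Automated monitoring | Partial: watchlists show that a page changed; reviewing each change is manual | `refract cron` + `refract watch` send Slack, email, or webhook alerts |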
| AI agent integration | No | `refract mcp` lets Claude Code, Cursor, and VS Code call Refract tools directly |
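The lifecycle row above can be illustrated with a minimal sketch that walks an ordered revision list and records when a sentence appears, disappears, and comes back. It assumes each revision is a hypothetical `(rev_id, timestamp, text)` tuple and uses exact substring matching, so it does not detect rewording (the `modified` state is not modeled). The function name and event labels are illustrative, not Refract's API.

```python
from typing import Iterable, List, Tuple

# Hypothetical revision shape: (revision ID, ISO-8601 timestamp, full page text).
Revision = Tuple[int, str, str]


def claim_lifecycle(revisions: Iterable[Revision], sentence: str) -> List[Tuple[str, int, str]]:
    """Return (event, rev_id, timestamp) tuples for each presence transition.

    Assumes `revisions` is sorted oldest-first. Exact substring matching is a
    simplification: a reworded sentence counts as removed.
    """
    events = []
    present = False      # is the sentence in the previous revision?
    seen_before = False  # has it ever appeared?
    for rev_id, timestamp, text in revisions:
        now_present = sentence in text
        if now_present and not present:
            # Appears: either for the first time or again after a removal.
            kind = "reintroduced" if seen_before else "first_seen"
            events.append((kind, rev_id, timestamp))
            seen_before = True
        elif present and not now_present:
            events.append(("removed", rev_id, timestamp))
        present = now_present
    return events


history = [
    (101, "2021-01-05T10:00:00Z", "Intro."),
    (102, "2021-02-01T09:30:00Z", "Intro. The bridge opened in 1932."),
    (103, "2021-03-12T14:00:00Z", "Intro."),
    (104, "2021-03-12T14:05:00Z", "Intro. The bridge opened in 1932."),
]
print(claim_lifecycle(history, "The bridge opened in 1932."))
```

Running it reports `first_seen` at revision 102, `removed` at 103, and `reintroduced` at 104.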
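The `talk_page_correlated` window (7 days before to 3 days after each edit) amounts to a simple interval test. This sketch assumes timestamps in ISO-8601 form with a trailing `Z` and treats both window edges as inclusive; how Refract handles the boundaries is not stated on this page, and the function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List

# Window from the comparison table: 7 days before to 3 days after the article edit.
BEFORE = timedelta(days=7)
AFTER = timedelta(days=3)


def parse_ts(ts: str) -> datetime:
    # fromisoformat before Python 3.11 rejects a trailing "Z", so normalize it.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def correlated_talk_edits(edit_ts: str, talk_ts: List[str]) -> List[str]:
    """Return talk page timestamps that fall inside the window around one edit."""
    edit_time = parse_ts(edit_ts)
    start, end = edit_time - BEFORE, edit_time + AFTER
    return [ts for ts in talk_ts if start <= parse_ts(ts) <= end]  # inclusive edges (assumption)


talk = [
    "2023-04-20T08:00:00Z",
    "2023-04-26T08:00:00Z",
    "2023-05-03T23:00:00Z",
    "2023-05-05T00:00:00Z",
]
print(correlated_talk_edits("2023-05-01T12:00:00Z", talk))
```

For the edit at `2023-05-01T12:00:00Z`, only the talk edits at `2023-04-26T08:00:00Z` and `2023-05-03T23:00:00Z` fall inside the window.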
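Cross-wiki z-score outlier detection can be sketched as follows: compute one number per language edition, then flag editions that sit far from the mean. The per-edition metric (here, a hypothetical count of citation changes on the same topic) and the threshold of 2.0 are assumptions for illustration; Refract's actual metric and cutoff are not documented on this page.

```python
from statistics import mean, pstdev
from typing import Dict, List


def zscore_outliers(metric_by_wiki: Dict[str, float], threshold: float = 2.0) -> List[str]:
    """Return editions whose metric lies more than `threshold` standard deviations from the mean."""
    values = list(metric_by_wiki.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation across editions
    if sigma == 0:
        return []  # every edition has the same value; nothing stands out
    return [wiki for wiki, v in metric_by_wiki.items() if abs(v - mu) / sigma > threshold]


# Hypothetical per-edition counts of citation changes for one topic.
counts = {"en": 12, "de": 10, "fr": 11, "es": 9, "it": 10, "ja": 11, "pt": 10, "ru": 48}
print(zscore_outliers(counts))
```

One design caveat: with the population standard deviation, no value among n editions can have a z-score above sqrt(n - 1), so with five or fewer editions a threshold of 2.0 never fires.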
Refract vs. other tools
| Tool | What it does | Refract's difference |
|---|---|---|
| Wikipedia API | Raw revision data | Refract adds deterministic analysis, event typing, provenance metadata |
| WikiWho | Editor-level authorship attribution | Refract tracks claim lifecycle, not editor attribution. Different question. |
| WhoColor / WikiBlame | Authorship highlighting (WhoColor) and search for the revision that added a given text (WikiBlame) | Refract structures the data for querying, not just viewing or one-off lookups |
| Wikimedia Enterprise | Bulk API access, commercial licensing | Refract is open-source, deterministic, and runs locally |
| Internet Archive | Historical snapshots | Refract produces structured, queryable event streams, not page captures |
| Custom scrapers | Ad-hoc revision analysis | Refract has 26 event types, deterministic hashing, and a published SDK |
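Deterministic hashing usually means hashing a canonical serialization, so the same event always produces the same digest regardless of key order or whitespace. A minimal sketch, assuming events are JSON objects and a canonical form of sorted keys with compact separators; apart from `eventType` (which appears in the DuckDB query above), the field names are hypothetical, and Refract's exact canonicalization may differ.

```python
import hashlib
import json
from typing import Any, Dict


def event_hash(event: Dict[str, Any]) -> str:
    """SHA-256 over a canonical JSON encoding of an event (hypothetical scheme)."""
    canonical = json.dumps(event, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = {"eventType": "citation_replaced", "revId": 123, "page": "Example"}
b = {"page": "Example", "revId": 123, "eventType": "citation_replaced"}
print(event_hash(a) == event_hash(b))  # key order does not affect the hash
```

Because the encoding is canonical, a reviewer who rebuilds the same event independently gets the same 64-character digest.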
What Refract deliberately doesn't do
| Capability | Why not |
|---|---|
| Truth/fact-checking | Refract observes change, not correctness. It answers "what changed?", not "is this true?" |
| Sentiment analysis | Refract doesn't judge editor intent or tone |
| Editor scoring | Refract tracks document change, not editor behavior |
| Prediction | Refract reports what happened, not what might happen |
| Automated editing | Refract is read-only observation |
When to use Refract
- You need to prove when a claim appeared, not just screenshot it
- You're analyzing patterns across many revisions (citation churn, edit clusters, talk correlation)
- You need a cryptographic audit trail (Merkle-provable bundles)
- You want to monitor pages automatically (cron + notifications)
- You're building a RAG pipeline that needs claim stability signals
- You want AI agents to reason about page history with structured data
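A Merkle root, like the one `refract export --bundle` produces, commits to a whole set of event hashes with a single digest: change any event and the root changes. The sketch below assumes hex SHA-256 leaf hashes, hashes concatenated pairs, and duplicates the last node on odd levels (the Bitcoin convention). Refract's actual tree layout, leaf encoding, and domain separation are not specified here; schemes such as RFC 6962 prefix leaves and interior nodes differently.

```python
import hashlib
from typing import List


def merkle_root(leaf_hashes: List[str]) -> str:
    """Fold hex SHA-256 leaf hashes pairwise into one root (hypothetical scheme)."""
    if not leaf_hashes:
        raise ValueError("cannot build a Merkle root from zero leaves")
    level = [bytes.fromhex(h) for h in leaf_hashes]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node when the level is odd
        # Hash each adjacent pair to form the next level up.
        level = [hashlib.sha256(level[i] + level[i + 1]).digest() for i in range(0, len(level), 2)]
    return level[0].hex()


leaves = [hashlib.sha256(s.encode()).hexdigest() for s in ["e1", "e2", "e3"]]
print(merkle_root(leaves))
```

Anyone holding the same ordered leaf hashes recomputes the same root, which is what makes the bundle independently checkable; reordering or altering any leaf yields a different root.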
Refract vs. AI evaluation tools
Refract's model evaluation capabilities (temporal leakage detection, provenance hallucination checking, and retrieval quality scoring) have, as far as we know, no direct competitor. Existing tools evaluate models on accuracy, safety, or reasoning; none that we know of evaluate models against deterministic ground truth about what was public knowledge, and when.
| Capability | Existing tools | Refract |
|---|---|---|
| Temporal leakage detection | Heuristic: compare model output to training cutoff dates. No deterministic proof. | `refract_eval.build_leakage_benchmark()` ties each claim to an exact revision ID, timestamp, and SHA-256 hash, so leakage is established against deterministic ground truth. |
| Provenance hallucination | Manual: check model citations against sources one at a time. | `refract_eval.check_provenance()` queries `citation_added`, `citation_removed`, and `citation_replaced` events and classifies each citation as verified, outdated, or hallucinated. |
| Retrieval quality (stability-weighted) | Embedding similarity only; contested and stable passages score identically. | `refract_eval.score_retrieval_quality()` scores each passage by revert count, citation churn, and talk page activity. |
| Knowledge recency | No standard tooling. Ad hoc: "ask the model what date it thinks it is." | `refract snapshot "Page" --at <date>` returns the deterministic page state at any point; compare the model's answer against that ground truth. |
| Standard benchmark | No open benchmark for temporal ground truth. | `BENCHMARK.md`: 10 standard pages, a submission format, and reproducibility requirements. |
| Reproducibility | Most eval suites: "run our script, trust our numbers." | Every event has a deterministic SHA-256 hash; a reviewer who runs the same command gets the same hash. |
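The idea behind a point-in-time snapshot such as `refract snapshot "Page" --at <date>` is to select the last revision saved at or before the requested moment. A minimal sketch using binary search, assuming a hypothetical oldest-first list of `(rev_id, timestamp, text)` tuples; it illustrates the lookup, not Refract's implementation.

```python
from bisect import bisect_right
from datetime import datetime
from typing import List, Optional, Tuple

Revision = Tuple[int, str, str]  # hypothetical: (rev_id, ISO-8601 timestamp, page text)


def parse_ts(ts: str) -> datetime:
    # Normalize a trailing "Z" so older Pythons parse it as UTC.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def snapshot_at(revisions: List[Revision], at: str) -> Optional[Revision]:
    """Return the revision in effect at `at`: the last one saved at or before it.

    Assumes `revisions` is sorted oldest-first; returns None if the page did not exist yet.
    """
    times = [parse_ts(ts) for _, ts, _ in revisions]
    idx = bisect_right(times, parse_ts(at))  # number of revisions saved at or before `at`
    return revisions[idx - 1] if idx else None


revs = [
    (101, "2021-01-05T10:00:00Z", "Intro."),
    (102, "2021-02-01T09:30:00Z", "Intro. The bridge opened in 1932."),
    (103, "2021-03-12T14:00:00Z", "Intro. The bridge opened in 1931."),
]
print(snapshot_at(revs, "2021-02-15T00:00:00Z"))
```

Because the result depends only on the revision list and the requested timestamp, the same query always returns the same page state.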
The gap Refract fills: most eval suites test whether a model is accurate; few test whether a model knows things it shouldn't, such as facts that were not yet public at its training cutoff. Refract provides reproducible ground truth for that test.
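As a sketch of how that ground truth could feed a leakage check: a claim the model reproduces, but which first appeared on the page after the model's stated training cutoff, is a leakage candidate. The `ClaimRecord` fields and the `leakage_candidates` function are hypothetical; in practice the first-seen revision and timestamp would come from the claim history, and a flagged claim still warrants review, since a model can occasionally guess a fact.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class ClaimRecord:
    """Hypothetical record joining page history with a model's answers."""
    text: str
    first_seen_rev: int    # revision where the claim first appeared
    first_seen_ts: str     # ISO-8601 with trailing "Z", e.g. "2023-06-10T09:00:00Z"
    model_reproduced: bool


def parse_ts(ts: str) -> datetime:
    # Normalize "Z" so fromisoformat returns a timezone-aware datetime on older Pythons.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def leakage_candidates(claims: List[ClaimRecord], training_cutoff: str) -> List[ClaimRecord]:
    """Claims the model reproduced even though they first appeared after the cutoff."""
    cutoff = parse_ts(training_cutoff)  # pass a full timestamp with "Z" so both sides are tz-aware
    return [c for c in claims if c.model_reproduced and parse_ts(c.first_seen_ts) > cutoff]


claims = [
    ClaimRecord("Claim A", 1150001, "2023-06-10T09:00:00Z", True),
    ClaimRecord("Claim B", 1120442, "2022-11-01T15:20:00Z", True),
    ClaimRecord("Claim C", 1160310, "2023-08-01T11:45:00Z", False),
]
for c in leakage_candidates(claims, "2023-04-01T00:00:00Z"):
    print(c.first_seen_rev, c.first_seen_ts, c.text)
```

Each flagged claim carries its revision ID and timestamp, so a reviewer can open that exact revision to confirm when the claim became public.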
When Wikipedia's UI is enough
- You're checking one revision diff quickly
- You need to see who made an edit
- You're browsing page history casually
Refract doesn't replace Wikipedia's UI. It adds capabilities that the UI can't provide — deterministic reproducibility, SQL queryability, cryptographic verification, and automated monitoring.