Tutorial: Track changes on a Wikipedia page
Goal
Use Refract to analyze the full revision history of a Wikipedia page and understand what changed, when, and what kind of change it was. The analysis is deterministic: the same input produces byte-for-byte identical output.
Steps
1. Run your first analysis
Zero install:
npx @refract-org/cli analyze "Earth" --depth detailed
Or with a local install:
bun add @refract-org/cli
refract analyze "Earth" --depth detailed -c
The -c flag enables caching so subsequent runs skip already-fetched revisions. This
is essential for large pages or repeated analysis.
2. Read the output
refract analyze prints one JSON event per line to stdout. Each event represents a
single observed change at a revision boundary. For example (pretty-printed here for readability):
{
  "eventId": "a3f5c2e1b7d409fa",
  "eventType": "section_reorganized",
  "fromRevisionId": 1280110001,
  "toRevisionId": 1280110100,
  "section": "Geology",
  "before": "",
  "after": "== Geology ==\nEarth's crust consists of tectonic plates...",
  "timestamp": "2024-11-25T12:00:00Z",
  "layer": "observed",
  "deterministicFacts": [
    {
      "fact": "Section Geology added with 3 paragraphs",
      "provenance": {
        "analyzer": "section-differ",
        "version": "0.5.1",
        "inputHashes": []
      }
    }
  ]
}
Key fields to read:
- eventType: what kind of change (sentence, citation, template, revert, section, etc.)
- section: where in the page the change happened
- before/after: the text before and after the change
- deterministicFacts: why the engine produced this event; always mechanical
- layer: "observed" means directly extracted from the diff; other layers ("policy_coded", "model_interpretation") are set by downstream consumers
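Because every line is an independent JSON object, a few lines of Python are enough to summarize a run. The sketch below is not part of Refract. It assumes only the eventType field shown in the example above, and the three sample events are invented for illustration.

```python
import json
from collections import Counter
from typing import Iterable


def parse_events(lines: Iterable[str]) -> list[dict]:
    """Parse line-delimited JSON: one object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def count_by_type(events: list[dict]) -> Counter:
    """Count how many events of each eventType were emitted."""
    return Counter(event["eventType"] for event in events)


# Hypothetical events, reduced to the fields this sketch reads.
sample_lines = [
    '{"eventType": "citation_added", "section": "Geology"}',
    '{"eventType": "citation_added", "section": "Orbit"}',
    '{"eventType": "revert_detected", "section": "Geology"}',
]

parsed = parse_events(sample_lines)
type_counts = count_by_type(parsed)
for event_type, total in type_counts.most_common():
    print(event_type, total)
```

To run this on real output, redirect refract analyze to a file and pass the open file object to parse_events in place of sample_lines.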
3. Scope to a specific revision range
Large pages can produce thousands of events. Narrow the range with --from and --to:
refract analyze "Earth" --from 1250000000 --to 1260000000 --depth detailed
Or filter by time with --since:
refract analyze "Earth" --since 2024-01-01 --depth detailed
4. View results in a browser
refract explore "Earth"
Opens a local web server (default port 8899) with an interactive timeline, evidence table, and diff viewer. Press Ctrl+C to stop.
5. Export structured output
# Line-delimited JSON for tools
refract export "Earth" --format ndjson > earth-events.jsonl
# CSV for spreadsheets
refract export "Earth" --format csv > earth-events.csv
# Bundle with SHA-256 verification
refract export "Earth" --bundle > earth-bundle.json
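Since the output is meant to be byte-for-byte reproducible, one way to check that claim yourself is to export twice and compare SHA-256 digests of the two files. This is a minimal sketch using only the Python standard library. It does not inspect the internal layout of the bundle, which this tutorial does not describe, and the in-memory byte strings stand in for two hypothetical export files.

```python
import hashlib
from pathlib import Path


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def files_match(first: Path, second: Path) -> bool:
    """True when both files contain exactly the same bytes."""
    return sha256_hex(first.read_bytes()) == sha256_hex(second.read_bytes())


# Stand-ins for the contents of two exports of the same page (assumed data).
run_one = b'{"eventType": "citation_added"}\n'
run_two = b'{"eventType": "citation_added"}\n'

print(sha256_hex(run_one) == sha256_hex(run_two))
```

With real exports, call files_match on the two paths, for example earth-events.jsonl and a second export written under a different name.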
6. Track a specific claim
refract claim "Earth" --text "Earth is the third planet from the Sun" -c
This shows every revision where that sentence was added, modified, removed, or reintroduced — the full lifecycle of a claim.
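To make the lifecycle idea concrete, here is a rough sketch that walks a list of events and labels each point where a sentence appears or disappears. It is not how refract claim is implemented, and the output format of that command is not shown in this tutorial. The sketch assumes events carry the before, after, and toRevisionId fields from the example in step 2, uses exact substring matching (so it cannot detect a "modified" sentence), and runs on invented sample data.

```python
def claim_lifecycle(events: list[dict], claim: str) -> list[tuple[int, str]]:
    """Label revisions where claim enters or leaves the text.

    Events must be in chronological order. Matching is an exact
    substring test, an assumption made for this sketch.
    """
    steps = []
    removed_before = False
    for event in events:
        was_present = claim in event["before"]
        is_present = claim in event["after"]
        if is_present and not was_present:
            label = "reintroduced" if removed_before else "added"
            steps.append((event["toRevisionId"], label))
        elif was_present and not is_present:
            steps.append((event["toRevisionId"], "removed"))
            removed_before = True
    return steps


CLAIM = "Earth is the third planet from the Sun"

# Hypothetical events with made-up revision IDs.
claim_events = [
    {"toRevisionId": 101, "before": "", "after": CLAIM + "."},
    {"toRevisionId": 102, "before": CLAIM + ".", "after": "Earth orbits the Sun."},
    {"toRevisionId": 103, "before": "Earth orbits the Sun.", "after": CLAIM + "."},
]

lifecycle = claim_lifecycle(claim_events, CLAIM)
for revision_id, label in lifecycle:
    print(revision_id, label)
```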
Understanding what you see
Citation events (citation_added, citation_removed, citation_replaced) trace
source churn. A section with frequent citation turnover may indicate an actively
contested topic. A citation that survives many revisions without change is a
high-stability source.
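One simple way to spot sections with heavy citation turnover is to count citation events per section. The sketch below assumes the three event type names listed above and the section field from the example event. The sample data is invented, and the count is only a rough proxy for how contested a section is.

```python
from collections import Counter

CITATION_EVENTS = {"citation_added", "citation_removed", "citation_replaced"}


def citation_churn(events: list[dict]) -> Counter:
    """Count citation events per section; other event types are ignored."""
    return Counter(
        event["section"]
        for event in events
        if event["eventType"] in CITATION_EVENTS
    )


# Hypothetical events, reduced to the two fields this sketch reads.
citation_sample = [
    {"eventType": "citation_added", "section": "Geology"},
    {"eventType": "citation_replaced", "section": "Geology"},
    {"eventType": "citation_removed", "section": "Geology"},
    {"eventType": "citation_added", "section": "Orbit"},
    {"eventType": "template_added", "section": "History"},
]

churn = citation_churn(citation_sample)
for section, total in churn.most_common():
    print(f"{section}: {total} citation events")
```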
Revert events (revert_detected) mark edits that were undone — either by the
same editor ("self-revert") or by another editor. Clusters of revert events in a
short time window often indicate edit-warring.
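Revert clusters can be found with a small gap-based grouping: sort the revert timestamps and start a new group whenever the gap to the previous revert exceeds a threshold. This is an illustrative sketch, not a Refract feature. The max_gap and min_count parameters are assumptions chosen for the example, and the events are invented.

```python
from datetime import datetime, timedelta


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2024-11-25T12:00:00Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def revert_clusters(
    events: list[dict],
    max_gap: timedelta = timedelta(hours=6),
    min_count: int = 3,
) -> list[list[datetime]]:
    """Group revert timestamps whose consecutive gaps are within max_gap."""
    times = sorted(
        parse_timestamp(event["timestamp"])
        for event in events
        if event["eventType"] == "revert_detected"
    )
    clusters: list[list[datetime]] = []
    current: list[datetime] = []
    for moment in times:
        # A gap larger than max_gap closes the current group.
        if current and moment - current[-1] > max_gap:
            if len(current) >= min_count:
                clusters.append(current)
            current = []
        current.append(moment)
    if len(current) >= min_count:
        clusters.append(current)
    return clusters


# Hypothetical events: three reverts close together, one isolated revert.
revert_sample = [
    {"eventType": "revert_detected", "timestamp": "2024-11-25T12:00:00Z"},
    {"eventType": "revert_detected", "timestamp": "2024-11-25T13:00:00Z"},
    {"eventType": "citation_added", "timestamp": "2024-11-25T14:00:00Z"},
    {"eventType": "revert_detected", "timestamp": "2024-11-25T15:00:00Z"},
    {"eventType": "revert_detected", "timestamp": "2024-12-10T09:00:00Z"},
]

found = revert_clusters(revert_sample)
for cluster in found:
    print(len(cluster), "reverts between", cluster[0], "and", cluster[-1])
```

What counts as a "short time window" depends on how active the page is, so tune both parameters per page.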
Template events (template_added, template_removed) reveal policy tagging.
Templates like {{citation needed}} or {{NPOV}} are Wikipedia's dispute signals.
Refract detects these mechanically from wikitext.
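To illustrate what mechanical detection from wikitext can look like, the sketch below pulls template names out of two versions of a passage with a regular expression and compares the sets. It is a simplified stand-in, not Refract's analyzer: the regex is an assumption, and it does not handle comments, nowiki blocks, or parser functions.

```python
import re

# Capture the template name: the text after "{{" up to the first "|" or "}}".
TEMPLATE_NAME = re.compile(r"\{\{\s*([^|{}]+?)\s*(?:\||\}\})")


def template_names(wikitext: str) -> list[str]:
    """Return template names in order of appearance."""
    return TEMPLATE_NAME.findall(wikitext)


def template_changes(before: str, after: str) -> tuple[list[str], list[str]]:
    """Return (added, removed) template names between two revisions."""
    old, new = set(template_names(before)), set(template_names(after))
    return sorted(new - old), sorted(old - new)


# Hypothetical wikitext for two consecutive revisions.
old_text = "Earth is round. The claim is disputed.{{NPOV|date=May 2024}}"
new_text = "Earth is round.{{citation needed}} The claim is disputed."

added, removed = template_changes(old_text, new_text)
print("added:", added)
print("removed:", removed)
```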
Section events (section_reorganized, lead_promotion) track structural
reorganization. When material moves from body to lead, the framing of the page has
changed — Refract flags this as a change in editorial emphasis.
Troubleshooting
- Rate limits: Wikipedia's API limits request frequency. Use -c to cache revisions and avoid re-fetching on repeat runs.
- Large pages: Pages with thousands of revisions produce many events. Use --from/--to or --since to scope to a revision range or a date range.
- Other wikis: Point --api to any MediaWiki instance: refract analyze "Erde" --api https://de.wikipedia.org/w/api.php
- Too much output: Redirect stdout to a file, or use refract export for structured formats.