Tutorial: Combat revisionism with deterministic provenance
Goal
Use Refract to detect revisionist edits (claim laundering, citation swapping, coordinated rewrites) and produce a cryptographic audit trail that anyone can reproduce independently. No one has to take the analyst's word for it.
What Refract gives historians that Wikipedia's UI doesn't
Wikipedia's page history shows you diffs. Refract shows you patterns across diffs: which claims survived, which sources got swapped, whether edits were debated or silent, and whether two wikis tell different stories. It also produces a Merkle root that commits to the exact set of events your analysis produced, so anyone can check that their rerun matches yours.
Revisionism works by making the old version disappear. Refract makes the disappearance visible as a structured, queryable event: every change becomes a timestamped, provenance-tagged record. A sentence may vanish from the page, but its removal cannot vanish from the event stream.
Step 1: Claim provenance — when did this sentence enter the record?
A contested historical claim appears on Wikipedia. When was it first added?
```shell
refract claim "Battle of X" --text "civilian casualties were minimal" -c
```
Output: a timeline of every revision where that sentence appeared, was rewritten, removed, or reintroduced. Each event carries the exact revision ID, timestamp, and a deterministic SHA-256 eventId. Run the same command over the same revision range and you get the same hash, which anyone can verify independently. An excerpt of a single event (not every field is shown):
```json
{
  "eventType": "sentence_first_seen",
  "toRevisionId": 1280110100,
  "timestamp": "2021-04-03T12:00:00Z",
  "after": "civilian casualties were minimal according to official sources",
  "claimId": "c7d8e9f0a1b23456",
  "layer": "observed"
}
```
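Refract's exact hashing scheme isn't specified in this tutorial. The sketch below shows the general technique behind a deterministic ID such as eventId, assuming the ID is a SHA-256 digest of the event's canonical JSON (sorted keys, compact separators, UTF-8). The `event_id` function is hypothetical, not Refract's API, and it hashes the event fields shown above without the ID itself.

```python
import hashlib
import json


def event_id(event: dict) -> str:
    """Hypothetical deterministic ID: SHA-256 over the event's canonical JSON.

    Assumed canonical form: keys sorted, no insignificant whitespace,
    UTF-8 bytes. Any stable serialization works if every party uses it.
    """
    canonical = json.dumps(event, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


event = {
    "eventType": "sentence_first_seen",
    "toRevisionId": 1280110100,
    "timestamp": "2021-04-03T12:00:00Z",
    "after": "civilian casualties were minimal according to official sources",
    "claimId": "c7d8e9f0a1b23456",
    "layer": "observed",
}

# Key order does not matter: the same content always yields the same ID.
reordered = dict(reversed(list(event.items())))
assert event_id(event) == event_id(reordered)

# Any change to the content, however small, yields a different ID.
assert event_id(event) != event_id({**event, "toRevisionId": 1280110101})

print(event_id(event))  # 64 hex characters
```

Canonicalization is the design choice that matters here: two tools that serialize the same fields with different key order or whitespace would otherwise produce different hashes for identical events.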
Step 2: Citation laundering — same claim, different source
A common revisionist tactic: leave the claim text intact but swap the supporting source. "Civilian casualties were minimal" stays, but the citation changes from an independent human rights report to a government press release.
Refract catches this as a citation_replaced event. Export a forensic analysis to JSONL, then query it with DuckDB:
```shell
refract analyze "Battle of X" --depth forensic -c > battle-events.jsonl
```

```sql
SELECT timestamp, before, after
FROM 'battle-events.jsonl'
WHERE eventType = 'citation_replaced'
  AND section LIKE '%Casualties%'
ORDER BY timestamp;
```
The before field contains the old citation and the after field contains the new one. The claim text didn't change, but what supports it did. Without Refract, spotting this pattern requires manually comparing every revision diff.
To find claims that lost more sources than they gained, ranked by how often their sources were swapped:
```sql
SELECT
  claimId,
  count(*) FILTER (WHERE eventType = 'citation_added') as sources_added,
  count(*) FILTER (WHERE eventType = 'citation_removed') as sources_removed,
  count(*) FILTER (WHERE eventType = 'citation_replaced') as sources_swapped
FROM 'battle-events.jsonl'
GROUP BY claimId
HAVING sources_removed > sources_added
ORDER BY sources_swapped DESC;
```
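The citation_replaced pattern reduces to a comparison between two revisions: a claim whose text survives while its citation changes. The sketch below assumes a deliberately simplified, hypothetical revision model (a dict from claim text to a single citation string); Refract's real sentence and citation extraction is not shown in this tutorial, and `citation_events` is not part of its API.

```python
from typing import Optional

# Hypothetical, simplified model of one revision: claim text -> the citation
# supporting it (None when the sentence carries no citation).
Revision = dict[str, Optional[str]]


def citation_events(before: Revision, after: Revision) -> list[dict]:
    """Report citation changes on claims whose text survived between two revisions.

    Claims that disappear, or that appear only in `after`, are sentence
    events rather than citation events, so they are skipped here.
    """
    events = []
    for claim, old_cite in before.items():
        if claim not in after:
            continue  # claim text removed or rewritten
        new_cite = after[claim]
        if old_cite == new_cite:
            continue  # support unchanged
        if old_cite and new_cite:
            kind = "citation_replaced"  # same text, different support
        elif old_cite:
            kind = "citation_removed"
        else:
            kind = "citation_added"
        events.append({"eventType": kind, "claim": claim, "before": old_cite, "after": new_cite})
    return events


old_rev = {"civilian casualties were minimal": "independent human rights report"}
new_rev = {"civilian casualties were minimal": "government press release"}
for e in citation_events(old_rev, new_rev):
    print(e["eventType"], "|", e["before"], "->", e["after"])
# citation_replaced | independent human rights report -> government press release
```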
Step 3: Edit clusters — coordinated or organic?
When three or more edits hit the same section within one hour, Refract emits an edit_cluster_detected event:
```sql
-- DuckDB list indexing is 1-based: [1] is the first fact
SELECT timestamp, section, deterministicFacts[1].detail as detail
FROM 'battle-events.jsonl'
WHERE eventType = 'edit_cluster_detected'
ORDER BY timestamp;
```
The detail field reports the editor count and time span.
Refract does not record editor identities.
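The clustering rule (three or more edits to one section within an hour) can be sketched as a sliding window over each section's sorted edit times. The `Edit` record, the output field names, and the choice to report each burst once (skipping past it instead of emitting overlapping windows) are assumptions for illustration, not Refract's implementation. The editor field is an opaque token used only for counting, mirroring the fact that Refract reports an editor count without recording identities.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta

WINDOW = timedelta(hours=1)  # "within an hour"
MIN_EDITS = 3                # "3+ edits"


@dataclass
class Edit:
    section: str
    timestamp: datetime
    editor: str  # opaque token: counted, never emitted


def find_clusters(edits: list[Edit]) -> list[dict]:
    """Emit one hypothetical edit_cluster_detected record per burst of edits in a section."""
    by_section: dict[str, list[Edit]] = defaultdict(list)
    for edit in edits:
        by_section[edit.section].append(edit)

    clusters = []
    for section, items in by_section.items():
        items.sort(key=lambda e: e.timestamp)
        i = 0
        while i < len(items):
            # Grow the window while edits stay within one hour of items[i].
            j = i
            while j + 1 < len(items) and items[j + 1].timestamp - items[i].timestamp <= WINDOW:
                j += 1
            window = items[i : j + 1]
            if len(window) >= MIN_EDITS:
                span = window[-1].timestamp - window[0].timestamp
                clusters.append({
                    "eventType": "edit_cluster_detected",
                    "section": section,
                    "timestamp": window[0].timestamp.isoformat(),
                    "editCount": len(window),
                    "editorCount": len({e.editor for e in window}),
                    "spanMinutes": span.total_seconds() / 60,
                })
                i = j + 1  # report each burst once
            else:
                i += 1
    return clusters


edits = [
    Edit("Casualties", datetime(2021, 4, 3, 12, 0), "a"),
    Edit("Casualties", datetime(2021, 4, 3, 12, 20), "b"),
    Edit("Casualties", datetime(2021, 4, 3, 12, 50), "a"),
    Edit("Background", datetime(2021, 4, 3, 12, 10), "c"),
]
for cluster in find_clusters(edits):
    print(cluster)  # one cluster in Casualties: 3 edits, 2 editors, 50.0 minutes
```

A cluster is only a timing pattern; whether it was coordinated or organic is a judgment the historian makes with the rest of the record.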
Step 4: Talk page correlation — was this debated?
Refract checks whether talk page discussion occurred within 7 days before or 3 days after each edit. The absence of talk activity alongside content changes is itself a signal:
```sql
SELECT
  count(*) FILTER (WHERE eventType LIKE 'sentence_%') as content_changes,
  count(*) FILTER (WHERE eventType LIKE 'talk_%') as talk_activity,
  count(*) FILTER (WHERE eventType = 'revert_detected') as reverts
FROM 'battle-events.jsonl';
```
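The talk-page window can be sketched as an interval check: an edit counts as discussed if any talk-page edit falls between 7 days before and 3 days after it. This sketch assumes inclusive boundaries and plain datetime timestamps; `has_talk_activity` and `silent_edits` are hypothetical helpers, not part of Refract. Sorting the talk timestamps once and using `bisect` keeps each lookup logarithmic.

```python
from bisect import bisect_left
from datetime import datetime, timedelta

BEFORE = timedelta(days=7)  # talk activity up to 7 days before the edit
AFTER = timedelta(days=3)   # ...or up to 3 days after it


def has_talk_activity(edit_time: datetime, sorted_talk_times: list[datetime]) -> bool:
    """True if any talk-page timestamp lies in [edit_time - 7 days, edit_time + 3 days]."""
    # Index of the first talk edit at or after the start of the window.
    i = bisect_left(sorted_talk_times, edit_time - BEFORE)
    return i < len(sorted_talk_times) and sorted_talk_times[i] <= edit_time + AFTER


def silent_edits(edit_times: list[datetime], talk_times: list[datetime]) -> list[datetime]:
    """Content edits with no talk-page activity anywhere in their window."""
    talk = sorted(talk_times)
    return [e for e in edit_times if not has_talk_activity(e, talk)]


content_edits = [datetime(2021, 4, 10, 12, 0), datetime(2021, 5, 1, 9, 30)]
talk_edits = [datetime(2021, 4, 9, 8, 0)]
print(silent_edits(content_edits, talk_edits))  # only the May edit lacks discussion
```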
Step 5: Cross-wiki temporal precedence — who said it first?
```shell
refract analyze "Topic" --api https://en.wikipedia.org/w/api.php > wiki-en.jsonl
refract analyze "Topic" --api https://de.wikipedia.org/w/api.php > wiki-de.jsonl
```
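Each file records when claims first appeared on that wiki. Find the corresponding claim in each file (its wording will differ between languages) and compare the timestamps of its sentence_first_seen events to see which wiki carried it first. Alternatively, use refract diff for a structured cross-wiki comparison with z-score outlier detection.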
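This tutorial doesn't document which metrics refract diff scores, so the sketch below only illustrates the z-score technique itself: standardize one per-wiki number (here a hypothetical count of citation_replaced events) against the mean and population standard deviation, then flag wikis beyond a threshold. The wiki labels and counts are made up, and `zscore_outliers` is not Refract's API.

```python
from statistics import mean, pstdev


def zscore_outliers(metric_by_wiki: dict[str, float], threshold: float = 2.0) -> dict[str, float]:
    """Return wikis whose metric lies more than `threshold` standard deviations from the mean."""
    values = list(metric_by_wiki.values())
    mu = mean(values)
    sigma = pstdev(values, mu)  # population standard deviation
    if sigma == 0:
        return {}  # every wiki has the same value; nothing stands out
    scores = {wiki: (value - mu) / sigma for wiki, value in metric_by_wiki.items()}
    return {wiki: z for wiki, z in scores.items() if abs(z) > threshold}


# Made-up counts of citation_replaced events per wiki, for illustration only.
counts = {"wiki_a": 4, "wiki_b": 5, "wiki_c": 3, "wiki_d": 4, "wiki_e": 4, "wiki_f": 20}
for wiki, z in zscore_outliers(counts).items():
    print(f"{wiki}: z = {z:.2f}")
```

A z-score threshold only means something across several wikis. With n values, no |z| computed this way can exceed sqrt(n - 1) (Samuelson's inequality), so with just two wikis every nonzero score is exactly ±1 and a threshold of 2 never fires.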
Step 6: Merkle-provable bundles — the audit trail
```shell
refract export "Battle of X" --bundle > battle-bundle.json
```
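The bundle contains every event, a Merkle root, the schema version, and the analyzer configuration. A skeptic who reruns the same command gets an identical Merkle root. A different root means something differs: usually the page itself has changed since the bundle was made, though a different schema version or analyzer configuration can also change it, which is why the bundle records both. The historian provides a cryptographic receipt, not a screenshot.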
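How Refract builds its tree isn't specified here. As a sketch of the technique, the code below computes the Merkle Tree Hash from RFC 6962 (Certificate Transparency) over event IDs in stream order. RFC 6962 prefixes leaf hashes with 0x00 and interior hashes with 0x01, and splits at the largest power of two instead of duplicating odd nodes, which avoids the known weakness where lists such as [a, b, c] and [a, b, c, c] share a root. Treat the leaf encoding, the event order, and the example IDs as assumptions.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    # RFC 6962: leaf hashes are prefixed with 0x00
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    # RFC 6962: interior node hashes are prefixed with 0x01
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_tree_hash(leaves: list[bytes]) -> bytes:
    n = len(leaves)
    if n == 0:
        return hashlib.sha256(b"").digest()  # hash of the empty tree
    if n == 1:
        return leaf_hash(leaves[0])
    # Split at k, the largest power of two strictly less than n.
    k = 1
    while k * 2 < n:
        k *= 2
    return node_hash(merkle_tree_hash(leaves[:k]), merkle_tree_hash(leaves[k:]))


# Hypothetical event IDs, in the order the events appear in the stream.
event_ids = ["eventid-1", "eventid-2", "eventid-3"]
root = merkle_tree_hash([eid.encode("utf-8") for eid in event_ids]).hex()
print(root)
```

Because every leaf feeds into the root, changing, dropping, adding, or reordering a single event produces a different root, which is what lets a skeptic compare one hash instead of every event.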
What Refract deliberately does NOT tell you
Refract never says "this edit was revisionist" or "this claim is true." It says: this sentence appeared at this revision, supported by this source, removed at this revision, discussed on talk at this date, reinstated without its original citation at this revision.
- No sentiment analysis
- No truth claims
- No editor targeting
- No model calls in the deterministic pipeline
The historian brings the domain knowledge. Refract brings the audit trail.
Next steps
- Event taxonomy — all 26 event types
- Schema reference — full EvidenceEvent structure
- Citation churn tutorial
- Dispute timeline tutorial
- Analytics with DuckDB
- Bundle and manifest formats