Tutorial: Cross-wiki comparison with refract diff

Goal

Compare how the same topic is covered across two or more MediaWiki instances — different language Wikipedias, competing Fandom wikis, or independent vs. official wikis — and detect where they diverge.

Why cross-wiki comparison matters

The same topic framed differently across wikis reveals editorial perspective, regional bias in sourcing, canon disputes in fictional universes, and institutional differences in how knowledge is maintained. Refract can detect these divergences mechanically.

Steps

1. Compare two Wikipedia language editions

refract diff "Douglas_Adams" \
  --wiki-a https://en.wikipedia.org/w/api.php \
  --wiki-b https://de.wikipedia.org/w/api.php \
  --depth detailed

Refract fetches both pages, runs the full deterministic pipeline on each, and produces a side-by-side diff of section structure, citations, categories, and sentence-level changes.

2. Compare competing Fandom wikis

refract diff "Darth_Vader" \
  --wiki-a https://starwars.fandom.com/api.php \
  --wiki-b https://clone-wars.fandom.com/api.php \
  --depth forensic

Forensic depth enables edit-cluster and talk-page correlation on both wikis, showing not just what differs but how contested the differences are.

3. Compare three wikis at once

refract diff "climate_change" \
  --wiki-a https://en.wikipedia.org/w/api.php \
  --wiki-b https://de.wikipedia.org/w/api.php \
  --wiki-c https://fr.wikipedia.org/w/api.php \
  --depth detailed

4. Export the comparison

refract diff "Bitcoin" \
  --wiki-a https://en.wikipedia.org/w/api.php \
  --wiki-b https://simple.wikipedia.org/w/api.php \
  --depth detailed > diff-output.jsonl

refract writes the events to standard output, so the shell redirection above saves them to diff-output.jsonl.

Reading the output

refract diff produces events with a wiki property tagging which wiki the event came from:

{
  "eventType": "citation_added",
  "wiki": "b",
  "wikiUrl": "https://simple.wikipedia.org/w/api.php",
  "fromRevisionId": 100,
  "toRevisionId": 101,
  "section": "History",
  "before": "",
  "after": "<ref>{{cite web |title=Bitcoin.org...}}</ref>",
  "timestamp": "2024-01-15T10:00:00Z",
  "deterministicFacts": [
    {
      "fact": "Citation added only on wiki B, absent on wiki A",
      "provenance": { "analyzer": "cross-wiki-differ", "version": "0.5.1" }
    }
  ]
}
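
A minimal sketch of how an exported stream could be consumed, assuming diff-output.jsonl holds one JSON event per line with the fields shown above (the .jsonl extension suggests this, but the file layout is not confirmed here). The helper names parse_events and count_by_wiki are hypothetical and not part of refract, and the sample lines are made-up data.

```python
import json
from collections import Counter


def parse_events(lines):
    """Parse JSON Lines text (one event object per line), skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def count_by_wiki(events):
    """Count events per (wiki tag, eventType) pair."""
    return Counter((e.get("wiki", "?"), e.get("eventType", "?")) for e in events)


# Made-up sample standing in for the exported file. For a real export, pass an
# open file handle instead, e.g. open("diff-output.jsonl", encoding="utf-8").
sample = [
    '{"eventType": "citation_added", "wiki": "b", "section": "History"}',
    '{"eventType": "citation_added", "wiki": "b", "section": "Design"}',
    '{"eventType": "category_removed", "wiki": "a", "section": ""}',
]

for (wiki, event_type), n in sorted(count_by_wiki(parse_events(sample)).items()):
    print(f"wiki {wiki}: {event_type} x{n}")
```

Grouping on the wiki property first gives a quick view of which side of the comparison produced which kinds of events before looking at individual records.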

Key signals:

Use cases

Detecting canon divergence (Fandom)

When two Fandom wikis cover the same fictional universe with different canon policies, refract diff catches the moment of divergence:

Wiki A: category_removed "Canon characters" at rev 1234
Wiki B: no corresponding event

→ Wiki A reclassified canon. Wiki B didn't. Canon divergence detected.
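
The "no corresponding event" check can be sketched over an exported event list. This is only an illustration under assumptions: it assumes the removed category name appears in the before field of a category_removed event (the sample output above does not confirm this), and the function name unmatched and the sample events are hypothetical.

```python
def unmatched(events, event_type, key, source="a", other="b"):
    """Return `source` events of `event_type` whose key has no match on `other`."""
    seen_on_other = {
        key(e)
        for e in events
        if e.get("wiki") == other and e.get("eventType") == event_type
    }
    return [
        e
        for e in events
        if e.get("wiki") == source
        and e.get("eventType") == event_type
        and key(e) not in seen_on_other
    ]


# Made-up events; the category name in "before" is an assumption.
events = [
    {"eventType": "category_removed", "wiki": "a", "before": "Canon characters"},
    {"eventType": "category_removed", "wiki": "a", "before": "Sith"},
    {"eventType": "category_removed", "wiki": "b", "before": "Sith"},
]

for e in unmatched(events, "category_removed", key=lambda e: e.get("before")):
    print("only on wiki a:", e["before"])
```

Passing the key as a function keeps the comparison rule explicit, so it can be swapped for section name or any other field once the real event shape is known.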

Detecting framing divergence (Wikipedia)

The same topic across English and German Wikipedia often differs in what's emphasized:

Wiki A: citation_added (section "Regulation") — SEC filing from 2023
Wiki B: citation_added (section "Regulierung") — BaFin statement from 2023

→ Each wiki uses locally-relevant primary sources. The evidence base is jurisdiction-specific.
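
One way to make the jurisdiction pattern visible is to tally the hostnames cited on each wiki. The sketch below assumes the after field of a citation_added event contains raw wikitext with a plain URL in it, which the truncated sample above does not show. The function name citation_hosts, the regular expression, and the sample URLs are illustrative, not refract output.

```python
import re
from collections import Counter, defaultdict
from urllib.parse import urlparse

# Rough URL matcher for wikitext: stops at whitespace and template/tag delimiters.
URL_RE = re.compile(r"https?://[^\s|}<\]]+")


def citation_hosts(events):
    """Map each wiki tag to a Counter of hostnames found in added citations."""
    hosts = defaultdict(Counter)
    for e in events:
        if e.get("eventType") != "citation_added":
            continue
        for url in URL_RE.findall(e.get("after", "")):
            host = urlparse(url).hostname
            if host:
                hosts[e.get("wiki", "?")][host] += 1
    return hosts


# Made-up events for illustration only.
events = [
    {"eventType": "citation_added", "wiki": "a",
     "after": "<ref>{{cite web |url=https://www.sec.gov/filing |title=X}}</ref>"},
    {"eventType": "citation_added", "wiki": "b",
     "after": "<ref>{{cite web |url=https://www.bafin.de/x|title=Y}}</ref>"},
]

for wiki, counter in sorted(citation_hosts(events).items()):
    print(wiki, dict(counter))
```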

Detecting editorial depth divergence

Comparing Simple English Wikipedia with English Wikipedia shows how a topic is explained at different complexity levels:

Wiki A (English): 340 events, 26 revisions, forensic depth
Wiki B (Simple): 45 events, 8 revisions, detailed depth

→ Simple Wikipedia has fewer events, fewer citations, fewer contested claims.
→ The complexity difference is measurable from the event stream alone.
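
Per-wiki totals like the ones above could be derived from the event stream roughly as follows. This sketch assumes a revision count can be approximated by the number of distinct toRevisionId values per wiki, which may not match how refract itself counts revisions. The function name summarize and the sample events are hypothetical.

```python
from collections import defaultdict


def summarize(events):
    """Return per-wiki totals: events, distinct target revisions, citations added."""
    counts = defaultdict(int)
    revisions = defaultdict(set)
    citations = defaultdict(int)
    for e in events:
        wiki = e.get("wiki", "?")
        counts[wiki] += 1
        if "toRevisionId" in e:
            revisions[wiki].add(e["toRevisionId"])
        if e.get("eventType") == "citation_added":
            citations[wiki] += 1
    return {
        wiki: {
            "events": counts[wiki],
            "revisions": len(revisions[wiki]),
            "citations_added": citations[wiki],
        }
        for wiki in counts
    }


# Made-up events for illustration only.
events = [
    {"wiki": "a", "eventType": "citation_added", "toRevisionId": 101},
    {"wiki": "a", "eventType": "sentence_changed", "toRevisionId": 101},
    {"wiki": "a", "eventType": "citation_added", "toRevisionId": 102},
    {"wiki": "b", "eventType": "citation_added", "toRevisionId": 7},
]

for wiki, stats in sorted(summarize(events).items()):
    print(wiki, stats)
```

Counts are only comparable when both wikis were analyzed at the same depth, since a deeper analysis may emit more event types.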

Troubleshooting

Next steps