Concepts
Refract is the open claim-history layer for public knowledge. It ingests revision histories from MediaWiki instances and produces a deterministic event stream showing how claims enter, change, stabilize, and exit the public record — where they came from, what supported them, what challenged them, and when context altered their meaning.
Every run against the same revision range produces identical output: byte-for-byte reproducible.
How determinism works
Refract never calls an LLM, never samples randomly, and never depends on external state. Given revision text from a MediaWiki API, each analyzer applies a pure function: (wikitext) → events. The same wikitext always produces the same events. This holds across machines, OS versions, and time.
Deterministic identity (`eventId`) extends this guarantee: events with identical content produce identical hashes, so deduplication and cross-referencing remain reliable even when the same observation is produced by independent runs.
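A minimal sketch of both properties in TypeScript, assuming a toy event shape and a toy analyzer (the real `EvidenceEvent` schema and analyzer set are documented in schema.md):

```ts
import { createHash } from "node:crypto";

// Toy event shape; Refract's real EvidenceEvent carries full provenance.
interface ToyEvent {
  type: string;
  revisionId: number;
  payload: string;
}

// A pure analyzer: same wikitext in, same events out. No network calls,
// no randomness, no clock reads. (Toy logic: report each <ref> tag.)
function analyze(wikitext: string, revisionId: number): ToyEvent[] {
  const refs = wikitext.match(/<ref[^>]*>/g) ?? [];
  return refs.map((ref, i) => ({ type: "ref_seen", revisionId, payload: `${i}:${ref}` }));
}

// Deterministic identity: SHA-256 over canonicalized content, so two
// independent runs over the same revision yield the same eventId.
function eventId(event: ToyEvent): string {
  const canonical = JSON.stringify(event, Object.keys(event).sort());
  return createHash("sha256").update(canonical).digest("hex");
}
```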
Architecture
Refract ingests revision histories from MediaWiki APIs, runs deterministic analyzers, and emits a structured event stream with provenance tags.
Pipeline stages
MediaWiki API → Fetch → Analyze → Report → (optional) Validate
| Stage | What it does | Reproducible? |
|---|---|---|
| Fetch | Retrieves revisions from the MediaWiki API, handles rate limits, pagination | Yes — same API + same page → same revisions |
| Analyze | Runs deterministic analyzers — section diffs, citation tracking, revert detection, template classification, category/wikilink extraction | Yes — pure functions, no external state |
| Report | Assembles structured events with revision, section, and timestamp provenance | Yes — deterministic ID via SHA-256 content hash |
| Validate | Optional: compares pipeline output against ground truth labels (eval package) | Yes — deterministic comparison |
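To make the Fetch stage concrete, here is a sketch against the standard MediaWiki action API (`action=query&prop=revisions`). This is not Refract's actual fetcher; rate limiting, retries, and error handling are elided.

```ts
interface Revision {
  id: number;
  timestamp: string;
  wikitext: string;
}

// Fetch a page's revision history via the MediaWiki action API, following
// rvcontinue tokens for pagination. Endpoint and parameters are standard
// MediaWiki; robustness concerns are omitted from this sketch.
async function fetchRevisions(
  title: string,
  api = "https://en.wikipedia.org/w/api.php"
): Promise<Revision[]> {
  const revisions: Revision[] = [];
  let rvcontinue: string | undefined;
  do {
    const params = new URLSearchParams({
      action: "query",
      format: "json",
      formatversion: "2",
      prop: "revisions",
      titles: title,
      rvprop: "ids|timestamp|content",
      rvslots: "main",
      rvlimit: "50",
    });
    if (rvcontinue) params.set("rvcontinue", rvcontinue);
    const data: any = await (await fetch(`${api}?${params}`)).json();
    for (const page of data.query.pages) {
      for (const r of page.revisions ?? []) {
        revisions.push({
          id: r.revid,
          timestamp: r.timestamp,
          wikitext: r.slots.main.content,
        });
      }
    }
    rvcontinue = data.continue?.rvcontinue;
  } while (rvcontinue);
  return revisions;
}
```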
Depth levels
The `--depth` flag controls how many analyzers run:

- `brief` — section differ only. Quick check for any change.
- `detailed` (default) — all standard analyzers (section, citation, revert, template).
- `forensic` — includes cross-revision correlation, metadata trackers, and cluster detection. Use for full audit or export.
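Illustratively, the flag selects an analyzer set. The labels below are descriptive stand-ins, not Refract's internal module names:

```ts
// Descriptive mapping only; actual analyzer names live in the SDK.
type Depth = "brief" | "detailed" | "forensic";

const analyzersByDepth: Record<Depth, string[]> = {
  brief: ["section-diff"],
  detailed: ["section-diff", "citation", "revert", "template"],
  forensic: [
    "section-diff", "citation", "revert", "template",
    "cross-revision-correlation", "metadata-tracker", "cluster-detection",
  ],
};
```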
See depth reference.
Output model
The pipeline produces timed `EvidenceEvent` objects with 26 event types, all mechanical:

- Sentence events: `sentence_first_seen`, `sentence_removed`, `sentence_modified`, `sentence_reintroduced`
- Citation events: `citation_added`, `citation_removed`, `citation_replaced`
- Section events: `section_reorganized`
- Template events: `template_added`, `template_removed`, `template_parameter_changed`
- Revert events: `revert_detected`
- Cluster/activity events: `edit_cluster_detected`, `talk_activity_spike`
- Page events: `lead_promotion`, `lead_demotion`, `page_moved`
- Link/category events: `wikilink_added`, `wikilink_removed`, `category_added`, `category_removed`
- Protection events: `protection_changed`
- Talk page events: `talk_page_correlated`, `talk_thread_opened`, `talk_thread_archived`, `talk_reply_added`
The event schema includes a `modelInterpretation` field and a `model_interpretation` evidence layer — these are never set by Refract's deterministic engine. They exist so that downstream consumers can attach semantic analysis without modifying the event stream.
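For instance, a hedged sketch of an event as it leaves the deterministic engine (field names beyond those introduced above are illustrative; schema.md is authoritative):

```ts
// Illustrative instance; exact field names and nesting are defined in
// schema.md. All values are hypothetical.
const event = {
  eventId: "3f1a…",                 // SHA-256 content hash (truncated)
  type: "citation_added",
  revisionId: 1234567890,           // hypothetical revision
  section: "History",
  timestamp: "2024-03-01T12:00:00Z",
  modelInterpretation: null,        // never set by the deterministic engine
};
```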
Analyzers accept configurable parameters (similarity thresholds, time windows, revert patterns) via `AnalyzerConfig`, passed from the CLI or SDK. This lets users tune detection sensitivity without modifying analyzer code.
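A sketch of such tuning: the 0.8 similarity default comes from the boundary table in the next section, while the other option names are hypothetical stand-ins for the real `AnalyzerConfig` fields.

```ts
// Hypothetical AnalyzerConfig shape; consult the SDK types for the real
// field names.
const config = {
  sentenceSimilarityThreshold: 0.8,         // word-overlap default (see boundary table)
  clusterWindowMinutes: 60,                 // hypothetical time window
  clusterMinEdits: 3,                       // hypothetical minimum cluster size
  revertPatterns: [/\brv\b/i, /\brevert/i], // illustrative comment patterns
};
```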
The pipeline can also produce an `ObservationReport` — a structured aggregate that groups events by claim, with a `ClaimLedger` tracking each claim's lifecycle across revisions.
See schema.md for the full reference.
Configurable heuristics / BYO-inference boundaries
Every analyzer threshold encodes an interpretive judgment — what counts as a revert, how close edits must be to form a cluster, what degree of sentence similarity separates "modified" from "removed and added." These thresholds accept optional overrides via `AnalyzerConfig`, and the effective parameters are recorded in `FactProvenance.parameters` whenever non-default values are used.
Each threshold is a BYO-inference boundary — a typed function signature where a model can replace the default heuristic:
| Boundary | Default (mechanical) | Pluggable (model) |
|---|---|---|
| Sentence similarity | Word-overlap ratio (0.8) | "Are these two sentences the same claim?" |
| Revert detection | 6 regex patterns | "Is this edit comment a revert?" |
| Template classification | Name-to-type lookup | "What policy signal does this template represent?" |
| Edit cluster detection | Time window + min size | "Are these edits semantically related?" |
| Heuristic classification | Size thresholds + comment patterns | "What kind of edit is this?" |
The defaults work offline with no configuration required. Consumers can supply a model at any boundary, and the event's `FactProvenance.parameters` records the path taken — `similaritySource: "model"` or `"default"` — keeping the audit trail transparent either way.
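A minimal sketch of the sentence-similarity boundary under these assumptions: the 0.8 default and the `similaritySource` values come from the table and text above, while the function signature and verdict labels are illustrative.

```ts
// A pluggable boundary: the default is the mechanical word-overlap
// heuristic; a consumer may substitute a model-backed judgment with the
// same signature.
type SimilarityFn = (a: string, b: string) => number; // score in [0, 1]

const wordOverlap: SimilarityFn = (a, b) => {
  const wordsA = new Set(a.toLowerCase().split(/\s+/).filter(Boolean));
  const wordsB = new Set(b.toLowerCase().split(/\s+/).filter(Boolean));
  if (wordsA.size === 0 || wordsB.size === 0) return 0;
  const shared = [...wordsA].filter((w) => wordsB.has(w)).length;
  return shared / Math.max(wordsA.size, wordsB.size);
};

// The provenance field mirrors FactProvenance.parameters.similaritySource.
function classifyPair(a: string, b: string, similarity: SimilarityFn = wordOverlap) {
  const score = similarity(a, b);
  return {
    verdict: score >= 0.8 ? "modified" : "removed_and_added",
    provenance: {
      similaritySource: similarity === wordOverlap ? "default" : "model",
      score,
    },
  };
}
```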
This is the architectural spine: Refract stays purely mechanical, but the boundaries where judgment enters are explicit, typed, and versioned. Foundation model improvements automatically benefit consumers without Refract changing.
Two-knowledge split
Refract separates mechanical observation from human judgment:
- Deterministic layer — what changed, byte-for-byte reproducible. All analyzers, all events.
- Ground truth layer — independently verified outcome labels (talk page consensus, RFC closures, ArbCom decisions) stored in `@refract-org/eval`. Used for validation, never redefined by pipeline output.
Downstream model interpretation is a separate concern — it consumes Refract's event stream without modifying it.
What Refract does not do
- No model interpretation — semantic analysis of what a change means is handled by downstream systems
- No truth claims — Refract reports what changed, not whether the change is accurate
- No prediction, sentiment analysis, or editor scoring
- No claims about compliance, policy violations, or decision relevance
Independent ground truth
The eval package stores outcome labels independently from pipeline output. These labels are collected from talk page discussions, RFC closures, and arbitration decisions — human consensus about what happened. The evaluation harness compares pipeline events against these labels to measure analyzer accuracy without feedback loops.
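A toy sketch of that comparison, assuming labels keyed by revision and event type (the real harness in `@refract-org/eval` defines its own label schema and matching rules):

```ts
// Toy precision/recall over (revisionId, type) pairs; illustrative only.
interface Labeled {
  revisionId: number;
  type: string;
}

function score(predicted: Labeled[], groundTruth: Labeled[]) {
  const key = (e: Labeled) => `${e.revisionId}:${e.type}`;
  const truthSet = new Set(groundTruth.map(key));
  const hits = predicted.filter((e) => truthSet.has(key(e))).length;
  return {
    precision: predicted.length ? hits / predicted.length : 1,
    recall: groundTruth.length ? hits / groundTruth.length : 1,
  };
}
```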
References
Refract builds on research in peer production, computational social science, and knowledge evolution. The following work informs its architecture and assumptions:
Wikipedia and peer production
- Viégas, Wattenberg, & McKeon (2007) — The Hidden Order of Wikipedia. Visualizing the edit history of Wikipedia articles reveals patterns of cooperation and conflict.
- Halfaker, Geiger, Morgan, & Riedl (2013) — The Rise and Decline of an Open Collaboration System: How Wikipedia's reaction to popularity is causing its decline. American Behavioral Scientist. Documents the systemic pressures that shape Wikipedia's editorial dynamics.
- Kittur & Kraut (2008) — Harnessing the Wisdom of Crowds in Wikipedia: Quality through Coordination. Proceedings of CSCW. Shows that coordination structures, not just contributor count, determine article quality.
- Keegan, Gergle, & Contractor (2013) — Hot Off the Wiki: Structures and Dynamics of Wikipedia's Coverage of Breaking News Events. American Behavioral Scientist. Analyzes how Wikipedia responds to real-world events — the pattern Refract's edit cluster detector captures.
- Müller-Birn, Dobusch, & Herbsleb (2013) — Work-to-rule: The emergence of algorithmic regulation in Wikipedia. Proceedings of OpenSym. How policy templates and automated tools enforce editorial norms.
Content dynamics and claim evolution
- Adler, de Alfaro, Pye, & Raman (2008) — Measuring Author Contributions to the Wikipedia. Proceedings of WikiSym. Fine-grained authorship attribution for Wikipedia; Refract differs by tracking sentence lifecycle rather than who wrote it.
- Ferschke, Gurevych, & Rittberger (2013) — The Impact of Topic Bias on Quality Flaw Prediction in Wikipedia. Proceedings of ACL. Analyzes how cleanup templates correlate with content quality and how topic bias affects quality assessment.
Provenance and evidence infrastructure
- Buneman, Khanna, & Tan (2001) — Why and Where: A Characterization of Data Provenance. ICDT. Foundational work on data provenance that motivates Refract's `FactProvenance` and `inputHashes` design.
- Carroll, Bizer, Hayes, & Stickler (2005) — Named Graphs, Provenance and Trust. WWW. Semantic web provenance patterns that influenced Refract's `EvidenceLayer` taxonomy (`observed`, `policy_coded`, `model_interpretation`).
- Merkle (1980) — Protocols for Public Key Cryptosystems. IEEE S&P. Merkle tree construction used in Refract's replay manifests for observation integrity verification.
- Moreau et al. (2011) — The Open Provenance Model core specification (v1.1). Future Generation Computer Systems. Provenance modeling patterns that informed the `FactProvenance` design (analyzer, version, `inputHashes`).
Decision intelligence (relevant to downstream consumers)
- Kahneman & Tversky (1979) — Prospect Theory: An Analysis of Decision under Risk. Econometrica. Foundational work on how humans make judgments under uncertainty — the cognitive context Refract's deterministic event stream is designed to anchor.
- Tetlock & Gardner (2015) — Superforecasting: The Art and Science of Prediction. Calibration and debiasing techniques that downstream consumers can apply to model interpretation built on top of Refract's event stream.
- Sackett et al. (1996) — Evidence Based Medicine: What it is and what it isn't. BMJ. Defines the evidence hierarchy that healthcare downstream consumers (e.g., NextConsensus) use to weight source types.
- Guyatt et al. (2008) — GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ. The GRADE framework for evidence quality assessment — a model for how downstream consumers might score source reliability.