Complete workflow: From nothing to insight
This page ties together every part of Refract into a single end-to-end workflow. Follow along to go from zero install to a data-backed conclusion about how a Wikipedia page has changed.
What we're doing
We'll analyze the Bitcoin Wikipedia page, track a specific claim across its revision history, export the data, query it with DuckDB, and draw a conclusion about claim stability — all using deterministic, byte-reproducible output.
Step 1: Zero-install analysis
npx @refract-org/cli analyze "Bitcoin" --depth detailed
This fetches 20 recent revisions and runs all standard deterministic analyzers. The full event stream is one JSON event per line; the excerpt below shows it in condensed, human-readable form:
Analysis of "Bitcoin" at depth detailed found 330 events across 20 revisions.
[2009-03-08T16:41:44Z] wikilink_added (rev 275832581→275832690)
Section: body
• target: cryptography
[2009-12-10T14:15:09Z] citation_added (rev 308164432→308164529)
Section: (lead)
• ref: href=http://sourceforge.net/projects/bitcoin/
...
Every event has a type, a revision range, a section, before/after snapshots, and deterministic facts explaining why it was produced.
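To make the event shape concrete, here is a minimal Python sketch that reads a one-JSON-event-per-line stream and tallies events by type. The event_type and section keys match the columns queried later on this page; the sample records are invented for illustration, and no other keys of the real Refract output are assumed.

```python
import json
from collections import Counter
from typing import Iterable


def count_event_types(lines: Iterable[str]) -> Counter:
    """Tally events by their `event_type` key, skipping blank lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        counts[event["event_type"]] += 1
    return counts


# Illustrative records only (assumption); real events carry more fields.
SAMPLE = """\
{"event_type": "wikilink_added", "section": "body"}
{"event_type": "citation_added", "section": "(lead)"}
{"event_type": "citation_added", "section": "body"}
"""

if __name__ == "__main__":
    for event_type, n in count_event_types(SAMPLE.splitlines()).most_common():
        print(event_type, n)
```

The same function accepts an open file handle, since iterating a text file yields one line at a time.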
Step 2: Track a specific claim
Now let's trace a specific claim through the page's history:
refract claim "Bitcoin" --text "decentralized" -c
Refract finds every revision where the word "decentralized" appears in the page text, tracks when it was first added, modified, or removed, and prints a claim lifecycle:
Claim: "decentralized" on Bitcoin
• first_seen: 2009-01-03 (rev 275832581)
• revisions present: 18 of 20
• modifications: 2 (spelling correction, phrasing change)
• removed: never
• status: STABLE
This claim is stable — it appeared early, persisted through most revisions, and was never removed. Contrast with a contested claim:
Claim: "completely anonymous" on Bitcoin
• first_seen: 2010-04-15 (rev 308200000)
• revisions present: 3 of 20
• removed: 2010-05-01 (rev 308350000)
• reintroduced: never
• status: REMOVED
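The two lifecycles above can be approximated from nothing more than a per-revision presence flag. The sketch below is an assumption about how such a summary could be derived, not Refract's documented logic: the STABLE and REMOVED labels come from the output above, while ABSENT and REINTRODUCED are hypothetical labels added so the function covers every case.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ClaimSummary:
    first_seen: Optional[int]  # index of the first revision containing the claim
    revisions_present: int     # number of revisions containing the claim
    removals: int              # count of present -> absent transitions
    status: str


def summarize_claim(present: Sequence[bool]) -> ClaimSummary:
    """Summarize a claim from presence flags ordered oldest to newest."""
    first_seen = next((i for i, p in enumerate(present) if p), None)
    removals = sum(
        1 for prev, cur in zip(present, present[1:]) if prev and not cur
    )
    if first_seen is None:
        status = "ABSENT"  # hypothetical label
    elif present[-1]:
        # Present in the newest revision: stable unless it was ever removed.
        status = "STABLE" if removals == 0 else "REINTRODUCED"  # latter is hypothetical
    else:
        status = "REMOVED"
    return ClaimSummary(first_seen, sum(present), removals, status)


if __name__ == "__main__":
    # 20 revisions: claim appears in the third one and never leaves.
    print(summarize_claim([False, False] + [True] * 18))
    # 20 revisions: claim appears for three revisions, then disappears.
    print(summarize_claim([False] * 5 + [True] * 3 + [False] * 12))
```

Detecting modifications such as spelling or phrasing changes would need the claim text at each revision, which this sketch deliberately leaves out.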
Step 3: Export for analysis
refract export "Bitcoin" --format ndjson > bitcoin-events.jsonl
Now we have a portable, queryable file of the complete event stream.
Step 4: Query with DuckDB
duckdb -c "
SELECT event_type, count(*) as cnt
FROM 'bitcoin-events.jsonl'
GROUP BY event_type
ORDER BY cnt DESC;
"
Output:
| event_type | cnt |
|---|---|
| sentence_modified | 85 |
| citation_added | 34 |
| sentence_first_seen | 28 |
| revert_detected | 15 |
| template_added | 12 |
| citation_removed | 8 |
The page has more citation_added events (34) than citation_removed events (8), a net gain of 26 citations. This suggests the page is accumulating sources over time.
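The net figure is simple arithmetic on the counts in the table. As a small sketch, assuming the counts have already been loaded into a dictionary keyed by event type (the function name is illustrative):

```python
from typing import Mapping


def net_citations(counts: Mapping[str, int]) -> int:
    """Citations added minus citations removed; missing types count as 0."""
    return counts.get("citation_added", 0) - counts.get("citation_removed", 0)


# Counts copied from the query output above.
counts = {
    "sentence_modified": 85,
    "citation_added": 34,
    "sentence_first_seen": 28,
    "revert_detected": 15,
    "template_added": 12,
    "citation_removed": 8,
}

if __name__ == "__main__":
    print(net_citations(counts))  # 34 - 8 = 26
```

A positive result means more citations were added than removed in the analyzed range; it does not measure the quality of those sources.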
Find the most contested section:
SELECT section,
count(*) FILTER (WHERE event_type = 'revert_detected') as reverts,
count(*) FILTER (WHERE event_type = 'edit_cluster_detected') as clusters
FROM 'bitcoin-events.jsonl'
GROUP BY section
HAVING reverts > 0 OR clusters > 0
ORDER BY reverts DESC;
| section | reverts | clusters |
|---|---|---|
| Regulation | 6 | 2 |
| Scalability debate | 4 | 1 |
| History | 3 | 0 |
The "Regulation" section has the most reverts (6) and edit clusters (2) — this is the most actively contested part of the page.
Step 5: Correlate with talk page activity
Run the talk page analysis:
refract analyze "Talk:Bitcoin" --depth detailed
Compare revert days with talk activity days. Days with high reverts and no talk activity suggest edit-warring. Days with reverts and active discussion suggest genuine editorial deliberation:
2025-01-15: 2 reverts, 5 talk replies → deliberation
2025-02-03: 4 reverts, 0 talk replies → edit-warring
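The comparison above can be sketched as a small classifier over per-day counts. This assumes the revert and talk-reply counts have already been aggregated by day from the two analyses. The threshold (any revert, any reply) is an assumption, since the text does not define what counts as "high".

```python
from typing import Dict


def classify_days(
    reverts: Dict[str, int], talk_replies: Dict[str, int]
) -> Dict[str, str]:
    """Label each day that had at least one revert.

    Rule taken from the text: reverts alongside talk activity suggest
    deliberation; reverts with no talk activity suggest edit-warring.
    """
    labels: Dict[str, str] = {}
    for day, n in sorted(reverts.items()):
        if n <= 0:
            continue  # days without reverts are not classified
        has_talk = talk_replies.get(day, 0) > 0
        labels[day] = "deliberation" if has_talk else "edit-warring"
    return labels


# Per-day counts taken from the example above.
reverts = {"2025-01-15": 2, "2025-02-03": 4}
talk = {"2025-01-15": 5}

if __name__ == "__main__":
    for day, label in classify_days(reverts, talk).items():
        print(day, label)
```

Matching on the exact calendar day is the simplest choice. A discussion that starts the day after a revert would be missed, so a wider window may be more appropriate in practice.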
The complete picture
After these five steps, you know:
- What changed on the Bitcoin page (330 deterministic events across 20 revisions)
- Which claims are stable ("decentralized" — present in 18 of 20 revisions, never removed)
- Which claims were contested ("completely anonymous" — added, then removed permanently)
- Which sections are most disputed (Regulation: 6 reverts, 2 edit clusters)
- Whether disputes were discussed (talk page correlation shows deliberation vs. edit-warring)
- How sourcing evolved (34 citations added versus 8 removed, a net gain of 26)
All of this is deterministic — run the same commands a year from now on the same revision range and you get identical output.
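One way to check the reproducibility claim yourself is to export the same revision range twice and compare file hashes. The sketch below assumes two export files already exist on disk; the file names in the usage note are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 of a file, read in chunks so large exports stay cheap."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def exports_match(a: Path, b: Path) -> bool:
    """True when both files are byte-for-byte identical (by SHA-256)."""
    return sha256_of(a) == sha256_of(b)
```

For example, exports_match(Path("run-a.jsonl"), Path("run-b.jsonl")) should return True if both runs covered the same revision range and the output is byte-reproducible as described.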
What to do next
- Automate monitoring: Set up refract cron to re-observe daily (cron guide)
- Compare across wikis: Run refract diff on Bitcoin across English and Simple Wikipedia (cross-wiki tutorial)
- Build a dashboard: Load events into DuckDB and connect to Grafana or Observable (analytics guide)
- Integrate with RAG: Use claim stability signals to filter retrieval results (downstream guide)
- Verify accuracy: Run refract eval to benchmark analyzer precision (eval guide)
The same workflow for any page
The same steps apply to any MediaWiki page, whether on Wikipedia, Fandom, or a private wiki:
refract analyze "Darth_Vader" --api https://starwars.fandom.com/api.php --depth detailed
refract claim "Darth_Vader" --text "midichlorians" --api https://starwars.fandom.com/api.php
refract export "Darth_Vader" --api https://starwars.fandom.com/api.php --format ndjson > vader.jsonl
The commands are the same. The output format is the same. The deterministic guarantee is the same. Only the API endpoint changes.