Refract: deterministic observation layer for model evaluation
Refract reveals how claims change across public revision histories — and gives AI researchers reproducible evidence for model evaluation.
npx @refract-org/cli analyze "Earth" --depth brief
Node.js 20+ or Bun 1.2+ · Git 2.x · Any MediaWiki instance
The printing press froze knowledge in editions. Wikipedia made it mutable. Refract makes the mutation legible — a deterministic event stream showing where every claim came from, what changed, what supported it, what challenged it, when it stabilized, and what context altered its meaning.
Why Refract?
Deterministic
Same input, same output. With the default offline analyzers, every run is byte-for-byte identical: no model in the loop, no variance.
Provenance-tagged
Every event carries revision, section, timestamp, and analyzer identity.
BYO-Inference Boundaries
Every threshold is a configurable boundary. Plug a model where you need one; defaults run offline.
26 Event Types
Sentence lifecycles, citations, reverts, talk pages, protection levels, and edit clusters.
Merkle-provable
Signed bundles and replay manifests for audit trails and dataset integrity.
AI Agent Integration
Built-in MCP server. Claude Code, Cursor, and VS Code call Refract tools natively.
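One way to picture a BYO-inference boundary is as a decision slot that has a deterministic offline default, accepts an optional model override, and records which path produced each result. The TypeScript below is a hypothetical sketch. `Boundary`, `runBoundary`, the `sentence-rewrite` boundary, and its 0.5 threshold are illustrative assumptions, not Refract's SDK API.

```typescript
// Hypothetical sketch of a BYO-inference boundary; not Refract's SDK API.
// A boundary is a decision point with a deterministic offline default and an
// optional user-supplied model. Each result records which path produced it.

type Path = "heuristic" | "model";

interface Decision {
  value: boolean;
  path: Path;
}

interface Boundary<T> {
  name: string;
  heuristic: (input: T) => boolean; // deterministic default, runs offline
  model?: (input: T) => boolean;    // optional override, e.g. a wrapped LLM call
}

function runBoundary<T>(boundary: Boundary<T>, input: T): Decision {
  if (boundary.model !== undefined) {
    return { value: boundary.model(input), path: "model" };
  }
  return { value: boundary.heuristic(input), path: "heuristic" };
}

// Illustrative boundary: call a sentence edit a "rewrite" when more than half
// of its words changed. The name and the 0.5 threshold are assumptions.
type EditStats = { changedWords: number; totalWords: number };

const rewriteBoundary: Boundary<EditStats> = {
  name: "sentence-rewrite",
  heuristic: ({ changedWords, totalWords }) =>
    totalWords > 0 && changedWords / totalWords > 0.5,
};

const edit: EditStats = { changedWords: 6, totalWords: 10 };

// Default path: no model configured, so the heuristic decides.
const offline = runBoundary(rewriteBoundary, edit);

// Plugged path: a stand-in function plays the role of a model.
const plugged = runBoundary({ ...rewriteBoundary, model: () => false }, edit);
```

Recording `path` alongside `value` is what lets an audit trail show whether a heuristic or a model made each call. Real model calls are usually asynchronous, so a production version would more likely return `Promise<Decision>`.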
Quick start
```bash
# 1. Try it with no install via npx
npx @refract-org/cli analyze "Earth" --depth brief
#    Steps 2-6 use the global `refract` binary:
npm install -g @refract-org/cli
# 2. Explore results in the web UI
refract explore "Earth"
# 3. Connect an AI agent
refract mcp
# 4. Export as structured data
refract export "Earth" --format ndjson > earth-events.jsonl
# 5. Save as a signed evidence bundle
refract export "Earth" --bundle > earth-bundle.json
# 6. Output an ObservationReport with claim lifecycle
refract analyze "Earth" --report > earth-report.json
```
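The NDJSON export from step 4 holds one JSON object per line, so it can be consumed without a special parser. The sketch below tallies events by type. It assumes each event has a string `type` field, and the sample type names are hypothetical. Check the event schema reference for the real field names.

```typescript
import { existsSync, readFileSync } from "node:fs";

// Assumption: each NDJSON line is one JSON event object with a string `type`
// field. The real schema may name or nest this differently.
type RawEvent = { type?: unknown };

function countByType(ndjson: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of ndjson.split("\n")) {
    if (line.trim() === "") continue; // skip blank lines, e.g. a trailing newline
    const event = JSON.parse(line) as RawEvent;
    const key = typeof event.type === "string" ? event.type : "(missing type)";
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

// Inline sample with hypothetical event types, so the sketch runs on its own.
const sample =
  [
    '{"type":"sentence_added","revision":101}',
    '{"type":"citation_removed","revision":102}',
    '{"type":"sentence_added","revision":103}',
  ].join("\n") + "\n";
const sampleCounts = countByType(sample);

// If the file from step 4 is on disk, the same function applies to it.
if (existsSync("earth-events.jsonl")) {
  console.log(countByType(readFileSync("earth-events.jsonl", "utf8")));
}
```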
What Refract Is
- Deterministic: Guarantees the same output for the same input, offline.
- Provenance-tagged: Identifies source revision, timestamp, and analyzer version.
- Verifiable: Cryptographically proves that exported observations are unaltered, using Merkle tree envelopes.
- Open Layer: A raw observation feed designed for downstream processing.
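Determinism and Merkle verification reinforce each other. Identical event lines always hash to the same root, and changing any line changes the root. The sketch below computes a plain SHA-256 Merkle root over event lines. The leaf encoding, the odd-node rule, and the function names are assumptions for illustration, not Refract's envelope format.

```typescript
import { createHash } from "node:crypto";

// Illustrative Merkle root over event lines. NOT Refract's envelope format:
// the hash function, leaf encoding, and odd-node rule are assumptions.
function sha256(data: string | Buffer): Buffer {
  return createHash("sha256").update(data).digest();
}

function merkleRoot(leaves: string[]): string {
  if (leaves.length === 0) throw new Error("merkleRoot needs at least one leaf");
  // Hash each leaf, e.g. one NDJSON event line.
  let level: Buffer[] = leaves.map((leaf) => sha256(leaf));
  while (level.length > 1) {
    const next: Buffer[] = [];
    for (let i = 0; i < level.length; i += 2) {
      const left = level[i];
      // When a level has an odd number of nodes, pair the last one with itself.
      const right = i + 1 < level.length ? level[i + 1] : left;
      next.push(sha256(Buffer.concat([left, right])));
    }
    level = next;
  }
  return level[0].toString("hex");
}

const events = ['{"type":"a"}', '{"type":"b"}', '{"type":"c"}'];
const rootA = merkleRoot(events);
const rootB = merkleRoot([...events]); // same input, computed again
const tamperedRoot = merkleRoot(['{"type":"a"}', '{"type":"X"}', '{"type":"c"}']);
```

This toy tree omits domain separation between leaves and internal nodes. Production schemes such as the Certificate Transparency Merkle tree (RFC 6962) prefix the two differently to rule out second-preimage tricks.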
What Refract Is Not
- No Model Interpretation: Does not decide semantic meaning or intent.
- No Truth Claims: Observes what changed, not which version is correct.
- No Editor Profiles: Does not rank, grade, score, or track editors.
- No Policy Judgments: Leaves relevance decisions and policy rules to downstream tools.
By use case
Journalist / Researcher
Trace claim evolution and sources across revision history.
Data Scientist / OSINT
Extract NDJSON events and run columnar SQL analysis in DuckDB.
ML / RAG Engineer
Score retrieved texts by stability and provenance quality indicators.
Policy / Compliance
Monitor pages on schedules, test documentation, and send webhooks.
AI Agent Developer
Expose CLI events natively to agents using the built-in MCP server.
AI Model Evaluator
Prove temporal leakage and recency cutoffs against revision histories.
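For the ML / RAG use case, "stability" can be turned into a number that ranks retrieved passages. The sketch below is one hypothetical scoring rule. The `ClaimHistory` fields, the one-year horizon, the revert penalty, and the citation bonus are all illustrative assumptions, not Refract's scoring model.

```typescript
// Hypothetical stability score for a retrieved claim. The fields, weights,
// and one-year horizon are illustrative assumptions, not Refract's model.
interface ClaimHistory {
  lastModified: Date; // most recent revision that changed the sentence
  reverts: number;    // times the sentence was reverted
  cited: boolean;     // whether it currently carries a citation
}

const DAY_MS = 24 * 60 * 60 * 1000;
const HORIZON_DAYS = 365;

function stabilityScore(h: ClaimHistory, now: Date): number {
  // Days the claim has gone unchanged, clamped to [0, HORIZON_DAYS].
  const ageDays = (now.getTime() - h.lastModified.getTime()) / DAY_MS;
  const stableDays = Math.min(Math.max(ageDays, 0), HORIZON_DAYS);
  let score = stableDays / HORIZON_DAYS;          // 0..1, longer unchanged scores higher
  score /= 1 + h.reverts;                         // each revert dampens the score
  if (h.cited) score = Math.min(1, score + 0.1);  // small bonus for a citation
  return Number(score.toFixed(3));                // round for readable, stable output
}

const now = new Date("2024-06-01T00:00:00Z");
const settled = stabilityScore(
  { lastModified: new Date(now.getTime() - 365 * DAY_MS), reverts: 0, cited: true },
  now,
);
const contested = stabilityScore(
  { lastModified: new Date(now.getTime() - 73 * DAY_MS), reverts: 1, cited: false },
  now,
);
```

Sorting retrieved passages by this score, highest first, favors claims that have survived longest without dispute. Any real weighting should be tuned against your own evaluation data.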
Ecosystem
Refract is one tool in a family of three:
| Tool | What it does | Install |
|---|---|---|
| Refract | CLI + TypeScript SDK — the deterministic observation engine | npm install -g @refract-org/cli |
| Python SDK | Typed Python wrapper — pandas DataFrames, notebooks, LangChain | pip install refract-py |
| Refract UI | Browser visualizer: drag-and-drop JSONL, timelines, word-level diffs | git clone refract-ui && cd refract-ui && bun install && bun run dev |
The natural workflow: analyze with Refract, export as NDJSON, then explore in Python or the UI.
Other pathways: System Integrators (SDK Reference · Production DDL · Private Wikis) · Engine Contributors (Custom Analyzer · Custom Eval · Architecture Decisions)
What's possible
Refract's deterministic event stream enables capabilities that go beyond observation. Each row points to the tutorial or frontier use-case page that covers it.
| Capability | Read |
|---|---|
| Temporal leakage & recency | Model evaluation tutorial — Test models against knowledge cutoffs. Prove leakage deterministically. Compare recency across frontier models. |
| Provenance-aware RAG | RAG provenance tutorial — Score claims by stability. Filter training data. Weight retrieval by source quality. |
| BYO-inference at every boundary | BYO-inference tutorial — Replace heuristics with LLMs. Audit which path was taken. |
| Claim-level search | Frontier use cases — Search claim histories, not documents. "Claims removed as unsourced." "Claims that softened after events." |
| Temporal leakage detection | Frontier use cases — Was this claim public before the model's knowledge cutoff? |
| LLM summarization | Summarization tutorial — Pipe events through any model. Get human-readable change reports with audit trail. |
| Non-Wikipedia sources | Custom adapter tutorial — Confluence, GitHub wikis, Notion. Same analyzers, different data. |
| Streaming and Parquet | Frontier use cases — Live ingestion, columnar export, HuggingFace datasets. |
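The temporal leakage question ("Was this claim public before the model's knowledge cutoff?") reduces to interval logic once a claim's visibility periods have been reconstructed from revision events. The sketch below classifies a claim against a cutoff date. `Interval`, `classifyClaim`, and the three category names are hypothetical, not Refract's evaluation API. Intervals are treated as half-open: a claim added exactly at the cutoff counts as visible.

```typescript
// Hypothetical leakage check: classify a claim against a model's knowledge
// cutoff, given the periods when it was visible on a page (reconstructed
// from revision events). Names and categories are illustrative only.
interface Interval {
  addedAt: Date;
  removedAt: Date | null; // null = still present
}

type LeakageClass =
  | "present-at-cutoff"     // visible on the cutoff date
  | "removed-before-cutoff" // visible earlier, gone by the cutoff
  | "after-cutoff";         // first appeared after the cutoff

function classifyClaim(intervals: Interval[], cutoff: Date): LeakageClass {
  if (intervals.length === 0) throw new Error("claim has no visibility intervals");
  const t = cutoff.getTime();
  // Half-open interval [addedAt, removedAt) contains the cutoff instant.
  const visibleAtCutoff = intervals.some(
    (iv) =>
      iv.addedAt.getTime() <= t &&
      (iv.removedAt === null || iv.removedAt.getTime() > t),
  );
  if (visibleAtCutoff) return "present-at-cutoff";
  const seenBefore = intervals.some((iv) => iv.addedAt.getTime() <= t);
  return seenBefore ? "removed-before-cutoff" : "after-cutoff";
}

const cutoff = new Date("2023-04-01T00:00:00Z");
const stillThere = classifyClaim(
  [{ addedAt: new Date("2022-01-01T00:00:00Z"), removedAt: null }],
  cutoff,
);
const goneEarly = classifyClaim(
  [{ addedAt: new Date("2022-01-01T00:00:00Z"), removedAt: new Date("2022-06-01T00:00:00Z") }],
  cutoff,
);
const addedLater = classifyClaim(
  [{ addedAt: new Date("2023-09-01T00:00:00Z"), removedAt: null }],
  cutoff,
);
```

If a model reproduces an `after-cutoff` claim without retrieval, that suggests leakage or a later training update. The classification itself is only as reliable as the interval reconstruction behind it.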
Contents
Getting Started
Reference
- CLI command reference
- SDK / package reference
- Event schema
- Event taxonomy
- Analysis depth levels
- Export formats
- Evaluation harness
- Architecture decisions
Integration
- Integrations overview: all supported tools and patterns
- Downstream integration
- MCP: AI agent integration
- Analytics with DuckDB
- Notebook analysis
- Scheduled monitoring
Tutorials
- Wikipedia history
- Fandom canon
- Citation churn
- Dispute timeline
- Cross-wiki comparison
- Combat revisionism
- RAG provenance
- MCP agent
- Scheduled monitoring
- Python SDK
- BYO-inference
- Custom analyzer
- Custom eval labels
- Private wikis
- Non-English wikis
- Summarization
- Refract UI
- Custom adapter
- Model evaluation
Appendix
- Glossary
- Troubleshooting / FAQ
- Interpreting output
- Security
- Naming conventions
- Boundary
- Contributing to docs
License
AGPL-3.0. Built and maintained by NextConsensus.