Refract: deterministic observation layer for model evaluation

Refract reveals how claims change across public revision histories — and gives AI researchers reproducible evidence for model evaluation.

npx @refract-org/cli analyze "Earth" --depth brief

Node.js 20+ or Bun 1.2+ · Git 2.x · Any MediaWiki instance

The printing press froze knowledge in editions. Wikipedia made it mutable. Refract makes the mutation legible — a deterministic event stream showing where every claim came from, what changed, what supported it, what challenged it, when it stabilized, and what context altered its meaning.

Why Refract?

🔍

Deterministic

Same input, same output. Every run byte-for-byte identical. No model, no variance.

⚙️

Provenance-tagged

Every event carries revision, section, timestamp, and analyzer identity.

🔐

BYO-Inference Boundaries

Every threshold is a configurable boundary. Plug a model where you need one; defaults run offline.

📊

26 Event Types

Sentence lifecycles, citations, reverts, talk pages, protection levels, and edit clusters.

🔧

Merkle-provable

Signed bundles and replay manifests for audit trail and dataset integrity.

🤖

AI Agent Integration

Built-in MCP server. Claude Code, Cursor, and VS Code call Refract tools natively.

Quick start

# 1. Analyze a page. npx runs the CLI without installing it;
#    the later steps assume a global install: npm install -g @refract-org/cli
npx @refract-org/cli analyze "Earth" --depth brief

# 2. Explore results in the web UI
refract explore "Earth"

# 3. Connect an AI agent
refract mcp

# 4. Export as structured data
refract export "Earth" --format ndjson > earth-events.jsonl

# 5. Save as a signed evidence bundle
refract export "Earth" --bundle > earth-bundle.json

# 6. Output an ObservationReport with claim lifecycle
refract analyze "Earth" --report > earth-report.json
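
Since every run is described as byte-for-byte identical, one simple way to check that claim on your own machine is to hash two exports of the same page and compare the digests. The sketch below uses only the Python standard library. The file names are placeholders, and the demo writes stand-in files rather than calling Refract.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def runs_match(first: Path, second: Path) -> bool:
    """True when two export files are byte-for-byte identical."""
    return sha256_of(first) == sha256_of(second)


# Demo with stand-in files; in practice these would be two exports of the same page.
with tempfile.TemporaryDirectory() as tmp:
    run_a = Path(tmp, "run-a.jsonl")
    run_b = Path(tmp, "run-b.jsonl")
    run_a.write_bytes(b'{"event": 1}\n')
    run_b.write_bytes(b'{"event": 1}\n')
    identical = runs_match(run_a, run_b)
    run_b.write_bytes(b'{"event": 2}\n')
    differs = not runs_match(run_a, run_b)

print(identical, differs)  # True True
```

To try it against real output, run the step 4 export twice into two files and pass their paths to `runs_match`. Note that the input includes the revision history itself, so new revisions landing between the two runs would be expected to change the output.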

What Refract Is

Deterministic: Guarantees same output for same input, offline.
Provenance-tagged: Identifies source revision, timestamp, and analyzer version.
Verifiable: Cryptographically proves claims using Merkle tree envelopes.
Open Layer: A raw observation feed designed for downstream processing.
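
To make the Merkle idea concrete, here is a generic Merkle root over a list of serialized events. It is a sketch of the general technique only, not Refract's envelope format: the SHA-256 choice, the leaf and node prefixes (a convention borrowed from RFC 6962), and the rule for unpaired nodes are all assumptions made for illustration.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> str:
    """Hex Merkle root over the leaves; an unpaired node is carried up unchanged."""
    if not leaves:
        raise ValueError("need at least one leaf")
    # Prefix bytes keep leaf hashes and interior hashes in separate domains.
    level = [_h(b"\x00" + leaf) for leaf in leaves]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append(_h(b"\x01" + level[i] + level[i + 1]))
        if len(level) % 2 == 1:
            nxt.append(level[-1])
        level = nxt
    return level[0].hex()


# Hypothetical serialized events.
events = [b'{"id": 1}', b'{"id": 2}', b'{"id": 3}']
root = merkle_root(events)
tampered = merkle_root([b'{"id": 1}', b'{"id": 2}', b'{"id": 4}'])
print(root != tampered)  # True: changing any leaf changes the root
```

The property that matters for dataset integrity is that a single root commits to every event, so anyone holding the root can detect a modified, added, or dropped event.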

What Refract is Not

No Model Interpretation: Does not decide semantic meaning or intent.
No Truth Claims: Observes what changed, not which version is correct.
No Editor Profiles: Does not rank, grade, score, or track editors.
No Policy Judgments: Leaves decision relevance and rules to downstream tools.

By use case

Research

Journalist / Researcher

Trace claim evolution and sources across revision history.

Demo ➔ Quick Start ➔ Wikipedia History

Data Science

Data Scientist / OSINT

Extract NDJSON events and run columnar SQL analysis in DuckDB.

Python SDK ➔ SDK Reference ➔ Notebooks

Engineering

ML / RAG Engineer

Score retrieved texts by stability and provenance quality indicators.

RAG Provenance ➔ Python SDK ➔ BYO-Inference

Automation

Policy / Compliance

Monitor pages on schedules, test documentation, and send webhooks.

Monitoring ➔ CLI Cron ➔ Citation Churn

Agents

AI Agent Developer

Expose CLI events natively to agents using the built-in MCP server.

MCP Tutorial ➔ MCP Reference ➔ BYO-Inference

Evaluation

AI Model Evaluator

Detect temporal leakage and verify recency cutoffs against revision histories.

Model Eval ➔ RAG Provenance ➔ Frontier Cases
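
The evaluation use case comes down to one timestamp comparison: was a claim already public before a model's knowledge cutoff? A minimal sketch follows. The event shape (`claim`, `type`, `timestamp`) and the event type names are hypothetical, chosen only to illustrate the comparison.

```python
from datetime import datetime, timezone

# Hypothetical events: the field names and values are illustrative only.
events = [
    {"claim": "c1", "type": "sentence_added", "timestamp": "2023-03-01T12:00:00+00:00"},
    {"claim": "c2", "type": "sentence_added", "timestamp": "2024-09-15T08:30:00+00:00"},
    {"claim": "c1", "type": "sentence_modified", "timestamp": "2024-10-01T00:00:00+00:00"},
]


def first_seen(events: list[dict]) -> dict[str, datetime]:
    """Map each claim id to the earliest timestamp at which it appears."""
    seen: dict[str, datetime] = {}
    for ev in events:
        ts = datetime.fromisoformat(ev["timestamp"])
        if ev["claim"] not in seen or ts < seen[ev["claim"]]:
            seen[ev["claim"]] = ts
    return seen


def public_before(events: list[dict], cutoff: datetime) -> dict[str, bool]:
    """For each claim, report whether it first appeared before the cutoff."""
    return {claim: ts < cutoff for claim, ts in first_seen(events).items()}


cutoff = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(public_before(events, cutoff))  # {'c1': True, 'c2': False}
```

Under these assumptions, a model with this cutoff that reproduces `c2` would be a candidate for a leakage or recency investigation, while `c1` could plausibly have been in its training data.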

Ecosystem

Refract is one tool in a family of three:

Tool	What it does	Install
Refract	CLI + TypeScript SDK — the deterministic observation engine	`npm install -g @refract-org/cli`
Python SDK	Typed Python wrapper — pandas DataFrames, notebooks, LangChain	`pip install refract-py`
Refract UI	Browser visualizer — drag-and-drop JSONL, timelines, word-level diffs	`git clone refract-ui && bun run dev`

The natural workflow: analyze with Refract, export as NDJSON, then explore in Python or the UI.
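
As a sketch of the "explore in Python" step, the snippet below reads NDJSON line by line with the standard library and tallies events by type. It does not use the Python SDK, and the `type` and `revision` field names are assumptions about the export shape rather than documented schema.

```python
import io
import json
from collections import Counter


def read_events(stream):
    """Yield one dict per non-empty NDJSON line."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Inline sample standing in for earth-events.jsonl; field names are hypothetical.
sample = io.StringIO(
    '{"type": "citation_added", "revision": 101}\n'
    '{"type": "revert", "revision": 102}\n'
    '\n'
    '{"type": "citation_added", "revision": 103}\n'
)

counts = Counter(ev["type"] for ev in read_events(sample))
print(counts.most_common())  # [('citation_added', 2), ('revert', 1)]
```

To run it on a real export, replace `sample` with `open("earth-events.jsonl", encoding="utf-8")`.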

Other pathways: System Integrators (SDK Reference · Production DDL · Private Wikis) · Engine Contributors (Custom Analyzer · Custom Eval · Architecture Decisions)

What's possible

Refract's deterministic event stream unlocks capabilities that go beyond observation. These aren't tutorials — they're the frontier of what the architecture enables.

Capability	Read
Temporal leakage & recency	Model evaluation tutorial — Test models against knowledge cutoffs. Prove leakage deterministically. Compare recency across frontier models.
Provenance-aware RAG	RAG provenance tutorial — Score claims by stability. Filter training data. Weight retrieval by source quality.
BYO-inference at every boundary	BYO-inference tutorial — Replace heuristics with LLMs. Audit which path was taken.
Claim-level search	Frontier use cases — Search claim histories, not documents. "Claims removed as unsourced." "Claims that softened after events."
Temporal leakage detection	Frontier use cases — Was this claim public before the model's knowledge cutoff?
LLM summarization	Summarization tutorial — Pipe events through any model. Get human-readable change reports with audit trail.
Non-Wikipedia sources	Custom adapter tutorial — Confluence, GitHub wikis, Notion. Same analyzers, different data.
Streaming and Parquet	Frontier use cases — Live ingestion, columnar export, HuggingFace datasets.
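
For the "score claims by stability" idea in the table above, one possible heuristic is the fraction of a claim's lifetime that has passed since its last change. This is a hypothetical scoring rule for illustration, not Refract's own metric, and the timestamps are invented.

```python
from datetime import datetime


def stability(change_times: list[datetime], now: datetime) -> float:
    """Fraction of a claim's lifetime spent unchanged since its last edit (0.0 to 1.0)."""
    if not change_times:
        raise ValueError("need at least one timestamp")
    first, last = min(change_times), max(change_times)
    lifetime = (now - first).total_seconds()
    if lifetime <= 0:
        return 0.0
    return (now - last).total_seconds() / lifetime


now = datetime(2025, 1, 1)
settled = [datetime(2021, 1, 1), datetime(2021, 6, 1)]    # last touched years ago
churning = [datetime(2021, 1, 1), datetime(2024, 12, 1)]  # touched a month ago
print(stability(settled, now) > stability(churning, now))  # True
```

A retrieval pipeline could use a score like this as one weight among several. Because it is computed from timestamps alone, it says nothing about whether the claim is correct.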

License

AGPL-3.0. Built and maintained by NextConsensus.