Refract: deterministic observation layer for model evaluation

Refract reveals how claims change across public revision histories — and gives AI researchers reproducible evidence for model evaluation.

npx @refract-org/cli analyze "Earth" --depth brief

Node.js 20+ or Bun 1.2+ · Git 2.x · Any MediaWiki instance

The printing press froze knowledge in editions. Wikipedia made it mutable. Refract makes the mutation legible — a deterministic event stream showing where every claim came from, what changed, what supported it, what challenged it, when it stabilized, and what context altered its meaning.

Why Refract?

🔍

Deterministic

Same input, same output. Every run byte-for-byte identical. No model, no variance.

⚙️

Provenance-tagged

Every event carries revision, section, timestamp, and analyzer identity.

🔐

BYO-Inference Boundaries

Every threshold is a configurable boundary. Plug a model where you need one; defaults run offline.

📊

26 Event Types

Sentence lifecycles, citations, reverts, talk pages, protection levels, and edit clusters.

🔧

Merkle-provable

Signed bundles and replay manifests for audit trails and dataset integrity.

🤖

AI Agent Integration

Built-in MCP server. Claude Code, Cursor, and VS Code call Refract tools natively.
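Here is a minimal TypeScript sketch of the BYO-inference idea. It assumes nothing about Refract's actual SDK. A boundary runs a deterministic heuristic by default, lets a caller swap in a model-backed decider, and records which path produced each decision. `Boundary`, `Decision`, `EditStats`, and the 0.5 large-removal threshold are hypothetical names and values chosen for illustration.

```typescript
// Hypothetical sketch: none of these names are Refract's real API.
type Path = "heuristic" | "model";

interface Decision<T> {
  value: T;
  path: Path; // which implementation produced the value, kept for auditing
}

type Decider<I, T> = (input: I) => T;

// A decision point: deterministic heuristic by default, optional override.
class Boundary<I, T> {
  constructor(
    readonly name: string,
    private readonly heuristic: Decider<I, T>,
    private readonly model?: Decider<I, T>,
  ) {}

  decide(input: I): Decision<T> {
    if (this.model) {
      return { value: this.model(input), path: "model" };
    }
    return { value: this.heuristic(input), path: "heuristic" };
  }
}

// Illustrative boundary: flag an edit as a "large removal" when it deletes
// more than half of a section. The 0.5 threshold is made up for the example.
interface EditStats {
  charsBefore: number;
  charsAfter: number;
}

const largeRemovalHeuristic = (e: EditStats): boolean =>
  e.charsBefore > 0 && (e.charsBefore - e.charsAfter) / e.charsBefore > 0.5;

// Offline default: only the heuristic is wired in.
const offline = new Boundary("large-removal", largeRemovalHeuristic);

// Stand-in for a model-backed decider; a real one would call an inference API.
const withModel = new Boundary(
  "large-removal",
  largeRemovalHeuristic,
  (e: EditStats) => e.charsAfter === 0,
);

const edit = { charsBefore: 1000, charsAfter: 300 };
console.log(offline.decide(edit)); // { value: true, path: 'heuristic' }
console.log(withModel.decide(edit)); // { value: false, path: 'model' }
```

Real model calls are usually asynchronous, so a production version would probably have `decide` return a Promise. Recording the path on every decision lets you audit a mixed heuristic/model run afterward.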

Quick start

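# 1. Analyze a page (npx runs the CLI without installing it)
# To use the refract command in steps 2-6, install globally: npm install -g @refract-org/cli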
npx @refract-org/cli analyze "Earth" --depth brief

# 2. Explore results in the web UI
refract explore "Earth"

# 3. Connect an AI agent
refract mcp

# 4. Export as structured data
refract export "Earth" --format ndjson > earth-events.jsonl

# 5. Save as a signed evidence bundle
refract export "Earth" --bundle > earth-bundle.json

# 6. Output an ObservationReport with claim lifecycles
refract analyze "Earth" --report > earth-report.json

What Refract Is

  • Deterministic: Guarantees the same output for the same input, offline.
  • Provenance-tagged: Identifies the source revision, timestamp, and analyzer version.
  • Verifiable: Uses Merkle tree envelopes to cryptographically prove that exported events are intact.
  • Open Layer: A raw observation feed designed for downstream processing.
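The Deterministic and Verifiable properties reinforce each other. If every run emits byte-identical events, a Merkle root over those events is identical too, and any single event can be checked against the root with a short inclusion proof. The sketch below is a generic SHA-256 Merkle tree in TypeScript. It assumes one leaf per NDJSON line, 0x00/0x01 leaf/node prefixes, and promotion of an unpaired node. Refract's actual envelope format is not described here and may differ.

```typescript
import { createHash } from "node:crypto";

// Hypothetical hashing scheme, not Refract's documented envelope format:
// leaf = SHA-256(0x00 || line), node = SHA-256(0x01 || left || right).
// The prefixes keep a leaf hash from being confused with a node hash.
const sha256 = (...parts: Buffer[]): Buffer =>
  createHash("sha256").update(Buffer.concat(parts)).digest();
const leafHash = (line: string): Buffer => sha256(Buffer.from([0x00]), Buffer.from(line, "utf8"));
const nodeHash = (l: Buffer, r: Buffer): Buffer => sha256(Buffer.from([0x01]), l, r);

// Pair up adjacent hashes; an unpaired last hash is promoted unchanged.
function nextLevel(level: Buffer[]): Buffer[] {
  const next: Buffer[] = [];
  for (let i = 0; i < level.length; i += 2) {
    next.push(i + 1 < level.length ? nodeHash(level[i], level[i + 1]) : level[i]);
  }
  return next;
}

function merkleRoot(lines: string[]): string {
  if (lines.length === 0) return sha256().toString("hex"); // convention for an empty export
  let level = lines.map(leafHash);
  while (level.length > 1) level = nextLevel(level);
  return level[0].toString("hex");
}

// Sibling hashes from the leaf up to the root, each tagged with its side.
interface ProofStep {
  sibling: Buffer;
  siblingOnLeft: boolean;
}

function inclusionProof(lines: string[], index: number): ProofStep[] {
  const proof: ProofStep[] = [];
  let level = lines.map(leafHash);
  let i = index;
  while (level.length > 1) {
    const isRightChild = i % 2 === 1;
    const s = isRightChild ? i - 1 : i + 1;
    // No step is recorded when the node is unpaired: it is promoted as-is.
    if (s < level.length) proof.push({ sibling: level[s], siblingOnLeft: isRightChild });
    level = nextLevel(level);
    i = Math.floor(i / 2);
  }
  return proof;
}

function verifyInclusion(line: string, proof: ProofStep[], rootHex: string): boolean {
  let h = leafHash(line);
  for (const step of proof) {
    h = step.siblingOnLeft ? nodeHash(step.sibling, h) : nodeHash(h, step.sibling);
  }
  return h.toString("hex") === rootHex;
}

// Two runs over the same input should emit identical lines, hence identical roots.
const run1 = ['{"rev":1}', '{"rev":2}', '{"rev":3}'];
const run2 = [...run1];
console.log(merkleRoot(run1) === merkleRoot(run2)); // true
console.log(verifyInclusion(run1[2], inclusionProof(run1, 2), merkleRoot(run1))); // true
```

Comparing roots from two runs is a cheap determinism check. Publishing only the root lets a consumer verify any single event without downloading the full export.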

What Refract Is Not

  • No Model Interpretation: Does not decide semantic meaning or intent.
  • No Truth Claims: Observes what changed, not which version is correct.
  • No Editor Profiles: Does not rank, grade, score, or track editors.
  • No Policy Judgments: Leaves relevance decisions and policy rules to downstream tools.

By use case

  • Research · Journalist / Researcher: Trace claim evolution and sources across revision history.
  • Data Science · Data Scientist / OSINT: Extract NDJSON events and run columnar SQL analysis in DuckDB.
  • Engineering · ML / RAG Engineer: Score retrieved texts by stability and provenance quality indicators.
  • Automation · Policy / Compliance: Monitor pages on schedules, test documentation, and send webhooks.
  • Agents · AI Agent Developer: Expose CLI events natively to agents using the built-in MCP server.
  • Evaluation · AI Model Evaluator: Prove temporal leakage and recency cutoffs against revision histories.
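For the AI Model Evaluator case, the core question ("was this claim public before the model's knowledge cutoff?") reduces to replaying a claim's add/remove history up to a timestamp. Here is a minimal sketch. It assumes a hypothetical `ClaimEvent` shape with `added`/`removed` kinds; Refract's real event types and field names may differ.

```typescript
// Hypothetical event shape; Refract's real event schema may differ.
interface ClaimEvent {
  claimId: string;
  kind: "added" | "removed";
  timestamp: string; // ISO 8601, e.g. "2023-04-01T12:00:00Z"
}

interface LeakageVerdict {
  claimId: string;
  firstPublic: string | null; // earliest "added" timestamp, or null if never added
  publicByCutoff: boolean; // first added at or before the cutoff
  presentAtCutoff: boolean; // visible at the cutoff instant itself
}

function checkLeakage(events: ClaimEvent[], claimId: string, cutoffIso: string): LeakageVerdict {
  const cutoff = Date.parse(cutoffIso);
  // Chronological history of this claim only.
  const history = events
    .filter((e) => e.claimId === claimId)
    .sort((a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp));

  const firstAdd = history.find((e) => e.kind === "added");
  const firstPublic = firstAdd ? firstAdd.timestamp : null;

  // Replay add/remove events up to the cutoff to get the state at that instant.
  let presentAtCutoff = false;
  for (const e of history) {
    if (Date.parse(e.timestamp) > cutoff) break;
    presentAtCutoff = e.kind === "added";
  }

  return {
    claimId,
    firstPublic,
    publicByCutoff: firstPublic !== null && Date.parse(firstPublic) <= cutoff,
    presentAtCutoff,
  };
}

const events: ClaimEvent[] = [
  { claimId: "c1", kind: "added", timestamp: "2022-03-01T00:00:00Z" },
  { claimId: "c1", kind: "removed", timestamp: "2023-01-15T00:00:00Z" },
  { claimId: "c2", kind: "added", timestamp: "2024-06-01T00:00:00Z" },
];

// c1 was public before the cutoff but removed before it; c2 first appeared after it.
console.log(checkLeakage(events, "c1", "2023-04-30T00:00:00Z"));
console.log(checkLeakage(events, "c2", "2023-04-30T00:00:00Z"));
```

Splitting "public by the cutoff" from "present at the cutoff" matters. A claim that was added and later removed before the cutoff could still be in a training crawl, even though the page no longer showed it at the cutoff.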

Ecosystem

Refract is one tool in a family of three:

Tool | What it does | Install
Refract | CLI + TypeScript SDK: the deterministic observation engine | npm install -g @refract-org/cli
Python SDK | Typed Python wrapper: pandas DataFrames, notebooks, LangChain | pip install refract-py
Refract UI | Browser visualizer: drag-and-drop JSONL, timelines, word-level diffs | git clone refract-ui && cd refract-ui && bun install && bun run dev

The natural workflow: analyze with Refract, export as NDJSON, then explore in Python or the UI.
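NDJSON exports have one JSON object per line, so you can load them without any Refract dependency. Here is a minimal TypeScript sketch that parses the export and tallies events by type. It assumes each object carries a string `type` field; check the Reference section for the actual schema.

```typescript
import { readFileSync } from "node:fs";

// Hypothetical: assumes each event is a JSON object with a string "type" field.
type RawEvent = Record<string, unknown>;

// Parse NDJSON: one JSON value per line; blank lines (including the trailing newline) are skipped.
function parseNdjson(text: string): RawEvent[] {
  const events: RawEvent[] = [];
  text.split("\n").forEach((line, i) => {
    const trimmed = line.trim(); // also strips "\r" from CRLF files
    if (trimmed === "") return;
    try {
      events.push(JSON.parse(trimmed) as RawEvent);
    } catch {
      throw new Error(`line ${i + 1}: invalid JSON`);
    }
  });
  return events;
}

// Tally events by their "type" field; objects without a string type are grouped together.
function countByType(events: RawEvent[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const e of events) {
    const type = typeof e["type"] === "string" ? (e["type"] as string) : "(untyped)";
    counts.set(type, (counts.get(type) ?? 0) + 1);
  }
  return counts;
}

// Load the file written by: refract export "Earth" --format ndjson > earth-events.jsonl
function loadEvents(path: string): RawEvent[] {
  return parseNdjson(readFileSync(path, "utf8"));
}

// Usage (requires the export file to exist):
// console.log(countByType(loadEvents("earth-events.jsonl")));
```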

Other pathways: System Integrators (SDK Reference · Production DDL · Private Wikis) · Engine Contributors (Custom Analyzer · Custom Eval · Architecture Decisions)

What's possible

Refract's deterministic event stream unlocks capabilities that go beyond observation. These aren't tutorials — they're the frontier of what the architecture enables.

Capability | Read
Temporal leakage & recency | Model evaluation tutorial: Test models against knowledge cutoffs. Prove leakage deterministically. Compare recency across frontier models.
Provenance-aware RAG | RAG provenance tutorial: Score claims by stability. Filter training data. Weight retrieval by source quality.
BYO-inference at every boundary | BYO-inference tutorial: Replace heuristics with LLMs. Audit which path was taken.
Claim-level search | Frontier use cases: Search claim histories, not documents, e.g. "claims removed as unsourced" or "claims that softened after events".
Temporal leakage detection | Frontier use cases: Was this claim public before the model's knowledge cutoff?
LLM summarization | Summarization tutorial: Pipe events through any model. Get human-readable change reports with an audit trail.
Non-Wikipedia sources | Custom adapter tutorial: Confluence, GitHub wikis, Notion. Same analyzers, different data.
Streaming and Parquet | Frontier use cases: Live ingestion, columnar export, Hugging Face datasets.
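For provenance-aware RAG, one option is to re-rank retrieved passages by retriever similarity weighted by claim stability. The sketch below is illustrative only. The `Passage` fields, the 30-day half-life, and the 0.8 penalty for uncited claims are hypothetical choices, not Refract outputs or recommended values.

```typescript
// Hypothetical per-passage signals derived from a claim's revision history.
interface Passage {
  id: string;
  similarity: number; // retriever score in [0, 1]
  daysSinceLastChange: number; // longer = more settled
  revertCount: number; // reverts touching this claim
  cited: boolean; // claim currently backed by a citation
}

// Stability in [0, 1): approaches 1 as the claim goes unchanged, shrinks with reverts.
function stability(p: Passage, halfLifeDays = 30): number {
  const settled = 1 - Math.pow(0.5, p.daysSinceLastChange / halfLifeDays);
  const churnPenalty = 1 / (1 + p.revertCount);
  return settled * churnPenalty;
}

function provenanceScore(p: Passage): number {
  const citationFactor = p.cited ? 1 : 0.8; // illustrative penalty for uncited claims
  // Stability scales similarity by a factor between 0.5 and 1; it never zeroes it out.
  return p.similarity * (0.5 + 0.5 * stability(p)) * citationFactor;
}

// Sort a copy, highest provenance-weighted score first.
function rerank(passages: Passage[]): Passage[] {
  return [...passages].sort((a, b) => provenanceScore(b) - provenanceScore(a));
}

const ranked = rerank([
  { id: "volatile", similarity: 0.92, daysSinceLastChange: 2, revertCount: 3, cited: false },
  { id: "settled", similarity: 0.85, daysSinceLastChange: 400, revertCount: 0, cited: true },
]);
console.log(ranked.map((p) => p.id)); // [ 'settled', 'volatile' ]
```

The design multiplies instead of adding, so similarity still dominates. A stable but irrelevant passage ranks low. Among passages of similar relevance, the settled, cited claim comes first.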

Contents: Getting Started · Reference · Integration · Tutorials · Appendix

License

AGPL-3.0. Built and maintained by NextConsensus.
