## Downstream integration
Refract produces a deterministic event stream. Downstream systems consume that stream to interpret what the changes mean for their domain.
### Integration surfaces
#### 1. Structured events via CLI

```shell
# Export events as NDJSON
refract export "Bitcoin" --format ndjson > bitcoin-events.jsonl

# Export as an ObservationReport with a Merkle root
refract analyze "Bitcoin" --report > bitcoin-report.json
```

Each event carries a `schemaVersion`, a `FactProvenance` (analyzer, version, parameters), and an `eventId` (a deterministic content hash).
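The eventId contract can be illustrated with a small sketch. This is not Refract's actual implementation; it assumes, purely for illustration, a SHA-256 hash over a canonically serialized (key-sorted) payload, which is one common way to make a content hash stable across key order:

```typescript
import { createHash } from "node:crypto";

// Hypothetical sketch of a deterministic content hash. Refract's real hashing
// scheme is not specified here; the point is that the id depends only on the
// event's content, never on serialization order.
function canonicalize(value: unknown): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(",")}]`;
  const obj = value as Record<string, unknown>;
  const entries = Object.keys(obj)
    .sort()
    .map((k) => `${JSON.stringify(k)}:${canonicalize(obj[k])}`);
  return `{${entries.join(",")}}`;
}

function deriveEventId(payload: Record<string, unknown>): string {
  return createHash("sha256").update(canonicalize(payload)).digest("hex");
}

// The same content in a different key order hashes identically:
const a = deriveEventId({ type: "section_modified", from: 100, to: 101 });
const b = deriveEventId({ to: 101, from: 100, type: "section_modified" });
console.log(a === b); // true
```

Because the id is content-derived, re-running an export over the same revisions yields the same ids, which is what makes deduplication (such as `INSERT OR IGNORE` on the consumer side) safe.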
#### 2. SDK events via adapter

```typescript
import { buildStructuredEvents, EVENT_SCHEMA_VERSION } from "@refract-org/evidence-graph";
import { sectionDiffer, citationTracker, detectEditClusters } from "@refract-org/analyzers";

const events = buildStructuredEvents(revisions);
// Each event has schemaVersion and a FactProvenance with version and parameters
```
#### 3. FactProvenance for auditability

Every event's `deterministicFacts[0].provenance` includes:

```json
{
  "analyzer": "section-differ",
  "version": "0.4.0",
  "parameters": { "similarityThreshold": 0.8 }
}
```

When a consumer overrides a threshold, the effective value appears in `parameters`.
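A consumer can use this to audit for overridden thresholds. The helper below is consumer-side and hypothetical (not a Refract API); `EXPECTED_DEFAULTS` holds whatever defaults your own pipeline assumes:

```typescript
// Consumer-side audit sketch: flag parameters whose effective value differs
// from the defaults this pipeline expects. Shape of FactProvenance follows
// the documented fields (analyzer, version, parameters).
interface FactProvenance {
  analyzer: string;
  version: string;
  parameters?: Record<string, unknown>;
}

const EXPECTED_DEFAULTS: Record<string, Record<string, unknown>> = {
  "section-differ": { similarityThreshold: 0.8 }, // assumed default for illustration
};

function nonDefaultParameters(p: FactProvenance): string[] {
  const defaults = EXPECTED_DEFAULTS[p.analyzer] ?? {};
  return Object.entries(p.parameters ?? {})
    .filter(([k, v]) => defaults[k] !== undefined && defaults[k] !== v)
    .map(([k]) => k);
}

const overridden = nonDefaultParameters({
  analyzer: "section-differ",
  version: "0.4.0",
  parameters: { similarityThreshold: 0.85 },
});
// overridden contains "similarityThreshold"
```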
#### 4. Schema versioning

Every event carries a `schemaVersion` matching `EVENT_SCHEMA_VERSION`. This prevents silent invalidation of historical observations when `EventType` gains new members.
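A defensive consumer can gate ingestion on the schema versions it was built and tested against. The guard below is consumer-side code, not part of the Refract SDK:

```typescript
// Sketch: partition incoming events by schemaVersion, quarantining anything
// this consumer was not built against instead of silently misreading it.
const SUPPORTED_SCHEMA_VERSIONS = new Set(["0.4.0"]); // pin what you tested

interface IncomingEvent {
  schemaVersion: string;
  eventType: string;
}

function partitionBySchema(events: IncomingEvent[]) {
  const accepted: IncomingEvent[] = [];
  const quarantined: IncomingEvent[] = [];
  for (const e of events) {
    (SUPPORTED_SCHEMA_VERSIONS.has(e.schemaVersion) ? accepted : quarantined).push(e);
  }
  return { accepted, quarantined };
}

const { accepted, quarantined } = partitionBySchema([
  { schemaVersion: "0.4.0", eventType: "section_modified" },
  { schemaVersion: "0.5.0", eventType: "sentence_modified" },
]);
```

Quarantining rather than dropping keeps unknown-version events replayable once the consumer is upgraded.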
#### 5. Config version pinning

`AnalyzerConfig.$version` is pinned from the CLI version, so downstream systems can prove which configuration was used for a given run.

```shell
refract analyze "Bitcoin" --similarity 0.85
# config.$version = "0.5.1"
# config.section.similarityThreshold = 0.85
```
#### 6. ObservationReport for chain-of-custody

```json
{
  "pageTitle": "Bitcoin",
  "observedAt": "2026-05-15T10:00:00Z",
  "revisionRange": { "from": 100, "to": 200 },
  "merkleRoot": "a1b2c3d4...",
  "eventCount": 47,
  "analyzerVersion": "0.5.1"
}
```

The Merkle root comes from the replay manifest, so a downstream system can verify that the events it holds match the observation.
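To make that verification step concrete, here is an illustrative recomputation sketch. Refract's actual Merkle construction is not specified here; this assumes a binary SHA-256 tree that duplicates the last leaf on odd-sized levels, which is one common convention:

```typescript
import { createHash } from "node:crypto";

// Illustrative Merkle-root recomputation over event hashes. A consumer
// recomputes the root from the events it holds and compares it to
// ObservationReport.merkleRoot; any mismatch means the event set diverged.
const sha256 = (s: string): string => createHash("sha256").update(s).digest("hex");

function merkleRoot(leafHashes: string[]): string {
  if (leafHashes.length === 0) return sha256("");
  let level = leafHashes;
  while (level.length > 1) {
    const next: string[] = [];
    for (let i = 0; i < level.length; i += 2) {
      const right = level[i + 1] ?? level[i]; // duplicate last leaf on odd count
      next.push(sha256(level[i] + right));
    }
    level = next;
  }
  return level[0];
}

const root = merkleRoot(["e1", "e2", "e3"].map(sha256));
```

Any single changed, missing, or reordered event hash changes the recomputed root, which is what gives the report its chain-of-custody property.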
### Baseline superiority validation

When validating an integrated signal, supply a Refract event summary to `computeBaselineSuperiority()`. The function adds Refract-derived baselines (revert count, citation flux, edit clusters) that the signal must outperform.

```typescript
const result = computeBaselineSuperiority({
  integratedLeadTimeDays: signal.leadTimeDays,
  mentionCount: snapshot.metrics.mentionCount,
  revertCount: snapshot.metrics.revertCount,
  refractEventSummary: { totalEvents: 47, revertCount: 12 },
});
```
### Principle
Refract's event stream is purely mechanical. All interpretation happens downstream. Refract provides the deterministic record; the consumer provides the judgment.
### What downstream systems build
| Consumer | Builds on Refract |
|---|---|
| Healthcare decision intelligence | Feed structured events into a measurement pipeline that scores claims by clinical truth, ratification, economic stake, and feasibility. Each event carries the exact analyzer thresholds used. |
| AI training data curation | Score each claim by revert count, citation churn, talk page correlation, and template dispute history. Include only stable, well-sourced claims in training data. |
| Provenance-aware RAG | Enrich each retrieved chunk with its claim history — stable, recently changed, source-fragile, contested. Use the signal to weight or filter results. |
| Regulatory monitoring | Run refract cron on drug pages, guidelines, and regulatory topics. Alert on citation removal, template disputes, or section reorganization. |
| Competitive intelligence | Use refract diff to compare how the same topic is framed across wikis (English vs German Wikipedia, Fandom vs independent wiki). Track divergence over time. |
| Fact-checking | Given a claim, query its lifecycle — first appearance, source additions, revert history, talk page activity. Return a verifiable provenance timeline. |
| Academic research | Export ObservationReport with Merkle-verifiable claim histories. Analyze claim stability across topics, time periods, and editorial environments. |
| Journalism forensics | Track how a specific claim about a person evolved. Detect coordinated editing, source softening, or removal without replacement. |
| Fan wiki canon tracking | Compare the same fictional universe across competing wikis. Detect retcon divergence and measure by how much. |
| Knowledge graph engineering | Use --depth forensic to capture category and wikilink change events. Build an entity graph that evolves with the public record. |
### Complementary technologies

Refract pairs naturally with these modern tools. The event stream is standard JSON/NDJSON; anything that reads JSON or speaks HTTP can consume it.
| Category | Technology | How they fit |
|---|---|---|
| Vector databases | Pinecone, Weaviate, pgvector, Chroma | Store claim embeddings alongside stability metadata. Query: "find claims similar to X that are stable and well-sourced." |
| RAG frameworks | LangChain, LlamaIndex, Vercel AI SDK | Use Refract's stability/contestation signals as retrieval filters or reranking features. A LangChain document loader is available in the refract-py package. |
| AI coding agents | Claude Code, Cline, Codex CLI, OpenClaw | Agents connect via Refract's built-in MCP server (refract mcp) to read claim histories, track changes, and cite provenance in their reasoning. |
| Python SDK | refract-py (GitHub) | Typed dataclasses, pandas DataFrame integration, RefractError handling. Install: pip install refract-py (requires npm install -g @refract-org/cli). |
| MCP (Model Context Protocol) | Any MCP client (Claude Desktop, VS Code, Cursor, ChatGPT) | refract mcp is a native MCP server exposing tools for analyze, claim, export, cron, and classify. AI agents use these tools to retrieve claim history directly. |
| Data lakes & query | DuckDB, Apache Parquet, ClickHouse | Query refract export --format ndjson output with SQL. DuckDB can query JSONL files directly: SELECT event_type, count(*) FROM 'events.jsonl' GROUP BY event_type; |
| Streaming | Apache Kafka, Redpanda, Cloudflare Queues | Feed event streams into real-time claim monitoring pipelines. Each EvidenceEvent is a Kafka message with key by claimId for stateful processing. |
| Visualization | Observable Framework, Mermaid, D3 | refract visualize --format mermaid produces Mermaid diagrams. Observable Framework has a dedicated @refract-org/observable data loader. D3 reads event JSONL directly. |
| Knowledge graphs | RDF, SPARQL, Neo4j | Convert wikilink_added/category_added events into triple statements. Build an evolving entity graph where each edge has a revision timestamp. |
| Model serving | OpenAI API, DeepSeek, Ollama, vLLM, Workers AI | Plug any OpenAI-compatible endpoint into refract classify at each BYO-inference boundary. Workers AI runs models at the edge without managing servers. |
| Local inference | WebGPU, MLX, llama.cpp | Run detection models directly on-device — no API key needed. Refract defaults are mechanical (zero inference), but any boundary can be replaced with a local model via MCP sampling or Ollama. |
| Notebooks | Jupyter, Marimo, Observable notebooks | Load event JSONL into a DataFrame: pd.read_json("events.jsonl", lines=True). Analyze claim stability, citation churn, and edit cluster patterns interactively. Marimo's reactive runtime is particularly well-suited for live event stream analysis. |
| Serverless | Cloudflare Workers, D1, R2, Queues | Run refract via npx in a Worker, store structured events in D1, export to R2, queue re-observations. The entire infrastructure is edge-deployable with no servers to manage. |
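Because the export format is one JSON object per line, a consumer in any of these runtimes needs nothing beyond a JSON parser. A minimal NDJSON reader looks like this (the `eventType` field name is taken from the examples above; adjust it to the actual event shape):

```typescript
// Minimal NDJSON consumption sketch: split on newlines, skip blanks, parse
// each line independently. No Refract SDK required.
function parseNdjson<T = unknown>(text: string): T[] {
  return text
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as T);
}

const events = parseNdjson<{ eventType: string }>(
  '{"eventType":"citation_added"}\n{"eventType":"revert_detected"}\n',
);
```

Line-at-a-time parsing is also what makes the format streamable: Kafka consumers, DuckDB, and `pd.read_json(..., lines=True)` all exploit the same property.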
### Production ingestion

When consuming Refract events in a production pipeline, persist them to a queryable table. The schema below is a reference DDL that works with any relational database (D1, PostgreSQL, SQLite):
```sql
CREATE TABLE refract_events (
  event_id          TEXT PRIMARY KEY,
  event_type        TEXT NOT NULL,
  schema_version    TEXT NOT NULL,
  from_revision_id  INTEGER NOT NULL,
  to_revision_id    INTEGER NOT NULL,
  section           TEXT NOT NULL,
  fact              TEXT NOT NULL,
  fact_detail       TEXT,
  analyzer_name     TEXT,
  analyzer_version  TEXT,
  input_hashes      TEXT,           -- JSON array of input hashes
  parameters_json   TEXT,           -- JSON: effective FactProvenance parameters
  observed_at       TEXT NOT NULL,
  batch_id          TEXT NOT NULL,
  page_title        TEXT NOT NULL,
  entity_id         TEXT,
  created_at        TEXT NOT NULL DEFAULT (datetime('now'))
);

CREATE INDEX idx_refract_events_batch ON refract_events(batch_id);
CREATE INDEX idx_refract_events_type ON refract_events(event_type);
CREATE INDEX idx_refract_events_page ON refract_events(page_title);
CREATE INDEX idx_refract_events_observed ON refract_events(observed_at);
```
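Mapping an event onto this table is a straightforward flattening. In the sketch below, only `eventId`, `schemaVersion`, and `deterministicFacts[0].provenance` follow the documented event shape; the other accessors (such as `event.type`) are assumptions for illustration and should be adapted to the actual `EvidenceEvent` interface:

```typescript
// Sketch: flatten one event into a partial row matching the refract_events DDL.
// Only the columns relevant to provenance are shown here.
interface RefractEventRow {
  event_id: string;
  event_type: string;
  schema_version: string;
  analyzer_name: string | null;
  analyzer_version: string | null;
  parameters_json: string | null;
  batch_id: string;
  page_title: string;
}

function toRow(event: any, batchId: string, pageTitle: string): RefractEventRow {
  const provenance = event.deterministicFacts?.[0]?.provenance; // documented shape
  return {
    event_id: event.eventId,
    event_type: event.type ?? "unknown", // assumed field name
    schema_version: event.schemaVersion,
    analyzer_name: provenance?.analyzer ?? null,
    analyzer_version: provenance?.version ?? null,
    parameters_json: provenance?.parameters ? JSON.stringify(provenance.parameters) : null,
    batch_id: batchId,
    page_title: pageTitle,
  };
}

const row = toRow(
  {
    eventId: "abc123",
    type: "section_modified",
    schemaVersion: "0.4.0",
    deterministicFacts: [
      {
        provenance: {
          analyzer: "section-differ",
          version: "0.4.0",
          parameters: { similarityThreshold: 0.8 },
        },
      },
    ],
  },
  "batch-1",
  "Bitcoin",
);
```

Serializing `parameters` to a JSON column (rather than individual columns) keeps the table stable as analyzers gain new parameters.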
### Repository pattern

Use a single adapter file as the import boundary between Refract and your codebase. The adapter re-exports Refract functions and types; no other file imports from `@refract-org/*` directly.
```typescript
// adapter.ts — single import boundary
export type { EvidenceEvent, FactProvenance, AnalyzerConfig } from '@refract-org/evidence-graph';
export { EVENT_SCHEMA_VERSION, DEFAULT_ANALYZER_CONFIG, createEventIdentity } from '@refract-org/evidence-graph';
export { sectionDiffer, citationTracker, revertDetector, detectEditClusters } from '@refract-org/analyzers';
export { buildStructuredEvents } from '../adapter/build-events';
```
```typescript
// repository.ts — D1 insert
export async function insertRefractEvents(
  db: D1Database,
  events: EvidenceEvent[],
  batchId: string,
  pageTitle: string,
): Promise<number> {
  let count = 0;
  for (const event of events) {
    const fact = event.deterministicFacts?.[0];
    // Column list and bindings elided; they mirror the refract_events DDL above.
    await db.prepare(`
      INSERT OR IGNORE INTO refract_events (...) VALUES (...)
    `).bind(/* event fields */).run();
    count++;
  }
  return count;
}
```
### Migration guide (0.3.x → 0.4.x)

When upgrading from `@refract-org/evidence-graph@0.3.x` to 0.4.x:

- Add `"sentence_modified"` to any `EventType` whitelists in your code
- The `FactProvenance` interface now has an optional `parameters` field; consumers that read it gain provenance transparency, but are not required to
- The `EVENT_SCHEMA_VERSION` constant (`"0.4.0"`) and `CLAIM_IDENTITY_VERSION` (`"claimidentityv1"`) are new exports
- Every event now carries `schemaVersion`; verify your DDL can store this field
- `AnalyzerConfig` now supports `$version` for config pinning; this is optional, and no migration action is needed

See the version compatibility table for the full matrix.