Tutorial: Python SDK notebook workflow

Goal

Use refract-py to analyze Wikipedia pages, load results into pandas DataFrames, plot citation churn and claim stability, and export findings — all from a Jupyter notebook or Python script.

Prerequisites

pip install refract-py pandas matplotlib

The Python SDK wraps the Refract CLI via subprocess, so the CLI must be available on your machine. Install it globally:

npm install -g @refract-org/cli

Or use npx — the SDK falls back to it automatically.
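Because the SDK shells out to the CLI, it can help to confirm up front which launcher is actually on PATH. The following is a minimal sketch using only the standard library. The executable name `refract` is an assumption (check the CLI's own documentation), and `find_launcher` is a hypothetical helper, not part of refract-py.

```python
import shutil


def find_launcher(candidates=("refract", "npx")):
    """Return (name, absolute_path) for the first candidate on PATH, else None.

    "refract" is an assumed executable name; adjust it to match your install.
    """
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return name, path
    return None


launcher = find_launcher()
if launcher is None:
    print("Neither the CLI nor npx was found on PATH")
else:
    print(f"Will run via {launcher[0]} at {launcher[1]}")
```

Running this before the first analyze call turns a missing install into a clear message rather than a subprocess failure later in the notebook.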

Step 1: Analyze a page and get typed objects

from refract import Refract

r = Refract()

# Analyze at forensic depth for maximum signal
events = r.analyze("COVID-19", depth="forensic")

print(f"Found {len(events)} events")
if events:  # guard against pages that yield no events
    print(f"First event: {events[0].eventType} at {events[0].timestamp}")

Each event is a typed EvidenceEvent dataclass with .eventType, .timestamp, .section, .before, .after, .fromRevisionId, .toRevisionId, and .deterministicFacts. No manual JSON parsing needed.
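As a rough illustration of why typed objects are convenient, the sketch below tallies events by attribute without pandas. `StandInEvent` is a hypothetical stand-in that borrows two of the attribute names listed above; it is not the SDK's EvidenceEvent definition, and the event type and section values are invented.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class StandInEvent:
    """Hypothetical stand-in with two of the attribute names listed above."""
    eventType: str
    section: str


# Synthetic events; the event type and section values are invented.
events = [
    StandInEvent("citation_added", "Treatment"),
    StandInEvent("citation_removed", "Treatment"),
    StandInEvent("citation_added", "Origins"),
]

# Attribute access replaces dictionary lookups and JSON parsing.
by_type = Counter(e.eventType for e in events)
by_section = Counter(e.section for e in events)

print(by_type.most_common())
print(by_section.most_common())
```

The same two Counter lines should work on the list returned by r.analyze, assuming the attribute names match those listed above.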

Step 2: Load into pandas as a DataFrame

# Direct DataFrame export with flattened fields
df = r.analyze("COVID-19", depth="forensic", as_frame=True)

print(df.columns)
# Index(['timestamp', 'event_type', 'from_revision_id', 'to_revision_id',
#        'section', 'event_id', 'schema_version', 'layer', 'fact',
#        'fact_detail', 'analyzer_name', 'analyzer_version'], dtype='object')

print(df.head())

The as_frame=True flag returns a pandas DataFrame with nested fields (deterministic facts, provenance) flattened into columns. No manual transformation needed.
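To give a feel for what flattening means, here is a small sketch using pandas.json_normalize on invented nested records. This is not how refract-py implements as_frame=True, and the nested shape and resulting column names are assumptions for illustration only; the SDK's actual column names are the ones printed above.

```python
import pandas as pd

# Invented nested records; the real event schema may differ.
records = [
    {
        "eventType": "citation_added",
        "timestamp": "2024-01-01T10:00:00Z",
        "provenance": {"analyzer": {"name": "demo", "version": "0.1"}},
    },
    {
        "eventType": "citation_removed",
        "timestamp": "2024-01-02T08:15:00Z",
        "provenance": {"analyzer": {"name": "demo", "version": "0.1"}},
    },
]

# Nested keys are joined with "_" to form flat column names.
flat = pd.json_normalize(records, sep="_")
print(sorted(flat.columns))
```

Each nested path becomes one column, so every row stays a single event and ordinary DataFrame filtering applies.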

Step 3: Analyze event type distribution

# Count events by type
event_counts = df["event_type"].value_counts()
print(event_counts)

# Plot
import matplotlib.pyplot as plt

event_counts.head(10).plot(kind="barh", figsize=(10, 6))
plt.title("Top 10 Event Types — COVID-19 Wikipedia Page")
plt.xlabel("Event Count")
plt.tight_layout()
plt.show()

Step 4: Plot citation churn over time

import pandas as pd

# Filter to citation events
citations = df[df["event_type"].str.startswith("citation_")].copy()
citations["date"] = pd.to_datetime(citations["timestamp"]).dt.date

# Count by date and type
churn = citations.groupby(["date", "event_type"]).size().unstack(fill_value=0)

# Plot
churn.plot(kind="area", figsize=(14, 6), alpha=0.7, stacked=True)
plt.title("Citation Churn — COVID-19 Wikipedia Page")
plt.xlabel("Date")
plt.ylabel("Events per Day")
plt.legend(title="Event Type")
plt.tight_layout()
plt.show()

The area chart shows the rhythm of citation activity: periods when sources were being added (expansion), removed (contraction), or replaced (re-evaluation).
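Daily counts can be noisy on a busy page. One option is to bin by week and compute a net figure (additions minus removals). The sketch below uses synthetic rows; the event type names `citation_added` and `citation_removed` are placeholders chosen to match the `citation_` prefix used above and may not match Refract's actual values.

```python
import pandas as pd

# Synthetic rows; the event type names are placeholders.
rows = [
    {"timestamp": "2024-01-01T10:00:00Z", "event_type": "citation_added"},
    {"timestamp": "2024-01-02T11:00:00Z", "event_type": "citation_added"},
    {"timestamp": "2024-01-03T09:30:00Z", "event_type": "citation_removed"},
    {"timestamp": "2024-01-09T14:00:00Z", "event_type": "citation_removed"},
]
citations = pd.DataFrame(rows)
citations["timestamp"] = pd.to_datetime(citations["timestamp"], utc=True)

# Bin by calendar week (weeks end on Sunday) and count each event type.
weekly = (
    citations
    .groupby([pd.Grouper(key="timestamp", freq="W"), "event_type"])
    .size()
    .unstack(fill_value=0)
)

# Positive values mean net expansion, negative values net contraction.
weekly["net"] = weekly["citation_added"] - weekly["citation_removed"]
print(weekly)
```

A weekly net series is usually easier to read as a line or bar chart than the stacked daily area.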

Step 5: Identify the most contested sections

# Count events per section
section_activity = df.groupby("section").agg(
    total_events=("event_type", "count"),
    reverts=("event_type", lambda x: (x == "revert_detected").sum()),
    citations=("event_type", lambda x: (x.str.startswith("citation_")).sum()),
    talk=("event_type", lambda x: (x.str.startswith("talk_")).sum()),
).sort_values("total_events", ascending=False)

print(section_activity.head(10))

Sections with high revert counts and low talk activity are likely being edit-warred. Sections with high revert counts and high talk activity are more likely under active deliberation. The per-section table makes this distinction visible at a glance.
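The reverts-versus-talk reading can be turned into an explicit label. This is only a sketch: the threshold `MIN_REVERTS`, the comparison rule, and the section names and counts are all invented for illustration, and sensible cut-offs will depend on the page.

```python
import pandas as pd

# Invented per-section counts, shaped like the section_activity table above.
section_activity = pd.DataFrame(
    {"reverts": [12, 10, 1], "talk": [1, 15, 0]},
    index=["Origins", "Treatment", "History"],
)

MIN_REVERTS = 5  # arbitrary threshold for "high revert count"


def label(row):
    """Classify a section from its revert and talk counts."""
    if row["reverts"] < MIN_REVERTS:
        return "quiet"
    if row["talk"] >= row["reverts"]:
        return "deliberated"
    return "possible edit war"


section_activity["pattern"] = section_activity.apply(label, axis=1)
print(section_activity)
```

Treat the labels as prompts for a closer look at the revision history, not as conclusions.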

Step 6: Export for further analysis

# Export flattened events as a DataFrame, then write them to CSV
df_flat = r.export("COVID-19", format="ndjson", flatten=True, as_frame=True)
df_flat.to_csv("covid-events.csv", index=False)

# Or export as raw NDJSON for DuckDB
import json

events_raw = r.export("COVID-19", format="ndjson")
with open("covid-events.jsonl", "w", encoding="utf-8") as f:
    for event in events_raw:
        f.write(json.dumps(event) + "\n")
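For reference, an NDJSON file holds one JSON object per line, so it can be read back with the standard library alone. The sketch below round-trips two invented events through a temporary file; the field names are placeholders.

```python
import json
import os
import tempfile

# Invented events standing in for exported records.
events = [
    {"event_type": "citation_added", "section": "Treatment"},
    {"event_type": "revert_detected", "section": "Origins"},
]

path = os.path.join(tempfile.mkdtemp(), "sample-events.jsonl")

# Write one JSON object per line.
with open(path, "w", encoding="utf-8") as f:
    for event in events:
        f.write(json.dumps(event) + "\n")

# Read the file back, skipping any blank lines.
with open(path, encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f if line.strip()]

print(len(loaded), "events round-tripped")
```

pandas users can load the same file with pd.read_json(path, lines=True).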

Step 7: Error handling

from refract import Refract, RefractError

r = Refract()

try:
    events = r.analyze("ThisPageDoesNotExist", depth="brief")
except RefractError as e:
    print(f"Refract error: {e}")
    # Handle gracefully — the page may have been deleted, the API may be down,
    # or the CLI may not be installed
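Some of these failures (an API outage, for instance) may be temporary, in which case a retry with backoff can help. The sketch below is generic and does not depend on refract-py: `TransientError` and `with_retries` are hypothetical names, and the demo swaps in a no-op sleep so it runs instantly.

```python
import time


class TransientError(Exception):
    """Stand-in for whichever exception signals a retryable failure."""


def with_retries(fn, retry_on=TransientError, attempts=3, base_delay=1.0,
                 sleep=time.sleep):
    """Call fn(); on retry_on, wait base_delay * 2**n and try again.

    The last failure is re-raised once attempts are exhausted.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Demo: a function that fails twice, then succeeds.
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("temporary failure")
    return "ok"


# The no-op sleep avoids real waiting in this demo.
result = with_retries(flaky, sleep=lambda seconds: None)
print(result, "after", calls["count"], "calls")
```

With refract-py, retry_on could be set to RefractError and fn to a lambda wrapping r.analyze. Whether a given failure is worth retrying (an outage) or not (a missing page or missing CLI) depends on what the exception exposes, which this sketch does not assume.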

Step 8: Use the model evaluation adapter

refract_eval maps Refract events to model evaluation records — no CLI needed:

from refract_eval import build_leakage_benchmark, check_provenance

# Export events first, then use the adapter
r.export("COVID-19", format="ndjson", flatten=False)
# (events saved to stdout — pipe to file, then load with adapter)

# Or use pre-computed events:
records = build_leakage_benchmark("covid-events.jsonl", cutoff="2024-06-01")
leaked = [rec for rec in records if rec.leaked]  # "rec" avoids reusing the client name "r"
print(f"Leakage rate: {len(leaked)}/{len(records)}")

# Check if a source ever existed on the page
result = check_provenance("covid-events.jsonl", "who.int")
print(f"Verified: {result.verified}, Outdated: {result.outdated}")

See the model evaluation tutorial for full benchmark workflows and the benchmark specification for standard pages and submission format.

Next steps

RAG provenance tutorial — using stability signals in retrieval
Analytics with DuckDB — SQL queries on Refract NDJSON output
Notebook analysis — DuckDB, Observable, and Marimo workflows
SDK reference — all packages and their APIs