Python SDK
The official Python SDK for Refract — wraps the CLI via subprocess and provides typed dataclasses, pandas integration, and notebook support.
Install
pip install refract-py
Requires the Refract CLI (the SDK calls it via subprocess):
npm install -g @refract-org/cli
npx @refract-org/cli is used as a fallback if refract is not on PATH.
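The fallback described above can be pictured with a short sketch. It is an assumption about how such a lookup might work, not the SDK's actual code: `resolve_cli_command` is a hypothetical helper, and only the `refract` binary name and the `npx @refract-org/cli` fallback come from this page.

```python
import shutil
from typing import Callable, Optional


def resolve_cli_command(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the argv prefix used to invoke the Refract CLI.

    Hypothetical helper: prefers a `refract` binary found on PATH and
    otherwise falls back to running the package through npx.
    """
    path = which("refract")
    if path is not None:
        return [path]
    return ["npx", "@refract-org/cli"]
```

The `which` parameter is injectable only so the lookup can be exercised without touching the real PATH.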
Quick start
from refract import Refract
r = Refract()
# Analyze — get typed dataclasses
events = r.analyze("Bitcoin", depth="brief")
for event in events:
    print(event.eventType, event.timestamp)
# Export as pandas DataFrame
df = r.analyze("Bitcoin", depth="forensic", as_frame=True)
print(df.groupby("event_type").size())
API reference
Refract
The main client. No constructor arguments needed — it auto-detects the CLI.
r = Refract()
analyze(page, depth, as_frame, flatten)
Run a full page analysis. Returns a list[EvidenceEvent], or a pandas DataFrame if as_frame=True.
| Parameter | Type | Default | Description |
|---|---|---|---|
| page | str | required | Page title |
| depth | str | "detailed" | "brief", "detailed", or "forensic" |
| as_frame | bool | False | Return a pandas DataFrame |
| flatten | bool | False | Flatten nested provenance fields into flat columns |
events = r.analyze("Climate_change", depth="forensic")
df = r.analyze("Climate_change", depth="forensic", as_frame=True, flatten=True)
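To make the effect of flatten concrete, here is a minimal sketch of turning a nested record into flat columns. The record shape, the `provenance` key and the underscore-joined column names are assumptions for illustration; the SDK's real column names may differ.

```python
from typing import Any


def flatten_record(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Recursively flatten nested dicts, joining keys with underscores."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_record(value, name))
        else:
            flat[name] = value
    return flat


# Hypothetical nested event, not real Refract output.
nested = {
    "event_type": "citation_added",
    "provenance": {"revision": {"from": 100, "to": 101}, "section": "History"},
}
flat = flatten_record(nested)
```

Each nested path becomes a single top-level key, which is the shape a DataFrame needs for one column per field.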
claim(page, text, as_frame)
Track a specific claim across all revisions.
| Parameter | Type | Default | Description |
|---|---|---|---|
| page | str | required | Page title |
| text | str | required | Claim text to track (partial match) |
| as_frame | bool | False | Return a pandas DataFrame |
lifecycle = r.claim("Bitcoin", "decentralized")
print(lifecycle.status, lifecycle.first_seen)
export(page, format, flatten, as_frame)
Export analysis to a file or DataFrame.
| Parameter | Type | Default | Description |
|---|---|---|---|
| page | str | required | Page title |
| format | str | "ndjson" | "json", "ndjson", or "csv" |
| flatten | bool | False | Flatten nested fields |
| as_frame | bool | False | Return a pandas DataFrame |
df = r.export("Bitcoin", format="ndjson", flatten=True, as_frame=True)
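For readers unfamiliar with the default format: NDJSON stores one JSON object per line, so a file can be read incrementally. The sketch below parses NDJSON text with the standard library only. `read_ndjson` is a hypothetical helper, and the sample records and their values are made up for illustration; they are not real Refract output.

```python
import io
import json
from typing import Any, Iterator, TextIO


def read_ndjson(stream: TextIO) -> Iterator[dict[str, Any]]:
    """Yield one dict per non-empty line of an NDJSON stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Made-up sample data standing in for an exported file.
sample = io.StringIO(
    '{"eventType": "citation_added", "section": "History"}\n'
    "\n"
    '{"eventType": "text_changed", "section": "Lead"}\n'
)
records = list(read_ndjson(sample))
```

The same function works on a file opened with `open(path, encoding="utf-8")`, since it only iterates over lines.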
EvidenceEvent dataclass
@dataclass
class EvidenceEvent:
    eventType: str
    fromRevisionId: int
    toRevisionId: int
    section: str
    before: str
    after: str
    timestamp: str
    eventId: str = ""
    claimId: str = ""
    layer: str = ""
    deterministicFacts: list[DeterministicFact] = field(default_factory=list)
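As an illustration of how a JSON record could map onto this dataclass, the sketch below redeclares the fields listed above and builds an instance from a dict. `event_from_dict` is a hypothetical helper, `DeterministicFact` is stood in by a plain dict because its definition is not shown on this page, and the sample values are invented.

```python
from dataclasses import dataclass, field, fields
from typing import Any


@dataclass
class EvidenceEvent:
    eventType: str
    fromRevisionId: int
    toRevisionId: int
    section: str
    before: str
    after: str
    timestamp: str
    eventId: str = ""
    claimId: str = ""
    layer: str = ""
    # Stand-in type: the real DeterministicFact is not defined on this page.
    deterministicFacts: list[dict[str, Any]] = field(default_factory=list)


def event_from_dict(raw: dict[str, Any]) -> EvidenceEvent:
    """Build an EvidenceEvent, ignoring keys the dataclass does not declare."""
    known = {f.name for f in fields(EvidenceEvent)}
    return EvidenceEvent(**{k: v for k, v in raw.items() if k in known})


# Invented sample record for illustration only.
event = event_from_dict({
    "eventType": "citation_added",
    "fromRevisionId": 100,
    "toRevisionId": 101,
    "section": "History",
    "before": "",
    "after": "<ref>Example</ref>",
    "timestamp": "2024-01-01T00:00:00Z",
    "unknownKey": "dropped",
})
```

Dropping undeclared keys is a design choice made for this sketch; a stricter loader could raise on them instead.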
Integrations
pandas
Every method accepts as_frame=True to return a DataFrame. analyze and export also accept flatten=True, which expands nested provenance fields into flat columns.
df = r.analyze("Bitcoin", depth="forensic", as_frame=True, flatten=True)
df["event_type"].value_counts()
df.groupby("section").size().sort_values(ascending=False)
LangChain
refract_langchain.py loads events as Document objects with stability metadata for provenance-aware RAG:
from refract_langchain import RefractLoader
loader = RefractLoader(page="Bitcoin", depth="forensic")
documents = loader.load()
for doc in documents:
    print(doc.metadata["event_type"], doc.metadata["stability_score"])
Jupyter / Marimo
Combine with pandas and matplotlib or altair for interactive exploration:
df = r.analyze("Bitcoin", depth="forensic", as_frame=True, flatten=True)
citations = df[df["event_type"].str.startswith("citation_")]
citations.groupby("event_type").size().plot(kind="bar")
Domain boundary
This SDK wraps the Refract CLI. It does not add model logic, interpretation, or domain-specific judgment. It provides typed Python access to deterministic observation output.
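The wrapping pattern described here, running a command and parsing its output without interpreting it, can be sketched as follows. This is not the SDK's implementation: `run_json_command` is a hypothetical helper, and because this page does not document the CLI's arguments, the example invokes the Python interpreter as a stand-in command that prints JSON.

```python
import json
import subprocess
import sys
from typing import Any


def run_json_command(argv: list[str]) -> Any:
    """Run a command and parse its stdout as JSON.

    Raises subprocess.CalledProcessError on a non-zero exit code.
    """
    completed = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)


# Stand-in for the real CLI: a child Python process that emits JSON.
stand_in = [
    sys.executable,
    "-c",
    'import json; print(json.dumps([{"eventType": "demo"}]))',
]
result = run_json_command(stand_in)
```

The helper adds nothing to the data it receives, which mirrors the boundary stated above: the Python layer only transports and types what the command emits.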