Python SDK

The official Python SDK for Refract — wraps the CLI via subprocess and provides typed dataclasses, pandas integration, and notebook support.

Install

pip install refract-py

Requires the Refract CLI (the SDK calls it via subprocess):

npm install -g @refract-org/cli

If `refract` is not on `PATH`, the SDK falls back to `npx @refract-org/cli`.
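The lookup described above can be sketched with `shutil.which`. This is a minimal illustration under stated assumptions, not the SDK's actual code: the function name `resolve_cli_command` and its `which` parameter are hypothetical, and the parameter exists only so the lookup can be swapped out in tests.

```python
import shutil
from typing import Callable, Optional


def resolve_cli_command(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the argv prefix used to invoke the Refract CLI.

    Hypothetical helper: prefers a `refract` executable found on PATH and
    otherwise falls back to running the published package through npx.
    """
    path = which("refract")
    if path is not None:
        return [path]
    # Fallback described in the docs: npx @refract-org/cli
    return ["npx", "@refract-org/cli"]
```

On a machine without `refract` on `PATH`, `resolve_cli_command()` returns `["npx", "@refract-org/cli"]`.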

Quick start

from refract import Refract

r = Refract()

# Analyze — get typed dataclasses
events = r.analyze("Bitcoin", depth="brief")
for event in events:
    print(event.eventType, event.timestamp)

# Export as pandas DataFrame
df = r.analyze("Bitcoin", depth="forensic", as_frame=True)
print(df.groupby("event_type").size())

API reference

`Refract`

The main client. It takes no constructor arguments and auto-detects the CLI.

r = Refract()
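Under the hood the client shells out to the CLI. The sketch below shows that general pattern with `subprocess.run`. It assumes the CLI writes one JSON object per line (NDJSON) to stdout and signals failure with a non-zero exit code; both are assumptions rather than documented behavior. `run_ndjson` is a hypothetical name, and no Refract CLI flags are shown because none are documented here.

```python
import json
import subprocess


def run_ndjson(argv: list[str]) -> list[dict]:
    """Run a command and parse each non-empty stdout line as JSON.

    Hypothetical helper: assumes NDJSON on stdout and a non-zero exit
    code on failure.
    """
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"command failed ({result.returncode}): {result.stderr.strip()}"
        )
    # Skip blank lines; every other line is expected to be one JSON object.
    return [json.loads(line) for line in result.stdout.splitlines() if line.strip()]
```

In practice `argv` would start with the command resolved from `PATH` (or the `npx` fallback) followed by subcommand arguments. Those arguments are not documented here, so the sketch takes the full `argv` from the caller.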

`analyze(page, depth, as_frame, flatten)`

Run a full page analysis. Returns `list[EvidenceEvent]`, or a pandas `DataFrame` if `as_frame=True`.

Parameter	Type	Default	Description
`page`	`str`	required	Page title
`depth`	`str`	`"detailed"`	`"brief"`, `"detailed"`, or `"forensic"`
`as_frame`	`bool`	`False`	Return a pandas DataFrame
`flatten`	`bool`	`False`	Flatten nested provenance fields into flat columns

events = r.analyze("Climate_change", depth="forensic")
df = r.analyze("Climate_change", depth="forensic", as_frame=True, flatten=True)
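What `flatten=True` does to nested fields can be illustrated with a small recursive helper. The naming scheme (parent and child keys joined with `_`) is an assumption for illustration; the SDK's actual column names may differ, and `flatten_record` is hypothetical.

```python
from typing import Any


def flatten_record(record: dict[str, Any], prefix: str = "", sep: str = "_") -> dict[str, Any]:
    """Flatten nested dicts into a single level of prefixed keys.

    Hypothetical helper: {"provenance": {"source": "x"}} becomes
    {"provenance_source": "x"}. Lists and scalars are copied as-is.
    """
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested mappings, extending the column prefix.
            flat.update(flatten_record(value, name, sep))
        else:
            flat[name] = value
    return flat
```

Flat records like these map directly onto DataFrame columns, which is why flattening pairs naturally with `as_frame=True`.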

`claim(page, text, as_frame)`

Track a specific claim across all revisions.

Parameter	Type	Default	Description
`page`	`str`	required	Page title
`text`	`str`	required	Claim text to track (partial match)
`as_frame`	`bool`	`False`	Return a pandas DataFrame

lifecycle = r.claim("Bitcoin", "decentralized")
print(lifecycle.status, lifecycle.first_seen)
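A rough sketch of what tracking a claim by partial match involves, assuming a case-sensitive substring match against each event's `before` and `after` text and ISO 8601 timestamps in a single timezone. `Event` is a trimmed stand-in for `EvidenceEvent`, and `claim_events` and `first_seen` are hypothetical helpers. The real `claim()` may match differently and returns its own lifecycle object.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    # Trimmed stand-in for EvidenceEvent (only the fields used here).
    toRevisionId: int
    before: str
    after: str
    timestamp: str


def claim_events(events: list[Event], text: str) -> list[Event]:
    """Return events whose before or after text contains `text` (substring match)."""
    return [e for e in events if text in e.before or text in e.after]


def first_seen(events: list[Event], text: str) -> Optional[str]:
    """Earliest timestamp at which `text` appears in an event's after text."""
    stamps = [e.timestamp for e in events if text in e.after]
    # ISO 8601 strings in one format and timezone sort chronologically.
    return min(stamps) if stamps else None
```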

`export(page, format, flatten, as_frame)`

Export analysis to a file or DataFrame.

Parameter	Type	Default	Description
`page`	`str`	required	Page title
`format`	`str`	`"ndjson"`	`"json"`, `"ndjson"`, or `"csv"`
`flatten`	`bool`	`False`	Flatten nested fields
`as_frame`	`bool`	`False`	Return a pandas DataFrame

df = r.export("Bitcoin", format="ndjson", flatten=True, as_frame=True)
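To show how the `"ndjson"` and `"csv"` formats differ for the same records, here is a stdlib-only sketch. `to_ndjson` and `to_csv` are hypothetical helpers, not part of the SDK; they assume flat records such as those produced with `flatten=True`.

```python
import csv
import io
import json


def to_ndjson(records: list[dict]) -> str:
    # One JSON object per line, with no enclosing array.
    return "".join(json.dumps(r) + "\n" for r in records)


def to_csv(records: list[dict]) -> str:
    # Header is the union of keys in first-seen order; missing values become "".
    fieldnames: list[str] = []
    for r in records:
        for k in r:
            if k not in fieldnames:
                fieldnames.append(k)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```

NDJSON keeps each record's own keys, while CSV forces every row into one shared header. That is one reason flat columns matter for tabular output.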

`EvidenceEvent` dataclass

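from __future__ import annotations  # lets the DeterministicFact annotation stay unresolved here

from dataclasses import dataclass, field

@dataclass
class EvidenceEvent:
    eventType: str
    fromRevisionId: int
    toRevisionId: int
    section: str
    before: str
    after: str
    timestamp: str
    # Optional fields, defaulting to empty values
    eventId: str = ""
    claimId: str = ""
    layer: str = ""
    # DeterministicFact is a type defined elsewhere in the SDK (not shown here)
    deterministicFacts: list[DeterministicFact] = field(default_factory=list)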
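Building these dataclasses from parsed CLI output is straightforward if the JSON keys match the field names. The sketch below assumes that (it is not documented here), ignores unknown keys, and keeps `deterministicFacts` as raw dicts because the `DeterministicFact` definition is not shown. It uses a local copy of the class so it runs on its own; `event_from_dict` is hypothetical.

```python
from dataclasses import dataclass, field, fields
from typing import Any


@dataclass
class EvidenceEvent:
    # Local copy of the dataclass above; deterministicFacts kept as raw dicts here.
    eventType: str
    fromRevisionId: int
    toRevisionId: int
    section: str
    before: str
    after: str
    timestamp: str
    eventId: str = ""
    claimId: str = ""
    layer: str = ""
    deterministicFacts: list[dict[str, Any]] = field(default_factory=list)


_FIELD_NAMES = {f.name for f in fields(EvidenceEvent)}


def event_from_dict(raw: dict[str, Any]) -> EvidenceEvent:
    """Build an EvidenceEvent from one parsed JSON object, dropping unknown keys.

    Hypothetical helper: assumes JSON keys use the same camelCase names as
    the dataclass fields. Missing required keys raise TypeError.
    """
    return EvidenceEvent(**{k: v for k, v in raw.items() if k in _FIELD_NAMES})
```

Dropping unknown keys keeps older clients working if the output gains fields. Failing loudly on missing required keys surfaces a format change early.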

Integrations

pandas

Every method accepts `as_frame=True` to return a pandas DataFrame. On `analyze` and `export`, also pass `flatten=True` to flatten nested provenance fields into columns.

df = r.analyze("Bitcoin", depth="forensic", as_frame=True, flatten=True)
df["event_type"].value_counts()
df.groupby("section").size().sort_values(ascending=False)
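The dataclass fields are camelCase (`eventType`), while the DataFrame examples use snake_case columns (`event_type`). A sketch of that renaming, assuming a simple camelCase-to-snake_case rule; `camel_to_snake` and `to_records` are hypothetical, and the resulting list of dicts can be passed straight to `pandas.DataFrame`.

```python
import re
from dataclasses import asdict, is_dataclass
from typing import Any


def camel_to_snake(name: str) -> str:
    # Insert "_" before each uppercase letter (except at the start), then lowercase.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def to_records(events: list[Any]) -> list[dict[str, Any]]:
    """Convert dataclass instances into dicts with snake_case keys."""
    records = []
    for e in events:
        if not is_dataclass(e):
            raise TypeError(f"expected a dataclass instance, got {type(e).__name__}")
        records.append({camel_to_snake(k): v for k, v in asdict(e).items()})
    return records
```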

LangChain

`refract_langchain.py` loads events as LangChain `Document` objects with stability metadata for provenance-aware RAG:

from refract_langchain import RefractLoader

loader = RefractLoader(page="Bitcoin", depth="forensic")
documents = loader.load()

for doc in documents:
    print(doc.metadata["event_type"], doc.metadata["stability_score"])
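For provenance-aware retrieval, one option is to drop low-stability documents before indexing. The sketch below uses a minimal stand-in with the same two attributes as LangChain's `Document` (`page_content` and `metadata`), so it runs without LangChain installed. The threshold, and the assumption that a higher `stability_score` means more stable content, are illustrative rather than documented. `Doc` and `stable_only` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Doc:
    # Stand-in for LangChain's Document (same attribute names).
    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def stable_only(docs: list[Doc], min_score: float) -> list[Doc]:
    """Keep documents whose metadata["stability_score"] is at least min_score.

    Assumes higher scores mean more stable content; documents without a
    score are dropped.
    """
    return [
        d for d in docs
        if d.metadata.get("stability_score") is not None
        and d.metadata["stability_score"] >= min_score
    ]
```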

Jupyter / Marimo

Combine with pandas and matplotlib or altair for interactive exploration:

df = r.analyze("Bitcoin", depth="forensic", as_frame=True, flatten=True)
citations = df[df["event_type"].str.startswith("citation_")]
citations.groupby("event_type").size().plot(kind="bar")
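Outside a notebook, or without pandas, you can compute the same citation tally with `collections.Counter`. This sketch assumes events expose `eventType` as in the `EvidenceEvent` dataclass; `citation_counts` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable


def citation_counts(event_types: Iterable[str]) -> Counter:
    # Count only event types with the "citation_" prefix used in the example above.
    return Counter(t for t in event_types if t.startswith("citation_"))
```

For example: `citation_counts(e.eventType for e in r.analyze("Bitcoin", depth="forensic"))`.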

Domain boundary

This SDK wraps the Refract CLI. It does not add model logic, interpretation, or domain-specific judgment. It provides typed Python access to deterministic observation output.

Source

github.com/refract-org/refract-py