Tutorial: BYO-inference with real models

Goal

Replace Refract's default mechanical heuristics with a model at any BYO-inference boundary. Compare model output against the default, and audit which path was taken.

What BYO-inference means

Every analyzer threshold encodes an interpretive judgment — what counts as a revert, what sentence similarity means "modified," what activity level constitutes a spike. Refract ships with mechanical defaults that work offline. But at any of 5 typed boundaries, you can plug in a model.

The event records which path was taken in FactProvenance.parameters. The audit trail is transparent regardless of who supplied the judgment.

Step 1: See the default heuristics

Run a classification without a model:

refract classify revert --input '{"comment":"reverted vandalism"}'

Output:

{
  "boundary": "revert",
  "output": { "isRevert": true, "rationale": "comment matches revert pattern" },
  "source": "default"
}

source: "default" means the mechanical heuristic (6 regex patterns) made the decision. No API call was made.

Step 2: Plug in a model

Set your API key and endpoint:

export REFRACT_INFERENCE_API_KEY="sk-..."
export REFRACT_INFERENCE_ENDPOINT="https://api.openai.com/v1/chat/completions"
export REFRACT_INFERENCE_MODEL="gpt-4o-mini"

Or pass them as flags:

refract classify revert \
  --input '{"comment":"reverted vandalism"}' \
  --endpoint https://api.openai.com/v1/chat/completions \
  --model gpt-4o-mini

Output:

{
  "boundary": "revert",
  "output": { "isRevert": true, "rationale": "The edit comment explicitly states 'reverted vandalism'" },
  "source": "model",
  "confidence": 0.98
}

source: "model" + confidence — the model classified the boundary, and you know how confident it was.

Step 3: Try different providers

Refract works with any OpenAI-compatible API:

DeepSeek

refract classify revert \
  --input '{"comment":"rv unexplained removal"}' \
  --endpoint https://api.deepseek.com/v1/chat/completions \
  --model deepseek-chat

Local Ollama

# Start Ollama locally
ollama serve

# Run with a local model
refract classify revert \
  --input '{"comment":"undo previous edit"}' \
  --endpoint http://localhost:11434/v1/chat/completions \
  --model llama3

Anthropic (via proxy)

refract classify revert \
  --input '{"comment":"rvv"}' \
  --endpoint https://your-anthropic-proxy/v1/chat/completions \
  --model claude-3-haiku

No provider lock-in. Any endpoint that speaks chat/completions works.

Step 4: Compare model vs default

Run the same classification with and without a model and compare:

# Default
refract classify sentence_similarity \
  --input '{"before":"Earth is the third planet from the Sun","after":"Earth orbits the Sun as the third planet"}'

# Model
refract classify sentence_similarity \
  --input '{"before":"Earth is the third planet from the Sun","after":"Earth orbits the Sun as the third planet"}' \
  --model gpt-4o-mini

The default uses word-overlap ratio (threshold 0.8). The model considers semantic meaning. Same input, different reasoning, both recorded.

Step 5: Audit which path was taken

When a model classifies a boundary during a full page analysis, the event records the path:

refract analyze "COVID-19" --depth forensic --report > report.json

Check FactProvenance.parameters in the output:

{
  "deterministicFacts": [{
    "fact": "revert_detected",
    "provenance": {
      "analyzer": "revert-detector",
      "version": "0.5.1",
      "parameters": {
        "similaritySource": "model",
        "modelName": "gpt-4o-mini"
      }
    }
  }]
}

similaritySource: "model" means the model was used. similaritySource: "default" means the mechanical heuristic was used. Every event carries this audit trail — you always know who made the judgment.

Step 6: Use the MCP server for agent-driven classification

AI coding agents can call refract classify via the MCP server's sampling capability. The agent's host LLM classifies the boundary, and the result includes source: "model" for auditability:

refract mcp

Connect any MCP client and ask:

"Classify the most recent edit to the ChatGPT page — is it a revert, a major addition, or a minor edit? Use the model for classification."

The agent calls refract classify heuristic via MCP, the host LLM classifies, and the result is recorded with provenance.

The 5 BYO-inference boundaries

Boundary Default (mechanical) Model question
revert 6 regex patterns "Is this edit comment a revert?"
sentence_similarity Word-overlap ratio (0.8) "Are these two sentences the same claim?"
heuristic Size thresholds + comment patterns "What kind of edit is this?"
template_signal Name-to-type lookup "What policy signal does this template represent?"
activity_spike 3x moving average "Is this a meaningful spike in talk activity?"

Next steps