Tutorial: Add a non-Wikipedia data source

Goal

Connect Refract to a wiki or knowledge base that isn't Wikipedia — Confluence, GitHub wikis, Notion, or any revision-tracked content source. The engine doesn't change. You write an adapter.

How the adapter surface works

Refract's ingestion pipeline consumes Revision[]: an array of revision objects carrying content, timestamps, and metadata. MediaWikiClient is one producer of that array; any source that can produce Revision[] works.

export interface Revision {
  revId: number;
  pageId: number;
  pageTitle: string;
  timestamp: string;
  user?: string;
  comment: string;
  content: string;    // ← the wikitext or document content
  size: number;
  minor: boolean;
}

Once you have Revision[], every analyzer works: section differ, citation tracker, revert detector, edit cluster detector, talk page correlator. The analyzers are pure functions — they don't know or care where the revisions came from.
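To make the shape concrete, here is a minimal sketch that builds Revision[] from in-memory snapshots instead of an API. Snapshot and snapshotsToRevisions are hypothetical names, not part of Refract, and the Revision interface is copied locally so the block runs on its own. It assumes revisions should be ordered oldest first and that size can be the content's character count.

```typescript
// Local copy of the Revision shape shown above, so this sketch is self-contained.
interface Revision {
  revId: number;
  pageId: number;
  pageTitle: string;
  timestamp: string;
  user?: string;
  comment: string;
  content: string;
  size: number;
  minor: boolean;
}

// Hypothetical input shape: one saved snapshot of a document from any source.
interface Snapshot {
  savedAt: string; // ISO 8601 timestamp
  author?: string;
  note?: string;
  text: string;
}

// Hypothetical helper: turn snapshots into Revision[], oldest first.
function snapshotsToRevisions(
  pageId: number,
  pageTitle: string,
  snapshots: Snapshot[],
): Revision[] {
  return [...snapshots]
    .sort((a, b) => Date.parse(a.savedAt) - Date.parse(b.savedAt))
    .map((s, index) => ({
      revId: index + 1, // synthetic, increasing with time
      pageId,
      pageTitle,
      timestamp: s.savedAt,
      user: s.author,
      comment: s.note ?? "",
      content: s.text,
      size: s.text.length, // assumption: character count is an acceptable size
      minor: false,
    }));
}

const demoRevisions = snapshotsToRevisions(1, "Onboarding", [
  { savedAt: "2024-02-01T00:00:00Z", text: "== Setup ==\nInstall the CLI and log in." },
  { savedAt: "2024-01-01T00:00:00Z", author: "alice", text: "== Setup ==\nInstall the CLI." },
]);
```

Because the result is plain data, it can be passed to the analyzers in the same way as revisions fetched from a live source.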

Pattern: adapter function

Write a single function that fetches from your source and returns Revision[]:

import type { Revision } from "@refract-org/evidence-graph";
import { sectionDiffer, citationTracker } from "@refract-org/analyzers";

async function fetchFromConfluence(
  pageId: string,
  apiUrl: string,
  apiToken: string,
): Promise<Revision[]> {
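  const response = await fetch(`${apiUrl}/rest/api/content/${pageId}/version`, {
    headers: { Authorization: `Bearer ${apiToken}` },
  });
  // Fail loudly on auth or not-found errors instead of parsing an error body.
  if (!response.ok) {
    throw new Error(`Confluence request failed: ${response.status} ${response.statusText}`);
  }

  const data = await response.json();
  // Map each version record to Refract's Revision shape.
  return data.results.map((v: any) => ({
    revId: v.number,
    pageId: Number.parseInt(pageId, 10),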
    pageTitle: v.title ?? pageId,
    timestamp: v.when,
    user: v.by?.displayName,
    comment: v.message ?? "",
    // Falls back to "" when a version record carries no body; the analyzers then see no content.
    content: v.body?.storage?.value ?? "",
    size: v.body?.storage?.value?.length ?? 0,
    minor: v.minorEdit ?? false,
  }));
}

// Use it exactly like the Wikipedia client
const revisions = await fetchFromConfluence("12345", "https://mycompany.atlassian.net/wiki", "token");
const events = [];

// Compare each revision with its predecessor (assumes revisions are ordered oldest first).
for (let i = 1; i < revisions.length; i++) {
  events.push(
    ...sectionDiffer.diffSections(
      sectionDiffer.extractSections(revisions[i - 1].content),
      sectionDiffer.extractSections(revisions[i].content),
    ),
  );
  events.push(
    ...citationTracker.diffCitations(
      citationTracker.extractCitations(revisions[i - 1].content),
      citationTracker.extractCitations(revisions[i].content),
    ),
  );
}

console.log(`Found ${events.length} events across ${revisions.length} revisions`);
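The loop above compares each revision with the one before it, so the result depends on the array being in chronological order. The sketch below, offered as one possible way to guard against that, sorts first and then walks consecutive pairs. diffConsecutive, RevisionLike, and addedLines are hypothetical names; addedLines is a stand-in comparison for illustration, not a Refract analyzer.

```typescript
// Minimal shape needed by this sketch; the full Revision interface has more fields.
interface RevisionLike {
  revId: number;
  timestamp: string;
  content: string;
}

// Hypothetical helper: sort oldest first, then run a comparison on each consecutive pair.
function diffConsecutive<R extends RevisionLike, E>(
  revisions: R[],
  compare: (before: R, after: R) => E[],
): E[] {
  // Copy before sorting so the caller's array is left untouched.
  const ordered = [...revisions].sort(
    (a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp) || a.revId - b.revId,
  );

  const events: E[] = [];
  let previous: R | undefined;
  for (const current of ordered) {
    if (previous !== undefined) {
      events.push(...compare(previous, current));
    }
    previous = current;
  }
  return events;
}

// Stand-in comparison (not a Refract analyzer): lines present in `after` but not in `before`.
function addedLines(before: RevisionLike, after: RevisionLike): string[] {
  const seen = new Set(before.content.split("\n"));
  return after.content.split("\n").filter((line) => !seen.has(line));
}

// Deliberately out of order to show the sort taking effect.
const sample: RevisionLike[] = [
  { revId: 2, timestamp: "2024-03-02T00:00:00Z", content: "intro\ndetails" },
  { revId: 1, timestamp: "2024-03-01T00:00:00Z", content: "intro" },
  { revId: 3, timestamp: "2024-03-03T00:00:00Z", content: "intro\ndetails\nsummary" },
];

const added = diffConsecutive(sample, addedLines);
```

In a real pipeline the compare callback would wrap the extract and diff calls shown in the loop above.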

Existing adapters

Source	Protocol	Auth	Example
MediaWiki (Wikipedia, Fandom)	`api.php`	None / Bearer / Basic / OAuth2	Built-in (`@refract-org/ingestion`)
Private MediaWiki	`api.php`	Bearer / Basic	Private wiki tutorial
Confluence	REST API	Bearer token	Example above

When to build an adapter vs. use the CLI

If you need	Use
Wikipedia or any MediaWiki wiki	`refract analyze "Page" --api <url>` — no code needed
A non-MediaWiki source	Write an adapter function (pattern above)
An adapter that others might use	Contribute it to `labs/` in the refract monorepo
Private/authenticated sources	Private wiki tutorial

What the analyzers expect

The analyzers operate on content as plain wikitext. If your source uses another format (e.g., Markdown, HTML, Notion blocks), convert it before passing it to the analyzers:

function markdownToWikitext(md: string): string {
  return md
    // Headings: wikitext needs matching "=" markers on both sides of the title.
    .replace(/^### (.+)$/gm, "=== $1 ===")
    .replace(/^## (.+)$/gm, "== $1 ==")
    .replace(/^# (.+)$/gm, "= $1 =")
    .replace(/\[([^\]]+)\]\([^)]+\)/g, "$1")  // links → plain text
    .replace(/`([^`]+)`/g, "$1");    // inline code → plain text
}

The better the preprocessing, the better the analysis. Citation tracking, for example, looks for <ref> tags — if your source doesn't use them, citations won't be detected. Adapt the preprocessing to match what your analyzers expect.
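As an illustration of citation-oriented preprocessing, the sketch below rewrites Markdown footnotes as <ref> tags. footnotesToRefs is a hypothetical helper, not part of Refract, and it assumes your source uses the common `[^id]` footnote convention and that a plain `<ref>text</ref>` is enough for the citation tracker to pick up; verify both against your own content.

```typescript
// Hypothetical preprocessing step: rewrite Markdown footnotes as <ref> tags.
function footnotesToRefs(md: string): string {
  const definitions = new Map<string, string>();

  // Collect definition lines such as "[^1]: Some source" and remove them from the body.
  const body = md.replace(
    /^\[\^([^\]]+)\]:[ \t]*(.*)\r?\n?/gm,
    (_match: string, id: string, text: string) => {
      definitions.set(id, text.trim());
      return "";
    },
  );

  // Replace each inline marker "[^1]" with its <ref>; unknown markers are left as-is.
  return body.replace(/\[\^([^\]]+)\]/g, (marker: string, id: string) => {
    const text = definitions.get(id);
    return text === undefined ? marker : `<ref>${text}</ref>`;
  });
}

// Illustrative input only; the source text is made up.
const sampleMd = "A claim that needs a source.[^src]\n\n[^src]: Example Handbook, 2020\n";
const converted = footnotesToRefs(sampleMd);
```

Run a step like this before any conversion that strips Markdown links, so footnote markers are not altered first.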

Contribute an adapter

If you've built an adapter for a common source, contribute it to refract-labs as an experimental probe. Follow the custom analyzer tutorial for the full pipeline: adapter → analyzer integration → tests → eval.

Next steps

Private wiki tutorial — authenticated MediaWiki instances
Custom analyzer tutorial — build a new analyzer
Downstream integration — production patterns