Troubleshooting / FAQ
Rate limits
Refract respects the MediaWiki API's maxlag parameter and backs off automatically. If you see 429 Too Many Requests, wait a few minutes before retrying. Use -c / --cache to avoid re-fetching pages you've already analyzed.
"Page too large" errors
Wikipedia pages with thousands of revisions may exceed the default fetch limit. Use --from <revId> --to <revId> to scope the analysis to a specific revision range, or start with a small range of recent revisions and widen it gradually:
```bash
# Start with the last 50 revisions
refract analyze "Earth" --from <recent-rev-id>
```
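One way to find a suitable <recent-rev-id> is to ask the MediaWiki API for the most recent revision IDs and take the oldest one in the batch. This sketch is not part of Refract. It assumes Python 3.9+ and the standard action=query&prop=revisions module. The helper names and the User-Agent string are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical User-Agent; Wikimedia asks clients to identify themselves with contact info.
USER_AGENT = "refract-docs-sketch/0.1 (you@example.org)"

def revisions_query_url(api_url: str, title: str, count: int = 50) -> str:
    """Build a query for the `count` most recent revision IDs of a page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids",
        "rvlimit": count,    # the default rvdir=older returns the newest revisions first
        "format": "json",
        "formatversion": 2,  # pages come back as a list instead of a dict keyed by page ID
    }
    return f"{api_url}?{urlencode(params)}"

def fetch_json(url: str) -> dict:
    """GET a URL and decode its JSON body."""
    with urlopen(Request(url, headers={"User-Agent": USER_AGENT}), timeout=30) as resp:
        return json.load(resp)

def oldest_revid(response: dict) -> int:
    """Revisions are listed newest first, so the last entry is the oldest in the batch."""
    page = response["query"]["pages"][0]
    return page["revisions"][-1]["revid"]
```

For example, `oldest_revid(fetch_json(revisions_query_url("https://en.wikipedia.org/w/api.php", "Earth")))` returns an ID you could pass as --from.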
Authentication errors
| Error | Likely cause | Fix |
|---|---|---|
| 401 Unauthorized | Missing or invalid --api-key | Check that the API token is correct |
| 403 Forbidden | Token lacks permission | Verify the token's scope with the wiki administrator |
| Connection refused | Wrong API URL | Ensure the URL ends in /api.php |
Private wikis
For private or authenticated MediaWiki instances, provide credentials:
```bash
refract analyze "Page" \
  --api https://internal.wiki/api.php \
  --api-key <token>
```
Supported auth methods: bearer token (--api-key), basic auth (--api-user + --api-password), OAuth2.
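If a 401 or 403 is hard to explain, it can help to send the same credentials outside Refract. This sketch assumes Refract sends --api-key as a standard bearer token and --api-user / --api-password as HTTP Basic auth; this page does not document that. The helpers are hypothetical. meta=userinfo is a standard MediaWiki API query that reports which user the wiki thinks you are.

```python
import base64
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen

def bearer_header(token: str) -> dict[str, str]:
    """Bearer token auth (RFC 6750): the token is sent verbatim."""
    return {"Authorization": f"Bearer {token}"}

def basic_header(user: str, password: str) -> dict[str, str]:
    """Basic auth (RFC 7617): base64-encoded "user:password"."""
    raw = f"{user}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}

def probe(api_url: str, headers: dict[str, str]) -> str:
    """Report the HTTP status of an authenticated meta=userinfo query."""
    req = Request(f"{api_url}?action=query&meta=userinfo&format=json", headers=headers)
    try:
        with urlopen(req, timeout=10) as resp:
            return str(resp.status)
    except HTTPError as err:  # the server answered, e.g. with 401 or 403
        return str(err.code)
    except URLError as err:   # no HTTP answer at all, e.g. connection refused
        return f"unreachable: {err.reason}"
```

For example, `probe("https://internal.wiki/api.php", bearer_header("<token>"))` returns "200", "401", "403", or an "unreachable" message that points at the URL.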
Cache issues
If results seem stale, clear the cache:
```bash
rm -rf ~/.wikihistory/
```
Or use --cache-dir to point at a fresh location. The default cache directory is ~/.wikihistory/.
No events produced
If refract analyze returns no events, the page may not have changed in the requested revision range. Try expanding the range or removing --from/--to to fetch the most recent revisions.
Cross-wiki diff returns no results
refract diff compares a topic across two MediaWiki instances. Each wiki must have a page with the given title. To verify, analyze the page on each wiki separately:
```bash
refract analyze "Topic" --api <wiki-a-url>
refract analyze "Topic" --api <wiki-b-url>
```
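One possible cause is a title that differs between the two wikis, for example in capitalization. This sketch asks a wiki's API whether a title exists and how the wiki normalized it. It assumes Python 3.9+ and the standard action=query module with formatversion=2. The helper names and User-Agent string are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

USER_AGENT = "refract-docs-sketch/0.1 (you@example.org)"  # hypothetical; use your own contact info

def page_info(api_url: str, title: str) -> dict:
    """Ask one wiki about a title (formatversion=2 response)."""
    params = {"action": "query", "titles": title, "format": "json", "formatversion": 2}
    req = Request(f"{api_url}?{urlencode(params)}", headers={"User-Agent": USER_AGENT})
    with urlopen(req, timeout=30) as resp:
        return json.load(resp)

def page_exists(response: dict) -> bool:
    """The API flags absent titles with "missing" and malformed ones with "invalid"."""
    page = response["query"]["pages"][0]
    return not page.get("missing", False) and not page.get("invalid", False)

def normalized_title(response: dict, title: str) -> str:
    """Return the title as the wiki normalized it (e.g. with a capitalized first letter)."""
    for entry in response["query"].get("normalized", []):
        if entry["from"] == title:
            return entry["to"]
    return title
```

Run `page_exists(page_info(url, "Topic"))` against each wiki's api.php URL. If one returns False, compare `normalized_title` on both responses.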
Performance and scaling
How many revisions per second?
Approximate throughput by depth:
- brief (section differ only): ~50 revisions per second
- detailed (all standard analyzers): ~20 revisions per second
- forensic (with talk page correlation and edit cluster detection): ~10 revisions per second

These are single-core measurements on an M-series Mac. Because Refract processes pages sequentially, total processing time grows linearly with the number of pages.
Memory usage
Memory is proportional to revision count, not revision size. Each revision's wikitext is held in memory during diff comparison, then discarded. A page with 500 revisions at detailed depth uses ~200 MB peak memory. For very large pages (5,000+ revisions), use --from / --to to paginate into batches of ~500 revisions.
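Splitting a long history into ~500-revision batches is mechanical once you have the page's revision IDs in order. This sketch makes two assumptions. First, you already have an oldest-first list of IDs, for example from the MediaWiki revisions API with rvdir=newer. Second, --from and --to are inclusive. batch_ranges is a hypothetical helper.

```python
def batch_ranges(revids: list[int], batch_size: int = 500) -> list[tuple[int, int]]:
    """Split an oldest-first list of revision IDs into (from, to) pairs of at most batch_size revisions."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    return [
        # First and last ID of each slice; the final slice may be shorter.
        (revids[i], revids[min(i + batch_size, len(revids)) - 1])
        for i in range(0, len(revids), batch_size)
    ]
```

Each pair can then be passed as `--from <first> --to <last>` in a separate refract analyze run.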
When to paginate
Use --from / --to or --since when:
- The page has more than 1,000 revisions
- You need faster processing (narrower ranges process faster)
- You're running on a resource-constrained machine
- You want to process the page in parallel across multiple ranges
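To process ranges in parallel, one approach is to launch one refract analyze process per range. This sketch uses only flags shown on this page, and refract must be on your PATH. It leaves -c off because this page does not say whether several processes can safely share one cache. It keeps the worker count small because each process makes its own API requests. build_commands and run_all are hypothetical helpers.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def build_commands(title: str, ranges: list[tuple[int, int]], depth: str = "detailed") -> list[list[str]]:
    """One refract analyze invocation per (from, to) revision range."""
    return [
        ["refract", "analyze", title, "--from", str(start), "--to", str(end), "--depth", depth]
        for start, end in ranges
    ]

def run_all(commands: list[list[str]], max_workers: int = 2) -> list[subprocess.CompletedProcess]:
    """Run the commands concurrently; results come back in input order with stdout/stderr captured."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda cmd: subprocess.run(cmd, capture_output=True, text=True), commands))
```

For example, `run_all(build_commands("Earth", [(a, b), (c, d)]))` runs two ranges side by side. Check each result's returncode and stdout afterwards.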
For batch processing many pages, use --pages-file:
```bash
refract analyze --pages-file topics.txt --depth detailed -c
```
Cache behavior
The cache stores fetched revisions under ~/.wikihistory/ (by default in ~/.wikihistory/refract.db). Each revision takes ~2–10 KB of disk space, so a page with 500 revisions uses roughly 1–5 MB of cache. The cache persists between runs; clear it with rm -rf ~/.wikihistory/ when disk space is tight.
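To compare the cache's actual size with the per-revision estimate, you can measure the directory directly. This minimal sketch assumes Python 3.9+. It only sums file sizes and assumes nothing about the layout of refract.db. The 2 to 10 KB range comes from the figures above, treating 1 MB as 1,000 KB. Both helpers are hypothetical.

```python
import os
from pathlib import Path

def directory_size_bytes(path: Path) -> int:
    """Total size of all files under `path`; 0 if it does not exist."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total

def estimated_cache_mb(revisions: int) -> tuple[float, float]:
    """Low/high estimate from 2 to 10 KB per revision (1 MB = 1,000 KB)."""
    return revisions * 2 / 1000, revisions * 10 / 1000

# Example: print(directory_size_bytes(Path.home() / ".wikihistory") / 1e6, "MB")
```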
API rate limits
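Wikipedia's API supports the maxlag parameter; 5 seconds is the value Wikimedia recommends. When database replication lag exceeds it, the API rejects the request and asks the client to retry later. Refract respects this and backs off automatically. Expect ~5–10 requests per second to Wikipedia. For non-Wikipedia MediaWiki instances, rate limits vary. If you hit 429 errors consistently, increase poll intervals or contact the wiki's administrator.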