Tutorial: Scheduled monitoring with cron, watch, and notifications

Goal

Set up Refract to re-observe Wikipedia pages on a schedule, detect when claims change, and notify you via Slack, email, or webhook. Catch citation removal, template disputes, and section reorganization as they happen — not when someone notices.

Two monitoring modes

Mode            How it works                                     Use when
refract cron    One-shot re-observation for cron scheduling      Batch monitoring of many pages on a fixed schedule
refract watch   Live polling daemon for a single page/section    Real-time monitoring of a specific page

Step 1: Create your pages file

List the pages you want to monitor, one per line:

cat > watch-pages.txt << EOF
COVID-19
SARS-CoV-2
COVID-19_pandemic
COVID-19_vaccine
mRNA_vaccine
EOF

Pages can be any valid MediaWiki page title. Use underscores for spaces.
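
If your titles come from a spreadsheet or a copied list, they may contain spaces, stray whitespace, or duplicates. The following is a minimal sketch of a cleanup step; normalize_titles is a hypothetical helper, not a Refract command, and it assumes one title per line on stdin.

```shell
#!/usr/bin/env bash
# normalize_titles: hypothetical helper, not part of Refract.
# Reads one title per line on stdin and writes cleaned titles to stdout:
#   - trims leading and trailing whitespace
#   - drops blank lines and lines starting with '#' (notes to yourself)
#   - converts spaces to underscores
#   - removes duplicates, keeping the first occurrence
normalize_titles() {
  sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
    | grep -v -e '^$' -e '^#' \
    | tr ' ' '_' \
    | awk '!seen[$0]++'
}
```

For example, normalize_titles < raw-titles.txt > watch-pages.txt, where raw-titles.txt is whatever unformatted list you start from.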

Step 2: Run a one-shot re-observation

refract cron watch-pages.txt --interval 24

This re-observes every page in the file once and then exits. With --interval 24, Refract fetches revisions from the last 24 hours (since the last observation), so the value should match how often you run the command. Pages with no new revisions are skipped; pages with new revisions are analyzed and events are emitted.

Use -c to cache revisions and avoid re-fetching:

refract cron watch-pages.txt --interval 24 -c

The cache stores revision content in ~/.wikihistory/refract.db. Subsequent runs only fetch new revisions.

Step 3: Schedule it with cron

Add to your crontab:

# Run every 6 hours
0 */6 * * * refract cron /path/to/watch-pages.txt --interval 6 -c

Or with GitHub Actions:

name: Refract Monitor
on:
  schedule:
    - cron: '0 */6 * * *'
jobs:
  monitor:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: oven-sh/setup-bun@v1
      - run: bun add -g @refract-org/cli
      - run: refract cron watch-pages.txt --interval 6 -c

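Refract respects MediaWiki rate limits automatically (maxlag backoff); no additional configuration is needed.

Two things to keep in mind for the GitHub Actions variant: the workflow runs in a checkout of your repository, so watch-pages.txt must be committed there, and GitHub-hosted runners start with a clean filesystem on every run, so the -c cache in ~/.wikihistory/refract.db does not carry over between runs unless you persist it yourself.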
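
With the crontab variant, a slow run can still be in progress when the next one starts. The sketch below shows one way to skip overlapping runs using a lock directory. run_exclusive and the lock path are hypothetical and not part of Refract; the approach relies only on mkdir being atomic.

```shell
#!/usr/bin/env bash
# run_exclusive: hypothetical wrapper, not a Refract feature.
# Usage: run_exclusive LOCKDIR COMMAND [ARGS...]
# mkdir either creates LOCKDIR or fails, atomically, so only one caller
# at a time gets past the check.
run_exclusive() {
  local lockdir=$1
  shift
  if ! mkdir "$lockdir" 2>/dev/null; then
    echo "previous run still holds $lockdir; skipping" >&2
    return 75   # EX_TEMPFAIL in sysexits.h
  fi
  "$@"                  # run the wrapped command
  local status=$?       # remember its exit status
  rmdir "$lockdir"      # release the lock
  return "$status"
}
```

In a script called from the crontab entry, this would look like run_exclusive /tmp/refract-cron.lock refract cron /path/to/watch-pages.txt --interval 6 -c. If the wrapped command is killed before rmdir runs, the lock directory stays behind and must be removed by hand.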

Step 4: Set up notifications

Slack

export SLACK_WEBHOOK_URL="https://hooks.slack.com/services/..."

refract cron watch-pages.txt --interval 24 --notify-slack

Refract POSTs a summary to the webhook when new events are detected:

Refract detected 12 new events on 3 pages:
- COVID-19: 2 citation_removed, 1 template_added (NPOV)
- mRNA_vaccine: 4 sentence_modified, 1 citation_replaced
- SARS-CoV-2: 3 revert_detected, 1 edit_cluster_detected
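
If you want a differently shaped message than the built-in summary, a similar one can be assembled by hand from the NDJSON output described in Steps 6 and 7. This is a sketch under assumptions: summarize_events and post_to_slack are hypothetical helpers, the .pageTitle and .eventType field names are taken from the Step 7 example, and the only Slack detail relied on is that incoming webhooks accept a JSON body with a "text" field.

```shell
#!/usr/bin/env bash
# summarize_events: hypothetical helper. Reads NDJSON events on stdin and
# prints one line per page with per-type counts, for example:
#   - COVID-19: 2 citation_removed, 1 template_added
summarize_events() {
  jq -rs '
    group_by(.pageTitle)[]
    | "- \(.[0].pageTitle): "
      + ( group_by(.eventType)
          | map("\(length) \(.[0].eventType)")
          | join(", ") )
  '
}

# post_to_slack: hypothetical helper. Wraps stdin in {"text": ...} and
# POSTs it to the URL in SLACK_WEBHOOK_URL.
post_to_slack() {
  jq -Rs '{text: .}' \
    | curl -fsS -X POST -H 'Content-Type: application/json' \
        --data-binary @- "$SLACK_WEBHOOK_URL"
}
```

A run would then be piped as refract cron watch-pages.txt --interval 24 | summarize_events | post_to_slack.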

Email

export SMTP_HOST="smtp.example.com"
export SMTP_PORT="587"
export SMTP_USER="alerts@example.com"
export SMTP_PASS="your-password"
export NOTIFY_EMAIL_TO="researcher@example.com"

refract cron watch-pages.txt --interval 24 --notify-email
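
Under cron, environment variables exported in your interactive shell are usually not set, so it can help to fail early when one is missing. The following is a small sketch; require_env is a hypothetical helper and uses bash indirect expansion.

```shell
#!/usr/bin/env bash
# require_env: hypothetical helper, not part of Refract.
# Returns 0 if every named variable is set and non-empty; otherwise prints
# the missing names to stderr and returns 1.
require_env() {
  local missing=0 name
  for name in "$@"; do
    # ${!name} expands the variable whose name is stored in $name (bash only)
    if [ -z "${!name:-}" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For the email setup above: require_env SMTP_HOST SMTP_PORT SMTP_USER SMTP_PASS NOTIFY_EMAIL_TO && refract cron watch-pages.txt --interval 24 --notify-email.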

Webhook

refract cron watch-pages.txt --interval 24 --notify-webhook https://your-server.com/hooks/refract

Refract POSTs a JSON payload with event summaries. Integrate with PagerDuty, Opsgenie, or any webhook consumer.

Step 5: Live polling with watch

For real-time monitoring of a single page:

refract watch "COVID-19" --interval 60000

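Here --interval is in milliseconds, so 60000 polls every 60 seconds for new revisions (unlike the refract cron examples, where the interval is in hours). When a new revision appears, Refract analyzes it and emits events. Combine it with Unix pipes for custom notifications: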

refract watch "COVID-19" --section "Vaccine safety" | while IFS= read -r event; do
  # read -r keeps backslashes in the event text intact
  if printf '%s\n' "$event" | grep -q "citation_removed"; then
    echo "⚠️ Citation removed from COVID-19 Vaccine safety section" | \
      mail -s "Refract Alert" researcher@example.com
  fi
done
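
A busy page can trigger the same alert many times in a short period. One way to throttle a pipe-based loop like this is to record when each kind of alert was last sent. The sketch below assumes a writable state directory; should_alert, the key names, and the directory are all hypothetical.

```shell
#!/usr/bin/env bash
# should_alert: hypothetical helper, not part of Refract.
# Usage: should_alert KEY WINDOW_SECONDS STATE_DIR
# Returns 0 (send the alert) if no alert for KEY was recorded within the
# last WINDOW_SECONDS, and records the current time; returns 1 otherwise.
# KEY is used as a file name, so keep it free of slashes.
should_alert() {
  local key=$1 window=$2 state_dir=$3
  local stamp="$state_dir/$key.last"
  local now last=0
  now=$(date +%s)
  mkdir -p "$state_dir"
  if [ -f "$stamp" ]; then
    last=$(cat "$stamp")
  fi
  if [ $((now - last)) -lt "$window" ]; then
    return 1            # still inside the quiet window
  fi
  echo "$now" > "$stamp"
  return 0
}
```

Inside the loop, the mail command could be guarded with something like should_alert citation_removed 3600 "$HOME/.refract-alerts" so that at most one such mail is sent per hour.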

Step 6: Detect specific patterns

Filter for high-signal events:

# Watch for citation removal, which often precedes larger content changes
refract cron watch-pages.txt --interval 24 | \
  grep "citation_removed"

# Watch for template disputes — NPOV, citation needed, dispute tags
refract cron watch-pages.txt --interval 24 | \
  grep "template_added"

The output is NDJSON (one JSON object per line), so you can pipe it into jq, DuckDB, or your own analysis pipeline.
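
Because grep matches the string anywhere on the line, it can also pick up events that merely mention an event type in another field. A stricter filter compares the field itself. The sketch below assumes the event type is stored in .eventType, as in the Step 7 example; only_type is a hypothetical helper.

```shell
#!/usr/bin/env bash
# only_type: hypothetical helper, not part of Refract.
# Usage: only_type EVENT_TYPE
# Reads NDJSON on stdin and passes through only the events whose
# .eventType equals EVENT_TYPE exactly, one compact JSON object per line.
only_type() {
  jq -c --arg t "$1" 'select(.eventType == $t)'
}
```

Usage: refract cron watch-pages.txt --interval 24 | only_type citation_removed.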

Step 7: Integrate with your own alerts

The cron output is NDJSON, one JSON object per event, so it can be parsed line by line:

refract cron watch-pages.txt --interval 24 --format ndjson | while IFS= read -r event; do
  # Extract the fields used for routing (read -r keeps JSON backslashes intact)
  event_type=$(printf '%s\n' "$event" | jq -r '.eventType')
  page=$(printf '%s\n' "$event" | jq -r '.pageTitle')
  section=$(printf '%s\n' "$event" | jq -r '.section')

  if [ "$event_type" = "citation_removed" ]; then
    echo "ALERT: Citation removed from $page / $section"
    # Trigger your alerting pipeline
  fi
done
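
The loop above starts three jq processes per event. For larger batches, a single jq process can emit the needed fields as tab-separated values for the whole stream. This is a sketch under the same field-name assumptions (.eventType, .pageTitle, .section); route_events and the choice of which event types count as alerts are hypothetical.

```shell
#!/usr/bin/env bash
# route_events: hypothetical helper, not part of Refract.
# Reads NDJSON on stdin, extracts three fields per event with one jq
# process, and prints an ALERT line for the event types listed in the case.
route_events() {
  jq -r '[.eventType, .pageTitle, (.section // "-")] | @tsv' |
  while IFS=$'\t' read -r event_type page section; do
    case "$event_type" in
      citation_removed|template_added)
        echo "ALERT: $event_type on $page / $section"
        ;;
      *)
        ;;  # other event types are ignored here
    esac
  done
}
```

Usage: refract cron watch-pages.txt --interval 24 --format ndjson | route_events. Events without a section are shown with "-" in its place.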

Next steps