Crime Data Pipelines

Two distinct pipelines keep crime data fresh. Both share the `DatasetPipelineRun` model for observability and auditing.

Architecture#

Pipeline A: District-Level (reports app)#

Processes detailed crime data for individual neighbourhoods (incidents, statistics, maps, time-series).

flowchart LR
    A[JSON file] --> B[update_dataset command]
    B --> C{validate_crime_data}
    C -->|errors + strict| D[FAIL - run.mark_failed]
    C -->|pass / warnings| E[Upload to R2]
    E -->|fail| F[FAIL - version NOT updated]
    E -->|success| G[Update Dataset.current_version]
    G --> H[run.mark_success]

Item	Detail
Command	`python manage.py update_dataset <slug>`
R2 key	`data/reports/{slug}/crime_data.json`
Local fallback	`static/reports/{slug}/crime_data.json`
Loader	`reports.views.load_crime_data()`
Validator	`reports.data_quality.validate_crime_data()`
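The district flow above can be sketched in plain Python. This is a minimal illustration, not the actual command: `Run`, `run_district_pipeline`, and the `validate` and `upload` callables are hypothetical stand-ins, and only the ordering of steps and the `mark_failed` / `mark_success` names come from the flowchart. It also assumes that validation errors are tolerated when strict mode is off.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Run:
    """Hypothetical stand-in for a DatasetPipelineRun record."""
    status: str = "running"
    messages: List[str] = field(default_factory=list)

    def mark_failed(self, reason: str) -> None:
        self.status = "failed"
        self.messages.append(reason)

    def mark_success(self) -> None:
        self.status = "success"


def run_district_pipeline(
    data: dict,
    new_version: str,
    dataset: dict,
    validate: Callable[[dict], Tuple[list, list]],  # returns (errors, warnings)
    upload: Callable[[dict], bool],                 # returns True on success
    strict: bool = True,
) -> Run:
    run = Run()
    errors, warnings = validate(data)
    run.messages.extend(warnings)
    if errors and strict:
        # Validation errors in strict mode stop the run before any upload.
        run.mark_failed(f"validation errors: {errors}")
        return run
    if not upload(data):
        # Upload failed: the current version is left untouched.
        run.mark_failed("upload failed")
        return run
    # Only a successful upload moves the version pointer.
    dataset["current_version"] = new_version
    run.mark_success()
    return run
```

The point of the ordering is that `current_version` changes only after the upload has succeeded, so a failed run never leaves the dataset pointing at data that is not in storage.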

Pipeline B: Ward-Level (safety app)#

Fetches data from SAPS, CrimeHub, and news APIs and computes safety scores for all 116 Cape Town wards.

flowchart LR
    A[CrimeSafetyService] --> B[fetch_crime_safety command]
    B --> C{validate_ward_safety_data}
    C -->|errors + strict| D[FAIL - run.mark_failed]
    C -->|pass / warnings| E[Service writes to R2 + local]
    E --> F[run.mark_success]

Item	Detail
Command	`python manage.py fetch_crime_safety`
R2 key	`data/safety/crime_safety.json`
Local fallback	`static/crime_safety.json`
Loader	`safety.views.get_crime_safety_data()`
Validator	`safety.data_quality.validate_ward_safety_data()`
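The "R2 key" and "Local fallback" rows suggest that each loader tries R2 first and reads the local static file only when R2 is unavailable. The sketch below shows that pattern under this assumption. `load_with_fallback` and `fetch_from_r2` are hypothetical names, and the real loaders may behave differently (for example by caching).

```python
import json
from pathlib import Path
from typing import Callable, Optional


def load_with_fallback(
    r2_key: str,
    local_path: Path,
    fetch_from_r2: Callable[[str], Optional[bytes]],
) -> dict:
    """Return parsed JSON from R2, or from the local file if R2 is unusable."""
    try:
        # Hypothetical fetcher: returns the object's bytes, or None if missing.
        raw = fetch_from_r2(r2_key)
    except Exception:
        raw = None  # treat any storage error as "not available"
    if raw is not None:
        try:
            return json.loads(raw)
        except ValueError:
            pass  # corrupt remote copy: fall through to the local file
    return json.loads(local_path.read_text(encoding="utf-8"))
```

Passing the fetcher in as a callable keeps the sketch independent of any particular storage client.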

Validation Checks#

District-level checks#

#	Check	Level
1	13 required top-level keys	error
2	Incident fields (id, title, date, category)	error
3	Statistic fields (metric_name, metric_value)	error
4	Category fields (category_code)	error
5	Duplicate incidents	error
6	Future incident dates	warning
6b	Inverted period range	error
7	Outlier detection (> 3 sigma)	warning
8	Time-series: chronological, no duplicate period_id	error
9	Historical: reasonable years, no duplicate (year, cat)	error/warning
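To make checks 5 to 7 concrete, here is a minimal sketch. It is not the code in `reports.data_quality`: the function names are hypothetical, duplicates are assumed to be detected by `id`, dates are assumed to be ISO `YYYY-MM-DD` strings, and the outlier test uses the population standard deviation.

```python
from datetime import date
from statistics import mean, pstdev
from typing import List, Tuple


def check_incidents(incidents: List[dict], today: date) -> Tuple[list, list]:
    """Return (errors, warnings) for duplicate ids and future dates."""
    errors, warnings = [], []
    seen = set()
    for inc in incidents:
        if inc["id"] in seen:
            errors.append(f"duplicate incident id: {inc['id']}")
        seen.add(inc["id"])
        # Assumes ISO dates (YYYY-MM-DD); the real format may differ.
        if date.fromisoformat(inc["date"]) > today:
            warnings.append(f"future date on incident {inc['id']}")
    return errors, warnings


def find_outliers(values: List[float], n_sigma: float = 3.0) -> List[float]:
    """Return values more than n_sigma standard deviations from the mean."""
    if len(values) < 2:
        return []
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing can be an outlier
    return [v for v in values if abs(v - mu) > n_sigma * sigma]
```

With a population standard deviation, a single value can only exceed 3 sigma when the series has more than ten points, so very short series never trigger this warning.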

Ward-level checks#

#	Check	Level
1	6 required top-level keys	error
2	Ward count >= 100	error
2b	Ward count != 116	warning
3	Score bounds 0.0-10.0	error
4	Required ward fields	error
5	Known safe areas score >= 5.0	warning
6	Known danger areas score <= 4.0	warning
7	Gang presence + score consistency	warning
8	Score change vs previous data (absolute delta > 2.0)	warning
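Checks 2, 2b, 3 and 8 can be illustrated as follows. This is a sketch only: it assumes the scores are available as a mapping from ward id to score, which may not match the real payload shape, and `check_ward_scores` is a hypothetical name rather than part of `safety.data_quality`.

```python
from typing import Dict, Optional, Tuple

EXPECTED_WARDS = 116  # number of Cape Town wards
MIN_WARDS = 100       # below this the data is rejected
MAX_DELTA = 2.0       # largest unremarkable change between runs


def check_ward_scores(
    scores: Dict[int, float],
    previous: Optional[Dict[int, float]] = None,
) -> Tuple[list, list]:
    """Return (errors, warnings) for ward count, score bounds and score drift."""
    errors, warnings = [], []
    if len(scores) < MIN_WARDS:
        errors.append(f"only {len(scores)} wards (need >= {MIN_WARDS})")
    elif len(scores) != EXPECTED_WARDS:
        warnings.append(f"{len(scores)} wards (expected {EXPECTED_WARDS})")
    for ward, score in scores.items():
        if not 0.0 <= score <= 10.0:
            errors.append(f"ward {ward}: score {score} out of bounds")
        prev = (previous or {}).get(ward)
        if prev is not None and abs(score - prev) > MAX_DELTA:
            warnings.append(f"ward {ward}: score moved {score - prev:+.1f}")
    return errors, warnings
```

Wards absent from the previous data are skipped by the delta check, so a first run produces no drift warnings.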

Failure Points & Rollback#

Failure	Impact	Recovery
JSON parse error	Command aborts before run created	Fix source file, re-run
Validation error (strict)	Run marked failed, version NOT updated	Fix data or use `--no-strict`
R2 upload failure	Run marked failed, version NOT updated	Check R2 credentials, retry
Unexpected exception	Run marked failed	Check logs, fix, retry

Rollback (district pipeline)#

python manage.py update_dataset sea-point --rollback-to 2025-Q4

This restores the Dataset version without re-uploading any data, so it only works if the data for that version is still in R2.

Freshness Monitoring#

The `/reports/crime/freshness/` endpoint returns JSON showing the last successful run for each pipeline:

{
  "pipelines": [
    {
      "pipeline": "ward",
      "slug": "city-wide",
      "version": "Q3 2025/2026",
      "finished_at": "2026-02-15T10:30:00Z",
      "age_seconds": 3600,
      "age_human": "1h 0m"
    }
  ],
  "checked_at": "2026-02-15T11:30:00Z"
}

Use this for uptime monitoring, e.g. alert if `age_seconds` exceeds 7776000 (90 days).
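A monitoring check over this payload might look like the sketch below. `stale_pipelines` and `MAX_AGE_SECONDS` are hypothetical names. The code assumes every entry carries a numeric `age_seconds`, as in the sample response, and leaves fetching the endpoint to whatever monitoring tool is in use.

```python
import json

MAX_AGE_SECONDS = 90 * 24 * 60 * 60  # 90 days = 7776000 seconds


def stale_pipelines(payload: dict, max_age: int = MAX_AGE_SECONDS) -> list:
    """Return 'pipeline/slug' labels for entries older than max_age seconds."""
    return [
        f"{p['pipeline']}/{p['slug']}"
        for p in payload["pipelines"]
        if p["age_seconds"] > max_age
    ]


# Trimmed version of the sample response shown above.
sample = json.loads(
    '{"pipelines": [{"pipeline": "ward", "slug": "city-wide", "age_seconds": 3600}]}'
)
print(stale_pipelines(sample))              # nothing is stale at the 90-day threshold
print(stale_pipelines(sample, max_age=60))  # a 60-second threshold flags ward/city-wide
```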