Crime Data Pipelines
Two distinct pipelines keep crime data fresh. Both share a common
DatasetPipelineRun model for observability and audit.
Architecture
Pipeline A: District-Level (reports app)
Processes detailed crime data for individual neighbourhoods (incidents, statistics, maps, time-series).
| Item | Detail |
|---|---|
| Command | |
| R2 key | |
| Local fallback | |
| Loader | |
| Validator | |
Pipeline B: Ward-Level (safety app)
Fetches from SAPS, CrimeHub, and news APIs to compute safety scores for all 116 Cape Town wards.
| Item | Detail |
|---|---|
| Command | |
| R2 key | |
| Local fallback | |
| Loader | |
| Validator | |
Validation Checks
District-level checks
| # | Check | Level |
|---|---|---|
| 1 | 13 required top-level keys | error |
| 2 | Incident fields (id, title, date, category) | error |
| 3 | Statistic fields (metric_name, metric_value) | error |
| 4 | Category fields (category_code) | error |
| 5 | Duplicate incidents | error |
| 6 | Future incident dates | warning |
| 6b | Inverted period range | error |
| 7 | Outlier detection (> 3 sigma) | warning |
| 8 | Time-series: chronological, no duplicate period_id | error |
| 9 | Historical: reasonable years, no duplicate (year, cat) | error/warning |
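As an illustration of how checks 5, 6, and 7 might work, here is a minimal sketch. It is not the project's validator: the function names, the use of `id` to detect duplicates, the ISO `YYYY-MM-DD` date format, and the choice of population standard deviation are all assumptions made for the example.

```python
from datetime import date
from statistics import mean, pstdev


def check_incidents(incidents, today):
    """Return (errors, warnings) for duplicate ids and future dates.

    Assumes each incident is a dict with an "id" and an ISO-format "date".
    """
    errors, warnings = [], []
    seen = set()
    for inc in incidents:
        # Check 5: a repeated id is treated as a duplicate incident (error).
        if inc["id"] in seen:
            errors.append(f"duplicate incident id: {inc['id']}")
        seen.add(inc["id"])
        # Check 6: a date after `today` is suspicious but not fatal (warning).
        if date.fromisoformat(inc["date"]) > today:
            warnings.append(f"future incident date: {inc['id']} ({inc['date']})")
    return errors, warnings


def find_outliers(values, sigmas=3.0):
    """Check 7: return values more than `sigmas` standard deviations from the mean."""
    if len(values) < 2:
        return []
    mu = mean(values)
    sd = pstdev(values)  # population standard deviation (an assumption)
    if sd == 0:
        return []  # all values identical, so nothing can be an outlier
    return [v for v in values if abs(v - mu) > sigmas * sd]
```

Returning errors and warnings as separate lists keeps the error/warning split from the table explicit, so a caller can decide whether warnings alone should block a run.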
Ward-level checks
| # | Check | Level |
|---|---|---|
| 1 | 6 required top-level keys | error |
| 2 | Ward count >= 100 | error |
| 2b | Ward count != 116 | warning |
| 3 | Score bounds 0.0-10.0 | error |
| 4 | Required ward fields | error |
| 5 | Known safe areas score >= 5.0 | warning |
| 6 | Known danger areas score <= 4.0 | warning |
| 7 | Gang presence + score consistency | warning |
| 8 | Score delta vs previous data (> +/- 2.0) | warning |
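Checks 2, 2b, 3, and 8 can be sketched roughly as follows. This is an illustrative example only: it assumes the ward data has been reduced to a mapping of ward id to score, which is a hypothetical shape, and `check_wards` is not a function from the codebase.

```python
EXPECTED_WARDS = 116   # total Cape Town wards
MIN_WARDS = 100        # below this the dataset is rejected
MAX_DELTA = 2.0        # largest score change accepted without a warning


def check_wards(scores, previous=None):
    """Return (errors, warnings) for a {ward_id: score} mapping.

    `previous` is an optional mapping of the same shape from the prior run.
    """
    errors, warnings = [], []

    # Check 2 / 2b: too few wards is an error; any other mismatch is a warning.
    if len(scores) < MIN_WARDS:
        errors.append(f"only {len(scores)} wards, need at least {MIN_WARDS}")
    elif len(scores) != EXPECTED_WARDS:
        warnings.append(f"{len(scores)} wards, expected {EXPECTED_WARDS}")

    for ward_id, score in scores.items():
        # Check 3: scores must lie within 0.0-10.0 inclusive.
        if not 0.0 <= score <= 10.0:
            errors.append(f"ward {ward_id}: score {score} out of bounds")
        # Check 8: flag large swings relative to the previous dataset.
        if previous is not None and ward_id in previous:
            if abs(score - previous[ward_id]) > MAX_DELTA:
                warnings.append(f"ward {ward_id}: score changed by more than {MAX_DELTA}")
    return errors, warnings
```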
Failure Points & Rollback
| Failure | Impact | Recovery |
|---|---|---|
| JSON parse error | Command aborts before run created | Fix source file, re-run |
| Validation error (strict) | Run marked failed, version NOT updated | Fix data or use |
| R2 upload failure | Run marked failed, version NOT updated | Check R2 credentials, retry |
| Unexpected exception | Run marked failed | Check logs, fix, retry |
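The ordering implied by the table above (parse first, then create the run, and only bump the version after a successful upload) could look something like the sketch below. Everything here is hypothetical: `Run`, `run_pipeline`, the `strict` parameter, and the `state` dict stand in for the real DatasetPipelineRun model and command logic, which are not shown in this document.

```python
import json
from dataclasses import dataclass


@dataclass
class Run:
    """Stand-in for a pipeline run record (hypothetical, not the real model)."""
    status: str = "running"
    error: str = ""


def run_pipeline(raw_json, validate, upload, state, new_version, strict=True):
    """Process one dataset and return the Run describing the outcome.

    `validate(data)` returns a list of error strings; `upload(data)` may raise.
    `state["version"]` is only changed when every step succeeds.
    """
    # A JSON parse error propagates from here, before any run record exists.
    data = json.loads(raw_json)

    run = Run()
    try:
        errors = validate(data)
        if errors and strict:
            raise ValueError("validation failed: " + "; ".join(errors))
        upload(data)
        state["version"] = new_version  # reached only after a successful upload
        run.status = "success"
    except Exception as exc:
        # Validation, upload, and unexpected failures all mark the run failed.
        run.status = "failed"
        run.error = str(exc)
    return run
```

Updating the version as the last step inside the `try` block is what keeps a failed run from advancing it.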
Rollback (district pipeline)
python manage.py update_dataset sea-point --rollback-to 2025-Q4
This restores the Dataset version without re-uploading data (assumes the previous version is still in R2).
Freshness Monitoring
The /reports/crime/freshness/ endpoint returns JSON showing the last
successful run for each pipeline:
{
  "pipelines": [
    {
      "pipeline": "ward",
      "slug": "city-wide",
      "version": "Q3 2025/2026",
      "finished_at": "2026-02-15T10:30:00Z",
      "age_seconds": 3600,
      "age_human": "1h 0m"
    }
  ],
  "checked_at": "2026-02-15T11:30:00Z"
}
Use this for uptime monitoring, e.g. alert if age_seconds exceeds 7776000
(90 days).
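A monitoring job could apply that threshold to the response along these lines. This is a sketch under assumptions: `stale_pipelines` is a hypothetical helper, and the sample payload is the one shown above rather than a live response.

```python
import json

# 90 days expressed in seconds: 90 * 24 * 60 * 60 = 7776000
MAX_AGE_SECONDS = 90 * 24 * 60 * 60


def stale_pipelines(payload, max_age=MAX_AGE_SECONDS):
    """Return the names of pipelines whose last successful run is too old."""
    return [
        p["pipeline"]
        for p in payload["pipelines"]
        if p["age_seconds"] > max_age
    ]


# Sample payload copied from the endpoint example in this section.
sample = json.loads("""
{
  "pipelines": [
    {
      "pipeline": "ward",
      "slug": "city-wide",
      "version": "Q3 2025/2026",
      "finished_at": "2026-02-15T10:30:00Z",
      "age_seconds": 3600,
      "age_human": "1h 0m"
    }
  ],
  "checked_at": "2026-02-15T11:30:00Z"
}
""")

stale = stale_pipelines(sample)
print("stale pipelines:", stale)
```

In a real check the payload would come from an HTTP GET against /reports/crime/freshness/, and a non-empty result would trigger the alert.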