Crime Data Pipelines

Two distinct pipelines keep crime data fresh. Both share a common DatasetPipelineRun model for observability and audit.

Architecture

Pipeline A: District-Level (reports app)

Processes detailed crime data for individual neighbourhoods (incidents, statistics, maps, time-series).

flowchart LR
    A[JSON file] --> B[update_dataset command]
    B --> C{validate_crime_data}
    C -->|errors + strict| D[FAIL - run.mark_failed]
    C -->|pass / warnings| E[Upload to R2]
    E -->|fail| F[FAIL - version NOT updated]
    E -->|success| G[Update Dataset.current_version]
    G --> H[run.mark_success]

| Item | Detail |
| --- | --- |
| Command | `python manage.py update_dataset <slug>` |
| R2 key | `data/reports/{slug}/crime_data.json` |
| Local fallback | `static/reports/{slug}/crime_data.json` |
| Loader | `reports.views.load_crime_data()` |
| Validator | `reports.data_quality.validate_crime_data()` |
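The ordering in the flowchart above (validate, then upload, then advance the version) can be sketched in a few lines. This is only an illustration of the control flow, not the actual `update_dataset` command: `RunRecord`, `run_district_update`, the `(errors, warnings)` validator return shape, and the dict standing in for the Dataset model are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    """Hypothetical stand-in for DatasetPipelineRun; the real model's fields are not shown here."""
    status: str = "running"
    detail: str = ""

    def mark_failed(self, detail: str) -> None:
        self.status, self.detail = "failed", detail

    def mark_success(self) -> None:
        self.status = "success"


def run_district_update(data, version, dataset, validate, upload, strict=True):
    """Validate, upload, then advance the version; stop at the first failure.

    Assumptions: `validate(data)` returns (errors, warnings), `upload(data)`
    raises on failure, and `dataset` is a dict standing in for the Dataset model.
    """
    run = RunRecord()
    errors, _warnings = validate(data)
    if errors and strict:
        # Strict mode: validation errors fail the run before anything is uploaded.
        run.mark_failed("; ".join(errors))
        return run
    try:
        upload(data)
    except Exception as exc:
        # Upload failed: the version pointer is left untouched.
        run.mark_failed(f"upload failed: {exc}")
        return run
    # Only a successful upload moves the version forward.
    dataset["current_version"] = version
    run.mark_success()
    return run
```

The point of the ordering is that `current_version` is written last, so a failed validation or upload never leaves the dataset pointing at data that is not in R2.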

Pipeline B: Ward-Level (safety app)

Fetches from SAPS, CrimeHub, and news APIs to compute safety scores for all 116 Cape Town wards.

flowchart LR
    A[CrimeSafetyService] --> B[fetch_crime_safety command]
    B --> C{validate_ward_safety_data}
    C -->|errors + strict| D[FAIL - run.mark_failed]
    C -->|pass / warnings| E[Service writes to R2 + local]
    E --> F[run.mark_success]

| Item | Detail |
| --- | --- |
| Command | `python manage.py fetch_crime_safety` |
| R2 key | `data/safety/crime_safety.json` |
| Local fallback | `static/crime_safety.json` |
| Loader | `safety.views.get_crime_safety_data()` |
| Validator | `safety.data_quality.validate_ward_safety_data()` |
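Both pipelines list an R2 key and a local fallback path. One way a loader might combine the two is sketched below. It is not the code behind `get_crime_safety_data()` or `load_crime_data()`: the function name `load_with_fallback` and the `fetch_remote` callable (which would wrap whatever R2 client the project uses) are hypothetical.

```python
import json
from pathlib import Path
from typing import Callable, Optional


def load_with_fallback(
    fetch_remote: Callable[[str], Optional[bytes]],
    r2_key: str,
    local_path: Path,
) -> dict:
    """Return the JSON stored under `r2_key`, or the local file if R2 cannot serve it.

    Assumption: `fetch_remote` returns the object's bytes, returns None when the
    object is missing, and may raise on network or credential problems.
    """
    try:
        payload = fetch_remote(r2_key)
        if payload is not None:
            return json.loads(payload)
    except Exception:
        # Any remote problem (including an unparsable body) falls through
        # to the local copy.
        pass
    return json.loads(local_path.read_text(encoding="utf-8"))
```

For the ward pipeline this would be called with the key `data/safety/crime_safety.json` and the path `static/crime_safety.json` from the table above.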

Validation Checks

District-level checks

| # | Check | Level |
| --- | --- | --- |
| 1 | 13 required top-level keys | error |
| 2 | Incident fields (id, title, date, category) | error |
| 3 | Statistic fields (metric_name, metric_value) | error |
| 4 | Category fields (category_code) | error |
| 5 | Duplicate incidents | error |
| 6 | Future incident dates | warning |
| 6b | Inverted period range | error |
| 7 | Outlier detection (> 3 sigma) | warning |
| 8 | Time-series: chronological, no duplicate period_id | error |
| 9 | Historical: reasonable years, no duplicate (year, cat) | error/warning |
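To make two of these checks concrete, here is a minimal sketch of duplicate detection (check 5) and 3-sigma outlier detection (check 7). It does not reproduce `validate_crime_data()`: the function names are hypothetical, and it assumes duplicates are identified by the incident `id` field and that the outlier test uses the population standard deviation, neither of which the table specifies.

```python
from collections import Counter
from statistics import mean, pstdev


def duplicate_incident_ids(incidents):
    """Return the ids that appear more than once (assumes `id` identifies an incident)."""
    counts = Counter(incident["id"] for incident in incidents)
    return sorted(key for key, n in counts.items() if n > 1)


def sigma_outliers(values, threshold=3.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    if len(values) < 2:
        return []
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # All values identical: nothing can be an outlier.
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]
```

Note that a 3-sigma test cannot flag anything in very small samples, because a single extreme value inflates the standard deviation it is measured against; that is one reason such a check is better suited to a warning than an error.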

Ward-level checks

| # | Check | Level |
| --- | --- | --- |
| 1 | 6 required top-level keys | error |
| 2 | Ward count >= 100 | error |
| 2b | Ward count != 116 | warning |
| 3 | Score bounds 0.0-10.0 | error |
| 4 | Required ward fields | error |
| 5 | Known safe areas score >= 5.0 | warning |
| 6 | Known danger areas score <= 4.0 | warning |
| 7 | Gang presence + score consistency | warning |
| 8 | Score delta vs previous data (> +/- 2.0) | warning |
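Checks 2, 2b, 3 and 8 are simple threshold tests, and a sketch shows how the error and warning levels might be separated. The data shape here is an assumption: `wards` is taken to be a dict mapping a ward id to a dict with a `score` key, and `check_wards` is a hypothetical name, not the signature of `validate_ward_safety_data()`.

```python
def check_wards(wards, previous=None, expected=116, minimum=100, max_delta=2.0):
    """Return (errors, warnings) for ward count, score bounds and score drift.

    Assumption: `wards` and `previous` map ward id -> {"score": float}.
    """
    errors, warnings = [], []

    # Check 2 / 2b: fewer than `minimum` wards is an error,
    # any other count that is not `expected` is only a warning.
    if len(wards) < minimum:
        errors.append(f"only {len(wards)} wards, need at least {minimum}")
    elif len(wards) != expected:
        warnings.append(f"{len(wards)} wards, expected {expected}")

    for ward_id, ward in wards.items():
        score = ward["score"]
        # Check 3: scores must lie in the closed range 0.0-10.0.
        if not 0.0 <= score <= 10.0:
            errors.append(f"ward {ward_id}: score {score} out of bounds")
        # Check 8: flag large swings against the previous dataset, if one is given.
        prev = previous.get(ward_id) if previous else None
        if prev is not None and abs(score - prev["score"]) > max_delta:
            warnings.append(f"ward {ward_id}: score moved {score - prev['score']:+.1f}")

    return errors, warnings
```

Returning errors and warnings as separate lists lets the caller decide what strict mode means, rather than baking that decision into each check.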

Failure Points & Rollback

| Failure | Impact | Recovery |
| --- | --- | --- |
| JSON parse error | Command aborts before run created | Fix source file, re-run |
| Validation error (strict) | Run marked failed, version NOT updated | Fix data or use `--no-strict` |
| R2 upload failure | Run marked failed, version NOT updated | Check R2 credentials, retry |
| Unexpected exception | Run marked failed | Check logs, fix, retry |

Rollback (district pipeline)

python manage.py update_dataset sea-point --rollback-to 2025-Q4

This points the Dataset back at the given version without re-uploading any data. It assumes that version's data is still in R2.

Freshness Monitoring

The /reports/crime/freshness/ endpoint returns JSON showing the last successful run for each pipeline:

{
  "pipelines": [
    {
      "pipeline": "ward",
      "slug": "city-wide",
      "version": "Q3 2025/2026",
      "finished_at": "2026-02-15T10:30:00Z",
      "age_seconds": 3600,
      "age_human": "1h 0m"
    }
  ],
  "checked_at": "2026-02-15T11:30:00Z"
}

Use this for freshness alerting, e.g. alert when age_seconds exceeds 7776000 (90 days).
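A monitoring job could apply a 90-day threshold to the response along these lines. This is a sketch that relies only on the field names shown in the sample response above; `stale_pipelines` is a hypothetical helper, and fetching the endpoint (with whatever HTTP client and base URL the deployment uses) is left to the caller.

```python
MAX_AGE_SECONDS = 90 * 24 * 60 * 60  # 90 days = 7,776,000 seconds


def stale_pipelines(payload, max_age_seconds=MAX_AGE_SECONDS):
    """Return "pipeline/slug" labels for entries older than the threshold.

    `payload` is the parsed JSON body of the freshness endpoint.
    """
    return [
        f'{entry["pipeline"]}/{entry["slug"]}'
        for entry in payload.get("pipelines", [])
        if entry["age_seconds"] > max_age_seconds
    ]
```

An empty list means every pipeline reported a successful run within the window; a non-empty list names what to alert on.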