# Adding a New Neighbourhood Crime Report

Step-by-step playbook for replicating the crime report pipeline for a new neighbourhood. Based on the Walmer Estate implementation (Feb 2026), which mirrored Sea Point.
## Prerequisites

- The neighbourhood must fall within an identifiable SAPS precinct or ward.
- You need at least a handful of news-source incidents to seed initial data.
- Decide upfront: free or paid access model.
## Overview: What Gets Created

| # | Component | File(s) | Type |
|---|---|---|---|
| 1 | Dataset migration | `reports/migrations/00XX_add_<slug>_dataset.py` | NEW |
| 2 | Translations + access model migration | `reports/migrations/00XX_<slug>_translations.py` | NEW |
| 3 | Geographic reference data | `reports/geodata.py` | MODIFY |
| 4 | Editorial reference data | `reports/builders/reference_data.py` | MODIFY |
| 5 | Validation thresholds | `reports/builders/config.py` | MODIFY |
| 6 | Dataset registry | `reports/<slug>_registry.py` | NEW |
| 7 | Manifest registration | `reports/manifest_registry.py` | MODIFY |
| 8 | News fetcher | `reports/fetchers/<slug>_fetcher.py` | NEW |
| 9 | Classifier patterns | `reports/fetchers/classifier.py` | MODIFY |
| 10 | Fetch service registry | `reports/services/fetch_service.py` | MODIFY |
| 11 | Fetch GitHub workflow | `.github/workflows/fetch_<slug>_incidents.yml` | NEW |
| 12 | Update GitHub workflow | `.github/workflows/update_<slug>.yml` | NEW |
| 13 | Tests | `reports/tests/test_<slug>_fetch.py` | NEW |
| 14 | Static fallback dir | `static/reports/<slug>/` | NEW |

Total: ~7 new files, ~7 modified files.
No new management command needed – the unified `fetch_crime_incidents <slug>` command handles all neighbourhoods.
## Step-by-Step Guide

### Step 1: Database Migration – Create Dataset Record

Create `reports/migrations/00XX_add_<slug>_dataset.py`. Copy from `0026_add_walmer_estate_dataset.py` and change:
```python
from django.db import migrations


def create_dataset(apps, schema_editor):
    Dataset = apps.get_model("reports", "Dataset")
    Dataset.objects.get_or_create(
        slug="<slug>",  # e.g. "observatory"
        defaults={
            "name": "<Name> Crime & Safety Data",
            "description": "Crime and safety data for ...",
            "region": "Western Cape",
            "neighborhood": "<Name>",  # display name
            "category": "crime",
            "data_path": "<slug>/crime_data.json",
            "data_bucket": "ceraluna-ebooks",
            "status": "draft",  # flip to active after first publish
            "workflow_enabled": False,
            "update_frequency": "weekly",
        },
    )


class Migration(migrations.Migration):
    dependencies = [("reports", "00XX_previous_migration")]  # previous migration in the chain
    operations = [migrations.RunPython(create_dataset, migrations.RunPython.noop)]
```
**Important:** Set `status="draft"` initially. Flip to `active` in Django admin after the first successful data publish and visual QA.

Dependencies: the previous migration in the chain.
### Step 2: Access Model + Translations Migration

Create `reports/migrations/00XX_<slug>_translations.py`. Copy from `0027_dataset_access_model_and_walmer_translations.py`. Since the `access_model` field already exists after 0027, you only need the `RunPython` operation – no `AddField`.
```python
TRANSLATIONS = {
    "en": {"name": "...", "description": "..."},
    "de": {"name": "...", "description": "..."},
    "es": {"name": "...", "description": "..."},
    "fr": {"name": "...", "description": "..."},
    "it": {"name": "...", "description": "..."},
    "ja": {"name": "...", "description": "..."},
    "nl": {"name": "...", "description": "..."},
    "pt": {"name": "...", "description": "..."},
    "ru": {"name": "...", "description": "..."},
}
```
Set `access_model` to `"free"` or leave as the default `"paid"`:

- **free**: `access_model="free"`, no paywall, `included_in_tiers=[]`
- **paid**: `access_model="paid"`, requires subscription, `included_in_tiers=["data", "premium"]`
### Step 3: Geographic Data (`reports/geodata.py`)

Add three constants:
```python
# -- <NAME> ----------------------------------------------------------
<SLUG_UPPER>_LOCATIONS: dict[str, tuple[float, float]] = {
    "Main Road": (-33.xxx, 18.xxx),
    "Station": (-33.xxx, 18.xxx),
    # 8-12 key landmarks with (lat, lng) coordinates
}

<SLUG_UPPER>_CENTER: tuple[float, float] = (-33.xxx, 18.xxx)

<SLUG_UPPER>_BOUNDS: dict[str, float] = {
    "lat_min": -33.xxx,
    "lat_max": -33.xxx,
    "lng_min": 18.xxx,
    "lng_max": 18.xxx,
}
```
**How to get coordinates:** use Google Maps – right-click a point and copy the coordinates. The center should be the neighbourhood midpoint; the bounds should be a generous bounding box around it.
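A quick sanity check on the numbers you enter: every landmark and the center should fall inside the bounds. A minimal sketch (the bounds values and constant names below are illustrative, following the pattern above, not real project data):

```python
SEA_POINT_BOUNDS: dict[str, float] = {  # example values, not authoritative
    "lat_min": -33.93, "lat_max": -33.89,
    "lng_min": 18.37, "lng_max": 18.41,
}
SEA_POINT_CENTER: tuple[float, float] = (-33.915, 18.39)


def in_bounds(lat: float, lng: float, bounds: dict[str, float]) -> bool:
    """True if the point lies inside the neighbourhood bounding box."""
    return (bounds["lat_min"] <= lat <= bounds["lat_max"]
            and bounds["lng_min"] <= lng <= bounds["lng_max"])


# The center must always sit inside its own bounds
assert in_bounds(*SEA_POINT_CENTER, SEA_POINT_BOUNDS)
```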
### Step 4: Editorial Reference Data (`reports/builders/reference_data.py`)

Add a block for your neighbourhood in `get_reference_data()`:
"<slug>": (
<SLUG_UPPER>_SAFETY_ZONES, # list of zone dicts
<SLUG_UPPER>_TIME_RISK, # dict with day/night/peak risk info
<SLUG_UPPER>_LANDMARKS, # list of landmark dicts
),
Each zone needs: `zone`, `risk_level`, `description`, `recommendations`.
Time risk needs: `highest_risk_period`, `lowest_risk_period`, `weekend_note`.
Each landmark needs: `name`, `lat`, `lng`, `type`, `safety_note`.

Copy the structure from `SEA_POINT_SAFETY_ZONES` or `WALMER_ESTATE_SAFETY_ZONES` and adapt the locations and risk assessments.
### Step 5: Validation Thresholds (`reports/builders/config.py`)

Add an entry to `DATASET_THRESHOLDS`:
"<slug>": PipelineThresholds(
min_incidents=3, # relaxed for new/sparse datasets
min_quarters=1,
min_categories=2,
min_half_years=1,
),
For mature datasets with plenty of data, use stricter thresholds (Sea Point uses defaults: 5/4/3/2).
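To make the gate concrete, here is a sketch of how such thresholds reject a sparse build. The `PipelineThresholds` field names follow the snippet above and the 5/4/3/2 defaults quoted for Sea Point; the `passes` helper and the `stats` dict shape are hypothetical, not the project's real validation code:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineThresholds:
    min_incidents: int = 5
    min_quarters: int = 4
    min_categories: int = 3
    min_half_years: int = 2


def passes(stats: dict[str, int], t: PipelineThresholds) -> bool:
    """Illustrative validation: every observed count must meet its minimum."""
    return (stats["incidents"] >= t.min_incidents
            and stats["quarters"] >= t.min_quarters
            and stats["categories"] >= t.min_categories
            and stats["half_years"] >= t.min_half_years)


relaxed = PipelineThresholds(min_incidents=3, min_quarters=1, min_categories=2, min_half_years=1)
sparse = {"incidents": 4, "quarters": 1, "categories": 2, "half_years": 1}
print(passes(sparse, relaxed))                  # True
print(passes(sparse, PipelineThresholds()))     # False: fails the strict defaults
```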
### Step 6: Dataset Registry (`reports/<slug>_registry.py`)

Create a new file. Minimum viable registry (2 datasets):
```python
from datetime import timedelta

from reports.dataset_metadata import (
    DataClassID,
    DataProductManifest,
    DatasetMetadata,
    RecomputePolicy,
    RefreshStrategy,
)

<SLUG_UPPER>_INCIDENTS = DatasetMetadata(
    name="<Name> Incidents",
    dataset_slug="<slug>-incidents",
    source="news,SAPS",
    description="Individual crime incidents from news sources and SAPS reports.",
    class_id=DataClassID.EVT_DAILY,
    expected_update_cadence="weekly",
    staleness_threshold=timedelta(days=14),
    cache_ttl=timedelta(hours=1),
    refresh_strategy=RefreshStrategy.INCREMENTAL,
    incremental_key="incident_date",
    lookback_window=timedelta(days=14),
    r2_latest_key="data/reports/<slug>/incidents.json",
    local_fallback_path="static/reports/<slug>/crime_data.json",
    default_run_command="update_dataset <slug>",
)

<SLUG_UPPER>_COMPOSITE = DatasetMetadata(
    name="<Name> Crime Data (Composite)",
    dataset_slug="<slug>-composite",
    source="derived",
    description="Assembled composite of all sub-datasets.",
    class_id=DataClassID.SNAP_COMPOSITE,
    expected_update_cadence="weekly",
    staleness_threshold=timedelta(days=14),
    cache_ttl=timedelta(hours=1),
    refresh_strategy=RefreshStrategy.FULL,
    recompute_policy=RecomputePolicy.ON_SOURCE_CHANGE,
    dependencies=["<slug>-incidents"],
    r2_latest_key="data/reports/<slug>/crime_data.json",
    local_fallback_path="static/reports/<slug>/crime_data.json",
    default_run_command="update_dataset <slug>",
)

<SLUG_UPPER>_MANIFEST = DataProductManifest(
    product_slug="<slug>",
    product_name="<Name> Crime & Safety Data",
    region="Western Cape, South Africa",
    datasets=[<SLUG_UPPER>_INCIDENTS, <SLUG_UPPER>_COMPOSITE],
    composite_slug="<slug>-composite",
    access_model="free",  # or "paid"
    included_in_tiers=[],  # or ["data", "premium"]
    public_url_name="reports:crime_detail",
    public_url_kwargs={"neighborhood_slug": "<slug>"},
    public_url_label="<Name> Crime Report",
)
```
### Step 7: Register Manifest (`reports/manifest_registry.py`)

Add to `_build_registry()`:

```python
from reports.<slug>_registry import <SLUG_UPPER>_MANIFEST

# ...
registry[<SLUG_UPPER>_MANIFEST.product_slug] = <SLUG_UPPER>_MANIFEST
```
### Step 8: News Fetcher (`reports/fetchers/<slug>_fetcher.py`)

```python
from reports.fetchers.news_fetcher import SeaPointNewsFetcher

_SEARCH_QUERIES = [
    '"<Area1>" Cape Town crime OR robbery OR assault OR murder',
    '"<Area2>" Cape Town crime OR shooting OR theft OR burglary',
    # 2-4 Google News search queries targeting the neighbourhood
]


class <Name>NewsFetcher(SeaPointNewsFetcher):
    name = "<slug>-news"
    _SEARCH_QUERIES = _SEARCH_QUERIES

    async def _fetch(self):
        return await self._fetch_google_news(self._SEARCH_QUERIES)
```
Tips for search queries:

- Use the primary neighbourhood name plus surrounding areas.
- Include "Cape Town" to avoid global false positives.
- Use OR-separated crime keywords.
- 2-4 queries is the sweet spot.
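Queries like these are typically fetched through the Google News RSS search endpoint. A sketch of the URL construction (the real fetch lives in `SeaPointNewsFetcher._fetch_google_news`, whose internals are not shown here; the helper name and the `en-ZA`/`ZA` locale parameters are assumptions):

```python
from urllib.parse import quote_plus


def google_news_rss_url(query: str) -> str:
    """Build a Google News RSS search URL for one query string."""
    return ("https://news.google.com/rss/search?q=" + quote_plus(query)
            + "&hl=en-ZA&gl=ZA&ceid=ZA:en")


print(google_news_rss_url('"Sea Point" Cape Town crime OR robbery'))
```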
### Step 9: Classifier Location Patterns (`reports/fetchers/classifier.py`)

Add a location pattern list and register it:
```python
_<SLUG_UPPER>_LOCATION_PATTERNS = [
    r"(?i)\b<landmark1>\b",
    r"(?i)\b<landmark2>\b",
    r"(?i)\b<street name>\b",
    # 10-15 regex patterns for known streets/landmarks
]

_DATASET_CLASSIFIER_CONFIG["<slug>"] = {
    "location_patterns": _<SLUG_UPPER>_LOCATION_PATTERNS,
    "locations": <SLUG_UPPER>_LOCATIONS,  # from geodata.py
    "default_location": "<Name>",
}
```
### Step 10: Fetch Service Registry (`reports/services/fetch_service.py`)

Add to `_FETCHER_REGISTRY`:

```python
from reports.fetchers.<slug>_fetcher import <Name>NewsFetcher

_FETCHER_REGISTRY["<slug>"] = (<Name>NewsFetcher, "<Name>")
```

Once this is registered, `python manage.py fetch_crime_incidents <slug>` works automatically – no new management command needed.
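"Works automatically" boils down to a slug-to-class lookup along these lines (a sketch; the stand-in fetcher classes and the `get_fetcher` helper are hypothetical, and only the registry shape of slug mapping to a `(fetcher class, display name)` tuple is taken from the step above):

```python
# Stand-in classes to illustrate the dispatch.
class SeaPointNewsFetcher: ...
class WalmerEstateNewsFetcher: ...


_FETCHER_REGISTRY = {
    "sea-point": (SeaPointNewsFetcher, "Sea Point"),
    "walmer-estate": (WalmerEstateNewsFetcher, "Walmer Estate"),
}


def get_fetcher(slug: str):
    """Resolve a neighbourhood slug to a fetcher instance, failing loudly on unknown slugs."""
    try:
        fetcher_cls, display_name = _FETCHER_REGISTRY[slug]
    except KeyError:
        raise ValueError(f"Unknown neighbourhood slug: {slug!r}")
    return fetcher_cls(), display_name


fetcher, name = get_fetcher("walmer-estate")
print(name)  # Walmer Estate
```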
### Step 11: GitHub Workflows

Both workflows are thin callers of reusable templates. Copy and adapt:

**Fetch workflow** (`.github/workflows/fetch_<slug>_incidents.yml`):

```yaml
name: Fetch <Name> Crime Incidents
on:
  schedule:
    - cron: '0 7 * * 1'  # stagger 30min from others
  workflow_dispatch:
    inputs:
      since: { type: string, required: false }
      dry_run: { type: boolean, default: false }
concurrency:
  group: <slug>-fetch-${{ github.ref }}
  cancel-in-progress: false
jobs:
  fetch-incidents:
    uses: ./.github/workflows/_fetch-crime-incidents.yml
    with:
      neighbourhood-slug: <slug>
      since: ${{ github.event.inputs.since || '' }}
      dry_run: ${{ github.event.inputs.dry_run == 'true' }}
    secrets: inherit
```
**Update workflow** (`.github/workflows/update_<slug>.yml`):

```yaml
name: Update <Name> Crime Data
on:
  workflow_dispatch:
    inputs:
      no_strict: { type: boolean, default: true }
      allow_anomaly: { type: boolean, default: false }
concurrency:
  group: <slug>-update-${{ github.ref }}
  cancel-in-progress: false
jobs:
  update:
    uses: ./.github/workflows/_update-crime-data.yml
    with:
      neighbourhood-slug: <slug>
      from-db: true
      no-strict: ${{ github.event.inputs.no_strict == 'true' }}
      allow-anomaly: ${{ github.event.inputs.allow_anomaly == 'true' }}
    secrets: inherit
```
### Step 12: Tests

Create `reports/tests/test_<slug>_fetch.py`. Cover:

- Classifier dispatches to the correct location patterns for the new slug
- Fetcher class has the correct name and search queries
- Fetch service registry contains the new slug
- `fetch_crime_incidents <slug>` creates incidents (mock RSS + httpx)
### Step 13: Static Fallback Directory

```shell
mkdir -p static/reports/<slug>
touch static/reports/<slug>/.gitkeep
```
### Step 14: Run Full CI Checks Locally

Before pushing, always run:

```shell
# Lint
poetry run ruff check .

# Type checking (all apps)
poetry run mypy payments reports products housing safety comments dashboard \
    ebook hiking loadshedding things_to_do travel_quiz dataportal about \
    cookieconsent eraluma

# Docs
sphinx-build -W --keep-going docs/source docs/build/html

# Tests
USE_SQLITE=1 python manage.py test reports --verbosity=2
```
## Deployment Checklist

After code is merged and CI is green:

1. **Run the migration on production**: `python manage.py migrate reports`
2. **Trigger the "Fetch <Name> Crime Incidents" workflow** via manual dispatch from GitHub Actions. This populates `CrimeIncident` records in the DB.
3. **Trigger the "Update <Name> Crime Data" workflow** with inputs `from_db: true` and `no_strict: true` (first run only). This builds `crime_data.json` and uploads it to R2.
4. **Visual QA**: visit `https://eraluma.com/crime/<slug>/` and verify the report renders. Check the map, statistics, incidents list, and safety guide.
5. **Flip to active**: Django admin > Reports > Datasets > `<slug>` > Status = Active. The report now appears on the public crime reports index page.
6. **Remove `--no-strict` from future workflow runs** once the dataset has enough data to pass standard validation.
## Access Model Reference

| `access_model` | Paywall | Tiers | Stripe Product Needed? |
|---|---|---|---|
| `free` | None – full access for everyone | `included_in_tiers=[]` | No |
| `paid` | Requires `data` or `premium` subscription | `included_in_tiers=["data", "premium"]` | No (uses existing tier) |

The `access_model` field on the `Dataset` model controls this. Change it in Django admin or via migration.

No new Stripe products or prices are needed per neighbourhood – the existing tier subscriptions grant access to ALL paid datasets.
## What Walmer Estate Specifically Created

For the record, here is every file touched for the Walmer Estate implementation:

### New Files (8)

| File | Purpose |
|---|---|
| `reports/walmer_estate_registry.py` | 2 datasets (incidents + composite), manifest with access settings |
| `reports/fetchers/walmer_estate_fetcher.py` | Google News RSS fetcher for Woodstock/Observatory/Salt River/Walmer Estate |
| | Deprecated wrapper (delegates to the unified `fetch_crime_incidents` command) |
| `reports/migrations/0026_add_walmer_estate_dataset.py` | Creates Dataset record (slug=walmer-estate, status=draft) |
| `reports/migrations/0027_dataset_access_model_and_walmer_translations.py` | Adds `access_model` and Walmer Estate translations |
| `reports/tests/test_walmer_estate_fetch.py` | 25 tests: classifier, fetcher, service, command |
| `.github/workflows/fetch_walmer_estate_incidents.yml` | Weekly Mon 06:30 UTC fetch + manual dispatch |
| `.github/workflows/update_walmer_estate.yml` | Manual dispatch: DB -> validate -> R2 publish |
### Modified Files (7)

| File | Change |
|---|---|
| `reports/geodata.py` | Added Walmer Estate locations, center, and bounds |
| `reports/builders/reference_data.py` | Added safety zones, time risk, landmarks for Walmer Estate |
| `reports/builders/config.py` | Added relaxed thresholds |
| `reports/manifest_registry.py` | Registered the Walmer Estate manifest |
| `reports/fetchers/classifier.py` | Added Walmer Estate location patterns |
| `reports/services/fetch_service.py` | Added the Walmer Estate fetcher to the registry |
| | Added |
### Git Commits

| Commit | Message |
|---|---|
| | feat: add Walmer Estate crime report pipeline |
| | feat: add per-dataset access gating and Walmer Estate translations |
| | fix: resolve mypy type errors in housing R2 download and Sphinx docs warning |
fix: resolve mypy type errors in housing R2 download and Sphinx docs warning |