Adding a New Neighbourhood Crime Report#

Step-by-step playbook for replicating the crime report pipeline to a new neighbourhood. Based on the Walmer Estate implementation (Feb 2026), which mirrored Sea Point.


Prerequisites#

  • The neighbourhood must fall within an identifiable SAPS precinct or ward

  • You need at least a handful of news-source incidents to seed initial data

  • Decide upfront: free or paid access model


Overview: What Gets Created#

| # | Component | File(s) | Type |
|---|-----------|---------|------|
| 1 | Dataset migration | reports/migrations/00XX_add_<slug>_dataset.py | NEW |
| 2 | Translations + access model migration | reports/migrations/00XX_<slug>_translations.py | NEW |
| 3 | Geographic reference data | reports/geodata.py | MODIFY |
| 4 | Editorial reference data | reports/builders/reference_data.py | MODIFY |
| 5 | Validation thresholds | reports/builders/config.py | MODIFY |
| 6 | Dataset registry | reports/<slug>_registry.py | NEW |
| 7 | Manifest registration | reports/manifest_registry.py | MODIFY |
| 8 | News fetcher | reports/fetchers/<slug>_fetcher.py | NEW |
| 9 | Classifier patterns | reports/fetchers/classifier.py | MODIFY |
| 10 | Fetch service registry | reports/services/fetch_service.py | MODIFY |
| 11 | Fetch GitHub workflow | .github/workflows/fetch_<slug>_incidents.yml | NEW |
| 12 | Update GitHub workflow | .github/workflows/update_<slug>.yml | NEW |
| 13 | Tests | reports/tests/test_<slug>_fetch.py | NEW |
| 14 | Static fallback dir | static/reports/<slug>/.gitkeep | NEW |

Total: 8 new files, 6 modified files.

No new management command needed – the unified fetch_crime_incidents <slug> command handles all neighbourhoods.


Step-by-Step Guide#

Step 1: Database Migration – Create Dataset Record#

Create reports/migrations/00XX_add_<slug>_dataset.py.

Copy from 0026_add_walmer_estate_dataset.py and change:

def create_dataset(apps, schema_editor):
    Dataset = apps.get_model("reports", "Dataset")
    Dataset.objects.get_or_create(
        slug="<slug>",                          # e.g. "observatory"
        defaults={
            "name": "<Name> Crime & Safety Data",
            "description": "Crime and safety data for ...",
            "region": "Western Cape",
            "neighborhood": "<Name>",            # display name
            "category": "crime",
            "data_path": "<slug>/crime_data.json",
            "data_bucket": "ceraluna-ebooks",
            "status": "draft",                   # flip to active after first publish
            "workflow_enabled": False,
            "update_frequency": "weekly",
        },
    )

Important: Set status="draft" initially. Flip to active in Django admin after first successful data publish and visual QA.

Dependencies: previous migration in the chain.

Step 2: Access Model + Translations Migration#

Create reports/migrations/00XX_<slug>_translations.py.

Copy from 0027_dataset_access_model_and_walmer_translations.py.

If access_model field already exists (it does after 0027), you only need the RunPython operation – no AddField.

TRANSLATIONS = {
    "en": {"name": "...", "description": "..."},
    "de": {"name": "...", "description": "..."},
    "es": {"name": "...", "description": "..."},
    "fr": {"name": "...", "description": "..."},
    "it": {"name": "...", "description": "..."},
    "ja": {"name": "...", "description": "..."},
    "nl": {"name": "...", "description": "..."},
    "pt": {"name": "...", "description": "..."},
    "ru": {"name": "...", "description": "..."},
}

Set access_model to "free" or leave as default "paid".

  • free: access_model="free", no paywall, included_in_tiers=[]

  • paid: access_model="paid", requires subscription, included_in_tiers=["data", "premium"]

Step 3: Geographic Data (reports/geodata.py)#

Add three constants:

# -- <NAME> ----------------------------------------------------------

<SLUG_UPPER>_LOCATIONS: dict[str, tuple[float, float]] = {
    "Main Road":     (-33.xxx, 18.xxx),
    "Station":       (-33.xxx, 18.xxx),
    # 8-12 key landmarks with (lat, lng) coordinates
}

<SLUG_UPPER>_CENTER: tuple[float, float] = (-33.xxx, 18.xxx)

<SLUG_UPPER>_BOUNDS: dict[str, float] = {
    "lat_min": -33.xxx,
    "lat_max": -33.xxx,
    "lng_min": 18.xxx,
    "lng_max": 18.xxx,
}

How to get coordinates: in Google Maps, right-click a point and copy the coordinates. The center should be the neighbourhood's midpoint, and the bounds a generous bounding box around it.
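A quick standalone sanity check catches landmarks that fall outside the declared bounding box before they reach the map. The coordinates and bounds below are placeholders for illustration, not real survey data:

```python
# Placeholder data in the same shape as the geodata.py constants.
EXAMPLE_LOCATIONS: dict[str, tuple[float, float]] = {
    "Main Road": (-33.930, 18.440),
    "Station": (-33.935, 18.445),
}

EXAMPLE_BOUNDS: dict[str, float] = {
    "lat_min": -33.95, "lat_max": -33.92,
    "lng_min": 18.43, "lng_max": 18.46,
}

def landmarks_outside_bounds(
    locations: dict[str, tuple[float, float]],
    bounds: dict[str, float],
) -> list[str]:
    """Return the names of landmarks that fall outside the bounding box."""
    return [
        name
        for name, (lat, lng) in locations.items()
        if not (bounds["lat_min"] <= lat <= bounds["lat_max"]
                and bounds["lng_min"] <= lng <= bounds["lng_max"])
    ]

print(landmarks_outside_bounds(EXAMPLE_LOCATIONS, EXAMPLE_BOUNDS))  # -> []
```

Running this in a shell (or dropping an equivalent assertion into the test file from Step 12) is cheap insurance against a mistyped sign on a latitude.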

Step 4: Editorial Reference Data (reports/builders/reference_data.py)#

Add a block for your neighbourhood in get_reference_data():

"<slug>": (
    <SLUG_UPPER>_SAFETY_ZONES,   # list of zone dicts
    <SLUG_UPPER>_TIME_RISK,      # dict with day/night/peak risk info
    <SLUG_UPPER>_LANDMARKS,      # list of landmark dicts
),

Each zone needs: zone, risk_level, description, recommendations. Time risk needs: highest_risk_period, lowest_risk_period, weekend_note. Each landmark needs: name, lat, lng, type, safety_note.

Copy the structure from SEA_POINT_SAFETY_ZONES or WALMER_ESTATE_SAFETY_ZONES and adapt the locations and risk assessments.
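To make the required shapes concrete, here is a minimal sketch of one entry per structure. All field values are invented for illustration; copy the exact value types (e.g. whether recommendations is a list) from the Sea Point constants:

```python
# Illustrative reference-data shapes; the content is placeholder text,
# not real editorial guidance.
EXAMPLE_SAFETY_ZONES = [
    {
        "zone": "Main Road corridor",
        "risk_level": "medium",
        "description": "Busy commercial strip; opportunistic theft reported.",
        "recommendations": ["Keep phones out of sight", "Use well-lit sections"],
    },
]

EXAMPLE_TIME_RISK = {
    "highest_risk_period": "Friday and Saturday nights",
    "lowest_risk_period": "Weekday mornings",
    "weekend_note": "Foot traffic increases near nightlife areas.",
}

EXAMPLE_LANDMARKS = [
    {
        "name": "Station",
        "lat": -33.935,
        "lng": 18.445,
        "type": "transport",
        "safety_note": "Crowded at peak times; watch for pickpockets.",
    },
]
```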

Step 5: Validation Thresholds (reports/builders/config.py)#

Add an entry to DATASET_THRESHOLDS:

"<slug>": PipelineThresholds(
    min_incidents=3,      # relaxed for new/sparse datasets
    min_quarters=1,
    min_categories=2,
    min_half_years=1,
),

For mature datasets with plenty of data, use stricter thresholds (Sea Point uses defaults: 5/4/3/2).
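Conceptually, the thresholds gate publishing like this. This is a standalone sketch under the assumption that validation is a simple per-field minimum check; the real PipelineThresholds in reports/builders/config.py is the source of truth:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PipelineThresholds:
    # Defaults mirror the Sea Point values quoted above (5/4/3/2).
    min_incidents: int = 5
    min_quarters: int = 4
    min_categories: int = 3
    min_half_years: int = 2

def passes_validation(
    t: PipelineThresholds,
    incidents: int, quarters: int, categories: int, half_years: int,
) -> bool:
    """True when the built dataset clears every minimum."""
    return (
        incidents >= t.min_incidents
        and quarters >= t.min_quarters
        and categories >= t.min_categories
        and half_years >= t.min_half_years
    )

relaxed = PipelineThresholds(min_incidents=3, min_quarters=1,
                             min_categories=2, min_half_years=1)
print(passes_validation(relaxed, incidents=4, quarters=1,
                        categories=2, half_years=1))  # True
```

This is why a brand-new neighbourhood with only a handful of seeded incidents still publishes under relaxed thresholds but would fail the Sea Point defaults.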

Step 6: Dataset Registry (reports/<slug>_registry.py)#

Create a new file. Minimum viable registry (2 datasets):

from datetime import timedelta
from reports.dataset_metadata import (
    DataClassID, DataProductManifest, DatasetMetadata,
    RecomputePolicy, RefreshStrategy,
)

<SLUG_UPPER>_INCIDENTS = DatasetMetadata(
    name="<Name> Incidents",
    dataset_slug="<slug>-incidents",
    source="news,SAPS",
    description="Individual crime incidents from news sources and SAPS reports.",
    class_id=DataClassID.EVT_DAILY,
    expected_update_cadence="weekly",
    staleness_threshold=timedelta(days=14),
    cache_ttl=timedelta(hours=1),
    refresh_strategy=RefreshStrategy.INCREMENTAL,
    incremental_key="incident_date",
    lookback_window=timedelta(days=14),
    r2_latest_key="data/reports/<slug>/incidents.json",
    local_fallback_path="static/reports/<slug>/crime_data.json",
    default_run_command="update_dataset <slug>",
)

<SLUG_UPPER>_COMPOSITE = DatasetMetadata(
    name="<Name> Crime Data (Composite)",
    dataset_slug="<slug>-composite",
    source="derived",
    description="Assembled composite of all sub-datasets.",
    class_id=DataClassID.SNAP_COMPOSITE,
    expected_update_cadence="weekly",
    staleness_threshold=timedelta(days=14),
    cache_ttl=timedelta(hours=1),
    refresh_strategy=RefreshStrategy.FULL,
    recompute_policy=RecomputePolicy.ON_SOURCE_CHANGE,
    dependencies=["<slug>-incidents"],
    r2_latest_key="data/reports/<slug>/crime_data.json",
    local_fallback_path="static/reports/<slug>/crime_data.json",
    default_run_command="update_dataset <slug>",
)

<SLUG_UPPER>_MANIFEST = DataProductManifest(
    product_slug="<slug>",
    product_name="<Name> Crime & Safety Data",
    region="Western Cape, South Africa",
    datasets=[<SLUG_UPPER>_INCIDENTS, <SLUG_UPPER>_COMPOSITE],
    composite_slug="<slug>-composite",
    access_model="free",         # or "paid"
    included_in_tiers=[],        # or ["data", "premium"]
    public_url_name="reports:crime_detail",
    public_url_kwargs={"neighborhood_slug": "<slug>"},
    public_url_label="<Name> Crime Report",
)

Step 7: Register Manifest (reports/manifest_registry.py)#

Add to _build_registry():

from reports.<slug>_registry import <SLUG_UPPER>_MANIFEST
# ...
registry[<SLUG_UPPER>_MANIFEST.product_slug] = <SLUG_UPPER>_MANIFEST

Step 8: News Fetcher (reports/fetchers/<slug>_fetcher.py)#

from reports.fetchers.news_fetcher import SeaPointNewsFetcher

_SEARCH_QUERIES = [
    '"<Area1>" Cape Town crime OR robbery OR assault OR murder',
    '"<Area2>" Cape Town crime OR shooting OR theft OR burglary',
    # 2-4 Google News search queries targeting the neighbourhood
]

class <Name>NewsFetcher(SeaPointNewsFetcher):
    name = "<slug>-news"
    _SEARCH_QUERIES = _SEARCH_QUERIES

    async def _fetch(self):
        return await self._fetch_google_news(self._SEARCH_QUERIES)

Tips for search queries:

  • Use the primary neighbourhood name + surrounding areas

  • Include “Cape Town” to avoid global false positives

  • Use OR-separated crime keywords

  • 2-4 queries is the sweet spot
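For context, each query ultimately becomes a Google News RSS search URL. The exact URL shape used inside SeaPointNewsFetcher is an assumption here; this sketch only shows the encoding step, which is useful for eyeballing a query in a browser before committing it:

```python
from urllib.parse import quote_plus

def google_news_rss_url(query: str) -> str:
    """Build a Google News RSS search URL for one query string.

    The hl/gl/ceid locale parameters are illustrative (South Africa,
    English); adjust to match what the base fetcher actually sends.
    """
    return (
        "https://news.google.com/rss/search?q="
        + quote_plus(query)
        + "&hl=en-ZA&gl=ZA&ceid=ZA:en"
    )

url = google_news_rss_url('"Observatory" Cape Town crime OR robbery')
print(url)
```

Pasting the printed URL into a browser shows exactly which articles a query surfaces, which is the fastest way to tune the keyword lists.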

Step 9: Classifier Location Patterns (reports/fetchers/classifier.py)#

Add a location pattern list and register it:

_<SLUG_UPPER>_LOCATION_PATTERNS = [
    r"(?i)\b<landmark1>\b",
    r"(?i)\b<landmark2>\b",
    r"(?i)\b<street name>\b",
    # 10-15 regex patterns for known streets/landmarks
]

_DATASET_CLASSIFIER_CONFIG["<slug>"] = {
    "location_patterns": _<SLUG_UPPER>_LOCATION_PATTERNS,
    "locations": <SLUG_UPPER>_LOCATIONS,  # from geodata.py
    "default_location": "<Name>",
}

Step 10: Fetch Service Registry (reports/services/fetch_service.py)#

Add to _FETCHER_REGISTRY:

from reports.fetchers.<slug>_fetcher import <Name>NewsFetcher

_FETCHER_REGISTRY["<slug>"] = (<Name>NewsFetcher, "<Name>")

Once this is registered, python manage.py fetch_crime_incidents <slug> works automatically – no new management command needed.

Step 11: GitHub Workflows#

Both workflows are thin callers of reusable templates. Copy and adapt:

Fetch workflow (.github/workflows/fetch_<slug>_incidents.yml):

name: Fetch <Name> Crime Incidents

on:
  schedule:
    - cron: '0 7 * * 1'   # stagger 30min from others
  workflow_dispatch:
    inputs:
      since: { type: string, required: false }
      dry_run: { type: boolean, default: false }

concurrency:
  group: <slug>-fetch-${{ github.ref }}
  cancel-in-progress: false

jobs:
  fetch-incidents:
    uses: ./.github/workflows/_fetch-crime-incidents.yml
    with:
      neighbourhood-slug: <slug>
      since: ${{ github.event.inputs.since || '' }}
      dry_run: ${{ github.event.inputs.dry_run == 'true' }}
    secrets: inherit

Update workflow (.github/workflows/update_<slug>.yml):

name: Update <Name> Crime Data

on:
  workflow_dispatch:
    inputs:
      no_strict: { type: boolean, default: true }
      allow_anomaly: { type: boolean, default: false }

concurrency:
  group: <slug>-update-${{ github.ref }}
  cancel-in-progress: false

jobs:
  update:
    uses: ./.github/workflows/_update-crime-data.yml
    with:
      neighbourhood-slug: <slug>
      from-db: true
      no-strict: ${{ github.event.inputs.no_strict == 'true' }}
      allow-anomaly: ${{ github.event.inputs.allow_anomaly == 'true' }}
    secrets: inherit

Step 12: Tests#

Create reports/tests/test_<slug>_fetch.py. Cover:

  • Classifier dispatches to correct location patterns for the new slug

  • Fetcher class has correct name and search queries

  • Fetch service registry contains the new slug

  • fetch_crime_incidents <slug> creates incidents (mock RSS + httpx)

Step 13: Static Fallback Directory#

mkdir -p static/reports/<slug>
touch static/reports/<slug>/.gitkeep

Step 14: Run Full CI Checks Locally#

Before pushing, always run:

# Lint
poetry run ruff check .

# Type checking (all apps)
poetry run mypy payments reports products housing safety comments dashboard \
  ebook hiking loadshedding things_to_do travel_quiz dataportal about \
  cookieconsent eraluma

# Docs
sphinx-build -W --keep-going docs/source docs/build/html

# Tests
USE_SQLITE=1 python manage.py test reports --verbosity=2

Deployment Checklist#

After code is merged and CI is green:

  1. Run migration on production

    python manage.py migrate reports
    
  2. Trigger “Fetch Crime Incidents” workflow

    • Manual dispatch from GitHub Actions

    • This populates CrimeIncident records in the DB

  3. Trigger “Update Crime Data” workflow

    • Set inputs: from_db: true, no_strict: true (first run only)

    • This builds crime_data.json and uploads to R2

  4. Visual QA

    • Visit https://eraluma.com/crime/<slug>/ and verify the report renders

    • Check map, statistics, incidents list, safety guide

  5. Flip to active

    • Django admin > Reports > Datasets > <slug> > Status = Active

    • Now it appears on the public crime reports index page

  6. Remove --no-strict from future workflow runs

    • Once the dataset has enough data to pass standard validation


Access Model Reference#

| access_model | Paywall | Tiers | Stripe Product Needed? |
|--------------|---------|-------|------------------------|
| "free" | None – full access for everyone | [] | No |
| "paid" | Requires data or premium subscription | ["data", "premium"] | No (uses existing tier) |

The access_model field on the Dataset model controls this. Change it in Django admin or via migration.

No new Stripe products or prices are needed per neighbourhood – the existing tier subscriptions grant access to ALL paid datasets.
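The gating decision reduces to a small check. This is a sketch of the idea only; the real _content_access_for() helper in reports/views.py is the source of truth:

```python
def has_access(access_model: str, included_in_tiers: list[str],
               user_tiers: set[str]) -> bool:
    """Free datasets are open to everyone; paid ones need a matching tier."""
    if access_model == "free":
        return True
    return any(tier in user_tiers for tier in included_in_tiers)

print(has_access("free", [], set()))                      # True
print(has_access("paid", ["data", "premium"], {"data"}))  # True
print(has_access("paid", ["data", "premium"], set()))     # False
```

Because access is tier-based rather than per-dataset, adding a neighbourhood never requires new billing plumbing.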


What Walmer Estate Specifically Created#

For the record, here is every file touched for the Walmer Estate implementation:

New Files (8)#

| File | Purpose |
|------|---------|
| reports/walmer_estate_registry.py | 2 datasets (incidents + composite), manifest with access_model="free" |
| reports/fetchers/walmer_estate_fetcher.py | Google News RSS fetcher for Woodstock/Observatory/Salt River/Walmer Estate |
| reports/management/commands/fetch_walmer_estate_incidents.py | Deprecated wrapper (delegates to fetch_crime_incidents walmer-estate) |
| reports/migrations/0026_add_walmer_estate_dataset.py | Creates Dataset record (slug=walmer-estate, status=draft) |
| reports/migrations/0027_dataset_access_model_and_walmer_translations.py | Adds access_model field, sets Walmer Estate to free, 9-language translations |
| reports/tests/test_walmer_estate_fetch.py | 25 tests: classifier, fetcher, service, command |
| .github/workflows/fetch_walmer_estate_incidents.yml | Weekly Mon 06:30 UTC fetch + manual dispatch |
| .github/workflows/update_walmer_estate.yml | Manual dispatch: DB -> validate -> R2 publish |

Modified Files (7)#

| File | Change |
|------|--------|
| reports/geodata.py | Added WALMER_ESTATE_LOCATIONS (10 landmarks), _CENTER, _BOUNDS |
| reports/builders/reference_data.py | Added safety zones, time risk, landmarks for Walmer Estate |
| reports/builders/config.py | Added relaxed thresholds: min_incidents=3, min_quarters=1 |
| reports/manifest_registry.py | Registered WALMER_ESTATE_MANIFEST |
| reports/fetchers/classifier.py | Added _WALMER_ESTATE_LOCATION_PATTERNS (14 patterns), dataset_slug dispatch |
| reports/services/fetch_service.py | Added "walmer-estate" to _FETCHER_REGISTRY |
| reports/models.py + reports/views.py | Added access_model field + _content_access_for() helper (one-time, shared) |

Git Commits#

| Commit | Message |
|--------|---------|
| 4d509b651 | feat: add Walmer Estate crime report pipeline |
| 53c4fb63b | feat: add per-dataset access gating and Walmer Estate translations |
| 85172e6b5 | fix: resolve mypy type errors in housing R2 download and Sphinx docs warning |