Adding a New Neighbourhood Crime Report#

Step-by-step playbook for replicating the crime report pipeline to a new neighbourhood. Based on the Walmer Estate implementation (Feb 2026), which mirrored Sea Point.


Prerequisites#

  • The neighbourhood must fall within an identifiable SAPS precinct or ward

  • You need at least a handful of news-source incidents to seed initial data

  • Decide upfront: free or paid access model


Overview: What Gets Created#

| # | Component | File(s) | Type |
|---|-----------|---------|------|
| 1 | Dataset migration | reports/migrations/00XX_add_<slug>_dataset.py | NEW |
| 2 | Translations + access model migration | reports/migrations/00XX_<slug>_translations.py | NEW |
| 3 | Geographic reference data | reports/geodata.py | MODIFY |
| 4 | Editorial reference data | reports/builders/reference_data.py | MODIFY |
| 5 | Validation thresholds | reports/builders/config.py | MODIFY |
| 6 | Dataset registry | reports/<slug>_registry.py | NEW |
| 7 | Manifest registration | reports/manifest_registry.py | MODIFY |
| 8 | News fetcher | reports/fetchers/<slug>_fetcher.py | NEW |
| 9 | Classifier patterns | reports/fetchers/classifier.py | MODIFY |
| 10 | Fetch service registry | reports/services/fetch_service.py | MODIFY |
| 11 | Fetch GitHub workflow | .github/workflows/fetch_<slug>_incidents.yml | NEW |
| 12 | Update GitHub workflow | .github/workflows/update_<slug>.yml | NEW |
| 13 | Tests | reports/tests/test_<slug>_fetch.py | NEW |
| 14 | Static fallback dir | static/reports/<slug>/.gitkeep | NEW |

Total: 8 new files, 6 modified files.

No new management command needed – the unified fetch_crime_incidents <slug> command handles all neighbourhoods.


Step-by-Step Guide#

Step 1: Database Migration – Create Dataset Record#

Create reports/migrations/00XX_add_<slug>_dataset.py.

Copy from 0026_add_walmer_estate_dataset.py and change:

def create_dataset(apps, schema_editor):
    Dataset = apps.get_model("reports", "Dataset")
    Dataset.objects.get_or_create(
        slug="<slug>",                          # e.g. "observatory"
        defaults={
            "name": "<Name> Crime & Safety Data",
            "description": "Crime and safety data for ...",
            "region": "Western Cape",
            "neighborhood": "<Name>",            # display name
            "category": "crime",
            "data_path": "<slug>/crime_data.json",
            "data_bucket": "ceraluna-ebooks",
            "status": "draft",                   # flip to active after first publish
            "workflow_enabled": False,
            "update_frequency": "weekly",
        },
    )

Important: Set status="draft" initially. Flip to active in Django admin after first successful data publish and visual QA.

Dependencies: previous migration in the chain.

Step 2: Access Model + Translations Migration#

Create reports/migrations/00XX_<slug>_translations.py.

Copy from 0027_dataset_access_model_and_walmer_translations.py.

If access_model field already exists (it does after 0027), you only need the RunPython operation – no AddField.

TRANSLATIONS = {
    "en": {"name": "...", "description": "..."},
    "de": {"name": "...", "description": "..."},
    "es": {"name": "...", "description": "..."},
    "fr": {"name": "...", "description": "..."},
    "it": {"name": "...", "description": "..."},
    "ja": {"name": "...", "description": "..."},
    "nl": {"name": "...", "description": "..."},
    "pt": {"name": "...", "description": "..."},
    "ru": {"name": "...", "description": "..."},
}

Set access_model to "free" or leave as default "paid".

  • free: access_model="free", no paywall, included_in_tiers=[]

  • paid: access_model="paid", requires subscription, included_in_tiers=["data", "premium"]

Step 3: Geographic Data (reports/geodata.py)#

Add three constants:

# -- <NAME> ----------------------------------------------------------

<SLUG_UPPER>_LOCATIONS: dict[str, tuple[float, float]] = {
    "Main Road":     (-33.xxx, 18.xxx),
    "Station":       (-33.xxx, 18.xxx),
    # 8-12 key landmarks with (lat, lng) coordinates
}

<SLUG_UPPER>_CENTER: tuple[float, float] = (-33.xxx, 18.xxx)

<SLUG_UPPER>_BOUNDS: dict[str, float] = {
    "lat_min": -33.xxx,
    "lat_max": -33.xxx,
    "lng_min": 18.xxx,
    "lng_max": 18.xxx,
}

How to get coordinates: in Google Maps, right-click a point and copy the coordinates. The center should be the neighbourhood's midpoint, and the bounds a generous bounding box around it.
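A quick standalone sanity check catches landmarks that fall outside the declared bounding box before they reach the map. The coordinates and bounds below are placeholders for illustration, not real survey data:

```python
# Placeholder data in the same shape as the geodata.py constants.
EXAMPLE_LOCATIONS: dict[str, tuple[float, float]] = {
    "Main Road": (-33.930, 18.440),
    "Station": (-33.935, 18.445),
}

EXAMPLE_BOUNDS: dict[str, float] = {
    "lat_min": -33.95, "lat_max": -33.92,
    "lng_min": 18.43, "lng_max": 18.46,
}

def landmarks_outside_bounds(
    locations: dict[str, tuple[float, float]],
    bounds: dict[str, float],
) -> list[str]:
    """Return the names of landmarks that fall outside the bounding box."""
    return [
        name
        for name, (lat, lng) in locations.items()
        if not (bounds["lat_min"] <= lat <= bounds["lat_max"]
                and bounds["lng_min"] <= lng <= bounds["lng_max"])
    ]

print(landmarks_outside_bounds(EXAMPLE_LOCATIONS, EXAMPLE_BOUNDS))  # -> []
```

Running this in a shell (or dropping an equivalent assertion into the test file from Step 12) is cheap insurance against a mistyped sign on a latitude.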

Step 4: Editorial Reference Data (reports/builders/reference_data.py)#

Add a block for your neighbourhood in get_reference_data():

"<slug>": (
    <SLUG_UPPER>_SAFETY_ZONES,   # list of zone dicts
    <SLUG_UPPER>_TIME_RISK,      # dict with day/night/peak risk info
    <SLUG_UPPER>_LANDMARKS,      # list of landmark dicts
),

Each zone needs: zone, risk_level, description, recommendations. Time risk needs: highest_risk_period, lowest_risk_period, weekend_note. Each landmark needs: name, lat, lng, type, safety_note.

Copy the structure from SEA_POINT_SAFETY_ZONES or WALMER_ESTATE_SAFETY_ZONES and adapt the locations and risk assessments.
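To make the required shapes concrete, here is a minimal sketch of one entry per structure. All field values are invented for illustration; copy the exact value types (e.g. whether recommendations is a list) from the Sea Point constants:

```python
# Illustrative reference-data shapes; the content is placeholder text,
# not real editorial guidance.
EXAMPLE_SAFETY_ZONES = [
    {
        "zone": "Main Road corridor",
        "risk_level": "medium",
        "description": "Busy commercial strip; opportunistic theft reported.",
        "recommendations": ["Keep phones out of sight", "Use well-lit sections"],
    },
]

EXAMPLE_TIME_RISK = {
    "highest_risk_period": "Friday and Saturday nights",
    "lowest_risk_period": "Weekday mornings",
    "weekend_note": "Foot traffic increases near nightlife areas.",
}

EXAMPLE_LANDMARKS = [
    {
        "name": "Station",
        "lat": -33.935,
        "lng": 18.445,
        "type": "transport",
        "safety_note": "Crowded at peak times; watch for pickpockets.",
    },
]
```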

Step 5: Validation Thresholds (reports/builders/config.py)#

Add an entry to DATASET_THRESHOLDS:

"<slug>": PipelineThresholds(
    min_incidents=3,      # relaxed for new/sparse datasets
    min_quarters=1,
    min_categories=2,
    min_half_years=1,
),

For mature datasets with plenty of data, use stricter thresholds (Sea Point uses defaults: 5/4/3/2).
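Conceptually, the thresholds gate publishing like this. This is a standalone sketch under the assumption that validation is a simple per-field minimum check; the real PipelineThresholds in reports/builders/config.py is the source of truth:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PipelineThresholds:
    # Defaults mirror the Sea Point values quoted above (5/4/3/2).
    min_incidents: int = 5
    min_quarters: int = 4
    min_categories: int = 3
    min_half_years: int = 2

def passes_validation(
    t: PipelineThresholds,
    incidents: int, quarters: int, categories: int, half_years: int,
) -> bool:
    """True when the built dataset clears every minimum."""
    return (
        incidents >= t.min_incidents
        and quarters >= t.min_quarters
        and categories >= t.min_categories
        and half_years >= t.min_half_years
    )

relaxed = PipelineThresholds(min_incidents=3, min_quarters=1,
                             min_categories=2, min_half_years=1)
print(passes_validation(relaxed, incidents=4, quarters=1,
                        categories=2, half_years=1))  # True
```

This is why a brand-new neighbourhood with only a handful of seeded incidents still publishes under relaxed thresholds but would fail the Sea Point defaults.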

Step 6: Dataset Registry (reports/<slug>_registry.py)#

Create a new file. Minimum viable registry (2 datasets):

from datetime import timedelta
from reports.dataset_metadata import (
    DataClassID, DataProductManifest, DatasetMetadata,
    RecomputePolicy, RefreshStrategy,
)

<SLUG_UPPER>_INCIDENTS = DatasetMetadata(
    name="<Name> Incidents",
    dataset_slug="<slug>-incidents",
    source="news,SAPS",
    description="Individual crime incidents from news sources and SAPS reports.",
    class_id=DataClassID.EVT_DAILY,
    expected_update_cadence="weekly",
    staleness_threshold=timedelta(days=14),
    cache_ttl=timedelta(hours=1),
    refresh_strategy=RefreshStrategy.INCREMENTAL,
    incremental_key="incident_date",
    lookback_window=timedelta(days=14),
    r2_latest_key="data/reports/<slug>/incidents.json",
    local_fallback_path="static/reports/<slug>/crime_data.json",
    default_run_command="update_dataset <slug>",
)

<SLUG_UPPER>_COMPOSITE = DatasetMetadata(
    name="<Name> Crime Data (Composite)",
    dataset_slug="<slug>-composite",
    source="derived",
    description="Assembled composite of all sub-datasets.",
    class_id=DataClassID.SNAP_COMPOSITE,
    expected_update_cadence="weekly",
    staleness_threshold=timedelta(days=14),
    cache_ttl=timedelta(hours=1),
    refresh_strategy=RefreshStrategy.FULL,
    recompute_policy=RecomputePolicy.ON_SOURCE_CHANGE,
    dependencies=["<slug>-incidents"],
    r2_latest_key="data/reports/<slug>/crime_data.json",
    local_fallback_path="static/reports/<slug>/crime_data.json",
    default_run_command="update_dataset <slug>",
)

<SLUG_UPPER>_MANIFEST = DataProductManifest(
    product_slug="<slug>",
    product_name="<Name> Crime & Safety Data",
    region="Western Cape, South Africa",
    datasets=[<SLUG_UPPER>_INCIDENTS, <SLUG_UPPER>_COMPOSITE],
    composite_slug="<slug>-composite",
    access_model="free",         # or "paid"
    included_in_tiers=[],        # or ["data", "premium"]
    public_url_name="reports:crime_detail",
    public_url_kwargs={"neighborhood_slug": "<slug>"},
    public_url_label="<Name> Crime Report",
)

Step 7: Register Manifest (reports/manifest_registry.py)#

Add to _build_registry():

from reports.<slug>_registry import <SLUG_UPPER>_MANIFEST
# ...
registry[<SLUG_UPPER>_MANIFEST.product_slug] = <SLUG_UPPER>_MANIFEST

Step 8: News Fetcher (reports/fetchers/<slug>_fetcher.py)#

from reports.fetchers.news_fetcher import SeaPointNewsFetcher

_SEARCH_QUERIES = [
    '"<Area1>" Cape Town crime OR robbery OR assault OR murder',
    '"<Area2>" Cape Town crime OR shooting OR theft OR burglary',
    # 2-4 Google News search queries targeting the neighbourhood
]

class <Name>NewsFetcher(SeaPointNewsFetcher):
    name = "<slug>-news"
    _SEARCH_QUERIES = _SEARCH_QUERIES

    async def _fetch(self):
        return await self._fetch_google_news(self._SEARCH_QUERIES)

Tips for search queries:

  • Use the primary neighbourhood name + surrounding areas

  • Include “Cape Town” to avoid global false positives

  • Use OR-separated crime keywords

  • 2-4 queries is the sweet spot
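For context, each query ultimately becomes a Google News RSS search URL. The exact URL shape used inside SeaPointNewsFetcher is an assumption here; this sketch only shows the encoding step, which is useful for eyeballing a query in a browser before committing it:

```python
from urllib.parse import quote_plus

def google_news_rss_url(query: str) -> str:
    """Build a Google News RSS search URL for one query string.

    The hl/gl/ceid locale parameters are illustrative (South Africa,
    English); adjust to match what the base fetcher actually sends.
    """
    return (
        "https://news.google.com/rss/search?q="
        + quote_plus(query)
        + "&hl=en-ZA&gl=ZA&ceid=ZA:en"
    )

url = google_news_rss_url('"Observatory" Cape Town crime OR robbery')
print(url)
```

Pasting the printed URL into a browser shows exactly which articles a query surfaces, which is the fastest way to tune the keyword lists.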

Step 9: Classifier Location Patterns (reports/fetchers/classifier.py)#

Add a location pattern list and register it:

_<SLUG_UPPER>_LOCATION_PATTERNS = [
    r"(?i)\b<landmark1>\b",
    r"(?i)\b<landmark2>\b",
    r"(?i)\b<street name>\b",
    # 10-15 regex patterns for known streets/landmarks
]

_DATASET_CLASSIFIER_CONFIG["<slug>"] = {
    "location_patterns": _<SLUG_UPPER>_LOCATION_PATTERNS,
    "locations": <SLUG_UPPER>_LOCATIONS,  # from geodata.py
    "default_location": "<Name>",
}

Step 10: Fetch Service Registry (reports/services/fetch_service.py)#

Add to _FETCHER_REGISTRY:

from reports.fetchers.<slug>_fetcher import <Name>NewsFetcher

_FETCHER_REGISTRY["<slug>"] = (<Name>NewsFetcher, "<Name>")

Once this is registered, python manage.py fetch_crime_incidents <slug> works automatically – no new management command needed.

Step 11: GitHub Workflows#

Both workflows are thin callers of reusable templates. Copy and adapt:

Fetch workflow (.github/workflows/fetch_<slug>_incidents.yml):

name: Fetch <Name> Crime Incidents

on:
  schedule:
    - cron: '0 7 * * 1'   # stagger 30min from others
  workflow_dispatch:
    inputs:
      since: { type: string, required: false }
      dry_run: { type: boolean, default: false }

concurrency:
  group: <slug>-fetch-${{ github.ref }}
  cancel-in-progress: false

jobs:
  fetch-incidents:
    uses: ./.github/workflows/_fetch-crime-incidents.yml
    with:
      neighbourhood-slug: <slug>
      since: ${{ github.event.inputs.since || '' }}
      dry_run: ${{ github.event.inputs.dry_run == 'true' }}
    secrets: inherit

Update workflow (.github/workflows/update_<slug>.yml):

name: Update <Name> Crime Data

on:
  workflow_dispatch:
    inputs:
      no_strict: { type: boolean, default: true }
      allow_anomaly: { type: boolean, default: false }

concurrency:
  group: <slug>-update-${{ github.ref }}
  cancel-in-progress: false

jobs:
  update:
    uses: ./.github/workflows/_update-crime-data.yml
    with:
      neighbourhood-slug: <slug>
      from-db: true
      no-strict: ${{ github.event.inputs.no_strict == 'true' }}
      allow-anomaly: ${{ github.event.inputs.allow_anomaly == 'true' }}
    secrets: inherit

Step 12: Tests#

Create reports/tests/test_<slug>_fetch.py. Cover:

  • Classifier dispatches to correct location patterns for the new slug

  • Fetcher class has correct name and search queries

  • Fetch service registry contains the new slug

  • fetch_crime_incidents <slug> creates incidents (mock RSS + httpx)

Step 13: Static Fallback Directory#

mkdir -p static/reports/<slug>
touch static/reports/<slug>/.gitkeep

Step 14: Run Full CI Checks Locally#

Before pushing, always run:

# Lint
poetry run ruff check .

# Type checking (all apps)
poetry run mypy payments reports products housing safety comments dashboard \
  ebook hiking loadshedding things_to_do travel_quiz dataportal about \
  cookieconsent eraluma

# Docs
sphinx-build -W --keep-going docs/source docs/build/html

# Tests
USE_SQLITE=1 python manage.py test reports --verbosity=2

Deployment Checklist#

After code is merged and CI is green:

  1. Run migration on production

    python manage.py migrate reports
    
  2. Trigger “Fetch Crime Incidents” workflow

    • Manual dispatch from GitHub Actions

    • This populates CrimeIncident records in the DB

  3. Trigger “Update Crime Data” workflow

    • Set inputs: from_db: true, no_strict: true (first run only)

    • This builds crime_data.json and uploads to R2

  4. Visual QA

    • Visit https://eraluma.com/crime/<slug>/ and verify the report renders

    • Check map, statistics, incidents list, safety guide

  5. Flip to active

    • Django admin > Reports > Datasets > <slug> > Status = Active

    • Now it appears on the public crime reports index page

  6. Remove --no-strict from future workflow runs

    • Once the dataset has enough data to pass standard validation


Access Model Reference#

| access_model | Paywall | Tiers | Stripe Product Needed? |
|--------------|---------|-------|------------------------|
| "free" | None – full access for everyone | [] | No |
| "paid" | Requires data or premium subscription | ["data", "premium"] | No (uses existing tier) |

The access_model field on the Dataset model controls this. Change it in Django admin or via migration.

No new Stripe products or prices are needed per neighbourhood – the existing tier subscriptions grant access to ALL paid datasets.
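The gating decision reduces to a small check. This is a sketch of the idea only; the real _content_access_for() helper in reports/views.py is the source of truth:

```python
def has_access(access_model: str, included_in_tiers: list[str],
               user_tiers: set[str]) -> bool:
    """Free datasets are open to everyone; paid ones need a matching tier."""
    if access_model == "free":
        return True
    return any(tier in user_tiers for tier in included_in_tiers)

print(has_access("free", [], set()))                      # True
print(has_access("paid", ["data", "premium"], {"data"}))  # True
print(has_access("paid", ["data", "premium"], set()))     # False
```

Because access is tier-based rather than per-dataset, adding a neighbourhood never requires new billing plumbing.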


What Walmer Estate Specifically Created#

For the record, here is every file touched for the Walmer Estate implementation:

New Files (8)#

| File | Purpose |
|------|---------|
| reports/walmer_estate_registry.py | 2 datasets (incidents + composite), manifest with access_model="free" |
| reports/fetchers/walmer_estate_fetcher.py | Google News RSS fetcher for Woodstock/Observatory/Salt River/Walmer Estate |
| reports/management/commands/fetch_walmer_estate_incidents.py | Deprecated wrapper (delegates to fetch_crime_incidents walmer-estate) |
| reports/migrations/0026_add_walmer_estate_dataset.py | Creates Dataset record (slug=walmer-estate, status=draft) |
| reports/migrations/0027_dataset_access_model_and_walmer_translations.py | Adds access_model field, sets Walmer Estate to free, 9-language translations |
| reports/tests/test_walmer_estate_fetch.py | 25 tests: classifier, fetcher, service, command |
| .github/workflows/fetch_walmer_estate_incidents.yml | Weekly Mon 06:30 UTC fetch + manual dispatch |
| .github/workflows/update_walmer_estate.yml | Manual dispatch: DB -> validate -> R2 publish |

Modified Files (7)#

| File | Change |
|------|--------|
| reports/geodata.py | Added WALMER_ESTATE_LOCATIONS (10 landmarks), _CENTER, _BOUNDS |
| reports/builders/reference_data.py | Added safety zones, time risk, landmarks for Walmer Estate |
| reports/builders/config.py | Added relaxed thresholds: min_incidents=3, min_quarters=1 |
| reports/manifest_registry.py | Registered WALMER_ESTATE_MANIFEST |
| reports/fetchers/classifier.py | Added _WALMER_ESTATE_LOCATION_PATTERNS (14 patterns), dataset_slug dispatch |
| reports/services/fetch_service.py | Added "walmer-estate" to _FETCHER_REGISTRY |
| reports/models.py + reports/views.py | Added access_model field + _content_access_for() helper (one-time, shared) |

Git Commits#

| Commit | Message |
|--------|---------|
| 4d509b651 | feat: add Walmer Estate crime report pipeline |
| 53c4fb63b | feat: add per-dataset access gating and Walmer Estate translations |
| 85172e6b5 | fix: resolve mypy type errors in housing R2 download and Sphinx docs warning |