Progress Report

Project: CapeTownData
Period: Jan 5 – Mar 3, 2026 (8 weeks)
Date: 2026-03-03

Changes Shipped: 318

Automated Tests: 3,085

Automated Jobs: 24

Known Code Errors

1 Summary

318 changes shipped in 8 weeks — one developer working full-time.
Crime maps now update automatically. News articles are collected, sorted, checked for duplicates, placed on the map, and translated into 8 languages — all without manual work.
Grew from 1 neighbourhood to 4 set up, 2 of them live: Sea Point (live), Hout Bay (live), Walmer Estate (closed, not enough data), Camps Bay (set up, not active yet).
Rebranded. Old name (es-CapeTown) → new name (CapeTownData) with new website address (capetowndata.com).
Mountain passes page launched. Shows live road conditions from TomTom, updates every 10 minutes, available in 8 languages.
Data is now stored in the cloud (Cloudflare R2) instead of code files. This prevents accidental overwrites and data conflicts.
3,085 automated tests run on every code change. This catches bugs before they reach users.
Cleaned up 101 code warnings, improved payment reliability, and reduced false error alerts.

2 Before & After

Area	8 Weeks Ago	Now
Crime data	Updated by hand, one file at a time	Fully automatic: collects articles, removes duplicates, places on map, translates
Duplicate detection	None — same story from 2 newspapers showed as 2 pins	4-step system catches duplicates even when articles use different wording
Neighbourhoods	1 (Sea Point only)	4 set up, 2 live (Sea Point + Hout Bay)
Automated jobs	3, run manually	24, most run on schedule (daily, weekly, or every 10 min)
Tests	~50 basic checks	3,085 checks — run automatically on every code change
Data storage	Saved in code files (caused conflicts)	Saved in cloud storage (Cloudflare R2) — single source of truth
Brand	es-CapeTown	CapeTownData with its own domain
Crime map features	Basic dots on a map	Different shapes per crime type, filter buttons, accurate neighbourhood outlines, grouped sources
Mountain passes	Empty page	Live road conditions, auto-updating, 8 languages
Payments	Simple checkout	3 subscription tiers, per-neighbourhood access, metered free previews
Error monitoring	None	Sentry tracks errors, payment health alerts, translation checks
Languages	English only (partial)	9 languages (English + 8 translations), translated automatically

3 What We Built

Product Features

Feature	Status	What It Does
Crime map: different shapes per crime type	Done	Robbery, assault, theft etc. each have a unique icon — easier to scan
Crime map: accurate neighbourhood outlines	Done	Hangberg & Imizamo Yethu now show real boundaries instead of rough circles
Crime map: merged duplicate pins	Done	When 2 newspapers report the same crime, it shows as 1 pin with both sources
Crime map: translated popups	Done	Safety zone info and the sidebar legend appear in the user's language
Mountain passes: live road status	Done	Real-time data from TomTom, refreshed every 10 minutes, 8 languages
Homepage redesign	Done	New data table, neighbourhood previews, updated hero section
Reports without police stats	Done	Neighbourhoods can launch with news data only (no need to wait for SAPS data)
Metered paywall	Done	Free users see limited previews, then are asked to subscribe
Walmer Estate report	Closed	Built but shut down — too little crime data to be useful
Camps Bay report	Waiting	Set up in the system, will activate when there's enough data

Data Quality

What	Status	Why It Matters
4-step duplicate detection	Done	Catches same-story articles from different newspapers, even when they use different words
Automatic duplicate removal pipeline	Done	Finds and merges duplicates without manual review
Wider matching window (3 → 5 days)	Done	Newspapers sometimes report the same story days apart — now caught
Filter out old incidents (before 2024)	Done	Maps only show recent, relevant data
AI-powered location placement	Done	Reads the article and figures out where the crime happened on the map
AI-powered article classification	Done	Automatically identifies crime type, location, and date from news articles
Police stats import	Done	Imports official SAPS quarterly crime statistics
Safety zone accuracy fix	Done	Imizamo Yethu marker was placed 1.5 km from the settlement; corrected using verified OpenStreetMap data

Automated Jobs

There are 24 automated jobs, most of them scheduled. The most important ones:

Job	How Often	What It Does
Fetch crime news	Weekly	Searches Google News, classifies articles, checks for duplicates
Clean up duplicates	On demand	Reviews all incidents, finds and merges duplicates
Update crime data	After fetch/cleanup	Validates data, uploads to cloud storage
Update safety data	Daily	Refreshes the combined crime + safety dataset
Mountain pass status	Every 10 min	Gets latest road conditions from TomTom
Dam levels	Daily	Scrapes Cape Town dam water levels
Payment health check	Daily	Alerts if payments or invoices are failing
SAPS data check	Weekly	Checks if South African Police Service released new quarterly data
Code quality check	Every code change	Runs code style checks + all 3,085 tests

How the automation works:

Jobs are reusable — the same job definition works for any neighbourhood.
A single config file controls which neighbourhoods get processed and with what settings.
Every job that uploads data checks the source first — it won't overwrite newer cloud data with outdated local data.
If an upload fails, the job fails loudly instead of silently continuing — so we notice.
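
The last two points can be sketched roughly as follows. This is a minimal illustration, not the project's actual job code: `Snapshot`, `should_upload`, `publish`, `UploadError`, and the `upload` callable are all hypothetical names, and the comparison assumes each dataset carries a "last updated" timestamp.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional


class UploadError(RuntimeError):
    """Raised so the job exits with an error instead of continuing silently."""


@dataclass
class Snapshot:
    # Hypothetical shape: one neighbourhood's dataset and when it was generated.
    neighbourhood: str
    updated_at: datetime
    payload: bytes


def should_upload(local: Snapshot, remote_updated_at: Optional[datetime]) -> bool:
    """Upload only when the cloud copy is missing or older than the local one."""
    if remote_updated_at is None:
        return True
    return local.updated_at > remote_updated_at


def publish(
    local: Snapshot,
    remote_updated_at: Optional[datetime],
    upload: Callable[[str, bytes], None],
) -> str:
    if not should_upload(local, remote_updated_at):
        return "skipped: cloud copy is as new or newer"
    try:
        upload(local.neighbourhood, local.payload)
    except Exception as exc:
        # Fail loudly: re-raise with context so the job run is marked as failed.
        raise UploadError(f"upload failed for {local.neighbourhood}: {exc}") from exc
    return "uploaded"
```

Passing the upload function in as a parameter keeps the freshness rule testable without touching real cloud storage.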

4 What We Learned (Top 15)

1 Magic numbers cause surprises.

We had a "3-day window" for matching articles. Two newspapers reported the same robbery 4 days apart — and the system missed it because 4 > 3.

Changed to a named, configurable 5-day window. Added tests for edge cases.
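
A minimal sketch of what a named, configurable window might look like. The constant and function names are hypothetical, and treating the boundary as inclusive (exactly 5 days apart still matches) is an assumption rather than something the report specifies.

```python
from datetime import date, timedelta

# Named and configurable, instead of a bare "3" buried in a comparison.
DUPLICATE_MATCH_WINDOW_DAYS = 5


def within_match_window(
    a: date, b: date, window_days: int = DUPLICATE_MATCH_WINDOW_DAYS
) -> bool:
    """True when two publication dates are close enough to be compared as duplicates."""
    return abs(a - b) <= timedelta(days=window_days)
```

With the old value, `within_match_window(date(2026, 2, 1), date(2026, 2, 5), window_days=3)` is false, which is the missed robbery case. With the default of 5 it is true.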

2 New features must be wired up everywhere.

We built a "fix duplicates" feature in the code, but forgot to pass the required setting in the automation job. Result: the system found 17 duplicate groups but fixed zero of them for weeks.

New rule: every new feature gets checked end-to-end, from code to automation, in the same change.

3 Safety limits need to match real data.

We set a safety limit of "max 10 duplicate groups" to prevent accidents. Hout Bay had 17 groups. The job silently stopped and showed no output — impossible to figure out what went wrong.

Raised the limit to 50. Made the system explain why it stopped.
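
One way to make a safety limit explain itself is to raise an error that carries the numbers involved. The sketch below assumes hypothetical names (`MAX_DUPLICATE_GROUPS`, `SafetyLimitExceeded`, `check_safety_limit`); only the limit values 10 and 50 come from the report.

```python
MAX_DUPLICATE_GROUPS = 50  # raised from 10, per the report


class SafetyLimitExceeded(RuntimeError):
    """Signals that the job stopped on purpose, and says why."""


def check_safety_limit(groups: list, limit: int = MAX_DUPLICATE_GROUPS) -> list:
    """Return the groups unchanged, or stop with a message that names both numbers."""
    if len(groups) > limit:
        raise SafetyLimitExceeded(
            f"found {len(groups)} duplicate groups but the safety limit is {limit}; "
            "nothing was merged. Review the groups or raise the limit."
        )
    return groups
```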

4 Database changes must run everywhere.

We added a new data field, but the production automation job didn't include the step to update the database. It would have crashed on the next run.

Added the database update step to the automation. Caught before it caused a problem.

5 Free-tier data is a privacy line, not just a UX choice.

We almost sent precise neighbourhood boundary shapes to free-tier users. The whole point of the free tier is to show approximate data only — precise data is for paying users.

Caught in review. Precise boundaries only show for paid users. Added tests to make sure this doesn't happen again.
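
The rule can be expressed as a single default-deny function, which is also easy to test. This is a sketch under assumptions: the tier names and the `boundary_for_tier` helper are hypothetical, and the geometry values are opaque placeholders.

```python
# Hypothetical tier names; the report only says there are 3 subscription tiers.
PAID_TIERS = frozenset({"basic", "standard", "premium"})


def boundary_for_tier(tier, precise, approximate):
    """Return precise geometry only for known paid tiers; everyone else gets the approximation."""
    return precise if tier in PAID_TIERS else approximate
```

Checking membership in a set of paid tiers, rather than checking `tier != "free"`, means an unknown or missing tier falls back to approximate data.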

6 Don't guess map coordinates — verify them.

The Imizamo Yethu safety zone was placed 1.5 km away from the actual settlement. The coordinates were probably entered as a rough guess.

Now we use verified data from OpenStreetMap, the community-maintained global map database.

7 AI steps should be a bonus, not a requirement.

Our AI-powered duplicate matching (step 4 of 4) can fail if the AI service is slow or down. If it were the only method, we'd miss duplicates.

Steps 1–3 use simple rules and catch 90%+ of duplicates. Step 4 (AI) adds extra matches when it works, but the system is fine without it.
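
The "bonus, not requirement" idea can be sketched as a cascade where the AI step is optional and its failure is swallowed. The rules used for steps 1 to 3 below are illustrative assumptions (the report names the steps but not their logic), and `is_duplicate`, `ai_match`, and the article dictionary keys are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Callable, Optional


def is_duplicate(
    a: dict,
    b: dict,
    ai_match: Optional[Callable[[dict, dict], bool]] = None,
    text_threshold: float = 0.8,
) -> bool:
    # Step 1: URL match.
    if a["url"] == b["url"]:
        return True
    # Step 2: same source republishing the same headline (assumed rule).
    if a["source"] == b["source"] and a["title"].lower() == b["title"].lower():
        return True
    # Step 3: similar text, using a simple stdlib similarity ratio.
    if SequenceMatcher(None, a["text"], b["text"]).ratio() >= text_threshold:
        return True
    # Step 4: optional AI check. If it is missing or fails, keep the rule-based answer.
    if ai_match is not None:
        try:
            return bool(ai_match(a, b))
        except Exception:
            return False
    return False
```

Because step 4 only runs after the rules have said "no", an outage can cost extra matches but cannot undo a match the rules already found.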

8 Check if there's enough data before building a product.

We built a complete crime report for Walmer Estate. Then we discovered the area simply doesn't have enough crime data to make the report useful. We had to shut it down.

New rule: check data volume first. Also built a "news-only" report mode so neighbourhoods can launch without waiting for police statistics.

9 Database changes must be safe to re-run.

Some early database changes would break if run twice. This caused problems in testing environments.

All database changes now use "create if doesn't exist" patterns, so re-running them is safe.
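
A small illustration of the "create if doesn't exist" pattern, using SQLite from the Python standard library so it runs anywhere. The table and column names are made up, and the report does not say which database the project uses.

```python
import sqlite3


def apply_migration(conn: sqlite3.Connection) -> None:
    """Safe to run any number of times: every step checks before it changes anything."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS incidents ("
        "id INTEGER PRIMARY KEY, title TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_incidents_title ON incidents (title)"
    )
    # SQLite has no "ADD COLUMN IF NOT EXISTS", so look at the current columns first.
    columns = {row[1] for row in conn.execute("PRAGMA table_info(incidents)")}
    if "occurred_on" not in columns:
        conn.execute("ALTER TABLE incidents ADD COLUMN occurred_on TEXT")
    conn.commit()
```

Running `apply_migration` twice on the same connection leaves the schema unchanged the second time.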

10 Clear change descriptions save time later.

Early changes had messages like "fix db877" and "fix ACTIONS!155977". Weeks later, nobody could tell what these changes were for.

Now every change gets a clear, descriptive message explaining what changed and why.

11 Use cloud storage from the start.

We originally saved data inside the code repository. This caused file conflicts when two processes tried to update the same file, and sometimes newer data got overwritten by older data.

Moved everything to cloud storage (Cloudflare R2). Now there's one copy of the truth, and no conflicts.

12 Test third-party services before automating them.

The TomTom road API integration took 5 attempts to get right. Each attempt surfaced a new issue: wrong URL formatting, size limits, data field errors, wrong labels. Each failure broke the mountain passes page.

Now we test third-party services thoroughly by hand before connecting them to automation.

13 Long-running jobs lose their database connection.

Translation jobs that run 25+ minutes cause the database connection to go stale. The job would finish translating, then fail to save the results.

Added a "reconnect to database" step before saving. Also increased the job time limit.
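
A related way to avoid the stale-connection problem is to hold no connection during the slow phase and open one only when it is time to write. The sketch below shows that variant, not necessarily the reconnect step the project added. It uses SQLite for runnability, and `translate_then_save`, the `translations` table, and the `translate` callable are hypothetical.

```python
import sqlite3
from contextlib import closing
from typing import Callable, Iterable


def translate_then_save(
    db_path: str, texts: Iterable[str], translate: Callable[[str], str]
) -> int:
    # Slow phase: may take many minutes, so no database connection is held open.
    results = [(text, translate(text)) for text in texts]

    # Write phase: open a fresh connection right before saving.
    with closing(sqlite3.connect(db_path)) as conn:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS translations ("
                "source TEXT PRIMARY KEY, translated TEXT)"
            )
            conn.executemany(
                "INSERT OR REPLACE INTO translations VALUES (?, ?)", results
            )
    return len(results)
```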

14 Style checks aren't enough — you need logic checks too.

Our automatic style checker catches formatting issues every time. But logic errors (using the wrong field name, forgetting a setting) slipped through to production multiple times.

The real safety net is the full test suite (3,085 tests), not the style checker.

15 Too many alerts = no alerts.

Our error tracking system (Sentry) was flooded with expected issues like "cloud storage timeout, retrying" and "translation API slow." These drowned out actual bugs.

Reclassified expected issues as non-urgent. Now only real bugs and payment failures trigger alerts.
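
A reclassification rule like this can live in one small filter function. The sketch below is written in the shape of a Sentry `before_send` hook (it takes an event and a hint, and returning `None` drops the event), but it is plain Python and makes assumptions: the noise phrases are the two quoted above, the environment name is hypothetical, and whether a lower level actually silences an alert depends on how the alert rules are configured.

```python
# Phrases for expected, self-healing conditions (taken from the examples above).
EXPECTED_NOISE = ("cloud storage timeout, retrying", "translation API slow")


def before_send(event: dict, hint: dict):
    # Drop events from a developer machine entirely (hypothetical environment name).
    if event.get("environment") == "development":
        return None

    message = event.get("message") or (event.get("logentry") or {}).get("message") or ""
    if any(phrase in message for phrase in EXPECTED_NOISE):
        # Keep the event for trend tracking, but mark it as non-urgent.
        event["level"] = "info"
    return event
```

Assuming the Sentry Python SDK is in use, a function like this would be passed as `sentry_sdk.init(before_send=before_send)`.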

5 Mistakes & Close Calls

#	What Happened	Impact	How We Fixed It
1	Duplicate-fix feature was built but not connected to automation	System found duplicates but removed zero of them — for weeks	Connected the feature, verified end-to-end
2	Safety limit too low (10 groups max, Hout Bay had 17)	Job stopped silently with no explanation	Raised limit, added clear error message
3	Imizamo Yethu safety zone placed 1.5 km from actual location	Map showed a safety warning in the wrong spot for weeks	Corrected using verified OpenStreetMap data
4	Production automation didn't update the database before running	Would have crashed on next run after adding a new data field	Added database update step — caught before it caused an issue
5	Built full Walmer Estate report without checking data availability	Wasted effort — had to shut it down due to too little data	Created pre-launch checklist: verify data volume first
6	Test errors from developer's laptop appeared in production error tracker	Confusing — looked like production bugs but weren't	Filtered errors by environment so local dev errors don't appear
7	TomTom API broke 5 times during integration	Mountain passes page showed stale data during each fix	Built comprehensive tests, documented API quirks
8	Almost sent precise boundary data to free-tier users	Would have leaked paid content to non-paying users	Caught in code review. Added automated tests to prevent this.

6 Cleanup Done

These are problems that existed before and have now been fixed:

Problem	Before	After
Code warnings	101 unresolved warnings	0 — all fixed, checked automatically
Automated testing	~50 basic checks	3,085 tests across 97 files
Change descriptions	Cryptic messages like "fix db877"	Clear descriptions explaining what and why
Data storage	Saved in code repo (caused conflicts)	Cloud storage — one copy, no conflicts
Payment processing	Could process the same payment twice	Duplicate payments prevented automatically
Error alerts	Flooded with harmless warnings	Only real problems trigger alerts
Translations	Missing or empty translations	Automated pipeline keeps all 8 translated languages up to date
Data overwrite risk	Could accidentally replace new data with old	System checks data freshness before uploading
Dead code	1,400+ lines of unused old code	Moved to archive folder

Things we stopped doing:

Saving data files in the code repository
Editing crime data by hand in JSON files
Manually entering translations
Running the Walmer Estate pipeline (shut down — not enough data)
Guessing map coordinates without checking them against a verified source

7 What Could Go Wrong

Risk	How Serious	What We're Doing About It
Only one developer — everything depends on one person	High	This report documents how the system works. Architecture decisions are written down.
Database changes aren't reviewed separately — any automated job can change the production database	Medium	All database changes are reviewed in code review. No auto-generated changes in automation.
AI costs aren't tracked — duplicate detection and article classification use OpenAI	Medium	Costs are small ($0.001–0.005 per run) but there's no automatic alert if they spike.
No test environment — changes go straight to the live site	Medium	3,085 automated tests catch most issues. Destructive commands have "dry run" modes.
Cloud storage has no backup — data is in one location only	Medium	Most data can be regenerated from the database. Financial data is backed up separately.
Camps Bay is set up but untested	Low	Will activate when there's enough incident data to be useful.
Map boundaries may become outdated over time	Low	Boundaries link back to OpenStreetMap sources. Can be refreshed when needed.
Police data releases are unpredictable	Low	A weekly automated check alerts us when new SAPS data is available.

8 Numbers

Work Done

What	Count	Notes
Changes shipped	318	Over 8 weeks
Individual commits	548	One developer
Files touched	711	Code, templates, automation configs
Lines of code added (net)	+136,573	152k added, 16k removed
Management tools built	18	For fetching, processing, publishing, translating data
Automated jobs	24	Up from 3 eight weeks ago

Quality

What	Count	Notes
Automated tests	3,085	Across 97 test files
Test pass rate	100%	Enforced on every code change
Code warnings	0	Down from 101
Pre-submit checks	8	Style, formatting, secrets detection, etc.
Duplicate detection steps	4	URL match → same-source → similar text → AI matching
Duplicate matching window	5 days	Increased from 3 to catch more matches

User-Facing

What	Count	Notes
Active crime maps	2	Sea Point, Hout Bay
Languages	9	English + 8 translations
Avg. incidents per map	~57	Based on current data
Paid subscribers	Not yet tracked	Available in Stripe dashboard
Monthly visitors	Not yet tracked	Available in Google Analytics

Still need to measure:

Number of paying subscribers and monthly revenue (from Stripe)
Website traffic: page views, popular pages, bounce rate (from Google Analytics)
How many incidents each map actually has in production
Success rate of automated jobs
Error trends over time

9 Next 30 Days

Priorities

Urgency	Task	Depends On	Status
Urgent	Run duplicate cleanup on all live maps	Recent bug fixes merged	Ready
Urgent	Check that Sea Point "Daylight robbery" duplicate is merged	Cleanup must run first	Waiting
Urgent	Check that Hout Bay murder duplicates are merged	Cleanup must run first	Waiting
Soon	Verify old incidents (pre-2024) are filtered out on live maps	Database update must run	Waiting
Soon	Visual check: do Hangberg & Imizamo Yethu boundaries look right?	Code deployed	Waiting
Soon	Decide whether to activate Camps Bay	Check how much data exists	Not started
Later	Track how users interact with the maps	—	Not started
Later	Monitor AI costs	—	Not started
Later	Set up a test environment (so changes don't go straight to the live site)	Infrastructure planning	Not started

Week by Week

Week 1 — Mar 3–9

Run all cleanup jobs. Check every map visually. Verify the new neighbourhood boundaries.

Week 2 — Mar 10–16

Check Camps Bay data volume. Set up AI cost tracking.

Week 3 — Mar 17–23

Add user analytics to maps. Check if new police data has been released.

Week 4 — Mar 24–30

Plan test environment. Update documentation.

10 Key Decisions Made

When	Decision	Why	Result
Jan	Rename to CapeTownData	Broader appeal beyond expats	New brand, new domain, new look
Jan	Move data to cloud storage	Saving data in code files caused conflicts	All data now in Cloudflare R2
Jan	Add automated testing pipeline	No way to catch bugs before they reached users	3,085 tests run on every code change
Feb	Build 4-step duplicate detection	Same crime from different newspapers showed as separate pins	Working well — catches duplicates even with different wording
Feb	Lock new articles until reviewed	Prevent unverified AI-classified articles from going live automatically	All new articles require explicit approval
Feb	Shut down Walmer Estate	Not enough crime data to make the report useful	Shut down cleanly, resources freed up
Feb	Allow reports without police stats	Not every neighbourhood has SAPS data yet	Hout Bay launched with news data only
Feb	Add AI duplicate matching (step 4)	Some articles use completely different words for the same crime	Catches extra matches, but steps 1–3 handle 90%+
Mar	Widen matching window from 3 to 5 days	Newspapers sometimes report the same story 4+ days apart	Fixed a real missed duplicate. Added tests.
Mar	Use real map data instead of circles for settlements	Circles don't match the actual shape of Hangberg and Imizamo Yethu	Accurate boundaries from OpenStreetMap
Mar	Only show precise boundaries to paying users	Free tier should show approximate data only	Enforced with automated tests
