Progress Report
1 Summary
- 318 changes shipped in 8 weeks — one developer working full-time.
- Crime maps now update automatically. News articles are collected, sorted, checked for duplicates, placed on the map, and translated into 8 languages — all without manual work.
- Grew from 1 to 4 neighbourhoods. Sea Point (live), Hout Bay (live), Walmer Estate (closed — not enough data), Camps Bay (set up, not active yet).
- Rebranded. Old name (es-CapeTown) → new name (CapeTownData) with a new website address (capetowndata.com).
- Mountain passes page launched. Shows live road conditions from TomTom, updates every 10 minutes, available in 8 languages.
- Data is now stored in the cloud (Cloudflare R2) instead of code files. This prevents accidental overwrites and data conflicts.
- 3,085 automated tests run on every code change. This catches bugs before they reach users.
- Cleaned up 101 code warnings, improved payment reliability, and reduced false error alerts.
2 Before & After
| Area | 8 Weeks Ago | Now |
|---|---|---|
| Crime data | Updated by hand, one file at a time | Fully automatic: collects articles, removes duplicates, places on map, translates |
| Duplicate detection | None — same story from 2 newspapers showed as 2 pins | 4-step system catches duplicates even when articles use different wording |
| Neighbourhoods | 1 (Sea Point only) | 4 set up, 2 live (Sea Point + Hout Bay) |
| Automated jobs | 3, run manually | 24, most run on schedule (daily, weekly, or every 10 min) |
| Tests | ~50 basic checks | 3,085 checks — run automatically on every code change |
| Data storage | Saved in code files (caused conflicts) | Saved in cloud storage (Cloudflare R2) — single source of truth |
| Brand | es-CapeTown | CapeTownData with its own domain |
| Crime map features | Basic dots on a map | Different shapes per crime type, filter buttons, accurate neighbourhood outlines, grouped sources |
| Mountain passes | Empty page | Live road conditions, auto-updating, 8 languages |
| Payments | Simple checkout | 3 subscription tiers, per-neighbourhood access, metered free previews |
| Error monitoring | None | Sentry tracks errors, payment health alerts, translation checks |
| Languages | English only (partial) | 9 languages (English + 8 automatic translations) |
3 What We Built
Product Features
| Feature | Status | What It Does |
|---|---|---|
| Crime map: different shapes per crime type | Done | Robbery, assault, theft etc. each have a unique icon — easier to scan |
| Crime map: accurate neighbourhood outlines | Done | Hangberg & Imizamo Yethu now show real boundaries instead of rough circles |
| Crime map: merged duplicate pins | Done | When 2 newspapers report the same crime, it shows as 1 pin with both sources |
| Crime map: translated popups | Done | Safety zone info and sidebar legend in the user's language |
| Mountain passes: live road status | Done | Real-time data from TomTom, refreshed every 10 minutes, 8 languages |
| Homepage redesign | Done | New data table, neighbourhood previews, updated hero section |
| Reports without police stats | Done | Neighbourhoods can launch with news data only (no need to wait for SAPS data) |
| Metered paywall | Done | Free users see limited previews, then are asked to subscribe |
| Walmer Estate report | Closed | Built but shut down — too little crime data to be useful |
| Camps Bay report | Waiting | Set up in the system, will activate when there's enough data |
Data Quality
| What | Status | Why It Matters |
|---|---|---|
| 4-step duplicate detection | Done | Catches same-story articles from different newspapers, even when they use different words |
| Automatic duplicate removal pipeline | Done | Finds and merges duplicates without manual review |
| Wider matching window (3 → 5 days) | Done | Newspapers sometimes report the same story days apart — now caught |
| Filter out old incidents (before 2024) | Done | Maps only show recent, relevant data |
| AI-powered location placement | Done | Reads the article and figures out where the crime happened on the map |
| AI-powered article classification | Done | Automatically identifies crime type, location, and date from news articles |
| Police stats import | Done | Imports official SAPS quarterly crime statistics |
| Safety zone accuracy fix | Done | Imizamo Yethu marker was 1.5 km from its actual location; corrected using official map data |
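The four detection steps named in the Numbers section (URL match, same-source, similar text, AI matching) and the 5-day matching window can be pictured as a cascade that stops at the first step that matches. The Python sketch below is illustrative only. The `Article` record, the `is_duplicate` function, the `ai_match` callback, the 0.8 similarity threshold, and the rule used for the "same-source" step are all assumptions, since the report does not describe how each step is implemented.

```python
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher
from typing import Callable, Optional

MATCH_WINDOW_DAYS = 5            # from the report (previously 3)
TEXT_SIMILARITY_THRESHOLD = 0.8  # assumed value, not from the report


@dataclass(frozen=True)
class Article:
    """Hypothetical record; the real schema is not described in the report."""
    url: str
    source: str
    title: str
    published: date


def _normalise(text: str) -> str:
    # Lowercase and collapse whitespace so trivial differences do not matter.
    return " ".join(text.lower().split())


def is_duplicate(
    a: Article,
    b: Article,
    ai_match: Optional[Callable[[Article, Article], bool]] = None,
) -> Optional[str]:
    """Return the name of the first step that matched, or None."""
    # Step 1: identical URL is always a duplicate.
    if a.url == b.url:
        return "url"

    # Later steps only compare articles published within the matching window.
    if abs((a.published - b.published).days) > MATCH_WINDOW_DAYS:
        return None

    title_a, title_b = _normalise(a.title), _normalise(b.title)

    # Step 2 (assumed rule): same newspaper and the same normalised title.
    if a.source == b.source and title_a == title_b:
        return "same_source"

    # Step 3: similar wording, measured with the standard library's difflib.
    if SequenceMatcher(None, title_a, title_b).ratio() >= TEXT_SIMILARITY_THRESHOLD:
        return "similar_text"

    # Step 4: the AI check runs only when the cheaper steps found nothing.
    if ai_match is not None and ai_match(a, b):
        return "ai"

    return None
```

Ordering the steps from cheapest to most expensive means the AI call is only made for the pairs that the first three steps cannot settle, which fits the report's note that steps 1–3 handle most duplicates.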
Automated Jobs
24 jobs run automatically — here are the most important ones:
| Job | How Often | What It Does |
|---|---|---|
| Fetch crime news | Weekly | Searches Google News, classifies articles, checks for duplicates |
| Clean up duplicates | On demand | Reviews all incidents, finds and merges duplicates |
| Update crime data | After fetch/cleanup | Validates data, uploads to cloud storage |
| Update safety data | Daily | Refreshes the combined crime + safety dataset |
| Mountain pass status | Every 10 min | Gets latest road conditions from TomTom |
| Dam levels | Daily | Scrapes Cape Town dam water levels |
| Payment health check | Daily | Alerts if payments or invoices are failing |
| SAPS data check | Weekly | Checks if South African Police Service released new quarterly data |
| Code quality check | Every code change | Runs code style checks + all 3,085 tests |
- Jobs are reusable — the same job definition works for any neighbourhood.
- A single config file controls which neighbourhoods get processed and with what settings.
- Every job that uploads data checks the source first — it won't overwrite newer cloud data with outdated local data.
- If an upload fails, the job fails loudly instead of silently continuing — so we notice.
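The last two points (check freshness before uploading, and fail loudly when an upload breaks) can be sketched in a few lines of Python. This is a sketch under assumptions, not the project's code. The `guarded_upload` function, the two exception classes, and the use of modification timestamps as the freshness signal are hypothetical, and the `upload` callable stands in for whatever client actually writes to Cloudflare R2.

```python
from datetime import datetime, timezone
from typing import Callable, Optional


class StaleLocalDataError(RuntimeError):
    """Raised when the cloud copy is newer than the local copy."""


class UploadFailedError(RuntimeError):
    """Raised when the upload itself did not succeed."""


def guarded_upload(
    local_modified: datetime,
    remote_modified: Optional[datetime],
    upload: Callable[[], None],
) -> None:
    """Upload only if the local data is at least as new as the cloud copy.

    remote_modified is None when nothing has been uploaded yet.
    """
    # Freshness check: never replace newer cloud data with older local data.
    if remote_modified is not None and remote_modified > local_modified:
        raise StaleLocalDataError(
            f"cloud copy ({remote_modified.isoformat()}) is newer than "
            f"local copy ({local_modified.isoformat()}); refusing to upload"
        )

    # Fail loudly: any upload error stops the job instead of being swallowed.
    try:
        upload()
    except Exception as exc:
        raise UploadFailedError("upload to cloud storage failed") from exc
```

Raising an exception in both cases means the scheduler marks the job as failed, which is what makes the problem visible.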
4 What We Learned (Top 15)
5 Mistakes & Close Calls
| # | What Happened | Impact | How We Fixed It |
|---|---|---|---|
| 1 | Duplicate-fix feature was built but not connected to automation | System found duplicates but removed zero of them — for weeks | Connected the feature, verified end-to-end |
| 2 | Safety limit too low (10 groups max, Hout Bay had 17) | Job stopped silently with no explanation | Raised limit, added clear error message |
| 3 | Imizamo Yethu safety zone placed 1.5 km from actual location | Map showed a safety warning in the wrong spot for weeks | Corrected using verified OpenStreetMap data |
| 4 | Production automation didn't update the database before running | Would have crashed on next run after adding a new data field | Added database update step — caught before it caused an issue |
| 5 | Built full Walmer Estate report without checking data availability | Wasted effort — had to shut it down due to too little data | Created pre-launch checklist: verify data volume first |
| 6 | Test errors from developer's laptop appeared in production error tracker | Confusing — looked like production bugs but weren't | Filtered errors by environment so local dev errors don't appear |
| 7 | TomTom API broke 5 times during integration | Mountain passes page showed stale data during each fix | Built comprehensive tests, documented API quirks |
| 8 | Almost sent precise boundary data to free-tier users | Would have leaked paid content to non-paying users | Caught in code review. Added automated tests to prevent this. |
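Row 6 (local test errors showing up in the production error tracker) is a common Sentry problem, and one way to handle it is a filter that drops every event not tagged as production. The report does not say how the filtering was done, so the Python sketch below is only one possible approach. The function name and the allowed-environment set are hypothetical, and it assumes each event dictionary carries an `environment` field.

```python
from typing import Any, Dict, Optional

# Assumed: only events from this environment should reach the error tracker.
ALLOWED_ENVIRONMENTS = {"production"}


def drop_non_production_events(
    event: Dict[str, Any], hint: Dict[str, Any]
) -> Optional[Dict[str, Any]]:
    """Return the event to send it, or None to discard it.

    The (event, hint) signature matches what the Sentry Python SDK expects
    for its before_send option, so this function could be passed as
    sentry_sdk.init(..., before_send=drop_non_production_events).
    """
    if event.get("environment") in ALLOWED_ENVIRONMENTS:
        return event
    return None
```

An event with no environment at all is also dropped, so a misconfigured laptop cannot leak errors into the production view.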
6 Cleanup Done
These are problems that existed before and have now been fixed:
| Problem | Before | After |
|---|---|---|
| Code warnings | 101 unresolved warnings | 0 — all fixed, checked automatically |
| Automated testing | ~50 basic checks | 3,085 tests across 97 files |
| Change descriptions | Cryptic messages like "fix db877" | Clear descriptions explaining what and why |
| Data storage | Saved in code repo (caused conflicts) | Cloud storage — one copy, no conflicts |
| Payment processing | Could process the same payment twice | Duplicate payments prevented automatically |
| Error alerts | Flooded with harmless warnings | Only real problems trigger alerts |
| Translations | Missing or empty translations | Automated pipeline keeps all 8 languages up to date |
| Data overwrite risk | Could accidentally replace new data with old | System checks data freshness before uploading |
| Dead code | 1,400+ lines of unused old code | Moved to archive folder |
We have stopped doing the following:
- Saving data files in the code repository
- Editing crime data by hand in JSON files
- Manually entering translations
- Running the Walmer Estate pipeline (shut down — not enough data)
- Guessing map coordinates without checking official sources
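The duplicate-payment fix in the cleanup table above is typically achieved with an idempotency check: each payment event has a unique ID, and an event that has already been handled is skipped. The report does not describe the actual mechanism, so the following Python sketch is an assumption throughout. `PaymentLedger` and its method are hypothetical names, and the in-memory set stands in for what would normally be a database table with a unique constraint.

```python
from typing import Callable


class PaymentLedger:
    """Remembers which payment events have already been applied."""

    def __init__(self) -> None:
        # In-memory for illustration; a real system would persist these IDs.
        self._processed = set()

    def process_once(self, event_id: str, apply: Callable[[], None]) -> bool:
        """Run apply() only the first time event_id is seen.

        Returns True if the payment was applied, False if it was a repeat.
        """
        if event_id in self._processed:
            return False
        apply()
        # Record the ID only after apply() succeeds, so a failed attempt
        # can be retried later.
        self._processed.add(event_id)
        return True
```

For example, if a payment provider delivers the same webhook twice, the second call returns `False` and the customer's subscription is not extended a second time.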
7 What Could Go Wrong
| Risk | How Serious | What We're Doing About It |
|---|---|---|
| Only one developer — everything depends on one person | High | This report documents how the system works. Architecture decisions are written down. |
| Database changes aren't reviewed separately — any automated job can change the production database | Medium | All database changes are reviewed in code review. No auto-generated changes in automation. |
| AI costs aren't tracked — duplicate detection and article classification use OpenAI | Medium | Costs are small ($0.001–0.005 per run) but there's no automatic alert if they spike. |
| No test environment — changes go straight to the live site | Medium | 3,085 automated tests catch most issues. Destructive commands have "dry run" modes. |
| Cloud storage has no backup — data is in one location only | Medium | Most data can be regenerated from the database. Financial data is backed up separately. |
| Camps Bay is set up but untested | Low | Will activate when there's enough incident data to be useful. |
| Map boundaries may become outdated over time | Low | Boundaries link back to OpenStreetMap sources. Can be refreshed when needed. |
| Police data releases are unpredictable | Low | A weekly automated check alerts us when new SAPS data is available. |
8 Numbers
Work Done
| What | Count | Notes |
|---|---|---|
| Changes shipped | 318 | Over 8 weeks |
| Individual commits | 548 | One developer |
| Files touched | 711 | Code, templates, automation configs |
| Lines of code added (net) | +136,573 | 152k added, 16k removed |
| Management tools built | 18 | For fetching, processing, publishing, translating data |
| Automated jobs | 24 | Up from 3 eight weeks ago |
Quality
| What | Count | Notes |
|---|---|---|
| Automated tests | 3,085 | Across 97 test files |
| Test pass rate | 100% | Enforced on every code change |
| Code warnings | 0 | Down from 101 |
| Pre-submit checks | 8 | Style, formatting, secrets detection, etc. |
| Duplicate detection steps | 4 | URL match → same-source → similar text → AI matching |
| Duplicate matching window | 5 days | Increased from 3 to catch more matches |
User-Facing
| What | Count | Notes |
|---|---|---|
| Active crime maps | 2 | Sea Point, Hout Bay |
| Languages | 9 | English + 8 translations |
| Avg. incidents per map | ~57 | Based on current data |
| Paid subscribers | Not yet tracked | Available in Stripe dashboard |
| Monthly visitors | Not yet tracked | Available in Google Analytics |
Numbers we still need to collect:
- Number of paying subscribers and monthly revenue (from Stripe)
- Website traffic: page views, popular pages, bounce rate (from Google Analytics)
- How many incidents each map actually has in production
- Success rate of automated jobs
- Error trends over time
9 Next 30 Days
Priorities
| Urgency | Task | Depends On | Status |
|---|---|---|---|
| Urgent | Run duplicate cleanup on all live maps | Recent bug fixes merged | Ready |
| Urgent | Check that Sea Point "Daylight robbery" duplicate is merged | Cleanup must run first | Waiting |
| Urgent | Check that Hout Bay murder duplicates are merged | Cleanup must run first | Waiting |
| Soon | Verify old incidents (pre-2024) are filtered out on live maps | Database update must run | Waiting |
| Soon | Visual check: do Hangberg & Imizamo Yethu boundaries look right? | Code deployed | Waiting |
| Soon | Decide whether to activate Camps Bay | Check how much data exists | Not started |
| Later | Track how users interact with the maps | — | Not started |
| Later | Monitor AI costs | — | Not started |
| Later | Set up a test environment (so changes don't go straight to the live site) | Infrastructure planning | Not started |
Week by Week
10 Key Decisions Made
| When | Decision | Why | Result |
|---|---|---|---|
| Jan | Rename to CapeTownData | Broader appeal beyond expats | New brand, new domain, new look |
| Jan | Move data to cloud storage | Saving data in code files caused conflicts | All data now in Cloudflare R2 |
| Jan | Add automated testing pipeline | No way to catch bugs before they reached users | 3,085 tests run on every code change |
| Feb | Build 4-step duplicate detection | Same crime from different newspapers showed as separate pins | Working well — catches duplicates even with different wording |
| Feb | Lock new articles until reviewed | Prevent unverified AI-classified articles from going live automatically | All new articles require explicit approval |
| Feb | Shut down Walmer Estate | Not enough crime data to make the report useful | Shut down cleanly, resources freed up |
| Feb | Allow reports without police stats | Not every neighbourhood has SAPS data yet | Hout Bay launched with news data only |
| Feb | Add AI duplicate matching (step 4) | Some articles use completely different words for the same crime | Catches extra matches, but steps 1–3 handle 90%+ |
| Mar | Widen matching window from 3 to 5 days | Newspapers sometimes report the same story 4+ days apart | Fixed a real missed duplicate. Added tests. |
| Mar | Use real map data instead of circles for settlements | Circles don't match the actual shape of Hangberg and Imizamo Yethu | Accurate boundaries from OpenStreetMap |
| Mar | Only show precise boundaries to paying users | Free tier should show approximate data only | Enforced with automated tests |
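The last decision (precise boundaries for paying users, approximate data for the free tier) can be enforced in one place, with a test that fails if the precise polygon ever reaches a free user. The Python sketch below shows the idea under stated assumptions. The `boundary_for_user` function, the zone dictionary keys, and the coordinate values are hypothetical placeholders, not the project's real data model.

```python
from typing import Any, Dict


def boundary_for_user(zone: Dict[str, Any], is_paying: bool) -> Dict[str, Any]:
    """Return the boundary a user is allowed to see.

    Paying users get the precise polygon; everyone else gets a rough circle.
    """
    if is_paying:
        return {"kind": "polygon", "coordinates": zone["polygon"]}
    # Free tier: build the response from the approximate fields only, so the
    # precise polygon is never included by accident.
    return {
        "kind": "circle",
        "centre": zone["centre"],
        "radius_m": zone["approx_radius_m"],
    }


# Placeholder zone with made-up coordinates, for illustration only.
example_zone = {
    "polygon": [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)],
    "centre": (0.5, 0.5),
    "approx_radius_m": 500,
}

free_view = boundary_for_user(example_zone, is_paying=False)
paid_view = boundary_for_user(example_zone, is_paying=True)

# The kind of automated check that guards against leaking paid content.
assert "coordinates" not in free_view
assert paid_view["coordinates"] == example_zone["polygon"]
```

Building the free-tier response from scratch, rather than deleting fields from the full record, means a newly added precise field stays hidden by default.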
11 One-Page Summary
Where We Are
CapeTownData shows neighbourhood-level crime data for Cape Town. In 8 weeks, we went from one manually updated map (Sea Point) to an automated system with 2 live neighbourhoods (Sea Point and Hout Bay) and 1 more (Camps Bay) set up and waiting for enough data. The system collects news articles, classifies the type of crime they describe, removes duplicates, places them on the map, translates them into 8 languages, and publishes, with no manual data entry. 24 automated jobs keep everything running.
Biggest Wins
Crime maps update themselves. No more manual data entry. News articles flow from Google News to the live map automatically.
Quality checks are built in. 3,085 automated tests run on every change. Code style, type safety, and secret detection are all enforced.
Data is stored properly. Cloud storage replaced file-based storage, eliminating conflicts and accidental overwrites.
Mountain passes page works. Went from an empty page to live road conditions in 8 languages, updating every 10 minutes.
Biggest Risks
One developer. Everything depends on one person. If they're unavailable, nothing gets updated.
No test environment. Code changes go directly to the live website. If something breaks, users see it immediately.
AI costs aren't monitored. Each run is cheap ($0.001–0.005) but there's no alert if costs suddenly spike.
Next Steps
- Run the duplicate cleanup on all live maps to apply recent fixes.
- Visually check every map: are duplicates merged? Are safety zones in the right place? Are old incidents filtered out?
- Decide whether Camps Bay has enough data to launch.
- Start tracking AI costs and user behaviour on maps.
- Plan a test environment so changes can be verified before going live.