Jobzyl
Unified job-search aggregator with ATS resume matching
- Role: Solo full-stack project
- Status: Shipped
- Primary stack: Next.js · Supabase · FastAPI · AWS
What this project tackles
Job search across a half-dozen boards is a full-time data-collection job before it's a job-search activity. Each platform has its own filters and update cadence, and each is opaque in its own way about how its ATS scoring treats your CV. Aggregator products exist, but they're either unauthenticated ad farms or so slow that the data is stale by the time it loads.
The interesting full-stack problem isn't the scraping itself — it's making a real-time, multi-tenant aggregator with row-level security, live progress streaming, and client-side ATS scoring that doesn't ship the candidate's resume to a server. The privacy-first ATS scoring was the differentiator.
System design
Six job boards are aggregated: four scraped in parallel (Indeed, Google Jobs, Glassdoor, ZipRecruiter) and two queried via official APIs (Reed, Adzuna). The scrape layer is a FastAPI service on AWS App Runner with per-board rate limits and scheduled re-scrapes every six hours for cache warming. Server-Sent Events stream search progress back to the client as results arrive, so users see jobs populate live instead of waiting for a single bulk response.
Storage is Supabase with row-level security on all 11 tables. The client never gets bare PostgreSQL access; every read and write goes through RLS policies tied to the authenticated user's UUID. Auth supports email plus Google and LinkedIn OAuth via the PKCE flow.
ATS scoring runs entirely client-side. The user's CV is parsed in-browser, keywords are extracted with a small NLP routine, and each job card displays a match score computed locally. The CV never leaves the device.
Application tracking is a Kanban board with the standard pipeline (Saved → Applied → Interview → Offer → Rejected), alongside side-by-side job comparison and a persistent audit log.
The admin layer is a separate authenticated dashboard for search analytics, manual scrape triggers, and audit log review.
Key technical decisions
— Parallel scraping over sequential
Querying six boards sequentially is a four-minute search. Running them in parallel with per-board concurrency limits covers the same ground in under 30 seconds. The scaling cost is rate-limit management, which is a one-time engineering investment rather than a per-search cost.
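A minimal sketch of the fan-out pattern, assuming an asyncio-based scrape layer. The per-board limits, the `Job` shape, and `fetch_page` (a stand-in that sleeps instead of hitting a real board) are all hypothetical; the production scrapers and their actual limits are not shown here.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Job:
    board: str
    title: str


# Hypothetical per-board concurrency caps; real values depend on each board.
BOARD_LIMITS = {
    "indeed": 2, "google_jobs": 2, "glassdoor": 1,
    "ziprecruiter": 1, "reed": 4, "adzuna": 4,
}


async def fetch_page(board: str, query: str, page: int) -> list[Job]:
    """Stand-in for a real scraper or API call (assumption, not the real code)."""
    await asyncio.sleep(0.01)
    return [Job(board, f"{query} #{page}")]


async def search_board(board: str, query: str, pages: int, limit: int) -> list[Job]:
    # One semaphore per board caps how many of its requests run at once.
    sem = asyncio.Semaphore(limit)

    async def one(page: int) -> list[Job]:
        async with sem:
            return await fetch_page(board, query, page)

    results = await asyncio.gather(*(one(p) for p in range(1, pages + 1)))
    return [job for page_jobs in results for job in page_jobs]


async def search_all(query: str, pages: int = 3) -> list[Job]:
    # All boards run concurrently; a failing board is skipped, not fatal.
    tasks = [search_board(b, query, pages, lim) for b, lim in BOARD_LIMITS.items()]
    per_board = await asyncio.gather(*tasks, return_exceptions=True)
    jobs: list[Job] = []
    for result in per_board:
        if isinstance(result, Exception):
            continue
        jobs.extend(result)
    return jobs


if __name__ == "__main__":
    found = asyncio.run(search_all("python developer", pages=2))
    print(len(found), "jobs from", len({j.board for j in found}), "boards")
```

Keeping the semaphore per board rather than global means a slow or strictly limited board does not throttle the others, which is where the wall-clock saving over a sequential loop comes from.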
— Client-side ATS scoring
Server-side scoring would let me cache results and run more sophisticated extraction, but it would also mean storing every CV ever uploaded. Privacy is the actual product feature here, not a compliance afterthought.
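The scoring logic itself can stay small enough to run locally. The sketch below is written in Python only for consistency with the other examples; the shipped version runs in the browser. The tokenizer, the stopword list, and the overlap formula are all assumptions, since the actual NLP routine is not described here.

```python
import re

# Tiny illustrative stopword list (assumption; a real one would be longer).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "with",
             "on", "or", "we", "you", "is", "are"}


def extract_keywords(text: str) -> set[str]:
    """Lowercase, tokenize, and drop stopwords and one-character tokens."""
    # Keep '+', '#', and '.' inside tokens so 'c++', 'c#', 'node.js' survive.
    raw = re.findall(r"[a-z][a-z0-9+#.]*", text.lower())
    tokens = (t.rstrip(".") for t in raw)
    return {t for t in tokens if len(t) > 1 and t not in STOPWORDS}


def match_score(cv_text: str, job_text: str) -> int:
    """Percentage of the job's keywords that also appear in the CV."""
    job_keywords = extract_keywords(job_text)
    if not job_keywords:
        return 0
    cv_keywords = extract_keywords(cv_text)
    return round(100 * len(job_keywords & cv_keywords) / len(job_keywords))


if __name__ == "__main__":
    cv = "Python developer with FastAPI and PostgreSQL experience"
    job = "Looking for a Python developer, FastAPI required"
    print(match_score(cv, job))
```

Because the function only needs the CV text and the job description already on the page, nothing about the CV has to be sent over the network to produce the score.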
— SSE over WebSocket
Search progress is a one-way push from server to client. SSE is a single HTTP request, auto-reconnects, plays nicely with HTTP/2, and doesn't require the WebSocket upgrade dance. WebSocket would be over-spec for the data flow.
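To make the one-way push concrete, here is a sketch of SSE framing using only the standard library. The event names (`board_done`, `complete`) and payload fields are hypothetical. In a FastAPI service, an async generator like this could be handed to `StreamingResponse` with `media_type="text/event-stream"`; that wiring is left out so the block runs without dependencies.

```python
import asyncio
import json
from typing import AsyncIterator


def sse_event(event: str, data: dict) -> str:
    """Frame one Server-Sent Event: 'event:' and 'data:' lines, then a blank line."""
    # json.dumps emits no newlines by default, so one 'data:' line is enough.
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


async def search_progress(boards: list[str]) -> AsyncIterator[str]:
    """Yield one event per finished board, then a final 'complete' event."""
    for done, board in enumerate(boards, start=1):
        await asyncio.sleep(0.01)  # stand-in for waiting on that board's results
        yield sse_event("board_done",
                        {"board": board, "done": done, "total": len(boards)})
    yield sse_event("complete", {"total": len(boards)})


async def collect(boards: list[str]) -> list[str]:
    # Helper for trying the generator outside a web server.
    return [chunk async for chunk in search_progress(boards)]


if __name__ == "__main__":
    for chunk in asyncio.run(collect(["reed", "adzuna"])):
        print(chunk, end="")
```

On the browser side, the standard `EventSource` API consumes a stream in this format and handles reconnection on its own, which is the main reason the plain-HTTP option is enough for progress updates.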
— Supabase RLS over custom auth
Building auth correctly is a months-long project on its own and a constant security liability. RLS at the database layer means the rules are enforced regardless of which API path forgets to check them. Defense in depth that costs almost no engineering time.
What it delivers
Six job boards aggregated in under 30s with live SSE progress, 11 RLS-locked Supabase tables, 100% client-side ATS scoring (the CV never leaves the browser), scheduled re-scrapes every 6 hours, and a Kanban-style application tracker with side-by-side job comparison.
Operationally: PKCE OAuth for Google and LinkedIn, an admin dashboard with a persistent audit log, and search analytics that show which boards return useful results per query type.
What I'd do next
The interesting next step is shifting some scoring server-side without breaking the privacy promise: federated or homomorphic patterns where the CV embedding stays local but the score computation can use job embeddings computed on the server. Probably not worth it for v1, but potentially the next moat.