Commercial classifieds platform for imported cars — a Go backend with a deep five-level catalog, a concurrent Go scraper with LLM-powered data structuring, and a Next.js frontend with proper grammatical-case handling for its target language.
- The Problem
- Overview
- Architecture
- The Scraping Pipeline — Deep Dive
- Backend Service
- Frontend — Next.js 15
- Infrastructure & Deployment
- Tech Stack
- What I Learned
- Status
A classifieds platform for imported cars lives or dies on the quality of its data, and the data never arrives clean. Listings come from external sources written in a foreign language, with free-text descriptions, inconsistent spec formats, and catalog identifiers that do not match any normalized database. Prices are quoted in the source currency and have to be converted to the target currency with reasonably current rates. The target audience speaks a language that has grammatical gender, six cases, and three plural forms — so even something as simple as "3 cars found" requires grammatical machinery that most i18n libraries won't give you. None of this work is visible to a buyer scrolling through a results page, but all of it has to be right before the page is worth scrolling.
Araxes is the platform that does that work. It ingests ads from external sources, translates and structures them through a multi-stage concurrent pipeline with LLM-backed field extraction, normalizes them into a five-level car catalog with five specialised spec tables per complectation, serves them through a typed REST API with Meilisearch-backed relaxed search, and renders them in a frontend that knows how to decline a car model name in the genitive case.
Araxes is a three-component commercial project — one backend, one scraper, one frontend — all written personally by me and deployed behind Traefik on a Russian cloud provider's infrastructure at araxes.ru. Each component is a separate repository coordinated by shared conventions and tern migrations:
- Backend: Go 1.24, Fiber v2, pgx/v5 with go-jet for type-safe generated queries and Masterminds/Squirrel for dynamic ones, Redis for cache, Meilisearch for full-text search, JWT for auth, gocron/v2 for scheduled work, OpenTelemetry for tracing, 40+ REST endpoints across ads / search / cars catalog / specs / locations / currencies.
- Scraper: Go 1.24, six specialised CLI binaries sharing one internal package tree, with chromedp for headless-Chrome scraping of JS-rendered pages, PuerkitoBio/goquery for DOM parsing, openai-go/v2 and groq-go for LLM-based structured data extraction driven by invopop/jsonschema, a self-hosted LibreTranslate instance as the cheaper first-choice translator with an LLM fallback, aws-sdk-go-v2/s3 for media uploads, and a gorilla/websocket backed web control surface for live operation.
- Frontend: Next.js 15.4, React 19.1, Tailwind v4, shadcn-ui on top of Radix, next-auth v4, TanStack Table and Virtual for admin grids, zod for validation, and (the interesting bit) petrovich and pluralize-ru for grammatically-correct rendering of model names and result counts in the target language.
The interesting technical story is the scraping pipeline. The backend is a disciplined layered REST service — clean, tested, and deliberately boring in the way backends should be. The frontend is a polished Next.js app with one unusual grammar-handling quirk. But the scraper is where the engineering lives: it is a hand-built concurrent Go pipeline with priority-draining queues, pre-qualification filters, panic-recovering workers, LLM-driven data structuring, and an operational control plane exposed over WebSocket. That is the part this README spends the most time on.
Three diagrams: how the three components fit together and exchange data, how the scraper's internal pipeline flows from an external URL to a saved row, and how the backend is layered from HTTP handler to database row.
The three components are loosely coupled: the scraper writes into the shared Postgres + Meilisearch + S3 storage, and the backend reads from that storage to serve the frontend. The only synchronous coupling is at the edge — Traefik terminates TLS and routes public traffic across the backend, the frontend, and a media proxy. The Central Bank rate feed is a scheduled pull on the backend's own timer, not driven by any user request.
%%{init: {'theme':'dark', 'themeVariables': {'fontSize':'14px', 'fontFamily':'Inter, system-ui, sans-serif'}}}%%
flowchart LR
User([Buyer])
Admin([Operator])
subgraph Edge["Edge"]
direction TB
Traefik[Traefik<br/>Let's Encrypt<br/>4 subdomains]
end
subgraph App["Application Layer"]
direction TB
Frontend[Next.js 15<br/>React 19 · Tailwind v4<br/>shadcn-ui]
Backend[Backend Service<br/>Fiber v2 · Go 1.24]
ScraperUI[Scraper Control UI<br/>WebSocket-driven]
end
subgraph Scraper["Scraper — 6 binaries"]
direction TB
Pipeline[Concurrent Pipeline<br/>priority queue · panic respawn]
AI[AI Structurer<br/>OpenAI · Groq · LibreTranslate]
end
subgraph Stateful["Shared Storage"]
direction TB
PG[(PostgreSQL<br/>catalog · ads · users<br/>25+ tern migrations)]
Redis[(Redis<br/>cache · rate-limit)]
Meili[(Meilisearch<br/>ads index)]
S3[(Object Storage<br/>ad photos)]
end
CB[Central Bank<br/>currency feed]
Sources[External Ad Sources<br/>JS-rendered listings]
User ==> Traefik
Admin ==> Traefik
Traefik --> Frontend
Traefik --> Backend
Traefik --> ScraperUI
Frontend --> Backend
Backend --> PG
Backend --> Redis
Backend --> Meili
Backend -. cron pull .-> CB
ScraperUI <==> Pipeline
Pipeline --> Sources
Pipeline --> AI
Pipeline --> PG
Pipeline --> Meili
Pipeline --> S3
classDef user fill:#0f172a,stroke:#64748b,color:#e2e8f0,stroke-width:2px
classDef edge fill:#581c87,stroke:#c084fc,color:#f3e8ff,stroke-width:3px
classDef app fill:#14532d,stroke:#4ade80,color:#dcfce7,stroke-width:2px
classDef scraper fill:#134e4a,stroke:#2dd4bf,color:#ccfbf1,stroke-width:2px
classDef state fill:#7f1d1d,stroke:#f87171,color:#fee2e2,stroke-width:2px
classDef ext fill:#4a044e,stroke:#d946ef,color:#fae8ff,stroke-width:2px
class User,Admin user
class Traefik edge
class Frontend,Backend,ScraperUI app
class Pipeline,AI scraper
class PG,Redis,Meili,S3 state
class CB,Sources ext
Inside the scraper's main ads binary. A scraping cycle is driven by a ticker on the main loop, which fills the cards channel with new URLs to process. Three worker pools drain channels downstream: details workers fetch and parse each ad's full page through headless Chrome, processing workers call the AI structurer and translator, and the saving worker commits the normalized result into Postgres, Meilisearch, and S3 in one transaction. A separate manual channel feeds priority URLs that an operator has submitted through the WebSocket control UI; the details workers drain it non-blockingly ahead of the scheduled cycle, so manual tasks always jump the queue.
%%{init: {'theme':'dark', 'themeVariables': {'fontSize':'14px', 'fontFamily':'Inter, system-ui, sans-serif'}}}%%
flowchart TB
Op([Operator])
Ticker[Ticker<br/>scheduled cycles]
subgraph Ingress["Ingress"]
direction LR
Manual[/manual cards<br/>priority queue/]
Auto[/auto cards<br/>from scheduled scrape/]
end
subgraph Details["Details Workers · pool of N"]
direction TB
Prequal[Pre-qualification<br/>resolve ModID and ComID<br/>by title before scraping]
Chrome[Headless Chrome<br/>chromedp · rotating agents]
Parse[DOM parsing<br/>goquery]
end
subgraph Processing["Processing Workers · pool of N/2"]
direction LR
Translate[Translator<br/>LibreTranslate + LLM fallback]
Structurer[AI Structurer<br/>JSON Schema outputs<br/>batch manager]
end
subgraph Saving["Saving Worker · singleton"]
direction LR
DB[(PostgreSQL<br/>normalized row)]
S3[(Object Storage<br/>media upload)]
MS[(Meilisearch<br/>search index)]
end
Op -.-> Manual
Ticker -.-> Auto
Manual ==> Prequal
Auto ==> Prequal
Prequal --> Chrome
Chrome --> Parse
Parse --> Translate
Translate --> Structurer
Structurer --> DB
Structurer --> S3
Structurer --> MS
Panic[[Panic Recovery<br/>respawn workers<br/>on any crash]]
Panic -. protects .-> Details
Panic -. protects .-> Processing
Panic -. protects .-> Saving
classDef user fill:#0f172a,stroke:#64748b,color:#e2e8f0,stroke-width:2px
classDef queue fill:#713f12,stroke:#facc15,color:#fef9c3,stroke-width:2px
classDef worker fill:#134e4a,stroke:#2dd4bf,color:#ccfbf1,stroke-width:2px
classDef state fill:#7f1d1d,stroke:#f87171,color:#fee2e2,stroke-width:2px
classDef panic fill:#581c87,stroke:#c084fc,color:#f3e8ff,stroke-width:3px
class Op user
class Ticker,Manual,Auto queue
class Prequal,Chrome,Parse,Translate,Structurer worker
class DB,S3,MS state
class Panic panic
A classical layered Go backend, deliberately kept simple. Every request enters through Fiber v2 middleware (recovery, CORS, health check, rate limit, JWT auth, request ID), hits a typed handler, which calls a service, which calls a repository, which talks to Postgres (through go-jet's generated type-safe queries or Squirrel's dynamic builder depending on whether the shape is fixed or computed), plus Redis for cache and Meilisearch for search. A separate CurrencyManager runs on its own schedule through gocron/v2 and writes updated rates into the same Postgres.
%%{init: {'theme':'dark', 'themeVariables': {'fontSize':'14px', 'fontFamily':'Inter, system-ui, sans-serif'}}}%%
flowchart TB
Req([HTTP / WebSocket])
Fiber[Fiber v2<br/>recover · CORS · health<br/>rate-limit · JWT · requestid]
subgraph Layers["Core Layers — internal/core"]
direction LR
H[Handlers<br/>ads · auth · cars<br/>search · specs · location]
S[Services<br/>ads · search · currency<br/>cars · location · users<br/>analytics · cache]
R[Repository<br/>pgx + go-jet + Squirrel<br/>typed models]
end
subgraph Infra["Infrastructure — pkg"]
direction LR
Clients[Clients<br/>postgres · redis · meili]
JWT[JWT Service]
Log[Observability<br/>zap + OTel]
end
subgraph Stateful["Data Stores"]
direction LR
PG[(PostgreSQL<br/>25+ tern migrations)]
Redis[(Redis<br/>cache)]
Meili[(Meilisearch<br/>ads index)]
end
Cron[[Currency Manager<br/>gocron · pulls CB rates]]
Req --> Fiber
Fiber --> H
H --> S
S --> R
R --> Clients
Clients --> PG
Clients --> Redis
Clients --> Meili
S -. uses .-> JWT
Layers -. traces .-> Log
Cron --> R
classDef req fill:#0f172a,stroke:#64748b,color:#e2e8f0,stroke-width:2px
classDef fiber fill:#581c87,stroke:#c084fc,color:#f3e8ff,stroke-width:3px
classDef layer fill:#14532d,stroke:#4ade80,color:#dcfce7,stroke-width:2px
classDef infra fill:#7c2d12,stroke:#fb923c,color:#ffedd5,stroke-width:2px
classDef state fill:#7f1d1d,stroke:#f87171,color:#fee2e2,stroke-width:2px
classDef cron fill:#134e4a,stroke:#2dd4bf,color:#ccfbf1,stroke-width:3px
class Req req
class Fiber fiber
class H,S,R layer
class Clients,JWT,Log infra
class PG,Redis,Meili state
class Cron cron
This is the section that makes Araxes technically interesting. Everything else is well-executed craft; this is the engineering that would not exist if I had grabbed Colly and called it a day.
The scraper is not one program — it is six specialised CLI binaries sharing one internal module tree:
| Binary | Responsibility |
|---|---|
| ads-scraper | The main ad pipeline. Walks external listings, fetches each ad's detail page, processes and saves. This is the one that runs continuously. |
| brands-scraper | One-shot scrape of the source catalog's brand index. |
| brands-parser | Parses the scraped brand data into the normalized brands / models / generations tables. |
| specs-scraper / specs-scraper-v2 | Two generations of the car-specification scraper. The v2 rewrite uses a cleaner pipeline shape and retired the v1 approach; both remain in the tree because the v1 output is still useful as a ground-truth dataset for validating the v2. |
| matches-indexer | Post-processing step that correlates scraped ad modifications with the canonical catalog rows, populating the modification_matches join table that powers the search experience. |
Splitting into specialised binaries matters because the scraping domain has very different runtime profiles: the brand index changes roughly monthly, specs change per model-year, and ads arrive constantly. Running them all inside one long-lived process would force the cadence of one job onto the others, which is how you end up re-scraping a million ad pages because you wanted to refresh one brand's spec sheet. Keeping them apart is not micro-service cosplay; it is acknowledging that they are genuinely different jobs.
The ads-scraper binary runs the most intricate of the six. It is a four-stage pipeline with three worker pools, coordinated by a handful of buffered channels and a weighted semaphore, and every single stage is designed to survive failures that production scraping always produces: timeouts, crashed browser tabs, rate-limited source pages, parser panics on malformed DOM, LLM API outages.
Four channels, three worker pools. A ticker on the main loop runs a scrape cycle every configured interval: fetch the current card listings, dedupe against already-processed IDs, and push new cards onto the cards channel. A pool of details workers pulls from cards, acquires a slot on the shared semaphore, drives chromedp to load the detail page, parses the DOM with goquery, and forwards the result onto the details channel. A smaller pool of processing workers drains details, runs translation and LLM structuring, and emits a fully-populated record onto the processed channel. A single saving worker drains processed and commits each record across Postgres, Meilisearch, and S3 in one transaction. Each pool size is configurable, but the shapes are deliberately asymmetric — more details workers than processing workers, because chromedp is the bottleneck, and a single saving worker because the write path is naturally serialised around the database transaction.
Priority draining for manual URLs. A parallel manual channel exists alongside cards. The scraper's web control UI lets an operator paste a specific ad URL and watch it be processed end-to-end in real time — useful for debugging a failing source page, re-running a suspect listing, or hand-verifying that a recent parser fix actually works. When a details worker is ready for the next task, it drains the manual channel non-blockingly first, and only falls through to the scheduled cards queue when manual is empty. Manual tasks therefore always jump the queue without starving the scheduled cycle, and the operator gets first-class latency on the URL they just asked about. A ManualTaskTracker interface emits started / completed events so the web UI can stream status back to the operator over WebSocket.
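The drain order described above can be sketched with two `select` statements. This is a minimal illustration, not the scraper's actual code: the `Card` type, the `nextCard` helper, and the `done` channel are hypothetical names, and the sketch assumes the channels are not closed while workers are still reading from them.

```go
package main

// Card is a hypothetical unit of work: one ad URL to scrape.
type Card struct {
	URL    string
	Manual bool
}

// nextCard returns the next card for a details worker. It first polls the
// manual channel without blocking; only when nothing is waiting there does it
// block on both channels. The bool is false once done is closed.
func nextCard(done <-chan struct{}, manual, cards <-chan Card) (Card, bool) {
	// Non-blocking poll: manual work always wins when it is available.
	select {
	case c := <-manual:
		return c, true
	default:
	}
	// Nothing urgent: wait for whichever source produces work first.
	select {
	case c := <-manual:
		return c, true
	case c := <-cards:
		return c, true
	case <-done:
		return Card{}, false
	}
}
```

A worker loop would call `nextCard` once per iteration. Because the first `select` has a `default` branch, a busy scheduled queue can never delay a manual URL by more than the task the worker is currently finishing.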
Panic-recovering workers. Production scraping means strange, unreliable inputs — truncated HTML, encoding artifacts, source-page layout changes, LLM responses that violate their own JSON schema. A single unhandled panic in a worker goroutine should not take down the pipeline, so every worker main-loop is wrapped in a recoverAndRespawn helper that catches the panic, logs the stack trace, increments an error counter, and respawns a fresh goroutine for the same worker before returning. The pipeline self-heals instead of grinding to a halt, and the error count surfaces in the stats panel so an operator notices the bleed.
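A recover-and-respawn wrapper along these lines might look like the sketch below. The `Supervisor` type and its methods are illustrative names, not the real `recoverAndRespawn` helper, whose signature is not shown here.

```go
package main

import (
	"log"
	"runtime/debug"
	"sync"
	"sync/atomic"
)

// Supervisor tracks worker goroutines and counts recovered panics.
type Supervisor struct {
	wg     sync.WaitGroup
	panics atomic.Int64
}

// Go runs work in a tracked goroutine. If work panics, the panic is
// recovered, the stack trace is logged, the counter is incremented, and a
// fresh goroutine is started for the same worker before the old one exits.
func (s *Supervisor) Go(name string, work func()) {
	s.wg.Add(1)
	go func() {
		defer s.wg.Done() // runs last, after the respawn has been registered
		defer func() {
			if r := recover(); r != nil {
				s.panics.Add(1)
				log.Printf("worker %s panicked: %v\n%s", name, r, debug.Stack())
				s.Go(name, work) // respawn so the pool never shrinks
			}
		}()
		work()
	}()
}

// Wait blocks until every worker, including respawned ones, has returned.
func (s *Supervisor) Wait() { s.wg.Wait() }

// Panics reports how many panics have been recovered so far.
func (s *Supervisor) Panics() int64 { return s.panics.Load() }
```

The deferred functions run in reverse order, so the replacement goroutine is added to the WaitGroup before the crashed one is marked done. A worker that panics on every input would respawn in a tight loop, so a real implementation would probably add a backoff or a retry cap.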
Pre-qualification. Before ever invoking chromedp on a card's detail page, the worker tries to resolve the ad's ModID and ComID (modification and complectation primary keys) from the listing-level title alone, which is the one field available without a second HTTP request. If the resolution succeeds, the details scrape goes ahead; if it fails — meaning this listing doesn't match any known car in the catalog — the card is marked processed, the skip counter is incremented, and the worker moves on without ever spending browser time on the detail page. On a noisy source with broad model coverage, this pre-qualification is what turns the pipeline from "running chromedp on every garbage ad" into "running chromedp only on ads we already know we can place in the catalog".
Bounded concurrency via semaphore.Weighted. The total number of simultaneously-in-flight chromedp browser instances is capped at the configured concurrency limit, regardless of how many workers are spinning. This prevents a surge of incoming manual tasks from accidentally fork-bombing headless Chrome.
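The same bound can be illustrated with the standard library alone. The sketch below swaps in a buffered channel as a counting semaphore, which behaves like `semaphore.Weighted` when every acquisition has weight 1; the `browserSlots` type and `withBrowser` helper are hypothetical names.

```go
package main

import "context"

// browserSlots is a counting semaphore built on a buffered channel. It is a
// standard-library stand-in for semaphore.Weighted with all weights equal to 1.
type browserSlots chan struct{}

func newBrowserSlots(limit int) browserSlots { return make(browserSlots, limit) }

// acquire blocks until a slot is free or ctx is cancelled.
func (s browserSlots) acquire(ctx context.Context) error {
	select {
	case s <- struct{}{}:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// tryAcquire takes a slot only if one is free right now.
func (s browserSlots) tryAcquire() bool {
	select {
	case s <- struct{}{}:
		return true
	default:
		return false
	}
}

// release frees a slot previously taken by acquire or tryAcquire.
func (s browserSlots) release() { <-s }

// withBrowser runs fetch while holding one slot, so at most cap(s) fetches
// are in flight no matter how many workers call it concurrently.
func withBrowser(ctx context.Context, s browserSlots, fetch func(context.Context) error) error {
	if err := s.acquire(ctx); err != nil {
		return err
	}
	defer s.release()
	return fetch(ctx)
}
```

Every details worker, whether it picked up a manual or a scheduled card, would go through `withBrowser`, which is what keeps a burst of manual submissions from opening more browser instances than the limit allows.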
Graceful shutdown. Stop signals flip the running flag, cancel the root context, wait on the worker WaitGroup with a thirty-second timeout, and close all four channels in order. If the workers don't all finish within the timeout, the scraper logs a warning and forces the shutdown — the one case where data loss is acceptable because the alternative is blocking an operator's Ctrl-C forever.
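A bounded wait of this kind is usually built by racing `WaitGroup.Wait` against a timer. The sketch below uses a single channel and hypothetical names (`pipeline`, `startPipeline`, `stop`). As a choice of this sketch, it closes the channel only after a clean exit, since a worker that is still running could otherwise send on a closed channel.

```go
package main

import (
	"context"
	"sync"
	"time"
)

// pipeline is a hypothetical, single-channel stand-in for the real scraper.
type pipeline struct {
	cancel context.CancelFunc
	wg     sync.WaitGroup
	cards  chan string
}

// startPipeline launches workers that consume cards until the context ends.
func startPipeline(workers int) *pipeline {
	ctx, cancel := context.WithCancel(context.Background())
	p := &pipeline{cancel: cancel, cards: make(chan string, 16)}
	for i := 0; i < workers; i++ {
		p.wg.Add(1)
		go func() {
			defer p.wg.Done()
			for {
				select {
				case <-ctx.Done():
					return
				case <-p.cards: // a real worker would scrape the card here
				}
			}
		}()
	}
	return p
}

// stop cancels the root context and waits up to timeout for the workers.
// It reports whether they all exited in time.
func (p *pipeline) stop(timeout time.Duration) (clean bool) {
	p.cancel()
	done := make(chan struct{})
	go func() {
		p.wg.Wait()
		close(done)
	}()
	select {
	case <-done:
		close(p.cards) // safe: no worker is left to touch the channel
		return true
	case <-time.After(timeout):
		return false // forced shutdown: the caller logs a warning and exits
	}
}
```

With a thirty-second budget the call would be `p.stop(30 * time.Second)`, and a `false` result corresponds to the forced-shutdown path described above.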
Somewhere between "the DOM parsed cleanly" and "the row landed in Postgres" the pipeline has to deal with one unavoidable reality: the source ad is written in free-form natural language. The mileage might be "45,000 km" or "45K km" or "45 тыс. км"; the engine might be a 2.0L turbo or a 2.0T or just 2.0; the trim level might be named half a dozen different ways. No amount of regex survives this. So the pipeline hands each scraped ad to an AI structurer whose job is to turn the free-form blob into a typed record.
The structurer wraps two LLM clients — openai-go/v2 and groq-go — behind one Integrator abstraction, and drives both with invopop/jsonschema-generated JSON Schemas that describe the exact shape of the normalized ad row. The LLM is told "respond with JSON that matches this schema" instead of being asked to write free prose, so the output slots directly into the Go struct with no post-processing. When the LLM produces a field the schema doesn't expect, the unmarshaller rejects the response and the pipeline falls back to either a cheaper model or a lower-confidence parse path.
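One detail worth illustrating: `encoding/json` silently ignores unknown fields by default, so rejecting them has to be enabled on the decoder. The sketch below shows one standard-library way to do that. The `StructuredAd` struct and its fields are hypothetical, and how the real structurer performs this check is not shown here.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
)

// StructuredAd is a hypothetical, trimmed-down normalized ad record.
type StructuredAd struct {
	MileageKm    int     `json:"mileage_km"`
	EngineLiters float64 `json:"engine_liters"`
	Turbo        bool    `json:"turbo"`
	Trim         string  `json:"trim"`
}

// decodeStrict unmarshals an LLM reply into StructuredAd and fails on any
// field the struct does not declare, or on data after the JSON object.
func decodeStrict(reply []byte) (StructuredAd, error) {
	var ad StructuredAd
	dec := json.NewDecoder(bytes.NewReader(reply))
	dec.DisallowUnknownFields() // unknown keys become a decode error
	if err := dec.Decode(&ad); err != nil {
		return StructuredAd{}, fmt.Errorf("decode structured ad: %w", err)
	}
	// A well-formed reply contains exactly one JSON value.
	if _, err := dec.Token(); err != io.EOF {
		return StructuredAd{}, fmt.Errorf("decode structured ad: unexpected trailing data")
	}
	return ad, nil
}
```

An error from `decodeStrict` is the signal a caller would use to retry with another model or switch to a lower-confidence parse path.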
A separate batch_manager.go groups multiple ad-structuring requests into single LLM calls when the volume is there, which is the only reason the LLM cost on this project is survivable. The batch manager tracks a configurable window of pending requests, flushes when either the window fills or a timeout elapses, and returns the per-request results back to the calling workers through a fan-out. In a long scraping run, batching cuts the per-ad LLM cost meaningfully without introducing observable per-ad latency.
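The flush-on-size-or-timeout behaviour can be sketched as a single goroutine that owns the pending list. This is an illustration under assumptions, not the contents of `batch_manager.go`: `Batcher`, `Submit`, and the `call` function (standing in for one LLM request that returns results in input order) are hypothetical.

```go
package main

import "time"

// request pairs one ad's text with the channel its result is delivered on.
type request struct {
	text  string
	reply chan string
}

// Batcher groups individual requests into batches for a single backend call.
type Batcher struct {
	in chan request
}

// NewBatcher starts the batching loop. call must return one result per
// input, in the same order.
func NewBatcher(size int, wait time.Duration, call func([]string) []string) *Batcher {
	b := &Batcher{in: make(chan request)}
	go b.loop(size, wait, call)
	return b
}

// Submit blocks until the batch containing text has been processed.
// It must not be called after Close.
func (b *Batcher) Submit(text string) string {
	r := request{text: text, reply: make(chan string, 1)}
	b.in <- r
	return <-r.reply
}

// Close stops the loop after flushing whatever is still pending.
func (b *Batcher) Close() { close(b.in) }

func (b *Batcher) loop(size int, wait time.Duration, call func([]string) []string) {
	var pending []request
	var timeout <-chan time.Time // nil while nothing is pending, so it never fires

	flush := func() {
		if len(pending) == 0 {
			return
		}
		texts := make([]string, len(pending))
		for i, r := range pending {
			texts[i] = r.text
		}
		results := call(texts)
		for i, r := range pending { // fan the results back out to the callers
			r.reply <- results[i]
		}
		pending, timeout = nil, nil
	}

	for {
		select {
		case r, ok := <-b.in:
			if !ok {
				flush()
				return
			}
			pending = append(pending, r)
			if len(pending) == 1 {
				timeout = time.After(wait) // the window opens with the first request
			}
			if len(pending) >= size {
				flush()
			}
		case <-timeout:
			flush()
		}
	}
}
```

Each processing worker would call `Submit` and block until its own result arrives, so the batching stays invisible to the worker code. The worst-case added latency per ad is the `wait` window.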
Translation runs in front of the structurer — raw text first gets translated from the source language into the target language, then the structurer extracts fields from the translated text. The translation path preferentially uses a self-hosted LibreTranslate instance (open-source, unlimited, free) and falls back to an LLM-driven translation only when LibreTranslate fails or the source text is too messy for its model. The LLM path is more expensive but more tolerant of formatting noise, which matters when a source page embeds specs inside Chinese-style bracketed fragments.
Media handling is a parallel path. Each ad detail page surfaces a set of photo URLs, which the saving worker streams through the AWS S3 SDK v2 into object storage under a deterministic key scheme tied to the ad's dealer pair, updates the ad row with the uploaded URLs, and ensures the Meilisearch index carries pointers to the stored media rather than to the original source URLs (which rot).
The scraper ships with its own local web control surface exposed over gorilla/websocket. It is not a dashboard for end users — it is an operator's cockpit for the scraper itself: submit a specific URL to the priority queue, watch its journey through the pipeline stage by stage in real time, inspect the current pipeline stats (total processed, new ads, skipped, errors, last run time, average run time per cycle), configure the minimum model-year filter without a restart, and trigger a scraping cycle on demand. The ManualTaskTracker interface is what the pipeline calls to notify the UI of per-task status transitions; the UI subscribes once and receives a stream of started / completed events for every manual URL.
The scraper was designed to run continuously, autonomously, for days at a time — and the control UI is what makes that actually practical. Without it, operating a long-running scraper means grepping logs and praying; with it, you can observe the pipeline's behaviour live and correct it without stopping it.
The most interesting part of the backend is not the code — it is the schema. A car classifieds catalog has to represent the actual taxonomy of cars, which is deeper than most people expect. Araxes uses the honest five-level hierarchy:
Brand → Model → Generation → Modification → Complectation
e.g. Toyota Camry XV70 (2017-…) 2.5 Hybrid e-CVT Executive
Every level has its own table with a parent FK, its own full CRUD endpoints, and its own lookup queries in the API. A listing attaches at the complectation level, which is the leaf node — because only the complectation has enough information to answer questions like "how big is the fuel tank" or "does this trim come with a sunroof".
On top of that, every complectation gets five specification tables, each normalized into its own schema:
| Spec table | What it captures |
|---|---|
| body_specs | Body type, doors, seats, cargo volume, drag coefficient |
| engine_specs | Displacement, power, torque, fuel type, cylinder layout |
| transmission_specs | Transmission type, gear count, drivetrain |
| performance_specs | 0-100 km/h, top speed, fuel consumption (city / highway / combined) |
| dimension_specs | Length, width, height, wheelbase, ground clearance, turning radius |
A modification_options join table maps complectations to the optional equipment available on that trim, and a modification_matches table (populated by the scraper's matches-indexer binary) correlates scraped ads back to the canonical complectation rows — which is what makes "find me similar ads" and "ads for this exact complectation" actually return precise results instead of fuzzy matches.
All of this is managed through 25+ tern migrations under migrations/, applied automatically on startup by a dedicated migrator Docker image that runs once before the backend container boots.
The backend follows a standard four-layer Go architecture, and I kept it standard on purpose. A product like this lives for years, gets handed off between people, and gains dozens of endpoints over time — the less clever the architecture, the longer it stays maintainable.
- internal/core/handlers: Fiber v2 HTTP handlers. One file per domain (ads.go, auth.go, cars.go, currencies.go, location.go, search.go, specs.go), one function per endpoint, no business logic. Parse, validate, call service, serialize.
- internal/core/services: business logic. ads, analytics, cache, cars, currency, location, search, users, plus a top-level service.go that wires the others together. The CurrencyManager and SearchEngine are their own standalone services.
- internal/core/repository: data access. Per-domain files (ads.go, cars.go, currencies.go, location.go, specs.go, users.go), plus cache/ and db/ subtrees for the per-store adapter code, and an interfaces.go that fixes the contracts so services depend on interfaces and not on concrete types.
- internal/core/entities: typed domain models (ad.go, analytics.go, car.go, colors.go, currency.go, locations.go, search.go, seller.go, user.go). These are the shapes the services talk in.
Underneath all of this sits a pkg/ tree with 13 in-house libraries that every layer reuses, including clients (typed Postgres / Redis / Meili factories), collections, concurrent, dbx (my signature database layer, same one I reuse across projects), encrypts, errors, jwt, middlewares, models, observability, searchquery (the custom query builder that powers the search API's filter DSL), and validate. Raw third-party clients never leak into the handler layer: every cross-boundary call goes through something in pkg/ first.
The backend exposes ~40 REST endpoints under /api/v1, organized into seven resource groups. Below is the shape, not an exhaustive list:
| Group | Endpoints |
|---|---|
| Ads | GET /ads (faceted search), GET /ads/:id (full detail), GET /ads/:id/similar (neighbourhood search) |
| Search | GET /search/ads/count, GET /search/ads/relaxed (fallback widening), GET /search/ads/suggest (autocomplete) |
| Recommendations | GET /recommendations/ads (per-user personalised feed) |
| Cars catalog | Full CRUD on brands, models, generations, modifications, complectations — all five levels, with GET / POST / PUT per level |
| Specs | Full CRUD on the five per-complectation spec tables (body, engine, transmission, performance, dimensions) |
| Locations | GET + POST + PUT on countries and cities, with GET /countries/:id/cities for nested lookup |
| Currencies | GET /currencies/rates — the latest rates from the currency manager's most recent pull |
Authentication is JWT-based, applied as Fiber middleware on the api.Group("/api/v1") — so every endpoint below that group inherits the auth check, and unauthenticated routes sit outside the group. An X-API-Key header is required for administrative mutations on top of the JWT, so an ordinary user token cannot create a new brand or update spec tables even if it leaks.
Imported car pricing needs current exchange rates, and pricing stale by a day is an embarrassment. The backend runs a CurrencyManager service wired through gocron/v2 that fires at a configurable interval (3 hours by default), pulls the latest rates from the Central Bank's public JSON feed, and upserts them into a Postgres currencies table. The /currencies/rates endpoint reads from that table, so the user-facing response is always a cached read and never waits on an external call.
The cron lifecycle is tied to the server's own start / stop signals through the closer utility from my personal gopherbox library: currencyManager.Start() is called during NewServer() and the stop function is pushed onto the closer, so shutdown deregisters the job cleanly before the Postgres pool is closed.
The search path is two-tier. The primary GET /search/ads endpoint runs a straight typed-filter query against Postgres with all the requested constraints (brand, model, generation, price range, mileage range, location, etc.). When that query returns too few results — say the user asked for "2022 Toyota Camry, hybrid, black, Moscow, under 2M" and nothing matches — the backend falls through to the relaxed search endpoint GET /search/ads/relaxed, which reruns the query against Meilisearch with progressively dropped constraints until enough results come back. The caller gets back a result set and a list of which constraints were relaxed to get it, so the frontend can show the user "we couldn't find exactly that, here are close alternatives without the hybrid filter".
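The progressive widening can be sketched independently of the search engine behind it. In the example below, `Filters`, `relaxedSearch`, the drop order, and the `search` callback (standing in for one query that returns matching ad IDs) are all hypothetical; the real endpoint's relaxation order is not shown here.

```go
package main

// Filters is a hypothetical set of search constraints keyed by name.
type Filters map[string]string

// relaxedSearch runs search with all filters, then drops constraints in
// dropOrder one at a time until at least minResults come back. It returns
// the last result set and the names of the constraints that were removed.
func relaxedSearch(f Filters, dropOrder []string, minResults int, search func(Filters) []int) (hits []int, relaxed []string) {
	// Work on a copy so the caller's filters stay intact.
	current := make(Filters, len(f))
	for k, v := range f {
		current[k] = v
	}
	hits = search(current)
	for _, name := range dropOrder {
		if len(hits) >= minResults {
			break
		}
		if _, ok := current[name]; !ok {
			continue // this constraint was never set, nothing to relax
		}
		delete(current, name)
		relaxed = append(relaxed, name)
		hits = search(current)
	}
	return hits, relaxed
}
```

The `relaxed` slice is what lets the frontend explain the result, for example by listing the dropped filters next to the alternatives it shows.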
The autocomplete endpoint GET /search/ads/suggest is a pure Meilisearch call — it's fast, typo-tolerant, and knows how to rank by frequency of matching ads, which is exactly what Meilisearch is good at.
The frontend is a single Next.js 15.4 app under the App Router with route groups for (admin) and (ads), plus a handful of top-level marketing and SEO landing pages. Tech stack:
| Layer | Stack |
|---|---|
| Framework | Next.js 15.4 · React 19.1 · TypeScript 5 · Turbopack dev server |
| Styling | Tailwind CSS v4 · shadcn-ui on top of Radix UI · tailwind-merge · tw-animate-css |
| Auth | next-auth v4, integrated against the backend's JWT issuance |
| Forms & Validation | zod v4 end-to-end, shared schemas between form input and API contracts |
| Data UI | TanStack Table v8 + TanStack Virtual v3 for admin grids with thousands of rows |
| Icons | lucide-react + react-icons + @remixicon/react |
| Interaction | cmdk (command palette), embla-carousel-react (image sliders), vaul (mobile drawers), react-resizable-panels |
| Theming | next-themes for dark mode |
| Language tooling | petrovich and pluralize-ru — see below |
The one quirk worth pulling out is the grammatically-correct rendering of target-language text. Languages with grammatical case decline nouns according to their syntactic role: "3 cars found" needs only plural agreement with the number, while "3 of <brand>'s cars found" also needs the brand name in the genitive case, which general-purpose i18n libraries typically do not handle. petrovich declines proper names by grammatical case (used here for brand and model names), and pluralize-ru handles the three-form plural agreement (1 машина / 2 машины / 5 машин). Using both is how you make the frontend stop looking like a machine-translated knockoff, a detail that the target-language audience notices immediately even if they cannot explain why it feels wrong.
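The three-form plural rule itself is short. The sketch below writes it in Go to stay in one language with the other examples; it illustrates the standard Russian rule and is not the pluralize-ru API.

```go
package main

// pluralForm picks the Russian plural form for n: one is used for numbers
// ending in 1 (except 11), few for numbers ending in 2-4 (except 12-14),
// and many for everything else.
func pluralForm(n int, one, few, many string) string {
	if n < 0 {
		n = -n
	}
	mod10, mod100 := n%10, n%100
	switch {
	case mod10 == 1 && mod100 != 11:
		return one
	case mod10 >= 2 && mod10 <= 4 && (mod100 < 12 || mod100 > 14):
		return few
	default:
		return many
	}
}
```

Calling `pluralForm(n, "машина", "машины", "машин")` selects the form that agrees with the count, which covers the teens (11 to 14) that a naive last-digit check gets wrong.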
The App Router is organised with route groups for (admin) (admin console, gated on role) and (ads) (the ad listing and detail surface), plus top-level folders for auth/, cars/, a search podbor/ page, a kalkulyator/ (calculator) for cross-border pricing, a modelnyy-ryad/ (model lineup) catalog browser, and several SEO-optimized landing pages for long-tail search queries. Because (admin) and (ads) are route groups, their parenthesised folder names never appear in the URL: both groups share a root layout but present completely different chrome without any URL prefix.
The project ships with a full production deployment story, not just a docker run. Four moving pieces:
- Docker Compose split into local and production. deploy/local.docker-compose.yaml brings up the whole stack for development; deploy/production.docker-compose.yaml is the live prod topology with Traefik, the backend, the frontend, the migrator, and the infrastructure services. The compose files are kept separate so that iterating on dev never risks touching prod config by accident.
- Traefik as the edge router with Let's Encrypt. Terminates TLS, handles HTTP-to-HTTPS redirection, and routes traffic across four subdomains: the main site, the API, the media proxy, and the Traefik dashboard itself. Certificates are issued automatically on first boot through the ACME HTTP-01 challenge, and the letsencrypt volume persists them across restarts so that a restart does not trigger a fresh ACME issuance.
- Selectel Registry as the production container registry. A dedicated PowerShell script (deploy/push-to-selectel.ps1) handles the end-to-end flow: checking that local images exist, tagging them with both the version and latest, pushing all three images (autos-server, autos-frontend, autos-migrator) to cr.selcloud.ru/araxes-regitry, and reporting status. Cutting a new version is one command instead of a checklist.
- tern for database migrations. A dedicated tern.dockerfile builds a single-purpose migrator image that runs tern migrate against the Postgres pool and exits. The production compose file runs it as a one-shot service before the backend starts, so every production deploy guarantees the schema is current before the first HTTP request is served. Migrations themselves live in migrations/: 25+ files in numbered order covering the full schema.
The production README documents all of this, including the DNS A-record requirements for the four subdomains, the Let's Encrypt rate-limit recovery procedure, the manual certificate cleanup procedure, the Traefik dashboard credentials setup, and the troubleshooting workflow for the common failure modes. It is the kind of document you write once, at three in the morning, after the first production deploy has taught you which seven things always break.
| Layer | Technology |
|---|---|
| Backend Language | Go 1.24 |
| Backend Framework | Fiber v2 with recover, CORS, health check, rate limit, JWT, requestid, pprof (local only) |
| Database | PostgreSQL via pgx/v5, with go-jet for type-safe generated queries and Masterminds/Squirrel for dynamic ones |
| Migrations | tern (dedicated Docker image) |
| Cache | Redis via go-redis/v9 with redisotel tracing |
| Search | Meilisearch via meilisearch-go |
| Auth | JWT v5, plus a separate X-API-Key header for administrative mutations |
| Scheduler | gocron/v2 for the currency-rate refresh job |
| Observability | OpenTelemetry traces with otelpgx + redisotel, Zap for structured logging, Fiber's built-in request-ID middleware |
| Scraper Language | Go 1.24 |
| Scraper Engine | chromedp for headless Chrome, PuerkitoBio/goquery for DOM parsing, gorilla/websocket for the operator control UI |
| AI | openai-go/v2 and groq-go with invopop/jsonschema-generated JSON Schemas for structured outputs, behind a custom batch manager |
| Translation | Self-hosted LibreTranslate with LLM fallback |
| Media | aws-sdk-go-v2/s3 for object storage |
| Frontend Framework | Next.js 15.4, React 19.1, TypeScript 5, Turbopack dev server |
| Frontend UI | Tailwind v4, shadcn-ui on top of Radix UI, TanStack Table + Virtual, cmdk, embla-carousel, vaul, react-resizable-panels |
| Frontend Auth | next-auth v4 |
| Frontend Validation | zod v4 |
| Frontend Language Tooling | petrovich (morphological declension), pluralize-ru (three-form plural) |
| Edge | Traefik with Let's Encrypt, four subdomains |
| Registry | Selectel Registry (cr.selcloud.ru/araxes-regitry) |
| Infrastructure | Docker Compose, split into local and production; PowerShell deploy script; one-shot tern migrator service |
- A production scraper is a systems-engineering problem, not a parsing problem. The hardest part of ads-scraper is not goquery selectors or XPath expressions; it is the concurrency shape, the panic-recovery protocol, the priority-queue semantics, the pre-qualification filter, the semaphore bounds, the graceful shutdown, and the operator control plane. Those are all things you find out the hard way when you run the scraper continuously for days and watch which failure modes it can and cannot survive. I came into this thinking the scraper would be two thousand lines; it ended up being a lot more than that, and every line earned its place.
- LLMs with JSON Schema are the right tool for structuring free-text listings. Before this project I thought of LLMs as chat tools with a prompt and a reply. Working through invopop/jsonschema changed that: you describe the exact shape of the Go struct you want, you tell the LLM "answer in JSON matching this", and the response drops directly into the unmarshaller with no post-processing. For a task like "extract structured car specs from a free-form listing description", this is faster, cheaper, and dramatically more reliable than writing regex-and-rules code that never keeps up with the data. Batching is what makes the economics work at scale, which is why the AI module has a dedicated batch_manager.go and not just a direct call.
- Grammar is a product feature in languages that have it. Using petrovich on the frontend is an afternoon of work that most developers would skip because "the translation looks fine". It does not look fine: native speakers notice the missing case agreement immediately, and the feeling it leaves is "this product was built by people who do not care about the details". Getting the declension right is not localization; it is respect for the audience, expressed in code.
Commercial prototype — all three components were built, deployed, and run on production infrastructure at araxes.ru behind Traefik with Let's Encrypt certificates, with images hosted in Selectel Registry. The backend services, the scraping pipeline, the five-level car catalog with its five spec tables, the Meilisearch-backed search path with relaxed fallback, the currency-rate cron, the AI-powered data structuring, the operator control UI, and the frontend with its grammar-aware rendering layer are all implemented and were exercised against live data. The live environment is currently offline, but the complete codebase and deployment toolchain are ready to be brought back up with a single make prod-up against a prepared server.
Built by David Movsesian