Darkbloom is a decentralized private inference network for Apple Silicon Macs. Consumers use OpenAI-compatible APIs, the coordinator handles routing, auth, billing, attestation, and capacity management, and providers run local inference workloads on macOS hardware using MLX-Swift. All inference is end-to-end encrypted -- the coordinator never sees plaintext prompts.
coordinator/ Go control plane (packages live at top level, not internal/)
├── cmd/coordinator/ main service entrypoint
├── api/ HTTP + WebSocket handlers
│ ├── consumer.go OpenAI-compatible chat/completions/messages/transcriptions/images
│ ├── provider.go provider registration, heartbeats, attestation, relay
│ ├── billing_handlers.go Stripe/Solana/referral/pricing endpoints
│ ├── device_auth.go device code flow for linking providers to user accounts
│ ├── enroll.go MDM + ACME enrollment profile generation
│ ├── invite_handlers.go invite code admin/user flows
│ ├── release_handlers.go binary release registration (GitHub Actions integration)
│ ├── acme_verify.go ACME device-attest-01 client cert verification
│ ├── stats.go public network stats
│ └── server.go route wiring, auth middleware, version gate
├── attestation/ Secure Enclave + MDA verification
├── auth/ Privy JWT integration
├── billing/ Stripe, referrals
├── e2e/ X25519 request-encryption helpers
├── mdm/ MicroMDM client + webhook handling
├── payments/ ledger + pricing
├── protocol/ WebSocket message types shared with provider
├── ratelimit/ rate limiting
├── registry/ provider registry, queueing, routing, reputation, token-budget admission
├── saferun/ panic-safe goroutine runners
├── store/ in-memory or Postgres persistence
├── telemetry/ Datadog DogStatsD metrics
├── datadog/ dev dashboard JSON definitions
└── internal/e2e/ coordinator-scoped integration tests
e2e/ System-level E2E testing framework
├── integration_test.go 12 E2E tests (streaming, billing, encryption, attestation, etc.)
├── profile_test.go latency profiling tests
├── benchmark_test.go load benchmarks (posts markdown to PR comments)
└── testbed/ shared test harness
├── coordinator.go Coordinator lifecycle (start/stop, Postgres helpers)
├── provider.go Provider lifecycle (binary discovery, start/stop)
├── config.go Test configuration (model, provider, request settings)
├── suite.go Suite orchestration (multi-provider, user pools)
├── events.go Event system (segments, buffers, fan-out)
├── instrument.go Request-level instrumentation
├── load.go Load generator (concurrency, streaming, metrics)
├── assert/ Latency threshold + accounting integrity assertions
├── deps/ External dependency lifecycle (ephemeral Postgres)
└── profile/ Segment stats aggregation, diffing, JSON export
provider-swift/ Swift provider CLI for Apple Silicon Macs
├── Sources/ProviderCore/ coordinator client, protocol, hardware, security, inference, server, telemetry, model downloads
├── Sources/ProviderCoreFoundation/ model manifests, scanner, hashing, publish-safe foundation code
├── Sources/darkbloom/ CLI (`serve`, `start`, `stop`, `models`, `benchmark`, `status`, `doctor`, `login`, etc.)
├── Sources/darkbloom-publish/ registry manifest builder used by publish workflow
├── Sources/darkbloom-enclave-cli/ Secure Enclave attestation/sign helper
└── Tests/ ProviderCore and ProviderCoreFoundation tests
console-ui/ Next.js 16 / React 19 frontend
├── src/app/ chat, billing, images, models, stats, providers, settings, link, api-console, earn
├── src/app/api/ chat, images, transcribe, auth/keys, payments/*, invite, models, health, pricing
├── src/components/ chat UI, sidebar, top bar, trust badge, verification panel, invite banner
├── src/components/providers/
│ ├── PrivyClientProvider.tsx
│ └── ThemeProvider.tsx
├── src/lib/ API client (api.ts) + Zustand store (store.ts)
├── src/hooks/ auth (useAuth.ts) + toast (useToast.ts)
└── proxy.ts Next.js 16 proxy (replaces middleware.ts)
scripts/ build, signing, install, and deploy helpers
├── install.sh end-user installer served from coordinator (hash + codesign verification)
├── admin.sh admin CLI (Privy auth, release mgmt, API calls)
├── publish-model.sh model registry publish workflow
├── deploy-acme.sh nginx/step-ca helper
├── fetch-metallib.sh MLX metallib fetcher
└── entitlements.plist hardened runtime entitlements (hypervisor, network)
docs/ architecture, deploy runbooks, MDM/ACME notes, threat model
.github/workflows/ CI (ci.yml), integration tests (integration.yml), Swift release (release-swift.yml),
model registration (register-model.yml), threat model review (threat-model-review.yml)
- Coordinator HTTP routes include
POST /v1/chat/completions,POST /v1/completions,POST /v1/messages,GET /v1/models,GET /v1/models/capacity, billing/pricing endpoints, invite flows, stats, enrollment, device authorization, and release registration endpoints. - Coordinator auth is split between Privy JWTs, API keys, and device-code login (RFC 8628) for provider machines.
- Routing uses token-budget admission with engine-reported capacity, speculative TTFT dispatch, EWMA TPS tracking, and early 429 with Retry-After for OpenRouter compatibility.
- Billing logic is split between
coordinator/payments(ledger + pricing) andcoordinator/billing(Stripe, referrals). - Providers serve text inference through the Swift
darkbloomCLI with continuous batching via MLX-Swift. - Model registry data is DB-backed in the coordinator and points to R2 manifests under
https://models.darkbloom.ai; model bytes are not hardcoded in the provider or UI. - Observability: Datadog metrics (DogStatsD) for attestation, routing, billing, fleet version, and provider capacity. X-Timing header decomposes per-request latency.
Toolchain versions (Go, Node, Swift, Python, plus jq/gh/awscli/gcloud)
are pinned in mise.toml. Build/test commands are wrapped in the
root Makefile — run make with no args to list all targets.
mise install # installs every tool pinned in mise.toml
make ui-install # console-ui npm depsmake coordinator-test # cd coordinator && go test ./...
make coordinator-build # cd coordinator && go build ./cmd/coordinator
make coordinator-build-linux # GOOS=linux GOARCH=amd64 CGO_ENABLED=0 build (EigenCloud)
make coordinator # test + buildmake provider-build # cd provider-swift && swift build
make provider-test # cd provider-swift && swift test
make provider # build + testmake ui-install # npm install
make ui-build # npm run build
make ui-lint # npx eslint src/
make ui-test # vitest (npm test)
make ui # install + lint + test + build# Requires Postgres + Swift provider binary + MLX model downloaded.
make e2e-integration # go test ./e2e/... -run TestIntegration -v
make e2e-benchmark # go test ./e2e/... -run TestBenchmark -vmake test # all unit tests (coordinator + provider + ui)
make build # build all components
make all # test + build everything
make clean # remove built artifactsCanonical runbook: docs/coordinator-deploy-runbook.md
Current release-sensitive pieces:
- Prod coordinator runs on EigenCloud (TEE) as app
d-inferenceatapi.darkbloom.dev. Build target:coordinator/Dockerfile. Dev coordinator runs on Google Cloud (seedocs/dev-environment.md). - Provider bundle creation lives in
scripts/build-bundle.sh. - App bundle + DMG creation lives in
scripts/bundle-app.sh. - Installer flow lives in
scripts/install.sh. - Provider update checks use
LatestProviderVersionincoordinator/api/server.go, so bundle uploads and version bumps need to stay coordinated. - CI release workflow (
release-swift.yml) signs binaries with Developer ID Application cert, notarizes with Apple, computes SHA-256 hashes after signing, embeds provisioning profile in .app bundle.
Quick coordinator deploy (prod, EigenCloud):
# EigenCloud builds from the repo via coordinator/Dockerfile and blue-green deploys.
git push origin master
ecloud compute app deploy d-inference
curl https://api.darkbloom.dev/health
ecloud compute app logs d-inferenceDev coordinator deploy (Google Cloud): see docs/dev-environment.md.
- Protocol changes must be mirrored in both
provider-swift/Sources/ProviderCore/Protocol/andcoordinator/protocol/messages.go. - Telemetry wire types live in three places and MUST stay aligned:
coordinator/protocol/telemetry.go(canonical),provider-swift/Sources/ProviderCore/Telemetry/(Swift mirror),console-ui/src/lib/telemetry-types.ts(TS mirror). Symmetry tests in each language pin enum casing and optional-field omission. Field allowlist additions need parallel updates incoordinator/api/telemetry_handlers.go,provider-swift/Sources/ProviderCore/Telemetry/, and the TS set above.
- If you change provider bundle semantics, keep
scripts/build-bundle.sh,scripts/install.sh, andLatestProviderVersionin sync. - If you change install paths or process invocation, update both the CLI and install flow.
- Device linking changes often span both coordinator device auth endpoints and the provider
login/logoutcommands. - Model registry changes span coordinator registry schema/endpoints,
provider-swiftmanifest download/publish code,scripts/publish-model.sh, and the console UI. Do not add hardcoded providerMODEL_CATALOGlists.
coordinator/coordinatoris a built binary checked into the tree. Do not model changes from it, and do not commit more built artifacts.- CI release workflow must compute binary SHA-256 hashes AFTER code signing, not before. Providers verify hashes of the signed binary.
- Model scan uses fast discovery (no hashing) at startup. Weight hashing is on-demand via
compute_weight_hash()only for the served model. Don't add hashing back to the scan path. - Provider auto-injects ChatML template for models missing
chat_templatefield. This is intentional -- Qwen3.5 base models ship without it. - Store selection (
cmd/coordinator/main.go): the coordinator uses the Postgres store wheneverEIGENINFERENCE_DATABASE_URLis set (prod does — durable across restarts/deploys), and refuses to start without it unlessEIGENINFERENCE_ALLOW_MEMORY_STORE=true. The in-memory store is the dev/test fallback only (state lost on restart). Note: the live provider registry (WebSocket connections/attestation) is always in-process and is rebuilt on reconnect regardless of store. - Request queue timeout is 120 seconds. Initial attestation challenge is sent immediately on registration, then every 5 minutes.
- Backend idle timeout is 1 hour (not 10 minutes as some comments may say).
Provider state lives in several fields that are read by different code paths with different precedence rules. When mutating any of these, trace every reader:
BackendCapacity.Slotsis authoritative for the scheduler when present (Swift providers). The scheduler derivesslotState,modelLoaded, token budgets, and observed TPS from it.WarmModelsis only a fallback for legacy providers withoutBackendCapacity.WarmModelsis updated by heartbeats. It is NOT consulted bysnapshotProviderLockedorbuildCandidateWithReasonwhenBackendCapacityis non-nil.TriggerModelSwaps/hasWarmProviderLockedchecks it as a fallback. LegacyScoreProvideralso reads it for warm bonus, and/v1/me/providerscopies it into API responses.CurrentModelis set from heartbeatactive_model. A nil/omittedactive_modelmeans no model is loaded. StaleCurrentModelcan cause attestation hash mismatches.pendingModelLoadsis only checked byTriggerModelSwapsplanning. It is NOT checked byQuickCapacityCheck,ReserveProviderEx, orfreeMemoryAdmits. Do not assume pending-load state affects routing decisions.- Provider-reported slot states include
"running"(active requests),"idle"(loaded, no requests),"crashed","reloading", and"idle_shutdown". The"idle"state means the model IS loaded — treat it the same as"running"for warm detection, not as"unknown". - Providers can hold up to
maxModelSlotsmodels simultaneously (default 3). Do not assume a model swap evicts all other models. - The provider's
ensureModelLoadedrequiresestimatedMemoryGb * 3.0headroom. The coordinator'sfreeMemoryAdmitsuses a different (less conservative) check. A model the coordinator admits can still fail on the provider side.
When adding code that mutates provider state or sends commands (load_model, etc.):
- Enumerate every reader of the fields you're mutating (
BackendCapacity.Slots,WarmModels,CurrentModel,pendingModelLoads). - Check what happens on the failure path — does state get cleaned up on disconnect, timeout, and load failure?
- Check concurrent access — heartbeats arrive per-provider on separate goroutines;
TriggerModelSwapscan race withdrainQueuedRequestsForModels. - Check the cleanup path —
Disconnect()must clear any per-provider state you add. - Verify pre-existing invariants:
maxModelSlots, heartbeat field omission semantics (nilvs empty), and the 3x memory gate on the provider side.
Keep the codebase modular, never monolithic.
- Prefer small, single-responsibility files over large catch-all ones. Split by concern: types, pure helpers, data/IO hooks, UI pieces, and a thin orchestrator that wires them together.
- Group a feature's files into a dedicated module/folder with a thin entry point. Examples: the coordinator's top-level Go packages (
registry/,billing/,store/), andconsole-ui/src/components/api-keys/(constants,format,limits,Modal,KeyForm,KeyCard, auseApiKeysdata hook, and a thinApiKeysManagerorchestrator). - One file/component should do one thing. If a file mixes several concerns or grows past a few hundred lines, that's a signal to split it.
- At the end of every large piece of work, do a refactor pass to make it modular before calling it done. Extract helpers/types/hooks into focused files, delete dead code, and keep the public entry point thin. The refactor must be behavior-preserving — build, lint, and tests stay green.
Every PR MUST include a before-and-after diagram (Mermaid) in its description that details what changed — covering BOTH:
- Behavior: the request/response flow, states, and outcomes a user or caller observes (e.g. dispatch → retry → 429/503/200).
- Code: which functions/components changed and how control flows through them.
Use two clearly labeled diagrams — a Before and an After — (or one side-by-side comparison) so a reviewer sees the delta at a glance. Scope it to what the PR changes; it is not a full-system map. A PR without a before/after diagram is not ready for review.
```mermaid
flowchart LR
subgraph Before
A1[request] --> B1[old behavior / code path]
end
subgraph After
A2[request] --> B2[new behavior / code path]
end
```A pre-commit hook in .githooks/pre-commit checks staged files only. It is enabled via:
git config core.hooksPath .githooks| Component | Check | Manual fix |
|---|---|---|
Go (coordinator/) |
gofmt -l |
gofmt -w <file> |
Swift (provider-swift/) |
no enforced formatter | cd provider-swift && swift test |
TypeScript (console-ui/) |
npx eslint src/ |
cd console-ui && npx eslint src/ --fix |