qbrix

Architecture

qbrix separates the hot path (arm selection) from the learning path (training). Selection scales independently from training, which yields ultra-low-latency decisions with eventual consistency in parameter updates.

System Overview

[Diagram: qbrix architecture]

| Component | Role | Scaling |
|---|---|---|
| proxysvc | Gateway — HTTP + gRPC entry point, experiment/pool management, auth, feature gates | Horizontal (HPA) |
| motorsvc | Selection — reads cached params, runs policy, returns arm | Horizontal (HPA) |
| cortexsvc | Training — consumes feedback, runs batch training, writes updated params | Single instance |
| Redis | Param cache (read by motorsvc), feedback queue (Redis Streams) | Managed service |
| Postgres | Experiments, pools, users, feature gates, API keys | Managed service |
| ClickHouse | Selection/feedback traces, experiment analytics | Enterprise only |

Request Flow

Selection (Hot Path)

[Diagram: selection path]

  1. proxysvc authenticates the request (JWT or API key)
  2. Feature gate evaluates rollout, schedule, and rules
  3. If the gate commits an arm, return immediately (no bandit call)
  4. Otherwise, forward to motorsvc via gRPC
  5. motorsvc checks agent cache (300s TTL), then param cache (60s TTL)
  6. On cache hit: run policy select() in-memory — zero I/O
  7. On cache miss: fetch from Redis, populate cache, then select
  8. Return signed selection token + arm details
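The hot path above can be sketched as follows. This is a minimal in-process model, not the real gRPC services: `select_arm`, the cache class, and the callback names are hypothetical, and only the param-cache level is shown.

```python
import time

class ParamCache:
    """Tiny in-memory TTL cache standing in for motorsvc's param cache."""
    def __init__(self, ttl):
        self.ttl = ttl
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry and entry[0] > time.monotonic():
            return entry[1]
        return None

    def put(self, key, value):
        self._store[key] = (time.monotonic() + self.ttl, value)

def select_arm(request, gate, cache, redis_fetch, policy):
    """Hot-path selection: gate first, then cache, Redis only on a miss."""
    committed = gate(request)             # steps 2-3: gate may commit an arm
    if committed is not None:
        return committed                  # no bandit call at all
    exp_id = request["experiment_id"]
    params = cache.get(exp_id)            # step 5: param cache (60s TTL)
    if params is None:
        params = redis_fetch(exp_id)      # step 7: cache miss -> Redis
        cache.put(exp_id, params)
    return policy(params, request["context"])  # step 6: in-memory select()
```

On a warm cache, the only work is the gate check and the in-memory `policy` call.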

Feedback (Learning Path)

[Diagram: learning path]

  1. proxysvc validates the signed selection token (HMAC)
  2. Publishes feedback event to Redis Streams
  3. cortexsvc consumes in batches (256 events, 100ms timeout)
  4. Dispatcher routes events to per-experiment worker queues
  5. Workers call policy.train() with the batch
  6. Updated params written to Redis
  7. motorsvc picks up new params on next cache expiry
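The 256-event / 100 ms batching in step 3 can be modeled with a small accumulator. A sketch with a hypothetical `Batcher` class; the real consumer reads from Redis Streams, but the flush policy is the same.

```python
import time

class Batcher:
    """Accumulates feedback events and decides when a batch is ready,
    mirroring cortexsvc's 256-event / 100 ms policy."""
    def __init__(self, max_size=256, timeout=0.1, clock=time.monotonic):
        self.max_size, self.timeout, self.clock = max_size, timeout, clock
        self._events, self._started = [], None

    def add(self, event):
        if self._started is None:
            self._started = self.clock()   # timeout runs from the first event
        self._events.append(event)

    def ready(self):
        if not self._events:
            return False
        return (len(self._events) >= self.max_size
                or self.clock() - self._started >= self.timeout)

    def drain(self):
        """Hand the batch to a training worker and reset."""
        batch, self._events, self._started = self._events, [], None
        return batch
```

Flushing on either bound keeps training latency low under light traffic (timeout) while amortizing work under heavy traffic (size).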

Optimized for Performance, Reliability, and Uptime

Stateless hot path

motorsvc makes zero database calls during selection. It reads only from in-memory caches and runs the policy's select() method entirely in-process — no locks and no I/O on the critical path when caches are warm. This is what enables ultra-low-latency selection.

Two-level gate cache

Feature gate configs are cached in a two-level hierarchy:

  • L1 (in-memory): microsecond access, 30s TTL, 1000 entries max
  • L2 (Redis): millisecond access, 300s TTL

A gate evaluation never hits Postgres. L1 misses fall through to L2; L2 misses return None (safe fallback to bandit selection).
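A sketch of the lookup order, assuming a hypothetical `l2_get` callback standing in for the Redis read:

```python
import time

class TwoLevelGateCache:
    """L1 in-memory (30 s TTL) over L2 Redis (300 s TTL); misses return
    None so callers fall back to bandit selection."""
    def __init__(self, l2_get, l1_ttl=30.0, l1_max=1000, clock=time.monotonic):
        self.l2_get, self.l1_ttl, self.l1_max = l2_get, l1_ttl, l1_max
        self.clock = clock
        self._l1 = {}  # gate_id -> (expires_at, config)

    def get(self, gate_id):
        entry = self._l1.get(gate_id)
        if entry and entry[0] > self.clock():
            return entry[1]                    # L1 hit: microseconds
        config = self.l2_get(gate_id)          # L1 miss: go to L2 (Redis)
        if config is None:
            return None                        # L2 miss: safe fallback
        if len(self._l1) >= self.l1_max:       # crude eviction to cap entries
            self._l1.pop(next(iter(self._l1)))
        self._l1[gate_id] = (self.clock() + self.l1_ttl, config)
        return config
```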

Dual param caching

motorsvc maintains two separate caches via cachebox:

  • Agent cache (300s TTL, 100 entries) — pre-built Agent objects with policy instance
  • Param cache (60s TTL, 1000 entries) — learned policy parameters

The agent cache has a longer TTL because policy structure changes rarely. The param cache refreshes more often to pick up training updates.
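The refresh logic can be sketched as below. The agent-as-dict representation and the `build_agent`/`fetch_params` callbacks are illustrative; the real code caches Agent objects via cachebox.

```python
import time

class TTL:
    """Minimal TTL cache used for both levels below."""
    def __init__(self, ttl, clock=time.monotonic):
        self.ttl, self.clock, self._d = ttl, clock, {}
    def get(self, k):
        e = self._d.get(k)
        return e[1] if e and e[0] > self.clock() else None
    def put(self, k, v):
        self._d[k] = (self.clock() + self.ttl, v)

agent_cache = TTL(300)   # Agent objects: structure changes rarely
param_cache = TTL(60)    # learned params: refreshed often for training updates

def get_agent(exp_id, build_agent, fetch_params):
    """Reuse the cached Agent but refresh its params on the shorter TTL."""
    agent = agent_cache.get(exp_id)
    if agent is None:
        agent = build_agent(exp_id)       # expensive: policy construction
        agent_cache.put(exp_id, agent)
    params = param_cache.get(exp_id)
    if params is None:
        params = fetch_params(exp_id)     # cheap: one Redis read
        param_cache.put(exp_id, params)
    agent["params"] = params
    return agent
```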

gRPC with tuned thread pools

All inter-service communication uses gRPC (binary protocol, HTTP/2 multiplexing). Each service runs a 100-thread gRPC pool (up from the default 10) with keepalive settings tuned to prevent connection timeouts during traffic spikes.
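The server setup might look like the sketch below. The channel-arg names are standard gRPC options, but the specific values are illustrative, not taken from qbrix's config.

```python
from concurrent import futures

# Keepalive channel args (standard gRPC option names; values illustrative)
GRPC_OPTIONS = [
    ("grpc.keepalive_time_ms", 30_000),          # ping idle connections every 30 s
    ("grpc.keepalive_timeout_ms", 10_000),       # drop if no ack within 10 s
    ("grpc.keepalive_permit_without_calls", 1),  # keep pinging even when idle
    ("grpc.http2.max_pings_without_data", 0),    # allow pings without data frames
]

# 100 worker threads for the gRPC handler pool
executor = futures.ThreadPoolExecutor(max_workers=100)

# With grpcio installed, the server would then be built as:
#   server = grpc.server(executor, options=GRPC_OPTIONS)
```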

Redis Streams back-pressure

Feedback events flow through Redis Streams with consumer groups:

  • Consumer groups prevent duplicate processing across restarts
  • maxlen approximate limit prevents unbounded stream growth
  • xack + xdel after successful training acknowledges and removes each processed event, so it is never re-delivered
  • Stream naturally buffers during traffic spikes — cortexsvc processes at its own pace without affecting selection latency
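The two sides of the stream can be sketched with redis-py-style calls (`xadd`, `xack`, `xdel` are real redis-py methods; the stream and group names here are hypothetical):

```python
FEEDBACK_STREAM = "qbrix:feedback"   # hypothetical stream name
GROUP = "cortexsvc"                  # hypothetical consumer group

def publish_feedback(r, event, maxlen=100_000):
    """proxysvc side: append with an approximate maxlen cap so the stream
    buffers spikes without growing unboundedly."""
    return r.xadd(FEEDBACK_STREAM, event, maxlen=maxlen, approximate=True)

def ack_batch(r, message_ids):
    """cortexsvc side: after successful training, acknowledge and delete
    so processed events are never re-delivered."""
    r.xack(FEEDBACK_STREAM, GROUP, *message_ids)
    r.xdel(FEEDBACK_STREAM, *message_ids)
```

The approximate maxlen (illustrative value here) lets Redis trim in whole macro-nodes, which is much cheaper than exact trimming.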

Multi-worker training dispatcher

cortexsvc uses a 4-worker parallel dispatcher with per-experiment async queues:

  • Events are routed to experiment-specific queues
  • Each worker drains one experiment's queue completely before moving on
  • This prevents head-of-line blocking — a slow-training experiment doesn't hold up others
  • Active experiment tracking prevents duplicate concurrent training
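The routing and claiming logic above can be sketched as a single-threaded model (the class and method names are hypothetical; the real dispatcher uses async queues):

```python
from collections import defaultdict, deque

class Dispatcher:
    """Routes events to per-experiment queues; each worker drains one
    experiment fully, and an active set prevents two workers from
    training the same experiment concurrently."""
    def __init__(self):
        self.queues = defaultdict(deque)
        self.active = set()

    def route(self, event):
        self.queues[event["experiment_id"]].append(event)

    def claim(self):
        """Pick an experiment with pending events that no worker owns."""
        for exp_id, q in self.queues.items():
            if q and exp_id not in self.active:
                self.active.add(exp_id)
                return exp_id
        return None

    def drain(self, exp_id):
        """Drain the claimed queue completely, then release the claim."""
        batch = list(self.queues[exp_id])
        self.queues[exp_id].clear()
        self.active.discard(exp_id)
        return batch
```

Because a slow experiment only occupies one worker's claim, the other workers keep draining the remaining queues.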

Pending message recovery

On startup, cortexsvc runs xautoclaim to recover messages that were in-flight during a previous crash. Messages claimed by a dead consumer are automatically re-assigned. No feedback events are lost, even during unclean shutdowns.
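A sketch of the recovery loop, using the redis-py-style xautoclaim cursor protocol (stream and consumer names hypothetical; the `[:2]` slice tolerates server versions that also return deleted IDs):

```python
def recover_pending(r, stream, group, consumer, min_idle_ms=60_000):
    """Claim messages stuck in a dead consumer's pending list, iterating
    until the cursor wraps back to '0-0'."""
    recovered, cursor = [], "0-0"
    while True:
        cursor, messages = r.xautoclaim(
            stream, group, consumer, min_idle_ms, start_id=cursor, count=100
        )[:2]
        recovered.extend(messages)
        if cursor == "0-0":      # scanned the whole pending list
            return recovered
```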

Fail-safe gate evaluation

If a feature gate evaluation fails for any reason — bad config, missing metadata, unexpected error — it returns None and falls through to normal bandit selection. The system never blocks a request due to a gate misconfiguration. This is a deliberate design choice: availability over correctness for gate logic.
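The fail-safe shape is a broad catch that maps every failure to "no decision". A sketch with hypothetical gate fields (`rollout`, `committed_arm`, `bucket`):

```python
def evaluate_gate(gate_config, request):
    """Fail-safe wrapper: any error in gate logic returns None, which
    callers treat as 'no decision' and fall through to bandit selection."""
    try:
        if gate_config is None:
            return None
        rollout = gate_config["rollout"]        # KeyError on bad config...
        if request["bucket"] < rollout:         # ...or missing request metadata
            return gate_config["committed_arm"]
        return None
    except Exception:
        return None  # availability over correctness: never block the request
```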

Stateless feedback correlation

Selection responses include an HMAC-signed token that encodes: tenant ID, experiment ID, arm index, context ID, vector, and metadata. When feedback arrives, the token is verified and decoded — no server-side session state is needed to correlate feedback with selections. This means motorsvc can scale horizontally without shared state.
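A minimal sketch of the sign/verify round trip using stdlib HMAC-SHA256 (the key, encoding, and field names are illustrative, not qbrix's actual token format):

```python
import base64, hashlib, hmac, json

SECRET = b"example-signing-key"   # illustrative; the real key is server config

def sign_selection(payload):
    """Encode the selection payload and append an HMAC-SHA256 signature."""
    body = base64.urlsafe_b64encode(json.dumps(payload, sort_keys=True).encode())
    sig = hmac.new(SECRET, body, hashlib.sha256).hexdigest().encode()
    return body + b"." + sig

def verify_selection(token):
    """Return the payload if the signature checks out, else None."""
    body, _, sig = token.rpartition(b".")
    expected = hmac.new(SECRET, body, hashlib.sha256).hexdigest().encode()
    if not hmac.compare_digest(sig, expected):
        return None
    return json.loads(base64.urlsafe_b64decode(body))
```

Since everything needed for correlation travels inside the token, any replica can verify feedback without shared session state.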

Rate limiting at the edge

Sliding window counters in Redis protect downstream services from traffic spikes. Rate limiting applies only to operational endpoints: /api/v1/agent/select and /api/v1/agent/feedback (HTTP), Select and Feedback (gRPC). Management endpoints (pools, experiments, gates, auth) are not rate-limited. Per-user and per-API-key limits are configurable by plan tier; counters auto-expire after 120 seconds.
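The usual two-bucket approximation of a sliding window can be sketched in-process (qbrix keeps these counters in Redis with expiry; the class, window length, and limit here are illustrative):

```python
import time

class SlidingWindowLimiter:
    """Approximate sliding window over two fixed buckets — the standard
    counter trick, done in-process for illustration."""
    def __init__(self, limit, window=60.0, clock=time.monotonic):
        self.limit, self.window, self.clock = limit, window, clock
        self.counts = {}   # (key, bucket_index) -> count

    def allow(self, key):
        now = self.clock()
        bucket = int(now // self.window)
        prev = self.counts.get((key, bucket - 1), 0)
        curr = self.counts.get((key, bucket), 0)
        # Weight the previous bucket by how much of it still overlaps the window
        overlap = 1.0 - (now % self.window) / self.window
        if prev * overlap + curr >= self.limit:
            return False
        self.counts[(key, bucket)] = curr + 1
        return True
```

In Redis the same scheme is two INCR-ed counters per key with a TTL (the 120 s expiry covers the current and previous window).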


Service Details

proxysvc (Gateway)

  • Protocols: HTTP/REST (FastAPI) + gRPC
  • Responsibilities: Auth, experiment/pool CRUD, feature gate evaluation, feedback publishing, rate limiting
  • Storage: Postgres (experiments, pools, users, gates, API keys)
  • Stateless: Scales horizontally via HPA

motorsvc (Selection)

  • Protocol: gRPC
  • Responsibilities: Arm selection only — nothing else
  • Storage: Redis (read-only, via cache)
  • Stateless: No mutations on the hot path, scales horizontally
  • Caches: Agent (300s) + Param (60s), both in-memory

cortexsvc (Training)

  • Protocol: gRPC
  • Responsibilities: Consume feedback, batch train policies, write updated params
  • Storage: Redis Streams (consume), Redis (write params)
  • Single instance: Event sourcing pattern — sequential processing ensures correct parameter updates
  • Dispatcher: 4 workers, per-experiment queues, batch size 256, 100ms timeout, 10s flush interval

tracesvc (Enterprise)

  • Responsibilities: Consume selection + feedback events, persist to ClickHouse
  • Storage: Redis Streams (consume), ClickHouse (write)
  • Analytics: Total selections, default vs. ML selections, timeseries, per-arm stats