pvm/docs/TECH_STACK_RESEARCH.md
Mikkel Georgsen cf03b3592a Add comprehensive tech stack research document
1,190-line research covering all 18 technology areas for PVM:
Rust/Axum backend, SvelteKit frontend, Postgres + libSQL databases,
NATS + JetStream messaging, DragonflyDB caching, and more.
Includes recommended stack summary and open questions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 02:50:33 +01:00


PVM (Poker Venue Manager) — Tech Stack Research

Generated: 2026-02-08
Status: DRAFT — for discussion and refinement


Table of Contents

  1. Programming Language
  2. Backend Framework
  3. Frontend Framework
  4. Database Strategy
  5. Caching Layer
  6. Message Queue / Event Streaming
  7. Real-Time Communication
  8. Auth & Authorization
  9. API Design
  10. Local Node Architecture
  11. Chromecast / Display Streaming
  12. Mobile Strategy
  13. Deployment & Infrastructure
  14. Monitoring & Observability
  15. Testing Strategy
  16. Security
  17. Developer Experience
  18. CSS / Styling
  19. Recommended Stack Summary
  20. Open Questions / Decisions Needed

1. Programming Language

Recommendation: Rust (backend + local node) + TypeScript (frontend + shared types)

Alternatives Considered

| Language | Pros | Cons |
| --- | --- | --- |
| Rust | Memory safety, fearless concurrency, tiny binaries for RPi5, no GC pauses, excellent WebSocket perf | Steeper learning curve, slower compile times |
| Go | Simple, fast compilation, good concurrency | Less expressive type system, GC pauses (minor), larger binaries than Rust |
| TypeScript (full-stack) | One language everywhere, huge ecosystem, fast dev velocity | Node.js memory overhead on RPi5, GC pauses in real-time scenarios, weaker concurrency model |
| Elixir | Built for real-time (Phoenix), fault-tolerant OTP | Small ecosystem, harder to find libs, BEAM VM overhead on RPi5 |

Reasoning

Rust is the strongest choice for PVM because of the RPi5 local node constraint. The local node must run reliably on constrained hardware with limited memory, handle real-time tournament clocks, manage offline operations, and sync data. Rust's zero-cost abstractions, lack of garbage collector, and small binary sizes (typically 5-15 MB static binaries) make it ideal for this.

For the cloud backend, Rust's performance means fewer servers and lower hosting costs. A single Rust service can handle thousands of concurrent WebSocket connections with minimal memory overhead — critical for real-time tournament updates across many venues.

The "all code written by Claude Code" constraint actually favors Rust: Claude has excellent Rust fluency, and the compiler's strict type system catches bugs that would otherwise require extensive testing in dynamic languages.

TypeScript remains the right choice for the frontend — the browser ecosystem is TypeScript-native, and sharing type definitions between Rust (via generated OpenAPI types) and TypeScript gives end-to-end type safety.

Gotchas

  • Rust compile times can be mitigated with cargo-watch, incremental compilation, and sccache
  • Cross-compilation for RPi5 (ARM64) is well-supported via cross or cargo-zigbuild
  • Shared domain types can be generated from Rust structs to TypeScript via ts-rs or OpenAPI codegen

2. Backend Framework

Recommendation: Axum (v0.8+)

Alternatives Considered

| Framework | Pros | Cons |
| --- | --- | --- |
| Axum | Tokio-native, excellent middleware (Tower), lowest memory footprint, growing ecosystem, WebSocket built-in | Younger than Actix |
| Actix Web | Highest raw throughput, most mature | Actor model adds complexity, not Tokio-native (uses own runtime fork) |
| Rocket | Most ergonomic, Rails-like DX | Slower performance, less flexible middleware |
| Loco | Rails-like conventions, batteries-included | Very new (2024), smaller community, opinionated |

Reasoning

Axum is the clear winner for PVM:

  1. Tokio-native: Axum is built directly on Tokio + Hyper + Tower. Since NATS, database drivers, and WebSocket handling all use Tokio, everything shares one async runtime — no impedance mismatch.
  2. Tower middleware: The Tower service/layer pattern gives composable middleware for auth, rate limiting, tracing, compression, CORS, etc. Middleware can be shared between HTTP and WebSocket handlers.
  3. WebSocket support: First-class WebSocket extraction with axum::extract::ws, typed WebSocket messages via axum-typed-websockets.
  4. Memory efficiency: Benchmarks consistently place Axum among the lowest in per-connection memory footprint — critical when serving thousands of concurrent venue connections.
  5. OpenAPI integration: utoipa crate provides derive macros for generating OpenAPI 3.1 specs directly from Axum handler types.
  6. Extractor pattern: Axum's extractor-based request handling maps cleanly to domain operations (extract tenant, extract auth, extract venue context).

Key Libraries

  • axum — HTTP framework
  • axum-extra — typed headers, cookie jar, multipart
  • tower + tower-http — middleware stack (CORS, compression, tracing, rate limiting)
  • utoipa + utoipa-axum — OpenAPI spec generation
  • utoipa-swagger-ui — embedded Swagger UI
  • axum-typed-websockets — strongly typed WS messages

Gotchas

  • Axum's error handling requires careful design — use thiserror + a custom error type that implements IntoResponse
  • Route organization: use axum::Router::nest() for modular route trees per domain (tournaments, venues, players)
  • State management: use axum::extract::State with Arc<AppState> — avoid the temptation to put everything in one giant state struct
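The error-handling gotcha can be sketched without the framework: one domain error enum mapped to HTTP status/message pairs. In the real service this mapping would live in an `impl IntoResponse for AppError` (with `thiserror` deriving the boilerplate); the variant names and codes below are illustrative, not from the source.

```rust
// Sketch of the recommended error-type pattern: one domain error enum,
// one mapping to (HTTP status, client-safe message). In Axum this
// mapping would live inside `impl IntoResponse for AppError`.
#[derive(Debug)]
enum AppError {
    NotFound(String),   // e.g. unknown tournament id
    Forbidden,          // authenticated but not permitted
    Validation(String), // bad request payload
    Internal(String),   // anything unexpected; never leak details
}

impl AppError {
    /// Map each variant to an HTTP status code and a message that is
    /// safe to return to API clients.
    fn to_response(&self) -> (u16, String) {
        match self {
            AppError::NotFound(what) => (404, format!("{what} not found")),
            AppError::Forbidden => (403, "forbidden".to_string()),
            AppError::Validation(msg) => (422, msg.clone()),
            // Internal details are logged server-side, not returned.
            AppError::Internal(_) => (500, "internal error".to_string()),
        }
    }
}
```

Keeping the status mapping in one place means handlers can simply return `Result<T, AppError>` and stay free of response-construction noise.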

3. Frontend Framework

Recommendation: SvelteKit (Svelte 5 + runes reactivity)

Alternatives Considered

| Framework | Pros | Cons |
| --- | --- | --- |
| SvelteKit | Smallest bundles, true compilation (no virtual DOM), built-in routing/SSR/PWA, Svelte 5 runes are elegant | Smaller ecosystem than React |
| Next.js (React) | Largest ecosystem, most libraries, biggest job market | Vercel lock-in concerns, React hydration overhead, larger bundles, RSC complexity |
| SolidStart | Finest-grained reactivity, near-zero overhead updates | Smallest ecosystem, least mature, fewer component libraries |
| Nuxt (Vue) | Good DX, solid ecosystem | Vue 3 composition API less elegant than Svelte 5 runes |

Reasoning

SvelteKit is the best fit for PVM for several reasons:

  1. Performance matters for venue displays: Tournament clocks, waiting lists, and seat maps will run on venue TVs via Chromecast. Svelte's compiled output produces minimal JavaScript — the Cast receiver app will load faster and use less memory on Chromecast hardware.
  2. Real-time UI updates: Svelte 5's fine-grained reactivity (runes: $state, $derived, $effect) means updating a single timer or seat status re-renders only that DOM node, not a virtual DOM diff. This is ideal for dashboards with many independently updating elements.
  3. PWA support: SvelteKit has first-class service worker support and offline capabilities through @sveltejs/adapter-static and vite-plugin-pwa.
  4. Bundle size: SvelteKit produces the smallest JavaScript bundles of any major framework — important for mobile PWA users on venue WiFi.
  5. Claude Code compatibility: Svelte's template syntax is straightforward and less boilerplate than React — Claude can generate clean, readable Svelte components efficiently.
  6. No framework lock-in: Svelte compiles away, so there's no runtime dependency. The output is vanilla JS + DOM manipulation.

UI Component Library

Recommendation: Skeleton UI (Svelte-native) or shadcn-svelte (Tailwind-based, port of shadcn/ui)

shadcn-svelte is particularly compelling because:

  • Components are copied into your codebase (not a dependency) — full control
  • Built on Tailwind CSS — consistent with the styling recommendation
  • Accessible by default (uses Bits UI primitives under the hood)
  • Matches the design patterns of the widely-used shadcn/ui ecosystem

Gotchas

  • SvelteKit's SSR is useful for the management dashboard but the Cast receiver and PWA may use adapter-static for pure SPA mode
  • Svelte's ecosystem is smaller than React's, but for PVM's needs (forms, tables, charts, real-time) the ecosystem is sufficient
  • Svelte 5 (runes) is a significant API change from Svelte 4 — ensure all examples and libraries target Svelte 5

4. Database Strategy

Recommendation: PostgreSQL (cloud primary) + libSQL/SQLite (local node) + Electric SQL or custom sync

Alternatives Considered

| Approach | Pros | Cons |
| --- | --- | --- |
| Postgres cloud + libSQL local + sync | Best of both worlds — Postgres power in cloud, SQLite simplicity on RPi5 | Need sync layer, schema divergence risk |
| Postgres everywhere | One DB engine, simpler mental model | Postgres on RPi5 uses more memory, harder offline |
| libSQL/Turso everywhere | One engine, built-in edge replication | Less powerful for complex cloud queries, multi-tenant partitioning |
| CockroachDB | Distributed, strong consistency | Heavy for RPi5, expensive, overkill |

Detailed Recommendation

Cloud Database: PostgreSQL 16+

  • The gold standard for multi-tenant SaaS
  • Row-level security (RLS) for tenant isolation
  • JSONB for flexible per-venue configuration
  • Excellent full-text search for player lookup across venues
  • Partitioning by tenant for performance at scale
  • Managed options: Neon (serverless, branching for dev), Supabase, or AWS RDS

Local Node Database: libSQL (via Turso's embedded runtime)

  • Fork of SQLite with cloud sync capabilities
  • Runs embedded in the Rust binary — no separate database process on RPi5
  • WAL mode for concurrent reads during tournament operations
  • Tiny memory footprint (< 10 MB typical)
  • libSQL's Rust driver (libsql) is well-maintained

Sync Strategy:

The local node operates on a subset of the cloud data — only data relevant to its venue(s). The sync approach:

  1. Cloud-to-local: Player profiles, memberships, credit lines pushed to local node via NATS JetStream. Local node maintains a read replica of relevant data in libSQL.
  2. Local-to-cloud: Tournament results, waitlist changes, transactions pushed to cloud via NATS JetStream with at-least-once delivery. Cloud processes as events.
  3. Conflict resolution: Last-writer-wins (LWW) with vector clocks for most entities. For financial data (credit lines, buy-ins), use event sourcing — conflicts are impossible because every transaction is an immutable event.
  4. Offline queue: When disconnected, local node queues mutations in a local WAL-style append-only log. On reconnect, replays in order via NATS.
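The offline queue in step 4 can be sketched as an ordered append-only log that drains on reconnect. The types and field names here (`Mutation`, `OfflineQueue`) are illustrative; in the real node the log would be persisted in libSQL and each drained mutation published to NATS before being dropped.

```rust
// Minimal sketch of the offline mutation queue: mutations append in
// order while disconnected and replay in the same order on reconnect.
#[derive(Debug, Clone, PartialEq)]
struct Mutation {
    seq: u64,        // local monotonic sequence number
    subject: String, // NATS subject it will replay to
    payload: String, // serialized event
}

#[derive(Default)]
struct OfflineQueue {
    log: Vec<Mutation>,
    next_seq: u64,
}

impl OfflineQueue {
    /// Append a mutation while disconnected; replay order is fixed here.
    fn enqueue(&mut self, subject: &str, payload: &str) -> u64 {
        let seq = self.next_seq;
        self.next_seq += 1;
        self.log.push(Mutation { seq, subject: subject.into(), payload: payload.into() });
        seq
    }

    /// On reconnect, drain in append order; the caller publishes each
    /// mutation via NATS and acknowledges before discarding.
    fn drain_in_order(&mut self) -> Vec<Mutation> {
        std::mem::take(&mut self.log)
    }
}
```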

ORM / Query Layer

Recommendation: sqlx (compile-time checked queries)

  • sqlx checks SQL queries against the actual database schema at compile time
  • No ORM abstraction layer — write real SQL, get compile-time safety
  • Supports both PostgreSQL and SQLite/libSQL
  • Avoids the N+1 query problems that ORMs introduce
  • Migrations via sqlx migrate

Alternative: sea-orm if you want a full ORM, but for PVM the explicit SQL approach of sqlx gives more control over multi-tenant queries and complex joins.

Migrations

  • Use sqlx migrate for cloud PostgreSQL migrations
  • Maintain parallel migration files for libSQL (SQLite-compatible subset)
  • A shared migration test ensures both schemas stay compatible for the sync subset

Gotchas

  • PostgreSQL and SQLite have different SQL dialects — the sync subset must use compatible types (no Postgres-specific types in synced tables)
  • libSQL's VECTOR type is interesting for future player similarity features but not needed initially
  • Turso's hosted libSQL replication is an option but adds a dependency — prefer embedded libSQL with custom NATS-based sync for more control
  • Schema versioning must be tracked on the local node so the cloud knows what schema version it's talking to

5. Caching Layer

Recommendation: DragonflyDB

Alternatives Considered

| Option | Pros | Cons |
| --- | --- | --- |
| DragonflyDB | Claims up to 25x Redis throughput, Redis-compatible API, multi-threaded, lower memory usage | Younger project, smaller community |
| Redis 7+ | Most mature, largest ecosystem, Redis Stack modules | Single-threaded core, restrictive licensing (RSALv2/SSPLv1) since Redis 7.4 |
| Valkey | Redis fork, community-driven, BSD license | Still catching up to Redis feature parity |
| KeyDB | Multi-threaded Redis fork | Development appears stalled (no updates in 1.5+ years) |
| No cache (just Postgres) | Simpler architecture | Higher DB load, slower for session/real-time data |

Reasoning

DragonflyDB is the right choice for PVM:

  1. Redis API compatibility: Drop-in replacement — all Redis client libraries work unchanged. The fred Rust crate (async Redis client) works with DragonflyDB out of the box.
  2. Multi-threaded architecture: DragonflyDB uses all available CPU cores, unlike Redis's single-threaded model. This matters when caching tournament state for hundreds of concurrent venues.
  3. Memory efficiency: DragonflyDB uses up to 80% less memory than Redis for the same dataset — important for keeping infrastructure costs low.
  4. No license concerns: DragonflyDB uses BSL 1.1 (converts to open source after 4 years). Redis switched to a dual-license model that's more restrictive. Valkey is BSD but is playing catch-up.
  5. Pub/Sub: DragonflyDB supports Redis Pub/Sub — useful as a lightweight complement to NATS for in-process event distribution within the backend cluster.

What to Cache

  • Session data: User sessions, JWT refresh tokens
  • Tournament state: Current level, blinds, clock, player counts (hot read path)
  • Waiting lists: Ordered sets per venue/game type
  • Rate limiting: API rate limit counters
  • Player lookup cache: Frequently accessed player profiles
  • Seat maps: Current table/seat assignments per venue

What NOT to Cache (use Postgres directly)

  • Financial transactions (credit lines, buy-ins) — always hit the source of truth
  • Audit logs
  • Historical tournament data

Local Node: No DragonflyDB

The RPi5 local node should not run DragonflyDB. libSQL is fast enough for local caching needs, and adding another process increases complexity and memory usage on constrained hardware. Use in-memory Rust data structures (e.g., DashMap, moka cache crate) for hot local state.
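The in-memory cache role that moka/DashMap would fill on the local node can be sketched as a small TTL map. To keep the sketch deterministic, time is passed in as milliseconds rather than read from `Instant::now()`; the type and TTL value are illustrative.

```rust
use std::collections::HashMap;

// Sketch of an in-process TTL cache for hot local state (tournament
// clock snapshots, session tokens). A production node would use the
// moka crate; this shows only the expiry logic.
struct TtlCache {
    ttl_ms: u64,
    entries: HashMap<String, (String, u64)>, // key -> (value, inserted_at_ms)
}

impl TtlCache {
    fn new(ttl_ms: u64) -> Self {
        Self { ttl_ms, entries: HashMap::new() }
    }

    fn put(&mut self, key: &str, value: &str, now_ms: u64) {
        self.entries.insert(key.into(), (value.into(), now_ms));
    }

    /// Return the value only if it has not expired.
    fn get(&self, key: &str, now_ms: u64) -> Option<&str> {
        self.entries.get(key).and_then(|(v, at)| {
            if now_ms.saturating_sub(*at) < self.ttl_ms { Some(v.as_str()) } else { None }
        })
    }
}
```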

Gotchas

  • DragonflyDB's replication features are less mature than Redis Sentinel/Cluster — use managed hosting or keep it simple with a single node + persistence initially
  • Monitor DragonflyDB's release cycle — it's actively developed but younger than Redis
  • Keep the cache layer optional — the system should function (slower) without it

6. Message Queue / Event Streaming

Recommendation: NATS + JetStream

Alternatives Considered

| Option | Pros | Cons |
| --- | --- | --- |
| NATS + JetStream | Lightweight (single binary, ~20 MB), sub-ms latency, built-in persistence, embedded mode, perfect for edge | Smaller community than Kafka |
| Apache Kafka | Highest throughput, mature, excellent tooling | Heavy (JVM, ZooKeeper/KRaft), 4 GB+ RAM minimum, overkill for PVM's scale |
| RabbitMQ | Mature AMQP, sophisticated routing | Higher latency (5-20 ms), more memory, Erlang ops complexity |
| Redis Streams | Simple, already have cache layer | Not designed for reliable message delivery at scale |

Reasoning

NATS + JetStream is purpose-built for PVM's architecture:

  1. Edge-native: NATS can run as a leaf node on the RPi5, connecting to the cloud NATS cluster. This is the core of the local-to-cloud sync architecture. When the connection drops, JetStream buffers messages locally and replays them on reconnect.

  2. Lightweight: NATS server is a single ~20 MB binary. On RPi5, it uses ~50 MB RAM. Compare to Kafka's 4 GB minimum.

  3. Sub-millisecond latency: Core NATS delivers messages in < 1ms. JetStream (persistent) adds 1-5ms. This is critical for real-time tournament updates — when a player busts, every connected display should update within milliseconds.

  4. Subject-based addressing: NATS subjects map perfectly to PVM's domain:

    • venue.{venue_id}.tournament.{id}.clock — tournament clock ticks
    • venue.{venue_id}.waitlist.update — waiting list changes
    • venue.{venue_id}.seats.{table_id} — seat assignments
    • player.{player_id}.notifications — player-specific events
    • sync.{node_id}.upstream — local node to cloud sync
    • sync.{node_id}.downstream — cloud to local node sync
  5. Built-in patterns: Request/reply (for RPC between cloud and node), pub/sub (for broadcasts), queue groups (for load-balanced consumers), key-value store (for distributed config), object store (for binary data like player photos).

  6. JetStream for durability: Tournament results, financial transactions, and sync operations need guaranteed delivery. JetStream provides at-least-once and exactly-once delivery semantics with configurable retention.
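The subject scheme above is easiest to keep consistent if publishers and subscribers build subjects through shared helper functions rather than ad-hoc strings. The helpers below are illustrative (not NATS API), and the matcher covers only the single-token `*` wildcard — NATS also has `>` for "rest of subject", omitted here for brevity.

```rust
// Centralized subject constructors so publishers and subscribers
// can't drift apart on naming.
fn venue_clock_subject(venue_id: &str, tournament_id: &str) -> String {
    format!("venue.{venue_id}.tournament.{tournament_id}.clock")
}

fn venue_waitlist_subject(venue_id: &str) -> String {
    format!("venue.{venue_id}.waitlist.update")
}

/// Minimal NATS-style wildcard match: `*` matches exactly one token.
fn subject_matches(pattern: &str, subject: &str) -> bool {
    let p: Vec<&str> = pattern.split('.').collect();
    let s: Vec<&str> = subject.split('.').collect();
    p.len() == s.len() && p.iter().zip(&s).all(|(pt, st)| *pt == "*" || pt == st)
}
```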

Architecture

RPi5 Local Node                    Cloud
┌──────────────┐                 ┌──────────────────┐
│  NATS Leaf   │ ◄──── TLS ────► │  NATS Cluster    │
│  Node        │     (auto-      │  (3-node)        │
│              │    reconnect)   │                  │
│  JetStream   │                 │  JetStream       │
│  (local buf) │                 │  (persistent)    │
└──────────────┘                 └──────────────────┘

Gotchas

  • NATS JetStream's exactly-once semantics require careful consumer design — use idempotent handlers with deduplication IDs
  • Subject namespace design is critical — plan it early, changing later is painful
  • NATS leaf nodes need TLS configured for secure cloud connection
  • Monitor JetStream stream sizes on RPi5 — set max bytes limits to avoid filling the SD card during extended offline periods
  • The async-nats Rust crate is the official async client — well maintained and Tokio-native

7. Real-Time Communication

Recommendation: WebSockets (via Axum) for interactive clients + NATS for backend fan-out + SSE as fallback

Alternatives Considered

| Option | Pros | Cons |
| --- | --- | --- |
| WebSockets | Full duplex, low latency, wide support | Requires connection management, can't traverse some proxies |
| Server-Sent Events (SSE) | Simpler, auto-reconnect, HTTP-native | Server-to-client only, no binary support |
| WebTransport | HTTP/3, multiplexed streams, unreliable mode | Very new, limited browser support, no Chromecast support |
| Socket.IO | Auto-fallback, rooms, namespaces | Node.js-centric, adds overhead, not Rust-native |
| gRPC streaming | Typed, efficient, bidirectional | Not browser-native (needs grpc-web proxy), overkill |

Architecture

The real-time pipeline has three layers:

  1. NATS (backend event bus): All state changes publish to NATS subjects. This is the single source of real-time truth. Both cloud services and local nodes publish here.

  2. WebSocket Gateway (Axum): A dedicated Axum service subscribes to relevant NATS subjects and fans out to connected WebSocket clients. Each client subscribes to the venues/tournaments they care about.

  3. SSE Fallback: For environments where WebSockets are blocked (some corporate networks), provide an SSE endpoint that delivers the same event stream. SSE's built-in auto-reconnect with Last-Event-ID makes resumption simple.

Flow Example: Tournament Clock Update

Tournament Service (Rust)
  → publishes to NATS: venue.123.tournament.456.clock {level: 5, time_remaining: 1200}
  → WebSocket Gateway subscribes to venue.123.tournament.*
  → fans out to all connected clients watching tournament 456
  → Chromecast receiver app gets update, renders clock
  → PWA on player's phone gets update, shows current level

Implementation Details

  • Use axum::extract::ws::WebSocket with tokio::select! to multiplex NATS subscription + client messages
  • Implement heartbeat/ping-pong to detect stale connections (30s interval, 10s timeout)
  • Client reconnection with exponential backoff + subscription replay from NATS JetStream
  • Binary message format: consider MessagePack (rmp-serde) for compact payloads over WebSocket, with JSON as human-readable fallback
  • Connection limits: track per-venue connection count, implement backpressure
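The reconnection backoff from the list above can be sketched as a capped exponential schedule. The base and cap constants are illustrative, and a real client would add random jitter so many displays don't reconnect in lockstep.

```rust
// Exponential reconnection backoff with a cap: 500ms, 1s, 2s, 4s, ...
// capped at 30s. A production client would add jitter.
fn backoff_ms(attempt: u32) -> u64 {
    const BASE_MS: u64 = 500;
    const MAX_MS: u64 = 30_000;
    // Clamp the shift so large attempt counts can't overflow.
    BASE_MS.saturating_mul(1u64 << attempt.min(16)).min(MAX_MS)
}
```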

Gotchas

  • WebSocket connections are stateful — need sticky sessions or a connection registry if running multiple gateway instances
  • Chromecast receiver apps have limited WebSocket support — test thoroughly on actual hardware
  • Mobile PWAs going to background will drop WebSocket connections — design for reconnection and state catch-up
  • Rate limit outbound messages to prevent flooding slow clients (tournament clock ticks should be throttled to 1/second for display, even if internal state updates more frequently)
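The last gotcha — throttling clock ticks to one display update per second even when internal state changes faster — reduces to a tiny gate. Time is passed in as milliseconds to keep the sketch deterministic; a real gateway would use `Instant::now()`.

```rust
// Outbound throttle: allow at most one message per interval.
struct Throttle {
    interval_ms: u64,
    last_sent_ms: Option<u64>,
}

impl Throttle {
    fn new(interval_ms: u64) -> Self {
        Self { interval_ms, last_sent_ms: None }
    }

    /// Returns true if an update may be sent at `now_ms`; internal
    /// state updates that arrive in between are simply dropped.
    fn allow(&mut self, now_ms: u64) -> bool {
        match self.last_sent_ms {
            Some(last) if now_ms - last < self.interval_ms => false,
            _ => {
                self.last_sent_ms = Some(now_ms);
                true
            }
        }
    }
}
```

Because clock state is absolute (level + time remaining) rather than incremental, dropping intermediate ticks loses nothing: the next allowed message carries the full current state.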

8. Auth & Authorization

Recommendation: Custom JWT auth with Postgres-backed RBAC + optional OAuth2 social login

Alternatives Considered

| Option | Pros | Cons |
| --- | --- | --- |
| Custom JWT + RBAC | Full control, no vendor dependency, works offline on local node | Must implement everything yourself |
| Auth0 / Clerk | Managed, social login, MFA out of box | Vendor lock-in, cost scales with users, doesn't work offline |
| Keycloak | Self-hosted, full-featured, OIDC/SAML | Heavy (Java), complex to operate, overkill |
| Ory (Kratos + Keto) | Open source, cloud-native, API-first | Multiple services to deploy, newer |
| Lucia Auth | Lightweight, framework-agnostic | TypeScript-only, no Rust support |

Architecture

PVM's auth has a unique challenge: cross-venue universal player accounts that must work both online (cloud) and offline (local node). This rules out purely managed auth services.

Token Strategy:

Access Token (JWT, short-lived: 15 min)
├── sub: player_id (universal)
├── tenant_id: current operator
├── venue_id: current venue (if applicable)
├── roles: ["player", "dealer", "floor_manager", "admin"]
├── permissions: ["tournament.manage", "waitlist.view", ...]
└── iat, exp, iss

Refresh Token (opaque, stored in DB/DragonflyDB, long-lived: 30 days)
└── Rotated on each use, old tokens invalidated
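The access-token claims above map directly onto a plain Rust struct. In the real service this struct would derive `serde::Serialize` and be signed/verified with the `jsonwebtoken` crate; the field names below follow the sketch, not any spec.

```rust
// The access-token claims as a plain struct; a 15-minute expiry means
// exp = iat + 900 seconds.
#[derive(Debug, Clone, PartialEq)]
struct AccessClaims {
    sub: String,              // universal player_id
    tenant_id: String,        // current operator
    venue_id: Option<String>, // current venue, if applicable
    roles: Vec<String>,
    permissions: Vec<String>,
    iat: u64,                 // issued-at (unix seconds)
    exp: u64,                 // expiry (unix seconds)
}

impl AccessClaims {
    /// Expiry check; real verification would also check `iss` and the
    /// Ed25519 signature.
    fn is_expired(&self, now: u64) -> bool {
        now >= self.exp
    }
}
```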

RBAC Model:

Operator (tenant)
├── Admin — full control over all venues
├── Manager — manage specific venues
├── Floor Manager — tournament/table operations at a venue
├── Dealer — assigned to tables, report results
└── Player — universal account, cross-venue
    ├── can self-register
    ├── has memberships per venue
    └── has credit lines per venue (managed by admin)

Key Design Decisions:

  1. Tenant-scoped roles: A user can be an admin in one operator's venues and a player in another. The (user_id, operator_id, role) triple is the authorization unit.
  2. Offline auth on local node: The local node caches valid JWT signing keys and a subset of user credentials (hashed). Players can authenticate locally when the cloud is unreachable. New registrations queue for cloud sync.
  3. JWT signing: Use Ed25519 (fast, small signatures) via the jsonwebtoken crate. The cloud signs tokens; the local node can verify them with the public key. For offline token issuance, the local node has a delegated signing key.
  4. Password hashing: argon2 crate — memory-hard, resistant to GPU attacks. Tune parameters for RPi5 (lower memory cost than cloud).
  5. Social login (optional, cloud-only): Support Google/Apple sign-in for player accounts via standard OAuth2 flows. Map social identities to the universal player account.
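Design decision 1 — the (user_id, operator_id, role) triple as the authorization unit — can be sketched as a simple set lookup; the same user can then hold different roles under different operators with no special cases. Type names are illustrative; in practice the grants live in Postgres and are cached.

```rust
use std::collections::HashSet;

// Tenant-scoped RBAC: a grant is the (user, operator, role) triple.
#[derive(Hash, PartialEq, Eq)]
struct RoleGrant {
    user_id: String,
    operator_id: String,
    role: String,
}

struct Rbac {
    grants: HashSet<RoleGrant>,
}

impl Rbac {
    /// A role only exists within one operator's scope — being admin
    /// for op1 says nothing about op2.
    fn has_role(&self, user: &str, operator: &str, role: &str) -> bool {
        self.grants.contains(&RoleGrant {
            user_id: user.into(),
            operator_id: operator.into(),
            role: role.into(),
        })
    }
}
```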

Gotchas

  • Token revocation is hard with JWTs — use short expiry (15 min) + refresh token rotation + a lightweight blocklist in DragonflyDB for immediate revocation
  • Cross-venue account linking: when a player signs up at venue A and later visits venue B (different operator), they should be recognized. Use email/phone as the universal identifier with verification.
  • Local node token issuance must be time-limited and logged — cloud should audit all locally-issued tokens on sync
  • Rate limit login attempts both on cloud and local node to prevent brute force

9. API Design

Recommendation: REST + OpenAPI 3.1 with generated TypeScript client

Alternatives Considered

| Approach | Pros | Cons |
| --- | --- | --- |
| REST + OpenAPI | Universal, tooling-rich, generated clients, cacheable | Overfetching possible, multiple round trips |
| GraphQL | Flexible queries, single endpoint, good for complex UIs | Complexity overhead, caching harder, Rust support less mature |
| tRPC | Zero-config type safety | TypeScript-only — cannot use with Rust backend |
| gRPC | Efficient binary protocol, streaming | Needs proxy for browsers, overkill for this use case |

Reasoning

tRPC is ruled out because it requires both client and server to be TypeScript. With a Rust backend, this is not viable.

REST + OpenAPI is the best approach because:

  1. Generated type safety: Use utoipa to generate OpenAPI 3.1 specs from Rust types, then openapi-typescript to generate TypeScript types for the frontend. Changes to the Rust API automatically propagate to the frontend types.
  2. Cacheable: REST's HTTP semantics enable CDN caching, ETag support, and conditional requests — important for player profiles and tournament structures that change infrequently.
  3. Universal clients: The REST API will also be consumed by the Chromecast receiver app, the local node sync layer, and potentially third-party integrations. OpenAPI makes all of these easy.
  4. Tooling: Swagger UI for exploration, openapi-fetch for the TypeScript client (type-safe fetch wrapper), Postman/Insomnia for testing.

API Conventions

# Resource-based URLs
GET    /api/v1/venues/{venue_id}/tournaments
POST   /api/v1/venues/{venue_id}/tournaments
GET    /api/v1/venues/{venue_id}/tournaments/{id}
PATCH  /api/v1/venues/{venue_id}/tournaments/{id}

# Actions as sub-resources
POST   /api/v1/venues/{venue_id}/tournaments/{id}/start
POST   /api/v1/venues/{venue_id}/tournaments/{id}/pause
POST   /api/v1/venues/{venue_id}/waitlists/{id}/join
POST   /api/v1/venues/{venue_id}/waitlists/{id}/call/{player_id}

# Cross-venue player operations
GET    /api/v1/players/me
GET    /api/v1/players/{id}/memberships
POST   /api/v1/players/{id}/credit-lines

# Real-time subscriptions
WS     /api/v1/ws?venue={id}&subscribe=tournament.clock,waitlist.updates

Type Generation Pipeline

Rust structs (serde + utoipa derive)
  → OpenAPI 3.1 JSON spec (generated at build time)
  → openapi-typescript (CI step)
  → TypeScript types + openapi-fetch client
  → SvelteKit frontend consumes typed API

Gotchas

  • Version the API from day one (/api/v1/) — breaking changes go in /api/v2/
  • Use cursor-based pagination for lists (not offset-based) — more efficient and handles concurrent inserts
  • Standardize error responses: { error: { code: string, message: string, details?: any } }
  • Consider a lightweight BFF (Backend-for-Frontend) pattern in SvelteKit's server routes for aggregating multiple API calls into one page load
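The cursor-pagination gotcha can be sketched with ids as the sort key: the cursor records the last id the client saw, so concurrent inserts never shift a page the way offset-based paging does. This assumes an ascending-sorted key; a production cursor would be opaque (e.g. base64-encoded) and possibly signed.

```rust
// Cursor-based pagination over an ascending-sorted id list.
// Returns (page, next_cursor); next_cursor is None on the last page.
fn page_after(ids: &[u64], cursor: Option<u64>, limit: usize) -> (Vec<u64>, Option<u64>) {
    let start = match cursor {
        // Resume strictly after the last id the client saw.
        Some(c) => ids.iter().position(|&id| id > c).unwrap_or(ids.len()),
        None => 0,
    };
    let page: Vec<u64> = ids[start..].iter().take(limit).copied().collect();
    // Offer a next cursor only when this page came back full.
    let next = if page.len() == limit { page.last().copied() } else { None };
    (page, next)
}
```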

10. Local Node Architecture

Recommendation: Single Rust binary running on RPi5 with embedded libSQL, NATS leaf node, and local HTTP/WS server

What Runs on the RPi5

┌──────────────────────────────────────────────────┐
│  PVM Local Node (single Rust binary, ~15-20 MB)  │
│                                                  │
│  ┌──────────────┐  ┌──────────────┐              │
│  │ HTTP/WS      │  │ NATS Leaf    │              │
│  │ Server       │  │ Node         │              │
│  │ (Axum)       │  │ (embedded or │              │
│  │              │  │  sidecar)    │              │
│  └──────┬───────┘  └──────┬───────┘              │
│         │                 │                      │
│  ┌──────┴─────────────────┴──────┐               │
│  │       Application Core        │               │
│  │  - Tournament engine          │               │
│  │  - Clock manager              │               │
│  │  - Waitlist manager           │               │
│  │  - Seat assignment            │               │
│  │  - Sync orchestrator          │               │
│  └───────────────┬───────────────┘               │
│                  │                               │
│  ┌───────────────┴───────────────┐               │
│  │       libSQL (embedded)       │               │
│  │  - Venue data subset          │               │
│  │  - Offline mutation queue     │               │
│  │  - Local auth cache           │               │
│  └───────────────────────────────┘               │
│                                                  │
│  ┌───────────────────────────────┐               │
│  │  moka in-memory cache         │               │
│  │  - Hot tournament state       │               │
│  │  - Active session tokens      │               │
│  └───────────────────────────────┘               │
└──────────────────────────────────────────────────┘

Offline Operations

When the cloud connection drops, the local node continues operating:

  1. Tournament operations: Clock continues, blinds advance, players bust/rebuy — all local state
  2. Waitlist management: Players can join/leave waitlists — queued for cloud sync
  3. Seat assignments: Floor managers can move players between tables locally
  4. Player auth: Cached credentials allow existing players to log in. New registrations queued.
  5. Financial operations: Buy-ins and credit transactions logged locally with offline flag. Cloud reconciles on reconnect.
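Offline item 1 — clock and blinds advancing from purely local state — reduces to walking a blind structure with an elapsed-time counter. The level values below are illustrative; a real engine would also handle pauses and breaks.

```rust
// A blind level: blinds plus how long the level runs.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Level {
    small_blind: u32,
    big_blind: u32,
    minutes: u32,
}

/// Given elapsed minutes since tournament start, return the current
/// level index and minutes remaining in it, or None once the
/// structure is exhausted.
fn current_level(structure: &[Level], elapsed_min: u32) -> Option<(usize, u32)> {
    let mut t = elapsed_min;
    for (i, lvl) in structure.iter().enumerate() {
        if t < lvl.minutes {
            return Some((i, lvl.minutes - t));
        }
        t -= lvl.minutes;
    }
    None
}
```

Because the function is pure over (structure, elapsed time), the node can recompute the display state after a crash or power loss from just the tournament start timestamp.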

Sync Protocol

On reconnect:
1. Local node sends its last-seen cloud sequence number
2. Cloud sends all events since that sequence (via NATS JetStream replay)
3. Local node sends its offline mutation queue (ordered by local timestamp)
4. Cloud processes mutations, detects conflicts, responds with resolution
5. Local node applies cloud resolutions, updates local state
6. Both sides confirm sync complete
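Steps 1-2 of the reconnect protocol can be sketched as a sequence-number cut: the node reports the last cloud sequence it saw, and the cloud replays everything after it (JetStream replay in the real system). Types are illustrative.

```rust
// A durable cloud-side event with a monotonically increasing sequence.
#[derive(Debug, Clone, PartialEq)]
struct CloudEvent {
    seq: u64,
    payload: String,
}

/// Cloud side of step 2: return every event the node has not yet
/// seen, in sequence order.
fn replay_since(log: &[CloudEvent], last_seen_seq: u64) -> Vec<CloudEvent> {
    log.iter().filter(|e| e.seq > last_seen_seq).cloned().collect()
}
```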

Conflict Resolution Strategy

| Data Type | Strategy | Reasoning |
| --- | --- | --- |
| Tournament state | Cloud wins | Only one node runs a tournament at a time |
| Waitlist | Merge (union) | Both sides can add/remove; merge and re-order by timestamp |
| Player profiles | Cloud wins (LWW) | Cloud is the authority for universal accounts |
| Credit transactions | Append-only (event sourcing) | No conflicts — every transaction is immutable |
| Seat assignments | Local wins during offline | Floor manager's local decisions take precedence |
| Dealer schedules | Cloud wins | Schedules are set centrally |
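The waitlist merge (union) strategy can be sketched as: union both sides' entries, let a removal on either side win over an add, and re-order survivors by join timestamp. The `Entry` fields are illustrative.

```rust
use std::collections::BTreeMap;

// One waitlist entry as seen by either side of the sync.
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    player_id: String,
    joined_at_ms: u64,
    removed: bool,
}

/// Merge cloud and local waitlists: union by player, removal wins,
/// then order the survivors by earliest join time.
fn merge_waitlists(cloud: &[Entry], local: &[Entry]) -> Vec<String> {
    let mut by_player: BTreeMap<String, Entry> = BTreeMap::new();
    for e in cloud.iter().chain(local) {
        by_player
            .entry(e.player_id.clone())
            .and_modify(|cur| {
                // A removal from either side sticks; otherwise keep
                // the earliest join time.
                cur.removed |= e.removed;
                cur.joined_at_ms = cur.joined_at_ms.min(e.joined_at_ms);
            })
            .or_insert_with(|| e.clone());
    }
    let mut live: Vec<&Entry> = by_player.values().filter(|e| !e.removed).collect();
    live.sort_by_key(|e| e.joined_at_ms);
    live.iter().map(|e| e.player_id.clone()).collect()
}
```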

RPi5 System Setup

  • OS: Raspberry Pi OS Lite (64-bit, Debian Bookworm-based) — no desktop environment
  • Storage: 32 GB+ microSD or USB SSD (recommended for durability)
  • Auto-start: systemd service for the PVM binary
  • Updates: OTA binary updates via a self-update mechanism (download new binary, verify signature, swap, restart)
  • Watchdog: Hardware watchdog timer to auto-reboot if the process hangs
  • Networking: Ethernet preferred (reliable), WiFi as fallback. mDNS for local discovery.

Gotchas

  • RPi5 ships in 2 GB to 16 GB RAM variants — target the 8 GB model, and budget ~200 MB for the PVM process + NATS
  • SD card wear: use an external USB SSD for the libSQL database if heavy write operations are expected
  • Time synchronization: use chrony NTP client — accurate timestamps are critical for conflict resolution and tournament clocks
  • Power loss: libSQL in WAL mode is crash-safe, but implement a clean shutdown handler (SIGTERM) that flushes state
  • Security: the RPi5 is physically accessible in venues — encrypt the libSQL database at rest, disable SSH password auth, use key-only

11. Chromecast / Display Streaming

Recommendation: Google Cast SDK with a Custom Web Receiver (SvelteKit static app)

Architecture

┌──────────────┐     Cast SDK      ┌───────────────────┐
│ Sender App   │ ────────────────► │ Custom Web        │
│ (PVM Admin   │   (discovers &    │ Receiver          │
│  Dashboard)  │    launches)      │ (SvelteKit SPA)   │
│              │                   │                   │
│ or           │                   │ Hosted at:        │
│              │                   │ cast.pvmapp.com   │
│ Local Node   │                   │                   │
│ HTTP Server  │                   │ Connects to WS    │
│              │                   │ for live updates  │
└──────────────┘                   └─────────┬─────────┘
                                             │
                                   ┌─────────▼─────────┐
                                   │ Chromecast Device │
                                   │ (renders receiver)│
                                   └───────────────────┘

Custom Web Receiver

The Cast receiver is a separate SvelteKit static app that:

  1. Loads on the Chromecast device when cast is initiated
  2. Connects to the PVM WebSocket endpoint (cloud or local node, depending on network)
  3. Subscribes to venue-specific events (tournament clock, waitlist, seat map)
  4. Renders full-screen display layouts:
    • Tournament clock: Large timer, current level, blind structure, next break
    • Waiting list: Player queue by game type, estimated wait times
    • Table status: Open seats, game types, stakes per table
    • Custom messages: Announcements, promotions

Display Manager

A venue can have multiple Chromecast devices showing different content:

  • TV 1: Tournament clock (main)
  • TV 2: Cash game waiting list
  • TV 3: Table/seat map
  • TV 4: Rotating between tournament clock and waiting list

The Display Manager (part of the admin dashboard) lets floor managers:

  • Assign content to each Chromecast device
  • Configure rotation/cycling between views
  • Send one-time announcements to all screens
  • Adjust display themes (dark/light, font size, venue branding)

Technical Details

  • Register the receiver app with Google Cast Developer Console (one-time setup, $5 fee)
  • Use Cast Application Framework (CAF) Receiver SDK v3
  • The receiver app is a standard web page — can use any web framework (SvelteKit static build)
  • Sender integration: use the cast.framework.CastContext API in the admin dashboard
  • For local network casting (offline mode): the local node serves the receiver app directly, and the Chromecast connects to the local node's IP
  • Consider also supporting generic HDMI displays via a simple browser in kiosk mode (Chromium on a secondary RPi or mini PC) as a non-Chromecast fallback

Gotchas

  • Chromecast devices have limited memory and CPU — keep the receiver app lightweight (Svelte is ideal here)
  • Cast sessions can timeout after inactivity — implement keep-alive messages
  • Chromecast requires an internet connection for initial app load (it fetches the receiver URL from Google's servers) — for fully offline venues, the kiosk-mode browser fallback is essential
  • Test on actual Chromecast hardware early — the developer emulator doesn't catch all issues
  • Cast SDK requires HTTPS for the receiver URL in production (self-signed certs won't work on Chromecast)

12. Mobile Strategy

Recommendation: PWA first (SvelteKit), with Capacitor wrapper for app store presence when needed

Alternatives Considered

| Approach | Pros | Cons |
|---|---|---|
| PWA (SvelteKit) | One codebase, instant updates, no app store, works offline | Limited native API access, iOS push only since 16.4, discoverability |
| Capacitor (hybrid) | PWA + native shell, access to native APIs, app store distribution | Thin WebView wrapper, some performance overhead |
| Tauri Mobile | Rust backend, small size | Mobile support still early, limited ecosystem |
| React Native | True native UI, large ecosystem | Separate codebase from web, React dependency, not Svelte |
| Flutter | Excellent cross-platform, single codebase | Dart language, entirely separate from web |

Reasoning

PVM's mobile needs are primarily consumption-oriented — players check tournament schedules, waiting list position, and receive notifications. This is a perfect fit for a PWA:

  1. PWA first: The SvelteKit app with vite-plugin-pwa already provides offline caching, add-to-home-screen, and background sync. For most players, this is sufficient.

  2. Capacitor wrap when needed: When iOS push notifications, Apple Pay, or app store presence becomes important, wrap the existing SvelteKit PWA in Capacitor. Capacitor runs the same web app in a native WebView and provides JavaScript bridges to native APIs.

  3. Tauri Mobile is not ready: As of 2026, Tauri 2.0's mobile support exists but is still maturing. It would be a good fit architecturally (Rust backend + web frontend), but the plugin ecosystem and build tooling aren't as polished as Capacitor's. Revisit in 12-18 months.

PWA Features for PVM

  • Service Worker: Cache tournament schedules, player profile, venue info for offline access
  • Push Notifications: Web Push API for tournament start reminders, waitlist calls (Android + iOS 16.4+)
  • Add to Home Screen: App-like experience without app store
  • Background Sync: Queue waitlist join/leave actions when offline, sync when back online
  • Share Target: Accept shared tournament links

Gotchas

  • iOS PWA support is improving but still has limitations (no background fetch, limited push notification payload)
  • Capacitor requires maintaining iOS/Android build pipelines — only add this when there's a clear need
  • Test PWA on actual mobile devices in venues — WiFi quality varies dramatically
  • Deep linking: configure universal links / app links so shared tournament URLs open in the PWA/app

13. Deployment & Infrastructure

Recommendation: Fly.io (primary cloud) + Docker containers + GitHub Actions CI/CD

Alternatives Considered

| Platform | Pros | Cons |
|---|---|---|
| Fly.io | Edge deployment, built-in Postgres, simple scaling, good pricing, Rust-friendly | CLI-first workflow, no built-in CI/CD |
| Railway | Excellent DX, GitHub integration, preview environments | Less edge presence, newer |
| AWS (ECS/Fargate) | Full control, enterprise grade, broadest service catalog | Complex, expensive operations overhead |
| Render | Simple, good free tier | Less flexible networking, no edge |
| Hetzner + manual | Cheapest, full control | Operations burden, no managed services |

Reasoning

Fly.io is the best fit for PVM:

  1. Edge deployment: Fly.io runs containers close to users. For a poker venue SaaS with venues in multiple cities/countries, edge deployment means lower latency for real-time tournament updates.
  2. Built-in Postgres: Fly Postgres is managed, with automatic failover and point-in-time recovery.
  3. Fly Machines: Fine-grained control over machine placement — can run NATS, DragonflyDB, and the API server as separate Fly machines.
  4. Rust-friendly: Fly.io's multi-stage Docker builds work well for Rust (build on large machine, deploy tiny binary).
  5. Private networking: Fly's WireGuard mesh enables secure communication between services without exposing ports publicly. The RPi5 local nodes can use Fly's WireGuard to connect to the cloud NATS cluster.
  6. Reasonable pricing: Pay-as-you-go, no minimum commitment. Scale to zero for staging environments.

Infrastructure Layout

Fly.io Cloud
├── pvm-api (Axum, 2+ instances, auto-scaled)
├── pvm-ws-gateway (Axum WebSocket, 2+ instances)
├── pvm-nats (NATS cluster, 3 nodes)
├── pvm-db (Fly Postgres, primary + replica)
├── pvm-cache (DragonflyDB, single node)
└── pvm-worker (background jobs: sync processing, notifications)

Venue (RPi5)
└── pvm-node (single Rust binary + NATS leaf node)
    └── connects to pvm-nats via WireGuard/TLS

CI/CD Pipeline (GitHub Actions)

# Triggered on push to main
1. Lint (clippy, eslint)
2. Test (cargo test, vitest, playwright)
3. Build (multi-stage Docker for cloud, cross-compile for RPi5)
4. Deploy staging (auto-deploy to Fly.io staging)
5. E2E tests against staging
6. Deploy production (manual approval gate)
7. Publish RPi5 binary (signed, to update server)
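A compressed GitHub Actions sketch of that pipeline — job names, secrets, and config paths are assumptions, and toolchain setup steps (Rust, pnpm, Node) are omitted for brevity:

```yaml
name: ci
on:
  push:
    branches: [main]
jobs:
  checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: cargo clippy --workspace -- --deny warnings
      - run: cargo nextest run
      - run: pnpm install --frozen-lockfile && pnpm turbo lint test
  deploy-staging:
    needs: checks
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: superfly/flyctl-actions/setup-flyctl@master
      - run: flyctl deploy --config fly.staging.toml
        env:
          FLY_API_TOKEN: ${{ secrets.FLY_API_TOKEN }}
  deploy-production:
    needs: deploy-staging
    # Manual approval gate: configure required reviewers on this environment
    environment: production
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: superfly/flyctl-actions/setup-flyctl@master
      - run: flyctl deploy
        env:
          FLY_API_TOKEN: ${{ secrets.FLY_API_TOKEN }}
```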

Gotchas

  • Fly.io Postgres is not fully managed — you still need to handle major version upgrades and backup verification
  • Use multi-stage Docker builds to keep Rust image sizes small (builder stage with rust:bookworm, runtime stage with debian:bookworm-slim or distroless)
  • Pin Fly.io machine regions to match your target markets — don't spread too thin initially
  • Set up blue-green deployments for zero-downtime upgrades
  • The RPi5 binary update mechanism needs a rollback strategy — keep the previous binary and a fallback boot option
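The rollback gotcha reduces to an atomic-rename dance: keep the previous binary, swap in the new one, and reverse the rename if the post-upgrade health check fails. A minimal sketch with illustrative paths — signature verification (part of the self-update mechanism described in section 10) is assumed to have happened before this step:

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Swap in a downloaded binary while retaining the previous one for
/// rollback. Paths are illustrative, e.g. /usr/local/bin/pvm-node{,.prev}.
fn swap_binary(current: &Path, new: &Path, previous: &Path) -> io::Result<()> {
    // Keep the old binary so a failed health check can restore it.
    if current.exists() {
        fs::rename(current, previous)?;
    }
    // rename() is atomic on the same filesystem — no torn binary on crash.
    fs::rename(new, current)
}

/// Roll back to the retained previous binary after a failed upgrade.
fn rollback(current: &Path, previous: &Path) -> io::Result<()> {
    fs::rename(previous, current)
}
```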

14. Monitoring & Observability

Recommendation: OpenTelemetry (traces + metrics + logs) exported to Grafana Cloud (or self-hosted Grafana + Loki + Tempo + Prometheus)

Alternatives Considered

| Stack | Pros | Cons |
|---|---|---|
| OpenTelemetry + Grafana | Vendor-neutral, excellent Rust support, unified pipeline | Some setup required |
| Datadog | All-in-one, excellent UX | Expensive at scale, vendor lock-in |
| New Relic | Good APM | Cost, Rust support less first-class |
| Sentry | Excellent error tracking | Limited metrics/traces; complementary rather than primary |

Rust Instrumentation Stack

# Key crates
tracing = "0.1"                    # Structured logging/tracing facade
tracing-subscriber = "0.3"        # Log formatting, filtering
tracing-opentelemetry = "0.28"    # Bridge tracing → OpenTelemetry
opentelemetry = "0.28"            # OTel SDK
opentelemetry-otlp = "0.28"      # OTLP exporter
opentelemetry-semantic-conventions # Standard attribute names

What to Monitor

Application Metrics:

  • Request rate, latency (p50/p95/p99), error rate per endpoint
  • WebSocket connection count per venue
  • NATS message throughput and consumer lag
  • Tournament clock drift (local node vs cloud time)
  • Sync latency (time from local mutation to cloud persistence)
  • Cache hit/miss ratios (DragonflyDB)
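The clock-drift metric above is just a signed difference between two synchronized clocks; a small helper makes the sign convention explicit (the threshold is an assumed example value, not a PVM constant):

```rust
use std::time::Duration;

/// Signed drift in milliseconds between the local node clock and the
/// cloud's authoritative clock, both as ms since epoch.
/// Positive = local clock runs ahead of the cloud.
fn clock_drift_ms(local_ms: u64, cloud_ms: u64) -> i64 {
    local_ms as i64 - cloud_ms as i64
}

/// True when drift would visibly affect the tournament clock.
fn drift_exceeds(drift_ms: i64, threshold: Duration) -> bool {
    u128::from(drift_ms.unsigned_abs()) > threshold.as_millis()
}
```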

Business Metrics:

  • Active tournaments per venue
  • Players on waiting lists
  • Concurrent connected users
  • Tournament registrations per hour
  • Offline duration per local node

Infrastructure Metrics:

  • CPU, memory, disk per service
  • RPi5 node health: temperature, memory usage, SD card wear level
  • NATS cluster health
  • Postgres connection pool utilization

Local Node Observability

The RPi5 node should:

  • Buffer OpenTelemetry spans/metrics locally when offline
  • Flush to cloud collector on reconnect
  • Expose a local /health endpoint for venue staff to check node status
  • Log to both stdout (for journalctl) and a rotating file

Alerting

  • Use Grafana Alerting for cloud services
  • Critical alerts: API error rate > 5%, NATS cluster partition, Postgres replication lag > 30s
  • Warning alerts: RPi5 node offline > 5 min, sync backlog > 1000 events, high memory usage
  • Notification channels: Slack/Discord for ops team, push notification for venue managers on critical local node issues

Gotchas

  • OpenTelemetry's Rust SDK is stable but evolving — pin versions carefully
  • The tracing crate is the Rust ecosystem standard — everything (Axum, sqlx, async-nats) already emits tracing spans, so you get deep instrumentation for free
  • Sampling is important at scale — don't trace every tournament clock tick in production
  • Grafana Cloud's free tier is generous enough for early stages (10k metrics, 50GB logs, 50GB traces)

15. Testing Strategy

Recommendation: Multi-layer testing with cargo test (unit/integration), Playwright (E2E), and Vitest (frontend unit)

Test Pyramid

         ▲
        / \        E2E Tests (Playwright)
       /   \       - Full user flows
      /     \      - Cast receiver rendering
     /───────\
    /         \    Integration Tests (cargo test + testcontainers)
   /           \   - API endpoint tests with real DB
  /             \  - NATS pub/sub flows
 /               \ - Sync protocol tests
/─────────────────\
                    Unit Tests (cargo test + vitest)
                    - Domain logic (tournament engine, clock, waitlist)
                    - Svelte component tests
                    - Conflict resolution logic

Backend Testing (Rust)

  • Unit tests: Inline #[cfg(test)] modules for domain logic. The tournament engine, clock manager, waitlist priority algorithm, and conflict resolution are all pure functions that are easy to unit test.
  • Integration tests: Use testcontainers crate to spin up ephemeral Postgres + NATS + DragonflyDB instances. Test full API flows including auth, multi-tenancy, and real-time events.
  • sqlx compile-time checks: SQL queries are validated against the database schema at compile time — this catches a huge class of bugs before runtime.
  • Property-based testing: Use proptest for testing conflict resolution and sync protocol with random inputs.
  • Test runner: cargo-nextest for parallel test execution (significantly faster than default cargo test).
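As an illustration of how cheap these unit tests are when the domain logic stays pure — a hypothetical waitlist ordering function (field names are assumptions, not the actual PVM model) with an inline `#[cfg(test)]` module:

```rust
/// Illustrative waitlist entry — not the actual PVM domain model.
#[derive(Debug, Clone, PartialEq, Eq)]
struct WaitlistEntry {
    player_id: u64,
    /// Lower tier = higher priority (e.g. call-ins ahead of walk-ins).
    priority_tier: u8,
    joined_at_ms: u64,
}

/// Pure ordering: priority tier first, then FIFO by join time.
fn sort_waitlist(entries: &mut [WaitlistEntry]) {
    entries.sort_by_key(|e| (e.priority_tier, e.joined_at_ms));
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn priority_tier_beats_join_time() {
        let mut list = vec![
            WaitlistEntry { player_id: 1, priority_tier: 1, joined_at_ms: 100 },
            WaitlistEntry { player_id: 2, priority_tier: 0, joined_at_ms: 200 },
        ];
        sort_waitlist(&mut list);
        assert_eq!(list[0].player_id, 2);
    }
}
```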

Frontend Testing (TypeScript/Svelte)

  • Component tests: Vitest + @testing-library/svelte for testing Svelte components in isolation.
  • Store/state tests: Vitest for testing reactive state logic (tournament clock state, waitlist updates).
  • API mocking: msw (Mock Service Worker) for intercepting API calls in tests.

End-to-End Testing

  • Playwright: Test critical user flows in real browsers:
    • Tournament creation and management flow
    • Player registration and waitlist join
    • Real-time updates (verify clock ticks appear in browser)
    • Multi-venue admin dashboard
    • Cast receiver display rendering (headless Chromium)
  • Local node E2E: Test offline scenarios — start local node, disconnect from cloud, perform operations, reconnect, verify sync.

Specialized Tests

  • Sync protocol tests: Simulate network partitions, conflicting writes, replay scenarios
  • Load testing: k6 or drill (Rust) for WebSocket connection saturation, API throughput
  • Cast receiver tests: Visual regression testing with Playwright screenshots of display layouts
  • Cross-browser: Playwright covers Chromium, Firefox, WebKit — ensure PWA works on all

Gotchas

  • Rust integration tests with testcontainers need Docker available in CI — Fly.io's CI runners support this, or use GitHub Actions with Docker
  • Playwright tests are slow — run in parallel, and only test critical paths in CI (full suite nightly)
  • The local node's offline/reconnect behavior is the hardest thing to test — invest heavily in deterministic sync protocol tests
  • Mock the NATS connection in unit tests using a channel-based mock, not an actual NATS server
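The channel-based mock from the last gotcha can be built on std::sync::mpsc — the trait name and message shape are illustrative, standing in for whatever abstraction wraps async-nats in production:

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Minimal publish abstraction the domain code depends on — in production
/// this would wrap the async-nats client.
trait EventPublisher {
    fn publish(&self, subject: &str, payload: Vec<u8>);
}

/// Test double: pushes messages onto an in-process channel so tests can
/// assert on what was published without a running NATS server.
struct ChannelPublisher {
    tx: Sender<(String, Vec<u8>)>,
}

impl EventPublisher for ChannelPublisher {
    fn publish(&self, subject: &str, payload: Vec<u8>) {
        // Ignore send errors: in tests the receiver outlives the publisher.
        let _ = self.tx.send((subject.to_string(), payload));
    }
}

fn channel_publisher() -> (ChannelPublisher, Receiver<(String, Vec<u8>)>) {
    let (tx, rx) = channel();
    (ChannelPublisher { tx }, rx)
}
```

A test hands the `ChannelPublisher` to the code under test, then drains the receiver and asserts on subjects and payloads.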

16. Security

Recommendation: Defense in depth across all layers

Data Security

| Layer | Measure |
|---|---|
| Transport | TLS 1.3 everywhere — API, WebSocket, NATS, Postgres connections |
| Data at rest | Postgres: encrypted volumes (cloud provider). libSQL on RPi5: SQLCipher-compatible encryption via libSQL |
| Secrets | Environment variables via Fly.io secrets (cloud); encrypted config file on RPi5 (sealed at provisioning) |
| Passwords | Argon2id hashing, tuned per environment (higher params on cloud, lower on RPi5) |
| JWTs | Ed25519 signing, short expiry (15 min), refresh token rotation |
| API keys | SHA-256 hashed in DB, displayed once at creation, prefix-based identification (`pvm_live_`, `pvm_test_`) |

Network Security

  • API: Rate limiting (Tower middleware), CORS restricted to known origins, request size limits
  • WebSocket: Authenticated connection upgrade (JWT in first message or query param), per-connection rate limiting
  • NATS: TLS + token auth between cloud and leaf nodes. Leaf nodes have scoped permissions (can only access their venue's subjects)
  • RPi5: Firewall (nftables/ufw) — only allow outbound to cloud NATS + HTTPS, inbound on local network only for venue devices
  • DDoS: Fly.io provides basic DDoS protection. Add Cloudflare in front for the API if needed.

Financial Data Security

PVM handles credit lines and buy-in transactions — this requires extra care:

  • All financial mutations are event-sourced with immutable audit trail
  • Credit line changes require admin approval with logged reason
  • Buy-in/cashout transactions include idempotency keys to prevent duplicate charges
  • Financial reports are only accessible to operator admins, with access logged
  • Consider PCI DSS implications if handling payment card data directly — prefer delegating to a payment processor (Stripe)
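Idempotency-key handling can be sketched in memory — in production this would be enforced with a unique constraint in Postgres, and the types here are illustrative:

```rust
use std::collections::HashMap;

/// In-memory sketch of idempotency-key handling for buy-in requests.
#[derive(Default)]
struct BuyInLedger {
    /// idempotency key → transaction id of the first accepted request
    seen: HashMap<String, u64>,
    next_tx_id: u64,
}

impl BuyInLedger {
    /// Returns (tx_id, freshly_applied). A retried request carrying the
    /// same key gets the original transaction id back instead of
    /// producing a duplicate charge.
    fn apply(&mut self, idempotency_key: &str) -> (u64, bool) {
        if let Some(&tx) = self.seen.get(idempotency_key) {
            return (tx, false);
        }
        self.next_tx_id += 1;
        let tx = self.next_tx_id;
        self.seen.insert(idempotency_key.to_string(), tx);
        (tx, true)
    }
}
```

The client generates the key (e.g. a UUID per buy-in attempt) and resends it verbatim on retries, so a dropped response never turns into a double charge.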

Local Node Security

The RPi5 is physically in a venue — assume it can be stolen or tampered with:

  • Disk encryption: Full disk encryption (LUKS) or at minimum encrypted database
  • Secure boot: Signed binaries, verified at startup
  • Remote wipe: Cloud can send a command to reset the node to factory state
  • Tamper detection: Log unexpected restarts, hardware changes
  • Credential scope: Local node only has access to its venue's data — compromising one node doesn't expose other venues

Gotchas

  • DO NOT store payment card numbers — use a payment processor's tokenization
  • GDPR/privacy: Player data across venues requires careful consent management. Players must be able to request data deletion.
  • The local node's offline auth cache is a security risk — limit cached credentials, expire after configurable period
  • Regularly rotate NATS credentials and JWT signing keys — automate this

17. Developer Experience

Recommendation: Cargo workspace (Rust monorepo) + pnpm workspace (TypeScript) managed by Turborepo

Monorepo Structure

pvm/
├── Cargo.toml                 # Rust workspace root
├── turbo.json                 # Turborepo config
├── package.json               # pnpm workspace root
├── pnpm-workspace.yaml
│
├── crates/                    # Rust crates
│   ├── pvm-api/               # Cloud API server (Axum)
│   ├── pvm-node/              # Local node binary
│   ├── pvm-ws-gateway/        # WebSocket gateway
│   ├── pvm-worker/            # Background job processor
│   ├── pvm-core/              # Shared domain logic
│   │   ├── tournament/        # Tournament engine
│   │   ├── waitlist/          # Waitlist management
│   │   ├── clock/             # Tournament clock
│   │   └── sync/              # Sync protocol
│   ├── pvm-db/                # Database layer (sqlx queries, migrations)
│   ├── pvm-auth/              # Auth logic (JWT, RBAC)
│   ├── pvm-nats/              # NATS client wrappers
│   └── pvm-types/             # Shared types (serde, utoipa derives)
│
├── apps/                      # TypeScript apps
│   ├── dashboard/             # SvelteKit admin dashboard
│   ├── player/                # SvelteKit player-facing app
│   ├── cast-receiver/         # SvelteKit Cast receiver (static)
│   └── docs/                  # Documentation site (optional)
│
├── packages/                  # Shared TypeScript packages
│   ├── ui/                    # shadcn-svelte components
│   ├── api-client/            # Generated OpenAPI client
│   └── shared/                # Shared types, utilities
│
├── docker/                    # Dockerfiles
├── .github/                   # GitHub Actions workflows
└── docs/                      # Project documentation

Key Tools

| Tool | Purpose |
|---|---|
| Cargo | Rust build system, workspace management |
| pnpm | Fast, disk-efficient Node.js package manager |
| Turborepo | Orchestrates build/test/lint across both Rust and TS workspaces; caches build outputs; `--affected` flag for CI optimization |
| cargo-watch | Auto-rebuild on Rust file changes during development |
| cargo-nextest | Faster test runner with parallel execution |
| sccache | Shared compilation cache (speeds up CI and local builds) |
| cross / cargo-zigbuild | Cross-compile Rust for RPi5 ARM64 |
| Biome | Fast linter + formatter for TypeScript (replaces ESLint + Prettier) |
| clippy | Rust linter (run with `--deny warnings` in CI) |
| rustfmt | Rust formatter (enforced in CI) |
| lefthook | Git hooks manager (format + lint on pre-commit) |

Development Workflow

# Start everything for local development
turbo dev                      # Starts SvelteKit dev servers
cargo watch -x run -p pvm-api  # Auto-restart API on changes

# Run all tests
turbo test                     # TypeScript tests
cargo nextest run              # Rust tests

# Generate API client after backend changes
cargo run -p pvm-api -- --openapi > apps/dashboard/src/lib/api/schema.json
turbo generate:api-client

# Build for production
turbo build                    # TypeScript apps
cargo build --release -p pvm-api
cross build --release --target aarch64-unknown-linux-gnu -p pvm-node

Gotchas

  • Turborepo's Rust support is task-level (it runs cargo as a shell command) — it doesn't understand Cargo's internal dependency graph. Use Cargo workspace for Rust-internal dependencies.
  • Keep pvm-core as a pure library crate with no async runtime dependency — this lets it be used in both the cloud API and the local node without conflicts.
  • Rust compile times are the bottleneck — invest in sccache and incremental compilation from day one
  • Use .cargo/config.toml for cross-compilation targets and linker settings

18. CSS / Styling

Recommendation: Tailwind CSS v4 + shadcn-svelte component system

Alternatives Considered

| Option | Pros | Cons |
|---|---|---|
| Tailwind CSS v4 | Utility-first, fast, excellent Svelte integration; v4 is faster with its Rust-based engine | Learning curve for utility classes |
| Vanilla CSS | No dependencies, full control | Slow development, inconsistent patterns |
| UnoCSS | Atomic CSS, fast, flexible presets | Smaller ecosystem than Tailwind |
| Open Props | Design tokens as CSS custom properties | Not utility-first, less adoption |
| Panda CSS | Type-safe styles, zero runtime | Newer, smaller ecosystem |

Reasoning

Tailwind CSS v4 is the clear choice:

  1. Svelte integration: Tailwind works seamlessly with SvelteKit via the Vite plugin. Svelte's template syntax + Tailwind utilities produce compact, readable component markup.
  2. Tailwind v4 improvements: The v4 release includes a Rust-based engine (Oxide) that is significantly faster, CSS-first configuration (no more tailwind.config.js), automatic content detection, and native CSS cascade layers.
  3. shadcn-svelte: The component library is built on Tailwind, providing a consistent design system with accessible, customizable components. Components are generated into your codebase — full ownership, no black box.
  4. Cast receiver: Tailwind's utility classes produce small CSS bundles (only used classes are included) — important for the resource-constrained Chromecast receiver.
  5. Design tokens: Use CSS custom properties (via Tailwind's theme) for venue-specific branding (colors, logos) that can be swapped at runtime.

Design System Structure

packages/ui/
├── components/                # shadcn-svelte generated components
│   ├── button/
│   ├── card/
│   ├── data-table/
│   ├── dialog/
│   ├── form/
│   └── ...
├── styles/
│   ├── app.css                # Global styles, Tailwind imports
│   ├── themes/
│   │   ├── default.css        # Default PVM theme
│   │   ├── dark.css           # Dark mode overrides
│   │   └── cast.css           # Optimized for large screens
│   └── tokens.css             # Design tokens (colors, spacing, typography)
└── utils.ts                   # cn() helper, variant utilities

Venue Branding

Venues should be able to customize their displays:

/* Runtime theme switching via CSS custom properties */
:root {
  /* Tailwind v4 exposes theme values as CSS variables (e.g. --color-blue-600) */
  --venue-primary: var(--color-blue-600);
  --venue-secondary: var(--color-gray-800);
  --venue-logo-url: url('/default-logo.svg');
}

/* Applied per-venue at runtime */
[data-venue-theme="vegas-poker"] {
  --venue-primary: #c41e3a;
  --venue-secondary: #1a1a2e;
  --venue-logo-url: url('/venues/vegas-poker/logo.svg');
}

Gotchas

  • Tailwind v4's CSS-first config is a paradigm shift from v3 — ensure all team documentation targets v4 syntax
  • shadcn-svelte components use Tailwind v4 as of recent updates — verify compatibility
  • Large data tables (tournament player lists, waitlists) need careful styling — consider virtualized rendering for 100+ row tables
  • Cast receiver displays need large fonts and high contrast — create a dedicated cast.css theme
  • Dark mode is essential for poker venues (low-light environments) — design dark-first

19. Recommended Stack Summary

| Area | Recommendation | Key Reasoning |
|---|---|---|
| Backend Language | Rust | Memory efficiency on RPi5, performance, type safety |
| Frontend Language | TypeScript | Browser ecosystem standard, type safety |
| Backend Framework | Axum (v0.8+) | Tokio-native, Tower middleware, WebSocket support |
| Frontend Framework | SvelteKit (Svelte 5) | Smallest bundles, fine-grained reactivity, PWA support |
| UI Components | shadcn-svelte | Accessible, Tailwind-based, full ownership |
| Cloud Database | PostgreSQL 16+ | Multi-tenant gold standard, RLS, JSONB |
| Local Database | libSQL (embedded) | SQLite-compatible, tiny footprint, Rust-native |
| ORM / Queries | sqlx | Compile-time checked SQL, Postgres + SQLite support |
| Caching | DragonflyDB | Redis-compatible, multi-threaded, memory efficient |
| Messaging | NATS + JetStream | Edge-native leaf nodes, sub-ms latency, lightweight |
| Real-Time | WebSockets (Axum) + SSE fallback | Full duplex, NATS-backed fan-out |
| Auth | Custom JWT + RBAC | Offline-capable, cross-venue, full control |
| API Design | REST + OpenAPI 3.1 | Generated TypeScript client, universal compatibility |
| Mobile | PWA first, Capacitor later | One codebase, offline support, app store when needed |
| Cast/Display | Google Cast SDK + Custom Web Receiver | SvelteKit static app on Chromecast |
| Deployment | Fly.io + Docker | Edge deployment, managed Postgres, WireGuard |
| CI/CD | GitHub Actions + Turborepo | Cross-language build orchestration, caching |
| Monitoring | OpenTelemetry + Grafana | Vendor-neutral, excellent Rust support |
| Testing | cargo-nextest + Vitest + Playwright | Full pyramid: unit, integration, E2E |
| Styling | Tailwind CSS v4 | Fast, small bundles, Svelte-native |
| Monorepo | Cargo workspace + pnpm + Turborepo | Unified builds, shared types |
| Linting | clippy + Biome | Rust + TypeScript coverage |

Open Questions / Decisions Needed

High Priority

  1. Fly.io vs. self-hosted: Fly.io simplifies operations but creates vendor dependency. For a bootstrapped SaaS, the convenience is worth it. For VC-funded with an ops team, self-hosted on Hetzner could be cheaper at scale. Decision: Start with Fly.io, design for portability.

  2. libSQL sync granularity: Should the local node sync entire tables or individual rows? Row-level sync is more efficient but more complex to implement. Recommendation: Start with table-level sync for the initial version, refine to row-level as data volumes grow.

  3. NATS embedded vs. sidecar on RPi5: Bundling and supervising the nats-server binary from within the PVM process vs. running it as a separate process. Bundling is simpler to ship but couples versions tightly (and nats-server is a Go program, so true in-process embedding from Rust isn't available). Recommendation: Sidecar (separate process managed by systemd) for operational flexibility.

  4. Financial data handling: Does PVM handle actual money transactions, or only track buy-ins/credits as records? If handling real money, PCI DSS and financial regulations apply. Recommendation: Track records only. Integrate with Stripe for actual payments.

  5. Multi-region from day one?: Should the initial architecture support venues in multiple countries/regions? This affects Postgres replication strategy and NATS cluster topology. Recommendation: Single region initially, design NATS subjects and DB schema for eventual multi-region.

Medium Priority

  1. Player account deduplication: When a player signs up at two venues independently, how do we detect and merge accounts? Email match? Phone match? Manual linking? Needs product decision.

  2. Chromecast vs. generic display hardware: Should the primary display strategy be Chromecast, or should we target a browser-in-kiosk-mode approach that also works with Chromecast? Recommendation: Build the receiver as a standard web app first (works in kiosk mode), add Cast SDK integration second.

  3. RPi5 provisioning: How are local nodes set up? Manual image flashing? Automated provisioning? Remote setup? Recommendation: Pre-built OS image with first-boot wizard that connects to cloud and provisions the node.

  4. Offline duration limits: How long should a local node operate offline before we consider the data stale? 1 hour? 1 day? 1 week? Needs product decision based on venue feedback.

  5. API versioning strategy: When do we introduce /api/v2/? Should we support multiple versions simultaneously? Recommendation: Semantic versioning for the API spec. Maintain backward compatibility as long as possible. Only version on breaking changes.

Low Priority

  1. GraphQL for player-facing app: The admin dashboard is well-served by REST, but the player app might benefit from GraphQL's flexible querying (e.g., "show me my upcoming tournaments across all venues with waitlist status"). Revisit after v1 launch.

  2. WebTransport: When browser support matures and Chromecast supports it, WebTransport could replace WebSockets for lower-latency, multiplexed real-time streams. Monitor but do not adopt yet.

  3. WASM on local node: Could parts of the frontend run on the local node via WASM for ultra-fast local rendering? Interesting but not a priority. Defer.

  4. AI features: Player behavior analytics, optimal table assignments, tournament structure recommendations. The data model should be designed to support future ML pipelines. Design for it, build later.