pvm/docs/TECH_STACK_RESEARCH.md
Mikkel Georgsen 2bb381a0a3 Update tech stack research with finalized decisions
Resolve all open questions from tech stack review:
- Self-hosted on Hetzner PVE (LXC + Docker)
- Event-based sync via NATS JetStream
- Generic display system with Android client (no Cast SDK dep)
- Docker-based RPi5 provisioning
- No money handling, 72h offline limit, REST + OpenAPI
- PVM signup-first for player accounts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 03:06:53 +01:00


PVM (Poker Venue Manager) — Tech Stack Research

Generated: 2026-02-08
Status: DRAFT — for discussion and refinement


Table of Contents

  1. Programming Language
  2. Backend Framework
  3. Frontend Framework
  4. Database Strategy
  5. Caching Layer
  6. Message Queue / Event Streaming
  7. Real-Time Communication
  8. Auth & Authorization
  9. API Design
  10. Local Node Architecture
  11. Venue Display System
  12. Mobile Strategy
  13. Deployment & Infrastructure
  14. Monitoring & Observability
  15. Testing Strategy
  16. Security
  17. Developer Experience
  18. CSS / Styling
  19. Recommended Stack Summary
  20. Open Questions / Decisions Needed

1. Programming Language

Recommendation: Rust (backend + local node) + TypeScript (frontend + shared types)

Alternatives Considered

| Language | Pros | Cons |
|---|---|---|
| Rust | Memory safety, fearless concurrency, tiny binaries for RPi5, no GC pauses, excellent WebSocket perf | Steeper learning curve, slower compile times |
| Go | Simple, fast compilation, good concurrency | Less expressive type system, GC pauses (minor), larger binaries than Rust |
| TypeScript (full-stack) | One language everywhere, huge ecosystem, fast dev velocity | Node.js memory overhead on RPi5, GC pauses in real-time scenarios, weaker concurrency model |
| Elixir | Built for real-time (Phoenix), fault-tolerant OTP | Small ecosystem, harder to find libs, RPi5 BEAM VM overhead |

Reasoning

Rust is the strongest choice for PVM because of the RPi5 local node constraint. The local node must run reliably on constrained hardware with limited memory, handle real-time tournament clocks, manage offline operations, and sync data. Rust's zero-cost abstractions, lack of garbage collector, and small binary sizes (typically 5-15 MB static binaries) make it ideal for this.

For the cloud backend, Rust's performance means fewer servers and lower hosting costs. A single Rust service can handle thousands of concurrent WebSocket connections with minimal memory overhead — critical for real-time tournament updates across many venues.

The "all code written by Claude Code" constraint actually favors Rust: Claude has excellent Rust fluency, and the compiler's strict type system catches bugs that would otherwise require extensive testing in dynamic languages.

TypeScript remains the right choice for the frontend — the browser ecosystem is TypeScript-native, and sharing type definitions between Rust (via generated OpenAPI types) and TypeScript gives end-to-end type safety.

Gotchas

  • Rust compile times can be mitigated with cargo-watch, incremental compilation, and sccache
  • Cross-compilation for RPi5 (ARM64) is well-supported via cross or cargo-zigbuild
  • Shared domain types can be generated from Rust structs to TypeScript via ts-rs or OpenAPI codegen

2. Backend Framework

Recommendation: Axum (v0.8+)

Alternatives Considered

| Framework | Pros | Cons |
|---|---|---|
| Axum | Tokio-native, excellent middleware (Tower), lowest memory footprint, growing ecosystem, WebSocket built-in | Younger than Actix |
| Actix Web | Highest raw throughput, most mature | Actor model adds complexity, not Tokio-native (uses own runtime fork) |
| Rocket | Most ergonomic, Rails-like DX | Slower performance, less flexible middleware |
| Loco | Rails-like conventions, batteries-included | Very new (2024), smaller community, opinionated |

Reasoning

Axum is the clear winner for PVM:

  1. Tokio-native: Axum is built directly on Tokio + Hyper + Tower. Since NATS, database drivers, and WebSocket handling all use Tokio, everything shares one async runtime — no impedance mismatch.
  2. Tower middleware: The Tower service/layer pattern gives composable middleware for auth, rate limiting, tracing, compression, CORS, etc. Middleware can be shared between HTTP and WebSocket handlers.
  3. WebSocket support: First-class WebSocket extraction with axum::extract::ws, typed WebSocket messages via axum-typed-websockets.
  4. Memory efficiency: Benchmarks show Axum achieves the lowest memory footprint per connection — critical when serving thousands of concurrent venue connections.
  5. OpenAPI integration: utoipa crate provides derive macros for generating OpenAPI 3.1 specs directly from Axum handler types.
  6. Extractor pattern: Axum's extractor-based request handling maps cleanly to domain operations (extract tenant, extract auth, extract venue context).

Key Libraries

  • axum — HTTP framework
  • axum-extra — typed headers, cookie jar, multipart
  • tower + tower-http — middleware stack (CORS, compression, tracing, rate limiting)
  • utoipa + utoipa-axum — OpenAPI spec generation
  • utoipa-swagger-ui — embedded Swagger UI
  • axum-typed-websockets — strongly typed WS messages

Gotchas

  • Axum's error handling requires careful design — use thiserror + a custom error type that implements IntoResponse
  • Route organization: use axum::Router::nest() for modular route trees per domain (tournaments, venues, players)
  • State management: use axum::extract::State with Arc<AppState> — avoid the temptation to put everything in one giant state struct

3. Frontend Framework

Recommendation: SvelteKit (Svelte 5 + runes reactivity)

Alternatives Considered

| Framework | Pros | Cons |
|---|---|---|
| SvelteKit | Smallest bundles, true compilation (no virtual DOM), built-in routing/SSR/PWA, Svelte 5 runes are elegant | Smaller ecosystem than React |
| Next.js (React) | Largest ecosystem, most libraries, biggest job market | Vercel lock-in concerns, React hydration overhead, larger bundles, RSC complexity |
| SolidStart | Finest-grained reactivity, near-zero overhead updates | Smallest ecosystem, least mature, fewer component libraries |
| Nuxt (Vue) | Good DX, solid ecosystem | Vue 3 composition API less elegant than Svelte 5 runes |

Reasoning

SvelteKit is the best fit for PVM for several reasons:

  1. Performance matters for venue displays: Tournament clocks, waiting lists, and seat maps will run on venue TVs via the venue display system. Svelte's compiled output produces minimal JavaScript — the display client loads faster and uses less memory on low-powered TV hardware (Android boxes, smart TVs, Chromecasts).
  2. Real-time UI updates: Svelte 5's fine-grained reactivity (runes: $state, $derived, $effect) means updating a single timer or seat status re-renders only that DOM node, not a virtual DOM diff. This is ideal for dashboards with many independently updating elements.
  3. PWA support: SvelteKit has first-class service worker support and offline capabilities through @sveltejs/adapter-static and vite-plugin-pwa.
  4. Bundle size: SvelteKit produces the smallest JavaScript bundles of any major framework — important for mobile PWA users on venue WiFi.
  5. Claude Code compatibility: Svelte's template syntax is straightforward and less boilerplate than React — Claude can generate clean, readable Svelte components efficiently.
  6. No framework lock-in: Svelte compiles away, so there's no runtime dependency. The output is vanilla JS + DOM manipulation.

UI Component Library

Recommendation: Skeleton UI (Svelte-native) or shadcn-svelte (Tailwind-based, port of shadcn/ui)

shadcn-svelte is particularly compelling because:

  • Components are copied into your codebase (not a dependency) — full control
  • Built on Tailwind CSS — consistent with the styling recommendation
  • Accessible by default (uses Bits UI primitives under the hood)
  • Matches the design patterns of the widely-used shadcn/ui ecosystem

Gotchas

  • SvelteKit's SSR is useful for the management dashboard, but the display client and PWA may use adapter-static for pure SPA mode
  • Svelte's ecosystem is smaller than React's, but for PVM's needs (forms, tables, charts, real-time) the ecosystem is sufficient
  • Svelte 5 (runes) is a significant API change from Svelte 4 — ensure all examples and libraries target Svelte 5

4. Database Strategy

Recommendation: PostgreSQL (cloud primary) + libSQL/SQLite (local node) + Electric SQL or custom sync

Alternatives Considered

| Approach | Pros | Cons |
|---|---|---|
| Postgres cloud + libSQL local + sync | Best of both worlds — Postgres power in cloud, SQLite simplicity on RPi5 | Needs a sync layer, schema divergence risk |
| Postgres everywhere | One DB engine, simpler mental model | Postgres on RPi5 uses more memory, harder offline |
| libSQL/Turso everywhere | One engine, built-in edge replication | Less powerful for complex cloud queries and multi-tenant partitioning |
| CockroachDB | Distributed, strong consistency | Heavy for RPi5, expensive, overkill |

Detailed Recommendation

Cloud Database: PostgreSQL 16+

  • The gold standard for multi-tenant SaaS
  • Row-level security (RLS) for tenant isolation
  • JSONB for flexible per-venue configuration
  • Excellent full-text search for player lookup across venues
  • Partitioning by tenant for performance at scale
  • Managed options: Neon (serverless, branching for dev), Supabase, or AWS RDS

Local Node Database: libSQL (via Turso's embedded runtime)

  • Fork of SQLite with cloud sync capabilities
  • Runs embedded in the Rust binary — no separate database process on RPi5
  • WAL mode for concurrent reads during tournament operations
  • Tiny memory footprint (< 10 MB typical)
  • libSQL's Rust driver (libsql) is well-maintained

Sync Strategy:

The local node operates on a subset of the cloud data — only data relevant to its venue(s). The sync approach:

  1. Cloud-to-local: Player profiles, memberships, credit lines pushed to local node via NATS JetStream. Local node maintains a read replica of relevant data in libSQL.
  2. Local-to-cloud: Tournament results, waitlist changes, transactions pushed to cloud via NATS JetStream with at-least-once delivery. Cloud processes as events.
  3. Conflict resolution: Last-writer-wins (LWW) with vector clocks for most entities. For financial data (credit lines, buy-ins), use event sourcing — conflicts are impossible because every transaction is an immutable event.
  4. Offline queue: When disconnected, local node queues mutations in a local WAL-style append-only log. On reconnect, replays in order via NATS.
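The offline queue in step 4 can be pictured as an append-only structure that is drained strictly in order on reconnect. Below is a minimal std-only sketch; the `Mutation` shape, the `OfflineQueue` name, and the publish closure (standing in for a NATS JetStream publish) are all illustrative, not the actual implementation:

```rust
use std::collections::VecDeque;

/// One queued mutation. A real entry would carry a serialized event body
/// and a monotonic local timestamp for conflict resolution.
#[derive(Debug, Clone, PartialEq)]
struct Mutation {
    local_seq: u64,
    subject: String, // NATS subject to publish on, e.g. "sync.node-1.upstream"
    payload: String, // serialized event body
}

/// Append-only offline queue: mutations are pushed while disconnected and
/// drained strictly in insertion order on reconnect.
#[derive(Default)]
struct OfflineQueue {
    next_seq: u64,
    entries: VecDeque<Mutation>,
}

impl OfflineQueue {
    fn enqueue(&mut self, subject: &str, payload: &str) -> u64 {
        self.next_seq += 1;
        self.entries.push_back(Mutation {
            local_seq: self.next_seq,
            subject: subject.to_string(),
            payload: payload.to_string(),
        });
        self.next_seq
    }

    /// Replay entries in order. The publish closure stands in for a
    /// JetStream publish; returning Err stops the drain so unacknowledged
    /// entries stay queued for the next attempt.
    fn replay<F: FnMut(&Mutation) -> Result<(), String>>(&mut self, mut publish: F) -> usize {
        let mut sent = 0;
        while let Some(m) = self.entries.front() {
            if publish(m).is_err() {
                break;
            }
            self.entries.pop_front();
            sent += 1;
        }
        sent
    }
}
```

In practice the queue would be backed by a libSQL table rather than memory, so it survives a power loss mid-offline.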

ORM / Query Layer

Recommendation: sqlx (compile-time checked queries)

  • sqlx checks SQL queries against the actual database schema at compile time
  • No ORM abstraction layer — write real SQL, get compile-time safety
  • Supports both PostgreSQL and SQLite/libSQL
  • Avoids the N+1 query problems that ORMs introduce
  • Migrations via sqlx migrate

Alternative: sea-orm if you want a full ORM, but for PVM the explicit SQL approach of sqlx gives more control over multi-tenant queries and complex joins.

Migrations

  • Use sqlx migrate for cloud PostgreSQL migrations
  • Maintain parallel migration files for libSQL (SQLite-compatible subset)
  • A shared migration test ensures both schemas stay compatible for the sync subset

Gotchas

  • PostgreSQL and SQLite have different SQL dialects — the sync subset must use compatible types (no Postgres-specific types in synced tables)
  • libSQL's VECTOR type is interesting for future player similarity features but not needed initially
  • Turso's hosted libSQL replication is an option but adds a dependency — prefer embedded libSQL with custom NATS-based sync for more control
  • Schema versioning must be tracked on the local node so the cloud knows what schema version it's talking to

5. Caching Layer

Recommendation: DragonflyDB

Alternatives Considered

| Option | Pros | Cons |
|---|---|---|
| DragonflyDB | 25x Redis throughput (vendor benchmark), Redis-compatible API, multi-threaded, lower memory usage | Younger project, smaller community |
| Redis 7+ | Most mature, largest ecosystem, Redis Stack modules | Single-threaded core, licensing concerns since the 7.4 relicensing (RSALv2/SSPLv1) |
| Valkey | Redis fork, community-driven, BSD license | Still catching up to Redis feature parity |
| KeyDB | Multi-threaded Redis fork | Development appears stalled (no updates in 1.5+ years) |
| No cache (just Postgres) | Simpler architecture | Higher DB load, slower for session/real-time data |

Reasoning

DragonflyDB is the right choice for PVM:

  1. Redis API compatibility: Drop-in replacement — all Redis client libraries work unchanged. The fred Rust crate (async Redis client) works with DragonflyDB out of the box.
  2. Multi-threaded architecture: DragonflyDB uses all available CPU cores, unlike Redis's single-threaded model. This matters when caching tournament state for hundreds of concurrent venues.
  3. Memory efficiency: DragonflyDB reports substantially lower memory usage than Redis for the same dataset — important for keeping infrastructure costs low.
  4. No license concerns: DragonflyDB uses BSL 1.1 (converts to open source after 4 years). Redis switched to a dual-license model that's more restrictive. Valkey is BSD but is playing catch-up.
  5. Pub/Sub: DragonflyDB supports Redis Pub/Sub — useful as a lightweight complement to NATS for in-process event distribution within the backend cluster.

What to Cache

  • Session data: User sessions, JWT refresh tokens
  • Tournament state: Current level, blinds, clock, player counts (hot read path)
  • Waiting lists: Ordered sets per venue/game type
  • Rate limiting: API rate limit counters
  • Player lookup cache: Frequently accessed player profiles
  • Seat maps: Current table/seat assignments per venue

What NOT to Cache (use Postgres directly)

  • Financial transactions (credit lines, buy-ins) — always hit the source of truth
  • Audit logs
  • Historical tournament data

Local Node: No DragonflyDB

The RPi5 local node should not run DragonflyDB. libSQL is fast enough for local caching needs, and adding another process increases complexity and memory usage on constrained hardware. Use in-memory Rust data structures (e.g., DashMap, moka cache crate) for hot local state.

Gotchas

  • DragonflyDB's replication features are less mature than Redis Sentinel/Cluster — use managed hosting or keep it simple with a single node + persistence initially
  • Monitor DragonflyDB's release cycle — it's actively developed but younger than Redis
  • Keep the cache layer optional — the system should function (slower) without it

6. Message Queue / Event Streaming

Recommendation: NATS + JetStream

Alternatives Considered

| Option | Pros | Cons |
|---|---|---|
| NATS + JetStream | Lightweight (single binary, ~20 MB), sub-ms latency, built-in persistence, embedded mode, perfect for edge | Smaller community than Kafka |
| Apache Kafka | Highest throughput, mature, excellent tooling | Heavy (JVM, ZooKeeper/KRaft), 4 GB+ RAM minimum, overkill for PVM's scale |
| RabbitMQ | Mature AMQP, sophisticated routing | Higher latency (5-20 ms), more memory, Erlang ops complexity |
| Redis Streams | Simple, already have cache layer | Not designed for reliable message delivery at scale |

Reasoning

NATS + JetStream is purpose-built for PVM's architecture:

  1. Edge-native: NATS can run as a leaf node on the RPi5, connecting to the cloud NATS cluster. This is the core of the local-to-cloud sync architecture. When the connection drops, JetStream buffers messages locally and replays them on reconnect.

  2. Lightweight: NATS server is a single ~20 MB binary. On RPi5, it uses ~50 MB RAM. Compare to Kafka's 4 GB minimum.

  3. Sub-millisecond latency: Core NATS delivers messages in < 1ms. JetStream (persistent) adds 1-5ms. This is critical for real-time tournament updates — when a player busts, every connected display should update within milliseconds.

  4. Subject-based addressing: NATS subjects map perfectly to PVM's domain:

    • venue.{venue_id}.tournament.{id}.clock — tournament clock ticks
    • venue.{venue_id}.waitlist.update — waiting list changes
    • venue.{venue_id}.seats.{table_id} — seat assignments
    • player.{player_id}.notifications — player-specific events
    • sync.{node_id}.upstream — local node to cloud sync
    • sync.{node_id}.downstream — cloud to local node sync
  5. Built-in patterns: Request/reply (for RPC between cloud and node), pub/sub (for broadcasts), queue groups (for load-balanced consumers), key-value store (for distributed config), object store (for binary data like player photos).

  6. JetStream for durability: Tournament results, financial transactions, and sync operations need guaranteed delivery. JetStream provides at-least-once and exactly-once delivery semantics with configurable retention.
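The subject hierarchy in point 4 is easiest to keep consistent when subjects are built by small helpers rather than hand-assembled at call sites. A std-only sketch with hypothetical helper names; the trailing `>` is standard NATS wildcard syntax matching one or more tokens:

```rust
/// Build the clock subject for one tournament, following the
/// venue.{venue_id}.tournament.{id}.clock scheme described above.
fn tournament_clock_subject(venue_id: &str, tournament_id: &str) -> String {
    format!("venue.{venue_id}.tournament.{tournament_id}.clock")
}

/// Waitlist change notifications for a venue.
fn waitlist_subject(venue_id: &str) -> String {
    format!("venue.{venue_id}.waitlist.update")
}

/// A gateway that wants every tournament event for one venue can subscribe
/// with a trailing `>` wildcard (matches one or more trailing tokens).
fn venue_tournament_wildcard(venue_id: &str) -> String {
    format!("venue.{venue_id}.tournament.>")
}
```

Centralizing subject construction like this makes a later namespace migration a one-file change instead of a grep across the codebase.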

Architecture

RPi5 Local Node                 Cloud
┌──────────────┐                ┌──────────────────┐
│  NATS Leaf   │◄──── TLS ────► │  NATS Cluster    │
│  Node        │    (auto-      │  (3-node)        │
│              │    reconnect)  │                  │
│  JetStream   │                │  JetStream       │
│  (local buf) │                │  (persistent)    │
└──────────────┘                └──────────────────┘

Gotchas

  • NATS JetStream's exactly-once semantics require careful consumer design — use idempotent handlers with deduplication IDs
  • Subject namespace design is critical — plan it early, changing later is painful
  • NATS leaf nodes need TLS configured for secure cloud connection
  • Monitor JetStream stream sizes on RPi5 — set max bytes limits to avoid filling the SD card during extended offline periods
  • The async-nats Rust crate is the official async client — well maintained and Tokio-native
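The idempotent-handler pattern from the first gotcha reduces to tracking processed message IDs. A std-only sketch; the unbounded in-memory set is a deliberate simplification (a real consumer would bound it or persist it transactionally alongside the state it guards, and JetStream's own deduplication uses the Nats-Msg-Id header within a dedup window):

```rust
use std::collections::HashSet;

/// Idempotent event consumer: at-least-once delivery means the same message
/// can arrive twice, so side effects are applied only on first sight of an ID.
struct IdempotentConsumer {
    seen: HashSet<String>,
    applied: u32,
}

impl IdempotentConsumer {
    fn new() -> Self {
        Self { seen: HashSet::new(), applied: 0 }
    }

    /// Returns true if the message was applied, false if it was a duplicate.
    fn handle(&mut self, msg_id: &str) -> bool {
        if !self.seen.insert(msg_id.to_string()) {
            return false; // redelivery of an already-processed message
        }
        self.applied += 1; // apply the side effect exactly once
        true
    }
}
```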

7. Real-Time Communication

Recommendation: WebSockets (via Axum) for interactive clients + NATS for backend fan-out + SSE as fallback

Alternatives Considered

| Option | Pros | Cons |
|---|---|---|
| WebSockets | Full duplex, low latency, wide support | Requires connection management, can't traverse some proxies |
| Server-Sent Events (SSE) | Simpler, auto-reconnect, HTTP-native | Server-to-client only, no binary support |
| WebTransport | HTTP/3, multiplexed streams, unreliable mode | Very new, limited browser support, no Chromecast support |
| Socket.IO | Auto-fallback, rooms, namespaces | Node.js-centric, adds overhead, not Rust-native |
| gRPC streaming | Typed, efficient, bidirectional | Not browser-native (needs grpc-web proxy), overkill |

Architecture

The real-time pipeline has three layers:

  1. NATS (backend event bus): All state changes publish to NATS subjects. This is the single source of real-time truth. Both cloud services and local nodes publish here.

  2. WebSocket Gateway (Axum): A dedicated Axum service subscribes to relevant NATS subjects and fans out to connected WebSocket clients. Each client subscribes to the venues/tournaments they care about.

  3. SSE Fallback: For environments where WebSockets are blocked (some corporate networks), provide an SSE endpoint that delivers the same event stream. SSE's built-in auto-reconnect with Last-Event-ID makes resumption simple.

Flow Example: Tournament Clock Update

Tournament Service (Rust)
  → publishes to NATS: venue.123.tournament.456.clock {level: 5, time_remaining: 1200}
  → WebSocket Gateway subscribes to venue.123.tournament.*
  → fans out to all connected clients watching tournament 456
  → Chromecast receiver app gets update, renders clock
  → PWA on player's phone gets update, shows current level

Implementation Details

  • Use axum::extract::ws::WebSocket with tokio::select! to multiplex NATS subscription + client messages
  • Implement heartbeat/ping-pong to detect stale connections (30s interval, 10s timeout)
  • Client reconnection with exponential backoff + subscription replay from NATS JetStream
  • Binary message format: consider MessagePack (rmp-serde) for compact payloads over WebSocket, with JSON as human-readable fallback
  • Connection limits: track per-venue connection count, implement backpressure

Gotchas

  • WebSocket connections are stateful — need sticky sessions or a connection registry if running multiple gateway instances
  • Chromecast receiver apps have limited WebSocket support — test thoroughly on actual hardware
  • Mobile PWAs going to background will drop WebSocket connections — design for reconnection and state catch-up
  • Rate limit outbound messages to prevent flooding slow clients (tournament clock ticks should be throttled to 1/second for display, even if internal state updates more frequently)
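The clock-tick throttling in the last gotcha is a few lines of per-connection state. A deterministic std-only sketch using plain millisecond timestamps so it can be tested without a clock (a real gateway would use `tokio::time::Instant`); the `Throttle` name is illustrative:

```rust
/// Throttle outbound display updates to at most one per interval, even when
/// internal state ticks faster.
struct Throttle {
    interval_ms: u64,
    last_emit_ms: Option<u64>,
}

impl Throttle {
    fn new(interval_ms: u64) -> Self {
        Self { interval_ms, last_emit_ms: None }
    }

    /// Returns true if an update should be sent at time `now_ms`.
    fn should_emit(&mut self, now_ms: u64) -> bool {
        match self.last_emit_ms {
            // Still inside the quiet interval: drop this tick.
            Some(last) if now_ms < last + self.interval_ms => false,
            // First tick, or interval elapsed: emit and remember when.
            _ => {
                self.last_emit_ms = Some(now_ms);
                true
            }
        }
    }
}
```

The gateway keeps one `Throttle` per (connection, subject) pair; dropped ticks are harmless because each clock message carries absolute state, not deltas.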

8. Auth & Authorization

Recommendation: Custom JWT auth with Postgres-backed RBAC + optional OAuth2 social login

Alternatives Considered

| Option | Pros | Cons |
|---|---|---|
| Custom JWT + RBAC | Full control, no vendor dependency, works offline on local node | Must implement everything yourself |
| Auth0 / Clerk | Managed, social login, MFA out of box | Vendor lock-in, cost scales with users, doesn't work offline |
| Keycloak | Self-hosted, full-featured, OIDC/SAML | Heavy (Java), complex to operate, overkill |
| Ory (Kratos + Keto) | Open source, cloud-native, API-first | Multiple services to deploy, newer |
| Lucia Auth | Lightweight, framework-agnostic | TypeScript-only, no Rust support |

Architecture

PVM's auth has a unique challenge: cross-venue universal player accounts that must work both online (cloud) and offline (local node). This rules out purely managed auth services.

Token Strategy:

Access Token (JWT, short-lived: 15 min)
├── sub: player_id (universal)
├── tenant_id: current operator
├── venue_id: current venue (if applicable)
├── roles: ["player", "dealer", "floor_manager", "admin"]
├── permissions: ["tournament.manage", "waitlist.view", ...]
└── iat, exp, iss

Refresh Token (opaque, stored in DB/DragonflyDB, long-lived: 30 days)
└── Rotated on each use, old tokens invalidated
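Rotation as described can be sketched against an in-memory store. Everything here is illustrative: the `rt-{n}` token format is a placeholder for a CSPRNG-generated value, and production state would live in Postgres or DragonflyDB rather than a process-local map:

```rust
use std::collections::HashMap;

/// Refresh-token rotation sketch: each token is single-use; redeeming it
/// invalidates the old token and issues a replacement.
struct RefreshStore {
    active: HashMap<String, String>, // token -> player_id
    counter: u64,
}

impl RefreshStore {
    fn new() -> Self {
        Self { active: HashMap::new(), counter: 0 }
    }

    fn issue(&mut self, player_id: &str) -> String {
        self.counter += 1;
        // Placeholder ID scheme; real tokens must be unguessable random values.
        let token = format!("rt-{}", self.counter);
        self.active.insert(token.clone(), player_id.to_string());
        token
    }

    /// Rotate: the presented token is consumed and replaced. A replayed
    /// (already-used) token returns None — rejecting it also flags possible
    /// token theft.
    fn rotate(&mut self, token: &str) -> Option<String> {
        let player_id = self.active.remove(token)?;
        Some(self.issue(&player_id))
    }
}
```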

RBAC Model:

Operator (tenant)
├── Admin — full control over all venues
├── Manager — manage specific venues
├── Floor Manager — tournament/table operations at a venue
├── Dealer — assigned to tables, report results
└── Player — universal account, cross-venue
    ├── can self-register
    ├── has memberships per venue
    └── has credit lines per venue (managed by admin)

Key Design Decisions:

  1. Tenant-scoped roles: A user can be an admin in one operator's venues and a player in another. The (user_id, operator_id, role) triple is the authorization unit.
  2. Offline auth on local node: The local node caches valid JWT signing keys and a subset of user credentials (hashed). Players can authenticate locally when the cloud is unreachable. New registrations queue for cloud sync.
  3. JWT signing: Use Ed25519 (fast, small signatures) via the jsonwebtoken crate. The cloud signs tokens; the local node can verify them with the public key. For offline token issuance, the local node has a delegated signing key.
  4. Password hashing: argon2 crate — memory-hard, resistant to GPU attacks. Tune parameters for RPi5 (lower memory cost than cloud).
  5. Social login (optional, cloud-only): Support Google/Apple sign-in for player accounts via standard OAuth2 flows. Map social identities to the universal player account.
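Decision 1's (user_id, operator_id, role) triple can be modeled directly as the lookup key for authorization checks. A minimal std-only sketch with illustrative names:

```rust
use std::collections::HashSet;

/// The (user_id, operator_id, role) triple from decision 1 — the atomic
/// unit of authorization. The same user can hold different roles under
/// different operators.
#[derive(Hash, PartialEq, Eq)]
struct Grant {
    user_id: String,
    operator_id: String,
    role: String,
}

struct Rbac {
    grants: HashSet<Grant>,
}

impl Rbac {
    fn new() -> Self {
        Self { grants: HashSet::new() }
    }

    fn grant(&mut self, user: &str, operator: &str, role: &str) {
        self.grants.insert(Grant {
            user_id: user.into(),
            operator_id: operator.into(),
            role: role.into(),
        });
    }

    /// A role check is always tenant-scoped: being an admin for operator A
    /// grants nothing under operator B.
    fn has_role(&self, user: &str, operator: &str, role: &str) -> bool {
        self.grants.contains(&Grant {
            user_id: user.into(),
            operator_id: operator.into(),
            role: role.into(),
        })
    }
}
```

In the real system these grants would be rows in Postgres and the check would run in Axum middleware after JWT verification.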

Gotchas

  • Token revocation is hard with JWTs — use short expiry (15 min) + refresh token rotation + a lightweight blocklist in DragonflyDB for immediate revocation
  • Cross-venue account linking: when a player signs up at venue A and later visits venue B (different operator), they should be recognized. Use email/phone as the universal identifier with verification.
  • Local node token issuance must be time-limited and logged — cloud should audit all locally-issued tokens on sync
  • Rate limit login attempts both on cloud and local node to prevent brute force

9. API Design

Recommendation: REST + OpenAPI 3.1 with generated TypeScript client

Alternatives Considered

| Approach | Pros | Cons |
|---|---|---|
| REST + OpenAPI | Universal, tooling-rich, generated clients, cacheable | Overfetching possible, multiple round trips |
| GraphQL | Flexible queries, single endpoint, good for complex UIs | Complexity overhead, caching harder, Rust support less mature |
| tRPC | Zero-config type safety | TypeScript-only — cannot use with Rust backend |
| gRPC | Efficient binary protocol, streaming | Needs proxy for browsers, overkill for this use case |

Reasoning

tRPC is ruled out because it requires both client and server to be TypeScript. With a Rust backend, this is not viable.

REST + OpenAPI is the best approach because:

  1. Generated type safety: Use utoipa to generate OpenAPI 3.1 specs from Rust types, then openapi-typescript to generate TypeScript types for the frontend. Changes to the Rust API automatically propagate to the frontend types.
  2. Cacheable: REST's HTTP semantics enable CDN caching, ETag support, and conditional requests — important for player profiles and tournament structures that change infrequently.
  3. Universal clients: The REST API will also be consumed by the venue display client, the local node sync layer, and potentially third-party integrations. OpenAPI makes all of these easy.
  4. Tooling: Swagger UI for exploration, openapi-fetch for the TypeScript client (type-safe fetch wrapper), Postman/Insomnia for testing.

API Conventions

# Resource-based URLs
GET    /api/v1/venues/{venue_id}/tournaments
POST   /api/v1/venues/{venue_id}/tournaments
GET    /api/v1/venues/{venue_id}/tournaments/{id}
PATCH  /api/v1/venues/{venue_id}/tournaments/{id}

# Actions as sub-resources
POST   /api/v1/venues/{venue_id}/tournaments/{id}/start
POST   /api/v1/venues/{venue_id}/tournaments/{id}/pause
POST   /api/v1/venues/{venue_id}/waitlists/{id}/join
POST   /api/v1/venues/{venue_id}/waitlists/{id}/call/{player_id}

# Cross-venue player operations
GET    /api/v1/players/me
GET    /api/v1/players/{id}/memberships
POST   /api/v1/players/{id}/credit-lines

# Real-time subscriptions
WS     /api/v1/ws?venue={id}&subscribe=tournament.clock,waitlist.updates

Type Generation Pipeline

Rust structs (serde + utoipa derive)
  → OpenAPI 3.1 JSON spec (generated at build time)
  → openapi-typescript (CI step)
  → TypeScript types + openapi-fetch client
  → SvelteKit frontend consumes typed API

Gotchas

  • Version the API from day one (/api/v1/) — breaking changes go in /api/v2/
  • Use cursor-based pagination for lists (not offset-based) — more efficient and handles concurrent inserts
  • Standardize error responses: { error: { code: string, message: string, details?: any } }
  • Consider a lightweight BFF (Backend-for-Frontend) pattern in SvelteKit's server routes for aggregating multiple API calls into one page load
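Cursor-based pagination from the second gotcha, sketched std-only over an in-memory list of ids sorted ascending; the cursor is simply the sort key of the last row returned, so concurrent inserts never shift pages the way an offset would. The SQL equivalent is `WHERE id > $cursor ORDER BY id LIMIT $n`. The function shape is illustrative:

```rust
/// Return one page of ids strictly after `cursor`, plus the next cursor.
/// Assumes `rows` is sorted ascending (the ORDER BY in the real query).
/// A `None` next-cursor signals the final page.
fn page_after(rows: &[u64], cursor: Option<u64>, limit: usize) -> (Vec<u64>, Option<u64>) {
    let start = cursor.unwrap_or(0);
    let page: Vec<u64> = rows
        .iter()
        .copied()
        .filter(|&id| id > start) // WHERE id > $cursor
        .take(limit)              // LIMIT $n
        .collect();
    // A short page means we ran out of rows — no further cursor.
    let next = if page.len() == limit { page.last().copied() } else { None };
    (page, next)
}
```

Real cursors would encode the sort key (often base64 of a composite key) rather than expose a raw id.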

10. Local Node Architecture

Recommendation: Single Rust binary running on RPi5 with embedded libSQL, NATS leaf node, and local HTTP/WS server

What Runs on the RPi5

┌────────────────────────────────────────────────┐
│ PVM Local Node (single Rust binary, ~15-20 MB) │
│                                                │
│  ┌─────────────┐   ┌─────────────┐             │
│  │ HTTP/WS     │   │ NATS Leaf   │             │
│  │ Server      │   │ Node        │             │
│  │ (Axum)      │   │ (embedded   │             │
│  │             │   │  or sidecar)│             │
│  └──────┬──────┘   └──────┬──────┘             │
│         │                 │                    │
│  ┌──────┴─────────────────┴──────┐             │
│  │       Application Core        │             │
│  │  - Tournament engine          │             │
│  │  - Clock manager              │             │
│  │  - Waitlist manager           │             │
│  │  - Seat assignment            │             │
│  │  - Sync orchestrator          │             │
│  └──────────────┬────────────────┘             │
│                 │                              │
│  ┌──────────────┴────────────────┐             │
│  │       libSQL (embedded)       │             │
│  │  - Venue data subset          │             │
│  │  - Offline mutation queue     │             │
│  │  - Local auth cache           │             │
│  └───────────────────────────────┘             │
│                                                │
│  ┌───────────────────────────────┐             │
│  │  moka in-memory cache         │             │
│  │  - Hot tournament state       │             │
│  │  - Active session tokens      │             │
│  └───────────────────────────────┘             │
└────────────────────────────────────────────────┘

Offline Operations

When the cloud connection drops, the local node continues operating:

  1. Tournament operations: Clock continues, blinds advance, players bust/rebuy — all local state
  2. Waitlist management: Players can join/leave waitlists — queued for cloud sync
  3. Seat assignments: Floor managers can move players between tables locally
  4. Player auth: Cached credentials allow existing players to log in. New registrations queued.
  5. Financial operations: Buy-ins and credit transactions logged locally with offline flag. Cloud reconciles on reconnect.

Sync Protocol

On reconnect:
1. Local node sends its last-seen cloud sequence number
2. Cloud sends all events since that sequence (via NATS JetStream replay)
3. Local node sends its offline mutation queue (ordered by local timestamp)
4. Cloud processes mutations, detects conflicts, responds with resolution
5. Local node applies cloud resolutions, updates local state
6. Both sides confirm sync complete
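Step 2 of the protocol reduces to "give me everything after sequence N". JetStream performs this replay natively from a stream's own sequence numbers; the sketch below just models the contract with illustrative types:

```rust
/// One durable cloud event, identified by its stream sequence number.
#[derive(Clone, Debug, PartialEq)]
struct Event {
    seq: u64,
    body: String,
}

/// Select the events a node missed: everything strictly after its
/// last-seen cloud sequence number.
fn events_since(log: &[Event], last_seen: u64) -> Vec<Event> {
    log.iter().filter(|e| e.seq > last_seen).cloned().collect()
}
```

Because the cursor is a sequence number rather than a timestamp, replay is exact even if the node's clock drifted while offline.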

Conflict Resolution Strategy

| Data Type | Strategy | Reasoning |
|---|---|---|
| Tournament state | Cloud wins | Only one node runs a tournament at a time |
| Waitlist | Merge (union) | Both sides can add/remove; merge and re-order by timestamp |
| Player profiles | Cloud wins (LWW) | Cloud is the authority for universal accounts |
| Credit transactions | Append-only (event sourcing) | No conflicts — every transaction is immutable |
| Seat assignments | Local wins during offline | Floor manager's local decisions take precedence |
| Dealer schedules | Cloud wins | Schedules are set centrally |
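The waitlist strategy (union, dedupe, re-order by timestamp) can be shown as a small std-only sketch; the `WaitEntry` shape is illustrative, and on duplicate player ids the first-seen side (cloud here) wins:

```rust
use std::collections::HashSet;

/// One waitlist entry; the real record would also carry game type, stakes,
/// and notification state.
#[derive(Clone, Debug, PartialEq)]
struct WaitEntry {
    player_id: String,
    joined_at_ms: u64,
}

/// Merge (union) of cloud and local waitlists: take entries from both
/// sides, drop duplicates by player id (cloud listed first, so it wins),
/// then re-order by join timestamp.
fn merge_waitlists(cloud: &[WaitEntry], local: &[WaitEntry]) -> Vec<WaitEntry> {
    let mut seen = HashSet::new();
    let mut merged: Vec<WaitEntry> = cloud
        .iter()
        .chain(local.iter())
        .filter(|e| seen.insert(e.player_id.clone()))
        .cloned()
        .collect();
    merged.sort_by_key(|e| e.joined_at_ms);
    merged
}
```

The timestamp ordering is why accurate NTP on the RPi5 (see the chrony note below) matters to fairness, not just to logging.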

RPi5 System Setup

  • OS: Raspberry Pi OS Lite (64-bit, Debian Bookworm-based) — no desktop environment
  • Runtime: Docker + Docker Compose. Two containers: pvm-node (Rust binary) + pvm-nats-leaf (NATS)
  • Storage: 32 GB+ microSD or USB SSD (recommended for durability). libSQL database in a Docker volume.
  • Auto-start: Docker Compose with restart: always. systemd service ensures Docker starts on boot.
  • Updates: docker compose pull && docker compose up -d — automated via cron or webhook from cloud.
  • Watchdog: Docker health checks + hardware watchdog timer to auto-reboot if containers fail.
  • Networking: Ethernet preferred (reliable), WiFi as fallback. mDNS for local display device discovery. WireGuard tunnel to Hetzner cloud.

Gotchas

  • RPi5 has 4 GB or 8 GB RAM — target 8 GB model, budget ~200 MB for the PVM process + NATS
  • SD card wear: use an external USB SSD for the libSQL database if heavy write operations are expected
  • Time synchronization: use chrony NTP client — accurate timestamps are critical for conflict resolution and tournament clocks
  • Power loss: libSQL in WAL mode is crash-safe, but implement a clean shutdown handler (SIGTERM) that flushes state
  • Security: the RPi5 is physically accessible in venues — encrypt the libSQL database at rest, disable SSH password auth, use key-only

11. Venue Display System

Recommendation: Generic web display app + Android display client (no Google Cast SDK dependency)

Architecture

┌──────────────────┐
│ Screen Manager   │   (part of admin dashboard)
│ - Assign streams │   Venue staff assigns content to each display
│ - Per-TV config  │
└────────┬─────────┘
         │ WebSocket (display assignment)
         ▼
┌──────────────────┐  mDNS auto-  ┌──────────────────┐
│ Local RPi5 Node  │◄─ discovery ─┤ Display Devices  │
│ serves display   │              │ (Android box /   │
│ web app + WS     ├─────────────►│  smart TV /      │
│                  │              │  Chromecast)     │
└────────┬─────────┘              └────────┬─────────┘
         │                                 │
    if offline:                       fallback:
    serves locally                connect to cloud
         │                        SaaS URL directly
         ▼                                 │
┌──────────────────┐              ┌────────▼─────────┐
│ Display renders  │              │ Display renders  │
│ from local node  │              │ from cloud       │
└──────────────────┘              └──────────────────┘

Display Client (Android App)

A lightweight Android app (or a $40 4K Android box) that:

  1. Auto-starts on boot — kiosk mode, no user interaction needed
  2. Discovers the local node via mDNS — zero-config for venue staff, falls back to manual IP entry
  3. Registers with a unique device ID — appears automatically in the Screen Manager dashboard
  4. Receives display assignment via WebSocket — the system tells it what to render
  5. Renders a full-screen web page — the display content is a standard SvelteKit static page
  6. Falls back to cloud SaaS if the local RPi5 node is offline
  7. Remotely controllable — venue staff can change the stream, restart, or push an announcement overlay from the Screen Manager

Display Content (SvelteKit Static App)

The display views are a separate SvelteKit static build optimized for large screens:

  • Tournament clock: Large timer, current level, blind structure, next break, average stack
  • Waiting list: Player queue by game type, estimated wait times
  • Table status: Open seats, game types, stakes per table
  • Seatings: Tournament seat assignments after draws
  • Custom slideshow: Announcements, promotions, venue info (managed by staff)
  • Rotation mode: Cycle between multiple views on a configurable timer
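
Rotation mode reduces to a pure scheduling function the display client can call on every clock tick. A minimal sketch (the function name `rotation_view_index` and its signature are illustrative, not part of the actual codebase):

```rust
/// Given a number of views and a per-view dwell time, return the index of the
/// view that should be on screen `elapsed_secs` after the rotation started.
/// Pure function: easy to test, trivial to call from the render loop.
fn rotation_view_index(view_count: usize, dwell_secs: u64, elapsed_secs: u64) -> Option<usize> {
    if view_count == 0 || dwell_secs == 0 {
        return None; // nothing to rotate, or rotation disabled
    }
    Some(((elapsed_secs / dwell_secs) as usize) % view_count)
}
```

Because the function is pure, a display that reboots mid-rotation lands on the same view as its neighbors as long as they share the rotation start time.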

Screen Manager

The Screen Manager (part of the admin dashboard) lets floor managers:

  • See all connected display devices with status (online, offline, content)
  • Assign content streams to each device (TV 1-5: tournament clock, TV 6: waitlist, etc.)
  • Configure rotation/cycling between views per device
  • Send one-time announcements to all screens or specific screens
  • Adjust display themes (dark/light, font size, venue branding)
  • Group screens (e.g. "Tournament Area", "Cash Room", "Lobby")

Technical Details

  • Display web app is served by the local node's HTTP server (Axum) for lowest latency
  • WebSocket connection for live data updates (tournament clock ticks, waitlist changes)
  • Each display device is identified by a stable device ID (generated on first boot, persisted)
  • mDNS service type: _pvm-display._tcp.local for auto-discovery
  • Display URLs: http://{local-node-ip}/display/{device-id} (local) or https://app.pvmapp.com/display/{device-id} (cloud fallback)
  • Dark mode by default (poker venues are low-light environments)
  • Large fonts, high contrast — designed for viewing from across the room
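
The local-vs-cloud URL selection described above is a one-line decision the display client makes after mDNS discovery. A sketch, using the example hosts from the URL scheme above (`display_url` itself is an illustrative name):

```rust
/// Build the URL a display device should load: prefer the local node when one
/// was discovered, fall back to the cloud SaaS otherwise.
fn display_url(local_node_ip: Option<&str>, device_id: &str) -> String {
    match local_node_ip {
        Some(ip) => format!("http://{ip}/display/{device_id}"),
        None => format!("https://app.pvmapp.com/display/{device_id}"),
    }
}
```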

Chromecast Compatibility

Chromecast is supported as a display target but not the primary architecture:

  • Smart TVs with built-in Chromecast or attached Chromecast dongles can open the display URL
  • No Google Cast SDK dependency — just opening a URL
  • The Android display client app is the recommended approach for reliability and offline support

Gotchas

  • Android kiosk mode needs careful implementation — prevent users from exiting the app, handle OS updates gracefully
  • mDNS can be unreliable on some enterprise/venue networks — always offer manual IP fallback
  • Display devices on venue WiFi may have intermittent connectivity — design for reconnection and state catch-up
  • Keep the display app extremely lightweight — some $40 Android boxes have limited RAM
  • Test on actual cheap Android hardware early — performance varies wildly
  • Power cycling (venue closes nightly) must be handled gracefully — auto-start, auto-reconnect, auto-resume
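
Reconnection after intermittent WiFi or nightly power cycling usually means exponential backoff with a cap. A deterministic sketch of the delay schedule (real clients should add random jitter; `backoff_secs` is an illustrative name):

```rust
/// Backoff delay in seconds for the Nth consecutive failed reconnect attempt,
/// doubling from `base_secs` and capped at `max_secs` so a display never
/// waits unreasonably long after the network comes back.
fn backoff_secs(attempt: u32, base_secs: u64, max_secs: u64) -> u64 {
    let exp = attempt.min(16); // avoid shift overflow on pathological counters
    base_secs.saturating_mul(1u64 << exp).min(max_secs)
}
```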

12. Mobile Strategy

Recommendation: PWA first (SvelteKit), with Capacitor wrapper for app store presence when needed

Alternatives Considered

| Approach | Pros | Cons |
|---|---|---|
| PWA (SvelteKit) | One codebase, instant updates, no app store, works offline | Limited native API access, iOS push only since 16.4 (limited), discoverability |
| Capacitor (hybrid) | PWA + native shell, access native APIs, app store distribution | Thin WebView wrapper, some performance overhead |
| Tauri Mobile | Rust backend, small size | Mobile support very early (alpha/beta), limited ecosystem |
| React Native | True native UI, large ecosystem | Separate codebase from web, React dependency, not Svelte |
| Flutter | Excellent cross-platform, single codebase | Dart language, separate from web entirely |

Reasoning

PVM's mobile needs are primarily consumption-oriented — players check tournament schedules, waiting list position, and receive notifications. This is a perfect fit for a PWA:

  1. PWA first: The SvelteKit app with vite-plugin-pwa already provides offline caching, add-to-home-screen, and background sync. For most players, this is sufficient.

  2. Capacitor wrap when needed: When iOS push notifications, Apple Pay, or app store presence becomes important, wrap the existing SvelteKit PWA in Capacitor. Capacitor runs the same web app in a native WebView and provides JavaScript bridges to native APIs.

  3. Tauri Mobile is not ready: As of 2026, Tauri 2.0's mobile support exists but is still maturing. It would be a good fit architecturally (Rust backend + web frontend), but the plugin ecosystem and build tooling aren't as polished as Capacitor's. Revisit in 12-18 months.

PWA Features for PVM

  • Service Worker: Cache tournament schedules, player profile, venue info for offline access
  • Push Notifications: Web Push API for tournament start reminders, waitlist calls (Android + iOS 16.4+)
  • Add to Home Screen: App-like experience without app store
  • Background Sync: Queue waitlist join/leave actions when offline, sync when back online
  • Share Target: Accept shared tournament links
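
One subtlety of Background Sync is coalescing: a player who joins and then leaves a waitlist while offline should produce no sync traffic at all. The real queue lives in the service worker (TypeScript), but the net-action logic can be sketched as a pure function; names here (`WaitlistAction`, `coalesce`) are illustrative:

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum WaitlistAction { Join, Leave }

/// Coalesce a queue of offline actions against one waitlist: a Join followed
/// by a Leave (or vice versa) cancels out, so only the net action is sent
/// when connectivity returns.
fn coalesce(actions: &[WaitlistAction]) -> Option<WaitlistAction> {
    actions.iter().copied().fold(None, |acc, next| match (acc, next) {
        (Some(WaitlistAction::Join), WaitlistAction::Leave) => None,
        (Some(WaitlistAction::Leave), WaitlistAction::Join) => None,
        (_, a) => Some(a),
    })
}
```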

Gotchas

  • iOS PWA support is improving but still has limitations (no background fetch, limited push notification payload)
  • Capacitor requires maintaining iOS/Android build pipelines — only add this when there's a clear need
  • Test PWA on actual mobile devices in venues — WiFi quality varies dramatically
  • Deep linking: configure universal links / app links so shared tournament URLs open in the PWA/app

13. Deployment & Infrastructure

Recommendation: Self-hosted on Hetzner PVE (LXC containers) + Docker + Forgejo Actions CI/CD

Reasoning

The project already has a Hetzner Proxmox VE (PVE) server. Running PVM in LXC containers on the existing infrastructure keeps costs minimal and gives full control.

  1. LXC containers on PVE: Lightweight, near-native performance, easy to snapshot and backup. Each service gets its own container or Docker runs inside an LXC.
  2. Docker Compose for services: All cloud services defined in a single docker-compose.yml. Simple to start, stop, and update.
  3. No vendor lock-in: Everything runs on standard Linux + Docker. Can migrate to any cloud or other bare metal trivially.
  4. WireGuard for RPi5 connectivity: RPi5 local nodes connect to the Hetzner server via WireGuard tunnel for secure NATS leaf node communication.
  5. Forgejo Actions: CI/CD runs on the same Forgejo instance hosting the code.

Infrastructure Layout

Hetzner PVE Server
├── LXC: pvm-cloud
│   ├── Docker: pvm-api (Axum)
│   ├── Docker: pvm-ws-gateway (Axum WebSocket)
│   ├── Docker: pvm-worker (background jobs: sync, notifications)
│   ├── Docker: pvm-nats (NATS cluster)
│   ├── Docker: pvm-db (PostgreSQL 16)
│   └── Docker: pvm-cache (DragonflyDB)
├── LXC: pvm-staging (mirrors production for testing)
└── WireGuard endpoint for RPi5 nodes

Venue (RPi5 — Docker on Raspberry Pi OS)
├── Docker: pvm-node (Rust binary — API proxy + sync engine)
├── Docker: pvm-nats-leaf (NATS leaf node)
└── connects to Hetzner via WireGuard/TLS

RPi5 Local Node (Docker-based)

The local node runs Docker on stock Raspberry Pi OS (64-bit):

  • Provisioning: One-liner curl script installs Docker and pulls the PVM stack (docker compose pull && docker compose up -d)
  • Updates: Pull new images and restart (docker compose pull && docker compose up -d). Automated via a cron job or self-update webhook.
  • Rollback: Previous images remain on disk. Roll back with docker compose up -d --force-recreate using pinned image tags.
  • Services: pvm-node (Rust binary) + pvm-nats-leaf (NATS leaf node). Two containers, minimal footprint.
  • Storage: libSQL database stored in a Docker volume on the SD card (or USB SSD for heavy-write venues).

CI/CD Pipeline (Forgejo Actions)

# Triggered on push to main
1. Lint (clippy, biome)
2. Test (cargo nextest, vitest, playwright)
3. Build (multi-stage Docker for cloud + cross-compile ARM64 for RPi5)
4. Push images to container registry
5. Deploy staging (docker compose pull on staging LXC)
6. E2E tests against staging
7. Deploy production (manual approval, docker compose on production LXC)
8. Publish RPi5 images (ARM64 Docker images to registry)

Gotchas

  • Use multi-stage Docker builds for Rust: builder stage with rust:bookworm, runtime stage with debian:bookworm-slim or distroless
  • PostgreSQL backups: automate pg_dump to a separate backup location (another Hetzner storage box or off-site)
  • Set up blue-green deployments via Docker Compose profiles for zero-downtime upgrades
  • Monitor Hetzner server resources — if PVM outgrows a single server, split services across multiple LXCs or servers
  • WireGuard keys for RPi5 nodes: automate key generation and registration during provisioning
  • The RPi5 Docker update mechanism needs a health check — if new images fail, auto-rollback to previous tag
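
The auto-rollback gotcha reduces to a small decision the update script makes after probing the container's health endpoint a few times. A sketch of that logic (`should_rollback` and its inputs are assumptions; the actual probing and `docker compose` invocation live in the update script):

```rust
/// Decide whether to revert to the previous pinned image tag: roll back when
/// the `threshold` most recent health checks after an update all failed.
/// `recent_checks` is ordered oldest-to-newest, `true` = healthy.
fn should_rollback(recent_checks: &[bool], threshold: usize) -> bool {
    threshold > 0
        && recent_checks.len() >= threshold
        && recent_checks.iter().rev().take(threshold).all(|&ok| !ok)
}
```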

14. Monitoring & Observability

Recommendation: OpenTelemetry (traces + metrics + logs) exported to self-hosted Grafana + Loki + Tempo + Prometheus (on Hetzner PVE)

Alternatives Considered

| Stack | Pros | Cons |
|---|---|---|
| OpenTelemetry + Grafana | Vendor-neutral, excellent Rust support, unified pipeline | Some setup required |
| Datadog | All-in-one, excellent UX | Expensive at scale, vendor lock-in |
| New Relic | Good APM | Cost, Rust support less first-class |
| Sentry | Excellent error tracking | Limited metrics/traces, complementary rather than primary |

Rust Instrumentation Stack

# Key crates
tracing = "0.1"                    # Structured logging/tracing facade
tracing-subscriber = "0.3"        # Log formatting, filtering
tracing-opentelemetry = "0.28"    # Bridge tracing → OpenTelemetry
opentelemetry = "0.28"            # OTel SDK
opentelemetry-otlp = "0.28"      # OTLP exporter
opentelemetry-semantic-conventions # Standard attribute names

What to Monitor

Application Metrics:

  • Request rate, latency (p50/p95/p99), error rate per endpoint
  • WebSocket connection count per venue
  • NATS message throughput and consumer lag
  • Tournament clock drift (local node vs cloud time)
  • Sync latency (time from local mutation to cloud persistence)
  • Cache hit/miss ratios (DragonflyDB)
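
Clock drift is worth turning into an alertable severity rather than a raw gauge. A sketch of the classification (the thresholds and names `classify_drift`/`Drift` are illustrative):

```rust
#[derive(PartialEq, Debug)]
enum Drift { Ok, Warn, Critical }

/// Compare the local node's elapsed clock time for the current level against
/// the cloud's authoritative reading (both in ms) and classify the absolute
/// drift against warn/critical thresholds.
fn classify_drift(local_ms: i64, cloud_ms: i64, warn_ms: i64, crit_ms: i64) -> Drift {
    let d = (local_ms - cloud_ms).abs();
    if d >= crit_ms {
        Drift::Critical
    } else if d >= warn_ms {
        Drift::Warn
    } else {
        Drift::Ok
    }
}
```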

Business Metrics:

  • Active tournaments per venue
  • Players on waiting lists
  • Concurrent connected users
  • Tournament registrations per hour
  • Offline duration per local node

Infrastructure Metrics:

  • CPU, memory, disk per service
  • RPi5 node health: temperature, memory usage, SD card wear level
  • NATS cluster health
  • Postgres connection pool utilization

Local Node Observability

The RPi5 node should:

  • Buffer OpenTelemetry spans/metrics locally when offline
  • Flush to cloud collector on reconnect
  • Expose a local /health endpoint for venue staff to check node status
  • Log to both stdout (for journalctl) and a rotating file
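
Buffering telemetry through a 72-hour outage needs a bound, and when the bound is hit the oldest data is the right thing to drop: the most recent spans are the ones that explain the disconnect. A sketch of such a buffer (`OfflineBuffer` is an illustrative name, not a real crate type):

```rust
use std::collections::VecDeque;

/// Fixed-capacity buffer for telemetry batches while the node is offline.
/// When full, the oldest batch is dropped in favor of newer data.
struct OfflineBuffer<T> {
    cap: usize,
    items: VecDeque<T>,
}

impl<T> OfflineBuffer<T> {
    fn new(cap: usize) -> Self {
        Self { cap, items: VecDeque::new() }
    }

    fn push(&mut self, item: T) {
        if self.cap == 0 {
            return; // buffering disabled
        }
        if self.items.len() == self.cap {
            self.items.pop_front(); // drop oldest
        }
        self.items.push_back(item);
    }

    /// Drain everything for flushing to the cloud collector on reconnect.
    fn drain(&mut self) -> Vec<T> {
        self.items.drain(..).collect()
    }
}
```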

Alerting

  • Use Grafana Alerting for cloud services
  • Critical alerts: API error rate > 5%, NATS cluster partition, Postgres replication lag > 30s
  • Warning alerts: RPi5 node offline > 5 min, sync backlog > 1000 events, high memory usage
  • Notification channels: Slack/Discord for ops team, push notification for venue managers on critical local node issues

Gotchas

  • OpenTelemetry's Rust SDK is stable but evolving — pin versions carefully
  • The tracing crate is the Rust ecosystem standard — everything (Axum, sqlx, async-nats) already emits tracing spans, so you get deep instrumentation for free
  • Sampling is important at scale — don't trace every tournament clock tick in production
  • The self-hosted Grafana + Loki + Tempo + Prometheus stack fits in a single LXC at this scale, but set retention limits (for example, 14 days for traces and 30 days for logs) to keep disk usage on the PVE host bounded

15. Testing Strategy

Recommendation: Multi-layer testing with cargo test (unit/integration), Playwright (E2E), and Vitest (frontend unit)

Test Pyramid

         ▲
        / \        E2E Tests (Playwright)
       /   \       - Full user flows
       /     \      - Display app rendering
     /───────\
    /         \    Integration Tests (cargo test + testcontainers)
   /           \   - API endpoint tests with real DB
  /             \  - NATS pub/sub flows
 /               \ - Sync protocol tests
/─────────────────\
                    Unit Tests (cargo test + vitest)
                    - Domain logic (tournament engine, clock, waitlist)
                    - Svelte component tests
                    - Conflict resolution logic

Backend Testing (Rust)

  • Unit tests: Inline #[cfg(test)] modules for domain logic. The tournament engine, clock manager, waitlist priority algorithm, and conflict resolution are all pure functions that are easy to unit test.
  • Integration tests: Use testcontainers crate to spin up ephemeral Postgres + NATS + DragonflyDB instances. Test full API flows including auth, multi-tenancy, and real-time events.
  • sqlx compile-time checks: SQL queries are validated against the database schema at compile time — this catches a huge class of bugs before runtime.
  • Property-based testing: Use proptest for testing conflict resolution and sync protocol with random inputs.
  • Test runner: cargo-nextest for parallel test execution (significantly faster than default cargo test).
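
The unit-test layer works best when the domain logic stays in pure functions. An illustrative example of the pattern; the real waitlist algorithm in pvm-core may weigh more factors, and the names here (`Entry`, `waitlist_order`) are made up for the sketch:

```rust
/// Illustrative waitlist ordering: priority tier first (lower is called
/// sooner), then first-come-first-served by join timestamp.
#[derive(PartialEq, Eq, Debug)]
struct Entry {
    priority: u8,
    joined_at_ms: u64,
}

fn waitlist_order(entries: &mut Vec<Entry>) {
    entries.sort_by_key(|e| (e.priority, e.joined_at_ms));
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn fifo_within_same_priority() {
        let mut v = vec![
            Entry { priority: 1, joined_at_ms: 200 },
            Entry { priority: 0, joined_at_ms: 300 },
            Entry { priority: 1, joined_at_ms: 100 },
        ];
        waitlist_order(&mut v);
        assert_eq!(v[0], Entry { priority: 0, joined_at_ms: 300 });
        assert_eq!(v[1], Entry { priority: 1, joined_at_ms: 100 });
    }
}
```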

Frontend Testing (TypeScript/Svelte)

  • Component tests: Vitest + @testing-library/svelte for testing Svelte components in isolation.
  • Store/state tests: Vitest for testing reactive state logic (tournament clock state, waitlist updates).
  • API mocking: msw (Mock Service Worker) for intercepting API calls in tests.

End-to-End Testing

  • Playwright: Test critical user flows in real browsers:
    • Tournament creation and management flow
    • Player registration and waitlist join
    • Real-time updates (verify clock ticks appear in browser)
    • Multi-venue admin dashboard
    • Venue display rendering (headless Chromium)
  • Local node E2E: Test offline scenarios — start local node, disconnect from cloud, perform operations, reconnect, verify sync.

Specialized Tests

  • Sync protocol tests: Simulate network partitions, conflicting writes, replay scenarios
  • Load testing: k6 or drill (Rust) for WebSocket connection saturation, API throughput
  • Display tests: Visual regression testing with Playwright screenshots of venue display layouts
  • Cross-browser: Playwright covers Chromium, Firefox, WebKit — ensure PWA works on all

Gotchas

  • Rust integration tests with testcontainers need Docker available in CI — give the Forgejo Actions runner access to the host Docker socket, or run Docker-in-Docker
  • Playwright tests are slow — run in parallel, and only test critical paths in CI (full suite nightly)
  • The local node's offline/reconnect behavior is the hardest thing to test — invest heavily in deterministic sync protocol tests
  • Mock the NATS connection in unit tests using a channel-based mock, not an actual NATS server

16. Security

Recommendation: Defense in depth across all layers

Data Security

| Layer | Measure |
|---|---|
| Transport | TLS 1.3 everywhere — API, WebSocket, NATS, Postgres connections |
| Data at rest | Postgres: LUKS-encrypted volumes on the PVE host. libSQL on RPi5: SQLCipher-compatible encryption via libsql |
| Secrets | Environment files injected via Docker Compose (kept out of images and the repo); encrypted config file on RPi5 (sealed at provisioning) |
| Passwords | Argon2id hashing, tuned per environment (higher params on cloud, lower on RPi5) |
| JWTs | Ed25519 signing, short expiry (15 min), refresh token rotation |
| API keys | SHA-256 hashed in DB, displayed once at creation, prefix-based identification (pvm_live_, pvm_test_) |
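
The prefix convention lets the API identify a key's environment without ever inspecting the secret portion (only the SHA-256 hash of the full key is stored). A sketch, with `api_key_env` as an illustrative function name:

```rust
/// Classify an API key's environment from its prefix alone.
fn api_key_env(key: &str) -> Option<&'static str> {
    if key.starts_with("pvm_live_") {
        Some("live")
    } else if key.starts_with("pvm_test_") {
        Some("test")
    } else {
        None // unknown format: reject before any DB lookup
    }
}
```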

Network Security

  • API: Rate limiting (Tower middleware), CORS restricted to known origins, request size limits
  • WebSocket: Authenticated connection upgrade (JWT in first message or query param), per-connection rate limiting
  • NATS: TLS + token auth between cloud and leaf nodes. Leaf nodes have scoped permissions (can only access their venue's subjects)
  • RPi5: Firewall (nftables/ufw) — only allow outbound to cloud NATS + HTTPS, inbound on local network only for venue devices
  • DDoS: Hetzner provides basic network-level DDoS protection. Put Cloudflare in front of the API if more is needed.
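
Venue scoping of leaf-node NATS permissions comes down to subject prefix matching. A simplified checker, assuming subjects shaped like `venue.{id}.…` and supporting only the trailing `>` wildcard (which is all venue scoping needs; real NATS permissions also support the `*` token wildcard):

```rust
/// Check whether a concrete NATS subject falls under a venue-scoped
/// permission like "venue.v42.>", i.e. the leaf node may only touch
/// subjects rooted at its own venue.
fn subject_allowed(permission: &str, subject: &str) -> bool {
    match permission.strip_suffix(".>") {
        // "venue.v42.>" allows "venue.v42" itself and anything below it,
        // but not "venue.v421.…" (note the dot in the prefix check).
        Some(prefix) => subject == prefix || subject.starts_with(&format!("{prefix}.")),
        None => permission == subject,
    }
}
```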

Financial Data Security

Per decision #4, PVM handles no money at all — venues settle payments through their own POS systems (most are cash-based). PVM only records game data with financial context (buy-ins, entries, rebuys), which still warrants care:

  • All such mutations are event-sourced with an immutable audit trail
  • Record corrections require staff action with a logged actor and reason
  • Buy-in/entry events carry idempotency keys so offline sync replays never create duplicate records
  • Reports are only accessible to operator admins, with access logged
  • No payment card data ever enters the system, keeping PVM out of PCI DSS scope
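
Idempotency-key deduplication on the ingestion side is simple to model. A sketch (`EventLog` and `ingest` are illustrative; the real store would be Postgres-backed with the key as a unique column):

```rust
use std::collections::HashSet;

/// Idempotent event ingestion: each mutation carries a client-generated
/// idempotency key; replays (e.g. an offline node resending its event log
/// after reconnect) are accepted but applied only once.
struct EventLog {
    seen: HashSet<String>,
    applied: Vec<String>, // stand-in for real event payloads
}

impl EventLog {
    fn new() -> Self {
        Self { seen: HashSet::new(), applied: Vec::new() }
    }

    /// Returns true if the event was applied, false if it was a duplicate.
    fn ingest(&mut self, idempotency_key: &str, payload: &str) -> bool {
        if !self.seen.insert(idempotency_key.to_string()) {
            return false; // duplicate delivery, already applied
        }
        self.applied.push(payload.to_string());
        true
    }
}
```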

Local Node Security

The RPi5 is physically in a venue — assume it can be stolen or tampered with:

  • Disk encryption: Full disk encryption (LUKS) or at minimum encrypted database
  • Secure boot: Signed binaries, verified at startup
  • Remote wipe: Cloud can send a command to reset the node to factory state
  • Tamper detection: Log unexpected restarts, hardware changes
  • Credential scope: Local node only has access to its venue's data — compromising one node doesn't expose other venues

Gotchas

  • DO NOT add payment handling — PVM records game data only; card numbers and payment flows stay in the venue's own POS (decision #4)
  • GDPR/privacy: Player data across venues requires careful consent management. Players must be able to request data deletion.
  • The local node's offline auth cache is a security risk — limit which credentials are cached and expire them after a configurable period
  • Regularly rotate NATS credentials and JWT signing keys — automate this

17. Developer Experience

Recommendation: Cargo workspace (Rust monorepo) + pnpm workspace (TypeScript) managed by Turborepo

Monorepo Structure

pvm/
├── Cargo.toml                 # Rust workspace root
├── turbo.json                 # Turborepo config
├── package.json               # pnpm workspace root
├── pnpm-workspace.yaml
│
├── crates/                    # Rust crates
│   ├── pvm-api/               # Cloud API server (Axum)
│   ├── pvm-node/              # Local node binary
│   ├── pvm-ws-gateway/        # WebSocket gateway
│   ├── pvm-worker/            # Background job processor
│   ├── pvm-core/              # Shared domain logic
│   │   ├── tournament/        # Tournament engine
│   │   ├── waitlist/          # Waitlist management
│   │   ├── clock/             # Tournament clock
│   │   └── sync/              # Sync protocol
│   ├── pvm-db/                # Database layer (sqlx queries, migrations)
│   ├── pvm-auth/              # Auth logic (JWT, RBAC)
│   ├── pvm-nats/              # NATS client wrappers
│   └── pvm-types/             # Shared types (serde, utoipa derives)
│
├── apps/                      # TypeScript apps
│   ├── dashboard/             # SvelteKit admin dashboard
│   ├── player/                # SvelteKit player-facing app
│   ├── display/               # SvelteKit venue display app (static)
│   └── docs/                  # Documentation site (optional)
│
├── packages/                  # Shared TypeScript packages
│   ├── ui/                    # shadcn-svelte components
│   ├── api-client/            # Generated OpenAPI client
│   └── shared/                # Shared types, utilities
│
├── docker/                    # Dockerfiles
├── .forgejo/                  # Forgejo Actions workflows
└── docs/                      # Project documentation

Key Tools

| Tool | Purpose |
|---|---|
| Cargo | Rust build system, workspace management |
| pnpm | Fast, disk-efficient Node.js package manager |
| Turborepo | Orchestrates build/test/lint across both Rust and TS workspaces. Caches build outputs. `--affected` flag for CI optimization. |
| cargo-watch | Auto-rebuild on Rust file changes during development |
| cargo-nextest | Faster test runner with parallel execution |
| sccache | Shared compilation cache (speeds up CI and local builds) |
| cross / cargo-zigbuild | Cross-compile Rust for RPi5 ARM64 |
| Biome | Fast linter + formatter for TypeScript (replaces ESLint + Prettier) |
| clippy | Rust linter (run with `--deny warnings` in CI) |
| rustfmt | Rust formatter (enforced in CI) |
| lefthook | Git hooks manager (format + lint on pre-commit) |

Development Workflow

# Start everything for local development
turbo dev                      # Starts SvelteKit dev servers
cargo watch -x run -p pvm-api  # Auto-restart API on changes

# Run all tests
turbo test                     # TypeScript tests
cargo nextest run              # Rust tests

# Generate API client after backend changes
cargo run -p pvm-api -- --openapi > apps/dashboard/src/lib/api/schema.json
turbo generate:api-client

# Build for production
turbo build                    # TypeScript apps
cargo build --release -p pvm-api
cross build --release --target aarch64-unknown-linux-gnu -p pvm-node

Gotchas

  • Turborepo's Rust support is task-level (it runs cargo as a shell command) — it doesn't understand Cargo's internal dependency graph. Use Cargo workspace for Rust-internal dependencies.
  • Keep pvm-core as a pure library crate with no async runtime dependency — this lets it be used in both the cloud API and the local node without conflicts.
  • Rust compile times are the bottleneck — invest in sccache and incremental compilation from day one
  • Use .cargo/config.toml for cross-compilation targets and linker settings

18. CSS / Styling

Recommendation: Tailwind CSS v4 + shadcn-svelte component system

Alternatives Considered

| Option | Pros | Cons |
|---|---|---|
| Tailwind CSS v4 | Utility-first, fast, excellent Svelte integration, v4 is faster with Rust-based engine | Learning curve for utility classes |
| Vanilla CSS | No dependencies, full control | Slow development, inconsistent patterns |
| UnoCSS | Atomic CSS, fast, flexible presets | Smaller ecosystem than Tailwind |
| Open Props | Design tokens as CSS custom properties | Not utility-first, less adoption |
| Panda CSS | Type-safe styles, zero runtime | Newer, smaller ecosystem |

Reasoning

Tailwind CSS v4 is the clear choice:

  1. Svelte integration: Tailwind works seamlessly with SvelteKit via the Vite plugin. Svelte's template syntax + Tailwind utilities produce compact, readable component markup.
  2. Tailwind v4 improvements: The v4 release includes a Rust-based engine (Oxide) that is significantly faster, CSS-first configuration (no more tailwind.config.js), automatic content detection, and native CSS cascade layers.
  3. shadcn-svelte: The component library is built on Tailwind, providing a consistent design system with accessible, customizable components. Components are generated into your codebase — full ownership, no black box.
  4. Venue displays: Tailwind's utility classes produce small CSS bundles (only used classes are included), which matters on resource-constrained Android display boxes.
  5. Design tokens: Use CSS custom properties (via Tailwind's theme) for venue-specific branding (colors, logos) that can be swapped at runtime.

Design System Structure

packages/ui/
├── components/                # shadcn-svelte generated components
│   ├── button/
│   ├── card/
│   ├── data-table/
│   ├── dialog/
│   ├── form/
│   └── ...
├── styles/
│   ├── app.css                # Global styles, Tailwind imports
│   ├── themes/
│   │   ├── default.css        # Default PVM theme
│   │   ├── dark.css           # Dark mode overrides
│   │   └── display.css        # Optimized for large venue screens
│   └── tokens.css             # Design tokens (colors, spacing, typography)
└── utils.ts                   # cn() helper, variant utilities

Venue Branding

Venues should be able to customize their displays:

/* Runtime theme switching via CSS custom properties */
:root {
  --venue-primary: var(--color-blue-600);
  --venue-secondary: var(--color-gray-800);
  --venue-logo-url: url('/default-logo.svg');
}

/* Applied per-venue at runtime */
[data-venue-theme="vegas-poker"] {
  --venue-primary: #c41e3a;
  --venue-secondary: #1a1a2e;
  --venue-logo-url: url('/venues/vegas-poker/logo.svg');
}

Gotchas

  • Tailwind v4's CSS-first config is a paradigm shift from v3 — ensure all team documentation targets v4 syntax
  • shadcn-svelte components use Tailwind v4 as of recent updates — verify compatibility
  • Large data tables (tournament player lists, waitlists) need careful styling — consider virtualized rendering for 100+ row tables
  • Venue display screens need large fonts and high contrast — keep a dedicated large-screen theme
  • Dark mode is essential for poker venues (low-light environments) — design dark-first

19. Recommended Stack Summary

| Area | Recommendation | Key Reasoning |
|---|---|---|
| Backend Language | Rust | Memory efficiency on RPi5, performance, type safety |
| Frontend Language | TypeScript | Browser ecosystem standard, type safety |
| Backend Framework | Axum (v0.8+) | Tokio-native, Tower middleware, WebSocket support |
| Frontend Framework | SvelteKit (Svelte 5) | Smallest bundles, fine-grained reactivity, PWA support |
| UI Components | shadcn-svelte | Accessible, Tailwind-based, full ownership |
| Cloud Database | PostgreSQL 16+ | Multi-tenant gold standard, RLS, JSONB |
| Local Database | libSQL (embedded) | SQLite-compatible, tiny footprint, Rust-native |
| ORM / Queries | sqlx | Compile-time checked SQL, Postgres + SQLite support |
| Caching | DragonflyDB | Redis-compatible, multi-threaded, memory efficient |
| Messaging | NATS + JetStream | Edge-native leaf nodes, sub-ms latency, lightweight |
| Real-Time | WebSockets (Axum) + SSE fallback | Full duplex, NATS-backed fan-out |
| Auth | Custom JWT + RBAC | Offline-capable, cross-venue, full control |
| API Design | REST + OpenAPI 3.1 | Generated TypeScript client, universal compatibility |
| Mobile | PWA first, Capacitor later | One codebase, offline support, app store when needed |
| Displays | Generic web app + Android display client | No Cast SDK dependency, works offline, mDNS auto-discovery |
| Deployment | Hetzner PVE + Docker (LXC containers) | Self-hosted, full control, existing infrastructure |
| CI/CD | Forgejo Actions + Turborepo | Cross-language build orchestration, caching |
| Monitoring | OpenTelemetry + Grafana | Vendor-neutral, excellent Rust support |
| Testing | cargo-nextest + Vitest + Playwright | Full pyramid: unit, integration, E2E |
| Styling | Tailwind CSS v4 | Fast, small bundles, Svelte-native |
| Monorepo | Cargo workspace + pnpm + Turborepo | Unified builds, shared types |
| Linting | clippy + Biome | Rust + TypeScript coverage |

20. Decisions Made

Resolved during tech stack review session, 2026-02-08.

| # | Question | Decision |
|---|---|---|
| 1 | Hosting | Self-hosted on Hetzner PVE — LXC containers. Already have infrastructure. No Fly.io dependency. |
| 2 | Sync strategy | Event-based sync via NATS JetStream — all mutations are events, local node replays events to build state. Perfect audit trail. No table-vs-row debate. |
| 3 | NATS on RPi5 | Sidecar — separate process managed by systemd/Docker. Independently upgradeable and monitorable. |
| 4 | Financial data | No money handling at all. Venues handle payments via their own POS systems (most are cash-based). PVM only tracks game data. |
| 5 | Multi-region | Single region initially. Design DB schema and NATS subjects for eventual multi-region without rewrite. |
| 6 | Player accounts | PVM signup first. Players always create a PVM account before joining venues. No deduplication problem. |
| 7 | Display strategy | Generic web app + Android display client. TVs run a simple Android app (or $40 Android box) that connects to the local node via mDNS auto-discovery, receives its display assignment via WebSocket, and renders a web page. Falls back to cloud SaaS if local node is offline. Chromecast is supported but not the primary path. No Google Cast SDK dependency. |
| 8 | RPi5 provisioning | Docker on stock Raspberry Pi OS. All PVM services (node, NATS) run as containers. Updates via image pulls. Provisioning is a one-liner curl script. |
| 9 | Offline duration | 72 hours. Covers a full weekend tournament series. After 72h offline, warn staff but keep operating. Sync everything on reconnect. |
| 10 | API style | REST + OpenAPI 3.1. Auto-generated TypeScript client. Universal, debuggable, works with everything. |

Deferred Questions

These remain open for future consideration:

  1. API versioning strategy: Maintain backward compatibility as long as possible. Only version on breaking changes. Revisit when approaching first external API consumers.

  2. GraphQL for player-facing app: REST is sufficient for v1. The player app might benefit from GraphQL's flexible querying later (e.g., "show me my upcoming tournaments across all venues with waitlist status"). Revisit after v1 launch.

  3. WebTransport: When browser support matures, could replace WebSockets for lower-latency real-time streams. Monitor but do not adopt yet.

  4. WASM on local node: Could parts of the frontend run on the local node via WASM for ultra-fast local rendering. Defer.

  5. AI features: Player behavior analytics, optimal table assignments, tournament structure recommendations. The data model should be designed to support future ML pipelines. Design for it, build later.