pvm/docs/TECH_STACK_RESEARCH.md
Mikkel Georgsen 2bb381a0a3 Update tech stack research with finalized decisions
Resolve all open questions from tech stack review:
- Self-hosted on Hetzner PVE (LXC + Docker)
- Event-based sync via NATS JetStream
- Generic display system with Android client (no Cast SDK dep)
- Docker-based RPi5 provisioning
- No money handling, 72h offline limit, REST + OpenAPI
- PVM signup-first for player accounts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 03:06:53 +01:00


PVM (Poker Venue Manager) — Tech Stack Research

Generated: 2026-02-08
Status: DRAFT — for discussion and refinement


Table of Contents

  1. Programming Language
  2. Backend Framework
  3. Frontend Framework
  4. Database Strategy
  5. Caching Layer
  6. Message Queue / Event Streaming
  7. Real-Time Communication
  8. Auth & Authorization
  9. API Design
  10. Local Node Architecture
  11. Venue Display System
  12. Mobile Strategy
  13. Deployment & Infrastructure
  14. Monitoring & Observability
  15. Testing Strategy
  16. Security
  17. Developer Experience
  18. CSS / Styling
  19. Recommended Stack Summary
  20. Open Questions / Decisions Needed

1. Programming Language

Recommendation: Rust (backend + local node) + TypeScript (frontend + shared types)

Alternatives Considered

| Language | Pros | Cons |
|---|---|---|
| Rust | Memory safety, fearless concurrency, tiny binaries for RPi5, no GC pauses, excellent WebSocket perf | Steeper learning curve, slower compile times |
| Go | Simple, fast compilation, good concurrency | Less expressive type system, GC pauses (minor), larger binaries than Rust |
| TypeScript (full-stack) | One language everywhere, huge ecosystem, fast dev velocity | Node.js memory overhead on RPi5, GC pauses in real-time scenarios, weaker concurrency model |
| Elixir | Built for real-time (Phoenix), fault-tolerant OTP | Small ecosystem, harder to find libs, RPi5 BEAM VM overhead |

Reasoning

Rust is the strongest choice for PVM because of the RPi5 local node constraint. The local node must run reliably on constrained hardware with limited memory, handle real-time tournament clocks, manage offline operations, and sync data. Rust's zero-cost abstractions, lack of garbage collector, and small binary sizes (typically 5-15 MB static binaries) make it ideal for this.

For the cloud backend, Rust's performance means fewer servers and lower hosting costs. A single Rust service can handle thousands of concurrent WebSocket connections with minimal memory overhead — critical for real-time tournament updates across many venues.

The "all code written by Claude Code" constraint actually favors Rust: Claude has excellent Rust fluency, and the compiler's strict type system catches bugs that would otherwise require extensive testing in dynamic languages.

TypeScript remains the right choice for the frontend — the browser ecosystem is TypeScript-native, and sharing type definitions between Rust (via generated OpenAPI types) and TypeScript gives end-to-end type safety.

Gotchas

  • Rust compile times can be mitigated with cargo-watch, incremental compilation, and sccache
  • Cross-compilation for RPi5 (ARM64) is well-supported via cross or cargo-zigbuild
  • Shared domain types can be generated from Rust structs to TypeScript via ts-rs or OpenAPI codegen

2. Backend Framework

Recommendation: Axum (v0.8+)

Alternatives Considered

| Framework | Pros | Cons |
|---|---|---|
| Axum | Tokio-native, excellent middleware (Tower), lowest memory footprint, growing ecosystem, WebSocket built-in | Younger than Actix |
| Actix Web | Highest raw throughput, most mature | Actor model adds complexity, not Tokio-native (uses own runtime fork) |
| Rocket | Most ergonomic, Rails-like DX | Slower performance, less flexible middleware |
| Loco | Rails-like conventions, batteries-included | Very new (2024), smaller community, opinionated |

Reasoning

Axum is the clear winner for PVM:

  1. Tokio-native: Axum is built directly on Tokio + Hyper + Tower. Since NATS, database drivers, and WebSocket handling all use Tokio, everything shares one async runtime — no impedance mismatch.
  2. Tower middleware: The Tower service/layer pattern gives composable middleware for auth, rate limiting, tracing, compression, CORS, etc. Middleware can be shared between HTTP and WebSocket handlers.
  3. WebSocket support: First-class WebSocket extraction with axum::extract::ws, typed WebSocket messages via axum-typed-websockets.
  4. Memory efficiency: Benchmarks show Axum achieves the lowest memory footprint per connection — critical when serving thousands of concurrent venue connections.
  5. OpenAPI integration: utoipa crate provides derive macros for generating OpenAPI 3.1 specs directly from Axum handler types.
  6. Extractor pattern: Axum's extractor-based request handling maps cleanly to domain operations (extract tenant, extract auth, extract venue context).

Key Libraries

  • axum — HTTP framework
  • axum-extra — typed headers, cookie jar, multipart
  • tower + tower-http — middleware stack (CORS, compression, tracing, rate limiting)
  • utoipa + utoipa-axum — OpenAPI spec generation
  • utoipa-swagger-ui — embedded Swagger UI
  • axum-typed-websockets — strongly typed WS messages

Gotchas

  • Axum's error handling requires careful design — use thiserror + a custom error type that implements IntoResponse
  • Route organization: use axum::Router::nest() for modular route trees per domain (tournaments, venues, players)
  • State management: use axum::extract::State with Arc<AppState> — avoid the temptation to put everything in one giant state struct

3. Frontend Framework

Recommendation: SvelteKit (Svelte 5 + runes reactivity)

Alternatives Considered

| Framework | Pros | Cons |
|---|---|---|
| SvelteKit | Smallest bundles, true compilation (no virtual DOM), built-in routing/SSR/PWA, Svelte 5 runes are elegant | Smaller ecosystem than React |
| Next.js (React) | Largest ecosystem, most libraries, biggest job market | Vercel lock-in concerns, React hydration overhead, larger bundles, RSC complexity |
| SolidStart | Finest-grained reactivity, near-zero overhead updates | Smallest ecosystem, least mature, fewer component libraries |
| Nuxt (Vue) | Good DX, solid ecosystem | Vue 3 composition API less elegant than Svelte 5 runes |

Reasoning

SvelteKit is the best fit for PVM for several reasons:

  1. Performance matters for venue displays: Tournament clocks, waiting lists, and seat maps will run on venue TVs via the venue display system. Svelte's compiled output produces minimal JavaScript — the display client loads faster and uses less memory on low-powered TV hardware (Android boxes, smart TVs, Chromecasts).
  2. Real-time UI updates: Svelte 5's fine-grained reactivity (runes: $state, $derived, $effect) means updating a single timer or seat status re-renders only that DOM node, not a virtual DOM diff. This is ideal for dashboards with many independently updating elements.
  3. PWA support: SvelteKit has first-class service worker support and offline capabilities through @sveltejs/adapter-static and vite-plugin-pwa.
  4. Bundle size: SvelteKit produces the smallest JavaScript bundles of any major framework — important for mobile PWA users on venue WiFi.
  5. Claude Code compatibility: Svelte's template syntax is straightforward and less boilerplate than React — Claude can generate clean, readable Svelte components efficiently.
  6. No framework lock-in: Svelte compiles away, so there's no runtime dependency. The output is vanilla JS + DOM manipulation.

UI Component Library

Recommendation: Skeleton UI (Svelte-native) or shadcn-svelte (Tailwind-based, port of shadcn/ui)

shadcn-svelte is particularly compelling because:

  • Components are copied into your codebase (not a dependency) — full control
  • Built on Tailwind CSS — consistent with the styling recommendation
  • Accessible by default (uses Bits UI primitives under the hood)
  • Matches the design patterns of the widely-used shadcn/ui ecosystem

Gotchas

  • SvelteKit's SSR is useful for the management dashboard, but the display client and PWA may use adapter-static for pure SPA mode
  • Svelte's ecosystem is smaller than React's, but for PVM's needs (forms, tables, charts, real-time) the ecosystem is sufficient
  • Svelte 5 (runes) is a significant API change from Svelte 4 — ensure all examples and libraries target Svelte 5

4. Database Strategy

Recommendation: PostgreSQL (cloud primary) + libSQL/SQLite (local node) + Electric SQL or custom sync

Alternatives Considered

| Approach | Pros | Cons |
|---|---|---|
| Postgres cloud + libSQL local + sync | Best of both worlds — Postgres power in cloud, SQLite simplicity on RPi5 | Needs a sync layer, schema divergence risk |
| Postgres everywhere | One DB engine, simpler mental model | Postgres on RPi5 uses more memory, harder offline |
| libSQL/Turso everywhere | One engine, built-in edge replication | Less powerful for complex cloud queries and multi-tenant partitioning |
| CockroachDB | Distributed, strong consistency | Heavy for RPi5, expensive, overkill |

Detailed Recommendation

Cloud Database: PostgreSQL 16+

  • The gold standard for multi-tenant SaaS
  • Row-level security (RLS) for tenant isolation
  • JSONB for flexible per-venue configuration
  • Excellent full-text search for player lookup across venues
  • Partitioning by tenant for performance at scale
  • Managed options: Neon (serverless, branching for dev), Supabase, or AWS RDS

Local Node Database: libSQL (via Turso's embedded runtime)

  • Fork of SQLite with cloud sync capabilities
  • Runs embedded in the Rust binary — no separate database process on RPi5
  • WAL mode for concurrent reads during tournament operations
  • Tiny memory footprint (< 10 MB typical)
  • libSQL's Rust driver (libsql) is well-maintained

Sync Strategy:

The local node operates on a subset of the cloud data — only data relevant to its venue(s). The sync approach:

  1. Cloud-to-local: Player profiles, memberships, credit lines pushed to local node via NATS JetStream. Local node maintains a read replica of relevant data in libSQL.
  2. Local-to-cloud: Tournament results, waitlist changes, transactions pushed to cloud via NATS JetStream with at-least-once delivery. Cloud processes as events.
  3. Conflict resolution: Last-writer-wins (LWW) with vector clocks for most entities. For financial data (credit lines, buy-ins), use event sourcing — conflicts are impossible because every transaction is an immutable event.
  4. Offline queue: When disconnected, local node queues mutations in a local WAL-style append-only log. On reconnect, replays in order via NATS.
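The offline queue in step 4 can be pictured as an append-only structure that is drained strictly in order on reconnect. Below is a minimal std-only sketch; the `Mutation` shape, the `OfflineQueue` name, and the publish closure (standing in for a NATS JetStream publish) are all illustrative, not the actual implementation:

```rust
use std::collections::VecDeque;

/// One queued mutation. A real entry would carry a serialized event body
/// and a monotonic local timestamp for conflict resolution.
#[derive(Debug, Clone, PartialEq)]
struct Mutation {
    local_seq: u64,
    subject: String, // NATS subject to publish on, e.g. "sync.node-1.upstream"
    payload: String, // serialized event body
}

/// Append-only offline queue: mutations are pushed while disconnected and
/// drained strictly in insertion order on reconnect.
#[derive(Default)]
struct OfflineQueue {
    next_seq: u64,
    entries: VecDeque<Mutation>,
}

impl OfflineQueue {
    fn enqueue(&mut self, subject: &str, payload: &str) -> u64 {
        self.next_seq += 1;
        self.entries.push_back(Mutation {
            local_seq: self.next_seq,
            subject: subject.to_string(),
            payload: payload.to_string(),
        });
        self.next_seq
    }

    /// Replay entries in order. The publish closure stands in for a
    /// JetStream publish; returning Err stops the drain so unacknowledged
    /// entries stay queued for the next attempt.
    fn replay<F: FnMut(&Mutation) -> Result<(), String>>(&mut self, mut publish: F) -> usize {
        let mut sent = 0;
        while let Some(m) = self.entries.front() {
            if publish(m).is_err() {
                break;
            }
            self.entries.pop_front();
            sent += 1;
        }
        sent
    }
}
```

In practice the queue would be backed by a libSQL table rather than memory, so it survives a power loss mid-offline.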

ORM / Query Layer

Recommendation: sqlx (compile-time checked queries)

  • sqlx checks SQL queries against the actual database schema at compile time
  • No ORM abstraction layer — write real SQL, get compile-time safety
  • Supports both PostgreSQL and SQLite/libSQL
  • Avoids the N+1 query problems that ORMs introduce
  • Migrations via sqlx migrate

Alternative: sea-orm if you want a full ORM, but for PVM the explicit SQL approach of sqlx gives more control over multi-tenant queries and complex joins.

Migrations

  • Use sqlx migrate for cloud PostgreSQL migrations
  • Maintain parallel migration files for libSQL (SQLite-compatible subset)
  • A shared migration test ensures both schemas stay compatible for the sync subset

Gotchas

  • PostgreSQL and SQLite have different SQL dialects — the sync subset must use compatible types (no Postgres-specific types in synced tables)
  • libSQL's VECTOR type is interesting for future player similarity features but not needed initially
  • Turso's hosted libSQL replication is an option but adds a dependency — prefer embedded libSQL with custom NATS-based sync for more control
  • Schema versioning must be tracked on the local node so the cloud knows what schema version it's talking to

5. Caching Layer

Recommendation: DragonflyDB

Alternatives Considered

| Option | Pros | Cons |
|---|---|---|
| DragonflyDB | 25x Redis throughput (vendor benchmark), Redis-compatible API, multi-threaded, lower memory usage | Younger project, smaller community |
| Redis 7+ | Most mature, largest ecosystem, Redis Stack modules | Single-threaded core, licensing concerns since the 7.4 relicensing (RSALv2/SSPLv1) |
| Valkey | Redis fork, community-driven, BSD license | Still catching up to Redis feature parity |
| KeyDB | Multi-threaded Redis fork | Development appears stalled (no updates in 1.5+ years) |
| No cache (just Postgres) | Simpler architecture | Higher DB load, slower for session/real-time data |

Reasoning

DragonflyDB is the right choice for PVM:

  1. Redis API compatibility: Drop-in replacement — all Redis client libraries work unchanged. The fred Rust crate (async Redis client) works with DragonflyDB out of the box.
  2. Multi-threaded architecture: DragonflyDB uses all available CPU cores, unlike Redis's single-threaded model. This matters when caching tournament state for hundreds of concurrent venues.
  3. Memory efficiency: DragonflyDB reports substantially lower memory usage than Redis for the same dataset — important for keeping infrastructure costs low.
  4. No license concerns: DragonflyDB uses BSL 1.1 (converts to open source after 4 years). Redis switched to a dual-license model that's more restrictive. Valkey is BSD but is playing catch-up.
  5. Pub/Sub: DragonflyDB supports Redis Pub/Sub — useful as a lightweight complement to NATS for in-process event distribution within the backend cluster.

What to Cache

  • Session data: User sessions, JWT refresh tokens
  • Tournament state: Current level, blinds, clock, player counts (hot read path)
  • Waiting lists: Ordered sets per venue/game type
  • Rate limiting: API rate limit counters
  • Player lookup cache: Frequently accessed player profiles
  • Seat maps: Current table/seat assignments per venue

What NOT to Cache (use Postgres directly)

  • Financial transactions (credit lines, buy-ins) — always hit the source of truth
  • Audit logs
  • Historical tournament data

Local Node: No DragonflyDB

The RPi5 local node should not run DragonflyDB. libSQL is fast enough for local caching needs, and adding another process increases complexity and memory usage on constrained hardware. Use in-memory Rust data structures (e.g., DashMap, moka cache crate) for hot local state.

Gotchas

  • DragonflyDB's replication features are less mature than Redis Sentinel/Cluster — use managed hosting or keep it simple with a single node + persistence initially
  • Monitor DragonflyDB's release cycle — it's actively developed but younger than Redis
  • Keep the cache layer optional — the system should function (slower) without it

6. Message Queue / Event Streaming

Recommendation: NATS + JetStream

Alternatives Considered

| Option | Pros | Cons |
|---|---|---|
| NATS + JetStream | Lightweight (single binary, ~20 MB), sub-ms latency, built-in persistence, embedded mode, perfect for edge | Smaller community than Kafka |
| Apache Kafka | Highest throughput, mature, excellent tooling | Heavy (JVM, ZooKeeper/KRaft), 4 GB+ RAM minimum, overkill for PVM's scale |
| RabbitMQ | Mature AMQP, sophisticated routing | Higher latency (5-20 ms), more memory, Erlang ops complexity |
| Redis Streams | Simple, already have cache layer | Not designed for reliable message delivery at scale |

Reasoning

NATS + JetStream is purpose-built for PVM's architecture:

  1. Edge-native: NATS can run as a leaf node on the RPi5, connecting to the cloud NATS cluster. This is the core of the local-to-cloud sync architecture. When the connection drops, JetStream buffers messages locally and replays them on reconnect.

  2. Lightweight: NATS server is a single ~20 MB binary. On RPi5, it uses ~50 MB RAM. Compare to Kafka's 4 GB minimum.

  3. Sub-millisecond latency: Core NATS delivers messages in < 1ms. JetStream (persistent) adds 1-5ms. This is critical for real-time tournament updates — when a player busts, every connected display should update within milliseconds.

  4. Subject-based addressing: NATS subjects map perfectly to PVM's domain:

    • venue.{venue_id}.tournament.{id}.clock — tournament clock ticks
    • venue.{venue_id}.waitlist.update — waiting list changes
    • venue.{venue_id}.seats.{table_id} — seat assignments
    • player.{player_id}.notifications — player-specific events
    • sync.{node_id}.upstream — local node to cloud sync
    • sync.{node_id}.downstream — cloud to local node sync
  5. Built-in patterns: Request/reply (for RPC between cloud and node), pub/sub (for broadcasts), queue groups (for load-balanced consumers), key-value store (for distributed config), object store (for binary data like player photos).

  6. JetStream for durability: Tournament results, financial transactions, and sync operations need guaranteed delivery. JetStream provides at-least-once and exactly-once delivery semantics with configurable retention.
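The subject hierarchy in point 4 is easiest to keep consistent when subjects are built by small helpers rather than hand-assembled at call sites. A std-only sketch with hypothetical helper names; the trailing `>` is standard NATS wildcard syntax matching one or more tokens:

```rust
/// Build the clock subject for one tournament, following the
/// venue.{venue_id}.tournament.{id}.clock scheme described above.
fn tournament_clock_subject(venue_id: &str, tournament_id: &str) -> String {
    format!("venue.{venue_id}.tournament.{tournament_id}.clock")
}

/// Waitlist change notifications for a venue.
fn waitlist_subject(venue_id: &str) -> String {
    format!("venue.{venue_id}.waitlist.update")
}

/// A gateway that wants every tournament event for one venue can subscribe
/// with a trailing `>` wildcard (matches one or more trailing tokens).
fn venue_tournament_wildcard(venue_id: &str) -> String {
    format!("venue.{venue_id}.tournament.>")
}
```

Centralizing subject construction like this makes a later namespace migration a one-file change instead of a grep across the codebase.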

Architecture

RPi5 Local Node                 Cloud
┌──────────────┐                ┌──────────────────┐
│  NATS Leaf   │◄──── TLS ────► │  NATS Cluster    │
│  Node        │    (auto-      │  (3-node)        │
│              │    reconnect)  │                  │
│  JetStream   │                │  JetStream       │
│  (local buf) │                │  (persistent)    │
└──────────────┘                └──────────────────┘

Gotchas

  • NATS JetStream's exactly-once semantics require careful consumer design — use idempotent handlers with deduplication IDs
  • Subject namespace design is critical — plan it early, changing later is painful
  • NATS leaf nodes need TLS configured for secure cloud connection
  • Monitor JetStream stream sizes on RPi5 — set max bytes limits to avoid filling the SD card during extended offline periods
  • The async-nats Rust crate is the official async client — well maintained and Tokio-native
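The idempotent-handler pattern from the first gotcha reduces to tracking processed message IDs. A std-only sketch; the unbounded in-memory set is a deliberate simplification (a real consumer would bound it or persist it transactionally alongside the state it guards, and JetStream's own deduplication uses the Nats-Msg-Id header within a dedup window):

```rust
use std::collections::HashSet;

/// Idempotent event consumer: at-least-once delivery means the same message
/// can arrive twice, so side effects are applied only on first sight of an ID.
struct IdempotentConsumer {
    seen: HashSet<String>,
    applied: u32,
}

impl IdempotentConsumer {
    fn new() -> Self {
        Self { seen: HashSet::new(), applied: 0 }
    }

    /// Returns true if the message was applied, false if it was a duplicate.
    fn handle(&mut self, msg_id: &str) -> bool {
        if !self.seen.insert(msg_id.to_string()) {
            return false; // redelivery of an already-processed message
        }
        self.applied += 1; // apply the side effect exactly once
        true
    }
}
```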

7. Real-Time Communication

Recommendation: WebSockets (via Axum) for interactive clients + NATS for backend fan-out + SSE as fallback

Alternatives Considered

| Option | Pros | Cons |
|---|---|---|
| WebSockets | Full duplex, low latency, wide support | Requires connection management, can't traverse some proxies |
| Server-Sent Events (SSE) | Simpler, auto-reconnect, HTTP-native | Server-to-client only, no binary support |
| WebTransport | HTTP/3, multiplexed streams, unreliable mode | Very new, limited browser support, no Chromecast support |
| Socket.IO | Auto-fallback, rooms, namespaces | Node.js-centric, adds overhead, not Rust-native |
| gRPC streaming | Typed, efficient, bidirectional | Not browser-native (needs grpc-web proxy), overkill |

Architecture

The real-time pipeline has three layers:

  1. NATS (backend event bus): All state changes publish to NATS subjects. This is the single source of real-time truth. Both cloud services and local nodes publish here.

  2. WebSocket Gateway (Axum): A dedicated Axum service subscribes to relevant NATS subjects and fans out to connected WebSocket clients. Each client subscribes to the venues/tournaments they care about.

  3. SSE Fallback: For environments where WebSockets are blocked (some corporate networks), provide an SSE endpoint that delivers the same event stream. SSE's built-in auto-reconnect with Last-Event-ID makes resumption simple.

Flow Example: Tournament Clock Update

Tournament Service (Rust)
  → publishes to NATS: venue.123.tournament.456.clock {level: 5, time_remaining: 1200}
  → WebSocket Gateway subscribes to venue.123.tournament.*
  → fans out to all connected clients watching tournament 456
  → Chromecast receiver app gets update, renders clock
  → PWA on player's phone gets update, shows current level

Implementation Details

  • Use axum::extract::ws::WebSocket with tokio::select! to multiplex NATS subscription + client messages
  • Implement heartbeat/ping-pong to detect stale connections (30s interval, 10s timeout)
  • Client reconnection with exponential backoff + subscription replay from NATS JetStream
  • Binary message format: consider MessagePack (rmp-serde) for compact payloads over WebSocket, with JSON as human-readable fallback
  • Connection limits: track per-venue connection count, implement backpressure

Gotchas

  • WebSocket connections are stateful — need sticky sessions or a connection registry if running multiple gateway instances
  • Chromecast receiver apps have limited WebSocket support — test thoroughly on actual hardware
  • Mobile PWAs going to background will drop WebSocket connections — design for reconnection and state catch-up
  • Rate limit outbound messages to prevent flooding slow clients (tournament clock ticks should be throttled to 1/second for display, even if internal state updates more frequently)
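The clock-tick throttling in the last gotcha is a few lines of per-connection state. A deterministic std-only sketch using plain millisecond timestamps so it can be tested without a clock (a real gateway would use `tokio::time::Instant`); the `Throttle` name is illustrative:

```rust
/// Throttle outbound display updates to at most one per interval, even when
/// internal state ticks faster.
struct Throttle {
    interval_ms: u64,
    last_emit_ms: Option<u64>,
}

impl Throttle {
    fn new(interval_ms: u64) -> Self {
        Self { interval_ms, last_emit_ms: None }
    }

    /// Returns true if an update should be sent at time `now_ms`.
    fn should_emit(&mut self, now_ms: u64) -> bool {
        match self.last_emit_ms {
            // Still inside the quiet interval: drop this tick.
            Some(last) if now_ms < last + self.interval_ms => false,
            // First tick, or interval elapsed: emit and remember when.
            _ => {
                self.last_emit_ms = Some(now_ms);
                true
            }
        }
    }
}
```

The gateway keeps one `Throttle` per (connection, subject) pair; dropped ticks are harmless because each clock message carries absolute state, not deltas.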

8. Auth & Authorization

Recommendation: Custom JWT auth with Postgres-backed RBAC + optional OAuth2 social login

Alternatives Considered

| Option | Pros | Cons |
|---|---|---|
| Custom JWT + RBAC | Full control, no vendor dependency, works offline on local node | Must implement everything yourself |
| Auth0 / Clerk | Managed, social login, MFA out of box | Vendor lock-in, cost scales with users, doesn't work offline |
| Keycloak | Self-hosted, full-featured, OIDC/SAML | Heavy (Java), complex to operate, overkill |
| Ory (Kratos + Keto) | Open source, cloud-native, API-first | Multiple services to deploy, newer |
| Lucia Auth | Lightweight, framework-agnostic | TypeScript-only, no Rust support |

Architecture

PVM's auth has a unique challenge: cross-venue universal player accounts that must work both online (cloud) and offline (local node). This rules out purely managed auth services.

Token Strategy:

Access Token (JWT, short-lived: 15 min)
├── sub: player_id (universal)
├── tenant_id: current operator
├── venue_id: current venue (if applicable)
├── roles: ["player", "dealer", "floor_manager", "admin"]
├── permissions: ["tournament.manage", "waitlist.view", ...]
└── iat, exp, iss

Refresh Token (opaque, stored in DB/DragonflyDB, long-lived: 30 days)
└── Rotated on each use, old tokens invalidated
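Rotation as described can be sketched against an in-memory store. Everything here is illustrative: the `rt-{n}` token format is a placeholder for a CSPRNG-generated value, and production state would live in Postgres or DragonflyDB rather than a process-local map:

```rust
use std::collections::HashMap;

/// Refresh-token rotation sketch: each token is single-use; redeeming it
/// invalidates the old token and issues a replacement.
struct RefreshStore {
    active: HashMap<String, String>, // token -> player_id
    counter: u64,
}

impl RefreshStore {
    fn new() -> Self {
        Self { active: HashMap::new(), counter: 0 }
    }

    fn issue(&mut self, player_id: &str) -> String {
        self.counter += 1;
        // Placeholder ID scheme; real tokens must be unguessable random values.
        let token = format!("rt-{}", self.counter);
        self.active.insert(token.clone(), player_id.to_string());
        token
    }

    /// Rotate: the presented token is consumed and replaced. A replayed
    /// (already-used) token returns None — rejecting it also flags possible
    /// token theft.
    fn rotate(&mut self, token: &str) -> Option<String> {
        let player_id = self.active.remove(token)?;
        Some(self.issue(&player_id))
    }
}
```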

RBAC Model:

Operator (tenant)
├── Admin — full control over all venues
├── Manager — manage specific venues
├── Floor Manager — tournament/table operations at a venue
├── Dealer — assigned to tables, report results
└── Player — universal account, cross-venue
    ├── can self-register
    ├── has memberships per venue
    └── has credit lines per venue (managed by admin)

Key Design Decisions:

  1. Tenant-scoped roles: A user can be an admin in one operator's venues and a player in another. The (user_id, operator_id, role) triple is the authorization unit.
  2. Offline auth on local node: The local node caches valid JWT signing keys and a subset of user credentials (hashed). Players can authenticate locally when the cloud is unreachable. New registrations queue for cloud sync.
  3. JWT signing: Use Ed25519 (fast, small signatures) via the jsonwebtoken crate. The cloud signs tokens; the local node can verify them with the public key. For offline token issuance, the local node has a delegated signing key.
  4. Password hashing: argon2 crate — memory-hard, resistant to GPU attacks. Tune parameters for RPi5 (lower memory cost than cloud).
  5. Social login (optional, cloud-only): Support Google/Apple sign-in for player accounts via standard OAuth2 flows. Map social identities to the universal player account.
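Decision 1's (user_id, operator_id, role) triple can be modeled directly as the lookup key for authorization checks. A minimal std-only sketch with illustrative names:

```rust
use std::collections::HashSet;

/// The (user_id, operator_id, role) triple from decision 1 — the atomic
/// unit of authorization. The same user can hold different roles under
/// different operators.
#[derive(Hash, PartialEq, Eq)]
struct Grant {
    user_id: String,
    operator_id: String,
    role: String,
}

struct Rbac {
    grants: HashSet<Grant>,
}

impl Rbac {
    fn new() -> Self {
        Self { grants: HashSet::new() }
    }

    fn grant(&mut self, user: &str, operator: &str, role: &str) {
        self.grants.insert(Grant {
            user_id: user.into(),
            operator_id: operator.into(),
            role: role.into(),
        });
    }

    /// A role check is always tenant-scoped: being an admin for operator A
    /// grants nothing under operator B.
    fn has_role(&self, user: &str, operator: &str, role: &str) -> bool {
        self.grants.contains(&Grant {
            user_id: user.into(),
            operator_id: operator.into(),
            role: role.into(),
        })
    }
}
```

In the real system these grants would be rows in Postgres and the check would run in Axum middleware after JWT verification.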

Gotchas

  • Token revocation is hard with JWTs — use short expiry (15 min) + refresh token rotation + a lightweight blocklist in DragonflyDB for immediate revocation
  • Cross-venue account linking: when a player signs up at venue A and later visits venue B (different operator), they should be recognized. Use email/phone as the universal identifier with verification.
  • Local node token issuance must be time-limited and logged — cloud should audit all locally-issued tokens on sync
  • Rate limit login attempts both on cloud and local node to prevent brute force

9. API Design

Recommendation: REST + OpenAPI 3.1 with generated TypeScript client

Alternatives Considered

| Approach | Pros | Cons |
|---|---|---|
| REST + OpenAPI | Universal, tooling-rich, generated clients, cacheable | Overfetching possible, multiple round trips |
| GraphQL | Flexible queries, single endpoint, good for complex UIs | Complexity overhead, caching harder, Rust support less mature |
| tRPC | Zero-config type safety | TypeScript-only — cannot use with Rust backend |
| gRPC | Efficient binary protocol, streaming | Needs proxy for browsers, overkill for this use case |

Reasoning

tRPC is ruled out because it requires both client and server to be TypeScript. With a Rust backend, this is not viable.

REST + OpenAPI is the best approach because:

  1. Generated type safety: Use utoipa to generate OpenAPI 3.1 specs from Rust types, then openapi-typescript to generate TypeScript types for the frontend. Changes to the Rust API automatically propagate to the frontend types.
  2. Cacheable: REST's HTTP semantics enable CDN caching, ETag support, and conditional requests — important for player profiles and tournament structures that change infrequently.
  3. Universal clients: The REST API will also be consumed by the venue display client, the local node sync layer, and potentially third-party integrations. OpenAPI makes all of these easy.
  4. Tooling: Swagger UI for exploration, openapi-fetch for the TypeScript client (type-safe fetch wrapper), Postman/Insomnia for testing.

API Conventions

# Resource-based URLs
GET    /api/v1/venues/{venue_id}/tournaments
POST   /api/v1/venues/{venue_id}/tournaments
GET    /api/v1/venues/{venue_id}/tournaments/{id}
PATCH  /api/v1/venues/{venue_id}/tournaments/{id}

# Actions as sub-resources
POST   /api/v1/venues/{venue_id}/tournaments/{id}/start
POST   /api/v1/venues/{venue_id}/tournaments/{id}/pause
POST   /api/v1/venues/{venue_id}/waitlists/{id}/join
POST   /api/v1/venues/{venue_id}/waitlists/{id}/call/{player_id}

# Cross-venue player operations
GET    /api/v1/players/me
GET    /api/v1/players/{id}/memberships
POST   /api/v1/players/{id}/credit-lines

# Real-time subscriptions
WS     /api/v1/ws?venue={id}&subscribe=tournament.clock,waitlist.updates

Type Generation Pipeline

Rust structs (serde + utoipa derive)
  → OpenAPI 3.1 JSON spec (generated at build time)
  → openapi-typescript (CI step)
  → TypeScript types + openapi-fetch client
  → SvelteKit frontend consumes typed API

Gotchas

  • Version the API from day one (/api/v1/) — breaking changes go in /api/v2/
  • Use cursor-based pagination for lists (not offset-based) — more efficient and handles concurrent inserts
  • Standardize error responses: { error: { code: string, message: string, details?: any } }
  • Consider a lightweight BFF (Backend-for-Frontend) pattern in SvelteKit's server routes for aggregating multiple API calls into one page load
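Cursor-based pagination from the second gotcha, sketched std-only over an in-memory list of ids sorted ascending; the cursor is simply the sort key of the last row returned, so concurrent inserts never shift pages the way an offset would. The SQL equivalent is `WHERE id > $cursor ORDER BY id LIMIT $n`. The function shape is illustrative:

```rust
/// Return one page of ids strictly after `cursor`, plus the next cursor.
/// Assumes `rows` is sorted ascending (the ORDER BY in the real query).
/// A `None` next-cursor signals the final page.
fn page_after(rows: &[u64], cursor: Option<u64>, limit: usize) -> (Vec<u64>, Option<u64>) {
    let start = cursor.unwrap_or(0);
    let page: Vec<u64> = rows
        .iter()
        .copied()
        .filter(|&id| id > start) // WHERE id > $cursor
        .take(limit)              // LIMIT $n
        .collect();
    // A short page means we ran out of rows — no further cursor.
    let next = if page.len() == limit { page.last().copied() } else { None };
    (page, next)
}
```

Real cursors would encode the sort key (often base64 of a composite key) rather than expose a raw id.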

10. Local Node Architecture

Recommendation: Single Rust binary running on RPi5 with embedded libSQL, NATS leaf node, and local HTTP/WS server

What Runs on the RPi5

┌────────────────────────────────────────────────┐
│ PVM Local Node (single Rust binary, ~15-20 MB) │
│                                                │
│  ┌─────────────┐   ┌─────────────┐             │
│  │ HTTP/WS     │   │ NATS Leaf   │             │
│  │ Server      │   │ Node        │             │
│  │ (Axum)      │   │ (embedded   │             │
│  │             │   │  or sidecar)│             │
│  └──────┬──────┘   └──────┬──────┘             │
│         │                 │                    │
│  ┌──────┴─────────────────┴──────┐             │
│  │       Application Core        │             │
│  │  - Tournament engine          │             │
│  │  - Clock manager              │             │
│  │  - Waitlist manager           │             │
│  │  - Seat assignment            │             │
│  │  - Sync orchestrator          │             │
│  └──────────────┬────────────────┘             │
│                 │                              │
│  ┌──────────────┴────────────────┐             │
│  │       libSQL (embedded)       │             │
│  │  - Venue data subset          │             │
│  │  - Offline mutation queue     │             │
│  │  - Local auth cache           │             │
│  └───────────────────────────────┘             │
│                                                │
│  ┌───────────────────────────────┐             │
│  │  moka in-memory cache         │             │
│  │  - Hot tournament state       │             │
│  │  - Active session tokens      │             │
│  └───────────────────────────────┘             │
└────────────────────────────────────────────────┘

Offline Operations

When the cloud connection drops, the local node continues operating:

  1. Tournament operations: Clock continues, blinds advance, players bust/rebuy — all local state
  2. Waitlist management: Players can join/leave waitlists — queued for cloud sync
  3. Seat assignments: Floor managers can move players between tables locally
  4. Player auth: Cached credentials allow existing players to log in. New registrations queued.
  5. Financial operations: Buy-ins and credit transactions logged locally with offline flag. Cloud reconciles on reconnect.

Sync Protocol

On reconnect:
1. Local node sends its last-seen cloud sequence number
2. Cloud sends all events since that sequence (via NATS JetStream replay)
3. Local node sends its offline mutation queue (ordered by local timestamp)
4. Cloud processes mutations, detects conflicts, responds with resolution
5. Local node applies cloud resolutions, updates local state
6. Both sides confirm sync complete
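Step 2 of the protocol reduces to "give me everything after sequence N". JetStream performs this replay natively from a stream's own sequence numbers; the sketch below just models the contract with illustrative types:

```rust
/// One durable cloud event, identified by its stream sequence number.
#[derive(Clone, Debug, PartialEq)]
struct Event {
    seq: u64,
    body: String,
}

/// Select the events a node missed: everything strictly after its
/// last-seen cloud sequence number.
fn events_since(log: &[Event], last_seen: u64) -> Vec<Event> {
    log.iter().filter(|e| e.seq > last_seen).cloned().collect()
}
```

Because the cursor is a sequence number rather than a timestamp, replay is exact even if the node's clock drifted while offline.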

Conflict Resolution Strategy

| Data Type | Strategy | Reasoning |
|---|---|---|
| Tournament state | Cloud wins | Only one node runs a tournament at a time |
| Waitlist | Merge (union) | Both sides can add/remove; merge and re-order by timestamp |
| Player profiles | Cloud wins (LWW) | Cloud is the authority for universal accounts |
| Credit transactions | Append-only (event sourcing) | No conflicts — every transaction is immutable |
| Seat assignments | Local wins during offline | Floor manager's local decisions take precedence |
| Dealer schedules | Cloud wins | Schedules are set centrally |
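The waitlist strategy (union, dedupe, re-order by timestamp) can be shown as a small std-only sketch; the `WaitEntry` shape is illustrative, and on duplicate player ids the first-seen side (cloud here) wins:

```rust
use std::collections::HashSet;

/// One waitlist entry; the real record would also carry game type, stakes,
/// and notification state.
#[derive(Clone, Debug, PartialEq)]
struct WaitEntry {
    player_id: String,
    joined_at_ms: u64,
}

/// Merge (union) of cloud and local waitlists: take entries from both
/// sides, drop duplicates by player id (cloud listed first, so it wins),
/// then re-order by join timestamp.
fn merge_waitlists(cloud: &[WaitEntry], local: &[WaitEntry]) -> Vec<WaitEntry> {
    let mut seen = HashSet::new();
    let mut merged: Vec<WaitEntry> = cloud
        .iter()
        .chain(local.iter())
        .filter(|e| seen.insert(e.player_id.clone()))
        .cloned()
        .collect();
    merged.sort_by_key(|e| e.joined_at_ms);
    merged
}
```

The timestamp ordering is why accurate NTP on the RPi5 (see the chrony note below) matters to fairness, not just to logging.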

RPi5 System Setup

  • OS: Raspberry Pi OS Lite (64-bit, Debian Bookworm-based) — no desktop environment
  • Runtime: Docker + Docker Compose. Two containers: pvm-node (Rust binary) + pvm-nats-leaf (NATS)
  • Storage: 32 GB+ microSD or USB SSD (recommended for durability). libSQL database in a Docker volume.
  • Auto-start: Docker Compose with restart: always. systemd service ensures Docker starts on boot.
  • Updates: docker compose pull && docker compose up -d — automated via cron or webhook from cloud.
  • Watchdog: Docker health checks + hardware watchdog timer to auto-reboot if containers fail.
  • Networking: Ethernet preferred (reliable), WiFi as fallback. mDNS for local display device discovery. WireGuard tunnel to Hetzner cloud.

Gotchas

  • RPi5 has 4 GB or 8 GB RAM — target 8 GB model, budget ~200 MB for the PVM process + NATS
  • SD card wear: use an external USB SSD for the libSQL database if heavy write operations are expected
  • Time synchronization: use chrony NTP client — accurate timestamps are critical for conflict resolution and tournament clocks
  • Power loss: libSQL in WAL mode is crash-safe, but implement a clean shutdown handler (SIGTERM) that flushes state
  • Security: the RPi5 is physically accessible in venues — encrypt the libSQL database at rest, disable SSH password auth, use key-only

11. Venue Display System

Recommendation: Generic web display app + Android display client (no Google Cast SDK dependency)

Architecture

┌──────────────────┐
│ Screen Manager   │   (part of admin dashboard)
│ - Assign streams │   Venue staff assigns content to each display
│ - Per-TV config  │
└────────┬─────────┘
         │ WebSocket (display assignment)
         ▼
┌──────────────────┐  mDNS auto-  ┌──────────────────┐
│ Local RPi5 Node  │◄─ discovery ─┤ Display Devices  │
│ serves display   │              │ (Android box /   │
│ web app + WS     ├─────────────►│  smart TV /      │
│                  │              │  Chromecast)     │
└────────┬─────────┘              └────────┬─────────┘
         │                                 │
    if offline:                       fallback:
    serves locally                connect to cloud
         │                        SaaS URL directly
         ▼                                 │
┌──────────────────┐              ┌────────▼─────────┐
│ Display renders  │              │ Display renders  │
│ from local node  │              │ from cloud       │
└──────────────────┘              └──────────────────┘

Display Client (Android App)

A lightweight Android app (or a $40 4K Android box) that:

  1. Auto-starts on boot — kiosk mode, no user interaction needed
  2. Discovers the local node via mDNS — zero-config for venue staff, falls back to manual IP entry
  3. Registers with a unique device ID — appears automatically in the Screen Manager dashboard
  4. Receives display assignment via WebSocket — the system tells it what to render
  5. Renders a full-screen web page — the display content is a standard SvelteKit static page
  6. Falls back to cloud SaaS if the local RPi5 node is offline
  7. Remotely controllable — venue staff can change the stream, restart, or push an announcement overlay from the Screen Manager

Display Content (SvelteKit Static App)

The display views are a separate SvelteKit static build optimized for large screens:

  • Tournament clock: Large timer, current level, blind structure, next break, average stack
  • Waiting list: Player queue by game type, estimated wait times
  • Table status: Open seats, game types, stakes per table
  • Seatings: Tournament seat assignments after draws
  • Custom slideshow: Announcements, promotions, venue info (managed by staff)
  • Rotation mode: Cycle between multiple views on a configurable timer
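
Rotation mode reduces to a pure scheduling function the display client can call on every clock tick. A minimal sketch (the function name `rotation_view_index` and its signature are illustrative, not part of the actual codebase):

```rust
/// Given a number of views and a per-view dwell time, return the index of the
/// view that should be on screen `elapsed_secs` after the rotation started.
/// Pure function: easy to test, trivial to call from the render loop.
fn rotation_view_index(view_count: usize, dwell_secs: u64, elapsed_secs: u64) -> Option<usize> {
    if view_count == 0 || dwell_secs == 0 {
        return None; // nothing to rotate, or rotation disabled
    }
    Some(((elapsed_secs / dwell_secs) as usize) % view_count)
}
```

Because the function is pure, a display that reboots mid-rotation lands on the same view as its neighbors as long as they share the rotation start time.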

Screen Manager

The Screen Manager (part of the admin dashboard) lets floor managers:

  • See all connected display devices with status (online, offline, content)
  • Assign content streams to each device (TV 1-5: tournament clock, TV 6: waitlist, etc.)
  • Configure rotation/cycling between views per device
  • Send one-time announcements to all screens or specific screens
  • Adjust display themes (dark/light, font size, venue branding)
  • Group screens (e.g. "Tournament Area", "Cash Room", "Lobby")

Technical Details

  • Display web app is served by the local node's HTTP server (Axum) for lowest latency
  • WebSocket connection for live data updates (tournament clock ticks, waitlist changes)
  • Each display device is identified by a stable device ID (generated on first boot, persisted)
  • mDNS service type: _pvm-display._tcp.local for auto-discovery
  • Display URLs: http://{local-node-ip}/display/{device-id} (local) or https://app.pvmapp.com/display/{device-id} (cloud fallback)
  • Dark mode by default (poker venues are low-light environments)
  • Large fonts, high contrast — designed for viewing from across the room
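
The local-vs-cloud URL selection described above is a one-line decision the display client makes after mDNS discovery. A sketch, using the example hosts from the URL scheme above (`display_url` itself is an illustrative name):

```rust
/// Build the URL a display device should load: prefer the local node when one
/// was discovered, fall back to the cloud SaaS otherwise.
fn display_url(local_node_ip: Option<&str>, device_id: &str) -> String {
    match local_node_ip {
        Some(ip) => format!("http://{ip}/display/{device_id}"),
        None => format!("https://app.pvmapp.com/display/{device_id}"),
    }
}
```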

Chromecast Compatibility

Chromecast is supported as a display target but not the primary architecture:

  • Smart TVs with built-in Chromecast or attached Chromecast dongles can open the display URL
  • No Google Cast SDK dependency — just opening a URL
  • The Android display client app is the recommended approach for reliability and offline support

Gotchas

  • Android kiosk mode needs careful implementation — prevent users from exiting the app, handle OS updates gracefully
  • mDNS can be unreliable on some enterprise/venue networks — always offer manual IP fallback
  • Display devices on venue WiFi may have intermittent connectivity — design for reconnection and state catch-up
  • Keep the display app extremely lightweight — some $40 Android boxes have limited RAM
  • Test on actual cheap Android hardware early — performance varies wildly
  • Power cycling (venue closes nightly) must be handled gracefully — auto-start, auto-reconnect, auto-resume
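
Reconnection after intermittent WiFi or nightly power cycling usually means exponential backoff with a cap. A deterministic sketch of the delay schedule (real clients should add random jitter; `backoff_secs` is an illustrative name):

```rust
/// Backoff delay in seconds for the Nth consecutive failed reconnect attempt,
/// doubling from `base_secs` and capped at `max_secs` so a display never
/// waits unreasonably long after the network comes back.
fn backoff_secs(attempt: u32, base_secs: u64, max_secs: u64) -> u64 {
    let exp = attempt.min(16); // avoid shift overflow on pathological counters
    base_secs.saturating_mul(1u64 << exp).min(max_secs)
}
```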

12. Mobile Strategy

Recommendation: PWA first (SvelteKit), with Capacitor wrapper for app store presence when needed

Alternatives Considered

| Approach | Pros | Cons |
|---|---|---|
| PWA (SvelteKit) | One codebase, instant updates, no app store, works offline | Limited native API access, iOS push only since 16.4 (limited), discoverability |
| Capacitor (hybrid) | PWA + native shell, access native APIs, app store distribution | Thin WebView wrapper, some performance overhead |
| Tauri Mobile | Rust backend, small size | Mobile support very early (alpha/beta), limited ecosystem |
| React Native | True native UI, large ecosystem | Separate codebase from web, React dependency, not Svelte |
| Flutter | Excellent cross-platform, single codebase | Dart language, separate from web entirely |

Reasoning

PVM's mobile needs are primarily consumption-oriented — players check tournament schedules, waiting list position, and receive notifications. This is a perfect fit for a PWA:

  1. PWA first: The SvelteKit app with vite-plugin-pwa already provides offline caching, add-to-home-screen, and background sync. For most players, this is sufficient.

  2. Capacitor wrap when needed: When iOS push notifications, Apple Pay, or app store presence becomes important, wrap the existing SvelteKit PWA in Capacitor. Capacitor runs the same web app in a native WebView and provides JavaScript bridges to native APIs.

  3. Tauri Mobile is not ready: As of 2026, Tauri 2.0's mobile support exists but is still maturing. It would be a good fit architecturally (Rust backend + web frontend), but the plugin ecosystem and build tooling aren't as polished as Capacitor's. Revisit in 12-18 months.

PWA Features for PVM

  • Service Worker: Cache tournament schedules, player profile, venue info for offline access
  • Push Notifications: Web Push API for tournament start reminders, waitlist calls (Android + iOS 16.4+)
  • Add to Home Screen: App-like experience without app store
  • Background Sync: Queue waitlist join/leave actions when offline, sync when back online
  • Share Target: Accept shared tournament links
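
One subtlety of Background Sync is coalescing: a player who joins and then leaves a waitlist while offline should produce no sync traffic at all. The real queue lives in the service worker (TypeScript), but the net-action logic can be sketched as a pure function; names here (`WaitlistAction`, `coalesce`) are illustrative:

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum WaitlistAction { Join, Leave }

/// Coalesce a queue of offline actions against one waitlist: a Join followed
/// by a Leave (or vice versa) cancels out, so only the net action is sent
/// when connectivity returns.
fn coalesce(actions: &[WaitlistAction]) -> Option<WaitlistAction> {
    actions.iter().copied().fold(None, |acc, next| match (acc, next) {
        (Some(WaitlistAction::Join), WaitlistAction::Leave) => None,
        (Some(WaitlistAction::Leave), WaitlistAction::Join) => None,
        (_, a) => Some(a),
    })
}
```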

Gotchas

  • iOS PWA support is improving but still has limitations (no background fetch, limited push notification payload)
  • Capacitor requires maintaining iOS/Android build pipelines — only add this when there's a clear need
  • Test PWA on actual mobile devices in venues — WiFi quality varies dramatically
  • Deep linking: configure universal links / app links so shared tournament URLs open in the PWA/app

13. Deployment & Infrastructure

Recommendation: Self-hosted on Hetzner PVE (LXC containers) + Docker + Forgejo Actions CI/CD

Reasoning

The project already has a Hetzner Proxmox VE (PVE) server. Running PVM in LXC containers on the existing infrastructure keeps costs minimal and gives full control.

  1. LXC containers on PVE: Lightweight, near-native performance, easy to snapshot and backup. Each service gets its own container or Docker runs inside an LXC.
  2. Docker Compose for services: All cloud services defined in a single docker-compose.yml. Simple to start, stop, and update.
  3. No vendor lock-in: Everything runs on standard Linux + Docker. Can migrate to any cloud or other bare metal trivially.
  4. WireGuard for RPi5 connectivity: RPi5 local nodes connect to the Hetzner server via WireGuard tunnel for secure NATS leaf node communication.
  5. Forgejo Actions: CI/CD runs on the same Forgejo instance hosting the code.

Infrastructure Layout

Hetzner PVE Server
├── LXC: pvm-cloud
│   ├── Docker: pvm-api (Axum)
│   ├── Docker: pvm-ws-gateway (Axum WebSocket)
│   ├── Docker: pvm-worker (background jobs: sync, notifications)
│   ├── Docker: pvm-nats (NATS cluster)
│   ├── Docker: pvm-db (PostgreSQL 16)
│   └── Docker: pvm-cache (DragonflyDB)
├── LXC: pvm-staging (mirrors production for testing)
└── WireGuard endpoint for RPi5 nodes

Venue (RPi5 — Docker on Raspberry Pi OS)
├── Docker: pvm-node (Rust binary — API proxy + sync engine)
├── Docker: pvm-nats-leaf (NATS leaf node)
└── connects to Hetzner via WireGuard/TLS

RPi5 Local Node (Docker-based)

The local node runs Docker on stock Raspberry Pi OS (64-bit):

  • Provisioning: One-liner curl script installs Docker and pulls the PVM stack (docker compose pull && docker compose up -d)
  • Updates: Pull new images and restart (docker compose pull && docker compose up -d). Automated via a cron job or self-update webhook.
  • Rollback: Previous images remain on disk. Roll back with docker compose up -d --force-recreate using pinned image tags.
  • Services: pvm-node (Rust binary) + pvm-nats-leaf (NATS leaf node). Two containers, minimal footprint.
  • Storage: libSQL database stored in a Docker volume on the SD card (or USB SSD for heavy-write venues).

CI/CD Pipeline (Forgejo Actions)

# Triggered on push to main
1. Lint (clippy, biome)
2. Test (cargo nextest, vitest, playwright)
3. Build (multi-stage Docker for cloud + cross-compile ARM64 for RPi5)
4. Push images to container registry
5. Deploy staging (docker compose pull on staging LXC)
6. E2E tests against staging
7. Deploy production (manual approval, docker compose on production LXC)
8. Publish RPi5 images (ARM64 Docker images to registry)

Gotchas

  • Use multi-stage Docker builds for Rust: builder stage with rust:bookworm, runtime stage with debian:bookworm-slim or distroless
  • PostgreSQL backups: automate pg_dump to a separate backup location (another Hetzner storage box or off-site)
  • Set up blue-green deployments via Docker Compose profiles for zero-downtime upgrades
  • Monitor Hetzner server resources — if PVM outgrows a single server, split services across multiple LXCs or servers
  • WireGuard keys for RPi5 nodes: automate key generation and registration during provisioning
  • The RPi5 Docker update mechanism needs a health check — if new images fail, auto-rollback to previous tag
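
The auto-rollback gotcha reduces to a small decision the update script makes after probing the container's health endpoint a few times. A sketch of that logic (`should_rollback` and its inputs are assumptions; the actual probing and `docker compose` invocation live in the update script):

```rust
/// Decide whether to revert to the previous pinned image tag: roll back when
/// the `threshold` most recent health checks after an update all failed.
/// `recent_checks` is ordered oldest-to-newest, `true` = healthy.
fn should_rollback(recent_checks: &[bool], threshold: usize) -> bool {
    threshold > 0
        && recent_checks.len() >= threshold
        && recent_checks.iter().rev().take(threshold).all(|&ok| !ok)
}
```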

14. Monitoring & Observability

Recommendation: OpenTelemetry (traces + metrics + logs) exported to self-hosted Grafana + Loki + Tempo + Prometheus (on Hetzner PVE)

Alternatives Considered

| Stack | Pros | Cons |
|---|---|---|
| OpenTelemetry + Grafana | Vendor-neutral, excellent Rust support, unified pipeline | Some setup required |
| Datadog | All-in-one, excellent UX | Expensive at scale, vendor lock-in |
| New Relic | Good APM | Cost, Rust support less first-class |
| Sentry | Excellent error tracking | Limited metrics/traces, complementary rather than primary |

Rust Instrumentation Stack

# Key crates
tracing = "0.1"                    # Structured logging/tracing facade
tracing-subscriber = "0.3"        # Log formatting, filtering
tracing-opentelemetry = "0.28"    # Bridge tracing → OpenTelemetry
opentelemetry = "0.28"            # OTel SDK
opentelemetry-otlp = "0.28"      # OTLP exporter
opentelemetry-semantic-conventions # Standard attribute names

What to Monitor

Application Metrics:

  • Request rate, latency (p50/p95/p99), error rate per endpoint
  • WebSocket connection count per venue
  • NATS message throughput and consumer lag
  • Tournament clock drift (local node vs cloud time)
  • Sync latency (time from local mutation to cloud persistence)
  • Cache hit/miss ratios (DragonflyDB)
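
Clock drift is worth turning into an alertable severity rather than a raw gauge. A sketch of the classification (the thresholds and names `classify_drift`/`Drift` are illustrative):

```rust
#[derive(PartialEq, Debug)]
enum Drift { Ok, Warn, Critical }

/// Compare the local node's elapsed clock time for the current level against
/// the cloud's authoritative reading (both in ms) and classify the absolute
/// drift against warn/critical thresholds.
fn classify_drift(local_ms: i64, cloud_ms: i64, warn_ms: i64, crit_ms: i64) -> Drift {
    let d = (local_ms - cloud_ms).abs();
    if d >= crit_ms {
        Drift::Critical
    } else if d >= warn_ms {
        Drift::Warn
    } else {
        Drift::Ok
    }
}
```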

Business Metrics:

  • Active tournaments per venue
  • Players on waiting lists
  • Concurrent connected users
  • Tournament registrations per hour
  • Offline duration per local node

Infrastructure Metrics:

  • CPU, memory, disk per service
  • RPi5 node health: temperature, memory usage, SD card wear level
  • NATS cluster health
  • Postgres connection pool utilization

Local Node Observability

The RPi5 node should:

  • Buffer OpenTelemetry spans/metrics locally when offline
  • Flush to cloud collector on reconnect
  • Expose a local /health endpoint for venue staff to check node status
  • Log to both stdout (for journalctl) and a rotating file
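
Buffering telemetry through a 72-hour outage needs a bound, and when the bound is hit the oldest data is the right thing to drop: the most recent spans are the ones that explain the disconnect. A sketch of such a buffer (`OfflineBuffer` is an illustrative name, not a real crate type):

```rust
use std::collections::VecDeque;

/// Fixed-capacity buffer for telemetry batches while the node is offline.
/// When full, the oldest batch is dropped in favor of newer data.
struct OfflineBuffer<T> {
    cap: usize,
    items: VecDeque<T>,
}

impl<T> OfflineBuffer<T> {
    fn new(cap: usize) -> Self {
        Self { cap, items: VecDeque::new() }
    }

    fn push(&mut self, item: T) {
        if self.cap == 0 {
            return; // buffering disabled
        }
        if self.items.len() == self.cap {
            self.items.pop_front(); // drop oldest
        }
        self.items.push_back(item);
    }

    /// Drain everything for flushing to the cloud collector on reconnect.
    fn drain(&mut self) -> Vec<T> {
        self.items.drain(..).collect()
    }
}
```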

Alerting

  • Use Grafana Alerting for cloud services
  • Critical alerts: API error rate > 5%, NATS cluster partition, Postgres replication lag > 30s
  • Warning alerts: RPi5 node offline > 5 min, sync backlog > 1000 events, high memory usage
  • Notification channels: Slack/Discord for ops team, push notification for venue managers on critical local node issues

Gotchas

  • OpenTelemetry's Rust SDK is stable but evolving — pin versions carefully
  • The tracing crate is the Rust ecosystem standard — everything (Axum, sqlx, async-nats) already emits tracing spans, so you get deep instrumentation for free
  • Sampling is important at scale — don't trace every tournament clock tick in production
  • The self-hosted Grafana + Loki + Tempo + Prometheus stack fits in a single LXC at this scale, but set retention limits (for example, 14 days for traces and 30 days for logs) to keep disk usage on the PVE host bounded

15. Testing Strategy

Recommendation: Multi-layer testing with cargo test (unit/integration), Playwright (E2E), and Vitest (frontend unit)

Test Pyramid

         ▲
        / \        E2E Tests (Playwright)
       /   \       - Full user flows
       /     \      - Display app rendering
     /───────\
    /         \    Integration Tests (cargo test + testcontainers)
   /           \   - API endpoint tests with real DB
  /             \  - NATS pub/sub flows
 /               \ - Sync protocol tests
/─────────────────\
                    Unit Tests (cargo test + vitest)
                    - Domain logic (tournament engine, clock, waitlist)
                    - Svelte component tests
                    - Conflict resolution logic

Backend Testing (Rust)

  • Unit tests: Inline #[cfg(test)] modules for domain logic. The tournament engine, clock manager, waitlist priority algorithm, and conflict resolution are all pure functions that are easy to unit test.
  • Integration tests: Use testcontainers crate to spin up ephemeral Postgres + NATS + DragonflyDB instances. Test full API flows including auth, multi-tenancy, and real-time events.
  • sqlx compile-time checks: SQL queries are validated against the database schema at compile time — this catches a huge class of bugs before runtime.
  • Property-based testing: Use proptest for testing conflict resolution and sync protocol with random inputs.
  • Test runner: cargo-nextest for parallel test execution (significantly faster than default cargo test).
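
The unit-test layer works best when the domain logic stays in pure functions. An illustrative example of the pattern; the real waitlist algorithm in pvm-core may weigh more factors, and the names here (`Entry`, `waitlist_order`) are made up for the sketch:

```rust
/// Illustrative waitlist ordering: priority tier first (lower is called
/// sooner), then first-come-first-served by join timestamp.
#[derive(PartialEq, Eq, Debug)]
struct Entry {
    priority: u8,
    joined_at_ms: u64,
}

fn waitlist_order(entries: &mut Vec<Entry>) {
    entries.sort_by_key(|e| (e.priority, e.joined_at_ms));
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn fifo_within_same_priority() {
        let mut v = vec![
            Entry { priority: 1, joined_at_ms: 200 },
            Entry { priority: 0, joined_at_ms: 300 },
            Entry { priority: 1, joined_at_ms: 100 },
        ];
        waitlist_order(&mut v);
        assert_eq!(v[0], Entry { priority: 0, joined_at_ms: 300 });
        assert_eq!(v[1], Entry { priority: 1, joined_at_ms: 100 });
    }
}
```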

Frontend Testing (TypeScript/Svelte)

  • Component tests: Vitest + @testing-library/svelte for testing Svelte components in isolation.
  • Store/state tests: Vitest for testing reactive state logic (tournament clock state, waitlist updates).
  • API mocking: msw (Mock Service Worker) for intercepting API calls in tests.

End-to-End Testing

  • Playwright: Test critical user flows in real browsers:
    • Tournament creation and management flow
    • Player registration and waitlist join
    • Real-time updates (verify clock ticks appear in browser)
    • Multi-venue admin dashboard
    • Venue display rendering (headless Chromium)
  • Local node E2E: Test offline scenarios — start local node, disconnect from cloud, perform operations, reconnect, verify sync.

Specialized Tests

  • Sync protocol tests: Simulate network partitions, conflicting writes, replay scenarios
  • Load testing: k6 or drill (Rust) for WebSocket connection saturation, API throughput
  • Display tests: Visual regression testing with Playwright screenshots of venue display layouts
  • Cross-browser: Playwright covers Chromium, Firefox, WebKit — ensure PWA works on all

Gotchas

  • Rust integration tests with testcontainers need Docker available in CI — give the Forgejo Actions runner access to the host Docker socket, or run Docker-in-Docker
  • Playwright tests are slow — run in parallel, and only test critical paths in CI (full suite nightly)
  • The local node's offline/reconnect behavior is the hardest thing to test — invest heavily in deterministic sync protocol tests
  • Mock the NATS connection in unit tests using a channel-based mock, not an actual NATS server

16. Security

Recommendation: Defense in depth across all layers

Data Security

| Layer | Measure |
|---|---|
| Transport | TLS 1.3 everywhere — API, WebSocket, NATS, Postgres connections |
| Data at rest | Postgres: LUKS-encrypted volumes on the PVE host. libSQL on RPi5: SQLCipher-compatible encryption via libsql |
| Secrets | Environment files injected via Docker Compose (kept out of images and the repo); encrypted config file on RPi5 (sealed at provisioning) |
| Passwords | Argon2id hashing, tuned per environment (higher params on cloud, lower on RPi5) |
| JWTs | Ed25519 signing, short expiry (15 min), refresh token rotation |
| API keys | SHA-256 hashed in DB, displayed once at creation, prefix-based identification (pvm_live_, pvm_test_) |
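
The prefix convention lets the API identify a key's environment without ever inspecting the secret portion (only the SHA-256 hash of the full key is stored). A sketch, with `api_key_env` as an illustrative function name:

```rust
/// Classify an API key's environment from its prefix alone.
fn api_key_env(key: &str) -> Option<&'static str> {
    if key.starts_with("pvm_live_") {
        Some("live")
    } else if key.starts_with("pvm_test_") {
        Some("test")
    } else {
        None // unknown format: reject before any DB lookup
    }
}
```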

Network Security

  • API: Rate limiting (Tower middleware), CORS restricted to known origins, request size limits
  • WebSocket: Authenticated connection upgrade (JWT in first message or query param), per-connection rate limiting
  • NATS: TLS + token auth between cloud and leaf nodes. Leaf nodes have scoped permissions (can only access their venue's subjects)
  • RPi5: Firewall (nftables/ufw) — only allow outbound to cloud NATS + HTTPS, inbound on local network only for venue devices
  • DDoS: Hetzner provides basic network-level DDoS protection. Put Cloudflare in front of the API if more is needed.
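
Venue scoping of leaf-node NATS permissions comes down to subject prefix matching. A simplified checker, assuming subjects shaped like `venue.{id}.…` and supporting only the trailing `>` wildcard (which is all venue scoping needs; real NATS permissions also support the `*` token wildcard):

```rust
/// Check whether a concrete NATS subject falls under a venue-scoped
/// permission like "venue.v42.>", i.e. the leaf node may only touch
/// subjects rooted at its own venue.
fn subject_allowed(permission: &str, subject: &str) -> bool {
    match permission.strip_suffix(".>") {
        // "venue.v42.>" allows "venue.v42" itself and anything below it,
        // but not "venue.v421.…" (note the dot in the prefix check).
        Some(prefix) => subject == prefix || subject.starts_with(&format!("{prefix}.")),
        None => permission == subject,
    }
}
```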

Financial Data Security

Per decision #4, PVM handles no money at all — venues settle payments through their own POS systems (most are cash-based). PVM only records game data with financial context (buy-ins, entries, rebuys), which still warrants care:

  • All such mutations are event-sourced with an immutable audit trail
  • Record corrections require staff action with a logged actor and reason
  • Buy-in/entry events carry idempotency keys so offline sync replays never create duplicate records
  • Reports are only accessible to operator admins, with access logged
  • No payment card data ever enters the system, keeping PVM out of PCI DSS scope
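
Idempotency-key deduplication on the ingestion side is simple to model. A sketch (`EventLog` and `ingest` are illustrative; the real store would be Postgres-backed with the key as a unique column):

```rust
use std::collections::HashSet;

/// Idempotent event ingestion: each mutation carries a client-generated
/// idempotency key; replays (e.g. an offline node resending its event log
/// after reconnect) are accepted but applied only once.
struct EventLog {
    seen: HashSet<String>,
    applied: Vec<String>, // stand-in for real event payloads
}

impl EventLog {
    fn new() -> Self {
        Self { seen: HashSet::new(), applied: Vec::new() }
    }

    /// Returns true if the event was applied, false if it was a duplicate.
    fn ingest(&mut self, idempotency_key: &str, payload: &str) -> bool {
        if !self.seen.insert(idempotency_key.to_string()) {
            return false; // duplicate delivery, already applied
        }
        self.applied.push(payload.to_string());
        true
    }
}
```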

Local Node Security

The RPi5 is physically in a venue — assume it can be stolen or tampered with:

  • Disk encryption: Full disk encryption (LUKS) or at minimum encrypted database
  • Secure boot: Signed binaries, verified at startup
  • Remote wipe: Cloud can send a command to reset the node to factory state
  • Tamper detection: Log unexpected restarts, hardware changes
  • Credential scope: Local node only has access to its venue's data — compromising one node doesn't expose other venues

Gotchas

  • DO NOT add payment handling — PVM records game data only; card numbers and payment flows stay in the venue's own POS (decision #4)
  • GDPR/privacy: Player data across venues requires careful consent management. Players must be able to request data deletion.
  • The local node's offline auth cache is a security risk — limit which credentials are cached and expire them after a configurable period
  • Regularly rotate NATS credentials and JWT signing keys — automate this

17. Developer Experience

Recommendation: Cargo workspace (Rust monorepo) + pnpm workspace (TypeScript) managed by Turborepo

Monorepo Structure

pvm/
├── Cargo.toml                 # Rust workspace root
├── turbo.json                 # Turborepo config
├── package.json               # pnpm workspace root
├── pnpm-workspace.yaml
│
├── crates/                    # Rust crates
│   ├── pvm-api/               # Cloud API server (Axum)
│   ├── pvm-node/              # Local node binary
│   ├── pvm-ws-gateway/        # WebSocket gateway
│   ├── pvm-worker/            # Background job processor
│   ├── pvm-core/              # Shared domain logic
│   │   ├── tournament/        # Tournament engine
│   │   ├── waitlist/          # Waitlist management
│   │   ├── clock/             # Tournament clock
│   │   └── sync/              # Sync protocol
│   ├── pvm-db/                # Database layer (sqlx queries, migrations)
│   ├── pvm-auth/              # Auth logic (JWT, RBAC)
│   ├── pvm-nats/              # NATS client wrappers
│   └── pvm-types/             # Shared types (serde, utoipa derives)
│
├── apps/                      # TypeScript apps
│   ├── dashboard/             # SvelteKit admin dashboard
│   ├── player/                # SvelteKit player-facing app
│   ├── display/               # SvelteKit venue display app (static)
│   └── docs/                  # Documentation site (optional)
│
├── packages/                  # Shared TypeScript packages
│   ├── ui/                    # shadcn-svelte components
│   ├── api-client/            # Generated OpenAPI client
│   └── shared/                # Shared types, utilities
│
├── docker/                    # Dockerfiles
├── .forgejo/                  # Forgejo Actions workflows
└── docs/                      # Project documentation

Key Tools

| Tool | Purpose |
|---|---|
| Cargo | Rust build system, workspace management |
| pnpm | Fast, disk-efficient Node.js package manager |
| Turborepo | Orchestrates build/test/lint across both Rust and TS workspaces. Caches build outputs. `--affected` flag for CI optimization. |
| cargo-watch | Auto-rebuild on Rust file changes during development |
| cargo-nextest | Faster test runner with parallel execution |
| sccache | Shared compilation cache (speeds up CI and local builds) |
| cross / cargo-zigbuild | Cross-compile Rust for RPi5 ARM64 |
| Biome | Fast linter + formatter for TypeScript (replaces ESLint + Prettier) |
| clippy | Rust linter (run with `--deny warnings` in CI) |
| rustfmt | Rust formatter (enforced in CI) |
| lefthook | Git hooks manager (format + lint on pre-commit) |

Development Workflow

# Start everything for local development
turbo dev                      # Starts SvelteKit dev servers
cargo watch -x run -p pvm-api  # Auto-restart API on changes

# Run all tests
turbo test                     # TypeScript tests
cargo nextest run              # Rust tests

# Generate API client after backend changes
cargo run -p pvm-api -- --openapi > apps/dashboard/src/lib/api/schema.json
turbo generate:api-client

# Build for production
turbo build                    # TypeScript apps
cargo build --release -p pvm-api
cross build --release --target aarch64-unknown-linux-gnu -p pvm-node

Gotchas

  • Turborepo's Rust support is task-level (it runs cargo as a shell command) — it doesn't understand Cargo's internal dependency graph. Use Cargo workspace for Rust-internal dependencies.
  • Keep pvm-core as a pure library crate with no async runtime dependency — this lets it be used in both the cloud API and the local node without conflicts.
  • Rust compile times are the bottleneck — invest in sccache and incremental compilation from day one
  • Use .cargo/config.toml for cross-compilation targets and linker settings

18. CSS / Styling

Recommendation: Tailwind CSS v4 + shadcn-svelte component system

Alternatives Considered

| Option | Pros | Cons |
|---|---|---|
| Tailwind CSS v4 | Utility-first, fast, excellent Svelte integration, v4 is faster with Rust-based engine | Learning curve for utility classes |
| Vanilla CSS | No dependencies, full control | Slow development, inconsistent patterns |
| UnoCSS | Atomic CSS, fast, flexible presets | Smaller ecosystem than Tailwind |
| Open Props | Design tokens as CSS custom properties | Not utility-first, less adoption |
| Panda CSS | Type-safe styles, zero runtime | Newer, smaller ecosystem |

Reasoning

Tailwind CSS v4 is the clear choice:

  1. Svelte integration: Tailwind works seamlessly with SvelteKit via the Vite plugin. Svelte's template syntax + Tailwind utilities produce compact, readable component markup.
  2. Tailwind v4 improvements: The v4 release includes a Rust-based engine (Oxide) that is significantly faster, CSS-first configuration (no more tailwind.config.js), automatic content detection, and native CSS cascade layers.
  3. shadcn-svelte: The component library is built on Tailwind, providing a consistent design system with accessible, customizable components. Components are generated into your codebase — full ownership, no black box.
  4. Venue displays: Tailwind's utility classes produce small CSS bundles (only used classes are included), which matters on resource-constrained Android display boxes.
  5. Design tokens: Use CSS custom properties (via Tailwind's theme) for venue-specific branding (colors, logos) that can be swapped at runtime.

Design System Structure

packages/ui/
├── components/                # shadcn-svelte generated components
│   ├── button/
│   ├── card/
│   ├── data-table/
│   ├── dialog/
│   ├── form/
│   └── ...
├── styles/
│   ├── app.css                # Global styles, Tailwind imports
│   ├── themes/
│   │   ├── default.css        # Default PVM theme
│   │   ├── dark.css           # Dark mode overrides
│   │   └── display.css        # Optimized for large venue screens
│   └── tokens.css             # Design tokens (colors, spacing, typography)
└── utils.ts                   # cn() helper, variant utilities

Venue Branding

Venues should be able to customize their displays:

/* Runtime theme switching via CSS custom properties */
:root {
  --venue-primary: var(--color-blue-600);
  --venue-secondary: var(--color-gray-800);
  --venue-logo-url: url('/default-logo.svg');
}

/* Applied per-venue at runtime */
[data-venue-theme="vegas-poker"] {
  --venue-primary: #c41e3a;
  --venue-secondary: #1a1a2e;
  --venue-logo-url: url('/venues/vegas-poker/logo.svg');
}

Gotchas

  • Tailwind v4's CSS-first config is a paradigm shift from v3 — ensure all team documentation targets v4 syntax
  • shadcn-svelte components use Tailwind v4 as of recent updates — verify compatibility
  • Large data tables (tournament player lists, waitlists) need careful styling — consider virtualized rendering for 100+ row tables
  • Venue display screens need large fonts and high contrast — keep a dedicated large-screen theme
  • Dark mode is essential for poker venues (low-light environments) — design dark-first

19. Recommended Stack Summary

| Area | Recommendation | Key Reasoning |
|---|---|---|
| Backend Language | Rust | Memory efficiency on RPi5, performance, type safety |
| Frontend Language | TypeScript | Browser ecosystem standard, type safety |
| Backend Framework | Axum (v0.8+) | Tokio-native, Tower middleware, WebSocket support |
| Frontend Framework | SvelteKit (Svelte 5) | Smallest bundles, fine-grained reactivity, PWA support |
| UI Components | shadcn-svelte | Accessible, Tailwind-based, full ownership |
| Cloud Database | PostgreSQL 16+ | Multi-tenant gold standard, RLS, JSONB |
| Local Database | libSQL (embedded) | SQLite-compatible, tiny footprint, Rust-native |
| ORM / Queries | sqlx | Compile-time checked SQL, Postgres + SQLite support |
| Caching | DragonflyDB | Redis-compatible, multi-threaded, memory efficient |
| Messaging | NATS + JetStream | Edge-native leaf nodes, sub-ms latency, lightweight |
| Real-Time | WebSockets (Axum) + SSE fallback | Full duplex, NATS-backed fan-out |
| Auth | Custom JWT + RBAC | Offline-capable, cross-venue, full control |
| API Design | REST + OpenAPI 3.1 | Generated TypeScript client, universal compatibility |
| Mobile | PWA first, Capacitor later | One codebase, offline support, app store when needed |
| Displays | Generic web app + Android display client | No Cast SDK dependency, works offline, mDNS auto-discovery |
| Deployment | Hetzner PVE + Docker (LXC containers) | Self-hosted, full control, existing infrastructure |
| CI/CD | Forgejo Actions + Turborepo | Cross-language build orchestration, caching |
| Monitoring | OpenTelemetry + Grafana | Vendor-neutral, excellent Rust support |
| Testing | cargo-nextest + Vitest + Playwright | Full pyramid: unit, integration, E2E |
| Styling | Tailwind CSS v4 | Fast, small bundles, Svelte-native |
| Monorepo | Cargo workspace + pnpm + Turborepo | Unified builds, shared types |
| Linting | clippy + Biome | Rust + TypeScript coverage |

20. Decisions Made

Resolved during tech stack review session, 2026-02-08.

| # | Question | Decision |
|---|---|---|
| 1 | Hosting | Self-hosted on Hetzner PVE — LXC containers. Already have infrastructure. No Fly.io dependency. |
| 2 | Sync strategy | Event-based sync via NATS JetStream — all mutations are events, local node replays events to build state. Perfect audit trail. No table-vs-row debate. |
| 3 | NATS on RPi5 | Sidecar — separate process managed by systemd/Docker. Independently upgradeable and monitorable. |
| 4 | Financial data | No money handling at all. Venues handle payments via their own POS systems (most are cash-based). PVM only tracks game data. |
| 5 | Multi-region | Single region initially. Design DB schema and NATS subjects for eventual multi-region without rewrite. |
| 6 | Player accounts | PVM signup first. Players always create a PVM account before joining venues. No deduplication problem. |
| 7 | Display strategy | Generic web app + Android display client. TVs run a simple Android app (or $40 Android box) that connects to the local node via mDNS auto-discovery, receives its display assignment via WebSocket, and renders a web page. Falls back to cloud SaaS if local node is offline. Chromecast is supported but not the primary path. No Google Cast SDK dependency. |
| 8 | RPi5 provisioning | Docker on stock Raspberry Pi OS. All PVM services (node, NATS) run as containers. Updates via image pulls. Provisioning is a one-liner curl script. |
| 9 | Offline duration | 72 hours. Covers a full weekend tournament series. After 72h offline, warn staff but keep operating. Sync everything on reconnect. |
| 10 | API style | REST + OpenAPI 3.1. Auto-generated TypeScript client. Universal, debuggable, works with everything. |

Deferred Questions

These remain open for future consideration:

  1. API versioning strategy: Maintain backward compatibility as long as possible. Only version on breaking changes. Revisit when approaching first external API consumers.

  2. GraphQL for player-facing app: REST is sufficient for v1. The player app might benefit from GraphQL's flexible querying later (e.g., "show me my upcoming tournaments across all venues with waitlist status"). Revisit after v1 launch.

  3. WebTransport: When browser support matures, could replace WebSockets for lower-latency real-time streams. Monitor but do not adopt yet.

  4. WASM on local node: Could parts of the frontend run on the local node via WASM for ultra-fast local rendering. Defer.

  5. AI features: Player behavior analytics, optimal table assignments, tournament structure recommendations. The data model should be designed to support future ML pipelines. Design for it, build later.