Mikkel Georgsen 2bb381a0a3 Update tech stack research with finalized decisions
Resolve all open questions from tech stack review:
- Self-hosted on Hetzner PVE (LXC + Docker)
- Event-based sync via NATS JetStream
- Generic display system with Android client (no Cast SDK dep)
- Docker-based RPi5 provisioning
- No money handling, 72h offline limit, REST + OpenAPI
- PVM signup-first for player accounts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 03:06:53 +01:00


# PVM (Poker Venue Manager) — Tech Stack Research
> Generated: 2026-02-08
> Status: DRAFT — for discussion and refinement
---
## Table of Contents
1. [Programming Language](#1-programming-language)
2. [Backend Framework](#2-backend-framework)
3. [Frontend Framework](#3-frontend-framework)
4. [Database Strategy](#4-database-strategy)
5. [Caching Layer](#5-caching-layer)
6. [Message Queue / Event Streaming](#6-message-queue--event-streaming)
7. [Real-Time Communication](#7-real-time-communication)
8. [Auth & Authorization](#8-auth--authorization)
9. [API Design](#9-api-design)
10. [Local Node Architecture](#10-local-node-architecture)
11. [Venue Display System](#11-venue-display-system)
12. [Mobile Strategy](#12-mobile-strategy)
13. [Deployment & Infrastructure](#13-deployment--infrastructure)
14. [Monitoring & Observability](#14-monitoring--observability)
15. [Testing Strategy](#15-testing-strategy)
16. [Security](#16-security)
17. [Developer Experience](#17-developer-experience)
18. [CSS / Styling](#18-css--styling)
19. [Recommended Stack Summary](#recommended-stack-summary)
20. [Open Questions / Decisions Needed](#open-questions--decisions-needed)
---
## 1. Programming Language
### Recommendation: **Rust** (backend + local node) + **TypeScript** (frontend + shared types)
### Alternatives Considered
| Language | Pros | Cons |
|----------|------|------|
| **Rust** | Memory safety, fearless concurrency, tiny binaries for RPi5, no GC pauses, excellent WebSocket perf | Steeper learning curve, slower compile times |
| **Go** | Simple, fast compilation, good concurrency | Less expressive type system, GC pauses (minor), larger binaries than Rust |
| **TypeScript (full-stack)** | One language everywhere, huge ecosystem, fast dev velocity | Node.js memory overhead on RPi5, GC pauses in real-time scenarios, weaker concurrency model |
| **Elixir** | Built for real-time (Phoenix), fault-tolerant OTP | Small ecosystem, harder to find libs, RPi5 BEAM VM overhead |
### Reasoning
Rust is the strongest choice for PVM because of the **RPi5 local node constraint**. The local node must run reliably on constrained hardware with limited memory, handle real-time tournament clocks, manage offline operations, and sync data. Rust's zero-cost abstractions, lack of a garbage collector, and small binary sizes (typically 5-15 MB static binaries) make it ideal for this.
For the **cloud backend**, Rust's performance means fewer servers and lower hosting costs. A single Rust service can handle thousands of concurrent WebSocket connections with minimal memory overhead — critical for real-time tournament updates across many venues.
The **"all code written by Claude Code"** constraint actually favors Rust: Claude has excellent Rust fluency, and the compiler's strict type system catches bugs that would otherwise require extensive testing in dynamic languages.
**TypeScript** remains the right choice for the frontend — the browser ecosystem is TypeScript-native, and sharing type definitions between Rust (via generated OpenAPI types) and TypeScript gives end-to-end type safety.
### Gotchas
- Rust compile times can be mitigated with `cargo-watch`, incremental compilation, and `sccache`
- Cross-compilation for RPi5 (ARM64) is well-supported via `cross` or `cargo-zigbuild`
- Shared domain types can be generated from Rust structs to TypeScript via `ts-rs` or OpenAPI codegen
---
## 2. Backend Framework
### Recommendation: **Axum** (v0.8+)
### Alternatives Considered
| Framework | Pros | Cons |
|-----------|------|------|
| **Axum** | Tokio-native, excellent middleware (Tower), lowest memory footprint, growing ecosystem, WebSocket built-in | Younger than Actix |
| **Actix Web** | Highest raw throughput, most mature | Actor model adds complexity, not Tokio-native (uses own runtime fork) |
| **Rocket** | Most ergonomic, Rails-like DX | Slower performance, less flexible middleware |
| **Loco** | Rails-like conventions, batteries-included | Very new (2024), smaller community, opinionated |
### Reasoning
**Axum** is the clear winner for PVM:
1. **Tokio-native**: Axum is built directly on Tokio + Hyper + Tower. Since NATS, database drivers, and WebSocket handling all use Tokio, everything shares one async runtime — no impedance mismatch.
2. **Tower middleware**: The Tower service/layer pattern gives composable middleware for auth, rate limiting, tracing, compression, CORS, etc. Middleware can be shared between HTTP and WebSocket handlers.
3. **WebSocket support**: First-class WebSocket extraction with `axum::extract::ws`, typed WebSocket messages via `axum-typed-websockets`.
4. **Memory efficiency**: Benchmarks show Axum achieves the lowest memory footprint per connection — critical when serving thousands of concurrent venue connections.
5. **OpenAPI integration**: `utoipa` crate provides derive macros for generating OpenAPI 3.1 specs directly from Axum handler types.
6. **Extractor pattern**: Axum's extractor-based request handling maps cleanly to domain operations (extract tenant, extract auth, extract venue context).
### Key Libraries
- `axum` — HTTP framework
- `axum-extra` — typed headers, cookie jar, multipart
- `tower` + `tower-http` — middleware stack (CORS, compression, tracing, rate limiting)
- `utoipa` + `utoipa-axum` — OpenAPI spec generation
- `utoipa-swagger-ui` — embedded Swagger UI
- `axum-typed-websockets` — strongly typed WS messages
### Gotchas
- Axum's error handling requires careful design — use `thiserror` + a custom error type that implements `IntoResponse`
- Route organization: use `axum::Router::nest()` for modular route trees per domain (tournaments, venues, players)
- State management: use `axum::extract::State` with `Arc<AppState>` — avoid the temptation to put everything in one giant state struct
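The error-type pattern from the first gotcha can be sketched without the axum dependency. The enum variants and messages below are hypothetical; in the real service, the `status_code` mapping would live inside an `IntoResponse` impl derived alongside `thiserror`:

```rust
// Sketch of a service-wide error type (hypothetical variants).
// In the actual Axum service, `status_code` would back an
// `IntoResponse` impl; here it is a plain method so the sketch
// stays dependency-free.
use std::fmt;

#[derive(Debug)]
enum AppError {
    NotFound(String),  // e.g. unknown tournament id
    Unauthorized,      // missing or invalid token
    Conflict(String),  // e.g. seat already taken
    Internal(String),  // database or NATS failure
}

impl fmt::Display for AppError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AppError::NotFound(what) => write!(f, "not found: {what}"),
            AppError::Unauthorized => write!(f, "unauthorized"),
            AppError::Conflict(why) => write!(f, "conflict: {why}"),
            // never leak internal details to clients
            AppError::Internal(_) => write!(f, "internal error"),
        }
    }
}

impl AppError {
    /// HTTP status this error maps to.
    fn status_code(&self) -> u16 {
        match self {
            AppError::NotFound(_) => 404,
            AppError::Unauthorized => 401,
            AppError::Conflict(_) => 409,
            AppError::Internal(_) => 500,
        }
    }
}

fn main() {
    let err = AppError::NotFound("tournament 456".into());
    assert_eq!(err.status_code(), 404);
    println!("{} -> {}", err.status_code(), err);
}
```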
---
## 3. Frontend Framework
### Recommendation: **SvelteKit** (Svelte 5 + runes reactivity)
### Alternatives Considered
| Framework | Pros | Cons |
|-----------|------|------|
| **SvelteKit** | Smallest bundles, true compilation (no virtual DOM), built-in routing/SSR/PWA, Svelte 5 runes are elegant | Smaller ecosystem than React |
| **Next.js (React)** | Largest ecosystem, most libraries, biggest job market | Vercel lock-in concerns, React hydration overhead, larger bundles, RSC complexity |
| **SolidStart** | Finest-grained reactivity, near-zero overhead updates | Smallest ecosystem, least mature, fewer component libraries |
| **Nuxt (Vue)** | Good DX, solid ecosystem | Vue 3 composition API less elegant than Svelte 5 runes |
### Reasoning
**SvelteKit** is the best fit for PVM for several reasons:
1. **Performance matters for venue displays**: Tournament clocks, waiting lists, and seat maps will run on venue TVs via Chromecast. Svelte's compiled output produces minimal JavaScript — the Cast receiver app will load faster and use less memory on Chromecast hardware.
2. **Real-time UI updates**: Svelte 5's fine-grained reactivity (runes: `$state`, `$derived`, `$effect`) means updating a single timer or seat status re-renders only that DOM node, not a virtual DOM diff. This is ideal for dashboards with many independently updating elements.
3. **PWA support**: SvelteKit has first-class service worker support and offline capabilities through `@sveltejs/adapter-static` and `vite-plugin-pwa`.
4. **Bundle size**: SvelteKit produces the smallest JavaScript bundles of any major framework — important for mobile PWA users on venue WiFi.
5. **Claude Code compatibility**: Svelte's template syntax is straightforward and involves less boilerplate than React — Claude can generate clean, readable Svelte components efficiently.
6. **No framework lock-in**: Svelte compiles away, so there's no runtime dependency. The output is vanilla JS + DOM manipulation.
### UI Component Library
**Recommendation: Skeleton UI** (Svelte-native) or **shadcn-svelte** (Tailwind-based, port of shadcn/ui)
`shadcn-svelte` is particularly compelling because:
- Components are copied into your codebase (not a dependency) — full control
- Built on Tailwind CSS — consistent with the styling recommendation
- Accessible by default (uses Bits UI primitives under the hood)
- Matches the design patterns of the widely-used shadcn/ui ecosystem
### Gotchas
- SvelteKit's SSR is useful for the management dashboard, but the Cast receiver and PWA may use `adapter-static` for pure SPA mode
- Svelte's ecosystem is smaller than React's, but for PVM's needs (forms, tables, charts, real-time) the ecosystem is sufficient
- Svelte 5 (runes) is a significant API change from Svelte 4 — ensure all examples and libraries target Svelte 5
---
## 4. Database Strategy
### Recommendation: **PostgreSQL** (cloud primary) + **libSQL/SQLite** (local node) + **Electric SQL** or custom sync
### Alternatives Considered
| Approach | Pros | Cons |
|----------|------|------|
| **Postgres cloud + libSQL local + sync** | Best of both worlds — Postgres power in cloud, SQLite simplicity on RPi5 | Need sync layer, schema divergence risk |
| **Postgres everywhere** | One DB engine, simpler mental model | Postgres on RPi5 uses more memory, harder offline |
| **libSQL/Turso everywhere** | One engine, built-in edge replication | Less powerful for complex cloud queries and multi-tenant partitioning |
| **CockroachDB** | Distributed, strong consistency | Heavy for RPi5, expensive, overkill |
### Detailed Recommendation
**Cloud Database: PostgreSQL 16+**
- The gold standard for multi-tenant SaaS
- Row-level security (RLS) for tenant isolation
- JSONB for flexible per-venue configuration
- Excellent full-text search for player lookup across venues
- Partitioning by tenant for performance at scale
- Managed options: Neon (serverless, branching for dev), Supabase, or AWS RDS
**Local Node Database: libSQL (via Turso's embedded runtime)**
- Fork of SQLite with cloud sync capabilities
- Runs embedded in the Rust binary — no separate database process on RPi5
- WAL mode for concurrent reads during tournament operations
- Tiny memory footprint (< 10 MB typical)
- libSQL's Rust driver (`libsql`) is well-maintained
**Sync Strategy:**
The local node operates on a **subset** of the cloud data: only data relevant to its venue(s). The sync approach:
1. **Cloud-to-local**: Player profiles, memberships, credit lines pushed to local node via NATS JetStream. Local node maintains a read replica of relevant data in libSQL.
2. **Local-to-cloud**: Tournament results, waitlist changes, transactions pushed to cloud via NATS JetStream with at-least-once delivery. Cloud processes as events.
3. **Conflict resolution**: Last-writer-wins (LWW) with vector clocks for most entities. For financial data (credit lines, buy-ins), use **event sourcing**: conflicts are impossible because every transaction is an immutable event.
4. **Offline queue**: When disconnected, local node queues mutations in a local WAL-style append-only log. On reconnect, replays in order via NATS.
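The LWW rule in step 3 can be sketched with plain timestamps. This is illustrative only: real PVM entities would carry vector clocks, and the field names are hypothetical:

```rust
// Minimal last-writer-wins (LWW) resolution sketch for synced entities.
// A wall-clock timestamp stands in for the vector clock to show the
// decision rule only; ties go to the cloud copy, the authority for
// universal player accounts.
#[derive(Debug, Clone, PartialEq)]
struct PlayerProfile {
    player_id: u64,
    display_name: String,
    updated_at_ms: u64, // a vector clock in the real system
}

fn resolve_lww(cloud: PlayerProfile, local: PlayerProfile) -> PlayerProfile {
    if local.updated_at_ms > cloud.updated_at_ms { local } else { cloud }
}

fn main() {
    let cloud = PlayerProfile {
        player_id: 1, display_name: "Ann".into(), updated_at_ms: 1_000,
    };
    let local = PlayerProfile {
        player_id: 1, display_name: "Anne".into(), updated_at_ms: 2_000,
    };
    // the offline edit is newer, so it wins
    assert_eq!(resolve_lww(cloud, local).display_name, "Anne");
}
```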
### ORM / Query Layer
**Recommendation: `sqlx`** (compile-time checked queries)
- `sqlx` checks SQL queries against the actual database schema at compile time
- No ORM abstraction layer: write real SQL, get compile-time safety
- Supports both PostgreSQL and SQLite/libSQL
- Avoids the N+1 query problems that ORMs introduce
- Migrations via `sqlx migrate`
Alternative: `sea-orm` if you want a full ORM, but for PVM the explicit SQL approach of `sqlx` gives more control over multi-tenant queries and complex joins.
### Migrations
- Use `sqlx migrate` for cloud PostgreSQL migrations
- Maintain parallel migration files for libSQL (SQLite-compatible subset)
- A shared migration test ensures both schemas stay compatible for the sync subset
### Gotchas
- PostgreSQL and SQLite have different SQL dialects; the sync subset must use compatible types (no Postgres-specific types in synced tables)
- libSQL's `VECTOR` type is interesting for future player similarity features but not needed initially
- Turso's hosted libSQL replication is an option but adds a dependency; prefer embedded libSQL with custom NATS-based sync for more control
- Schema versioning must be tracked on the local node so the cloud knows what schema version it's talking to
---
## 5. Caching Layer
### Recommendation: **DragonflyDB**
### Alternatives Considered
| Option | Pros | Cons |
|--------|------|------|
| **DragonflyDB** | 25x Redis throughput, Redis-compatible API, multi-threaded, lower memory usage | Younger project, smaller community |
| **Redis 7+** | Most mature, largest ecosystem, Redis Stack modules | Single-threaded core, licensing concerns since Redis 7.4 |
| **Valkey** | Redis fork, community-driven, BSD license | Still catching up to Redis feature parity |
| **KeyDB** | Multi-threaded Redis fork | Development appears stalled (no updates in 1.5+ years) |
| **No cache (just Postgres)** | Simpler architecture | Higher DB load, slower for session/real-time data |
### Reasoning
**DragonflyDB** is the right choice for PVM:
1. **Redis API compatibility**: Drop-in replacement; all Redis client libraries work unchanged. The `fred` Rust crate (async Redis client) works with DragonflyDB out of the box.
2. **Multi-threaded architecture**: DragonflyDB uses all available CPU cores, unlike Redis's single-threaded model. This matters when caching tournament state for hundreds of concurrent venues.
3. **Memory efficiency**: DragonflyDB uses up to 80% less memory than Redis for the same dataset, important for keeping infrastructure costs low.
4. **No license concerns**: DragonflyDB uses BSL 1.1 (converts to open source after 4 years). Redis switched to a dual-license model that's more restrictive. Valkey is BSD but is playing catch-up.
5. **Pub/Sub**: DragonflyDB supports Redis Pub/Sub, useful as a lightweight complement to NATS for in-process event distribution within the backend cluster.
### What to Cache
- **Session data**: User sessions, JWT refresh tokens
- **Tournament state**: Current level, blinds, clock, player counts (hot read path)
- **Waiting lists**: Ordered sets per venue/game type
- **Rate limiting**: API rate limit counters
- **Player lookup cache**: Frequently accessed player profiles
- **Seat maps**: Current table/seat assignments per venue
### What NOT to Cache (use Postgres directly)
- Financial transactions (credit lines, buy-ins): always hit the source of truth
- Audit logs
- Historical tournament data
### Local Node: No DragonflyDB
The RPi5 local node should **not** run DragonflyDB. libSQL is fast enough for local caching needs, and adding another process increases complexity and memory usage on constrained hardware. Use in-memory Rust data structures (e.g., `DashMap`, `moka` cache crate) for hot local state.
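The hot-state caching the local node needs can be sketched with a TTL map over a std `HashMap`. This shows the idea behind what `moka` provides off the shelf; it is single-threaded for clarity, and `DashMap`/`moka` add concurrency and real eviction policies:

```rust
// Sketch of an in-memory TTL cache for hot local state
// (e.g. current tournament clock), illustrating the moka-style
// behavior with std types only.
use std::collections::HashMap;
use std::time::{Duration, Instant};

struct TtlCache<V> {
    ttl: Duration,
    entries: HashMap<String, (Instant, V)>,
}

impl<V: Clone> TtlCache<V> {
    fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    fn put(&mut self, key: &str, value: V) {
        self.entries.insert(key.to_string(), (Instant::now(), value));
    }

    /// Returns the value if present and fresh; expired entries are
    /// evicted lazily on access.
    fn get(&mut self, key: &str) -> Option<V> {
        let expired = match self.entries.get(key) {
            Some((written, _)) => written.elapsed() >= self.ttl,
            None => return None,
        };
        if expired {
            self.entries.remove(key); // lazy eviction
            None
        } else {
            self.entries.get(key).map(|(_, v)| v.clone())
        }
    }
}

fn main() {
    let mut cache = TtlCache::new(Duration::from_millis(50));
    cache.put("tournament:456:clock", "level 5, 20:00");
    assert_eq!(cache.get("tournament:456:clock"), Some("level 5, 20:00"));
    std::thread::sleep(Duration::from_millis(60));
    assert_eq!(cache.get("tournament:456:clock"), None); // expired
}
```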
### Gotchas
- DragonflyDB's replication features are less mature than Redis Sentinel/Cluster; use managed hosting or keep it simple with a single node + persistence initially
- Monitor DragonflyDB's release cycle; it's actively developed but younger than Redis
- Keep the cache layer optional; the system should function (slower) without it
---
## 6. Message Queue / Event Streaming
### Recommendation: **NATS + JetStream**
### Alternatives Considered
| Option | Pros | Cons |
|--------|------|------|
| **NATS + JetStream** | Lightweight (single binary, ~20MB), sub-ms latency, built-in persistence, embedded mode, perfect for edge | Smaller community than Kafka |
| **Apache Kafka** | Highest throughput, mature, excellent tooling | Heavy (JVM, ZooKeeper/KRaft), 4GB+ RAM minimum, overkill for PVM's scale |
| **RabbitMQ** | Mature AMQP, sophisticated routing | Higher latency (5-20ms), more memory, Erlang ops complexity |
| **Redis Streams** | Simple, already have cache layer | Not designed for reliable message delivery at scale |
### Reasoning
**NATS + JetStream** is purpose-built for PVM's architecture:
1. **Edge-native**: NATS can run as a **leaf node** on the RPi5, connecting to the cloud NATS cluster. This is the core of the local-to-cloud sync architecture. When the connection drops, JetStream buffers messages locally and replays them on reconnect.
2. **Lightweight**: NATS server is a single ~20 MB binary. On RPi5, it uses ~50 MB RAM. Compare to Kafka's 4 GB minimum.
3. **Sub-millisecond latency**: Core NATS delivers messages in < 1ms. JetStream (persistent) adds 1-5ms. This is critical for real-time tournament updates: when a player busts, every connected display should update within milliseconds.
4. **Subject-based addressing**: NATS subjects map perfectly to PVM's domain:
- `venue.{venue_id}.tournament.{id}.clock`: tournament clock ticks
- `venue.{venue_id}.waitlist.update`: waiting list changes
- `venue.{venue_id}.seats.{table_id}`: seat assignments
- `player.{player_id}.notifications`: player-specific events
- `sync.{node_id}.upstream`: local node to cloud sync
- `sync.{node_id}.downstream`: cloud to local node sync
5. **Built-in patterns**: Request/reply (for RPC between cloud and node), pub/sub (for broadcasts), queue groups (for load-balanced consumers), key-value store (for distributed config), object store (for binary data like player photos).
6. **JetStream for durability**: Tournament results, financial transactions, and sync operations need guaranteed delivery. JetStream provides at-least-once and exactly-once delivery semantics with configurable retention.
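The subject scheme above relies on NATS wildcard matching (`*` for one token, `>` for the remaining tokens). A minimal sketch of that matching rule, useful for reasoning about which subscriptions receive which events:

```rust
// Sketch of NATS-style subject matching: subjects are dot-separated
// tokens; `*` matches exactly one token, `>` matches one or more
// trailing tokens.
fn subject_matches(pattern: &str, subject: &str) -> bool {
    let mut pat = pattern.split('.');
    let mut sub = subject.split('.');
    loop {
        match (pat.next(), sub.next()) {
            (Some(">"), Some(_)) => return true,  // tail wildcard
            (Some("*"), Some(_)) => continue,     // single-token wildcard
            (Some(p), Some(s)) if p == s => continue,
            (None, None) => return true,          // both exhausted: match
            _ => return false,
        }
    }
}

fn main() {
    assert!(subject_matches("venue.123.tournament.*.clock",
                            "venue.123.tournament.456.clock"));
    assert!(subject_matches("venue.123.>", "venue.123.waitlist.update"));
    assert!(!subject_matches("venue.123.seats.*", "venue.999.seats.t1"));
    println!("subject matching ok");
}
```

A gateway holding each client's subscription patterns can run every incoming event's subject through this check to decide fan-out.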
### Architecture
```
  RPi5 Local Node                  Cloud
┌──────────────┐              ┌──────────────────┐
│  NATS Leaf   │◄─── TLS ────►│   NATS Cluster   │
│  Node        │   (auto-     │     (3-node)     │
│              │   reconnect) │                  │
│  JetStream   │              │    JetStream     │
│  (local buf) │              │   (persistent)   │
└──────────────┘              └──────────────────┘
```
### Gotchas
- NATS JetStream's exactly-once semantics require careful consumer design; use idempotent handlers with deduplication IDs
- Subject namespace design is critical; plan it early, changing later is painful
- NATS leaf nodes need TLS configured for secure cloud connection
- Monitor JetStream stream sizes on the RPi5; set max-bytes limits to avoid filling the SD card during extended offline periods
- The `async-nats` Rust crate is the official async client; it is well maintained and Tokio-native
---
## 7. Real-Time Communication
### Recommendation: **WebSockets** (via Axum) for interactive clients + **NATS** for backend fan-out + **SSE** as fallback
### Alternatives Considered
| Option | Pros | Cons |
|--------|------|------|
| **WebSockets** | Full duplex, low latency, wide support | Requires connection management, can't traverse some proxies |
| **Server-Sent Events (SSE)** | Simpler, auto-reconnect, HTTP-native | Server-to-client only, no binary support |
| **WebTransport** | HTTP/3, multiplexed streams, unreliable mode | Very new, limited browser support, no Chromecast support |
| **Socket.IO** | Auto-fallback, rooms, namespaces | Node.js-centric, adds overhead, not Rust-native |
| **gRPC streaming** | Typed, efficient, bidirectional | Not browser-native (needs grpc-web proxy), overkill |
### Architecture
The real-time pipeline has three layers:
1. **NATS** (backend event bus): All state changes publish to NATS subjects. This is the single source of real-time truth. Both cloud services and local nodes publish here.
2. **WebSocket Gateway** (Axum): A dedicated Axum service subscribes to relevant NATS subjects and fans out to connected WebSocket clients. Each client subscribes to the venues/tournaments they care about.
3. **SSE Fallback**: For environments where WebSockets are blocked (some corporate networks), provide an SSE endpoint that delivers the same event stream. SSE's built-in auto-reconnect with `Last-Event-ID` makes resumption simple.
### Flow Example: Tournament Clock Update
```
Tournament Service (Rust)
→ publishes to NATS: venue.123.tournament.456.clock {level: 5, time_remaining: 1200}
→ WebSocket Gateway subscribes to venue.123.tournament.*
→ fans out to all connected clients watching tournament 456
→ Chromecast receiver app gets update, renders clock
→ PWA on player's phone gets update, shows current level
```
### Implementation Details
- Use `axum::extract::ws::WebSocket` with `tokio::select!` to multiplex NATS subscription + client messages
- Implement heartbeat/ping-pong to detect stale connections (30s interval, 10s timeout)
- Client reconnection with exponential backoff + subscription replay from NATS JetStream
- Binary message format: consider MessagePack (`rmp-serde`) for compact payloads over WebSocket, with JSON as human-readable fallback
- Connection limits: track per-venue connection count, implement backpressure
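The reconnection policy in the list above can be sketched as a capped exponential backoff. Jitter is left out so the sketch stays deterministic; a real client would randomize each delay, and the base/cap values are illustrative:

```rust
// Sketch of client reconnection backoff: delay doubles per attempt
// from a base value, capped at a maximum. A production client would
// add random jitter to avoid thundering-herd reconnects.
use std::time::Duration;

fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // base * 2^attempt, saturating on overflow, then capped at `max`
    base.saturating_mul(2u32.saturating_pow(attempt)).min(max)
}

fn main() {
    let base = Duration::from_millis(250);
    let max = Duration::from_secs(30);
    assert_eq!(backoff_delay(0, base, max), Duration::from_millis(250));
    assert_eq!(backoff_delay(2, base, max), Duration::from_secs(1));
    assert_eq!(backoff_delay(10, base, max), Duration::from_secs(30)); // capped
}
```

On each successful reconnect the attempt counter resets to zero, and the client replays missed events from JetStream before resuming live updates.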
### Gotchas
- WebSocket connections are stateful; you need sticky sessions or a connection registry if running multiple gateway instances
- Chromecast receiver apps have limited WebSocket support; test thoroughly on actual hardware
- Mobile PWAs going to background will drop WebSocket connections; design for reconnection and state catch-up
- Rate limit outbound messages to prevent flooding slow clients (tournament clock ticks should be throttled to 1/second for display, even if internal state updates more frequently)
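The outbound throttling idea can be sketched as a simple interval gate: internal clock state may update many times per second, but displays receive at most one message per interval. Timestamps are passed in explicitly (assumed monotonic) so the sketch is testable:

```rust
// Sketch of an outbound message throttle for per-display updates.
// `should_send` returns true only when at least one full interval has
// elapsed since the last forwarded message.
use std::time::Duration;

struct Throttle {
    interval: Duration,
    last_sent_ms: Option<u64>,
}

impl Throttle {
    fn new(interval: Duration) -> Self {
        Self { interval, last_sent_ms: None }
    }

    /// `now_ms` is a monotonic millisecond timestamp supplied by the caller.
    fn should_send(&mut self, now_ms: u64) -> bool {
        let ok = match self.last_sent_ms {
            None => true, // first update always goes out
            Some(last) => now_ms - last >= self.interval.as_millis() as u64,
        };
        if ok {
            self.last_sent_ms = Some(now_ms);
        }
        ok
    }
}

fn main() {
    let mut t = Throttle::new(Duration::from_secs(1));
    assert!(t.should_send(0));     // first tick is forwarded
    assert!(!t.should_send(400));  // suppressed: within the interval
    assert!(!t.should_send(900));
    assert!(t.should_send(1000));  // next second: forwarded
}
```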
---
## 8. Auth & Authorization
### Recommendation: **Custom JWT auth** with **Postgres-backed RBAC** + optional **OAuth2 social login**
### Alternatives Considered
| Option | Pros | Cons |
|--------|------|------|
| **Custom JWT + RBAC** | Full control, no vendor dependency, works offline on local node | Must implement everything yourself |
| **Auth0 / Clerk** | Managed, social login, MFA out of box | Vendor lock-in, cost scales with users, doesn't work offline |
| **Keycloak** | Self-hosted, full-featured, OIDC/SAML | Heavy (Java), complex to operate, overkill |
| **Ory (Kratos + Keto)** | Open source, cloud-native, API-first | Multiple services to deploy, newer |
| **Lucia Auth** | Lightweight, framework-agnostic | TypeScript-only, no Rust support |
### Architecture
PVM's auth has a unique challenge: **cross-venue universal player accounts** that must work both online (cloud) and offline (local node). This rules out purely managed auth services.
**Token Strategy:**
```
Access Token (JWT, short-lived: 15 min)
├── sub: player_id (universal)
├── tenant_id: current operator
├── venue_id: current venue (if applicable)
├── roles: ["player", "dealer", "floor_manager", "admin"]
├── permissions: ["tournament.manage", "waitlist.view", ...]
└── iat, exp, iss
Refresh Token (opaque, stored in DB/DragonflyDB, long-lived: 30 days)
└── Rotated on each use, old tokens invalidated
```
**RBAC Model:**
```
Operator (tenant)
├── Admin — full control over all venues
├── Manager — manage specific venues
├── Floor Manager — tournament/table operations at a venue
├── Dealer — assigned to tables, report results
└── Player — universal account, cross-venue
├── can self-register
├── has memberships per venue
└── has credit lines per venue (managed by admin)
```
**Key Design Decisions:**
1. **Tenant-scoped roles**: A user can be an admin in one operator's venues and a player in another. The `(user_id, operator_id, role)` triple is the authorization unit.
2. **Offline auth on local node**: The local node caches valid JWT signing keys and a subset of user credentials (hashed). Players can authenticate locally when the cloud is unreachable. New registrations queue for cloud sync.
3. **JWT signing**: Use Ed25519 (fast, small signatures) via the `jsonwebtoken` crate. The cloud signs tokens; the local node can verify them with the public key. For offline token issuance, the local node has a delegated signing key.
4. **Password hashing**: the `argon2` crate; memory-hard, resistant to GPU attacks. Tune parameters for the RPi5 (lower memory cost than cloud).
5. **Social login** (optional, cloud-only): Support Google/Apple sign-in for player accounts via standard OAuth2 flows. Map social identities to the universal player account.
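The tenant-scoped authorization unit from decision 1 can be sketched as a map keyed by the `(user_id, operator_id)` pair. Role names follow the RBAC model above; the permission strings are illustrative:

```rust
// Sketch of tenant-scoped RBAC: the (user, operator, role) triple is
// the authorization unit, so the same user can hold different roles
// under different operators.
use std::collections::{HashMap, HashSet};

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum Role { Admin, Manager, FloorManager, Dealer, Player }

/// Permissions granted by each role (illustrative subset).
fn role_permissions(role: Role) -> HashSet<&'static str> {
    match role {
        Role::Admin => ["tournament.manage", "waitlist.view", "credit.manage"].into(),
        Role::Manager => ["venue.manage", "waitlist.view"].into(),
        Role::FloorManager => ["tournament.manage", "waitlist.view"].into(),
        Role::Dealer => ["result.report"].into(),
        Role::Player => ["waitlist.join"].into(),
    }
}

struct Authz {
    // (user_id, operator_id) -> role
    grants: HashMap<(u64, u64), Role>,
}

impl Authz {
    fn can(&self, user: u64, operator: u64, perm: &str) -> bool {
        self.grants
            .get(&(user, operator))
            .map_or(false, |r| role_permissions(*r).contains(perm))
    }
}

fn main() {
    let mut grants = HashMap::new();
    grants.insert((7, 1), Role::Admin);  // admin for operator 1...
    grants.insert((7, 2), Role::Player); // ...but only a player at operator 2
    let authz = Authz { grants };
    assert!(authz.can(7, 1, "tournament.manage"));
    assert!(!authz.can(7, 2, "tournament.manage"));
}
```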
### Gotchas
- Token revocation is hard with JWTs; use short expiry (15 min) + refresh token rotation + a lightweight blocklist in DragonflyDB for immediate revocation
- Cross-venue account linking: when a player signs up at venue A and later visits venue B (different operator), they should be recognized. Use email/phone as the universal identifier with verification.
- Local node token issuance must be time-limited and logged; the cloud should audit all locally-issued tokens on sync
- Rate limit login attempts on both cloud and local node to prevent brute force
---
## 9. API Design
### Recommendation: **REST + OpenAPI 3.1** with generated TypeScript client
### Alternatives Considered
| Approach | Pros | Cons |
|----------|------|------|
| **REST + OpenAPI** | Universal, tooling-rich, generated clients, cacheable | Overfetching possible, multiple round trips |
| **GraphQL** | Flexible queries, single endpoint, good for complex UIs | Complexity overhead, caching harder, Rust support less mature |
| **tRPC** | Zero-config type safety | TypeScript-only; cannot be used with a Rust backend |
| **gRPC** | Efficient binary protocol, streaming | Needs proxy for browsers, overkill for this use case |
### Reasoning
**tRPC is ruled out** because it requires both client and server to be TypeScript. With a Rust backend, this is not viable.
**REST + OpenAPI** is the best approach because:
1. **Generated type safety**: Use `utoipa` to generate OpenAPI 3.1 specs from Rust types, then `openapi-typescript` to generate TypeScript types for the frontend. Changes to the Rust API automatically propagate to the frontend types.
2. **Cacheable**: REST's HTTP semantics enable CDN caching, ETag support, and conditional requests, important for player profiles and tournament structures that change infrequently.
3. **Universal clients**: The REST API will also be consumed by the Chromecast receiver app, the local node sync layer, and potentially third-party integrations. OpenAPI makes all of these easy.
4. **Tooling**: Swagger UI for exploration, `openapi-fetch` for the TypeScript client (type-safe fetch wrapper), Postman/Insomnia for testing.
### API Conventions
```
# Resource-based URLs
GET /api/v1/venues/{venue_id}/tournaments
POST /api/v1/venues/{venue_id}/tournaments
GET /api/v1/venues/{venue_id}/tournaments/{id}
PATCH /api/v1/venues/{venue_id}/tournaments/{id}
# Actions as sub-resources
POST /api/v1/venues/{venue_id}/tournaments/{id}/start
POST /api/v1/venues/{venue_id}/tournaments/{id}/pause
POST /api/v1/venues/{venue_id}/waitlists/{id}/join
POST /api/v1/venues/{venue_id}/waitlists/{id}/call/{player_id}
# Cross-venue player operations
GET /api/v1/players/me
GET /api/v1/players/{id}/memberships
POST /api/v1/players/{id}/credit-lines
# Real-time subscriptions
WS /api/v1/ws?venue={id}&subscribe=tournament.clock,waitlist.updates
```
### Type Generation Pipeline
```
Rust structs (serde + utoipa derive)
→ OpenAPI 3.1 JSON spec (generated at build time)
→ openapi-typescript (CI step)
→ TypeScript types + openapi-fetch client
→ SvelteKit frontend consumes typed API
```
### Gotchas
- Version the API from day one (`/api/v1/`); breaking changes go in `/api/v2/`
- Use cursor-based pagination for lists (not offset-based); it is more efficient and handles concurrent inserts
- Standardize error responses: `{ error: { code: string, message: string, details?: any } }`
- Consider a lightweight BFF (Backend-for-Frontend) pattern in SvelteKit's server routes for aggregating multiple API calls into one page load
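The cursor-based pagination gotcha can be sketched over an in-memory list. The cursor here is a plain `"created_at:id"` string for illustration; a real API would base64-encode (and validate or sign) it, and the query would be a `sqlx` `WHERE (created_at, id) > ($1, $2)` clause:

```rust
// Sketch of cursor-based pagination: the cursor encodes the last-seen
// sort key, so concurrent inserts can't shift page boundaries the way
// OFFSET does.
#[derive(Debug, Clone, PartialEq)]
struct Tournament { id: u64, created_at: u64 }

fn page_after(items: &[Tournament], cursor: Option<&str>, limit: usize)
    -> (Vec<Tournament>, Option<String>)
{
    // decode "created_at:id"; an invalid cursor starts from the top here
    let after = cursor.and_then(|c| {
        let (ts, id) = c.split_once(':')?;
        Some((ts.parse::<u64>().ok()?, id.parse::<u64>().ok()?))
    });
    // `items` is assumed pre-sorted by (created_at, id), as SQL would return
    let page: Vec<Tournament> = items.iter()
        .filter(|t| after.map_or(true, |(ts, id)| (t.created_at, t.id) > (ts, id)))
        .take(limit)
        .cloned()
        .collect();
    // only emit a next-cursor when the page was full
    let next = (page.len() == limit)
        .then(|| page.last().map(|t| format!("{}:{}", t.created_at, t.id)))
        .flatten();
    (page, next)
}

fn main() {
    let items: Vec<Tournament> =
        (1..=5).map(|i| Tournament { id: i, created_at: 100 + i }).collect();
    let (p1, cur) = page_after(&items, None, 2);
    assert_eq!(p1.iter().map(|t| t.id).collect::<Vec<_>>(), vec![1, 2]);
    let (p2, _) = page_after(&items, cur.as_deref(), 2);
    assert_eq!(p2.iter().map(|t| t.id).collect::<Vec<_>>(), vec![3, 4]);
}
```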
---
## 10. Local Node Architecture
### Recommendation: **Single Rust binary** running on RPi5 with embedded libSQL, NATS leaf node, and local HTTP/WS server
### What Runs on the RPi5
```
┌─────────────────────────────────────────────────────┐
│ PVM Local Node (single Rust binary, ~15-20 MB) │
│ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ HTTP/WS │ │ NATS Leaf │ │
│ │ Server │ │ Node │ │
│ │ (Axum) │ │ (embedded or │ │
│ │ │ │ sidecar) │ │
│ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │
│ ┌──────┴──────────────────┴───────┐ │
│ │ Application Core │ │
│ │ - Tournament engine │ │
│ │ - Clock manager │ │
│ │ - Waitlist manager │ │
│ │ - Seat assignment │ │
│ │ - Sync orchestrator │ │
│ └──────────────┬───────────────────┘ │
│ │ │
│ ┌──────────────┴───────────────────┐ │
│ │ libSQL (embedded) │ │
│ │ - Venue data subset │ │
│ │ - Offline mutation queue │ │
│ │ - Local auth cache │ │
│ └───────────────────────────────────┘ │
│ │
│ ┌───────────────────────────────────┐ │
│ │ moka in-memory cache │ │
│ │ - Hot tournament state │ │
│ │ - Active session tokens │ │
│ └───────────────────────────────────┘ │
└─────────────────────────────────────────────────────┘
```
### Offline Operations
When the cloud connection drops, the local node continues operating:
1. **Tournament operations**: Clock continues, blinds advance, players bust/rebuy; all state is local
2. **Waitlist management**: Players can join/leave waitlists; changes are queued for cloud sync
3. **Seat assignments**: Floor managers can move players between tables locally
4. **Player auth**: Cached credentials allow existing players to log in. New registrations queued.
5. **Financial operations**: Buy-ins and credit transactions logged locally with offline flag. Cloud reconciles on reconnect.
### Sync Protocol
```
On reconnect:
1. Local node sends its last-seen cloud sequence number
2. Cloud sends all events since that sequence (via NATS JetStream replay)
3. Local node sends its offline mutation queue (ordered by local timestamp)
4. Cloud processes mutations, detects conflicts, responds with resolution
5. Local node applies cloud resolutions, updates local state
6. Both sides confirm sync complete
```
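Step 2 of the protocol can be sketched as a replay from a sequence number, which is what a JetStream consumer starting at `last_seen + 1` provides. The event shape here is illustrative:

```rust
// Sketch of sequence-based replay: given the node's last-seen cloud
// sequence number, return every later event in order.
#[derive(Debug, Clone, PartialEq)]
struct Event { seq: u64, subject: String }

fn events_since(log: &[Event], last_seen: u64) -> Vec<Event> {
    log.iter().filter(|e| e.seq > last_seen).cloned().collect()
}

fn main() {
    let log = vec![
        Event { seq: 1, subject: "sync.node1.downstream".into() },
        Event { seq: 2, subject: "venue.123.waitlist.update".into() },
        Event { seq: 3, subject: "venue.123.tournament.456.clock".into() },
    ];
    let replay = events_since(&log, 1);
    assert_eq!(replay.len(), 2);
    assert_eq!(replay[0].seq, 2); // strictly after the last-seen sequence
}
```

Because the node persists its last-seen sequence in libSQL, a crash during replay is safe: replaying the same range again is idempotent as long as handlers deduplicate by sequence number.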
### Conflict Resolution Strategy
| Data Type | Strategy | Reasoning |
|-----------|----------|-----------|
| Tournament state | Cloud wins | Only one node runs a tournament at a time |
| Waitlist | Merge (union) | Both sides can add/remove; merge and re-order by timestamp |
| Player profiles | Cloud wins (LWW) | Cloud is the authority for universal accounts |
| Credit transactions | Append-only (event sourcing) | No conflicts; every transaction is immutable |
| Seat assignments | Local wins during offline | Floor manager's local decisions take precedence |
| Dealer schedules | Cloud wins | Schedules are set centrally |
### RPi5 System Setup
- **OS**: Raspberry Pi OS Lite (64-bit, Debian Bookworm-based), no desktop environment
- **Runtime**: Docker + Docker Compose. Two containers: `pvm-node` (Rust binary) + `pvm-nats-leaf` (NATS)
- **Storage**: 32 GB+ microSD or USB SSD (recommended for durability). libSQL database in a Docker volume.
- **Auto-start**: Docker Compose with `restart: always`. systemd service ensures Docker starts on boot.
- **Updates**: `docker compose pull && docker compose up -d`, automated via cron or webhook from cloud.
- **Watchdog**: Docker health checks + hardware watchdog timer to auto-reboot if containers fail.
- **Networking**: Ethernet preferred (reliable), WiFi as fallback. mDNS for local display device discovery. WireGuard tunnel to Hetzner cloud.
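The two-container runtime described above could look roughly like the following compose file. Image names, volume paths, and the health-check command are hypothetical placeholders, not the project's actual configuration:

```yaml
# Hypothetical docker-compose.yml for the RPi5 local node.
services:
  pvm-node:
    image: registry.example.com/pvm/node:latest   # placeholder image
    restart: always
    depends_on: [pvm-nats-leaf]
    volumes:
      - pvm-data:/var/lib/pvm          # libSQL database lives here
    healthcheck:
      test: ["CMD", "pvm-node", "healthcheck"]   # assumed subcommand
      interval: 30s
      retries: 3
  pvm-nats-leaf:
    image: nats:latest
    restart: always
    command: ["-c", "/etc/nats/leaf.conf"]  # leaf-node + JetStream config
    volumes:
      - nats-data:/data
volumes:
  pvm-data:
  nats-data:
```

With `restart: always` plus the systemd-managed Docker daemon, both containers come back automatically after power loss, and the health check feeds the watchdog behavior described above.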
### Gotchas
- RPi5 has 4 GB or 8 GB RAM; target the 8 GB model and budget ~200 MB for the PVM process + NATS
- SD card wear: use an external USB SSD for the libSQL database if heavy write operations are expected
- Time synchronization: use the `chrony` NTP client; accurate timestamps are critical for conflict resolution and tournament clocks
- Power loss: libSQL in WAL mode is crash-safe, but implement a clean shutdown handler (SIGTERM) that flushes state
- Security: the RPi5 is physically accessible in venues; encrypt the libSQL database at rest, disable SSH password auth, and use key-only access
---
## 11. Venue Display System
### Recommendation: **Generic web display app** + **Android display client** (no Google Cast SDK dependency)
### Architecture
```
┌──────────────────┐
│  Screen Manager  │ (part of the admin dashboard)
│ - Assign streams │ Venue staff assigns content to each display
│ - Per-TV config  │
└────────┬─────────┘
         │ WebSocket (display assignment)
         ▼
┌──────────────────┐    mDNS     ┌──────────────────┐
│  Local RPi5 Node │◄───auto─────┤  Display Devices │
│  serves display  │  discovery  │  (Android box /  │
│  web app + WS    ├────────────►│   smart TV /     │
│                  │             │   Chromecast)    │
└────────┬─────────┘             └────────┬─────────┘
         │                                │
   if offline:                     fallback: connect
   serves locally                  to cloud SaaS URL
         │                         directly
         ▼                                ▼
┌──────────────────┐             ┌──────────────────┐
│ Display renders  │             │ Display renders  │
│ from local node  │             │ from cloud       │
└──────────────────┘             └──────────────────┘
```
### Display Client (Android App)
A lightweight Android app (or a $40 4K Android box) that:
1. **Auto-starts on boot**: kiosk mode, no user interaction needed
2. **Discovers the local node via mDNS**: zero-config for venue staff; falls back to manual IP entry
3. **Registers with a unique device ID**: appears automatically in the Screen Manager dashboard
4. **Receives its display assignment via WebSocket**: the system tells it what to render
5. **Renders a full-screen web page**: the display content is a standard SvelteKit static page
6. **Falls back to the cloud SaaS** if the local RPi5 node is offline
7. **Remotely controllable**: venue staff can change the stream, restart the device, or push an announcement overlay from the Screen Manager
### Display Content (SvelteKit Static App)
The display views are a **separate SvelteKit static build** optimized for large screens:
- **Tournament clock**: Large timer, current level, blind structure, next break, average stack
- **Waiting list**: Player queue by game type, estimated wait times
- **Table status**: Open seats, game types, stakes per table
- **Seatings**: Tournament seat assignments after draws
- **Custom slideshow**: Announcements, promotions, venue info (managed by staff)
- **Rotation mode**: Cycle between multiple views on a configurable timer
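Rotation mode reduces to a pure mapping from display uptime to the currently active view. A minimal sketch, where the view names and dwell time are illustrative rather than PVM's actual config:

```rust
/// Rotation mode: given the configured view cycle and a per-view dwell
/// time, pick the view to show after `elapsed_secs` seconds of uptime.
fn active_view<'a>(views: &[&'a str], dwell_secs: u64, elapsed_secs: u64) -> Option<&'a str> {
    if views.is_empty() || dwell_secs == 0 {
        return None; // nothing configured: show a default/idle screen
    }
    // Integer division picks the slot; modulo wraps around the cycle.
    let idx = (elapsed_secs / dwell_secs) as usize % views.len();
    Some(views[idx])
}
```

Driving the rotation from elapsed time (rather than a mutable "current index") means every display computes the same view from the same clock, so screens in a group stay in sync for free.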
### Screen Manager
The **Screen Manager** (part of the admin dashboard) lets floor managers:
- See all connected display devices with status (online, offline, content)
- Assign content streams to each device (TV 1-5: tournament clock, TV 6: waitlist, etc.)
- Configure rotation/cycling between views per device
- Send one-time announcements to all screens or specific screens
- Adjust display themes (dark/light, font size, venue branding)
- Group screens (e.g. "Tournament Area", "Cash Room", "Lobby")
### Technical Details
- Display web app is served by the local node's HTTP server (Axum) for lowest latency
- WebSocket connection for live data updates (tournament clock ticks, waitlist changes)
- Each display device is identified by a stable device ID (generated on first boot, persisted)
- mDNS service type: `_pvm-display._tcp.local` for auto-discovery
- Display URLs: `http://{local-node-ip}/display/{device-id}` (local) or `https://app.pvmapp.com/display/{device-id}` (cloud fallback)
- Dark mode by default (poker venues are low-light environments)
- Large fonts, high contrast: designed for viewing from across the room
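The "stable device ID, generated on first boot and persisted" point can be done with the standard library alone. A sketch; the seed choice (hostname env var + boot timestamp hashed together) is an assumption for illustration, and a UUID crate would do equally well:

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs;
use std::hash::{Hash, Hasher};
use std::path::Path;
use std::time::{SystemTime, UNIX_EPOCH};

/// Load the device ID persisted on a previous boot, or generate and
/// persist a new one (first boot).
fn device_id(store: &Path) -> std::io::Result<String> {
    // Reuse the persisted ID if present and non-empty.
    if let Ok(existing) = fs::read_to_string(store) {
        let trimmed = existing.trim().to_string();
        if !trimmed.is_empty() {
            return Ok(trimmed);
        }
    }
    // First boot: derive a reasonably unique ID from the hostname (if
    // set) and the current time, then persist it.
    let mut hasher = DefaultHasher::new();
    std::env::var("HOSTNAME").unwrap_or_default().hash(&mut hasher);
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("clock before 1970")
        .as_nanos()
        .hash(&mut hasher);
    let id = format!("dev-{:016x}", hasher.finish());
    fs::write(store, &id)?;
    Ok(id)
}
```

Once persisted, the ID survives app restarts and nightly power cycles, so the Screen Manager always sees the same device.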
### Chromecast Compatibility
Chromecast is supported as a **display target** but not the primary architecture:
- Smart TVs with built-in Chromecast or attached Chromecast dongles can open the display URL
- No Google Cast SDK dependency; the display just opens a URL
- The Android display client app is the recommended approach for reliability and offline support
### Gotchas
- Android kiosk mode needs careful implementation: prevent users from exiting the app and handle OS updates gracefully
- mDNS can be unreliable on some enterprise/venue networks; always offer a manual IP fallback
- Display devices on venue WiFi may have intermittent connectivity; design for reconnection and state catch-up
- Keep the display app extremely lightweight; some $40 Android boxes have limited RAM
- Test on actual cheap Android hardware early; performance varies wildly
- Power cycling (venues close nightly) must be handled gracefully: auto-start, auto-reconnect, auto-resume
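The reconnection behavior in the gotchas above is usually capped exponential backoff: a nightly power cycle reconnects in seconds, while a dead network does not hammer the node. A minimal sketch (base and cap values are illustrative):

```rust
use std::time::Duration;

/// Delay before reconnection attempt `attempt` (0-based): doubles from
/// `base` up to `cap`. In production, add random jitter on top to avoid
/// thundering-herd reconnects when a whole venue comes back online.
fn backoff_delay(attempt: u32, base: Duration, cap: Duration) -> Duration {
    let exp = base.saturating_mul(2u32.saturating_pow(attempt));
    exp.min(cap)
}
```

The saturating arithmetic keeps very large attempt counts from overflowing; they simply pin at the cap.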
---
## 12. Mobile Strategy
### Recommendation: **PWA first** (SvelteKit), with **Capacitor** wrapper for app store presence when needed
### Alternatives Considered
| Approach | Pros | Cons |
|----------|------|------|
| **PWA (SvelteKit)** | One codebase, instant updates, no app store, works offline | Limited native API access, no push on iOS (improving), discoverability |
| **Capacitor (hybrid)** | PWA + native shell, access native APIs, app store distribution | Thin WebView wrapper, some performance overhead |
| **Tauri Mobile** | Rust backend, small size | Mobile support very early (alpha/beta), limited ecosystem |
| **React Native** | True native UI, large ecosystem | Separate codebase from web, React dependency, not Svelte |
| **Flutter** | Excellent cross-platform, single codebase | Dart language, separate from web entirely |
### Reasoning
PVM's mobile needs are primarily **consumption-oriented**: players check tournament schedules and waitlist positions, and receive notifications. This is a perfect fit for a PWA:
1. **PWA first**: The SvelteKit app with `vite-plugin-pwa` already provides offline caching, add-to-home-screen, and background sync. For most players, this is sufficient.
2. **Capacitor wrap when needed**: When iOS push notifications, Apple Pay, or app store presence becomes important, wrap the existing SvelteKit PWA in Capacitor. Capacitor runs the same web app in a native WebView and provides JavaScript bridges to native APIs.
3. **Tauri Mobile is not ready**: As of 2026, Tauri 2.0's mobile support exists but is still maturing. It would be a good fit architecturally (Rust backend + web frontend), but the plugin ecosystem and build tooling aren't as polished as Capacitor's. Revisit in 12-18 months.
### PWA Features for PVM
- **Service Worker**: Cache tournament schedules, player profile, venue info for offline access
- **Push Notifications**: Web Push API for tournament start reminders, waitlist calls (Android + iOS 16.4+)
- **Add to Home Screen**: App-like experience without app store
- **Background Sync**: Queue waitlist join/leave actions when offline, sync when back online
- **Share Target**: Accept shared tournament links
### Gotchas
- iOS PWA support is improving but still has limitations (no background fetch, limited push notification payload)
- Capacitor requires maintaining iOS/Android build pipelines; only add this when there's a clear need
- Test the PWA on actual mobile devices in venues; WiFi quality varies dramatically
- Deep linking: configure universal links / app links so shared tournament URLs open in the PWA/app
---
## 13. Deployment & Infrastructure
### Recommendation: **Self-hosted on Hetzner PVE** (LXC containers) + **Docker** + **Forgejo Actions** CI/CD
### Reasoning
The project already has a Hetzner Proxmox VE (PVE) server. Running PVM in LXC containers on the existing infrastructure keeps costs minimal and gives full control.
1. **LXC containers on PVE**: Lightweight, near-native performance, easy to snapshot and backup. Each service gets its own container or Docker runs inside an LXC.
2. **Docker Compose for services**: All cloud services defined in a single `docker-compose.yml`. Simple to start, stop, and update.
3. **No vendor lock-in**: Everything runs on standard Linux + Docker. Can migrate to any cloud or other bare metal trivially.
4. **WireGuard for RPi5 connectivity**: RPi5 local nodes connect to the Hetzner server via WireGuard tunnel for secure NATS leaf node communication.
5. **Forgejo Actions**: CI/CD runs on the same Forgejo instance hosting the code.
### Infrastructure Layout
```
Hetzner PVE Server
├── LXC: pvm-cloud
│ ├── Docker: pvm-api (Axum)
│ ├── Docker: pvm-ws-gateway (Axum WebSocket)
│ ├── Docker: pvm-worker (background jobs: sync, notifications)
│ ├── Docker: pvm-nats (NATS cluster)
│ ├── Docker: pvm-db (PostgreSQL 16)
│ └── Docker: pvm-cache (DragonflyDB)
├── LXC: pvm-staging (mirrors production for testing)
└── WireGuard endpoint for RPi5 nodes
Venue (RPi5 — Docker on Raspberry Pi OS)
├── Docker: pvm-node (Rust binary — API proxy + sync engine)
├── Docker: pvm-nats-leaf (NATS leaf node)
└── connects to Hetzner via WireGuard/TLS
```
### RPi5 Local Node (Docker-based)
The local node runs **Docker on stock Raspberry Pi OS (64-bit)**:
- **Provisioning**: One-liner curl script installs Docker and pulls the PVM stack (`docker compose pull && docker compose up -d`)
- **Updates**: Pull new images and restart (`docker compose pull && docker compose up -d`). Automated via a cron job or self-update webhook.
- **Rollback**: Previous images remain on disk. Roll back with `docker compose up -d --force-recreate` using pinned image tags.
- **Services**: `pvm-node` (Rust binary) + `pvm-nats-leaf` (NATS leaf node). Two containers, minimal footprint.
- **Storage**: libSQL database stored in a Docker volume on the SD card (or USB SSD for heavy-write venues).
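The two-container stack above can be sketched as a Compose file. Image names, the registry, the tags, and the `healthcheck` subcommand are placeholders; pin explicit tags so rollback is just a tag change:

```yaml
# docker-compose.yml on the RPi5 (illustrative names and tags)
services:
  pvm-node:
    image: registry.example.com/pvm/pvm-node:1.4.2   # pinned, never :latest
    restart: always
    depends_on: [pvm-nats-leaf]
    volumes:
      - pvm-data:/var/lib/pvm            # libSQL database lives here
    healthcheck:
      test: ["CMD", "/pvm-node", "healthcheck"]      # assumed subcommand
      interval: 30s
      retries: 3
  pvm-nats-leaf:
    image: nats:2.10-alpine
    restart: always
    command: ["-c", "/etc/nats/nats.conf"]
    volumes:
      - ./leafnode.conf:/etc/nats/nats.conf:ro
volumes:
  pvm-data:
```

`restart: always` plus the health check covers the nightly-power-cycle and watchdog requirements from the local node section.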
### CI/CD Pipeline (Forgejo Actions)
```text
# Triggered on push to main
1. Lint (clippy, biome)
2. Test (cargo nextest, vitest, playwright)
3. Build (multi-stage Docker for cloud + cross-compile ARM64 for RPi5)
4. Push images to container registry
5. Deploy staging (docker compose pull on staging LXC)
6. E2E tests against staging
7. Deploy production (manual approval, docker compose on production LXC)
8. Publish RPi5 images (ARM64 Docker images to registry)
```
### Gotchas
- Use multi-stage Docker builds for Rust: builder stage with `rust:bookworm`, runtime stage with `debian:bookworm-slim` or `distroless`
- PostgreSQL backups: automate `pg_dump` to a separate backup location (another Hetzner storage box or off-site)
- Set up blue-green deployments via Docker Compose profiles for zero-downtime upgrades
- Monitor Hetzner server resources; if PVM outgrows a single server, split services across multiple LXCs or servers
- WireGuard keys for RPi5 nodes: automate key generation and registration during provisioning
- The RPi5 Docker update mechanism needs a health check; if new images fail, auto-roll back to the previous tag
---
## 14. Monitoring & Observability
### Recommendation: **OpenTelemetry** (traces + metrics + logs) exported to **self-hosted Grafana + Loki + Tempo + Prometheus** (on Hetzner PVE)
### Alternatives Considered
| Stack | Pros | Cons |
|-------|------|------|
| **OpenTelemetry + Grafana** | Vendor-neutral, excellent Rust support, unified pipeline | Some setup required |
| **Datadog** | All-in-one, excellent UX | Expensive at scale, vendor lock-in |
| **New Relic** | Good APM | Cost, Rust support less first-class |
| **Sentry** | Excellent error tracking | Limited metrics/traces, complementary rather than primary |
### Rust Instrumentation Stack
```toml
# Key crates
tracing = "0.1" # Structured logging/tracing facade
tracing-subscriber = "0.3" # Log formatting, filtering
tracing-opentelemetry = "0.28" # Bridge tracing → OpenTelemetry
opentelemetry = "0.28" # OTel SDK
opentelemetry-otlp = "0.28" # OTLP exporter
opentelemetry-semantic-conventions # Standard attribute names
```
### What to Monitor
**Application Metrics:**
- Request rate, latency (p50/p95/p99), error rate per endpoint
- WebSocket connection count per venue
- NATS message throughput and consumer lag
- Tournament clock drift (local node vs cloud time)
- Sync latency (time from local mutation to cloud persistence)
- Cache hit/miss ratios (DragonflyDB)
**Business Metrics:**
- Active tournaments per venue
- Players on waiting lists
- Concurrent connected users
- Tournament registrations per hour
- Offline duration per local node
**Infrastructure Metrics:**
- CPU, memory, disk per service
- RPi5 node health: temperature, memory usage, SD card wear level
- NATS cluster health
- Postgres connection pool utilization
### Local Node Observability
The RPi5 node should:
- Buffer OpenTelemetry spans/metrics locally when offline
- Flush to cloud collector on reconnect
- Expose a local `/health` endpoint for venue staff to check node status
- Log to both stdout (for `journalctl`) and a rotating file
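The offline buffering point above is best served by a bounded drop-oldest queue, so even a multi-day outage cannot exhaust the RPi5's memory. A stdlib sketch; in practice `T` would be an encoded OTLP batch:

```rust
use std::collections::VecDeque;

/// Bounded buffer for telemetry while the uplink is down: on overflow,
/// the oldest entries are dropped first (recent data is more valuable).
struct OfflineBuffer<T> {
    items: VecDeque<T>,
    capacity: usize,
}

impl<T> OfflineBuffer<T> {
    fn new(capacity: usize) -> Self {
        Self { items: VecDeque::with_capacity(capacity), capacity }
    }

    /// Returns true if an old entry was evicted to make room.
    fn push(&mut self, item: T) -> bool {
        let evicted = self.items.len() == self.capacity;
        if evicted {
            self.items.pop_front();
        }
        self.items.push_back(item);
        evicted
    }

    /// On reconnect, drain everything (oldest first) to the collector.
    fn drain(&mut self) -> Vec<T> {
        self.items.drain(..).collect()
    }
}
```

Counting evictions is worth exporting as a metric itself: it tells you how much telemetry a long outage cost.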
### Alerting
- Use Grafana Alerting for cloud services
- Critical alerts: API error rate > 5%, NATS cluster partition, Postgres replication lag > 30s
- Warning alerts: RPi5 node offline > 5 min, sync backlog > 1000 events, high memory usage
- Notification channels: Slack/Discord for ops team, push notification for venue managers on critical local node issues
### Gotchas
- OpenTelemetry's Rust SDK is stable but evolving — pin versions carefully
- The `tracing` crate is the Rust ecosystem standard — everything (Axum, sqlx, async-nats) already emits tracing spans, so you get deep instrumentation for free
- Sampling is important at scale — don't trace every tournament clock tick in production
- The self-hosted Grafana stack (Grafana + Loki + Tempo + Prometheus) fits comfortably on the existing PVE server at this scale; Grafana Cloud's free tier (10k metrics, 50 GB logs, 50 GB traces) remains a fallback if self-hosting observability becomes a burden
---
## 15. Testing Strategy
### Recommendation: Multi-layer testing with **cargo test** (unit/integration), **Playwright** (E2E), and **Vitest** (frontend unit)
### Test Pyramid
```
        /\          E2E Tests (Playwright)
       /  \         - Full user flows
      /    \        - Display app rendering
     /──────\
    /        \      Integration Tests (cargo test + testcontainers)
   /          \     - API endpoint tests with real DB
  /            \    - NATS pub/sub flows
 /              \   - Sync protocol tests
/────────────────\
Unit Tests (cargo test + vitest)
- Domain logic (tournament engine, clock, waitlist)
- Svelte component tests
- Conflict resolution logic
```
### Backend Testing (Rust)
- **Unit tests**: Inline `#[cfg(test)]` modules for domain logic. The tournament engine, clock manager, waitlist priority algorithm, and conflict resolution are all pure functions that are easy to unit test.
- **Integration tests**: Use `testcontainers` crate to spin up ephemeral Postgres + NATS + DragonflyDB instances. Test full API flows including auth, multi-tenancy, and real-time events.
- **sqlx compile-time checks**: SQL queries are validated against the database schema at compile time — this catches a huge class of bugs before runtime.
- **Property-based testing**: Use `proptest` for testing conflict resolution and sync protocol with random inputs.
- **Test runner**: `cargo-nextest` for parallel test execution (significantly faster than default `cargo test`).
### Frontend Testing (TypeScript/Svelte)
- **Component tests**: Vitest + `@testing-library/svelte` for testing Svelte components in isolation.
- **Store/state tests**: Vitest for testing reactive state logic (tournament clock state, waitlist updates).
- **API mocking**: `msw` (Mock Service Worker) for intercepting API calls in tests.
### End-to-End Testing
- **Playwright**: Test critical user flows in real browsers:
- Tournament creation and management flow
- Player registration and waitlist join
- Real-time updates (verify clock ticks appear in browser)
- Multi-venue admin dashboard
  - Venue display app rendering (headless Chromium)
- **Local node E2E**: Test offline scenarios — start local node, disconnect from cloud, perform operations, reconnect, verify sync.
### Specialized Tests
- **Sync protocol tests**: Simulate network partitions, conflicting writes, replay scenarios
- **Load testing**: `k6` or `drill` (Rust) for WebSocket connection saturation, API throughput
- **Display rendering tests**: Visual regression testing with Playwright screenshots of venue display layouts
- **Cross-browser**: Playwright covers Chromium, Firefox, WebKit — ensure PWA works on all
### Gotchas
- Rust integration tests with testcontainers need Docker available in CI; give the Forgejo Actions runner Docker access (socket mount or Docker-in-Docker)
- Playwright tests are slow — run in parallel, and only test critical paths in CI (full suite nightly)
- The local node's offline/reconnect behavior is the hardest thing to test — invest heavily in deterministic sync protocol tests
- Mock the NATS connection in unit tests using a channel-based mock, not an actual NATS server
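The channel-based mock suggested above can look like this; the `EventBus` trait shape is illustrative rather than PVM's actual abstraction over `async_nats`:

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Minimal publish abstraction so domain code doesn't depend on a live
/// NATS server in unit tests.
trait EventBus {
    fn publish(&self, subject: &str, payload: &[u8]);
}

/// Test double: captures published messages on an in-process channel.
struct ChannelBus {
    tx: Sender<(String, Vec<u8>)>,
}

impl EventBus for ChannelBus {
    fn publish(&self, subject: &str, payload: &[u8]) {
        // Ignore send errors: in tests the receiver outlives the bus.
        let _ = self.tx.send((subject.to_string(), payload.to_vec()));
    }
}

fn channel_bus() -> (ChannelBus, Receiver<(String, Vec<u8>)>) {
    let (tx, rx) = channel();
    (ChannelBus { tx }, rx)
}

/// Example unit under test: announces a seat assignment as an event.
/// Subject and payload formats here are hypothetical.
fn announce_seating(bus: &dyn EventBus, venue: &str, table: u32, seat: u32) {
    let subject = format!("venue.{venue}.seating");
    bus.publish(&subject, format!("table={table};seat={seat}").as_bytes());
}
```

Tests then assert on what was received from the channel, with no NATS process involved.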
---
## 16. Security
### Recommendation: Defense in depth across all layers
### Data Security
| Layer | Measure |
|-------|---------|
| **Transport** | TLS 1.3 everywhere — API, WebSocket, NATS, Postgres connections |
| **Data at rest** | Postgres: encrypted volumes (cloud provider). libSQL on RPi5: SQLCipher-compatible encryption via `libsql` |
| **Secrets** | Environment variables injected via Docker Compose / LXC config on the Hetzner host (cloud), encrypted config file on RPi5 (sealed at provisioning) |
| **Passwords** | Argon2id hashing, tuned per environment (higher params on cloud, lower on RPi5) |
| **JWTs** | Ed25519 signing, short expiry (15 min), refresh token rotation |
| **API keys** | SHA-256 hashed in DB, displayed once at creation, prefix-based identification (`pvm_live_`, `pvm_test_`) |
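The prefix-based identification from the API-key row can be sketched as a pure function. The SHA-256 hashing of the secret part would use an external crate (e.g. `sha2`) and is not shown:

```rust
/// Which environment an API key belongs to, read off its documented
/// prefix. Only the prefix is human-readable; the secret part is
/// stored server-side as a hash.
#[derive(Debug, PartialEq)]
enum KeyEnv {
    Live,
    Test,
}

fn key_env(api_key: &str) -> Option<KeyEnv> {
    if api_key.starts_with("pvm_live_") {
        Some(KeyEnv::Live)
    } else if api_key.starts_with("pvm_test_") {
        Some(KeyEnv::Test)
    } else {
        None // unknown format: reject before any DB lookup
    }
}
```

Rejecting unknown prefixes up front keeps malformed keys out of logs and away from the hash-lookup path.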
### Network Security
- **API**: Rate limiting (Tower middleware), CORS restricted to known origins, request size limits
- **WebSocket**: Authenticated connection upgrade (JWT in first message or query param), per-connection rate limiting
- **NATS**: TLS + token auth between cloud and leaf nodes. Leaf nodes have scoped permissions (can only access their venue's subjects)
- **RPi5**: Firewall (nftables/ufw) — only allow outbound to cloud NATS + HTTPS, inbound on local network only for venue devices
- **DDoS**: Hetzner provides basic network-level DDoS protection. Add Cloudflare in front of the API if needed.
### Financial Data Security
Per the finalized decisions, PVM does not handle money; venues process payments through their own POS systems. PVM still *records* credit lines and buy-in events as game data, and those records warrant extra care:
- All financial mutations are **event-sourced** with immutable audit trail
- Credit line changes require **admin approval** with logged reason
- Buy-in/cashout records include **idempotency keys** to prevent duplicate entries
- Financial reports are only accessible to operator admins, with access logged
- If payment card data is ever handled directly, PCI DSS applies; prefer delegating to a payment processor (e.g. Stripe)
### Local Node Security
The RPi5 is physically in a venue — assume it can be stolen or tampered with:
- **Disk encryption**: Full disk encryption (LUKS) or at minimum encrypted database
- **Secure boot**: Signed binaries, verified at startup
- **Remote wipe**: Cloud can send a command to reset the node to factory state
- **Tamper detection**: Log unexpected restarts, hardware changes
- **Credential scope**: Local node only has access to its venue's data — compromising one node doesn't expose other venues
### Gotchas
- DO NOT store payment card numbers — use a payment processor's tokenization
- GDPR/privacy: Player data across venues requires careful consent management. Players must be able to request data deletion.
- The local node's offline auth cache is a security risk — limit cached credentials, expire after configurable period
- Regularly rotate NATS credentials and JWT signing keys — automate this
---
## 17. Developer Experience
### Recommendation: **Cargo workspace** (Rust monorepo) + **pnpm workspace** (TypeScript) managed by **Turborepo**
### Monorepo Structure
```
pvm/
├── Cargo.toml # Rust workspace root
├── turbo.json # Turborepo config
├── package.json # pnpm workspace root
├── pnpm-workspace.yaml
├── crates/ # Rust crates
│ ├── pvm-api/ # Cloud API server (Axum)
│ ├── pvm-node/ # Local node binary
│ ├── pvm-ws-gateway/ # WebSocket gateway
│ ├── pvm-worker/ # Background job processor
│ ├── pvm-core/ # Shared domain logic
│ │ ├── tournament/ # Tournament engine
│ │ ├── waitlist/ # Waitlist management
│ │ ├── clock/ # Tournament clock
│ │ └── sync/ # Sync protocol
│ ├── pvm-db/ # Database layer (sqlx queries, migrations)
│ ├── pvm-auth/ # Auth logic (JWT, RBAC)
│ ├── pvm-nats/ # NATS client wrappers
│ └── pvm-types/ # Shared types (serde, utoipa derives)
├── apps/ # TypeScript apps
│ ├── dashboard/ # SvelteKit admin dashboard
│ ├── player/ # SvelteKit player-facing app
│   ├── display/         # SvelteKit venue display app (static)
│ └── docs/ # Documentation site (optional)
├── packages/ # Shared TypeScript packages
│ ├── ui/ # shadcn-svelte components
│ ├── api-client/ # Generated OpenAPI client
│ └── shared/ # Shared types, utilities
├── docker/ # Dockerfiles
├── .forgejo/                 # Forgejo Actions workflows
└── docs/ # Project documentation
```
### Key Tools
| Tool | Purpose |
|------|---------|
| **Cargo** | Rust build system, workspace management |
| **pnpm** | Fast, disk-efficient Node.js package manager |
| **Turborepo** | Orchestrates build/test/lint across both Rust and TS workspaces. Caches build outputs. `--affected` flag for CI optimization. |
| **cargo-watch** | Auto-rebuild on Rust file changes during development |
| **cargo-nextest** | Faster test runner with parallel execution |
| **sccache** | Shared compilation cache (speeds up CI and local builds) |
| **cross** / **cargo-zigbuild** | Cross-compile Rust for RPi5 ARM64 |
| **Biome** | Fast linter + formatter for TypeScript (replaces ESLint + Prettier) |
| **clippy** | Rust linter (run with `--deny warnings` in CI) |
| **rustfmt** | Rust formatter (enforced in CI) |
| **lefthook** | Git hooks manager (format + lint on pre-commit) |
### Development Workflow
```bash
# Start everything for local development
turbo dev # Starts SvelteKit dev servers
cargo watch -x run -p pvm-api # Auto-restart API on changes
# Run all tests
turbo test # TypeScript tests
cargo nextest run # Rust tests
# Generate API client after backend changes
cargo run -p pvm-api -- --openapi > apps/dashboard/src/lib/api/schema.json
turbo generate:api-client
# Build for production
turbo build # TypeScript apps
cargo build --release -p pvm-api
cross build --release --target aarch64-unknown-linux-gnu -p pvm-node
```
### Gotchas
- Turborepo's Rust support is task-level (it runs `cargo` as a shell command) — it doesn't understand Cargo's internal dependency graph. Use Cargo workspace for Rust-internal dependencies.
- Keep `pvm-core` as a pure library crate with no async runtime dependency — this lets it be used in both the cloud API and the local node without conflicts.
- Rust compile times are the bottleneck — invest in `sccache` and incremental compilation from day one
- Use `.cargo/config.toml` for cross-compilation targets and linker settings
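A minimal `.cargo/config.toml` sketch for the RPi5 target; the linker name assumes the Debian `gcc-aarch64-linux-gnu` package (with `cross` or `cargo-zigbuild`, this section is handled for you):

```toml
# .cargo/config.toml — cross-compilation for the RPi5 (ARM64)
[target.aarch64-unknown-linux-gnu]
linker = "aarch64-linux-gnu-gcc"

[env]
# Point sqlx at a dev database for compile-time query checks, e.g.:
# DATABASE_URL = "postgres://localhost/pvm_dev"
```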
---
## 18. CSS / Styling
### Recommendation: **Tailwind CSS v4** + **shadcn-svelte** component system
### Alternatives Considered
| Option | Pros | Cons |
|--------|------|------|
| **Tailwind CSS v4** | Utility-first, fast, excellent Svelte integration, v4 is faster with Rust-based engine | Learning curve for utility classes |
| **Vanilla CSS** | No dependencies, full control | Slow development, inconsistent patterns |
| **UnoCSS** | Atomic CSS, fast, flexible presets | Smaller ecosystem than Tailwind |
| **Open Props** | Design tokens as CSS custom properties | Not utility-first, less adoption |
| **Panda CSS** | Type-safe styles, zero runtime | Newer, smaller ecosystem |
### Reasoning
**Tailwind CSS v4** is the clear choice:
1. **Svelte integration**: Tailwind works seamlessly with SvelteKit via the Vite plugin. Svelte's template syntax + Tailwind utilities produce compact, readable component markup.
2. **Tailwind v4 improvements**: The v4 release includes a Rust-based engine (Oxide) that is significantly faster, CSS-first configuration (no more `tailwind.config.js`), automatic content detection, and native CSS cascade layers.
3. **shadcn-svelte**: The component library is built on Tailwind, providing a consistent design system with accessible, customizable components. Components are generated into your codebase — full ownership, no black box.
4. **Venue displays**: Tailwind's utility classes produce small CSS bundles (only used classes are included), which matters for resource-constrained display devices (cheap Android boxes, older smart TVs).
5. **Design tokens**: Use CSS custom properties (via Tailwind's theme) for venue-specific branding (colors, logos) that can be swapped at runtime.
### Design System Structure
```
packages/ui/
├── components/ # shadcn-svelte generated components
│ ├── button/
│ ├── card/
│ ├── data-table/
│ ├── dialog/
│ ├── form/
│ └── ...
├── styles/
│ ├── app.css # Global styles, Tailwind imports
│ ├── themes/
│ │ ├── default.css # Default PVM theme
│ │ ├── dark.css # Dark mode overrides
│ │ └── cast.css # Optimized for large screens
│ └── tokens.css # Design tokens (colors, spacing, typography)
└── utils.ts # cn() helper, variant utilities
```
### Venue Branding
Venues should be able to customize their displays:
```css
/* Runtime theme switching via CSS custom properties */
:root {
  --venue-primary: var(--color-blue-600);   /* Tailwind v4 exposes theme values as --color-* variables */
  --venue-secondary: var(--color-gray-800);
--venue-logo-url: url('/default-logo.svg');
}
/* Applied per-venue at runtime */
[data-venue-theme="vegas-poker"] {
--venue-primary: #c41e3a;
--venue-secondary: #1a1a2e;
--venue-logo-url: url('/venues/vegas-poker/logo.svg');
}
```
### Gotchas
- Tailwind v4's CSS-first config is a paradigm shift from v3 — ensure all team documentation targets v4 syntax
- shadcn-svelte components use Tailwind v4 as of recent updates — verify compatibility
- Large data tables (tournament player lists, waitlists) need careful styling — consider virtualized rendering for 100+ row tables
- Venue display screens need large fonts and high contrast; create a dedicated large-screen theme (`cast.css`)
- Dark mode is essential for poker venues (low-light environments) — design dark-first
---
## Recommended Stack Summary
| Area | Recommendation | Key Reasoning |
|------|---------------|---------------|
| **Backend Language** | Rust | Memory efficiency on RPi5, performance, type safety |
| **Frontend Language** | TypeScript | Browser ecosystem standard, type safety |
| **Backend Framework** | Axum (v0.8+) | Tokio-native, Tower middleware, WebSocket support |
| **Frontend Framework** | SvelteKit (Svelte 5) | Smallest bundles, fine-grained reactivity, PWA support |
| **UI Components** | shadcn-svelte | Accessible, Tailwind-based, full ownership |
| **Cloud Database** | PostgreSQL 16+ | Multi-tenant gold standard, RLS, JSONB |
| **Local Database** | libSQL (embedded) | SQLite-compatible, tiny footprint, Rust-native |
| **ORM / Queries** | sqlx | Compile-time checked SQL, Postgres + SQLite support |
| **Caching** | DragonflyDB | Redis-compatible, multi-threaded, memory efficient |
| **Messaging** | NATS + JetStream | Edge-native leaf nodes, sub-ms latency, lightweight |
| **Real-Time** | WebSockets (Axum) + SSE fallback | Full duplex, NATS-backed fan-out |
| **Auth** | Custom JWT + RBAC | Offline-capable, cross-venue, full control |
| **API Design** | REST + OpenAPI 3.1 | Generated TypeScript client, universal compatibility |
| **Mobile** | PWA first, Capacitor later | One codebase, offline support, app store when needed |
| **Displays** | Generic web app + Android display client | No Cast SDK dependency, works offline, mDNS auto-discovery |
| **Deployment** | Hetzner PVE + Docker (LXC containers) | Self-hosted, full control, existing infrastructure |
| **CI/CD** | Forgejo Actions + Turborepo | Cross-language build orchestration, caching |
| **Monitoring** | OpenTelemetry + Grafana | Vendor-neutral, excellent Rust support |
| **Testing** | cargo-nextest + Vitest + Playwright | Full pyramid: unit, integration, E2E |
| **Styling** | Tailwind CSS v4 | Fast, small bundles, Svelte-native |
| **Monorepo** | Cargo workspace + pnpm + Turborepo | Unified builds, shared types |
| **Linting** | clippy + Biome | Rust + TypeScript coverage |
---
## Decisions Made
> Resolved during tech stack review session, 2026-02-08.
| # | Question | Decision |
|---|----------|----------|
| 1 | **Hosting** | **Self-hosted on Hetzner PVE** — LXC containers. Already have infrastructure. No Fly.io dependency. |
| 2 | **Sync strategy** | **Event-based sync via NATS JetStream** — all mutations are events, local node replays events to build state. Perfect audit trail. No table-vs-row debate. |
| 3 | **NATS on RPi5** | **Sidecar** — separate process managed by systemd/Docker. Independently upgradeable and monitorable. |
| 4 | **Financial data** | **No money handling at all.** Venues handle payments via their own POS systems (most are cash-based). PVM only tracks game data. |
| 5 | **Multi-region** | **Single region initially.** Design DB schema and NATS subjects for eventual multi-region without rewrite. |
| 6 | **Player accounts** | **PVM signup first.** Players always create a PVM account before joining venues. No deduplication problem. |
| 7 | **Display strategy** | **Generic web app + Android display client.** TVs run a simple Android app (or $40 Android box) that connects to the local node via mDNS auto-discovery, receives its display assignment via WebSocket, and renders a web page. Falls back to cloud SaaS if local node is offline. Chromecast is supported but not the primary path. No Google Cast SDK dependency. |
| 8 | **RPi5 provisioning** | **Docker on stock Raspberry Pi OS.** All PVM services (node, NATS) run as containers. Updates via image pulls. Provisioning is a one-liner curl script. |
| 9 | **Offline duration** | **72 hours.** Covers a full weekend tournament series. After 72h offline, warn staff but keep operating. Sync everything on reconnect. |
| 10 | **API style** | **REST + OpenAPI 3.1.** Auto-generated TypeScript client. Universal, debuggable, works with everything. |
## Deferred Questions
These remain open for future consideration:
1. **API versioning strategy**: Maintain backward compatibility as long as possible. Only version on breaking changes. Revisit when approaching first external API consumers.
2. **GraphQL for player-facing app**: REST is sufficient for v1. The player app might benefit from GraphQL's flexible querying later (e.g., "show me my upcoming tournaments across all venues with waitlist status"). **Revisit after v1 launch.**
3. **WebTransport**: When browser support matures, could replace WebSockets for lower-latency real-time streams. **Monitor but do not adopt yet.**
4. **WASM on local node**: Could parts of the frontend run on the local node via WASM for ultra-fast local rendering. **Defer.**
5. **AI features**: Player behavior analytics, optimal table assignments, tournament structure recommendations. The data model should be designed to support future ML pipelines. **Design for it, build later.**