# PVM (Poker Venue Manager) — Tech Stack Research

> Generated: 2026-02-08
> Status: DRAFT — for discussion and refinement

---

## Table of Contents

1. [Programming Language](#1-programming-language)
2. [Backend Framework](#2-backend-framework)
3. [Frontend Framework](#3-frontend-framework)
4. [Database Strategy](#4-database-strategy)
5. [Caching Layer](#5-caching-layer)
6. [Message Queue / Event Streaming](#6-message-queue--event-streaming)
7. [Real-Time Communication](#7-real-time-communication)
8. [Auth & Authorization](#8-auth--authorization)
9. [API Design](#9-api-design)
10. [Local Node Architecture](#10-local-node-architecture)
11. [Chromecast / Display Streaming](#11-chromecast--display-streaming)
12. [Mobile Strategy](#12-mobile-strategy)
13. [Deployment & Infrastructure](#13-deployment--infrastructure)
14. [Monitoring & Observability](#14-monitoring--observability)
15. [Testing Strategy](#15-testing-strategy)
16. [Security](#16-security)
17. [Developer Experience](#17-developer-experience)
18. [CSS / Styling](#18-css--styling)
19. [Recommended Stack Summary](#recommended-stack-summary)
20. [Open Questions / Decisions Needed](#open-questions--decisions-needed)

---

## 1. Programming Language

### Recommendation: **Rust** (backend + local node) + **TypeScript** (frontend + shared types)

### Alternatives Considered

| Language | Pros | Cons |
|----------|------|------|
| **Rust** | Memory safety, fearless concurrency, tiny binaries for RPi5, no GC pauses, excellent WebSocket perf | Steeper learning curve, slower compile times |
| **Go** | Simple, fast compilation, good concurrency | Less expressive type system, GC pauses (minor), larger binaries than Rust |
| **TypeScript (full-stack)** | One language everywhere, huge ecosystem, fast dev velocity | Node.js memory overhead on RPi5, GC pauses in real-time scenarios, weaker concurrency model |
| **Elixir** | Built for real-time (Phoenix), fault-tolerant OTP | Small ecosystem, harder to find libs, BEAM VM overhead on RPi5 |

### Reasoning

Rust is the strongest choice for PVM because of the **RPi5 local node constraint**. The local node must run reliably on constrained hardware with limited memory, handle real-time tournament clocks, manage offline operations, and sync data. Rust's zero-cost abstractions, lack of garbage collector, and small binary sizes (typically 5-15 MB static binaries) make it ideal for this.

For the **cloud backend**, Rust's performance means fewer servers and lower hosting costs. A single Rust service can handle thousands of concurrent WebSocket connections with minimal memory overhead — critical for real-time tournament updates across many venues.

The **"all code written by Claude Code"** constraint actually favors Rust: Claude has excellent Rust fluency, and the compiler's strict type system catches bugs that would otherwise require extensive testing in dynamic languages.

**TypeScript** remains the right choice for the frontend — the browser ecosystem is TypeScript-native, and sharing type definitions between Rust (via generated OpenAPI types) and TypeScript gives end-to-end type safety.

### Gotchas

- Rust compile times can be mitigated with `cargo-watch`, incremental compilation, and `sccache`
- Cross-compilation for RPi5 (ARM64) is well-supported via `cross` or `cargo-zigbuild`
- Shared domain types can be generated from Rust structs to TypeScript via `ts-rs` or OpenAPI codegen

---

## 2. Backend Framework

### Recommendation: **Axum** (v0.8+)

### Alternatives Considered

| Framework | Pros | Cons |
|-----------|------|------|
| **Axum** | Tokio-native, excellent middleware (Tower), lowest memory footprint, growing ecosystem, WebSocket built-in | Younger than Actix |
| **Actix Web** | Highest raw throughput, most mature | Actor model adds complexity, not Tokio-native (uses own runtime fork) |
| **Rocket** | Most ergonomic, Rails-like DX | Slower performance, less flexible middleware |
| **Loco** | Rails-like conventions, batteries-included | Very new (2024), smaller community, opinionated |

### Reasoning

**Axum** is the clear winner for PVM:

1. **Tokio-native**: Axum is built directly on Tokio + Hyper + Tower. Since NATS, database drivers, and WebSocket handling all use Tokio, everything shares one async runtime — no impedance mismatch.
2. **Tower middleware**: The Tower service/layer pattern gives composable middleware for auth, rate limiting, tracing, compression, CORS, etc. Middleware can be shared between HTTP and WebSocket handlers.
3. **WebSocket support**: First-class WebSocket extraction with `axum::extract::ws`, typed WebSocket messages via `axum-typed-websockets`.
4. **Memory efficiency**: Benchmarks show Axum achieves the lowest memory footprint per connection — critical when serving thousands of concurrent venue connections.
5. **OpenAPI integration**: `utoipa` crate provides derive macros for generating OpenAPI 3.1 specs directly from Axum handler types.
6. **Extractor pattern**: Axum's extractor-based request handling maps cleanly to domain operations (extract tenant, extract auth, extract venue context).

### Key Libraries

- `axum` — HTTP framework
- `axum-extra` — typed headers, cookie jar, multipart
- `tower` + `tower-http` — middleware stack (CORS, compression, tracing, rate limiting)
- `utoipa` + `utoipa-axum` — OpenAPI spec generation
- `utoipa-swagger-ui` — embedded Swagger UI
- `axum-typed-websockets` — strongly typed WS messages

### Gotchas

- Axum's error handling requires careful design — use `thiserror` + a custom error type that implements `IntoResponse`
- Route organization: use `axum::Router::nest()` for modular route trees per domain (tournaments, venues, players)
- State management: use `axum::extract::State` with `Arc<AppState>` — avoid the temptation to put everything in one giant state struct
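
To make the error-handling gotcha concrete, here is a minimal sketch using only the standard library. The `AppError` variants and messages are hypothetical — in the actual service the enum would derive `thiserror::Error` and implement Axum's `IntoResponse` — but the status-code mapping is the part worth pinning down early:

```rust
use std::fmt;

// Hypothetical application error type; in the Axum service this would
// derive thiserror::Error and implement IntoResponse.
#[derive(Debug)]
enum AppError {
    NotFound(String),
    Unauthorized,
    Validation(String),
    Internal(String),
}

impl AppError {
    // Map each domain error onto an HTTP status code — the piece an
    // IntoResponse impl would use to build the response.
    fn status_code(&self) -> u16 {
        match self {
            AppError::NotFound(_) => 404,
            AppError::Unauthorized => 401,
            AppError::Validation(_) => 422,
            AppError::Internal(_) => 500,
        }
    }
}

impl fmt::Display for AppError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AppError::NotFound(what) => write!(f, "{what} not found"),
            AppError::Unauthorized => write!(f, "unauthorized"),
            AppError::Validation(msg) => write!(f, "validation failed: {msg}"),
            AppError::Internal(msg) => write!(f, "internal error: {msg}"),
        }
    }
}

fn main() {
    let err = AppError::NotFound("tournament 456".into());
    println!("{} {}", err.status_code(), err); // 404 tournament 456 not found
}
```

Keeping the status mapping in one `match` means handlers can return `Result<_, AppError>` everywhere and never construct responses by hand.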

---

## 3. Frontend Framework

### Recommendation: **SvelteKit** (Svelte 5 + runes reactivity)

### Alternatives Considered

| Framework | Pros | Cons |
|-----------|------|------|
| **SvelteKit** | Smallest bundles, true compilation (no virtual DOM), built-in routing/SSR/PWA, Svelte 5 runes are elegant | Smaller ecosystem than React |
| **Next.js (React)** | Largest ecosystem, most libraries, biggest job market | Vercel lock-in concerns, React hydration overhead, larger bundles, RSC complexity |
| **SolidStart** | Finest-grained reactivity, near-zero overhead updates | Smallest ecosystem, least mature, fewer component libraries |
| **Nuxt (Vue)** | Good DX, solid ecosystem | Vue 3 composition API less elegant than Svelte 5 runes |

### Reasoning

**SvelteKit** is the best fit for PVM for several reasons:

1. **Performance matters for venue displays**: Tournament clocks, waiting lists, and seat maps will run on venue TVs via Chromecast. Svelte's compiled output produces minimal JavaScript — the Cast receiver app will load faster and use less memory on Chromecast hardware.
2. **Real-time UI updates**: Svelte 5's fine-grained reactivity (runes: `$state`, `$derived`, `$effect`) means updating a single timer or seat status re-renders only that DOM node, not a virtual DOM diff. This is ideal for dashboards with many independently updating elements.
3. **PWA support**: SvelteKit has first-class service worker support and offline capabilities through `@sveltejs/adapter-static` and `vite-plugin-pwa`.
4. **Bundle size**: SvelteKit produces the smallest JavaScript bundles of any major framework — important for mobile PWA users on venue WiFi.
5. **Claude Code compatibility**: Svelte's template syntax is straightforward and involves less boilerplate than React's — Claude can generate clean, readable Svelte components efficiently.
6. **No framework lock-in**: Svelte compiles away, so there's no runtime dependency. The output is vanilla JS + DOM manipulation.

### UI Component Library

**Recommendation: Skeleton UI** (Svelte-native) or **shadcn-svelte** (Tailwind-based, port of shadcn/ui)

`shadcn-svelte` is particularly compelling because:

- Components are copied into your codebase (not a dependency) — full control
- Built on Tailwind CSS — consistent with the styling recommendation
- Accessible by default (uses Bits UI primitives under the hood)
- Matches the design patterns of the widely-used shadcn/ui ecosystem

### Gotchas

- SvelteKit's SSR is useful for the management dashboard, but the Cast receiver and PWA may use `adapter-static` for pure SPA mode
- Svelte's ecosystem is smaller than React's, but for PVM's needs (forms, tables, charts, real-time) the ecosystem is sufficient
- Svelte 5 (runes) is a significant API change from Svelte 4 — ensure all examples and libraries target Svelte 5

---

## 4. Database Strategy

### Recommendation: **PostgreSQL** (cloud primary) + **libSQL/SQLite** (local node) + **Electric SQL** or custom sync

### Alternatives Considered

| Approach | Pros | Cons |
|----------|------|------|
| **Postgres cloud + libSQL local + sync** | Best of both worlds — Postgres power in cloud, SQLite simplicity on RPi5 | Need sync layer, schema divergence risk |
| **Postgres everywhere** | One DB engine, simpler mental model | Postgres on RPi5 uses more memory, harder offline |
| **libSQL/Turso everywhere** | One engine, built-in edge replication | Less powerful for complex cloud queries, multi-tenant partitioning |
| **CockroachDB** | Distributed, strong consistency | Heavy for RPi5, expensive, overkill |

### Detailed Recommendation

**Cloud Database: PostgreSQL 16+**

- The gold standard for multi-tenant SaaS
- Row-level security (RLS) for tenant isolation
- JSONB for flexible per-venue configuration
- Excellent full-text search for player lookup across venues
- Partitioning by tenant for performance at scale
- Managed options: Neon (serverless, branching for dev), Supabase, or AWS RDS

**Local Node Database: libSQL (via Turso's embedded runtime)**

- Fork of SQLite with cloud sync capabilities
- Runs embedded in the Rust binary — no separate database process on RPi5
- WAL mode for concurrent reads during tournament operations
- Tiny memory footprint (< 10 MB typical)
- libSQL's Rust driver (`libsql`) is well-maintained

**Sync Strategy:**

The local node operates on a **subset** of the cloud data — only data relevant to its venue(s). The sync approach:

1. **Cloud-to-local**: Player profiles, memberships, credit lines pushed to local node via NATS JetStream. Local node maintains a read replica of relevant data in libSQL.
2. **Local-to-cloud**: Tournament results, waitlist changes, transactions pushed to cloud via NATS JetStream with at-least-once delivery. Cloud processes as events.
3. **Conflict resolution**: Last-writer-wins (LWW) with vector clocks for most entities. For financial data (credit lines, buy-ins), use **event sourcing** — conflicts are impossible because every transaction is an immutable event.
4. **Offline queue**: When disconnected, local node queues mutations in a local WAL-style append-only log. On reconnect, replays in order via NATS.
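
The offline queue can be sketched as an append-only entry plus ordered replay (the field names are illustrative, not PVM's actual schema):

```rust
// Hypothetical shape of an offline mutation-queue entry.
#[derive(Debug, Clone, PartialEq)]
struct QueuedMutation {
    seq: u64,          // monotonically increasing local sequence number
    unix_ms: u64,      // local wall-clock timestamp (chrony-synced)
    subject: String,   // NATS subject the mutation will replay to
    payload: Vec<u8>,  // serialized event body
}

// Replay strictly in local sequence order, regardless of the order
// entries happen to be read back from storage.
fn replay_order(mut queue: Vec<QueuedMutation>) -> Vec<QueuedMutation> {
    queue.sort_by_key(|m| m.seq);
    queue
}

fn main() {
    let q = vec![
        QueuedMutation { seq: 2, unix_ms: 1_700_000_100, subject: "sync.node1.upstream".into(), payload: vec![] },
        QueuedMutation { seq: 1, unix_ms: 1_700_000_000, subject: "sync.node1.upstream".into(), payload: vec![] },
    ];
    let ordered = replay_order(q);
    assert_eq!(ordered[0].seq, 1);
    println!("replay {} mutations in order", ordered.len());
}
```

Ordering by a local sequence number rather than wall-clock time keeps replay deterministic even if the clock is adjusted mid-outage.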

### ORM / Query Layer

**Recommendation: `sqlx`** (compile-time checked queries)

- `sqlx` checks SQL queries against the actual database schema at compile time
- No ORM abstraction layer — write real SQL, get compile-time safety
- Supports both PostgreSQL and SQLite (libSQL's dialect is SQLite-compatible)
- Avoids the N+1 query problems that ORMs can introduce
- Migrations via `sqlx migrate`

Alternative: `sea-orm` if you want a full ORM, but for PVM the explicit SQL approach of `sqlx` gives more control over multi-tenant queries and complex joins.

### Migrations

- Use `sqlx migrate` for cloud PostgreSQL migrations
- Maintain parallel migration files for libSQL (SQLite-compatible subset)
- A shared migration test ensures both schemas stay compatible for the sync subset

### Gotchas

- PostgreSQL and SQLite have different SQL dialects — the sync subset must use compatible types (no Postgres-specific types in synced tables)
- libSQL's `VECTOR` type is interesting for future player similarity features but not needed initially
- Turso's hosted libSQL replication is an option but adds a dependency — prefer embedded libSQL with custom NATS-based sync for more control
- Schema versioning must be tracked on the local node so the cloud knows what schema version it's talking to

---

## 5. Caching Layer

### Recommendation: **DragonflyDB**

### Alternatives Considered

| Option | Pros | Cons |
|--------|------|------|
| **DragonflyDB** | 25x Redis throughput, Redis-compatible API, multi-threaded, lower memory usage | Younger project, smaller community |
| **Redis 7+** | Most mature, largest ecosystem, Redis Stack modules | Single-threaded core, licensing concerns since the Redis 7.4 relicensing |
| **Valkey** | Redis fork, community-driven, BSD license | Still catching up to Redis feature parity |
| **KeyDB** | Multi-threaded Redis fork | Development appears stalled (no updates in 1.5+ years) |
| **No cache (just Postgres)** | Simpler architecture | Higher DB load, slower for session/real-time data |

### Reasoning

**DragonflyDB** is the right choice for PVM:

1. **Redis API compatibility**: Drop-in replacement — all Redis client libraries work unchanged. The `fred` Rust crate (async Redis client) works with DragonflyDB out of the box.
2. **Multi-threaded architecture**: DragonflyDB uses all available CPU cores, unlike Redis's single-threaded model. This matters when caching tournament state for hundreds of concurrent venues.
3. **Memory efficiency**: DragonflyDB claims up to 80% less memory than Redis for the same dataset — important for keeping infrastructure costs low.
4. **No license concerns**: DragonflyDB uses BSL 1.1 (converts to open source after 4 years). Redis switched to a dual-license model that's more restrictive. Valkey is BSD but is playing catch-up.
5. **Pub/Sub**: DragonflyDB supports Redis Pub/Sub — useful as a lightweight complement to NATS for in-process event distribution within the backend cluster.

### What to Cache

- **Session data**: User sessions, JWT refresh tokens
- **Tournament state**: Current level, blinds, clock, player counts (hot read path)
- **Waiting lists**: Ordered sets per venue/game type
- **Rate limiting**: API rate limit counters
- **Player lookup cache**: Frequently accessed player profiles
- **Seat maps**: Current table/seat assignments per venue
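
The rate-limiting entry amounts to a fixed-window counter. In production the counters would live in DragonflyDB (shared across backend instances via `fred`), but the window logic itself looks like this in-process sketch:

```rust
use std::collections::HashMap;

// Fixed-window rate limiter: at most `limit` requests per `window_secs`
// per key. Counters are kept in-process here purely for illustration.
struct RateLimiter {
    limit: u32,
    window_secs: u64,
    counters: HashMap<String, (u64, u32)>, // key -> (window start, count)
}

impl RateLimiter {
    fn new(limit: u32, window_secs: u64) -> Self {
        Self { limit, window_secs, counters: HashMap::new() }
    }

    // Returns true if the request is allowed at time `now` (unix seconds).
    fn allow(&mut self, key: &str, now: u64) -> bool {
        let window = now - now % self.window_secs;
        let entry = self.counters.entry(key.to_string()).or_insert((window, 0));
        if entry.0 != window {
            *entry = (window, 0); // new window, reset the counter
        }
        if entry.1 < self.limit {
            entry.1 += 1;
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut rl = RateLimiter::new(2, 60);
    assert!(rl.allow("player:42", 100));
    assert!(rl.allow("player:42", 101));
    assert!(!rl.allow("player:42", 102)); // third hit in the same window
    assert!(rl.allow("player:42", 160)); // next window, counter reset
    println!("rate limiter ok");
}
```

With a Redis-compatible store, the same idea collapses to `INCR` + `EXPIRE` on a key like `rl:{player_id}:{window}`.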

### What NOT to Cache (use Postgres directly)

- Financial transactions (credit lines, buy-ins) — always hit the source of truth
- Audit logs
- Historical tournament data

### Local Node: No DragonflyDB

The RPi5 local node should **not** run DragonflyDB. libSQL is fast enough for local caching needs, and adding another process increases complexity and memory usage on constrained hardware. Use in-memory Rust data structures (e.g., `DashMap`, `moka` cache crate) for hot local state.

### Gotchas

- DragonflyDB's replication features are less mature than Redis Sentinel/Cluster — use managed hosting or keep it simple with a single node + persistence initially
- Monitor DragonflyDB's release cycle — it's actively developed but younger than Redis
- Keep the cache layer optional — the system should function (slower) without it

---

## 6. Message Queue / Event Streaming

### Recommendation: **NATS + JetStream**

### Alternatives Considered

| Option | Pros | Cons |
|--------|------|------|
| **NATS + JetStream** | Lightweight (single binary, ~20MB), sub-ms latency, built-in persistence, embedded mode, perfect for edge | Smaller community than Kafka |
| **Apache Kafka** | Highest throughput, mature, excellent tooling | Heavy (JVM, ZooKeeper/KRaft), 4GB+ RAM minimum, overkill for PVM's scale |
| **RabbitMQ** | Mature AMQP, sophisticated routing | Higher latency (5-20ms), more memory, Erlang ops complexity |
| **Redis Streams** | Simple, already have cache layer | Not designed for reliable message delivery at scale |

### Reasoning

**NATS + JetStream** is purpose-built for PVM's architecture:

1. **Edge-native**: NATS can run as a **leaf node** on the RPi5, connecting to the cloud NATS cluster. This is the core of the local-to-cloud sync architecture. When the connection drops, JetStream buffers messages locally and replays them on reconnect.

2. **Lightweight**: NATS server is a single ~20 MB binary. On RPi5, it uses ~50 MB RAM. Compare to Kafka's 4 GB minimum.

3. **Sub-millisecond latency**: Core NATS delivers messages in < 1ms. JetStream (persistent) adds 1-5ms. This is critical for real-time tournament updates — when a player busts, every connected display should update within milliseconds.

4. **Subject-based addressing**: NATS subjects map perfectly to PVM's domain:
   - `venue.{venue_id}.tournament.{id}.clock` — tournament clock ticks
   - `venue.{venue_id}.waitlist.update` — waiting list changes
   - `venue.{venue_id}.seats.{table_id}` — seat assignments
   - `player.{player_id}.notifications` — player-specific events
   - `sync.{node_id}.upstream` — local node to cloud sync
   - `sync.{node_id}.downstream` — cloud to local node sync

5. **Built-in patterns**: Request/reply (for RPC between cloud and node), pub/sub (for broadcasts), queue groups (for load-balanced consumers), key-value store (for distributed config), object store (for binary data like player photos).

6. **JetStream for durability**: Tournament results, financial transactions, and sync operations need guaranteed delivery. JetStream provides at-least-once and exactly-once delivery semantics with configurable retention.
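
Since the subject namespace is painful to change later, it helps to centralize construction in small helpers so the hierarchy is defined in exactly one place. A sketch (the function names and the wildcard helper are illustrative, not an existing PVM API):

```rust
// Subject construction helpers matching the hierarchy above.
fn tournament_clock_subject(venue_id: u64, tournament_id: u64) -> String {
    format!("venue.{venue_id}.tournament.{tournament_id}.clock")
}

fn waitlist_update_subject(venue_id: u64) -> String {
    format!("venue.{venue_id}.waitlist.update")
}

fn sync_upstream_subject(node_id: &str) -> String {
    format!("sync.{node_id}.upstream")
}

// A wildcard subscription a gateway might use to follow every tournament
// event at one venue (in NATS, `*` matches one token and `>` the rest).
fn venue_tournament_wildcard(venue_id: u64) -> String {
    format!("venue.{venue_id}.tournament.>")
}

fn main() {
    assert_eq!(tournament_clock_subject(123, 456), "venue.123.tournament.456.clock");
    assert_eq!(venue_tournament_wildcard(123), "venue.123.tournament.>");
    println!("{}", sync_upstream_subject("node-7")); // sync.node-7.upstream
}
```

The same functions can be reused by publishers (`async-nats` client) and by the WebSocket gateway's subscriptions, so producers and consumers can never drift apart on subject spelling.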

### Architecture

```
RPi5 Local Node                            Cloud
┌──────────────┐                  ┌──────────────────┐
│  NATS Leaf   │◄──── TLS ─────► │   NATS Cluster   │
│    Node      │     (auto-       │     (3-node)     │
│              │    reconnect)    │                  │
│  JetStream   │                  │    JetStream     │
│  (local buf) │                  │   (persistent)   │
└──────────────┘                  └──────────────────┘
```

### Gotchas

- NATS JetStream's exactly-once semantics require careful consumer design — use idempotent handlers with deduplication IDs
- Subject namespace design is critical — plan it early, changing later is painful
- NATS leaf nodes need TLS configured for secure cloud connection
- Monitor JetStream stream sizes on RPi5 — set max bytes limits to avoid filling the SD card during extended offline periods
- The `async-nats` Rust crate is the official async client — well maintained and Tokio-native

---

## 7. Real-Time Communication

### Recommendation: **WebSockets** (via Axum) for interactive clients + **NATS** for backend fan-out + **SSE** as fallback

### Alternatives Considered

| Option | Pros | Cons |
|--------|------|------|
| **WebSockets** | Full duplex, low latency, wide support | Requires connection management, can't traverse some proxies |
| **Server-Sent Events (SSE)** | Simpler, auto-reconnect, HTTP-native | Server-to-client only, no binary support |
| **WebTransport** | HTTP/3, multiplexed streams, unreliable mode | Very new, limited browser support, no Chromecast support |
| **Socket.IO** | Auto-fallback, rooms, namespaces | Node.js-centric, adds overhead, not Rust-native |
| **gRPC streaming** | Typed, efficient, bidirectional | Not browser-native (needs grpc-web proxy), overkill |

### Architecture

The real-time pipeline has three layers:

1. **NATS** (backend event bus): All state changes publish to NATS subjects. This is the single source of real-time truth. Both cloud services and local nodes publish here.

2. **WebSocket Gateway** (Axum): A dedicated Axum service subscribes to relevant NATS subjects and fans out to connected WebSocket clients. Each client subscribes to the venues/tournaments they care about.

3. **SSE Fallback**: For environments where WebSockets are blocked (some corporate networks), provide an SSE endpoint that delivers the same event stream. SSE's built-in auto-reconnect with `Last-Event-ID` makes resumption simple.

### Flow Example: Tournament Clock Update

```
Tournament Service (Rust)
  → publishes to NATS: venue.123.tournament.456.clock {level: 5, time_remaining: 1200}
  → WebSocket Gateway subscribes to venue.123.tournament.*
  → fans out to all connected clients watching tournament 456
  → Chromecast receiver app gets update, renders clock
  → PWA on player's phone gets update, shows current level
```

### Implementation Details

- Use `axum::extract::ws::WebSocket` with `tokio::select!` to multiplex NATS subscription + client messages
- Implement heartbeat/ping-pong to detect stale connections (30s interval, 10s timeout)
- Client reconnection with exponential backoff + subscription replay from NATS JetStream
- Binary message format: consider MessagePack (`rmp-serde`) for compact payloads over WebSocket, with JSON as human-readable fallback
- Connection limits: track per-venue connection count, implement backpressure
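
The reconnection bullet can be pinned down with a small pure function (the base and cap values are illustrative defaults, not tested production settings):

```rust
use std::time::Duration;

// Reconnect delay for the n-th consecutive failure: doubles each attempt
// from `base`, capped at `max`. Production code would also add jitter so
// a venue full of clients doesn't reconnect in lockstep.
fn reconnect_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    let exp = base.saturating_mul(2u32.saturating_pow(attempt));
    exp.min(max)
}

fn main() {
    let base = Duration::from_millis(500);
    let max = Duration::from_secs(30);
    assert_eq!(reconnect_delay(0, base, max), Duration::from_millis(500));
    assert_eq!(reconnect_delay(1, base, max), Duration::from_secs(1));
    assert_eq!(reconnect_delay(10, base, max), Duration::from_secs(30)); // capped
    println!("backoff schedule ok");
}
```

On each successful reconnect the attempt counter resets to zero, and the client replays its JetStream subscriptions from the last acknowledged sequence to catch up on missed events.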

### Gotchas

- WebSocket connections are stateful — need sticky sessions or a connection registry if running multiple gateway instances
- Chromecast receiver apps have limited WebSocket support — test thoroughly on actual hardware
- Mobile PWAs going to background will drop WebSocket connections — design for reconnection and state catch-up
- Rate limit outbound messages to prevent flooding slow clients (tournament clock ticks should be throttled to 1/second for display, even if internal state updates more frequently)

---

## 8. Auth & Authorization

### Recommendation: **Custom JWT auth** with **Postgres-backed RBAC** + optional **OAuth2 social login**

### Alternatives Considered

| Option | Pros | Cons |
|--------|------|------|
| **Custom JWT + RBAC** | Full control, no vendor dependency, works offline on local node | Must implement everything yourself |
| **Auth0 / Clerk** | Managed, social login, MFA out of box | Vendor lock-in, cost scales with users, doesn't work offline |
| **Keycloak** | Self-hosted, full-featured, OIDC/SAML | Heavy (Java), complex to operate, overkill |
| **Ory (Kratos + Keto)** | Open source, cloud-native, API-first | Multiple services to deploy, newer |
| **Lucia Auth** | Lightweight, framework-agnostic | TypeScript-only, no Rust support |

### Architecture

PVM's auth has a unique challenge: **cross-venue universal player accounts** that must work both online (cloud) and offline (local node). This rules out purely managed auth services.

**Token Strategy:**

```
Access Token (JWT, short-lived: 15 min)
├── sub: player_id (universal)
├── tenant_id: current operator
├── venue_id: current venue (if applicable)
├── roles: ["player", "dealer", "floor_manager", "admin"]
├── permissions: ["tournament.manage", "waitlist.view", ...]
└── iat, exp, iss

Refresh Token (opaque, stored in DB/DragonflyDB, long-lived: 30 days)
└── Rotated on each use, old tokens invalidated
```

**RBAC Model:**

```
Operator (tenant)
├── Admin — full control over all venues
├── Manager — manage specific venues
├── Floor Manager — tournament/table operations at a venue
├── Dealer — assigned to tables, report results
└── Player — universal account, cross-venue
    ├── can self-register
    ├── has memberships per venue
    └── has credit lines per venue (managed by admin)
```

**Key Design Decisions:**

1. **Tenant-scoped roles**: A user can be an admin in one operator's venues and a player in another. The `(user_id, operator_id, role)` triple is the authorization unit.
2. **Offline auth on local node**: The local node caches valid JWT signing keys and a subset of user credentials (hashed). Players can authenticate locally when the cloud is unreachable. New registrations queue for cloud sync.
3. **JWT signing**: Use Ed25519 (fast, small signatures) via the `jsonwebtoken` crate. The cloud signs tokens; the local node can verify them with the public key. For offline token issuance, the local node has a delegated signing key.
4. **Password hashing**: `argon2` crate — memory-hard, resistant to GPU attacks. Tune parameters for RPi5 (lower memory cost than cloud).
5. **Social login** (optional, cloud-only): Support Google/Apple sign-in for player accounts via standard OAuth2 flows. Map social identities to the universal player account.
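
A sketch of the acceptance policy the local node might apply to a cached token after signature verification (the issuer names and skew allowance are invented for illustration; real signature verification would use the cached Ed25519 public key via `jsonwebtoken`):

```rust
// Claim names mirror the token sketch above; only the fields this
// policy check needs are modeled.
struct Claims {
    exp: u64,      // expiry, unix seconds
    iss: String,   // issuer
    venue_id: u64,
}

// Policy applied AFTER cryptographic verification succeeds. The issuer
// prefixes and skew window are hypothetical values, not a spec.
fn accept_offline(claims: &Claims, now: u64, local_venue: u64, max_skew_secs: u64) -> bool {
    let not_expired = now <= claims.exp + max_skew_secs; // tolerate small clock drift
    let known_issuer = claims.iss == "pvm-cloud" || claims.iss.starts_with("pvm-node-");
    let right_venue = claims.venue_id == local_venue;
    not_expired && known_issuer && right_venue
}

fn main() {
    let c = Claims { exp: 1_000, iss: "pvm-cloud".into(), venue_id: 7 };
    assert!(accept_offline(&c, 990, 7, 30));
    assert!(accept_offline(&c, 1_020, 7, 30));  // expired, but within skew window
    assert!(!accept_offline(&c, 1_031, 7, 30)); // beyond skew — reject
    assert!(!accept_offline(&c, 990, 8, 30));   // wrong venue — reject
    println!("offline token policy ok");
}
```

Keeping the post-verification policy pure makes it trivially testable, independent of the crypto layer.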

### Gotchas

- Token revocation is hard with JWTs — use short expiry (15 min) + refresh token rotation + a lightweight blocklist in DragonflyDB for immediate revocation
- Cross-venue account linking: when a player signs up at venue A and later visits venue B (different operator), they should be recognized. Use email/phone as the universal identifier with verification.
- Local node token issuance must be time-limited and logged — cloud should audit all locally-issued tokens on sync
- Rate limit login attempts both on cloud and local node to prevent brute force

---

## 9. API Design

### Recommendation: **REST + OpenAPI 3.1** with generated TypeScript client

### Alternatives Considered

| Approach | Pros | Cons |
|----------|------|------|
| **REST + OpenAPI** | Universal, tooling-rich, generated clients, cacheable | Overfetching possible, multiple round trips |
| **GraphQL** | Flexible queries, single endpoint, good for complex UIs | Complexity overhead, caching harder, Rust support less mature |
| **tRPC** | Zero-config type safety | TypeScript-only — cannot use with Rust backend |
| **gRPC** | Efficient binary protocol, streaming | Needs proxy for browsers, overkill for this use case |

### Reasoning

**tRPC is ruled out** because it requires both client and server to be TypeScript. With a Rust backend, this is not viable.

**REST + OpenAPI** is the best approach because:

1. **Generated type safety**: Use `utoipa` to generate OpenAPI 3.1 specs from Rust types, then `openapi-typescript` to generate TypeScript types for the frontend. Changes to the Rust API automatically propagate to the frontend types.
2. **Cacheable**: REST's HTTP semantics enable CDN caching, ETag support, and conditional requests — important for player profiles and tournament structures that change infrequently.
3. **Universal clients**: The REST API will also be consumed by the Chromecast receiver app, the local node sync layer, and potentially third-party integrations. OpenAPI makes all of these easy.
4. **Tooling**: Swagger UI for exploration, `openapi-fetch` for the TypeScript client (type-safe fetch wrapper), Postman/Insomnia for testing.

### API Conventions

```
# Resource-based URLs
GET    /api/v1/venues/{venue_id}/tournaments
POST   /api/v1/venues/{venue_id}/tournaments
GET    /api/v1/venues/{venue_id}/tournaments/{id}
PATCH  /api/v1/venues/{venue_id}/tournaments/{id}

# Actions as sub-resources
POST   /api/v1/venues/{venue_id}/tournaments/{id}/start
POST   /api/v1/venues/{venue_id}/tournaments/{id}/pause
POST   /api/v1/venues/{venue_id}/waitlists/{id}/join
POST   /api/v1/venues/{venue_id}/waitlists/{id}/call/{player_id}

# Cross-venue player operations
GET    /api/v1/players/me
GET    /api/v1/players/{id}/memberships
POST   /api/v1/players/{id}/credit-lines

# Real-time subscriptions
WS     /api/v1/ws?venue={id}&subscribe=tournament.clock,waitlist.updates
```

### Type Generation Pipeline

```
Rust structs (serde + utoipa derive)
  → OpenAPI 3.1 JSON spec (generated at build time)
  → openapi-typescript (CI step)
  → TypeScript types + openapi-fetch client
  → SvelteKit frontend consumes typed API
```

### Gotchas

- Version the API from day one (`/api/v1/`) — breaking changes go in `/api/v2/`
- Use cursor-based pagination for lists (not offset-based) — more efficient and handles concurrent inserts
- Standardize error responses: `{ error: { code: string, message: string, details?: any } }`
- Consider a lightweight BFF (Backend-for-Frontend) pattern in SvelteKit's server routes for aggregating multiple API calls into one page load
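
The cursor-pagination gotcha can be illustrated with a minimal keyset cursor. The `timestamp:id` wire format is purely illustrative — production would likely base64-encode and authenticate the cursor — but the idea is that the cursor names the last row seen, not an offset:

```rust
// Encode the (created_at, id) of the last returned row as an opaque-ish cursor.
fn encode_cursor(created_at_ms: u64, id: u64) -> String {
    format!("{created_at_ms}:{id}")
}

// Decode a client-supplied cursor; None on anything malformed.
fn decode_cursor(cursor: &str) -> Option<(u64, u64)> {
    let (ts, id) = cursor.split_once(':')?;
    Some((ts.parse().ok()?, id.parse().ok()?))
}

fn main() {
    let cur = encode_cursor(1_700_000_000_000, 42);
    assert_eq!(decode_cursor(&cur), Some((1_700_000_000_000, 42)));
    assert_eq!(decode_cursor("garbage"), None);
    // The query side then becomes keyset pagination, e.g.:
    //   WHERE (created_at, id) < ($1, $2)
    //   ORDER BY created_at DESC, id DESC
    //   LIMIT $3
    println!("cursor round-trip ok");
}
```

Because the cursor pins a concrete row, concurrent inserts never shift the page boundaries the way `OFFSET` does.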

---

## 10. Local Node Architecture
|
|
|
|
### Recommendation: **Single Rust binary** running on RPi5 with embedded libSQL, NATS leaf node, and local HTTP/WS server
|
|
|
|
### What Runs on the RPi5
|
|
|
|
```
|
|
┌─────────────────────────────────────────────────────┐
|
|
│ PVM Local Node (single Rust binary, ~15-20 MB) │
|
|
│ │
|
|
│ ┌──────────────┐ ┌──────────────┐ │
|
|
│ │ HTTP/WS │ │ NATS Leaf │ │
|
|
│ │ Server │ │ Node │ │
|
|
│ │ (Axum) │ │ (embedded or │ │
|
|
│ │ │ │ sidecar) │ │
|
|
│ └──────┬───────┘ └──────┬───────┘ │
|
|
│ │ │ │
|
|
│ ┌──────┴──────────────────┴───────┐ │
|
|
│ │ Application Core │ │
|
|
│ │ - Tournament engine │ │
|
|
│ │ - Clock manager │ │
|
|
│ │ - Waitlist manager │ │
|
|
│ │ - Seat assignment │ │
|
|
│ │ - Sync orchestrator │ │
|
|
│ └──────────────┬───────────────────┘ │
|
|
│ │ │
|
|
│ ┌──────────────┴───────────────────┐ │
|
|
│ │ libSQL (embedded) │ │
|
|
│ │ - Venue data subset │ │
|
|
│ │ - Offline mutation queue │ │
|
|
│ │ - Local auth cache │ │
|
|
│ └───────────────────────────────────┘ │
|
|
│ │
|
|
│ ┌───────────────────────────────────┐ │
|
|
│ │ moka in-memory cache │ │
|
|
│ │ - Hot tournament state │ │
|
|
│ │ - Active session tokens │ │
|
|
│ └───────────────────────────────────┘ │
|
|
└─────────────────────────────────────────────────────────┘
|
|
```
|
|
|
|
### Offline Operations
|
|
|
|
When the cloud connection drops, the local node continues operating:
|
|
|
|
1. **Tournament operations**: Clock continues, blinds advance, players bust/rebuy — all local state
|
|
2. **Waitlist management**: Players can join/leave waitlists — queued for cloud sync
|
|
3. **Seat assignments**: Floor managers can move players between tables locally
|
|
4. **Player auth**: Cached credentials allow existing players to log in. New registrations queued.
|
|
5. **Financial operations**: Buy-ins and credit transactions logged locally with offline flag. Cloud reconciles on reconnect.
|
|
|
|
### Sync Protocol

```
On reconnect:
1. Local node sends its last-seen cloud sequence number
2. Cloud sends all events since that sequence (via NATS JetStream replay)
3. Local node sends its offline mutation queue (ordered by local timestamp)
4. Cloud processes mutations, detects conflicts, responds with resolution
5. Local node applies cloud resolutions, updates local state
6. Both sides confirm sync complete
```

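The replay step (step 2) can be sketched as a pure function over an ordered event log. The types and subject names below are illustrative assumptions, not the actual protocol structs:

```rust
// Sketch of sync-protocol step 2: replaying the cloud events a local
// node missed while offline. `Event`/`subject` names are hypothetical.

#[derive(Debug, Clone, PartialEq)]
struct Event {
    seq: u64,        // cloud-assigned, strictly increasing (JetStream sequence)
    subject: String, // e.g. "venue.1.waitlist.joined"
}

/// Everything with a sequence number greater than `last_seen`,
/// i.e. what the cloud streams to a reconnecting leaf node.
fn events_since(log: &[Event], last_seen: u64) -> Vec<Event> {
    log.iter().filter(|e| e.seq > last_seen).cloned().collect()
}

fn main() {
    let log = vec![
        Event { seq: 1, subject: "venue.1.waitlist.joined".into() },
        Event { seq: 2, subject: "venue.1.clock.level_up".into() },
        Event { seq: 3, subject: "venue.1.waitlist.seated".into() },
    ];
    // Node went offline after seq 1, so it must replay seq 2 and 3.
    let missed = events_since(&log, 1);
    assert_eq!(missed.len(), 2);
    assert_eq!(missed[0].seq, 2);
    println!("replaying {} missed events", missed.len());
}
```

Because JetStream sequences are strictly increasing, "all events since `last_seen`" is an unambiguous query; the real implementation would use a JetStream consumer with a start sequence rather than an in-memory slice.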
### Conflict Resolution Strategy

| Data Type | Strategy | Reasoning |
|-----------|----------|-----------|
| Tournament state | Cloud wins | Only one node runs a tournament at a time |
| Waitlist | Merge (union) | Both sides can add/remove; merge and re-order by timestamp |
| Player profiles | Cloud wins (LWW) | Cloud is the authority for universal accounts |
| Buy-in/rebuy records | Append-only (event sourcing) | No conflicts — every event is immutable |
| Seat assignments | Local wins during offline | Floor manager's local decisions take precedence |
| Dealer schedules | Cloud wins | Schedules are set centrally |

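The table above maps naturally onto a small dispatch function plus one merge rule per mergeable type. A minimal sketch, with hypothetical type tags and a timestamp-ordered waitlist merge (the real entities would be richer):

```rust
// Sketch of the per-data-type conflict rules from the table above.
// The string tags and tuple-based waitlist entries are illustrative.

#[derive(Debug, Clone, Copy, PartialEq)]
enum Resolution { CloudWins, LocalWins, Merge, AppendOnly }

fn resolve(data_type: &str) -> Resolution {
    match data_type {
        "tournament_state" | "player_profile" | "dealer_schedule" => Resolution::CloudWins,
        "seat_assignment" => Resolution::LocalWins, // floor manager decided offline
        "waitlist" => Resolution::Merge,            // union, re-ordered by timestamp
        "buyin_record" => Resolution::AppendOnly,   // immutable events, no conflict
        _ => Resolution::CloudWins,                 // safe default: cloud is authoritative
    }
}

/// Merge rule for waitlists: union of both sides, ordered by join timestamp.
fn merge_waitlists(mut a: Vec<(u64, String)>, b: Vec<(u64, String)>) -> Vec<(u64, String)> {
    for entry in b {
        if !a.iter().any(|(_, p)| *p == entry.1) {
            a.push(entry);
        }
    }
    a.sort_by_key(|(ts, _)| *ts);
    a
}

fn main() {
    assert_eq!(resolve("seat_assignment"), Resolution::LocalWins);
    let merged = merge_waitlists(
        vec![(100, "alice".to_string())],
        vec![(50, "bob".to_string()), (100, "alice".to_string())],
    );
    // Union of both sides, re-ordered by join timestamp.
    assert_eq!(merged, vec![(50, "bob".to_string()), (100, "alice".to_string())]);
}
```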
### RPi5 System Setup

- **OS**: Raspberry Pi OS Lite (64-bit, Debian Bookworm-based) — no desktop environment
- **Runtime**: Docker + Docker Compose. Two containers: `pvm-node` (Rust binary) + `pvm-nats-leaf` (NATS)
- **Storage**: 32 GB+ microSD or USB SSD (recommended for durability). libSQL database in a Docker volume.
- **Auto-start**: Docker Compose with `restart: always`. systemd service ensures Docker starts on boot.
- **Updates**: `docker compose pull && docker compose up -d` — automated via cron or webhook from cloud.
- **Watchdog**: Docker health checks + hardware watchdog timer to auto-reboot if containers fail.
- **Networking**: Ethernet preferred (reliable), WiFi as fallback. mDNS for local display device discovery. WireGuard tunnel to Hetzner cloud.

### Gotchas

- RPi5 ships in 2/4/8/16 GB RAM variants — target the 8 GB model and budget ~200 MB for the PVM process + NATS
- SD card wear: use an external USB SSD for the libSQL database if heavy write operations are expected
- Time synchronization: use the `chrony` NTP client — accurate timestamps are critical for conflict resolution and tournament clocks
- Power loss: libSQL in WAL mode is crash-safe, but implement a clean shutdown handler (SIGTERM) that flushes state
- Security: the RPi5 is physically accessible in venues — encrypt the libSQL database at rest, disable SSH password auth, use key-only

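The chrony gotcha has a code-level consequence: NTP can step the wall clock, so the tournament clock should be driven off the monotonic clock (`std::time::Instant`), which never jumps. A minimal sketch of a pause-aware level timer (field names are illustrative):

```rust
use std::time::{Duration, Instant};

// Sketch: drive the tournament level timer off Instant (monotonic), not
// SystemTime, so chrony/NTP step adjustments never make the clock jump.

struct LevelClock {
    started: Instant,
    paused_total: Duration,
    pause_started: Option<Instant>,
}

impl LevelClock {
    fn new() -> Self {
        Self { started: Instant::now(), paused_total: Duration::ZERO, pause_started: None }
    }

    fn pause(&mut self) {
        if self.pause_started.is_none() {
            self.pause_started = Some(Instant::now());
        }
    }

    fn resume(&mut self) {
        if let Some(p) = self.pause_started.take() {
            self.paused_total += p.elapsed();
        }
    }

    /// Time the current level has actually been running (pauses excluded).
    fn elapsed(&self) -> Duration {
        let paused = self.paused_total
            + self.pause_started.map_or(Duration::ZERO, |p| p.elapsed());
        self.started.elapsed() - paused
    }
}

fn main() {
    let mut clock = LevelClock::new();
    clock.pause();
    std::thread::sleep(Duration::from_millis(20));
    clock.resume();
    // The 20 ms pause does not count toward level time.
    assert!(clock.elapsed() < Duration::from_millis(10));
}
```

Wall-clock time (synced by chrony) is still what should be stored in event timestamps for conflict resolution; `Instant` is only for measuring durations.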
---

## 11. Venue Display System

### Recommendation: **Generic web display app** + **Android display client** (no Google Cast SDK dependency)

### Architecture

```
┌──────────────────┐
│ Screen Manager   │  (part of admin dashboard)
│ - Assign streams │  Venue staff assigns content to each display
│ - Per-TV config  │
└────────┬─────────┘
         │ WebSocket (display assignment)
         ▼
┌──────────────────┐            ┌──────────────────┐
│ Local RPi5 Node  │◄── mDNS ───┤ Display Devices  │
│ serves display   │    auto    │ (Android box /   │
│ web app + WS     │    disco   │  smart TV /      │
│                  │───────────►│  Chromecast)     │
└────────┬─────────┘            └──────────────────┘
         │                               │
   if offline:                      fallback:
   serves locally                   connect to cloud
         │                          SaaS URL directly
         ▼                               │
┌──────────────────┐            ┌───────▼──────────┐
│ Display renders  │            │ Display renders  │
│ from local node  │            │ from cloud       │
└──────────────────┘            └──────────────────┘
```

### Display Client (Android App)

A lightweight Android app (or a $40 4K Android box) that:

1. **Auto-starts on boot** — kiosk mode, no user interaction needed
2. **Discovers the local node via mDNS** — zero-config for venue staff, falls back to manual IP entry
3. **Registers with a unique device ID** — appears automatically in the Screen Manager dashboard
4. **Receives display assignment via WebSocket** — the system tells it what to render
5. **Renders a full-screen web page** — the display content is a standard SvelteKit static page
6. **Falls back to cloud SaaS** if the local RPi5 node is offline
7. **Remotely controllable** — venue staff can change the stream, restart, or push an announcement overlay from the Screen Manager

### Display Content (SvelteKit Static App)

The display views are a **separate SvelteKit static build** optimized for large screens:

- **Tournament clock**: Large timer, current level, blind structure, next break, average stack
- **Waiting list**: Player queue by game type, estimated wait times
- **Table status**: Open seats, game types, stakes per table
- **Seatings**: Tournament seat assignments after draws
- **Custom slideshow**: Announcements, promotions, venue info (managed by staff)
- **Rotation mode**: Cycle between multiple views on a configurable timer

### Screen Manager

The **Screen Manager** (part of the admin dashboard) lets floor managers:

- See all connected display devices with status (online, offline, content)
- Assign content streams to each device (TV 1-5: tournament clock, TV 6: waitlist, etc.)
- Configure rotation/cycling between views per device
- Send one-time announcements to all screens or specific screens
- Adjust display themes (dark/light, font size, venue branding)
- Group screens (e.g. "Tournament Area", "Cash Room", "Lobby")

### Technical Details

- Display web app is served by the local node's HTTP server (Axum) for lowest latency
- WebSocket connection for live data updates (tournament clock ticks, waitlist changes)
- Each display device is identified by a stable device ID (generated on first boot, persisted)
- mDNS service type: `_pvm-display._tcp.local` for auto-discovery
- Display URLs: `http://{local-node-ip}/display/{device-id}` (local) or `https://app.pvmapp.com/display/{device-id}` (cloud fallback)
- Dark mode by default (poker venues are low-light environments)
- Large fonts, high contrast — designed for viewing from across the room

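The URL fallback rule above is simple enough to pin down in code. A sketch using the host names from this document (the function itself is illustrative, not an actual PVM API):

```rust
// Sketch of display URL selection: prefer the local node when mDNS
// discovery found one, else fall back to the cloud SaaS URL.

fn display_url(device_id: &str, local_node_ip: Option<&str>) -> String {
    match local_node_ip {
        // Local node reachable: lowest latency, works fully offline.
        Some(ip) => format!("http://{ip}/display/{device_id}"),
        // No local node found: cloud fallback.
        None => format!("https://app.pvmapp.com/display/{device_id}"),
    }
}

fn main() {
    assert_eq!(
        display_url("tv-3", Some("192.168.1.50")),
        "http://192.168.1.50/display/tv-3"
    );
    assert_eq!(
        display_url("tv-3", None),
        "https://app.pvmapp.com/display/tv-3"
    );
}
```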
### Chromecast Compatibility

Chromecast is supported as a **display target** but not the primary architecture:

- Smart TVs with built-in Chromecast or attached Chromecast dongles can open the display URL
- No Google Cast SDK dependency — just opening a URL
- The Android display client app is the recommended approach for reliability and offline support

### Gotchas

- Android kiosk mode needs careful implementation — prevent users from exiting the app, handle OS updates gracefully
- mDNS can be unreliable on some enterprise/venue networks — always offer manual IP fallback
- Display devices on venue WiFi may have intermittent connectivity — design for reconnection and state catch-up
- Keep the display app extremely lightweight — some $40 Android boxes have limited RAM
- Test on actual cheap Android hardware early — performance varies wildly
- Power cycling (venue closes nightly) must be handled gracefully — auto-start, auto-reconnect, auto-resume

---

## 12. Mobile Strategy

### Recommendation: **PWA first** (SvelteKit), with **Capacitor** wrapper for app store presence when needed

### Alternatives Considered

| Approach | Pros | Cons |
|----------|------|------|
| **PWA (SvelteKit)** | One codebase, instant updates, no app store, works offline | Limited native API access, no push on iOS (improving), discoverability |
| **Capacitor (hybrid)** | PWA + native shell, access native APIs, app store distribution | Thin WebView wrapper, some performance overhead |
| **Tauri Mobile** | Rust backend, small size | Mobile support still maturing, limited plugin ecosystem |
| **React Native** | True native UI, large ecosystem | Separate codebase from web, React dependency, not Svelte |
| **Flutter** | Excellent cross-platform, single codebase | Dart language, separate from web entirely |

### Reasoning

PVM's mobile needs are primarily **consumption-oriented** — players check tournament schedules, waiting list position, and receive notifications. This is a perfect fit for a PWA:

1. **PWA first**: The SvelteKit app with `vite-plugin-pwa` already provides offline caching, add-to-home-screen, and background sync. For most players, this is sufficient.

2. **Capacitor wrap when needed**: When iOS push notifications, Apple Pay, or app store presence becomes important, wrap the existing SvelteKit PWA in Capacitor. Capacitor runs the same web app in a native WebView and provides JavaScript bridges to native APIs.

3. **Tauri Mobile is not ready**: As of 2026, Tauri 2.0's mobile support exists but is still maturing. It would be a good fit architecturally (Rust backend + web frontend), but the plugin ecosystem and build tooling aren't as polished as Capacitor's. Revisit in 12-18 months.

### PWA Features for PVM

- **Service Worker**: Cache tournament schedules, player profile, venue info for offline access
- **Push Notifications**: Web Push API for tournament start reminders, waitlist calls (Android + iOS 16.4+)
- **Add to Home Screen**: App-like experience without app store
- **Background Sync**: Queue waitlist join/leave actions when offline, sync when back online
- **Share Target**: Accept shared tournament links

### Gotchas

- iOS PWA support is improving but still has limitations (no background fetch, limited push notification payload)
- Capacitor requires maintaining iOS/Android build pipelines — only add this when there's a clear need
- Test the PWA on actual mobile devices in venues — WiFi quality varies dramatically
- Deep linking: configure universal links / app links so shared tournament URLs open in the PWA/app

---

## 13. Deployment & Infrastructure

### Recommendation: **Self-hosted on Hetzner PVE** (LXC containers) + **Docker** + **Forgejo Actions** CI/CD

### Reasoning

The project already has a Hetzner Proxmox VE (PVE) server. Running PVM in LXC containers on the existing infrastructure keeps costs minimal and gives full control.

1. **LXC containers on PVE**: Lightweight, near-native performance, easy to snapshot and backup. Each service gets its own container or Docker runs inside an LXC.
2. **Docker Compose for services**: All cloud services defined in a single `docker-compose.yml`. Simple to start, stop, and update.
3. **No vendor lock-in**: Everything runs on standard Linux + Docker. Can migrate to any cloud or other bare metal trivially.
4. **WireGuard for RPi5 connectivity**: RPi5 local nodes connect to the Hetzner server via a WireGuard tunnel for secure NATS leaf node communication.
5. **Forgejo Actions**: CI/CD runs on the same Forgejo instance hosting the code.

### Infrastructure Layout

```
Hetzner PVE Server
├── LXC: pvm-cloud
│   ├── Docker: pvm-api (Axum)
│   ├── Docker: pvm-ws-gateway (Axum WebSocket)
│   ├── Docker: pvm-worker (background jobs: sync, notifications)
│   ├── Docker: pvm-nats (NATS cluster)
│   ├── Docker: pvm-db (PostgreSQL 16)
│   └── Docker: pvm-cache (DragonflyDB)
├── LXC: pvm-staging (mirrors production for testing)
└── WireGuard endpoint for RPi5 nodes

Venue (RPi5 — Docker on Raspberry Pi OS)
├── Docker: pvm-node (Rust binary — API proxy + sync engine)
├── Docker: pvm-nats-leaf (NATS leaf node)
└── connects to Hetzner via WireGuard/TLS
```

### RPi5 Local Node (Docker-based)

The local node runs **Docker on stock Raspberry Pi OS (64-bit)**:

- **Provisioning**: One-liner curl script installs Docker and pulls the PVM stack (`docker compose pull && docker compose up -d`)
- **Updates**: Pull new images and restart (`docker compose pull && docker compose up -d`). Automated via a cron job or self-update webhook.
- **Rollback**: Previous images remain on disk. Roll back with `docker compose up -d --force-recreate` using pinned image tags.
- **Services**: `pvm-node` (Rust binary) + `pvm-nats-leaf` (NATS leaf node). Two containers, minimal footprint.
- **Storage**: libSQL database stored in a Docker volume on the SD card (or USB SSD for heavy-write venues).

### CI/CD Pipeline (Forgejo Actions)

```
# Triggered on push to main
1. Lint (clippy, biome)
2. Test (cargo nextest, vitest, playwright)
3. Build (multi-stage Docker for cloud + cross-compile ARM64 for RPi5)
4. Push images to container registry
5. Deploy staging (docker compose pull on staging LXC)
6. E2E tests against staging
7. Deploy production (manual approval, docker compose on production LXC)
8. Publish RPi5 images (ARM64 Docker images to registry)
```

### Gotchas

- Use multi-stage Docker builds for Rust: builder stage with `rust:bookworm`, runtime stage with `debian:bookworm-slim` or `distroless`
- PostgreSQL backups: automate `pg_dump` to a separate backup location (another Hetzner storage box or off-site)
- Set up blue-green deployments via Docker Compose profiles for zero-downtime upgrades
- Monitor Hetzner server resources — if PVM outgrows a single server, split services across multiple LXCs or servers
- WireGuard keys for RPi5 nodes: automate key generation and registration during provisioning
- The RPi5 Docker update mechanism needs a health check — if new images fail, auto-rollback to the previous tag

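The auto-rollback gotcha reduces to a small decision rule: after pulling new images, probe the updated container's health endpoint and roll back once failures persist. A sketch of just the decision logic (the actual probing and `docker compose` invocations are out of scope, and the names are illustrative):

```rust
// Sketch of the RPi5 update watchdog decision: keep the new image if it
// becomes healthy, roll back after N consecutive failed health probes.

#[derive(Debug, PartialEq)]
enum UpdateAction { Keep, Rollback }

fn decide(probe_results: &[bool], max_consecutive_failures: usize) -> UpdateAction {
    let mut streak = 0;
    for &healthy in probe_results {
        if healthy {
            streak = 0; // any success resets the failure streak
        } else {
            streak += 1;
            if streak >= max_consecutive_failures {
                return UpdateAction::Rollback; // re-pin the previous tag
            }
        }
    }
    UpdateAction::Keep
}

fn main() {
    // Flaky startup but eventually healthy: keep the new image.
    assert_eq!(decide(&[false, true, true, true], 3), UpdateAction::Keep);
    // Three consecutive failures: roll back.
    assert_eq!(decide(&[false, false, false], 3), UpdateAction::Rollback);
}
```

Requiring *consecutive* failures avoids rolling back on a single slow startup probe; the threshold should be generous on an RPi5, where first boot of a new image can be slow.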
---

## 14. Monitoring & Observability

### Recommendation: **OpenTelemetry** (traces + metrics + logs) exported to **self-hosted Grafana + Loki + Tempo + Prometheus** (on Hetzner PVE)

### Alternatives Considered

| Stack | Pros | Cons |
|-------|------|------|
| **OpenTelemetry + Grafana** | Vendor-neutral, excellent Rust support, unified pipeline | Some setup required |
| **Datadog** | All-in-one, excellent UX | Expensive at scale, vendor lock-in |
| **New Relic** | Good APM | Cost, Rust support less first-class |
| **Sentry** | Excellent error tracking | Limited metrics/traces, complementary rather than primary |

### Rust Instrumentation Stack

```toml
# Key crates
tracing = "0.1"                              # Structured logging/tracing facade
tracing-subscriber = "0.3"                   # Log formatting, filtering
tracing-opentelemetry = "0.28"               # Bridge tracing → OpenTelemetry
opentelemetry = "0.28"                       # OTel SDK
opentelemetry-otlp = "0.28"                  # OTLP exporter
opentelemetry-semantic-conventions = "0.28"  # Standard attribute names
```

### What to Monitor

**Application Metrics:**
- Request rate, latency (p50/p95/p99), error rate per endpoint
- WebSocket connection count per venue
- NATS message throughput and consumer lag
- Tournament clock drift (local node vs cloud time)
- Sync latency (time from local mutation to cloud persistence)
- Cache hit/miss ratios (DragonflyDB)

**Business Metrics:**
- Active tournaments per venue
- Players on waiting lists
- Concurrent connected users
- Tournament registrations per hour
- Offline duration per local node

**Infrastructure Metrics:**
- CPU, memory, disk per service
- RPi5 node health: temperature, memory usage, SD card wear level
- NATS cluster health
- Postgres connection pool utilization

### Local Node Observability

The RPi5 node should:

- Buffer OpenTelemetry spans/metrics locally when offline
- Flush to the cloud collector on reconnect
- Expose a local `/health` endpoint for venue staff to check node status
- Log to both stdout (for `journalctl`) and a rotating file

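The buffer-then-flush behavior above needs to stay memory-bounded across an outage that can last up to 72 hours, which means dropping the oldest telemetry first. A minimal sketch (the struct and serialized-string payloads are illustrative; a real implementation would spill to disk rather than hold everything in RAM):

```rust
use std::collections::VecDeque;

// Sketch of the offline telemetry buffer: bounded queue, oldest entries
// evicted first, drained in one batch when the cloud collector returns.

struct TelemetryBuffer {
    queue: VecDeque<String>, // serialized spans/metrics
    capacity: usize,
    dropped: u64, // worth exporting as its own metric after reconnect
}

impl TelemetryBuffer {
    fn new(capacity: usize) -> Self {
        Self { queue: VecDeque::new(), capacity, dropped: 0 }
    }

    fn push(&mut self, item: String) {
        if self.queue.len() == self.capacity {
            self.queue.pop_front(); // drop oldest so memory stays bounded
            self.dropped += 1;
        }
        self.queue.push_back(item);
    }

    /// On reconnect: drain everything for shipping to the cloud collector.
    fn flush(&mut self) -> Vec<String> {
        self.queue.drain(..).collect()
    }
}

fn main() {
    let mut buf = TelemetryBuffer::new(2);
    buf.push("span-1".into());
    buf.push("span-2".into());
    buf.push("span-3".into()); // evicts span-1
    assert_eq!(buf.dropped, 1);
    assert_eq!(buf.flush(), vec!["span-2".to_string(), "span-3".to_string()]);
    assert!(buf.queue.is_empty()); // buffer is empty after flushing
}
```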
### Alerting

- Use Grafana Alerting for cloud services
- Critical alerts: API error rate > 5%, NATS cluster partition, Postgres replication lag > 30s
- Warning alerts: RPi5 node offline > 5 min, sync backlog > 1000 events, high memory usage
- Notification channels: Slack/Discord for ops team, push notification for venue managers on critical local node issues

### Gotchas

- OpenTelemetry's Rust SDK is stable but evolving — pin versions carefully
- The `tracing` crate is the Rust ecosystem standard — everything (Axum, sqlx, async-nats) already emits tracing spans, so you get deep instrumentation for free
- Sampling is important at scale — don't trace every tournament clock tick in production
- Self-hosting Grafana + Loki + Tempo + Prometheus adds maintenance overhead — set retention policies early and budget disk space on the PVE host for log/trace storage

---

## 15. Testing Strategy

### Recommendation: Multi-layer testing with **cargo test** (unit/integration), **Playwright** (E2E), and **Vitest** (frontend unit)

### Test Pyramid

```
            ▲
           / \         E2E Tests (Playwright)
          /   \        - Full user flows
         /     \       - Display app rendering
        /───────\
       /         \     Integration Tests (cargo test + testcontainers)
      /           \    - API endpoint tests with real DB
     /             \   - NATS pub/sub flows
    /               \  - Sync protocol tests
   /─────────────────\
   Unit Tests (cargo test + vitest)
   - Domain logic (tournament engine, clock, waitlist)
   - Svelte component tests
   - Conflict resolution logic
```

### Backend Testing (Rust)

- **Unit tests**: Inline `#[cfg(test)]` modules for domain logic. The tournament engine, clock manager, waitlist priority algorithm, and conflict resolution are all pure functions that are easy to unit test.
- **Integration tests**: Use the `testcontainers` crate to spin up ephemeral Postgres + NATS + DragonflyDB instances. Test full API flows including auth, multi-tenancy, and real-time events.
- **sqlx compile-time checks**: SQL queries are validated against the database schema at compile time — this catches a huge class of bugs before runtime.
- **Property-based testing**: Use `proptest` for testing conflict resolution and the sync protocol with random inputs.
- **Test runner**: `cargo-nextest` for parallel test execution (significantly faster than default `cargo test`).

### Frontend Testing (TypeScript/Svelte)

- **Component tests**: Vitest + `@testing-library/svelte` for testing Svelte components in isolation.
- **Store/state tests**: Vitest for testing reactive state logic (tournament clock state, waitlist updates).
- **API mocking**: `msw` (Mock Service Worker) for intercepting API calls in tests.

### End-to-End Testing

- **Playwright**: Test critical user flows in real browsers:
  - Tournament creation and management flow
  - Player registration and waitlist join
  - Real-time updates (verify clock ticks appear in browser)
  - Multi-venue admin dashboard
  - Display app rendering (headless Chromium)
- **Local node E2E**: Test offline scenarios — start local node, disconnect from cloud, perform operations, reconnect, verify sync.

### Specialized Tests

- **Sync protocol tests**: Simulate network partitions, conflicting writes, replay scenarios
- **Load testing**: `k6` or `drill` (Rust) for WebSocket connection saturation, API throughput
- **Display tests**: Visual regression testing with Playwright screenshots of display layouts
- **Cross-browser**: Playwright covers Chromium, Firefox, WebKit — ensure the PWA works on all

### Gotchas

- Rust integration tests with testcontainers need Docker available in CI — run Forgejo Actions on a runner with access to a Docker socket (or Docker-in-Docker)
- Playwright tests are slow — run in parallel, and only test critical paths in CI (full suite nightly)
- The local node's offline/reconnect behavior is the hardest thing to test — invest heavily in deterministic sync protocol tests
- Mock the NATS connection in unit tests using a channel-based mock, not an actual NATS server

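The channel-based NATS mock from the last gotcha can be as small as a trait plus an `mpsc` channel: domain code publishes through the trait, and the test inspects what landed on the channel. The trait and function names are illustrative assumptions, not the async-nats API:

```rust
use std::sync::mpsc;

// Sketch of a channel-based NATS mock for unit tests: domain code
// depends on a `Publisher` trait, tests inject an mpsc-backed fake.

trait Publisher {
    fn publish(&self, subject: &str, payload: &str);
}

struct ChannelPublisher {
    tx: mpsc::Sender<(String, String)>,
}

impl Publisher for ChannelPublisher {
    fn publish(&self, subject: &str, payload: &str) {
        // In a real mock you might ignore send errors; unwrap is fine in tests.
        self.tx.send((subject.to_string(), payload.to_string())).unwrap();
    }
}

// Hypothetical domain function that emits an event when a player busts.
fn bust_player(publisher: &dyn Publisher, venue: u32, player: &str) {
    publisher.publish(&format!("venue.{venue}.tournament.player_busted"), player);
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let mock = ChannelPublisher { tx };
    bust_player(&mock, 42, "alice");
    // The test asserts on the captured subject and payload; no NATS server runs.
    let (subject, payload) = rx.recv().unwrap();
    assert_eq!(subject, "venue.42.tournament.player_busted");
    assert_eq!(payload, "alice");
}
```

In production, the same trait would be implemented over the real NATS client, so the domain crate (`pvm-core`) never needs a NATS dependency in its tests.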
---

## 16. Security

### Recommendation: Defense in depth across all layers

### Data Security

| Layer | Measure |
|-------|---------|
| **Transport** | TLS 1.3 everywhere — API, WebSocket, NATS, Postgres connections |
| **Data at rest** | Postgres: encrypted volumes on the PVE host. libSQL on RPi5: SQLCipher-compatible encryption via `libsql` |
| **Secrets** | Environment variables via Docker secrets / env files on the Hetzner host (cloud), encrypted config file on RPi5 (sealed at provisioning) |
| **Passwords** | Argon2id hashing, tuned per environment (higher params on cloud, lower on RPi5) |
| **JWTs** | Ed25519 signing, short expiry (15 min), refresh token rotation |
| **API keys** | SHA-256 hashed in DB, displayed once at creation, prefix-based identification (`pvm_live_`, `pvm_test_`) |

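The prefix-based identification in the API-keys row lets the server classify (and reject) a key before any database lookup. A sketch of just that routing step; the SHA-256 hashing of the full key is omitted since it needs an external crate such as `sha2`, and this is not the actual PVM code:

```rust
// Sketch of prefix-based API key identification: the prefix is the only
// part of the key that is stored and shown in plaintext; the rest is
// compared via its SHA-256 hash (hashing omitted here).

#[derive(Debug, PartialEq)]
enum KeyEnv { Live, Test }

fn classify_key(key: &str) -> Option<KeyEnv> {
    if key.starts_with("pvm_live_") {
        Some(KeyEnv::Live)
    } else if key.starts_with("pvm_test_") {
        Some(KeyEnv::Test)
    } else {
        None // unknown prefix: reject before touching the DB
    }
}

fn main() {
    assert_eq!(classify_key("pvm_live_abc123"), Some(KeyEnv::Live));
    assert_eq!(classify_key("pvm_test_abc123"), Some(KeyEnv::Test));
    assert_eq!(classify_key("sk_live_abc123"), None);
}
```

The prefix also makes leaked keys greppable in logs and repositories, which is the main reason services adopt this convention.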
### Network Security

- **API**: Rate limiting (Tower middleware), CORS restricted to known origins, request size limits
- **WebSocket**: Authenticated connection upgrade (JWT in first message or query param), per-connection rate limiting
- **NATS**: TLS + token auth between cloud and leaf nodes. Leaf nodes have scoped permissions (can only access their venue's subjects)
- **RPi5**: Firewall (nftables/ufw) — only allow outbound to cloud NATS + HTTPS, inbound on local network only for venue devices
- **DDoS**: Put Cloudflare (or a similar proxy) in front of the API if needed — a self-hosted server has no built-in DDoS protection beyond Hetzner's basic network filtering

### Buy-In & Game Data Integrity

PVM handles no money (decision #4): venues settle payments through their own POS systems. Buy-in, rebuy, and cashout *records* still feed venue reporting, so they deserve extra care:

- All such mutations are **event-sourced** with an immutable audit trail
- Record corrections require **admin approval** with a logged reason
- Buy-in/cashout records include **idempotency keys** to prevent duplicate entries
- Reports are only accessible to operator admins, with access logged
- If payment handling is ever added later, delegate to a payment processor (e.g. Stripe) rather than touching card data directly — PCI DSS compliance is a project in itself

### Local Node Security

The RPi5 is physically in a venue — assume it can be stolen or tampered with:

- **Disk encryption**: Full disk encryption (LUKS) or at minimum encrypted database
- **Secure boot**: Signed binaries, verified at startup
- **Remote wipe**: Cloud can send a command to reset the node to factory state
- **Tamper detection**: Log unexpected restarts, hardware changes
- **Credential scope**: Local node only has access to its venue's data — compromising one node doesn't expose other venues

### Gotchas

- DO NOT store payment card numbers — use a payment processor's tokenization
- GDPR/privacy: Player data across venues requires careful consent management. Players must be able to request data deletion.
- The local node's offline auth cache is a security risk — limit cached credentials, expire after configurable period
- Regularly rotate NATS credentials and JWT signing keys — automate this

---

## 17. Developer Experience

### Recommendation: **Cargo workspace** (Rust monorepo) + **pnpm workspace** (TypeScript) managed by **Turborepo**

### Monorepo Structure

```
pvm/
├── Cargo.toml              # Rust workspace root
├── turbo.json              # Turborepo config
├── package.json            # pnpm workspace root
├── pnpm-workspace.yaml
│
├── crates/                 # Rust crates
│   ├── pvm-api/            # Cloud API server (Axum)
│   ├── pvm-node/           # Local node binary
│   ├── pvm-ws-gateway/     # WebSocket gateway
│   ├── pvm-worker/         # Background job processor
│   ├── pvm-core/           # Shared domain logic
│   │   ├── tournament/     # Tournament engine
│   │   ├── waitlist/       # Waitlist management
│   │   ├── clock/          # Tournament clock
│   │   └── sync/           # Sync protocol
│   ├── pvm-db/             # Database layer (sqlx queries, migrations)
│   ├── pvm-auth/           # Auth logic (JWT, RBAC)
│   ├── pvm-nats/           # NATS client wrappers
│   └── pvm-types/          # Shared types (serde, utoipa derives)
│
├── apps/                   # TypeScript apps
│   ├── dashboard/          # SvelteKit admin dashboard
│   ├── player/             # SvelteKit player-facing app
│   ├── display/            # SvelteKit venue display app (static)
│   └── docs/               # Documentation site (optional)
│
├── packages/               # Shared TypeScript packages
│   ├── ui/                 # shadcn-svelte components
│   ├── api-client/         # Generated OpenAPI client
│   └── shared/             # Shared types, utilities
│
├── docker/                 # Dockerfiles
├── .forgejo/               # Forgejo Actions workflows
└── docs/                   # Project documentation
```

### Key Tools

| Tool | Purpose |
|------|---------|
| **Cargo** | Rust build system, workspace management |
| **pnpm** | Fast, disk-efficient Node.js package manager |
| **Turborepo** | Orchestrates build/test/lint across both Rust and TS workspaces. Caches build outputs. `--affected` flag for CI optimization. |
| **cargo-watch** | Auto-rebuild on Rust file changes during development |
| **cargo-nextest** | Faster test runner with parallel execution |
| **sccache** | Shared compilation cache (speeds up CI and local builds) |
| **cross** / **cargo-zigbuild** | Cross-compile Rust for RPi5 ARM64 |
| **Biome** | Fast linter + formatter for TypeScript (replaces ESLint + Prettier) |
| **clippy** | Rust linter (run with `--deny warnings` in CI) |
| **rustfmt** | Rust formatter (enforced in CI) |
| **lefthook** | Git hooks manager (format + lint on pre-commit) |

### Development Workflow

```bash
# Start everything for local development
turbo dev                        # Starts SvelteKit dev servers
cargo watch -x run -p pvm-api    # Auto-restart API on changes

# Run all tests
turbo test                       # TypeScript tests
cargo nextest run                # Rust tests

# Generate API client after backend changes
cargo run -p pvm-api -- --openapi > apps/dashboard/src/lib/api/schema.json
turbo generate:api-client

# Build for production
turbo build                      # TypeScript apps
cargo build --release -p pvm-api
cross build --release --target aarch64-unknown-linux-gnu -p pvm-node
```

### Gotchas

- Turborepo's Rust support is task-level (it runs `cargo` as a shell command) — it doesn't understand Cargo's internal dependency graph. Use the Cargo workspace for Rust-internal dependencies.
- Keep `pvm-core` as a pure library crate with no async runtime dependency — this lets it be used in both the cloud API and the local node without conflicts.
- Rust compile times are the bottleneck — invest in `sccache` and incremental compilation from day one
- Use `.cargo/config.toml` for cross-compilation targets and linker settings

---

## 18. CSS / Styling

### Recommendation: **Tailwind CSS v4** + **shadcn-svelte** component system

### Alternatives Considered

| Option | Pros | Cons |
|--------|------|------|
| **Tailwind CSS v4** | Utility-first, fast, excellent Svelte integration, v4 is faster with Rust-based engine | Learning curve for utility classes |
| **Vanilla CSS** | No dependencies, full control | Slow development, inconsistent patterns |
| **UnoCSS** | Atomic CSS, fast, flexible presets | Smaller ecosystem than Tailwind |
| **Open Props** | Design tokens as CSS custom properties | Not utility-first, less adoption |
| **Panda CSS** | Type-safe styles, zero runtime | Newer, smaller ecosystem |

### Reasoning

**Tailwind CSS v4** is the clear choice:

1. **Svelte integration**: Tailwind works seamlessly with SvelteKit via the Vite plugin. Svelte's template syntax + Tailwind utilities produce compact, readable component markup.
2. **Tailwind v4 improvements**: The v4 release includes a Rust-based engine (Oxide) that is significantly faster, CSS-first configuration (no more `tailwind.config.js`), automatic content detection, and native CSS cascade layers.
3. **shadcn-svelte**: The component library is built on Tailwind, providing a consistent design system with accessible, customizable components. Components are generated into your codebase — full ownership, no black box.
4. **Display app**: Tailwind's utility classes produce small CSS bundles (only used classes are included) — important for the resource-constrained Android boxes driving venue displays.
5. **Design tokens**: Use CSS custom properties (via Tailwind's theme) for venue-specific branding (colors, logos) that can be swapped at runtime.

### Design System Structure

```
packages/ui/
├── components/          # shadcn-svelte generated components
│   ├── button/
│   ├── card/
│   ├── data-table/
│   ├── dialog/
│   ├── form/
│   └── ...
├── styles/
│   ├── app.css          # Global styles, Tailwind imports
│   ├── themes/
│   │   ├── default.css  # Default PVM theme
│   │   ├── dark.css     # Dark mode overrides
│   │   └── cast.css     # Optimized for large screens
│   └── tokens.css       # Design tokens (colors, spacing, typography)
└── utils.ts             # cn() helper, variant utilities
```

### Venue Branding

Venues should be able to customize their displays:

```css
/* Runtime theme switching via CSS custom properties */
:root {
  --venue-primary: theme(colors.blue.600);
  --venue-secondary: theme(colors.gray.800);
  --venue-logo-url: url('/default-logo.svg');
}

/* Applied per-venue at runtime */
[data-venue-theme="vegas-poker"] {
  --venue-primary: #c41e3a;
  --venue-secondary: #1a1a2e;
  --venue-logo-url: url('/venues/vegas-poker/logo.svg');
}
```

### Gotchas

- Tailwind v4's CSS-first config is a paradigm shift from v3 — ensure all team documentation targets v4 syntax
- shadcn-svelte components use Tailwind v4 as of recent updates — verify compatibility
- Large data tables (tournament player lists, waitlists) need careful styling — consider virtualized rendering for 100+ row tables
- Venue display screens need large fonts and high contrast — create a dedicated `cast.css` theme
- Dark mode is essential for poker venues (low-light environments) — design dark-first

---

## Recommended Stack Summary

| Area | Recommendation | Key Reasoning |
|
|
|------|---------------|---------------|
|
|
| **Backend Language** | Rust | Memory efficiency on RPi5, performance, type safety |
| **Frontend Language** | TypeScript | Browser ecosystem standard, type safety |
| **Backend Framework** | Axum (v0.8+) | Tokio-native, Tower middleware, WebSocket support |
| **Frontend Framework** | SvelteKit (Svelte 5) | Smallest bundles, fine-grained reactivity, PWA support |
| **UI Components** | shadcn-svelte | Accessible, Tailwind-based, full ownership |
| **Cloud Database** | PostgreSQL 16+ | Multi-tenant gold standard, RLS, JSONB |
| **Local Database** | libSQL (embedded) | SQLite-compatible, tiny footprint, Rust-native |
| **ORM / Queries** | sqlx | Compile-time checked SQL, Postgres + SQLite support |
| **Caching** | DragonflyDB | Redis-compatible, multi-threaded, memory efficient |
| **Messaging** | NATS + JetStream | Edge-native leaf nodes, sub-ms latency, lightweight |
| **Real-Time** | WebSockets (Axum) + SSE fallback | Full duplex, NATS-backed fan-out |
| **Auth** | Custom JWT + RBAC | Offline-capable, cross-venue, full control |
| **API Design** | REST + OpenAPI 3.1 | Generated TypeScript client, universal compatibility |
| **Mobile** | PWA first, Capacitor later | One codebase, offline support, app store when needed |
| **Displays** | Generic web app + Android display client | No Cast SDK dependency, works offline, mDNS auto-discovery |
| **Deployment** | Hetzner PVE + Docker (LXC containers) | Self-hosted, full control, existing infrastructure |
| **CI/CD** | Forgejo Actions + Turborepo | Cross-language build orchestration, caching |
| **Monitoring** | OpenTelemetry + Grafana | Vendor-neutral, excellent Rust support |
| **Testing** | cargo-nextest + Vitest + Playwright | Full pyramid: unit, integration, E2E |
| **Styling** | Tailwind CSS v4 | Fast, small bundles, Svelte-native |
| **Monorepo** | Cargo workspace + pnpm + Turborepo | Unified builds, shared types |
| **Linting** | clippy + Biome | Rust + TypeScript coverage |
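The messaging and real-time rows above lean on NATS's dot-separated subject hierarchy for venue-scoped fan-out. A minimal sketch of one possible subject scheme — the token names (`pvm`, `venue`, `tournament`) are illustrative assumptions, not a committed schema:

```rust
/// Builds the event subject for one tournament. The leading tokens scope
/// by region and venue so a leaf node can subscribe with wildcards.
/// (Hypothetical layout — the real subject schema is defined elsewhere.)
fn event_subject(region: &str, venue_id: &str, tournament_id: &str) -> String {
    format!("pvm.{region}.venue.{venue_id}.tournament.{tournament_id}.events")
}

fn main() {
    let subject = event_subject("eu", "v42", "t7");
    println!("{subject}"); // pvm.eu.venue.v42.tournament.t7.events

    // A local node that wants everything for its venue would subscribe to
    // "pvm.eu.venue.v42.>" — '>' is the NATS multi-token wildcard.
    assert!(subject.starts_with("pvm.eu.venue.v42."));
}
```

Keeping the region as the second token is what lets the schema grow into multi-region later (decision 5) without renaming every subject.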
---

## Decisions Made

> Resolved during tech stack review session, 2026-02-08.

| # | Question | Decision |
|---|----------|----------|
| 1 | **Hosting** | **Self-hosted on Hetzner PVE** — LXC containers. Already have infrastructure. No Fly.io dependency. |
| 2 | **Sync strategy** | **Event-based sync via NATS JetStream** — all mutations are events, and the local node replays events to build state. Perfect audit trail. No table-vs-row debate. |
| 3 | **NATS on RPi5** | **Sidecar** — separate process managed by systemd/Docker. Independently upgradeable and monitorable. |
| 4 | **Financial data** | **No money handling at all.** Venues handle payments via their own POS systems (most are cash-based). PVM only tracks game data. |
| 5 | **Multi-region** | **Single region initially.** Design the DB schema and NATS subjects for eventual multi-region without a rewrite. |
| 6 | **Player accounts** | **PVM signup first.** Players always create a PVM account before joining venues. No deduplication problem. |
| 7 | **Display strategy** | **Generic web app + Android display client.** TVs run a simple Android app (or a $40 Android box) that connects to the local node via mDNS auto-discovery, receives its display assignment via WebSocket, and renders a web page. Falls back to cloud SaaS if the local node is offline. Chromecast is supported but not the primary path. No Google Cast SDK dependency. |
| 8 | **RPi5 provisioning** | **Docker on stock Raspberry Pi OS.** All PVM services (node, NATS) run as containers. Updates via image pulls. Provisioning is a one-liner curl script. |
| 9 | **Offline duration** | **72 hours.** Covers a full weekend tournament series. After 72h offline, warn staff but keep operating. Sync everything on reconnect. |
| 10 | **API style** | **REST + OpenAPI 3.1.** Auto-generated TypeScript client. Universal, debuggable, works with everything. |
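Decision 2's replay model can be sketched without any NATS machinery: the local node never mutates state directly, it folds an ordered event stream into current state, so replaying the same stream (e.g., after the 72h offline window) always reconstructs the same state. The event and state shapes below are toy assumptions, not the real schema:

```rust
// Illustrative event set — the production schema is defined elsewhere.
#[derive(Debug)]
enum Event {
    PlayerRegistered { player_id: u64 },
    PlayerEliminated { player_id: u64 },
}

#[derive(Debug, Default)]
struct TournamentState {
    registered: Vec<u64>,
    active: Vec<u64>,
}

impl TournamentState {
    /// Applies one event; the only way state ever changes.
    fn apply(mut self, event: &Event) -> Self {
        match event {
            Event::PlayerRegistered { player_id } => {
                self.registered.push(*player_id);
                self.active.push(*player_id);
            }
            Event::PlayerEliminated { player_id } => {
                self.active.retain(|id| id != player_id);
            }
        }
        self
    }
}

/// Rebuilds state from scratch by folding the full event stream —
/// in production the stream would come from a JetStream replay.
fn replay(events: &[Event]) -> TournamentState {
    events.iter().fold(TournamentState::default(), |s, e| s.apply(e))
}

fn main() {
    let events = vec![
        Event::PlayerRegistered { player_id: 1 },
        Event::PlayerRegistered { player_id: 2 },
        Event::PlayerEliminated { player_id: 1 },
    ];
    let state = replay(&events);
    println!("{} registered, {} active", state.registered.len(), state.active.len());
    // prints "2 registered, 1 active"
}
```

Because `replay` is a pure fold, the event log doubles as the audit trail the decision calls out: any historical state is recoverable by folding a prefix of the stream.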
## Deferred Questions

These remain open for future consideration:

1. **API versioning strategy**: Maintain backward compatibility as long as possible. Only version on breaking changes. Revisit when approaching the first external API consumers.

2. **GraphQL for player-facing app**: REST is sufficient for v1. The player app might benefit from GraphQL's flexible querying later (e.g., "show me my upcoming tournaments across all venues with waitlist status"). **Revisit after v1 launch.**

3. **WebTransport**: When browser support matures, it could replace WebSockets for lower-latency real-time streams. **Monitor but do not adopt yet.**

4. **WASM on local node**: Could parts of the frontend run on the local node via WASM for ultra-fast local rendering? **Defer.**

5. **AI features**: Player behavior analytics, optimal table assignments, tournament structure recommendations. The data model should be designed to support future ML pipelines. **Design for it, build later.**