From cf03b3592a5645c92caaa2eed87c167d443f0f70 Mon Sep 17 00:00:00 2001 From: Mikkel Georgsen Date: Sun, 8 Feb 2026 02:50:33 +0100 Subject: [PATCH] Add comprehensive tech stack research document 1,190-line research covering all 18 technology areas for PVM: Rust/Axum backend, SvelteKit frontend, Postgres + libSQL databases, NATS + JetStream messaging, DragonflyDB caching, and more. Includes recommended stack summary and open questions. Co-Authored-By: Claude Opus 4.6 --- docs/TECH_STACK_RESEARCH.md | 1190 +++++++++++++++++++++++++++++++++++ 1 file changed, 1190 insertions(+) create mode 100644 docs/TECH_STACK_RESEARCH.md diff --git a/docs/TECH_STACK_RESEARCH.md b/docs/TECH_STACK_RESEARCH.md new file mode 100644 index 0000000..229cafa --- /dev/null +++ b/docs/TECH_STACK_RESEARCH.md @@ -0,0 +1,1190 @@ +# PVM (Poker Venue Manager) — Tech Stack Research + +> Generated: 2026-02-08 +> Status: DRAFT — for discussion and refinement + +--- + +## Table of Contents + +1. [Programming Language](#1-programming-language) +2. [Backend Framework](#2-backend-framework) +3. [Frontend Framework](#3-frontend-framework) +4. [Database Strategy](#4-database-strategy) +5. [Caching Layer](#5-caching-layer) +6. [Message Queue / Event Streaming](#6-message-queue--event-streaming) +7. [Real-Time Communication](#7-real-time-communication) +8. [Auth & Authorization](#8-auth--authorization) +9. [API Design](#9-api-design) +10. [Local Node Architecture](#10-local-node-architecture) +11. [Chromecast / Display Streaming](#11-chromecast--display-streaming) +12. [Mobile Strategy](#12-mobile-strategy) +13. [Deployment & Infrastructure](#13-deployment--infrastructure) +14. [Monitoring & Observability](#14-monitoring--observability) +15. [Testing Strategy](#15-testing-strategy) +16. [Security](#16-security) +17. [Developer Experience](#17-developer-experience) +18. [CSS / Styling](#18-css--styling) +19. [Recommended Stack Summary](#recommended-stack-summary) +20. 
[Open Questions / Decisions Needed](#open-questions--decisions-needed) + +--- + +## 1. Programming Language + +### Recommendation: **Rust** (backend + local node) + **TypeScript** (frontend + shared types) + +### Alternatives Considered + +| Language | Pros | Cons | +|----------|------|------| +| **Rust** | Memory safety, fearless concurrency, tiny binaries for RPi5, no GC pauses, excellent WebSocket perf | Steeper learning curve, slower compile times | +| **Go** | Simple, fast compilation, good concurrency | Less expressive type system, GC pauses (minor), larger binaries than Rust | +| **TypeScript (full-stack)** | One language everywhere, huge ecosystem, fast dev velocity | Node.js memory overhead on RPi5, GC pauses in real-time scenarios, weaker concurrency model | +| **Elixir** | Built for real-time (Phoenix), fault-tolerant OTP | Small ecosystem, harder to find libs, RPi5 BEAM VM overhead | + +### Reasoning + +Rust is the strongest choice for PVM because of the **RPi5 local node constraint**. The local node must run reliably on constrained hardware with limited memory, handle real-time tournament clocks, manage offline operations, and sync data. Rust's zero-cost abstractions, lack of garbage collector, and small binary sizes (typically 5-15 MB static binaries) make it ideal for this. + +For the **cloud backend**, Rust's performance means fewer servers and lower hosting costs. A single Rust service can handle thousands of concurrent WebSocket connections with minimal memory overhead — critical for real-time tournament updates across many venues. + +The **"all code written by Claude Code"** constraint actually favors Rust: Claude has excellent Rust fluency, and the compiler's strict type system catches bugs that would otherwise require extensive testing in dynamic languages. 
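As a small, hypothetical illustration of the compiler catching bugs (`TournamentState` and `advance_level` are illustrative names, not PVM's actual API): modeling the tournament lifecycle as an exhaustive enum means a forgotten state is a compile error, not a runtime surprise.

```rust
// Hypothetical sketch: tournament lifecycle as an exhaustive enum.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TournamentState {
    Registering,
    Running { level: u32 },
    Paused { level: u32 },
    Finished,
}

impl TournamentState {
    // Every state must be handled; adding a new variant later makes any
    // non-exhaustive match a compile error until the case is covered.
    fn advance_level(self) -> TournamentState {
        match self {
            TournamentState::Running { level } => TournamentState::Running { level: level + 1 },
            other => other, // only a running clock advances blinds
        }
    }
}

fn main() {
    let s = TournamentState::Running { level: 4 };
    assert_eq!(s.advance_level(), TournamentState::Running { level: 5 });
    // Paused tournaments keep their level.
    let p = TournamentState::Paused { level: 4 };
    assert_eq!(p.advance_level(), p);
    println!("ok");
}
```

In a dynamic language the equivalent mistake — a new lifecycle state silently falling through a handler — typically only surfaces in testing or production.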
+ +**TypeScript** remains the right choice for the frontend — the browser ecosystem is TypeScript-native, and sharing type definitions between Rust (via generated OpenAPI types) and TypeScript gives end-to-end type safety. + +### Gotchas + +- Rust compile times can be mitigated with `cargo-watch`, incremental compilation, and `sccache` +- Cross-compilation for RPi5 (ARM64) is well-supported via `cross` or `cargo-zigbuild` +- Shared domain types can be generated from Rust structs to TypeScript via `ts-rs` or OpenAPI codegen + +--- + +## 2. Backend Framework + +### Recommendation: **Axum** (v0.8+) + +### Alternatives Considered + +| Framework | Pros | Cons | +|-----------|------|------| +| **Axum** | Tokio-native, excellent middleware (Tower), lowest memory footprint, growing ecosystem, WebSocket built-in | Younger than Actix | +| **Actix Web** | Highest raw throughput, most mature | Actor model adds complexity, not Tokio-native (uses own runtime fork) | +| **Rocket** | Most ergonomic, Rails-like DX | Slower performance, less flexible middleware | +| **Loco** | Rails-like conventions, batteries-included | Very new (2024), smaller community, opinionated | + +### Reasoning + +**Axum** is the clear winner for PVM: + +1. **Tokio-native**: Axum is built directly on Tokio + Hyper + Tower. Since NATS, database drivers, and WebSocket handling all use Tokio, everything shares one async runtime — no impedance mismatch. +2. **Tower middleware**: The Tower service/layer pattern gives composable middleware for auth, rate limiting, tracing, compression, CORS, etc. Middleware can be shared between HTTP and WebSocket handlers. +3. **WebSocket support**: First-class WebSocket extraction with `axum::extract::ws`, typed WebSocket messages via `axum-typed-websockets`. +4. **Memory efficiency**: Benchmarks show Axum achieves the lowest memory footprint per connection — critical when serving thousands of concurrent venue connections. +5. 
**OpenAPI integration**: `utoipa` crate provides derive macros for generating OpenAPI 3.1 specs directly from Axum handler types. +6. **Extractor pattern**: Axum's extractor-based request handling maps cleanly to domain operations (extract tenant, extract auth, extract venue context). + +### Key Libraries + +- `axum` — HTTP framework +- `axum-extra` — typed headers, cookie jar, multipart +- `tower` + `tower-http` — middleware stack (CORS, compression, tracing, rate limiting) +- `utoipa` + `utoipa-axum` — OpenAPI spec generation +- `utoipa-swagger-ui` — embedded Swagger UI +- `axum-typed-websockets` — strongly typed WS messages + +### Gotchas + +- Axum's error handling requires careful design — use `thiserror` + a custom error type that implements `IntoResponse` +- Route organization: use `axum::Router::nest()` for modular route trees per domain (tournaments, venues, players) +- State management: use `axum::extract::State` with `Arc` — avoid the temptation to put everything in one giant state struct + +--- + +## 3. Frontend Framework + +### Recommendation: **SvelteKit** (Svelte 5 + runes reactivity) + +### Alternatives Considered + +| Framework | Pros | Cons | +|-----------|------|------| +| **SvelteKit** | Smallest bundles, true compilation (no virtual DOM), built-in routing/SSR/PWA, Svelte 5 runes are elegant | Smaller ecosystem than React | +| **Next.js (React)** | Largest ecosystem, most libraries, biggest job market | Vercel lock-in concerns, React hydration overhead, larger bundles, RSC complexity | +| **SolidStart** | Finest-grained reactivity, near-zero overhead updates | Smallest ecosystem, least mature, fewer component libraries | +| **Nuxt (Vue)** | Good DX, solid ecosystem | Vue 3 composition API less elegant than Svelte 5 runes | + +### Reasoning + +**SvelteKit** is the best fit for PVM for several reasons: + +1. **Performance matters for venue displays**: Tournament clocks, waiting lists, and seat maps will run on venue TVs via Chromecast. 
Svelte's compiled output produces minimal JavaScript — the Cast receiver app will load faster and use less memory on Chromecast hardware. +2. **Real-time UI updates**: Svelte 5's fine-grained reactivity (runes: `$state`, `$derived`, `$effect`) means updating a single timer or seat status re-renders only that DOM node, not a virtual DOM diff. This is ideal for dashboards with many independently updating elements. +3. **PWA support**: SvelteKit has first-class service worker support and offline capabilities through `@sveltejs/adapter-static` and `vite-plugin-pwa`. +4. **Bundle size**: SvelteKit produces the smallest JavaScript bundles of any major framework — important for mobile PWA users on venue WiFi. +5. **Claude Code compatibility**: Svelte's template syntax is straightforward and less boilerplate than React — Claude can generate clean, readable Svelte components efficiently. +6. **No framework lock-in**: Svelte compiles away, so there's no runtime dependency. The output is vanilla JS + DOM manipulation. + +### UI Component Library + +**Recommendation: Skeleton UI** (Svelte-native) or **shadcn-svelte** (Tailwind-based, port of shadcn/ui) + +`shadcn-svelte` is particularly compelling because: +- Components are copied into your codebase (not a dependency) — full control +- Built on Tailwind CSS — consistent with the styling recommendation +- Accessible by default (uses Bits UI primitives under the hood) +- Matches the design patterns of the widely-used shadcn/ui ecosystem + +### Gotchas + +- SvelteKit's SSR is useful for the management dashboard but the Cast receiver and PWA may use `adapter-static` for pure SPA mode +- Svelte's ecosystem is smaller than React's, but for PVM's needs (forms, tables, charts, real-time) the ecosystem is sufficient +- Svelte 5 (runes) is a significant API change from Svelte 4 — ensure all examples and libraries target Svelte 5 + +--- + +## 4. 
Database Strategy + +### Recommendation: **PostgreSQL** (cloud primary) + **libSQL/SQLite** (local node) + **Electric SQL** or custom sync + +### Alternatives Considered + +| Approach | Pros | Cons | +|----------|------|------| +| **Postgres cloud + libSQL local + sync** | Best of both worlds — Postgres power in cloud, SQLite simplicity on RPi5 | Need sync layer, schema divergence risk | +| **Postgres everywhere** | One DB engine, simpler mental model | Postgres on RPi5 uses more memory, harder offline | +| **libSQL/Turso everywhere** | One engine, built-in edge replication | Less powerful for complex cloud queries, multi-tenant partitioning | +| **CockroachDB** | Distributed, strong consistency | Heavy for RPi5, expensive, overkill | + +### Detailed Recommendation + +**Cloud Database: PostgreSQL 16+** +- The gold standard for multi-tenant SaaS +- Row-level security (RLS) for tenant isolation +- JSONB for flexible per-venue configuration +- Excellent full-text search for player lookup across venues +- Partitioning by tenant for performance at scale +- Managed options: Neon (serverless, branching for dev), Supabase, or AWS RDS + +**Local Node Database: libSQL (via Turso's embedded runtime)** +- Fork of SQLite with cloud sync capabilities +- Runs embedded in the Rust binary — no separate database process on RPi5 +- WAL mode for concurrent reads during tournament operations +- Tiny memory footprint (< 10 MB typical) +- libSQL's Rust driver (`libsql`) is well-maintained + +**Sync Strategy:** + +The local node operates on a **subset** of the cloud data — only data relevant to its venue(s). The sync approach: + +1. **Cloud-to-local**: Player profiles, memberships, credit lines pushed to local node via NATS JetStream. Local node maintains a read replica of relevant data in libSQL. +2. **Local-to-cloud**: Tournament results, waitlist changes, transactions pushed to cloud via NATS JetStream with at-least-once delivery. Cloud processes as events. +3. 
**Conflict resolution**: Last-writer-wins (LWW) with vector clocks for most entities. For financial data (credit lines, buy-ins), use **event sourcing** — conflicts are impossible because every transaction is an immutable event. +4. **Offline queue**: When disconnected, local node queues mutations in a local WAL-style append-only log. On reconnect, replays in order via NATS. + +### ORM / Query Layer + +**Recommendation: `sqlx`** (compile-time checked queries) + +- `sqlx` checks SQL queries against the actual database schema at compile time +- No ORM abstraction layer — write real SQL, get compile-time safety +- Supports both PostgreSQL and SQLite/libSQL +- Avoids the N+1 query problems that ORMs introduce +- Migrations via `sqlx migrate` + +Alternative: `sea-orm` if you want a full ORM, but for PVM the explicit SQL approach of `sqlx` gives more control over multi-tenant queries and complex joins. + +### Migrations + +- Use `sqlx migrate` for cloud PostgreSQL migrations +- Maintain parallel migration files for libSQL (SQLite-compatible subset) +- A shared migration test ensures both schemas stay compatible for the sync subset + +### Gotchas + +- PostgreSQL and SQLite have different SQL dialects — the sync subset must use compatible types (no Postgres-specific types in synced tables) +- libSQL's `VECTOR` type is interesting for future player similarity features but not needed initially +- Turso's hosted libSQL replication is an option but adds a dependency — prefer embedded libSQL with custom NATS-based sync for more control +- Schema versioning must be tracked on the local node so the cloud knows what schema version it's talking to + +--- + +## 5. 
Caching Layer + +### Recommendation: **DragonflyDB** + +### Alternatives Considered + +| Option | Pros | Cons | +|--------|------|------| +| **DragonflyDB** | 25x Redis throughput, Redis-compatible API, multi-threaded, lower memory usage | Younger project, smaller community | +| **Redis 7+** | Most mature, largest ecosystem, Redis Stack modules | Single-threaded core, BSL license concerns since Redis 7.4 | +| **Valkey** | Redis fork, community-driven, BSD license | Still catching up to Redis feature parity | +| **KeyDB** | Multi-threaded Redis fork | Development appears stalled (no updates in 1.5+ years) | +| **No cache (just Postgres)** | Simpler architecture | Higher DB load, slower for session/real-time data | + +### Reasoning + +**DragonflyDB** is the right choice for PVM: + +1. **Redis API compatibility**: Drop-in replacement — all Redis client libraries work unchanged. The `fred` Rust crate (async Redis client) works with DragonflyDB out of the box. +2. **Multi-threaded architecture**: DragonflyDB uses all available CPU cores, unlike Redis's single-threaded model. This matters when caching tournament state for hundreds of concurrent venues. +3. **Memory efficiency**: DragonflyDB uses up to 80% less memory than Redis for the same dataset — important for keeping infrastructure costs low. +4. **No license concerns**: DragonflyDB uses BSL 1.1 (converts to open source after 4 years). Redis switched to a dual-license model that's more restrictive. Valkey is BSD but is playing catch-up. +5. **Pub/Sub**: DragonflyDB supports Redis Pub/Sub — useful as a lightweight complement to NATS for in-process event distribution within the backend cluster. 
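The cacheable data listed below all follows the same cache-aside shape: try the cache, fall back to Postgres, then repopulate. A minimal dependency-free sketch of that pattern — in production this would be the `fred` client talking to DragonflyDB; `TtlCache` and `load_tournament_state` are illustrative stand-ins, not real APIs:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Illustrative in-process TTL cache standing in for DragonflyDB.
struct TtlCache {
    entries: HashMap<String, (String, Instant)>,
    ttl: Duration,
}

impl TtlCache {
    fn new(ttl: Duration) -> Self {
        Self { entries: HashMap::new(), ttl }
    }

    fn get(&self, key: &str) -> Option<&str> {
        self.entries
            .get(key)
            .and_then(|(v, at)| (at.elapsed() < self.ttl).then_some(v.as_str()))
    }

    fn put(&mut self, key: &str, value: String) {
        self.entries.insert(key.to_string(), (value, Instant::now()));
    }
}

// Cache-aside: read through the cache, fall back to the "database", repopulate.
fn load_tournament_state(cache: &mut TtlCache, key: &str) -> String {
    if let Some(hit) = cache.get(key) {
        return hit.to_string();
    }
    let fresh = format!("state-for:{key}"); // stand-in for a Postgres query
    cache.put(key, fresh.clone());
    fresh
}

fn main() {
    let mut cache = TtlCache::new(Duration::from_secs(30));
    let a = load_tournament_state(&mut cache, "venue.123.tournament.456");
    let b = load_tournament_state(&mut cache, "venue.123.tournament.456"); // cache hit
    assert_eq!(a, b);
    println!("ok");
}
```

Keeping the cache layer optional falls out of this pattern naturally: on a miss or cache outage, the loader simply hits Postgres.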
+ +### What to Cache + +- **Session data**: User sessions, JWT refresh tokens +- **Tournament state**: Current level, blinds, clock, player counts (hot read path) +- **Waiting lists**: Ordered sets per venue/game type +- **Rate limiting**: API rate limit counters +- **Player lookup cache**: Frequently accessed player profiles +- **Seat maps**: Current table/seat assignments per venue + +### What NOT to Cache (use Postgres directly) + +- Financial transactions (credit lines, buy-ins) — always hit the source of truth +- Audit logs +- Historical tournament data + +### Local Node: No DragonflyDB + +The RPi5 local node should **not** run DragonflyDB. libSQL is fast enough for local caching needs, and adding another process increases complexity and memory usage on constrained hardware. Use in-memory Rust data structures (e.g., `DashMap`, `moka` cache crate) for hot local state. + +### Gotchas + +- DragonflyDB's replication features are less mature than Redis Sentinel/Cluster — use managed hosting or keep it simple with a single node + persistence initially +- Monitor DragonflyDB's release cycle — it's actively developed but younger than Redis +- Keep the cache layer optional — the system should function (slower) without it + +--- + +## 6. 
Message Queue / Event Streaming + +### Recommendation: **NATS + JetStream** + +### Alternatives Considered + +| Option | Pros | Cons | +|--------|------|------| +| **NATS + JetStream** | Lightweight (single binary, ~20MB), sub-ms latency, built-in persistence, embedded mode, perfect for edge | Smaller community than Kafka | +| **Apache Kafka** | Highest throughput, mature, excellent tooling | Heavy (JVM, ZooKeeper/KRaft), 4GB+ RAM minimum, overkill for PVM's scale | +| **RabbitMQ** | Mature AMQP, sophisticated routing | Higher latency (5-20ms), more memory, Erlang ops complexity | +| **Redis Streams** | Simple, already have cache layer | Not designed for reliable message delivery at scale | + +### Reasoning + +**NATS + JetStream** is purpose-built for PVM's architecture: + +1. **Edge-native**: NATS can run as a **leaf node** on the RPi5, connecting to the cloud NATS cluster. This is the core of the local-to-cloud sync architecture. When the connection drops, JetStream buffers messages locally and replays them on reconnect. + +2. **Lightweight**: NATS server is a single ~20 MB binary. On RPi5, it uses ~50 MB RAM. Compare to Kafka's 4 GB minimum. + +3. **Sub-millisecond latency**: Core NATS delivers messages in < 1ms. JetStream (persistent) adds 1-5ms. This is critical for real-time tournament updates — when a player busts, every connected display should update within milliseconds. + +4. **Subject-based addressing**: NATS subjects map perfectly to PVM's domain: + - `venue.{venue_id}.tournament.{id}.clock` — tournament clock ticks + - `venue.{venue_id}.waitlist.update` — waiting list changes + - `venue.{venue_id}.seats.{table_id}` — seat assignments + - `player.{player_id}.notifications` — player-specific events + - `sync.{node_id}.upstream` — local node to cloud sync + - `sync.{node_id}.downstream` — cloud to local node sync + +5. 
**Built-in patterns**: Request/reply (for RPC between cloud and node), pub/sub (for broadcasts), queue groups (for load-balanced consumers), key-value store (for distributed config), object store (for binary data like player photos). + +6. **JetStream for durability**: Tournament results, financial transactions, and sync operations need guaranteed delivery. JetStream provides at-least-once and exactly-once delivery semantics with configurable retention. + +### Architecture + +``` +RPi5 Local Node Cloud +┌──────────────┐ ┌──────────────────┐ +│ NATS Leaf │◄──── TLS ────►│ NATS Cluster │ +│ Node │ (auto- │ (3-node) │ +│ │ reconnect) │ │ +│ JetStream │ │ JetStream │ +│ (local buf) │ │ (persistent) │ +└──────────────┘ └──────────────────┘ +``` + +### Gotchas + +- NATS JetStream's exactly-once semantics require careful consumer design — use idempotent handlers with deduplication IDs +- Subject namespace design is critical — plan it early, changing later is painful +- NATS leaf nodes need TLS configured for secure cloud connection +- Monitor JetStream stream sizes on RPi5 — set max bytes limits to avoid filling the SD card during extended offline periods +- The `async-nats` Rust crate is the official async client — well maintained and Tokio-native + +--- + +## 7. 
Real-Time Communication + +### Recommendation: **WebSockets** (via Axum) for interactive clients + **NATS** for backend fan-out + **SSE** as fallback + +### Alternatives Considered + +| Option | Pros | Cons | +|--------|------|------| +| **WebSockets** | Full duplex, low latency, wide support | Requires connection management, can't traverse some proxies | +| **Server-Sent Events (SSE)** | Simpler, auto-reconnect, HTTP-native | Server-to-client only, no binary support | +| **WebTransport** | HTTP/3, multiplexed streams, unreliable mode | Very new, limited browser support, no Chromecast support | +| **Socket.IO** | Auto-fallback, rooms, namespaces | Node.js-centric, adds overhead, not Rust-native | +| **gRPC streaming** | Typed, efficient, bidirectional | Not browser-native (needs grpc-web proxy), overkill | + +### Architecture + +The real-time pipeline has three layers: + +1. **NATS** (backend event bus): All state changes publish to NATS subjects. This is the single source of real-time truth. Both cloud services and local nodes publish here. + +2. **WebSocket Gateway** (Axum): A dedicated Axum service subscribes to relevant NATS subjects and fans out to connected WebSocket clients. Each client subscribes to the venues/tournaments they care about. + +3. **SSE Fallback**: For environments where WebSockets are blocked (some corporate networks), provide an SSE endpoint that delivers the same event stream. SSE's built-in auto-reconnect with `Last-Event-ID` makes resumption simple. 
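The gateway's core routing decision — which subscribed clients receive a given NATS event — reduces to NATS-style subject matching. A dependency-free sketch of the matching rules (`*` matches exactly one token, `>` matches one or more trailing tokens); `subject_matches` is an illustrative helper, not part of `async-nats`:

```rust
// Sketch of NATS subject matching as used by the WebSocket gateway to decide
// fan-out. `*` matches exactly one dot-separated token; `>` matches one or
// more trailing tokens.
fn subject_matches(pattern: &str, subject: &str) -> bool {
    let mut pat = pattern.split('.');
    let mut sub = subject.split('.');
    loop {
        match (pat.next(), sub.next()) {
            (Some(">"), Some(_)) => return true,      // `>` swallows the rest
            (Some("*"), Some(_)) => continue,         // `*` matches any one token
            (Some(p), Some(s)) if p == s => continue, // literal token match
            (None, None) => return true,              // both exhausted: match
            _ => return false,                        // length or token mismatch
        }
    }
}

fn main() {
    assert!(subject_matches("venue.123.tournament.*", "venue.123.tournament.456"));
    assert!(subject_matches("venue.*.waitlist.update", "venue.999.waitlist.update"));
    assert!(subject_matches("venue.>", "venue.123.seats.t1"));
    assert!(!subject_matches("venue.123.tournament.*", "venue.123.waitlist.update"));
    println!("ok");
}
```

Note that `*` never spans multiple tokens: a gateway that wants every event under a tournament (`…tournament.456.clock`, `…tournament.456.seats`, etc.) must subscribe with `>` rather than `*`.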
+ +### Flow Example: Tournament Clock Update + +``` +Tournament Service (Rust) + → publishes to NATS: venue.123.tournament.456.clock {level: 5, time_remaining: 1200} + → WebSocket Gateway subscribes to venue.123.tournament.> + → fans out to all connected clients watching tournament 456 + → Chromecast receiver app gets update, renders clock + → PWA on player's phone gets update, shows current level +``` + +### Implementation Details + +- Use `axum::extract::ws::WebSocket` with `tokio::select!` to multiplex NATS subscription + client messages +- Implement heartbeat/ping-pong to detect stale connections (30s interval, 10s timeout) +- Client reconnection with exponential backoff + subscription replay from NATS JetStream +- Binary message format: consider MessagePack (`rmp-serde`) for compact payloads over WebSocket, with JSON as human-readable fallback +- Connection limits: track per-venue connection count, implement backpressure + +### Gotchas + +- WebSocket connections are stateful — need sticky sessions or a connection registry if running multiple gateway instances +- Chromecast receiver apps have limited WebSocket support — test thoroughly on actual hardware +- Mobile PWAs going to background will drop WebSocket connections — design for reconnection and state catch-up +- Rate limit outbound messages to prevent flooding slow clients (tournament clock ticks should be throttled to 1/second for display, even if internal state updates more frequently) + +--- + +## 8. 
Auth & Authorization + +### Recommendation: **Custom JWT auth** with **Postgres-backed RBAC** + optional **OAuth2 social login** + +### Alternatives Considered + +| Option | Pros | Cons | +|--------|------|------| +| **Custom JWT + RBAC** | Full control, no vendor dependency, works offline on local node | Must implement everything yourself | +| **Auth0 / Clerk** | Managed, social login, MFA out of box | Vendor lock-in, cost scales with users, doesn't work offline | +| **Keycloak** | Self-hosted, full-featured, OIDC/SAML | Heavy (Java), complex to operate, overkill | +| **Ory (Kratos + Keto)** | Open source, cloud-native, API-first | Multiple services to deploy, newer | +| **Lucia Auth** | Lightweight, framework-agnostic | TypeScript-only, no Rust support | + +### Architecture + +PVM's auth has a unique challenge: **cross-venue universal player accounts** that must work both online (cloud) and offline (local node). This rules out purely managed auth services. + +**Token Strategy:** + +``` +Access Token (JWT, short-lived: 15 min) +├── sub: player_id (universal) +├── tenant_id: current operator +├── venue_id: current venue (if applicable) +├── roles: ["player", "dealer", "floor_manager", "admin"] +├── permissions: ["tournament.manage", "waitlist.view", ...] +└── iat, exp, iss + +Refresh Token (opaque, stored in DB/DragonflyDB, long-lived: 30 days) +└── Rotated on each use, old tokens invalidated +``` + +**RBAC Model:** + +``` +Operator (tenant) +├── Admin — full control over all venues +├── Manager — manage specific venues +├── Floor Manager — tournament/table operations at a venue +├── Dealer — assigned to tables, report results +└── Player — universal account, cross-venue + ├── can self-register + ├── has memberships per venue + └── has credit lines per venue (managed by admin) +``` + +**Key Design Decisions:** + +1. **Tenant-scoped roles**: A user can be an admin in one operator's venues and a player in another. 
The `(user_id, operator_id, role)` triple is the authorization unit. +2. **Offline auth on local node**: The local node caches valid JWT signing keys and a subset of user credentials (hashed). Players can authenticate locally when the cloud is unreachable. New registrations queue for cloud sync. +3. **JWT signing**: Use Ed25519 (fast, small signatures) via the `jsonwebtoken` crate. The cloud signs tokens; the local node can verify them with the public key. For offline token issuance, the local node has a delegated signing key. +4. **Password hashing**: `argon2` crate — memory-hard, resistant to GPU attacks. Tune parameters for RPi5 (lower memory cost than cloud). +5. **Social login** (optional, cloud-only): Support Google/Apple sign-in for player accounts via standard OAuth2 flows. Map social identities to the universal player account. + +### Gotchas + +- Token revocation is hard with JWTs — use short expiry (15 min) + refresh token rotation + a lightweight blocklist in DragonflyDB for immediate revocation +- Cross-venue account linking: when a player signs up at venue A and later visits venue B (different operator), they should be recognized. Use email/phone as the universal identifier with verification. +- Local node token issuance must be time-limited and logged — cloud should audit all locally-issued tokens on sync +- Rate limit login attempts both on cloud and local node to prevent brute force + +--- + +## 9. 
API Design + +### Recommendation: **REST + OpenAPI 3.1** with generated TypeScript client + +### Alternatives Considered + +| Approach | Pros | Cons | +|----------|------|------| +| **REST + OpenAPI** | Universal, tooling-rich, generated clients, cacheable | Overfetching possible, multiple round trips | +| **GraphQL** | Flexible queries, single endpoint, good for complex UIs | Complexity overhead, caching harder, Rust support less mature | +| **tRPC** | Zero-config type safety | TypeScript-only — cannot use with Rust backend | +| **gRPC** | Efficient binary protocol, streaming | Needs proxy for browsers, overkill for this use case | + +### Reasoning + +**tRPC is ruled out** because it requires both client and server to be TypeScript. With a Rust backend, this is not viable. + +**REST + OpenAPI** is the best approach because: + +1. **Generated type safety**: Use `utoipa` to generate OpenAPI 3.1 specs from Rust types, then `openapi-typescript` to generate TypeScript types for the frontend. Changes to the Rust API automatically propagate to the frontend types. +2. **Cacheable**: REST's HTTP semantics enable CDN caching, ETag support, and conditional requests — important for player profiles and tournament structures that change infrequently. +3. **Universal clients**: The REST API will also be consumed by the Chromecast receiver app, the local node sync layer, and potentially third-party integrations. OpenAPI makes all of these easy. +4. **Tooling**: Swagger UI for exploration, `openapi-fetch` for the TypeScript client (type-safe fetch wrapper), Postman/Insomnia for testing. 
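One convention worth fixing early is opaque pagination cursors: encode the sort key of the last returned row (e.g. `(created_at, id)`) and hand it back verbatim on the next request. A dependency-free sketch — hex stands in for base64url to avoid a crate, and the field names are illustrative:

```rust
// Sketch of an opaque cursor for cursor-based pagination: encode the sort key
// of the last row returned, decode it to resume the scan after that row.
fn encode_cursor(created_at: u64, id: u64) -> String {
    let raw = format!("{created_at}:{id}");
    raw.bytes().map(|b| format!("{b:02x}")).collect()
}

fn decode_cursor(cursor: &str) -> Option<(u64, u64)> {
    // Reject anything that isn't well-formed hex rather than panicking.
    let bytes: Option<Vec<u8>> = (0..cursor.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(cursor.get(i..i + 2)?, 16).ok())
        .collect();
    let raw = String::from_utf8(bytes?).ok()?;
    let (ts, id) = raw.split_once(':')?;
    Some((ts.parse().ok()?, id.parse().ok()?))
}

fn main() {
    let cursor = encode_cursor(1_700_000_000, 42);
    assert_eq!(decode_cursor(&cursor), Some((1_700_000_000, 42)));
    assert_eq!(decode_cursor("not-hex"), None); // malformed input is rejected
    println!("ok");
}
```

Because the cursor pins the scan to a concrete row rather than an offset, concurrent inserts cannot shift the page window — the property that makes cursor pagination preferable to `OFFSET` for lists like waitlists and tournament histories.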
+ +### API Conventions + +``` +# Resource-based URLs +GET /api/v1/venues/{venue_id}/tournaments +POST /api/v1/venues/{venue_id}/tournaments +GET /api/v1/venues/{venue_id}/tournaments/{id} +PATCH /api/v1/venues/{venue_id}/tournaments/{id} + +# Actions as sub-resources +POST /api/v1/venues/{venue_id}/tournaments/{id}/start +POST /api/v1/venues/{venue_id}/tournaments/{id}/pause +POST /api/v1/venues/{venue_id}/waitlists/{id}/join +POST /api/v1/venues/{venue_id}/waitlists/{id}/call/{player_id} + +# Cross-venue player operations +GET /api/v1/players/me +GET /api/v1/players/{id}/memberships +POST /api/v1/players/{id}/credit-lines + +# Real-time subscriptions +WS /api/v1/ws?venue={id}&subscribe=tournament.clock,waitlist.updates +``` + +### Type Generation Pipeline + +``` +Rust structs (serde + utoipa derive) + → OpenAPI 3.1 JSON spec (generated at build time) + → openapi-typescript (CI step) + → TypeScript types + openapi-fetch client + → SvelteKit frontend consumes typed API +``` + +### Gotchas + +- Version the API from day one (`/api/v1/`) — breaking changes go in `/api/v2/` +- Use cursor-based pagination for lists (not offset-based) — more efficient and handles concurrent inserts +- Standardize error responses: `{ error: { code: string, message: string, details?: any } }` +- Consider a lightweight BFF (Backend-for-Frontend) pattern in SvelteKit's server routes for aggregating multiple API calls into one page load + +--- + +## 10. 
Local Node Architecture + +### Recommendation: **Single Rust binary** running on RPi5 with embedded libSQL, NATS leaf node, and local HTTP/WS server + +### What Runs on the RPi5 + +``` +┌─────────────────────────────────────────────────────┐ +│ PVM Local Node (single Rust binary, ~15-20 MB) │ +│ │ +│ ┌──────────────┐ ┌──────────────┐ │ +│ │ HTTP/WS │ │ NATS Leaf │ │ +│ │ Server │ │ Node │ │ +│ │ (Axum) │ │ (embedded or │ │ +│ │ │ │ sidecar) │ │ +│ └──────┬───────┘ └──────┬───────┘ │ +│ │ │ │ +│ ┌──────┴──────────────────┴───────┐ │ +│ │ Application Core │ │ +│ │ - Tournament engine │ │ +│ │ - Clock manager │ │ +│ │ - Waitlist manager │ │ +│ │ - Seat assignment │ │ +│ │ - Sync orchestrator │ │ +│ └──────────────┬───────────────────┘ │ +│ │ │ +│ ┌──────────────┴───────────────────┐ │ +│ │ libSQL (embedded) │ │ +│ │ - Venue data subset │ │ +│ │ - Offline mutation queue │ │ +│ │ - Local auth cache │ │ +│ └───────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────┐ │ +│ │ moka in-memory cache │ │ +│ │ - Hot tournament state │ │ +│ │ - Active session tokens │ │ +│ └───────────────────────────────────┘ │ +└─────────────────────────────────────────────────────────┘ +``` + +### Offline Operations + +When the cloud connection drops, the local node continues operating: + +1. **Tournament operations**: Clock continues, blinds advance, players bust/rebuy — all local state +2. **Waitlist management**: Players can join/leave waitlists — queued for cloud sync +3. **Seat assignments**: Floor managers can move players between tables locally +4. **Player auth**: Cached credentials allow existing players to log in. New registrations queued. +5. **Financial operations**: Buy-ins and credit transactions logged locally with offline flag. Cloud reconciles on reconnect. + +### Sync Protocol + +``` +On reconnect: +1. Local node sends its last-seen cloud sequence number +2. Cloud sends all events since that sequence (via NATS JetStream replay) +3. 
Local node sends its offline mutation queue (ordered by local timestamp) +4. Cloud processes mutations, detects conflicts, responds with resolution +5. Local node applies cloud resolutions, updates local state +6. Both sides confirm sync complete +``` + +### Conflict Resolution Strategy + +| Data Type | Strategy | Reasoning | +|-----------|----------|-----------| +| Tournament state | Cloud wins | Only one node runs a tournament at a time | +| Waitlist | Merge (union) | Both sides can add/remove; merge and re-order by timestamp | +| Player profiles | Cloud wins (LWW) | Cloud is the authority for universal accounts | +| Credit transactions | Append-only (event sourcing) | No conflicts — every transaction is immutable | +| Seat assignments | Local wins during offline | Floor manager's local decisions take precedence | +| Dealer schedules | Cloud wins | Schedules are set centrally | + +### RPi5 System Setup + +- **OS**: Raspberry Pi OS Lite (64-bit, Debian Bookworm-based) — no desktop environment +- **Storage**: 32 GB+ microSD or USB SSD (recommended for durability) +- **Auto-start**: systemd service for the PVM binary +- **Updates**: OTA binary updates via a self-update mechanism (download new binary, verify signature, swap, restart) +- **Watchdog**: Hardware watchdog timer to auto-reboot if the process hangs +- **Networking**: Ethernet preferred (reliable), WiFi as fallback. mDNS for local discovery. 
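Under the assumptions above, the conflict table reduces to a resolution pass over the replayed offline queue. A hypothetical sketch (types, entity names, and `resolve` are illustrative, not the real sync protocol; the waitlist merge strategy is omitted for brevity):

```rust
// Sketch of the sync handshake's conflict step: offline mutations replay in
// local-timestamp order, and a per-entity strategy decides who wins.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Strategy {
    CloudWins,
    LocalWins,
    AppendOnly, // event-sourced: every mutation is kept
}

#[derive(Debug, Clone, PartialEq)]
struct Mutation {
    entity: &'static str,
    local_ts: u64,
    payload: &'static str,
}

fn strategy_for(entity: &str) -> Strategy {
    match entity {
        "tournament" | "player_profile" | "dealer_schedule" => Strategy::CloudWins,
        "seat_assignment" => Strategy::LocalWins,
        "credit_transaction" => Strategy::AppendOnly,
        _ => Strategy::CloudWins,
    }
}

// Returns the offline mutations the cloud should apply, in replay order.
fn resolve(mut queue: Vec<Mutation>, cloud_has_newer: impl Fn(&Mutation) -> bool) -> Vec<Mutation> {
    queue.sort_by_key(|m| m.local_ts); // replay in local-timestamp order
    queue
        .into_iter()
        .filter(|m| match strategy_for(m.entity) {
            Strategy::AppendOnly | Strategy::LocalWins => true,
            Strategy::CloudWins => !cloud_has_newer(m),
        })
        .collect()
}

fn main() {
    let queue = vec![
        Mutation { entity: "seat_assignment", local_ts: 2, payload: "move p7 to t3" },
        Mutation { entity: "credit_transaction", local_ts: 1, payload: "buy-in 200" },
        Mutation { entity: "player_profile", local_ts: 3, payload: "nickname" },
    ];
    // Pretend the cloud already holds a newer player_profile write.
    let applied = resolve(queue, |m| m.entity == "player_profile");
    assert_eq!(applied.len(), 2);
    assert_eq!(applied[0].payload, "buy-in 200"); // timestamp order preserved
    println!("ok");
}
```

The key property is that credit transactions never enter the conflict path at all — as append-only events they are simply replayed, which is why the conflict table marks them "conflicts are impossible".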
+
+### Gotchas
+
+- RPi5 has 4 GB or 8 GB RAM — target 8 GB model, budget ~200 MB for the PVM process + NATS
+- SD card wear: use an external USB SSD for the libSQL database if heavy write operations are expected
+- Time synchronization: use `chrony` NTP client — accurate timestamps are critical for conflict resolution and tournament clocks
+- Power loss: libSQL in WAL mode is crash-safe, but implement a clean shutdown handler (SIGTERM) that flushes state
+- Security: the RPi5 is physically accessible in venues — encrypt the libSQL database at rest, disable SSH password auth, use key-only
+
+---
+
+## 11. Chromecast / Display Streaming
+
+### Recommendation: **Google Cast SDK** with a **Custom Web Receiver** (SvelteKit static app)
+
+### Architecture
+
+```
+┌──────────────┐      Cast SDK      ┌───────────────────┐
+│ Sender App   │ ──────────────►    │ Custom Web        │
+│ (PVM Admin   │ (discovers &       │ Receiver          │
+│  Dashboard)  │  launches)         │ (SvelteKit SPA)   │
+│              │                    │                   │
+│      or      │                    │ Hosted at:        │
+│              │                    │ cast.pvmapp.com   │
+│  Local Node  │                    │                   │
+│  HTTP Server │                    │ Connects to WS    │
+│              │                    │ for live updates  │
+└──────────────┘                    └─────────┬─────────┘
+                                              │
+                                    ┌─────────▼─────────┐
+                                    │ Chromecast Device │
+                                    │ (renders receiver)│
+                                    └───────────────────┘
+```
+
+### Custom Web Receiver
+
+The Cast receiver is a **separate SvelteKit static app** that:
+
+1. Loads on the Chromecast device when cast is initiated
+2. Connects to the PVM WebSocket endpoint (cloud or local node, depending on network)
+3. Subscribes to venue-specific events (tournament clock, waitlist, seat map)
+4. 
Renders full-screen display layouts:
+   - **Tournament clock**: Large timer, current level, blind structure, next break
+   - **Waiting list**: Player queue by game type, estimated wait times
+   - **Table status**: Open seats, game types, stakes per table
+   - **Custom messages**: Announcements, promotions
+
+### Display Manager
+
+A venue can have **multiple Chromecast devices** showing different content:
+
+- TV 1: Tournament clock (main)
+- TV 2: Cash game waiting list
+- TV 3: Table/seat map
+- TV 4: Rotating between tournament clock and waiting list
+
+The **Display Manager** (part of the admin dashboard) lets floor managers:
+- Assign content to each Chromecast device
+- Configure rotation/cycling between views
+- Send one-time announcements to all screens
+- Adjust display themes (dark/light, font size, venue branding)
+
+### Technical Details
+
+- Register the receiver app with the Google Cast Developer Console (one-time setup, $5 fee)
+- Use the Cast Application Framework (CAF) Receiver SDK v3
+- The receiver app is a standard web page — it can use any web framework (SvelteKit static build)
+- Sender integration: use the `cast.framework.CastContext` API in the admin dashboard
+- For **local network casting** (offline mode): the local node serves the receiver app directly, and the Chromecast connects to the local node's IP
+- Consider also supporting **generic HDMI displays** via a simple browser in kiosk mode (Chromium on a secondary RPi or mini PC) as a non-Chromecast fallback
+
+### Gotchas
+
+- Chromecast devices have limited memory and CPU — keep the receiver app lightweight (Svelte is ideal here)
+- Cast sessions can time out after inactivity — implement keep-alive messages
+- Chromecast requires an internet connection for the initial app load (it fetches the receiver URL from Google's servers) — for fully offline venues, the kiosk-mode browser fallback is essential
+- Test on actual Chromecast hardware early — the developer emulator doesn't catch all issues
+- The Cast SDK 
requires HTTPS for the receiver URL in production (self-signed certs won't work on Chromecast)
+
+---
+
+## 12. Mobile Strategy
+
+### Recommendation: **PWA first** (SvelteKit), with a **Capacitor** wrapper for app store presence when needed
+
+### Alternatives Considered
+
+| Approach | Pros | Cons |
+|----------|------|------|
+| **PWA (SvelteKit)** | One codebase, instant updates, no app store, works offline | Limited native API access, iOS push only since 16.4 (still limited), discoverability |
+| **Capacitor (hybrid)** | PWA + native shell, access to native APIs, app store distribution | Thin WebView wrapper, some performance overhead |
+| **Tauri Mobile** | Rust backend, small size | Mobile support very early (alpha/beta), limited ecosystem |
+| **React Native** | True native UI, large ecosystem | Separate codebase from web, React dependency, not Svelte |
+| **Flutter** | Excellent cross-platform, single codebase | Dart language, separate from web entirely |
+
+### Reasoning
+
+PVM's mobile needs are primarily **consumption-oriented** — players check tournament schedules, waiting list position, and receive notifications. This is a perfect fit for a PWA:
+
+1. **PWA first**: The SvelteKit app with `vite-plugin-pwa` already provides offline caching, add-to-home-screen, and background sync. For most players, this is sufficient.
+
+2. **Capacitor wrap when needed**: When richer iOS push notifications, Apple Pay, or app store presence become important, wrap the existing SvelteKit PWA in Capacitor. Capacitor runs the same web app in a native WebView and provides JavaScript bridges to native APIs.
+
+3. **Tauri Mobile is not ready**: As of 2026, Tauri 2.0's mobile support exists but is still maturing. It would be a good fit architecturally (Rust backend + web frontend), but the plugin ecosystem and build tooling aren't as polished as Capacitor's. Revisit in 12-18 months. 
+ +### PWA Features for PVM + +- **Service Worker**: Cache tournament schedules, player profile, venue info for offline access +- **Push Notifications**: Web Push API for tournament start reminders, waitlist calls (Android + iOS 16.4+) +- **Add to Home Screen**: App-like experience without app store +- **Background Sync**: Queue waitlist join/leave actions when offline, sync when back online +- **Share Target**: Accept shared tournament links + +### Gotchas + +- iOS PWA support is improving but still has limitations (no background fetch, limited push notification payload) +- Capacitor requires maintaining iOS/Android build pipelines — only add this when there's a clear need +- Test PWA on actual mobile devices in venues — WiFi quality varies dramatically +- Deep linking: configure universal links / app links so shared tournament URLs open in the PWA/app + +--- + +## 13. Deployment & Infrastructure + +### Recommendation: **Fly.io** (primary cloud) + **Docker** containers + **GitHub Actions** CI/CD + +### Alternatives Considered + +| Platform | Pros | Cons | +|----------|------|------| +| **Fly.io** | Edge deployment, built-in Postgres, simple scaling, good pricing, Rust-friendly | CLI-first workflow, no built-in CI/CD | +| **Railway** | Excellent DX, GitHub integration, preview environments | Less edge presence, newer | +| **AWS (ECS/Fargate)** | Full control, enterprise grade, broadest service catalog | Complex, expensive operations overhead | +| **Render** | Simple, good free tier | Less flexible networking, no edge | +| **Hetzner + manual** | Cheapest, full control | Operations burden, no managed services | + +### Reasoning + +**Fly.io** is the best fit for PVM: + +1. **Edge deployment**: Fly.io runs containers close to users. For a poker venue SaaS with venues in multiple cities/countries, edge deployment means lower latency for real-time tournament updates. +2. **Built-in Postgres**: Fly Postgres is managed, with automatic failover and point-in-time recovery. 
+
+3. **Fly Machines**: Fine-grained control over machine placement — can run NATS, DragonflyDB, and the API server as separate Fly machines.
+4. **Rust-friendly**: Fly.io's multi-stage Docker builds work well for Rust (build on a large machine, deploy a tiny binary).
+5. **Private networking**: Fly's WireGuard mesh enables secure communication between services without exposing ports publicly. The RPi5 local nodes can use Fly's WireGuard to connect to the cloud NATS cluster.
+6. **Reasonable pricing**: Pay-as-you-go, no minimum commitment. Scale to zero for staging environments.
+
+### Infrastructure Layout
+
+```
+Fly.io Cloud
+├── pvm-api (Axum, 2+ instances, auto-scaled)
+├── pvm-ws-gateway (Axum WebSocket, 2+ instances)
+├── pvm-nats (NATS cluster, 3 nodes)
+├── pvm-db (Fly Postgres, primary + replica)
+├── pvm-cache (DragonflyDB, single node)
+└── pvm-worker (background jobs: sync processing, notifications)
+
+Venue (RPi5)
+└── pvm-node (single Rust binary + NATS leaf node)
+    └── connects to pvm-nats via WireGuard/TLS
+```
+
+### CI/CD Pipeline (GitHub Actions)
+
+```
+# Triggered on push to main
+1. Lint (clippy, eslint)
+2. Test (cargo test, vitest, playwright)
+3. Build (multi-stage Docker for cloud, cross-compile for RPi5)
+4. Deploy staging (auto-deploy to Fly.io staging)
+5. E2E tests against staging
+6. Deploy production (manual approval gate)
+7. 
Publish RPi5 binary (signed, to update server)
+```
+
+### Gotchas
+
+- Fly.io Postgres is not fully managed — you still need to handle major version upgrades and backup verification
+- Use multi-stage Docker builds to keep Rust image sizes small (builder stage with `rust:bookworm`, runtime stage with `debian:bookworm-slim` or `distroless`)
+- Pin Fly.io machine regions to match your target markets — don't spread too thin initially
+- Set up blue-green deployments for zero-downtime upgrades
+- The RPi5 binary update mechanism needs a rollback strategy — keep the previous binary and a fallback boot option
+
+---
+
+## 14. Monitoring & Observability
+
+### Recommendation: **OpenTelemetry** (traces + metrics + logs) exported to **Grafana Cloud** (or self-hosted Grafana + Loki + Tempo + Prometheus)
+
+### Alternatives Considered
+
+| Stack | Pros | Cons |
+|-------|------|------|
+| **OpenTelemetry + Grafana** | Vendor-neutral, excellent Rust support, unified pipeline | Some setup required |
+| **Datadog** | All-in-one, excellent UX | Expensive at scale, vendor lock-in |
+| **New Relic** | Good APM | Cost, Rust support less first-class |
+| **Sentry** | Excellent error tracking | Limited metrics/traces, complementary rather than primary |
+
+### Rust Instrumentation Stack
+
+```toml
+# Key crates
+tracing = "0.1"                          # Structured logging/tracing facade
+tracing-subscriber = "0.3"               # Log formatting, filtering
+tracing-opentelemetry = "0.28"           # Bridge tracing → OpenTelemetry
+opentelemetry = "0.28"                   # OTel SDK
+opentelemetry-otlp = "0.28"              # OTLP exporter
+opentelemetry-semantic-conventions = "0.28"  # Standard attribute names
+```
+
+### What to Monitor
+
+**Application Metrics:**
+- Request rate, latency (p50/p95/p99), error rate per endpoint
+- WebSocket connection count per venue
+- NATS message throughput and consumer lag
+- Tournament clock drift (local node vs cloud time)
+- Sync latency (time from local mutation to cloud persistence)
+- Cache hit/miss ratios (DragonflyDB)
+
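+As a concrete reading of the latency numbers above, p50/p95/p99 are nearest-rank percentiles over a window of request durations. A std-only sketch (in production these values come from an OpenTelemetry histogram instrument; `percentile` is an illustrative helper, not part of any crate):
+
+```rust
+/// Nearest-rank percentile: the ceil(p/100 * N)-th smallest value (1-based).
+/// `sorted_ms` must be sorted ascending and non-empty.
+fn percentile(sorted_ms: &[u64], p: f64) -> u64 {
+    assert!(!sorted_ms.is_empty() && (0.0..=100.0).contains(&p));
+    let rank = ((p / 100.0) * sorted_ms.len() as f64).ceil() as usize;
+    sorted_ms[rank.saturating_sub(1).min(sorted_ms.len() - 1)]
+}
+
+fn main() {
+    // Fake window: 100 request latencies of 1..=100 ms.
+    let mut window: Vec<u64> = (1..=100).collect();
+    window.sort_unstable();
+    assert_eq!(percentile(&window, 50.0), 50);
+    assert_eq!(percentile(&window, 95.0), 95);
+    assert_eq!(percentile(&window, 99.0), 99);
+    println!(
+        "p50={} p95={} p99={}",
+        percentile(&window, 50.0),
+        percentile(&window, 95.0),
+        percentile(&window, 99.0)
+    );
+}
+```
+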
+**Business Metrics:** +- Active tournaments per venue +- Players on waiting lists +- Concurrent connected users +- Tournament registrations per hour +- Offline duration per local node + +**Infrastructure Metrics:** +- CPU, memory, disk per service +- RPi5 node health: temperature, memory usage, SD card wear level +- NATS cluster health +- Postgres connection pool utilization + +### Local Node Observability + +The RPi5 node should: +- Buffer OpenTelemetry spans/metrics locally when offline +- Flush to cloud collector on reconnect +- Expose a local `/health` endpoint for venue staff to check node status +- Log to both stdout (for `journalctl`) and a rotating file + +### Alerting + +- Use Grafana Alerting for cloud services +- Critical alerts: API error rate > 5%, NATS cluster partition, Postgres replication lag > 30s +- Warning alerts: RPi5 node offline > 5 min, sync backlog > 1000 events, high memory usage +- Notification channels: Slack/Discord for ops team, push notification for venue managers on critical local node issues + +### Gotchas + +- OpenTelemetry's Rust SDK is stable but evolving — pin versions carefully +- The `tracing` crate is the Rust ecosystem standard — everything (Axum, sqlx, async-nats) already emits tracing spans, so you get deep instrumentation for free +- Sampling is important at scale — don't trace every tournament clock tick in production +- Grafana Cloud's free tier is generous enough for early stages (10k metrics, 50GB logs, 50GB traces) + +--- + +## 15. 
Testing Strategy + +### Recommendation: Multi-layer testing with **cargo test** (unit/integration), **Playwright** (E2E), and **Vitest** (frontend unit) + +### Test Pyramid + +``` + ▲ + / \ E2E Tests (Playwright) + / \ - Full user flows + / \ - Cast receiver rendering + /───────\ + / \ Integration Tests (cargo test + testcontainers) + / \ - API endpoint tests with real DB + / \ - NATS pub/sub flows + / \ - Sync protocol tests +/─────────────────\ + Unit Tests (cargo test + vitest) + - Domain logic (tournament engine, clock, waitlist) + - Svelte component tests + - Conflict resolution logic +``` + +### Backend Testing (Rust) + +- **Unit tests**: Inline `#[cfg(test)]` modules for domain logic. The tournament engine, clock manager, waitlist priority algorithm, and conflict resolution are all pure functions that are easy to unit test. +- **Integration tests**: Use `testcontainers` crate to spin up ephemeral Postgres + NATS + DragonflyDB instances. Test full API flows including auth, multi-tenancy, and real-time events. +- **sqlx compile-time checks**: SQL queries are validated against the database schema at compile time — this catches a huge class of bugs before runtime. +- **Property-based testing**: Use `proptest` for testing conflict resolution and sync protocol with random inputs. +- **Test runner**: `cargo-nextest` for parallel test execution (significantly faster than default `cargo test`). + +### Frontend Testing (TypeScript/Svelte) + +- **Component tests**: Vitest + `@testing-library/svelte` for testing Svelte components in isolation. +- **Store/state tests**: Vitest for testing reactive state logic (tournament clock state, waitlist updates). +- **API mocking**: `msw` (Mock Service Worker) for intercepting API calls in tests. 
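+
+The backend-testing claim above (domain logic as pure functions) pays off in plain table-driven tests with no I/O or mocks. An illustrative sketch (`current_level` is a hypothetical `pvm-core` function, not an existing API):
+
+```rust
+/// 0-based blind level for a tournament with fixed-length levels.
+/// Pure function of elapsed time, so tests need no clock or database.
+fn current_level(elapsed_secs: u64, level_secs: u64, num_levels: usize) -> usize {
+    let raw = (elapsed_secs / level_secs) as usize;
+    raw.min(num_levels - 1) // clamp: stay on the final level once reached
+}
+
+fn main() {
+    println!("level at 1200s: {}", current_level(1200, 1200, 10));
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn advances_and_clamps() {
+        assert_eq!(current_level(0, 1200, 10), 0);        // start of level 1
+        assert_eq!(current_level(1199, 1200, 10), 0);     // last second of level 1
+        assert_eq!(current_level(1200, 1200, 10), 1);     // level rollover
+        assert_eq!(current_level(999_999, 1200, 10), 9);  // clamped to final level
+    }
+}
+```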
+ +### End-to-End Testing + +- **Playwright**: Test critical user flows in real browsers: + - Tournament creation and management flow + - Player registration and waitlist join + - Real-time updates (verify clock ticks appear in browser) + - Multi-venue admin dashboard + - Cast receiver display rendering (headless Chromium) +- **Local node E2E**: Test offline scenarios — start local node, disconnect from cloud, perform operations, reconnect, verify sync. + +### Specialized Tests + +- **Sync protocol tests**: Simulate network partitions, conflicting writes, replay scenarios +- **Load testing**: `k6` or `drill` (Rust) for WebSocket connection saturation, API throughput +- **Cast receiver tests**: Visual regression testing with Playwright screenshots of display layouts +- **Cross-browser**: Playwright covers Chromium, Firefox, WebKit — ensure PWA works on all + +### Gotchas + +- Rust integration tests with testcontainers need Docker available in CI — Fly.io's CI runners support this, or use GitHub Actions with Docker +- Playwright tests are slow — run in parallel, and only test critical paths in CI (full suite nightly) +- The local node's offline/reconnect behavior is the hardest thing to test — invest heavily in deterministic sync protocol tests +- Mock the NATS connection in unit tests using a channel-based mock, not an actual NATS server + +--- + +## 16. Security + +### Recommendation: Defense in depth across all layers + +### Data Security + +| Layer | Measure | +|-------|---------| +| **Transport** | TLS 1.3 everywhere — API, WebSocket, NATS, Postgres connections | +| **Data at rest** | Postgres: encrypted volumes (cloud provider). 
libSQL on RPi5: SQLCipher-compatible encryption via `libsql` | +| **Secrets** | Environment variables via Fly.io secrets (cloud), encrypted config file on RPi5 (sealed at provisioning) | +| **Passwords** | Argon2id hashing, tuned per environment (higher params on cloud, lower on RPi5) | +| **JWTs** | Ed25519 signing, short expiry (15 min), refresh token rotation | +| **API keys** | SHA-256 hashed in DB, displayed once at creation, prefix-based identification (`pvm_live_`, `pvm_test_`) | + +### Network Security + +- **API**: Rate limiting (Tower middleware), CORS restricted to known origins, request size limits +- **WebSocket**: Authenticated connection upgrade (JWT in first message or query param), per-connection rate limiting +- **NATS**: TLS + token auth between cloud and leaf nodes. Leaf nodes have scoped permissions (can only access their venue's subjects) +- **RPi5**: Firewall (nftables/ufw) — only allow outbound to cloud NATS + HTTPS, inbound on local network only for venue devices +- **DDoS**: Fly.io provides basic DDoS protection. Add Cloudflare in front for the API if needed. 
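+
+The per-connection rate limiting mentioned above boils down to token-bucket logic. A std-only sketch (`TokenBucket` is an illustrative name; in practice this state would sit behind a Tower middleware layer or inside the WS gateway's connection handler):
+
+```rust
+use std::time::{Duration, Instant};
+
+/// Token bucket: allows bursts up to `capacity`, then throttles
+/// to `refill_per_sec` sustained requests per second.
+struct TokenBucket {
+    capacity: f64,
+    tokens: f64,
+    refill_per_sec: f64,
+    last: Instant,
+}
+
+impl TokenBucket {
+    fn new(capacity: f64, refill_per_sec: f64, now: Instant) -> Self {
+        Self { capacity, tokens: capacity, refill_per_sec, last: now }
+    }
+
+    /// Returns true if the request is allowed; false means reject
+    /// (HTTP 429, or a slow-down/close frame on a WebSocket).
+    fn allow(&mut self, now: Instant) -> bool {
+        let elapsed = now.duration_since(self.last).as_secs_f64();
+        self.last = now;
+        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
+        if self.tokens >= 1.0 {
+            self.tokens -= 1.0;
+            true
+        } else {
+            false
+        }
+    }
+}
+
+fn main() {
+    let start = Instant::now();
+    let mut bucket = TokenBucket::new(3.0, 1.0, start); // burst 3, then 1 req/s
+    let allowed = (0..5).filter(|_| bucket.allow(start)).count();
+    assert_eq!(allowed, 3); // burst exhausted after 3 back-to-back requests
+    assert!(bucket.allow(start + Duration::from_secs(1))); // one token refilled
+    println!("allowed {} of 5 burst requests", allowed);
+}
+```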
+ +### Financial Data Security + +PVM handles credit lines and buy-in transactions — this requires extra care: + +- All financial mutations are **event-sourced** with immutable audit trail +- Credit line changes require **admin approval** with logged reason +- Buy-in/cashout transactions include **idempotency keys** to prevent duplicate charges +- Financial reports are only accessible to operator admins, with access logged +- Consider PCI DSS implications if handling payment card data directly — prefer delegating to a payment processor (Stripe) + +### Local Node Security + +The RPi5 is physically in a venue — assume it can be stolen or tampered with: + +- **Disk encryption**: Full disk encryption (LUKS) or at minimum encrypted database +- **Secure boot**: Signed binaries, verified at startup +- **Remote wipe**: Cloud can send a command to reset the node to factory state +- **Tamper detection**: Log unexpected restarts, hardware changes +- **Credential scope**: Local node only has access to its venue's data — compromising one node doesn't expose other venues + +### Gotchas + +- DO NOT store payment card numbers — use a payment processor's tokenization +- GDPR/privacy: Player data across venues requires careful consent management. Players must be able to request data deletion. +- The local node's offline auth cache is a security risk — limit cached credentials, expire after configurable period +- Regularly rotate NATS credentials and JWT signing keys — automate this + +--- + +## 17. 
Developer Experience + +### Recommendation: **Cargo workspace** (Rust monorepo) + **pnpm workspace** (TypeScript) managed by **Turborepo** + +### Monorepo Structure + +``` +pvm/ +├── Cargo.toml # Rust workspace root +├── turbo.json # Turborepo config +├── package.json # pnpm workspace root +├── pnpm-workspace.yaml +│ +├── crates/ # Rust crates +│ ├── pvm-api/ # Cloud API server (Axum) +│ ├── pvm-node/ # Local node binary +│ ├── pvm-ws-gateway/ # WebSocket gateway +│ ├── pvm-worker/ # Background job processor +│ ├── pvm-core/ # Shared domain logic +│ │ ├── tournament/ # Tournament engine +│ │ ├── waitlist/ # Waitlist management +│ │ ├── clock/ # Tournament clock +│ │ └── sync/ # Sync protocol +│ ├── pvm-db/ # Database layer (sqlx queries, migrations) +│ ├── pvm-auth/ # Auth logic (JWT, RBAC) +│ ├── pvm-nats/ # NATS client wrappers +│ └── pvm-types/ # Shared types (serde, utoipa derives) +│ +├── apps/ # TypeScript apps +│ ├── dashboard/ # SvelteKit admin dashboard +│ ├── player/ # SvelteKit player-facing app +│ ├── cast-receiver/ # SvelteKit Cast receiver (static) +│ └── docs/ # Documentation site (optional) +│ +├── packages/ # Shared TypeScript packages +│ ├── ui/ # shadcn-svelte components +│ ├── api-client/ # Generated OpenAPI client +│ └── shared/ # Shared types, utilities +│ +├── docker/ # Dockerfiles +├── .github/ # GitHub Actions workflows +└── docs/ # Project documentation +``` + +### Key Tools + +| Tool | Purpose | +|------|---------| +| **Cargo** | Rust build system, workspace management | +| **pnpm** | Fast, disk-efficient Node.js package manager | +| **Turborepo** | Orchestrates build/test/lint across both Rust and TS workspaces. Caches build outputs. `--affected` flag for CI optimization. 
|
+| **cargo-watch** | Auto-rebuild on Rust file changes during development |
+| **cargo-nextest** | Faster test runner with parallel execution |
+| **cross** / **cargo-zigbuild** | Cross-compile Rust for RPi5 ARM64 |
+| **sccache** | Shared compilation cache (speeds up CI and local builds) |
+| **Biome** | Fast linter + formatter for TypeScript (replaces ESLint + Prettier) |
+| **clippy** | Rust linter (run with `--deny warnings` in CI) |
+| **rustfmt** | Rust formatter (enforced in CI) |
+| **lefthook** | Git hooks manager (format + lint on pre-commit) |
+
+### Development Workflow
+
+```bash
+# Start everything for local development
+turbo dev                          # Starts SvelteKit dev servers
+cargo watch -x 'run -p pvm-api'    # Auto-restart API on changes
+
+# Run all tests
+turbo test                         # TypeScript tests
+cargo nextest run                  # Rust tests
+
+# Generate API client after backend changes
+cargo run -p pvm-api -- --openapi > apps/dashboard/src/lib/api/schema.json
+turbo generate:api-client
+
+# Build for production
+turbo build                        # TypeScript apps
+cargo build --release -p pvm-api
+cross build --release --target aarch64-unknown-linux-gnu -p pvm-node
+```
+
+### Gotchas
+
+- Turborepo's Rust support is task-level (it runs `cargo` as a shell command) — it doesn't understand Cargo's internal dependency graph. Use the Cargo workspace for Rust-internal dependencies.
+- Keep `pvm-core` as a pure library crate with no async runtime dependency — this lets it be used in both the cloud API and the local node without conflicts.
+- Rust compile times are the bottleneck — invest in `sccache` and incremental compilation from day one
+- Use `.cargo/config.toml` for cross-compilation targets and linker settings
+
+---
+
+## 18. 
CSS / Styling + +### Recommendation: **Tailwind CSS v4** + **shadcn-svelte** component system + +### Alternatives Considered + +| Option | Pros | Cons | +|--------|------|------| +| **Tailwind CSS v4** | Utility-first, fast, excellent Svelte integration, v4 is faster with Rust-based engine | Learning curve for utility classes | +| **Vanilla CSS** | No dependencies, full control | Slow development, inconsistent patterns | +| **UnoCSS** | Atomic CSS, fast, flexible presets | Smaller ecosystem than Tailwind | +| **Open Props** | Design tokens as CSS custom properties | Not utility-first, less adoption | +| **Panda CSS** | Type-safe styles, zero runtime | Newer, smaller ecosystem | + +### Reasoning + +**Tailwind CSS v4** is the clear choice: + +1. **Svelte integration**: Tailwind works seamlessly with SvelteKit via the Vite plugin. Svelte's template syntax + Tailwind utilities produce compact, readable component markup. +2. **Tailwind v4 improvements**: The v4 release includes a Rust-based engine (Oxide) that is significantly faster, CSS-first configuration (no more `tailwind.config.js`), automatic content detection, and native CSS cascade layers. +3. **shadcn-svelte**: The component library is built on Tailwind, providing a consistent design system with accessible, customizable components. Components are generated into your codebase — full ownership, no black box. +4. **Cast receiver**: Tailwind's utility classes produce small CSS bundles (only used classes are included) — important for the resource-constrained Chromecast receiver. +5. **Design tokens**: Use CSS custom properties (via Tailwind's theme) for venue-specific branding (colors, logos) that can be swapped at runtime. + +### Design System Structure + +``` +packages/ui/ +├── components/ # shadcn-svelte generated components +│ ├── button/ +│ ├── card/ +│ ├── data-table/ +│ ├── dialog/ +│ ├── form/ +│ └── ... 
+├── styles/ +│ ├── app.css # Global styles, Tailwind imports +│ ├── themes/ +│ │ ├── default.css # Default PVM theme +│ │ ├── dark.css # Dark mode overrides +│ │ └── cast.css # Optimized for large screens +│ └── tokens.css # Design tokens (colors, spacing, typography) +└── utils.ts # cn() helper, variant utilities +``` + +### Venue Branding + +Venues should be able to customize their displays: + +```css +/* Runtime theme switching via CSS custom properties */ +:root { + --venue-primary: theme(colors.blue.600); + --venue-secondary: theme(colors.gray.800); + --venue-logo-url: url('/default-logo.svg'); +} + +/* Applied per-venue at runtime */ +[data-venue-theme="vegas-poker"] { + --venue-primary: #c41e3a; + --venue-secondary: #1a1a2e; + --venue-logo-url: url('/venues/vegas-poker/logo.svg'); +} +``` + +### Gotchas + +- Tailwind v4's CSS-first config is a paradigm shift from v3 — ensure all team documentation targets v4 syntax +- shadcn-svelte components use Tailwind v4 as of recent updates — verify compatibility +- Large data tables (tournament player lists, waitlists) need careful styling — consider virtualized rendering for 100+ row tables +- Cast receiver displays need large fonts and high contrast — create a dedicated `cast.css` theme +- Dark mode is essential for poker venues (low-light environments) — design dark-first + +--- + +## Recommended Stack Summary + +| Area | Recommendation | Key Reasoning | +|------|---------------|---------------| +| **Backend Language** | Rust | Memory efficiency on RPi5, performance, type safety | +| **Frontend Language** | TypeScript | Browser ecosystem standard, type safety | +| **Backend Framework** | Axum (v0.8+) | Tokio-native, Tower middleware, WebSocket support | +| **Frontend Framework** | SvelteKit (Svelte 5) | Smallest bundles, fine-grained reactivity, PWA support | +| **UI Components** | shadcn-svelte | Accessible, Tailwind-based, full ownership | +| **Cloud Database** | PostgreSQL 16+ | Multi-tenant gold standard, RLS, 
JSONB | +| **Local Database** | libSQL (embedded) | SQLite-compatible, tiny footprint, Rust-native | +| **ORM / Queries** | sqlx | Compile-time checked SQL, Postgres + SQLite support | +| **Caching** | DragonflyDB | Redis-compatible, multi-threaded, memory efficient | +| **Messaging** | NATS + JetStream | Edge-native leaf nodes, sub-ms latency, lightweight | +| **Real-Time** | WebSockets (Axum) + SSE fallback | Full duplex, NATS-backed fan-out | +| **Auth** | Custom JWT + RBAC | Offline-capable, cross-venue, full control | +| **API Design** | REST + OpenAPI 3.1 | Generated TypeScript client, universal compatibility | +| **Mobile** | PWA first, Capacitor later | One codebase, offline support, app store when needed | +| **Cast/Display** | Google Cast SDK + Custom Web Receiver | SvelteKit static app on Chromecast | +| **Deployment** | Fly.io + Docker | Edge deployment, managed Postgres, WireGuard | +| **CI/CD** | GitHub Actions + Turborepo | Cross-language build orchestration, caching | +| **Monitoring** | OpenTelemetry + Grafana | Vendor-neutral, excellent Rust support | +| **Testing** | cargo-nextest + Vitest + Playwright | Full pyramid: unit, integration, E2E | +| **Styling** | Tailwind CSS v4 | Fast, small bundles, Svelte-native | +| **Monorepo** | Cargo workspace + pnpm + Turborepo | Unified builds, shared types | +| **Linting** | clippy + Biome | Rust + TypeScript coverage | + +--- + +## Open Questions / Decisions Needed + +### High Priority + +1. **Fly.io vs. self-hosted**: Fly.io simplifies operations but creates vendor dependency. For a bootstrapped SaaS, the convenience is worth it. For VC-funded with an ops team, self-hosted on Hetzner could be cheaper at scale. **Decision: Start with Fly.io, design for portability.** + +2. **libSQL sync granularity**: Should the local node sync entire tables or individual rows? Row-level sync is more efficient but more complex to implement. 
**Recommendation: Start with table-level sync for the initial version, refine to row-level as data volumes grow.** + +3. **NATS embedded vs. sidecar on RPi5**: Running NATS as an embedded library (via `nats-server` Rust bindings) vs. a separate process. Embedded is simpler but couples versions tightly. **Recommendation: Sidecar (separate process managed by systemd) for operational flexibility.** + +4. **Financial data handling**: Does PVM handle actual money transactions, or only track buy-ins/credits as records? If handling real money, PCI DSS and financial regulations apply. **Recommendation: Track records only. Integrate with Stripe for actual payments.** + +5. **Multi-region from day one?**: Should the initial architecture support venues in multiple countries/regions? This affects Postgres replication strategy and NATS cluster topology. **Recommendation: Single region initially, design NATS subjects and DB schema for eventual multi-region.** + +### Medium Priority + +6. **Player account deduplication**: When a player signs up at two venues independently, how do we detect and merge accounts? Email match? Phone match? Manual linking? **Needs product decision.** + +7. **Chromecast vs. generic display hardware**: Should the primary display strategy be Chromecast, or should we target a browser-in-kiosk-mode approach that also works with Chromecast? **Recommendation: Build the receiver as a standard web app first (works in kiosk mode), add Cast SDK integration second.** + +8. **RPi5 provisioning**: How are local nodes set up? Manual image flashing? Automated provisioning? Remote setup? **Recommendation: Pre-built OS image with first-boot wizard that connects to cloud and provisions the node.** + +9. **Offline duration limits**: How long should a local node operate offline before we consider the data stale? 1 hour? 1 day? 1 week? **Needs product decision based on venue feedback.** + +10. **API versioning strategy**: When do we introduce `/api/v2/`? 
Should we support multiple versions simultaneously? **Recommendation: Semantic versioning for the API spec. Maintain backward compatibility as long as possible. Only version on breaking changes.** + +### Low Priority + +11. **GraphQL for player-facing app**: The admin dashboard is well-served by REST, but the player app might benefit from GraphQL's flexible querying (e.g., "show me my upcoming tournaments across all venues with waitlist status"). **Revisit after v1 launch.** + +12. **WebTransport**: When browser support matures and Chromecast supports it, WebTransport could replace WebSockets for lower-latency, multiplexed real-time streams. **Monitor but do not adopt yet.** + +13. **WASM on local node**: Could parts of the frontend run on the local node via WASM for ultra-fast local rendering? Interesting but not a priority. **Defer.** + +14. **AI features**: Player behavior analytics, optimal table assignments, tournament structure recommendations. The data model should be designed to support future ML pipelines. **Design for it, build later.**