From cf03b3592a5645c92caaa2eed87c167d443f0f70 Mon Sep 17 00:00:00 2001 From: Mikkel Georgsen Date: Sun, 8 Feb 2026 02:50:33 +0100 Subject: [PATCH] Add comprehensive tech stack research document 1,190-line research covering all 18 technology areas for PVM: Rust/Axum backend, SvelteKit frontend, Postgres + libSQL databases, NATS + JetStream messaging, DragonflyDB caching, and more. Includes recommended stack summary and open questions. Co-Authored-By: Claude Opus 4.6 --- docs/TECH_STACK_RESEARCH.md | 1190 +++++++++++++++++++++++++++++++++++ 1 file changed, 1190 insertions(+) create mode 100644 docs/TECH_STACK_RESEARCH.md diff --git a/docs/TECH_STACK_RESEARCH.md b/docs/TECH_STACK_RESEARCH.md new file mode 100644 index 0000000..229cafa --- /dev/null +++ b/docs/TECH_STACK_RESEARCH.md @@ -0,0 +1,1190 @@ +# PVM (Poker Venue Manager) — Tech Stack Research + +> Generated: 2026-02-08 +> Status: DRAFT — for discussion and refinement + +--- + +## Table of Contents + +1. [Programming Language](#1-programming-language) +2. [Backend Framework](#2-backend-framework) +3. [Frontend Framework](#3-frontend-framework) +4. [Database Strategy](#4-database-strategy) +5. [Caching Layer](#5-caching-layer) +6. [Message Queue / Event Streaming](#6-message-queue--event-streaming) +7. [Real-Time Communication](#7-real-time-communication) +8. [Auth & Authorization](#8-auth--authorization) +9. [API Design](#9-api-design) +10. [Local Node Architecture](#10-local-node-architecture) +11. [Chromecast / Display Streaming](#11-chromecast--display-streaming) +12. [Mobile Strategy](#12-mobile-strategy) +13. [Deployment & Infrastructure](#13-deployment--infrastructure) +14. [Monitoring & Observability](#14-monitoring--observability) +15. [Testing Strategy](#15-testing-strategy) +16. [Security](#16-security) +17. [Developer Experience](#17-developer-experience) +18. [CSS / Styling](#18-css--styling) +19. [Recommended Stack Summary](#recommended-stack-summary) +20. 
[Open Questions / Decisions Needed](#open-questions--decisions-needed) + +--- + +## 1. Programming Language + +### Recommendation: **Rust** (backend + local node) + **TypeScript** (frontend + shared types) + +### Alternatives Considered + +| Language | Pros | Cons | +|----------|------|------| +| **Rust** | Memory safety, fearless concurrency, tiny binaries for RPi5, no GC pauses, excellent WebSocket perf | Steeper learning curve, slower compile times | +| **Go** | Simple, fast compilation, good concurrency | Less expressive type system, GC pauses (minor), larger binaries than Rust | +| **TypeScript (full-stack)** | One language everywhere, huge ecosystem, fast dev velocity | Node.js memory overhead on RPi5, GC pauses in real-time scenarios, weaker concurrency model | +| **Elixir** | Built for real-time (Phoenix), fault-tolerant OTP | Small ecosystem, harder to find libs, RPi5 BEAM VM overhead | + +### Reasoning + +Rust is the strongest choice for PVM because of the **RPi5 local node constraint**. The local node must run reliably on constrained hardware with limited memory, handle real-time tournament clocks, manage offline operations, and sync data. Rust's zero-cost abstractions, lack of garbage collector, and small binary sizes (typically 5-15 MB static binaries) make it ideal for this. + +For the **cloud backend**, Rust's performance means fewer servers and lower hosting costs. A single Rust service can handle thousands of concurrent WebSocket connections with minimal memory overhead — critical for real-time tournament updates across many venues. + +The **"all code written by Claude Code"** constraint actually favors Rust: Claude has excellent Rust fluency, and the compiler's strict type system catches bugs that would otherwise require extensive testing in dynamic languages. 
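As a small, hypothetical illustration of the compiler catching bugs (`TournamentState` and `advance_level` are illustrative names, not PVM's actual API): modeling the tournament lifecycle as an exhaustive enum means a forgotten state is a compile error, not a runtime surprise.

```rust
// Hypothetical sketch: tournament lifecycle as an exhaustive enum.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TournamentState {
    Registering,
    Running { level: u32 },
    Paused { level: u32 },
    Finished,
}

impl TournamentState {
    // Every state must be handled; adding a new variant later makes any
    // non-exhaustive match a compile error until the case is covered.
    fn advance_level(self) -> TournamentState {
        match self {
            TournamentState::Running { level } => TournamentState::Running { level: level + 1 },
            other => other, // only a running clock advances blinds
        }
    }
}

fn main() {
    let s = TournamentState::Running { level: 4 };
    assert_eq!(s.advance_level(), TournamentState::Running { level: 5 });
    // Paused tournaments keep their level.
    let p = TournamentState::Paused { level: 4 };
    assert_eq!(p.advance_level(), p);
    println!("ok");
}
```

In a dynamic language the equivalent mistake — a new lifecycle state silently falling through a handler — typically only surfaces in testing or production.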
+ +**TypeScript** remains the right choice for the frontend — the browser ecosystem is TypeScript-native, and sharing type definitions between Rust (via generated OpenAPI types) and TypeScript gives end-to-end type safety. + +### Gotchas + +- Rust compile times can be mitigated with `cargo-watch`, incremental compilation, and `sccache` +- Cross-compilation for RPi5 (ARM64) is well-supported via `cross` or `cargo-zigbuild` +- Shared domain types can be generated from Rust structs to TypeScript via `ts-rs` or OpenAPI codegen + +--- + +## 2. Backend Framework + +### Recommendation: **Axum** (v0.8+) + +### Alternatives Considered + +| Framework | Pros | Cons | +|-----------|------|------| +| **Axum** | Tokio-native, excellent middleware (Tower), lowest memory footprint, growing ecosystem, WebSocket built-in | Younger than Actix | +| **Actix Web** | Highest raw throughput, most mature | Actor model adds complexity, not Tokio-native (uses own runtime fork) | +| **Rocket** | Most ergonomic, Rails-like DX | Slower performance, less flexible middleware | +| **Loco** | Rails-like conventions, batteries-included | Very new (2024), smaller community, opinionated | + +### Reasoning + +**Axum** is the clear winner for PVM: + +1. **Tokio-native**: Axum is built directly on Tokio + Hyper + Tower. Since NATS, database drivers, and WebSocket handling all use Tokio, everything shares one async runtime — no impedance mismatch. +2. **Tower middleware**: The Tower service/layer pattern gives composable middleware for auth, rate limiting, tracing, compression, CORS, etc. Middleware can be shared between HTTP and WebSocket handlers. +3. **WebSocket support**: First-class WebSocket extraction with `axum::extract::ws`, typed WebSocket messages via `axum-typed-websockets`. +4. **Memory efficiency**: Benchmarks show Axum achieves the lowest memory footprint per connection — critical when serving thousands of concurrent venue connections. +5. 
**OpenAPI integration**: `utoipa` crate provides derive macros for generating OpenAPI 3.1 specs directly from Axum handler types. +6. **Extractor pattern**: Axum's extractor-based request handling maps cleanly to domain operations (extract tenant, extract auth, extract venue context). + +### Key Libraries + +- `axum` — HTTP framework +- `axum-extra` — typed headers, cookie jar, multipart +- `tower` + `tower-http` — middleware stack (CORS, compression, tracing, rate limiting) +- `utoipa` + `utoipa-axum` — OpenAPI spec generation +- `utoipa-swagger-ui` — embedded Swagger UI +- `axum-typed-websockets` — strongly typed WS messages + +### Gotchas + +- Axum's error handling requires careful design — use `thiserror` + a custom error type that implements `IntoResponse` +- Route organization: use `axum::Router::nest()` for modular route trees per domain (tournaments, venues, players) +- State management: use `axum::extract::State` with `Arc` — avoid the temptation to put everything in one giant state struct + +--- + +## 3. Frontend Framework + +### Recommendation: **SvelteKit** (Svelte 5 + runes reactivity) + +### Alternatives Considered + +| Framework | Pros | Cons | +|-----------|------|------| +| **SvelteKit** | Smallest bundles, true compilation (no virtual DOM), built-in routing/SSR/PWA, Svelte 5 runes are elegant | Smaller ecosystem than React | +| **Next.js (React)** | Largest ecosystem, most libraries, biggest job market | Vercel lock-in concerns, React hydration overhead, larger bundles, RSC complexity | +| **SolidStart** | Finest-grained reactivity, near-zero overhead updates | Smallest ecosystem, least mature, fewer component libraries | +| **Nuxt (Vue)** | Good DX, solid ecosystem | Vue 3 composition API less elegant than Svelte 5 runes | + +### Reasoning + +**SvelteKit** is the best fit for PVM for several reasons: + +1. **Performance matters for venue displays**: Tournament clocks, waiting lists, and seat maps will run on venue TVs via Chromecast. 
Svelte's compiled output produces minimal JavaScript — the Cast receiver app will load faster and use less memory on Chromecast hardware. +2. **Real-time UI updates**: Svelte 5's fine-grained reactivity (runes: `$state`, `$derived`, `$effect`) means updating a single timer or seat status re-renders only that DOM node, not a virtual DOM diff. This is ideal for dashboards with many independently updating elements. +3. **PWA support**: SvelteKit has first-class service worker support and offline capabilities through `@sveltejs/adapter-static` and `vite-plugin-pwa`. +4. **Bundle size**: SvelteKit produces the smallest JavaScript bundles of any major framework — important for mobile PWA users on venue WiFi. +5. **Claude Code compatibility**: Svelte's template syntax is straightforward and less boilerplate than React — Claude can generate clean, readable Svelte components efficiently. +6. **No framework lock-in**: Svelte compiles away, so there's no runtime dependency. The output is vanilla JS + DOM manipulation. + +### UI Component Library + +**Recommendation: Skeleton UI** (Svelte-native) or **shadcn-svelte** (Tailwind-based, port of shadcn/ui) + +`shadcn-svelte` is particularly compelling because: +- Components are copied into your codebase (not a dependency) — full control +- Built on Tailwind CSS — consistent with the styling recommendation +- Accessible by default (uses Bits UI primitives under the hood) +- Matches the design patterns of the widely-used shadcn/ui ecosystem + +### Gotchas + +- SvelteKit's SSR is useful for the management dashboard but the Cast receiver and PWA may use `adapter-static` for pure SPA mode +- Svelte's ecosystem is smaller than React's, but for PVM's needs (forms, tables, charts, real-time) the ecosystem is sufficient +- Svelte 5 (runes) is a significant API change from Svelte 4 — ensure all examples and libraries target Svelte 5 + +--- + +## 4. 
Database Strategy + +### Recommendation: **PostgreSQL** (cloud primary) + **libSQL/SQLite** (local node) + **Electric SQL** or custom sync + +### Alternatives Considered + +| Approach | Pros | Cons | +|----------|------|------| +| **Postgres cloud + libSQL local + sync** | Best of both worlds — Postgres power in cloud, SQLite simplicity on RPi5 | Need sync layer, schema divergence risk | +| **Postgres everywhere** | One DB engine, simpler mental model | Postgres on RPi5 uses more memory, harder offline | +| **libSQL/Turso everywhere** | One engine, built-in edge replication | Less powerful for complex cloud queries, multi-tenant partitioning | +| **CockroachDB** | Distributed, strong consistency | Heavy for RPi5, expensive, overkill | + +### Detailed Recommendation + +**Cloud Database: PostgreSQL 16+** +- The gold standard for multi-tenant SaaS +- Row-level security (RLS) for tenant isolation +- JSONB for flexible per-venue configuration +- Excellent full-text search for player lookup across venues +- Partitioning by tenant for performance at scale +- Managed options: Neon (serverless, branching for dev), Supabase, or AWS RDS + +**Local Node Database: libSQL (via Turso's embedded runtime)** +- Fork of SQLite with cloud sync capabilities +- Runs embedded in the Rust binary — no separate database process on RPi5 +- WAL mode for concurrent reads during tournament operations +- Tiny memory footprint (< 10 MB typical) +- libSQL's Rust driver (`libsql`) is well-maintained + +**Sync Strategy:** + +The local node operates on a **subset** of the cloud data — only data relevant to its venue(s). The sync approach: + +1. **Cloud-to-local**: Player profiles, memberships, credit lines pushed to local node via NATS JetStream. Local node maintains a read replica of relevant data in libSQL. +2. **Local-to-cloud**: Tournament results, waitlist changes, transactions pushed to cloud via NATS JetStream with at-least-once delivery. Cloud processes as events. +3. 
**Conflict resolution**: Last-writer-wins (LWW) with vector clocks for most entities. For financial data (credit lines, buy-ins), use **event sourcing** — conflicts are impossible because every transaction is an immutable event. +4. **Offline queue**: When disconnected, local node queues mutations in a local WAL-style append-only log. On reconnect, replays in order via NATS. + +### ORM / Query Layer + +**Recommendation: `sqlx`** (compile-time checked queries) + +- `sqlx` checks SQL queries against the actual database schema at compile time +- No ORM abstraction layer — write real SQL, get compile-time safety +- Supports both PostgreSQL and SQLite/libSQL +- Avoids the N+1 query problems that ORMs introduce +- Migrations via `sqlx migrate` + +Alternative: `sea-orm` if you want a full ORM, but for PVM the explicit SQL approach of `sqlx` gives more control over multi-tenant queries and complex joins. + +### Migrations + +- Use `sqlx migrate` for cloud PostgreSQL migrations +- Maintain parallel migration files for libSQL (SQLite-compatible subset) +- A shared migration test ensures both schemas stay compatible for the sync subset + +### Gotchas + +- PostgreSQL and SQLite have different SQL dialects — the sync subset must use compatible types (no Postgres-specific types in synced tables) +- libSQL's `VECTOR` type is interesting for future player similarity features but not needed initially +- Turso's hosted libSQL replication is an option but adds a dependency — prefer embedded libSQL with custom NATS-based sync for more control +- Schema versioning must be tracked on the local node so the cloud knows what schema version it's talking to + +--- + +## 5. 
Caching Layer + +### Recommendation: **DragonflyDB** + +### Alternatives Considered + +| Option | Pros | Cons | +|--------|------|------| +| **DragonflyDB** | 25x Redis throughput, Redis-compatible API, multi-threaded, lower memory usage | Younger project, smaller community | +| **Redis 7+** | Most mature, largest ecosystem, Redis Stack modules | Single-threaded core, BSL license concerns since Redis 7.4 | +| **Valkey** | Redis fork, community-driven, BSD license | Still catching up to Redis feature parity | +| **KeyDB** | Multi-threaded Redis fork | Development appears stalled (no updates in 1.5+ years) | +| **No cache (just Postgres)** | Simpler architecture | Higher DB load, slower for session/real-time data | + +### Reasoning + +**DragonflyDB** is the right choice for PVM: + +1. **Redis API compatibility**: Drop-in replacement — all Redis client libraries work unchanged. The `fred` Rust crate (async Redis client) works with DragonflyDB out of the box. +2. **Multi-threaded architecture**: DragonflyDB uses all available CPU cores, unlike Redis's single-threaded model. This matters when caching tournament state for hundreds of concurrent venues. +3. **Memory efficiency**: DragonflyDB uses up to 80% less memory than Redis for the same dataset — important for keeping infrastructure costs low. +4. **No license concerns**: DragonflyDB uses BSL 1.1 (converts to open source after 4 years). Redis switched to a dual-license model that's more restrictive. Valkey is BSD but is playing catch-up. +5. **Pub/Sub**: DragonflyDB supports Redis Pub/Sub — useful as a lightweight complement to NATS for in-process event distribution within the backend cluster. 
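The cacheable data listed below all follows the same cache-aside shape: try the cache, fall back to Postgres, then repopulate. A minimal dependency-free sketch of that pattern — in production this would be the `fred` client talking to DragonflyDB; `TtlCache` and `load_tournament_state` are illustrative stand-ins, not real APIs:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Illustrative in-process TTL cache standing in for DragonflyDB.
struct TtlCache {
    entries: HashMap<String, (String, Instant)>,
    ttl: Duration,
}

impl TtlCache {
    fn new(ttl: Duration) -> Self {
        Self { entries: HashMap::new(), ttl }
    }

    fn get(&self, key: &str) -> Option<&str> {
        self.entries
            .get(key)
            .and_then(|(v, at)| (at.elapsed() < self.ttl).then_some(v.as_str()))
    }

    fn put(&mut self, key: &str, value: String) {
        self.entries.insert(key.to_string(), (value, Instant::now()));
    }
}

// Cache-aside: read through the cache, fall back to the "database", repopulate.
fn load_tournament_state(cache: &mut TtlCache, key: &str) -> String {
    if let Some(hit) = cache.get(key) {
        return hit.to_string();
    }
    let fresh = format!("state-for:{key}"); // stand-in for a Postgres query
    cache.put(key, fresh.clone());
    fresh
}

fn main() {
    let mut cache = TtlCache::new(Duration::from_secs(30));
    let a = load_tournament_state(&mut cache, "venue.123.tournament.456");
    let b = load_tournament_state(&mut cache, "venue.123.tournament.456"); // cache hit
    assert_eq!(a, b);
    println!("ok");
}
```

Keeping the cache layer optional falls out of this pattern naturally: on a miss or cache outage, the loader simply hits Postgres.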
+ +### What to Cache + +- **Session data**: User sessions, JWT refresh tokens +- **Tournament state**: Current level, blinds, clock, player counts (hot read path) +- **Waiting lists**: Ordered sets per venue/game type +- **Rate limiting**: API rate limit counters +- **Player lookup cache**: Frequently accessed player profiles +- **Seat maps**: Current table/seat assignments per venue + +### What NOT to Cache (use Postgres directly) + +- Financial transactions (credit lines, buy-ins) — always hit the source of truth +- Audit logs +- Historical tournament data + +### Local Node: No DragonflyDB + +The RPi5 local node should **not** run DragonflyDB. libSQL is fast enough for local caching needs, and adding another process increases complexity and memory usage on constrained hardware. Use in-memory Rust data structures (e.g., `DashMap`, `moka` cache crate) for hot local state. + +### Gotchas + +- DragonflyDB's replication features are less mature than Redis Sentinel/Cluster — use managed hosting or keep it simple with a single node + persistence initially +- Monitor DragonflyDB's release cycle — it's actively developed but younger than Redis +- Keep the cache layer optional — the system should function (slower) without it + +--- + +## 6. 
Message Queue / Event Streaming + +### Recommendation: **NATS + JetStream** + +### Alternatives Considered + +| Option | Pros | Cons | +|--------|------|------| +| **NATS + JetStream** | Lightweight (single binary, ~20MB), sub-ms latency, built-in persistence, embedded mode, perfect for edge | Smaller community than Kafka | +| **Apache Kafka** | Highest throughput, mature, excellent tooling | Heavy (JVM, ZooKeeper/KRaft), 4GB+ RAM minimum, overkill for PVM's scale | +| **RabbitMQ** | Mature AMQP, sophisticated routing | Higher latency (5-20ms), more memory, Erlang ops complexity | +| **Redis Streams** | Simple, already have cache layer | Not designed for reliable message delivery at scale | + +### Reasoning + +**NATS + JetStream** is purpose-built for PVM's architecture: + +1. **Edge-native**: NATS can run as a **leaf node** on the RPi5, connecting to the cloud NATS cluster. This is the core of the local-to-cloud sync architecture. When the connection drops, JetStream buffers messages locally and replays them on reconnect. + +2. **Lightweight**: NATS server is a single ~20 MB binary. On RPi5, it uses ~50 MB RAM. Compare to Kafka's 4 GB minimum. + +3. **Sub-millisecond latency**: Core NATS delivers messages in < 1ms. JetStream (persistent) adds 1-5ms. This is critical for real-time tournament updates — when a player busts, every connected display should update within milliseconds. + +4. **Subject-based addressing**: NATS subjects map perfectly to PVM's domain: + - `venue.{venue_id}.tournament.{id}.clock` — tournament clock ticks + - `venue.{venue_id}.waitlist.update` — waiting list changes + - `venue.{venue_id}.seats.{table_id}` — seat assignments + - `player.{player_id}.notifications` — player-specific events + - `sync.{node_id}.upstream` — local node to cloud sync + - `sync.{node_id}.downstream` — cloud to local node sync + +5. 
**Built-in patterns**: Request/reply (for RPC between cloud and node), pub/sub (for broadcasts), queue groups (for load-balanced consumers), key-value store (for distributed config), object store (for binary data like player photos). + +6. **JetStream for durability**: Tournament results, financial transactions, and sync operations need guaranteed delivery. JetStream provides at-least-once and exactly-once delivery semantics with configurable retention. + +### Architecture + +``` +RPi5 Local Node Cloud +┌──────────────┐ ┌──────────────────┐ +│ NATS Leaf │◄──── TLS ────►│ NATS Cluster │ +│ Node │ (auto- │ (3-node) │ +│ │ reconnect) │ │ +│ JetStream │ │ JetStream │ +│ (local buf) │ │ (persistent) │ +└──────────────┘ └──────────────────┘ +``` + +### Gotchas + +- NATS JetStream's exactly-once semantics require careful consumer design — use idempotent handlers with deduplication IDs +- Subject namespace design is critical — plan it early, changing later is painful +- NATS leaf nodes need TLS configured for secure cloud connection +- Monitor JetStream stream sizes on RPi5 — set max bytes limits to avoid filling the SD card during extended offline periods +- The `async-nats` Rust crate is the official async client — well maintained and Tokio-native + +--- + +## 7. 
Real-Time Communication + +### Recommendation: **WebSockets** (via Axum) for interactive clients + **NATS** for backend fan-out + **SSE** as fallback + +### Alternatives Considered + +| Option | Pros | Cons | +|--------|------|------| +| **WebSockets** | Full duplex, low latency, wide support | Requires connection management, can't traverse some proxies | +| **Server-Sent Events (SSE)** | Simpler, auto-reconnect, HTTP-native | Server-to-client only, no binary support | +| **WebTransport** | HTTP/3, multiplexed streams, unreliable mode | Very new, limited browser support, no Chromecast support | +| **Socket.IO** | Auto-fallback, rooms, namespaces | Node.js-centric, adds overhead, not Rust-native | +| **gRPC streaming** | Typed, efficient, bidirectional | Not browser-native (needs grpc-web proxy), overkill | + +### Architecture + +The real-time pipeline has three layers: + +1. **NATS** (backend event bus): All state changes publish to NATS subjects. This is the single source of real-time truth. Both cloud services and local nodes publish here. + +2. **WebSocket Gateway** (Axum): A dedicated Axum service subscribes to relevant NATS subjects and fans out to connected WebSocket clients. Each client subscribes to the venues/tournaments they care about. + +3. **SSE Fallback**: For environments where WebSockets are blocked (some corporate networks), provide an SSE endpoint that delivers the same event stream. SSE's built-in auto-reconnect with `Last-Event-ID` makes resumption simple. 
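The gateway's core routing decision — which subscribed clients receive a given NATS event — reduces to NATS-style subject matching. A dependency-free sketch of the matching rules (`*` matches exactly one token, `>` matches one or more trailing tokens); `subject_matches` is an illustrative helper, not part of `async-nats`:

```rust
// Sketch of NATS subject matching as used by the WebSocket gateway to decide
// fan-out. `*` matches exactly one dot-separated token; `>` matches one or
// more trailing tokens.
fn subject_matches(pattern: &str, subject: &str) -> bool {
    let mut pat = pattern.split('.');
    let mut sub = subject.split('.');
    loop {
        match (pat.next(), sub.next()) {
            (Some(">"), Some(_)) => return true,      // `>` swallows the rest
            (Some("*"), Some(_)) => continue,         // `*` matches any one token
            (Some(p), Some(s)) if p == s => continue, // literal token match
            (None, None) => return true,              // both exhausted: match
            _ => return false,                        // length or token mismatch
        }
    }
}

fn main() {
    assert!(subject_matches("venue.123.tournament.*", "venue.123.tournament.456"));
    assert!(subject_matches("venue.*.waitlist.update", "venue.999.waitlist.update"));
    assert!(subject_matches("venue.>", "venue.123.seats.t1"));
    assert!(!subject_matches("venue.123.tournament.*", "venue.123.waitlist.update"));
    println!("ok");
}
```

Note that `*` never spans multiple tokens: a gateway that wants every event under a tournament (`…tournament.456.clock`, `…tournament.456.seats`, etc.) must subscribe with `>` rather than `*`.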
+ +### Flow Example: Tournament Clock Update + +``` +Tournament Service (Rust) + → publishes to NATS: venue.123.tournament.456.clock {level: 5, time_remaining: 1200} + → WebSocket Gateway subscribes to venue.123.tournament.> + → fans out to all connected clients watching tournament 456 + → Chromecast receiver app gets update, renders clock + → PWA on player's phone gets update, shows current level +``` + +### Implementation Details + +- Use `axum::extract::ws::WebSocket` with `tokio::select!` to multiplex NATS subscription + client messages +- Implement heartbeat/ping-pong to detect stale connections (30s interval, 10s timeout) +- Client reconnection with exponential backoff + subscription replay from NATS JetStream +- Binary message format: consider MessagePack (`rmp-serde`) for compact payloads over WebSocket, with JSON as human-readable fallback +- Connection limits: track per-venue connection count, implement backpressure + +### Gotchas + +- WebSocket connections are stateful — need sticky sessions or a connection registry if running multiple gateway instances +- Chromecast receiver apps have limited WebSocket support — test thoroughly on actual hardware +- Mobile PWAs going to background will drop WebSocket connections — design for reconnection and state catch-up +- Rate limit outbound messages to prevent flooding slow clients (tournament clock ticks should be throttled to 1/second for display, even if internal state updates more frequently) + +--- + +## 8. 
Auth & Authorization + +### Recommendation: **Custom JWT auth** with **Postgres-backed RBAC** + optional **OAuth2 social login** + +### Alternatives Considered + +| Option | Pros | Cons | +|--------|------|------| +| **Custom JWT + RBAC** | Full control, no vendor dependency, works offline on local node | Must implement everything yourself | +| **Auth0 / Clerk** | Managed, social login, MFA out of box | Vendor lock-in, cost scales with users, doesn't work offline | +| **Keycloak** | Self-hosted, full-featured, OIDC/SAML | Heavy (Java), complex to operate, overkill | +| **Ory (Kratos + Keto)** | Open source, cloud-native, API-first | Multiple services to deploy, newer | +| **Lucia Auth** | Lightweight, framework-agnostic | TypeScript-only, no Rust support | + +### Architecture + +PVM's auth has a unique challenge: **cross-venue universal player accounts** that must work both online (cloud) and offline (local node). This rules out purely managed auth services. + +**Token Strategy:** + +``` +Access Token (JWT, short-lived: 15 min) +├── sub: player_id (universal) +├── tenant_id: current operator +├── venue_id: current venue (if applicable) +├── roles: ["player", "dealer", "floor_manager", "admin"] +├── permissions: ["tournament.manage", "waitlist.view", ...] +└── iat, exp, iss + +Refresh Token (opaque, stored in DB/DragonflyDB, long-lived: 30 days) +└── Rotated on each use, old tokens invalidated +``` + +**RBAC Model:** + +``` +Operator (tenant) +├── Admin — full control over all venues +├── Manager — manage specific venues +├── Floor Manager — tournament/table operations at a venue +├── Dealer — assigned to tables, report results +└── Player — universal account, cross-venue + ├── can self-register + ├── has memberships per venue + └── has credit lines per venue (managed by admin) +``` + +**Key Design Decisions:** + +1. **Tenant-scoped roles**: A user can be an admin in one operator's venues and a player in another. 
The `(user_id, operator_id, role)` triple is the authorization unit. +2. **Offline auth on local node**: The local node caches valid JWT signing keys and a subset of user credentials (hashed). Players can authenticate locally when the cloud is unreachable. New registrations queue for cloud sync. +3. **JWT signing**: Use Ed25519 (fast, small signatures) via the `jsonwebtoken` crate. The cloud signs tokens; the local node can verify them with the public key. For offline token issuance, the local node has a delegated signing key. +4. **Password hashing**: `argon2` crate — memory-hard, resistant to GPU attacks. Tune parameters for RPi5 (lower memory cost than cloud). +5. **Social login** (optional, cloud-only): Support Google/Apple sign-in for player accounts via standard OAuth2 flows. Map social identities to the universal player account. + +### Gotchas + +- Token revocation is hard with JWTs — use short expiry (15 min) + refresh token rotation + a lightweight blocklist in DragonflyDB for immediate revocation +- Cross-venue account linking: when a player signs up at venue A and later visits venue B (different operator), they should be recognized. Use email/phone as the universal identifier with verification. +- Local node token issuance must be time-limited and logged — cloud should audit all locally-issued tokens on sync +- Rate limit login attempts both on cloud and local node to prevent brute force + +--- + +## 9. 
API Design + +### Recommendation: **REST + OpenAPI 3.1** with generated TypeScript client + +### Alternatives Considered + +| Approach | Pros | Cons | +|----------|------|------| +| **REST + OpenAPI** | Universal, tooling-rich, generated clients, cacheable | Overfetching possible, multiple round trips | +| **GraphQL** | Flexible queries, single endpoint, good for complex UIs | Complexity overhead, caching harder, Rust support less mature | +| **tRPC** | Zero-config type safety | TypeScript-only — cannot use with Rust backend | +| **gRPC** | Efficient binary protocol, streaming | Needs proxy for browsers, overkill for this use case | + +### Reasoning + +**tRPC is ruled out** because it requires both client and server to be TypeScript. With a Rust backend, this is not viable. + +**REST + OpenAPI** is the best approach because: + +1. **Generated type safety**: Use `utoipa` to generate OpenAPI 3.1 specs from Rust types, then `openapi-typescript` to generate TypeScript types for the frontend. Changes to the Rust API automatically propagate to the frontend types. +2. **Cacheable**: REST's HTTP semantics enable CDN caching, ETag support, and conditional requests — important for player profiles and tournament structures that change infrequently. +3. **Universal clients**: The REST API will also be consumed by the Chromecast receiver app, the local node sync layer, and potentially third-party integrations. OpenAPI makes all of these easy. +4. **Tooling**: Swagger UI for exploration, `openapi-fetch` for the TypeScript client (type-safe fetch wrapper), Postman/Insomnia for testing. 
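One convention worth fixing early is opaque pagination cursors: encode the sort key of the last returned row (e.g. `(created_at, id)`) and hand it back verbatim on the next request. A dependency-free sketch — hex stands in for base64url to avoid a crate, and the field names are illustrative:

```rust
// Sketch of an opaque cursor for cursor-based pagination: encode the sort key
// of the last row returned, decode it to resume the scan after that row.
fn encode_cursor(created_at: u64, id: u64) -> String {
    let raw = format!("{created_at}:{id}");
    raw.bytes().map(|b| format!("{b:02x}")).collect()
}

fn decode_cursor(cursor: &str) -> Option<(u64, u64)> {
    // Reject anything that isn't well-formed hex rather than panicking.
    let bytes: Option<Vec<u8>> = (0..cursor.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(cursor.get(i..i + 2)?, 16).ok())
        .collect();
    let raw = String::from_utf8(bytes?).ok()?;
    let (ts, id) = raw.split_once(':')?;
    Some((ts.parse().ok()?, id.parse().ok()?))
}

fn main() {
    let cursor = encode_cursor(1_700_000_000, 42);
    assert_eq!(decode_cursor(&cursor), Some((1_700_000_000, 42)));
    assert_eq!(decode_cursor("not-hex"), None); // malformed input is rejected
    println!("ok");
}
```

Because the cursor pins the scan to a concrete row rather than an offset, concurrent inserts cannot shift the page window — the property that makes cursor pagination preferable to `OFFSET` for lists like waitlists and tournament histories.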
+ +### API Conventions + +``` +# Resource-based URLs +GET /api/v1/venues/{venue_id}/tournaments +POST /api/v1/venues/{venue_id}/tournaments +GET /api/v1/venues/{venue_id}/tournaments/{id} +PATCH /api/v1/venues/{venue_id}/tournaments/{id} + +# Actions as sub-resources +POST /api/v1/venues/{venue_id}/tournaments/{id}/start +POST /api/v1/venues/{venue_id}/tournaments/{id}/pause +POST /api/v1/venues/{venue_id}/waitlists/{id}/join +POST /api/v1/venues/{venue_id}/waitlists/{id}/call/{player_id} + +# Cross-venue player operations +GET /api/v1/players/me +GET /api/v1/players/{id}/memberships +POST /api/v1/players/{id}/credit-lines + +# Real-time subscriptions +WS /api/v1/ws?venue={id}&subscribe=tournament.clock,waitlist.updates +``` + +### Type Generation Pipeline + +``` +Rust structs (serde + utoipa derive) + → OpenAPI 3.1 JSON spec (generated at build time) + → openapi-typescript (CI step) + → TypeScript types + openapi-fetch client + → SvelteKit frontend consumes typed API +``` + +### Gotchas + +- Version the API from day one (`/api/v1/`) — breaking changes go in `/api/v2/` +- Use cursor-based pagination for lists (not offset-based) — more efficient and handles concurrent inserts +- Standardize error responses: `{ error: { code: string, message: string, details?: any } }` +- Consider a lightweight BFF (Backend-for-Frontend) pattern in SvelteKit's server routes for aggregating multiple API calls into one page load + +--- + +## 10. 
Local Node Architecture + +### Recommendation: **Single Rust binary** running on RPi5 with embedded libSQL, NATS leaf node, and local HTTP/WS server + +### What Runs on the RPi5 + +``` +┌─────────────────────────────────────────────────────┐ +│ PVM Local Node (single Rust binary, ~15-20 MB) │ +│ │ +│ ┌──────────────┐ ┌──────────────┐ │ +│ │ HTTP/WS │ │ NATS Leaf │ │ +│ │ Server │ │ Node │ │ +│ │ (Axum) │ │ (embedded or │ │ +│ │ │ │ sidecar) │ │ +│ └──────┬───────┘ └──────┬───────┘ │ +│ │ │ │ +│ ┌──────┴──────────────────┴───────┐ │ +│ │ Application Core │ │ +│ │ - Tournament engine │ │ +│ │ - Clock manager │ │ +│ │ - Waitlist manager │ │ +│ │ - Seat assignment │ │ +│ │ - Sync orchestrator │ │ +│ └──────────────┬───────────────────┘ │ +│ │ │ +│ ┌──────────────┴───────────────────┐ │ +│ │ libSQL (embedded) │ │ +│ │ - Venue data subset │ │ +│ │ - Offline mutation queue │ │ +│ │ - Local auth cache │ │ +│ └───────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────┐ │ +│ │ moka in-memory cache │ │ +│ │ - Hot tournament state │ │ +│ │ - Active session tokens │ │ +│ └───────────────────────────────────┘ │ +└─────────────────────────────────────────────────────────┘ +``` + +### Offline Operations + +When the cloud connection drops, the local node continues operating: + +1. **Tournament operations**: Clock continues, blinds advance, players bust/rebuy — all local state +2. **Waitlist management**: Players can join/leave waitlists — queued for cloud sync +3. **Seat assignments**: Floor managers can move players between tables locally +4. **Player auth**: Cached credentials allow existing players to log in. New registrations queued. +5. **Financial operations**: Buy-ins and credit transactions logged locally with offline flag. Cloud reconciles on reconnect. + +### Sync Protocol + +``` +On reconnect: +1. Local node sends its last-seen cloud sequence number +2. Cloud sends all events since that sequence (via NATS JetStream replay) +3. 
Local node sends its offline mutation queue (ordered by local timestamp) +4. Cloud processes mutations, detects conflicts, responds with resolution +5. Local node applies cloud resolutions, updates local state +6. Both sides confirm sync complete +``` + +### Conflict Resolution Strategy + +| Data Type | Strategy | Reasoning | +|-----------|----------|-----------| +| Tournament state | Cloud wins | Only one node runs a tournament at a time | +| Waitlist | Merge (union) | Both sides can add/remove; merge and re-order by timestamp | +| Player profiles | Cloud wins (LWW) | Cloud is the authority for universal accounts | +| Credit transactions | Append-only (event sourcing) | No conflicts — every transaction is immutable | +| Seat assignments | Local wins during offline | Floor manager's local decisions take precedence | +| Dealer schedules | Cloud wins | Schedules are set centrally | + +### RPi5 System Setup + +- **OS**: Raspberry Pi OS Lite (64-bit, Debian Bookworm-based) — no desktop environment +- **Storage**: 32 GB+ microSD or USB SSD (recommended for durability) +- **Auto-start**: systemd service for the PVM binary +- **Updates**: OTA binary updates via a self-update mechanism (download new binary, verify signature, swap, restart) +- **Watchdog**: Hardware watchdog timer to auto-reboot if the process hangs +- **Networking**: Ethernet preferred (reliable), WiFi as fallback. mDNS for local discovery. 
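Under the assumptions above, the conflict table reduces to a resolution pass over the replayed offline queue. A hypothetical sketch (types, entity names, and `resolve` are illustrative, not the real sync protocol; the waitlist merge strategy is omitted for brevity):

```rust
// Sketch of the sync handshake's conflict step: offline mutations replay in
// local-timestamp order, and a per-entity strategy decides who wins.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Strategy {
    CloudWins,
    LocalWins,
    AppendOnly, // event-sourced: every mutation is kept
}

#[derive(Debug, Clone, PartialEq)]
struct Mutation {
    entity: &'static str,
    local_ts: u64,
    payload: &'static str,
}

fn strategy_for(entity: &str) -> Strategy {
    match entity {
        "tournament" | "player_profile" | "dealer_schedule" => Strategy::CloudWins,
        "seat_assignment" => Strategy::LocalWins,
        "credit_transaction" => Strategy::AppendOnly,
        _ => Strategy::CloudWins,
    }
}

// Returns the offline mutations the cloud should apply, in replay order.
fn resolve(mut queue: Vec<Mutation>, cloud_has_newer: impl Fn(&Mutation) -> bool) -> Vec<Mutation> {
    queue.sort_by_key(|m| m.local_ts); // replay in local-timestamp order
    queue
        .into_iter()
        .filter(|m| match strategy_for(m.entity) {
            Strategy::AppendOnly | Strategy::LocalWins => true,
            Strategy::CloudWins => !cloud_has_newer(m),
        })
        .collect()
}

fn main() {
    let queue = vec![
        Mutation { entity: "seat_assignment", local_ts: 2, payload: "move p7 to t3" },
        Mutation { entity: "credit_transaction", local_ts: 1, payload: "buy-in 200" },
        Mutation { entity: "player_profile", local_ts: 3, payload: "nickname" },
    ];
    // Pretend the cloud already holds a newer player_profile write.
    let applied = resolve(queue, |m| m.entity == "player_profile");
    assert_eq!(applied.len(), 2);
    assert_eq!(applied[0].payload, "buy-in 200"); // timestamp order preserved
    println!("ok");
}
```

The key property is that credit transactions never enter the conflict path at all — as append-only events they are simply replayed, which is why the conflict table marks them "conflicts are impossible".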
+
+### Gotchas
+
+- RPi5 has 4 GB or 8 GB RAM — target 8 GB model, budget ~200 MB for the PVM process + NATS
+- SD card wear: use an external USB SSD for the libSQL database if heavy write operations are expected
+- Time synchronization: use `chrony` NTP client — accurate timestamps are critical for conflict resolution and tournament clocks
+- Power loss: libSQL in WAL mode is crash-safe, but implement a clean shutdown handler (SIGTERM) that flushes state
+- Security: the RPi5 is physically accessible in venues — encrypt the libSQL database at rest, disable SSH password auth, use key-only
+
+---
+
+## 11. Chromecast / Display Streaming
+
+### Recommendation: **Google Cast SDK** with a **Custom Web Receiver** (SvelteKit static app)
+
+### Architecture
+
+```
+┌──────────────┐      Cast SDK      ┌───────────────────┐
+│ Sender App   │ ──────────────►    │ Custom Web        │
+│ (PVM Admin   │ (discovers &       │ Receiver          │
+│  Dashboard)  │  launches)         │ (SvelteKit SPA)   │
+│              │                    │                   │
+│      or      │                    │ Hosted at:        │
+│              │                    │ cast.pvmapp.com   │
+│  Local Node  │                    │                   │
+│  HTTP Server │                    │ Connects to WS    │
+│              │                    │ for live updates  │
+└──────────────┘                    └─────────┬─────────┘
+                                              │
+                                    ┌─────────▼─────────┐
+                                    │ Chromecast Device │
+                                    │ (renders receiver)│
+                                    └───────────────────┘
+```
+
+### Custom Web Receiver
+
+The Cast receiver is a **separate SvelteKit static app** that:
+
+1. Loads on the Chromecast device when cast is initiated
+2. Connects to the PVM WebSocket endpoint (cloud or local node, depending on network)
+3. Subscribes to venue-specific events (tournament clock, waitlist, seat map)
+4. 
Renders full-screen display layouts:
+   - **Tournament clock**: Large timer, current level, blind structure, next break
+   - **Waiting list**: Player queue by game type, estimated wait times
+   - **Table status**: Open seats, game types, stakes per table
+   - **Custom messages**: Announcements, promotions
+
+### Display Manager
+
+A venue can have **multiple Chromecast devices** showing different content:
+
+- TV 1: Tournament clock (main)
+- TV 2: Cash game waiting list
+- TV 3: Table/seat map
+- TV 4: Rotating between tournament clock and waiting list
+
+The **Display Manager** (part of the admin dashboard) lets floor managers:
+- Assign content to each Chromecast device
+- Configure rotation/cycling between views
+- Send one-time announcements to all screens
+- Adjust display themes (dark/light, font size, venue branding)
+
+### Technical Details
+
+- Register the receiver app with the Google Cast Developer Console (one-time setup, $5 fee)
+- Use the Cast Application Framework (CAF) Receiver SDK v3
+- The receiver app is a standard web page — it can use any web framework (SvelteKit static build)
+- Sender integration: use the `cast.framework.CastContext` API in the admin dashboard
+- For **local network casting** (offline mode): the local node serves the receiver app directly, and the Chromecast connects to the local node's IP
+- Consider also supporting **generic HDMI displays** via a simple browser in kiosk mode (Chromium on a secondary RPi or mini PC) as a non-Chromecast fallback
+
+### Gotchas
+
+- Chromecast devices have limited memory and CPU — keep the receiver app lightweight (Svelte is ideal here)
+- Cast sessions can time out after inactivity — implement keep-alive messages
+- Chromecast requires an internet connection for the initial app load (it fetches the receiver URL from Google's servers) — for fully offline venues, the kiosk-mode browser fallback is essential
+- Test on actual Chromecast hardware early — the developer emulator doesn't catch all issues
+- The Cast SDK 
requires HTTPS for the receiver URL in production (self-signed certs won't work on Chromecast)
+
+---
+
+## 12. Mobile Strategy
+
+### Recommendation: **PWA first** (SvelteKit), with a **Capacitor** wrapper for app store presence when needed
+
+### Alternatives Considered
+
+| Approach | Pros | Cons |
+|----------|------|------|
+| **PWA (SvelteKit)** | One codebase, instant updates, no app store, works offline | Limited native API access, iOS push only since 16.4 (still limited), discoverability |
+| **Capacitor (hybrid)** | PWA + native shell, access to native APIs, app store distribution | Thin WebView wrapper, some performance overhead |
+| **Tauri Mobile** | Rust backend, small size | Mobile support very early (alpha/beta), limited ecosystem |
+| **React Native** | True native UI, large ecosystem | Separate codebase from web, React dependency, not Svelte |
+| **Flutter** | Excellent cross-platform, single codebase | Dart language, separate from web entirely |
+
+### Reasoning
+
+PVM's mobile needs are primarily **consumption-oriented** — players check tournament schedules, waiting list position, and receive notifications. This is a perfect fit for a PWA:
+
+1. **PWA first**: The SvelteKit app with `vite-plugin-pwa` already provides offline caching, add-to-home-screen, and background sync. For most players, this is sufficient.
+
+2. **Capacitor wrap when needed**: When richer iOS push notifications, Apple Pay, or app store presence become important, wrap the existing SvelteKit PWA in Capacitor. Capacitor runs the same web app in a native WebView and provides JavaScript bridges to native APIs.
+
+3. **Tauri Mobile is not ready**: As of 2026, Tauri 2.0's mobile support exists but is still maturing. It would be a good fit architecturally (Rust backend + web frontend), but the plugin ecosystem and build tooling aren't as polished as Capacitor's. Revisit in 12-18 months. 
+ +### PWA Features for PVM + +- **Service Worker**: Cache tournament schedules, player profile, venue info for offline access +- **Push Notifications**: Web Push API for tournament start reminders, waitlist calls (Android + iOS 16.4+) +- **Add to Home Screen**: App-like experience without app store +- **Background Sync**: Queue waitlist join/leave actions when offline, sync when back online +- **Share Target**: Accept shared tournament links + +### Gotchas + +- iOS PWA support is improving but still has limitations (no background fetch, limited push notification payload) +- Capacitor requires maintaining iOS/Android build pipelines — only add this when there's a clear need +- Test PWA on actual mobile devices in venues — WiFi quality varies dramatically +- Deep linking: configure universal links / app links so shared tournament URLs open in the PWA/app + +--- + +## 13. Deployment & Infrastructure + +### Recommendation: **Fly.io** (primary cloud) + **Docker** containers + **GitHub Actions** CI/CD + +### Alternatives Considered + +| Platform | Pros | Cons | +|----------|------|------| +| **Fly.io** | Edge deployment, built-in Postgres, simple scaling, good pricing, Rust-friendly | CLI-first workflow, no built-in CI/CD | +| **Railway** | Excellent DX, GitHub integration, preview environments | Less edge presence, newer | +| **AWS (ECS/Fargate)** | Full control, enterprise grade, broadest service catalog | Complex, expensive operations overhead | +| **Render** | Simple, good free tier | Less flexible networking, no edge | +| **Hetzner + manual** | Cheapest, full control | Operations burden, no managed services | + +### Reasoning + +**Fly.io** is the best fit for PVM: + +1. **Edge deployment**: Fly.io runs containers close to users. For a poker venue SaaS with venues in multiple cities/countries, edge deployment means lower latency for real-time tournament updates. +2. **Built-in Postgres**: Fly Postgres is managed, with automatic failover and point-in-time recovery. 
+
+3. **Fly Machines**: Fine-grained control over machine placement — can run NATS, DragonflyDB, and the API server as separate Fly machines.
+4. **Rust-friendly**: Fly.io's multi-stage Docker builds work well for Rust (build on a large machine, deploy a tiny binary).
+5. **Private networking**: Fly's WireGuard mesh enables secure communication between services without exposing ports publicly. The RPi5 local nodes can use Fly's WireGuard to connect to the cloud NATS cluster.
+6. **Reasonable pricing**: Pay-as-you-go, no minimum commitment. Scale to zero for staging environments.
+
+### Infrastructure Layout
+
+```
+Fly.io Cloud
+├── pvm-api (Axum, 2+ instances, auto-scaled)
+├── pvm-ws-gateway (Axum WebSocket, 2+ instances)
+├── pvm-nats (NATS cluster, 3 nodes)
+├── pvm-db (Fly Postgres, primary + replica)
+├── pvm-cache (DragonflyDB, single node)
+└── pvm-worker (background jobs: sync processing, notifications)
+
+Venue (RPi5)
+└── pvm-node (single Rust binary + NATS leaf node)
+    └── connects to pvm-nats via WireGuard/TLS
+```
+
+### CI/CD Pipeline (GitHub Actions)
+
+```
+# Triggered on push to main
+1. Lint (clippy, eslint)
+2. Test (cargo test, vitest, playwright)
+3. Build (multi-stage Docker for cloud, cross-compile for RPi5)
+4. Deploy staging (auto-deploy to Fly.io staging)
+5. E2E tests against staging
+6. Deploy production (manual approval gate)
+7. 
Publish RPi5 binary (signed, to update server)
+```
+
+### Gotchas
+
+- Fly.io Postgres is not fully managed — you still need to handle major version upgrades and backup verification
+- Use multi-stage Docker builds to keep Rust image sizes small (builder stage with `rust:bookworm`, runtime stage with `debian:bookworm-slim` or `distroless`)
+- Pin Fly.io machine regions to match your target markets — don't spread too thin initially
+- Set up blue-green deployments for zero-downtime upgrades
+- The RPi5 binary update mechanism needs a rollback strategy — keep the previous binary and a fallback boot option
+
+---
+
+## 14. Monitoring & Observability
+
+### Recommendation: **OpenTelemetry** (traces + metrics + logs) exported to **Grafana Cloud** (or self-hosted Grafana + Loki + Tempo + Prometheus)
+
+### Alternatives Considered
+
+| Stack | Pros | Cons |
+|-------|------|------|
+| **OpenTelemetry + Grafana** | Vendor-neutral, excellent Rust support, unified pipeline | Some setup required |
+| **Datadog** | All-in-one, excellent UX | Expensive at scale, vendor lock-in |
+| **New Relic** | Good APM | Cost, Rust support less first-class |
+| **Sentry** | Excellent error tracking | Limited metrics/traces, complementary rather than primary |
+
+### Rust Instrumentation Stack
+
+```toml
+# Key crates
+tracing = "0.1"                          # Structured logging/tracing facade
+tracing-subscriber = "0.3"               # Log formatting, filtering
+tracing-opentelemetry = "0.28"           # Bridge tracing → OpenTelemetry
+opentelemetry = "0.28"                   # OTel SDK
+opentelemetry-otlp = "0.28"              # OTLP exporter
+opentelemetry-semantic-conventions = "0.28"  # Standard attribute names
+```
+
+### What to Monitor
+
+**Application Metrics:**
+- Request rate, latency (p50/p95/p99), error rate per endpoint
+- WebSocket connection count per venue
+- NATS message throughput and consumer lag
+- Tournament clock drift (local node vs cloud time)
+- Sync latency (time from local mutation to cloud persistence)
+- Cache hit/miss ratios (DragonflyDB)
+
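+As a concrete reading of the latency numbers above, p50/p95/p99 are nearest-rank percentiles over a window of request durations. A std-only sketch (in production these values come from an OpenTelemetry histogram instrument; `percentile` is an illustrative helper, not part of any crate):
+
+```rust
+/// Nearest-rank percentile: the ceil(p/100 * N)-th smallest value (1-based).
+/// `sorted_ms` must be sorted ascending and non-empty.
+fn percentile(sorted_ms: &[u64], p: f64) -> u64 {
+    assert!(!sorted_ms.is_empty() && (0.0..=100.0).contains(&p));
+    let rank = ((p / 100.0) * sorted_ms.len() as f64).ceil() as usize;
+    sorted_ms[rank.saturating_sub(1).min(sorted_ms.len() - 1)]
+}
+
+fn main() {
+    // Fake window: 100 request latencies of 1..=100 ms.
+    let mut window: Vec<u64> = (1..=100).collect();
+    window.sort_unstable();
+    assert_eq!(percentile(&window, 50.0), 50);
+    assert_eq!(percentile(&window, 95.0), 95);
+    assert_eq!(percentile(&window, 99.0), 99);
+    println!(
+        "p50={} p95={} p99={}",
+        percentile(&window, 50.0),
+        percentile(&window, 95.0),
+        percentile(&window, 99.0)
+    );
+}
+```
+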
+**Business Metrics:** +- Active tournaments per venue +- Players on waiting lists +- Concurrent connected users +- Tournament registrations per hour +- Offline duration per local node + +**Infrastructure Metrics:** +- CPU, memory, disk per service +- RPi5 node health: temperature, memory usage, SD card wear level +- NATS cluster health +- Postgres connection pool utilization + +### Local Node Observability + +The RPi5 node should: +- Buffer OpenTelemetry spans/metrics locally when offline +- Flush to cloud collector on reconnect +- Expose a local `/health` endpoint for venue staff to check node status +- Log to both stdout (for `journalctl`) and a rotating file + +### Alerting + +- Use Grafana Alerting for cloud services +- Critical alerts: API error rate > 5%, NATS cluster partition, Postgres replication lag > 30s +- Warning alerts: RPi5 node offline > 5 min, sync backlog > 1000 events, high memory usage +- Notification channels: Slack/Discord for ops team, push notification for venue managers on critical local node issues + +### Gotchas + +- OpenTelemetry's Rust SDK is stable but evolving — pin versions carefully +- The `tracing` crate is the Rust ecosystem standard — everything (Axum, sqlx, async-nats) already emits tracing spans, so you get deep instrumentation for free +- Sampling is important at scale — don't trace every tournament clock tick in production +- Grafana Cloud's free tier is generous enough for early stages (10k metrics, 50GB logs, 50GB traces) + +--- + +## 15. 
Testing Strategy + +### Recommendation: Multi-layer testing with **cargo test** (unit/integration), **Playwright** (E2E), and **Vitest** (frontend unit) + +### Test Pyramid + +``` + ▲ + / \ E2E Tests (Playwright) + / \ - Full user flows + / \ - Cast receiver rendering + /───────\ + / \ Integration Tests (cargo test + testcontainers) + / \ - API endpoint tests with real DB + / \ - NATS pub/sub flows + / \ - Sync protocol tests +/─────────────────\ + Unit Tests (cargo test + vitest) + - Domain logic (tournament engine, clock, waitlist) + - Svelte component tests + - Conflict resolution logic +``` + +### Backend Testing (Rust) + +- **Unit tests**: Inline `#[cfg(test)]` modules for domain logic. The tournament engine, clock manager, waitlist priority algorithm, and conflict resolution are all pure functions that are easy to unit test. +- **Integration tests**: Use `testcontainers` crate to spin up ephemeral Postgres + NATS + DragonflyDB instances. Test full API flows including auth, multi-tenancy, and real-time events. +- **sqlx compile-time checks**: SQL queries are validated against the database schema at compile time — this catches a huge class of bugs before runtime. +- **Property-based testing**: Use `proptest` for testing conflict resolution and sync protocol with random inputs. +- **Test runner**: `cargo-nextest` for parallel test execution (significantly faster than default `cargo test`). + +### Frontend Testing (TypeScript/Svelte) + +- **Component tests**: Vitest + `@testing-library/svelte` for testing Svelte components in isolation. +- **Store/state tests**: Vitest for testing reactive state logic (tournament clock state, waitlist updates). +- **API mocking**: `msw` (Mock Service Worker) for intercepting API calls in tests. 
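+
+The backend-testing claim above (domain logic as pure functions) pays off in plain table-driven tests with no I/O or mocks. An illustrative sketch (`current_level` is a hypothetical `pvm-core` function, not an existing API):
+
+```rust
+/// 0-based blind level for a tournament with fixed-length levels.
+/// Pure function of elapsed time, so tests need no clock or database.
+fn current_level(elapsed_secs: u64, level_secs: u64, num_levels: usize) -> usize {
+    let raw = (elapsed_secs / level_secs) as usize;
+    raw.min(num_levels - 1) // clamp: stay on the final level once reached
+}
+
+fn main() {
+    println!("level at 1200s: {}", current_level(1200, 1200, 10));
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn advances_and_clamps() {
+        assert_eq!(current_level(0, 1200, 10), 0);        // start of level 1
+        assert_eq!(current_level(1199, 1200, 10), 0);     // last second of level 1
+        assert_eq!(current_level(1200, 1200, 10), 1);     // level rollover
+        assert_eq!(current_level(999_999, 1200, 10), 9);  // clamped to final level
+    }
+}
+```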
+ +### End-to-End Testing + +- **Playwright**: Test critical user flows in real browsers: + - Tournament creation and management flow + - Player registration and waitlist join + - Real-time updates (verify clock ticks appear in browser) + - Multi-venue admin dashboard + - Cast receiver display rendering (headless Chromium) +- **Local node E2E**: Test offline scenarios — start local node, disconnect from cloud, perform operations, reconnect, verify sync. + +### Specialized Tests + +- **Sync protocol tests**: Simulate network partitions, conflicting writes, replay scenarios +- **Load testing**: `k6` or `drill` (Rust) for WebSocket connection saturation, API throughput +- **Cast receiver tests**: Visual regression testing with Playwright screenshots of display layouts +- **Cross-browser**: Playwright covers Chromium, Firefox, WebKit — ensure PWA works on all + +### Gotchas + +- Rust integration tests with testcontainers need Docker available in CI — Fly.io's CI runners support this, or use GitHub Actions with Docker +- Playwright tests are slow — run in parallel, and only test critical paths in CI (full suite nightly) +- The local node's offline/reconnect behavior is the hardest thing to test — invest heavily in deterministic sync protocol tests +- Mock the NATS connection in unit tests using a channel-based mock, not an actual NATS server + +--- + +## 16. Security + +### Recommendation: Defense in depth across all layers + +### Data Security + +| Layer | Measure | +|-------|---------| +| **Transport** | TLS 1.3 everywhere — API, WebSocket, NATS, Postgres connections | +| **Data at rest** | Postgres: encrypted volumes (cloud provider). 
libSQL on RPi5: SQLCipher-compatible encryption via `libsql` | +| **Secrets** | Environment variables via Fly.io secrets (cloud), encrypted config file on RPi5 (sealed at provisioning) | +| **Passwords** | Argon2id hashing, tuned per environment (higher params on cloud, lower on RPi5) | +| **JWTs** | Ed25519 signing, short expiry (15 min), refresh token rotation | +| **API keys** | SHA-256 hashed in DB, displayed once at creation, prefix-based identification (`pvm_live_`, `pvm_test_`) | + +### Network Security + +- **API**: Rate limiting (Tower middleware), CORS restricted to known origins, request size limits +- **WebSocket**: Authenticated connection upgrade (JWT in first message or query param), per-connection rate limiting +- **NATS**: TLS + token auth between cloud and leaf nodes. Leaf nodes have scoped permissions (can only access their venue's subjects) +- **RPi5**: Firewall (nftables/ufw) — only allow outbound to cloud NATS + HTTPS, inbound on local network only for venue devices +- **DDoS**: Fly.io provides basic DDoS protection. Add Cloudflare in front for the API if needed. 
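+
+The per-connection rate limiting mentioned above boils down to token-bucket logic. A std-only sketch (`TokenBucket` is an illustrative name; in practice this state would sit behind a Tower middleware layer or inside the WS gateway's connection handler):
+
+```rust
+use std::time::{Duration, Instant};
+
+/// Token bucket: allows bursts up to `capacity`, then throttles
+/// to `refill_per_sec` sustained requests per second.
+struct TokenBucket {
+    capacity: f64,
+    tokens: f64,
+    refill_per_sec: f64,
+    last: Instant,
+}
+
+impl TokenBucket {
+    fn new(capacity: f64, refill_per_sec: f64, now: Instant) -> Self {
+        Self { capacity, tokens: capacity, refill_per_sec, last: now }
+    }
+
+    /// Returns true if the request is allowed; false means reject
+    /// (HTTP 429, or a slow-down/close frame on a WebSocket).
+    fn allow(&mut self, now: Instant) -> bool {
+        let elapsed = now.duration_since(self.last).as_secs_f64();
+        self.last = now;
+        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
+        if self.tokens >= 1.0 {
+            self.tokens -= 1.0;
+            true
+        } else {
+            false
+        }
+    }
+}
+
+fn main() {
+    let start = Instant::now();
+    let mut bucket = TokenBucket::new(3.0, 1.0, start); // burst 3, then 1 req/s
+    let allowed = (0..5).filter(|_| bucket.allow(start)).count();
+    assert_eq!(allowed, 3); // burst exhausted after 3 back-to-back requests
+    assert!(bucket.allow(start + Duration::from_secs(1))); // one token refilled
+    println!("allowed {} of 5 burst requests", allowed);
+}
+```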
+ +### Financial Data Security + +PVM handles credit lines and buy-in transactions — this requires extra care: + +- All financial mutations are **event-sourced** with immutable audit trail +- Credit line changes require **admin approval** with logged reason +- Buy-in/cashout transactions include **idempotency keys** to prevent duplicate charges +- Financial reports are only accessible to operator admins, with access logged +- Consider PCI DSS implications if handling payment card data directly — prefer delegating to a payment processor (Stripe) + +### Local Node Security + +The RPi5 is physically in a venue — assume it can be stolen or tampered with: + +- **Disk encryption**: Full disk encryption (LUKS) or at minimum encrypted database +- **Secure boot**: Signed binaries, verified at startup +- **Remote wipe**: Cloud can send a command to reset the node to factory state +- **Tamper detection**: Log unexpected restarts, hardware changes +- **Credential scope**: Local node only has access to its venue's data — compromising one node doesn't expose other venues + +### Gotchas + +- DO NOT store payment card numbers — use a payment processor's tokenization +- GDPR/privacy: Player data across venues requires careful consent management. Players must be able to request data deletion. +- The local node's offline auth cache is a security risk — limit cached credentials, expire after configurable period +- Regularly rotate NATS credentials and JWT signing keys — automate this + +--- + +## 17. 
Developer Experience + +### Recommendation: **Cargo workspace** (Rust monorepo) + **pnpm workspace** (TypeScript) managed by **Turborepo** + +### Monorepo Structure + +``` +pvm/ +├── Cargo.toml # Rust workspace root +├── turbo.json # Turborepo config +├── package.json # pnpm workspace root +├── pnpm-workspace.yaml +│ +├── crates/ # Rust crates +│ ├── pvm-api/ # Cloud API server (Axum) +│ ├── pvm-node/ # Local node binary +│ ├── pvm-ws-gateway/ # WebSocket gateway +│ ├── pvm-worker/ # Background job processor +│ ├── pvm-core/ # Shared domain logic +│ │ ├── tournament/ # Tournament engine +│ │ ├── waitlist/ # Waitlist management +│ │ ├── clock/ # Tournament clock +│ │ └── sync/ # Sync protocol +│ ├── pvm-db/ # Database layer (sqlx queries, migrations) +│ ├── pvm-auth/ # Auth logic (JWT, RBAC) +│ ├── pvm-nats/ # NATS client wrappers +│ └── pvm-types/ # Shared types (serde, utoipa derives) +│ +├── apps/ # TypeScript apps +│ ├── dashboard/ # SvelteKit admin dashboard +│ ├── player/ # SvelteKit player-facing app +│ ├── cast-receiver/ # SvelteKit Cast receiver (static) +│ └── docs/ # Documentation site (optional) +│ +├── packages/ # Shared TypeScript packages +│ ├── ui/ # shadcn-svelte components +│ ├── api-client/ # Generated OpenAPI client +│ └── shared/ # Shared types, utilities +│ +├── docker/ # Dockerfiles +├── .github/ # GitHub Actions workflows +└── docs/ # Project documentation +``` + +### Key Tools + +| Tool | Purpose | +|------|---------| +| **Cargo** | Rust build system, workspace management | +| **pnpm** | Fast, disk-efficient Node.js package manager | +| **Turborepo** | Orchestrates build/test/lint across both Rust and TS workspaces. Caches build outputs. `--affected` flag for CI optimization. 
|
+| **cargo-watch** | Auto-rebuild on Rust file changes during development |
+| **cargo-nextest** | Faster test runner with parallel execution |
+| **cross** / **cargo-zigbuild** | Cross-compile Rust for RPi5 ARM64 |
+| **sccache** | Shared compilation cache (speeds up CI and local builds) |
+| **Biome** | Fast linter + formatter for TypeScript (replaces ESLint + Prettier) |
+| **clippy** | Rust linter (run with `--deny warnings` in CI) |
+| **rustfmt** | Rust formatter (enforced in CI) |
+| **lefthook** | Git hooks manager (format + lint on pre-commit) |
+
+### Development Workflow
+
+```bash
+# Start everything for local development
+turbo dev                          # Starts SvelteKit dev servers
+cargo watch -x 'run -p pvm-api'    # Auto-restart API on changes
+
+# Run all tests
+turbo test                         # TypeScript tests
+cargo nextest run                  # Rust tests
+
+# Generate API client after backend changes
+cargo run -p pvm-api -- --openapi > apps/dashboard/src/lib/api/schema.json
+turbo generate:api-client
+
+# Build for production
+turbo build                        # TypeScript apps
+cargo build --release -p pvm-api
+cross build --release --target aarch64-unknown-linux-gnu -p pvm-node
+```
+
+### Gotchas
+
+- Turborepo's Rust support is task-level (it runs `cargo` as a shell command) — it doesn't understand Cargo's internal dependency graph. Use the Cargo workspace for Rust-internal dependencies.
+- Keep `pvm-core` as a pure library crate with no async runtime dependency — this lets it be used in both the cloud API and the local node without conflicts.
+- Rust compile times are the bottleneck — invest in `sccache` and incremental compilation from day one
+- Use `.cargo/config.toml` for cross-compilation targets and linker settings
+
+---
+
+## 18. 
CSS / Styling + +### Recommendation: **Tailwind CSS v4** + **shadcn-svelte** component system + +### Alternatives Considered + +| Option | Pros | Cons | +|--------|------|------| +| **Tailwind CSS v4** | Utility-first, fast, excellent Svelte integration, v4 is faster with Rust-based engine | Learning curve for utility classes | +| **Vanilla CSS** | No dependencies, full control | Slow development, inconsistent patterns | +| **UnoCSS** | Atomic CSS, fast, flexible presets | Smaller ecosystem than Tailwind | +| **Open Props** | Design tokens as CSS custom properties | Not utility-first, less adoption | +| **Panda CSS** | Type-safe styles, zero runtime | Newer, smaller ecosystem | + +### Reasoning + +**Tailwind CSS v4** is the clear choice: + +1. **Svelte integration**: Tailwind works seamlessly with SvelteKit via the Vite plugin. Svelte's template syntax + Tailwind utilities produce compact, readable component markup. +2. **Tailwind v4 improvements**: The v4 release includes a Rust-based engine (Oxide) that is significantly faster, CSS-first configuration (no more `tailwind.config.js`), automatic content detection, and native CSS cascade layers. +3. **shadcn-svelte**: The component library is built on Tailwind, providing a consistent design system with accessible, customizable components. Components are generated into your codebase — full ownership, no black box. +4. **Cast receiver**: Tailwind's utility classes produce small CSS bundles (only used classes are included) — important for the resource-constrained Chromecast receiver. +5. **Design tokens**: Use CSS custom properties (via Tailwind's theme) for venue-specific branding (colors, logos) that can be swapped at runtime. + +### Design System Structure + +``` +packages/ui/ +├── components/ # shadcn-svelte generated components +│ ├── button/ +│ ├── card/ +│ ├── data-table/ +│ ├── dialog/ +│ ├── form/ +│ └── ... 
+├── styles/ +│ ├── app.css # Global styles, Tailwind imports +│ ├── themes/ +│ │ ├── default.css # Default PVM theme +│ │ ├── dark.css # Dark mode overrides +│ │ └── cast.css # Optimized for large screens +│ └── tokens.css # Design tokens (colors, spacing, typography) +└── utils.ts # cn() helper, variant utilities +``` + +### Venue Branding + +Venues should be able to customize their displays: + +```css +/* Runtime theme switching via CSS custom properties */ +:root { + --venue-primary: theme(colors.blue.600); + --venue-secondary: theme(colors.gray.800); + --venue-logo-url: url('/default-logo.svg'); +} + +/* Applied per-venue at runtime */ +[data-venue-theme="vegas-poker"] { + --venue-primary: #c41e3a; + --venue-secondary: #1a1a2e; + --venue-logo-url: url('/venues/vegas-poker/logo.svg'); +} +``` + +### Gotchas + +- Tailwind v4's CSS-first config is a paradigm shift from v3 — ensure all team documentation targets v4 syntax +- shadcn-svelte components use Tailwind v4 as of recent updates — verify compatibility +- Large data tables (tournament player lists, waitlists) need careful styling — consider virtualized rendering for 100+ row tables +- Cast receiver displays need large fonts and high contrast — create a dedicated `cast.css` theme +- Dark mode is essential for poker venues (low-light environments) — design dark-first + +--- + +## Recommended Stack Summary + +| Area | Recommendation | Key Reasoning | +|------|---------------|---------------| +| **Backend Language** | Rust | Memory efficiency on RPi5, performance, type safety | +| **Frontend Language** | TypeScript | Browser ecosystem standard, type safety | +| **Backend Framework** | Axum (v0.8+) | Tokio-native, Tower middleware, WebSocket support | +| **Frontend Framework** | SvelteKit (Svelte 5) | Smallest bundles, fine-grained reactivity, PWA support | +| **UI Components** | shadcn-svelte | Accessible, Tailwind-based, full ownership | +| **Cloud Database** | PostgreSQL 16+ | Multi-tenant gold standard, RLS, 
JSONB | +| **Local Database** | libSQL (embedded) | SQLite-compatible, tiny footprint, Rust-native | +| **ORM / Queries** | sqlx | Compile-time checked SQL, Postgres + SQLite support | +| **Caching** | DragonflyDB | Redis-compatible, multi-threaded, memory efficient | +| **Messaging** | NATS + JetStream | Edge-native leaf nodes, sub-ms latency, lightweight | +| **Real-Time** | WebSockets (Axum) + SSE fallback | Full duplex, NATS-backed fan-out | +| **Auth** | Custom JWT + RBAC | Offline-capable, cross-venue, full control | +| **API Design** | REST + OpenAPI 3.1 | Generated TypeScript client, universal compatibility | +| **Mobile** | PWA first, Capacitor later | One codebase, offline support, app store when needed | +| **Cast/Display** | Google Cast SDK + Custom Web Receiver | SvelteKit static app on Chromecast | +| **Deployment** | Fly.io + Docker | Edge deployment, managed Postgres, WireGuard | +| **CI/CD** | GitHub Actions + Turborepo | Cross-language build orchestration, caching | +| **Monitoring** | OpenTelemetry + Grafana | Vendor-neutral, excellent Rust support | +| **Testing** | cargo-nextest + Vitest + Playwright | Full pyramid: unit, integration, E2E | +| **Styling** | Tailwind CSS v4 | Fast, small bundles, Svelte-native | +| **Monorepo** | Cargo workspace + pnpm + Turborepo | Unified builds, shared types | +| **Linting** | clippy + Biome | Rust + TypeScript coverage | + +--- + +## Open Questions / Decisions Needed + +### High Priority + +1. **Fly.io vs. self-hosted**: Fly.io simplifies operations but creates vendor dependency. For a bootstrapped SaaS, the convenience is worth it. For VC-funded with an ops team, self-hosted on Hetzner could be cheaper at scale. **Decision: Start with Fly.io, design for portability.** + +2. **libSQL sync granularity**: Should the local node sync entire tables or individual rows? Row-level sync is more efficient but more complex to implement. 
**Recommendation: Start with table-level sync for the initial version, refine to row-level as data volumes grow.** + +3. **NATS embedded vs. sidecar on RPi5**: Running NATS as an embedded library (via `nats-server` Rust bindings) vs. a separate process. Embedded is simpler but couples versions tightly. **Recommendation: Sidecar (separate process managed by systemd) for operational flexibility.** + +4. **Financial data handling**: Does PVM handle actual money transactions, or only track buy-ins/credits as records? If handling real money, PCI DSS and financial regulations apply. **Recommendation: Track records only. Integrate with Stripe for actual payments.** + +5. **Multi-region from day one?**: Should the initial architecture support venues in multiple countries/regions? This affects Postgres replication strategy and NATS cluster topology. **Recommendation: Single region initially, design NATS subjects and DB schema for eventual multi-region.** + +### Medium Priority + +6. **Player account deduplication**: When a player signs up at two venues independently, how do we detect and merge accounts? Email match? Phone match? Manual linking? **Needs product decision.** + +7. **Chromecast vs. generic display hardware**: Should the primary display strategy be Chromecast, or should we target a browser-in-kiosk-mode approach that also works with Chromecast? **Recommendation: Build the receiver as a standard web app first (works in kiosk mode), add Cast SDK integration second.** + +8. **RPi5 provisioning**: How are local nodes set up? Manual image flashing? Automated provisioning? Remote setup? **Recommendation: Pre-built OS image with first-boot wizard that connects to cloud and provisions the node.** + +9. **Offline duration limits**: How long should a local node operate offline before we consider the data stale? 1 hour? 1 day? 1 week? **Needs product decision based on venue feedback.** + +10. **API versioning strategy**: When do we introduce `/api/v2/`? 
Should we support multiple versions simultaneously? **Recommendation: Semantic versioning for the API spec. Maintain backward compatibility as long as possible. Only version on breaking changes.** + +### Low Priority + +11. **GraphQL for player-facing app**: The admin dashboard is well-served by REST, but the player app might benefit from GraphQL's flexible querying (e.g., "show me my upcoming tournaments across all venues with waitlist status"). **Revisit after v1 launch.** + +12. **WebTransport**: When browser support matures and Chromecast supports it, WebTransport could replace WebSockets for lower-latency, multiplexed real-time streams. **Monitor but do not adopt yet.** + +13. **WASM on local node**: Could parts of the frontend run on the local node via WASM for ultra-fast local rendering? Interesting but not a priority. **Defer.** + +14. **AI features**: Player behavior analytics, optimal table assignments, tournament structure recommendations. The data model should be designed to support future ML pipelines. **Design for it, build later.**