docs(01-05): complete DragonFlyDB write-ahead queue plan summary

- Integration tests ran and passed against DragonFlyDB at 10.5.0.10:6379
- Documents custom URL parser deviation for slash-in-password fix
- Notes NoOpHandler stub for Phase 2 replacement
Mikkel Georgsen 2026-04-10 05:22:54 +00:00
parent 52e3e9cd9c
commit 8f902edcd7


@ -0,0 +1,132 @@
---
phase: 01-foundation
plan: "05"
subsystem: queue
tags: [dragonfly, redis, write-ahead-queue, worker, graceful-shutdown]
dependency_graph:
  requires: [01-01-SUMMARY.md]
  provides: [internal/queue WAQ, RunWorker, NoOpHandler]
  affects: [cmd/hwlab/main.go]
tech_stack:
  added: [github.com/redis/go-redis/v9 v9.18.0, github.com/google/uuid v1.6.0]
  patterns: [RPUSH/BLPOP FIFO queue, context cancellation worker loop, non-fatal degraded init]
key_files:
  created:
    - internal/queue/waq.go
    - internal/queue/waq_test.go
    - internal/queue/worker.go
  modified:
    - cmd/hwlab/main.go
    - go.mod
    - go.sum
decisions:
- "Custom URL parser (regex + redis.Options) required because passwords with forward slashes break url.Parse and redis.ParseURL"
- "WAQ init is non-fatal — binary starts with WARNING log when DragonFlyDB unreachable"
- "NoOpHandler placeholder drains Phase 1 queue; Phase 2 will replace with real NetBox retry"
metrics:
  duration: "~10 minutes"
  completed: "2026-04-10T05:22:13Z"
  tasks_completed: 2
  files_modified: 5
requirements: [NB-05]
---
# Phase 01 Plan 05: DragonFlyDB Write-Ahead Queue Summary
DragonFlyDB-backed write-ahead queue with RPUSH/BLPOP FIFO ordering, a BLPOP retry worker that exits on context cancellation and backs off by `retryInterval` on connection errors, and graceful shutdown wired into the main binary.
## Tasks Completed
| Task | Name | Commit | Files |
|------|------|--------|-------|
| 1 | Write-ahead queue core (Enqueue, Dequeue, Len) | e07ad92 | internal/queue/waq.go, internal/queue/waq_test.go, go.mod, go.sum |
| 2 | WAQ retry worker + wire into main binary | d1192c3 | internal/queue/worker.go, cmd/hwlab/main.go |
## What Was Built
### internal/queue/waq.go
- `PendingOp` struct: UUID ID, operation type, `json.RawMessage` payload, created_at timestamp, retry attempt counter
- `NewPendingOp(opType, payload)`: constructs op with generated UUID
- `WAQ` type wrapping `*redis.Client`
- `NewWAQ(url)`: connects to DragonFlyDB, pings on init, returns error if unreachable
- `Enqueue(ctx, op)`: RPUSH to `hwlab:netbox:pending_ops`
- `Dequeue(ctx, timeout)`: blocking BLPOP, returns `nil, nil` on timeout
- `Len(ctx)`: LLEN queue depth
- `Close()`: releases connection
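
As an illustration of the op record listed above, here is a minimal stdlib-only sketch. The Go field names, JSON tags, the error return on `NewPendingOp`, and the helpers `newUUIDv4`, `encodeOp` and `decodeOp` are assumptions made for this example, not the contents of `waq.go`; the real code uses `github.com/google/uuid` for the ID.

```go
package main

import (
	"crypto/rand"
	"encoding/json"
	"fmt"
	"time"
)

// PendingOp mirrors the fields described in the summary.
// Field names and JSON tags are assumptions.
type PendingOp struct {
	ID        string          `json:"id"`
	Type      string          `json:"type"`
	Payload   json.RawMessage `json:"payload"`
	CreatedAt time.Time       `json:"created_at"`
	Attempts  int             `json:"attempts"`
}

// newUUIDv4 formats 16 random bytes as a version 4 UUID. It stands in for
// github.com/google/uuid so the sketch needs no third-party modules.
func newUUIDv4() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // version 4
	b[8] = (b[8] & 0x3f) | 0x80 // RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// NewPendingOp builds an op with a fresh ID and a zero attempt counter.
func NewPendingOp(opType string, payload json.RawMessage) (PendingOp, error) {
	id, err := newUUIDv4()
	if err != nil {
		return PendingOp{}, err
	}
	return PendingOp{ID: id, Type: opType, Payload: payload, CreatedAt: time.Now().UTC()}, nil
}

// encodeOp produces the JSON string that would be pushed onto the list.
func encodeOp(op PendingOp) (string, error) {
	b, err := json.Marshal(op)
	return string(b), err
}

// decodeOp reverses encodeOp for a value popped from the list.
func decodeOp(s string) (PendingOp, error) {
	var op PendingOp
	err := json.Unmarshal([]byte(s), &op)
	return op, err
}

func main() {
	op, err := NewPendingOp("device.update", json.RawMessage(`{"name":"sw-01"}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	encoded, err := encodeOp(op)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(encoded)
}
```

Keeping the payload as `json.RawMessage` lets the queue store and return it without knowing the shape of any particular operation.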
### internal/queue/worker.go
- `RunWorker(ctx, handler, maxAttempts, retryInterval)`: BLPOP loop on the WAQ
- Context cancellation triggers clean exit
- Connection errors trigger `retryInterval` backoff (T-05-04 mitigation)
- Handler errors increment `op.Attempts` and re-enqueue
- Ops exceeding `maxAttempts` are dropped with a warning log (T-05-03 mitigation)
- `NoOpHandler`: Phase 1 placeholder that logs and drains ops
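
The loop behaviour listed above can be sketched roughly as follows. This is not the code in `worker.go`: the `Queue` interface, the `Handler` type, the in-memory `memQueue`, the helper `runDemo`, the free-function form of `RunWorker`, and the `>=` comparison against `maxAttempts` are all assumptions chosen so the example runs without DragonFlyDB.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"log"
	"sync"
	"time"
)

type PendingOp struct {
	ID       string
	Attempts int
}

// Queue is a hypothetical interface covering the two WAQ calls the loop needs.
// Dequeue returns nil, nil when the timeout expires with nothing queued.
type Queue interface {
	Enqueue(ctx context.Context, op *PendingOp) error
	Dequeue(ctx context.Context, timeout time.Duration) (*PendingOp, error)
}

type Handler func(ctx context.Context, op *PendingOp) error

func RunWorker(ctx context.Context, q Queue, h Handler, maxAttempts int, retryInterval time.Duration) {
	for ctx.Err() == nil { // context cancellation ends the loop
		op, err := q.Dequeue(ctx, time.Second)
		if err != nil {
			log.Printf("WARNING: dequeue failed: %v", err)
			select { // back off, but wake early on shutdown
			case <-ctx.Done():
			case <-time.After(retryInterval):
			}
			continue
		}
		if op == nil {
			continue // timeout, nothing queued
		}
		if h(ctx, op) == nil {
			continue // handled successfully
		}
		op.Attempts++
		if op.Attempts >= maxAttempts {
			log.Printf("WARNING: dropping op %s after %d attempts", op.ID, op.Attempts)
			continue
		}
		if err := q.Enqueue(ctx, op); err != nil {
			log.Printf("WARNING: re-enqueue of op %s failed: %v", op.ID, err)
		}
	}
}

// memQueue is an in-memory FIFO stand-in for the DragonFlyDB list.
type memQueue struct {
	mu  sync.Mutex
	ops []*PendingOp
}

func (m *memQueue) Enqueue(_ context.Context, op *PendingOp) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.ops = append(m.ops, op)
	return nil
}

func (m *memQueue) Dequeue(ctx context.Context, timeout time.Duration) (*PendingOp, error) {
	m.mu.Lock()
	if len(m.ops) > 0 {
		op := m.ops[0]
		m.ops = m.ops[1:]
		m.mu.Unlock()
		return op, nil
	}
	m.mu.Unlock()
	select { // empty: wait out the timeout, as a blocking pop would
	case <-ctx.Done():
	case <-time.After(timeout):
	}
	return nil, nil
}

// runDemo feeds one op to a handler that always fails and reports how often
// the handler ran and how many ops remain queued afterwards.
func runDemo(maxAttempts int) (calls, left int) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	q := &memQueue{}
	q.Enqueue(ctx, &PendingOp{ID: "op-1"})
	failing := func(context.Context, *PendingOp) error {
		calls++
		if calls == maxAttempts {
			cancel() // stop the worker once the op is about to be dropped
		}
		return errors.New("netbox unavailable")
	}
	RunWorker(ctx, q, failing, maxAttempts, 10*time.Millisecond)
	return calls, len(q.ops)
}

func main() {
	calls, left := runDemo(3)
	fmt.Printf("handler calls=%d, ops left=%d\n", calls, left)
	// prints: handler calls=3, ops left=0
}
```

Because a failed op goes back to the tail of the list, it is retried behind any newer ops rather than blocking them.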
### cmd/hwlab/main.go
- `signal.NotifyContext` for SIGINT/SIGTERM graceful shutdown
- Non-fatal WAQ init: `WARNING` log when DragonFlyDB unavailable, binary continues serving
- `go waq.RunWorker(ctx, ...)` goroutine started after successful WAQ init
- `defer waq.Close()` on clean path
- HTTP server runs in goroutine; `srv.Shutdown(shutdownCtx)` with 10s timeout on signal
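
The shutdown wiring described above might look roughly like this. It is a sketch under stated assumptions: `serveUntilDone` and `runDemo` are hypothetical names, the listen address is made up, and the 200 ms self-cancel exists only so the example terminates without a signal. The 10s shutdown timeout matches the summary; the WAQ init and worker goroutine are left out.

```go
package main

import (
	"context"
	"errors"
	"log"
	"net"
	"net/http"
	"os/signal"
	"syscall"
	"time"
)

// serveUntilDone runs srv on ln until ctx is cancelled, then gives in-flight
// requests up to grace to finish.
func serveUntilDone(ctx context.Context, srv *http.Server, ln net.Listener, grace time.Duration) error {
	errCh := make(chan error, 1)
	go func() { errCh <- srv.Serve(ln) }()

	select {
	case err := <-errCh:
		return err // server failed before any shutdown request
	case <-ctx.Done():
	}

	shutdownCtx, cancel := context.WithTimeout(context.Background(), grace)
	defer cancel()
	if err := srv.Shutdown(shutdownCtx); err != nil {
		return err
	}
	// Serve returns http.ErrServerClosed after a clean Shutdown.
	if err := <-errCh; !errors.Is(err, http.ErrServerClosed) {
		return err
	}
	return nil
}

func runDemo() error {
	// Cancel ctx on SIGINT or SIGTERM.
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
	defer stop()

	// Demo only: also cancel after 200ms so the sketch exits on its own.
	ctx, cancel := context.WithTimeout(ctx, 200*time.Millisecond)
	defer cancel()

	ln, err := net.Listen("tcp", "127.0.0.1:0") // address is an assumption
	if err != nil {
		return err
	}
	srv := &http.Server{Handler: http.NotFoundHandler()}
	return serveUntilDone(ctx, srv, ln, 10*time.Second)
}

func main() {
	if err := runDemo(); err != nil {
		log.Fatal(err)
	}
	log.Println("shut down cleanly")
}
```

Passing the same `ctx` to the worker goroutine would let one signal stop both the queue loop and the HTTP server.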
## DragonFlyDB Integration Test Results
DragonFlyDB at `10.5.0.10:6379` was reachable during execution.
```
=== RUN TestWAQEnqueueDequeue
2026/04/10 05:21:23 WAQ connected to DragonFlyDB
--- PASS: TestWAQEnqueueDequeue (0.02s)
PASS
```
The URL `redis://:nUq/IfoIQJf/kouckKHRQOk7vV0NwCuI@10.5.0.10:6379` contains forward slashes in the password, which causes both Go's `url.Parse` and `redis.ParseURL` to fail: the first slash is taken as the end of the authority component, so the rest of the password is read as a path. See the Deviations from Plan section.
## Final Test Suite Output
```
? git.georgsen.dk/hwlab [no test files]
? git.georgsen.dk/hwlab/cmd/hwlab [no test files]
? git.georgsen.dk/hwlab/internal/api [no test files]
ok git.georgsen.dk/hwlab/internal/api/handlers 0.006s
ok git.georgsen.dk/hwlab/internal/config 0.006s
ok git.georgsen.dk/hwlab/internal/netbox 0.003s
ok git.georgsen.dk/hwlab/internal/queue 0.004s
```
All four packages with tests pass; the remaining three have no test files. No regressions.
## Deviations from Plan
### Auto-fixed Issues
**1. [Rule 1 - Bug] Custom URL parser for passwords with forward slashes**
- **Found during:** Task 1, integration test run
- **Issue:** The password in `redis://:nUq/IfoIQJf/kouckKHRQOk7vV0NwCuI@10.5.0.10:6379` contains `/` characters. Go's `url.Parse` ends the authority component at the first `/`, so it sees `:nUq` as the host and fails with `invalid port ":nUq" after host`. `redis.ParseURL` delegates to `url.Parse` and inherits the same failure. The plan called for `redis.ParseURL`, but that function cannot handle this URL.
- **Fix:** Added `parseRedisURL()` in `waq.go` that tries `redis.ParseURL` first (fast path for standard passwords), then falls back to a `regexp`-based extractor that captures password, host, port, and db directly — bypassing `url.Parse` entirely. Constructs `redis.Options` struct directly.
- **Files modified:** internal/queue/waq.go
- **Commit:** e07ad92
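
A rough sketch of the two-stage parse described in the fix follows. It is not the `parseRedisURL()` from `waq.go`: the regular expression, the `target` struct (standing in for the `Addr`, `Password` and `DB` fields of `redis.Options`), and the use of `url.Parse` in place of `redis.ParseURL` on the fast path are assumptions that keep the example stdlib-only. The password `pa/ss/word` is made up.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"regexp"
	"strconv"
	"strings"
)

// target holds the values that would be copied into redis.Options.
type target struct {
	Addr     string
	Password string
	DB       int
}

// fallbackRE skips an optional user name, takes everything up to the LAST
// '@' as the password, then captures host, port and an optional /db.
var fallbackRE = regexp.MustCompile(`^redis://[^:@/]*:(.*)@([^@/:]+):(\d+)(?:/(\d+))?$`)

func parseDB(s string) (int, error) {
	if s == "" {
		return 0, nil
	}
	return strconv.Atoi(s)
}

func parseRedisURL(raw string) (target, error) {
	// Fast path: standard URL parsing works when the password has no slashes.
	// Requiring a non-empty host name rejects inputs that parse "successfully"
	// but with the password cut off at its first slash.
	if u, err := url.Parse(raw); err == nil && u.Hostname() != "" {
		pw, _ := u.User.Password() // safe when u.User is nil
		db, err := parseDB(strings.TrimPrefix(u.Path, "/"))
		if err != nil {
			return target{}, err
		}
		return target{Addr: u.Host, Password: pw, DB: db}, nil
	}

	// Fallback: extract the parts directly, bypassing url.Parse.
	m := fallbackRE.FindStringSubmatch(raw)
	if m == nil {
		return target{}, errors.New("unrecognised redis URL") // never echo the URL: it holds a secret
	}
	db, err := parseDB(m[4])
	if err != nil {
		return target{}, err
	}
	return target{Addr: m[2] + ":" + m[3], Password: m[1], DB: db}, nil
}

func main() {
	got, err := parseRedisURL("redis://:pa/ss/word@10.5.0.10:6379")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("addr=%s password=%s db=%d\n", got.Addr, got.Password, got.DB)
	// prints: addr=10.5.0.10:6379 password=pa/ss/word db=0
}
```

An alternative that avoids a custom parser is to percent-encode the slashes in the password as `%2F`, which `url.Parse` accepts; the regex fallback has the advantage of working with the URL exactly as configured.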
## Threat Mitigations Applied
| Threat ID | Mitigation | Location |
|-----------|-----------|----------|
| T-05-03 | `maxAttempts` drop prevents unbounded queue growth | worker.go:44 |
| T-05-04 | `retryInterval` backoff on connection loss prevents tight-loop hammering | worker.go:32-37 |
## Known Stubs
- `NoOpHandler` in `internal/queue/worker.go`: Phase 1 placeholder. Logs ops and returns nil (success), causing queued ops to drain without actual processing. Phase 2 NetBox integration will replace this with a real retry handler that re-drives failed NetBox API calls.
## Self-Check: PASSED
- internal/queue/waq.go: FOUND
- internal/queue/waq_test.go: FOUND
- internal/queue/worker.go: FOUND
- cmd/hwlab/main.go: FOUND (modified)
- Commit e07ad92: FOUND
- Commit d1192c3: FOUND