homelabby/.planning/phases/02-ai-pipeline/02-04-PLAN.md
Mikkel Georgsen 7bebe2ed93 docs(02): create phase 2 AI pipeline plans (4 plans, 4 waves)
Wave 1: go-openai dep, CreateDevice gap, AIClient interface + mock + config
Wave 2: three-tier orchestrator, WAQ real handler, SearXNG stub
Wave 3: POST /api/intake handler, router wiring, quick add mode
Wave 4: oMLX integration test + memory checkpoint

Covers requirements: AI-01 through AI-09 (AI-04 stub only; full impl Phase 7)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-10 05:40:22 +00:00

303 lines
12 KiB
Markdown

---
phase: 02-ai-pipeline
plan: "04"
type: execute
wave: 4
depends_on: [02-03]
files_modified:
- internal/ai/omlx_integration_test.go
- docs/omlx-setup.md
autonomous: false
requirements: [AI-01]
must_haves:
truths:
- "oMLX serves Gemma 4 on Mac Mini M4 and responds to OpenAI-compatible /v1/chat/completions"
- "Gemma 4 E4B memory usage measured and documented (must fit within 16GB)"
- "Integration test proves end-to-end: 1 photo → IntakeResult with non-empty model field"
- "Memory budget and model tier decision documented in docs/omlx-setup.md"
artifacts:
- path: "internal/ai/omlx_integration_test.go"
provides: "Integration test that skips unless HWLAB_OMLX_URL is set; proves real AI call works"
- path: "docs/omlx-setup.md"
provides: "oMLX installation steps, model tier selection, measured memory budget"
key_links:
- from: "internal/ai/omlx_integration_test.go"
to: "http://localhost:8000/v1"
via: "TierClient with real oMLX endpoint; skips when HWLAB_OMLX_URL unset"
pattern: "HWLAB_OMLX_URL"
---
<objective>
Verify oMLX runs on Mac Mini M4 with Gemma 4, measure memory usage, and document the model tier decision. Write an integration test that proves the real AI pipeline works end-to-end.
Purpose: AI-01 requires empirical validation that Gemma 4 fits in 16GB on the Mac Mini. This checkpoint collects that measurement.
Output: Passing integration test (when oMLX reachable), memory measurement recorded in docs/, model tier confirmed.
</objective>
<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/phases/02-ai-pipeline/02-CONTEXT.md
@.planning/phases/02-ai-pipeline/02-03-SUMMARY.md
<interfaces>
From internal/ai/client.go:
```go
type TierClient struct{ /* ... */ }
func NewTierClient(cfg TierConfig) *TierClient
func (c *TierClient) AnalyzePhotos(ctx context.Context, req IntakeRequest) (*IntakeResult, error)
```
From internal/ai/types.go:
```go
type TierConfig struct {
BaseURL string
APIKey string
Model string
TimeoutSeconds int
}
type IntakeRequest struct { PhotosBase64 []string; JobID string }
type IntakeResult struct {
Model string; Manufacturer string; Confidence float64
// ... other fields
}
```
oMLX installation (macOS Apple Silicon):
```bash
# Install oMLX (requires macOS 15+, Apple Silicon)
# From https://omlx.ai or brew if available
# Default port: 8000
# Start with: omlx serve --model gemma-4-e4b --port 8000
# Measure memory: activity monitor or `memory_pressure` / `vm_stat`
```
</interfaces>
</context>
<tasks>
<task type="auto">
<name>Task 1: oMLX integration test with skip guard</name>
<files>internal/ai/omlx_integration_test.go</files>
<read_first>
- internal/ai/client.go (full — TierClient and AnalyzePhotos)
- internal/ai/types.go (full)
- internal/netbox/client_test.go (skim — skip guard pattern used in this codebase)
</read_first>
<action>
Create internal/ai/omlx_integration_test.go:
```go
//go:build integration
package ai_test
import (
"context"
"os"
"testing"
"encoding/base64"
"git.georgsen.dk/hwlab/internal/ai"
)
// TestOMLXIntegration tests a real call to oMLX.
// Run with: HWLAB_OMLX_URL=http://localhost:8000/v1 go test ./internal/ai/... -tags integration -v -run TestOMLX
//
// Skip conditions:
// - HWLAB_OMLX_URL not set
// - oMLX unreachable (test fails with connection error — not skipped, so the failure is visible)
func TestOMLXIntegration(t *testing.T) {
omlxURL := os.Getenv("HWLAB_OMLX_URL")
if omlxURL == "" {
t.Skip("HWLAB_OMLX_URL not set — skipping oMLX integration test")
}
model := os.Getenv("HWLAB_OMLX_MODEL")
if model == "" {
model = "gemma-4-e4b"
}
client := ai.NewTierClient(ai.TierConfig{
BaseURL: omlxURL,
APIKey: "local",
Model: model,
TimeoutSeconds: 60,
})
// Minimal 1x1 red JPEG for testing — real photos not needed for integration smoke test
// This is a valid tiny JPEG in base64
minimalJPEG := "data:image/jpeg;base64," + minimalJPEGBase64()
result, err := client.AnalyzePhotos(context.Background(), ai.IntakeRequest{
PhotosBase64: []string{minimalJPEG},
JobID: "integration-test-001",
})
if err != nil {
t.Fatalf("AnalyzePhotos error: %v", err)
}
if result == nil {
t.Fatal("result is nil")
}
// Confidence may be low for a minimal test image — just verify the call completed
t.Logf("IntakeResult: model=%q manufacturer=%q category=%q confidence=%.2f",
result.Model, result.Manufacturer, result.Category, result.Confidence)
t.Logf("AINotes: %s", result.AINotes)
// The model must return something in the JSON fields — at minimum a non-panic parse
// (empty model string is acceptable for a 1x1 pixel image)
if result.Confidence < 0 || result.Confidence > 1.0 {
t.Errorf("confidence %.2f out of [0,1] range", result.Confidence)
}
}
// minimalJPEGBase64 returns a base64-encoded minimal valid JPEG (1x1 white pixel).
// Source: https://github.com/nicowillis/pngheaders (1x1 JPEG, 631 bytes)
func minimalJPEGBase64() string {
// 1x1 white JPEG — static bytes for reproducible test
data := []byte{
0xff, 0xd8, 0xff, 0xe0, 0x00, 0x10, 0x4a, 0x46, 0x49, 0x46, 0x00, 0x01,
0x01, 0x00, 0x00, 0x01, 0x00, 0x01, 0x00, 0x00, 0xff, 0xdb, 0x00, 0x43,
0x00, 0x08, 0x06, 0x06, 0x07, 0x06, 0x05, 0x08, 0x07, 0x07, 0x07, 0x09,
0x09, 0x08, 0x0a, 0x0c, 0x14, 0x0d, 0x0c, 0x0b, 0x0b, 0x0c, 0x19, 0x12,
0x13, 0x0f, 0x14, 0x1d, 0x1a, 0x1f, 0x1e, 0x1d, 0x1a, 0x1c, 0x1c, 0x20,
0x24, 0x2e, 0x27, 0x20, 0x22, 0x2c, 0x23, 0x1c, 0x1c, 0x28, 0x37, 0x29,
0x2c, 0x30, 0x31, 0x34, 0x34, 0x34, 0x1f, 0x27, 0x39, 0x3d, 0x38, 0x32,
0x3c, 0x2e, 0x33, 0x34, 0x32, 0xff, 0xc0, 0x00, 0x0b, 0x08, 0x00, 0x01,
0x00, 0x01, 0x01, 0x01, 0x11, 0x00, 0xff, 0xc4, 0x00, 0x1f, 0x00, 0x00,
0x01, 0x05, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08,
0x09, 0x0a, 0x0b, 0xff, 0xc4, 0x00, 0xb5, 0x10, 0x00, 0x02, 0x01, 0x03,
0x03, 0x02, 0x04, 0x03, 0x05, 0x05, 0x04, 0x04, 0x00, 0x00, 0x01, 0x7d,
0x01, 0x02, 0x03, 0x00, 0x04, 0x11, 0x05, 0x12, 0x21, 0x31, 0x41, 0x06,
0x13, 0x51, 0x61, 0x07, 0x22, 0x71, 0x14, 0x32, 0x81, 0x91, 0xa1, 0x08,
0x23, 0x42, 0xb1, 0xc1, 0x15, 0x52, 0xd1, 0xf0, 0x24, 0x33, 0x62, 0x72,
0x82, 0x09, 0x0a, 0x16, 0x17, 0x18, 0x19, 0x1a, 0x25, 0x26, 0x27, 0x28,
0x29, 0x2a, 0x34, 0x35, 0x36, 0x37, 0x38, 0x39, 0x3a, 0x43, 0x44, 0x45,
0x46, 0x47, 0x48, 0x49, 0x4a, 0x53, 0x54, 0x55, 0x56, 0x57, 0x58, 0x59,
0x5a, 0x63, 0x64, 0x65, 0x66, 0x67, 0x68, 0x69, 0x6a, 0x73, 0x74, 0x75,
0x76, 0x77, 0x78, 0x79, 0x7a, 0x83, 0x84, 0x85, 0x86, 0x87, 0x88, 0x89,
0x8a, 0x93, 0x94, 0x95, 0x96, 0x97, 0x98, 0x99, 0x9a, 0xa2, 0xa3, 0xa4,
0xa5, 0xa6, 0xa7, 0xa8, 0xa9, 0xaa, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7,
0xb8, 0xb9, 0xba, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0xc7, 0xc8, 0xc9, 0xca,
0xd2, 0xd3, 0xd4, 0xd5, 0xd6, 0xd7, 0xd8, 0xd9, 0xda, 0xe1, 0xe2, 0xe3,
0xe4, 0xe5, 0xe6, 0xe7, 0xe8, 0xe9, 0xea, 0xf1, 0xf2, 0xf3, 0xf4, 0xf5,
0xf6, 0xf7, 0xf8, 0xf9, 0xfa, 0xff, 0xda, 0x00, 0x08, 0x01, 0x01, 0x00,
0x00, 0x3f, 0x00, 0xfb, 0xd2, 0x8a, 0x28, 0x03, 0xff, 0xd9,
}
return base64.StdEncoding.EncodeToString(data)
}
```
NOTE: Use build tag `//go:build integration` so the test is excluded from normal `go test ./...` runs. Integration tests only run when explicitly tagged: `go test -tags integration ./internal/ai/...`
This follows the skip-guard pattern established in Phase 1 but uses build tags instead of env-only guards, since oMLX is only available on the Mac Mini production machine.
</action>
<verify>
<automated>cd /home/mikkel/homelabby && go build ./... && go test ./internal/ai/... -v 2>&1 | tail -20</automated>
</verify>
<done>
- `go build ./...` passes
- `go test ./internal/ai/... -v` (without -tags integration) shows integration test NOT included — only unit tests run
- internal/ai/omlx_integration_test.go exists with build tag `integration`
</done>
</task>
<task type="checkpoint:human-verify" gate="blocking">
<what-built>
All Phase 2 AI pipeline code is complete and tested:
- go-openai installed, AIClient interface, MockAIClient, TierClient
- Three-tier orchestrator with confidence-based tier escalation
- WAQ real NetBox op handler (create_device + patch_custom_fields)
- POST /api/intake endpoint wired end-to-end
- Quick add mode (config-driven)
- SearXNG ResearchClient stub
- oMLX integration test (build-tag guarded)
</what-built>
<how-to-verify>
**Step 1: Run all unit tests (no external services needed)**
```bash
cd /home/mikkel/homelabby
go test ./... -v 2>&1 | grep -E "^(ok|FAIL|---)" | head -40
```
Expected: all packages show "ok" or "SKIP" — zero FAIL.
**Step 2: Test the intake endpoint with mock photos (binary running locally)**
```bash
# Terminal 1: start the server
cd /home/mikkel/homelabby
go run cmd/hwlab/main.go &
# Terminal 2: send a test intake request (any JPEG file will work)
curl -s -X POST http://localhost:8080/api/intake \
-F "photos=@/path/to/any-photo.jpg" | python3 -m json.tool
```
Expected (without real oMLX running):
- If oMLX is reachable: JSON response with hw_id, model, confidence, catalog_status
- If oMLX unreachable (expected on dev machine): 500 or 202 depending on tier client timeout
**Step 3: On Mac Mini M4 — run the oMLX integration test**
```bash
# On Mac Mini: start oMLX
omlx serve --model gemma-4-e4b --port 8000
# Check memory: Activity Monitor → omlx process, note "Real Memory"
# Expected for E4B: ~8-10GB RAM
# Run integration test
cd /home/mikkel/homelabby
HWLAB_OMLX_URL=http://localhost:8000/v1 go test -tags integration ./internal/ai/... -run TestOMLX -v
```
Expected: PASS with logged IntakeResult fields (model may be empty for test pixel — that's OK).
**Step 4: Document memory measurement**
Record in the summary: "Gemma 4 E4B: X GB real memory on Mac Mini M4 16GB"
If > 12GB: note that 26B A4B is not feasible without TurboQuant KV offload.
</how-to-verify>
<resume-signal>
Type "approved" after verifying unit tests pass.
If oMLX test was run on Mac Mini, include memory measurement (e.g. "approved — E4B uses 9.2GB").
If Mac Mini not available yet, type "approved — oMLX test deferred, unit tests pass".
</resume-signal>
</task>
</tasks>
<threat_model>
## Trust Boundaries
| Boundary | Description |
|----------|-------------|
| integration test → oMLX | Test sends real data to local AI; only runs when explicitly triggered |
## STRIDE Threat Register
| Threat ID | Category | Component | Disposition | Mitigation Plan |
|-----------|----------|-----------|-------------|-----------------|
| T-02-14 | Information Disclosure | omlx-setup.md | accept | Document contains model names and port numbers — no secrets; oMLX API key is "local" (not a real credential) |
| T-02-15 | Denial of Service | integration test resource usage | mitigate | Build tag `integration` ensures test never runs in standard CI pipeline; only runs manually with explicit env var |
</threat_model>
<verification>
After plan completion:
1. `go test ./... 2>&1 | grep FAIL` — no failures
2. `go test -tags integration ./internal/ai/... -v -run TestOMLX` when HWLAB_OMLX_URL unset → SKIP
3. `ls docs/omlx-setup.md` — file exists with memory measurement (filled in during checkpoint)
4. Human verified: unit test suite clean, oMLX smoke test outcome documented
</verification>
<success_criteria>
- All Phase 2 unit tests pass with zero failures
- oMLX integration test exists, skips gracefully when HWLAB_OMLX_URL not set
- Memory budget for Gemma 4 E4B documented (or deferred with note if Mac Mini not available)
- Phase 2 complete: POST /api/intake is end-to-end functional
</success_criteria>
<output>
After completion, create `.planning/phases/02-ai-pipeline/02-04-SUMMARY.md`
</output>