homelabby/.planning/phases/02-ai-pipeline/02-04-PLAN.md
Mikkel Georgsen 7bebe2ed93 docs(02): create phase 2 AI pipeline plans (4 plans, 4 waves)
Wave 1: go-openai dep, CreateDevice gap, AIClient interface + mock + config
Wave 2: three-tier orchestrator, WAQ real handler, SearXNG stub
Wave 3: POST /api/intake handler, router wiring, quick add mode
Wave 4: oMLX integration test + memory checkpoint

Covers requirements: AI-01 through AI-09 (AI-04 stub only; full impl Phase 7)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-10 05:40:22 +00:00


---
phase: 02-ai-pipeline
plan: "04"
type: execute
wave: 4
depends_on:
  - 02-03
files_modified:
  - internal/ai/omlx_integration_test.go
  - docs/omlx-setup.md
autonomous: false
requirements:
  - AI-01
must_haves:
  truths:
    - "oMLX serves Gemma 4 on Mac Mini M4 and responds to OpenAI-compatible /v1/chat/completions"
    - "Gemma 4 E4B memory usage measured and documented (must fit within 16GB)"
    - "Integration test proves end-to-end: 1 photo → IntakeResult with non-empty model field"
    - "Memory budget and model tier decision documented in docs/omlx-setup.md"
  artifacts:
    - path: internal/ai/omlx_integration_test.go
      provides: "Integration test that skips unless HWLAB_OMLX_URL is set; proves real AI call works"
    - path: docs/omlx-setup.md
      provides: "oMLX installation steps, model tier selection, measured memory budget"
  key_links:
    - from: internal/ai/omlx_integration_test.go
      to: "http://localhost:8000/v1"
      via: "TierClient with real oMLX endpoint; skips when HWLAB_OMLX_URL unset"
      pattern: "HWLAB_OMLX_URL"
---

Verify oMLX runs on Mac Mini M4 with Gemma 4, measure memory usage, and document the model tier decision. Write an integration test that proves the real AI pipeline works end-to-end.

Purpose: AI-01 requires empirical validation that Gemma 4 fits in 16GB on the Mac Mini. This checkpoint collects that measurement.

Output: Passing integration test (when oMLX is reachable), memory measurement recorded in docs/, model tier confirmed.

<execution_context> @$HOME/.claude/get-shit-done/workflows/execute-plan.md @$HOME/.claude/get-shit-done/templates/summary.md </execution_context>

@.planning/phases/02-ai-pipeline/02-CONTEXT.md
@.planning/phases/02-ai-pipeline/02-03-SUMMARY.md

From internal/ai/client.go:

```go
type TierClient struct{ /* ... */ }

func NewTierClient(cfg TierConfig) *TierClient
func (c *TierClient) AnalyzePhotos(ctx context.Context, req IntakeRequest) (*IntakeResult, error)
```

From internal/ai/types.go:

type TierConfig struct {
    BaseURL        string
    APIKey         string
    Model          string
    TimeoutSeconds int
}
type IntakeRequest struct { PhotosBase64 []string; JobID string }
type IntakeResult struct {
    Model string; Manufacturer string; Confidence float64
    // ... other fields
}

oMLX installation (macOS Apple Silicon):

# Install oMLX (requires macOS 15+, Apple Silicon)
# From https://omlx.ai or brew if available
# Default port: 8000
# Start with: omlx serve --model gemma-4-e4b --port 8000
# Measure memory: activity monitor or `memory_pressure` / `vm_stat`
Task 1: oMLX integration test with skip guard

Files: internal/ai/omlx_integration_test.go

Read first:
- internal/ai/client.go (full: TierClient and AnalyzePhotos)
- internal/ai/types.go (full)
- internal/netbox/client_test.go (skim: skip guard pattern used in this codebase)

Create internal/ai/omlx_integration_test.go:

//go:build integration

package ai_test

import (
    "context"
    "encoding/base64"
    "os"
    "testing"

    "git.georgsen.dk/hwlab/internal/ai"
)

// TestOMLXIntegration makes a real call to oMLX.
// Run with: HWLAB_OMLX_URL=http://localhost:8000/v1 go test ./internal/ai/... -tags integration -v -run TestOMLX
//
// Skipped when HWLAB_OMLX_URL is not set.
// If oMLX is unreachable, the test fails with a connection error instead of
// skipping, so the failure stays visible.
func TestOMLXIntegration(t *testing.T) {
    omlxURL := os.Getenv("HWLAB_OMLX_URL")
    if omlxURL == "" {
        t.Skip("HWLAB_OMLX_URL not set — skipping oMLX integration test")
    }

    model := os.Getenv("HWLAB_OMLX_MODEL")
    if model == "" {
        model = "gemma-4-e4b"
    }

    client := ai.NewTierClient(ai.TierConfig{
        BaseURL:        omlxURL,
        APIKey:         "local",
        Model:          model,
        TimeoutSeconds: 60,
    })

    // Minimal 1x1 single-component (grayscale) JPEG, sent as a data URL.
    // Real photos are not needed for an integration smoke test.
    minimalJPEG := "data:image/jpeg;base64," + minimalJPEGBase64()

    result, err := client.AnalyzePhotos(context.Background(), ai.IntakeRequest{
        PhotosBase64: []string{minimalJPEG},
        JobID:        "integration-test-001",
    })

    if err != nil {
        t.Fatalf("AnalyzePhotos error: %v", err)
    }
    if result == nil {
        t.Fatal("result is nil")
    }
    // Confidence may be low for a minimal test image — just verify the call completed
    t.Logf("IntakeResult: model=%q manufacturer=%q category=%q confidence=%.2f",
        result.Model, result.Manufacturer, result.Category, result.Confidence)
    t.Logf("AINotes: %s", result.AINotes)

    // The model must return something in the JSON fields — at minimum a non-panic parse
    // (empty model string is acceptable for a 1x1 pixel image)
    if result.Confidence < 0 || result.Confidence > 1.0 {
        t.Errorf("confidence %.2f out of [0,1] range", result.Confidence)
    }
}

// minimalJPEGBase64 returns a base64-encoded minimal JPEG (1x1 white pixel,
// single grayscale component).
// Source: https://github.com/nicowillis/pngheaders
func minimalJPEGBase64() string {
    // Static bytes keep the test reproducible.
    data := []byte{
        0xff, 0xd8, 0xff, 0xe0, 0x00, 0x10, 0x4a, 0x46, 0x49, 0x46, 0x00, 0x01,
        0x01, 0x00, 0x00, 0x01, 0x00, 0x01, 0x00, 0x00, 0xff, 0xdb, 0x00, 0x43,
        0x00, 0x08, 0x06, 0x06, 0x07, 0x06, 0x05, 0x08, 0x07, 0x07, 0x07, 0x09,
        0x09, 0x08, 0x0a, 0x0c, 0x14, 0x0d, 0x0c, 0x0b, 0x0b, 0x0c, 0x19, 0x12,
        0x13, 0x0f, 0x14, 0x1d, 0x1a, 0x1f, 0x1e, 0x1d, 0x1a, 0x1c, 0x1c, 0x20,
        0x24, 0x2e, 0x27, 0x20, 0x22, 0x2c, 0x23, 0x1c, 0x1c, 0x28, 0x37, 0x29,
        0x2c, 0x30, 0x31, 0x34, 0x34, 0x34, 0x1f, 0x27, 0x39, 0x3d, 0x38, 0x32,
        0x3c, 0x2e, 0x33, 0x34, 0x32, 0xff, 0xc0, 0x00, 0x0b, 0x08, 0x00, 0x01,
        0x00, 0x01, 0x01, 0x01, 0x11, 0x00, 0xff, 0xc4, 0x00, 0x1f, 0x00, 0x00,
        0x01, 0x05, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08,
        0x09, 0x0a, 0x0b, 0xff, 0xc4, 0x00, 0xb5, 0x10, 0x00, 0x02, 0x01, 0x03,
        0x03, 0x02, 0x04, 0x03, 0x05, 0x05, 0x04, 0x04, 0x00, 0x00, 0x01, 0x7d,
        0x01, 0x02, 0x03, 0x00, 0x04, 0x11, 0x05, 0x12, 0x21, 0x31, 0x41, 0x06,
        0x13, 0x51, 0x61, 0x07, 0x22, 0x71, 0x14, 0x32, 0x81, 0x91, 0xa1, 0x08,
        0x23, 0x42, 0xb1, 0xc1, 0x15, 0x52, 0xd1, 0xf0, 0x24, 0x33, 0x62, 0x72,
        0x82, 0x09, 0x0a, 0x16, 0x17, 0x18, 0x19, 0x1a, 0x25, 0x26, 0x27, 0x28,
        0x29, 0x2a, 0x34, 0x35, 0x36, 0x37, 0x38, 0x39, 0x3a, 0x43, 0x44, 0x45,
        0x46, 0x47, 0x48, 0x49, 0x4a, 0x53, 0x54, 0x55, 0x56, 0x57, 0x58, 0x59,
        0x5a, 0x63, 0x64, 0x65, 0x66, 0x67, 0x68, 0x69, 0x6a, 0x73, 0x74, 0x75,
        0x76, 0x77, 0x78, 0x79, 0x7a, 0x83, 0x84, 0x85, 0x86, 0x87, 0x88, 0x89,
        0x8a, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97, 0x98, 0x99, 0x9a, 0xa2, 0xa3, 0xa4,
        0xa5, 0xa6, 0xa7, 0xa8, 0xa9, 0xaa, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7,
        0xb8, 0xb9, 0xba, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0xc7, 0xc8, 0xc9, 0xca,
        0xd2, 0xd3, 0xd4, 0xd5, 0xd6, 0xd7, 0xd8, 0xd9, 0xda, 0xe1, 0xe2, 0xe3,
        0xe4, 0xe5, 0xe6, 0xe7, 0xe8, 0xe9, 0xea, 0xf1, 0xf2, 0xf3, 0xf4, 0xf5,
        0xf6, 0xf7, 0xf8, 0xf9, 0xfa, 0xff, 0xda, 0x00, 0x08, 0x01, 0x01, 0x00,
        0x00, 0x3f, 0x00, 0xfb, 0xd2, 0x8a, 0x28, 0x03, 0xff, 0xd9,
    }
    return base64.StdEncoding.EncodeToString(data)
}
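
A hand-typed byte fixture like the one above is easy to get subtly wrong, for example a Huffman table segment whose declared length does not match its payload. The following is a minimal sketch of a sanity check that could run before the fixture is sent to the model. `checkFixture` and `encodeTinyJPEG` are hypothetical names; the sketch generates its own 1x1 JPEG with the standard library rather than reusing the bytes above.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/color"
	"image/jpeg"
)

// checkFixture (hypothetical helper) fully decodes the bytes as a JPEG and
// confirms the decoded image has the expected dimensions.
func checkFixture(data []byte, wantW, wantH int) error {
	img, err := jpeg.Decode(bytes.NewReader(data))
	if err != nil {
		return fmt.Errorf("fixture is not a decodable JPEG: %w", err)
	}
	b := img.Bounds()
	if b.Dx() != wantW || b.Dy() != wantH {
		return fmt.Errorf("fixture is %dx%d, want %dx%d", b.Dx(), b.Dy(), wantW, wantH)
	}
	return nil
}

// encodeTinyJPEG builds a 1x1 white grayscale JPEG with the standard library.
// It is an alternative to a static byte literal, not the plan's fixture.
func encodeTinyJPEG() ([]byte, error) {
	img := image.NewGray(image.Rect(0, 0, 1, 1))
	img.SetGray(0, 0, color.Gray{Y: 255})
	var buf bytes.Buffer
	if err := jpeg.Encode(&buf, img, nil); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	data, err := encodeTinyJPEG()
	if err != nil {
		panic(err)
	}
	if err := checkFixture(data, 1, 1); err != nil {
		panic(err)
	}
	fmt.Printf("fixture decodes cleanly (%d bytes)\n", len(data))
}
```

A full decode is used instead of `jpeg.DecodeConfig` because the config path can return once it has seen the frame header, before later segments are parsed.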

NOTE: Use build tag //go:build integration so the test is excluded from normal go test ./... runs. Integration tests only run when explicitly tagged: go test -tags integration ./internal/ai/...

This follows the skip-guard pattern established in Phase 1 but uses build tags instead of env-only guards, since oMLX is only available on the Mac Mini production machine.

Verify: `cd /home/mikkel/homelabby && go build ./... && go test ./internal/ai/... -v 2>&1 | tail -20`

Acceptance criteria:
- `go build ./...` passes
- `go test ./internal/ai/... -v` (without `-tags integration`) shows the integration test is NOT included; only unit tests run
- internal/ai/omlx_integration_test.go exists with build tag `integration`
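
The env half of the guard (skip when HWLAB_OMLX_URL is empty, fall back to a default model when HWLAB_OMLX_MODEL is empty) can be separated from the test body so it is checkable without oMLX. This is a sketch only; `integrationTarget` is a hypothetical helper, and the lookup function is injected so the logic can run without touching the real environment.

```go
package main

import (
	"fmt"
	"os"
)

// defaultModel matches the fallback used in the integration test.
const defaultModel = "gemma-4-e4b"

// integrationTarget (hypothetical helper) resolves the oMLX endpoint and model.
// ok is false when HWLAB_OMLX_URL is unset, which is when the test should skip.
func integrationTarget(getenv func(string) string) (url, model string, ok bool) {
	url = getenv("HWLAB_OMLX_URL")
	if url == "" {
		return "", "", false
	}
	model = getenv("HWLAB_OMLX_MODEL")
	if model == "" {
		model = defaultModel
	}
	return url, model, true
}

func main() {
	url, model, ok := integrationTarget(os.Getenv)
	if !ok {
		fmt.Println("skip: HWLAB_OMLX_URL not set")
		return
	}
	fmt.Printf("would test %s with model %s\n", url, model)
}
```

Inside a test, the `!ok` branch would call `t.Skip`; the `//go:build integration` tag still decides whether the file is compiled at all.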

All Phase 2 AI pipeline code is complete and tested:
- go-openai installed, AIClient interface, MockAIClient, TierClient
- Three-tier orchestrator with confidence-based tier escalation
- WAQ real NetBox op handler (create_device + patch_custom_fields)
- POST /api/intake endpoint wired end-to-end
- Quick add mode (config-driven)
- SearXNG ResearchClient stub
- oMLX integration test (build-tag guarded)

**Step 1: Run all unit tests (no external services needed)**
```bash
cd /home/mikkel/homelabby
go test ./... -v 2>&1 | grep -E "^(ok|FAIL|---)" | head -40
```
Expected: every package shows "ok" or "SKIP", with zero FAIL.

**Step 2: Test the intake endpoint with mock photos (binary running locally)**
```bash
# Terminal 1: start the server
cd /home/mikkel/homelabby
go run cmd/hwlab/main.go &

# Terminal 2: send a test intake request (any JPEG file will work)
curl -s -X POST http://localhost:8080/api/intake \
  -F "photos=@/path/to/any-photo.jpg" | python3 -m json.tool
```
Expected:
- If oMLX is reachable: JSON response with hw_id, model, confidence, catalog_status
- If oMLX is unreachable (the normal case on a dev machine): 500 or 202, depending on tier client timeout
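
The curl call in Step 2 can also be issued from Go, which is handy for a scripted smoke check. This is a minimal sketch, assuming the handler reads uploads from the multipart field `photos` as the curl example does; `newIntakeRequest` is a hypothetical helper, not part of the codebase.

```go
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
	"net/http"
)

// newIntakeRequest (hypothetical helper) builds the multipart POST that
// corresponds to: curl -X POST <baseURL>/api/intake -F "photos=@<filename>"
func newIntakeRequest(baseURL, filename string, photo []byte) (*http.Request, error) {
	var body bytes.Buffer
	w := multipart.NewWriter(&body)

	part, err := w.CreateFormFile("photos", filename)
	if err != nil {
		return nil, err
	}
	if _, err := part.Write(photo); err != nil {
		return nil, err
	}
	// Close writes the trailing boundary; the body is incomplete without it.
	if err := w.Close(); err != nil {
		return nil, err
	}

	req, err := http.NewRequest(http.MethodPost, baseURL+"/api/intake", &body)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", w.FormDataContentType())
	return req, nil
}

func main() {
	req, err := newIntakeRequest("http://localhost:8080", "photo.jpg", []byte{0xff, 0xd8, 0xff, 0xd9})
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
}
```

Sending it is a matter of `http.DefaultClient.Do(req)` and reading the status code against the expectations listed above.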

**Step 3: On Mac Mini M4 — run the oMLX integration test**
```bash
# On Mac Mini: start oMLX
omlx serve --model gemma-4-e4b --port 8000

# Check memory: Activity Monitor → omlx process, note "Real Memory"
# Expected for E4B: ~8-10GB RAM

# Run integration test
cd /home/mikkel/homelabby
HWLAB_OMLX_URL=http://localhost:8000/v1 go test -tags integration ./internal/ai/... -run TestOMLX -v
```
Expected: PASS with logged IntakeResult fields (model may be empty for test pixel — that's OK).

**Step 4: Document memory measurement**
Record in the summary: "Gemma 4 E4B: X GB real memory on Mac Mini M4 16GB"
If > 12GB: note that 26B A4B is not feasible without TurboQuant KV offload.
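
The Step 4 note can be produced mechanically from the measured figure. The sketch below is illustrative only: `memorySummary` is a hypothetical helper, the 16GB total and 12GB threshold come from this plan, and the headroom figure is an added convenience.

```go
package main

import "fmt"

const (
	totalGB       = 16.0 // Mac Mini M4 unified memory, per this plan
	tierCeilingGB = 12.0 // above this, the plan rules out the 26B A4B tier
)

// memorySummary (hypothetical helper) formats the summary line for a measured
// real-memory figure and appends the tier note when the ceiling is exceeded.
func memorySummary(measuredGB float64) string {
	line := fmt.Sprintf("Gemma 4 E4B: %.1f GB real memory on Mac Mini M4 16GB (%.1f GB headroom)",
		measuredGB, totalGB-measuredGB)
	if measuredGB > tierCeilingGB {
		line += "; 26B A4B not feasible without TurboQuant KV offload"
	}
	return line
}

func main() {
	// 9.2 is the example figure used in the approval text, not a measurement.
	fmt.Println(memorySummary(9.2))
}
```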
Type "approved" after verifying unit tests pass. If oMLX test was run on Mac Mini, include memory measurement (e.g. "approved — E4B uses 9.2GB"). If Mac Mini not available yet, type "approved — oMLX test deferred, unit tests pass".

<threat_model>

Trust Boundaries

| Boundary | Description |
|----------|-------------|
| integration test → oMLX | Test sends real data to local AI; only runs when explicitly triggered |

STRIDE Threat Register

| Threat ID | Category | Component | Disposition | Mitigation Plan |
|-----------|----------|-----------|-------------|-----------------|
| T-02-14 | Information Disclosure | omlx-setup.md | accept | Document contains model names and port numbers but no secrets; oMLX API key is "local" (not a real credential) |
| T-02-15 | Denial of Service | integration test resource usage | mitigate | Build tag `integration` ensures the test never runs in the standard CI pipeline; it only runs manually with an explicit env var |

</threat_model>
After plan completion:
1. `go test ./... 2>&1 | grep FAIL`: no failures
2. `go test -tags integration ./internal/ai/... -v -run TestOMLX` when HWLAB_OMLX_URL unset → SKIP
3. `ls docs/omlx-setup.md`: file exists with memory measurement (filled in during checkpoint)
4. Human verified: unit test suite clean, oMLX smoke test outcome documented

<success_criteria>

  • All Phase 2 unit tests pass with zero failures
  • oMLX integration test exists, skips gracefully when HWLAB_OMLX_URL not set
  • Memory budget for Gemma 4 E4B documented (or deferred with note if Mac Mini not available)
  • Phase 2 complete: POST /api/intake is end-to-end functional

</success_criteria>

After completion, create `.planning/phases/02-ai-pipeline/02-04-SUMMARY.md`