docs(02): research AI pipeline phase — go-openai vision, mock interface, orchestrator patterns

2026-04-10 05:32:17 +00:00 · 2026-04-10 05:32:17 +00:00 · 6460b27bfc
commit 6460b27bfc
parent 5313dfb501
1 changed files with 809 additions and 0 deletions
--- a/.planning/phases/02-ai-pipeline/02-RESEARCH.md
+++ b/.planning/phases/02-ai-pipeline/02-RESEARCH.md
@@ -0,0 +1,809 @@
+# Phase 2: AI Pipeline - Research
+
+**Researched:** 2026-04-10
+**Domain:** Go AI client interface, multipart photo intake, multimodal vision with Gemma 4 via oMLX, three-tier orchestrator, confidence-based quality gate wiring
+**Confidence:** HIGH (core patterns from training knowledge, verified against codebase and stack decisions)
+
+---
+
+<user_constraints>
+## User Constraints (from CONTEXT.md)
+
+### Locked Decisions
+
+- Single `go-openai` client with configurable BaseURL per tier
+- Tier 1: oMLX at http://localhost:8000/v1 (Gemma 4 E4B default)
+- Tier 2: OpenRouter at https://openrouter.ai/api/v1 (research agent)
+- Tier 3: OpenRouter (Opus for Lab Advisor — deferred to Phase 6)
+- Config JSON drives tier routing — no code changes to swap providers
+- POST /api/intake accepts multipart/form-data with 1-3 photo files
+- Photos encoded as base64 and sent to Gemma 4 vision endpoint
+- AI extracts: serial number, model, manufacturer, specs, category, suggested tags
+- Confidence score determines catalog_status: high → indexed, low → needs_research
+- Config flag enables skip-review flow for high-confidence items (Quick Add mode)
+- oMLX may not be installed on dev machine — use mock AI client for unit tests
+- Integration tests skip gracefully when oMLX unreachable
+- Expose `AIClient` interface so production uses oMLX, tests use mock
+- AI config lives in ai_config.json (separate from main config.json)
+- Intake handler should use write-ahead queue if NetBox unreachable
+- SearXNG function calling deferred to Phase 7
+
+### Claude's Discretion
+
+All implementation details are at Claude's discretion. Use Phase 1 artifacts (NetBox client, quality gate, HW-ID) as building blocks.
+
+### Deferred Ideas (OUT OF SCOPE)
+
+- SearXNG function calling (Phase 7)
+- Lab Advisor tier 3 (Phase 6)
+- Natural language search (Phase 7)
+- Actual Gemma 4 model tuning/fine-tuning
+- React UI for intake (Phase 3)
+</user_constraints>
+
+---
+
+<phase_requirements>
+## Phase Requirements
+
+| ID | Description | Research Support |
+|----|-------------|------------------|
+| AI-01 | oMLX installed on Mac Mini M4 with Gemma 4 model serving OpenAI-compatible API | oMLX setup guide + mock pattern for dev |
+| AI-02 | User can upload 1-3 photos and AI extracts serial number, model, manufacturer, specs via multimodal vision | Multipart form handling + base64 vision message pattern |
+| AI-03 | AI suggests category, tags, and location for each item | Structured JSON response from vision prompt |
+| AI-04 | AI calls SearXNG via function calling to research product specs (STUB only this phase) | Stub interface only; real impl Phase 7 |
+| AI-05 | Orchestrator reviews Tier 1 output for completeness and flags gaps as needs_research | Confidence extraction + quality gate transition |
+| AI-06 | Tier 2 research agent (OpenRouter) automatically enriches items flagged needs_research | go-openai BaseURL swap pattern |
+| AI-07 | Quick add mode skips review screen for items with high AI confidence | Config flag + threshold comparison |
+| AI-08 | All AI tiers accessed via single OpenAI-compatible client with configurable base URLs | go-openai ClientConfig.BaseURL |
+| AI-09 | Provider routing configured via JSON file — swap any tier without code changes | ai_config.json schema + factory pattern |
+</phase_requirements>
+
+---
+
+## Summary
+
+Phase 2 builds the AI backbone of HWLab: a Go interface hierarchy that decouples test-time mocks from production oMLX/OpenRouter calls, a multipart photo intake handler that encodes images as base64 vision messages, a structured-output extractor that parses Gemma 4 JSON responses into typed `IntakeResult` values, and a three-tier orchestrator that escalates to OpenRouter when Tier 1 confidence falls below threshold.
+
+The key design challenge is keeping the `AIClient` interface minimal enough to mock cleanly while capturing the full vision + JSON-mode call pattern used by go-openai. The confidence score must be embedded in the model's structured output (not inferred post-hoc) because Gemma 4 / OpenAI-compatible APIs do not expose logprobs for vision tasks reliably.
+
+The orchestrator plugs directly into Phase 1's `CatalogUpdater`, `AllocateNextHWID`, `PatchCustomFields`, and `SyncTags` — all four are stable and tested. The WAQ from Phase 1 (Plan 05) is already wired into main.go and is the fallback path when NetBox is unreachable during intake.
+
+**Primary recommendation:** Build the `AIClient` interface and mock first, then the intake handler, then the orchestrator. Keep confidence scoring self-contained inside the AI package — do not leak `float64` confidence values into the service layer; instead expose a typed `CatalogStatus` decision from the orchestrator.
+
+---
+
+## Standard Stack
+
+### Core (Phase 2 additions)
+
+| Library | Version | Purpose | Why Standard |
+|---------|---------|---------|--------------|
+| github.com/sashabaranov/go-openai | v1.x | OpenAI-compatible HTTP client | Single client for oMLX + OpenRouter; BaseURL swap is the tier-routing mechanism; already recommended in STACK.md |
+
+**Version verification:**
+```bash
+go get github.com/sashabaranov/go-openai@latest
+# As of 2026-04 training knowledge: v1.36+ is current — verify before install
+```
+[ASSUMED: exact latest version. Run `go list -m github.com/sashabaranov/go-openai@latest` to confirm before pinning.]
+
+### Already in go.mod (no new dependencies needed)
+
+| Package | Current Version | Used By Phase 2 |
+|---------|-----------------|-----------------|
+| github.com/go-chi/chi/v5 | v5.2.5 | POST /api/intake route |
+| github.com/spf13/viper | v1.21.0 | ai_config.json loading |
+| github.com/google/uuid | v1.6.0 | Intake job ID (already indirect) |
+| github.com/redis/go-redis/v9 | v9.18.0 | WAQ fallback on NetBox failure |
+
+### Installation
+
+```bash
+cd /home/mikkel/homelabby
+go get github.com/sashabaranov/go-openai@latest
+```
+
+---
+
+## Architecture Patterns
+
+### Recommended Package Structure (Phase 2 additions)
+
+```
+internal/
+├── ai/
+│   ├── client.go          # AIClient interface + TierClient concrete type
+│   ├── mock.go            # MockAIClient for unit tests
+│   ├── orchestrator.go    # Three-tier routing + escalation logic
+│   ├── types.go           # IntakeRequest, IntakeResult, ConfidenceLevel
+│   └── prompts/
+│       └── intake.go      # Prompt templates for hardware analysis
+├── api/
+│   ├── handlers/
+│   │   └── intake.go      # POST /api/intake multipart handler (new)
+│   └── router.go          # Add intake route (modify existing)
+└── config/
+    └── config.go          # Add AIConfig fields (modify existing)
+```
+
+---
+
+### Pattern 1: AIClient Interface + TierClient
+
+**What:** A minimal Go interface that captures the one call shape Phase 2 needs. `TierClient` wraps `*openai.Client` from go-openai. `MockAIClient` implements the same interface deterministically.
+
+**Why minimal interface:** The interface should expose the behavior, not the library. If the interface requires `*openai.ChatCompletionRequest`, tests must import go-openai. A domain-typed interface (`AnalyzePhotos`) keeps mocks simple.
+
+```go
+// Source: training knowledge — standard Go interface pattern [ASSUMED]
+// internal/ai/client.go
+
+package ai
+
+import (
+    "context"
+
+    openai "github.com/sashabaranov/go-openai"
+)
+
+// AIClient is the single abstraction over any OpenAI-compatible inference backend.
+// Production: TierClient wrapping sashabaranov/go-openai.
+// Tests: MockAIClient with canned responses.
+type AIClient interface {
+    AnalyzePhotos(ctx context.Context, req IntakeRequest) (*IntakeResult, error)
+}
+
+// TierConfig holds provider configuration for one AI tier.
+type TierConfig struct {
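+    // viper.Unmarshal matches on mapstructure tags; json tags alone are ignored by it.
+    BaseURL  string `json:"base_url" mapstructure:"base_url"`
+    APIKey   string `json:"api_key" mapstructure:"api_key"`
+    Model    string `json:"model" mapstructure:"model"`
+    TimeoutS int    `json:"timeout_seconds" mapstructure:"timeout_seconds"`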
+}
+
+// TierClient is the production AIClient backed by go-openai.
+type TierClient struct {
+    client *openai.Client
+    model  string
+}
+
+func NewTierClient(cfg TierConfig) *TierClient {
+    config := openai.DefaultConfig(cfg.APIKey)
+    config.BaseURL = cfg.BaseURL
+    return &TierClient{
+        client: openai.NewClientWithConfig(config),
+        model:  cfg.Model,
+    }
+}
+```
+
+[VERIFIED: go-openai BaseURL override via `openai.DefaultConfig` + `config.BaseURL` — confirmed pattern from STACK.md and ARCHITECTURE.md]
+
+---
+
+### Pattern 2: Multipart Photo Upload → Base64 Vision Message
+
+**What:** chi handler reads up to 3 files from multipart form, reads each into `[]byte`, encodes to base64 data URL, assembles a `ChatCompletionRequest` with `ImageURL` content parts.
+
+**go-openai vision message shape:** [ASSUMED: standard pattern, consistent with OpenAI API]
+
+```go
+// internal/api/handlers/intake.go
+// Source: go-openai vision pattern [ASSUMED — matches OpenAI API spec]
+
+func (h *IntakeHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
+    // Parse multipart: 32MB is the in-memory limit; larger parts spill to temp files
+    if err := r.ParseMultipartForm(32 << 20); err != nil {
+        http.Error(w, "bad multipart", http.StatusBadRequest)
+        return
+    }
+
+    files := r.MultipartForm.File["photos"]
+    if len(files) == 0 || len(files) > 3 {
+        http.Error(w, "1-3 photos required", http.StatusBadRequest)
+        return
+    }
+
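+    photosB64 := make([]string, 0, len(files))
+    for _, fh := range files {
+        f, err := fh.Open()
+        if err != nil {
+            http.Error(w, "cannot open upload", http.StatusBadRequest)
+            return
+        }
+        data, err := io.ReadAll(f)
+        f.Close() // close per iteration; a defer in the loop keeps every file open until return
+        if err != nil {
+            http.Error(w, "cannot read upload", http.StatusBadRequest)
+            return
+        }
+        // DetectContentType itself inspects no more than the first 512 bytes
+        mime := http.DetectContentType(data)
+        photosB64 = append(photosB64, fmt.Sprintf("data:%s;base64,%s",
+            mime, base64.StdEncoding.EncodeToString(data)))
+    }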
+
+    result, err := h.ai.AnalyzePhotos(r.Context(), ai.IntakeRequest{
+        PhotosBase64: photosB64,
+    })
+    // ...
+}
+```
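The encoding step in the handler can be isolated into a small helper, which makes it unit-testable without an HTTP request. The following is a minimal sketch using only the standard library; `toDataURL` is a hypothetical name (not part of the planned package layout), and rejecting non-image content is an assumption drawn from the input-validation goals of this phase.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"net/http"
	"strings"
)

// toDataURL (hypothetical helper) sniffs the MIME type of raw upload bytes
// and returns a base64 data URL. Anything that does not sniff as an image
// is rejected.
func toDataURL(data []byte) (string, error) {
	if len(data) == 0 {
		return "", fmt.Errorf("empty upload")
	}
	// DetectContentType considers at most the first 512 bytes.
	mime := http.DetectContentType(data)
	if !strings.HasPrefix(mime, "image/") {
		return "", fmt.Errorf("unsupported content type %q", mime)
	}
	return "data:" + mime + ";base64," + base64.StdEncoding.EncodeToString(data), nil
}

func main() {
	// The 8-byte PNG signature is enough for content sniffing.
	url, err := toDataURL([]byte("\x89PNG\r\n\x1a\n"))
	fmt.Println(url, err)

	// Plain text is rejected.
	_, err = toDataURL([]byte("plain text, not a photo"))
	fmt.Println(err)
}
```

Keeping the sniffing and encoding in one function also gives a single place to add an allow-list of image types later, if the vision endpoint turns out to accept only some of them.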
+
+**go-openai vision content parts:** [ASSUMED]
+
+```go
+// internal/ai/client.go — TierClient.AnalyzePhotos
+func (c *TierClient) AnalyzePhotos(ctx context.Context, req IntakeRequest) (*IntakeResult, error) {
+    // Build image content parts
+    parts := []openai.ChatMessagePart{
+        {
+            Type: openai.ChatMessagePartTypeText,
+            Text: buildIntakePrompt(),
+        },
+    }
+    for _, b64 := range req.PhotosBase64 {
+        parts = append(parts, openai.ChatMessagePart{
+            Type: openai.ChatMessagePartTypeImageURL,
+            ImageURL: &openai.ChatMessageImageURL{
+                URL:    b64,   // data:image/jpeg;base64,...
+                Detail: openai.ImageURLDetailAuto,
+            },
+        })
+    }
+
+    resp, err := c.client.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
+        Model: c.model,
+        Messages: []openai.ChatCompletionMessage{
+            {Role: openai.ChatMessageRoleUser, MultiContent: parts},
+        },
+        // ResponseFormat for JSON mode — see Pattern 3
+    })
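+    if err != nil {
+        return nil, fmt.Errorf("chat completion: %w", err)
+    }
+    if len(resp.Choices) == 0 {
+        return nil, errors.New("chat completion returned no choices")
+    }
+    // Parse the reply as JSON; a parse failure becomes a zero-confidence
+    // result so the orchestrator escalates (see Pattern 3)
+    var result IntakeResult
+    if err := json.Unmarshal([]byte(resp.Choices[0].Message.Content), &result); err != nil {
+        return &IntakeResult{}, nil
+    }
+    return &result, nil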
+}
+```
+
+[ASSUMED: `MultiContent` field name in go-openai ChatCompletionMessage — verify against actual go-openai source after install. Some versions use `Content` string OR `MultiContent []ChatMessagePart`]
+
+**CRITICAL NOTE:** Verify the exact `ChatCompletionMessage` field for multi-content vision after `go get`. The field has been `MultiContent` in v1.20+ but naming may differ. Check with:
+```bash
+go doc github.com/sashabaranov/go-openai ChatCompletionMessage
+```
+
+---
+
+### Pattern 3: Structured JSON Output from Gemma 4
+
+**What:** Instruct the model to return a specific JSON schema via prompt engineering. Use `ResponseFormat` with `JSONObject` type when the endpoint supports it (oMLX/Gemma 4 may not support strict JSON schema mode — fall back to prompt-only).
+
+**IntakeResult schema:**
+
+```go
+// internal/ai/types.go
+package ai
+
+// IntakeResult is the structured output from any AI tier's photo analysis.
+// The model is instructed to return this JSON shape verbatim.
+type IntakeResult struct {
+    SerialNumber   string   `json:"serial_number"`    // empty string if not visible
+    Model          string   `json:"model"`
+    Manufacturer   string   `json:"manufacturer"`
+    Category       string   `json:"category"`         // e.g. "networking", "cable", "compute"
+    Specs          map[string]string `json:"specs"`   // key-value hardware specs
+    SuggestedTags  []string `json:"suggested_tags"`
+    AINotes        string   `json:"ai_notes"`          // free-form observations
+    Confidence     float64  `json:"confidence"`        // 0.0–1.0, self-reported by model
+    ConfidenceNote string   `json:"confidence_note"`   // why confidence is low (if < threshold)
+}
+```
+
+**Prompt pattern for JSON output:**
+
+```go
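+// internal/ai/prompts/intake.go
+// NOTE: TierClient in package ai calls this function unqualified. If it stays in a
+// separate prompts package it must be exported and imported; otherwise keep it in package ai.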
+func buildIntakePrompt() string {
+    return `Analyze the hardware in the provided photo(s) and return ONLY valid JSON matching this schema:
+{
+  "serial_number": "<string or empty>",
+  "model": "<string>",
+  "manufacturer": "<string>",
+  "category": "<one of: compute, networking, storage, cable, peripheral, component, unknown>",
+  "specs": {"<key>": "<value>"},
+  "suggested_tags": ["<tag1>", "<tag2>"],
+  "ai_notes": "<observations>",
+  "confidence": <float 0.0-1.0>,
+  "confidence_note": "<reason if confidence < 0.75>"
+}
+Return ONLY the JSON object. No markdown, no explanation.`
+}
+```
+
+**JSON mode ResponseFormat (use if supported by endpoint):** [ASSUMED]
+
+```go
+// Only set if oMLX / OpenRouter model supports JSON mode
+ResponseFormat: &openai.ChatCompletionResponseFormat{
+    Type: openai.ChatCompletionResponseFormatTypeJSONObject,
+},
+```
+
+[ASSUMED: Gemma 4 via oMLX may not support `response_format: json_object` — implement with prompt-only fallback and parse `json.Unmarshal` on the raw response string. If JSON parse fails, treat as low-confidence and escalate.]
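For the prompt-only fallback, the parser has to cope with replies that wrap the JSON in a markdown fence or a sentence of prose. The sketch below is one possible approach, using only the standard library; `parseIntakeJSON` is a hypothetical name, `IntakeResult` is trimmed to two fields for brevity, and clamping the confidence to [0, 1] is an added assumption rather than something the plan specifies.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// IntakeResult is trimmed to two fields for this example.
type IntakeResult struct {
	Model      string  `json:"model"`
	Confidence float64 `json:"confidence"`
}

// parseIntakeJSON (hypothetical helper) decodes the outermost {...} span of
// the raw model reply. Any failure yields the zero value, whose Confidence
// of 0 makes the caller escalate.
func parseIntakeJSON(raw string) IntakeResult {
	start := strings.Index(raw, "{")
	end := strings.LastIndex(raw, "}")
	if start < 0 || end <= start {
		return IntakeResult{}
	}
	var r IntakeResult
	if err := json.Unmarshal([]byte(raw[start:end+1]), &r); err != nil {
		return IntakeResult{} // discard any partially decoded fields
	}
	// Clamp the self-reported confidence into [0, 1].
	if r.Confidence < 0 {
		r.Confidence = 0
	}
	if r.Confidence > 1 {
		r.Confidence = 1
	}
	return r
}

func main() {
	fence := strings.Repeat("`", 3) // built at runtime to keep this listing fence-free
	wrapped := fence + "json\n{\"model\": \"Raspberry Pi 4 Model B\", \"confidence\": 0.95}\n" + fence
	fmt.Printf("%+v\n", parseIntakeJSON(wrapped))
	fmt.Printf("%+v\n", parseIntakeJSON("I cannot read the label."))
}
```

The brace-span heuristic is deliberately simple: prose containing a stray `}` after the object would make the decode fail, which this sketch treats the same as any other parse failure.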
+
+---
+
+### Pattern 4: Three-Tier Orchestrator
+
+**What:** Orchestrator holds two `AIClient` instances (tier1, tier2). For each intake request: call tier1, parse result, check confidence. If confidence < threshold OR parse failed, call tier2 with same request. Map confidence to `CatalogStatus` for quality gate.
+
+```go
+// internal/ai/orchestrator.go
+package ai
+
+type Orchestrator struct {
+    tier1     AIClient
+    tier2     AIClient
+    threshold float64 // from config — default 0.75
+}
+
+func NewOrchestrator(tier1, tier2 AIClient, threshold float64) *Orchestrator {
+    return &Orchestrator{tier1: tier1, tier2: tier2, threshold: threshold}
+}
+
+// Analyze runs tier1, escalates to tier2 if needed, returns result + catalog decision.
+func (o *Orchestrator) Analyze(ctx context.Context, req IntakeRequest) (*IntakeResult, inventory.CatalogStatus, error) {
+    result, err := o.tier1.AnalyzePhotos(ctx, req)
+    if err != nil || result == nil || result.Confidence < o.threshold {
+        // Escalate to tier2
+        result2, err2 := o.tier2.AnalyzePhotos(ctx, req)
+        if err2 == nil && result2 != nil {
+            result = result2
+        }
+        // If tier2 also fails, use tier1 result (or zero result) with NeedsResearch status
+    }
+
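+    if result == nil {
+        // Both tiers failed: hand back a zero-value result so callers never dereference nil
+        result = &IntakeResult{}
+    }
+    status := inventory.StatusIndexed
+    if result.Confidence < o.threshold {
+        status = inventory.StatusNeedsResearch
+    }
+    return result, status, nil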
+}
+```
+
+---
+
+### Pattern 5: MockAIClient for Unit Tests
+
+**What:** A deterministic mock that returns canned `IntakeResult` values. Implements `AIClient` interface. Configurable to return high-confidence or low-confidence responses, and optionally errors.
+
+```go
+// internal/ai/mock.go
+package ai
+
+import "context"
+
+// MockAIClient is a test double for AIClient.
+// Configure FixedResult and/or FixedError before use.
+type MockAIClient struct {
+    FixedResult *IntakeResult
+    FixedError  error
+    Calls       []IntakeRequest // record of calls for assertions
+}
+
+func (m *MockAIClient) AnalyzePhotos(_ context.Context, req IntakeRequest) (*IntakeResult, error) {
+    m.Calls = append(m.Calls, req)
+    return m.FixedResult, m.FixedError
+}
+
+// HighConfidenceResult returns a fixture IntakeResult with confidence 0.95.
+func HighConfidenceResult() *IntakeResult {
+    return &IntakeResult{
+        Model:        "Raspberry Pi 4 Model B",
+        Manufacturer: "Raspberry Pi Foundation",
+        Category:     "compute",
+        Specs:        map[string]string{"ram": "4GB", "cpu": "BCM2711"},
+        SuggestedTags: []string{"raspberry-pi", "compute", "arm"},
+        Confidence:   0.95,
+    }
+}
+
+// LowConfidenceResult returns a fixture with confidence 0.40 (below threshold).
+func LowConfidenceResult() *IntakeResult {
+    return &IntakeResult{
+        Model:          "Unknown Device",
+        Category:       "unknown",
+        Confidence:     0.40,
+        ConfidenceNote: "Cannot identify markings clearly",
+    }
+}
+```
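To show how the mock supports the orchestrator tests listed later in this document, here is a self-contained sketch of the escalation rule exercised with two mocks. It is illustrative only: the types are trimmed copies, `classify` is a hypothetical stand-in for `Orchestrator.Analyze`, the mock counts calls instead of recording requests, and plain strings stand in for `inventory.CatalogStatus`.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

type IntakeRequest struct{ PhotosBase64 []string }
type IntakeResult struct{ Confidence float64 }

type AIClient interface {
	AnalyzePhotos(ctx context.Context, req IntakeRequest) (*IntakeResult, error)
}

// MockAIClient returns canned values and counts how often it was called.
type MockAIClient struct {
	FixedResult *IntakeResult
	FixedError  error
	Calls       int
}

func (m *MockAIClient) AnalyzePhotos(_ context.Context, _ IntakeRequest) (*IntakeResult, error) {
	m.Calls++
	return m.FixedResult, m.FixedError
}

// classify (hypothetical) applies the escalation rule: call tier1, fall
// through to tier2 on error or low confidence, then map to a status string.
func classify(tier1, tier2 AIClient, threshold float64) string {
	ctx := context.Background()
	result, err := tier1.AnalyzePhotos(ctx, IntakeRequest{})
	if err != nil || result == nil || result.Confidence < threshold {
		if r2, err2 := tier2.AnalyzePhotos(ctx, IntakeRequest{}); err2 == nil && r2 != nil {
			result = r2
		}
	}
	if result == nil || result.Confidence < threshold {
		return "needs_research"
	}
	return "indexed"
}

func main() {
	low := &MockAIClient{FixedResult: &IntakeResult{Confidence: 0.40}}
	high := &MockAIClient{FixedResult: &IntakeResult{Confidence: 0.92}}
	fmt.Println(classify(low, high, 0.75), "tier2 calls:", high.Calls)

	down := &MockAIClient{FixedError: errors.New("unreachable")}
	fmt.Println(classify(down, down, 0.75), "total calls:", down.Calls)
}
```

Counting calls on the tier2 mock is what lets a test assert that a high-confidence tier1 answer never triggers a paid OpenRouter request.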
+
+---
+
+### Pattern 6: AI Config Schema (ai_config.json)
+
+**What:** Separate JSON config file for AI provider settings. Loaded by viper alongside main config.json. Keeps provider credentials out of the main config.
+
+```json
+{
+  "tier1": {
+    "base_url": "http://localhost:8000/v1",
+    "api_key": "local",
+    "model": "gemma-4-e4b",
+    "timeout_seconds": 30
+  },
+  "tier2": {
+    "base_url": "https://openrouter.ai/api/v1",
+    "api_key": "sk-or-...",
+    "model": "google/gemma-2-27b-it",
+    "timeout_seconds": 60
+  },
+  "confidence_threshold": 0.75,
+  "quick_add_enabled": false,
+  "quick_add_threshold": 0.90
+}
+```
+
+**Config struct extension** (extend existing `internal/config/config.go`):
+
+```go
+type AIConfig struct {
+    Tier1               TierConfig `mapstructure:"tier1"`
+    Tier2               TierConfig `mapstructure:"tier2"`
+    ConfidenceThreshold float64    `mapstructure:"confidence_threshold"`
+    QuickAddEnabled     bool       `mapstructure:"quick_add_enabled"`
+    QuickAddThreshold   float64    `mapstructure:"quick_add_threshold"`
+}
+
+// Add to Config struct:
+AI AIConfig `mapstructure:"ai"`
+```
+
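+**Loading ai_config.json with viper:** read `config.json` first with `v.ReadInConfig()`, then point the same instance at the second file with `v.SetConfigFile("ai_config.json")` and call `v.MergeInConfig()`. `v.MergeConfigMap` is the alternative when the file has already been decoded into a map. Mind the key layout: the `AI AIConfig` field is tagged `mapstructure:"ai"`, so merged keys must sit under a top-level `"ai"` object. With the flat layout shown above, either wrap the file contents in `"ai": {...}` or load the file through a dedicated viper instance and `Unmarshal` straight into `AIConfig`.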
+
+[ASSUMED: viper MergeInConfig pattern for secondary config file — standard viper v1 capability]
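The schema and the Quick Add rule can be checked without viper at all. The sketch below decodes the same JSON shape with `encoding/json` as a stand-in for the viper loader (an explicit swap, so the struct tags here are `json` only), validates the fields the orchestrator depends on, and applies the threshold comparison. `parseAIConfig` and `skipReview` are hypothetical names, and the validation rules are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

type TierConfig struct {
	BaseURL  string `json:"base_url"`
	APIKey   string `json:"api_key"`
	Model    string `json:"model"`
	TimeoutS int    `json:"timeout_seconds"`
}

type AIConfig struct {
	Tier1               TierConfig `json:"tier1"`
	Tier2               TierConfig `json:"tier2"`
	ConfidenceThreshold float64    `json:"confidence_threshold"`
	QuickAddEnabled     bool       `json:"quick_add_enabled"`
	QuickAddThreshold   float64    `json:"quick_add_threshold"`
}

// sampleConfig mirrors the ai_config.json example above.
const sampleConfig = `{
  "tier1": {"base_url": "http://localhost:8000/v1", "api_key": "local", "model": "gemma-4-e4b", "timeout_seconds": 30},
  "tier2": {"base_url": "https://openrouter.ai/api/v1", "api_key": "sk-or-...", "model": "google/gemma-2-27b-it", "timeout_seconds": 60},
  "confidence_threshold": 0.75,
  "quick_add_enabled": false,
  "quick_add_threshold": 0.90
}`

// parseAIConfig (hypothetical) decodes and sanity-checks the AI config.
func parseAIConfig(raw string) (AIConfig, error) {
	var cfg AIConfig
	dec := json.NewDecoder(strings.NewReader(raw))
	dec.DisallowUnknownFields() // surface misspelled keys early
	if err := dec.Decode(&cfg); err != nil {
		return cfg, fmt.Errorf("decode ai config: %w", err)
	}
	if cfg.Tier1.BaseURL == "" || cfg.Tier2.BaseURL == "" {
		return cfg, fmt.Errorf("tier1 and tier2 both need a base_url")
	}
	if cfg.ConfidenceThreshold <= 0 || cfg.ConfidenceThreshold > 1 {
		return cfg, fmt.Errorf("confidence_threshold must be in (0, 1]")
	}
	return cfg, nil
}

// skipReview (hypothetical) reports whether Quick Add may bypass review.
func skipReview(cfg AIConfig, confidence float64) bool {
	return cfg.QuickAddEnabled && confidence >= cfg.QuickAddThreshold
}

func main() {
	cfg, err := parseAIConfig(sampleConfig)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println(cfg.Tier1.Model, cfg.Tier2.BaseURL, skipReview(cfg, 0.95))
}
```

Swapping a provider then means editing `base_url` and `model` in the file only; nothing in the decode or decision code changes.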
+
+---
+
+### Pattern 7: Intake Handler Wiring to Phase 1 Components
+
+**What:** The intake handler coordinates: orchestrator (AI analysis) → `AllocateNextHWID` (ID) → `BuildFullCustomFieldsPatch` (fields) → `NetboxClient.CreateDevice` or `PatchCustomFields` → `SyncTags` → `CatalogUpdater.UpdateCatalogStatus` → WAQ fallback.
+
+**Existing Phase 1 APIs the handler calls:**
+
+| Phase 1 Function | Package | Handler Usage |
+|-----------------|---------|---------------|
+| `AllocateNextHWID(ctx)` | `internal/netbox` | Assign HW-XXXXX ID to new record |
+| `BuildFullCustomFieldsPatch(cf)` | `internal/netbox` | Populate custom fields from IntakeResult |
+| `PatchCustomFields(ctx, id, patch)` | `internal/netbox` | Write AI data to NetBox device |
+| `SyncTags(ctx, tags)` | `internal/netbox` | Create and assign AI-suggested tags |
+| `UpdateCatalogStatus(ctx, id, current, next)` | `internal/inventory` | Set indexed or needs_research |
+| `waq.Enqueue(ctx, op)` | `internal/queue` | Buffer NetBox write if unreachable |
+
+**Note:** Phase 1's `client.go` has `ListDevices` and `GetDevice` but no `CreateDevice`. The intake handler will need `CreateDevice` — this is a new method on `internal/netbox.Client`. Plan must include this task.
+
+---
+
+### Pattern 8: SearXNG Stub (AI-04)
+
+**What:** AI-04 is listed as "Phase 7" in REQUIREMENTS.md but the CONTEXT.md says "stub only" this phase. Implement a `ResearchClient` interface with a `Search(ctx, query)` method, and a `NoOpResearchClient` that returns empty results. This satisfies the interface requirement without Phase 7 scope creep.
+
+```go
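+// internal/ai/research.go (stub)
+package ai
+
+import "context"
+
+// SearchResult is a placeholder; its fields are defined in Phase 7.
+type SearchResult struct{}
+
+type ResearchClient interface {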
+    Search(ctx context.Context, query string) ([]SearchResult, error)
+}
+
+type NoOpResearchClient struct{}
+
+func (n *NoOpResearchClient) Search(_ context.Context, _ string) ([]SearchResult, error) {
+    return nil, nil // Phase 7 will provide real implementation
+}
+```
+
+---
+
+### Anti-Patterns to Avoid
+
+- **Don't extract confidence from logprobs:** Gemma 4 vision via oMLX does not expose per-token logprobs reliably. Embed `confidence: float` in the JSON output schema and instruct the model to self-report it. [ASSUMED: oMLX logprobs availability is uncertain]
+- **Don't store photos:** Per CLAUDE.md stack patterns: "Store the original photo in a local temp directory only until the NetBox record is created; do not persist photos in HWLab itself." Photos are transient.
+- **Don't call NetBox from the AI package:** `internal/ai` should not import `internal/netbox`. The intake handler (service layer) orchestrates both. Keep the AI package focused on inference only.
+- **Don't share a single go-openai client across tiers:** Each tier gets its own `*openai.Client` instance with its own `BaseURL` and `APIKey`. Mutating a shared client's config is a race condition.
+- **Don't block the HTTP response on AI inference:** AI calls take 2-30 seconds. The intake handler should return a job ID immediately and push the result via SSE. (Phase 3 will add SSE — for Phase 2, a synchronous response is acceptable since there's no UI yet, but design the handler to support async promotion.)
+
+---
+
+## Don't Hand-Roll
+
+| Problem | Don't Build | Use Instead | Why |
+|---------|-------------|-------------|-----|
+| OpenAI-compatible HTTP client | Custom HTTP calls to oMLX | `sashabaranov/go-openai` | Handles auth headers, streaming, and vision content parts; retries are not built in and belong in the caller |
+| Base64 encoding | Custom encoder | `encoding/base64` stdlib | Already in Go stdlib |
+| MIME type detection | File extension parsing | `net/http.DetectContentType` | Magic bytes detection from stdlib |
+| JSON structured output parsing | Regex extraction | `encoding/json.Unmarshal` | Model output is usually well-formed JSON when prompted correctly; a failed parse is handled as low confidence |
+| Multipart form parsing | Manual `--boundary` parsing | `r.ParseMultipartForm()` | stdlib net/http handles multipart |
+
+---
+
+## Common Pitfalls
+
+### Pitfall 1: go-openai Vision MultiContent Field Name
+
+**What goes wrong:** Code compiles but `ChatCompletionMessage.MultiContent` field doesn't exist or is named differently in the installed version.
+
+**Why it happens:** go-openai API evolved; older versions used a single `Content string`, newer versions added `MultiContent []ChatMessagePart` for vision. The exact field name depends on the version.
+
+**How to avoid:** After `go get github.com/sashabaranov/go-openai@latest`, run `go doc github.com/sashabaranov/go-openai ChatCompletionMessage` and verify the vision field name before writing handler code.
+
+**Warning signs:** Compiler error "unknown field MultiContent" or images silently not being sent (text-only response from model).
+
+---
+
+### Pitfall 2: oMLX JSON Mode Not Supported
+
+**What goes wrong:** Setting `ResponseFormat: {Type: "json_object"}` causes a 400 error from oMLX because Gemma 4 E4B via oMLX may not support the `response_format` parameter.
+
+**Why it happens:** The `response_format` JSON schema enforcement is an OpenAI-specific feature not universally implemented across all OpenAI-compatible servers.
+
+**How to avoid:** Implement JSON parsing with a fallback: try `json.Unmarshal([]byte(content), &result)` on the raw string. If the parse fails, treat the result as zero-confidence and escalate to tier2. Do not set `ResponseFormat` unless it has been tested against live oMLX.
+
+**Warning signs:** 400 Bad Request from oMLX at inference time with "unsupported parameter" in body.
+
+---
+
+### Pitfall 3: Data URL MIME Type vs go-openai Image URL
+
+**What goes wrong:** Some OpenAI-compatible servers reject `data:image/jpeg;base64,...` data URLs in vision requests and require a `https://` URL instead.
+
+**Why it happens:** The OpenAI spec allows data URLs in `image_url.url` but not all providers implement this.
+
+**How to avoid:** oMLX (local, Gemma 4) should accept data URLs since it's processing locally. Test with a minimal integration test against live oMLX before building the full intake flow. Keep the base64 path for oMLX (tier1) and note that tier2 (OpenRouter) may require a different approach if it doesn't accept data URLs.
+
+**Warning signs:** 400 or inference-time error from oMLX with "invalid image_url".
+
+---
+
+### Pitfall 4: CreateDevice Not in Phase 1 NetBox Client
+
+**What goes wrong:** Intake handler tries to call `netboxClient.CreateDevice(...)` but that method was not built in Phase 1 (only ListDevices, GetDevice, PatchCustomFields were built).
+
+**Why it happens:** Phase 1 was scoped to read/patch existing devices for the quality gate workflow. Intake requires creating new records.
+
+**How to avoid:** Plan must include a Wave 0 task to add `CreateDevice(ctx, name, assetTag) (int, error)` to `internal/netbox/client.go` before the intake handler can be completed.
+
+**go-netbox v4 create pattern:** [ASSUMED — matches observed PATCH pattern from 01-02-SUMMARY]
+```go
+req := nb.WritableDeviceWithConfigContextRequest{}
+req.SetName(name)
+req.SetAssetTag(assetTag)
+// DeviceRole and DeviceType are required by NetBox — plan must handle defaults
+resp, _, err := c.api.DcimAPI.DcimDevicesCreate(ctx).
+    WritableDeviceWithConfigContextRequest(req).Execute()
+```
+
+**Note:** NetBox will not create a device without a device type, a role, and a site; all three are required foreign keys. The role field is named `role` in the NetBox 4.x REST API (`device_role` in older releases). The intake handler must either pick sensible defaults or require pre-provisioned "Unknown" role, type, and site records to exist in NetBox.
+
+---
+
+### Pitfall 5: Confidence Self-Reporting Calibration
+
+**What goes wrong:** Model returns `"confidence": 0.95` for every item regardless of actual uncertainty, making the threshold useless.
+
+**Why it happens:** LLMs tend to be overconfident in self-reporting. Without explicit calibration prompting, models bias toward high confidence.
+
+**How to avoid:** Add calibration guidance to the intake prompt: "Return confidence < 0.75 if: serial number not visible, item is partially obscured, or manufacturer/model cannot be determined from visual inspection alone." This nudges the model toward honest low-confidence responses for ambiguous photos.
+
+---
+
+### Pitfall 6: WAQ Integration — PendingOp Payload Schema
+
+**What goes wrong:** Intake handler enqueues a `PendingOp` with a payload, but Phase 1's `NoOpHandler` (the WAQ worker) is still installed — it drains the queue silently. Phase 2 must replace `NoOpHandler` with a real NetBox retry handler.
+
+**Why it happens:** Phase 1 explicitly left `NoOpHandler` as a stub: "Phase 2 will replace this with a real retry handler."
+
+**How to avoid:** Phase 2 plan must include a task to implement the real WAQ handler that retries failed NetBox `CreateDevice` / `PatchCustomFields` calls. Define `PendingOp.OpType` constants (e.g., `"netbox.create_device"`, `"netbox.patch_custom_fields"`) and the payload structs for each.
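One way to lay out the op types and payloads is sketched below. The two `OpType` strings come from the paragraph above; everything else (the `PendingOp` field names, the payload structs, `newOp`, `describe`) is hypothetical and should be reconciled with the actual Phase 1 queue types. `describe` only formats a summary where a real handler would call NetBox.

```go
package main

import (
	"encoding/json"
	"fmt"
)

const (
	OpCreateDevice      = "netbox.create_device"
	OpPatchCustomFields = "netbox.patch_custom_fields"
)

// PendingOp is a hypothetical queue envelope; Phase 1's struct may differ.
type PendingOp struct {
	OpType  string          `json:"op_type"`
	Payload json.RawMessage `json:"payload"`
}

type CreateDevicePayload struct {
	Name     string `json:"name"`
	AssetTag string `json:"asset_tag"`
}

type PatchCustomFieldsPayload struct {
	DeviceID int               `json:"device_id"`
	Fields   map[string]string `json:"fields"`
}

// newOp wraps a typed payload in an envelope ready for enqueueing.
func newOp(opType string, payload any) (PendingOp, error) {
	raw, err := json.Marshal(payload)
	if err != nil {
		return PendingOp{}, err
	}
	return PendingOp{OpType: opType, Payload: raw}, nil
}

// describe decodes the payload according to OpType and summarizes it.
// Unknown op types are an error rather than being silently drained.
func describe(op PendingOp) (string, error) {
	switch op.OpType {
	case OpCreateDevice:
		var p CreateDevicePayload
		if err := json.Unmarshal(op.Payload, &p); err != nil {
			return "", err
		}
		return fmt.Sprintf("create %s (%s)", p.Name, p.AssetTag), nil
	case OpPatchCustomFields:
		var p PatchCustomFieldsPayload
		if err := json.Unmarshal(op.Payload, &p); err != nil {
			return "", err
		}
		return fmt.Sprintf("patch device %d (%d fields)", p.DeviceID, len(p.Fields)), nil
	default:
		return "", fmt.Errorf("unknown op type %q", op.OpType)
	}
}

func main() {
	op, err := newOp(OpCreateDevice, CreateDevicePayload{Name: "Raspberry Pi 4 Model B", AssetTag: "HW-00042"})
	if err != nil {
		fmt.Println("marshal error:", err)
		return
	}
	fmt.Println(describe(op))
}
```

Keeping the payload as `json.RawMessage` lets the queue store ops without knowing their shape, while the handler stays the only place that maps an op type to a concrete struct.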
+
+---
+
+## Code Examples
+
+### go-openai Client Configuration for oMLX
+
+```go
+// Source: go-openai README pattern, confirmed in STACK.md [ASSUMED version specifics]
+import openai "github.com/sashabaranov/go-openai"
+
+cfg := openai.DefaultConfig("local")   // API key "local" for oMLX (no auth)
+cfg.BaseURL = "http://localhost:8000/v1"
+client := openai.NewClientWithConfig(cfg)
+```
+
+### go-openai Client Configuration for OpenRouter
+
+```go
+cfg := openai.DefaultConfig("sk-or-your-key-here")
+cfg.BaseURL = "https://openrouter.ai/api/v1"
+client := openai.NewClientWithConfig(cfg)
+```
+
+### Multipart File Reading in chi Handler
+
+```go
+// Source: Go stdlib net/http [VERIFIED: stdlib pattern]
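+if err := r.ParseMultipartForm(32 << 20); err != nil { // 32MB in-memory limit
+    http.Error(w, "bad multipart", http.StatusBadRequest)
+    return
+}
+var dataURLs []string
+for _, fh := range r.MultipartForm.File["photos"] {
+    f, err := fh.Open()
+    if err != nil {
+        http.Error(w, "cannot open upload", http.StatusBadRequest)
+        return
+    }
+    data, err := io.ReadAll(f)
+    f.Close() // close now; a defer inside the loop keeps every file open until return
+    if err != nil {
+        http.Error(w, "cannot read upload", http.StatusBadRequest)
+        return
+    }
+    dataURLs = append(dataURLs, fmt.Sprintf("data:%s;base64,%s",
+        http.DetectContentType(data), base64.StdEncoding.EncodeToString(data)))
+}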
+```
+
+### JSON Parse with Fallback
+
+```go
+// Source: Go stdlib encoding/json [VERIFIED: stdlib pattern]
+var result ai.IntakeResult
+content := resp.Choices[0].Message.Content
+if err := json.Unmarshal([]byte(content), &result); err != nil {
+    // Model returned non-JSON — treat as low confidence, escalate
+    return &ai.IntakeResult{Confidence: 0.0}, nil
+}
+```
+
+### Integration Test Skip Guard (consistent with Phase 1 pattern)
+
+```go
+// Source: Phase 1 established pattern (01-02-SUMMARY.md) [VERIFIED: codebase]
+func TestAnalyzePhotosLive(t *testing.T) {
+    endpoint := os.Getenv("HWLAB_OMLX_ENDPOINT")
+    if endpoint == "" {
+        t.Skip("HWLAB_OMLX_ENDPOINT not set — skipping live oMLX test")
+    }
+    // ...
+}
+```
+
+---
+
+## Validation Architecture
+
+### Test Framework
+
+| Property | Value |
+|----------|-------|
+| Framework | Go testing stdlib (`go test ./...`) |
+| Config file | none — test flags via env vars |
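+| Quick run command | `go test ./internal/ai/... -skip Live -timeout 30s` (`-skip` needs Go 1.20+; the earlier `-run "^Test[^L]"` pattern would not exclude `TestAnalyzePhotosLive`) |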
+| Full suite command | `go test ./...` |
+
+### Phase Requirements → Test Map
+
+| Req ID | Behavior | Test Type | Automated Command | File Exists? |
+|--------|----------|-----------|-------------------|-------------|
+| AI-02 | Photo upload multipart parsing | unit | `go test ./internal/api/handlers/... -run TestIntakeHandler` | Wave 0 |
+| AI-02 | Base64 encoding of JPEG | unit | `go test ./internal/ai/... -run TestEncodePhoto` | Wave 0 |
+| AI-03 | JSON parse of structured output | unit | `go test ./internal/ai/... -run TestParseIntakeResult` | Wave 0 |
+| AI-05 | Confidence below threshold → needs_research | unit | `go test ./internal/ai/... -run TestOrchestratorEscalation` | Wave 0 |
+| AI-05 | Confidence above threshold → indexed | unit | `go test ./internal/ai/... -run TestOrchestratorHighConf` | Wave 0 |
+| AI-06 | Tier 2 called on tier 1 failure | unit | `go test ./internal/ai/... -run TestOrchestratorTier2Fallback` | Wave 0 |
+| AI-07 | Quick add flag honors threshold | unit | `go test ./internal/ai/... -run TestQuickAddMode` | Wave 0 |
+| AI-08 | TierClient uses configured BaseURL | unit | `go test ./internal/ai/... -run TestTierClientConfig` | Wave 0 |
+| AI-09 | ai_config.json loaded via viper | unit | `go test ./internal/config/... -run TestAIConfig` | Wave 0 |
+| AI-01 | oMLX live inference smoke test | integration | `go test ./internal/ai/... -run TestAnalyzePhotosLive` (skip if env unset) | Wave 0 |
+
+### Sampling Rate
+
+- **Per task commit:** `go test ./internal/ai/... ./internal/api/handlers/... -timeout 30s`
+- **Per wave merge:** `go test ./...`
+- **Phase gate:** Full suite green before `/gsd-verify-work`
+
+### Wave 0 Gaps
+
+- [ ] `internal/ai/client_test.go` — covers AI-08, AI-09 (TierClient config)
+- [ ] `internal/ai/orchestrator_test.go` — covers AI-05, AI-06, AI-07
+- [ ] `internal/ai/types_test.go` — covers AI-03 (JSON parse)
+- [ ] `internal/api/handlers/intake_test.go` — covers AI-02
+
+---
+
+## Security Domain
+
+### Applicable ASVS Categories
+
+| ASVS Category | Applies | Standard Control |
+|---------------|---------|-----------------|
+| V2 Authentication | no | No auth in solo homelab tool |
+| V3 Session Management | no | Stateless REST |
+| V4 Access Control | no | Solo operator, no roles |
+| V5 Input Validation | yes | Validate photo count (1-3), file size cap, MIME type check |
+| V6 Cryptography | no | API keys in config, not in code |
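+
+The V5 checks can be sketched as a single validation pass over the uploaded bytes. This is a minimal illustration, not the project's handler: `validatePhotos` and its constants are hypothetical names, the 10 MB cap is only the example figure used in this document, and the JPEG/PNG allow-list is an assumption. It sniffs the content with `http.DetectContentType` rather than trusting the client-supplied `Content-Type` header.
+
+```go
+package main
+
+import (
+	"fmt"
+	"net/http"
+)
+
+// Hypothetical limits: 1-3 photos comes from the intake contract, while the
+// 10 MB cap and the JPEG/PNG allow-list are assumptions for this sketch.
+const (
+	minPhotos     = 1
+	maxPhotos     = 3
+	maxPhotoBytes = 10 << 20
+)
+
+// validatePhotos checks photo count, per-file size, and sniffed content type.
+func validatePhotos(photos [][]byte) error {
+	if n := len(photos); n < minPhotos || n > maxPhotos {
+		return fmt.Errorf("expected %d-%d photos, got %d", minPhotos, maxPhotos, n)
+	}
+	for i, p := range photos {
+		if len(p) > maxPhotoBytes {
+			return fmt.Errorf("photo %d: %d bytes exceeds %d byte cap", i, len(p), maxPhotoBytes)
+		}
+		// DetectContentType inspects at most the first 512 bytes.
+		switch ct := http.DetectContentType(p); ct {
+		case "image/jpeg", "image/png":
+			// accepted
+		default:
+			return fmt.Errorf("photo %d: unsupported content type %q", i, ct)
+		}
+	}
+	return nil
+}
+
+func main() {
+	jpegHeader := []byte{0xFF, 0xD8, 0xFF, 0xE0} // JPEG magic bytes
+	fmt.Println(validatePhotos([][]byte{jpegHeader}))
+	fmt.Println(validatePhotos(nil))
+	fmt.Println(validatePhotos([][]byte{[]byte("not an image")}))
+}
+```
+
+In a real handler the byte slices would come from reading each multipart file part through a size-limited reader, so an oversized part is rejected before it is fully buffered.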
+
+### Known Threat Patterns
+
+| Pattern | STRIDE | Standard Mitigation |
+|---------|--------|---------------------|
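+| Oversized photo upload (DoS) | Denial of Service | `ParseMultipartForm(32 << 20)` caps only in-memory buffering (larger parts spill to temp files on disk); wrap the body in `http.MaxBytesReader` to cap total request size, and add an explicit per-file size check (e.g., 10MB/photo) |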
+| AI prompt injection via filename | Tampering | Do not include original filename in AI prompt; use only image bytes |
+| API key leakage in logs | Info Disclosure | Never log `TierConfig.APIKey`; use `***` redaction in any debug output |
+| Malformed JSON from model | Tampering | Always `json.Unmarshal` into typed struct; ignore extra fields; treat parse failure as low confidence |
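+
+The "malformed JSON" mitigation can be sketched as follows. The struct fields, JSON key names, and the 0.8 threshold are assumptions made for illustration; only the `indexed` / `needs_research` statuses come from the locked decisions. Any parse failure or out-of-range confidence collapses to a zero-confidence result, so the item is routed to `needs_research` instead of failing the request.
+
+```go
+package main
+
+import (
+	"encoding/json"
+	"fmt"
+)
+
+// IntakeResult is a hypothetical shape for the model's structured output.
+type IntakeResult struct {
+	SerialNumber string   `json:"serial_number"`
+	Model        string   `json:"model"`
+	Manufacturer string   `json:"manufacturer"`
+	Category     string   `json:"category"`
+	Tags         []string `json:"tags"`
+	Confidence   float64  `json:"confidence"`
+}
+
+// parseIntakeResult decodes raw model output into the typed struct.
+// Unknown fields are ignored by encoding/json by default. On any decode
+// error a fresh zero-confidence result is returned, discarding partial data.
+func parseIntakeResult(raw []byte) IntakeResult {
+	var r IntakeResult
+	if err := json.Unmarshal(raw, &r); err != nil {
+		return IntakeResult{}
+	}
+	if r.Confidence < 0 || r.Confidence > 1 {
+		r.Confidence = 0 // out-of-range scores are treated as untrustworthy
+	}
+	return r
+}
+
+// catalogStatus maps a confidence score onto the two catalog statuses.
+func catalogStatus(r IntakeResult, threshold float64) string {
+	if r.Confidence >= threshold {
+		return "indexed"
+	}
+	return "needs_research"
+}
+
+func main() {
+	const threshold = 0.8 // assumed value; the real one would live in ai_config.json
+	good := parseIntakeResult([]byte(`{"model":"R720","confidence":0.92,"extra":"ignored"}`))
+	bad := parseIntakeResult([]byte(`Sure! Here is the JSON: {`))
+	fmt.Println(catalogStatus(good, threshold)) // indexed
+	fmt.Println(catalogStatus(bad, threshold))  // needs_research
+}
+```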
+
+---
+
+## Environment Availability
+
+| Dependency | Required By | Available | Version | Fallback |
+|------------|------------|-----------|---------|----------|
+| oMLX on localhost:8000 | AI-01, Tier 1 inference | Unknown (dev machine) | — | MockAIClient for unit tests; integration tests skip with env guard |
+| OpenRouter API key | AI-06, Tier 2 | Unknown | — | Integration tests skip; tier2 returns error, orchestrator falls back to needs_research |
+| DragonFlyDB (10.5.0.10) | WAQ fallback | VERIFIED reachable (from 01-05-SUMMARY) | — | WAQ init is non-fatal; see 01-05 pattern |
+| NetBox (10.5.0.130:8000) | CreateDevice, PatchCustomFields | Available (integration tests skip on placeholder token) | — | WAQ enqueues ops; real token needed for integration tests |
+
+**Missing dependencies with no fallback:**
+- None — all dependencies have mock/skip fallbacks for unit tests.
+
+**Missing dependencies with fallback:**
+- oMLX: MockAIClient covers unit tests; integration test skips with `HWLAB_OMLX_ENDPOINT` guard.
+- OpenRouter key: Same skip guard pattern.
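+
+One way to express the skip guard is a small helper that returns a skip reason, which an integration test then passes to `t.Skip`. This is a sketch under stated assumptions: `skipReason` and `probeModels` are hypothetical names, and probing `<endpoint>/models` assumes oMLX exposes the OpenAI-style model listing route, which has not been verified here.
+
+```go
+package main
+
+import (
+	"fmt"
+	"net/http"
+	"os"
+	"strings"
+	"time"
+)
+
+// skipReason returns a non-empty reason when the live test should be skipped.
+// The probe is injected so the guard itself can be unit tested offline.
+func skipReason(endpoint string, probe func(string) error) string {
+	if endpoint == "" {
+		return "HWLAB_OMLX_ENDPOINT not set"
+	}
+	if err := probe(endpoint); err != nil {
+		return fmt.Sprintf("oMLX unreachable at %s: %v", endpoint, err)
+	}
+	return ""
+}
+
+// probeModels issues a short-timeout GET against the assumed /models route.
+// Any HTTP response counts as reachable; only transport errors cause a skip.
+func probeModels(endpoint string) error {
+	client := &http.Client{Timeout: 2 * time.Second}
+	resp, err := client.Get(strings.TrimRight(endpoint, "/") + "/models")
+	if err != nil {
+		return err
+	}
+	return resp.Body.Close()
+}
+
+func main() {
+	// In a test this would read:
+	//   if reason := skipReason(os.Getenv("HWLAB_OMLX_ENDPOINT"), probeModels); reason != "" {
+	//       t.Skip(reason)
+	//   }
+	if reason := skipReason(os.Getenv("HWLAB_OMLX_ENDPOINT"), probeModels); reason != "" {
+		fmt.Println("skip:", reason)
+		return
+	}
+	fmt.Println("run live test")
+}
+```
+
+The same helper shape would work for the OpenRouter key guard by checking a key variable instead of probing an endpoint.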
+
+---
+
+## Open Questions
+
+1. **NetBox device_role and device_type for CreateDevice**
+   - What we know: NetBox v4 requires both to be non-null FKs on device creation
+   - What's unclear: Should intake auto-create "Unknown" role/type records if absent, or require them pre-provisioned?
+   - Recommendation: Phase 1 (Plan 03, provision.go) may have already provisioned these. Check `internal/netbox/provision.go` before planning the CreateDevice task.
+
+2. **Gemma 4 E4B model ID string in oMLX**
+   - What we know: CONTEXT.md says `model: "gemma-4-e4b"` as default; oMLX uses the model filename/ID
+   - What's unclear: The exact model ID string oMLX uses for Gemma 4 E4B (may be `mlx-community/gemma-4-e4b` or similar)
+   - Recommendation: Leave as a config value; user sets the correct model ID once oMLX is installed. Default to `"gemma-4-e4b"` in ai_config.json with a comment.
+
+3. **Synchronous vs async intake response**
+   - What we know: AI inference takes 2-30 seconds; Phase 3 adds SSE; no UI in Phase 2
+   - What's unclear: Should Phase 2 implement async job IDs now (for Phase 3 to build on) or keep synchronous for simplicity?
+   - Recommendation: Keep intake synchronous in Phase 2, since there is no UI yet. Have the handler recognize an `?async=true` query parameter and answer it with a "not yet implemented" error for now; this reserves the API surface for Phase 3 without blocking Phase 2.
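+
+The `?async=true` stub from question 3 could look like the sketch below, written against plain `net/http` so it stays self-contained. The handler name, the placeholder synchronous response, and the choice of `501 Not Implemented` as the status code are assumptions; the recommendation only says the stub should report "not yet implemented".
+
+```go
+package main
+
+import (
+	"fmt"
+	"net/http"
+	"net/http/httptest"
+)
+
+// intakeHandler reserves the async API surface without implementing it.
+func intakeHandler(w http.ResponseWriter, r *http.Request) {
+	if r.URL.Query().Get("async") == "true" {
+		// Assumed status code: 501 signals a recognized but unimplemented mode.
+		http.Error(w, "async intake not yet implemented", http.StatusNotImplemented)
+		return
+	}
+	// Placeholder for the synchronous path (parse photos, call the AI client).
+	w.WriteHeader(http.StatusOK)
+	fmt.Fprintln(w, "sync intake placeholder")
+}
+
+func main() {
+	for _, target := range []string{"/api/intake", "/api/intake?async=true"} {
+		rec := httptest.NewRecorder()
+		intakeHandler(rec, httptest.NewRequest(http.MethodPost, target, nil))
+		fmt.Println(target, rec.Code)
+	}
+}
+```
+
+Because the handler has the standard `http.HandlerFunc` signature, it can be registered on the existing chi router unchanged.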
+
+---
+
+## Assumptions Log
+
+| # | Claim | Section | Risk if Wrong |
+|---|-------|---------|---------------|
+| A1 | go-openai vision content uses `MultiContent []ChatMessagePart` field on `ChatCompletionMessage` | Pattern 2 | Compile error; verify with `go doc` after install |
+| A2 | oMLX supports data URL base64 images in vision requests | Pattern 2 | 400 error at inference time; may need to write image to temp file and use URL instead |
+| A3 | oMLX may not support `response_format: json_object` | Pattern 3 | Must use prompt-only JSON mode; 400 if ResponseFormat is set |
+| A4 | go-openai latest version is v1.36+ | Standard Stack | Run `go get` to verify; version is only needed to confirm stability |
+| A5 | Gemma 4 E4B self-reports honest confidence scores with calibration prompting | Pattern 5 pitfall | Threshold becomes useless if model is always overconfident; may need threshold tuning |
+| A6 | viper `MergeInConfig` can load ai_config.json as secondary config | Pattern 6 | Config loading fails silently; test config loading in Wave 0 |
+
+---
+
+## Sources
+
+### Primary (HIGH confidence)
+
+- CONTEXT.md `02-CONTEXT.md` — locked decisions for Phase 2 (this session)
+- `01-02-SUMMARY.md`, `01-04-SUMMARY.md`, `01-05-SUMMARY.md` — Phase 1 actual implementation (verified codebase state)
+- `internal/config/config.go` — existing config struct to extend
+- `internal/api/router.go` — existing chi router to add route to
+- `go.mod` — confirmed go-openai not yet installed
+
+### Secondary (MEDIUM confidence)
+
+- `ARCHITECTURE.md`, `STACK.md` — project research documents (verified at research time)
+- CLAUDE.md stack patterns section — photo intake pattern, AI tier routing pattern
+
+### Tertiary (LOW/ASSUMED)
+
+- go-openai `ChatCompletionMessage.MultiContent` field name — training knowledge, verify post-install
+- oMLX `response_format` support status — not tested; marked ASSUMED
+- go-openai latest version number — marked ASSUMED
+
+---
+
+## Metadata
+
+**Confidence breakdown:**
+
+- Standard stack: HIGH — go-openai is the decided library; already in STACK.md; pattern for BaseURL swap is verified
+- Architecture (interface/mock pattern): HIGH — standard Go interface idiom, consistent with Phase 1 patterns
+- go-openai vision API field names: LOW — exact field names require post-install verification
+- oMLX JSON mode support: LOW — not tested against live oMLX
+
+**Research date:** 2026-04-10
+**Valid until:** 2026-05-10 (go-openai API is stable; oMLX is fast-moving — re-verify JSON mode if oMLX version changes)