homelabby/.planning/phases/02-ai-pipeline/02-RESEARCH.md


# Phase 2: AI Pipeline - Research
**Researched:** 2026-04-10
**Domain:** Go AI client interface, multipart photo intake, multimodal vision with Gemma 4 via oMLX, three-tier orchestrator, confidence-based quality gate wiring
**Confidence:** HIGH (core patterns from training knowledge, verified against codebase and stack decisions)
---
<user_constraints>
## User Constraints (from CONTEXT.md)
### Locked Decisions
- Single `go-openai` client with configurable BaseURL per tier
- Tier 1: oMLX at http://localhost:8000/v1 (Gemma 4 E4B default)
- Tier 2: OpenRouter at https://openrouter.ai/api/v1 (research agent)
- Tier 3: OpenRouter (Opus for Lab Advisor — deferred to Phase 6)
- Config JSON drives tier routing — no code changes to swap providers
- POST /api/intake accepts multipart/form-data with 1-3 photo files
- Photos encoded as base64 and sent to Gemma 4 vision endpoint
- AI extracts: serial number, model, manufacturer, specs, category, suggested tags
- Confidence score determines catalog_status: high → indexed, low → needs_research
- Config flag enables skip-review flow for high-confidence items (Quick Add mode)
- oMLX may not be installed on dev machine — use mock AI client for unit tests
- Integration tests skip gracefully when oMLX unreachable
- Expose `AIClient` interface so production uses oMLX, tests use mock
- AI config lives in ai_config.json (separate from main config.json)
- Intake handler should use write-ahead queue if NetBox unreachable
- SearXNG function calling deferred to Phase 7
### Claude's Discretion
All implementation details are at Claude's discretion. Use Phase 1 artifacts (NetBox client, quality gate, HW-ID) as building blocks.
### Deferred Ideas (OUT OF SCOPE)
- SearXNG function calling (Phase 7)
- Lab Advisor tier 3 (Phase 6)
- Natural language search (Phase 7)
- Actual Gemma 4 model tuning/fine-tuning
- React UI for intake (Phase 3)
</user_constraints>
---
<phase_requirements>
## Phase Requirements
| ID | Description | Research Support |
|----|-------------|------------------|
| AI-01 | oMLX installed on Mac Mini M4 with Gemma 4 model serving OpenAI-compatible API | oMLX setup guide + mock pattern for dev |
| AI-02 | User can upload 1-3 photos and AI extracts serial number, model, manufacturer, specs via multimodal vision | Multipart form handling + base64 vision message pattern |
| AI-03 | AI suggests category, tags, and location for each item | Structured JSON response from vision prompt |
| AI-04 | AI calls SearXNG via function calling to research product specs (STUB only this phase) | Stub interface only; real impl Phase 7 |
| AI-05 | Orchestrator reviews Tier 1 output for completeness and flags gaps as needs_research | Confidence extraction + quality gate transition |
| AI-06 | Tier 2 research agent (OpenRouter) automatically enriches items flagged needs_research | go-openai BaseURL swap pattern |
| AI-07 | Quick add mode skips review screen for items with high AI confidence | Config flag + threshold comparison |
| AI-08 | All AI tiers accessed via single OpenAI-compatible client with configurable base URLs | go-openai ClientConfig.BaseURL |
| AI-09 | Provider routing configured via JSON file — swap any tier without code changes | ai_config.json schema + factory pattern |
</phase_requirements>
---
## Summary
Phase 2 builds the AI backbone of HWLab: a Go interface hierarchy that decouples test-time mocks from production oMLX/OpenRouter calls, a multipart photo intake handler that encodes images as base64 vision messages, a structured-output extractor that parses Gemma 4 JSON responses into typed `IntakeResult` values, and a three-tier orchestrator that escalates to OpenRouter when Tier 1 confidence falls below threshold.
The key design challenge is keeping the `AIClient` interface minimal enough to mock cleanly while capturing the full vision + JSON-mode call pattern used by go-openai. The confidence score must be embedded in the model's structured output (not inferred post-hoc) because Gemma 4 / OpenAI-compatible APIs do not expose logprobs for vision tasks reliably.
The orchestrator plugs directly into Phase 1's `CatalogUpdater`, `AllocateNextHWID`, `PatchCustomFields`, and `SyncTags` — all four are stable and tested. The WAQ from Phase 1 (Plan 05) is already wired into main.go and is the fallback path when NetBox is unreachable during intake.
**Primary recommendation:** Build the `AIClient` interface and mock first, then the intake handler, then the orchestrator. Keep confidence scoring self-contained inside the AI package — do not leak `float64` confidence values into the service layer; instead expose a typed `CatalogStatus` decision from the orchestrator.
---
## Standard Stack
### Core (Phase 2 additions)
| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| github.com/sashabaranov/go-openai | v1.x | OpenAI-compatible HTTP client | Single client for oMLX + OpenRouter; BaseURL swap is the tier-routing mechanism; already recommended in STACK.md |
**Version verification:**
```bash
go get github.com/sashabaranov/go-openai@latest
# As of 2026-04 training knowledge: v1.36+ is current — verify before install
```
[ASSUMED: exact latest version; run `go list -m github.com/sashabaranov/go-openai@latest` (the Go equivalent of `npm view`) to confirm]
### Already in go.mod (no new dependencies needed)
| Package | Current Version | Used By Phase 2 |
|---------|-----------------|-----------------|
| github.com/go-chi/chi/v5 | v5.2.5 | POST /api/intake route |
| github.com/spf13/viper | v1.21.0 | ai_config.json loading |
| github.com/google/uuid | v1.6.0 | Intake job ID (already indirect) |
| github.com/redis/go-redis/v9 | v9.18.0 | WAQ fallback on NetBox failure |
### Installation
```bash
cd /home/mikkel/homelabby
go get github.com/sashabaranov/go-openai@latest
```
---
## Architecture Patterns
### Recommended Package Structure (Phase 2 additions)
```
internal/
├── ai/
│ ├── client.go # AIClient interface + TierClient concrete type
│ ├── mock.go # MockAIClient for unit tests
│ ├── orchestrator.go # Three-tier routing + escalation logic
│ ├── types.go # IntakeRequest, IntakeResult, ConfidenceLevel
│ └── prompts/
│ └── intake.go # Prompt templates for hardware analysis
├── api/
│ ├── handlers/
│ │ └── intake.go # POST /api/intake multipart handler (new)
│ └── router.go # Add intake route (modify existing)
└── config/
└── config.go # Add AIConfig fields (modify existing)
```
---
### Pattern 1: AIClient Interface + TierClient
**What:** A minimal Go interface that captures the one call shape Phase 2 needs. `TierClient` wraps `*openai.Client` from go-openai. `MockAIClient` implements the same interface deterministically.
**Why minimal interface:** The interface should expose the behavior, not the library. If the interface requires `*openai.ChatCompletionRequest`, tests must import go-openai. A domain-typed interface (`AnalyzePhotos`) keeps mocks simple.
```go
// Source: training knowledge; standard Go interface pattern [ASSUMED]
// internal/ai/client.go
package ai

import (
	"context"
	"net/http"
	"time"

	openai "github.com/sashabaranov/go-openai"
)

// AIClient is the single abstraction over any OpenAI-compatible inference backend.
// Production: TierClient wrapping sashabaranov/go-openai.
// Tests: MockAIClient with canned responses.
type AIClient interface {
	AnalyzePhotos(ctx context.Context, req IntakeRequest) (*IntakeResult, error)
}

// TierConfig holds provider configuration for one AI tier.
// The mapstructure tags let viper decode the snake_case keys in ai_config.json.
type TierConfig struct {
	BaseURL  string `json:"base_url" mapstructure:"base_url"`
	APIKey   string `json:"api_key" mapstructure:"api_key"`
	Model    string `json:"model" mapstructure:"model"`
	TimeoutS int    `json:"timeout_seconds" mapstructure:"timeout_seconds"`
}

// TierClient is the production AIClient backed by go-openai.
type TierClient struct {
	client *openai.Client
	model  string
}

func NewTierClient(cfg TierConfig) *TierClient {
	config := openai.DefaultConfig(cfg.APIKey)
	config.BaseURL = cfg.BaseURL
	// Apply the per-tier timeout; a TimeoutS of 0 means no timeout.
	config.HTTPClient = &http.Client{Timeout: time.Duration(cfg.TimeoutS) * time.Second}
	return &TierClient{
		client: openai.NewClientWithConfig(config),
		model:  cfg.Model,
	}
}
```
[VERIFIED: go-openai BaseURL override via `openai.DefaultConfig` + `config.BaseURL` — confirmed pattern from STACK.md and ARCHITECTURE.md]
---
### Pattern 2: Multipart Photo Upload → Base64 Vision Message
**What:** chi handler reads up to 3 files from multipart form, reads each into `[]byte`, encodes to base64 data URL, assembles a `ChatCompletionRequest` with `ImageURL` content parts.
**go-openai vision message shape:** [ASSUMED: standard pattern, consistent with OpenAI API]
```go
// internal/api/handlers/intake.go
// Source: go-openai vision pattern [ASSUMED: matches OpenAI API spec]
// Imports needed: encoding/base64, fmt, io, mime/multipart, net/http
func (h *IntakeHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	// Parse multipart; up to 32MB is held in memory, the rest spills to temp files
	if err := r.ParseMultipartForm(32 << 20); err != nil {
		http.Error(w, "bad multipart", http.StatusBadRequest)
		return
	}
	files := r.MultipartForm.File["photos"]
	if len(files) == 0 || len(files) > 3 {
		http.Error(w, "1-3 photos required", http.StatusBadRequest)
		return
	}
	photosB64 := make([]string, 0, len(files))
	for _, fh := range files {
		data, err := readUpload(fh)
		if err != nil {
			http.Error(w, "cannot read photo", http.StatusBadRequest)
			return
		}
		// DetectContentType inspects at most the first 512 bytes
		mime := http.DetectContentType(data)
		photosB64 = append(photosB64, fmt.Sprintf("data:%s;base64,%s",
			mime, base64.StdEncoding.EncodeToString(data)))
	}
	result, err := h.ai.AnalyzePhotos(r.Context(), ai.IntakeRequest{
		PhotosBase64: photosB64,
	})
	// ...
}

// readUpload reads one uploaded file fully and closes it before returning,
// so file handles are not held open until the handler exits.
func readUpload(fh *multipart.FileHeader) ([]byte, error) {
	f, err := fh.Open()
	if err != nil {
		return nil, err
	}
	defer f.Close()
	return io.ReadAll(f)
}
```
**go-openai vision content parts:** [ASSUMED]
```go
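// internal/ai/client.go: TierClient.AnalyzePhotos
// Additional imports needed: encoding/json, errors, fmt
func (c *TierClient) AnalyzePhotos(ctx context.Context, req IntakeRequest) (*IntakeResult, error) {
	// Build content parts: the text prompt first, then one part per image
	parts := []openai.ChatMessagePart{
		{
			Type: openai.ChatMessagePartTypeText,
			Text: buildIntakePrompt(),
		},
	}
	for _, b64 := range req.PhotosBase64 {
		parts = append(parts, openai.ChatMessagePart{
			Type: openai.ChatMessagePartTypeImageURL,
			ImageURL: &openai.ChatMessageImageURL{
				URL:    b64, // data:image/jpeg;base64,...
				Detail: openai.ImageURLDetailAuto,
			},
		})
	}
	resp, err := c.client.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
		Model: c.model,
		Messages: []openai.ChatCompletionMessage{
			{Role: openai.ChatMessageRoleUser, MultiContent: parts},
		},
		// ResponseFormat for JSON mode: see Pattern 3
	})
	if err != nil {
		return nil, fmt.Errorf("chat completion: %w", err)
	}
	if len(resp.Choices) == 0 {
		return nil, errors.New("chat completion returned no choices")
	}
	// Parse the message content as JSON; a parse failure is reported as
	// zero confidence so the orchestrator escalates
	var result IntakeResult
	if err := json.Unmarshal([]byte(resp.Choices[0].Message.Content), &result); err != nil {
		return &IntakeResult{Confidence: 0.0}, nil
	}
	return &result, nil
}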
```
[ASSUMED: `MultiContent` field name in go-openai ChatCompletionMessage — verify against actual go-openai source after install. Some versions use `Content` string OR `MultiContent []ChatMessagePart`]
**CRITICAL NOTE:** Verify the exact `ChatCompletionMessage` field for multi-content vision after `go get`. The field has been `MultiContent` in v1.20+ but naming may differ. Check with:
```bash
go doc github.com/sashabaranov/go-openai ChatCompletionMessage
```
---
### Pattern 3: Structured JSON Output from Gemma 4
**What:** Instruct the model to return a specific JSON schema via prompt engineering. Use `ResponseFormat` with `JSONObject` type when the endpoint supports it (oMLX/Gemma 4 may not support strict JSON schema mode — fall back to prompt-only).
**IntakeResult schema:**
```go
// internal/ai/types.go
package ai
// IntakeResult is the structured output from any AI tier's photo analysis.
// The model is instructed to return this JSON shape verbatim.
type IntakeResult struct {
SerialNumber string `json:"serial_number"` // empty string if not visible
Model string `json:"model"`
Manufacturer string `json:"manufacturer"`
Category string `json:"category"` // e.g. "networking", "cable", "compute"
Specs map[string]string `json:"specs"` // key-value hardware specs
SuggestedTags []string `json:"suggested_tags"`
AINotes string `json:"ai_notes"` // free-form observations
Confidence float64 `json:"confidence"` // 0.0-1.0, self-reported by model
ConfidenceNote string `json:"confidence_note"` // why confidence is low (if < threshold)
}
```
**Prompt pattern for JSON output:**
```go
// internal/ai/prompts/intake.go
func buildIntakePrompt() string {
return `Analyze the hardware in the provided photo(s) and return ONLY valid JSON matching this schema:
{
"serial_number": "<string or empty>",
"model": "<string>",
"manufacturer": "<string>",
"category": "<one of: compute, networking, storage, cable, peripheral, component, unknown>",
"specs": {"<key>": "<value>"},
"suggested_tags": ["<tag1>", "<tag2>"],
"ai_notes": "<observations>",
"confidence": <float 0.0-1.0>,
"confidence_note": "<reason if confidence < 0.75>"
}
Return ONLY the JSON object. No markdown, no explanation.`
}
```
**JSON mode ResponseFormat (use if supported by endpoint):** [ASSUMED]
```go
// Only set if oMLX / OpenRouter model supports JSON mode
ResponseFormat: &openai.ChatCompletionResponseFormat{
Type: openai.ChatCompletionResponseFormatTypeJSONObject,
},
```
[ASSUMED: Gemma 4 via oMLX may not support `response_format: json_object`. Implement a prompt-only fallback and run `json.Unmarshal` on the raw response string. If the JSON parse fails, treat the result as low-confidence and escalate.]
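With prompt-only JSON, models sometimes wrap the object in markdown fences or a sentence of prose despite the instruction. The sketch below shows one tolerant way to parse such output, assuming it is acceptable to take everything between the first `{` and the last `}`. The names `intakeResult`, `extractJSON`, and `parseIntake` are hypothetical and only mirror a subset of `IntakeResult`.

```go
package main

import (
	"encoding/json"
	"errors"
	"strings"
)

// intakeResult is a hypothetical, trimmed-down stand-in for IntakeResult.
type intakeResult struct {
	Model      string  `json:"model"`
	Confidence float64 `json:"confidence"`
}

// extractJSON returns the substring from the first '{' to the last '}',
// discarding any markdown fence or prose the model put around the object.
func extractJSON(raw string) (string, error) {
	start := strings.Index(raw, "{")
	end := strings.LastIndex(raw, "}")
	if start < 0 || end < start {
		return "", errors.New("no JSON object found in model output")
	}
	return raw[start : end+1], nil
}

// parseIntake never fails: any parse problem yields a zero-confidence
// result, which the orchestrator treats as a reason to escalate.
func parseIntake(raw string) intakeResult {
	body, err := extractJSON(raw)
	if err != nil {
		return intakeResult{}
	}
	var res intakeResult
	if err := json.Unmarshal([]byte(body), &res); err != nil {
		return intakeResult{}
	}
	// A self-reported confidence outside 0.0-1.0 is not trustworthy.
	if res.Confidence < 0 || res.Confidence > 1 {
		res.Confidence = 0
	}
	return res
}
```

Resetting an out-of-range confidence to zero is a design choice of this sketch, not something the source prescribes; it keeps a malformed score from passing the threshold by accident.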
---
### Pattern 4: Three-Tier Orchestrator
**What:** Orchestrator holds two `AIClient` instances (tier1, tier2). For each intake request: call tier1, parse result, check confidence. If confidence < threshold OR parse failed, call tier2 with same request. Map confidence to `CatalogStatus` for quality gate.
```go
// internal/ai/orchestrator.go
package ai

import "context"

// Also import the project's internal/inventory package for CatalogStatus
// (module path omitted here).

type Orchestrator struct {
	tier1     AIClient
	tier2     AIClient
	threshold float64 // from config, default 0.75
}

func NewOrchestrator(tier1, tier2 AIClient, threshold float64) *Orchestrator {
	return &Orchestrator{tier1: tier1, tier2: tier2, threshold: threshold}
}

// Analyze runs tier1, escalates to tier2 if needed, returns result + catalog decision.
func (o *Orchestrator) Analyze(ctx context.Context, req IntakeRequest) (*IntakeResult, inventory.CatalogStatus, error) {
	result, err := o.tier1.AnalyzePhotos(ctx, req)
	if err != nil || result == nil || result.Confidence < o.threshold {
		// Escalate to tier2; keep the tier1 result if tier2 fails
		if result2, err2 := o.tier2.AnalyzePhotos(ctx, req); err2 == nil && result2 != nil {
			result = result2
		}
	}
	// Both tiers failed: return a zero result rather than nil so callers
	// can dereference it safely
	if result == nil {
		result = &IntakeResult{}
	}
	status := inventory.StatusIndexed
	if result.Confidence < o.threshold {
		status = inventory.StatusNeedsResearch
	}
	return result, status, nil
}
```
---
### Pattern 5: MockAIClient for Unit Tests
**What:** A deterministic mock that returns canned `IntakeResult` values. Implements `AIClient` interface. Configurable to return high-confidence or low-confidence responses, and optionally errors.
```go
// internal/ai/mock.go
package ai
import "context"
// MockAIClient is a test double for AIClient.
// Configure FixedResult and/or FixedError before use.
type MockAIClient struct {
FixedResult *IntakeResult
FixedError error
Calls []IntakeRequest // record of calls for assertions
}
func (m *MockAIClient) AnalyzePhotos(_ context.Context, req IntakeRequest) (*IntakeResult, error) {
m.Calls = append(m.Calls, req)
return m.FixedResult, m.FixedError
}
// HighConfidenceResult returns a fixture IntakeResult with confidence 0.95.
func HighConfidenceResult() *IntakeResult {
return &IntakeResult{
Model: "Raspberry Pi 4 Model B",
Manufacturer: "Raspberry Pi Foundation",
Category: "compute",
Specs: map[string]string{"ram": "4GB", "cpu": "BCM2711"},
SuggestedTags: []string{"raspberry-pi", "compute", "arm"},
Confidence: 0.95,
}
}
// LowConfidenceResult returns a fixture with confidence 0.40 (below threshold).
func LowConfidenceResult() *IntakeResult {
return &IntakeResult{
Model: "Unknown Device",
Category: "unknown",
Confidence: 0.40,
ConfidenceNote: "Cannot identify markings clearly",
}
}
```
---
### Pattern 6: AI Config Schema (ai_config.json)
**What:** Separate JSON config file for AI provider settings. Loaded by viper alongside main config.json. Keeps provider credentials out of the main config.
```json
{
"tier1": {
"base_url": "http://localhost:8000/v1",
"api_key": "local",
"model": "gemma-4-e4b",
"timeout_seconds": 30
},
"tier2": {
"base_url": "https://openrouter.ai/api/v1",
"api_key": "sk-or-...",
"model": "google/gemma-2-27b-it",
"timeout_seconds": 60
},
"confidence_threshold": 0.75,
"quick_add_enabled": false,
"quick_add_threshold": 0.90
}
```
**Config struct extension** (extend existing `internal/config/config.go`):
```go
type AIConfig struct {
Tier1 TierConfig `mapstructure:"tier1"`
Tier2 TierConfig `mapstructure:"tier2"`
ConfidenceThreshold float64 `mapstructure:"confidence_threshold"`
QuickAddEnabled bool `mapstructure:"quick_add_enabled"`
QuickAddThreshold float64 `mapstructure:"quick_add_threshold"`
}
// Add to Config struct:
AI AIConfig `mapstructure:"ai"`
```
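**Viper loads ai_config.json** in one of two ways: point the existing viper instance at the file with `v.SetConfigFile("ai_config.json")` and call `v.MergeInConfig()` after the main config has been read, or embed the AI fields directly in config.json under an `"ai"` key. Because `AIConfig` is mapped with `mapstructure:"ai"`, a standalone ai_config.json must either nest its keys under `"ai"` or be decoded on its own (for example with a separate viper instance and `Unmarshal(&cfg.AI)`); the sample above uses top-level keys, so a plain merge would not populate `Config.AI`. `TierConfig` also needs `mapstructure` tags for its snake_case keys (`base_url`, `api_key`, `timeout_seconds`), since viper decodes via mapstructure rather than `json` tags. Simplest: a single config.json with an `"ai"` section, with ai_config.json merged on top as an override file.
[ASSUMED: viper MergeInConfig pattern for a secondary config file; standard viper v1 capability]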
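As a dependency-free illustration of the schema above, the following sketch parses the ai_config.json shape with `encoding/json` instead of viper (a deliberate swap to keep the example self-contained), validates the thresholds, and applies the Quick Add rule from AI-07. The names `aiConfig`, `tierConfig`, `parseAIConfig`, and `skipReview` are hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// tierConfig and aiConfig are hypothetical mirrors of TierConfig and AIConfig.
type tierConfig struct {
	BaseURL  string `json:"base_url"`
	APIKey   string `json:"api_key"`
	Model    string `json:"model"`
	TimeoutS int    `json:"timeout_seconds"`
}

type aiConfig struct {
	Tier1               tierConfig `json:"tier1"`
	Tier2               tierConfig `json:"tier2"`
	ConfidenceThreshold float64    `json:"confidence_threshold"`
	QuickAddEnabled     bool       `json:"quick_add_enabled"`
	QuickAddThreshold   float64    `json:"quick_add_threshold"`
}

// parseAIConfig decodes the file contents and rejects obviously broken values.
func parseAIConfig(raw []byte) (aiConfig, error) {
	var cfg aiConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return aiConfig{}, fmt.Errorf("parse ai config: %w", err)
	}
	if cfg.Tier1.BaseURL == "" || cfg.Tier2.BaseURL == "" {
		return aiConfig{}, errors.New("tier1.base_url and tier2.base_url are required")
	}
	for _, t := range []float64{cfg.ConfidenceThreshold, cfg.QuickAddThreshold} {
		if t < 0 || t > 1 {
			return aiConfig{}, fmt.Errorf("threshold %v outside 0.0-1.0", t)
		}
	}
	return cfg, nil
}

// skipReview applies the Quick Add rule: the flag must be enabled and the
// confidence must reach the quick-add threshold.
func (c aiConfig) skipReview(confidence float64) bool {
	return c.QuickAddEnabled && confidence >= c.QuickAddThreshold
}
```

Failing fast on a missing `base_url` is one way to avoid the silent config-loading failure that assumption A6 warns about.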
---
### Pattern 7: Intake Handler Wiring to Phase 1 Components
**What:** The intake handler coordinates: orchestrator (AI analysis) → `AllocateNextHWID` (ID) → `BuildFullCustomFieldsPatch` (fields) → `NetboxClient.CreateDevice` or `PatchCustomFields` → `SyncTags` → `CatalogUpdater.UpdateCatalogStatus` → WAQ fallback.
**Existing Phase 1 APIs the handler calls:**
| Phase 1 Function | Package | Handler Usage |
|-----------------|---------|---------------|
| `AllocateNextHWID(ctx)` | `internal/netbox` | Assign HW-XXXXX ID to new record |
| `BuildFullCustomFieldsPatch(cf)` | `internal/netbox` | Populate custom fields from IntakeResult |
| `PatchCustomFields(ctx, id, patch)` | `internal/netbox` | Write AI data to NetBox device |
| `SyncTags(ctx, tags)` | `internal/netbox` | Create and assign AI-suggested tags |
| `UpdateCatalogStatus(ctx, id, current, next)` | `internal/inventory` | Set indexed or needs_research |
| `waq.Enqueue(ctx, op)` | `internal/queue` | Buffer NetBox write if unreachable |
**Note:** Phase 1's `client.go` has `ListDevices` and `GetDevice` but no `CreateDevice`. The intake handler will need `CreateDevice`, which is a new method on `internal/netbox.Client`. Plan must include this task.
---
### Pattern 8: SearXNG Stub (AI-04)
**What:** AI-04 is listed as "Phase 7" in REQUIREMENTS.md but the CONTEXT.md says "stub only" this phase. Implement a `ResearchClient` interface with a `Search(ctx, query)` method, and a `NoOpResearchClient` that returns empty results. This satisfies the interface requirement without Phase 7 scope creep.
```go
// internal/ai/research.go (stub)
type ResearchClient interface {
Search(ctx context.Context, query string) ([]SearchResult, error)
}
type NoOpResearchClient struct{}
func (n *NoOpResearchClient) Search(_ context.Context, _ string) ([]SearchResult, error) {
return nil, nil // Phase 7 will provide real implementation
}
```
---
### Anti-Patterns to Avoid
- **Don't extract confidence from logprobs:** Gemma 4 vision via oMLX does not expose per-token logprobs reliably. Embed `confidence: float` in the JSON output schema and instruct the model to self-report it. [ASSUMED: oMLX logprobs availability is uncertain]
- **Don't store photos:** Per CLAUDE.md stack patterns: "Store the original photo in a local temp directory only until the NetBox record is created; do not persist photos in HWLab itself." Photos are transient.
- **Don't call NetBox from the AI package:** `internal/ai` should not import `internal/netbox`. The intake handler (service layer) orchestrates both. Keep the AI package focused on inference only.
- **Don't share a single go-openai client across tiers:** Each tier gets its own `*openai.Client` instance with its own `BaseURL` and `APIKey`. Mutating a shared client's config is a race condition.
- **Don't block the HTTP response on AI inference:** AI calls take 2-30 seconds. The intake handler should return a job ID immediately and push the result via SSE. (Phase 3 will add SSE. For Phase 2, a synchronous response is acceptable since there's no UI yet, but design the handler to support async promotion.)
---
## Don't Hand-Roll
| Problem | Don't Build | Use Instead | Why |
|---------|-------------|-------------|-----|
| OpenAI-compatible HTTP client | Custom HTTP calls to oMLX | `sashabaranov/go-openai` | Handles auth headers, retry, streaming, vision content parts |
| Base64 encoding | Custom encoder | `encoding/base64` stdlib | Already in Go stdlib |
| MIME type detection | File extension parsing | `net/http.DetectContentType` | Magic bytes detection from stdlib |
| JSON structured output parsing | Regex extraction | `encoding/json.Unmarshal` | Model output is well-formed JSON when prompted correctly |
| Multipart form parsing | Manual `--boundary` parsing | `r.ParseMultipartForm()` | stdlib net/http handles multipart |
---
## Common Pitfalls
### Pitfall 1: go-openai Vision MultiContent Field Name
**What goes wrong:** Code compiles but `ChatCompletionMessage.MultiContent` field doesn't exist or is named differently in the installed version.
**Why it happens:** go-openai API evolved; older versions used a single `Content string`, newer versions added `MultiContent []ChatMessagePart` for vision. The exact field name depends on the version.
**How to avoid:** After `go get github.com/sashabaranov/go-openai@latest`, run `go doc github.com/sashabaranov/go-openai ChatCompletionMessage` and verify the vision field name before writing handler code.
**Warning signs:** Compiler error "unknown field MultiContent" or images silently not being sent (text-only response from model).
---
### Pitfall 2: oMLX JSON Mode Not Supported
**What goes wrong:** Setting `ResponseFormat: {Type: "json_object"}` causes a 400 error from oMLX because Gemma 4 E4B via oMLX may not support the `response_format` parameter.
**Why it happens:** The `response_format` JSON schema enforcement is an OpenAI-specific feature not universally implemented across all OpenAI-compatible servers.
**How to avoid:** Implement JSON parsing with a fallback: try `json.Unmarshal(content)` on the raw string. If parse fails, treat result as zero-confidence and escalate to tier2. Do not set `ResponseFormat` unless tested against live oMLX.
**Warning signs:** 400 Bad Request from oMLX at inference time with "unsupported parameter" in body.
---
### Pitfall 3: Data URL MIME Type vs go-openai Image URL
**What goes wrong:** Some OpenAI-compatible servers reject `data:image/jpeg;base64,...` data URLs in vision requests and require a `https://` URL instead.
**Why it happens:** The OpenAI spec allows data URLs in `image_url.url` but not all providers implement this.
**How to avoid:** oMLX (local, Gemma 4) should accept data URLs since it's processing locally. Test with a minimal integration test against live oMLX before building the full intake flow. Keep the base64 path for oMLX (tier1) and note that tier2 (OpenRouter) may require a different approach if it doesn't accept data URLs.
**Warning signs:** 400 or inference-time error from oMLX with "invalid image_url".
---
### Pitfall 4: CreateDevice Not in Phase 1 NetBox Client
**What goes wrong:** Intake handler tries to call `netboxClient.CreateDevice(...)` but that method was not built in Phase 1 (only ListDevices, GetDevice, PatchCustomFields were built).
**Why it happens:** Phase 1 was scoped to read/patch existing devices for the quality gate workflow. Intake requires creating new records.
**How to avoid:** Plan must include a Wave 0 task to add `CreateDevice(ctx, name, assetTag) (int, error)` to `internal/netbox/client.go` before the intake handler can be completed.
**go-netbox v4 create pattern:** [ASSUMED: matches observed PATCH pattern from 01-02-SUMMARY]
```go
req := nb.WritableDeviceWithConfigContextRequest{}
req.SetName(name)
req.SetAssetTag(assetTag)
// DeviceRole and DeviceType are required by NetBox — plan must handle defaults
resp, _, err := c.api.DcimAPI.DcimDevicesCreate(ctx).
WritableDeviceWithConfigContextRequest(req).Execute()
```
**Note:** NetBox `DcimDevicesCreate` requires `device_role` and `device_type` to be set (they are non-nullable FK fields in NetBox v4). The intake handler must either pick sensible defaults or require them to exist in NetBox as pre-provisioned "Unknown" role/type records.
---
### Pitfall 5: Confidence Self-Reporting Calibration
**What goes wrong:** Model returns `"confidence": 0.95` for every item regardless of actual uncertainty, making the threshold useless.
**Why it happens:** LLMs tend to be overconfident in self-reporting. Without explicit calibration prompting, models bias toward high confidence.
**How to avoid:** Add calibration guidance to the intake prompt: "Return confidence < 0.75 if: serial number not visible, item is partially obscured, or manufacturer/model cannot be determined from visual inspection alone." This nudges the model toward honest low-confidence responses for ambiguous photos.
---
### Pitfall 6: WAQ Integration — PendingOp Payload Schema
**What goes wrong:** Intake handler enqueues a `PendingOp` with a payload, but Phase 1's `NoOpHandler` (the WAQ worker) is still installed, so it drains the queue silently. Phase 2 must replace `NoOpHandler` with a real NetBox retry handler.
**Why it happens:** Phase 1 explicitly left `NoOpHandler` as a stub: "Phase 2 will replace this with a real retry handler."
**How to avoid:** Phase 2 plan must include a task to implement the real WAQ handler that retries failed NetBox `CreateDevice` / `PatchCustomFields` calls. Define `PendingOp.OpType` constants (e.g., `"netbox.create_device"`, `"netbox.patch_custom_fields"`) and the payload structs for each.
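One possible shape for those constants and payloads is sketched below. Everything here is an assumption for illustration: `pendingOp` stands in for Phase 1's `PendingOp` (whose real fields are not shown in this document), the payload fields are guesses at what a retry needs, and `dispatch` only describes the op where a real handler would call NetBox.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Op type constants, using the example names suggested above.
const (
	opCreateDevice      = "netbox.create_device"
	opPatchCustomFields = "netbox.patch_custom_fields"
)

// pendingOp is a hypothetical stand-in for Phase 1's PendingOp.
type pendingOp struct {
	OpType  string          `json:"op_type"`
	Payload json.RawMessage `json:"payload"`
}

// Hypothetical payloads, one per op type.
type createDevicePayload struct {
	Name     string `json:"name"`
	AssetTag string `json:"asset_tag"`
}

type patchCustomFieldsPayload struct {
	DeviceID int            `json:"device_id"`
	Fields   map[string]any `json:"fields"`
}

// newOp serializes a typed payload into a queueable op.
func newOp(opType string, payload any) (pendingOp, error) {
	raw, err := json.Marshal(payload)
	if err != nil {
		return pendingOp{}, fmt.Errorf("marshal %s payload: %w", opType, err)
	}
	return pendingOp{OpType: opType, Payload: raw}, nil
}

// dispatch decodes the payload by op type. A real handler would retry the
// NetBox call here; this sketch returns a description instead.
func dispatch(op pendingOp) (string, error) {
	switch op.OpType {
	case opCreateDevice:
		var p createDevicePayload
		if err := json.Unmarshal(op.Payload, &p); err != nil {
			return "", fmt.Errorf("decode %s: %w", op.OpType, err)
		}
		return fmt.Sprintf("create device %s (asset tag %s)", p.Name, p.AssetTag), nil
	case opPatchCustomFields:
		var p patchCustomFieldsPayload
		if err := json.Unmarshal(op.Payload, &p); err != nil {
			return "", fmt.Errorf("decode %s: %w", op.OpType, err)
		}
		return fmt.Sprintf("patch %d custom fields on device %d", len(p.Fields), p.DeviceID), nil
	default:
		return "", fmt.Errorf("unknown op type %q", op.OpType)
	}
}
```

Returning an error for unknown op types, rather than dropping them, avoids repeating the silent-drain behavior of `NoOpHandler`.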
---
## Code Examples
### go-openai Client Configuration for oMLX
```go
// Source: go-openai README pattern, confirmed in STACK.md [ASSUMED version specifics]
import openai "github.com/sashabaranov/go-openai"
cfg := openai.DefaultConfig("local") // API key "local" for oMLX (no auth)
cfg.BaseURL = "http://localhost:8000/v1"
client := openai.NewClientWithConfig(cfg)
```
### go-openai Client Configuration for OpenRouter
```go
cfg := openai.DefaultConfig("sk-or-your-key-here")
cfg.BaseURL = "https://openrouter.ai/api/v1"
client := openai.NewClientWithConfig(cfg)
```
### Multipart File Reading in chi Handler
```go
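// Source: Go stdlib net/http [VERIFIED: stdlib pattern]
if err := r.ParseMultipartForm(32 << 20); err != nil { // 32MB max memory
	http.Error(w, "bad multipart", http.StatusBadRequest)
	return
}
var dataURLs []string
for _, fh := range r.MultipartForm.File["photos"] {
	f, err := fh.Open()
	if err != nil {
		http.Error(w, "cannot open photo", http.StatusBadRequest)
		return
	}
	data, err := io.ReadAll(f)
	f.Close() // close now; a defer inside the loop would hold every file open until return
	if err != nil {
		http.Error(w, "cannot read photo", http.StatusBadRequest)
		return
	}
	mime := http.DetectContentType(data) // inspects at most the first 512 bytes
	b64 := base64.StdEncoding.EncodeToString(data)
	dataURLs = append(dataURLs, fmt.Sprintf("data:%s;base64,%s", mime, b64))
}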
```
### JSON Parse with Fallback
```go
// Source: Go stdlib encoding/json [VERIFIED: stdlib pattern]
var result ai.IntakeResult
content := resp.Choices[0].Message.Content
if err := json.Unmarshal([]byte(content), &result); err != nil {
// Model returned non-JSON — treat as low confidence, escalate
return &ai.IntakeResult{Confidence: 0.0}, nil
}
```
### Integration Test Skip Guard (consistent with Phase 1 pattern)
```go
// Source: Phase 1 established pattern (01-02-SUMMARY.md) [VERIFIED: codebase]
func TestAnalyzePhotosLive(t *testing.T) {
endpoint := os.Getenv("HWLAB_OMLX_ENDPOINT")
if endpoint == "" {
t.Skip("HWLAB_OMLX_ENDPOINT not set — skipping live oMLX test")
}
// ...
}
```
---
## Validation Architecture
### Test Framework
| Property | Value |
|----------|-------|
| Framework | Go testing stdlib (`go test ./...`) |
| Config file | none; test flags via env vars |
| Quick run command | `go test ./internal/ai/... -run "^Test[^L]" -timeout 30s` |
| Full suite command | `go test ./...` |
### Phase Requirements → Test Map
| Req ID | Behavior | Test Type | Automated Command | File Exists? |
|--------|----------|-----------|-------------------|-------------|
| AI-02 | Photo upload multipart parsing | unit | `go test ./internal/api/handlers/... -run TestIntakeHandler` | Wave 0 |
| AI-02 | Base64 encoding of JPEG | unit | `go test ./internal/ai/... -run TestEncodePhoto` | Wave 0 |
| AI-03 | JSON parse of structured output | unit | `go test ./internal/ai/... -run TestParseIntakeResult` | Wave 0 |
| AI-05 | Confidence below threshold → needs_research | unit | `go test ./internal/ai/... -run TestOrchestratorEscalation` | Wave 0 |
| AI-05 | Confidence above threshold → indexed | unit | `go test ./internal/ai/... -run TestOrchestratorHighConf` | Wave 0 |
| AI-06 | Tier 2 called on tier 1 failure | unit | `go test ./internal/ai/... -run TestOrchestratorTier2Fallback` | Wave 0 |
| AI-07 | Quick add flag honors threshold | unit | `go test ./internal/ai/... -run TestQuickAddMode` | Wave 0 |
| AI-08 | TierClient uses configured BaseURL | unit | `go test ./internal/ai/... -run TestTierClientConfig` | Wave 0 |
| AI-09 | ai_config.json loaded via viper | unit | `go test ./internal/config/... -run TestAIConfig` | Wave 0 |
| AI-01 | oMLX live inference smoke test | integration | `go test ./internal/ai/... -run TestAnalyzePhotosLive` (skip if env unset) | Wave 0 |
### Sampling Rate
- **Per task commit:** `go test ./internal/ai/... ./internal/api/handlers/... -timeout 30s`
- **Per wave merge:** `go test ./...`
- **Phase gate:** Full suite green before `/gsd-verify-work`
### Wave 0 Gaps
- [ ] `internal/ai/client_test.go` covers AI-08, AI-09 (TierClient config)
- [ ] `internal/ai/orchestrator_test.go` covers AI-05, AI-06, AI-07
- [ ] `internal/ai/types_test.go` covers AI-03 (JSON parse)
- [ ] `internal/api/handlers/intake_test.go` covers AI-02
---
## Security Domain
### Applicable ASVS Categories
| ASVS Category | Applies | Standard Control |
|---------------|---------|-----------------|
| V2 Authentication | no | No auth in solo homelab tool |
| V3 Session Management | no | Stateless REST |
| V4 Access Control | no | Solo operator, no roles |
| V5 Input Validation | yes | Validate photo count (1-3), file size cap, MIME type check |
| V6 Cryptography | no | API keys in config, not in code |
### Known Threat Patterns
| Pattern | STRIDE | Standard Mitigation |
|---------|--------|---------------------|
| Oversized photo upload (DoS) | Denial of Service | `ParseMultipartForm(32 << 20)` caps memory; add explicit per-file size check (e.g., 10MB/photo) |
| AI prompt injection via filename | Tampering | Do not include original filename in AI prompt; use only image bytes |
| API key leakage in logs | Info Disclosure | Never log `TierConfig.APIKey`; use `***` redaction in any debug output |
| Malformed JSON from model | Tampering | Always `json.Unmarshal` into typed struct; ignore extra fields; treat parse failure as low confidence |
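The V5 input checks and the oversized-upload mitigation can be combined in one validation step per photo. The sketch below is one way to do that, assuming a per-file cap passed in by the caller (the table suggests 10MB) and an allowlist of image types; the allowlist contents and the name `validatePhoto` are assumptions, not decisions recorded in this document.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// allowedMIME is an assumed allowlist; adjust to what the vision model accepts.
var allowedMIME = map[string]bool{
	"image/jpeg": true,
	"image/png":  true,
	"image/webp": true,
}

// validatePhoto enforces a per-file size cap and a MIME allowlist based on
// magic bytes. It returns the detected MIME type for building the data URL.
func validatePhoto(data []byte, maxBytes int) (string, error) {
	if len(data) == 0 {
		return "", errors.New("empty photo")
	}
	if len(data) > maxBytes {
		return "", fmt.Errorf("photo is %d bytes, limit is %d", len(data), maxBytes)
	}
	// DetectContentType looks at no more than the first 512 bytes and
	// ignores the client-supplied filename and Content-Type header.
	mime := http.DetectContentType(data)
	if !allowedMIME[mime] {
		return "", fmt.Errorf("unsupported content type %q", mime)
	}
	return mime, nil
}
```

A handler would call it once per uploaded file, for example `validatePhoto(data, 10<<20)`, and answer 400 on any error. Because only the image bytes are inspected, the original filename never reaches the prompt.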
---
## Environment Availability
| Dependency | Required By | Available | Version | Fallback |
|------------|------------|-----------|---------|----------|
| oMLX on localhost:8000 | AI-01, Tier 1 inference | Unknown (dev machine) | | MockAIClient for unit tests; integration tests skip with env guard |
| OpenRouter API key | AI-06, Tier 2 | Unknown | | Integration tests skip; tier2 returns error, orchestrator falls back to needs_research |
| DragonFlyDB (10.5.0.10) | WAQ fallback | VERIFIED reachable (from 01-05-SUMMARY) | | WAQ init is non-fatal; see 01-05 pattern |
| NetBox (10.5.0.130:8000) | CreateDevice, PatchCustomFields | Available (integration tests skip on placeholder token) | | WAQ enqueues ops; real token needed for integration tests |
**Missing dependencies with no fallback:**
- None: all dependencies have mock/skip fallbacks for unit tests.
**Missing dependencies with fallback:**
- oMLX: MockAIClient covers unit tests; integration test skips with `HWLAB_OMLX_ENDPOINT` guard.
- OpenRouter key: Same skip guard pattern.
---
## Open Questions
1. **NetBox device_role and device_type for CreateDevice**
- What we know: NetBox v4 requires both to be non-null FKs on device creation
- What's unclear: Should intake auto-create "Unknown" role/type records if absent, or require them pre-provisioned?
- Recommendation: Phase 1 (Plan 03, provision.go) may have already provisioned these. Check `internal/netbox/provision.go` before planning the CreateDevice task.
2. **Gemma 4 E4B model ID string in oMLX**
- What we know: CONTEXT.md says `model: "gemma-4-e4b"` as default; oMLX uses the model filename/ID
- What's unclear: The exact model ID string oMLX uses for Gemma 4 E4B (may be `mlx-community/gemma-4-e4b` or similar)
- Recommendation: Leave as a config value; user sets the correct model ID once oMLX is installed. Default to `"gemma-4-e4b"` in ai_config.json with a comment.
3. **Synchronous vs async intake response**
- What we know: AI inference takes 2-30 seconds; Phase 3 adds SSE; no UI in Phase 2
- What's unclear: Should Phase 2 implement async job IDs now (for Phase 3 to build on) or keep synchronous for simplicity?
- Recommendation: Implement synchronous for Phase 2 (no UI yet); design the handler to accept a `?async=true` query param stub that returns "not yet implemented". This reserves the API surface for Phase 3 without blocking Phase 2.
---
## Assumptions Log
| # | Claim | Section | Risk if Wrong |
|---|-------|---------|---------------|
| A1 | go-openai vision content uses `MultiContent []ChatMessagePart` field on `ChatCompletionMessage` | Pattern 2 | Compile error; verify with `go doc` after install |
| A2 | oMLX supports data URL base64 images in vision requests | Pattern 2 | 400 error at inference time; may need to write image to temp file and use URL instead |
| A3 | oMLX may not support `response_format: json_object` | Pattern 3 | Must use prompt-only JSON mode; 400 if ResponseFormat is set |
| A4 | go-openai latest version is v1.36+ | Standard Stack | Run `go get` to verify; version is only needed to confirm stability |
| A5 | Gemma 4 E4B self-reports honest confidence scores with calibration prompting | Pitfall 5 | Threshold becomes useless if model is always overconfident; may need threshold tuning |
| A6 | viper `MergeInConfig` can load ai_config.json as secondary config | Pattern 6 | Config loading fails silently; test config loading in Wave 0 |
---
## Sources
### Primary (HIGH confidence)
- CONTEXT.md (`02-CONTEXT.md`): locked decisions for Phase 2 (this session)
- `01-02-SUMMARY.md`, `01-04-SUMMARY.md`, `01-05-SUMMARY.md`: Phase 1 actual implementation (verified codebase state)
- `internal/config/config.go`: existing config struct to extend
- `internal/api/router.go`: existing chi router to add route to
- `go.mod`: confirmed go-openai not yet installed
### Secondary (MEDIUM confidence)
- `ARCHITECTURE.md`, `STACK.md`: project research documents (verified at research time)
- CLAUDE.md stack patterns section: photo intake pattern, AI tier routing pattern
### Tertiary (LOW/ASSUMED)
- go-openai `ChatCompletionMessage.MultiContent` field name: training knowledge, verify post-install
- oMLX `response_format` support status: not tested; marked ASSUMED
- go-openai latest version number: marked ASSUMED
---
## Metadata
**Confidence breakdown:**
- Standard stack: HIGH. go-openai is the decided library; already in STACK.md; pattern for BaseURL swap is verified
- Architecture (interface/mock pattern): HIGH. Standard Go interface idiom, consistent with Phase 1 patterns
- go-openai vision API field names: LOW. Exact field names require post-install verification
- oMLX JSON mode support: LOW. Not tested against live oMLX
**Research date:** 2026-04-10
**Valid until:** 2026-05-10 (go-openai API is stable; oMLX is fast-moving, so re-verify JSON mode if the oMLX version changes)