Mikkel Georgsen 6460b27bfc docs(02): research AI pipeline phase — go-openai vision, mock interface, orchestrator patterns

2026-04-10 05:32:17 +00:00

Phase 2: AI Pipeline - Research

Researched: 2026-04-10
Domain: Go AI client interface, multipart photo intake, multimodal vision with Gemma 4 via oMLX, three-tier orchestrator, confidence-based quality gate wiring
Confidence: HIGH (core patterns from training knowledge, verified against codebase and stack decisions)

<user_constraints>

User Constraints (from CONTEXT.md)

Locked Decisions

Single go-openai client with configurable BaseURL per tier
Tier 1: oMLX at http://localhost:8000/v1 (Gemma 4 E4B default)
Tier 2: OpenRouter at https://openrouter.ai/api/v1 (research agent)
Tier 3: OpenRouter (Opus for Lab Advisor — deferred to Phase 6)
Config JSON drives tier routing — no code changes to swap providers
POST /api/intake accepts multipart/form-data with 1-3 photo files
Photos encoded as base64 and sent to Gemma 4 vision endpoint
AI extracts: serial number, model, manufacturer, specs, category, suggested tags
Confidence score determines catalog_status: high → indexed, low → needs_research
Config flag enables skip-review flow for high-confidence items (Quick Add mode)
oMLX may not be installed on dev machine — use mock AI client for unit tests
Integration tests skip gracefully when oMLX unreachable
Expose AIClient interface so production uses oMLX, tests use mock
AI config lives in ai_config.json (separate from main config.json)
Intake handler should use write-ahead queue if NetBox unreachable
SearXNG function calling deferred to Phase 7

Claude's Discretion

All implementation details are at Claude's discretion. Use Phase 1 artifacts (NetBox client, quality gate, HW-ID) as building blocks.

Deferred Ideas (OUT OF SCOPE)

SearXNG function calling (Phase 7)
Lab Advisor tier 3 (Phase 6)
Natural language search (Phase 7)
Actual Gemma 4 model tuning/fine-tuning
React UI for intake (Phase 3)

</user_constraints>

<phase_requirements>

Phase Requirements

ID	Description	Research Support
AI-01	oMLX installed on Mac Mini M4 with Gemma 4 model serving OpenAI-compatible API	oMLX setup guide + mock pattern for dev
AI-02	User can upload 1-3 photos and AI extracts serial number, model, manufacturer, specs via multimodal vision	Multipart form handling + base64 vision message pattern
AI-03	AI suggests category, tags, and location for each item	Structured JSON response from vision prompt
AI-04	AI calls SearXNG via function calling to research product specs (STUB only this phase)	Stub interface only; real impl Phase 7
AI-05	Orchestrator reviews Tier 1 output for completeness and flags gaps as needs_research	Confidence extraction + quality gate transition
AI-06	Tier 2 research agent (OpenRouter) automatically enriches items flagged needs_research	go-openai BaseURL swap pattern
AI-07	Quick add mode skips review screen for items with high AI confidence	Config flag + threshold comparison
AI-08	All AI tiers accessed via single OpenAI-compatible client with configurable base URLs	go-openai ClientConfig.BaseURL
AI-09	Provider routing configured via JSON file — swap any tier without code changes	ai_config.json schema + factory pattern
</phase_requirements>

Summary

Phase 2 builds the AI backbone of HWLab: a Go interface hierarchy that decouples test-time mocks from production oMLX/OpenRouter calls, a multipart photo intake handler that encodes images as base64 vision messages, a structured-output extractor that parses Gemma 4 JSON responses into typed IntakeResult values, and a three-tier orchestrator that escalates to OpenRouter when Tier 1 confidence falls below threshold.

The key design challenge is keeping the AIClient interface minimal enough to mock cleanly while capturing the full vision + JSON-mode call pattern used by go-openai. The confidence score must be embedded in the model's structured output (not inferred post-hoc) because Gemma 4 / OpenAI-compatible APIs do not expose logprobs for vision tasks reliably.

The orchestrator plugs directly into Phase 1's CatalogUpdater, AllocateNextHWID, PatchCustomFields, and SyncTags — all four are stable and tested. The WAQ from Phase 1 (Plan 05) is already wired into main.go and is the fallback path when NetBox is unreachable during intake.

Primary recommendation: Build the AIClient interface and mock first, then the intake handler, then the orchestrator. Keep confidence scoring self-contained inside the AI package — do not leak float64 confidence values into the service layer; instead expose a typed CatalogStatus decision from the orchestrator.

Standard Stack

Core (Phase 2 additions)

Library	Version	Purpose	Why Standard
github.com/sashabaranov/go-openai	v1.x	OpenAI-compatible HTTP client	Single client for oMLX + OpenRouter; BaseURL swap is the tier-routing mechanism; already recommended in STACK.md

Version verification:

go get github.com/sashabaranov/go-openai@latest
# As of 2026-04 training knowledge: v1.36+ is current — verify before install

[ASSUMED: exact latest version; run go list -m github.com/sashabaranov/go-openai@latest to confirm]

Already in go.mod (no new dependencies needed)

Package	Current Version	Used By Phase 2
github.com/go-chi/chi/v5	v5.2.5	POST /api/intake route
github.com/spf13/viper	v1.21.0	ai_config.json loading
github.com/google/uuid	v1.6.0	Intake job ID (already indirect)
github.com/redis/go-redis/v9	v9.18.0	WAQ fallback on NetBox failure

Installation

cd /home/mikkel/homelabby
go get github.com/sashabaranov/go-openai@latest

Architecture Patterns

Recommended Package Structure (Phase 2 additions)

internal/
├── ai/
│   ├── client.go          # AIClient interface + TierClient concrete type
│   ├── mock.go            # MockAIClient for unit tests
│   ├── orchestrator.go    # Three-tier routing + escalation logic
│   ├── types.go           # IntakeRequest, IntakeResult, ConfidenceLevel
│   └── prompts/
│       └── intake.go      # Prompt templates for hardware analysis
├── api/
│   ├── handlers/
│   │   └── intake.go      # POST /api/intake multipart handler (new)
│   └── router.go          # Add intake route (modify existing)
└── config/
    └── config.go          # Add AIConfig fields (modify existing)

Pattern 1: AIClient Interface + TierClient

What: A minimal Go interface that captures the one call shape Phase 2 needs. TierClient wraps *openai.Client from go-openai. MockAIClient implements the same interface deterministically.

Why minimal interface: The interface should expose the behavior, not the library. If the interface requires *openai.ChatCompletionRequest, tests must import go-openai. A domain-typed interface (AnalyzePhotos) keeps mocks simple.

// Source: training knowledge — standard Go interface pattern [ASSUMED]
// internal/ai/client.go

package ai

import (
    "context"

    openai "github.com/sashabaranov/go-openai"
)

// AIClient is the single abstraction over any OpenAI-compatible inference backend.
// Production: TierClient wrapping sashabaranov/go-openai.
// Tests: MockAIClient with canned responses.
type AIClient interface {
    AnalyzePhotos(ctx context.Context, req IntakeRequest) (*IntakeResult, error)
}

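// TierConfig holds provider configuration for one AI tier.
// The mapstructure tags are needed because viper decodes through mapstructure,
// which ignores json tags; without them base_url would not map onto BaseURL.
type TierConfig struct {
    BaseURL  string `json:"base_url" mapstructure:"base_url"`
    APIKey   string `json:"api_key" mapstructure:"api_key"`
    Model    string `json:"model" mapstructure:"model"`
    TimeoutS int    `json:"timeout_seconds" mapstructure:"timeout_seconds"`
}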

// TierClient is the production AIClient backed by go-openai.
type TierClient struct {
    client *openai.Client
    model  string
}

func NewTierClient(cfg TierConfig) *TierClient {
    config := openai.DefaultConfig(cfg.APIKey)
    config.BaseURL = cfg.BaseURL
    return &TierClient{
        client: openai.NewClientWithConfig(config),
        model:  cfg.Model,
    }
}

[VERIFIED: go-openai BaseURL override via openai.DefaultConfig + config.BaseURL — confirmed pattern from STACK.md and ARCHITECTURE.md]

Pattern 2: Multipart Photo Upload → Base64 Vision Message

What: chi handler reads up to 3 files from multipart form, reads each into []byte, encodes to base64 data URL, assembles a ChatCompletionRequest with ImageURL content parts.

go-openai vision message shape: [ASSUMED: standard pattern, consistent with OpenAI API]

// internal/api/handlers/intake.go
// Source: go-openai vision pattern [ASSUMED: matches OpenAI API spec]

func (h *IntakeHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    // Parse multipart; 32MB is the in-memory cap, larger parts spill to temp files
    if err := r.ParseMultipartForm(32 << 20); err != nil {
        http.Error(w, "bad multipart", http.StatusBadRequest)
        return
    }

    files := r.MultipartForm.File["photos"]
    if len(files) == 0 || len(files) > 3 {
        http.Error(w, "1-3 photos required", http.StatusBadRequest)
        return
    }

    var photosB64 []string
    for _, fh := range files {
        f, err := fh.Open()
        if err != nil {
            http.Error(w, "cannot open photo", http.StatusBadRequest)
            return
        }
        data, err := io.ReadAll(f)
        f.Close() // close now; a defer inside the loop would hold every file open until return
        if err != nil {
            http.Error(w, "cannot read photo", http.StatusBadRequest)
            return
        }
        // DetectContentType sniffs at most the first 512 bytes on its own
        mime := http.DetectContentType(data)
        photosB64 = append(photosB64, fmt.Sprintf("data:%s;base64,%s",
            mime, base64.StdEncoding.EncodeToString(data)))
    }

    result, err := h.ai.AnalyzePhotos(r.Context(), ai.IntakeRequest{
        PhotosBase64: photosB64,
    })
    if err != nil {
        http.Error(w, "analysis failed", http.StatusBadGateway)
        return
    }
    // ... hand result to the Phase 1 wiring (Pattern 7)
}

go-openai vision content parts: [ASSUMED]

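// internal/ai/client.go: TierClient.AnalyzePhotos
// Additional imports needed: encoding/json, errors, fmt
func (c *TierClient) AnalyzePhotos(ctx context.Context, req IntakeRequest) (*IntakeResult, error) {
    // Prompt text first, then one image part per photo
    parts := []openai.ChatMessagePart{
        {
            Type: openai.ChatMessagePartTypeText,
            Text: buildIntakePrompt(),
        },
    }
    for _, b64 := range req.PhotosBase64 {
        parts = append(parts, openai.ChatMessagePart{
            Type: openai.ChatMessagePartTypeImageURL,
            ImageURL: &openai.ChatMessageImageURL{
                URL:    b64, // data:image/jpeg;base64,...
                Detail: openai.ImageURLDetailAuto,
            },
        })
    }

    resp, err := c.client.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
        Model: c.model,
        Messages: []openai.ChatCompletionMessage{
            {Role: openai.ChatMessageRoleUser, MultiContent: parts},
        },
        // ResponseFormat for JSON mode (see Pattern 3)
    })
    if err != nil {
        return nil, fmt.Errorf("chat completion: %w", err)
    }
    if len(resp.Choices) == 0 {
        return nil, errors.New("chat completion returned no choices")
    }

    // Parse the reply as JSON; a non-JSON reply becomes a zero-confidence
    // result so the orchestrator escalates (see "JSON Parse with Fallback").
    var result IntakeResult
    if err := json.Unmarshal([]byte(resp.Choices[0].Message.Content), &result); err != nil {
        return &IntakeResult{Confidence: 0.0}, nil
    }
    return &result, nil
}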

[ASSUMED: MultiContent field name in go-openai ChatCompletionMessage — verify against actual go-openai source after install. Some versions use Content string OR MultiContent []ChatMessagePart]

CRITICAL NOTE: Verify the exact ChatCompletionMessage field for multi-content vision after go get. The field has been MultiContent in v1.20+ but naming may differ. Check with:

go doc github.com/sashabaranov/go-openai ChatCompletionMessage
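
Before relying on the go-openai field names, it can help to know what the request should look like on the wire. The following is a minimal stdlib-only sketch of the OpenAI-style chat request body with one text part and one image_url part per photo. The struct and function names (contentPart, buildVisionRequest) are illustrative and not part of go-openai; the image bytes are a placeholder.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// contentPart mirrors one element of an OpenAI-style multi-part "content" array.
type contentPart struct {
	Type     string    `json:"type"`
	Text     string    `json:"text,omitempty"`
	ImageURL *imageURL `json:"image_url,omitempty"`
}

type imageURL struct {
	URL    string `json:"url"`
	Detail string `json:"detail,omitempty"`
}

type message struct {
	Role    string        `json:"role"`
	Content []contentPart `json:"content"`
}

type chatRequest struct {
	Model    string    `json:"model"`
	Messages []message `json:"messages"`
}

// buildVisionRequest assembles a single user message: the prompt text first,
// then one image_url part per data URL.
func buildVisionRequest(model, prompt string, dataURLs []string) chatRequest {
	parts := []contentPart{{Type: "text", Text: prompt}}
	for _, u := range dataURLs {
		parts = append(parts, contentPart{
			Type:     "image_url",
			ImageURL: &imageURL{URL: u, Detail: "auto"},
		})
	}
	return chatRequest{Model: model, Messages: []message{{Role: "user", Content: parts}}}
}

func main() {
	// Placeholder bytes stand in for a real JPEG.
	dataURL := "data:image/jpeg;base64," + base64.StdEncoding.EncodeToString([]byte("fake-jpeg-bytes"))
	req := buildVisionRequest("gemma-4-e4b", "Analyze the hardware in the photo.", []string{dataURL})
	out, err := json.MarshalIndent(req, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

The printed JSON can be posted to the tier's /chat/completions endpoint with curl as a quick check that the server accepts data URLs, independent of the go-openai version in use.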

Pattern 3: Structured JSON Output from Gemma 4

What: Instruct the model to return a specific JSON schema via prompt engineering. Use ResponseFormat with JSONObject type when the endpoint supports it (oMLX/Gemma 4 may not support strict JSON schema mode — fall back to prompt-only).

IntakeResult schema:

// internal/ai/types.go
package ai

// IntakeRequest carries the photos for one intake, each as a base64 data URL.
type IntakeRequest struct {
    PhotosBase64 []string
}

// IntakeResult is the structured output from any AI tier's photo analysis.
// The model is instructed to return this JSON shape verbatim.
type IntakeResult struct {
    SerialNumber   string            `json:"serial_number"`   // empty string if not visible
    Model          string            `json:"model"`
    Manufacturer   string            `json:"manufacturer"`
    Category       string            `json:"category"`        // e.g. "networking", "cable", "compute"
    Specs          map[string]string `json:"specs"`           // key-value hardware specs
    SuggestedTags  []string          `json:"suggested_tags"`
    AINotes        string            `json:"ai_notes"`        // free-form observations
    Confidence     float64           `json:"confidence"`      // 0.0–1.0, self-reported by model
    ConfidenceNote string            `json:"confidence_note"` // why confidence is low (if < threshold)
}

Prompt pattern for JSON output:

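// internal/ai/prompts/intake.go
// Note: TierClient.AnalyzePhotos (package ai) calls this unqualified, so it must
// either live in package ai or be exported from package prompts.
func buildIntakePrompt() string {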
    return `Analyze the hardware in the provided photo(s) and return ONLY valid JSON matching this schema:
{
  "serial_number": "<string or empty>",
  "model": "<string>",
  "manufacturer": "<string>",
  "category": "<one of: compute, networking, storage, cable, peripheral, component, unknown>",
  "specs": {"<key>": "<value>"},
  "suggested_tags": ["<tag1>", "<tag2>"],
  "ai_notes": "<observations>",
  "confidence": <float 0.0-1.0>,
  "confidence_note": "<reason if confidence < 0.75>"
}
Return ONLY the JSON object. No markdown, no explanation.`
}

JSON mode ResponseFormat (use if supported by endpoint): [ASSUMED]

// Only set if oMLX / OpenRouter model supports JSON mode
ResponseFormat: &openai.ChatCompletionResponseFormat{
    Type: openai.ChatCompletionResponseFormatTypeJSONObject,
},

[ASSUMED: Gemma 4 via oMLX may not support response_format: json_object — implement with prompt-only fallback and parse json.Unmarshal on the raw response string. If JSON parse fails, treat as low-confidence and escalate.]
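
With prompt-only JSON, models sometimes wrap the object in markdown fences or add a sentence around it. The sketch below shows one way to tolerate that before calling json.Unmarshal: keep only the outermost {...} span, then clamp the self-reported confidence into 0..1. The helper name parseIntakeResult and the trimmed-down IntakeResult are assumptions for illustration, and whether Gemma 4 actually produces fenced output has not been tested here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// IntakeResult is reduced to two fields for this sketch.
type IntakeResult struct {
	Model      string  `json:"model"`
	Confidence float64 `json:"confidence"`
}

// parseIntakeResult (hypothetical helper) decodes a model reply that should be
// a bare JSON object. Text outside the outermost braces is dropped.
// ok is false when nothing decodes; callers treat that as low confidence.
func parseIntakeResult(raw string) (res IntakeResult, ok bool) {
	start := strings.Index(raw, "{")
	end := strings.LastIndex(raw, "}")
	if start < 0 || end < start {
		return IntakeResult{}, false
	}
	if err := json.Unmarshal([]byte(raw[start:end+1]), &res); err != nil {
		return IntakeResult{}, false
	}
	// Clamp the self-reported confidence into [0, 1].
	if res.Confidence < 0 {
		res.Confidence = 0
	}
	if res.Confidence > 1 {
		res.Confidence = 1
	}
	return res, true
}

func main() {
	fence := strings.Repeat("`", 3)
	fenced := fence + "json\n{\"model\": \"Raspberry Pi 4 Model B\", \"confidence\": 0.95}\n" + fence

	res, ok := parseIntakeResult(fenced)
	fmt.Println(res.Model, res.Confidence, ok)

	_, ok = parseIntakeResult("I could not read the label.")
	fmt.Println(ok)
}
```

This stays within the json.Unmarshal approach recommended in this document; only the trimming step is added, and a reply without any JSON object still ends in escalation.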

Pattern 4: Three-Tier Orchestrator

What: Orchestrator holds two AIClient instances (tier1, tier2). For each intake request: call tier1, parse result, check confidence. If confidence < threshold OR parse failed, call tier2 with same request. Map confidence to CatalogStatus for quality gate.

// internal/ai/orchestrator.go
package ai

import "context"
// Also import the project's internal/inventory package for CatalogStatus
// (module path omitted here).

type Orchestrator struct {
    tier1     AIClient
    tier2     AIClient
    threshold float64 // from config, default 0.75
}

func NewOrchestrator(tier1, tier2 AIClient, threshold float64) *Orchestrator {
    return &Orchestrator{tier1: tier1, tier2: tier2, threshold: threshold}
}

// Analyze runs tier1, escalates to tier2 if needed, returns result + catalog decision.
// Tier failures are not returned as errors; they surface as StatusNeedsResearch.
func (o *Orchestrator) Analyze(ctx context.Context, req IntakeRequest) (*IntakeResult, inventory.CatalogStatus, error) {
    result, err := o.tier1.AnalyzePhotos(ctx, req)
    if err != nil || result == nil || result.Confidence < o.threshold {
        // Escalate to tier2
        result2, err2 := o.tier2.AnalyzePhotos(ctx, req)
        if err2 == nil && result2 != nil {
            result = result2
        }
        // If tier2 also fails, keep the tier1 result (possibly nil)
    }

    if result == nil {
        // Both tiers failed: return a zero result so callers never see nil
        result = &IntakeResult{}
    }

    status := inventory.StatusIndexed
    if result.Confidence < o.threshold {
        status = inventory.StatusNeedsResearch
    }
    return result, status, nil
}
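
The escalation rule can be exercised without any network access. Below is a self-contained sketch that re-declares the minimal types locally and drives the rule with stub clients. The names stubClient and analyze, and the string values of CatalogStatus, are stand-ins for this example and may not match the real internal/inventory definitions.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

type IntakeRequest struct{ PhotosBase64 []string }

type IntakeResult struct {
	Model      string
	Confidence float64
}

type AIClient interface {
	AnalyzePhotos(ctx context.Context, req IntakeRequest) (*IntakeResult, error)
}

// stubClient returns a fixed result or error and counts how often it is called.
type stubClient struct {
	result *IntakeResult
	err    error
	calls  int
}

func (s *stubClient) AnalyzePhotos(_ context.Context, _ IntakeRequest) (*IntakeResult, error) {
	s.calls++
	return s.result, s.err
}

// CatalogStatus stands in for inventory.CatalogStatus (string values assumed).
type CatalogStatus string

const (
	StatusIndexed       CatalogStatus = "indexed"
	StatusNeedsResearch CatalogStatus = "needs_research"
)

// analyze calls tier 2 only when tier 1 fails or reports confidence under the threshold.
func analyze(ctx context.Context, tier1, tier2 AIClient, threshold float64, req IntakeRequest) (*IntakeResult, CatalogStatus) {
	result, err := tier1.AnalyzePhotos(ctx, req)
	if err != nil || result == nil || result.Confidence < threshold {
		if r2, err2 := tier2.AnalyzePhotos(ctx, req); err2 == nil && r2 != nil {
			result = r2
		}
	}
	if result == nil {
		result = &IntakeResult{} // both tiers failed: zero result
	}
	if result.Confidence < threshold {
		return result, StatusNeedsResearch
	}
	return result, StatusIndexed
}

func main() {
	ctx := context.Background()
	req := IntakeRequest{}

	// Tier 1 is confident: tier 2 is never called.
	t2 := &stubClient{err: errors.New("unreachable")}
	_, status := analyze(ctx, &stubClient{result: &IntakeResult{Confidence: 0.95}}, t2, 0.75, req)
	fmt.Println(status, t2.calls) // indexed 0

	// Tier 1 is unsure, tier 2 is confident: the tier 2 result wins.
	t2 = &stubClient{result: &IntakeResult{Model: "Raspberry Pi 4 Model B", Confidence: 0.90}}
	res, status := analyze(ctx, &stubClient{result: &IntakeResult{Confidence: 0.40}}, t2, 0.75, req)
	fmt.Println(res.Model, status) // Raspberry Pi 4 Model B indexed

	// Both tiers fail: zero result, flagged for research.
	down := errors.New("connection refused")
	res, status = analyze(ctx, &stubClient{err: down}, &stubClient{err: down}, 0.75, req)
	fmt.Println(res.Confidence, status) // 0 needs_research
}
```

The three cases in main correspond to the TestOrchestratorHighConf, TestOrchestratorEscalation, and TestOrchestratorTier2Fallback entries in the test map.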

Pattern 5: MockAIClient for Unit Tests

What: A deterministic mock that returns canned IntakeResult values. Implements AIClient interface. Configurable to return high-confidence or low-confidence responses, and optionally errors.

// internal/ai/mock.go
package ai

import "context"

// MockAIClient is a test double for AIClient.
// Configure FixedResult and/or FixedError before use.
type MockAIClient struct {
    FixedResult *IntakeResult
    FixedError  error
    Calls       []IntakeRequest // record of calls for assertions
}

func (m *MockAIClient) AnalyzePhotos(_ context.Context, req IntakeRequest) (*IntakeResult, error) {
    m.Calls = append(m.Calls, req)
    return m.FixedResult, m.FixedError
}

// HighConfidenceResult returns a fixture IntakeResult with confidence 0.95.
func HighConfidenceResult() *IntakeResult {
    return &IntakeResult{
        Model:        "Raspberry Pi 4 Model B",
        Manufacturer: "Raspberry Pi Foundation",
        Category:     "compute",
        Specs:        map[string]string{"ram": "4GB", "cpu": "BCM2711"},
        SuggestedTags: []string{"raspberry-pi", "compute", "arm"},
        Confidence:   0.95,
    }
}

// LowConfidenceResult returns a fixture with confidence 0.40 (below threshold).
func LowConfidenceResult() *IntakeResult {
    return &IntakeResult{
        Model:          "Unknown Device",
        Category:       "unknown",
        Confidence:     0.40,
        ConfidenceNote: "Cannot identify markings clearly",
    }
}

Pattern 6: AI Config Schema (ai_config.json)

What: Separate JSON config file for AI provider settings. Loaded by viper alongside main config.json. Keeps provider credentials out of the main config.

{
  "tier1": {
    "base_url": "http://localhost:8000/v1",
    "api_key": "local",
    "model": "gemma-4-e4b",
    "timeout_seconds": 30
  },
  "tier2": {
    "base_url": "https://openrouter.ai/api/v1",
    "api_key": "sk-or-...",
    "model": "google/gemma-2-27b-it",
    "timeout_seconds": 60
  },
  "confidence_threshold": 0.75,
  "quick_add_enabled": false,
  "quick_add_threshold": 0.90
}

Config struct extension (extend existing internal/config/config.go):

type AIConfig struct {
    Tier1               TierConfig `mapstructure:"tier1"`
    Tier2               TierConfig `mapstructure:"tier2"`
    ConfidenceThreshold float64    `mapstructure:"confidence_threshold"`
    QuickAddEnabled     bool       `mapstructure:"quick_add_enabled"`
    QuickAddThreshold   float64    `mapstructure:"quick_add_threshold"`
}

// Add to Config struct:
AI AIConfig `mapstructure:"ai"`

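Viper can load ai_config.json in two ways: merge it into the main viper instance (point v.SetConfigFile at ai_config.json, then call v.MergeInConfig()), or embed the AI fields directly in config.json under an "ai" key. Note that the ai_config.json shown above keeps tier1 and tier2 at the top level, while AIConfig is mounted under the "ai" key of Config, so a plain merge would place those keys at the root instead of under "ai". Simplest consistent option: read ai_config.json with a second viper instance and merge its settings under "ai" via v.MergeConfigMap(map[string]any{"ai": aiViper.AllSettings()}).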

[ASSUMED: viper MergeInConfig pattern for secondary config file — standard viper v1 capability]
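
To make the schema and the Quick Add rule (AI-07) concrete, here is a stdlib-only sketch that decodes the same JSON shape with encoding/json instead of viper, validates it, and applies the threshold comparison. The function names parseAIConfig and skipReview are hypothetical, and treating the quick-add threshold as inclusive (>=) is an assumption; the source does not specify it.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

type TierConfig struct {
	BaseURL  string `json:"base_url"`
	APIKey   string `json:"api_key"`
	Model    string `json:"model"`
	TimeoutS int    `json:"timeout_seconds"`
}

type AIConfig struct {
	Tier1               TierConfig `json:"tier1"`
	Tier2               TierConfig `json:"tier2"`
	ConfidenceThreshold float64    `json:"confidence_threshold"`
	QuickAddEnabled     bool       `json:"quick_add_enabled"`
	QuickAddThreshold   float64    `json:"quick_add_threshold"`
}

// parseAIConfig decodes and sanity-checks the config.
// Unknown keys are rejected so that typos surface at startup.
func parseAIConfig(raw string) (AIConfig, error) {
	var cfg AIConfig
	dec := json.NewDecoder(strings.NewReader(raw))
	dec.DisallowUnknownFields()
	if err := dec.Decode(&cfg); err != nil {
		return AIConfig{}, fmt.Errorf("decode ai config: %w", err)
	}
	if cfg.Tier1.BaseURL == "" || cfg.Tier2.BaseURL == "" {
		return AIConfig{}, errors.New("tier1 and tier2 both need a base_url")
	}
	if cfg.ConfidenceThreshold < 0 || cfg.ConfidenceThreshold > 1 {
		return AIConfig{}, errors.New("confidence_threshold must be within 0..1")
	}
	return cfg, nil
}

// skipReview is the Quick Add decision: the flag must be on and the
// confidence must reach the quick-add threshold (inclusive, by assumption).
func skipReview(cfg AIConfig, confidence float64) bool {
	return cfg.QuickAddEnabled && confidence >= cfg.QuickAddThreshold
}

func main() {
	raw := `{
	  "tier1": {"base_url": "http://localhost:8000/v1", "api_key": "local", "model": "gemma-4-e4b", "timeout_seconds": 30},
	  "tier2": {"base_url": "https://openrouter.ai/api/v1", "api_key": "placeholder", "model": "google/gemma-2-27b-it", "timeout_seconds": 60},
	  "confidence_threshold": 0.75,
	  "quick_add_enabled": true,
	  "quick_add_threshold": 0.90
	}`
	cfg, err := parseAIConfig(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(skipReview(cfg, 0.95), skipReview(cfg, 0.80)) // true false
}
```

Keeping the two thresholds separate means an item can be indexed (confidence at or above 0.75) and still go through the review screen (confidence below 0.90).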

Pattern 7: Intake Handler Wiring to Phase 1 Components

What: The intake handler coordinates: orchestrator (AI analysis) → AllocateNextHWID (ID) → BuildFullCustomFieldsPatch (fields) → NetboxClient.CreateDevice or PatchCustomFields → SyncTags → CatalogUpdater.UpdateCatalogStatus → WAQ fallback.

Existing Phase 1 APIs the handler calls:

Phase 1 Function	Package	Handler Usage
`AllocateNextHWID(ctx)`	`internal/netbox`	Assign HW-XXXXX ID to new record
`BuildFullCustomFieldsPatch(cf)`	`internal/netbox`	Populate custom fields from IntakeResult
`PatchCustomFields(ctx, id, patch)`	`internal/netbox`	Write AI data to NetBox device
`SyncTags(ctx, tags)`	`internal/netbox`	Create and assign AI-suggested tags
`UpdateCatalogStatus(ctx, id, current, next)`	`internal/inventory`	Set indexed or needs_research
`waq.Enqueue(ctx, op)`	`internal/queue`	Buffer NetBox write if unreachable

Note: Phase 1's client.go has ListDevices and GetDevice but no CreateDevice. The intake handler will need CreateDevice — this is a new method on internal/netbox.Client. Plan must include this task.

Pattern 8: SearXNG Stub (AI-04)

What: AI-04 is listed as "Phase 7" in REQUIREMENTS.md but the CONTEXT.md says "stub only" this phase. Implement a ResearchClient interface with a Search(ctx, query) method, and a NoOpResearchClient that returns empty results. This satisfies the interface requirement without Phase 7 scope creep.

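// internal/ai/research.go (stub)
package ai

import "context"

// SearchResult is a placeholder shape; Phase 7 defines the real fields.
type SearchResult struct {
    Title   string
    URL     string
    Snippet string
}

type ResearchClient interface {
    Search(ctx context.Context, query string) ([]SearchResult, error)
}

type NoOpResearchClient struct{}

func (n *NoOpResearchClient) Search(_ context.Context, _ string) ([]SearchResult, error) {
    return nil, nil // Phase 7 will provide real implementation
}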

Anti-Patterns to Avoid

Don't extract confidence from logprobs: Gemma 4 vision via oMLX does not expose per-token logprobs reliably. Embed confidence: float in the JSON output schema and instruct the model to self-report it. [ASSUMED: oMLX logprobs availability is uncertain]
Don't store photos: Per CLAUDE.md stack patterns: "Store the original photo in a local temp directory only until the NetBox record is created; do not persist photos in HWLab itself." Photos are transient.
Don't call NetBox from the AI package: internal/ai should not import internal/netbox. The intake handler (service layer) orchestrates both. Keep the AI package focused on inference only.
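Don't share a single go-openai client instance across tiers: The locked decision of a "single go-openai client" means one client library and type, not one shared instance. Each tier gets its own *openai.Client with its own BaseURL and APIKey. Mutating a shared client's config between calls is a data race.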
Don't block the HTTP response on AI inference: AI calls take 2-30 seconds. The intake handler should return a job ID immediately and push the result via SSE. (Phase 3 will add SSE — for Phase 2, a synchronous response is acceptable since there's no UI yet, but design the handler to support async promotion.)

Don't Hand-Roll

Problem	Don't Build	Use Instead	Why
OpenAI-compatible HTTP client	Custom HTTP calls to oMLX	`sashabaranov/go-openai`	Handles auth headers, streaming, vision content parts
Base64 encoding	Custom encoder	`encoding/base64` stdlib	Already in Go stdlib
MIME type detection	File extension parsing	`net/http.DetectContentType`	Magic bytes detection from stdlib
JSON structured output parsing	Regex extraction	`encoding/json.Unmarshal`	Model output is usually well-formed JSON when prompted correctly; parse failures are handled as low confidence
Multipart form parsing	Manual `--boundary` parsing	`r.ParseMultipartForm()`	stdlib net/http handles multipart

Common Pitfalls

Pitfall 1: go-openai Vision MultiContent Field Name

What goes wrong: Code fails to compile because the ChatCompletionMessage.MultiContent field doesn't exist or is named differently in the installed version.

Why it happens: go-openai API evolved; older versions used a single Content string, newer versions added MultiContent []ChatMessagePart for vision. The exact field name depends on the version.

How to avoid: After go get github.com/sashabaranov/go-openai@latest, run go doc github.com/sashabaranov/go-openai ChatCompletionMessage and verify the vision field name before writing handler code.

Warning signs: Compiler error "unknown field MultiContent" or images silently not being sent (text-only response from model).

Pitfall 2: oMLX JSON Mode Not Supported

What goes wrong: Setting ResponseFormat: {Type: "json_object"} causes a 400 error from oMLX because Gemma 4 E4B via oMLX may not support the response_format parameter.

Why it happens: The response_format JSON schema enforcement is an OpenAI-specific feature not universally implemented across all OpenAI-compatible servers.

How to avoid: Implement JSON parsing with a fallback: try json.Unmarshal(content) on the raw string. If parse fails, treat result as zero-confidence and escalate to tier2. Do not set ResponseFormat unless tested against live oMLX.

Warning signs: 400 Bad Request from oMLX at inference time with "unsupported parameter" in body.

Pitfall 3: Data URL MIME Type vs go-openai Image URL

What goes wrong: Some OpenAI-compatible servers reject data:image/jpeg;base64,... data URLs in vision requests and require a https:// URL instead.

Why it happens: The OpenAI spec allows data URLs in image_url.url but not all providers implement this.

How to avoid: oMLX (local, Gemma 4) should accept data URLs since it's processing locally. Test with a minimal integration test against live oMLX before building the full intake flow. Keep the base64 path for oMLX (tier1) and note that tier2 (OpenRouter) may require a different approach if it doesn't accept data URLs.

Warning signs: 400 or inference-time error from oMLX with "invalid image_url".

Pitfall 4: CreateDevice Not in Phase 1 NetBox Client

What goes wrong: Intake handler tries to call netboxClient.CreateDevice(...) but that method was not built in Phase 1 (only ListDevices, GetDevice, PatchCustomFields were built).

Why it happens: Phase 1 was scoped to read/patch existing devices for the quality gate workflow. Intake requires creating new records.

How to avoid: Plan must include a Wave 0 task to add CreateDevice(ctx, name, assetTag) (int, error) to internal/netbox/client.go before the intake handler can be completed.

go-netbox v4 create pattern: [ASSUMED — matches observed PATCH pattern from 01-02-SUMMARY]

req := nb.WritableDeviceWithConfigContextRequest{}
req.SetName(name)
req.SetAssetTag(assetTag)
// DeviceRole and DeviceType are required by NetBox — plan must handle defaults
resp, _, err := c.api.DcimAPI.DcimDevicesCreate(ctx).
    WritableDeviceWithConfigContextRequest(req).Execute()

Note: NetBox DcimDevicesCreate requires the device's role, device_type, and site to be set (all three are non-nullable FK fields; the role field was named device_role before NetBox 3.6). The intake handler must either pick sensible defaults or require pre-provisioned "Unknown" role/type records and a default site to exist in NetBox.

Pitfall 5: Confidence Self-Reporting Calibration

What goes wrong: Model returns "confidence": 0.95 for every item regardless of actual uncertainty, making the threshold useless.

Why it happens: LLMs tend to be overconfident in self-reporting. Without explicit calibration prompting, models bias toward high confidence.

How to avoid: Add calibration guidance to the intake prompt: "Return confidence < 0.75 if: serial number not visible, item is partially obscured, or manufacturer/model cannot be determined from visual inspection alone." This nudges the model toward honest low-confidence responses for ambiguous photos.
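
One way to keep the calibration text and the configured threshold from drifting apart is to format the threshold into the prompt. The sketch below appends the calibration rules quoted above to a base prompt; withCalibration, calibrationRules, and the shortened basePrompt are illustrative names, not existing code.

```go
package main

import (
	"fmt"
	"strings"
)

// basePrompt is shortened here; the full schema prompt is shown in Pattern 3.
const basePrompt = `Analyze the hardware in the provided photo(s) and return ONLY valid JSON.`

// calibrationRules lists conditions that should push the self-reported
// confidence below the threshold.
var calibrationRules = []string{
	"the serial number is not visible",
	"the item is partially obscured",
	"manufacturer or model cannot be determined from visual inspection alone",
}

// withCalibration appends the calibration guidance to the prompt, using the
// same threshold value the orchestrator compares against.
func withCalibration(prompt string, threshold float64) string {
	var b strings.Builder
	b.WriteString(prompt)
	fmt.Fprintf(&b, "\nReturn confidence < %.2f if any of the following holds:\n", threshold)
	for _, rule := range calibrationRules {
		b.WriteString("- " + rule + "\n")
	}
	return b.String()
}

func main() {
	fmt.Print(withCalibration(basePrompt, 0.75))
}
```

Whether this wording actually improves calibration on Gemma 4 is untested (see assumption A5); the threshold may still need tuning against real photos.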

Pitfall 6: WAQ Integration — PendingOp Payload Schema

What goes wrong: Intake handler enqueues a PendingOp with a payload, but Phase 1's NoOpHandler (the WAQ worker) is still installed — it drains the queue silently. Phase 2 must replace NoOpHandler with a real NetBox retry handler.

Why it happens: Phase 1 explicitly left NoOpHandler as a stub: "Phase 2 will replace this with a real retry handler."

How to avoid: Phase 2 plan must include a task to implement the real WAQ handler that retries failed NetBox CreateDevice / PatchCustomFields calls. Define PendingOp.OpType constants (e.g., "netbox.create_device", "netbox.patch_custom_fields") and the payload structs for each.
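
A possible shape for the op types and payloads is sketched below. The two constants use the strings suggested above; the PendingOp envelope, the payload fields, and the newOp/dispatch helpers are hypothetical and would need to be aligned with the actual Phase 1 queue types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

const (
	OpCreateDevice      = "netbox.create_device"
	OpPatchCustomFields = "netbox.patch_custom_fields"
)

// PendingOp is a hypothetical queue envelope; the real Phase 1 struct may differ.
type PendingOp struct {
	OpType  string          `json:"op_type"`
	Payload json.RawMessage `json:"payload"`
}

type CreateDevicePayload struct {
	Name     string `json:"name"`
	AssetTag string `json:"asset_tag"`
}

type PatchCustomFieldsPayload struct {
	DeviceID int               `json:"device_id"`
	Fields   map[string]string `json:"fields"`
}

// newOp wraps a typed payload in the envelope.
func newOp(opType string, payload any) (PendingOp, error) {
	raw, err := json.Marshal(payload)
	if err != nil {
		return PendingOp{}, err
	}
	return PendingOp{OpType: opType, Payload: raw}, nil
}

// dispatch decodes the payload according to OpType. Unknown types return an
// error instead of being drained silently. A real handler would call NetBox
// where this sketch only builds a description string.
func dispatch(op PendingOp) (string, error) {
	switch op.OpType {
	case OpCreateDevice:
		var p CreateDevicePayload
		if err := json.Unmarshal(op.Payload, &p); err != nil {
			return "", err
		}
		return "create " + p.AssetTag, nil
	case OpPatchCustomFields:
		var p PatchCustomFieldsPayload
		if err := json.Unmarshal(op.Payload, &p); err != nil {
			return "", err
		}
		return fmt.Sprintf("patch device %d (%d fields)", p.DeviceID, len(p.Fields)), nil
	default:
		return "", fmt.Errorf("unknown op type %q", op.OpType)
	}
}

func main() {
	op, err := newOp(OpCreateDevice, CreateDevicePayload{Name: "Raspberry Pi 4 Model B", AssetTag: "HW-00001"})
	if err != nil {
		panic(err)
	}
	wire, _ := json.Marshal(op)
	fmt.Println(string(wire))

	desc, err := dispatch(op)
	fmt.Println(desc, err)
}
```

Using json.RawMessage for the payload lets the queue store the envelope as one JSON document while deferring payload decoding to the handler that knows the op type.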

Code Examples

go-openai Client Configuration for oMLX

// Source: go-openai README pattern, confirmed in STACK.md [ASSUMED version specifics]
import openai "github.com/sashabaranov/go-openai"

cfg := openai.DefaultConfig("local")   // API key "local" for oMLX (no auth)
cfg.BaseURL = "http://localhost:8000/v1"
client := openai.NewClientWithConfig(cfg)

go-openai Client Configuration for OpenRouter

cfg := openai.DefaultConfig("sk-or-your-key-here")
cfg.BaseURL = "https://openrouter.ai/api/v1"
client := openai.NewClientWithConfig(cfg)

Multipart File Reading in chi Handler

// Source: Go stdlib net/http [VERIFIED: stdlib pattern]
if err := r.ParseMultipartForm(32 << 20); err != nil { // 32MB max memory
    return err
}
var dataURLs []string
for _, fh := range r.MultipartForm.File["photos"] {
    f, err := fh.Open()
    if err != nil {
        return err
    }
    data, err := io.ReadAll(f)
    f.Close() // close per iteration instead of deferring inside the loop
    if err != nil {
        return err
    }
    mime := http.DetectContentType(data) // sniffs at most the first 512 bytes
    b64 := base64.StdEncoding.EncodeToString(data)
    dataURLs = append(dataURLs, fmt.Sprintf("data:%s;base64,%s", mime, b64))
}

JSON Parse with Fallback

// Source: Go stdlib encoding/json [VERIFIED: stdlib pattern]
if len(resp.Choices) == 0 {
    return nil, errors.New("no choices in response")
}
var result ai.IntakeResult
content := resp.Choices[0].Message.Content
if err := json.Unmarshal([]byte(content), &result); err != nil {
    // Model returned non-JSON: treat as low confidence so the orchestrator escalates
    return &ai.IntakeResult{Confidence: 0.0}, nil
}
return &result, nil

Integration Test Skip Guard (consistent with Phase 1 pattern)

// Source: Phase 1 established pattern (01-02-SUMMARY.md) [VERIFIED: codebase]
func TestAnalyzePhotosLive(t *testing.T) {
    endpoint := os.Getenv("HWLAB_OMLX_ENDPOINT")
    if endpoint == "" {
        t.Skip("HWLAB_OMLX_ENDPOINT not set — skipping live oMLX test")
    }
    // ...
}

Validation Architecture

Test Framework

Property	Value
Framework	Go testing stdlib (`go test ./...`)
Config file	none — test flags via env vars
Quick run command	`go test ./internal/ai/... -skip Live -timeout 30s` (requires Go 1.20+ for `-skip`)
Full suite command	`go test ./...`

Phase Requirements → Test Map

Req ID	Behavior	Test Type	Automated Command	File Exists?
AI-02	Photo upload multipart parsing	unit	`go test ./internal/api/handlers/... -run TestIntakeHandler`	Wave 0
AI-02	Base64 encoding of JPEG	unit	`go test ./internal/ai/... -run TestEncodePhoto`	Wave 0
AI-03	JSON parse of structured output	unit	`go test ./internal/ai/... -run TestParseIntakeResult`	Wave 0
AI-05	Confidence below threshold → needs_research	unit	`go test ./internal/ai/... -run TestOrchestratorEscalation`	Wave 0
AI-05	Confidence above threshold → indexed	unit	`go test ./internal/ai/... -run TestOrchestratorHighConf`	Wave 0
AI-06	Tier 2 called on tier 1 failure	unit	`go test ./internal/ai/... -run TestOrchestratorTier2Fallback`	Wave 0
AI-07	Quick add flag honors threshold	unit	`go test ./internal/ai/... -run TestQuickAddMode`	Wave 0
AI-08	TierClient uses configured BaseURL	unit	`go test ./internal/ai/... -run TestTierClientConfig`	Wave 0
AI-09	ai_config.json loaded via viper	unit	`go test ./internal/config/... -run TestAIConfig`	Wave 0
AI-01	oMLX live inference smoke test	integration	`go test ./internal/ai/... -run TestAnalyzePhotosLive` (skip if env unset)	Wave 0

Sampling Rate

Per task commit: go test ./internal/ai/... ./internal/api/handlers/... -timeout 30s
Per wave merge: go test ./...
Phase gate: Full suite green before /gsd-verify-work

Wave 0 Gaps

internal/ai/client_test.go — covers AI-08, AI-09 (TierClient config)
internal/ai/orchestrator_test.go — covers AI-05, AI-06, AI-07
internal/ai/types_test.go — covers AI-03 (JSON parse)
internal/api/handlers/intake_test.go — covers AI-02

Security Domain

Applicable ASVS Categories

ASVS Category	Applies	Standard Control
V2 Authentication	no	No auth in solo homelab tool
V3 Session Management	no	Stateless REST
V4 Access Control	no	Solo operator, no roles
V5 Input Validation	yes	Validate photo count (1-3), file size cap, MIME type check
V6 Cryptography	no	API keys in config, not in code

Known Threat Patterns

Pattern	STRIDE	Standard Mitigation
Oversized photo upload (DoS)	Denial of Service	`ParseMultipartForm(32 << 20)` caps memory; add explicit per-file size check (e.g., 10MB/photo)
AI prompt injection via filename	Tampering	Do not include original filename in AI prompt; use only image bytes
API key leakage in logs	Info Disclosure	Never log `TierConfig.APIKey`; use `***` redaction in any debug output
Malformed JSON from model	Tampering	Always `json.Unmarshal` into typed struct; ignore extra fields; treat parse failure as low confidence
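
The per-file size check and MIME check from the V5 row can be sketched as a single validation step run on each photo before encoding. The 10MB cap is the example value from the table; the allowlist of image types and the name validatePhoto are assumptions for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// maxPhotoBytes is the example per-photo cap (10MB).
const maxPhotoBytes = 10 << 20

// allowedMIME is an assumed allowlist; adjust to what the vision endpoint accepts.
var allowedMIME = map[string]bool{
	"image/jpeg": true,
	"image/png":  true,
	"image/webp": true,
}

var (
	errTooLarge = errors.New("photo exceeds size cap")
	errBadType  = errors.New("unsupported image type")
)

// validatePhoto enforces the size cap and checks the sniffed MIME type
// (from magic bytes, not from the filename) against the allowlist.
func validatePhoto(data []byte) (string, error) {
	if len(data) > maxPhotoBytes {
		return "", errTooLarge
	}
	mime := http.DetectContentType(data)
	if !allowedMIME[mime] {
		return "", fmt.Errorf("%w: %s", errBadType, mime)
	}
	return mime, nil
}

func main() {
	jpegHeader := []byte("\xFF\xD8\xFF\xE0 placeholder, not a full JPEG")
	mime, err := validatePhoto(jpegHeader)
	fmt.Println(mime, err)

	_, err = validatePhoto([]byte("#!/bin/sh\necho not an image"))
	fmt.Println(err)
}
```

Because the type is sniffed from the bytes, the uploaded filename never needs to be read, which also fits the mitigation of keeping filenames out of the AI prompt.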

Environment Availability

Dependency	Required By	Available	Version	Fallback
oMLX on localhost:8000	AI-01, Tier 1 inference	Unknown (dev machine)	—	MockAIClient for unit tests; integration tests skip with env guard
OpenRouter API key	AI-06, Tier 2	Unknown	—	Integration tests skip; tier2 returns error, orchestrator falls back to needs_research
DragonFlyDB (10.5.0.10)	WAQ fallback	VERIFIED reachable (from 01-05-SUMMARY)	—	WAQ init is non-fatal; see 01-05 pattern
NetBox (10.5.0.130:8000)	CreateDevice, PatchCustomFields	Available (integration tests skip on placeholder token)	—	WAQ enqueues ops; real token needed for integration tests

Missing dependencies with no fallback:

None — all dependencies have mock/skip fallbacks for unit tests.

Missing dependencies with fallback:

oMLX: MockAIClient covers unit tests; integration test skips with HWLAB_OMLX_ENDPOINT guard.
OpenRouter key: Same skip guard pattern.

Open Questions

NetBox device_role and device_type for CreateDevice
- What we know: NetBox v4 requires role (device_role before NetBox 3.6), device_type, and site as non-null FKs on device creation
- What's unclear: Should intake auto-create "Unknown" role/type records if absent, or require them pre-provisioned?
- Recommendation: Phase 1 (Plan 03, provision.go) may have already provisioned these. Check internal/netbox/provision.go before planning the CreateDevice task.
Gemma 4 E4B model ID string in oMLX
- What we know: CONTEXT.md says model: "gemma-4-e4b" as default; oMLX uses the model filename/ID
- What's unclear: The exact model ID string oMLX uses for Gemma 4 E4B (may be mlx-community/gemma-4-e4b or similar)
- Recommendation: Leave as a config value; user sets the correct model ID once oMLX is installed. Default to "gemma-4-e4b" in ai_config.json with a comment.
Synchronous vs async intake response
- What we know: AI inference takes 2-30 seconds; Phase 3 adds SSE; no UI in Phase 2
- What's unclear: Should Phase 2 implement async job IDs now (for Phase 3 to build on) or keep synchronous for simplicity?
- Recommendation: Implement synchronous for Phase 2 (no UI yet); design the handler to accept a ?async=true query param stub that returns "not yet implemented" — this reserves the API surface for Phase 3 without blocking Phase 2.
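
The ?async=true reservation from the last recommendation could look roughly like this. It is a minimal sketch using only net/http; the handler body, the 501 status choice, and the helper statusFor are assumptions, and the synchronous branch is left empty where the real handler would run the analysis.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// intakeHandler reserves the async API surface: ?async=true is recognized
// but answered with 501 until Phase 3 implements job IDs and SSE.
func intakeHandler(w http.ResponseWriter, r *http.Request) {
	if r.URL.Query().Get("async") == "true" {
		http.Error(w, "async intake not yet implemented", http.StatusNotImplemented)
		return
	}
	// Synchronous path: the real handler would parse photos and run the analysis here.
	w.WriteHeader(http.StatusOK)
}

// statusFor runs the handler against an in-memory request and returns the status code.
func statusFor(target string) int {
	rec := httptest.NewRecorder()
	intakeHandler(rec, httptest.NewRequest(http.MethodPost, target, nil))
	return rec.Code
}

func main() {
	for _, target := range []string{"/api/intake", "/api/intake?async=true"} {
		fmt.Println(target, "->", statusFor(target))
	}
}
```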

Assumptions Log

#	Claim	Section	Risk if Wrong
A1	go-openai vision content uses `MultiContent []ChatMessagePart` field on `ChatCompletionMessage`	Pattern 2	Compile error; verify with `go doc` after install
A2	oMLX supports data URL base64 images in vision requests	Pattern 2	400 error at inference time; may need to write image to temp file and use URL instead
A3	oMLX may not support `response_format: json_object`	Pattern 3	Must use prompt-only JSON mode; 400 if ResponseFormat is set
A4	go-openai latest version is v1.36+	Standard Stack	Run `go get` to verify; version is only needed to confirm stability
A5	Gemma 4 E4B self-reports honest confidence scores with calibration prompting	Pitfall 5	Threshold becomes useless if model is always overconfident; may need threshold tuning
A6	viper `MergeInConfig` can load ai_config.json as secondary config	Pattern 6	Config loading fails silently; test config loading in Wave 0

Sources

Primary (HIGH confidence)

CONTEXT.md 02-CONTEXT.md — locked decisions for Phase 2 (this session)
01-02-SUMMARY.md, 01-04-SUMMARY.md, 01-05-SUMMARY.md — Phase 1 actual implementation (verified codebase state)
internal/config/config.go — existing config struct to extend
internal/api/router.go — existing chi router to add route to
go.mod — confirmed go-openai not yet installed

Secondary (MEDIUM confidence)

ARCHITECTURE.md, STACK.md — project research documents (verified at research time)
CLAUDE.md stack patterns section — photo intake pattern, AI tier routing pattern

Tertiary (LOW/ASSUMED)

go-openai ChatCompletionMessage.MultiContent field name — training knowledge, verify post-install
oMLX response_format support status — not tested; marked ASSUMED
go-openai latest version number — marked ASSUMED

Metadata

Confidence breakdown:

Standard stack: HIGH — go-openai is the decided library; already in STACK.md; pattern for BaseURL swap is verified
Architecture (interface/mock pattern): HIGH — standard Go interface idiom, consistent with Phase 1 patterns
go-openai vision API field names: LOW — exact field names require post-install verification
oMLX JSON mode support: LOW — not tested against live oMLX

Research date: 2026-04-10
Valid until: 2026-05-10 (go-openai API is stable; oMLX is fast-moving, so re-verify JSON mode if the oMLX version changes)
