homelabby/.planning/phases/07-research-agent-search/07-01-PLAN.md
Mikkel Georgsen 34e0803661 docs(07): create phase 7 plans — research agent and NL search
2 plans, 2 waves: SearXNG client + ResearchAgent (wave 1),
NL search endpoint + dashboard search bar (wave 2). Covers AI-04 + UI-03.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-10 07:45:56 +00:00

---
phase: 07-research-agent-search
plan: 01
type: execute
wave: 1
depends_on: []
files_modified:
  - internal/config/config.go
  - internal/netbox/client.go
  - internal/research/searxng.go
  - internal/research/agent.go
  - internal/ai/research.go
  - cmd/hwlab/main.go
autonomous: true
requirements:
  - AI-04
must_haves:
  truths:
    - "A SearXNG HTTP GET to http://10.5.0.129:8080/search?q=...&format=json returns parsed results"
    - "Items with catalog_status=needs_research are polled from NetBox every 10 minutes"
    - "Each needs_research item is enriched by SearXNG + Tier 2 LLM and updated to catalog_status=researched in NetBox"
    - "POST /api/research/trigger fires an immediate research cycle (does not wait for the 10-min ticker)"
  artifacts:
    - path: "internal/research/searxng.go"
      provides: "SearXNGClient implementing ai.ResearchClient"
      exports:
        - SearXNGClient
        - NewSearXNGClient
    - path: "internal/research/agent.go"
      provides: "ResearchAgent background goroutine"
      exports:
        - Agent
        - NewAgent
        - RunOnce
        - Start
  key_links:
    - from: "internal/research/searxng.go"
      to: "http://10.5.0.129:8080/search"
      via: "net/http GET with q and format=json query params"
      pattern: "http.Get.*search.*format=json"
    - from: "internal/research/agent.go"
      to: "internal/netbox/client.go"
      via: 'ListDevicesWithStatus(ctx, "needs_research")'
      pattern: "ListDevicesWithStatus"
    - from: "internal/research/agent.go"
      to: "internal/ai/client.go"
      via: "tier2.AnalyzePhotos (text-only prompt, no photos)"
      pattern: "AnalyzePhotos"
---

Build the real SearXNG research client and the ResearchAgent background worker that closes the AI-04 research loop: items at needs_research are enriched automatically.

Purpose: Replace the Phase 2 NoOpResearchClient stub and deliver the automated enrichment cycle that advances items from needs_research to researched in NetBox.

Output:

  • internal/research/searxng.go — real HTTP client implementing ai.ResearchClient
  • internal/research/agent.go — background worker with ticker + on-demand trigger
  • Config additions for SearXNG URL
  • main.go goroutine start + POST /api/research/trigger handler

<execution_context> @/home/mikkel/.claude/get-shit-done/workflows/execute-plan.md @/home/mikkel/.claude/get-shit-done/templates/summary.md </execution_context>

@.planning/PROJECT.md @.planning/ROADMAP.md @.planning/STATE.md

@internal/ai/research.go @internal/ai/client.go @internal/ai/orchestrator.go @internal/netbox/client.go @internal/netbox/custom_fields.go @internal/netbox/types.go @internal/inventory/catalog_updater.go @internal/config/config.go @cmd/hwlab/main.go @internal/api/router.go

From internal/ai/research.go:

```go
type SearchResult struct {
    Title   string
    URL     string
    Snippet string
}

type ResearchClient interface {
    Search(ctx context.Context, query string) ([]SearchResult, error)
}

type NoOpResearchClient struct{}
// Replace this with SearXNGClient in this plan.
```

From internal/ai/client.go:

```go
type AIClient interface {
    AnalyzePhotos(ctx context.Context, req IntakeRequest) (*IntakeResult, error)
}
// IntakeRequest.PhotosBase64 may be empty; the Tier 2 model accepts text-only
// if the prompt is placed in a separate system message; use a text-only prompt
// for research enrichment (no photos).
```

From internal/netbox/client.go (method to ADD):

```go
// ListDevicesWithStatus returns devices whose catalog_status custom field equals status.
// Use status="needs_research" to find items needing enrichment.
func (c *Client) ListDevicesWithStatus(ctx context.Context, status string) ([]Device, error)
```

From internal/inventory/catalog_updater.go:

```go
func (u *CatalogUpdater) UpdateCatalogStatus(ctx context.Context, deviceID int64, current, next CatalogStatus) (CatalogStatus, error)
```

From internal/inventory (quality_gate.go constants):

```go
const StatusNeedsResearch CatalogStatus = "needs_research"
const StatusResearched    CatalogStatus = "researched"
```

From internal/config/config.go (field to ADD):

```go
SearXNGURL string `mapstructure:"searxng_url"`
// default: "http://10.5.0.129:8080"
// env: HWLAB_SEARXNG_URL
```

Task 1: SearXNG client + netbox.ListDevicesWithStatus

Files: internal/research/searxng.go, internal/research/searxng_test.go, internal/netbox/client.go, internal/config/config.go

Behavior:
- SearXNGClient.Search(ctx, "Intel NIC i350") sends GET http://10.5.0.129:8080/search?q=Intel+NIC+i350&format=json
- HTTP 200 with JSON body {"results":[{"title":"...","url":"...","content":"..."},...]} parses into []ai.SearchResult (map content->Snippet)
- HTTP non-200 returns error with status code
- Empty results array returns empty slice, no error
- Query is URL-encoded (url.QueryEscape or url.Values)
- ListDevicesWithStatus filters via custom_fields cf_catalog_status in go-netbox list call; falls back to client-side filter if API param unavailable
- ListDevicesWithStatus("needs_research") returns only devices with that catalog_status

Action:
Create package internal/research.

internal/research/searxng.go:
- Struct SearXNGClient with baseURL string and httpClient *http.Client (timeout 15s)
- NewSearXNGClient(baseURL string) *SearXNGClient — if baseURL empty, use "http://10.5.0.129:8080"
- Implements ai.ResearchClient interface
- Search method: build GET {baseURL}/search?q={url-encoded query}&format=json, execute, decode JSON
- SearXNG JSON response shape: {"results":[{"title":"","url":"","content":""},...]}
  Map content field to SearchResult.Snippet (SearXNG uses "content" not "snippet")
- Return ([]ai.SearchResult, error). Never panic on empty results.
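
The client described above could look roughly like the following. This is a minimal sketch under stated assumptions, not the final implementation: `SearchResult` is redeclared locally so the file compiles on its own (the real code would return `[]ai.SearchResult`), and `demoSearch` is a hypothetical helper that exercises `Search` against an in-process mock rather than the LAN instance.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
	"time"
)

// SearchResult mirrors ai.SearchResult from the plan; redeclared so the sketch stands alone.
type SearchResult struct {
	Title   string
	URL     string
	Snippet string
}

// searxngResponse models only the response fields the plan relies on.
type searxngResponse struct {
	Results []struct {
		Title   string `json:"title"`
		URL     string `json:"url"`
		Content string `json:"content"` // SearXNG names the snippet "content"
	} `json:"results"`
}

type SearXNGClient struct {
	baseURL    string
	httpClient *http.Client
}

// NewSearXNGClient falls back to the plan's default URL when baseURL is empty.
func NewSearXNGClient(baseURL string) *SearXNGClient {
	if baseURL == "" {
		baseURL = "http://10.5.0.129:8080"
	}
	return &SearXNGClient{baseURL: baseURL, httpClient: &http.Client{Timeout: 15 * time.Second}}
}

// Search issues GET {baseURL}/search?q=...&format=json and maps content to Snippet.
func (c *SearXNGClient) Search(ctx context.Context, query string) ([]SearchResult, error) {
	endpoint := c.baseURL + "/search?q=" + url.QueryEscape(query) + "&format=json"
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, endpoint, nil)
	if err != nil {
		return nil, fmt.Errorf("searxng: build request: %w", err)
	}
	resp, err := c.httpClient.Do(req)
	if err != nil {
		return nil, fmt.Errorf("searxng: request failed: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("searxng: unexpected status %d", resp.StatusCode)
	}
	var body searxngResponse
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return nil, fmt.Errorf("searxng: decode response: %w", err)
	}
	// Allocate a non-nil slice so an empty result set is an empty slice, not nil.
	out := make([]SearchResult, 0, len(body.Results))
	for _, r := range body.Results {
		out = append(out, SearchResult{Title: r.Title, URL: r.URL, Snippet: r.Content})
	}
	return out, nil
}

// demoSearch (hypothetical helper) runs Search against a mock that replies with status and payload.
func demoSearch(status int, payload string) ([]SearchResult, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(status)
		fmt.Fprint(w, payload)
	}))
	defer srv.Close()
	return NewSearXNGClient(srv.URL).Search(context.Background(), "Intel NIC i350")
}

func main() {
	res, err := demoSearch(http.StatusOK, `{"results":[{"title":"i350","url":"https://example.com","content":"quad-port NIC"}]}`)
	fmt.Println(res, err)
}
```

Building the request with `http.NewRequestWithContext` rather than `http.Get` is a choice made here so that cancelling the agent's context also aborts an in-flight search.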

internal/research/searxng_test.go:
- Use httptest.NewServer to mock SearXNG responses
- Test: valid response parses correctly (2 results)
- Test: HTTP 500 returns error
- Test: empty results returns empty slice

internal/netbox/client.go — add ListDevicesWithStatus:
- List all devices (up to 200), filter client-side where CustomFields.CatalogStatus == status
- (go-netbox v4 custom field filtering via query param is schema-dependent; client-side is safer)
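
The client-side filter can be illustrated in isolation. In this sketch `Device` and `CustomFields` are hypothetical stand-ins for the types in internal/netbox/types.go, whose exact shape is not shown in this plan, and `filterByStatus` is an illustrative helper name.

```go
package main

import "fmt"

// CustomFields and Device are assumed shapes, not the project's real types.
type CustomFields struct {
	CatalogStatus string
}

type Device struct {
	ID           int64
	Name         string
	CustomFields CustomFields
}

// filterByStatus keeps only devices whose catalog_status equals status.
// ListDevicesWithStatus would apply this to the (up to 200) devices it lists.
func filterByStatus(devices []Device, status string) []Device {
	matched := make([]Device, 0, len(devices))
	for _, d := range devices {
		if d.CustomFields.CatalogStatus == status {
			matched = append(matched, d)
		}
	}
	return matched
}

func main() {
	all := []Device{
		{ID: 1, Name: "Intel NIC i350", CustomFields: CustomFields{CatalogStatus: "needs_research"}},
		{ID: 2, Name: "Dell R720", CustomFields: CustomFields{CatalogStatus: "researched"}},
	}
	for _, d := range filterByStatus(all, "needs_research") {
		fmt.Println(d.ID, d.Name)
	}
}
```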

internal/config/config.go — add SearXNGURL:
- Field: SearXNGURL string `mapstructure:"searxng_url"`
- Default: v.SetDefault("searxng_url", "http://10.5.0.129:8080")
- Env binding: v.BindEnv("searxng_url", "HWLAB_SEARXNG_URL")
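
The precedence these three settings are meant to produce (environment variable, then config file value, then default) can be shown without the config library. The sketch below uses only the standard library and is not how config.go would be written; `resolveSearXNGURL` is a hypothetical helper that exists only to make the ordering explicit.

```go
package main

import (
	"fmt"
	"os"
)

const defaultSearXNGURL = "http://10.5.0.129:8080"

// resolveSearXNGURL returns the env value if set and non-empty,
// otherwise the config file value, otherwise the default.
func resolveSearXNGURL(fileValue string, lookupEnv func(string) (string, bool)) string {
	if v, ok := lookupEnv("HWLAB_SEARXNG_URL"); ok && v != "" {
		return v
	}
	if fileValue != "" {
		return fileValue
	}
	return defaultSearXNGURL
}

func main() {
	// With no file value, this prints the env override if present, else the default.
	fmt.Println(resolveSearXNGURL("", os.LookupEnv))
}
```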
Verify: cd /home/mikkel/homelabby && go test ./internal/research/... ./internal/config/... -v -count=1 -run TestSearXNG 2>&1 | tail -20

Done: SearXNGClient implements ai.ResearchClient. Tests pass with httptest mock server. ListDevicesWithStatus added to netbox.Client. Config loads SearXNGURL with default.

Task 2: ResearchAgent worker + main.go wiring + trigger endpoint

Files: internal/research/agent.go, internal/research/agent_test.go, internal/api/handlers/research.go, internal/api/router.go, cmd/hwlab/main.go

Behavior:
- Agent.RunOnce(ctx) polls NetBox for needs_research items, for each: builds a text-only search query from item Name, calls SearXNGClient.Search, sends results to Tier 2 LLM with a research prompt, patches NetBox custom fields (ai_notes, product_url from first result URL), transitions status to researched via CatalogUpdater
- Agent.Start(ctx, interval) runs RunOnce on ticker; logs "research agent: cycle complete, enriched N items"
- If SearXNG returns 0 results for an item, log warning and skip (do not change status)
- Tier 2 LLM research prompt: "You are enriching a hardware inventory record. Item: {name}. Search results: {formatted snippets}. Return JSON: {\"ai_notes\": \"...\", \"product_url\": \"...\"}"
- POST /api/research/trigger responds 202 Accepted and fires RunOnce in a goroutine (non-blocking)
- Query sanitization: strip characters outside [a-zA-Z0-9 .\-_] before passing to SearXNG (the hyphen must be escaped; unescaped, .-_ is a character range)

Action:
internal/research/agent.go:
- Struct Agent with fields: nbClient *netbox.Client, researchClient ai.ResearchClient, tier2 ai.AIClient, updater *inventory.CatalogUpdater
- NewAgent(nb *netbox.Client, rc ai.ResearchClient, tier2 ai.AIClient, updater *inventory.CatalogUpdater) *Agent
- sanitizeQuery(s string) string: regexp [^a-zA-Z0-9 .\-_]+ replaced with space, strings.TrimSpace
- RunOnce(ctx context.Context) (enriched int, err error):
  1. ListDevicesWithStatus(ctx, "needs_research")
  2. For each device:
     a. query = sanitizeQuery(device.Name)
     b. results = researchClient.Search(ctx, query); skip if 0 results
     c. Build text prompt with top 3 results (title + snippet)
     d. tier2.AnalyzePhotos(ctx, IntakeRequest{PhotosBase64: nil, SystemPrompt: researchPrompt})
        NOTE: IntakeRequest may not have SystemPrompt; build the research prompt as the text part of the multimodal request by putting it in a single text-only message. Check IntakeRequest fields; if no SystemPrompt, use a wrapper: set PhotosBase64 to nil and pass the assembled prompt text in a way the TierClient accepts.
        ALTERNATIVE if IntakeRequest does not support text-only: use go-openai directly via a new TierClient method: add TextComplete(ctx, prompt) (string, error) that posts a simple text ChatCompletion (no images). Prefer this approach for clarity.
     e. Parse response for ai_notes and product_url
     f. Patch NetBox: PatchCustomFields with ai_notes + product_url (if non-empty)
     g. UpdateCatalogStatus(ctx, id, StatusNeedsResearch, StatusResearched)
     h. enriched++
  3. Return enriched count
- Start(ctx context.Context, interval time.Duration): log.Printf("research agent: starting, interval=%v", interval), RunOnce immediately, then ticker loop until ctx.Done()

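
The control flow of RunOnce can be sketched with small interfaces in place of the concrete clients. Everything named `deviceLister`, `searcher`, `enricher`, and the `fake*` types is hypothetical and introduced only for illustration; `Enrich` collapses steps c through g into one call. Continuing past a per-item failure instead of aborting the cycle is an assumption, since the plan only specifies the zero-results case.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"regexp"
	"strings"
)

type SearchResult struct{ Title, URL, Snippet string }

type Device struct {
	ID   int64
	Name string
}

// Hypothetical narrow interfaces standing in for the NetBox client,
// ai.ResearchClient, and the LLM + NetBox patch + status transition steps.
type deviceLister interface {
	ListDevicesWithStatus(ctx context.Context, status string) ([]Device, error)
}
type searcher interface {
	Search(ctx context.Context, query string) ([]SearchResult, error)
}
type enricher interface {
	Enrich(ctx context.Context, d Device, top []SearchResult) error
}

var unsafeChars = regexp.MustCompile(`[^a-zA-Z0-9 .\-_]+`)

// sanitizeQuery replaces runs of disallowed characters with a space and trims.
func sanitizeQuery(s string) string {
	return strings.TrimSpace(unsafeChars.ReplaceAllString(s, " "))
}

type Agent struct {
	nb deviceLister
	rc searcher
	en enricher
}

// RunOnce enriches every needs_research device and returns how many succeeded.
func (a *Agent) RunOnce(ctx context.Context) (int, error) {
	devices, err := a.nb.ListDevicesWithStatus(ctx, "needs_research")
	if err != nil {
		return 0, fmt.Errorf("research agent: list devices: %w", err)
	}
	enriched := 0
	for _, d := range devices {
		results, err := a.rc.Search(ctx, sanitizeQuery(d.Name))
		if err != nil {
			log.Printf("research agent: search failed for device %d: %v", d.ID, err)
			continue
		}
		if len(results) == 0 {
			// Status stays needs_research so the next cycle retries this item.
			log.Printf("research agent: no results for device %d, skipping", d.ID)
			continue
		}
		if len(results) > 3 {
			results = results[:3] // only the top 3 go into the prompt
		}
		if err := a.en.Enrich(ctx, d, results); err != nil {
			log.Printf("research agent: enrich failed for device %d: %v", d.ID, err)
			continue
		}
		enriched++
	}
	return enriched, nil
}

// Fakes used only by the demo below.
type fakeLister []Device

func (f fakeLister) ListDevicesWithStatus(context.Context, string) ([]Device, error) { return f, nil }

type fakeSearch map[string][]SearchResult

func (f fakeSearch) Search(_ context.Context, q string) ([]SearchResult, error) { return f[q], nil }

type fakeEnricher struct{}

func (fakeEnricher) Enrich(context.Context, Device, []SearchResult) error { return nil }

// demoRunOnce runs one cycle over two devices, only one of which has search results.
func demoRunOnce() (int, error) {
	a := &Agent{
		nb: fakeLister{{ID: 1, Name: "Intel NIC i350!"}, {ID: 2, Name: "Unknown widget"}},
		rc: fakeSearch{"Intel NIC i350": {{Title: "i350", URL: "https://example.com", Snippet: "quad-port"}}},
		en: fakeEnricher{},
	}
	return a.RunOnce(context.Background())
}

func main() {
	fmt.Println(demoRunOnce())
}
```

Depending on interfaces rather than `*netbox.Client` directly is a design choice of this sketch; it is what lets agent_test.go substitute stubs without a live NetBox.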
For the text-only LLM call: add TextComplete to TierClient in internal/ai/client.go:
```go
func (c *TierClient) TextComplete(ctx context.Context, prompt string) (string, error)
```
This does a simple non-vision ChatCompletion with a single user message. Agent uses this.
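
Around the TextComplete call, the agent still has to assemble the prompt (step c) and parse the reply (step e). The sketch below covers those two pieces with the standard library only. `buildResearchPrompt`, `parseEnrichment`, and the `enrichment` struct are hypothetical names, the line layout of the prompt is a formatting choice, and extracting the outermost `{...}` from the reply is an assumption about how tolerant the parser needs to be when a model wraps its JSON in extra text.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

type SearchResult struct{ Title, URL, Snippet string }

// enrichment holds the two fields the research prompt asks the model to return.
type enrichment struct {
	AINotes    string `json:"ai_notes"`
	ProductURL string `json:"product_url"`
}

// buildResearchPrompt formats the top 3 results (title + snippet) into the plan's prompt.
func buildResearchPrompt(name string, results []SearchResult) string {
	if len(results) > 3 {
		results = results[:3]
	}
	var sb strings.Builder
	for i, r := range results {
		fmt.Fprintf(&sb, "%d. %s: %s\n", i+1, r.Title, r.Snippet)
	}
	return fmt.Sprintf(
		"You are enriching a hardware inventory record. Item: %s.\nSearch results:\n%sReturn JSON: {\"ai_notes\": \"...\", \"product_url\": \"...\"}",
		name, sb.String(),
	)
}

// parseEnrichment decodes the first-to-last brace span of the reply as JSON.
func parseEnrichment(reply string) (enrichment, error) {
	var e enrichment
	start, end := strings.Index(reply, "{"), strings.LastIndex(reply, "}")
	if start < 0 || end < start {
		return e, errors.New("research: no JSON object in model reply")
	}
	if err := json.Unmarshal([]byte(reply[start:end+1]), &e); err != nil {
		return e, fmt.Errorf("research: decode enrichment: %w", err)
	}
	return e, nil
}

func main() {
	prompt := buildResearchPrompt("Intel NIC i350", []SearchResult{
		{Title: "Intel I350-T4", URL: "https://example.com", Snippet: "Quad-port gigabit adapter"},
	})
	fmt.Println(prompt)

	e, err := parseEnrichment(`Here is the record: {"ai_notes": "Quad-port NIC", "product_url": "https://example.com"}`)
	fmt.Println(e, err)
}
```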

internal/research/agent_test.go:
- Mock ResearchClient returning 2 fake SearchResults
- Mock AIClient (use existing MockAIClient pattern if available, else minimal struct)
- Mock NetBox (or use a stub struct) — test RunOnce returns enriched=1 for a fake device
- Test sanitizeQuery strips special chars
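
One pitfall the sanitizeQuery test could pin down: inside a character class, an unescaped hyphen between `.` and `_` is read as a range from `.` (0x2E) to `_` (0x5F). That range admits characters such as `;`, `?`, `=` and `/`, and it excludes the hyphen itself. The sketch compares the two spellings; `loosePattern`, `strictPattern`, and `sanitizeWith` are illustrative names.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// Unescaped hyphen: ".-_" is a range, so ';' passes through and '-' is stripped.
	loosePattern = regexp.MustCompile(`[^a-zA-Z0-9 .-_]+`)
	// Escaped hyphen: only letters, digits, space, '.', '-' and '_' survive.
	strictPattern = regexp.MustCompile(`[^a-zA-Z0-9 .\-_]+`)
)

// sanitizeWith replaces every run matched by re with a space, then trims.
func sanitizeWith(re *regexp.Regexp, s string) string {
	return strings.TrimSpace(re.ReplaceAllString(s, " "))
}

func main() {
	for _, in := range []string{"a;b", "X-1", "i350_T4.v2"} {
		fmt.Printf("%q  loose=%q  strict=%q\n", in, sanitizeWith(loosePattern, in), sanitizeWith(strictPattern, in))
	}
}
```

A table-driven test over inputs like these would fail as soon as the escaped form were replaced by the unescaped one.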

internal/api/handlers/research.go:
- ResearchHandler struct with agent *research.Agent
- NewResearchHandler(agent *research.Agent) *ResearchHandler
- TriggerResearch(w http.ResponseWriter, r *http.Request):
  go func() { agent.RunOnce(context.Background()) }()
  w.WriteHeader(http.StatusAccepted)
  json.NewEncoder(w).Encode(map[string]string{"status": "accepted"})
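
A slightly fuller version of the handler, as a sketch: the `runner` interface and `fakeAgent` are hypothetical and exist so the example runs without the real `*research.Agent`. Logging the RunOnce error and setting a Content-Type header are additions not stated in the plan.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
)

// runner is an assumed narrow interface that *research.Agent would satisfy.
type runner interface {
	RunOnce(ctx context.Context) (int, error)
}

type ResearchHandler struct {
	agent runner
}

// TriggerResearch starts a cycle in the background and answers 202 immediately.
func (h *ResearchHandler) TriggerResearch(w http.ResponseWriter, r *http.Request) {
	// context.Background() is used because r.Context() is cancelled once the handler returns.
	go func() {
		if _, err := h.agent.RunOnce(context.Background()); err != nil {
			log.Printf("research agent: triggered cycle failed: %v", err)
		}
	}()
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusAccepted)
	_ = json.NewEncoder(w).Encode(map[string]string{"status": "accepted"})
}

// fakeAgent signals on done when RunOnce is invoked.
type fakeAgent struct{ done chan struct{} }

func (f *fakeAgent) RunOnce(context.Context) (int, error) {
	close(f.done)
	return 0, nil
}

// demoTrigger calls the handler with a recorder and waits for the background run.
func demoTrigger() (int, string) {
	agent := &fakeAgent{done: make(chan struct{})}
	h := &ResearchHandler{agent: agent}
	rec := httptest.NewRecorder()
	h.TriggerResearch(rec, httptest.NewRequest(http.MethodPost, "/api/research/trigger", nil))
	<-agent.done
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := demoTrigger()
	fmt.Println(code, body)
}
```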

internal/api/router.go:
- Add researchHandler *handlers.ResearchHandler parameter to NewRouter signature
- Add r.Post("/research/trigger", researchHandler.TriggerResearch) inside r.Route("/api", ...)
- If researchHandler is nil, register an unavailable handler (same pattern as advisorHandler)

cmd/hwlab/main.go:
- Import internal/research
- After config load: searxngClient := research.NewSearXNGClient(cfg.SearXNGURL)
- researchAgent := research.NewAgent(nbClient, searxngClient, tier2, catalogUpdater)
- go researchAgent.Start(ctx, 10*time.Minute)
- researchHandler := handlers.NewResearchHandler(researchAgent)
- Pass researchHandler to api.NewRouter(...)
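
The loop that `go researchAgent.Start(ctx, 10*time.Minute)` launches can be sketched as below. `Start` is written as a free function over a hypothetical `runner` interface so the example is self-contained; in the plan it is a method on Agent. `countingRunner` is a test double that cancels the context after a fixed number of cycles so the demo terminates.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"
)

// runner is an assumed narrow interface that *research.Agent would satisfy.
type runner interface {
	RunOnce(ctx context.Context) (int, error)
}

// Start runs one cycle immediately, then one per tick, until ctx is cancelled.
func Start(ctx context.Context, r runner, interval time.Duration) {
	log.Printf("research agent: starting, interval=%v", interval)
	runCycle(ctx, r)
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			runCycle(ctx, r)
		}
	}
}

// runCycle executes one RunOnce and logs its outcome.
func runCycle(ctx context.Context, r runner) {
	n, err := r.RunOnce(ctx)
	if err != nil {
		log.Printf("research agent: cycle failed: %v", err)
		return
	}
	log.Printf("research agent: cycle complete, enriched %d items", n)
}

// countingRunner cancels the context once stopAt cycles have run.
type countingRunner struct {
	calls  int
	stopAt int
	cancel context.CancelFunc
}

func (c *countingRunner) RunOnce(context.Context) (int, error) {
	c.calls++
	if c.calls >= c.stopAt {
		c.cancel()
	}
	return 0, nil
}

// demoStart runs the loop with a short interval and returns the number of cycles executed.
func demoStart() int {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	r := &countingRunner{stopAt: 3, cancel: cancel}
	Start(ctx, r, 5*time.Millisecond) // blocks until the runner cancels ctx
	return r.calls
}

func main() {
	fmt.Println("cycles run:", demoStart())
}
```

Because `Start` blocks until the context is cancelled, main.go runs it in its own goroutine and relies on the server's shutdown context to stop it.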
Verify: cd /home/mikkel/homelabby && go build ./... && go test ./internal/research/... -v -count=1 2>&1 | tail -30

Done: go build passes. Agent tests pass. POST /api/research/trigger wired in router. Research agent goroutine starts on server launch with 10-minute interval.

<threat_model>

Trust Boundaries

| Boundary | Description |
|----------|-------------|
| agent → SearXNG | AI-generated query text leaves the process and reaches the search engine |
| SearXNG → agent | External search results (HTML snippets) enter the process and are forwarded to LLM |
| trigger endpoint → agent | HTTP request from frontend triggers a research cycle |

STRIDE Threat Register

| Threat ID | Category | Component | Disposition | Mitigation Plan |
|-----------|----------|-----------|-------------|-----------------|
| T-07-01 | Tampering | sanitizeQuery | mitigate | Strip `[^a-zA-Z0-9 .\-_]+` before dispatch; test with adversarial input in unit test |
| T-07-02 | Information Disclosure | SearXNG response snippets | accept | SearXNG is self-hosted LAN service; snippets never stored, only passed to LLM |
| T-07-03 | Denial of Service | POST /api/research/trigger | mitigate | Trigger fires goroutine but RunOnce is bounded per item; no queuing needed for MVP rate |
| T-07-04 | Spoofing | SearXNG base URL in config | accept | LAN-only service at fixed IP; no auth required by design |
</threat_model>
1. `go build ./...` passes with no errors
2. `go test ./internal/research/...` all pass
3. SearXNG integration (manual): `curl "http://10.5.0.129:8080/search?q=Intel+i350&format=json"` returns JSON
4. Trigger endpoint: `curl -X POST http://localhost:8080/api/research/trigger` returns 202
5. Log line "research agent: starting, interval=10m0s" appears on server start

<success_criteria>

  • SearXNGClient.Search returns parsed []ai.SearchResult from live SearXNG instance
  • ResearchAgent.RunOnce enriches needs_research items end-to-end: search → LLM → NetBox patch → status transition
  • Research cycle runs every 10 minutes automatically and on demand via POST /api/research/trigger
  • All queries sanitized before SearXNG dispatch
  • go build clean, all new tests pass </success_criteria>
After completion, create `.planning/phases/07-research-agent-search/07-01-SUMMARY.md`