docs(28): research phase domain — Ollama API, Hermes config surface, cost tracking, dashboard

2026-04-02 16:45:03 +00:00 · 2026-04-02 16:45:03 +00:00 · ee7f47c4d0
commit ee7f47c4d0
parent 5fd7744516
1 changed files with 522 additions and 0 deletions
--- a/.planning/phases/28-ollama-integration/28-RESEARCH.md
+++ b/.planning/phases/28-ollama-integration/28-RESEARCH.md
@@ -0,0 +1,522 @@
+# Phase 28: Ollama Integration & Agent Surface - Research
+
+**Researched:** 2026-04-01
+**Domain:** Ollama HTTP API, Hermes adapter extension, agent dashboard UI, cost tracking
+**Confidence:** HIGH
+
+## Summary
+
+Phase 28 adds three distinct capabilities on top of the completed Phase 27 Hermes adapter: (1) Ollama detection and model catalog — Nexus queries `localhost:11434` to detect Ollama, lists available models, and ships a static JSON catalog for hardware-aware recommendations; (2) Hermes config surface extension — the model field in `config-fields.tsx` becomes a dropdown fed by live Ollama discovery rather than a free-text input, and a new `base_url`/`provider: custom` adapterConfig field routes Hermes to the local endpoint; (3) Hermes runtime data in the dashboard — `stateJson` in `agentRuntimeState` is the right place to store Hermes-specific runtime metadata (model name, native skill count, memory usage from Ollama's `/api/ps`), and the `AgentOverview` component in `AgentDetail.tsx` is the right insertion point.
+
+The most important finding is that **Hermes does not have a native "ollama" provider**. Ollama is configured as a custom OpenAI-compatible endpoint: `provider: custom`, `base_url: http://localhost:11434/v1`. The model field passes the Ollama model name bare (e.g. `qwen2.5-coder:32b`). This shapes OLLA-02, OLLA-03, and the `config-fields.tsx` changes.
+
+For cost tracking (HERM-06): `hermes-paperclip-adapter@0.2.1` already parses `token_usage` and `cost` regex patterns from Hermes stdout. When Hermes returns non-zero usage, `heartbeat.ts:updateRuntimeState` already calls `costService.createEvent`. The only gap is that Hermes running local Ollama models will have `costUsd = undefined` (no billing) — the infrastructure handles this correctly (zero cost event is suppressed when `additionalCostCents === 0 && !hasTokenUsage`). No cost tracking code changes are needed for local models; the planner just needs to verify the regex path works end-to-end.
+
+For HERM-05 (skill visibility): `syncHermesNativeSkills` already exists in `skillRegistryService` and is already called from the `GET /skill-registry/agents/:agentId/skills` route when `adapterType === "hermes_local"`. The Hermes adapter's `listHermesSkills` function merges Paperclip-managed and native skills. The integration is already complete at the data layer. What is missing is the UI surface in the Skills tab that renders the `originLabel: "Hermes skill"` / `readOnly: true` entries distinctly from managed skills.
+
+**Primary recommendation:** Implement as four focused plans — (P01) server-side Ollama service + routes; (P02) Hermes config-fields UI extension for Ollama model selection; (P03) dashboard Hermes runtime info card; (P04) model catalog JSON + recommendation logic.
+
+---
+
+<user_constraints>
+## User Constraints (from CONTEXT.md)
+
+### Locked Decisions
+All implementation choices are at Claude's discretion — discuss phase was skipped per user setting. Use ROADMAP phase goal, success criteria, and codebase conventions to guide decisions.
+
+### Claude's Discretion
+All implementation choices are at Claude's discretion.
+
+### Deferred Ideas (OUT OF SCOPE)
+None — discuss phase skipped. Refer to REQUIREMENTS.md for in-scope requirements.
+
+Out of scope per REQUIREMENTS.md:
+- Multi-provider model routing (Hermes can use OpenRouter/Anthropic/OpenAI but that's Hermes config, not Nexus)
+- Hermes MCP server management
+- Custom Hermes skill authoring UI
+- DFLT-01 through DFLT-04 (Phase 29)
+</user_constraints>
+
+---
+
+<phase_requirements>
+## Phase Requirements
+
+| ID | Description | Research Support |
+|----|-------------|------------------|
+| OLLA-01 | Nexus detects whether Ollama is installed locally | HTTP probe to `localhost:11434/api/version`; new server service `ollamaService` |
+| OLLA-02 | User can see list of available Ollama models when configuring a Hermes agent | `GET /api/tags` from Ollama HTTP API; new server route `GET /companies/:id/ollama/models`; config-fields.tsx dropdown |
+| OLLA-03 | User can configure a Hermes agent with any local Ollama model | Sets `adapterConfig.model = <model-name>`, `adapterConfig.provider = "custom"`, `adapterConfig.base_url = "http://localhost:11434/v1"` |
+| OLLA-04 | Model recommendation based on RAM/VRAM from a shipped catalog | Static JSON catalog in `server/src/data/ollama-model-catalog.json`; server reads `os.totalmem()` to filter; returned with model list |
+| OLLA-05 | If Ollama is not present, user is offered installation instructions | Ollama status endpoint returns `installed: false` + `installUrl`; UI shows callout in Hermes config-fields |
+| HERM-05 | Nexus-managed skills visible alongside Hermes native skills in agent config | Already wired at data layer — UI Skills tab needs `originLabel: "Hermes skill"` rendering distinction |
+| HERM-06 | Cost tracking captures token usage and model costs for Hermes agents | Infrastructure already handles this; verify end-to-end with local Ollama (zero cost is correct, no change needed) |
+| HERM-07 | Dashboard shows Hermes-specific info (model name, memory usage, native skill count) | Store in `agentRuntimeState.stateJson`; render in `AgentOverview` component |
+</phase_requirements>
+
+---
+
+## Standard Stack
+
+### Core
+| Library | Version | Purpose | Why Standard |
+|---------|---------|---------|--------------|
+| Node.js `os` module | built-in | Read total system RAM | Already used in heartbeat.ts; no new dep |
+| Node.js `fetch` | Node 18+ built-in | HTTP calls to Ollama API at localhost:11434 | Already confirmed available in runtime |
+| `hermes-paperclip-adapter` | 0.2.1 (installed) | Hermes execution, skill sync, model detection | Already wired into adapter registry |
+
+### No New Dependencies Required
+All capabilities needed for Phase 28 are achievable with existing infrastructure:
+- Ollama HTTP API is probed with `fetch` (built-in Node 18+)
+- Model catalog is a static JSON file in the server package
+- RAM reading uses `os.totalmem()` (built-in)
+- Hermes Ollama configuration uses existing `adapterConfig` fields
+
+## Architecture Patterns
+
+### Recommended Project Structure (new files)
+
+```
+server/src/services/ollama.ts          # ollamaService — detect + list models
+server/src/routes/ollama.ts            # HTTP routes: /companies/:id/ollama/status, /models
+server/src/data/ollama-model-catalog.json  # shipped catalog for OLLA-04
+server/src/__tests__/ollama-service.test.ts  # unit tests for ollamaService
+ui/src/api/ollama.ts                   # ollamaApi client — wraps server routes
+```
+
+### Pattern 1: Ollama Service (server-side)
+
+```typescript
+// server/src/services/ollama.ts
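+const OLLAMA_BASE_URL = process.env.OLLAMA_BASE_URL ?? "http://localhost:11434";
+const OLLAMA_TIMEOUT_MS = 3000;
+// Assumed value: the snippet below references INSTALL_URL but never defined it.
+const INSTALL_URL = "https://ollama.com/download";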
+
+export interface OllamaStatus {
+  installed: boolean;
+  version: string | null;
+  installUrl: string;
+}
+
+export interface OllamaModel {
+  name: string;           // e.g. "qwen2.5-coder:32b"
+  parameterSize: string;  // e.g. "32.8B" from /api/tags details
+  quantization: string;   // e.g. "Q4_K_M"
+  sizeBytes: number;
+  family: string;         // e.g. "qwen2"
+  recommended: boolean;   // from catalog match + RAM check
+  recommendationReason: string | null;
+}
+
+export async function detectOllama(): Promise<OllamaStatus> {
+  const controller = new AbortController();
+  const timeout = setTimeout(() => controller.abort(), OLLAMA_TIMEOUT_MS);
+  try {
+    const res = await fetch(`${OLLAMA_BASE_URL}/api/version`, {
+      signal: controller.signal,
+    });
+    if (!res.ok) return { installed: false, version: null, installUrl: INSTALL_URL };
+    const body = await res.json() as { version?: string };
+    return { installed: true, version: body.version ?? null, installUrl: INSTALL_URL };
+  } catch {
+    return { installed: false, version: null, installUrl: INSTALL_URL };
+  } finally {
+    clearTimeout(timeout);
+  }
+}
+```
+
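+**Why this pattern:** Matches the existing codex-models.ts pattern: HTTP fetch with a timeout, and graceful failure that returns empty/false rather than throwing. The 3-second timeout bounds the probe when the port accepts the connection but never answers; when Ollama is not installed the connection is typically refused at once and falls into the same `catch`.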
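+
+A minimal sketch of the companion `listOllamaModels()` call, assuming the `/api/tags` response shape listed under Code Examples. The names `OllamaTag`, `OllamaModelRow`, `FetchLike`, and `toOllamaModel`, and the injected `fetchImpl` parameter, are hypothetical; injection is used here only so the mapping can be exercised without a running daemon. The `recommended` fields are left to the catalog step.
+
+```typescript
+// Sketch only: maps the Ollama-native /api/tags payload to model rows.
+interface OllamaTag {
+  name: string;
+  size: number;
+  details?: { family?: string; parameter_size?: string; quantization_level?: string };
+}
+
+interface OllamaModelRow {
+  name: string;
+  parameterSize: string;
+  quantization: string;
+  sizeBytes: number;
+  family: string;
+}
+
+// Hypothetical minimal fetch signature; the global fetch satisfies it at runtime.
+type FetchLike = (
+  url: string,
+  init?: { signal?: unknown },
+) => Promise<{ ok: boolean; json(): Promise<unknown> }>;
+
+// Pure mapping step: snake_case API fields to the camelCase row used by the UI.
+function toOllamaModel(tag: OllamaTag): OllamaModelRow {
+  return {
+    name: tag.name,
+    parameterSize: tag.details?.parameter_size ?? "unknown",
+    quantization: tag.details?.quantization_level ?? "unknown",
+    sizeBytes: tag.size,
+    family: tag.details?.family ?? "unknown",
+  };
+}
+
+// Same failure policy as detectOllama: any error or timeout yields an empty list.
+async function listOllamaModels(
+  fetchImpl: FetchLike,
+  baseUrl = "http://localhost:11434",
+  timeoutMs = 3000,
+): Promise<OllamaModelRow[]> {
+  const controller = new AbortController();
+  const timer = setTimeout(() => controller.abort(), timeoutMs);
+  try {
+    const res = await fetchImpl(`${baseUrl}/api/tags`, { signal: controller.signal });
+    if (!res.ok) return [];
+    const body = (await res.json()) as { models?: OllamaTag[] };
+    return (body.models ?? []).map(toOllamaModel);
+  } catch {
+    return [];
+  } finally {
+    clearTimeout(timer);
+  }
+}
+```
+
+Keeping `toOllamaModel` separate from the network call lets the unit tests cover the field mapping with plain fixtures.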
+
+### Pattern 2: Ollama Routes (mounted under /companies/:companyId)
+
+```
+GET /companies/:companyId/ollama/status
+  → { installed: boolean, version: string|null, installUrl: string }
+
+GET /companies/:companyId/ollama/models
+  → { models: OllamaModel[], ramGb: number }
+```
+
+Both routes use existing `assertCompanyAccess(req, companyId)` authz pattern from `agents.ts`.
+
+Mount in `server/src/routes/index.ts` alongside the existing `agentsRoutes`.
+
+### Pattern 3: Hermes Config-Fields Enhancement
+
+The existing `HermesLocalConfigFields` in `config-fields.tsx` has a free-text `Model` input. For Ollama support, it becomes a hybrid: dropdown (when Ollama is present) + manual entry fallback.
+
+```tsx
+// Fetch Ollama status + models (only for hermes_local adapter)
+const { data: ollamaStatus } = useQuery({
+  queryKey: ["ollama", "status", companyId],
+  queryFn: () => ollamaApi.status(companyId!),
+  enabled: Boolean(companyId),
+});
+
+const { data: ollamaModels } = useQuery({
+  queryKey: ["ollama", "models", companyId],
+  queryFn: () => ollamaApi.models(companyId!),
+  enabled: Boolean(companyId && ollamaStatus?.installed),
+});
+```
+
+When `ollamaStatus.installed === false`, render an install callout (OLLA-05) instead of the dropdown.
+
+When a local Ollama model is selected, `buildHermesConfig` (or `mark`) must also set `provider: "custom"` and `base_url: "http://localhost:11434/v1"` in `adapterConfig`. This is the critical mapping from OLLA-03.
+
+### Pattern 4: Hermes Runtime Data in stateJson (HERM-07)
+
+`agentRuntimeState.stateJson` is `jsonb` typed as `Record<string, unknown>`. The heartbeat service writes this via `updateRuntimeState`. The Hermes adapter's `execute.ts` already returns `resultJson` with `session_id`, `usage`, and `cost_usd`.
+
+For HERM-07 runtime data (model name, native skill count, memory usage), the server-side approach is:
+- After a Hermes run completes, read `resultJson.result` and extract/store model + detected skill count into `stateJson`
+- Optionally query Ollama `/api/ps` (running models) to get `size_vram` for memory usage display
+
+**Insertion point for stateJson patch:** `heartbeat.ts:updateRuntimeState` already calls `db.update(agentRuntimeState).set(...)`. Add a `stateJson` merge here when `adapterType === "hermes_local"`.
+
+**UI insertion point:** `AgentOverview` component in `AgentDetail.tsx` (line ~1183). Add a `HermesRuntimeCard` component after the charts section, gated by `agent.adapterType === "hermes_local"`:
+
+```tsx
+{agent.adapterType === "hermes_local" && runtimeState && (
+  <HermesRuntimeCard runtimeState={runtimeState} />
+)}
+```
+
+### Pattern 5: Model Catalog JSON (OLLA-04)
+
+```json
+// server/src/data/ollama-model-catalog.json
+{
+  "models": [
+    {
+      "family": "qwen2",
+      "variants": [
+        { "name": "qwen2.5-coder:7b",  "ramGb": 5,  "vramGb": 5,  "quality": "fast" },
+        { "name": "qwen2.5-coder:32b", "ramGb": 22, "vramGb": 22, "quality": "best" }
+      ]
+    },
+    {
+      "family": "llama",
+      "variants": [
+        { "name": "llama3.2:3b",  "ramGb": 3,  "vramGb": 3,  "quality": "fast" },
+        { "name": "llama3.1:8b",  "ramGb": 6,  "vramGb": 6,  "quality": "balanced" },
+        { "name": "llama3.1:70b", "ramGb": 48, "vramGb": 48, "quality": "best" }
+      ]
+    },
+    {
+      "family": "mistral",
+      "variants": [
+        { "name": "mistral:7b",   "ramGb": 5,  "vramGb": 5,  "quality": "balanced" },
+        { "name": "mistral-small:22b", "ramGb": 14, "vramGb": 14, "quality": "best" }
+      ]
+    },
+    {
+      "family": "phi",
+      "variants": [
+        { "name": "phi4:14b",    "ramGb": 10, "vramGb": 10, "quality": "balanced" }
+      ]
+    },
+    {
+      "family": "deepseek",
+      "variants": [
+        { "name": "deepseek-r1:7b",  "ramGb": 5,  "vramGb": 5,  "quality": "reasoning" },
+        { "name": "deepseek-r1:32b", "ramGb": 22, "vramGb": 22, "quality": "reasoning" }
+      ]
+    }
+  ]
+}
+```
+
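+Recommendation logic: `os.totalmem()` returns total RAM in bytes, so convert it to GB first. Use 75% of that as the usable budget (leaving headroom for the OS), keep catalog entries where `ramGb <= totalRamGb * 0.75`, and return the highest-quality variant within budget plus a `recommendationReason` string. The catalog's `quality` labels (`fast`, `balanced`, `best`, `reasoning`) have no defined ordering yet, so the plan must pick one.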
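+
+One possible shape for `buildModelRecommendation()`, sketched under two assumptions that are not in the research: the total memory is passed in as a byte count (the server would supply `os.totalmem()`), and the largest RAM footprint that fits stands in for "highest quality" because the catalog labels are unordered. The interface names are hypothetical.
+
+```typescript
+// Sketch only: pick the largest catalog variant that fits in 75% of total RAM.
+interface CatalogVariant { name: string; ramGb: number; vramGb: number; quality: string; }
+interface CatalogFamily { family: string; variants: CatalogVariant[]; }
+interface Recommendation { name: string; recommendationReason: string; }
+
+const BYTES_PER_GB = 1024 * 1024 * 1024;
+const USABLE_RAM_FRACTION = 0.75; // leave headroom for the OS
+
+function buildModelRecommendation(
+  totalMemBytes: number,
+  catalog: CatalogFamily[],
+): Recommendation | null {
+  const budgetGb = (totalMemBytes / BYTES_PER_GB) * USABLE_RAM_FRACTION;
+  let best: CatalogVariant | null = null;
+  for (const family of catalog) {
+    for (const variant of family.variants) {
+      if (variant.ramGb > budgetGb) continue; // does not fit the budget
+      // Assumption: a larger footprint is used as a proxy for higher quality.
+      if (best === null || variant.ramGb > best.ramGb) best = variant;
+    }
+  }
+  if (best === null) return null; // nothing in the catalog fits this machine
+  return {
+    name: best.name,
+    recommendationReason:
+      `Needs about ${best.ramGb} GB; ${budgetGb.toFixed(1)} GB of RAM is usable`,
+  };
+}
+```
+
+With the catalog above, a 16 GB machine has a 12 GB budget, so the 5 GB and 10 GB variants qualify and the 22 GB ones do not.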
+
+### Anti-Patterns to Avoid
+
+- **Polling Ollama in a loop:** Use a 60-second TTL in-memory cache (same as codex-models.ts `MODELS_CACHE_TTL_MS`). Do not re-probe on every API call.
+- **Blocking server startup on Ollama check:** Ollama detection is on-demand (per-request), not at startup.
+- **Hard-coding `localhost:11434`:** Always read from `process.env.OLLAMA_BASE_URL ?? "http://localhost:11434"` so users with non-standard ports work.
+- **Requiring Ollama for Hermes:** All Ollama paths are optional. Hermes without Ollama continues to work unchanged. Never throw when Ollama is absent.
+- **Overwriting all of stateJson:** Merge into stateJson using spread, never replace: `stateJson: { ...existingState, hermesModel: ..., hermesNativeSkillCount: ... }`.
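+
+The caching rule in the first bullet can be sketched as a small in-memory TTL wrapper. `createTtlCache` and its injectable `now` clock are hypothetical names, not taken from codex-models.ts; the 60-second figure is the TTL named above.
+
+```typescript
+// Sketch only: cache one async result for ttlMs, sharing in-flight loads.
+function createTtlCache<T>(
+  load: () => Promise<T>,
+  ttlMs: number,
+  now: () => number = Date.now,
+): () => Promise<T> {
+  let cached: { value: Promise<T>; expiresAt: number } | null = null;
+  return function get(): Promise<T> {
+    const t = now();
+    // Serve the cached promise while it is still fresh.
+    if (cached !== null && t < cached.expiresAt) return cached.value;
+    const value = load();
+    const entry = { value, expiresAt: t + ttlMs };
+    cached = entry;
+    // Do not keep a rejected load cached; the next call retries.
+    value.catch(() => {
+      if (cached === entry) cached = null;
+    });
+    return value;
+  };
+}
+
+// Example wiring (the loader is a stand-in for the real /api/tags call).
+const getModelsCached = createTtlCache(() => Promise.resolve(["llama3.1:8b"]), 60_000);
+```
+
+Caching the promise rather than the resolved value means concurrent requests during a probe share one Ollama call.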
+
+---
+
+## Don't Hand-Roll
+
+| Problem | Don't Build | Use Instead | Why |
+|---------|-------------|-------------|-----|
+| Ollama connectivity check | Custom TCP socket probe | `fetch` to `/api/version` with AbortController timeout | Reuses existing pattern from codex-models.ts |
+| YAML config parsing | Full YAML parser | Existing `parseModelFromConfig` in hermes adapter | Already ships in hermes-paperclip-adapter/dist |
+| System RAM reading | Shell commands | `os.totalmem()` | Built-in, no dep, works cross-platform |
+| Token cost tracking | New billing logic | Existing `costService.createEvent` + `updateRuntimeState` | Already handles Hermes via regex-extracted usage |
+
+---
+
+## Common Pitfalls
+
+### Pitfall 1: Hermes Does Not Have an "ollama" Provider
+**What goes wrong:** Setting `adapterConfig.provider = "ollama"` causes Hermes to fail because "ollama" is not an entry in `VALID_PROVIDERS` in `constants.js`.
+**Why it happens:** Ollama mimics the OpenAI API, so Hermes treats it as `provider: "custom"` with `base_url: "http://localhost:11434/v1"`.
+**How to avoid:** When a user selects an Ollama model, always write `provider: "custom"` and `base_url: "http://localhost:11434/v1"` into `adapterConfig`. These fields are already in the Hermes config schema (see `agentConfigurationDoc`).
+**Warning signs:** Hermes stderr shows "unknown provider" or authentication errors during local model runs.
+
+### Pitfall 2: Ollama API Returns Models at `/api/tags`, Not `/v1/models`
+**What goes wrong:** Using the OpenAI-compat endpoint `/v1/models` to list models misses the `details` object (`parameter_size`, `quantization_level`, `family`) needed for OLLA-04.
+**Why it happens:** `/v1/models` is OpenAI-compat, `/api/tags` is Ollama-native with richer data.
+**How to avoid:** Use `GET localhost:11434/api/tags` for model listing (returns `details.parameter_size`, `details.family`). Use `/v1/models` only if passing through to Hermes.
+
+### Pitfall 3: stateJson Merge Requires Read-Modify-Write
+**What goes wrong:** `db.update(agentRuntimeState).set({ stateJson: newData })` overwrites other fields stored by other parts of the system.
+**Why it happens:** Drizzle `.set()` replaces the entire column value.
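+**How to avoid:** Either use a Postgres jsonb merge, e.g. ``stateJson: sql`coalesce(${agentRuntimeState.stateJson}, '{}'::jsonb) || ${JSON.stringify(patch)}::jsonb` `` (the `coalesce` guards against a NULL column, since `NULL || jsonb` yields `NULL`; the `||` merge is shallow, top-level keys only), or read the existing `stateJson` first and spread it. The existing `ensureRuntimeState` call in `updateRuntimeState` already reads the row.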
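+
+For the read-then-spread variant, a minimal sketch of the merge step on its own. `mergeStateJson` and `HermesStatePatch` are hypothetical names; the database read and write around it are omitted.
+
+```typescript
+// Sketch only: merge a Hermes patch into whatever stateJson already holds.
+type StateJson = Record<string, unknown>;
+
+interface HermesStatePatch {
+  hermesModel?: string;
+  hermesNativeSkillCount?: number;
+  hermesMemoryBytes?: number | null;
+}
+
+function mergeStateJson(
+  existing: StateJson | null | undefined,
+  patch: HermesStatePatch,
+): StateJson {
+  // Existing keys are spread first so unrelated fields survive; patch keys win.
+  return { ...(existing ?? {}), ...patch };
+}
+```
+
+This is a shallow merge, which matches the flat `hermes*` keys proposed for HERM-07.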
+
+### Pitfall 4: HermesLocalConfigFields Uses adapterConfig for Both Create and Edit Modes
+**What goes wrong:** Setting `provider` and `base_url` only in create mode loses the values on edit, or vice versa.
+**Why it happens:** The `isCreate` flag switches between `set!({ model: v })` (create) and `mark("adapterConfig", "model", v)` (edit) — both paths must update all three fields (model, provider, base_url) when an Ollama model is selected.
+**How to avoid:** When Ollama model is selected, call the setter for all three config fields atomically. For create mode: `set!({ model, provider: "custom", base_url: "http://localhost:11434/v1" })`. For edit mode: three `mark()` calls or a compound helper.
+
+### Pitfall 5: Ollama /api/ps Probe May Have No Models Running
+**What goes wrong:** `/api/ps` returns an empty `models: []` when no model is currently loaded — this does not mean Ollama is absent.
+**Why it happens:** Ollama only shows models in `/api/ps` when they are actively loaded in memory.
+**How to avoid:** Use `/api/version` for detection (OLLA-01), `/api/tags` for the model list (OLLA-02), and `/api/ps` only for the optional "memory usage" metric in HERM-07 — handling the empty case as "not currently loaded".
+
+### Pitfall 6: HERM-06 Cost Tracking — Ollama Models Return Zero Cost
+**What goes wrong:** Expecting a `cost_usd` value from runs using local Ollama models — there is no external billing.
+**Why it happens:** Hermes does not know the user's GPU/CPU cost. The `COST_REGEX` will not match if Hermes does not emit a cost line.
+**How to avoid:** This is correct behavior. `normalizeBilledCostCents(undefined, "unknown")` returns `0`. Token usage may still be captured if Hermes emits token counts. Accept that Ollama-based runs show $0.00 in the cost UI — that is accurate.
+
+---
+
+## Code Examples
+
+### Ollama /api/tags Response Shape (verified)
+```typescript
+// Source: https://docs.ollama.com/api/tags (verified 2026-04-01)
+interface OllamaTagsResponse {
+  models: Array<{
+    name: string;             // "qwen2.5-coder:32b"
+    model: string;            // same as name
+    modified_at: string;
+    size: number;             // bytes
+    digest: string;
+    details: {
+      parent_model: string;
+      format: string;         // "gguf"
+      family: string;         // "qwen2"
+      families: string[];
+      parameter_size: string; // "32.8B"
+      quantization_level: string; // "Q4_K_M"
+    };
+  }>;
+}
+```
+
+### Ollama /api/ps Response Shape (verified)
+```typescript
+// Source: https://docs.ollama.com/api/ps (verified 2026-04-01)
+interface OllamaPsResponse {
+  models: Array<{
+    name: string;
+    model: string;
+    size: number;
+    digest: string;
+    details: { /* same as tags */ };
+    expires_at: string;
+    size_vram: number;  // bytes used in VRAM
+  }>;
+}
+```
+
+### Reading hermes-adapter stateJson Hermes fields
+```typescript
+// In AgentDetail.tsx HermesRuntimeCard — read from runtimeState.stateJson
+const hermesModel = runtimeState.stateJson?.hermesModel as string | undefined;
+const hermesNativeSkillCount = runtimeState.stateJson?.hermesNativeSkillCount as number | undefined;
+const hermesMemoryBytes = runtimeState.stateJson?.hermesMemoryBytes as number | undefined;
+```
+
+### Hermes Ollama adapterConfig (what to write)
+```typescript
+// When user selects an Ollama model in config-fields.tsx:
+// model = "qwen2.5-coder:32b"  (bare Ollama model name)
+// provider = "custom"           (OpenAI-compatible endpoint)
+// base_url = "http://localhost:11434/v1"
+
+// For create mode:
+set!({ model, provider: "custom", base_url: "http://localhost:11434/v1" })
+
+// For edit mode:
+mark("adapterConfig", "model", model);
+mark("adapterConfig", "provider", "custom");
+mark("adapterConfig", "base_url", "http://localhost:11434/v1");
+```
+
+### Cost Tracking — Already Wired (HERM-06 context)
+```typescript
+// Source: server/src/services/heartbeat.ts:updateRuntimeState
+// Hermes execute.ts returns:
+//   result.usage = { inputTokens, outputTokens }  (from regex)
+//   result.costUsd = number | undefined           (from regex, usually undefined for local)
+//
+// heartbeat.ts normalizes:
+const usage = normalizeUsageTotals(result.usage);
+const additionalCostCents = normalizeBilledCostCents(result.costUsd, billingType);
+// Then:
+if (additionalCostCents > 0 || hasTokenUsage) {
+  await costs.createEvent(companyId, { ... model: result.model ?? "unknown" ... });
+}
+// → For Ollama: costCents=0, but inputTokens/outputTokens may be > 0 → cost event recorded
+// → If Hermes doesn't emit token counts: no event recorded (correct behavior)
+```
+
+---
+
+## HERM-05: Skill Visibility — What Is Already Done vs. What Is Missing
+
+### Already Done (data layer is complete)
+- `skillRegistryService.syncHermesNativeSkills(agentId)` scans `~/.hermes/skills/` and inserts `source: "native"` rows
+- Called automatically from `GET /skill-registry/agents/:agentId/skills` when `adapterType === "hermes_local"`
+- Returns `AgentSkillEntry[]` with `{ skillId, source, installedAt }` — both `"native"` and `"managed"` source values
+- Hermes adapter `listHermesSkills` returns snapshot with `originLabel: "Hermes skill"` and `readOnly: true` for native skills
+
+### What Is Missing (UI rendering in AgentSkillsTab)
+The `unmanagedSkillRows` section in `AgentSkillsTab` (AgentDetail.tsx:2566) renders read-only adapter entries. It uses `entry.originLabel` and `entry.locationLabel` for display. Hermes native skills already flow through this path.
+
+The gap: the UI may not clearly distinguish "Hermes skill" entries from other unmanaged entries. The `originLabel: "Hermes skill"` badge rendering and skill count display are the UI additions needed. This is a targeted render update to `AgentSkillsTab`, not a new data flow.
+
+---
+
+## HERM-07: Dashboard Hermes Runtime Info
+
+### What to Store in stateJson
+```typescript
+// Written by heartbeat.ts updateRuntimeState after a Hermes run
+{
+  hermesModel: string;         // e.g. "qwen2.5-coder:32b" or "anthropic/claude-sonnet-4"
+  hermesNativeSkillCount: number;  // from skillRegistryService query
+  hermesMemoryBytes: number | null; // from /api/ps size_vram, null if unavailable
+}
+```
+
+### Where to Write stateJson
+In `heartbeat.ts:updateRuntimeState`, after the existing `db.update(agentRuntimeState).set(...)` call, add a second update that merges Hermes-specific fields when `agent.adapterType === "hermes_local"`. Read `result.model` for `hermesModel`. Query `skillRegistryDb` for `hermesNativeSkillCount`. Query Ollama `/api/ps` for `hermesMemoryBytes` on a best-effort basis: the probe must not block or fail the run's state update, and an unreachable Ollama or an unloaded model is stored as `null`.
+
+### What to Render
+A `HermesRuntimeCard` component in `AgentOverview` (gated by `adapterType === "hermes_local"`):
+- Model name (from stateJson.hermesModel)
+- Native skill count (from stateJson.hermesNativeSkillCount)
+- Memory usage (from stateJson.hermesMemoryBytes, formatted as "X.X GB" or "Not loaded")
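+
+The memory line can be derived from the `/api/ps` payload roughly as follows. This is a sketch: the helper names are hypothetical, and dividing by 1024³ for the "GB" label is an assumption the card design may override.
+
+```typescript
+// Sketch only: derive the HermesRuntimeCard memory line from /api/ps data.
+interface OllamaPsModel { name: string; size_vram: number; }
+interface OllamaPs { models: OllamaPsModel[]; }
+
+const PS_BYTES_PER_GB = 1024 * 1024 * 1024; // assumption: binary gigabytes
+
+// Returns the VRAM bytes of the named model, or null when it is not loaded.
+function memoryBytesForModel(ps: OllamaPs, modelName: string): number | null {
+  for (const m of ps.models) {
+    if (m.name === modelName) return m.size_vram;
+  }
+  return null;
+}
+
+// Formats the stored value for display; null/undefined means "Not loaded".
+function formatHermesMemory(bytes: number | null | undefined): string {
+  if (bytes == null) return "Not loaded";
+  return `${(bytes / PS_BYTES_PER_GB).toFixed(1)} GB`;
+}
+```
+
+An empty `models: []` response therefore renders as "Not loaded" rather than as an error, in line with Pitfall 5.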
+
+---
+
+## Environment Availability
+
+| Dependency | Required By | Available | Version | Fallback |
+|------------|------------|-----------|---------|----------|
+| Ollama daemon | OLLA-01 through OLLA-05 | No (not installed) | — | All paths degrade gracefully; UI shows install instructions |
+| hermes-paperclip-adapter | HERM-05, HERM-06, HERM-07 | Yes | 0.2.1 | — |
+| Node.js fetch | Ollama HTTP probing | Yes | built-in (Node 18+) | — |
+| Node.js os module | OLLA-04 RAM reading | Yes | built-in | — |
+| Vitest | Tests | Yes | (server vitest.config.ts) | — |
+
+**Missing dependencies with no fallback:** None — all Ollama features degrade gracefully when Ollama is absent.
+
+**Pre-existing test failures (not Phase 28 regressions):** 4 test files failing before Phase 28 begins:
+- `app-hmr-port.test.ts`
+- `plugin-worker-manager.test.ts`
+- `heartbeat-workspace-session.test.ts` (5 tests)
+- `skill-registry-routes.test.ts` (1 test)
+
+---
+
+## Validation Architecture
+
+### Test Framework
+| Property | Value |
+|----------|-------|
+| Framework | Vitest (server) |
+| Config file | `server/vitest.config.ts` |
+| Quick run command | `cd server && npx vitest run src/__tests__/ollama-service.test.ts` |
+| Full suite command | `cd server && npx vitest run` |
+
+### Phase Requirements → Test Map
+| Req ID | Behavior | Test Type | Automated Command | File Exists? |
+|--------|----------|-----------|-------------------|-------------|
+| OLLA-01 | `detectOllama()` returns `installed: false` when Ollama absent | unit | `npx vitest run src/__tests__/ollama-service.test.ts` | No — Wave 0 |
+| OLLA-01 | `detectOllama()` returns `installed: true` + version when Ollama present | unit | same | No — Wave 0 |
+| OLLA-01 | `detectOllama()` times out cleanly (AbortController) | unit | same | No — Wave 0 |
+| OLLA-02 | `listOllamaModels()` returns `OllamaModel[]` from /api/tags | unit | same | No — Wave 0 |
+| OLLA-04 | `buildModelRecommendation()` returns correct model for given RAM budget | unit | same | No — Wave 0 |
+| OLLA-05 | Routes return `installUrl` when Ollama absent | unit | same | No — Wave 0 |
+| HERM-05 | Skills tab renders `originLabel: "Hermes skill"` badge | manual-only | — | — |
+| HERM-06 | `updateRuntimeState` records cost event when Hermes emits token data | unit (existing pattern) | `npx vitest run src/__tests__/costs-service.test.ts` | Yes |
+| HERM-07 | stateJson receives hermesModel/hermesNativeSkillCount after run | unit | `npx vitest run src/__tests__/ollama-service.test.ts` | No — Wave 0 |
+
+### Sampling Rate
+- **Per task commit:** `cd server && npx vitest run src/__tests__/ollama-service.test.ts`
+- **Per wave merge:** `cd server && npx vitest run`
+- **Phase gate:** Full suite green before `/gsd:verify-work` (excluding the 4 pre-existing failing test files)
+
+### Wave 0 Gaps
+- [ ] `server/src/__tests__/ollama-service.test.ts` — covers OLLA-01, OLLA-02, OLLA-04, OLLA-05, HERM-07 stateJson logic
+- [ ] Test stubs use mock fetch (AbortController pattern); no real Ollama needed
+
+---
+
+## State of the Art
+
+| Old Approach | Current Approach | When Changed | Impact |
+|--------------|------------------|--------------|--------|
+| Manual text entry for Hermes model | Dropdown fed from Ollama + manual fallback | Phase 28 | Better UX for local models |
+| stateJson unused for Hermes | stateJson stores hermesModel, skillCount, memoryBytes | Phase 28 | Dashboard can show runtime info |
+| Hermes native skills in separate table only | Skills tab renders both managed + native in unified view | Phase 28 (HERM-05 completion) | Unified skill surface |
+
+---
+
+## Open Questions
+
+1. **Should Ollama route be gated to hermes_local only?**
+   - What we know: Only Hermes uses the Ollama custom endpoint pattern currently
+   - What's unclear: Future adapters (Phase 29 defaults) may also use Ollama
+   - Recommendation: Mount under `/companies/:companyId/ollama/*` without adapter-type gating — the endpoint is useful generically and Pi/OpenCode adapters may benefit in Phase 29
+
+2. **Should listOllamaModels also extend the hermes adapter's `listModels` function?**
+   - What we know: `listAdapterModels("hermes_local")` already calls `adapter.listModels()` if present; hermes adapter has no `listModels` implementation (returns `models: []`)
+   - What's unclear: Whether to add `listModels` to hermes adapter (requires adapter package change) or use a separate Ollama API route in Nexus
+   - Recommendation: Use a separate Nexus route (`/companies/:companyId/ollama/models`). Avoids changing the hermes-paperclip-adapter package (external dependency). The config-fields.tsx component can call the Nexus route directly. **Do not modify the hermes-paperclip-adapter package.**
+
+3. **stateJson hermesNativeSkillCount — count from skillRegistry or from adapter snapshot?**
+   - What we know: `skillRegistryDb` is a separate libSQL DB; querying it in `updateRuntimeState` adds cross-DB complexity
+   - What's unclear: Is the extra query worth it for a display-only count?
+   - Recommendation: Store the count from `result.resultJson` if Hermes emits it, or derive from the adapter skill snapshot after run. Alternatively, skip native skill count from stateJson and derive it in the UI from `agentsApi.skills(agentId)` query. The UI approach avoids cross-DB concerns in heartbeat.
+
+---
+
+## Sources
+
+### Primary (HIGH confidence)
+- hermes-paperclip-adapter@0.2.1 dist source code — `execute.js`, `skills.js`, `detect-model.js`, `test.js`, `constants.js` — read directly from `/opt/nexus/server/node_modules/hermes-paperclip-adapter/dist/`
+- Nexus codebase — `server/src/services/heartbeat.ts`, `server/src/services/costs.ts`, `server/src/services/skill-registry.ts`, `ui/src/pages/AgentDetail.tsx`, `ui/src/adapters/hermes-local/config-fields.tsx` — read directly
+- Ollama REST API — `https://docs.ollama.com/api/tags` — verified /api/tags response shape with `details.parameter_size`, `details.family`, `details.quantization_level`
+- Node.js built-ins — `os.totalmem()`, `fetch` with AbortController — confirmed available in Node 18+ runtime
+
+### Secondary (MEDIUM confidence)
+- Hermes Agent provider docs — `https://hermes-agent.nousresearch.com/docs/integrations/providers/` — verified "ollama uses custom provider + localhost:11434/v1 base_url"
+- Hermes Agent + Ollama guide — Medium/Substack articles cross-referencing official docs — confirmed custom endpoint configuration steps
+
+### Tertiary (LOW confidence)
+- Ollama model RAM requirements (catalog) — community sources + Ollama model page tags — use conservative estimates; verify against https://ollama.com/library model pages before shipping
+
+---
+
+## Metadata
+
+**Confidence breakdown:**
+- Ollama API: HIGH — verified from official docs, response shapes confirmed
+- Hermes + Ollama provider mapping: HIGH — verified from official Hermes provider docs
+- Standard stack: HIGH — all existing infrastructure confirmed from source code
+- Architecture patterns: HIGH — follow existing codex-models.ts, heartbeat.ts, config-fields.tsx patterns exactly
+- HERM-05 data layer status: HIGH — verified syncHermesNativeSkills exists and is already called
+- HERM-06 cost tracking: HIGH — execute.js returns usage/costUsd, heartbeat.ts wires it to costService
+- Pitfalls: HIGH — derived from actual source code analysis
+
+**Research date:** 2026-04-01
+**Valid until:** 2026-05-01 (Ollama API is stable; hermes-paperclip-adapter may receive new releases)