# Phase 28: Ollama Integration & Agent Surface - Research

**Researched:** 2026-04-01
**Domain:** Ollama HTTP API, Hermes adapter extension, agent dashboard UI, cost tracking
**Confidence:** HIGH

## Summary

Phase 28 adds three distinct capabilities on top of the completed Phase 27 Hermes adapter:

1. **Ollama detection and model catalog.** Nexus queries `localhost:11434` to detect Ollama, lists available models, and ships a static JSON catalog for hardware-aware recommendations.
2. **Hermes config surface extension.** The model field in `config-fields.tsx` becomes a dropdown fed by live Ollama discovery rather than a free-text input, and a new `base_url`/`provider: custom` adapterConfig field routes Hermes to the local endpoint.
3. **Hermes runtime data in the dashboard.** `stateJson` in `agentRuntimeState` is the right place to store Hermes-specific runtime metadata (model name, native skill count, memory usage from Ollama's `/api/ps`), and the `AgentOverview` component in `AgentDetail.tsx` is the right insertion point.

The most important finding is that **Hermes does not have a native "ollama" provider**. Ollama is configured as a custom OpenAI-compatible endpoint: `provider: custom`, `base_url: http://localhost:11434/v1`. The model field passes the Ollama model name bare (e.g. `qwen2.5-coder:32b`). This shapes OLLA-02, OLLA-03, and the `config-fields.tsx` changes.

For cost tracking (HERM-06): `hermes-paperclip-adapter@0.2.1` already parses `token_usage` and `cost` regex patterns from Hermes stdout. When Hermes returns non-zero usage, `heartbeat.ts:updateRuntimeState` already calls `costService.createEvent`. The only gap is that Hermes running local Ollama models will have `costUsd = undefined` (no billing) — the infrastructure handles this correctly (zero cost event is suppressed when `additionalCostCents === 0 && !hasTokenUsage`). No cost tracking code changes are needed for local models; the planner just needs to verify the regex path works end-to-end.

For HERM-05 (skill visibility): `syncHermesNativeSkills` already exists in `skillRegistryService` and is already called from the `GET /skill-registry/agents/:agentId/skills` route when `adapterType === "hermes_local"`. The Hermes adapter's `listHermesSkills` function merges Paperclip-managed and native skills. The integration is already complete at the data layer. What is missing is the UI surface in the Skills tab that renders the `originLabel: "Hermes skill"` / `readOnly: true` entries distinctly from managed skills.

**Primary recommendation:** Implement as four focused plans — (P01) server-side Ollama service + routes; (P02) Hermes config-fields UI extension for Ollama model selection; (P03) dashboard Hermes runtime info card; (P04) model catalog JSON + recommendation logic.

---

<user_constraints>
## User Constraints (from CONTEXT.md)

### Locked Decisions
All implementation choices are at Claude's discretion — discuss phase was skipped per user setting. Use ROADMAP phase goal, success criteria, and codebase conventions to guide decisions.

### Claude's Discretion
All implementation choices are at Claude's discretion.

### Deferred Ideas (OUT OF SCOPE)
None — discuss phase skipped. Refer to REQUIREMENTS.md for in-scope requirements.

Out of scope per REQUIREMENTS.md:
- Multi-provider model routing (Hermes can use OpenRouter/Anthropic/OpenAI but that's Hermes config, not Nexus)
- Hermes MCP server management
- Custom Hermes skill authoring UI
- DFLT-01 through DFLT-04 (Phase 29)
</user_constraints>

---

<phase_requirements>
## Phase Requirements

| ID | Description | Research Support |
|----|-------------|------------------|
| OLLA-01 | Nexus detects whether Ollama is installed locally | HTTP probe to `localhost:11434/api/version`; new server service `ollamaService` |
| OLLA-02 | User can see list of available Ollama models when configuring a Hermes agent | `GET /api/tags` from Ollama HTTP API; new server route `GET /companies/:id/ollama/models`; config-fields.tsx dropdown |
| OLLA-03 | User can configure a Hermes agent with any local Ollama model | Sets `adapterConfig.model = <model-name>`, `adapterConfig.provider = "custom"`, `adapterConfig.base_url = "http://localhost:11434/v1"` |
| OLLA-04 | Model recommendation based on RAM/VRAM from a shipped catalog | Static JSON catalog in `server/src/data/ollama-model-catalog.json`; server reads `os.totalmem()` to filter; returned with model list |
| OLLA-05 | If Ollama is not present, user is offered installation instructions | Ollama status endpoint returns `installed: false` + `installUrl`; UI shows callout in Hermes config-fields |
| HERM-05 | Nexus-managed skills visible alongside Hermes native skills in agent config | Already wired at data layer — UI Skills tab needs `originLabel: "Hermes skill"` rendering distinction |
| HERM-06 | Cost tracking captures token usage and model costs for Hermes agents | Infrastructure already handles this; verify end-to-end with local Ollama (zero cost is correct, no change needed) |
| HERM-07 | Dashboard shows Hermes-specific info (model name, memory usage, native skill count) | Store in `agentRuntimeState.stateJson`; render in `AgentOverview` component |
</phase_requirements>

---

## Standard Stack

### Core
| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| Node.js `os` module | built-in | Read total system RAM | Already used in heartbeat.ts; no new dep |
| Node.js `fetch` | Node 18+ built-in | HTTP calls to Ollama API at localhost:11434 | Already confirmed available in runtime |
| `hermes-paperclip-adapter` | 0.2.1 (installed) | Hermes execution, skill sync, model detection | Already wired into adapter registry |

### No New Dependencies Required
All capabilities needed for Phase 28 are achievable with existing infrastructure:
- Ollama HTTP API is probed with `fetch` (built-in Node 18+)
- Model catalog is a static JSON file in the server package
- RAM reading uses `os.totalmem()` (built-in)
- Hermes Ollama configuration uses existing `adapterConfig` fields

## Architecture Patterns

### Recommended Project Structure (new files)

```
server/src/services/ollama.ts          # ollamaService — detect + list models
server/src/routes/ollama.ts            # HTTP routes: /companies/:id/ollama/status, /models
server/src/data/ollama-model-catalog.json  # shipped catalog for OLLA-04
server/src/__tests__/ollama-service.test.ts  # unit tests for ollamaService
ui/src/api/ollama.ts                   # ollamaApi client — wraps server routes
```

### Pattern 1: Ollama Service (server-side)

```typescript
// server/src/services/ollama.ts
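const OLLAMA_BASE_URL = process.env.OLLAMA_BASE_URL ?? "http://localhost:11434";
const OLLAMA_TIMEOUT_MS = 3000;
// Assumed install page; confirm the exact URL before shipping.
const INSTALL_URL = "https://ollama.com/download";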

export interface OllamaStatus {
  installed: boolean;
  version: string | null;
  installUrl: string;
}

export interface OllamaModel {
  name: string;           // e.g. "qwen2.5-coder:32b"
  parameterSize: string;  // e.g. "32.8B" from /api/tags details
  quantization: string;   // e.g. "Q4_K_M"
  sizeBytes: number;
  family: string;         // e.g. "qwen2"
  recommended: boolean;   // from catalog match + RAM check
  recommendationReason: string | null;
}

export async function detectOllama(): Promise<OllamaStatus> {
  const controller = new AbortController();
  const timeout = setTimeout(() => controller.abort(), OLLAMA_TIMEOUT_MS);
  try {
    const res = await fetch(`${OLLAMA_BASE_URL}/api/version`, {
      signal: controller.signal,
    });
    if (!res.ok) return { installed: false, version: null, installUrl: INSTALL_URL };
    const body = await res.json() as { version?: string };
    return { installed: true, version: body.version ?? null, installUrl: INSTALL_URL };
  } catch {
    return { installed: false, version: null, installUrl: INSTALL_URL };
  } finally {
    clearTimeout(timeout);
  }
}
```

**Why this pattern:** Matches the existing codex-models.ts pattern — HTTP fetch with timeout, graceful failure returns empty/false rather than throwing. The 3s timeout prevents hanging requests when Ollama is not installed.
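
The model-listing half of the service is not shown above. The following is a minimal sketch of how a `/api/tags` payload could be mapped onto `OllamaModel`, assuming the response shape listed under Code Examples. `TagsModel` and `mapTagsToModels` are hypothetical names, and the recommendation fields are left at neutral defaults for the catalog step to fill in.

```typescript
// Hypothetical helper; TagsModel and mapTagsToModels are illustrative names,
// not existing Nexus code.
interface TagsModel {
  name: string;
  size?: number;
  details?: {
    family?: string;
    parameter_size?: string;
    quantization_level?: string;
  };
}

interface OllamaModel {
  name: string;
  parameterSize: string;
  quantization: string;
  sizeBytes: number;
  family: string;
  recommended: boolean;
  recommendationReason: string | null;
}

function mapTagsToModels(body: unknown): OllamaModel[] {
  const models = (body as { models?: unknown } | null | undefined)?.models;
  // Malformed or missing payloads yield an empty list instead of throwing.
  if (!Array.isArray(models)) return [];
  return models
    .filter((m): m is TagsModel => typeof m?.name === "string")
    .map((m) => ({
      name: m.name,
      parameterSize: m.details?.parameter_size ?? "",
      quantization: m.details?.quantization_level ?? "",
      sizeBytes: typeof m.size === "number" ? m.size : 0,
      family: m.details?.family ?? "",
      // Filled in later by the catalog/RAM check (OLLA-04).
      recommended: false,
      recommendationReason: null,
    }));
}
```

Returning an empty array for unexpected input keeps the "never throw when Ollama is absent" behavior consistent between detection and listing.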

### Pattern 2: Ollama Routes (mounted under /companies/:companyId)

```
GET /companies/:companyId/ollama/status
  → { installed: boolean, version: string|null, installUrl: string }

GET /companies/:companyId/ollama/models
  → { models: OllamaModel[], ramGb: number }
```

Both routes use the existing `assertCompanyAccess(req, companyId)` authz pattern from `agents.ts`.

Mount in `server/src/routes/index.ts` alongside the existing `agentsRoutes`.

### Pattern 3: Hermes Config-Fields Enhancement

The existing `HermesLocalConfigFields` in `config-fields.tsx` has a free-text `Model` input. For Ollama support, it becomes a hybrid: dropdown (when Ollama is present) + manual entry fallback.

```tsx
// Fetch Ollama status + models (only for hermes_local adapter)
const { data: ollamaStatus } = useQuery({
  queryKey: ["ollama", "status", companyId],
  queryFn: () => ollamaApi.status(companyId!),
  enabled: Boolean(companyId),
});

const { data: ollamaModels } = useQuery({
  queryKey: ["ollama", "models", companyId],
  queryFn: () => ollamaApi.models(companyId!),
  enabled: Boolean(companyId && ollamaStatus?.installed),
});
```

When `ollamaStatus.installed === false`, render an install callout (OLLA-05) instead of the dropdown.

When a local Ollama model is selected, `buildHermesConfig` (or `mark`) must also set `provider: "custom"` and `base_url: "http://localhost:11434/v1"` in `adapterConfig`. This is the critical mapping from OLLA-03.

### Pattern 4: Hermes Runtime Data in stateJson (HERM-07)

`agentRuntimeState.stateJson` is `jsonb` typed as `Record<string, unknown>`. The heartbeat service writes this via `updateRuntimeState`. The Hermes adapter's `execute.ts` already returns `resultJson` with `session_id`, `usage`, and `cost_usd`.

For HERM-07 runtime data (model name, native skill count, memory usage), the server-side approach is:
- After a Hermes run completes, read `resultJson.result` and extract/store model + detected skill count into `stateJson`
- Optionally query Ollama `/api/ps` (running models) to get `size_vram` for memory usage display

**Insertion point for stateJson patch:** `heartbeat.ts:updateRuntimeState` already calls `db.update(agentRuntimeState).set(...)`. Add a `stateJson` merge here when `adapterType === "hermes_local"`.

**UI insertion point:** `AgentOverview` component in `AgentDetail.tsx` (line ~1183). Add a `HermesRuntimeCard` component after the charts section, gated by `agent.adapterType === "hermes_local"`:

```tsx
{agent.adapterType === "hermes_local" && runtimeState && (
  <HermesRuntimeCard runtimeState={runtimeState} />
)}
```

### Pattern 5: Model Catalog JSON (OLLA-04)

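The catalog ships as `server/src/data/ollama-model-catalog.json`. The path is stated here rather than as a comment inside the file because JSON does not allow comments.

```json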
{
  "models": [
    {
      "family": "qwen2",
      "variants": [
        { "name": "qwen2.5-coder:7b",  "ramGb": 5,  "vramGb": 5,  "quality": "fast" },
        { "name": "qwen2.5-coder:32b", "ramGb": 22, "vramGb": 22, "quality": "best" }
      ]
    },
    {
      "family": "llama",
      "variants": [
        { "name": "llama3.2:3b",  "ramGb": 3,  "vramGb": 3,  "quality": "fast" },
        { "name": "llama3.1:8b",  "ramGb": 6,  "vramGb": 6,  "quality": "balanced" },
        { "name": "llama3.1:70b", "ramGb": 48, "vramGb": 48, "quality": "best" }
      ]
    },
    {
      "family": "mistral",
      "variants": [
        { "name": "mistral:7b",   "ramGb": 5,  "vramGb": 5,  "quality": "balanced" },
        { "name": "mistral-small:22b", "ramGb": 14, "vramGb": 14, "quality": "best" }
      ]
    },
    {
      "family": "phi",
      "variants": [
        { "name": "phi4:14b",    "ramGb": 10, "vramGb": 10, "quality": "balanced" }
      ]
    },
    {
      "family": "deepseek",
      "variants": [
        { "name": "deepseek-r1:7b",  "ramGb": 5,  "vramGb": 5,  "quality": "reasoning" },
        { "name": "deepseek-r1:32b", "ramGb": 22, "vramGb": 22, "quality": "reasoning" }
      ]
    }
  ]
}
```

Recommendation logic: `os.totalmem()` gives total RAM. Use 75% as usable RAM budget (leave OS headroom). Filter catalog entries where `ramGb <= totalRamGb * 0.75`. Return the highest-quality variant within budget plus a `recommendationReason` string.
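
A minimal sketch of that filter, assuming the catalog shape shown above. The catalog's `quality` labels have no defined ordering, so this sketch treats "highest-quality" as "largest `ramGb` that still fits", which is an assumption. The signature of `buildModelRecommendation` and the type names are also illustrative.

```typescript
// Shapes mirror the catalog JSON; the type names are illustrative.
interface CatalogVariant {
  name: string;
  ramGb: number;
  vramGb: number;
  quality: string;
}
interface ModelCatalog {
  models: Array<{ family: string; variants: CatalogVariant[] }>;
}
interface ModelRecommendation {
  name: string;
  recommendationReason: string;
}

const USABLE_RAM_FRACTION = 0.75; // leave headroom for the OS
const BYTES_PER_GB = 1024 * 1024 * 1024;

function buildModelRecommendation(
  catalog: ModelCatalog,
  totalMemBytes: number,
): ModelRecommendation | null {
  const totalGb = totalMemBytes / BYTES_PER_GB;
  const budgetGb = totalGb * USABLE_RAM_FRACTION;

  // Assumption: the largest variant that fits the budget is the "best" one.
  let best: CatalogVariant | null = null;
  for (const family of catalog.models) {
    for (const variant of family.variants) {
      if (variant.ramGb > budgetGb) continue;
      if (best === null || variant.ramGb > best.ramGb) best = variant;
    }
  }
  if (best === null) return null;

  return {
    name: best.name,
    recommendationReason:
      `Needs about ${best.ramGb} GB; ` +
      `${budgetGb.toFixed(1)} GB usable of ${totalGb.toFixed(1)} GB total RAM`,
  };
}
```

On the server this would be called as `buildModelRecommendation(catalog, os.totalmem())`. A `null` result means no catalog entry fits, which the UI can present as "no recommendation" rather than an error.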

### Anti-Patterns to Avoid

- **Polling Ollama in a loop:** Use a 60-second TTL in-memory cache (same as codex-models.ts `MODELS_CACHE_TTL_MS`). Do not re-probe on every API call.
- **Blocking server startup on Ollama check:** Ollama detection is on-demand (per-request), not at startup.
- **Hard-coding `localhost:11434`:** Always read from `process.env.OLLAMA_BASE_URL ?? "http://localhost:11434"` so users with non-standard ports work.
- **Requiring Ollama for Hermes:** All Ollama paths are optional. Hermes without Ollama continues to work unchanged. Never throw when Ollama is absent.
- **Overwriting all of stateJson:** Merge into stateJson using spread, never replace: `stateJson: { ...existingState, hermesModel: ..., hermesNativeSkillCount: ... }`.
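
The caching rule in the first bullet can be sketched as a small TTL wrapper. This is an illustration only: `createTtlCache` and `OLLAMA_CACHE_TTL_MS` are hypothetical names, and the injectable clock exists so the behavior can be unit-tested without waiting 60 seconds.

```typescript
const OLLAMA_CACHE_TTL_MS = 60 * 1000; // hypothetical constant, mirrors the 60-second TTL above

// Wraps a loader so it runs at most once per TTL window.
function createTtlCache<T>(
  load: () => T,
  ttlMs: number = OLLAMA_CACHE_TTL_MS,
  now: () => number = Date.now,
) {
  let cached: { value: T; expiresAt: number } | null = null;
  return {
    get(): T {
      const t = now();
      if (cached !== null && t < cached.expiresAt) return cached.value;
      const value = load();
      cached = { value, expiresAt: t + ttlMs };
      return value;
    },
    clear(): void {
      cached = null;
    },
  };
}
```

If `load` returns a promise (for example `() => detectOllama()`), the promise itself is cached, so concurrent requests within the window share a single probe. That is only safe for loaders that never reject, which holds for `detectOllama` as written in Pattern 1.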

---

## Don't Hand-Roll

| Problem | Don't Build | Use Instead | Why |
|---------|-------------|-------------|-----|
| Ollama connectivity check | Custom TCP socket probe | `fetch` to `/api/version` with AbortController timeout | Reuses existing pattern from codex-models.ts |
| YAML config parsing | Full YAML parser | Existing `parseModelFromConfig` in hermes adapter | Already ships in hermes-paperclip-adapter/dist |
| System RAM reading | Shell commands | `os.totalmem()` | Built-in, no dep, works cross-platform |
| Token cost tracking | New billing logic | Existing `costService.createEvent` + `updateRuntimeState` | Already handles Hermes via regex-extracted usage |

---

## Common Pitfalls

### Pitfall 1: Hermes Does Not Have an "ollama" Provider
**What goes wrong:** Setting `adapterConfig.provider = "ollama"` causes Hermes to fail: "ollama" is not an entry in `VALID_PROVIDERS` in `constants.js`.
**Why it happens:** Ollama mimics the OpenAI API, so Hermes treats it as `provider: "custom"` with `base_url: "http://localhost:11434/v1"`.
**How to avoid:** When a user selects an Ollama model, always write `provider: "custom"` and `base_url: "http://localhost:11434/v1"` into `adapterConfig`. These fields are already in the Hermes config schema (see `agentConfigurationDoc`).
**Warning signs:** Hermes stderr shows "unknown provider" or authentication errors during local model runs.

### Pitfall 2: Ollama API Returns Models at `/api/tags`, Not `/v1/models`
**What goes wrong:** Using the OpenAI-compat endpoint `/v1/models` to list models misses the `details` object (`parameter_size`, `quantization_level`, `family`) needed for OLLA-04.
**Why it happens:** `/v1/models` is OpenAI-compat, `/api/tags` is Ollama-native with richer data.
**How to avoid:** Use `GET localhost:11434/api/tags` for model listing (returns `details.parameter_size`, `details.family`). Use `/v1/models` only if passing through to Hermes.

### Pitfall 3: stateJson Merge Requires Read-Modify-Write
**What goes wrong:** `db.update(agentRuntimeState).set({ stateJson: newData })` overwrites other fields stored by other parts of the system.
**Why it happens:** Drizzle `.set()` replaces the entire column value.
**How to avoid:** Use a Postgres jsonb merge, `` stateJson: sql`${agentRuntimeState.stateJson} || ${JSON.stringify(patch)}::jsonb` ``, or read the existing `stateJson` first and then spread it. The existing `ensureRuntimeState` call in `updateRuntimeState` already reads the row.
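
For the read-then-spread variant, a minimal sketch of the merge step is shown here. `mergeStateJson` is a hypothetical helper. It performs a shallow, top-level merge, which matches what the jsonb `||` operator does for objects, and it skips `undefined` patch values so a failed lookup does not erase a previously stored field (that last behavior is a design assumption, not something the existing code is known to do).

```typescript
type StateJson = Record<string, unknown>;

// Hypothetical helper: shallow-merges a patch into the existing stateJson.
function mergeStateJson(
  existing: StateJson | null | undefined,
  patch: StateJson,
): StateJson {
  const merged: StateJson = { ...(existing ?? {}) };
  for (const key of Object.keys(patch)) {
    // Skip undefined so "value unavailable this run" keeps the old value.
    if (patch[key] !== undefined) merged[key] = patch[key];
  }
  return merged;
}
```

One caveat for the SQL variant: in Postgres, `NULL || '{...}'::jsonb` evaluates to `NULL`, so if the column can be null the left-hand side likely needs a `coalesce(..., '{}'::jsonb)` guard.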

### Pitfall 4: HermesLocalConfigFields Uses adapterConfig for Both Create and Edit Modes
**What goes wrong:** Setting `provider` and `base_url` only in create mode loses the values on edit, or vice versa.
**Why it happens:** The `isCreate` flag switches between `set!({ model: v })` (create) and `mark("adapterConfig", "model", v)` (edit) — both paths must update all three fields (model, provider, base_url) when an Ollama model is selected.
**How to avoid:** When an Ollama model is selected, call the setter for all three config fields atomically. For create mode: `set!({ model, provider: "custom", base_url: "http://localhost:11434/v1" })`. For edit mode: three `mark()` calls or a compound helper.
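
One possible shape for such a compound helper is sketched here. The `SetFn` and `MarkFn` signatures are inferred from how `set` and `mark` are called in this document and may not match the real ones in `config-fields.tsx`. `buildOllamaConfigPatch` and `applyOllamaSelection` are hypothetical names.

```typescript
// Assumed signatures, inferred from the call sites shown in this document.
type SetFn = (patch: Record<string, unknown>) => void;
type MarkFn = (group: "adapterConfig", field: string, value: unknown) => void;

type OllamaConfigPatch = {
  model: string;
  provider: "custom";
  base_url: string;
};

// Hypothetical helper: builds the three fields that must always travel together.
function buildOllamaConfigPatch(
  model: string,
  ollamaBaseUrl: string = "http://localhost:11434",
): OllamaConfigPatch {
  const root = ollamaBaseUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  return { model, provider: "custom", base_url: `${root}/v1` };
}

// Hypothetical helper: a single call site for both create and edit modes.
function applyOllamaSelection(
  model: string,
  mode: { isCreate: true; set: SetFn } | { isCreate: false; mark: MarkFn },
): void {
  const patch = buildOllamaConfigPatch(model);
  if (mode.isCreate) {
    mode.set({ ...patch });
    return;
  }
  mode.mark("adapterConfig", "model", patch.model);
  mode.mark("adapterConfig", "provider", patch.provider);
  mode.mark("adapterConfig", "base_url", patch.base_url);
}
```

Keeping the base URL as a parameter leaves room to pass through a non-default Ollama address instead of hard-coding `localhost:11434` in the UI.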

### Pitfall 5: Ollama /api/ps Probe May Have No Models Running
**What goes wrong:** `/api/ps` returns an empty `models: []` when no model is currently loaded — this does not mean Ollama is absent.
**Why it happens:** Ollama only shows models in `/api/ps` when they are actively loaded in memory.
**How to avoid:** Use `/api/version` for detection (OLLA-01), `/api/tags` for the model list (OLLA-02), and `/api/ps` only for the optional "memory usage" metric in HERM-07 — handling the empty case as "not currently loaded".
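
A minimal sketch of the empty-case handling, assuming the `/api/ps` response shape listed under Code Examples. `findLoadedModelVramBytes` is a hypothetical helper. It returns `null` both when nothing is loaded and when the requested model is not among the loaded ones.

```typescript
// Hypothetical helper: extracts size_vram for one model from a /api/ps payload.
function findLoadedModelVramBytes(
  psBody: unknown,
  modelName: string,
): number | null {
  const models = (psBody as { models?: unknown } | null | undefined)?.models;
  // An absent or empty list means "not currently loaded", not "Ollama absent".
  if (!Array.isArray(models)) return null;
  const hit = models.find((m) => m?.name === modelName);
  return typeof hit?.size_vram === "number" ? hit.size_vram : null;
}
```

The returned value maps directly onto `hermesMemoryBytes: number | null` in `stateJson`.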

### Pitfall 6: HERM-06 Cost Tracking — Ollama Models Return Zero Cost
**What goes wrong:** Expecting a `cost_usd` value from runs using local Ollama models — there is no external billing.
**Why it happens:** Hermes does not know the user's GPU/CPU cost. The `COST_REGEX` will not match if Hermes does not emit a cost line.
**How to avoid:** This is correct behavior. `normalizeBilledCostCents(undefined, "unknown")` returns `0`. Token usage may still be captured if Hermes emits token counts. Accept that Ollama-based runs show $0.00 in the cost UI — that is accurate.

---

## Code Examples

### Ollama /api/tags Response Shape (verified)
```typescript
// Source: https://docs.ollama.com/api/tags (verified 2026-04-01)
interface OllamaTagsResponse {
  models: Array<{
    name: string;             // "qwen2.5-coder:32b"
    model: string;            // same as name
    modified_at: string;
    size: number;             // bytes
    digest: string;
    details: {
      parent_model: string;
      format: string;         // "gguf"
      family: string;         // "qwen2"
      families: string[];
      parameter_size: string; // "32.8B"
      quantization_level: string; // "Q4_K_M"
    };
  }>;
}
```

### Ollama /api/ps Response Shape (verified)
```typescript
// Source: https://docs.ollama.com/api/ps (verified 2026-04-01)
interface OllamaPsResponse {
  models: Array<{
    name: string;
    model: string;
    size: number;
    digest: string;
    details: { /* same as tags */ };
    expires_at: string;
    size_vram: number;  // bytes used in VRAM
  }>;
}
```

### Reading Hermes fields from stateJson
```typescript
// In AgentDetail.tsx HermesRuntimeCard — read from runtimeState.stateJson
const hermesModel = runtimeState.stateJson?.hermesModel as string | undefined;
const hermesNativeSkillCount = runtimeState.stateJson?.hermesNativeSkillCount as number | undefined;
const hermesMemoryBytes = runtimeState.stateJson?.hermesMemoryBytes as number | undefined;
```

### Hermes Ollama adapterConfig (what to write)
```typescript
// When user selects an Ollama model in config-fields.tsx:
// model = "qwen2.5-coder:32b"  (bare Ollama model name)
// provider = "custom"           (OpenAI-compatible endpoint)
// base_url = "http://localhost:11434/v1"

// For create mode:
set!({ model, provider: "custom", base_url: "http://localhost:11434/v1" })

// For edit mode:
mark("adapterConfig", "model", model);
mark("adapterConfig", "provider", "custom");
mark("adapterConfig", "base_url", "http://localhost:11434/v1");
```

### Cost Tracking — Already Wired (HERM-06 context)
```typescript
// Source: server/src/services/heartbeat.ts:updateRuntimeState
// Hermes execute.ts returns:
//   result.usage = { inputTokens, outputTokens }  (from regex)
//   result.costUsd = number | undefined           (from regex, usually undefined for local)
//
// heartbeat.ts normalizes:
const usage = normalizeUsageTotals(result.usage);
const additionalCostCents = normalizeBilledCostCents(result.costUsd, billingType);
// Then:
if (additionalCostCents > 0 || hasTokenUsage) {
  await costs.createEvent(companyId, { ... model: result.model ?? "unknown" ... });
}
// → For Ollama: costCents=0, but inputTokens/outputTokens may be > 0 → cost event recorded
// → If Hermes doesn't emit token counts: no event recorded (correct behavior)
```

---

## HERM-05: Skill Visibility — What Is Already Done vs. What Is Missing

### Already Done (data layer is complete)
- `skillRegistryService.syncHermesNativeSkills(agentId)` scans `~/.hermes/skills/` and inserts `source: "native"` rows
- Called automatically from `GET /skill-registry/agents/:agentId/skills` when `adapterType === "hermes_local"`
- Returns `AgentSkillEntry[]` with `{ skillId, source, installedAt }` — both `"native"` and `"managed"` source values
- Hermes adapter `listHermesSkills` returns snapshot with `originLabel: "Hermes skill"` and `readOnly: true` for native skills

### What Is Missing (UI rendering in AgentSkillsTab)
The `unmanagedSkillRows` section in `AgentSkillsTab` (AgentDetail.tsx:2566) renders read-only adapter entries. It uses `entry.originLabel` and `entry.locationLabel` for display. Hermes native skills already flow through this path.

The gap: the UI may not clearly distinguish "Hermes skill" entries from other unmanaged entries. The `originLabel: "Hermes skill"` badge rendering and skill count display are the UI additions needed. This is a targeted render update to `AgentSkillsTab`, not a new data flow.

---

## HERM-07: Dashboard Hermes Runtime Info

### What to Store in stateJson
```typescript
// Written by heartbeat.ts updateRuntimeState after a Hermes run
{
  hermesModel: string;         // e.g. "qwen2.5-coder:32b" or "anthropic/claude-sonnet-4"
  hermesNativeSkillCount: number;  // from skillRegistryService query
  hermesMemoryBytes: number | null; // from /api/ps size_vram, null if unavailable
}
```

### Where to Write stateJson
In `heartbeat.ts:updateRuntimeState`, after the existing `db.update(agentRuntimeState).set(...)` call, add a second update that merges hermes-specific fields when `agent.adapterType === "hermes_local"`. Read `result.model` for `hermesModel`. Query `skillRegistryDb` for `hermesNativeSkillCount`. Query Ollama `/api/ps` for `hermesMemoryBytes` (non-blocking, fire-and-forget).

### What to Render
A `HermesRuntimeCard` component in `AgentOverview` (gated by `adapterType === "hermes_local"`):
- Model name (from stateJson.hermesModel)
- Native skill count (from stateJson.hermesNativeSkillCount)
- Memory usage (from stateJson.hermesMemoryBytes, formatted as "X.X GB" or "Not loaded")
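
The memory line's formatting can be sketched as follows. `formatHermesMemory` is a hypothetical helper, and dividing by 1024³ while labelling the result "GB" is an assumption about the desired display convention. A value of `0` is rendered as "0.0 GB" rather than "Not loaded", since a loaded model can report zero VRAM when it runs entirely on the CPU.

```typescript
const BYTES_PER_GB = 1024 * 1024 * 1024;

// Hypothetical helper: formats stateJson.hermesMemoryBytes for display.
function formatHermesMemory(bytes: number | null | undefined): string {
  // null/undefined (or a non-finite value) means no /api/ps entry was found.
  if (bytes == null || !Number.isFinite(bytes)) return "Not loaded";
  return `${(bytes / BYTES_PER_GB).toFixed(1)} GB`;
}
```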

---

## Environment Availability

| Dependency | Required By | Available | Version | Fallback |
|------------|------------|-----------|---------|----------|
| Ollama daemon | OLLA-01 through OLLA-05 | No (not installed) | — | All paths degrade gracefully; UI shows install instructions |
| hermes-paperclip-adapter | HERM-05, HERM-06, HERM-07 | Yes | 0.2.1 | — |
| Node.js fetch | Ollama HTTP probing | Yes | built-in (Node 18+) | — |
| Node.js os module | OLLA-04 RAM reading | Yes | built-in | — |
| Vitest | Tests | Yes | (server vitest.config.ts) | — |

**Missing dependencies with no fallback:** None — all Ollama features degrade gracefully when Ollama is absent.

**Pre-existing test failures (not Phase 28 regressions):** 4 test files failing before Phase 28 begins:
- `app-hmr-port.test.ts`
- `plugin-worker-manager.test.ts`
- `heartbeat-workspace-session.test.ts` (5 tests)
- `skill-registry-routes.test.ts` (1 test)

---

## Validation Architecture

### Test Framework
| Property | Value |
|----------|-------|
| Framework | Vitest (server) |
| Config file | `server/vitest.config.ts` |
| Quick run command | `cd server && npx vitest run src/__tests__/ollama-service.test.ts` |
| Full suite command | `cd server && npx vitest run` |

### Phase Requirements → Test Map
| Req ID | Behavior | Test Type | Automated Command | File Exists? |
|--------|----------|-----------|-------------------|-------------|
| OLLA-01 | `detectOllama()` returns `installed: false` when Ollama absent | unit | `npx vitest run src/__tests__/ollama-service.test.ts` | No — Wave 0 |
| OLLA-01 | `detectOllama()` returns `installed: true` + version when Ollama present | unit | same | No — Wave 0 |
| OLLA-01 | `detectOllama()` times out cleanly (AbortController) | unit | same | No — Wave 0 |
| OLLA-02 | `listOllamaModels()` returns `OllamaModel[]` from /api/tags | unit | same | No — Wave 0 |
| OLLA-04 | `buildModelRecommendation()` returns correct model for given RAM budget | unit | same | No — Wave 0 |
| OLLA-05 | Routes return `installUrl` when Ollama absent | unit | same | No — Wave 0 |
| HERM-05 | Skills tab renders `originLabel: "Hermes skill"` badge | manual-only | — | — |
| HERM-06 | `updateRuntimeState` records cost event when Hermes emits token data | unit (existing pattern) | `npx vitest run src/__tests__/costs-service.test.ts` | Yes |
| HERM-07 | stateJson receives hermesModel/hermesNativeSkillCount after run | unit | `npx vitest run src/__tests__/ollama-service.test.ts` | No — Wave 0 |

### Sampling Rate
- **Per task commit:** `cd server && npx vitest run src/__tests__/ollama-service.test.ts`
- **Per wave merge:** `cd server && npx vitest run`
- **Phase gate:** Full suite green before `/gsd:verify-work` (excluding 4 pre-existing failures)

### Wave 0 Gaps
- [ ] `server/src/__tests__/ollama-service.test.ts` — covers OLLA-01, OLLA-02, OLLA-04, OLLA-05, HERM-07 stateJson logic
- [ ] Test stubs use mock fetch (AbortController pattern); no real Ollama needed
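
One way to make the detection logic testable without a real Ollama is to inject the fetch function. The sketch below is an assumption about how the stubs could look, not the planned implementation: `detectOllamaWith`, `FetchLike`, the three fake fetchers, and the version string in the fake response are all hypothetical, and `INSTALL_URL` is an assumed value.

```typescript
interface OllamaStatus {
  installed: boolean;
  version: string | null;
  installUrl: string;
}

// Minimal subset of fetch that the probe needs; the global fetch satisfies it.
type FetchLike = (
  url: string,
  init: { signal: AbortSignal },
) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

const INSTALL_URL = "https://ollama.com/download"; // assumed install page

async function detectOllamaWith(
  fetchImpl: FetchLike,
  baseUrl: string = "http://localhost:11434",
  timeoutMs: number = 3000,
): Promise<OllamaStatus> {
  const absent: OllamaStatus = { installed: false, version: null, installUrl: INSTALL_URL };
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl(`${baseUrl}/api/version`, { signal: controller.signal });
    if (!res.ok) return absent;
    const body = (await res.json()) as { version?: string } | null;
    return { installed: true, version: body?.version ?? null, installUrl: INSTALL_URL };
  } catch {
    // Connection refused, abort on timeout, or malformed JSON all mean "absent".
    return absent;
  } finally {
    clearTimeout(timer);
  }
}

// Fake fetchers for the three OLLA-01 cases (illustrative data only).
const fakePresent: FetchLike = async () => ({
  ok: true,
  json: async () => ({ version: "0.0.0-test" }),
});
const fakeRefused: FetchLike = async () => {
  throw new Error("ECONNREFUSED");
};
const fakeHanging: FetchLike = (_url, { signal }) =>
  new Promise<never>((_resolve, reject) => {
    // Never resolves on its own; rejects only when the probe aborts it.
    signal.addEventListener("abort", () => reject(new Error("aborted")));
  });
```

Passing a short `timeoutMs` with `fakeHanging` exercises the AbortController path quickly, and production code could pass the global `fetch` as `fetchImpl`.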

---

## State of the Art

| Old Approach | Current Approach | When Changed | Impact |
|--------------|------------------|--------------|--------|
| Manual text entry for Hermes model | Dropdown fed from Ollama + manual fallback | Phase 28 | Better UX for local models |
| stateJson unused for Hermes | stateJson stores hermesModel, skillCount, memoryBytes | Phase 28 | Dashboard can show runtime info |
| Hermes native skills in separate table only | Skills tab renders both managed + native in unified view | Phase 28 (HERM-05 completion) | Unified skill surface |

---

## Open Questions

1. **Should Ollama route be gated to hermes_local only?**
   - What we know: Only Hermes uses the Ollama custom endpoint pattern currently
   - What's unclear: Future adapters (Phase 29 defaults) may also use Ollama
   - Recommendation: Mount under `/companies/:companyId/ollama/*` without adapter-type gating — the endpoint is useful generically and Pi/OpenCode adapters may benefit in Phase 29

2. **Should listOllamaModels also extend the hermes adapter's `listModels` function?**
   - What we know: `listAdapterModels("hermes_local")` already calls `adapter.listModels()` if present; hermes adapter has no `listModels` implementation (returns `models: []`)
   - What's unclear: Whether to add `listModels` to hermes adapter (requires adapter package change) or use a separate Ollama API route in Nexus
   - Recommendation: Use a separate Nexus route (`/companies/:companyId/ollama/models`). Avoids changing the hermes-paperclip-adapter package (external dependency). The config-fields.tsx component can call the Nexus route directly. **Do not modify the hermes-paperclip-adapter package.**

3. **stateJson hermesNativeSkillCount — count from skillRegistry or from adapter snapshot?**
   - What we know: `skillRegistryDb` is a separate libSQL DB; querying it in `updateRuntimeState` adds cross-DB complexity
   - What's unclear: Is the extra query worth it for a display-only count?
   - Recommendation: Store the count from `result.resultJson` if Hermes emits it, or derive from the adapter skill snapshot after run. Alternatively, skip native skill count from stateJson and derive it in the UI from `agentsApi.skills(agentId)` query. The UI approach avoids cross-DB concerns in heartbeat.

---

## Sources

### Primary (HIGH confidence)
- hermes-paperclip-adapter@0.2.1 dist source code — `execute.js`, `skills.js`, `detect-model.js`, `test.js`, `constants.js` — read directly from `/opt/nexus/server/node_modules/hermes-paperclip-adapter/dist/`
- Nexus codebase — `server/src/services/heartbeat.ts`, `server/src/services/costs.ts`, `server/src/services/skill-registry.ts`, `ui/src/pages/AgentDetail.tsx`, `ui/src/adapters/hermes-local/config-fields.tsx` — read directly
- Ollama REST API — `https://docs.ollama.com/api/tags` — verified /api/tags response shape with `details.parameter_size`, `details.family`, `details.quantization_level`
- Node.js built-ins — `os.totalmem()`, `fetch` with AbortController — confirmed available in Node 18+ runtime

### Secondary (MEDIUM confidence)
- Hermes Agent provider docs — `https://hermes-agent.nousresearch.com/docs/integrations/providers/` — verified "ollama uses custom provider + localhost:11434/v1 base_url"
- Hermes Agent + Ollama guide — Medium/Substack articles cross-referencing official docs — confirmed custom endpoint configuration steps

### Tertiary (LOW confidence)
- Ollama model RAM requirements (catalog) — community sources + Ollama model page tags — use conservative estimates; verify against https://ollama.com/library model pages before shipping

---

## Metadata

**Confidence breakdown:**
- Ollama API: HIGH — verified from official docs, response shapes confirmed
- Hermes + Ollama provider mapping: HIGH — verified from official Hermes provider docs
- Standard stack: HIGH — all existing infrastructure confirmed from source code
- Architecture patterns: HIGH — follow existing codex-models.ts, heartbeat.ts, config-fields.tsx patterns exactly
- HERM-05 data layer status: HIGH — verified syncHermesNativeSkills exists and is already called
- HERM-06 cost tracking: HIGH — execute.js returns usage/costUsd, heartbeat.ts wires it to costService
- Pitfalls: HIGH — derived from actual source code analysis

**Research date:** 2026-04-01
**Valid until:** 2026-05-01 (Ollama API is stable; hermes-paperclip-adapter may receive new releases)