docs(28): research phase domain — Ollama API, Hermes config surface, cost tracking, dashboard

Nexus Dev 2026-04-02 16:45:03 +00:00
parent 5fd7744516
commit ee7f47c4d0


@ -0,0 +1,522 @@
# Phase 28: Ollama Integration & Agent Surface - Research
**Researched:** 2026-04-01
**Domain:** Ollama HTTP API, Hermes adapter extension, agent dashboard UI, cost tracking
**Confidence:** HIGH
## Summary
Phase 28 adds three distinct capabilities on top of the completed Phase 27 Hermes adapter: (1) Ollama detection and model catalog — Nexus queries `localhost:11434` to detect Ollama, lists available models, and ships a static JSON catalog for hardware-aware recommendations; (2) Hermes config surface extension — the model field in `config-fields.tsx` becomes a dropdown fed by live Ollama discovery rather than a free-text input, and a new `base_url`/`provider: custom` adapterConfig field routes Hermes to the local endpoint; (3) Hermes runtime data in the dashboard — `stateJson` in `agentRuntimeState` is the right place to store Hermes-specific runtime metadata (model name, native skill count, memory usage from Ollama's `/api/ps`), and the `AgentOverview` component in `AgentDetail.tsx` is the right insertion point.
The most important finding is that **Hermes does not have a native "ollama" provider**. Ollama is configured as a custom OpenAI-compatible endpoint: `provider: custom`, `base_url: http://localhost:11434/v1`. The model field passes the Ollama model name bare (e.g. `qwen2.5-coder:32b`). This shapes OLLA-02, OLLA-03, and the `config-fields.tsx` changes.
For cost tracking (HERM-06): `hermes-paperclip-adapter@0.2.1` already parses `token_usage` and `cost` regex patterns from Hermes stdout. When Hermes returns non-zero usage, `heartbeat.ts:updateRuntimeState` already calls `costService.createEvent`. The only gap is that Hermes running local Ollama models will have `costUsd = undefined` (no billing) — the infrastructure handles this correctly (zero cost event is suppressed when `additionalCostCents === 0 && !hasTokenUsage`). No cost tracking code changes are needed for local models; the planner just needs to verify the regex path works end-to-end.
For HERM-05 (skill visibility): `syncHermesNativeSkills` already exists in `skillRegistryService` and is already called from the `GET /skill-registry/agents/:agentId/skills` route when `adapterType === "hermes_local"`. The Hermes adapter's `listHermesSkills` function merges Paperclip-managed and native skills. The integration is already complete at the data layer. What is missing is the UI surface in the Skills tab that renders the `originLabel: "Hermes skill"` / `readOnly: true` entries distinctly from managed skills.
**Primary recommendation:** Implement as four focused plans — (P01) server-side Ollama service + routes; (P02) Hermes config-fields UI extension for Ollama model selection; (P03) dashboard Hermes runtime info card; (P04) model catalog JSON + recommendation logic.
---
<user_constraints>
## User Constraints (from CONTEXT.md)
### Locked Decisions
All implementation choices are at Claude's discretion — discuss phase was skipped per user setting. Use ROADMAP phase goal, success criteria, and codebase conventions to guide decisions.
### Claude's Discretion
All implementation choices are at Claude's discretion.
### Deferred Ideas (OUT OF SCOPE)
None — discuss phase skipped. Refer to REQUIREMENTS.md for in-scope requirements.
Out of scope per REQUIREMENTS.md:
- Multi-provider model routing (Hermes can use OpenRouter/Anthropic/OpenAI but that's Hermes config, not Nexus)
- Hermes MCP server management
- Custom Hermes skill authoring UI
- DFLT-01 through DFLT-04 (Phase 29)
</user_constraints>
---
<phase_requirements>
## Phase Requirements
| ID | Description | Research Support |
|----|-------------|------------------|
| OLLA-01 | Nexus detects whether Ollama is installed locally | HTTP probe to `localhost:11434/api/version`; new server service `ollamaService` |
| OLLA-02 | User can see list of available Ollama models when configuring a Hermes agent | `GET /api/tags` from Ollama HTTP API; new server route `GET /companies/:id/ollama/models`; config-fields.tsx dropdown |
| OLLA-03 | User can configure a Hermes agent with any local Ollama model | Sets `adapterConfig.model = <model-name>`, `adapterConfig.provider = "custom"`, `adapterConfig.base_url = "http://localhost:11434/v1"` |
| OLLA-04 | Model recommendation based on RAM/VRAM from a shipped catalog | Static JSON catalog in `server/src/data/ollama-model-catalog.json`; server reads `os.totalmem()` to filter; returned with model list |
| OLLA-05 | If Ollama is not present, user is offered installation instructions | Ollama status endpoint returns `installed: false` + `installUrl`; UI shows callout in Hermes config-fields |
| HERM-05 | Nexus-managed skills visible alongside Hermes native skills in agent config | Already wired at data layer — UI Skills tab needs `originLabel: "Hermes skill"` rendering distinction |
| HERM-06 | Cost tracking captures token usage and model costs for Hermes agents | Infrastructure already handles this; verify end-to-end with local Ollama (zero cost is correct, no change needed) |
| HERM-07 | Dashboard shows Hermes-specific info (model name, memory usage, native skill count) | Store in `agentRuntimeState.stateJson`; render in `AgentOverview` component |
</phase_requirements>
---
## Standard Stack
### Core
| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| Node.js `os` module | built-in | Read total system RAM | Already used in heartbeat.ts; no new dep |
| Node.js `fetch` | Node 18+ built-in | HTTP calls to Ollama API at localhost:11434 | Already confirmed available in runtime |
| `hermes-paperclip-adapter` | 0.2.1 (installed) | Hermes execution, skill sync, model detection | Already wired into adapter registry |
### No New Dependencies Required
All capabilities needed for Phase 28 are achievable with existing infrastructure:
- Ollama HTTP API is probed with `fetch` (built-in Node 18+)
- Model catalog is a static JSON file in the server package
- RAM reading uses `os.totalmem()` (built-in)
- Hermes Ollama configuration uses existing `adapterConfig` fields
## Architecture Patterns
### Recommended Project Structure (new files)
```
server/src/services/ollama.ts # ollamaService — detect + list models
server/src/routes/ollama.ts # HTTP routes: /companies/:id/ollama/status, /models
server/src/data/ollama-model-catalog.json # shipped catalog for OLLA-04
server/src/__tests__/ollama-service.test.ts # unit tests for ollamaService
ui/src/api/ollama.ts # ollamaApi client — wraps server routes
```
### Pattern 1: Ollama Service (server-side)
```typescript
// server/src/services/ollama.ts
const OLLAMA_BASE_URL = process.env.OLLAMA_BASE_URL ?? "http://localhost:11434";
const OLLAMA_TIMEOUT_MS = 3000;
// Assumed value: the Ollama download page, returned to the UI for the OLLA-05 callout.
const INSTALL_URL = "https://ollama.com/download";

export interface OllamaStatus {
  installed: boolean;
  version: string | null;
  installUrl: string;
}

export interface OllamaModel {
  name: string;           // e.g. "qwen2.5-coder:32b"
  parameterSize: string;  // e.g. "32.8B" from /api/tags details
  quantization: string;   // e.g. "Q4_K_M"
  sizeBytes: number;
  family: string;         // e.g. "qwen2"
  recommended: boolean;   // from catalog match + RAM check
  recommendationReason: string | null;
}

export async function detectOllama(): Promise<OllamaStatus> {
  // Abort the probe after OLLAMA_TIMEOUT_MS so a missing daemon cannot hang the request.
  const controller = new AbortController();
  const timeout = setTimeout(() => controller.abort(), OLLAMA_TIMEOUT_MS);
  try {
    const res = await fetch(`${OLLAMA_BASE_URL}/api/version`, {
      signal: controller.signal,
    });
    if (!res.ok) return { installed: false, version: null, installUrl: INSTALL_URL };
    const body = await res.json() as { version?: string };
    return { installed: true, version: body.version ?? null, installUrl: INSTALL_URL };
  } catch {
    // Connection refused, timeout (abort), or invalid JSON: report "not installed" instead of throwing.
    return { installed: false, version: null, installUrl: INSTALL_URL };
  } finally {
    clearTimeout(timeout);
  }
}
```
**Why this pattern:** Matches the existing codex-models.ts pattern — HTTP fetch with timeout, graceful failure returns empty/false rather than throwing. The 3s timeout prevents hanging requests when Ollama is not installed.
### Pattern 2: Ollama Routes (mounted under /companies/:companyId)
```
GET /companies/:companyId/ollama/status
→ { installed: boolean, version: string|null, installUrl: string }
GET /companies/:companyId/ollama/models
→ { models: OllamaModel[], ramGb: number }
```
Both routes use existing `assertCompanyAccess(req, companyId)` authz pattern from `agents.ts`.
Mount in `server/src/routes/index.ts` alongside the existing `agentsRoutes`.
### Pattern 3: Hermes Config-Fields Enhancement
The existing `HermesLocalConfigFields` in `config-fields.tsx` has a free-text `Model` input. For Ollama support, it becomes a hybrid: dropdown (when Ollama is present) + manual entry fallback.
```tsx
// Fetch Ollama status + models (only for hermes_local adapter)
const { data: ollamaStatus } = useQuery({
  queryKey: ["ollama", "status", companyId],
  queryFn: () => ollamaApi.status(companyId!),
  enabled: Boolean(companyId),
});
const { data: ollamaModels } = useQuery({
  queryKey: ["ollama", "models", companyId],
  queryFn: () => ollamaApi.models(companyId!),
  enabled: Boolean(companyId && ollamaStatus?.installed),
});
```
When `ollamaStatus.installed === false`, render an install callout (OLLA-05) instead of the dropdown.
When a local Ollama model is selected, `buildHermesConfig` (or `mark`) must also set `provider: "custom"` and `base_url: "http://localhost:11434/v1"` in `adapterConfig`. This is the critical mapping from OLLA-03.
### Pattern 4: Hermes Runtime Data in stateJson (HERM-07)
`agentRuntimeState.stateJson` is `jsonb` typed as `Record<string, unknown>`. The heartbeat service writes this via `updateRuntimeState`. The Hermes adapter's `execute.ts` already returns `resultJson` with `session_id`, `usage`, and `cost_usd`.
For HERM-07 runtime data (model name, native skill count, memory usage), the server-side approach is:
- After a Hermes run completes, read `resultJson.result` and extract/store model + detected skill count into `stateJson`
- Optionally query Ollama `/api/ps` (running models) to get `size_vram` for memory usage display
**Insertion point for stateJson patch:** `heartbeat.ts:updateRuntimeState` already calls `db.update(agentRuntimeState).set(...)`. Add a `stateJson` merge here when `adapterType === "hermes_local"`.
**UI insertion point:** `AgentOverview` component in `AgentDetail.tsx` (line ~1183). Add a `HermesRuntimeCard` component after the charts section, gated by `agent.adapterType === "hermes_local"`:
```tsx
{agent.adapterType === "hermes_local" && runtimeState && (
<HermesRuntimeCard runtimeState={runtimeState} />
)}
```
### Pattern 5: Model Catalog JSON (OLLA-04)
```json
// server/src/data/ollama-model-catalog.json
{
  "models": [
    {
      "family": "qwen2",
      "variants": [
        { "name": "qwen2.5-coder:7b", "ramGb": 5, "vramGb": 5, "quality": "fast" },
        { "name": "qwen2.5-coder:32b", "ramGb": 22, "vramGb": 22, "quality": "best" }
      ]
    },
    {
      "family": "llama",
      "variants": [
        { "name": "llama3.2:3b", "ramGb": 3, "vramGb": 3, "quality": "fast" },
        { "name": "llama3.1:8b", "ramGb": 6, "vramGb": 6, "quality": "balanced" },
        { "name": "llama3.1:70b", "ramGb": 48, "vramGb": 48, "quality": "best" }
      ]
    },
    {
      "family": "mistral",
      "variants": [
        { "name": "mistral:7b", "ramGb": 5, "vramGb": 5, "quality": "balanced" },
        { "name": "mistral:22b", "ramGb": 14, "vramGb": 14, "quality": "best" }
      ]
    },
    {
      "family": "phi",
      "variants": [
        { "name": "phi4:14b", "ramGb": 10, "vramGb": 10, "quality": "balanced" }
      ]
    },
    {
      "family": "deepseek",
      "variants": [
        { "name": "deepseek-r1:7b", "ramGb": 5, "vramGb": 5, "quality": "reasoning" },
        { "name": "deepseek-r1:32b", "ramGb": 22, "vramGb": 22, "quality": "reasoning" }
      ]
    }
  ]
}
```
Recommendation logic: `os.totalmem()` gives total RAM. Use 75% as usable RAM budget (leave OS headroom). Filter catalog entries where `ramGb <= totalRamGb * 0.75`. Return the highest-quality variant within budget plus a `recommendationReason` string.
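The budget rule above can be sketched as a pure function. This is a minimal illustration under stated assumptions, not the shipped implementation: `buildModelRecommendation` is the name used in the test map, but its signature here is a guess, and the `QUALITY_RANK` ordering is hypothetical because the catalog does not say how `"reasoning"` ranks against `"best"`. Total memory is passed in as a parameter so the function can be tested without touching `os.totalmem()`.

```typescript
type Quality = "fast" | "balanced" | "best" | "reasoning";

interface CatalogVariant {
  name: string;
  ramGb: number;
  vramGb: number;
  quality: Quality;
}

interface CatalogFamily {
  family: string;
  variants: CatalogVariant[];
}

interface ModelRecommendation {
  name: string;
  recommendationReason: string;
}

const USABLE_RAM_FRACTION = 0.75; // leave 25% headroom for the OS
const BYTES_PER_GIB = 1024 * 1024 * 1024;

// Assumption: illustrative ordering only; the catalog does not define one.
const QUALITY_RANK: Record<Quality, number> = { fast: 0, balanced: 1, reasoning: 2, best: 3 };

// True when `a` should be preferred over `b`: higher quality first, larger model as tie-break.
function isBetter(a: CatalogVariant, b: CatalogVariant): boolean {
  const rankDiff = QUALITY_RANK[a.quality] - QUALITY_RANK[b.quality];
  return rankDiff !== 0 ? rankDiff > 0 : a.ramGb > b.ramGb;
}

export function buildModelRecommendation(
  catalog: CatalogFamily[],
  totalMemBytes: number,
): ModelRecommendation | null {
  const totalRamGb = totalMemBytes / BYTES_PER_GIB;
  const budgetGb = totalRamGb * USABLE_RAM_FRACTION;

  let best: CatalogVariant | null = null;
  for (const family of catalog) {
    for (const variant of family.variants) {
      if (variant.ramGb > budgetGb) continue; // does not fit the budget
      if (best === null || isBetter(variant, best)) best = variant;
    }
  }
  if (best === null) return null; // nothing fits; the caller shows no recommendation

  return {
    name: best.name,
    recommendationReason:
      `Needs about ${best.ramGb} GB and fits the ${budgetGb.toFixed(1)} GB budget ` +
      `(75% of ${totalRamGb.toFixed(1)} GB RAM)`,
  };
}
```

On the server this would be called as `buildModelRecommendation(catalog.models, os.totalmem())`. Returning `null` when nothing fits keeps the "never throw when Ollama paths are optional" rule intact.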
### Anti-Patterns to Avoid
- **Polling Ollama in a loop:** Use a 60-second TTL in-memory cache (same as codex-models.ts `MODELS_CACHE_TTL_MS`). Do not re-probe on every API call.
- **Blocking server startup on Ollama check:** Ollama detection is on-demand (per-request), not at startup.
- **Hard-coding `localhost:11434`:** Always read from `process.env.OLLAMA_BASE_URL ?? "http://localhost:11434"` so users with non-standard ports work.
- **Requiring Ollama for Hermes:** All Ollama paths are optional. Hermes without Ollama continues to work unchanged. Never throw when Ollama is absent.
- **Overwriting all of stateJson:** Merge into stateJson using spread, never replace: `stateJson: { ...existingState, hermesModel: ..., hermesNativeSkillCount: ... }`.
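The TTL cache from the first bullet can be sketched as a small wrapper around any async loader. This is an illustration only: `createTtlCache` is a hypothetical name, not the helper in codex-models.ts, and the injectable `now` clock exists purely to make the sketch testable.

```typescript
// Hypothetical helper: cache the result of an async loader for `ttlMs` milliseconds.
// Caching the promise (not the resolved value) also de-duplicates concurrent callers.
export function createTtlCache<T>(
  ttlMs: number,
  load: () => Promise<T>,
  now: () => number = Date.now,
): () => Promise<T> {
  let entry: { promise: Promise<T>; expiresAt: number } | null = null;

  return function get(): Promise<T> {
    const t = now();
    if (entry !== null && t < entry.expiresAt) return entry.promise;

    const promise = load();
    const fresh = { promise, expiresAt: t + ttlMs };
    entry = fresh;
    // Do not keep a failed load cached for the whole TTL.
    promise.catch(() => {
      if (entry === fresh) entry = null;
    });
    return promise;
  };
}
```

With this shape, the status route could hold something like `const getStatus = createTtlCache(60_000, detectOllama)` at module scope and call `getStatus()` per request, so Ollama is probed at most once a minute.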
---
## Don't Hand-Roll
| Problem | Don't Build | Use Instead | Why |
|---------|-------------|-------------|-----|
| Ollama connectivity check | Custom TCP socket probe | `fetch` to `/api/version` with AbortController timeout | Reuses existing pattern from codex-models.ts |
| YAML config parsing | Full YAML parser | Existing `parseModelFromConfig` in hermes adapter | Already ships in hermes-paperclip-adapter/dist |
| System RAM reading | Shell commands | `os.totalmem()` | Built-in, no dep, works cross-platform |
| Token cost tracking | New billing logic | Existing `costService.createEvent` + `updateRuntimeState` | Already handles Hermes via regex-extracted usage |
---
## Common Pitfalls
### Pitfall 1: Hermes Does Not Have an "ollama" Provider
**What goes wrong:** Setting `adapterConfig.provider = "ollama"` causes Hermes to fail because "ollama" is not an entry in `VALID_PROVIDERS` in `constants.js`.
**Why it happens:** Ollama mimics the OpenAI API, so Hermes treats it as `provider: "custom"` with `base_url: "http://localhost:11434/v1"`.
**How to avoid:** When a user selects an Ollama model, always write `provider: "custom"` and `base_url: "http://localhost:11434/v1"` into `adapterConfig`. These fields are already in the Hermes config schema (see `agentConfigurationDoc`).
**Warning signs:** Hermes stderr shows "unknown provider" or authentication errors during local model runs.
### Pitfall 2: Ollama API Returns Models at `/api/tags`, Not `/v1/models`
**What goes wrong:** Using the OpenAI-compat endpoint `/v1/models` to list models misses the `details` object (parameterSize, quantization_level, family) needed for OLLA-04.
**Why it happens:** `/v1/models` is OpenAI-compat, `/api/tags` is Ollama-native with richer data.
**How to avoid:** Use `GET localhost:11434/api/tags` for model listing (returns `details.parameter_size`, `details.family`). Use `/v1/models` only if passing through to Hermes.
### Pitfall 3: stateJson Merge Requires Read-Modify-Write
**What goes wrong:** `db.update(agentRuntimeState).set({ stateJson: newData })` overwrites other fields stored by other parts of the system.
**Why it happens:** Drizzle `.set()` replaces the entire column value.
**How to avoid:** Use Postgres jsonb merge: `stateJson: sql\`${agentRuntimeState.stateJson} || ${JSON.stringify(patch)}::jsonb\`` or read existing `stateJson` first, then spread. The existing `ensureRuntimeState` call in `updateRuntimeState` already reads the row.
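The read-then-spread variant can be sketched as a small pure helper. This is a minimal illustration, not code from `heartbeat.ts`, and the name `mergeStateJson` is hypothetical. Like the Postgres `||` operator on jsonb objects, it merges top-level keys only.

```typescript
type StateJson = Record<string, unknown>;

// Hypothetical helper: shallow-merge a patch into the existing stateJson.
// Keys in `patch` win; every other existing key is preserved.
export function mergeStateJson(
  existing: StateJson | null | undefined,
  patch: StateJson,
): StateJson {
  return { ...(existing ?? {}), ...patch };
}
```

The merged object is what would be passed to `.set({ stateJson: ... })`, so keys written by other parts of the system survive the Hermes update.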
### Pitfall 4: HermesLocalConfigFields Uses adapterConfig for Both Create and Edit Modes
**What goes wrong:** Setting `provider` and `base_url` only in create mode loses the values on edit, or vice versa.
**Why it happens:** The `isCreate` flag switches between `set!({ model: v })` (create) and `mark("adapterConfig", "model", v)` (edit) — both paths must update all three fields (model, provider, base_url) when an Ollama model is selected.
**How to avoid:** When Ollama model is selected, call the setter for all three config fields atomically. For create mode: `set!({ model, provider: "custom", base_url: "http://localhost:11434/v1" })`. For edit mode: three `mark()` calls or a compound helper.
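One possible shape for the compound helper is sketched below. The `CreateSetter` and `EditMarker` types are assumptions inferred from the `set!({ model: v })` and `mark("adapterConfig", "model", v)` calls mentioned above, and `applyOllamaModelSelection` is a hypothetical name. The base URL is a parameter with the documented default so a non-standard Ollama port could be passed through.

```typescript
// Assumed shapes, inferred from the calls described in this document.
type CreateSetter = (patch: Record<string, string>) => void;
type EditMarker = (group: "adapterConfig", field: string, value: string) => void;

type SelectionMode =
  | { isCreate: true; set: CreateSetter }
  | { isCreate: false; mark: EditMarker };

const DEFAULT_OLLAMA_OPENAI_BASE_URL = "http://localhost:11434/v1";

// Hypothetical helper: always write model, provider, and base_url together,
// so create mode and edit mode cannot drift apart.
export function applyOllamaModelSelection(
  model: string,
  mode: SelectionMode,
  baseUrl: string = DEFAULT_OLLAMA_OPENAI_BASE_URL,
): void {
  if (mode.isCreate) {
    mode.set({ model, provider: "custom", base_url: baseUrl });
    return;
  }
  mode.mark("adapterConfig", "model", model);
  mode.mark("adapterConfig", "provider", "custom");
  mode.mark("adapterConfig", "base_url", baseUrl);
}
```

Routing both modes through one function means a future fourth field only has to be added in one place.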
### Pitfall 5: Ollama /api/ps Probe May Have No Models Running
**What goes wrong:** `/api/ps` returns an empty `models: []` when no model is currently loaded — this does not mean Ollama is absent.
**Why it happens:** Ollama only shows models in `/api/ps` when they are actively loaded in memory.
**How to avoid:** Use `/api/version` for detection (OLLA-01), `/api/tags` for the model list (OLLA-02), and `/api/ps` only for the optional "memory usage" metric in HERM-07 — handling the empty case as "not currently loaded".
### Pitfall 6: HERM-06 Cost Tracking — Ollama Models Return Zero Cost
**What goes wrong:** Expecting a `cost_usd` value from runs using local Ollama models — there is no external billing.
**Why it happens:** Hermes does not know the user's GPU/CPU cost. The `COST_REGEX` will not match if Hermes does not emit a cost line.
**How to avoid:** This is correct behavior. `normalizeBilledCostCents(undefined, "unknown")` returns `0`. Token usage may still be captured if Hermes emits token counts. Accept that Ollama-based runs show $0.00 in the cost UI — that is accurate.
---
## Code Examples
### Ollama /api/tags Response Shape (verified)
```typescript
// Source: https://docs.ollama.com/api/tags (verified 2026-04-01)
interface OllamaTagsResponse {
  models: Array<{
    name: string;                  // "qwen2.5-coder:32b"
    model: string;                 // same as name
    modified_at: string;
    size: number;                  // bytes
    digest: string;
    details: {
      parent_model: string;
      format: string;              // "gguf"
      family: string;              // "qwen2"
      families: string[];
      parameter_size: string;      // "32.8B"
      quantization_level: string;  // "Q4_K_M"
    };
  }>;
}
```
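Mapping one `/api/tags` entry onto the `OllamaModel` shape from Pattern 1 could look like the sketch below. The types are redeclared in trimmed form so the snippet stands alone, and `toOllamaModel` and its `recommendation` parameter are hypothetical names rather than existing Nexus code.

```typescript
// Trimmed copy of one /api/tags entry; only the fields the mapper reads.
interface OllamaTagEntry {
  name: string;
  size: number; // bytes
  details: {
    family: string;
    parameter_size: string;
    quantization_level: string;
  };
}

interface OllamaModel {
  name: string;
  parameterSize: string;
  quantization: string;
  sizeBytes: number;
  family: string;
  recommended: boolean;
  recommendationReason: string | null;
}

// Hypothetical mapper: `recommendation` would come from the catalog + RAM check.
export function toOllamaModel(
  tag: OllamaTagEntry,
  recommendation: { name: string; reason: string } | null,
): OllamaModel {
  const reason =
    recommendation !== null && recommendation.name === tag.name ? recommendation.reason : null;
  return {
    name: tag.name,
    parameterSize: tag.details.parameter_size,
    quantization: tag.details.quantization_level,
    sizeBytes: tag.size,
    family: tag.details.family,
    recommended: reason !== null,
    recommendationReason: reason,
  };
}
```

A `listOllamaModels()` implementation would then be roughly `tags.models.map((t) => toOllamaModel(t, recommendation))` after fetching `/api/tags`.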
### Ollama /api/ps Response Shape (verified)
```typescript
// Source: https://docs.ollama.com/api/tags (verified 2026-04-01)
interface OllamaPsResponse {
  models: Array<{
    name: string;
    model: string;
    size: number;
    digest: string;
    details: { /* same as tags */ };
    expires_at: string;
    size_vram: number; // bytes used in VRAM
  }>;
}
```
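Reading the memory figure for one model out of a `/api/ps` response, including the empty `models: []` case, can be sketched as follows. `findLoadedVramBytes` is a hypothetical name, and the trimmed types cover only the two fields it reads.

```typescript
// Trimmed copy of the /api/ps shape; only the fields this helper reads.
interface OllamaPsEntry {
  name: string;
  size_vram: number; // bytes used in VRAM
}

interface OllamaPsSummary {
  models: OllamaPsEntry[];
}

// Hypothetical helper: returns the VRAM bytes for `modelName`,
// or null when that model is not currently loaded.
export function findLoadedVramBytes(ps: OllamaPsSummary, modelName: string): number | null {
  for (const entry of ps.models) {
    if (entry.name === modelName) return entry.size_vram;
  }
  return null;
}
```

A `null` result means "not currently loaded" and says nothing about whether Ollama is installed; detection stays with `/api/version`.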
### Reading hermes-adapter stateJson Hermes fields
```typescript
// In AgentDetail.tsx HermesRuntimeCard — read from runtimeState.stateJson
const hermesModel = runtimeState.stateJson?.hermesModel as string | undefined;
const hermesNativeSkillCount = runtimeState.stateJson?.hermesNativeSkillCount as number | undefined;
const hermesMemoryBytes = runtimeState.stateJson?.hermesMemoryBytes as number | undefined;
```
### Hermes Ollama adapterConfig (what to write)
```typescript
// When user selects an Ollama model in config-fields.tsx:
// model = "qwen2.5-coder:32b" (bare Ollama model name)
// provider = "custom" (OpenAI-compatible endpoint)
// base_url = "http://localhost:11434/v1"
// For create mode:
set!({ model, provider: "custom", base_url: "http://localhost:11434/v1" })
// For edit mode:
mark("adapterConfig", "model", model);
mark("adapterConfig", "provider", "custom");
mark("adapterConfig", "base_url", "http://localhost:11434/v1");
```
### Cost Tracking — Already Wired (HERM-06 context)
```typescript
// Source: server/src/services/heartbeat.ts:updateRuntimeState
// Hermes execute.ts returns:
// result.usage = { inputTokens, outputTokens } (from regex)
// result.costUsd = number | undefined (from regex, usually undefined for local)
//
// heartbeat.ts normalizes:
const usage = normalizeUsageTotals(result.usage);
const additionalCostCents = normalizeBilledCostCents(result.costUsd, billingType);
// Then:
if (additionalCostCents > 0 || hasTokenUsage) {
  await costs.createEvent(companyId, { ... model: result.model ?? "unknown" ... });
}
// → For Ollama: costCents=0, but inputTokens/outputTokens may be > 0 → cost event recorded
// → If Hermes doesn't emit token counts: no event recorded (correct behavior)
```
---
## HERM-05: Skill Visibility — What Is Already Done vs. What Is Missing
### Already Done (data layer is complete)
- `skillRegistryService.syncHermesNativeSkills(agentId)` scans `~/.hermes/skills/` and inserts `source: "native"` rows
- Called automatically from `GET /skill-registry/agents/:agentId/skills` when `adapterType === "hermes_local"`
- Returns `AgentSkillEntry[]` with `{ skillId, source, installedAt }` — both `"native"` and `"managed"` source values
- Hermes adapter `listHermesSkills` returns snapshot with `originLabel: "Hermes skill"` and `readOnly: true` for native skills
### What Is Missing (UI rendering in AgentSkillsTab)
The `unmanagedSkillRows` section in `AgentSkillsTab` (AgentDetail.tsx:2566) renders read-only adapter entries. It uses `entry.originLabel` and `entry.locationLabel` for display. Hermes native skills already flow through this path.
The gap: the UI may not clearly distinguish "Hermes skill" entries from other unmanaged entries. The `originLabel: "Hermes skill"` badge rendering and skill count display are the UI additions needed. This is a targeted render update to `AgentSkillsTab`, not a new data flow.
---
## HERM-07: Dashboard Hermes Runtime Info
### What to Store in stateJson
```typescript
// Written by heartbeat.ts updateRuntimeState after a Hermes run
{
  hermesModel: string;               // e.g. "qwen2.5-coder:32b" or "anthropic/claude-sonnet-4"
  hermesNativeSkillCount: number;    // from skillRegistryService query
  hermesMemoryBytes: number | null;  // from /api/ps size_vram, null if unavailable
}
```
### Where to Write stateJson
In `heartbeat.ts:updateRuntimeState`, after the existing `db.update(agentRuntimeState).set(...)` call, add a second update that merges hermes-specific fields when `agent.adapterType === "hermes_local"`. Read `result.model` for `hermesModel`. Query `skillRegistryDb` for `hermesNativeSkillCount`. Query Ollama `/api/ps` for `hermesMemoryBytes` (non-blocking, fire-and-forget).
### What to Render
A `HermesRuntimeCard` component in `AgentOverview` (gated by `adapterType === "hermes_local"`):
- Model name (from stateJson.hermesModel)
- Native skill count (from stateJson.hermesNativeSkillCount)
- Memory usage (from stateJson.hermesMemoryBytes, formatted as "X.X GB" or "Not loaded")
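The memory line could be formatted with a helper like the one below. This is a sketch: `formatHermesMemory` is a hypothetical name, and dividing by 1024³ (binary gigabytes) is an assumption, since the text above does not say which unit convention "GB" means.

```typescript
const BYTES_PER_GB = 1024 * 1024 * 1024; // assumption: binary gigabytes

// Hypothetical helper for HermesRuntimeCard: "X.X GB" when a value is stored,
// "Not loaded" when stateJson.hermesMemoryBytes is null or missing.
export function formatHermesMemory(bytes: number | null | undefined): string {
  if (bytes === null || bytes === undefined) return "Not loaded";
  return `${(bytes / BYTES_PER_GB).toFixed(1)} GB`;
}
```

Accepting `undefined` as well as `null` covers agents whose `stateJson` was written before these fields existed.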
---
## Environment Availability
| Dependency | Required By | Available | Version | Fallback |
|------------|------------|-----------|---------|----------|
| Ollama daemon | OLLA-01 through OLLA-05 | No (not installed) | — | All paths degrade gracefully; UI shows install instructions |
| hermes-paperclip-adapter | HERM-05, HERM-06, HERM-07 | Yes | 0.2.1 | — |
| Node.js fetch | Ollama HTTP probing | Yes | built-in (Node 18+) | — |
| Node.js os module | OLLA-04 RAM reading | Yes | built-in | — |
| Vitest | Tests | Yes | (server vitest.config.ts) | — |
**Missing dependencies with no fallback:** None — all Ollama features degrade gracefully when Ollama is absent.
**Pre-existing test failures (not Phase 28 regressions):** 4 test files failing before Phase 28 begins:
- `app-hmr-port.test.ts`
- `plugin-worker-manager.test.ts`
- `heartbeat-workspace-session.test.ts` (5 tests)
- `skill-registry-routes.test.ts` (1 test)
---
## Validation Architecture
### Test Framework
| Property | Value |
|----------|-------|
| Framework | Vitest (server) |
| Config file | `server/vitest.config.ts` |
| Quick run command | `cd server && npx vitest run src/__tests__/ollama-service.test.ts` |
| Full suite command | `cd server && npx vitest run` |
### Phase Requirements → Test Map
| Req ID | Behavior | Test Type | Automated Command | File Exists? |
|--------|----------|-----------|-------------------|-------------|
| OLLA-01 | `detectOllama()` returns `installed: false` when Ollama absent | unit | `npx vitest run src/__tests__/ollama-service.test.ts` | No — Wave 0 |
| OLLA-01 | `detectOllama()` returns `installed: true` + version when Ollama present | unit | same | No — Wave 0 |
| OLLA-01 | `detectOllama()` times out cleanly (AbortController) | unit | same | No — Wave 0 |
| OLLA-02 | `listOllamaModels()` returns `OllamaModel[]` from `/api/tags` | unit | same | No — Wave 0 |
| OLLA-04 | `buildModelRecommendation()` returns correct model for given RAM budget | unit | same | No — Wave 0 |
| OLLA-05 | Routes return `installUrl` when Ollama absent | unit | same | No — Wave 0 |
| HERM-05 | Skills tab renders `originLabel: "Hermes skill"` badge | manual-only | — | — |
| HERM-06 | `updateRuntimeState` records cost event when Hermes emits token data | unit (existing pattern) | `npx vitest run src/__tests__/costs-service.test.ts` | Yes |
| HERM-07 | stateJson receives hermesModel/hermesNativeSkillCount after run | unit | `npx vitest run src/__tests__/ollama-service.test.ts` | No — Wave 0 |
### Sampling Rate
- **Per task commit:** `cd server && npx vitest run src/__tests__/ollama-service.test.ts`
- **Per wave merge:** `cd server && npx vitest run`
- **Phase gate:** Full suite green before `/gsd:verify-work` (excluding 4 pre-existing failures)
### Wave 0 Gaps
- [ ] `server/src/__tests__/ollama-service.test.ts` — covers OLLA-01, OLLA-02, OLLA-04, OLLA-05, HERM-07 stateJson logic
- [ ] Test stubs use mock fetch (AbortController pattern); no real Ollama needed
---
## State of the Art
| Old Approach | Current Approach | When Changed | Impact |
|--------------|------------------|--------------|--------|
| Manual text entry for Hermes model | Dropdown fed from Ollama + manual fallback | Phase 28 | Better UX for local models |
| stateJson unused for Hermes | stateJson stores hermesModel, skillCount, memoryBytes | Phase 28 | Dashboard can show runtime info |
| Hermes native skills in separate table only | Skills tab renders both managed + native in unified view | Phase 28 (HERM-05 completion) | Unified skill surface |
---
## Open Questions
1. **Should Ollama route be gated to hermes_local only?**
- What we know: Only Hermes uses the Ollama custom endpoint pattern currently
- What's unclear: Future adapters (Phase 29 defaults) may also use Ollama
- Recommendation: Mount under `/companies/:companyId/ollama/*` without adapter-type gating — the endpoint is useful generically and Pi/OpenCode adapters may benefit in Phase 29
2. **Should listOllamaModels also extend the hermes adapter's `listModels` function?**
- What we know: `listAdapterModels("hermes_local")` already calls `adapter.listModels()` if present; hermes adapter has no `listModels` implementation (returns `models: []`)
- What's unclear: Whether to add `listModels` to hermes adapter (requires adapter package change) or use a separate Ollama API route in Nexus
- Recommendation: Use a separate Nexus route (`/companies/:companyId/ollama/models`). Avoids changing the hermes-paperclip-adapter package (external dependency). The config-fields.tsx component can call the Nexus route directly. **Do not modify the hermes-paperclip-adapter package.**
3. **stateJson hermesNativeSkillCount — count from skillRegistry or from adapter snapshot?**
- What we know: `skillRegistryDb` is a separate libSQL DB; querying it in `updateRuntimeState` adds cross-DB complexity
- What's unclear: Is the extra query worth it for a display-only count?
- Recommendation: Store the count from `result.resultJson` if Hermes emits it, or derive from the adapter skill snapshot after run. Alternatively, skip native skill count from stateJson and derive it in the UI from `agentsApi.skills(agentId)` query. The UI approach avoids cross-DB concerns in heartbeat.
---
## Sources
### Primary (HIGH confidence)
- hermes-paperclip-adapter@0.2.1 dist source code — `execute.js`, `skills.js`, `detect-model.js`, `test.js`, `constants.js` — read directly from `/opt/nexus/server/node_modules/hermes-paperclip-adapter/dist/`
- Nexus codebase — `server/src/services/heartbeat.ts`, `server/src/services/costs.ts`, `server/src/services/skill-registry.ts`, `ui/src/pages/AgentDetail.tsx`, `ui/src/adapters/hermes-local/config-fields.tsx` — read directly
- Ollama REST API — `https://docs.ollama.com/api/tags` — verified /api/tags response shape with `details.parameter_size`, `details.family`, `details.quantization_level`
- Node.js built-ins — `os.totalmem()`, `fetch` with AbortController — confirmed available in Node 18+ runtime
### Secondary (MEDIUM confidence)
- Hermes Agent provider docs — `https://hermes-agent.nousresearch.com/docs/integrations/providers/` — verified "ollama uses custom provider + localhost:11434/v1 base_url"
- Hermes Agent + Ollama guide — Medium/Substack articles cross-referencing official docs — confirmed custom endpoint configuration steps
### Tertiary (LOW confidence)
- Ollama model RAM requirements (catalog) — community sources + Ollama model page tags — use conservative estimates; verify against https://ollama.com/library model pages before shipping
---
## Metadata
**Confidence breakdown:**
- Ollama API: HIGH — verified from official docs, response shapes confirmed
- Hermes + Ollama provider mapping: HIGH — verified from official Hermes provider docs
- Standard stack: HIGH — all existing infrastructure confirmed from source code
- Architecture patterns: HIGH — follow existing codex-models.ts, heartbeat.ts, config-fields.tsx patterns exactly
- HERM-05 data layer status: HIGH — verified syncHermesNativeSkills exists and is already called
- HERM-06 cost tracking: HIGH — execute.js returns usage/costUsd, heartbeat.ts wires it to costService
- Pitfalls: HIGH — derived from actual source code analysis
**Research date:** 2026-04-01
**Valid until:** 2026-05-01 (Ollama API is stable; hermes-paperclip-adapter may receive new releases)