docs(28): research phase domain — Ollama API, Hermes config surface, cost tracking, dashboard
parent
5fd7744516
commit
ee7f47c4d0
1 changed files with 522 additions and 0 deletions
522
.planning/phases/28-ollama-integration/28-RESEARCH.md
Normal file
|
|
@ -0,0 +1,522 @@
|
|||
# Phase 28: Ollama Integration & Agent Surface - Research
|
||||
|
||||
**Researched:** 2026-04-01
|
||||
**Domain:** Ollama HTTP API, Hermes adapter extension, agent dashboard UI, cost tracking
|
||||
**Confidence:** HIGH
|
||||
|
||||
## Summary
|
||||
|
||||
Phase 28 adds three distinct capabilities on top of the completed Phase 27 Hermes adapter: (1) Ollama detection and model catalog — Nexus queries `localhost:11434` to detect Ollama, lists available models, and ships a static JSON catalog for hardware-aware recommendations; (2) Hermes config surface extension — the model field in `config-fields.tsx` becomes a dropdown fed by live Ollama discovery rather than a free-text input, and a new `base_url`/`provider: custom` adapterConfig field routes Hermes to the local endpoint; (3) Hermes runtime data in the dashboard — `stateJson` in `agentRuntimeState` is the right place to store Hermes-specific runtime metadata (model name, native skill count, memory usage from Ollama's `/api/ps`), and the `AgentOverview` component in `AgentDetail.tsx` is the right insertion point.
|
||||
|
||||
The most important finding is that **Hermes does not have a native "ollama" provider**. Ollama is configured as a custom OpenAI-compatible endpoint: `provider: custom`, `base_url: http://localhost:11434/v1`. The model field passes the Ollama model name bare (e.g. `qwen2.5-coder:32b`). This shapes OLLA-02, OLLA-03, and the `config-fields.tsx` changes.
|
||||
|
||||
For cost tracking (HERM-06): `hermes-paperclip-adapter@0.2.1` already parses `token_usage` and `cost` regex patterns from Hermes stdout. When Hermes returns non-zero usage, `heartbeat.ts:updateRuntimeState` already calls `costService.createEvent`. The only gap is that Hermes running local Ollama models will have `costUsd = undefined` (no billing) — the infrastructure handles this correctly (zero cost event is suppressed when `additionalCostCents === 0 && !hasTokenUsage`). No cost tracking code changes are needed for local models; the planner just needs to verify the regex path works end-to-end.
|
||||
|
||||
For HERM-05 (skill visibility): `syncHermesNativeSkills` already exists in `skillRegistryService` and is already called from the `GET /skill-registry/agents/:agentId/skills` route when `adapterType === "hermes_local"`. The Hermes adapter's `listHermesSkills` function merges Paperclip-managed and native skills. The integration is already complete at the data layer. What is missing is the UI surface in the Skills tab that renders the `originLabel: "Hermes skill"` / `readOnly: true` entries distinctly from managed skills.
|
||||
|
||||
**Primary recommendation:** Implement as four focused plans — (P01) server-side Ollama service + routes; (P02) Hermes config-fields UI extension for Ollama model selection; (P03) dashboard Hermes runtime info card; (P04) model catalog JSON + recommendation logic.
|
||||
|
||||
---
|
||||
|
||||
<user_constraints>
|
||||
## User Constraints (from CONTEXT.md)
|
||||
|
||||
### Locked Decisions
|
||||
All implementation choices are at Claude's discretion — discuss phase was skipped per user setting. Use ROADMAP phase goal, success criteria, and codebase conventions to guide decisions.
|
||||
|
||||
### Claude's Discretion
|
||||
All implementation choices are at Claude's discretion.
|
||||
|
||||
### Deferred Ideas (OUT OF SCOPE)
|
||||
None — discuss phase skipped. Refer to REQUIREMENTS.md for in-scope requirements.
|
||||
|
||||
Out of scope per REQUIREMENTS.md:
|
||||
- Multi-provider model routing (Hermes can use OpenRouter/Anthropic/OpenAI but that's Hermes config, not Nexus)
|
||||
- Hermes MCP server management
|
||||
- Custom Hermes skill authoring UI
|
||||
- DFLT-01 through DFLT-04 (Phase 29)
|
||||
</user_constraints>
|
||||
|
||||
---
|
||||
|
||||
<phase_requirements>
|
||||
## Phase Requirements
|
||||
|
||||
| ID | Description | Research Support |
|----|-------------|------------------|
| OLLA-01 | Nexus detects whether Ollama is installed locally | HTTP probe to `localhost:11434/api/version`; new server service `ollamaService` |
| OLLA-02 | User can see list of available Ollama models when configuring a Hermes agent | `GET /api/tags` from Ollama HTTP API; new server route `GET /companies/:id/ollama/models`; config-fields.tsx dropdown |
| OLLA-03 | User can configure a Hermes agent with any local Ollama model | Sets `adapterConfig.model = <model-name>`, `adapterConfig.provider = "custom"`, `adapterConfig.base_url = "http://localhost:11434/v1"` |
| OLLA-04 | Model recommendation based on RAM/VRAM from a shipped catalog | Static JSON catalog in `server/src/data/ollama-model-catalog.json`; server reads `os.totalmem()` to filter; returned with model list |
| OLLA-05 | If Ollama is not present, user is offered installation instructions | Ollama status endpoint returns `installed: false` + `installUrl`; UI shows callout in Hermes config-fields |
| HERM-05 | Nexus-managed skills visible alongside Hermes native skills in agent config | Already wired at data layer; the Skills tab UI needs to render `originLabel: "Hermes skill"` entries distinctly |
| HERM-06 | Cost tracking captures token usage and model costs for Hermes agents | Infrastructure already handles this; verify end-to-end with local Ollama (zero cost is correct, no change needed) |
| HERM-07 | Dashboard shows Hermes-specific info (model name, memory usage, native skill count) | Store in `agentRuntimeState.stateJson`; render in `AgentOverview` component |
|
||||
</phase_requirements>
|
||||
|
||||
---
|
||||
|
||||
## Standard Stack
|
||||
|
||||
### Core
|
||||
| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| Node.js `os` module | built-in | Read total system RAM | Already used in heartbeat.ts; no new dep |
| Node.js `fetch` | Node 18+ built-in | HTTP calls to Ollama API at localhost:11434 | Already confirmed available in runtime |
| `hermes-paperclip-adapter` | 0.2.1 (installed) | Hermes execution, skill sync, model detection | Already wired into adapter registry |
|
||||
|
||||
### No New Dependencies Required
|
||||
All capabilities needed for Phase 28 are achievable with existing infrastructure:
|
||||
- Ollama HTTP API is probed with `fetch` (built-in Node 18+)
|
||||
- Model catalog is a static JSON file in the server package
|
||||
- RAM reading uses `os.totalmem()` (built-in)
|
||||
- Hermes Ollama configuration uses existing `adapterConfig` fields
|
||||
|
||||
## Architecture Patterns
|
||||
|
||||
### Recommended Project Structure (new files)
|
||||
|
||||
```
server/src/services/ollama.ts                 # ollamaService: detect + list models
server/src/routes/ollama.ts                   # HTTP routes: /companies/:id/ollama/status, /models
server/src/data/ollama-model-catalog.json     # shipped catalog for OLLA-04
server/src/__tests__/ollama-service.test.ts   # unit tests for ollamaService
ui/src/api/ollama.ts                          # ollamaApi client: wraps server routes
```
|
||||
|
||||
### Pattern 1: Ollama Service (server-side)
|
||||
|
||||
```typescript
// server/src/services/ollama.ts
const OLLAMA_BASE_URL = process.env.OLLAMA_BASE_URL ?? "http://localhost:11434";
const OLLAMA_TIMEOUT_MS = 3000;
// Assumption: the snippet referenced INSTALL_URL without defining it;
// the Ollama download page is used here as a placeholder value.
const INSTALL_URL = "https://ollama.com/download";

export interface OllamaStatus {
  installed: boolean;
  version: string | null;
  installUrl: string;
}

export interface OllamaModel {
  name: string;            // e.g. "qwen2.5-coder:32b"
  parameterSize: string;   // e.g. "32.8B" from /api/tags details
  quantization: string;    // e.g. "Q4_K_M"
  sizeBytes: number;
  family: string;          // e.g. "qwen2"
  recommended: boolean;    // from catalog match + RAM check
  recommendationReason: string | null;
}

// Probes /api/version; any failure (non-2xx, network error, timeout)
// is reported as "not installed" instead of throwing.
export async function detectOllama(): Promise<OllamaStatus> {
  const controller = new AbortController();
  const timeout = setTimeout(() => controller.abort(), OLLAMA_TIMEOUT_MS);
  try {
    const res = await fetch(`${OLLAMA_BASE_URL}/api/version`, {
      signal: controller.signal,
    });
    if (!res.ok) return { installed: false, version: null, installUrl: INSTALL_URL };
    const body = await res.json() as { version?: string };
    return { installed: true, version: body.version ?? null, installUrl: INSTALL_URL };
  } catch {
    return { installed: false, version: null, installUrl: INSTALL_URL };
  } finally {
    clearTimeout(timeout);
  }
}
```
|
||||
|
||||
**Why this pattern:** Matches the existing codex-models.ts pattern — HTTP fetch with timeout, graceful failure returns empty/false rather than throwing. The 3s timeout prevents hanging requests when Ollama is not installed.
|
||||
|
||||
### Pattern 2: Ollama Routes (mounted under /companies/:companyId)
|
||||
|
||||
```
GET /companies/:companyId/ollama/status
  → { installed: boolean, version: string|null, installUrl: string }

GET /companies/:companyId/ollama/models
  → { models: OllamaModel[], ramGb: number }
```
|
||||
|
||||
Both routes use existing `assertCompanyAccess(req, companyId)` authz pattern from `agents.ts`.
|
||||
|
||||
Mount in `server/src/routes/index.ts` alongside the existing `agentsRoutes`.
|
||||
|
||||
### Pattern 3: Hermes Config-Fields Enhancement
|
||||
|
||||
The existing `HermesLocalConfigFields` in `config-fields.tsx` has a free-text `Model` input. For Ollama support, it becomes a hybrid: dropdown (when Ollama is present) + manual entry fallback.
|
||||
|
||||
```tsx
// Fetch Ollama status + models (only for hermes_local adapter)
const { data: ollamaStatus } = useQuery({
  queryKey: ["ollama", "status", companyId],
  queryFn: () => ollamaApi.status(companyId!),
  enabled: Boolean(companyId),
});

const { data: ollamaModels } = useQuery({
  queryKey: ["ollama", "models", companyId],
  queryFn: () => ollamaApi.models(companyId!),
  enabled: Boolean(companyId && ollamaStatus?.installed),
});
```
|
||||
|
||||
When `ollamaStatus.installed === false`, render an install callout (OLLA-05) instead of the dropdown.
|
||||
|
||||
When a local Ollama model is selected, `buildHermesConfig` (or `mark`) must also set `provider: "custom"` and `base_url: "http://localhost:11434/v1"` in `adapterConfig`. This is the critical mapping from OLLA-03.
|
||||
|
||||
### Pattern 4: Hermes Runtime Data in stateJson (HERM-07)
|
||||
|
||||
`agentRuntimeState.stateJson` is `jsonb` typed as `Record<string, unknown>`. The heartbeat service writes this via `updateRuntimeState`. The Hermes adapter's `execute.ts` already returns `resultJson` with `session_id`, `usage`, and `cost_usd`.
|
||||
|
||||
For HERM-07 runtime data (model name, native skill count, memory usage), the server-side approach is:
|
||||
- After a Hermes run completes, read `resultJson.result` and extract/store model + detected skill count into `stateJson`
|
||||
- Optionally query Ollama `/api/ps` (running models) to get `size_vram` for memory usage display
|
||||
|
||||
**Insertion point for stateJson patch:** `heartbeat.ts:updateRuntimeState` already calls `db.update(agentRuntimeState).set(...)`. Add a `stateJson` merge here when `adapterType === "hermes_local"`.
|
||||
|
||||
**UI insertion point:** `AgentOverview` component in `AgentDetail.tsx` (line ~1183). Add a `HermesRuntimeCard` component after the charts section, gated by `agent.adapterType === "hermes_local"`:
|
||||
|
||||
```tsx
{agent.adapterType === "hermes_local" && runtimeState && (
  <HermesRuntimeCard runtimeState={runtimeState} />
)}
```
|
||||
|
||||
### Pattern 5: Model Catalog JSON (OLLA-04)
|
||||
|
||||
Contents of `server/src/data/ollama-model-catalog.json`:

```json
{
  "models": [
    {
      "family": "qwen2",
      "variants": [
        { "name": "qwen2.5-coder:7b", "ramGb": 5, "vramGb": 5, "quality": "fast" },
        { "name": "qwen2.5-coder:32b", "ramGb": 22, "vramGb": 22, "quality": "best" }
      ]
    },
    {
      "family": "llama",
      "variants": [
        { "name": "llama3.2:3b", "ramGb": 3, "vramGb": 3, "quality": "fast" },
        { "name": "llama3.1:8b", "ramGb": 6, "vramGb": 6, "quality": "balanced" },
        { "name": "llama3.1:70b", "ramGb": 48, "vramGb": 48, "quality": "best" }
      ]
    },
    {
      "family": "mistral",
      "variants": [
        { "name": "mistral:7b", "ramGb": 5, "vramGb": 5, "quality": "balanced" },
        { "name": "mistral-small:22b", "ramGb": 14, "vramGb": 14, "quality": "best" }
      ]
    },
    {
      "family": "phi",
      "variants": [
        { "name": "phi4:14b", "ramGb": 10, "vramGb": 10, "quality": "balanced" }
      ]
    },
    {
      "family": "deepseek",
      "variants": [
        { "name": "deepseek-r1:7b", "ramGb": 5, "vramGb": 5, "quality": "reasoning" },
        { "name": "deepseek-r1:32b", "ramGb": 22, "vramGb": 22, "quality": "reasoning" }
      ]
    }
  ]
}
```
|
||||
|
||||
Recommendation logic: `os.totalmem()` gives total RAM. Use 75% as usable RAM budget (leave OS headroom). Filter catalog entries where `ramGb <= totalRamGb * 0.75`. Return the highest-quality variant within budget plus a `recommendationReason` string.
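
The recommendation rule above could be sketched roughly as follows. This is an illustration under stated assumptions, not the shipped logic: `CatalogVariant`, `QUALITY_RANK`, and `recommendVariant` are hypothetical names, and the ordering of the `quality` labels is a guess, since the catalog does not define one. The caller is expected to pass `os.totalmem()` as `totalRamBytes`.

```typescript
// Hypothetical sketch of the OLLA-04 recommendation rule.
interface CatalogVariant {
  name: string;
  ramGb: number;
  vramGb: number;
  quality: "fast" | "balanced" | "best" | "reasoning";
}

const BYTES_PER_GIB = 1024 ** 3;
const USABLE_RAM_FRACTION = 0.75; // leave headroom for the OS

// Assumed ordering of quality labels (higher is preferred).
const QUALITY_RANK: Record<CatalogVariant["quality"], number> = {
  fast: 0,
  balanced: 1,
  reasoning: 2,
  best: 3,
};

function recommendVariant(
  variants: CatalogVariant[],
  totalRamBytes: number,
): { variant: CatalogVariant; reason: string } | null {
  const totalRamGb = totalRamBytes / BYTES_PER_GIB;
  const budgetGb = totalRamGb * USABLE_RAM_FRACTION;

  // Keep only variants that fit inside the usable RAM budget.
  const fitting = variants.filter((v) => v.ramGb <= budgetGb);
  if (fitting.length === 0) return null;

  // Prefer the higher quality rank; break ties with the larger model.
  const variant = fitting.reduce((a, b) => {
    const rankA = QUALITY_RANK[a.quality];
    const rankB = QUALITY_RANK[b.quality];
    if (rankB !== rankA) return rankB > rankA ? b : a;
    return b.ramGb > a.ramGb ? b : a;
  });

  const reason =
    `${variant.name} needs about ${variant.ramGb} GB and fits the ` +
    `${budgetGb.toFixed(1)} GB budget (75% of ${totalRamGb.toFixed(1)} GB RAM)`;
  return { variant, reason };
}
```

Returning `null` when nothing fits lets the route still list installed models without marking any of them as recommended.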
|
||||
|
||||
### Anti-Patterns to Avoid
|
||||
|
||||
- **Polling Ollama in a loop:** Use a 60-second TTL in-memory cache (same as codex-models.ts `MODELS_CACHE_TTL_MS`). Do not re-probe on every API call.
|
||||
- **Blocking server startup on Ollama check:** Ollama detection is on-demand (per-request), not at startup.
|
||||
- **Hard-coding `localhost:11434`:** Always read from `process.env.OLLAMA_BASE_URL ?? "http://localhost:11434"` so users with non-standard ports work.
|
||||
- **Requiring Ollama for Hermes:** All Ollama paths are optional. Hermes without Ollama continues to work unchanged. Never throw when Ollama is absent.
|
||||
- **Overwriting all of stateJson:** Merge into stateJson using spread, never replace: `stateJson: { ...existingState, hermesModel: ..., hermesNativeSkillCount: ... }`.
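
The caching bullet above could be implemented along these lines. This is a minimal sketch: `createTtlCache` and `OLLAMA_CACHE_TTL_MS` are hypothetical names, and the actual cache in codex-models.ts may be shaped differently. The loader's return value is cached as-is, so passing an async loader caches the promise and also de-duplicates concurrent probes. That is only safe for loaders that never reject, which holds for a `detectOllama` that catches its own errors.

```typescript
// Hypothetical in-memory TTL cache for Ollama probes.
const OLLAMA_CACHE_TTL_MS = 60_000; // 60-second TTL, as suggested in the list above

function createTtlCache<T>(
  load: () => T,
  ttlMs: number = OLLAMA_CACHE_TTL_MS,
  now: () => number = Date.now, // injectable clock, useful in tests
): () => T {
  let entry: { value: T; expiresAt: number } | undefined;

  return () => {
    const t = now();
    // Reload on first use or once the cached entry has expired.
    if (entry === undefined || t >= entry.expiresAt) {
      entry = { value: load(), expiresAt: t + ttlMs };
    }
    return entry.value;
  };
}
```

A route handler would then call something like `const getStatus = createTtlCache(() => detectOllama())` once at module scope and `await getStatus()` per request.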
|
||||
|
||||
---
|
||||
|
||||
## Don't Hand-Roll
|
||||
|
||||
| Problem | Don't Build | Use Instead | Why |
|---------|-------------|-------------|-----|
| Ollama connectivity check | Custom TCP socket probe | `fetch` to `/api/version` with AbortController timeout | Reuses existing pattern from codex-models.ts |
| YAML config parsing | Full YAML parser | Existing `parseModelFromConfig` in hermes adapter | Already ships in hermes-paperclip-adapter/dist |
| System RAM reading | Shell commands | `os.totalmem()` | Built-in, no dep, works cross-platform |
| Token cost tracking | New billing logic | Existing `costService.createEvent` + `updateRuntimeState` | Already handles Hermes via regex-extracted usage |
|
||||
|
||||
---
|
||||
|
||||
## Common Pitfalls
|
||||
|
||||
### Pitfall 1: Hermes Does Not Have an "ollama" Provider
|
||||
**What goes wrong:** Setting `adapterConfig.provider = "ollama"` causes Hermes to fail, because "ollama" is not an entry in `VALID_PROVIDERS` in `constants.js`.
|
||||
**Why it happens:** Ollama mimics the OpenAI API, so Hermes treats it as `provider: "custom"` with `base_url: "http://localhost:11434/v1"`.
|
||||
**How to avoid:** When a user selects an Ollama model, always write `provider: "custom"` and `base_url: "http://localhost:11434/v1"` into `adapterConfig`. These fields are already in the Hermes config schema (see `agentConfigurationDoc`).
|
||||
**Warning signs:** Hermes stderr shows "unknown provider" or authentication errors during local model runs.
|
||||
|
||||
### Pitfall 2: Ollama API Returns Models at `/api/tags`, Not `/v1/models`
|
||||
**What goes wrong:** Listing models through the OpenAI-compatible endpoint `/v1/models` misses the `details` object (`parameter_size`, `quantization_level`, `family`) needed for OLLA-04.
|
||||
**Why it happens:** `/v1/models` is OpenAI-compat, `/api/tags` is Ollama-native with richer data.
|
||||
**How to avoid:** Use `GET localhost:11434/api/tags` for model listing (returns `details.parameter_size`, `details.family`). Use `/v1/models` only if passing through to Hermes.
|
||||
|
||||
### Pitfall 3: stateJson Merge Requires Read-Modify-Write
|
||||
**What goes wrong:** `db.update(agentRuntimeState).set({ stateJson: newData })` overwrites other fields stored by other parts of the system.
|
||||
**Why it happens:** Drizzle `.set()` replaces the entire column value.
|
||||
**How to avoid:** Use Postgres jsonb merge: `stateJson: sql\`${agentRuntimeState.stateJson} || ${JSON.stringify(patch)}::jsonb\`` or read existing `stateJson` first, then spread. The existing `ensureRuntimeState` call in `updateRuntimeState` already reads the row.
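
The read-then-spread variant could be sketched as below. `mergeStateJson` and `HermesRuntimePatch` are hypothetical names, not existing Nexus helpers. The merge is shallow, which matches the top-level behavior of the Postgres `||` operator on jsonb objects.

```typescript
// Hypothetical helper: merge Hermes fields into stateJson without losing other keys.
type StateJson = Record<string, unknown>;

interface HermesRuntimePatch {
  hermesModel?: string;
  hermesNativeSkillCount?: number;
  hermesMemoryBytes?: number | null;
}

function mergeStateJson(
  existing: StateJson | null | undefined,
  patch: HermesRuntimePatch,
): StateJson {
  // Start from a copy so the row that was read is never mutated.
  const merged: StateJson = { ...(existing ?? {}) };

  // Skip undefined values so a missing patch field does not erase stored data.
  // null is kept on purpose: it means "known to be unavailable".
  for (const [key, value] of Object.entries(patch)) {
    if (value !== undefined) merged[key] = value;
  }
  return merged;
}
```

The result is what would be passed to `.set({ stateJson: ... })` after reading the current row.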
|
||||
|
||||
### Pitfall 4: HermesLocalConfigFields Uses adapterConfig for Both Create and Edit Modes
|
||||
**What goes wrong:** Setting `provider` and `base_url` only in create mode loses the values on edit, or vice versa.
|
||||
**Why it happens:** The `isCreate` flag switches between `set!({ model: v })` (create) and `mark("adapterConfig", "model", v)` (edit) — both paths must update all three fields (model, provider, base_url) when an Ollama model is selected.
|
||||
**How to avoid:** When Ollama model is selected, call the setter for all three config fields atomically. For create mode: `set!({ model, provider: "custom", base_url: "http://localhost:11434/v1" })`. For edit mode: three `mark()` calls or a compound helper.
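
A compound helper of the kind mentioned above might look like this. It is a sketch: `applyOllamaModel` is a hypothetical name, and the `SetFn` and `MarkFn` signatures are inferred from the call shapes quoted in this pitfall, not copied from `config-fields.tsx`.

```typescript
// Hypothetical helper that writes all three Ollama fields in either mode.
interface OllamaAdapterFields {
  model: string;
  provider: string;
  base_url: string;
}

// Assumed shapes of the existing setters.
type SetFn = (patch: Partial<OllamaAdapterFields>) => void;
type MarkFn = (group: "adapterConfig", key: string, value: unknown) => void;

const OLLAMA_OPENAI_BASE_URL = "http://localhost:11434/v1";

function applyOllamaModel(
  model: string,
  isCreate: boolean,
  handlers: { set: SetFn; mark: MarkFn },
): void {
  const fields: OllamaAdapterFields = {
    model,
    provider: "custom", // Hermes has no native "ollama" provider
    base_url: OLLAMA_OPENAI_BASE_URL,
  };

  if (isCreate) {
    // Create mode: one call carrying all three fields.
    handlers.set(fields);
    return;
  }

  // Edit mode: mark each field so none of them is left stale.
  handlers.mark("adapterConfig", "model", fields.model);
  handlers.mark("adapterConfig", "provider", fields.provider);
  handlers.mark("adapterConfig", "base_url", fields.base_url);
}
```

Routing both modes through one function keeps the create and edit paths from drifting apart when a field is added later.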
|
||||
|
||||
### Pitfall 5: Ollama /api/ps Probe May Have No Models Running
|
||||
**What goes wrong:** `/api/ps` returns an empty `models: []` when no model is currently loaded — this does not mean Ollama is absent.
|
||||
**Why it happens:** Ollama only shows models in `/api/ps` when they are actively loaded in memory.
|
||||
**How to avoid:** Use `/api/version` for detection (OLLA-01), `/api/tags` for the model list (OLLA-02), and `/api/ps` only for the optional "memory usage" metric in HERM-07 — handling the empty case as "not currently loaded".
|
||||
|
||||
### Pitfall 6: HERM-06 Cost Tracking — Ollama Models Return Zero Cost
|
||||
**What goes wrong:** Expecting a `cost_usd` value from runs using local Ollama models — there is no external billing.
|
||||
**Why it happens:** Hermes does not know the user's GPU/CPU cost. The `COST_REGEX` will not match if Hermes does not emit a cost line.
|
||||
**How to avoid:** This is correct behavior. `normalizeBilledCostCents(undefined, "unknown")` returns `0`. Token usage may still be captured if Hermes emits token counts. Accept that Ollama-based runs show $0.00 in the cost UI — that is accurate.
|
||||
|
||||
---
|
||||
|
||||
## Code Examples
|
||||
|
||||
### Ollama /api/tags Response Shape (verified)
|
||||
```typescript
// Source: https://docs.ollama.com/api/tags (verified 2026-04-01)
interface OllamaTagsResponse {
  models: Array<{
    name: string;               // "qwen2.5-coder:32b"
    model: string;              // same as name
    modified_at: string;
    size: number;               // bytes
    digest: string;
    details: {
      parent_model: string;
      format: string;           // "gguf"
      family: string;           // "qwen2"
      families: string[];
      parameter_size: string;   // "32.8B"
      quantization_level: string; // "Q4_K_M"
    };
  }>;
}
```
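
Mapping the response shape above onto the fields of `OllamaModel` could look like the following sketch. `TagsEntry`, `ListedModel`, and `mapTagsResponse` are hypothetical names; the recommendation fields are left out because they come from the catalog step. Missing `details` values default to empty strings, which is an assumption about how the UI should treat incomplete entries.

```typescript
// Hypothetical mapper from the /api/tags payload to the listing fields.
interface TagsEntry {
  name: string;
  size: number;
  details?: {
    family?: string;
    parameter_size?: string;
    quantization_level?: string;
  };
}

interface ListedModel {
  name: string;
  parameterSize: string;
  quantization: string;
  sizeBytes: number;
  family: string;
}

function mapTagsResponse(
  body: { models?: TagsEntry[] } | null | undefined,
): ListedModel[] {
  // Treat a missing body or missing list as "no models installed".
  const entries = body?.models ?? [];
  return entries.map((m) => ({
    name: m.name,
    parameterSize: m.details?.parameter_size ?? "",
    quantization: m.details?.quantization_level ?? "",
    sizeBytes: m.size,
    family: m.details?.family ?? "",
  }));
}
```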
|
||||
|
||||
### Ollama /api/ps Response Shape (verified)
|
||||
```typescript
// Source: Ollama REST API docs, /api/ps (List running models) (verified 2026-04-01)
interface OllamaPsResponse {
  models: Array<{
    name: string;
    model: string;
    size: number;
    digest: string;
    details: { /* same as tags */ };
    expires_at: string;
    size_vram: number;   // bytes used in VRAM
  }>;
}
```
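
Reading the memory figure for one model out of the `/api/ps` payload could be sketched as below. `PsEntry` and `vramBytesFor` are hypothetical names. An empty list or a missing model yields `null`, in line with treating that case as "not currently loaded".

```typescript
// Hypothetical lookup of size_vram for a given model name.
interface PsEntry {
  name: string;
  size_vram: number;
}

function vramBytesFor(
  ps: { models?: PsEntry[] } | null | undefined,
  modelName: string,
): number | null {
  // No payload or no running models: nothing is loaded right now.
  const running = ps?.models ?? [];
  const hit = running.find((m) => m.name === modelName);
  return hit === undefined ? null : hit.size_vram;
}
```

The returned value maps directly onto `hermesMemoryBytes` in `stateJson`.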
|
||||
|
||||
### Reading Hermes Fields from stateJson
|
||||
```typescript
// In AgentDetail.tsx HermesRuntimeCard: read from runtimeState.stateJson
const hermesModel = runtimeState.stateJson?.hermesModel as string | undefined;
const hermesNativeSkillCount = runtimeState.stateJson?.hermesNativeSkillCount as number | undefined;
const hermesMemoryBytes = runtimeState.stateJson?.hermesMemoryBytes as number | undefined;
```
|
||||
|
||||
### Hermes Ollama adapterConfig (what to write)
|
||||
```typescript
// When user selects an Ollama model in config-fields.tsx:
//   model    = "qwen2.5-coder:32b"  (bare Ollama model name)
//   provider = "custom"             (OpenAI-compatible endpoint)
//   base_url = "http://localhost:11434/v1"

// For create mode:
set!({ model, provider: "custom", base_url: "http://localhost:11434/v1" });

// For edit mode:
mark("adapterConfig", "model", model);
mark("adapterConfig", "provider", "custom");
mark("adapterConfig", "base_url", "http://localhost:11434/v1");
```
|
||||
|
||||
### Cost Tracking — Already Wired (HERM-06 context)
|
||||
```typescript
// Source: server/src/services/heartbeat.ts:updateRuntimeState
// Hermes execute.ts returns:
//   result.usage   = { inputTokens, outputTokens }  (from regex)
//   result.costUsd = number | undefined             (from regex, usually undefined for local)
//
// heartbeat.ts normalizes:
const usage = normalizeUsageTotals(result.usage);
const additionalCostCents = normalizeBilledCostCents(result.costUsd, billingType);
// Then:
if (additionalCostCents > 0 || hasTokenUsage) {
  await costs.createEvent(companyId, { ... model: result.model ?? "unknown" ... });
}
// → For Ollama: costCents=0, but inputTokens/outputTokens may be > 0 → cost event recorded
// → If Hermes doesn't emit token counts: no event recorded (correct behavior)
```
|
||||
|
||||
---
|
||||
|
||||
## HERM-05: Skill Visibility — What Is Already Done vs. What Is Missing
|
||||
|
||||
### Already Done (data layer is complete)
|
||||
- `skillRegistryService.syncHermesNativeSkills(agentId)` scans `~/.hermes/skills/` and inserts `source: "native"` rows
|
||||
- Called automatically from `GET /skill-registry/agents/:agentId/skills` when `adapterType === "hermes_local"`
|
||||
- Returns `AgentSkillEntry[]` with `{ skillId, source, installedAt }` — both `"native"` and `"managed"` source values
|
||||
- Hermes adapter `listHermesSkills` returns snapshot with `originLabel: "Hermes skill"` and `readOnly: true` for native skills
|
||||
|
||||
### What Is Missing (UI rendering in AgentSkillsTab)
|
||||
The `unmanagedSkillRows` section in `AgentSkillsTab` (AgentDetail.tsx:2566) renders read-only adapter entries. It uses `entry.originLabel` and `entry.locationLabel` for display. Hermes native skills already flow through this path.
|
||||
|
||||
The gap: the UI may not clearly distinguish "Hermes skill" entries from other unmanaged entries. The `originLabel: "Hermes skill"` badge rendering and skill count display are the UI additions needed. This is a targeted render update to `AgentSkillsTab`, not a new data flow.
|
||||
|
||||
---
|
||||
|
||||
## HERM-07: Dashboard Hermes Runtime Info
|
||||
|
||||
### What to Store in stateJson
|
||||
```typescript
// Written by heartbeat.ts updateRuntimeState after a Hermes run.
// `HermesRuntimeState` is an illustrative name for the shape of these keys.
interface HermesRuntimeState {
  hermesModel: string;               // e.g. "qwen2.5-coder:32b" or "anthropic/claude-sonnet-4"
  hermesNativeSkillCount: number;    // from skillRegistryService query
  hermesMemoryBytes: number | null;  // from /api/ps size_vram, null if unavailable
}
```
|
||||
|
||||
### Where to Write stateJson
|
||||
In `heartbeat.ts:updateRuntimeState`, after the existing `db.update(agentRuntimeState).set(...)` call, add a second update that merges hermes-specific fields when `agent.adapterType === "hermes_local"`. Read `result.model` for `hermesModel`. Query `skillRegistryDb` for `hermesNativeSkillCount`. Query Ollama `/api/ps` for `hermesMemoryBytes` (non-blocking, fire-and-forget).
|
||||
|
||||
### What to Render
|
||||
A `HermesRuntimeCard` component in `AgentOverview` (gated by `adapterType === "hermes_local"`):
|
||||
- Model name (from stateJson.hermesModel)
|
||||
- Native skill count (from stateJson.hermesNativeSkillCount)
|
||||
- Memory usage (from stateJson.hermesMemoryBytes, formatted as "X.X GB" or "Not loaded")
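
The display values listed above could be derived from `stateJson` roughly like this. `formatHermesMemory` and `readHermesRuntime` are hypothetical helpers, and treating 1 GB as 1024³ bytes is an assumption. Runtime `typeof` checks are used in place of casts because `stateJson` is untyped jsonb.

```typescript
// Hypothetical helpers for HermesRuntimeCard display values.
function formatHermesMemory(bytes: number | null | undefined): string {
  // null/undefined means /api/ps did not report the model as loaded.
  if (bytes === null || bytes === undefined) return "Not loaded";
  return `${(bytes / 1024 ** 3).toFixed(1)} GB`;
}

function readHermesRuntime(
  stateJson: Record<string, unknown> | null | undefined,
): { model: string | null; nativeSkillCount: number | null; memoryLabel: string } {
  const s = stateJson ?? {};
  const model = s["hermesModel"];
  const count = s["hermesNativeSkillCount"];
  const memory = s["hermesMemoryBytes"];

  return {
    // Fall back to null when a key is missing or has an unexpected type.
    model: typeof model === "string" ? model : null,
    nativeSkillCount: typeof count === "number" ? count : null,
    memoryLabel: formatHermesMemory(typeof memory === "number" ? memory : null),
  };
}
```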
|
||||
|
||||
---
|
||||
|
||||
## Environment Availability
|
||||
|
||||
| Dependency | Required By | Available | Version | Fallback |
|------------|-------------|-----------|---------|----------|
| Ollama daemon | OLLA-01 through OLLA-05 | No (not installed) | n/a | All paths degrade gracefully; UI shows install instructions |
| hermes-paperclip-adapter | HERM-05, HERM-06, HERM-07 | Yes | 0.2.1 | n/a |
| Node.js fetch | Ollama HTTP probing | Yes | built-in (Node 18+) | n/a |
| Node.js os module | OLLA-04 RAM reading | Yes | built-in | n/a |
| Vitest | Tests | Yes | (server vitest.config.ts) | n/a |
|
||||
|
||||
**Missing dependencies with no fallback:** None — all Ollama features degrade gracefully when Ollama is absent.
|
||||
|
||||
**Pre-existing test failures (not Phase 28 regressions):** 4 test files failing before Phase 28 begins:
|
||||
- `app-hmr-port.test.ts`
|
||||
- `plugin-worker-manager.test.ts`
|
||||
- `heartbeat-workspace-session.test.ts` (5 tests)
|
||||
- `skill-registry-routes.test.ts` (1 test)
|
||||
|
||||
---
|
||||
|
||||
## Validation Architecture
|
||||
|
||||
### Test Framework
|
||||
| Property | Value |
|----------|-------|
| Framework | Vitest (server) |
| Config file | `server/vitest.config.ts` |
| Quick run command | `cd server && npx vitest run src/__tests__/ollama-service.test.ts` |
| Full suite command | `cd server && npx vitest run` |
|
||||
|
||||
### Phase Requirements → Test Map
|
||||
| Req ID | Behavior | Test Type | Automated Command | File Exists? |
|--------|----------|-----------|-------------------|--------------|
| OLLA-01 | `detectOllama()` returns `installed: false` when Ollama absent | unit | `npx vitest run src/__tests__/ollama-service.test.ts` | No (Wave 0) |
| OLLA-01 | `detectOllama()` returns `installed: true` + version when Ollama present | unit | same | No (Wave 0) |
| OLLA-01 | `detectOllama()` times out cleanly (AbortController) | unit | same | No (Wave 0) |
| OLLA-02 | `listOllamaModels()` returns `OllamaModel[]` from /api/tags | unit | same | No (Wave 0) |
| OLLA-04 | `buildModelRecommendation()` returns correct model for given RAM budget | unit | same | No (Wave 0) |
| OLLA-05 | Routes return `installUrl` when Ollama absent | unit | same | No (Wave 0) |
| HERM-05 | Skills tab renders `originLabel: "Hermes skill"` badge | manual-only | n/a | n/a |
| HERM-06 | `updateRuntimeState` records cost event when Hermes emits token data | unit (existing pattern) | `npx vitest run src/__tests__/costs-service.test.ts` | Yes |
| HERM-07 | stateJson receives hermesModel/hermesNativeSkillCount after run | unit | `npx vitest run src/__tests__/ollama-service.test.ts` | No (Wave 0) |
|
||||
|
||||
### Sampling Rate
|
||||
- **Per task commit:** `cd server && npx vitest run src/__tests__/ollama-service.test.ts`
|
||||
- **Per wave merge:** `cd server && npx vitest run`
|
||||
- **Phase gate:** Full suite green before `/gsd:verify-work` (excluding 4 pre-existing failures)
|
||||
|
||||
### Wave 0 Gaps
|
||||
- [ ] `server/src/__tests__/ollama-service.test.ts` — covers OLLA-01, OLLA-02, OLLA-04, OLLA-05, HERM-07 stateJson logic
|
||||
- [ ] Test stubs use mock fetch (AbortController pattern); no real Ollama needed
|
||||
|
||||
---
|
||||
|
||||
## State of the Art
|
||||
|
||||
| Old Approach | Current Approach | When Changed | Impact |
|--------------|------------------|--------------|--------|
| Manual text entry for Hermes model | Dropdown fed from Ollama + manual fallback | Phase 28 | Better UX for local models |
| stateJson unused for Hermes | stateJson stores hermesModel, skillCount, memoryBytes | Phase 28 | Dashboard can show runtime info |
| Hermes native skills in separate table only | Skills tab renders both managed + native in unified view | Phase 28 (HERM-05 completion) | Unified skill surface |
|
||||
|
||||
---
|
||||
|
||||
## Open Questions
|
||||
|
||||
1. **Should Ollama route be gated to hermes_local only?**
|
||||
- What we know: Only Hermes uses the Ollama custom endpoint pattern currently
|
||||
- What's unclear: Future adapters (Phase 29 defaults) may also use Ollama
|
||||
- Recommendation: Mount under `/companies/:companyId/ollama/*` without adapter-type gating — the endpoint is useful generically and Pi/OpenCode adapters may benefit in Phase 29
|
||||
|
||||
2. **Should listOllamaModels also extend the hermes adapter's `listModels` function?**
|
||||
- What we know: `listAdapterModels("hermes_local")` already calls `adapter.listModels()` if present; hermes adapter has no `listModels` implementation (returns `models: []`)
|
||||
- What's unclear: Whether to add `listModels` to hermes adapter (requires adapter package change) or use a separate Ollama API route in Nexus
|
||||
- Recommendation: Use a separate Nexus route (`/companies/:companyId/ollama/models`). Avoids changing the hermes-paperclip-adapter package (external dependency). The config-fields.tsx component can call the Nexus route directly. **Do not modify the hermes-paperclip-adapter package.**
|
||||
|
||||
3. **stateJson hermesNativeSkillCount — count from skillRegistry or from adapter snapshot?**
|
||||
- What we know: `skillRegistryDb` is a separate libSQL DB; querying it in `updateRuntimeState` adds cross-DB complexity
|
||||
- What's unclear: Is the extra query worth it for a display-only count?
|
||||
- Recommendation: Store the count from `result.resultJson` if Hermes emits it, or derive from the adapter skill snapshot after run. Alternatively, skip native skill count from stateJson and derive it in the UI from `agentsApi.skills(agentId)` query. The UI approach avoids cross-DB concerns in heartbeat.
|
||||
|
||||
---
|
||||
|
||||
## Sources
|
||||
|
||||
### Primary (HIGH confidence)
|
||||
- hermes-paperclip-adapter@0.2.1 dist source code — `execute.js`, `skills.js`, `detect-model.js`, `test.js`, `constants.js` — read directly from `/opt/nexus/server/node_modules/hermes-paperclip-adapter/dist/`
|
||||
- Nexus codebase — `server/src/services/heartbeat.ts`, `server/src/services/costs.ts`, `server/src/services/skill-registry.ts`, `ui/src/pages/AgentDetail.tsx`, `ui/src/adapters/hermes-local/config-fields.tsx` — read directly
|
||||
- Ollama REST API — `https://docs.ollama.com/api/tags` — verified /api/tags response shape with `details.parameter_size`, `details.family`, `details.quantization_level`
|
||||
- Node.js built-ins — `os.totalmem()`, `fetch` with AbortController — confirmed available in Node 18+ runtime
|
||||
|
||||
### Secondary (MEDIUM confidence)
|
||||
- Hermes Agent provider docs — `https://hermes-agent.nousresearch.com/docs/integrations/providers/` — verified "ollama uses custom provider + localhost:11434/v1 base_url"
|
||||
- Hermes Agent + Ollama guide — Medium/Substack articles cross-referencing official docs — confirmed custom endpoint configuration steps
|
||||
|
||||
### Tertiary (LOW confidence)
|
||||
- Ollama model RAM requirements (catalog) — community sources + Ollama model page tags — use conservative estimates; verify against https://ollama.com/library model pages before shipping
|
||||
|
||||
---
|
||||
|
||||
## Metadata
|
||||
|
||||
**Confidence breakdown:**
|
||||
- Ollama API: HIGH — verified from official docs, response shapes confirmed
|
||||
- Hermes + Ollama provider mapping: HIGH — verified from official Hermes provider docs
|
||||
- Standard stack: HIGH — all existing infrastructure confirmed from source code
|
||||
- Architecture patterns: HIGH — follow existing codex-models.ts, heartbeat.ts, config-fields.tsx patterns exactly
|
||||
- HERM-05 data layer status: HIGH — verified syncHermesNativeSkills exists and is already called
|
||||
- HERM-06 cost tracking: HIGH — execute.js returns usage/costUsd, heartbeat.ts wires it to costService
|
||||
- Pitfalls: HIGH — derived from actual source code analysis
|
||||
|
||||
**Research date:** 2026-04-01
|
||||
**Valid until:** 2026-05-01 (Ollama API is stable; hermes-paperclip-adapter may receive new releases)
|
||||