Nexus Dev 8ae8e526d9 chore: complete v1.4 Hermes Default Provider milestone

3 phases, 6 plans, 16 requirements. Archives copied to milestones/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-04 03:55:49 +00:00


Phase 28: Ollama Integration & Agent Surface - Research

Researched: 2026-04-01
Domain: Ollama HTTP API, Hermes adapter extension, agent dashboard UI, cost tracking
Confidence: HIGH

Summary

Phase 28 adds three distinct capabilities on top of the completed Phase 27 Hermes adapter: (1) Ollama detection and model catalog — Nexus queries localhost:11434 to detect Ollama, lists available models, and ships a static JSON catalog for hardware-aware recommendations; (2) Hermes config surface extension — the model field in config-fields.tsx becomes a dropdown fed by live Ollama discovery rather than a free-text input, and a new base_url/provider: custom adapterConfig field routes Hermes to the local endpoint; (3) Hermes runtime data in the dashboard — stateJson in agentRuntimeState is the right place to store Hermes-specific runtime metadata (model name, native skill count, memory usage from Ollama's /api/ps), and the AgentOverview component in AgentDetail.tsx is the right insertion point.

The most important finding is that Hermes does not have a native "ollama" provider. Ollama is configured as a custom OpenAI-compatible endpoint: provider: custom, base_url: http://localhost:11434/v1. The model field passes the Ollama model name bare (e.g. qwen2.5-coder:32b). This shapes OLLA-02, OLLA-03, and the config-fields.tsx changes.

For cost tracking (HERM-06): hermes-paperclip-adapter@0.2.1 already parses token_usage and cost regex patterns from Hermes stdout. When Hermes returns non-zero usage, heartbeat.ts:updateRuntimeState already calls costService.createEvent. The only gap is that Hermes running local Ollama models will have costUsd = undefined (no billing) — the infrastructure handles this correctly (zero cost event is suppressed when additionalCostCents === 0 && !hasTokenUsage). No cost tracking code changes are needed for local models; the planner just needs to verify the regex path works end-to-end.

For HERM-05 (skill visibility): syncHermesNativeSkills already exists in skillRegistryService and is already called from the GET /skill-registry/agents/:agentId/skills route when adapterType === "hermes_local". The Hermes adapter's listHermesSkills function merges Paperclip-managed and native skills. The integration is already complete at the data layer. What is missing is the UI surface in the Skills tab that renders the originLabel: "Hermes skill" / readOnly: true entries distinctly from managed skills.

Primary recommendation: Implement as four focused plans — (P01) server-side Ollama service + routes; (P02) Hermes config-fields UI extension for Ollama model selection; (P03) dashboard Hermes runtime info card; (P04) model catalog JSON + recommendation logic.

<user_constraints>

User Constraints (from CONTEXT.md)

Locked Decisions

All implementation choices are at Claude's discretion — discuss phase was skipped per user setting. Use ROADMAP phase goal, success criteria, and codebase conventions to guide decisions.

Claude's Discretion

All implementation choices are at Claude's discretion.

Deferred Ideas (OUT OF SCOPE)

None — discuss phase skipped. Refer to REQUIREMENTS.md for in-scope requirements.

Out of scope per REQUIREMENTS.md:

Multi-provider model routing (Hermes can use OpenRouter/Anthropic/OpenAI but that's Hermes config, not Nexus)
Hermes MCP server management
Custom Hermes skill authoring UI
DFLT-01 through DFLT-04 (Phase 29)

</user_constraints>

<phase_requirements>

Phase Requirements

ID	Description	Research Support
OLLA-01	Nexus detects whether Ollama is installed locally	HTTP probe to `localhost:11434/api/version`; new server service `ollamaService`
OLLA-02	User can see list of available Ollama models when configuring a Hermes agent	`GET /api/tags` from Ollama HTTP API; new server route `GET /companies/:id/ollama/models`; config-fields.tsx dropdown
OLLA-03	User can configure a Hermes agent with any local Ollama model	Sets `adapterConfig.model = <model-name>`, `adapterConfig.provider = "custom"`, `adapterConfig.base_url = "http://localhost:11434/v1"`
OLLA-04	Model recommendation based on RAM/VRAM from a shipped catalog	Static JSON catalog in `server/src/data/ollama-model-catalog.json`; server reads `os.totalmem()` to filter; returned with model list
OLLA-05	If Ollama is not present, user is offered installation instructions	Ollama status endpoint returns `installed: false` + `installUrl`; UI shows callout in Hermes config-fields
HERM-05	Nexus-managed skills visible alongside Hermes native skills in agent config	Already wired at data layer — UI Skills tab needs `originLabel: "Hermes skill"` rendering distinction
HERM-06	Cost tracking captures token usage and model costs for Hermes agents	Infrastructure already handles this; verify end-to-end with local Ollama (zero cost is correct, no change needed)
HERM-07	Dashboard shows Hermes-specific info (model name, memory usage, native skill count)	Store in `agentRuntimeState.stateJson`; render in `AgentOverview` component
</phase_requirements>

Standard Stack

Core

Library	Version	Purpose	Why Standard
Node.js `os` module	built-in	Read total system RAM	Already used in heartbeat.ts; no new dep
Node.js `fetch`	Node 18+ built-in	HTTP calls to Ollama API at localhost:11434	Already confirmed available in runtime
`hermes-paperclip-adapter`	0.2.1 (installed)	Hermes execution, skill sync, model detection	Already wired into adapter registry

No New Dependencies Required

All capabilities needed for Phase 28 are achievable with existing infrastructure:

Ollama HTTP API is probed with fetch (built-in Node 18+)
Model catalog is a static JSON file in the server package
RAM reading uses os.totalmem() (built-in)
Hermes Ollama configuration uses existing adapterConfig fields

Architecture Patterns

Recommended Project Structure (new files)

server/src/services/ollama.ts          # ollamaService — detect + list models
server/src/routes/ollama.ts            # HTTP routes: /companies/:id/ollama/status, /models
server/src/data/ollama-model-catalog.json  # shipped catalog for OLLA-04
server/src/__tests__/ollama-service.test.ts  # unit tests for ollamaService
ui/src/api/ollama.ts                   # ollamaApi client — wraps server routes

Pattern 1: Ollama Service (server-side)

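// server/src/services/ollama.ts
const OLLAMA_BASE_URL = process.env.OLLAMA_BASE_URL ?? "http://localhost:11434";
const OLLAMA_TIMEOUT_MS = 3000;
// Shown in the UI when Ollama is absent (OLLA-05). The exact URL is an assumption;
// the research does not pin one down.
const INSTALL_URL = "https://ollama.com/download";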

export interface OllamaStatus {
  installed: boolean;
  version: string | null;
  installUrl: string;
}

export interface OllamaModel {
  name: string;           // e.g. "qwen2.5-coder:32b"
  parameterSize: string;  // e.g. "32.8B" from /api/tags details
  quantization: string;   // e.g. "Q4_K_M"
  sizeBytes: number;
  family: string;         // e.g. "qwen2"
  recommended: boolean;   // from catalog match + RAM check
  recommendationReason: string | null;
}

export async function detectOllama(): Promise<OllamaStatus> {
  const controller = new AbortController();
  const timeout = setTimeout(() => controller.abort(), OLLAMA_TIMEOUT_MS);
  try {
    const res = await fetch(`${OLLAMA_BASE_URL}/api/version`, {
      signal: controller.signal,
    });
    if (!res.ok) return { installed: false, version: null, installUrl: INSTALL_URL };
    const body = await res.json() as { version?: string };
    return { installed: true, version: body.version ?? null, installUrl: INSTALL_URL };
  } catch {
    return { installed: false, version: null, installUrl: INSTALL_URL };
  } finally {
    clearTimeout(timeout);
  }
}

Why this pattern: Matches the existing codex-models.ts pattern — HTTP fetch with timeout, graceful failure returns empty/false rather than throwing. The 3s timeout prevents hanging requests when Ollama is not installed.

Pattern 2: Ollama Routes (mounted under /companies/:companyId)

GET /companies/:companyId/ollama/status
  → { installed: boolean, version: string|null, installUrl: string }

GET /companies/:companyId/ollama/models
  → { models: OllamaModel[], ramGb: number }

Both routes use existing assertCompanyAccess(req, companyId) authz pattern from agents.ts.

Mount in server/src/routes/index.ts alongside the existing agentsRoutes.

Pattern 3: Hermes Config-Fields Enhancement

The existing HermesLocalConfigFields in config-fields.tsx has a free-text Model input. For Ollama support, it becomes a hybrid: dropdown (when Ollama is present) + manual entry fallback.

// Fetch Ollama status + models (only for hermes_local adapter)
const { data: ollamaStatus } = useQuery({
  queryKey: ["ollama", "status", companyId],
  queryFn: () => ollamaApi.status(companyId!),
  enabled: Boolean(companyId),
});

const { data: ollamaModels } = useQuery({
  queryKey: ["ollama", "models", companyId],
  queryFn: () => ollamaApi.models(companyId!),
  enabled: Boolean(companyId && ollamaStatus?.installed),
});

When ollamaStatus.installed === false, render an install callout (OLLA-05) instead of the dropdown.

When a local Ollama model is selected, buildHermesConfig (or mark) must also set provider: "custom" and base_url: "http://localhost:11434/v1" in adapterConfig. This is the critical mapping from OLLA-03.

Pattern 4: Hermes Runtime Data in stateJson (HERM-07)

agentRuntimeState.stateJson is jsonb typed as Record<string, unknown>. The heartbeat service writes this via updateRuntimeState. The Hermes adapter's execute.ts already returns resultJson with session_id, usage, and cost_usd.

For HERM-07 runtime data (model name, native skill count, memory usage), the server-side approach is:

After a Hermes run completes, read resultJson.result and extract/store model + detected skill count into stateJson
Optionally query Ollama /api/ps (running models) to get size_vram for memory usage display

Insertion point for stateJson patch: heartbeat.ts:updateRuntimeState already calls db.update(agentRuntimeState).set(...). Add a stateJson merge here when adapterType === "hermes_local".

UI insertion point: AgentOverview component in AgentDetail.tsx (line ~1183). Add a HermesRuntimeCard component after the charts section, gated by agent.adapterType === "hermes_local":

{agent.adapterType === "hermes_local" && runtimeState && (
  <HermesRuntimeCard runtimeState={runtimeState} />
)}

Pattern 5: Model Catalog JSON (OLLA-04)

// server/src/data/ollama-model-catalog.json
{
  "models": [
    {
      "family": "qwen2",
      "variants": [
        { "name": "qwen2.5-coder:7b",  "ramGb": 5,  "vramGb": 5,  "quality": "fast" },
        { "name": "qwen2.5-coder:32b", "ramGb": 22, "vramGb": 22, "quality": "best" }
      ]
    },
    {
      "family": "llama",
      "variants": [
        { "name": "llama3.2:3b",  "ramGb": 3,  "vramGb": 3,  "quality": "fast" },
        { "name": "llama3.1:8b",  "ramGb": 6,  "vramGb": 6,  "quality": "balanced" },
        { "name": "llama3.1:70b", "ramGb": 48, "vramGb": 48, "quality": "best" }
      ]
    },
    {
      "family": "mistral",
      "variants": [
        { "name": "mistral:7b",   "ramGb": 5,  "vramGb": 5,  "quality": "balanced" },
        { "name": "mistral-small:22b", "ramGb": 14, "vramGb": 14, "quality": "best" }
      ]
    },
    {
      "family": "phi",
      "variants": [
        { "name": "phi4:14b",    "ramGb": 10, "vramGb": 10, "quality": "balanced" }
      ]
    },
    {
      "family": "deepseek",
      "variants": [
        { "name": "deepseek-r1:7b",  "ramGb": 5,  "vramGb": 5,  "quality": "reasoning" },
        { "name": "deepseek-r1:32b", "ramGb": 22, "vramGb": 22, "quality": "reasoning" }
      ]
    }
  ]
}

Recommendation logic: os.totalmem() gives total RAM. Use 75% as usable RAM budget (leave OS headroom). Filter catalog entries where ramGb <= totalRamGb * 0.75. Return the highest-quality variant within budget plus a recommendationReason string.
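
The recommendation step described above can be sketched as a pure function, which keeps it easy to unit test without touching os.totalmem(). This is a minimal sketch under stated assumptions: the name buildModelRecommendation comes from the test map in this document, but its signature, the QUALITY_RANK ordering, and the tie-break rule are hypothetical, since the catalog does not define how "fast", "balanced", "reasoning", and "best" compare.

```typescript
// Sketch of the OLLA-04 recommendation step. The catalog shape follows the JSON
// above; the quality ranking and the function signature are assumptions.
interface CatalogVariant {
  name: string;
  ramGb: number;
  vramGb: number;
  quality: string;
}

interface CatalogFamily {
  family: string;
  variants: CatalogVariant[];
}

interface Recommendation {
  name: string;
  recommendationReason: string;
}

const BYTES_PER_GB = 1024 * 1024 * 1024;
const USABLE_RAM_FRACTION = 0.75; // leave headroom for the OS

// Assumed ordering; "reasoning" is ranked alongside "balanced" for illustration only.
const QUALITY_RANK: Record<string, number> = { fast: 0, balanced: 1, reasoning: 1, best: 2 };

function rankOf(variant: CatalogVariant): number {
  const rank = QUALITY_RANK[variant.quality];
  return rank === undefined ? 0 : rank;
}

export function buildModelRecommendation(
  catalog: CatalogFamily[],
  totalMemBytes: number,
): Recommendation | null {
  const budgetGb = (totalMemBytes / BYTES_PER_GB) * USABLE_RAM_FRACTION;

  // Collect every variant that fits inside the usable RAM budget.
  const candidates: CatalogVariant[] = [];
  for (const family of catalog) {
    for (const variant of family.variants) {
      if (variant.ramGb <= budgetGb) candidates.push(variant);
    }
  }
  if (candidates.length === 0) return null;

  // Highest quality first; ties go to the larger RAM footprint.
  candidates.sort((a, b) => rankOf(b) - rankOf(a) || b.ramGb - a.ramGb);
  const pick = candidates[0];
  return {
    name: pick.name,
    recommendationReason:
      "Needs about " + pick.ramGb + " GB; " + budgetGb.toFixed(1) + " GB of RAM is usable on this machine",
  };
}
```

In the service, the second argument would be os.totalmem(). Returning null when nothing fits lets the route omit a recommendation rather than suggest a model that will not load.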

Anti-Patterns to Avoid

Polling Ollama in a loop: Use a 60-second TTL in-memory cache (same as codex-models.ts MODELS_CACHE_TTL_MS). Do not re-probe on every API call.
Blocking server startup on Ollama check: Ollama detection is on-demand (per-request), not at startup.
Hard-coding localhost:11434: Always read from process.env.OLLAMA_BASE_URL ?? "http://localhost:11434" so users with non-standard ports work.
Requiring Ollama for Hermes: All Ollama paths are optional. Hermes without Ollama continues to work unchanged. Never throw when Ollama is absent.
Overwriting all of stateJson: Merge into stateJson using spread, never replace: stateJson: { ...existingState, hermesModel: ..., hermesNativeSkillCount: ... }.
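
The caching rule in the first anti-pattern could look like the sketch below. It is not copied from codex-models.ts; createTtlCache, its parameters, and the injected clock are hypothetical names chosen to keep the example testable. The 60-second default mirrors the TTL named above.

```typescript
// Sketch of a TTL cache for Ollama probes. All names here are hypothetical.
const OLLAMA_CACHE_TTL_MS = 60000;

export function createTtlCache<T>(
  load: () => Promise<T>,
  ttlMs: number = OLLAMA_CACHE_TTL_MS,
  now: () => number = Date.now,
): () => Promise<T> {
  let cached: { value: Promise<T>; expiresAt: number } | null = null;

  return () => {
    const current = now();
    if (cached !== null && current < cached.expiresAt) {
      // Still fresh: reuse the stored (possibly in-flight) result.
      return cached.value;
    }
    const value = load();
    const entry = { value, expiresAt: current + ttlMs };
    cached = entry;
    // Do not keep a failed probe around for the full TTL.
    value.catch(() => {
      if (cached === entry) cached = null;
    });
    return value;
  };
}
```

Storing the promise rather than the resolved value means concurrent requests during a slow probe share one fetch instead of each starting their own.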

Don't Hand-Roll

Problem	Don't Build	Use Instead	Why
Ollama connectivity check	Custom TCP socket probe	`fetch` to `/api/version` with AbortController timeout	Reuses existing pattern from codex-models.ts
YAML config parsing	Full YAML parser	Existing `parseModelFromConfig` in hermes adapter	Already ships in hermes-paperclip-adapter/dist
System RAM reading	Shell commands	`os.totalmem()`	Built-in, no dep, works cross-platform
Token cost tracking	New billing logic	Existing `costService.createEvent` + `updateRuntimeState`	Already handles Hermes via regex-extracted usage

Common Pitfalls

Pitfall 1: Hermes Does Not Have an "ollama" Provider

What goes wrong: Setting adapterConfig.provider = "ollama" causes Hermes to fail, because "ollama" is not an entry in VALID_PROVIDERS in constants.js.
Why it happens: Ollama mimics the OpenAI API, so Hermes treats it as provider: "custom" with base_url: "http://localhost:11434/v1".
How to avoid: When a user selects an Ollama model, always write provider: "custom" and base_url: "http://localhost:11434/v1" into adapterConfig. These fields are already in the Hermes config schema (see agentConfigurationDoc).
Warning signs: Hermes stderr shows "unknown provider" or authentication errors during local model runs.

Pitfall 2: Ollama API Returns Models at `/api/tags`, Not `/v1/models`

What goes wrong: Listing models through the OpenAI-compatible endpoint /v1/models misses the details object (parameter_size, quantization_level, family) needed for OLLA-04.
Why it happens: /v1/models is the OpenAI-compatible endpoint; /api/tags is Ollama-native and returns richer data.
How to avoid: Use GET localhost:11434/api/tags for model listing (returns details.parameter_size, details.family). Use /v1/models only if passing through to Hermes.

Pitfall 3: stateJson Merge Requires Read-Modify-Write

What goes wrong: db.update(agentRuntimeState).set({ stateJson: newData }) overwrites keys that other parts of the system have stored in stateJson.
Why it happens: Drizzle .set() replaces the entire column value.
How to avoid: Use a Postgres jsonb merge, stateJson: sql`${agentRuntimeState.stateJson} || ${JSON.stringify(patch)}::jsonb`, or read the existing stateJson first and then spread. The existing ensureRuntimeState call in updateRuntimeState already reads the row.
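
The read-then-spread variant of this merge can be isolated in a small pure function, sketched below. The names mergeHermesState and HermesRuntimePatch are hypothetical; the three field names follow the stateJson keys used elsewhere in this document. Skipping undefined values is an assumption, made so that a partial patch does not erase a previously stored value.

```typescript
// Sketch of a non-destructive stateJson merge. Function and type names are hypothetical.
type StateJson = Record<string, unknown>;

type HermesRuntimePatch = {
  hermesModel?: string;
  hermesNativeSkillCount?: number;
  hermesMemoryBytes?: number | null;
};

export function mergeHermesState(
  existing: StateJson | null | undefined,
  patch: HermesRuntimePatch,
): StateJson {
  // Copy first so keys written by other parts of the system survive.
  const next: StateJson = { ...(existing ?? {}) };
  const entries: Record<string, unknown> = patch;
  for (const key of Object.keys(entries)) {
    const value = entries[key];
    // undefined means "no new information"; null is a real value ("not loaded").
    if (value !== undefined) next[key] = value;
  }
  return next;
}
```

The result would be passed as the stateJson value in the existing update call, after the row has been read.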

Pitfall 4: HermesLocalConfigFields Uses adapterConfig for Both Create and Edit Modes

What goes wrong: Setting provider and base_url only in create mode loses the values on edit, or vice versa.
Why it happens: The isCreate flag switches between set!({ model: v }) (create) and mark("adapterConfig", "model", v) (edit). Both paths must update all three fields (model, provider, base_url) when an Ollama model is selected.
How to avoid: When an Ollama model is selected, update all three config fields together. For create mode: set!({ model, provider: "custom", base_url: "http://localhost:11434/v1" }). For edit mode: three mark() calls or a compound helper.
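
A compound helper of the kind mentioned here might look like the following sketch. The SetFn and MarkFn signatures are assumptions inferred from the call sites quoted in this document, and applyOllamaModel and ollamaAdapterConfig are hypothetical names; the real props in config-fields.tsx may differ.

```typescript
// Sketch of a compound helper for create and edit modes. Signatures are assumed.
type SetFn = (patch: Record<string, unknown>) => void;
type MarkFn = (group: string, field: string, value: unknown) => void;

type OllamaAdapterConfig = {
  model: string;
  provider: "custom";
  base_url: string;
};

const OLLAMA_OPENAI_BASE_URL = "http://localhost:11434/v1";

export function ollamaAdapterConfig(
  model: string,
  baseUrl: string = OLLAMA_OPENAI_BASE_URL,
): OllamaAdapterConfig {
  return { model, provider: "custom", base_url: baseUrl };
}

export function applyOllamaModel(
  model: string,
  handlers: { isCreate: boolean; set?: SetFn; mark: MarkFn },
): void {
  const config = ollamaAdapterConfig(model);
  if (handlers.isCreate) {
    // Create mode: one call carries all three fields.
    if (handlers.set) handlers.set(config);
    return;
  }
  // Edit mode: mark each field so none is left stale.
  handlers.mark("adapterConfig", "model", config.model);
  handlers.mark("adapterConfig", "provider", config.provider);
  handlers.mark("adapterConfig", "base_url", config.base_url);
}
```

Routing both modes through one function means a later change to the field set only has to be made in one place.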

Pitfall 5: Ollama /api/ps Probe May Have No Models Running

What goes wrong: /api/ps returns an empty models: [] when no model is currently loaded. This does not mean Ollama is absent.
Why it happens: Ollama lists a model in /api/ps only while it is loaded in memory.
How to avoid: Use /api/version for detection (OLLA-01), /api/tags for the model list (OLLA-02), and /api/ps only for the optional "memory usage" metric in HERM-07, treating the empty case as "not currently loaded".
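
Handling the empty /api/ps case can be sketched as a lookup that returns null instead of throwing. The function name loadedModelMemoryBytes is hypothetical; name and size_vram are the /api/ps fields this document relies on for HERM-07.

```typescript
// Sketch: read size_vram for one model from an /api/ps body, or null if it is not loaded.
interface RunningModel {
  name: string;
  size_vram: number; // bytes used in VRAM
}

interface PsBody {
  models?: RunningModel[];
}

export function loadedModelMemoryBytes(body: PsBody, modelName: string): number | null {
  const models = body.models ?? [];
  for (const model of models) {
    if (model.name === modelName) return model.size_vram;
  }
  // An empty list or a missing entry means "not currently loaded", not "Ollama absent".
  return null;
}
```

The null result maps directly onto hermesMemoryBytes: null in stateJson.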

Pitfall 6: HERM-06 Cost Tracking — Ollama Models Return Zero Cost

What goes wrong: Expecting a cost_usd value from runs using local Ollama models, where there is no external billing.
Why it happens: Hermes does not know the user's GPU/CPU cost. The COST_REGEX will not match if Hermes does not emit a cost line.
How to avoid: This is correct behavior. normalizeBilledCostCents(undefined, "unknown") returns 0. Token usage may still be captured if Hermes emits token counts. Accept that Ollama-based runs show $0.00 in the cost UI; that is accurate.

Code Examples

Ollama /api/tags Response Shape (verified)

// Source: https://docs.ollama.com/api/tags (verified 2026-04-01)
interface OllamaTagsResponse {
  models: Array<{
    name: string;             // "qwen2.5-coder:32b"
    model: string;            // same as name
    modified_at: string;
    size: number;             // bytes
    digest: string;
    details: {
      parent_model: string;
      format: string;         // "gguf"
      family: string;         // "qwen2"
      families: string[];
      parameter_size: string; // "32.8B"
      quantization_level: string; // "Q4_K_M"
    };
  }>;
}
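
Mapping this response shape onto the OllamaModel fields from Pattern 1 could be done with a small mapper like the sketch below. The names toOllamaModel, toOllamaModels, OllamaTagEntry, and OllamaModelRow are hypothetical; only the response field names come from the shape above.

```typescript
// Sketch: map /api/tags entries to the OllamaModel-style rows returned by the Nexus route.
interface OllamaTagEntry {
  name: string;
  size: number; // bytes
  details: {
    family: string;
    parameter_size: string;
    quantization_level: string;
  };
}

interface OllamaModelRow {
  name: string;
  parameterSize: string;
  quantization: string;
  sizeBytes: number;
  family: string;
  recommended: boolean;
  recommendationReason: string | null;
}

export function toOllamaModel(entry: OllamaTagEntry): OllamaModelRow {
  return {
    name: entry.name,
    parameterSize: entry.details.parameter_size,
    quantization: entry.details.quantization_level,
    sizeBytes: entry.size,
    family: entry.details.family,
    // Recommendation fields are filled in later by the catalog + RAM check.
    recommended: false,
    recommendationReason: null,
  };
}

export function toOllamaModels(body: { models?: OllamaTagEntry[] }): OllamaModelRow[] {
  // A body without a models array is treated as "no models installed".
  return (body.models ?? []).map((entry) => toOllamaModel(entry));
}
```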

Ollama /api/ps Response Shape (verified)

// Source: https://docs.ollama.com/api/tags (verified 2026-04-01)
interface OllamaPsResponse {
  models: Array<{
    name: string;
    model: string;
    size: number;
    digest: string;
    details: { /* same as tags */ };
    expires_at: string;
    size_vram: number;  // bytes used in VRAM
  }>;
}

Reading hermes-adapter stateJson Hermes fields

// In AgentDetail.tsx HermesRuntimeCard — read from runtimeState.stateJson
const hermesModel = runtimeState.stateJson?.hermesModel as string | undefined;
const hermesNativeSkillCount = runtimeState.stateJson?.hermesNativeSkillCount as number | undefined;
const hermesMemoryBytes = runtimeState.stateJson?.hermesMemoryBytes as number | undefined;

Hermes Ollama adapterConfig (what to write)

// When user selects an Ollama model in config-fields.tsx:
// model = "qwen2.5-coder:32b"  (bare Ollama model name)
// provider = "custom"           (OpenAI-compatible endpoint)
// base_url = "http://localhost:11434/v1"

// For create mode:
set!({ model, provider: "custom", base_url: "http://localhost:11434/v1" })

// For edit mode:
mark("adapterConfig", "model", model);
mark("adapterConfig", "provider", "custom");
mark("adapterConfig", "base_url", "http://localhost:11434/v1");

Cost Tracking — Already Wired (HERM-06 context)

// Source: server/src/services/heartbeat.ts:updateRuntimeState
// Hermes execute.ts returns:
//   result.usage = { inputTokens, outputTokens }  (from regex)
//   result.costUsd = number | undefined           (from regex, usually undefined for local)
//
// heartbeat.ts normalizes:
const usage = normalizeUsageTotals(result.usage);
const additionalCostCents = normalizeBilledCostCents(result.costUsd, billingType);
// Then:
if (additionalCostCents > 0 || hasTokenUsage) {
  await costs.createEvent(companyId, { ... model: result.model ?? "unknown" ... });
}
// → For Ollama: costCents=0, but inputTokens/outputTokens may be > 0 → cost event recorded
// → If Hermes doesn't emit token counts: no event recorded (correct behavior)

HERM-05: Skill Visibility — What Is Already Done vs. What Is Missing

Already Done (data layer is complete)

skillRegistryService.syncHermesNativeSkills(agentId) scans ~/.hermes/skills/ and inserts source: "native" rows
Called automatically from GET /skill-registry/agents/:agentId/skills when adapterType === "hermes_local"
Returns AgentSkillEntry[] with { skillId, source, installedAt } — both "native" and "managed" source values
Hermes adapter listHermesSkills returns snapshot with originLabel: "Hermes skill" and readOnly: true for native skills

What Is Missing (UI rendering in AgentSkillsTab)

The unmanagedSkillRows section in AgentSkillsTab (AgentDetail.tsx:2566) renders read-only adapter entries. It uses entry.originLabel and entry.locationLabel for display. Hermes native skills already flow through this path.

The gap: the UI may not clearly distinguish "Hermes skill" entries from other unmanaged entries. The originLabel: "Hermes skill" badge rendering and skill count display are the UI additions needed. This is a targeted render update to AgentSkillsTab, not a new data flow.

HERM-07: Dashboard Hermes Runtime Info

What to Store in stateJson

// Written by heartbeat.ts updateRuntimeState after a Hermes run
{
  hermesModel: string;         // e.g. "qwen2.5-coder:32b" or "anthropic/claude-sonnet-4"
  hermesNativeSkillCount: number;  // from skillRegistryService query
  hermesMemoryBytes: number | null; // from /api/ps size_vram, null if unavailable
}

Where to Write stateJson

In heartbeat.ts:updateRuntimeState, after the existing db.update(agentRuntimeState).set(...) call, add a second update that merges hermes-specific fields when agent.adapterType === "hermes_local". Read result.model for hermesModel. Query skillRegistryDb for hermesNativeSkillCount. Query Ollama /api/ps for hermesMemoryBytes (non-blocking, fire-and-forget).

What to Render

A HermesRuntimeCard component in AgentOverview (gated by adapterType === "hermes_local"):

Model name (from stateJson.hermesModel)
Native skill count (from stateJson.hermesNativeSkillCount)
Memory usage (from stateJson.hermesMemoryBytes, formatted as "X.X GB" or "Not loaded")
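
The memory line in the list above needs a formatter that covers both the loaded and the unloaded case. A minimal sketch follows; the name formatHermesMemory is hypothetical, and treating 1 GB as 1024^3 bytes is an assumption, since the document does not say which convention the UI uses.

```typescript
// Sketch: format stateJson.hermesMemoryBytes for display in HermesRuntimeCard.
const BYTES_PER_GIB = 1024 * 1024 * 1024; // assumed convention for "GB"

export function formatHermesMemory(bytes: number | null | undefined): string {
  // null or undefined means /api/ps did not list the model.
  if (bytes === null || bytes === undefined) return "Not loaded";
  return (bytes / BYTES_PER_GIB).toFixed(1) + " GB";
}
```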

Environment Availability

Dependency	Required By	Available	Version	Fallback
Ollama daemon	OLLA-01 through OLLA-05	No (not installed)	—	All paths degrade gracefully; UI shows install instructions
hermes-paperclip-adapter	HERM-05, HERM-06, HERM-07	Yes	0.2.1	—
Node.js fetch	Ollama HTTP probing	Yes	built-in (Node 18+)	—
Node.js os module	OLLA-04 RAM reading	Yes	built-in	—
Vitest	Tests	Yes	(server vitest.config.ts)	—

Missing dependencies with no fallback: None — all Ollama features degrade gracefully when Ollama is absent.

Pre-existing test failures (not Phase 28 regressions): 4 test files failing before Phase 28 begins:

app-hmr-port.test.ts
plugin-worker-manager.test.ts
heartbeat-workspace-session.test.ts (5 tests)
skill-registry-routes.test.ts (1 test)

Validation Architecture

Test Framework

Property	Value
Framework	Vitest (server)
Config file	`server/vitest.config.ts`
Quick run command	`cd server && npx vitest run src/__tests__/ollama-service.test.ts`
Full suite command	`cd server && npx vitest run`

Phase Requirements → Test Map

Req ID	Behavior	Test Type	Automated Command	File Exists?
OLLA-01	`detectOllama()` returns `installed: false` when Ollama absent	unit	`npx vitest run src/__tests__/ollama-service.test.ts`	No — Wave 0
OLLA-01	`detectOllama()` returns `installed: true` + version when Ollama present	unit	same	No — Wave 0
OLLA-01	`detectOllama()` times out cleanly (AbortController)	unit	same	No — Wave 0
OLLA-02	`listOllamaModels()` returns AdapterModel[] from /api/tags	unit	same	No — Wave 0
OLLA-04	`buildModelRecommendation()` returns correct model for given RAM budget	unit	same	No — Wave 0
OLLA-05	Routes return `installUrl` when Ollama absent	unit	same	No — Wave 0
HERM-05	Skills tab renders `originLabel: "Hermes skill"` badge	manual-only	—	—
HERM-06	`updateRuntimeState` records cost event when Hermes emits token data	unit (existing pattern)	`npx vitest run src/__tests__/costs-service.test.ts`	Yes
HERM-07	stateJson receives hermesModel/hermesNativeSkillCount after run	unit	`npx vitest run src/__tests__/ollama-service.test.ts`	No — Wave 0

Sampling Rate

Per task commit: cd server && npx vitest run src/__tests__/ollama-service.test.ts
Per wave merge: cd server && npx vitest run
Phase gate: Full suite green before /gsd:verify-work (excluding 4 pre-existing failures)

Wave 0 Gaps

server/src/__tests__/ollama-service.test.ts — covers OLLA-01, OLLA-02, OLLA-04, OLLA-05, HERM-07 stateJson logic
Test stubs use mock fetch (AbortController pattern); no real Ollama needed

State of the Art

Old Approach	Current Approach	When Changed	Impact
Manual text entry for Hermes model	Dropdown fed from Ollama + manual fallback	Phase 28	Better UX for local models
stateJson unused for Hermes	stateJson stores hermesModel, skillCount, memoryBytes	Phase 28	Dashboard can show runtime info
Hermes native skills in separate table only	Skills tab renders both managed + native in unified view	Phase 28 (HERM-05 completion)	Unified skill surface

Open Questions

Should the Ollama routes be gated to hermes_local only?
- What we know: Only Hermes uses the Ollama custom endpoint pattern currently
- What's unclear: Future adapters (Phase 29 defaults) may also use Ollama
- Recommendation: Mount under /companies/:companyId/ollama/* without adapter-type gating — the endpoint is useful generically and Pi/OpenCode adapters may benefit in Phase 29
Should listOllamaModels also extend the hermes adapter's listModels function?
- What we know: listAdapterModels("hermes_local") already calls adapter.listModels() if present; hermes adapter has no listModels implementation (returns models: [])
- What's unclear: Whether to add listModels to hermes adapter (requires adapter package change) or use a separate Ollama API route in Nexus
- Recommendation: Use a separate Nexus route (/companies/:companyId/ollama/models). Avoids changing the hermes-paperclip-adapter package (external dependency). The config-fields.tsx component can call the Nexus route directly. Do not modify the hermes-paperclip-adapter package.
stateJson hermesNativeSkillCount — count from skillRegistry or from adapter snapshot?
- What we know: skillRegistryDb is a separate libSQL DB; querying it in updateRuntimeState adds cross-DB complexity
- What's unclear: Is the extra query worth it for a display-only count?
- Recommendation: Store the count from result.resultJson if Hermes emits it, or derive from the adapter skill snapshot after run. Alternatively, skip native skill count from stateJson and derive it in the UI from agentsApi.skills(agentId) query. The UI approach avoids cross-DB concerns in heartbeat.

Sources

Primary (HIGH confidence)

hermes-paperclip-adapter@0.2.1 dist source code — execute.js, skills.js, detect-model.js, test.js, constants.js — read directly from /opt/nexus/server/node_modules/hermes-paperclip-adapter/dist/
Nexus codebase — server/src/services/heartbeat.ts, server/src/services/costs.ts, server/src/services/skill-registry.ts, ui/src/pages/AgentDetail.tsx, ui/src/adapters/hermes-local/config-fields.tsx — read directly
Ollama REST API — https://docs.ollama.com/api/tags — verified /api/tags response shape with details.parameter_size, details.family, details.quantization_level
Node.js built-ins — os.totalmem(), fetch with AbortController — confirmed available in Node 18+ runtime

Secondary (MEDIUM confidence)

Hermes Agent provider docs — https://hermes-agent.nousresearch.com/docs/integrations/providers/ — verified "ollama uses custom provider + localhost:11434/v1 base_url"
Hermes Agent + Ollama guide — Medium/Substack articles cross-referencing official docs — confirmed custom endpoint configuration steps

Tertiary (LOW confidence)

Ollama model RAM requirements (catalog) — community sources + Ollama model page tags — use conservative estimates; verify against https://ollama.com/library model pages before shipping

Metadata

Confidence breakdown:

Ollama API: HIGH — verified from official docs, response shapes confirmed
Hermes + Ollama provider mapping: HIGH — verified from official Hermes provider docs
Standard stack: HIGH — all existing infrastructure confirmed from source code
Architecture patterns: HIGH — follow existing codex-models.ts, heartbeat.ts, config-fields.tsx patterns exactly
HERM-05 data layer status: HIGH — verified syncHermesNativeSkills exists and is already called
HERM-06 cost tracking: HIGH — execute.js returns usage/costUsd, heartbeat.ts wires it to costService
Pitfalls: HIGH — derived from actual source code analysis

Research date: 2026-04-01
Valid until: 2026-05-01 (Ollama API is stable; hermes-paperclip-adapter may receive new releases)
