Use MicroDC.ai as the LLM Backend for Hermes — Fractional-Cost Personal AI Agent

A personal agent is a different cost shape than a chatbot. Chatbots get a handful of calls per session. An always-on agent — checking inboxes, summarizing meetings, running skills on a schedule, holding context across days — fires off hundreds of calls a day in normal use. Multiply that by a few users, or by background memory consolidation, and a hosted-API bill stops looking incidental.

The good news: Hermes treats LLM providers as configuration. It ships with first-class entries for OpenAI, Anthropic, and OpenRouter, plus an explicit custom provider that points at any OpenAI-compatible base_url — which is exactly the shape MicroDC.ai exposes. Setting up the swap is a single config block.

Why this combo works.

Hermes resolves model calls through a provider/model pair, and its config explicitly supports custom OpenAI-compatible endpoints. MicroDC.ai exposes Chat Completions at https://api.microdc.ai/v1 — drop-in shape, same headers, same JSON. The result:

No fork, no patch. You're not modifying Hermes. You're using its documented provider: custom path with a base_url override.
Open-weight models. Llama 3.x, Qwen 2.5, Mistral, DeepSeek, Phi, Gemma, and the Hermes-tuned NousResearch lineage — pick any from the MicroDC.ai catalog. Tool-call-capable models handle Hermes' skill and tool layer cleanly.
Async batch under the hood. Hermes sees a synchronous response, but the request runs through MicroDC.ai's distributed queue — see why that matters for cost.
Local agent, distributed compute. Hermes' memory, skills, and tool state stay on your machine. Only the prompt content actually sent to a model leaves — and end-to-end-encrypted jobs are an option if even that needs to stay opaque.

Step 1: install Hermes.

From the official installer:

# macOS / Linux / WSL2 / Termux
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash

# Windows (PowerShell)
iex (irm https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.ps1)

The installer drops a hermes CLI on your $PATH and creates a ~/.hermes/ directory for config, memory, and skills. If this is your first install, run the wizard so the directory is initialized:

hermes setup

You can pick any provider in the wizard — you'll override it in Step 3 anyway.

Step 2: get a MicroDC.ai API key.

Create a free account at console.microdc.ai — about a minute, no credit card. Generate an API key from the dashboard. New accounts get welcome credits, so you can run real Hermes workloads through the integration before adding any funds.

Pick a model from the catalog. For an agent that needs to follow tool-call instructions reliably, qwen3:32b is a solid default — modern, mid-sized, and clean on the OpenAI tool-call format. llama3:70b is the heavier alternative for more reasoning headroom. For lighter background tasks where latency matters more than depth, drop to gpt-oss:20b or phi4:latest — meaningfully cheaper per call. The catalog uses Ollama-style name:tag identifiers; copy them verbatim.

Step 3: point Hermes at MicroDC.ai.

Hermes' main config lives at ~/.hermes/config.yaml; API keys go in ~/.hermes/.env. The provider: custom path tells Hermes to ignore the built-in provider list and call whatever base_url you give it. Edit ~/.hermes/config.yaml so the model block looks like this:

model:
  provider: custom
  model: qwen3:32b
  base_url: "https://api.microdc.ai/v1"

# Optional: a smaller fallback for background tasks
secondary_model:
  provider: custom
  model: gpt-oss:20b
  base_url: "https://api.microdc.ai/v1"

Then add the key. Hermes' CLI auto-routes secrets into .env and everything else into config.yaml:

hermes config set OPENAI_API_KEY mDC_your_api_key_here

That looks wrong on first read — but it's deliberate. When base_url is set, Hermes uses the standard OPENAI_API_KEY environment variable as the bearer token, regardless of whose endpoint it's actually talking to. That's the OpenAI-compatible contract. Your MicroDC.ai key gets sent as Authorization: Bearer mDC_…, which is exactly what the MicroDC.ai API expects.

Confirm the wiring with the bundled model picker:

hermes model

You should see your MicroDC.ai-backed model selected and reachable. A quick interactive smoke test:

hermes chat "ping"

Step 4: keep your other providers around (optional).

You don't have to commit. Hermes resolves model per-invocation, so you can keep an Anthropic or OpenRouter entry in config.yaml for specific skills and route the day-to-day agent loop through MicroDC.ai. A workable split:

Day-to-day agent loop — MicroDC.ai (cheap, async batch under the hood, fine for the thousands of small calls).
Voice mode — whichever real-time provider you prefer, if you want minimal latency on speech turns.
Heavy reasoning skills — your existing premium provider, invoked only for the few tasks that justify the cost.

Hermes' per-skill config can override the global model block; check the skills docs for the exact key shape on the version you're running.

What to expect.

A few honest notes from running this combo in practice:

Latency. Hermes will see roughly the same response shape as a hosted OpenAI call — a few hundred ms of overhead vs a real-time provider, often invisible to the agent. For interactive voice turns you'd want a real-time provider on that skill; for typical agent task loops the queue overhead is in the noise.
Tool-call format. Most modern open-weight chat models handle the OpenAI tool-call shape correctly via their chat template. Llama 3.x and Qwen 2.5 are reliable. If you hit a model that doesn't behave, switch to one that does — the catalog is large.
Memory and skills are local. Hermes' long-term memory, context files, and skill state stay on your machine. Only prompt content actually sent to a model leaves your box. If even that's sensitive, MicroDC.ai supports end-to-end-encrypted jobs.
Context window. Hermes uses each model's reported context limit. Llama 3.x is 128K; smaller models vary — the MicroDC.ai model card lists the exact figure per model.

The pitch.

Hermes is one of the more honest takes on what a personal AI agent should be: open-source, local-first, your memory and your tools, with the LLM as the one externally-provided dependency. The downside of that dependency is what every agent author hits eventually — an always-on agent burns through tokens, and the per-call premium of a hosted API is a tax you didn't sign up for.

Routing Hermes' model layer through MicroDC.ai keeps everything that makes Hermes Hermes — local, configurable, yours — while swapping the most expensive part of the stack for a fractional-cost equivalent. The agent doesn't notice. The bill does.

Hermes site → Get a MicroDC.ai API key →