A personal agent is a different cost shape than a chatbot. Chatbots get a handful of calls per session. An always-on agent — checking inboxes, summarizing meetings, running skills on a schedule, holding context across days — fires off hundreds of calls a day in normal use. Multiply that by a few users, or by background memory consolidation, and a hosted-API bill stops looking incidental.
The good news: Hermes treats LLM providers as configuration. It ships with first-class entries for OpenAI, Anthropic, and OpenRouter, plus an explicit custom provider that points at any OpenAI-compatible base_url — which is exactly the shape MicroDC.ai exposes. Setting up the swap is a single config block.
Why this combo works.
Hermes resolves model calls through a provider/model pair, and its config explicitly supports custom OpenAI-compatible endpoints. MicroDC.ai exposes Chat Completions at https://api.microdc.ai/v1 — drop-in shape, same headers, same JSON. The result:
- No fork, no patch. You're not modifying Hermes. You're using its documented
provider: custompath with abase_urloverride. - Open-weight models. Llama 3.x, Qwen 2.5, Mistral, DeepSeek, Phi, Gemma, and the Hermes-tuned NousResearch lineage — pick any from the MicroDC.ai catalog. Tool-call-capable models handle Hermes' skill and tool layer cleanly.
- Async batch under the hood. Hermes sees a synchronous response, but the request runs through MicroDC.ai's distributed queue — see why that matters for cost.
- Local agent, distributed compute. Hermes' memory, skills, and tool state stay on your machine. Only the prompt content actually sent to a model leaves — and end-to-end-encrypted jobs are an option if even that needs to stay opaque.
Step 1: install Hermes.
From the official installer:
# macOS / Linux / WSL2 / Termux
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
# Windows (PowerShell)
iex (irm https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.ps1)
The installer drops a hermes CLI on your $PATH and creates a ~/.hermes/ directory for config, memory, and skills. If this is your first install, run the wizard so the directory is initialized:
hermes setup
You can pick any provider in the wizard — you'll override it in Step 3 anyway.
Step 2: get a MicroDC.ai API key.
Create a free account at console.microdc.ai — about a minute, no credit card. Generate an API key from the dashboard. New accounts get welcome credits, so you can run real Hermes workloads through the integration before adding any funds.
Pick a model from the catalog. For an agent that needs to follow tool-call instructions reliably, qwen3:32b is a solid default — modern, mid-sized, and clean on the OpenAI tool-call format. llama3:70b is the heavier alternative for more reasoning headroom. For lighter background tasks where latency matters more than depth, drop to gpt-oss:20b or phi4:latest — meaningfully cheaper per call. The catalog uses Ollama-style name:tag identifiers; copy them verbatim.
Step 3: point Hermes at MicroDC.ai.
Hermes' main config lives at ~/.hermes/config.yaml; API keys go in ~/.hermes/.env. The provider: custom path tells Hermes to ignore the built-in provider list and call whatever base_url you give it. Edit ~/.hermes/config.yaml so the model block looks like this:
model:
provider: custom
model: qwen3:32b
base_url: "https://api.microdc.ai/v1"
# Optional: a smaller fallback for background tasks
secondary_model:
provider: custom
model: gpt-oss:20b
base_url: "https://api.microdc.ai/v1"
Then add the key. Hermes' CLI auto-routes secrets into .env and everything else into config.yaml:
hermes config set OPENAI_API_KEY mDC_your_api_key_here
That looks wrong on first read — but it's deliberate. When base_url is set, Hermes uses the standard OPENAI_API_KEY environment variable as the bearer token, regardless of whose endpoint it's actually talking to. That's the OpenAI-compatible contract. Your MicroDC.ai key gets sent as Authorization: Bearer mDC_…, which is exactly what the MicroDC.ai API expects.
Confirm the wiring with the bundled model picker:
hermes model
You should see your MicroDC.ai-backed model selected and reachable. A quick interactive smoke test:
hermes chat "ping"
Step 4: keep your other providers around (optional).
You don't have to commit. Hermes resolves model per-invocation, so you can keep an Anthropic or OpenRouter entry in config.yaml for specific skills and route the day-to-day agent loop through MicroDC.ai. A workable split:
- Day-to-day agent loop — MicroDC.ai (cheap, async batch under the hood, fine for the thousands of small calls).
- Voice mode — whichever real-time provider you prefer, if you want minimal latency on speech turns.
- Heavy reasoning skills — your existing premium provider, invoked only for the few tasks that justify the cost.
Hermes' per-skill config can override the global model block; check the skills docs for the exact key shape on the version you're running.
What to expect.
A few honest notes from running this combo in practice:
- Latency. Hermes will see roughly the same response shape as a hosted OpenAI call — a few hundred ms of overhead vs a real-time provider, often invisible to the agent. For interactive voice turns you'd want a real-time provider on that skill; for typical agent task loops the queue overhead is in the noise.
- Tool-call format. Most modern open-weight chat models handle the OpenAI tool-call shape correctly via their chat template. Llama 3.x and Qwen 2.5 are reliable. If you hit a model that doesn't behave, switch to one that does — the catalog is large.
- Memory and skills are local. Hermes' long-term memory, context files, and skill state stay on your machine. Only prompt content actually sent to a model leaves your box. If even that's sensitive, MicroDC.ai supports end-to-end-encrypted jobs.
- Context window. Hermes uses each model's reported context limit. Llama 3.x is 128K; smaller models vary — the MicroDC.ai model card lists the exact figure per model.
The pitch.
Hermes is one of the more honest takes on what a personal AI agent should be: open-source, local-first, your memory and your tools, with the LLM as the one externally-provided dependency. The downside of that dependency is what every agent author hits eventually — an always-on agent burns through tokens, and the per-call premium of a hosted API is a tax you didn't sign up for.
Routing Hermes' model layer through MicroDC.ai keeps everything that makes Hermes Hermes — local, configurable, yours — while swapping the most expensive part of the stack for a fractional-cost equivalent. The agent doesn't notice. The bill does.