If you started a new LLM app in the last 18 months, there's a high chance the import line says `from openai import OpenAI`. The SDK is well-designed, the docs are good, and most other providers have copied the request/response shape. That's a happy accident for everyone: it means you can swap providers without rewriting your application.
MicroDC.ai ships an OpenAI-compatible Chat Completions endpoint. You change `base_url` to ours, supply your MicroDC.ai API key, and the rest of your code (including LangChain, LlamaIndex, Instructor, and any other library built on the OpenAI SDK) works without modification.
The diff.
Here's a typical OpenAI integration:
```python
import os

from openai import OpenAI

# Default client: talks to api.openai.com
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize this contract..."}],
)
print(resp.choices[0].message.content)
```
The MicroDC.ai version:
```python
import os

from openai import OpenAI

# Same SDK, pointed at the MicroDC.ai endpoint
client = OpenAI(
    api_key=os.environ["MICRODC_API_KEY"],
    base_url="https://api.microdc.ai/v1",
)
resp = client.chat.completions.create(
    model="llama-3.1-8b",
    messages=[{"role": "user", "content": "Summarize this contract..."}],
)
print(resp.choices[0].message.content)
```
Three things change: the API key, the `base_url`, and the model name. Everything else is identical, including the response shape, error format, and parameter names. If you've already wrapped the OpenAI client in a service class, you swap the `base_url` in one place and ship.
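If the client is built in one place, the provider choice can live in a small config table. The following is a minimal sketch, not a prescribed layout: `ProviderConfig`, `PROVIDERS`, `client_kwargs`, and `make_client` are hypothetical names, and only the environment variable names, base URL, and model names come from the examples above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ProviderConfig:
    api_key_env: str          # name of the env var that holds the API key
    base_url: Optional[str]   # None means "use the SDK default" (OpenAI)
    default_model: str


# Hypothetical registry; values are taken from the two snippets above.
PROVIDERS = {
    "openai": ProviderConfig("OPENAI_API_KEY", None, "gpt-4o-mini"),
    "microdc": ProviderConfig(
        "MICRODC_API_KEY", "https://api.microdc.ai/v1", "llama-3.1-8b"
    ),
}


def client_kwargs(provider: str, env: Mapping[str, str] = os.environ) -> dict:
    """Build the keyword arguments for OpenAI(...) for the given provider."""
    cfg = PROVIDERS[provider]
    kwargs = {"api_key": env[cfg.api_key_env]}
    if cfg.base_url is not None:
        kwargs["base_url"] = cfg.base_url
    return kwargs


def make_client(provider: str):
    """Create the SDK client; the import is lazy so the config is testable alone."""
    from openai import OpenAI

    return OpenAI(**client_kwargs(provider))
```

Callers would then use `make_client("microdc")` together with `PROVIDERS["microdc"].default_model`, so switching providers means changing one string.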
What works as-is.
- Chat completions. The full request/response shape, including `system`/`user`/`assistant` message roles, `temperature`, `top_p`, `max_tokens`, `stop`, and the standard `choices[]` response.
- Multimodal content lists. Pass image parts as `{"type": "image_url", "image_url": {"url": "..."}}` and they route to vision-capable models in our catalog.
- LangChain. `ChatOpenAI(base_url="https://api.microdc.ai/v1", api_key="...")`: pipelines, agents, and tools all work.
- LlamaIndex. Same pattern via `OpenAI(api_base="https://api.microdc.ai/v1", api_key="...")`.
- Instructor. Patches the OpenAI client; the patched client points wherever you point the underlying client. Structured outputs work.
- Any other library that lets you override `base_url` or `api_base` on the OpenAI client.
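To make the multimodal item concrete, here is a small sketch of a user message whose content is a list of a text part and an image part, in the OpenAI content-list shape. The helper name `image_message` and the example URL are illustrative assumptions.

```python
import json


def image_message(prompt: str, image_url: str) -> dict:
    """Build one user message with a text part followed by an image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


# Placeholder URL for illustration only.
msg = image_message("What is in this picture?", "https://example.com/cat.png")
print(json.dumps(msg, indent=2))
```

The resulting dict would go into `messages=[msg]` on `client.chat.completions.create(...)`, with a vision-capable model name taken from the catalog.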
What's different.
Worth being upfront about the differences, because surprises in production are the worst kind:
- Async-by-default underneath. The endpoint accepts the request synchronously and returns the response synchronously — that part is OpenAI-shape. Internally, the work runs through our async job queue. For most batch and chat workloads the overhead is in the tens of milliseconds. For latency-critical sub-second use cases you want a real-time inference provider.
- Models are different. We run open-source models: Llama, Mistral, Qwen, DeepSeek, Phi, Gemma, and others. `gpt-*` model names won't resolve. See the model catalog for what's currently available.
- Function calling / tools. Supported on models whose tokenizers and chat templates carry the tool-call format. Llama 3.x and Qwen 2.5 work cleanly. Older models may not. Check before relying on it for production.
- Streaming. Server-sent events streaming is being rolled out across the model catalog. Most LLM models stream today; some specialized models don't yet.
- Logprobs and other rarely-used fields. Some advanced fields (`logprobs`, `seed`, etc.) are accepted but not always returned, depending on the underlying model server. If your code depends on them, test first.
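Since optional fields may be accepted but not returned, code that reads them is safer written defensively. The sketch below assumes the response follows the OpenAI SDK shape (`choices[0].logprobs.content` holding items with `token` and `logprob`). The helper name `token_logprobs` is hypothetical, and the `SimpleNamespace` objects are stand-ins for real responses.

```python
from types import SimpleNamespace
from typing import List, Optional, Tuple


def token_logprobs(response) -> Optional[List[Tuple[str, float]]]:
    """Return (token, logprob) pairs from the first choice, or None when the
    server did not return logprobs for this request."""
    choices = getattr(response, "choices", None) or []
    if not choices:
        return None
    logprobs = getattr(choices[0], "logprobs", None)
    content = getattr(logprobs, "content", None)
    if not content:
        return None
    return [(item.token, item.logprob) for item in content]


# Stand-in objects shaped like SDK responses, for illustration only.
with_lp = SimpleNamespace(choices=[SimpleNamespace(
    logprobs=SimpleNamespace(content=[SimpleNamespace(token="Hi", logprob=-0.1)])
)])
without_lp = SimpleNamespace(choices=[SimpleNamespace(logprobs=None)])

print(token_logprobs(with_lp))     # [('Hi', -0.1)]
print(token_logprobs(without_lp))  # None
```

Treating a missing field as `None` instead of raising lets the same code path run against model servers that do and do not return it.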
The cost difference.
For the same Llama 3.1 8B chat completion (1,000 output tokens, ~500 input):
| Provider | Cost per request (USD) | Notes |
|---|---|---|
| Major cloud serverless | $0.10–$0.20 | Real-time premium baked in |
| OpenAI gpt-4o-mini equivalent quality | ~$0.0006 | Cheap proprietary tier |
| MicroDC.ai llama-3.1-8b | ~$0.011 | Open weights, fractional infra cost |
The proprietary tier is competitive on raw price for small models, but you give up control: opaque routing, surprise rate limits, and zero ability to use a custom or fine-tuned model. The MicroDC.ai tier wins decisively when you want to run open-weight models, when you need predictable pricing for batch volume, or when you have data-residency or zero-knowledge requirements that proprietary APIs can't meet.
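To see what the per-request figures mean at batch volume, they can be scaled to a monthly total. This is a back-of-the-envelope sketch: the per-request costs are copied from the table above, while the request volume and the 30-day month are hypothetical assumptions to replace with your own numbers.

```python
# Per-request costs in USD, copied from the table above.
COST_PER_REQUEST_USD = {
    "major_cloud_serverless_low": 0.10,
    "major_cloud_serverless_high": 0.20,
    "openai_gpt_4o_mini": 0.0006,
    "microdc_llama_3_1_8b": 0.011,
}

REQUESTS_PER_DAY = 10_000  # hypothetical volume, not from the article


def monthly_cost(per_request_usd: float, requests_per_day: int, days: int = 30) -> float:
    """Scale a per-request cost to a monthly total."""
    return per_request_usd * requests_per_day * days


for name, cost in COST_PER_REQUEST_USD.items():
    print(f"{name}: ${monthly_cost(cost, REQUESTS_PER_DAY):,.2f}/month")
```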
Migration in one diff.
Most teams we've moved over have a wrapper class around the OpenAI client — something like LLMClient in their services/ folder. The migration is changing the constructor in that one file. The rest of the codebase doesn't know it's running on a different provider.
```diff
--- a/services/llm_client.py
+++ b/services/llm_client.py
@@
-client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
+client = OpenAI(
+    api_key=os.environ["MICRODC_API_KEY"],
+    base_url="https://api.microdc.ai/v1",
+)
@@
-DEFAULT_MODEL = "gpt-4o-mini"
+DEFAULT_MODEL = "llama-3.1-8b"
```
That's the whole change. Run your test suite. Run a small canary on real traffic. Compare costs at the end of the month. Most teams keep both providers wired up for a release or two so they can A/B and roll back if anything surprises them.
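One way to run the canary while keeping both providers wired up is a deterministic split keyed on a request or user ID, so the same ID always lands on the same provider. This is a sketch under assumptions: `pick_provider`, the provider labels, and the 10% fraction are illustrative and not part of any SDK.

```python
import hashlib


def pick_provider(request_id: str, canary_fraction: float) -> str:
    """Route a stable share of traffic to the new provider.

    The ID is hashed into a bucket in [0, 1); IDs whose bucket falls below
    canary_fraction go to "microdc", the rest stay on "openai".
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # 32 bits are exactly representable as a float, so bucket is always < 1.0.
    bucket = int.from_bytes(digest[:4], "big") / 2**32
    return "microdc" if bucket < canary_fraction else "openai"


# Hypothetical usage: send roughly 10% of requests to the canary.
ids = [f"req-{i}" for i in range(1000)]
share = sum(pick_provider(i, 0.10) == "microdc" for i in ids) / len(ids)
print(f"canary share: {share:.1%}")
```

Rolling back is then a matter of setting the fraction to `0.0`, and full cutover is `1.0`.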
The takeaway.
OpenAI-compatibility isn't a marketing checkbox. It means the SDK you already use, the framework code you already wrote, and the integration tests you already have all work without modification. The cost gap from "real-time premium serverless inference" to "open-weight model on a distributed network" is large enough to matter for any non-trivial volume of inference.
If you're paying real-time prices for an LLM workload that isn't actually real-time, this migration is one diff away from a substantially smaller bill.