DOC · MICRODC-INFRA-001 v3.4 · BETA
DOC · MICRODC-INFRA-001 · REV 04 · 2026.05

Distributed inference, at the scale your workloads demand.

An asynchronous job queue spanning a continent of GPUs. Submit work, receive results. Built for enterprise batch inference — document pipelines, dataset enrichment, overnight runs — at a fraction of dedicated-cluster cost.

FIG.01 · INFERENCE TOPOLOGY
SCALE 1:1
▶ Live job log · /v3/jobs/stream (columns: TIME · JOB / MODEL · REGION · TOKENS · STATUS)
§02 · INTEGRATION

Drop-in replacement for your OpenAI client.

Point your base_url at MicroDC.ai and submit asynchronously. Poll, stream, or hand us a webhook — your call. End-to-end encrypted payloads available for sensitive workloads.

import os

from microdc import Client

client = Client(api_key=os.environ["MDC_KEY"])

# Submit asynchronously — get a job id back.
job = client.jobs.create(
    model="llama-3.3-70b",
    messages=[{"role":"user","content":"..."}],
    encrypt=True,           # zero-knowledge
    webhook="https://api.acme.co/done",
)

# Or poll. Or stream. Your call.
result = client.jobs.wait(job.id)
print(result.choices[0].message.content)

Three audiences. One queue.

§03 · PATHS
01 · FOR DEVELOPERS

Asynchronous inference, no infrastructure to babysit.

REST, Python, OpenAI-compat. Submit jobs, get results. Encrypted payloads for sensitive workloads. Custom-amount credits — no minimums, no monthly fees.

  • REST · SDK · OpenAI-compat
  • LLM · embedding · document · Docker
  • End-to-end encryption (zero-knowledge)
  • Pay per compute-second
02 · FOR ENTERPRISES

Bring batch inference in-budget — without a procurement cycle.

Pre-negotiated capacity tiers, dedicated worker groups, private VPC routing, SLA-backed throughput. We are onboarding additional compute capacity to serve medium-to-large customers.

  • Reserved-capacity pools
  • Private worker groups
  • VPC peering · region pinning
  • SOC 2 (in progress) · MSA available
03 · FOR GPU OPERATORS

Idle silicon → revenue.

Connect a worker in five minutes. Context-tier routing means low-spec hardware still earns. Multi-GPU and concurrent jobs supported. Payouts in credits or PayPal.

  • 5-minute install
  • Context-tier routing
  • Multi-GPU · concurrent
  • Credits + bonus or USD

Mechanics, in four steps.

§04 · MECHANICS
STEP 01 ━━━▶

Submit

POST your job — model, payload, optional encryption keys, optional webhook.

STEP 02 ━━━▶

Route

Our scheduler matches the job to a worker tier with the right VRAM, locality, and price ceiling.
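
To make the routing step concrete, here is a minimal sketch of tier matching. It is not the MicroDC scheduler: the `WorkerTier` and `JobRequest` types, the tier names, regions, and prices are all hypothetical, and the rule (filter by VRAM and price ceiling, prefer the job's region, then take the cheapest) is only one plausible reading of "right VRAM, locality, and price ceiling".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WorkerTier:
    # Hypothetical description of a worker tier; not a real MicroDC type.
    name: str
    vram_gb: int
    region: str
    price_per_second: float  # illustrative USD per compute-second


@dataclass(frozen=True)
class JobRequest:
    # Hypothetical routing constraints attached to a job.
    min_vram_gb: int
    preferred_region: str
    price_ceiling: float  # maximum acceptable USD per compute-second


def pick_tier(job: JobRequest, tiers: list[WorkerTier]) -> Optional[WorkerTier]:
    """Return the cheapest eligible tier, preferring the job's region."""
    eligible = [
        t for t in tiers
        if t.vram_gb >= job.min_vram_gb and t.price_per_second <= job.price_ceiling
    ]
    if not eligible:
        return None
    # Sort key: in-region tiers first (False sorts before True), then by price.
    return min(
        eligible,
        key=lambda t: (t.region != job.preferred_region, t.price_per_second),
    )


tiers = [
    WorkerTier("small", 16, "eu-west", 0.0004),
    WorkerTier("large-eu", 80, "eu-west", 0.0030),
    WorkerTier("large-us", 80, "us-east", 0.0022),
]
job = JobRequest(min_vram_gb=48, preferred_region="eu-west", price_ceiling=0.0040)
chosen = pick_tier(job, tiers)
print(chosen.name if chosen else "no eligible tier")  # prints "large-eu"
```

Lowering the ceiling to 0.0025 in this sketch would route the same job to `large-us`, since locality is treated as a preference rather than a hard constraint.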

STEP 03 ━━━▶

Execute

A worker pulls the job, runs it, and returns the result, streaming token deltas where applicable.

STEP 04

Deliver

Webhook fires, or your client retrieves. Receipt-of-compute logged for billing transparency.
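
For the "client retrieves" path, a polling loop with exponential backoff is one common approach. The sketch below is an assumption-laden illustration, not the SDK's `jobs.wait` implementation: `fetch_status` stands in for whatever call returns job status, and the `state` field with values `completed` and `failed` is hypothetical.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_status: Callable[[], dict],
    *,
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll fetch_status() with exponential backoff until the job is terminal."""
    delay = initial_delay
    waited = 0.0
    while True:
        status = fetch_status()
        # "state", "completed", and "failed" are assumed names for illustration.
        if status.get("state") in {"completed", "failed"}:
            return status
        if waited + delay > timeout:
            raise TimeoutError(f"job not finished after {waited:.0f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)  # double the wait, up to the cap


# Usage with a fake status source that completes on the third poll.
responses = iter([
    {"state": "queued"},
    {"state": "running"},
    {"state": "completed", "result": "ok"},
])
delays = []  # records the requested sleeps instead of actually sleeping
final = wait_for_job(lambda: next(responses), sleep=delays.append)
print(final["state"], delays)  # prints: completed [1.0, 2.0]
```

Injecting `sleep` keeps the loop testable without real waiting; for long overnight runs a webhook avoids polling altogether.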

Workloads we already run.

§05 · APPLICATIONS
DOC PIPELINES · 47.2M docs / mo

Document Processing

Summarize tens of thousands of PDFs, extract structure from contracts, normalize OCR output. Queue at 2am, deliver at 6am.

RESEARCH · 1,840 labs

Research & Analysis

Run grids of experiments across model × prompt × dataset. No reservation, no spin-up time, no idle burn.
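
A model × prompt × dataset grid can be expanded into individual job payloads before submission. The sketch below only builds the payloads; the `model` and `messages` keys mirror the SDK call shown earlier, while the `tag` field, the second model name, the prompts, and the datasets are placeholders invented for illustration.

```python
from itertools import product

models = ["llama-3.3-70b", "example-model-b"]  # second name is a placeholder
prompts = {
    "terse": "Summarize in one sentence: {text}",
    "detailed": "Summarize in a short paragraph: {text}",
}
datasets = {"news": ["Text A", "Text B"], "legal": ["Text C"]}


def build_grid(models, prompts, datasets):
    """Yield one job payload per (model, prompt, dataset row) combination."""
    for model, (prompt_id, template), (dataset_id, rows) in product(
        models, prompts.items(), datasets.items()
    ):
        for row_index, text in enumerate(rows):
            yield {
                "model": model,
                "messages": [
                    {"role": "user", "content": template.format(text=text)}
                ],
                # Hypothetical tag so results can be joined back to a grid cell.
                "tag": f"{model}/{prompt_id}/{dataset_id}/{row_index}",
            }


jobs = list(build_grid(models, prompts, datasets))
print(len(jobs))  # 2 models x 2 prompts x 3 rows = 12
```

Each payload could then be handed to an asynchronous submit call, with the tag used to reassemble the results table once the jobs finish.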

ENRICHMENT · 12.4B rows / qtr

Data Enrichment

Classify, tag, embed, and score records at warehouse scale. Webhook back into your ETL.

GENERATION · 92M jobs / mo

Content Generation

Personalized summaries, briefs, and translations. Rate-limit-free batches.

AUTOMATION · API-first

Automation Pipelines

Plug LLM steps into Airflow, Temporal, or n8n. Idempotent retries, signed receipts.
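
Idempotent retries and signed receipts can be sketched with the standard library alone. Everything here is an assumption made for illustration: the page does not specify how idempotency keys are derived or how receipts are signed, so the SHA-256 key, the HMAC-SHA256 signature, the secret, and the receipt fields are all hypothetical.

```python
import hashlib
import hmac
import json


def idempotency_key(payload: dict) -> str:
    """Derive a stable key from the payload so a retried task reuses it."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def sign(body: bytes, secret: bytes) -> str:
    """HMAC-SHA256 over the raw body (assumed scheme, for illustration)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_receipt(body: bytes, signature_hex: str, secret: bytes) -> bool:
    """Compare expected and supplied signatures in constant time."""
    return hmac.compare_digest(sign(body, secret), signature_hex)


# The same payload with keys in a different order yields the same key,
# so an orchestrator retry does not look like a new job.
payload = {"model": "llama-3.3-70b", "messages": [{"role": "user", "content": "hi"}]}
reordered = {"messages": [{"role": "user", "content": "hi"}], "model": "llama-3.3-70b"}
same_key = idempotency_key(payload) == idempotency_key(reordered)

secret = b"demo-secret"  # placeholder shared secret
receipt = b'{"job_id": "job_123", "compute_seconds": 4.2}'  # made-up receipt
signature = sign(receipt, secret)
print(
    same_key,
    verify_receipt(receipt, signature, secret),
    verify_receipt(receipt + b" ", signature, secret),
)  # prints: True True False
```

In an Airflow or Temporal task, the key would be computed once per logical unit of work and reused on every retry of that task.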

OVERNIGHT · −74% cost

Overnight Processing

Submit during business hours; results are on your desk by morning. Cheapest tier.

§06 · ENGAGE

Plug in. Ship work.

No credit card to start. No monthly minimum. Free tier for evaluation, custom contracts for production. We're shipping reserved-capacity tiers for medium-to-large enterprise workloads this quarter — talk to us if you're sizing.