Three audiences, one platform. Jump to your section below.
AI inference is the process of using a trained AI model to generate outputs from new inputs — running the AI to get answers, summaries, or predictions.
Examples: asking ChatGPT a question, summarizing a 50-page document, classifying support tickets, generating product descriptions, extracting info from contracts.
Inference requires powerful GPU hardware. At scale, costs add up quickly — which is exactly the problem MicroDC.ai solves. If your business processes text or data using AI at scale, you're paying for inference. We help you pay up to 90% less.
MicroDC.ai is a distributed AI inference platform — a two-sided marketplace connecting businesses that need AI compute with individuals and organizations that have idle GPU resources.
Problem: AI inference costs are skyrocketing. Major cloud providers charge premium prices and building internal infrastructure requires significant CapEx.
Solution: We aggregate distributed GPU resources globally, providing enterprise-grade inference at up to 90% lower cost than traditional cloud, with no infrastructure management.
MicroDC.ai was founded by Ray Sisson (CEO) and Jeffrey Rivero (CTO), with Neil Quarterman serving as CFO.
| Factor | MicroDC.ai | Major Cloud |
|---|---|---|
| Business model | Marketplace (asset-light) | Infrastructure owner |
| Cost savings | Up to 90% lower | Premium pricing |
| Processing model | Asynchronous batch | Real-time (expensive) |
| Minimum commitment | Pay-per-use, no minimum | Often required |
| Setup time | Minutes | Hours to days |
Organizations typically see 50–90% cost reduction compared to traditional cloud GPU instances:
Example: A document-processing company processing 100,000 documents/month saved 78% by switching from dedicated GPU instances to MicroDC.ai's batch processing.
Ideal: document summarization, content generation at scale, data enrichment, research/batch analytics, overnight processing, report generation.
Consider alternatives for: sub-second real-time chatbots, interactive user-facing AI, streaming responses, mission-critical real-time systems.
Simple credit-based system. 1 credit = $0.01 USD. Welcome bonus credits for new accounts. No setup, monthly, or platform fees. Per-job pricing based on model complexity and tokens processed.
Tiers: Economy (lower cost, flexible timing) · Standard (balanced) · Premium (priority).
Security is foundational. Our CTO is CISSP-certified with 25+ years in cybersecurity:
For zero-knowledge guarantees, opt into end-to-end encrypted jobs — the server never sees plaintext.
Current: security-first architecture from CISSP-certified leadership, regular audits and pen tests, secure development lifecycle.
Planned: SOC 2 Type II, ISO 27001, GDPR documentation. Contact us for current compliance status.
For sensitive workloads we recommend: anonymize/pseudonymize before processing, remove PII when possible, use retention settings to minimize exposure, consider end-to-end encrypted jobs (zero-knowledge), evaluate dedicated worker pools.
For HIPAA, PCI-DSS, or regulated workloads, contact sales to discuss dedicated infrastructure options.
Yes — customized enterprise packages: volume discounts, priority processing, SLA guarantees, dedicated support, custom integration, NET-30 invoice billing for qualified customers. Email sales@microdc.ai.
Platform availability: 99.9% API uptime target.
Job processing SLA by tier:
Enterprise customers receive customized SLAs with credits for missed targets.
Connect resources on your schedule. Workers receive a generous revenue share. Real-time earnings dashboard. Multiple payout options (credits with bonus, PayPal cash). See the Providers page.
We're developing blockchain integration for transparency, trust, and decentralized payments:
Traditional payment methods will remain available alongside any blockchain features.
Now: Batch inference, 100+ models, credit system, web dashboard.
Next: Streaming responses, custom model hosting, enhanced analytics.
Future: On-chain verification, token payments, global expansion.
Yes — we provide an OpenAI-compatible Chat Completions endpoint. Point your existing openai client at https://api.microdc.ai/v1 and keep your code. Multimodal content lists supported. Works with LangChain, LlamaIndex, Instructor, and any OpenAI-shaped library.
By default, jobs are submitted in plaintext so the server can assist with routing and pricing. For zero-knowledge guarantees, use end-to-end encrypted jobs: client encrypts payload with a per-job symmetric key; only the claiming worker receives key material; result is re-encrypted with your public key; per-job keys deleted on acknowledgment. Only workers with the encryption capability can claim encrypted jobs.
Yes — use the container job type. Provide image name, optional args/env, and any script files (.py, .sh, .js, .ts, .go, .rs, .java...) as inputs. Routes only to workers advertising the docker capability. Live log streaming via per-job heartbeat.
Metered pricing — GPU-hour rate × runtime, or CPU-core-hour rate × cores × runtime, plus a $2.00 platform fee per run. Minimum charges apply. CPU-only jobs default to $0.005/core-hour when the worker's CPU model isn't tier-priced.
Every job is labeled with a context tier (1–4) at submission based on prompt length. Workers declare a max_context_tier via heartbeat — your job will only be claimed by a capable worker. Transparent to your code; you don't set the tier, the server computes it.
Tier 1 = 0–1.5K chars, tier 2 = 1.5K–6K, tier 3 = 6K–24K, tier 4 = 24K+.
Three components:
Client (SDK/API) Server (Hub) Workers (Compute)
| | |
| Submit Job | |
|------------------>| |
| | Queue Job |
| |-------------------->|
| | |
| | Poll for Jobs |
| |<--------------------|
| | |
| | Return Results |
| |<--------------------|
| Get Results | |
|<------------------| |Server: API backend, PostgreSQL, scheduler. Workers: distributed GPU nodes running inference. Client: Python SDK and REST API.
Backend: FastAPI, PostgreSQL, JWT auth, Docker.
Frontend (console): HTMX, Alpine.js, Tailwind CSS, Jinja2.
Workers: Python runtime, vLLM / llama.cpp, CUDA/ROCm GPU support.
Infra: Docker Compose (dev), Kubernetes (prod), Nginx, Redis (planned).
Smart retry: a failed job won't be reassigned to the same worker.
pip install git+https://gitlab.com/microdc/python-client.git
from microdc import Client, LLMCall
client = Client(api_key="mDC_your_api_key")
job = LLMCall(model="llama3.3")
job.add_user_message("Explain quantum computing in simple terms")
job_id = client.send_job(job)
result = client.wait_for_job(job_id)
print(result.output)Full SDK docs: gitlab.com/microdc/python-client
Base URL: https://api.microdc.ai/v1 — Authorization: Bearer mDC_your_api_key
| Method | Endpoint | Description |
|---|---|---|
| POST | /jobs | Submit new job |
| GET | /jobs/{id} | Get job status / result |
| GET | /jobs | List your jobs |
| GET | /models | List available models |
| GET | /account/balance | Check credit balance |
| DELETE | /jobs/{id} | Cancel a pending job |
job = LLMCall(model="llama3.3")
job.add_user_message("Process this document...")
job_id = client.send_job(
job,
callback_url="https://your-api.com/webhook/microdc"
)
# Your webhook will receive POST with:
# { "job_id": "uuid", "status": "completed", "result": {...} }Or use polling with client.wait_for_job(job_id).
100+ open-source models.
LLM: Llama 3.3 (8B, 70B), Mistral (7B, Mixtral), Qwen 2.5, DeepSeek, Phi-3, Gemma 2.
Embedding: BGE, E5, GTE, Nomic Embed.
Image: Stable Diffusion XL, SDXL Turbo, Flux.
Live catalog: /models.html
Yes. Custom model hosting is available — upload weights, automatic validation, deployment to compatible workers, version control, private access. Contact us for requirements and pricing.
Most jobs complete within 30 seconds to 5 minutes. Premium tier averages under 1 minute. Factors: priority tier, model size, token count, queue depth.
export MICRODC_API_KEY="mDC_your_api_key" client = Client() # auto-reads from env # or client = Client(api_key="mDC_...") # HTTP header: Authorization: Bearer mDC_your_api_key
Never commit API keys to version control. Use env vars or secrets management.
Default limits (raisable for enterprise):
| Limit | Value |
|---|---|
| Job submissions | 100/minute |
| Status checks | 300/minute |
| Concurrent jobs | 50 (default) |
| Burst allowance | 2× for 10 seconds |
Rate-limit headers: X-RateLimit-Remaining
| Component | Minimum | Recommended |
|---|---|---|
| GPU | NVIDIA GTX 1060 6GB | RTX 3080+ / A100 |
| VRAM | 6GB | 16GB+ |
| RAM | 16GB | 32GB+ |
| Storage | 100GB SSD | 500GB+ NVMe |
| Internet | 50 Mbps | 100+ Mbps |
AMD GPUs (ROCm) supported. CPU-only workers can handle smaller models.
git clone https://gitlab.com/microdc/worker.git cd worker pip install -r requirements.txt export MICRODC_WORKER_TOKEN="mdc_wrk_your_token" python worker.py --models llama3-8b,mistral-7b
The worker will auto-register, download required models, and start polling for jobs. Setup guide: gitlab.com/microdc/worker
Per-job ledger entry on the JobAssignment record — your source of truth, not a drift-prone counter. Aggregated to your Earnings page in three windows: lifetime, 90-day, 30-day. Factors: model complexity, job volume, performance/reliability score, geographic latency preferences.
Under evaluation: Ethereum L2s (Arbitrum, Optimism), Solana, Polygon, custom Cosmos-SDK chain. Selection criteria: low fees, high throughput, dev ecosystem, regulatory clarity.
Exploring a utility token: payment currency, worker staking, governance, fee discounts, rewards. Token economics are under development. Traditional fiat payments will always be supported.
Phased rollout: Phase 1 — accept stablecoin payments (USDC/USDT). Phase 2 — on-chain job-completion proofs. Phase 3 — full decentralization, token launch, staking, governance.
Less than you'd think. We support a wide range via context-tier routing — workers declare capacity and only receive jobs they can handle.
Stable internet (~50 Mbps recommended), Linux or Windows, Docker for container jobs.
Yes. CPU-only workers heartbeat cleanly and can serve embedding jobs and small-LLM / short-prompt jobs efficiently. Embedding workloads in particular run at acceptable latency without a GPU at all.
About 5 minutes from sign-up to first heartbeat. Create an account, generate a worker token in the console, install the worker, start it. Jobs start flowing automatically.
Setup guide: gitlab.com/microdc/worker
Your choice, per payout request:
PAYOUT transaction.Head to the Earnings page in your dashboard (under Workers).
For each completed job the customer is charged and you receive a payout share (the platform retains a margin). Your per-job payout is recorded in the JobAssignment ledger — your source of truth. Earnings aggregated in lifetime, 90-day, and 30-day windows.
Only successfully completed jobs earn a payout. If a job fails or times out, the customer isn't charged and you aren't paid for it.
An integer (1–4) your worker advertises via heartbeat. The scheduler then only sends you jobs at or below that tier. Set it if your hardware struggles with long prompts. Don't set it if you have plenty of headroom — you'll see all jobs.
Setting a tier doesn't reduce your earning ceiling — it lets you earn consistently on jobs you can actually complete instead of timing out on oversized prompts.
Yes. Set max_concurrent_jobs > 1 in your worker config. The server will only mark you BUSY when at full capacity. Defaults to 1.
max_concurrent_jobs capacityYes — use Worker Groups. Named groups ("Home-Lab", "Colo-Rack-A", "Customer-X") with aggregate stats per group including 30-day, 90-day, and all-time earnings sourced from the real ledger.
Workers stay. Group membership is just a label — deleting the group sets the FK to NULL on members. Nothing else changes.
Advertise capabilities: ["docker"] in your heartbeat. You'll start receiving customer container jobs. Each container job includes a per-job heartbeat endpoint to stream logs and reset timeouts. Metered pricing — GPU-hour or CPU-core-hour rate × runtime, plus $2.00 platform fee.
The worker runs jobs in Docker with standard container isolation. For maximum safety, run on a dedicated machine or VM. If you don't want to accept arbitrary code, simply don't advertise docker — you'll only receive LLM / embedding / document jobs.
Advertise capabilities: ["encryption"] to opt in. On claim you receive the symmetric key + IV. You decrypt the payload in memory, run inference, then re-encrypt the result with the customer's public key before submitting. Per-job keys are deleted on customer acknowledgment.
The zombie monitor detects stalled jobs using activity-based tracking (no progress for 10 minutes). Stalled jobs are requeued to another worker, and your worker is freed. You don't get paid for the stalled attempt. Repeated stalls (retry exhaustion) result in a FAILED job — preventing infinite retry loops.
Today: just start and stop the worker process. When running, it heartbeats and receives jobs; when stopped, it's offline. Many providers run workers overnight or during spare hours via cron, systemd timers, or Task Scheduler.
The worker's registration token is deactivated. Any further login or heartbeat receives a clear 403. Historical earnings stay in your ledger — records aren't deleted.
Our team is here to help. Reach out for personalized assistance with your specific use case.