A simple asynchronous job queue for distributed AI inference. Submit, route, execute, deliver — explained in detail below.
Submit: the job is validated and added to the distributed queue with its model spec, input prompt, and config parameters.
Route: the scheduler matches the job to suitable workers based on model requirements, GPU capabilities, and current load.
Execute: workers pull jobs, load models (cached if available), run inference, and submit results back encrypted and authenticated.
Deliver: results are stored for a configurable retention period. You are notified by webhook or can retrieve results by polling.
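The routing step above can be sketched as a filter followed by a least-loaded pick. This is a minimal illustration, not the service's actual scheduler: the `Worker` and `Job` fields (`models`, `gpu_memory_gb`, `active_jobs`, `min_gpu_memory_gb`) and the `pick_worker` function are hypothetical stand-ins for "model requirements, GPU capabilities, and current load".

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Worker:
    worker_id: str
    models: Set[str]        # models this worker can serve (hypothetical field)
    gpu_memory_gb: int      # stand-in for "GPU capabilities"
    active_jobs: int        # stand-in for "current load"


@dataclass
class Job:
    job_id: str
    model: str
    min_gpu_memory_gb: int  # hypothetical model requirement


def pick_worker(job: Job, workers: List[Worker]) -> Optional[Worker]:
    """Return the least-loaded worker that satisfies the job, or None."""
    eligible = [
        w for w in workers
        if job.model in w.models and w.gpu_memory_gb >= job.min_gpu_memory_gb
    ]
    if not eligible:
        return None  # the job stays queued until a suitable worker appears
    return min(eligible, key=lambda w: w.active_jobs)
```

A real scheduler would likely weigh more signals (model cache state, region, pricing), but the shape stays the same: filter on hard requirements first, then rank the workers that remain.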
Save up to 90% vs dedicated GPU instances. Pay only for actual compute time, not idle resources.
Process thousands of jobs in parallel across the distributed worker network. No infrastructure to manage.
Simple API, comprehensive SDKs, detailed documentation. Start integrating in minutes.
Every job is labeled with a context tier (1–4) at submission based on prompt length. Workers declare a max_context_tier via heartbeat — the scheduler only sends them jobs they can run efficiently. Better latency everywhere; revenue opportunities for low-spec hardware.
| Tier | Prompt size | Default routing |
|---|---|---|
| Tier 1 | 0–1.5K chars | All workers |
| Tier 2 | 1.5K–6K chars | Workers with tier ≥ 2 |
| Tier 3 | 6K–24K chars | Workers with tier ≥ 3 |
| Tier 4 | 24K+ chars | Workers with tier 4 (high-end GPUs) |
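As a rough sketch of how the tier table could translate into code, the snippet below labels a prompt by character count and filters workers by their declared `max_context_tier`. The function names and the `worker_tiers` mapping are hypothetical, and the table does not say which tier an exact boundary value (for example 1,500 characters) falls into; here it is assumed to go to the higher tier.

```python
from typing import Dict, List, Tuple

# Exclusive upper bounds in characters for tiers 1-3, taken from the table.
# Assumption: a prompt of exactly 1,500 / 6,000 / 24,000 chars goes to the higher tier.
TIER_LIMITS: List[Tuple[int, int]] = [(1, 1_500), (2, 6_000), (3, 24_000)]


def context_tier(prompt: str) -> int:
    """Label a prompt with a context tier (1-4) based on its length in characters."""
    length = len(prompt)
    for tier, limit in TIER_LIMITS:
        if length < limit:
            return tier
    return 4


def eligible_workers(prompt: str, worker_tiers: Dict[str, int]) -> List[str]:
    """Return IDs of workers whose declared max_context_tier covers the prompt."""
    tier = context_tier(prompt)
    return sorted(w for w, max_tier in worker_tiers.items() if max_tier >= tier)
```

For example, a 7,000-character prompt is tier 3, so a worker that declared `max_context_tier` 1 would not be offered the job, while workers at tier 3 or 4 would.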
Payload encrypted before it leaves the client. Server stores ciphertext only. Only workers with the encryption capability receive decryption material — and only on claim.
The client generates a per-job symmetric key, encrypts the payload, and sends a public key for encrypting results.
Ciphertext is stored and keys are held separately. Only the claiming worker gets decryption material.
The worker decrypts in memory, runs inference, and re-encrypts the result with your public key.
The client decrypts the result with its private key. Per-job keys are deleted on acknowledgment.
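The key-handling rules above (keys held apart from ciphertext, released only on claim, deleted on acknowledgment) can be modeled in a few lines. This sketch performs no real encryption: the ciphertext is treated as opaque bytes, and the `KeyCustody` class, its method names, and the `"encryption"` capability string are all assumptions made for illustration.

```python
import secrets
from typing import Dict, Optional, Set, Tuple


class KeyCustody:
    """Toy model: ciphertext and per-job keys live in separate stores."""

    def __init__(self) -> None:
        self._ciphertexts: Dict[str, bytes] = {}
        self._keys: Dict[str, bytes] = {}
        self._claimed_by: Dict[str, str] = {}

    def submit(self, job_id: str, ciphertext: bytes, job_key: bytes) -> None:
        """Store the encrypted payload and its per-job key separately."""
        self._ciphertexts[job_id] = ciphertext
        self._keys[job_id] = job_key

    def claim(
        self, job_id: str, worker_id: str, capabilities: Set[str]
    ) -> Optional[Tuple[bytes, bytes]]:
        """Release decryption material only to one claiming, capable worker."""
        if "encryption" not in capabilities:
            return None  # worker never sees key material
        if job_id not in self._keys or job_id in self._claimed_by:
            return None  # unknown, already acknowledged, or already claimed
        self._claimed_by[job_id] = worker_id
        return self._ciphertexts[job_id], self._keys[job_id]

    def acknowledge(self, job_id: str) -> None:
        """Delete the per-job key once the client acknowledges the result."""
        self._keys.pop(job_id, None)
        self._claimed_by.pop(job_id, None)

    def has_key(self, job_id: str) -> bool:
        return job_id in self._keys


# Usage: a random 32-byte value stands in for the per-job symmetric key.
custody = KeyCustody()
custody.submit("job-1", b"opaque-ciphertext", secrets.token_bytes(32))
```

The point of the design is that a worker without the capability, or a second worker arriving after the claim, gets nothing, and that no per-job key outlives the acknowledgment.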
Docker jobs get live log streaming and per-job heartbeats. They are routed only to workers advertising the docker capability.
Provide Docker image name, optional args/env, script and code files as inputs.
Only workers advertising docker see container jobs. The claiming worker pulls the image and starts the container.
Workers ping POST /jobs/{id}/heartbeat with log lines. Resets timeout for long runs.
The worker submits the exit code, stdout/stderr, and output files. A local cache makes subsequent runs faster.
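The heartbeat behavior described above, where each ping carries log lines and resets the timeout, might look like this on the server side. This is a sketch under assumptions: the `JobLease` class, its `timeout_s` parameter, and the rule that a late heartbeat is rejected are illustrative and not taken from the service's documentation.

```python
import time
from typing import Callable, List


class JobLease:
    """Tracks a running job's timeout; each heartbeat pushes the deadline out."""

    def __init__(
        self, timeout_s: float, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self._timeout_s = timeout_s
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._deadline = clock() + timeout_s
        self.logs: List[str] = []

    def expired(self) -> bool:
        return self._clock() >= self._deadline

    def heartbeat(self, log_lines: List[str]) -> bool:
        """Record streamed log lines and reset the timeout.

        Returns False if the lease had already expired (assumed behavior).
        """
        if self.expired():
            return False
        self.logs.extend(log_lines)
        self._deadline = self._clock() + self._timeout_s
        return True
```

A handler for `POST /jobs/{id}/heartbeat` would look up the lease for that job and call `heartbeat()` with the log lines from the request body. As long as the worker keeps pinging within the timeout, a long run never expires.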
Free credits to start. No credit card. Five minutes from signup to first job result.