Documentation

Build with the world's most efficient AI models. Deploy in minutes, scale to billions of tokens, and save up to 90% on inference costs.

Use the same SDKs and code you already know. Switch in seconds, not days.

98.2% compression with 13.6 TPS throughput. The most efficient inference available.

RESTful API with full TypeScript and Python SDKs. Get started in minutes.

Start Building

Make your first API call in under 5 minutes

Learn about our OpenAI-compatible chat API

Explore our jobim-optimized model family

Understand our revolutionary compression