Documentation Index
Fetch the complete documentation index at: https://assemblyai.com/docs/llms.txt
Use this file to discover all available pages before exploring further.
Supported regions
Supported regions
US & EU
Overview
AssemblyAI’s LLM Gateway is a unified interface that allows you to connect with multiple LLM providers including Claude, GPT, Gemini, and more. You can use the LLM Gateway to build sophisticated AI applications through a single API.| Endpoint | Base URL |
|---|---|
| US (default) | https://llm-gateway.assemblyai.com/v1/chat/completions |
| EU | https://llm-gateway.eu.assemblyai.com/v1/chat/completions |
The LLM Gateway is available in both US and EU regions. Use the EU endpoint to
ensure your data stays within the European Union. Currently, Anthropic Claude
and most Google Gemini models are supported in the EU (except where otherwise noted). OpenAI models are only
available in the US region. See Cloud Endpoints and Data
Residency for more details.
- Basic Chat Completions - Simple request/response interactions
- Streamed Responses - Stream output as it’s generated (OpenAI models)
- Multi-turn Conversations - Maintain context across multiple exchanges
- Structured Outputs - Constrain responses to a specific JSON schema
- Tool/Function Calling - Enable models to execute custom functions
- Agentic Workflows - Multi-step reasoning with automatic tool chaining
- Unified Interface - One API for Claude, GPT, Gemini, Qwen, Kimi, and more
- Post-processing - Automatically repair malformed JSON responses with built-in JSON repair
Available models
By quality (LMArena Score)
| Model | Provider | Parameter | LMArena Score | Latency per 10,000 tokens |
|---|---|---|---|---|
| Claude Opus 4.6 | Anthropic | claude-opus-4-6 | 1498 | 7.4s |
| Claude Opus 4.7 | Anthropic | claude-opus-4-7 | 1491 | TBD |
| Gemini 3.5 Flash | gemini-3.5-flash | 1480 | TBD | |
| GPT-5.5 | OpenAI | gpt-5.5 | 1475 | TBD |
| Gemini 3 Flash Preview | gemini-3-flash-preview | 1474 | 4.2s | |
| Claude Opus 4.5 | Anthropic | claude-opus-4-5-20251101 | 1468 | 3.9s |
| Claude Sonnet 4.6 | Anthropic | claude-sonnet-4-6 | 1466 | 7.2s |
| Claude 4.5 Sonnet | Anthropic | claude-sonnet-4-5-20250929 | 1453 | 5.6s |
| Gemini 2.5 Pro | gemini-2.5-pro | 1448 | 4.0s | |
| GPT-5.1 | OpenAI | gpt-5.1 | 1439 | 2.7s |
| Gemini 3.1 Flash Lite Preview | gemini-3.1-flash-lite-preview | 1438 | TBD | |
| GPT-5.2 | OpenAI | gpt-5.2 | 1437 | 1.6s |
| GPT-5 | OpenAI | gpt-5 | 1434 | 4.3s |
| Kimi K2.5 | Moonshot AI | kimi-k2.5 | 1432 | 1.2s |
| GPT-4.1 | OpenAI | gpt-4.1 | 1413 | 1.8s |
| Claude 4 Opus | Anthropic | claude-opus-4-20250514 | 1412 | 13.6s |
| Gemini 2.5 Flash | gemini-2.5-flash | 1411 | 2.6s | |
| Claude 4.5 Haiku | Anthropic | claude-haiku-4-5-20251001 | 1409 | 4.1s |
| Qwen3 Next 80B A3B | Alibaba Cloud | qwen3-next-80b-a3b | 1402 | 3.1s |
| GPT-5 mini | OpenAI | gpt-5-mini | 1390 | 3.8s |
| Claude 4 Sonnet | Anthropic | claude-sonnet-4-20250514 | 1389 | 5.1s |
| Gemini 2.5 Flash-Lite | gemini-2.5-flash-lite | 1380 | 1.1s | |
| gpt-oss-120b | OpenAI | gpt-oss-120b | 1353 | 1.4s |
| Qwen3 32B | Alibaba Cloud | qwen3-32B | 1347 | 3.7s |
| GPT-5 nano | OpenAI | gpt-5-nano | 1337 | 3.2s |
| gpt-oss-20b | OpenAI | gpt-oss-20b | 1317 | 1.1s |
By latency (per 10,000 tokens)
| Model | Provider | Parameter | Latency per 10,000 tokens | LMArena Score |
|---|---|---|---|---|
| Gemini 2.5 Flash-Lite | gemini-2.5-flash-lite | 1.1s | 1380 | |
| gpt-oss-20b | OpenAI | gpt-oss-20b | 1.1s | 1317 |
| Kimi K2.5 | Moonshot AI | kimi-k2.5 | 1.2s | 1432 |
| gpt-oss-120b | OpenAI | gpt-oss-120b | 1.4s | 1353 |
| GPT-5.2 | OpenAI | gpt-5.2 | 1.6s | 1437 |
| GPT-4.1 | OpenAI | gpt-4.1 | 1.8s | 1413 |
| Gemini 2.5 Flash | gemini-2.5-flash | 2.6s | 1411 | |
| GPT-5.1 | OpenAI | gpt-5.1 | 2.7s | 1439 |
| Qwen3 Next 80B A3B | Alibaba Cloud | qwen3-next-80b-a3b | 3.1s | 1402 |
| GPT-5 nano | OpenAI | gpt-5-nano | 3.2s | 1337 |
| Qwen3 32B | Alibaba Cloud | qwen3-32B | 3.7s | 1347 |
| GPT-5 mini | OpenAI | gpt-5-mini | 3.8s | 1390 |
| Claude Opus 4.5 | Anthropic | claude-opus-4-5-20251101 | 3.9s | 1468 |
| Gemini 2.5 Pro | gemini-2.5-pro | 4.0s | 1448 | |
| Claude 4.5 Haiku | Anthropic | claude-haiku-4-5-20251001 | 4.1s | 1409 |
| Gemini 3 Flash Preview | gemini-3-flash-preview | 4.2s | 1474 | |
| GPT-5 | OpenAI | gpt-5 | 4.3s | 1434 |
| Claude 4 Sonnet | Anthropic | claude-sonnet-4-20250514 | 5.1s | 1389 |
| Claude 4.5 Sonnet | Anthropic | claude-sonnet-4-5-20250929 | 5.6s | 1453 |
| Claude Sonnet 4.6 | Anthropic | claude-sonnet-4-6 | 7.2s | 1466 |
| Claude Opus 4.6 | Anthropic | claude-opus-4-6 | 7.4s | 1498 |
| Claude 4 Opus | Anthropic | claude-opus-4-20250514 | 13.6s | 1412 |
| Claude Opus 4.7 | Anthropic | claude-opus-4-7 | TBD | 1491 |
| GPT-5.5 | OpenAI | gpt-5.5 | TBD | 1475 |
| Gemini 3.1 Flash Lite Preview | gemini-3.1-flash-lite-preview | TBD | 1438 | |
| Gemini 3.5 Flash | gemini-3.5-flash | TBD | 1480 |
By provider
Anthropic Claude
| Model | Parameter | LMArena Score | Latency per 10,000 tokens |
|---|---|---|---|
| Claude Opus 4.7 | claude-opus-4-7 | 1491 | TBD |
| Claude Opus 4.6 | claude-opus-4-6 | 1498 | 7.4s |
| Claude Sonnet 4.6 | claude-sonnet-4-6 | 1466 | 7.2s |
| Claude Opus 4.5 | claude-opus-4-5-20251101 | 1468 | 3.9s |
| Claude 4.5 Sonnet | claude-sonnet-4-5-20250929 | 1453 | 5.6s |
| Claude 4.5 Haiku | claude-haiku-4-5-20251001 | 1409 | 4.1s |
| Claude 4 Opus | claude-opus-4-20250514 | 1412 | 13.6s |
| Claude 4 Sonnet | claude-sonnet-4-20250514 | 1389 | 5.1s |
OpenAI GPT
| Model | Parameter | LMArena Score | Latency per 10,000 tokens |
|---|---|---|---|
| GPT-5.5 | gpt-5.5 | 1475 | TBD |
| GPT-5.2 | gpt-5.2 | 1437 | 1.6s |
| GPT-5.1 | gpt-5.1 | 1439 | 2.7s |
| GPT-5 | gpt-5 | 1434 | 4.3s |
| GPT-5 nano | gpt-5-nano | 1337 | 3.2s |
| GPT-5 mini | gpt-5-mini | 1390 | 3.8s |
| GPT-4.1 | gpt-4.1 | 1413 | 1.8s |
| gpt-oss-120b | gpt-oss-120b | 1353 | 1.4s |
| gpt-oss-20b | gpt-oss-20b | 1317 | 1.1s |
Google Gemini
| Model | Parameter | LMArena Score | Latency per 10,000 tokens |
|---|---|---|---|
| Gemini 3.5 Flash | gemini-3.5-flash | 1480 | TBD |
| Gemini 3 Flash Preview | gemini-3-flash-preview | 1474 | 4.2s |
| Gemini 3.1 Flash Lite Preview | gemini-3.1-flash-lite-preview | 1438 | TBD |
| Gemini 2.5 Pro | gemini-2.5-pro | 1448 | 4.0s |
| Gemini 2.5 Flash | gemini-2.5-flash | 1411 | 2.6s |
| Gemini 2.5 Flash-Lite | gemini-2.5-flash-lite | 1380 | 1.1s |
Gemini 3.1 Flash Lite Preview is currently available in the US region only.
Alibaba Cloud Qwen
| Model | Parameter | LMArena Score | Latency per 10,000 tokens |
|---|---|---|---|
| Qwen3 Next 80B A3B | qwen3-next-80b-a3b | 1402 | 3.1s |
| Qwen3 32B | qwen3-32B | 1347 | 3.7s |
Moonshot AI Kimi
| Model | Parameter | LMArena Score | Latency per 10,000 tokens |
|---|---|---|---|
| Kimi K2.5 | kimi-k2.5 | 1432 | 1.2s |
Claude Opus 4.5 and Claude Opus 4.6 currently support context windows under
200k tokens via the LLM Gateway.
For information on data retention and model training policies for each
provider, see Data Retention and Model Training.
Head to our Playground to
test out LLM Gateway without having to write any code!
Select a model
You can specify which model to use in your request by setting themodel parameter. Here are examples showing how to use Claude 4.5 Sonnet:
- Python
- JavaScript
model parameter to use any of the available models listed in the Available models section above.
Want to compare models side-by-side? Try the Model Comparison
Tool, a Lovable
application, to test different LLM models and see how they perform.
Logging and troubleshooting
Every LLM Gateway response includes arequest_id field — a unique identifier for that request. Persist it (along with the model, the API region, and a timestamp) for every call you make, not just when something goes wrong. If you contact support@assemblyai.com about a specific request (latency spikes, unexpected output, rate-limit errors, content moderation surprises), this ID lets us locate the exact request in our logs immediately.
We recommend logging, at minimum:
request_idfrom the response body- The
modelparameter used - The API region (US:
llm-gateway.assemblyai.com, EU:llm-gateway.eu.assemblyai.com) - A timestamp for when the request was sent
- The full error response body when a non-2xx status code is returned
Next steps
- Basic Chat Completions - Learn how to send simple messages and receive responses
- Multi-turn Conversations - Maintain context across multiple exchanges
- Structured Outputs - Constrain model responses to follow a specific JSON schema
- Tool Calling - Enable models to execute custom functions
- Agentic Workflows - Build multi-step reasoning applications
- Post-processing - Automatically repair malformed JSON in model responses
The LLM Gateway API is separate from the Speech-to-Text and Speech
Understanding APIs. It provides a unified interface to work with large
language models across multiple providers.