Documentation Index
Fetch the complete documentation index at: https://assemblyai.com/docs/llms.txt
Use this file to discover all available pages before exploring further.
Overview
The Fallback feature lets you specify one or more backup models that the LLM Gateway will automatically try if your primary model fails. This ensures your application stays resilient without requiring complex retry logic on your end.The LLM Gateway is available in both US and EU regions. Fallback behavior works the same way on both endpoints. See Cloud endpoints and data residency for more details.
Basic usage
To add a fallback, include afallbacks array in your request. Each entry specifies an alternative model to use if the primary model is unavailable:
- Python
- JavaScript
kimi-k2.5 fails, the LLM Gateway automatically retries the request using claude-sonnet-4-6.
You can chain up to two fallback models by setting
fallback_config.depth to 2. The LLM Gateway tries each fallback in order until one succeeds.Override fields per fallback
In the advanced case, you can override specific request fields for each fallback model. For example, you can change themessages or temperature for the fallback:
- Python
- JavaScript
Retry behavior
If no fallbacks are set, the API automatically retries the LLM request once after 500ms. This is becausefallback_config.retry defaults to true, providing a zero-config way to handle transient failures.
For more control over retries, set retry to false and implement your own exponential backoff:
Response behavior
When a fallback is used, the response looks exactly as if you had made the original request with the fallback model. Themodel field in the response reflects the fallback model that was used, and billing is charged only for that model.
API reference
Request parameters
| Key | Type | Required? | Description |
|---|---|---|---|
model | string | Yes | The primary model to use for completion. See Available models for supported values. |
messages | array | Yes | An array of message objects representing the conversation history. |
fallbacks | array | No | An array of fallback objects. Each object must include a model and can override any field available in the original request. |
fallback_config | object | No | Configuration for fallback behavior. |
fallback_config.retry | boolean | No | Whether to automatically retry the request once after 500ms on failure. Defaults to true. |
fallback_config.depth | number | No | Max fallbacks to traverse. Default 1, max 2. |
Fallback object
Each object in thefallbacks array must include a model and can override any field available in the original request. For example:
| Key | Type | Required? | Description |
|---|---|---|---|
model | string | Yes | The fallback model to use. See Available models for supported values. |
messages | array | No | Override the messages for the fallback request. |
temperature | number | No | Override the temperature for the fallback request. |
max_tokens | number | No | Override the max tokens for the fallback request. |
Next steps
- Basic chat completions - Send simple messages and receive responses
- Multi-turn conversations - Maintain context across multiple exchanges
- Tool calling - Enable models to execute custom functions
- Cloud endpoints and data residency - Learn about regional endpoint options