Create a chat completion
Documentation Index
Fetch the complete documentation index at: https://assemblyai.com/docs/llms.txt
Use this file to discover all available pages before exploring further.
Authorizations
Body
Request body for creating a chat completion.
The main request body for the chat completions endpoint.
The ID of the model to use for this request. See LLM Gateway Overview for available models.
"claude-sonnet-4-5-20250929"
A list of messages comprising the conversation so far.
- Option 1
- Option 2
A simple string prompt. The API will automatically convert this into a user message.
Optional. The ID of an AssemblyAI transcript whose text replaces the first {{ transcript }} tag in the prompt. See Inject a transcript by ID for substitution rules and edge cases.
"065a71ac-dc3e-4e38-9374-e54c0bea564f"
The maximum number of tokens to generate in the completion. Default is 1000.
x >= 1Controls randomness. Lower values produce more deterministic results.
0 <= x <= 2When true, responses are streamed as server-sent events (SSE). Supported on OpenAI models only.
A list of tools the model may call.
Controls which (if any) function is called by the model.
none, auto Specifies the format of the model's response. Use this to constrain the model to output valid JSON matching a schema. Supported by OpenAI (GPT-4.1, GPT-5.x), Gemini, and Claude models. Not supported by gpt-oss models.
An array of fallback objects. Each object must include a model and can optionally override any field from the original request. If the primary model fails, the LLM Gateway tries each fallback in order until one succeeds. See Specify fallback models for more details.
Configuration for fallback behavior, including retry and depth settings. See Specify fallback models for more details.
An ordered list of post-processing steps to apply to the model's response after generation. Currently supports json-repair, which automatically fixes malformed JSON in LLM Gateway content responses. See Post-processing for details.
Response
Successful response containing the model's choices.
A copy of the original request, excluding prompt and messages.
The HTTP status code of the response
200
The response time in nanoseconds
275510459
The status code from the LLM provider
200