Create a chat completion

curl --request POST \
  --url https://llm-gateway.assemblyai.com/v1/chat/completions \
  --header 'Authorization: <api-key>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "claude-sonnet-4-6",
  "messages": [
    {
      "role": "user",
      "content": "Hello, how are you?"
    }
  ],
  "max_tokens": 100,
  "temperature": 0.7
}
'

{
  "request_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "choices": [
    {
      "message": {
        "role": "<string>",
        "content": "<string>",
        "tool_calls": [
          {
            "id": "<string>",
            "function": {
              "name": "<string>",
              "arguments": "<string>"
            }
          }
        ]
      },
      "finish_reason": "<string>"
    }
  ],
  "request": {
    "model": "<string>",
    "max_tokens": 123,
    "temperature": 123,
    "tools": [
      {
        "function": {
          "name": "<string>",
          "parameters": {},
          "description": "<string>"
        }
      }
    ]
  },
  "usage": {
    "input_tokens": 123,
    "output_tokens": 123,
    "total_tokens": 123
  },
  "http_status_code": 200,
  "response_time": 275510459,
  "llm_status_code": 200
}
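
The same request can be assembled in Python. This is a minimal sketch using only the standard library; `build_chat_request` and `GATEWAY_URL` are hypothetical names chosen for illustration, and the request is built but not sent, so it can be inspected without a valid key.

```python
import json
import urllib.request

GATEWAY_URL = "https://llm-gateway.assemblyai.com/v1/chat/completions"


def build_chat_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl example."""
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": api_key,  # the key itself, as in the curl example
            "Content-Type": "application/json",
        },
        method="POST",
    )


payload = {
    "model": "claude-sonnet-4-6",
    "messages": [{"role": "user", "content": "Hello, how are you?"}],
    "max_tokens": 100,
    "temperature": 0.7,
}
request = build_chat_request("<api-key>", payload)

# To actually send it (requires a valid key and network access):
# with urllib.request.urlopen(request) as resp:
#     body = json.loads(resp.read())
```

Keeping construction separate from sending makes the headers and body easy to check in a unit test.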

Authorizations

Authorization

string

header

required

Body

application/json

The request body for the chat completions endpoint.

model

string

required

The ID of the model to use for this request. See LLM Gateway Overview for available models.

Example:

"claude-sonnet-4-5-20250929"

messages

object[]

A list of messages comprising the conversation so far.

prompt

string

A simple string prompt. The API will automatically convert this into a user message.
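
To illustrate the two input forms, the sketch below writes one body with `prompt` and one with `messages`. `prompt_as_messages` is a hypothetical client-side helper that mirrors the documented conversion into a user message; it is not the gateway's own implementation.

```python
def prompt_as_messages(prompt: str) -> list:
    """Return the single user message that a plain prompt is described as becoming."""
    return [{"role": "user", "content": prompt}]


# Body using the simple string form.
body_with_prompt = {
    "model": "claude-sonnet-4-6",
    "prompt": "Hello, how are you?",
}

# Body using an explicit messages list with the same content.
body_with_messages = {
    "model": "claude-sonnet-4-6",
    "messages": prompt_as_messages(body_with_prompt["prompt"]),
}
```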

transcript_id

string

Optional. The ID of an AssemblyAI transcript whose text replaces the first {{ transcript }} tag in the prompt. See Inject a transcript by ID for substitution rules and edge cases.
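
A request body using `transcript_id` might look like the sketch below. `preview_substitution` is a hypothetical local helper that illustrates "replaces the first `{{ transcript }}` tag" with a first-occurrence string replacement. The gateway's actual matching rules (for example, whitespace inside the braces) are an assumption here and may differ.

```python
TRANSCRIPT_TAG = "{{ transcript }}"


def preview_substitution(prompt: str, transcript_text: str) -> str:
    """Replace only the first tag, leaving any later tags untouched (assumed behavior)."""
    return prompt.replace(TRANSCRIPT_TAG, transcript_text, 1)


body = {
    "model": "claude-sonnet-4-6",
    "prompt": "Summarize this call: {{ transcript }}",
    "transcript_id": "065a71ac-dc3e-4e38-9374-e54c0bea564f",
}

# Local preview with two tags: only the first one is replaced.
preview = preview_substitution("A: {{ transcript }} B: {{ transcript }}", "hello")
```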

Example:

"065a71ac-dc3e-4e38-9374-e54c0bea564f"

max_tokens

integer

default:1000

The maximum number of tokens to generate in the completion. Default is 1000.

Required range: x >= 1

temperature

number<float>

Controls randomness. Lower values produce more deterministic results.

Required range: 0 <= x <= 2
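
The documented ranges for `max_tokens` and `temperature` can be checked on the client before sending. This is a sketch; `validate_sampling_params` is a hypothetical helper, and the gateway's own error messages are not shown on this page.

```python
def validate_sampling_params(max_tokens=1000, temperature=None) -> list:
    """Return a list of problems; an empty list means both values are in range."""
    errors = []
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_tokens, bool) or not isinstance(max_tokens, int) or max_tokens < 1:
        errors.append("max_tokens must be an integer >= 1")
    # temperature is optional; only check it when provided.
    if temperature is not None and not (0 <= temperature <= 2):
        errors.append("temperature must satisfy 0 <= x <= 2")
    return errors


ok = validate_sampling_params(max_tokens=100, temperature=0.7)
bad = validate_sampling_params(max_tokens=0, temperature=2.5)
```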

stream

boolean

default:false

When true, responses are streamed as server-sent events (SSE). Supported on OpenAI models only.
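
When `stream` is true, the client reads server-sent events instead of a single JSON document. The sketch below is a generic SSE reader that yields the text of each event's `data:` lines. The shape of each event's payload is not described on this page, so the payloads are left as raw strings; `iter_sse_data` is a hypothetical helper name.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the joined data payload of each server-sent event."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) and ":" comment lines are ignored.
    if buffer:
        # Pragmatic choice: flush a trailing event that lacks a final blank line.
        yield "\n".join(buffer)


sample_stream = [
    'data: {"a": 1}\n',
    "\n",
    ": keep-alive\n",
    "data: first\n",
    "data: second\n",
    "\n",
]
events = list(iter_sse_data(sample_stream))
```

In practice the `lines` argument would be the decoded lines of the HTTP response body.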

tools

object[]

A list of tools the model may call.

tool_choice

Controls which (if any) function is called by the model.

Available options: none, auto
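
A request that offers one tool and lets the model decide whether to call it could be built as follows. The tool object follows the `request.tools` shape echoed in the sample response (`function` with `name`, `description`, `parameters`). Whether further fields are required is not stated on this page. The `get_weather` tool and the JSON Schema used for `parameters` are assumptions for illustration.

```python
import json

# Hypothetical tool definition; "parameters" is assumed to be a JSON Schema object.
get_weather_tool = {
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }
}

body = {
    "model": "claude-sonnet-4-6",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [get_weather_tool],
    "tool_choice": "auto",  # "none" would prevent any tool call
}

encoded = json.dumps(body)
```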

response_format

object

Specifies the format of the model's response. Use this to constrain the model to output valid JSON matching a schema. Supported by OpenAI (GPT-4.1, GPT-5.x), Gemini, and Claude models. Not supported by gpt-oss models.

fallbacks

object[]

An array of fallback objects. Each object must include a model and can optionally override any field from the original request. If the primary model fails, the LLM Gateway tries each fallback in order until one succeeds. See Specify fallback models for more details.
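
The sketch below shows a request with two fallbacks and a local preview of what each attempt would use. `effective_request` is a hypothetical helper, the shallow merge is an assumption about how overrides apply rather than the gateway's documented behavior, and `<another-model-id>` is a placeholder, not a real model ID.

```python
def effective_request(primary: dict, fallback: dict) -> dict:
    """Preview a fallback attempt: original fields, overridden by the fallback's fields."""
    base = {
        key: value
        for key, value in primary.items()
        if key not in ("fallbacks", "fallback_config")
    }
    return {**base, **fallback}  # assumed shallow override


primary = {
    "model": "claude-sonnet-4-6",
    "messages": [{"role": "user", "content": "Hello, how are you?"}],
    "temperature": 0.7,
    "fallbacks": [
        {"model": "claude-sonnet-4-5-20250929"},  # only the required field
        {"model": "<another-model-id>", "temperature": 0.2},  # also overrides temperature
    ],
}

# Attempts in the order the fallbacks are listed.
attempts = [effective_request(primary, fb) for fb in primary["fallbacks"]]
```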

fallback_config

object

Configuration for fallback behavior, including retry and depth settings. See Specify fallback models for more details.

post_processing_steps

object[]

An ordered list of post-processing steps to apply to the model's response after generation. Currently supports json-repair, which automatically fixes malformed JSON in LLM Gateway content responses. See Post-processing for details.

Response

Successful response containing the model's choices.

request_id

string<uuid>

choices

object[]

request

object

A copy of the original request, excluding prompt and messages.

usage

object

http_status_code

integer

The HTTP status code of the response.

Example:

200

response_time

integer

The response time in nanoseconds.

Example:

275510459

llm_status_code

integer

The status code from the LLM provider.

Example:

200
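
A response with the fields above can be unpacked as in the following sketch. The concrete values in `sample_response` (role, content, finish reason, tool call) are made up for illustration, and treating `arguments` as a JSON-encoded string is an assumption based on its `<string>` type in the sample. `summarize` is a hypothetical helper.

```python
import json

sample_response = {
    "request_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "Checking the weather.",
                "tool_calls": [
                    {
                        "id": "call_1",
                        "function": {
                            "name": "get_weather",
                            "arguments": "{\"city\": \"Paris\"}",
                        },
                    }
                ],
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {"input_tokens": 12, "output_tokens": 5, "total_tokens": 17},
    "http_status_code": 200,
    "response_time": 275510459,
    "llm_status_code": 200,
}


def summarize(response: dict) -> dict:
    """Pull out the first choice's content, its tool calls, and basic metrics."""
    message = response["choices"][0]["message"]
    tool_calls = [
        (call["function"]["name"], json.loads(call["function"]["arguments"]))
        for call in message.get("tool_calls") or []
    ]
    return {
        "content": message.get("content"),
        "tool_calls": tool_calls,
        "total_tokens": response["usage"]["total_tokens"],
        # response_time is documented in nanoseconds; convert to milliseconds.
        "response_time_ms": response["response_time"] / 1_000_000,
    }


summary = summarize(sample_response)
```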
