POST /chat/completions
curl --request POST \
  --url https://llm-gateway.assemblyai.com/v1/chat/completions \
  --header 'Authorization: <api-key>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "claude-sonnet-4-6",
  "messages": [
    {
      "role": "user",
      "content": "Hello, how are you?"
    }
  ],
  "max_tokens": 100,
  "temperature": 0.7
}
'
{
  "request_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "choices": [
    {
      "message": {
        "role": "<string>",
        "content": "<string>",
        "tool_calls": [
          {
            "id": "<string>",
            "function": {
              "name": "<string>",
              "arguments": "<string>"
            }
          }
        ]
      },
      "finish_reason": "<string>"
    }
  ],
  "request": {
    "model": "<string>",
    "max_tokens": 123,
    "temperature": 123,
    "tools": [
      {
        "function": {
          "name": "<string>",
          "parameters": {},
          "description": "<string>"
        }
      }
    ]
  },
  "usage": {
    "input_tokens": 123,
    "output_tokens": 123,
    "total_tokens": 123
  },
  "http_status_code": 200,
  "response_time": 275510459,
  "llm_status_code": 200
}

Authorizations

Authorization
string
header
required

Body

application/json

The request body for creating a chat completion.

model
string
required

The ID of the model to use for this request. See LLM Gateway Overview for available models.

Example:

"claude-sonnet-4-5-20250929"

messages
object[]

A list of messages comprising the conversation so far.

prompt
string

A simple string prompt. The API will automatically convert this into a user message.
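
As a rough sketch of the two input styles above, the following Python prepares a request from a `prompt` and shows the equivalent `messages` form. The helper names `prompt_to_messages` and `build_request` are hypothetical, and the conversion only mirrors the behavior described here; it is not the gateway's implementation. The URL and header format are copied from the curl example at the top of this page.

```python
import json
import urllib.request

GATEWAY_URL = "https://llm-gateway.assemblyai.com/v1/chat/completions"


def prompt_to_messages(prompt: str) -> list:
    """Local illustration of the documented prompt-to-user-message conversion."""
    return [{"role": "user", "content": prompt}]


def build_request(api_key: str, body: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST request for the endpoint."""
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": api_key, "Content-Type": "application/json"},
        method="POST",
    )


body = {"model": "claude-sonnet-4-6", "prompt": "Hello, how are you?", "max_tokens": 100}
request = build_request("<api-key>", body)

# To send it (requires a real key and network access):
# with urllib.request.urlopen(request) as resp:
#     data = json.load(resp)
```

Building the request separately from sending it keeps the payload easy to inspect before any network call is made.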

transcript_id
string

Optional. The ID of an AssemblyAI transcript whose text replaces the first {{ transcript }} tag in the prompt. See Inject a transcript by ID for substitution rules and edge cases.

Example:

"065a71ac-dc3e-4e38-9374-e54c0bea564f"
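
A minimal sketch of a request body that uses `transcript_id`, together with a local illustration of the "first tag only" substitution described above. The function `fill_first_transcript_tag` is hypothetical and assumes the tag is matched exactly as written, including the inner spaces; the actual rules live on the gateway and are covered in Inject a transcript by ID.

```python
TRANSCRIPT_TAG = "{{ transcript }}"


def fill_first_transcript_tag(prompt: str, transcript_text: str) -> str:
    """Replace only the first occurrence of the tag, leaving later ones untouched."""
    return prompt.replace(TRANSCRIPT_TAG, transcript_text, 1)


# Request body: the gateway, not the client, performs the substitution.
body = {
    "model": "claude-sonnet-4-6",
    "prompt": "Summarize this call: {{ transcript }}",
    "transcript_id": "065a71ac-dc3e-4e38-9374-e54c0bea564f",
}

# Local preview of the documented behavior with two tags in one prompt.
preview = fill_first_transcript_tag(
    "First: {{ transcript }} Second: {{ transcript }}", "hello world"
)
```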

max_tokens
integer
default:1000

The maximum number of tokens to generate in the completion.

Required range: x >= 1

temperature
number<float>

Controls randomness. Lower values produce more deterministic results.

Required range: 0 <= x <= 2
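
The documented ranges for `max_tokens` and `temperature` can be checked on the client before a request is sent. This is only a sketch: `sampling_params` is a hypothetical helper, and the gateway's own validation and error messages may differ.

```python
from typing import Optional


def sampling_params(max_tokens: int = 1000, temperature: Optional[float] = None) -> dict:
    """Return the sampling fields for a request body, enforcing the documented ranges."""
    if max_tokens < 1:
        raise ValueError("max_tokens must be >= 1")
    params = {"max_tokens": max_tokens}
    if temperature is not None:
        if not 0 <= temperature <= 2:
            raise ValueError("temperature must be between 0 and 2")
        params["temperature"] = temperature
    return params


body = {
    "model": "claude-sonnet-4-6",
    "prompt": "Hello, how are you?",
    **sampling_params(max_tokens=100, temperature=0.7),
}
```
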

stream
boolean
default:false

When true, responses are streamed as server-sent events (SSE). Supported on OpenAI models only.
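
A small sketch of reading a server-sent event stream on the client. It assumes each event carries one JSON object on a single `data:` line and that the stream ends with an OpenAI-style `data: [DONE]` sentinel; neither detail is specified on this page, and the chunk fields in the sample are placeholders. `iter_sse_data` is a hypothetical helper.

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each single-line `data:` event."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream sentinel
            break
        yield json.loads(payload)


sample_stream = [
    'data: {"n": 1}',
    "",
    ": keep-alive",
    'data: {"n": 2}',
    "",
    "data: [DONE]",
]
events = list(iter_sse_data(sample_stream))
```

This simplification does not join multi-line `data:` events, which the SSE format permits.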

tools
object[]

A list of tools the model may call.
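
A sketch of a request body that declares one tool. The tool shape (`function` with `name`, `description`, and `parameters`) mirrors the `request.tools` entry in the response example at the top of this page. Whether the request also needs an OpenAI-style `"type": "function"` key is not shown here, so the sketch omits it. The tool `get_weather`, the helper `make_tool`, and the use of JSON Schema for `parameters` are assumptions for illustration.

```python
def make_tool(name: str, description: str, parameters: dict) -> dict:
    """Build one tool entry in the shape echoed back under `request.tools`."""
    return {
        "function": {
            "name": name,
            "description": description,
            "parameters": parameters,
        }
    }


weather_tool = make_tool(
    "get_weather",  # hypothetical tool name
    "Look up the current weather for a city.",
    {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
)

body = {
    "model": "claude-sonnet-4-6",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [weather_tool],
    "tool_choice": "auto",  # documented options: none, auto
}
```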

tool_choice

Controls which (if any) function is called by the model.

Available options: none, auto

response_format
object

Specifies the format of the model's response. Use this to constrain the model to output valid JSON matching a schema. Supported by OpenAI (GPT-4.1, GPT-5.x), Gemini, and Claude models. Not supported by gpt-oss models.

fallbacks
object[]

An array of fallback objects. Each object must include a model and can optionally override any field from the original request. If the primary model fails, the LLM Gateway tries each fallback in order until one succeeds. See Specify fallback models for more details.
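
A sketch of a request with one fallback, plus a local illustration of the override semantics described above: each fallback names a model and may replace fields from the original request. `effective_request` is a hypothetical helper that approximates what a fallback attempt would look like; it is not the gateway's implementation.

```python
def effective_request(primary: dict, fallback: dict) -> dict:
    """Merge a fallback's overrides onto the original request fields."""
    if "model" not in fallback:
        raise ValueError("each fallback must include a model")
    base = {
        key: value
        for key, value in primary.items()
        if key not in ("fallbacks", "fallback_config")
    }
    return {**base, **fallback}


body = {
    "model": "claude-sonnet-4-6",
    "prompt": "Hello, how are you?",
    "max_tokens": 100,
    "temperature": 0.7,
    "fallbacks": [
        {"model": "claude-sonnet-4-5-20250929", "temperature": 0.2},
    ],
}

second_attempt = effective_request(body, body["fallbacks"][0])
```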

fallback_config
object

Configuration for fallback behavior, including retry and depth settings. See Specify fallback models for more details.

post_processing_steps
object[]

An ordered list of post-processing steps to apply to the model's response after generation. Currently supports json-repair, which automatically fixes malformed JSON in LLM Gateway content responses. See Post-processing for details.

Response

Successful response containing the model's choices.

request_id
string<uuid>

choices
object[]

request
object

A copy of the original request, excluding prompt and messages.

usage
object

http_status_code
integer

The HTTP status code of the response.

Example:

200

response_time
integer

The response time, in nanoseconds.

Example:

275510459

llm_status_code
integer

The status code returned by the LLM provider.

Example:

200
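
A sketch of reading the response fields listed above. The sample values for `role`, `content`, `finish_reason`, and `usage` are illustrative placeholders, and `summarize_response` is a hypothetical helper. The field names and the nanosecond unit of `response_time` come from this page.

```python
import json

sample_response = json.loads("""
{
  "request_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "choices": [
    {
      "message": {"role": "assistant", "content": "Hi there!"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"input_tokens": 12, "output_tokens": 4, "total_tokens": 16},
  "http_status_code": 200,
  "response_time": 275510459,
  "llm_status_code": 200
}
""")


def summarize_response(response: dict) -> dict:
    """Pull out the first choice, token usage, and latency in milliseconds."""
    message = response["choices"][0]["message"]
    tool_calls = message.get("tool_calls") or []
    return {
        "content": message.get("content"),
        "tool_names": [call["function"]["name"] for call in tool_calls],
        "total_tokens": response["usage"]["total_tokens"],
        "response_ms": response["response_time"] / 1_000_000,  # ns -> ms
        "ok": response["http_status_code"] == 200 and response["llm_status_code"] == 200,
    }


summary = summarize_response(sample_response)
```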