Overview

Basic chat completions allow you to send a message and receive a response from the model. This is the simplest way to interact with the LLM Gateway.

Getting started

Send a message and receive a response:
import requests

headers = {
    "authorization": "<YOUR_API_KEY>"
}

response = requests.post(
    "https://llm-gateway.assemblyai.com/v1/chat/completions",
    headers=headers,
    json={
        "model": "claude-sonnet-4-6",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ],
        "max_tokens": 1000
    }
)
response.raise_for_status()  # raise an exception on 4xx/5xx responses

result = response.json()
print(result["choices"][0]["message"]["content"])
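To continue a conversation, send the full history in messages on every request: append the model's reply as an "assistant" message, then add the next "user" message. The sketch below keeps that bookkeeping separate from the HTTP call. `Conversation`, `payload_for`, and `record_reply` are hypothetical helpers, not part of any AssemblyAI SDK, and the reply is read from `choices[0].message.content` as in the example above.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Hypothetical helper that tracks chat history for the LLM Gateway."""

    model: str = "claude-sonnet-4-6"
    max_tokens: int = 1000
    messages: list = field(default_factory=list)

    def payload_for(self, user_text: str) -> dict:
        """Append a user turn and return the JSON body for the next request."""
        self.messages.append({"role": "user", "content": user_text})
        return {
            "model": self.model,
            # Copy the list so later turns don't change an already-built body
            "messages": list(self.messages),
            "max_tokens": self.max_tokens,
        }

    def record_reply(self, result: dict) -> str:
        """Store the assistant reply from a parsed response and return its text."""
        message = result["choices"][0]["message"]
        self.messages.append({"role": "assistant", "content": message["content"]})
        return message["content"]
```

Usage: build a body with `convo.payload_for("What is the capital of France?")`, send it with `requests.post(..., json=body)`, then call `convo.record_reply(response.json())` before asking a follow-up question.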

Streamed responses

You can stream responses from OpenAI models by setting stream to true. This returns partial responses as server-sent events (SSE), allowing you to display output as it’s generated.
Streamed responses are currently supported on OpenAI models only.
import requests

headers = {
    "authorization": "<YOUR_API_KEY>"
}

response = requests.post(
    "https://llm-gateway.assemblyai.com/v1/chat/completions",
    headers=headers,
    json={
        "model": "gpt-4.1",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ],
        "stream": True,
        "max_tokens": 1000
    },
    stream=True  # let requests read the body incrementally instead of all at once
)
response.raise_for_status()

# Print each non-empty SSE line as it arrives
for line in response.iter_lines():
    if line:
        print(line.decode("utf-8"))
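The loop above prints each raw SSE line. To assemble the reply text instead, you can parse the `data:` lines. The sketch below assumes the stream uses OpenAI-style chunks, where each event is `data: {json}` with text in `choices[0].delta.content` and the stream ends with `data: [DONE]`. That format is an assumption not confirmed on this page, so compare it against the raw output first. `iter_text_deltas` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_text_deltas(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from SSE lines (assumes OpenAI-style chunks)."""
    for raw in lines:
        if not raw:
            continue  # blank lines separate SSE events
        line = raw.decode("utf-8")
        if not line.startswith("data:"):
            continue  # skip SSE comments and other fields such as "event:"
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # assumed end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            # Some chunks (e.g. the first) may carry a role but no text
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

Usage: `for text in iter_text_deltas(response.iter_lines()): print(text, end="", flush=True)`.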

API reference

Request

The LLM Gateway accepts POST requests to https://llm-gateway.assemblyai.com/v1/chat/completions. A minimal request looks like this:
curl -X POST \
  "https://llm-gateway.assemblyai.com/v1/chat/completions" \
  -H "Authorization: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ],
    "max_tokens": 1000
  }'

Request parameters

| Key | Type | Required? | Description |
|---|---|---|---|
| model | string | Yes | The model to use for completion. See the Available models section for supported values. |
| messages | array | Yes* | An array of message objects representing the conversation history. Either messages or prompt is required. |
| prompt | string | Yes* | A simple string prompt for single request/response interactions. Either messages or prompt is required. |
| stream | boolean | No | When true, responses are streamed as server-sent events (SSE). Supported on OpenAI models only. |
| max_tokens | number | No | The maximum number of tokens to generate. Default: 1000. Range: [1, context_length). |
| temperature | number | No | Controls randomness in the output. Higher values make output more random. Range: [0, 2]. |
| post_processing_steps | array | No | An ordered list of post-processing steps to apply to the response. See Post-processing. |
| transcript_id | string | No | Injects an AssemblyAI transcript’s text into the prompt. The first {{ transcript }} tag in the first message that contains it is replaced with the transcript text. See Inject a transcript by ID. |
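The constraints in this table can be checked client-side before a request is sent. `build_request_body` below is a hypothetical helper, not part of an AssemblyAI SDK. It enforces only what the table states: a non-empty `model`, at least one of `messages` or `prompt`, `max_tokens` of at least 1, and `temperature` in [0, 2]. It leaves the upper `max_tokens` bound (`context_length`) to the server because that depends on the model.

```python
from typing import Optional


def build_request_body(
    model: str,
    messages: Optional[list] = None,
    prompt: Optional[str] = None,
    max_tokens: int = 1000,
    temperature: Optional[float] = None,
    **extra,
) -> dict:
    """Build a /v1/chat/completions body, checking the documented constraints."""
    if not model:
        raise ValueError("model is required")
    if messages is None and prompt is None:
        raise ValueError("either messages or prompt is required")
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be in [0, 2]")

    body = {"model": model, "max_tokens": max_tokens}
    if messages is not None:
        body["messages"] = messages
    if prompt is not None:
        body["prompt"] = prompt
    if temperature is not None:
        body["temperature"] = temperature
    body.update(extra)  # optional fields, e.g. stream=True or transcript_id="..."
    return body
```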

Message object

| Key | Type | Required? | Description |
|---|---|---|---|
| role | string | Yes | The role of the message sender. Valid values: "user", "assistant", "system", or "tool". |
| content | string or array | Yes | The message content as a string. For the "user" role, it can also be an array of content part objects. |
| name | string | No | An optional name for the message sender. For non-OpenAI models, the name is prepended to the content as {name}: {content}. |

Content part object

| Key | Type | Required? | Description |
|---|---|---|---|
| type | string | Yes | The type of content. Currently only "text" is supported. |
| text | string | Yes | The text content. |
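The sketch below builds messages in these shapes: "user" messages get an array of text content parts, and other roles get a single string. It also illustrates the {name}: {content} form that non-OpenAI models receive when name is set; the gateway applies that prefix itself, so `name_prefixed` is only a preview and handles string content only, since how text parts are combined is not documented here. Both function names are hypothetical.

```python
from typing import Optional

VALID_ROLES = {"user", "assistant", "system", "tool"}


def text_message(role: str, *texts: str, name: Optional[str] = None) -> dict:
    """Build a message; "user" content becomes an array of text parts."""
    if role not in VALID_ROLES:
        raise ValueError(f"unsupported role: {role!r}")
    if role == "user":
        content = [{"type": "text", "text": t} for t in texts]
    elif len(texts) == 1:
        content = texts[0]  # array content is documented for "user" only
    else:
        raise ValueError("non-user roles take a single string")
    message = {"role": role, "content": content}
    if name is not None:
        message["name"] = name
    return message


def name_prefixed(message: dict) -> str:
    """Preview the documented '{name}: {content}' form (string content only)."""
    content = message["content"]
    if not isinstance(content, str):
        raise TypeError("this sketch handles string content only")
    name = message.get("name")
    return f"{name}: {content}" if name else content
```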

Response

The API returns a JSON response with the model’s completion:
{
  "request_id": "abc123",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "request": {
    "model": "claude-sonnet-4-6",
    "max_tokens": 1000
  },
  "usage": {
    "input_tokens": 15,
    "output_tokens": 8,
    "total_tokens": 23
  }
}

Response fields

| Key | Type | Description |
|---|---|---|
| request_id | string | A unique identifier for the request. |
| choices | array | An array of completion choices. Typically contains one choice. |
| choices[i].message | object | The message object containing the model’s response. |
| choices[i].message.role | string | The role of the message, typically "assistant". |
| choices[i].message.content | string | The text content of the model’s response. |
| choices[i].finish_reason | string | The reason the model stopped generating. Common values: "stop", "length". |
| request | object | Echo of the request parameters (excluding prompt and messages). |
| usage | object | Token usage statistics for the request. |
| usage.input_tokens | number | Number of tokens in the prompt. |
| usage.output_tokens | number | Number of tokens in the completion. |
| usage.total_tokens | number | Total tokens used (prompt + completion). |
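These fields map naturally onto a small typed result. The sketch below reads the first choice and the usage block, and flags replies whose finish_reason is "length", which conventionally means generation stopped at the max_tokens limit; this page lists the value but does not define it, so treat that reading as an assumption. `Completion` and `parse_completion` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Completion:
    request_id: str
    content: str
    finish_reason: str
    input_tokens: int
    output_tokens: int
    total_tokens: int

    @property
    def truncated(self) -> bool:
        # Assumption: "length" means the max_tokens limit was reached
        return self.finish_reason == "length"


def parse_completion(result: dict) -> Completion:
    """Map a parsed /v1/chat/completions response onto Completion."""
    choice = result["choices"][0]  # typically the only choice
    usage = result.get("usage", {})
    return Completion(
        request_id=result["request_id"],
        content=choice["message"]["content"],
        finish_reason=choice["finish_reason"],
        input_tokens=usage.get("input_tokens", 0),
        output_tokens=usage.get("output_tokens", 0),
        total_tokens=usage.get("total_tokens", 0),
    )
```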

Inject a transcript by ID

Pass transcript_id at the top level of the request to inject a transcript’s text into the prompt. The API replaces the first occurrence of the literal tag {{ transcript }} in the first message containing it with the transcript’s text field, then runs the completion.
Any other occurrences, whether later in the same message or in later messages, are left as-is. The tag must be exactly {{ transcript }}, including the spaces; variants such as {{transcript}} or {{ TRANSCRIPT }} are not substituted. If the transcript ID does not exist or belongs to a different account, the endpoint returns 404.
curl -X POST "https://llm-gateway.assemblyai.com/v1/chat/completions" \
  -H "Authorization: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-flash-lite",
    "messages": [
      {"role": "user", "content": "hi there"},
      {"role": "assistant", "content": "Hi! How can I help?"},
      {"role": "user", "content": "Here is a transcript: {{ transcript }}. Return the text verbatim."}
    ],
    "transcript_id": "065a71ac-dc3e-4e38-9374-e54c0bea564f"
  }'
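To check a prompt before sending it, the substitution rule can be reproduced locally. The function below is an illustrative sketch of the documented behavior (exact tag, first matching message, first occurrence only), not the gateway's implementation. `inject_transcript` is a hypothetical name, and it only looks at string content; in a real request you pass transcript_id and let the server fill in the text.

```python
import copy

TAG = "{{ transcript }}"  # must match exactly, including the spaces


def inject_transcript(messages: list, transcript_text: str) -> list:
    """Replace the first TAG in the first message containing it (string content only)."""
    result = copy.deepcopy(messages)  # leave the caller's list untouched
    for message in result:
        content = message.get("content")
        if isinstance(content, str) and TAG in content:
            message["content"] = content.replace(TAG, transcript_text, 1)
            break  # later messages keep their tags as-is
    return result
```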

Error response

If an error occurs, the API returns an error response:
{
  "error": {
    "code": 400,
    "message": "Invalid request: missing required field 'model'",
    "metadata": {}
  }
}
| Key | Type | Description |
|---|---|---|
| error | object | Container for error information. |
| error.code | number | HTTP status code for the error. |
| error.message | string | A human-readable description of the error. |
| error.metadata | object | Optional additional error context. |
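Because failed requests return this error object, a client can turn it into an exception that keeps the status code, message, and metadata. `LLMGatewayError` and `raise_for_error` are hypothetical names, and the sketch assumes that successful responses carry no top-level error key.

```python
from typing import Optional


class LLMGatewayError(Exception):
    """Hypothetical exception wrapping the gateway's error object."""

    def __init__(self, code: int, message: str, metadata: Optional[dict] = None):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message
        self.metadata = metadata or {}


def raise_for_error(body: dict) -> dict:
    """Return body unchanged, or raise LLMGatewayError if it holds an error."""
    error = body.get("error")
    if error is None:
        return body
    raise LLMGatewayError(
        code=error.get("code", 0),
        message=error.get("message", "unknown error"),
        metadata=error.get("metadata"),
    )
```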

Common error codes

| Code | Description |
|---|---|
| 400 | Bad Request - Invalid request parameters |
| 401 | Unauthorized - Invalid or missing API key |
| 403 | Forbidden - Insufficient permissions |
| 404 | Not Found - Invalid endpoint, model, or transcript ID |
| 429 | Too Many Requests - Rate limit exceeded |
| 500 | Internal Server Error - Server-side error |
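Of these codes, 429 and 500 are the ones that may succeed on a later attempt; the others typically need a change to the request or credentials. A common client pattern is to retry only those two with exponential backoff. The sketch below is a generic retry loop under that assumption, not a documented AssemblyAI retry policy. `post_with_retries`, the attempt count, and the delays are illustrative, and `send` is any callable that returns `(status_code, body)`.

```python
import time
from typing import Callable, Tuple

RETRYABLE = {429, 500}  # rate limit and server-side errors


def post_with_retries(
    send: Callable[[], Tuple[int, dict]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, dict]:
    """Call send() until it returns a non-retryable status or attempts run out."""
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE or attempt == max_attempts - 1:
            return status, body
        sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... with the defaults
    raise ValueError("max_attempts must be at least 1")
```

Usage: wrap the request in a small function, for example one that calls `requests.post(...)` and returns `(r.status_code, r.json())`, and pass that function as `send`. The injectable `sleep` parameter keeps the loop testable without real delays.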