Basic Chat Completions

Overview

Basic chat completions allow you to send a message and receive a response from the model. This is the simplest way to interact with the LLM Gateway.

Getting started

Send a message and receive a response:

Python

import requests

headers = {
    "authorization": "<YOUR_API_KEY>"
}

# Send a single user message and cap the reply at 1000 tokens.
response = requests.post(
    "https://llm-gateway.assemblyai.com/v1/chat/completions",
    headers=headers,
    json={
        "model": "claude-sonnet-4-6",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ],
        "max_tokens": 1000
    },
)

# The reply text is in the first entry of "choices".
result = response.json()
print(result["choices"][0]["message"]["content"])

JavaScript

const response = await fetch(
  "https://llm-gateway.assemblyai.com/v1/chat/completions",
  {
    method: "POST",
    headers: {
      authorization: "<YOUR_API_KEY>",
      "content-type": "application/json",
    },
    body: JSON.stringify({
      model: "claude-sonnet-4-6",
      messages: [{ role: "user", content: "What is the capital of France?" }],
      max_tokens: 1000,
    }),
  }
);

const result = await response.json();
console.log(result.choices[0].message.content);

Streamed responses

To stream a response, set `stream` to `true`. The gateway then returns partial responses as server-sent events (SSE), so you can display output as it’s generated.

Streamed responses are currently supported on OpenAI models only.

Python

import requests

headers = {
    "authorization": "<YOUR_API_KEY>"
}

# "stream": True in the JSON body asks the gateway for SSE output;
# stream=True on requests.post lets the response be read incrementally.
response = requests.post(
    "https://llm-gateway.assemblyai.com/v1/chat/completions",
    headers=headers,
    json={
        "model": "gpt-4.1",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ],
        "stream": True,
        "max_tokens": 1000
    },
    stream=True,
)

# Print each non-empty SSE line as it arrives.
for line in response.iter_lines():
    if line:
        print(line.decode("utf-8"))

JavaScript

const response = await fetch(
  "https://llm-gateway.assemblyai.com/v1/chat/completions",
  {
    method: "POST",
    headers: {
      authorization: "<YOUR_API_KEY>",
      "content-type": "application/json",
    },
    body: JSON.stringify({
      model: "gpt-4.1",
      messages: [{ role: "user", content: "What is the capital of France?" }],
      stream: true,
      max_tokens: 1000,
    }),
  }
);

const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
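  // { stream: true } keeps multi-byte characters intact when they span two chunks.
  console.log(decoder.decode(value, { stream: true }));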
}
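
The lines printed by the examples above are raw SSE frames rather than plain text. Here is a minimal sketch of turning them into text. It assumes the gateway emits OpenAI-style chunks: `data: {json}` lines whose payload carries `choices[0].delta.content`, ending with `data: [DONE]`. That chunk shape is an assumption and is not specified on this page. `iter_text` and the sample frames are hypothetical.

```python
import json
from typing import Iterable, Iterator, Union


def iter_text(lines: Iterable[Union[bytes, str]]) -> Iterator[str]:
    """Yield text fragments from SSE lines.

    Assumption: OpenAI-style chunks, i.e. `data: {...}` lines carrying
    choices[0].delta.content, terminated by `data: [DONE]`.
    """
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Hypothetical sample frames shaped like the assumed chunk format.
sample = [
    b'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    b"",
    b'data: {"choices": [{"delta": {"content": "Par"}}]}',
    b'data: {"choices": [{"delta": {"content": "is"}}]}',
    b"data: [DONE]",
]
print("".join(iter_text(sample)))
```

With the Python example, you could pass `response.iter_lines()` to `iter_text` in place of `sample`.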

API reference

Request

The LLM Gateway accepts POST requests to https://llm-gateway.assemblyai.com/v1/chat/completions with a JSON body. For example:

curl -X POST \
  "https://llm-gateway.assemblyai.com/v1/chat/completions" \
  -H "Authorization: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ],
    "max_tokens": 1000
  }'

Request parameters

Key	Type	Required?	Description
`model`	string	Yes	The model to use for completion. See the Available models section for supported values.
`messages`	array	Yes*	An array of message objects representing the conversation history. Either `messages` or `prompt` is required.
`prompt`	string	Yes*	A simple string prompt for single request/response interactions. Either `messages` or `prompt` is required.
`stream`	boolean	No	When `true`, responses are streamed as server-sent events (SSE). Supported on OpenAI models only.
`max_tokens`	number	No	The maximum number of tokens to generate. Default: 1000. Range: [1, context_length).
`temperature`	number	No	Controls randomness in the output. Higher values make output more random. Range: [0, 2].
`post_processing_steps`	array	No	An ordered list of post-processing steps to apply to the response. See Post-processing.
`transcript_id`	string	No	Inject an AssemblyAI transcript’s text into the prompt. The first `{{ transcript }}` tag in the first message that contains it is replaced with the transcript text. See Inject a transcript by ID.
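
The constraints in this table can be checked on the client before a request is sent. The sketch below is one possible pre-flight check, not part of any SDK; `build_payload` is a hypothetical helper. It does not check the upper bound on `max_tokens` because `context_length` depends on the model. It only requires that at least one of `messages` or `prompt` is present, since this page does not say what happens when both are sent.

```python
from typing import Any, Dict, List, Optional


def build_payload(
    model: str,
    messages: Optional[List[Dict[str, Any]]] = None,
    prompt: Optional[str] = None,
    max_tokens: int = 1000,  # documented default
    temperature: Optional[float] = None,
    stream: bool = False,
) -> Dict[str, Any]:
    """Build a request body, enforcing the documented parameter rules."""
    if not model:
        raise ValueError("model is required")
    if messages is None and prompt is None:
        raise ValueError("either messages or prompt is required")
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be in [0, 2]")

    payload: Dict[str, Any] = {"model": model}
    if messages is not None:
        payload["messages"] = messages
    if prompt is not None:
        payload["prompt"] = prompt
    payload["max_tokens"] = max_tokens
    if temperature is not None:
        payload["temperature"] = temperature
    if stream:
        payload["stream"] = True
    return payload


print(build_payload("claude-sonnet-4-6", prompt="What is the capital of France?"))
```

The returned dictionary can be passed as the `json` argument of `requests.post`.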

Message object

Key	Type	Required?	Description
`role`	string	Yes	The role of the message sender. Valid values: `"user"`, `"assistant"`, `"system"`, or `"tool"`.
`content`	string or array	Yes	The message content. Can be a string or an array of content parts for the `"user"` role.
`name`	string	No	An optional name for the message sender. For non-OpenAI models, this will be prepended as `{name}: {content}`.
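
The effect of `name` on non-OpenAI models can be pictured with a short sketch. It only mirrors the `{name}: {content}` rule from the table for string content and is not the gateway's actual code. `flatten_name` is a hypothetical helper, and dropping the `name` key after folding it in is an assumption.

```python
from typing import Any, Dict


def flatten_name(message: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `message` with `name` folded into string content."""
    name = message.get("name")
    content = message.get("content")
    if not name or not isinstance(content, str):
        return dict(message)  # no name, or content is an array of parts
    # Assumption: the name key is dropped once it has been prepended.
    flattened = {k: v for k, v in message.items() if k != "name"}
    flattened["content"] = f"{name}: {content}"
    return flattened


print(flatten_name({"role": "user", "name": "Ada", "content": "Hello"}))
```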

Content part object

Key	Type	Required?	Description
`type`	string	Yes	The type of content. Currently only `"text"` is supported.
`text`	string	Yes	The text content.
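
As an illustration of the array form, the sketch below builds a `"user"` message from several text parts. `user_message` is a hypothetical helper name; the field names come from the tables above.

```python
from typing import Any, Dict, List


def user_message(*texts: str) -> Dict[str, Any]:
    """Build a "user" message whose content is an array of text parts."""
    parts: List[Dict[str, str]] = [{"type": "text", "text": t} for t in texts]
    return {"role": "user", "content": parts}


message = user_message(
    "Summarize the following note.",
    "Note: the meeting moved to 3 pm.",
)
print(message)
```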

Response

The API returns a JSON response with the model’s completion:

{
  "request_id": "abc123",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "request": {
    "model": "claude-sonnet-4-6",
    "max_tokens": 1000
  },
  "usage": {
    "input_tokens": 15,
    "output_tokens": 8,
    "total_tokens": 23
  }
}

Response fields

Key	Type	Description
`request_id`	string	A unique identifier for the request.
`choices`	array	An array of completion choices. Typically contains one choice.
`choices[i].message`	object	The message object containing the model’s response.
`choices[i].message.role`	string	The role of the message, typically `"assistant"`.
`choices[i].message.content`	string	The text content of the model’s response.
`choices[i].finish_reason`	string	The reason the model stopped generating. Common values: `"stop"`, `"length"`.
`request`	object	Echo of the request parameters (excluding `prompt` and `messages`).
`usage`	object	Token usage statistics for the request.
`usage.input_tokens`	number	Number of tokens in the prompt.
`usage.output_tokens`	number	Number of tokens in the completion.
`usage.total_tokens`	number	Total tokens used (prompt + completion).
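
A short sketch of reading these fields from a parsed response follows. It uses the sample response above as input. `summarize` is a hypothetical helper, and treating a `finish_reason` of `"length"` as "cut off at the `max_tokens` limit" is an assumption based on how OpenAI-compatible APIs usually behave.

```python
from typing import Any, Dict

# The sample response from this page, as a Python dictionary.
sample_response: Dict[str, Any] = {
    "request_id": "abc123",
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "The capital of France is Paris.",
            },
            "finish_reason": "stop",
        }
    ],
    "request": {"model": "claude-sonnet-4-6", "max_tokens": 1000},
    "usage": {"input_tokens": 15, "output_tokens": 8, "total_tokens": 23},
}


def summarize(response: Dict[str, Any]) -> Dict[str, Any]:
    """Pull out the reply text, a truncation flag, and the token total."""
    choice = response["choices"][0]
    usage = response.get("usage", {})
    return {
        "text": choice["message"]["content"],
        # Assumption: "length" means the reply hit the max_tokens limit.
        "truncated": choice.get("finish_reason") == "length",
        "total_tokens": usage.get("total_tokens"),
    }


print(summarize(sample_response))
```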

Inject a transcript by ID

Pass `transcript_id` at the top level of the request to inject a transcript’s text into the prompt. The API replaces the first occurrence of the literal tag `{{ transcript }}` in the first message containing it with the transcript’s `text` field, then runs the completion.

Additional tags in that message and tags in later messages are left as-is. The tag must be exactly `{{ transcript }}` (with the spaces); variants like `{{transcript}}` or `{{ TRANSCRIPT }}` are not substituted. The endpoint returns 404 if the transcript ID does not exist or belongs to a different account.

curl -X POST "https://llm-gateway.assemblyai.com/v1/chat/completions" \
  -H "Authorization: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-flash-lite",
    "messages": [
      {"role": "user", "content": "hi there"},
      {"role": "assistant", "content": "Hi! How can I help?"},
      {"role": "user", "content": "Here is a transcript: {{ transcript }}. Return the text verbatim."}
    ],
    "transcript_id": "065a71ac-dc3e-4e38-9374-e54c0bea564f"
  }'
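
The substitution rule can be reproduced locally to preview what the model will see. This is a sketch of the rule as described above, not the gateway's implementation. `inject_transcript` is a hypothetical helper, and it only handles messages whose `content` is a string.

```python
import copy
from typing import Any, Dict, List

TAG = "{{ transcript }}"  # must match exactly, including the spaces


def inject_transcript(
    messages: List[Dict[str, Any]], transcript_text: str
) -> List[Dict[str, Any]]:
    """Replace the first TAG in the first message that contains it."""
    result = copy.deepcopy(messages)  # leave the caller's list untouched
    for message in result:
        content = message.get("content")
        if isinstance(content, str) and TAG in content:
            message["content"] = content.replace(TAG, transcript_text, 1)
            break  # later messages are left as-is
    return result


messages = [
    {"role": "user", "content": "hi there"},
    {"role": "assistant", "content": "Hi! How can I help?"},
    {"role": "user", "content": "Transcript: {{ transcript }}. Again: {{ transcript }}"},
]
for message in inject_transcript(messages, "Hello world"):
    print(message["content"])
```

In this example only the first tag in the third message is replaced; the second tag stays in the text verbatim.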

Error response

If an error occurs, the API returns an error response:

{
  "error": {
    "code": 400,
    "message": "Invalid request: missing required field 'model'",
    "metadata": {}
  }
}

Key	Type	Description
`error`	object	Container for error information.
`error.code`	number	HTTP status code for the error.
`error.message`	string	A human-readable description of the error.
`error.metadata`	object	Optional additional error context.

Common error codes

Code	Description
400	Bad Request - Invalid request parameters
401	Unauthorized - Invalid or missing API key
403	Forbidden - Insufficient permissions
404	Not Found - Invalid endpoint or model, or a `transcript_id` that does not exist or belongs to a different account
429	Too Many Requests - Rate limit exceeded
500	Internal Server Error - Server-side error
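
One way to surface these errors in client code is to turn the error body into an exception. The sketch below assumes the error shape shown above. `GatewayError` and `raise_for_error` are hypothetical names, and treating 429 and 500 as retryable is an assumption, not documented gateway guidance.

```python
from typing import Any, Dict, Optional

RETRYABLE_CODES = {429, 500}  # assumption: transient errors worth retrying


class GatewayError(Exception):
    """Raised when a response body contains an `error` object."""

    def __init__(self, code: Optional[int], message: str,
                 metadata: Optional[Dict[str, Any]] = None) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message
        self.metadata = metadata or {}

    @property
    def retryable(self) -> bool:
        return self.code in RETRYABLE_CODES


def raise_for_error(body: Dict[str, Any]) -> None:
    """Raise GatewayError if `body` is an error response; otherwise return."""
    error = body.get("error")
    if error is None:
        return
    raise GatewayError(error.get("code"), error.get("message", ""),
                       error.get("metadata"))


sample_error = {
    "error": {
        "code": 400,
        "message": "Invalid request: missing required field 'model'",
        "metadata": {},
    }
}
try:
    raise_for_error(sample_error)
except GatewayError as exc:
    print(exc.code, exc.retryable, exc.message)
```

Calling `raise_for_error(response.json())` before reading `choices` keeps error handling in one place.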
