You can stream responses from OpenAI models by setting stream to true. This returns partial responses as server-sent events (SSE), allowing you to display output as it’s generated.
Streamed responses are currently supported on OpenAI models only.
Python

```python
import requests

headers = {
    "authorization": "<YOUR_API_KEY>"
}

response = requests.post(
    "https://llm-gateway.assemblyai.com/v1/chat/completions",
    headers=headers,
    json={
        "model": "gpt-4.1",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ],
        "stream": True,
        "max_tokens": 1000
    },
    stream=True  # let requests yield the body incrementally
)

# Print each non-empty server-sent event line as it arrives
for line in response.iter_lines():
    if line:
        print(line.decode("utf-8"))
```

JavaScript

```javascript
const response = await fetch(
  "https://llm-gateway.assemblyai.com/v1/chat/completions",
  {
    method: "POST",
    headers: {
      authorization: "<YOUR_API_KEY>",
      "content-type": "application/json",
    },
    body: JSON.stringify({
      model: "gpt-4.1",
      messages: [{ role: "user", content: "What is the capital of France?" }],
      stream: true,
      max_tokens: 1000,
    }),
  }
);

const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // stream: true keeps multi-byte characters intact across chunk boundaries
  console.log(decoder.decode(value, { stream: true }));
}
```
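The examples above print the raw event lines. To display only the generated text, the events need to be parsed. The sketch below is one way to do that, under an assumption this page does not confirm: that chunks follow the OpenAI-style format, where each event is a `data: {json}` line, the text fragment sits at `choices[0].delta.content`, and `data: [DONE]` marks the end of the stream. The function name `iter_text_deltas` and the sample events are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_text_deltas(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from raw SSE lines.

    ASSUMPTION: OpenAI-style chunks ('data: {json}' lines, text at
    choices[0].delta.content, 'data: [DONE]' as the terminator).
    """
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # skip blank lines, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        text = (choices[0].get("delta") or {}).get("content")
        if text:
            yield text


# Hand-written sample events standing in for a real response body
sample = [
    b'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    b"",
    b'data: {"choices": [{"delta": {"content": "Par"}}]}',
    b'data: {"choices": [{"delta": {"content": "is"}}]}',
    b"data: [DONE]",
]
print("".join(iter_text_deltas(sample)))  # prints "Paris"
```

With a live request, `response.iter_lines()` from the Python example can be passed in place of `sample`, printing each fragment with `print(text, end="", flush=True)` as it arrives.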
The model to use for completion. See the Available models section for supported values.
| messages | array | Yes* | An array of message objects representing the conversation history. Either messages or prompt is required. |
| prompt | string | Yes* | A simple string prompt for single request/response interactions. Either messages or prompt is required. |
| stream | boolean | No | When true, responses are streamed as server-sent events (SSE). Supported on OpenAI models only. |
| max_tokens | number | No | The maximum number of tokens to generate. Default: 1000. Range: [1, context_length). |
| temperature | number | No | Controls randomness in the output. Higher values make output more random. Range: [0, 2]. |
| post_processing_steps | array | No | An ordered list of post-processing steps to apply to the response. See Post-processing. |
| transcript_id | string | No | Inject an AssemblyAI transcript’s text into the prompt. The first {{ transcript }} tag in the first message that contains it is replaced with the transcript text. See Inject a transcript by ID. |
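The constraints listed for these parameters can be checked on the client before a request is sent. The following is a minimal sketch, not part of any official SDK: `build_payload` is a hypothetical helper that enforces the "either messages or prompt" rule, the lower bound of `max_tokens`, and the `temperature` range. The upper bound of `max_tokens` depends on the model's context length, so it is left to the server.

```python
from typing import Any, Optional


def build_payload(
    model: str,
    messages: Optional[list[dict[str, str]]] = None,
    prompt: Optional[str] = None,
    **options: Any,
) -> dict[str, Any]:
    """Assemble a request body, rejecting values the parameter list rules out."""
    if not messages and not prompt:
        raise ValueError("either 'messages' or 'prompt' is required")

    max_tokens = options.get("max_tokens")
    if max_tokens is not None and max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")

    temperature = options.get("temperature")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be within [0, 2]")

    # Pass through remaining options (stream, transcript_id, ...) unchanged
    payload: dict[str, Any] = {"model": model, **options}
    if messages:
        payload["messages"] = messages
    if prompt:
        payload["prompt"] = prompt
    return payload


payload = build_payload("gpt-4.1", prompt="Hi", max_tokens=50, temperature=0.2)
print(payload)
```

The returned dictionary can be passed as the `json=` argument of `requests.post`, as in the streaming example.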
Pass transcript_id at the top level of the request to inject a transcript’s text into the prompt. The API replaces the first occurrence of the literal tag {{ transcript }} in the first message containing it with the transcript’s text field, then runs the completion.
Only the first occurrence of {{ transcript }} in the first message that contains it is substituted; additional tags in that message and tags in later messages are left as-is. The tag must be exactly {{ transcript }} (with the spaces); variants like {{transcript}} or {{ TRANSCRIPT }} are not substituted. The endpoint returns 404 if the transcript ID does not exist or belongs to a different account.
```bash
curl -X POST "https://llm-gateway.assemblyai.com/v1/chat/completions" \
  -H "Authorization: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-flash-lite",
    "messages": [
      {"role": "user", "content": "hi there"},
      {"role": "assistant", "content": "Hi! How can I help?"},
      {"role": "user", "content": "Here is a transcript: {{ transcript }}. Return the text verbatim."}
    ],
    "transcript_id": "065a71ac-dc3e-4e38-9374-e54c0bea564f"
  }'
```
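To make the substitution rule concrete, the sketch below reproduces it locally. It illustrates the behavior described above and is not the service's implementation; `inject_transcript` and the sample messages are hypothetical, and message content is assumed to be a plain string.

```python
import copy

TAG = "{{ transcript }}"  # exact tag, including the inner spaces


def inject_transcript(messages, transcript_text):
    """Replace the first TAG in the first message that contains it."""
    result = copy.deepcopy(messages)  # leave the caller's list untouched
    for message in result:
        content = message.get("content", "")
        if isinstance(content, str) and TAG in content:
            # count=1: only the first occurrence in this message is replaced
            message["content"] = content.replace(TAG, transcript_text, 1)
            break  # later messages are not touched
    return result


messages = [
    {"role": "user", "content": "hi there"},
    {"role": "user", "content": "A: {{ transcript }} B: {{ transcript }}"},
    {"role": "user", "content": "C: {{ transcript }}"},
]
for message in inject_transcript(messages, "hello world"):
    print(message["content"])
# hi there
# A: hello world B: {{ transcript }}
# C: {{ transcript }}
```

Running a prompt through a check like this before sending it can catch a misspelled tag such as {{transcript}}, which would otherwise reach the model unreplaced.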