Skip to main content
WSS
/
v3
/
ws

Documentation Index

Fetch the complete documentation index at: https://assemblyai.com/docs/llms.txt

Use this file to discover all available pages before exploring further.

Messages
speech_model
type:enum
required

The speech model used for your Streaming session.

Available options: universal-streaming-english, universal-streaming-multilingual, whisper-rt
ApiKey
type:string
required

Use your API key for authentication, or alternatively generate a temporary token and pass it via the token query parameter.

encoding
type:enum
required

Encoding of the audio stream.

Available options: pcm_s16le, pcm_mulaw
format_turns
type:enum
required

Whether to return formatted final transcripts.

Available options: true, false
inactivity_timeout
type:string
required

Optional time in seconds of inactivity before session is terminated (integer, minimum 5, maximum 3600). If not set, no inactivity timeout is applied.

keyterms_prompt
type:string
required

A list of words and phrases to improve recognition accuracy for. See Keyterms Prompting for more details.

language_detection
type:enum
required

Whether to detect the language and return language metadata on utterances and final turns. Only available for the multilingual model.

Available options: true, false
max_turn_silence
type:string
required

The maximum amount of silence in milliseconds allowed in a turn before end of turn is triggered. See Turn Detection for configuration details.

min_turn_silence
type:string
required

The minimum amount of silence in milliseconds required to detect end of turn when confident. See Turn Detection for configuration details.

sample_rate
type:string
required

Sample rate of the audio stream.

speaker_labels
type:enum
required

Whether to enable Streaming Speaker Diarization. When enabled, each Turn event will include a speaker_label field and each final word in the words array will include a speaker field for word-level speaker attribution.

Available options: true, false
max_speakers
type:string
required

The maximum number of speakers expected in the audio stream (integer, 1-10). Setting this can improve speaker label accuracy when you know the number of speakers in advance. Only used when speaker_labels is enabled. See Streaming Diarization for more details.

token
type:string
required

API token for authentication (if using a temporary token).

vad_threshold
type:string
required

The confidence threshold (0.0 to 1.0) for classifying audio frames as silence. Frames with VAD confidence below this value are considered silent. Increase for noisy environments to reduce false speech detection.

end_of_turn_confidence_threshold
type:string
required

The confidence threshold (0.0 to 1.0) to use when determining if the end of a turn has been reached. See Turn Detection for configuration details.

Note: This parameter is only supported for the Universal-streaming model.

domain
type:enum
required

Enable domain-specific transcription models to improve accuracy for specialized terminology. Set to "medical-v1" to enable Medical Mode for improved accuracy of medical terms such as medications, procedures, conditions, and dosages. Supported languages: English (en), Spanish (es), German (de), French (fr). If used with an unsupported language, the parameter is ignored and a warning is returned.

Available options: medical-v1
language
type:enum
required

The language of your audio stream. Deprecated.

Available options: en, multi
llm_gateway
type:string
required

JSON-stringified LLM Gateway configuration that processes each finalized turn. Follows the same interface as the Chat Completions endpoint and accepts model, messages, tools, tool_choice, post_processing_steps, and max_tokens. See Apply LLM Gateway to Streaming for the full schema and examples.

Session Begins Confirmation
type:object

Server message indicating the streaming session has successfully started.

Formatted Turn Result
type:object

Server message containing a formatted turn-based transcription result.

Session Terminated (Server Confirmation)
type:object

Server message confirming session termination with session statistics.

LLM Gateway Response
type:object

Server message containing an LLM Gateway response for a finalized turn.

Audio Data Chunk
type:string

Client sends audio data as raw binary.

Update Streaming Configuration
type:object

Client message to update streaming configuration parameters during an active session.

Force Endpoint
type:object

Client message to manually force an endpoint in the transcription.

Terminate Session (Client Initiated)
type:object

Client message to gracefully terminate the streaming session.

Keep Alive
type:object

Client message to reset the inactivity timeout timer. This is not necessary by default — sessions remain open until explicitly terminated or until the 3-hour maximum session duration is reached. This message is only needed if you have set inactivity_timeout and want to keep the session open during periods where no audio is being sent.