Universal-3 Pro Streaming

Messages

{
  "type": "Turn",
  "turn_order": 0,
  "turn_is_formatted": true,
  "end_of_turn": true,
  "transcript": "Hello world.",
  "end_of_turn_confidence": 1,
  "words": [
    {
      "text": "Hello",
      "start": 0,
      "end": 500,
      "confidence": 0.99
    },
    {
      "text": "world.",
      "start": 500,
      "end": 1000,
      "confidence": 0.98
    }
  ]
}

{
  "type": "LLMGatewayResponse",
  "turn_order": 0,
  "transcript": "Hello world.",
  "data": {
    "request_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
    "choices": [
      {
        "index": 0,
        "message": {
          "role": "assistant",
          "content": "Hello! How can I help?"
        },
        "finish_reason": "stop"
      }
    ],
    "usage": {
      "input_tokens": 12,
      "output_tokens": 8,
      "total_tokens": 20,
      "prompt_tokens_details": {},
      "completion_tokens_details": {}
    },
    "request": {},
    "response_time": 123456789
  }
}
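
As a rough illustration of how a client might consume the two sample messages above, the sketch below dispatches on the "type" field and pulls out the fields shown in the samples. The function name summarize_message is hypothetical, and reading the word "start"/"end" values as millisecond offsets is an assumption based on the sample values, not something the samples state.

```python
import json
from typing import Any


def summarize_message(raw: str) -> str:
    """Return a one-line summary of a server message, keyed on its "type" field."""
    msg: dict[str, Any] = json.loads(raw)
    kind = msg.get("type")
    if kind == "Turn":
        state = "final" if msg.get("end_of_turn") else "partial"
        words = msg.get("words", [])
        # Assumption: "start" and "end" are millisecond offsets into the audio.
        span = f"{words[0]['start']}-{words[-1]['end']}" if words else "n/a"
        return f"turn {msg['turn_order']} ({state}, {span}): {msg['transcript']}"
    if kind == "LLMGatewayResponse":
        choices = msg["data"]["choices"]
        reply = choices[0]["message"]["content"] if choices else ""
        return f"llm reply for turn {msg['turn_order']}: {reply}"
    # Other message types are not shown in the samples, so they are only reported.
    return f"unhandled message type: {kind}"
```

Dispatching on "type" keeps the handler open to the other server messages listed further down without guessing at their fields.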

speech_model

type:enum

required

The speech model to use.

Available options: u3-rt-pro

ApiKey

type:string

required

Use your API key for authentication, or generate a temporary token and pass it via the token query parameter.

encoding

type:enum

required

Encoding of the audio stream.

Available options: pcm_s16le, pcm_mulaw
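
For readers producing audio in code, here is a minimal sketch of preparing pcm_s16le data. It assumes the name carries its conventional meaning (signed 16-bit little-endian PCM) and that the input is a list of float samples in the range -1.0 to 1.0; the helper name floats_to_pcm_s16le is hypothetical.

```python
import struct


def floats_to_pcm_s16le(samples: list[float]) -> bytes:
    """Convert float samples in [-1.0, 1.0] to signed 16-bit little-endian PCM bytes."""
    ints = []
    for sample in samples:
        clipped = max(-1.0, min(1.0, sample))  # guard against out-of-range input
        ints.append(int(round(clipped * 32767)))
    # "<" selects little-endian byte order, "h" a signed 16-bit integer.
    return struct.pack(f"<{len(ints)}h", *ints)
```

Each sample becomes two bytes, so the output length is always twice the number of input samples.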

inactivity_timeout

type:string

Time in seconds of inactivity before the session is terminated (integer, minimum 5, maximum 3600). If not set, no inactivity timeout is applied.

keyterms_prompt

type:string

required

A list of words and phrases for which to improve recognition accuracy. See Keyterms Prompting for more details.

language_detection

type:enum

required

Whether to return language_code and language_confidence in turn messages. By default, Universal-3 Pro Streaming natively code-switches between English, Spanish, German, French, Portuguese, and Italian without any configuration.

Available options: true, false

max_turn_silence

type:string

required

Maximum silence in milliseconds before the turn is forced to end, regardless of punctuation. See Configuring Turn Detection for configuration details.

min_turn_silence

type:string

required

Silence duration in milliseconds before a speculative end-of-turn check. If terminal punctuation is found, the turn ends. Otherwise, a partial is emitted and the turn continues. See Configuring Turn Detection for configuration details.
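
The interaction between min_turn_silence and max_turn_silence described above can be modeled roughly as follows. This is an illustrative sketch of the documented behavior, not the server's implementation: the function name turn_action, the returned labels, and the set of terminal punctuation marks are all assumptions.

```python
# Assumption: the source does not list which marks count as terminal punctuation.
TERMINAL_PUNCTUATION = (".", "?", "!")


def turn_action(silence_ms: int, transcript: str,
                min_turn_silence: int, max_turn_silence: int) -> str:
    """Return "end_turn", "emit_partial", or "wait" for the current silence length."""
    if silence_ms >= max_turn_silence:
        # The turn is forced to end regardless of punctuation.
        return "end_turn"
    if silence_ms >= min_turn_silence:
        # Speculative end-of-turn check: terminal punctuation ends the turn,
        # otherwise a partial is emitted and the turn continues.
        if transcript.rstrip().endswith(TERMINAL_PUNCTUATION):
            return "end_turn"
        return "emit_partial"
    return "wait"
```

With hypothetical settings of 100 and 1000 milliseconds, a 150 ms pause after "Hello world." ends the turn, while the same pause after "Hello world" only yields a partial.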

prompt

type:string

Custom transcription instructions for the model. Prompting is a beta feature. When not provided, a default prompt optimized for native turn detection is used automatically. See the Prompting Guide for details.

sample_rate

type:string

required

Sample rate of the audio stream.
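
The sample rate, together with the encoding, determines how many bytes a chunk of a given duration occupies. The sketch below shows the arithmetic under two assumptions: pcm_s16le uses 2 bytes per sample and pcm_mulaw uses 1 byte per sample (their conventional sizes), and the audio is mono. The function name and the 50 ms chunk duration in the usage lines are hypothetical choices, not values from this page.

```python
def chunk_size_bytes(sample_rate: int, chunk_ms: int, bytes_per_sample: int = 2) -> int:
    """Bytes needed for chunk_ms milliseconds of mono audio at sample_rate Hz."""
    samples = sample_rate * chunk_ms // 1000
    return samples * bytes_per_sample


# 50 ms of 16 kHz pcm_s16le audio (2 bytes per sample, assumed).
print(chunk_size_bytes(16000, 50))
# 50 ms of 8 kHz pcm_mulaw audio (1 byte per sample, assumed).
print(chunk_size_bytes(8000, 50, bytes_per_sample=1))
```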

speaker_labels

type:enum

required

Whether to enable Streaming Speaker Diarization. When enabled, each Turn event will include a speaker_label field and each final word in the words array will include a speaker field for word-level speaker attribution.

Available options: true, false

max_speakers

type:string

required

The maximum number of speakers expected in the audio stream (integer, 1-10). Setting this can improve speaker label accuracy when you know the number of speakers in advance. Only used when speaker_labels is enabled. See Streaming Diarization for more details.
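
When speaker_labels is enabled, the word-level speaker field can be used to split a turn into per-speaker segments. The following is a minimal sketch that merges consecutive words with the same speaker value; the helper name, the "unknown" fallback, and the "A"/"B" label values in the usage comment are assumptions, since the source does not show sample values.

```python
def words_by_speaker(words: list[dict]) -> list[tuple[str, str]]:
    """Merge consecutive words sharing a "speaker" value into (speaker, text) segments."""
    segments: list[tuple[str, str]] = []
    for word in words:
        # Fall back to a placeholder if a word carries no speaker field.
        speaker = word.get("speaker", "unknown")
        if segments and segments[-1][0] == speaker:
            segments[-1] = (speaker, segments[-1][1] + " " + word["text"])
        else:
            segments.append((speaker, word["text"]))
    return segments


# Usage with hypothetical labels:
# words_by_speaker([{"text": "Hello", "speaker": "A"}, {"text": "Hi.", "speaker": "B"}])
```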

token

type:string

Temporary token for authentication, used in place of an API key.

vad_threshold

type:string

required

The confidence threshold (0.0 to 1.0) for classifying audio frames as silence. Frames with VAD confidence below this value are considered silent. Increase for noisy environments to reduce false speech detection.
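
The thresholding rule can be sketched as below. It only illustrates the comparison described above (confidence below the threshold means silent); the per-frame confidence values are hypothetical inputs, and the function name is not part of the API.

```python
def silent_frames(confidences: list[float], vad_threshold: float) -> list[bool]:
    """Mark each frame as silent when its VAD confidence is below vad_threshold."""
    if not 0.0 <= vad_threshold <= 1.0:
        raise ValueError("vad_threshold must be between 0.0 and 1.0")
    return [confidence < vad_threshold for confidence in confidences]
```

Raising the threshold classifies more frames as silent, which is why a higher value helps when background noise would otherwise be mistaken for speech.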

continuous_partials

type:string

Whether to emit additional partial transcripts during long turns. When disabled (default), only one early partial is emitted near turn start. When enabled, additional partials covering the full turn transcript are emitted approximately every 3 seconds while speech continues. The first partial (at 750ms) is unaffected.

include_partial_turns

type:string

Whether to emit partial transcripts during the turn. When enabled (default), partial transcripts are forwarded while speech is still in progress, alongside final turns. When disabled, only final turns (with end_of_turn true) are sent. Defaults to false when redact_pii is enabled, to prevent unredacted partial transcripts from reaching the client; set explicitly to true to override.
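
The default resolution described above can be expressed as a small function. This is a sketch of the documented rule only; the function name is hypothetical, and None stands in for "parameter not set".

```python
from typing import Optional


def effective_include_partial_turns(include_partial_turns: Optional[bool],
                                    redact_pii: bool) -> bool:
    """Resolve the effective setting when include_partial_turns may be unset (None)."""
    if include_partial_turns is not None:
        # An explicit value always wins, including true alongside redact_pii.
        return include_partial_turns
    # Unset: partials are on by default, but off when PII redaction is enabled.
    return not redact_pii
```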

interruption_delay

type:string

required

Delay in milliseconds before the first partial is emitted. Useful for tuning voice agent barge-in responsiveness or for allowing earlier partials for early LLM inference. Larger values give more confidence that an interruption is real; smaller values give a faster time to first partial.

domain

type:enum

required

Enable domain-specific transcription models to improve accuracy for specialized terminology. Set to "medical-v1" to enable Medical Mode for improved accuracy of medical terms such as medications, procedures, conditions, and dosages. Supported languages: English (en), Spanish (es), German (de), French (fr). If used with an unsupported language, the parameter is ignored and a warning is returned.

Available options: medical-v1

filter_profanity

type:enum

required

Whether to filter profanity from the transcribed text. See Profanity Filtering for more details.

Available options: true, false

redact_pii

type:enum

required

Whether to redact PII from the transcribed text using the Redact PII model. Only applies to final turns. See PII Redaction for more details.

Available options: true, false

redact_pii_policies

type:string

required

The list of PII redaction policies to enable. Requires redact_pii to be true. See PII Redaction for more details.

redact_pii_sub

type:enum

required

The replacement logic for detected PII. Requires redact_pii to be true. See PII Redaction for more details.

Available options: entity_name, hash

llm_gateway

type:string

required

JSON-stringified LLM Gateway configuration that processes each finalized turn. Follows the same interface as the Chat Completions endpoint and accepts model, messages, tools, tool_choice, post_processing_steps, and max_tokens. See Apply LLM Gateway to Streaming for the full schema and examples.
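
Because llm_gateway is passed as a JSON string inside the query string, it has to be serialized and then URL-encoded along with the other parameters. The sketch below shows one way to do that with the standard library. The base URL is a placeholder (the endpoint address does not appear in this section), the model name and sample rate are hypothetical values, and build_stream_url is not part of any SDK.

```python
import json
from urllib.parse import urlencode

# Placeholder: substitute the real streaming endpoint URL.
BASE_URL = "wss://example.invalid/ws"


def build_stream_url(params: dict) -> str:
    """Encode connection parameters into a query string appended to BASE_URL."""
    encoded = {}
    for key, value in params.items():
        if isinstance(value, bool):
            # Boolean options are listed as "true" / "false".
            encoded[key] = "true" if value else "false"
        elif isinstance(value, dict):
            # llm_gateway is documented as JSON-stringified.
            encoded[key] = json.dumps(value)
        else:
            encoded[key] = str(value)
    return f"{BASE_URL}?{urlencode(encoded)}"


url = build_stream_url({
    "speech_model": "u3-rt-pro",
    "encoding": "pcm_s16le",
    "sample_rate": 16000,  # hypothetical value
    "speaker_labels": True,
    "llm_gateway": {
        "model": "example-model",  # hypothetical model name
        "messages": [{"role": "system", "content": "Reply briefly."}],
        "max_tokens": 100,
    },
})
```

Letting urlencode handle escaping avoids hand-quoting the braces and quotation marks in the JSON payload.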

Session Begins Confirmation

type:object

Server message indicating the streaming session has successfully started.

Speech Started

type:object

Server message indicating that speech has been detected.

Formatted Turn Result

type:object

Server message containing a formatted turn-based transcription result.

Session Terminated (Server Confirmation)

type:object

Server message confirming session termination with session statistics.

LLM Gateway Response

type:object

Server message containing an LLM Gateway response for a finalized turn.

Audio Data Chunk

type:string

Client sends audio data as raw binary.

Update Streaming Configuration

type:object

Client message to update streaming configuration parameters during an active session.

Force Endpoint

type:object

Client message to manually force an endpoint in the transcription.

Terminate Session (Client Initiated)

type:object

Client message to gracefully terminate the streaming session.

Keep Alive

type:object

Client message to reset the inactivity timeout timer. This is not necessary by default: sessions remain open until explicitly terminated or until the 3-hour maximum session duration is reached. This message is only needed if you have set inactivity_timeout and want to keep the session open during periods where no audio is being sent.
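
A client that sets inactivity_timeout needs some rule for when to send a keep-alive during gaps in audio. The sketch below is one possible rule, not a documented requirement: it sends once a fixed fraction of the timeout has elapsed since the last audio or keep-alive. The "KeepAlive" type string in the payload is an assumption, because this section names the message but does not show its wire format, and the 0.5 safety margin is an arbitrary choice.

```python
import json

# Assumption: the wire format of the Keep Alive message is not shown in this section.
KEEP_ALIVE_PAYLOAD = json.dumps({"type": "KeepAlive"})


def keep_alive_due(now: float, last_sent: float, inactivity_timeout: int,
                   safety_margin: float = 0.5) -> bool:
    """True once the idle gap has used up safety_margin of the inactivity timeout.

    now and last_sent are timestamps in seconds, for example from time.monotonic().
    last_sent is the time audio or a keep-alive was last sent.
    """
    return (now - last_sent) >= inactivity_timeout * safety_margin
```

With inactivity_timeout set to 30, this rule would trigger a keep-alive after 15 seconds without audio, leaving the other half of the window as slack for network delay.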
