Events reference - AssemblyAI

Every message exchanged over the Voice Agent API WebSocket, grouped by direction. The client sends session.update to configure the session, input.audio to stream microphone audio, tool.result to answer tool calls, session.resume to reconnect, and reply.create to request a reply on demand. The server sends everything else.

Event flow

A typical voice agent session moves through the events in this order:

Client                              Server
  │                                   │
  │── WebSocket connect ─────────────►│
  │── session.update ────────────────►│  (system prompt + tools + greeting)
  │                                   │
  │◄─── session.ready ────────────────│  (save session_id)
  │                                   │
  │── input.audio (stream) ──────────►│  (only after session.ready)
  │── input.audio (stream) ──────────►│
  │                                   │
  │◄─── input.speech.started ─────────│
  │◄─── transcript.user.delta ────────│
  │◄─── input.speech.stopped ─────────│
  │◄─── transcript.user ──────────────│
  │                                   │
  │◄─── reply.started ────────────────│
  │◄─── reply.audio ──────────────────│
  │◄─── transcript.agent ─────────────│
  │◄─── reply.done ───────────────────│
  │                                   │
  │  [tool call flow]                 │
  │◄─── tool.call ────────────────────│  (arguments is a dict)
  │◄─── reply.done ───────────────────│  ← send tool.result here
  │── tool.result ───────────────────►│
  │◄─── reply.started ────────────────│
  │◄─── reply.audio ──────────────────│
  │◄─── reply.done ───────────────────│
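
The flow above only allows input.audio after session.ready. One way to enforce that on the client is a small gate that queues outgoing audio until the ready event arrives. This is a minimal sketch, not part of the API: the `AudioGate` class, its method names, and the `max_pending` limit are all hypothetical.

```python
import json
from collections import deque


class AudioGate:
    """Holds outgoing input.audio messages until session.ready arrives."""

    def __init__(self, max_pending: int = 50):
        self.ready = False
        self.session_id = None
        # Bounded queue (assumed limit): the oldest chunks are dropped if it fills up.
        self.pending = deque(maxlen=max_pending)

    def on_event(self, event: dict) -> list[str]:
        """Feed a decoded server event; returns queued messages that may now be sent."""
        if event.get("type") == "session.ready":
            self.ready = True
            self.session_id = event["session_id"]  # keep for session.resume
            flushed = list(self.pending)
            self.pending.clear()
            return flushed
        return []

    def submit(self, audio_b64: str) -> list[str]:
        """Wrap one base64 chunk as input.audio; release it only once ready."""
        message = json.dumps({"type": "input.audio", "audio": audio_b64})
        if self.ready:
            return [message]
        self.pending.append(message)
        return []
```

Send whatever `submit` and `on_event` return over the WebSocket in order. Dropping the oldest audio when the queue is full is a design choice made for this sketch; discarding all audio captured before session.ready is an equally reasonable alternative.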

Client → Server

`input.audio`

Stream PCM16 audio to the agent.

{
  "type": "input.audio",
  "audio": "<base64-encoded PCM16>"
}

Field	Type	Description
`audio`	string	Base64-encoded PCM16 mono 24kHz audio

See Audio format for the full format specification.
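
As an illustration, raw PCM16 bytes can be split into short chunks and wrapped as input.audio messages like this. The helper names and the 50 ms chunk size are assumptions made for the sketch; only the message shape and the 24 kHz mono PCM16 format come from the table above.

```python
import base64
import json

SAMPLE_RATE_HZ = 24_000  # from the field table: PCM16 mono 24kHz
BYTES_PER_SAMPLE = 2     # 16-bit samples


def input_audio_message(pcm16: bytes) -> str:
    """Wrap raw PCM16 bytes in an input.audio JSON message."""
    if len(pcm16) % BYTES_PER_SAMPLE:
        raise ValueError("PCM16 data must contain whole 16-bit samples")
    audio = base64.b64encode(pcm16).decode("ascii")
    return json.dumps({"type": "input.audio", "audio": audio})


def chunk_pcm16(pcm16: bytes, chunk_ms: int = 50):
    """Yield slices of chunk_ms milliseconds (the 50 ms default is an assumption)."""
    step = SAMPLE_RATE_HZ * chunk_ms // 1000 * BYTES_PER_SAMPLE
    for start in range(0, len(pcm16), step):
        yield pcm16[start:start + step]
```

With these defaults each chunk holds 1,200 samples (2,400 bytes) before base64 encoding.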

`session.update`

Configure the session. Send immediately on WebSocket connect (before session.ready). Can also be sent mid-conversation to update most fields. See Mutability after session.ready for which fields can change once the session is established.

{
  "type": "session.update",
  "session": {
    "system_prompt": "You are a concise assistant.",
    "greeting": "Hi! How can I help?",
    "input": {
      "format": { "encoding": "audio/pcm" },
      "turn_detection": { "vad_threshold": 0.5 }
    },
    "output": {
      "voice": "ivy",
      "format": { "encoding": "audio/pcm" },
      "volume": 100
    },
    "tools": [
      {
        "type": "function",
        "name": "get_weather",
        "description": "Get weather for a city",
        "parameters": {
          "type": "object",
          "properties": { "city": { "type": "string" } },
          "required": ["city"]
        }
      }
    ]
  }
}

All fields are optional. Include only what you want to set or change. After session.ready, only a subset of fields can be changed: an update that changes greeting, session.output.voice, or session.output.format is rejected with an immutable_field error. session.output.volume remains mutable mid-session.

Field	Type	Description
`session.system_prompt`	string	Sets the agent’s personality and context
`session.greeting`	string	Spoken aloud at the start of the conversation
`session.input.format`	object	Input audio format (`encoding`). See Audio format
`session.input.keyterms`	array	List of strings to boost in transcription. See Key terms
`session.input.turn_detection`	object	Turn detection configuration. See Session configuration
`session.output.voice`	string	The voice used for the agent’s speech. See Voices
`session.output.format`	object	Output audio format (`encoding`). See Audio format
`session.output.volume`	number	Playback volume for the agent’s speech, `0` (silent) to `100` (loudest). Mutable mid-session. See Output volume
`session.tools`	array	Tool definitions. See Tool calling
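
Because changing greeting, output.voice, or output.format after session.ready is rejected, a client may prefer to catch those fields before sending. The following is a sketch of such a check. The function names are hypothetical and the check is purely client-side; the server remains the authority on what is accepted.

```python
import json

# Fields the text above lists as rejected with immutable_field after session.ready.
IMMUTABLE_AFTER_READY = {("greeting",), ("output", "voice"), ("output", "format")}


def find_immutable_paths(session: dict, prefix: tuple = ()) -> list[tuple]:
    """Return the paths inside `session` that cannot change mid-session."""
    found = []
    for key, value in session.items():
        path = prefix + (key,)
        if path in IMMUTABLE_AFTER_READY:
            found.append(path)
        elif isinstance(value, dict):
            found.extend(find_immutable_paths(value, path))
    return found


def session_update_message(session: dict, session_is_ready: bool) -> str:
    """Build a session.update message, refusing known-immutable fields once ready."""
    if session_is_ready:
        blocked = find_immutable_paths(session)
        if blocked:
            names = ", ".join(".".join(path) for path in blocked)
            raise ValueError(f"cannot change after session.ready: {names}")
    return json.dumps({"type": "session.update", "session": session})
```

For example, `session_update_message({"output": {"volume": 40}}, session_is_ready=True)` builds a valid mid-session update, while passing `{"output": {"voice": "ivy"}}` with `session_is_ready=True` raises `ValueError`.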

`session.resume`

Reconnect to an existing session using the session_id from a previous session.ready. Preserves conversation context across dropped connections.

{
  "type": "session.resume",
  "session_id": "sess_abc123"
}

Sessions are preserved for 30 seconds after each disconnection before expiring. If the session has expired or the session_id is unknown, the server returns a session.error with code session_not_found; if the session_id belongs to a different account, the code is session_forbidden. In either case, start a fresh connection without session.resume.

Example. Capture session_id from session.ready on the first connection, then send session.resume as the first message when reconnecting:

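import json
import os

import websockets

# Placeholders: the environment variable names are illustrative, not defined by the API.
URL = os.environ["VOICE_AGENT_URL"]
API_KEY = os.environ["ASSEMBLYAI_API_KEY"]

# Example configuration; values copied from the session.update example above.
SESSION_CONFIG = {"system_prompt": "You are a concise assistant."}

# Codes that mean the saved session can no longer be resumed.
RESUME_FAILED = ("session_not_found", "session_forbidden", "session_expired")

session_id: str | None = None

async def connect():
    global session_id
    async with websockets.connect(URL, additional_headers={"Authorization": f"Bearer {API_KEY}"}) as ws:
        # If we already have a session_id from a previous connection, resume it.
        if session_id:
            await ws.send(json.dumps({"type": "session.resume", "session_id": session_id}))
        else:
            await ws.send(json.dumps({"type": "session.update", "session": SESSION_CONFIG}))

        async for raw in ws:
            event = json.loads(raw)
            if event["type"] == "session.ready":
                session_id = event["session_id"]  # save for next reconnect
            elif event["type"] == "session.error" and event["code"] in RESUME_FAILED:
                session_id = None  # session cannot be resumed - start fresh next time
            # ... handle other events

# On disconnect, call connect() again within 30 seconds to resume.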

`tool.result`

Send a tool result back to the agent. Send it only when reply.done is the latest event you have received and nothing else has arrived since. The simplest pattern is to collect results as tool.call events arrive, then send them all from inside the reply.done handler. See Tool calling.

{
  "type": "tool.result",
  "call_id": "call_abc123",
  "result": "{\"temp_c\": 22, \"description\": \"Sunny\"}"
}

Field	Type	Description
`call_id`	string	The `call_id` from the `tool.call` event
`result`	string	JSON string containing the tool result
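
The accumulate-and-drain pattern described above might look like the following sketch. `ToolResultQueue` and its method names are hypothetical; the message shape and the rule that results from an interrupted reply are discarded come from this page.

```python
import json


class ToolResultQueue:
    """Collects tool results and releases them when reply.done arrives."""

    def __init__(self):
        self._pending: list[dict] = []

    def add(self, call_id: str, result: object) -> None:
        """Store one result; `result` is serialized because the field is a JSON string."""
        self._pending.append({
            "type": "tool.result",
            "call_id": call_id,
            "result": json.dumps(result),
        })

    def on_reply_done(self, event: dict) -> list[str]:
        """Return the messages to send now, emptying the queue either way."""
        pending, self._pending = self._pending, []
        if event.get("status") == "interrupted":
            return []  # results from an interrupted reply are discarded
        return [json.dumps(message) for message in pending]
```

Call `add` from the tool.call handler once the tool has run, and send every string returned by `on_reply_done` over the WebSocket.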

`reply.create`

Ask the agent to generate a reply right now, optionally with custom instructions. Useful for delivering status updates during long-running hold-mode tool calls, or any time you want the agent to speak without a user utterance triggering it.

{
  "type": "reply.create",
  "instructions": "Let the customer know we're still processing the transfer."
}

Field	Type	Description
`instructions`	string	Optional. One-shot instruction the agent uses to compose this reply. Does not modify `system_prompt`.

The agent generates a normal reply (reply.started → reply.audio → transcript.agent → reply.done) using the provided instructions on top of the existing system prompt and conversation history.

Server → Client

`session.ready`

Session is established and ready to receive audio. Save session_id for reconnection. Start sending input.audio only after this event.

{
  "type": "session.ready",
  "session_id": "sess_abc123"
}

Field	Type	Description
`session_id`	string	Always present. Save this value to reconnect with `session.resume`.

`session.updated`

Sent after session.update is applied successfully.

{ "type": "session.updated" }

`input.speech.started`

Turn detection determined the user has started speaking.

{ "type": "input.speech.started" }

`input.speech.stopped`

Turn detection determined the user has stopped speaking.

{ "type": "input.speech.stopped" }

`transcript.user.delta`

Partial transcript of what the user is saying, updating in real-time.

{
  "type": "transcript.user.delta",
  "text": "What's the weather in"
}

Live user transcripts pause while a hold-mode tool is in flight and resume once the hold ends. Anything the user said during the hold is preserved in the conversation context.

`transcript.user`

Final transcript of the user’s utterance.

{
  "type": "transcript.user",
  "text": "What's the weather in Tokyo?",
  "item_id": "item_abc123"
}

`reply.started`

Agent has begun generating a response.

{
  "type": "reply.started",
  "reply_id": "reply_abc123"
}

`reply.audio`

A chunk of the agent’s spoken response as base64 PCM16. Decode and play immediately.

{
  "type": "reply.audio",
  "data": "<base64-encoded PCM16>"
}

See Audio format for playback guidance.
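
Decoding a chunk is a single base64 step. The sketch below uses a hypothetical helper name and leaves handing the bytes to an audio device up to your playback library.

```python
import base64


def decode_reply_audio(event: dict) -> bytes:
    """Return the raw PCM16 bytes carried by a reply.audio event."""
    if event.get("type") != "reply.audio":
        raise ValueError("not a reply.audio event")
    # validate=True rejects characters outside the base64 alphabet.
    return base64.b64decode(event["data"], validate=True)
```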

`transcript.agent`

Full text of the agent’s response, sent after all audio for the response has been delivered. If the agent was interrupted, interrupted is true and text contains only what was actually spoken before the interruption.

{
  "type": "transcript.agent",
  "text": "It's currently 22°C and sunny in Tokyo.",
  "reply_id": "reply_abc123",
  "item_id": "item_abc123",
  "interrupted": false
}

Field	Type	Description
`text`	string	What the agent said (trimmed to interruption point if interrupted)
`reply_id`	string	ID of the reply
`item_id`	string	Conversation item ID
`interrupted`	boolean	`true` if the user interrupted mid-response

`reply.done`

Agent has finished speaking. The optional status field indicates why the reply ended.

{ "type": "reply.done" }

{ "type": "reply.done", "status": "interrupted" }

Field	Type	Description
`status`	string	`"interrupted"` if the user barged in, absent for normal completion

`tool.call`

Agent wants to call a registered tool. arguments arrives already parsed as an object (a dict in Python), so use it directly without decoding it again.

{
  "type": "tool.call",
  "call_id": "call_abc123",
  "name": "get_weather",
  "arguments": { "city": "Tokyo" }
}

Field	Type	Description
`call_id`	string	Include this in `tool.result`
`name`	string	Tool name to call
`arguments`	object	Arguments as a dict (use directly)

See Tool calling for the full pattern.
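
One way to route tool.call events to local functions is a name-keyed registry, sketched below. The `tool` decorator, the `run_tool_call` helper, the stubbed `get_weather` body, and the `{"error": ...}` result shape are all assumptions made for illustration; only the event and tool.result fields come from this page.

```python
import json
from typing import Callable

TOOLS: dict[str, Callable[..., object]] = {}


def tool(fn):
    """Register a function under its own name so tool.call can find it."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def get_weather(city: str) -> dict:
    # Hypothetical stub; a real handler would query a weather service.
    return {"city": city, "temp_c": 22, "description": "Sunny"}


def run_tool_call(event: dict) -> dict:
    """Run the tool named in a tool.call event and build the tool.result payload."""
    handler = TOOLS.get(event["name"])
    if handler is None:
        result = {"error": f"unknown tool: {event['name']}"}
    else:
        try:
            # arguments is already a dict, so it can be unpacked directly.
            result = handler(**event["arguments"])
        except Exception as exc:  # report failures to the agent instead of crashing
            result = {"error": str(exc)}
    return {
        "type": "tool.result",
        "call_id": event["call_id"],
        "result": json.dumps(result),
    }
```

The returned payload still has to be held until reply.done before it is sent, as described under tool.result.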

`session.error`

Session or protocol error. The payload always includes type, timestamp, code, and message. Some errors (like session.update validation failures) also include a param field naming the offending field.

{
  "type": "session.error",
  "code": "invalid_format",
  "message": "Invalid message format",
  "timestamp": "2025-01-01T00:00:00Z"
}

Connection and handshake errors

Sent before or instead of session.ready. The WebSocket closes after these with the indicated close code.

Code	Close code	Description
`UNAUTHORIZED`	1008	Missing or invalid `Authorization` token
`FORBIDDEN`	1008	Valid token, but insufficient permissions
`server_error`	1008	Service at capacity (try again later)
`INTERNAL_ERROR`	1011	Unexpected exception during connection setup

Session resume errors

Sent when session.resume fails. The WebSocket closes after these.

Code	Close code	Description
`session_not_found`	1008	The `session_id` is unknown or the 30-second grace window expired
`session_forbidden`	1008	The `session_id` belongs to a different account
`session_expired`	1008	Session TTL elapsed during the grace window

Agent startup errors

Sent after the WebSocket is accepted but before session.ready.

Code	Description
`agent_init_failed`	Voice agent worker reported initialization failure
`agent_timeout`	Agent did not signal ready within 10 seconds

Client message errors

Sent on the open socket when an inbound message is invalid. The session stays alive after these errors. The exception is session_expired, listed under Live session errors, which closes the socket.

Code	Description
`invalid_format`	Bad JSON, missing or unknown `type`, validation failure, or missing `audio` field on `input.audio`
`invalid_audio`	`input.audio` payload failed base64 decode or PCM conversion
`invalid_value`	`session.update` with an invalid voice or field type
`immutable_field`	`session.update` tried to change `greeting`, `output.voice`, or `output.format` after the first update was applied. `output.volume` is mutable and does not raise this error.
`invalid_config`	`session.update` raised a validation error
`server_error`	Unexpected exception while applying `session.update`

Live session errors

Code	Close code	Description
`session_expired`	1008	Session duration TTL reached. There is no separate “closing soon” warning event before this, so run a client-side timer if you need to wrap up gracefully.

If the server cancels the session due to an internal error, the WebSocket closes with code 1011 without any session.error payload.

In browsers, pre-handshake failures (like UNAUTHORIZED) surface only as a close event with code 1006, with no session.error payload. Always fetch a fresh token immediately before each connection attempt.
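
The error codes in the tables above can be mapped to a client-side reaction. The sketch below is one possible policy, not behavior prescribed by the API: the `Action` names and the choice of reaction for each code are assumptions. Because server_error appears in two tables, the sketch tells the cases apart by whether session.ready has already been received.

```python
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"                # session stays alive; fix the message and carry on
    FRESH_SESSION = "fresh_session"      # reconnect without session.resume
    RETRY_LATER = "retry_later"          # reconnect after a backoff
    FIX_CREDENTIALS = "fix_credentials"  # token or permissions must change first


CLIENT_MESSAGE_CODES = {
    "invalid_format", "invalid_audio", "invalid_value",
    "immutable_field", "invalid_config",
}
SESSION_GONE_CODES = {"session_not_found", "session_forbidden", "session_expired"}
AUTH_CODES = {"UNAUTHORIZED", "FORBIDDEN"}


def classify_error(event: dict, session_ready: bool) -> Action:
    """Pick a reaction for a session.error event (assumed policy)."""
    code = event["code"]
    if code in CLIENT_MESSAGE_CODES:
        return Action.CONTINUE
    if code in SESSION_GONE_CODES:
        return Action.FRESH_SESSION
    if code in AUTH_CODES:
        return Action.FIX_CREDENTIALS
    if code == "server_error" and session_ready:
        # After session.ready this code means a session.update could not be applied.
        return Action.CONTINUE
    # Capacity, startup, and unexpected errors: try again after a delay.
    return Action.RETRY_LATER
```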

Interruptions

When the user speaks mid-response (barge-in), the server stops the agent and emits reply.done with status: "interrupted" and transcript.agent with interrupted: true. The decision is semantic. Back-channels like “uh-huh” don’t trigger an interruption.

On reply.done with status: "interrupted":

Flush your local audio playback buffer.
Discard any pending tool.result accumulators from the just-ended reply.
Restart the playback stream so it’s ready for the next response.

See Turn detection and interruptions for how the model decides what counts as an interruption, and Handling interruptions for the platform-specific flush pattern.
