{
  "type": "session.ready",
  "session_id": "sess_abc123"
}

{
  "type": "session.updated"
}

{
  "type": "session.error",
  "code": "invalid_format",
  "message": "Invalid message format"
}

{
  "type": "input.speech.started"
}

{
  "type": "input.speech.stopped"
}

{
  "type": "transcript.user.delta",
  "text": "What's the weather in"
}

{
  "type": "transcript.user",
  "text": "What's the weather in Tokyo?",
  "item_id": "item_abc123"
}

{
  "type": "reply.started",
  "reply_id": "reply_abc123"
}

{
  "type": "reply.audio",
  "data": "EAAgADAAQAAwACAAEAAAAPD/4P/Q/8D/"
}

{
  "type": "transcript.agent",
  "text": "It's currently 22°C and sunny in Tokyo.",
  "reply_id": "reply_abc123",
  "item_id": "item_abc123",
  "interrupted": false
}

{
  "type": "reply.done"
}

{
  "type": "tool.call",
  "call_id": "call_abc123",
  "name": "get_weather",
  "arguments": {
    "city": "Tokyo"
  }
}

{
  "type": "session.update",
  "session": {
    "system_prompt": "You are a concise assistant.",
    "greeting": "Hi — how can I help?",
    "input": {
      "format": {
        "encoding": "audio/pcm"
      },
      "turn_detection": {
        "vad_threshold": 0.5
      }
    },
    "output": {
      "voice": "ivy",
      "format": {
        "encoding": "audio/pcm"
      },
      "volume": 100
    },
    "tools": [
      {
        "type": "function",
        "name": "get_weather",
        "description": "Get weather for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {
              "type": "string"
            }
          },
          "required": [
            "city"
          ]
        }
      }
    ]
  }
}

{
  "type": "session.resume",
  "session_id": "sess_abc123"
}

{
  "type": "input.audio",
  "audio": "EAAgADAAQAAwACAAEAAAAPD/4P/Q/8D/"
}

{
  "type": "tool.result",
  "call_id": "call_abc123",
  "result": "{\"temp_c\": 22, \"description\": \"Sunny\"}"
}

{
  "type": "reply.create",
  "instructions": "Let the customer know we're still processing the transfer."
}

Voice Agent WebSocket
Connect to the Voice Agent API to run a real-time voice conversation. The client streams PCM16 audio to the server and receives the agent’s spoken response (also PCM16), along with transcripts, tool calls, and lifecycle events.
See the Voice Agent API overview for the full event flow and a runnable quickstart.
Pass your API key as a Bearer token in the Authorization header on the WebSocket upgrade request. For browser apps (which can't set custom headers on WebSockets), generate a temporary token and pass it via the token query parameter instead. See Browser integration.
token (query parameter): Temporary authentication token for client-side connections. Generate one with GET /v1/token on your server and pass it in this parameter so you don't expose your permanent API key in the browser. Each token is single-use.
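The two authentication paths can be sketched as a small helper that prepares the URL and headers for the upgrade request. This is only an illustration: the endpoint URL is a placeholder (this page does not state it), and `build_connection` is a hypothetical name, not part of any SDK.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Placeholder endpoint: the real WebSocket URL is not given in this section.
EXAMPLE_URL = "wss://example.invalid/voice-agent"


def build_connection(url, api_key=None, token=None):
    """Return (url, headers) for the WebSocket upgrade request.

    Server-side callers pass api_key and get an Authorization header.
    Browser-style callers pass a temporary token, which is appended as
    the `token` query parameter instead.
    """
    if (api_key is None) == (token is None):
        raise ValueError("pass exactly one of api_key or token")
    if api_key is not None:
        return url, {"Authorization": f"Bearer {api_key}"}
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("token", token))
    return urlunsplit(parts._replace(query=urlencode(query))), {}
```

The returned pair can be handed to whichever WebSocket client you use; how that client accepts extra headers varies by library.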
session.ready: Server confirms the session is established and ready for audio.
session.updated: Server acknowledges that a session.update was applied successfully.
session.error: Server reports a session- or protocol-level error.
input.speech.started: Server signals that turn detection determined the user started speaking.
input.speech.stopped: Server signals that turn detection determined the user stopped speaking.
transcript.user.delta: Partial transcript of the user's current utterance.
transcript.user: Final transcript of the user's utterance.
reply.started: Agent has begun generating a reply.
reply.audio: A chunk of the agent's spoken response as base64 PCM16.
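As a rough sketch of consuming a reply.audio event, the payload can be base64-decoded and unpacked into 16-bit samples. The little-endian byte order is an assumption (this page only says PCM16), and `decode_reply_audio` is a hypothetical helper name.

```python
import base64
import json
import struct


def decode_reply_audio(message):
    """Decode a reply.audio event (JSON text) into signed 16-bit samples."""
    event = json.loads(message)
    if event.get("type") != "reply.audio":
        raise ValueError(f"expected reply.audio, got {event.get('type')!r}")
    raw = base64.b64decode(event["data"])
    if len(raw) % 2:
        raise ValueError("PCM16 payload must have an even byte count")
    # Assumption: samples are little-endian ("<h"); adjust if the API differs.
    return [sample for (sample,) in struct.iter_unpack("<h", raw)]
```

Under that assumption, the sample chunk in the reply.audio example decodes to 12 samples that rise from 16 to 64 and fall back to -64.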
transcript.agent: Text of the agent's response, sent after all reply audio has been delivered.
reply.done: Agent has finished speaking. If the user barged in, status is "interrupted". Send any accumulated tool.result events when you receive this event.
tool.call: Agent wants to invoke a registered tool.
session.update: Client message to configure the session (system prompt, greeting, input, output, tools).
session.resume: Client message to resume a previous session by session_id.
input.audio: Client streams a chunk of PCM16 audio as base64.
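Going the other way, a client might wrap captured samples in an input.audio message along these lines. Little-endian byte order is again an assumption, and `encode_input_audio` is a hypothetical helper; chunk size and pacing are not covered here.

```python
import base64
import json
import struct


def encode_input_audio(samples):
    """Build an input.audio message (JSON text) from signed 16-bit samples."""
    # Assumption: little-endian PCM16. struct.pack raises struct.error if a
    # sample falls outside the int16 range.
    raw = struct.pack(f"<{len(samples)}h", *samples)
    return json.dumps({
        "type": "input.audio",
        "audio": base64.b64encode(raw).decode("ascii"),
    })
```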
tool.result: Client returns the result of a tool invocation to the agent.
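One way to follow the guidance of sending accumulated tool.result messages on reply.done is to buffer results as tool.call events arrive and release them when reply.done is seen. The sketch below is an illustration only: `ToolResultBuffer` and its handler mapping are hypothetical, and it assumes tool handlers are synchronous and that `result` is a JSON-encoded string, as in the tool.result example.

```python
import json


class ToolResultBuffer:
    """Collect tool.result messages and release them on reply.done."""

    def __init__(self, handlers):
        self.handlers = handlers  # tool name -> callable(**arguments)
        self.pending = []

    def on_event(self, event):
        """Process one decoded server event; return messages to send now."""
        if event["type"] == "tool.call":
            result = self.handlers[event["name"]](**event["arguments"])
            self.pending.append(json.dumps({
                "type": "tool.result",
                "call_id": event["call_id"],
                # Assumption: the result travels as a JSON-encoded string.
                "result": json.dumps(result),
            }))
            return []
        if event["type"] == "reply.done":
            ready, self.pending = self.pending, []
            return ready
        return []
```

The caller sends each returned message over the socket; every other event type passes through without producing output.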
reply.create: Client asks the agent to generate a reply now, optionally with one-shot instructions.