Voice Agent WebSocket

Messages

{
  "type": "session.update",
  "session": {
    "system_prompt": "You are a concise assistant.",
    "greeting": "Hi — how can I help?",
    "input": {
      "format": {
        "encoding": "audio/pcm"
      },
      "turn_detection": {
        "vad_threshold": 0.5
      }
    },
    "output": {
      "voice": "ivy",
      "format": {
        "encoding": "audio/pcm"
      },
      "volume": 100
    },
    "tools": [
      {
        "type": "function",
        "name": "get_weather",
        "description": "Get weather for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {
              "type": "string"
            }
          },
          "required": [
            "city"
          ]
        }
      }
    ]
  }
}

WSS

ApiKey

type:string

required

Pass your API key as a Bearer token in the Authorization header on the WebSocket upgrade request. For browser apps (which can't set custom headers on WebSockets), generate a temporary token and pass it via the token query parameter instead. See Browser integration.
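
For the server-side case, a minimal sketch of building the header might look like the following. The helper name build_auth_headers is hypothetical, and how the headers are attached to the upgrade request depends on the WebSocket client library you use.

```python
def build_auth_headers(api_key: str) -> dict[str, str]:
    """Return the header to send on the WebSocket upgrade request.

    Hypothetical helper: only the "Bearer token in the Authorization
    header" scheme comes from the reference text above.
    """
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    return {"Authorization": f"Bearer {api_key}"}


headers = build_auth_headers("YOUR_API_KEY")
```

Keep the permanent key in server-side configuration (for example an environment variable) rather than in source code.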

token

type:string

required

Temporary authentication token for client-side connections. Generate one with GET /v1/token on your server and pass it here so you don't expose your permanent API key in the browser. Each token is one-time use.
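
As an illustration, the token can be appended to the connection URL as shown in this sketch. The URL wss://example.invalid/agent and the helper with_token are placeholders, not the real endpoint or an official function; only the query parameter name token comes from the text above.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def with_token(ws_url: str, token: str) -> str:
    """Append token=<value> to a WebSocket URL, keeping any existing query."""
    parts = urlsplit(ws_url)
    query = urlencode({"token": token})  # percent-encodes the token value
    if parts.query:
        query = f"{parts.query}&{query}"
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, query, parts.fragment)
    )


# Placeholder endpoint and token, for illustration only.
url = with_token("wss://example.invalid/agent", "tmp_123")
```

Because each token is one-time use, request a fresh one from your server before every new connection.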

Session Ready

type:object

Server confirms the session is established and ready for audio.

Session Updated

type:object

Server acknowledges that a session.update was applied successfully.

Session Error

type:object

Server reports a session- or protocol-level error.

User Started Speaking

type:object

Server signals that turn detection determined the user started speaking.

User Stopped Speaking

type:object

Server signals that turn detection determined the user stopped speaking.

User Transcript Delta

type:object

Partial transcript of the user's current utterance.

User Transcript

type:object

Final transcript of the user's utterance.

Reply Started

type:object

Agent has begun generating a reply.

Reply Audio Chunk

type:object

A chunk of the agent's spoken response as base64 PCM16.
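
A rough sketch of turning one chunk into samples follows. It assumes the audio is little-endian, signed 16-bit PCM and that you have already pulled the base64 string out of the event; the name of the field that carries it is not given here, so the sketch takes the string directly.

```python
import base64
import struct


def decode_pcm16(b64_audio: str) -> list[int]:
    """Decode a base64 string into signed 16-bit samples.

    Assumption: little-endian byte order ("<h" in struct notation).
    """
    raw = base64.b64decode(b64_audio)
    if len(raw) % 2:
        raise ValueError("PCM16 payload must contain an even number of bytes")
    return list(struct.unpack(f"<{len(raw) // 2}h", raw))


# Build a tiny synthetic chunk so the example is self-contained.
sample_chunk = base64.b64encode(struct.pack("<3h", 0, 1000, -1000)).decode("ascii")
samples = decode_pcm16(sample_chunk)
```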

Agent Transcript

type:object

Text of the agent's response, sent after all reply audio has been delivered.

Reply Done

type:object

Agent has finished speaking. If the user interrupted the reply (barge-in), status is "interrupted". Send any accumulated tool.result events when you receive this event.
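
One way to hold tool results until the reply finishes is a small buffer that is flushed when the Reply Done event arrives, sketched below. The class ToolResultBuffer and the payload fields call_id and result are hypothetical; only the tool.result type string appears in this reference.

```python
import json


class ToolResultBuffer:
    """Collect tool results and release them together on Reply Done."""

    def __init__(self) -> None:
        self._pending: list[dict] = []

    def add(self, call_id: str, result: dict) -> None:
        # "call_id" and "result" are assumed field names for illustration.
        self._pending.append(
            {"type": "tool.result", "call_id": call_id, "result": result}
        )

    def flush(self) -> list[str]:
        """Return the queued messages as JSON strings and empty the buffer."""
        outgoing = [json.dumps(message) for message in self._pending]
        self._pending.clear()
        return outgoing


buffer = ToolResultBuffer()
buffer.add("call_1", {"temp_c": 21})
outgoing = buffer.flush()  # call this from your Reply Done handler
```

Each string in outgoing would then be sent over the open WebSocket in order.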

Tool Call

type:object

Agent wants to invoke a registered tool.

Update Session

type:object

Client message to configure the session (system prompt, greeting, input, output, tools).

Resume Session

type:object

Client message to resume a previous session by session_id.

Input Audio Chunk

type:object

Client streams a chunk of PCM16 audio as base64.
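
A minimal sketch of packaging microphone samples for sending is shown below. The event type string "input.audio" and the field name "audio" are placeholders, since this reference does not spell them out, and little-endian byte order is assumed.

```python
import base64
import json
import struct


def encode_audio_chunk(samples: list[int]) -> str:
    """Pack signed 16-bit samples and wrap them in a JSON message.

    "input.audio" and "audio" are assumed names for illustration only.
    """
    raw = struct.pack(f"<{len(samples)}h", *samples)  # little-endian PCM16
    return json.dumps(
        {"type": "input.audio", "audio": base64.b64encode(raw).decode("ascii")}
    )


chunk_message = encode_audio_chunk([0, 1000, -1000])
```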

Tool Result

type:object

Client returns the result of a tool invocation to the agent.

Reply Create

type:object

Client asks the agent to generate a reply now, optionally with one-shot instructions.
