Register tools in session.tools to let your agent take actions. The agent emits a tool.call; you run the tool and reply with a tool.result when reply.done is the latest event you’ve received.

Quick start

import asyncio, json, websockets

URL = "wss://agents.assemblyai.com/v1/ws"
TOOLS = [{
    "type": "function",
    "name": "get_weather",
    "description": "Get current weather for any city. Use this whenever the user asks about weather, temperature, or conditions. Prefer calling this over guessing.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string", "description": "City name (e.g. London)"}},
        "required": ["city"],
    },
}]

async def main():
    # In websockets 14 and later the default client takes additional_headers instead of extra_headers.
    async with websockets.connect(URL, extra_headers={"Authorization": "Bearer YOUR_KEY"}) as ws:
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "system_prompt": "You are a weather assistant. Call get_weather for weather questions. When in doubt, call the tool.",
                "greeting": "Hi! Ask me about the weather.",
                "tools": TOOLS,
                "output": {"type": "audio", "voice": "ivy"},
            },
        }))

        last_event, pending = None, []

        async def flush_if_idle():
            if last_event != "reply.done" or not pending:
                return
            for t in pending:
                await ws.send(json.dumps({"type": "tool.result", "call_id": t["call_id"],
                                          "result": json.dumps(t["result"])}))
            pending.clear()

        async for raw in ws:
            event = json.loads(raw)
            t = event.get("type")
            if t == "tool.call" and event["name"] == "get_weather":
                pending.append({"call_id": event["call_id"], "result": {"temp_c": 22, "description": "Sunny"}})
                await flush_if_idle()
            elif t in ("reply.started", "input.speech.started"):
                last_event = t
            elif t == "reply.done":
                last_event = t
                if event.get("status") == "interrupted":
                    pending.clear()
                else:
                    await flush_if_idle()

asyncio.run(main())

Defining tools

{
  "type": "session.update",
  "session": {
    "tools": [
      {
        "type": "function",
        "name": "get_weather",
        "description": "Get current weather for any city. Use this whenever the user asks about weather.",
        "parameters": {
          "type": "object",
          "properties": {"location": {"type": "string", "description": "City name, e.g. London"}},
          "required": ["location"]
        },
        "execution_mode": "interactive",
        "timeout_seconds": 120
      }
    ]
  }
}
Field            Type    Default        Notes
───────────────  ──────  ─────────────  ──────────────────────────────────────────────────────────────
type             string  (required)     Always "function".
name             string  (required)     snake_case, verb-noun. Referenced by tool.call.
description      string  ""             The model’s main signal for when to call. See below.
parameters       object  {}             JSON Schema. See below.
execution_mode   string  "interactive"  "interactive" or "hold". See Execution modes.
timeout_seconds  number  120            1–300. On timeout the agent apologises; the session continues.
A session.update that sets session.tools replaces the previous array; it does not merge. See Progressive tool reveal.

Descriptions

Treat description as “when should I reach for this?”, not “what does this do?”.
{
  "description": "Get current weather for any city. Use this whenever the user asks about weather, temperature, conditions, what to wear, or anything weather-dependent. Prefer calling this over guessing."
}
  • Name the trigger: “Call this when the user asks about X.”
  • Name the anti-trigger: “Do not call this for Y.”
  • Mention precondition fields if any.

Parameters

{
  "type": "object",
  "properties": {
    "phone_number": {
      "type": "string",
      "description": "E.164 format (e.g. +14155551234). Strip spaces, parentheses, dashes."
    },
    "department": {
      "type": "string",
      "enum": ["billing", "sales", "support"],
      "description": "Pick the closest match (e.g. 'tech help' → support)."
    }
  },
  "required": ["phone_number", "department"]
}
  • Lead with format (“E.164”, “ISO-8601 date”, “lowercase”).
  • Always include an example.
  • Use enum for fixed sets.
  • required only for fields the tool truly can’t function without; otherwise the model interrogates the user.
parameters is not validated at session.update time. Malformed schemas (missing type: "object", broken enum) are accepted silently and break tool calling at runtime. Validate locally.
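Because malformed schemas are accepted silently, a local check before sending session.update can catch the cases named above. The following is a minimal sketch that assumes only the rules stated on this page. check_tool is a hypothetical helper, not part of any SDK, and it is not a full JSON Schema validator.

```python
def check_tool(tool):
    """Return a list of problems found in one tool definition (empty list = looks OK)."""
    problems = []
    if tool.get("type") != "function":
        problems.append('type must be "function"')
    if not tool.get("name"):
        problems.append("name is required")

    mode = tool.get("execution_mode", "interactive")
    if mode not in ("interactive", "hold"):
        problems.append('execution_mode must be "interactive" or "hold"')
    timeout = tool.get("timeout_seconds", 120)
    if not isinstance(timeout, (int, float)) or not 1 <= timeout <= 300:
        problems.append("timeout_seconds must be a number between 1 and 300")

    params = tool.get("parameters", {})
    if not params:
        return problems  # parameters defaults to {}, nothing more to check
    if params.get("type") != "object":
        problems.append('parameters.type must be "object"')
    props = params.get("properties", {})
    if not isinstance(props, dict):
        problems.append("parameters.properties must be an object")
        props = {}
    for field in params.get("required", []):
        if field not in props:
            problems.append(f"required field {field!r} is not in properties")
    for field, spec in props.items():
        if isinstance(spec, dict) and "enum" in spec:
            if not isinstance(spec["enum"], list) or not spec["enum"]:
                problems.append(f"{field}.enum must be a non-empty list")
    return problems
```

Run it over every entry in the tools array and refuse to send the update if any list comes back non-empty.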

Getting the agent to call your tools

In rough order of impact:
  1. Strong tool descriptions: see above. Most “tool never fires” failures trace here.
  2. Strong parameter descriptions: same idea applied per-field. Vague params produce missing or invented argument values, which the validator then rejects (or worse, your tool runs on garbage). Lead with format, include an example, use enum for fixed sets. See Parameters.
  3. Default-to-call wording in system_prompt: “When in doubt, call the tool. A wasted call is fine. Answering wrong from memory is not.” Don’t stack exceptions.
  4. Few-shot examples in system_prompt are the strongest behavioural signal:
    User: “Where’s my order?” You: [call search_orders] “Looks like it’s out for delivery today.”
  5. Keep tool sets small (≤10 per phase). Past that, selection accuracy drops. See Progressive tool reveal.

Returning tool results

Send tool.result when reply.done is the latest event you’ve received. Not earlier (agent is still mid-transition-phrase), not later (a new turn has started).
last_event: str | None = None
pending_tools: list[dict] = []

async def flush_if_idle():
    if last_event != "reply.done" or not pending_tools:
        return
    for tool in pending_tools:
        await ws.send(json.dumps({
            "type": "tool.result",
            "call_id": tool["call_id"],
            "result": json.dumps(tool["result"]),   # JSON string
        }))
    pending_tools.clear()

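# In your event loop. ws is the open WebSocket and run_tool is your own dispatcher.
# If this loop lives inside a function while last_event is defined outside it,
# declare last_event global (or nonlocal) there; otherwise flush_if_idle() never sees the updates.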
if t == "tool.call":
    result = run_tool(event["name"], event["arguments"])
    pending_tools.append({"call_id": event["call_id"], "result": result})
    await flush_if_idle()                # may already be idle if reply.done fired first

elif t in ("reply.started", "input.speech.started"):
    last_event = t                        # turn in flight, hold results

elif t == "reply.done":
    last_event = t
    if event.get("status") == "interrupted":
        pending_tools.clear()             # agent moved on, drop stale results
    else:
        await flush_if_idle()
Two non-obvious bits:
  • Call flush_if_idle() from the tool.call handler. Your tool may return after reply.done already fired.
  • Update last_event on reply.started / input.speech.started so results that become available mid-turn are held until that turn ends.
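The same gating can be packaged as a small class so the state is not module-global and can be exercised without a live connection. This is a sketch under assumptions: ToolResultGate and FakeSocket are hypothetical names, and the only thing required of the socket is an async send(str) method.

```python
import asyncio
import json

class ToolResultGate:
    """Holds tool results until reply.done is the latest event seen."""

    def __init__(self, ws):
        self.ws = ws
        self.last_event = None
        self.pending = []

    async def add_result(self, call_id, result):
        self.pending.append({"call_id": call_id, "result": result})
        await self._flush_if_idle()   # reply.done may already have fired

    async def on_event(self, event):
        t = event.get("type")
        if t in ("reply.started", "input.speech.started"):
            self.last_event = t       # turn in flight, hold results
        elif t == "reply.done":
            self.last_event = t
            if event.get("status") == "interrupted":
                self.pending.clear()  # agent moved on, drop stale results
            else:
                await self._flush_if_idle()

    async def _flush_if_idle(self):
        if self.last_event != "reply.done" or not self.pending:
            return
        for item in self.pending:
            await self.ws.send(json.dumps({
                "type": "tool.result",
                "call_id": item["call_id"],
                "result": json.dumps(item["result"]),
            }))
        self.pending.clear()

class FakeSocket:
    """Stand-in for the WebSocket: records what would have been sent."""
    def __init__(self):
        self.sent = []

    async def send(self, message):
        self.sent.append(json.loads(message))

async def demo():
    ws = FakeSocket()
    gate = ToolResultGate(ws)
    await gate.on_event({"type": "reply.started"})
    await gate.add_result("call_1", {"temp_c": 22})
    held = len(ws.sent)               # result is held while the turn is in flight
    await gate.on_event({"type": "reply.done"})
    return held, ws.sent
```

In a real client, pass every incoming event to on_event and call add_result whenever a tool finishes, whichever happens first.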

Errors that help the agent recover

The error field is read verbatim by the model. Weak errors cause guessing loops; specific errors get clean recoveries.
Weak (agent re-asks for everything):
{ "error": "Lookup failed." }
Strong (agent re-asks only for the field that failed):
{ "error": "Could not resolve DROPOFF 'Central train station'. Pickup resolved ('SW1A 1AA'). Ask the user for a UK postcode for the dropoff." }
Patterns: name the failing field, say what did work so the agent doesn’t re-ask for it, tell the agent what to ask for next.
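One way to keep error strings consistent across tools is to build them from the three parts listed above. A minimal sketch; tool_error and its argument names are hypothetical, and the wording simply mirrors the strong example.

```python
def tool_error(failed_field, value, resolved=None, ask_for=None):
    """Build an error payload: what failed, what already worked, what to ask next."""
    parts = [f"Could not resolve {failed_field.upper()} '{value}'."]
    for field, ok_value in (resolved or {}).items():
        # Naming what worked stops the agent from re-asking for it.
        parts.append(f"{field.capitalize()} resolved ('{ok_value}').")
    if ask_for:
        parts.append(f"Ask the user for {ask_for}.")
    return {"error": " ".join(parts)}

payload = tool_error(
    "dropoff", "Central train station",
    resolved={"pickup": "SW1A 1AA"},
    ask_for="a UK postcode for the dropoff",
)
```

Here payload["error"] reproduces the strong example above word for word.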

Execution modes

Set execution_mode per tool to choose how the agent waits.
Use "interactive" for…                            Use "hold" for…
────────────────────────────────────────────────  ─────────────────────────────────────────────────────
DB lookups, REST calls, short calculations        Phone transfers, escalations
Returns under ~5 seconds                          Long-running ops (>10s, async jobs)
Transition phrase (“let me check”) feels natural  Sensitive flows (payment auth, identity verification)
Default to interactive. Two common mis-uses:
  • ❌ Wrapping a slow DB query in hold “to be safe”. Agent goes mute, user thinks the call dropped. Use interactive with a longer timeout_seconds.
  • ❌ Using interactive for a 30-second human transfer. Agent fills with small-talk; user gets suspicious.

Interactive

server                                 client
  │  reply.started                       │
  │ ───────────────────────────────────► │
  │  reply.audio  ("let me check that")  │
  │ ───────────────────────────────────► │
  │  tool.call                           │  client accumulates result
  │ ───────────────────────────────────► │  (does NOT send tool.result yet)
  │  reply.done                          │
  │ ───────────────────────────────────► │  client drains pending results:
  │                                      │  tool.result
  │ ◄─────────────────────────────────── │
  │  reply.started                       │  agent delivers answer
  │ ───────────────────────────────────► │
  │  reply.audio  ("it's 22°C and sunny")│
  │ ───────────────────────────────────► │
  │  reply.done                          │
  │ ───────────────────────────────────► │

Hold

While the tool is in flight:
  1. Agent stays silent (no reply.started).
  2. User speech doesn’t trigger replies. Utterances are added to context but the agent doesn’t respond until you send tool.result or reply.create.
  3. tool.result auto-fires the next reply. Don’t also send reply.create after.
{
  "type": "function",
  "name": "transfer_call",
  "description": "Transfer the call to a human agent. Takes 15–30 seconds.",
  "parameters": {"type": "object", "properties": {"department": {"type": "string"}}, "required": ["department"]},
  "execution_mode": "hold",
  "timeout_seconds": 60
}
server                                 client
  │  tool.call (hold)                    │
  │ ───────────────────────────────────► │  kick off long-running op
  │                                      │  (agent silent, no reply.started)
  │                                      │
  │                                      │  reply.create { instructions: ... }
  │ ◄─────────────────────────────────── │  ── optional status update
  │  reply.started → reply.audio → done  │
  │ ───────────────────────────────────► │
  │                                      │  (op completes)
  │                                      │  tool.result
  │ ◄─────────────────────────────────── │
  │  reply.started                       │  auto-fired by tool.result
  │ ───────────────────────────────────► │
  │  reply.audio  ("all set...")         │
  │ ───────────────────────────────────► │
  │  reply.done                          │
  │ ───────────────────────────────────► │
During hold, the server does not emit transcript.user.delta or transcript.user in real time. Transcripts flush once the hold ends (tool.result or reply.create). Live captioning pauses during the hold; nothing is dropped.

Status updates during hold

Send reply.create with optional instructions to make the agent speak mid-hold without ending it:
await ws.send(json.dumps({
    "type": "reply.create",
    "instructions": "Let the customer know you're still working on the transfer."
}))
The hold continues until you send the matching tool.result.
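A hold-mode tool handler can combine both messages: periodic reply.create nudges while the operation runs, then a single tool.result at the end. This is a sketch, not a reference implementation. run_held_tool, status_every, and the default status text are hypothetical, and how the server treats a reply.create that arrives while an earlier status reply is still playing is not covered here.

```python
import asyncio
import json

async def run_held_tool(ws, call_id, operation, status_every=10.0,
                        status_text="Let the customer know you're still working on it."):
    """Await a long-running coroutine for a hold-mode tool, sending status nudges meanwhile."""
    task = asyncio.ensure_future(operation)
    while True:
        done, _ = await asyncio.wait({task}, timeout=status_every)
        if done:
            break
        # Still running: ask the agent to speak. The hold itself stays active.
        await ws.send(json.dumps({"type": "reply.create", "instructions": status_text}))
    # tool.result auto-fires the next reply, so no reply.create follows it.
    # If the operation raised, task.result() re-raises here; catch it and send
    # an error result instead if you want the agent to recover.
    await ws.send(json.dumps({
        "type": "tool.result",
        "call_id": call_id,
        "result": json.dumps(task.result()),
    }))
```

Pick status_every well below the tool's timeout_seconds so the user hears something before the hold times out.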

Progressive tool reveal

For multi-step workflows (lookup → estimate → commit), don’t register all tools upfront. After each successful tool.result, send session.update adding the next phase’s tools, and update system_prompt to match. Why: a tool that isn’t in the current list can’t be called, so the model can’t fabricate a commit before the prerequisite step has run. Smaller per-phase tool sets also raise selection accuracy.

Worked example: taxi booking

session state              tools exposed
─────────────────────────  ──────────────────────────────────────────
session start              [lookup_postcode]
                                  │ user gives pickup postcode

                           ⚙ lookup_postcode("SW1A 1AA") → ✓
─────────────────────────  ──────────────────────────────────────────
tier 2 unlocked            [lookup_postcode, estimate_fare]
                                  │ user gives dropoff

                           ⚙ estimate_fare(...) → ✓
─────────────────────────  ──────────────────────────────────────────
tier 3 unlocked            [lookup_postcode, estimate_fare, book_ride,
                            get_booking, track_driver, cancel_ride]
                                  │ user confirms + name

                           ⚙ book_ride(name="Alex", ...) → ✓
Until lookup_postcode returns a real postcode, the model has no book_ride tool. It can verbally promise a booking; it can’t create one.

Client-side wiring

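# lookup_postcode, estimate_fare, etc. are the tool definition dicts (see Defining tools).
# ws, TIER_2_PROMPT and TIER_3_PROMPT are assumed to be defined elsewhere in your client.
TIER_1_TOOLS = [lookup_postcode]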
TIER_2_TOOLS = [lookup_postcode, estimate_fare]
TIER_3_TOOLS = [lookup_postcode, estimate_fare, book_ride,
                get_booking, track_driver, cancel_ride]

tier_2_unlocked = tier_3_unlocked = False

async def maybe_unlock_next_tier(tool_name, result):
    global tier_2_unlocked, tier_3_unlocked
    if result.get("error"):
        return

    if not tier_2_unlocked and tool_name == "lookup_postcode" and result.get("postcode"):
        tier_2_unlocked = True
        await ws.send(json.dumps({"type": "session.update",
                                  "session": {"tools": TIER_2_TOOLS, "system_prompt": TIER_2_PROMPT}}))
    elif not tier_3_unlocked and tool_name == "estimate_fare" and result.get("estimated_fare"):
        tier_3_unlocked = True
        await ws.send(json.dumps({"type": "session.update",
                                  "session": {"tools": TIER_3_TOOLS, "system_prompt": TIER_3_PROMPT}}))
Update tools AND system_prompt together. Tool-only gating where the prompt still references a now-hidden tool can underperform not gating at all. The model hunts for a tool the prompt promised and stalls or improvises when it can’t find it. Strip or rewrite every prompt sentence that names a tool whose visibility changed.
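A cheap guard against prompt and tool drift is to scan each phase's prompt for tool names that the phase does not expose. A minimal sketch; stale_tool_mentions is a hypothetical helper, and it only catches literal tool names, not paraphrases such as “book the ride”.

```python
import re

def stale_tool_mentions(prompt, visible_tools, all_tool_names):
    """Return known tool names that the prompt mentions but the session does not expose."""
    visible = {tool["name"] for tool in visible_tools}
    stale = []
    for name in all_tool_names:
        if name in visible:
            continue
        # \b gives a whole-word match, so "book_ride" is not found inside "rebook_ride_later".
        if re.search(rf"\b{re.escape(name)}\b", prompt):
            stale.append(name)
    return stale

ALL_NAMES = ["lookup_postcode", "estimate_fare", "book_ride"]
tier_1 = [{"name": "lookup_postcode"}]
leftover = stale_tool_mentions("Call lookup_postcode, then book_ride.", tier_1, ALL_NAMES)
```

Here leftover is ["book_ride"], which flags a sentence to strip or rewrite before sending the update.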

Per-call state machine

The strongest form: every successful tool call is a state transition; each state owns a narrow prompt + small tool list.
State         System prompt focus                                    Tools
────────────  ─────────────────────────────────────────────────────  ──────────────────────────────────────
s0_greet      “Get pickup postcode. Nothing else.”                   lookup_postcode
s2_quoting    “Call estimate_fare. Filler only; no fare numbers.”    estimate_fare
s4_have_name  “Call book_ride with captured pickup, dropoff, name.”  book_ride
s5_booked     “Read back confirmation. Offer track/cancel.”          get_booking, track_driver, cancel_ride
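The table can be driven from data rather than a chain of if-statements. In the sketch below the states, prompts, and tool names come from the table above. The transition map is an assumption that jumps straight between the listed states and skips the intermediate ones the table does not show, and next_state and session_update_for are hypothetical helpers.

```python
import json

STATES = {
    "s0_greet":     {"prompt": "Get pickup postcode. Nothing else.",
                     "tools": ["lookup_postcode"]},
    "s2_quoting":   {"prompt": "Call estimate_fare. Filler only; no fare numbers.",
                     "tools": ["estimate_fare"]},
    "s4_have_name": {"prompt": "Call book_ride with captured pickup, dropoff, name.",
                     "tools": ["book_ride"]},
    "s5_booked":    {"prompt": "Read back confirmation. Offer track/cancel.",
                     "tools": ["get_booking", "track_driver", "cancel_ride"]},
}

# Assumed transitions: (current state, tool that succeeded) -> next state.
TRANSITIONS = {
    ("s0_greet", "lookup_postcode"): "s2_quoting",
    ("s2_quoting", "estimate_fare"): "s4_have_name",
    ("s4_have_name", "book_ride"): "s5_booked",
}

def next_state(state, tool_name, result):
    """Advance only on a successful call that has a transition; otherwise stay put."""
    if result.get("error"):
        return state
    return TRANSITIONS.get((state, tool_name), state)

def session_update_for(state, tool_defs):
    """Build the session.update message for a state. tool_defs maps name -> tool definition."""
    spec = STATES[state]
    return json.dumps({
        "type": "session.update",
        "session": {
            "system_prompt": spec["prompt"],
            "tools": [tool_defs[name] for name in spec["tools"]],
        },
    })
```

After each tool.result, compute next_state and, if it changed, send session_update_for(new_state, tool_defs) so the prompt and tool list move together.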

Escape hatches

Real users go off-script. Two patterns, used together:
  • Transition tools: revise_pickup, revise_dropoff, restart, end_call exposed in every state. Model picks the right escape; orchestrator rolls state back.
  • respond_freely: a no-op tool in every state for tangential questions (“are you a real person?”). Model calls it instead of leaving the state.
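Both patterns reduce to two small pieces of orchestrator logic: append the always-on tools to each state's list, and decide where to continue after one of them is called. A sketch only; tools_for_state, after_escape, and the rollback targets in the usage lines are hypothetical and depend on your own state names.

```python
ESCAPE_TOOL_NAMES = ["revise_pickup", "revise_dropoff", "restart", "end_call", "respond_freely"]

def tools_for_state(state_tool_names, escape_names=ESCAPE_TOOL_NAMES):
    """State-specific tool names followed by the always-on escape hatches, without duplicates."""
    names = list(state_tool_names)
    for name in escape_names:
        if name not in names:
            names.append(name)
    return names

def after_escape(state, tool_name, rollback):
    """State to continue in after an escape tool was called."""
    if tool_name == "respond_freely":
        return state                      # no-op: answer the tangent, stay in place
    return rollback.get(tool_name, state)  # unknown tools leave the state unchanged

# Example rollback targets (assumed): both send the caller back to the first state.
ROLLBACK = {"revise_pickup": "s0_greet", "restart": "s0_greet"}
exposed = tools_for_state(["estimate_fare"])
resumed = after_escape("s2_quoting", "revise_pickup", ROLLBACK)
```

Escape tools count toward the per-phase tool budget, so keep the state-specific list correspondingly short.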

Anti-fabrication clause

Gating makes hallucinations harmless (no real booking happens) but doesn’t suppress the spoken claim. Pair with prompt wording:
NEVER quote a fare, distance, time, confirmation number, name, or ETA unless
those exact values came from a tool result in this conversation. If you
haven't seen a tool result, you do NOT have these values. Don't estimate
them. Don't guess. Don't say "around" a number.

Patterns by agent type

Customer support

s0:  [lookup_ticket]                                           ← always
s1:  [lookup_ticket, escalate_to_human (hold), close_ticket]   ← after lookup
any: [respond_freely, end_call]
Prompt focus: “Use lookup_ticket for any ticket question. Only escalate_to_human after checking the ticket. Don’t promise outcomes you can’t verify.”

Booking / reservations

s0:  [check_availability, cancel_reservation]                          ← always
s1a: [check_availability, create_reservation, cancel_reservation]      ← if available
s1b: [check_availability, add_to_waitlist, cancel_reservation]         ← if full
Prompt focus: “Confirm party size, date, time. Call check_availability. If open, offer it and book. If not, offer the next two times or the waitlist.”
The next tool depends on the prior result. Don’t expose both create_reservation and add_to_waitlist simultaneously. The model picks the wrong one ~30% of the time.
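The branch between s1a and s1b can be decided entirely from the check_availability result. A minimal sketch; the available field is an assumption about what your own tool returns, and tools_after_availability is a hypothetical helper.

```python
def tools_after_availability(result):
    """Pick the next tool list from a check_availability result."""
    if result.get("error"):
        return ["check_availability", "cancel_reservation"]                        # stay in s0
    if result.get("available"):
        return ["check_availability", "create_reservation", "cancel_reservation"]  # s1a
    return ["check_availability", "add_to_waitlist", "cancel_reservation"]          # s1b
```

Because each branch returns only one of create_reservation or add_to_waitlist, the model never sees both at once.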

Banking / account

s0:  [verify_identity (hold)]                                  ← gatekeeper
s1:  [get_balance, list_recent_transactions]                   ← after auth
s2:  [start_transfer, dispute_charge (hold), close_account]    ← actions
any: [end_call]
Prompt focus: “Before sharing any account info, call verify_identity. Never quote a balance or transaction you haven’t fetched. Never promise a dispute outcome; only the system can.”
The anti-fabrication clause matters most here. A bank agent inventing a balance is a P0.

Debugging

Tool never fires

  • Description too vague. Name the user phrases that should trigger it.
  • System prompt missing “default to calling” wording.
  • Too many tools (>10). Drop or split via progressive reveal.
  • Add a few-shot example to the system prompt. This is the strongest signal.

Wrong arguments

  • Parameter description missing format/example. Add (e.g. 2026-04-30).
  • Free-text where you want fixed buckets. Use enum.
  • User says the value multiple ways. Normalise in the description.

Agent invents a result

Most common cause: the prompt asks the model to act on a tool result it never actually fetched. Two fixes, used together:
  1. Progressive reveal: gate the commit tool behind the read tool.
  2. Anti-fabrication clause in the prompt (see above).

Tool fires repeatedly

  • tool.result arriving while last_event is reply.started. Make sure your handler flushes on reply.done.
  • Tool slower than timeout_seconds. Agent gets internal timeout, user tries again. Bump the timeout.

Tool fires when it shouldn’t

Description is too broad. Add explicit anti-triggers:
**Use this for**: weather questions.
**Do not call for**: general chit-chat, scheduling, or any non-weather topic.