Solutions

Voice agents for customer support & contact centers

Replace legacy IVR with AI voice agents powered by the fastest, most accurate speech-to-text. Build end-to-end with our Voice Agent API, or drop Universal-3 Pro Streaming into your existing stack.

Delphi
Happy Scribe
Granola
Supernormal
Runway
Ashby
Jiminny
JotPsych
Earmark
EdgeTier
Genio
Grain
Loop
Calabrio
Veed.io
Dovetail
WhatConverts
CallRail
The problem

Legacy IVR is costing you customers

Touch-tone menus and brittle keyword bots strand callers in loops that end in a hang-up or an escalation. Modern voice agents — built on accurate streaming STT, a managed LLM, and natural TTS — resolve more calls before a human ever picks up.

Built for contact center performance

Latency ~150ms

Median (P50) streaming latency for Universal-3 Pro Streaming.

Entity 43%

Better alphanumeric accuracy than other providers.

Uptime 99.9%

SLA with SOC 2 Type 2 certification.

Scale 40TB+

Audio processed daily in production.

Two ways to build

Pick the API that fits your support stack

Ship a working support agent in an afternoon, or drop industry-leading STT into the orchestrator you already run.

Recommended

Voice Agent API

Our proprietary voice stack via one WebSocket. Connect, stream audio in, get audio back — we handle the rest.

Best for

  • Best-in-class voice agents — the preferred way to build with AssemblyAI
  • Customer support agents, AI companions, clinical intake, language learning
  • Teams shipping fast — working agent in an afternoon, no infra to manage
  • Claude Code compatible — paste the docs and build anything
$4.50/hr — speech, LLM, and voice all included
Get started for free

Free tier available · No credit card required

Bring Your Own Stack

Universal-3 Pro Streaming STT API

The STT layer for your cascading voice agent architecture. Works natively with your preferred orchestrator.

Best for

  • Teams already using LiveKit, Pipecat, or Vapi as their orchestration layer
  • Teams running cascading architectures (STT → LLM → TTS)
  • High-scale deployments where margin and full control matter
  • Complex workflows with RAG, custom tooling, or proprietary LLMs
  • HIPAA, SOC 2 — bring your own compliance infrastructure
$0.45/hr — transcription only, unlimited concurrent streams
View integration docs

No concurrency caps · Autoscaling included
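
To compare the two options on price, here is a rough cost sketch using the list prices quoted above ($4.50/hr and $0.45/hr). The monthly call volume is a made-up input, and the streaming figure covers transcription only, so LLM and TTS costs from other vendors would come on top.

```python
# List prices quoted on this page, in USD per hour of audio.
VOICE_AGENT_API_PER_HOUR = 4.50  # speech, LLM, and voice included
STREAMING_STT_PER_HOUR = 0.45    # transcription only


def monthly_cost(call_minutes: float, rate_per_hour: float) -> float:
    """Return the cost in USD for the given minutes of audio."""
    return round(call_minutes / 60 * rate_per_hour, 2)


if __name__ == "__main__":
    minutes = 100_000  # hypothetical monthly call volume
    print("Voice Agent API:", monthly_cost(minutes, VOICE_AGENT_API_PER_HOUR))
    print("Streaming STT only:", monthly_cost(minutes, STREAMING_STT_PER_HOUR))
```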

Your support agent pipeline

Ingest caller audio

Voice Agent API: a single WebSocket. For a BYO stack: Twilio Media Streams → Universal-3 Pro Streaming.
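
For the BYO path, the first step is pulling audio out of Twilio's WebSocket messages. The sketch below assumes the standard Media Streams message shape (a JSON object whose `media` events carry a base64-encoded mu-law payload). The helper name `extract_audio` is hypothetical, and forwarding the bytes to the streaming STT connection (including whatever encoding and sample-rate settings it expects) is left out.

```python
import base64
import json
from typing import Optional


def extract_audio(twilio_message: str) -> Optional[bytes]:
    """Return raw audio bytes from a Twilio 'media' event, else None."""
    event = json.loads(twilio_message)
    if event.get("event") != "media":
        # Other events such as 'connected', 'start', and 'stop' carry no audio.
        return None
    return base64.b64decode(event["media"]["payload"])
```

Each non-`None` result is one small audio chunk, ready to be forwarded to the transcription stream.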

Real-time transcription

Punctuation-based turn detection at ~150ms P50. Keyterm boosting for your product vocabulary.
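
As a toy illustration of the idea behind punctuation-based turn detection (not the service's actual logic, which is not described here), a client could treat terminal punctuation at the end of a transcript as the signal that the caller has finished speaking. The function name and the punctuation set are assumptions.

```python
# Characters treated as "the speaker has finished a sentence".
TERMINAL_PUNCTUATION = (".", "?", "!")


def is_end_of_turn(transcript: str) -> bool:
    """Return True if the transcript ends with terminal punctuation."""
    return transcript.rstrip().endswith(TERMINAL_PUNCTUATION)
```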

LLM reasoning

Intent classification, KB lookup, and response generation. Managed (Voice Agent API) or BYO.

Voice response

TTS audio streamed back to caller. Full round-trip under 1 second.

Quickstart

Get a working agent in minutes

Voice Agent API — recommended

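# Voice Agent API: one WebSocket, full pipeline
import asyncio
import json

import websockets

API_KEY = "YOUR_API_KEY"

def handle(event: dict) -> None:
    # Placeholder: route on event["type"] (transcript.user, audio.delta, tool.call, ...)
    print(event.get("type"))

async def run_agent():
    async with websockets.connect(
        "wss://agents.assemblyai.com/v1/ws",
        # websockets >= 14 names this keyword `additional_headers`; older releases use `extra_headers`
        additional_headers={"Authorization": f"Bearer {API_KEY}"},
    ) as ws: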
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "system_prompt": "You are a helpful support agent for Acme Corp.",
                "greeting": "Hi, this is Acme support — how can I help?",
                "output": {"voice": "ivy"},
            },
        }))
        # Stream audio in, get audio + transcript back
        async for msg in ws:
            handle(json.loads(msg))  # transcript.user, audio.delta, tool.call, ...

# Start the event loop and run the agent until the connection closes
asyncio.run(run_agent())
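
One way to flesh out `handle` is a small dispatch table keyed on the event type. The three type names come from the comment in the snippet above. The handler names, the `handled` list, and the assumption that every event is a JSON object with a `type` field are illustrative only, so check the API docs for the real payload shapes.

```python
from typing import Callable, Dict, List

handled: List[str] = []  # records which handler ran, for illustration only


def on_user_transcript(event: dict) -> None:
    handled.append("transcript")  # e.g. log what the caller said


def on_audio_delta(event: dict) -> None:
    handled.append("audio")  # e.g. forward agent audio to the caller


def on_tool_call(event: dict) -> None:
    handled.append("tool")  # e.g. run a lookup and send the result back


HANDLERS: Dict[str, Callable[[dict], None]] = {
    "transcript.user": on_user_transcript,
    "audio.delta": on_audio_delta,
    "tool.call": on_tool_call,
}


def handle(event: dict) -> None:
    """Route an event to its handler; unknown event types are ignored."""
    handler = HANDLERS.get(event.get("type"))
    if handler is not None:
        handler(event)
```

Ignoring unknown types keeps the loop running if the server introduces events this client does not yet know about.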

Universal-3 Pro Streaming + LiveKit — BYO stack

# LiveKit + AssemblyAI STT in a cascading pipeline
from livekit.agents import Agent, AgentSession, JobContext, WorkerOptions, cli
from livekit.plugins import assemblyai, cartesia, openai, silero

class SupportAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a support agent for Acme Corp. Be concise.",
        )

async def entrypoint(ctx: JobContext):
    session = AgentSession(
        stt=assemblyai.STT(
            model="universal-streaming-english",
            # Boost recognition of product-specific vocabulary
            keyterms_prompt=["Acme Pro", "tier-2", "premium plan"],
        ),
        llm=openai.LLM(model="gpt-4o"),
        tts=cartesia.TTS(),
        vad=silero.VAD.load(),
    )
    await session.start(room=ctx.room, agent=SupportAgent())

if __name__ == "__main__":
    # Run as a LiveKit worker process
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))

Resolution-grade accuracy

Universal-3 Pro Streaming reaches 94%+ accuracy on noisy contact-center audio: the difference between a deflected ticket and an angry escalation.

PII redaction by default

Names, card numbers, addresses, and account IDs masked before transcripts hit your CRM, data warehouse, or QA stack.
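
To make the masking concrete, here is a toy illustration of what a redacted transcript could look like. The regex is a stand-in that only catches 13 to 16 digit card-like numbers. It is not how the service detects PII, and the `[CARD_NUMBER]` placeholder is an assumed format.

```python
import re

# 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def mask_card_numbers(transcript: str) -> str:
    """Replace card-like digit runs with a fixed placeholder."""
    return CARD_PATTERN.sub("[CARD_NUMBER]", transcript)
```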

Real-time intelligence

Topic detection, sentiment, and call outcomes available on the live stream — coach agents in the moment, not the next day.

Frequently asked questions