customers
All customer stories
Top Voice AI companies are building with Assembly.
resources
Latest Release
Voice Agent API
Voice agents that get it right, respond instantly, and ship the same day with our new Voice Agent API
resources
Build voice agents that understand, detect, and respond in 99 languages with automatic language detection and real-time code-switching — no per-language pipeline required.
Language detection
LiveSpanish detected
"Buenos días, necesito ayuda con mi cuenta..."
French detected
"Bonjour, je voudrais vérifier mon solde..."
German detected
"Ich brauche Hilfe mit meiner Bestellung..."
Portuguese detected
"Olá, preciso cancelar minha assinatura..."
Global companies spend millions operating separate support queues per language — hiring bilingual agents, routing through IVR menus, and accepting degraded accuracy from English-centric ASR. When a customer code-switches mid-sentence, most systems break. Longer handle times, higher staffing costs, and a fragmented experience for non-English speakers all follow. AssemblyAI's multilingual pipeline eliminates the per-language bottleneck with a single model that detects, transcribes, and supports 99 languages — including real-time code-switching.
Total languages supported with automatic detection.
Core languages with native code-switching on Universal-3 Pro Streaming.
Mean word error rate across FLEURS multilingual benchmark.
SLA with SOC 2 Type 2 certification.
Two ways to build
Ship a working multilingual voice agent in an afternoon, or drop industry-leading STT into the orchestration stack you already run.
Our proprietary voice stack via one WebSocket. Build multilingual support agents with automatic language detection and localized voice response — zero infra to manage.
Best for
Free tier available · No credit card required
The multilingual STT layer for your voice agent. Works natively with LiveKit, Pipecat, Vapi, and Twilio — native code-switching across 6 core languages, automatic routing for 99 total.
Best for
No concurrency caps · Autoscaling included
Automatic language detection
The model identifies the dominant language in real time with confidence scores. Supports constrained detection with expected language lists — no IVR language menu required.
Multilingual transcription with code-switching
Universal-3 Pro Streaming natively handles code-switching across 6 core languages. Automatic model routing extends coverage to 99 languages.
Language-aware LLM processing
Route transcripts through the LLM Gateway with the detected language code passed downstream, so the LLM responds in the caller's language without explicit prompting.
Localized voice response
Voice Agent API generates TTS in the detected language with natural prosody. Full STT → LLM → TTS round trip in sub-1-second for core languages.
Pipeline
Ingest multilingual audio
Detect language + transcribe
Route to language-aware LLM
Respond in caller's language
Voice Agent API — recommended
# Voice Agent API: multilingual support agent
import asyncio, json, websockets
API_KEY = "YOUR_API_KEY"
async def run_agent():
async with websockets.connect(
"wss://agents.assemblyai.com/v1/ws",
additional_headers={"Authorization": f"Bearer {API_KEY}"},
) as ws:
await ws.send(json.dumps({
"type": "session.update",
"session": {
"system_prompt": (
"You are a multilingual support agent. Detect the caller's "
"language and respond in that language. Handle account "
"inquiries, billing, and technical support."
),
"greeting": "Hello! Hola! Bonjour! How can I help today?",
"input": {"keyterms": ["cuenta", "Abonnement", "Konto", "fatura"]},
"output": {"voice": "ivy"},
},
}))
async for msg in ws:
handle(json.loads(msg)) # transcript.user, reply.audio, tool.call, ...
Universal-3 Pro Streaming + LiveKit — BYO stack
# LiveKit + AssemblyAI STT in a multilingual voice agent pipeline
from livekit.agents import Agent, AgentSession, TurnHandlingOptions
from livekit.plugins import assemblyai, cartesia, openai, silero
from livekit.plugins.turn_detector.multilingual import MultilingualModel
class MultilingualAgent(Agent):
def __init__(self):
super().__init__(
instructions=(
"You are a multilingual support agent. Respond in the "
"caller's detected language. Handle global customer "
"inquiries across all product lines."
),
)
async def entrypoint(ctx):
session = AgentSession(
stt=assemblyai.STT(
model="u3-rt-pro",
language_detection=True, # include language metadata in turns
vad_threshold=0.3, # match Silero activation_threshold
),
llm=openai.LLM(model="gpt-4o"),
tts=cartesia.TTS(),
vad=silero.VAD.load(activation_threshold=0.3),
turn_handling=TurnHandlingOptions(
turn_detection=MultilingualModel(), # multilingual turn detection model
endpointing={"min_delay": 0.5, "max_delay": 3.0},
),
)
await session.start(room=ctx.room, agent=MultilingualAgent())
Universal-3 Pro Streaming handles 6 core languages at the highest accuracy. Automatic model routing extends coverage to 99 languages — one API call covers every market.
When callers switch languages mid-sentence — "I need help with mi cuenta, s'il vous plaît" — Universal-3 Pro Streaming transcribes each segment in the correct language without miscategorizing the call.
Universal-3 Pro Streaming goes beyond standard language codes with deep understanding of regional dialects — Quebecois French, Mexican Spanish, Brazilian Portuguese — capturing colloquial expressions and accent-specific pronunciation.
We require a leading edge speech-to-text provider that can meet our specialized needs: fast, accurate, targeted, and multilingual.
Super
Read more The transcription accuracy, reliability, and speed of AssemblyAI's API have greatly enhanced our operations.
Raj Shankar, SVP Product — Calabrio