Solutions

Voice agents for multilingual global support

Build voice agents that understand, detect, and respond in 99 languages with automatic language detection and real-time code-switching — no per-language pipeline required.

Language detection

Live

Spanish detected

"Buenos días, necesito ayuda con mi cuenta..."

99.4%

French detected

"Bonjour, je voudrais vérifier mon solde..."

98.7%

German detected

"Ich brauche Hilfe mit meiner Bestellung..."

97.9%

Portuguese detected

"Olá, preciso cancelar minha assinatura..."

98.2%
Delphi
Happy Scribe
Granola
Supernormal
Runway
Ashby
Jiminny
JotPsych
Earmark
EdgeTier
Genio
Grain
Loop
Calabrio
Veed.io
Dovetail
WhatConverts
CallRail
Delphi
Happy Scribe
Granola
Supernormal
Runway
Ashby
Jiminny
JotPsych
Earmark
EdgeTier
Genio
Grain
Loop
Calabrio
Veed.io
Dovetail
WhatConverts
CallRail
Delphi
Happy Scribe
Granola
Supernormal
Runway
Ashby
Jiminny
JotPsych
Earmark
EdgeTier
Genio
Grain
Loop
Calabrio
Veed.io
Dovetail
WhatConverts
CallRail
Delphi
Happy Scribe
Granola
Supernormal
Runway
Ashby
Jiminny
JotPsych
Earmark
EdgeTier
Genio
Grain
Loop
Calabrio
Veed.io
Dovetail
WhatConverts
CallRail
The problem

One language per agent doesn't scale globally

Global companies spend millions operating separate support queues per language — hiring bilingual agents, routing through IVR menus, and accepting degraded accuracy from English-centric ASR. When a customer code-switches mid-sentence, most systems break. Longer handle times, higher staffing costs, and a fragmented experience for non-English speakers all follow. AssemblyAI's multilingual pipeline eliminates the per-language bottleneck with a single model that detects, transcribes, and supports 99 languages — including real-time code-switching.

Built for global-scale voice operations

Languages 99

Total languages supported with automatic detection.

Code-switching 6

Core languages with native code-switching on Universal-3 Pro Streaming.

Multilingual WER 4.58%

Mean word error rate across FLEURS multilingual benchmark.

Uptime 99.9%

SLA with SOC 2 Type 2 certification.

Two ways to build

Pick the API that fits your multilingual stack

Ship a working multilingual voice agent in an afternoon, or drop industry-leading STT into the orchestration stack you already run.

Recommended

Voice Agent API

Our proprietary voice stack via one WebSocket. Build multilingual support agents with automatic language detection and localized voice response — zero infra to manage.

Best for

  • Multilingual support across 6 core code-switching languages
  • Built-in language detection and model routing
  • Teams shipping fast — working multilingual agent in an afternoon
  • Claude Code compatible — paste the docs and build anything
$4.50/hr — speech, LLM, and voice all included
Get started for free

Free tier available · No credit card required

Bring Your Own Stack

Universal-3 Pro Streaming STT API

The multilingual STT layer for your voice agent. Works natively with LiveKit, Pipecat, Vapi, and Twilio — native code-switching across 6 core languages, automatic routing for 99 total.

Best for

  • Teams using LiveKit, Pipecat, Vapi, or Twilio as their orchestrator
  • 6 core languages with native code-switching on Universal-3 Pro Streaming
  • Automatic routing for 99-language coverage
  • Real-time language detection with confidence scores
  • Keyterm prompting for domain-specific vocabulary
$0.45/hr — transcription only, unlimited streams
View integration docs

No concurrency caps · Autoscaling included

One pipeline handles every language your customers speak

Automatic language detection

The model identifies the dominant language in real time with confidence scores. Supports constrained detection with expected language lists — no IVR language menu required.

Multilingual transcription with code-switching

Universal-3 Pro Streaming natively handles code-switching across 6 core languages. Automatic model routing extends coverage to 99 languages.

Language-aware LLM processing

Route transcripts through the LLM Gateway with the detected language code passed downstream, so the LLM responds in the caller's language without explicit prompting.

Localized voice response

Voice Agent API generates TTS in the detected language with natural prosody. Full STT → LLM → TTS round trip in sub-1-second for core languages.

language

Pipeline

Ingest multilingual audio

Detect language + transcribe

Route to language-aware LLM

Respond in caller's language

Quickstart

Build a multilingual voice agent in minutes

Voice Agent API — recommended

# Voice Agent API: multilingual support agent
import asyncio, json, websockets

API_KEY = "YOUR_API_KEY"

async def run_agent():
    async with websockets.connect(
        "wss://agents.assemblyai.com/v1/ws",
        additional_headers={"Authorization": f"Bearer {API_KEY}"},
    ) as ws:
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "system_prompt": (
                    "You are a multilingual support agent. Detect the caller's "
                    "language and respond in that language. Handle account "
                    "inquiries, billing, and technical support."
                ),
                "greeting": "Hello! Hola! Bonjour! How can I help today?",
                "input": {"keyterms": ["cuenta", "Abonnement", "Konto", "fatura"]},
                "output": {"voice": "ivy"},
            },
        }))
        async for msg in ws:
            handle(json.loads(msg))  # transcript.user, reply.audio, tool.call, ...

Universal-3 Pro Streaming + LiveKit — BYO stack

# LiveKit + AssemblyAI STT in a multilingual voice agent pipeline
from livekit.agents import Agent, AgentSession, TurnHandlingOptions
from livekit.plugins import assemblyai, cartesia, openai, silero
from livekit.plugins.turn_detector.multilingual import MultilingualModel

class MultilingualAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions=(
                "You are a multilingual support agent. Respond in the "
                "caller's detected language. Handle global customer "
                "inquiries across all product lines."
            ),
        )

async def entrypoint(ctx):
    session = AgentSession(
        stt=assemblyai.STT(
            model="u3-rt-pro",
            language_detection=True,                  # include language metadata in turns
            vad_threshold=0.3,                        # match Silero activation_threshold
        ),
        llm=openai.LLM(model="gpt-4o"),
        tts=cartesia.TTS(),
        vad=silero.VAD.load(activation_threshold=0.3),
        turn_handling=TurnHandlingOptions(
            turn_detection=MultilingualModel(),       # multilingual turn detection model
            endpointing={"min_delay": 0.5, "max_delay": 3.0},
        ),
    )
    await session.start(room=ctx.room, agent=MultilingualAgent())

99-language coverage with automatic routing

Universal-3 Pro Streaming handles 6 core languages at the highest accuracy. Automatic model routing extends coverage to 99 languages — one API call covers every market.

Real-time code-switching detection

When callers switch languages mid-sentence — "I need help with mi cuenta, s'il vous plaît" — Universal-3 Pro Streaming transcribes each segment in the correct language without miscategorizing the call.

Language-specific dialect recognition

Universal-3 Pro Streaming goes beyond standard language codes with deep understanding of regional dialects — Quebecois French, Mexican Spanish, Brazilian Portuguese — capturing colloquial expressions and accent-specific pronunciation.

Frequently asked questions