Voice agents for AI medical scribe & ambient documentation

Build ambient AI scribes that listen to patient-provider conversations and automatically generate structured clinical notes. Powered by Medical Mode (87% fewer medical entity errors), speaker diarization, and LLM Gateway for SOAP note generation.

SOAP note — auto-generated

Visit: Annual wellness · Dr. Patel · 14 min

Subjective

Patient reports persistent fatigue over 3 weeks. Denies chest pain, SOB. Sleep quality poor…

Objective

BP 128/82, HR 74, Temp 98.6°F. BMI 27.3. No lymphadenopathy…

Assessment & plan

R53.83 Fatigue. Order CBC, CMP, TSH, ferritin. F/U 2 weeks…

Delphi
Happy Scribe
Granola
Supernormal
Runway
Ashby
Jiminny
JotPsych
Earmark
EdgeTier
Genio
Grain
Loop
Calabrio
Veed.io
Dovetail
WhatConverts
CallRail
The problem

Documentation is burning out your clinicians

Providers spend two hours on documentation for every one hour of patient care. That overhead drives burnout, shrinks appointment availability, and costs health systems thousands per provider annually in lost revenue. Ambient AI scribes — built on clinical-grade speech-to-text, speaker diarization, and LLM-powered note generation — eliminate the typing so providers can focus on the patient in front of them.

Built for clinical documentation accuracy

Medical accuracy 87%

Fewer medical entity errors with Medical Mode.

Ambient range 20ft+

Far-field capture as providers move around the room.

Latency ~150ms

Median (P50) streaming latency on Universal-3 Pro.

LLM Gateway 25+

Models for note generation — Claude, GPT, Gemini, and more through one API.

Two ways to build

Pick the API that fits your scribe architecture

Ship an ambient scribe with our managed pipeline, or drop medical-grade STT into the orchestrator you already run.

Recommended

Voice Agent API

Our proprietary voice stack with Medical Mode via one WebSocket. Real-time ambient transcription with built-in speaker diarization, LLM reasoning, and TTS for interactive scribes.

Best for

  • Interactive ambient scribes with voice confirmation
  • Medical Mode with 87% fewer entity errors built in
  • Teams shipping fast — working scribe in an afternoon
  • Business Associate Addendum (BAA) available for PHI workloads
$4.50/hr — speech, LLM, and voice all included

Free tier available · No credit card required

Bring Your Own Stack

Universal-3 Pro Streaming STT API

The medical-grade STT layer for your ambient scribe pipeline. Pair with your own LLM for SOAP generation and your own EHR integration logic.

Best for

  • Teams using LiveKit, Pipecat, or custom orchestration
  • Cascading architectures (STT → LLM → note generation)
  • Medical Mode add-on with keyterm prompting for formulary
  • Complex EHR integrations (Epic, Cerner, custom)
  • BAA-eligible, SOC 2 Type 2 — bring your own compliance infra
$0.45/hr — transcription only, unlimited streams

No concurrency caps · Autoscaling included
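
To compare the two options on cost, a rough per-visit estimate can be worked out from the hourly rates above. This is a minimal sketch that assumes cost scales linearly with audio duration, with no minimums or rounding (an assumption, not a statement of billing policy), and it uses the 14-minute visit from the sample note. The function name is illustrative. The Universal-3 Pro figure covers transcription only, so LLM and EHR costs come on top.

```python
from decimal import Decimal

# Hourly rates quoted on this page (USD per hour of audio).
VOICE_AGENT_RATE = Decimal("4.50")    # speech, LLM, and voice included
STREAMING_STT_RATE = Decimal("0.45")  # transcription only


def visit_cost(rate_per_hour: Decimal, minutes: int) -> Decimal:
    """Estimate cost for one visit, assuming linear per-minute billing."""
    cost = rate_per_hour * Decimal(minutes) / Decimal(60)
    return cost.quantize(Decimal("0.001"))


if __name__ == "__main__":
    print("Voice Agent API:", visit_cost(VOICE_AGENT_RATE, 14))
    print("Streaming STT:", visit_cost(STREAMING_STT_RATE, 14))
```

Under these assumptions a 14-minute visit comes to about $1.05 on the Voice Agent API and about $0.105 for transcription alone.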

Your ambient scribe pipeline

Capture clinical audio

Voice Agent API: stream audio over a single WebSocket. BYO stack: send audio from a smartphone, tablet, or room mic to Universal-3 Pro Streaming. Far-field capture works from 20+ feet.
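
Streaming clients typically send captured audio as small fixed-duration frames. The sketch below shows one way to slice a raw PCM buffer into frames before sending them over a socket. The sample rate, sample width, and frame duration are illustrative assumptions, not documented requirements, and `pcm_frames` is a hypothetical helper.

```python
from typing import Iterator

SAMPLE_RATE_HZ = 16_000  # assumed capture rate
BYTES_PER_SAMPLE = 2     # assumed 16-bit mono PCM
FRAME_MS = 50            # assumed frame duration


def pcm_frames(pcm: bytes, frame_ms: int = FRAME_MS) -> Iterator[bytes]:
    """Yield consecutive fixed-duration frames; the last one may be shorter."""
    frame_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * frame_ms // 1000
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]
```

Each yielded frame would then be written to the open connection in order; the frame size to use in practice should come from the integration docs.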

Transcribe with Medical Mode

Medical Mode produces 87% fewer medical entity errors. Speaker diarization labels provider and patient speech automatically, with ~150ms median (P50) streaming latency.

Generate structured notes

LLM Gateway organizes the diarized transcript into SOAP, DAP, or specialty-specific templates. 25+ models across Claude, GPT, and Gemini.
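
As an illustration of this step, the sketch below turns a diarized transcript into a note-generation prompt for whichever model is selected. It does not call LLM Gateway. `Utterance`, `TEMPLATES`, and `build_note_prompt` are hypothetical names, and the prompt wording is only an example.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker: str  # role label from diarization, e.g. "Provider" or "Patient"
    text: str


# Section headings for the two templates named above.
TEMPLATES = {
    "SOAP": ["Subjective", "Objective", "Assessment", "Plan"],
    "DAP": ["Data", "Assessment", "Plan"],
}


def build_note_prompt(utterances: list[Utterance], template: str = "SOAP") -> str:
    """Format speaker-labelled turns and ask for one note in the given template."""
    sections = TEMPLATES[template]
    transcript = "\n".join(f"{u.speaker}: {u.text}" for u in utterances)
    return (
        f"Write a {template} note with these sections: {', '.join(sections)}.\n"
        "Use only facts stated in the transcript.\n\n"
        f"Transcript:\n{transcript}"
    )
```

Keeping the speaker label on every line lets the model attribute symptoms to the patient and findings or orders to the provider.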

Review and sync to EHR

The provider reviews the draft note, edits it as needed, and approves it. The approved note is then pushed to Epic, Cerner, or any other EHR via API integration.
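
One way to enforce the review step in code is to gate the EHR push on explicit approval. The sketch below is a minimal illustration; `DraftNote`, `NoteStatus`, and `sync_to_ehr` are hypothetical names, and `push` stands in for whatever EHR client function your integration provides.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class NoteStatus(Enum):
    DRAFT = "draft"
    APPROVED = "approved"


@dataclass
class DraftNote:
    text: str
    status: NoteStatus = NoteStatus.DRAFT

    def edit(self, new_text: str) -> None:
        # Any edit returns the note to draft so it must be re-approved.
        self.text = new_text
        self.status = NoteStatus.DRAFT

    def approve(self) -> None:
        self.status = NoteStatus.APPROVED


def sync_to_ehr(note: DraftNote, push: Callable[[str], None]) -> None:
    """Call push(note.text) only if the provider has approved the note."""
    if note.status is not NoteStatus.APPROVED:
        raise PermissionError("note must be approved by the provider before sync")
    push(note.text)
```

Resetting the status on every edit means a note can never reach the EHR in a state the provider has not signed off on.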

Encounter timeline

Provider

"Let's review your metformin dosage — any side effects with the 500mg?"

Patient

"Some nausea in the morning, but it's getting better."

Provider

"Good. We'll keep the current dose and recheck A1C in 3 months."

Quickstart

Build a medical scribe in minutes

Voice Agent API — recommended

# Voice Agent API: ambient scribe with Medical Mode
import asyncio
import json

import websockets

API_KEY = "YOUR_API_KEY"

def handle(event: dict) -> None:
    # Placeholder handler: replace with your own logic for
    # transcript.user, reply.audio, tool.call, ...
    print(event.get("type"))

async def run_scribe():
    async with websockets.connect(
        "wss://agents.assemblyai.com/v1/ws",
        # websockets >= 14 names this additional_headers; older releases use extra_headers
        additional_headers={"Authorization": f"Bearer {API_KEY}"},
    ) as ws:
        # Configure the session before any audio is sent
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "system_prompt": (
                    "You are an ambient medical scribe. Listen to the "
                    "encounter and generate a SOAP note when the visit ends."
                ),
                "input": {"keyterms": ["metformin", "lisinopril", "A1C", "Dr. Patel"]},
                "output": {"voice": "ivy"},
            },
        }))
        # Stream encounter audio in (not shown); transcript and note events come back
        async for msg in ws:
            handle(json.loads(msg))

if __name__ == "__main__":
    asyncio.run(run_scribe())

Universal-3 Pro Streaming + LiveKit — BYO stack

# LiveKit + AssemblyAI Medical Mode in a cascading scribe pipeline
from livekit import agents
from livekit.agents import Agent, AgentSession
from livekit.plugins import assemblyai, cartesia, openai, silero

class MedicalScribe(Agent):
    def __init__(self):
        super().__init__(
            instructions=(
                "You are an ambient scribe for Dr. Patel's clinic. "
                "Generate SOAP notes from the encounter transcript."
            ),
        )

async def entrypoint(ctx: agents.JobContext):
    session = AgentSession(
        stt=assemblyai.STT(
            model="u3-rt-pro",
            domain="medical-v1",                       # Enable Medical Mode
            keyterms_prompt=["metformin", "lisinopril", "A1C", "Dr. Patel"],
            min_turn_silence=800,                      # Clinicians pause to think
            max_turn_silence=2000,                     # Don't fragment chart-review pauses
        ),
        llm=openai.LLM(model="gpt-4o"),
        tts=cartesia.TTS(),
        vad=silero.VAD.load(),
    )
    await session.start(room=ctx.room, agent=MedicalScribe())

if __name__ == "__main__":
    # Run as a LiveKit worker so entrypoint is called for each room
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))

Medical Mode accuracy

87% fewer medical entity errors — correctly captures drug names, dosages, anatomical terms, and ICD codes from ambient exam room audio.
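
Keyterm prompting, as used in the quickstart snippets, takes a list of terms such as drug names and provider names. A minimal sketch of assembling that list from a clinic formulary and a staff roster follows. `build_keyterms` is a hypothetical helper, and the `max_terms` cap is an assumed value rather than a documented limit.

```python
from typing import Iterable


def build_keyterms(*sources: Iterable[str], max_terms: int = 100) -> list[str]:
    """Merge term lists, dropping blanks and case-insensitive duplicates."""
    seen: set[str] = set()
    terms: list[str] = []
    for source in sources:
        for raw in source:
            term = raw.strip()
            key = term.casefold()
            if term and key not in seen:
                seen.add(key)
                terms.append(term)
    # Assumed cap; earlier sources take priority when the list is truncated.
    return terms[:max_terms]
```

Passing the formulary first keeps drug names ahead of other terms if the list has to be truncated.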

Speaker diarization

Real-time speaker diarization separates provider and patient speech automatically — essential for mapping conversation segments to SOAP note sections.
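
To show how diarized segments can feed note generation, the sketch below merges consecutive segments from the same speaker and maps raw speaker labels to roles. The `(label, text)` segment shape and the "A"/"B" labels are assumptions for illustration, and `merge_turns` is a hypothetical helper; deciding which label is the provider is left to the caller.

```python
def merge_turns(
    segments: list[tuple[str, str]],
    roles: dict[str, str],
) -> list[tuple[str, str]]:
    """Collapse back-to-back segments from one speaker into a single turn."""
    merged: list[tuple[str, str]] = []
    for label, text in segments:
        role = roles.get(label, label)  # fall back to the raw label if unmapped
        if merged and merged[-1][0] == role:
            merged[-1] = (role, merged[-1][1] + " " + text)
        else:
            merged.append((role, text))
    return merged
```

Merged, role-labelled turns are easier to route: patient turns mostly inform the Subjective section, while provider turns carry findings and the plan.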

LLM Gateway

Access 25+ models through one unified API — Claude, GPT, Gemini, and more — for SOAP note generation. Customizable templates for any specialty: primary care, psych, surgery, radiology.
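
Whatever model and template are used, it can help to check a generated note for its required headings before showing it to the provider. The sketch below is one such check. `missing_sections` is a hypothetical helper, and the heading list mirrors the sample note at the top of this page.

```python
import re

# Headings used by the sample SOAP note on this page.
SOAP_HEADINGS = ["Subjective", "Objective", "Assessment & plan"]


def missing_sections(note_text: str, required: list[str]) -> list[str]:
    """Return required headings that never appear at the start of a line."""
    missing = []
    for name in required:
        pattern = rf"^\s*{re.escape(name)}\b"
        if not re.search(pattern, note_text, flags=re.IGNORECASE | re.MULTILINE):
            missing.append(name)
    return missing
```

A non-empty result can trigger a retry with the same or a different model before the draft reaches the review step.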

Frequently asked questions