Voice agents for field service operations

Build hands-free voice agents that let technicians pull up manuals, log work completed, order parts, and update work orders — all through voice while their hands stay on the job.

Work order (live)

WO #: FS-20260518-4471
Asset: Carrier 50XC 15T RTU
Task: Compressor replacement
Status: In progress
Parts ordered: Pending

Voice note captured

"Compressor seized — replacing scroll assembly. Need part #3BAK-0601, ordering now via voice."

Delphi
Happy Scribe
Granola
Supernormal
Runway
Ashby
Jiminny
JotPsych
Earmark
EdgeTier
Genio
Grain
Loop
Calabrio
Veed.io
Dovetail
WhatConverts
CallRail
The problem

Field techs lose hours every shift to paperwork

Most field technicians spend 30–60 minutes per shift tapping through FSM screens — on ladders, in crawl spaces, on rooftops, with greasy or wet hands. Voice would fix it, but consumer ASR breaks on noisy job sites and butchers part numbers. The result: late work orders, missing parts, and revenue stuck in unbilled hours. AssemblyAI's purpose-built voice AI handles the real noise, the real vocabulary, and the real workflows of field service.

Built for the real conditions of field work

Latency ~150ms

Median streaming latency for hands-free voice prompts and confirmations.

Entity accuracy +28%

28% better consecutive number recognition for part SKUs, model numbers, and asset IDs.

Languages 99

Total languages supported for multilingual field technician workforces.

Keyterms 100

Domain-specific terms per session — boost recognition of parts, tools, and procedures.

Two ways to build

Pick the API that fits your field service stack

Ship a working hands-free agent in an afternoon, or drop best-in-class streaming STT into the FSM platform you already run.

Recommended

Voice Agent API

Our proprietary voice stack via one WebSocket. Run a hands-free agent that captures voice notes, confirms back via TTS, and writes updates to your FSM platform — zero infra to manage.

Best for

  • Hands-free voice capture and read-back confirmation
  • Tool calls for FSM write-back (ServiceTitan, FieldEdge, Jobber, custom)
  • Built-in keyterm prompting for parts catalogs and asset IDs
  • Claude Code compatible — paste the docs and build anything
$4.50/hr — speech, LLM, and voice all included

Free tier available · No credit card required

Bring Your Own Stack

Universal-3 Pro Streaming STT API

The live transcription layer for your FSM platform. Works natively with LiveKit, Pipecat, Vapi, and Twilio — entity-accurate, noise-robust, and multilingual out of the box.

Best for

  • Teams running their own LLM and FSM integrations
  • ~150ms P50 latency for real-time voice prompts
  • 28% better consecutive number recognition for SKUs and asset IDs
  • 99-language coverage via automatic model routing
  • Trained on real-world noisy audio — job sites work
$0.45/hr — transcription only, unlimited streams

No concurrency caps · Autoscaling included
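To compare the two hourly rates for a given fleet, the arithmetic can be sketched as below. The rates come from the two cards above; the fleet size, audio hours per shift, and shift count are illustrative assumptions, and the Streaming STT figure covers transcription only, so any LLM or TTS you run alongside it is extra.

```python
# Hourly rates quoted in the two cards above.
VOICE_AGENT_RATE = 4.50    # USD per hour: speech, LLM, and voice included
STREAMING_STT_RATE = 0.45  # USD per hour: transcription only


def monthly_cost(technicians: int, audio_hours_per_shift: float,
                 shifts_per_month: int, rate_per_hour: float) -> float:
    """Return the estimated monthly spend in USD, rounded to cents."""
    total_hours = technicians * audio_hours_per_shift * shifts_per_month
    return round(total_hours * rate_per_hour, 2)


if __name__ == "__main__":
    # Hypothetical fleet: 10 technicians, 0.5 audio hours per shift, 20 shifts.
    print(monthly_cost(10, 0.5, 20, VOICE_AGENT_RATE))
    print(monthly_cost(10, 0.5, 20, STREAMING_STT_RATE))
```

With these assumed numbers the fleet streams 100 audio hours a month, so the estimate scales linearly with whichever rate applies.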

One pipeline turns voice into structured field-service data

Capture hands-free voice

Stream audio from a Bluetooth headset, phone speaker, or work-truck mic. No tapping, no swiping — technicians keep both hands on the job.
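As a rough sketch of the capture step, raw microphone audio is usually cut into short frames before it goes over the socket. The helper below assumes 16-bit mono PCM at 16 kHz (the sample rate used in the streaming quickstart) and a 50 ms frame; the frame length and the name `pcm_frames` are illustrative choices, not documented requirements.

```python
from typing import Iterator


def pcm_frames(pcm: bytes, sample_rate: int = 16000,
               frame_ms: int = 50, bytes_per_sample: int = 2) -> Iterator[bytes]:
    """Yield fixed-duration frames from raw mono PCM; the last may be shorter."""
    frame_bytes = sample_rate * frame_ms // 1000 * bytes_per_sample
    if frame_bytes <= 0:
        raise ValueError("frame size must be positive")
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]


if __name__ == "__main__":
    one_second_of_silence = bytes(16000 * 2)
    frames = list(pcm_frames(one_second_of_silence))
    print(len(frames), len(frames[0]))
```

Each yielded frame can be passed to `ws.send(...)` as a binary message, which is how the streaming quickstart forwards its audio chunks.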

Transcribe with noise robustness

Universal-3 Pro handles loud HVAC units, generators, road traffic, and wind. Speaker labels separate technician from customer when on-site.

Extract structured work-order data

Finalized turns feed the LLM Gateway (25+ models across Claude, GPT, and Gemini) to extract part numbers, asset IDs, work status, and parts requests as structured fields.
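Whatever model does the extraction, its output is worth validating before it touches a work order. The sketch below assumes the model has been asked to return a JSON object with the same fields as the `update_work_order` tool in the quickstart (`wo_id`, `status`, optional `part_number`); `WorkOrderUpdate` and `parse_work_order` are hypothetical names, and the LLM call itself is left out.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkOrderUpdate:
    wo_id: str
    status: str
    part_number: Optional[str] = None


def parse_work_order(raw: str) -> WorkOrderUpdate:
    """Parse the model's JSON output; raise ValueError if it is unusable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("model output must be a JSON object")
    # wo_id and status are required, mirroring the tool schema.
    missing = [key for key in ("wo_id", "status") if not data.get(key)]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    part = data.get("part_number")
    return WorkOrderUpdate(
        wo_id=str(data["wo_id"]).strip(),
        status=str(data["status"]).strip(),
        part_number=str(part).strip() if part else None,
    )
```

Rejecting incomplete output here lets the agent ask the technician to repeat the note instead of writing a partial record.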

Confirm and write back to FSM

Read captured fields back to the technician for confirmation, then push to ServiceTitan, FieldEdge, HousecallPro, Jobber, or your custom backend via tool calls or webhooks.
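One way to phrase the read-back is to spell identifiers out character by character, so a technician can catch a misheard digit before anything is written. This is a minimal sketch: `spell_out`, `confirmation_prompt`, and the spoken names for punctuation are illustrative assumptions, and the resulting string would be handed to whatever TTS voice you use.

```python
from typing import Optional

# Spoken replacements for punctuation commonly found in part numbers.
SPOKEN = {"-": "dash", "/": "slash", ".": "dot"}


def spell_out(identifier: str) -> str:
    """Return the identifier as space-separated characters for TTS."""
    return " ".join(
        SPOKEN.get(ch, ch.upper()) for ch in identifier if not ch.isspace()
    )


def confirmation_prompt(wo_id: str, status: str,
                        part_number: Optional[str] = None) -> str:
    """Build a short read-back sentence from the captured fields."""
    parts = [f"Work order {spell_out(wo_id)}", f"status {status}"]
    if part_number:
        parts.append(f"part {spell_out(part_number)}")
    return ", ".join(parts) + ". Is that correct?"


if __name__ == "__main__":
    print(confirmation_prompt("FS-20260518-4471", "In progress", "3BAK-0601"))
```

Only after the technician answers yes would the fields be pushed to the FSM backend.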

Field service pipeline

Capture hands-free voice notes

Transcribe — noise-robust + multilingual

Extract structured work-order fields

Confirm + push to FSM platform

Quickstart

Build a hands-free field service voice agent in minutes

Voice Agent API — hands-free agent with FSM write-back

# Voice Agent API: hands-free field service voice agent
import asyncio, json, websockets

API_KEY = "YOUR_API_KEY"

async def run_agent():
    async with websockets.connect(
        "wss://agents.assemblyai.com/v1/ws",
        additional_headers={"Authorization": f"Bearer {API_KEY}"},
    ) as ws:
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "system_prompt": (
                    "You are a hands-free assistant for an HVAC field technician. "
                    "Capture part numbers, asset IDs, and work status. Always "
                    "confirm captured fields back to the tech before calling "
                    "update_work_order. Keep responses under 2 sentences."
                ),
                "greeting": "Ready when you are — what's the update?",
                "input": {"keyterms": ["Carrier 50XC", "Trane XR", "scroll assembly", "compressor"]},
                "output": {"voice": "ivy"},
                "tools": [{
                    "type": "function",
                    "name": "update_work_order",
                    "description": "Push captured fields to the FSM platform.",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "wo_id": {"type": "string"},
                            "part_number": {"type": "string"},
                            "status": {"type": "string"},
                        },
                        "required": ["wo_id", "status"],
                    },
                }],
            },
        }))
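        async for msg in ws:
            handle(json.loads(msg))  # transcript.user, reply.audio, tool.call, ...


def handle(event):
    # Placeholder dispatcher: replace with your own logic per event type
    # (play reply audio, run update_work_order on a tool call, and so on).
    print(event.get("type"))


if __name__ == "__main__":
    asyncio.run(run_agent())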
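The `handle` function in the quickstart is left to the reader. One possible shape for its tool-call branch is a small registry that maps tool names to Python functions. The sketch below takes the tool name and arguments directly, because the exact layout of the tool-call event is not shown here and is left as an assumption; `TOOLS`, `tool`, and `dispatch_tool_call` are hypothetical helpers, and the body of `update_work_order` is a stand-in for a real FSM API call.

```python
from typing import Any, Callable, Dict

# Registry of callable tools, keyed by the name declared in the session config.
TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {}


def tool(fn: Callable[..., Dict[str, Any]]) -> Callable[..., Dict[str, Any]]:
    """Register a function under its own name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def update_work_order(wo_id: str, status: str,
                      part_number: str = "") -> Dict[str, Any]:
    # Stand-in for a real FSM request (ServiceTitan, Jobber, custom backend).
    return {"ok": True, "wo_id": wo_id, "status": status,
            "part_number": part_number}


def dispatch_tool_call(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Run a registered tool and return a result dict, never raising."""
    fn = TOOLS.get(name)
    if fn is None:
        return {"ok": False, "error": f"unknown tool: {name}"}
    try:
        return fn(**arguments)
    except TypeError as exc:  # missing or unexpected arguments
        return {"ok": False, "error": str(exc)}
```

Returning an error dict instead of raising keeps the session alive, so the agent can tell the technician what went wrong and ask again.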

Universal-3 Pro Streaming — voice notes to structured fields

# Universal-3 Pro Streaming: voice notes → structured work order
import asyncio, json, websockets
from urllib.parse import urlencode

API_KEY = "YOUR_API_KEY"

params = urlencode({
    "sample_rate": 16000,
    "speech_model": "u3-rt-pro",
    "language_detection": "true",              # tag each turn with detected language
    "keyterms_prompt": json.dumps([
        "Carrier 50XC", "Trane XR", "scroll assembly",
        "compressor seized", "refrigerant leak",
        "3BAK-0601", "FS-20260518",
    ]),
    "format_turns": "true",
    "speaker_labels": "true",                  # tech vs. customer on-site
})

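async def stream_field_notes(audio_iter, extract_fields, send_to_fsm):
    """Stream audio and hand each finalized turn to caller-supplied helpers.

    extract_fields is your own function that sends a transcript to the
    LLM Gateway and returns {wo_id, part, status}; send_to_fsm pushes
    those fields to your FSM backend. Neither is defined here.
    """
    url = f"wss://streaming.assemblyai.com/v3/ws?{params}"
    async with websockets.connect(
        url, additional_headers={"Authorization": API_KEY},
    ) as ws:
        async def send_audio():
            async for chunk in audio_iter:
                await ws.send(chunk)
        # Keep a reference so the sender task is not garbage-collected.
        sender = asyncio.create_task(send_audio())
        try:
            async for raw in ws:
                evt = json.loads(raw)
                if evt.get("type") == "Turn" and evt.get("end_of_turn"):
                    # finalized turn → structured work-order fields
                    fields = extract_fields(evt["transcript"])
                    send_to_fsm(fields)
        finally:
            sender.cancel()  # stop sending audio once the socket closes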

Part SKUs, model numbers, and asset IDs captured cleanly

Universal-3 Pro Streaming delivers 28% better consecutive number recognition for alphanumeric sequences. Add part catalogs and customer asset terms via keyterm prompting (up to 100 per session) to boost recognition of your own parts, tools, and procedures.
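Because a parts catalog usually holds far more than 100 entries, the keyterm list has to be trimmed per session. A minimal sketch, assuming you already know which terms matter for the job at hand (for example, the asset and parts on the open work order): `build_keyterms` is a hypothetical helper that puts those priority terms first, drops case-insensitive duplicates, and stops at the limit.

```python
from typing import Iterable, List

MAX_KEYTERMS = 100  # per-session limit stated above


def build_keyterms(priority_terms: Iterable[str],
                   catalog_terms: Iterable[str],
                   limit: int = MAX_KEYTERMS) -> List[str]:
    """Return de-duplicated keyterms, priority terms first, capped at limit."""
    seen = set()
    result: List[str] = []
    for term in list(priority_terms) + list(catalog_terms):
        if len(result) >= limit:
            break
        cleaned = " ".join(term.split())  # collapse stray whitespace
        key = cleaned.casefold()          # compare case-insensitively
        if cleaned and key not in seen:
            seen.add(key)
            result.append(cleaned)
    return result


if __name__ == "__main__":
    terms = build_keyterms(["Carrier 50XC", "3BAK-0601"],
                           ["scroll assembly", "carrier 50xc"])
    print(terms)
```

The returned list can be serialized with `json.dumps` for the `keyterms_prompt` parameter, as in the streaming quickstart.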

Built for the real noise of a job site

Universal-3 Pro Streaming is trained on noisy real-world audio — HVAC compressors, generators, road traffic, wind. The model stays accurate where consumer ASR breaks down, so field-truck dictation works the first time.

Multilingual workforce out of the box

Universal-3 Pro Streaming supports 6 core languages at its highest accuracy, with native code-switching between them. Automatic model routing extends coverage to 99 languages, so field technicians can dictate work-order notes in their preferred language and your FSM system still receives clean transcripts.

Frequently asked questions