How prompting works
Universal-3 Pro is a Speech-augmented Large Language Model (SpeechLLM): a multimodal LLM with an audio encoder and an LLM decoder that processes speech, audio, and text inputs in the same workflow. Think of SpeechLLM prompting as selecting modes and knobs, not open-ended instruction following. The model is trained primarily to transcribe, then fine-tuned to respond to common transcription instructions for style, speakers, and speech events. It responds best to explicit formatting rules and behavioral instructions (e.g., “include all filler words”, “use periods only for complete sentences”). Domain context like “this is a cardiology appointment” only helps when paired with specific instructions on how to transcribe.
What prompts can do
| Capability | Description | Reliability |
|---|---|---|
| Verbatim transcription and disfluencies | Include filler words, false starts, repetitions, stutters | High |
| Native code switching | Handle multilingual audio in the same transcript | High |
| Output style and formatting | Control punctuation, capitalization, number formatting | High |
| Context aware clues | Help with jargon, names, and domain expectations | Medium |
| Entity accuracy and spelling | Improve accuracy for proper nouns, brands, technical terms | Medium |
Recommended prompts
These three prompts are battle-tested and the strongest starting points. Use one as your base and tweak from there rather than starting from scratch.
Best all around (default)
This is also the current built-in default prompt: when you omit the prompt parameter, this is what Universal-3 Pro uses. You don’t need to set it explicitly; it’s shown here so you can build off it.
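A minimal sketch of how the default applies in practice: the helper below includes a `prompt` field only when you supply one, so an omitted prompt falls back to the built-in default. The function name `build_transcript_request` and the `audio_url` field are illustrative assumptions; only the `prompt` parameter is named in this guide, so check the API reference for the exact request shape.

```python
from typing import Optional


def build_transcript_request(audio_url: str, prompt: Optional[str] = None) -> dict:
    """Build a hypothetical request body for a transcription call.

    Assumption: the request is a JSON object with an `audio_url` field.
    Only the `prompt` parameter is taken from this guide.
    """
    body = {"audio_url": audio_url}
    # Leaving `prompt` out lets the model use its built-in default prompt.
    if prompt is not None and prompt.strip():
        body["prompt"] = prompt.strip()
    return body


# No prompt supplied: the body carries no `prompt` key at all.
default_request = build_transcript_request("https://example.com/call.mp3")

# Custom prompt supplied: the body carries it verbatim (whitespace trimmed).
custom_request = build_transcript_request(
    "https://example.com/call.mp3",
    prompt="Transcribe verbatim. Required: include all filler words.",
)
```

Omitting the key rather than sending an empty string keeps "use the default" and "use my prompt" clearly separated in your own code.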
Verbatim with multilingual support
This prompt maximizes speech pattern capture, preserves code-switching, and tells the model to always attempt transcription even on difficult audio. The trade-off is that the model may occasionally hallucinate disfluencies or language switches that don’t exist in the audio.
Handling unclear audio with [unclear]
This prompt flags uncertain segments rather than forcing the model to guess. It is one of the strongest tools for avoiding hallucinations on unclear audio.
- Hallucinations are materially reduced: the model doesn’t force incorrect guesses on uncertain audio.
- Uncertain sections are explicitly flagged as [unclear], surfacing exactly where audio quality is insufficient.
- Clearly audible speech is still preserved.
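Because uncertain segments come back as a literal `[unclear]` tag, you can locate them with plain string processing. The sketch below assumes the transcript is available as a single text string; the helper name `summarize_unclear` and the sample sentence are made up for illustration.

```python
import re

UNCLEAR_TAG = re.compile(r"\[unclear\]", re.IGNORECASE)


def summarize_unclear(transcript: str) -> dict:
    """Count [unclear] tags and report where they occur in the text."""
    offsets = [match.start() for match in UNCLEAR_TAG.finditer(transcript)]
    words = transcript.split()
    # Share of whitespace-separated tokens that contain an [unclear] tag.
    flagged = sum(1 for word in words if UNCLEAR_TAG.search(word))
    share = flagged / len(words) if words else 0.0
    return {"count": len(offsets), "offsets": offsets, "flagged_share": share}


# Illustrative text only, not real model output.
sample = "Take the [unclear] twice a day, then call [unclear] on Monday."
report = summarize_unclear(sample)
```

A rising `flagged_share` across a batch of files is one possible signal that audio quality, rather than the prompt, is the limiting factor.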
Capabilities reference
Each capability is a “knob” you can turn. Each section below shows one audio demo with before/after output and one recommended prompt. Layer capabilities in one at a time so you can measure the impact of each. Conflicting instructions degrade output, so keep your prompt focused.
Verbatim transcription and disfluencies
Preserves natural speech patterns including filler words, false starts, repetitions, and self-corrections. Reliability: High.
Native code switching
Handles audio where speakers switch between languages. Reliability: High.
Output style and formatting
Controls punctuation, capitalization, and readability without changing words. Reliability: High.
Context aware clues
Helps with jargon, names, and domain expectations from the audio file. Reliability: Medium.
For example, using “clinical history evaluation” as a context clue corrects the spelling of “Glicoside” to “Glycoside”.
Entity accuracy and spelling
Improves accuracy for proper nouns, brands, technical terms, and domain vocabulary. Reliability: Medium. If you already know the exact terms you want boosted, use keyterms prompting instead of describing them in your prompt.
What works / what to avoid
What works
| Practice | Why it helps | Example | Impact |
|---|---|---|---|
| Start with Transcribe… | The model has transcription prompts in its training data, so leading with this focuses it on the task. | Transcribe this audio or Transcribe verbatim | Massive |
| Use authoritative language | Strong directive keywords get higher compliance than soft language. | Mandatory:, Non-negotiable:, Required:, Always: | Massive |
| Start with fewer instructions, add one at a time | Every added instruction risks conflicting with another. The previous “3–6 instructions” guidance is an upper bound, not a target — test each addition against your own audio before adding the next. | Add a single capability instruction, evaluate, then add the next. | High |
| Describe the desired output format | Telling the model the pattern to watch for is more reliable than listing specifics. | Pharmaceutical accuracy required across all medications and drug names | High |
| Spell out disfluency behavior explicitly | Enumerated behavior produces more consistent output than a bare directive. | Preserve linguistic speech patterns including disfluencies, filler words, hesitations, repetitions, stutters, false starts, and colloquialisms | High |
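The practices in the table can be combined mechanically. The sketch below is one possible way to assemble a prompt that leads with "Transcribe" and prefixes each instruction with one of the directive keywords listed above. The helper name `compose_prompt` and the exact sentence layout are assumptions for illustration, and the resulting prompt still needs testing on your own audio.

```python
DIRECTIVES = ("Mandatory:", "Non-negotiable:", "Required:", "Always:")


def compose_prompt(instructions, lead="Transcribe this audio.", directive="Required:"):
    """Join a transcription lead-in with directive-prefixed instructions."""
    if directive not in DIRECTIVES:
        raise ValueError(f"directive must be one of {DIRECTIVES}")
    if not lead.startswith("Transcribe"):
        raise ValueError("lead should start with 'Transcribe'")
    parts = [lead]
    for text in instructions:
        # Normalize trailing periods so each instruction ends with exactly one.
        sentence = text.strip().rstrip(".")
        parts.append(f"{directive} {sentence}.")
    return " ".join(parts)


prompt = compose_prompt(
    [
        "Preserve filler words, hesitations, repetitions, and false starts",
        "Use periods only for complete sentences.",
    ]
)
```

Keeping the instruction list as data rather than one long string makes it easier to add or remove a single instruction between evaluation runs.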
What to avoid
| Anti-pattern | Why it hurts | Example | Impact |
|---|---|---|---|
| Listing explicit errors from your audio | Makes the model over-eager to insert those exact phrases, including in places they don’t belong. Describe the pattern, not the corrections. Use keyterms prompting if you know specific terms. | Pharmaceutical accuracy required (omeprazole over omeprizole, metformin over metforman) | Hallucinations |
| Using negative language | Don't, Avoid, Never, Not are not reliably processed by the model. Phrase instructions positively. | Don't include filler words → use Output complete sentences without disfluencies | Severe |
| Conflicting instructions | Forces the model to pick one; the outcome becomes non-deterministic. | Include disfluencies. Maximum readability. | Severe |
| Being short or vague | Gives the model no actionable pattern. | Be accurate, Best transcript ever, Superhero human transcriptionist | High |
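Some of these anti-patterns can be caught before a prompt ever reaches the model. The following is a rough heuristic sketch, not an official validator: it flags the negative words named in the table, parenthesized "X over Y" correction lists, and very short prompts. The `lint_prompt` name and the six-word threshold are arbitrary assumptions, and conflicting instructions are not detected.

```python
import re

# Negative words from the table; the apostrophe may be straight or curly.
NEGATIVE = re.compile(r"\b(?:don['’]t|do not|avoid|never|not)\b", re.IGNORECASE)
# Parenthesized lists such as "(omeprazole over omeprizole)".
CORRECTION_LIST = re.compile(r"\([^)]*\bover\b[^)]*\)", re.IGNORECASE)


def lint_prompt(prompt: str, min_words: int = 6) -> list:
    """Return human-readable warnings for common prompt anti-patterns."""
    warnings = []
    negatives = sorted({m.group(0).lower() for m in NEGATIVE.finditer(prompt)})
    if negatives:
        warnings.append("negative language: " + ", ".join(negatives))
    if CORRECTION_LIST.search(prompt):
        warnings.append("explicit corrections listed; describe the pattern instead")
    if len(prompt.split()) < min_words:
        warnings.append("prompt may be too short or vague")
    return warnings


risky = lint_prompt("Don't include filler words")
clean = lint_prompt(
    "Transcribe this audio. Required: output complete sentences without disfluencies."
)
```

An empty list means only that none of these simple checks fired; it says nothing about how the prompt performs on real audio.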
Evaluating your prompts
Prompts only work on your audio; universal best practices don’t transfer reliably across use cases. Before settling on a prompt, run it against a representative dataset. The workflow:
- Build an evaluation set of at least 25 audio files that reflect the speakers, accents, audio quality, and vocabulary you expect in production. See Evaluate model accuracy for the full methodology.
- Transcribe each file with no prompt to establish a baseline.
- Try the Best all around and [unclear] recommended prompts and compare.
- Layer in one capability instruction at a time and re-measure.
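The comparison step in this workflow can be sketched as a small scoring loop. The example assumes you have human reference transcripts and uses word error rate (WER) as the metric, which is an assumption here; the guide defers to Evaluate model accuracy for the full methodology. The function names and the toy strings are invented for illustration and are not measured results.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            # Deletion, insertion, or substitution/match.
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1] / len(ref)


def rank_prompts(references: dict, outputs_by_prompt: dict) -> list:
    """Average WER per prompt label, lowest (best) first."""
    scores = {}
    for label, outputs in outputs_by_prompt.items():
        rates = [word_error_rate(references[name], text) for name, text in outputs.items()]
        scores[label] = sum(rates) / len(rates)
    return sorted(scores.items(), key=lambda item: item[1])


# Toy data for illustration only; a real set should have at least 25 files.
references = {"a.wav": "take metformin twice a day"}
outputs_by_prompt = {
    "baseline": {"a.wav": "take met forman twice a day"},
    "candidate": {"a.wav": "take metformin twice a day"},
}
ranking = rank_prompts(references, outputs_by_prompt)
```

Re-running `rank_prompts` after each added instruction gives a per-step measurement, which is the point of layering one capability at a time.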
Generate a starting prompt with AI
If the recommended prompts above aren’t a fit for your audio, use the generator below to produce a starting prompt. It opens your preferred AI assistant with a pre-loaded brief built from this guide: the capability knobs, the keyterms-vs-prompt routing, the positive-language rule, and the “start with fewer instructions, add one at a time” framing. The output is a starting point, not a final prompt. Test it against your evaluation set using the workflow above before settling on it.
System prompt history
The current default prompt is shown above under Best all around (default). Prior defaults are kept here for changelog transparency.
Prior system prompt (April 15, 2026 – April 21, 2026)
Prior system prompt (February 25, 2026 – April 15, 2026)
Prior system prompt (February 20, 2026 – February 25, 2026)
Prior system prompt (before February 20, 2026)