Choosing the right speech-to-text (STT) model for your product requires more than reviewing public benchmarks. Public benchmarks can be misleading because of overfitting: models are often trained on the same datasets used for evaluation, which inflates their reported accuracy. Running an evaluation on your own audio data is the most reliable way to determine which model performs best for your specific use case. AssemblyAI provides evaluation tools for both pre-recorded and streaming transcription that measure the metrics that matter in production.

Pre-recorded audio evaluations

Assess which pre-recorded audio STT model is best for your use case. Pre-recorded evaluations measure accuracy using metrics like Word Error Rate (WER) and Full-Word Error Rate (FWER), giving you a clear picture of transcription quality on your actual audio.

Run a pre-recorded audio evaluation

Learn how to evaluate pre-recorded STT models on your own audio data.
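To make the WER metric concrete, here is a minimal sketch of how it is conventionally computed: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and a model's output, divided by the number of words in the reference. The function name `word_error_rate` and the lowercase-and-split normalization are assumptions made for illustration. This is not AssemblyAI's evaluation tooling, and a production evaluation would normally apply more careful text normalization (punctuation, numerals, and so on).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference word count.

    Illustrative sketch only: normalization here is just lowercasing and
    whitespace splitting, which is an assumption, not a standard.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # distance from i reference words to an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion: reference word missing from hypothesis
                    curr[j - 1] + 1,     # insertion: extra word in hypothesis
                    prev[j - 1] + cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    hypothesis = "the cat sit on mat"
    # One substitution (sat -> sit) and one deletion (the) over six reference words.
    print(f"WER: {word_error_rate(reference, hypothesis):.3f}")
```

Because the denominator is the reference length, WER can exceed 1.0 when the hypothesis contains many inserted words, so it is best read as an error ratio rather than a percentage capped at 100.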

Streaming evaluations

Assess which streaming STT model is best for your voice agent or real-time use case. Streaming evaluations focus on latency metrics like Time to First Token (TTFT) and Time to Complete Turn (TTCT) alongside accuracy, since both speed and correctness matter for real-time applications.

Run a streaming evaluation

Learn how to evaluate streaming STT models for voice agents and real-time applications.

Benchmarks

If you want to review AssemblyAI’s current model performance before running your own evaluation, see our benchmarks for the latest accuracy and latency numbers: