Choosing the right speech-to-text (STT) model for your product requires more than reviewing public benchmarks. Public benchmarks can be misleading because of overfitting: models are often trained on the same datasets used for evaluation, which inflates their reported accuracy. Running an evaluation on your own audio data is the most reliable way to determine which model performs best for your specific use case. AssemblyAI provides evaluation tools for both pre-recorded and streaming transcription, measuring metrics that matter in production.
Pre-recorded audio evaluations
Assess which pre-recorded audio STT model is best for your use case. Pre-recorded evaluations measure accuracy using metrics like Word Error Rate (WER) and Full-Word Error Rate (FWER), giving you a clear picture of transcription quality on your actual audio.
Run a pre-recorded audio evaluation
Learn how to evaluate pre-recorded STT models on your own audio data.
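To make the WER metric concrete, here is a minimal sketch of how it is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a model's output, divided by the number of reference words. The function name `word_error_rate` and the simple lowercase-and-split normalization are assumptions for illustration, not AssemblyAI's evaluation tooling, which may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference word count.

    Illustrative only: normalization here is just lowercasing and
    whitespace splitting (an assumption, not a standard).
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    hypothesis = "the cat sat on a mat"
    print(f"WER: {word_error_rate(reference, hypothesis):.3f}")
```

Running the same function over each candidate model's transcripts of your own audio gives a like-for-like comparison, provided every transcript is normalized the same way before scoring.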
Streaming evaluations
Assess which streaming STT model is best for your voice agent or real-time use case. Streaming evaluations focus on latency metrics like Time to First Token (TTFT) and Time to Complete Turn (TTCT) alongside accuracy, since both speed and correctness matter for real-time applications.
Run a streaming evaluation
Learn how to evaluate streaming STT models for voice agents and real-time applications.
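As a rough illustration of the latency metrics, the sketch below summarizes TTFT and TTCT from per-turn timestamps you have logged yourself. The definitions used here are assumptions, since this page does not define them precisely: TTFT is taken as the time from the start of speech to the first partial transcript, and TTCT as the time from the end of speech to the final transcript for the turn. The `TurnTiming` class and its field names are hypothetical.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class TurnTiming:
    """Timestamps (in seconds) for one spoken turn. Hypothetical structure."""
    speech_start: float      # when the speaker began talking
    speech_end: float        # when the speaker stopped talking
    first_partial: float     # when the first partial transcript arrived
    final_transcript: float  # when the final transcript for the turn arrived


def ttft(turn: TurnTiming) -> float:
    # Assumed definition: start of speech to first partial transcript.
    return turn.first_partial - turn.speech_start


def ttct(turn: TurnTiming) -> float:
    # Assumed definition: end of speech to final transcript.
    return turn.final_transcript - turn.speech_end


def summarize(turns: list[TurnTiming]) -> dict[str, float]:
    """Median latency across turns; medians are less sensitive to outliers."""
    return {
        "median_ttft": median(ttft(t) for t in turns),
        "median_ttct": median(ttct(t) for t in turns),
    }


if __name__ == "__main__":
    turns = [
        TurnTiming(0.0, 2.0, 0.5, 2.5),
        TurnTiming(10.0, 12.0, 10.25, 12.75),
    ]
    print(summarize(turns))
```

Collecting these timestamps with the same clock and the same audio for every candidate model keeps the comparison fair; pairing the latency summary with an accuracy score per model shows the speed and correctness trade-off side by side.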