In this guide, we’ll show you how to detect sentences that contain words with low confidence scores. A confidence score represents how confident the model was in its prediction of a transcribed word. Finding low-confidence words is useful when manually editing transcripts, because it points you to the words most likely to need review.
Each transcribed word has a confidence score between 0.0 (low confidence) and 1.0 (high confidence).
You can choose whatever confidence threshold suits your application. This guide uses a threshold of 0.4.
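To see how the threshold applies at the word level, here is a minimal sketch that keeps only the words whose `confidence` is below 0.4. It assumes each word is an object with `text` and `confidence` fields, like the entries in a transcript's `words` array. The function name `findLowConfidenceWords` is hypothetical, and the sample scores are made up.

```javascript
// Threshold used in this guide; adjust it for your application.
const CONFIDENCE_THRESHOLD = 0.4;

// Hypothetical helper: return the words whose confidence score is below the threshold.
// Each word is assumed to look like { text: string, confidence: number }.
function findLowConfidenceWords(words, threshold = CONFIDENCE_THRESHOLD) {
  return words.filter((word) => word.confidence < threshold);
}

// Example with made-up scores.
const sampleWords = [
  { text: "Thank", confidence: 0.98 },
  { text: "you", confidence: 0.95 },
  { text: "Queston.", confidence: 0.17 },
];

console.log(findLowConfidenceWords(sampleWords).map((word) => word.text)); // [ 'Queston.' ]
```

Whether a score exactly equal to the threshold counts as low confidence is a design choice; this sketch uses a strict `<` comparison.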
Before we begin, make sure you have an AssemblyAI account and an API key. You can sign up for an account and get your API key from your dashboard. This guide will use AssemblyAI’s JavaScript SDK. If you haven’t already, install the SDK by following these instructions.
Import the AssemblyAI package and create an AssemblyAI object with your API key:
```javascript
import { AssemblyAI } from "assemblyai";

const client = new AssemblyAI({
  apiKey: process.env.ASSEMBLYAI_API_KEY,
});
```
Next, create the transcript from your audio file, using either a local file or a URL. If you use a URL, make sure it links to a downloadable file that AssemblyAI’s servers can access.
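A minimal sketch of this step follows, together with the filtering that produces the `filterScores` array used to display the results. It assumes the SDK's `client.transcripts.transcribe()` and `client.transcripts.sentences()` methods, and that each returned sentence has `text`, `start`, and a `words` array of `{ text, confidence }` objects. The helper names `getSentences` and `filterLowConfidenceSentences` are hypothetical, and the audio path in the usage comment is a placeholder.

```javascript
// Hypothetical helper: transcribe an audio file (local path or accessible URL)
// and return its sentences. `client` is assumed to be the AssemblyAI object
// created earlier.
async function getSentences(client, audio) {
  const transcript = await client.transcripts.transcribe({ audio });
  if (transcript.status === "error") {
    throw new Error(`Transcription failed: ${transcript.error}`);
  }
  // Assumed to resolve to an object with a `sentences` array.
  const { sentences } = await client.transcripts.sentences(transcript.id);
  return sentences;
}

// Hypothetical helper: keep only sentences that contain at least one word
// below the threshold, and trim each sentence's `words` to those words.
function filterLowConfidenceSentences(sentences, threshold = 0.4) {
  return sentences
    .map((sentence) => ({
      ...sentence, // keep text, start, end, etc.
      words: sentence.words.filter((word) => word.confidence < threshold),
    }))
    .filter((sentence) => sentence.words.length > 0);
}

// Usage (inside an async function or an ES module with top-level await):
// const sentences = await getSentences(client, "./my-audio.mp3");
// const filterScores = filterLowConfidenceSentences(sentences, 0.4);
```

Mapping before filtering keeps each sentence's `text` and `start` intact while reducing `words` to only the low-scoring ones, and it builds new objects rather than mutating the SDK's response.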
Finally, we’ll display the results. Each result includes the timestamp of a sentence that contains low-confidence words, the sentence itself, and the low-confidence words with their scores.
```javascript
// Optional: format a timestamp from milliseconds to HH:MM:SS.
const formatMilliseconds = (milliseconds) => {
  // Calculate hours, minutes, and seconds
  const hours = Math.floor(milliseconds / 3600000);
  const minutes = Math.floor((milliseconds % 3600000) / 60000);
  const seconds = Math.floor((milliseconds % 60000) / 1000);

  // Pad each value with a leading zero if needed
  const formattedHours = hours.toString().padStart(2, "0");
  const formattedMinutes = minutes.toString().padStart(2, "0");
  const formattedSeconds = seconds.toString().padStart(2, "0");

  return `${formattedHours}:${formattedMinutes}:${formattedSeconds}`;
};

// Build one message per sentence containing its timestamp, its text,
// and each low-confidence word with its score.
const finalResults = filterScores.map((sentence) => {
  const lowConfidenceWords = sentence.words
    .map((word) => `${word.text}[score: ${word.confidence}]`)
    .join(", ");
  return `The following sentence at timestamp ${formatMilliseconds(sentence.start)} contained low confidence words: ${sentence.text} \n Low confidence word(s) from this sentence: ${lowConfidenceWords}`;
});

console.log(finalResults);
```
The output will look something like this:
```
[
  'The following sentence at timestamp 00:04:34 contained low confidence words: I am contacting you first when I could just have phoned my bank and marked you as fraud in an instant. \n' +
    ' Low confidence word(s) from this sentence: marked[score: 0.33049]',
  'The following sentence at timestamp 00:06:40 contained low confidence words: Sabitha, as much as I would like to help you, this is the best I can do for you. \n' +
    ' Low confidence word(s) from this sentence: Sabitha,[score: 0.22706]',
  'The following sentence at timestamp 00:07:37 contained low confidence words: Thank you for calling Queston. \n' +
    ' Low confidence word(s) from this sentence: Queston.[score: 0.16557]'
]
```