If you already have a completed transcript, you can add Speaker Identification in a separate request to the Speech Understanding API. This is especially useful when you want to re-identify speakers with different parameters, or when your workflow separates transcription from post-processing.
Speaker Identification requires Speaker Diarization. The original transcription request must have been submitted with speaker_labels: true.
To transcribe and identify speakers in a single request, see the main Speaker Identification page.
Know the speakers’ names? Use speaker_type: "name" with the names in known_values or speakers.
Know their roles but not names? Use speaker_type: "role" with roles like "Interviewer" or "Agent" in known_values or speakers.
Need better accuracy? Use speakers with description fields that provide context about what each speaker typically discusses.
How to use Speaker Identification on an existing transcript
First, transcribe your audio with speaker_labels: true. Once the transcription is complete, send the transcript_id along with your speaker identification configuration to the Speech Understanding API.
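Before sending the second request, it can help to confirm that the transcript is actually ready. The sketch below is a hypothetical helper, not part of any SDK. It assumes the transcript JSON returned by the polling endpoint carries an "id", a "status", and, when diarization was enabled, a non-empty "utterances" list.

```python
def ensure_ready_for_identification(transcript: dict) -> str:
    """Return the transcript ID if the transcript looks ready for speaker identification.

    Hypothetical helper. Assumes the transcript JSON has "id", "status",
    and (for diarized transcripts) a non-empty "utterances" list.
    """
    status = transcript.get("status")
    if status != "completed":
        raise ValueError(f"Transcript is not completed (status: {status!r})")

    # Assumption: without speaker_labels the "utterances" field is missing or empty.
    if not transcript.get("utterances"):
        raise ValueError("No utterances found; was speaker_labels set to true?")

    return transcript["id"]
```

Calling this on the polled transcript before building the identification request turns a missing speaker_labels setting into an early, readable error instead of a failed second request.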
To identify speakers by name, use speaker_type: "name" with a list of speaker names in known_values. This is the most common approach when you know who is speaking in the audio.
Python

```python
import requests
import time

base_url = "https://api.assemblyai.com"

headers = {
    "authorization": "<YOUR_API_KEY>"
}

# Need to transcribe a local file? Learn more here: https://www.assemblyai.com/docs/getting-started/transcribe-an-audio-file
upload_url = "https://assembly.ai/wildfires.mp3"

data = {
    "audio_url": upload_url,
    "speech_models": ["universal-3-pro", "universal-2"],
    "language_detection": True,
    "speaker_labels": True
}

# Transcribe file
response = requests.post(base_url + "/v2/transcript", headers=headers, json=data)
transcript_id = response.json()["id"]
polling_endpoint = base_url + f"/v2/transcript/{transcript_id}"

# Poll for transcription results
while True:
    transcript = requests.get(polling_endpoint, headers=headers).json()

    if transcript["status"] == "completed":
        break
    elif transcript["status"] == "error":
        raise RuntimeError(f"Transcription failed: {transcript['error']}")
    else:
        time.sleep(3)

# Enable speaker identification
understanding_body = {
    "transcript_id": transcript_id,
    "speech_understanding": {
        "request": {
            "speaker_identification": {
                "speaker_type": "name",
                "known_values": ["Michel Martin", "Peter DeCarlo"]  # Change these values to match the names of the speakers in your file
            }
        }
    }
}

# Send the modified transcript to the Speech Understanding API
result = requests.post(
    "https://llm-gateway.assemblyai.com/v1/understanding",
    headers=headers,
    json=understanding_body
).json()

# Access the results and print utterances to the terminal
for utterance in result["utterances"]:
    print(f"{utterance['speaker']}: {utterance['text']}")
```

JavaScript

```javascript
const baseUrl = "https://api.assemblyai.com";
const apiKey = "<YOUR_API_KEY>";

const headers = {
  "authorization": apiKey,
  "content-type": "application/json"
};

// Need to transcribe a local file? Learn more here: https://www.assemblyai.com/docs/getting-started/transcribe-an-audio-file
const uploadUrl = "https://assembly.ai/wildfires.mp3";

async function transcribeAndIdentifySpeakers() {
  // Transcribe file
  const transcriptResponse = await fetch(`${baseUrl}/v2/transcript`, {
    method: 'POST',
    headers: headers,
    body: JSON.stringify({
      audio_url: uploadUrl,
      speech_models: ["universal-3-pro", "universal-2"],
      language_detection: true,
      speaker_labels: true
    })
  });

  const { id: transcriptId } = await transcriptResponse.json();
  const pollingEndpoint = `${baseUrl}/v2/transcript/${transcriptId}`;

  // Poll for transcription results
  while (true) {
    const pollingResponse = await fetch(pollingEndpoint, { headers });
    const transcript = await pollingResponse.json();

    if (transcript.status === "completed") {
      break;
    } else if (transcript.status === "error") {
      throw new Error(`Transcription failed: ${transcript.error}`);
    } else {
      await new Promise(resolve => setTimeout(resolve, 3000));
    }
  }

  // Enable speaker identification
  const understandingBody = {
    transcript_id: transcriptId,
    speech_understanding: {
      request: {
        speaker_identification: {
          speaker_type: "name",
          known_values: ["Michel Martin", "Peter DeCarlo"] // Change these values to match the names of the speakers in your file
        }
      }
    }
  };

  // Send the modified transcript to the Speech Understanding API
  const understandingResponse = await fetch(
    "https://llm-gateway.assemblyai.com/v1/understanding",
    {
      method: 'POST',
      headers: headers,
      body: JSON.stringify(understandingBody)
    }
  );

  const result = await understandingResponse.json();

  // Access the results and print utterances to the terminal
  for (const utterance of result.utterances) {
    console.log(`${utterance.speaker}: ${utterance.text}`);
  }
}

transcribeAndIdentifySpeakers();
```
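Because identification runs as a separate request, the same transcript ID can be reused with different parameters. The following is a minimal sketch of a payload builder for that purpose. The function name build_identification_body is hypothetical, and the validation only covers the two speaker_type values documented on this page.

```python
def build_identification_body(transcript_id: str, speaker_type: str, known_values: list[str]) -> dict:
    """Build a Speech Understanding request body (hypothetical helper, not an SDK function)."""
    # Only the two speaker_type values described on this page are accepted here.
    if speaker_type not in ("name", "role"):
        raise ValueError(f"Unsupported speaker_type: {speaker_type!r}")
    if not known_values:
        raise ValueError("known_values must contain at least one entry")

    return {
        "transcript_id": transcript_id,
        "speech_understanding": {
            "request": {
                "speaker_identification": {
                    "speaker_type": speaker_type,
                    "known_values": list(known_values),
                }
            }
        },
    }


# Two configurations for the same transcript; "<TRANSCRIPT_ID>" is a placeholder.
by_name = build_identification_body("<TRANSCRIPT_ID>", "name", ["Michel Martin", "Peter DeCarlo"])
by_role = build_identification_body("<TRANSCRIPT_ID>", "role", ["Interviewer", "Interviewee"])
```

Either body can then be posted to the Speech Understanding endpoint with the same headers as the transcription request, without transcribing the audio again.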
To identify speakers by role instead of name, use speaker_type: "role" with role labels in known_values. This is useful for customer service calls, interviews, or any scenario where you know the roles but not the names.
Python

```python
import requests
import time

base_url = "https://api.assemblyai.com"

headers = {
    "authorization": "<YOUR_API_KEY>"
}

# Need to transcribe a local file? Learn more here: https://www.assemblyai.com/docs/getting-started/transcribe-an-audio-file
upload_url = "https://assembly.ai/wildfires.mp3"

data = {
    "audio_url": upload_url,
    "speech_models": ["universal-3-pro", "universal-2"],
    "language_detection": True,
    "speaker_labels": True
}

# Transcribe file
response = requests.post(base_url + "/v2/transcript", headers=headers, json=data)
transcript_id = response.json()["id"]
polling_endpoint = base_url + f"/v2/transcript/{transcript_id}"

# Poll for transcription results
while True:
    transcript = requests.get(polling_endpoint, headers=headers).json()

    if transcript["status"] == "completed":
        break
    elif transcript["status"] == "error":
        raise RuntimeError(f"Transcription failed: {transcript['error']}")
    else:
        time.sleep(3)

# Enable role-based speaker identification
understanding_body = {
    "transcript_id": transcript_id,
    "speech_understanding": {
        "request": {
            "speaker_identification": {
                "speaker_type": "role",
                "known_values": ["Interviewer", "Interviewee"]  # Change these values to match the roles of the speakers in your file
            }
        }
    }
}

# Send the modified transcript to the Speech Understanding API
result = requests.post(
    "https://llm-gateway.assemblyai.com/v1/understanding",
    headers=headers,
    json=understanding_body
).json()

# Access the results and print utterances to the terminal
for utterance in result["utterances"]:
    print(f"{utterance['speaker']}: {utterance['text']}")
```

JavaScript

```javascript
const baseUrl = "https://api.assemblyai.com";
const apiKey = "<YOUR_API_KEY>";

const headers = {
  "authorization": apiKey,
  "content-type": "application/json"
};

// Need to transcribe a local file? Learn more here: https://www.assemblyai.com/docs/getting-started/transcribe-an-audio-file
const uploadUrl = "https://assembly.ai/wildfires.mp3";

async function transcribeAndIdentifySpeakers() {
  // Transcribe file
  const transcriptResponse = await fetch(`${baseUrl}/v2/transcript`, {
    method: 'POST',
    headers: headers,
    body: JSON.stringify({
      audio_url: uploadUrl,
      speech_models: ["universal-3-pro", "universal-2"],
      language_detection: true,
      speaker_labels: true
    })
  });

  const { id: transcriptId } = await transcriptResponse.json();
  const pollingEndpoint = `${baseUrl}/v2/transcript/${transcriptId}`;

  // Poll for transcription results
  while (true) {
    const pollingResponse = await fetch(pollingEndpoint, { headers });
    const transcript = await pollingResponse.json();

    if (transcript.status === "completed") {
      break;
    } else if (transcript.status === "error") {
      throw new Error(`Transcription failed: ${transcript.error}`);
    } else {
      await new Promise(resolve => setTimeout(resolve, 3000));
    }
  }

  // Enable role-based speaker identification
  const understandingBody = {
    transcript_id: transcriptId,
    speech_understanding: {
      request: {
        speaker_identification: {
          speaker_type: "role",
          known_values: ["Interviewer", "Interviewee"] // Change these values to match the roles of the speakers in your file
        }
      }
    }
  };

  // Send the modified transcript to the Speech Understanding API
  const understandingResponse = await fetch(
    "https://llm-gateway.assemblyai.com/v1/understanding",
    {
      method: 'POST',
      headers: headers,
      body: JSON.stringify(understandingBody)
    }
  );

  const result = await understandingResponse.json();

  // Access the results and print utterances to the terminal
  for (const utterance of result.utterances) {
    console.log(`${utterance.speaker}: ${utterance.text}`);
  }
}

transcribeAndIdentifySpeakers();
```
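With role labels in place, a common next step is to collect everything each role said, for example to review only the agent's side of a call. The sketch below assumes each entry in result["utterances"] has the "speaker" and "text" keys used in the examples above. The function name and the sample data are illustrative only.

```python
from collections import defaultdict


def group_text_by_speaker(utterances: list[dict]) -> dict[str, list[str]]:
    """Collect utterance text per speaker label, preserving the original order."""
    grouped = defaultdict(list)
    for utterance in utterances:
        grouped[utterance["speaker"]].append(utterance["text"])
    return dict(grouped)


# Illustrative data shaped like the utterances printed in the examples above.
sample_utterances = [
    {"speaker": "Interviewer", "text": "What is causing the haze?"},
    {"speaker": "Interviewee", "text": "Smoke from the wildfires."},
    {"speaker": "Interviewer", "text": "How long will it last?"},
]

for speaker, lines in group_text_by_speaker(sample_utterances).items():
    print(f"{speaker}: {len(lines)} utterance(s)")
```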
For more accurate identification, use the speakers parameter instead of known_values to provide descriptions and metadata. The examples below show the understanding_body payload sent to the Speech Understanding API. For setup, transcription, and polling code, see the full examples above.
Examples in this section are shown in Python for brevity. The same speaker_identification configuration works in any language.
At its simplest, you can provide a description alongside each speaker’s name or role:
```python
understanding_body = {
    "transcript_id": transcript_id,
    "speech_understanding": {
        "request": {
            "speaker_identification": {
                "speaker_type": "role",
                "speakers": [
                    {
                        "role": "interviewer",
                        "description": "Hosts the program and interviews the guests"
                    },
                    {
                        "role": "guest",
                        "description": "Answers questions from the interview"
                    }
                ]
            }
        }
    }
}

# Send the modified transcript to the Speech Understanding API
result = requests.post(
    "https://llm-gateway.assemblyai.com/v1/understanding",
    headers=headers,
    json=understanding_body
).json()
```
For even more fine-tuned identification, you can include any additional custom properties on each speaker object, such as company, title, department, or any other fields that help describe the speaker:
```python
understanding_body = {
    "transcript_id": transcript_id,
    "speech_understanding": {
        "request": {
            "speaker_identification": {
                "speaker_type": "name",
                "speakers": [
                    {
                        "name": "Michel Martin",
                        "description": "Hosts the program and interviews the guests",
                        "company": "NPR",
                        "title": "Host Morning Edition"
                    },
                    {
                        "name": "Peter DeCarlo",
                        "description": "Answers questions from the interview",
                        "company": "Johns Hopkins University",
                        "title": "Professor and Vice Chair of Environmental Health and Engineering"
                    }
                ]
            }
        }
    }
}
```
You can use the same custom properties with role-based identification by replacing name with role in each speaker object.
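As a rough illustration of that substitution, the sketch below converts name-based speaker objects into role-based ones while keeping every other property. The helper to_role_speakers and the roles_by_name mapping are hypothetical and exist only for this example.

```python
def to_role_speakers(speakers: list[dict], roles_by_name: dict[str, str]) -> list[dict]:
    """Return copies of the speaker objects with "name" replaced by "role".

    Hypothetical helper: roles_by_name maps each speaker's name to the role
    label to use instead. All other properties are carried over unchanged.
    """
    converted = []
    for speaker in speakers:
        extras = {key: value for key, value in speaker.items() if key != "name"}
        converted.append({"role": roles_by_name[speaker["name"]], **extras})
    return converted


name_speakers = [
    {"name": "Michel Martin", "description": "Hosts the program and interviews the guests", "company": "NPR"},
    {"name": "Peter DeCarlo", "description": "Answers questions from the interview", "company": "Johns Hopkins University"},
]

role_speakers = to_role_speakers(
    name_speakers,
    {"Michel Martin": "Interviewer", "Peter DeCarlo": "Interviewee"},
)
```

The resulting list would go in the speakers field, with speaker_type set to "role" instead of "name".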
With cURL, submit the transcription job, poll until it completes, and then send the transcript ID to the Speech Understanding API:
```bash
# Step 1: Submit transcription job
curl -X POST "https://api.assemblyai.com/v2/transcript" \
  -H "authorization: <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "audio_url": "https://assembly.ai/wildfires.mp3",
    "speaker_labels": true
  }'

# Save the transcript_id from the response above, then use it in the following commands

# Step 2: Poll for transcription status (repeat until status is "completed")
curl -X GET "https://api.assemblyai.com/v2/transcript/{transcript_id}" \
  -H "authorization: <YOUR_API_KEY>"

# Step 3: Once transcription is completed, enable speaker identification
curl -X POST "https://llm-gateway.assemblyai.com/v1/understanding" \
  -H "authorization: <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "transcript_id": "{transcript_id}",
    "speech_understanding": {
      "request": {
        "speaker_identification": {
          "speaker_type": "name",
          "known_values": ["Michel Martin", "Peter DeCarlo"]
        }
      }
    }
  }'
```
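The polling step is the part most likely to need a safeguard in a script, since the loops shown on this page run until the job completes or fails. Here is a minimal Python sketch of a polling helper with a timeout. The function wait_for_transcript, its parameters, and the default timeout are assumptions for illustration. The caller supplies fetch_transcript, a function that returns the transcript JSON, for example a wrapper around the GET request from step 2.

```python
import time
from typing import Callable


def wait_for_transcript(
    fetch_transcript: Callable[[], dict],
    interval_s: float = 3.0,
    timeout_s: float = 600.0,  # Assumed default; choose a limit that suits your audio length.
) -> dict:
    """Poll until the transcript is completed, fails, or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        transcript = fetch_transcript()
        status = transcript["status"]

        if status == "completed":
            return transcript
        if status == "error":
            raise RuntimeError(f"Transcription failed: {transcript.get('error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Transcript still {status!r} after {timeout_s} seconds")

        time.sleep(interval_s)
```

Passing the fetch step in as a function keeps the loop independent of any HTTP library, so it can be exercised with canned responses before pointing it at the real endpoint.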