Overview

If you already have a completed transcript, you can add Speaker Identification in a separate request to the Speech Understanding API. This is especially useful when you want to re-identify speakers with different parameters, or when your workflow separates transcription from post-processing.
Speaker Identification requires Speaker Diarization. Your original transcription request must have set speaker_labels: true.
To transcribe and identify speakers in a single request, see the main Speaker Identification page.
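
Before sending an existing transcript for identification, it can help to confirm that it was diarized. The check below is a minimal sketch, not an official API: has_speaker_labels is a hypothetical helper, and it assumes the transcript JSON echoes the speaker_labels flag from the request and includes a non-empty utterances list when diarization ran.

```python
def has_speaker_labels(transcript: dict) -> bool:
    """Return True if the transcript dict looks like it was diarized."""
    # Assumption: the transcript JSON echoes the request's "speaker_labels"
    # flag and carries a non-empty "utterances" list when diarization ran.
    return bool(transcript.get("speaker_labels")) and bool(transcript.get("utterances"))


# Illustrative transcript dicts, not real API responses
diarized = {"speaker_labels": True, "utterances": [{"speaker": "A", "text": "Hello."}]}
plain = {"speaker_labels": False, "utterances": None}

print(has_speaker_labels(diarized))
print(has_speaker_labels(plain))
```

If the check fails, re-run the transcription with speaker_labels: true before requesting identification.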

Choosing how to identify speakers

You can identify speakers by name or by role:
  • Know the speakers’ names? Use speaker_type: "name" and list the names in known_values or speakers. See "Identify by name" below.
  • Know their roles but not names? Use speaker_type: "role" and list roles such as "Interviewer" or "Agent" in known_values or speakers. See "Identify by role" below.
  • Need better accuracy? Use speakers with description fields that give context about what each speaker typically discusses. See "Adding speaker metadata" below.
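
The choices above can be sketched as a small helper that assembles the request body. build_understanding_body is a hypothetical name, not part of any SDK, and the rule that exactly one of known_values or speakers is supplied is an assumption of this sketch rather than documented API behavior.

```python
def build_understanding_body(transcript_id, speaker_type, known_values=None, speakers=None):
    """Build a speaker identification request body (hypothetical helper)."""
    if speaker_type not in ("name", "role"):
        raise ValueError('speaker_type must be "name" or "role"')
    # Assumption of this sketch: known_values and speakers are alternatives
    if (known_values is None) == (speakers is None):
        raise ValueError("provide exactly one of known_values or speakers")

    config = {"speaker_type": speaker_type}
    if known_values is not None:
        config["known_values"] = list(known_values)
    else:
        config["speakers"] = list(speakers)

    return {
        "transcript_id": transcript_id,
        "speech_understanding": {"request": {"speaker_identification": config}},
    }


# "example-transcript-id" is a placeholder, not a real transcript ID
body = build_understanding_body(
    "example-transcript-id", "role", known_values=["Agent", "Customer"]
)
print(body["speech_understanding"]["request"]["speaker_identification"])
```

The returned dictionary has the same shape as the understanding_body payloads used in the examples on this page.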

How to use Speaker Identification on an existing transcript

First, transcribe your audio with speaker_labels: true. Once the transcription is complete, send the transcript_id along with your speaker identification configuration to the Speech Understanding API.

Identify by name

To identify speakers by name, use speaker_type: "name" with a list of speaker names in known_values. This is the most common approach when you know who is speaking in the audio.
import requests
import time

base_url = "https://api.assemblyai.com"

headers = {
    "authorization": "<YOUR_API_KEY>"
}

# Need to transcribe a local file? Learn more here: https://www.assemblyai.com/docs/getting-started/transcribe-an-audio-file

upload_url = "https://assembly.ai/wildfires.mp3"

data = {
    "audio_url": upload_url,
    "speech_models": ["universal-3-pro", "universal-2"],
    "language_detection": True,
    "speaker_labels": True
}

# Transcribe file

response = requests.post(base_url + "/v2/transcript", headers=headers, json=data)
response.raise_for_status()  # Fail early on HTTP errors such as an invalid API key

transcript_id = response.json()["id"]
polling_endpoint = base_url + f"/v2/transcript/{transcript_id}"

# Poll for transcription results

while True:
    transcript = requests.get(polling_endpoint, headers=headers).json()

    if transcript["status"] == "completed":
        break

    elif transcript["status"] == "error":
        raise RuntimeError(f"Transcription failed: {transcript['error']}")

    else:
        time.sleep(3)

# Enable speaker identification

understanding_body = {
    "transcript_id": transcript_id,
    "speech_understanding": {
        "request": {
            "speaker_identification": {
                "speaker_type": "name",
                "known_values": ["Michel Martin", "Peter DeCarlo"]  # Change these values to match the names of the speakers in your file
            }
        }
    }
}

# Send the transcript ID and configuration to the Speech Understanding API

result = requests.post(
    "https://llm-gateway.assemblyai.com/v1/understanding",
    headers=headers,
    json=understanding_body
).json()

# Access the results and print utterances to the terminal

for utterance in result["utterances"]:
    print(f"{utterance['speaker']}: {utterance['text']}")
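
After identification, you may want to confirm that every name you supplied was actually assigned to a speaker. The following is a minimal sketch that relies only on the speaker and text fields used in the example above. unmatched_known_values is a hypothetical helper, the sample utterances are made up for illustration, and the idea that an unmatched speaker keeps a generic label such as "B" is an assumption about the response.

```python
def unmatched_known_values(utterances, known_values):
    """Return the known values that never appear as an utterance's speaker."""
    seen = {utterance["speaker"] for utterance in utterances}
    return [value for value in known_values if value not in seen]


# Made-up utterances for illustration; the "B" label is an assumed fallback
sample_utterances = [
    {"speaker": "Michel Martin", "text": "Welcome to the program."},
    {"speaker": "B", "text": "Thanks for having me."},
]

missing = unmatched_known_values(sample_utterances, ["Michel Martin", "Peter DeCarlo"])
print(missing)
```

A non-empty list suggests re-running identification with different parameters, for example with descriptions in speakers.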

Identify by role

To identify speakers by role instead of name, use speaker_type: "role" with role labels in known_values. This is useful for customer service calls, interviews, or any scenario where you know the roles but not the names.
import requests
import time

base_url = "https://api.assemblyai.com"

headers = {
    "authorization": "<YOUR_API_KEY>"
}

# Need to transcribe a local file? Learn more here: https://www.assemblyai.com/docs/getting-started/transcribe-an-audio-file

upload_url = "https://assembly.ai/wildfires.mp3"

data = {
    "audio_url": upload_url,
    "speech_models": ["universal-3-pro", "universal-2"],
    "language_detection": True,
    "speaker_labels": True
}

# Transcribe file

response = requests.post(base_url + "/v2/transcript", headers=headers, json=data)
response.raise_for_status()  # Fail early on HTTP errors such as an invalid API key

transcript_id = response.json()["id"]
polling_endpoint = base_url + f"/v2/transcript/{transcript_id}"

# Poll for transcription results

while True:
    transcript = requests.get(polling_endpoint, headers=headers).json()

    if transcript["status"] == "completed":
        break

    elif transcript["status"] == "error":
        raise RuntimeError(f"Transcription failed: {transcript['error']}")

    else:
        time.sleep(3)

# Enable role-based speaker identification

understanding_body = {
    "transcript_id": transcript_id,
    "speech_understanding": {
        "request": {
            "speaker_identification": {
                "speaker_type": "role",
                "known_values": ["Interviewer", "Interviewee"]  # Change these values to match the roles of the speakers in your file
            }
        }
    }
}

# Send the transcript ID and configuration to the Speech Understanding API

result = requests.post(
    "https://llm-gateway.assemblyai.com/v1/understanding",
    headers=headers,
    json=understanding_body
).json()

# Access the results and print utterances to the terminal

for utterance in result["utterances"]:
    print(f"{utterance['speaker']}: {utterance['text']}")

Common role combinations

  • ["Agent", "Customer"] - Customer service calls
  • ["AI Assistant", "User"] - AI chatbot interactions
  • ["Support", "Customer"] - Technical support calls
  • ["Interviewer", "Interviewee"] - Interview recordings
  • ["Host", "Guest"] - Podcast or show recordings
  • ["Moderator", "Panelist"] - Panel discussions

Adding speaker metadata

For more accurate identification, use the speakers parameter instead of known_values to provide descriptions and metadata. The examples below show the understanding_body payload sent to the Speech Understanding API. For setup, transcription, and polling code, see the full examples above.
Examples in this section are shown in Python for brevity. The same speaker_identification configuration works in any language.
At its simplest, you can provide a description alongside each speaker’s name or role:
understanding_body = {
  "transcript_id": transcript_id,
  "speech_understanding": {
    "request": {
      "speaker_identification": {
        "speaker_type": "role",
        "speakers": [
          {
            "role": "interviewer",
            "description": "Hosts the program and interviews the guests"
          },
          {
            "role": "guest",
            "description": "Answers questions from the interviewer"
          }
        ]
      }
    }
  }
}

# Send the transcript ID and configuration to the Speech Understanding API
result = requests.post(
  "https://llm-gateway.assemblyai.com/v1/understanding",
  headers=headers,
  json=understanding_body
).json()
For even more fine-tuned identification, you can include any additional custom properties on each speaker object, such as company, title, department, or any other fields that help describe the speaker:
understanding_body = {
  "transcript_id": transcript_id,
  "speech_understanding": {
    "request": {
      "speaker_identification": {
        "speaker_type": "name",
        "speakers": [
          {
            "name": "Michel Martin",
            "description": "Hosts the program and interviews the guests",
            "company": "NPR",
            "title": "Host Morning Edition"
          },
          {
            "name": "Peter DeCarlo",
            "description": "Answers questions from the interviewer",
            "company": "Johns Hopkins University",
            "title": "Professor and Vice Chair of Environmental Health and Engineering"
          }
        ]
      }
    }
  }
}
You can use the same custom properties with role-based identification by replacing name with role in each speaker object.
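
As a rough illustration of that swap, the sketch below converts name-based speaker objects into role-based ones while keeping every other property. to_role_speakers and the role_for_name mapping are hypothetical, introduced only for this example. Remember to also set speaker_type to "role" in the request.

```python
def to_role_speakers(speakers, role_for_name):
    """Return copies of speaker objects with "name" replaced by "role"."""
    converted = []
    for speaker in speakers:
        # Keep all custom properties (description, company, title, ...)
        extras = {key: value for key, value in speaker.items() if key != "name"}
        converted.append({"role": role_for_name[speaker["name"]], **extras})
    return converted


named_speakers = [
    {"name": "Michel Martin", "description": "Hosts the program", "company": "NPR"},
    {"name": "Peter DeCarlo", "description": "Answers questions", "company": "Johns Hopkins University"},
]
# Hypothetical mapping from each name to the role it should become
roles = {"Michel Martin": "Host", "Peter DeCarlo": "Guest"}

role_speakers = to_role_speakers(named_speakers, roles)
print(role_speakers[0])
```

The original list is left unchanged, so the same speaker metadata can be reused for both name-based and role-based requests.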

API reference

Request

Submit a transcription job, poll until it completes, then send the transcript ID to the Speech Understanding API:
# Step 1: Submit transcription job
curl -X POST "https://api.assemblyai.com/v2/transcript" \
  -H "authorization: <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "audio_url": "https://assembly.ai/wildfires.mp3",
    "speaker_labels": true
  }'

# Save the transcript_id from the response above, then use it in the following commands

# Step 2: Poll for transcription status (repeat until status is "completed")
curl -X GET "https://api.assemblyai.com/v2/transcript/{transcript_id}" \
  -H "authorization: <YOUR_API_KEY>"

# Step 3: Once transcription is completed, request speaker identification
curl -X POST "https://llm-gateway.assemblyai.com/v1/understanding" \
  -H "authorization: <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "transcript_id": "{transcript_id}",
    "speech_understanding": {
      "request": {
        "speaker_identification": {
          "speaker_type": "name",
          "known_values": ["Michel Martin", "Peter DeCarlo"]
        }
      }
    }
  }'
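
Step 2 above repeats until the status is "completed". In a script, that loop is safer with a timeout. The following is a minimal sketch in which wait_for_transcript is a hypothetical helper and the fetch function is injected, so the example runs without network access. The "queued" and "processing" values in the stand-in responses are placeholders for any status other than "completed" or "error".

```python
import time


def wait_for_transcript(fetch_transcript, interval=3.0, timeout=600.0, sleep=time.sleep):
    """Poll fetch_transcript() until completion; raise on error or timeout."""
    deadline = time.monotonic() + timeout
    while True:
        transcript = fetch_transcript()
        status = transcript["status"]
        if status == "completed":
            return transcript
        if status == "error":
            raise RuntimeError(f"Transcription failed: {transcript.get('error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError("Transcription did not complete before the timeout")
        sleep(interval)


# Stand-in for the GET request in Step 2; the statuses and ID are placeholders
responses = iter([
    {"status": "queued"},
    {"status": "processing"},
    {"status": "completed", "id": "example-transcript-id"},
])
result = wait_for_transcript(lambda: next(responses), interval=0, sleep=lambda seconds: None)
print(result["status"])  # completed
```

In real use, fetch_transcript would wrap the GET request from Step 2 and return its parsed JSON body.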

Request parameters

For the full list of request parameters, see the Speaker Identification API reference.

Response

For the response format and fields, see the Speaker Identification response reference.