Multichannel Transcription

Supported Languages, Regions, and Models

Multichannel transcription is supported for all languages, regions, and models.

If you have a multichannel audio file with multiple speakers, you can transcribe each channel separately.

The response includes an audio_channels property with the number of channels, and an utterances property containing a list of turn-by-turn utterances.

Each utterance includes the channel it was spoken on, with channel numbers starting at 1.

Additionally, each word in the words array contains the channel identifier.

Quickstart

import assemblyai as aai

aai.settings.api_key = "<YOUR_API_KEY>"

# audio_file = "./local_file.mp3"
audio_file = "https://assembly.ai/wildfires.mp3"

config = aai.TranscriptionConfig(
    speech_models=["universal-3-pro", "universal-2"],
    language_detection=True,
    multichannel=True
)

transcript = aai.Transcriber(config=config).transcribe(audio_file)

if transcript.status == "error":
    raise RuntimeError(f"Transcription failed: {transcript.error}")

for utterance in transcript.utterances:
    print(f"Channel {utterance.speaker}: {utterance.text}")

Multichannel audio increases the transcription time by approximately 25%.

Per-channel diarization

If you have a multichannel audio file where individual channels may contain multiple speakers, you can combine multichannel and speaker_labels to perform diarization within each channel.

When using multichannel with speaker_labels, the speaker_options parameters (min_speakers_expected and max_speakers_expected) are applied per channel, not globally across the entire file. For example, setting min_speakers_expected: 5 and max_speakers_expected: 7 on a 5-channel file means the model will find 5–7 speakers on each channel, resulting in 25–35 total speakers. Adjust your speaker options accordingly when using multichannel transcription.
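
As a rough sketch of that per-channel budgeting, the request below describes a two-channel call recording where each channel may carry one or two speakers, so the model is told to find 1–2 speakers per channel (2–4 across the file). It goes against the raw transcript endpoint; the nested speaker_options shape simply mirrors the parameter names above and is an assumption about your API version, and the audio URL is the sample file from the quickstart:

import requests

API_KEY = "<YOUR_API_KEY>"

# Budget 1-2 speakers per channel. On a 2-channel file this means
# 2-4 speakers overall, because speaker_options applies per channel.
payload = {
    "audio_url": "https://assembly.ai/wildfires.mp3",  # sample URL from the quickstart
    "multichannel": True,
    "speaker_labels": True,
    "speaker_options": {
        "min_speakers_expected": 1,
        "max_speakers_expected": 2,
    },
}

response = requests.post(
    "https://api.assemblyai.com/v2/transcript",
    headers={"authorization": API_KEY},
    json=payload,
)
response.raise_for_status()
print("Transcript ID:", response.json()["id"])

You would then poll the transcript until it completes, or simply use the SDK configuration shown in the example below.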

When both parameters are enabled:

  • Channels are labeled numerically (1, 2, 3, etc.)
  • Speakers within each channel are labeled alphabetically (A, B, C, etc.)
  • The combined speaker label format is {channel}{speaker} (e.g., “1A”, “1B”, “2A”)

For example, if channel 1 has two speakers and channel 2 has one speaker, the labels would be:

  • First speaker on channel 1: 1A
  • Second speaker on channel 1: 1B
  • First speaker on channel 2: 2A

import assemblyai as aai

aai.settings.api_key = "<YOUR_API_KEY>"

# audio_file = "./local_file.mp3"
audio_file = "https://assembly.ai/wildfires.mp3"

config = aai.TranscriptionConfig(
    speech_models=["universal-3-pro", "universal-2"],
    language_detection=True,
    multichannel=True,
    speaker_labels=True
)

transcript = aai.Transcriber(config=config).transcribe(audio_file)

if transcript.status == "error":
    raise RuntimeError(f"Transcription failed: {transcript.error}")

for utterance in transcript.utterances:
    print(f"Speaker {utterance.speaker}: {utterance.text}")