Migration guide: OpenAI to AssemblyAI

This guide walks through the process of migrating from OpenAI to AssemblyAI for transcribing pre-recorded audio.

Get Started

Before we begin, make sure you have an AssemblyAI account and an API key. You can sign up for a free account and get your API key from your dashboard.

Side-By-Side Code Comparison

Below is a comparison of a basic snippet that transcribes a local file with OpenAI and with AssemblyAI:

OpenAI

from openai import OpenAI

api_key = "YOUR_OPENAI_API_KEY"
client = OpenAI(api_key=api_key)  # api_key must be passed as a keyword argument

# Open the file in a context manager so the handle is closed after the request
with open("./example.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

print(transcript.text)

AssemblyAI

import assemblyai as aai

aai.settings.api_key = "YOUR-API-KEY"
transcriber = aai.Transcriber()

audio_file = "./example.wav"

config = aai.TranscriptionConfig(
    speech_models=["universal-3-pro", "universal-2"],
    language_detection=True,
)
transcript = transcriber.transcribe(audio_file, config)

if transcript.status == aai.TranscriptStatus.error:
    print(f"Transcription failed: {transcript.error}")
    exit(1)

print(transcript.text)

Here are helpful things to know about our transcribe method:

The SDK handles polling under the hood, so no polling loop is needed in your code.
The transcript text is directly accessible via transcript.text.
English is the default language. We recommend specifying speech_models=["universal-3-pro", "universal-2"] for the highest accuracy.
We have a cookbook on handling common errors when using our API.
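The status check from the comparison above can be folded into a small helper so that a failed transcription surfaces as an exception. This is only a sketch, not part of the SDK: `transcribe_or_raise` and `TranscriptionFailed` are hypothetical names, and the code assumes a failed transcript reports the status value "error".

```python
class TranscriptionFailed(RuntimeError):
    """Hypothetical exception type for this sketch; not part of the AssemblyAI SDK."""


def transcribe_or_raise(transcriber, audio, config=None):
    """Run a transcription and raise if the transcript ends in an error state.

    `transcriber` is expected to behave like aai.Transcriber, whose
    transcribe() call handles polling itself.
    """
    transcript = transcriber.transcribe(audio, config)
    # Assumption: status is either the string "error" or an enum member
    # whose .value is "error" (such as aai.TranscriptStatus.error).
    status = getattr(transcript.status, "value", transcript.status)
    if status == "error":
        raise TranscriptionFailed(f"Transcription failed: {transcript.error}")
    return transcript
```

With the SDK installed, a call might look like `transcript = transcribe_or_raise(aai.Transcriber(), "./example.wav", config)`, followed by `print(transcript.text)`.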

Installation

OpenAI

from openai import OpenAI

api_key = "YOUR_OPENAI_API_KEY"
client = OpenAI(api_key=api_key)  # api_key must be passed as a keyword argument

AssemblyAI

import assemblyai as aai

aai.settings.api_key = "YOUR-API-KEY"
transcriber = aai.Transcriber()

When migrating from OpenAI to AssemblyAI, you'll first need to handle authentication and SDK setup:

Get your API key from your AssemblyAI dashboard.
To follow this guide, install AssemblyAI's Python SDK by typing this code into your terminal:

pip install assemblyai

Things to know:

Store your API key securely in an environment variable.
API key authentication works the same across all AssemblyAI SDKs.
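One way to keep the key out of source code is to read it from the environment at startup. The sketch below is illustrative only: `load_api_key` is a hypothetical helper, and the variable name `ASSEMBLYAI_API_KEY` is an assumption, so substitute whatever name your deployment uses.

```python
import os


def load_api_key(env_var="ASSEMBLYAI_API_KEY", environ=None):
    """Return the API key stored in `env_var`, or raise if it is missing.

    The default variable name is an assumption made for this sketch.
    """
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before transcribing")
    return key


# Usage (requires `pip install assemblyai`):
# import assemblyai as aai
# aai.settings.api_key = load_api_key()
```

Failing fast on a missing key gives a clearer message than an authentication error returned later by the API.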

Audio File Sources

OpenAI

client = OpenAI()

# Local Files
# Open the file in a context manager so the handle is closed after the request
with open("./example.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

AssemblyAI

transcriber = aai.Transcriber()

# Local Files
transcript = transcriber.transcribe("./audio.mp3")

# Public URLs
transcript = transcriber.transcribe("https://example.com/audio.mp3")

Here are helpful things to know when migrating your audio input handling:

AssemblyAI natively supports transcribing publicly accessible audio URLs (for example, S3 URLs), whereas the Whisper API only natively supports transcribing local files.
There's no need to specify the audio format to AssemblyAI; it is auto-detected. AssemblyAI accepts almost every audio/video file type. See our full list of supported file types.
The Whisper API only supports file sizes up to 25MB, while AssemblyAI supports file sizes up to 5GB.
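When porting input handling, it can help to distinguish URLs from local paths and to check file sizes before uploading. The helpers below are a sketch with hypothetical names (`is_url`, `check_local_file`); the limits are the ones quoted in this guide, and converting them to bytes with binary units is an assumption.

```python
import os
from urllib.parse import urlparse

# Limits quoted in this guide; the byte conversion (binary units) is an assumption.
WHISPER_MAX_BYTES = 25 * 1024**2     # 25MB
ASSEMBLYAI_MAX_BYTES = 5 * 1024**3   # 5GB


def is_url(source):
    """Return True if `source` looks like an http(s) URL rather than a local path."""
    parsed = urlparse(source)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def check_local_file(path, max_bytes=ASSEMBLYAI_MAX_BYTES):
    """Return the file size in bytes, raising ValueError if it exceeds `max_bytes`."""
    size = os.path.getsize(path)
    if size > max_bytes:
        raise ValueError(f"{path} is {size} bytes; the limit is {max_bytes}")
    return size
```

A caller could run `check_local_file(source)` only when `is_url(source)` is False, then pass `source` to `transcriber.transcribe` unchanged in both cases.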

Adding Features

OpenAI

transcript = client.audio.transcriptions.create(
    file=audio_file,
    prompt="INSERT_PROMPT",  # Optional text to guide the model's style
    language="en",  # Set language code
    model="whisper-1",
    response_format="verbose_json",
    timestamp_granularities=["word"],
)

# Access word-level timestamps
print(transcript.words)

AssemblyAI

# Placeholder URL, as in the Audio File Sources example
audio_url = "https://example.com/audio.mp3"

config = aai.TranscriptionConfig(
    speech_models=["universal-3-pro", "universal-2"],
    language_detection=True,
    speaker_labels=True,  # Speaker diarization
    sentiment_analysis=True,  # Sentiment analysis
    entity_detection=True,  # Named entity detection
)

transcript = transcriber.transcribe(audio_url, config)

# Access word-level timestamps
print(transcript.words)

# Access speaker labels
for utterance in transcript.utterances:
    print(f"Speaker {utterance.speaker}: {utterance.text}")

Key differences:

OpenAI does not offer speech understanding features for its speech-to-text API.
Use aai.TranscriptionConfig to specify any extra features that you wish to use.
With AssemblyAI, timestamp granularity is word-level by default.
The results for Speaker Diarization are stored in transcript.utterances. To see the full transcript response object, refer to our API Reference.
Check our documentation for our full list of available features and their parameters.
If you want to send a custom prompt to an LLM, you can use LLM Gateway to apply the model to your transcribed audio files.
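Code written against Whisper's verbose_json output may need small adapters for the AssemblyAI result objects. The sketch below uses hypothetical helper names and relies only on the attributes shown in this guide (speaker and text on utterances) plus text, start, and end on words. It assumes word timings are integer milliseconds, which you should confirm against the API Reference before relying on the conversion.

```python
def format_utterances(utterances):
    """Render diarized utterances as 'Speaker X: text' lines.

    Accepts None so it can be called even when no utterances are present.
    """
    return [f"Speaker {u.speaker}: {u.text}" for u in (utterances or [])]


def words_in_seconds(words):
    """Convert word timings to (text, start_seconds, end_seconds) tuples.

    Assumption: start and end are milliseconds, so they are divided by 1000.
    """
    return [(w.text, w.start / 1000, w.end / 1000) for w in (words or [])]
```

For a finished transcript, `format_utterances(transcript.utterances)` and `words_in_seconds(transcript.words)` would produce plain Python lists that downstream code can consume without depending on SDK types.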
