This guide walks through the process of migrating from OpenAI to AssemblyAI for transcribing pre-recorded audio.

Get Started

Before we begin, make sure you have an AssemblyAI account and an API key. You can sign up for a free account and get your API key from your dashboard.

Side-By-Side Code Comparison

Below is a side-by-side comparison of a basic snippet that transcribes a local file with OpenAI and with AssemblyAI:
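from openai import OpenAI

api_key = "YOUR_OPENAI_API_KEY"
# The client only accepts api_key as a keyword argument
client = OpenAI(api_key=api_key)

# Open the audio in binary mode; the context manager closes the file afterwards
with open("./example.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

print(transcript.text)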
Here are helpful things to know about our transcribe method:
  • The SDK handles polling under the hood
  • Transcript is directly accessible via transcript.text
  • English is the default language. We recommend specifying speech_models=["universal-3-pro", "universal-2"] for the highest accuracy
  • We have a cookbook for handling common errors when using our API.
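
For comparison, the AssemblyAI side of the same task might look like the following minimal sketch. It assumes the assemblyai Python SDK is installed. The helper names transcribe_local_file and text_or_raise are hypothetical and introduced here only for illustration, and the SDK import is deferred into the function so the helpers can be defined even where the package is missing.

```python
def text_or_raise(transcript):
    """Return transcript.text, or raise if the job ended in the error state."""
    # status may be an enum member or a plain string, so normalise it first
    status = getattr(transcript.status, "value", transcript.status)
    if status == "error":
        raise RuntimeError(f"Transcription failed: {transcript.error}")
    return transcript.text


def transcribe_local_file(path, api_key):
    """Transcribe a local audio file and return its text."""
    import assemblyai as aai  # requires `pip install assemblyai`

    aai.settings.api_key = api_key
    transcriber = aai.Transcriber()
    # transcribe() uploads the file, submits the job, and polls until it finishes
    transcript = transcriber.transcribe(path)
    return text_or_raise(transcript)


if __name__ == "__main__":
    print(transcribe_local_file("./example.wav", "YOUR_ASSEMBLYAI_API_KEY"))
```

Because polling happens inside transcribe(), the only extra step compared with the OpenAI snippet is checking the final status before reading transcript.text.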

Installation

from openai import OpenAI

api_key = "YOUR_OPENAI_API_KEY"
client = OpenAI(api_key=api_key)
When migrating from OpenAI to AssemblyAI, you’ll first need to handle authentication and SDK setup. Get your API key from your AssemblyAI dashboard.
To follow this guide, install AssemblyAI’s Python SDK by running this command in your terminal:
pip install assemblyai
Things to know:
  • Store your API key securely in an environment variable
  • API key authentication works the same across all AssemblyAI SDKs
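
One way to keep the key out of source code is sketched below. The environment variable name ASSEMBLYAI_API_KEY and the helper names load_api_key and configure_sdk are assumptions made for this example. Adjust them to match your own deployment.

```python
import os


def load_api_key(env_var="ASSEMBLYAI_API_KEY"):
    """Read the API key from an environment variable instead of hard-coding it."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key


def configure_sdk():
    """Point the AssemblyAI SDK at the key loaded from the environment."""
    import assemblyai as aai  # requires `pip install assemblyai`

    aai.settings.api_key = load_api_key()
    return aai.Transcriber()
```

Failing early with a clear message when the variable is unset tends to be easier to debug than an authentication error returned later by the API.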

Audio File Sources

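from openai import OpenAI

# With no arguments, the client reads the key from the OPENAI_API_KEY environment variable
client = OpenAI()

# Local files
with open("./example.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )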

Here are helpful things to know when migrating your audio input handling:
  • AssemblyAI natively supports transcribing publicly accessible audio URLs (for example, S3 URLs), whereas the Whisper API only natively supports transcribing local files.
  • There’s no need to specify the audio format to AssemblyAI; it’s auto-detected. AssemblyAI accepts almost every audio/video file type. See the full list of supported file types in our documentation.
  • The Whisper API only supports file sizes up to 25MB, whereas AssemblyAI supports file sizes up to 5GB.
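
The points above can be sketched as a small helper that accepts either a local path or a public URL. This is an illustrative sketch, not the SDK's own logic: is_url, fits_limit, and transcribe_source are hypothetical names, the limits are interpreted as binary megabytes and gigabytes (an assumption), and aai.settings.api_key is assumed to be set already.

```python
import os
from urllib.parse import urlparse

# Limits quoted in this guide, interpreted here as binary units (assumption)
WHISPER_LIMIT_BYTES = 25 * 1024 * 1024
ASSEMBLYAI_LIMIT_BYTES = 5 * 1024 ** 3


def is_url(source):
    """True when source looks like an http(s) URL rather than a local path."""
    return urlparse(source).scheme in ("http", "https")


def fits_limit(size_bytes, limit_bytes):
    """True when a file of size_bytes is within the given upload limit."""
    return size_bytes <= limit_bytes


def transcribe_source(source):
    """Pass a local path or a public URL straight to the AssemblyAI SDK."""
    import assemblyai as aai  # requires `pip install assemblyai`

    if not is_url(source):
        size = os.path.getsize(source)
        if not fits_limit(size, ASSEMBLYAI_LIMIT_BYTES):
            raise ValueError(f"{source} is {size} bytes, above the 5GB limit")
    # The same transcribe() call accepts both local paths and URLs
    return aai.Transcriber().transcribe(source)
```

A file that passes the AssemblyAI check but fails fits_limit against WHISPER_LIMIT_BYTES is one that previously had to be split or compressed before being sent to the Whisper API.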

Adding Features

transcript = client.audio.transcriptions.create(
    file=audio_file,
    prompt="INSERT_PROMPT",  # Optional text to guide the model's style
    language="en",  # Set language code
    model="whisper-1",
    response_format="verbose_json",
    timestamp_granularities=["word"],
)

# Access word-level timestamps
print(transcript.words)
Key differences:
  • OpenAI does not offer speech understanding features for its speech-to-text API
  • Use aai.TranscriptionConfig to specify any extra features that you wish to use
  • With AssemblyAI, timestamp granularity is word-level by default
  • The results for Speaker Diarization are stored in transcript.utterances. To see the full transcript response object, refer to our API Reference.
  • Check our documentation for our full list of available features and their parameters
  • If you want to send a custom prompt to an LLM, you can use LLM Gateway to apply the model to your transcribed audio files.
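
As a rough illustration of these differences, the sketch below enables Speaker Diarization through aai.TranscriptionConfig and then reads transcript.utterances and transcript.words. The helper names transcribe_with_speakers, format_utterances, and format_words are hypothetical, and aai.settings.api_key is assumed to be set already.

```python
def transcribe_with_speakers(source):
    """Transcribe with Speaker Diarization enabled."""
    import assemblyai as aai  # requires `pip install assemblyai`

    # Extra features are switched on through TranscriptionConfig
    config = aai.TranscriptionConfig(speaker_labels=True)
    return aai.Transcriber().transcribe(source, config=config)


def format_utterances(utterances):
    """Render each diarized utterance as 'Speaker X: text'."""
    return [f"Speaker {u.speaker}: {u.text}" for u in utterances]


def format_words(words):
    """Render word-level timestamps; start and end are in milliseconds."""
    return [f"{w.start}-{w.end} ms: {w.text}" for w in words]


if __name__ == "__main__":
    transcript = transcribe_with_speakers("./example.wav")
    print("\n".join(format_utterances(transcript.utterances)))
    print("\n".join(format_words(transcript.words)))
```

No timestamp_granularities setting is needed here, since word-level timestamps are part of the default response.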