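In this guide, we’ll show you how to transcribe YouTube videos. For this, we use the yt-dlp library to download YouTube videos and then transcribe them with the AssemblyAI API. yt-dlp is a youtube-dl fork with additional features and fixes. It is more actively maintained and is now generally preferred over youtube-dl.

In this guide we’ll show 2 different approaches:

Option 1: Download the video via the command line
Option 2: Download the video via a Python script

The following complete script downloads the audio and transcribes it in a single function: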
    import assemblyai as aai
    import yt_dlp


    def transcribe_youtube_video(video_url: str, api_key: str) -> str:
        """
        Transcribe a YouTube video given its URL.

        Args:
            video_url: The YouTube video URL to transcribe
            api_key: AssemblyAI API key

        Returns:
            The transcript text
        """
        # Configure yt-dlp options for audio extraction
        ydl_opts = {
            'format': 'm4a/bestaudio/best',
            'outtmpl': '%(id)s.%(ext)s',
            'postprocessors': [{
                'key': 'FFmpegExtractAudio',
                'preferredcodec': 'm4a',
            }]
        }

        # Download and extract audio
        with yt_dlp.YoutubeDL(ydl_opts) as ydl:
            ydl.download([video_url])
            # Get video ID from info dict
            info = ydl.extract_info(video_url, download=False)
            video_id = info['id']

        # Configure AssemblyAI
        aai.settings.api_key = api_key

        # Transcribe the downloaded audio file
        config = aai.TranscriptionConfig(speech_models=["universal-3-pro", "universal-2"])
        transcriber = aai.Transcriber()
        transcript = transcriber.transcribe(f"{video_id}.m4a", config)

        return transcript.text


    transcript_text = transcribe_youtube_video("https://www.youtube.com/watch?v=wtolixa9XTg", "YOUR-API-KEY")
    print(transcript_text)
Next, set up the AssemblyAI SDK and transcribe the file. Replace YOUR-API-KEY with your own key. If you don’t have one, you can sign up for one for free. Make sure that the path you pass to the transcribe() function corresponds to the saved filename.
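One way to keep the path and the saved filename in sync is to derive the filename from the video URL. The sketch below is only an illustration: `expected_audio_path` and `transcribe_file` are hypothetical helper names, and it assumes the audio was saved with the `%(id)s.%(ext)s` output template as an `m4a` file, and that the URL is a standard `watch?v=` or `youtu.be` link.

```python
from pathlib import Path
from urllib.parse import parse_qs, urlparse


def expected_audio_path(video_url: str, ext: str = "m4a") -> Path:
    """Guess the filename produced by the '%(id)s.%(ext)s' output template (assumption)."""
    parsed = urlparse(video_url)
    if parsed.hostname == "youtu.be":
        # Short links carry the video ID in the path
        video_id = parsed.path.lstrip("/")
    else:
        # Standard links carry the video ID in the 'v' query parameter
        video_id = parse_qs(parsed.query).get("v", [""])[0]
    if not video_id:
        raise ValueError(f"Could not find a video ID in {video_url!r}")
    return Path(f"{video_id}.{ext}")


def transcribe_file(audio_path: Path, api_key: str) -> str:
    """Transcribe a local audio file, failing early if the file is missing."""
    if not audio_path.is_file():
        raise FileNotFoundError(f"No downloaded audio at {audio_path}")

    # Imported here so the path helper above works without the SDK installed
    import assemblyai as aai

    aai.settings.api_key = api_key
    transcript = aai.Transcriber().transcribe(str(audio_path))
    if transcript.status == aai.TranscriptStatus.error:
        raise RuntimeError(f"Transcription failed: {transcript.error}")
    return transcript.text
```

Calling `transcribe_file(expected_audio_path(url), "YOUR-API-KEY")` then raises a clear `FileNotFoundError` when the download step saved the audio under a different name, instead of failing later during the upload.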
In this approach we download the video with a Python script instead of the command line. You can download the file with the following code:
    import yt_dlp

    URLS = ['https://www.youtube.com/watch?v=wtolixa9XTg']

    ydl_opts = {
        'format': 'm4a/bestaudio/best',  # The best audio version in m4a format
        'outtmpl': '%(id)s.%(ext)s',  # The output name should be the id followed by the extension
        'postprocessors': [{  # Extract audio using ffmpeg
            'key': 'FFmpegExtractAudio',
            'preferredcodec': 'm4a',
        }]
    }

    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        error_code = ydl.download(URLS)
After downloading, you can use the same code from option 1 to transcribe the file:
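Because the download script accepts a list of URLs, the transcription step may need to handle several files. A minimal sketch of that, assuming every finished download sits in one directory as `<video id>.m4a`: `find_downloaded_audio` and `transcribe_all` are hypothetical helper names, and `transcribe` is any callable you supply that takes a file path and returns the transcript text.

```python
from pathlib import Path
from typing import Callable, Dict, List


def find_downloaded_audio(directory: Path, ext: str = "m4a") -> List[Path]:
    """Return the audio files found in `directory`, sorted by name."""
    return sorted(p for p in directory.glob(f"*.{ext}") if p.is_file())


def transcribe_all(files: List[Path], transcribe: Callable[[str], str]) -> Dict[str, str]:
    """Map each video ID (the file stem) to the text returned by `transcribe`."""
    return {path.stem: transcribe(str(path)) for path in files}
```

With the AssemblyAI SDK configured, the callable could be as small as `lambda path: aai.Transcriber().transcribe(path).text`, so `transcribe_all(find_downloaded_audio(Path(".")), ...)` returns one transcript per downloaded video.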