Is there a way to generate SRT or VTT captions with speaker labels?

The export captions endpoint does not include speaker label information by default, but you can generate captions with speaker labels using the information from the JSON response for the completed transcript. When you enable Speaker Diarization in your transcription request, the response includes speaker information for each word and utterance. You can use this data along with the word-level timestamps to build custom SRT or VTT files that include speaker identification. See the Create Subtitles with Speaker Labels cookbook for a step-by-step guide on how to accomplish this.
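As a rough illustration of the approach described above, the following Python sketch groups consecutive words from the same speaker into caption cues and formats them as SRT. It assumes each word in the transcript JSON is a dictionary with `text`, `start`, `end` (in milliseconds), and `speaker` keys; check these field names against your actual response. The helper names `format_srt_time` and `words_to_srt`, the `max_chars` limit, and the `Speaker X:` prefix are illustrative choices, not part of any official API.

```python
def format_srt_time(ms: int) -> str:
    """Convert milliseconds to an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def words_to_srt(words: list[dict], max_chars: int = 40) -> str:
    """Build SRT text from word dicts (assumed keys: text, start, end, speaker)."""
    cues: list[list[dict]] = []
    current: list[dict] = []
    for word in words:
        if current:
            same_speaker = word["speaker"] == current[0]["speaker"]
            length = len(" ".join(w["text"] for w in current)) + 1 + len(word["text"])
            # Start a new cue on a speaker change or when the line gets too long.
            if not same_speaker or length > max_chars:
                cues.append(current)
                current = []
        current.append(word)
    if current:
        cues.append(current)

    blocks = []
    for index, cue in enumerate(cues, start=1):
        start = format_srt_time(cue[0]["start"])
        end = format_srt_time(cue[-1]["end"])
        text = " ".join(w["text"] for w in cue)
        blocks.append(f"{index}\n{start} --> {end}\nSpeaker {cue[0]['speaker']}: {text}\n")
    # Cues are separated by a blank line, as SRT requires.
    return "\n".join(blocks)


if __name__ == "__main__":
    # Hypothetical sample data shaped like the assumed word objects.
    sample_words = [
        {"text": "Hello", "start": 0, "end": 500, "speaker": "A"},
        {"text": "there.", "start": 500, "end": 1000, "speaker": "A"},
        {"text": "Hi!", "start": 1200, "end": 1500, "speaker": "B"},
    ]
    print(words_to_srt(sample_words))
```

To produce VTT instead, the same cues can be written with a `WEBVTT` header line and a period rather than a comma before the milliseconds in each timestamp.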
