This guide shows how to use AssemblyAI’s Entity Detection model to redact specific entities from an audio transcript. AssemblyAI also offers a PII Redaction model that redacts automatically; the approach here suits cases where you need both a redacted and a non-redacted version of the transcript, because the redaction is applied client-side to a copy of the text. The examples use the AssemblyAI Python SDK. By the end of this guide, you’ll be able to redact sensitive information from your transcripts while preserving the original text.
Alternatively, you can get both a redacted and an unredacted version of the transcript back in a single request by setting redact_pii_return_unredacted: true alongside redact_pii: true, with no client-side reconstruction needed.
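As a rough illustration of the single-request alternative, the sketch below only builds a JSON request body and does not send it. The redact_pii and redact_pii_return_unredacted fields come from the note above; the audio_url and redact_pii_policies fields, the policy names, and the helper name build_transcript_request are assumptions made for this example, so check them against the API reference before relying on them.

```python
import json


def build_transcript_request(audio_url, policies):
    """Build a request body asking for PII redaction plus the unredacted text.

    Hypothetical helper: only the two redact_pii* flags are taken from this
    guide; the other field names are assumptions about the REST API.
    """
    return {
        "audio_url": audio_url,
        "redact_pii": True,
        "redact_pii_policies": list(policies),  # assumed field name
        "redact_pii_return_unredacted": True,
    }


body = build_transcript_request(
    "https://github.com/AssemblyAI-Examples/audio-examples/raw/main/20230607_me_canadian_wildfires.mp3",
    ["person_name", "location"],  # assumed policy names
)
print(json.dumps(body, indent=2))
```

The rest of this guide does not depend on this request; it reconstructs the redacted text locally from Entity Detection results instead.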

Quickstart

import assemblyai as aai
aai.settings.api_key = "YOUR_API_KEY"

transcriber = aai.Transcriber()

audio_url = (
    "https://github.com/AssemblyAI-Examples/audio-examples/raw/main/20230607_me_canadian_wildfires.mp3"
)

config = aai.TranscriptionConfig(entity_detection=True)

transcript = transcriber.transcribe(audio_url, config)
redacted_transcript = transcript.text

# redact ALL entities
for entity in transcript.entities:
    redacted_transcript = redacted_transcript.replace(entity.text, f"[{entity.entity_type.upper()}]")

print(redacted_transcript[:500])

Before you begin

To complete this tutorial, you need Python with pip installed and an AssemblyAI API key.

Step-by-Step Guide

Install the AssemblyAI SDK:
pip install assemblyai
Import the assemblyai package and set your API key:
import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"
Define a Transcriber and a TranscriptionConfig with entity_detection set to True, and then create a transcript.
transcriber = aai.Transcriber()

audio_url = (
    "https://github.com/AssemblyAI-Examples/audio-examples/raw/main/20230607_me_canadian_wildfires.mp3"
)

config = aai.TranscriptionConfig(entity_detection=True)

transcript = transcriber.transcribe(audio_url, config)
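To redact all detected entities, iterate over transcript.entities and replace each entity's text with its entity type in square brackets. Because str.replace substitutes every occurrence of a substring, each entity is redacted everywhere it appears in the text, including inside longer words: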
redacted_transcript = transcript.text

# redact ALL entities
for entity in transcript.entities:
    redacted_transcript = redacted_transcript.replace(entity.text, f"[{entity.entity_type.upper()}]")

print(redacted_transcript[:500])
Output
Smoke from hundreds of wildfires in [LOCATION] is triggering air quality alerts throughout [LOCATION]. Skylines from [LOCATION] to [LOCATION] to [LOCATION] are gray and smoggy. And in some places, the air quality warnings include the warning to stay inside. We wanted to better understand what's happening here and why. So he called [PERSON_NAME], an [OCCUPATION] in the [ORGANIZATION] at [ORGANIZATION]...
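Plain substring replacement can over-match when one entity's text occurs inside a longer word or inside another entity. The following is a minimal sketch of one way to reduce that, not part of the SDK: it uses Python's re module to match whole words only and tries longer entities first, in a single pass. The Entity namedtuple is a hypothetical stand-in that models only the text and entity_type attributes used in this guide, and redact_entities is a name chosen for this example.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for the SDK's entity objects (assumption: only
# the .text and .entity_type attributes are needed here).
Entity = namedtuple("Entity", ["text", "entity_type"])


def redact_entities(text, entities):
    """Replace each entity's text with [ENTITY_TYPE], whole words only."""
    # Map each distinct entity text to its placeholder label.
    labels = {}
    for entity in entities:
        labels.setdefault(entity.text, f"[{entity.entity_type.upper()}]")
    if not labels:
        return text
    # Longest alternatives first, so "Alabama" wins over "Al".
    alternatives = sorted(labels, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(t) for t in alternatives) + r")(?!\w)"
    )
    # A single pass means inserted placeholders are never re-scanned.
    return pattern.sub(lambda match: labels[match.group(0)], text)


sample = "Al moved to Alabama."
found = [Entity("Al", "person_name"), Entity("Alabama", "location")]
print(redact_entities(sample, found))
```

With the loop from the step above, the same input would also rewrite the "Al" inside "Alabama". The trade-off is that whole-word matching can miss an entity whose transcript text differs from entity.text in casing or punctuation.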
If you want to redact only certain types of entities (e.g., locations), filter them using a list of entity types:
# filter for some entities
redacted_transcript2 = transcript.text

pii_policies = ["location"]

for entity in transcript.entities:
    if entity.entity_type in pii_policies:
        redacted_transcript2 = redacted_transcript2.replace(entity.text, f"[{entity.entity_type.upper()}]")

print(redacted_transcript2[:500])
Output
Smoke from hundreds of wildfires in [LOCATION] is triggering air quality alerts throughout [LOCATION]. Skylines from [LOCATION] to [LOCATION] to [LOCATION] are gray and smoggy. And in some places, the air quality warnings include the warning to stay inside. We wanted to better understand what's happening here and why. So he called Peter DiCarlo, an associate professor in the department of Environmental Health and Engineering at Johns Hopkins University. Good morning. Professor. Good morning. So
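To decide which entity types to put in the filter list, it can help to see which types were detected at all. The sketch below counts entities by type and wraps the filtering loop from the step above in a reusable function. The names Entity, detected_types, and redact_types are hypothetical, and the stand-in Entity uses plain strings for entity_type, which is an assumption about how the SDK values compare.

```python
from collections import Counter, namedtuple

# Hypothetical stand-in for the SDK's entity objects.
Entity = namedtuple("Entity", ["text", "entity_type"])


def detected_types(entities):
    """Count how many entities of each type were detected."""
    return Counter(entity.entity_type for entity in entities)


def redact_types(text, entities, types_to_redact):
    """Redact only the entities whose type is in types_to_redact."""
    for entity in entities:
        if entity.entity_type in types_to_redact:
            text = text.replace(entity.text, f"[{entity.entity_type.upper()}]")
    return text


sample = "Peter DiCarlo said smoke from Canada reached Baltimore."
found = [
    Entity("Peter DiCarlo", "person_name"),
    Entity("Canada", "location"),
    Entity("Baltimore", "location"),
]

print(detected_types(found))
print(redact_types(sample, found, ["location"]))
```

Passing a different list, such as ["person_name"], produces another redacted copy from the same transcript without a new transcription request.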

Conclusion

Disclaimer: This method only creates a local redacted copy of the text. If you make a GET request for the transcript again, the text field in the response will still be unredacted.

This tutorial demonstrated how to use the AssemblyAI Python SDK to redact sensitive information from your transcriptions using the Entity Detection model. If you have further questions or need additional assistance, contact the AssemblyAI Support team at support@assemblyai.com.