Documentation Index
Fetch the complete documentation index at: https://assemblyai.com/docs/llms.txt
Use this file to discover all available pages before exploring further.
This guide will walk you through using AssemblyAI’s Entity Detection model to redact specific entities from an audio transcription.
While AssemblyAI offers a PII Redaction model for automatic redaction, this method is ideal for scenarios where you need both a redacted and a non-redacted version of the transcript.
We’ll use the AssemblyAI Python SDK to demonstrate this. By the end of this guide, you’ll be able to effectively redact sensitive information from your transcriptions while preserving the original text.
You can also get both a redacted and an unredacted version of the transcript back in a single request by setting redact_pii_return_unredacted: true alongside redact_pii: true — no client-side reconstruction needed.
Quickstart
import assemblyai as aai
aai.settings.api_key = "YOUR_API_KEY"
transcriber = aai.Transcriber()
audio_url = (
"https://github.com/AssemblyAI-Examples/audio-examples/raw/main/20230607_me_canadian_wildfires.mp3"
)
config = aai.TranscriptionConfig(entity_detection=True)
transcript = transcriber.transcribe(audio_url, config)
redacted_transcript = transcript.text
# redact ALL entities
for entity in transcript.entities:
redacted_transcript = redacted_transcript.replace(entity.text, f"[{entity.entity_type.upper()}]")
print(redacted_transcript[:500])
Before you begin
To complete this tutorial, you need:
Step-by-Step Guide
Install the AssemblyAI SDK:
Import the assemblyai package and set your API key:
import assemblyai as aai
aai.settings.api_key = "YOUR-API-KEY"
Define a Transcriber and a TranscriptionConfig with entity_detection set to True, and then create a transcript.
transcriber = aai.Transcriber()
audio_url = (
"https://github.com/AssemblyAI-Examples/audio-examples/raw/main/20230607_me_canadian_wildfires.mp3"
)
config = aai.TranscriptionConfig(entity_detection=True)
transcript = transcriber.transcribe(audio_url, config)
To redact all detected entities, iterate through the entities in the transcript and replace their text with their entity type:
redacted_transcript = transcript.text
# redact ALL entities
for entity in transcript.entities:
redacted_transcript = redacted_transcript.replace(entity.text, f"[{entity.entity_type.upper()}]")
print(redacted_transcript[:500])
Smoke from hundreds of wildfires in [LOCATION] is triggering air quality alerts throughout [LOCATION]. Skylines from [LOCATION] to [LOCATION] to [LOCATION] are gray and smoggy. And in some places, the air quality warnings include the warning to stay inside. We wanted to better understand what's happening here and why. So he called [PERSON_NAME], an [OCCUPATION] in the [ORGANIZATION] at [ORGANIZATION]...
If you want to redact only certain types of entities (e.g., locations), filter them using a list of entity types:
# filter for some entities
redacted_transcript2 = transcript.text
pii_policies = ["location"]
for entity in transcript.entities:
if entity.entity_type in pii_policies:
redacted_transcript2 = redacted_transcript2.replace(entity.text, f"[{entity.entity_type.upper()}]")
print(redacted_transcript2[:500])
Smoke from hundreds of wildfires in [LOCATION] is triggering air quality alerts throughout [LOCATION]. Skylines from [LOCATION] to [LOCATION] to [LOCATION] are gray and smoggy. And in some places, the air quality warnings include the warning to stay inside. We wanted to better understand what's happening here and why. So he called Peter DiCarlo, an associate professor in the department of Environmental Health and Engineering at Johns Hopkins University. Good morning. Professor. Good morning. So
Conclusion
Disclaimer: This method only creates a local redacted copy of the text. If you make a GET request for the transcript again, the text field will remain unredacted.
This tutorial demonstrated how to use the AssemblyAI Python SDK to redact sensitive information from your transcriptions using our Entity Detection model. If you have any further questions or need additional assistance, feel free to reach out to the AssemblyAI Support team at support@assemblyai.com!