Supported languages
Supported models
Supported regions
US & EU
- With hash substitution: Hi, my name is ####!
- With entity_name substitution: Hi, my name is [PERSON_NAME]!
Quickstart
Enable PII Redaction by setting redact_pii to True in the JSON payload. Set redact_pii_policies to specify the information you want to redact. For the full list of policies, see PII policies. Set redact_pii_sub to specify the replacement for redacted information.
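The three parameters above can be sketched as a request body in Python. This is a minimal illustration that builds the request without sending it. The endpoint URL, the audio_url field, the authorization header, and the helper name build_redaction_request are assumptions that this section does not state, so check them against the API reference before relying on them.

```python
import json
import urllib.request

# Assumed endpoint (US region); this section does not state the URL.
TRANSCRIPT_ENDPOINT = "https://api.assemblyai.com/v2/transcript"


def build_redaction_request(api_key, audio_url, policies, substitution):
    """Hypothetical helper: build a POST request with PII Redaction enabled."""
    payload = {
        "audio_url": audio_url,  # assumed field name for the audio source
        "redact_pii": True,
        "redact_pii_policies": list(policies),
        "redact_pii_sub": substitution,  # "hash" or "entity_name"
    }
    return urllib.request.Request(
        TRANSCRIPT_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"authorization": api_key, "content-type": "application/json"},
        method="POST",
    )


request = build_redaction_request(
    api_key="<YOUR_API_KEY>",
    audio_url="https://example.com/audio.mp3",  # placeholder URL
    policies=["person_name", "phone_number"],
    substitution="entity_name",
)
body = json.loads(request.data)
print(json.dumps(body, indent=2))
```

Passing the request object to urllib.request.urlopen would submit it. The sketch stops short of that so it runs without an API key.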
Create redacted audio files
In addition to redacting sensitive information from the transcription text, you can also generate a version of the original audio file with the PII “beeped” out. You can optionally use silence instead of a beep by setting override_audio_redaction_method to "silence" within redact_pii_audio_options.
To create a redacted version of the audio file, set redact_pii_audio to True on the JSON payload. Set redact_pii_audio_quality to specify the quality of the redacted audio file.
Use the transcript ID to poll the GET redacted audio endpoint every few seconds to check the status of the redacted audio. Once the status is redacted_audio_ready, you can retrieve the audio URL from the API response.
Redacted Audio for Silent Files
By default, audio redaction provides redacted audio URLs only when speech is detected. However, if your use case requires redacted audio files even for silent audio files without any dialogue, you can opt in to receive these URLs. Enable this by setting the optional parameter "return_redacted_no_speech_audio": true within redact_pii_audio_options in your POST request body.
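The audio redaction settings and the polling step can be sketched in Python as follows. This is an illustration under stated assumptions, not the official client: both helper names are hypothetical, and fetch_status stands in for whatever call you use to GET the redacted audio endpoint, whose URL this section does not give. The "pending" status in the demo is a placeholder, not a documented value.

```python
import time


def build_audio_redaction_fields(quality="mp3", use_silence=False,
                                 include_silent_files=False):
    """Hypothetical helper: request-body fields for redacted audio."""
    if quality not in ("mp3", "wav"):
        raise ValueError("redact_pii_audio_quality must be 'mp3' or 'wav'")
    fields = {"redact_pii_audio": True, "redact_pii_audio_quality": quality}
    options = {}
    if use_silence:
        options["override_audio_redaction_method"] = "silence"
    if include_silent_files:
        options["return_redacted_no_speech_audio"] = True
    if options:
        fields["redact_pii_audio_options"] = options
    return fields


def wait_for_redacted_audio(fetch_status, interval_seconds=3.0,
                            max_attempts=20, sleep=time.sleep):
    """Poll until the status is redacted_audio_ready, then return the URL.

    fetch_status is a caller-supplied function that performs the GET
    request and returns the parsed JSON response as a dict.
    """
    for attempt in range(max_attempts):
        response = fetch_status()
        if response.get("status") == "redacted_audio_ready":
            return response["redacted_audio_url"]
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError("redacted audio was not ready in time")


fields = build_audio_redaction_fields("wav", include_silent_files=True)

# Simulated responses so the sketch runs without network access.
fake_responses = iter([
    {"status": "pending"},  # placeholder status, not a documented value
    {"status": "redacted_audio_ready",
     "redacted_audio_url": "https://example.com/redacted.wav"},
])
url = wait_for_redacted_audio(lambda: next(fake_responses),
                              sleep=lambda _seconds: None)
print(url)
```

Injecting fetch_status and sleep keeps the polling loop independent of any HTTP library and makes it easy to exercise with canned responses.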
Return the unredacted transcript
If your workflow needs both the redacted and unredacted transcripts, you can request both in a single transcription call by setting redact_pii_return_unredacted to true. This avoids the need to send a second API request without redaction.
When redact_pii_return_unredacted is true, the response includes three additional fields alongside their redacted counterparts:
| Field | Type | Description |
|---|---|---|
| unredacted_text | string | The original transcript text before PII redaction was applied. |
| unredacted_words | array of Word | The original word objects before redaction. Same shape as words. |
| unredacted_utterances | array of Utterance | The original utterance objects before redaction. Same shape as utterances. |
Set redact_pii_return_unredacted to True alongside the existing PII parameters. The completed transcript will include unredacted_text, unredacted_words, and unredacted_utterances in addition to the redacted versions.
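Once the transcript completes, the redacted and unredacted fields can be separated as in the sketch below. The helper name split_transcripts is hypothetical, and the sample response is illustrative data shaped after the field names in the table above, not real API output.

```python
def split_transcripts(transcript):
    """Hypothetical helper: separate redacted and unredacted transcript fields."""
    required = ("unredacted_text", "unredacted_words", "unredacted_utterances")
    missing = [key for key in required if key not in transcript]
    if missing:
        # Most likely redact_pii_return_unredacted was not set to true.
        raise ValueError("response is missing: " + ", ".join(missing))

    redacted = {
        "text": transcript["text"],
        "words": transcript.get("words"),
        "utterances": transcript.get("utterances"),
    }
    unredacted = {
        "text": transcript["unredacted_text"],
        "words": transcript["unredacted_words"],
        "utterances": transcript["unredacted_utterances"],
    }
    return redacted, unredacted


# Illustrative response; word and utterance arrays are left empty for brevity.
sample_response = {
    "text": "Hi, my name is [PERSON_NAME]!",
    "words": [],
    "utterances": [],
    "unredacted_text": "Hi, my name is Doug Jones!",
    "unredacted_words": [],
    "unredacted_utterances": [],
}

redacted, unredacted = split_transcripts(sample_response)
print(redacted["text"])
print(unredacted["text"])
```

Keeping the two versions in separate dictionaries makes it harder to pass the unredacted text to a downstream system by accident.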
API reference
Request
| Key | Type | Description |
|---|---|---|
| redact_pii | boolean | Enable PII Redaction. |
| redact_pii_policies | array | PII policies for what information to redact. |
| redact_pii_sub | string | Method used to substitute PII in the transcript. Can be entity_name or hash. |
| redact_pii_audio | boolean | Create a redacted version of the audio file. |
| redact_pii_audio_quality | string | Quality of the redacted PII audio file. Can be mp3 or wav. |
| redact_pii_audio_options | object | Options for PII-redacted audio. See Create redacted audio files. |
| redact_pii_audio_options.override_audio_redaction_method | string | The method used to redact audio. Set to silence to replace PII with silence instead of the default beep. |
| redact_pii_return_unredacted | boolean | Opt-in. When true, returns the unredacted transcript alongside the redacted one. Requires redact_pii: true. Defaults to false, in which case only the redacted transcript is returned. |
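The constraints in the table above can be checked on the client before a request is sent. The following Python is a minimal sketch: check_redaction_params is a hypothetical helper, not part of any SDK, and it covers only the rules stated on this page.

```python
def check_redaction_params(payload):
    """Hypothetical helper: return a list of problems found in a request body."""
    problems = []
    enabled = payload.get("redact_pii") is True

    # Redaction needs at least one policy to have any effect.
    if enabled and not payload.get("redact_pii_policies"):
        problems.append("redact_pii_policies must list at least one policy")

    substitution = payload.get("redact_pii_sub")
    if substitution is not None and substitution not in ("entity_name", "hash"):
        problems.append("redact_pii_sub must be 'entity_name' or 'hash'")

    quality = payload.get("redact_pii_audio_quality")
    if quality is not None and quality not in ("mp3", "wav"):
        problems.append("redact_pii_audio_quality must be 'mp3' or 'wav'")

    if payload.get("redact_pii_return_unredacted") and not enabled:
        problems.append("redact_pii_return_unredacted requires redact_pii: true")

    return problems


bad_payload = {
    "redact_pii": False,
    "redact_pii_sub": "mask",
    "redact_pii_return_unredacted": True,
}
problems = check_redaction_params(bad_payload)
for problem in problems:
    print(problem)
```

Returning every problem at once, rather than raising on the first, lets a caller report all misconfigurations in one pass.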
Response
| Key | Type | Description |
|---|---|---|
| text | string | Transcript with redacted PII. |
Request for Redacted Audio
In the request URL, replace transcript_id with the ID of the transcript where redact_pii_audio is set to true.
Response for Redacted Audio
| Key | Type | Description |
|---|---|---|
| status | string | The status of the redacted audio. |
| redacted_audio_url | string | The URL of the redacted audio file. |
PII policies
| Policy name | Description | Example |
|---|---|---|
| account_number | Customer account or membership identification number | Policy No. 10042992; Member ID: HZ-5235-001 |
| banking_information | Banking information, including account and routing numbers | |
| blood_type | Blood type | O-, AB positive |
| credit_card_cvv | Credit card verification code | CVV: 080 |
| credit_card_expiration | Expiration date of a credit card | |
| credit_card_number | Credit card number | |
| date | Specific calendar date | December 18 |
| date_interval | Broader time periods, including date ranges, months, seasons, years, and decades | 2020-2021, 5-9 May, January 1984 |
| date_of_birth | Date of birth | Date of Birth: March 7, 1961 |
| drivers_license | Driver’s license number | DL# 356933-540 |
| drug | Medications, vitamins, or supplements | Advil, Acetaminophen, Panadol |
| duration | Measurements of time expressed as a numerical value plus a unit | 8 months, 2 years |
| email_address | Email address | support@assemblyai.com |
| event | Name of an event or holiday | Olympics, Yom Kippur |
| filename | Names of computer files, including the extension or filepath | Taxes/2012/brad-tax-returns.pdf |
| gender_sexuality | Terms indicating gender identity or sexual orientation, including slang terms | female, bisexual, trans |
| healthcare_number | Healthcare numbers and health plan beneficiary numbers | Policy No.: 5584-486-674-YM |
| injury | Bodily injury | I broke my arm, I have a sprained wrist |
| ip_address | Internet IP address, including IPv4 and IPv6 formats | 192.168.0.1 |
| language | Name of a natural language | Spanish, French |
| location | Any location reference, including mailing address, postal code, city, state, province, country, or coordinates | Lake Victoria, 145 Windsor St., 90210 |
| marital_status | Terms indicating marital status | Single, common-law, ex-wife, married |
| medical_condition | Name of a medical condition, disease, syndrome, deficit, or disorder | chronic fatigue syndrome, arrhythmia, depression |
| medical_process | Medical process, including treatments, procedures, and tests | heart surgery, CT scan |
| money_amount | Name and/or amount of currency | 15 pesos, $94.50 |
| nationality | Terms indicating nationality, ethnicity, or race | American, Asian, Caucasian |
| number_sequence | Numerical PII (including alphanumeric strings) that doesn’t fall under other categories | |
| occupation | Job title or profession | professor, actors, engineer, CPA |
| organization | Name of an organization | CNN, McDonalds, University of Alaska, Northwest General Hospital |
| passport_number | Passport numbers, issued by any country | PA4568332, NU3C6L86S12 |
| password | Account passwords, PINs, access keys, or verification answers | 27%alfalfa, temp1234, My mother’s maiden name is Smith |
| person_age | Number associated with an age | 27, 75 |
| person_name | Name of a person | Bob, Doug Jones, Dr. Kay Martinez, MD |
| phone_number | Telephone or fax number | |
| physical_attribute | Distinctive bodily attributes, including race | I’m 190cm tall |
| political_affiliation | Terms referring to a political party, movement, or ideology | Republican, Liberal |
| religion | Terms indicating religious affiliation | Hindu, Catholic |
| statistics | Medical statistics | 18%, 18 percent |
| time | Expressions indicating clock times | 19:37:28, 10pm EST |
| url | Internet addresses | https://www.assemblyai.com/ |
| us_social_security_number | Social Security Number or equivalent | |
| username | Usernames, login names, or handles | @AssemblyAI |
| vehicle_id | Vehicle identification numbers (VINs), vehicle serial numbers, and license plate numbers | 5FNRL38918B111818, BIF7547 |
| zodiac_sign | Names of Zodiac signs | Aries, Taurus |
Troubleshooting
Why is the PII not redacted in my transcription?
Make sure that at least one PII policy has been specified in your request, using the redact_pii_policies parameter. If you’re still experiencing issues, please reach out to our support team for assistance.
Why is my webhook not being sent?
There could be several reasons why your webhook isn’t being sent, such as a misconfigured URL, an unreachable endpoint, or an issue with the authentication headers. Double-check your request and ensure that the webhook_url parameter is included with a valid URL that can be reached by AssemblyAI’s API. If you’re using custom authentication headers, ensure that the webhook_auth_header_name and webhook_auth_header_value parameters are included and are correct. If you’re still having issues, please contact our support team for assistance.
Why does my redacted audio file sound worse than the original?
By default, the API returns redacted audio files in MP3 format, a lossy format. Lossy formats remove audio information to reduce file size, which may cause a reduction in quality. The difference may be particularly noticeable if the submitted audio is in a lossless file format. To retain as much quality as possible, you can instead return your redacted audio files in a lossless format by setting redact_pii_audio_quality to wav.