Overview
Streaming PII Redaction lets you automatically detect and remove personally identifiable information (PII) from your streaming transcripts in real time. When enabled, the API redacts PII from final turns before sending them to the client. Partial turns are not redacted.
PII redaction supports all streaming models: u3-rt-pro, universal-streaming-english, and universal-streaming-multilingual.
Final turns only
PII redaction applies only to final turns. When redact_pii is true,
include_partial_turns defaults to false automatically so no unredacted
text reaches the client. Only set include_partial_turns to true if you
explicitly want partial (non-final) turns, which will contain unredacted PII
alongside the redacted final turns.
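The defaulting rule described above can be modeled client-side with a small helper. This is only a sketch of the documented behavior, not the API's implementation, and the function name effective_include_partial_turns is hypothetical.

```python
from typing import Optional


def effective_include_partial_turns(
    redact_pii: bool, include_partial_turns: Optional[bool] = None
) -> bool:
    """Model the documented default for include_partial_turns.

    Hypothetical helper: it mirrors the rule described in this page and is
    not part of any AssemblyAI SDK.
    """
    if include_partial_turns is not None:
        # An explicit value always wins, even if it exposes unredacted partials.
        return include_partial_turns
    # Documented default: false when redaction is enabled, otherwise true.
    return not redact_pii
```

A helper like this can be useful in configuration checks, for example to log a warning whenever redact_pii is true and the effective value is also true.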
When you enable PII redaction, your final turns will look like this:
With hash substitution: Hi, my name is ####!
With entity_name substitution: Hi, my name is [PERSON_NAME]!
Pre-recorded PII redaction
For PII redaction on pre-recorded audio, including generating redacted audio files, see Redact PII from transcripts.
Connection parameters
| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| redact_pii | boolean | Yes | false | Enable PII text redaction. Only applies to final turns. |
| redact_pii_policies | array | No | All | PII entity types to redact. Over the raw WebSocket, pass a JSON-encoded array of policy names (e.g. ["person_name","phone_number"]). The SDKs accept a native list/array. If omitted and redact_pii is true, all detected PII is redacted. See PII policies for the full list. |
| redact_pii_sub | string | No | hash | Replacement scheme. hash replaces PII with # characters, entity_name replaces with [ENTITY_TYPE]. |
| include_partial_turns | boolean | No | false when redact_pii is true, otherwise true | Whether to include partial (non-final) turns. Defaults to false automatically when PII redaction is enabled, so no unredacted text reaches the client. Set to true only if you explicitly want to receive partial turns, which will contain unredacted PII. |
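As a rough illustration of how these parameters end up in a raw WebSocket URL, the sketch below JSON-encodes the policy list and URL-encodes the query string. The helper name build_streaming_url and the VALID_SUBSTITUTIONS set are assumptions made for this example, and booleans are sent as the string "true" to match the quickstart on this page.

```python
import json
from typing import List, Optional
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "wss://streaming.assemblyai.com/v3/ws"
VALID_SUBSTITUTIONS = {"hash", "entity_name"}  # values listed in the table above


def build_streaming_url(
    policies: Optional[List[str]] = None,
    substitution: str = "hash",
    sample_rate: int = 16000,
    speech_model: str = "u3-rt-pro",
) -> str:
    """Hypothetical helper: build a connection URL with PII redaction enabled."""
    if substitution not in VALID_SUBSTITUTIONS:
        raise ValueError(f"redact_pii_sub must be one of {sorted(VALID_SUBSTITUTIONS)}")
    params = {
        "sample_rate": sample_rate,
        "speech_model": speech_model,
        "redact_pii": "true",
        "redact_pii_sub": substitution,
    }
    if policies:
        # Over the raw WebSocket the array is passed as a JSON-encoded string.
        params["redact_pii_policies"] = json.dumps(policies)
    # When policies are omitted, the parameter is left out so all PII is redacted.
    return f"{BASE_URL}?{urlencode(params)}"


# Round-trip the query string to confirm the policies decode back to a list.
url = build_streaming_url(["person_name", "phone_number"], "entity_name")
query = parse_qs(urlsplit(url).query)
decoded_policies = json.loads(query["redact_pii_policies"][0])
```

Validating redact_pii_sub before connecting is a design choice of this sketch; it surfaces a typo locally instead of relying on however the server responds to an unknown value.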
Quickstart
Get started with streaming PII redaction using the code below. This example streams audio from your microphone and prints each turn with PII redacted.
Python
Install the required libraries:
pip install websocket-client pyaudio
Create a new file main.py and paste the code below. Replace <YOUR_API_KEY> with your API key.
Run with python main.py and speak into your microphone.
import pyaudio
import websocket
import json
import threading
import time
from urllib.parse import urlencode

YOUR_API_KEY = "<YOUR_API_KEY>"

# Query parameters for the session. The policy list is JSON-encoded.
CONNECTION_PARAMS = {
    "sample_rate": 16000,
    "speech_model": "u3-rt-pro",
    "format_turns": "true",
    "redact_pii": "true",
    "redact_pii_policies": json.dumps(["person_name", "phone_number", "email_address"]),
    "redact_pii_sub": "entity_name",
}
API_ENDPOINT_BASE_URL = "wss://streaming.assemblyai.com/v3/ws"
API_ENDPOINT = f"{API_ENDPOINT_BASE_URL}?{urlencode(CONNECTION_PARAMS)}"

FRAMES_PER_BUFFER = 800  # 50 ms of audio at 16 kHz
SAMPLE_RATE = CONNECTION_PARAMS["sample_rate"]
CHANNELS = 1
FORMAT = pyaudio.paInt16

audio = None
stream = None
ws_app = None
audio_thread = None
stop_event = threading.Event()


def on_open(ws):
    print("WebSocket connection opened.")

    def stream_audio():
        # Read microphone audio and send it as binary frames until stopped.
        global stream
        while not stop_event.is_set():
            try:
                audio_data = stream.read(FRAMES_PER_BUFFER, exception_on_overflow=False)
                ws.send(audio_data, websocket.ABNF.OPCODE_BINARY)
            except Exception as e:
                print(f"Error streaming audio: {e}")
                break

    global audio_thread
    audio_thread = threading.Thread(target=stream_audio)
    audio_thread.daemon = True
    audio_thread.start()


def on_message(ws, message):
    try:
        data = json.loads(message)
        msg_type = data.get("type")
        if msg_type == "Begin":
            print(f"Session began: ID={data.get('id')}")
        elif msg_type == "Turn":
            transcript = data.get("transcript", "")
            end_of_turn = data.get("end_of_turn", False)
            # Only final turns are printed; these are the redacted ones.
            if end_of_turn:
                print(f"\r{' ' * 80}\r{transcript}")
        elif msg_type == "Termination":
            print(f"\nSession terminated: {data.get('audio_duration_seconds', 0)}s of audio")
    except Exception as e:
        print(f"Error handling message: {e}")


def on_error(ws, error):
    print(f"\nWebSocket Error: {error}")
    stop_event.set()


def on_close(ws, close_status_code, close_msg):
    print(f"\nWebSocket Disconnected: Status={close_status_code}")
    global stream, audio
    stop_event.set()
    # Release the microphone and PyAudio resources.
    if stream:
        if stream.is_active():
            stream.stop_stream()
        stream.close()
    if audio:
        audio.terminate()


def run():
    global audio, stream, ws_app
    audio = pyaudio.PyAudio()
    stream = audio.open(
        input=True,
        frames_per_buffer=FRAMES_PER_BUFFER,
        channels=CHANNELS,
        format=FORMAT,
        rate=SAMPLE_RATE,
    )
    print("Speak into your microphone. Press Ctrl+C to stop.")
    ws_app = websocket.WebSocketApp(
        API_ENDPOINT,
        header={"Authorization": YOUR_API_KEY},
        on_open=on_open,
        on_message=on_message,
        on_error=on_error,
        on_close=on_close,
    )
    ws_thread = threading.Thread(target=ws_app.run_forever)
    ws_thread.daemon = True
    ws_thread.start()
    try:
        while ws_thread.is_alive():
            time.sleep(0.1)
    except KeyboardInterrupt:
        print("\nStopping...")
        stop_event.set()
        # Ask the server to end the session, then give it time to respond.
        if ws_app and ws_app.sock and ws_app.sock.connected:
            ws_app.send(json.dumps({"type": "Terminate"}))
            time.sleep(2)
        if ws_app:
            ws_app.close()
        ws_thread.join(timeout=2.0)


if __name__ == "__main__":
    run()
Python SDK
Install the required libraries:
pip install "assemblyai>=0.64.0" pyaudio
Create a new file main.py and paste the code below. Replace <YOUR_API_KEY> with your API key.
Run with python main.py and speak into your microphone.
import logging
from typing import Type

import assemblyai as aai
from assemblyai.streaming.v3 import (
    BeginEvent,
    StreamingClient,
    StreamingClientOptions,
    StreamingError,
    StreamingEvents,
    StreamingParameters,
    TurnEvent,
    TerminationEvent,
)

api_key = "<YOUR_API_KEY>"

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def on_begin(self: Type[StreamingClient], event: BeginEvent):
    print(f"Session started: {event.id}")


def on_turn(self: Type[StreamingClient], event: TurnEvent):
    print(f"{event.transcript} (end_of_turn={event.end_of_turn})")


def on_terminated(self: Type[StreamingClient], event: TerminationEvent):
    print(
        f"Session terminated: {event.audio_duration_seconds} seconds of audio processed"
    )


def on_error(self: Type[StreamingClient], error: StreamingError):
    print(f"Error occurred: {error}")


def main():
    client = StreamingClient(
        StreamingClientOptions(
            api_key=api_key,
            api_host="streaming.assemblyai.com",
        )
    )

    # Register event handlers before connecting.
    client.on(StreamingEvents.Begin, on_begin)
    client.on(StreamingEvents.Turn, on_turn)
    client.on(StreamingEvents.Termination, on_terminated)
    client.on(StreamingEvents.Error, on_error)

    # The SDK accepts the policies as a native list.
    client.connect(
        StreamingParameters(
            sample_rate=16000,
            speech_model="u3-rt-pro",
            format_turns=True,
            redact_pii=True,
            redact_pii_policies=["person_name", "phone_number", "email_address"],
            redact_pii_sub="entity_name",
        )
    )

    try:
        client.stream(
            aai.extras.MicrophoneStream(sample_rate=16000)
        )
    finally:
        client.disconnect(terminate=True)


if __name__ == "__main__":
    main()
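JavaScript
Install the required libraries (the code below imports only the ws package):
npm install ws
Create a new file index.mjs and paste the code below. Replace <YOUR_API_KEY> with your API key.
Run with node index.mjs. As written, this example opens the connection and prints final turns; it does not capture the microphone, so send your audio as binary frames where the comment in the open handler indicates.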
import WebSocket from "ws";

const YOUR_API_KEY = "<YOUR_API_KEY>";

// The policy list is JSON-encoded for the raw WebSocket connection.
const params = new URLSearchParams({
  sample_rate: "16000",
  speech_model: "u3-rt-pro",
  format_turns: "true",
  redact_pii: "true",
  redact_pii_policies: JSON.stringify(["person_name", "phone_number", "email_address"]),
  redact_pii_sub: "entity_name",
});

const url = `wss://streaming.assemblyai.com/v3/ws?${params}`;

const ws = new WebSocket(url, {
  headers: { Authorization: YOUR_API_KEY },
});

ws.on("open", () => {
  console.log("Connected to AssemblyAI Streaming API");
  // Stream audio data by sending binary frames:
  // ws.send(audioBuffer);
});

ws.on("message", (data) => {
  const msg = JSON.parse(data);
  // Print final turns only; these are the redacted ones.
  if (msg.type === "Turn" && msg.end_of_turn) {
    console.log(msg.transcript);
  }
});

ws.on("error", (err) => console.error("WebSocket error:", err));
ws.on("close", () => console.log("Disconnected"));
JavaScript SDK
Install the required libraries:
npm install assemblyai node-record-lpcm16
Create a new file index.mjs and paste the code below. Replace <YOUR_API_KEY> with your API key.
Run with node index.mjs and speak into your microphone.
import { AssemblyAI } from "assemblyai";
import recorder from "node-record-lpcm16";

const apiKey = "<YOUR_API_KEY>";
const SAMPLE_RATE = 16000;

const client = new AssemblyAI({ apiKey });

// The SDK accepts the policies as a native array.
const transcriber = client.streaming.transcriber({
  sampleRate: SAMPLE_RATE,
  speechModel: "u3-rt-pro",
  formatTurns: true,
  redactPii: true,
  redactPiiPolicies: ["person_name", "phone_number", "email_address"],
  redactPiiSub: "entity_name",
});

transcriber.on("open", ({ id }) => {
  console.log(`Session started: ${id}`);
});

transcriber.on("turn", (turn) => {
  // Print final turns only; these are the redacted ones.
  if (turn.end_of_turn) {
    console.log(turn.transcript);
  }
});

transcriber.on("close", (code, reason) => {
  console.log(`Session terminated: ${code} ${reason}`);
});

transcriber.on("error", (error) => {
  console.error(`Error occurred: ${error}`);
});

async function main() {
  await transcriber.connect();
  console.log("Speak into your microphone. Press Ctrl+C to stop.");

  const recording = recorder.record({
    channels: 1,
    sampleRate: SAMPLE_RATE,
    audioType: "raw",
  });

  // Forward each recorded chunk to the streaming session.
  recording.stream().on("data", (chunk) => transcriber.sendAudio(chunk));

  process.on("SIGINT", async () => {
    recording.stop();
    await transcriber.close(true);
    process.exit(0);
  });
}

main();
Example output
With entity_name substitution:
Hi, my name is [PERSON_NAME] and you can reach me at [PHONE_NUMBER] or [EMAIL_ADDRESS].
With hash substitution:
Hi, my name is #### and you can reach me at ###-###-#### or ####@#####.###.
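With entity_name substitution, the bracketed placeholders can be counted to see what was redacted in a final turn. The sketch below assumes placeholders always take the form of an uppercase label in square brackets, as in the example output above; the regular expression and the helper name count_redactions are illustrative, not part of the API.

```python
import re
from collections import Counter

# Assumed placeholder shape, inferred from the example output: [PERSON_NAME]
PLACEHOLDER = re.compile(r"\[([A-Z][A-Z0-9_]*)\]")


def count_redactions(transcript: str) -> Counter:
    """Hypothetical helper: count entity_name placeholders in a final transcript."""
    return Counter(PLACEHOLDER.findall(transcript))


example = (
    "Hi, my name is [PERSON_NAME] and you can reach me at "
    "[PHONE_NUMBER] or [EMAIL_ADDRESS]."
)
counts = count_redactions(example)
```

This approach does not work with hash substitution, because # characters carry no entity label.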
Supported PII policies
Streaming PII redaction supports the same policies as pre-recorded PII redaction, including person_name, phone_number, email_address, credit_card_number, us_social_security_number, date_of_birth, and more.
For the full list of available policies, see PII policies.
Troubleshooting
Why am I still seeing PII in the transcript?
PII redaction applies only to final turns. If you're seeing PII, you
likely set include_partial_turns to true, which returns unredacted
partial turns alongside redacted finals. Remove that override (or set it to
false) to receive only redacted final turns, which is the default when
redact_pii is enabled.
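If partial turns must stay enabled for other reasons, one option is to drop them on the client before anything is logged or displayed. The following is a minimal sketch of such a guard, assuming the message fields used in the quickstart (type, end_of_turn, transcript); the function name redacted_final_transcript is hypothetical.

```python
import json
from typing import Optional


def redacted_final_transcript(raw_message: str) -> Optional[str]:
    """Return the transcript for final Turn messages only, otherwise None."""
    data = json.loads(raw_message)
    if data.get("type") != "Turn":
        # Begin, Termination, and other message types carry no transcript to show.
        return None
    if not data.get("end_of_turn", False):
        # Partial turn: it may contain unredacted PII, so it is discarded.
        return None
    return data.get("transcript", "")


# Sample messages shaped like those handled in the quickstart.
partial = json.dumps({"type": "Turn", "end_of_turn": False, "transcript": "Hi, my name is"})
final = json.dumps({"type": "Turn", "end_of_turn": True, "transcript": "Hi, my name is [PERSON_NAME]!"})

shown = [t for t in map(redacted_final_transcript, [partial, final]) if t is not None]
```

Only the final turn survives the filter, so shown holds the single redacted transcript.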
Can I redact PII from the audio itself?
Audio redaction is not available for streaming. To generate a redacted audio
file, use pre-recorded PII redaction
with the redact_pii_audio parameter.