This guide demonstrates how to implement a noise reduction system for real-time audio transcription using AssemblyAI’s Streaming STT and the noisereduce library. You’ll learn how to create a custom audio pipeline that preprocesses incoming audio to remove background noise before it reaches the transcription service.
This solution is particularly valuable for:
Voice assistants operating in noisy environments
Customer service applications processing calls
Meeting transcription tools
Voice-enabled applications requiring high accuracy
The implementation uses Python and combines proven audio processing techniques with AssemblyAI’s powerful transcription capabilities. While our example focuses on microphone input, the principles can be applied to any real-time audio stream.
Before we begin, make sure you have an AssemblyAI account and an API key. You can sign up for an AssemblyAI account and get your API key from your dashboard. Please note that Streaming Speech-to-text is available for upgraded accounts only. If you’re on the free plan, you’ll need to upgrade your account by adding a credit card.
Set all of your audio configurations and global variables. The NOISE_BUFFER_SIZE controls how much audio is buffered before applying noise reduction — 0.5 seconds provides a good balance between latency and noise reduction quality.
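As a rough sketch of what this configuration could look like, the block below defines the names that the callbacks in this guide refer to. The specific values (a 16 kHz mono stream read in 800-sample chunks) and the helper name NOISE_BUFFER_SECONDS are assumptions chosen for illustration, not settings confirmed by this guide; adjust them to match your microphone and connection parameters.

```python
import threading

# Assumed audio settings (illustrative values, not taken from the guide)
SAMPLE_RATE = 16000        # samples per second
CHANNELS = 1               # mono input
FRAMES_PER_BUFFER = 800    # samples per microphone read (50 ms at 16 kHz)

# Hypothetical helper constant: how much audio to collect before denoising
NOISE_BUFFER_SECONDS = 0.5
NOISE_BUFFER_SIZE = int(SAMPLE_RATE * NOISE_BUFFER_SECONDS)  # in samples

# Shared state used by the WebSocket callbacks
audio = None                    # audio interface handle, set at startup
stream = None                   # open microphone stream, set at startup
audio_thread = None             # background thread started in on_open
stop_event = threading.Event()  # signals the streaming loop to exit
```

Because nothing is sent until the buffer fills, a larger NOISE_BUFFER_SIZE gives the noise reducer more context per call but delays each chunk by roughly the buffer duration.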
When the connection opens, we start a background thread that reads audio from the microphone, buffers it, applies noise reduction using noisereduce, and sends the denoised audio to AssemblyAI.
The noise reduction works by:
Accumulating raw audio samples into a buffer
Once the buffer reaches 0.5 seconds, converting to float and applying nr.reduce_noise()
Converting back to int16 and sending over the WebSocket
Keeping the last 1024 samples as overlap for continuity, and only sending the non-overlapping portion to avoid duplicate audio
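The overlap bookkeeping in the steps above can be checked in isolation. The following is a minimal sketch using plain Python lists instead of NumPy arrays; the function name chunk_with_overlap and its process argument are hypothetical, with process standing in for the float conversion, the nr.reduce_noise() call, and the conversion back to int16. With an identity process, the concatenated output should reproduce the input stream with no sample sent twice.

```python
from typing import Callable, Iterable, Iterator, List

Samples = List[int]


def chunk_with_overlap(
    frames: Iterable[Samples],
    buffer_size: int,
    overlap: int,
    process: Callable[[Samples], Samples] = lambda s: s,
) -> Iterator[Samples]:
    """Yield processed chunks, skipping samples that were already sent."""
    buffer: Samples = []
    has_overlap = False
    for frame in frames:
        buffer.extend(frame)
        if len(buffer) >= buffer_size:
            processed = process(buffer)
            # The first `overlap` samples were part of the previous chunk,
            # so they are only used as context and not sent again.
            yield processed[overlap:] if has_overlap else processed
            has_overlap = True
            # Carry the tail forward as context for the next chunk.
            buffer = buffer[-overlap:]


# Small numbers so the behavior is easy to follow by hand.
signal = list(range(100))
frames = [signal[i:i + 8] for i in range(0, len(signal), 8)]

sent: Samples = []
for chunk in chunk_with_overlap(frames, buffer_size=20, overlap=4):
    sent.extend(chunk)

# Everything sent so far matches the start of the input, without duplicates.
assert sent == signal[:len(sent)]
```

Note that samples still sitting in the buffer when the input ends are never emitted in this sketch, which mirrors the streaming loop: audio that has not yet filled a full buffer is not sent when streaming stops.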
```python
def on_open(ws):
    """Called when the WebSocket connection is established."""
    print("WebSocket connection opened.")
    print(f"Connected to: {API_ENDPOINT}")

    def stream_audio():
        global stream
        print("Starting audio streaming with noise reduction...")
        buffer = np.array([], dtype=np.int16)
        overlap = 1024
        has_overlap = False

        while not stop_event.is_set():
            try:
                audio_data = stream.read(FRAMES_PER_BUFFER, exception_on_overflow=False)
                audio_array = np.frombuffer(audio_data, dtype=np.int16)
                buffer = np.append(buffer, audio_array)

                if len(buffer) >= NOISE_BUFFER_SIZE:
                    # Apply noise reduction
                    float_audio = buffer.astype(np.float32) / 32768.0
                    denoised = nr.reduce_noise(
                        y=float_audio,
                        sr=SAMPLE_RATE,
                        prop_decrease=0.75,
                        n_fft=1024,
                    )
                    int_audio = (denoised * 32768.0).astype(np.int16)

                    # Send only the non-overlapping portion to avoid duplicate audio
                    if has_overlap:
                        ws.send(int_audio[overlap:].tobytes(), websocket.ABNF.OPCODE_BINARY)
                    else:
                        ws.send(int_audio.tobytes(), websocket.ABNF.OPCODE_BINARY)
                        has_overlap = True

                    # Keep some overlap for continuity
                    buffer = buffer[-overlap:]
            except Exception as e:
                print(f"Error streaming audio: {e}")
                break

        print("Audio streaming stopped.")

    global audio_thread
    audio_thread = threading.Thread(target=stream_audio)
    audio_thread.daemon = True
    audio_thread.start()
```
```python
def on_error(ws, error):
    """Called when a WebSocket error occurs."""
    print(f"\nWebSocket Error: {error}")
    stop_event.set()


def on_close(ws, close_status_code, close_msg):
    """Called when the WebSocket connection is closed."""
    print(f"\nWebSocket Disconnected: Status={close_status_code}, Msg={close_msg}")
    global stream, audio
    stop_event.set()

    if stream:
        if stream.is_active():
            stream.stop_stream()
        stream.close()
        stream = None
    if audio:
        audio.terminate()
        audio = None
    if audio_thread and audio_thread.is_alive():
        audio_thread.join(timeout=1.0)
```