This page explains how the ai-coustics SDK integrates into Pipecat’s audio processing pipeline. If you’re looking for a quickstart, see the Pipecat Quickstart.
Documentation Index
Fetch the complete documentation index at: https://docs.ai-coustics.com/llms.txt
Use this file to discover all available pages before exploring further.
Architecture Overview
In Pipecat, audio filters run inside the input transport. They process raw input audio before it reaches any downstream processors. The AICFilter plugs into this mechanism via the audio_in_filter parameter on the transport.
Key Points
- AICFilter runs first. It processes the raw input audio before anything else in the pipeline sees it.
- VAD runs with the filter. The AICFilter obtains a VAD signal at the same time it filters the audio.
- The VAD analyzer just queries the VAD signal of the filter. It does not process audio a second time.
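The ordering described above can be sketched with toy stand-ins. These are not Pipecat’s real classes; ToyInputTransport and the processor functions are illustrative only, modeling how an input transport applies its audio filter before any downstream processor sees a frame:

```python
# Toy model of the ordering guarantee (not Pipecat's actual classes):
# the input transport applies its audio filter first, then hands the
# filtered audio to downstream processors.

class ToyInputTransport:
    def __init__(self, audio_in_filter, downstream):
        self._filter = audio_in_filter      # analogous to audio_in_filter
        self._downstream = downstream       # later pipeline processors

    def receive(self, raw_audio: bytes) -> list:
        filtered = self._filter(raw_audio)  # filter runs first, on raw input
        return [proc(filtered) for proc in self._downstream]

seen = []

def enhance(audio):        # stands in for the enhancement filter
    seen.append("filter")
    return audio

def stt(audio):            # stands in for a downstream processor
    seen.append("stt")
    return "transcript"

transport = ToyInputTransport(enhance, [stt])
transport.receive(b"\x00\x01")
print(seen)  # ['filter', 'stt'] — the filter ran before the processor
```

The design point is simply that downstream processors never see unenhanced audio; they only ever receive the filter’s output.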
AICFilter Integration
The AICFilter class inherits from Pipecat’s BaseAudioFilter. When the transport starts, it calls AICFilter.start(sample_rate), which:
- Loads the model: Either from a local file (model_path) or by downloading it from the CDN (model_id). Models are cached and shared across filter instances via a singleton AICModelManager.
- Creates the processor: An async processor (ProcessorAsync) is initialized with the model, license key, and optimal configuration for the given sample rate.
- Initializes VAD and enhancement contexts: The processor exposes a ProcessorContext for controlling parameters (bypass, enhancement level) and a VadContext for Voice Activity Detection parameters.
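The caching step can be illustrated with a minimal singleton sketch. AICModelManagerSketch, the placeholder model object, and the model-id string below are all illustrative assumptions, not the SDK’s actual AICModelManager API; the sketch only shows the pattern: one manager instance, one cached model per model_id, shared by every filter:

```python
# Illustrative singleton cache (not the real AICModelManager API):
# the first request for a model_id "downloads" it; later filter
# instances get the same shared object back.

class AICModelManagerSketch:
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._cache = {}   # model_id -> loaded model
        return cls._instance

    def get_model(self, model_id: str):
        if model_id not in self._cache:
            # Placeholder for the CDN download / file load step.
            self._cache[model_id] = f"<model object for {model_id}>"
        return self._cache[model_id]

a = AICModelManagerSketch().get_model("example-model")
b = AICModelManagerSketch().get_model("example-model")
print(a is b)  # True — one download, shared across filter instances
```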
Built-in VAD
The AICFilter provides a built-in voice activity detector through create_vad_analyzer(). This is the recommended VAD for Pipecat pipelines using ai-coustics enhancement.
How It Works
The built-in VAD (AICVADAnalyzer) does not analyze audio independently. It is a passive observer of the AICFilter’s internal state:
- When AICFilter.filter() processes an audio block, the underlying ProcessorAsync.process_async() call runs both enhancement and VAD detection internally as part of the same computation.
- The VAD result is stored on the processor’s VadContext.
- When Pipecat later calls AICVADAnalyzer.voice_confidence(buffer), the analyzer ignores the buffer entirely; it simply queries vad_ctx.is_speech_detected() and returns 1.0 (speech) or 0.0 (no speech).
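The passive-observer relationship can be sketched as follows. These are toy classes, not the SDK’s real AICFilter or AICVADAnalyzer; the point is that the analyzer never processes audio itself and only reads the flag the filter left behind:

```python
# Toy sketch of the passive-observer VAD (illustrative names only).

class ToyFilter:
    """Stands in for AICFilter: one pass does enhancement + VAD."""
    def __init__(self):
        self.speech_detected = False

    def filter(self, audio: bytes) -> bytes:
        # Enhancement and VAD happen in the same computation;
        # here we just record a fake VAD flag as a side effect.
        self.speech_detected = len(audio) > 0
        return audio  # the enhanced audio would be returned here

class ToyVADAnalyzer:
    """Stands in for AICVADAnalyzer: ignores its buffer argument."""
    def __init__(self, filt: ToyFilter):
        self._filter = filt

    def voice_confidence(self, _buffer: bytes) -> float:
        # No second pass over the audio — just query the filter's state.
        return 1.0 if self._filter.speech_detected else 0.0

f = ToyFilter()
vad = ToyVADAnalyzer(f)
f.filter(b"\x00\x01")              # filter runs first on the raw block
print(vad.voice_confidence(b""))   # 1.0 — the buffer argument is ignored
```

Because detection already happened inside the filter, this costs essentially nothing per voice_confidence() call.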
Lazy Initialization
The AICVADAnalyzer is designed so that it can be created before the AICFilter has started.
This is necessary because Pipecat’s transport and aggregators are typically constructed together at setup time, but the filter only initializes when the transport starts.
The analyzer handles this through a lazy binding pattern:
- At construction, it receives a vad_context_factory, a callable that returns the VadContext from the filter.
- On the first call to set_sample_rate() or voice_confidence(), it attempts to call the factory.
- Once the VadContext is obtained, pending VAD parameters (sensitivity, hold duration, etc.) are applied.
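A minimal sketch of this lazy binding, assuming a factory that returns None until the filter has started. The class names, the set_sensitivity parameter, and the fixed 1.0 confidence are illustrative, not the SDK’s actual API:

```python
# Sketch of the lazy-binding pattern (illustrative names only).

class VadContextSketch:
    """Stands in for the filter's VadContext, created at start()."""
    def __init__(self):
        self.sensitivity = None

    def set_sensitivity(self, value: float):
        self.sensitivity = value

class LazyVADAnalyzer:
    def __init__(self, vad_context_factory):
        self._factory = vad_context_factory
        self._ctx = None
        self._pending = {}               # parameters set before binding

    def set_sensitivity(self, value: float):
        if self._ctx is None:
            self._pending["sensitivity"] = value   # buffer until bound
        else:
            self._ctx.set_sensitivity(value)

    def _bind(self):
        if self._ctx is None:
            ctx = self._factory()        # None until the filter starts
            if ctx is not None:
                self._ctx = ctx
                for name, value in self._pending.items():
                    getattr(ctx, f"set_{name}")(value)  # apply pending params
                self._pending.clear()

    def voice_confidence(self, _buffer) -> float:
        self._bind()
        if self._ctx is None:
            return 0.0                   # filter not started yet
        return 1.0  # the real analyzer would query is_speech_detected()

ctx_holder = {"ctx": None}
analyzer = LazyVADAnalyzer(lambda: ctx_holder["ctx"])
analyzer.set_sensitivity(0.8)            # before the filter exists: buffered
print(analyzer.voice_confidence(b""))    # 0.0 — context not yet available
ctx_holder["ctx"] = VadContextSketch()   # "the filter has started"
print(analyzer.voice_confidence(b""))    # 1.0 — binds on first use
print(ctx_holder["ctx"].sensitivity)     # 0.8 — pending parameter applied
```

The buffering step is what makes construction-before-start safe: nothing set on the analyzer is lost, it is just replayed onto the VadContext once that context exists.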
Further Reading
Pipecat Quickstart
Get started using the ai-coustics SDK in Pipecat.
Pipecat Docs
Pipecat’s documentation on AICFilter.