Documentation Index

Fetch the complete documentation index at: https://docs.ai-coustics.com/llms.txt

Use this file to discover all available pages before exploring further.

This page explains how the ai-coustics SDK integrates into Pipecat’s audio processing pipeline. If you’re looking for a quickstart, see the Pipecat Quickstart.

Architecture Overview

In Pipecat, audio filters run inside the input transport. They process raw input audio before it reaches any downstream processors. The AICFilter plugs into this mechanism via the audio_in_filter parameter on the transport.
     ┌───────┐   ┌─────────┐   ┌───┐   ┌───┐   ┌───┐   ┌───┐   ┌──────┐ 
     │ Audio ├──►│AICFilter├──►│VAD├──►│STT├──►│LLM├──►│TTS├──►│Audio │ 
     │ Input │   └─────────┘   └───┘   └───┘   └───┘   └───┘   │Output│ 
     └───────┘                                                 └──────┘

Key Points

  • AICFilter runs first. It processes the raw input audio, before anything else in the pipeline sees it.
  • VAD runs with the filter. The AICFilter obtains a VAD signal at the same time it filters the audio.
  • The VAD analyzer simply queries the filter’s VAD signal. It does not process the audio a second time.
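
The wiring above can be sketched for a Daily transport. This is an illustrative configuration only: the AICFilter constructor arguments and its import path are assumptions, not verified API, and the snippet requires Pipecat plus real room credentials to actually run.

```python
# Illustrative wiring sketch -- the AICFilter import path and constructor
# arguments (model_id, license_key) are assumptions, not verified API.
from pipecat.transports.services.daily import DailyParams, DailyTransport

from aic_sdk.pipecat import AICFilter  # hypothetical import path

aic_filter = AICFilter(model_id="...", license_key="...")  # or model_path=...

room_url = "https://example.daily.co/room"  # placeholder
token = None                                # placeholder

transport = DailyTransport(
    room_url,
    token,
    "bot",
    DailyParams(
        audio_in_enabled=True,
        audio_in_filter=aic_filter,                     # runs first on raw input audio
        vad_analyzer=aic_filter.create_vad_analyzer(),  # queries the filter's VAD signal
    ),
)
```

Because the analyzer comes from the same filter instance, enhancement and VAD share a single processing pass, as described above.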

AICFilter Integration

The AICFilter class inherits from Pipecat’s BaseAudioFilter. When the transport starts, it calls AICFilter.start(sample_rate), which:
  1. Loads the model: Either from a local file (model_path) or by downloading it from the CDN (model_id). Models are cached and shared across filter instances via a singleton AICModelManager.
  2. Creates the processor: An async processor (ProcessorAsync) is initialized with the model, license key, and optimal configuration for the given sample rate.
  3. Initializes VAD and enhancement contexts: The processor exposes a ProcessorContext for controlling parameters (bypass, enhancement level) and a VadContext for Voice Activity Detection parameters.
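
The model caching in step 1 can be illustrated with a minimal singleton cache. This is a sketch of the pattern only; the class and method names are illustrative and do not reflect AICModelManager’s actual API.

```python
# Minimal sketch of the singleton model-cache pattern described above.
# Names are illustrative, not the SDK's actual API.
class ModelManagerSketch:
    _instance = None

    def __new__(cls):
        # Singleton: every caller shares one manager instance.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._cache = {}
        return cls._instance

    def load(self, model_id: str) -> bytes:
        # Download (here: faked) only on the first request; reuse afterwards.
        if model_id not in self._cache:
            self._cache[model_id] = b"model-bytes-for-" + model_id.encode()
        return self._cache[model_id]

# Two filter instances end up sharing the same cached model object.
m1 = ModelManagerSketch().load("enhancement-model")
m2 = ModelManagerSketch().load("enhancement-model")
assert m1 is m2
```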

Built-in VAD

The AICFilter provides a built-in voice activity detector through create_vad_analyzer(). This is the recommended VAD for Pipecat pipelines using ai-coustics enhancement.

How It Works

The built-in VAD (AICVADAnalyzer) does not analyze audio independently. It is a passive observer of the AICFilter’s internal state:
  1. When AICFilter.filter() processes an audio block, the underlying ProcessorAsync.process_async() call runs both enhancement and VAD detection internally as part of the same computation.
  2. The VAD result is stored on the processor’s VadContext.
  3. When Pipecat later calls AICVADAnalyzer.voice_confidence(buffer), the analyzer ignores the buffer entirely; it simply queries vad_ctx.is_speech_detected() and returns 1.0 (speech) or 0.0 (no speech).

This means the VAD cannot function without the filter running. It has no independent audio analysis capability. The dependency chain is:
                     ┌─────────┐         ┌──────────────┐
                     │AICFilter│         │AICVADAnalyzer│
                     └────┬────┘         └─┬────┬───────┘
                          │                │ ▲  │        
                    ┌─────┴─────┐          │ │  │        
                    ▼           ▼          │ │  │        
                ┌────────┐  ┌──────┐       │ │  │        
                │Filtered│  │ VAD  │◄──────┘ │  │        
                │  Audio │  │result├─────────┘  │        
                └────────┘  └──────┘            ▼        
                                           ┌──────────┐  
                                           │  Voice   │  
                                           │Confidence│  
                                           └──────────┘  
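The passive-observer relationship can be sketched in plain Python. The class names are illustrative stand-ins, not the SDK’s actual classes; the point is only that voice_confidence() ignores its buffer and reads state the filter stored earlier.

```python
# Sketch of the passive-observer relationship described above
# (names are illustrative, not the SDK's actual classes).
class VadContextSketch:
    def __init__(self):
        self._speech = False

    def set_speech_detected(self, value: bool):
        # In the real SDK this state is updated by the filter's processing pass.
        self._speech = value

    def is_speech_detected(self) -> bool:
        return self._speech


class VadAnalyzerSketch:
    def __init__(self, vad_ctx: VadContextSketch):
        self._vad_ctx = vad_ctx

    def voice_confidence(self, buffer: bytes) -> float:
        # The audio buffer is ignored: confidence comes entirely from
        # the state stored during the filter's own processing pass.
        return 1.0 if self._vad_ctx.is_speech_detected() else 0.0


ctx = VadContextSketch()
analyzer = VadAnalyzerSketch(ctx)
assert analyzer.voice_confidence(b"\x00" * 320) == 0.0  # filter saw no speech
ctx.set_speech_detected(True)                           # filter updates the context
assert analyzer.voice_confidence(b"\x00" * 320) == 1.0  # same buffer, new answer
```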
When the VAD is in use and the filter is disabled (via FilterEnableFrame(False)), the filter passes the audio through unmodified, but the processor continues to run so that it can still produce a VAD output. If the VAD is not used, the processor does not run while the filter is disabled.
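
These bypass semantics can be sketched as a small conditional. The names here are illustrative, not the SDK’s actual implementation:

```python
# Sketch of the bypass semantics described above (illustrative names).
class FilterSketch:
    def __init__(self, vad_in_use: bool):
        self.enabled = True
        self.vad_in_use = vad_in_use
        self.processor_calls = 0

    def _process(self, audio: bytes) -> bytes:
        self.processor_calls += 1
        return audio  # stand-in for the real enhancement + VAD pass

    def filter(self, audio: bytes) -> bytes:
        if self.enabled:
            return self._process(audio)
        if self.vad_in_use:
            # Disabled but VAD attached: keep the processor running for
            # the VAD signal, while returning the audio unmodified.
            self._process(audio)
        return audio


f = FilterSketch(vad_in_use=True)
f.enabled = False
f.filter(b"\x00" * 320)
assert f.processor_calls == 1   # processor still ran, for the VAD

g = FilterSketch(vad_in_use=False)
g.enabled = False
g.filter(b"\x00" * 320)
assert g.processor_calls == 0   # no VAD, so the processor is skipped
```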

Lazy Initialization

The AICVADAnalyzer is designed to allow creating it before the AICFilter has started. This is necessary because Pipecat’s transport and aggregators are typically constructed together at setup time, but the filter only initializes when the transport starts. The analyzer handles this through a lazy binding pattern:
  • At construction, it receives a vad_context_factory, a callable that returns the VadContext from the filter once it is available.
  • On the first call to set_sample_rate() or voice_confidence(), it attempts to call the factory.
  • Once the VadContext is obtained, pending VAD parameters (sensitivity, hold duration, etc.) are applied.
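
The lazy binding pattern can be sketched as follows. The class, the factory, and the parameter handling are illustrative stand-ins for the behavior described above, not the SDK’s actual code:

```python
# Sketch of the lazy-binding pattern described above (illustrative names).
class LazyAnalyzerSketch:
    def __init__(self, vad_context_factory):
        self._factory = vad_context_factory
        self._vad_ctx = None
        self._pending_params = {"sensitivity": 0.5}  # held until binding succeeds

    def _try_bind(self):
        if self._vad_ctx is None:
            self._vad_ctx = self._factory()  # None until the filter has started
            if self._vad_ctx is not None:
                self._vad_ctx.update(self._pending_params)  # apply deferred params

    def voice_confidence(self, buffer: bytes) -> float:
        self._try_bind()
        if self._vad_ctx is None:
            return 0.0  # filter not started yet; no signal available
        return 1.0 if self._vad_ctx.get("speech") else 0.0


ctx_store = {}  # stands in for the filter's VadContext; empty until "start"
factory = lambda: ctx_store if ctx_store.get("started") else None

analyzer = LazyAnalyzerSketch(factory)
assert analyzer.voice_confidence(b"") == 0.0     # before the filter starts

ctx_store.update(started=True, speech=True)      # the transport starts the filter
assert analyzer.voice_confidence(b"") == 1.0     # now bound; params applied
assert ctx_store["sensitivity"] == 0.5           # pending params were flushed
```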

Further Reading

Pipecat Quickstart

Get started using the ai-coustics SDK in Pipecat.

Pipecat Docs

Pipecat’s documentation on AICFilter.