This guide explains the difference between latency and delay in the ai-coustics SDK, and how to optimize performance for real-time applications.

Terminology

Term | Meaning | Example
Latency | Time to execute the model’s forward pass (CPU dependent) | “2 ms inference latency”
Delay | Time offset of the output audio relative to the input | “30 ms output delay”
These are independent values. A model with 2 ms latency on your CPU and 30 ms delay means: the computation completes in 2 ms, but the output audio is shifted by 30 ms relative to the input.

Understanding output_delay

The output_delay value (retrieved via aic_processor_get_output_delay) represents the total audio delay in frames. Convert to milliseconds:
delay_ms = (delay_frames / configured_sample_rate) * 1000
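As a minimal sketch of this conversion, with illustrative values (a 1440-frame delay reported at a configured sample rate of 48 kHz works out to 30 ms):
# Illustrative values; the frame count would come from aic_processor_get_output_delay (or your language's equivalent).
delay_frames = 1440
configured_sample_rate = 48_000
delay_ms = delay_frames / configured_sample_rate * 1000
print(delay_ms)  # 30.0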

Components of Output Delay

Component | When it applies | Avoidable?
Algorithmic delay | Always | No — inherent to model architecture
Adapter delay | When not using optimal_num_frames | Yes — use optimal frame size
Buffering delay | When allow_variable_frames = true | Yes — use fixed frame size
Sample rate has no effect on the output delay in ms. Neither the model’s native sample rate nor the processor’s configured sample rate changes the delay.

Algorithmic Delay

Our models typically have an algorithmic delay of 10 ms or 30 ms; see the model documentation for exact values per model. This delay is inherent to the model architecture and is therefore the minimum output delay of the processor when it is configured with ProcessorConfig::optimal.

Model Processing Window

Models operate on fixed-size audio chunks called the processing window. Our models typically use 10 ms windows; see the model documentation for exact values per model. The window size can be determined with the Model::get_optimal_num_frames function and depends on the sample rate you operate at.
window_frames = model.get_optimal_num_frames(sample_rate)
window_ms = (window_frames / sample_rate) * 1000
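As a sketch, assuming a loaded model object in the Python SDK (construction omitted), the window can be queried and converted like this:
# Query the processing window for the sample rate you operate at.
sample_rate = 48_000
window_frames = model.get_optimal_num_frames(sample_rate)  # e.g. 480 frames
window_ms = window_frames / sample_rate * 1000             # e.g. 10.0 ms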

Adapter Delay

The adapter bridges the gap between your incoming audio buffers and the model’s processing window. If the two frame sizes are configured to be the same, no adapter delay is introduced; otherwise the delay depends on the relationship between the two frame sizes, as illustrated in the sketch below.
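A minimal sketch of that rule, assuming a loaded model object; the host buffer size is illustrative:
# No adapter delay is added when your buffer size equals the model's processing window.
sample_rate = 48_000
host_buffer_frames = 480                                   # illustrative: 10 ms at 48 kHz
optimal_frames = model.get_optimal_num_frames(sample_rate)
if host_buffer_frames == optimal_frames:
    print("Frame sizes match: no adapter delay")
else:
    print("Frame sizes differ: adapter delay is introduced")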

Buffering Delay

If the adapter is configured with allow_variable_frames = true, each process call can use a different number of frames, as long as it does not exceed the configured num_frames. This is necessary because some audio hosts (e.g. Audacity) only report the maximum number of frames they will call process with. When variable frames are enabled, a buffering delay equal to the length of the model’s processing window is introduced.
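Since the extra delay equals one processing window, it can be estimated from the window size (sketch assuming a loaded model object):
# Buffering delay added when allow_variable_frames = true: one processing window.
sample_rate = 48_000
window_frames = model.get_optimal_num_frames(sample_rate)
buffering_delay_ms = window_frames / sample_rate * 1000    # e.g. 10.0 ms for a 10 ms window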

Real-Time Guarantees

For real-time processing to work, the inference latency (execution time) of the model has to be lower than the duration of the incoming audio buffers (your configured num_frames). This depends on your CPU and system load.
inference latency < audio buffer length
If you call the process function with 10 ms buffers, the function must complete in under 10 ms.
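A minimal sketch for checking this on your machine, assuming a processor object whose process call accepts a NumPy buffer (the exact Python signature may differ):
import time
import numpy as np
sample_rate = 48_000
num_frames = 480                                  # 10 ms buffer at 48 kHz
buffer = np.zeros(num_frames, dtype=np.float32)   # silent test buffer; shape is illustrative
start = time.perf_counter()
processor.process(buffer)                         # assumed call name
latency_ms = (time.perf_counter() - start) * 1000
buffer_ms = num_frames / sample_rate * 1000
print(latency_ms < buffer_ms)                     # must be True for real-time operation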

CPU Usage

CPU usage depends only on model complexity. More complex models (e.g., Sparrow L) require more CPU than simpler ones (e.g., Sparrow XS).
Sample rate has no effect on CPU usage. Neither the model’s native sample rate nor the processor’s configured sample rate changes performance.

Parallel Processing

For multiple audio streams (e.g., multiple speakers), create one processor per stream.
Processor | Threading | Parallel Streams
Processor | Same thread | ❌ Blocks — cannot run concurrently
AsyncProcessor | Separate thread | ✅ Runs in parallel
# Multiple speakers — use AsyncProcessor
processor_speaker1 = AsyncProcessor(model)
processor_speaker2 = AsyncProcessor(model)
# These process concurrently on different threads
With the synchronous Processor, each call blocks the main thread. You cannot process multiple streams simultaneously.
The AsyncProcessor is currently only available in the Python SDK. For other languages, you’ll need to manage threading manually.