Understand and optimize the latency and performance of the SDK.
This guide explains the factors that influence the audio latency (output delay) and performance (CPU usage) of the ai-coustics SDK, and provides best practices for optimization.
It is important to distinguish between Inference Latency and Audio Latency when optimizing your application:
| Term | Meaning | Example |
| --- | --- | --- |
| Inference Latency | Time it takes to execute the model’s forward pass (CPU dependent). | “2 ms inference latency” |
| Audio Latency | Time offset of the output audio relative to the input audio. | “30 ms audio latency” |
These are independent values. For example, a model might compute its result in 2 ms (inference latency), yet the output audio is shifted by 30 ms (audio latency) relative to the input because of the model’s architecture. To avoid confusion, we refer to the audio latency as the output delay of the audio processor.
In real-time audio applications, the output delay is the time offset between the input signal and the processed output. The primary function for determining the total end-to-end delay is `aic_processor_get_output_delay`.
Output: The function returns the delay in samples.
Conversion: To convert the delay to milliseconds, use the following formula:

`delay_ms = delay_samples / sample_rate * 1000`
The total returned value includes three potential sources of delay:
| Component | When it applies | Avoidable? |
| --- | --- | --- |
| Algorithmic delay | Always | No — inherent to the model architecture |
| Adapter delay | When not using `optimal_num_frames` | Yes — use the optimal number of frames |
| Buffering delay | When `allow_variable_frames = true` | Yes — use a fixed number of frames |
Sample Rate Impact: Sample rate has no effect on the output delay. Neither the model’s native sample rate nor the processor’s configured sample rate changes the delay duration.
The SDK uses an internal adapter to handle differences between your application’s buffer size and the model’s internal processing window (typically 10 ms).
Optimal Frame Size: Each model is trained to process a fixed-length buffer of audio at a given sample rate on each forward pass; this is referred to as the model’s native window size and sample rate. The optimal frame size is the number of frames required to fill the model’s native window at a given sample rate. You can retrieve this value by calling `aic_model_get_optimal_num_frames` with your target sample rate.
Non-Optimal Frame Size: If you initialize the SDK with a frame size different from the optimal one, the adapter introduces buffering to match the model’s window, which increases delay.
The `aic_processor_initialize` function has a boolean parameter `allow_variable_frames`.

- `false` (default): The SDK expects a fixed frame size for every process call. This is the lowest-latency mode.
- `true`: The SDK allows you to send frame sizes smaller than the one specified at initialization. This flexibility comes at the cost of increased delay due to additional buffering.
For real-time processing to work without dropouts, the inference latency (execution time) must be lower than the duration of your audio buffer:

`inference latency ≤ audio buffer length`

For example, if you call the process function with 10 ms buffers, the function must complete execution in under 10 ms on your CPU. This depends entirely on your system load and hardware capabilities.
Model Complexity: More complex models like Sparrow L variants consume significantly more CPU than simpler models like Sparrow S or Sparrow XS.
Sample Rate Impact: The configured sample rate does not affect CPU usage when using the same model; for instance, processing at 48 kHz requires the same computational resources as 16 kHz.