## Terminology
| Term | Meaning | Example |
|---|---|---|
| Latency | Time to execute the model’s forward pass (CPU dependent) | “2 ms inference latency” |
| Delay | Time offset of the output audio relative to the input | “30 ms output delay” |
## Understanding `output_delay`
The `output_delay` value (retrieved via `aic_processor_get_output_delay`) represents the total audio delay in frames. Convert it to milliseconds with `delay_ms = output_delay / sample_rate * 1000`.
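A minimal, self-contained sketch of the conversion (the 1440-frame value and the 48 kHz rate are hypothetical; in a real integration the frame count comes from `aic_processor_get_output_delay`):

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* Hypothetical values: in a real integration, delay_frames would
     * come from aic_processor_get_output_delay(). */
    uint32_t delay_frames = 1440;
    double sample_rate = 48000.0;
    double delay_ms = delay_frames / sample_rate * 1000.0;
    printf("output delay: %.1f ms\n", delay_ms); /* prints 30.0 ms */
    return 0;
}
```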
## Components of Output Delay
| Component | When it applies | Avoidable? |
|---|---|---|
| Algorithmic delay | Always | No — inherent to model architecture |
| Adapter delay | When not using optimal_num_frames | Yes — use optimal frame size |
| Buffering delay | When allow_variable_frames = true | Yes — use fixed frame size |
Sample rate has no effect on the output delay in milliseconds. Neither the model’s native sample rate nor the processor’s configured sample rate changes the delay: the delay in frames scales with the sample rate, so the value in milliseconds stays constant.
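For example (hypothetical values): a model with 30 ms algorithmic delay, running at its optimal frame size (no adapter delay) but with `allow_variable_frames = true` and a 10 ms processing window, has a total output delay of 30 ms + 0 ms + 10 ms = 40 ms.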
## Algorithmic Delay
Our models typically have 10 ms or 30 ms of algorithmic delay; see the model documentation for exact values per model. This delay is inherent to the model architecture and is therefore the minimum output delay of the processor when configured with `ProcessorConfig::optimal`.
## Model Processing Window
Models operate on fixed-size audio chunks called the processing window. Our models typically use 10 ms windows; see the model documentation for exact values per model. The size of the window can be determined with the `Model::get_optimal_num_frames` function, which depends on the sample rate you operate at.
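To illustrate why the optimal frame count depends on the sample rate, here is a sketch with an assumed 10 ms window at two common rates (query `Model::get_optimal_num_frames` for the real value):

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* The same 10 ms window corresponds to different frame counts at
     * different sample rates, which is why the optimal frame count
     * must be queried per sample rate. */
    uint32_t window_ms = 10;
    uint32_t rates[] = {16000, 48000};
    for (int i = 0; i < 2; i++) {
        uint32_t frames = rates[i] / 1000 * window_ms; /* 160, 480 */
        printf("%u Hz -> %u frames per window\n", rates[i], frames);
    }
    return 0;
}
```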
## Adapter Delay
The adapter is responsible for adapting between the incoming audio buffers and the model’s processing window. If they are configured to the same size, no adapter delay is introduced. Otherwise, the delay depends on the relationship between the two frame sizes.

## Buffering Delay
If the adapter is configured with `allow_variable_frames = true`, each process call can use a different number of frames, as long as it is at most the configured `num_frames`. This is necessary because some audio hosts (e.g., Audacity) only report the maximum number of frames they will call process with. If variable frames are enabled, a buffering delay equal to the length of the model’s processing window is introduced.
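To see where this delay comes from, here is a minimal sketch (plain C, not SDK code) of the FIFO pattern an adapter has to use when input chunk sizes vary: the model can only run once a full window has accumulated, so output lags input by up to one window.

```c
#include <stdio.h>
#include <string.h>

#define WINDOW 480 /* e.g. a 10 ms window at 48 kHz */

static float fifo[2 * WINDOW];
static int fifo_len = 0;

/* n must be at most WINDOW, mirroring the num_frames upper bound */
static void process_chunk(const float *in, int n) {
    memcpy(&fifo[fifo_len], in, (size_t)n * sizeof(float));
    fifo_len += n;
    while (fifo_len >= WINDOW) {
        /* a real implementation would run the model on fifo[0..WINDOW) */
        printf("processing one %d-frame window\n", WINDOW);
        memmove(fifo, &fifo[WINDOW],
                (size_t)(fifo_len - WINDOW) * sizeof(float));
        fifo_len -= WINDOW;
    }
}

int main(void) {
    float chunk[300] = {0};
    process_chunk(chunk, 300); /* buffered only: no full window yet */
    process_chunk(chunk, 300); /* 600 >= 480: first window processed */
    return 0;
}
```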
## Real-Time Guarantees
For real-time processing to work, the inference latency (execution time) of the model has to be lower than the duration of the incoming audio buffers (your configured `num_frames`). This depends on your CPU and system load.
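A quick back-of-the-envelope check, using hypothetical numbers (480-frame buffers at 48 kHz, a 2 ms measured inference latency):

```c
#include <stdio.h>

int main(void) {
    /* 480 frames at 48 kHz give the model a 10 ms real-time budget;
     * a measured 2 ms inference latency leaves 8 ms of headroom. */
    double num_frames = 480.0;
    double sample_rate = 48000.0;
    double budget_ms = num_frames / sample_rate * 1000.0; /* 10 ms */
    double inference_latency_ms = 2.0; /* measure on your target CPU */
    if (inference_latency_ms < budget_ms)
        printf("real-time safe (%.1f ms headroom)\n",
               budget_ms - inference_latency_ms);
    else
        printf("will underrun: inference slower than buffer duration\n");
    return 0;
}
```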
## CPU Usage
CPU usage depends only on model complexity: more complex models (e.g., Sparrow L) require more CPU than simpler ones (e.g., Sparrow XS). Sample rate has no effect on CPU usage. Neither the model’s native sample rate nor the processor’s configured sample rate changes performance.
## Parallel Processing
For multiple audio streams (e.g., multiple speakers), create one processor per stream.

| Processor | Threading | Parallel Streams |
|---|---|---|
| `Processor` | Same thread | ❌ Blocks — cannot run concurrently |
| `AsyncProcessor` | Separate thread | ✅ Runs in parallel |
With a `Processor`, each call blocks the main thread, so you cannot process multiple streams simultaneously.

The `AsyncProcessor` is currently only available in the Python SDK. For other languages, you’ll need to manage threading manually.
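For languages without `AsyncProcessor`, the usual pattern is one thread per stream, each driving its own blocking processor. A minimal sketch with POSIX threads; `StreamCtx` and `stream_worker` are illustrative placeholders, not SDK types:

```c
#include <pthread.h>
#include <stdio.h>

typedef struct {
    int stream_id;
    /* in a real integration this would hold the stream's processor */
} StreamCtx;

static void *stream_worker(void *arg) {
    StreamCtx *ctx = (StreamCtx *)arg;
    /* audio loop: read input buffer -> blocking process call -> write
     * output buffer; blocking here stalls only this stream's thread */
    printf("processing stream %d on its own thread\n", ctx->stream_id);
    return NULL;
}

int main(void) {
    StreamCtx streams[2] = {{0}, {1}};
    pthread_t threads[2];
    for (int i = 0; i < 2; i++)
        pthread_create(&threads[i], NULL, stream_worker, &streams[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(threads[i], NULL);
    return 0;
}
```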