This guide explains the factors that influence runtime performance (CPU usage and processing throughput) in the ai-coustics SDK.

Real-Time Guarantees

For real-time processing to work without dropouts, the inference latency (the execution time of a single process call) must not exceed the duration of your audio buffer:

inference latency ≤ audio buffer length

For example, if you call the process function with 10 ms buffers, the call must complete in under 10 ms on your CPU. Whether it does depends entirely on your system load and hardware capabilities.
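A simple way to check this constraint on your own hardware is to compare a process call's wall-clock time against the buffer duration. The sketch below is illustrative, not SDK code: `fake_process` and `real_time_factor` are hypothetical stand-ins, and you would substitute your own processor's process call.

```python
import time

BUFFER_MS = 10.0  # duration of each audio chunk handed to the processor

def fake_process(buffer):
    """Stand-in for the SDK's process call; any CPU-bound work fits here."""
    return [x * 0.5 for x in buffer]

def real_time_factor(process, buffer, buffer_ms):
    """Return execution time divided by buffer duration.

    Values below 1.0 mean the call met its real-time deadline;
    values at or above 1.0 mean dropouts are likely.
    """
    start = time.perf_counter()
    process(buffer)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return elapsed_ms / buffer_ms

# 10 ms of mono audio at 48 kHz is 480 samples.
chunk = [0.0] * 480
rtf = real_time_factor(fake_process, chunk, BUFFER_MS)
```

Measuring over many consecutive calls (and taking the worst case, not the average) gives a more realistic picture, since occasional slow calls are what cause audible dropouts.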

Performance (CPU Usage)

CPU usage depends only on the model's complexity. More complex models, such as the Quail Voice Focus L variants, consume significantly more CPU than simpler models like Quail S. The configured sample rate does not affect CPU usage for a given model. See the models guide for more details on this topic.

Parallel Processing

For Python, Processor and ProcessorAsync have different scheduling behavior:
  • Processor blocks the calling thread. Running the model is CPU-intensive, so nothing else in your application runs until the call completes.
  • ProcessorAsync spawns a blocking task on an internal thread pool, so the application can continue with other work while processing runs. Note that this only helps if spare cores are available to run the processor in parallel.
Use one processor instance per stream, keep calls sequential within a stream, and parallelize across streams with the async processor. We recommend running our benchmark example on your target CPU to understand how many processors your system can run while still meeting real-time deadlines.
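The recommended pattern (one instance per stream, sequential within a stream, parallel across streams) can be sketched with plain asyncio. This is not SDK code: `enhance` is a hypothetical CPU-bound stand-in for a processor's blocking process call, and pushing it to a thread pool is an assumption about what ProcessorAsync does internally.

```python
import asyncio
import time

def enhance(chunk):
    """CPU-bound stand-in for a processor's blocking process call."""
    time.sleep(0.002)  # pretend 2 ms of inference
    return chunk

async def run_stream(loop, chunks):
    """Process one stream's chunks strictly in order.

    Each chunk is awaited before the next is submitted, matching the
    'keep calls sequential within a stream' rule; the blocking call
    runs on the default thread pool so the event loop stays free.
    """
    out = []
    for chunk in chunks:
        out.append(await loop.run_in_executor(None, enhance, chunk))
    return out

async def main():
    loop = asyncio.get_running_loop()
    # Four independent streams of three chunks each.
    streams = [[f"s{i}c{j}" for j in range(3)] for i in range(4)]
    # One task per stream: parallel across streams, ordered within each.
    return await asyncio.gather(*(run_stream(loop, s) for s in streams))

results = asyncio.run(main())
```

With a real processor you would create one instance per stream outside `run_stream` and reuse it for every chunk of that stream, never sharing it across tasks.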

Async Execution Footguns

  1. Assuming async means one instance can process in parallel. ProcessorAsync is asynchronous at the Python API level, but each instance is still guarded by a mutex internally. In practice, calls to process_async(...) on the same instance are serialized. This is a footgun because teams often add asyncio expecting higher throughput, but get the same per-instance throughput with extra scheduling overhead and confusing performance results.
  2. Allowing too many in-flight async calls. If you schedule process_async(...) faster than your CPU can complete calls, work piles up in a queue. Each pending call keeps its own copied audio buffer while it waits for execution. This is a footgun because memory usage grows with queue depth, and chunk completion time becomes inconsistent (jitter). In real-time pipelines, queued chunks can arrive too late and miss playback deadlines.
  3. Submitting chunks from one stream concurrently. A single stream is stateful over time. If multiple chunks are submitted concurrently, lock acquisition order and completion timing can differ from enqueue order. This is a footgun because temporal state (enhancement filters, VAD history, etc.) expects ordered chunk progression; out-of-order progression can cause unstable or inconsistent output.
  4. Re-initializing while processing is in flight. initialize_async(...) shares the same internal processor state as process_async(...). If it is called while processing tasks are pending or running, state can change between chunks. This is a footgun because configuration/state transitions can happen at unintended boundaries, leading to mixed-config output, abrupt behavior changes, or hard-to-reproduce bugs.
  5. Calling sync getters during heavy async processing. Methods like get_processor_context() and get_vad_context() are synchronous and contend for the same lock as processing. This is a footgun because these calls can block unexpectedly, which may stall the event loop at bad times and create latency spikes elsewhere in the application.
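Footgun 2 in particular has a simple structural fix: bound the number of in-flight calls so work cannot pile up faster than the CPU drains it. The sketch below uses a plain asyncio semaphore around a hypothetical blocking `enhance` function (not SDK code); with the real SDK, the guarded call would be your `process_async(...)` invocation.

```python
import asyncio
import time

MAX_IN_FLIGHT = 2  # tune with the benchmark example on your target CPU

def enhance(chunk):
    """Blocking stand-in for a per-chunk process call."""
    time.sleep(0.001)  # pretend 1 ms of inference
    return chunk

async def process_bounded(loop, sem, chunk):
    """Run one chunk, never allowing more than MAX_IN_FLIGHT at once.

    Backpressure keeps queued audio (and therefore memory and
    completion jitter) bounded instead of growing without limit.
    """
    async with sem:
        return await loop.run_in_executor(None, enhance, chunk)

async def main():
    loop = asyncio.get_running_loop()
    sem = asyncio.Semaphore(MAX_IN_FLIGHT)
    # Four independent streams each submit one chunk; at most two run
    # at a time, the rest wait at the semaphore instead of queueing
    # copied buffers inside the processor.
    jobs = (process_bounded(loop, sem, f"chunk{i}") for i in range(4))
    return await asyncio.gather(*jobs)

out = asyncio.run(main())
```

Because the semaphore makes callers wait before submitting, a producer that outpaces the CPU stalls visibly at the submission point, which is far easier to diagnose than unbounded queue growth.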