Real-Time Guarantees
For real-time processing to work without dropouts, the inference latency (execution time) must be lower than the duration of your audio buffer. For example, if you call the `process` function with 10 ms buffers, the function must complete in under 10 ms on your CPU.
This depends entirely on your system load and hardware capabilities.
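Because the margin depends on your hardware, it is worth measuring directly. The sketch below computes a real-time factor (elapsed time divided by buffer duration); a value below 1.0 means processing keeps up with the stream. The `process` function here is a placeholder standing in for the SDK's call, and the 48 kHz sample rate is an assumption for illustration.

```python
import time

SAMPLE_RATE = 48_000                               # assumed sample rate
BUFFER_SAMPLES = 480                               # 10 ms at 48 kHz
BUFFER_DURATION_S = BUFFER_SAMPLES / SAMPLE_RATE   # 0.010 s

def process(buffer):
    """Placeholder for the SDK's blocking process() call."""
    return buffer

buffer = [0.0] * BUFFER_SAMPLES
start = time.perf_counter()
process(buffer)
elapsed = time.perf_counter() - start

# Real-time factor < 1.0 means the processor keeps up with the audio stream.
rtf = elapsed / BUFFER_DURATION_S
print(f"real-time factor: {rtf:.4f}")
```

Run this with your real processor and a representative buffer size; leave headroom (e.g. a real-time factor well below 1.0) to absorb system load spikes.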
Performance (CPU Usage)
CPU usage is affected only by the model's complexity. More complex models like the Quail Voice Focus L variants consume significantly more CPU than simpler models like Quail S.
The configured sample rate does not affect CPU usage when using the same model. Take a look at the models guide for more details on this topic.
Parallel Processing
For Python, `Processor` and `ProcessorAsync` have different scheduling behavior:

- `Processor` blocks the calling thread. Running the model is CPU-intensive, so this prevents other parts of your application from running until processing completes.
- `ProcessorAsync` spawns a blocking task in an internal thread pool. This allows the application to move on with other tasks while processing is running. Note that this only helps if there are spare cores available to run the processor in parallel.
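The scheduling difference can be sketched with the standard library alone. Since the SDK itself is not available here, a blocking stand-in function plays the role of `Processor.process`, and `asyncio.to_thread` illustrates the thread-pool offloading that `ProcessorAsync` performs internally; the names and timings are assumptions, not the SDK's actual implementation.

```python
import asyncio
import time

def process(buffer):
    """Stand-in for the blocking Processor.process() call (assumed name)."""
    time.sleep(0.005)  # simulate CPU-bound model inference
    return buffer

async def main():
    buffer = [0.0] * 480

    # Processor-style: calling process(buffer) directly here would block
    # the event loop for the whole 5 ms.

    # ProcessorAsync-style: the blocking call runs in a worker thread,
    # so the event loop stays free to schedule other coroutines.
    result = await asyncio.to_thread(process, buffer)
    return len(result)

n = asyncio.run(main())
print(n)  # 480
```

As the document notes, offloading only raises throughput if a spare core can actually run the worker thread in parallel.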
Async Execution Footguns
- Assuming async means one instance can process in parallel. `ProcessorAsync` is asynchronous at the Python API level, but each instance is still guarded by a mutex internally. In practice, calls to `process_async(...)` on the same instance are serialized. This is a footgun because teams often add `asyncio` expecting higher throughput, but get the same per-instance throughput with extra scheduling overhead and confusing performance results.
- Allowing too many in-flight async calls. If you schedule `process_async(...)` faster than your CPU can complete calls, work piles up in a queue. Each pending call keeps its own copied audio buffer while it waits for execution, so memory usage grows with queue depth and chunk completion time becomes inconsistent (jitter). In real-time pipelines, queued chunks can arrive too late and miss playback deadlines.
- Submitting chunks from one stream concurrently. A single stream is stateful over time. If multiple chunks are submitted concurrently, lock acquisition order and completion timing can differ from enqueue order. Temporal state (enhancement filters, VAD history, etc.) expects ordered chunk progression; out-of-order progression can cause unstable or inconsistent output.
- Re-initializing while processing is in flight. `initialize_async(...)` shares the same internal processor state as `process_async(...)`. If it is called while processing tasks are pending or running, state can change between chunks. Configuration and state transitions can then happen at unintended boundaries, leading to mixed-config output, abrupt behavior changes, or hard-to-reproduce bugs.
- Calling sync getters during heavy async processing. Methods like `get_processor_context()` and `get_vad_context()` are synchronous and contend for the same lock as processing. These calls can block unexpectedly, which may stall the event loop at bad times and create latency spikes elsewhere in the application.
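A common pattern that sidesteps the queue-depth and ordering footguns is a single consumer per stream fed by a bounded queue: the bounded queue applies back-pressure so in-flight chunks cannot pile up, and the single consumer guarantees chunks are processed in enqueue order. The sketch below uses a trivial async stand-in for `process_async(...)`; the worker structure, not the processing itself, is the point.

```python
import asyncio

async def stream_worker(queue, results, process_chunk):
    """Single consumer per stream: chunks are processed strictly in order."""
    while True:
        chunk = await queue.get()
        if chunk is None:                  # sentinel: end of stream
            break
        results.append(await process_chunk(chunk))
        queue.task_done()

async def main():
    # maxsize bounds in-flight chunks so memory and jitter cannot grow unbounded.
    queue = asyncio.Queue(maxsize=4)
    results = []

    async def process_chunk(chunk):
        """Stand-in for processor.process_async(...) (assumed call shape)."""
        await asyncio.sleep(0)             # simulate asynchronous processing
        return chunk * 2

    worker = asyncio.create_task(stream_worker(queue, results, process_chunk))
    for chunk in range(8):
        await queue.put(chunk)             # blocks (back-pressure) when queue is full
    await queue.put(None)                  # signal end of stream
    await worker
    return results

out = asyncio.run(main())
print(out)  # [0, 2, 4, 6, 8, 10, 12, 14]
```

With one worker per stream, re-initialization can also be made safe by enqueueing it as a special item, so it only ever runs between chunks rather than concurrently with them.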