Core Behavior
The SDK accepts multi-channel input, but speech enhancement is applied to a mono mixdown internally:
- Input channels are mixed down to mono for enhancement.
- The mono signal is processed by the model.
- The enhanced mono signal is mixed back into each output channel with a mix ratio dependent on the configured enhancement level.
num_channels = 1.
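The downmix, enhance, and mix-back flow described above can be sketched in Python as follows. This is a minimal illustration, not the SDK's implementation. Several details are assumptions: averaging channels for the downmix, a linear crossfade for the mix-back, and the names process_multichannel, enhance, and mix. The SDK does not document how the mix ratio is derived from the enhancement level.

```python
from typing import Callable, List

# Planar audio: one list of samples per channel (hypothetical representation).
Planar = List[List[float]]


def downmix_to_mono(audio: Planar) -> List[float]:
    """Average all input channels into one mono signal (assumed downmix rule)."""
    num_channels = len(audio)
    num_frames = len(audio[0])
    return [sum(ch[i] for ch in audio) / num_channels for i in range(num_frames)]


def remix_enhanced(audio: Planar, enhanced: List[float], mix: float) -> Planar:
    """Blend the enhanced mono signal back into every channel.

    mix = 1.0 replaces each channel with the enhanced signal;
    mix = 0.0 leaves each channel unchanged.
    The linear crossfade is an assumption, not the SDK's documented formula.
    """
    return [
        [(1.0 - mix) * x + mix * e for x, e in zip(channel, enhanced)]
        for channel in audio
    ]


def process_multichannel(
    audio: Planar,
    enhance: Callable[[List[float]], List[float]],  # hypothetical mono model
    mix: float,
) -> Planar:
    mono = downmix_to_mono(audio)  # every channel contributes to one signal
    enhanced = enhance(mono)       # the model only ever sees mono audio
    return remix_enhanced(audio, enhanced, mix)
```

With mix = 1.0 and an identity "model", a stereo input such as [[0.2, 0.4], [0.0, 0.0]] produces the same signal on both output channels. This is why the channels of a stereo stream are not enhanced independently.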
Configuration Rules
Use one processor configuration per stream and keep it stable while processing.
- Set num_channels to the stream channel count during initialization.
- Each process call must use the same channel count.
- For fixed-frame mode (allow_variable_frames = false), each call must also use the initialized num_frames.
- For variable-frame mode (allow_variable_frames = true), the frame count per call must be less than or equal to the initialized num_frames.
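The rules above amount to a per-call check, which an application might run before handing a buffer to the SDK. The Python sketch below is a minimal version of that check. StreamConfig and check_process_call are hypothetical names, not part of the SDK API. Only the field names num_channels, num_frames, and allow_variable_frames come from the rules above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamConfig:
    """Hypothetical mirror of the values fixed at initialization."""
    num_channels: int
    num_frames: int
    allow_variable_frames: bool = False


def check_process_call(config: StreamConfig, channels: int, frames: int) -> None:
    """Raise ValueError if a process call would break the configuration rules."""
    # The channel count must always match the initialized value.
    if channels != config.num_channels:
        raise ValueError(
            f"channel count {channels} != initialized num_channels {config.num_channels}"
        )
    if config.allow_variable_frames:
        # Variable-frame mode: any frame count up to num_frames is accepted.
        if frames > config.num_frames:
            raise ValueError(
                f"frame count {frames} exceeds initialized num_frames {config.num_frames}"
            )
    elif frames != config.num_frames:
        # Fixed-frame mode: the frame count must match exactly.
        raise ValueError(
            f"frame count {frames} != initialized num_frames {config.num_frames}"
        )
```

The frozen dataclass reflects the advice to keep one configuration per stream and not change it while processing.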
Buffer Layouts
The SDK supports three memory layouts for multi-channel buffers:

| Layout | Buffers | Example (2 channels, 4 frames) |
|---|---|---|
| Interleaved | Single buffer | [ch0_f0, ch1_f0, ch0_f1, ch1_f1, ...] |
| Sequential | Single buffer | [ch0_f0, ch0_f1, ch0_f2, ch0_f3, ch1_f0, ...] |
| Planar | One buffer per channel | audio[0] = [ch0_f0, ...], audio[1] = [ch1_f0, ...] |
The planar layout supports at most 16 channels.
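The three layouts hold the same samples in different orders. In an interleaved buffer, sample (channel c, frame f) sits at index f * num_channels + c. In a sequential buffer it sits at index c * num_frames + f. The Python helpers below convert between the layouts under those definitions. They are illustrative only, and the function names are hypothetical rather than SDK calls.

```python
from typing import List


def interleaved_to_planar(buf: List[float], num_channels: int) -> List[List[float]]:
    """[ch0_f0, ch1_f0, ch0_f1, ...] -> one list per channel."""
    if len(buf) % num_channels:
        raise ValueError("buffer length is not a multiple of num_channels")
    # Sample (c, f) is at index f * num_channels + c, so stride by num_channels.
    return [buf[ch::num_channels] for ch in range(num_channels)]


def sequential_to_planar(buf: List[float], num_channels: int) -> List[List[float]]:
    """[ch0_f0, ch0_f1, ..., ch1_f0, ...] -> one list per channel."""
    if len(buf) % num_channels:
        raise ValueError("buffer length is not a multiple of num_channels")
    num_frames = len(buf) // num_channels
    # Sample (c, f) is at index c * num_frames + f: each channel is one block.
    return [buf[ch * num_frames:(ch + 1) * num_frames] for ch in range(num_channels)]


def planar_to_interleaved(planar: List[List[float]]) -> List[float]:
    """One list per channel -> frame-by-frame interleaved buffer."""
    return [sample for frame in zip(*planar) for sample in frame]


def planar_to_sequential(planar: List[List[float]]) -> List[float]:
    """One list per channel -> channel blocks laid end to end."""
    return [sample for channel in planar for sample in channel]
```

For 2 channels and 4 frames, the planar buffers [[0, 1, 2, 3], [10, 11, 12, 13]] correspond to the interleaved buffer [0, 10, 1, 11, 2, 12, 3, 13] and the sequential buffer [0, 1, 2, 3, 10, 11, 12, 13].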
Codecs
The ai-coustics SDK processes decoded PCM samples, not encoded audio files or network packets. In the SDK interface, audio is passed as float32 sample buffers in one of the supported memory layouts: planar, interleaved, or sequential.
This means codecs such as MP3, AAC, Opus or G.711 are not decoded by the SDK itself. If your input stream uses encoded audio, first decode it to PCM in your application or media stack, then pass the decoded samples to the SDK.
After processing, you can encode the enhanced PCM output back to the transport codec your application needs.
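Many decoders emit signed 16-bit PCM, so a conversion step to and from float32 usually sits between the codec and the SDK. The Python sketch below assumes 16-bit little-endian samples and uses the common 1/32768 scaling. Both are assumptions about your media stack, not SDK requirements, and the helper names are hypothetical. It uses array typecode "f", which holds C floats and is 32-bit on mainstream platforms.

```python
import struct
from array import array
from typing import Iterable


def pcm16le_to_float32(data: bytes) -> array:
    """Decoded 16-bit little-endian PCM bytes -> float32 samples in [-1.0, 1.0)."""
    if len(data) % 2:
        raise ValueError("PCM16 data must contain an even number of bytes")
    count = len(data) // 2
    ints = struct.unpack(f"<{count}h", data)
    return array("f", (s / 32768.0 for s in ints))


def float32_to_pcm16le(samples: Iterable[float]) -> bytes:
    """Enhanced float samples -> 16-bit little-endian PCM, clamped to [-1.0, 1.0]."""
    ints = [int(round(max(-1.0, min(1.0, x)) * 32767)) for x in samples]
    return struct.pack(f"<{len(ints)}h", *ints)
```

Clamping before scaling keeps any output sample that exceeds full scale from overflowing the 16-bit range when the result is re-encoded.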
Common Pitfalls
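- Initializing with one channel count and processing with another (AIC_ERROR_CODE_AUDIO_CONFIG_MISMATCH).
- Sending the wrong number of samples for the configured frame size. In fixed-frame mode each channel must carry exactly num_frames samples; in variable-frame mode it must not carry more.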
- Assuming stereo channels are enhanced independently. They are not: all channels are mixed down to mono first.
- Re-initializing on a real-time audio thread (initialization allocates memory).