Audio Format - ai-coustics Docs

Core Behavior

The SDK accepts multi-channel input, but speech enhancement is applied internally to a mono mixdown.

Input channels are mixed down to mono for enhancement.
The mono signal is processed by the model.
The enhanced mono signal is mixed back into each output channel with a mix ratio dependent on the configured enhancement level.
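The three steps above can be sketched in plain Python. This is only an illustration of the signal flow, not the SDK's implementation: the equal-weight average used for the mixdown, the linear wet/dry blend, and the names `enhance_multichannel`, `enhance_mono`, and `level` are all assumptions made for the example.

```python
def enhance_multichannel(channels, enhance_mono, level):
    """Illustrative signal flow only (not the SDK implementation).

    channels:     list of per-channel sample lists of equal length
    enhance_mono: placeholder callable standing in for the model
    level:        0.0..1.0, assumed here to map linearly to the mix ratio
    """
    num_frames = len(channels[0])

    # 1. Mix all input channels down to mono (equal weights are an assumption).
    mono = [sum(ch[i] for ch in channels) / len(channels) for i in range(num_frames)]

    # 2. Run the mono signal through the (placeholder) enhancement model.
    enhanced = enhance_mono(mono)

    # 3. Blend the enhanced mono signal back into every output channel.
    return [
        [(1.0 - level) * ch[i] + level * enhanced[i] for i in range(num_frames)]
        for ch in channels
    ]


left, right = [0.5, 0.5], [-0.5, 0.5]
out = enhance_multichannel([left, right], lambda mono: mono, level=1.0)
```

With `level=1.0` in this sketch, both output channels carry the same mono-derived signal, which is why per-channel differences are not preserved at full enhancement.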

If you need to process each channel independently, for example when each channel carries a separate audio stream, you must create a separate processor instance for each stream and configure each with num_channels = 1.
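A minimal sketch of that pattern, assuming a stereo interleaved buffer whose two channels carry unrelated streams. `PlaceholderMonoProcessor` is a hypothetical stand-in for an SDK processor initialized with num_channels = 1; it is not part of the SDK and simply passes audio through while keeping per-stream state.

```python
class PlaceholderMonoProcessor:
    """Hypothetical stand-in for a processor configured with num_channels = 1."""

    def __init__(self):
        self.frames_seen = 0  # per-stream state, never shared between streams

    def process(self, mono):
        self.frames_seen += len(mono)
        return list(mono)  # a real processor would return enhanced audio


def split_interleaved(samples, num_channels):
    """Split an interleaved buffer into one mono list per channel."""
    if len(samples) % num_channels:
        raise ValueError("buffer length is not a multiple of num_channels")
    return [samples[ch::num_channels] for ch in range(num_channels)]


# One processor per stream, created once up front and reused for every block.
processors = [PlaceholderMonoProcessor() for _ in range(2)]


def process_block(interleaved):
    streams = split_interleaved(interleaved, len(processors))
    return [proc.process(stream) for proc, stream in zip(processors, streams)]


block_out = process_block([1.0, -1.0, 2.0, -2.0])
```

Each stream keeps its own processor for its whole lifetime, so state from one stream never influences the other.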

Configuration Rules

Use one processor configuration per stream and keep it stable while processing.

Set num_channels to the stream channel count during initialization.
Each process call must use the same channel count.
For fixed-frame mode (allow_variable_frames = false), each call must also use the initialized num_frames.
For variable-frame mode (allow_variable_frames = true), frame count per call must be less than or equal to the initialized num_frames.
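The rules above can be expressed as a small pre-flight check in application code. This is a sketch under stated assumptions: `StreamConfig`, `check_call`, and `expected_samples` are hypothetical helpers written for this example, not SDK functions; only the parameter names num_channels, num_frames, and allow_variable_frames come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamConfig:
    """Hypothetical mirror of the values chosen at initialization."""
    num_channels: int
    num_frames: int
    allow_variable_frames: bool = False


def check_call(config, call_channels, call_frames):
    """Return None if a process call obeys the rules, else a description."""
    if call_channels != config.num_channels:
        return "channel count differs from the initialized num_channels"
    if config.allow_variable_frames:
        if call_frames > config.num_frames:
            return "frame count exceeds the initialized num_frames"
    elif call_frames != config.num_frames:
        return "frame count differs from the initialized num_frames"
    return None


def expected_samples(config, call_frames):
    """Total samples in a single-buffer layout for one call."""
    return config.num_channels * call_frames


fixed = StreamConfig(num_channels=2, num_frames=480)
variable = StreamConfig(num_channels=2, num_frames=480, allow_variable_frames=True)
```

Running a check like this outside the audio callback, for example when the stream format is negotiated, keeps validation off the real-time path.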

Buffer Layouts

The SDK supports three memory layouts for multi-channel buffers:

Layout	Shape	Example (2 channels, 4 frames)
Interleaved	Single buffer	`[ch0_f0, ch1_f0, ch0_f1, ch1_f1, ...]`
Sequential	Single buffer	`[ch0_f0, ch0_f1, ch0_f2, ch0_f3, ch1_f0, ...]`
Planar	One buffer per channel	`audio[0] = [ch0_f0, ...]`, `audio[1] = [ch1_f0, ...]`

The planar layout supports at most 16 channels.
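To make the index arithmetic behind the three layouts concrete, the following sketch converts between them using plain lists. The function names are illustrative only and are not SDK calls; the labels reuse the `chX_fY` notation from the table.

```python
def interleaved_to_sequential(buf, num_channels):
    """Interleaved sample (frame f, channel ch) lives at f * num_channels + ch."""
    num_frames = len(buf) // num_channels
    return [
        buf[f * num_channels + ch]
        for ch in range(num_channels)
        for f in range(num_frames)
    ]


def sequential_to_planar(buf, num_channels):
    """Sequential sample (frame f, channel ch) lives at ch * num_frames + f."""
    num_frames = len(buf) // num_channels
    return [buf[ch * num_frames:(ch + 1) * num_frames] for ch in range(num_channels)]


def planar_to_interleaved(planes):
    """Walk frame by frame, taking one sample from each channel buffer."""
    return [sample for frame in zip(*planes) for sample in frame]


# 2 channels, 4 frames, labelled as in the table above.
interleaved = [f"ch{ch}_f{f}" for f in range(4) for ch in range(2)]
sequential = interleaved_to_sequential(interleaved, 2)
planar = sequential_to_planar(sequential, 2)
```

Here `sequential` starts with the four `ch0` samples followed by the four `ch1` samples, and `planar_to_interleaved(planar)` reproduces the original interleaved order.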

Codecs

The ai-coustics SDK processes decoded PCM samples, not encoded audio files or network packets. In the SDK interface, audio is passed as float32 sample buffers using one of the supported memory layouts: planar, interleaved, or sequential. This means codecs such as MP3, AAC, Opus or G.711 are not decoded by the SDK itself. If your input stream uses encoded audio, first decode it to PCM in your application or media stack, then pass the decoded samples to the SDK. After processing, you can encode the enhanced PCM output back to the transport codec your application needs.
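Many decoders emit 16-bit integer PCM rather than float32, so a conversion step is often needed on both sides of the SDK. The sketch below uses only the Python standard library. It assumes little-endian 16-bit input and the common convention of scaling floats to the range [-1.0, 1.0) by dividing by 32768; the value range the SDK expects is not stated above, so treat the scaling as an assumption to verify. The function names are hypothetical.

```python
import array
import struct
import sys


def pcm16le_to_float32(data):
    """Convert little-endian 16-bit PCM bytes to float32 samples."""
    ints = array.array("h")
    ints.frombytes(data)
    if sys.byteorder == "big":
        ints.byteswap()  # input is little-endian by assumption
    return array.array("f", (s / 32768.0 for s in ints))


def float32_to_pcm16le(samples):
    """Convert float samples back to little-endian 16-bit PCM, with clipping."""
    ints = array.array(
        "h",
        (max(-32768, min(32767, round(s * 32768.0))) for s in samples),
    )
    if sys.byteorder == "big":
        ints.byteswap()
    return ints.tobytes()


decoded = struct.pack("<hhh", 0, 16384, -32768)  # stand-in for decoder output
floats = pcm16le_to_float32(decoded)             # pass these to the processor
encoded_input = float32_to_pcm16le(floats)       # hand this to the encoder
```

Decoding and re-encoding of the codec itself (MP3, AAC, Opus, G.711) stays in the application or media stack; only the PCM conversion is shown here.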

Common Pitfalls

Initializing with one channel count and processing with another (AIC_ERROR_CODE_AUDIO_CONFIG_MISMATCH).
Sending the wrong number of samples for the configured frame size.
Assuming stereo channels are enhanced independently (they are not; they are mixed to mono first).
Re-initializing on a real-time audio thread (initialization allocates memory).
