Core Behavior

The SDK accepts multi-channel input, but speech enhancement is applied on a mono mixdown internally.
  1. Input channels are mixed down to mono for enhancement.
  2. The mono signal is processed by the model.
  3. The enhanced mono signal is mixed back into each output channel with a mix ratio dependent on the configured enhancement level.
If you need to enhance each channel independently — for example, when each channel carries a separate audio stream — create a separate processor instance per stream and configure each with num_channels = 1.
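The three steps above can be sketched as plain buffer arithmetic. This is an illustrative sketch, not the SDK's implementation: the `mix_ratio` parameter stands in for whatever blend the configured enhancement level implies, and the enhancement model itself is elided.

```c
#include <stddef.h>

/* Step 1: average all channels of an interleaved buffer into mono. */
static void mixdown_mono(const float *in, float *mono,
                         size_t num_channels, size_t num_frames) {
    for (size_t f = 0; f < num_frames; ++f) {
        float sum = 0.0f;
        for (size_t c = 0; c < num_channels; ++c)
            sum += in[f * num_channels + c];
        mono[f] = sum / (float)num_channels;
    }
}

/* Step 3: blend the enhanced mono signal back into every output
   channel; mix_ratio = 1.0 would replace the dry signal entirely. */
static void mixback(const float *enhanced_mono, float *out,
                    size_t num_channels, size_t num_frames,
                    float mix_ratio) {
    for (size_t f = 0; f < num_frames; ++f) {
        for (size_t c = 0; c < num_channels; ++c) {
            float dry = out[f * num_channels + c];
            out[f * num_channels + c] =
                mix_ratio * enhanced_mono[f] + (1.0f - mix_ratio) * dry;
        }
    }
}
```

Step 2 (the model) runs between these two stages on the mono buffer only, which is why stereo separation is not preserved through enhancement.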

Configuration Rules

Use one processor configuration per stream and keep it stable while processing.
  • Set num_channels to the stream channel count during initialization.
  • Each process call must use the same channel count.
  • For fixed-frame mode (allow_variable_frames = false), each call must also use the initialized num_frames.
  • For variable-frame mode (allow_variable_frames = true), frame count per call must be less than or equal to the initialized num_frames.
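The rules above amount to a simple per-call check. The struct and function names below are hypothetical illustrations, not SDK API:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the configuration fixed at initialization. */
typedef struct {
    size_t num_channels;        /* must match every process call */
    size_t num_frames;          /* exact (fixed) or maximum (variable) */
    bool allow_variable_frames;
} stream_config;

/* Returns true if a process call with these dimensions is valid
   under the configuration rules. */
static bool call_is_valid(const stream_config *cfg,
                          size_t channels, size_t frames) {
    if (channels != cfg->num_channels)
        return false;   /* channel count may never change mid-stream */
    if (cfg->allow_variable_frames)
        return frames > 0 && frames <= cfg->num_frames;
    return frames == cfg->num_frames;  /* fixed mode: exact match */
}
```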

Buffer Layouts

The SDK supports three memory layouts for multi-channel buffers:
| Layout      | Shape                  | Example (2 channels, 4 frames)                      |
|-------------|------------------------|-----------------------------------------------------|
| Interleaved | Single buffer          | `[ch0_f0, ch1_f0, ch0_f1, ch1_f1, ...]`             |
| Sequential  | Single buffer          | `[ch0_f0, ch0_f1, ch0_f2, ch0_f3, ch1_f0, ...]`     |
| Planar      | One buffer per channel | `audio[0] = [ch0_f0, ...], audio[1] = [ch1_f0, ...]` |
The planar layout supports at most 16 channels.
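The index arithmetic for the three layouts can be stated directly; for channel `c` and frame `f`:

```c
#include <stddef.h>

/* Interleaved: each frame is a contiguous group holding one sample
   per channel, so frames stride by num_channels. */
static size_t interleaved_index(size_t c, size_t f, size_t num_channels) {
    return f * num_channels + c;
}

/* Sequential: one channel's entire run of frames, then the next
   channel's, so channels stride by num_frames. */
static size_t sequential_index(size_t c, size_t f, size_t num_frames) {
    return c * num_frames + f;
}

/* Planar: audio[c][f] — each channel owns its buffer, so the
   per-buffer index is just the frame index. */
static size_t planar_index(size_t f) {
    return f;
}
```

Mixing these up is a silent corruption bug rather than a hard error: the samples are still valid floats, just attributed to the wrong channel and time.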

Common Pitfalls

  • Initializing with one channel count and processing with another (AIC_ERROR_CODE_AUDIO_CONFIG_MISMATCH).
  • Sending the wrong number of samples for the configured frame size.
  • Assuming stereo channels are enhanced independently (they are not; the signal is mixed to mono first).
  • Re-initializing on a real-time audio thread (initialization allocates memory).
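The last pitfall suggests a standard real-time pattern: do all allocation once, up front, and keep the processing path allocation-free. The sketch below is a hypothetical processor shape illustrating that split, not the SDK's API:

```c
#include <stdlib.h>
#include <stddef.h>

/* Hypothetical processor: all allocation happens in create(), so
   process() is safe to call from a real-time audio thread. */
typedef struct {
    float *scratch;     /* allocated once at initialization */
    size_t max_frames;
} processor;

static processor *processor_create(size_t max_frames) {
    processor *p = malloc(sizeof *p);
    if (!p) return NULL;
    p->scratch = calloc(max_frames, sizeof *p->scratch);
    if (!p->scratch) { free(p); return NULL; }
    p->max_frames = max_frames;
    return p;
}

/* Real-time safe: no malloc/free, no locks, bounded work. Rejects
   oversized calls instead of reallocating on the audio thread. */
static int processor_process(processor *p, float *audio, size_t frames) {
    if (frames > p->max_frames)
        return -1;
    for (size_t f = 0; f < frames; ++f)
        p->scratch[f] = audio[f];   /* stand-in for the model */
    for (size_t f = 0; f < frames; ++f)
        audio[f] = p->scratch[f];
    return 0;
}

static void processor_destroy(processor *p) {
    if (p) { free(p->scratch); free(p); }
}
```

Create and destroy belong on a control thread; only the process call belongs in the audio callback.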