Models - ai-coustics Docs

This page contains details about the models compatible with the current version of the SDK.

All models can be used at any sample rate through our SDK. You can learn more about that here.
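In every spec block on this page, the listed frame count equals the optimal sample rate multiplied by the window length (for example, 16 kHz × 10 ms = 160 frames). The short Python sketch below checks that arithmetic against a few of the listed models. It is not part of the SDK: `frames_per_window` and `SPECS` are hypothetical names, and the values are copied from the spec blocks.

```python
# Relationship visible in the spec blocks on this page:
# num_frames = sample_rate (Hz) * window_length (ms) / 1000
# frames_per_window and SPECS are illustrative names, not SDK APIs.


def frames_per_window(sample_rate_hz: int, window_ms: int) -> int:
    """Number of samples in one processing window at a given sample rate."""
    samples = sample_rate_hz * window_ms
    if samples % 1000 != 0:
        raise ValueError("window does not contain a whole number of samples")
    return samples // 1000


# (optimal sample rate in Hz, window length in ms, listed num frames)
SPECS = {
    "quail-l-16khz": (16_000, 10, 160),
    "quail-l-8khz": (8_000, 10, 80),
    "quail-vf-2.1-l-16khz": (16_000, 15, 240),
    "quail-vad-2.0-xxs-16khz": (16_000, 15, 240),
    "rook-l-48khz": (48_000, 10, 480),
}

for model_id, (rate, window, listed) in SPECS.items():
    # The computed value should match the number listed in the docs.
    assert frames_per_window(rate, window) == listed, model_id
    print(f"{model_id}: {listed} frames per {window} ms window")
```

The helper raises instead of rounding, because a window that does not map to a whole number of samples is a sign the rate and window length do not fit together.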

Quail

Far-field Speech Enhancement for Voice AI

Quail is designed for far-field and multi-speaker environments. It does not suppress distant-sounding speech, which makes it well suited for speakerphone setups, meeting rooms, and situations with multiple participants spread across a room.

Quail L (16 kHz)

ID: quail-l-16khz
File size: 35 MB
Window length: 10 ms
Optimal sample rate: 16 kHz
Optimal num frames: 160
Minimal algorithmic delay: 30 ms

Quail L (8 kHz)

ID: quail-l-8khz
File size: 33.4 MB
Window length: 10 ms
Optimal sample rate: 8 kHz
Optimal num frames: 80
Minimal algorithmic delay: 30 ms

Quail S (16 kHz)

ID: quail-s-16khz
File size: 8.88 MB
Window length: 10 ms
Optimal sample rate: 16 kHz
Optimal num frames: 160
Minimal algorithmic delay: 30 ms

Quail S (8 kHz)

ID: quail-s-8khz
File size: 8.43 MB
Window length: 10 ms
Optimal sample rate: 8 kHz
Optimal num frames: 80
Minimal algorithmic delay: 30 ms

Model files are available for download here.
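If you need to choose a Quail variant programmatically, one option is to filter by sample rate and a download size budget. The sketch below assumes you prefer the larger L variant whenever it fits, on the unverified assumption that L trades file size for quality; `pick_quail` and `QUAIL_VARIANTS` are hypothetical names, and the sizes are copied from the spec blocks above.

```python
# (model ID, optimal sample rate in Hz, file size in MB), copied from the Quail specs.
# QUAIL_VARIANTS and pick_quail are illustrative names, not SDK APIs.
QUAIL_VARIANTS = [
    ("quail-l-16khz", 16_000, 35.0),
    ("quail-l-8khz", 8_000, 33.4),
    ("quail-s-16khz", 16_000, 8.88),
    ("quail-s-8khz", 8_000, 8.43),
]


def pick_quail(sample_rate_hz: int, max_size_mb: float) -> str:
    """Return the largest Quail variant at the given rate that fits the size budget."""
    candidates = [
        (size, model_id)
        for model_id, rate, size in QUAIL_VARIANTS
        if rate == sample_rate_hz and size <= max_size_mb
    ]
    if not candidates:
        raise ValueError("no Quail variant matches these constraints")
    # max() on (size, id) tuples picks the largest file that still fits.
    return max(candidates)[1]


print(pick_quail(16_000, 50))  # quail-l-16khz
print(pick_quail(16_000, 10))  # quail-s-16khz
print(pick_quail(8_000, 9))    # quail-s-8khz
```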

Quail Voice Focus

Near-field Speech Enhancement for Voice AI

Quail Voice Focus is optimized for near-field voice interactions. It prioritizes speech that sounds close to the microphone and suppresses speech that sounds distant, along with background noise. This makes it ideal for single-user, close-talk use cases (e.g., headsets or handheld devices).

Quail Voice Focus 2.1 L (16 kHz)

ID: quail-vf-2.1-l-16khz
File size: 20 MB
Window length: 15 ms
Optimal sample rate: 16 kHz
Optimal num frames: 240
Minimal algorithmic delay: 30 ms

Quail Voice Focus 2.1 S (16 kHz)

ID: quail-vf-2.1-s-16khz
File size: 5.3 MB
Window length: 15 ms
Optimal sample rate: 16 kHz
Optimal num frames: 240
Minimal algorithmic delay: 30 ms

Model files are available for download here.
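The Voice Focus models list 240 optimal frames at 16 kHz, i.e. one 15 ms window. A common way to feed a stream at that block size is to split mono audio into fixed-size blocks and zero-pad the last one. The sketch below does this in plain Python; it assumes "num frames" means samples per channel per processing call, and `split_into_frames` is a hypothetical helper rather than an SDK function.

```python
# Split mono audio into fixed-size blocks for block-based processing.
# Assumes num_frames = samples per channel per call; split_into_frames is illustrative.


def split_into_frames(samples: list[float], num_frames: int = 240) -> list[list[float]]:
    """Split mono audio into fixed-size blocks, zero-padding the last block."""
    blocks = []
    for start in range(0, len(samples), num_frames):
        block = samples[start:start + num_frames]
        block = block + [0.0] * (num_frames - len(block))  # pad the tail block
        blocks.append(block)
    return blocks


one_second = [0.0] * 16_000  # 1 s of silence at 16 kHz
blocks = split_into_frames(one_second)
print(len(blocks))  # 67 (16000 / 240 = 66.67, so the last block is padded)
```

In a real-time pipeline you would more likely buffer incoming audio until a full block is available instead of padding, since padding only makes sense at the end of a finite recording.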

Quail VAD

Noise-robust Voice Activity Detection

Quail VAD is a standalone, noise-robust Voice Activity Detection (VAD) model for real-time Voice AI pipelines.

Quail VAD 2.0 XXS (16 kHz)

ID: quail-vad-2.0-xxs-16khz
File size: 630 kB
Window length: 15 ms
Optimal sample rate: 16 kHz
Optimal num frames: 240
Minimal algorithmic delay: 30 ms

Model files are available for download here.
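Downstream components such as turn-taking logic usually want speech segments rather than per-window decisions. The sketch below merges consecutive speech windows into `(start_ms, end_ms)` segments, assuming one boolean decision per 15 ms window. The actual output format of Quail VAD is not described here, so the input list and `to_segments` are hypothetical; the timestamps refer to input window positions and do not account for the 30 ms algorithmic delay.

```python
WINDOW_MS = 15  # Quail VAD 2.0 XXS window length

# to_segments is an illustrative helper; the list of booleans stands in for
# per-window VAD decisions (assumed format, not the SDK's documented output).


def to_segments(decisions: list[bool], window_ms: int = WINDOW_MS) -> list[tuple[int, int]]:
    """Merge consecutive speech windows into (start_ms, end_ms) segments."""
    segments = []
    start = None
    for i, is_speech in enumerate(decisions):
        if is_speech and start is None:
            start = i * window_ms  # speech begins at this window
        elif not is_speech and start is not None:
            segments.append((start, i * window_ms))  # speech ended at this window
            start = None
    if start is not None:  # speech runs to the end of the input
        segments.append((start, len(decisions) * window_ms))
    return segments


print(to_segments([False, True, True, False, True]))  # [(15, 45), (60, 75)]
```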

Tyto

Audio Insight for Voice AI

Tyto is an audio intelligence model that predicts whether an audio signal is likely to cause failures in the downstream models that consume it (VAD, turn-taking, STT, and speech-to-speech).

Tyto L (16 kHz)

ID: tyto-l-16khz
File size: 19.8 MB
Window length: 5 s
Optimal sample rate: 16 kHz

Model files are available for download here.
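Unlike the other models, Tyto L works on a 5 s window, which is 80,000 samples at 16 kHz. The sketch below computes non-overlapping 5 s window boundaries for a longer recording. It assumes non-overlapping windows and keeps a shorter final window; whether Tyto accepts or expects a partial window is not stated here, and `window_bounds` is a hypothetical helper.

```python
SAMPLE_RATE_HZ = 16_000  # Tyto L sample rate
WINDOW_S = 5             # Tyto L window length

# window_bounds is an illustrative helper, not an SDK function.


def window_bounds(num_samples: int) -> list[tuple[int, int]]:
    """Non-overlapping 5 s windows as (start, end) sample indices; a shorter tail window is kept."""
    size = SAMPLE_RATE_HZ * WINDOW_S  # 80,000 samples
    return [(start, min(start + size, num_samples)) for start in range(0, num_samples, size)]


print(window_bounds(12 * SAMPLE_RATE_HZ))
# [(0, 80000), (80000, 160000), (160000, 192000)]
```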

Rook

Speech Enhancement for Human Intelligibility

Rook reduces background noise and reverberation while preserving speech naturalness and intelligibility for human perception.

Rook L (48 kHz)

ID: rook-l-48khz
File size: 35.1 MB
Window length: 10 ms
Optimal sample rate: 48 kHz
Optimal num frames: 480
Minimal algorithmic delay: 30 ms

Rook L (16 kHz)

ID: rook-l-16khz
File size: 35 MB
Window length: 10 ms
Optimal sample rate: 16 kHz
Optimal num frames: 160
Minimal algorithmic delay: 30 ms

Rook L (8 kHz)

ID: rook-l-8khz
File size: 33.4 MB
Window length: 10 ms
Optimal sample rate: 8 kHz
Optimal num frames: 80
Minimal algorithmic delay: 30 ms

Rook S (48 kHz)

ID: rook-s-48khz
File size: 8.96 MB
Window length: 10 ms
Optimal sample rate: 48 kHz
Optimal num frames: 480
Minimal algorithmic delay: 30 ms

Rook S (16 kHz)

ID: rook-s-16khz
File size: 8.88 MB
Window length: 10 ms
Optimal sample rate: 16 kHz
Optimal num frames: 160
Minimal algorithmic delay: 30 ms

Rook S (8 kHz)

ID: rook-s-8khz
File size: 8.43 MB
Window length: 10 ms
Optimal sample rate: 8 kHz
Optimal num frames: 80
Minimal algorithmic delay: 30 ms

Model files are available for download here.
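Rook comes in 8, 16, and 48 kHz variants. Although any model can run at any sample rate through the SDK, one reasonable heuristic is to pick the variant whose rate is closest to, but not below, your input rate, since a model running at a lower rate cannot represent frequencies above its Nyquist limit. The sketch below builds the model ID from that choice; the heuristic is an assumption rather than official guidance, and `pick_rook` is a hypothetical helper.

```python
ROOK_RATES_HZ = (8_000, 16_000, 48_000)  # Rook variant sample rates from the specs above

# pick_rook is an illustrative helper; the selection rule is an assumption.


def pick_rook(input_rate_hz: int, size: str = "l") -> str:
    """Pick the Rook variant whose rate is closest to, but not below, the input rate."""
    if size not in ("l", "s"):
        raise ValueError("size must be 'l' or 's'")
    at_or_above = [r for r in ROOK_RATES_HZ if r >= input_rate_hz]
    # Fall back to the highest available rate for inputs above 48 kHz.
    rate = min(at_or_above) if at_or_above else max(ROOK_RATES_HZ)
    return f"rook-{size}-{rate // 1000}khz"


print(pick_rook(48_000))       # rook-l-48khz
print(pick_rook(44_100, "s"))  # rook-s-48khz
print(pick_rook(8_000))        # rook-l-8khz
```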
