All models can be used with any sample rate using our SDK.
You can learn more about that here.
Quail
Far-field Speech Enhancement for Voice AI
Quail is designed for far-field and multi-speaker environments. It does not suppress distant-sounding speech, making it better suited for speakerphone setups, meeting rooms, or situations with multiple participants spread across a space.
Quail L (16 kHz)
- ID: quail-l-16khz
- File size: 35 MB
- Window length: 10 ms
- Native sample rate: 16 kHz
- Native num frames: 160
- Minimal algorithmic delay: 30 ms
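The num frames value in each listing is the window length expressed in samples at the model's native sample rate (for example, 16 kHz × 10 ms = 160). A small Python check of that relationship, using values copied from this page's listings; the function name `frames_per_window` is illustrative and not part of the SDK:

```python
def frames_per_window(sample_rate_hz: int, window_ms: int) -> int:
    """Return the number of samples (frames) in one processing window.

    frames = sample rate [samples/s] * window length [s]
    Integer arithmetic avoids floating-point rounding.
    """
    total = sample_rate_hz * window_ms
    if total % 1000 != 0:
        raise ValueError("window does not span a whole number of samples")
    return total // 1000


# (sample rate in Hz, window length in ms, num frames) copied from the listings
LISTED = {
    "quail-l-8khz": (8_000, 10, 80),
    "quail-l-16khz": (16_000, 10, 160),
    "quail-vf-2.1-l-16khz": (16_000, 15, 240),
    "rook-l-48khz": (48_000, 10, 480),
}

for model_id, (rate_hz, window_ms, num_frames) in LISTED.items():
    # Each listed num frames value should equal rate * window length.
    assert frames_per_window(rate_hz, window_ms) == num_frames, model_id
    print(f"{model_id}: {rate_hz} Hz x {window_ms} ms = {num_frames} frames")
```

The same arithmetic applies to Tyto further down: a 5 s window at 16 kHz spans 80,000 samples.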
Quail L (8 kHz)
- ID: quail-l-8khz
- File size: 33.4 MB
- Window length: 10 ms
- Native sample rate: 8 kHz
- Native num frames: 80
- Minimal algorithmic delay: 30 ms
Quail S (16 kHz)
- ID: quail-s-16khz
- File size: 8.88 MB
- Window length: 10 ms
- Native sample rate: 16 kHz
- Native num frames: 160
- Minimal algorithmic delay: 30 ms
Quail S (8 kHz)
- ID: quail-s-8khz
- File size: 8.43 MB
- Window length: 10 ms
- Native sample rate: 8 kHz
- Native num frames: 80
- Minimal algorithmic delay: 30 ms
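One common way to stream audio into a block-based model is to split it into fixed-size blocks matching the model's native num frames (80 samples for the 8 kHz Quail models). A generic Python sketch of that framing, assuming mono float samples in a list; the helper `iter_blocks` is hypothetical, zero-padding the last block is an assumption of this sketch, and the SDK may buffer or pad differently:

```python
from typing import Iterator, List


def iter_blocks(samples: List[float], num_frames: int) -> Iterator[List[float]]:
    """Yield consecutive, non-overlapping blocks of exactly num_frames samples.

    The last block is zero-padded if the input length is not a multiple
    of num_frames (an assumption of this sketch, not documented SDK behavior).
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    for start in range(0, len(samples), num_frames):
        block = samples[start:start + num_frames]
        if len(block) < num_frames:
            block = block + [0.0] * (num_frames - len(block))
        yield block


# 25 ms of audio at 8 kHz is 200 samples: two full 80-frame blocks plus a
# half-filled third block that gets padded with 40 zeros.
audio = [1.0] * 200
blocks = list(iter_blocks(audio, num_frames=80))
print(len(blocks), [len(b) for b in blocks], sum(blocks[-1]))  # 3 [80, 80, 80] 40.0
```

Fixed-size blocks keep each call to the model the same shape, which is why the native num frames value is worth matching when you control the buffering yourself.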
Quail Voice Focus
Near-field Speech Enhancement for Voice AI
Quail Voice Focus is optimized for near-field voice interactions. It prioritizes speech that sounds close to the microphone and suppresses speech that sounds distant, along with background noise. This makes it ideal for single-user, close-talk use cases (e.g., headsets or handheld devices).
Quail Voice Focus 2.1 L (16 kHz)
- ID: quail-vf-2.1-l-16khz
- File size: 20 MB
- Window length: 15 ms
- Native sample rate: 16 kHz
- Native num frames: 240
- Minimal algorithmic delay: 30 ms
Quail Voice Focus 2.1 S (16 kHz)
- ID: quail-vf-2.1-s-16khz
- File size: 5.3 MB
- Window length: 15 ms
- Native sample rate: 16 kHz
- Native num frames: 240
- Minimal algorithmic delay: 30 ms
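The choice between Quail and Quail Voice Focus comes down to whether distant-sounding speech should be kept (far-field) or suppressed (near-field). A minimal Python sketch of that decision as a lookup table; the table and the `pick_quail_model` helper are hypothetical, and only the model IDs come from the listings:

```python
# Hypothetical lookup table mapping a capture scenario to a 16 kHz model ID.
# The IDs come from the Quail and Quail Voice Focus listings.
QUAIL_16KHZ_IDS = {
    # (close_talk, small): model ID
    (False, False): "quail-l-16khz",        # far-field: keeps distant-sounding speech
    (False, True): "quail-s-16khz",
    (True, False): "quail-vf-2.1-l-16khz",  # near-field: suppresses distant speech
    (True, True): "quail-vf-2.1-s-16khz",
}


def pick_quail_model(close_talk: bool, small: bool = False) -> str:
    """Return the model ID for a scenario.

    close_talk: one user close to the mic (headset, handheld device).
    small: prefer the S variant, which has a much smaller file size.
    """
    return QUAIL_16KHZ_IDS[(close_talk, small)]


print(pick_quail_model(close_talk=False))             # quail-l-16khz (e.g. a meeting room)
print(pick_quail_model(close_talk=True, small=True))  # quail-vf-2.1-s-16khz (e.g. a headset)
```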
Quail VAD
Noise-robust Voice Activity Detection
Quail VAD is a standalone, noise-robust Voice Activity Detection model for real-time Voice AI pipelines.
Quail VAD 2.0 XXS (16 kHz)
- ID: quail-vad-2.0-xxs-16khz
- File size: 630 kB
- Window length: 15 ms
- Native sample rate: 16 kHz
- Native num frames: 240
- Minimal algorithmic delay: 30 ms
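Because each VAD block covers a fixed 15 ms (240 frames at 16 kHz), per-block decisions map directly to timestamps. A Python sketch, assuming you already have one boolean speech/non-speech decision per block; how the SDK actually exposes VAD output is not described here, so the input format and the `decisions_to_segments` helper are hypothetical:

```python
from typing import List, Optional, Tuple

# One VAD block = 240 frames at 16 kHz = 15 ms (from the listing above).
BLOCK_MS = 1000 * 240 // 16_000


def decisions_to_segments(is_speech: List[bool]) -> List[Tuple[int, int]]:
    """Merge runs of consecutive speech blocks into (start_ms, end_ms) segments.

    is_speech holds one hypothetical boolean decision per 15 ms block.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None  # index of the first block in the current speech run
    for i, speech in enumerate(is_speech):
        if speech and start is None:
            start = i
        elif not speech and start is not None:
            segments.append((start * BLOCK_MS, i * BLOCK_MS))
            start = None
    if start is not None:  # speech continues to the end of the input
        segments.append((start * BLOCK_MS, len(is_speech) * BLOCK_MS))
    return segments


print(decisions_to_segments([False, True, True, False, False, True]))
# [(15, 45), (75, 90)]
```

Working in integer milliseconds keeps the segment boundaries exact multiples of the 15 ms block length.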
Tyto
Audio Insight for Voice AI
Tyto is an audio intelligence model that predicts whether an audio signal is likely to cause failures in the downstream models that consume it (VAD, turn-taking, STT, and speech-to-speech).
Tyto L (16 kHz)
- ID: tyto-l-16khz
- File size: 19.8 MB
- Window length: 5 s
- Native sample rate: 16 kHz
Rook
Speech Enhancement for Human Intelligibility
Rook reduces background noise and reverberation while preserving speech naturalness and intelligibility for human perception.
Rook L (48 kHz)
- ID: rook-l-48khz
- File size: 35.1 MB
- Window length: 10 ms
- Native sample rate: 48 kHz
- Native num frames: 480
- Minimal algorithmic delay: 30 ms
Rook L (16 kHz)
- ID: rook-l-16khz
- File size: 35 MB
- Window length: 10 ms
- Native sample rate: 16 kHz
- Native num frames: 160
- Minimal algorithmic delay: 30 ms
Rook L (8 kHz)
- ID: rook-l-8khz
- File size: 33.4 MB
- Window length: 10 ms
- Native sample rate: 8 kHz
- Native num frames: 80
- Minimal algorithmic delay: 30 ms
Rook S (48 kHz)
- ID: rook-s-48khz
- File size: 8.96 MB
- Window length: 10 ms
- Native sample rate: 48 kHz
- Native num frames: 480
- Minimal algorithmic delay: 30 ms
Rook S (16 kHz)
- ID: rook-s-16khz
- File size: 8.88 MB
- Window length: 10 ms
- Native sample rate: 16 kHz
- Native num frames: 160
- Minimal algorithmic delay: 30 ms
Rook S (8 kHz)
- ID: rook-s-8khz
- File size: 8.43 MB
- Window length: 10 ms
- Native sample rate: 8 kHz
- Native num frames: 80
- Minimal algorithmic delay: 30 ms
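Rook comes in 8, 16, and 48 kHz native variants for each size. Since the SDK accepts any sample rate, picking a variant is a preference rather than a requirement. A Python sketch of one possible selection policy; the policy itself and the `pick_rook_model` helper are assumptions, not SDK behavior, and only the ID pattern comes from the listings:

```python
# Native sample rates shared by the L and S Rook variants listed above.
ROOK_NATIVE_RATES_HZ = (8_000, 16_000, 48_000)


def pick_rook_model(input_rate_hz: int, size: str = "l") -> str:
    """Hypothetical selection policy for a Rook model ID.

    Prefer the lowest native rate at or above the input rate, so an exact
    match wins and no input bandwidth is discarded; above 48 kHz, fall back
    to the highest native rate.
    """
    if size not in ("l", "s"):
        raise ValueError("size must be 'l' or 's'")
    at_or_above = [r for r in ROOK_NATIVE_RATES_HZ if r >= input_rate_hz]
    native_hz = min(at_or_above) if at_or_above else max(ROOK_NATIVE_RATES_HZ)
    return f"rook-{size}-{native_hz // 1000}khz"


for rate_hz, size in [(48_000, "l"), (44_100, "l"), (16_000, "s"), (8_000, "s")]:
    print(rate_hz, size, "->", pick_rook_model(rate_hz, size))
```

Rounding up rather than down means, for example, that 44.1 kHz input goes to the 48 kHz model instead of the 16 kHz one, which would only cover content up to 8 kHz.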