Overview
Speech Enhancement for Voice AI
Best for: Improving Voice AI agent performance and Speech-to-Text (STT) accuracy. Includes Quail Voice Focus.
- Model Family: Quail
- Platform: SDK
- Real-time
Perceptual Speech Enhancement
Best for: Removing background noise and reverb in real-time communication use cases.
- Model Family: Sparrow
- Platform: SDK
- Real-time
Voice Isolation & Clarity
Best for: Removing background noise and reverb while preserving the speaker’s identity.
- Model: Finch 2
- Platform: API
- File-based
Audio Repair & Restoration
Best for: Repairing degraded audio (phone calls, old recordings) to studio quality.
- Model: Lark 2
- Platform: API
- File-based
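The overview above amounts to a small decision: real-time or file-based processing, whether a machine (Voice AI agent, STT engine) or a person consumes the output, and, for files, whether the audio needs repair rather than isolation. The sketch below encodes that reading as a hypothetical helper; `recommend`, `Recommendation`, and the three questions it asks are our simplification for illustration, not part of any SDK or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recommendation:
    family: str    # model family or model name from the overview
    platform: str  # "SDK" or "API"
    mode: str      # "Real-time" or "File-based"


def recommend(real_time: bool, listener: str, needs_repair: bool = False) -> Recommendation:
    """Map a use case onto one of the four options in the overview (hypothetical helper).

    listener: "machine" if the enhanced audio feeds a Voice AI agent or STT engine,
              "human" if people listen to it.
    needs_repair: for file-based work, True if the audio is degraded (phone calls,
                  old recordings) rather than only noisy or reverberant.
    """
    if listener not in ("machine", "human"):
        raise ValueError(f"listener must be 'machine' or 'human', got {listener!r}")
    if real_time:
        # Real-time options are both SDK-based; the listener decides the family.
        if listener == "machine":
            return Recommendation("Quail", "SDK", "Real-time")
        return Recommendation("Sparrow", "SDK", "Real-time")
    # File-based options are both API-based; repair vs. isolation decides the model.
    if needs_repair:
        return Recommendation("Lark 2", "API", "File-based")
    return Recommendation("Finch 2", "API", "File-based")
```

For example, `recommend(real_time=True, listener="machine")` points to Quail on the SDK, while `recommend(real_time=False, listener="human", needs_repair=True)` points to Lark 2 on the API.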
Quail
SDK, Real-time, Human-to-Machine
The Quail models are purpose-built for Voice AI agents and human-to-machine interactions. Unlike standard noise suppression, Quail is tuned to improve the performance of downstream Speech-to-Text (STT) engines. The Voice Focus models elevate the foreground speaker while suppressing both interfering speech and background noise.
Quail Voice Focus L (16 kHz)
- ID: quail-vf-l-16khz
- File size: 35 MB
- Window length: 10 ms
- Optimal sample rate: 16 kHz
- Optimal num frames: 160
- Minimal algorithmic delay: 30 ms
Quail L (16 kHz)
- ID: quail-l-16khz
- File size: 35 MB
- Window length: 10 ms
- Optimal sample rate: 16 kHz
- Optimal num frames: 160
- Minimal algorithmic delay: 30 ms
Quail L (8 kHz)
- ID: quail-l-8khz
- File size: 33.4 MB
- Window length: 10 ms
- Native sample rate: 8 kHz
- Native num frames: 80
- Minimal algorithmic delay: 30 ms
Quail S (16 kHz)
- ID: quail-s-16khz
- File size: 8.88 MB
- Window length: 10 ms
- Native sample rate: 16 kHz
- Native num frames: 160
- Minimal algorithmic delay: 30 ms
Quail S (8 kHz)
- ID: quail-s-8khz
- File size: 8.43 MB
- Window length: 10 ms
- Native sample rate: 8 kHz
- Native num frames: 80
- Minimal algorithmic delay: 30 ms
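The "num frames" values listed for each Quail model follow from sample rate × window length: 16 kHz × 10 ms = 160 samples and 8 kHz × 10 ms = 80 samples. The sketch below shows that arithmetic and one way to cut a mono signal into blocks of that size before handing them to a model. The helper names `num_frames_for` and `iter_windows` are hypothetical, the SDK's actual processing call is not shown, and zero-padding the final partial block is an assumption of this sketch.

```python
from typing import Iterator, List, Sequence

WINDOW_MS = 10  # window length listed for every Quail model


def num_frames_for(sample_rate_hz: int, window_ms: int = WINDOW_MS) -> int:
    """Samples per processing window: sample rate (Hz) x window length (ms) / 1000."""
    samples, remainder = divmod(sample_rate_hz * window_ms, 1000)
    if remainder:
        raise ValueError(
            f"{sample_rate_hz} Hz does not give a whole number of samples per {window_ms} ms"
        )
    return samples


def iter_windows(samples: Sequence[float], num_frames: int) -> Iterator[List[float]]:
    """Split a mono signal into consecutive blocks of num_frames samples.

    The last block is zero-padded to full length (an assumption of this sketch;
    check how your integration expects a partial final block to be handled).
    """
    for start in range(0, len(samples), num_frames):
        block = list(samples[start:start + num_frames])
        block.extend([0.0] * (num_frames - len(block)))
        yield block


if __name__ == "__main__":
    n = num_frames_for(16_000)  # matches Quail L (16 kHz)
    one_second = [0.0] * 16_000
    blocks = list(iter_windows(one_second, n))
    print(n, len(blocks))  # 160 100
```

The block size only fixes how much audio is passed per step; the minimal algorithmic delay listed for each model (30 ms for every Quail variant) is a separate property of the model itself.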
Sparrow
SDK, Real-time, Human-to-Human
The Sparrow models are optimized for human-to-human interaction in real-time, latency-constrained systems (e.g., voice calls). They reduce background noise and reverberation while preserving speech naturalness and intelligibility for human listeners.
Sparrow L (48 kHz)
- ID: sparrow-l-48khz
- File size: 35.1 MB
- Window length: 10 ms
- Native sample rate: 48 kHz
- Native num frames: 480
- Minimal algorithmic delay: 30 ms
Sparrow L (16 kHz)
- ID: sparrow-l-16khz
- File size: 35 MB
- Window length: 10 ms
- Native sample rate: 16 kHz
- Native num frames: 160
- Minimal algorithmic delay: 30 ms
Sparrow L (8 kHz)
- ID: sparrow-l-8khz
- File size: 33.4 MB
- Window length: 10 ms
- Native sample rate: 8 kHz
- Native num frames: 80
- Minimal algorithmic delay: 30 ms
Sparrow S (48 kHz)
- ID: sparrow-s-48khz
- File size: 8.96 MB
- Window length: 10 ms
- Native sample rate: 48 kHz
- Native num frames: 480
- Minimal algorithmic delay: 30 ms
Sparrow S (16 kHz)
- ID: sparrow-s-16khz
- File size: 8.88 MB
- Window length: 10 ms
- Native sample rate: 16 kHz
- Native num frames: 160
- Minimal algorithmic delay: 30 ms
Sparrow S (8 kHz)
- ID: sparrow-s-8khz
- File size: 8.43 MB
- Window length: 10 ms
- Native sample rate: 8 kHz
- Native num frames: 80
- Minimal algorithmic delay: 30 ms
Sparrow XS (48 kHz)
- ID: sparrow-xs-48khz
- File size: 1.62 MB
- Window length: 10 ms
- Native sample rate: 48 kHz
- Native num frames: 480
- Minimal algorithmic delay: 10 ms
Sparrow XXS (48 kHz)
- ID: sparrow-xxs-48khz
- File size: 1 MB
- Window length: 10 ms
- Native sample rate: 48 kHz
- Native num frames: 480
- Minimal algorithmic delay: 10 ms
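The Sparrow variants trade size against latency: only XS and XXS list a 10 ms minimal algorithmic delay, and both are available only at 48 kHz. The sketch below copies the specs above into a table and picks a model for a given sample rate and delay budget. `ModelSpec` and `pick_sparrow` are hypothetical names, nothing here loads a model, and treating the larger file as the better choice is our assumption, not a documented quality ranking.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModelSpec:
    model_id: str
    sample_rate_hz: int
    file_size_mb: float
    min_delay_ms: int  # minimal algorithmic delay from the specs above


# Values copied from the Sparrow specifications above.
SPARROW_MODELS: List[ModelSpec] = [
    ModelSpec("sparrow-l-48khz", 48_000, 35.1, 30),
    ModelSpec("sparrow-l-16khz", 16_000, 35.0, 30),
    ModelSpec("sparrow-l-8khz", 8_000, 33.4, 30),
    ModelSpec("sparrow-s-48khz", 48_000, 8.96, 30),
    ModelSpec("sparrow-s-16khz", 16_000, 8.88, 30),
    ModelSpec("sparrow-s-8khz", 8_000, 8.43, 30),
    ModelSpec("sparrow-xs-48khz", 48_000, 1.62, 10),
    ModelSpec("sparrow-xxs-48khz", 48_000, 1.0, 10),
]


def pick_sparrow(sample_rate_hz: int, max_delay_ms: int) -> Optional[ModelSpec]:
    """Return the largest Sparrow model at this sample rate whose delay fits the budget.

    Assumption: file size is used as a rough proxy for model capacity.
    Returns None when no listed model satisfies both constraints.
    """
    candidates = [
        m for m in SPARROW_MODELS
        if m.sample_rate_hz == sample_rate_hz and m.min_delay_ms <= max_delay_ms
    ]
    return max(candidates, key=lambda m: m.file_size_mb, default=None)
```

With a 10 ms budget at 48 kHz the helper returns `sparrow-xs-48khz`; at 16 kHz the same budget returns `None`, because every 16 kHz variant lists a 30 ms minimal delay.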
Finch 2
API, File-based, Subtractive
Finch 2 is our updated voice isolation model, designed to remove undesired sounds (noise, reverb) while preserving the original speaker’s identity.
- Best for: Strong background noise, heavy reverb, distant speakers, voice isolation needs
- Strengths: Improved de-noising/de-reverb, fewer artifacts, more robust, faster and more energy‑efficient
- Parameter: enhancement_model: "FINCH" (maps to Finch 2)
Lark 2
API, File-based, Reconstructive
Lark 2 is our reconstructive model. It goes beyond isolation to repair degraded audio (e.g., compression, band-limiting) and restore a full, modern studio sound while keeping the authentic voice.
- Best for: Old/phone/Zoom recordings, clipped or compressed audio, bandwidth‑limited sources
- Strengths: Better denoising and reverb removal, robust across complex real‑world distortions, anti‑hallucination training
- Parameter: enhancement_model: "LARK_V2" (Lark 2). "LARK" is legacy.
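The file-based models are selected with the `enhancement_model` parameter: `"FINCH"` for Finch 2 and `"LARK_V2"` for Lark 2, with `"LARK"` kept only as a legacy value. The sketch below maps model names to those values and warns when the legacy value is used. `enhancement_model_param` and `check_enhancement_model` are hypothetical helpers; the endpoint, authentication, and any other request fields are not described in this section and are deliberately left out.

```python
import json
import warnings

# enhancement_model values listed for the file-based API models on this page.
ENHANCEMENT_MODEL_VALUES = {
    "Finch 2": "FINCH",
    "Lark 2": "LARK_V2",
}
LEGACY_VALUES = {"LARK"}  # documented as legacy; Lark 2 is selected with "LARK_V2"


def enhancement_model_param(model_name: str) -> str:
    """Return the enhancement_model value for a model name (hypothetical helper)."""
    try:
        return ENHANCEMENT_MODEL_VALUES[model_name]
    except KeyError:
        raise ValueError(
            f"unknown model {model_name!r}; expected one of {sorted(ENHANCEMENT_MODEL_VALUES)}"
        ) from None


def check_enhancement_model(value: str) -> str:
    """Pass the value through unchanged, warning if it is a legacy value."""
    if value in LEGACY_VALUES:
        warnings.warn(f"{value!r} is legacy; use 'LARK_V2' to select Lark 2")
    return value


if __name__ == "__main__":
    # Only the enhancement_model field is built here; the rest of the request is omitted.
    fields = {"enhancement_model": check_enhancement_model(enhancement_model_param("Lark 2"))}
    print(json.dumps(fields))  # {"enhancement_model": "LARK_V2"}
```

Keeping the mapping in one table means a future model version only needs a new entry, and the legacy check makes an accidental `"LARK"` visible instead of silently selecting the older model.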