ai-coustics offers complementary speech enhancement model families for SDK and API users. Use this guide to choose a model that fits your needs.

Overview

Quail

SDK, Real-time, Human-to-Machine
The Quail models are purpose-built for Voice AI agents and human-to-machine interactions. Unlike standard noise suppression, Quail is tuned to improve the performance of downstream Speech-to-Text (STT) engines. The Voice Focus models elevate the foreground speaker while suppressing both interfering speech and background noise.
Quail models use fixed enhancement and voice gain parameters to optimize performance for voice agents. Attempting to modify these parameters triggers an error.
  • ID: quail-vf-l-16khz
  • File size: 35 MB
  • Window length: 10 ms
  • Optimal sample rate: 16 kHz
  • Optimal num frames: 160
  • Minimal algorithmic delay: 30 ms
  • ID: quail-l-16khz
  • File size: 35 MB
  • Window length: 10 ms
  • Optimal sample rate: 16 kHz
  • Optimal num frames: 160
  • Minimal algorithmic delay: 30 ms
  • ID: quail-l-8khz
  • File size: 33.4 MB
  • Window length: 10 ms
  • Native sample rate: 8 kHz
  • Native num frames: 80
  • Minimal algorithmic delay: 30 ms
  • ID: quail-s-16khz
  • File size: 8.88 MB
  • Window length: 10 ms
  • Native sample rate: 16 kHz
  • Native num frames: 160
  • Minimal algorithmic delay: 30 ms
  • ID: quail-s-8khz
  • File size: 8.43 MB
  • Window length: 10 ms
  • Native sample rate: 8 kHz
  • Native num frames: 80
  • Minimal algorithmic delay: 30 ms
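
The frame counts listed above follow directly from the sample rate and the 10 ms window length: one window holds sample rate × window length samples. A minimal Python sketch of that arithmetic; the function name is illustrative and not part of the SDK:

```python
def frames_per_window(sample_rate_hz: int, window_ms: int = 10) -> int:
    """Return the number of frames (samples per channel) in one window."""
    # Integer arithmetic avoids floating-point rounding for millisecond windows.
    return sample_rate_hz * window_ms // 1000


print(frames_per_window(16_000))  # 160, matching the 16 kHz models
print(frames_per_window(8_000))   # 80, matching the 8 kHz models
```

Feeding audio in buffers of this size at the listed sample rate matches the values each model is documented with.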

Sparrow

SDK, Real-time, Human-to-Human
The Sparrow models are optimized for human-to-human interaction in real-time-constrained systems (e.g., voice calls). They reduce background noise and reverberation while preserving speech naturalness and intelligibility for human perception.
  • ID: sparrow-l-48khz
  • File size: 35.1 MB
  • Window length: 10 ms
  • Native sample rate: 48 kHz
  • Native num frames: 480
  • Minimal algorithmic delay: 30 ms
  • ID: sparrow-l-16khz
  • File size: 35 MB
  • Window length: 10 ms
  • Native sample rate: 16 kHz
  • Native num frames: 160
  • Minimal algorithmic delay: 30 ms
  • ID: sparrow-l-8khz
  • File size: 33.4 MB
  • Window length: 10 ms
  • Native sample rate: 8 kHz
  • Native num frames: 80
  • Minimal algorithmic delay: 30 ms
  • ID: sparrow-s-48khz
  • File size: 8.96 MB
  • Window length: 10 ms
  • Native sample rate: 48 kHz
  • Native num frames: 480
  • Minimal algorithmic delay: 30 ms
  • ID: sparrow-s-16khz
  • File size: 8.88 MB
  • Window length: 10 ms
  • Native sample rate: 16 kHz
  • Native num frames: 160
  • Minimal algorithmic delay: 30 ms
  • ID: sparrow-s-8khz
  • File size: 8.43 MB
  • Window length: 10 ms
  • Native sample rate: 8 kHz
  • Native num frames: 80
  • Minimal algorithmic delay: 30 ms
  • ID: sparrow-xs-48khz
  • File size: 1.62 MB
  • Window length: 10 ms
  • Native sample rate: 48 kHz
  • Native num frames: 480
  • Minimal algorithmic delay: 10 ms
  • ID: sparrow-xxs-48khz
  • File size: 1 MB
  • Window length: 10 ms
  • Native sample rate: 48 kHz
  • Native num frames: 480
  • Minimal algorithmic delay: 10 ms
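
One way to narrow down the Sparrow variants is to filter the list above by sample rate, file-size budget, and delay budget. The sketch below copies the figures from this guide into a small catalog. The helper `pick_sparrow` and its tie-breaking rule (prefer the largest model that fits) are assumptions made for illustration, not guidance from ai-coustics or part of the SDK:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SparrowModel:
    model_id: str
    size_mb: float
    sample_rate_hz: int
    delay_ms: int


# Values copied from the Sparrow list in this guide.
SPARROW_MODELS = [
    SparrowModel("sparrow-l-48khz", 35.1, 48_000, 30),
    SparrowModel("sparrow-l-16khz", 35.0, 16_000, 30),
    SparrowModel("sparrow-l-8khz", 33.4, 8_000, 30),
    SparrowModel("sparrow-s-48khz", 8.96, 48_000, 30),
    SparrowModel("sparrow-s-16khz", 8.88, 16_000, 30),
    SparrowModel("sparrow-s-8khz", 8.43, 8_000, 30),
    SparrowModel("sparrow-xs-48khz", 1.62, 48_000, 10),
    SparrowModel("sparrow-xxs-48khz", 1.0, 48_000, 10),
]


def pick_sparrow(
    sample_rate_hz: int, max_size_mb: float, max_delay_ms: int = 30
) -> Optional[SparrowModel]:
    """Hypothetical helper: largest model that satisfies all three limits."""
    candidates = [
        m
        for m in SPARROW_MODELS
        if m.sample_rate_hz == sample_rate_hz
        and m.size_mb <= max_size_mb
        and m.delay_ms <= max_delay_ms
    ]
    # Assumption: when several models fit, prefer the largest one.
    return max(candidates, key=lambda m: m.size_mb, default=None)


choice = pick_sparrow(sample_rate_hz=48_000, max_size_mb=10)
print(choice.model_id if choice else "no model fits")
```

With a 10 MB budget at 48 kHz this selects `sparrow-s-48khz`; tightening the delay budget to 10 ms leaves only the XS and XXS variants.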

Finch 2

API, File-based, Subtractive
Finch 2 is our updated voice isolation model designed to remove undesired sounds (noise, reverb) while preserving the original speaker’s identity.
  • Best for: Strong background noise, heavy reverb, distant speakers, voice isolation needs
  • Strengths: Improved de-noising/de-reverb, fewer artifacts, more robust, faster and more energy‑efficient
  • Parameter: enhancement_model: "FINCH" (maps to Finch 2)

Lark 2

API, File-based, Reconstructive
Lark 2 is our reconstructive model that goes beyond isolation to repair degraded audio (e.g., compression, band-limiting) and restore a full, modern studio sound while keeping the authentic voice.
  • Best for: Old/phone/Zoom recordings, clipped or compressed audio, bandwidth‑limited sources
  • Strengths: Better denoising and reverb removal, robust across complex real‑world distortions, anti‑hallucination training
  • Parameter: enhancement_model: "LARK_V2" (Lark 2). LARK is legacy.
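
The choice between the two API models comes down to whether the audio needs repair or only isolation. A minimal Python sketch of that decision: only the `enhancement_model` key and the values `"FINCH"` and `"LARK_V2"` come from this guide, while the helper name is hypothetical and the way the parameters are sent to the API is not shown here:

```python
def enhancement_model_for(needs_reconstruction: bool) -> str:
    """Hypothetical helper mapping a use case to the documented parameter value."""
    # Lark 2 repairs degraded audio; Finch 2 isolates the voice without reconstruction.
    return "LARK_V2" if needs_reconstruction else "FINCH"


# Example: an old phone recording that is band-limited and compressed.
params = {"enhancement_model": enhancement_model_for(needs_reconstruction=True)}
print(params)  # {'enhancement_model': 'LARK_V2'}
```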