
# Overview

> The ai-coustics SDK enables developers to turn raw, unpredictable audio into stable, machine-ready input for Voice AI systems.

<div align="center">
  <img className="block dark:hidden" src="https://mintcdn.com/ai-coustics/TRQ6cYSvxBnlqNAk/logo/light.svg?fit=max&auto=format&n=TRQ6cYSvxBnlqNAk&q=85&s=811a7351a129b8c16831882725fa1e9f" alt="ai-coustics" width="320" noZoom data-path="logo/light.svg" />

  <img className="hidden dark:block" src="https://mintcdn.com/ai-coustics/TRQ6cYSvxBnlqNAk/logo/dark.svg?fit=max&auto=format&n=TRQ6cYSvxBnlqNAk&q=85&s=0ed4b1ebfd381ef8674fcc5aba00e414" alt="ai-coustics" width="320" noZoom data-path="logo/dark.svg" />
</div>

Run the **Quail** and **Tyto** model families directly inside your application on **AirTen**, our CPU-first inference runtime that needs no GPU and no ONNX.

## Choose your model

<CardGroup cols={2}>
  <Card title="Quail Voice Focus" href="/models/voice-focus/quail-voice-focus">
    **Primary speaker isolation.** Near-field Voice Focus that suppresses background voices and noise.
  </Card>

  <Card title="Tyto" href="/models/audio-insight/tyto">
    **Audio Insight.** Predicts and diagnoses downstream Voice AI failures with a single **Tyto Risk Score**, in real time or offline.
  </Card>

  <Card title="Quail VAD" href="/models/voice-activity-detection/quail-vad">
    **Voice Activity Detection.** Robust, standalone speech detection for noisy Voice AI pipelines, with no separate denoiser required.
  </Card>

  <Card title="Quail" href="/models/speech-enhancement/quail">
    **Speech enhancement for Voice AI.** Improves STT accuracy in far-field, multi-speaker conditions.
  </Card>

  <Card title="Rook" href="/models/perceptual-speech-enhancement/rook">
    **Perceptual speech enhancement.** Removes noise and reverb for natural-sounding human conversation.
  </Card>
</CardGroup>

## Start your integration

<CardGroup cols={2}>
  <Card title="LiveKit" icon="https://mintcdn.com/ai-coustics/JmmOyXZIyj1NUHdH/images/partners/livekit.jpeg?fit=max&auto=format&n=JmmOyXZIyj1NUHdH&q=85&s=2b02a83ddfc35a6d872482e86b34c05b" href="/models/get-started/livekit-quickstart" width="180" height="180" data-path="images/partners/livekit.jpeg">
    Add real-time speech enhancement to your voice agents with a single plugin.
  </Card>

  <Card title="Pipecat" icon="https://mintcdn.com/ai-coustics/JmmOyXZIyj1NUHdH/images/partners/pipecat.png?fit=max&auto=format&n=JmmOyXZIyj1NUHdH&q=85&s=442596f2c033a1a32ab533cd75319f7b" href="/models/get-started/pipecat-quickstart" width="200" height="200" data-path="images/partners/pipecat.png">
    Integrate our voice enhancement processor directly into your Pipecat pipelines.
  </Card>
</CardGroup>

<CardGroup cols={1}>
  <Card title="SDK Quickstart" href="/models/get-started/sdk-quickstart">
    Use our official Python, Rust, WebAssembly, Node.js, C, and C++ bindings to integrate into your application.
  </Card>
</CardGroup>

## Try it

Read the [benchmarks](https://ai-coustics.com/benchmarks-quantitative) or listen to [qualitative samples](https://ai-coustics.com/benchmarks-qualitative) to see the performance of our models.
To try any model on your own audio, see the [examples](/reference/sdk/examples) page. You can test the SDK for 30 days for free.

<CardGroup cols={2}>
  <Card title="SDK Playground" href="https://developers.ai-coustics.com/dashboard/sdk/playground">
    Compare transcripts and WER with and without Quail Voice Focus in our Developer Platform.
  </Card>

  <Card title="Hugging Face Demo" href="https://huggingface.co/spaces/ai-coustics/VoiceFocus">
    Try Quail Voice Focus on Hugging Face without an account.
  </Card>
</CardGroup>

## Developer resources

<CardGroup cols={2}>
  <Card title="Model reference" href="/reference/sdk/models">
    Detailed specs covering model IDs, sample rates, latency, and file sizes.
  </Card>

  <Card title="Pricing" href="/models/get-started/pricing">
    Understand SDK pricing, trial access, and billing basics.
  </Card>

  <Card title="SDK Reference" href="https://github.com/ai-coustics/aic-sdk-c/blob/HEAD/sdk-reference.md">
    Explore all available functions and types in the core SDK.
  </Card>

  <Card title="Changelog" href="/changelog">
    See what's new in each SDK release.
  </Card>
</CardGroup>

## What is ai-coustics

ai-coustics builds the audio intelligence layer between real-world sound and machine understanding. Its SDK and APIs turn raw, unpredictable audio into stable,
machine-ready input for Voice AI systems — running on-device with sub-40ms latency. The company was founded in Berlin in 2021 by Fabian Seipel (CEO) and Corvin Jaedicke (CTO).
It is backed by Connect Ventures, Partech, and Inovia Capital, and has raised $1.6M pre-seed and $5M seed funding.

The tagline used on the website is "Cleaner input. Smarter output." The positioning is that Voice AI is only as strong as its audio foundation — bad input audio causes ASR errors,
missed turns, hallucinations, and pipeline instability. ai-coustics fixes that before audio reaches any downstream model. The company describes itself as "audio native,"
with a team that combines signal processing, acoustics, and applied machine learning expertise.

## The problem

Voice AI systems fail in real-world conditions because of poor audio quality, and ai-coustics addresses that failure mode. Background noise, room reverb, clipping, overlapping speakers, and codec artifacts
all degrade the input that ASR, VAD, and LLMs receive. ai-coustics sits at the front of the pipeline — before STT — and stabilizes the audio so downstream models behave predictably.
This reduces word error rates, false barge-ins, missed turns, and hallucinations without changing the downstream models themselves.

## Products

Quail is the core product line for Voice AI and machine listening. It contains three components:

* **Quail (ASR Primer)** is speech enhancement optimized to reduce word error rates. It is designed to improve STT accuracy in noisy, real-world conditions and can reduce WER by up to 30%. It is best suited to multi-speaker environments.
* **Quail Voice Focus** is primary speaker isolation in real time. It suppresses competing voices and background speech, and reduces WER by up to 43% across major STT providers. The website describes it as "voice isolation" and positions it for voice agents.
* **Quail VAD** is a standalone Voice Activity Detection model. It outperforms Silero VAD and is designed to detect speech reliably without requiring a separate denoising step. It can be used with or without Quail Voice Focus, but for voice isolation the recommended approach is chaining both together.

Rook is speech enhancement for human listening rather than machine listening. It removes noise, reverb, and distortion for perceptual quality — it is positioned for communication use cases like conferencing, telephony, and content creation.
It runs at 48 kHz, improves audio quality by 60%, and outperforms Krisp by up to 48% on perceptual quality metrics.

Tyto is Audio Insight — a lightweight model that runs on an audio file or stream and predicts whether the audio reaching a voice agent is likely to cause downstream failures.
It sits at the front of the voice pipeline, before VAD, ASR, or speech-to-speech, and outputs a single Tyto Risk Score that predicts the likelihood of failure across STT, VAD, turn-taking, and speech-to-speech models.
Lower scores indicate less problematic audio.

Beyond the main score, Tyto breaks the audio down across six dimensions, all on a 0–1 scale: Noise, Speaker Reverb, Speaker Loudness, Interfering Speech, Background Media, and Packet Loss.
For the degradation dimensions, a higher value means more severe degradation; Speaker Loudness is a level meter, not a degradation score.
It can be used in post-call analysis to surface which calls had degraded audio and why — replacing manual sampling — or in real-time streaming mode to let the agent stack respond dynamically to changing audio conditions mid-call.
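
As a rough illustration of the post-call use case, the sketch below ranks calls by risk score and names the most severe degradation dimension for each flagged call. It is not the SDK API: `TytoReport`, its field names, `flag_risky_calls`, and the `0.5` threshold are all hypothetical, and the sketch assumes the Tyto Risk Score is a number where lower is better, as described above.

```python
from dataclasses import dataclass

# Hypothetical container for one call's Tyto output. The field names are
# illustrative and NOT taken from the SDK; only the six dimension names and
# their 0-1 scale come from the text above.
@dataclass(frozen=True)
class TytoReport:
    call_id: str
    risk_score: float          # lower = less problematic audio
    noise: float
    speaker_reverb: float
    speaker_loudness: float    # level meter, not a degradation score
    interfering_speech: float
    background_media: float
    packet_loss: float

# Dimensions where a higher value means more severe degradation.
DEGRADATION_DIMENSIONS = (
    "noise",
    "speaker_reverb",
    "interfering_speech",
    "background_media",
    "packet_loss",
)

def dominant_degradation(report: TytoReport) -> str:
    """Return the name of the most severe degradation dimension."""
    return max(DEGRADATION_DIMENSIONS, key=lambda name: getattr(report, name))

def flag_risky_calls(reports, threshold=0.5):
    """Return (call_id, dominant dimension) pairs, riskiest call first.

    The default threshold is an arbitrary placeholder, not a recommended value.
    """
    risky = [r for r in reports if r.risk_score > threshold]
    risky.sort(key=lambda r: r.risk_score, reverse=True)
    return [(r.call_id, dominant_degradation(r)) for r in risky]
```

Speaker Loudness is deliberately left out of `DEGRADATION_DIMENSIONS`, since it is described as a level meter rather than a severity score.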

All models run on CPU with no GPU or ONNX dependency required.

## Performance and technical claims (from website)

* WER reduction: up to 43% relative reduction with Quail Voice Focus; up to 30% with Quail ASR Primer.
* VAD: outperforms Silero VAD in accuracy, balance, and reliability.
* Noise coverage: trained on 500+ noise types (stationary, non-stationary, impulsive).
* Room coverage: trained on over 1 million acoustic environments.
* Languages: language agnostic, supports 100+ languages.
* Deployment: processes millions of minutes weekly across 187 countries.
* On-premise deployment: available on all paid plans.
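
To make the "relative reduction" wording concrete, here is a minimal sketch of how a relative WER reduction is usually computed. The function name and the input numbers are illustrative assumptions, not measurements from the benchmarks.

```python
def relative_wer_reduction(baseline_wer: float, enhanced_wer: float) -> float:
    """Fraction by which WER drops relative to the baseline.

    Both inputs are WER values expressed as fractions (0.20 means 20% WER).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - enhanced_wer) / baseline_wer

# Made-up example: 20.0% WER without enhancement, 11.4% with it.
baseline, enhanced = 0.200, 0.114
relative = relative_wer_reduction(baseline, enhanced)
absolute_points = (baseline - enhanced) * 100
print(f"relative reduction: {relative:.0%}, absolute drop: {absolute_points:.1f} points")
```

A relative figure is measured against the baseline, so the same percentage corresponds to a larger absolute drop when the baseline WER is high and a smaller one when it is already low.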

## Customers and social proof

The website features the following case studies and customer quotes:

* PolyAI: Reduced false barge-ins by 40% and short-utterance failures by 30% across 2,000+ enterprise deployments in 75 languages.
  * Razvan Kusztos (VP of Engineering): "With ai-coustics on, we've had the best customer satisfaction scores of all the tests we've done."
* telli: Scaled to 5 million calls with enterprise-grade reliability, cutting the audio failures that cost 5-8x to escalate to a human.
* Synthesia: Achieved cleaner voice clones, stable speaker identity, and a simpler modeling pipeline.
  * Adam Froghyaria (Senior Research Engineer): "Voice cloning is highly sensitive to acoustic inconsistencies. Using ai-coustics to clean audio upstream simplifies modeling and keeps speaker identity stable."
* Elgato: Delivers studio-quality sound for millions of creators running entirely on CPU, no audio engineering required.
  * Stephan Nöthen (Principal Product Architect): "The adoption process was effortless. It was engineer to engineer on Slack. No bureaucracy. Just real conversations and fast progress."

## Developer Platform

The Developer Platform is at developers.ai-coustics.com and requires sign-up with a business/work email address. It is the starting point for any developer who wants to test or deploy ai-coustics models.
It contains:

* **Playground:** Developers can test models directly in the browser before integrating the SDK. Only Quail Voice Focus, VAD 2.0, and Tyto are testable in the playground; other models and sizes can be tested by running example code from the SDK.
* **SDK keys:** Developers generate and manage their SDK keys here. Keys are required to use the SDK in any environment.
* **SDK usage:** A basic view of SDK usage is available.
* **Plan and billing:** Subscriptions are managed through Stripe. The customer portal for billing, invoicing, and plan changes is at billing.stripe.com. Users can also upgrade their plan directly from the Developer Platform.

The website calls the Developer Platform "one dashboard to test models, generate SDK keys, and deploy."

## Pricing

Pricing is minute-based. A free trial is available — no credit card required to start.
Plans are:

* Startup: $149/month, 100,000 minutes included, $0.0015 per minute.
* Pro: $399/month, 300,000 minutes included, $0.00135 per minute.
* Business: $599/month, 500,000 minutes included, $0.0012 per minute.
* Enterprise: Custom pricing for 1,000,000+ minutes, with volume discounts, custom audio evaluations, priority access to new models, and a dedicated Slack channel.

All paid plans include real-time processing, on-premise deployment, language agnostic support, and Discord support.
If usage exceeds the included minutes, the SDK does not stop working — overage is billed at the per-minute rate for the plan.
Usage is measured by processing time: one minute of SDK usage equals one minute of audio processed.
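
As a rough sketch of how the plan figures above combine, the snippet below estimates a monthly bill. It assumes the listed per-minute rate applies only to minutes beyond the included allowance, which is how the overage sentence above reads. `Plan`, `PLANS`, and `estimate_monthly_cost` are illustrative names, not part of the SDK or the billing system, and the Enterprise plan is omitted because its pricing is custom.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Plan:
    monthly_fee: float       # USD per month
    included_minutes: int    # minutes covered by the monthly fee
    per_minute_rate: float   # USD per minute, assumed to apply to overage only

# Figures copied from the plan list above.
PLANS = {
    "Startup": Plan(149.0, 100_000, 0.0015),
    "Pro": Plan(399.0, 300_000, 0.00135),
    "Business": Plan(599.0, 500_000, 0.0012),
}

def estimate_monthly_cost(plan_name: str, minutes_processed: int) -> float:
    """Estimate one month's bill in USD for the given plan and usage."""
    plan = PLANS[plan_name]
    overage_minutes = max(0, minutes_processed - plan.included_minutes)
    return round(plan.monthly_fee + overage_minutes * plan.per_minute_rate, 2)

# Example: 150,000 minutes on Startup is 50,000 minutes over the allowance.
print(estimate_monthly_cost("Startup", 150_000))
```

Under this assumption, usage at or below the included minutes costs only the monthly fee, so comparing plans comes down to where the overage charges on a smaller plan exceed the fee difference to the next one.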

## Additional resources

Benchmarks are at [ai-coustics.com/benchmarks-quantitative](https://ai-coustics.com/benchmarks-quantitative). Code examples and SDK repos are on [GitHub](https://github.com/ai-coustics). Models and datasets are on Hugging Face.
