The enhancement_level parameter provides fine-grained control over foreground isolation. It accepts values from 0.0 to 1.0.
How it works
The model uses an internal probabilistic confidence estimate for its foreground isolation decisions. The enhancement_level parameter modulates how the model acts on this confidence signal:
- Lower values bias the model toward preserving ambiguous speech. Foreground speech is always kept, but more background leakage may pass through.
- Higher values shift the model toward stricter decisions under uncertainty. Competing speech and echo are attenuated more aggressively, but there is an increased risk of suppressing low-energy foreground speech.
Recommended Settings
| Value | Behavior | When to use |
|---|---|---|
| 0.5 (default) | Conservative. Foreground speech is always preserved. | When minimizing any risk of speech deletion is the top priority. |
| 0.8 | Balanced. Optimal word error rate on challenging data. | Best starting point for most Voice AI deployments. Slightly higher chance of over-suppression in edge cases. |
| 1.0 | Aggressive. Maximum suppression of interfering speech. | When reducing insertions from background speakers is critical. Higher risk of suppressing quiet foreground speech. |
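
The recommended values above can be kept in one place so that application code does not scatter magic numbers. The following is a minimal sketch: the preset names and the helper `resolve_enhancement_level` are hypothetical and not part of any Quail API; only the three values and the 0.0 to 1.0 range come from the table.

```python
# Hypothetical preset names; the values and the valid range are the
# only parts taken from the table above.
ENHANCEMENT_PRESETS = {
    "conservative": 0.5,  # documented default
    "balanced": 0.8,
    "aggressive": 1.0,
}


def resolve_enhancement_level(setting):
    """Return a float in [0.0, 1.0] from a preset name or a number."""
    if isinstance(setting, str):
        try:
            return ENHANCEMENT_PRESETS[setting.lower()]
        except KeyError:
            raise ValueError(f"unknown preset: {setting!r}") from None
    level = float(setting)
    # Reject out-of-range values instead of silently clamping them.
    if not 0.0 <= level <= 1.0:
        raise ValueError(f"enhancement_level must be in [0.0, 1.0], got {level}")
    return level
```

Rejecting out-of-range values, rather than clamping, is a design choice in this sketch: a mistyped value then fails loudly instead of quietly running at 1.0.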
Tune Your Input Gain
Unlike diarization-based systems, Quail Voice Focus is signal-based rather than speaker-based. It enhances whichever speech signal is dominant in the foreground, allowing multiple near-field speakers to be enhanced without locking onto a single voice. For optimal performance, the foreground speaker should typically fall within a level range of -35 to -10 LUFS (integrated) at the model input. We recommend tuning the input gain to satisfy this range. If the foreground speaker is too quiet, the model may classify it as background speech and suppress it.
Best Practices
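
To apply the -35 to -10 LUFS input range, the gain adjustment is simple arithmetic once the foreground speaker's integrated loudness is known. The sketch below assumes that value was measured with an external loudness meter (it does not compute LUFS itself) and that a constant gain of G dB shifts the integrated loudness by approximately G. The function names are hypothetical.

```python
# Range taken from the text; measured_lufs must come from an external
# loudness meter, since this sketch does not measure loudness.
MIN_LUFS = -35.0
MAX_LUFS = -10.0


def gain_db_to_reach_range(measured_lufs, target_lufs=None):
    """Return the gain in dB that moves measured_lufs into the range.

    With no target, the smallest change is used: 0 dB if the level is
    already inside the range, otherwise just enough to reach the
    nearest edge.
    """
    if target_lufs is None:
        target_lufs = min(max(measured_lufs, MIN_LUFS), MAX_LUFS)
    elif not MIN_LUFS <= target_lufs <= MAX_LUFS:
        raise ValueError("target_lufs is outside the recommended range")
    return target_lufs - measured_lufs


def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude multiplier."""
    return 10.0 ** (gain_db / 20.0)
```

For example, a speaker measured at -42 LUFS needs about +7 dB to reach the lower edge of the range, a linear factor of roughly 2.24. In practice it may be safer to aim a few dB inside the range rather than exactly at its edge.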
- Use Quail Voice Focus to isolate the main speaker. It provides the best foreground isolation for headset and handheld use cases, eliminating interfering background speech and noise.
- Tune per STT provider. Different engines respond differently to the same audio. Run evaluations with your specific STT model to find the optimal enhancement_level.
- Monitor both insertions and deletions. Increasing the enhancement level reduces false insertions from background speech but may increase deletions of quiet foreground speech. Find the right balance for your application.
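
To track insertions and deletions separately while sweeping enhancement_level, one option is a word-level edit-distance alignment between a reference transcript and the STT output. The sketch below is a generic implementation, not part of any Quail tooling. The function name is hypothetical, and it performs no text normalization (casing, punctuation), which a real evaluation would likely need.

```python
def count_errors(reference, hypothesis):
    """Return (substitutions, deletions, insertions) for two transcripts.

    Uses a standard word-level Levenshtein alignment. When several
    alignments share the minimum edit count, the split between error
    types follows the tie-breaking order diagonal, delete, insert.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    # cost[i][j] = (edits, subs, dels, ins) aligning ref[:i] with hyp[:j]
    cost = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        cost[i][0] = (i, 0, i, 0)  # all reference words deleted
    for j in range(1, len(hyp) + 1):
        cost[0][j] = (j, 0, 0, j)  # all hypothesis words inserted
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            e, s, d, n = cost[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (e, s, d, n)  # match, no new error
            else:
                diag = (e + 1, s + 1, d, n)  # substitution
            e, s, d, n = cost[i - 1][j]
            delete = (e + 1, s, d + 1, n)  # reference word missing
            e, s, d, n = cost[i][j - 1]
            insert = (e + 1, s, d, n + 1)  # extra hypothesis word
            cost[i][j] = min(diag, delete, insert, key=lambda c: c[0])
    _, subs, dels, ins = cost[len(ref)][len(hyp)]
    return subs, dels, ins
```

Running this for each candidate enhancement_level on the same evaluation set, with the same STT engine, gives one insertion count and one deletion count per level, which makes the trade-off described above directly visible.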