Enhancement Parameters - ai-coustics Docs

Choose parameters by use case

Podcasts & interviews
YouTube & streaming video
Voice AI & telephony
Archives & restoration

Podcasts & interviews
Model: LARK_V2 for a polished studio sound; use FINCH (Finch 2) for natural isolation when ambience matters.
Loudness: -16 LUFS for stereo or -19 LUFS for mono (common podcast targets).
True peak: -1 dBTP (or -2 dBTP if exporting MP3/AAC to avoid intersample peaks).
Enhancement level: 80–90 for strong cleanup that keeps natural room tone; 100 for a fully cleaned voice.
Transcode: WAV as master; MP3 for distribution.
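For podcasts, two choices drive the settings: channel count sets the loudness target, and the delivery format sets the true peak and transcode. A minimal sketch, assuming a hypothetical helper `podcast_enhancement` and a level of 85 (one value inside the 80–90 range). The JSON keys match the preset example on this page.

```shell
# Hypothetical helper: builds the podcast "enhancement" JSON from the guidance above.
# Usage: podcast_enhancement CHANNELS FORMAT   (CHANNELS: mono|stereo, FORMAT: wav|mp3)
# Assumption: enhancement_level 85, picked from the 80-90 range.
podcast_enhancement() {
  local loudness peak transcode
  case $1 in
    mono)   loudness=-19 ;;
    stereo) loudness=-16 ;;
    *) echo "channels must be mono or stereo: $1" >&2; return 1 ;;
  esac
  case $2 in
    wav) peak=-1; transcode=WAV ;;   # WAV master
    mp3) peak=-2; transcode=MP3 ;;   # lossy distribution copy: extra room for intersample peaks
    *) echo "format must be wav or mp3: $2" >&2; return 1 ;;
  esac
  printf '{ "enhancement_model": "LARK_V2", "enhancement_level": 85, "loudness_target": %s, "true_peak": %s, "transcode": "%s" }\n' \
    "$loudness" "$peak" "$transcode"
}
```

The output can be passed to the upload request as `-F enhancement="$(podcast_enhancement mono mp3)"`.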

YouTube & streaming video
Model: LARK_V2 for fuller presence; FINCH (Finch 2) for heavy noise/reverb scenes.
Loudness: -14 LUFS (typical platform normalization).
True peak: -1 dBTP.
Enhancement level: 70–90 to retain some environment; 100 for voice‑only content.
Transcode: WAV for the final mix; the platform handles compression.
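A YouTube or streaming upload differs from the podcast preset mainly in the −14 LUFS target. The sketch below wraps the request in a hypothetical `enhance_youtube` function. The endpoint, `X-API-Key` header and `file`/`enhancement` form fields are copied from the preset example on this page; the default level of 80 and the `DRY_RUN` switch are assumptions for illustration.

```shell
# Hypothetical helper: uploads FILE with the YouTube/streaming settings above.
# Usage: enhance_youtube FILE [LEVEL]
#   LEVEL defaults to 80 (assumed, inside 70-90); pass 100 for voice-only content.
# Set DRY_RUN=1 to print the enhancement JSON instead of calling the API.
# Swap LARK_V2 for FINCH on heavy noise/reverb scenes.
enhance_youtube() {
  local file="$1" level="${2:-80}" json
  json=$(printf '{ "enhancement_model": "LARK_V2", "enhancement_level": %s, "loudness_target": -14, "true_peak": -1, "transcode": "WAV" }' "$level")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$json"
    return 0
  fi
  curl -sS -X POST "https://api.ai-coustics.io/v2/medias" \
    -H "X-API-Key: $AICOUSTICS_API_KEY" \
    -F file=@"$file" \
    -F enhancement="$json"
}
```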

Voice AI & telephony
Model: FINCH (Finch 2) for robust noise/reverb removal while preserving identity.
Loudness: -16 LUFS is a good baseline for ASR; keep it consistent across turns.
True peak: -2 dBTP if downstream codecs are lossy or bandwidth‑limited.
Enhancement level: 70–85 to avoid over‑processing artifacts in ASR pipelines.
Transcode: WAV preferred; transcode downstream if needed.
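In a voice AI pipeline, the simplest way to keep loudness consistent across turns is to define the preset once and reuse it for every file. A sketch, assuming a hypothetical `enhance_turns` helper, a level of 80 (inside 70–85) and a lossy downstream codec (hence −2 dBTP). Request details mirror the preset example on this page; `DRY_RUN` is an illustrative switch.

```shell
# One fixed preset for every turn (assumed values: level 80, -2 dBTP for a lossy
# downstream codec; use -1 if the downstream path is lossless).
VOICE_AI_PRESET='{ "enhancement_model": "FINCH", "enhancement_level": 80, "loudness_target": -16, "true_peak": -2, "transcode": "WAV" }'

# Hypothetical helper: enhances each file given as an argument with the same preset.
# Set DRY_RUN=1 to print "file<TAB>json" lines instead of calling the API.
enhance_turns() {
  local turn
  for turn in "$@"; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      printf '%s\t%s\n' "$turn" "$VOICE_AI_PRESET"
      continue
    fi
    curl -sS -X POST "https://api.ai-coustics.io/v2/medias" \
      -H "X-API-Key: $AICOUSTICS_API_KEY" \
      -F file=@"$turn" \
      -F enhancement="$VOICE_AI_PRESET" || return 1
  done
}
```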

Archives & restoration
Model: LARK_V2 to reconstruct bandwidth and repair compression.
Loudness: -16 to -14 LUFS for modern listening; use -23 LUFS for EBU R128 compliance.
True peak: -1 dBTP.
Enhancement level: Start at 80–90, audition, and increase toward 100 for severely degraded sources.
Transcode: WAV only for archival masters.
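The archival workflow (start at 80–90, audition, move toward 100) can be scripted as a ladder of presets to render and compare by ear. A sketch, assuming a hypothetical `audition_levels` helper with a default step of 5 and a −16 LUFS target chosen from the −16 to −14 range.

```shell
# Hypothetical helper: prints enhancement levels from START up to 100.
# Usage: audition_levels START [STEP]   (STEP defaults to 5, an assumption)
audition_levels() {
  local level="$1" step="${2:-5}"
  while [ "$level" -lt 100 ]; do
    echo "$level"
    level=$((level + step))
  done
  echo 100   # always finish with the fully cleaned version
}

# One archival preset per level; each line can be sent as -F enhancement='...'
# and the rendered WAV masters compared side by side.
for level in $(audition_levels 85); do
  printf '{ "enhancement_model": "LARK_V2", "enhancement_level": %s, "loudness_target": -16, "true_peak": -1, "transcode": "WAV" }\n' "$level"
done
```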

Platform loudness targets (guidance)

Podcasts: −16 LUFS (stereo), −19 LUFS (mono); true peak −1 dBTP recommended.
YouTube / streaming: around −14 LUFS; true peak −1 dBTP.
Music platforms (Spotify/Apple Music): typically normalize to around −14 LUFS; for speech‑first content, −16 to −14 LUFS works well.
Broadcast (EBU R128): −23 LUFS with gating; true peak −1 dBTP. Use this when delivering to broadcast specs.

Loudness policies change over time. Treat these as working targets and verify with each platform’s latest guidance.
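Keeping these targets in one lookup helps every upload use the same numbers. A sketch; `platform_loudness` and its platform keys are hypothetical, and the values are the working targets listed above, to be re-checked against each platform's current policy.

```shell
# Hypothetical lookup: prints the working loudness target (LUFS) for a platform.
# Usage: platform_loudness PLATFORM [mono|stereo]   (channel count only matters for podcasts)
platform_loudness() {
  case $1 in
    podcast)
      if [ "${2:-stereo}" = mono ]; then printf '%s\n' -19; else printf '%s\n' -16; fi ;;
    youtube | streaming | music) printf '%s\n' -14 ;;
    broadcast | ebu-r128) printf '%s\n' -23 ;;   # EBU R128 delivery
    *) echo "unknown platform: $1" >&2; return 1 ;;
  esac
}
```

For example, `"loudness_target": $(platform_loudness broadcast)` yields −23 for a broadcast delivery.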

Peak loudness and headroom

Set true peak to −1 dBTP by default.
Prefer −2 dBTP when exporting to MP3/AAC to reduce intersample clipping risk.
Leave at least 1 dB headroom if further mastering or loudness‑normalization is expected downstream.
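The headroom rules reduce to one comparison: the true peak should sit at or below −1 dBTP, or −2 dBTP when exporting to MP3/AAC. A sketch of a pre-upload check, assuming a hypothetical `check_true_peak` helper.

```shell
# Hypothetical check: warns and returns 1 if TRUE_PEAK leaves less headroom than
# recommended for the export FORMAT (wav, mp3, aac, ...).
# Usage: check_true_peak TRUE_PEAK FORMAT
check_true_peak() {
  local peak="$1" format="$2" limit=-1
  case $format in
    mp3 | MP3 | aac | AAC) limit=-2 ;;   # lossy codecs: reduce intersample clipping risk
  esac
  if [ "$peak" -gt "$limit" ]; then
    echo "true_peak $peak dBTP exceeds the recommended $limit dBTP for $format" >&2
    return 1
  fi
  return 0
}
```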

Enhancement level guide

40–60: Subtle cleanup; preserves environment and room tone.
70–90: Strong cleanup while retaining some ambience. Good general‑purpose range.
100: Fully cleaned, voice‑forward result with minimal environment.

If the voice sounds “over‑processed” or brittle, reduce enhancement level or switch from LARK_V2 to FINCH (Finch 2) for more natural isolation.
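The bands above leave gaps (1–39, 61–69 and 91–99), so a helper that maps a level to its band should report values outside the documented bands rather than guess. A sketch, assuming a hypothetical `describe_level` function.

```shell
# Hypothetical helper: describes an enhancement_level using the guide's bands.
# Usage: describe_level LEVEL
describe_level() {
  local level="$1"
  case $level in
    '' | *[!0-9]*) echo "not a positive integer: $level" >&2; return 1 ;;
  esac
  if [ "$level" -lt 1 ] || [ "$level" -gt 100 ]; then
    echo "out of range (valid: 1-100): $level" >&2; return 1
  elif [ "$level" -eq 100 ]; then
    echo "fully cleaned, voice-forward"
  elif [ "$level" -ge 70 ] && [ "$level" -le 90 ]; then
    echo "strong cleanup, some ambience"
  elif [ "$level" -ge 40 ] && [ "$level" -le 60 ]; then
    echo "subtle cleanup, room tone preserved"
  else
    echo "between documented bands; audition"
  fi
}
```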

Preset examples

curl -sS -X POST "https://api.ai-coustics.io/v2/medias" \
  -H "X-API-Key: $AICOUSTICS_API_KEY" \
  -F file=@"./episode.wav" \
  -F enhancement='{ "enhancement_model": "LARK_V2", "enhancement_level": 85, "loudness_target": -16, "true_peak": -1, "transcode": "WAV" }'

Validation rules

loudness_target: integer, -70 to -5 (LUFS)
true_peak: integer, -9 to 0 (dBTP)
enhancement_level: integer, 1 to 100
enhancement_model: FINCH | LARK | LARK_V2
transcode: MP3 | WAV (optional)
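Checking a preset against these rules before uploading catches mistakes locally. A sketch in POSIX shell, assuming a hypothetical `validate_enhancement` helper; it mirrors only the ranges and values listed above, not the API's actual error responses.

```shell
# Succeeds if $1 is an integer, optionally negative.
is_int() {
  case $1 in
    '' | - | *[!0-9-]* | ?*-*) return 1 ;;
  esac
  return 0
}

# Succeeds if integer $1 lies in the inclusive range [$2, $3].
in_range() {
  is_int "$1" && [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

# Hypothetical helper: validate_enhancement MODEL LEVEL LOUDNESS TRUE_PEAK [TRANSCODE]
# Prints the first violation to stderr and returns 1; returns 0 if all checks pass.
validate_enhancement() {
  case $1 in
    FINCH | LARK | LARK_V2) ;;
    *) echo "enhancement_model must be FINCH, LARK or LARK_V2: $1" >&2; return 1 ;;
  esac
  in_range "$2" 1 100 || { echo "enhancement_level must be an integer in 1..100: $2" >&2; return 1; }
  in_range "$3" -70 -5 || { echo "loudness_target must be an integer in -70..-5 LUFS: $3" >&2; return 1; }
  in_range "$4" -9 0 || { echo "true_peak must be an integer in -9..0 dBTP: $4" >&2; return 1; }
  case ${5-} in
    '' | MP3 | WAV) ;;   # transcode is optional
    *) echo "transcode must be MP3 or WAV: $5" >&2; return 1 ;;
  esac
  return 0
}
```

For example, `validate_enhancement LARK_V2 85 -16 -1 WAV` passes, while a level of `85.5` or a loudness target of `-4` is rejected.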
