Tyto predicts whether your call audio will cause failures in downstream Voice AI models, and why. In this tutorial you build a single Python script that analyzes every recording in a folder with the SDK, writes one JSON file, and lets you explore the results visually in the call-analysis dashboard. The script extends the official analyze_file.py example from the Python SDK.
1

Get an SDK License

You can generate self-service SDK keys on the developer platform.
These keys are configured to authorize with our backend and collect telemetry.
You also need uv installed — the script declares its own dependencies, so there is nothing else to set up.
2

Create the script

Save the following as analyze_calls.py. It downloads the tyto-l-16khz model on first run, analyzes every recording in a folder with Tyto’s 5-second window sliding in 1-second steps, and writes a single dashboard-ready JSON file.
analyze_calls.py
# /// script
# requires-python = ">=3.14"
# dependencies = [
#     "aic-sdk",
#     "numpy>=2.3.5",
#     "soundfile>=0.13.1",
# ]
# ///
"""Batch-analyze a folder of call recordings with Tyto and write a JSON
file for the call-analysis dashboard (call-analysis.ai-coustics.com).

Usage:
    uv run analyze_calls.py <folder> [output.json]
"""

import json
import os
import sys
from pathlib import Path

import numpy as np
import soundfile as sf

import aic_sdk as aic

MODEL = "tyto-l-16khz"
WINDOW_SECONDS = 5  # Tyto's fixed analysis window
STEP_SECONDS = 1  # hop between windows; 1 s gives smooth dashboard timelines
AUDIO_EXTENSIONS = {".wav", ".flac", ".mp3", ".ogg"}
DIMENSIONS = (
    "risk_score",
    "speaker_reverb",
    "speaker_loudness",
    "interfering_speech",
    "media_speech",
    "noise",
    "packet_loss",
)


def load_mono_audio(path: Path) -> tuple[np.ndarray, int]:
    """Load an audio file and mix it down to a mono float32 array."""
    audio, sample_rate = sf.read(path, dtype="float32")

    # audio is (frames,) for mono or (frames, channels) for multi-channel.
    if audio.ndim > 1:
        audio = audio.mean(axis=1)

    return np.ascontiguousarray(audio, dtype=np.float32), sample_rate


def analyze_file(analyzer: aic.FileAnalyzer, path: Path) -> dict | None:
    """Analyze one recording and return a dashboard entry for it."""
    samples, sample_rate = load_mono_audio(path)

    results = analyzer.analyze(samples, sample_rate, sample_rate * STEP_SECONDS)
    if not results:
        return None  # Shorter than one analysis window.

    return {
        "file": path.name,
        "duration_sec": round(len(samples) / sample_rate, 2),
        "frames": {
            dim: [round(getattr(r, dim), 4) for r in results] for dim in DIMENSIONS
        },
    }


def main():
    if len(sys.argv) < 2:
        sys.exit("usage: uv run analyze_calls.py <folder> [output.json]")

    folder = Path(sys.argv[1])
    output_path = Path(sys.argv[2]) if len(sys.argv) > 2 else Path("analysis.json")
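    # Fail with a readable message instead of a KeyError traceback.
    license_key = os.environ.get("AIC_SDK_LICENSE")
    if not license_key:
        sys.exit("Set AIC_SDK_LICENSE to your SDK license key before running")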

    audio_files = sorted(
        p for p in folder.iterdir() if p.suffix.lower() in AUDIO_EXTENSIONS
    )
    if not audio_files:
        sys.exit(f"No audio files found in {folder}")

    # Download and load the analysis model, then reuse one analyzer for all files.
    model_path = aic.Model.download(MODEL, Path("./models"))
    model = aic.Model.from_file(model_path)
    analyzer = aic.FileAnalyzer(model, license_key)
    print(f"Model loaded from {model_path}")

    calls = []
    for index, path in enumerate(audio_files, start=1):
        try:
            call = analyze_file(analyzer, path)
        except Exception as error:
            print(f"[{index}/{len(audio_files)}] {path.name}: skipped ({error})")
            continue

        if call is None:
            print(
                f"[{index}/{len(audio_files)}] {path.name}: skipped "
                f"(shorter than one {WINDOW_SECONDS} s window)"
            )
            continue

        risk = call["frames"]["risk_score"]
        print(
            f"[{index}/{len(audio_files)}] {path.name}: "
            f"{len(risk)} window(s), mean risk {sum(risk) / len(risk):.2f}"
        )
        calls.append(call)

    output_path.write_text(json.dumps({"model": "Tyto", "calls": calls}, indent=2))
    print(f"\nWrote {len(calls)} call(s) to {output_path}")
    print("Upload it at https://call-analysis.ai-coustics.com/")


if __name__ == "__main__":
    main()
Supported formats are WAV, FLAC, MP3 and OGG. Multi-channel recordings are mixed down to mono, and any sample rate works — the analyzer resamples internally.
3

Run it on your recordings

Point the script at a folder of recordings:
export AIC_SDK_LICENSE="your-license-key"
uv run analyze_calls.py recordings/ analysis.json
Output
Model loaded from models/tyto_l_16khz_yhlek4hc_v43.aicmodel
[1/4] rec_0001.wav: 18 window(s), mean risk 0.23
[2/4] rec_0002.wav: 2 window(s), mean risk 0.60
[3/4] rec_0003.wav: 3 window(s), mean risk 0.20
[4/4] rec_0004.wav: 13 window(s), mean risk 0.30

Wrote 4 call(s) to analysis.json
Upload it at https://call-analysis.ai-coustics.com/
The first run downloads the model (≈20 MB) into ./models; subsequent runs reuse it.
Tyto operates on fixed 5-second windows and emits one score set per window. The script slides that window in 1-second steps so the dashboard timeline stays smooth. Recordings shorter than 5 seconds carry too little context for a meaningful score and are skipped with a warning.
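The relationship between recording length and the number of score sets can be sketched as follows. This is a minimal illustration, not SDK code: `window_starts` is a hypothetical helper, and it assumes that only full 5-second windows are scored and that a trailing partial window is dropped rather than padded. The analyzer's actual edge handling may differ.

```python
def window_starts(
    duration_sec: float, window_sec: float = 5.0, step_sec: float = 1.0
) -> list[float]:
    """Return the start time (in seconds) of every full window that fits.

    Assumption: a window is only scored if it fits entirely inside the
    recording, so anything shorter than one window yields no results.
    """
    if duration_sec < window_sec:
        return []
    count = int((duration_sec - window_sec) // step_sec) + 1
    return [i * step_sec for i in range(count)]


for duration in (3.0, 5.0, 22.0):
    starts = window_starts(duration)
    print(f"{duration:.1f} s recording: {len(starts)} window(s)")
```

Under these assumptions a 3-second recording yields no windows, a 5-second recording yields exactly one, and each additional second adds one more.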
4

Upload to the dashboard

Open call-analysis.ai-coustics.com, click Load data, and drop analysis.json on the Analysis JSON zone.
Optionally add the folder of recordings as the Audio folder; recordings are matched to calls by filename so you can listen while reviewing scores. Without audio, the player uses an animated playhead instead.
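Because audio is matched by filename, it can help to check before uploading that every call in the JSON has a recording in the folder you plan to attach. The sketch below uses a hypothetical helper, `unmatched_calls`, and assumes the match is an exact, case-sensitive comparison against the `file` field written by the script. The dashboard's precise matching rule is not specified here.

```python
import json
from pathlib import Path


def unmatched_calls(analysis: dict, audio_folder: Path) -> list[str]:
    """List the calls whose "file" entry has no same-named file in the folder.

    Assumption: matching is exact and case-sensitive on the bare filename.
    """
    available = {p.name for p in audio_folder.iterdir() if p.is_file()}
    return [call["file"] for call in analysis["calls"] if call["file"] not in available]


if __name__ == "__main__":
    json_path, folder = Path("analysis.json"), Path("recordings")
    if json_path.exists() and folder.is_dir():
        missing = unmatched_calls(json.loads(json_path.read_text()), folder)
        print(f"{len(missing)} call(s) without audio: {missing}")
```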
5

Read the results

Each row is one recording. The table shows the average of each score array, plus p95 and % degraded (the fraction of windows in the Warn band or above) for triage, and the Driver, which is the dimension that contributed most to the risk.
The Tyto Risk Score is bucketed into indicative bands:
| Band | Range | Reading |
| --- | --- | --- |
| 🟢 Good | < 0.35 | No meaningful degradation; downstream models should be unaffected |
| 🟡 Warn | 0.35 - 0.60 | Noticeable degradation; expect elevated error rates |
| 🔴 Bad | > 0.60 | Severe degradation; downstream failure likely; flag the call or intervene |
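The bands and the per-call summary columns can be approximated offline from a call's `risk_score` array. The following is a rough sketch under stated assumptions: the thresholds come from the table above, `band`, `percentile`, and `summarize` are hypothetical helper names, and the nearest-rank percentile is only one of several common definitions, so the dashboard's p95 may differ slightly.

```python
import math

WARN_THRESHOLD = 0.35  # lower edge of the Warn band
BAD_THRESHOLD = 0.60  # scores above this fall in the Bad band


def band(score: float) -> str:
    """Map a risk score to its indicative band."""
    if score < WARN_THRESHOLD:
        return "Good"
    if score <= BAD_THRESHOLD:
        return "Warn"
    return "Bad"


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile (an assumption; other definitions interpolate)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(risk: list[float]) -> dict:
    """Summarize one call's per-window risk scores for triage."""
    mean_risk = sum(risk) / len(risk)
    degraded = sum(1 for score in risk if score >= WARN_THRESHOLD)
    return {
        "mean": round(mean_risk, 4),
        "p95": percentile(risk, 95),
        "pct_degraded": round(100 * degraded / len(risk), 1),
        "band": band(mean_risk),
    }


print(summarize([0.10, 0.20, 0.50, 0.80]))
```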
Keep in mind that speaker_loudness is a neutral level meter, not a degradation score, so high values are usually fine.
A simple triage workflow: sort by risk score descending, review the top N, and group flagged calls by their worst dimension. See aggregating over calls for more strategies.
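The triage workflow can also be run directly on analysis.json. This is a minimal sketch, assuming the JSON layout written by the script above. The `triage` helper is hypothetical, and "worst dimension" is taken here to mean the degradation dimension with the highest mean score, excluding `speaker_loudness` because it is a level meter. The dashboard's Driver may be computed differently.

```python
import json
from collections import defaultdict
from pathlib import Path

# Degradation dimensions only; speaker_loudness is deliberately left out.
DEGRADATION_DIMENSIONS = (
    "speaker_reverb",
    "interfering_speech",
    "media_speech",
    "noise",
    "packet_loss",
)


def mean(values: list[float]) -> float:
    return sum(values) / len(values)


def triage(calls: list[dict], top_n: int = 10) -> dict[str, list[str]]:
    """Group the top_n riskiest calls by their highest-scoring dimension."""
    ranked = sorted(
        calls, key=lambda call: mean(call["frames"]["risk_score"]), reverse=True
    )
    groups: dict[str, list[str]] = defaultdict(list)
    for call in ranked[:top_n]:
        frames = call["frames"]
        worst = max(DEGRADATION_DIMENSIONS, key=lambda dim: mean(frames[dim]))
        groups[worst].append(call["file"])
    return dict(groups)


if __name__ == "__main__":
    path = Path("analysis.json")
    if path.exists():
        calls = json.loads(path.read_text())["calls"]
        for dimension, files in triage(calls).items():
            print(f"{dimension}: {', '.join(files)}")
```

Grouping by dimension makes it easier to tell whether the flagged calls share one cause, such as noise, or several unrelated ones.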
6

Find out more

Tyto: Audio Insight

What Tyto measures, how to interpret each dimension, and real-time usage.

SDK Quickstart

Real-time speech enhancement with the SDK in your preferred language.

Developer Platform

Generate SDK license keys and explore the SDK playground.

Python SDK Examples

More examples, including real-time analysis on live streams.