Batch Call Analysis with Tyto

Tyto predicts whether your call audio will cause failures in downstream Voice AI models, and why. In this tutorial you build a single Python script that analyzes every recording in a folder with the SDK and writes one JSON file, which you then explore visually in the call-analysis dashboard. The script extends the official analyze_file.py example from the Python SDK.

Get an SDK License

You can generate self-service SDK license keys on the developer platform.

These keys are configured to authorize with our backend and collect telemetry.

You also need uv installed — the script declares its own dependencies, so there is nothing else to set up.

Create the script

Save the following as analyze_calls.py. It downloads the tyto-l-16khz model on first run, analyzes every recording in a folder with Tyto’s 5-second window sliding in 1-second steps, and writes a single dashboard-ready JSON file.

analyze_calls.py

# /// script
# requires-python = ">=3.14"
# dependencies = [
#     "aic-sdk",
#     "numpy>=2.3.5",
#     "soundfile>=0.13.1",
# ]
# ///
"""Batch-analyze a folder of call recordings with Tyto and write a JSON
file for the call-analysis dashboard (call-analysis.ai-coustics.com).

Usage:
    uv run analyze_calls.py <folder> [output.json]
"""

import json
import os
import sys
from pathlib import Path

import numpy as np
import soundfile as sf

import aic_sdk as aic

MODEL = "tyto-l-16khz"
WINDOW_SECONDS = 5  # Tyto's fixed analysis window
STEP_SECONDS = 1  # hop between windows; 1 s gives smooth dashboard timelines
AUDIO_EXTENSIONS = {".wav", ".flac", ".mp3", ".ogg"}
DIMENSIONS = (
    "risk_score",
    "speaker_reverb",
    "speaker_loudness",
    "interfering_speech",
    "media_speech",
    "noise",
    "packet_loss",
)


def load_mono_audio(path: Path) -> tuple[np.ndarray, int]:
    """Load an audio file and mix it down to a mono float32 array."""
    audio, sample_rate = sf.read(path, dtype="float32")

    # audio is (frames,) for mono or (frames, channels) for multi-channel.
    if audio.ndim > 1:
        audio = audio.mean(axis=1)

    return np.ascontiguousarray(audio, dtype=np.float32), sample_rate


def analyze_file(analyzer: aic.FileAnalyzer, path: Path) -> dict | None:
    """Analyze one recording and return a dashboard entry for it."""
    samples, sample_rate = load_mono_audio(path)

    results = analyzer.analyze(samples, sample_rate, sample_rate * STEP_SECONDS)
    if not results:
        return None  # Shorter than one analysis window.

    return {
        "file": path.name,
        "duration_sec": round(len(samples) / sample_rate, 2),
        "frames": {
            dim: [round(getattr(r, dim), 4) for r in results] for dim in DIMENSIONS
        },
    }


def main():
    if len(sys.argv) < 2:
        sys.exit("usage: uv run analyze_calls.py <folder> [output.json]")

    folder = Path(sys.argv[1])
    output_path = Path(sys.argv[2]) if len(sys.argv) > 2 else Path("analysis.json")
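    # Fail with a readable message instead of a KeyError traceback.
    license_key = os.environ.get("AIC_SDK_LICENSE")
    if not license_key:
        sys.exit("Set the AIC_SDK_LICENSE environment variable to your SDK license key")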

    audio_files = sorted(
        p for p in folder.iterdir() if p.suffix.lower() in AUDIO_EXTENSIONS
    )
    if not audio_files:
        sys.exit(f"No audio files found in {folder}")

    # Download and load the analysis model, then reuse one analyzer for all files.
    model_path = aic.Model.download(MODEL, Path("./models"))
    model = aic.Model.from_file(model_path)
    analyzer = aic.FileAnalyzer(model, license_key)
    print(f"Model loaded from {model_path}")

    calls = []
    for index, path in enumerate(audio_files, start=1):
        try:
            call = analyze_file(analyzer, path)
        except Exception as error:
            print(f"[{index}/{len(audio_files)}] {path.name}: skipped ({error})")
            continue

        if call is None:
            print(
                f"[{index}/{len(audio_files)}] {path.name}: skipped "
                f"(shorter than one {WINDOW_SECONDS} s window)"
            )
            continue

        risk = call["frames"]["risk_score"]
        print(
            f"[{index}/{len(audio_files)}] {path.name}: "
            f"{len(risk)} window(s), mean risk {sum(risk) / len(risk):.2f}"
        )
        calls.append(call)

    output_path.write_text(json.dumps({"model": "Tyto", "calls": calls}, indent=2))
    print(f"\nWrote {len(calls)} call(s) to {output_path}")
    print("Upload it at https://call-analysis.ai-coustics.com/")


if __name__ == "__main__":
    main()

Supported formats are WAV, FLAC, MP3 and OGG. Multi-channel recordings are mixed down to mono, and any sample rate works — the analyzer resamples internally.

Run it on your recordings

Point the script at a folder of recordings:

export AIC_SDK_LICENSE="your-license-key"
uv run analyze_calls.py recordings/ analysis.json

Output

Model loaded from models/tyto_l_16khz_yhlek4hc_v43.aicmodel
[1/4] rec_0001.wav: 18 window(s), mean risk 0.23
[2/4] rec_0002.wav: 2 window(s), mean risk 0.60
[3/4] rec_0003.wav: 3 window(s), mean risk 0.20
[4/4] rec_0004.wav: 13 window(s), mean risk 0.30

Wrote 4 call(s) to analysis.json
Upload it at https://call-analysis.ai-coustics.com/

The first run downloads the model (≈20 MB) into ./models; subsequent runs reuse it.

Tyto operates on fixed 5-second windows and emits one score set per window. The script slides that window in 1-second steps so the dashboard timeline stays smooth. Recordings shorter than 5 seconds carry too little context for a meaningful score and are skipped with a warning.
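The windowing arithmetic above can be sketched as follows. This is a minimal illustration, assuming the analyzer only emits full windows and does not pad a trailing partial window; the helper name expected_windows is hypothetical and not part of the SDK.

```python
import math

WINDOW_SECONDS = 5  # Tyto's fixed analysis window (from the tutorial)
STEP_SECONDS = 1    # hop between windows used by the script


def expected_windows(duration_sec: float,
                     window: float = WINDOW_SECONDS,
                     step: float = STEP_SECONDS) -> int:
    """Count full windows that fit in a recording.

    Assumption: only complete windows are scored, so a recording shorter
    than one window yields zero results.
    """
    if duration_sec < window:
        return 0
    return math.floor((duration_sec - window) / step) + 1


if __name__ == "__main__":
    print(expected_windows(22.0))  # 18
    print(expected_windows(5.0))   # 1
    print(expected_windows(4.9))   # 0
```

Under this assumption, a smaller step only increases the number of score sets per recording; the minimum usable length stays at one full window.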

Upload to the dashboard

Open call-analysis.ai-coustics.com, click Load data, and drop analysis.json on the Analysis JSON zone. Optionally add the folder of recordings as the Audio folder: files are matched to calls by filename, so you can listen while reviewing scores. Without audio, the player uses an animated playhead instead.
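Because audio is matched to calls by filename, it can help to check for mismatches before uploading. The sketch below assumes matching uses the exact filename including the extension, which the dashboard may or may not do; compare_names and load_names are hypothetical helpers, and only the "calls" and "file" keys come from the script's output.

```python
import json
from pathlib import Path


def compare_names(call_files: set[str], audio_files: set[str]) -> tuple[set[str], set[str]]:
    """Return (calls without audio, audio without a call).

    Assumption: names must match exactly, extension included.
    """
    return call_files - audio_files, audio_files - call_files


def load_names(analysis_path: Path, audio_folder: Path) -> tuple[set[str], set[str]]:
    """Collect filenames from analysis.json and from the audio folder."""
    calls = json.loads(analysis_path.read_text())["calls"]
    call_files = {call["file"] for call in calls}
    audio_files = {p.name for p in audio_folder.iterdir() if p.is_file()}
    return call_files, audio_files


if __name__ == "__main__":
    analysis, folder = Path("analysis.json"), Path("recordings")
    if analysis.exists() and folder.is_dir():
        missing_audio, extra_audio = compare_names(*load_names(analysis, folder))
        print("Calls without audio:", sorted(missing_audio))
        print("Audio without a call:", sorted(extra_audio))
```

Files that the script skipped (for example, recordings shorter than one window) will show up as audio without a call, which is expected.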

Read the results

Each row is one recording. The table shows the average of each score array, plus p95 and % degraded (the fraction of windows in the Warn band or above) for triage, and the Driver (the dimension that contributed most to the risk).

The Tyto Risk Score is bucketed into indicative bands:

Band	Range	Reading
🟢 Good	< 0.35	No meaningful degradation; downstream models should be unaffected
🟡 Warn	0.35 - 0.60	Noticeable degradation; expect elevated error rates
🔴 Bad	> 0.60	Severe degradation; downstream failure likely; flag the call/intervene
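The per-call statistics and bands can be reproduced offline from a risk_score array. This is a rough sketch, not the dashboard's implementation: it assumes a nearest-rank p95 and treats 0.35 and 0.60 as inclusive bounds of the Warn band, as the table suggests. The function names are hypothetical.

```python
import math

WARN_THRESHOLD = 0.35  # lower bound of the Warn band (from the table)
BAD_THRESHOLD = 0.60   # scores above this fall in the Bad band


def band(score: float) -> str:
    """Map a risk score to its indicative band."""
    if score < WARN_THRESHOLD:
        return "Good"
    if score <= BAD_THRESHOLD:
        return "Warn"
    return "Bad"


def p95_nearest_rank(values: list[float]) -> float:
    """95th percentile by the nearest-rank method (an assumption)."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def summarize(risk_scores: list[float]) -> dict:
    """Mean, p95, and percentage of windows in the Warn band or above."""
    degraded = sum(1 for score in risk_scores if score >= WARN_THRESHOLD)
    return {
        "mean": sum(risk_scores) / len(risk_scores),
        "p95": p95_nearest_rank(risk_scores),
        "pct_degraded": 100 * degraded / len(risk_scores),
    }


if __name__ == "__main__":
    print(summarize([0.10, 0.20, 0.40, 0.70]))
    print(band(0.20), band(0.35), band(0.61))  # Good Warn Bad
```

Values computed this way may differ slightly from the dashboard if it interpolates percentiles or rounds differently.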

Keep in mind that speaker_loudness is a neutral level meter, not a degradation score, i.e. high values are usually fine.

A simple triage workflow: sort by risk score descending, review the top N, and group flagged calls by their worst dimension. See aggregating over calls for more strategies.
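The triage workflow can be sketched directly against analysis.json. The example below assumes the "worst dimension" is the degradation dimension with the highest mean score, which may differ from how the dashboard computes its Driver, and it assumes higher values mean worse audio on those dimensions. It leaves out speaker_loudness because that is a level meter. The triage function is a hypothetical helper.

```python
import json
from collections import defaultdict
from pathlib import Path

# Dimensions treated as degradation scores (assumption: higher is worse).
DEGRADATION_DIMENSIONS = (
    "speaker_reverb",
    "interfering_speech",
    "media_speech",
    "noise",
    "packet_loss",
)


def mean(values: list[float]) -> float:
    return sum(values) / len(values)


def triage(calls: list[dict], top_n: int = 10) -> dict[str, list[str]]:
    """Rank calls by mean risk, keep the top N, group them by worst dimension."""
    ranked = sorted(
        calls, key=lambda call: mean(call["frames"]["risk_score"]), reverse=True
    )[:top_n]
    groups: dict[str, list[str]] = defaultdict(list)
    for call in ranked:
        worst = max(
            DEGRADATION_DIMENSIONS, key=lambda dim: mean(call["frames"][dim])
        )
        groups[worst].append(call["file"])
    return dict(groups)


if __name__ == "__main__":
    path = Path("analysis.json")
    if path.exists():
        calls = json.loads(path.read_text())["calls"]
        for dimension, files in triage(calls, top_n=10).items():
            print(f"{dimension}: {', '.join(files)}")
```

Ranking by mean risk is only one option; ranking by p95 or by the share of degraded windows would surface calls with short but severe problems.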

Find out more

Tyto: Audio Insight

What Tyto measures, how to interpret each dimension, and real-time usage.

SDK Quickstart

Real-time speech enhancement with the SDK in your preferred language.

Developer Platform

Generate SDK license keys and explore the SDK playground.

Python SDK Examples

More examples, including real-time analysis on live streams.