Efficient speaker diarization for identifying who spoke when in audio files.
Pitch

Diarize is a Python package for speaker diarization: determining 'who spoke when' in an audio file. It runs on CPU only, with no GPU or API key required, and reaches a Diarization Error Rate of ~10.8% on VoxConverse while processing audio ~8x faster than real time.

Description

Diarize performs speaker diarization from Python, labelling each stretch of speech in an audio file with the speaker who produced it. It runs entirely on CPU, needs neither a GPU nor an API key, and is open source under the Apache 2.0 license.

Key Features:

  • Performance: Achieves approximately 10.8% Diarization Error Rate (DER) on the VoxConverse benchmark, lower than the ~11.2% of the free pyannote community-1 model, though above the ~8.5% of the commercial pyannote precision-2.
  • Speed: Processes audio roughly 8 times faster than real time on CPU.
  • Automatic Speaker Detection: Automatically identifies the number of speakers in the audio, simplifying the diarization process.

Installation:

Diarize can be easily installed via pip:

pip install diarize

Usage:

Pass the path of an audio file to diarize and iterate over the returned segments:

from diarize import diarize

result = diarize("meeting.wav")
for seg in result.segments:
    print(f"  [{seg.start:.1f}s - {seg.end:.1f}s] {seg.speaker}")

This will output the detected speakers and the time segments during which they spoke.
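A common next step is to total how long each speaker talked. The sketch below relies only on the `start`, `end`, and `speaker` attributes shown above. The `speaking_time` helper, the `Segment` stand-in, and the `SPEAKER_00`-style labels are hypothetical and exist only for illustration; in practice you would pass `result.segments` instead of the demo list.

```python
from collections import defaultdict, namedtuple


def speaking_time(segments):
    """Return total seconds of speech per speaker label."""
    totals = defaultdict(float)
    for seg in segments:
        # Each segment contributes its own duration to its speaker's total.
        totals[seg.speaker] += seg.end - seg.start
    return dict(totals)


# Hypothetical stand-in for the library's segment objects (illustration only).
Segment = namedtuple("Segment", ["start", "end", "speaker"])
demo = [
    Segment(0.0, 4.5, "SPEAKER_00"),
    Segment(4.5, 6.0, "SPEAKER_01"),
    Segment(6.0, 10.0, "SPEAKER_00"),
]

totals = speaking_time(demo)
for speaker, seconds in sorted(totals.items()):
    print(f"{speaker}: {seconds:.1f}s")
```

Because the helper only reads three attributes, it should work with any segment-like object that exposes them.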

Supported Formats:

Diarize works with various audio formats such as WAV, MP3, FLAC, and OGG, making it versatile for different use cases.

API Overview:

result = diarize("meeting.wav")                # Automatically detects speakers
result.to_rttm("meeting.rttm")                # Export results to RTTM format

The result object exposes the detected segments, each with a speaker label and start and end times, and can be exported to RTTM. For comprehensive details, refer to the full documentation.
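For readers unfamiliar with RTTM, the sketch below shows what a standard RTTM `SPEAKER` record looks like: one line per speaker turn, giving the onset and duration in seconds. It is not the library's implementation. The `rttm_line` helper, the `Segment` stand-in, the file ID `meeting`, and the speaker labels are all hypothetical, and the exact number formatting that `to_rttm` writes may differ.

```python
from collections import namedtuple


def rttm_line(file_id, start, end, speaker):
    """Format one speaker turn as a standard RTTM SPEAKER record."""
    duration = end - start
    # Fields: type, file ID, channel, onset, duration, then <NA> placeholders
    # around the speaker name.
    return (
        f"SPEAKER {file_id} 1 {start:.3f} {duration:.3f} "
        f"<NA> <NA> {speaker} <NA> <NA>"
    )


# Hypothetical stand-in for the library's segment objects (illustration only).
Segment = namedtuple("Segment", ["start", "end", "speaker"])
demo = [
    Segment(0.0, 4.5, "SPEAKER_00"),
    Segment(4.5, 6.0, "SPEAKER_01"),
]

lines = [rttm_line("meeting", s.start, s.end, s.speaker) for s in demo]
print("\n".join(lines))
```

RTTM is the format most diarization scoring tools expect, which is why exporting to it is useful for benchmarking.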

Benchmarking Insights:

Diarize has been evaluated on the VoxConverse development set, with the following results:

System                  Weighted DER   Notes
diarize                 ~10.8%
pyannote precision-2    ~8.5%          commercial
pyannote community-1    ~11.2%         CC-BY-4.0, requires HF token
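As background on the metric, DER is conventionally the sum of missed speech, false-alarm speech, and speaker-confusion time, divided by the total reference speech time. The source does not define "weighted"; the sketch below assumes it means pooling errors across files so that longer recordings count for more. The function names and all numbers are made up for illustration and are not benchmark data.

```python
def der(missed, false_alarm, confusion, total_speech):
    """Diarization Error Rate for one file (all arguments in seconds)."""
    return (missed + false_alarm + confusion) / total_speech


def weighted_der(files):
    """Pool errors over files, weighting each by its reference speech time.

    `files` is a list of (missed, false_alarm, confusion, total_speech) tuples.
    """
    errors = sum(m + fa + c for m, fa, c, _ in files)
    total = sum(t for _, _, _, t in files)
    return errors / total


# Made-up per-file error components, in seconds (illustration only).
demo = [
    (2.0, 1.0, 1.0, 40.0),  # short file
    (6.0, 3.0, 3.0, 60.0),  # longer file with more errors
]

per_file = [der(*f) for f in demo]
pooled = weighted_der(demo)
print(f"per-file DER: {[f'{d:.1%}' for d in per_file]}")
print(f"weighted DER: {pooled:.1%}")
```

Under this assumption the pooled figure differs from the plain average of per-file DERs whenever files have different amounts of speech.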

Future Enhancements:

Enhancements are actively being developed, including:

  • Cross-dataset validation and benchmark testing across multiple standard datasets.
  • Enhanced speaker count estimation.
  • Real-time streaming capabilities for live audio feeds.

For more information about how Diarize operates, refer to the detailed documentation.

Diarize is a practical option for CPU-only speaker diarization in Python across a range of audio processing tasks.
