Effortlessly transcribe audio files with offline accuracy.
Pitch

AudioScribe converts audio files into clean, timestamped transcripts, entirely offline, using OpenAI Whisper. It needs no API keys or cloud services, which makes it a good fit for anyone who values privacy and convenience. It supports multiple audio formats and batch processing.

Description

AudioScribe: Efficient Audio Transcription Tool

AudioScribe is a powerful command-line interface (CLI) tool that enables the transcription of audio files into clean, structured text using OpenAI Whisper. Designed for offline use, it eliminates the need for API keys, cloud services, and costly per-minute charges, making it an ideal choice for users seeking a cost-effective solution for audio transcription.

Key Features

  • Wide Format Support: Transcribes various audio file formats, including .mp3, .wav, .m4a, .ogg, .flac, .aac, and .wma.
  • Timestamped Output: Generates structured .txt files that include timestamps, a metadata header, and neatly wrapped lines for easy reading.
  • Batch Processing: Allows users to transcribe an entire folder of audio files with a single command, enhancing workflow efficiency.
  • Automatic Language Detection: Detects the spoken language automatically, or accepts a language hint that speeds up transcription.
  • Flexible Model Options: Utilizes various sizes of the Whisper models from tiny to large, catering to different accuracy and speed requirements.
  • Robust Testing: The tool includes 47 unit tests, ensuring reliability without imposing heavy dependencies for test execution.
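The format check implied by the Wide Format Support item can be sketched as follows. This is an illustrative sketch only: the names `SUPPORTED_EXTENSIONS` and `is_supported_audio` are hypothetical and are not taken from AudioScribe's source; only the extension list comes from the feature list above.

```python
from pathlib import Path

# Extensions listed under "Wide Format Support" above.
# The constant and function names are hypothetical, chosen for illustration.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".flac", ".aac", ".wma"}


def is_supported_audio(path: str) -> bool:
    """Return True if the file name ends in a supported extension.

    The comparison is case-insensitive, so "TALK.MP3" is accepted.
    Only the name is inspected; the file is not opened or probed.
    """
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

A check like this only looks at the file name; a real validator would presumably also confirm that the file exists and is not empty.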

Example Output

The output file is structured to include detailed metadata about the transcription:

========================================
 TRANSCRIPTION
========================================
 File     : recording.mp3
 Date     : 2026-03-30
 Duration : 42:02
 Language : en
 Model    : whisper-medium
========================================

[00:00:00] So this system basically drops to the critical risk
           for real time.

[00:00:03] So this system will have three components.

[00:00:06] It has the data ingestion layer.
...
========================================
 END OF TRANSCRIPTION
========================================
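The timestamped, wrapped segment lines shown above could be produced along these lines. This is a sketch under assumptions, not AudioScribe's actual formatter: the function names and the wrap width of 64 columns are hypothetical, with the width chosen only because it reproduces the line break in the sample.

```python
import textwrap


def format_timestamp(seconds: float) -> str:
    """Render a start time in seconds as [HH:MM:SS]."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"[{hours:02d}:{minutes:02d}:{secs:02d}]"


def format_segment(start: float, text: str, width: int = 64) -> str:
    """Prefix a segment with its timestamp and wrap it with a hanging indent.

    The width default is an assumption made for this sketch.
    """
    prefix = format_timestamp(start) + " "
    return textwrap.fill(
        text.strip(),
        width=width,
        initial_indent=prefix,
        # Continuation lines align under the first word of the text.
        subsequent_indent=" " * len(prefix),
    )


print(format_segment(0, "So this system basically drops to the critical risk for real time."))
```

The hanging indent keeps continuation lines aligned with the text rather than the timestamp, which matches the layout of the sample output.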

Usage Examples

To transcribe a single audio file:

python transcribe.py audio/recording.mp3

To transcribe all audio files in a folder:

python transcribe.py audio/ --batch

Further options select the model size, the output directory, and the language code.
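Batch mode has to resolve a folder into the list of files to transcribe. A minimal sketch of that step follows; the function name `collect_audio_files` is hypothetical, and the choice to scan only the top level of the folder (no recursion) is an assumption, since the page does not say how subfolders are handled.

```python
from pathlib import Path

# Extensions taken from the supported-format list on this page.
AUDIO_EXTENSIONS = frozenset({".mp3", ".wav", ".m4a", ".ogg", ".flac", ".aac", ".wma"})


def collect_audio_files(folder, extensions=AUDIO_EXTENSIONS) -> list[Path]:
    """Return the supported audio files directly inside `folder`, sorted by path.

    Hypothetical helper: non-recursive scanning is an assumption of this sketch.
    """
    root = Path(folder)
    if not root.is_dir():
        raise NotADirectoryError(f"{folder} is not a directory")
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in extensions
    )
```

Sorting the result gives a stable processing order, which makes batch runs easier to compare.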

Model Size Considerations

Model    Size     Speed     Accuracy   Best For
tiny     75 MB    Fastest   Good       Quick drafts, clear audio
base     145 MB   Fast      Better     General use
small    465 MB   Medium    Good       Balanced
medium   1.5 GB   Slower    Great      Default; best general accuracy
large    2.9 GB   Slowest   Best       Maximum accuracy, long-form

Each model is downloaded automatically upon initial use and cached locally for efficient access.

Project Components

The structure of the AudioScribe project includes:

audio-transcription/
├── transcribe.py          # CLI entry point
├── src/                   # Source code
│   ├── validator.py       # Checks for file format, existence, and size
│   ├── input_handler.py   # Resolves single file and batch directory input
│   ├── preprocessor.py    # Converts audio to 16kHz mono WAV
│   ├── engine.py          # Handles Whisper model and caching
│   ├── formatter.py       # Formats transcript output with timestamps
│   └── writer.py          # Manages output file creation and directory setup
├── specs/                 # Module acceptance criteria
├── tests/                 # Unit tests (47 in total using pytest)
└── output/                # Default output directory for transcripts (gitignored)
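The preprocessing step (conversion to 16 kHz mono WAV) can be illustrated with an ffmpeg invocation. This is a sketch under assumptions: the page does not say which tool preprocessor.py uses, so ffmpeg is a stand-in, and the function names are hypothetical. The ffmpeg flags themselves (`-ar`, `-ac`, `-c:a`, `-y`) are standard.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src, dst) -> list[str]:
    """Build an ffmpeg command that resamples `src` to 16 kHz mono 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                  # overwrite the output file if it exists
        "-i", str(src),        # input file
        "-ar", "16000",        # sample rate: 16 kHz
        "-ac", "1",            # channels: mono
        "-c:a", "pcm_s16le",   # codec: signed 16-bit little-endian PCM
        str(dst),
    ]


def convert_to_wav(src, dst) -> Path:
    """Run the conversion; requires ffmpeg on PATH (an assumption of this sketch)."""
    subprocess.run(build_ffmpeg_command(src, dst), check=True, capture_output=True)
    return Path(dst)
```

Keeping command construction separate from execution lets the arguments be checked in a unit test without ffmpeg installed.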

Testing

To run the tests, install pytest and run it against the tests/ directory:

pip install pytest
python -m pytest tests/ -v

AudioScribe streamlines the audio transcription process, making it accessible for various personal and professional applications.
