AudioScribe converts audio files into clean, timestamped transcripts, entirely offline, using OpenAI Whisper. It needs no API keys or cloud services, which suits anyone who values privacy and convenience. It supports multiple audio formats and batch processing.
AudioScribe: Efficient Audio Transcription Tool
AudioScribe is a powerful command-line interface (CLI) tool that enables the transcription of audio files into clean, structured text using OpenAI Whisper. Designed for offline use, it eliminates the need for API keys, cloud services, and costly per-minute charges, making it an ideal choice for users seeking a cost-effective solution for audio transcription.
Key Features
- Wide Format Support: Transcribes various audio file formats, including .mp3, .wav, .m4a, .ogg, .flac, .aac, and .wma.
- Timestamped Output: Generates structured .txt files that include timestamps, a metadata header, and neatly wrapped lines for easy reading.
- Batch Processing: Allows users to transcribe an entire folder of audio files with a single command, enhancing workflow efficiency.
- Automatic Language Detection: Supports automatic language detection or accepts a language hint to speed up the transcription process.
- Flexible Model Options: Utilizes various sizes of the Whisper models from tiny to large, catering to different accuracy and speed requirements.
- Robust Testing: The tool includes 47 unit tests, ensuring reliability without imposing heavy dependencies for test execution.
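As a rough illustration of the format support listed above, an extension check might look like the following. This is a minimal sketch, not the project's actual validator: the names `SUPPORTED_EXTENSIONS` and `is_supported` are hypothetical, and the real tool may also inspect file contents or size.

```python
from pathlib import Path

# Extensions copied from the feature list; the set name is hypothetical.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".flac", ".aac", ".wma"}


def is_supported(path: str) -> bool:
    """Return True if the file extension is one of the listed audio formats."""
    # Compare case-insensitively so "RECORDING.MP3" is accepted too.
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

A check like this only looks at the file name, so a mislabeled file would still pass; it is a cheap first filter rather than a guarantee that the audio can be decoded.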
Example Output
The output file is structured to include detailed metadata about the transcription:
========================================
TRANSCRIPTION
========================================
File : recording.mp3
Date : 2026-03-30
Duration : 42:02
Language : en
Model : whisper-medium
========================================
[00:00:00] So this system basically drops to the critical risk
for real time.
[00:00:03] So this system will have three components.
[00:00:06] It has the data ingestion layer.
...
========================================
END OF TRANSCRIPTION
========================================
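The timestamped, wrapped lines in the sample above could be produced along these lines. This is a sketch under assumptions: `format_timestamp` and `format_segment` are hypothetical names, the wrap width is a guess, and the project's formatter.py may work differently.

```python
import textwrap


def format_timestamp(seconds: float) -> str:
    """Render a start time in seconds as [HH:MM:SS], dropping fractions."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"[{hours:02d}:{minutes:02d}:{secs:02d}]"


def format_segment(start: float, text: str, width: int = 64) -> str:
    """Prefix a segment with its timestamp and wrap it to `width` columns.

    The default width is an assumption, not a value taken from the project.
    """
    line = f"{format_timestamp(start)} {text.strip()}"
    return "\n".join(textwrap.wrap(line, width=width))
```

Calling `format_segment(3, "So this system will have three components.")` yields a single line that starts with `[00:00:03]`; longer segments continue on following lines without repeating the timestamp.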
Usage Examples
To transcribe a single audio file:
python transcribe.py audio/recording.mp3
To transcribe all audio files in a folder:
python transcribe.py audio/ --batch
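Conceptually, batch mode has to turn a folder into a list of audio files to transcribe. The sketch below shows one way that step might work; `collect_audio_files` is a hypothetical helper, and the choice to skip subfolders is an assumption rather than documented behavior.

```python
from pathlib import Path
from typing import List

# Extensions from the feature list; the set name is hypothetical.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".flac", ".aac", ".wma"}


def collect_audio_files(folder: str) -> List[Path]:
    """Return the supported audio files directly inside `folder`, sorted."""
    root = Path(folder)
    if not root.is_dir():
        raise NotADirectoryError(f"Not a directory: {folder}")
    # Non-recursive by assumption: only direct children are considered.
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Sorting the result keeps the processing order predictable between runs, which makes batch logs easier to compare.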
Further options select the model size, the output directory, and the language code.
Model Size Considerations
| Model | Size | Speed | Accuracy | Best For |
|---|---|---|---|---|
| tiny | 75 MB | Fastest | Good | Quick drafts, clear audio |
| base | 145 MB | Fast | Better | General use |
| small | 465 MB | Medium | Good | Balanced |
| medium | 1.5 GB | Slower | Great | Default — best general accuracy |
| large | 2.9 GB | Slowest | Best | Maximum accuracy, long-form |
Each model is downloaded automatically upon initial use and cached locally for efficient access.
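Separately from the on-disk download cache, a long-running process may want to avoid loading the same model into memory twice. The sketch below shows a simple in-process cache for that purpose. `ModelCache` is a hypothetical class, not the project's engine.py, and the loader is passed in so the example runs without Whisper installed.

```python
from typing import Any, Callable, Dict

# Model sizes from the table above.
MODEL_SIZES = ("tiny", "base", "small", "medium", "large")


class ModelCache:
    """Load each model size at most once per process."""

    def __init__(self, loader: Callable[[str], Any]) -> None:
        self._loader = loader              # function that loads a model by name
        self._models: Dict[str, Any] = {}  # already-loaded models, keyed by size

    def get(self, size: str = "medium") -> Any:
        """Return the model for `size`, loading it on first request."""
        if size not in MODEL_SIZES:
            raise ValueError(f"Unknown model size: {size!r}")
        if size not in self._models:
            self._models[size] = self._loader(size)
        return self._models[size]
```

With the openai-whisper package installed, `ModelCache(whisper.load_model)` would be one way to wire this up, since `whisper.load_model` accepts a model name such as "medium".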
Project Components
The structure of the AudioScribe project includes:
audio-transcription/
├── transcribe.py # CLI entry point
├── src/ # Source code
│ ├── validator.py # Checks for file format, existence, and size
│ ├── input_handler.py # Resolves single file and batch directory input
│ ├── preprocessor.py # Converts audio to 16kHz mono WAV
│ ├── engine.py # Handles Whisper model and caching
│ ├── formatter.py # Formats transcript output with timestamps
│ └── writer.py # Manages output file creation and directory setup
├── specs/ # Module acceptance criteria
├── tests/ # Unit tests (47 in total using pytest)
└── output/ # Default output directory for transcripts (gitignored)
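The preprocessing step (conversion to 16 kHz mono WAV) is commonly done with ffmpeg. The sketch below builds such a command; whether the project calls ffmpeg this way is an assumption, and `build_ffmpeg_command` is a hypothetical helper. The flags themselves (`-i`, `-ar`, `-ac`, `-y`) are standard ffmpeg options.

```python
from pathlib import Path
from typing import List


def build_ffmpeg_command(src: str, dst_dir: str) -> List[str]:
    """Build an ffmpeg command that converts `src` to 16 kHz mono WAV."""
    # Output keeps the input's base name and gets a .wav extension.
    dst = Path(dst_dir) / (Path(src).stem + ".wav")
    return [
        "ffmpeg",
        "-y",            # overwrite the output file if it already exists
        "-i", str(src),  # input file
        "-ar", "16000",  # resample to 16 kHz
        "-ac", "1",      # downmix to a single (mono) channel
        str(dst),
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where ffmpeg is installed and on the PATH.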
Testing
To run the tests, install pytest and then run the suite:
pip install pytest
python -m pytest tests/ -v
AudioScribe streamlines the audio transcription process, making it accessible for various personal and professional applications.