Podvoice - Transform Markdown scripts into multi-speaker audio effortlessly.

Pitch

Podvoice is a local-first CLI tool that converts Markdown scripts into multi-speaker audio using Coqui XTTS v2. Built for podcast creators and developers, it runs offline once the model has been downloaded, so scripts and audio stay on your machine. It offers simple speaker mapping and exports to WAV or MP3.

Description

Podvoice is an open-source command-line interface (CLI) tool designed to transform simple Markdown scripts into multi-speaker audio files using Coqui XTTS v2. This local-first solution allows developers and content creators to generate podcast-style audio without relying on cloud services or paid APIs, ensuring full control over data and costs.

Key Features

Markdown-based scripts: Write scripts easily in .md format with clear speaker blocks.
Multiple logical speakers: Map each speaker name to a consistent voice within the XTTS model.
Single output file: Generate a complete audio file for the entire script.
Flexible export options: Audio can be exported as WAV by default, or MP3 when specified.
Local-only inference: Utilizes the pre-trained Coqui XTTS v2 model, which is downloaded and cached locally.
CPU-friendly design: Runs efficiently on CPU, with optional GPU support.
Beginner-friendly codebase: Built in Python 3.10+ with a modular structure and ample comments for easy modification.

How It Works

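Podvoice reads a Markdown script divided into speaker blocks. Each block opens with a bracketed header that gives the speaker name and a label such as calm or excited, followed by the text that speaker says. For instance: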

[SpeakerA | calm]
Hello and welcome to the show.

[SpeakerB | excited]
Aaj hum AI ke baare mein baat karenge.
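To make the block format concrete, here is a minimal parsing sketch. It is not Podvoice's actual parser: the header pattern is inferred from the example above, and the names Segment and parse_script are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed header shape, inferred from the example: [Name | label]
# The "| label" part is treated as optional in this sketch.
HEADER = re.compile(r"^\[\s*([^\]|]+?)\s*(?:\|\s*([^\]]*?)\s*)?\]$")


@dataclass
class Segment:
    speaker: str
    emotion: Optional[str]
    text: str


def parse_script(markdown: str) -> list:
    """Split a script into Segment objects, one per speaker block."""
    segments = []
    current = None
    for raw in markdown.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            # A header line starts a new block.
            current = Segment(match.group(1), match.group(2) or None, "")
            segments.append(current)
        elif line and current is not None:
            # Non-empty lines are joined into the current block's text.
            current.text = (current.text + " " + line).strip()
    # Drop blocks that have a header but no spoken text.
    return [s for s in segments if s.text]
```

Parsing the two-block example yields two segments, with speakers SpeakerA and SpeakerB and labels calm and excited.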

Podvoice chooses a voice for each speaker by hashing the speaker name, so the same name always resolves to the same voice. Voicing therefore stays consistent across scripts.
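The idea of hash-based voice assignment can be sketched as follows. This illustrates the general technique and is not Podvoice's code: the VOICES list holds placeholder names rather than real XTTS v2 speakers, and voice_for is a hypothetical helper. The sketch uses hashlib because Python's built-in hash() is randomized per process for strings, so it would not give the same voice on every run.

```python
import hashlib

# Placeholder voice names (hypothetical, not the real XTTS v2 speaker list).
VOICES = ["voice_0", "voice_1", "voice_2", "voice_3"]


def voice_for(speaker: str, voices=VOICES) -> str:
    """Map a speaker name to a voice deterministically."""
    # Normalize so "SpeakerA" and " speakera " resolve to the same voice.
    key = speaker.strip().lower().encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as an integer index.
    index = int.from_bytes(digest[:8], "big") % len(voices)
    return voices[index]
```

With a scheme like this, two different speaker names can land on the same voice when the voice list is small.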

Getting Started with Podvoice

To use Podvoice, you need:

Python 3.10 or later
ffmpeg installed on your system for audio processing.
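Both prerequisites can be checked before installing anything. The snippet below is a small standalone sketch using only the standard library; check_prerequisites is a hypothetical helper and not part of Podvoice.

```python
import shutil
import sys


def check_prerequisites(min_version=(3, 10)):
    """Return a list of human-readable problems; empty means all good."""
    problems = []
    if sys.version_info < min_version:
        found = ".".join(str(part) for part in sys.version_info[:3])
        needed = ".".join(str(part) for part in min_version)
        problems.append(f"Python {needed} or later is required, found {found}")
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    return problems
```

Calling check_prerequisites() and printing each returned string gives a quick report; an empty list means both requirements are met.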

A simple command to render a Markdown script looks like this:

podvoice render examples/demo.md --out output.wav

The first run downloads the Coqui XTTS v2 model and caches it locally. Later runs reuse the cached files, which avoids repeating the download.

Responsible Usage

Text-to-speech can be misused, so use Podvoice responsibly. Do not impersonate real individuals or use the tool for harmful purposes, and tell listeners when audio has been generated.

Podvoice suits developers who want to streamline audio content generation while keeping everything local.
