Scribble
Fast and lightweight transcription engine in Rust.
Pitch

Scribble is a transcription server and library written in Rust. It takes audio and video files as-is and provides fast, real-time transcription with no separate preprocessing step. With multiple output formats and a streaming-first design, it suits custom implementations and low-latency workflows.

Description

Scribble is a transcription engine for audio and video files, implemented in Rust. It ships with a built-in Whisper backend and also lets you develop custom backend implementations.

Key Features

  • Fast and Lightweight: Scribble demuxes and decodes audio and video containers (including MP4, MP3, WAV, FLAC, OGG, WebM, MKV, etc.) and normalizes them to 16 kHz mono, so no separate preprocessing step is needed.

  • Rich Output Formats: The library can emit transcripts as JSON, VTT, or plain text.

  • Versatile Usage: Runs as a command-line interface (CLI) tool or as an embedded library in applications that need transcription.

  • Streaming-first Architecture: Designed for incremental, chunk-based transcription, which makes Scribble well suited to live audio and low-latency workflows.

  • Composable Pipelines: Supports flexible transcription workflows by allowing components such as voice activity detection (VAD) and encoding to work together smoothly.
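
To make the "16 kHz mono" and "chunk-based" ideas above concrete, here is a minimal sketch of downmixing interleaved audio to mono and splitting it into fixed-length windows. It is illustrative only: `downmix_to_mono` and `split_into_windows` are hypothetical helpers, not part of Scribble's API, and the sketch assumes the samples are already at 16 kHz (resampling is left out).

```rust
/// Sample rate the text above says audio is normalized to (16 kHz).
const TARGET_SAMPLE_RATE: usize = 16_000;

/// Average each interleaved frame into a single mono sample.
/// Hypothetical helper for illustration; not Scribble's implementation.
fn downmix_to_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    assert!(channels > 0, "channel count must be non-zero");
    interleaved
        .chunks_exact(channels) // one frame = one sample per channel
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}

/// Split mono samples into windows of `window_seconds` each.
/// The last window may be shorter than the others.
fn split_into_windows(mono: &[f32], window_seconds: usize) -> Vec<&[f32]> {
    assert!(window_seconds > 0, "window length must be non-zero");
    mono.chunks(window_seconds * TARGET_SAMPLE_RATE).collect()
}
```

For example, 40,000 mono samples split with a 1-second window yield two full windows of 16,000 samples and a final window of 8,000. A streaming pipeline could hand each window to the backend as it arrives instead of waiting for the whole file.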

Usage Examples

Transcribe with the CLI: Run the scribble-cli tool on an input file with:

cargo run --features bin-scribble-cli --bin scribble-cli -- \
  --model ./models/ggml-large-v3-turbo.bin \
  --vad-model ./models/ggml-silero-v6.2.0.bin \
  --input ./input.mp4

Run Scribble Server: To accept transcription requests over HTTP, start scribble-server with:

cargo run --features bin-scribble-server --bin scribble-server -- \
  --model ./models/ggml-large-v3-turbo.bin \
  --vad-model ./models/ggml-silero-v6.2.0.bin \
  --host 127.0.0.1 \
  --port 8080

Then send a transcription request with curl:

curl -sS --data-binary @./input.mp4 \
  "http://127.0.0.1:8080/transcribe?output=json" \
  > transcript.json

Library Integration

Basic usage as an embedded library looks like this:

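use scribble::{Opts, OutputType, Scribble};
use std::error::Error;
use std::fs::File;

// Wrapped in `main` so the `?` operators compile; this assumes Scribble's
// error type converts into `Box<dyn Error>`.
fn main() -> Result<(), Box<dyn Error>> {
    // Load the Whisper model(s) and the Silero VAD model.
    let mut scribble = Scribble::new(
        ["./models/ggml-large-v3-turbo.bin"],
        "./models/ggml-silero-v6.2.0.bin",
    )?;

    // Open the input file and collect the transcript in an in-memory buffer.
    let mut input = File::open("audio.wav")?;
    let mut output = Vec::new();

    let opts = Opts {
        model_key: None,
        enable_translate_to_english: false,
        enable_voice_activity_detection: true,
        language: None,
        output_type: OutputType::Json,
        incremental_min_window_seconds: 1,
    };

    scribble.transcribe(&mut input, &mut output, &opts)?;
    let json = String::from_utf8(output)?;
    println!("{}", json);
    Ok(())
}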
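
The feature list mentions VTT as an output format alongside JSON. As a rough illustration of what that format looks like, the sketch below renders timed segments as WebVTT. The `Segment` struct and both functions are hypothetical and written only for this example; Scribble's own segment type and encoder may differ.

```rust
/// Hypothetical transcript segment; Scribble's real type may differ.
struct Segment {
    start_ms: u64,
    end_ms: u64,
    text: String,
}

/// Format milliseconds as a WebVTT timestamp (hh:mm:ss.mmm).
fn vtt_timestamp(ms: u64) -> String {
    let hours = ms / 3_600_000;
    let minutes = (ms / 60_000) % 60;
    let seconds = (ms / 1_000) % 60;
    let millis = ms % 1_000;
    format!("{hours:02}:{minutes:02}:{seconds:02}.{millis:03}")
}

/// Render segments as a WebVTT document: the `WEBVTT` header,
/// then one cue (timing line plus text) per segment, separated by blank lines.
fn to_vtt(segments: &[Segment]) -> String {
    let mut out = String::from("WEBVTT\n");
    for seg in segments {
        out.push('\n');
        out.push_str(&format!(
            "{} --> {}\n{}\n",
            vtt_timestamp(seg.start_ms),
            vtt_timestamp(seg.end_ms),
            seg.text
        ));
    }
    out
}
```

A single segment from 0 ms to 2500 ms with the text "Hello" produces the header, a blank line, the cue timing `00:00:00.000 --> 00:00:02.500`, and then the text on its own line.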

Project Goals

Planned work includes enhancing VAD, extending streaming support, and improving API stability. The project is under active development and iterates quickly based on user requirements and community feedback.

For further information, documentation, and updates, visit the project documentation or GitHub repository.
