Open Source Golang Voice Agent Orchestrator, by Lokutor

Build voice-powered applications with ease and flexibility.

Pitch

Lokutor Orchestrator is a production-ready Go library for building voice-powered applications. It supports pluggable STT, LLM, and TTS providers and handles full-duplex voice orchestration for real-time interaction.

Description

Lokutor Orchestrator is a Go library for building voice-powered applications. It supports pluggable providers for Speech-to-Text (STT), Large Language Models (LLM), and Text-to-Speech (TTS), and it is designed for interactive voice applications with low latency.

Key Features

Full-Duplex Voice Orchestration: Capture and playback run at the same time for real-time voice interaction, with built-in Voice Activity Detection (VAD) for responsive listening.
Barge-in Support: Users can interrupt audio playback by speaking, so the agent stops talking and starts listening right away.
High-Quality Audio: Native 44.1 kHz 16-bit PCM for clear, high-fidelity voice output.
Provider-agnostic Architecture: Switch between different STT, LLM, and TTS implementations without changing the overall structure of your application.
Multiple Providers Integrated: Out-of-the-box support for Groq (Whisper & Llama), OpenAI (Whisper & GPT), Anthropic (Claude), Google (Gemini), Deepgram, AssemblyAI, and Lokutor (Versa).
Automatic Session Management: Tracks conversation context for each session and supports multiple languages.
Event-driven API: A thread-safe, channel-based event bus for building interactive user interfaces.
Low Latency Design: Built with real-time voice interaction as the priority, so responses start as quickly as the chosen providers allow.
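
To make the barge-in and event-bus features above more concrete, here is a minimal, library-independent sketch. The names `EventType`, `Event`, `Player`, and `handleEvents` are hypothetical and are not the Lokutor API; the sketch only illustrates how a channel-based event loop could drop queued audio as soon as a "user speaking" event arrives.

```go
package main

import "fmt"

// EventType is a hypothetical event tag, not the library's own type.
type EventType int

const (
	UserSpeaking EventType = iota
	AudioChunk
)

// Event carries a type and an optional audio payload.
type Event struct {
	Type EventType
	Data []byte
}

// Player is a toy playback queue standing in for a real audio device.
type Player struct {
	queue [][]byte
}

// Enqueue schedules a chunk for playback.
func (p *Player) Enqueue(chunk []byte) { p.queue = append(p.queue, chunk) }

// Stop drops everything that is queued; this is the barge-in action.
func (p *Player) Stop() { p.queue = nil }

// Pending reports how many chunks are still waiting to be played.
func (p *Player) Pending() int { return len(p.queue) }

// handleEvents drains the channel until it is closed, queuing audio
// chunks and clearing the queue whenever the user starts speaking.
func handleEvents(events <-chan Event, p *Player) {
	for ev := range events {
		switch ev.Type {
		case AudioChunk:
			p.Enqueue(ev.Data)
		case UserSpeaking:
			p.Stop()
		}
	}
}

func main() {
	events := make(chan Event, 4)
	events <- Event{Type: AudioChunk, Data: []byte{1, 2}}
	events <- Event{Type: AudioChunk, Data: []byte{3, 4}}
	events <- Event{Type: UserSpeaking} // user interrupts here
	events <- Event{Type: AudioChunk, Data: []byte{5, 6}}
	close(events)

	p := &Player{}
	handleEvents(events, p)
	fmt.Println("chunks pending after barge-in:", p.Pending())
}
```

Only the chunk that arrives after the interruption remains queued. A real player would also need to cancel the buffer that is currently being written to the audio device, which this toy queue does not model.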

Example Usage

Full-Duplex Voice Agent

The following code snippet demonstrates how to set up a real-time voice assistant with barge-in support:

package main

import (
    "context"
    "fmt"
    "time"

    "github.com/lokutor-ai/lokutor-orchestrator/pkg/orchestrator"
    llmProvider "github.com/lokutor-ai/lokutor-orchestrator/pkg/providers/llm"
    sttProvider "github.com/lokutor-ai/lokutor-orchestrator/pkg/providers/stt"
    ttsProvider "github.com/lokutor-ai/lokutor-orchestrator/pkg/providers/tts"
)

// playAudio is a placeholder: send the raw PCM bytes to your audio output here.
func playAudio(chunk []byte) {}

func main() {
    // Initialize providers
    stt := sttProvider.NewGroqSTT("YOUR_GROQ_KEY", "whisper-large-v3")
    llm := llmProvider.NewGroqLLM("YOUR_GROQ_KEY", "llama-3.3-70b-versatile")
    tts := ttsProvider.NewLokutorTTS("YOUR_LOKUTOR_KEY")

    // Initialize VAD
    vad := orchestrator.NewRMSVAD(0.02, 500*time.Millisecond)

    // Create orchestrator and session
    orch := orchestrator.NewWithVAD(stt, llm, tts, vad, orchestrator.DefaultConfig())
    session := orch.NewSessionWithDefaults("user_123")

    // Start managed stream
    stream := orch.NewManagedStream(context.Background(), session)
    defer stream.Close()

    // Handle events
    go func() {
        for event := range stream.Events() {
            switch event.Type {
            case orchestrator.UserSpeaking:
                // Barge-in: stop any audio that is currently playing
            case orchestrator.TranscriptFinal:
                if text, ok := event.Data.(string); ok {
                    fmt.Printf("User: %s\n", text)
                }
            case orchestrator.AudioChunk:
                // Play the raw audio bytes
                if chunk, ok := event.Data.([]byte); ok {
                    playAudio(chunk)
                }
            }
        }
    }()

    // Placeholder: in a real application, read microphone PCM in a loop
    // and write each buffer to the stream.
    var micBytes []byte
    stream.Write(micBytes)
}

Conversational API

The library also supports a conversational API for managing audio input and responses elegantly:

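// conv is assumed to be a conversation handle created earlier;
// sendToSpeaker is a placeholder for your own playback function.
transcript, response, err := conv.ProcessAudio(
    context.Background(),
    audioBytes,
    func(chunk []byte) error {
        // Forward each synthesized audio chunk to playback
        return sendToSpeaker(chunk)
    },
)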

if err != nil {
    log.Fatal(err)
}

log.Printf("User said: %s", transcript)
log.Printf("Assistant responded: %s", response)

Advanced Usage

The orchestrator is highly configurable, allowing for seamless integration of different languages and additional providers, giving developers flexibility in customizing voice interactions.

With its provider-agnostic architecture, Lokutor Orchestrator suits developers who want to build voice applications without writing the audio plumbing themselves. It manages conversation context and exposes an event-driven API, which removes much of the complexity usually associated with real-time speech applications.
