Embershard - A native chat app for macOS with a built-in LLM inference engine.

progressive_ivory_jenifer

Pitch

Embershard is a macOS chat application with its own native LLM inference engine, designed specifically for Apple Silicon. The engine retains complete control over graph construction and execution for the Llama and Qwen architectures, which keeps the runtime efficient and focused.

Description

Embershard is a native large language model (LLM) inference engine and chat application for macOS on Apple Silicon. It provides a streamlined way to run chat sessions on language models while keeping the design simple and fast.

Key Features

Native Inference Engine: Embershard does not depend on other libraries during inference, which keeps model processing focused and efficient. It supports the llama and qwen2 families, including Llama 3.x, Mistral, and Qwen 2.5.
Optimized Performance: The design includes a resident Key-Value (KV) cache and a custom tokenizer covering byte-level BPE and SentencePiece. Throughput, considered relative to memory bandwidth, matches or exceeds that of other implementations.
Proven Accuracy: The engine achieves logit parity with existing systems, giving consistent and reliable outputs, and it retains context across conversation turns, even during prolonged interactions.
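
As an illustration of what a logit parity check can look like, here is a minimal sketch in Python. It is not Embershard's own test code: the function names, the tolerance of 1e-3, and the sample values are all assumptions chosen for the example. The idea is to compare the logit vector produced for the same prompt by two engines, requiring the same top token and a small maximum absolute difference.

```python
def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two logit vectors."""
    if len(a) != len(b):
        raise ValueError("logit vectors must have the same vocabulary size")
    return max(abs(x - y) for x, y in zip(a, b))


def argmax(values):
    """Index of the largest value (the greedy next-token choice)."""
    return max(range(len(values)), key=values.__getitem__)


def logits_match(a, b, tol=1e-3):
    """True when both vectors pick the same top token and differ by at most tol.

    The tolerance is a hypothetical value; a real comparison would choose it
    based on the numeric precision of the two engines being compared.
    """
    return argmax(a) == argmax(b) and max_abs_diff(a, b) <= tol


# Made-up logits standing in for the output of two engines on the same prompt.
ours = [1.2501, -0.3002, 4.1000]
reference = [1.2500, -0.3000, 4.1003]
print(logits_match(ours, reference))
```

Checking the top token as well as the numeric difference catches the case where two vectors are close overall but still disagree on the token that greedy decoding would pick.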

Efficient Tokenization and Model Loading

Embershard features a robust loading mechanism for GGUF files, including support for sharded GGUFs and compatibility checks based on available system resources. Only models that suit the machine are loaded, which improves the overall user experience.
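
To make the loading step more concrete, the sketch below reads the fixed-size GGUF header and runs a simple size check before loading. It is written in Python for illustration and is not Embershard's loader. The header layout (the 4-byte magic `GGUF`, a little-endian uint32 version, then uint64 tensor and metadata counts) follows the GGUF format for version 2 and later; the `fits_in_budget` helper, its `headroom` factor, and the idea of comparing file size against a memory budget are assumptions made for this example.

```python
import os
import struct

GGUF_MAGIC = b"GGUF"


def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) from a GGUF file.

    Assumes GGUF version 2 or later, where both counts are 64-bit.
    """
    with open(path, "rb") as f:
        head = f.read(24)
    if len(head) < 24 or head[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    # "<" selects little-endian with no padding: uint32 + uint64 + uint64.
    version, tensor_count, kv_count = struct.unpack("<IQQ", head[4:24])
    if version < 2:
        raise ValueError("GGUF version 1 uses a different header layout")
    return version, tensor_count, kv_count


def fits_in_budget(shard_paths, memory_budget_bytes, headroom=0.8):
    """Hypothetical compatibility check: do all shards fit in the budget?

    The combined on-disk size of the shards is compared against a fraction
    (headroom) of the memory budget, leaving room for the KV cache and the
    rest of the system. The 0.8 default is an arbitrary illustration.
    """
    total = sum(os.path.getsize(p) for p in shard_paths)
    return total <= memory_budget_bytes * headroom
```

For a sharded model, `shard_paths` would list every shard file so that the check covers the whole model rather than only its first part.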

User-Friendly Interface

Built using SwiftUI, the Embershard application provides a contemporary interface featuring multiple chat modes such as Standard, Agentic (for multi-step tasks), and Arena (for concurrent responses from multiple models). Each chat session can be customized with specific skills, making it versatile and adaptable to different use cases.
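
The Arena mode described above, where several models answer the same prompt concurrently, can be sketched as follows. This is a Python illustration of the general pattern only, not the app's SwiftUI implementation: `arena_round` and the stand-in model functions are hypothetical names, and real models would be replaced by calls into an inference engine.

```python
from concurrent.futures import ThreadPoolExecutor


def arena_round(prompt, models):
    """Send one prompt to every model concurrently and return {name: reply}.

    `models` maps a display name to a callable that takes a prompt string
    and returns a reply string.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        futures = {name: pool.submit(fn, prompt) for name, fn in models.items()}
        # result() blocks until that model has finished its reply.
        return {name: fut.result() for name, fut in futures.items()}


# Stand-in "models" so the sketch runs without any inference engine.
models = {
    "model-a": lambda p: f"A says: {p.upper()}",
    "model-b": lambda p: f"B says: {p[::-1]}",
}
replies = arena_round("hello", models)
for name, reply in replies.items():
    print(name, "->", reply)
```

Collecting replies by name keeps each answer tied to the model that produced it, which is what a side-by-side comparison view needs.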

Example Usage

To run the engine and generate responses, no additional libraries are needed. Here’s a quick example:

./build/gen_gx /path/to/model.gguf "My name is Alice." "What is my name?"

Conclusion

Rooted in the principles of independence and modularity, Embershard is a tool for developers and enthusiasts who want to explore LLM capabilities on macOS, with a focus on performance and simplicity.
