SoMatic is a powerful CLI tool designed for native desktop UI automation. It employs a local YOLO model to identify and organize interactive elements in screenshots, providing clear targeting options. Ideal for automating workflows across applications, it supports both JSON output and easy installation through npm, making it accessible for developers.
SoMatic is an agent-first command-line interface (CLI) for native desktop UI automation. It uses a local YOLO model to detect and number interactive elements in screenshots, producing a structured coordinate map for precise action targeting. Targets can be specified by mark ID, nearest-mark offset, or direct pixel coordinates, so interacting with an element does not depend on guesswork.
SoMatic is invoked through its public binary, somatic, and works with native desktop apps, web browsers, and PDF tools. Every command returns structured JSON, so results can be consumed directly by scripts and agents.
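The three targeting modes can be illustrated with a small sketch. This is not SoMatic's implementation: the Mark class, its bounding-box fields, and the reading of "nearest-mark offset" as a pixel offset from a mark's center are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Mark:
    """Hypothetical numbered element: an ID plus a bounding box in pixels."""
    mark_id: int
    x: int       # left edge
    y: int       # top edge
    width: int
    height: int

    @property
    def center(self) -> Tuple[int, int]:
        return (self.x + self.width // 2, self.y + self.height // 2)


# A target is one of (all shapes are assumptions, not SoMatic's real schema):
#   int                -> mark ID
#   (mark_id, dx, dy)  -> pixel offset from that mark's center
#   (x, y)             -> direct pixel coordinates
Target = Union[int, Tuple[int, int, int], Tuple[int, int]]


def resolve_target(marks: Dict[int, Mark], target: Target) -> Tuple[int, int]:
    """Turn any of the three target forms into absolute pixel coordinates."""
    if isinstance(target, int):
        return marks[target].center
    if len(target) == 3:
        mark_id, dx, dy = target
        cx, cy = marks[mark_id].center
        return (cx + dx, cy + dy)
    if len(target) == 2:
        return (target[0], target[1])
    raise ValueError(f"unsupported target: {target!r}")


def nearest_mark(marks: Dict[int, Mark], point: Tuple[int, int]) -> Mark:
    """Find the mark whose center is closest to a pixel position."""
    px, py = point
    return min(
        marks.values(),
        key=lambda m: (m.center[0] - px) ** 2 + (m.center[1] - py) ** 2,
    )
```

With a map such as {3: Mark(3, 200, 100, 50, 20)}, resolve_target(marks, 3) yields the center of mark 3, while resolve_target(marks, (3, 10, -5)) shifts that point by the given offset.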
Key Features
SoMatic transforms desktop screenshots into actionable targets suitable for diverse use cases, including:
- PDF Workflow
- Chess Setup
- Browser and Terminal Interactions

Quick Start Commands
Start automating actions easily with commands such as:
somatic doctor
somatic vision init
somatic screenshot --annotate
somatic click 3
somatic type "hello from SoMatic"
somatic hotkey ctrl s
somatic vision stop
Setup begins with somatic vision init, which downloads the necessary model weights and prepares the system for annotating screenshots. Once this step is complete, users can click, type, and send hotkeys to build efficient workflows.
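The quick-start sequence can also be driven from a script. The sketch below is a minimal example that assumes the somatic binary is on PATH and that each command prints its structured JSON result to stdout. The helper names run_somatic and run_quick_start are hypothetical, and whether every command exits promptly (for example vision init) is not confirmed here.

```python
import json
import subprocess
from typing import Any, Callable, List, Sequence

# The commands listed above, split into argument lists.
QUICK_START: List[List[str]] = [
    ["doctor"],
    ["vision", "init"],
    ["screenshot", "--annotate"],
    ["click", "3"],
    ["type", "hello from SoMatic"],
    ["hotkey", "ctrl", "s"],
    ["vision", "stop"],
]


def run_somatic(
    args: Sequence[str],
    runner: Callable[..., Any] = subprocess.run,
) -> Any:
    """Run one somatic command and decode its output.

    Assumption: the JSON document is written to stdout. The runner is
    injectable so the function can be exercised without the binary.
    """
    completed = runner(
        ["somatic", *args],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(completed.stdout)


def run_quick_start(runner: Callable[..., Any] = subprocess.run) -> List[Any]:
    """Run the quick-start commands in order and collect their results."""
    return [run_somatic(args, runner) for args in QUICK_START]
```

Passing arguments as a list rather than a shell string avoids quoting problems with text such as "hello from SoMatic".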
Vision and Automation
The vision daemon offers a local HTTP API, facilitating advanced features such as:
- Health Check: GET /health
- Run YOLO Inference: POST /parse with { "image_path": "/absolute/path.png" }
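A client for these two endpoints might look like the following sketch. It reads the inference endpoint as POST /parse with a JSON body containing image_path. The daemon's host and port are not given here, so base_url must be supplied by the caller. The assumption that responses are JSON, and the helper names, are illustrative only.

```python
import json
import urllib.request
from typing import Any


def health_request(base_url: str) -> urllib.request.Request:
    """Build the GET /health request."""
    return urllib.request.Request(base_url.rstrip("/") + "/health", method="GET")


def parse_request(base_url: str, image_path: str) -> urllib.request.Request:
    """Build the POST /parse request for an absolute image path."""
    body = json.dumps({"image_path": image_path}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/parse",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> Any:
    """Send a request and decode the body, assuming the daemon replies with JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A typical call would be send(parse_request(base_url, "/absolute/path.png")), where base_url is whatever address the running daemon reports.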
For agent executors that need automation on Linux without a physical display, SoMatic can run headless under Xvfb, which lets actions be performed against a virtual display.
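How SoMatic itself is wired to Xvfb is not described here, so the sketch below shows only the generic pattern: start Xvfb on a display number, then run a command with DISPLAY pointing at it. The display :99, the screen size, and the helper names are assumptions, as is the idea that somatic picks up the display from the DISPLAY environment variable.

```python
import os
import subprocess
import time
from typing import Dict, List, Mapping, Optional, Sequence


def xvfb_command(
    display: str = ":99", width: int = 1280, height: int = 800, depth: int = 24
) -> List[str]:
    """Command line that starts an Xvfb server with one virtual screen."""
    return ["Xvfb", display, "-screen", "0", f"{width}x{height}x{depth}"]


def headless_env(
    display: str = ":99", base: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Copy an environment and point DISPLAY at the virtual display."""
    env = dict(os.environ if base is None else base)
    env["DISPLAY"] = display
    return env


def run_headless(
    somatic_args: Sequence[str], display: str = ":99"
) -> subprocess.CompletedProcess:
    """Run one somatic command against a temporary Xvfb display."""
    server = subprocess.Popen(xvfb_command(display))
    try:
        # Crude wait for the server to come up; a real script should poll
        # until the display accepts connections.
        time.sleep(1.0)
        return subprocess.run(
            ["somatic", *somatic_args],
            env=headless_env(display),
            capture_output=True,
            text=True,
            check=True,
        )
    finally:
        server.terminate()
        server.wait()
```

Any application under automation would need to be launched with the same DISPLAY value so that it renders into the virtual screen.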
Benchmark Performance
In the project's benchmarks, SoMatic's local YOLO detection is reported to significantly outperform standard models on tasks that require spatial recognition and interaction.
Discover more about SoMatic's capabilities and how it streamlines native UI automation through structured visual recognition, suitable for various applications in both personal and professional environments.