oMLX: High-Performance LLM Inference Server for Apple Silicon

date: 2026-05-11

draft: false

---

The newly released oMLX inference server speeds up LLM inference on Apple Silicon Macs through continuous batching and tiered KV caching across RAM and SSD. It is managed from the macOS menu bar, lets users pin models in memory, and supports vision-language models, embeddings, and the Model Context Protocol. The tool is designed to make local LLMs practical for daily coding by preserving cached context across requests, even after the server restarts.
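
To make the idea of tiered caching across RAM and SSD more concrete, here is a minimal Python sketch. It is not oMLX's implementation: `TieredKVCache`, `ram_capacity`, `spill_dir`, and `flush` are hypothetical names, and plain Python objects stand in for the per-layer attention tensors a real KV cache would hold. It only illustrates the general pattern of a bounded in-memory tier that spills its least recently used entries to disk, so that a fresh process can pick them up again.

```python
import os
import pickle
import tempfile
from collections import OrderedDict


class TieredKVCache:
    """Toy two-tier cache (hypothetical, for illustration only).

    Hot tier: a bounded in-memory LRU map.
    Cold tier: one pickle file per key in `spill_dir`.
    Keys are assumed to be filename-safe strings, e.g. a hash of a prompt prefix.
    """

    def __init__(self, ram_capacity, spill_dir):
        self.ram_capacity = ram_capacity
        self.spill_dir = spill_dir
        self.ram = OrderedDict()  # insertion order doubles as LRU order
        os.makedirs(spill_dir, exist_ok=True)

    def _path(self, key):
        return os.path.join(self.spill_dir, f"{key}.pkl")

    def _write(self, key, value):
        with open(self._path(key), "wb") as f:
            pickle.dump(value, f)

    def put(self, key, value):
        self.ram[key] = value
        self.ram.move_to_end(key)  # mark as most recently used
        # Spill least recently used entries to disk once RAM is over budget.
        while len(self.ram) > self.ram_capacity:
            old_key, old_value = self.ram.popitem(last=False)
            self._write(old_key, old_value)

    def get(self, key):
        if key in self.ram:
            self.ram.move_to_end(key)
            return self.ram[key]
        path = self._path(key)
        if os.path.exists(path):
            with open(path, "rb") as f:
                value = pickle.load(f)
            self.put(key, value)  # promote back into the hot tier
            return value
        return None  # miss in both tiers

    def flush(self):
        """Write every in-memory entry to disk, e.g. before shutdown."""
        for key, value in self.ram.items():
            self._write(key, value)


if __name__ == "__main__":
    cache_dir = tempfile.mkdtemp()
    cache = TieredKVCache(ram_capacity=2, spill_dir=cache_dir)
    for name in ("prefix_a", "prefix_b", "prefix_c"):
        cache.put(name, {"tokens": name})
    cache.flush()

    # A new instance over the same directory simulates a server restart.
    restarted = TieredKVCache(ram_capacity=2, spill_dir=cache_dir)
    print(restarted.get("prefix_a"))
```

In this sketch, a lookup that misses in RAM but hits on disk promotes the entry back into memory, which may in turn push another entry out to disk. A production server would also need to bound disk usage and invalidate entries when the model changes, both of which are omitted here.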