ANEMLL: Accelerating LLMs on Apple's Neural Engine
2025-05-03
ANEMLL is an open-source project focused on accelerating the porting of Large Language Models (LLMs) to tensor processors, starting with Apple's Neural Engine (ANE). It provides a complete pipeline, from converting Hugging Face models to running inference on the ANE, so that low-power edge applications can run models entirely on-device, which keeps data local for privacy and security. ANEMLL currently supports models such as LLaMA 3.1 and includes Swift and Python sample code along with iOS/macOS applications. This is an alpha release, so expect improvements in quantization.
Development
Apple Neural Engine