OpenArc: A Lightweight Inference API for Accelerating LLMs on Intel Hardware

2025-02-19

OpenArc is a lightweight inference API backend that leverages the OpenVINO runtime and OpenCL drivers to accelerate inference of Transformers models on Intel CPUs, GPUs, and NPUs. Designed for agentic use cases, it features a strongly typed FastAPI implementation with endpoints for model loading, unloading, text generation, and status queries. OpenArc makes it easy to decouple machine learning code from application logic, offering a workflow similar to Ollama, LM Studio, and OpenRouter. It supports custom models and roles, with planned extensions including an OpenAI-compatible proxy, vision model support, and more.
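
A minimal sketch of how a client might drive such an API, assuming a local server on port 8000; the endpoint paths and payload fields shown here (`/model/load`, `/model/unload`, `/generate`, `/status`) are hypothetical and the actual OpenArc routes may differ:

```python
import requests

BASE_URL = "http://localhost:8000"  # assumed local OpenArc instance

# Load a model onto an OpenVINO device (path and fields are hypothetical)
requests.post(f"{BASE_URL}/model/load", json={
    "model": "mistralai/Mistral-7B-Instruct-v0.3",
    "device": "GPU",  # OpenVINO device: CPU, GPU, or NPU
}).raise_for_status()

# Query server/model status
print(requests.get(f"{BASE_URL}/status").json())

# Generate text against the loaded model
resp = requests.post(f"{BASE_URL}/generate", json={
    "prompt": "Explain OpenVINO in one sentence.",
    "max_new_tokens": 64,
})
print(resp.json())

# Unload the model to free device memory
requests.post(f"{BASE_URL}/model/unload").raise_for_status()
```

This load/generate/unload flow mirrors the decoupling the summary describes: the application only speaks HTTP, while the OpenVINO-specific machinery lives behind the API.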