Phi Silica: A Highly Efficient SLM for Windows 11 Copilot+ PCs

2025-05-01

Microsoft's Applied Sciences team used a multi-disciplinary approach to achieve a breakthrough in AI efficiency on Windows 11 Copilot+ PCs powered by Snapdragon X-series processors. The resulting small language model (SLM), Phi Silica, significantly improves power efficiency, inference speed, and memory efficiency. Phi Silica powers several Copilot+ PC features, including Click to Do and on-device rewrite and summarization in Word and Outlook, and is also offered to developers as a pre-optimized SLM. Techniques such as 4-bit weight quantization, memory-mapped embeddings, and QuaRot (a novel 4-bit quantization method) drastically reduce the memory footprint while keeping 4-bit quantized inference highly accurate. The model reaches a time-to-first-token of 230 ms for short prompts and a throughput of up to 20 tokens/second.
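
To make the idea of 4-bit weight quantization concrete, here is a minimal Python sketch. It is not Phi Silica's implementation and it is not QuaRot: it assumes a plain symmetric scheme with a single scale per group of weights, and the function names (`quantize_4bit`, `dequantize_4bit`, `estimated_latency_s`) are hypothetical. The last helper does the arithmetic implied by the reported figures, treating 230 ms and 20 tokens/second as best-case constants.

```python
from typing import List, Tuple


def quantize_4bit(weights: List[float]) -> Tuple[bytes, float]:
    """Symmetric 4-bit quantization of one weight group (illustrative only).

    Each weight becomes an integer code in [-8, 7]; two codes share one byte.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Assumption: one scale per group, chosen so the largest weight maps to 7.
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    if len(codes) % 2:
        codes.append(0)  # pad so codes pair up evenly
    packed = bytearray()
    for low, high in zip(codes[0::2], codes[1::2]):
        # Keep the low 4 bits of each code (two's complement nibble).
        packed.append(((high & 0xF) << 4) | (low & 0xF))
    return bytes(packed), scale


def dequantize_4bit(packed: bytes, scale: float, count: int) -> List[float]:
    """Unpack two 4-bit codes per byte and rescale them to floats."""
    values: List[float] = []
    for byte in packed:
        for nibble in (byte & 0xF, byte >> 4):
            signed = nibble - 16 if nibble >= 8 else nibble
            values.append(signed * scale)
    return values[:count]  # drop the padding code, if any


def estimated_latency_s(output_tokens: int,
                        ttft_s: float = 0.230,
                        tokens_per_s: float = 20.0) -> float:
    """Rough best-case response time: time-to-first-token plus decode time."""
    return ttft_s + output_tokens / tokens_per_s


if __name__ == "__main__":
    group = [0.7, -0.35, 0.0, 0.1]
    packed, scale = quantize_4bit(group)
    print(len(packed), "bytes for", len(group), "weights")
    print(dequantize_4bit(packed, scale, len(group)))
    print(estimated_latency_s(100), "seconds for 100 output tokens")
```

Packing two codes per byte is where the memory saving comes from: each weight takes half a byte, compared with two bytes in 16-bit floating point, at the cost of a rounding error of at most half the scale per weight.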