Training the Strongest Model on a MacBook Pro in 5 Minutes: A Challenge

2025-08-14

The author challenges himself to train the strongest possible language model on a MacBook Pro in just five minutes. The experiments culminated in a ~1.8M-parameter GPT-style transformer trained on ~20M TinyStories tokens, reaching a perplexity of ~9.6. Optimizations centered on maximizing tokens per second: running on MPS and skipping gradient accumulation. Dataset choice mattered as much as architecture, with TinyStories' simple, coherent language giving the best results. Transformers outperformed LSTMs and diffusion models, and the optimal model size for the five-minute training window was around 2M parameters, consistent with Chinchilla scaling laws.
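To make the setup concrete, here is a minimal sketch of the kind of run described: a ~2M-parameter GPT-style model trained on the MPS device with one optimizer step per batch (no gradient accumulation) inside a hard five-minute wall-clock budget. This is not the author's code; the hyperparameters, model layout, and random placeholder batches are illustrative assumptions.

```python
# Sketch only: tiny GPT-style LM trained on Apple-silicon MPS for five minutes,
# one optimizer step per batch (no gradient accumulation) to keep tokens/sec high.
import time
import torch
import torch.nn as nn
import torch.nn.functional as F

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

# Illustrative sizes: roughly ~2M parameters with this config (assumption).
VOCAB, D_MODEL, N_LAYERS, N_HEADS, SEQ_LEN, BATCH = 4096, 128, 4, 4, 256, 32

class TinyGPT(nn.Module):
    def __init__(self):
        super().__init__()
        self.tok = nn.Embedding(VOCAB, D_MODEL)
        self.pos = nn.Embedding(SEQ_LEN, D_MODEL)
        layer = nn.TransformerEncoderLayer(
            d_model=D_MODEL, nhead=N_HEADS, dim_feedforward=4 * D_MODEL,
            batch_first=True, norm_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=N_LAYERS)
        self.head = nn.Linear(D_MODEL, VOCAB)

    def forward(self, idx):
        seq_len = idx.size(1)
        pos = torch.arange(seq_len, device=idx.device)
        x = self.tok(idx) + self.pos(pos)
        # Causal mask so each position only attends to earlier tokens.
        mask = torch.triu(
            torch.full((seq_len, seq_len), float("-inf"), device=idx.device),
            diagonal=1)
        x = self.blocks(x, mask=mask)
        return self.head(x)

model = TinyGPT().to(device)
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.1f}M parameters")

deadline = time.time() + 5 * 60  # hard five-minute wall-clock budget
tokens_seen = 0
while time.time() < deadline:
    # Placeholder batch; a real run would stream pre-tokenized TinyStories here.
    batch = torch.randint(0, VOCAB, (BATCH, SEQ_LEN + 1), device=device)
    inputs, targets = batch[:, :-1], batch[:, 1:]
    logits = model(inputs)
    loss = F.cross_entropy(logits.reshape(-1, VOCAB), targets.reshape(-1))
    opt.zero_grad(set_to_none=True)
    loss.backward()
    opt.step()  # one step per batch: no gradient accumulation
    tokens_seen += inputs.numel()

print(f"tokens/sec ≈ {tokens_seen / (5 * 60):,.0f}, final loss {loss.item():.2f}")
```

The deadline-driven loop reflects the post's framing: with a fixed time budget, throughput (tokens per second) and model size are the levers, not epochs or total dataset size.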

AI