Softmax: Forever? A Deep Dive into Log-Harmonic Functions

2025-02-20

A decade ago, while teaching a course on NLP, the author was challenged by a student about alternatives to softmax. A recent paper proposes a log-harmonic function as a replacement, sparking a deeper investigation. The author analyzes the partial derivatives of both softmax and the log-harmonic function, showing that softmax's gradient is well-behaved and interpretable, while the log-harmonic function's gradient exhibits a singularity near the origin, which could cause training difficulties. While powerful optimizers might overcome these challenges, the author concludes that the log-harmonic approach still warrants further exploration and refinement.
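A minimal worked sketch of the gradient comparison follows. The precise log-harmonic form from the paper is not restated in this summary, so a harmonic-style normalization over non-negative distances $d_i$ with exponent $n$ is assumed purely for illustration. For softmax over logits $z$,

$$p_i = \frac{e^{z_i}}{\sum_k e^{z_k}}, \qquad \frac{\partial p_i}{\partial z_j} = p_i\,(\delta_{ij} - p_j),$$

so every partial derivative is bounded (each entry lies in $[-\tfrac14, \tfrac14]$) no matter how extreme the logits. Under the assumed harmonic-style normalization,

$$q_i = \frac{d_i^{-n}}{\sum_k d_k^{-n}}, \qquad \frac{\partial q_i}{\partial d_j} = \frac{n}{d_j}\, q_i\,(q_j - \delta_{ij}),$$

where the $1/d_j$ factor blows up as $d_j \to 0$: this is the kind of singularity near the origin that can make training unstable.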