Automatic Sparse Differentiation: Taming High-Dimensional Hessians
Computing high-dimensional Hessian matrices is a major bottleneck in machine learning. This post introduces Automatic Sparse Differentiation (ASD), a technique that leverages matrix sparsity to accelerate Hessian and Jacobian computations. ASD combines sparsity pattern detection with matrix coloring, so that multiple structurally orthogonal columns (or rows) can be evaluated with a single matrix-vector product, reducing both computational cost and memory requirements. The post explains how ASD works, covering forward- and reverse-mode automatic differentiation, sparse matrix representations, coloring algorithms, and more. A Julia code example demonstrates its application and performance benefits. The post concludes that ASD offers significant advantages in applications that require sparse Jacobians or Hessians, such as Newton's method and other optimization algorithms.
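To make the recipe concrete (detect the sparsity pattern, color the matrix, then differentiate), here is a minimal sketch of a sparse Hessian computation in Julia. The package stack (DifferentiationInterface.jl with SparseConnectivityTracer.jl and SparseMatrixColorings.jl on top of ForwardDiff.jl) and the test function are illustrative assumptions, not necessarily the exact example used in the post.

```julia
# Minimal sparse Hessian sketch (assumed package stack, for illustration only).
using DifferentiationInterface
using ADTypes: AutoSparse, AutoForwardDiff
using SparseConnectivityTracer: TracerSparsityDetector
using SparseMatrixColorings: GreedyColoringAlgorithm
import ForwardDiff

# Scalar-valued test function whose Hessian is sparse (tridiagonal).
f(x) = sum(abs2, diff(x))

# Dense second-order backend used as the building block.
dense_backend = AutoForwardDiff()

# Sparse backend = dense backend + sparsity pattern detection + matrix coloring.
sparse_backend = AutoSparse(
    dense_backend;
    sparsity_detector=TracerSparsityDetector(),
    coloring_algorithm=GreedyColoringAlgorithm(),
)

x = rand(1000)
H = hessian(f, sparse_backend, x)  # sparse matrix with roughly 3n nonzeros
```

With the dense backend, the cost of the Hessian grows with the input dimension n; with the sparse backend, the coloring groups structurally orthogonal columns so that only a handful of Hessian-vector products are needed regardless of n, which is where the speedup comes from.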