Aligning Polynomial Features with Data Distribution: The Attention-Alignment Problem in ML

2025-08-26

This post explores aligning polynomial features with the data distribution to improve machine learning model performance. Orthogonal polynomial bases produce informative features when the data is uniformly distributed, but real-world data rarely is. Two approaches are presented: a mapping trick that transforms the data to a uniform distribution before applying an orthogonal basis, and multiplying the basis by a carefully chosen function so that its weight function aligns with the data distribution. The first is the more practical and can be implemented with Scikit-Learn's QuantileTransformer; the second is more involved, requiring deeper mathematical machinery and fine-tuning. Experiments on the California housing dataset show that the near-orthogonal features produced by the first method outperform traditional min-max scaling in linear regression.
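
To make the first approach concrete, here is a minimal sketch rather than the post's actual code: it assumes the mapping trick pairs Scikit-Learn's QuantileTransformer with Legendre polynomial features (degree 4, chosen arbitrarily here) and ordinary linear regression, compared against the same basis on min-max-scaled features.

```python
import numpy as np
from numpy.polynomial import legendre
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, MinMaxScaler, QuantileTransformer

def legendre_features(X, degree=4):
    # Expand each column into Legendre polynomials P_1..P_degree,
    # evaluated after mapping the column from [0, 1] to [-1, 1].
    cols = []
    for j in range(X.shape[1]):
        x = 2.0 * X[:, j] - 1.0
        V = legendre.legvander(x, degree)  # columns are P_0..P_degree
        cols.append(V[:, 1:])              # drop the constant P_0
    return np.hstack(cols)

X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Uniformize each feature first, so the Legendre basis is applied to
# (approximately) uniform inputs and the resulting features stay near-orthogonal.
aligned = make_pipeline(
    QuantileTransformer(output_distribution="uniform", random_state=0),
    FunctionTransformer(legendre_features),
    LinearRegression(),
)

# Baseline: the same basis on min-max-scaled (but not uniformized) features.
baseline = make_pipeline(
    MinMaxScaler(),
    FunctionTransformer(legendre_features),
    LinearRegression(),
)

for name, model in [("quantile + Legendre", aligned), ("min-max + Legendre", baseline)]:
    model.fit(X_train, y_train)
    print(f"{name}: test R^2 = {model.score(X_test, y_test):.3f}")
```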

Read more

Debunking the Myth of High-Degree Polynomials in Regression

2025-04-22

This article challenges the common belief that high-degree polynomials are inherently prone to overfitting and difficult to control in machine learning. The author argues that the problem is not high-degree polynomials themselves, but the use of an inappropriate basis, such as the standard (power) basis. Experiments fitting noisy data with the standard, Chebyshev, and Legendre bases versus the Bernstein basis show that the Bernstein basis, whose coefficients share the same 'units' as the target and are therefore easy to regularize, effectively avoids overfitting. Even very high-degree polynomials yield excellent fits with the Bernstein basis, requiring minimal hyperparameter tuning.
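
As a minimal illustration of the Bernstein-basis argument, here is a sketch on synthetic noisy data; the degree (50), the ridge penalty, and the test function are arbitrary choices for this example, not taken from the article.

```python
import numpy as np
from math import comb
from sklearn.linear_model import Ridge

def bernstein_design(x, degree):
    # Bernstein basis on [0, 1]: B_{k,n}(x) = C(n, k) * x**k * (1 - x)**(n - k).
    k = np.arange(degree + 1)
    coeffs = np.array([comb(degree, int(ki)) for ki in k], dtype=float)
    return coeffs * x[:, None] ** k * (1.0 - x[:, None]) ** (degree - k)

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 200))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.shape)

# Degree-50 fit: Bernstein coefficients live on the same scale as y,
# so a single small ridge penalty is enough to keep the fit smooth.
degree = 50
model = Ridge(alpha=1e-3, fit_intercept=False)
model.fit(bernstein_design(x, degree), y)

x_grid = np.linspace(0.0, 1.0, 500)
y_hat = model.predict(bernstein_design(x_grid, degree))
print(f"train R^2 = {model.score(bernstein_design(x, degree), y):.3f}")
```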

Read more