Kovács, László (2022) Feature selection algorithms in generalized additive models under concurvity. Computational Statistics. DOI https://doi.org/10.1007/s00180-022-01292-7
Official URL: https://doi.org/10.1007/s00180-022-01292-7
Abstract
In this paper, the properties of 10 different feature selection algorithms for generalized additive models (GAMs) are compared on one simulated and two real-world datasets under concurvity. Concurvity can be interpreted as a redundancy in the feature set of a GAM. Like multicollinearity in linear models, concurvity causes unstable parameter estimates in GAMs and makes the marginal effect of features harder to interpret. Feature selection algorithms for GAMs can be separated into four clusters: stepwise, boosting, regularization and concurvity-controlled methods. Our numerical results show that algorithms with no constraints on concurvity tend to select a large feature set, without significant improvements in predictive performance compared to a more parsimonious feature set. A large feature set is accompanied by harmful concurvity in the proposed models. To tackle the concurvity phenomenon, recent feature selection algorithms such as the mRMR and the HSIC-Lasso incorporated some constraints on concurvity in their objective functions. However, these algorithms interpret concurvity as a pairwise non-linear relationship between features, so they do not account for the case when a feature can be accurately estimated as a multivariate function of several other features. This is confirmed by our numerical results. Our own solution to the problem, a hybrid genetic–harmony search algorithm (HA), introduces constraints on multivariate concurvity directly. Due to these constraints, the HA proposes a small and non-redundant feature set with predictive performance similar to that of models with far more features.
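The distinction the abstract draws between pairwise and multivariate concurvity can be illustrated with a small, hedged sketch. It is not the paper's concurvity measure and not the HA algorithm: it assumes synthetic Gaussian features, uses a linear least-squares fit as a simple stand-in for the smooth additive fit a GAM would use, and all names (`pearson`, `r_squared`, `x5`) are hypothetical. The redundant feature `x5` is built as a sum of four independent features, so each pairwise correlation stays moderate while the joint fit is nearly perfect.

```python
import random

def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def r_squared(y, columns):
    """R^2 of an ordinary least-squares fit of y on the columns plus an intercept."""
    n = len(y)
    X = [[1.0] + [c[i] for c in columns] for i in range(n)]
    p = len(X[0])
    # Normal equations: (X'X) beta = X'y
    A = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)] for j in range(p)]
    c = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    # Gaussian elimination with partial pivoting
    for j in range(p):
        pivot = max(range(j, p), key=lambda r: abs(A[r][j]))
        A[j], A[pivot] = A[pivot], A[j]
        c[j], c[pivot] = c[pivot], c[j]
        for r in range(j + 1, p):
            f = A[r][j] / A[j][j]
            for k in range(j, p):
                A[r][k] -= f * A[j][k]
            c[r] -= f * c[j]
    # Back substitution
    beta = [0.0] * p
    for r in range(p - 1, -1, -1):
        s = c[r] - sum(A[r][k] * beta[k] for k in range(r + 1, p))
        beta[r] = s / A[r][r]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

random.seed(0)
n = 500
# Four independent features (assumed synthetic data, not from the paper)
base = [[random.gauss(0, 1) for _ in range(n)] for _ in range(4)]
# A redundant fifth feature: an additive function of the other four plus small noise
x5 = [sum(col[i] for col in base) + 0.05 * random.gauss(0, 1) for i in range(n)]

pairwise = [pearson(x5, col) for col in base]
multivariate_r2 = r_squared(x5, base)

print("pairwise correlations:", [round(r, 2) for r in pairwise])
print("multivariate R^2:", round(multivariate_r2, 4))
```

In this toy setting, a criterion that only inspects feature pairs sees moderate dependence at most, while the joint fit shows that `x5` adds almost no new information. This is the kind of redundancy that a constraint on multivariate concurvity is meant to catch.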
Item Type: | Article |
---|---|
Uncontrolled Keywords: | generalized additive model, feature selection, regularization, boosting, genetic algorithm, harmony search algorithm |
Subjects: | General statistics |
DOI: | https://doi.org/10.1007/s00180-022-01292-7 |
ID Code: | 7707 |
Deposited By: | MTMT SWORD |
Deposited On: | 14 Nov 2022 10:07 |
Last Modified: | 14 Nov 2022 10:07 |