Corvinus

Feature selection algorithms in generalized additive models under concurvity

Kovács, László (2022) Feature selection algorithms in generalized additive models under concurvity. Computational Statistics. DOI https://doi.org/10.1007/s00180-022-01292-7

Official URL: https://doi.org/10.1007/s00180-022-01292-7


Abstract

In this paper, the properties of 10 different feature selection algorithms for generalized additive models (GAMs) are compared on one simulated and two real-world datasets under concurvity. Concurvity can be interpreted as a redundancy in the feature set of a GAM. Like multicollinearity in linear models, concurvity causes unstable parameter estimates in GAMs and makes the marginal effect of features harder to interpret. Feature selection algorithms for GAMs can be separated into four clusters: stepwise, boosting, regularization and concurvity-controlled methods. Our numerical results show that algorithms with no constraints on concurvity tend to select a large feature set, without significant improvements in predictive performance compared to a more parsimonious feature set. A large feature set is accompanied by harmful concurvity in the proposed models. To tackle the concurvity phenomenon, recent feature selection algorithms such as the mRMR and the HSIC-Lasso incorporated some constraints on concurvity in their objective function. However, these algorithms interpret concurvity as a pairwise non-linear relationship between features, so they do not account for the case when a feature can be accurately estimated as a multivariate function of several other features. This is confirmed by our numerical results. Our own solution to the problem, a hybrid genetic–harmony search algorithm (HA), introduces constraints on multivariate concurvity directly. Due to this constraint, the HA proposes a small and non-redundant feature set with predictive performance similar to that of models with far more features.
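
The distinction between pairwise and multivariate concurvity drawn in the abstract can be illustrated with a small sketch. It is not the paper's concurvity measure and not the HA algorithm: the simulated data, the helper `r_squared`, and the use of ordinary least squares as a stand-in for the smooth functions of a GAM are all assumptions made only for illustration. One feature is built as an exact additive function of five independent features, so each pairwise relationship looks weak while the joint relationship is perfect.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of a least-squares fit of y on the columns of X plus an intercept.

    Hypothetical helper: linear fits stand in for the smooth terms of a GAM.
    """
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid.var() / y.var()

rng = np.random.default_rng(0)
n, k = 5000, 5
Z = rng.normal(size=(n, k))   # k mutually independent features (simulated)
target = Z.sum(axis=1)        # redundant feature: exact additive function of the others

# Pairwise view: how well does each single feature explain the redundant one?
pairwise = [r_squared(Z[:, [j]], target) for j in range(k)]

# Multivariate view: how well do all k features explain it jointly?
joint = r_squared(Z, target)

print("largest pairwise R^2:", round(max(pairwise), 3))
print("joint R^2:", round(joint, 3))
```

In this setup each pairwise R^2 is close to 1/k, so a criterion that only penalizes pairwise dependence would see little redundancy, whereas the joint fit shows that the feature is fully determined by the others.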

Item Type: Article
Uncontrolled Keywords: generalized additive model, feature selection, regularization, boosting, genetic algorithm, harmony search algorithm
Subjects: General statistics
DOI: https://doi.org/10.1007/s00180-022-01292-7
ID Code: 7707
Deposited By: MTMT SWORD
Deposited On: 14 Nov 2022 10:07
Last Modified: 14 Nov 2022 10:07
