Corvinus

Feature space reduction method for ultrahigh-dimensional, multiclass data: Random forest-based multiround screening (RFMS)

Hanczár, Gergely, Stippinger, Marcell ORCID: https://orcid.org/0000-0002-9954-8089, Hanák, Dávid, Kurbucz, Marcell Tamás ORCID: https://orcid.org/0000-0002-0121-6781, Törteli, Olivér Máté, Chripkó, Ágnes ORCID: https://orcid.org/0000-0002-2863-5257 and Somogyvári, Zoltán (2023) Feature space reduction method for ultrahigh-dimensional, multiclass data: Random forest-based multiround screening (RFMS). Machine Learning: Science and Technology, 4 (4). DOI https://doi.org/10.1088/2632-2153/ad020e

Official URL: https://doi.org/10.1088/2632-2153/ad020e


Abstract

In recent years, several screening methods have been published for ultrahigh-dimensional data that contain hundreds of thousands of features, many of which are irrelevant or redundant. However, most of these methods cannot handle data with thousands of classes. Prediction models built to authenticate users based on multichannel biometric data result in this type of problem. In this study, we present a novel method known as random forest-based multiround screening (RFMS) that can be effectively applied under such circumstances. The proposed algorithm divides the feature space into small subsets and executes a series of partial model builds. These partial models are used to implement tournament-based sorting and the selection of features based on their importance. This algorithm successfully filters irrelevant features and also discovers binary and higher-order feature interactions. To benchmark RFMS, a synthetic biometric feature space generator known as BiometricBlender is employed. Based on the results, the RFMS is on par with industry-standard feature screening methods, while simultaneously possessing many advantages over them.
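The tournament structure described in the abstract (split the features into small subsets, score each subset with a partial model, keep the best features, repeat) can be illustrated with a minimal sketch. This is not the authors' implementation: all function names and parameters below are hypothetical, and the per-subset scorer is a simple between-class to within-class variance ratio standing in for random forest feature importance, so that the block runs with the Python standard library alone. Unlike a partial random forest model, this univariate stand-in cannot detect feature interactions.

```python
import random
from statistics import mean


def variance_ratio(column, labels):
    """Stand-in importance (assumption): between-class / within-class variance."""
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(label, []).append(value)
    overall = mean(column)
    between = sum(len(g) * (mean(g) - overall) ** 2 for g in groups.values())
    within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    return between / (within + 1e-12)


def score_subset(X, y, subset):
    """Hypothetical 'partial model': scores only the features in one subset."""
    return {j: variance_ratio([row[j] for row in X], y) for j in subset}


def multiround_screen(X, y, subset_size=5, keep_per_subset=2, n_final=3, seed=0):
    """Tournament-style screening: the winners of each subset advance a round."""
    rng = random.Random(seed)
    survivors = list(range(len(X[0])))
    while len(survivors) > n_final:
        rng.shuffle(survivors)  # new random grouping in every round
        winners = []
        for start in range(0, len(survivors), subset_size):
            subset = survivors[start:start + subset_size]
            scores = score_subset(X, y, subset)
            ranked = sorted(subset, key=scores.get, reverse=True)
            winners.extend(ranked[:keep_per_subset])
        # Stop if a round would leave too few features or make no progress.
        if len(winners) < n_final or len(winners) >= len(survivors):
            break
        survivors = winners
    final_scores = score_subset(X, y, survivors)
    return sorted(survivors, key=final_scores.get, reverse=True)[:n_final]


def make_toy_data(n_samples=60, n_features=20, n_classes=3, seed=1):
    """Toy data (assumption): features 0 and 1 carry the class, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % n_classes
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        row[0] = label + rng.gauss(0, 0.1)
        row[1] = -2 * label + rng.gauss(0, 0.1)
        X.append(row)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print(multiround_screen(X, y))
```

To approximate the approach named in the title more closely, `score_subset` could be replaced by a function that fits a random forest on the subset's columns and returns its feature importances; the surrounding tournament loop would stay the same.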

Item Type: Article
Divisions: Institute of Data Analytics and Information Systems
Subjects: Knowledge economy, innovation; Computer science
DOI: https://doi.org/10.1088/2632-2153/ad020e
ID Code: 9520
Deposited By: MTMT SWORD
Deposited On: 21 Nov 2023 16:04
Last Modified: 21 Nov 2023 16:04
