Sziklai, Balázs (ORCID: https://orcid.org/0000-0002-0068-8920), Baranyi, Máté (ORCID: https://orcid.org/0000-0003-4415-4805) and Héberger, Károly (2025). Does cross-validation work in telling rankings apart? Central European Journal of Operations Research, 33, pp. 1503-1528. DOI 10.1007/s10100-024-00932-1
Official URL: https://doi.org/10.1007/s10100-024-00932-1
Abstract
Although cross-validation (CV) is a standard technique in machine learning and data science, its efficacy in ranking environments remains largely unexplored. When the significance of differences is evaluated, cross-validation is typically coupled with a statistical test, such as the Dietterich, Alpaydin, or Wilcoxon test. In this paper, we evaluate the power and false positive error rate of the Dietterich, Alpaydin, and Wilcoxon tests combined with cross-validation, each operating with 5 to 10 folds, for a total of 18 variants. Our testing setup uses a ranking framework similar to the Sum of Ranking Differences (SRD) statistical procedure: we assume the existence of a reference ranking, and distances are measured in the $L_1$-norm. We test the methods in artificial scenarios as well as on real data from sports and chemistry. The choice of the optimal CV test method depends on preferences regarding the minimization of type I and type II errors, the size of the input, and the patterns anticipated in the data. Among the investigated input sizes, the Wilcoxon method with eight folds proved to be the most effective, although its performance in type I situations is subpar. While the Dietterich and Alpaydin methods excel in type I situations, they perform poorly in type II scenarios. The inadequate performance of these tests raises questions about their efficacy outside ranking environments as well.
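The setup described in the abstract (a reference ranking, $L_1$ distances between rank vectors, k-fold cross-validation paired with a Wilcoxon test) can be illustrated with a minimal Python sketch. It is not the authors' code, and several choices are assumptions: folds are leave-many-out (each fold of objects is omitted in turn and SRD is recomputed on the remaining objects), ties receive average ranks, and the per-fold SRD values of two candidate rankings are compared with an exact two-sided Wilcoxon signed-rank test. All function names (`average_ranks`, `srd`, `kfold_indices`, `wilcoxon_exact_p`, `cv_srd_compare`) are hypothetical. The Dietterich and Alpaydin variants are not shown.

```python
import itertools
import random
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values ascending (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for p in range(i, j + 1):
            ranks[order[p]] = mean_rank
        i = j + 1
    return ranks


def srd(candidate: Sequence[float], reference: Sequence[float]) -> float:
    """Sum of ranking differences: L1 distance between the two rank vectors."""
    rc, rr = average_ranks(candidate), average_ranks(reference)
    return sum(abs(a - b) for a, b in zip(rc, rr))


def kfold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle object indices and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def wilcoxon_exact_p(diffs: Sequence[float]) -> float:
    """Exact two-sided Wilcoxon signed-rank p-value; zero differences are dropped."""
    d = [x for x in diffs if x != 0]
    if not d:
        return 1.0
    ranks = average_ranks([abs(x) for x in d])
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    mean = sum(ranks) / 2
    observed = abs(w_plus - mean)
    # Under H0 every sign assignment of the ranks is equally likely: enumerate all.
    extreme = 0
    for signs in itertools.product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - mean) >= observed - 1e-12:
            extreme += 1
    return extreme / 2 ** len(ranks)


def cv_srd_compare(method_a: Sequence[float], method_b: Sequence[float],
                   reference: Sequence[float], k: int = 8, seed: int = 0
                   ) -> Tuple[List[float], List[float], float]:
    """Leave-many-out SRD values for two methods plus a Wilcoxon p-value."""
    n = len(reference)
    srd_a: List[float] = []
    srd_b: List[float] = []
    for fold in kfold_indices(n, k, seed):
        left_out = set(fold)
        kept = [i for i in range(n) if i not in left_out]
        ref_kept = [reference[i] for i in kept]
        srd_a.append(srd([method_a[i] for i in kept], ref_kept))
        srd_b.append(srd([method_b[i] for i in kept], ref_kept))
    p = wilcoxon_exact_p([a - b for a, b in zip(srd_a, srd_b)])
    return srd_a, srd_b, p


if __name__ == "__main__":
    ref = list(range(24))
    reversed_method = [23 - i for i in ref]  # ranks objects in the opposite order
    a, b, p = cv_srd_compare(ref, reversed_method, ref, k=8)
    print(a, b, p)  # p = 0.0078125 (2 of 256 sign patterns are as extreme)
```

With eight folds the exact null distribution has only 2^8 = 256 sign patterns, so enumeration is cheap; for larger k a normal approximation or a library routine would normally be used instead.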
| Item Type: | Article |
|---|---|
| Uncontrolled Keywords: | k-fold cross-validation ; Rankings ; Sum of ranking differences ; Wilcoxon test ; Alpaydin test ; Leave-many-out ; Multi-criteria decision-making |
| Divisions: | Institute of Operations and Decision Sciences |
| Subjects: | Decision making |
| Funders: | János Bolyai Research Scholarship of the Hungarian Academy of Sciences, National Research, Development and Innovation Fund, National Research, Development and Innovation Office of Hungary |
| Projects: | ÚNKP-23-5, K 138945, K 134260 |
| DOI: | 10.1007/s10100-024-00932-1 |
| ID Code: | 12393 |
| Deposited By: | MTMT SWORD |
| Deposited On: | 09 Jan 2026 08:51 |
| Last Modified: | 09 Jan 2026 08:51 |