The ‘K’ in K-fold Cross Validation
Davide Anguita; Alessandro Ghio; Luca Oneto; Sandro Ridella
2012-01-01
Abstract
The K-fold Cross Validation (KCV) technique is one of the most widely used approaches among practitioners for model selection and error estimation of classifiers. KCV consists of splitting a dataset into k subsets; then, iteratively, some of them are used to learn the model, while the others are used to assess its performance. However, despite the success of KCV, only rule-of-thumb methods exist for choosing the number and the cardinality of the subsets. We propose here an approach that allows the number of subsets of the KCV to be tuned in a data-dependent way, so as to obtain a reliable, tight, and rigorous estimate of the probability of misclassification of the chosen model.
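The KCV procedure described above can be sketched in a few lines of code. The following Python snippet is an illustrative implementation of plain K-fold cross validation only (it is not the data-dependent tuning method the paper proposes): the data are partitioned into k disjoint folds, and each fold in turn is held out for testing while the remaining folds train the model. The helper names (`k_fold_indices`, `k_fold_error`, `fit`) are our own, chosen for clarity.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Partition indices 0..n-1 into k disjoint folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # shuffle so folds are random subsets
    return [idx[i::k] for i in range(k)]

def k_fold_error(data, labels, k, fit, seed=0):
    """Average misclassification rate over k rounds.

    `fit(train_data, train_labels)` must return a predictor function;
    each fold serves exactly once as the held-out test set.
    """
    folds = k_fold_indices(len(data), k, seed)
    total_err = 0.0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(data)) if i not in held]
        predict = fit([data[i] for i in train], [labels[i] for i in train])
        mistakes = sum(predict(data[i]) != labels[i] for i in held_out)
        total_err += mistakes / len(held_out)
    return total_err / k  # estimate of the probability of misclassification
```

For example, plugging in a trivial majority-class "classifier" returns the held-out frequency of the minority class, averaged over the folds; with a real learner, the same loop yields the KCV error estimate whose dependence on k the paper analyzes.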