Model selection and error estimation without the agonizing pain

IRIS

How can we select the best performing data-driven model? How can we rigorously estimate its generalization error? Statistical learning theory (SLT) answers these questions by deriving nonasymptotic bounds on the generalization error of a model or, in other words, by delivering upper bounding of the true error of the learned model based just on quantities computed on the available data. However, for a long time, SLT has been considered only as an abstract theoretical framework, useful for inspiring new learning approaches, but with limited applicability to practical problems. The purpose of this review is to give an intelligible overview of the problems of model selection (MS) and error estimation (EE), by focusing on the ideas behind the different SLT-based approaches and simplifying most of the technical aspects with the purpose of making them more accessible and usable in practice. We start by presenting the seminal works of the 80s until the most recent results, then discuss open problems and finally outline future directions of this field of research. This article is categorized under: Technologies > Statistical Fundamentals Algorithmic Development > Statistics.

Model selection and error estimation without the agonizing pain

Oneto, Luca

2018-01-01

Abstract

How can we select the best performing data-driven model? How can we rigorously estimate its generalization error? Statistical learning theory (SLT) answers these questions by deriving nonasymptotic bounds on the generalization error of a model or, in other words, by delivering upper bounding of the true error of the learned model based just on quantities computed on the available data. However, for a long time, SLT has been considered only as an abstract theoretical framework, useful for inspiring new learning approaches, but with limited applicability to practical problems. The purpose of this review is to give an intelligible overview of the problems of model selection (MS) and error estimation (EE), by focusing on the ideas behind the different SLT-based approaches and simplifying most of the technical aspects with the purpose of making them more accessible and usable in practice. We start by presenting the seminal works of the 80s until the most recent results, then discuss open problems and finally outline future directions of this field of research. This article is categorized under: Technologies > Statistical Fundamentals Algorithmic Development > Statistics.

Scheda breve

Scheda completa

Scheda completa (DC)

Anno

2018

Appare nelle tipologie:

01.01 - Articolo su rivista

File in questo prodotto:

File	Dimensione	Formato
J032 - WIRES.pdf accesso chiuso Tipologia: Documento in versione editoriale Dimensione 6.18 MB Formato Adobe PDF Visualizza/Apri Richiedi una copia	6.18 MB	Adobe PDF	Visualizza/Apri Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11567/914702

Citazioni

ND

17

12

social impact