
Resource Efficient Large-Scale Machine Learning

CARRATINO, LUIGI
2020-03-20

Abstract

Non-parametric models provide a principled way to learn non-linear functions. In particular, kernel methods are accurate prediction tools that rest on solid theoretical foundations. Although they enjoy optimal statistical properties, their applicability to real-world large-scale scenarios is limited by stringent computational requirements in terms of time and memory. Indeed, their computational costs scale at least quadratically with the number of points in the dataset, while many modern machine learning challenges require training on datasets of millions, if not billions, of points. In this thesis, we focus on scaling kernel methods, developing novel algorithmic solutions that incorporate budgeted computations. To derive these algorithms we mix ideas from statistics, optimization, and randomized linear algebra. We study the statistical and computational trade-offs of various non-parametric models, a key ingredient in deriving numerical solutions whose resources are tailored to the statistical accuracy allowed by the data. In particular, we study the estimator defined by stochastic gradients and random features, showing how all its free parameters provably govern both the statistical properties and the computational complexity of the algorithm. We then show how to blend the Nyström approximation with preconditioned conjugate gradient to derive a provably statistically optimal solver that easily scales to datasets of millions of points on a single machine. We also derive a provably accurate leverage score sampling algorithm that can further improve this solver. Finally, we show how the Nyström approximation with leverage scores can be used to scale Gaussian processes in a bandit optimization setting, deriving a provably accurate algorithm. The theoretical analysis and the new algorithms presented in this work represent a step towards a new generation of efficient non-parametric algorithms with minimal time and memory footprints.
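To illustrate the random features idea mentioned in the abstract, the sketch below implements random Fourier features, a standard construction that maps inputs to a finite-dimensional space where plain inner products approximate the Gaussian (RBF) kernel. This is a generic illustration, not the thesis's implementation; the function name `random_fourier_features` and the parameters `num_features` and `gamma` are chosen here for exposition.

```python
import numpy as np

def random_fourier_features(X, num_features, gamma, rng):
    """Map X to a space where Z @ Z.T approximates the RBF kernel
    exp(-gamma * ||x - y||^2), using random frequencies W and phases b."""
    d = X.shape[1]
    # Frequencies drawn from the Fourier transform of the RBF kernel
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
Z = random_fourier_features(X, num_features=2000, gamma=0.5, rng=rng)
K_approx = Z @ Z.T  # approximate kernel matrix
K_exact = np.exp(-0.5 * np.sum((X[:, None] - X[None]) ** 2, axis=-1))
max_err = np.max(np.abs(K_approx - K_exact))  # shrinks as num_features grows
```

The point of the construction is computational: a linear model on `Z` costs time linear in the number of points, rather than the at-least-quadratic cost of working with the full kernel matrix.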
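The Nyström approximation mentioned in the abstract can likewise be sketched in a few lines: restrict the kernel ridge regression solution to the span of a small set of landmark points, so only an n-by-m and an m-by-m kernel block are ever formed. This is a minimal generic sketch, not the thesis's solver (which further combines the approximation with preconditioned conjugate gradient); the name `nystrom_kernel_ridge` and the parameters `gamma` and `lam` are illustrative.

```python
import numpy as np

def nystrom_kernel_ridge(X, y, X_landmarks, gamma, lam):
    """Kernel ridge regression restricted to the span of m landmark
    points: solve (Knm^T Knm + lam * n * Kmm) alpha = Knm^T y."""
    def rbf(A, B):
        sq = np.sum((A[:, None] - B[None]) ** 2, axis=-1)
        return np.exp(-gamma * sq)
    Knm = rbf(X, X_landmarks)            # n x m cross-kernel block
    Kmm = rbf(X_landmarks, X_landmarks)  # m x m landmark block
    n = X.shape[0]
    alpha = np.linalg.solve(Knm.T @ Knm + lam * n * Kmm, Knm.T @ y)
    return lambda X_test: rbf(X_test, X_landmarks) @ alpha

# Fit a smooth 1-D function with 20 landmarks sampled uniformly at random
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(200, 1))
y = np.sin(3.0 * X[:, 0])
landmarks = X[rng.choice(200, size=20, replace=False)]
predict = nystrom_kernel_ridge(X, y, landmarks, gamma=5.0, lam=1e-6)
train_mse = np.mean((predict(X) - y) ** 2)  # small on this easy problem
```

Uniform landmark sampling is the simplest choice; sampling landmarks according to leverage scores, as studied in the thesis, can achieve the same accuracy with fewer landmarks.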
File in this record:

phdunige_3690127.pdf
Type: Doctoral thesis
Format: Adobe PDF
Size: 1.48 MB
Access: open access

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11567/1001813