The Nyström method for convex loss functions

Ernesto De Vito; Lorenzo Rosasco
2024-01-01

Abstract

We investigate an extension of classical empirical risk minimization, where the hypothesis space consists of a random subspace within a given Hilbert space. Specifically, we examine the Nyström method, where the subspaces are defined by a random subset of the data. This approach recovers the Nyström approximations used in kernel methods as a special case. Using random subspaces naturally leads to computational advantages, but a key question is whether it compromises learning accuracy. Recently, the trade-offs between statistics and computation have been explored for the square loss and for self-concordant losses, such as the logistic loss. In this paper, we extend these analyses to general convex Lipschitz losses, which may lack smoothness, such as the hinge loss used in support vector machines. Our main results show that there are various scenarios in which computational gains can be achieved without sacrificing learning performance. When specialized to smooth loss functions, our analysis recovers most previous results. Moreover, it allows us to consider classification problems and to translate the surrogate risk bounds into classification error bounds. This, in turn, makes it possible to compare the effect of Nyström approximations when combined with different loss functions, such as the hinge and the square loss.
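To make the setting described in the abstract concrete, the following minimal NumPy sketch (not taken from the paper) performs regularized empirical risk minimization with the non-smooth hinge loss over the random subspace spanned by m points drawn from the data, i.e. a plain Nyström approximation. The Gaussian kernel, the regularization parameter lam, the subgradient step size, and the helper names are illustrative assumptions; labels y are assumed to be in {-1, +1}.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian kernel matrix between the rows of A and the rows of B.
    sq = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * sq)

def nystrom_hinge_erm(X, y, m=50, lam=1e-3, gamma=1.0, steps=1000, seed=0):
    """Regularized ERM with the hinge loss over the random subspace spanned by
    m points sampled uniformly without replacement from the data (Nystrom)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.choice(n, size=min(m, n), replace=False)  # random subset of the data
    Knm = rbf_kernel(X, X[idx], gamma)                  # n x m cross-kernel matrix
    Kmm = rbf_kernel(X[idx], X[idx], gamma)             # m x m landmark kernel matrix
    c = np.zeros(len(idx))
    for t in range(1, steps + 1):
        margins = y * (Knm @ c)
        active = margins < 1.0                          # points where the hinge loss has a nonzero subgradient
        grad = -(Knm[active].T @ y[active]) / n + 2.0 * lam * (Kmm @ c)
        c -= grad / (lam * t)                           # decreasing step size (illustrative choice)
    return idx, c

# Usage sketch: predictions on new points are f(x) = sum_j c_j k(x, x_{i_j}), e.g.
#   y_pred = np.sign(rbf_kernel(X_test, X_train[idx], gamma) @ c)
```

The sketch only illustrates the structure of the estimator studied in the paper: the hypothesis is constrained to functions supported on the m sampled points, so the cost of each subgradient step scales with m rather than with the full sample size n.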
Files in this item:
File: 23-0768.pdf
Access: open access
Type: publisher's version
Size: 653.13 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11567/1229355