Using Unsupervised Analysis to Constrain Generalization Bounds for Support Vector Classifiers

Sandro Ridella, Rodolfo Zunino, Paolo Gastaldo, Davide Anguita
2010-01-01

Abstract

A crucial issue in designing learning machines is selecting the correct model parameters. When the number of available samples is small, theoretical sample-based generalization bounds can prove effective, provided that they are tight and track the validation error correctly. The Maximal Discrepancy approach is a promising model-selection technique for Support Vector Machines (SVMs): it estimates a classifier's generalization performance through multiple training cycles on randomly labeled data. This paper presents a general method for computing generalization bounds for SVMs that refers the SVM parameters to an unsupervised solution, and shows that this approach yields tight bounds and effective model selection. When estimating the generalization error, the unsupervised reference constrains the complexity of the learning machine, potentially reducing the number of admissible hypotheses sharply. Although the methodology is general, the method described in the paper adopts Vector Quantization (VQ) as the representation paradigm and introduces a biased-regularization approach in both bound computation and learning. Experimental results validate the proposed method on complex real-world data sets.
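
The Maximal Discrepancy idea the abstract refers to lends itself to a compact illustration. Below is a minimal, hypothetical Python sketch of an MD-style model-selection loop for an SVM, assuming binary labels in {-1, +1}; the penalty form, the RBF kernel, the C grid, and the cycle count are assumptions for illustration, not the paper's exact bound computation (which additionally constrains the hypothesis space with an unsupervised VQ reference).

# Illustrative sketch of Maximal-Discrepancy-style model selection for an
# SVM. NOT the authors' exact procedure: the penalty form, kernel, grid,
# and cycle count are assumptions made for this example.
import numpy as np
from sklearn.svm import SVC

def md_penalty(X, y, make_clf, n_cycles=10, seed=0):
    """Estimate a discrepancy penalty by retraining on randomly flipped
    labels: the better the hypothesis class fits noise, the larger the
    penalty."""
    rng = np.random.default_rng(seed)
    n = len(y)
    penalties = []
    for _ in range(n_cycles):
        # Flip the labels of a random half of the sample (y in {-1, +1}).
        flip = rng.permutation(n) < n // 2
        y_rand = np.where(flip, -y, y)
        clf = make_clf().fit(X, y_rand)
        err_rand = np.mean(clf.predict(X) != y_rand)
        # Training error near 0 on random labels signals high capacity.
        penalties.append(0.5 - err_rand)
    return float(np.mean(penalties))

def md_select(X, y, C_grid=(0.1, 1.0, 10.0)):
    """Pick the SVM regularization constant C minimizing
    empirical error + discrepancy penalty."""
    best_C, best_bound = None, np.inf
    for C in C_grid:
        make_clf = lambda C=C: SVC(kernel="rbf", C=C)
        emp_err = np.mean(make_clf().fit(X, y).predict(X) != y)
        bound = emp_err + md_penalty(X, y, make_clf)
        if bound < best_bound:
            best_C, best_bound = C, bound
    return best_C, best_bound

# Example usage on synthetic data:
#   X = np.random.randn(100, 2); y = np.sign(X[:, 0])
#   print(md_select(X, y))

In the paper's setting, the unsupervised VQ solution would additionally bias the admissible hypotheses via biased regularization, tightening the resulting bound; that step is omitted from this sketch.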

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11567/250846
Citations
  • Scopus: 20
  • Web of Science (ISI): 17