A Support Vector Machine Classifier from a Bit-Constrained, Sparse and Localized Hypothesis Space

Anguita, Davide; Ghio, Alessandro; Oneto, Luca; Ridella, Sandro
2013

Abstract

According to the Structural Risk Minimization (SRM) principle, choosing an appropriate hypothesis space is of paramount importance for training effective classification models: properly selecting the complexity of the space makes it possible to optimize the performance of the learned functions. This selection is not straightforward, especially (though not solely) when few samples are available for deriving an effective model (e.g. in bioinformatics applications). In this paper, by exploiting a bit-based definition of Support Vector Machine (SVM) classifiers, selected from a hypothesis space described according to sparsity and locality principles, we show how the complexity of the corresponding space of functions can be effectively tuned through the number of bits used to represent the function. Real-world datasets are used to show how the number of bits and the degree of sparsity/locality imposed on the hypothesis space affect the complexity of the space of classifiers and, consequently, the performance of the model selected from this set.
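The link between bits, sparsity and complexity can be illustrated with a minimal sketch. This is not the authors' method. It assumes a linear classifier with `d` weights, each stored with `bits` bits (one code reserved for zero) and at most `max_nonzero` of them active. It also assumes the standard Hoeffding-plus-union-bound for a finite hypothesis space under the 0-1 loss. The function names are hypothetical, and locality is not modeled.

```python
import math
from typing import Optional


def log_cardinality(d: int, bits: int, max_nonzero: Optional[int] = None) -> float:
    """Natural log of |H| for a hypothetical bit-constrained, sparse space.

    Assumption: each of the d weights takes one of 2**bits values, one of
    which is zero, and at most max_nonzero weights are nonzero.
    """
    k_max = d if max_nonzero is None else min(max_nonzero, d)
    levels = 2 ** bits  # distinct values one weight can take
    # For each support size k: choose which k weights are active, then
    # give each active weight one of the (levels - 1) nonzero values.
    total = sum(math.comb(d, k) * (levels - 1) ** k for k in range(k_max + 1))
    return math.log(total)


def finite_class_bound(emp_risk: float, log_card: float, n: int,
                       delta: float = 0.05) -> float:
    """Upper bound on the true risk, valid for all h in a finite H.

    With probability at least 1 - delta over n i.i.d. samples:
        R(h) <= R_emp(h) + sqrt((ln|H| + ln(1/delta)) / (2n))
    """
    return emp_risk + math.sqrt((log_card + math.log(1.0 / delta)) / (2 * n))


if __name__ == "__main__":
    n, d = 100, 20  # small-sample setting; the values are illustrative only
    for bits in (2, 4, 8):
        for k in (3, d):
            lc = log_cardinality(d, bits, max_nonzero=k)
            print(bits, k, round(finite_class_bound(0.1, lc, n), 3))
```

With no sparsity constraint, the count reduces to `(2**bits)**d` by the binomial theorem, so `ln|H|` grows linearly in the number of bits. Restricting the number of nonzero weights shrinks it further. In both cases, fewer bits or fewer active weights give a tighter bound for a fixed empirical risk, which is the trade-off that SRM navigates.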

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11567/629588

Citations: Scopus 8