Data mining applications explore large amounts of heterogeneous data in search of consistent information. In such a challenging context, empirical learning methods aim to optimize prediction on unseen data, and an accurate estimate of the generalization error is of paramount importance. The paper shows that the theoretical formulation based on the Vapnik-Chervonenkis dimension (d vc ) can be of practical interest when applied to clustering methods for data-mining applications. The presented research adopts the K-Winner Machine (KWM) as a clustering-based, semi-supervised classifier; in addition to fruitful theoretical properties, the model provides a general criterion for evaluating the applicability of Vapnik's generalization predictions in data mining. The general approach is verified experimentally in the practical problem of detecting intrusions in computer networks. Empirical results prove that the KWM model can effectively support such a difficult classification task and combine unsupervised and supervised.

Non-stationary Data Mining: the Network Security Issue

GASTALDO, PAOLO;ZUNINO, RODOLFO
2008-01-01

Abstract

Data mining applications explore large amounts of heterogeneous data in search of consistent information. In such a challenging context, empirical learning methods aim to optimize prediction on unseen data, and an accurate estimate of the generalization error is of paramount importance. The paper shows that the theoretical formulation based on the Vapnik-Chervonenkis dimension (d vc ) can be of practical interest when applied to clustering methods for data-mining applications. The presented research adopts the K-Winner Machine (KWM) as a clustering-based, semi-supervised classifier; in addition to fruitful theoretical properties, the model provides a general criterion for evaluating the applicability of Vapnik's generalization predictions in data mining. The general approach is verified experimentally in the practical problem of detecting intrusions in computer networks. Empirical results prove that the KWM model can effectively support such a difficult classification task and combine unsupervised and supervised.
2008
978-3-540-87558-1
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11567/238847
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? 0
social impact