A fuzzy clustering approach to non-stationary data streams learning

Abdullatif, A.; Masulli, F.; Rovetta, S.; Cabri, A.
2017-01-01

Abstract

Multidimensional data streams are a major paradigm in data science. They are always related to time, albeit to different degrees. They may represent actual time series or quasi-stationary phenomena that feature longer-term variability, e.g., changes in statistical distribution or cyclical behavior. Under these non-stationary conditions, a given model is expected to be appropriate only in a temporal neighborhood of the time when it was learned or validated. Its validity may decrease smoothly with time (concept drift), or it may change suddenly, for instance when switching from one operating condition to a new one (concept shift). The proposed approach studies a clustering process that adapts to streaming data through continuous learning that exploits the input patterns as they arrive. Based on this idea, we specifically exploit the ability of possibilistic clustering [2] to cluster iteratively using both batch (sliding-window) and online (by-pattern) strategies that track and adapt to concept drift and shift in a natural way. Measures of fuzzy “outlierness” and fuzzy outlier density are obtained as intrinsic by-products of the possibilistic clustering technique adopted. These measures are used to modulate the amount of incremental learning according to the different regimes required by non-stationary data stream clustering. The proposed method is used as a generative model to assess and improve the accuracy of a forecaster based on a neural network ensemble [1]. The generative model provides two kinds of information: the first is used to partition the data so as to obtain a specialized forecaster for each cluster; the second allows us to provide a soft rejection, i.e., a fuzzy evaluation of outlierness that signals a possibly anomalous pattern.
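As a rough illustration of how outlierness and outlier density might select a learning regime, the following minimal Python sketch may help. It is not the authors' implementation. The exponential membership form, the outlierness index (one minus the total membership, clipped at zero), the threshold values, and all function names are assumptions introduced here for illustration only.

```python
import math

def memberships(x, centroids, betas):
    """Possibilistic memberships: each cluster scores x independently,
    so the values need not sum to 1 (assumed exponential form)."""
    return [math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / b)
            for c, b in zip(centroids, betas)]

def outlierness(x, centroids, betas):
    """Assumed index: 1 minus the total membership, clipped at 0."""
    return max(0.0, 1.0 - sum(memberships(x, centroids, betas)))

def outlier_density(window, centroids, betas):
    """Mean outlierness over a sliding window of patterns."""
    return sum(outlierness(x, centroids, betas) for x in window) / len(window)

def choose_regime(density, drift_at=0.3, shift_at=0.7):
    """Hypothetical thresholds mapping outlier density to a learning regime."""
    if density >= shift_at:
        return "relearn"      # concept shift: rebuild the model
    if density >= drift_at:
        return "incremental"  # concept drift: adapt gradually
    return "baseline"         # model still valid: little or no learning

# Toy model with two clusters in the plane (illustrative values).
centroids = [(0.0, 0.0), (5.0, 5.0)]
betas = [1.0, 1.0]
near = [(0.1, -0.1), (5.2, 4.9), (0.0, 0.2)]       # close to the centroids
far = [(20.0, 20.0), (-15.0, 12.0), (30.0, -8.0)]  # far from every centroid

print(choose_regime(outlier_density(near, centroids, betas)))
print(choose_regime(outlier_density(far, centroids, betas)))
```

In this sketch, a window of patterns close to the existing centroids yields a low outlier density and leaves the model unchanged, whereas a window of patterns far from every centroid yields a density close to 1 and triggers re-learning.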
2017
ISBN: 9783319686110
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11567/885685