An Anomaly Detection Approach for Plankton Species Discovery

Pastore, Vito Paolo
2022-01-01

Abstract

Plankton are among the most abundant and diverse classes of microscopic organisms on Earth. Their enormous intra- and inter-species genetic and phenotypic diversity, coupled with the limited amount of large-scale survey data, makes it hard to obtain a complete representation of this important class of organisms. Hence, the classification accuracy of supervised machine learning algorithms, however novel, is bound to be limited by the incompleteness of the training data. In this work we introduce an efficient pipeline centered on a novel anomaly detection algorithm to discover and classify new plankton species in situ, with the aim of automatically populating a plankton database in an unsupervised fashion. The pipeline uses anomaly detection to separate a novel species from those contained in an initial, existing database. Our results show that the implemented algorithm outperforms four state-of-the-art outlier detection methods on the plankton dataset used in our analysis. Finally, using a leave-one-out approach, we show that our pipeline is able to identify unknown plankton species with high accuracy.
2022
978-3-031-06429-6
978-3-031-06430-2

Use this identifier to cite or link to this document: https://hdl.handle.net/11567/1087288