An Anomaly Detection Approach for Plankton Species Discovery
Pastore, Vito Paolo;
2022-01-01
Abstract
Plankton are among the most abundant and diverse classes of microscopic organisms inhabiting the Earth. Their enormous intra- and inter-species genetic and phenotypic diversity, coupled with the limited availability of large survey datasets, makes it hard to obtain a complete representation of this important class of organisms. Hence, the classification accuracy of supervised machine learning algorithms, however novel, is bound to be limited by the incompleteness of the training data. In this work we introduce an efficient pipeline centered on a novel anomaly detection algorithm to discover and classify new plankton species in situ, with the aim of automatically populating a plankton database in an unsupervised fashion. The pipeline uses anomaly detection to separate a novel species from those contained in an initial, existing database. Our results show that the implemented algorithm outperforms four state-of-the-art outlier detection methods on the plankton dataset used in our analysis. Finally, using a leave-one-out approach, we show that our pipeline is able to identify unknown plankton species with high accuracy.
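The leave-one-out evaluation mentioned in the abstract can be illustrated with a minimal sketch. This is not the algorithm from the paper: the detector below is a simple per-class standardized-distance rule chosen only for illustration, the features are synthetic 2-D points, and the species names, `THRESHOLD`, and all function names are hypothetical. Each species in turn is withheld from the "database", detectors are fitted on the remaining species, and a sample counts as novel only if every known-species detector flags it as an outlier.

```python
import random
import statistics

THRESHOLD = 4.0  # assumed cut-off, in standard deviations (illustrative only)


def fit_class(samples):
    """Fit per-feature mean and standard deviation for one known species."""
    columns = list(zip(*samples))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) or 1.0 for c in columns]  # guard against zero spread
    return means, stds


def score(model, x):
    """Largest absolute standardized deviation of x from the class mean."""
    means, stds = model
    return max(abs(v - m) / s for v, m, s in zip(x, means, stds))


def is_novel(models, x, threshold=THRESHOLD):
    """A sample is novel if it is an outlier for every known species."""
    return all(score(m, x) > threshold for m in models)


def make_species(rng, center, n):
    """Draw n synthetic feature vectors around a cluster center."""
    return [[rng.gauss(c, 1.0) for c in center] for _ in range(n)]


def leave_one_species_out(train, test):
    """Withhold each species in turn and measure how often it is flagged."""
    results = {}
    for held_out in train:
        models = [fit_class(s) for name, s in train.items() if name != held_out]
        novel = test[held_out]
        known = [x for name, s in test.items() if name != held_out for x in s]
        detection = sum(is_novel(models, x) for x in novel) / len(novel)
        false_alarm = sum(is_novel(models, x) for x in known) / len(known)
        results[held_out] = (detection, false_alarm)
    return results


rng = random.Random(0)  # fixed seed so the sketch is reproducible
centers = {
    "species_a": (0.0, 0.0),
    "species_b": (10.0, 0.0),
    "species_c": (0.0, 10.0),
}
train = {k: make_species(rng, c, 100) for k, c in centers.items()}
test = {k: make_species(rng, c, 50) for k, c in centers.items()}
results = leave_one_species_out(train, test)
for name, (det, fa) in results.items():
    print(f"{name}: detection={det:.2f} false_alarm={fa:.2f}")
```

The synthetic clusters here are far better separated than real plankton features are likely to be, so the printed rates say nothing about the paper's results. The sketch only shows the shape of the protocol: fit on known species, test on a withheld one, and report both the detection rate and the false-alarm rate.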