
A large experimentation to analyze the effects of implementation bugs in machine learning algorithms

Leotta M.;Olianas D.;Ricca F.
2022

Abstract

In recent years, Machine Learning (ML) has become pervasive in software systems: it is applied in many different contexts such as medicine, bioinformatics, finance, and automotive, to mention only a few. One of the main drawbacks recognized in the literature is that there are still no consolidated approaches and strategies to ensure the reliability of the code implementing the underlying theoretical ML algorithms. This fact can have a strong impact, since many critical software systems rely on ML algorithms to implement intelligent behaviors, and thus on (potentially) unreliable code that could cause, in extreme cases, catastrophic errors: e.g., loss of life due to a wrong diagnosis by an ML-based cancer classifier. Our work aims to better understand the impact that implementation bugs have on the results provided by ML algorithms. Such an analysis is fundamental to defining novel techniques able to detect bugs in ML-based software systems. Thus, we extensively analyzed thousands of bugs in eight ML algorithms (in particular, four classification algorithms and four clustering ones). The bugs were injected using an automatic mutation tool able to mimic realistic errors in the algorithms' source code. The empirical study shows that a large fraction of the injected bugs are silent, since they do not influence the results the eight algorithms provide on the 17 datasets employed in our study; in the remaining cases, the bugs emerge as runtime errors, exceptions, or modified accuracy of the predictions. Moreover, we also discovered that about 1% of the injected bugs are extremely dangerous, since they drastically affect the quality of the predictions only in rare cases and with specific datasets, increasing the possibility of their going unnoticed.
The fact that a considerable portion of the bugs does not influence the behavior of the algorithms on the datasets employed in our study poses a considerable problem: among them, further dangerous silent bugs could be present. These could emerge when the implementations of the algorithms are applied to a novel dataset or with different settings. Thus, the problem of dangerous silent bugs is potentially more pervasive than shown in our study.
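To make the notion of a "silent" injected bug concrete, the sketch below mutates a single relational operator in a toy 1-nearest-neighbour classifier and checks whether any prediction changes on a given set of inputs. This is an illustrative example only, not the mutation tool used in the study: the functions `nn_predict`, `nn_predict_mutant`, and `mutant_is_silent` are hypothetical names introduced here.

```python
def nn_predict(train, labels, x):
    """Original: predict the label of the closest training point."""
    best_i, best_d = 0, abs(train[0] - x)
    for i, t in enumerate(train):
        d = abs(t - x)
        if d < best_d:          # original comparison
            best_i, best_d = i, d
    return labels[best_i]

def nn_predict_mutant(train, labels, x):
    """Mutant: '<' replaced by '<=' (a typical relational-operator mutation)."""
    best_i, best_d = 0, abs(train[0] - x)
    for i, t in enumerate(train):
        d = abs(t - x)
        if d <= best_d:         # injected bug: ties now prefer later points
            best_i, best_d = i, d
    return labels[best_i]

def mutant_is_silent(train, labels, test_points):
    """The mutant is 'silent' on a dataset if every prediction is unchanged."""
    return all(nn_predict(train, labels, x) == nn_predict_mutant(train, labels, x)
               for x in test_points)

train, labels = [0.0, 1.0, 4.0], ["a", "a", "b"]
print(mutant_is_silent(train, labels, [0.2, 3.5]))   # no distance ties -> True
print(mutant_is_silent(train, labels, [2.5]))        # tie between 1.0 and 4.0 -> False
```

The same mutant is silent on one dataset and behavior-changing on another, which is exactly why silent bugs may surface only when an implementation meets a new dataset or configuration.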


Use this identifier to cite or link to this item: http://hdl.handle.net/11567/1085962
