Coverage-based Queries: Nondiscrimination Awareness in Data Transformation

Accinelli C.; Catania B.; Guerrini G.
2022-01-01

Abstract

When people-related data are used in automated decision-making processes, social inequities can be amplified. Developing technological solutions that satisfy nondiscrimination requirements, such as fairness, diversity, and coverage, is therefore one of the main challenges currently facing the data management and data analytics communities. In particular, coverage constraints guarantee that a dataset includes enough items for each (protected) category of interest, thus increasing diversity with the aim of limiting the introduction of bias in subsequent analytical steps. While coverage constraints have mainly been used to design data repair solutions, in our study we investigate their effects on data processing pipelines, with particular reference to data transformation. To this end, we first introduce coverage-based queries as a means of ensuring, through rewriting, that the result of a selection-based query satisfies the coverage constraints. We then present two approximate algorithms for coverage-based query processing: the first, covRew, initially introduced in [3], relies on data discretization and sampling; the second, covKnn, is a novel contribution and relies on a nearest-neighbour approach. The algorithms are experimentally compared with respect to efficiency and effectiveness on a real dataset.
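
To make the notion of a coverage constraint on a selection query result more concrete, the following is a minimal sketch in Python. It is not covRew or covKnn: it is a naive, exhaustive illustration that relaxes a single predicate of the form attr <= threshold until every protected category reaches its required count. The function names, the toy records, and the restriction to one selection attribute are all assumptions made for this example.

```python
from collections import Counter

def satisfies_coverage(rows, protected_attr, constraints):
    """True if every category has at least the required number of rows."""
    counts = Counter(row[protected_attr] for row in rows)
    return all(counts[cat] >= k for cat, k in constraints.items())

def naive_rewrite(data, sel_attr, threshold, protected_attr, constraints):
    """Hypothetical helper: find the smallest t >= threshold such that the
    query `sel_attr <= t` satisfies the coverage constraints.
    Returns (t, result) or None if no relaxation can satisfy them."""
    # Only values actually present in the data can change the result set.
    candidates = sorted(
        {threshold} | {r[sel_attr] for r in data if r[sel_attr] > threshold}
    )
    for t in candidates:
        result = [r for r in data if r[sel_attr] <= t]
        if satisfies_coverage(result, protected_attr, constraints):
            return t, result
    return None

# Toy data (invented for illustration only).
data = [
    {"id": 1, "age": 25, "gender": "F"},
    {"id": 2, "age": 28, "gender": "M"},
    {"id": 3, "age": 30, "gender": "M"},
    {"id": 4, "age": 35, "gender": "F"},
    {"id": 5, "age": 40, "gender": "M"},
]
constraints = {"F": 2, "M": 2}  # at least two rows per category

# The original query `age <= 30` returns only one "F" row, so it is
# relaxed to `age <= 35`, the closest threshold that meets the constraints.
print(naive_rewrite(data, "age", 30, "gender", constraints))
```

The sketch scans all candidate thresholds, which is only practical on tiny inputs. According to the abstract, the algorithms in the paper are approximate and rely on data discretization and sampling (covRew) or on a nearest-neighbour approach (covKnn).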

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11567/1145935