The impact of rewriting on coverage constraint satisfaction


Because analytical processes increasingly affect our lives, growing effort is being devoted to technological solutions that help humans measure the bias introduced by such processes and understand its causes. Existing solutions target either the back-end or the front-end stages of the data processing pipeline and usually express bias in terms of a given diversity or fairness constraint. In our previous work [1], we proposed an approach for rewriting filtering and merge operations in pre-processing pipelines into the “closest” operations such that protected groups are adequately represented (i.e., covered) in the result. This is relevant because a category that is under-represented in an initial or intermediate dataset may remain under-represented in any subsequent analytical process. Since many potential rewritings exist, the proposed approach is approximate and relies on sample-based cardinality estimation, which introduces a trade-off between the accuracy and the efficiency of the process. In this paper, we investigate this trade-off by first presenting several measures that quantify the error introduced by the rewriting, which stems from the applied approximation and the selected sample. We then present a preliminary experimental evaluation of these measures on a real-world dataset.


Accinelli C.; Catania B.; Guerrini G.; Minisi S.

2021-01-01




Appears in the following categories:

04.01 - Contribution in conference proceedings

Files in this record:

File	Description	Type	Size	Format
PIE+Q_2.pdf (open access)	Contribution in conference proceedings	Published version	1.72 MB	Adobe PDF


Use this identifier to cite or link to this document: https://hdl.handle.net/11567/1071392
