Model-based Design of Experiments for Large Dataset

PESCE, ELENA
2021-10-22

Abstract

The first part of this thesis motivates adapting ideas and methods from the theory of model-based optimal design of experiments to the context of Big Data while guarding against different sources of bias. The key focus is on guarding against bias from confounders, and on how the theory of the design of experiments and randomization can be used to remove bias, depending on the constraints in the design. Starting with A/B experiments, widely used by major technology companies in online marketing, the theory of circuits is introduced and an algebraic method that gives a wide choice of randomization schemes is presented. Furthermore, a robust exchange algorithm is proposed to deal with the problem of outliers in a large dataset. The second part is based on a marine insurance use case sponsored by Swiss Re Corporate Solutions, the commercial insurance division of the Swiss Re Group. Several temporal disaggregation methods for dealing with time series collected at different frequencies are reviewed and applied to real data in order to obtain a curated dataset for predicting future losses.
22 Oct 2021
Big Data, Bias, Design of Experiments, Randomization, Circuits, Outliers, Temporal Disaggregation
Files in this item:
File: phdunige_3777600.pdf
Access: open access
Description: PhD thesis
Type: Doctoral thesis
Size: 1.9 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11567/1057610