Structured Representation Learning for Visual Data

TALON, DAVIDE
2024-03-29

Abstract

Visual data such as images and videos consist of high-dimensional, noisy samples that are hard to understand and reason about. To process such data effectively, classic machine learning approaches relied on features hand-crafted from raw data to improve the performance of downstream models. However, manual feature design is a labor-intensive process that requires expertise in the domain of interest, making it costly and time-consuming. In contrast to human-engineered features, representation learning automatically discovers informative data representations that ease downstream processing in diverse applications. This dissertation investigates new approaches and assumptions for representation learning on visual data. Specifically, we focus on applying deep learning techniques to uncover and interpret the spatial and causal structure inherent in visual information. The research is organized around three main questions: what information to preserve, how to structure it, and how to adapt it to new data. The first part of the thesis addresses the challenge of reconstructing spatial structure from unordered visual parts. We introduce GANZZLE and GANZZLE++, which overcome the combinatorial complexity of jigsaw puzzles by using a generative approach to estimate the global solution. Unlike other deep learning solutions, these approaches can handle a variable number of pieces. The second part of the thesis explores the organization and structuring of image representations. It examines disentanglement in data representation, which aims to align the learned representation with the data-generating process. We develop new strategies that facilitate disentanglement and enhance the interpretability of the representation.
Finally, the thesis examines the adaptability of structured representations to new and unseen environments. We present DECAF, a novel framework that adapts existing representations to a new, different environment. This approach highlights the importance of modular causal representations for the reuse and composition of available knowledge.
Computer vision; structured representation; causal representation learning; disentanglement; jigsaw puzzle; generative adversarial network
Files in this record:
phdunige_4970330.pdf
Access: open access
Type: Doctoral thesis
Size: 23.32 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11567/1168641