Structured Representation Learning for Visual Data


TALON, DAVIDE

2024-03-29

Abstract

Visual data such as images and videos consist of high-dimensional, noisy samples that are hard to understand and reason about. To process such data effectively, classic machine learning approaches relied on hand-crafted features extracted from the raw data to improve the performance of downstream models. However, designing features manually is a labor-intensive process that requires expertise in the domain of interest, making it costly and time-consuming. In contrast to human-engineered features, representation learning automatically discovers informative data representations that ease downstream processing in diverse applications. This dissertation investigates new approaches and assumptions for representation learning on visual data. Specifically, we focus on applying deep learning techniques to uncover and interpret the spatial and causal structure inherent in visual information. The research is organized around three main questions, each addressing a different aspect of visual data: what information to preserve, how to structure it, and how to adapt to new data. The first part of the thesis addresses the challenge of reconstructing spatial structure from unordered visual parts. We introduce GANZZLE and GANZZLE++, which overcome the combinatorial complexity of puzzles by using a generative approach to estimate the global solution. Unlike other deep learning solutions, these approaches can handle a variable number of pieces. The second part of the thesis explores how image representations are organized and structured. It examines disentanglement in data representation, which aims to align the learned representation with the data generative process. We develop new strategies that facilitate disentanglement and enhance the interpretability of the representation. Finally, the thesis examines the adaptability of structured representations to new and unseen environments. We present DECAF, a novel framework that alters existing representations to address a new, different environment. This approach highlights the importance of modular causal representations for the reuse and composition of available knowledge.


Thesis defense date: 29 March 2024

Keywords: Computer vision; structured representation; causal representation learning; disentanglement; jigsaw puzzle; generative adversarial network

Appears in types: Doctoral thesis

Files in this item:

File	Access	Type	Size	Format
phdunige_4970330.pdf	Open access	Doctoral thesis	23.32 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11567/1168641
