The field of visual object recognition has seen a significant progress in recent years thanks to the availability of large-scale annotated datasets. However, labelling a large amount of data is difficult and costly and can be simply infeasible for some classes due to the long-tail instances distribution problem. Zero-Shot Learning (ZSL) is a framework that consider the case in which for some of the classes no labeled training examples are available to train the model. To solve the problem a multi-modal source of information, the class (semantic) embeddings, is exploited to extract knowledge from the available classes, the seen classes, and recognize novel categories for which the class embeddings is the only information available, namely, the unseen classes. To directly targeting the extreme imbalance in the data, in this thesis, we first propose a methodology to improve synthetic data generation for the unseen classes through their class embeddings. Second, we propose to generalize the Zero-Shot Learning framework towards a more competitive and real-world oriented scenario. Thus, we formalize the problem of Open Zero-Shot Learning as the problem of recognizing seen and unseen classes, as in ZSL, while also rejecting instances from unknown categories, for which neither visual data nor class embeddings are provided. Finally, we propose methodologies to not only generate unseen categories, but also the unknown ones.

Learning with Unavailable Data: Generalized and Open Zero-Shot Learning

MARMOREO, FEDERICO
2022-02-25

Abstract

The field of visual object recognition has seen a significant progress in recent years thanks to the availability of large-scale annotated datasets. However, labelling a large amount of data is difficult and costly and can be simply infeasible for some classes due to the long-tail instances distribution problem. Zero-Shot Learning (ZSL) is a framework that consider the case in which for some of the classes no labeled training examples are available to train the model. To solve the problem a multi-modal source of information, the class (semantic) embeddings, is exploited to extract knowledge from the available classes, the seen classes, and recognize novel categories for which the class embeddings is the only information available, namely, the unseen classes. To directly targeting the extreme imbalance in the data, in this thesis, we first propose a methodology to improve synthetic data generation for the unseen classes through their class embeddings. Second, we propose to generalize the Zero-Shot Learning framework towards a more competitive and real-world oriented scenario. Thus, we formalize the problem of Open Zero-Shot Learning as the problem of recognizing seen and unseen classes, as in ZSL, while also rejecting instances from unknown categories, for which neither visual data nor class embeddings are provided. Finally, we propose methodologies to not only generate unseen categories, but also the unknown ones.
25-feb-2022
Zero-Shot Learning; Computer Vision
File in questo prodotto:
File Dimensione Formato  
phdunige_4611785.pdf

accesso aperto

Tipologia: Tesi di dottorato
Dimensione 3.6 MB
Formato Adobe PDF
3.6 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11567/1069118
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact