The field of visual object recognition has seen a significant progress in recent years thanks to the availability of large-scale annotated datasets. However, labelling a large amount of data is difficult and costly and can be simply infeasible for some classes due to the long-tail instances distribution problem. Zero-Shot Learning (ZSL) is a framework that consider the case in which for some of the classes no labeled training examples are available to train the model. To solve the problem a multi-modal source of information, the class (semantic) embeddings, is exploited to extract knowledge from the available classes, the seen classes, and recognize novel categories for which the class embeddings is the only information available, namely, the unseen classes. To directly targeting the extreme imbalance in the data, in this thesis, we first propose a methodology to improve synthetic data generation for the unseen classes through their class embeddings. Second, we propose to generalize the Zero-Shot Learning framework towards a more competitive and real-world oriented scenario. Thus, we formalize the problem of Open Zero-Shot Learning as the problem of recognizing seen and unseen classes, as in ZSL, while also rejecting instances from unknown categories, for which neither visual data nor class embeddings are provided. Finally, we propose methodologies to not only generate unseen categories, but also the unknown ones.
Learning with Unavailable Data: Generalized and Open Zero-Shot Learning
MARMOREO, FEDERICO
2022-02-25
Abstract
The field of visual object recognition has seen a significant progress in recent years thanks to the availability of large-scale annotated datasets. However, labelling a large amount of data is difficult and costly and can be simply infeasible for some classes due to the long-tail instances distribution problem. Zero-Shot Learning (ZSL) is a framework that consider the case in which for some of the classes no labeled training examples are available to train the model. To solve the problem a multi-modal source of information, the class (semantic) embeddings, is exploited to extract knowledge from the available classes, the seen classes, and recognize novel categories for which the class embeddings is the only information available, namely, the unseen classes. To directly targeting the extreme imbalance in the data, in this thesis, we first propose a methodology to improve synthetic data generation for the unseen classes through their class embeddings. Second, we propose to generalize the Zero-Shot Learning framework towards a more competitive and real-world oriented scenario. Thus, we formalize the problem of Open Zero-Shot Learning as the problem of recognizing seen and unseen classes, as in ZSL, while also rejecting instances from unknown categories, for which neither visual data nor class embeddings are provided. Finally, we propose methodologies to not only generate unseen categories, but also the unknown ones.File | Dimensione | Formato | |
---|---|---|---|
phdunige_4611785.pdf
accesso aperto
Tipologia:
Tesi di dottorato
Dimensione
3.6 MB
Formato
Adobe PDF
|
3.6 MB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.