Cross-view action recognition with small-scale datasets

Goyal G.; Noceti N.; Odone F.
2022-01-01

Abstract

Cross-view action recognition refers to the task of recognizing actions observed from viewpoints that are unfamiliar to the system. To address the complexity of the problem, state-of-the-art methods often rely on large-scale datasets, where the variability of viewpoints is appropriately represented. However, this comes at a significant price in terms of computational power, time, cost, and energy, both for gathering and annotating data and for training the model. We propose a methodological pipeline that tackles the same challenges with a specific focus on small-scale datasets and attention to the amount of resources required. The core idea of our method is to transfer knowledge from an intermediate, pre-trained representation, under the hypothesis that it may already implicitly incorporate relevant cues for the task. We rely on an effective domain adaptation strategy coupled with the design of a robust classifier that promotes view-invariant properties and allows us to efficiently generalise action recognition to unseen viewpoints. In contrast to other state-of-the-art methods that also employ alternative data modalities, our approach is purely video-based and thus has a wider field of application. We present a thorough experimental analysis that justifies the design choices of the pipeline and provides a comparison with existing approaches in the two main scenarios of one-one learning and multiple-view learning, where our approach provides superior performance.

Files in this record:
IMAVIS_1-s2.0-S0262885622000324-main.pdf (publisher's version, Adobe PDF, 1.38 MB, closed access)

Use this identifier to cite or link to this document: https://hdl.handle.net/11567/1088068