GCK-Maps: A Scene Unbiased Representation for Efficient Human Action Recognition
Nicora E.; Pastore V. P.; Noceti N.
2023-01-01
Abstract
Human action recognition from visual data is a popular topic in Computer Vision, applied in a wide range of domains. State-of-the-art solutions often rely on deep-learning approaches based on RGB videos and pre-computed optical flow maps. Recently, projections onto 3D Gray-Code Kernels (GCKs) have been assessed as an alternative way of representing motion, as they efficiently capture space-time structures. In this work, we investigate the use of GCK pooling maps, which we call GCK-Maps, as input for addressing Human Action Recognition with CNNs. We provide an experimental comparison with RGB and optical flow in terms of accuracy, efficiency, and scene-bias dependency. Our results show that GCK-Maps generally represent a valuable alternative to optical flow and RGB frames, with a significant reduction of the computational burden.
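To make the idea of GCK pooling maps concrete, the sketch below illustrates one way such maps could be built: a grayscale video volume is filtered with a few separable 3D Walsh-Hadamard kernels (the family underlying Gray-Code Kernels), and the absolute responses are pooled over time into 2D maps suitable as CNN input. This is a minimal illustrative example, not the authors' exact pipeline; the kernel ordering, kernel size, temporal max-pooling, and the function names (`walsh_hadamard_rows`, `gck_maps`) are assumptions introduced here for illustration only.

```python
# Illustrative sketch (hypothetical, not the paper's implementation):
# project a video volume onto separable 3D Walsh-Hadamard kernels and
# pool the responses over time into 2D maps.
import numpy as np
from scipy.ndimage import convolve1d

def walsh_hadamard_rows(order: int) -> np.ndarray:
    """Rows of a 2**order Walsh-Hadamard matrix, used as 1D kernel bank."""
    h = np.array([[1.0]])
    for _ in range(order):
        h = np.block([[h, h], [h, -h]])
    return h

def gck_maps(video: np.ndarray, order: int = 2, n_kernels: int = 4) -> np.ndarray:
    """
    video: (T, H, W) grayscale volume.
    Returns (n_kernels, H, W) pooled projection maps.
    Each 3D kernel is separable: the same 1D row applied along t, y, x.
    """
    rows = walsh_hadamard_rows(order)
    maps = []
    for k in range(min(n_kernels, rows.shape[0])):
        resp = video.astype(np.float32)
        # Separable 3D filtering along the temporal and spatial axes.
        for axis in range(3):
            resp = convolve1d(resp, rows[k], axis=axis, mode="nearest")
        # Temporal max-pooling of absolute responses -> one 2D map per kernel.
        maps.append(np.abs(resp).max(axis=0))
    return np.stack(maps)

if __name__ == "__main__":
    clip = np.random.rand(16, 112, 112)   # toy clip: 16 frames of 112x112
    pooled = gck_maps(clip)
    print(pooled.shape)                   # (4, 112, 112), stackable as CNN channels
```

In this sketch the pooled maps play the role that stacked optical-flow frames play in two-stream architectures: a compact, motion-sensitive input that can be fed to a standard 2D CNN without computing flow.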