On the Evaluation of Video-Based Crowd Counting Models

Ledda E.; Roli F.
2022-01-01

Abstract

Crowd counting is a challenging and relevant computer vision task. Most existing methods are image-based, i.e., they exploit only the spatial information of a single image to estimate the corresponding people count. Recently, video-based methods have been proposed to improve counting accuracy by also exploiting temporal information arising from the correlation between adjacent frames. In this work, we point out the need to properly evaluate the specific contribution of temporal information over that of spatial information. This issue has not been discussed in existing work, and in some cases the evaluation has been carried out in a way that may overestimate the contribution of temporal information. To address this issue, we propose a categorisation of existing video-based models, discuss how existing work has evaluated the contribution of temporal information, and propose an evaluation approach aimed at a more complete assessment of two different categories of video-based methods. Finally, we illustrate our approach for one specific category through experiments on several benchmark video data sets.
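As a generic illustration only (not the evaluation approach proposed in the paper, whose details are not given here), the sketch below compares a video-based counter with an image-based counterpart by the mean absolute error (MAE) of per-frame people counts, a metric commonly used in crowd counting. The helper names `mean_absolute_error` and `temporal_gain` are hypothetical. The assumption is that both models are evaluated on the same frames; the difference can only plausibly be attributed to temporal information if the two models are otherwise comparable (e.g., same spatial backbone and training data), which is the kind of confound the abstract warns about.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], ground_truth: Sequence[float]) -> float:
    """MAE between per-frame predicted and ground-truth people counts."""
    if len(predicted) != len(ground_truth) or len(predicted) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - g) for p, g in zip(predicted, ground_truth)) / len(predicted)


def temporal_gain(
    image_counts: Sequence[float],
    video_counts: Sequence[float],
    true_counts: Sequence[float],
) -> float:
    """Hypothetical helper: MAE reduction of a video-based model over an
    image-based counterpart on the same frames. A positive value means the
    video-based model is more accurate on these frames; it does not by itself
    prove that temporal information is responsible for the improvement."""
    return (mean_absolute_error(image_counts, true_counts)
            - mean_absolute_error(video_counts, true_counts))


if __name__ == "__main__":
    true_counts = [10, 20, 30]     # illustrative ground truth, not real data
    image_counts = [12, 18, 33]    # image-based predictions (MAE = 7/3)
    video_counts = [11, 20, 31]    # video-based predictions (MAE = 2/3)
    print(temporal_gain(image_counts, video_counts, true_counts))
```

With these made-up numbers the gain is 5/3 (about 1.67 people per frame). A single MAE difference like this is exactly the kind of aggregate figure that can overstate the role of temporal information when the compared models differ in more than their use of adjacent frames.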
ISBN: 978-3-031-06432-6; 978-3-031-06433-3


Use this identifier to cite or link to this document: https://hdl.handle.net/11567/1158798
