What can we expect from a V1-MT feedforward architecture for optical flow estimation?

Solari, Fabio; Chessa, Manuela
2015-01-01

Abstract

Motion estimation has been studied extensively in neuroscience over the last two decades. Although there has been some early interaction between the biological and computer vision communities at the modelling level, comparatively little work has examined or extended biological models in terms of their engineering efficacy on modern optical flow estimation datasets. An essential contribution of this paper is to show how a neural model can be enriched to deal with real sequences. We start from a classical V1-MT feedforward architecture: V1 cells are modelled by motion energy (based on spatio-temporal filtering), and MT pattern cells by pooling V1 cell responses. The efficacy of this architecture, and its inherent limitations on real videos, are not known. To answer this question, we propose a velocity-space sampling of MT neurons (using a decoding scheme to obtain the local velocity from their activity) coupled with a multi-scale approach. We then evaluate our model on the Middlebury dataset; to the best of our knowledge, it is the only neural model evaluated on this benchmark. The results are promising and suggest several possible improvements, in particular to better handle motion discontinuities. Overall, this work provides a baseline for future developments of bio-inspired, scalable computer vision algorithms, and the code is publicly available to encourage research in this direction.
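The V1 stage described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a standard motion-energy formulation (squared responses of a quadrature pair of spatio-temporal Gabor filters, summed), applied here to a 1D-space-plus-time stimulus for brevity; the function names and tuning parameters are illustrative.

```python
import numpy as np

def gabor_st(xs, ts, fx, ft, sigma_x=2.0, sigma_t=2.0):
    """Quadrature pair of spatio-temporal Gabor filters tuned to spatial
    frequency fx and temporal frequency ft (preferred velocity ~ -ft/fx)."""
    X, T = np.meshgrid(xs, ts, indexing="ij")
    env = np.exp(-(X**2) / (2 * sigma_x**2) - (T**2) / (2 * sigma_t**2))
    phase = 2 * np.pi * (fx * X + ft * T)
    return env * np.cos(phase), env * np.sin(phase)

def motion_energy(stimulus, fx, ft):
    """V1-like motion energy: sum of squared even/odd filter responses,
    which makes the output invariant to stimulus phase."""
    n_x, n_t = stimulus.shape
    xs = np.arange(n_x) - n_x // 2
    ts = np.arange(n_t) - n_t // 2
    even, odd = gabor_st(xs, ts, fx, ft)
    r_even = np.sum(stimulus * even)
    r_odd = np.sum(stimulus * odd)
    return r_even**2 + r_odd**2

# Drifting sinusoid moving rightward at 1 pixel/frame.
n_x, n_t = 32, 32
X, T = np.meshgrid(np.arange(n_x), np.arange(n_t), indexing="ij")
stim = np.cos(2 * np.pi * 0.125 * (X - T))

e_right = motion_energy(stim, fx=0.125, ft=-0.125)  # tuned to +1 px/frame
e_left = motion_energy(stim, fx=0.125, ft=+0.125)   # tuned to -1 px/frame
print(e_right > e_left)  # the rightward-tuned unit responds more strongly
```

In the full model, a population of such units (spanning orientations, spatial frequencies, and scales) would feed the MT pooling stage, whose activity over a sampled velocity space is then decoded into a local velocity estimate.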
Files in this record:
File: SolariA15.pdf

Access: closed (copy available on request)

Type: published version
Format: Adobe PDF
Size: 3.28 MB

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11567/850998
Citations
  • PMC: not available
  • Scopus: 30
  • Web of Science: 23