Fixation Drifts in Neuromorphic Vision Sensing & Computing: a Natural Approach for Effective Space-Time Encoding of Static Scenes
TESTA, SIMONE
2023-06-30
Abstract
Despite the name, human fixation is a highly dynamic process. Unlike a photographer, who puts great effort into keeping the lens still, the eyes move incessantly even during fixation, to the point that, without such motion, visibility would rapidly fade. Beyond preventing perceptual fading, many functional advantages of fixational eye movements (FEMs) have been identified in biology, persuading the scientific community that FEMs are far from the nuisance they were originally believed to be. On the wave of these findings, I investigate their role in neuromorphic vision sensing and processing, thereby bridging two active research fields: vision neuroscience and neuromorphic computing. By emulating primary functionalities of the human retina, neuromorphic cameras are convenient devices both for investigating (by modeling) visual neuroscience theories and for building power-efficient robotic systems. I exploit neuromorphic technology to examine the effects of active viewing strategies during fixation, emphasizing the benefits they provide for both the spatial and the temporal encoding of static visual inputs. My aim is to confirm key biological evidence and to propose new arguments on the importance of such movements for artificial vision systems, further extendable to biological ones. The scientific questions I tackle can be expressed as follows: (1) “How do bio-inspired fixation drifts influence the spatio-temporal representation of static visual features in neuromorphic sensing?” and (2) “How does this affect subsequent processing stages?”. The first step toward answering both questions is the design of a neuromorphic active-vision setup – able to finely reproduce microscopic, randomly drifting, bio-inspired motion patterns – combined with the development of a suitable software toolkit. This makes it possible to effortlessly acquire significant amounts of event-based data from different static visual sources for further analysis.
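As an illustration of the kind of motion pattern such a setup would reproduce, the sketch below generates a two-dimensional Brownian random walk, a common first-order model of fixational drift in the FEM literature. The function name, diffusion constant, and units are illustrative assumptions, not details taken from the thesis.

```python
import numpy as np

def brownian_drift(n_steps=1000, dt=1e-3, diffusion=100.0, seed=0):
    """Simulate a 2D Brownian (random-walk) drift trajectory.

    `diffusion` plays the role of a diffusion constant in arcmin^2/s,
    in the range often reported for human fixational drift (an
    illustrative value). Returns positions in arcmin relative to the
    fixation point, one row per time step.
    """
    rng = np.random.default_rng(seed)
    # Each step is Gaussian with per-axis variance 2 * D * dt.
    steps = rng.normal(scale=np.sqrt(2 * diffusion * dt), size=(n_steps, 2))
    return np.cumsum(steps, axis=0)

traj = brownian_drift()  # (1000, 2) array of drift positions
```

Such a trajectory can then drive a motion stage (or shift a stimulus on a display) in front of the event camera, so that a static scene produces a continuous stream of events.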
I then characterize the output signal of the silicon retina using traditional computer-vision algorithms, recent deep-learning architectures, and emerging spiking neural networks. Specifically, to answer the first question, I examine purely spatial information gathered from event-based recordings of natural images and synthetic stimuli. I adopt both traditional amplitude-based image-correlation approaches (frequently employed in efficient-coding studies in vision science) and more sophisticated phase-based examination techniques (coupled with Gabor filter banks for robust local feature extraction, and with metrics inherited from neuroimaging research). The influence of motion isotropy is then evaluated with circular-statistics tests and image deconvolution techniques (refined using Tikhonov regularization). Finally, to conclude the first question, I assess the distribution of events along the temporal dimension using a custom deep-learning pipeline, together with neuromorphic data recorded from a large and well-established computer-vision dataset. The pipeline uses progressively increasing temporal scales of the data stream, and mainly comprises state-of-the-art 2D and 3D convolutional neural networks, as well as a cutting-edge spiking network model. Since the latter is particularly suitable for neuromorphic hardware implementations, this answers the second question as well.
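To give a flavor of the amplitude-based analysis, the sketch below computes a radially averaged amplitude spectrum, the kind of quantity used to test for whitening: natural images show amplitude falling roughly as 1/f with spatial frequency, so a flatter radial profile after event-based sampling indicates spatial decorrelation. Function and parameter names are illustrative, not the thesis code.

```python
import numpy as np

def radial_amplitude_spectrum(img, n_bins=30):
    """Radially averaged Fourier amplitude spectrum of a 2D image."""
    # Remove the mean so the DC component does not dominate bin 0.
    f = np.fft.fftshift(np.fft.fft2(img - img.mean()))
    amp = np.abs(f)
    h, w = img.shape
    y, x = np.indices((h, w))
    r = np.hypot(y - h / 2, x - w / 2)          # radial frequency per pixel
    bins = np.linspace(0, r.max(), n_bins + 1)  # equal-width radial bins
    idx = np.clip(np.digitize(r.ravel(), bins) - 1, 0, n_bins - 1)
    sums = np.bincount(idx, weights=amp.ravel(), minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    return sums / np.maximum(counts, 1)         # mean amplitude per bin

# Usage: compare the profile of an image against that of an event-count map.
prof = radial_amplitude_spectrum(np.random.default_rng(0).normal(size=(64, 64)))
```

Comparing such profiles before and after event-based acquisition is one simple way to quantify the whitening effect discussed in the results.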
Collectively, my results show that fixational drifts assist vision by (i) minimizing redundancy, removing spatial correlations and whitening the amplitude spectrum; (ii) preserving relevant structural information, retaining local phase without alteration; (iii) acting as preliminary anisotropic filtering stages that can be combined in time for unbiased feature extraction; (iv) inducing biologically comparable temporal modulations of static visual information, arranging events into complex spatio-temporal patterns; and (v) improving subsequent spike-based computation, which can learn rich temporal codes for high-level vision tasks. Overall, these aspects translate into an efficient spatio-temporal encoding of static visual scenes that benefits both the transmission and the computation of spiking data downstream of the hierarchical structure of the visual system. Hopefully, the modeling framework I propose could serve as a methodological basis to (i) investigate data encoding in visual systems, (ii) provide preliminary confirmations or refutations of neuroscientific theories about FEMs, (iii) produce novel and suitable benchmarks for advancing the field of neuromorphic computing, and finally (iv) inspire future robotic applications to efficiently gather visual information during fixation. Since neuromorphic hardware is particularly suited for embedded and power-constrained solutions, a mechanism that helps discard redundancy while preserving informative content can be a convenient tool for optimizing data transmission, in terms of both wiring and energy requirements. Moreover, given that spike-based algorithms and processors are particularly convenient for (and devoted to) richly time-structured data, finding an optimal strategy for effectively encoding space in time is crucial for real-world neuromorphic vision applications. Nature came up with a single elegant solution to both problems.
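The multi-scale temporal analysis mentioned in the abstract can be pictured as accumulating the event stream into frames at different temporal resolutions: coarse binning discards the fine timing induced by the drift, while fine binning preserves it. The sketch below is a minimal, hypothetical array-based version of that idea, not the thesis pipeline.

```python
import numpy as np

def events_to_frames(xs, ys, ts, ps, shape, n_bins):
    """Accumulate events (x, y, timestamp, polarity) into n_bins frames.

    Polarity is summed with sign (+1 for ON, -1 for OFF), so opposite
    events cancel where both occur. `shape` is the (height, width) of
    the sensor array.
    """
    frames = np.zeros((n_bins,) + shape, dtype=np.float32)
    t0, t1 = ts.min(), ts.max()
    # Map each timestamp to a temporal bin in [0, n_bins - 1].
    idx = np.minimum(((ts - t0) / (t1 - t0 + 1e-9) * n_bins).astype(int),
                     n_bins - 1)
    # Unbuffered accumulation: repeated (bin, y, x) triples all count.
    np.add.at(frames, (idx, ys, xs), np.where(ps > 0, 1.0, -1.0))
    return frames
```

Sweeping `n_bins` from small to large yields the progressively finer temporal scales that the 2D/3D convolutional and spiking models in the pipeline are compared on.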
File: phdunige_4420440.pdf (Adobe PDF, 17.8 MB)
Type: Doctoral thesis
Access: Open Access from 01/07/2024
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.