Embodied Active Perception for Spatial and Semantical Reasoning

SCARPELLINI, GIANLUCA
2024-03-29

Abstract

This thesis investigates the embodied AI paradigm and its integration into active perception and semantic reasoning pipelines, and explores its practical applications in robotics. Central to my research is the action-perception loop, in which an agent follows a policy to explore an environment while enhancing its perception, two key facets of real-world applications. Embodied AI raises two challenges: (i) evaluating the performance of a policy in a real-world scenario is risky, and (ii) complex tasks, e.g., rearranging objects in a room, require a deep understanding of the environment that goes beyond simple perception. First, I tackle the issue of deploying an agent on a real-world robotic platform: I propose a novel approach for evaluating agent performance through efficient offline policy evaluation, without the need for direct deployment. This method is particularly relevant when deploying in the target scenario is unethical (e.g., healthcare), expensive (e.g., robotics), or unsafe (e.g., self-driving cars). Second, I address the complexities of spatial and semantic reasoning. Here, I introduce a novel diffusion model formulation designed explicitly for tasks that involve spatial and semantic reasoning, such as rearranging a room or solving puzzles. In summary, this thesis contributes to active exploration, offline policy evaluation, and spatial reasoning. My findings and methods advance academic understanding and have implications for the development of real-world robotic applications.
reinforcement learning; diffusion models; computer vision; embodied AI
Files in this item:
phdunige_4965929.pdf
Access: open access
Type: Doctoral thesis
Size: 33.47 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11567/1168598