Robotic Perception and Manipulation: Leveraging Deep Learning Methods for Efficient Instance Segmentation and Multi-fingered Grasping

CEOLA, FEDERICO
2024-04-08

Abstract

The ability to adapt in order to perceive and manipulate novel objects is an important requirement for robots operating in unstructured, dynamically changing environments like the ones we live in. Autonomous perception and manipulation of objects in the robot's surroundings requires processing sensor data, including images, depth information and tactile feedback. Extracting meaningful semantic and geometric information from such data is in itself a challenging open problem, and it becomes even harder in this scenario, where the robot's target task may not be known in advance and the perception system and control policies must therefore adapt continuously. The Deep Learning breakthrough brought great improvements both in Computer Vision and on some open problems in robotics. While these approaches have shown great potential for robotic tasks and for overcoming some problems of classical methods, their application is constrained by limitations intrinsic to Deep Learning, such as the need for large amounts of training data and long training sessions to optimize the models. The aim of this Thesis is to overcome these limitations, allowing robots to perform tasks that would not be achievable with classical methods alone. The proposed methods aim at making Deep Learning approaches for the visual task of instance segmentation and for multi-fingered grasping suitable for training on real robotic platforms. To this end, I first proposed a hybrid method that combines a pre-trained Convolutional Neural Network for feature extraction with Kernel-based classifiers, for fast adaptation of an instance segmentation model in the presence of novel objects or different visual domains. Second, I proposed a Residual Reinforcement Learning method for learning multi-fingered grasping of novel objects on the real robot. This method relies on a policy pre-trained in simulation with a Deep Reinforcement Learning from Demonstration approach, which is also presented in this Thesis. Finally, I contributed to a community-driven effort aimed at providing a generalist policy for robotic manipulation by collecting a dataset for language-guided long-horizon manipulation tasks.
Files in this item:
File: phdunige_4958495.pdf
Access: open access
Type: Doctoral thesis
Size: 31.24 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11567/1169675