
Self-supervised solutions for developmental learning with the humanoid robot iCub.

GONZALEZ, JONAS PIERRE GUSTAVO
2021-06-09

Abstract

For a long time, robots were assigned to repetitive tasks, such as industrial assembly lines, where social skills played a secondary role. However, as next-generation robots are designed to interact and collaborate with humans, it becomes ever more important to endow them with social competencies. Humans communicate through several explicit and implicit cues, such as gaze, facial expressions, and gestures. Social interaction during infancy plays a crucial role in understanding and acquiring these skills: babies show social skills already at birth and continue to learn and develop them throughout childhood. As robots interact with us more and more frequently, with the goal of becoming social companions that help us in different tasks, they should learn these abilities as well.

Deep Learning algorithms have reached state-of-the-art results in tasks ranging from object recognition and detection to speech recognition. They have proven to be powerful tools for addressing complex problems, making them valid candidates for teaching robots social skills. However, most of these achievements were obtained with supervised learning, which relies on large annotated datasets. This requires human supervision in both collecting and annotating the data, which can be problematic in robotic applications. Indeed, such networks, when deployed on robots, can suffer a drop in performance and need to be fine-tuned; the annotation process then has to be repeated to cope with the inherent dynamicity of robots, which is time-consuming and limits how autonomously robots can learn. Robots, being embodied, have access to a continuous stream of data through their many sensors. Thus, instead of relying on human supervision to annotate these data, they should learn in a self-supervised way, much as babies do in their early development.

Advances in neuroscience and psychology offer insights into how babies learn and develop social skills. For example, attentional mechanisms and multi-modal experience play an important role in guiding learning during the earliest years of life. Moreover, the development of social skills follows specific stages: babies learn to detect faces very early, since faces are an important vector of information for later developing joint attention and emotion recognition.

The work presented in this thesis investigates how to integrate Deep Learning with human early developmental strategies to enable robots to learn autonomously from their sensory experience. More specifically, we designed computational architectures and tested them on the iCub humanoid robot in ecological, natural interaction experiments in which participants' behaviour was not fully controlled. Several past studies have investigated self-supervised learning approaches for developing autonomous robots, but the interaction was usually strongly constrained, often making it unrepresentative of real-world scenarios. The novelty of this thesis also lies in integrating the facilitation mechanisms used by babies, such as attention and cross-modal learning, with Deep Learning to propose self-supervised frameworks. To test these frameworks, we focused on perceptual abilities that are important for developing social skills, such as face detection, voice localisation, and people identification.

Our results demonstrate the effectiveness of the approach: using the proposed framework, iCub could collect different datasets without the need for human annotations. These datasets were then used to train Deep Learning networks for face and object detection, sound localisation, and person recognition, allowing the robot to generalise from its experience. While the performance does not match that of state-of-the-art networks, these are promising results: they provide a proof of concept that developmentally inspired mechanisms can be adopted to guide robot learning proactively. The architectures proposed in this thesis represent a novel contribution to the development of robots capable of learning autonomously and efficiently from their sensory experience in ecological interactions. In conclusion, our approach is a step forward toward autonomous robots that can learn directly from their experience in a self-supervised way.
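To make the idea of annotation-free data collection more concrete, the sketch below illustrates the general principle in Python. A pre-existing detector (here OpenCV's Haar face cascade, standing in for the attentional mechanisms discussed above) selects the relevant regions of a camera stream, and a label obtained from another modality (here simply passed in as a string, standing in for a cross-modal cue such as a person introducing themselves by voice) is attached automatically. All function and parameter names are illustrative assumptions; this is a minimal sketch of the concept, not the iCub implementation described in the thesis, which runs on the robot's own sensors and middleware.

# Minimal sketch of self-supervised data collection (assumed example, not
# the thesis code): a pre-trained detector provides pseudo-annotations so a
# dataset is built without manual labelling.
import os
import cv2


def collect_face_crops(label, out_dir="auto_dataset", max_samples=50, camera_id=0):
    """Detect faces in a camera stream and save the crops under `label`.

    In the scenario described above, `label` would come from another
    modality (e.g. the speaker's name heard during the interaction).
    """
    os.makedirs(os.path.join(out_dir, label), exist_ok=True)
    # Standard OpenCV frontal-face cascade, used here as a stand-in for the
    # attention mechanism that selects socially relevant regions.
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture(camera_id)
    saved = 0
    while saved < max_samples:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        for (x, y, w, h) in faces:
            crop = frame[y:y + h, x:x + w]
            path = os.path.join(out_dir, label, f"{label}_{saved:04d}.png")
            cv2.imwrite(path, crop)  # each crop is stored already labelled
            saved += 1
            if saved >= max_samples:
                break
    cap.release()
    return saved


if __name__ == "__main__":
    # "alice" stands in for an identity obtained from another modality.
    n = collect_face_crops(label="alice")
    print(f"collected {n} automatically labelled face crops")

The crops gathered this way could then serve as training data for downstream networks (face detection, person recognition, and so on), which is the role the automatically collected datasets play in the experiments summarised above.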
Keywords: Cognitive robotics, Deep Learning, autonomous learning, human-robot interaction
Files in this record:

File: phdunige_4460512.pdf
Description: PhD thesis
Type: Doctoral thesis
Access: open access
Size: 6.39 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11567/1047609