Toward Robots' Behavioral Transparency of Temporal Difference Reinforcement Learning with a Human Teacher
Matarese M.; Sciutti A.; Rea F.
2021-01-01
Abstract
The high demand for autonomous human-robot interaction (HRI), combined with the potential of machine learning (ML) techniques, allows us to deploy ML mechanisms in robot control. However, the use of ML can make robots' behavior unclear to the observer during the learning phase. Recently, transparency in HRI has been investigated to make such interactions more comprehensible. In this work, we propose a model to improve transparency during reinforcement learning (RL) tasks in HRI scenarios: the model supports transparency by having the robot show nonverbal emotional-behavioral cues. Our model uses human feedback as the reward of the RL algorithm and presents emotional-behavioral responses based on the progress of the robot's learning. The model is driven solely by the temporal-difference error. We tested the architecture in a teaching scenario with the iCub humanoid robot. The results highlight that when the robot expresses its emotional-behavioral response, the human teacher is able to understand its learning process better. Furthermore, people prefer to interact with an expressive robot rather than a mechanical one. Movement-based signals proved more effective in revealing the internal state of the robot than facial expressions. In particular, gaze movements were effective in showing the robot's next intentions. In contrast, communicating uncertainty through robot movements sometimes led to action misinterpretation, highlighting the importance of balancing transparency with the legibility of the robot's goal. We also found a reliable temporal window in which to register the teacher's feedback so that the robot can use it as a reward.
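To make the mechanism the abstract describes concrete, below is a minimal sketch (not the authors' code) of a tabular Q-learning loop in which the reward is the human teacher's feedback and the temporal-difference error drives the choice of an expressive cue. All names, hyperparameters, and the cue mapping are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch: human feedback as RL reward, TD error as the sole
# driver of the robot's emotional-behavioral response. Illustrative only.
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2  # assumed learning hyperparameters

q = defaultdict(float)  # Q[(state, action)] -> estimated value


def choose_action(state, actions):
    """Epsilon-greedy action selection over the available actions."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def td_update(state, action, human_feedback, next_state, actions):
    """One Q-learning step; the reward is the teacher's feedback
    (e.g. +1 approval, -1 disapproval). Returns the TD error."""
    target = human_feedback + GAMMA * max(q[(next_state, a)] for a in actions)
    td_error = target - q[(state, action)]
    q[(state, action)] += ALPHA * td_error
    return td_error


def expressive_cue(td_error, threshold=0.5):
    """Map the TD error's sign and magnitude to a nonverbal cue
    (a hypothetical mapping, for illustration)."""
    if abs(td_error) < threshold:
        return "confident posture"  # prediction matched the feedback
    return "positive gaze/gesture" if td_error > 0 else "hesitant movement"
```

Under these assumptions, a small TD error signals that the robot's predictions already match the teacher's feedback, so a confident cue is shown, while a large positive or negative error surfaces surprise or hesitation, which is what lets the teacher read the learning progress from the robot's behavior.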