Toward Robots' Behavioral Transparency of Temporal Difference Reinforcement Learning with a Human Teacher

IRIS

The high request for autonomous human-robot interaction (HRI), combined with the potential of machine learning (ML) techniques, allow us to deploy ML mechanisms in robot control. However, the use of ML can make robots' behavior unclear to the observer during the learning phase. Recently, transparency in HRI has been investigated to make such interactions more comprehensible. In this work, we propose a model to improve the transparency during reinforcement learning (RL) tasks for HRI scenarios: the model supports transparency by having the robot show nonverbal emotional-behavioral cues. Our model considered human feedback as the reward of the RL algorithm and it presents emotional-behavioral responses based on the progress of the robot learning. The model is managed only by the temporal-difference error. We tested the architecture in a teaching scenario with the iCub humanoid robot. The results highlight that when the robot expresses its emotional-behavioral response, the human teacher is able to understand its learning process better. Furthermore, people prefer to interact with an expressive robot as compared to a mechanical one. Movement-based signals proved to be more effective in revealing the internal state of the robot than facial expressions. In particular, gaze movements were effective in showing the robot's next intentions. In contrast, communicating uncertainty through robot movements sometimes led to action misinterpretation, highlighting the importance of balancing transparency and the legibility of the robot goal. We also found a reliable temporal window in which to register teachers' feedback that can be used by the robot as a reward.

Toward Robots' Behavioral Transparency of Temporal Difference Reinforcement Learning with a Human Teacher

Matarese M.;Sciutti A.;Rea F.;Rossi S.

2021-01-01

Abstract

The high request for autonomous human-robot interaction (HRI), combined with the potential of machine learning (ML) techniques, allow us to deploy ML mechanisms in robot control. However, the use of ML can make robots' behavior unclear to the observer during the learning phase. Recently, transparency in HRI has been investigated to make such interactions more comprehensible. In this work, we propose a model to improve the transparency during reinforcement learning (RL) tasks for HRI scenarios: the model supports transparency by having the robot show nonverbal emotional-behavioral cues. Our model considered human feedback as the reward of the RL algorithm and it presents emotional-behavioral responses based on the progress of the robot learning. The model is managed only by the temporal-difference error. We tested the architecture in a teaching scenario with the iCub humanoid robot. The results highlight that when the robot expresses its emotional-behavioral response, the human teacher is able to understand its learning process better. Furthermore, people prefer to interact with an expressive robot as compared to a mechanical one. Movement-based signals proved to be more effective in revealing the internal state of the robot than facial expressions. In particular, gaze movements were effective in showing the robot's next intentions. In contrast, communicating uncertainty through robot movements sometimes led to action misinterpretation, highlighting the importance of balancing transparency and the legibility of the robot goal. We also found a reliable temporal window in which to register teachers' feedback that can be used by the robot as a reward.

Scheda breve

Scheda completa

Scheda completa (DC)

Anno

2021

Appare nelle tipologie:

01.01 - Articolo su rivista

File in questo prodotto:

Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11567/1075430

Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni

ND

19

12

social impact