The distribution of rewards in growing sensorimotor maps acquired by cognitive robots through exploration

Morasso, Pietro Giovanni; Metta, Giorgio
2011-01-01

Abstract

To exhibit intelligent behavior, cognitive robots must have some knowledge of the consequences of their actions and of their value in the context of the goal being realized. We present a neural framework in which the explorative sensorimotor experiences of cognitive robots can be efficiently ‘internalized’ using growing sensorimotor maps, with planning realized through goal-induced quasi-stationary value fields. Further, when there is no predefined reward function (or when the existing one is no longer adequate in a slightly modified world), the robot has to try to realize its goal by exploration, with reward or penalty given only at the end. This paper proposes three simple rules for distributing the received end reward among the contributing neurons of a high-dimensional sensorimotor map. Importantly, the reward/penalty distribution over hundreds of neurons in the map is computed in one shot. The resulting reward distribution can be visualized as an additional value field representing the newly learnt experience, and it can be combined with other such fields in a context-dependent fashion to plan/compose novel emergent behavior. The simplicity and efficiency of the approach are illustrated through the resulting behaviors of the GNOSYS robot in two scenarios: a) learning ‘when’ to optimize ‘which constraint’ while realizing spatial goals, and b) learning to push a ball intelligently to the corners of a table while avoiding traps randomly placed by the teacher (this scenario replicates the well-known trap-tube paradigm from animal-reasoning studies carried out on chimpanzees, capuchins, and infants).
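Since no full text is attached to this record, the paper's actual three distribution rules cannot be reproduced here. The following is a minimal, hypothetical sketch of the general idea the abstract describes: a terminal reward spread over the neurons visited in one exploration episode in a single backward pass (rather than by iterative value updates), and several value fields combined for planning. The exponential-decay credit rule, the weighted-sum combination, and all names and parameters below are illustrative assumptions, not the authors' method.

```python
def distribute_end_reward(visited_neurons, end_reward, decay=0.9):
    """Spread a terminal reward over the neurons visited during one episode,
    in a single backward pass ('one shot'). The decay rule is an assumption:
    neurons nearer the goal receive a larger share of the credit."""
    field = {}
    for steps_from_goal, neuron in enumerate(reversed(visited_neurons)):
        field[neuron] = field.get(neuron, 0.0) + end_reward * decay ** steps_from_goal
    return field  # one value per contributing neuron: an additional 'value field'

def combine_value_fields(fields, weights):
    """Context-dependent combination of value fields, modeled here as a
    weighted sum (an assumption); planning would then follow the combined
    field over the sensorimotor map."""
    neurons = set().union(*fields)
    return {n: sum(w * f.get(n, 0.0) for f, w in zip(fields, weights)) for n in neurons}

# Example: an episode that visited neurons 3 -> 7 -> 12 and ended with reward +1,
# combined with a hypothetical penalty field learnt from a trapped episode.
goal_field = distribute_end_reward([3, 7, 12], end_reward=1.0)
trap_field = {7: -0.5}
combined = combine_value_fields([goal_field, trap_field], weights=[1.0, 1.0])
```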
Files for this item:
There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11567/393968