
Visual Affordance Prediction of Hand-Occluded Objects

APICELLA, TOMMASO
2024-03-27

Abstract

The prediction of affordances, i.e., the potential actions an agent can perform on objects in the scene, is fundamental for human-robot collaboration and wearable robotics scenarios in which objects may be on a tabletop or held by a person. Perceiving affordances from an image is challenging due to the variety of object geometric and physical properties, as well as occlusions caused by clutter or by a person's hand holding the object. In this thesis, we propose a framework for visual affordance prediction that estimates object properties such as position and mass, and identifies graspable regions of objects, supporting the agent in performing the intended actions.

Previous methods focused on predicting the filling mass of a container manipulated by a human, whereas the complementary estimation of the container mass, regardless of the content, remained underexplored. Moreover, during a human manipulation more than one object may be present in the scene, so a selection phase is necessary to focus only on the object of interest. We propose a strategy to select the object manipulated by a human from a fixed frontal RGB-D camera, and we design a model to predict its mass. The model learns to combine color and geometric information to predict the (empty) container mass. Integrating our pipeline with existing filling mass predictors yields the complete container mass (object plus content).

Object detection methods identify objects in a scene; however, in wearable robotic applications the human already knows the location and category of objects. We investigate a transfer learning procedure to locate objects in the scene regardless of their category ('objectness'). We target lightweight object detection models that could be used in a wearable application, where the trade-off between accuracy and computational cost is relevant and was previously not investigated.

In the case of human manipulation, identifying the object regions an agent can interact with is more challenging due to occlusions and the poses objects may take. We design an affordance segmentation model that learns affordance features under hand occlusion by weighting the feature map through arm and object segmentation. Due to the lack of datasets tackling this scenario, we complement an existing dataset by annotating the visual affordances of mixed-reality images of hand-held containers in third-person view.

Experiments show that the strategy to select objects and predict their mass outperforms most baselines on previously unseen manipulated containers; the transfer learning procedure improves the performance of lightweight object detection methods in a wearable application; and the affordance segmentation model achieves better affordance segmentation and generalisation than existing models.
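As a rough illustration of the feature-weighting idea described in the abstract, the sketch below shows how predicted arm and object maps could gate a shared feature map before affordance segmentation. It is not the thesis implementation: the backbone, module names, channel sizes, and fusion scheme are assumptions made purely for illustration.

# Hypothetical sketch of affordance segmentation with mask-based feature weighting.
# Module names, channel sizes, and the fusion scheme are illustrative assumptions,
# not the architecture used in the thesis.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18


class AffordanceSegmenter(nn.Module):
    def __init__(self, num_affordance_classes: int = 3):
        super().__init__()
        backbone = resnet18(weights=None)
        # Keep the convolutional trunk only (output: B x 512 x H/32 x W/32).
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])
        # Auxiliary heads predicting arm and object probability maps.
        self.arm_head = nn.Conv2d(512, 1, kernel_size=1)
        self.obj_head = nn.Conv2d(512, 1, kernel_size=1)
        # Final head predicting per-pixel affordance classes
        # (e.g. background / graspable / non-graspable).
        self.affordance_head = nn.Conv2d(512, num_affordance_classes, kernel_size=1)

    def forward(self, rgb: torch.Tensor):
        feats = self.encoder(rgb)                       # B x 512 x h x w
        arm_prob = torch.sigmoid(self.arm_head(feats))  # B x 1 x h x w
        obj_prob = torch.sigmoid(self.obj_head(feats))  # B x 1 x h x w
        # Weight the features: emphasise object regions, suppress the occluding arm.
        weighted = feats * obj_prob * (1.0 - arm_prob)
        logits = self.affordance_head(weighted)
        # Upsample the affordance logits back to the input resolution.
        logits = F.interpolate(logits, size=rgb.shape[-2:],
                               mode="bilinear", align_corners=False)
        return logits, arm_prob, obj_prob


if __name__ == "__main__":
    model = AffordanceSegmenter()
    out, arm, obj = model(torch.randn(1, 3, 224, 224))
    print(out.shape)  # torch.Size([1, 3, 224, 224])

A real model would supervise the arm and object heads with segmentation labels and use a proper decoder; the sketch only conveys how the two predicted masks re-weight the shared features so that the occluding arm is suppressed and the object region is emphasised.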
Keywords: Affordances; Semantic Segmentation; Object Detection; Mass estimation
Files in this record:
File: phdunige_4111548.pdf
Access: open access
Type: Doctoral thesis
Size: 21.18 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11567/1168788