Social-Physical Interaction analysis for egocentric videos

THAKUR, SANKET KUMAR
2024-03-29

Abstract

The widespread use of wearable cameras has prompted the design of egocentric (first-person) systems that can readily support and assist humans in their daily activities. Such systems need to understand a person's engagement with their environment, aiming to decode their intentions and predict their future actions. These engagements can also be regarded as social-physical interactions. In this thesis, we study such interactions in first-person vision and contribute to an understanding of human behavior within the context of egocentric video content. Our research initially focused on understanding a person's engagement with the environment by observing where they look. We explored the potential to determine a person's gaze with a more cost-effective and practical approach, using only a first-person video and the head orientation of the person involved in the activities. In our later works, we pivoted from the individual visual focus of a person to the broader contextual attention of the activities the person performs. This involved comprehending human-object relationships and the related interactions. We investigated the different states an object can take with respect to an action - a) currently involved ("active") and b) not involved ("passive") - in order to predict c) the object that will be used in a future action ("next-active-object") from the last frame seen by the model. Subsequently, we explored the dependency of a future action on an identified next-active-object, and how it influences and refines the anticipation of the specific actions that can be performed on that object. Lastly, we focused on understanding the motion of a person from the last observed scene to the start of an action. This "unseen" segment preceding the start of an action introduces a degree of ambiguity, yet it holds significant potential to offer valuable insights into the state and timing of future interactions. In summary, we show how head orientation can contribute to decoding a person's gaze. In addition, we show the importance of understanding human-object relationships to anticipate future actions in egocentric videos.
Files in this record:
File: phdunige_4964072.pdf
Access: open access
Type: Doctoral thesis
Size: 32.1 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11567/1168815