Social-Physical Interaction analysis for egocentric videos

THAKUR, SANKET KUMAR
2024-03-29

Abstract

The widespread use of wearable cameras has prompted the design of egocentric (first-person) systems that can readily support and assist humans in their daily activities. Such systems need to understand a person's engagement with their environment, aiming to decode their intentions and predict their future actions. These engagements can also be regarded as social-physical interactions. In this thesis, we study such interactions in first-person vision and contribute to an understanding of human behavior within the context of egocentric video content. Our research initially focused on understanding a person's engagement with the environment by observing where they look. We explored the potential to determine a person's gaze with a more cost-effective and practical approach, using only a first-person video and the head orientation of the person involved in the activities. In our later works, we pivoted from the individual visual focus of a person to the broader contextual attention of the activities the person performs. This involved comprehending human-object relationships and the related interactions. We investigated the different states an object can take with respect to an action - a) currently involved ("active") and b) not involved ("passive") - in order to predict c) the object that will be used in a future action ("next-active-object") from the last frame seen by the model. Subsequently, we explored the dependency of a future action on an identified next-active-object, and how it influences and refines the anticipation of the specific actions that can be performed on that object. Lastly, we focused on understanding the motion of a person from the last observed scene to the start of an action. This "unseen" segment preceding the start of an action introduces a degree of ambiguity, yet it holds significant potential to offer valuable insights into the state and timing of future interactions. In summary, we show how head orientation can contribute to decoding a person's gaze. In addition, we show the importance of understanding human-object relationships to anticipate future actions in egocentric videos.
Files in this record:
File: phdunige_4964072.pdf
Access: open access
Type: Doctoral thesis
Size: 32.1 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11567/1168815