
Social Interactions Analysis through Deep Visual Nonverbal Features

SHAHID, MUHAMMAD
2021-03-05

Abstract

Human social interaction is as common as it is complex to understand. It is part of our daily lives, from homes to communal places, whether in direct face-to-face interaction or through digital media. During these interactions, humans exchange thoughts, intentions, and emotions, using verbal language along with non-verbal social signals such as variations in vocal tone, hand gestures, facial expressions, and body posture. This non-verbal part of communication remains poorly understood despite recent research progress and computational advances. Recently, group social interactions such as meetings, standing conversations, interviews, and discussions have become popular research areas in the social computing domain. In this thesis, we propose and investigate novel computational approaches for emergent leadership detection, leadership style prediction, personality trait classification, and visual voice activity detection in the context of small group interactions. First, we investigated emergent leadership detection in small group meeting environments. Leaders are key players in making decisions and facing problems, and consequently play an important role in an organization; detecting an emergent leader is therefore an important task in organizational behavior research. From the computing perspective, we propose visual activity-based nonverbal feature extraction from video streams using a deep learning approach, together with feature encoding for a low-dimensional representation. Our method shows improved results even compared to multi-modal non-verbal features extracted from the audio and visual modalities. These novel features also performed well for predicting autocratic versus democratic leadership styles and for discriminating high from low extraversion. Afterwards, we explored the problem of voice activity detection (VAD) extensively.
VAD is defined as determining "who is speaking and when". Usually, VAD is accomplished using audio features only. However, due to physical or privacy-related constraints, the audio modality is not always accessible, which increases the importance of VAD based on the visual modality alone. Visual VAD is also very useful for several social interaction analysis applications. We performed a detailed analysis to find an efficient way of representing raw video streams for this task: a holistic approach based on the full upper body is adopted instead of using only lip motion or facial visual features, as mostly suggested in the literature. Motivated by the psychology literature, which observes that gesticulation style while speaking varies from person to person depending on ethnic background and personality, we also apply unsupervised domain adaptation, which gives a good boost in VAD performance. We introduce the new RealVAD dataset, which is used to benchmark VAD methods in real-life situations. Lastly, we performed body-motion-cue-based VAD learning in conjunction with a weakly supervised segmentation scheme.
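To make the notion of body-motion-based visual VAD concrete, the sketch below shows a deliberately naive baseline: a person is flagged as "speaking" whenever the motion energy of their upper-body crop exceeds a threshold. This is a generic illustration only, not the thesis method (which uses deep visual features); all names and parameters here are hypothetical.

```python
# Naive motion-energy baseline for visual VAD (generic illustration,
# NOT the deep-learning approach proposed in the thesis).
import numpy as np

def motion_energy(frames: np.ndarray) -> np.ndarray:
    """Mean absolute inter-frame difference; frames has shape (T, H, W)."""
    diffs = np.abs(np.diff(frames.astype(np.float32), axis=0))
    return diffs.mean(axis=(1, 2))  # shape (T-1,)

def naive_visual_vad(frames: np.ndarray, threshold: float = 5.0) -> np.ndarray:
    """Binary speaking/not-speaking labels from upper-body motion energy."""
    return (motion_energy(frames) > threshold).astype(int)

# Toy example: 4 static frames (no motion) followed by 4 frames of strong motion.
rng = np.random.default_rng(0)
still = np.zeros((4, 8, 8))
moving = rng.uniform(0, 255, size=(4, 8, 8))
labels = naive_visual_vad(np.concatenate([still, moving]))
```

Such a threshold baseline illustrates why richer representations are needed: any non-speech movement also raises motion energy, which is one motivation for learning person-specific, full-upper-body features instead.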
Keywords: Human-Computer Interaction, Social Interactions, Nonverbal Behavior, Leadership Detection, Personality Traits Classification, Visual Voice Activity Detection
Files in this item:

phdunige_4466936.pdf (open access)

Description: Social Interactions Analysis through Deep Visual Nonverbal Features
Type: Doctoral thesis
Size: 13.96 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11567/1040976