
Social Interactions Analysis through Deep Visual Nonverbal Features

SHAHID, MUHAMMAD
2021-03-05

Abstract

Human social interaction is as common as it is complex to understand. It is part of our daily lives, from homes to communal places, whether in direct face-to-face interaction or through digital media. During these interactions, humans exchange thoughts, intentions, and emotions, using verbal language along with non-verbal social signals such as variations in vocal tone, hand gestures, facial expressions, and body posture. This non-verbal part of communication remains poorly understood despite recent research progress and computational advances. Recently, group social interactions such as meetings, standing conversations, interviews, and discussions have become popular research areas in the social computing domain. In this thesis, we propose and investigate novel computational approaches for emergent leadership detection, leadership style prediction, personality trait classification, and visual voice activity detection in the context of small group interactions. First, we investigated emergent leadership detection in small group meeting environments. Leaders are key players in making decisions and facing problems, and consequently play an important role in an organization; detecting an emergent leader is therefore an important task in organizational behavior research. From the computing perspective, we propose visual activity-based nonverbal feature extraction from video streams using a deep learning approach, together with feature encoding for a low-dimensional representation. Our method shows improved results even compared to multi-modal non-verbal features extracted from the audio and visual modalities. These novel features also performed well for predicting autocratic versus democratic leadership styles and for discriminating high from low extraversion. Afterwards, we explored the problem of voice activity detection (VAD) extensively.
VAD is defined as determining "who is speaking and when". Usually, VAD is accomplished using audio features only. However, due to physical or privacy-related constraints, the audio modality is not always accessible, which increases the importance of VAD based on the visual modality alone. Visual VAD is also very useful for several social interaction analysis applications. We performed a detailed analysis to find an efficient way of representing raw video streams for this task: a holistic approach based on the full upper body is adopted instead of using only lip motion or facial visual features, as mostly suggested in the literature. Motivated by the psychology literature, which observes that gesticulation style while speaking varies from person to person depending on ethnic background and personality, we also apply unsupervised domain adaptation, which gives a good boost in VAD performance. We introduce the new RealVAD dataset, which is used to benchmark VAD methods in real-life situations. Lastly, we performed body-motion-cue-based VAD learning in conjunction with a weakly supervised segmentation scheme.
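To make the notion of body-motion-based visual VAD concrete, the sketch below shows a deliberately naive baseline: a person is flagged as "speaking" whenever the motion energy of their upper-body crop exceeds a threshold. This is a generic illustration only, not the thesis method (which uses deep visual features); all names and parameters here are hypothetical.

```python
# Naive motion-energy baseline for visual VAD (generic illustration,
# NOT the deep-learning approach proposed in the thesis).
import numpy as np

def motion_energy(frames: np.ndarray) -> np.ndarray:
    """Mean absolute inter-frame difference; frames has shape (T, H, W)."""
    diffs = np.abs(np.diff(frames.astype(np.float32), axis=0))
    return diffs.mean(axis=(1, 2))  # shape (T-1,)

def naive_visual_vad(frames: np.ndarray, threshold: float = 5.0) -> np.ndarray:
    """Binary speaking/not-speaking labels from upper-body motion energy."""
    return (motion_energy(frames) > threshold).astype(int)

# Toy example: 4 static frames (no motion) followed by 4 frames of strong motion.
rng = np.random.default_rng(0)
still = np.zeros((4, 8, 8))
moving = rng.uniform(0, 255, size=(4, 8, 8))
labels = naive_visual_vad(np.concatenate([still, moving]))
```

Such a threshold baseline illustrates why richer representations are needed: any non-speech movement also raises motion energy, which is one motivation for learning person-specific, full-upper-body features instead.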
Keywords: Human-Computer Interaction, Social Interactions, Nonverbal Behavior, Leadership Detection, Personality Traits Classification, Visual Voice Activity Detection
Files in this item:

phdunige_4466936.pdf (open access)

Description: Social Interactions Analysis through Deep Visual Nonverbal Features
Type: Doctoral thesis
Size: 13.96 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11567/1040976