Transcript

Mental Health Assessment: analyzing body language patterns and emotional expressions.

Dataset : EmpatheticDialogues

Using Deep Learning & Chatbot API for daily mood tracking and therapist reports.

- Dataset collected on Amazon Mechanical Turk.

- Contains 24,850 one-to-one open-domain conversations.

- The dataset provides 32 evenly distributed emotion labels.

- Train: 19,208, Valid: 2,755, and Test: 2,541

Basma Tarik 213113

Sama Ehab 213633

Supervised by Dr. Wael Gomaa

Dataset : EmoReact

Outlines

Motivation

Objective

  • Advancing Mental Health Care with Nonverbal Cues
  • The Need for Enhanced Virtual Assistance: understanding humans more accurately.
  • Bridging the Gap in Emotion Identification
  • Contributing to such significant work is deeply rewarding and inspiring.

Conclusion

- Dataset of children between 4 and 14 years old.

- Contains 1102 videos. Train: 434, Val: 305, and Test: 368.

- This dataset is annotated for 17 affective states, including six basic emotions and nine complex emotions:

- happiness

- sadness

- surprise

- fear

- disgust

- anger

- Amazon Mechanical Turk (AMT) was used for generating the labels

- Each video was annotated by three workers for seventeen labels.

- curiosity

- uncertainty

- excitement

- attentiveness

- exploration

- confusion

- anxiety

- embarrassment

- frustration

  • Develop a deep learning emotion detection model, using frames and images.

  • Explore practical applications in mental health monitoring and virtual assistance.

  • Use a wide spectrum of emotions to be detected by the model.

Introduction

- The development of a comprehensive model that uses deep learning to analyze and integrate data from body language, voice, and facial expressions.

- Enabling the identification of emotions in individuals who may struggle to express themselves verbally.

- Tracking daily moods and sending reports to therapists that can provide valuable support to individuals.

Solution Domains

  • Human-Robot Interaction

  • Behavior Analysis

  • ASD Diagnosis and Support
  • Background
  • Introduction
  • Motivation
  • Problem statement
  • Objective
  • Related Work
  • Proposed Solution
  • Dataset
  • Methodology
  • System Architecture
  • Results and Discussion
  • Comparative Results
  • Future Work
  • Conclusion
  • References

Abstract System Architecture

- People with disabilities face challenges in expressing themselves. All of the following can impact communication:

- Speech and language disabilities

- Hearing impairments

- Visual impairments

- Mobility challenges

Emotic

&

Model Architecture

&

Introduction

Proposed Solution

Model Architecture

importance of body language

Background

Results and Discussion

Comparative Results

- Effective communication stems from observing body language.

ResNet-50 best multi-accuracy: 94.5%

Body Language: Nonverbal communication through physical behaviors, gestures, and postures that convey emotions and express a person's personality or intent.

Facial Expressions: Nonverbal signals communicated through facial muscle movements, crucial for expressing emotions.

The approach uses body language and emotional expressions as mental health indicators, involving:

- data collection

- computer vision

- preprocessing

- a deep learning model with integrated neural networks and attention mechanisms.

Its user-friendly application enables early intervention and personalized care by mental health professionals, with continuous updates improving adaptability and accuracy in identifying diverse traits and body language patterns associated with various mental health conditions.

Demo

- It is a key component in understanding how others are feeling and responding appropriately to their emotions.

Previous Work (EMOTIC dataset):

  • Focused on a limited set of emotions (6 out of 26).
  • Achieved high accuracy (97%) on this reduced set.
  • Limited applicability due to narrow range of emotions.

Our Work (EMOTIC dataset):

  • Analyzed all 26 emotions for a more comprehensive evaluation.
  • Achieved accuracy of 94.331% on the broader range.
  • Demonstrated effectiveness of deep learning for complex emotion recognition.

Our Best-Performing Model (ResNet-50 on EMOTIC dataset):

  • ResNet-50 architecture achieved 94.5% accuracy and 82.89% recall rate.
  • Deep learning techniques, particularly ResNet-50, showed promising results.
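
The gap between the reported accuracy and recall can be illustrated with a minimal sketch. The confusion counts below are hypothetical, not taken from the project's results; they only show how accuracy can sit well above recall when most emotion labels are absent for a given image, which is one plausible reading of the two figures above.

```python
def accuracy(tp, fp, tn, fn):
    """Fraction of all label decisions that were correct."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def recall(tp, fn):
    """Fraction of truly present labels that the model found."""
    positives = tp + fn
    return tp / positives if positives else 0.0


# Hypothetical counts (an assumption for illustration only):
# 9 labels truly present, 91 truly absent.
tp, fp, tn, fn = 8, 1, 90, 1
print(f"accuracy = {accuracy(tp, fp, tn, fn):.3f}")
print(f"recall   = {recall(tp, fn):.3f}")
```

Because true negatives dominate the count, accuracy stays high even when one of the nine present labels is missed, so recall is the more sensitive of the two numbers.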

Elements of personal communication

55% nonverbal

38% vocal

7% words only.

Dr. Albert Mehrabian, a researcher of body language

Gestures

Hand movements

Posture

Eye Contact

Physical contact such as handshakes

Tone of Voice

Dataset : EMOTIC

Related Work 1/2

References

Future Work

Pose-based Body Language Recognition for Emotion and Psychiatric Symptom Interpretation (2021)

  • EMOTIC:

- Amazon Mechanical Turk (AMT) was used for generating the labels

- 34,315 images

- It consists of images featuring people in real-world environments

- The dataset uses an extended list of 26 emotion categories for annotation

- 80% training, 20% testing
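
The 80/20 split can be sketched as follows. This is a minimal illustration that assumes a seeded random shuffle over image indices; the slides do not say how the split was actually drawn, and the function name and seed are hypothetical.

```python
import random


def train_test_split(items, train_percent=80, seed=0):
    """Shuffle items reproducibly and cut them into train and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Integer arithmetic avoids floating-point rounding at the cut point.
    cut = len(shuffled) * train_percent // 100
    return shuffled[:cut], shuffled[cut:]


# Image indices stand in for the 34,315 EMOTIC images listed above.
train_ids, test_ids = train_test_split(range(34315))
print(len(train_ids), len(test_ids))  # 27452 6863
```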

  • [1] J. Smith and A. Johnson, “Deep learning approach for emotion recognition from human body movements with feedforward deep convolution neural networks,” Journal of Machine Learning Research, 2021.
  • [2] M. Romeo, D. Hernández García, T. Han, A. Cangelosi, and K. Jokinen, “Predicting apparent personality from body language: benchmarking deep learning architectures for adaptive social human–robot interaction,” pp. 1167–1179, 2021, received 25 Jan 2021, accepted 05 Aug 2021, published online 27 Oct 2021.
  • [3] R. Kosti, J. M. Alvarez, A. Recasens, and A. Lapedriza, “Emotic: Emotions in context dataset,” pp. 61–69, 2017.
  • [4] H. F. Alhasson, G. M. Alsaheel, N. S. Alharbi, A. A. Alsalamah, J. M. Alhujilan, and S. S. Alharbi, “Bi-model engagement emotion recognition based on facial and upper body landmarks and machine learning approaches,” International Journal of E-Services and Mobile Applications (IJESMA), vol. 15, no. 1, pp. 1–13, 2023.
  • [5] Z. Yang, A. Kay, Y. Li, W. Cross, and J. Luo, “Pose-based body language recognition for emotion and psychiatric symptom interpretation,” in 2020 25th International Conference on Pattern Recognition (ICPR). Milan, Italy: IEEE, January 2021.
  • [6] Z. Lu, J. Zeng, S. Shan, and X. Chen, “Zero-shot facial expression recognition with multi-label label propagation,” in Asian Conference on Computer Vision. Cham: Springer International Publishing, 2018, pp. 19–34.
  • Deep learning models are effective for emotion recognition from body language.
  • Potential applications in human-computer interaction and related fields.
  • Future research should focus on model refinement, exploring other modalities, and applying data augmentation to address dataset biases.

Introduction:

- Develop efficient body language recognition for emotion inference from RGB video.

- Create interpretable models for psychiatric symptom detection in healthcare.

Dataset:

- Collected and labeled by mental health professionals

- Includes 32 body language categories, 24 emotions, 24 symptoms, and a psychiatric verdict.

- The dataset consists of 144 video clips, with 48 for training, 48 for validation, and 48 for testing

Problem Statement

Related Work 2/3

Predicting apparent personality from body language: benchmarking deep learning architectures for adaptive social human–robot interaction (2021)

Introduction:

- Enable social robots to understand users' apparent personalities based on body language.

- Benchmark deep learning architectures for this task.

- Mental health patients face communication obstacles when conveying their emotions.

- Virtual assistants can aid expression but lack comprehensive emotion recognition and an understanding of others' intentions.

- Despite their great utility, virtual assistants are limited in empathy, in adapting to the user's personality, and by privacy concerns.

Methodology:

- Pose feature representations: NTraj feature, NTraj+ feature

- ST-ConvPose: for learning spatial and temporal relations among joints

- ResNet-50: for high-level pose feature encoding

- k-NN classifier: for body language prediction with limited data

- LSTM: for emotion interpretation
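
The k-NN step in the pipeline above can be sketched in a few lines. This is a toy illustration, not the cited paper's implementation: the 2-D feature vectors stand in for encoded pose features, the label names are hypothetical, and Euclidean distance with k=3 is an assumption.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Label a query vector by majority vote among its k nearest neighbors."""
    distances = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy stand-ins for encoded pose features (hypothetical values and labels).
features = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1)]
labels = ["crossed_arms", "crossed_arms", "fidgeting", "fidgeting"]

print(knn_predict(features, labels, query=(0.2, 0.1)))  # crossed_arms
```

A nearest-neighbor classifier has no trainable parameters, which is one reason it suits the limited-data setting noted in the list: with only a handful of labeled clips per class there is nothing to overfit beyond the choice of k and the distance metric.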

Dataset:

- ChaLearn First Impression dataset

- Provides both audio and visual information from 10,000 clips.

Methodology:

  • 3DCNN: spatio-temporal features, Apparent Personality Prediction.
  • 3DResNet: activity recognition, audiovisual layer.
  • VGG-16 (VGG DAN+): successful in the ChaLearn 2016 competition.
  • CNN + LSTM: 2D convolutional & recurrent layers.

Results:

  • VGG DAN+ was the most successful architecture across all personality traits.
  • Conscientiousness was the personality trait the models predicted most successfully (62% balanced accuracy for VGG DAN+ and 3DResNet).
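
Balanced accuracy, the metric quoted above, is the mean of the per-class recalls, so it is not inflated by a dominant class. The sketch below uses made-up labels purely for illustration; it is not the cited paper's evaluation code.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for cls in sorted(set(y_true)):
        positions = [i for i, label in enumerate(y_true) if label == cls]
        correct = sum(1 for i in positions if y_pred[i] == cls)
        recalls.append(correct / len(positions))
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced case: 2 "high" samples, 8 "low" samples,
# and a model that always predicts "low".
y_true = ["high"] * 2 + ["low"] * 8
y_pred = ["low"] * 10

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain)                              # 0.8
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

A model that ignores the minority class scores 0.8 on plain accuracy but only 0.5 on balanced accuracy, which is why 62% balanced accuracy should be read against a 50% baseline for a two-class trait.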

Limitation:

  • The structure of the data is inconsistent across the HHI sessions.
  • The dataset recordings are less informative than ideal.

Results:

- The model's performance varies: 79.9% with predictions.

Related Work 2/2

Bi-Model Engagement Emotion Recognition Based on Facial and Upper-Body Landmarks and Machine Learning Approaches

Introduction:

- Proposes a system for emotion recognition that combines facial expressions and upper-body movements.

- Uses machine learning algorithms to classify emotions from facial landmarks and body pose data.

Limitation:

- Data bias

- Emotion inference relies on body language only.

Dataset:

- EMOTIC contains images with people in natural settings.

- MELD includes recordings of movie dialogues with emotional content.

- The authors selected 6 specific emotions: Aversion, Engagement, Excitement, Pleasure, Annoyance, and Disconnection (23,185 images).

Methodology:

- The MediaPipe Holistic model is used to extract facial landmarks (468 points) and body pose landmarks (33 points) from images.

- Bi-model approach that combines separate models for facial expressions and upper-body movements.

- Individual models: Support Vector Machine (SVM), Logistic Regression, Random Forest.

- Ensemble model: combines predictions from all three individual models for improved accuracy.
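
One common way to combine three classifiers is hard majority voting, sketched below. The slides do not state which combination rule the cited paper used, so the voting rule here is an assumption, and the per-model predictions are made-up values using emotion names from the dataset slide.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one list by hard majority vote."""
    ensemble = []
    for votes in zip(*predictions_per_model):
        # On a three-way tie, Counter keeps first-seen order,
        # so the first model's prediction wins.
        ensemble.append(Counter(votes).most_common(1)[0][0])
    return ensemble


# Hypothetical predictions from the three individual models.
svm = ["Pleasure", "Annoyance", "Engagement"]
logreg = ["Pleasure", "Aversion", "Engagement"]
forest = ["Excitement", "Annoyance", "Engagement"]

print(majority_vote([svm, logreg, forest]))
# ['Pleasure', 'Annoyance', 'Engagement']
```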

Results:

- 97% accuracy on EMOTIC

- 99% accuracy on MELD

Limitation:

- Controlled settings

- Limited information on class imbalance correction

Results and Discussion

Experiment 1

Demo

Transformers best result: 38.43%

Experiment 2

Implementing 5-fold cross-validation.

RNN-CNN: 38.8%
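
The 5-fold procedure can be sketched as an index generator. This is a minimal illustration with contiguous, unshuffled folds; whether the experiment shuffled or stratified its folds is not stated, and the function name is hypothetical.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, val_indices) for each of k contiguous folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size


# Each sample lands in the validation set exactly once across the 5 folds.
for fold, (train, val) in enumerate(k_fold_indices(12, k=5), start=1):
    print(fold, len(train), val)
```

The reported score would then be the average of the five validation results, one per fold.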

How to improve it?

- Increase the number of epochs

- Data augmentation

- Vision transformers
