Facial expression-based emotion recognition across diverse age groups: a multi-scale vision transformer with contrastive learning approach

G. Balachandran, S. Ranjith, T. R. Chenthil and G. C. Jagan
Additional contact information: all authors are with Jeppiaar Engineering College

Journal of Combinatorial Optimization, 2025, vol. 49, issue 1, No 11, 39 pages

Abstract: Facial expression-based Emotion Recognition (FER) is crucial in human–computer interaction and affective computing, particularly when addressing diverse age groups. This paper introduces the Multi-Scale Vision Transformer with Contrastive Learning (MViT-CnG), an age-adaptive FER approach designed to improve the accuracy and interpretability of emotion recognition across age groups. The MViT-CnG model leverages vision transformers and contrastive learning to capture intricate facial features, ensuring robust performance despite diverse and dynamic facial appearances. Contrastive learning also makes the model's representations more interpretable, which is vital for building trust in automated systems and facilitating human–machine collaboration, and it sharpens the model's capacity to discern shared and distinct features within facial expressions, improving generalization across age groups. Evaluations on the FER-2013 and CK+ datasets highlight the model's broad generalization capabilities: FER-2013 covers a wide range of emotions across diverse age groups, while CK+ focuses on posed expressions in controlled environments. The MViT-CnG model adapts effectively to both datasets, showcasing its versatility and reliability across distinct data characteristics. Performance results demonstrated that the model achieved superior accuracy across all emotion labels, with a 99.6% accuracy rate on FER-2013 and 99.5% on CK+, indicating significant improvements in recognizing subtle facial expressions. Comprehensive evaluations revealed that the model's precision, recall, and F1-score are consistently higher than those of existing models, confirming its robustness and reliability in facial emotion recognition tasks.
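As a concrete illustration of the two ingredients the abstract names, a multi-scale vision transformer backbone and a contrastive objective over emotion labels, the following PyTorch sketch shows one plausible construction. The authors' implementation is not reproduced here: every name, dimension, and hyperparameter below (the MultiScalePatchEmbed and ViTEncoder classes, patch sizes 4/8/16, embedding width 128, seven emotion classes, temperature 0.1) is an assumption made for illustration, not the paper's MViT-CnG.

# Illustrative sketch only -- the authors' MViT-CnG code is not public, so all
# classes, dimensions, and hyperparameters here are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScalePatchEmbed(nn.Module):
    # Embed a grayscale face crop at several patch sizes and concatenate the
    # resulting token sequences, giving the encoder coarse and fine views.
    def __init__(self, patch_sizes=(4, 8, 16), dim=128):
        super().__init__()
        self.projs = nn.ModuleList(
            nn.Conv2d(1, dim, kernel_size=p, stride=p) for p in patch_sizes
        )

    def forward(self, x):                              # x: (B, 1, H, W)
        tokens = [p(x).flatten(2).transpose(1, 2) for p in self.projs]
        return torch.cat(tokens, dim=1)                # (B, N_total, dim)

class ViTEncoder(nn.Module):
    # Small transformer encoder with two heads: emotion logits for
    # classification and a normalized embedding for the contrastive loss.
    def __init__(self, dim=128, depth=4, heads=4, n_classes=7):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.embed = MultiScalePatchEmbed(dim=dim)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, n_classes)
        self.proj = nn.Linear(dim, 64)

    def forward(self, x):
        z = self.encoder(self.embed(x)).mean(dim=1)    # mean-pool tokens
        return self.head(z), F.normalize(self.proj(z), dim=-1)

def supervised_contrastive_loss(feats, labels, temperature=0.1):
    # Supervised contrastive loss (Khosla et al. 2020 style): pull embeddings
    # of the same emotion together, push different emotions apart.
    sim = feats @ feats.T / temperature
    eye = torch.eye(len(feats), dtype=torch.bool, device=feats.device)
    pos = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~eye
    logits = sim.masked_fill(eye, float("-inf"))       # drop self-similarity
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    log_prob = log_prob.masked_fill(eye, 0.0)          # avoid -inf * 0 = NaN
    n_pos = pos.sum(dim=1)
    valid = n_pos > 0                                  # anchors with a positive
    loss = -(log_prob * pos.float()).sum(dim=1)[valid] / n_pos[valid]
    return loss.mean()

# Joint objective: cross-entropy on the emotion head plus the contrastive term.
model = ViTEncoder()
imgs = torch.randn(16, 1, 48, 48)                      # FER-2013-sized crops
labels = torch.randint(0, 7, (16,))
logits, feats = model(imgs)
loss = F.cross_entropy(logits, labels) + supervised_contrastive_loss(feats, labels)
loss.backward()

In this reading, the contrastive term encourages embeddings to cluster by emotion regardless of the subject's age, which is one way the abstract's claim of generalization across age groups could be operationalized; the classification head is trained jointly with ordinary cross-entropy.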

Keywords: Facial emotion recognition; Human–computer interaction; Age-adaptive model; Deep learning; Vision Transformer; Contrastive learning
Date: 2025

Downloads: http://link.springer.com/10.1007/s10878-024-01241-8 (abstract, text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:jcomop:v:49:y:2025:i:1:d:10.1007_s10878-024-01241-8

Ordering information: This journal article can be ordered from
https://www.springer.com/journal/10878

DOI: 10.1007/s10878-024-01241-8

Journal of Combinatorial Optimization is currently edited by My T. Thai.

Handle: RePEc:spr:jcomop:v:49:y:2025:i:1:d:10.1007_s10878-024-01241-8