EconPapers
Economics at your fingertips  
 

Facial Expression Recognition Using Dual Path Feature Fusion and Stacked Attention

Hongtao Zhu (), Huahu Xu, Xiaojin Ma and Minjie Bian
Additional contact information
Hongtao Zhu: School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
Huahu Xu: School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
Xiaojin Ma: School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
Minjie Bian: School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China

Future Internet, 2022, vol. 14, issue 9, 1-17

Abstract: Facial Expression Recognition (FER) aims to understand the emotional changes of a specific target group. The relatively small datasets available for FER and the difficulty of achieving high recognition accuracy both challenge researchers. In recent years, with the rapid development of computer technology, and of deep learning in particular, more and more convolutional neural networks have been developed for FER research. Most convolutional networks, however, do not cope well with overfitting caused by too-small datasets, or with the noise introduced by expression-independent intra-class differences. In this paper, we propose a Dual Path Stacked Attention Network (DPSAN) to better address these challenges. Firstly, the features of key facial regions are extracted by segmentation and irrelevant regions are ignored, which effectively suppresses intra-class differences. Secondly, by providing both the global image and the segmented local image regions as training data for the integrated dual path model, the overfitting that deep networks suffer from a lack of data can be effectively mitigated. Finally, we design a stacked attention module that weights the fused feature maps according to the importance of each part for expression recognition. For the cropping scheme, we adopt a method based on four fixed regions of the face image, segmenting out the key regions and ignoring the irrelevant ones, so as to improve computational efficiency. Experimental results on the public CK+ and FERPLUS datasets demonstrate the effectiveness of DPSAN: its accuracy reaches the level of current state-of-the-art methods, at 93.2% on CK+ and 87.63% on FERPLUS.
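The pipeline described in the abstract (crop four fixed facial regions, extract global and local features along two paths, then fuse them with attention weighting) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the region layout, the stand-in feature extractor (the paper uses a CNN backbone), and the dot-product scoring used in place of the stacked attention module are all hypothetical.

```python
import numpy as np

def crop_four_regions(face, r=24):
    """Crop four fixed square regions from a face image.
    Hypothetical layout (left eye, right eye, nose, mouth); the paper
    fixes four regions but does not publish these exact coordinates."""
    h, w = face.shape[:2]
    centers = [(h // 3, w // 3), (h // 3, 2 * w // 3),
               (h // 2, w // 2), (2 * h // 3, w // 2)]
    crops = []
    for cy, cx in centers:
        y0, x0 = max(cy - r // 2, 0), max(cx - r // 2, 0)
        crops.append(face[y0:y0 + r, x0:x0 + r])
    return crops

def toy_features(img, dim=8):
    """Stand-in feature extractor; a real system would use a CNN path."""
    rng = np.random.default_rng(abs(int(img.sum() * 1000)) % (2**32))
    return rng.standard_normal(dim)

def attention_fusion(global_feat, local_feats):
    """Weight the global and local feature vectors by importance scores
    (dot-product score + softmax here, a rough stand-in for the paper's
    stacked attention module) and sum them into one fused descriptor."""
    feats = np.stack([global_feat] + local_feats)   # (1 + #regions, dim)
    query = feats.mean(axis=0)                      # shared query vector
    scores = feats @ query                          # one score per path
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                        # softmax over paths
    return (weights[:, None] * feats).sum(axis=0)   # weighted fusion

face = np.random.default_rng(0).random((96, 96))
locals_ = [toy_features(c) for c in crop_four_regions(face)]
fused = attention_fusion(toy_features(face), locals_)
print(fused.shape)  # one fused feature vector, e.g. (8,)
```

In this sketch the fused descriptor would then feed a classifier head over the expression classes; the key idea mirrored from the abstract is that irrelevant regions never enter the local path, while the softmax weights decide how much each path contributes.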

Keywords: deep learning; attention; facial expression recognition (search for similar items in EconPapers)
JEL-codes: O3 (search for similar items in EconPapers)
Date: 2022
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (1)

Downloads: (external link)
https://www.mdpi.com/1999-5903/14/9/258/pdf (application/pdf)
https://www.mdpi.com/1999-5903/14/9/258/ (text/html)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:gam:jftint:v:14:y:2022:i:9:p:258-:d:901445

Access Statistics for this article

Future Internet is currently edited by Ms. Grace You

More articles in Future Internet from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().

 
Handle: RePEc:gam:jftint:v:14:y:2022:i:9:p:258-:d:901445