Appearance-Based Gaze Estimation Method Using Static Transformer Temporal Differential Network
Yujie Li,
Longzhao Huang,
Jiahui Chen,
Xiwen Wang and
Benying Tan
Additional contact information
Yujie Li: Guangxi Colleges and Universities Key Laboratory of AI Algorithm Engineering, School of Artificial Intelligence, Guilin University of Electronic Technology, Jinji Road, Guilin 541004, China
Longzhao Huang: School of Artificial Intelligence, Guilin University of Electronic Technology, Jinji Road, Guilin 541004, China
Jiahui Chen: School of Artificial Intelligence, Guilin University of Electronic Technology, Jinji Road, Guilin 541004, China
Xiwen Wang: School of Artificial Intelligence, Guilin University of Electronic Technology, Jinji Road, Guilin 541004, China
Benying Tan: Guangxi Colleges and Universities Key Laboratory of AI Algorithm Engineering, School of Artificial Intelligence, Guilin University of Electronic Technology, Jinji Road, Guilin 541004, China
Mathematics, 2023, vol. 11, issue 3, 1-18
Abstract:
Gaze behavior is important, non-invasive human–computer interaction information that plays a significant role in many fields, including skills transfer, psychology, and human–computer interaction. Recently, improving the performance of appearance-based gaze estimation using deep learning techniques has attracted increasing attention; however, several key problems remain in these deep-learning-based gaze estimation methods. First, the feature fusion stage is not fully considered: existing methods simply concatenate the obtained features into a single feature, without modeling their internal relationships. Second, dynamic features are difficult to learn, because ambiguously defined dynamic features make the extraction process unstable. In this study, we propose a novel method that addresses the feature fusion and dynamic feature extraction problems. We propose the static transformer module (STM), which uses a multi-head self-attention mechanism to fuse fine-grained eye features and coarse-grained facial features. Additionally, we propose an innovative recurrent neural network (RNN) cell, the temporal differential module (TDM), which extracts dynamic features. We integrated the STM and the TDM into the static transformer temporal differential network (STTDN). We evaluated the STTDN on two publicly available datasets (MPIIFaceGaze and Eyediap) and demonstrated the effectiveness of the STM and the TDM. Our results show that the proposed STTDN outperformed state-of-the-art methods, including on Eyediap (by 2.9%).
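Since this record carries only the abstract, the following PyTorch sketch merely illustrates the two ideas described above. It is a minimal sketch under assumptions: the module names mirror the abstract, but the feature dimension (256), the number of attention heads, the residual/LayerNorm wiring, and the use of a GRU cell over frame-to-frame feature differences are illustrative choices, not the authors' implementation.

    import torch
    import torch.nn as nn

    class StaticTransformerModule(nn.Module):
        # Sketch of the STM idea: fuse eye-feature tokens and a face-feature
        # token with multi-head self-attention instead of plain concatenation.
        def __init__(self, dim=256, heads=8):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.norm = nn.LayerNorm(dim)

        def forward(self, eye_feats, face_feat):
            # eye_feats: (B, 2, dim) left/right eye features,
            # face_feat: (B, dim) coarse facial feature.
            tokens = torch.cat([eye_feats, face_feat.unsqueeze(1)], dim=1)
            fused, _ = self.attn(tokens, tokens, tokens)
            fused = self.norm(tokens + fused)   # residual + layer norm
            return fused.mean(dim=1)            # pooled static feature

    class TemporalDifferentialModule(nn.Module):
        # Sketch of the TDM idea: an RNN-style cell driven by explicit
        # frame-to-frame feature differences, so the dynamic feature has a
        # stable, well-defined input.
        def __init__(self, dim=256):
            super().__init__()
            self.cell = nn.GRUCell(dim, dim)

        def forward(self, static_feats):
            # static_feats: (B, T, dim), one fused feature per video frame.
            B, T, D = static_feats.shape
            h = static_feats.new_zeros(B, D)
            for t in range(1, T):
                diff = static_feats[:, t] - static_feats[:, t - 1]
                h = self.cell(diff, h)
            return h                            # dynamic feature

    # Hypothetical usage with per-frame features from any CNN backbone.
    stm, tdm = StaticTransformerModule(), TemporalDifferentialModule()
    eye = torch.randn(4, 5, 2, 256)   # (B, T, eyes, dim)
    face = torch.randn(4, 5, 256)     # (B, T, dim)
    static = torch.stack([stm(eye[:, t], face[:, t]) for t in range(5)], dim=1)
    gaze = nn.Linear(256, 2)(tdm(static))       # (B, 2): e.g., yaw and pitch

In this reading, the STM replaces naive concatenation with self-attention over eye and face tokens, and the TDM makes the dynamic feature well defined by feeding the recurrent cell explicit differences between consecutive static features.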
Keywords: gaze estimation; static transformer temporal differential network; static transformer module; temporal differential module; self-attention mechanism
JEL-codes: C
Date: 2023
Downloads:
https://www.mdpi.com/2227-7390/11/3/686/pdf (application/pdf)
https://www.mdpi.com/2227-7390/11/3/686/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:11:y:2023:i:3:p:686-:d:1050410