Modelling Sign Language with Encoder-Only Transformers and Human Pose Estimation Keypoint Data

Woods, Luke T.; Rana, Zeeshan A.

Luke T. Woods and Zeeshan A. Rana
Additional contact information
Luke T. Woods: Digital Aviation Research and Technology Centre (DARTeC), Cranfield University, Cranfield, Bedfordshire MK43 0AL, UK
Zeeshan A. Rana: Centre for Aeronautics, School of Aerospace, Transport and Manufacturing (SATM), Cranfield University, Cranfield, Bedfordshire MK43 0AL, UK

Mathematics, 2023, vol. 11, issue 9, 1-28

Abstract: We present a study on modelling American Sign Language (ASL) with encoder-only transformers and human pose estimation keypoint data. Using an enhanced version of the publicly available Word-level ASL (WLASL) dataset, and a novel normalisation technique based on signer body size, we show the impact model architecture has on accurately classifying sets of 10, 50, 100, and 300 isolated, dynamic signs using two-dimensional keypoint coordinates only. We demonstrate the importance of running and reporting results from repeated experiments to describe and evaluate model performance. We include descriptions of the algorithms used to normalise the data and generate the train, validation, and test data splits. We report top-1, top-5, and top-10 accuracy results, evaluated with two separate model checkpoint metrics based on validation accuracy and loss. We find models with fewer than 100k learnable parameters can achieve high accuracy on reduced vocabulary datasets, paving the way for lightweight consumer hardware to perform tasks that are traditionally resource-intensive, requiring expensive, high-end equipment. We achieve top-1, top-5, and top-10 accuracies of 97%, 100%, and 100%, respectively, on a vocabulary size of 10 signs; 87%, 97%, and 98% on 50 signs; 83%, 96%, and 97% on 100 signs; and 71%, 90%, and 94% on 300 signs, thereby setting a new benchmark for this task.
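
Illustrative note on the metric: the top-1, top-5, and top-10 figures in the abstract count a prediction as correct when the true sign is among the k highest-scoring classes. The sketch below shows one common way to compute this. It is not taken from the article; the function name, the tie-breaking rule, and the example scores are assumptions made for illustration only.

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]],
                   labels: Sequence[int],
                   k: int) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("at least one sample is required")
    hits = 0
    for row, label in zip(scores, labels):
        # Rank class indices by descending score; ties are broken by class index
        # (an assumption of this sketch, not a rule stated in the article).
        ranked = sorted(range(len(row)), key=lambda c: (-row[c], c))
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Hypothetical scores for 4 samples over a 5-sign vocabulary (not real data).
example_scores = [
    [0.70, 0.10, 0.10, 0.05, 0.05],  # true class 0 is ranked first
    [0.20, 0.50, 0.15, 0.10, 0.05],  # true class 0 is ranked second
    [0.05, 0.10, 0.15, 0.30, 0.40],  # true class 2 is ranked third
    [0.40, 0.30, 0.15, 0.10, 0.05],  # true class 4 is ranked last
]
example_labels = [0, 0, 2, 4]

if __name__ == "__main__":
    for k in (1, 2, 3, 5):
        print(f"top-{k} accuracy: {top_k_accuracy(example_scores, example_labels, k):.2f}")
```

Top-k accuracy can never decrease as k grows, which matches the pattern of the reported figures at every vocabulary size.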

Keywords: sign language recognition; human pose estimation; classification; computer vision; deep learning; machine learning; supervised learning
JEL-codes: C
Date: 2023

Downloads:
https://www.mdpi.com/2227-7390/11/9/2129/pdf (application/pdf)
https://www.mdpi.com/2227-7390/11/9/2129/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:11:y:2023:i:9:p:2129-:d:1137920

Mathematics is currently edited by Ms. Emma He