Enhancing Robustness of Viewpoint Changes in 3D Skeleton-Based Human Action Recognition

Park, Jinyoon; Kim, Chulwoong; Kim, Seung-Chan

Jinyoon Park, Chulwoong Kim and Seung-Chan Kim
Additional contact information
Jinyoon Park: Machine Learning Systems Lab., Department of Sport Interaction Science, Sungkyunkwan University, Suwon 16419, Republic of Korea
Chulwoong Kim: TAIIPA—Taean AI Industry Promotion Agency, Taean 32154, Republic of Korea
Seung-Chan Kim: Machine Learning Systems Lab., Department of Sport Interaction Science, Sungkyunkwan University, Suwon 16419, Republic of Korea

Mathematics, 2023, vol. 11, issue 15, 1-17

Abstract: Previous research on 3D skeleton-based human action recognition has frequently relied on a sequence-wise viewpoint normalization process, which adjusts the view directions of all segmented action sequences. This type of approach typically demonstrates robustness against variations in viewpoint found in short-term videos, a characteristic commonly encountered in public datasets. However, our preliminary investigation of complex action sequences, such as discussions or smoking, reveals its limitations in capturing the intricacies of such actions. To address these view-dependency issues, we propose a straightforward, yet effective, sequence-wise augmentation technique. This strategy enhances the robustness of action recognition models, particularly against changes in viewing direction that mainly occur within the horizontal plane (azimuth) by rotating human key points around either the z-axis or the spine vector, effectively creating variations in viewing directions. We scrutinize the robustness of this approach against real-world viewpoint variations through extensive empirical studies on multiple public datasets, including an additional set of custom action sequences. Despite the simplicity of our approach, our experimental results consistently yield improved action recognition accuracies. Compared to the sequence-wise viewpoint normalization method used with advanced deep learning models like Conv1D, LSTM, and Transformer, our approach showed a relative increase in accuracy of 34.42% for the z-axis and 10.86% for the spine vector.
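
The augmentation described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a sequence stored as an array of shape (frames, joints, 3), hypothetical joint indices `pelvis=0` and `neck=1` defining the spine vector, a spine axis averaged over frames, and a rotation angle drawn uniformly from [-π, π]; none of these details are given in this record. One angle is drawn per sequence, so every frame is rotated identically.

```python
import numpy as np

def rotation_matrix(axis, angle):
    """Rotation by `angle` radians about `axis` (Rodrigues' formula)."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    x, y, z = axis
    # Skew-symmetric cross-product matrix of the unit axis
    K = np.array([[0.0, -z, y],
                  [z, 0.0, -x],
                  [-y, x, 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def augment_sequence(seq, rng, mode="z", pelvis=0, neck=1):
    """Rotate a whole skeleton sequence by one random angle.

    seq    : array of shape (T, J, 3), T frames and J joints
    mode   : "z" rotates about the z-axis, "spine" about the spine vector
    pelvis, neck : hypothetical joint indices (assumption, dataset-specific)
    """
    seq = np.asarray(seq, dtype=float)
    angle = rng.uniform(-np.pi, np.pi)  # assumed angle range
    if mode == "z":
        axis = np.array([0.0, 0.0, 1.0])
    else:
        # Assumption: spine vector = pelvis-to-neck direction, averaged over frames
        axis = (seq[:, neck] - seq[:, pelvis]).mean(axis=0)
    R = rotation_matrix(axis, angle)
    # Rotate about the mean pelvis position so the body stays in place
    origin = seq[:, pelvis].mean(axis=0)
    return (seq - origin) @ R.T + origin

rng = np.random.default_rng(0)
sequence = rng.normal(size=(5, 4, 3))      # toy data: 5 frames, 4 joints
rotated_z = augment_sequence(sequence, rng, mode="z")
rotated_spine = augment_sequence(sequence, rng, mode="spine")
```

Because the transform is a rigid rotation, bone lengths are preserved. In the z-axis mode the z-coordinates of all joints are unchanged, so only the azimuth of the viewing direction varies.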

Keywords: action recognition; machine learning; feature learning; skeletal data; data augmentation
JEL-codes: C
Date: 2023

Downloads:
https://www.mdpi.com/2227-7390/11/15/3280/pdf (application/pdf)
https://www.mdpi.com/2227-7390/11/15/3280/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:11:y:2023:i:15:p:3280-:d:1202894

Mathematics is currently edited by Ms. Emma He

Bibliographic data for series maintained by MDPI Indexing Manager.