Action Recognition in Videos through a Transfer-Learning-Based Technique
Elizabeth López-Lozada,
Humberto Sossa,
Elsa Rubio-Espino and
Jesús Yaljá Montiel-Pérez
Additional contact information
Elizabeth López-Lozada: Centro de Investigación en Computación, Instituto Politécnico Nacional, Ciudad de México 07738, Mexico
Humberto Sossa: Centro de Investigación en Computación, Instituto Politécnico Nacional, Ciudad de México 07738, Mexico
Elsa Rubio-Espino: Centro de Investigación en Computación, Instituto Politécnico Nacional, Ciudad de México 07738, Mexico
Jesús Yaljá Montiel-Pérez: Centro de Investigación en Computación, Instituto Politécnico Nacional, Ciudad de México 07738, Mexico
Mathematics, 2024, vol. 12, issue 20, 1-17
Abstract:
In computer vision, human action recognition is a prominent research topic whose popularity has grown with the development of deep learning. Deep learning models are typically trained directly on raw video, without prior processing. However, preliminary motion analysis can help direct model training toward the motion of the individuals performing the action, giving less weight to the environment in which the action occurs. This paper puts forth a novel methodology for human action recognition based on motion information that employs transfer-learning techniques. The proposed method comprises four stages: (1) human detection and tracking, (2) motion estimation, (3) feature extraction, and (4) action recognition using a two-stream model. To develop this work, a customized dataset was used, comprising videos of diverse actions (e.g., walking, running, cycling, drinking, and falling) extracted from multiple public sources and websites, including Pexels and MixKit. This realistic and diverse dataset allowed for a comprehensive evaluation of the proposed method, demonstrating its effectiveness in different scenarios and conditions. Furthermore, the performance of seven pre-trained models for feature extraction was evaluated: Inception-v3, MobileNet-v2, MobileNet-v3-L, VGG-16, VGG-19, Xception, and ConvNeXt-L. The ConvNeXt-L model yielded the best results. In addition, using pre-trained models for feature extraction made it feasible to train the system on a personal computer with a single graphics processing unit, achieving an accuracy of 94.9%. The experimental findings suggest that integrating motion information enhances action recognition performance.
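The following is a minimal, illustrative sketch (Python with PyTorch/torchvision) of the transfer-learning feature-extraction stage described in the abstract: an ImageNet-pretrained ConvNeXt-Large backbone is frozen and used to turn video frames into per-frame feature vectors. It is not the authors' code; frame sampling, person detection and tracking, motion estimation, and the two-stream classifier are assumed to be handled elsewhere, and the helper name extract_clip_features is hypothetical.

import torch
from torchvision.models import convnext_large, ConvNeXt_Large_Weights

# Load an ImageNet-pretrained ConvNeXt-Large and drop its classification layer,
# keeping the pooled, flattened 1536-dimensional feature vector per image.
weights = ConvNeXt_Large_Weights.IMAGENET1K_V1
backbone = convnext_large(weights=weights)
backbone.classifier[-1] = torch.nn.Identity()
backbone.eval()
for p in backbone.parameters():
    p.requires_grad = False  # transfer learning: the backbone stays frozen

# The weights object carries the resize/crop/normalization preset the model expects.
preprocess = weights.transforms()

@torch.no_grad()
def extract_clip_features(frames):
    """frames: iterable of PIL RGB frames -> (T, 1536) tensor of per-frame features."""
    batch = torch.stack([preprocess(f) for f in frames])
    return backbone(batch)

Freezing the backbone in this way keeps only the lightweight downstream classifier trainable, which is what makes training on a single-GPU personal computer practical, as the abstract reports.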
Keywords: human action recognition; deep learning; video-based action recognition; computer vision; transfer learning
JEL-codes: C
Date: 2024
Downloads: (external link)
https://www.mdpi.com/2227-7390/12/20/3245/pdf (application/pdf)
https://www.mdpi.com/2227-7390/12/20/3245/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:12:y:2024:i:20:p:3245-:d:1500265