MHAiR: A Dataset of Audio-Image Representations for Multimodal Human Actions

Muhammad Bilal Shaikh, Douglas Chai, Syed Mohammed Shamsul Islam and Naveed Akhtar
Additional contact information
Muhammad Bilal Shaikh: School of Engineering, Edith Cowan University, 270 Joondalup Drive, Joondalup, Perth, WA 6027, Australia
Douglas Chai: School of Engineering, Edith Cowan University, 270 Joondalup Drive, Joondalup, Perth, WA 6027, Australia
Syed Mohammed Shamsul Islam: School of Science, Edith Cowan University, 270 Joondalup Drive, Joondalup, Perth, WA 6027, Australia
Naveed Akhtar: School of Computing and Information Systems, The University of Melbourne, Melbourne Connect, 700 Swanston Street, Carlton, VIC 3053, Australia

Data, 2024, vol. 9, issue 2, 1-12

Abstract: The MHAiR dataset (audio-image representations for multimodal human actions) contains six image representations of audio signals that capture the temporal dynamics of human actions in a compact and informative way. The audio was extracted from an existing video dataset, UCF101. Each sample is approximately 10 s long, and the dataset is split into 4893 training samples and 1944 testing samples. The extracted audio feature sequences were converted into images, which can be used for human action recognition and related tasks, and can serve as a benchmark for evaluating the performance of machine learning models on these tasks. These audio-image representations are suitable for a wide range of applications, such as surveillance, healthcare monitoring, and robotics. The dataset also supports transfer learning, where pre-trained models can be fine-tuned on a specific task using the audio images. It can thus facilitate the development of new techniques for improving the accuracy of human action-related tasks, while also serving as a standard benchmark for comparing machine learning models and algorithms.
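The listing does not specify which six image representations the dataset uses. As a minimal sketch of the general technique the abstract describes, the following converts a 10 s audio clip into one common image representation, a quantised log-magnitude spectrogram; the sample rate, STFT window size, and helper name are illustrative assumptions, not details from the paper.

```python
import numpy as np
from scipy.signal import spectrogram

def audio_to_spectrogram_image(signal, sample_rate, n_levels=256):
    """Turn a 1-D audio signal into a 2-D uint8 array usable as a
    grayscale image. A generic sketch, not the exact MHAiR pipeline."""
    # Short-time Fourier transform; nperseg=512 is an assumed window size
    freqs, times, sxx = spectrogram(signal, fs=sample_rate, nperseg=512)
    log_sxx = np.log1p(sxx)                    # compress dynamic range
    lo, hi = log_sxx.min(), log_sxx.max()
    norm = (log_sxx - lo) / (hi - lo + 1e-12)  # rescale to [0, 1]
    return (norm * (n_levels - 1)).astype(np.uint8)

# Example: a synthetic 440 Hz tone at the ~10 s clip length used in MHAiR
sr = 16000                                     # assumed sample rate
t = np.linspace(0, 10, 10 * sr, endpoint=False)
clip = np.sin(2 * np.pi * 440 * t)
img = audio_to_spectrogram_image(clip, sr)     # 2-D grayscale image array
```

An image produced this way can be fed to any standard image classifier, which is what makes pre-trained vision models directly reusable for the transfer-learning scenario the abstract mentions.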

Keywords: human action recognition; image representations; multimodal dataset; computer vision
JEL-codes: C8 C80 C81 C82 C83
Date: 2024

Downloads: (external link)
https://www.mdpi.com/2306-5729/9/2/21/pdf (application/pdf)
https://www.mdpi.com/2306-5729/9/2/21/ (text/html)



Persistent link: https://EconPapers.repec.org/RePEc:gam:jdataj:v:9:y:2024:i:2:p:21-:d:1326377


Data is currently edited by Ms. Cecilia Yang

More articles in Data from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager.

 
Page updated 2025-03-19
Handle: RePEc:gam:jdataj:v:9:y:2024:i:2:p:21-:d:1326377