Int.2D-3D-CNN: Integrated 2D and 3D Convolutional Neural Networks for Video Violence Recognition
Wimolsree Getsopon,
Sirawan Phiphitphatphaisit,
Emmanuel Okafor and
Olarik Surinta
Additional contact information
Wimolsree Getsopon: Multi-Agent Intelligent Simulation Laboratory (MISL) Research Unit, Department of Information Technology, Faculty of Informatics, Mahasarakham University, Mahasarakham 44150, Thailand
Sirawan Phiphitphatphaisit: Department of Information System, Faculty of Business Administration and Information Technology, Rajamangala University of Technology Isan Khon Kaen Campus, Khon Kaen 40000, Thailand
Emmanuel Okafor: SDAIA-KFUPM Joint Research Center for Artificial Intelligence, King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia
Olarik Surinta: Multi-Agent Intelligent Simulation Laboratory (MISL) Research Unit, Department of Information Technology, Faculty of Informatics, Mahasarakham University, Mahasarakham 44150, Thailand
Mathematics, 2025, vol. 13, issue 16, 1-35
Abstract:
Intelligent video analysis tools have advanced significantly, with numerous cameras installed in various locations to enhance security and monitor unusual events. However, the effective detection and monitoring of violent incidents often depend on manual effort and time-consuming analysis of recorded footage, which can delay timely interventions. Deep learning has emerged as a powerful approach for extracting the critical features needed to identify and classify violent behavior, enabling the development of accurate and scalable models across diverse domains. This study presents the Int.2D-3D-CNN architecture, which integrates a two-dimensional convolutional neural network (2D-CNN) with a three-dimensional CNN (3D-CNN) for video-based violence recognition. Compared with traditional 2D-CNN and 3D-CNN models, the proposed Int.2D-3D-CNN model achieves improved performance on the Hockey Fight, Movie, and Violent Flows datasets. The architecture captures both static and dynamic characteristics of violent scenes by integrating spatial and temporal information. Specifically, the 2D-CNN component employs the lightweight MobileNetV1 and MobileNetV2 to extract spatial features from individual frames, while a simplified 3D-CNN module with a single 3D convolution layer captures motion and temporal dependencies across frame sequences. Evaluation results highlight the robustness of the proposed model in accurately distinguishing violent from non-violent videos under diverse conditions. The Int.2D-3D-CNN model achieved accuracies of 98%, 100%, and 98% on the Hockey Fight, Movie, and Violent Flows datasets, respectively, indicating strong potential for violence recognition applications.
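The fusion described in the abstract — per-frame spatial features from a 2D backbone combined with temporal features from a single 3D convolution layer — can be sketched at the level of tensor shapes. The sketch below is illustrative only: the global-average-pooling stand-in for the MobileNet backbone, the kernel sizes, and the number of filters are assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def spatial_features_2d(frames):
    """Per-frame spatial features (stand-in for a MobileNet backbone):
    global average pooling over each frame's spatial dimensions.
    frames: (T, H, W, C) -> (T, C)."""
    return frames.mean(axis=(1, 2))

def temporal_features_3d(frames, kernel):
    """A single 3D convolution over the clip (valid padding), then global
    average pooling, yielding one temporal descriptor per filter."""
    T, H, W, C = frames.shape
    kt, kh, kw, _, n_filters = kernel.shape
    out_t, out_h, out_w = T - kt + 1, H - kh + 1, W - kw + 1
    out = np.zeros((out_t, out_h, out_w, n_filters))
    for f in range(n_filters):
        for t in range(out_t):
            for i in range(out_h):
                for j in range(out_w):
                    patch = frames[t:t + kt, i:i + kh, j:j + kw, :]
                    out[t, i, j, f] = np.sum(patch * kernel[..., f])
    return out.mean(axis=(0, 1, 2))  # (n_filters,)

# Toy clip: 8 frames of 16x16 RGB (sizes chosen for illustration).
clip = rng.random((8, 16, 16, 3))
k = rng.random((3, 3, 3, 3, 4)) * 0.01  # (kt, kh, kw, C, filters), hypothetical

f2d = spatial_features_2d(clip).mean(axis=0)  # clip-level spatial descriptor, (3,)
f3d = temporal_features_3d(clip, k)           # clip-level temporal descriptor, (4,)
fused = np.concatenate([f2d, f3d])            # joint spatial + temporal feature
print(fused.shape)  # (7,)
```

In the actual model the two branches would produce much higher-dimensional features and the fused vector would feed a classifier head, but the concatenation of a frame-averaged 2D descriptor with a 3D-convolution descriptor follows the same pattern.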
Keywords: 2D convolutional neural network; 3D convolutional neural network; deep feature extraction; frame-level deep features; video violence recognition
JEL-codes: C
Date: 2025
Downloads: (external link)
https://www.mdpi.com/2227-7390/13/16/2665/pdf (application/pdf)
https://www.mdpi.com/2227-7390/13/16/2665/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:13:y:2025:i:16:p:2665-:d:1727782
Mathematics is currently edited by Ms. Emma He