An Efficient Deep Convolutional Neural Network Approach for Object Detection and Recognition Using a Multi-Scale Anchor Box in Real-Time
Vijayakumar Varadarajan,
Dweepna Garg and
Ketan Kotecha
Additional contact information
Vijayakumar Varadarajan: School of Computer Science and Engineering, The University of New South Wales, Sydney 1466, Australia
Dweepna Garg: Department of Computer Engineering, Devang Patel Institute of Advance Technology and Research (DEPSTAR), Faculty of Technology and Engineering (FTE), Charotar University of Science and Technology (CHARUSAT), Anand 388421, India
Ketan Kotecha: Symbiosis Centre for Applied Artificial Intelligence, Symbiosis International (Deemed University), Pune 412115, India
Future Internet, 2021, vol. 13, issue 12, 1-19
Abstract:
Deep learning is a relatively new branch of machine learning in which computers are taught to recognize patterns in massive volumes of data. It primarily describes learning at various levels of representation, which aids in understanding data that includes text, voice, and visuals. Convolutional neural networks have been used to solve challenges in computer vision, including object identification, image classification, semantic segmentation and a lot more. Object detection in videos involves confirming the presence of the object in the image or video and then locating it accurately for recognition. In the video, modelling techniques suffer from high computation and memory costs, which may decrease performance measures such as accuracy and efficiency to identify the object accurately in real-time. The current object detection technique based on a deep convolution neural network requires executing multilevel convolution and pooling operations on the entire image to extract deep semantic properties from it. For large objects, detection models can provide superior results; however, those models fail to detect the varying size of the objects that have low resolution and are greatly influenced by noise because the features after the repeated convolution operations of existing models do not fully represent the essential characteristics of the objects in real-time. With the help of a multi-scale anchor box, the proposed approach reported in this paper enhances the detection accuracy by extracting features at multiple convolution levels of the object. The major contribution of this paper is to design a model to understand better the parameters and the hyper-parameters which affect the detection and the recognition of objects of varying sizes and shapes, and to achieve real-time object detection and recognition speeds by improving accuracy. The proposed model has achieved 84.49 mAP on the test set of the Pascal VOC-2007 dataset at 11 FPS, which is comparatively better than other real-time object detection models.
Keywords: deep learning; convolution neural network; object detection and recognition; PASCAL VOC dataset; FDDB dataset (search for similar items in EconPapers)
JEL-codes: O3 (search for similar items in EconPapers)
Date: 2021
References: View references in EconPapers View complete reference list from CitEc
Citations:
Downloads: (external link)
https://www.mdpi.com/1999-5903/13/12/307/pdf (application/pdf)
https://www.mdpi.com/1999-5903/13/12/307/ (text/html)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:gam:jftint:v:13:y:2021:i:12:p:307-:d:691137
Access Statistics for this article
Future Internet is currently edited by Ms. Grace You
More articles in Future Internet from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().