Transport Object Detection in Street View Imagery Using Decomposed Convolutional Neural Networks
Yunpeng Bai, Changjing Shang, Ying Li, Liang Shen, Shangzhu Jin and Qiang Shen
Additional contact information
Yunpeng Bai: Department of Computer Science, Aberystwyth University, Aberystwyth SY23 3DB, UK
Changjing Shang: Department of Computer Science, Aberystwyth University, Aberystwyth SY23 3DB, UK
Ying Li: School of Computer Science, Northwestern Polytechnical University, Xi’an 710129, China
Liang Shen: School of Information Engineering, Fujian Business University, Fuzhou 350506, China
Shangzhu Jin: Information Office, Chongqing University of Science and Technology, Chongqing 401331, China
Qiang Shen: School of Computer Science, Northwestern Polytechnical University, Xi’an 710129, China
Mathematics, 2023, vol. 11, issue 18, 1-22
Abstract:
Deep learning has achieved great success in many visual recognition tasks, including object detection. Nevertheless, existing deep networks are computationally expensive and memory intensive, hindering their deployment in resource-constrained environments such as the mobile or embedded devices widely used by city travellers. Recently, a case study with Google Street View (GSV) showed that street imagery is a potentially valid source for estimating city-level travel patterns, which makes transport object detection in such imagery a critical challenge. This paper presents a compressed deep network, built with tensor decomposition, for detecting transport objects in GSV images in a sustainable and eco-friendly manner. In particular, a new public dataset named Transport Mode Share-Tokyo (TMS-Tokyo) is created for transport object detection. It is based on the selection and filtering of 32,555 images from the GSV imagery of Tokyo, covering 50,827 visible transport objects (including cars, pedestrians, buses, trucks, motors, vans, cyclists and parked bicycles). A compressed convolutional neural network, termed SVDet, is then proposed for street-view object detection, derived by applying tensor train decomposition to a given baseline detector. The proposed method yields a mean average precision (mAP) of 77.6% on the newly introduced TMS-Tokyo dataset, requiring just 17.29 M parameters and 16.52 G FLOPs, thereby markedly surpassing existing state-of-the-art methods reported in the literature.
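To make the compression idea concrete, the following is a minimal NumPy sketch of tensor train (TT) decomposition applied to a toy convolutional weight tensor, using the standard TT-SVD scheme of sequential truncated SVDs. It is an illustration only, not the paper's SVDet implementation; the names tt_svd and tt_reconstruct, the rank budget max_rank=32 and the 256x256x3x3 layer shape are assumptions chosen for the example.

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Decompose a d-way tensor into tensor-train (TT) cores via
    sequential truncated SVDs (the classic TT-SVD scheme)."""
    shape, cores, r_prev = tensor.shape, [], 1
    C = tensor.copy()
    for k in range(len(shape) - 1):
        # Unfold: (r_prev * n_k) rows, all remaining modes as columns.
        C = C.reshape(r_prev * shape[k], -1)
        U, S, Vt = np.linalg.svd(C, full_matrices=False)
        r = min(max_rank, S.size)  # truncate to the TT rank budget
        cores.append(U[:, :r].reshape(r_prev, shape[k], r))
        C = np.diag(S[:r]) @ Vt[:r]  # carry the residual factor forward
        r_prev = r
    cores.append(C.reshape(r_prev, shape[-1], 1))
    return cores

def tt_reconstruct(cores):
    """Contract TT cores back into a dense tensor (to check the error)."""
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([-1], [0]))
    return out[0, ..., 0]  # drop the boundary ranks of size 1

# Toy conv weight: 256 output channels, 256 input channels, 3x3 kernel.
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256, 3, 3)).astype(np.float32)
cores = tt_svd(W, max_rank=32)
W_hat = tt_reconstruct(cores)
saved = W.size / sum(c.size for c in cores)
err = np.linalg.norm(W - W_hat) / np.linalg.norm(W)
print(f"compression: {saved:.1f}x, relative error: {err:.3f}")
```

On this random tensor the truncation error is large, since random weights have no low-rank structure; trained convolutional kernels typically admit much lower-rank TT representations, which is what makes compression ratios like the paper's parameter and FLOP reductions attainable in practice.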
Keywords: convolutional neural networks; street-view object detection; tensor train decomposition
JEL-codes: C
Date: 2023
Downloads:
https://www.mdpi.com/2227-7390/11/18/3839/pdf (application/pdf)
https://www.mdpi.com/2227-7390/11/18/3839/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:11:y:2023:i:18:p:3839-:d:1235024