Building Self-Healing Feature Based on Faster R-CNN Deep Learning Technique in Web Data Extraction Systems
Sudhir Kumar Patnaik and C. Narendra Babu
Sudhir Kumar Patnaik: Department of Computer Science and Engineering, M.S. Ramaiah University of Applied Sciences, MSR Nagar, Bangalore, India
C. Narendra Babu: Department of Computer Science and Engineering, M.S. Ramaiah University of Applied Sciences, MSR Nagar, Bangalore, India
Journal of Information & Knowledge Management (JIKM), 2022, vol. 21, issue 02, 1-27
Abstract:
Web data extraction has evolved over the years from extracting data from documents to extracting it from today’s World Wide Web (WWW). The growth of the WWW has placed data at the centre of this ecosystem, benefiting society at large, businesses and consumers. The proposed system uses a deep learning technique, the Faster region-based convolutional neural network (Faster R-CNN), for three tasks: automated navigation, data extraction, and self-healing of the data extraction engine so that it adapts to dynamic changes in website layout. The system trains a Faster R-CNN model to detect products in a web page using bounding-box image detection, and it extracts product details with high extraction accuracy. Deep learning techniques for image detection have advanced rapidly in many fields, but their application to data extraction is what makes this paper unique. An e-commerce retail website is used as a real-world example to demonstrate the self-healing capability of the proposed automated web data extraction system.
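The full method is available only to subscribers, so the following is a minimal sketch, under stated assumptions, of how a bounding-box detector could support a self-healing extractor. It assumes that each rendered DOM element's on-screen box is known and that a trained detector (Faster R-CNN in the paper; here stubbed out as a precomputed box) returns a box for the product detail. When a cached selector stops matching after a layout change, the extractor picks the DOM element whose box best overlaps the detected box by intersection over union (IoU) and re-learns its selector. The names `Box`, `DomElement`, `iou`, `match_detection` and `extract_with_healing`, the 0.5 IoU threshold, and the DOM-matching strategy are all hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in page pixels: (x1, y1) top-left, (x2, y2) bottom-right."""
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        # Degenerate or inverted boxes (no overlap) have zero area.
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes; 0.0 when they do not overlap."""
    inter = Box(max(a.x1, b.x1), max(a.y1, b.y1), min(a.x2, b.x2), min(a.y2, b.y2)).area()
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


@dataclass
class DomElement:
    """Hypothetical record of a rendered DOM node: selector, text and on-screen box."""
    selector: str
    text: str
    box: Box


def match_detection(detected: Box, elements: list[DomElement],
                    min_iou: float = 0.5) -> Optional[DomElement]:
    """Return the element whose box best overlaps the detected box, or None if too weak."""
    best = max(elements, key=lambda e: iou(detected, e.box), default=None)
    if best is None or iou(detected, best.box) < min_iou:
        return None
    return best


def extract_with_healing(cached_selector: str, detected: Box,
                         elements: list[DomElement]) -> tuple[str, str]:
    """Use the cached selector if it still exists; otherwise fall back to the
    detector's box and return the re-learned selector with the extracted text."""
    for e in elements:
        if e.selector == cached_selector:
            return e.selector, e.text
    healed = match_detection(detected, elements)
    if healed is None:
        raise LookupError("no DOM element overlaps the detected box")
    return healed.selector, healed.text


if __name__ == "__main__":
    # After a hypothetical redesign, the old selector "div.price-old" is gone.
    page = [
        DomElement("div.header", "Shop", Box(0, 0, 1200, 80)),
        DomElement("span.product-price", "$19.99", Box(600, 300, 760, 340)),
    ]
    detected_price = Box(605, 298, 765, 342)  # stand-in for a detector's output box
    print(*extract_with_healing("div.price-old", detected_price, page))
    # prints: span.product-price $19.99
```

Matching on rendered geometry rather than on markup is what lets this kind of sketch survive a change of class names or tag structure: as long as the detector still finds the product detail visually, the selector can be rebuilt. Returning the new selector lets a caller cache it, so the cheaper selector path is used again on the next page.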
Keywords: Adaptive; data extraction; deep learning; Faster R-CNN; self-healing
Date: 2022
Full text (restricted to subscribers): http://www.worldscientific.com/doi/abs/10.1142/S0219649222500290
Persistent link: https://EconPapers.repec.org/RePEc:wsi:jikmxx:v:21:y:2022:i:02:n:s0219649222500290
DOI: 10.1142/S0219649222500290
Journal of Information & Knowledge Management (JIKM) is currently edited by Professor Suliman Hawamdeh
More articles in Journal of Information & Knowledge Management (JIKM) from World Scientific Publishing Co. Pte. Ltd.
Bibliographic data for series maintained by Tai Tone Lim.