Design of a Parallel and Scalable Crawler for the Hidden Web
Sonali Gupta and
Komal Kumar Bhatia
Additional contact information
Sonali Gupta: J. C. Bose University of Science and Technology, India
Komal Kumar Bhatia: J. C. Bose University of Science and Technology, India
International Journal of Information Retrieval Research (IJIRR), 2022, vol. 12, issue 1, 1-23
Abstract:
The WWW contains huge amount of information from different areas. This information may be present virtually in the form of web pages, media, articles (research journals / magazine), blogs etc. A major portion of the information is present in web databases that can be retrieved by raising queries at the interface offered by the specific database and is thus called the Hidden Web. An important issue is to efficiently retrieve and provide access to this enormous amount of information through crawling. In this paper, we present the architecture of a parallel crawler for the Hidden Web that avoids download overlaps by following a domain-specific approach. The experimental results further show that the proposed parallel Hidden web crawler (PSHWC), not only effectively but also efficiently extracts and download the contents in the Hidden web databases
Date: 2022
References: View complete reference list from CitEc
Citations:
Downloads: (external link)
http://services.igi-global.com/resolvedoi/resolve.aspx?doi=10.4018/IJIRR.289612 (application/pdf)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:igg:jirr00:v:12:y:2022:i:1:p:1-23
Access Statistics for this article
International Journal of Information Retrieval Research (IJIRR) is currently edited by Zhongyu Lu
More articles in International Journal of Information Retrieval Research (IJIRR) from IGI Global
Bibliographic data for series maintained by Journal Editor ().