Evaluating and Improving the Robustness of Web Crawlers Against IP Blocking and Captchas
Victor Onyenagubom
Additional contact information
Victor Onyenagubom: Department of Computing, Teesside University, London, United Kingdom
International Journal of Latest Technology in Engineering, Management & Applied Science, 2024, vol. 13, issue 4, 146-154
Abstract:
In this work, we demonstrate the capability of a JavaScript-based web crawler to overcome anti-crawling measures such as CAPTCHAs and IP blocking. We also examine the ethical and legal dimensions of web crawling and provide recommendations for future research in this domain. A web crawler is an automated software program that navigates websites and extracts information. While it serves purposes such as website analysis and indexing, it can also be misused to extract personal data, scrape content, and overload servers. Website administrators often employ anti-crawling techniques, such as CAPTCHAs, robots.txt, and IP blocking, to prevent malicious web crawlers from accessing their content. These techniques aim to limit a crawler's ability to scrape, access, or overload website resources without impeding legitimate users. The objective of this study is to demonstrate and enhance the resilience of legitimate web crawlers against anti-crawling techniques such as CAPTCHAs and IP blocking, challenging the notion that these measures are universally effective against all types of web crawlers.
Date: 2024
Downloads:
https://www.ijltemas.in/DigitalLibrary/Vol.13Issue4/146-154.pdf (application/pdf)
https://www.ijltemas.in/papers/volume-13-issue-4/146-154.html (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:bjb:journl:v:13:y:2024:i:4:p:146-154
International Journal of Latest Technology in Engineering, Management & Applied Science is currently edited by Dr. Pawan Verma