A Focused Crawler for Borderlands Situation Information with Geographical Properties of Place Names
Dongyang Hou, Hao Wu, Jun Chen and Ran Li
Additional contact information
Dongyang Hou: School of Environment Science and Spatial Informatics, China University of Mining and Technology, Xuzhou 221116, China
Hao Wu: National Geomatics Center of China, 28 Lianhuachi West Road, Beijing 100830, China
Jun Chen: School of Environment Science and Spatial Informatics, China University of Mining and Technology, Xuzhou 221116, China
Ran Li: National Geomatics Center of China, 28 Lianhuachi West Road, Beijing 100830, China
Sustainability, 2014, vol. 6, issue 10, 1-24
Abstract:
Place names are an important component of borderlands situation information and play a significant role in collecting such information from the Internet with focused crawlers. However, current focused crawlers treat a place name like any other common keyword, ignoring its geographical properties, which may reduce their effectiveness. To address this problem, this paper first discusses the importance of place names in focused crawlers in terms of location and spatial relations, and then proposes a two-tuple-based topic representation method that expresses place names and common keywords separately. Spatial relations between place names are then introduced into the calculation of the relevance between given topics and webpages, which can make the calculation more accurate. On this basis, a focused crawler prototype for collecting borderlands situation information is designed and implemented. Crawling speed and F-Score are adopted to evaluate its efficiency and effectiveness, respectively. Experimental results indicate that the efficiency of the proposed focused crawler is consistent with the polite access interval and that it could meet the daily demand of borderlands situation information collection. Additionally, the F-Score of the proposed focused crawler increases by around 7%, which means that it is more effective than the traditional best-first focused crawler.
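As a reading aid for the evaluation metric named in the abstract, the following is a minimal sketch of the standard F-Score (F-beta) computed from precision and recall. The function name, its parameters, and the page counts in the usage line are hypothetical and are not taken from the paper, whose exact formulation may differ.

```python
def f_score(relevant_retrieved: int, retrieved: int,
            relevant_total: int, beta: float = 1.0) -> float:
    """Standard F-beta score from raw counts (illustrative only).

    relevant_retrieved: crawled pages judged relevant to the topic
    retrieved:          all pages the crawler downloaded
    relevant_total:     all relevant pages in the evaluation set
    """
    # Guard against empty denominators.
    if retrieved == 0 or relevant_total == 0:
        return 0.0
    precision = relevant_retrieved / retrieved
    recall = relevant_retrieved / relevant_total
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    # With beta = 1 this is the harmonic mean of precision and recall.
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: 60 relevant pages among 100 crawled, 80 relevant overall.
score = f_score(60, 100, 80)
```

With `beta = 1` precision and recall are weighted equally, so a crawler that downloads fewer off-topic pages while still finding most relevant ones scores higher.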
Keywords: focused crawler; place name; web information collection; borderlands situation; relevance calculation; spatial relations
JEL-codes: O13 Q Q0 Q2 Q3 Q5 Q56
Date: 2014
Downloads:
https://www.mdpi.com/2071-1050/6/10/6529/pdf (application/pdf)
https://www.mdpi.com/2071-1050/6/10/6529/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jsusta:v:6:y:2014:i:10:p:6529-6552:d:40740
Sustainability is currently edited by Ms. Alexandra Wu