Status Locality on the Web: Implications for Building Focused Collections

Pant, Gautam; Srinivasan, Padmini

Gautam Pant and Padmini Srinivasan
Additional contact information
Gautam Pant: Department of Management Sciences, The University of Iowa, Iowa City, Iowa 52242
Padmini Srinivasan: Department of Computer Science, The University of Iowa, Iowa City, Iowa 52242

Information Systems Research, 2013, vol. 24, issue 3, 802-821

Abstract: Topical locality on the Web is the notion that pages tend to link to other topically similar pages and that such similarity decays rapidly with link distance. This supports meaningful Web browsing and searching by information consumers. It also allows topical Web crawlers, programs that fetch pages by following hyperlinks, to harvest topical subsets of the Web for applications such as those in vertical search and business intelligence. We show that the Web exhibits another property that we call “status locality.” It is based on the notion that pages tend to link to other pages of similar status (importance) and that this status similarity also decays rapidly with link distance. Analogous to topical locality, status locality may also be exploited by Web crawlers. Collections built by such crawlers include pages that are both topically relevant and important. This capability is crucial because of the large number of Web pages addressing even niche topics. The challenge in exploiting status locality while crawling is that page importance (or status) is typically recognized through global measures computed by processing link data from billions of pages. In contrast, topical Web crawlers depend on local information based on previously downloaded pages. We solve this problem by using previously developed methods that utilize local characteristics of pages to estimate their global status. This leads to the design of new crawlers, specifically utility-biased crawlers guided by a Cobb-Douglas utility function. Our crawler experiments show that status and topicality of Web collections present a trade-off. An adaptive version of our utility-biased crawler dynamically modifies the output elasticities of topicality and status to create Web collections that maintain high average topicality. This can be done while simultaneously achieving significantly higher average status than several benchmarks, including a state-of-the-art topical crawler.
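
Illustrative sketch (not taken from the article): the abstract names a Cobb-Douglas utility function with output elasticities for topicality and status but does not give its exact form. The Python below assumes the standard two-input form U = T^alpha * S^beta and shows how such a score could order candidate URLs in a crawl frontier. The function names, the example URLs, the scores, and the elasticity values are all hypothetical, and the adaptive rule for changing the elasticities is not modeled.

```python
def cobb_douglas_utility(topicality, status, alpha=0.5, beta=0.5):
    """Assumed standard Cobb-Douglas form: topicality**alpha * status**beta.

    topicality and status are non-negative local estimates for a page;
    alpha and beta are the (hypothetical) output elasticities.
    """
    if topicality < 0 or status < 0:
        raise ValueError("scores must be non-negative")
    return (topicality ** alpha) * (status ** beta)


def rank_frontier(candidates, alpha=0.5, beta=0.5):
    """Order candidate URLs by descending utility.

    candidates: iterable of (url, topicality, status) tuples.
    """
    scored = [
        (cobb_douglas_utility(t, s, alpha, beta), url)
        for url, t, s in candidates
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [url for _, url in scored]


# Made-up frontier entries: (url, estimated topicality, estimated status).
frontier = [
    ("http://example.org/a", 0.9, 0.1),
    ("http://example.org/b", 0.6, 0.6),
    ("http://example.org/c", 0.2, 0.9),
]

# Equal elasticities favor the page that is balanced on both dimensions.
balanced_order = rank_frontier(frontier)

# Weighting topicality heavily moves the most topical page to the front.
topical_order = rank_frontier(frontier, alpha=0.9, beta=0.1)

print(balanced_order)
print(topical_order)
```

Under these assumptions, shifting weight between alpha and beta changes which pages are fetched first, which is one way to picture the topicality and status trade-off the abstract reports.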

Keywords: status locality; predictive models; topical crawlers; homophily
Date: 2013

Downloads: (external link)
http://dx.doi.org/10.1287/isre.1120.0457 (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:inm:orisre:v:24:y:2013:i:3:p:802-821
