EconPapers    

Understanding and customizing stopword lists for enhanced patent mapping

Antoine Blanchard

World Patent Information, 2007, vol. 29, issue 4, 308-316

Abstract: While the use of patent mapping tools is growing, the 'black-box' systems involved do not generally allow the user to intervene beyond the preliminary retrieval of documents. There is one exception: the stopword list, i.e. the list of 'noise' words to be ignored, which can be modified to one's liking and which dramatically affects the final output and analysis. This paper draws on information science and computer science to provide clues for a better understanding of the origin and purpose of stopword lists, and of how they fit into the mapping algorithm. Further, it stresses the need for stopword lists that depend on the document corpus being analyzed. Thus, the analyst is invited to add and remove stopwords, or even, in order to avoid inherent biases, to use algorithms that automatically create ad hoc stopword lists.
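The abstract names three levers on the stopword list: a default list shipped with the tool, manual additions and removals by the analyst, and corpus-dependent lists built automatically. The Python sketch below illustrates all three under stated assumptions. The generic list, the `max_df` threshold of 0.8, the sample documents, and the helper names (`corpus_stopwords`, `build_stopwords`, `content_terms`) are hypothetical. The document-frequency cutoff is one simple way to derive ad hoc stopwords from a corpus, not necessarily the algorithm the paper proposes or any mapping tool actually uses.

```python
import re
from collections import Counter

# Hypothetical generic list, standing in for a mapping tool's default stopwords.
GENERIC_STOPWORDS = {"a", "an", "the", "of", "and", "or", "to", "in",
                     "is", "for", "with", "said"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def corpus_stopwords(docs, max_df=0.8):
    """Return words occurring in more than max_df of the documents.

    Assumption: a plain document-frequency cutoff; words that appear almost
    everywhere in this corpus cannot discriminate between its documents.
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # count each word once per document
    return {word for word, count in df.items() if count / n > max_df}


def build_stopwords(docs, add=(), remove=(), max_df=0.8):
    """Combine generic and corpus-derived stopwords, then apply manual edits."""
    stop = set(GENERIC_STOPWORDS) | corpus_stopwords(docs, max_df)
    stop |= set(add)     # analyst adds domain noise words
    stop -= set(remove)  # analyst rescues words that matter for this corpus
    return stop


def content_terms(doc, stopwords):
    """Keep only the tokens that are not stopwords."""
    return [t for t in tokenize(doc) if t not in stopwords]


# Hypothetical mini-corpus of patent titles.
docs = [
    "A method for coating a substrate with a polymer layer",
    "A method for etching a substrate using plasma",
    "A device comprising a polymer membrane and a method of use",
]

# "method" is frequent enough to be flagged automatically, but the analyst
# keeps it because it separates process patents from apparatus patents.
stop = build_stopwords(docs, add=["comprising", "using"], remove=["method"])
print(content_terms(docs[2], stop))
# ['device', 'polymer', 'membrane', 'method', 'use']
```

Applying the manual edits after the automatic step means the analyst always has the last word, which matches the abstract's point that the list should be tuned to the corpus rather than taken as given.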

Keywords: Text mining; Word distribution; Zipf's law; STN AnaVist; Thomson Aureka; OmniViz; Stopwords; Patent mapping
Date: 2007
Citations in EconPapers: 2

Downloads: (external link)
http://www.sciencedirect.com/science/article/pii/S0172-2190(07)00044-0
Full text for ScienceDirect subscribers only

Persistent link: https://EconPapers.repec.org/RePEc:eee:worpat:v:29:y:2007:i:4:p:308-316

Ordering information: This journal article can be ordered from
http://www.elsevier.com/wps/find/supportfaq.cws_home/regional
http://www.elsevier. ... _01_ooc_1&version=01

World Patent Information is currently edited by Michael Blackman

More articles in World Patent Information from Elsevier
Bibliographic data for series maintained by Catherine Liu.

 
Page updated 2025-03-19
Handle: RePEc:eee:worpat:v:29:y:2007:i:4:p:308-316