Automatic thesaurus generation for Chinese documents

Tseng, Yuen‐Hsien

Journal of the American Society for Information Science and Technology, 2002, vol. 53, issue 13, 1130-1138

Abstract: This article reports an approach to automatic thesaurus construction for Chinese documents. An effective Chinese keyword extraction algorithm is presented first. Experiments showed that, on average, 33% of the keywords the algorithm identified in each document were unknown to a lexicon of 123,226 terms. Of these unregistered words, only 8.3% are illegal. The keywords extracted from each document are then filtered for term association analysis. Association weights larger than a threshold are accumulated over all the documents to yield the final term pair similarities. Compared to previous studies, this method speeds up the thesaurus generation process drastically while achieving a similar level of term relatedness.
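
The accumulation step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the association weight formula, so the sketch assumes a Dice coefficient over sentence-level co-occurrence counts within each document; the function name `accumulate_similarities`, the input layout, and the default threshold of 0.5 are likewise hypothetical and not taken from the article. Keyword extraction and filtering are assumed to have been done already.

```python
from collections import defaultdict
from itertools import combinations


def accumulate_similarities(docs, threshold=0.5):
    """Accumulate per-document association weights into term pair similarities.

    docs: list of documents; each document is a list of sentences, and each
    sentence is a set of already extracted and filtered keywords.
    The weight formula and threshold are assumptions for illustration.
    """
    similarity = defaultdict(float)
    for sentences in docs:
        freq = defaultdict(int)    # sentences in this document containing a term
        cofreq = defaultdict(int)  # sentences containing both terms of a pair
        for keywords in sentences:
            for term in keywords:
                freq[term] += 1
            # Sort so each unordered pair always has the same key.
            for pair in combinations(sorted(keywords), 2):
                cofreq[pair] += 1
        for (a, b), both in cofreq.items():
            # Assumed weight: Dice coefficient over sentence counts.
            weight = 2.0 * both / (freq[a] + freq[b])
            # Only weights above the threshold are accumulated.
            if weight > threshold:
                similarity[(a, b)] += weight
    return dict(similarity)


# Two toy documents with placeholder keywords.
docs = [
    [{"a", "b"}, {"a", "b"}, {"a", "c"}],
    [{"a", "b"}],
]
result = accumulate_similarities(docs)
```

Because weights are computed one document at a time and only pairs that co-occur in a sentence are ever considered, the work grows with document length rather than with the square of the vocabulary size. This is one plausible reading of why such an approach would be fast, not a claim about the article's implementation.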

Date: 2002

Downloads: (external link)
https://doi.org/10.1002/asi.10146

Persistent link: https://EconPapers.repec.org/RePEc:bla:jamist:v:53:y:2002:i:13:p:1130-1138
