ACTS: An automatic Chinese text segmentation system for full text retrieval
Zimin Wu and Gwyneth Tseng
Journal of the American Society for Information Science, 1995, vol. 46, issue 2, 83-96
Abstract:
Text segmentation is a prerequisite for text retrieval systems. Chinese texts cannot be readily segmented into words because they do not contain word boundaries. ACTS is an automatic Chinese text segmentation prototype for Chinese full text retrieval. It applies partial syntactic analysis, that is, the analysis of morphemes, words, and phrases. The idea was largely inspired by experiments on English morpheme- and phrase-analysis-based text retrieval, which are particularly germane to Chinese because neither Chinese nor English texts mark morpheme and phrase boundaries. ACTS is built on the hypothesis that Chinese words and phrases longer than two characters can be characterized by a grammar that describes the concatenation behavior of the morphological and syntactic categories of their formatives. This hypothesis is examined through three procedures: (1) Segmentation: texts are divided into one- and two-character segments by matching against a dictionary; (2) Category disambiguation: the syntactic categories of segments are determined according to context; (3) Parsing: the segments are analyzed according to the grammar and then combined into compound and complex words for indexing and retrieval. The experimental results, based on a small sample of 30 texts, show that most significant words and phrases in these texts can be extracted with a high degree of accuracy. © 1995 John Wiley & Sons, Inc.
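The three procedures in the abstract (dictionary segmentation into one- and two-character segments, context-based category disambiguation, and grammar-driven combination) can be illustrated with a toy pipeline. This is a minimal sketch under stated assumptions, not the ACTS implementation: the dictionary entries, category labels (`N`, `V`, `ADJ`, `SFX`, `PART`), grammar rules, and function names are all hypothetical. Forward maximum matching, the "pick a category that licenses a rule with a neighbour" heuristic, and greedy left-to-right reduction are stand-in techniques chosen for brevity; the paper's actual dictionary, disambiguation rules, and parser are not described in this record.

```python
from typing import Dict, List, Tuple

# Hypothetical toy dictionary: segment -> possible categories.
# Only one- and two-character entries, mirroring the abstract's segment sizes.
DICTIONARY: Dict[str, List[str]] = {
    "信息": ["N"],
    "检索": ["V", "N"],   # ambiguous: verb or noun
    "系统": ["N"],
    "的": ["PART"],
    "可靠": ["ADJ"],
    "性": ["SFX"],        # nominalizing suffix morpheme
}

# Hypothetical concatenation grammar: (left category, right category) -> result.
GRAMMAR: Dict[Tuple[str, str], str] = {
    ("N", "N"): "N",      # noun + noun -> compound noun
    ("ADJ", "SFX"): "N",  # adjective + suffix -> derived noun
}


def segment(text: str, max_len: int = 2) -> List[str]:
    """Step 1 (assumed technique): forward maximum matching against
    DICTIONARY, up to max_len characters. Characters not covered by any
    dictionary entry become single-character segments."""
    segments: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in DICTIONARY or size == 1:
                segments.append(candidate)
                i += size
                break
    return segments


def disambiguate(segments: List[str]) -> List[str]:
    """Step 2 (assumed heuristic): choose one category per segment. For an
    ambiguous segment, prefer a category that forms a GRAMMAR rule with its
    left neighbour's chosen category or its right neighbour's first category;
    otherwise fall back to the first listed category."""
    options = [DICTIONARY.get(s, ["UNK"]) for s in segments]
    chosen: List[str] = []
    for k, cats in enumerate(options):
        pick = cats[0]
        if len(cats) > 1:
            left = chosen[k - 1] if k > 0 else None
            right = options[k + 1][0] if k + 1 < len(options) else None
            for c in cats:
                if (left, c) in GRAMMAR or (c, right) in GRAMMAR:
                    pick = c
                    break
        chosen.append(pick)
    return chosen


def parse(segments: List[str], categories: List[str]) -> List[Tuple[str, str]]:
    """Step 3 (assumed technique): greedy left-to-right reduction. Each new
    segment is merged into the preceding word whenever GRAMMAR licenses the
    (previous category, current category) pair."""
    words: List[Tuple[str, str]] = []
    for seg, cat in zip(segments, categories):
        if words and (words[-1][1], cat) in GRAMMAR:
            prev_seg, prev_cat = words.pop()
            words.append((prev_seg + seg, GRAMMAR[(prev_cat, cat)]))
        else:
            words.append((seg, cat))
    return words


def run_pipeline(text: str) -> List[Tuple[str, str]]:
    """Hypothetical end-to-end pipeline: segment, disambiguate, parse."""
    segs = segment(text)
    return parse(segs, disambiguate(segs))


if __name__ == "__main__":
    print(run_pipeline("信息检索系统"))  # [('信息检索系统', 'N')]
    print(run_pipeline("系统的可靠性"))  # [('系统', 'N'), ('的', 'PART'), ('可靠性', 'N')]
```

In the first example, the ambiguous segment 检索 is resolved to `N` because a noun reading lets it combine with its nominal neighbours, and the three two-character segments are then reduced into one compound. In the second, the one-character suffix 性 combines with 可靠 into a derived word, while the particle 的 blocks any merge with 系统. A realistic system would need a much larger dictionary and a grammar capable of handling competing analyses, which this greedy sketch does not attempt.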
Date: 1995
Downloads: https://doi.org/10.1002/(SICI)1097-4571(199503)46:23.0.CO;2-0
Persistent link: https://EconPapers.repec.org/RePEc:bla:jamest:v:46:y:1995:i:2:p:83-96
Ordering information: This journal article can be ordered from https://doi.org/10.1002/(ISSN)1097-4571
More articles in Journal of the American Society for Information Science from Association for Information Science & Technology
Bibliographic data for series maintained by Wiley Content Delivery.