Distributional Similarity for Chinese: Exploiting Characters and Radicals
Peng Jin,
John Carroll,
Yunfang Wu and
Diana McCarthy
Mathematical Problems in Engineering, 2012, vol. 2012, 1-11
Abstract:
Distributional Similarity has attracted considerable attention in the field of natural language processing as an automatic means of countering the ubiquitous problem of sparse data. As a logographic language, Chinese words consist of characters and each of them is composed of one or more radicals. The meanings of characters are usually highly related to the words which contain them. Likewise, radicals often make a predictable contribution to the meaning of a character: characters that have the same components tend to have similar or related meanings. In this paper, we utilize these properties of the Chinese language to improve Chinese word similarity computation. Given a content word, we first extract similar words based on a large corpus and a similarity score for ranking. This rank is then adjusted according to the characters and components shared between the similar word and the target word. Experiments on two gold standard datasets show that the adjusted rank is superior and closer to human judgments than the original rank. In addition to quantitative evaluation, we examine the reasons behind errors drawing on linguistic phenomena for our explanations.
Date: 2012
References: Add references at CitEc
Citations:
Downloads: (external link)
http://downloads.hindawi.com/journals/MPE/2012/347257.pdf (application/pdf)
http://downloads.hindawi.com/journals/MPE/2012/347257.xml (text/xml)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:hin:jnlmpe:347257
DOI: 10.1155/2012/347257
Access Statistics for this article
More articles in Mathematical Problems in Engineering from Hindawi
Bibliographic data for series maintained by Mohamed Abdelhakeem ().