
Conceptual analysis of parallel corpus collected from the Web

Kar Wing Li and Christopher C. Yang

Journal of the American Society for Information Science and Technology, 2006, vol. 57, issue 5, 632-644

Abstract: As illustrated by the World Wide Web, the volume of information in languages other than English has grown significantly in recent years. This highlights the importance of multilingual corpora. Much effort has been devoted to compiling multilingual corpora for cross‐lingual information retrieval and machine translation. Existing parallel corpora mostly involve European language pairs, such as English–French and English–Spanish. There is still a lack of parallel corpora between European and Asian languages. In the authors' previous work, an alignment method that identifies one‐to‐one Chinese and English title pairs was developed to construct an English–Chinese parallel corpus automatically from the World Wide Web; it achieved 100% precision and 87% recall. Careful analysis of these results has helped the authors understand how the alignment method can be improved. A conceptual analysis was conducted, covering conceptual equivalence and conceptual information alternation in the aligned and nonaligned English–Chinese title pairs obtained by the alignment method. The results of the analysis not only reflect the characteristics of parallel corpora but also give insight into the strengths and weaknesses of the alignment method. In particular, conceptual alternation, such as omission and addition, is found to have a significant impact on the performance of the alignment method.

Date: 2006

Downloads: (external link)
https://doi.org/10.1002/asi.20326


Persistent link: https://EconPapers.repec.org/RePEc:bla:jamist:v:57:y:2006:i:5:p:632-644

Ordering information: This journal article can be ordered from
https://doi.org/10.1002/(ISSN)1532-2890

More articles in Journal of the American Society for Information Science and Technology from Association for Information Science & Technology
Bibliographic data for series maintained by Wiley Content Delivery.

 
Handle: RePEc:bla:jamist:v:57:y:2006:i:5:p:632-644