Document Alignment for Generation of English-Punjabi Comparable Corpora from Wikipedia
Vishal Goyal, Ajit Kumar and Manpreet Singh Lehal
Additional contact information
Vishal Goyal: Punjabi University, India
Ajit Kumar: Multani Mal Modi College, India
Manpreet Singh Lehal: Punjabi University, India
International Journal of E-Adoption (IJEA), 2020, vol. 12, issue 1, 42-51
Abstract:
Comparable corpora serve as an alternative to parallel corpora for languages where parallel corpora are scarce. Models trained on comparable corpora are less effective than those trained on parallel corpora; however, comparable corpora go a long way toward compensating for this scarcity in machine translation. In this article, the authors explore Wikipedia as a potential source and delineate the process of aligning documents, which will then be used for the extraction of parallel data. The parallel data thus extracted will help to enhance the performance of statistical machine translation.
Date: 2020
Downloads: (external link)
http://services.igi-global.com/resolvedoi/resolve. ... 4018/IJEA.2020010104 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:igg:jea000:v:12:y:2020:i:1:p:42-51
International Journal of E-Adoption (IJEA) is currently edited by Hayden Wimmer
More articles in International Journal of E-Adoption (IJEA) from IGI Global