A HYBRID LEMMATISER FOR OLD CHURCH SLAVONIC

Ilia Afanasev
Ilia Afanasev: National Research University Higher School of Economics

HSE Working papers from National Research University Higher School of Economics

Abstract: The article presents a lemmatiser developed specifically for Old Church Slavonic (OCS). The introduction highlights the lack of lemmatisers that can handle different OCS datasets. The review briefly describes previous attempts and current trends in lemmatisation. The lemmatiser is a hybrid: it applies linguistic rules to specific cases (fragmentary tokens, punctuation, or digits), a dictionary to the most common tokens, and a sequence-to-sequence (seq2seq) neural network with an attention mechanism to the remaining material. The model achieves an overall accuracy of 85%, which is lower than that of one of the previous models on the Universal Dependencies (UD) dataset. However, when specific tokens are taken into consideration, the model outperforms the previous ones thanks to its rule-based part. Possible further directions of research include the use of more sophisticated architectures, such as BART.
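
The three-stage cascade described in the abstract (rules first, then a dictionary, then a neural model) could be sketched roughly as follows. This is an illustrative sketch only, not the paper's implementation: the function names `rule_based_lemma` and `lemmatise`, the toy dictionary, and the stub standing in for the seq2seq model are all assumptions. The rule for fragmentary tokens is omitted because the abstract does not say how such tokens are detected.

```python
import unicodedata
from typing import Callable, Dict, Optional


def rule_based_lemma(token: str) -> Optional[str]:
    """Hypothetical rule stage: punctuation and digit tokens are their own lemma.

    Returns None when no rule applies, so the caller can fall through
    to the next stage.
    """
    if not token:
        return None
    # A token made up entirely of Unicode punctuation characters.
    if all(unicodedata.category(ch).startswith("P") for ch in token):
        return token
    # A token made up entirely of digits.
    if token.isdigit():
        return token
    return None


def lemmatise(
    token: str,
    dictionary: Dict[str, str],
    neural_fallback: Callable[[str], str],
) -> str:
    """Try rules, then the dictionary, then the neural model (assumed order)."""
    lemma = rule_based_lemma(token)
    if lemma is not None:
        return lemma
    if token in dictionary:
        return dictionary[token]
    # In the paper this stage is a seq2seq network with attention;
    # here it is any callable mapping a token to a lemma.
    return neural_fallback(token)


if __name__ == "__main__":
    # Toy, transliterated entry used purely for illustration.
    toy_dictionary = {"slova": "slovo"}

    # Stub in place of a trained model: returns the token unchanged.
    def identity_model(tok: str) -> str:
        return tok

    for tok in ["slova", "12", ".", "unseen"]:
        print(tok, lemmatise(tok, toy_dictionary, identity_model))
```

Passing the neural stage in as a callable keeps the cascade independent of any particular model, so a different architecture could be swapped in without touching the rule or dictionary stages.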

Keywords: lemmatisation; Old Church Slavonic; hybrid approach; natural language processing; seq2seq
JEL-codes: Z
Pages: 19 pages
Date: 2021

Published in WP BRP Series: Linguistics / LNG, February 2021, pages 1-19

Downloads:
https://wp.hse.ru/data/2021/02/18/1393879077/106LNG2021.pdf (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:hig:wpaper:106/lng/2021

Bibliographic data for series maintained by Shamil Abdulaev.

 
Page updated 2025-04-16
Handle: RePEc:hig:wpaper:106/lng/2021