APPLYING PROBABILISTIC TAGGING TO RUSSIAN POETRY
Alexey Starchenko (aleksey-starchenko@mail.ru),
Lev Kazakevich (lvkazakevich@edu.hse.ru) and
Olga Lyashevskaya (olesar@yandex.ru)
Additional contact information
Alexey Starchenko: National Research University Higher School of Economics
Lev Kazakevich: National Research University Higher School of Economics
Olga Lyashevskaya: National Research University Higher School of Economics
HSE Working papers from National Research University Higher School of Economics
Abstract:
Poetic texts pose a challenge to full morphological tagging and lemmatization because poets extend the vocabulary, employ morphologically and semantically deficient forms, go beyond standard syntactic templates, and use non-projective constructions and non-standard word order, among other techniques of creative language play. In this paper we evaluate several probabilistic taggers based on decision trees, conditional random fields (CRF), and neural networks, as well as one state-of-the-art dictionary-based tagger. The taggers were trained on prose texts and tested on three poetic samples of differing complexity. First, we discuss the method used to compile gold-standard datasets for Russian poetry. Second, we focus on the taggers' performance in identifying part-of-speech tags and lemmas. These two annotation layers are key to compiling corpus-based dictionaries, which we consider a long-term goal of our project.
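The evaluation described in the abstract (comparing tagger output with a gold standard on the part-of-speech and lemma layers) can be illustrated with a minimal sketch. It is not the authors' evaluation code. The `Token` structure, the `accuracy` function, the UD-style tag labels, the case-insensitive lemma comparison, and the four-token toy sample are all assumptions made for illustration; the sample is invented and does not reproduce any result from the paper.

```python
from typing import NamedTuple


class Token(NamedTuple):
    """One annotated token (hypothetical structure, not from the paper)."""
    form: str
    lemma: str
    pos: str


def accuracy(gold: list[Token], predicted: list[Token]) -> dict[str, float]:
    """Token-level accuracy for POS tags, lemmas, and both layers jointly."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must be aligned token by token")
    if not gold:
        raise ValueError("cannot score an empty sample")

    pos_hits = lemma_hits = joint_hits = 0
    for g, p in zip(gold, predicted):
        pos_ok = g.pos == p.pos
        # Assumption: lemmas are compared case-insensitively.
        lemma_ok = g.lemma.lower() == p.lemma.lower()
        pos_hits += pos_ok
        lemma_hits += lemma_ok
        joint_hits += pos_ok and lemma_ok

    n = len(gold)
    return {"pos": pos_hits / n, "lemma": lemma_hits / n, "joint": joint_hits / n}


# Invented toy sample: the gold annotation and a hypothetical tagger output.
gold = [
    Token("Я", "я", "PRON"),
    Token("помню", "помнить", "VERB"),
    Token("чудное", "чудный", "ADJ"),
    Token("мгновенье", "мгновенье", "NOUN"),
]
predicted = [
    Token("Я", "я", "PRON"),
    Token("помню", "помнить", "VERB"),
    Token("чудное", "чудное", "NOUN"),        # wrong lemma and wrong POS
    Token("мгновенье", "мгновение", "NOUN"),  # wrong lemma, correct POS
]

scores = accuracy(gold, predicted)
print(scores)
```

Reporting the two layers separately as well as jointly shows whether a tagger's errors are concentrated in lemmatization or in part-of-speech assignment, which matters when the end goal is a corpus-based dictionary.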
Keywords: natural language processing; full morphology tagging; NLP evaluation; Russian language; Russian poetry
JEL-codes: Z
Pages: 17
Date: 2018
New Economics Papers: this item is included in nep-cis
Published in WP BRP Series: Linguistics / LNG, December 2018, pages 1-17
Persistent link: https://EconPapers.repec.org/RePEc:hig:wpaper:76/lng/2018
Bibliographic data for series maintained by Shamil Abdulaev (sabdulaev@hse.ru).