Pre-Experiments on Annotation of Russian Coreference Corpus
Svetlana Toldova,
Ilya Azerkovich,
Yulia Grishina,
Alina Ladygina,
Olga Lyashevkaya,
Anna Roytberg,
Galina Sim and
Maria Vasilieva
Additional contact information
Svetlana Toldova: National Research University Higher School of Economics
Ilya Azerkovich: National Research University Higher School of Economics
Yulia Grishina: University of Potsdam
Alina Ladygina: University of Tübingen
Olga Lyashevkaya: National Research University Higher School of Economics
Anna Roytberg: National Research University Higher School of Economics
Galina Sim: Lomonosov Moscow State University
Maria Vasilieva: Lomonosov Moscow State University
HSE Working papers from National Research University Higher School of Economics
Abstract:
Building benchmark corpora in the domain of coreference and anaphora resolution is an important task for developing and evaluating NLP systems and models. Our study is aimed at assessing the feasibility of enhancing corpora with information about coreference relations. The annotation procedure includes identification of text segments that are subject to annotation (markables), marking their syntactic heads and identifying coreferential links. Markables are classified according to their morphological, syntactic and reference structure. The annotation is performed manually, providing gold standard data for high-level NLP tasks such as anaphora and coreference resolution. The paper reports on inconsistencies in selecting NPs of various types as markables and their borders, and in ways of constructing anaphoric pairs. We consider the types of NPs missed by annotators, and the discourse and semantic factors that may have affected the annotators’ judgements.
Keywords: anaphora; coreference; coreference corpus; Russian language; corpus annotation; inter-annotator agreement
JEL-codes: Z
Pages: 27 pages
Date: 2015
New Economics Papers: this item is included in nep-cis
Published in WP BRP Series: Linguistics / LNG, December 2015, pages 1-27
Downloads: (external link)
http://www.hse.ru/data/2015/12/29/1136297632/35LNG2015_ed.pdf (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:hig:wpaper:35/lng/2015
Bibliographic data for series maintained by Shamil Abdulaev.