Coreference Annotation in the Russian Clinical Pear Stories Corpus: Annotation Features and Preliminary Results

Svetlana Toldova, Elizaveta Ivtushok, Kira Shulgina, Mira Bergelson and Mariya Khudyakova
Additional contact information
Svetlana Toldova: National Research University Higher School of Economics
Elizaveta Ivtushok: National Research University Higher School of Economics
Kira Shulgina: National Research University Higher School of Economics
Mira Bergelson: National Research University Higher School of Economics
Mariya Khudyakova: National Research University Higher School of Economics

HSE Working papers from National Research University Higher School of Economics

Abstract: This work examines the distribution of referential devices in spoken discourse produced by healthy speakers and people with aphasia, and compares it with written discourse. We discuss special annotation issues for the corpus of Pear film retellings (Russian CliPS) by people with aphasia (PWA), people with right hemisphere damage (RHD), and healthy speakers (HP, for healthy people) of Russian. The study summarizes the comprehensive annotation scheme developed for this task and preliminary research on referential choice features based on the corpus. Comparing retellings with written texts, we found a significant difference in the use of basic coreferential expressions. First, the distribution of basic NP types differs significantly: speakers use reduced devices such as zero anaphora or bare nouns more frequently in retellings than in written texts. There are also differences in the distribution of more fine-grained features, such as word order within an NP, the use of anaphoric and reduced expressions (demonstratives or zero NPs) for the first mention of an entity, and the inclusion of epistemic markers in NPs. We also found that the retellings produced by PWA and HP do not differ much in the distribution of basic NP types. However, a detailed analysis within NP types that takes various disfluencies into account reveals some prominent differences between the two populations. These include differences in zero subject distribution, the frequency of non-referential NP links, and the frequency of coreference errors. While adapting the initial coreference annotation scheme, we concluded that besides referential ambiguity, which is normally taken into account in spoken discourse analysis, and the basic taxonomy of referential devices (full NP vs. anaphoric pronoun vs. anaphoric zero), other features need to be considered.

Keywords: coreference annotation; retellings corpus; discourse; brain damage; aphasia
JEL-codes: Z
Pages: 20 pages
Date: 2016
New Economics Papers: this item is included in nep-cis

Published in WP BRP Series: Linguistics / LNG, December 2016, pages 1-20

Downloads: (external link)
https://wp.hse.ru/data/2016/12/15/1111579976/50LNG2016.pdf (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:hig:wpaper:50/lng/2016

Handle: RePEc:hig:wpaper:50/lng/2016