Automatic Dependency Parsing of a Learner English Corpus REALEC
Olga Lyashevskaya and
Irina Panteleeva
Additional contact information
Olga Lyashevskaya: National Research University Higher School of Economics
Irina Panteleeva: National Research University Higher School of Economics
HSE Working papers from National Research University Higher School of Economics
Abstract:
The paper presents a Universal Dependencies (UD) annotation scheme for a learner English corpus. The REALEC dataset consists of essays written in English by Russian-speaking university students as part of a general English course. The essays are part of the students' preparation for an independent final examination similar to an international English exam. When adapting existing dependency parsing tools to learner data, one has to take into account the extent to which students' mistakes provoke errors in the parser output. Ungrammatical and stylistically inappropriate utterances may challenge parsing algorithms trained on grammatically well-formed written texts. In our experiments, we compared the output of the dependency parser UDPipe (trained on UD-English 2.0) with the results of manual parsing, placing a particular focus on parses of ungrammatical English clauses. We show how mistakes made by students influence the work of the parser. Overall, UDPipe performed reasonably well (UAS 92.9, LAS 91.7). Errors in the automatic annotation fall into three cases: a) incorrect detection of the head, b) incorrect detection of the relation type, and c) both. We propose some solutions that could improve the automatic output and thus make the assessment of syntactic complexity more reliable.
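To illustrate the two scores and the three error cases named in the abstract, the following is a minimal Python sketch of how a parser's output could be compared with a manual parse. It is not the authors' evaluation code: the token representation (a head index plus a relation label), the function name `evaluate`, and the toy sentence with its annotations are all assumptions made for illustration only.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical token representation: (head index, relation label).
# A head index of 0 marks the root of the sentence.
Token = Tuple[int, str]


def evaluate(gold: List[Token], pred: List[Token]):
    """Return (UAS, LAS, per-category counts) for aligned token lists."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")

    counts = Counter()
    for (g_head, g_rel), (p_head, p_rel) in zip(gold, pred):
        head_ok = g_head == p_head
        rel_ok = g_rel == p_rel
        if head_ok and rel_ok:
            counts["correct"] += 1
        elif rel_ok:
            counts["head_only"] += 1      # case a): wrong head, right relation
        elif head_ok:
            counts["relation_only"] += 1  # case b): right head, wrong relation
        else:
            counts["both"] += 1           # case c): wrong head and relation

    n = len(gold)
    # UAS counts every token whose head is correct, whatever its label.
    uas = 100.0 * (counts["correct"] + counts["relation_only"]) / n
    # LAS counts only tokens with both head and label correct.
    las = 100.0 * counts["correct"] / n
    return uas, las, counts


# Invented toy example (not from REALEC): "She go to school yesterday"
gold = [(2, "nsubj"), (0, "root"), (4, "case"), (2, "obl"), (2, "obl:tmod")]
pred = [(2, "nsubj"), (0, "root"), (4, "case"), (2, "obj"), (4, "nmod")]

uas, las, counts = evaluate(gold, pred)
print(f"UAS {uas:.1f}, LAS {las:.1f}, errors {dict(counts)}")
```

Because every token with a correct label and head also has a correct head, LAS can never exceed UAS, which is consistent with the scores reported above.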
Keywords: learner corpus; dependency annotation of learner treebank; Universal Dependencies; evaluation of parser quality; L2 English
JEL-codes: Z
Pages: 13
Date: 2017
New Economics Papers: this item is included in nep-cis
Published in WP BRP Series: Linguistics / LNG, December 2017, pages 1-13
Downloads: (external link)
https://wp.hse.ru/data/2017/12/18/1159875511/62LNG2017.pdf (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:hig:wpaper:62/lng/2017
Bibliographic data for series maintained by Shamil Abdulaev.