Comparative Study Of Data Clustering Algorithms And Analysis Of The Keywords Extraction Efficiency: Learner Corpus Case
Anna Scherbakova
Additional contact information
Anna Scherbakova: National Research University Higher School of Economics
HSE Working papers from National Research University Higher School of Economics
Abstract:
This paper addresses the task of clustering essays written by ESL (English as a Second Language) learners. The data were taken from the learner corpus REALEC. Dividing texts by certain characteristics can speed up the analysis of a single corpus or access to the relevant sections of a large document collection. The study discusses existing approaches to clustering text data, the feasibility of clustering texts produced by ESL learners, and ways to extract keywords in order to determine the topic of the essays in each group.
Keywords: learner corpus; text documents clustering; document embedding; keywords extraction; metadata enrichment
JEL-codes: Z
Pages: 18 pages
Date: 2020
New Economics Papers: this item is included in nep-cmp
Published in WP BRP Series: Linguistics / LNG, December 2020, pages 1-18
Downloads: (external link)
https://wp.hse.ru/data/2020/12/01/1353767171/97LNG2020.pdf (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:hig:wpaper:97/lng/2020
Bibliographic data for series maintained by Shamil Abdulaev.