Measurement method research of Chinese texts’ difficulty based on two-characters continuations
Dongjie Zhou and Tianqing Zheng
PLOS ONE, 2024, vol. 19, issue 9, 1-13
Abstract:
A two-characters continuation is a string of two characters that appear consecutively in linear sequence. Because it ignores unit boundaries, it can break through the encapsulation and independence of long, solidified language chunks (words and phrases). A two-characters continuation can therefore capture information not only about static language units (words and phrases) but also about how they combine in the text. For this reason, the two-characters continuation is used as the unit of measurement for Chinese text difficulty, with the aim of improving measurement accuracy. Three methods for measuring text difficulty are proposed, based respectively on the "continuation index of character", "new and stable two-characters continuation" and "emerging tendency of two-characters continuation". The results show that the method based on new and stable two-characters continuations is more effective than the other two: its accuracy can reach 36.4%, 64.6% and 79.6% when text difficulty is divided into 6, 3 and 2 levels, respectively. This method is also more effective than those in Jiang and Wu's research.
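The definition of a two-characters continuation given in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the sample sentence, and the choice to count only characters in the basic CJK Unified Ideographs block are all assumptions made for illustration.

```python
from collections import Counter


def is_cjk(ch: str) -> bool:
    # Assumption: only the basic CJK Unified Ideographs block
    # (U+4E00 to U+9FFF) counts as a Chinese character here.
    return "\u4e00" <= ch <= "\u9fff"


def two_char_continuations(text: str) -> list[str]:
    # Collect every pair of adjacent characters, ignoring word
    # boundaries; pairs that include punctuation or other
    # non-CJK characters are skipped.
    return [a + b for a, b in zip(text, text[1:]) if is_cjk(a) and is_cjk(b)]


# Hypothetical sample sentence, not taken from the article.
sample = "我喜欢学习中文，我喜欢中文。"
pairs = two_char_continuations(sample)
counts = Counter(pairs)
```

In this sketch the pair "习中" spans the boundary between the words "学习" and "中文", which shows how a continuation can record the combination of units rather than only the units themselves.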
Date: 2024
Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0309717 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 09717&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0309717
DOI: 10.1371/journal.pone.0309717
More articles in PLOS ONE from Public Library of Science