
ZaQQ: A New Arabic Dataset for Automatic Essay Scoring via a Novel Human–AI Collaborative Framework

Yomna Elsayed, Emad Nabil, Marwan Torki, Safiullah Faizullah and Ayman Khalafallah
Additional contact information
Yomna Elsayed: Computer and Systems Engineering Department, Alexandria University, Alexandria 21526, Egypt
Emad Nabil: Faculty of Computer and Information Systems, Islamic University of Madinah, Madinah 42351, Saudi Arabia
Marwan Torki: Computer and Systems Engineering Department, Alexandria University, Alexandria 21526, Egypt
Safiullah Faizullah: Faculty of Computer and Information Systems, Islamic University of Madinah, Madinah 42351, Saudi Arabia
Ayman Khalafallah: Computer and Systems Engineering Department, Alexandria University, Alexandria 21526, Egypt

Data, 2025, vol. 10, issue 9, 1-31

Abstract: Automated essay scoring (AES) has become an essential tool in educational assessment. However, applying AES to the Arabic language presents notable challenges, primarily due to the lack of labeled datasets. This data scarcity hampers the development of reliable machine learning models and slows progress in Arabic natural language processing for educational use. While manual annotation by human experts remains the most accurate method for essay evaluation, it is often too costly and time-consuming to create large-scale datasets, especially for low-resource languages like Arabic. In this work, we introduce a human–AI collaborative framework designed to overcome the shortage of scored Arabic essays. Leveraging QAES, a high-quality annotated dataset, our approach uses Large Language Models (LLMs) to generate multidimensional essay evaluations across seven key writing traits: Relevance, Organization, Vocabulary, Style, Development, Mechanics, and Structure. To ensure accuracy and consistency, we design prompting strategies and validation procedures tailored to each trait. This system is then applied to two unannotated Arabic essay datasets: ZAEBUC and QALB. As a result, we introduce ZaQQ, a newly annotated dataset that merges ZAEBUC, QAES, and QALB. Our findings demonstrate that human–AI collaboration can significantly enhance the availability of labeled resources without compromising assessment quality. The proposed framework serves as a scalable and replicable model for addressing data annotation challenges in low-resource languages and supports the broader goal of expanding access to automated educational assessment tools where expert evaluation is limited.
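
Illustration (not from the article): the prompt-and-validate annotation loop described in the abstract can be sketched in Python as below. The trait list follows the abstract, but the 1-5 scale, the score_essay function, and the stubbed call_llm client are assumptions made only for this sketch; a real pipeline would replace call_llm with an actual LLM API client and the trait-specific prompts and validation rules designed by the authors.

# Hypothetical sketch of trait-by-trait LLM scoring with range validation.
import json

TRAITS = ["Relevance", "Organization", "Vocabulary", "Style",
          "Development", "Mechanics", "Structure"]

def call_llm(prompt: str) -> str:
    """Stub standing in for an LLM API call (assumption); returns canned JSON."""
    return json.dumps({"score": 3, "rationale": "stub response"})

def score_essay(essay: str, scale=(1, 5)) -> dict:
    """Ask the LLM for one score per trait and validate the returned range."""
    results = {}
    for trait in TRAITS:
        prompt = (
            f"Score the following Arabic essay on the trait '{trait}' "
            f"on a scale of {scale[0]} to {scale[1]}. "
            'Reply as JSON: {"score": <int>, "rationale": <str>}.\n\n'
            f"Essay:\n{essay}"
        )
        reply = json.loads(call_llm(prompt))
        score = int(reply["score"])
        # Simple validation step: reject scores outside the expected range.
        if not scale[0] <= score <= scale[1]:
            raise ValueError(f"{trait}: score {score} outside scale {scale}")
        results[trait] = score
    return results

if __name__ == "__main__":
    print(score_essay("نص المقال هنا"))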

Keywords: automatic essay scoring; Arabic NLP; Large Language Models; Arabic essay scoring dataset; annotation framework; multidimensional assessments
JEL-codes: C8 C80 C81 C82 C83
Date: 2025

Downloads: (external link)
https://www.mdpi.com/2306-5729/10/9/148/pdf (application/pdf)
https://www.mdpi.com/2306-5729/10/9/148/ (text/html)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Persistent link: https://EconPapers.repec.org/RePEc:gam:jdataj:v:10:y:2025:i:9:p:148-:d:1753187

Data is currently edited by Ms. Becky Zhang

More articles in Data from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager.

Page updated 2025-09-26
Handle: RePEc:gam:jdataj:v:10:y:2025:i:9:p:148-:d:1753187