Knowledge-Aware Arabic Question Generation: A Transformer-Based Framework

Jabr, Reham Bin; Azmi, Aqil M.

Reham Bin Jabr and Aqil M. Azmi
Additional contact information
Reham Bin Jabr: Department of Computer Science, College of Computer and Information Sciences, King Saud University, Riyadh 11543, Saudi Arabia
Aqil M. Azmi: Department of Computer Science, College of Computer and Information Sciences, King Saud University, Riyadh 11543, Saudi Arabia

Mathematics, 2025, vol. 13, issue 18, 1-31

Abstract: In this work, we propose a knowledge-aware approach for Arabic automatic question generation (QG) that leverages the multilingual T5 (mT5) transformer augmented with a pre-trained Arabic question-answering model to address challenges posed by Arabic’s morphological richness and limited QG resources. Our system generates both subjective questions and multiple-choice questions (MCQs) with contextually relevant distractors through a dual-model pipeline that tailors the decoding strategy to each subtask: the question generator employs beam search to maximize semantic fidelity and lexical precision, while the distractor generator uses top-k sampling to enhance diversity and contextual plausibility. The QG model is fine-tuned on Arabic SQuAD, and the distractor model is trained on a curated combination of ARCD and Qudrat. Experimental results show that beam search significantly outperforms top-k sampling for fact-based question generation, achieving a BLEU-4 score of 27.49 and a METEOR score of 25.18, surpassing fine-tuned AraT5 and translated English–Arabic baselines. In contrast, top-k sampling is more effective for distractor generation, yielding higher BLEU scores and producing distractors that are more diverse yet remain pedagogically valid, with a BLEU-1 score of 20.28 establishing a strong baseline in the absence of prior Arabic benchmarks. Human evaluation further confirms the quality of the generated questions. This work advances Arabic QG by providing a scalable, knowledge-aware solution with applications in educational technology, while demonstrating the critical role of task-specific decoding strategies and setting a foundation for future research in automated assessment.
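
The contrast between the two decoding strategies named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy next-token table `TOY_MODEL`, the helper names, and all probabilities are hypothetical stand-ins for a fine-tuned mT5 decoder, chosen only so that the example runs with the Python standard library. Beam search keeps the highest-scoring partial sequences and is deterministic, whereas top-k sampling draws each token at random from the k most probable candidates, which yields varied outputs.

```python
import math
import random

# Hypothetical toy "model": next-token probabilities that depend only on the
# previous token. A real system would query a decoder such as mT5 instead.
TOY_MODEL = {
    "<s>": {"what": 0.6, "who": 0.3, "where": 0.1},
    "what": {"is": 0.7, "was": 0.3},
    "who": {"is": 0.5, "was": 0.5},
    "where": {"is": 0.9, "was": 0.1},
    "is": {"</s>": 1.0},
    "was": {"</s>": 1.0},
}


def next_token_probs(prefix):
    """Return the next-token distribution for a token sequence (assumed helper)."""
    return TOY_MODEL[prefix[-1]]


def beam_search(beam_width=2, max_len=4):
    """Keep the beam_width best partial sequences by summed log-probability."""
    beams = [(0.0, ["<s>"])]
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            if seq[-1] == "</s>":
                # Finished hypotheses are carried over unchanged.
                candidates.append((score, seq))
                continue
            for tok, p in next_token_probs(seq).items():
                candidates.append((score + math.log(p), seq + [tok]))
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
    return beams[0][1]


def top_k_sample(k=2, max_len=4, rng=None):
    """Sample each token from the k most probable candidates, renormalized."""
    rng = rng or random.Random()
    seq = ["<s>"]
    while seq[-1] != "</s>" and len(seq) <= max_len:
        probs = next_token_probs(seq)
        top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
        tokens, weights = zip(*top)
        # random.choices normalizes the weights, so no manual rescaling is needed.
        seq.append(rng.choices(tokens, weights=weights, k=1)[0])
    return seq


if __name__ == "__main__":
    print("beam search:", beam_search())
    for seed in range(3):
        print("top-k sample:", top_k_sample(k=2, rng=random.Random(seed)))
```

In this toy setting, beam search always returns the same most probable sequence, which matches the abstract's motivation for using it where precision matters, while repeated top-k draws produce different sequences, which is the property the abstract associates with distractor diversity.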

Keywords: Arabic question generation; multiple-choice question generation; knowledge-aware NLP; low-resource language processing; beam search; top-k sampling
JEL-codes: C
Date: 2025

Downloads:
https://www.mdpi.com/2227-7390/13/18/2975/pdf (application/pdf)
https://www.mdpi.com/2227-7390/13/18/2975/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:13:y:2025:i:18:p:2975-:d:1749316

Mathematics is currently edited by Ms. Emma He