Assessing the Impact of Static, Contextual and Character Embeddings for Arabic Machine Translation

Bensalah Nouhaila, Ayad Habib, Adib Abdellah and Ibn El Farouk Abdelhamid
Additional contact information
Bensalah Nouhaila: Data Science & Artificial Intelligence, University of Hassan II Casablanca, Casablanca 20000, Morocco
Ayad Habib: Data Science & Artificial Intelligence, University of Hassan II Casablanca, Casablanca 20000, Morocco
Adib Abdellah: Data Science & Artificial Intelligence, University of Hassan II Casablanca, Casablanca 20000, Morocco
Ibn El Farouk Abdelhamid: Teaching, Languages and Cultures Laboratory Mohammedia, University of Hassan II Casablanca, Casablanca 20000, Morocco

Journal of Information & Knowledge Management (JIKM), 2024, vol. 23, issue 02, 1-19

Abstract: Word embeddings/representations are an important component of Natural Language Processing (NLP) tasks. Most Neural Machine Translation (NMT) systems that use such representations disregard word morphology by assigning a unique vector to each unique word in the vocabulary, and thus cannot handle Out-Of-Vocabulary (OOV) words. In some languages, such as Arabic, the meaning of words is associated with the meaning of the individual characters that constitute them, as these characters embody internal information. In this study, a combination of character- and word-level models is used to determine the most effective approaches to semantically and morphologically representing affective Arabic words. Furthermore, this work examines the strategy of combining static, character and contextual word embeddings to obtain richer representations for the Arabic Machine Translation (MT) task. To the best of our knowledge, we are the first to investigate the combination of static word embeddings, contextual ones and character-level representation in Arabic MT. In addition, a Deep Learning (DL) architecture is employed on data preprocessed by various prominent preprocessing techniques. Various experiments were conducted, and the findings indicate that the integration of various models for word embedding and character-level representation is feasible and more effective than the state-of-the-art Arabic MT systems.
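
Illustrative note (not part of the published abstract): the OOV limitation described above can be sketched in a few lines of Python. This is a toy example under stated assumptions, not the authors' architecture. The vocabulary, the vector size `DIM`, the hash-seeded stand-in vectors, the averaging of character vectors (the keywords suggest the paper uses a CNN instead) and the concatenation rule are all hypothetical choices made for illustration. Contextual embeddings are left out to keep the sketch self-contained.

```python
import hashlib
import random

DIM = 4  # toy embedding size, chosen only for illustration


def _seeded_vector(key, dim=DIM):
    """Deterministic pseudo-random vector for a string key (stand-in for learned weights)."""
    seed = int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


# Hypothetical toy vocabulary: one fixed vector per known word (static embedding).
STATIC_TABLE = {w: _seeded_vector("word:" + w) for w in ["كتاب", "مدرسة"]}
UNK_VECTOR = [0.0] * DIM  # every OOV word collapses to this single vector


def static_embedding(word):
    """Word-level lookup: unknown words all map to the same UNK vector."""
    return STATIC_TABLE.get(word, UNK_VECTOR)


def char_embedding(word):
    """Average of per-character vectors; defined for any non-empty string."""
    vecs = [_seeded_vector("char:" + ch) for ch in word]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def combined_embedding(word):
    """Concatenate word-level and character-level vectors (an assumed combination rule)."""
    return static_embedding(word) + char_embedding(word)


known = combined_embedding("كتاب")    # in the toy vocabulary
unseen = combined_embedding("كتابان")  # not in the toy vocabulary
print(len(known), unseen[:DIM] == UNK_VECTOR)
```

For the unseen word, the word-level half of the vector is the shared UNK vector, while the character-level half still depends on the word's own characters. That is the property the abstract relies on when it motivates character-level representations for Arabic.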

Keywords: Arabic MT; character-word embeddings; Arabic word embeddings; transformer; CNN
Date: 2024
Downloads: (external link)
http://www.worldscientific.com/doi/abs/10.1142/S0219649224500096
Access to full text is restricted to subscribers

Persistent link: https://EconPapers.repec.org/RePEc:wsi:jikmxx:v:23:y:2024:i:02:n:s0219649224500096

DOI: 10.1142/S0219649224500096

Journal of Information & Knowledge Management (JIKM) is currently edited by Professor Suliman Hawamdeh

More articles in Journal of Information & Knowledge Management (JIKM) from World Scientific Publishing Co. Pte. Ltd.
Bibliographic data for series maintained by Tai Tone Lim.

 
Page updated 2025-03-20
Handle: RePEc:wsi:jikmxx:v:23:y:2024:i:02:n:s0219649224500096