A Semantic Enhancement Framework for Multimodal Sarcasm Detection

Weiyu Zhong, Zhengxuan Zhang, Qiaofeng Wu, Yun Xue and Qianhua Cai
Additional contact information
Weiyu Zhong: School of Electronics and Information Engineering, South China Normal University, Foshan 528225, China
Zhengxuan Zhang: School of Electronics and Information Engineering, South China Normal University, Foshan 528225, China
Qiaofeng Wu: School of Electronics and Information Engineering, South China Normal University, Foshan 528225, China
Yun Xue: School of Electronics and Information Engineering, South China Normal University, Foshan 528225, China
Qianhua Cai: School of Electronics and Information Engineering, South China Normal University, Foshan 528225, China

Mathematics, 2024, vol. 12, issue 2, 1-13

Abstract: Sarcasm is a form of language in which the literal meaning diverges from the implied intention. Detecting sarcasm from unimodal text alone is challenging without a clear understanding of the context, which motivates introducing multimodal information to aid detection. However, current approaches focus only on modeling text–image incongruity at the token level and treat this incongruity as the key to detection, neglecting both the overall multimodal features and the textual semantics during processing. Moreover, semantic information from other samples with a similar manner of expression can also facilitate sarcasm detection. In this work, a semantic enhancement framework is proposed to address image–text congruity by modeling textual and visual information at the multi-scale and multi-span token level, with particular emphasis on the efficacy of textual semantics in multimodal sarcasm detection. To bridge the cross-modal semantic gap, semantic enhancement is performed using a multiple contrastive learning strategy. Experiments on a benchmark dataset show that our model outperforms the latest baseline by 1.87% in F1-score and 1% in accuracy.
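The abstract mentions a multiple contrastive learning strategy for bridging the cross-modal semantic gap. As an illustration only, the sketch below shows a symmetric InfoNCE-style contrastive loss between paired text and image embeddings, one common way such a cross-modal objective is implemented; the function name, tensor shapes, and temperature value are assumptions for the sketch, not the authors' actual code.

# Minimal sketch (not the paper's implementation): a symmetric InfoNCE-style
# contrastive loss that pulls paired text/image embeddings together, as one
# plausible instance of a cross-modal contrastive objective. Names, shapes,
# and the temperature are illustrative assumptions.
import torch
import torch.nn.functional as F

def cross_modal_contrastive_loss(text_emb: torch.Tensor,
                                 image_emb: torch.Tensor,
                                 temperature: float = 0.07) -> torch.Tensor:
    """text_emb, image_emb: (batch, dim) embeddings of paired samples."""
    # L2-normalize so dot products become cosine similarities.
    text_emb = F.normalize(text_emb, dim=-1)
    image_emb = F.normalize(image_emb, dim=-1)

    # (batch, batch) similarity matrix; diagonal entries are the true pairs.
    logits = text_emb @ image_emb.t() / temperature
    targets = torch.arange(text_emb.size(0), device=text_emb.device)

    # Symmetric loss over the text-to-image and image-to-text directions.
    loss_t2i = F.cross_entropy(logits, targets)
    loss_i2t = F.cross_entropy(logits.t(), targets)
    return (loss_t2i + loss_i2t) / 2

if __name__ == "__main__":
    # Toy usage with random tensors standing in for encoder outputs.
    text = torch.randn(8, 256)
    image = torch.randn(8, 256)
    print(cross_modal_contrastive_loss(text, image).item())

Minimizing such a loss encourages matched text–image pairs to share a common embedding space, which is the general idea behind using contrastive learning to reduce a cross-modal semantic gap.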

Keywords: multimodal sarcasm detection; contrastive learning; graph neural networks; social media
JEL-codes: C
Date: 2024
References: View complete reference list from CitEc

Downloads: (external link)
https://www.mdpi.com/2227-7390/12/2/317/pdf (application/pdf)
https://www.mdpi.com/2227-7390/12/2/317/ (text/html)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.


Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:12:y:2024:i:2:p:317-:d:1321771


Mathematics is currently edited by Ms. Emma He

More articles in Mathematics from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager.

 
Page updated 2025-03-19
Handle: RePEc:gam:jmathe:v:12:y:2024:i:2:p:317-:d:1321771