EconPapers    

Contrastive Learning-Based Cross-Modal Fusion for Product Form Imagery Recognition: A Case Study on New Energy Vehicle Front-End Design

Yutong Zhang, Jiantao Wu (), Li Sun and Guoan Yang
Additional contact information
Yutong Zhang: School of Arts and Design, Yanshan University, Qinhuangdao 066000, China
Jiantao Wu: School of Arts and Design, Yanshan University, Qinhuangdao 066000, China
Li Sun: School of Arts and Design, Yanshan University, Qinhuangdao 066000, China
Guoan Yang: School of Arts and Design, Yanshan University, Qinhuangdao 066000, China

Sustainability, 2025, vol. 17, issue 10, 1-28

Abstract: Fine-grained feature extraction and affective semantic mapping remain significant challenges in product form analysis. To address these issues, this study proposes a contrastive learning-based cross-modal fusion approach for product form imagery recognition, using the front-end design of new energy vehicles (NEVs) as a case study. The proposed method first employs the Biterm Topic Model (BTM) and Analytic Hierarchy Process (AHP) to extract thematic patterns and compute weight distributions from consumer review texts, thereby identifying key imagery style labels. These labels are then leveraged for image annotation, facilitating the construction of a multimodal dataset. Next, ResNet-50 and Transformer architectures serve as the image and text encoders, respectively, to extract and represent multimodal features. To ensure effective alignment and deep fusion of textual and visual representations in a shared embedding space, a contrastive learning mechanism is introduced, optimizing cosine similarity between positive and negative sample pairs. Finally, a fully connected multilayer network is integrated at the output of the Transformer and ResNet with Contrastive Learning (TRCL) model to enhance classification accuracy and reliability. Comparative experiments against various deep convolutional neural networks (DCNNs) demonstrate that the TRCL model effectively integrates semantic and visual information, significantly improving the accuracy and robustness of complex product form imagery recognition. These findings suggest that the proposed method holds substantial potential for large-scale product appearance evaluation and affective cognition research. Moreover, this data-driven fusion underpins sustainable product form design by streamlining evaluation and optimizing resource use.
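The contrastive alignment step the abstract describes — matched image–text pairs pulled together and mismatched pairs pushed apart in a shared embedding space via cosine similarity — can be sketched as a CLIP-style symmetric InfoNCE loss. This is an illustrative NumPy sketch, not the authors' TRCL implementation; the function name, batch layout, and temperature value are assumptions.

```python
import numpy as np

def contrastive_loss(img_emb: np.ndarray, txt_emb: np.ndarray,
                     temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss over cosine similarities.

    Matched image/text rows (same batch index) are positives; every
    other pairing in the batch serves as a negative. Illustrative
    sketch only -- not the paper's exact TRCL code.
    """
    # L2-normalize so the dot product equals cosine similarity
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature   # (batch, batch) similarity matrix
    labels = np.arange(len(logits))      # positives lie on the diagonal

    def cross_entropy(lg, lb):
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(lb)), lb].mean()

    # Average the image-to-text and text-to-image directions
    return (cross_entropy(logits, labels) + cross_entropy(logits.T, labels)) / 2
```

In training, `img_emb` and `txt_emb` would come from the ResNet-50 and Transformer encoders projected into the shared space; random arrays suffice to exercise the loss.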

Keywords: product form imagery; cross-modal information fusion; contrastive learning; ResNet-50; Transformer; new energy vehicle front-end design
JEL-codes: O13; Q; Q0; Q2; Q3; Q5; Q56
Date: 2025

Downloads: (external link)
https://www.mdpi.com/2071-1050/17/10/4432/pdf (application/pdf)
https://www.mdpi.com/2071-1050/17/10/4432/ (text/html)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.


Persistent link: https://EconPapers.repec.org/RePEc:gam:jsusta:v:17:y:2025:i:10:p:4432-:d:1654886


Sustainability is currently edited by Ms. Alexandra Wu

More articles in Sustainability from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().

 
Page updated 2025-05-14
Handle: RePEc:gam:jsusta:v:17:y:2025:i:10:p:4432-:d:1654886