Applying Text Mining, Clustering Analysis, and Latent Dirichlet Allocation Techniques for Topic Classification of Environmental Education Journals
I-Cheng Chang,
Tai-Kuei Yu,
Yu-Jie Chang and
Tai-Yi Yu
Additional contact information
I-Cheng Chang: Department of Environmental Engineering, National Ilan University, Yilan 260, Taiwan
Tai-Kuei Yu: Department of Business Administration, National Quemoy University, Kinmen 892, Taiwan
Yu-Jie Chang: Department of Earth and Life Science, University of Taipei, Taipei 100, Taiwan
Tai-Yi Yu: Department of Risk Management and Insurance, Ming Chuan University, Taipei 111, Taiwan
Sustainability, 2021, vol. 13, issue 19, 1-20
Abstract:
Facing the big data wave, this study applied artificial intelligence to cite knowledge and find a feasible process to play a crucial role in supplying innovative value in environmental education. Intelligence agents of artificial intelligence and natural language processing (NLP) are two key areas leading the trend in artificial intelligence; this research adopted NLP to analyze the research topics of environmental education research journals in the Web of Science (WoS) database during 2011–2020 and interpret the categories and characteristics of abstracts for environmental education papers. The corpus data were selected from abstracts and keywords of research journal papers, which were analyzed with text mining, cluster analysis, latent Dirichlet allocation (LDA), and co-word analysis methods. The decisions regarding the classification of feature words were determined and reviewed by domain experts, and the associated TF-IDF weights were calculated for the following cluster analysis, which involved a combination of hierarchical clustering and K-means analysis. The hierarchical clustering and LDA decided the number of required categories as seven, and the K-means cluster analysis classified the overall documents into seven categories. This study utilized co-word analysis to check the suitability of the K-means classification, analyzed the terms with high TF-IDF wights for distinct K-means groups, and examined the terms for different topics with the LDA technique. A comparison of the results demonstrated that most categories that were recognized with K-means and LDA methods were the same and shared similar words; however, two categories had slight differences. The involvement of field experts assisted with the consistency and correctness of the classified topics and documents.
Keywords: environmental education; cluster analysis; natural language processing; latent Dirichlet allocation (search for similar items in EconPapers)
JEL-codes: O13 Q Q0 Q2 Q3 Q5 Q56 (search for similar items in EconPapers)
Date: 2021
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (2)
Downloads: (external link)
https://www.mdpi.com/2071-1050/13/19/10856/pdf (application/pdf)
https://www.mdpi.com/2071-1050/13/19/10856/ (text/html)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:gam:jsusta:v:13:y:2021:i:19:p:10856-:d:646836
Access Statistics for this article
Sustainability is currently edited by Ms. Alexandra Wu
More articles in Sustainability from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().