Density-Based Unsupervised Learning Algorithm to Categorize College Students into Dropout Risk Levels
Miguel Angel Valles-Coral,
Luis Salazar-Ramírez,
Richard Injante,
Edwin Augusto Hernandez-Torres,
Juan Juárez-Díaz,
Jorge Raul Navarro-Cabrera,
Lloy Pinedo and
Pierre Vidaurre-Rojas
Additional contact information
Miguel Angel Valles-Coral: Facultad de Ingeniería de Sistemas e Informática, Universidad Nacional de San Martín, Jr. Maynas, Tarapoto 22200, Peru
Luis Salazar-Ramírez: Facultad de Ingeniería de Sistemas e Informática, Universidad Nacional de San Martín, Jr. Maynas, Tarapoto 22200, Peru
Richard Injante: Facultad de Ingeniería de Sistemas e Informática, Universidad Nacional de San Martín, Jr. Maynas, Tarapoto 22200, Peru
Edwin Augusto Hernandez-Torres: Facultad de Ingeniería de Sistemas e Informática, Universidad Nacional de San Martín, Jr. Maynas, Tarapoto 22200, Peru
Juan Juárez-Díaz: Facultad de Ingeniería de Sistemas e Informática, Universidad Nacional de San Martín, Jr. Maynas, Tarapoto 22200, Peru
Jorge Raul Navarro-Cabrera: Facultad de Ingeniería de Sistemas e Informática, Universidad Nacional de San Martín, Jr. Maynas, Tarapoto 22200, Peru
Lloy Pinedo: Facultad de Ingeniería de Sistemas e Informática, Universidad Nacional de San Martín, Jr. Maynas, Tarapoto 22200, Peru
Pierre Vidaurre-Rojas: Facultad de Ingeniería de Sistemas e Informática, Universidad Nacional de San Martín, Jr. Maynas, Tarapoto 22200, Peru
Data, 2022, vol. 7, issue 11, 1-18
Abstract:
Compliance with the basic conditions of quality in higher education implies the design of strategies to reduce student dropout, and Information and Communication Technologies (ICT) in the educational field have made it possible to direct, reinforce, and consolidate the process of professional academic training. We propose an academic and emotional tracking model that uses data mining and machine learning to group university students according to their level of dropout risk. We worked with 670 students from a Peruvian public university, administered five valid and reliable psychological assessment questionnaires to them through a chatbot-based system, and then grouped them using three unsupervised learning algorithms: the density-based DBSCAN and HDBSCAN, and the centroid-based K-Means. The results showed that HDBSCAN was the most robust option, obtaining better validity levels in two of the three internal indices evaluated: its Silhouette index was 0.6823, its Davies–Bouldin index was 0.6563, and its Calinski–Harabasz index was 369.6459. According to the internal indices, the best number of clusters was five. For the external validation, carried out against answers from mental health professionals, we obtained high scores: F-measure 90.9%, purity 94.5%, V-measure 86.9%, and ARI 86.5%. This indicates the robustness of the proposed model, which allows us to categorize university students into five levels according to their risk of dropping out.
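To make two of the external indices named in the abstract concrete, the following is a minimal sketch of how purity and the adjusted Rand index (ARI) can be computed from a reference labeling and a clustering result. It is not the authors' code: the function names and the nine-item toy labels are hypothetical, chosen only for illustration, and the abstract does not state how the indices were computed in the study.

```python
from collections import Counter
from math import comb


def purity(reference, clusters):
    """Share of items that fall in the majority reference class of their cluster."""
    per_cluster = {}
    for ref, clu in zip(reference, clusters):
        per_cluster.setdefault(clu, Counter())[ref] += 1
    majority_total = sum(max(counts.values()) for counts in per_cluster.values())
    return majority_total / len(reference)


def adjusted_rand_index(reference, clusters):
    """Pair-counting agreement between two labelings, corrected for chance."""
    total_pairs = comb(len(reference), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: treated here as trivially identical
    # Pairs of items that share both the reference class and the cluster.
    same_both = sum(comb(v, 2) for v in Counter(zip(reference, clusters)).values())
    # Pairs that share the reference class, and pairs that share the cluster.
    same_ref = sum(comb(v, 2) for v in Counter(reference).values())
    same_clu = sum(comb(v, 2) for v in Counter(clusters).values())
    expected = same_ref * same_clu / total_pairs
    maximum = (same_ref + same_clu) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. every item in a single group
    return (same_both - expected) / (maximum - expected)


# Hypothetical toy data: risk levels assigned by experts vs. cluster ids.
expert_levels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
cluster_ids = [0, 0, 1, 1, 1, 1, 2, 2, 2]

print(f"purity = {purity(expert_levels, cluster_ids):.3f}")
print(f"ARI    = {adjusted_rand_index(expert_levels, cluster_ids):.3f}")
```

Purity only rewards clusters dominated by one reference class, so it tends to rise as the number of clusters grows. ARI corrects for chance agreement, which is one reason the two are usually reported together.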
Keywords: clustering; data mining; DBSCAN; K-Means; HDBSCAN
JEL-codes: C8 C80 C81 C82 C83
Date: 2022
Downloads: (external link)
https://www.mdpi.com/2306-5729/7/11/165/pdf (application/pdf)
https://www.mdpi.com/2306-5729/7/11/165/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jdataj:v:7:y:2022:i:11:p:165-:d:976562
Data is currently edited by Ms. Cecilia Yang
Bibliographic data for series maintained by MDPI Indexing Manager.