Forecasting of Post-Graduate Students’ Late Dropout Based on the Optimal Probability Threshold Adjustment Technique for Imbalanced Data

Rodríguez Velasco, Carmen Lilí and García Villena, Eduardo and Brito Ballester, Julién and Durántez Prados, Frigdiano Álvaro and Silva Alvarado, Eduardo René and Crespo Álvarez, Jorge (2023) Forecasting of Post-Graduate Students’ Late Dropout Based on the Optimal Probability Threshold Adjustment Technique for Imbalanced Data. International Journal of Emerging Technologies in Learning (iJET), 18 (04). pp. 120-155. ISSN 1863-0383

Author emails: carmen.rodriguez@uneatlantico.es, eduardo.garcia@uneatlantico.es, julien.brito@uneatlantico.es, durantez@uneatlantico.es, eduardo.silva@funiber.org, jorge.crespo@uneatlantico.es

Available under License Creative Commons Attribution.

Abstract

This research article contrasts the benefits of the optimal probability threshold adjustment technique with those of other imbalanced data processing techniques, as applied to predicting post-graduate students’ late dropout from distance learning courses at two universities in the Ibero-American space. In this context, optimizing the Logistic Regression, Random Forest, and Neural Network classifiers, together with different techniques, attributes, and algorithms (hyperparameters, SMOTE, SMOTE_SVM, and ADASYN), produced a set of metrics for decision-making that prioritizes the reduction of false negatives. The best model was the Neural Network in combination with SMOTE_SVM, which obtained a recall of 0.75 and an F1-score of 0.60. Likewise, the robustness of the Random Forest classifier for imbalanced data was demonstrated: with an optimal threshold of 0.427, it achieved metrics very similar to those obtained by the consensus of the three best models found. This demonstrates that, for Random Forest, the optimal prediction probability threshold is an excellent alternative to resampling techniques with different optimal thresholds. Finally, it is hoped that this paper will help to boost the application of this simple but powerful technique, which is highly underrated relative to data resampling techniques for imbalanced data.
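
As a rough illustration of the threshold adjustment idea described in the abstract, the sketch below scans candidate probability thresholds and keeps the one that gives the highest F1-score. It is not the authors' procedure: the abstract does not state which selection criterion was used, so maximizing F1 is an assumption here, and the function names (`scores_at_threshold`, `best_threshold`) and the toy labels and probabilities are hypothetical. In practice the probabilities would come from a fitted classifier such as a Random Forest, and the threshold would be chosen on validation data.

```python
# Minimal sketch of probability threshold adjustment for imbalanced data.
# Assumption: the "optimal" threshold is the one that maximizes F1;
# the paper may use a different criterion.

def scores_at_threshold(y_true, probs, threshold):
    """Return (precision, recall, f1) when predicting 1 for prob >= threshold."""
    tp = fp = fn = 0
    for y, p in zip(y_true, probs):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
    if tp == 0:
        # No true positives: precision, recall and F1 are all taken as 0.
        return 0.0, 0.0, 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def best_threshold(y_true, probs):
    """Try every distinct predicted probability as a threshold; keep the best F1."""
    best_t, best_f1 = 0.5, -1.0
    for t in sorted(set(probs)):
        _, _, f1 = scores_at_threshold(y_true, probs, t)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Hypothetical toy data: 1 = dropout (minority class), 0 = no dropout.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
probs = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.45, 0.40, 0.55, 0.70]

default_scores = scores_at_threshold(y_true, probs, 0.5)
tuned_t, tuned_f1 = best_threshold(y_true, probs)
tuned_scores = scores_at_threshold(y_true, probs, tuned_t)

print("default 0.5 (precision, recall, f1):", default_scores)
print("tuned threshold:", tuned_t, "(precision, recall, f1):", tuned_scores)
```

On this toy data the tuned threshold falls below 0.5, which trades some precision for higher recall, in line with the abstract's stated priority of reducing false negatives. No resampling of the training data is involved; only the decision rule applied to the predicted probabilities changes.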

Item Type: Article
Uncontrolled Keywords: optimal likelihood threshold, imbalanced data, student dropout prediction, resample techniques, distance learning courses
Subjects: Subjects > Engineering
Subjects > Teaching
Divisions: Europe University of Atlantic > Research > Scientific Production
Ibero-american International University > Research > Scientific Production
Universidad Internacional do Cuanza > Research > Scientific Production
Depositing User: Sr Bibliotecario
Date Deposited: 27 Feb 2023 09:06
Last Modified: 21 Oct 2024 12:28
URI: http://repositorio.funiber.org/id/eprint/6067