Comparing Machine Learning Models in Predicting On-Time Graduation with Emphasis on Feature Importance

Zulfa Safina Ibrahim, Rifqah Khairunnisa, Madiha Syarifah Balqis

Abstract


Timely graduation is a key performance indicator in higher education, closely linked to institutional efficiency and student success. In Indonesia, many students fail to graduate on time, which wastes institutional resources and delays their entry into the workforce. Previous studies have relied mainly on conventional statistical methods such as logistic regression to analyze the factors influencing graduation, but these approaches are limited in capturing complex, non-linear interactions. This study addresses that gap by using machine learning (ML) to predict on-time graduation and logistic regression to keep the results interpretable. Its main contribution is a hybrid model that balances predictive accuracy with interpretability, providing actionable insights for higher education institutions in line with Sustainable Development Goal (SDG) 4. Three ML algorithms (Random Forest, Support Vector Machine [SVM], and Naïve Bayes) were applied to a dataset of 18 academic, demographic, and institutional variables from an Indonesian university. Model performance was evaluated using accuracy, sensitivity, specificity, area under the ROC curve (AUC), and Kappa. Logistic regression was used to test the statistical significance of key predictors. Random Forest achieved the highest overall accuracy (75.13%) and AUC (0.7021), while SVM and Naïve Bayes showed complementary strengths in sensitivity and specificity. Feature importance analysis highlighted GPA, faculty affiliation, and total credits as key predictors. These findings demonstrate the potential of combining ML and statistical techniques to support data-informed decisions in higher education. However, the study is limited to a single institution, which may affect the generalizability of the findings; future research could extend the model to multi-institutional datasets for broader validation.
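For readers unfamiliar with the evaluation metrics named in the abstract, the following is a minimal Python sketch of how accuracy, sensitivity, specificity, Kappa, and AUC can be computed for a binary on-time/late classifier. It is not the authors' code. The function names (`confusion_metrics`, `auc`), the choice of Cohen's kappa as the Kappa statistic, and all counts and scores in the usage lines are illustrative assumptions, not data or results from the study.

```python
from typing import Dict, Sequence


def confusion_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and Cohen's kappa from 2x2 counts.

    tp/fn/fp/tn are the cells of a binary confusion matrix, with
    "graduated on time" treated as the positive class (an assumption).
    """
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)  # share of actual positives that were caught
    specificity = tn / (tn + fp)  # share of actual negatives that were caught
    # Agreement expected by chance, from the row and column marginals.
    p_pos = ((tp + fn) / n) * ((tp + fp) / n)
    p_neg = ((tn + fp) / n) * ((tn + fn) / n)
    p_chance = p_pos + p_neg
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical counts and scores, for illustration only.
metrics = confusion_metrics(tp=40, fn=10, fp=20, tn=30)
example_auc = auc(labels=[1, 1, 0, 0], scores=[0.9, 0.4, 0.5, 0.1])
```

Reporting Kappa and AUC alongside accuracy matters when one outcome dominates the data, because a model that always predicts the majority class can reach high accuracy while its Kappa stays near zero.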

Keywords


Machine Learning; Student Performance; Random Forest; Educational Data Mining


DOI: https://doi.org/10.59247/jtped.v1i1.7



Copyright (c) 2024 Zulfa Safina Ibrahim, Rifqah Khairunnisa, Madiha Syarifah Balqis

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.


Journal of Technological Pedagogy and Educational Development
ISSN: xxxx-xxxx
Organized by Peneliti Teknologi Teknik Indonesia
Published by Peneliti Teknologi Teknik Indonesia
Website: https://ejournal.jtped.org/ojs/index.php/jtped
Email: alfian_maarif@ieee.org
Address: Jl. Empu Sedah No. 12, Pringwulung, Condongcatur, Kec. Depok, Kabupaten Sleman, Daerah Istimewa Yogyakarta 55281, Indonesia