Machine Learning in Educational Data Mining: Current Trends and Emerging Gaps in Predicting Student Performance

Rugaya Tuanaya, Shazia Aslam, Safdar Ali, Ayesha Ajmal

Abstract


The growing availability of educational data and advances in machine learning (ML) have led to the widespread use of ML for predicting student performance. However, current research remains fragmented and lacks an integrated view of the field's knowledge structures, collaboration networks, and thematic directions. This study addresses that gap through a bibliometric analysis of the literature on ML-based student performance prediction. It contributes a comprehensive knowledge map that identifies major research themes, key contributors, and underexplored areas within educational data mining. Four bibliometric techniques were applied to 465 Scopus-indexed articles published from 2005 to 2025: descriptive analysis, collaboration mapping, keyword co-occurrence analysis, and thematic mapping. Data were processed with Bibliometrix in R and visualized with Biblioshiny. The findings show a sharp increase in publications after 2018, with China, the U.S., and India as the top contributors. Despite this high output, international collaboration remains confined to certain clusters, although countries such as Pakistan and Indonesia show high collaborative intensity. Keyword analysis identifies “student performance” and “machine learning” as core themes, while federated learning, contrastive learning, and algorithmic fairness emerge as research gaps. Non-cognitive factors such as motivation and emotional engagement are also underrepresented in predictive models. Overall, the study offers a systematic overview of the field, outlining its evolution, key players, and future directions. The results can inform the design of predictive models that are accurate, ethical, and contextually appropriate for higher education.

Keywords


Machine Learning; Student Performance Prediction; Educational Data Mining; Bibliometric Analysis





DOI: https://doi.org/10.59247/jtped.v2i1.5


Copyright (c) 2025 Rugaya Tuanaya, Shazia Aslam, Safdar Ali, Ayesha Ajmal

License URL: https://creativecommons.org/licenses/by/4.0/


Journal of Technological Pedagogy and Educational Development
Organized by Peneliti Teknologi Teknik Indonesia
Published by Peneliti Teknologi Teknik Indonesia
Website: https://ejournal.jtped.org/ojs/index.php/jtped
Email: jtped@ptti.web.id
Address: Yogyakarta, Indonesia