Predicting the academic performance of undergraduate students at a leading university in Peru: A Machine Learning approach
Abstract
Although access to higher education has improved in low- and middle-income countries (LMICs), dropout persists, especially among socioeconomically disadvantaged students. Despite advances in Machine Learning models for understanding this challenge, many studies neglect institutional factors specific to LMICs or focus on individual courses, which limits their applicability and policy relevance. To address this, we built a database from administrative and census records to predict academic performance at the Pontificia Universidad Católica del Perú (PUCP). The most effective models, Random Forest among them, highlighted predictors such as prior academic performance and admission test scores. We present an efficient model with ten features that can predict future performance and thereby contribute to reducing dropout at PUCP.