Predicting the academic performance of undergraduate students at a leading university in Peru: A Machine Learning approach

  • Fabio Salas, Pontificia Universidad Católica del Perú

    I am a Peruvian economist specializing in development and education. Currently, I work as a junior researcher at the Institute of Human Development of Latin America (IDHAL-PUCP), where I study poverty in the region through the lens of the capabilities approach. Additionally, I serve as a data analyst at the Central Admission Office (OCAI-PUCP), using data insights to shape and inform admission policies at PUCP.
    E-mail: fabio.salas@pucp.edu.pe

  • Josué Caldas, Pontificia Universidad Católica del Perú

    I hold a Bachelor’s Degree in Political Science and Government from the Pontifical Catholic University of Peru. I specialize in data analysis with a focus on public policy and development economics. Currently, I am a Research Assistant at the Artificial Intelligence and Computational Methods Laboratory (QLAB-PUCP), where I apply data science methods to inform and improve public policy.
    E-mail: josue.caldas@pucp.edu.pe

Keywords: Academic Performance, Machine Learning, Higher Education, Peru

Abstract

Although access to higher education has improved in low- and middle-income countries (LMICs), dropout persists, especially among socioeconomically disadvantaged students. Despite advances in Machine Learning models for understanding this challenge, many studies neglect institutional factors specific to LMICs or focus on individual courses, which limits their applicability and policy relevance. To address this, we built a database from administrative and census records to predict academic performance at the Pontificia Universidad Católica del Perú (PUCP). The most effective models, Random Forest among them, highlighted predictors such as prior academic performance and admission test scores. We present an efficient ten-feature model that can predict future performance and thereby help reduce dropout at PUCP.
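
As a rough illustration of what a reduced ten-feature Random Forest model can look like, the sketch below fits a forest on synthetic data, ranks the features by importance, and refits on the ten highest-ranked ones. Everything in it is an assumption made for illustration: the data are randomly generated, the feature names (`prior_gpa`, `admission_score`, `other_*`) are hypothetical, the target is treated as a continuous score, and impurity-based importance is only one possible ranking criterion. It is not the authors' pipeline or data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a student-level dataset (NOT the PUCP records).
rng = np.random.default_rng(0)
n_students, n_features = 600, 25

# Hypothetical feature names, chosen only to mirror the kinds of
# predictors mentioned in the abstract.
feature_names = ["prior_gpa", "admission_score"] + [
    f"other_{i}" for i in range(n_features - 2)
]
X = rng.normal(size=(n_students, n_features))

# Assumed data-generating process: the outcome depends mostly on the
# first two columns, plus noise. The remaining columns are pure noise.
y = 1.5 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n_students)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Step 1: fit a forest on all candidate features.
full_model = RandomForestRegressor(n_estimators=200, random_state=0)
full_model.fit(X_train, y_train)

# Step 2: rank features by impurity-based importance and keep the top ten.
top10 = np.argsort(full_model.feature_importances_)[::-1][:10]
top10_names = [feature_names[i] for i in top10]

# Step 3: refit a smaller forest using only those ten features.
small_model = RandomForestRegressor(n_estimators=200, random_state=0)
small_model.fit(X_train[:, top10], y_train)

# Compare held-out R^2 of the full and reduced models.
r2_full = full_model.score(X_test, y_test)
r2_small = small_model.score(X_test[:, top10], y_test)
print("Top-10 features:", top10_names)
print(f"R^2 full: {r2_full:.3f}  R^2 reduced: {r2_small:.3f}")
```

On real administrative data the ranking would need more care than this: impurity-based importances can favor high-cardinality features, so a permutation-based or Shapley-based ranking is a common alternative.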

How to cite
Salas, F., & Caldas, J. (2024). Prediciendo el rendimiento académico de estudiantes de pregrado en una universidad destacada de Perú: Una aproximación con herramientas de Machine Learning. Educación, 33(64), 55-85. https://doi.org/10.18800/educacion.202401.M003