Article

Knowledge Discovery for Higher Education Student Retention Based on Data Mining: Machine Learning Algorithms and Case Study in Chile

Journal

ENTROPY
Volume 23, Issue 4

Publisher

MDPI
DOI: 10.3390/e23040485

Keywords

data analytics; databases; data science; Friedman test; socioeconomic index; university dropout

Funding

  1. National Agency for Research and Development (ANID) of the Chilean government under the Ministry of Science, Technology, Knowledge and Innovation [FONDECYT 11190636, FONDECYT 1200525]
  2. ANID-Millennium Science Initiative Program [NCN17_059]

The study uses data mining techniques to predict student retention in higher education, identifying important predictive variables such as secondary educational score and community poverty index. This enables institutions to take preventative measures to avoid dropouts effectively.
Data mining is employed to extract useful information and to detect patterns in often large data sets; it is closely related to knowledge discovery in databases and to data science. In this investigation, we formulate models based on machine learning algorithms to extract relevant information for predicting student retention at various levels, using higher education data and specifying the relevant variables involved in the modeling. We then use this information to support the process of knowledge discovery. We predict student retention at each of three levels, corresponding to the first, second, and third years of study, obtaining models with an accuracy above 80% in all scenarios. These models allow us to adequately predict the level at which dropout occurs. The machine learning algorithms used in this work include decision trees, k-nearest neighbors, logistic regression, naive Bayes, random forest, and support vector machines, of which random forest performs best. We find that the secondary educational score and the community poverty index are important predictive variables, which have not been previously reported in educational studies of this type. The dropout assessment at various levels reported here is valid for higher education institutions around the world with conditions similar to the Chilean case, where dropout rates affect the efficiency of such institutions. The ability to predict dropout from students' data enables these institutions to take preventive measures and avoid dropouts. In the case study, balancing the majority and minority classes improves the performance of the algorithms.
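The class balancing mentioned at the end of the abstract can take several forms, and this summary does not say which one the authors used. A minimal sketch of one common option, random oversampling of the minority class, is shown below in plain Python. The function name, the feature layout, and the toy records are hypothetical; they only echo the two predictors highlighted in the abstract.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Randomly duplicate rows of the smaller classes until every class
    has as many rows as the largest one (random oversampling)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Indices of the original rows that belong to this class
        idx = [i for i, y in enumerate(labels) if y == cls]
        # Append randomly chosen copies until this class reaches the target size
        for _ in range(target - n):
            j = rng.choice(idx)
            out_rows.append(rows[j])
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical training records: (secondary educational score, community poverty index)
X_train = [(620, 0.10), (540, 0.25), (700, 0.05), (480, 0.30), (650, 0.12), (590, 0.20)]
y_train = ["retained", "dropout", "retained", "dropout", "retained", "retained"]

X_bal, y_bal = random_oversample(X_train, y_train)
print(Counter(y_train), "->", Counter(y_bal))
```

Balancing of this kind should be applied only to the training split: oversampling before the train/test split lets duplicated records leak into the test set and inflates the reported accuracy.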

Authors

Recommended

Article Engineering, Multidisciplinary

Bootstrap control charts for quantiles based on log-symmetric distributions with applications to the monitoring of reliability data

Victor Leiva, Rafael A. dos Santos, Helton Saulo, Carolina Marchant, Yuhlong Lio

Summary: This work proposes a methodology for monitoring a shift in the quantile of a distribution belonging to the log-symmetric family. The parametric bootstrap method is used to determine the sampling distribution and establish control limits. Monte Carlo simulations are conducted to assess the performance of the proposed bootstrap control charts. An application in the field of reliability data is presented. The research also provides an R package named chartslogsym for public use.
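The parametric bootstrap step summarized above can be sketched as follows. This is a minimal illustration, not the chartslogsym package: it assumes the log-normal distribution as the log-symmetric member, estimates its parameters by maximum likelihood on the log scale from in-control (Phase I) data, and charts the sample quantile of each subgroup of size n. The function names, the default false-alarm rate alpha = 0.0027, and the simulated data are hypothetical choices; the paper may chart a model-based quantile estimate instead.

```python
import math
import random
import statistics


def sample_quantile(xs, q):
    """Empirical q-quantile with linear interpolation between order statistics."""
    s = sorted(xs)
    h = (len(s) - 1) * q
    lo = math.floor(h)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (h - lo) * (s[hi] - s[lo])


def bootstrap_quantile_limits(phase1, q=0.5, n=10, B=2000, alpha=0.0027, seed=1):
    """Parametric-bootstrap control limits for the subgroup q-quantile,
    assuming (hypothetically) a log-normal model for the in-control data."""
    rng = random.Random(seed)
    logs = [math.log(x) for x in phase1]
    mu = statistics.fmean(logs)        # MLE of the log-scale location
    sigma = statistics.pstdev(logs)    # MLE of the log-scale scale
    boot = []
    for _ in range(B):
        # Simulate one in-control subgroup of size n from the fitted model
        subgroup = [math.exp(rng.gauss(mu, sigma)) for _ in range(n)]
        boot.append(sample_quantile(subgroup, q))
    # Equal-tailed limits taken from the bootstrap sampling distribution
    return sample_quantile(boot, alpha / 2), sample_quantile(boot, 1 - alpha / 2)


# Usage with simulated Phase I data (not data from the paper)
rng = random.Random(42)
phase1 = [math.exp(rng.gauss(1.0, 0.5)) for _ in range(200)]
lcl, ucl = bootstrap_quantile_limits(phase1)
new_subgroup = [math.exp(rng.gauss(1.0, 0.5)) for _ in range(10)]
stat = sample_quantile(new_subgroup, 0.5)
print(f"LCL={lcl:.3f} UCL={ucl:.3f} median={stat:.3f} in_control={lcl <= stat <= ucl}")
```

A subgroup statistic falling outside [LCL, UCL] signals a possible shift in the monitored quantile.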

QUALITY AND RELIABILITY ENGINEERING INTERNATIONAL (2023)

Article Engineering, Environmental

Machine learning and automatic ARIMA/Prophet models-based forecasting of COVID-19: methodology, evaluation, and case study in SAARC countries

Iqra Sardar, Muhammad Azeem Akbar, Victor Leiva, Ahmed Alsanad, Pradeep Mishra

Summary: This article proposes an autoregressive modeling framework based on machine learning and statistical methods to predict confirmed COVID-19 cases in SAARC countries. Comparing different forecasting models, the authors find that the ARIMA model performs well in predicting confirmed cases in these countries.

STOCHASTIC ENVIRONMENTAL RESEARCH AND RISK ASSESSMENT (2023)

Article Mathematics

Modeling Income Data via New Parametric Quantile Regressions: Formulation, Computational Statistics, and Application

Helton Saulo, Roberto Vila, Giovanna V. Borges, Marcelo Bourguignon, Victor Leiva, Carolina Marchant

Summary: Income modeling is crucial for determining workers' earnings and is an important research topic in labor economics. Traditional regressions based on the normal distribution are widely used but are not suitable for asymmetric income data. This study proposes parametric quantile regressions based on two asymmetric income distributions: Dagum and Singh-Maddala. Monte Carlo simulation studies and an empirical data analysis show that both models fit positively skewed income data well. The economic implications of this investigation are discussed, and the proposed models are valuable tools for statisticians and econometricians.

MATHEMATICS (2023)

Article Infectious Diseases

Identification of Hazard and Socio-Demographic Patterns of Dengue Infections in a Colombian Subtropical Region from 2015 to 2020: Cox Regression Models and Statistical Analysis

Santiago Ortiz, Alexandra Catano-Lopez, Henry Velasco, Juan P. Restrepo, Andres Perez-Coronado, Henry Laniado, Victor Leiva

Summary: This article retrospectively analyzes confirmed dengue cases in the Antioquia region of Colombia from 2015 to 2020, distinguishing by subregions and dengue severity. The authors conducted exploratory analysis of epidemic data and performed statistical survival analysis using a Cox regression model. The findings identify the hazard and socio-demographic patterns of dengue infections in Antioquia, Colombia from 2015 to 2020.

TROPICAL MEDICINE AND INFECTIOUS DISEASE (2023)

Article Biology

An intelligent health monitoring and diagnosis system based on the internet of things and fuzzy logic for cardiac arrhythmia COVID-19 patients

Muhammad Zia Rahman, Muhammad Azeem Akbar, Victor Leiva, Abdullah Tahir, Muhammad Tanveer Riaz, Carlos Martin-Barreiro

Summary: The study aims to design and implement an intelligent health monitoring and diagnosis system for critical COVID-19 patients with cardiac arrhythmia. The system uses artificial intelligence tools, including IoT-based health monitoring and fuzzy logic-based medical diagnosis, and enables doctors to provide intelligent diagnosis and health surveillance for critical COVID-19 patients or patients in remote locations. In an emergency, communication with doctors is achieved through sensors, cloud storage, text messages over the Global System for Mobile Communications (GSM), and emails.

COMPUTERS IN BIOLOGY AND MEDICINE (2023)

Article Chemistry, Physical

Model-based optimal and robust control of renewable hydrogen gas production in a fed-batch microbial electrolysis cell

Muhammad Zia Ur Rahman, Mohsin Rizwan, Rabia Liaquat, Victor Leiva, Muhammad Muddasar

Summary: This article focuses on the control problem of microbial electrolysis cell (MEC) systems, develops a robust controller that achieves a fast and stable response, and proposes an integral anti-windup control strategy to counter the growth in control effort caused by error accumulation.

INTERNATIONAL JOURNAL OF HYDROGEN ENERGY (2023)

Article Statistics & Probability

Robust autoregressive modeling and its diagnostic analytics with a COVID-19 related application

Yonghui Liu, Jing Wang, Victor Leiva, Alejandra Tapia, Wei Tan, Shuangzhe Liu

Summary: This article proposes a skew-t autoregressive model and estimates its parameters using the expectation-maximization (EM) method. It also develops an influence methodology based on local perturbations for validation. The study identifies influential observations using normal curvatures for four perturbation strategies and assesses their performance through Monte Carlo simulations. An example of financial data analysis on Brent crude futures daily log-returns is presented to investigate the possible impact of the COVID-19 pandemic.

JOURNAL OF APPLIED STATISTICS (2023)

Article Computer Science, Interdisciplinary Applications

A score test for detecting extreme values in a vector autoregressive model

Yonghui Liu, Jing Wang, Dawei Shi, Victor Leiva, Shuangzhe Liu

Summary: In this paper, a score test is proposed to study a vector autoregressive model and detect extreme values. Maximum likelihood estimators and the information matrix are derived using a likelihood approach. The score statistic for the vector autoregressive model is established to identify influential cases or outliers. The effectiveness of the diagnostics is examined through a simulation study. The model is applied to analyze monthly log-returns of IBM stock and the S&P 500 index. Comparisons between the score test and the local influence method show that the score test is more effective, while local influence analysis can identify more influential cases.

JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION (2023)

Article Mathematics, Interdisciplinary Applications

On a Novel Dynamics of a SIVR Model Using a Laplace Adomian Decomposition Based on a Vaccination Strategy

Prasantha Bharathi Dhandapani, Victor Leiva, Carlos Martin-Barreiro, Maheswari Rangasamy

Summary: In this paper, a SIVR model using the Laplace Adomian decomposition is introduced to study the characteristics of vaccination in infected communities. The epidemiological parameters are analyzed using equilibrium stability and numerical analysis techniques. The model assesses the chance of a further wave of any pandemic disease and demonstrates that a consistent vaccination strategy could control it. This work is important for future research on COVID-19 and pandemic diseases because it considers the vaccinated population.

FRACTAL AND FRACTIONAL (2023)

Review Mathematics

An Overview of Forecast Analysis with ARIMA Models during the COVID-19 Pandemic: Methodology and Case Study in Brazil

Raydonal Ospina, Joao A. M. Gondim, Victor Leiva, Cecilia Castro

Summary: This article focuses on the issues presented by the COVID-19 pandemic and examines the use of ARIMA models for short-term forecasting. The study highlights the importance of accurate and timely predictions for public health strategies and interventions. The research also emphasizes the limitations of ARIMA models for long-term predictions.

MATHEMATICS (2023)

Article Mathematics

Inference Based on the Stochastic Expectation Maximization Algorithm in a Kumaraswamy Model with an Application to COVID-19 Cases in Chile

Jorge Figueroa-Zuniga, Juan G. Toledo, Bernardo Lagos-Alvarez, Victor Leiva, Jean P. Navarrete

Summary: Extensive research has examined models utilizing the Kumaraswamy distribution for describing continuous variables with bounded support. This study focuses on the trapezoidal Kumaraswamy model and proposes a parameter estimation method using the stochastic expectation maximization algorithm, which overcomes challenges faced by the traditional expectation maximization algorithm. The results are applied to modeling daily COVID-19 cases in Chile.

MATHEMATICS (2023)

Article Biochemistry & Molecular Biology

On the Use of Machine Learning Techniques and Non-Invasive Indicators for Classifying and Predicting Cardiac Disorders

Raydonal Ospina, Adenice G. O. Ferreira, Helio M. de Oliveira, Victor Leiva, Cecilia Castro

Summary: This research aims to improve the classification and prediction of ischemic heart diseases using machine learning techniques. Novel non-invasive indicators called Campello de Souza features were introduced and evaluated with a comprehensive dataset. The study demonstrates the potential of machine learning algorithms in streamlining diagnostic procedures and reducing errors and dependency on extensive clinical testing.

BIOMEDICINES (2023)

Article Biology

Similarity-Based Predictive Models: Sensitivity Analysis and a Biological Application with Multi-Attributes

Jeniffer D. Sanchez, Leandro C. Rego, Raydonal Ospina, Victor Leiva, Christophe Chesneau, Cecilia Castro

Summary: In this study, sensitivity analysis in similarity-based predictive models is performed, using computational simulations and two distinct methodologies, with a focus on a biological application. A linear regression model is used as a reference point, and the coefficient of variation of parameter estimators is calculated to gauge sensitivity. Results show that the first approach outperforms the second one when dealing with categorical variables and offers the advantage of being more parsimonious. Predictive models based on empirical similarity are crucial in biology and data science, and this study provides insights into how to handle categorical variables effectively.

BIOLOGY-BASEL (2023)

Article Mathematics, Applied

Statistical characterization of vaccinated cases and deaths due to COVID-19: methodology and case study in South America

Carlos Martin-Barreiro, Xavier Cabezas, Victor Leiva, Pedro Ramos-De Santis, John A. Ramirez-Figueroa, Erwin J. Delgado

Summary: This study analyzes the number of vaccinated cases and deaths due to COVID-19 in ten South American countries using principal component analysis and K-means clustering. The countries are classified into groups based on these variables, which reveals common properties and differences. Factors such as political decisions, availability of resources, bargaining power with suppliers, and health infrastructure can affect the vaccination process and timely care. Most countries acted promptly in terms of vaccination, with the exception of two. All countries experienced peaks in the number of deaths at some point during the study period.
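The grouping step can be illustrated with a bare-bones K-means (Lloyd's algorithm) sketch. The two-dimensional points below are hypothetical standardized scores (for example, vaccinations and deaths per capita), not the study's data, and the sketch omits the principal component analysis step the authors apply. Because K-means relies on Euclidean distances, the variables should be standardized before clustering.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate nearest-center assignment and mean updates."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialize with k distinct data points
    for _ in range(iters):
        # Assignment step: put each point in the cluster of its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[j].append(p)
        # Update step: move each center to its cluster mean (keep it if the cluster is empty)
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centers == centers:  # converged: centers no longer move
            break
        centers = new_centers
    labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
    return centers, labels


# Hypothetical standardized (vaccination, death) scores for six countries
points = [(0.1, 0.2), (0.0, -0.1), (-0.2, 0.1), (3.0, 3.1), (2.9, 2.8), (3.2, 3.0)]
centers, labels = kmeans(points, 2)
print(labels)
```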

AIMS MATHEMATICS (2023)

Article Mathematical & Computational Biology

STATIS multivariate three-way method for evaluating quality of life after corneal surgery: Methodology and case study in Costa Rica

Francisco J. Perdomo-Arguello, Estelina Ortega-Gomez, Purificacion Galindo-Villardon, Victor Leiva, Purificacion Vicente-Galindo

Summary: This paper presents a methodology based on multivariate three-way methods to assess the real change in vision-related quality of life (QoL) for myopic patients before and after corneal surgery. The study conducted in Costa Rica found a statistically significant difference in perceived QoL levels after surgery and identified recalibration and reconceptualization.

MATHEMATICAL BIOSCIENCES AND ENGINEERING (2023)

暂无数据