Article
Genetics & Heredity
Osval Antonio Montesinos-Lopez, Abelardo Montesinos-Lopez, Brandon A. Mosqueda-Gonzalez, Jose Cricelio Montesinos-Lopez, Jose Crossa, Nerida Lozano Ramirez, Pawan Singh, Felicitas Alejandra Valladares-Anguiano
Summary: Choosing the right statistical machine learning model is crucial in genomic selection. This study introduces a zero-inflated random forest model, which outperforms conventional random forest and Generalized Poisson Ridge regression models in prediction performance when dealing with excessive zeros in count response variables.
G3-GENES GENOMES GENETICS
(2021)
Article
Automation & Control Systems
Alexandre Bouchard-Cote, Andrew Roth
Summary: Bayesian feature allocation models are widely used for data modeling with a combinatorial latent structure, but exact inference is typically intractable. To address the inefficiency of single-variable Gibbs updates caused by strong correlations between features, a block sampler has been developed that updates entire rows of the feature allocation matrix in a single move. The Particle Gibbs (PG) sampler offers improved performance compared to standard Gibbs sampling, with computational complexity growing linearly instead of exponentially with the number of features.
JOURNAL OF MACHINE LEARNING RESEARCH
(2021)
Article
Geosciences, Multidisciplinary
Steinar Love Ellefmo, Thomas Kuhn
Summary: Minerals and metals play a crucial role in society, and there is great potential in mining mineral resources from the deep ocean floor. This study used images and expert knowledge to estimate nodule abundance, showing the importance of using data effectively to produce better-informed estimates. Future improvements will focus on enhancing the estimation of minimum and maximum values at image locations.
NATURAL RESOURCES RESEARCH
(2021)
Article
Computer Science, Interdisciplinary Applications
Kheirolah Okhli, Mehdi Jabbari Nooghabi
Summary: This paper introduces the three-component mixture of exponential (3-CME) distributions as an alternative platform for analyzing positive datasets in the presence of multiple lower and upper outliers, which may cause misleading inferential conclusions. The parameter estimates are obtained using Bayesian methodology, and five simulation studies are conducted to investigate the performance of the proposed approach. The results show that the proposed outlier model is an appropriate alternative for data with and without lower and upper outliers.
MATHEMATICS AND COMPUTERS IN SIMULATION
(2023)
Article
Physics, Multidisciplinary
Anmin Tang, Xingde Duan, Yuanying Zhao
Summary: In the development of simplex mixed-effects models, random effects are typically assumed to follow a normal distribution. To address violations of this assumption, a centered Dirichlet process mixture model is employed in this paper. By utilizing a Bayesian Lasso, important covariates with nonzero effects can be selected while estimating unknown parameters in semiparametric simplex mixed-effects models, with the help of a block Gibbs sampler and the Metropolis-Hastings algorithm.
Article
Agriculture, Dairy & Animal Science
Viktor Milkevych, Per Madsen, Hongding Gao, Just Jensen
Summary: This study proposes a quantitative criterion for determining the amount of genomic information included in a model and finds that the estimated variances are dependent on the amount of genomic data, but independent of the Gibbs updating schemes. Simulation results show that the convergence rate of location parameters deteriorates gradually as new genomic data is added, while the convergence of variance components continuously improves.
JOURNAL OF ANIMAL BREEDING AND GENETICS
(2021)
Article
Mathematics, Applied
Kheirolah Okhli, Mehdi Jabbari Nooghabi
Summary: This paper introduces the CE distribution as an alternative platform for analyzing insurance data and presents the Bayesian approach for parameter estimation. Simulation studies using Gibbs sampling were conducted to check the methodology's performance, and four examples of actual insurance claim data were analyzed to illustrate the CE distribution's superiority in analyzing data and identifying outliers.
APPLIED MATHEMATICS AND COMPUTATION
(2021)
Article
Genetics & Heredity
Abelardo Montesinos-Lopez, Daniel E. Runcie, Maria Itria Ibba, Paulino Perez-Rodriguez, Osval A. Montesinos-Lopez, Leonardo A. Crespo, Alison R. Bentley, Jose Crossa
Summary: Implementing genomic-based prediction models in genomic selection involves understanding how to evaluate prediction accuracy from different models and methods using multi-trait data. This study compared prediction accuracy using six large multi-trait wheat datasets and found that a corrected Pearson's correlation method was more accurate than the traditional method. For grain yield, using a multi-trait model yielded higher prediction performance compared to a single-trait model, with the benefits increasing as genetic correlations between traits strengthen.
G3-GENES GENOMES GENETICS
(2021)
Article
Genetics & Heredity
Tianjing Zhao, Jian Zeng, Hao Cheng
Summary: With the increasing amount and diversity of intermediate omics data, there is a need to develop methods to incorporate them into genomic evaluation. The researchers developed a new method called NN-MM, which models the multiple layers of regulation from genotypes to intermediate omics features using a multilayer neural network. Compared to the recently proposed single-step approach, NN-MM provides better prediction performance for genomic prediction with intermediate omics data.
Editorial Material
Biochemistry & Molecular Biology
Tim K. Mackey, Alec J. Calac, B. S. Chenna Keshava, Joseph Yracheta, Krystal S. Tsosie, Keolu Fox
Summary: This commentary discusses how to provide health and genomic data that align with the values and priorities of marginalized communities by building a blockchain genomics data framework based on the concept of Indigenous Data Sovereignty.
Article
Mathematics, Applied
Felipe Uribe, Yiqiu Dong, Per Christian Hansen
Summary: This paper investigates the application of the shrinkage horseshoe prior in edge-preserving settings and introduces its formulation. A Gibbs sampling framework is used to solve the hierarchical formulation of the Bayesian inverse problem, with one conditional distribution being high-dimensional Gaussian and the rest derived in closed form using a scale mixture representation of the heavy-tailed hyperpriors. Applications in imaging science demonstrate that the computational procedure is able to compute sharp edge-preserving posterior point estimates with reduced uncertainty.
SIAM JOURNAL ON SCIENTIFIC COMPUTING
(2023)
Article
Computer Science, Artificial Intelligence
Jingtao Ding, Guanghui Yu, Xiangnan He, Fuli Feng, Yong Li, Depeng Jin
Summary: This paper proposes a negative sampler for BPR that leverages additional view data, leading to relative improvements in personalized ranking performance of over 36.64% and 16.40% on the Beibei and Tmall datasets, respectively. The findings demonstrate the importance of considering users' additional feedback when modeling their preferences for different items.
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
(2021)
Article
Computer Science, Interdisciplinary Applications
Nicola Donelli, Stefano Peluso, Antonietta Mira
Summary: Interactions among multiple time series of positive random variables play a crucial role in various financial applications. A popular model is the vector Multiplicative Error Model (vMEM), which imposes a linear iterative structure on the dynamics of the conditional mean. A Bayesian semiparametric approach is used to relax the restrictive assumption on the distribution of the random innovation term in vMEM, resulting in a more flexible specification. The method avoids computational complications by formulating a slice sampler on the parameter-extended unconstrained version of the model, and it outperforms classical methods in terms of fitting and predictive power.
COMPUTATIONAL STATISTICS & DATA ANALYSIS
(2021)
Article
Mathematics
Mohamed S. Eliwa, Mahmoud El-Morshedy, Haitham M. Yousof
Summary: This paper introduces a new flexible probability tool for modeling extreme and zero-inflated count data under different shapes of hazard rates. Many relevant mathematical and statistical properties are derived and analyzed. Several classical estimation techniques are considered, and a comprehensive comparison is performed for both simulated and real-life data. The flexibility, applicability, and notability of the new class are demonstrated through the analysis of four real datasets.
Article
Statistics & Probability
Rahim Alhamzawi
Summary: Two Bayesian methods for regularized left-censored regression, namely the reciprocal Bayesian bridge and the reciprocal Bayesian adaptive bridge, are proposed in this paper. Gibbs samplers are derived based on the reciprocal Bayesian bridge prior, which is a scale mixture of the inverse uniform distribution. The proposed methods are illustrated through simulation studies and a real data example, showing improved variable selection and estimation performance compared to existing methods.
COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION
(2023)
Article
Computer Science, Information Systems
Soham Chattopadhyay, Arijit Dey, Pawan Kumar Singh, Ali Ahmadian, Ram Sarkar
Summary: Speech is crucial in human communication and human-computer interaction. In AI and ML, recognizing human emotions from speech signals has been studied extensively. To address the challenge of large feature dimension, a hybrid feature selection algorithm called CEOAS is proposed. By extracting LPC and LPCC features, the proposed model reduces feature dimension and improves classification accuracy. High recognition accuracies have been achieved on four benchmark datasets, surpassing state-of-the-art algorithms.
MULTIMEDIA TOOLS AND APPLICATIONS
(2023)
Article
Computer Science, Information Systems
Mainak Biswas, Saif Rahaman, Ali Ahmadian, Kamalularifin Subari, Pawan Kumar Singh
Summary: Spoken Language Identification (SLID) is a well-researched field and an important first step in multilingual speech recognition systems. This study proposes a model for Indian and foreign language recognition, which enhances data to make it robust against everyday noise and selects relevant features through feature extraction and selection algorithms. The model achieves high accuracy on three standard datasets, indicating that these features capture language-specific characteristics of speech and can be used as standard features for the SLID task.
MULTIMEDIA TOOLS AND APPLICATIONS
(2023)
Article
Business, Finance
Pawan Kumar Singh, Alok Kumar Pandey, Ravi Kiran, Rajiv Kumar Bhatt, Anushka Chouhan
Summary: This study collected information from 145 countries to predict the impact of COVID-19 cases, tests per million, and the proportion of people aged 65 and above on deaths per million at country and continent levels. It also evaluated the economic cost of these indicators in terms of reduction in GDP growth rate. The study found significant differences across continents and a negative association between tests per million and deaths per million. It provides valuable insights for assessing the impact of these indicators in the pandemic and informing policy formation and decision-making strategies.
INTERNATIONAL JOURNAL OF FINANCE & ECONOMICS
(2023)
Article
Environmental Sciences
Alok Kumar Pandey, Pawan Kumar Singh, Muhammad Nawaz, Amrendra Kumar Kushwaha
Summary: Renewable energy plays an important role in providing reliable power supplies and diversifying fuel sources, while also helping to conserve natural resources. Solar energy has become increasingly prominent in India. This study forecasts the development of renewable energy and finds that wind power is growing faster than hydropower, solar energy, and bioenergy.
ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH
(2023)
Article
Plant Sciences
Raysa Gevartosky, Humberto Fanelli Carvalho, Germano Costa-Neto, Osval A. Montesinos-Lopez, Jose Crossa, Roberto Fritsche-Neto
Summary: This study aimed to design optimized training sets for genomic prediction in multi-trait multi-environment trials and to examine how those methods may increase accuracy while reducing phenotyping costs. The combined use of genomic and enviromic data efficiently designs optimized training sets for genomic prediction, improving the response to selection per dollar invested.
Article
Biotechnology & Applied Microbiology
Osval A. Montesinos-Lopez, Abelardo Kismiantini, Abelardo Montesinos-Lopez
Summary: Genomic selection (GS) is revolutionizing plant and animal breeding, but its practical implementation faces challenges due to uncontrolled factors. To improve prediction accuracy, this paper proposes two methods: reformulating GS as a binary classification problem, and applying postprocessing to adjust the classification threshold. Both methods outperformed the conventional regression model, with the postprocessing method showing better results.
Article
Biochemistry & Molecular Biology
Guillermo Garcia-Barrios, Jose Crossa, Serafin Cruz-Izquierdo, Victor Heber Aguilar-Rincon, J. Sergio Sandoval-Islas, Tarsicio Corona-Torres, Nerida Lozano-Ramirez, Susanne Dreisigacker, Xinyao He, Pawan Kumar Singh, Rosa Angela Pacheco-Gil
Summary: Genomic prediction is used to predict breeding values based on molecular and phenotypic data. This study evaluated the performance of different models in predicting disease resistance in synthetic hexaploid wheat. The results showed that the combination of genomic and pedigree information (A+G BLUP) had the highest prediction accuracy, while the single trait and multi-trait models had similar accuracies. This suggests that the use of genomic information can improve breeding programs.
INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES
(2023)
Article
Genetics & Heredity
Abelardo Montesinos-Lopez, Carolina Rivera, Francisco Pinto, Francisco Pinera, David Gonzalez, Mathew Reynolds, Paulino Perez-Rodriguez, H. Li, Osval A. Montesinos-Lopez, Jose Crossa
Summary: By comparing a novel deep learning (DL) method with conventional genomic prediction (GP) models, this study found that the DL method achieves higher accuracy in predicting phenotypes in plant breeding research and can account for the complexity of genotype-environment interaction. However, traditional GP models can also achieve high accuracy in certain situations.
G3-GENES GENOMES GENETICS
(2023)
Article
Plant Sciences
Osval A. Montesinos-Lopez, Alison R. Bentley, Carolina Saint Pierre, Leonardo Crespo-Herrera, Leonardo Rebollar-Ruellas, Patricia Edwigis Valladares-Celis, Morten Lillemo, Abelardo Montesinos-Lopez, Jose Crossa
Summary: Genomic selection (GS), proposed by Meuwissen et al. more than 20 years ago, is revolutionizing plant and animal breeding. In our study of 14 real datasets, we found that the average gain in prediction accuracy when genomic information is considered was 26.31%. The quality of the markers and relatedness of the individuals can greatly impact the increase in prediction accuracy.
Article
Environmental Sciences
Afolabi Agbona, Osval A. Montesinos-Lopez, Mark E. Everett, Henry Ruiz-Guzman, Dirk B. Hays
Summary: Many aspects of below-ground plant performance are not fully understood, including their spatial and temporal dynamics in relation to environmental factors. In this study, Ground-Penetrating Radar (GPR) was evaluated for its potential in normalizing spatial heterogeneity and estimating fresh root yield in a cassava field trial. The results showed that the GPR-based autoregressive (AR) model outperformed other models, indicating the potential of GPR in non-destructive yield estimation and field spatial heterogeneity normalization in root and tuber crop programs.
Article
Agronomy
Timothy G. Porch, Juan Carlos Rosas, Karen Cichy, Graciela Godoy Lutz, Iveth Rodriguez, Raphael W. Colbert, Gasner Demosthene, Juan Carlos Hernandez, Donna M. Winham, James S. Beaver
Summary: Tepary bean is a nutritious alternative to common bean in high-temperature and drought-prone areas. The USDA Fortuna cultivar has improved seed size and quality, resistance to diseases and pests, and a shorter cooking time.
JOURNAL OF PLANT REGISTRATIONS
(2023)
Article
Agronomy
Osval Montesinos-Lopez, Kismiantini, Abelardo Montesinos-Lopez
Summary: Genomic selection is revolutionizing animal and plant breeding, but its implementation faces challenges due to a mismatch between training and testing set distributions. This research used the adversarial validation method with probit regression to address the distribution mismatch and select optimal training sets. Evaluations showed that the proposed method effectively detected the mismatch and outperformed existing methods, achieving higher prediction accuracy.
Article
Multidisciplinary Sciences
Cesar D. Petroli, Guntur V. Subbarao, Juan A. Burgueno, Tadashi Yoshihashi, Huihui Li, Jorge Franco Duran, Kevin V. Pixley
Summary: A study found that maize root systems release glycosides that can inhibit the activity of nitrifiers and reduce soil nitrate formation in the root zone. Through genetic variation analysis, several maize varieties with high glycoside activity and the ability to release glycosides were identified, and genetic markers associated with these traits were found, providing the possibility of improving glycoside activity in maize through marker-assisted selection.
SCIENTIFIC REPORTS
(2023)
Article
Plant Sciences
Hannah R. R. Jeffery, Nyasha Mudukuti, Carol Robin Buell, Kevin L. L. Childs, Karen Cichy
Summary: Soaking dry beans before cooking can reduce the cooking time. This study identified gene expression patterns that are altered by soaking and compared gene expression in fast-cooking and slow-cooking beans. Genes related to cell wall growth and development, as well as hypoxic stress, were differentially expressed in slow-cooking beans after soaking.
Article
Mathematics
Soumita Seth, Saurav Mallik, Atikul Islam, Tapas Bhadra, Arup Roy, Pawan Kumar Singh, Aimin Li, Zhongming Zhao
Summary: In this paper, a new framework is introduced to discover gene signatures from scRNA-seq data. The framework combines several strategies, including an imputed matrix, MRMR feature selection, and shrinkage clustering. The results show that the proposed framework efficiently identifies stronger differentially expressed gene signatures and up-regulated markers in single-cell RNA sequencing data.