Article
Computer Science, Theory & Methods
Pete Philipson, Alan Huang
Summary: Classic models like Poisson or negative binomial regression models cannot handle count data that are subject to both under- and overdispersion at some hierarchical level. The mean-parameterised Conway-Maxwell-Poisson distribution allows for both types of dispersion within the same model, but is difficult to work with due to an embedded normalising constant. We propose a look-up method that pre-computes values of the rate parameter to significantly reduce computing times and make the proposed model a practical alternative for handling bidispersed data. The approach is validated using a simulation study and applied to three datasets that exhibit both underdispersion and overdispersion at the individual level: a small dataset on takeover bids, a medium dataset on yellow cards issued by referees in the English Premier League before and during the Covid-19 pandemic, and a large Test match cricket bowling dataset.
STATISTICS AND COMPUTING
(2023)
Article
Statistics & Probability
A. Huang, A. S. I. Kim
Summary: This note explores Bayesian Conway-Maxwell-Poisson regression models that can handle both overdispersion and underdispersion, demonstrating Bayesian regression inferences for dispersed counts via a Metropolis-Hastings algorithm. Through analysis of two data examples and a simulation study, the models show favorable frequentist properties.
COMMUNICATIONS IN STATISTICS-THEORY AND METHODS
(2021)
Article
Mathematics, Interdisciplinary Applications
Marie Beisemann
Summary: This article introduces a new count data model that can better accommodate count data tests and self-reports. The model allows for varying discriminations and dispersions, performing well in terms of statistical properties and handling real data.
BRITISH JOURNAL OF MATHEMATICAL & STATISTICAL PSYCHOLOGY
(2022)
Article
Statistics & Probability
Peter M. Philipson
Summary: A truncated, mean-parameterized Conway-Maxwell-Poisson model is developed to handle under- and overdispersed count data owing to individual heterogeneity. The model is applied to a large dataset of Test match cricket bowlers, and it indicates the merit of a more sophisticated measure for ranking and assessing Test match bowlers. The Bayesian approach and Markov Chain Monte Carlo algorithm are used for parameter estimation and extracting individual players' innate ability.
STATISTICAL MODELLING
(2023)
Article
Statistics & Probability
S. Bedbur, U. Kamps
Summary: Uniformly most powerful unbiased tests for one-sided hypotheses about the dispersion parameter of the Conway-Maxwell-Poisson distribution are derived using the exponential family structure. The critical values of the test statistics are obtained via simulation. The tests, applied to real data sets, provide evidence against the common Poisson model for count data.
STATISTICS & PROBABILITY LETTERS
(2023)
Article
Mathematics, Applied
Ulduz Mammadova, M. Revan Ozkale
Summary: This paper introduces new process control charts for monitoring Poisson and COM-Poisson profiles with highly correlated variables. The control charts are constructed based on ridge deviance residuals for both types of profiles and evaluated using simulation studies with real historical data and a hypothetical dataset. Performance comparison with existing methods is also conducted.
JOURNAL OF COMPUTATIONAL AND APPLIED MATHEMATICS
(2021)
Article
Computer Science, Interdisciplinary Applications
Boris Forthmann, Philipp Doebler
Summary: The article outlines the use of item-response models for estimating researcher capacity, focusing on count data modeling and the issues of underdispersion and overdispersion. The flexible Conway-Maxwell-Poisson count model was employed to assess reliability estimates, revealing a drop in reliability estimates for inventors due to overdispersion. The study also identified a more complex dispersion pattern in researcher data compared to previous findings, emphasizing the importance of considering different models for capacity reliability evaluations.
Article
Biotechnology & Applied Microbiology
Pierluigi Polese, Manuela Del Torre, Mara Lucia Stecchini
Summary: This study examined the impact of multiple hurdles on the random and non-random components of survivors. The analysis showed that the randomness of survivors was related to the degree of dispersion of the inactivation parameters.
Article
Multidisciplinary Sciences
Zakariya Yahya Algamal, Mohamed R. Abonazel, Fuad A. Awwad, Elsayed Tag Eldin
Summary: Recently, there has been strong interest in modeling count data, which frequently exhibit over-dispersion or under-dispersion. The Conway-Maxwell-Poisson regression (COMP) model has successfully addressed count data modeling across a broad range of dispersion. Multicollinearity, known to impact the variance of the maximum likelihood estimator, can be mitigated by biased estimators like the ridge estimator. In this study, we propose the jack-knife ridge estimator (JCOMPRE) and its modified version (MJCOMPRE) for the COMP model, which effectively reduce multicollinearity effects and bias. Simulation and real-life applications demonstrate that the proposed estimators outperform the maximum likelihood estimator and the ridge estimator in terms of bias and mean squared error.
SCIENTIFIC AFRICAN
(2023)
Article
Mathematical & Computational Biology
Tong Kang, Jeremy Gaskins, Steven Levy, Somnath Datta
Summary: This study investigates the effects of various factors on the progression of dental caries in children using a mixed effects model. By combining a Bayesian hurdle framework with the Conway-Maxwell-Poisson regression model, the study provides novel tools for statistical practitioners and fresh insights for dental researchers. The methodology developed in this study incorporates a hierarchical shrinkage prior distribution and a sparse covariance structure for modeling the dependence among teeth of each individual child.
STATISTICS IN MEDICINE
(2021)
Article
Microbiology
Pierluigi Polese, Manuela Del Torre, Mara Lucia Stecchini
Summary: The article discusses the importance of considering uncertainty and biological variability in controlling harmful microorganisms like Listeria monocytogenes through inactivation steps. It proposes a statistical modeling approach for describing variation in osmotic inactivation processes, and highlights the impact of over-dispersion on the variance of bacterial populations.
FRONTIERS IN MICROBIOLOGY
(2021)
Article
Multidisciplinary Sciences
Nico Higgs, Ian Stavness
Summary: This study used a Bayesian multi-level regression model to infer the changes in home advantage during the COVID-19 pandemic across North American professional sports leagues. The results showed a negative impact on home advantage in the NHL and NBA playoffs, while the MLB and NFL seasons saw little to no change.
SCIENTIFIC REPORTS
(2021)
Article
Engineering, Multidisciplinary
Marcelo Bourguignon, Rodrigo M. R. Medeiros, Fidel Henrique Fernandes, Linda Lee Ho
Summary: This study focuses on control charts based on the BerG process for various dispersion cases and warns about the potential errors in using asymptotic control limits. Guidelines for practitioners are provided for determining minimum sample size and matching exact control limits based on extensive simulations. The proposed schemes are applied to monitoring the BerG mean parameter.
QUALITY AND RELIABILITY ENGINEERING INTERNATIONAL
(2021)
Article
Computer Science, Interdisciplinary Applications
Faiza Sami, Muhammad Moeen Butt, Muhammad Amin
Summary: The two-parameter estimator (TPE) is proposed for the Poisson regression model, but it has the limitation of a single parameter. Count data models often suffer from the problems of dispersion and multicollinearity. The Conway-Maxwell-Poisson regression model (COMPRM) is suitable for handling both issues simultaneously. To estimate the COMPRM coefficients, the iteratively reweighted least squares (IRLS) method is used. Through a Monte Carlo simulation study, the efficiency of the estimator is evaluated based on the mean squared error (MSE). In the presence of multicollinearity, Asar and Genc's two-parameter estimator (AGTPE) shows better efficiency for the COMPRM than other estimators such as the maximum likelihood estimator (MLE), the ridge estimator, the Liu estimator, and the TPE by Huang and Yang (HYTPE). The proposed estimator is also being studied for real-life applications.
JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION
(2023)
Article
Computer Science, Interdisciplinary Applications
Mohamed R. Abonazel
Summary: The COMP model is a flexible count data regression model used in cases of over- and underdispersion. To handle multicollinearity, a new modified Liu estimator for the COMP regression model is proposed based on two shrinkage parameters. The proposed estimator outperforms existing estimators in a simulation study and a real-life application.
JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION
(2023)
Article
Computer Science, Interdisciplinary Applications
Travis Greene, Galit Shmueli, Soumya Ray, Jan Fell
Editorial Material
Biology
Galit Shmueli
Article
Statistics & Probability
Suneel Babu Chatla, Galit Shmueli
JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS
(2020)
Article
Statistics & Probability
Mahsa Ashouri, Rob J. Hyndman, Galit Shmueli
Summary: Forecasting hierarchical or grouped time series using a reconciliation approach involves two steps: computing base forecasts and reconciling the forecasts. A proposed linear model handles the forecasting and reconciliation in a single step, avoiding computational challenges when using popular time series forecasting methods like ETS or ARIMA. This method is flexible and can incorporate external data effectively.
JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS
(2022)
Article
Economics
Galit Shmueli, Ali Tafti
Summary: Many internet platforms collect behavioral big data to predict user behavior for internal purposes and for business customers. Data science researchers improve prediction through algorithms, models, and larger data. Platforms can achieve better prediction accuracy by pushing users' behaviors towards predicted values using behavior modification techniques. This strategy is absent from the machine learning and statistics literature. Incorporating causal with predictive notation helps understand the impact of behavior modification on predictive power. Behavior modification can make users' behavior more predictable and homogeneous, but may not generalize in practice and may harm manipulated users.
INTERNATIONAL JOURNAL OF FORECASTING
(2023)
Article
Social Sciences, Mathematical Methods
Travis Greene, Galit Shmueli, Jan Fell, Ching-Fu Lin, Han-Wei Liu
Summary: This article discusses the issue of predictive inconsistency in algorithmic risk prediction tools, pointing out that different choices may lead to different predicted risk scores for the same individual. The article argues that in a diverse and pluralistic society, complete elimination of predictive inconsistency should not be expected. Instead, the authors propose identifying and documenting relevant and reasonable 'forking paths' to enhance the legal, scientific, and political legitimacy of these tools.
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES A-STATISTICS IN SOCIETY
(2022)
Editorial Material
Economics
Galit Shmueli, Ali Tafti
INTERNATIONAL JOURNAL OF FORECASTING
(2023)
Article
Computer Science, Artificial Intelligence
Travis Greene, Amit Dhurandhar, Galit Shmueli
Summary: In response to the social impacts of AI-based technologies, conferences and journals now require ethics impact statements and reviews, causing debates between atomists and holists. Atomists believe in separating facts from values, while holists believe they are intertwined. To reduce polarization, we analyze each ideology's beliefs and propose empathy and targeted strategies for ethical disagreements in the data science community.
Article
Management
Nicholas P. Danks, Soumya Ray, Galit Shmueli
Summary: Construct-based models in management and information systems research may be too overfit to the data samples used, making them risky to use outside of a specific sample. A composite overfit analysis framework is proposed to analyze the sources and consequences of overfitting in empirical research. The framework uses predictive tools to identify deviant cases and groups, and analyzes the impact on model parameters, providing insights into the reasons and effects of overfitting.
MANAGEMENT SCIENCE
(2023)
Article
Information Science & Library Science
Sujin Park, Ali Tafti, Galit Shmueli
Summary: Transportability is a method of transporting causal effects from experimental studies to observational data when studying different populations. It provides a solution to overcome practical constraints in inferring causal relationships. However, its implementation has been limited by a lack of practical guidelines and by statistical challenges. This study aims to bridge the theory-practice gap by offering a detailed procedure for transporting causal effects and discussing practical considerations and limitations.
INFORMATION SYSTEMS RESEARCH
(2023)
Article
Computer Science, Artificial Intelligence
Travis Greene, David Martens, Galit Shmueli
Summary: The era of behavioural big data has made it difficult for academic researchers to access the data and conduct research due to platform control and algorithmic behaviour modification techniques. This isolation has consequences for creating knowledge in data science, and academic data scientists should play new roles in promoting platform transparency and social debate.
NATURE MACHINE INTELLIGENCE
(2022)
Article
Management
Pratyush Nidhi Sharma, Galit Shmueli, Marko Sarstedt, Nicholas Danks, Soumya Ray
Summary: The study compares the performance of standard PLS-PM criteria and Information Theory-derived model selection criteria, finding that in-sample criteria can serve as useful substitutes for out-of-sample criteria when there is no holdout sample. The best performing out-of-sample criteria include RMSE and MAD when a holdout sample is available.
Article
Information Science & Library Science
Ali Tafti, Galit Shmueli
INFORMATION SYSTEMS RESEARCH
(2020)
Article
Computer Science, Information Systems
Pratyush Nidhi Sharma, Marko Sarstedt, Galit Shmueli, Kevin H. Kim, Kai Oliver Thiele
JOURNAL OF THE ASSOCIATION FOR INFORMATION SYSTEMS
(2019)
Article
Business
Galit Shmueli, Marko Sarstedt, Joseph F. Hair, Jun-Hwa Cheah, Hiram Ting, Santha Vaithilingam, Christian M. Ringle
EUROPEAN JOURNAL OF MARKETING
(2019)