Article

Application of information-theoretic tests for the analysis of DNA sequences based on Markov chain models

Journal

COMPUTATIONAL STATISTICS & DATA ANALYSIS
Volume 53, Issue 5, Pages 1861-1872

Publisher

ELSEVIER
DOI: 10.1016/j.csda.2008.07.002

The statistical structure of DNA sequences is of great interest to molecular biology, genetics and the theory of evolution. One popular approach models sequences as Markov processes of different orders and then statistically estimates their parameters. Continuing this line of investigation, hypothesis tests are used to estimate the memory (or connectivity) of genetic texts and to address a DNA-based problem concerning the phylogenetic system of various groups of organisms. (C) 2008 Elsevier B.V. All rights reserved.
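
To make the notion of "memory" of a Markov model concrete, the sketch below computes the empirical conditional entropy of a symbol given the k preceding symbols. It is only an illustration under stated assumptions and is not the test procedure used in the paper: the function names `conditional_entropy` and `entropy_profile` are hypothetical, the input is assumed to be a plain string over the alphabet A, C, G, T, and no significance threshold is applied. A formal hypothesis test would additionally compare the drop between successive orders against a reference distribution.

```python
from collections import Counter
from math import log2


def conditional_entropy(seq: str, k: int) -> float:
    """Empirical entropy (bits per symbol) of the next symbol given the previous k symbols.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if k < 0:
        raise ValueError("order k must be non-negative")
    if len(seq) <= k:
        raise ValueError("sequence must be longer than the order k")

    context_counts = Counter()   # occurrences of each length-k context
    joint_counts = Counter()     # occurrences of each (context, next symbol) pair
    for i in range(k, len(seq)):
        context = seq[i - k:i]
        context_counts[context] += 1
        joint_counts[(context, seq[i])] += 1

    n = len(seq) - k             # number of observed transitions
    h = 0.0
    for (context, _symbol), count in joint_counts.items():
        # p(context, symbol) * log2 p(symbol | context)
        h -= (count / n) * log2(count / context_counts[context])
    return h


def entropy_profile(seq: str, max_order: int) -> list[float]:
    """Conditional entropies for orders 0..max_order (inclusive)."""
    return [conditional_entropy(seq, k) for k in range(max_order + 1)]


if __name__ == "__main__":
    # Toy periodic sequence: the next symbol is fully determined by the previous one.
    toy = "ACGT" * 50
    for order, h in enumerate(entropy_profile(toy, 2)):
        print(f"order {order}: {h:.3f} bits/symbol")
```

In this toy case the entropy falls from 2 bits at order 0 to 0 bits at order 1 and stays there, which is the pattern one would expect from a chain of memory 1. On real genetic texts the estimates are noisy and biased downward for large k, which is why a statistical test rather than a visual inspection is needed.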

Recommended

Article Computer Science, Hardware & Architecture

Application of the Computer Capacity to the Analysis of Processors Evolution

Boris Ryabko, Anton Rakitskiy

JOURNAL OF CIRCUITS SYSTEMS AND COMPUTERS (2020)

Article Mathematics

Using Data-Compressors for Classification Hunting Behavioral Sequences in Rodents as Ethological Texts

Jan Levenets, Anna Novikovskaya, Sofia Panteleeva, Zhanna Reznikova, Boris Ryabko

MATHEMATICS (2020)

Article Physics, Multidisciplinary

Time-Adaptive Statistical Test for Random Number Generators

Boris Ryabko

ENTROPY (2020)

Article Mathematics

Compression-Based Methods of Time Series Forecasting

Konstantin Chirikhin, Boris Ryabko

Summary: The article proposes forecasting methods based on real-world data compressors that can effectively predict univariate and multivariate data with automatic selection of the best algorithm. Additionally, the use of time-universal codes can reduce computation time without sacrificing accuracy.

MATHEMATICS (2021)

Article Computer Science, Theory & Methods

A Pseudo-Random Generator Whose Output is a Normal Sequence

Boris Ryabko

Summary: This paper introduces a PRNG class that has been tested successfully and consists of generators that can produce normal sequences. The generators in this class also satisfy a specific mathematical property.

INTERNATIONAL JOURNAL OF FOUNDATIONS OF COMPUTER SCIENCE (2021)

Article Physics, Multidisciplinary

Using Data Compression to Build a Method for Statistically Verified Attribution of Literary Texts

Boris Ryabko, Nadezhda Savina

Summary: This article proposes a methodology for authorship attribution of literary texts based on the use of data compressors, which allows for statistically verified results. The method is used to solve two problems of attribution in Russian literature.

ENTROPY (2021)

Article Statistics & Probability

Asymptotically most powerful tests for random number generators

Boris Ryabko

Summary: This article explores the construction of the most powerful test and effective statistical tests for RNGs used in various fields such as data protection, modeling and simulation systems, and computer games. The effectiveness of RNG statistical tests is estimated through experiments and a model suitable for binary sequences in encryption systems is proposed.

JOURNAL OF STATISTICAL PLANNING AND INFERENCE (2022)

Article Physics, Multidisciplinary

Information-Theoretic Method for Assessing the Quality of Translations

Boris Ryabko, Nadezhda Savina

Summary: In recent years, the task of translation has gained attention from researchers due to its practical applications. This paper proposes an information-theoretic method to assess translation quality, focusing on the impact of unconscious author's style on translation. The method is applied to translations of classic English works into Russian and vice versa, successfully determining the attribution of literary texts.

ENTROPY (2022)

Article Computer Science, Theory & Methods

Unconditionally secure short key ciphers based on data compression and randomization

Boris Ryabko

Summary: This article discusses the problem of creating an unconditionally secure cipher when the key length is shorter than the encrypted message. It proposes a cipher method based on data compression, randomization, and entropy-secure encryption, and applies it to two scenarios: knowing the statistics of encrypted messages, and generating messages using a Markov chain with known memory or connectivity. In both cases, the length of the secret key is negligible compared to the message length.

DESIGNS CODES AND CRYPTOGRAPHY (2023)

Proceedings Paper Computer Science, Information Systems

Using data compression and randomisation to build an unconditionally secure short key cipher

Boris Ryabko

Summary: We discuss the problem of constructing an unconditionally secure cipher when the key length is shorter than the encrypted message. By combining data compression, randomization techniques, and entropically-secure encryption, we propose a solution for encryption with known message statistics. The resulting cipher allows for key length independent of entropy or encrypted message length, but determined by the desired security level.

2022 IEEE INFORMATION THEORY WORKSHOP (ITW) (2022)

Proceedings Paper Computer Science, Information Systems

Statistical Testing of Randomness

Boris Ryabko

PROCEEDINGS OF 2020 INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY AND ITS APPLICATIONS (ISITA2020) (2020)

Proceedings Paper Computer Science, Information Systems

The time-adaptive statistical testing for random number generators

Boris Ryabko, Viacheslav Zhuravlev

PROCEEDINGS OF 2020 INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY AND ITS APPLICATIONS (ISITA2020) (2020)

Meeting Abstract Radiology, Nuclear Medicine & Medical Imaging

Translocator Protein in 4R Tauopathies - Experience from the ActiGliA Study

J. Sauerbeck, L. Beyer, S. Schoenecker, C. Palleis, G. Hoeglinger, E. Schuh, R. Boris, G. Rohrer, S. Sonnenfeld, K. Boetzel, A. Danek, A. Rominger, J. Levin, B. Matthias

EUROPEAN JOURNAL OF NUCLEAR MEDICINE AND MOLECULAR IMAGING (2019)

Proceedings Paper Computer Science, Information Systems

Time-universal data compression and prediction

Boris Ryabko

2019 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY (ISIT) (2019)

Article Computer Science, Artificial Intelligence

Time-Universal Data Compression

Boris Ryabko

ALGORITHMS (2019)

Article Computer Science, Interdisciplinary Applications

One point per cluster spatially balanced sampling

Blair Robertson, Chris Price

Summary: Spatial sampling designs are crucial for accurate estimation of population parameters. This study proposes a new design method that generates samples with good spatial spread and performs favorably compared to existing designs.

COMPUTATIONAL STATISTICS & DATA ANALYSIS (2024)

Article Computer Science, Interdisciplinary Applications

Simultaneous confidence region of an embedded one-dimensional curve in multi-dimensional space

Hiroya Yamazoe, Kanta Naito

Summary: This paper focuses on the simultaneous confidence region of a one-dimensional curve embedded in multi-dimensional space. An estimator of the curve is obtained through local linear regression on each variable in multi-dimensional data. A method to construct a simultaneous confidence region based on this estimator is proposed, and theoretical results for the estimator and the region are developed. The effectiveness of the region is demonstrated through simulation studies and applications to artificial and real datasets.

COMPUTATIONAL STATISTICS & DATA ANALYSIS (2024)

Article Computer Science, Interdisciplinary Applications

Efficient and robust optimal design for quantile regression based on linear programming

Cheng Peng, Drew P. Kouri, Stan Uryasev

Summary: This paper introduces a novel optimal experimental design method for quantifying the distribution tails of uncertain system responses. The method minimizes the variance or conditional value-at-risk of the upper bound of the predicted quantile, and estimates the data uncertainty using quantile regression. The optimal design problems are solved as linear programming problems, making the proposed methods efficient even for large datasets.

COMPUTATIONAL STATISTICS & DATA ANALYSIS (2024)

Article Computer Science, Interdisciplinary Applications

Multi-block alternating direction method of multipliers for ultrahigh dimensional quantile fused regression

Xiaofei Wu, Hao Ming, Zhimin Zhang, Zhenyu Cui

Summary: This paper proposes a model that combines quantile regression and fused LASSO penalty, and introduces an iterative algorithm based on ADMM to solve high-dimensional datasets. The paper proves the global convergence and comparable convergence rates of the algorithm, and analyzes the theoretical properties of the model. Numerical experimental results support the superior performance of the model.

COMPUTATIONAL STATISTICS & DATA ANALYSIS (2024)

Article Computer Science, Interdisciplinary Applications

Nonparametric augmented probability weighting with sparsity

Xin He, Xiaojun Mao, Zhonglei Wang

Summary: This paper proposes a nonparametric imputation method with sparsity to estimate the finite population mean, using an efficient kernel method and sparse learning for estimation. An augmented inverse probability weighting framework is adopted to achieve a central limit theorem for the proposed estimator under regularity conditions.

COMPUTATIONAL STATISTICS & DATA ANALYSIS (2024)

Article Computer Science, Interdisciplinary Applications

Conditional-mean multiplicative operator models for count time series

Christian H. Weiss, Fukang Zhu

Summary: This study introduces multiplicative error models (CMEMs) for discrete-valued count time series, which are closely related to the integer-valued generalized autoregressive conditional heteroscedasticity (INGARCH) models. It derives the stochastic properties and estimation approaches of different types of INGARCH-CMEMs, and demonstrates their performance and application through simulations and real-world data examples.

COMPUTATIONAL STATISTICS & DATA ANALYSIS (2024)

Article Computer Science, Interdisciplinary Applications

Hybrid exact-approximate design approach for sparse functional data

Ming-Hung Kao, Ping-Han Huang

Summary: Optimal designs for sparse functional data under the functional empirical component (FEC) settings are investigated. New computational methods and theoretical results are developed to efficiently obtain optimal exact and approximate designs. A hybrid exact-approximate design approach is proposed and demonstrated to be efficient through simulation studies and a real example.

COMPUTATIONAL STATISTICS & DATA ANALYSIS (2024)

Article Computer Science, Interdisciplinary Applications

GP-BART: A novel Bayesian additive regression trees approach using Gaussian processes

Mateus Maia, Keefe Murphy, Andrew C. Parnell

Summary: The Bayesian additive regression trees (BART) model is a powerful ensemble method for regression tasks, but its lack of smoothness and explicit covariance structure can limit its performance. The Gaussian processes Bayesian additive regression trees (GP-BART) model addresses this limitation by incorporating Gaussian process priors, resulting in superior performance in various scenarios.

COMPUTATIONAL STATISTICS & DATA ANALYSIS (2024)

Article Computer Science, Interdisciplinary Applications

Additive partially linear model for pooled biomonitoring data

Xichen Mou, Dewei Wang

Summary: Human biomonitoring is a method of monitoring human health by measuring the accumulation of harmful chemicals in the body. To reduce the high cost of chemical analysis, researchers have adopted a cost-effective approach that combines specimens and analyzes the concentration of toxic substances in the pooled samples. To effectively interpret these aggregated measurements, a new regression framework is proposed by extending the additive partially linear model (APLM). The APLM is versatile in capturing the complex association between outcomes and covariates, making it valuable in assessing the complex interplay between chemical bioaccumulation and potential risk factors.

COMPUTATIONAL STATISTICS & DATA ANALYSIS (2024)

Article Computer Science, Interdisciplinary Applications

Laplace approximated quasi-likelihood method for heteroscedastic survival data

Lili Yu, Yichuan Zhao

Summary: The classical accelerated failure time model is a linear model commonly used for right censored survival data, but it cannot handle heteroscedastic survival data. This paper proposes a Laplace approximated quasi-likelihood method with a continuous estimating equation to address this issue, and provides estimation bias and confidence interval estimation formulas.

COMPUTATIONAL STATISTICS & DATA ANALYSIS (2024)

Article Computer Science, Interdisciplinary Applications

Standard error estimates in hierarchical generalized linear models

Shaobo Jin, Youngjo Lee

Summary: Hierarchical generalized linear models are widely used for fitting random effects models, but the standard error estimators receive less attention. Current standard error estimation methods are not necessarily accurate, and a sandwich estimator is proposed to improve the accuracy of standard error estimation.

COMPUTATIONAL STATISTICS & DATA ANALYSIS (2024)

Article Computer Science, Interdisciplinary Applications

Probability of default estimation in credit risk using mixture cure models

Rebeca Pelaez, Ingrid Van Keilegom, Ricardo Cao, Juan M. Vilar

Summary: This article proposes an estimator for the probability of default (PD) in credit risk, derived from a nonparametric conditional survival function estimator based on cure models. The asymptotic expressions for bias, variance, and normality of the estimator are presented. Through simulation and empirical studies, the performance and practical behavior of the nonparametric estimator are compared with other methods.

COMPUTATIONAL STATISTICS & DATA ANALYSIS (2024)

Article Computer Science, Interdisciplinary Applications

Joint modelling of the body and tail of bivariate data

L. M. Andre, J. L. Wadsworth, A. O'Hagan

Summary: This paper proposes a dependence model that captures the entire data range in multi-variable cases. By blending two copulas with different characteristics and using a dynamic weighting function for smooth transition, the model is able to flexibly capture various dependence structures.

COMPUTATIONAL STATISTICS & DATA ANALYSIS (2024)

Article Computer Science, Interdisciplinary Applications

Significance test for semiparametric conditional average treatment effects and other structural functions

Niwen Zhou, Xu Guo, Lixing Zhu

Summary: The paper investigates hypothesis testing regarding the potential additional contributions of other covariates to the structural function, given the known covariates. The proposed distance-based test, based on Neyman's orthogonality condition, effectively detects local alternatives and is robust to the influence of nuisance functions. Numerical studies and real data analysis demonstrate the importance of this test in exploring covariates associated with AIDS treatment effects.

COMPUTATIONAL STATISTICS & DATA ANALYSIS (2024)

Article Computer Science, Interdisciplinary Applications

Full uncertainty analysis for Bayesian nonparametric mixture models

Blake Moya, Stephen G. Walker

Summary: A full posterior analysis method for nonparametric mixture models using Gibbs-type prior distributions, including the well known Dirichlet process mixture (DPM) model, is presented. The method removes the random mixing distribution and enables a simple-to-implement Markov chain Monte Carlo (MCMC) algorithm. The removal procedure reduces some of the posterior uncertainty and introduces a novel replacement approach. The method only requires the probabilities of a new or an old value associated with the corresponding Gibbs-type exchangeable sequence, without the need for explicit representations of the prior or posterior distributions. This allows the implementation of mixture models with full posterior uncertainty, including one introduced by Gnedin. The paper also provides numerous illustrations and introduces an R-package called CopRe that implements the methodology.

COMPUTATIONAL STATISTICS & DATA ANALYSIS (2024)