Article
Economics
Alexander D'Amour, Peng Ding, Avi Feller, Lihua Lei, Jasjeet Sekhon
Summary: This paper discusses the key assumptions for estimating causal effects under exogeneity, including unconfoundedness and overlap. Researchers often argue that unconfoundedness is more plausible when more covariates are included in the analysis, while less discussed is the difficulty of satisfying covariate overlap. By exploiting results from information theory, the authors derive explicit bounds on the average imbalance in covariate means under strict overlap, showing that these bounds become more restrictive as the dimension grows large.
JOURNAL OF ECONOMETRICS
(2021)
Article
Computer Science, Artificial Intelligence
Zsolt T. Kosztyan, Marcell T. Kurbucz, Attila I. Katona
Summary: This work introduces a novel network-based (nonparametric) dimensionality reduction analysis (NDA) method for addressing the high-dimensional, low-sample-size problem in data science. The NDA method constructs a correlation graph and detects modules using a modularity-based community detection method. It determines a linear combination of variables weighted by their eigenvector centralities (EVCs) and can effectively select important variables. The experimental results demonstrate that NDA outperforms existing methods in terms of interpretability and ease of use.
KNOWLEDGE-BASED SYSTEMS
(2022)
Article
Mathematics, Applied
Liang Chen
Summary: In this paper, a new hashing scheme is theoretically proposed for the sparse Fourier transform in high-dimensional space. The complexity analysis of the algorithm shows that this transform can overcome the curse of dimensionality. To the best of our knowledge, this is the first polynomial-time algorithm to recover high-dimensional continuous frequencies.
Review
Nanoscience & Nanotechnology
Hao Gu, Junmin Xia, Chao Liang, Yonghua Chen, Wei Huang, Guichuan Xing
Summary: This Perspective article investigates the advances in achieving phase-pure perovskite by manipulating precursor interactions and preparation methods, and discusses their prominent optoelectronic properties and applications. Compared to two-dimensional metal-halide perovskites with multiple quantum wells, the ones with phase-pure quantum wells have a flattened energy landscape, resulting in reduced energy or charge-transfer losses and increased stability.
NATURE REVIEWS MATERIALS
(2023)
Article
Mathematical & Computational Biology
Qi Zhang, Feifei Chen, Shunyao Wu, Hua Liang
Summary: We evaluate the validity of a projection-based test for linear models when the number of covariates tends to infinity, show that the test remains consistent, and derive its asymptotic distributions under the null and alternative hypotheses. The test achieves substantial dimension reduction and performs well numerically, with asymptotic properties similar to those in the fixed-dimension case as long as p/n -> 0.
STATISTICS IN MEDICINE
(2021)
Article
Chemistry, Analytical
Emmanuel Pintelas, Ioannis E. Livieris, Panagiotis E. Pintelas
Summary: The study introduces a novel approach using a convolutional autoencoder topological model to address the issue of noise and redundant information affecting deep learning models, leading to a significant performance improvement by compressing and filtering initial high-dimensional input images.
Article
Automation & Control Systems
Ning Ning, Edward L. Ionides
Summary: This paper introduces a method for parameter learning in high-dimensional, partially observed, and nonlinear stochastic processes, proposing the iterated block particle filter algorithm. The algorithm shows promising performance in solving the curse of dimensionality in various experiments.
JOURNAL OF MACHINE LEARNING RESEARCH
(2023)
Article
Biochemical Research Methods
Tianshu Feng, Jaime Davila, Yuanhang Liu, Sangdi Lin, Shuai Huang, Chen Wang
Summary: Topological data analysis is a powerful method for dimensionality reduction, data relationship mining, and data structure representation, but current TDA modeling frameworks do not take into account domain context information and prior knowledge. The developed semi-supervised topological analysis (STA) framework, validated with simulation data, has been successfully applied to real gene expression and ovarian cancer data.
IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS
(2021)
Review
Computer Science, Artificial Intelligence
Giuseppe Manco, Ettore Ritacco, Antonino Rullo, Domenico Sacca, Edoardo Serra
Summary: The development of platforms and techniques for emerging Big Data and Machine Learning applications requires the availability of real-life datasets. This survey explores two possible approaches for synthesizing datasets that reflect patterns of real ones, and compares their pros and cons.
WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY
(2022)
Article
Computer Science, Artificial Intelligence
Cheong Hee Park
Summary: This paper compares and analyzes the performance of outlier detection methods in high-dimensional data, with a focus on text data, whose dimensionality is typically in the tens of thousands. Using simulated experimental setups, it compares performance in unsupervised versus semi-supervised modes and on uni-modal versus multi-modal data distributions. The paper also discusses the use of k-NN distance in high-dimensional data.
JOURNAL OF ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING RESEARCH
(2023)
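The k-NN distance mentioned in the summary above can be illustrated with a minimal sketch. This is not the paper's experimental code: the function name `knn_distance_scores`, the choice of Euclidean distance, and the default `k=5` are assumptions made for illustration only. Each point is scored by the distance to its k-th nearest neighbour, and larger scores suggest outliers.

```python
import numpy as np

def knn_distance_scores(data, k=5):
    """Score each row of `data` by the Euclidean distance to its k-th nearest neighbour.

    Hypothetical helper for illustration; requires k < number of rows.
    """
    data = np.asarray(data, dtype=float)
    # Squared pairwise distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq_norms = np.sum(data ** 2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (data @ data.T)
    np.maximum(d2, 0.0, out=d2)      # clip tiny negatives caused by rounding
    np.fill_diagonal(d2, np.inf)     # a point is not its own neighbour
    # The k-th smallest entry in each row is the k-th nearest neighbour.
    kth_sq = np.partition(d2, k - 1, axis=1)[:, k - 1]
    return np.sqrt(kth_sq)

# Usage: 50 points around the origin in 20 dimensions plus one distant point.
rng = np.random.default_rng(0)
inliers = rng.normal(size=(50, 20))
outlier = np.full((1, 20), 100.0)
scores = knn_distance_scores(np.vstack([inliers, outlier]), k=5)
most_anomalous = int(np.argmax(scores))
```

In very high dimensions pairwise distances tend to concentrate, so the gap between inlier and outlier scores can shrink; the sketch only shows the mechanics of the score.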
Article
Computer Science, Interdisciplinary Applications
Qiangqiang Zhai, Zhao Liu, Zhouzhou Song, Ping Zhu
Summary: In this work, an improved high-dimensional Kriging modeling method based on the maximal information coefficient (MIC) is developed to address problems with high-dimensional input variables. The method optimizes hyperparameters using MIC values as prior knowledge and introduces an auxiliary parameter to establish the relationship between MIC values and hyperparameters. Experimental results show that the proposed method achieves more accurate results than three other methods on problems with high-dimensional input variables, with acceptable modeling efficiency.
ENGINEERING COMPUTATIONS
(2023)
Review
Statistics & Probability
Jingyi Zhang, Ping Ma, Wenxuan Zhong, Cheng Meng
Summary: Optimal transport (OT) methods aim to find a transformation map that minimizes the transportation cost between two probability measures; the minimal cost is known as the Wasserstein distance. Recently, these methods have gained attention in statistics, machine learning, and computer science, particularly in deep generative neural networks. However, estimating high-dimensional Wasserstein distances is challenging because of the curse of dimensionality. Advanced projection-based techniques, such as the slicing approach, the iterative projection approach, and the projection robust OT approach, have been developed to tackle these high-dimensional OT problems. The article concludes by discussing open challenges in the field.
WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL STATISTICS
(2023)
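As an illustration of the slicing approach named in the summary above, the following is a minimal Monte Carlo sketch of a sliced Wasserstein distance between two equal-sized samples. It is not taken from the review: the function name `sliced_wasserstein`, the number of projections, and the equal-sample-size restriction are assumptions made to keep the example short. The idea is to project both samples onto random one-dimensional directions, where the Wasserstein distance reduces to comparing sorted values, and then average over directions.

```python
import numpy as np

def sliced_wasserstein(x, y, n_projections=200, p=2, seed=0):
    """Monte Carlo estimate of the sliced p-Wasserstein distance.

    Hypothetical helper for illustration; assumes x and y have the same shape (n, d).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes and dimensions")
    rng = np.random.default_rng(seed)
    # Random directions drawn uniformly on the unit sphere.
    directions = rng.normal(size=(n_projections, x.shape[1]))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # One-dimensional OT between equal-sized samples matches sorted projections.
    x_proj = np.sort(x @ directions.T, axis=0)
    y_proj = np.sort(y @ directions.T, axis=0)
    return float(np.mean(np.abs(x_proj - y_proj) ** p) ** (1.0 / p))

# Usage: compare a sample with itself and with a shifted copy.
rng = np.random.default_rng(1)
sample = rng.normal(size=(300, 10))
shift = np.full(10, 0.5)
same = sliced_wasserstein(sample, sample)
shifted = sliced_wasserstein(sample, sample + shift)
```

Each projection costs only a sort, which is why slicing scales to high dimensions far better than solving the full OT problem, at the price of measuring a different (weaker) distance.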
Article
Computer Science, Information Systems
Rahi Jain, Wei Xu
Summary: RHDSI is a novel feature selection method that integrates dimensionality reduction and machine learning and can handle high-dimensional data with interaction terms. It performs feature selection in three steps: coarse feature selection, unsupervised statistical learning-based feature refinement, and supervised statistical learning-based final feature selection with interactions. RHDSI demonstrates better or comparable performance to standard feature selection algorithms in simulated data and real studies.
INFORMATION SCIENCES
(2021)
Article
Statistics & Probability
Gabriel Chandler, Wolfgang Polonik
Summary: This paper proposes a method for extracting multiscale geometric features from a data cloud, and demonstrates its potential in various applications such as classification and anomaly detection. It also explores connections to other concepts such as random set theory, localized depth measures, and nonlinear dimension reduction.
ANNALS OF STATISTICS
(2021)
Article
Computer Science, Information Systems
Jingjin Chen, Shuping Chen, Xuan Ding
Summary: This paper proposes a deep model based on Brenier's theorem for manifold discovery in high-dimensional space. The results show that the method outperforms competing methods in precision and in resistance to data sparsity, and that non-linear architectures with deep paradigms are more effective for manifold discovery. The loss function derived from Brenier's theorem helps minimize the error between the reconstructed and original manifolds, and a 2-norm constraint on neurons both eases data sparsity and improves precision in manifold discovery.