Review

Intrinsic Dimension Estimation: Relevant Techniques and a Benchmark Framework

MATHEMATICAL PROBLEMS IN ENGINEERING (2015)

Journal

MATHEMATICAL PROBLEMS IN ENGINEERING

Volume 2015

Publisher

HINDAWI LTD

DOI: 10.1155/2015/759567

Categories

Engineering, Multidisciplinary; Mathematics, Interdisciplinary Applications

Abstract

When dealing with datasets comprising high-dimensional points, it is usually advantageous to discover some structure in the data. A fundamental piece of information needed for this purpose is the minimum number of parameters required to describe the data while minimizing information loss. This number, usually called the intrinsic dimension, can be interpreted as the dimension of the manifold from which the input data are assumed to be drawn. Owing to its usefulness in many theoretical and practical problems, the concept of intrinsic dimension has gained considerable attention in the scientific community over recent decades, motivating the large number of intrinsic dimensionality estimators proposed in the literature. However, the problem remains open, since most techniques cannot efficiently deal with datasets drawn from manifolds of high intrinsic dimension that are nonlinearly embedded in higher-dimensional spaces. This paper surveys some of the most interesting, widely used, and advanced state-of-the-art methodologies. Unfortunately, since no benchmark database exists in this research field, an objective comparison among different techniques is not possible. Consequently, we suggest a benchmark framework and apply it to comparatively evaluate relevant state-of-the-art estimators.
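
To make the notion of intrinsic dimension concrete, the following is a minimal sketch of a nearest-neighbour maximum-likelihood estimate in the style of Levina and Bickel. The choice of estimator, the function name `mle_intrinsic_dimension`, the synthetic dataset, and the value `k=10` are all assumptions made for illustration; nothing here is taken from the paper's benchmark framework. The example samples a two-dimensional surface nonlinearly embedded in five dimensions, so an estimate close to 2 would be expected.

```python
import numpy as np

def mle_intrinsic_dimension(X, k=10):
    """Nearest-neighbour maximum-likelihood estimate of intrinsic dimension.

    Illustrative sketch only: assumes no duplicate points (a zero
    neighbour distance would break the logarithm) and k >= 3.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances; O(n^2) memory, acceptable for a small demo.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Sort each row; column 0 is the zero distance of a point to itself.
    knn = np.sort(dist, axis=1)[:, 1:k + 1]
    # log(T_k / T_j) for j = 1..k-1, where T_j is the j-th neighbour distance.
    log_ratios = np.log(knn[:, -1:] / knn[:, :-1])
    # Normalising by k - 2 (rather than k - 1) gives the bias-corrected form.
    local = (k - 2) / log_ratios.sum(axis=1)
    # Average the local estimates into one global estimate.
    return float(local.mean())

# Hypothetical test data: a 2-D parameter patch (u, v) mapped into 5-D.
rng = np.random.default_rng(0)
u, v = rng.uniform(0, 1, size=(2, 1000))
X = np.column_stack([u, v, np.sin(u), np.cos(v), u * v])

estimate = mle_intrinsic_dimension(X, k=10)
print(f"ambient dimension: {X.shape[1]}, estimated intrinsic dimension: {estimate:.2f}")
```

The estimate depends on the neighbourhood size `k` and on the sample size, and estimators of this kind tend to degrade as the true intrinsic dimension grows, which is the difficulty the abstract points to.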

Recommended

Article Automation & Control Systems

Intrinsic Dimension Estimation Using Wasserstein Distance

Adam Block, Zeyu Jia, Yury Polyanskiy, Alexander Rakhlin

Summary: The article investigates the low-dimensional structure assumption of high-dimensional data. It introduces a new estimator for the intrinsic dimension and provides finite sample guarantees. The techniques are then applied to derive new sample complexity bounds for Generative Adversarial Networks (GANs) based solely on the intrinsic dimension of the data.

JOURNAL OF MACHINE LEARNING RESEARCH (2022)

Article Physics, Multidisciplinary

Scikit-Dimension: A Python Package for Intrinsic Dimension Estimation

Jonathan Bac, Evgeny M. Mirkes, Alexander N. Gorban, Ivan Tyukin, Andrei Zinovyev

Summary: This technical note introduces an open-source Python package called scikit-dimension for intrinsic dimension estimation. The package provides a uniform implementation of various known ID estimators based on scikit-learn API, allowing evaluation of global and local intrinsic dimension as well as generating synthetic datasets. It is developed with tools to assess code quality, coverage, unit testing and continuous integration.

ENTROPY (2021)

Article Computer Science, Artificial Intelligence

Intrinsic dimension estimation method based on correlation dimension and kNN method

Haiquan Qiu, Youlong Yang, Saeid Rezakhah

Summary: In practical problems, high-dimensional data often exhibits low-dimensional structure, which can be estimated using correlation dimension methods. However, these methods tend to underestimate the true intrinsic dimension of the dataset.

KNOWLEDGE-BASED SYSTEMS (2022)

Article Computer Science, Artificial Intelligence

Underestimation modification for intrinsic dimension estimation

Haiquan Qiu, Youlong Yang, Hua Pan

Summary: This paper introduces the concept of intrinsic dimension and its importance in data dimensionality reduction and preprocessing. Due to the unknown spatial distribution of data and the limited sample size, estimation methods which only use distance information tend to underestimate the intrinsic dimension of the dataset. To improve accuracy and reduce complexity, two estimation algorithms based on ID (kappa) are proposed, where kappa is the scaling ratio of the neighborhood radius. The comparative experiments on simulation and real datasets show that the underestimation modification algorithm has high estimation accuracy and robustness.

PATTERN RECOGNITION (2023)

Article Computer Science, Information Systems

Intrinsic dimension estimation based on local adjacency information

Haiquan Qiu, Youlong Yang, Benchong Li

Summary: The intrinsic dimension (ID) of a data set is crucial for data processing, and this study proposes a new ID estimation method, the ID(k) algorithm. By redefining the adjacency matrix using local adjacency information of sample points, the ID(k) method yields estimates closer to the true intrinsic dimension in the reported experiments.

INFORMATION SCIENCES (2021)

Article Computer Science, Interdisciplinary Applications

intRinsic: An R Package for Model-Based Estimation of the Intrinsic Dimension of a Dataset

Francesco Denti

Summary: This article introduces intRinsic, an R package that implements novel likelihood-based estimators for estimating the intrinsic dimension of a dataset. It includes two categories of models: homogeneous and heterogeneous estimators. The package provides high-level functions for easier accessibility and efficient low-level routines. The performance of the models is demonstrated on simulated datasets and applied to the Alon dataset. Estimating the intrinsic dimensions provides valuable insights into the dataset's topological structure.

JOURNAL OF STATISTICAL SOFTWARE (2023)

Review Computer Science, Artificial Intelligence

Visual Interestingness Prediction: A Benchmark Framework and Literature Review

Mihai Gabriel Constantin, Liviu-Daniel Stefan, Bogdan Ionescu, Ngoc Q. K. Duong, Claire-Helene Demarty, Mats Sjoberg

Summary: This paper presents a common evaluation framework for image and video visual interestingness prediction, with a robust dataset and in-depth analysis. It discusses the potential for surpassing current state-of-the-art systems and proposes solutions for achieving this.

INTERNATIONAL JOURNAL OF COMPUTER VISION (2021)

Article Astronomy & Astrophysics

GALLIFRAY-A Geometric Modeling and Parameter Estimation Framework for Black Hole Images Using Bayesian Techniques

Saurabh, Sourabh Nampalliwar

Summary: Recent observations of galactic centers with the Event Horizon Telescope have led to a new era of black hole tests of fundamental physics using VLBI. This article presents GALLIFRAY, an open-source, Python-based framework for parameter estimation using VLBI data. The framework demonstrates good convergence of the posterior distribution when fitting geometric and physical models to simulated datasets.

ASTROPHYSICAL JOURNAL (2023)

Article Multidisciplinary Sciences

DeepRank: a deep learning framework for data mining 3D protein-protein interfaces

Nicolas Renaud, Cunliang Geng, Sonja Georgievska, Francesco Ambrosetti, Lars Ridder, Dario F. Marzella, Manon F. Reau, Alexandre M. J. J. Bonvin, Li C. Xue

Summary: DeepRank is a deep learning framework for data mining large sets of 3D protein-protein interfaces (PPIs), enabling efficient training with millions of PPIs and supporting both classification and regression. By addressing challenges such as distinguishing biological from crystallographic PPIs and ranking docking models, DeepRank proves to be competitive with, or to outperform, state-of-the-art methods, demonstrating its versatility in structural biology research.

NATURE COMMUNICATIONS (2021)

Article Computer Science, Artificial Intelligence

Heterogeneous Network Representation Learning: A Unified Framework With Survey and Benchmark

Carl Yang, Yuxin Xiao, Yu Zhang, Yizhou Sun, Jiawei Han

Summary: This article aims to provide a unified framework for deeply summarizing and evaluating existing research on heterogeneous network embedding (HNE). We first provide a generic paradigm for categorization and analysis of various HNE algorithms, then create four benchmark datasets for fair evaluations, and finally, refactor and amend the implementations of 13 popular HNE algorithms for comprehensive comparisons.

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING (2022)

Article Environmental Sciences

A computational system for Bayesian benchmark dose estimation of genomic data in BBMD

Chao Ji, Andrew Weissmann, Kan Shao

Summary: This article introduces a web-based dose-response modeling and benchmark dose (BMD) estimation system, Bayesian BMD (BBMD). By quantitatively addressing uncertainty from various sources, the system can provide more accurate BMD estimates. The study conducted using BBMD demonstrates the significant role of dose-response modeling using genomic data in supporting chemical risk assessment.

ENVIRONMENT INTERNATIONAL (2022)

Article Computer Science, Artificial Intelligence

BenchSubset: A framework for selecting benchmark subsets based on consensus clustering

Hongping Zhan, Weiwei Lin, Feiqiao Mao, Minxian Xu, Guangxin Wu, Guokai Wu, Jianzhuo Li

Summary: This study proposes BenchSubset, a framework based on consensus clustering for selecting benchmark subsets, addressing the problem of redundancy in benchmark suites and the challenge of validating subset results for unlabeled suites. It also introduces a new evaluation method that considers the universality and diversity characteristics of benchmark suites.

INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS (2022)

Article Biochemistry & Molecular Biology

A Comprehensive Evaluation of the Performance of Prediction Algorithms on Clinically Relevant Missense Variants

Erda Qorri, Bertalan Takacs, Alexandra Graf, Marton Zsolt Enyedi, Lajos Pinter, Erno Kiss, Lajos Haracska

Summary: The rapid integration of genomic technologies in clinical diagnostics has led to the detection of numerous missense variants of unknown clinical significance. To aid in the interpretation of these variants, computational tools have been developed. Systematic benchmarking with high-quality independent datasets is crucial for selecting appropriate software. The performance of prediction algorithms varied widely across datasets.

INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES (2022)

Article Physics, Multidisciplinary

Stochastic modelling of fractal diffusion and dimension estimation

Frantisek Gaspar, Jaromir Kukal

Summary: The main aim of this paper is to revise classical methods for spectral and walk dimension estimates. The paper focuses on constructing unbiased estimation with minimal mean square error for walk and spectral dimensions. Simulation experiments are conducted on finite substrates, serving as models for continuum and fractal sets. The paper compares classical approaches with logarithmic transformation of asymptotic models and develops a weighted approach to improve dimension estimates' statistical properties. The paper also discusses different diffusion models and presents simulation results on two-dimensional substrates and Sierpinski gaskets and carpets. General suggestions based on the simulation experiment results are summarized.

PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS (2022)

Article Computer Science, Artificial Intelligence

Additive autoencoder for dimension estimation

Tommi Karkkainen, Jan Hanninen

Summary: This article proposes an additive autoencoder model for dimension reduction and analyzes its performance. Compared to traditional models, this model enhances the data reproduction capabilities in the original data dimension by adding an explicit linear operator to the overall transformation. Experimental results show that this model, with only a shallow network, can identify the intrinsic dimension of a dataset and achieve low autoencoding error. This is the first experimental result concluding no significant advantage of deep network structures compared to shallow ones in identifying the intrinsic dimension.

NEUROCOMPUTING (2023)

Article Environmental Sciences

Volume-of-Interest Aware Deep Neural Networks for Rapid Chest CT-Based COVID-19 Patient Risk Assessment

Anargyros Chatzitofis, Pierandrea Cancian, Vasileios Gkitsas, Alessandro Carlucci, Panagiotis Stalidis, Georgios Albanis, Antonis Karakottas, Theodoros Semertzidis, Petros Daras, Caterina Giannitto, Elena Casiraghi, Federica Mrakic Sposta, Giulia Vatteroni, Angela Ammirabile, Ludovica Lofino, Pasquala Ragucci, Maria Elena Laino, Antonio Voza, Antonio Desai, Maurizio Cecconi, Luca Balzarini, Arturo Chiti, Dimitrios Zarpalas, Victor Savevski

Summary: This paper introduces an artificially intelligent tool for CT-based risk assessment in COVID-19 patients to improve treatment and patient care. The authors utilize a VoI-based approach to address the high dimensionality of CT inputs, create a new labeled CT dataset, and demonstrate the effectiveness of their method in patient risk assessment. Achieving high accuracy and performance, this approach shows promise in enhancing healthcare practices during the COVID-19 pandemic.

INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH (2021)

Article Multidisciplinary Sciences

Automated image analysis to assess hygienic behaviour of honeybees

Gianluigi Paolillo, Alessandro Petrini, Elena Casiraghi, Maria Grazia De Iorio, Stefano Biffani, Giulio Pagnacco, Giulietta Minozzi, Giorgio Valentini

Summary: The focus of this study is to develop an automated image processing pipeline for images acquired in uncontrolled conditions. The pipeline is specifically tested on honeybee comb images for identifying and counting uncapped brood cells. The model shows good performance in handling various acquisition conditions and achieves high correlation between automated and manual counts.

PLOS ONE (2022)

Article Virology

NSAID use and clinical outcomes in COVID-19 patients: a 38-center retrospective cohort study

Justin T. Reese, Ben Coleman, Lauren Chan, Hannah Blau, Tiffany J. Callahan, Luca Cappelletti, Tommaso Fontana, Katie Rebecca Bradwell, Nomi L. Harris, Elena Casiraghi, Giorgio Valentini, Guy Karlebach, Rachel Deer, Julie A. McMurry, Melissa A. Haendel, Christopher G. Chute, Emily Pfaff, Richard Moffitt, Heidi Spratt, Jasvinder Singh, Christopher J. Mungall, Andrew E. Williams, Peter N. Robinson

Summary: This study found that the use of non-steroidal anti-inflammatory drugs (NSAIDs) is not associated with increased severity or other adverse outcomes in COVID-19 inpatients. The results confirm and extend the findings of previous observational studies and provide evidence against the initial concerns raised about the use of NSAIDs in COVID-19 patients.

VIROLOGY JOURNAL (2022)

Review Biochemical Research Methods

Heterogeneous data integration methods for patient similarity networks

Jessica Gliozzo, Marco Mesiti, Marco Notaro, Alessandro Petrini, Alex Patak, Antonio Puertas-Gallardo, Alberto Paccanaro, Giorgio Valentini, Elena Casiraghi

Summary: Patient similarity networks (PSNs) are widely used in clinical research to summarize patient relationships and predict outcomes, phenotypes, and disease risk. PSNs can be visualized and offer explainability of machine learning predictions. This article reviews methods for integrating multiple biomedical data views and patient similarity measures to construct PSNs, while also providing a resource to navigate machine learning literature on this topic.

BRIEFINGS IN BIOINFORMATICS (2022)

Article Biochemical Research Methods

Boosting tissue-specific prediction of active cis-regulatory regions through deep learning and Bayesian optimization techniques

Luca Cappelletti, Alessandro Petrini, Jessica Gliozzo, Elena Casiraghi, Max Schubach, Martin Kircher, Giorgio Valentini

Summary: CRRs play a central role in regulating transcription under physiological and pathological conditions. Accurately identifying CRRs and their tissue-specific activity status using machine learning methods is essential for studying the impact of genetic variants on human diseases.

BMC BIOINFORMATICS (2022)

Article Endocrinology & Metabolism

Metformin is associated with reduced COVID-19 severity in patients with prediabetes

Lauren E. Chan, Elena Casiraghi, Bryan Laraway, Ben Coleman, Hannah Blau, Adnin Zaman, Nomi L. Harris, Kenneth Wilkins, Blessy Antony, Michael Gargano, Giorgio Valentini, David Sahner, Melissa Haendel, Peter N. Robinson, Carolyn Bramante, Justin Reese

Summary: Studies have shown that the use of metformin is associated with reduced severity of COVID-19, particularly for patients with prediabetes or PCOS.

DIABETES RESEARCH AND CLINICAL PRACTICE (2022)

Article Computer Science, Interdisciplinary Applications

A method for comparing multiple imputation techniques: A case study on the US national COVID cohort collaborative

Elena Casiraghi, Rachel Wong, Margaret Hall, Ben Coleman, Marco Notaro, Michael D. Evans, Jena S. Tronieri, Hannah Blau, Bryan Laraway, Tiffany J. Callahan, Lauren E. Chan, Carolyn T. Bramante, John B. Buse, Richard A. Moffitt, Til Sturmer, Steven G. Johnson, Yu Raymond Shao, Justin Reese, Peter N. Robinson, Alberto Paccanaro, Giorgio Valentini, Jared D. Huling, Kenneth J. Wilkins

Summary: Healthcare datasets from Electronic Health Records are valuable for assessing associations between patients' predictors and outcomes. However, missing values are common in these datasets, and removing them may introduce bias. Multiple imputation algorithms have been proposed to recover missing information, but there is no consensus on which algorithm works best. Choosing algorithm parameters and data-related modeling choices is also challenging.

JOURNAL OF BIOMEDICAL INFORMATICS (2023)

Article Health Care Sciences & Services

Ontologizing health systems data at scale: making translational discovery a reality

Tiffany J. Callahan, Adrianne L. Stefanski, Jordan M. Wyrwa, Chenjie Zeng, Anna Ostropolets, Juan M. Banda, William A. Baumgartner, Richard D. Boyce, Elena Casiraghi, Ben D. Coleman, Janine H. Collins, Sara J. Deakyne Davies, James A. Feinstein, Asiyah Y. Lin, Blake Martin, Nicolas A. Matentzoglu, Daniella Meeker, Justin Reese, Jessica Sinclair, Sanya B. Taneja, Katy E. Trinkley, Nicole A. Vasilevsky, Andrew E. Williams, Xingmin A. Zhang, Joshua C. Denny, Patrick B. Ryan, George Hripcsak, Tellen D. Bennett, Melissa A. Haendel, Peter N. Robinson, Lawrence E. Hunter, Michael G. Kahn

Summary: Common data models address standardization challenges in EHR data, but fail to integrate all resources for deep phenotyping. OBO ontologies provide computable representations of biological knowledge, but mapping EHR data to OBO ontologies requires manual curation. OMOP2OBO is an algorithm that maps OMOP vocabularies to OBO ontologies, enabling deep phenotyping and identification of undiagnosed patients.

NPJ DIGITAL MEDICINE (2023)

Article Medicine, General & Internal

Predictive models of long COVID

Blessy Antony, Hannah Blau, Elena Casiraghi, Johanna J. Loomba, Tiffany J. Callahan, Bryan J. Laraway, Kenneth J. Wilkins, Corneliu C. Antonescu, Giorgio Valentini, Andrew E. Williams, Peter N. Robinson, Justin T. Reese, T. M. Murali

Summary: Predicting the incidence of long COVID using electronic health record data and machine learning methods is effective. Specific features like drugs and certain symptoms showed the highest influence on the prediction task.

EBIOMEDICINE (2023)

Review Medicine, General & Internal

Image-Guided Intraoperative Assessment of Surgical Margins in Oral Cavity Squamous Cell Cancer: A Diagnostic Test Accuracy Review

Giorgia Carnicelli, Luca Disconzi, Michele Cerasuolo, Elena Casiraghi, Guido Costa, Armando De Virgilio, Andrea Alessandro Esposito, Fabio Ferreli, Federica Fici, Antonio Lo Casto, Silvia Marra, Luca Malvezzi, Giuseppe Mercante, Giuseppe Spriano, Guido Torzilli, Marco Francone, Luca Balzarini, Caterina Giannitto

Summary: This study shows that intraoperative imaging techniques such as magnetic resonance imaging (MRI) and intraoral ultrasound (ioUS) have great potential in improving the assessment of surgical margins in oral cavity squamous cell cancer (OCSCC) surgery. IoUS has comparable accuracy to ex vivo MRI and is more affordable and reproducible.

DIAGNOSTICS (2023)

Article Computer Science, Interdisciplinary Applications

GRAPE for fast and scalable graph processing and random-walk-based embedding

Luca Cappelletti, Tommaso Fontana, Elena Casiraghi, Vida Ravanmehr, Tiffany J. Callahan, Carlos Cano, Marcin P. Joachimiak, Christopher J. Mungall, Peter N. Robinson, Justin Reese, Giorgio Valentini

Summary: GRAPE is a software resource for graph processing and embedding that can scale with big graphs, showing substantial improvements in space and time complexity compared to existing resources. It offers efficient graph-processing utilities, node embedding methods, and inference models, making it a valuable tool for graph representation learning. GRAPE is capable of handling millions of nodes and billions of edges, enabling large-graph analysis in various real-world applications.

NATURE COMPUTATIONAL SCIENCE (2023)

Article Biochemical Research Methods

An expectation-maximization framework for comprehensive prediction of isoform-specific functions

Guy Karlebach, Leigh Carmody, Jagadish Chandrabose Sundaramurthi, Elena Casiraghi, Peter Hansen, Justin Reese, Christopher J. Mungall, Giorgio Valentini, Peter N. Robinson

Summary: This article proposes a method called isoform interpretation to infer isoform-specific functions using expectation-maximization. It predicts specific functional annotations for 85,617 isoforms of 17,900 protein-coding genes and outperforms other methods in comparison to manually annotated results.

BIOINFORMATICS (2023)

Article Medicine, General & Internal

Generalisable long COVID subtypes: Findings from the NIH N3C and RECOVER programmes

Justin T. Reese, Hannah Blau, Elena Casiraghi, Timothy Bergquist, Johanna J. Loomba, Tiffany J. Callahan, Bryan Laraway, Corneliu Antonescu, Ben Coleman, Michael Gargano, Kenneth J. Wilkins, Luca Cappelletti, Tommaso Fontana, Nariman Ammar, Blessy Antony, T. M. Murali, J. Harry Caufield, Guy Karlebach, Julie A. McMurry, Andrew Williams, Richard Moffitt, Jineta Banerjee, Anthony E. Solomonides, Hannah Davis, Kristin Kostka, Giorgio Valentini, David Sahner, Christopher G. Chute, Charisse Madlock-Brown, Melissa A. Haendel, Peter N. Robinson

Summary: By computationally modelling PASC phenotype data and assessing semantic similarity, we identified six distinct clusters of PASC patients with different clinical features and severity, including diverse manifestations. This semantic phenotypic clustering approach provides a foundation for stratifying and studying PASC patients for natural history or therapy studies.

EBIOMEDICINE (2023)

Proceedings Paper Computer Science, Artificial Intelligence

ParSMURF-NG: A Machine Learning High Performance Computing System for the Analysis of Imbalanced Big Omics Data

Alessandro Petrini, Marco Notaro, Jessica Gliozzo, Tiziana Castrignano, Peter N. Robinson, Elena Casiraghi, Giorgio Valentini

Summary: In the context of Genomic and Precision Medicine, this study presents ParSMURF-NG, a High Performance Computing-oriented Machine Learning approach that can effectively handle big omics data. It has shown its usefulness in the detection of pathogenic single nucleotide variants in the non-coding regions of the human genome, providing a powerful model for Genomic Medicine.

ARTIFICIAL INTELLIGENCE APPLICATIONS AND INNOVATIONS. AIAI 2022 IFIP WG 12.5 INTERNATIONAL WORKSHOPS (2022)

Article Radiology, Nuclear Medicine & Medical Imaging

An approach to evaluate the quality of radiological reports in Head and Neck cancer loco-regional staging: experience of two Academic Hospitals

Caterina Giannitto, Andrea Alessandro Esposito, Giuseppe Spriano, Armando De Virgilio, Emanuele Avola, Giada Beltramini, Gianpaolo Carrafiello, Elena Casiraghi, Alessandra Coppola, Valentina Cristofaro, Davide Farina, Francesca Gaino, Giulia Lastella, Ludovica Lofino, Roberto Maroldi, Francesca Piccoli, Lorenzo Pignataro, Lorenzo Preda, Elena Russo, Lorenzo Solimeno, Giulia Vatteroni, Antonello Vidiri, Luca Balzarini, Giuseppe Mercante

Summary: This study evaluated the quality of loco-regional staging CT and MRI reports in head and neck cancer. The results showed that the quality of tumor description was low, while the quality of lymph node description was high.

RADIOLOGIA MEDICA (2022)
