Article
Automation & Control Systems
Adam Block, Zeyu Jia, Yury Polyanskiy, Alexander Rakhlin
Summary: The article investigates the low-dimensional structure assumption of high-dimensional data. It introduces a new estimator for the intrinsic dimension and provides finite sample guarantees. The techniques are then applied to derive new sample complexity bounds for Generative Adversarial Networks (GANs) based solely on the intrinsic dimension of the data.
JOURNAL OF MACHINE LEARNING RESEARCH
(2022)
Article
Physics, Multidisciplinary
Jonathan Bac, Evgeny M. Mirkes, Alexander N. Gorban, Ivan Tyukin, Andrei Zinovyev
Summary: This technical note introduces an open-source Python package called scikit-dimension for intrinsic dimension estimation. The package provides a uniform implementation of various known ID estimators based on scikit-learn API, allowing evaluation of global and local intrinsic dimension as well as generating synthetic datasets. It is developed with tools to assess code quality, coverage, unit testing and continuous integration.
Article
Computer Science, Artificial Intelligence
Haiquan Qiu, Youlong Yang, Saeid Rezakhah
Summary: In practical problems, high-dimensional data often exhibits low-dimensional structure, which can be estimated using correlation dimension methods. However, these methods tend to underestimate the true intrinsic dimension of the dataset.
KNOWLEDGE-BASED SYSTEMS
(2022)
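To make the correlation-dimension idea in the summary above concrete, here is a minimal sketch, not the authors' algorithm. It assumes the classical two-radius slope of the correlation integral C(r), the fraction of point pairs closer than r. The function names, the radii and the synthetic data set (a unit square linearly embedded in five dimensions, so the true intrinsic dimension is 2) are illustrative choices only.

```python
import math
import random


def correlation_integral(points, radius):
    """Fraction of distinct point pairs whose Euclidean distance is below radius."""
    n = len(points)
    close = 0
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) < radius:
                close += 1
    return close / (n * (n - 1) / 2)


def correlation_dimension(points, r_small, r_large):
    """Slope of log C(r) against log r, taken between two radii."""
    c_small = correlation_integral(points, r_small)
    c_large = correlation_integral(points, r_large)
    return math.log(c_large / c_small) / math.log(r_large / r_small)


# Hypothetical test data: a 2-D unit square embedded linearly in 5-D.
rng = random.Random(0)
plane = [(rng.random(), rng.random()) for _ in range(400)]
points = [(u, v, u + v, u - v, 0.0) for u, v in plane]

estimate = correlation_dimension(points, r_small=0.1, r_large=0.2)
print(round(estimate, 2))
```

On bounded samples like this one, neighbourhoods near the edge of the square are truncated, which tends to pull the slope below the true value of 2. This is one simple source of the underestimation the summary mentions.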
Article
Computer Science, Artificial Intelligence
Haiquan Qiu, Youlong Yang, Hua Pan
Summary: This paper introduces the concept of intrinsic dimension and its importance in data dimensionality reduction and preprocessing. Because the spatial distribution of the data is unknown and the sample size is limited, estimation methods that use only distance information tend to underestimate the intrinsic dimension of the dataset. To improve accuracy and reduce complexity, two estimation algorithms based on ID (kappa) are proposed, where kappa is the scaling ratio of the neighborhood radius. Comparative experiments on simulated and real datasets show that the underestimation modification algorithm has high estimation accuracy and robustness.
PATTERN RECOGNITION
(2023)
Article
Computer Science, Information Systems
Haiquan Qiu, Youlong Yang, Benchong Li
Summary: The intrinsic dimension (ID) of a data set is crucial for data processing, and this study proposes a new ID estimation method, the ID(k) algorithm. By redefining the adjacency matrix using local adjacency information of sample points, the ID(k) method yields estimates closer to the true intrinsic dimension in the reported experiments.
INFORMATION SCIENCES
(2021)
Article
Computer Science, Interdisciplinary Applications
Francesco Denti
Summary: This article introduces intRinsic, an R package that implements novel likelihood-based estimators for estimating the intrinsic dimension of a dataset. It includes two categories of models: homogeneous and heterogeneous estimators. The package provides high-level functions for easier accessibility and efficient low-level routines. The performance of the models is demonstrated on simulated datasets and applied to the Alon dataset. Estimating the intrinsic dimensions provides valuable insights into the dataset's topological structure.
JOURNAL OF STATISTICAL SOFTWARE
(2023)
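As an illustration of what a likelihood-based intrinsic dimension estimator can look like, the sketch below implements the well-known TWO-NN maximum-likelihood formula in Python. The summary above does not say which estimators the R package provides, so this is an assumed example and not the package's own routine. It relies on the modelling assumption that the ratio of each point's second to first nearest-neighbour distance follows a Pareto law whose shape parameter is the intrinsic dimension. The function name and test data are hypothetical.

```python
import math
import random


def two_nn_dimension(points):
    """Maximum-likelihood dimension estimate from nearest-neighbour distance ratios."""
    log_ratio_sum = 0.0
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        r1, r2 = dists[0], dists[1]
        log_ratio_sum += math.log(r2 / r1)
    # Under the Pareto assumption, log(r2 / r1) is exponential with rate d,
    # so the maximum-likelihood estimate of d is N / sum(log ratios).
    return len(points) / log_ratio_sum


# Hypothetical test data: a 2-D unit square embedded linearly in 5-D.
rng = random.Random(1)
plane = [(rng.random(), rng.random()) for _ in range(400)]
points = [(u, v, u + v, u - v, 0.0) for u, v in plane]

two_nn_estimate = two_nn_dimension(points)
print(round(two_nn_estimate, 2))
```

Because the estimate uses only the two nearest neighbours of each point, it is driven by very local geometry, which keeps the computation simple but makes it sensitive to noise at small scales.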
Review
Computer Science, Artificial Intelligence
Mihai Gabriel Constantin, Liviu-Daniel Stefan, Bogdan Ionescu, Ngoc Q. K. Duong, Claire-Helene Demarty, Mats Sjoberg
Summary: This paper presents a common evaluation framework for image and video visual interestingness prediction, with a robust dataset and in-depth analysis. It discusses the potential for surpassing current state-of-the-art systems and proposes solutions for achieving this.
INTERNATIONAL JOURNAL OF COMPUTER VISION
(2021)
Article
Astronomy & Astrophysics
Saurabh, Sourabh Nampalliwar
Summary: Recent observations of galactic centers with the Event Horizon Telescope have led to a new era of black hole tests of fundamental physics using VLBI. This article presents GALLIFRAY, an open-source, Python-based framework for parameter estimation using VLBI data. The framework demonstrates good convergence of the posterior distribution when fitting geometric and physical models to simulated datasets.
ASTROPHYSICAL JOURNAL
(2023)
Article
Multidisciplinary Sciences
Nicolas Renaud, Cunliang Geng, Sonja Georgievska, Francesco Ambrosetti, Lars Ridder, Dario F. Marzella, Manon F. Reau, Alexandre M. J. J. Bonvin, Li C. Xue
Summary: DeepRank is a deep learning framework for data mining large sets of 3D protein-protein interfaces (PPIs), enabling efficient training with millions of PPIs and supporting both classification and regression. On challenges such as distinguishing biological from crystallographic PPIs and ranking docking models, DeepRank is competitive with, or outperforms, state-of-the-art methods, demonstrating its versatility in structural biology research.
NATURE COMMUNICATIONS
(2021)
Article
Computer Science, Artificial Intelligence
Carl Yang, Yuxin Xiao, Yu Zhang, Yizhou Sun, Jiawei Han
Summary: This article aims to provide a unified framework for summarizing and evaluating existing research on heterogeneous network embedding (HNE). The authors first provide a generic paradigm for categorizing and analyzing HNE algorithms, then create four benchmark datasets for fair evaluation, and finally refactor and amend the implementations of 13 popular HNE algorithms for comprehensive comparison.
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
(2022)
Article
Environmental Sciences
Chao Ji, Andrew Weissmann, Kan Shao
Summary: This article introduces a web-based dose-response modeling and benchmark dose (BMD) estimation system, Bayesian BMD (BBMD). By quantitatively addressing uncertainty from various sources, the system can provide more accurate BMD estimates. The study conducted using BBMD demonstrates the significant role of dose-response modeling using genomic data in supporting chemical risk assessment.
ENVIRONMENT INTERNATIONAL
(2022)
Article
Computer Science, Artificial Intelligence
Hongping Zhan, Weiwei Lin, Feiqiao Mao, Minxian Xu, Guangxin Wu, Guokai Wu, Jianzhuo Li
Summary: This study proposes BenchmarkSubset, a framework based on consensus clustering for selecting benchmark subsets. It addresses redundancy in benchmark suites and the difficulty of validating subset results for unlabeled suites. It also introduces a new evaluation method that accounts for the universality and diversity of benchmark suites.
INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS
(2022)
Article
Biochemistry & Molecular Biology
Erda Qorri, Bertalan Takacs, Alexandra Graf, Marton Zsolt Enyedi, Lajos Pinter, Erno Kiss, Lajos Haracska
Summary: The rapid integration of genomic technologies in clinical diagnostics has led to the detection of numerous missense variants of unknown clinical significance. To aid in the interpretation of these variants, computational tools have been developed. Systematic benchmarking with high-quality independent datasets is crucial for selecting appropriate software. The performance of prediction algorithms varied widely across datasets.
INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES
(2022)
Article
Physics, Multidisciplinary
Frantisek Gaspar, Jaromir Kukal
Summary: The main aim of this paper is to revise classical methods for estimating spectral and walk dimensions. The paper focuses on constructing unbiased estimators with minimal mean square error for both dimensions. Simulation experiments are conducted on finite substrates that serve as models for continuum and fractal sets. The paper compares classical approaches with a logarithmic transformation of asymptotic models and develops a weighted approach to improve the statistical properties of the dimension estimates. It also discusses different diffusion models and presents simulation results on two-dimensional substrates, Sierpinski gaskets and Sierpinski carpets. General suggestions based on the simulation results are summarized.
PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS
(2022)
Article
Computer Science, Artificial Intelligence
Tommi Karkkainen, Jan Hanninen
Summary: This article proposes an additive autoencoder model for dimension reduction and analyzes its performance. Compared to traditional models, this model improves data reproduction in the original data dimension by adding an explicit linear operator to the overall transformation. Experimental results show that this model, with only a shallow network, can identify the intrinsic dimension of a dataset and achieve low autoencoding error. This is reported as the first experimental result to find no significant advantage for deep network structures over shallow ones in identifying the intrinsic dimension.
Article
Environmental Sciences
Anargyros Chatzitofis, Pierandrea Cancian, Vasileios Gkitsas, Alessandro Carlucci, Panagiotis Stalidis, Georgios Albanis, Antonis Karakottas, Theodoros Semertzidis, Petros Daras, Caterina Giannitto, Elena Casiraghi, Federica Mrakic Sposta, Giulia Vatteroni, Angela Ammirabile, Ludovica Lofino, Pasquala Ragucci, Maria Elena Laino, Antonio Voza, Antonio Desai, Maurizio Cecconi, Luca Balzarini, Arturo Chiti, Dimitrios Zarpalas, Victor Savevski
Summary: This paper introduces an artificial intelligence tool for CT-based risk assessment in COVID-19 patients to improve treatment and patient care. The authors use a VoI-based approach to address the high dimensionality of CT inputs, create a new labeled CT dataset, and demonstrate the effectiveness of their method in patient risk assessment. With high reported accuracy, the approach shows promise for enhancing healthcare practices during the COVID-19 pandemic.
INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH
(2021)
Article
Multidisciplinary Sciences
Gianluigi Paolillo, Alessandro Petrini, Elena Casiraghi, Maria Grazia De Iorio, Stefano Biffani, Giulio Pagnacco, Giulietta Minozzi, Giorgio Valentini
Summary: The focus of this study is to develop an automated image processing pipeline for images acquired in uncontrolled conditions. The pipeline is specifically tested on honeybee comb images for identifying and counting uncapped brood cells. The model shows good performance in handling various acquisition conditions and achieves high correlation between automated and manual counts.
Article
Virology
Justin T. Reese, Ben Coleman, Lauren Chan, Hannah Blau, Tiffany J. Callahan, Luca Cappelletti, Tommaso Fontana, Katie Rebecca Bradwell, Nomi L. Harris, Elena Casiraghi, Giorgio Valentini, Guy Karlebach, Rachel Deer, Julie A. McMurry, Melissa A. Haendel, Christopher G. Chute, Emily Pfaff, Richard Moffitt, Heidi Spratt, Jasvinder Singh, Christopher J. Mungall, Andrew E. Williams, Peter N. Robinson
Summary: This study found that the use of non-steroidal anti-inflammatory drugs (NSAIDs) is not associated with increased severity or other adverse outcomes in COVID-19 inpatients. The results confirm and extend the findings of previous observational studies and provide evidence against the initial concerns raised about the use of NSAIDs in COVID-19 patients.
Review
Biochemical Research Methods
Jessica Gliozzo, Marco Mesiti, Marco Notaro, Alessandro Petrini, Alex Patak, Antonio Puertas-Gallardo, Alberto Paccanaro, Giorgio Valentini, Elena Casiraghi
Summary: Patient similarity networks (PSNs) are widely used in clinical research to summarize patient relationships and predict outcomes, phenotypes, and disease risk. PSNs can be visualized and offer explainability of machine learning predictions. This article reviews methods for integrating multiple biomedical data views and patient similarity measures to construct PSNs, while also providing a resource to navigate machine learning literature on this topic.
BRIEFINGS IN BIOINFORMATICS
(2022)
Article
Biochemical Research Methods
Luca Cappelletti, Alessandro Petrini, Jessica Gliozzo, Elena Casiraghi, Max Schubach, Martin Kircher, Giorgio Valentini
Summary: Cis-regulatory regions (CRRs) play a central role in regulating transcription under physiological and pathological conditions. Accurately identifying CRRs and their tissue-specific activity status using machine learning methods is essential for studying the impact of genetic variants on human diseases.
BMC BIOINFORMATICS
(2022)
Article
Endocrinology & Metabolism
Lauren E. Chan, Elena Casiraghi, Bryan Laraway, Ben Coleman, Hannah Blau, Adnin Zaman, Nomi L. Harris, Kenneth Wilkins, Blessy Antony, Michael Gargano, Giorgio Valentini, David Sahner, Melissa Haendel, Peter N. Robinson, Carolyn Bramante, Justin Reese
Summary: Studies have shown that the use of metformin is associated with reduced severity of COVID-19, particularly for patients with prediabetes or PCOS.
DIABETES RESEARCH AND CLINICAL PRACTICE
(2022)
Article
Computer Science, Interdisciplinary Applications
Elena Casiraghi, Rachel Wong, Margaret Hall, Ben Coleman, Marco Notaro, Michael D. Evans, Jena S. Tronieri, Hannah Blau, Bryan Laraway, Tiffany J. Callahan, Lauren E. Chan, Carolyn T. Bramante, John B. Buse, Richard A. Moffitt, Til Sturmer, Steven G. Johnson, Yu Raymond Shao, Justin Reese, Peter N. Robinson, Alberto Paccanaro, Giorgio Valentini, Jared D. Huling, Kenneth J. Wilkins
Summary: Healthcare datasets from Electronic Health Records are valuable for assessing associations between patients' predictors and outcomes. However, missing values are common in these datasets, and removing them may introduce bias. Multiple imputation algorithms have been proposed to recover missing information, but there is no consensus on which algorithm works best. Choosing algorithm parameters and data-related modeling choices is also challenging.
JOURNAL OF BIOMEDICAL INFORMATICS
(2023)
Article
Health Care Sciences & Services
Tiffany J. Callahan, Adrianne L. Stefanski, Jordan M. Wyrwa, Chenjie Zeng, Anna Ostropolets, Juan M. Banda, William A. Baumgartner, Richard D. Boyce, Elena Casiraghi, Ben D. Coleman, Janine H. Collins, Sara J. Deakyne Davies, James A. Feinstein, Asiyah Y. Lin, Blake Martin, Nicolas A. Matentzoglu, Daniella Meeker, Justin Reese, Jessica Sinclair, Sanya B. Taneja, Katy E. Trinkley, Nicole A. Vasilevsky, Andrew E. Williams, Xingmin A. Zhang, Joshua C. Denny, Patrick B. Ryan, George Hripcsak, Tellen D. Bennett, Melissa A. Haendel, Peter N. Robinson, Lawrence E. Hunter, Michael G. Kahn
Summary: Common data models address standardization challenges in EHR data, but fail to integrate all resources for deep phenotyping. OBO ontologies provide computable representations of biological knowledge, but mapping EHR data to OBO ontologies requires manual curation. OMOP2OBO is an algorithm that maps OMOP vocabularies to OBO ontologies, enabling deep phenotyping and identification of undiagnosed patients.
NPJ DIGITAL MEDICINE
(2023)
Article
Medicine, General & Internal
Blessy Antony, Hannah Blau, Elena Casiraghi, Johanna J. Loomba, Tiffany J. Callahan, Bryan J. Laraway, Kenneth J. Wilkins, Corneliu C. Antonescu, Giorgio Valentini, Andrew E. Williams, Peter N. Robinson, Justin T. Reese, T. M. Murali
Summary: Machine learning methods applied to electronic health record data can effectively predict the incidence of long COVID. Features such as drugs and certain symptoms had the highest influence on the prediction task.
Review
Medicine, General & Internal
Giorgia Carnicelli, Luca Disconzi, Michele Cerasuolo, Elena Casiraghi, Guido Costa, Armando De Virgilio, Andrea Alessandro Esposito, Fabio Ferreli, Federica Fici, Antonio Lo Casto, Silvia Marra, Luca Malvezzi, Giuseppe Mercante, Giuseppe Spriano, Guido Torzilli, Marco Francone, Luca Balzarini, Caterina Giannitto
Summary: This study shows that intraoperative imaging techniques such as magnetic resonance imaging (MRI) and intraoral ultrasound (ioUS) have great potential in improving the assessment of surgical margins in oral cavity squamous cell cancer (OCSCC) surgery. IoUS has comparable accuracy to ex vivo MRI and is more affordable and reproducible.
Article
Computer Science, Interdisciplinary Applications
Luca Cappelletti, Tommaso Fontana, Elena Casiraghi, Vida Ravanmehr, Tiffany J. Callahan, Carlos Cano, Marcin P. Joachimiak, Christopher J. Mungall, Peter N. Robinson, Justin Reese, Giorgio Valentini
Summary: GRAPE is a software resource for graph processing and embedding that can scale with big graphs, showing substantial improvements in space and time complexity compared to existing resources. It offers efficient graph-processing utilities, node embedding methods, and inference models, making it a valuable tool for graph representation learning. GRAPE is capable of handling millions of nodes and billions of edges, enabling large-graph analysis in various real-world applications.
NATURE COMPUTATIONAL SCIENCE
(2023)
Article
Biochemical Research Methods
Guy Karlebach, Leigh Carmody, Jagadish Chandrabose Sundaramurthi, Elena Casiraghi, Peter Hansen, Justin Reese, Christopher J. Mungall, Giorgio Valentini, Peter N. Robinson
Summary: This article proposes a method called isoform interpretation to infer isoform-specific functions using expectation-maximization. It predicts specific functional annotations for 85,617 isoforms of 17,900 protein-coding genes and outperforms other methods in comparison to manually annotated results.
Article
Medicine, General & Internal
Justin T. Reese, Hannah Blau, Elena Casiraghi, Timothy Bergquist, Johanna J. Loomba, Tiffany J. Callahan, Bryan Laraway, Corneliu Antonescu, Ben Coleman, Michael Gargano, Kenneth J. Wilkins, Luca Cappelletti, Tommaso Fontana, Nariman Ammar, Blessy Antony, T. M. Murali, J. Harry Caufield, Guy Karlebach, Julie A. McMurry, Andrew Williams, Richard Moffitt, Jineta Banerjee, Anthony E. Solomonides, Hannah Davis, Kristin Kostka, Giorgio Valentini, David Sahner, Christopher G. Chute, Charisse Madlock-Brown, Melissa A. Haendel, Peter N. Robinson
Summary: By computationally modelling phenotype data for post-acute sequelae of SARS-CoV-2 infection (PASC) and assessing semantic similarity, the authors identified six distinct clusters of PASC patients that differ in clinical features, manifestations and severity. This semantic phenotypic clustering approach provides a foundation for stratifying and studying PASC patients in natural history or therapy studies.
Proceedings Paper
Computer Science, Artificial Intelligence
Alessandro Petrini, Marco Notaro, Jessica Gliozzo, Tiziana Castrignano, Peter N. Robinson, Elena Casiraghi, Giorgio Valentini
Summary: In the context of Genomic and Precision Medicine, this study presents ParSMURF-NG, a High Performance Computing-oriented Machine Learning approach that can effectively handle big omics data. It has shown its usefulness in the detection of pathogenic single nucleotide variants in the non-coding regions of the human genome, providing a powerful model for Genomic Medicine.
ARTIFICIAL INTELLIGENCE APPLICATIONS AND INNOVATIONS. AIAI 2022 IFIP WG 12.5 INTERNATIONAL WORKSHOPS
(2022)
Article
Radiology, Nuclear Medicine & Medical Imaging
Caterina Giannitto, Andrea Alessandro Esposito, Giuseppe Spriano, Armando De Virgilio, Emanuele Avola, Giada Beltramini, Gianpaolo Carrafiello, Elena Casiraghi, Alessandra Coppola, Valentina Cristofaro, Davide Farina, Francesca Gaino, Giulia Lastella, Ludovica Lofino, Roberto Maroldi, Francesca Piccoli, Lorenzo Pignataro, Lorenzo Preda, Elena Russo, Lorenzo Solimeno, Giulia Vatteroni, Antonello Vidiri, Luca Balzarini, Giuseppe Mercante
Summary: This study evaluated the quality of loco-regional staging CT and MRI reports in head and neck cancer. The results showed that the quality of tumor description was low, while the quality of lymph node description was high.