Article

Monte Carlo Null Models for Genomic Data

Journal

STATISTICAL SCIENCE
Volume 30, Issue 1, Pages 59-71

Publisher

INST MATHEMATICAL STATISTICS
DOI: 10.1214/14-STS484

Keywords

Monte Carlo methods; hypothesis testing; genomics

As increasingly complex hypothesis-testing scenarios are considered in many scientific fields, analytic derivation of null distributions is often out of reach. To the rescue comes Monte Carlo testing, which may appear deceptively simple: as long as you can sample test statistics under the null hypothesis, the p-value is just the proportion of sampled test statistics that exceed the observed test statistic. Sampling test statistics is often simple once you have a Monte Carlo null model for your data, and defining some form of randomization procedure is also, in many cases, relatively straightforward. However, there may be several possible choices of a randomization null model for the data and no clear-cut criteria for choosing among them. Obviously, different null models may lead to very different p-values, and a very low p-value may thus occur due to the inadequacy of the chosen null model. It is preferable to use assumptions about the underlying random data generation process to guide selection of a null model. In many cases, we may order the null models by increasing preservation of the data characteristics, and we argue in this paper that this ordering in most cases gives increasing p-values, that is, lower significance. We denote this as the null complexity principle. The principle gives a better understanding of the different null models and may guide in the choice between the different models.
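The effect the abstract describes can be illustrated with a small, entirely hypothetical example. The sketch below assumes a toy genome of 1000 positions, two made-up point tracks that both cluster in the first 100 positions, and the number of shared positions as the test statistic. It compares a null model that preserves only the number of points with one that also preserves the clustering. The p-value uses the common add-one convention with a "greater than or equal" comparison, which is an assumption here and not necessarily the paper's definition.

```python
import random

GENOME_LENGTH = 1000      # toy genome: positions 0..999 (assumed for illustration)
REGION = range(0, 100)    # hypothetical region where both tracks cluster

# Made-up data: track_b holds 50 fixed positions, track_a holds 20 observed positions.
track_b = set(range(0, 100, 2))
track_a = list(range(0, 24, 2)) + list(range(1, 16, 2))


def overlap(positions, reference):
    """Test statistic: number of positions that also occur in the reference track."""
    return sum(1 for p in positions if p in reference)


def uniform_null(rng):
    """Null model 1: scatter the points over the whole genome (preserves only the count)."""
    return rng.sample(range(GENOME_LENGTH), len(track_a))


def regional_null(rng):
    """Null model 2: scatter the points inside the region (also preserves the clustering)."""
    return rng.sample(REGION, len(track_a))


def monte_carlo_p_value(null_model, n_samples=999, seed=0):
    """Estimate a one-sided p-value by sampling the test statistic under a null model."""
    rng = random.Random(seed)
    observed = overlap(track_a, track_b)
    exceed = sum(
        1 for _ in range(n_samples)
        if overlap(null_model(rng), track_b) >= observed
    )
    # Add-one convention: the observed statistic counts as one sample, so p is never 0.
    return (exceed + 1) / (n_samples + 1)


p_uniform = monte_carlo_p_value(uniform_null)
p_regional = monte_carlo_p_value(regional_null)
print(f"uniform null:  p = {p_uniform:.3f}")
print(f"regional null: p = {p_regional:.3f}")
```

In this toy setting the null model that preserves more of the data's structure yields the larger p-value, which is the direction the null complexity principle predicts. The example says nothing about which null model is appropriate for real data; that choice should rest on assumptions about the data-generating process, as the abstract argues.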

Recommended

Article Immunology

B cell tolerance and antibody production to the celiac disease autoantigen transglutaminase 2

M. Fleur du Pre, Jana Blazevski, Alisa E. Dewan, Jorunn Stamnaes, Chakravarthi Kanduri, Geir Kjetil Sandve, Marie K. Johannesen, Christian B. Lindstad, Kathrin Hnida, Lars Fugger, Gerry Melino, Shuo-Wang Qiao, Ludvig M. Sollid

JOURNAL OF EXPERIMENTAL MEDICINE (2020)

Correction Genetics & Heredity

Human somatic cell mutagenesis creates genetically tractable sarcomas (vol 46, pg 964, 2014)

Sam D. Molyneux, Paul D. Waterhouse, Dawne Shelton, Yang W. Shao, Christopher M. Watling, Qing-Lian Tang, Isaac S. Harris, Brendan C. Dickson, Pirashaanthy Tharmapalan, Geir K. Sandve, Xiaoyang Zhang, Swneke D. Bailey, Hal Berman, Jay S. Wunder, Zsuzsanna Izsvak, Mathieu Lupien, Tak W. Mak, Rama Khokha

NATURE GENETICS (2020)

Article Cardiac & Cardiovascular Systems

Genetic variability in the absorption of dietary sterols affects the risk of coronary artery disease

Anna Helgadottir, Gudmar Thorleifsson, Kristjan F. Alexandersson, Vinicius Tragante, Margret Thorsteinsdottir, Finnur F. Eiriksson, Solveig Gretarsdottir, Eythor Bjornsson, Olafur Magnusson, Gardar Sveinbjornsson, Ingileif Jonsdottir, Valgerdur Steinthorsdottir, Egil Ferkingstad, Brynjar O. Jensson, Hreinn Stefansson, Isleifur Olafsson, Alex H. Christensen, Christian Torp-Pedersen, Lars Kober, Ole B. Pedersen, Christian Erikstrup, Erik Sorensen, Soren Brunak, Karina Banasik, Thomas F. Hansen, Mette Nyegaard, Gudmundur Eyjolfsson, Olof Sigurdardottir, Bjorn L. Thorarinsson, Stefan E. Matthiasson, Thora Steingrimsdottir, Einar S. Bjornsson, Ragnar Danielsen, Folkert W. Asselbergs, David O. Arnar, Henrik Ullum, Henning Bundgaard, Patrick Sulem, Unnur Thorsteinsdottir, Gudmundur Thorgeirsson, Hilma Holm, Daniel F. Gudbjartsson, Kari Stefansson

EUROPEAN HEART JOURNAL (2020)

Editorial Material Biochemical Research Methods

Ten simple rules for quick and dirty scientific programming

Gabriel Balaban, Ivar Grytten, Knut Dagestad Rand, Lonneke Scheffer, Geir Kjetil Sandve

PLOS COMPUTATIONAL BIOLOGY (2021)

Article Biology

A genome-wide meta-analysis yields 46 new loci associating with biomarkers of iron homeostasis

Steven Bell, Andreas S. Rigas, Magnus K. Magnusson, Egil Ferkingstad, Elias Allara, Gyda Bjornsdottir, Anna Ramond, Erik Sorensen, Gisli H. Halldorsson, Dirk S. Paul, Hannes P. Eggertsson, Kristoffer S. Burgdorf, Joanna M. M. Howson, Lise W. Thorner, Snaedis Kristmundsdottir, William J. Astle, Christian Erikstrup, Jon K. Sigurdsson, Dragana Vuckovic, Khoa M. Dinh, Vinicius Tragante, Praveen Surendran, Ole B. Pedersen, Brynjar Vidarsson, Tao Jiang, Helene M. Paarup, Pall T. Onundarson, Parsa Akbari, Kaspar R. Nielsen, Sigrun H. Lund, Kristinn Juliusson, Magnus Magnusson, Michael L. Frigge, Asmundur Oddsson, Isleifur Olafsson, Stephen Kaptoge, Henrik Hjalgrim, Gudmundur Runarsson, Angela M. Wood, Ingileif Jonsdottir, Thomas F. Hansen, Olof Sigurdardottir, Hreinn Stefansson, David Rye, James E. Peters, David Westergaard, Hilma Holm, Nicole Soranzo, Karina Banasik, Gudmar Thorleifsson, Willem H. Ouwehand, Unnur Thorsteinsdottir, David J. Roberts, Patrick Sulem, Adam S. Butterworth, Daniel F. Gudbjartsson, John Danesh, Soren Brunak, Emanuele Di Angelantonio, Henrik Ullum, Kari Stefansson

Summary: The meta-analysis of three genome-wide association studies revealed 62 independent sequence variants associating with iron homeostasis parameters at 56 loci, including 46 novel loci. These variants are associated with iron deficiency anemia and iron overload, highlighting their significant role in regulating iron homeostasis.

COMMUNICATIONS BIOLOGY (2021)

Article Biology

Eleven genomic loci affect plasma levels of chronic inflammation marker soluble urokinase-type plasminogen activator receptor

Joseph Dowsett, Egil Ferkingstad, Line Jee Hartmann Rasmussen, Lise Wegner Thorner, Magnus K. Magnusson, Karen Sugden, Gudmar Thorleifsson, Mike Frigge, Kristoffer Solvsten Burgdorf, Sisse Rye Ostrowski, Erik Sorensen, Christian Erikstrup, Ole Birger Pedersen, Thomas Folkmann Hansen, Karina Banasik, Soren Brunak, Vinicius Tragante, Sigrun Helga Lund, Lilja Stefansdottir, Bjarni Gunnarson, Richie Poulton, Louise Arseneault, Avshalom Caspi, Terrie E. Moffitt, Daniel Gudbjartsson, Jesper Eugen-Olsen, Hreinn Stefansson, Kari Stefansson, Henrik Ullum

Summary: The study found a 60% heritability factor for suPAR variation and identified 13 independently genome-wide significant sequence variants associated with suPAR levels across 11 distinct loci. These findings provide new insight into the causes of variation in suPAR plasma levels, which may clarify suPAR's potential role in associated diseases.

COMMUNICATIONS BIOLOGY (2021)

Article Biology

Predicting the probability of death using proteomics

Thjodbjorg Eiriksdottir, Steinthor Ardal, Benedikt A. Jonsson, Sigrun H. Lund, Erna V. Ivarsdottir, Kristjan Norland, Egil Ferkingstad, Hreinn Stefansson, Ingileif Jonsdottir, Hilma Holm, Thorunn Rafnar, Jona Saemundsdottir, Gudmundur L. Norddahl, Gudmundur Thorgeirsson, Daniel F. Gudbjartsson, Patrick Sulem, Unnur Thorsteinsdottir, Kari Stefansson, Magnus O. Ulfarsson

Summary: The study developed predictors for all-cause mortality using large-scale proteomics datasets, indicating that the plasma proteome may be valuable in assessing overall health status and estimating the risk of death.

COMMUNICATIONS BIOLOGY (2021)

Article Genetics & Heredity

Large-scale integration of the plasma proteome with genetics and disease

Egil Ferkingstad, Patrick Sulem, Bjarni A. Atlason, Gardar Sveinbjornsson, Magnus I. Magnusson, Edda L. Styrmisdottir, Kristbjorg Gunnarsdottir, Agnar Helgason, Asmundur Oddsson, Bjarni V. Halldorsson, Brynjar O. Jensson, Florian Zink, Gisli H. Halldorsson, Gisli Masson, Gudny A. Arnadottir, Hildigunnur Katrinardottir, Kristinn Juliusson, Magnus K. Magnusson, Olafur Th. Magnusson, Run Fridriksdottir, Saedis Saevarsdottir, Sigurjon A. Gudjonsson, Simon N. Stacey, Solvi Rognvaldsson, Thjodbjorg Eiriksdottir, Thorunn A. Olafsdottir, Valgerdur Steinthorsdottir, Vinicius Tragante, Magnus O. Ulfarsson, Hreinn Stefansson, Ingileif Jonsdottir, Hilma Holm, Thorunn Rafnar, Pall Melsted, Jona Saemundsdottir, Gudmundur L. Norddahl, Sigrun H. Lund, Daniel F. Gudbjartsson, Unnur Thorsteinsdottir, Kari Stefansson

Summary: Genome-wide association studies of plasma protein levels in Icelanders have identified numerous associations with diseases and other traits, providing valuable insights into disease pathogenesis and potential drug targets. Through integration of proteomics, genomics, and transcriptomics, this research offers a resource for improving disease understanding and aiding drug discovery and development.

NATURE GENETICS (2021)

Article Biochemical Research Methods

CompAIRR: ultra-fast comparison of adaptive immune receptor repertoires by exact and approximate sequence matching

Torbjorn Rognes, Lonneke Scheffer, Victor Greiff, Geir Kjetil Sandve

Summary: In this study, CompAIRR was developed for fast computation of AIRR overlap, achieving a 1000-fold improvement in computational speed compared to existing methods. CompAIRR has been integrated with immuneML, a machine learning ecosystem for AIRR analysis.

BIOINFORMATICS (2022)

Article Multidisciplinary Sciences

Identification of gluten T cell epitopes driving celiac disease

Marketa Chlubnova, Asbjorn O. Christophersen, Geir Kjetil F. Sandve, Knut E. A. Lundin, Jorgen Jahnsen, Shiva Dahal-Koirala, Ludvig M. Sollid

Summary: 42 wheat gluten-reactive T cell clones with different phenotypes and no reactivity to known epitopes were screened. Synthetic peptides were identified bioinformatically from a wheat gluten protein database and tested against the T cell clones. Reactivity of 10 T cell clones was assigned, and 5 previously uncharacterized gliadin/glutenin epitopes with a 9-nucleotide oligomer core region were identified. This work represents an advance in identifying CeD-driving gluten epitopes.

SCIENCE ADVANCES (2023)

Article Psychiatry

Effects of prenatal exposure to (es)citalopram and maternal depression during pregnancy on DNA methylation and child neurodevelopment

Emilie Willoch Olstad, Hedvig Marie Egeland Nordeng, Geir Kjetil Sandve, Robert Lyle, Kristina Gervin

Summary: This study investigated the associations between prenatal exposure to citalopram or escitalopram, maternal depression, and offspring DNA methylation (DNAm). The researchers also examined the interaction effect of (es)citalopram exposure and DNAm on neurodevelopmental outcomes, as well as the correlation between DNAm at birth and neurodevelopmental trajectories in childhood.

TRANSLATIONAL PSYCHIATRY (2023)

Article Computer Science, Artificial Intelligence

Linguistically inspired roadmap for building biologically reliable protein language models

Mai Ha Vu, Rahmad Akbar, Philippe A. Robert, Bartlomiej Swiatczak, Geir Kjetil Sandve, Victor Greiff, Dag Trygve Truslew Haug

Summary: Language models trained on proteins can predict functions from sequences but lack insight into underlying mechanisms. Extracting rules from these models can make them interpretable and help explain biological mechanisms.

NATURE MACHINE INTELLIGENCE (2023)

Article Biochemical Research Methods

Adjustment of spurious correlations in co-expression measurements from RNA-Sequencing data

Ping-Han Hsieh, Camila Miranda Lopes-Ramos, Manuela Zucknick, Geir Kjetil Sandve, Kimberly Glass, Marieke Lydia Kuijjer

Summary: Gene co-expression measurements are widely used in computational biology to identify coordinated expression patterns. However, certain normalization methods can introduce false-positive associations between genes, hindering downstream co-expression network analysis. In this study, a normalization method called SNAIL (Smooth-quantile Normalization Adaptation for the Inference of co-expression Links) is developed to avoid false-positive associations and retain associations to genes expressed in small subgroups of samples. This method has the potential to impact network modeling and association-based approaches in large-scale heterogeneous data.

BIOINFORMATICS (2023)

Article Biology

Profiling the baseline performance and limits of machine learning models for adaptive immune receptor repertoire classification

Chakravarthi Kanduri, Milena Pavlovic, Lonneke Scheffer, Keshav Motwani, Maria Chernigovskaya, Victor Greiff, Geir K. Sandve

Summary: This article presents a study aimed at determining the effectiveness of baseline machine learning (ML) methods in the classification of adaptive immune receptor repertoires (AIRRs). The study generated a series of synthetic AIRR benchmark datasets and found that even when the immune signal occurs only in 1 out of 50,000 AIR sequences, the baseline L1-penalized logistic regression model can achieve high prediction accuracy.

GIGASCIENCE (2022)
