Editorial Material
Biochemistry & Molecular Biology
Neil A. Hanchard, Ananyo Choudhury
Summary: This article discusses the release of expanded high-depth sequencing data for the fourth phase of the 1000 Genomes Project. It explores how this dataset can serve as a more comprehensive and accurate resource for global genomics through extensive comparisons and benchmarks.
Article
Biochemistry & Molecular Biology
Abhinav Jain, Rahul C. Bhoyar, Kavita Pandhare, Anushree Mishra, Disha Sharma, Mohamed Imran, Vigneshwar Senthivel, Mohit Kumar Divakar, Mercy Rophina, Bani Jolly, Arushi Batra, Sumit Sharma, Sanjay Siwach, Arun G. Jadhao, Nikhil Palande, Ganga Nath Jha, Nishat Ashrafi, Prashant Kumar Mishra, A. K. Vidhya, Suman Jain, Debasis Dash, Nachimuthu Senthil Kumar, Andrew Vanlallawma, Ranjan Jyoti Sarma, Lalchhandama Chhakchhuak, Shantaraman Kalyanaraman, Radha Mahadevan, Sunitha Kandasamy, B. M. Pabitha, Raskin Erusan Rajagopal, Ezhil J. Ramya, Nirmala P. Devi, Anjali Bajaj, Vishu Gupta, Samatha Mathew, Sangam Goswami, Mohit Mangla, Savinitha Prakash, Kandarp Joshi, S. Sreedevi, Devarshi Gajjar, Ronibala Soraisham, Rohit Yadav, Yumnam Silla Devi, Aayush Gupta, Mitali Mukerji, Sivaprakash Ramalingam, B. K. Binukumar, Vinod Scaria, Sridhar Sivasubbu
Summary: With the advent of next-generation sequencing, a pilot phase of the 'IndiGen' program performed whole genome sequencing of 1029 healthy Indian individuals to create the IndiGenomes database, which is now freely accessible. This comprehensive genetic variant resource for the Indian population has been extensively accessed by the worldwide community.
NUCLEIC ACIDS RESEARCH
(2021)
Article
Biology
Abirami Rajavel, Selina Klees, Yuehan Hui, Armin Otto Schmitt, Mehmet Gultas
Summary: Climate change increases the risk of spreading vector-borne diseases like African Animal Trypanosomiasis (AAT). This study investigates the genetic mechanisms involved in AAT susceptibility and tolerance in cattle breeds. The findings provide insights into the regulatory SNPs, gene expression profiles, and downstream effectors associated with AAT, contributing to a better understanding of resistance and susceptibility in cattle breeds.
Article
Genetics & Heredity
Zhong Wang, Lei Sun, Andrew Paterson
Summary: This study found significant sex differences in genetic variations on the X chromosome, which has important implications for disease and trait associations. Variations in pseudoautosomal regions and X-transposed regions showed larger differences compared to non-pseudoautosomal regions. These findings need to be taken into account in X chromosome analyses.
Article
Biochemistry & Molecular Biology
Marta Byrska-Bishop, Uday S. Evani, Xuefang Zhao, Anna O. Basile, Haley J. Abel, Allison A. Regier, Andre Corvelo, Wayne E. Clarke, Rajeeva Musunuri, Kshithija Nagulapalli, Susan Fairley, Alexi Runnels, Lara Winterkorn, Ernesto Lowy, Paul Flicek, Soren Germer, Harrison Brand, Ira M. Hall, Michael E. Talkowski, Giuseppe Narzisi, Michael C. Zody
Summary: The 1000 Genomes Project is the largest open resource of whole-genome sequencing data, and a new high-coverage WGS 1kGP resource has been released, improving the sensitivity and accuracy of variant calls and making it more valuable for association studies.
Article
Clinical Neurology
Fulya Akcimen, Jay P. Ross, Calwing Liao, Dan Spiegelman, Patrick A. Dion, Guy A. Rouleau
Summary: The study identified HTT-positive, ATXN2-positive, ATXN3-positive, and possibly ATXN1-positive samples in 0.5% of the studied populations, indicating the presence of asymptomatic small expanded repeats. Repeat sizes did not correlate across genes, and the distribution of CAG alleles varied by ethnicity.
MOVEMENT DISORDERS
(2021)
Article
Biochemical Research Methods
Kshitij Srivastava, Anne-Sophie Fratzscher, Bo Lan, Willy Albert Flegel
Summary: This study identified long reference sequences by analyzing homozygous genomic regions in the 1000 Genomes Project database. A total of 902 ACKR1 haplotypes of varying lengths were confirmed, with the longest being 80,584 nucleotides and the shortest being 1,901 nucleotides. The approach of using tracts of homozygosity for definitive reference sequences is scalable and can be applied to any gene.
BMC BIOINFORMATICS
(2021)
Article
Genetics & Heredity
Jeffrey K. Ng, Pankaj Vats, Elyn Fritz-Waters, Stephanie Sarkar, Eleanor I. Sams, Evin M. Padhi, Zachary L. Payne, Shawn Leonard, Marc A. West, Chandler Prince, Lee Trani, Marshall Jansen, George Vacek, Mehrzad Samadi, Timothy T. Harkins, Craig Pohl, Tychele N. Turner
Summary: In this study, a graphics processing unit (GPU)-based workflow was developed to accelerate the detection of de novo variants (DNVs). The workflow was applied to whole-genome sequencing data from different sources, revealing unexpected results in the DNV callsets and potential cell line artifacts. Additionally, mutation signature analysis identified associations with B-cell lymphoma and variants in DNA repair genes. These findings have important implications for reference building and disease-related projects.
Article
Biochemistry & Molecular Biology
Xiangqi Hao, Yanchao Li, Xiangyu Xiao, Bo Chen, Pei Zhou, Shoujun Li
Summary: Canine parvovirus (CPV-2) is a significant pathogen in dogs. Despite the development of vaccines, CPV-2 continues to circulate in the dog population. CPV-2c is replacing CPV-2a as the dominant variant in Asia, South America, North America, and Africa, with evidence of spillover.
INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES
(2022)
Article
Genetics & Heredity
Alexander J. M. Blakes, Htoo A. Wai, Ian Davies, Hassan E. Moledina, April Ruiz, Tessy Thomas, David Bunyan, N. Simon Thomas, Christine P. Burren, Lynn Greenhalgh, Melissa Lees, Amanda Pichini, Sarah F. Smithson, Ana Lisa Taylor Tavares, Peter O'Donovan, Andrew G. L. Douglas, Nicola Whiffin, Diana Baralle, Jenny Lord
Summary: This study examines the role of non-canonical splicing variants in rare genetic diseases using whole-genome sequencing data. The research shows that positions near splice sites and splicing branchpoints are constrained by purifying selection and harbor potentially damaging non-coding variants. The study also identifies new likely diagnoses for individuals with unsolved rare diseases.
Article
Medicine, Legal
Jiaming Xue, Shengqiu Qu, Mengyu Tan, Yuanyuan Xiao, Ranran Zhang, Dezhi Chen, Meili Lv, Yiming Zhang, Lin Zhang, Weibo Liang
Summary: Microhaplotypes (MHs) are a promising new type of forensic marker with advantages such as low mutation rates, lack of stutter artifacts, and short amplicons. They have improved human identification, kinship analysis, ancestry prediction, and mixture deconvolution capabilities. Although some public databases are available for MHs, a more comprehensive database that integrates information from other databases is still needed. This study established a dual-SNP MH database (D-SNPsDB) for 26 populations, providing basic data such as physical positions, allele frequencies, and variant information.
INTERNATIONAL JOURNAL OF LEGAL MEDICINE
(2022)
Article
Multidisciplinary Sciences
Hideki Tokunaga, Keita Iida, Atsushi Hozawa, Soichi Ogishima, Yoh Watanabe, Shogo Shigeta, Muneaki Shimada, Yumi Yamaguchi-Kabata, Shu Tadaka, Fumiki Katsuoka, Shin Ito, Kazuki Kumada, Yohei Hamanaka, Nobuo Fuse, Kengo Kinoshita, Masayuki Yamamoto, Nobuo Yaegashi, Jun Yasuda
Summary: Identifying population frequencies of pathogenic variants in BRCA1/2 genes is essential for estimating the number of patients with hereditary breast and ovarian cancer (HBOC), while detecting moderately penetrant HBOC gene variants in the population is critical for personalized health care. A prospective cohort study can provide valuable information, with computational scoring and minor allele frequency (MAF) filtration being useful for identifying potentially pathogenic variants.
Article
Genetics & Heredity
Tamara Soledad Frontanilla, Guilherme Valle-Silva, Jesus Ayala, Celso Teixeira Mendes-Junior
Summary: By conducting a comprehensive genotyping analysis of STRs obtained from the 1000 Genomes Project populations, we established a reliable open-access STR database and identified limitations of HipSTR in detecting longer alleles.
Article
Genetics & Heredity
Iago Maceda, Oscar Lao
Summary: The 1000 Genomes Project (1000G) is a valuable dataset for genomics research. Recent studies have found ghost mutation signals in 1000G, which can affect follow-up studies. This study demonstrates the association between sequencing center and loss of function alleles, singletons, and patterns of archaic introgression in 1000G.
Article
Genetics & Heredity
Eleanor G. Seaby, N. Simon Thomas, Amy Webb, Helen Brittain, Ana Lisa Taylor Tavares, Genomics England Consortium, Diana Baralle, Heidi L. Rehm, Anne O'Donnell-Luria, Sarah Ennis
Summary: This study proposed a screening method based on the LOEUF score to rapidly identify pathogenic variants and new diagnoses. The results showed that this method has high specificity and accuracy, and can identify diagnoses missed by the 100,000 Genomes Project (100KGP) analysis.
Article
Biochemistry & Molecular Biology
Rossana Zaru, Joseph Onwubiko, Antonio J. M. Ribeiro, Keeva Cochrane, Jonathan D. Tyzack, Venkatesh Muthukrishnan, Lukas Pravda, Janet M. Thornton, Claire O'Donovan, Sameer Velankar, Sandra Orchard, Andrew Leach, Maria J. Martin
Summary: Enzyme Portal serves as a free hub for researchers to easily access and explore enzyme-related information from various resources, addressing the challenge of time-consuming retrieval of scattered enzyme data.
Letter
Biochemical Research Methods
M. Michael Gromiha, Christine A. Orengo, Ramanathan Sowdhamini, Janet M. Thornton
Article
Biochemical Research Methods
Nicola R. De Maio, William Boulton, Lukas Weilguny, Conor Walker, Yatish Turakhia, Russell O. Corbett-Detig, Nick Goldman, Ville Mustonen, Joel Wertheim
Summary: This article introduces a new algorithm and software for efficiently simulating a large number of closely related genomes. The algorithm is based on the Gillespie approach and utilizes an efficient multi-layered search tree structure to achieve high computational efficiency, allowing integration with various evolutionary models.
PLOS COMPUTATIONAL BIOLOGY
(2022)
Correction
Multidisciplinary Sciences
Harald S. Vohringer, Theo Sanderson, Matthew Sinnott, Nicola De Maio, Thuy Nguyen, Richard Goater, Frank Schwach, Ian Harrison, Joel Hellewell, Cristina V. Ariani, Sonia Goncalves, David K. Jackson, Ian Johnston, Alexander W. Jung, Callum Saint, John Sillitoe, Maria Suciu, Nick Goldman
Article
Biochemistry & Molecular Biology
Fergal J. Martin, M. Ridwan Amode, Alisha Aneja, Olanrewaju Austine-Orimoloye, Andrey G. Azov, If Barnes, Arne Becker, Ruth Bennett, Andrew Berry, Jyothish Bhai, Simarpreet Kaur Bhurji, Alexandra Bignell, Sanjay Boddu, Paulo R. Branco Lins, Lucy Brooks, Shashank Budhanuru Ramaraju, Mehrnaz Charkhchi, Alexander Cockburn, Luca Da Rin Fiorretto, Claire Davidson, Kamalkumar Dodiya, Sarah Donaldson, Bilal El Houdaigui, Tamara El Naboulsi, Reham Fatima, Carlos Garcia Giron, Thiago Genez, Gurpreet S. Ghattaoraya, Jose Gonzalez Martinez, Cristi Guijarro, Matthew Hardy, Zoe Hollis, Thibaut Hourlier, Toby Hunt, Mike Kay, Vinay Kaykala, Tuan Le, Diana Lemos, Diego Marques-Coelho, Jose Carlos Marugan, Gabriela Alejandra Merino, Louisse Paola Mirabueno, Aleena Mushtaq, Syed Nakib Hossain, Denye N. Ogeh, Manoj Pandian Sakthivel, Anne Parker, Malcolm Perry, Ivana Pilizota, Irina Prosovetskaia, Jose G. Perez-Silva, Ahamed Imran Abdul Salam, Nuno Saraiva-Agostinho, Helen Schuilenburg, Dan Sheppard, Swati Sinha, Botond Sipos, William Stark, Emily Steed, Ranjit Sukumaran, Dulika Sumathipala, Marie-Marthe Suner, Likhitha Surapaneni, Kyosti Sutinen, Michal Szpak, Francesca Floriana Tricomi, David Urbina-Gomez, Andres Veidenberg, Thomas A. Walsh, Brandon Walts, Elizabeth Wass, Natalie Willhoft, Jamie Allen, Jorge Alvarez-Jarreta, Marc Chakiachvili, Bethany Flint, Stefano Giorgetti, Leanne Haggerty, Garth R. Ilsley, Jane E. Loveland, Benjamin Moore, Jonathan M. Mudge, John Tate, David Thybert, Stephen J. Trevanion, Andrea Winterbottom, Adam Frankish, Sarah E. Hunt, Magali Ruffier, Fiona Cunningham, Sarah Dyer, Robert D. Finn, Kevin L. Howe, Peter W. Harrison, Andrew D. Yates, Paul Flicek
Summary: Ensembl has been providing high-quality genomic resources for vertebrates and model organisms for over 20 years. With the increase in high-quality reference genomes and the development of pangenome representations, Ensembl aims to support downstream research by creating high-quality annotations, tools, and services for species across the tree of life. This report highlights Ensembl's resources for popular reference genomes, the growing annotations, updates to the Variant Effect Predictor, protein structure predictions, and the beta release of their new website.
NUCLEIC ACIDS RESEARCH
(2023)
Article
Biochemistry & Molecular Biology
Neera Borkakoti, Janet M. Thornton
Summary: The drug discovery process involves designing compounds to selectively interact with their protein targets. Recent advancements in artificial intelligence have greatly improved the accuracy of protein structure prediction, making protein targets more accessible in the drug design process. In this perspective article, we highlight the importance of accurate protein structure prediction in various stages of small molecule drug discovery, discussing current capabilities and the potential impact of further evolution of predictive procedures.
CURRENT OPINION IN STRUCTURAL BIOLOGY
(2023)
Article
Biochemistry & Molecular Biology
Marcia A. Hasenahuer, Alba Sanchis-Juan, Roman A. Laskowski, James A. Baker, James D. Stephenson, Christine A. Orengo, F. Lucy Raymond, Janet M. Thornton
Summary: In this study, constrained coding regions (CCRs) in the human genome were identified using DNA sequencing data from healthy control populations. These regions lack protein-changing variants and have been under constraint during human evolution. The distribution of CCRs was explored and their co-occurrence with different protein functional features was analyzed. Functional amino acids involved in DNA/RNA interactions, protein-protein contacts, and catalytic sites were found to be highly constrained. Surprisingly, linear motifs, linear interacting peptides, disorder-order transitions, and liquid-liquid phase separating regions also showed strong association with constraint for variability.
JOURNAL OF MOLECULAR BIOLOGY
(2023)
Article
Biotechnology & Applied Microbiology
Lukas Weilguny, Nicola De Maio, Rory Munro, Charlotte Manser, Ewan Birney, Matthew Loose, Nick Goldman
Summary: BOSS-RUNS is an algorithmic framework and software that dynamically adjusts decision strategies as the uncertainty at each genome position is updated in real time. It optimizes information gain by deciding whether to fully sequence each DNA fragment, leading to improved variant calling in microbial communities.
NATURE BIOTECHNOLOGY
(2023)
Article
Biochemistry & Molecular Biology
Mehmet Akdel, Douglas E. Pires, Eduard Porta Pardo, Jurgen Janes, Arthur O. Zalevsky, Balint Meszaros, Patrick Bryant, Lydia L. Good, Roman A. Laskowski, Gabriele Pozzati, Aditi Shenoy, Wensi Zhu, Petras Kundrotas, Victoria Ruiz Serra, Carlos H. M. Rodrigues, Alistair S. Dunham, David Burke, Neera Borkakoti, Sameer Velankar, Adam Frost, Jerome Basquin, Kresten Lindorff-Larsen, Alex Bateman, Andrey Kajava, Alfonso Valencia, Sergey Ovchinnikov, Janani Durairaj, David B. Ascher, Janet M. Thornton, Norman E. Davey, Amelie Stein, Arne Elofsson, Tristan Croll, Pedro Beltrao
Summary: This study evaluates the performance of AlphaFold2 in structural biology applications and finds that it performs well and can partially replace experimentally determined structures.
NATURE STRUCTURAL & MOLECULAR BIOLOGY
(2022)
Editorial Material
Biochemistry & Molecular Biology
Angus I. Lamond, Ivan Dikic, Andre Nussenzweig, Christoph W. Mueller, Janet M. Thornton, Michael B. Yaffe
Article
Genetics & Heredity
Nicola De Maio, Prabhav Kalaghatgi, Yatish Turakhia, Russell Corbett-Detig, Bui Quang Minh, Nick Goldman
Summary: Phylogenetics plays a crucial role in genomic epidemiology, and the COVID-19 pandemic has generated an unprecedented amount of genome sequence data for analysis. However, most phylogenetic approaches are unable to handle the scale of these datasets. This study presents a new method called 'MAximum Parsimonious Likelihood Estimation' (MAPLE) for likelihood-based phylogenetic analysis of large genomic datasets. MAPLE is faster and more accurate than existing maximum likelihood methods and requires significantly less memory, enabling the analysis of millions of genomes.
Correction
Multidisciplinary Sciences
Ulas Isildak, Mehmet Somel, Janet M. Thornton, Handan Melike Donertas
SCIENTIFIC REPORTS
(2023)
Article
Biochemistry & Molecular Biology
Ioannis G. Riziotis, Antonio J. M. Ribeiro, Neera Borkakoti, Janet M. Thornton
Summary: Enzyme catalysis is controlled by a limited set of residues and co-factors. By utilizing three-dimensional templates, recurring catalytic modules that are involved in metal ion, co-factor, and substrate binding can be identified. Some of these convergent modules perform specific catalytic functions, while enzymes that have diverged during evolution retain specific regions of their active site.
JOURNAL OF MOLECULAR BIOLOGY
(2023)
Article
Biochemical Research Methods
Antonio J. M. Ribeiro, Ioannis G. Riziotis, Jonathan D. Tyzack, Neera Borkakoti, Janet M. Thornton
Summary: The rich literature on enzyme reaction mechanisms can serve as the foundation for new knowledge-based approaches to investigate enzyme mechanisms. In this study, a tool called EzMechanism is presented, which can automatically infer mechanistic paths for a given three-dimensional active site and enzyme reaction based on catalytic rules compiled from a database of enzyme mechanisms. EzMechanism facilitates and improves the generation of hypotheses by considering relevant information derived from literature on both related and unrelated enzymes.
Article
Evolutionary Biology
Paschalia Kapli, Ioanna Kotari, Maximilian J. Telford, Nick Goldman, Ziheng Yang
Summary: Inference of deep phylogenies has primarily used protein sequences, but our analysis shows that DNA sequences may be just as useful and should not be excluded. We conducted a simulation study and analyzed empirical data, which suggest that DNA sequences can recover the correct tree as often as protein sequences. Using DNA data has computational advantages and allows for advanced models that account for heterogeneity in the nucleotide-substitution process.
SYSTEMATIC BIOLOGY
(2023)