Article Proceedings Paper

Bacteriophage classification for assembled contigs using graph convolutional network

Journal

BIOINFORMATICS
Volume 37, Issue -, Pages I25-I33

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btab293

Funding

  1. Research Grants Council of the Hong Kong Special Administrative Region, China [CityU 11206819]
  2. HKIDS [9360163]
  3. NSF of China [31972847]

Summary: Bacteriophages, viruses that infect bacteria, play crucial roles in microbial biology, but their classification is hampered by high diversity and limited knowledge. A novel semi-supervised learning model called PhaGCN combines DNA and protein sequence features to classify phage contigs, and it performs competitively against existing tools on both simulated and real sequencing data.

Motivation: Bacteriophages (aka phages), which mainly infect bacteria, play key roles in the biology of microbes. Although phages are the most abundant biological entities on the planet, those discovered so far are only the tip of the iceberg. Recently, many new phages have been revealed using high-throughput sequencing, particularly metagenomic sequencing. Taxonomic classification of phages lags seriously behind the fast accumulation of phage-like sequences. The high diversity and abundance of phages, together with the limited number of known phages, pose great challenges for taxonomic analysis. In particular, alignment-based tools have difficulty classifying the fast-accumulating contigs assembled from metagenomic data.

Results: In this work, we present a novel semi-supervised learning model, named PhaGCN, to conduct taxonomic classification for phage contigs. In this learning model, we construct a knowledge graph by combining the DNA sequence features learned by a convolutional neural network with the protein sequence similarity obtained from a gene-sharing network. We then apply a graph convolutional network that uses both the labeled and unlabeled samples in training to enhance the learning ability. We tested PhaGCN on both simulated and real sequencing data. The results clearly show that our method competes favorably against available phage classification tools.
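
The semi-supervised idea in the abstract, a graph convolutional network trained on a graph that contains both labeled and unlabeled contigs, can be illustrated with a small sketch. This is not the PhaGCN implementation: the toy graph, the feature and weight shapes, and the function names (`normalize_adjacency`, `gcn_forward`, `masked_cross_entropy`) are all assumptions made for illustration, and the layer rule is the commonly used symmetric-normalization form of graph convolution rather than anything confirmed by the abstract. Node features stand in for CNN-learned DNA features, and edges stand in for gene-sharing links.

```python
import numpy as np

def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a symmetric adjacency matrix."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops
    deg = a_hat.sum(axis=1)                 # node degrees including self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_forward(a_norm, x, w0, w1):
    """Two graph-convolution layers followed by a row-wise softmax."""
    hidden = np.maximum(a_norm @ x @ w0, 0.0)             # ReLU
    logits = a_norm @ hidden @ w1
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

def masked_cross_entropy(probs, labels, labeled_mask):
    """Cross-entropy over labeled nodes only; unlabeled nodes still
    influence the prediction through the graph edges."""
    idx = np.flatnonzero(labeled_mask)
    picked = probs[idx, labels[idx]]
    return float(-np.log(picked + 1e-12).mean())

# Hypothetical toy data: 4 contigs in a path graph, 2 taxonomic classes.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
features = rng.normal(size=(4, 8))          # stand-in for learned DNA features
labels = np.array([0, 0, 1, 1])
labeled_mask = np.array([True, False, False, True])  # only two nodes are labeled

w0 = rng.normal(scale=0.1, size=(8, 16))
w1 = rng.normal(scale=0.1, size=(16, 2))

a_norm = normalize_adjacency(adj)
probs = gcn_forward(a_norm, features, w0, w1)
loss = masked_cross_entropy(probs, labels, labeled_mask)
```

In this sketch the loss is computed only on the two labeled nodes, yet the predictions for every node depend on its neighbors' features through `a_norm`, which is how unlabeled samples can contribute during training. A real model would also update `w0` and `w1` by gradient descent, which is omitted here.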

Recommended

Article Environmental Sciences

Interactive effects of water management and liming on CH4 emissions and rice cadmium uptake in an acid paddy soil

Yong Wang, Yanni Sun, Le Chen, Hua Shao, Yanhua Zeng, Yongjun Zeng, Feiyu Tang, Junhuo Cai, Shan Huang

Summary: Rice agriculture is a significant source of methane emissions and cadmium accumulation. The study investigated the combined effects of water management and lime application on CH4 emissions and rice Cd uptake. Results showed that flooding following midseason drainage effectively reduced CH4 emissions, while lime application reduced both CH4 emissions and rice Cd uptake. The recommended approach to mitigate CH4 emissions without increasing Cd uptake is continuous flooding with midseason drainage combined with lime application.

ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH (2023)

Article Biochemical Research Methods

Virus classification for viral genomic fragments using PhaGCN2

Jing-Zhe Jiang, Wen-Guang Yuan, Jiayu Shang, Ying-Hui Shi, Li-Ling Yang, Min Liu, Peng Zhu, Tao Jin, Yanni Sun, Li-Hong Yuan

Summary: The paper presents PhaGCN2, a tool that rapidly classifies viral sequences at the family level and improves the precision and recall of virus classification. It allows for high-throughput processing of viral sequences and supports visualization.

BRIEFINGS IN BIOINFORMATICS (2023)

Article Virology

Critical Assessment of Whole Genome and Viral Enrichment Shotgun Metagenome on the Characterization of Stool Total Virome in Hepatocellular Carcinoma Patients

Fan Zhang, Andrew Gia, Guowei Chen, Lan Gong, Jason Behary, Georgina L. Hold, Amany Zekry, Xubo Tang, Yanni Sun, Emad El-Omar, Xiao-Tao Jiang

Summary: Viruses are abundant and important in ecosystems, and two methods, whole genome shotgun metagenome sequencing and viral-like particle enriched metagenome sequencing, are commonly used to compare viruses across environments. In this study, both methods were applied to investigate the stool virome in HCC patients and healthy controls, and both identified altered viral profiles in HCC patients. Ultra-deep sequencing recovered more viruses, and the VLPM method could detect RNA viruses. Using both methods captures different parts of the total virome and identifies shared and specific viral signatures.

VIRUSES-BASEL (2023)

Article Multidisciplinary Sciences

Metabolic changes in bile acids with pregnancy progression and their correlation with perinatal complications in intrahepatic cholestasis of pregnant patients

Zhixin Ma, Yifeng Liu, Lin Chai, Guochen Jin, Yanni Sun, Shaomin Zhou, Peiyuan Yin, Siwen Wang, Yuning Zhu, Dan Zhang, Shiming Lu, Bo Zhu

Summary: Intrahepatic cholestasis of pregnancy (ICP) is a rare liver disease characterized by disordered bile acid metabolism during pregnancy. Different types of ICP show distinct bile acid metabolism profiles. Elevated levels of total bile acids and glycocholic acid are associated with preterm birth in early-onset ICP (EICP), while increased levels of total bile acids and taurocholic acid are predictive of preterm birth in late-onset ICP (LICP). This study highlights the importance of assessing bile acid metabolism in ICP patients to predict perinatal complications.

SCIENTIFIC REPORTS (2023)

Article Biochemical Research Methods

HOTSPOT: hierarchical host prediction for assembled plasmid contigs with transformer

Yongxin Ji, Jiayu Shang, Xubo Tang, Yanni Sun

Summary: Understanding the host range of plasmids is essential for studying their roles in bacterial evolution and adaptation. Existing tools for predicting plasmid hosts face challenges in sensitivity and precision. This study presents a hierarchical classification tool called HOTSPOT, which uses a state-of-the-art language model, Transformer, to accurately predict the host taxonomy of input plasmid contigs.

BIOINFORMATICS (2023)

Article Chemistry, Multidisciplinary

Laboratory Experiment and Application Evaluation of a Bio-Nano-depressurization and Injection-Increasing Composite System in Medium-Low Permeability Offshore Reservoirs

Qing Feng, Xianchao Chen, Ning Zhang, Xiaonan Li, Jingchao Zhou, Shengsheng Li, Xiaorong Zhang, Yanni Sun, Yuehui She

Summary: Bohai Oilfield has developed a bio-nano-depressurization and injection-increasing composite system solution to address the high injection pressure and insufficient injection volume in offshore oilfields. The new technology has the advantages of efficient decompression, long-term injection, and wide adaptation. However, there is a need for optimization schemes and application effect prediction methods to further promote and apply the bio-nano-composite system solution. This paper optimizes the injection volume, concentration, and speed of the bio-nano-augmentation fluid and evaluates the application effect using well testing, water absorption index, and numerical simulation methods.

ACS OMEGA (2023)

Article Biochemical Research Methods

PhaVIP: Phage VIrion Protein classification based on chaos game representation and Vision Transformer

Jiayu Shang, Cheng Peng, Xubo Tang, Yanni Sun

Summary: In this study, a computational method called PhaVIP was developed for fast and accurate classification and annotation of phage virion proteins (PVPs). By encoding protein sequences into unique images and utilizing the Vision Transformer model, PhaVIP can learn both local and global features from sequence images. Experimental results demonstrated the superior performance of PhaVIP, and its output was further applied to phage taxonomy classification and phage host prediction with beneficial results.

BIOINFORMATICS (2023)

Article Biochemistry & Molecular Biology

PLASMe: a tool to identify PLASMid contigs from short-read assemblies using transformer

Xubo Tang, Jiayu Shang, Yongxin Ji, Yanni Sun

Summary: In this study, a plasmid detection tool called PLASMe was developed, which combines alignment and learning-based methods to effectively identify closely related and diverged plasmids. By encoding plasmid sequences as a language defined on the protein cluster-based token set, the Transformer model in PLASMe can learn the importance of proteins and their correlation. Comparative analysis showed that PLASMe achieved the highest F1-score in detecting complete plasmids, plasmid contigs, and contigs assembled from CAMI2 simulated data, and exhibited more reliable performance than other tools.

NUCLEIC ACIDS RESEARCH (2023)

Article Microbiology

High-resolution strain-level microbiome composition analysis from short reads

Herui Liao, Yongxin Ji, Yanni Sun

Summary: In this study, a new strain-level composition analysis tool named StrainScan is introduced, which employs a novel tree-based k-mers indexing structure to strike a balance between strain identification accuracy and computational complexity. Extensive testing on simulated and real sequencing data shows that StrainScan outperforms popular strain-level analysis tools in terms of accuracy and resolution. It provides more informative strain composition analysis in one sample or across multiple samples.

MICROBIOME (2023)

Article Multidisciplinary Sciences

LncRNA-Top: Controlled deep learning approaches for lncRNA gene regulatory relationship annotations across different platforms

Weidun Xie, Xingjian Chen, Zetian Zheng, Fuzhou Wang, Xiaowei Zhu, Qiuzhen Lin, Yanni Sun, Ka-Chun Wong

Summary: This study presents a method called lncRNA-Top to predict lncRNA-gene regulation relationships and constructs controlled deep-learning models. Through case studies, it is found that the predictions are accurate, and additional software is provided for target candidate annotation.

ISCIENCE (2023)

Article Biochemical Research Methods

PhaTYP: predicting the lifestyle for bacteriophages using BERT

Jiayu Shang, Xubo Tang, Yanni Sun

Summary: Researchers have developed a tool called PhaTYP that accurately predicts the lifestyle of bacteriophages, especially for short contigs. Experimental results show that PhaTYP outperforms other existing methods and achieves more stable performance on short contigs. Additionally, the utility of PhaTYP for analyzing phage lifestyle in human neonates' gut data has been demonstrated, which helps extend our understanding of microbial communities.

BRIEFINGS IN BIOINFORMATICS (2023)

Article Biochemical Research Methods

AccuVIR: an ACCUrate VIRal genome assembly tool for third-generation sequencing data

Runzhou Yu, Dehan Cai, Yanni Sun

Summary: RNA viruses mutate constantly, but accurate assembly of viral genomes is crucial for studying virus evolution and understanding the relationship between genotypes and virus properties. A new tool called AccuVIR has been developed for viral genome assembly and polishing using error-prone long reads. It can distinguish sequencing errors from true variants, resulting in more accurate viral genomes compared to other tools.

BIOINFORMATICS (2023)

Article Biochemical Research Methods

VirBot: an RNA viral contig detector for metagenomic data

Guowei Chen, Xubo Tang, Mang Shi, Yanni Sun

Summary: In this study, we developed VirBot, a simple yet effective RNA virus identification tool based on protein families and adaptive score cutoffs. Compared to seven popular tools for virus identification, VirBot demonstrated high specificity in metagenomic datasets and superior sensitivity in detecting novel RNA viruses on both simulated and real sequencing data.

BIOINFORMATICS (2023)

Article Engineering, Environmental

Identifying ARG-carrying bacteriophages in a lake replenished by reclaimed water using deep learning techniques

Donglin Wang, Jiayu Shang, Hui Lin, Jinsong Liang, Chenchen Wang, Yanni Sun, Yaohui Bai, Jiuhui Qu

Summary: This study develops a bio-informatic pipeline using deep learning techniques to identify phages carrying antibiotic resistance genes (ARGs) and predict their hosts, with a focus on pathogens. The study discovers that temperate phages in a landscape lake replenished by reclaimed water predominantly carry ARGs related to multidrug resistance and beta-lactam antibiotics. In silico analysis and qPCR confirm a positive correlation between temperate phages and host pathogens, and seasonal variations in the abundance of phages and chromosomes carrying ARGs.

WATER RESEARCH (2024)

Article Soil Science

Liming reduces nitrogen uptake from chemical fertilizer but increases that from straw in a double rice cropping system

Ping Liao, Lei Liu, Jin Chen, Yanni Sun, Shan Huang, Yongjun Zeng, Kees Jan van Groenigen

Summary: Liming materials can increase rice yield and nitrogen uptake, but decrease the efficiency of fertilizer nitrogen and promote nitrogen losses. Long-term studies on the impact of liming on nitrogen dynamics in paddy soils are necessary.

SOIL & TILLAGE RESEARCH (2024)
