Article

Predicting the host of influenza viruses based on the word vector

Journal

PEERJ
Volume 5, Issue -, Pages -

Publisher

PEERJ INC
DOI: 10.7717/peerj.3579

Keywords

Host; Word vector; SVM; Influenza virus; Prediction

Funding

  1. National Key Plan for Scientific Research and Development of China [2016YFC1200204, 2016YFD0500300]
  2. National Natural Science Foundation [31500126, 31371338]

Newly emerging influenza viruses continue to threaten public health. A rapid determination of the host range of newly discovered influenza viruses would assist in early assessment of their risk. Here, we attempted to predict the host of influenza viruses using the Support Vector Machine (SVM) classifier based on the word vector, a new representation and feature extraction method for biological sequences. The results show that the length of the word within the word vector, the sequence type (DNA or protein) and the species from which the sequences were derived for generating the word vector all influence the performance of models in predicting the host of influenza viruses. In nearly all cases, the models built on the surface proteins hemagglutinin (HA) and neuraminidase (NA) (or their genes) produced better results than those built on internal influenza proteins (or their genes). The best performance was achieved when the model was built on the HA gene based on word vectors (words three letters long) generated from DNA sequences of the influenza virus. This resulted in accuracies of 99.7% for avian, 96.9% for human and 90.6% for swine influenza viruses. Compared with the method of sequence homology best-hit searches using the Basic Local Alignment Search Tool (BLAST), the word vector-based models still need further improvement in predicting the host of influenza A viruses.
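
The feature extraction step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes overlapping words taken one position apart, uses random placeholder vectors where the study used word vectors generated from influenza sequences, and assumes that a sequence is represented by the average of its word vectors; the abstract specifies none of these details. The function names (`split_into_words`, `toy_word_vectors`, `sequence_features`) and the example sequence are hypothetical.

```python
import random
from itertools import product


def split_into_words(sequence, k=3, stride=1):
    """Split a sequence into words of length k.

    Assumption: words overlap (stride=1); the abstract only gives k=3.
    """
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def toy_word_vectors(alphabet="ACGT", k=3, dim=4, seed=0):
    """Hypothetical stand-in for learned word vectors.

    Assigns one random vector to every possible word of length k.
    """
    rng = random.Random(seed)
    return {
        "".join(letters): [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        for letters in product(alphabet, repeat=k)
    }


def sequence_features(sequence, vectors, k=3):
    """Average the vectors of all known words in the sequence.

    Assumption: averaging is one simple way to obtain a fixed-length
    feature vector; the abstract does not state how words are combined.
    """
    words = [w for w in split_into_words(sequence, k) if w in vectors]
    if not words:
        raise ValueError("sequence contains no known words")
    dim = len(next(iter(vectors.values())))
    return [sum(vectors[w][d] for w in words) / len(words) for d in range(dim)]


vectors = toy_word_vectors()
words = split_into_words("ATGAAGGCA")  # made-up nine-letter DNA fragment
features = sequence_features("ATGAAGGCA", vectors)
```

Fixed-length feature vectors of this kind, one per sequence and labelled avian, human or swine, could then be passed to an SVM implementation such as `sklearn.svm.SVC` in scikit-learn to train a host classifier.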
