Article

CRF: detection of CRISPR arrays using random forest

Journal

PEERJ
Volume 5

Publisher

PEERJ INC
DOI: 10.7717/peerj.3219

Keywords

Repeat detection; Random forest; Machine learning; CRISPR; Data visualization

Funding

  1. Committee on Faculty Research (CFR) Program, Miami University, Oxford, Ohio, USA
  2. Department of Biology, Miami University, Oxford, Ohio, USA
  3. Office for the Advancement of Research & Scholarship (OARS), Miami University, Oxford, Ohio, USA

CRISPRs (clustered regularly interspaced short palindromic repeats) are a particular class of repeat sequences found in a wide range of bacterial and archaeal genomes. Several tools are available for detecting CRISPR arrays in the genomes of both domains. Here we developed a new web-based CRISPR detection tool named CRF (CRISPR Finder by Random Forest). Unlike other CRISPR detection tools, CRF uses a random forest classifier to filter out invalid CRISPR arrays from all putative candidates, which improves detection accuracy. In particular, triplet elements that combine sequence content and structure information were extracted from CRISPR repeats to train the classifier. The classifier achieved high accuracy and sensitivity. Moreover, CRF offers a highly interactive web interface for data visualization that is not available in other CRISPR detection tools. After detection, the query sequence, the CRISPR array architecture, and the sequences and secondary structures of CRISPR repeats and spacers can be displayed for visual examination and validation. CRF is freely available at http://bioinfolab.miamioh.edu/crf/home.php.
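
The abstract does not spell out how the triplet elements are computed, so the following Python sketch is only an illustration under stated assumptions. It assumes one common definition of a triplet element: the nucleotide at a position combined with the paired or unpaired state of that position and its two neighbours, read from a dot-bracket secondary structure, which gives 4 x 8 = 32 features. The function name `triplet_features` and the example repeat are hypothetical and not taken from CRF.

```python
from itertools import product


def triplet_features(seq, structure):
    """Return normalized frequencies of 32 triplet elements.

    Assumed definition (not confirmed by the abstract): each element is the
    middle nucleotide plus the paired "(" or unpaired "." state of three
    adjacent positions in a dot-bracket structure string.
    """
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have the same length")

    # CRISPR repeats are DNA; map any RNA-style U to T.
    seq = seq.upper().replace("U", "T")
    # Collapse "(" and ")" into one "paired" symbol.
    states = "".join("(" if c in "()" else "." for c in structure)

    keys = [n + "".join(s) for n in "ACGT" for s in product("(.", repeat=3)]
    counts = dict.fromkeys(keys, 0)

    # Slide over interior positions so every triplet has both neighbours.
    for i in range(1, len(seq) - 1):
        key = seq[i] + states[i - 1:i + 2]
        if key in counts:  # skips ambiguous bases such as N
            counts[key] += 1

    total = sum(counts.values())
    return {k: (v / total if total else 0.0) for k, v in counts.items()}


# Hypothetical 6-nt repeat with a small hairpin, for illustration only.
features = triplet_features("GCATGC", "((..))")
nonzero = {k: v for k, v in features.items() if v > 0}
```

A fixed-length vector of this kind, one per candidate repeat, is the sort of input a random forest classifier could be trained on to separate valid arrays from invalid ones. The actual feature set and training data used by CRF are described in the paper itself.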
