A machine learning approach for predicting CRISPR-Cas9 cleavage efficiencies and patterns underlying its mechanism of action
Shiran Abadi,
Winston X Yan,
David Amar and
Itay Mayrose
PLOS Computational Biology, 2017, vol. 13, issue 10, 1-24
Abstract:
The adaptation of the CRISPR-Cas9 system as a genome editing technique has generated much excitement in recent years owing to its ability to manipulate targeted genes and genomic regions that are complementary to a programmed single guide RNA (sgRNA). However, the efficacy of a specific sgRNA is not uniquely defined by exact sequence homology to the target site, thus unintended off-targets might additionally be cleaved. Current methods for sgRNA design are mainly concerned with predicting off-targets for a given sgRNA using basic sequence features and employ elementary rules for ranking possible sgRNAs. Here, we introduce CRISTA (CRISPR Target Assessment), a novel algorithm within the machine learning framework that determines the propensity of a genomic site to be cleaved by a given sgRNA. We show that the predictions made with CRISTA are more accurate than other available methodologies. We further demonstrate that the occurrence of bulges is not a rare phenomenon and should be accounted for in the prediction process. Beyond predicting cleavage efficiencies, the learning process provides inferences regarding patterns that underlie the mechanism of action of the CRISPR-Cas9 system. We discover that attributes that describe the spatial structure and rigidity of the entire genomic site as well as those surrounding the PAM region are a major component of the prediction capabilities.
Author summary:
The CRISPR-Cas9 system, a microbial adaptive immune system, was recently exploited for modulating DNA sequences within the endogenous genome in many organisms. This system has emerged as a technology of choice for genome editing with promising therapeutic and research advancements. However, these exciting developments were not paralleled by deep understanding of CRISPR-Cas9 cleavage efficiency.
Indeed, while numerous studies have been conducted in order to define better guidelines to determine CRISPR-Cas9 specificity, much ambiguity remains surrounding its mechanism of action. Here, we present a machine-learning-based algorithm that was trained on genome-wide experimental data. The algorithm considers a broad range of features that describe different attributes that potentially impact the cleavage efficacy of CRISPR-Cas9, including genomic attributes, RNA thermodynamics, and those concerning sequence similarity. We further found that incorporating the possibility of DNA or RNA bulges plays an important role in prediction accuracy. Together, these result in a predictive model that can be used both to predict the cleavage propensity of a new genomic site according to the genomic context and to learn about the importance of different features for CRISPR-Cas9 efficiency and selectivity.
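As a rough illustration of what "features concerning sequence similarity" could look like in practice, the sketch below computes a few simple descriptors for an sgRNA and a candidate genomic site. It is not CRISTA's feature set or code: the function name `site_features`, the chosen descriptors, and the fixed 20-nt spacer plus 3-nt PAM layout are assumptions made for illustration, and the alignment is gap-free, so the DNA or RNA bulges discussed above are not modeled.

```python
def site_features(sgrna: str, site: str) -> dict:
    """Hypothetical helper: simple similarity features for an sgRNA/site pair.

    sgrna: 20-nt spacer, written as DNA or RNA (U is mapped to T).
    site:  23-nt genomic target, assumed to be 20-nt protospacer + 3-nt PAM.
    Assumes a gap-free alignment (no bulges).
    """
    sgrna = sgrna.upper().replace("U", "T")
    site = site.upper()
    if len(sgrna) != 20 or len(site) != 23:
        raise ValueError("expected a 20-nt sgRNA and a 23-nt site")

    protospacer, pam = site[:20], site[20:]

    # 0-based positions where the spacer and protospacer disagree
    mismatch_positions = [
        i for i, (a, b) in enumerate(zip(sgrna, protospacer)) if a != b
    ]
    # Fraction of G/C bases in the protospacer
    gc_content = sum(base in "GC" for base in protospacer) / len(protospacer)

    return {
        "n_mismatches": len(mismatch_positions),
        "mismatch_positions": mismatch_positions,
        "gc_content": gc_content,
        # Canonical SpCas9 PAM is NGG: any base followed by GG
        "pam_is_ngg": pam[1:] == "GG",
    }


if __name__ == "__main__":
    # Arbitrary example sequences, not taken from the paper
    print(site_features("GAGTCCGAGCAGAAGAAGAA", "GAGTCCTAGCAGAAGAAGAAGAG"))
```

A feature dictionary of this kind could be fed, together with genomic and thermodynamic attributes, into a regression model; which learner and features the authors actually used is described in the paper itself.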
Date: 2017
Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005807 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 05807&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1005807
DOI: 10.1371/journal.pcbi.1005807
More articles in PLOS Computational Biology from Public Library of Science