Hands-on learning is the best way to absorb new concepts, and the only way to become an expert. Even if you’re not a scientist, you can gain hands-on CRISPR experience with code.

A GitHub repository called awesome-CRISPR lists a number of CRISPR tools, divided into a few categories:

Guide design tools

Off-target prediction algorithms

Genome editing outcomes and predictions

Screening analysis algorithms

Here, we'll explain what these actually mean, and cover a few Python libraries.

Guide Design

“Guide design” means designing guide RNAs (gRNAs) for use with CRISPR. A guide RNA literally guides the Cas9 enzyme (the “snipping tool”) to the DNA sequence that matches it. You can think of it as a genetic GPS system.

gRNAs need to be designed depending on what you’re trying to do, whether it’s gene knockout, a specific base edit, or modulating a gene’s expression.

For example, DESKGEN has guide libraries covering a number of gRNA needs:

As this paper summarizes:

“The success of the CRISPR/Cas9 genome editing technique depends on the choice of the guide RNA sequence.”

A number of tools exist to help you select a gRNA sequence, which can take inputs like the CRISPR enzyme (e.g. Cas9 or Cas12a), the target genome (e.g. Human GRCh38), and the target (e.g. a raw DNA sequence).

The goal is to select a gRNA sequence with high specificity and efficiency.
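To make the idea of "candidate guides" concrete, here is a minimal sketch of my own, not the algorithm of any tool mentioned in this article. It assumes a Cas9-style enzyme that needs a 20-nt target followed by an NGG PAM, scans both strands of a target sequence, and reports GC content as one rough, commonly used heuristic. The function names and the target sequence are made up for illustration.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def gc_fraction(seq):
    """Fraction of bases in `seq` that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def find_candidate_guides(target, guide_len=20):
    """Yield (strand, start, protospacer, pam) for each NGG site.

    `start` is a 0-based offset on the strand being scanned, so "-" hits
    are reported relative to the reverse complement of `target`.
    """
    target = target.upper()
    for strand, seq in (("+", target), ("-", reverse_complement(target))):
        # Stop early enough that a full protospacer plus 3-base PAM fits.
        for i in range(len(seq) - guide_len - 2):
            pam = seq[i + guide_len : i + guide_len + 3]
            if pam[1:] == "GG":  # the N in NGG can be any base
                yield strand, i, seq[i : i + guide_len], pam


# Hypothetical target sequence, invented for this example.
target = "TTGACCATGCAGTCAGGTACCGATTGGCATTCAGG"
for strand, start, guide, pam in find_candidate_guides(target):
    print(strand, start, guide, pam, f"GC={gc_fraction(guide):.2f}")
```

Real design tools go much further (scoring efficiency, checking the whole genome for off-targets), but the first step usually looks something like this scan.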

One Python library to help you do this is DeepCRISPR, a deep learning based prediction model for sgRNA on-target knockout efficacy and genome-wide off-target cleavage profile prediction.

When I tried to use it, I discovered that the repo seems to have been abandoned. It depends on modules like tensorflow.contrib, which was removed in TF2.0, and on shape modules for core ops that have since moved to C++, so getting it up and running would take a whole load of work.

Never jump into an abandoned repo too early!

Another Python option is pgRNADesign, for designing paired gRNAs to knock out long non-coding RNAs (lncRNAs).

That has many components, so let’s break it down:

A gRNA is an RNA sequence made up of a CRISPR RNA (crRNA) and a tracrRNA. The literature suggests that "paired gRNAs" are more efficient.

“knockout” (or gene knockout) is a technique to make genes inoperative.

A "non-coding RNA" (or ncRNA) is an RNA molecule that's not translated into a protein (it doesn't "code for" one).

A "lncRNA" is a long non-coding RNA: a transcript longer than 200 nucleotides that, again, is not translated into protein.

Understanding these components, we can explain guide design and pgRNADesign more simply:

CRISPR-Cas9 is a GPS (gRNA)-guided pair of molecular scissors, and pgRNADesign helps you make a better GPS, specifically for knocking out long RNA sequences.

To use pgRNADesign, you have to input sgRNA records, with the following data:

| Header | Meaning |
| --- | --- |
| chr | The chromosome of the sgRNA |
| start | The start coordinate |
| end | The end coordinate |
| gene_symbol | The targeting lncRNA/gene IDs of the sgRNA. Note that if an sgRNA can target multiple lncRNAs/genes, use underscore _ to separate multiple IDs. For example, ENSG00000228327_ENSG00000237491. |
| strand | The orientation of the sgRNA |
| short_seq | The sequence of the sgRNA |
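As a rough illustration of what such records look like in code, here is a sketch that reads them with Python's csv module. The column names come from the description above. The tab delimiter, the coordinates, and the sequences are my own assumptions for the example, and the function name is hypothetical, so check pgRNADesign's documentation for the exact file layout.

```python
import csv
import io

# Example input: column names follow the article, while the tab delimiter,
# coordinates, and sequences are invented for illustration.
EXAMPLE = (
    "chr\tstart\tend\tgene_symbol\tstrand\tshort_seq\n"
    "chr1\t1000\t1020\tENSG00000228327_ENSG00000237491\t+\tACGTACGTACGTACGTACGT\n"
    "chr1\t5000\t5020\tENSG00000237491\t-\tTTGGACCATGCAGTCAGGTA\n"
)


def read_sgrna_records(handle):
    """Parse sgRNA records into dicts with typed fields."""
    records = []
    for row in csv.DictReader(handle, delimiter="\t"):
        records.append({
            "chr": row["chr"],
            "start": int(row["start"]),
            "end": int(row["end"]),
            # One sgRNA may target several genes, joined by underscores.
            "gene_ids": row["gene_symbol"].split("_"),
            "strand": row["strand"],
            "short_seq": row["short_seq"].upper(),
        })
    return records


records = read_sgrna_records(io.StringIO(EXAMPLE))
for record in records:
    print(record["chr"], record["start"], record["gene_ids"])
```

Splitting gene_symbol on the underscore up front means the rest of your code can treat every sgRNA as targeting a list of genes, even when that list has one entry.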

Further, you need to input gene information, specifically:

| Column | Meaning |
| --- | --- |
| 1 | lncRNA/gene ID |
| 2 | Names of the lncRNA/gene |
| 3 | The chromosome of the lncRNA/gene |
| 4 | The orientation of the lncRNA/gene |
| 5 | The start coordinate |
| 6 | The end coordinate |
| 7-8 | Currently not used in the program |

Finally, you need to input a gene annotation file in GTF format.
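If you haven't met GTF before: it's a tab-separated text format with nine columns per line (sequence name, source, feature, start, end, score, strand, frame, and a list of key "value" attributes). The sketch below parses a single line. The function name and the example line are hypothetical, and the parser is deliberately naive (it would break on attribute values that contain semicolons).

```python
def parse_gtf_line(line):
    """Parse one GTF data line into a dict; return None for comments/blanks."""
    if line.startswith("#") or not line.strip():
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
    seqname, source, feature, start, end, score, strand, frame, attr_text = fields

    # Attributes look like: gene_id "X"; gene_name "Y";
    attributes = {}
    for item in attr_text.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attributes[key] = value.strip().strip('"')

    return {
        "seqname": seqname,
        "feature": feature,
        "start": int(start),  # GTF coordinates are 1-based and inclusive
        "end": int(end),
        "strand": strand,
        "attributes": attributes,
    }


# Made-up example line; only the gene ID is borrowed from the article.
EXAMPLE_LINE = (
    'chr1\texample\tgene\t1000\t2000\t.\t+\t.\t'
    'gene_id "ENSG00000228327"; gene_name "hypothetical_lncRNA";'
)
print(parse_gtf_line(EXAMPLE_LINE))
```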

As an output, you get the pgRNA locations, detailed information, number of possible pairs in the design, and so on:

| File name | Meaning |
| --- | --- |
| design.bed | The pgRNA locations, in bed format |
| design.txt | All the information of pgRNAs (see below) |
| gene.txt | The summary of lncRNAs/genes, including the number of possible pairs and the number of pairs selected in the current design |
| goodsgrna.bed | All the sgRNAs that pass the quality check. |

In summary: Inputting sgRNA, gene information, and gene annotation into the design algorithm will output pgRNA design files.
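To get a feel for what "number of possible pairs" could mean, here is a toy sketch that groups sgRNAs by the gene they target and lists every unordered pair. This is my own simplification, not pgRNADesign's algorithm, which applies its own quality checks and selection rules. The sgRNA names are made up, and the gene IDs reuse the gene_symbol example from earlier.

```python
from collections import defaultdict
from itertools import combinations


def possible_pairs(sgrnas):
    """Map each gene ID to every unordered pair of sgRNAs that target it.

    `sgrnas` is an iterable of (sgrna_name, gene_ids) tuples.
    """
    by_gene = defaultdict(list)
    for name, gene_ids in sgrnas:
        for gene_id in gene_ids:
            by_gene[gene_id].append(name)
    return {
        gene_id: list(combinations(sorted(names), 2))
        for gene_id, names in by_gene.items()
    }


# Hypothetical sgRNA names, for illustration only.
sgrnas = [
    ("sg1", ["ENSG00000228327"]),
    ("sg2", ["ENSG00000228327", "ENSG00000237491"]),
    ("sg3", ["ENSG00000228327"]),
]
for gene_id, pairs in possible_pairs(sgrnas).items():
    print(gene_id, len(pairs), pairs)
```

A gene with only one usable sgRNA ends up with zero pairs, which is exactly the kind of thing a per-gene summary helps you spot.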

Genome editing outcomes and predictions

For an example of analyzing genome editing experiments and sequencing data, we'll take a look at CRISPResso2.

This Python software pipeline does the following (don't worry, we'll explain this!):

1. Aligns sequencing reads to a reference sequence

2. Quantifies insertions, mutations, and deletions to determine whether a read is modified or unmodified by genome editing

3. Summarizes editing results in plots and datasets

To understand step #1, we first need to understand "sequencing." A sequence is the order in which bases appear in the DNA, and sequencing reads out that order. Sequencing can be used to validate an edit and make sure that edits occur in the target gene, rather than as off-target mutations.

The best-practice method is called "CRISPR amplicon sequencing," where an amplicon is a piece of DNA that's prepared for amplification. In the example below, the target sequence to be amplified is in green.

Amplification means making copies of the amplicon, which can be done in lieu of obtaining the target DNA from a large number of cells (just make copies of one target).

Step 1, therefore, means comparing your sequencing reads to the reference amplicon sequence.

Comparing these brings you to step 2, which tells you whether edits (such as insertions, mutations, or deletions) were made or not. Step 3 simply summarizes and visualizes the output of step 2.
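The three steps can be mimicked in a few lines of Python. This is a toy sketch, not how CRISPResso2 works internally: I'm using the standard library's difflib as a stand-in aligner, the reference and reads are invented, and "replace" blocks are crudely counted as substitutions.

```python
from collections import Counter
from difflib import SequenceMatcher


def count_edits(reference, read):
    """Steps 1 and 2: align `read` to `reference` and count edited bases."""
    counts = Counter()
    matcher = SequenceMatcher(None, reference, read, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "delete":      # bases in the reference missing from the read
            counts["deletion"] += i2 - i1
        elif tag == "insert":    # extra bases present only in the read
            counts["insertion"] += j2 - j1
        elif tag == "replace":   # mismatched stretch, counted as substitutions
            counts["substitution"] += max(i2 - i1, j2 - j1)
    return counts


def summarize(reference, reads):
    """Step 3: tally modified vs. unmodified reads and total edits."""
    summary = Counter()
    for read in reads:
        edits = count_edits(reference, read)
        summary["modified" if edits else "unmodified"] += 1
        summary.update(edits)
    return summary


# Hypothetical reference amplicon and reads, invented for this example.
reference = "ACGTTGCAAGGCTTAC"
reads = [
    reference,             # unedited
    reference,             # unedited
    "ACGTTGCGCTTAC",       # "AAG" removed
    "ACGTTGCCCAAGGCTTAC",  # "CC" added
]
summary = summarize(reference, reads)
percent_modified = 100 * summary["modified"] / len(reads)
print(dict(summary), f"{percent_modified:.0f}% modified")
```

The real pipeline also has to deal with sequencing errors, read quality, and where in the amplicon an edit falls, which is why it asks for more inputs than this sketch does.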

To use CRISPResso2, you have to input four parameters:

1. Editing tool (e.g. Cas9, Cpf1, or base editors). This tells the program how to line up the sequences later.

2. Input sequences in FASTQ (e.g. the base sequence: TTGGAC...)


3. Amplicon sequence. This is the reference amplicon (as described earlier, it's a copy-paste of the target DNA):

AATGTCCCCCAATGGGAAGTTCATCTGGCACTGCCCACAGGTGAGGAGGTCATGATCCCCTTCTGGAGCTCCCAACGGGCCGTGGTCTGGTTCATCATCTGTAAGAATGGCTTCAAGAGGCTCGGCTGTGGTT

4. sgRNA sequence. The sgRNA is the "GPS" that guides the Cas molecular scissors. By inputting the sgRNA sequence, we can infer the predicted position of editing activity.
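To make inputs 2 to 4 more tangible, here is a small sketch that reads FASTQ records (four lines each: an @ header, the bases, a + separator, and quality characters) and estimates where an edit should land in the amplicon. It assumes a Cas9-like nuclease that cuts about 3 bases in from the guide's 3' end, next to the PAM. The function names are mine, and the guide is an illustrative assumption chosen because its reverse complement occurs in the amplicon above. It is not a value given in this article.

```python
import io

# Reference amplicon from the article, split across lines for readability.
AMPLICON = (
    "AATGTCCCCCAATGGGAAGTTCATCTGGCACTGCCCACAGGTGAGGAGGTCATGATCCCC"
    "TTCTGGAGCTCCCAACGGGCCGTGGTCTGGTTCATCATCTGTAAGAATGGCTTCAAGAGG"
    "CTCGGCTGTGGTT"
)
GUIDE = "TGAACCAGACCACGGCCCGT"  # assumed guide, for illustration only


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def parse_fastq(handle):
    """Yield (read_id, sequence, quality) from 4-line FASTQ records."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            break
        sequence = handle.readline().rstrip()
        handle.readline()  # skip the '+' separator line
        quality = handle.readline().rstrip()
        yield header[1:], sequence, quality


def predicted_cut_site(amplicon, guide, offset_from_3prime=3):
    """Return the 0-based index of the first amplicon base after the cut."""
    start = amplicon.find(guide)
    if start != -1:  # guide reads along the forward strand
        return start + len(guide) - offset_from_3prime
    start = amplicon.find(reverse_complement(guide))
    if start != -1:  # guide reads along the reverse strand
        return start + offset_from_3prime
    raise ValueError("guide not found on either strand of the amplicon")


fastq_text = "@read1\nTTGGAC\n+\nIIIIII\n"  # tiny made-up FASTQ record
for read_id, sequence, quality in parse_fastq(io.StringIO(fastq_text)):
    print(read_id, sequence, quality)

cut = predicted_cut_site(AMPLICON, GUIDE)
print("predicted cut between", AMPLICON[cut - 5 : cut], "and", AMPLICON[cut : cut + 5])
```

Knowing the predicted cut position matters because edits near it are likely to be real, while changes far away are more likely to be sequencing noise.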

As an example of what the program will do, if you're running an NHEJ (non-homologous end joining) experiment, you'll get outputs with a format like the following (credits: crispresso2.pinellolab.org):

The above shows the editing frequency of reads by the amount of modified and unmodified alleles.

This sliver shows nucleotide distribution across the amplicon. Those little black bars show you where deletions occurred, while the very small brown bars show where insertions occurred.

Is there an easier way to get started?

If you're not a Python wizard, there are web-based CRISPR tools to get started more easily.

For instance, CRISPRdirect is an "oligo designer." Oligo is short for oligonucleotide: a short single strand of synthetic DNA (or RNA) that serves as the starting point for many CRISPR applications.

Using this tool, you can identify CRISPR/Cas targets by inputting a nucleotide sequence, PAM sequence requirements (e.g. NGG, NRG), and the species (called a specificity check).
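PAM requirements like NGG or NRG use IUPAC nucleotide codes, where N means any base and R means A or G. As a rough sketch of what "PAM sequence requirements" means (my own illustration, not CRISPRdirect's code), you can turn a PAM into a regular expression and look for 20-nt targets that sit right before it. The function names and test sequence are hypothetical, and only the forward strand is scanned.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def pam_to_regex(pam):
    """Translate a PAM written in IUPAC codes into a regex fragment."""
    return "".join(IUPAC[base] for base in pam.upper())


def find_targets(sequence, pam="NGG", guide_len=20):
    """Return (start, target, pam) for forward-strand sites only."""
    # The lookahead lets overlapping sites be reported as separate hits.
    pattern = re.compile(
        r"(?=([ACGT]{%d})(%s))" % (guide_len, pam_to_regex(pam))
    )
    return [
        (match.start(), match.group(1), match.group(2))
        for match in pattern.finditer(sequence.upper())
    ]


# Made-up sequence: 20 bases followed by TAG, which fits NRG but not NGG.
sequence = "ACGT" * 5 + "TAG"
print("NGG:", find_targets(sequence, "NGG"))
print("NRG:", find_targets(sequence, "NRG"))
```

Loosening the PAM from NGG to NRG gives you more candidate targets, which is the trade-off the web tool lets you explore without writing any code.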