Flipping a specific bit from C to T in a human genome

The protein SaCas9 Base Editor 4-Gam (SaBE4-Gam) converts particular “C”s in the genome to “T”s without causing a double-stranded break in the DNA, and protects the DNA if a break does occur. At the core of the protein is a compact variant of Cas9. When coupled with a guide RNA of a particular sequence, the protein can seek out specific DNA sequences in a genome with high fidelity and, once there, bring the rest of its DNA-manipulation machinery to bear. Ultimately the single protein converts a specific C to a T in a mammalian organism’s genome, a few base pairs upstream of the targeted DNA sequence. Older variants of ‘Base Editor 4’ have already been used successfully in many applications, including editing rice genomes.

The paper describing SaBE4-Gam was published at the end of August 2017. It describes the fourth generation of a synthetic protein that involves significantly more engineering than most other synthetic proteins, and demonstrates one way in which biological tools can be both engineered and optimized. The protein contains 4 major functional domains, including Cas9, separated by variously designed linkers, epitope tags or localization signals. Each domain has an independent function, but when physically coupled into a single protein they act in concert to produce a specifically engineered outcome. In this case much of the engineering has focused on ensuring the DNA is edited accurately and specifically, while minimizing the chance of DNA damage from indel accumulation.

The Protein

Cas9 guides the protein

Cas9, from the CRISPR system, guides the entire multi-domain protein to a particular sequence in the genome. Cas9 binds a strand of guide RNA with an exposed sequence that probes genomic DNA for a match. Once a match is found, the protein will persist at that location while the other editing functions are performed.
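The search for a match can be pictured as a string search. Below is a minimal sketch, not the real binding mechanism: it assumes exact matching on a single strand, and it assumes that SaCas9 additionally requires a short NNGRRT motif in the DNA right after the matched site (a detail not taken from this article). The function name and the toy sequences are made up for illustration.

```python
import re

# IUPAC codes needed for the assumed motif: N = any base, R = A or G.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "R": "[AG]"}

def find_target_sites(genome, spacer, pam="NNGRRT"):
    """Return 0-based positions where `spacer` is immediately followed by `pam`.

    Only the strand given in `genome` is searched, and only exact matches count.
    """
    pam_regex = "".join(IUPAC[base] for base in pam)
    # A lookahead reports every match, including overlapping ones.
    pattern = re.compile("(?=" + re.escape(spacer) + pam_regex + ")")
    return [match.start() for match in pattern.finditer(genome)]

# Toy example: the same 21-nt site appears twice, but only the first
# copy is followed by a sequence that fits the assumed NNGRRT motif.
spacer = "ACCATGCAAGTCCGATAGCTA"
genome = "GGTT" + spacer + "TTGAGT" + "CATCAT" + spacer + "TTCCCC"

print(find_target_sites(genome, spacer))  # only the first copy qualifies
```

A real search would also scan the reverse-complement strand and tolerate a few mismatches, which is where off-target binding comes from.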

Functions that Convert C -> T

The list below gives a causal order of events that results in a C->T conversion. However, each component of the protein acts persistently and independently on its surroundings, not necessarily in sequential order:



1) SV40 NLS ensures the entire protein is brought to the nucleus, where it can bind genomic DNA.
2) SaCas9 [D10A] (Nickase) recognizes and then binds DNA that complements a bound strand of guide RNA.
3) SaCas9 [D10A] (Nickase), once bound to DNA, breaks the covalent backbone of (only) one of the two strands of DNA.
4) rApoBEC1 ∆M converts a nearby cytosine (“C” of DNA’s A/C/T/G) to a uracil (“U” being RNA’s version of “T”).
5) The cell’s own DNA repair machinery notices either the now-incorrect U:G pair, the broken DNA strand, or both.
6a) If the cell tries to excise the “U” (as a “U” should only be found in DNA by chemical error), one of the two Uracil Glycosylase Inhibitors will try to intercept and inhibit the repair enzyme at the working site.
6b) If something goes wrong and both DNA strands are cut, Gam will bind to the ends of the cut DNA to prevent their degradation, awaiting non-homologous end joining repair machinery to rejoin the cleaved strands.
7) The cell will try to repair the errant U:G nucleic acid pair by replacing the G on the broken strand, producing a U:A pair.
8) The cell will try to repair the errant U:A nucleic acid pair, replacing it with a T:A pair.
9) Optional: If a researcher wants to ‘see’ where the SaBE4-Gam protein is using a microscope (to make sure it’s being produced and localized properly), they can use the 3x Flag epitope as a unique handle for fluorescent identification.
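Steps 4, 7 and 8 can be traced on a toy stretch of double-stranded DNA. This is only an illustrative sketch: the function names and the 7-base sequence are invented, the two strands are written index-aligned (so the second strand reads 3'->5'), and it assumes the repair in step 7 rewrites the opposite strand using the U-containing strand as the template.

```python
# Watson-Crick partners; U pairs like T, so it templates an A.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C", "U": "A"}

def deaminate(strand, position):
    """Step 4: turn the C at `position` into a U."""
    if strand[position] != "C":
        raise ValueError("no cytosine at position %d" % position)
    return strand[:position] + "U" + strand[position + 1:]

def repair_opposite_strand(template):
    """Step 7: rebuild the opposite strand so every base pairs with `template`."""
    return "".join(COMPLEMENT[base] for base in template)

def replace_uracil(strand):
    """Step 8: swap every U for a T, the base DNA normally uses."""
    return strand.replace("U", "T")

def base_edit(top, position):
    """Run steps 4, 7 and 8 in order and return both final strands."""
    top = deaminate(top, position)        # C:G becomes U:G
    bottom = repair_opposite_strand(top)  # U:G becomes U:A
    top = replace_uracil(top)             # U:A becomes T:A
    return top, bottom

print(base_edit("GATCCAG", 4))  # the C at index 4 ends up as a T paired with an A
```

The sketch runs the steps strictly in sequence, which the real protein does not do. The intermediate states (U:G, then U:A, then T:A) are the same ones described in the list.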

Functional Domains

SV40 NLS

The SV40 NLS (nuclear localization signal) is from Simian virus 40.



PKKKRKV

SaCas9 [D10A] (Nickase)

SaCas9 [D10A] is a strategically mutated version of the Cas9 from Staphylococcus aureus. The mutation prevents both strands of DNA from being cut, so the protein nicks only one strand. It is related to, but smaller than, the original CRISPR Cas9 protein found in Streptococcus pyogenes.



KRNYILGLAIGITSVGYGIIDYETRDVIDAGVRLFKEANVENNEGRRSKRGARRLKRRRRHRIQRVKKLLFDYNLLTDHSELSGINPYEARVKGLSQKLSEEEFSAALLHLAKRRGVHNVNEVEEDTGNELSTKEQISRNSKALEEKYVAELQLERLKKDGEVRGSINRFKTSDYVKEAKQLLKVQKAYHQLDQSFIDTYIDLLETRRTYYEGPGEGSPFGWKDIKEWYEMLMGHCTYFPEELRSVKYAYNADLYNALNDLNNLVITRDENEKLEYYEKFQIIENVFKQKKKPTLKQIAKEILVNEEDIKGYRVTSTGKPEFTNLKVYHDIKDITARKEIIENAELLDQIAKILTIYQSSEDIQEELTNLNSELTQEEIEQISNLKGYTGTHNLSLKAINLILDELWHTNDNQIAIFNRLKLVPKKVDLSQQKEIPTTLVDDFILSPVVKRSFIQSIKVINAIIKKYGLPNDIIIELAREKNSKDAQKMINEMQKRNRQTNERIEEIIRTTGKENAKYLIEKIKLHDMQEGKCLYSLEAIPLEDLLNNPFNYEVDHIIPRSVSFDNSFNNKVLVKQEENSKKGNRTPFQYLSSSDSKISYETFKKHILNLAKGKGRISKTKKEYLLEERDINRFSVQKDFINRNLVDTRYATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIANADFIFKEWKKLDKAKKVMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKDFKDYKYSHRVDKKPNRELINDTLYSTRKDDKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMYHHDPQTYQKLKLIMEQYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNAHLDITDDYPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCYEEAKKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITYREYLENMNDKRPPRIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQIIKKG

rApoBEC1

rApoBEC1 is the rat version of human ApoBEC1, a cytidine deaminase that edits the messenger RNA of Apolipoprotein B.



TNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKML

Uracil Glycosylase Inhibitor

UGI is from Bacillus subtilis.



TNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKML

Gam

Gam is from Bacteriophage Mu.



MAKPAKRIKSAAAAYVPQNRDAVITDIKRIGDLQREASRLETEMNDAIAEITEKFAARIAPIKTDIETLSKGVQGWCEANRDELTNGGKVKTANLVTGDVSWRVRPPSVSIRGMDAVMETLERLGLQRFIRTKQEINKEAILLEPKAVAGVAGITVKSGIEDFSIIPFEQEAGI

Flag

Flag is a synthetic peptide that was patented by Sigma-Aldrich in 1987.



DYKDDDDK

gRNA

Part of the guide RNA’s sequence is a fixed scaffold that is recognized by the Cas9 protein and is specific to each variant of Cas9. Each Cas9 variant additionally requires a short recognition sequence called a “protospacer adjacent motif” (PAM), which lies in the target DNA next to the matched site rather than in the RNA. The second part of the RNA sequence is arbitrary, and is what specifies where Cas9 will bind to the DNA in the genome. With 20 or more nucleotides in the gRNA’s variable region, at 2 bits per nucleotide, a single sequence registers at least 40 bits of information. That is more than the ~32 bits needed to address every position in a human genome (~3 Gbp), so it is generally enough to uniquely identify a section of it. An example SaCas9 gRNA might be as follows:

[XXXXXXXXXXX][ACCG][NNNNNNNNNNNNNNNNNNNNN]
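The 40-bit figure can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch that assumes each nucleotide is an independent choice among 4 bases (2 bits) and treats the genome as random sequence, which real genomes are not. The function names are made up for illustration.

```python
import math

def bits_in_variable_region(length_nt):
    """Each nucleotide is one of 4 bases, so it carries log2(4) = 2 bits."""
    return 2 * length_nt

def bits_to_address(genome_size_bp):
    """Bits needed to give every position in the genome its own label."""
    return math.log2(genome_size_bp)

def min_unique_length(genome_size_bp):
    """Shortest variable region whose bit count covers the whole genome."""
    return math.ceil(bits_to_address(genome_size_bp) / 2)

human_genome_bp = 3e9

print(bits_in_variable_region(20))                 # 40 bits carried by 20 nt
print(round(bits_to_address(human_genome_bp), 1))  # about 31.5 bits needed
print(min_unique_length(human_genome_bp))          # 16 nt under these assumptions
```

Because genomes contain repeats and near-duplicates, the extra bits of a 20+ nucleotide region are a useful margin rather than a guarantee of uniqueness.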

In the case of SaBE4-Gam, whichever ‘C’ is ~6 base pairs in from the end of the gRNA (in the XXX region) will be flipped in the genome to a corresponding T.

Articles:

Addgene blogpost about “Single Base Editing with CRISPR”

Harvard’s “Base Editor” Could Be the Gene-Editing Technique That Answers CRISPR’s Problems

Papers

Komor et al. (2016)

Kim et al. (2017)