Chimpanzee iPSC line 8919 was generated by Applied StemCell (Menlo Park, CA) from S008919 primary fibroblasts (Yerkes Primates, Coriell) using an integration-free episomal protocol, as described in Field et al., bioRxiv 232553; doi: https://doi.org/10.1101/232553. A normal 48,XX karyotype was confirmed through passage 32 by Cell Line Genetics or the Coriell Institute for Medical Research. A male gorilla iPSC line, 00053-cA3, was a gift from Carol Marchetto and Fred Gage and is described in (). This line was not further validated in our lab. Chimpanzee and gorilla iPSCs were maintained under feeder-free conditions on Matrigel (Corning) in mTeSR-1 (STEMCELL Technologies).

U2OS cells and U2OS-JAG2 (Myc-tagged) cells (gifts of Arjan Groot and Marc Vooijs, MAASTRO lab, Maastricht University) were cultured in DMEM, 4.5 g/l glucose + GlutaMax, 10% HIFBS and 1x P/S. U2OS-JAG2 cells were additionally supplemented with 2 μg/ml puromycin. OP9 cells and OP9-DLL1 cells (gifts of Bianca Blom, Academic Medical Center Amsterdam) were cultured in MEMα without nucleosides (Thermofisher), 2 mM L-glutamine, 20% HIFBS, 100 μM 2-mercaptoethanol and 1x P/S. For routine culture, cells were passaged every 3-4 days using 0.25% Trypsin (Thermofisher) + 0.5 mM EDTA (Sigma) in PBS at split ratios of 1:8 to 1:10 (U2OS) or 1:4 to 1:6 (OP9).

Lymphoblastoid cell lines NA19240 and NA12877 were obtained from Coriell (coriell.org). The CHM1hTert hydatidiform mole cell line was kindly provided by Dr. Urvashi Surti of the Magee-Womens Hospital and Magee-Womens Research Institute. Primary fibroblasts derived from individuals with 1q21.1 CNVs were obtained from the Simons VIP collection (www.sfari.org/resources/simons-vip/). Lymphoblastoid cell lines were grown as suspension cultures in RPMI medium (ThermoFisher) with GlutaMax (ThermoFisher) and 10% Gibco fetal bovine serum (FBS, ThermoFisher), as recommended by Coriell. Primary fibroblasts were grown as adherent cultures in MEM Alpha with nucleic acids (ThermoFisher) with 10% Gibco FBS (ThermoFisher) and 1% Pen-Strep (ThermoFisher). CHM1 was grown as an adherent culture in Amnio-MAX C-100 Basal Medium (17001-074, ThermoFisher) with Amnio-MAX C-100 Supplement (12556-023, ThermoFisher). The human samples and data used in this study were determined to be exempt from UCSC IRB review because they were de-identified and part of an IRB-approved study in which participants consented to additional research of the type we performed.

H9 human embryonic stem cells (hESCs; female, WA09, WiCell Research Institute) were grown on MEF feeder layers in W0 medium: DMEM/F12 (Thermofisher) with 20% KnockOut serum replacement (KOSR, Thermofisher), 2 mM L-glutamine (Thermofisher), 1x non-essential amino acids (NEAA, Thermofisher), 100 μM 2-mercaptoethanol (Thermofisher) and 1x P/S (Thermofisher). W0 was freshly supplemented daily with 8 ng/ml FGF2 (Sigma). Cells were manually passaged every 5-6 days, when colonies reached approximately 2 mm in diameter. Mitomycin-C-treated mouse embryonic fibroblasts (MEFs, GlobalStem) were seeded on 0.1% gelatin-coated plates at a density of 35,000 cells/cm². MEFs were cultured in DMEM, 4.5 g/l glucose + GlutaMax (Thermofisher), 10% heat-inactivated fetal bovine serum (HIFBS, Thermofisher), 1x Penicillin/Streptomycin (P/S, Thermofisher) and 1x sodium pyruvate (NaPyr, Thermofisher). The karyotype of our H9 culture was confirmed by Cell Line Genetics before initiating CRISPR experiments.

Method Details

Mouse ESC stable cell line generation and organoid differentiation To generate stable cell lines, 46C cells were seeded on 100 mm plates and transfected with 24 μg of linearized pCIG-NOTCH2NL(Sh,T197I)-ires-GFP or empty pCIG-ires-GFP vector using Lipofectamine 2000 (Thermofisher). After 36 hours, GFP-positive cells were sorted using a FACSAria III (BD Biosciences) and recovered for further culture. After 4 passages, sorting was repeated, and GFP-positive cells that had stably integrated the plasmid DNA into their genome were recovered for expansion and further culture. We verified continued stable expression of NOTCH2NL(Sh,T197I)-ires-GFP or empty vector (Figure S4). Mouse 46C ESC organoid differentiation was performed as described previously (Eiraku et al., 2008). Briefly, cells were seeded in ultra-low-attachment U-shaped 96-well plates (Corning) at 6,000 cells per well in mouse ESC medium without LIF, supplemented with 3 μM IWR-1-Endo (Sigma) and 10 μM SB431542 (Sigma). Medium was replaced every other day. At day 7, medium was changed to Neurobasal/N2 medium. Three pools of 16 organoids were collected in TRIzol after 6 days of differentiation for EV and NOTCH2NL(Sh,T197I) organoids.

Human cortical organoid differentiation For organoid differentiation, medium was replaced with W0 medium + 1x NaPyr without FGF2 (Differentiation medium). Colonies of 2-3 mm in diameter were manually lifted using a cell lifter and transferred to an ultra-low-attachment 60 mm dish (Corning). After 24 hours (day 0), embryoid bodies had formed, and 50% of the medium was replaced with Differentiation medium supplemented with small-molecule inhibitors and recombinant proteins to the following final concentrations: 500 ng/ml DKK1 (PeproTech), 500 ng/ml NOGGIN (R&D Systems), 10 μM SB431542 (Sigma) and 1 μM Cyclopamine V. californicum (VWR). Medium was then replaced every other day until harvest. On day 8, organoids were transferred to ultra-low-attachment U-shaped-bottom 96-well plates (Corning). On day 18, medium was changed to Neurobasal/N2 medium: Neurobasal (Thermofisher), 1x N2 supplement (Thermofisher), 2 mM L-Glutamine and 1x P/S, supplemented with 1 μM Cyclopamine. From day 26 on, Cyclopamine was omitted. Organoids were harvested in TRIzol at weekly time points. Total-transcriptome strand-specific RNA sequencing libraries were generated from Ribo-Zero (Epicentre)-depleted total RNA, using dUTP for second-strand synthesis. Double-stranded cDNA was used for library preparation following the Low Throughput guidelines of the TruSeq DNA Sample Preparation kit (Illumina). For organoid formation from H9 hESC CRISPR/Cas9 NOTCH2NL knockout lines, an updated protocol was used: Differentiation medium was supplemented with 10 μM SB431542 (Sigma), 1 μM Dorsomorphin (Sigma), 3 μM IWR-1-Endo (Sigma) and 1 μM Cyclopamine (Sigma). Medium was then replaced every other day until harvest. On day 4, 60 mm dishes with organoids were placed on a hi/lo rocker in the incubator. From day 18 on, medium was replaced with Neurobasal/N2 medium. From day 24 on, Cyclopamine was omitted. Three pools of 5-10 organoids per condition were harvested in TRIzol at day 28 for RNA extraction.

RNA-Sequencing Analysis Paired-end Illumina reads were trimmed from the 3′ end of read 1 and read 2 to 100 × 100 bp for human. Bowtie2 v2.2.1 (Langmead and Salzberg, 2012) was used with the “--very-sensitive” parameter to filter reads against the RepeatMasker library (http://www.repeatmasker.org); matching reads were removed from further analysis. STAR v2.5.1b (Dobin et al., 2013) was used to map RNA-seq reads to the human reference genome GRCh37. STAR was run with default parameters except for --outFilterMismatchNmax 999, --outFilterMismatchNoverLmax 0.04, --alignIntronMin 20, --alignIntronMax 1000000 and --alignMatesGapMax 1000000. STAR alignments were converted to per-position genomic coverage with the bedtools command genomeCoverageBed -split (Quinlan and Hall, 2010). DESeq2 v1.14.1 (Love et al., 2014) was used to provide baseMean expression values and differential expression analysis across the time course. Total coverage for a gene was converted to read counts by dividing it by N+N (100+100), since each paired-end N×N mapped read induces a total coverage of N+N across its genomic positions. Results are in Table S2 and data are available from GEO: GSE106245.
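The coverage-to-count conversion described in the RNA-Sequencing Analysis paragraph can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used here: it assumes aligned blocks are available as half-open (start, end) intervals, counted the way `genomeCoverageBed -split` counts them (spliced gaps add no coverage), and the helper names `per_base_coverage` and `gene_read_count` are hypothetical.

```python
from collections import Counter


def per_base_coverage(aligned_blocks):
    """Count coverage per genomic position from half-open (start, end) blocks.

    With split counting, a spliced read contributes one block per aligned
    segment, so intronic gaps add no coverage.
    """
    cov = Counter()
    for start, end in aligned_blocks:
        for pos in range(start, end):
            cov[pos] += 1
    return cov


def gene_read_count(cov, gene_start, gene_end, len_read1=100, len_read2=100):
    """Convert summed coverage over [gene_start, gene_end) into read pairs.

    Each mapped N x N pair adds N + N bases of coverage in total, so the
    summed coverage divided by (N + N) approximates the number of pairs.
    """
    total = sum(cov[pos] for pos in range(gene_start, gene_end))
    return total / (len_read1 + len_read2)


# Two hypothetical 100 x 100 pairs; read 1 of pair 2 is spliced (40 + 60 bp).
blocks = [
    (1000, 1100), (1300, 1400),                # pair 1
    (1050, 1090), (2000, 2060), (2100, 2200),  # pair 2
]
cov = per_base_coverage(blocks)
print(gene_read_count(cov, 0, 5000))  # 2.0
```

Pairs that straddle a gene boundary contribute only part of their coverage, which is why the converted values are approximate counts rather than integer read tallies.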
For mouse cortical organoids and H9∗ and H9NOTCH2NLΔ organoid samples, RNA was isolated according to the standard TRIzol protocol. RNA was treated with DNaseI (Roche) according to the standard protocol for DNA clean-up in RNA samples, then purified on columns (Zymo RNA Clean & Concentrator-5) and stored at −80°C. For RNA sequencing, mRNA was isolated from total RNA by polyA selection using the Dynabeads mRNA DIRECT Micro Purification Kit (Thermofisher). Libraries were prepared using the strand-specific Ion Total RNA-Seq Kit v2 (Thermofisher), with the Ion Xpress RNA-Seq Barcode 1-16 Kit (Thermofisher) to label different samples. The samples were sequenced on an Ion Proton system (Thermofisher), generating single-end reads of around 100 bp. RNA sequencing data were processed using the Tuxedo package according to the ThermoFisher protocol for Ion Proton data, with the following parameters. Reads were trimmed using Trimmomatic (0.36) with LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:25. Reads were then mapped using STAR (2.4.0) with --outStd SAM --outReadsUnmapped Fastx --chimSegmentMin 18 --chimScoreMin 12 and Bowtie2 (2.3.3.1) with --local --very-sensitive-local -q --mm, and the output BAM files of the two tools were merged. ENSEMBL hg38 release 84 was used as the reference. Raw read counts per gene were generated with htseq-count (0.6.1p1) -t exon -i exon_id -q. DESeq2 (2.11.39, Galaxy) was used to normalize read counts and perform pairwise statistical analysis to identify significantly differentially expressed genes (p-adj < 0.05). For mouse data, the same processing was used with the mm10 genome. Results are in Table S2.
For comparison of week 4 H9∗ and H9NOTCH2NLΔ organoid data to the previously established H9 organoid timeline, the following procedure was used. The top 250 upregulated and top 250 downregulated genes between week 4 H9∗ and H9NOTCH2NLΔ, ranked by p-adj, were selected. The matching expression profiles of these 500 genes were extracted from the H9 organoid timeline, yielding 361 genes expressed in both datasets. The expression values in week 4 H9∗ and H9NOTCH2NLΔ and in H9 Week 3, Week 4 and Week 5 were sorted from high to low and ranked 1 to 361. Spearman's rank correlation was then calculated for every pair of samples and plotted using MultiExperiment Viewer. 212 genes showed a shift toward better correlation with Week 5 data in H9NOTCH2NLΔ compared to H9∗. These 212 genes were subjected to GO analysis using PANTHER V13.0 (Mi et al., 2017). A selection of genes from the significantly associated term "neuron differentiation" was plotted in a heatmap. Z-scores were calculated for the Week 4 H9∗ and H9NOTCH2NLΔ samples and for H9 Week 3, Week 4 and Week 5.
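The gene selection, rank correlation and Z-score steps of the timeline comparison can be sketched with NumPy. This is a hypothetical illustration rather than the analysis code used here: the input layouts (a list of `(gene, log2fc, padj)` tuples and a `{sample: {gene: value}}` dict) and all function names are assumptions, ties in ranking are broken by input order, and Spearman's rho is computed as Pearson's correlation of the ranks.

```python
import numpy as np


def select_genes(de_results, timeline_genes, n_each=250):
    """Top n_each up- and downregulated genes by p-adj, kept if in the timeline."""
    up = sorted((r for r in de_results if r[1] > 0), key=lambda r: r[2])[:n_each]
    down = sorted((r for r in de_results if r[1] < 0), key=lambda r: r[2])[:n_each]
    return [gene for gene, _, _ in up + down if gene in timeline_genes]


def rank_high_to_low(values):
    """Rank 1 = highest value; ties are broken by input order."""
    order = np.argsort(-np.asarray(values, dtype=float), kind="stable")
    ranks = np.empty(len(order))
    ranks[order] = np.arange(1, len(order) + 1)
    return ranks


def spearman_matrix(expr, samples, genes):
    """Pairwise Spearman correlation: Pearson correlation of per-sample ranks."""
    ranked = np.array([rank_high_to_low([expr[s][g] for g in genes]) for s in samples])
    return np.corrcoef(ranked)  # rows are samples


def row_zscores(matrix):
    """Z-score each gene (row) across samples (columns) for a heatmap."""
    m = np.asarray(matrix, dtype=float)
    return (m - m.mean(axis=1, keepdims=True)) / m.std(axis=1, keepdims=True)


# Toy data: samples A and B rank genes identically, C in reverse order.
expr = {
    "A": {"g1": 1.0, "g2": 5.0, "g3": 9.0},
    "B": {"g1": 2.0, "g2": 4.0, "g3": 8.0},
    "C": {"g1": 9.0, "g2": 5.0, "g3": 1.0},
}
rho = spearman_matrix(expr, ["A", "B", "C"], ["g1", "g2", "g3"])
```

A gene with identical values in every sample has zero standard deviation, so `row_zscores` would return NaN for that row; such genes carry no information for the heatmap anyway.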

Organoid immunofluorescence staining Organoids were collected in an Eppendorf tube, washed 3 times in PBS, then fixed in 3.8% PFA/PBS for 10 minutes. Organoids were washed 3 times in PBS again, followed by incubation in 30% sucrose/PBS overnight at 4°C. Organoids were embedded in cryomolds with Tissue Freezing Medium (VWR) or Shandon Cryomatrix (ThermoFisher) and stored at −80°C. 16 μm cryosections (Leica CM3050S) were collected on SuperFrost Plus slides (VWR) and stored at −80°C until use. For immunostaining, sections were thawed and washed 3 times for 3 minutes in PBS. Sections were post-fixed for 10 minutes in 3.8% PFA, followed by 3 washes of 3 minutes in PBS. Sections were incubated in blocking solution (3% BSA + 0.1% Triton in PBS) at room temperature for 3-4 hours. Primary antibodies were diluted in blocking solution at the dilutions listed below and incubated overnight at 4°C. Sections were washed 3 times for 5 minutes in PBS, then incubated with secondary antibodies diluted 1:1000 in 0.1% Triton/PBS for 1 hour at room temperature. Slides were then washed 3 times for 5 minutes with PBS, mounted with SlowFade+DAPI solution (Invitrogen) and stored at 4°C. Alternatively, after secondary antibody incubation, slides were washed 2 times for 5 minutes in PBS, incubated in DAPI solution for 5 minutes, then washed 2 more times for 5 minutes; 3 drops of FluorSave (Merck Millipore) were added and slides were sealed with coverslips and nail polish. Imaging was done at least 24 hours after storing the slides at 4°C. Primary antibody dilutions: anti-SOX2 1:1000, rabbit anti-CTIP2 1:1000, rat anti-CTIP2 1:250, anti-PAX6 1:200, anti-TBR1 1:500. Secondary antibody dilutions: anti-rabbit 488 1:1000, anti-mouse 488 1:1000, anti-rabbit Cy3 1:1000, anti-rat Cy3 1:1000. Antibody details can be found in the Key Resources Table.

Co-immunoprecipitation and immunoblot pCIG-NOTCH2-Myc and pCAG-NOTCH2NL-HA(Sh) + pCAG-NOTCH2NL-HA(L,T197I) were mixed in equimolar ratios and transfected using Lipofectamine 2000 (Thermofisher). For control conditions, pCIG-EV and pCAG-EV were used in equimolar ratios. Medium was replaced 6 hours after transfection and again 24 hours later. Cells were harvested 48 hours after transfection. Cells were washed 3 times with cold 1x PBS, then incubated for 40 minutes in IP buffer (50 mM Tris-HCl, 150 mM NaCl, 5 mM MgCl2, 0.5 mM EDTA, 0.2% NP-40, 5% glycerol, supplemented with cOmplete EDTA-free protease inhibitor cocktail (Sigma)). Cells were lysed by passing the cell suspension through a 27 ¾ gauge needle 10 times. Lysate was centrifuged for 10 minutes at 4°C, and the supernatant was transferred to a fresh 1.5 ml tube. 2 μg of one specific antibody (anti-HA Abcam ab9110, anti-Myc Abcam ab9E10, anti-His Abcam ab9108, anti-NOTCH2 SCBT sc25-255) was added and incubated overnight at 4°C on a rotating wheel. DynaBeads were blocked with 3 washes of 1x PBS + 0.5% BSA, added to the IP samples, and incubated for 3 hours at 4°C with rotation. Using a magnetic separator, samples were washed 2 times in cold IP buffer, then eluted in Tris-EDTA buffer and transferred to new 1.5 ml tubes. 2x Laemmli buffer + DTT was added 1:1 prior to SDS-PAGE. Samples were loaded on 4%–20% Tris-glycine gels (Bio-Rad), followed by blotting onto nitrocellulose membranes according to the manufacturer's recommended protocol. Membranes were blocked in 5% skim milk powder in 1x PBS + 0.05% Tween or 1x TBS + 0.1% Tween. Primary antibodies were incubated for 3 hours at room temperature in 1x PBS (anti-NOTCH2 sc25-255) or 1x TBS-T (other antibodies), followed by 3 washes in 1x PBS-T (anti-NOTCH2 sc25-255) or 1x TBS-T (other antibodies). Secondary antibodies (anti-Rabbit-HRP 65-6120, anti-Mouse-HRP 62-6520, Thermofisher) were incubated for 60 minutes at room temperature, followed by 3 more washes in 1x PBS-T or 1x TBS-T.
Membranes were incubated with SuperSignal West Dura ECL substrate (Thermofisher) and imaged using a Bio-Rad ChemiDoc imager. For experiments with pCAG-NOTCH2NL-His, pCIG-NOTCH2-Myc, pCIG-PDGFRB-Myc and pCIG-EGFR-Myc, the same protocol was used with equimolar mixes of plasmid DNA. For immunoprecipitation of NOTCH2NL(Sh,T197I) from mouse 46C ESCs, the same protocol was used, and protein was isolated from the medium using the NOTCH2 sc25-255 antibody. To analyze the presence of secreted NOTCH2NL in NOTCH2NL-conditioned medium, the medium was collected after 32 hours and used for immunoprecipitation with a NOTCH2 antibody specific for the N-terminal region. The isolated protein samples were analyzed by immunoblot, confirming the presence of secreted NOTCH2NL in the medium (Figure 6D). NOTCH2NL(Sh,T197I) is detected as two bands, which likely represent the glycosylated (upper band) and unmodified (lower band) forms of the protein. This pattern was also observed with ectopic expression of N-terminal fragments of the NOTCH3 receptor (Duering et al., 2011).

NOTCH reporter co-culture assays U2OS cells were seeded at a density of 425,000 cells per well of a 6-well plate for transfection. In parallel, U2OS control or U2OS-JAG2 cells were seeded at a density of 110,000 cells per well of a 12-well plate for co-culture. After 24 hours, U2OS cells in 6-well plates were transfected with the following amounts of plasmid DNA per well. For control conditions: 500 ng pGL3-UAS, 33.3 ng pRL-CMV, 16.7 ng pCAG-GFP, 200 ng pcDNA5.1-NOTCH2-GAL4, 167 ng pCAG-EV and 273 ng pBluescript. For conditions including NOTCH2NL: 500 ng pGL3-UAS, 33.3 ng pRL-CMV, 16.7 ng pCAG-GFP, 200 ng pcDNA5.1-NOTCH2-GAL4, 200 ng pCAG-NOTCH2NL and 240 ng pBluescript. The plasmid DNA mix was transfected using polyethylenimine (PEI, linear, MW 25000, Polysciences). All amounts were scaled accordingly for multiple transfections; for larger experiments, cells were seeded and transfected in T25 flasks or on 100 mm plates, with amounts scaled according to surface area. 6 hours after transfection, cells in 6-well plates were detached with 0.5 ml of 0.25% Trypsin and 0.5 mM EDTA in PBS per well for 2 minutes at 37°C and resuspended in culture medium to a total volume of 7 ml. Medium was removed from the 12-well plates, and 1 ml of transfected cell suspension was added to each well for co-culture. 10 μM dibenzazepine (DBZ) was added to selected control wells. After 24 hours, medium was removed and cells were washed once with PBS. Cells were incubated in 150 μl of 1x passive lysis buffer (PLB, Promega) on an orbital shaker for 15 minutes. Lysates were stored at −80°C until analysis. In OP9 and OP9-DLL1 co-cultures, 80,000 cells were seeded per well of a 12-well plate. For generating conditioned medium, U2OS cells were seeded on 100 mm plates and PEI-transfected with 2000 ng of pCIG-EV, or 2400 ng of NOTCH2NLA or NOTCH2NLB, with an additional 10000 ng or 9600 ng, respectively, of pBluescript as carrier DNA. Medium was replaced 6 hours after transfection.
32 hours after transfection, medium was collected, passed through a 0.2 μm filter and used the same day. Experiments using conditioned medium were done as described above, except that cells were seeded on plates coated with 0.25% gelatin and 0.1% BSA. For transfection of the reporter U2OS cells, the pCAG-EV and NOTCH2NL plasmids were omitted from the plasmid DNA mix and replaced by pBluescript; instead, transfected cells were resuspended and seeded in conditioned medium harvested from other cells. For DLL4 assays, 24-well plates were coated overnight at 4°C with 150 μl of 5 μg/ml rDLL4 (R&D Systems), 0.25% gelatin and 0.1% BSA in PBS. Control plates were coated with 0.25% gelatin and 0.1% BSA in PBS only. U2OS cells were transfected and seeded according to the co-culture protocol described above, except that 0.5 ml of cell suspension was used for each well of the coated 24-well plates. NOTCH-GAL4 and reporter constructs were kindly provided by Arjan Groot and Marc Vooijs (MAASTRO lab, Maastricht University).
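The reporter transfection mixes keep total DNA constant by topping up with pBluescript carrier: both per-well mixes sum to 1,190 ng, and both conditioned-medium mixes sum to 12,000 ng (2,000 + 10,000 or 2,400 + 9,600). A small Python helper along these lines can compute the carrier amount and scale a mix to several wells. The function names are hypothetical, and the 1,190 ng total is derived from the amounts listed above rather than stated explicitly.

```python
def pad_with_carrier(plasmids_ng, total_ng, carrier="pBluescript"):
    """Add carrier DNA so the mix sums to total_ng (all amounts in ng)."""
    carrier_ng = total_ng - sum(plasmids_ng.values())
    if carrier_ng < 0:
        raise ValueError("plasmid amounts exceed the total DNA per transfection")
    mix = dict(plasmids_ng)
    mix[carrier] = round(carrier_ng, 1)
    return mix


def scale_mix(mix, n):
    """Scale a per-well mix to n transfections (rounded to 0.1 ng)."""
    return {name: round(ng * n, 1) for name, ng in mix.items()}


# Per-well reporter components shared by both conditions (6-well format).
reporter = {
    "pGL3-UAS": 500.0,
    "pRL-CMV": 33.3,
    "pCAG-GFP": 16.7,
    "pcDNA5.1-NOTCH2-GAL4": 200.0,
}
control = pad_with_carrier({**reporter, "pCAG-EV": 167.0}, total_ng=1190.0)
notch2nl = pad_with_carrier({**reporter, "pCAG-NOTCH2NL": 200.0}, total_ng=1190.0)
print(control["pBluescript"], notch2nl["pBluescript"])  # 273.0 240.0
six_wells = scale_mix(control, 6)
```

Holding total DNA constant keeps the PEI:DNA ratio identical between conditions, so differences in reporter signal are less likely to reflect transfection efficiency.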

RT-PCR characterization of primate NOTCH2NL fusion genes For amplification and detection of potential fusion transcripts, the QIAGEN OneStep RT-PCR kit was used according to the manufacturer's protocol. 25 ng of total RNA isolated from gorilla iPSCs, chimpanzee iPSCs or human H9 ESCs was used per reaction. Primers used in these reactions were:

N2NL_Fw1_exon1: CGCTGGGCTTCGGAGCGTAG

N2NL_Rv2_exon5: CCAGTGTCTAATTCTCATCG

PDE4DIP_Fw2_exon24: ACACCATGCTGAGCCTTTGC

PDE4DIP_Fw1_exon27: AAGGCCCAGCTGCAGAATGC

MAGI3_Fw1_exon1: GGGTTCGGGATGTCGAAGAC

MAGI3_Fw2_exon10: GCAACTGTGTCCTCGGTCAC

MAGI3_Fw3_exon14: GGGAGCAGCTGAGAAAGATG

TXNIP_Fw1_exon1: CAGTTCCATCATGGTGATG

BRD9_Fw2_exon10: ACGCTGGGCTTCAAAGACG

BRD9_Fw1_exon12: GCAGGAGTTTGTGAAGGATGC

Oligo capture library generation To enrich whole-genome sequencing libraries for cost-effective deep sequencing of the NOTCH2NL loci, a custom MYcroarray MyBaits oligonucleotide library was developed. 100 bp probes were designed spaced 50 bp apart in chr1:145,750,000-149,950,000, ignoring repeat-masked bases, for a total of 20,684 probes. A further 8,728 probes were created in the three NOTCH2NL loci by tiling with 50 bp overlaps, ignoring repeat masking but dropping any probes with very low complexity. 17,866 probes were added at every singly unique nucleotide (SUN) position, tiling at 5 bp intervals from −75 bp to +75 bp around the SUN. SUN positions are single-nucleotide substitutions that mark individual paralogs. To capture population diversity and ensure even enrichment, the reference base was replaced at every SNP in the NA12878 Genome in a Bottle variant call set, and probes were tiled in the same fashion as for the SUNs. Finally, to reach the required 60,060 probes, a random 347 probes were dropped.
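The tiling schemes in the probe design can be illustrated with a short Python sketch. It is not the probe-design code used for the MyBaits library and rests on stated assumptions: repeat-masked bases are represented as lowercase (soft-masked) sequence; the probe step is a parameter, because "spaced 50 bp apart" could mean a 150 bp step between 100 bp probe starts while "50 bp overlaps" corresponds to a 50 bp step; and "from −75 bp to +75 bp around the SUN" is read as probes tiled every 5 bp inside the 151 bp window centered on the SUN, so every probe contains it. The helper names are hypothetical.

```python
PROBE_LEN = 100


def tile_unmasked(seq, step, probe_len=PROBE_LEN):
    """Tile probes every `step` bp, skipping any probe that overlaps a
    soft-masked (lowercase) base. Returns (start, probe) pairs."""
    probes = []
    for start in range(0, len(seq) - probe_len + 1, step):
        probe = seq[start:start + probe_len]
        if not any(base.islower() for base in probe):
            probes.append((start, probe))
    return probes


def tile_around_site(seq, site, flank=75, step=5, probe_len=PROBE_LEN, alt_base=None):
    """Tile probes every `step` bp inside [site - flank, site + flank].

    Every returned probe contains `site`. If alt_base is given, it replaces
    the base at `site` (as for probes built on the alternate allele of a SNP).
    """
    if alt_base is not None:
        seq = seq[:site] + alt_base + seq[site + 1:]
    probes = []
    # Last start keeps the probe end at or before site + flank.
    for start in range(site - flank, site + flank - probe_len + 2, step):
        if start >= 0 and start + probe_len <= len(seq):
            probes.append((start, seq[start:start + probe_len]))
    return probes


# 200 bp unmasked, 100 bp soft-masked repeat, 200 bp unmasked.
seq = "ACGT" * 50 + "acgt" * 25 + "ACGT" * 50
overlapping = tile_unmasked(seq, step=50)
site_probes = tile_around_site("A" * 300, site=150, alt_base="G")
```

Under this reading each SUN or SNP receives 11 probes (starts at −75, −70, ..., −25 relative to the site), each carrying the site at a different offset, which tends to even out capture across the variant.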

Library Preparation and Enrichment of 10x Chromium Libraries High-molecular-weight DNA was processed into Illumina sequencing libraries using the Chromium Genome Reagent Kit V2 chemistry according to the recommended protocol (CG00022 Genome Reagent Kit User Guide RevC) and enriched using the custom MyBaits oligonucleotide probes described above (Figure S1). Briefly, high-molecular-weight (HMW) gDNA was isolated from cultured cells using a MagAttract kit (QIAGEN) and quantified with Qubit. HMW DNA was partitioned into emulsion droplets along with DNA-barcode-containing gel beads and an amplification reaction mixture. After the molecules were barcoded within the emulsion, Illumina sequencing adaptors were added by ligation. In preparation for hybridization with the MyBaits probes, Illumina adaptor sequences were blocked with complementary oligonucleotides. Biotinylated probes were hybridized overnight at 65°C and captured using streptavidin-coated MyOne C1 beads (Invitrogen). The final enriched libraries were amplified using an Illumina Library Amplification Kit (Kapa).

Sequencing of Enriched 10x Chromium Libraries The MYcroarray probes described above were used to enrich 10x Genomics sequencing libraries for three well-studied individuals (NA19240, NA12877 and CHM1), the H9 ESC line, the six Simons VIP samples in Figure 7, and the H9 CRISPR mutants in Figure 5. NA12877 was chosen instead of NA12878 because high-depth 10x Genomics Chromium whole-genome data exist for that individual. Around 50% of our reads map to the enriched regions, giving > 1000x coverage of the NOTCH2NL loci. The NA19240, NA12877 and H9 libraries were sequenced to 65 million, 71 million and 107 million reads, respectively. The Simons VIP samples SV721, SV877, SV7720, SV780, SV735 and SV788 were sequenced to depths of 57 million, 30 million, 44 million, 37 million, 86 million and 37 million reads, respectively.
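The link between on-target fraction and depth is simple arithmetic: mean depth ≈ reads × on-target fraction × read length / target size. The sketch below is a back-of-envelope check only. It assumes 150 bp reads and takes the 4.2 Mb probe-design region (chr1:145,750,000-149,950,000) as the target; neither the read length of the enriched libraries nor an effective target size is stated here, and duplicate reads and uneven capture are ignored.

```python
def expected_mean_depth(n_reads, on_target_fraction, read_len_bp, target_bp):
    """Back-of-envelope mean depth over the enriched target.

    Ignores PCR duplicates, unmapped reads and uneven capture efficiency.
    """
    return n_reads * on_target_fraction * read_len_bp / target_bp


# Assumed target: the probe-design region, 4.2 Mb.
TARGET_BP = 149_950_000 - 145_750_000
# NA19240: 65 million reads, ~50% on target, assumed 150 bp reads.
depth = expected_mean_depth(65e6, 0.5, 150, TARGET_BP)
print(round(depth))  # 1161
```

Under these assumptions the NA19240 library lands at roughly 1,160x, the same order of magnitude as the reported > 1000x; real depth at the NOTCH2NL loci will differ with duplication rate and local probe density.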

Chimpanzee NOTCH2NL gene analysis on chimpanzee Chromium genome sequencing data Whole-genome 10x Genomics Chromium linked-read sequencing libraries were generated according to the 10x Genomics protocol (CG00022 Genome Reagent Kit User Guide RevC) from high-molecular-weight DNA isolated from a chimpanzee iPSC line (Epi-8919-1A) derived from S008919 primary fibroblasts (Yerkes Primates, Coriell) and described in Field et al., bioRxiv 232553; doi: https://doi.org/10.1101/232553. Paired-end 150 bp Illumina sequencing was done on a HiSeq 4000, producing 1.6 billion reads. Reads were processed using Longranger and aligned to hg38. Reads aligned to chr1:119,989,248-120,190,000, chr1:149,328,818-149,471,561, chr1:148,600,079-148,801,427, chr1:146,149,145-146,328,264 or chr1:120,705,669-120,801,220 were extracted and their barcodes recorded. All reads carrying these barcodes were then extracted and realigned to a chimpanzee BAC-derived consensus NOTCH2NL sequence using bwa. Variants were called with the freebayes command “freebayes -f n2nlConsensus.fa --ploidy 10 --min-alternate-fraction 0.05 -k -j --min-coverage 50 -i -u -0 consensus_mapped.sorted.bam”, and sequences were then assembled with Gordian Assembler. Scaffold and assembly hubs were made to visualize these assemblies; these are available on GitHub at https://github.com/vrubels/Notch2NL-Project
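The barcode-based read recruitment used for the chimpanzee NOTCH2NL analysis (collect barcodes from reads in the NOTCH2/NOTCH2NL regions, then pull every read sharing those barcodes) can be sketched in plain Python. In practice the reads would come from the Longranger BAM, where the 10x barcode is stored in the BX tag (readable with pysam's `get_tag("BX")`); here a hypothetical `Read` record stands in for an alignment, and only the alignment start position is tested against each region, which is a simplification.

```python
from typing import Iterable, List, NamedTuple, Optional, Set, Tuple


class Read(NamedTuple):
    name: str
    contig: str
    pos: int                # 0-based alignment start
    barcode: Optional[str]  # 10x barcode (BX tag); None if absent


# 1-based, inclusive hg38 regions listed in the text.
REGIONS: List[Tuple[str, int, int]] = [
    ("chr1", 119_989_248, 120_190_000),
    ("chr1", 149_328_818, 149_471_561),
    ("chr1", 148_600_079, 148_801_427),
    ("chr1", 146_149_145, 146_328_264),
    ("chr1", 120_705_669, 120_801_220),
]


def barcodes_in_regions(reads: Iterable[Read], regions=REGIONS) -> Set[str]:
    """Barcodes of reads whose start falls inside any region."""
    found = set()
    for read in reads:
        if read.barcode is None:
            continue
        # Convert 1-based inclusive bounds to 0-based positions.
        if any(read.contig == c and s - 1 <= read.pos <= e - 1 for c, s, e in regions):
            found.add(read.barcode)
    return found


def recruit_by_barcode(reads: Iterable[Read], regions=REGIONS) -> List[Read]:
    """All reads, anywhere in the genome, that share a barcode with a region read."""
    reads = list(reads)  # iterated twice below
    keep = barcodes_in_regions(reads, regions)
    return [r for r in reads if r.barcode in keep]


reads = [
    Read("r1", "chr1", 120_000_000, "BC1"),  # inside the first region
    Read("r2", "chr5", 1_000, "BC1"),        # elsewhere, same molecule barcode
    Read("r3", "chr5", 2_000, "BC2"),        # unrelated barcode
    Read("r4", "chr1", 120_000_100, None),   # in region but unbarcoded
]
print([r.name for r in recruit_by_barcode(reads)])  # ['r1', 'r2']
```

Recruiting by barcode rather than by position is what pulls in reads that were mis-mapped to other paralogs or elsewhere in the genome, since they still carry the barcode of a NOTCH2NL-derived molecule.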

NOTCH2NL Simons Samples Coverage Analysis To assess copy number change in the Simons VIP 1q21.1 collection, the H9, NA12877 and NA19240 enriched 10x Chromium libraries described above were mapped to GRCh38 using Longranger 2.1.3. bamCoverage was used to compute read coverage over the region chr1:142785299-150598866, normalizing depth to 1x coverage across the region to account for library depth. WiggleTools mean (Zerbino et al., 2014) was used to average the depth across these samples. WiggleTools was then used to compute the ratio of each Simons 1q21.1 collection sample's coverage to this average, which simultaneously normalizes out bias from the array enrichment and from GC content. These coverages were then rescaled by the average coverage in the region chr1:149,578,286-149,829,369, which is downstream of NOTCH2NLC and not observed to have copy number change. This rescaling corrects a systematic downward shift in deletion samples, and a similar upward shift in duplication samples, that results from the combination of the previous normalizations. Finally, sliding midpoint smoothing was applied to each coverage track: missing data were ignored, and the window was expanded symmetrically around each midpoint until it included 100,000 datapoints, stepping the midpoint 10 kb each time.
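The coverage normalization and smoothing steps can be sketched with NumPy on per-bin coverage arrays. This is an illustrative re-implementation, not the bamCoverage/WiggleTools commands used here; the function names are hypothetical, missing data are encoded as NaN, the step is given in datapoints (how many datapoints correspond to 10 kb depends on the bin size, which is an assumption), and windows are truncated at the ends of the track.

```python
import numpy as np


def scale_to_1x(coverage):
    """Divide a track by its mean so it averages 1x over the region."""
    cov = np.asarray(coverage, dtype=float)
    return cov / np.nanmean(cov)


def copy_number_ratio(sample, controls, ref_mask):
    """Sample / mean of controls, rescaled so the reference window averages 1."""
    control_mean = np.nanmean([scale_to_1x(c) for c in controls], axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = scale_to_1x(sample) / control_mean
    ratio[~np.isfinite(ratio)] = np.nan
    return ratio / np.nanmean(ratio[ref_mask])


def sliding_midpoint_smooth(values, n_points, step):
    """Mean over a window grown symmetrically around each midpoint until it
    holds at least n_points non-missing (non-NaN) datapoints."""
    values = np.asarray(values, dtype=float)
    valid = ~np.isnan(values)
    n_valid = np.concatenate(([0], np.cumsum(valid)))  # prefix counts
    v_sum = np.concatenate(([0.0], np.cumsum(np.where(valid, values, 0.0))))  # prefix sums
    need = min(n_points, int(n_valid[-1]))
    size = len(values)
    mids, means = [], []
    for mid in range(0, size, step):
        lo, hi = 0, size  # binary search for the smallest sufficient half-width
        while lo < hi:
            half = (lo + hi) // 2
            a, b = max(0, mid - half), min(size, mid + half + 1)
            if n_valid[b] - n_valid[a] >= need:
                hi = half
            else:
                lo = half + 1
        a, b = max(0, mid - lo), min(size, mid + lo + 1)
        count = n_valid[b] - n_valid[a]
        mids.append(mid)
        means.append((v_sum[b] - v_sum[a]) / count if count else np.nan)
    return np.array(mids), np.array(means)


mids, smoothed = sliding_midpoint_smooth([1.0, 2.0, np.nan, 4.0, 5.0], n_points=3, step=2)
```

For the tracks described above, `n_points` would be 100,000 (or 5,000 for the hominid analysis below). Prefix sums make each window mean O(1) after the binary search, so the cost stays manageable on long tracks.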

Hominid and Archaic Human Copy Number Analysis Sequencing data for NA12878 (ERR194147), Vindija Neanderthal (PRJEB21157), Altai Neanderthal (PRJEB1265), Denisovan (ERP001519), chimpanzee (SRP012268), gorilla (PRJEB2590) and orangutan (SRR748005) were obtained either from SRA or from collaborators. These data were mapped to GRCh37 to obtain reads mapping to the NOTCH2 (chr1:120,392,936-120,744,537) and NOTCH2NL (chr1:145,117,638-145,295,356) loci in that assembly, and those reads were then remapped to a reference containing only the GRCh38 version of NOTCH2. Coverage was extracted with bamCoverage, normalizing to 1x coverage across the custom NOTCH2 reference. The resulting coverage tracks were scaled to the average of the unique region of NOTCH2 and then smoothed with the same sliding midpoint procedure described above, using 5,000 datapoints per window and a 2.5 kb step size.

Gordian Assembler The extremely low number of long fragments per partition in the 10x Chromium process ensures that nearly all partitions containing sequence from a NOTCH2NL repeat will contain sequence from precisely one repeat copy. To recover the precise NOTCH2NL repeat sequences, we developed a process to assemble paratypes from barcoded reads. A 208 kb multiple sequence alignment of NOTCH2NL paralogs was constructed and a consensus sequence generated. For each sample being assembled, the 10x Genomics Longranger pipeline was used to map enriched or unenriched reads to GRCh38. All reads that mapped to any of the five NOTCH2 or NOTCH2NL loci in that alignment were extracted, together with any reads associated with the same input molecules via their barcodes. These reads were then remapped to the consensus sequence. FreeBayes (https://arxiv.org/abs/1207.3907) was used to call variants on these alignments, with ploidy set to 10 based on the putative number of NOTCH2NL repeats. Each barcode was then genotyped to find the set of alleles supported at each informative SNP site; because linked reads are sparse, alleles at most SNP sites are undetermined in any given barcode. The result is an M × B sparse matrix, where M is the number of variants and B is the number of barcodes identified as having NOTCH2-like sequence. A statistical model is then used to phase this matrix into K paratypes. For each cluster of barcodes representing a single paratype, all reads with the associated barcodes are pooled for short-read assembly using the de Bruijn graph assembler idba_ud (Peng et al., 2012).
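The barcode genotyping and phasing steps of Gordian Assembler can be illustrated with a simplified Python sketch. The text does not specify the statistical model used to phase the M × B matrix into K paratypes, so this example substitutes a plain k-modes-style hard clustering (assign each barcode to the nearest allele profile, then recompute each profile by majority vote), with missing sites ignored. It is a stand-in for illustration, not the Gordian Assembler model; the observation format and function names are hypothetical.

```python
from collections import Counter, defaultdict


def genotype_barcodes(observations):
    """Build the sparse variant x barcode matrix as {barcode: {variant: allele}}.

    observations: iterable of (barcode, variant_index, allele), one per read;
    conflicting reads within a barcode are resolved by majority vote.
    """
    votes = defaultdict(Counter)
    for barcode, variant, allele in observations:
        votes[(barcode, variant)][allele] += 1
    matrix = defaultdict(dict)
    for (barcode, variant), counts in votes.items():
        matrix[barcode][variant] = counts.most_common(1)[0][0]
    return dict(matrix)


def mismatch_rate(calls, profile):
    """Fraction of shared sites that disagree; 0.5 if no sites are shared."""
    shared = [v for v in calls if v in profile]
    if not shared:
        return 0.5
    return sum(calls[v] != profile[v] for v in shared) / len(shared)


def k_modes(matrix, k, max_iter=20):
    """Cluster barcodes into k allele profiles (ties go to the lowest index)."""
    # Seed with the k most densely genotyped barcodes (deterministic order).
    seeds = sorted(matrix, key=lambda b: (-len(matrix[b]), b))[:k]
    profiles = [dict(matrix[b]) for b in seeds]
    assignment = {}
    for _ in range(max_iter):
        new = {b: min(range(k), key=lambda j: mismatch_rate(calls, profiles[j]))
               for b, calls in matrix.items()}
        if new == assignment:
            break
        assignment = new
        # Recompute each profile as the per-site majority allele of its members.
        for j in range(k):
            votes = defaultdict(Counter)
            for b, cluster in assignment.items():
                if cluster == j:
                    for variant, allele in matrix[b].items():
                        votes[variant][allele] += 1
            if votes:
                profiles[j] = {v: c.most_common(1)[0][0] for v, c in votes.items()}
    return assignment, profiles


# Toy data: two paratypes (all "A" vs all "C") observed sparsely across 4 sites.
obs = [
    ("b1", 0, "A"), ("b1", 1, "A"),
    ("b2", 1, "A"), ("b2", 2, "A"), ("b2", 3, "A"),
    ("b3", 0, "C"), ("b3", 1, "C"), ("b3", 2, "C"),
    ("b4", 2, "C"), ("b4", 3, "C"),
    ("b5", 0, "A"), ("b5", 3, "A"),
]
matrix = genotype_barcodes(obs)
assignment, profiles = k_modes(matrix, k=2)
```

Each resulting barcode cluster would then be pooled and passed to idba_ud, as described above. A hard clustering like this is sensitive to seeding and to barcodes that happen to contain two paratypes, which is one reason a probabilistic model is preferable on real data.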

Establishment of Paratypes in Population

The paratype assembly process described above was applied to the MYcroarray-enriched 10x Genomics sequencing of NA19240, H9, NA12877, and the six Simons VIP samples. The H9 paratypes were validated with full-length cDNA sequencing. The NA12878/NA12891/NA12892 trio (Utah) and the NA24385/NA24143/NA24149 trio (Ashkenazi) were assembled using linked-read data produced by 10x Genomics for the Genome in a Bottle Consortium. Inheritance was established for the Ashkenazi trio, as well as for the three NA12878 paratypes that assembled. Inherited paratypes are not double counted in Table S1. NA12877 did not assemble completely and so is not included in the table. A scaffolding process using alignments of contigs to GRCh38 was performed to construct full-length NOTCH2NL loci for each of these assemblies. The NOTCH2NL transcripts were then annotated and assessed for their protein-level features.

Enrichment and Sequencing of Full-Length cDNA

Byrne et al., 2017: Byrne A., Beaudin A.E., Olsen H.E., Jain M., Cole C., Palmer T., DuBois R.M., Forsberg E.C., Akeson M., Vollmers C. Nanopore long-read RNAseq reveals widespread transcriptional variation among the surface receptors of individual B cells.

Full-length cDNA was constructed from total RNA of both week 5 cortical organoids and undifferentiated H9 hESCs, similar to previously described protocols (Byrne et al., 2017), and was enriched using the same MyBaits oligonucleotide set as the 10x Chromium libraries. These cDNA libraries were prepared and sequenced on the Oxford Nanopore MinION. A total of 47,391 reads were obtained for the undifferentiated cells and 118,545 reads for the differentiated cells. The reads were base called with Metrichor. After pooling these datasets, the reads were aligned to GRCh38 to identify putative NOTCH2NL reads: 2,566 reads in the week 5 dataset and 363 in the undifferentiated dataset mapped to NOTCH2NL. Both datasets were filtered for full-length transcripts by requiring at least 70% coverage of the first 1.1 kb of the consensus sequence. This filtering step removed NOTCH2-like transcripts, leaving a final set of 1,484 transcripts, pooled across both time points, for analysis.
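The full-length filter (at least 70% coverage of the first 1.1 kb of the consensus) can be sketched as an interval computation. This sketch assumes each read's alignment to the consensus is available as a list of gapless aligned reference blocks in 0-based, half-open coordinates (pysam's `AlignedSegment.get_blocks()` returns blocks in this form); the constant and function names are illustrative.

```python
from typing import Iterable, Tuple

REGION_START = 0     # start of the consensus window checked (0-based)
REGION_END = 1100    # end of the first 1.1 kb of the consensus (exclusive)
MIN_FRACTION = 0.70  # at least 70% of the window must be covered


def covered_fraction(blocks: Iterable[Tuple[int, int]],
                     start: int = REGION_START, end: int = REGION_END) -> float:
    """Fraction of [start, end) covered by aligned blocks (half-open, possibly overlapping)."""
    covered = 0
    last = start  # rightmost position already counted
    for block_start, block_end in sorted(blocks):
        s = max(block_start, last)
        e = min(block_end, end)
        if e > s:
            covered += e - s
            last = e
    return covered / (end - start)


def is_full_length(blocks: Iterable[Tuple[int, int]]) -> bool:
    """Apply the 70%-of-1.1-kb full-length criterion."""
    return covered_fraction(blocks) >= MIN_FRACTION
```

Counting only blocks (not the span from first to last aligned base) keeps large deletions in a read from inflating its apparent coverage.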

Validating H9 Haplotypes Using Full-Length cDNA

Jain et al., 2015: Jain M., Fiddes I.T., Miga K.H., Olsen H.E., Paten B., Akeson M. Improved data analysis for the MinION nanopore sequencer.

The 1,484 NOTCH2NL transcript sequences identified above were aligned to a consensus sequence of the H9 ESC transcript paratypes using MarginAlign (Jain et al., 2015). The reads were then reduced to feature vectors containing the variant sites along the first 1.1 kb of the consensus, to eliminate noise related to alternative transcription stop sites. The feature vectors were aligned using a hidden Markov model with one path for each of the paratype assemblies. Because the transcripts are already aligned to a consensus, the model needs no reverse transitions, and because variation or recombination between paralogs is already accounted for in the assemblies, no transitions between paths are allowed. This greatly simplifies the Forward algorithm, and the maximum probability path (usually determined with the Viterbi algorithm) is trivial to calculate under these conditions. All mismatches were assumed to be errors and were given an emission probability of 0.1 to approximate the error rate of the nanopore. The paratype assembly was validated by showing that there were no recurrent feature vectors that did not align well to any path through this model.
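Because the model has one path per paratype and no transitions between or back along paths, scoring reduces to a per-path product of emissions. The sketch below shows that reduction. The 0.1 mismatch emission comes from the text; assigning the remaining 0.9 to a match, and skipping sites a read does not cover (`None`), are assumptions. Paratype names and sequences are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

MISMATCH_P = 0.1            # emission probability for a mismatch (from the text)
MATCH_P = 1.0 - MISMATCH_P  # assumption: the remaining mass is the match probability


def path_log_likelihood(features: Sequence[Optional[str]], paratype: Sequence[str]) -> float:
    """Log-likelihood of a feature vector along one paratype path.

    Each path is a fixed state sequence, so its likelihood is a product of per-site
    emissions (a sum in log space). The Forward probability is the prior-weighted
    sum of these per-path values, and the most probable path is simply their argmax.
    """
    ll = 0.0
    for observed, expected in zip(features, paratype):
        if observed is None:  # assumption: uncovered sites contribute nothing
            continue
        ll += math.log(MATCH_P if observed == expected else MISMATCH_P)
    return ll


def best_paratype(
    features: Sequence[Optional[str]], paratypes: Dict[str, Sequence[str]]
) -> Tuple[str, Dict[str, float]]:
    """Return the highest-scoring paratype path and all per-path scores."""
    scores = {name: path_log_likelihood(features, seq) for name, seq in paratypes.items()}
    return max(scores, key=scores.get), scores
```

A read whose best score is still far below what its length would predict at the nanopore error rate is a candidate "poorly aligning" feature vector; recurrent vectors of that kind would indicate a missing or misassembled paratype.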

CRISPR Mutation of NOTCH2NL in the H9 ES Line

To avoid targeting NOTCH2, two guides were used: one in intron 1 with a 1-base mismatch to NOTCH2 and NOTCH2NLR but identical to the corresponding sequence in all H9 1q21 NOTCH2NL genes, and another that spans a 4 bp deletion relative to NOTCH2 at the start of exon 5. This region is also quite different in NOTCH2NLR (13/20 mismatches to NOTCH2NL) (Figure S5A). H9 hESCs at passage 42 were plated on a 6-well dish at 40%–50% confluency. After 24 hours, cells were treated with 10 μM ROCK inhibitor (Y27632; ATCC, ACS-3030) for 1 hour. Then 2.5 μg of each guide plasmid (E2.1 and E5.2, Figure S5; cloned into pX458, Addgene) was introduced for 4 hours using Xfect DNA transfection reagent (Clontech, 631317). Each guide set was introduced to all 6 wells of a 6-well plate. At 48 hours after transfection, cells were dissociated from the wells using Accutase cell dissociation enzyme (eBioscience, 00-4555-56), rinsed twice in PBS supplemented with 0.2 mM EDTA, 2% KnockOut Serum Replacement (Thermo Fisher, 10828028), 1% Penicillin-Streptomycin (LifeTech, 15140122), and 2 μM thiazovivin (Tocris, 1226056-71-8), and resuspended in a final volume of 1 mL of sorting buffer. The cells were then passed through a 70 μm filter and sorted on a FACS Aria II (BD Biosciences) with a 100 μm nozzle at 20 psi to select for cells expressing the Cas9-2A-GFP encoded on pX458.
Gating was optimized for specificity. Single GFP-positive cells were plated on a 10 cm plate containing 1.5 × 10^6 mouse embryonic fibroblasts (MEFs) and cultured in E8 Flex, with 2 μM thiazovivin (Tocris, 1226056-71-8) added for the first 24 hours. After 5-7 days of growth, individual colonies were manually isolated into 1 well of a 6-well dish on 250,000 MEFs in E8 Flex. Three to five days later, 3-7 good colonies at passage 42+3 were frozen in BAMBANKER (Fisher Scientific, NC9582225). The remaining cells on MEFs were used for the PCR deletion assay. For all subsequent analyses, cells were adapted to culturing on vitronectin (Thermo Fisher, A14700) in GIBCO's Essential 8 Flex medium (Thermo Fisher, A2858501).

PCR assay for CRISPR deletion: For each clone, gDNA was isolated from one 70% confluent well of a 6-well dish using the Zymo Quick-gDNA Miniprep kit (Zymo, D3006) according to the manufacturer's protocol. PCR was performed on approximately 70 ng gDNA with Herculase II fusion DNA polymerase (Agilent, 6006745) using primers N2NL E2del_F (5′ CACAGCCTTCCTCAAACAAA 3′) and N2NL E5del_R (5′ GTGCCACGCATAGTCTCTCA 3′). PCR products of the expected size were cloned and sequenced to confirm that at least one NOTCH2NL locus harbored the expected deletion. Positive clones underwent Chromium library preparation, target enrichment, Illumina sequencing and NOTCH2NL gene assembly as described above.
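The guide-selection reasoning above (a 1-base mismatch to NOTCH2 for the intron 1 guide, 13/20 mismatches to NOTCH2NLR for the exon 5 region) is a per-paralog mismatch count. A minimal sketch follows; the sequences in any usage are hypothetical placeholders, not the actual E2.1/E5.2 protospacers. This simple substitution-only (Hamming) count cannot represent indels such as the 4 bp deletion relative to NOTCH2 spanned by the exon 5 guide; that comparison needs a proper alignment.

```python
from typing import Dict


def count_mismatches(guide: str, site: str) -> int:
    """Substitution-only (Hamming) mismatch count between equal-length sequences."""
    if len(guide) != len(site):
        raise ValueError("guide and site must be the same length; indels need an alignment")
    return sum(g != s for g, s in zip(guide.upper(), site.upper()))


def mismatch_report(guide: str, sites: Dict[str, str]) -> Dict[str, int]:
    """Mismatch count of one guide against the corresponding site in each named paralog."""
    return {name: count_mismatches(guide, seq) for name, seq in sites.items()}
```

For example, with a hypothetical 20-mer guide and a paralog site differing only at the last base, `mismatch_report` reports 0 for the identical paralog and 1 for the other.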

NOTCH2NL Expression in Week 5 Neurospheres

Two replicates of bulk RNA-seq from week 5 cortical organoids derived from H9 ES cells, as well as from undifferentiated cells of the H9 differentiation time course described above, were quantified against a custom Kallisto reference based on GENCODE V27. Using bedtools, all transcripts that overlapped our curated annotations of the NOTCH2NL paralogs and NOTCH2 were removed. After this annotation set was converted to FASTA, a subset of our paratype assemblies of the H9 NOTCH2NL paralogs was added. Only one representative each of NOTCH2NLR and NOTCH2NLC was used because of their high similarity at the transcript level. The TPM values of the replicates were averaged.
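The reference-building and averaging steps can be sketched in pure Python. This is an illustration only: the removal step mirrors what `bedtools intersect -v` does (report features in A with no overlap in B), here simplified to whole-transcript spans in BED-style half-open coordinates, and the TPM average treats a transcript absent from a replicate as 0 TPM, which is an assumption. All names are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def drop_overlapping(transcripts: Dict[str, Interval], curated: List[Interval]) -> Dict[str, Interval]:
    """Keep transcripts that overlap none of the curated NOTCH2/NOTCH2NL intervals."""
    return {tid: iv for tid, iv in transcripts.items() if not any(overlaps(iv, c) for c in curated)}


def average_tpm(replicates: List[Dict[str, float]]) -> Dict[str, float]:
    """Mean TPM per transcript across replicates (missing entries count as 0)."""
    ids = set().union(*replicates)
    return {tid: sum(rep.get(tid, 0.0) for rep in replicates) / len(replicates) for tid in ids}
```

The paratype sequences that replace the removed annotations would then be appended to the FASTA before building the Kallisto index.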

Estimate of NOTCH2 and NOTCH2NL Expression in Human Fetal Brain scRNA-Seq Data

Nowakowski et al., 2017: Nowakowski T.J., Bhaduri A., Pollen A.A., Alvarado B., Mostajo-Radji M.A., Di Lullo E., Haeussler M., Sandoval-Espinosa C., Liu S.J., Velmeshev D., et al. Spatiotemporal gene expression trajectories reveal developmental hierarchies of the human cortex.

To assess NOTCH2NL expression in the developing brain, we re-analyzed single-cell RNA sequencing data from Nowakowski et al. (2017). Initial analysis of these data showed low expression of NOTCH2 and NOTCH2NL, presumably due to the removal of multi-mapping reads. To address this, we constructed a custom Kallisto reference based on GENCODE V19 (hg19) from which we removed the transcripts ENST00000468030.1, ENST00000344859.3 and ENST00000369340.3. The reads for 3,466 single cells were then quantified against this Kallisto index, and the NOTCH2 and NOTCH2NL rows of the resulting gene-cell matrix were compared to previously generated tSNE clusters.
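Comparing one gene's row of the gene-cell matrix with existing cluster labels amounts to a per-cluster summary. The sketch below computes the mean value and the fraction of cells with nonzero expression per cluster. It assumes the row is a mapping from cell ID to quantified value and the clustering is a mapping from cell ID to label; cluster names in any usage are hypothetical, and cells missing from the clustering are skipped.

```python
from collections import defaultdict
from typing import Dict, Tuple


def summarize_by_cluster(
    expression: Dict[str, float], clusters: Dict[str, str]
) -> Dict[str, Tuple[float, float]]:
    """Per-cluster (mean expression, fraction of cells with expression > 0) for one gene.

    expression: cell id -> quantified value from the gene-cell matrix
    clusters:   cell id -> previously assigned tSNE cluster label
    """
    totals: Dict[str, float] = defaultdict(float)
    expressing: Dict[str, int] = defaultdict(int)
    counts: Dict[str, int] = defaultdict(int)
    for cell, value in expression.items():
        label = clusters.get(cell)
        if label is None:  # cell absent from the clustering; skip it
            continue
        totals[label] += value
        counts[label] += 1
        expressing[label] += value > 0  # bool counts as 0 or 1
    return {label: (totals[label] / counts[label], expressing[label] / counts[label]) for label in counts}
```

Reporting the fraction of expressing cells alongside the mean helps distinguish broad low-level expression from strong expression in a few cells, which matters for sparse single-cell data.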

Copy Number Estimates of NOTCH2NL in Human Population

The copy number of NOTCH2NLR and NOTCH2NLC in the human population was established by extracting reads that map to NOTCH2 (chr1:119,989,248-120,190,000), NOTCH2NLR (chr1:120,705,669-120,801,220), NOTCH2NLA (chr1:146,149,145-146,328,264), NOTCH2NLB (chr1:148,600,079-148,801,427) and NOTCH2NLC (chr1:149,328,818-149,471,561) from 266 individuals in the Simons Diversity Panel. These reads were then remapped to the 101,143 bp consensus sequence of a multiple sequence alignment of the alignable portions of NOTCH2 and all NOTCH2NL paralogs. This multiple sequence alignment was used to define our SUN markers; the ratio of reads containing a SUN to those containing the non-SUN base was measured, and the median value was taken for NOTCH2NLC and NOTCH2NLR. Establishing copy number with SUNs proved difficult for NOTCH2NLA and NOTCH2NLB because of the high rate of segregating ectopic gene conversion alleles in the population. Each of the 266 samples was inspected manually. Comparison with the 10 assembled normal genomes suggested that NOTCH2NLA and NOTCH2NLB are not copy number variable in the phenotypically normal population.
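A sketch of turning SUN read counts into a copy number estimate for one paralog follows. It assumes that each SUN's ratio is (reads carrying the SUN base) / (all reads at that site), so that the ratio estimates paralog copies divided by the total copy number N, and that N = 10 (NOTCH2 plus four NOTCH2NL paralogs, two copies each). Both are assumptions made for illustration; the median across SUNs follows the text.

```python
from statistics import median
from typing import Iterable, Tuple

TOTAL_COPIES = 10  # assumption: NOTCH2 + four NOTCH2NL paralogs, two copies each


def paralog_copy_number(sun_counts: Iterable[Tuple[int, int]], total_copies: int = TOTAL_COPIES) -> float:
    """Estimate one paralog's copy number from its SUNs.

    sun_counts: (reads carrying the SUN base, total reads at that site) for each SUN.
    Each ratio is taken to estimate copies / total_copies; the median is less
    sensitive than the mean to individual SUNs disturbed by gene conversion.
    """
    ratios = [sun / depth for sun, depth in sun_counts if depth > 0]
    return median(ratios) * total_copies
```

For instance, SUN ratios of 0.20, 0.19, 0.21, 0.05 and 0.22 have a median of 0.20, giving an estimate of 2 copies despite the one outlying SUN.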

Paratype Estimation of NOTCH2NL in Human Population

Assigning paratypes without assemblies is not possible. To evaluate the gene conversion landscape in the population, we took the ratio of SUN read depths in all 266 Simons individuals, the six Simons VIP samples and our 10 assembled genomes, and plotted them by paralog (Table S4). These were evaluated for NOTCH2NLC and NOTCH2NLR copy number (Figure S1). Three samples in the Simons Diversity Panel showed apparent gene conversion in NOTCH2NLC, which we did not observe in any of our assembled samples. Manual analysis of these SUN diagrams led to the annotation of six distinct classes of NOTCH2NLA-NOTCH2NLB gene conversion with varying population frequencies. In some cases, the data were of lower quality and harder to interpret. The most common gene conversion allele is an overwrite of around 20 kb of NOTCH2NLB by NOTCH2NLA in intronic sequence between exons 2 and 3, present in 42.5% of haplotypes among the Simons normal individuals. When interpreting these SUN plots, it is helpful to remember that the denominator of the ratio is the total copy number, so as an individual's total copy number departs from N = 10, the expected values change. Gene conversion appears as regions where the ratio for one paralog rises on the y axis while that of the other falls. Exons 1-5 are located at 19,212-19,590 bp, 59,719-59,800 bp, 84,150-84,409 bp, 92,421-92,756 bp and 93,009-97,333 bp, respectively, in the consensus sequence.
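The interpretation rules above (expected ratio = paralog copies / total copies N; gene conversion as one paralog's ratio rising while another's falls; exon coordinates on the consensus) can be combined into a small sketch. The exon coordinates are taken from the text and treated as inclusive; the `delta` threshold and all function names are hypothetical, and the manual review described above is not replaced by this rule.

```python
from typing import List, Sequence

# Exon coordinates on the consensus sequence, from the text (bp, treated as inclusive).
EXONS = [(19_212, 19_590), (59_719, 59_800), (84_150, 84_409), (92_421, 92_756), (93_009, 97_333)]


def locate(pos: int) -> str:
    """Label a consensus position as 'exon k', 'intron k' (between exons k and k+1), or flanking."""
    for k, (start, end) in enumerate(EXONS, start=1):
        if start <= pos <= end:
            return f"exon {k}"
        if pos < start:
            return "upstream" if k == 1 else f"intron {k - 1}"
    return "downstream"


def expected_ratio(paralog_copies: int, total_copies: int = 10) -> float:
    """Expected SUN ratio when the denominator is the total copy number N."""
    return paralog_copies / total_copies


def flag_conversion(
    positions: Sequence[int],
    ratio_a: Sequence[float],
    ratio_b: Sequence[float],
    expected_a: float,
    expected_b: float,
    delta: float = 0.05,  # hypothetical threshold
) -> List[int]:
    """Positions where paralog A's ratio is above expectation while B's is below it
    (the signature of A sequence overwriting B)."""
    return [
        p for p, a, b in zip(positions, ratio_a, ratio_b)
        if a - expected_a > delta and expected_b - b > delta
    ]
```

Flagged positions can be passed to `locate`; for example, the common NOTCH2NLB-to-NOTCH2NLA overwrite described above would map to "intron 2".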