Quick summary: 23andMe raw data contains insertions and deletions with proprietary identifiers, most of which have never been analyzed.

Our software can now handle over 1,000 of these “indels”, and nearly all of them impact a human disease or trait!

Background:

There are only a few thousand insertions and deletions (“indels”) in the 23andMe raw data. That’s not many compared to the hundreds of thousands of SNPs. But indels can be some of the most impactful types of genome alterations. Many diseases and traits are caused by an insertion or deletion in a critical gene.

Analyzing the indels in 23andMe’s raw data is difficult because many of them use 23andMe’s proprietary identifiers (e.g. i5037354). In addition, 23andMe does not provide enough information to determine exactly which insertion or deletion each one was designed to test. We asked 23andMe to share this information, but they declined.

In the latest 23andMe genotyping chip (v4), there are 4,093 indels in total, and 3,413 of them (83.4%) use a 23andMe proprietary identifier.

Even when a dbSNP (rs) identifier is used, the reported position of the indel can be shifted, which makes it difficult to compare with next-generation sequencing data.
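One well-known reason the same indel can appear at different positions in different sources is that a deletion inside a repeated stretch of sequence has several equivalent placements. Many sequencing pipelines left-align such variants, while HGVS nomenclature uses the most 3' position. The minimal Python sketch that follows illustrates this on a made-up reference string with 1-based coordinates. The sequences and function names are hypothetical, and this is an assumption about one common cause of shifted positions, not a description of how 23andMe assigns positions.

```python
def apply_deletion(ref: str, pos: int, length: int) -> str:
    """Return `ref` with `length` bases removed starting at 1-based `pos`."""
    start = pos - 1
    return ref[:start] + ref[start + length:]


def left_align_deletion(ref: str, pos: int, length: int) -> int:
    """Shift a deletion as far left as possible without changing the result."""
    start = pos - 1  # 0-based start of the deleted block
    # Moving the block one base left gives the same sequence exactly when the
    # base just before the block equals the last base inside the block.
    while start > 0 and ref[start - 1] == ref[start + length - 1]:
        start -= 1
    return start + 1


def right_align_deletion(ref: str, pos: int, length: int) -> int:
    """Shift a deletion as far right (3') as possible without changing the result."""
    start = pos - 1
    # Moving the block one base right is equivalent when the first base inside
    # the block equals the base just after it.
    while start + length < len(ref) and ref[start] == ref[start + length]:
        start += 1
    return start + 1


if __name__ == "__main__":
    ref = "GATTTTTC"  # hypothetical reference with a run of five Ts
    # Deleting any one T gives the same sequence, so positions 3..7 are all valid.
    print(left_align_deletion(ref, 6, 1), right_align_deletion(ref, 6, 1))  # prints: 3 7
```

Two tools that describe the same deletion in this run of Ts could therefore report positions 3 and 7, even though the resulting sequence is identical.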

We knew there were likely to be many important indels in the 23andMe data, so we set out to reverse-engineer as many as we could and identify those that affect human diseases and traits.

The Indel Analysis:

We started with over 1,500 raw data files from 23andMe customers in the Opensnp.org database. We compiled a list of every indel and how often we found a DD, DI, or II genotype for each one. Then we cross-referenced this list against known indels located nearby in our own database, especially those with a disease or trait phenotype. We expected that many of the indels in the 23andMe raw data were designed to test known, clinically relevant genome variants.
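The tallying step can be sketched in a few lines of Python. This is a minimal illustration, not the Enlis pipeline: it assumes each raw data file is an iterable of text lines in 23andMe's four-column layout (identifier, chromosome, position, genotype), with comment lines starting with "#", and that indel calls are written with the letters D and I. Single-letter calls (for example on the X chromosome in males) and no-calls are simply skipped here. The function name is hypothetical.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from typing import Iterable

# Diploid indel genotypes we count; "ID" is folded into "DI" below.
INDEL_GENOTYPES = {"DD", "DI", "II"}


def tally_indel_genotypes(files: Iterable[Iterable[str]]) -> dict[str, Counter]:
    """Count DD/DI/II calls per identifier across many raw data files."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for lines in files:
        for line in lines:
            if line.startswith("#") or not line.strip():
                continue  # skip header comments and blank lines
            fields = line.split()
            if len(fields) != 4:
                continue  # skip anything that is not a four-column record
            rsid, _chrom, _pos, genotype = fields
            genotype = "".join(sorted(genotype))  # "ID" -> "DI"
            if genotype in INDEL_GENOTYPES:
                counts[rsid][genotype] += 1
    return dict(counts)
```

Passing file handles works directly, since an open text file is an iterable of lines; the resulting counts are the raw material for the frequency comparisons described in the next paragraph.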

Finally, we went through a very labor-intensive process to analyze each indel, its surrounding sequence, the nearby clinical variants, and the expected allele frequencies. In the end, we were able to confidently identify over 1,000 indels, most of which have a known effect on a disease or trait.
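The search for nearby known variants can be approximated with a simple window query over sorted positions. In this sketch, the `KnownIndel` record, the example names, and the 20-base-pair default window are all hypothetical choices. It only produces candidate matches, which would still need the kind of manual review of surrounding sequence and allele frequencies described above.

```python
from __future__ import annotations

from bisect import bisect_left, bisect_right
from collections import defaultdict
from typing import NamedTuple


class KnownIndel(NamedTuple):
    chrom: str
    pos: int
    name: str  # hypothetical label for a disease- or trait-associated variant


def build_index(known: list[KnownIndel]) -> dict[str, tuple[list[int], list[KnownIndel]]]:
    """Group known indels by chromosome and sort them by position."""
    by_chrom: dict[str, list[KnownIndel]] = defaultdict(list)
    for variant in known:
        by_chrom[variant.chrom].append(variant)
    index = {}
    for chrom, variants in by_chrom.items():
        variants.sort(key=lambda v: v.pos)
        index[chrom] = ([v.pos for v in variants], variants)
    return index


def nearby_candidates(index, chrom: str, pos: int, window: int = 20) -> list[KnownIndel]:
    """Return known indels within `window` bases of a chip indel's position."""
    if chrom not in index:
        return []
    positions, variants = index[chrom]
    lo = bisect_left(positions, pos - window)
    hi = bisect_right(positions, pos + window)
    return variants[lo:hi]
```

A window rather than an exact position match is used here because, as noted earlier, the same indel can be reported at shifted positions.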

An Example:

Let’s take a look at one:

i5012559 8 87656009 DI

We have identified this as an autosomal recessive deletion that can lead to achromatopsia, a condition in which the individual cannot see any color at all (complete color blindness). There are a few carriers of this deletion in the Opensnp database, but no homozygous individuals (people with two copies, who would therefore be affected). The frequency of this deletion among the 1,500 23andMe users is consistent with its frequency in next-generation sequencing data.
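Checking whether an observed frequency is plausible comes down to simple arithmetic on the genotype counts. Assuming Hardy-Weinberg equilibrium, a deletion allele with frequency q is expected in homozygous form in about N·q² of N people. The sketch that follows uses hypothetical counts (not the actual Opensnp numbers) to show why seeing zero homozygous individuals is unsurprising for a rare recessive deletion; the function names are illustrative.

```python
def allele_frequency(dd: int, di: int, ii: int) -> float:
    """Frequency of the D allele from genotype counts (two alleles per person)."""
    n = dd + di + ii
    if n == 0:
        raise ValueError("no genotyped individuals")
    return (2 * dd + di) / (2 * n)


def expected_homozygotes(q: float, n: int) -> float:
    """Expected number of DD individuals among n people under Hardy-Weinberg."""
    return n * q * q


if __name__ == "__main__":
    # Hypothetical: 3 carriers (DI) and no DD among 1,500 people.
    q = allele_frequency(dd=0, di=3, ii=1497)
    print(q)                                # 0.001
    print(expected_homozygotes(q, 1500))    # about 0.0015 affected people expected
```

With an expected count that far below one, a sample of this size would almost never contain an affected (homozygous) person, so the carrier frequency is the more useful number to compare against sequencing data.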

23andMe doesn’t tell you anything about this deletion, even if you have access to its health information. The old 23andMe health reports covered only 20 insertions and deletions in total, and because the newly announced health reports contain less information overall, I expect that number to be even smaller there.

As of this writing, this deletion is not reported by other interpretation services such as SNPedia/Promethease. To check this further, I randomly selected 50 of the indels that we identified and looked them up in SNPedia. SNPedia had information on only 2 of the 50.

Summary:

For the first time anywhere, we have been able to analyze over 1,000 of 23andMe’s proprietary indels. To my knowledge, the Enlis software is the only solution for identifying and getting more information on the majority of these health-impacting variants.

I will cover the full range of health information in the 23andMe raw data in another blog post, but here is one interesting thing to leave you with: the 23andMe raw data contains information on hundreds of indels related to hereditary cancer. How many hereditary cancer variants does 23andMe report in its new system? Zero.

Want to get your own 23andMe indels analyzed? Click here to start our import process.

Note: 23andMe recently revamped their online service, but the genotyping chip has not changed. The v4 chip, launched in December 2013, is still being used.

Appendix:

The indels that we analyze affect these diseases:

Achondrogenesis, type IB

Achromatopsia 3

Alpha Thalassemia

Alpha-2-macroglobulin polymorphism

Alzheimer disease, susceptibility to

Amyotrophic lateral sclerosis type 2

Andermann syndrome

Aspartylglycosaminuria

Ataxia with vitamin E deficiency

Ataxia, Friedreich-like, with isolated vitamin E deficiency

Ataxia-telangiectasia syndrome

Atypical Rett syndrome

BRCA1 and BRCA2 Hereditary Breast and Ovarian Cancer

Becker muscular dystrophy

Benign scapuloperoneal muscular dystrophy with cardiomyopathy

Beta Thalassemia

Beta-plus-thalassemia

Beta-thalassemia dominant

Bloom syndrome

Breast cancer, susceptibility to

Breast-ovarian cancer, familial 1

Breast-ovarian cancer, familial 2

Bronchiectasis with or without elevated sweat chloride 1, modifier of

Brugada syndrome 1

Cardiomyopathy

Carnitine palmitoyltransferase II deficiency, late-onset

Ceroid lipofuscinosis, neuronal, 5

Ceroid lipofuscinosis, neuronal, 11

Choroideremia

Colorectal cancer, hereditary, nonpolyposis, type 1

Cone-rod dystrophy 3

Congenital myopathy with fiber type disproportion

Congestive heart failure and beta-blocker response, modifier of

Cystic fibrosis

Deafness, autosomal recessive 1A

Deafness, digenic, GJB2/GJB3

Deafness, digenic, GJB2/GJB6

Debrisoquine, poor metabolism of

Delta-zero-thalassemia, Knossos type

Dermatitis, atopic, 2, susceptibility to

Diastrophic dysplasia

Dilated cardiomyopathy 1A

Dilated cardiomyopathy 3B

Duchenne muscular dystrophy

Dystonia 1

Dystonia 12

Early infantile epileptic encephalopathy 2

Encephalopathy, neonatal severe, due to MECP2 mutations

Enlarged vestibular aqueduct syndrome

Familial Mediterranean fever

Familial cancer of breast

Familial hypercholesterolemia

Familial hypertrophic cardiomyopathy 2

Familial hypertrophic cardiomyopathy 4

Familial hypertrophic cardiomyopathy 7

Fanconi anemia, complementation group C

Fanconi anemia, complementation group D1

Frontotemporal dementia, ubiquitin-positive

Fumarase deficiency

Gaucher’s disease, type 1

Glucose-6-phosphate transport defect

Glycogen storage disease IIIa

Glycogen storage disease IIIb

Glycogen storage disease type 1A

Glycogen storage disease type III

Hearing impairment

Heinz body hemolytic anemia

Hemoglobinopathy

Hereditary cancer-predisposing syndrome

Hereditary factor VIII deficiency disease

Hereditary fructosuria

Hereditary leiomyomatosis and renal cell cancer

Hereditary nonpolyposis colorectal cancer type 5

Hereditary pancreatitis

Hypertrophic cardiomyopathy

I cell disease

Ichthyosis vulgaris

Immunodeficiency due to ficolin 3 deficiency

Infantile hypophosphatasia

Infantile-onset ascending hereditary spastic paralysis

Infertility associated with multi-tailed spermatozoa and excessive DNA

Inflammatory bowel disease 1, susceptibility to

Leber congenital amaurosis 4

Left ventricular noncompaction 6

Li-Fraumeni syndrome 1

Limb-girdle muscular dystrophy, type 2A

Limb-girdle muscular dystrophy, type 2G

Long QT syndrome 3

Lynch syndrome

Lynch syndrome I

Lynch syndrome II

Macular dystrophy, vitelliform, adult-onset

Malignant tumor of prostate

Marfan’s syndrome

Maturity-onset diabetes of the young, type 2

Meckel-Gruber syndrome

Mental retardation, X-linked, syndromic 13

Microcephaly, normal intelligence and immunodeficiency

Multiple epiphyseal dysplasia 4

Myopathy, distal, 1

Neurofibromatosis, familial spinal

Neurofibromatosis, type 1

Neurofibromatosis, type 2

Neurofibromatosis-Noonan syndrome

Niemann-Pick disease, type A

Osteogenesis imperfecta

Osteogenesis imperfecta type I

Osteogenesis imperfecta type III

Pachydermoperiostosis syndrome

Pachyonychia congenita type 2

Pancreatic cancer 2

Pancreatic cancer 4

Pancreatic cancer, susceptibility to

Parkinson disease 6, autosomal recessive early-onset

Parkinson disease, late-onset

Pendred’s syndrome

Persistent hyperinsulinemic hypoglycemia of infancy

Phenylketonuria

Phosphate transport defect

Polycystic kidney disease, infantile type

Primary familial hypertrophic cardiomyopathy

Primary hyperoxaluria, type II

Primary progressive aphasia

Pseudo-Hurler polydystrophy

Pseudoxanthoma elasticum

Retinitis pigmentosa 19

Retinitis pigmentosa 7

Retinoblastoma

Rett’s disorder

Schwannomatosis

Spastic ataxia Charlevoix-Saguenay type

Stargardt disease 1

Supranuclear palsy, progressive, 1, atypical

Symmetrical dyschromatosis of extremities

Tay-Sachs disease

Turcot syndrome

Tyrosinase-negative oculocutaneous albinism

Werdnig-Hoffmann disease

Wilson’s disease