To improve our interpretation of 23andMe’s raw data, we used allele frequency data from next-generation sequencing to identify hundreds of inaccurate SNPs. Read on to see what we did, and how you can get your own data analyzed.

Note: 23andMe recently revamped their online service, but the genotyping chip has not changed. The v4 chip, launched in December 2013, is still being used.

While developing and testing our Enlis Genome Personal software, we noticed some unusual SNPs in 23andMe’s raw data. We found many rare homozygous SNPs with very serious consequences, and the same SNPs appeared in multiple samples that we had on hand!

Here is an example:

The SNP variant shown here is a splice disruption in a gene called HEXA. Splice disruptions in HEXA are known to cause Tay-Sachs disease. Not only do all 3 of these 23andMe users have this extremely rare homozygous (2 copies) splice disruption SNP, but all 3 users also have 2 more extremely rare homozygous splice disruption SNPs in the same HEXA gene! That can’t be right.
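To see why this can’t be right, here is a rough back-of-the-envelope sketch. It assumes Hardy-Weinberg proportions and unrelated users, and the allele frequency of 1 in 1,000 is a made-up illustrative value, not a measured frequency for any HEXA variant.

```python
def prob_all_homozygous(allele_freq: float, n_people: int) -> float:
    """Chance that n unrelated people are ALL homozygous for one allele.

    Assumes Hardy-Weinberg proportions, where the homozygote
    frequency is the allele frequency squared.
    """
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele_freq must be between 0 and 1")
    homozygous_freq = allele_freq ** 2
    return homozygous_freq ** n_people


# Hypothetical allele frequency (illustrative assumption only)
q = 0.001
print("one user homozygous:   ", prob_all_homozygous(q, 1))
print("three users homozygous:", prob_all_homozygous(q, 3))
```

Even with a generous allele frequency, the chance of three unrelated people all carrying two copies is vanishingly small, and that is before requiring the same pattern at two more SNPs in the same gene. A genotyping error is a far more plausible explanation.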

We wanted to verify this with more data and identify similar inaccurate positions, so we first downloaded the database of user-submitted 23andMe data from Opensnp.org.

Then, using our software’s Variation Filter tool, we compared the allele frequency of each 23andMe SNP among 1,500 users against the allele frequency expected from next-generation sequencing projects (1000 Genomes and the Exome Aggregation Consortium).
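The comparison can be sketched roughly as follows. This is not the Variation Filter tool itself, only a minimal illustration of the idea. The function names and the 0.2 tolerance are assumptions chosen for the example, and the genotype strings follow the two-letter style used in 23andMe raw data.

```python
from collections import Counter


def observed_allele_frequency(genotypes, allele):
    """Fraction of called alleles equal to `allele` across many users.

    Each genotype is a two-letter string such as "AG". No-calls ("--")
    and anything that is not two letters long are skipped.
    Returns None when there are no usable calls.
    """
    counts = Counter()
    for gt in genotypes:
        if len(gt) != 2 or "-" in gt:
            continue
        counts.update(gt)  # counts each of the two alleles
    total = sum(counts.values())
    return counts[allele] / total if total else None


def is_suspicious(observed, expected, tolerance=0.2):
    """Flag a SNP whose observed frequency is far from the reference.

    `tolerance` is a hypothetical cutoff picked for illustration.
    """
    return abs(observed - expected) > tolerance


# Made-up genotypes for one SNP across a handful of users
calls = ["AA", "AA", "AA", "--", "AA"]
obs = observed_allele_frequency(calls, "A")
# Hypothetical reference frequency from sequencing data
print(obs, is_suspicious(obs, expected=0.001))
```

A real analysis would likely use a statistical test and account for ancestry differences between the user population and the reference panels, rather than a fixed cutoff.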

As it turns out, there are more than 500 inaccurate positions like this in 23andMe’s raw data:

323 of the faulty SNPs are in splice sites, and 246 of those are splice disruptions (more serious).

75 are missense.

The faulty SNPs are in 279 different genes, and 243 of those genes are known to affect a human disease or trait.

We have notified 23andMe of this problem, and we hoped that they would fix their raw data. So far, however, they have not seemed very interested in our findings. This raises a question: if 23andMe wants an ongoing relationship with its customers, what is its responsibility to fix the raw data when errors are discovered?

So there is some inaccurate data in 23andMe’s results. Is this cause for banning the download of raw data? No, not at all. In data sets this large, there are bound to be errors of this nature. We should fix errors where we find them and move forward. But if you want to get raw data interpreted, make sure that you use an experienced service with quality control measures in place.

When you import your 23andMe data with our online import tool, we automatically remove these inaccurate SNPs. To my knowledge, we are the only 23andMe interpretation service to provide this level of quality control.
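For readers who want to try something similar on their own file, here is a minimal sketch of this kind of filter. It is not our import tool. It assumes the usual 23andMe raw data layout (comment lines starting with "#", then tab-separated rsid, chromosome, position, and genotype), and the rsids and positions in the example are made up.

```python
def filter_raw_data(lines, bad_rsids):
    """Yield raw data lines, dropping SNPs whose rsid is in `bad_rsids`.

    Comment lines (starting with "#") and blank lines pass through
    unchanged so the file header is preserved.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            yield line
            continue
        rsid = line.split("\t", 1)[0]
        if rsid not in bad_rsids:
            yield line


# Made-up rows and a hypothetical list of known-bad SNPs
sample = [
    "# rsid\tchromosome\tposition\tgenotype",
    "rs0000001\t1\t1000\tAG",
    "rs0000002\t15\t2000\tTT",
    "rs0000003\t15\t3000\tCC",
]
bad = {"rs0000002"}

for row in filter_raw_data(sample, bad):
    print(row)
```

Using a set for the list of bad rsids keeps each lookup fast, which matters when a raw data file contains hundreds of thousands of rows.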

Click here to get started on the analysis of your own 23andMe data!