deCODEme and 23andme material

This page contains material, and links to material, useful for analyzing deCODEme and 23andme test results, especially for the Y chromosome.

Merged Y-file

David Reynolds and Adriano Squecco are keeping a repository of Y-chromosome decodeme and 23andme results:
Y-Chromosome Genome Comparison

An older repository was kept by Ann Turner. The file will not be updated, but it is still very useful because it also contains the first version of the HapMap Y-chromosome data and a list of the phylogenetically relevant SNPs on decodeme and 23andme:
SNPs_on_Chips.xls

I have merged her worksheets for decodeme, 23andme (versions 1 and 2), and HapMap into one big csv file:
turnerYmerged.zip

(The file has not been updated for a while.) I did the merge in Matlab:
Matlab files

Notes on the Matlab programs

To make comparisons possible, I convert everything to the + orientation; that is, for decodeme observations on the - strand I swap A<->T and C<->G. I also set all missing values (that is, all non-ACGT values) to '-'.
The merge is based on the position variable: an rs is considered the same SNP if its reported position is the same in the two datasets. The resulting file is also ordered by position. If two different SNPs have the same position, they are merged and only one appears (23andme, for instance, has a few rs that appear twice under different names). This means that merges done with another system (e.g., by SNP#) will show a few more lines.
Also be aware that some rs may be reported differently across databases, so check for consistency.
The program should read decodeme files, 23andme v1 files (at least those saved from Excel as csv; I don't have an original result file), and 23andme v2 files. You will have to change the header lines, though.
Important: I have not checked the file, so mistakes are very likely. If you use the file, please let me know of any mistake or problem.
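
For readers who want to see the idea without opening the Matlab files, here is a minimal Python sketch of the two steps described in these notes: bringing calls to the + orientation with missing values set to '-', and merging datasets by position. It is not a translation of the Matlab code. The function names, the (name, position, call) tuple layout, and the sample rs names and positions are all made up for illustration.

```python
# Illustrative sketch only; names and data layout are assumptions, not the Matlab code.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def normalize_call(call, strand="+"):
    """Return the call in + orientation, with every non-ACGT symbol replaced by '-'."""
    if not call:
        return "-"
    out = []
    for base in call.upper():
        if base not in COMPLEMENT:
            out.append("-")              # missing or unknown value
        elif strand == "-":
            out.append(COMPLEMENT[base])  # swap A<->T and C<->G
        else:
            out.append(base)
    return "".join(out)


def merge_by_position(datasets):
    """Merge datasets on position.

    Each dataset is a list of (name, position, call) tuples. The result is a
    list of (position, name, calls) sorted by position, with one call per
    dataset and '-' where a dataset has no observation at that position.
    Two SNPs sharing a position collapse into one line (first name seen wins).
    """
    merged = {}
    for idx, rows in enumerate(datasets):
        for name, position, call in rows:
            entry = merged.setdefault(
                position, {"name": name, "calls": ["-"] * len(datasets)}
            )
            if entry["calls"][idx] == "-":
                entry["calls"][idx] = call
    return [(pos, merged[pos]["name"], merged[pos]["calls"]) for pos in sorted(merged)]


# Hypothetical example data: the first dataset reports on the - strand.
first = [("rs1", 200, normalize_call("T", "-")), ("rs2", 100, normalize_call("N", "-"))]
second = [
    ("rs1", 200, normalize_call("A")),
    ("rs3", 300, normalize_call("G")),
    ("rs3b", 300, normalize_call("G")),  # same position under a second name
]
merged = merge_by_position([first, second])
```

In this example the two names at position 300 end up on a single line, which is why a merge keyed on the rs name would produce a few more lines than a merge keyed on position.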

Regarding the output csv file:

The observations (which have the same headers as in Ann Turner's file) are in the order: decodeme, 23andme, HapmapCEU, HapmapCHB, HapmapJPT, HapmapYRI.
The file has more than 256 observations, so if you open it in Excel (at least in older versions, which are limited to 256 columns), Excel reads only the first 256 columns and skips the rest. The last observations are HapmapYRI (Yoruba), so if you are not interested in those, it won't matter to you. You can delete some columns if you need room to add columns for your own computations. The number of rs lines should be fine, though.
Also be aware that some rs may be reported differently across databases, so check for consistency.
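
If the column limit is a problem, one workaround is to cut the wide csv into several narrower files before opening them. The following Python sketch shows one way to do that; it is not part of the Matlab files, and the function names, the output file naming, and the assumption that the first column identifies the SNP are mine.

```python
import csv


def split_columns(rows, key_columns=1, max_columns=256):
    """Split wide rows into narrower tables, repeating the leading key columns in each."""
    if not rows:
        return []
    step = max_columns - key_columns
    if step <= 0:
        raise ValueError("max_columns must be larger than key_columns")
    width = max(len(row) for row in rows)
    parts = []
    for start in range(key_columns, width, step):
        parts.append([row[:key_columns] + row[start:start + step] for row in rows])
    return parts


def split_csv(path, out_prefix, key_columns=1, max_columns=256):
    """Read a wide csv file and write out_prefix_part1.csv, out_prefix_part2.csv, ..."""
    with open(path, newline="") as handle:
        rows = list(csv.reader(handle))
    for i, part in enumerate(split_columns(rows, key_columns, max_columns), start=1):
        with open(f"{out_prefix}_part{i}.csv", "w", newline="") as out:
            csv.writer(out).writerows(part)
```

Each part repeats the key column(s), so the pieces can still be matched line by line.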

I also have a smaller version in which I do some cleaning:
Ymergedclean.zip

This file drops most of the HapMap SNPs in the recombining part of the Y chromosome (that is, SNPs for which an observation can show two different alleles). It then tries to change some of the HapMap SNP values so that they are compatible with decodeme and 23andme. It also estimates the haplogroup of the HapMap observations. Of course, beware: things have not been checked carefully. The Matlab code used to create the file is included, so you can check the data cleaning.
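
As an illustration of the first cleaning step only, here is a small Python sketch that drops every SNP line on which at least one observation shows two different alleles. This is a guess at one possible criterion, not the rule used in the enclosed Matlab code, and the function names and the (position, calls) row layout are assumptions.

```python
def is_heterozygous(call):
    """True when a call contains two different ACGT letters, for example 'AG'."""
    bases = {base for base in call.upper() if base in "ACGT"}
    return len(bases) > 1


def drop_heterozygous_snps(rows):
    """Keep only the rows in which no observation has a heterozygous call.

    Each row is a (position, calls) pair, where calls is a list of strings.
    Assumed criterion for illustration; check the Matlab code for the real one.
    """
    return [row for row in rows if not any(is_heterozygous(c) for c in row[1])]
```

Missing values such as '-' are ignored by the check, so a line with gaps is kept as long as no observation is heterozygous.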

Simple admixture

Dienekes has created a program to compute a simple admixture model of NW European, SE European, and Ashkenazi Jewish ancestry (see his blog entry for an explanation). The model uses frequency data from Price 2007 to guess the likely percentage of each of these three populations in one's genome. I have replicated his computation in Matlab:
basicadmixture.zip
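
To give a feel for what a three-population admixture estimate can look like, here is a generic Python sketch. It is not Dienekes' program and not the Matlab replication: it assumes a plain binomial likelihood and a grid search over the three weights, and the function names and data layout are hypothetical. Genotypes are counted as 0, 1 or 2 copies of a chosen allele per SNP, and each SNP has one frequency of that allele per population.

```python
import math


def log_likelihood(genotypes, freqs, weights):
    """Binomial log-likelihood (up to a constant) of the genotypes under the given weights.

    genotypes: copies (0, 1 or 2) of the counted allele at each SNP.
    freqs: for each SNP, a tuple with that allele's frequency in each population.
    weights: admixture proportions, one per population, summing to 1.
    """
    total = 0.0
    for copies, pop_freqs in zip(genotypes, freqs):
        f = sum(w * p for w, p in zip(weights, pop_freqs))
        f = min(max(f, 1e-6), 1 - 1e-6)  # keep the logarithms finite
        total += copies * math.log(f) + (2 - copies) * math.log(1 - f)
    return total


def estimate_admixture(genotypes, freqs, steps=20):
    """Grid search over three weights that sum to 1, in increments of 1/steps."""
    best, best_ll = None, -math.inf
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            weights = (i / steps, j / steps, (steps - i - j) / steps)
            ll = log_likelihood(genotypes, freqs, weights)
            if ll > best_ll:
                best, best_ll = weights, ll
    return best
```

A grid search is slow compared with a proper optimizer, but with only three populations it is easy to check by hand, and it cannot wander outside the valid range of proportions.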

Chromosome extraction

Since the result file is huge, this Matlab code extracts the data for each chromosome (decodeme only, for now) and saves it to a separate csv file (note: the program takes a long time to run).
extractchr.m
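
The same kind of extraction can be sketched in Python by reading the result file one line at a time and writing each line to a file for its chromosome. This is not a translation of extractchr.m. The function name and the output file names are hypothetical, and the chromosome column is assumed to be called "Chromosome"; check the header line of your own file and change the parameter if it differs.

```python
import csv
import os


def split_by_chromosome(in_path, out_dir, chrom_column="Chromosome"):
    """Write one csv file per chromosome and return {chromosome: number of lines}.

    The input is read a row at a time, so the whole file never has to fit in memory.
    """
    os.makedirs(out_dir, exist_ok=True)
    handles, writers, counts = {}, {}, {}
    try:
        with open(in_path, newline="") as source:
            reader = csv.DictReader(source)
            for row in reader:
                chrom = row[chrom_column]
                if chrom not in writers:
                    # First line seen for this chromosome: open its file, write the header.
                    path = os.path.join(out_dir, f"chr{chrom}.csv")
                    handles[chrom] = open(path, "w", newline="")
                    writers[chrom] = csv.DictWriter(
                        handles[chrom], fieldnames=reader.fieldnames
                    )
                    writers[chrom].writeheader()
                    counts[chrom] = 0
                writers[chrom].writerow(row)
                counts[chrom] += 1
    finally:
        for handle in handles.values():
            handle.close()
    return counts
```

Calling split_by_chromosome("myresults.csv", "by_chromosome") (both names are placeholders) would create files such as by_chromosome/chrY.csv, each with the original header line.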
