deCODEme and 23andme material
This page contains material, and links to material, useful for analyzing
deCODEme and
23andme test results, especially for the Y chromosome.
Merged Y-file
David Reynolds and Adriano Squecco maintain a repository of
Y-chromosome deCODEme and 23andme results:
Y-Chromosome Genome Comparison
An older repository was kept by Ann Turner. The file will no longer be updated. However, it
is still very useful because it
also contains the first version of the HapMap Y-chromosome data and a list of phylogenetically
relevant SNPs on the deCODEme and 23andme chips:
SNPs_on_Chips.xls
I have merged her worksheets for deCODEme, 23andme (versions 1 and 2), and HapMap
into one big csv file:
turnerYmerged.zip
(the file has not been updated for a while).
I merged using Matlab:
Matlab files
Note for the matlab programs
- To make comparisons, I convert everything to the + orientation; that is,
for deCODEme observations reported in the - orientation I swap A<->T and C<->G. I also
set all missing values (that is, all non-ACGT values) to '-'.
- The merging was done based on the position variable (that is,
an rs is considered the same if its reported position is the
same in the two datasets). The resulting file is ordered by position
as well. If there are two different SNPs with the same position, they are
merged and only one appears (23andme, for instance, has a few rs appearing twice
with different names). This means that merges done with other systems (e.g. by
SNP#) will show a few more lines.
- Be also aware that some rs may be reported differently across
databases, so check for consistency.
- The program should read deCODEme files, 23andme v1 files (at least
those saved from Excel as csv; I don't have an original result file),
and 23andme v2 files. You will have to change the header lines, though.
- Important: I have not checked the file, so mistakes
are very likely. If you use the file, please let me know of
any mistake or problem.
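The orientation and merge rules in these notes can be illustrated with a short Python sketch. It is not the Matlab code linked above, only a minimal restatement of the rules as described. It assumes single-letter Y calls and a strand field of '+' or '-'; the record layout, the SNP names, and the choice to keep the last record seen at a duplicated position are all assumptions made for the example.

```python
# Hypothetical sketch of the rules described above, not the author's Matlab code.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def normalize_call(call, strand="+"):
    """Return the call in + orientation; anything that is not A/C/G/T becomes '-'."""
    call = call.strip().upper()
    if call not in COMPLEMENT:
        return "-"
    return COMPLEMENT[call] if strand == "-" else call


def to_position_map(records):
    """Build {position: call} from (name, position, strand, call) records.

    Two names at the same position collapse into one entry
    (assumption: the last record seen wins).
    """
    out = {}
    for _name, pos, strand, call in records:
        out[int(pos)] = normalize_call(call, strand)
    return out


def merge_by_position(*datasets):
    """Merge position maps into rows ordered by position; absent calls are '-'."""
    positions = sorted(set().union(*(d.keys() for d in datasets)))
    return [(pos, [d.get(pos, "-") for d in datasets]) for pos in positions]


# Made-up records, for illustration only.
first = to_position_map([("snp_a", 2000, "-", "A"), ("snp_b", 3000, "+", "N")])
second = to_position_map([
    ("snp_a", 2000, "+", "T"),
    ("snp_c", 1000, "+", "G"),
    ("snp_c_dup", 1000, "+", "G"),  # same position under a second name
])
for row in merge_by_position(first, second):
    print(row)
```

Because the key is the position rather than the name, the two made-up names at position 1000 produce a single row, which is why a merge keyed on SNP name would show a few more lines.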
Regarding the output csv file:
- The observations (which have the same headers as in
Ann Turner's file) are in the order: deCODEme, 23andme, HapmapCEU,
HapmapCHB, HapmapJPT, HapmapYRI.
- The file has more than 256 observations, so if you use an older version of Excel
(2003 or earlier), it reads only the
first 256 columns and skips the rest. The last observations are
HapmapYRI (Yoruba), so if you are not interested in those, it won't matter to you.
You can delete other columns if you need room to add columns for your own computations.
The number of rs lines should be fine, though.
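If the column limit is a problem, one option is to write a narrower copy of the csv file before opening it in a spreadsheet. The sketch below is in Python rather than Matlab and makes no claim about the real headers: the function name, the file paths, and the `HapmapYRI` header prefix in the usage note are assumptions to be adjusted to the actual file.

```python
import csv


def keep_columns(in_path, out_path, keep):
    """Copy a csv file, keeping only columns whose header satisfies keep(header).

    Returns the number of columns written. Short rows are padded with ''.
    """
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader)
        idx = [i for i, name in enumerate(header) if keep(name)]
        writer.writerow([header[i] for i in idx])
        for row in reader:
            writer.writerow([row[i] if i < len(row) else "" for i in idx])
    return len(idx)
```

For example, `keep_columns("merged.csv", "merged_no_yri.csv", lambda h: not h.startswith("HapmapYRI"))` would drop the Yoruba observations, assuming their headers start with that prefix and that the unzipped file is called `merged.csv`.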
I also have a smaller version in which I do some cleaning:
Ymergedclean.zip
This file drops most of the HapMap SNPs in the recombining part of the Y chromosome
(that is, SNPs that can have different alleles).
It then tries to change some of the HapMap SNP values so that
they are compatible with deCODEme and 23andme. It also estimates the haplogroup
for the HapMap observations. Of course, beware: things have not
been checked carefully. The Matlab code used to create the file
is enclosed, so you can check the data cleaning.
Simple admixture
Dienekes has created a program to compute a simple
admixture model of NW European, SE European, and Ashkenazi Jewish ancestry (see
his blog entry for an explanation). The model uses frequency data from Price 2007
to guess the likely percentage of each of these three populations in
one's genome. I have replicated his computation in Matlab:
basicadmixture.zip
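To give an idea of how a frequency-based admixture guess can work, here is a generic Python sketch. It is not Dienekes' program and not the Matlab replication in the zip file, whose details are not described on this page. It assumes a plain maximum-likelihood fit over a grid of mixing weights; the allele frequencies and genotypes are made-up toy numbers, not the Price data.

```python
import math


def log_likelihood(weights, freqs, genotypes):
    """Binomial log-likelihood of the genotypes under a mix of population frequencies.

    freqs[k][i] is the frequency of the counted allele at SNP i in population k
    (must lie strictly between 0 and 1); genotypes[i] is 0, 1 or 2 copies.
    """
    total = 0.0
    for snp, g in enumerate(genotypes):
        p = sum(w * pop[snp] for w, pop in zip(weights, freqs))
        total += g * math.log(p) + (2 - g) * math.log(1.0 - p)
    return total


def estimate_admixture(freqs, genotypes, steps=100):
    """Grid search over three non-negative weights that sum to 1."""
    best, best_ll = None, -math.inf
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            w = (i / steps, j / steps, (steps - i - j) / steps)
            ll = log_likelihood(w, freqs, genotypes)
            if ll > best_ll:
                best, best_ll = w, ll
    return best


# Toy numbers, invented for illustration: three populations, four SNPs.
TOY_FREQS = [
    [0.9, 0.1, 0.8, 0.2],  # population 1
    [0.1, 0.9, 0.2, 0.8],  # population 2
    [0.5, 0.5, 0.5, 0.5],  # population 3
]
TOY_GENOTYPES = [2, 0, 2, 0]  # matches population 1 at every SNP

print(estimate_admixture(TOY_FREQS, TOY_GENOTYPES))
```

With these toy inputs every SNP favors the first population, so the search returns all the weight on it. With real data there are many more SNPs and the weights are usually mixed; a grid search is only practical because there are just three populations.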
Chromosome extraction
Since the result file is huge, this Matlab code extracts the data for each
chromosome (deCODEme only, for now)
and saves it to a separate csv file (note: the program takes a long time to run).
extractchr.m
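The same idea can be sketched in Python, reading the file once and writing one csv per chromosome. This is not a translation of extractchr.m. It assumes the result file is a csv with a header row and a column named `Chromosome`; that column name, the function name, and the output file naming are assumptions to check against your own file.

```python
import csv
import os


def split_by_chromosome(in_path, out_dir, chrom_col="Chromosome"):
    """Write one csv per chromosome into out_dir; return the chromosome labels found."""
    os.makedirs(out_dir, exist_ok=True)
    files, writers = {}, {}
    try:
        with open(in_path, newline="") as src:
            reader = csv.DictReader(src)
            for row in reader:
                chrom = row[chrom_col]
                if chrom not in writers:
                    # First row for this chromosome: open its file and write the header.
                    path = os.path.join(out_dir, f"chr{chrom}.csv")
                    files[chrom] = open(path, "w", newline="")
                    writers[chrom] = csv.DictWriter(files[chrom], fieldnames=reader.fieldnames)
                    writers[chrom].writeheader()
                writers[chrom].writerow(row)
    finally:
        for f in files.values():
            f.close()
    return sorted(files)
```

Calling `split_by_chromosome("myscan.csv", "by_chr")` (both names are placeholders) would leave files such as `by_chr/chrY.csv`. Streaming row by row keeps memory use low, since the whole result file never has to be loaded at once.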
Links
- Ron Scott has several links to information about deCODEme and 23andme.