Thursday, May 23, 2013

R1a tree

Today I want to update the STR111 tree of R1a1a that I presented earlier here, here and here. For the first time I also tried to incorporate some SNP information into the tree, which made the R1a1a branching much clearer, although it is still not perfect. Assuming an initial branching of R1a at 8000 BP, I also calculated the age of each node of the tree (see table below). Last but not least, I increased the number of individuals in this tree (N=547).

Rectangular tree of R1a (as pdf):

Polar tree of R1a (as pdf):

SNP            Age in years (based on tree)    Age in years (based on STR111 variability)
M420           8000                            8000
SRY10831.2     7798                            7907
L664           4965                            4375
Z645/Z647      6117                            7294
Z283           5938                            6751
M458           4625                            3931
L260           3598                            2411
CTS11962       4013                            3069
L1029          4341                            3078
Z280           5614                            6050
Z92            4597                            3996
CTS1211        5322                            5381
P278           3719                            2473
CTS3402        5046                            4937
L366           3079                            1038
L365           4095                            2041
L1280          3281                            2169
Z284           5063                            4688
L448           4069                            2857
CTS4179        3740                            2212
L176           2956                            1128
Z287/Z288      4908                            4499
Z93            5989                            6979
Z94            5795                            6900
Z2121/Z2124    5322                            5319
Z2122          4124                            2457
Z2123          4781                            3998
L657           4729                            4131
Y7             3885                            2197

Due to the size of the tree I split the tree into pieces.

Edit May 25th 2013:

I did not use a molecular clock; I simply defined the total age of the whole tree as 8000 years BP.

Example Z94:
1. First I calculated the standard deviation (STDEV) of each STR within Z94+ individuals (note: for STDEV I only used individuals with STR111 data).
2. I calculated the sum of all STR111 standard deviations (STDEV DYS393 + STDEV DYS390 + STDEV DYS19 + STDEV DYS391 + STDEV DYS385a + STDEV DYS385b + STDEV DYS426 + STDEV DYS388 + STDEV DYS439 + STDEV DYS389i + STDEV DYS392 + STDEV DYS389ii + etc.). For Z94+ individuals the sum of all STR111 standard deviations is 51.307. The sum of all STR111 standard deviations within R1a (all N=547 individuals of the tree) is 53.823.
3. I figured out a correlation between the sum of all STR111 standard deviations within Z94+ individuals and the relative age of the SNP (see formula below).
4. Arbitrarily, I defined the total age of the whole tree at 8000 years BP. It might be better to see the age estimates as relative age estimates, not as absolute age estimates.
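
Steps 1 and 2 can be sketched in a few lines of Python. This is only an illustration under assumptions: the post does not say whether the sample or the population standard deviation was used, so the sketch uses the sample version (`statistics.stdev`). The function name `sum_of_str_stdevs` and the three-marker toy haplotypes are made up for the example and are not real project data.

```python
from statistics import stdev

def sum_of_str_stdevs(haplotypes):
    """Sum the per-marker standard deviations over all markers.

    haplotypes: list of dicts mapping marker name -> repeat count.
    Assumption: every individual carries the same set of markers
    (in the post, only individuals with full STR111 data are used).
    """
    markers = list(haplotypes[0].keys())
    # Sample standard deviation per marker (an assumption, see lead-in).
    return sum(stdev([h[m] for h in haplotypes]) for m in markers)

# Hypothetical toy data: three individuals, three markers.
toy = [
    {"DYS393": 13, "DYS390": 25, "DYS19": 16},
    {"DYS393": 13, "DYS390": 24, "DYS19": 15},
    {"DYS393": 14, "DYS390": 25, "DYS19": 16},
]

total = sum_of_str_stdevs(toy)
print(total)
```

Run once on the members of a clade and once on all individuals of the tree, a function like this would give the two sums that go into the age formula.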

Age of SNP Z94=8000/(2.8863*e^(0.0588*(sum of all STR111 standard deviations within all 547 R1a individuals)))*(2.8863*e^(0.0588*(sum of all STR111 standard deviations within Z94+ individuals)))

Age of SNP Z94=8000*(2.8863*e^(0.0588*51.307))/(2.8863*e^(0.0588*53.823))
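
As a quick check of the Z94 example, the formula can be evaluated directly. The sketch below uses only the constants quoted above (2.8863, 0.0588, 8000, 53.823, 51.307); the names `fitted` and `snp_age` are invented for illustration. Note that the prefactor 2.8863 cancels in the ratio, so the age reduces to 8000*e^(0.0588*(51.307-53.823)).

```python
import math

A = 2.8863         # prefactor from the formula above (cancels in the ratio)
B = 0.0588         # exponent coefficient from the formula above
ROOT_AGE = 8000.0  # age assigned to the whole tree, in years BP
ROOT_SUM = 53.823  # sum of STR111 standard deviations, all 547 individuals

def fitted(std_sum):
    """Fitted curve A * e^(B * sum of standard deviations)."""
    return A * math.exp(B * std_sum)

def snp_age(clade_sum, root_sum=ROOT_SUM, root_age=ROOT_AGE):
    """Scale the root age by the ratio of the fitted curve values."""
    return root_age * fitted(clade_sum) / fitted(root_sum)

z94_age = snp_age(51.307)
print(round(z94_age))
```

With these inputs the result comes out at about 6900 years, which agrees with the Z94 entry in the STR111 variability column of the table.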

Update 06/18/2013:
I generated a Z282+* STR67 tree.

Radial tree:

Rectangular tree:


  1. This is a massive, almost monumental work, which I would like to thank you for.

    However I wonder if any conclusions can be drawn at all, considering that almost all samples are European (I could just identify a handful of West Asians and a single Indian individual, all concentrated in the Z93+ zone).

  2. Thanks, Maju, it took me quite a while.

    Exactly because of the reasons you pointed out, I try not to draw too many conclusions from the tree. I let the trees speak for themselves, and most of my comments are primarily descriptive.

    I try to focus on individuals with STR111 data, but only very few West, Central and South Asians have tested at that level. Due to the high number of Europeans and low number of Asians in this data set, the tree is shifting from Z93 towards Z283. This can be seen in the age estimates based on the tree (Z283: 5973 years; Z93: 5792 years) vs. the age estimates based on the actual observed STR variability (Z283: 6752 years; Z93: 6980 years).

    Although the Asian side of R1a is heavily underrepresented in public data sets we already know the following:
    From the SNP distribution of the "FTDNA R1a1a and subclades project" we know that the oldest branches of R1a were also found in West Asia.
    We also know that one of the oldest branches of the "European" Z283 branch was found in West Asia (including one individual with paternal Kurdish ancestry).

    I believe that rigorous SNP testing of West Asians will lead to many surprises in the R1a tree. Unfortunately, right now most Asian R1a-tested individuals are from the Arabian Peninsula, which doesn't show much diversity.

  3. Edit: I implemented some more SNP data to further improve outcome; tree looks better now.

  4. This is not only a work of science, it is a work of art. B K Nelson, PhD.

  5. Interesting. The North-Western in grey is N-W Europe I take it?

    1. L664+ is grey, mostly N-W Europe.