Saturday, August 18, 2012

R1a1a comparison STR111 Part II

First of all, I want to thank the blogger Humata, who helped me analyze R1a1a; I used the same tools that he used to analyze haplogroup Q in his recent post.

So, with Humata's help I analyzed all R1a1a individuals that have STR111 data. Again, I used the adjusted distances that I described and used here and here.

To analyze the data I had to lengthen the IDs to 10 characters, because otherwise Fitch, the software I used, malfunctions.
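As a rough illustration of the ID step, here is a minimal Python sketch that pads each ID to a fixed width of 10 characters and writes a square distance matrix in PHYLIP-style layout. The function names `pad_id` and `to_phylip`, the choice of trailing spaces as padding, and the distance values are assumptions made for this example; the post does not say how the IDs were actually lengthened.

```python
def pad_id(name: str, width: int = 10) -> str:
    """Pad an ID with trailing spaces to exactly `width` characters."""
    if len(name) > width:
        raise ValueError(f"ID {name!r} is longer than {width} characters")
    return name.ljust(width)


def to_phylip(names, matrix) -> str:
    """Format a square distance matrix with fixed-width names (PHYLIP style)."""
    if len(names) != len(matrix):
        raise ValueError("need one matrix row per name")
    lines = [f"{len(names):5d}"]  # first line: number of individuals
    for name, row in zip(names, matrix):
        cells = " ".join(f"{d:.4f}" for d in row)
        lines.append(f"{pad_id(name)} {cells}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The distances below are made-up placeholders, not real STR111 results.
    names = ["179005", "N77532-2C*"]
    matrix = [[0.0, 0.25], [0.25, 0.0]]
    print(to_phylip(names, matrix), end="")
```

An ID that already carries a group annotation, such as "N77532-2C*", is exactly 10 characters long and passes through unchanged, while a bare kit number such as "179005" is padded.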

There are various ways to illustrate the data. Here, I am presenting two layouts.

Rectangular tree layout (see high resolution image):

Polar tree layout (see high resolution image):

Again, to better see "what is what" I annotated each ID with the proposed group used in the R1a1a and Subclades Project at FTDNA, and I used the same colors for the subclades as in this figure from FTDNA. Even with 111 STR values the main R1a1a SNPs (Z93, Z283, etc.) overlap.

So which gives more accurate results, unrooted network analysis or rooted tree analysis?

It is quite obvious that even STR111 data are not sufficient to differentiate between the major R1a1a subclades. There are multiple overlapping haplogroups in the tree (presented below) and in the network (presented previously), i.e. Z283 and Z93 overlap when focusing only on the STR111 data. All previously presented trees add SNP information to the tree to correct this obvious overlap. As an example:
179005 Krikor Mirijanian, Arapkir, Turkey, who is Z93+, L342+, L657-.

Based on his STR111 values 179005 is closest to Z283+, Z280+ and Z283+, Z284+ individuals and not to other Z93+ individuals.
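To make the "closest to" statement concrete, here is a small Python sketch of how nearest neighbours can be read off STR data. It is only an illustration: the four-marker haplotypes and the sample labels are invented toy data, and the distance is a plain sum of absolute repeat differences, not the adjusted distance used for the trees in this post.

```python
def stepwise_distance(a, b):
    """Sum of absolute repeat-count differences (a simple stand-in metric)."""
    if len(a) != len(b):
        raise ValueError("haplotypes must have the same number of markers")
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest(target, haplotypes, k=2):
    """Return the k sample names closest to `target`, nearest first."""
    scored = [
        (stepwise_distance(haplotypes[target], values), name)
        for name, values in haplotypes.items()
        if name != target
    ]
    return [name for _, name in sorted(scored)[:k]]


if __name__ == "__main__":
    # Invented toy haplotypes (4 markers instead of 111); labels are hypothetical.
    toy = {
        "z93_query": [13, 25, 16, 11],
        "z283_a": [13, 25, 16, 10],
        "z283_b": [13, 24, 16, 11],
        "z93_other": [12, 24, 15, 11],
    }
    print(nearest("z93_query", toy))
```

In this toy set the Z93-labelled query sits closer to the two Z283-labelled samples than to the other Z93-labelled one, which is the kind of overlap described above.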

After Semargl was asked how he generated his R1a1a STR111 tree and why his tree shows clear clusters along the SNP branches, he responded that he uses not only STR but also SNP information to generate the tree. Additionally, his current phylogenetic tree is a cladogram, which means that it does not carry any information about the age or diversity of the R1a1a branches. For example, the Ashkenazi-Jewish Z93+, L342+, L657- cluster and the MacDonald cluster take up a large part of the tree, even though these clusters are known to be very narrow (low diversity). Hopefully, he can generate a new tree that includes all this information.

From the Fitch software manual:

In Fitch you can also randomize the input order of the sequences with option "j", jumble. Often the input order of the sequences affects the outcome of the analysis. This can be assessed by randomizing the input order. The program also asks you to specify the number of times you want to randomize the input order of the sequences. It is advisable to do jumbling at least 10 times, because it almost certainly improves the results.

This is why I repeated the analysis with 10 runs as advised. Indeed, the R1a1a STR111 tree looks a little bit better now.
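Fitch does the jumbling internally, but the idea is easy to sketch. The following Python example only illustrates what "randomizing the input order" means for a distance matrix: the individuals are shuffled and the rows and columns are permuted in the same way, so every pairwise distance stays attached to the same pair. The function names, the seed and the toy matrix are assumptions for this example and are not part of Fitch.

```python
import random


def jumble(names, matrix, rng):
    """Return names and matrix in a new random order (rows and columns together)."""
    order = list(range(len(names)))
    rng.shuffle(order)
    new_names = [names[i] for i in order]
    new_matrix = [[matrix[i][j] for j in order] for i in order]
    return new_names, new_matrix


def jumbled_inputs(names, matrix, runs=10, seed=1):
    """Produce `runs` randomly ordered copies of the same distance data."""
    rng = random.Random(seed)  # fixed seed so the orders are reproducible
    return [jumble(names, matrix, rng) for _ in range(runs)]


if __name__ == "__main__":
    # Toy distances, not real STR111 results.
    names = ["A", "B", "C"]
    matrix = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
    for run_names, run_matrix in jumbled_inputs(names, matrix, runs=3):
        print(run_names, run_matrix)
```

Each jumbled copy holds the same information in a different order; a tree-building program that adds individuals one at a time can end up with a different tree depending on that order, which is why several runs are compared.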

Rectangular tree layout (see high resolution image):

Polar tree layout (see high resolution image):

One of the newly discovered SNPs is Z1282, downstream of L342. It was found in N77532 Sundardas Tulsyan, India. He is in the tree as "N77532-2C*". Based on the presented tree and the previously presented network analysis, #184336, SAUD ABDUL AZIZ, Qatar, would be a good candidate for L1282, too (1844336-2C* in the tree and in the network analysis).


  1. Palisto,

    I am interested in making these plots. I tried to use the ACD tool but failed to download it. Could you help me make similar plots? I have used the McGee tools extensively.



  2. These are two different things. Do you want to use the ACD tool or do you want to make STR trees?
    If you want to use the ACD tool you need to know what your starting data set is: Admixture data with K=? (e.g. Dodecad K=12 or Harappa K=16)

    If you want to generate STR trees you need to know the number of STRs in your data set. (e.g. STR67, STR111)

  3. Palisto,

    Thanks for your response. I realized I do not want to use the ACD tool at the moment. I wanted to generate the STR trees. So far I have succeeded in creating:
    of the L147.1* group. It was created using McGee and the Phylip data and using Kitsch as described on using neighbouring of the timescales of McGee. To display it I did not use TreeView, but the ape package in R. I see a few possible improvements. I would prefer to use a "neighbouring" function of the STR values. Do you have suggestions?