Friday, July 6, 2012

How to read STR data

I know that there is a lot of controversy about the use of Y-STR data, and I agree with most of the criticism. However, sometimes no other data (e.g. Y-SNP data) are available, and in those cases Y-STRs can help a little to understand the pattern observed within a haplogroup subbranch.
Browsing STR databases (e.g. ysearch.org or semargl.me/en/dna/ydna/tools/asd-classic/) can be painful and often useless. The reason is that the relationship between two individuals is based solely on raw STR differences, and these differences are not "weighted" in any way; the databases just report "Distance markers" and "Distance steps".

Anyone who has looked at one of the FTDNA projects quickly realizes that some Y-STRs are more variable than others. In the L342+ group of the FTDNA R1a1a and Subclades Y-DNA Project, the following order of variability can be observed in the first 37 Y-STRs (from low to high variability):

DYS388, DYS437, DYS392, DYS455, DYS448, DYS393, DYS454, DYS426, DYS447, DYS438, DYS390, YCAIIb, DYS459a, DYS385a, DYS389ii, CDYa, DYS389i, YCAIIa, DYS464c, DYS449, DYS19, DYS464d, DYS464a, DYS385b, DYS459b, CDYb, DYS460, Y-GATA-H4, DYS391, DYS607, DYS439, DYS456, DYS570, DYS458, DYS576, DYS442, DYS464b
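An ordering like this can be obtained by computing the variance of the repeat counts at each marker across a group and sorting. The sketch below is a minimal illustration, assuming haplotypes are stored as dicts mapping marker names to repeat counts; the function name `markers_by_variance` and the toy values are hypothetical, not real project data, and the post does not say exactly how its ordering was computed.

```python
from statistics import pvariance

# Toy haplotypes (hypothetical values, not real project data):
# each dict maps an STR marker name to its repeat count.
haplotypes = [
    {"DYS388": 12, "DYS19": 16, "DYS464b": 14},
    {"DYS388": 12, "DYS19": 15, "DYS464b": 15},
    {"DYS388": 12, "DYS19": 16, "DYS464b": 17},
    {"DYS388": 12, "DYS19": 17, "DYS464b": 13},
]


def markers_by_variance(haps):
    """Return (marker, variance) pairs sorted from low to high variability.

    Only markers tested in every haplotype are used; the variance is the
    population variance of the repeat counts across the group.
    """
    shared = set(haps[0]).intersection(*haps[1:])
    ranked = [(m, pvariance([h[m] for h in haps])) for m in shared]
    # Sort by variance, then by name so ties come out in a stable order.
    return sorted(ranked, key=lambda mv: (mv[1], mv[0]))


for marker, var in markers_by_variance(haplotypes):
    print(f"{marker}\t{var:.3f}")
```

With the toy data, DYS388 (no variation) comes first and DYS464b comes last, mirroring the pattern in the list above.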

Differences at DYS464b are more common than differences at DYS388, so a difference at DYS464b is "less important" than one at DYS388. Any ranking should therefore be weighted according to the variability of each STR. The variability of some Y-STRs has already been calculated and published; YHRD lists these values.
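One simple way to apply such weights is to divide each marker's step difference by that marker's variance, so that a step at a slow marker counts more than a step at a fast one. This is only a sketch under that assumption (the post does not give an exact formula); `weighted_distance`, the `floor` parameter and the variance values are hypothetical.

```python
def weighted_distance(a, b, variances, floor=0.01):
    """Sum of per-marker step differences, each divided by the marker's variance.

    Markers missing from either haplotype are skipped. `floor` stops markers
    with (near-)zero variance from receiving an unbounded weight.
    """
    total = 0.0
    for marker, var in variances.items():
        if marker in a and marker in b:
            total += abs(a[marker] - b[marker]) / max(var, floor)
    return total


# Hypothetical variances: DYS388 slow, DYS464b fast.
variances = {"DYS388": 0.25, "DYS464b": 2.0}
a = {"DYS388": 12, "DYS464b": 14}
b = {"DYS388": 13, "DYS464b": 14}  # one step at the slow marker
c = {"DYS388": 12, "DYS464b": 15}  # one step at the fast marker
print(weighted_distance(a, b, variances))  # 4.0
print(weighted_distance(a, c, variances))  # 0.5
```

Both pairs differ by exactly one step, so an unweighted "Distance steps" count would score them the same; the weighted score ranks `c` as the much closer match to `a`.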

This additional information can be used to rank the best matches for an individual more reliably. The fewer Y-STRs available for a comparison, the more useful this approach becomes.

As a first test, I ranked the matches of a Kurdish individual:
kit H1483 (Z93+, L342+, L657-) in the FTDNA R1a1a and Subclades Y-DNA Project, using 34 STRs and only individuals who have all 34 of these STRs tested.
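A test like this boils down to two steps: keep only the kits that have every chosen marker tested, then return the N kits with the smallest weighted distance. The sketch below assumes the same 1/variance weighting as an illustration; `top_matches`, the kit names and all values are hypothetical and are not the actual H1483 data.

```python
import heapq


def top_matches(query, database, variances, markers, n=30, floor=0.01):
    """Return the n closest kits to `query` as (distance, kit) pairs.

    Only kits that have every marker in `markers` tested are compared, and each
    step difference is divided by that marker's variance (floored at `floor`).
    """
    def dist(hap):
        return sum(abs(query[m] - hap[m]) / max(variances[m], floor) for m in markers)

    complete = {kit: hap for kit, hap in database.items()
                if all(m in hap for m in markers)}
    # Ties in distance are broken by kit name, since the pairs compare as tuples.
    return heapq.nsmallest(n, ((dist(hap), kit) for kit, hap in complete.items()))


# Hypothetical toy data.
markers = ["DYS388", "DYS19", "DYS464b"]
variances = {"DYS388": 0.25, "DYS19": 1.0, "DYS464b": 2.0}
query = {"DYS388": 12, "DYS19": 16, "DYS464b": 14}
database = {
    "kitA": {"DYS388": 12, "DYS19": 16, "DYS464b": 16},
    "kitB": {"DYS388": 13, "DYS19": 16, "DYS464b": 14},
    "kitC": {"DYS388": 12, "DYS19": 17, "DYS464b": 14},
    "kitD": {"DYS388": 12, "DYS19": 16},  # DYS464b untested, so it is skipped
}
print(top_matches(query, database, variances, markers, n=2))
```

Filtering to complete kits first keeps every distance on the same set of markers, so the scores stay comparable across kits.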

Top 30 matches:



Next, I extended the approach to 67 Y-STRs, using the Arabic modal haplotype (2. C6. Z93+ L342+ L657-, Arabic) as the reference. Top 30 matches:
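A modal haplotype is commonly taken as the most frequent repeat count at each marker within a cluster. The post does not say how the project's modal was derived, so the sketch below just illustrates that common definition; `modal_haplotype`, the tie rule (the smaller value wins) and the toy values are assumptions.

```python
from collections import Counter


def modal_haplotype(haps):
    """Most common repeat count at each marker; ties go to the smaller value."""
    markers = set().union(*haps)
    modal = {}
    for m in sorted(markers):
        counts = Counter(h[m] for h in haps if m in h)
        best = max(counts.values())
        modal[m] = min(v for v, c in counts.items() if c == best)
    return modal


# Hypothetical toy cluster.
cluster = [
    {"DYS19": 16, "DYS388": 12},
    {"DYS19": 16, "DYS388": 12},
    {"DYS19": 15, "DYS388": 13},
]
print(modal_haplotype(cluster))  # {'DYS19': 16, 'DYS388': 12}
```

The resulting dict can then serve as the query haplotype when ranking matches.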


The Arabic cluster is clearly very narrow, with low variance, so it cannot be old.
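The reasoning here is that, under a stepwise mutation model, the STR variance within a lineage tends to grow roughly in proportion to the time since its common ancestor, so a cluster with low average variance points to a recent origin. A minimal sketch of that comparison, assuming the average per-marker population variance as the diversity measure; `mean_marker_variance` and both toy clusters are hypothetical, and turning such a value into years would additionally need per-marker mutation rates and a generation time, which are not used here.

```python
from statistics import pvariance


def mean_marker_variance(haps, markers):
    """Average over markers of the population variance of repeat counts in a cluster."""
    per_marker = [pvariance([h[m] for h in haps]) for m in markers]
    return sum(per_marker) / len(per_marker)


markers = ["DYS19", "DYS390", "DYS391"]
# Hypothetical toy clusters, not real data.
narrow = [
    {"DYS19": 16, "DYS390": 25, "DYS391": 11},
    {"DYS19": 16, "DYS390": 25, "DYS391": 10},
    {"DYS19": 16, "DYS390": 25, "DYS391": 11},
    {"DYS19": 16, "DYS390": 25, "DYS391": 11},
]
broad = [
    {"DYS19": 15, "DYS390": 24, "DYS391": 10},
    {"DYS19": 16, "DYS390": 25, "DYS391": 11},
    {"DYS19": 17, "DYS390": 26, "DYS391": 11},
    {"DYS19": 16, "DYS390": 25, "DYS391": 12},
]
print(mean_marker_variance(narrow, markers))  # 0.0625
print(mean_marker_variance(broad, markers))   # 0.5
```

Only the relative values matter in this comparison: the narrow cluster has far less variance than the broad one, which is the pattern described for the Arabic cluster.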


Then I tested the R1a1a Ashkenazi-Levite modal haplotype:


Like the Arabic cluster, the Ashkenazi-Levite cluster is pretty narrow, with low variance. It cannot be old either.


