Tuesday, August 21, 2012

Whole Genome Comparison: Kurds Part II

I recently saw the ACD (Ancestral Component Dissection) tool at Vaêdhya. Unfortunately, I was not able to download it, but I could work out the logic behind it from the figures presented there. What the author did was:
1. Take several populations from one region and assume a single main ancestry for all of them.
2. Take the lowest percentage of each component (Dodecad, Harappa or Eurogenes) among these populations and declare this lowest percentage ancestral.
3. Subtract this lowest percentage from each population's actual percentage of that component.

Example with the Baloch component in Harappa Project:
Step 1:
Armenian: 18%
Assyrian: 19%
Kurd: 26%
Iranian: 27%
Step 2:
Lowest Percentage for the Baloch component in these four populations: 18%
Step 3:
Armenian: 18% - 18% = 0%
Assyrian: 19% - 18% = 1%
Kurd: 26% - 18% = 8%
Iranian: 27% - 18% = 9%
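
The three steps above can be sketched in a few lines of Python. This is only my reading of the logic from the published figures, not the actual ACD tool; the function name subtract_minimum is made up for illustration.

```python
# Sketch of the subtraction logic as described in the three steps above.
# Assumption: this mirrors the logic inferred from the figures, not the real ACD code.

def subtract_minimum(percentages):
    """Subtract the lowest value of one component from every population's value."""
    baseline = min(percentages.values())  # Step 2: lowest percentage = "ancestral"
    # Step 3: actual percentage minus the lowest percentage
    return {pop: value - baseline for pop, value in percentages.items()}

# Step 1: Baloch component (HarappaWorld) for the four populations in the example
baloch = {"Armenian": 18, "Assyrian": 19, "Kurd": 26, "Iranian": 27}

excess = subtract_minimum(baloch)
for pop, value in excess.items():
    print(f"{pop}: {value}%")
```

Run on the numbers from the example, this reproduces the 0%, 1%, 8% and 9% listed in Step 3.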

His goal is to measure the genetic influx from outside the region. This measurement rests on the assumption of one ancestral population shared by all compared populations. Let's see what this ancestral population would look like:

The minimum values of the 16 components for Armenians, Assyrians, Kurds and Iranians are:
S Indian      0%
Baloch       18%
Caucasian    42%
NE Euro       1%
SE Asian      0%
Siberian      0%
NE Asian      0%
Papuan        0%
American      0%
Beringian     0%
Medit.        5%
SW Asian     11%
San           0%
E African     0%
Pygmy         0%
W African     0%
Total        77%

Let's scale these numbers up to a total of 100% (each value divided by 0.77 and rounded to a whole percentage):
S Indian      0%
Baloch       23%
Caucasian    55%
NE Euro       1%
SE Asian      0%
Siberian      0%
NE Asian      0%
Papuan        0%
American      0%
Beringian     0%
Medit.        6%
SW Asian     14%
San           0%
E African     0%
Pygmy         0%
W African     0%
Total       100%
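
The rescaling is simple arithmetic: every minimum value is divided by the total of 77 and multiplied by 100. A minimal sketch, with only the non-zero components included (the function name rescale_to_100 is made up for illustration):

```python
# Rescale the minimum component values so that they sum to 100%.
# Only the non-zero components from the table above are listed.

def rescale_to_100(components):
    """Divide each value by the total and express it as a percentage."""
    total = sum(components.values())
    return {name: 100 * value / total for name, value in components.items()}

minimums = {"Baloch": 18, "Caucasian": 42, "NE Euro": 1, "Medit.": 5, "SW Asian": 11}

rescaled = rescale_to_100(minimums)
for name, value in rescaled.items():
    print(f"{name}: {value:.1f}%")
```

Because each value is rounded to a whole percentage separately, the rounded values listed above do not have to add up to exactly 100.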

In short:
# Population Percent
1 Caucasian 55
2 Baloch 23
3 SW-Asian 14
4 Mediterranean 6
5 NE-Euro 1

The main drawbacks of this approach are:
1. The calculation processes data that have already been processed instead of working directly on raw data, so the error gets bigger.
2. It assumes that neighboring peoples with different histories, languages and religions share one main common ancestry.


I actually used a similar approach here, but I did not use population averages and I did not assume a common ancestry for these populations. Instead, I focused only on the results of individuals from a single population. My goal was to see similarities, not differences. Onur criticized that my old approach was based on the admixture results of others and not on raw data.


So what I did now was to take all the Kurdish raw data and fuse them into one genome. This genome is based on the allele frequencies in the Kurdish results that I presented here. Allele frequencies of 25-75% were considered heterozygous, and allele frequencies below 25% or above 75% were considered homozygous.
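
The calling rule can be sketched as follows. This is only an illustration of the thresholds, not the script I used: the function name, the SNP IDs and the frequencies are invented, and I assume that a frequency below 25% means homozygous for the other allele.

```python
# Sketch of the 25%/75% rule for building one "fused" genotype per SNP.
# Assumption: freq is the frequency of the alt allele among the pooled samples;
# the SNP IDs and frequencies below are hypothetical examples, not real data.

def call_genotype(freq, ref, alt, low=0.25, high=0.75):
    """Return a two-letter genotype from the frequency of the alt allele."""
    if freq < low:
        return ref + ref   # alt allele is rare: homozygous for ref
    if freq > high:
        return alt + alt   # alt allele is dominant: homozygous for alt
    return ref + alt       # 25-75%: heterozygous

# (SNP ID, ref allele, alt allele, alt allele frequency), all hypothetical
snps = [
    ("rs_example_1", "A", "G", 0.10),
    ("rs_example_2", "C", "T", 0.50),
    ("rs_example_3", "G", "T", 0.90),
]

fused = {snp_id: call_genotype(freq, ref, alt) for snp_id, ref, alt, freq in snps}
print(fused)
```

A frequency of exactly 25% or 75% is treated as heterozygous here, following the wording "25-75%".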

Then I ran this fused Kurdish genome through HarappaWorld, Dodecad K12b, and Eurogenes K12.

Interestingly, the HarappaWorld results of the fused Kurdish genome are not far from the results above:

HarappaWorld for fused Kurdish genome (N=16):

# Population Percent
1 Caucasian 54.63
2 Baloch 28.84
3 SW-Asian 12.61
4 Mediterranean 2.85
5 NE-Euro 1.07


Closest populations to this fused Kurdish genome (N=16):

Single Population Sharing:


# Population (source) Distance
1 kurd (yunusbayev) 9.97
2 kurd (xing) 10.36
3 assyrian (harappa) 11.41
4 armenian (yunusbayev) 11.69
5 kurd (harappa) 12.08
6 azerbaijan-jew (behar) 12.29
7 armenian (behar) 12.83
8 iranian (harappa) 12.93
9 uzbekistan-jew (behar) 13.06
10 georgian (harappa) 13.52
11 iranian-jew (behar) 13.91
12 iraqi-mandaean (harappa) 14
13 georgia-jew (behar) 14.49
14 azeri (harappa) 14.62
15 iranian (behar) 15.3
16 turkish (harappa) 16.27
17 iraq-jew (behar) 16.47
18 turk (behar) 17.64
19 turk-kayseri (hodoglugil) 18.51
20 kumyk (yunusbayev) 19.35


Dodecad K12b for fused Kurdish genome (N=16):

# Population Percent
1 Caucasus 52.25
2 Gedrosia 29.22
3 Southwest_Asian 12
4 Atlantic_Med 4.18
5 North_European 2.35

Single Population Sharing:


# Population (source) Distance
1 Kurds (Yunusbayev) 10.3
2 Kurd (Dodecad) 11.48
3 Armenians_15 (Yunusbayev) 11.7
4 Iranian (Dodecad) 11.95
5 Azerbaijan_Jews (Behar) 12.09
6 Uzbekistan_Jews (Behar) 12.59
7 Assyrian (Dodecad) 13.02
8 Armenian (Dodecad) 13.25
9 Georgia_Jews (Behar) 13.38
10 Iranian_Jews (Behar) 14.49
11 Iranians (Behar) 14.87
12 Armenians (Behar) 15.19
13 Turks (Behar) 16.88
14 Iraq_Jews (Behar) 17.74
15 Turkish (Dodecad) 19.37
16 Kumyks (Yunusbayev) 20.68
17 Druze (HGDP) 23.08
18 Adygei (HGDP) 23.34
19 Lezgins (Behar) 23.49
20 Chechens (Yunusbayev) 23.62


Eurogenes K12b for fused Kurdish genome (N=16):


# Population Percent
1 Caucasus 45.67
2 W-Central Asian 22.79
3 Mediterranean 19.29
4 Southwest Asian 11.81
5 North European 0.45



