KurdishDNA

Thursday, September 6, 2012

mtDNA Haplogroup H5

The interesting thing about the Kurdish individual with the H5a1 haplogroup is that the mtDNA contains SNPs that are characteristic of both H5a and H5b. I discovered this by using the mtDNA haplogroup predictor and the GenBank data summarized by Ian Logan:

H2a2a1(rCRS) ⇨ 263G ⇨ H2a2a ⇨ 8860G 15326G ⇨ H2a2 ⇨ 750G ⇨ H2a ⇨ 4769G ⇨ H2 ⇨ 1438G ⇨ H ⇨ 456T ⇨ H5'36 ⇨ 16304C ⇨ H5 ⇨ 4336C ⇨ H5a ⇨ 15833T ⇨ H5a1 ⇨ 3397G 5471A

Haplogroup H5b is defined by a mutation from G to A at position 5471, a.k.a. G5471A (or 5471A for short). Other mtDNA haplogroups with this G5471A mutation are HV7 and N1b. In other words, this mutation occurred multiple times during human evolution.
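To make the notation concrete, here is a minimal sketch that parses both the long form (G5471A) and the short form (5471A) and checks whether two labels describe the same variant. The names `Mutation`, `parse_mutation` and `same_variant` are my own illustrative choices, and the sketch only handles simple base substitutions (no insertions, deletions or other special codes).

```python
import re
from typing import NamedTuple, Optional


class Mutation(NamedTuple):
    ref: Optional[str]  # reference base; None when the short notation is used
    position: int       # position in the mitochondrial reference sequence
    alt: str            # observed base


def parse_mutation(label: str) -> Mutation:
    """Parse 'G5471A' (long form) or '5471A' (short form)."""
    match = re.fullmatch(r"([ACGT])?(\d+)([ACGT])", label.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label!r}")
    ref, position, alt = match.groups()
    return Mutation(ref, int(position), alt)


def same_variant(a: str, b: str) -> bool:
    """Two labels match if position and observed base agree."""
    ma, mb = parse_mutation(a), parse_mutation(b)
    return (ma.position, ma.alt) == (mb.position, mb.alt)


print(parse_mutation("G5471A"))
print(same_variant("G5471A", "5471A"))
```

Comparing only position and observed base is what lets the short form stand in for the long one.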

Another thing that affects the midpoint, and thus the outcome of the analysis, is the presence of repeated (identical) mtDNA samples in the tree, so I am excluding them, too.

In the H5 branch I am excluding these because they are exactly like [HM625680 Kloss]:
GQ983083(Italy) Santoro
GQ983085(Italy) Santoro
GQ983086(Italy) Santoro
GQ983094(Italy) Santoro

In the H36 branch I am excluding these because they are exactly like [FJ348166 Irene]:
FJ348167 Irene
FJ348168 Irene
FJ348169 Irene

In the H36 branch I am also excluding these because they are exactly like [FJ348151 Irene]:
FJ348152 Irene
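The exclusion of exact duplicates listed above could be automated roughly as follows. This is only a sketch: `parse_fasta` and `drop_duplicates` are hypothetical helper names, the toy sequences are made up for illustration, and "identical" here means an exact string match of the full sequence.

```python
from typing import Dict, List, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Return (header, sequence) pairs from FASTA-formatted text."""
    records: List[Tuple[str, str]] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks).upper()))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks).upper()))
    return records


def drop_duplicates(records: List[Tuple[str, str]]):
    """Keep the first record of each distinct sequence.

    Returns (kept, dropped), where dropped maps the kept header
    to the headers that were excluded as exact copies of it.
    """
    first_seen: Dict[str, str] = {}
    kept: List[Tuple[str, str]] = []
    dropped: Dict[str, List[str]] = {}
    for header, seq in records:
        if seq in first_seen:
            dropped.setdefault(first_seen[seq], []).append(header)
        else:
            first_seen[seq] = header
            kept.append((header, seq))
    return kept, dropped


# Toy input with invented sequences; real input would be the GenBank FASTA data.
toy = ">sampleA\nACGTAC\n>sampleB\nACGTAC\n>sampleC\nACGTAA\n"
kept, dropped = drop_duplicates(parse_fasta(toy))
print([header for header, _ in kept])
print(dropped)
```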

In Figtree, I generated a rectangular tree of H5:

Same data of H5 as polar tree:

At first look, it seems that the root of European H5a originated in the Middle East (bright yellow), which is in agreement with ancient DNA data.

From Wikipedia:

H5 has been dated to around 11,500 BP (9500 BC).[5] It appears to be most frequent and diverse in the Western Caucasus, so an origin there has been suggested, while its subclade H5a appears European.[6] However samples of mtDNA with T16304C in the HVR1 region have been found in four individuals of around 6800 BC from the Pre-Pottery Neolithic B site of Tell Halula, Syria,[7] suggesting that H5 may have arrived in the Caucasus with farmers from the Near East.

This preliminary conclusion needs more thought. I will update it.

Wednesday, September 5, 2012

mtDNA Haplogroup H5a1

Today, I want to show some data about mtDNA haplogroup H5a1 because there is a Kurdish individual with H5a1 in this Kurdish DNA project.

I analyzed the phylogeny of this haplogroup by using fully sequenced and published mtDNA data from GenBank.

To do so, I first downloaded the data from GenBank. Then I used CLUSTALW (mode: slow/accurate pairwise alignment) to align the sequences and create a rooted phylogenetic tree with branch lengths (UPGMA). The data were pasted into CLUSTALW in FASTA format.

I realized that all the sequences of Herrnstadt et al. 2002 lack the first 577 nucleotides, which distorts the position of the Herrnstadt samples in the trees/networks and, as a consequence, the position of all other samples. The same effect can be seen with the two samples of Kivisild et al., which lack 236 nucleotides. Thus, I excluded these samples.
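Truncated sequences like these can be spotted in the finished alignment because they start with a run of gap characters. The sketch below assumes the alignment has already been loaded into a dictionary of name to aligned sequence and that gaps are written as "-". The function names and the toy alignment are hypothetical.

```python
from typing import Dict, Tuple


def leading_gaps(aligned_seq: str, gap: str = "-") -> int:
    """Number of gap characters at the start of an aligned sequence."""
    return len(aligned_seq) - len(aligned_seq.lstrip(gap))


def split_by_coverage(
    alignment: Dict[str, str], max_leading_gaps: int = 0
) -> Tuple[Dict[str, str], Dict[str, int]]:
    """Separate complete sequences from those missing their start.

    Returns (complete, truncated); truncated maps each excluded
    name to the number of leading gap positions it has.
    """
    complete: Dict[str, str] = {}
    truncated: Dict[str, int] = {}
    for name, seq in alignment.items():
        gaps = leading_gaps(seq)
        if gaps > max_leading_gaps:
            truncated[name] = gaps
        else:
            complete[name] = seq
    return complete, truncated


# Invented toy alignment; a real one would span the whole mtDNA genome.
toy_alignment = {"complete_sample": "ACGTACGT", "truncated_sample": "---TACGT"}
complete, truncated = split_by_coverage(toy_alignment)
print(sorted(complete))
print(truncated)
```

A small nonzero `max_leading_gaps` would tolerate a few genuine alignment gaps at the start while still catching samples that lack hundreds of nucleotides.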

Rooted phylogenetic tree with branch lengths (UPGMA) of H5a1:

The nice thing about CLUSTALW is that it also generates a "dnd file" and an "aln file" of the alignment.

The dnd file can be opened with the Figtree software. In Figtree, I generated another tree of H5a1:

The aln file can be opened with the Splitstree software. In Splitstree4, I generated a network (Convex Hull) of H5a1:

H5a1a (T721C mutation): three individuals in these trees have this mutation [AF346975(Dutch) Ingman; Q983087(Italy) Santoro; HQ659693(Polish) FTDNA]

H5a1b (G11719A mutation): two individuals in these trees have this mutation [AY495167(European) Coble; AY495176(European) Coble]

H5a1c1a (C4095T G13194A G9055A A2851G mutations): only one individual in these trees has these mutations [HQ663878(Danish) FTDNA]

H5a1d (A8803D mutation): only one individual in these trees has this mutation [AY495171(European) Coble]

H5a1e (A16166G mutation): two individuals from Finland belong to the H5a1e branch [AY339431(Finland) Moilanen; AY339432(Finland) Moilanen].

H5a1f (T961C mutation): one individual in these trees has this mutation [JN646689(Polish) FTDNA]

H5a1g1 (T16172C, A444G, G9804A, T16311C mutations): two individuals in these trees have these mutations [EU294323 FTDNA; HQ645111(English) FTDNA]. Since HQ645111 has the additional T1284C and A7517G mutations it belongs to H5a1g1a.

Note: In the Finland DNA project I found one individual (N48161 Mary Anne Bodle, b.1791, Plumstead, Kent) who originated in England and has the same H5a1g1 mutations in the HV regions, i.e. A444G, T16172C, T16311C.

H5a1k (T12864C): two individuals have this mutation [AY495170(European) Coble, GQ983064(Italy) Santoro]

H5a1p (T16093C): three individuals have this mutation [FJ966912 FTDNA, GQ983075(Italy) Santoro, GQ983084(Italy) Santoro]

Sunday, September 2, 2012

Cultural Distance Calculator Part2

This is a follow-up for the Cultural Distance Calculator:

Since some cultural data are available, I decided to use phylogeny software to present the results. This helps to detect groups of populations that have similar cultural values and behavior. To visualize the data of the first 4 dimensions, I first calculated a distance matrix for all populations of the "Old World" (N=69). Then, I had to adjust the IDs to 10 characters to prevent the Fitch software from malfunctioning. In Fitch, I used 10 randomized runs to improve the results.
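The matrix preparation can be sketched as follows. The post does not state which distance measure was used, so Euclidean distance is an assumption here, as are the function names and the made-up scores. The output follows the PHYLIP square distance matrix layout, in which each name occupies exactly 10 characters.

```python
import math
from typing import Dict, List, Sequence


def phylip_name(name: str, width: int = 10) -> str:
    """Truncate or pad a label to exactly `width` characters."""
    return name[:width].ljust(width)


def distance_matrix(scores: Dict[str, Sequence[float]]) -> List[List[float]]:
    """Pairwise Euclidean distances (an assumed metric) between populations."""
    names = list(scores)
    return [[math.dist(scores[a], scores[b]) for b in names] for a in names]


def to_phylip(scores: Dict[str, Sequence[float]]) -> str:
    """Format a square distance matrix in PHYLIP style."""
    names = list(scores)
    matrix = distance_matrix(scores)
    lines = [f"{len(names):5d}"]
    for name, row in zip(names, matrix):
        lines.append(phylip_name(name) + " " + " ".join(f"{d:.4f}" for d in row))
    return "\n".join(lines) + "\n"


# Invented scores on four dimensions, for illustration only.
toy_scores = {
    "PopulationA": (10.0, 20.0, 30.0, 40.0),
    "PopB": (13.0, 24.0, 30.0, 40.0),
}
print(to_phylip(toy_scores))
```

Truncating to 10 characters can make two long names identical, so it is worth checking that the shortened IDs are still unique before running Fitch.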

The tree of "Old world" cultures:
Europe: red
Middle East and North Africa: green
Asia: green asparagus
New World "Latin America": brown
New World "English-speaking": grey
Pacific: pink
Africa: black

Some of the surprising and interesting observations:

1. Scandinavians form one cultural cluster.
2. Australia, USA, Canada, New Zealand and South Africa are in the British/Irish cultural cluster.
3. The closest to the British/Irish cultural cluster are Central Europeans and Israelis.
4. The "Catholic Cluster" is formed by Argentina, Spain, France, Belgium, Poland, and Malta.
5. The "Arab World Cluster" (Iraq, Saudi-Arabia, UAE, Kuwait) shows cultural similarities to some Latin Americans (Guatemala, Panama, Surinam, Mexico, Brazil) and some East Europeans, mostly Orthodox Christians (Russia, Serbia, Romania).
6. Culturally, Turkey is more similar to Balkan people (Bulgaria, Croatia) than to the Middle East.
7. Culturally, Albania is more similar to Ecuador, Colombia and Venezuela than to the Balkans.
8. Some European countries (Portugal, Greece, Slovenia) form a cultural cluster with Latin Americans (Uruguay, Chile, Costa Rica, El Salvador, Peru) and South Korea. Egypt is not far away from that cluster.
9. Singapore, Hong Kong, China, and Vietnam on one side, and Philippines, Malaysia, and Bhutan on the other side form two closely related clusters.
10. The Dominican Republic forms a cultural cluster with Ethiopia and Kenya.
11. Culturally, Nepal is most similar to some African countries (Malawi, Zambia, Namibia, Tanzania, Sierra Leone, Senegal).
12. Honduras, Indonesia and the Fiji Islands form a cultural cluster.

Monday, August 27, 2012

Haplogroup Tree J1 STR67

Today, I want to present a haplogroup J1 tree with STR67 data. I used the same method as before. The goal was to get a tree for the oldest branches of the J1 haplogroup.
To do so I only used individuals that are labeled as J1 (not J1c, J1b, etc.) at FTDNA. In a lot of cases SNPs downstream of J1 were not tested, so I had to help myself: I excluded all individuals that show high similarity with known J1c and J1b individuals. Then, I generated the first tree with 200 individuals.

Polar tree of haplogroup J1 (excluding known J1b and J1c individuals):

J1 is split into two parts: the Arabian Peninsula (highlighted in magenta) and the rest. My assumption is that all these individuals from the Arabian Peninsula have the haplogroup J1c3d2 L222.2+ or at least J1c but they were just not tested for it.

Next, I excluded those individuals and repeated the analysis with the remaining individuals.

1. Rectangular tree of haplogroup J1:

2. Polar tree of haplogroup J1:

Note:
The individuals from Iran (Irn-187962), Iraq (Irq--92829) and Turkey (Tur-191398, Aintab, Turkey) are Assyrians. All other individuals from Iran and most individuals from Turkey (Tur-...) are actually Armenians. Unfortunately, no known Kurd is included in this analysis but based on the names there is one individual from Turkey (Tur-221845, Ahmed) that is Muslim and thus, not an Armenian or Assyrian, so he could be a Kurd.

Summary:
The oldest branches of haplogroup J1 can be found in Northern Mesopotamia and Eastern Anatolia.

Friday, August 24, 2012

Haplogroup J1 tree STR111

Edit 09/12/2012:
I updated this post to make it shorter/easier for readers.

Today, I want to present the haplogroup J1 tree with STR111 data. I used the same method as before. Most of the individuals in this tree are from the Arabian peninsula and have the haplogroup J1b2b1 (aka J1c3d2) L222.2+. Some of them did not test for L222.2 but I am pretty confident about this. Based on the tree I predicted the haplogroups of some others.

Unfortunately, there is no Kurdish data included in this tree but we can assume that most Kurds with J1 belong to the same subclade as their closest neighbors, i.e. J1* M267. Of course, this has to be confirmed.

Actually, one individual in this figure is definitely from Iraqi Kurdistan: Irq---92829 is an Assyrian of Erbil :-)

I know of one Kurd of Sharif descent that is J1b2b. Unfortunately, he has only STR67 analyzed, so I cannot include him in the STR111 trees presented below. However, he is grouped with 100329 (unknown origin) and M4284 (from UAE) in the FTDNA J Haplogroup Project, so he would be close to these in the tree, and both 100329 and M4284 are at the root of the very large J1b2b branch (The previous name of J1b2b was J1c3d; at 23andme it is called J1e; details about the name changes can be found at ISOGG).

Color code:
J1 M267 Z1834-, Z1842- grey ("oldest" branch)
J1 M267 Z1842+ light blue
J1a M365.1 red (only one individual: Antonio Gomes 1635, Milhazes, Barcelos, Portugal [Por---73612])
J1b* L136 light green
J1b2* P58 dark green (clover)
J1b2b* L147.1 blue
J1b2b* Jewish Cohanim Cluster orange
J1b2b* L859+ yellow
J1b2b1 L222.2+ purple

Here are the results.
Rectangular tree of J1 (as jpg and as pdf):

Polar tree of J1 (as jpg and as pdf):

Based on this STR111-based tree the history of J1 is much clearer.

1. The ancestral region of J1 is Caucasus, Northern Mesopotamia or Eastern Anatolia.
2. There is an early split of J1 into two branches.

Edit 09/24/12:
Rob1 (eng.molgen.org user) helped me collect more J1 STR111 data. The new tree has a total of 313 individuals (see below). Three Jewish clusters emerged in the STR tree; they are highlighted in light orange, orange, and dark orange. Arab clusters within the blue J1b2b* L147.1 area are highlighted in light brown, brown, dark brown, and asparagus green. The greenish Arab cluster is at the root of the J1b2b1 L222.2+ purple cluster. The most obvious cluster within the J1b2b1 L222.2+ purple cluster is the Bany Zaid cluster (highlighted in red).

Edit 09/30/12:
Iyyovi (eng.molgen.org user) asked me to upload pdf versions of the latest J1 rectangular and polar tree.
pdf polar tree
pdf rectangular tree

Thursday, August 23, 2012

Haplogroup J2 tree STR111

Today, I want to present the haplogroup J2 tree with STR111 data. I used the same method as before (The annotation is not ready...)

Here are the results.

1. Rectangular tree of haplogroup J2:

2. Polar tree of haplogroup J2:

Edit:
A better J2 tree.

Tuesday, August 21, 2012

Whole Genome Comparison: Kurds Part II

I recently saw the ACD tool (ACD=Ancestral Component Dissection) at Vaêdhya. Unfortunately, I was not able to download it, but I could see the logic behind it by looking at the presented figures. What he did was:
1. Taking several populations from one region and assuming only one main ancestry for all of them.
2. Taking the lowest percentage of each component (Dodecad, Harappa or Eurogenes) and declaring this lowest percentage as ancestral.
3. Subtracting this lowest percentage from the actual percentage of the component in each population.

Example with the Baloch component in Harappa Project:
Step 1:
Armenian: 18%
Assyrian: 19%
Kurd: 26%
Iranian: 27%
Step 2:
Lowest Percentage for the Baloch component in these four populations: 18%
Step 3:
Armenian: 18% - 18% = 0%
Assyrian: 19% - 18% = 1%
Kurd: 26% - 18% = 8%
Iranian: 27% - 18% = 9%
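The three steps above amount to subtracting the regional minimum from each population's value. Here is a minimal sketch using the Baloch numbers from the example. The function name is my own, and this is only my reading of the logic behind the ACD figures, not the tool's actual code.

```python
from typing import Dict


def excess_over_minimum(percentages: Dict[str, float]) -> Dict[str, float]:
    """Subtract the lowest value (treated as 'ancestral') from every population."""
    baseline = min(percentages.values())
    return {population: value - baseline for population, value in percentages.items()}


baloch = {"Armenian": 18, "Assyrian": 19, "Kurd": 26, "Iranian": 27}
print(excess_over_minimum(baloch))
# {'Armenian': 0, 'Assyrian': 1, 'Kurd': 8, 'Iranian': 9}
```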

His goal is to measure the influx of migration from outside. This measurement is based on the assumption of one ancestral population for all compared populations. Let's see what this ancestral population would look like:

The minimum values of the 16 components for Armenians, Assyrians, Kurds and Iranians are:
S Indian      0%
Baloch       18%
Caucasian    42%
NE Euro       1%
SE Asian      0%
Siberian      0%
NE Asian      0%
Papuan        0%
American      0%
Beringian     0%
Medit.        5%
SW Asian     11%
San           0%
E African     0%
Pygmy         0%
W African     0%
Total        77%

Let's scale the numbers up to a total of 100%:
S Indian      0%
Baloch       23%
Caucasian    55%
NE Euro       1%
SE Asian      0%
Siberian      0%
NE Asian      0%
Papuan        0%
American      0%
Beringian     0%
Medit.        6%
SW Asian     14%
San           0%
E African     0%
Pygmy         0%
W African     0%
Total       100%

In short:

#	Population	Percent
1	Caucasian	55
2	Baloch	23
3	SW-Asian	14
4	Mediterranean	6
5	NE-Euro	1

The main drawbacks of this approach are:
1. The calculation processes data that have already been processed, so errors accumulate. It is not directly based on raw data.
2. It assumes that neighboring peoples with different histories/languages/religions share one main common ancestry.

I actually used a similar approach here, but I did not use averages of populations and I did not assume a common ancestry for these populations. Instead, I focused only on the results of individuals from a single population. My goal was not to see differences but to see similarities. Onur criticized that my old approach was based on the admixture results of others and not on raw data.

So, what I did now was to take all Kurdish raw data and fuse them into one genome. This genome is based on the allele frequencies in Kurds that I presented here. SNPs with allele frequencies of 25-75% were considered heterozygous, and SNPs with allele frequencies below 25% or above 75% were considered homozygous.
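The thresholding rule can be written down as a short sketch. The function names, the two-letter genotype strings and the toy genotypes are illustrative assumptions. The sketch also assumes biallelic SNPs without no-calls.

```python
from typing import Iterable


def allele_frequency(genotypes: Iterable[str], allele: str) -> float:
    """Fraction of all observed alleles that equal `allele`."""
    alleles = "".join(genotypes)
    if not alleles:
        raise ValueError("no genotypes given")
    return alleles.count(allele) / len(alleles)


def consensus_genotype(
    allele_a: str, allele_b: str, freq_a: float, low: float = 0.25, high: float = 0.75
) -> str:
    """Call one fused genotype for a SNP from the frequency of allele_a.

    Frequencies from `low` to `high` (inclusive) give a heterozygous call;
    anything below or above gives a homozygous call.
    """
    if not 0.0 <= freq_a <= 1.0:
        raise ValueError("freq_a must be between 0 and 1")
    if freq_a < low:
        return allele_b + allele_b
    if freq_a > high:
        return allele_a + allele_a
    return allele_a + allele_b


# Invented genotypes of four individuals at one SNP.
toy_genotypes = ["AG", "AA", "AA", "AA"]
freq = allele_frequency(toy_genotypes, "A")
print(freq, consensus_genotype("A", "G", freq))
```

Running this for every SNP yields one artificial genotype file that can be submitted to the calculators like the file of a single person.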

Then, I used this fused Kurdish genome and ran it through HarappaWorld, Dodecad K12b, and Eurogenes K12.

Interestingly, the Harappa results of the fused Kurdish genome are not far away from the results above:

HarappaWorld for fused Kurdish genome (N=16):

#	Population	Percent
1	Caucasian	54.63
2	Baloch	28.84
3	SW-Asian	12.61
4	Mediterranean	2.85
5	NE-Euro	1.07

Closest populations to this fused Kurdish genome (N=16):

Single Population Sharing:

#	Population (source)	Distance
1	kurd (yunusbayev)	9.97
2	kurd (xing)	10.36
3	assyrian (harappa)	11.41
4	armenian (yunusbayev)	11.69
5	kurd (harappa)	12.08
6	azerbaijan-jew (behar)	12.29
7	armenian (behar)	12.83
8	iranian (harappa)	12.93
9	uzbekistan-jew (behar)	13.06
10	georgian (harappa)	13.52
11	iranian-jew (behar)	13.91
12	iraqi-mandaean (harappa)	14
13	georgia-jew (behar)	14.49
14	azeri (harappa)	14.62
15	iranian (behar)	15.3
16	turkish (harappa)	16.27
17	iraq-jew (behar)	16.47
18	turk (behar)	17.64
19	turk-kayseri (hodoglugil)	18.51
20	kumyk (yunusbayev)	19.35

Dodecad K12b for fused Kurdish genome (N=16):

#	Population	Percent
1	Caucasus	52.25
2	Gedrosia	29.22
3	Southwest_Asian	12
4	Atlantic_Med	4.18
5	North_European	2.35

Single Population Sharing:

#	Population (source)	Distance
1	Kurds (Yunusbayev)	10.3
2	Kurd (Dodecad)	11.48
3	Armenians_15 (Yunusbayev)	11.7
4	Iranian (Dodecad)	11.95
5	Azerbaijan_Jews (Behar)	12.09
6	Uzbekistan_Jews (Behar)	12.59
7	Assyrian (Dodecad)	13.02
8	Armenian (Dodecad)	13.25
9	Georgia_Jews (Behar)	13.38
10	Iranian_Jews (Behar)	14.49
11	Iranians (Behar)	14.87
12	Armenians (Behar)	15.19
13	Turks (Behar)	16.88
14	Iraq_Jews (Behar)	17.74
15	Turkish (Dodecad)	19.37
16	Kumyks (Yunusbayev)	20.68
17	Druze (HGDP)	23.08
18	Adygei (HGDP)	23.34
19	Lezgins (Behar)	23.49
20	Chechens (Yunusbayev)	23.62

Eurogenes K12b for fused Kurdish genome (N=16):

#	Population	Percent
1	Caucasus	45.67
2	W-Central Asian	22.79
3	Mediterranean	19.29
4	Southwest Asian	11.81
5	North European	0.45
