Thursday, June 27, 2013

The color of the eyes: 7 HERC2 variants in the Eurasian gene pool

In my previous post I briefly described the presence of 7 different HERC2 haplotypes in the Kurdish gene pool. Today, I want to show HERC2 haplotype data of populations from Eurasia. The blogger Davidski provided me with the raw data for HERC2 from the Human Genome Diversity Project (HGDP) and similar databases. I focused on Eurasian populations that had information for the following SNPs: rs12913832, rs7183877, rs11635884, rs11636232, rs8043281, rs6497284, rs8028689, rs9302376, rs16950960, rs8039195, rs16950987, and rs1667394. I added the Kurdish data that I collected.

In most Eurasian individuals I was able to explain the HERC2 genotype by combining 2 of the 7 HERC2 haplotypes. In a few individuals, however, the data set was incomplete, or other haplotypes must have been present that could not be determined. Those data are labeled as "not determined".

Please take the percentages with a grain of salt because the number (N) of tested individuals per population is low. Where N is as small as 1, the data should be ignored.

Reminder: HERC2 haplotypes #1 and #2 are associated with light eye colors.

The annotated data can be seen here.


Population N #1 #2 #3 #4 #5 #6 #7 Not Determined
Abhkasian 18 17% 6% 22% 14% 17% 22% 3% 0%
Adygei 14 25% 4% 14% 4% 21% 32% 0% 0%
Altaians 12 0% 0% 13% 0% 13% 17% 0% 58%
Armenian 27 20% 2% 17% 9% 2% 9% 4% 37%
Ashkenazy_Jews 21 31% 14% 10% 2% 12% 5% 2% 24%
Balkar 16 38% 6% 13% 9% 22% 9% 3% 0%
Balochi 15 10% 3% 10% 17% 23% 23% 7% 7%
Bedouin 40 10% 0% 19% 14% 1% 19% 15% 23%
Belorussians 7 36% 64% 0% 0% 0% 0% 0% 0%
Bengali 1 50% 0% 50% 0% 0% 0% 0% 0%
Brahmins_TN 9 6% 0% 11% 6% 6% 17% 11% 44%
Brahui 20 3% 0% 13% 10% 20% 13% 3% 40%
Bulgarian 13 12% 4% 12% 0% 0% 4% 0% 69%
Burusho 25 10% 8% 16% 0% 12% 14% 4% 36%
Buryats 15 7% 0% 10% 0% 57% 23% 3% 0%
Cambodian 10 0% 0% 25% 10% 30% 25% 0% 10%
Chamar 9 6% 0% 33% 11% 0% 11% 17% 22%
Chechen 18 31% 11% 14% 3% 19% 22% 0% 0%
Chenchus 4 0% 0% 0% 0% 0% 0% 0% 100%
Chukchis 11 0% 0% 41% 0% 45% 9% 5% 0%
Chuvash 16 28% 31% 19% 0% 13% 6% 3% 0%
Cochin_Jews 3 0% 0% 17% 0% 17% 33% 0% 33%
Cypriots 12 33% 0% 21% 13% 17% 13% 4% 0%
Dai 10 0% 0% 35% 0% 10% 55% 0% 0%
Daur 9 0% 0% 17% 0% 56% 28% 0% 0%
Dharkars 8 0% 0% 25% 6% 31% 19% 19% 0%
Dolgans 6 0% 0% 25% 0% 50% 25% 0% 0%
Druze 20 18% 0% 3% 3% 8% 13% 3% 55%
Dusadh 6 0% 0% 25% 0% 25% 8% 8% 33%
Egyptians 11 23% 0% 18% 18% 14% 18% 9% 0%
Erzya 9 39% 56% 0% 0% 6% 0% 0% 0%
Evenkis 11 0% 0% 32% 0% 45% 23% 0% 0%
French 27 31% 30% 19% 4% 6% 11% 0% 0%
French_Basque 21 14% 24% 19% 7% 26% 10% 0% 0%
Georgians 14 46% 0% 7% 4% 18% 21% 4% 0%
Gond 2 0% 0% 25% 0% 0% 25% 0% 50%
Hakkipikki 4 0% 0% 13% 0% 13% 38% 13% 25%
Hazara 17 6% 0% 26% 3% 29% 26% 3% 6%
Hezhen 7 0% 0% 43% 0% 36% 21% 0% 0%
Hungarians 18 17% 50% 14% 0% 14% 6% 0% 0%
Iranian_Jews 4 13% 0% 13% 0% 0% 63% 13% 0%
Iranians 16 3% 6% 28% 3% 13% 28% 13% 6%
Iraqi_Jews 9 17% 0% 28% 17% 11% 28% 0% 0%
Jordanians 19 13% 8% 29% 5% 13% 21% 11% 0%
Kanjars 7 0% 0% 14% 0% 14% 64% 7% 0%
Kargopol_Russian 25 36% 46% 8% 2% 4% 4% 0% 0%
Karitiana 6 33% 0% 17% 0% 33% 0% 0% 17%
Kol 14 0% 0% 36% 4% 21% 25% 14% 0%
Koryaks 10 0% 0% 40% 0% 50% 0% 10% 0%
Kshatriya 7 0% 0% 29% 0% 29% 43% 0% 0%
Kumyk 13 35% 12% 15% 0% 19% 12% 0% 8%
Kurd 26 19% 4% 19% 15% 21% 17% 4% 0%
Kurmi 1 0% 0% 0% 0% 0% 0% 100% 0%
Kurumba 3 33% 0% 17% 0% 33% 0% 17% 0%
Lambadi 1 0% 0% 50% 0% 0% 50% 0% 0%
Lebanese 4 13% 0% 38% 38% 0% 13% 0% 0%
Lebanese_Christian 24 31% 6% 15% 10% 4% 19% 6% 8%
Lebanese_Druze 23 17% 0% 28% 11% 13% 13% 13% 4%
Lebanese_Muslim 25 14% 8% 14% 12% 12% 32% 8% 0%
Lezgins 16 19% 3% 22% 6% 22% 25% 3% 0%
Lithuanians 10 35% 60% 0% 0% 0% 5% 0% 0%
Makrani 20 10% 3% 18% 8% 33% 20% 5% 5%
Malay 87 1% 0% 22% 13% 28% 29% 5% 2%
Miaozu 10 0% 0% 25% 0% 25% 50% 0% 0%
Moksha 5 30% 50% 10% 0% 0% 10% 0% 0%
Mongola 10 0% 0% 35% 0% 25% 30% 0% 10%
Mongolians 9 0% 0% 28% 0% 50% 11% 0% 11%
Mumbai_Jews 4 13% 0% 25% 0% 0% 0% 38% 25%
Muslim_India 5 10% 0% 20% 40% 0% 20% 10% 0%
NAN_Melanesian 10 0% 0% 20% 0% 0% 15% 65% 0%
Nganassans 9 0% 0% 28% 0% 56% 11% 6% 0%
Nihali 1 0% 0% 0% 0% 50% 50% 0% 0%
Nogay 14 39% 0% 21% 4% 21% 11% 4% 0%
North_Italian 13 42% 27% 8% 4% 8% 8% 4% 0%
North_Kannadi 4 25% 0% 25% 38% 0% 13% 0% 0%
North_Ossetian 13 31% 15% 35% 8% 4% 8% 0% 0%
Orcadian 11 27% 41% 5% 5% 18% 5% 0% 0%
Oroqen 9 0% 0% 17% 0% 61% 22% 0% 0%
Palestinian 31 23% 0% 13% 6% 8% 26% 15% 10%
Pathan 21 19% 2% 14% 7% 24% 17% 17% 0%
Piramalai 8 0% 0% 31% 6% 25% 19% 19% 0%
Pulliyar 1 0% 0% 0% 0% 100% 0% 0% 0%
Romanians 14 29% 25% 21% 4% 14% 7% 0% 0%
Russians 5 60% 20% 0% 0% 10% 10% 0% 0%
Sardinian 24 19% 4% 40% 6% 10% 17% 0% 4%
Saudis 19 3% 0% 26% 0% 8% 13% 24% 26%
Selkups 9 39% 44% 11% 0% 6% 0% 0% 0%
Sephardic_Jews 18 25% 11% 25% 3% 11% 14% 11% 0%
She 9 0% 0% 33% 0% 11% 56% 0% 0%
Sindhi 14 4% 0% 29% 7% 14% 32% 7% 7%
Singapore_Indian 78 8% 2% 26% 4% 17% 26% 12% 6%
Spanish 11 32% 5% 32% 9% 9% 5% 0% 9%
Surui 3 17% 0% 33% 0% 50% 0% 0% 0%
Syrians 15 30% 0% 13% 3% 23% 20% 3% 7%
Tadjik 14 7% 11% 18% 7% 36% 18% 4% 0%
Tamil_Nadu 1 0% 0% 50% 0% 50% 0% 0% 0%
Tharus 2 25% 0% 25% 0% 0% 25% 25% 0%
Tibeto-Burman_Burmese 14 0% 0% 32% 18% 18% 21% 11% 0%
Tibeto-Burman_Garo 2 0% 0% 25% 0% 50% 25% 0% 0%
Tu 8 0% 0% 13% 0% 31% 44% 0% 13%
Tujia 10 0% 0% 5% 5% 45% 35% 10% 0%
Turks 19 21% 8% 29% 8% 13% 16% 5% 0%
Tuscan 8 25% 19% 6% 0% 19% 31% 0% 0%
Tuvinians 13 0% 0% 42% 0% 35% 23% 0% 0%
Ukrainian 20 43% 38% 8% 0% 8% 5% 0% 0%
Uttar_Pradesh 5 0% 0% 60% 0% 20% 20% 0% 0%
Uygur 10 10% 0% 20% 0% 35% 35% 0% 0%
Uzbeks 15 3% 13% 27% 23% 13% 20% 0% 0%
Velamas 7 7% 0% 21% 0% 43% 0% 29% 0%
Xibo 9 0% 0% 11% 0% 39% 39% 0% 11%
Yakut 20 0% 0% 40% 0% 43% 18% 0% 0%
Yemenese 7 7% 0% 14% 7% 7% 14% 21% 29%
Yemenite_Jews 15 7% 0% 27% 3% 0% 27% 10% 27%
Yizu 10 0% 0% 35% 0% 15% 40% 0% 10%
Yukaghirs 4 0% 13% 38% 0% 38% 13% 0% 0%



Unfortunately, this data set does not include many Germanic-speaking populations (Austrians, Germans, Swiss). In the previous HERC2 analysis these populations showed peak frequencies for haplotype#1.

Haplotype#1 is ancestral to haplotype#2. Peak frequencies of haplotype#2 can be found in Belorussians, Lithuanians, and some Uralic language speakers from Russia (Moksha, Selkups). Interestingly, these populations show no or very little haplotype#3, the ancestral haplotype of #1 and #2.

Haplotype#3 peaks in populations of East-Siberia (Hezhen, Tuvinians, Chukchis, Koryaks, Yakut, Yukaghirs) and West-Asia (Lebanese, North Ossetians, Turks, Jordanians, Lebanese Druze, Iranians, Iraqi Jews). Interestingly, in East-Siberia haplotype#3 occurs together with haplotype#5 and #6. The highest frequencies of haplotype#3 in Europe can be found in Sardinia and Spain.

Edit: July 02, 2013:

I got some questions about why a population can have, let's say, 20% Branch/haplotype #1 and #2 while far fewer than 20% of its people have light eye colors. The reason is that haplotypes 1 and 2 are recessive.
Thus, light eye colors require not one but two copies/alleles, one inherited from the father and one from the mother.
How to calculate the frequency of light eyes in a population from the tables I presented (based on the Hardy-Weinberg principle):
Frequency of light eyes in a population = (%ht1 + %ht2)²

Example1: Germans have 46% ht1 and 33% ht2.
(46% + 33%)²
= (0.46 + 0.33)²
= 0.79²
= 0.62
= 62%
62% of the Germans are expected to have light eye color based on the HERC2 genotype.
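
The same calculation as a minimal Python sketch. The function name is made up for illustration, and the result is only as good as the Hardy-Weinberg assumption (random mating) and the haplotype frequencies that go into it.

```python
def light_eye_frequency(ht1: float, ht2: float) -> float:
    """Expected share of people carrying two light-eye haplotypes.

    ht1 and ht2 are the population frequencies of haplotype #1 and #2
    as fractions (0.46 for 46%). Assumes Hardy-Weinberg proportions.
    """
    q = ht1 + ht2   # combined frequency of the recessive haplotypes
    return q ** 2   # both inherited copies must be light-eye haplotypes


# Example1 from the text: Germans with 46% ht1 and 33% ht2
germans = light_eye_frequency(0.46, 0.33)
print(f"{germans:.0%}")  # 62%
```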


Related:
The color of the eyes: 7 HERC2 variants in the Kurdish gene pool 
The color of the eyes: 7 HERC2 variants in the Eurasian gene pool 
The color of the eyes: at least 17 HERC2 variants in Human gene pool

Sunday, June 9, 2013

The color of the eyes: 7 HERC2 variants in the Kurdish gene pool

Today, I want to present some genetic data focusing on different eye colors of Kurds and their genetic origin. The color of the eye is mostly determined by only one SNP in the human genome, rs12913832 in the HERC2 gene.
GG at rs12913832 in the HERC2 gene results in light (blue/green) eyes; AG and AA result in brown eyes.

My goal is not only to look at the rs12913832 SNP itself but also to determine the number of DNA segments that carry the SNP rs12913832; that is, to determine the number of HERC2 variants (= number of different HERC2 segments) in the Kurdish gene pool. 23andme (and other companies) use a chip-based analysis approach, so the genetic read-out contains data about single SNPs but not about how they relate to neighboring SNPs. The only way to determine DNA segments is to have genetic data of multiple close relatives.

Example with very short DNA segment consisting of two SNPs only:
Person A:  rs12913832=AG; neighboring SNP rs7183877=AC.
With this amount of information it is impossible to determine the two DNA segments that were inherited from father and mother of person A. Is rs12913832=A and rs7183877=A on the same DNA segment and inherited from the same parent? With this amount of data we cannot know. "Phased DNA", a new tool at gedmatch can somewhat address this question but not in all cases.
Let's say father and mother of person A have rs12913832=AG and rs7183877=AC as well. We still cannot say if rs12913832=A and rs7183877=A are on the same DNA segment.
Let's say we also know the SNP results of a sibling of person A: rs12913832=AA; neighboring SNP rs7183877=AA. Now we can determine the DNA segments:
DNA segment1: rs12913832=A and rs7183877=A
DNA segment2: rs12913832=G and rs7183877=C
Person A and both parents of person A have one copy of segment1 and one of segment2, while the sibling of person A has two copies of segment1.
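
The reasoning of this two-SNP example can be written as a short Python sketch. It is only an illustration under one assumption: a close relative is homozygous at every SNP and therefore exposes one complete segment that the person also carries. The function name and the dictionary layout are made up for this example.

```python
def phase_with_homozygous_relative(person, relative):
    """Split an unphased genotype into two DNA segments.

    person, relative: dict mapping SNP id -> two-letter genotype.
    Assumes the relative is homozygous at every SNP and that the
    person carries the segment revealed by the relative.
    """
    shared, other = {}, {}
    for snp, genotype in person.items():
        rel = relative[snp]
        if rel[0] != rel[1]:
            raise ValueError(f"relative is not homozygous at {snp}")
        allele = rel[0]
        if allele not in genotype:
            raise ValueError(f"person does not carry {allele} at {snp}")
        shared[snp] = allele
        # Remove one copy of the shared allele; the rest sits on the other segment.
        other[snp] = genotype.replace(allele, "", 1)
    return shared, other


person_a = {"rs12913832": "AG", "rs7183877": "AC"}
sibling = {"rs12913832": "AA", "rs7183877": "AA"}
segment1, segment2 = phase_with_homozygous_relative(person_a, sibling)
print(segment1)  # {'rs12913832': 'A', 'rs7183877': 'A'}
print(segment2)  # {'rs12913832': 'G', 'rs7183877': 'C'}
```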

This example was only based on two SNPs but it is more interesting to cover a larger DNA segment consisting of all tested SNPs of the HERC2 gene (at least all the SNPs tested by 23andme).

Having genetic data of multiple relatives, I was able to determine a total of 7 DNA segments (variants) of the HERC2 gene: two HERC2 variants correlate with light eye colors, and the five other HERC2 variants correlate with dark eyes.

 The SNP data of the HERC2 variants are presented in the spreadsheet.

Some observations/results:
1. Kurdish DNA segments for light eye colors are
rs12913832=G
rs1129038=T
rs916977=C
rs1667394=T

2. Kurdish DNA segments for dark eye colors are
rs12913832=A
rs1129038=C
rs916977=C,T*
rs1667394=C,T*

*From other Kurdish SNP data we know that these allele assignments always hold for Kurdish DNA segments for light eye colors. For dark eye colors they always hold for rs1129038, but not for rs916977 and rs1667394 (Branch#3 for dark eye colors also has rs916977=C and rs1667394=T).

HERC2 SNP results of all tested Kurds can be described by combining 2 of the 7 HERC2 variants.

Example:
HERC2 SNPs of KD014 are a mix of Branch#1 and Branch#6.
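
Describing a genotype as a combination of two variants can be sketched in Python. This is not the procedure used for the spreadsheet, only a minimal illustration: it reuses the two toy segments from the two-SNP example above instead of the 7 real branches, and all names are hypothetical. An empty result corresponds to a genotype that cannot be described with the known variants.

```python
from itertools import combinations_with_replacement

# Toy segments from the two-SNP example, not the real 7 HERC2 branches.
SEGMENTS = {
    "segment1": {"rs12913832": "A", "rs7183877": "A"},
    "segment2": {"rs12913832": "G", "rs7183877": "C"},
}


def explain_genotype(genotype, segments=SEGMENTS):
    """Return all pairs of segments whose alleles add up to the genotype."""
    matches = []
    for name_a, name_b in combinations_with_replacement(sorted(segments), 2):
        seg_a, seg_b = segments[name_a], segments[name_b]
        # Unphased genotypes have no allele order, so compare sorted alleles.
        if all(sorted(genotype[snp]) == sorted(seg_a[snp] + seg_b[snp])
               for snp in genotype):
            matches.append((name_a, name_b))
    return matches


print(explain_genotype({"rs12913832": "AG", "rs7183877": "AC"}))
# [('segment1', 'segment2')]
print(explain_genotype({"rs12913832": "GG", "rs7183877": "AC"}))
# []
```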

Next, I generated a phylogenetic tree of the 7 HERC2 variants. Branch#7 is the most ancestral branch. The two branches that result in light eye colors (Branch#1 and Branch#2) are closely related and descend from Branch#3.




The difference between Branch#1 and Branch#2 is the SNP rs11636232; the difference between these two branches and the other five branches is the SNP rs12913832. With the help of ALFRED I could determine the frequency of Branch#1 and Branch#2 in several populations (assuming that there is no other HERC2 variant for light eye color):

Sorted by Branch#1:


Population Branch#1 Branch#2
Brahui 2% 2%
Balochi 8% 2%
Balochi 12% 6%
Kalash 12% 16%
Sardinian 16% 4%
Palestinian 18% 3%
Burusho 18% 12%
Basque 19% 21%
Italians 25% 19%
Adygei 26% 6%
Orcadian 28% 41%
Galician 30% 17%
French 32% 30%
Russians 36% 46%
Italians 42% 27%
Swedes 42% 54%
Germans 46% 33%
Danes 52% 32%
Austrian 55% 28%
Swiss 69% 25%

 Sorted by Branch#2:


Population Branch#1 Branch#2
Brahui 2% 2%
Balochi 8% 2%
Palestinian 18% 3%
Sardinian 16% 4%
Balochi 12% 6%
Adygei 26% 6%
Burusho 18% 12%
Kalash 12% 16%
Galician 30% 17%
Italians 25% 19%
Basque 19% 21%
Swiss 69% 25%
Italians 42% 27%
Austrian 55% 28%
French 32% 30%
Danes 52% 32%
Germans 46% 33%
Orcadian 28% 41%
Russians 36% 46%
Swedes 42% 54%


Edit 06/10/2013:
Maju asked for the "percentage of Kurds in each branch". With the limited number of 23andme tested Kurds (N=20) the percentages can be off.

Percentage of Kurds in each HERC2 branch

Branch#1 22.5%
Branch#2 5.0%
Branch#3 20.0%
Branch#4 10.0%
Branch#5 17.5%
Branch#6 20.0%
Branch#7 5.0%
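
To see how few observations stand behind each percentage, the values can be converted back to counts. The sketch below assumes the percentages are shares of the 2 x N = 40 HERC2 copies carried by the 20 tested Kurds; the post does not state this, but the 2.5% steps are consistent with it.

```python
N = 20  # tested Kurds, each carrying two HERC2 copies

percent = {
    "Branch#1": 22.5, "Branch#2": 5.0, "Branch#3": 20.0, "Branch#4": 10.0,
    "Branch#5": 17.5, "Branch#6": 20.0, "Branch#7": 5.0,
}

# Assumption: percentages refer to 2 * N chromosomes, not to N persons.
counts = {branch: round(p / 100 * 2 * N) for branch, p in percent.items()}

for branch, count in counts.items():
    print(branch, count)
print("total copies:", sum(counts.values()))
```

Under this assumption Branch#2 and Branch#7 rest on only two observed copies each, which is why the percentages can be off.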


From the few East-European 23andme results for HERC2 that I have seen so far, I can say that East-Europeans have Branch#1, #2, #5 and #6, but not Branch#3, Branch#4 and Branch#7 (Note: Branch#3 and Branch#4 are the ancestral haplotypes of the two light eye color branches #1 and #2).
Besides Kurds, I found Branch#3 in one person from the Philippines (Branch#3 and #5), one Maltese (Branch#3 and #5), one Turk (Branch#3 and #1), and one multi-ethnic Canadian (Branch#3 and #2).
Besides Kurds, I found Branch#4 in one multi-ethnic US American (Branch#4 and #6), one Assyrian (Branch#4 and #1), and one British (Branch#4 and #2). I found one Ethiopian 23andme result that cannot be described with the 7 branches, so there might be more branches for brown eye color in Africa.

In the spreadsheet I have now added all potential combinations of the 7 variants in a separate sheet.
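
The number of potential combinations can be checked with a few lines of Python. This is only a sketch that counts unordered pairs of branches, including two copies of the same branch; it does not reproduce the SNP values listed in the spreadsheet.

```python
from itertools import combinations_with_replacement

branches = range(1, 8)  # Branch#1 to Branch#7

# A person carries two variants; order does not matter and both may be the same.
pairs = list(combinations_with_replacement(branches, 2))

print(len(pairs))  # 28
print(pairs[:3])   # [(1, 1), (1, 2), (1, 3)]
```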
