MITOCHONDRIAL DNA HAPLOGROUP K SURVEY AT 1500 ENTRIES ON MITOSEARCH
AS OF DECEMBER 21, 2007
This is my eighth survey of the mtDNA haplogroup K (Katrine’s Clan) entries on FamilyTreeDNA’s MitoSearch. The previous survey at 1000 entries may be found at K1000 Survey. There are several links on that page to previous surveys, supplements and charts. It has been ten months since the K1000 survey.
This K1500 survey includes a new CHART, which is sorted by predicted subclade, then HVR2, then HVR1 mutations. The first step was to eliminate all but the high-resolution entries – those with both HVR1 and HVR2 mutations listed. First to be dropped were 69 shown as CRS, that is, matching the Cambridge Reference Sequence, which is represented by no mutations in HVR1. None of those would belong in K, so I suspect that most or all are entries left incomplete, with the mutation lists never entered. Next, I eliminated all of those with no HVR2 mutations. Also, a few non-FTDNA entries were deleted because either HVR1 or HVR2 appeared to be incompletely tested. Two years ago I created a special entry, ARFHH, which represents the ancestral haplotype for K. Since it does not represent an actual living person, I deleted it from this survey.
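The elimination steps above can be sketched as a simple filter. This is only an illustration: the `Entry` record, its field names and the sample IDs (other than ARFHH) are hypothetical, since MitoSearch's actual data layout is not described here.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    # Hypothetical record layout; not MitoSearch's real export format.
    user_id: str
    hvr1: tuple = ()          # HVR1 mutations; empty when shown as CRS
    hvr2: tuple = ()          # HVR2 mutations; empty when not tested or not entered
    incomplete: bool = False  # manually flagged as incompletely tested

# The ancestral-haplotype entry does not represent a living person.
PLACEHOLDER_IDS = frozenset({"ARFHH"})

def is_high_resolution(entry):
    """Apply the survey's elimination steps in the order described."""
    if not entry.hvr1:        # shown as CRS: no HVR1 mutations listed
        return False
    if not entry.hvr2:        # no HVR2 mutations listed
        return False
    if entry.incomplete:      # HVR1 or HVR2 appears incompletely tested
        return False
    if entry.user_id in PLACEHOLDER_IDS:
        return False
    return True

# Made-up sample entries for illustration only.
entries = [
    Entry("AAAA1", ("16224C", "16311C", "16519C"), ("73G", "263G", "315.1C", "497T")),
    Entry("BBBB2"),                                    # CRS
    Entry("CCCC3", ("16224C", "16311C", "16519C")),    # HVR1 only
    Entry("ARFHH", ("16224C", "16311C", "16519C"), ("73G", "263G", "315.1C")),
]
high_res = [e for e in entries if is_high_resolution(e)]
```

Keeping each rule as a separate check makes it easy to count how many entries each step removes.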
The remaining high-resolution entries numbered 688 or 45.9% of the original total – slightly higher than the previous survey. The real percentage, after eliminating the probably incomplete entries labeled CRS, was 48.1%. Also, some may have upgraded their tests without adding the HVR2 results to MitoSearch. Ten listed the testing company as the National Geographic Society’s Genographic Project. I suspect many more began their testing there, but completed it at FTDNA and so listed that. (In comparison, 39.7% of the FTDNA members of the Haplogroup K Project started testing with the Genographic Project.) Four listed Relative Genetics. Two each listed Argus Biosciences, SMGF or DNA Heritage. Single examples were found listing Ancestry, DNA Ancestry Project, Roots for Real and Julia Bronder. One listed Oxford Ancestors, although an HVR2 upgrade was taken at FTDNA. One simply listed Other. The remaining 662 listed FTDNA. Ninety-seven entries (plus 19 with HVR1 only) contain pedigree charts.
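The percentages and the FTDNA count in the paragraph above follow directly from the stated figures. The short check below uses only numbers given in the text; the variable names are illustrative.

```python
total_entries = 1500   # all haplogroup K entries on MitoSearch
crs_entries = 69       # entries shown as CRS, probably incomplete
high_res = 688         # entries with both HVR1 and HVR2 mutations

# Share of high-resolution entries, with and without the CRS entries.
pct_of_all = 100 * high_res / total_entries
pct_excluding_crs = 100 * high_res / (total_entries - crs_entries)

# Testing companies listed by the non-FTDNA high-resolution entries.
company_counts = {
    "Genographic Project": 10,
    "Relative Genetics": 4,
    "Argus Biosciences": 2,
    "SMGF": 2,
    "DNA Heritage": 2,
    "Ancestry": 1,
    "DNA Ancestry Project": 1,
    "Roots for Real": 1,
    "Julia Bronder": 1,
    "Oxford Ancestors": 1,
    "Other": 1,
}
non_ftdna = sum(company_counts.values())
ftdna = high_res - non_ftdna

print(f"{pct_of_all:.1f}% of all entries, {pct_excluding_crs:.1f}% excluding CRS")
print(f"{non_ftdna} non-FTDNA entries, {ftdna} FTDNA entries")
```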
There are 445 entries, or 64.7%, with exact high-resolution matches in 95 different haplotypes or sequences. That leaves 243 unmatched singletons, for a total of 338 different haplotypes. Dividing by the 688 total gives 49.1% for what I have been calling the “diversity percentage,” but which Ann Turner says is called “discrimination capacity” in scientific papers. That number is down from 53.1% in the K1000 survey and 59.2% in the K750 survey. Thus it continues to become more likely that a new entry will find one or more exact matches. The largest haplotype has 30 examples. I will list the most common haplotypes with just their “extra” mutations, not including the six basic K mutations: HVR1 – 16224C, 16311C, 16519C and HVR2 – 73G, 263G, 315.1C.
30 – 16234T, 114T, 497T – This is the most common form of the largest “Ashkenazi” subclade, K1a1b1a.
24 – 146C, 152C – This haplotype is the modal K2a.
22 – 146C, 152C, 512C – This is the second Ashkenazi subclade, K2a2a.
21 – 16223T, 16234T, 114T, 497T – This adds 16223T to the basic K1a1b1a subclade above to form a “second modal.”
19 – 16093C, 16524G, 195C, 497T – This is another Ashkenazi subclade, K1a9.
17 – 146C, 152C, 498- – This is the modal for subclade K1c1.
15 – 16320T, 146C, 152C, 498- – This is the modal haplotype for subclade K1c2.
14 – 309.1C, 497T – This adds one of the most common recurrent mutations, 309.1C, to the defining mutation for K1a, 497T.
13 – 16048A, 16093C, 16291T, 195C, 497T, 524.1C, 524.2A, 524.3C, 524.4A – This is the modal for the provisional K1a10 subclade.
11 – 146C, 152C, 309.1C – This adds the common 309.1C insertion to the modal K2a.
Notice that four of the five most common haplotypes are forms of the three Ashkenazi K subclades mentioned in Dr. Doron Behar’s 2006 paper. These subclades were founded relatively recently, which accounts for their lack of additional mutations. Some other common subclades, such as K1b2, have few perfect examples, but have many with one or more additional mutations. The two haplotypes which have moved up the rankings the most are K2a2a, which had always previously shown up as the third most common Ashkenazi subclade, and K1c1. The latter was listed simply as K1c in previous surveys, but the full-sequence results so far indicate that all of its examples are in K1c1. That could change.
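The matching statistics and the “extra” mutation lists reported above can be reproduced with a short tally. The sketch below assumes each entry is simply a list of mutation strings; the function names are illustrative and the sample entries are made up.

```python
from collections import Counter

# The six basic K mutations, which are left out of the haplotype lists.
BASIC_K = frozenset({"16224C", "16311C", "16519C", "73G", "263G", "315.1C"})

def extra_mutations(mutations):
    """Return a canonical key of the mutations beyond the basic K set.

    The sort is plain string order; it only serves to make equal
    haplotypes compare equal, not to order positions numerically.
    """
    return tuple(sorted(set(mutations) - BASIC_K))

def discrimination_capacity(haplotypes):
    """Number of distinct haplotypes divided by the number of entries."""
    counts = Counter(haplotypes)
    return len(counts) / sum(counts.values())

# Toy example: two entries share the modal K2a haplotype, one is a singleton.
sample = [
    sorted(BASIC_K) + ["146C", "152C"],
    sorted(BASIC_K) + ["146C", "152C"],
    sorted(BASIC_K) + ["497T"],
]
sample_dc = discrimination_capacity(extra_mutations(m) for m in sample)

# The survey's own figures.
matched_entries, matched_haplotypes, singletons = 445, 95, 243
distinct = matched_haplotypes + singletons   # different haplotypes
total = matched_entries + singletons         # high-resolution entries
survey_dc = 100 * distinct / total           # "diversity percentage"
matched_share = 100 * matched_entries / total
```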
The two shortest haplotypes without back mutations are those with one extra mutation, either 497T or 146C. The first has nine examples and the second has five. Since no K has just the basic six mutations, these two haplotypes probably represent the closest in time to the founder of K. (Note that I have not considered the non-FTDNA examples, which may not have been tested for certain mutations.) Although I have referred to several K subclades above, the actual subclade designations are usually based on coding-region mutations outside the HVR1/2 regions. See the Behar paper mentioned above. For K, those mutations can be tested only with a full-sequence (FGS or Mega) test.
I have made an attempt to predict the subclades for most of the entries. There are now 116 persons in 22 different subclades in the Haplogroup K Project who have received results including subclade designations from full-sequence tests. Usually, but not always, subclade predictions can be made by comparing haplotypes with those with designated subclades and with Behar’s K chart. (Predictions are not always possible; one haplotype has examples confirmed in K1a, K1a3 and K1a4.) The most difficult to predict are some of the lower subclades of K1a which are defined entirely by coding-region mutations. For those I have usually just listed “K1a.”
I have also identified two new provisional subclades not included on Dr. Behar’s tree. The most prominent of these, with 27 easily identifiable examples, is the one mentioned above which has the 16048A mutation. I have even suggested a designation for it, K1a10, based on its nearness to K1a9 on Behar’s tree. While a formal designation will have to wait for the next scientific paper with a new K tree, I have used my name for it on the Chart. Also, I have been studying the other sequences in K1a with 195C, but which are not in K1a9 or K1a10. These represent almost 10% of all the K entries. On my Chart I have identified those as, respectively, Pre-K1a10 or Pre-K1a9, depending on whether or not they have one or more pairs of the HVR2 position 524 insertions. However, full-sequence testing sometimes puts them into other subclades, especially K1a4a1. There are 15 examples in K1a which include 16129A, 16T, 150T and 199C. I’ve labeled this provisional subclade as K1a11. Those in K1a11 who have taken FGS tests all have a matching group of coding-region mutations.
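The labeling rule for the K1a sequences with 195C can be sketched as a small decision function. This is a rough illustration, not the procedure actually used for the Chart: treating 497T as the test for K1a, 16524G as the marker for K1a9 and 16T as the marker for K1a11 are assumptions drawn from the haplotype notes elsewhere in this survey, and the function name is hypothetical.

```python
def has_524_insertions(mutations):
    """True if any insertion at HVR2 position 524 (524.1C, 524.2A, ...) is listed."""
    return any(m.startswith("524.") for m in mutations)

def predict_k1a_branch(mutations):
    """Provisional HVR-only label for a K1a entry, or None if not K1a by HVR.

    Assumptions (not confirmed rules): 497T marks K1a, 16T marks the
    provisional K1a11, 16048A marks the provisional K1a10, and 16524G
    marks K1a9. Full-sequence testing can overturn any of these labels.
    """
    muts = set(mutations)
    if "497T" not in muts:
        return None
    if "16T" in muts:
        return "K1a11"
    if "195C" not in muts:
        return "K1a"
    if "16048A" in muts:
        return "K1a10"
    if "16524G" in muts:
        return "K1a9"
    # K1a with 195C but in neither K1a9 nor K1a10:
    return "Pre-K1a10" if has_524_insertions(muts) else "Pre-K1a9"

# Extra mutations only; the six basic K mutations are omitted.
k1a9_modal = ["16093C", "16524G", "195C", "497T"]
k1a10_modal = ["16048A", "16093C", "16291T", "195C", "497T",
               "524.1C", "524.2A", "524.3C", "524.4A"]
print(predict_k1a_branch(k1a9_modal), predict_k1a_branch(k1a10_modal))
```

Because some of these entries turn out to be in other subclades such as K1a4a1 after full-sequence testing, the output is best read as a working label rather than a designation.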
In the past when I have looked at the geographical aspects of the results, I either divided them into the major subclade groups – K1a, K1b, K1c and K2 – or the full list of lower subclades. This time I have divided them differently. First, I have combined the three Ashkenazi subclades – K1a1b1a, K1a9 and K2a2a – into one group, since they were founded about the same time and had similar migration patterns. I have again used the K1b, K1c and K2 (minus K2a2a) groups. But I have split K1a, which is over 60% of K. I have separated K1a4a1 (but just the easily identifiable ones – perhaps half, with 16245T or 16261T) and three provisional subclades K1a10, K1a11 and Pre-K1a10. Since most of the other lower subclades of K1a are usually not predictable from HVR results, I have combined them into K1a+. These K1a+ haplotypes are likely to have an older origin than the others. I have only looked at the examples which list European origins for their maternal ancestor, mainly leaving out those listing USA, Canada or Unknown. I’ve divided Europe into Southern (primarily Italy, Spain and Portugal), Western (including Germany), Central, Eastern, Scandinavia, and British Isles. I’ve created pie charts for each area.
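The grouping behind the pie charts can be sketched as follows. The group names come from the paragraph above; the function names and the sample records are hypothetical, and sending every other label to K1a+ is a simplifying assumption.

```python
from collections import Counter, defaultdict

ASHKENAZI = {"K1a1b1a", "K1a9", "K2a2a"}
SEPARATE_K1A = {"K1a4a1", "K1a10", "K1a11", "Pre-K1a10"}

def survey_group(subclade):
    """Map a predicted subclade label to one of the survey's chart groups."""
    if subclade in ASHKENAZI:
        return "Ashkenazi"
    if subclade in SEPARATE_K1A:
        return subclade
    for prefix in ("K1b", "K1c", "K2"):
        if subclade.startswith(prefix):
            return prefix
    # Assumption: everything else is an unpredictable lower subclade of K1a.
    return "K1a+"

def regional_shares(records):
    """Percentage of each group per region from (region, subclade) pairs."""
    by_region = defaultdict(Counter)
    for region, subclade in records:
        by_region[region][survey_group(subclade)] += 1
    shares = {}
    for region, counts in by_region.items():
        total = sum(counts.values())
        shares[region] = {g: round(100 * n / total) for g, n in counts.items()}
    return shares

# Made-up records for illustration; not the survey's data.
records = [
    ("Southern", "K1a"),
    ("Southern", "K1a"),
    ("Southern", "K1b2"),
    ("Southern", "Pre-K1a10"),
]
shares = regional_shares(records)
```

Entries listing USA, Canada or Unknown would be dropped before this step, since only European origins are charted.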
Starting with Southern Europe, since that area is presumably closest to the founding location of K, we see that it is dominated by K1a+ at 64%. There is a small representation of K1b, K1c and K2, and a perhaps surprising 14% for Pre-K1a10. There are no examples with K1a4a1, K1a10, K1a11 or the Ashkenazi subclades, which is evidence that those developed further north.
Moving next to Western Europe, K1a+ is still dominant, but at a lower 38%. K1b, K1c and K2 still have substantially equal percentages, but somewhat higher. The percentage in Pre-K1a10 has gone down. We now see a few in K1a4a1 and K1a11, but still no K1a10. The major change is the introduction of 16% Ashkenazi, primarily from Germany, whose Rhine Valley was the location of the founding of these subclades according to Behar. Since none of the Germany examples in K1a1b1a has the 16223T mutation (the “second modal”), that’s good evidence that the mutation occurred only after the subclade members moved eastward.
Moving next eastward, we reach Central Europe, where the Ashkenazi subclades clearly dominate at 57%. That leaves less room for the others, with K1a+ next highest at 20%. There are small percentages from K1b, K1c, K2, K1a11 and Pre-K1a10. There are no examples of K1a10 or K1a4a1. Central Europe is not a uniform area: Austria is almost entirely Ashkenazi in K; none of the four examples listing Slovakia is Ashkenazi; the Czech Republic is very diverse; and the other countries fall somewhere in between.
Still moving eastward, we get to Eastern Europe, where the Ashkenazi subclades have become almost completely dominant at 86%. Only small percentages of K1a+, K1c and K2 are left. K1b has disappeared. One possibility is that the K1c, which is found in Russia, came from Vikings.
Retracing our steps and moving north from Western Europe – and perhaps Central Europe – to the British Isles, we find a much more diverse collection of subclades. K1a+ again has the most at 32%, with K2, K1c and Pre-K1a10 in double digits. K1b, overall the smallest of the major groups, is at 6%. There are even smaller percentages of K1a4a1, K1a11 and the Ashkenazi subclades, the latter perhaps a back-migration from Central and Eastern Europe. K1a10 finally makes an appearance at 8% (Ireland has 11%).
Finally we reach Scandinavia, where the most significant fact, which I have noted previously, is that K1a+ is virtually absent at only 5%. That causes the other percentages to be higher, but K1c is clearly dominant at 33%. K1b is at its highest level at 19%, while K2 and K1a4a1 are also in double digits. I have mentioned before that there is an issue of where K1a10 originated, with Ireland and Scandinavia being suggested. Here K1a10 is at 14%, more than in Ireland, but that is exaggerated by the shortage of K1a+. What I observed was that Pre-K1a10, from which K1a10 derived, is only at 5% in Scandinavia while being at 18% in Ireland. To me that suggests that K1a10 probably originated in Ireland. (Another factor is that K1a10 is more diverse in Ireland.)
While the west-to-east migration of the Ashkenazi subclades is obvious, other K migrations are not so obvious. Apparently K1a10 moved from Ireland to Great Britain and to Scandinavia. K1c could have originated in one of several places. In fact, since there seem to be no surviving plain K1c examples, K1c1 and K1c2 may have separate geographical origins. There is little or no evidence for the east-to-west migration of K shown on some mtDNA maps. Also, the evidence seems to be that the examples of K1a and its lower subclades in Scandinavia migrated from the British Isles, probably well before the Viking age.
As in the past, I have color-coded a few of the mutations on the chart. (The colors are somewhat different due to the use of the new version of Excel.) I again used yellow to mark the 498- and 16320T mutations, which usually indicate the K1c1 and K1c2 subclades. Also, dark green is used to mark the 16234T, 16524G, and 512C mutations which usually indicate the three “Ashkenazi” subclades, as discussed above. Light green is used for the 16092C, 16223T and 114T mutations usually found in K1a1b1a and its upstream relatives. Dark blue is used for those with the 16319A mutation, defining K1b1a. Orange marks the 16048A mutation of those in K1a10. Tan is used for 16T, the key defining mutation for the provisional K1a11. Grey is used for 16245T and 16261T, which seem to predict K1a4a1. Purple is used to mark the insertions at position 524, which appear in 25.3% of the entries.
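The color scheme described above amounts to a lookup from marker mutation to color. The sketch below records it as a Python mapping; the names `MARKER_COLORS` and `color_for` are illustrative, and the Chart itself was prepared in Excel rather than with code.

```python
# Marker mutations and the colors used for them on the Chart.
MARKER_COLORS = {
    "498-": "yellow",        # usually K1c1
    "16320T": "yellow",      # usually K1c2
    "16234T": "dark green",  # the three "Ashkenazi" subclades
    "16524G": "dark green",
    "512C": "dark green",
    "16092C": "light green", # K1a1b1a and its upstream relatives
    "16223T": "light green",
    "114T": "light green",
    "16319A": "dark blue",   # defines K1b1a
    "16048A": "orange",      # provisional K1a10
    "16T": "tan",            # provisional K1a11
    "16245T": "grey",        # seems to predict K1a4a1
    "16261T": "grey",
}

def color_for(mutation):
    """Return the Chart color for a mutation, or None if it is not marked."""
    if mutation.startswith("524."):   # any insertion at position 524
        return "purple"
    return MARKER_COLORS.get(mutation)

print([color_for(m) for m in ("16234T", "524.1C", "73G")])
```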
All K’s who tested at FTDNA or who transfer their results from the Genographic Project are welcome to join the mtDNA Haplogroup K Project by clicking on the blue Join button on their FTDNA personal pages. (Those testing with other companies may join by e-mail request, but their mutations will be listed on the Results page rather than the mtDNA Results page.) Further information is available on our project website.
Many aspects of the above comments have been explained in greater detail in the K1000, K750, K500 or K403 surveys referenced above.
© 2008 William R. Hurst
Administrator, mtDNA Haplogroup K Project