MITOCHONDRIAL DNA HAPLOGROUP K (KATRINE’S CLAN) SURVEY AND SUBCLADE CHART

 

AT 300 K ENTRIES ON MITOSEARCH – DECEMBER 10, 2005

 

I have made two previous surveys of the mitochondrial haplogroup K (Katrine's Clan) entries on FamilyTreeDNA's mitosearch.org , in September and April, 2005. For a link to both see: https://lists.rootsweb.com/hyperkitty/th/read/GENEALOGY-DNA/2005-09/1126975899 It took five months for K entries to increase from 100 to 200, but only another three months to increase to 300.

 

There are two additional elements to this survey which were not present in the previous two. I have analyzed the listed "country of origin" for the entries. I have also attempted to create a subclade chart for the entries. I would not call it a "phylogenetic tree." It is more like a genealogy "descendants chart." A basic tree can easily be constructed from the chart. I have used no software, such as Fluxus, to create the chart, other than Microsoft Excel and "cut and paste." Because of the possibility of data entry errors on mitosearch, and from the aspect of learning about K, I think doing it by hand is a better option than using a computer program. I have no academic credentials to propose such a chart; it is what it is.

 

mtDNA haplogroup K has been traditionally defined by HVR1 mutations 16224C and 16311C. Virtually all K's also have 16519C and HVR2 mutations 073G, 263G, and 315.1C. The latter three show up because the Cambridge Reference Sequence (CRS) is in the minority for those loci. 16519C has been called a “hotspot”; but that does not mean, at least for K, that it mutates often. It simply means that it is found in several different haplogroups. Oddly enough, no K haplotype in mitosearch has just those six basic mutations. Two entries are missing 16311C. Two are missing 16224C; one of those has 16224A instead. One is missing 16519C. Any of those could be a typographical error. The lack of one or more of the three basic HVR2 mutations is not uncommon.
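As an illustration only, the comparison of an entry against the six basic K mutations can be sketched in Python. The set representation, the function names, and the sample entry are assumptions made for this sketch; the sample is hypothetical and is not a real mitosearch record.

```python
# The six mutations that virtually all K's share, as listed above.
BASIC_K = {"16224C", "16311C", "16519C", "073G", "263G", "315.1C"}


def missing_basic_mutations(haplotype):
    """Return the basic K mutations absent from a haplotype, sorted."""
    return sorted(BASIC_K - set(haplotype))


def extra_mutations(haplotype):
    """Return the mutations a haplotype has beyond the basic six, sorted."""
    return sorted(set(haplotype) - BASIC_K)


# Hypothetical entry: 16224A appears where 16224C is expected.
example = ["16224A", "16311C", "16519C", "073G", "263G", "315.1C"]
print(missing_basic_mutations(example))  # ['16224C']
print(extra_mutations(example))          # ['16224A']
```

A check like this only flags a departure from the basic motif. It cannot tell a genuine back mutation from a typographical error.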

 

This survey was begun on December 10, 2005, when there were a total of 300 K's in mitosearch. All samples not including HVR2 results were eliminated, as were those claiming to be CRS. There were two sets of duplicate entries. Remaining were 146 entries with both HVR1 and HVR2 results. Of the full 300, most have tested with FTDNA, of course, but there are 16 who list the National Geographic Society Genographic Project, five Oxford Ancestors, one Sorenson Molecular Genealogy Foundation, one DNA Heritage, and three Other. None of those listing the Genographic Project have HVR2 results. I know of one person who has HVR1 results from there, but is listed under FTDNA after adding HVR2. One of those from Oxford listed HVR2 results, although I am not aware that they test HVR2. I am also not aware that DNAH offers public mtDNA tests. Sorenson has promised an mtDNA database, but it is not yet available. There are three K-looking results from Relative Genetics under Unknown haplogroup; apparently RG does not provide a haplogroup assignment. I have not included those in the statistics.

 

Of the 146 HVR1/2 entries, there are 114 different haplotypes. That's a "diversity percentage" of 78%. 101 of these are singletons; only 13 haplotypes have more than one entry. Two haplotypes have six entries, two have five, three have three, and six have two. The shortest haplotype has six mutations; the longest 16.
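The "diversity percentage" is simply the number of distinct haplotypes divided by the number of entries. A minimal Python sketch follows, assuming each entry has been reduced to a single haplotype label; the function name and the toy labels are hypothetical, and only the figures 114 and 146 come from the survey.

```python
from collections import Counter


def diversity_summary(haplotypes):
    """Summarize a list of haplotype labels, one label per entry."""
    counts = Counter(haplotypes)
    entries = len(haplotypes)
    distinct = len(counts)
    # A singleton is a haplotype that occurs in exactly one entry.
    singletons = sum(1 for n in counts.values() if n == 1)
    return {
        "entries": entries,
        "distinct": distinct,
        "singletons": singletons,
        "diversity_pct": round(100 * distinct / entries),
    }


# The survey's own figures: 114 distinct haplotypes among 146 entries.
print(round(100 * 114 / 146))  # 78

# Toy data with hypothetical labels, only to show the function at work.
toy = ["A", "A", "B", "C", "C", "C", "D"]
print(diversity_summary(toy))
```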

 

The subclade definitions below were also devised without some of the entries tested by companies other than FTDNA, since those companies often don't test the full HVR1 and especially the full HVR2.

 

mtDNA subclades were earlier defined by using only HVR1 mutations. One method was to define K1 by 16320T and K2 by 16093C.  A good chart defining K subclades by using both HVR1 and HVR2 (55 samples) was done by John S. Walden at: http://freepages.genealogy.rootsweb.com/~jswdna/mtdna.html  Walden used HVR2 mutation 498- (or 498d, a deletion at that locus) to define K1. He continued the usage of 16093C to define K2. But I noticed that the 16093C branch on his chart actually was from samples which did not include HVR2. I also noticed a large number of haplotypes starting with 497T.

 

The trend now is to define the subclades using full sequence samples, which adds coding region mutations to the control region mutations (HVR1/2). The main problem with these studies is that they tend to be limited by geography. Ian Logan has published a highly-detailed K chart based on full sequences, but none of those include the 16320T mutation formerly used to define K1. Therefore, his subclades bear no resemblance to those in the older studies or Walden's chart. My guess is that the sequences used by Logan do not include any from the British Isles. See: http://www.ilbg18230.pwp.blueyonder.co.uk/discussion/hap_K.htm

 

Dr. Doron Behar, now working at FTDNA, previously published a paper on Ashkenazi Jewish mtDNA based on HVR1 mutations only. He found that the Ashkenazi population is 32% K. See: http://www.familytreedna.com/pdf/Behar%202004%20mtDNA.pdf  Behar previewed a new paper using full sequences at the recent FTDNA conference in Washington. A quick glance at the subclades he is defining didn't show much resemblance to Walden's chart.

 

A paper by Finnila, et al., from 2001 using Finnish samples has charts based on HVR1/2 and full sequences. See: http://www.journals.uchicago.edu/AJHG/journal/issues/v68n6/002593/002593.text.html  A paper by Palanichamy, et al.,  from 2004, based on full sequences, used mostly samples from India.  See: http://www.familytreedna.com/pdf/Palani_2004.pdf

 

So, the previous K subclade charts were either based on just HVR1 mutations, which missed the important 497T and 498- mutations, or on HVR1/2 or full sequence samples which were limited by geography. John Walden's chart was an inspiration for me, but it included some HVR1-only samples. All this reminds me of the story of the three blind men describing an elephant just by touch. To get a final version of a K subclade chart, one would have to have a set of geographically-proportionate full-sequence samples. And even then there could be different versions based on the treatment of parallel mutations.

 

My chart below is based on only the 146 K samples in mitosearch which include both HVR1 and HVR2. Most of the test subjects were probably from the USA, but 69 listed a maternal ancestor's country of origin as a country other than USA, Canada, or Unknown. A general breakdown shows 34 from the British Isles, 17 from Eastern Europe, 10 from Germany or Austria, 3 from Scandinavia, 2 from southeastern Europe, and 3 from other countries in Western Europe. There are at least 12 from haplotypes generally considered strictly Ashkenazi. None listed Asia (other than possibly Russia), Africa, South America, or even Spain or Portugal. I believe the sample size is comparable to the other studies and is more geographically diverse than most of them. It does have limitations. It does not have samples from some of the areas, such as the Middle East, that I believe Behar's new paper will include. It does not include full sequences. The listed country of origin may be just a guess and may be incorrect. A major problem with the entries is that some of them were entered "by hand," either by choice or because the testing company was other than FTDNA. (FTDNA customers can upload the numbers automatically.) When I see a 16310C mutation in a haplotype missing the defining mutation 16311C, I suspect a typographical error. That one was in an HVR1-only sequence, so I didn't have to deal with it. But I also found a 497- which I think is probably a 498-.

 

Previous studies used 16093C as a subclade motif, but I found that virtually all those included 497T and that there were many 497T examples without 16093C. Other studies have used either a combination of 146C and 152C, or 16320T, or 498-, as the motif for K1. I decided to include all possible haplotypes in a subclade, so I am using 146C alone as the motif for K1. Unfortunately, 146C also appears as a random mutation in six other haplotypes; but the ones with both 146C and 497T are easily placed in K2. No haplotype with 497T appears in K1. I have noticed that several other studies found 497T and 498- in the same haplotype. I don't know why, but that hasn't happened - so far - in mitosearch.

 

For these reasons I have defined K1 by 146C and K2 by 497T. The order was based on that used in Walden's chart and older studies. For the lower subclades, the order is somewhat random, usually numbered in the order in which I found them. I have tried to use as few parallel and back mutations as possible. I have only created subclades where there is more than one haplotype, or where there is at least one lower subclade. I have tried to avoid "empty" subclades; that is, where there is a defining mutation but each haplotype within it has additional mutations. The only two of this type are K1c and K2a2. Of course, with no perfect basic six-mutation examples, in this sense K is an empty haplogroup!

[Note: My chart below and its subclade designations have been superseded by the K chart based on full mtDNA sequences in Dr. Doron Behar’s new paper at http://www.familytreedna.com/pdf/43026_Doron.pdf However, the discussion following the chart is still applicable, especially since MitoSearch has a greater representation of the British population than Behar’s study.]

 

MTDNA HAPLOGROUP K AND ITS SUBCLADES WITH THEIR DEFINING MUTATIONS

 

K = 16224C, 16311C, 16519C, 073G, 263G, 315.1C

 

.....K1 = add 146C

 

..........K1a = add 152C

 

...............K1a1 = add 498-

 

....................K1a1a = add 16320T

 

...............K1a2 = add 512C

 

...............K1a3 = add 324T

 

...............K1a4 = add 309.1C

 

..........K1b = add 16270T

 

..........K1c = add 195C

 

...............K1c1 = add 16129A

 

....................K1c1a = add 309.1C

 

...............K1c2 = add 524.1C, 524.2A

 

 

.....K2 = add 497T

 

..........K2a = add 16093C

 

...............K2a1 = add 114T

 

....................K2a1a = add 309.1C

 

.........................K2a1a1 = add 195C

 

...............K2a2 = add 195C

 

....................K2a2a = add 16524G

 

....................K2a2b = add 16048A, 16291T, 524.1C, 524.2A

 

.........................K2a2b1 = add 524.3C, 524.4A

 

..........K2b = add 309.1C

 

...............K2b1 = add 524.1A, 524.2C

 

....................K2b1a = add 16245T

 

...............K2b2 = add 146C

 

..........K2c = add 195C

 

..........K2d = add 16234T, 114T (Ashkenazi)

 

...............K2d1 = add 16223T

 

...............K2d2 = add 309.1C

 

...............K2d3 = add 133G, 174T, 323G, 357.1C, 557T, @073G, @263G, @315.1C, @497T

 

..........K2e = add 524.1C, 524.2A

 

...............K2e1 = add 309.1C

 

....................K2e1a = add 524.3C, 524.4A
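The chart above can be read as a set of rules: start at K and keep stepping down to a child subclade whose added mutations are all present. The Python sketch below encodes only part of the chart to show the idea. The list layout, the function name `classify`, and the first-match rule among siblings are assumptions of this sketch, and the sample haplotypes are hypothetical. It makes no attempt to handle back mutations or typing errors, which in practice need a judgment call.

```python
# A partial copy of the chart: (subclade, parent, mutations added).
# K2 is listed before K1 so that a haplotype with both 146C and 497T
# lands in K2, following the rule described in the text.
CHART = [
    ("K", None, {"16224C", "16311C", "16519C", "073G", "263G", "315.1C"}),
    ("K2", "K", {"497T"}),
    ("K2a", "K2", {"16093C"}),
    ("K2b", "K2", {"309.1C"}),
    ("K2b2", "K2b", {"146C"}),
    ("K1", "K", {"146C"}),
    ("K1a", "K1", {"152C"}),
    ("K1a1", "K1a", {"498-"}),
    ("K1a1a", "K1a1", {"16320T"}),
    ("K1a2", "K1a", {"512C"}),
]


def classify(haplotype):
    """Walk down the chart, taking the first child whose added mutations
    are all present. Returns (subclade, leftover mutations), or
    (None, all mutations) if the basic K motif is incomplete."""
    muts = set(haplotype)
    root, _, needed = CHART[0]
    if not needed <= muts:
        return None, sorted(muts)
    current, used = root, set(needed)
    moved = True
    while moved:
        moved = False
        for child, parent, added in CHART:
            if parent == current and added <= muts:
                current, used, moved = child, used | added, True
                break
    return current, sorted(muts - used)


BASIC = ["16224C", "16311C", "16519C", "073G", "263G", "315.1C"]

# Hypothetical haplotypes, built from the chart's defining mutations.
print(classify(BASIC + ["146C", "152C", "498-", "16320T", "16093C"]))
# ('K1a1a', ['16093C'])
print(classify(BASIC + ["497T", "309.1C", "146C"]))
# ('K2b2', [])
```

The leftover list corresponds to what the discussion calls personal mutations: whatever remains after the defining mutations along the path have been accounted for.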

 

DISCUSSION OF K CHART SUBCLADES

 

Due mainly to parallel mutations, there are often alternate ways to create the list. Subclade K1 is fairly linear, with few branches. I found only one example of 16320T outside of K1a1a. There are six scattered examples of 146C in K2. There is one 16093C in K1, but since it also has 498- and 16320T, it is well down the K1 chain. (It happens to be me.)  In K2, both K2a2 and K2c are defined by 195C. Also, both K2a1 and K2d are defined by 114T. An alternative would be to start branches of K2 with 16093C, 114T, 195C, and 309.1C. One result of that would have the 16093C's in three different locations. Also, since there are no haplotypes with 114T and without either 16093C or 16234T, the 114T subclade would be "empty." So, since solving the problem of location duplication for 114T and 195C just creates even more problems, I'll stick with the list as shown above.

 

Notes on subclade K1:

 

K1 is composed of 16224C, 16311C and 16519C in HVR1 and 073G, 146C, 263G and 315.1C in HVR2.

 

One person, J6Q9N in mitosearch, has the basic haplotype for K1, listing the country of origin as Ireland. He could be called K1*. There are two additional samples with "personal mutations," one listing Sweden, one USA. (In this discussion, personal mutations, additional mutations, singletons, all mean essentially the same thing.)

 

Major subclade K1a is defined by adding 152C. There are two perfect examples, and nine others with one or two personal mutations. Four list USA, two Scotland, one Northern Ireland, one Czech Republic, and three Unknown.

 

Subclade K1a1 adds 498-, which has been used elsewhere as the motif for K1 itself. There is only one perfect example. There are five others with personal mutations. Within these five are two listing USA, one Slovakia, one Scotland, and two Unknown. (The Scotland example is duplicated on mitosearch.)

 

Subclade K1a1a adds 16320T, which was an even earlier motif for K1. There are six perfect examples, which ties K2b for the greatest number for a K haplotype. There are also five others with personal mutations. One of those has an apparent back mutation on 16224C; it lists Norway as the origin. The others list two Ireland, one Northern Ireland, one Germany, two USA, and four Unknowns.

 

Subclade K1a1b adds 16368C to K1a1. There are two perfect examples and no others.

 

Subclade K1a2 adds 512C. There are five perfect examples, the third highest count, plus one with an additional mutation.

 

Subclade K1a3 adds 324T to K1a. There are two perfect examples, with one more with a personal mutation.

 

Subclade K1a4 adds 309.1C to K1a. There is only one perfect example, with two more with personal mutations.

 

For the 40 in major subclade K1a in total, ten list British Isles, four Eastern Europe, two Germany, one Norway, one France, ten USA, and 12 Unknown.

 

Major subclade K1b is defined by adding 16270T. There is only one perfect example, and two with additional mutations; the three list Ireland, Germany, and USA. On mitosearch the perfect one actually has two exact matches, but they are in haplogroup U5. The reason is that 16270T is a defining motif for U5; apparently for those two the 16224C and 16311C mutations are considered just additional personal mutations. I assume the haplogroups for these have been determined by looking at the coding region mutations.

 

Major subclade K1c is defined by adding 195C. It is presently empty and only exists because there are lower subclades.

 

Subclade K1c1 adds 16129A. There is only one example, but it has a lower subclade.

 

Subclade K1c1a adds 309.1C. There are two examples.

 

Subclade K1c2 adds 524.1C and 524.2A to K1c. There are two perfect examples and three with additional mutations. One of the latter has 16320T. If it actually had a back mutation on 498-, it would be a K1a1a. But I'm not sure you can have a back mutation after a deletion.

 

In total, subclade K1c has three USA, two Scotland, one each Belgium, Canada, and England.

 

Notes on subclade K2:

 

K2 is composed of 16224C, 16311C and 16519C in HVR1 and 073G, 263G, 315.1C and 497T in HVR2.

 

One person, 75X73, has the basic haplotype for K2; the country of origin is Unknown. He could be K2*. There are ten others who have additional mutations, but who do not fall into the lower subclades. Eight of them have 497T, so are clearly K2. My assumption is that they have had one or more back mutations. Sometimes I could guess what that was, but sometimes not. Others appear to have had one or more mutations from the basic K2. One of the two without 497T was tested by DNA Heritage, which doesn't publicly offer mtDNA tests. The other one was placed in K2 for no other reason than it has 499.1T and there is a 499A in K2. Mutation 497T appears to be subject to back mutations, as are all the others.

 

Major subclade K2a is defined by adding 16093C. There is only one "perfect" K2a, with England as country of origin - and four others with one or more personal mutations.

 

K2a1 is defined by adding 114T. There is one perfect example and one other with a personal mutation and a missing 497T. Since the test on the latter was performed by "Other," I suspect that the testing didn't extend as far as position 497, rather than that a back mutation occurred.

 

K2a1a is defined by adding 309.1C. There is only one of these, but there are three examples of a still lower subclade.

 

K2a1a1 is defined by adding 195C. There are three perfect examples.

 

K2a2 is defined by adding 195C to K2a. This is an "empty subclade," since there are no perfect examples, but there are lower subclades.

 

K2a2a adds 16524G. There are two of those; one Ukraine, one Poland. Those are the only K2a's from other than the British Isles, USA, and Unknown.

 

K2a2b adds 16048A, 16291T, 524.1C, and 524.2A to K2a2. There is only one of these, but it has a lower branch.

 

K2a2b1 adds 524.3C and 524.4A. There are two perfect examples. However, there is one which has all nine HVR2 mutations so far acquired in this line, but is missing 16048A and 16291T. From there another one adds 151T, 198T, and 309.1C. This series is not clear enough to create another subclade level, but bears watching as the size of mitosearch increases.

 

There are three other singletons which resemble K2a2's; one claims to have been tested by Sorenson.

 

Of the 22 in K2a, six list British Isles, one each Canada, Ukraine, Romania, nine USA, and four Unknown.

 

Major subclade K2b is defined by adding 309.1C. There are six perfect examples, which ties K1a1a for the greatest number for a K haplotype. Countries of origin include Germany, Slovakia, and the USA. Six stray entries have been placed here which have additional mutations; two apparently have back mutations on 497T.

 

K2b1 adds 524.1A and 524.2C. There is only one of these; another adds 524.3A and 524.4C. There are two others which resemble K2b1's.

 

K2b1a adds 16245T. Again, there is only one perfect example, with another two with personal mutations.

 

K2b2 adds 146C to K2b. There is only one perfect example, but another one adds 309.2C and 548T. Actually, there are two entries for the latter; but they are duplicates.

 

K2b3 adds 16189C to K2b. Again, one example, but another one is missing 497T, apparently a back mutation.

 

Of the 21 in K2b, two list British Isles, two Germany, two Eastern Europe, seven USA, and eight Unknown.

 

Major subclade K2c is defined by adding 195C. There are three perfect examples, two USA, one Poland. There are four others with additional mutations, with a couple of curiosities. Two of them add the four-insertion sequence beginning with 524.1A; one with the sequence beginning with 524.1C. But the HVR1 mutations of one of the former matches one of the latter. So, instead of a little tree with branches, these three seem to form a triangle. Two of those list Ireland, one Scotland. The other K2c not mentioned lists Greece.

 

Major subclade K2d is defined by adding 16234T and 114T and is of special interest because it is said to be limited to Ashkenazi Jews. The discussion I have seen on this brand of K only mentioned 16234T, but 114T appears in most of them. (Since this group will be a major part of Dr. Behar's future paper, I will not be surprised if he labels it K1.)  There are three perfect examples and two more with personal mutations.

 

K2d1 adds 16223T. There are five perfect examples.

 

The previous two subclades are mentioned in Dr. Behar's previous paper, and in other discussions, as Ashkenazi. Of the ten sequences, three list Poland, with one each from Lithuania, Germany, Ukraine, Russia, and Hungary, with two Unknown - exactly where Ashkenazim are supposed to be from. But let's look further.

 

K2d2 adds 309.1C to K2d. There are two examples, one Austria, one unknown.

 

K2d3 is perhaps the oddest subclade of all. From K2d it adds 133G, 174T, 323G, 375.1C, and 557T, and is missing - apparently back mutations - 073G, 263G, 315.1C, and 497T. It barely resembles K2d, except for that 16234T. Otherwise it looks like a K reverting to CRS. Is it Ashkenazi? Well, reading the comments on mitosearch reveals that the distant ancestor in Hungary was married to a rabbi. I have given it a subclade designation since there is another example which adds 16189C. In addition, there are five singletons which have 133G and all four back mutations - but don't have 16234T. They also have some mutations not seen elsewhere. They list England, Ireland, Denmark, Hungary, and USA. All these strange sequences were tested at FTDNA. It will be interesting to see if Behar found some of these.

 

So, of the 19 in K2d, 10 list Eastern Europe, two British Isles, two Germanic, one Denmark, one USA, and three Unknown.

 

Major subclade K2e is defined by adding 524.1C and 524.2A. Again, there is only one perfect example so far, but there are singletons and subclades below it.

 

K2e1 adds 309.1C. Again, only one of these, but there are three with additional mutations, and a subclade below.

 

K2e1a adds 524.3C and 524.4A. There is only one perfect example, with one with a personal mutation. What is interesting is that the second half of the 524.1C, etc., series was added, but only after the intervening 309.1C. Or else something else happened that I don't understand. At first glance, K2e1a looks like a singleton below K2b1, but K2e involves the 524.1C sequence of insertions and K2b1 involves the 524.1A insertions.

 

Countries of origin for K2e are quite varied: one each from England, Italy, Austria, Canada, Germany, UK, and Ireland, plus three Unknown.

 

GENERAL COMMENTS

 

Countries of origin for 54 K1 sequences are: 6 Scotland, 4 Ireland, 3 England, 3 Germany, 2 Northern Ireland, 2 Lithuania, and 1 each Sweden, Czech, Slovakia, Norway, Russia, France, Belgium, Canada, plus 15 USA and 11 Unknown.

 

Regions of origin for K1 sequences are: 15 British Isles, 5 Eastern Europe, 3 Germanic, 2 Scandinavia, 2 Western Europe, 16 North America, and 11 Unknown.

 

Countries of origin for 92 K2 sequences are: 8 England, 6 Ireland, 5 Germany, 4 Poland, 3 Ukraine, 3 Hungary, 2 Scotland, 2 Canada, 2 Northern Ireland, 2 Austria, 1 each France, Romania, Slovakia, Czech, Greece, Lithuania, Russia, Denmark, Italy, UK, plus 21 USA and 24 Unknown.

 

Regions of origin for K2 sequences are: 19 British Isles, 15 Eastern Europe, 7 Germanic, 2 Western Europe, 2 Southeastern Europe, 23 North America, 24 Unknown.

 

If North America and Unknown are ignored the percentages for regions are:

 

K1: 56% British Isles, 19% Eastern Europe, 11% Germanic, 7% Western Europe, 7% Scandinavia.

 

K2: 42% British Isles, 33% Eastern Europe, 16% Germanic, 4% Western Europe, 4% Southeastern Europe.

 

K1 is 50% North America and Unknown; K2 is 51%.
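The regional percentages are computed over only those entries with a known origin outside North America. A short Python sketch, using the K2 regional counts given above; the dictionary layout and the function name are assumptions of this sketch.

```python
# K2 regional counts as listed above.
K2_REGIONS = {
    "British Isles": 19,
    "Eastern Europe": 15,
    "Germanic": 7,
    "Western Europe": 2,
    "Southeastern Europe": 2,
    "North America": 23,
    "Unknown": 24,
}

IGNORED = {"North America", "Unknown"}


def regional_percentages(counts, ignored=IGNORED):
    """Percentage of each region among entries not in an ignored category."""
    known = {region: n for region, n in counts.items() if region not in ignored}
    total = sum(known.values())
    return {region: round(100 * n / total) for region, n in known.items()}


print(regional_percentages(K2_REGIONS))
# {'British Isles': 42, 'Eastern Europe': 33, 'Germanic': 16,
#  'Western Europe': 4, 'Southeastern Europe': 4}
```

Because each figure is rounded separately, the percentages need not add up to exactly 100.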

 

It is easy to see that K1 is weighted toward the west and north of Europe, while K2 displays a trend toward the east and south.

 

I would hazard a guess that a high percentage of the USA and Unknown entries are from British Isles and Western Europe; their ancestors may have arrived in the USA so long ago that the origin has been forgotten or is not provable. Late arrivals from other regions would be more likely to provide a definite country of origin. If I'm correct, then the west/east weightings above would be even more pronounced.

 

Subclade K1, in tree terms, is tall and thin. Subclade K1a has 40 examples, while the other two have only 3 and 8.

 

Subclade K2 is wide. K2a has 22 sequences; the others have 21, 19, 10, and 7.  

 

I have left two entries as "questionable"; not easily placed in a subclade. One is probably K1, the other K2.

 

With more entries in the future, more subclades will no doubt be formed as singletons gain exact matches. Also, the arrangement of the lower subclades may change.

 

Certain mutations seem to be involved more often in parallel mutations, resulting in their appearance in more than one lower subclade. These include 16093C, 114T, 146C, 152C, 195C, 309.1C, and the sequence of four insertions 524.1C, 524.2A, 524.3C, 524.4A.

 

The above four-insertion sequence and its opposite, 524.1A, 524.2C, 524.3A, 524.4C, always appear in groups of two or four and rotate from one letter to the other. The other multiple-insertion sequence 309.1C, 309.2C, has the same nucleotide in each copy.
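The pattern described above (insertions at 524 arriving in twos or fours and alternating between two letters) can be tested mechanically. The following Python sketch assumes the mitosearch-style notation in which "524.1C" means the first inserted base after position 524 is C; both function names and the sample haplotype are hypothetical.

```python
def insertions_at(haplotype, position):
    """Return the inserted bases at a position, ordered by insertion index."""
    prefix = f"{position}."
    found = []
    for mut in haplotype:
        if mut.startswith(prefix):
            index = int(mut[len(prefix):-1])  # the "1" in "524.1C"
            found.append((index, mut[-1]))    # the "C" in "524.1C"
    return "".join(base for _, base in sorted(found))


def is_alternating_run(bases):
    """True for a run of two or four bases that alternates between two
    different letters, such as "CA" or "ACAC"."""
    if len(bases) not in (2, 4) or bases[0] == bases[1]:
        return False
    return all(bases[i] == bases[i % 2] for i in range(len(bases)))


# Hypothetical haplotype fragment, listed out of order on purpose.
sample = ["16224C", "524.2A", "524.1C", "309.1C", "309.2C"]
print(insertions_at(sample, 524))                      # CA
print(is_alternating_run(insertions_at(sample, 524)))  # True
print(is_alternating_run(insertions_at(sample, 309)))  # False
```

The last line shows the contrast noted in the text: the 309 insertions repeat the same nucleotide, so they fail the alternation test.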

 

Several insertions are used to define subclades, but only one deletion, 498-. Only one subclade is defined by back mutations, the strange K2d3, which is defined by five, at 073G, 114T, 263G, 315.1C, and 497T, as well as five other mutations.

 

In conclusion, I found creating the chart to be an interesting exercise. I'm not sure how useful it will be for others. I have no illusion that this chart will be adopted as official or semi-official. I don't plan on ordering a K1a1a lapel pin for myself anytime soon.

 

© Copyright William R. Hurst 2005