MITOCHONDRIAL DNA HAPLOGROUP K (KATRINE’S CLAN) SURVEY AND
SUBCLADE CHART
AT 300 K ENTRIES ON MITOSEARCH – DECEMBER 10, 2005
I have made two previous surveys of the mitochondrial haplogroup K (Katrine's Clan) entries on
FamilyTreeDNA's mitosearch.org, in April and September 2005. For a link to both see:
https://lists.rootsweb.com/hyperkitty/th/read/GENEALOGY-DNA/2005-09/1126975899
It took five months for K entries to increase from 100 to 200, but only another three
months to increase to 300.
There are two additional elements to this survey which were not present in the previous
two. I have analyzed the listed "country of origin" for the entries. I
have also attempted to create a subclade chart for the entries. I would not
call it a "phylogenetic tree." It is more like a genealogy "descendants chart." A basic tree can easily be
constructed from the chart. I have used no software, such as Fluxus, to create
the chart, other than Microsoft Excel and "cut and paste." Because of
the possibility of data entry errors on mitosearch, and from the aspect of
learning about K, I think doing it by hand is a better option than using a
computer program. I have no academic credentials to propose such a chart; it is
what it is.
mtDNA
haplogroup K has been traditionally defined by HVR1 mutations 16224C and
16311C. Virtually all K's also have 16519C and HVR2
mutations 073G, 263G, and 315.1C. The latter three show up because the
Cambridge Reference Sequence (CRS) is in the minority for those loci. 16519C
has been called a “hotspot”; but that does not mean, at least for K, that it
mutates often. It simply means that it is found in several different haplogroups. Oddly enough, no K haplotype in mitosearch has
just those six basic mutations. Two entries are missing 16311C. Two are missing
16224C; one of those has 16224A instead. One is missing 16519C. Any of those
could be a typographical error. The lack of one or more of the three basic HVR2
mutations is not uncommon.
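The check against the six basic mutations can be sketched in a few lines of Python. This is only an illustration: the function name `missing_basic_k` and the sample entry are made up for the example and are not taken from mitosearch.

```python
# The six basic K mutations named in the text (HVR1 first, then HVR2).
BASIC_K = ["16224C", "16311C", "16519C", "073G", "263G", "315.1C"]

def missing_basic_k(haplotype):
    """Return the basic K mutations that are absent from a haplotype."""
    present = set(haplotype)
    return [m for m in BASIC_K if m not in present]

# Hypothetical entry: 16224A in place of 16224C, plus one extra mutation.
sample = ["16224A", "16311C", "16519C", "073G", "263G", "315.1C", "146C"]
print(missing_basic_k(sample))
```

An empty list would mean all six basic mutations are present; it says nothing about whether the entry has extra mutations beyond them.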
This
survey was begun on December 10, 2005, when there were a total of 300 K's in mitosearch. All samples not including HVR2 results
were eliminated, as were those claiming to be CRS. There were two sets of
duplicate entries. Remaining were 146 HVR1/HVR2 entries. Of the
full 300, most have tested with FTDNA, of course, but there are 16 who list the
National Geographic Society Genographic Project, five Oxford Ancestors, one
Sorenson Molecular Genealogy Foundation, one DNA Heritage, and three Other.
None of those listing the Genographic Project have HVR2 results. I know of one
person who has HVR1 results from there, but is listed under FTDNA after adding
HVR2. One of those from Oxford listed HVR2 results, although I am not aware
that they test HVR2. I am also not aware that DNAH offers public mtDNA tests.
Sorenson has promised an mtDNA database, but it is not yet available. There are
three K-looking results from Relative Genetics under Unknown haplogroup;
apparently RG does not provide a haplogroup assignment. I have not included
those in the statistics.
Of the
146 HVR1/2 entries, there are 114 different haplotypes. That's a
"diversity percentage" of 78%. 133 of these are singletons; only 13
haplotypes have more than one entry. Two haplotypes have six entries, two have
five, three have three, and six have two. The shortest haplotype has six
mutations; the longest 16.
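For anyone who wants to repeat the count, here is a minimal sketch of how entries could be tallied into haplotypes. It assumes each entry is simply a list of mutation strings; the function name and the toy data are hypothetical and only show the mechanics.

```python
from collections import Counter

def haplotype_stats(entries):
    """Count entries, distinct haplotypes, singletons, and diversity (%)."""
    # A haplotype is treated as a set of mutations, so listing order is ignored.
    counts = Counter(frozenset(e) for e in entries)
    n_entries = sum(counts.values())
    n_haplotypes = len(counts)
    n_singletons = sum(1 for c in counts.values() if c == 1)
    diversity = 100.0 * n_haplotypes / n_entries
    return n_entries, n_haplotypes, n_singletons, diversity

# Hypothetical toy data: four entries, three distinct haplotypes.
toy = [
    ["16224C", "16311C", "146C"],
    ["146C", "16311C", "16224C"],  # same haplotype, different order
    ["16224C", "16311C", "497T"],
    ["16224C", "16311C", "497T", "16093C"],
]
print(haplotype_stats(toy))

# The survey's own figure: 114 haplotypes among 146 entries.
print(round(100 * 114 / 146))
```

The last line reproduces the 78% "diversity percentage" quoted above.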
The subclade definitions below were also devised without some of the entries tested
by companies other than FTDNA, since those companies often don't test the full
HVR1 and especially the full HVR2.
mtDNA subclades were earlier defined by using only HVR1 mutations. One method
was to define K1 by 16320T and
The trend
now is to define the subclades using full sequence samples, which adds coding
region mutations to the control region mutations (HVR1/2). The main problem
with these studies is that they tend to be limited by geography. Ian Logan has
published a highly-detailed K chart based on full sequences, but none of those
include the 16320T mutation formerly used to define K1. Therefore, his
subclades bear no resemblance to those in the older studies or Walden's chart.
My guess is that the sequences used by Logan do not include any from the
British Isles. See:
http://www.ilbg18230.pwp.blueyonder.co.uk/discussion/hap_K.htm
Dr. Doron Behar, now working at
FTDNA, previously published a paper on Ashkenazi Jewish mtDNA based on HVR1
mutations only. He found that the Ashkenazi population is 32% K. See:
http://www.familytreedna.com/pdf/Behar%202004%20mtDNA.pdf Behar
previewed a new paper using full sequences at the recent FTDNA conference in
A paper
by Finnila, et al., from 2001 using Finnish samples
has charts based on HVR1/2 and full sequences. See:
http://www.journals.uchicago.edu/AJHG/journal/issues/v68n6/002593/002593.text.html A
paper by Palanichamy, et al., from 2004, based on full sequences, used
mostly samples from
So, the
previous K subclade charts were either based on just HVR1 mutations, which missed
the important 497T and 498- mutations, or on HVR1/2 or full sequence samples
which were limited by geography. John Walden's chart was an inspiration for me,
but it included some HVR1-only samples. All this reminds me of the story of the
three blind men describing an elephant just by touch. To get a final version of
a K subclade chart, one would have to have a set of
geographically-proportionate full-sequence samples. And even then there could
be different versions based on the treatment of parallel mutations.
My chart
below is based on only the 146 K samples in mitosearch which include both HVR1
and HVR2. Most of the test subjects were probably from the USA, but 69 listed a
maternal ancestor's country of origin as a country other than USA, Canada, or
Unknown. A general breakdown shows 34 from the British Isles, 17 from Eastern
Europe, 10 from
Previous
studies used 16093C as a subclade motif, but I found that virtually all those
included 497T and that there were many 497T examples without 16093C. Other
studies have used either a combination of 146C and 152C, or 16320T, or 498-, as
the motif for K1. I decided to include all possible haplotypes in a subclade,
so I am using 146C alone as the motif for K1. Unfortunately, 146C also appears
as a random mutation in six other haplotypes; but the ones with both 146C and
497T are easily placed in
For these
reasons I have defined K1 by 146C and K2 by 497T. The order was based on that
used in Walden's chart and older studies. For the lower subclades, the order is
somewhat random, usually numbered in the order as I found them. I have tried to
use as few parallel and back mutations as possible. I have only created
subclades where there is more than one haplotype, or where there is at least
one lower subclade. I have tried to avoid "empty" subclades; that is,
where there is a defining mutation but each haplotype within it has additional
mutations. The only two of this type are K1c and K2a2. Of course, with no
perfect basic six-mutation examples, in this sense K is an empty haplogroup!
[Note: My chart below and its subclade designations have been superseded by the K chart based on full mtDNA sequences in Dr. Doron Behar’s new paper at http://www.familytreedna.com/pdf/43026_Doron.pdf However, the discussion following the chart is still applicable, especially since MitoSearch has a greater representation of the British population than Behar’s study.]
MTDNA HAPLOGROUP K AND ITS SUBCLADES WITH THEIR DEFINING
MUTATIONS
K = 16224C, 16311C, 16519C, 073G, 263G, 315.1C
.....K1 = add 146C
..........K1a = add 152C
...............K1a1 = add 498-
....................K1a1a = add 16320T
...............K1a2 = add 512C
...............K1a3 = add 324T
...............K1a4 = add 309.1C
..........K1b = add 16270T
..........K1c = add 195C
...............K1c1 = add 16129A
....................K1c1a = add 309.1C
...............K1c2 = add 524.1C, 524.2A
.....K2 = add 497T
..........K2a = add 16093C
...............K2a1 = add 114T
....................K2a1a = add 309.1C
.........................K2a1a1 = add 195C
...............K2a2 = add 195C
....................K2a2a = add 16524G
....................K2a2b = add 16048A, 16291T, 524.1C, 524.2A
.........................K2a2b1 = add 524.3C, 524.4A
..........K2b = add 309.1C
...............K2b1 = add 524.1A, 524.2C
....................K2b1a = add 16245T
...............K2b2 = add 146C
..........K2c = add 195C
..........K2d = add 16234T, 114T (Ashkenazi)
...............K2d1 = add 16223T
...............K2d2 = add 309.1C
...............K2d3 = add 133G, 174T, 323G, 357.1C, 557T, @073G, @263G, @315.1C, @497T
..........K2e = add 524.1C, 524.2A
...............K2e1 = add 309.1C
....................K2e1a = add 524.3C, 524.4A
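Since each row of the chart only lists what is added to its parent, the full motif of a subclade has to be built up by walking down from K. A rough Python sketch of that walk follows. Only a few rows are entered here, the dictionary layout and the function name `full_motif` are my own choices for illustration, and the "@" prefix is read as a back mutation (a loss), which matches how K2d3 is described in the discussion.

```python
# Each subclade maps to (parent, mutations added). Only part of the chart
# is entered here; the remaining rows would follow the same pattern.
CHART = {
    "K":     (None,   ["16224C", "16311C", "16519C", "073G", "263G", "315.1C"]),
    "K1":    ("K",    ["146C"]),
    "K1a":   ("K1",   ["152C"]),
    "K1a1":  ("K1a",  ["498-"]),
    "K1a1a": ("K1a1", ["16320T"]),
    "K2":    ("K",    ["497T"]),
    "K2d":   ("K2",   ["16234T", "114T"]),
    "K2d3":  ("K2d",  ["133G", "174T", "323G", "357.1C", "557T",
                       "@073G", "@263G", "@315.1C", "@497T"]),
}

def full_motif(subclade):
    """Accumulate mutations from K down to the given subclade."""
    # Walk up to the root, then apply each level's changes from the top down.
    path = []
    node = subclade
    while node is not None:
        path.append(node)
        node = CHART[node][0]
    motif = []
    for name in reversed(path):
        for m in CHART[name][1]:
            if m.startswith("@"):
                # Assumed reading: "@" marks a back mutation, so drop it.
                if m[1:] in motif:
                    motif.remove(m[1:])
            else:
                motif.append(m)
    return motif

print(full_motif("K1a1a"))
print(full_motif("K2d3"))
```

A "perfect example" of a subclade, in the sense used in the discussion, would then be an entry whose mutations equal this motif exactly.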
DISCUSSION OF K CHART SUBCLADES
Due
mainly to parallel mutations, there are often alternate ways to create the
list. Subclade K1 is fairly linear, with few branches. I found only one example
of 16320T outside of K1a1a. There are six scattered examples of 146C in K2.
There is one 16093C in K1, but since it also has 498- and 16320T, it is well
down the K1 chain. (It happens to be me.)
In K2, both K2a2 and K2c are defined by 195C. Also, both K2a1 and K2d
are defined by 114T. An alternative would be to start branches of K2 with
16093C, 114T, 195C, and 309.1C. One result of that would have the 16093C's in
three different locations. Also, since there are no haplotypes with 114T and
without either 16093C or 16234T, the 114T subclade would be "empty."
So, since solving the problem of location duplication for 114T and 195C just
creates even more problems, I'll stick with the list as shown above.
Notes on
subclade K1:
K1 is
composed of 16224C, 16311C and 16519C in HVR1 and 073G, 146C, 263G and 315.1C
in HVR2.
One
person, J6Q9N in mitosearch, has the basic haplotype for K1, listing the
country of origin as Ireland. He could be called K1*. There are two additional
samples with "personal mutations," one listing Sweden, one USA. (In
this discussion, personal mutations, additional mutations, singletons, all mean
essentially the same thing.)
Major
subclade K1a is defined by adding 152C. There are two perfect examples, and
nine others with one or two personal mutations. Four list
Subclade
K1a1 adds 498-, which has been used elsewhere as the motif for K1 itself. There
is only one perfect example. There are five others with personal mutations.
Within these five are two listing USA, one Slovakia, one Scotland, and two
Unknown. (The Scotland example is duplicated on mitosearch.)
Subclade
K1a1a adds 16320T, which was an even earlier motif for K1. There are six
perfect examples, which ties K2b for the greatest number for a K haplotype.
There are also five others with personal mutations. One of those has an
apparent back mutation on 16224C; it lists Norway as the origin. The others
list two Ireland, one Northern Ireland, one Germany, two USA, and four
Unknowns.
Subclade
K1a1b adds 16368C to K1a1. There are two perfect examples and no others.
Subclade
K1a2 adds 512C. There are five perfect examples, the third highest count, plus
one with an additional mutation.
Subclade
K1a3 adds 324T to K1a. There are two perfect examples, with one more with a
personal mutation.
Subclade
K1a4 adds 309.1C to K1a. There is only one perfect example, with two more with
personal mutations.
For the
40 in major subclade K1a in total, ten list British Isles, four Eastern Europe,
two
Major
subclade K1b is defined by adding 16270T. There is only one perfect example,
with two with additional mutations, one
Major
subclade K1c is defined by adding 195C. It is presently empty and only exists
because there are lower subclades.
Subclade
K1c1 adds 16129A. There is only one example, but it has a lower subclade.
Subclade
K1c1a adds 309.1C. There are two examples.
Subclade
K1c2 adds 524.1C and 524.2A to K1c. There are two perfect examples and three
with additional mutations. One of the latter has 16320T. If it actually had a
back mutation on 498-, it would be a K1a1a. But I'm not sure you can have a
back mutation after a deletion.
In total,
subclade K1c has three USA, two Scotland, one each Belgium, Canada, and
England.
Notes on
subclade K2:
K2 is
composed of 16224C, 16311C and 16519C in HVR1 and 073G, 263G, 315.1C and 497T
in HVR2.
One
person, 75X73, has the basic haplotype for K2; the country of origin is Unknown.
He could be K2*. There are ten others who have additional mutations, but who do
not fall into the lower subclades. Eight of them have 497T, so are clearly K2.
My assumption is that they have had one or more back mutations. Sometimes I
could guess what that was, but sometimes not. Others appear to have had one or
more mutations from the basic
Major
subclade K2a is defined by adding 16093C. There is only one "perfect"
K2a, with
K2a1 is
defined by adding 114T. There is one perfect example and one other with a
personal mutation and a missing 497T. Since the test on the latter was performed
by "Other," I suspect that the testing didn't go as high as 497T
rather than a back mutation.
K2a1a is
defined by adding 309.1C. There is only one of these, but there are three
examples of a still lower subclade.
K2a1a1 is
defined by adding 195C. There are three perfect examples.
K2a2 is
defined by adding 195C to K2a. This is an "empty subclade," since
there are no perfect examples, but there are lower subclades.
K2a2a
adds 16524G. There are two of those; one Ukraine, one Poland. Those are the
only K2a's from other than the British Isles, USA, and Unknown.
K2a2b
adds 16048A, 16291T, 524.1C, and 524.2A to K2a2. There is only one of these,
but it has a lower branch.
K2a2b1
adds 524.3C and 524.4A. There are two perfect examples. However, there
is one which has all nine HVR2 mutations so far acquired in this line, but is
missing 16048A and 16291T. From there another one adds 151T, 198T, and 309.1C.
This series is not clear enough to create another subclade level, but bears
watching as the size of mitosearch increases.
There are
three other singletons which resemble K2a2's; one claims to have been tested by
Sorenson.
Of the 22
in K2a, six list British Isles, one each Canada, Ukraine, Romania, nine USA,
and four Unknown.
Major
subclade K2b is defined by adding 309.1C. There are six perfect examples, which
ties K1a1a for the greatest number for a K haplotype. Countries of origin
include Germany, Slovakia, and the USA. Six stray entries have been placed here
which have additional mutations; two apparently have back mutations on 497T.
K2b1 adds
524.1A and 524.2C. There is only one of these; another adds 524.3A and 524.4C.
There are two others which resemble K2b1's.
K2b1a
adds 16245T. Again, there is only one perfect example, with another two with
personal mutations.
K2b2 adds
146C to K2b. There is only one perfect example, but another one adds 309.2C and
548T. Actually, there are two entries for the latter; but they are duplicates.
K2b3 adds
16189C to K2b. Again, one example, but another one is missing 497T, apparently
a back mutation.
Of the 21
in K2b, two list British Isles, two Germany, two Eastern Europe, seven USA, and
eight Unknown.
Major
subclade K2c is defined by adding 195C. There are three perfect examples, two
USA, one Poland. There are four others with additional mutations, with a couple
of curiosities. Two of them add the four-insertion sequence beginning with
524.1A; one with the sequence beginning with 524.1C. But the HVR1 mutations of one of the former match one of the latter.
So, instead of a little tree with branches, these three seem to form a
triangle. Two of those list
Major
subclade K2d is defined by adding 16234T and 114T and is of special interest
because it is said to be limited to Ashkenazi Jews. The discussion I have seen
on this brand of K only mentioned 16234T, but 114T appears in most of them.
(Since this group will be a major part of Dr. Behar's
future paper, I will not be surprised if he labels it K1.) There are three perfect examples and two more
with personal mutations.
K2d1 adds
16223T. There are five perfect examples.
The
previous two subclades are mentioned in Dr. Behar's
previous paper, and in other discussions, as Ashkenazi. Of the ten sequences,
three list
K2d2 adds
309.1C to K2d. There are two examples, one Austria, one unknown.
K2d3 is
perhaps the oddest subclade of all. From K2d it adds 133G, 174T, 323G, 375.1C,
and 557T, and is missing - apparently back mutations - 073G, 263G, 315.1C, and
497T. It barely resembles K2d, except for that 16234T. Otherwise it looks like
a K reverting to CRS. Is it Ashkenazi? Well, reading the comments on mitosearch
reveals that the distant ancestor in
So, of
the 19 in K2d, 10 list Eastern Europe, two British Isles, two Germanic, one
Denmark, one USA, and three Unknown.
Major
subclade K2e is defined by adding 524.1C and 524.2A. Again, there is only one
perfect example so far, but there are singletons and subclades below it.
K2e1 adds
309.1C. Again, only one of these, but there are three with additional
mutations, and a subclade below.
K2e1a
adds 524.3C and 524.4A. There is only one perfect example, with one with a
personal mutation. What is interesting is that the second half of the 524.1C,
etc., series was added, but only after the intervening 309.1C. Or else
something else happened that I don't understand. At first glance, K2e1a looks
like a singleton below K2b1, but K2e involves the 524.1C sequence of insertions
and K2b1 involves the 524.1A insertions.
Countries
of origin for K2e are quite varied: one each from
GENERAL COMMENTS
Countries
of origin for 54 K1 sequences are: 6
Regions
of origin for K1 sequences are: 15 British Isles, 5 Eastern Europe, 3 Germanic,
2 Scandinavia, 2 Western Europe, 16
Countries
of origin for 92 K2 sequences are: 8 England, 6 Ireland, 5 Germany, 4 Poland, 3
Ukraine, 3 Hungary, 2 Scotland, 2 Canada, 2 Northern Ireland, 2 Austria, 1 each
France, Romania, Slovakia, Czech, Greece, Lithuania, Russia, Denmark, Italy,
UK, plus 21 USA and 24 Unknown.
Regions
of origin for K2 sequences are: 19 British Isles, 15 Eastern Europe, 7
Germanic, 2 Western Europe, 2 Southeastern Europe, 23 North America, 24
Unknown.
If
K1: 56%
British Isles, 19% Eastern Europe, 11% Germanic, 7% Western Europe, 7%
Scandinavia.
K2: 42%
British Isles, 33% Eastern Europe, 16% Germanic, 4% Western Europe, 4%
Southeastern Europe.
K1 is 50%
North America and Unknown; K2 is 51%.
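The regional percentages for K1 and K2 appear to be taken over the entries with a known European region only, leaving out North America and Unknown. That reading is an assumption on my part, but the following sketch, using the regional counts listed above, reproduces the quoted figures. The function name is hypothetical.

```python
def region_shares(counts):
    """Whole-number percentage of each region among the counted entries."""
    total = sum(counts.values())
    return {region: round(100 * n / total) for region, n in counts.items()}

# Regional counts as given in the text, without North America and Unknown.
k1 = {"British Isles": 15, "Eastern Europe": 5, "Germanic": 3,
      "Scandinavia": 2, "Western Europe": 2}
k2 = {"British Isles": 19, "Eastern Europe": 15, "Germanic": 7,
      "Western Europe": 2, "Southeastern Europe": 2}

print(region_shares(k1))
print(region_shares(k2))
```

Because each share is rounded separately, the percentages for a group need not add up to exactly 100.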
It is
easy to see that K1 is weighted toward the west and north of Europe, while K2
displays a trend toward the east and south.
I would
hazard a guess that a high percentage of the USA and Unknown entries are from
British Isles and Western Europe; their ancestors may have arrived in the USA
so long ago that the origin has been forgotten or is not provable. Late
arrivals from other regions would be more likely to provide a definite country
of origin. If I'm correct, then the west/east weightings above would be even
more pronounced.
Subclade
K1, in tree terms, is tall and thin. Subclade K1a has 40 examples, while the
other two have only 3 and 8.
Subclade
I have
left two entries as "questionable"; not easily placed in a subclade.
One is probably K1, the other K2.
With more
entries in the future, more subclades will no doubt be formed as singletons
gain exact matches. Also, the arrangement of the lower subclades may change.
Certain
mutations seem to be involved more often in parallel mutations, resulting in
their appearance in more than one lower subclade. These include 16093C, 114T,
146C, 152C, 195C, 309.1C, and the sequence of four insertions 524.1C, 524.2A,
524.3C, 524.4A.
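One way to see the repetition is to tally how many subclades in the chart each defining mutation is used for. The sketch below does only that. It looks at subclade definitions, not at singleton haplotypes, so mutations such as 16093C and 152C, which recur mainly as personal mutations, will not show up in its output. The dictionary is copied from the chart; K2d3 is left out because its long list is unique to it. The function name is my own.

```python
from collections import defaultdict

# Defining mutations per subclade, copied from the chart (K2d3 omitted).
DEFINING = {
    "K1": ["146C"], "K1a": ["152C"], "K1a1": ["498-"], "K1a1a": ["16320T"],
    "K1a2": ["512C"], "K1a3": ["324T"], "K1a4": ["309.1C"],
    "K1b": ["16270T"], "K1c": ["195C"], "K1c1": ["16129A"],
    "K1c1a": ["309.1C"], "K1c2": ["524.1C", "524.2A"],
    "K2": ["497T"], "K2a": ["16093C"], "K2a1": ["114T"],
    "K2a1a": ["309.1C"], "K2a1a1": ["195C"], "K2a2": ["195C"],
    "K2a2a": ["16524G"],
    "K2a2b": ["16048A", "16291T", "524.1C", "524.2A"],
    "K2a2b1": ["524.3C", "524.4A"],
    "K2b": ["309.1C"], "K2b1": ["524.1A", "524.2C"], "K2b1a": ["16245T"],
    "K2b2": ["146C"], "K2c": ["195C"], "K2d": ["16234T", "114T"],
    "K2d1": ["16223T"], "K2d2": ["309.1C"],
    "K2e": ["524.1C", "524.2A"], "K2e1": ["309.1C"],
    "K2e1a": ["524.3C", "524.4A"],
}

def repeated_mutations(defining):
    """Map each mutation that defines more than one subclade to those subclades."""
    users = defaultdict(list)
    for subclade, muts in defining.items():
        for m in muts:
            users[m].append(subclade)
    return {m: s for m, s in users.items() if len(s) > 1}

for mutation, subclades in repeated_mutations(DEFINING).items():
    print(mutation, len(subclades), subclades)
```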
The above
four-insertion sequence and its opposite, 524.1A, 524.2C, 524.3A, 524.4C,
always appear in groups of two or four and rotate from one letter to the other.
The other multiple-insertion sequence 309.1C, 309.2C, has the same nucleotide
in each copy.
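The pattern described for the 524 insertions (groups of two or four, alternating between C and A) can be written as a simple check. This is a sketch of the rule as stated above, not a claim about the underlying biology, and the function name `check_524_run` is hypothetical.

```python
def check_524_run(mutations):
    """Check a haplotype's 524.x insertions against the pattern in the text."""
    # Collect the inserted bases by position: 524.1, 524.2, ...
    ins = {}
    for m in mutations:
        if m.startswith("524."):
            idx = int(m[4:-1])
            ins[idx] = m[-1]
    if not ins:
        return True  # no 524 insertions, so nothing to check
    bases = [ins.get(i) for i in range(1, max(ins) + 1)]
    if None in bases:
        return False  # a gap in the numbering
    if len(bases) not in (2, 4):
        return False  # the text reports groups of two or four only
    # Letters must alternate between C and A, starting with either one.
    return all(
        b in "CA" and (i == 0 or b != bases[i - 1])
        for i, b in enumerate(bases)
    )

print(check_524_run(["524.1C", "524.2A"]))
print(check_524_run(["524.1A", "524.2C", "524.3A", "524.4C"]))
print(check_524_run(["524.1C", "524.2C"]))
```

An entry that fails a check like this might be worth a second look for a data entry error before it is used to define a subclade.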
Several
insertions are used to define subclades, but only one deletion, 498-. Only one
subclade is defined by back mutations, the strange K2d3, which is defined by
five, at 73G, 114T, 263G, 315.1C, and 497T, as well as five other mutations.
In
conclusion, I found creating the chart to be an interesting exercise. I'm not
sure how useful it will be for others. I have no illusion that this chart will
be adopted as official or semi-official. I don't plan on ordering a K1a1a lapel
pin for myself anytime soon.
© Copyright William R. Hurst 2005