MITOCHONDRIAL DNA HAPLOGROUP K SURVEY AT 1000 ENTRIES ON MITOSEARCH
AS OF FEBRUARY 6, 2007
[Updated March 27, 2007]
This is my seventh survey of the mtDNA haplogroup K (Katrine’s Clan) entries on FamilyTreeDNA’s MitoSearch. The previous survey at 750 entries may be found at K750 Survey. There are several links on that page to previous surveys, supplements and charts. It has only been five months since the K750 survey.
This K1000 survey includes a new CHART, which is sorted by predicted subclade, then HVR2, then HVR1 mutations. The first step taken was to eliminate all but the high-resolution entries – those with both HVR1 and HVR2 mutations listed. First to be dropped were 54 shown as CRS or matching the Cambridge Reference Sequence, which is represented by no mutations in HVR1. None of those would belong in K, so I suspect that they are mainly entries which were not completed before entering the mutation lists. Under HVR2 were 487 listing “Not Tested” and 57 as “No Mutations.” Since no K would have no HVR2 mutations, I eliminated all of those. I also eliminated one duplicated entry. Last year I created a special entry, ARFHH, which represents the ancestral haplotype for K. Since it does not represent an actual living person, I deleted it from this survey.
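The filtering steps above can be sketched in Python. This is a minimal illustration, not the procedure actually used for the survey. The `Entry` record, its field names and the literal strings are assumptions about how MitoSearch rows might look once copied out. Duplicates are detected here simply by a repeated user code, since the survey does not say how the duplicated entry was spotted.

```python
from dataclasses import dataclass


# Hypothetical record layout; the survey does not describe MitoSearch's export format.
@dataclass(frozen=True)
class Entry:
    code: str  # MitoSearch user code
    hvr1: str  # space-separated HVR1 mutations, or "CRS"
    hvr2: str  # space-separated HVR2 mutations, "Not Tested" or "No Mutations"


# ARFHH is the reconstructed ancestral K haplotype, not a living person.
EXCLUDED_CODES = {"ARFHH"}


def high_resolution(entries):
    """Keep entries listing both HVR1 and HVR2 mutations, dropping repeats."""
    kept, seen = [], set()
    for e in entries:
        if e.hvr1.strip() in ("", "CRS"):
            continue  # no HVR1 mutations: probably an incomplete entry
        if e.hvr2.strip() in ("", "Not Tested", "No Mutations"):
            continue  # every K carries HVR2 mutations such as 73G and 263G
        if e.code in EXCLUDED_CODES or e.code in seen:
            continue  # the special ancestral entry, or a repeated code
        seen.add(e.code)
        kept.append(e)
    return kept


def chart_order(entries, predicted_subclade):
    """Sort by predicted subclade, then HVR2, then HVR1 (plain string order)."""
    return sorted(entries, key=lambda e: (predicted_subclade(e), e.hvr2, e.hvr1))
```

Plain string order is a simplification; a real chart would probably order mutations by numeric position.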
The remaining high-resolution entries numbered 454 or 45.4% of the original total – slightly higher than the previous survey. The real percentage, after eliminating the probably incomplete entries labeled CRS, was 48%. Also, some may have upgraded their tests without adding the HVR2 results to MitoSearch. Seven listed the testing company as the National Geographic Society’s Genographic Project. I suspect many more began their testing there, but completed it at FTDNA and so listed that. (41.3% of the members of the Haplogroup K Project started testing with the Genographic Project.) Three listed Relative Genetics. One listed Oxford Ancestors, although an HVR2 upgrade was taken at FTDNA. One each listed Argus BioSciences, DNA Heritage and Sorenson. Four listed Other. The remaining 436 listed FTDNA. Sixty entries (plus 18 with HVR1 only) contain pedigree charts.
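The counts in the last two paragraphs can be checked with a few lines of Python. Every number below is copied from the text; nothing here is new data.

```python
# Counts copied from the survey text.
TOTAL_ENTRIES = 1000
CRS_ENTRIES = 54        # no HVR1 mutations listed, probably incomplete
HIGH_RES_ENTRIES = 454  # both HVR1 and HVR2 mutations listed

by_company = {
    "FTDNA": 436,
    "Genographic Project": 7,
    "Other": 4,
    "Relative Genetics": 3,
    "Oxford Ancestors": 1,
    "Argus BioSciences": 1,
    "DNA Heritage": 1,
    "Sorenson": 1,
}

# The per-company counts should account for every high-resolution entry.
assert sum(by_company.values()) == HIGH_RES_ENTRIES

share_of_all = HIGH_RES_ENTRIES / TOTAL_ENTRIES
share_excluding_crs = HIGH_RES_ENTRIES / (TOTAL_ENTRIES - CRS_ENTRIES)
print(f"{share_of_all:.1%}")         # 45.4%
print(f"{share_excluding_crs:.1%}")  # 48.0%
```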
There are 279 entries, or 61.5%, with exact high-resolution matches in 66 different haplotypes or sequences. That leaves 175 unmatched singletons for a total of 241 different haplotypes. Dividing 241 by the 454 total gives 53.1%, a figure I have been calling the “diversity percentage” but which Ann Turner says is called the “discrimination capacity” in scientific papers. That number is down from 59.2% in the K750 survey and 67% in the K500 survey. Thus it is becoming more likely that a new entry will find an exact match. One haplotype has 21 examples. I will list the most common haplotypes with just the “extra” mutations, not including the six basic K mutations: HVR1 – 16224C, 16311C, 16519C and HVR2 – 73G, 263G, 315.1C.
21 – 16234T, 114T, 497T - This is the most common form of the largest “Ashkenazi” subclade K1a1b1a.
17 – 146C, 152C - This haplotype is probably K2a, but some examples may be the “ancestor” of K1c.
16 – 16223T, 16234T, 114T, 497T – This adds 16223T to the basic K1a1b1a subclade above.
16 – 16093C, 16524G, 195C, 497T – This is another Ashkenazi subclade, K1a9.
14 – 146C, 152C, 512C - This is the third Ashkenazi subclade, K2a2a.
13 – 16320T, 146C, 152C, 498- - This is the modal haplotype for subclade K1c2.
11 – 309.1C, 497T – This adds one of the most common recurrent mutations, 309.1C, to the defining mutation for K1a, 497T.
10 – 16048A, 16093C, 16291T, 195C, 497T, 524.1C, 524.2A, 524.3C, 524.4A. I published an analysis of this as yet unnamed “16048A” cluster recently.
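The match statistics and the “extra mutation” listing above can be sketched as follows. Each haplotype is assumed to be a collection of mutation strings (HVR1 and HVR2 together) as written on MitoSearch, and two entries count as an exact match only when their mutation sets are identical. The function names are illustrative, not taken from any existing tool.

```python
from collections import Counter

# The six mutations shared by every K (HVR1, then HVR2).
BASIC_K = frozenset({"16224C", "16311C", "16519C", "73G", "263G", "315.1C"})


def extra_mutations(mutations):
    """Mutations beyond the six basic K markers, in their original order."""
    return tuple(m for m in mutations if m not in BASIC_K)


def haplotype_stats(haplotypes):
    """Exact-match statistics; each haplotype is an iterable of mutation strings."""
    counts = Counter(frozenset(h) for h in haplotypes)
    entries = sum(counts.values())
    shared = [n for n in counts.values() if n > 1]
    return {
        "entries": entries,
        "matched_entries": sum(shared),     # entries with an exact match
        "matched_haplotypes": len(shared),  # haplotypes seen more than once
        "singletons": sum(1 for n in counts.values() if n == 1),
        "distinct": len(counts),            # matched haplotypes + singletons
        "discrimination_capacity": len(counts) / entries,
    }
```

With the survey's figures the same ratio is (66 + 175) / 454 = 241 / 454, about 0.531, the 53.1% quoted above. Applied to a full listing of the most common haplotype, `extra_mutations` leaves just 16234T, 114T and 497T.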
Notice that four of the most common haplotypes are forms of the three Ashkenazi K subclades mentioned in Dr. Doron Behar’s 2006 paper. Some other common subclades, such as K1b2 and K1c, have few perfect examples but have many with one or more additional mutations. The two shortest haplotypes without back mutations are those with one extra mutation, either 497T or 146C. The first has seven examples and the second has three. Since no K has just the basic six mutations, these two haplotypes probably represent the closest in time to the founder of K. (Note that I have not considered the non-FTDNA examples, which may not have been tested for certain mutations.) Although I have referred to several K subclades above, the actual subclade designations are usually based on coding-region mutations outside the HVR1/2 regions. See the Behar paper mentioned above. Those mutations may only be tested for K with a full-sequence test.
I have made an attempt to predict the subclades for most of the entries. There are now 33 persons in 14 different subclades in the Haplogroup K Project who have received results including subclade designations from full-sequence tests. Usually, but not always, subclade predictions can be made by comparing haplotypes with those with designated subclades and with Behar’s K chart. (One confirmed K1a4a1 is a perfect HVR match to a confirmed K1a*.) The most difficult to predict are the lower subclades of K1a which are defined entirely by coding-region mutations. For those I have usually just listed “K1a.”
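One way to automate the comparison described above is a nearest-match rule: predict the subclade of the confirmed full-sequence result whose HVR haplotype differs by the fewest mutations. This is only a sketch of the idea, not the method behind the Chart. The `confirmed` list and its contents are hypothetical, and ties are broken simply by list order.

```python
def mutation_distance(a, b):
    """Number of mutations present in one haplotype but not the other."""
    return len(set(a) ^ set(b))


def predict_by_nearest(haplotype, confirmed):
    """confirmed: list of (subclade, mutations) pairs from full-sequence tests.

    Returns (subclade, distance) for the closest confirmed haplotype; on a tie,
    the earliest pair in the list wins.
    """
    best_subclade, best_mutations = min(
        confirmed, key=lambda pair: mutation_distance(haplotype, pair[1])
    )
    return best_subclade, mutation_distance(haplotype, best_mutations)
```

The parenthetical example above shows the limit of any such rule: a confirmed K1a4a1 and a confirmed K1a* with identical HVR results are both at distance zero, because what separates them lies in the coding region.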
I have also tried to identify possible new subclades not included on Dr. Behar’s chart. The most prominent of those is the one mentioned above which has the 16048A mutation. I have even suggested a designation for it, K1a10, based on its nearness to K1a9 on Behar’s chart. While a formal designation will have to wait for the next scientific paper with a new K chart, I have used my name for it on the Chart. Also, I have been studying the other sequences in K1a with 195C, but which are not in K1a9 or K1a10. These represent almost 10% of all the K entries. On my Chart I have identified those as, respectively, pre-K1a10 or pre-K1a9, depending on whether or not they have one or more pairs of the HVR2 position 524 insertions. Behar has, from one sequence in a Herrnstadt paper, an unnamed sequence close to K1a9, but not under 195C, which has the HVR mutations 16129, 17, 150 and 199. (Note that he doesn’t usually use the mutation letters, unless the mutation is a rare transversion.) While no example of this sequence is found in MitoSearch or the K Project, there are four in MitoSearch with a somewhat similar sequence: 16129A, 16T, 150T and 199C. Another example in the K Project adds two more mutations. In all, there are at least seven in the FTDNA database. Whether or not these are the same as the published example, there are probably enough of them to deserve a separate subclade. For this survey, I have labeled this possible new subclade K1a11.
Almost a year ago I thought I had discovered another new cluster, which I called the bizarre or odd cluster. Its HVR1 mutations looked normal, but the HVR2 mutations were completely abnormal for K, beginning with 133G. For this survey I decided to check to see if their numbers had increased since last March. I found that there was instead one fewer than back then, but that person was still in MitoSearch with different HVR2 mutations! Comparing the old and new entries quickly revealed that if 60 was subtracted from each of the old mutations the results were perfectly normal mutations; 133G became 73G, 323G became 263G, etc. An inquiry to FTDNA revealed that in the early days of MitoSearch such entry mistakes occurred, so FTDNA has promised to fix the remaining six examples. (For this survey, I fixed them by hand with their code numbers marked in aqua.) This problem is not restricted to K; a quick check revealed 10 incorrect entries just in haplogroup H.
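The repair is simple arithmetic on the position numbers. The sketch below subtracts 60 from each HVR2 position while keeping any insertion suffix and base, so 133G becomes 73G, 323G becomes 263G and, by the same arithmetic, 375.1C would become 315.1C. The test used to spot a shifted entry (133G present, 73G absent) is an assumption for illustration, not FTDNA's rule.

```python
import re

# Position digits, then whatever follows (an insertion index like ".1" and/or a base or "-").
MUTATION = re.compile(r"^(\d+)(.*)$")


def shift_position(mutation, offset=60):
    """Subtract `offset` from the numeric position of one mutation string."""
    match = MUTATION.match(mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation: {mutation!r}")
    position, rest = match.groups()
    return f"{int(position) - offset}{rest}"


def looks_shifted(hvr2):
    """Hypothetical heuristic: a K entry listing 133G but not 73G."""
    return "133G" in hvr2 and "73G" not in hvr2


def repair_hvr2(hvr2):
    """Return the HVR2 list with positions shifted back if it looks shifted."""
    if looks_shifted(hvr2):
        return [shift_position(m) for m in hvr2]
    return list(hvr2)
```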
Now I will discuss the definitions and characteristics of each of the K subclades found so far in MitoSearch.
K1a: K1a in general is defined by HVR2 mutation 497T and includes about 60% of K. In my chart it only includes those which can’t be assigned to a lower subclade. Many of these come back from full-sequence tests as K1a* because they do not have the coding-region mutations required for lower subclades. A common lower subclade, K1a4a1, from full-sequence tests is shown here as just K1a; I have not tried to match the MitoSearch entries against the list of those with that assignment. K1a4a1 is not predictable from its highly variable HVR haplotypes. The same reasoning applies to K1a1, K1a1a and K1a1b. Note that there are, so far, no K Project members designated K1a2, K1a3, K1a5, K1a6, K1a7 or K1a8. There are single probable examples of K1a1b1 and K1a3a, but I have included them in K1a. In general, K1a is common in all areas – except for Scandinavia, where the rather small percentage seems to have come from the
K1a1b1a: This largest Ashkenazi subclade may usually be predicted by the presence of 16234T, with most having 114T and about half having 16223T. No example has any of the 524 insertions.
K1a9: Another Ashkenazi subclade, this is defined by 195C and 16524G. The modal variant includes 16093C. None has position 309 or 524 insertions. There are no defining coding-region mutations.
terminology, this neighbor of K1a9 is defined by 16048A. The modal variant also has 16291T and 16093C. Like K1a9, it never has the 309 insertions which are otherwise common in K subclades. Unlike K1a9, however, it always has 524 insertions; the modal has two pairs, with one example having three pairs. This subclade or cluster has numerous variants, including some with the longest lists of mutations in K. It was likely founded in
my terminology, this group (even I hesitate to call this and the one below subclades) includes those in K1a with 195C, but not 524 insertions. Its working label comes from having every mutation for K1a9 but the defining 16524G. I’m not sure that those with 114T should be included. If K1a9 is descended from one of this group, it’s interesting that it is not found in
this group has 195C and 524 insertions – everything but the 16048A required for K1a10. This is not found in Scandinavia, supporting the theory that those in K1a10 in that area came from the
label suggested by me, the four examples on MitoSearch have 16129A, 16T, 150T and 199C. All list North American origins. One example in the K Project has two extra mutations and lists
K1b: This subclade includes those not in one of its lower subclades. It’s almost a “none-of-the-above” category, identified mostly by its lack of 497T or 146C. Some of these on MitoSearch may be in its lower subclades, but with back mutations.
K1b1a: This is defined on Behar’s chart by 16319A and 152C, but all but one on MitoSearch also have 16463G. Half have 524 insertions, half don’t. It is one of the more highly variable subclades, with various combinations of 524s, 16093C and 199C.
K1b2: This large subclade is defined by 146C and 195C. As with the above sibling subclade, it is highly variable, with about two-thirds having 524 insertions. This subclade has the greatest range of 524 variants, from zero to three pairs. An even lower subclade or variant, not on Behar’s chart, seems to be forming: it accounts for one-fourth of the examples and features 16129A and no 524 insertions.
K1c: This is defined by 146C, 152C and 498-. It has many variants; even its modal variant isn’t in double digits on MitoSearch. It never has 524 insertions; in fact, three matching examples have 522- and 523-, which is “below zero” on the scale of 524 variants. The only full-sequence example in the K Project tested as a K1c1, which is defined by added coding-region mutations.
K1c2: This adds 16320T to the above. No example on MitoSearch has 524 insertions. (One pair of siblings in the K Project does have them, but their haplotype is otherwise very unusual. That’s possibly an example of a new mutation at position 524, rather than an inherited heteroplasmic variant.)
K2a: This adds 152C to the above. Several have 309.1C. Others have 522- and 523- as discussed under K1c.
K2a2a: This third Ashkenazi subclade adds 512C to the above. This subclade, along with the other two Ashkenazi ones, headed to Eastern Europe, while their parent subclades stayed in Western Europe – with
[On March 27, 2007, I added a set of Fluxus phylogenetic diagrams of the major subclades above from the MitoSearch data. Start with the discussion.]
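The HVR rules in the subclade descriptions above can be collected into a rough rule-based predictor. This is a simplified sketch, not the procedure used to fill in the Chart. The order of the checks is an assumption, K1a11 is assumed to sit under 497T, and 146C with 152C alone is taken as K2a even though a few such entries may be ancestral to K1c. Coding-region subclades such as K1a4a1 cannot be predicted this way at all. The sketch also encodes the “524 scale” used above: pairs of 524 insertions, with 522- and 523- counted as below zero.

```python
def scale_524(mutations):
    """Pairs of 524 insertions (524.1/524.2 is one pair); -1 for 522- with 523-."""
    m = set(mutations)
    if {"522-", "523-"} <= m:
        return -1  # "below zero" on the 524 scale
    return sum(1 for x in m if x.startswith("524.")) // 2


def predict_subclade(mutations):
    """Rough HVR-only prediction following the definitions in the survey."""
    m = set(mutations)
    if "497T" in m:  # K1a and its lower subclades
        if "16234T" in m:
            return "K1a1b1a"
        if "16048A" in m:
            return "K1a10"
        if {"195C", "16524G"} <= m:
            return "K1a9"
        if {"16129A", "16T", "150T", "199C"} <= m:
            return "K1a11"  # assumed to fall under 497T
        if "195C" in m:
            return "pre-K1a10" if scale_524(m) > 0 else "pre-K1a9"
        return "K1a"
    if {"16319A", "152C"} <= m:
        return "K1b1a"
    if {"146C", "195C"} <= m:
        return "K1b2"
    if {"146C", "152C", "498-"} <= m:
        return "K1c2" if "16320T" in m else "K1c"
    if {"146C", "152C"} <= m:
        return "K2a2a" if "512C" in m else "K2a"
    return "K1b"  # the "none-of-the-above" category
```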
As in the past, I have color-coded a few of the mutations on the chart. I again used yellow to mark the 498- and 16320T mutations, which usually indicate the K1c and K1c2 subclades. Also, green is used to mark the 16234T, 16524G, and 512C mutations, which usually indicate the three “Ashkenazi” subclades, as discussed above. Blue is used for those with the 16319A mutation, usually denoting K1b1a.
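For anyone rebuilding the chart, the color scheme above amounts to a small lookup table. The mapping is taken from the text; the `highlight` helper is only an illustration, and the subclade comments give the usual, not guaranteed, associations.

```python
# Marker mutations and the colors used on the chart.
MARKER_COLORS = {
    "498-": "yellow",    # usually K1c
    "16320T": "yellow",  # with 498-, usually K1c2
    "16234T": "green",   # usually K1a1b1a (Ashkenazi)
    "16524G": "green",   # usually K1a9 (Ashkenazi)
    "512C": "green",     # usually K2a2a (Ashkenazi)
    "16319A": "blue",    # usually K1b1a
}


def highlight(mutations):
    """Pair each mutation with its chart color, or None if it is uncolored."""
    return [(m, MARKER_COLORS.get(m)) for m in mutations]
```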
Another set of tables (in one file) of interest was produced by Tom Glad’s mtDNAtool: An mtDNA Analysis Utility. The Summary Table shows the mutations for each entry plus a count and frequency for each mutation. (The Haplogroup column is blank, but all are K.) The Genetic Distance Report allows each entry to be compared to the others, ignoring the second of each pair of 524 insertions and the 523 deletion.
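The comparison rule described for the Genetic Distance Report can be sketched as follows. This is not mtDNAtool's code; it only re-creates the rule as stated. It drops the 523 deletion and the second insertion of each 524 pair (524.2, 524.4, and so on), then counts the mutations found in one haplotype but not the other.

```python
import re

# HVR2 insertions after position 524, e.g. "524.1C" or "524.2A".
INSERTION_524 = re.compile(r"^524\.(\d+)[ACGT]$")


def normalize(mutations):
    """Drop 523- and every even-numbered 524 insertion (the second of each pair)."""
    kept = set()
    for m in mutations:
        if m == "523-":
            continue
        insertion = INSERTION_524.match(m)
        if insertion and int(insertion.group(1)) % 2 == 0:
            continue
        kept.add(m)
    return kept


def genetic_distance(a, b):
    """Count mutations present in one normalized haplotype but not the other."""
    return len(normalize(a) ^ normalize(b))
```

Under this rule a pair of 524 insertions, or a 522-/523- deletion pair, counts as one difference rather than two.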
As with the K403 Survey, I plan to add charts and maps to a Geographical Supplement; it now contains four charts. Most of those in the K750 and previous surveys are still relevant.
All K’s who tested at FTDNA or who transfer their results from the Genographic Project are welcome to join the mtDNA Haplogroup K Project by clicking on the blue Join button on their FTDNA personal pages. (Those testing with other companies may join by e-mail request, but their mutations will be listed on the Results page rather than the mtDNA Results page.) Further information is available on our project website.
Many aspects of the above comments have been explained in greater detail in the K750, K500 or K403 surveys referenced above.
© 2007 William R. Hurst
Administrator, mtDNA Haplogroup K Project