mtDNA Haplogroup K Project
Progress Report at 1000 Members
June 2, 2008
The mtDNA Haplogroup K Project at FamilyTreeDNA reached a major milestone on June 2, 2008, with 1000 members 29 months after its founding. 979 of the members are shown on the mtDNA Results tab on the K Project website. Twenty-one members on the Results tab tested their mtDNA with nine companies other than FTDNA. As far as I know, we continue to be the second largest mtDNA haplogroup project after the H project.
Of our members, 396, or 39.6%, came originally from the National Geographic Society's Genographic Project. Eight of the members began at FTDNA’s European office. At least 584 of the FTDNA members, or 59.7%, have uploaded their data to MitoSearch; others may have uploaded their results “by hand” and wouldn’t be counted. 241 members have received coding-region results and official subclade designations from full-sequence (FGS) tests; all but one, who tested with Argus BioSciences, are FTDNA members. We are waiting for the results from 28 more of these FGS or Mega tests, so 268, or 27.4%, of the FTDNA members have ordered FGS tests. 49 of these have been uploaded to the federal GenBank database so far, with other submissions in progress. Of the 228 total FTDNA customer GenBank submissions, 21.5% are from K. There are more K Project members with FGS results than there are K FGS results on GenBank, even counting the ones in both.
Of the FTDNA members, 656, or 67%, have HVR1 plus HVR2, or high-resolution, results. In addition, ten of the non-FTDNA entries also have full high-resolution results. (Eleven others have HVR1 only or incomplete HVR2 results.) We have 97 sets of high-resolution matches, including 451 members, or 67.7% of the 666 with high-resolution results. That's up from 60.3% at 500 members. With the 215 unmatched high-res "singletons" added to the 97 haplotypes in matches, there are 312 different high-res haplotypes, for what I've been calling a "diversity percentage" of 46.8% (312 of 666), compared to 55.7% at 500 members. That percentage continues to go down as new members are more likely to find matches. The new percentage now means that the odds favor a new member with high-res results having an exact match.
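As a quick check of the arithmetic in the paragraph above, here is a minimal Python sketch. The variable names are only illustrative; the counts are the ones quoted in this report.

```python
# Counts quoted in the report at 1000 members.
high_res_total = 666      # 656 FTDNA + 10 non-FTDNA high-resolution results
matched_haplotypes = 97   # haplotypes shared by two or more members
matched_members = 451     # members who fall in those 97 haplotypes
singletons = 215          # high-res haplotypes with no match

# Every high-res member is either in a match or a singleton.
assert matched_members + singletons == high_res_total

# Distinct haplotypes = matched haplotypes + unmatched singletons.
distinct_haplotypes = matched_haplotypes + singletons

# Percentages as used in the report.
matched_share = 100 * matched_members / high_res_total
diversity = 100 * distinct_haplotypes / high_res_total

print(distinct_haplotypes)
print(f"{matched_share:.1f}% matched, {diversity:.1f}% diversity")
```

The "diversity percentage" here is simply distinct haplotypes divided by high-resolution results, which is how the figure is described above.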
The 666 high-resolution results appear on a CHART sorted by HVR2 then HVR1. A legend at the bottom of the chart explains the colors used. The Haplo column contains either K or the subclades assigned by FTDNA from full-sequence tests. There are now 25 different subclades from the FGS tests. Provisional subclades K1a10 and K1a11 are shown as just K1a.
The haplotype (list of mutations) with the largest number of matches is now a branch of the Ashkenazi K1a1b1a subclade, with 33 members. It has the six basic K mutations - 16224C, 16311C, 16519C, 73G, 263G and 315.1C - plus 497T, the defining mutation for K1a, plus the defining HVR mutation for K1a1b1a, 16234T, plus 114T.
The second largest haplotype, with 26 members, is the ancestral haplotype of subclade K2a with the six basic mutations plus 146C and 152C.
The third largest haplotype, with 24 members, is the same as the largest in K1a1b1a with the additional mutation 16223T. I have referred to this as the “second modal” haplotype of K1a1b1a.
The fourth largest haplotype, with 22 members, has the basic six mutations plus 146C, 152C, 498- and 16320T - the ancestral and modal K1c2.
The fifth largest haplotype, with 19 members, is in K1a9, another Ashkenazi subclade; it adds 195C, 497T, 16093C and the key mutation 16524G to the basic six. Note that this one is similar to its “sister” subclade K1a10, found most commonly in those with Irish ancestry.
The sixth largest haplotype, with 17 members, has the basic six mutations, plus 497T and 195C; the latter mutation defines a large group under K1a. Added to that are 16048A, 16093C, 16291T and two pairs of position 524 insertions to form the modal haplotype of a subclade I’ve provisionally called K1a10.
The number of matches for each haplotype is shown in the Counts column of the CHART. Remember that the subclades are only officially determined by full-sequence tests. Also, it should be noted that mutations don’t have equal value for finding close connections. Two haplotypes differing only on 497T, for example, would not share a common ancestor within many thousands of years. But two differing only on 309.1C could be as close as siblings.
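One way to picture the haplotypes described above is as sets of mutations built on the six basic K mutations. The following Python sketch is only an illustration under that assumption; the names BASIC_K and difference are hypothetical and not part of any FTDNA tool.

```python
# The six basic K mutations listed in this report.
BASIC_K = frozenset({"16224C", "16311C", "16519C", "73G", "263G", "315.1C"})

# Haplotypes rebuilt from the descriptions above (illustrative only).
largest = BASIC_K | {"497T", "16234T", "114T"}   # K1a1b1a branch, 33 members
second_modal = largest | {"16223T"}              # "second modal", 24 members
k2a_ancestral = BASIC_K | {"146C", "152C"}       # ancestral K2a, 26 members

def difference(a, b):
    """Return the mutations found in one haplotype but not in the other."""
    return sorted(a ^ b)

print(difference(largest, second_modal))
print(difference(largest, k2a_ancestral))
```

A list of differing mutations is only a starting point. As noted above, a difference on 497T and a difference on 309.1C carry very different weight, so the raw count should not be read as a measure of relatedness.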
HVR1 and HVR2 mutation lists highlighted in yellow are those with the 498- and 16320T mutations, almost always suggesting K1c1 or K1c2 subclades. Those in bright green are generally those in Dr. Doron Behar's Ashkenazi subclades, K1a1b1a, K1a9 and K2a2a. About 123 members, or 18.5% of the 666 high-resolution entries, are in one of the Ashkenazi subclades. I have used purple to mark the 524 series of HVR2 insertions. Of the 195, or 29.3%, with the 524's, eight members have six of those insertions each, while one has eight.
FTDNA recently implemented the long-awaited Members Subgrouping feature for mtDNA results, which is reflected in the members’ chart under the mtDNA Results tab on our website. The feature allows the group administrator to create subgroups and assign members to them. Usually the subgroups are named subclades of K. With a few exceptions the subgroups will contain those members assigned to subclades by full-sequence (FGS) tests plus those predicted to be in the subclades from HVR results. A few subclades are combined into subgroups. Other subgroups are named after provisional subclades not on Dr. Doron Behar’s current K tree. All these will be discussed below. The subgroups assigned to members may change as new information becomes available. It should be remembered that official subclade designations are assigned only by FGS tests. The counts mentioned below are as of the 1000-member report for the K Project.
K1: This subgroup only contains the two members assigned to subclade K1 by FGS tests. The members have the defining coding-region mutations for K1, but not for any of the lower K1a, K1b or K1c subclades. This is the subclade of Ötzi the Iceman. There is no way to predict this subclade from HVR results.
K1a – Designated: These three members have been designated as K1a, but not assigned to any lower subclade. Excluded are those designated as K1a, but who fall into one of the provisional subclades discussed below.
K1a – Predicted: This large subgroup includes those who have not received assignments from FGS tests, but who either have the defining mutation for K1a, 497T, or who have certain exact HVR1 matches with those in K1a. All of these members would move to another subgroup if FGS test results were available. Excluded are those whose HVR results indicate membership in one of the deeper K1a subclades.
K1a + 195C: On Behar’s K tree there is a group of sequences including the Ashkenazi K1a9 subclade and a few others with no subclade labels. This subgroup excludes those in K1a9 and the provisional K1a10. I have previously called those with 195C and one or more pairs of position 524 insertions “Pre-K1a10” and those without the insertions “Pre-K1a9.” Some members of this subgroup might end up in other subclades, usually K1a4a1, if FGS tests were taken. Since this subgroup requires two specific HVR2 mutations, there are no members in it with just HVR1 results.
K1a1: This only includes two members designated by FGS tests plus one exact HVR match. It’s usually very difficult to predict K1a1 and its lower subclades from HVR-only results.
K1a10: Due to the alphanumeric method by which the subgroups are sorted by FTDNA, this subgroup follows K1a1, although it is not part of that group. K1a10 is a large provisional subclade solely defined by HVR1 mutation 16048A. All members have 195C, so they are relatively close to the K1a9 subclade and the “K1a + 195C” subgroup. Members with FGS results are temporarily shown just as K1a. There is some chance that on a future K tree this group will have some other designation, but I have published a description of it and placed it on the K tree in an article in the Fall 2007 issue of the Journal of Genetic Genealogy.
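The ordering described above is what a plain character-by-character sort produces. The short Python sketch below assumes that FTDNA's sort behaves like ordinary string sorting, which is consistent with the order of the subgroups in this report but is not confirmed by FTDNA.

```python
# A few subgroup names in the order one might expect "naturally".
subgroups = ["K1a1", "K1a1a", "K1a2", "K1a9", "K1a10", "K1a11"]

# Plain string sorting compares one character at a time, and the digits
# "0" and "1" sort before the letter "a", so K1a10 and K1a11 land between
# K1a1 and K1a1a.
ordered = sorted(subgroups)
print(ordered)
```

In this ordering K1a10 and K1a11 follow K1a1 and precede K1a1a, even though neither belongs to the K1a1 branch.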
K1a11: This is another provisional subclade, easily predicted by HVR2 mutation 16T. All members also have 16129A, 150T and 199C; but those are also found in other subclades. This subclade may not be predicted by HVR1-only results. Members with FGS results also have specific defining coding-region mutations. As with the provisional K1a10, members are now labeled as just K1a. K1a11 was also described in my JoGG article mentioned above.
K1a1a: This subgroup only includes those assigned to the subclade by FGS results. There are no HVR mutations with predictive value.
K1a1b & K1a1b1: This subgroup includes those assigned to the two “mother-daughter” subclades by FGS results. There are no HVR mutations which would separate them, but together they can usually be predicted by some HVR matches or by 114T without K1a1b1a’s 16234T.
K1a1b1a: This largest Ashkenazi subclade includes most of those with HVR1 mutation 16234T. That mutation is not 100% predictive, but close to it. The addition of 16223T and/or 114T makes the prediction even more certain.
K1a2: This small subgroup only includes those assigned by FGS results and one exact match.
K1a3, K1a3a & K1a3a1: This subgroup includes three “mother-daughter-granddaughter” subclades which are difficult to predict from HVR results. Also included are two HVR-only sequences which are exact matches and connected by geographical origin.
K1a4 & K1a4a: This subgroup includes two “mother-daughter” subclades. Almost all members have been designated by FGS results. A few HVR-only sequences are included due to specific mutations which are predictive in context.
K1a4a1: This subclade is the “granddaughter” to the above subgroup, but is much larger. Most sequences may not be predicted from HVR-only results. However, all those with 16245T have turned up here when FGS-tested. Mutation 16261T is often predictive. Several of these have 195C and thus might have been confused with those in “K1a + 195C” before FGS testing.
K1a9: This second-largest Ashkenazi subclade always has 16524G and so is usually easily predicted even from HVR1-only results. A few with this mutation have turned up in other subclades, but usually other mutations can eliminate the confusion. Since all members have 497T and 195C and no 309 or 524 insertions, membership from high-resolution results can be predicted with almost 100% certainty. (Note that there are no K Project members in subclades K1a5, K1a6, K1a7 or K1a8.)
K1b1a: This subclade may usually be predicted by HVR1 mutation 16319A, although one K2a member also has it. The addition of 16463G makes the prediction almost certain.
K1b2: This subclade has no predictive HVR1 mutations, but may usually be predicted by the combination of HVR2 mutations 146C and 195C. However, that combination shows up occasionally in other subclades; context is important. The addition of 152C may create confusion with subclade K2a, so several members with all three of those mutations have been left unassigned.
K1c1 & K1c1b: Another subgroup with “mother-daughter” subclades which can’t be separated by HVR-only results. K1c is defined by 146C, 152C and especially the 498- deletion, but so far no members have been assigned to just K1c by FGS tests. Therefore, members with 498- and without 16320T are put into this subgroup. (Since writing this, a K1c1 with 16320T has been confirmed. “Never say never” is a good rule when predicting mtDNA subclades.)
K1c2: This subgroup is easily predicted by high-resolution HVR results with 16320T and 498-. (But see above.) 16320T is also rarely found in other subclades, but all with it are grouped here until proven otherwise.
K2a: This subclade can usually be predicted by HVR2 mutations 146C and 152C and the lack of certain others. It should be noted that 146C and 152C are very recurrent and may be subject to back mutations.
K2a2a: This smallest Ashkenazi subclade – the “granddaughter” of K2a – is easily determined by HVR2 mutation 512C. To my knowledge, that mutation is found in no other subclade or even haplogroup.
K2a3: This small subclade is composed of three members assigned by FGS tests and one exact match with unusual mutations. There is usually no way to differentiate it from its parent K2a with HVR mutations.
K2a4: At present there is a single example of this subclade, which may only be determined by an FGS test.
K2b: This subclade is somewhat smaller than its K2a “sibling.” It can usually be determined by the addition of 146C to the basic six HVR K mutations. One member with the additional 152C, which would usually denote K2a, has been determined by an FGS test to be in K2b. Again, this is because 152C is a recurrent mutation. In context, the presence of 16270T, and especially with the addition of 16222T, allows some in K2b to be predicted from HVR1-only results.
Unassigned Members: This section is for all those whose HVR-only results do not permit a prediction of a subclade. New K Project members are placed here automatically upon joining. The addition of HVR2 results from an mtDNARefine test is usually sufficient to move a member to one of the above subgroups. A few with HVR2 are not assigned, usually because their results could fall in either a K1b or a K2 subgroup. Other sequences don’t have predictive HVR mutations for any subclade.
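Several of the prediction rules above can be written as a simple ordered checklist. The Python sketch below is a deliberately simplified illustration, not the method actually used to assign members: the function name predict_subgroup is hypothetical, the order of the checks is an assumption, the two HVR2 mutations for "K1a + 195C" are assumed to be 497T and 195C, and the "context" and exact-match judgments described above are ignored. Official subclades still come only from FGS tests.

```python
BASIC_K = frozenset({"16224C", "16311C", "16519C", "73G", "263G", "315.1C"})

def predict_subgroup(mutations):
    """Rough, order-dependent guess at a subgroup from HVR mutations."""
    m = set(mutations)
    if "512C" in m:                      # smallest Ashkenazi subclade
        return "K2a2a"
    if "16524G" in m:                    # key mutation for K1a9
        return "K1a9"
    if "16234T" in m:                    # close to, but not 100%, predictive
        return "K1a1b1a"
    if "16048A" in m:                    # provisional subclade
        return "K1a10"
    if "498-" in m:                      # K1c deletion; checked before K2a
        return "K1c2" if "16320T" in m else "K1c1 & K1c1b"
    if "16319A" in m:
        return "K1b1a"
    if "497T" in m:                      # defining mutation for K1a
        return "K1a + 195C" if "195C" in m else "K1a – Predicted"
    if {"146C", "152C", "195C"} <= m:    # ambiguous between K1b2 and K2a
        return "Unassigned"
    if {"146C", "195C"} <= m:
        return "K1b2"
    if {"146C", "152C"} <= m:
        return "K2a"
    if "146C" in m:
        return "K2b"
    return "Unassigned"

print(predict_subgroup(BASIC_K | {"146C", "152C"}))
print(predict_subgroup(BASIC_K | {"146C", "152C", "498-", "16320T"}))
print(predict_subgroup(BASIC_K))
```

Because mutations such as 146C and 152C are recurrent, any checklist of this kind will misplace some sequences, which is why the subgroups assigned to members may change as FGS results arrive.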
The News tab on the K Project website should be consulted for recent developments concerning K. Those tested as being in haplogroup K may join the Project by clicking on the blue Join button on their FTDNA personal page, then proceeding through four pages before clicking on yet another Join button.
© 2008 William R. Hurst
Administrator, mtDNA Haplogroup K Project