New National Geographic Society Genographic Project Paper on mtDNA


There is a new and very important scientific paper about mtDNA from the National Geographic Society Genographic Project. The primary author is Dr. Doron Behar of FTDNA, who last year published the K chart with the subclade definitions we use. There is a copy of the PDF version in FTDNA's Library. But if you want to see the full database and other tables, go here. The database and tables are under Supporting Information in the right-hand column. You can also get the paper and supplements at the Genographic site.


As of today, 182 of our 448 FTDNA members transferred their results from the Genographic Project (those whose kit numbers start with "N"). All of you, plus those of us who copied our FTDNA results to the Genographic Project, should be listed in the database. Every member can check the database to see how many HVR1 matches he or she has there. I have one match. Of course, there is no geographical information or any way to contact those matches.


The paper is full of good information about mtDNA in general; some of it is basic, and some is over my head. I'll let you read that. I will point out one difference you will see in the mutations in the database: the use of "N" for heteroplasmies. As an example, on our K Project website many of us have mutation 16093C. Those who don't are assumed to have the CRS version with base T. However, since each cell has multiple copies of mtDNA, there are often some copies of each variant. FTDNA apparently simply lists the variant found in the majority of copies, but the database with this paper shows 16093N if some of each variant is detected. (In my humble opinion, if the technology allowed the detection of even one copy of the minority variant, we would all have 16093N, everybody would be a perfect match with everybody else, and there would be no point in taking any of the tests! So FTDNA is using the preferred method for our purposes.)
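For those who like to see such things spelled out, here is a minimal Python sketch of why "N" calls make matching less useful. It is only an illustration: the function names, the mutation lists and the rule that "N" matches anything are my own assumptions, not the method used by FTDNA or by the paper's authors.

```python
import re

# A mutation is written as a position plus a base, e.g. "16093C" or "16093N".
MUTATION = re.compile(r"^(\d+)([ACGTN])$")


def parse(mutations):
    """Turn ['16093C', '16224C'] into {16093: 'C', 16224: 'C'}."""
    result = {}
    for m in mutations:
        match = MUTATION.match(m)
        if match is None:
            raise ValueError(f"unrecognised mutation: {m}")
        result[int(match.group(1))] = match.group(2)
    return result


def is_match(a, b, n_is_wildcard=True):
    """Compare two HVR1 mutation lists position by position.

    A position absent from a list is taken to carry the CRS base.
    When n_is_wildcard is True (an assumption for this sketch), an 'N'
    call is compatible with any base, including the CRS base.
    """
    pa, pb = parse(a), parse(b)
    for pos in set(pa) | set(pb):
        base_a = pa.get(pos)  # None stands for the CRS base
        base_b = pb.get(pos)
        if base_a == base_b:
            continue
        if n_is_wildcard and "N" in (base_a, base_b):
            continue
        return False
    return True


# Hypothetical haplotypes, for illustration only.
majority_c = ["16093C", "16224C", "16311C"]
crs_at_16093 = ["16224C", "16311C"]
heteroplasmic = ["16093N", "16224C", "16311C"]

print(is_match(majority_c, crs_at_16093))     # the two differ at 16093
print(is_match(heteroplasmic, majority_c))    # N is compatible with C
print(is_match(heteroplasmic, crs_at_16093))  # N is compatible with the CRS base
```

With the wildcard rule, the heteroplasmic sequence matches both of the others even though those two do not match each other, which is the loss of resolution described above.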


I did pick out a few items relating specifically to K. Figure 4, which shows how K fits into the entire mtDNA tree, shows us as 8.12% of the total. Table 2 gives information about the database divided four different ways with us ranging from 8.04% to 8.54%. The 8.26% K in MitoSearch is in this range; see here. Both databases are concentrated in U.S. and Western Europe samples.


On p. 1087 of the PDF version, the paper mentions that K is usually defined by 16224C and 16311C, but that this combination has also been found in other haplogroups. There are three such sequences in haplogroup H in the database, starting at serial 4095; but those do not have 16519C, which almost all of us have. I think we can still differentiate K by just looking at HVR1 mutations. Also mentioned are some K haplotypes without 16224C or 16311C. We have one Project member without 16224C. There are a few more of those in MitoSearch, along with ones missing 16311C. No big deal.
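The rule of thumb in the paragraph above can be written as a short Python sketch. Please treat it as a rough screening aid under my own assumptions: the function name and the wording of the labels are hypothetical, and a real haplogroup assignment needs more than three HVR1 positions.

```python
# The two HVR1 mutations that usually define haplogroup K.
K_MOTIF = {"16224C", "16311C"}


def classify_hvr1(mutations):
    """Label an HVR1 mutation list using the 16224C/16311C/16519C rule of thumb."""
    found = set(mutations)
    missing = K_MOTIF - found
    if not missing:
        if "16519C" in found:
            return "K motif with 16519C"
        # Both defining mutations but no 16519C: the pattern seen in the
        # three haplogroup H sequences mentioned above.
        return "K motif without 16519C (check for another haplogroup)"
    if len(missing) == 1:
        # A few K haplotypes lack one of the two defining mutations.
        return "partial K motif, missing " + next(iter(missing))
    return "no K motif"


# Hypothetical mutation lists, for illustration only.
print(classify_hvr1(["16224C", "16311C", "16519C"]))
print(classify_hvr1(["16224C", "16311C"]))
print(classify_hvr1(["16311C", "16519C"]))
```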


On the same page there is a mention that most of the examples of 16223T below macro-haplogroup R are in our K1a1b1a subclade.


I created a pie chart showing the percentages of a few subclades of K in the paper's database. It is not perfectly accurate, as it is based on single HVR1 mutations which occasionally show up in other subclades. However, each of the subclade percentages is within 1% of those from the pie chart I did for MitoSearch a few months ago. You will see that fully 68% of the sequences can't be placed in a subclade based on just HVR1 mutations. Most require HVR2 mutations, which are not tested by the Genographic Project. Now you see why I recommend all our members upgrade with the mtDNARefine test if they didn't start with the mtDNAPlus test.


Bill Hurst

Administrator, mtDNA Haplogroup K Project