Finally, when all other efforts had come to naught, it was the group that acquiesced. Tomi reluctantly ordered some radioactive phosphorus, labeled some ordinary human DNA we used for test purposes, and took it through the steps of preparing a 454 sequencing library. The results were stunning. He showed that in each of the first three major steps in the preparation, between 15 and 60 percent of the DNA was lost—a level not entirely unexpected in a biochemical separation. But in the last step, where the complementary DNA strands were separated with a strong alkaline solution, more than 95 percent of the input DNA was lost! Others who used this separation method with ordinary modern DNA had not noticed its inefficiency, because they had so much DNA that these enormous losses didn’t matter to them. For our ancient work, though, they were catastrophic. Once the problem was identified, a simple remedy was devised. Alkaline solutions are not the only way to separate DNA strands; they also separate when they are heated. So Tomi tried heating and found from 10 to 250 times more radioactivity in the final DNA preparation! This was a great, indeed game-changing, advance.
Most labs discard side fractions as by-products. Fortunately, we had saved all of ours from our previous experiments. For years I had insisted on doing so, just in case something came along that would make them useful. This was easily one of my least popular ideas and caused many freezers to be filled with frozen side fractions that no one thought would ever be used. But thankfully in this case the crazy idea of the professor had been adhered to by the group. So now Tomi could simply heat the side fractions from earlier library preparations from the Vindija bones and retrieve additional, relatively copious amounts of Neanderthal DNA without even having to do any more extractions. He also optimized other steps in the library preparation. These changes resulted in a protocol several hundred times more efficient in turning the extracted DNA into a library ready for sequencing.
{52}
Following consultation with our Croatian partners, we dedicated three Vindija bones—Vi-33.16 along with two new bones, Vi-33.25 and Vi-33.26—to the project. All seemed to be fragments of long bones that had apparently been crushed to get at the marrow (see Figure 12.1). Thanks to Tomi’s advance we could now in principle produce libraries that contained 3 billion nucleotides of Neanderthal DNA from just these three bones. But the libraries would still contain at least 97 percent bacterial DNA, so the people in Branford would need to do between four thousand and six thousand runs on their sequencing machines to arrive at 3 billion base pairs of Neanderthal DNA. This was far more than we could ever imagine convincing Michael Egholm to do.
It seemed to me we were still stuck, until someone suggested that perhaps we could find pockets in our three bones where they contained much less bacterial DNA and therefore, relatively speaking, more Neanderthal DNA. Now and again over the years we had indeed seen indications that some parts of a bone might contain higher amounts of bacterial DNA than others, perhaps because bacteria had found growth conditions better in one part of the bone, and therefore multiplied more there, than in other parts. So, fueled by this hope, Johannes tried to systematically identify the best regions to sample. He drilled the bones until they looked first like flutes and then like Swiss cheese. He did indeed find a 10-fold difference in the percentage of Neanderthal DNA in regions just a centimeter or two apart, but the best regions still contained no more than 4 percent Neanderthal DNA!
We came back to this problem again and again in our Friday meetings. To me, these meetings were absorbing social and intellectual experiences: graduate students and postdocs know that their careers depend on the results they achieve and the papers they publish, so there is always a certain amount of jockeying for opportunity to do the key experiments and to avoid doing those that may serve the group’s aim but will probably not result in prominent authorship on an important publication. I had become used to the idea that budding scientists were largely driven by self-interest, and I recognized that my function was to strike a balance between what was good for someone’s career and what was necessary for a project, weighing individual abilities in this regard. As the Neanderthal crisis loomed over the group, however, I was amazed to see how readily the self-centered dynamic gave way to a more group-centered one. The group was functioning as a unit, with everyone eagerly volunteering for thankless and laborious chores that would advance the project regardless of whether such chores would bring any personal glory. There was a strong sense of common purpose in what all felt was a historic endeavor. I felt we had the perfect team (see Figure 13.1). In my more sentimental moments, I felt a love for each and every person around the table. This made the feeling that we’d achieved no progress all the more bitter.
During the spring of 2007, the Friday meetings continued to show our cohesive group from its best side. People threw out one crazy idea after another for increasing the proportion of Neanderthal DNA or finding microscopic pockets in the bones where preservation might be better. It was almost impossible to say who came up with which idea, because the ideas were generated in real-time, during continuous discussions to which everyone contributed. We started talking about ways to separate the bacterial DNA in our extracts from the endogenous Neanderthal DNA: maybe the bacterial DNA differed from the Neanderthal DNA in some feature that we could exploit for this purpose, perhaps a difference in the size of the bacterial and the Neanderthal DNA fragments? Alas, no! The size of bacterial DNA fragments in the bones was largely indistinguishable from that of the Neanderthal DNA.
Figure 13.1. The Neanderthal genome group in Leipzig, 2010. From the left: Adrian Briggs, Hernan Burbano, Matthias Meyer, Anja Heinze, Jesse Dabney, Kay Prüfer, me, a reconstructed Neanderthal skeleton, Janet Kelso, Tomi Maricic, Qiaomei Fu, Udo Stenzel, Johannes Krause, Martin Kircher. Photo: MPI-EVA.
Again and again we asked what differences there might be between bacterial and mammalian DNA. And then it struck me: methylation! Methyl groups are little chemical modifications that are common in bacterial DNA, particularly on A nucleotides. In the DNA of mammals, however, C nucleotides are methylated. Perhaps we could use antibodies to methylated A’s to bind and remove bacterial DNA from the extracts. Antibodies are proteins that are produced by immune cells when they detect substances foreign to the body—for example, DNA from bacteria or viruses. The antibodies then circulate in the blood, bind with great strength to the foreign substances wherever they encounter them, and help eliminate them. Because of their ability to specifically bind to substances to which immune cells have been exposed, antibodies can be used as powerful tools in the laboratory. For example, if DNA containing methylated A nucleotides is injected into mice, their immune cells will recognize that the methylated A’s are foreign and make antibodies to them. These antibodies can then be purified from the blood of the mice and used in the laboratory, and I thought we should make such antibodies and then try to use them to bind and eliminate bacterial DNA in our DNA extracts.
A quick literature search revealed that researchers at a company, New England Biolabs outside Boston, had already produced antibodies to methylated A’s. I wrote to Tom Evans, an excellent scientist interested in DNA repair who I knew there, and he graciously sent us a supply. Now I wanted someone in the group to use them to bind to the bacterial DNA and remove it from the extracts. I thought that doing so would leave us with extracts in which the percentage of Neanderthal DNA was much higher. I considered this an ingenious plan. But when I presented it in our weekly meeting, people seemed skeptical—again, it seemed to me, because of their unfamiliarity with the technique. This time, bolstered by the fact that I had been right about the radioactivity, I more or less insisted. Adrian Briggs took it on. He spent months trying to get the antibodies to bind to the bacterial DNA and separate it from nonbacterial DNA. He tried all kinds of modifications of the technique. It never worked and we still don’t know why. For quite some time, I got to hear facetious comments about my wonderful antibody idea.
What else could we try in order to eliminate the bacterial DNA? One idea was to identify sequence motifs found frequently among our bacterial sequences. Perhaps we could then use synthetic DNA strands to specifically bind and remove the bacterial DNA in a way similar to what I had imagined for the antibodies. Kay Pruefer, a soft-spoken computer science student who, after coming to our lab, had taught himself more genome biology than most biology students know, looked for potentially useful sequence motifs. He found that some combinations of just two to six nucleotides—such as CGCG, CCGG, CCCGGG, and so on—were present much more often in the microbial DNA than in the Neanderthal DNA. When he presented this observation in a meeting, it was immediately clear to me what was going on. In fact, I should have thought of this earlier! Every molecular biology textbook will tell you that the nucleotide combination of C followed by G is relatively infrequent in the genomes of mammals. The reason is that methylation in mammals occurs to C nucleotides only when they are followed by G nucleotides. Such methylated C’s may be chemically modified and misread by DNA polymerases so that they mutate to T’s. As a result, over millions of years, mammalian genomes have slowly but steadily been depleted of CG motifs. In bacteria, this methylation of C’s does not occur, or is rare, so CG motifs are more common.
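The kind of comparison Kay made can be sketched in a few lines of Python. This is only an illustration, not his actual analysis: the function name `motif_rate` and the toy sequences are invented here, and real data would be millions of sequenced fragments already sorted into microbial and Neanderthal sets.

```python
def motif_rate(seqs, motif):
    """Occurrences of `motif` per 1,000 nucleotides, counting overlapping hits."""
    total = sum(len(s) for s in seqs)
    hits = 0
    for s in seqs:
        start = s.find(motif)
        while start != -1:
            hits += 1
            # Step forward by one so that overlapping matches are counted too.
            start = s.find(motif, start + 1)
    return 1000.0 * hits / total if total else 0.0


# Invented toy fragments, for illustration only.
microbial_like = ["GGCGCGATTCCGGA", "TTCCCGGGAAT"]
mammal_like = ["ATTAGCATTAGGCATTA", "TGCATTTGCAAAGT"]

for motif in ("CGCG", "CCGG", "CCCGGG"):
    print(motif, motif_rate(microbial_like, motif), motif_rate(mammal_like, motif))
```

A motif whose rate is much higher in the microbial set than in the Neanderthal set is a candidate for the approach described next.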
How could we use this information? The answer to that question, too, was immediately obvious to us. Bacteria make enzymes, so-called restriction enzymes, that cut within or nearby specific DNA sequence motifs (such as CGCG or CCCGGG). If we incubated the Neanderthal libraries with a collection of such enzymes, they would chop up many of the bacterial sequences so that they could not be sequenced but leave most of the Neanderthal sequences intact. We would thus tip the ratio of Neanderthal to bacterial DNA in our favor. Based on his analyses of the sequences, Kay suggested cocktails of up to eight restriction enzymes that would be particularly effective. We immediately treated one of our libraries with this mix of enzymes and sequenced it. Out of our sequencing machine came about 20 percent Neanderthal DNA instead of 4 percent! This meant that we needed only about seven hundred runs on the machines in Branford to reach our goal—a number within the realm of possibility. This small trick was what made the impossible possible. The only drawback was that the enzyme treatment would cause us to lose some Neanderthal sequences—the ones that carried particular runs of C’s and G’s—but we could pick up those sequences by using different mixtures of enzymes in different runs and by doing some runs without any enzymes. When we presented our restriction enzyme trick to Michael Egholm at 454, he called it brilliant. For the first time, we knew that we could in principle reach our goal!
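The logic of the enzyme trick, and the arithmetic behind the drop in sequencing runs, can be sketched as follows. Everything here is a simplified assumption: the motif list contains only the examples named above (the real cocktails had up to eight enzymes, which are not listed), the toy reads are invented, any fragment containing a motif is treated as lost, and the number of runs is assumed to scale inversely with the fraction of Neanderthal DNA in the library.

```python
# Motifs named in the text; the composition of the real enzyme cocktails is not given.
MOTIFS = ("CGCG", "CCGG", "CCCGGG")


def is_cut(fragment, motifs=MOTIFS):
    """True if the fragment contains any recognition motif (treated as cut and lost)."""
    return any(m in fragment for m in motifs)


def digest(reads, motifs=MOTIFS):
    """Keep only the (label, sequence) reads that no enzyme in the cocktail would cut."""
    return [(label, seq) for label, seq in reads if not is_cut(seq, motifs)]


def endogenous_fraction(reads):
    """Share of reads labeled 'neanderthal' in a list of (label, sequence) pairs."""
    if not reads:
        return 0.0
    return sum(1 for label, _ in reads if label == "neanderthal") / len(reads)


def runs_needed(known_runs, known_fraction, new_fraction):
    """Scale a run count, assuming runs are inversely proportional to the endogenous fraction."""
    return known_runs * known_fraction / new_fraction


# Invented toy library, for illustration only.
toy = [
    ("neanderthal", "ATTAGCATTAGGCATTA"),
    ("bacterial", "GGCGCGATTCCGGA"),
    ("bacterial", "TTCCCGGGAAT"),
    ("bacterial", "ATGCATGCAT"),
    ("bacterial", "ACCGGTTAGC"),
]
print(endogenous_fraction(toy), endogenous_fraction(digest(toy)))

# Working back from about 700 runs at 20 percent to a library with 3 percent Neanderthal DNA.
print(round(runs_needed(700, 0.20, 0.03)))
```

Under the inverse-scaling assumption, a library that is 97 percent bacterial would need several thousand runs, which is in line with the four to six thousand quoted earlier.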
While all this was going on, a paper appeared by Jeffrey Wall, a young and talented population geneticist in San Francisco whom I had met on several occasions. It compared the 750,000 nucleotides that our group had determined by 454 sequencing from the Vi-33.16 bone and published in Nature with the 36,000 nucleotides that Eddy Rubin had determined by bacterial cloning from our extracts of the same bone and published in Science. Wall and his co-author, Sung Kim, pointed out several differences in the data sets, many of which we had already seen and discussed extensively when the two papers were in review. They suggested that there could be several possible problems with the 454 data set but favored the interpretation that there were huge amounts of present-day human contamination in our library. In particular, they suggested that between 70 and 80 percent of what we had thought was Neanderthal DNA was instead modern human DNA.
{53}
This was troubling. We were aware that we might have some contamination in both the Nature and Science data sets, as we had sent the extracts to laboratories that did not work under clean-room conditions. We were also aware that if there was a difference in levels of contamination, the level would be higher in the Nature data set produced at 454. We were sure, however, that any contamination levels could not be 70 to 80 percent, because Wall’s analysis relied on assumptions, such as similar GC content in short and long fragments, that we knew were not true.
In an attempt to clarify these issues, we immediately asked Nature to publish a short note, in which we pointed out that several features differed between the sequences determined by the 454 technique and by bacterial cloning, and that some of the features were likely to affect the analysis. We also wanted to mention that our additional sequencing of the library had indicated very little contamination based on mtDNA. But we further realized that some level of contamination had probably been introduced into the library at 454, perhaps from a library of Jim Watson’s DNA that it turned out 454 had sequenced at the same time as our Neanderthal library. So in the note we conceded that “contamination levels above that estimated by the mtDNA assay may be present.” But by how much was impossible to tell. We pointed the readers both to Wall’s paper and to the paper in which we described the use of tags in the library production that now made any contamination outside the clean room impossible.
{54}
We also posted a note in the publicly available DNA sequence database, so that any potential users would know of the concerns with these data. But, to my annoyance, after sending our note to reviewers, Nature decided not to publish it.