I don’t call Herbert often, so I think he realized that this was a matter of some import and urgency. When I got him on the line, I described our calculations on the feasibility of sequencing the Neanderthal genome and the cost, and I asked if he had any advice on how one might raise that much money in Europe. He said he would think about it and get back to me in a few days. I flew back to Leipzig the next day, torn between hope and despair. Perhaps we could find a rich benefactor, but how do you find one of those?
Two days after my return, true to his word, Herbert called me. The Max Planck Society, he said, had recently set up a Presidential Innovation Fund to support extraordinary research projects. He had discussed our project with the society’s president, and the society was ready—in principle—to support us with the requested funds, distributed over three years. They had even reserved the money, pending a written proposal, which would need to be reviewed by experts in the field. I was totally taken aback; I don’t recall even thanking him properly before hanging up. This money made all the difference in the world! I ran from my office into the lab and babbled the news to the first people I met. Then I immediately sat down and started drafting a proposal describing the results and calculations that had convinced us we could sequence the Neanderthal genome within three years, given sufficient resources.
At the proposal’s conclusion, I had to present a financial plan. When I started working it up, something extremely embarrassing dawned on me. I had called Herbert from the United States and said we needed “5 million” for the project, thinking in terms of US dollars. Herbert, in Europe, must have thought I meant 5 million euros. He may even have said that the Max Planck Society had reserved “5 million euros” for our project, but I had been too excited to register it. At the exchange rate of the time, that amounted to US$6 million. What to do? Perhaps I could quietly increase the budget to accommodate the additional 20 percent in funds, but that would be disingenuous and might even be noticed once we signed a contract with 454 Life Sciences. I called Herbert and, with considerable embarrassment, explained the situation. He laughed. Then he asked whether we might not have extra costs in Leipzig, above what we were to pay 454 Life Sciences for sequencing. Of course we would. We would have to extract DNA from many fossils to find good ones, and test them all by making sequencing runs from them ourselves. So we needed to buy our own sequencing machine from 454 for testing all the extracts, and we needed reagents to run it. With the difference the exchange rate made, we could really make the project fly. I was elated and wrote a plan that included the work that would be done at our institute in Leipzig.
Meanwhile, Eddy Rubin’s group in Berkeley had made a bacterial library of the entire Neanderthal extract we had sent them. Jim Noonan, Eddy’s postdoc, had sequenced every drop. What they had retrieved amounted to a bit over sixty-five thousand base pairs. In Branford, they had used about 7 percent of the extract we had sent them and produced about a million base pairs. So, just as Adrian had predicted, the direct-sequencing approach was about two hundred times more efficient in generating DNA sequences from an extract. Eddy insisted nonetheless that his method could be more efficient, and that we should continue to send extracts to him. This was a fundamental disagreement. I realized with some disquiet that I could no longer in good conscience send extracts to Berkeley, when we could generate so much more data from each extract in Branford. But I put the decision off, thinking that it would become obvious to Eddy that the bacterial cloning was inefficient once we wrote up a manuscript describing the results of the two different approaches.
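The comparison can be checked with a few lines of arithmetic. The sketch below uses only the rounded figures quoted above: about 65,000 base pairs from the whole extract in Berkeley, and about 1 million base pairs from 7 percent of the extract in Branford. It assumes, purely for illustration, that the two extracts were comparable in size and that yield scales linearly with the amount of extract used. The function name is hypothetical.

```python
def yield_per_full_extract(base_pairs: float, fraction_used: float) -> float:
    """Scale an observed yield up to a whole extract (assumes linear scaling)."""
    if not 0 < fraction_used <= 1:
        raise ValueError("fraction_used must be in (0, 1]")
    return base_pairs / fraction_used


# Rounded figures from the text; equal extract sizes are an assumption.
cloning = yield_per_full_extract(65_000, 1.0)     # bacterial library, whole extract
direct = yield_per_full_extract(1_000_000, 0.07)  # direct sequencing, 7% of extract

ratio = direct / cloning
print(f"direct sequencing yields roughly {ratio:.0f}x more per extract")
```

With these rounded inputs the ratio comes out at roughly 220, in line with the "about two hundred times" quoted above.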
However, by this point it was impossible to conceive of a way to write just one paper, given the use of two completely different methods, the tremendous difference in the amounts of data generated, and the disagreement with Eddy about the viability of the bacterial-library approach. So we decided to write two papers. One was to be written by Eddy with us as co-authors, the other by us and Michael Egholm, Jonathan Rothberg, and the others at 454. Eddy’s paper stated: “The low coverage in library NE1 is more likely due to the quality of this particular library rather than being a general feature of ancient DNA,” suggesting that if one assembled more libraries, better results would be achieved. Given that the earlier cave-bear libraries had been just as inefficient, I disagreed with the assessment, but we stayed civil. Eddy submitted the paper in June to Science, and it was accepted in August. Because we had much more data to analyze for the 454 paper, we couldn’t submit our paper until July to Nature. Eddy graciously arranged with Science to delay publication of the cloning paper until the paper with 454 Life Sciences had been reviewed and accepted in Nature, so that the two papers could appear in the same week.
While this was going on, we began to prepare for what we hoped would be the production of large amounts of Neanderthal sequences. The first thing I did was to arrange production of 454 sequencing libraries in our clean room in Leipzig so that the precious, contamination-prone DNA extracts would not have to leave our laboratory. I also used a chunk of the new money to order a 454 sequencing machine so that we could test the libraries. Then Michael Egholm and I worked out a plan. We would make DNA extracts from bones, produce 454 sequencing libraries in our clean room, and use our new 454 sequencing machine to test the libraries. When we identified promising libraries, we would send them to Branford for production sequencing. The sequencing would be done in stages, and we would pay in installments once a certain amount of Neanderthal nucleotides had been sequenced. The latter was my suggestion, and I was amazed that 454 agreed to it, given that our earlier work together had shown that the best library so far had contained only 4 percent Neanderthal DNA and 96 percent assorted unwanted DNA of bacterial, fungal, and unknown origin. We did not yet know what percentage of Neanderthal DNA would be in the libraries we would produce. If it turned out to be 1 percent instead of 4 percent, then 454 would have to sequence four times as much to get its money, since the contract stipulated the number of Neanderthal nucleotides sequenced, not the total number of nucleotides (which would include all the bacterial ones). Neither the scientists at 454 nor their attorneys who looked at the contract before it was signed appeared to take any notice of this. In a sense, it didn’t matter, since there was a clause that allowed either party to get out of the collaboration at any time. We were obviously not going to be able to force 454 to sequence forever against their will. But it still seemed a much better contract than one that stipulated that the company would sequence a certain amount of raw nucleotides for us, irrespective of whether these were microbial or Neanderthal in origin.
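The effect of this clause is easy to work through. The sketch below is a minimal illustration, not a term of the actual contract: the milestone size is a hypothetical placeholder, and only the 4 percent and 1 percent fractions come from the text. It computes how many raw nucleotides must be sequenced to reach a fixed number of Neanderthal nucleotides.

```python
import math


def raw_nucleotides_needed(neanderthal_target: float, neanderthal_fraction: float) -> float:
    """Raw nucleotides to sequence so that the Neanderthal share reaches the target."""
    if not 0 < neanderthal_fraction <= 1:
        raise ValueError("neanderthal_fraction must be in (0, 1]")
    return neanderthal_target / neanderthal_fraction


MILESTONE = 1_000_000  # hypothetical installment milestone, for illustration only

at_four_percent = raw_nucleotides_needed(MILESTONE, 0.04)
at_one_percent = raw_nucleotides_needed(MILESTONE, 0.01)

# The ratio does not depend on the milestone size chosen above.
burden_ratio = at_one_percent / at_four_percent
print(f"raw sequencing needed at 1% vs 4%: {burden_ratio:.1f}x")
```

Whatever the milestone, a library with a quarter of the Neanderthal content requires four times the raw sequencing, which is the risk the contract shifted onto 454.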
I felt very good about the collaboration with 454. We complemented each other’s strengths excellently, and the people at the company were fun and easy to talk to. However, one difference between us was that 454 was under great pressure to establish itself in an emerging market for high-throughput sequencing technologies that was clearly going to become very competitive. Already, two other big companies had announced their intention to start selling high-throughput sequencing machines. 454 therefore wanted positive publicity about their involvement in the Neanderthal project, and they wanted this publicity not in two or three years, when the Neanderthal genome would presumably be sequenced and published, but as soon as possible. Just as Michael Egholm took our concerns and priorities into account, I wanted to take their priorities seriously. So when the contract was signed with 454, we allowed them to arrange a press conference in our institute in Leipzig on July 20, 2006, shortly after we had submitted our joint paper to Nature. Michael and another senior executive from 454 flew in for the event. We also invited Ralf Schmitz, the curator of the Neanderthal type specimen who had given us samples from the Bonn museum in 1997. He brought along a copy of the Neanderthal bone from which we had determined the first Neanderthal mtDNA sequences. We wrote a press release explaining that we were combining the methods for ancient DNA analysis that our group had developed over many years of painstaking work with 454 Life Sciences’ novel high-throughput sequencing technology to analyze the Neanderthal genome. We also mentioned that, by coincidence, the announcement came almost exactly 150 years to the day after the first Neanderthal fossil was discovered in the Neander Valley.
The press conference was an electrifying event. The room was full of journalists, and media from across the globe followed it via the Internet. We declared that we would determine about 3 billion Neanderthal nucleotides within two years. It was wonderful to contemplate that what I had started secretly in the lab in Uppsala more than twenty years earlier, afraid that my PhD supervisor would find out what I was doing, had developed into this. It was a heady time.
It was also a time of great scientific and emotional ups and downs. About a month after the press conference came a definitive down. The two papers led by Eddy Rubin’s and our group were not yet out, but we had already shared our 454 Neanderthal data with Jonathan Pritchard, a young and brilliant population geneticist at the University of Chicago who had helped Eddy analyze his smaller data set of cloned Neanderthal DNA fragments. We received an e-mail from two postdocs in Pritchard’s group, Graham Coop and Sridhar Kudaravalli. They were worried about patterns they saw in the 454 data: in particular, there were higher numbers of differences from the human reference genome in the shorter DNA fragments than in the longer DNA fragments. Ed Green in our group quickly confirmed that they were right. This was worrying. It could mean that some of the longer fragments were not from the Neanderthal genome but represented modern human contamination. I e-mailed Eddy, telling him that we saw some worrying patterns in the 454 test data. We agreed to send our data to Eddy’s group in exchange for their data. After the exchange of data, Jim Noonan in Eddy’s group quickly e-mailed back and said that he saw what we and the Chicago postdocs had already seen in the 454 data.
It seemed that we might have to rewrite or withdraw our Nature paper, which was already in press, and I e-mailed Eddy, saying that we would try to figure out what was going on as fast as we could in order not to hold up his paper. Back when I was a postdoc in Allan Wilson’s lab, we had once withdrawn a paper that Nature had already accepted because we had found that we had made a mistake in the analysis that changed the main conclusions we presented. I worried that we would have to do this again.
There was now frantic activity in our group. It was not unreasonable to assume that the patterns Jonathan’s group saw were due to some level of contamination, but it was not straightforward to come up with an estimate of how much contamination there might be. It would have been an error to simply assume contamination was the problem, however. We were acutely aware that we did not understand many aspects of how the short, damaged ancient DNA sequences behaved in comparison with the human reference genome. Perhaps factors other than contamination were at play? Unfortunately, we needed to act fast, as our paper was already in press and Eddy was eager to publish his paper.
Ed had noticed that the shorter Neanderthal fragments in our 454 data contained more G and C nucleotides than the long ones. G and C nucleotides tend to mutate more often than A and T nucleotides, so this could lead to more differences between present-day humans and Neanderthals in the short (and GC-rich) sequences than in the long (and AT-rich) sequences. To test this, Ed matched up short and long Neanderthal fragments to the corresponding sequences in the human reference genome and compared those sequences in the human reference genome to those from other present-day humans. Although those comparisons did not include any Neanderthal sequences at all, they nonetheless showed that the human sequences corresponding to the shorter Neanderthal sequences had more differences from other human sequences than those corresponding to the longer ones. This observation suggested that the GC-rich sequences simply mutated faster, so maybe it would account for the higher number of differences seen in the shorter sequences. Before we could be certain, however, other factors also needed to be considered, especially the way in which we mapped Neanderthal sequences to the human reference genome sequence. Ed noticed that longer fragments of Neanderthal DNA had a better chance of being matched in the correct position in the human genome than shorter fragments, simply because they contained more sequence information. Therefore, a higher percentage of the short fragments might actually be bacterial DNA fragments that just happened to be similar to some part of the human reference genome. This, then, also might contribute to the observation that the shorter fragments contained more differences from the human reference genome. Such a phenomenon might have been overlooked in other ancient data sets, such as the mammoth data, where fragments were on average longer. But I felt very uneasy. It seemed that every day we uncovered new things about how short and long DNA fragments differed in terms of how they behaved in our analyses. Obviously, we did not understand everything that was going on. What’s more, we still hadn’t excluded the possibility that our samples were contaminated by modern human DNA.
We had, of course, considered the possibility of contamination from the outset. In the extracts we sent to Eddy and to 454, we had assayed the level of contamination based on mtDNA and found it to be low. We knew that contamination could have entered the extracts once they had left our laboratory; we had even put a caveat about this in our Nature manuscript. I felt strongly that the only solid assay for contamination we had was the one based on assessing the observed mtDNA fragments, since the mtDNA was the only part of the genome where we knew about differences between Neanderthals and modern humans. Everything else was influenced by imponderables, such as differences in GC content, differences in mismapped bacterial DNA fragments, and other unknown factors. So I argued that we should look again at the mitochondrial DNA in the sequences that had been determined by 454.