As so often happens, things took longer than expected. It took us until the end of October to produce five Illumina-style DNA libraries with our special Neanderthal tags. We sequenced aliquots of each library on our Illumina machine and carefully determined the number of molecules in the libraries. We found that they contained over 1 billion DNA fragments. This should give us what we needed to complete the genome sequence. We sent the libraries along with custom primers to the Broad Institute for sequencing. The output from the Illumina machines, however, we would analyze using a computer program developed by Martin Kircher that read more nucleotides with fewer errors than the commercial programs provided by Illumina could. The amount of data from the sequencing machines that his program needed was so huge that it was impractical to transfer it via the Internet; instead, we had arranged for large-capacity computer hard drives to be shipped from Boston to Leipzig.
In mid-January 2009, we took delivery of two hard drives containing the results of the first few runs. Now the special tags on our Neanderthal libraries proved their usefulness. Martin discovered that one of the runs from the Broad Institute contained no sequences with these tags. Clearly, something had gotten mixed up at the Broad Institute. This was startling, and a bit alarming. I considered shifting the sequencing back to our lab, where by now we had our four new Illumina machines up and running. But the other runs on the two hard drives from the Broad Institute looked good, and we were committed to working with Eric. Finally, on February 6, 2009, eighteen hard drives arrived by FedEx. It was not a day too early. In six days, on February 12, the AAAS meeting would convene.
Martin, Ed, and Udo checked the data from the Broad Institute. The reads carried our tags, the distribution of fragment sizes was the same as what we had seen in our own Illumina runs, and when Udo mapped the reads the results were consistent with the data we had generated in Leipzig. This was a relief. The AAAS had pushed for a press conference to accompany my talk at its Chicago meeting, and I had dreaded having nothing to say. Now I would be able to announce that we had produced the sequences needed to arrive at 1-fold coverage of the genome. But just as I had wanted the initial announcement of the project to be in Leipzig, a city still in the process of emerging from the shadows of its socialist past, I now felt that the AAAS press conference should take place in Leipzig, too. And in recognition of the early support from 454 Life Sciences for our project, I wanted to organize the press conference with them. The AAAS agreed and for February 12 we organized a press conference with 454 in Leipzig and with a video link to Chicago, so that the meeting participants and the Chicago press could ask questions. I would then fly to Chicago for my talk, scheduled for February 15.
This left us just six days to prepare. I focused the largest part of the press release, and my talk in Chicago, on the technical obstacles we had had to overcome in order to arrive at our first view of the genome of an extinct human form. I described how Tomi Maricic had used minute amounts of radioactively labeled DNA to identify and modify the steps where losses of DNA occur, how the tagged libraries produced in our clean room eliminated the problems of contamination that had affected the pilot study, how the detailed studies by Adrian Briggs and Philip Johnson had revealed patterns of errors in the DNA sequences, and how the computer programs Udo Stenzel and Ed Green had developed allowed us to identify and map the Neanderthal DNA fragments while avoiding many pitfalls.
I also wanted to say something about Neanderthals. We had not had time to map, much less analyze, the billion or so DNA sequences. Fortunately, over the past six months, Udo and others had mapped the more than 100 million DNA fragments we had sequenced with the 454 technology. This allowed us to tease out a few tidbits of biological relevance. Ed had looked at two cases where others had claimed that gene variants now seen in present-day humans were likely to have been contributed by gene flow from Neanderthals. One of these was a big region of 900,000 bases on chromosome 17 that is inverted (or reversed) on the chromosome in many Europeans. The excellent Icelandic genealogical records had been used to show that the inverted form was associated with slightly higher fertility in women. Did the inverted version come from Neanderthals, as some people had speculated? Ed checked our Neanderthal sequences, and none of the three Neanderthals who had contributed sequences to our effort carried the inverted version. This finding did not rule out the possibility that other Neanderthals may have carried the inverted variant and contributed it to Europeans, but it made it less likely. Similarly, a gene on chromosome 8 that, when mutated, drastically reduces brain size comes in different versions in normal people around the world; the version common in Europe and Asia was suggested to have come from Neanderthals. But Ed showed that our sequences did not carry that variant. So, from looking at these examples, there was no hint of a genetic contribution from Neanderthals to modern Europeans. I was comfortable with this conclusion, which fit what we had found a decade before with the mitochondrial DNA data. But I was to be startled by the results of some other last-minute discoveries.
________________
During the long flight from Chicago back to Leipzig, I tried to soberly assess the status of the project. Although we had now generated all the DNA sequences we needed, much work remained. The first thing we needed to do was to map all the DNA fragments sequenced with the Illumina technology to the chimpanzee genome and to the reconstructed ancestral genome of humans and chimpanzees. The group in Leipzig would now focus on adapting the algorithms that Ed and Udo had developed for the 454 data to our new Illumina data.
Once this was done, we could start asking several questions about our relationship with Neanderthals: when our lines diverged, how different we were, whether our lines had ever mixed, and whether any genes had changed in interesting ways between people today and Neanderthals. To answer such questions, we would need more than just the people in my group—we would need many people from all over the world.
Back in 2006 I had come to realize that our project was historic not only because this was the first time the genome of an extinct form of human was to be sequenced, but also because it was the first time that a small academic group had taken on the sequencing of an entire mammalian genome. Until that time, only large sequencing centers would have been able to undertake such a project. But even those large centers collaborated with other institutions to analyze different aspects of the genomes. We clearly needed to put together some sort of consortium. So in 2006, I started thinking about what types of expertise we would need and which people I would like to work with.
First and foremost, we needed population geneticists. These are geneticists who study DNA sequence variation in a species or a population and, from this, infer what happened to these species or populations in the past. They can tell when populations split, whether they exchanged genes, and whether selection acted upon them. The population geneticists in our group, Michael Lachmann and Susan Ptak, could help with some of these things but we clearly needed input from more people, and we wanted to work only with the very best.
I started contacting people, most of them in the United States, as soon as our project was under way. Almost everyone I talked to wanted to be part of the project—it was clearly a unique opportunity to study a genome most researchers had thought impossible to sequence—but we needed people willing to work full-time or almost full-time on the project for at least a few months so that we could finish the analyses quickly. I had seen too many examples of genome projects that dragged on for months or years because crucial groups had multiple and conflicting commitments. To their credit, when I made this clear, several people realized they had too many other things to do and backed out.
One person I particularly wanted in the group was David Reich, a young professor at Harvard Medical School and a rather unorthodox population geneticist. He first studied physics at Harvard and then went on to do a PhD in genetics at Oxford. I invited him to visit Leipzig in September 2006, and he gave a talk on a controversial paper he and his colleagues had just published that summer in Nature.{57} It suggested that after the initial separation of the populations that would become humans and chimpanzees, the two populations came together more than a million years later and exchanged genes before separating permanently. I found David to be very stimulating to talk to. In fact, I found him on the verge of intellectually intimidating. He produced a torrent of thoughts and ideas at a rate that was challenging and, at times, almost impossible to keep up with. But the intellectual onslaught was balanced by the fact that David is the kindest, gentlest person imaginable. He was and is also remarkably unconcerned about academic prestige. He shares, I imagine, my conviction that academic positions and grants will be available if one simply does good work on interesting problems. I spoke with him about the Neanderthal project during his visit to Leipzig and gave him our manuscript on the pilot study to read on his flight back to Boston. A few days later I received six pages of detailed comments on our paper. It was clear that he was an ideal candidate to work with on the Neanderthal genome.
In fact, working with David would mean that we would have not only his amazing brain involved in the project but also the unique capabilities of his close associate Nick Patterson. Nick had had an even more unusual career than David. He had studied mathematics at Cambridge in the UK and then worked for the British intelligence service as a cryptologist for over twenty years. Some people I have since met have said that at that time he had the reputation of being one of the best code breakers in both the British and US intelligence communities. After leaving the secretive intelligence world, he turned his attention to predicting financial markets; by 2000 he had earned enough money on Wall Street to live comfortably for the rest of his life. Ever intellectually curious, he then moved on to what would later become the Broad Institute in Boston to use his code-cracking abilities on the deluge of genome sequences that were generated there. In Boston, he eventually joined forces with David. Nick is the epitome of what a child might imagine a brilliant scientist to look like. Due to a congenital bone disease, his head seems disproportionately large and his eyes point in different directions. This makes him seem constantly concerned with higher mathematical problems. I came to learn that he was also a Buddhist, sharing my long-standing but unfortunately not very committed interest in Zen Buddhism. Nick has an uncanny ability to discern patterns hidden in large amounts of data. I was so excited by the prospect of having Nick and David involved in our project that I offered to hire them both for the duration of the project if they would spend at least 75 percent of their time in Leipzig. Although they couldn’t accept this offer, they promised to devote as much attention as they could to the Neanderthal genome, a promise that they would fulfill to an extent that exceeded even my greatest expectations.
Another population geneticist I wanted on board was Montgomery, or “Monty,” Slatkin. He was based at UC Berkeley, where I had first met him in the 1980s when I was a postdoc with Allan Wilson. Monty has had a long and distinguished career as a mathematical biologist and has the level-headedness and balance that are the hallmarks of wisdom and experience. He had trained many brilliant students who went on to head their own groups, and the younger people who worked with him then were equally promising. Foremost perhaps was Philip Johnson, who was to work out the patterns of errors in the Neanderthal sequences alongside Adrian Briggs (see Chapter 14). I was delighted that Monty wanted to join our consortium, not least because his scientific style balanced David’s and Nick’s. Whereas they liked to come up with clever algorithms to infer past population events, Monty liked to construct explicit population models and test whether they fit the variation observed in DNA sequences.
One of the first questions the consortium wanted to take on was perhaps the most hotly contested one: Had Neanderthals contributed DNA to people living in Europe today? After all, they had lived throughout Europe until modern humans appeared around 40,000 years ago, and some paleontologists claimed that they saw Neanderthal traits in the skeletons of early modern people in Europe. The majority of paleontologists disagreed, however, and our 1997 paper analyzing Neanderthal mtDNA had given no hint that they had contributed DNA to present-day Europeans. Only an analysis of the nuclear genome would be able to definitively answer the question.
To understand why an analysis of the nuclear genome would be so much more powerful than analysis of the mtDNA, it’s important to remember that whereas the nuclear genome is composed of over 3 billion nucleotides, the mtDNA genome is made up of only 16,500 nucleotides. In addition, the nuclear genome is reshuffled in each generation, when each chromosome in a pair exchanges pieces with its partner and each chromosome is passed on independently of the others to offspring. Due to the shuffling as well as the sheer size of the nuclear genome, there are many chances to see even a small amount of mixing between two groups. If a child were born from a union between a Neanderthal and a modern human, then it would get roughly 50 percent of its DNA from each of the two groups. If that child then grew up with modern humans and in turn reared children with them, its children would carry, on average, 25 percent Neanderthal DNA, its grandchildren would carry 12.5 percent, its great-grandchildren would carry about 6 percent, and so on. Although the contribution decreases rapidly in this scenario, 6 percent of the genome still represents over 100 million nucleotides. And eventually the Neanderthal DNA would spread throughout the population so everyone would have some proportion of it. At that point, when both parents of a child would carry roughly similar proportions of Neanderthal DNA, it would not become further diluted but remain indefinitely in the population. Also, if mixing did happen, it likely didn’t occur only once. And if the population where the mixed children lived ended up expanding so that, on average, there would be more than one child per person in the next generation, then the contribution would tend to not get lost. We of course know that the modern human population expanded after they came to Europe and replaced the Neanderthals, so I felt certain that we would be able to see even a rather small contribution if it existed. 
But since the mtDNA showed no signs of a contribution, I still tended to think that no contribution had occurred.
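The halving described above is simple enough to check in a few lines of Python. This is only an illustrative sketch of the arithmetic, not anything the consortium ran: the function names are made up for the example, the genome size is rounded down to 3 billion nucleotides, and it assumes the simplest case in which every descendant has children with a partner who carries no Neanderthal DNA.

```python
# Illustrative sketch of the dilution arithmetic; all names are hypothetical.
# Assumption: a round nuclear genome size of 3 billion nucleotides.
GENOME_SIZE = 3_000_000_000


def neanderthal_fraction(generations_after_hybrid: int) -> float:
    """Expected Neanderthal share of the genome.

    Generation 0 is the hybrid child (50 percent). Each later generation
    halves the share, assuming the other parent has no Neanderthal DNA.
    """
    if generations_after_hybrid < 0:
        raise ValueError("generations_after_hybrid must be >= 0")
    return 0.5 ** (generations_after_hybrid + 1)


def neanderthal_nucleotides(generations_after_hybrid: int,
                            genome_size: int = GENOME_SIZE) -> int:
    """Expected number of nucleotides of Neanderthal origin."""
    return round(neanderthal_fraction(generations_after_hybrid) * genome_size)


if __name__ == "__main__":
    labels = ["hybrid child", "children", "grandchildren", "great-grandchildren"]
    for generation, label in enumerate(labels):
        share = neanderthal_fraction(generation)
        count = neanderthal_nucleotides(generation)
        print(f"{label}: {share:.2%} of the genome, about {count:,} nucleotides")
```

Under these assumptions the great-grandchildren come out at 6.25 percent, which is the "about 6 percent" in the text, and that share is still well over 100 million nucleotides.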