Read Ancient DNA: Methods and Protocols Online
Authors: Beth Shapiro
M. Stiller
Fig. 1. Schematic overview of the combined protocol coupling first-step multiplex PCR presented in Chapter 17 directly to barcoding protocol 2 of Chapter 19.
Next, we converted one multiplex PCR for each primer set for each sample (two libraries per sample) into a barcoded sequencing library using the barcoding “protocol 2” for preamplified DNA as described in Chapter 19 (11; Fig. 1). This approach directly couples the barcoding protocol and the library preparation process by including the barcode sequence in the adapter sequence. We then quantified all libraries using quantitative PCR (qPCR). According to the qPCR results, we pooled the libraries in equimolar ratios and sequenced them simultaneously on a small (1/16th) lane of a 454 FLX sequencing plate. After sequencing, we sorted all of the obtained reads according to their barcode sequence, in this case the first seven bases of the sequencing reads. In the ideal scenario, all barcodes would be represented evenly in terms of number of reads. Errors introduced during upstream steps, such as incorrect quantification in qPCR, errors in the dilution steps of the pooling procedure, or simple pipetting errors, can, however, result in an under- or overrepresentation of barcodes in the final sequencing output.
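The barcode sorting step described above can be sketched in a few lines of Python. This is a minimal illustration, not the software used in the study: the barcode sequences, sample names, and reads are hypothetical, and the sketch assumes exact barcode matches with no tolerance for sequencing errors.

```python
from collections import defaultdict

BARCODE_LENGTH = 7  # the text sorts reads by their first seven bases


def demultiplex(reads, barcode_to_sample):
    """Group reads by sample using the leading barcode, then strip the barcode."""
    by_sample = defaultdict(list)
    for read in reads:
        barcode = read[:BARCODE_LENGTH].upper()
        # Reads whose barcode is not in the table are kept apart (assumption).
        sample = barcode_to_sample.get(barcode, "unassigned")
        by_sample[sample].append(read[BARCODE_LENGTH:])
    return dict(by_sample)


def read_share(by_sample):
    """Fraction of assigned reads per sample, to check how even the pool is."""
    counts = {s: len(r) for s, r in by_sample.items() if s != "unassigned"}
    total = sum(counts.values())
    return {s: n / total for s, n in counts.items()} if total else {}


# Hypothetical barcodes and reads, for illustration only.
barcodes = {"ACGTACG": "sample_A", "TGCATGC": "sample_B"}
reads = ["ACGTACGTTTGGA", "TGCATGCAAACC", "ACGTACGCCCTT", "GGGGGGGAAAA"]
by_sample = demultiplex(reads, barcodes)
shares = read_share(by_sample)
```

In an evenly pooled run, the values in `shares` would be close to one another; strongly diverging values point to the quantification, dilution, or pipetting errors mentioned above.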
After verifying a fairly balanced representation of barcodes in the library pool and a sufficient enrichment of the target fragments, we sequenced four multiplex PCRs (the “odd” and “even” sets in replicates, respectively) for each of the selected cave bear samples on a full 454 FLX run. We then performed a second round of multiplex PCR in order to fill remaining gaps in the cave bear mitochondrial genome sequences. In this multiplex PCR, however, the primer sets contained only those primer pairs that flanked missing sequence data. To compensate for the reduced number of targeted fragments and to ensure amplification of the target fragments above the environmental DNA background, we increased the number of cycles in the PCR from 20 to 25.
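To give a sense of what five additional cycles can mean, the following sketch computes the theoretical fold amplification of a PCR. It assumes a constant per-cycle efficiency, which is an idealization: real multiplex reactions plateau, and efficiency differs among primer pairs. The function name and the efficiency parameter are illustrative and not part of the protocol.

```python
def fold_amplification(cycles, efficiency=1.0):
    """Theoretical fold increase after `cycles` PCR cycles.

    efficiency=1.0 means perfect doubling in every cycle (an idealization);
    lower values model less efficient reactions.
    """
    return (1.0 + efficiency) ** cycles


# Going from 20 to 25 cycles adds five doublings under perfect efficiency:
# 2**25 / 2**20 = 2**5 = 32.
extra_fold = fold_amplification(25) / fold_amplification(20)
```

Under these assumptions, raising the cycle number from 20 to 25 yields at most a 32-fold increase in product, which is the upper bound on the additional enrichment gained in the gap-filling round.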
3. Results and Discussion
Fifty-six of the one hundred and ten cave bear specimens tested showed sufficiently well-preserved DNA to be used in multiplex PCR. After 20 cycles, the reactions were converted into barcoded sequencing libraries, quantified and pooled in equimolar ratios, and sequenced on a small (1/16th) 454 FLX lane. Analysis of these initial sequencing results revealed differences between the samples, either in DNA preservation or in the amount of contamination with exogenous DNA (e.g., fungal and bacterial DNA). The proportion of sequence reads that matched the target fragments varied widely among the 56 specimens used, from 1% to 100%.
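The equimolar pooling step mentioned above amounts to a simple calculation once each library has a concentration estimate. The sketch below assumes that the qPCR results have already been converted to molar concentrations in nM; the library names, the concentrations, and the target amount per library are hypothetical.

```python
def pooling_volumes(concentrations_nM, amount_fmol):
    """Volume (in microlitres) of each library that contains `amount_fmol`.

    1 nM equals 1 fmol per microlitre, so volume = amount / concentration.
    """
    volumes = {}
    for name, conc in concentrations_nM.items():
        if conc <= 0:
            raise ValueError(f"library {name!r} has no measurable concentration")
        volumes[name] = amount_fmol / conc
    return volumes


# Hypothetical qPCR-derived concentrations, for illustration only.
libraries = {"lib_01": 4.0, "lib_02": 2.0, "lib_03": 0.5}
volumes = pooling_volumes(libraries, amount_fmol=1.0)
```

Libraries with low concentrations require proportionally larger volumes, which is one reason why quantification or dilution errors translate directly into uneven barcode representation.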
20 Case Study: Targeted high-Throughput Sequencing…
As only 1% of reads matching target fragments is insufficient to compile a consensus sequence, we continued to process only those samples that were the best preserved. We applied an arbitrary cutoff in which we required at least 40% of the sequencing reads to match the targeted fragments in order to keep a sample in the experiment. Instead of applying this cutoff, one could have chosen to re-amplify the more poorly performing samples (those showing low amounts of endogenous DNA and/or high levels of contamination with exogenous DNA), this time increasing the number of PCR cycles to 25 or even 30. Note that increasing the number of cycles will also increase the uneven representation among the target fragments in the reaction, due to differences in amplification efficiency among primer pairs. Too few cycles, however, may be insufficient to enrich for the target fragments over the environmental background DNA. It is therefore highly recommended to determine the ratio of reads matching target fragments to reads matching environmental background DNA prior to final deep sequencing.
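The cutoff described above can be expressed as a short filter over per-sample read counts. This is a sketch under stated assumptions: the sample names and counts are hypothetical, and the number of reads matching target fragments is assumed to come from a separate mapping step that is not shown here.

```python
MIN_ON_TARGET = 0.40  # the arbitrary cutoff described in the text


def on_target_fraction(target_reads, total_reads):
    """Proportion of reads that match the target fragments."""
    if total_reads == 0:
        return 0.0
    return target_reads / total_reads


def select_samples(read_counts, min_fraction=MIN_ON_TARGET):
    """Return the samples whose on-target fraction meets the cutoff."""
    kept = []
    for sample, (target_reads, total_reads) in read_counts.items():
        if on_target_fraction(target_reads, total_reads) >= min_fraction:
            kept.append(sample)
    return sorted(kept)


# Hypothetical (target reads, total reads) per sample, for illustration only.
counts = {"bear_01": (950, 1000), "bear_02": (10, 1000), "bear_03": (400, 1000)}
kept = select_samples(counts)
```

Samples that fail the filter are not necessarily lost; as noted above, they could instead be re-amplified with more cycles and evaluated again on a small sequencing lane.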
In this case, we continued to process 31 of the 56 samples that met our preservation criterion. Based on the obtained output, 112 of the 128 target fragments were covered by sequencing reads on average among the 31 samples. Thus, based on only one full run of the 454 FLX instrument, on average 87% of the mitochondrial genome was obtained from 31 individuals, representing more than 7 kilobases (kb) of replicated, overlapping sequence from all of the 31 individuals. With only one more round of gap filling, on average 96% of the mitochondrial genomes were covered, translating into ~10 kb of overlapping sequence from all individuals.
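The coverage figure reported above follows directly from the fragment counts, as the sketch below shows. It treats the fraction of covered fragments as a proxy for the fraction of the genome covered, which assumes that the 128 fragments are of similar length; the per-sample data structure is hypothetical.

```python
TOTAL_FRAGMENTS = 128  # number of target fragments given in the text


def fragment_coverage(covered_fragments, total=TOTAL_FRAGMENTS):
    """Fraction of target fragments with at least one matching read."""
    return len(set(covered_fragments)) / total


def mean_coverage(per_sample):
    """Average fragment coverage across samples."""
    values = [fragment_coverage(frags) for frags in per_sample.values()]
    return sum(values) / len(values)


# Average reported in the text: 112 of 128 fragments covered.
average_from_text = 112 / TOTAL_FRAGMENTS  # 0.875, reported as 87%
```

Listing the fragment identifiers that remain uncovered in each sample would also give the primer pairs needed for the gap-filling round.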
Phylogenetic analyses of the consensus sequences revealed a stable topology with very high statistical support, indicating strong evidence for the reciprocal monophyly of the three cave bear lineages (4).
DMPS has also been used successfully in experiments to amplify whole mitochondrial genomes from a modern polar bear and a fossil mammoth, as well as to amplify multiple nuclear loci from a modern African elephant (4). In addition to using different primer sets designed for the respective species and target loci, the only other modification to the protocol described above was, when modern samples were used, to lower the number of PCR cycles from 20 to 15.
These results show that no extensive optimization of primer sets is necessary to successfully apply DMPS to ancient or modern DNA sequencing experiments. Further, DMPS, like traditional PCR, offers full single-molecule sensitivity (10), as no pretreatment of the aDNA extract (e.g., library preparation) is necessary prior to amplification. The protocol is therefore an easy-to-implement, robust, and cost-efficient way to quickly retrieve many kb of homologous sequence data from large numbers of highly degraded samples, such as fossil remains and poorly preserved samples from museum, forensic, and medical collections.
Acknowledgments
I thank M Meyer and M Hofreiter for help throughout the research project; B Hoeffner and A Aximu for running the 454 sequencer; G Baryshnikov, H Bocherens, A Grandal d’Anglade, B Hilpert, T Kutznetsova, S Münzel, R Pinhasi, G Rabeder, W Rosendahl, and E Trinkaus for providing samples; K Finstermeier for help with the figure; and the Max Planck Society and National Science Foundation (award ANS-0909456) for financial support.
References
1. Bon C, Caudy N, de Dieuleveult M, Fosse P, Philippe M, Maksud F, Beraud-Colomb E, Bouzaid E, Kefi R, Laugier C, Rousseau B, Casane D, van der Plicht J, Elalouf JM (2008) Deciphering the complete mitochondrial genome and phylogeny of the extinct cave bear in the Paleolithic painted cave of Chauvet. Proc Natl Acad Sci U S A 105:17447–17452
2. Binladen J, Gilbert MT, Bollback JP, Panitz F, Bendixen C, Nielsen R, Willerslev E (2007) The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing. PLoS One 2:e197
3. Meyer M, Stenzel U, Hofreiter M (2008) Parallel tagged sequencing on the 454 platform. Nat Protoc 3:267–278
4. Stiller M, Knapp M, Stenzel U, Hofreiter M, Meyer M (2009) Direct multiplex sequencing (DMPS) – a novel method for targeted high-throughput sequencing of ancient and highly degraded DNA. Genome Res 19:1843–1848
5. Pacher M, Stuart AJ (2009) Extinction chronology and palaeobiology of the cave bear (Ursus spelaeus). Boreas 38:189–206
6. Knapp M, Rohland N, Weinstock J, Baryshnikov G, Sher A, Nagel D, Rabeder G, Pinhasi R, Schmidt HA, Hofreiter M (2009) First DNA sequences from Asian cave bear fossils reveal deep divergences and complex phylogeographic patterns. Mol Ecol 18:1225–1238
7. Hofreiter M, Rabeder G, Jaenicke-Despres V, Withalm G, Nagel D, Paunovic M, Jambresic G, Pääbo S (2004) Evidence for reproductive isolation between cave bear populations. Curr Biol 14:40–43
8. Rabeder G, Hofreiter M, Withalm G (2004) The systematic position of the Cave Bear from Potocka zijalka (Slovenia). Mitt Komm Quartärforsch Österr Akad Wiss 13:197–200
9. Rohland N, Hofreiter M (2007) Ancient DNA extraction from bones and teeth. Nat Protoc 2:1756–1762
10. Dear PH, Cook PR (1993) Happy mapping: linkage mapping using a physical analogue of meiosis. Nucleic Acids Res 21:13–20
11. Knapp M, Stiller M, Meyer M (2011) Generating barcoded libraries for multiplex high-throughput sequencing. In: Shapiro B, Hofreiter M (eds) Ancient DNA. Springer, New York
12. Fulton TL, Stiller M (2011) PCR amplification, cloning and sequencing of ancient DNA. In: Shapiro B, Hofreiter M (eds) Ancient DNA. Springer, New York
13. Stiller M, Fulton TL (2011) Multiplex PCR amplification of ancient DNA. In: Shapiro B, Hofreiter M (eds) Ancient DNA. Springer, New York
Target Enrichment via DNA Hybridization Capture
Susanne Horn
Abstract
Recent advances in high-throughput DNA sequencing technologies have allowed entire nuclear genomes to be shotgun sequenced from ancient DNA (aDNA) extracts. Nonetheless, targeted analyses of specific genomic loci will remain an important tool for future aDNA studies. DNA capture via hybridization allows the efficient exploitation of current high-throughput sequencing for population genetic analyses using aDNA samples. Specifically, hybridization capture allows larger data sets to be generated for multiple target loci as well as for multiple samples in parallel. “Bait” molecules are used to select target regions from DNA libraries for sequencing. Here we present a brief overview of the currently available hybridization capture protocols using either an in-solution or a solid-phase (immobilized) approach. While it is possible to purchase ready-made kits for this purpose, I present a protocol that allows users to generate their own custom bait to be used for hybridization capture.
Key words: Ancient DNA, Target enrichment, Hybridization, DNA capture, Bait, High-throughput sequencing
1. Introduction
Shotgun sequencing using next-generation sequencing techniques has been used to sequence entire genomes of ancient specimens (1–3). However, this approach remains prohibitively expensive for many users, and generally provides data from only a single specimen. Analyses of ancient populations generally do not focus on complete genome sequences, but instead on selected genomic loci that can be targeted from many individuals.
In many ancient DNA (aDNA) extracts, DNA fragments representing the target loci are present at very low copy-number compared to sequences of contaminating exogenous DNA. Such experiments therefore require an enrichment step, where the amount of target DNA is increased in a library to be sequenced, relative to nontarget DNA. Enrichment is most often achieved via polymerase chain reaction (PCR). This approach, however, is currently being superseded by enrichment strategies that capture DNA by hybridization (4–7). In hybridization capture approaches, a genomic library is first prepared from an aDNA extract and DNA bait molecules representing the target sequence are added to the library. The target DNA molecules in the library will hybridize with the added bait molecules and can then be pulled down out of the library for sequencing.
DNA hybridization capture has several advantages compared to traditional PCR. First, while mismatches can prevent the binding of primers in PCR, mismatches are less detrimental for hybridization, making hybridization a useful method to enrich for DNA where the sequence of the ancient specimen is not exactly known. This can also be important when molecules with damage-induced base modifications may inhibit primer binding (8, 9). Second, hybridization is less sensitive to contamination than traditional PCR. While PCR selects for full-length amplicons and therefore tends to amplify longer molecules preferentially (which may be modern DNA contaminants), hybridization targets all lengths of starting molecules more equally. Third, nuclear mitochondrial insertions (numts) may be amplified preferentially by PCR if the primer binding conditions allow. Hybridization, however, should preferentially enrich for the most common fragment, which will be the much higher copy-number mitochondrial sequence.
One potential drawback of hybridization capture is the loss of target molecules during library preparation. This is not a problem for PCR, which is theoretically able to begin the amplification process from a single starting molecule. Therefore, it is highly recommended that not all of the aDNA extract is used in a single enrichment experiment, but that some is saved for replication if necessary.
Beth Shapiro and Michael Hofreiter (eds.), Ancient DNA: Methods and Protocols, Methods in Molecular Biology, vol. 840, DOI 10.1007/978-1-61779-516-9_21, © Springer Science+Business Media, LLC 2012
The choice of sequencing platform will determine what type of library will need to be prepared prior to enrichment (see Table 1). This choice may depend on the size of the sequence fragment to be targeted and the number of samples to be processed. Hybridization capture can be used to enrich for fragments ranging in length from a few hundred bases to many megabases (Mb) in size. When the sequencing is complete, only a fraction of the sequencing reads will map to the desired target region, and this also needs to be considered when planning the amount of sequence data that will be required. In previous work, enrichment rates for aDNA varied considerably across experiments: between 18 and 40% of reads could be mapped to a target region of a Neandertal mitochondrial genome (10); 37% of reads mapped to targeted nuclear regions of Neandertals (7); and around 20% of reads mapped to a targeted 500-base-pair (bp) region of the mitochondrial control region of beavers (