`
`Vol 444 | 16 November 2006 | doi:10.1038/nature05336
`
`Analysis of one million base pairs of
`Neanderthal DNA
`
`Richard E. Green1, Johannes Krause1, Susan E. Ptak1, Adrian W. Briggs1, Michael T. Ronan2, Jan F. Simons2, Lei Du2,
Michael Egholm2, Jonathan M. Rothberg2, Maja Paunovic3† & Svante Pääbo1
`
`Neanderthals are the extinct hominid group most closely related to contemporary humans, so their genome offers a unique
`opportunity to identify genetic changes specific to anatomically fully modern humans. We have identified a 38,000-year-old
`Neanderthal fossil that is exceptionally free of contamination from modern human DNA. Direct high-throughput sequencing
`of a DNA extract from this fossil has thus far yielded over one million base pairs of hominoid nuclear DNA sequences.
`Comparison with the human and chimpanzee genomes reveals that modern human and Neanderthal DNA sequences
`diverged on average about 500,000 years ago. Existing technology and fossil resources are now sufficient to initiate a
`Neanderthal genome-sequencing effort.
`
`Neanderthals were first recognized as a distinct group of hominids
`from fossil remains discovered 150 years ago at Feldhofer in Neander
Valley, outside Düsseldorf, Germany. Subsequent Neanderthal finds
`in Europe and western Asia showed that fossils with Neanderthal
`traits appear in the fossil record of Europe and western Asia about
`400,000 years ago and vanish about 30,000 years ago. Over this period
`they evolved morphological traits that made them progressively
`more distinct from the ancestors of modern humans that were evol-
`ving in Africa1,2. For example, the crania of late Neanderthals have
`protruding mid-faces, brain cases that bulge outward at the sides, and
`features of the base of the skull, jaw and inner ears that set them apart
`from modern humans3.
`The nature of the interaction between Neanderthals and modern
`humans, who expanded out of Africa around 40,000–50,000 years
`ago and eventually replaced Neanderthals as well as other archaic
hominids across the Old World, is still a matter of some debate.
`Although there is no evidence of contemporaneous cohabitation at
`any single site, there is evidence of geographical and temporal over-
`lap in their ranges before the disappearance of Neanderthals.
`Additionally, late in their history, some Neanderthal groups adopted
`cultural traits such as body decorations, potentially through cultural
`interactions with incoming modern humans4.
`In 1997, a segment of the hypervariable control region of the mater-
`nally inherited mitochondrial DNA (mtDNA) of the Neanderthal type
`specimen found at Feldhofer was sequenced. Phylogenetic analysis
`showed that it falls outside the variation of contemporary humans
`and shares a common ancestor with mtDNAs of present-day humans
`approximately half a million years ago5,6. Subsequently, mtDNA
`sequences have been retrieved from eleven additional Neanderthal spe-
`cimens: Feldhofer 2 in Germany7, Mezmaiskaya in Russia8, Vindija 75,
`77 and 80 in Croatia9,10, Engis 2 in Belgium, La Chapelle-aux-Saints and
`Rochers de Villeneuve in France10, Scladina in Belgium11, Monte Lessini
`in Italy12, and El Sidron 441 in Spain13. Although some of these
`sequences are extremely short, they are all more closely related to one
`another than to modern human mtDNAs9,11.
This fact, in conjunction with the absence of any related mtDNA
sequences in currently living humans or in a small number of early
modern human fossils5,10, strongly suggests that Neanderthals contributed no
mtDNA to present-day humans. On the basis of various population
`models, it has been estimated that a maximal overall genetic contri-
`bution of Neanderthals to the contemporary human gene pool is
`between 25% and 0.1% (refs 10, 14). Because the latter conclusions
`are based on mtDNA, a single maternally inherited locus, they are
`limited in their ability to detect a Neanderthal contribution to the
`current human gene pool both by the vagaries of genetic drift and by
`the possibility of a sex bias in reproduction. However, both morpho-
`logical evidence4,15 and the variation in the modern human gene pool16
support the conclusion that if any genetic contribution of Neanderthals
to modern humans occurred, it was of limited magnitude.
`Neanderthals are the hominid group most closely related to cur-
`rently living humans, so a Neanderthal nuclear genome sequence
`would be an invaluable resource for annotating the human genome.
`Roughly 35 million nucleotide differences exist between the genomes
`of humans and chimpanzees, our closest living relatives17. Soon,
`genome sequences from other primates such as the orang-utan and
`the macaque will allow such differences to be assigned to the human
and chimpanzee lineages. However, temporal resolution of the genetic
changes along the human lineage, where remarkable morphological,
behavioural and cognitive changes occurred, is limited
without a more closely related genome sequence for comparison.
`In particular, comparison to the Neanderthal would enable the iden-
`tification of genetic changes that occurred during the last few hun-
`dred thousand years, when fully anatomically and behaviourally
`modern humans appeared.
`
`Identification of a Neanderthal fossil for DNA sequencing
`Although it is possible to recover mtDNA18 and occasionally even
`nuclear DNA sequences19–22 from well-preserved remains of organ-
`isms that are less than a few hundred thousand years old, determina-
`tion of ancient hominid sequences is fraught with special difficulties
`and pitfalls18. In addition to degradation and chemical damage to the
`DNA that can cause any ancient DNA to be irretrievable or misread,
`contamination of specimens, laboratory reagents and instruments
`with traces of DNA from modern humans must be avoided. In fact,
`when sensitive polymerase chain reaction (PCR) is used, human
`
`1Max-Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, D-04103 Leipzig, Germany. 2454 Life Sciences, 20 Commercial Street, Branford, Connecticut 06405, USA.
`3Institute of Quaternary Paleontology and Geology, Croatian Academy of Sciences and Arts, A. Kovacica 5/II, HR-10 000 Zagreb, Croatia.
†Deceased.
`
`
` ©2006
`
`Nature Publishing Group
`
`Petitioner Sequenom - Ex. 1012, p. 1
`
`
`
`
`mtDNA sequences can be retrieved from almost every ancient spe-
`cimen23,24. This problem is especially severe when Neanderthal
`remains are studied because Neanderthal and human are so closely
`related that one expects to find few or no differences between
`Neanderthals and modern humans within many regions25, making
`it impossible to rely on the sequence information itself to distinguish
`endogenous from contaminating DNA sequences. A necessary first
`step for sequencing nuclear DNA from Neanderthals is therefore to
`identify a Neanderthal specimen that is free or almost free of modern
`human DNA.
`We tested more than 70 Neanderthal bone and tooth samples from
`different sites in Europe and western Asia for bio-molecular preser-
`vation by removing samples of a few milligrams for amino acid
`analysis. The vast majority of these samples had low overall contents
`of amino acids and/or high levels of amino acid racemization, a
`stereoisomeric structural change that affects amino acids in fossils,
`indicating that they are unlikely to contain retrievable endogenous
`DNA26. However, some of the samples are better preserved in that
`they contain high levels of amino acids (more than 20,000 p.p.m.),
`low levels of racemization of amino acids such as aspartate that
`racemize rapidly, as well as amino acid compositions that suggest
`that the majority of the preserved protein stems from collagen.
`From 100–200 mg of bone from six of these specimens we extracted
`DNA and analysed the relative abundance of Neanderthal-like
`mtDNA sequences and modern human-like mtDNA sequences by
`performing PCR with primer pairs that amplify both human and
`Neanderthal mtDNA with equal efficiency. The amplification pro-
`ducts span segments of the hypervariable region of the mtDNA in
`which all Neanderthals sequenced to date differ from all contempor-
`ary humans. From subsequent cloning into a plasmid vector and
`sequencing of more than a hundred clones from each product, we
`determined the ratio of Neanderthal-like to modern human-like
`mtDNA in each extract. We used two different primer pairs that
amplify fragments of 63 base pairs and 119 base pairs to gauge the
`contamination levels for different lengths of DNA molecules.
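The clone-counting step described above reduces to a simple proportion. The following is a minimal sketch, assuming hypothetical clone counts (the paper reports only percentages); the Wilson score interval is an illustrative addition and not a method stated in the text.

```python
import math

def neanderthal_fraction(n_neanderthal, n_human, z=1.96):
    """Return the Neanderthal-like clone fraction and a Wilson score interval."""
    n = n_neanderthal + n_human
    if n == 0:
        raise ValueError("no clones counted")
    p = n_neanderthal / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, (max(0.0, centre - half), min(1.0, centre + half))

# Hypothetical clone counts for one extract and one primer pair.
fraction, (low, high) = neanderthal_fraction(n_neanderthal=99, n_human=1)
print(f"{fraction:.2%} Neanderthal-like (95% interval {low:.2%} to {high:.2%})")
```

With roughly a hundred clones per product, as in the text, an interval of this kind gives a sense of how precisely a contamination level can be pinned down.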
`Figure 1 shows that the level of contamination differs drastically
`among the samples. Whereas only around 1% of the mtDNA pre-
`sent in three samples from France, Russia and Uzbekistan was
`Neanderthal-like, one sample from Croatia and one from Spain con-
`tained around 5% and 75% Neanderthal-like mtDNA, respectively.
One bone (Vi-80) from Vindija Cave, Croatia, stood out in that
~99% of the 63-base-pair mtDNA segments and ~94% of the
119-base-pair segments are of Neanderthal origin. Assuming that
the ratio of Neanderthal to contaminating modern human DNA is
the same for mtDNA as it is for nuclear DNA, the Vi-80 bone therefore
yields DNA fragments that are predominantly of Neanderthal
origin and, provided that the contamination rate was not increased
during the downstream sequencing process, the extent of contamination
in the final analyses is below ~6%.

[Figure 1 plot: horizontal axes 'Neanderthal mtDNA (%)' and 'Modern human mtDNA (%)'; fossils Vi-80, Vi-77, St Cesaire, Okladnikov, El Sidron and Teshik Tash.]

`The Vi-80 bone was discovered by M. Malez and co-workers in
`layer G3 of Vindija Cave in 1980. It has been dated by carbon-14
accelerator mass spectrometry to 38,310 ± 2,130 years before present
`and its entire mtDNA hypervariable region I has been sequenced10.
`Out of 14 Neanderthal remains from layer G3 that we have analysed,
`this bone is one of six samples that show good bio-molecular preser-
`vation, while the other eight bones show intermediate to bad states of
`preservation that do not suggest the presence of amplifiable DNA.
`Preservation conditions in Vindija Cave thus vary drastically from
`bone to bone, a situation that may be due to different extents of water
`percolation in different parts of the cave.
`
`Direct large-scale DNA sequencing from the Vindija Neanderthal
`Because the Vi-80 Neanderthal bone extract is largely free of contam-
`inating modern human mtDNA, we chose this extract to perform
`large-scale parallel 454 sequencing27. In this technology, single-
`stranded libraries, flanked by common adapters, are created from
`the DNA sample and individual library molecules are amplified
`through bead-based emulsion PCR, resulting in beads carrying mil-
`lions of clonal copies of the DNA fragments from the samples. These
`are subsequently sequenced by pyrosequencing on the GS20 454
`sequencing system.
`For several reasons, the 454 sequencing platform is extremely well
`suited for analyses of bulk DNA extracted from ancient remains28.
`First, it circumvents bacterial cloning, in which the vast majority of
`initial template molecules are lost during transformation and estab-
`lishment of clones. Second, because each molecule is amplified in
`isolation from other molecules it also precludes template competi-
`tion, which frequently occurs when large numbers of different DNA
`fragments are amplified together. Third, its current read length of
`100–200 nucleotides covers the average length of the DNA preserved
`in most fossils29. Fourth, it generates hundreds of thousands of reads
`per run, which is crucial because the majority of the DNA recovered
`from fossils is generally not derived from the fossil species, but rather
`from organisms that have colonized the organism after its death20,30.
`Fifth, because each sequenced product stems from just one original
`single-stranded template molecule of known orientation, the DNA
`strand from which the sequence is derived is known28. This provides
`an advantage over traditional PCR from double-stranded templates,
`in which the template strand is not known, because the frequency of
`different nucleotide misincorporations can be deduced. For example,
`using 454 sequencing, the rate at which cytosine is converted to uracil
`and read as thymine can be distinguished from the rate at which
`guanine is converted to xanthine and read as adenine, whereas this
`is impossible using traditional PCR or bacterial cloning. This is
`important since nucleotide conversions and misincorporations in
`ancient DNA are caused by damage that affects different bases
`differently28,31 and this pattern of false substitutions can be used to
`estimate the relative probability that a particular substitution (that is,
`the observation of a nucleotide difference between DNA sequences)
`represents the authentic DNA sequence of the organism versus an
`artefact from DNA degradation.
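Because the sequenced strand is known, C to T and G to A mismatches can be tallied separately rather than pooled as complementary pairs. The following is a minimal sketch of such a tally, assuming a pairwise alignment already written on the sequenced strand; the function name and the toy sequences are hypothetical.

```python
from collections import Counter

def count_mismatches(reference, read):
    """Count substitution classes between two aligned, equal-length strings.

    Both strings are assumed to be given on the strand that was sequenced,
    so 'C>T' and 'G>A' are kept as separate classes.
    """
    if len(reference) != len(read):
        raise ValueError("aligned sequences must have equal length")
    counts = Counter()
    for ref_base, read_base in zip(reference.upper(), read.upper()):
        # Gaps ('-') and ambiguous bases are skipped.
        if ref_base in "ACGT" and read_base in "ACGT" and ref_base != read_base:
            counts[f"{ref_base}>{read_base}"] += 1
    return counts

# Toy alignment (hypothetical sequences, gaps written as '-').
ref = "ACGTCCGATG-CA"
read = "ACGTTCGATATCA"
mismatches = count_mismatches(ref, read)
print(dict(mismatches))
```

Summed over many reads, the relative sizes of the twelve classes give the damage profile that the text uses to weigh a substitution against an artefact.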

Figure 1 | Ratio of Neanderthal to modern human mtDNA in six hominid
fossils. For each fossil, primer pairs that amplify a long (119 base pairs;
upper lighter bars) and short (63 base pairs; lower darker bars) product were
used to amplify segments of the mtDNA hypervariable region. The products
were sequenced and determined to be either of Neanderthal (yellow) or
modern human (blue) type.

We recovered a total of 254,933 unique sequences from the Vi-80
bone (see Supplementary Methods). These were aligned to the
human (build 36.1)32, chimpanzee (build 1)17 and mouse (build
34.1)33 complete genome sequences, to environmental sample
sequences in the GenBank env database (version 3, September
2005), and to the complete set of redundant nucleotide sequences
in GenBank nt (version 3, September 2005, excluding EST, STS, GSS,
environmental and HTGS sequences)34 using the program BLASTN
(NCBI version 2.2.12)35. The most similar database sequence for each
query was identified and classified by its taxonomic order (Fig. 2) (see
Supplementary Methods). No significant nucleotide sequence similarity
in the databases was found for 79% of the fossil extract
sequence reads. This is typical of large-scale sequencing both from
other ancient bones20,22,28 and from environmental samples36,37,
although some permafrost-preserved specimens can yield high
amounts of endogenous DNA22. Sequences with similarity to a database
sequence were classified by the taxonomic order of their most
significant alignment. Actinomycetales, a bacterial order with many
soil-living species, was the most populous order and accounted for
6.8% of the sequences. The second most populous order, to which
15,701 unique sequences or 6.2% of the sequence reads were most
similar, was that of primates. All other individual orders were
substantially less frequent. Notably, the average percentage identity for
the primate sequence alignments was 98.8%, whereas it was 92–98%
for the other frequently occurring orders. Thus, the primate reads,
unlike many of the prokaryotic reads, are aligned to a very closely
related species.
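The shares quoted for each taxonomic order follow directly from the read counts. The following is a minimal sketch that recomputes them from the counts given on the Figure 2 labels; the variable and function names are hypothetical.

```python
# Read counts per best-hit category, taken from the Figure 2 labels.
best_hit_counts = {
    "No hit": 200_829,
    "Actinomycetales": 17_213,
    "Primates": 15_701,
    "env": 8_408,
    "All other orders": 6_559,
    "Burkholderiales": 1_912,
    "Pseudomonadales": 1_470,
    "Rhizobiales": 1_230,
    "Enterobacteriales": 788,
    "Poales": 429,
    "Rhodocyclales": 394,
}

total_reads = sum(best_hit_counts.values())

def share(category):
    """Percentage of all unique reads assigned to one category."""
    return 100 * best_hit_counts[category] / total_reads

# Print categories from most to least populous.
for name, count in sorted(best_hit_counts.items(), key=lambda kv: -kv[1]):
    print(f"{name:<18} {count:>8,} {share(name):5.1f}%")
```

The category counts sum to the 254,933 unique sequences reported in the text, and the "No hit" share comes out close to the quoted 79%.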
`
`Neanderthal mtDNA sequences
`Among the 15,701 sequences of primate origin, we first identified all
`mtDNA in order to investigate whether their evolutionary relation-
`ship to the current human mtDNA pool is similar to what is known
`from previous analyses of Neanderthal mtDNA. A total of 41 unique
`DNA sequences from the Vi-80 fossil had their closest hits to different
`parts of the human mtDNA, and comprised, in total, 2,705 base pairs
`of unique mtDNA sequence. None of the putative Neanderthal
`mtDNA sequences map to the two hypervariable regions that have
`been previously sequenced in Neanderthals. We aligned these
`mtDNA sequences to the complete mtDNA sequences of 311 modern
`humans from different populations38 as well as to the complete
`mtDNA sequences of three chimpanzees and two bonobos (Supple-
`mentary Information). A schematic neighbour-joining tree esti-
`mated from this alignment is shown in Fig. 3. In agreement with
`previous results, the Neanderthal mtDNA falls outside the variation
`among modern humans. However, the length of the branch leading
`to the Neanderthal mtDNA is 2.5 times as long as the branch leading
`to modern human mtDNAs. This is likely to be due to errors in our
`Neanderthal sequences derived from substitution artefacts from
`damaged, ancient DNA and from sequencing errors28.
`To analyse the extent to which errors occur in the Neanderthal
`mtDNA reads, we designed 29 primer pairs (Supplementary
`Methods) flanking all 39 positions at which the Vi-80 Neanderthal
`mtDNA sequences differed by substitutions from the consensus bases
`seen among the 311 human mtDNA sequences. These primer pairs,
`
`which are designed to yield amplification products that vary in length
`between 50 and 98 base pairs (including primers), were used in a
`multiplex two-step PCR39 from the same Neanderthal extract that
had been used for large-scale 454 sequencing. Twenty-five of the PCR
`products, containing 34 of the positions where the Neanderthal differs
`from humans, were successfully amplified and cloned, and then six or
`more clones of each product were sequenced. The consensus sequence
`seen among these clones revealed the same nucleotides seen by the 454
`sequencing at 20 of the 34 positions and no additional differences. Of
`the 14 positions found to represent errors in the sequence reads, seven
`were C to T transitions, four were G to A, two were G to T and one was
`T to C. This pattern of change is typical for ancient DNA, where
`deamination of cytosine residues31 and, to a lesser extent, modifica-
`tions of guanosine residues28 have been found to account for the major-
`ity of nucleotide misincorporations during PCR.
`These results also show that the likelihood of observing errors in
`the sequencing reads is drastically different depending on whether
`one considers nucleotide positions where a base in the Neanderthal
`mtDNA sequence differs from both the human and chimpanzee
sequences, or positions where the Neanderthal differs from the
`humans but is identical to the chimpanzee mtDNA sequences.
`Among the mtDNA sequences analysed, there are 14 positions where
`the Neanderthal carries a base identical to the chimpanzee, and 13 of
`those were confirmed by PCR. In contrast, among the remaining 20
`positions, where the Neanderthal sequences differed from both
`humans and chimpanzees, only seven were confirmed. When only
`PCR-confirmed sequence data are used to estimate the mtDNA tree
`(Fig. 3), the Neanderthal branch has a length comparable to that of
`contemporary humans. This suggests that no large source of errors
`other than what is detected by the PCR analysis affects the sequences.
`Using these PCR-confirmed substitutions and a divergence time
`between humans and chimpanzees of 4.7–8.4 million years40–42, we
`estimate the divergence time for the mtDNA fragments determined
`here to be 461,000–825,000 years. This is in general agreement with
`previous estimates of Neanderthal–human mtDNA divergence of
`317,000–741,000 years6 based on mtDNA hypervariable region
`sequences and is compatible with our presumption that the mtDNA
`sequences determined from the Vi-80 extract are of Neanderthal origin.
`
[Figure 2 chart labels (unique reads; share of all reads): No hit (200,829; 79.0%); Actinomycetales (17,213; 6.8%); Primates (15,701; 6.2%); env (8,408; 3.3%); All other orders (6,559; 2.6%); Burkholderiales (1,912; 0.8%); Pseudomonadales (1,470; 0.6%); Rhizobiales (1,230; 0.5%); Enterobacteriales (788; 0.3%); Poales (429; 0.2%); Rhodocyclales (394; 0.2%).]

Figure 2 | Taxonomic distribution of DNA sequences from the Vi-80
extract. The taxonomic order of the database sequence giving the best
alignment for each unique sequence read was determined. The most
populous taxonomic orders are shown.

[Figure 3 tree labels: 311 modern humans; Vindija-80 Neanderthal.]

Figure 3 | Schematic tree relating the Vi-80 Neanderthal mtDNA
sequences to 311 human mtDNA sequences. The Neanderthal branch
length is given with uncorrected sequences (red triangle) and after
correction of sequences via independent PCRs (black triangle). Chimpanzee
and bonobo sequences (not shown) were used to root the neighbour-joining
tree. Several substitution models (Kimura 2-parameter, Tajima-Nei, and
Tamura 3-parameter with uniform or gamma-distributed (γ = 0.5–1.1)
rates) yielded bootstrap support values for the human branch of 72–83%.

Nuclear DNA sequences
We next analysed the sequence reads whose closest matches are to the
human or chimpanzee nuclear genomes and that are at least 30 base
pairs long. Figure 4 shows where they map to the human karyotype
`(see Supplementary Methods). Overall, 0.04% of the autosomal gen-
`ome sequence is covered by the Neanderthal reads—on average 3.61
`bases per 10,000 bases. Both X and Y chromosomes are represented,
`with a lower coverage of 2.18 and 1.62 bases per 10,000, respectively,
`showing that the Vi-80 bone is derived from a male individual.
`The data presented in Fig. 4 show that when the hit density for
`sequences that have a single best hit in the human genome is plotted
`along the chromosomes, several suggestive local deviations from the
`average hit density are seen, which may represent copy-number
`differences in the Neanderthal relative to the human reference gen-
`ome. For comparison, we generated 454 sequence data from a DNA
`sample from a modern human. Interestingly, some of the deviations
`seen in the Neanderthal are present also in the modern human,
`whereas others are not. The latter group of sequences may indicate
`copy-number differences that are unique to the Neanderthal relative
`to the modern human genome sequence. Thus, when more
`Neanderthal sequence is generated in the future, it may be possible
`to determine copy number differences between the Neanderthal, the
`chimpanzee and the human genomes.
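The hit-density profile discussed above can be sketched as a sliding-window calculation. The following is a minimal illustration, assuming hits are summarized by a single coordinate each; the 3-megabase window comes from the Figure 4 caption, whereas the 1-megabase step, the function name and the toy data are assumptions.

```python
import math

def window_log2_density(hit_positions, chrom_length, window=3_000_000, step=1_000_000):
    """Return (window_start, log2(window density / chromosome average)) pairs.

    Windows without hits are reported as None because log2(0) is undefined.
    """
    if not hit_positions:
        raise ValueError("no hits on this chromosome")
    chrom_avg = len(hit_positions) / chrom_length  # hits per base
    results = []
    for start in range(0, max(chrom_length - window, 0) + 1, step):
        end = start + window
        n_hits = sum(1 for p in hit_positions if start <= p < end)
        density = n_hits / window
        ratio = math.log2(density / chrom_avg) if n_hits else None
        results.append((start, ratio))
    return results

# Toy chromosome of 6 Mb (hypothetical): hits are twice as dense in the second half.
toy_hits = [500_000, 1_500_000, 2_500_000,
            3_250_000, 3_750_000, 4_250_000, 4_750_000, 5_250_000, 5_750_000]
profile = window_log2_density(toy_hits, chrom_length=6_000_000)
for start, value in profile:
    print(start, value)
```

On this scale a value of +1 corresponds to twice the chromosome average and -1 to half of it, which is how candidate copy-number differences would stand out.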
`
`Patterns of nucleotide change on lineages
`We generated three-way alignments between all Neanderthal
`sequences that map uniquely within the human genome and the
`corresponding human and chimpanzee genome sequences (see
`
`Supplementary Methods). An important artefact of local sequence
`alignments, such as those produced here, is that they necessarily
`begin and end with regions of exact sequence identity. The size of
these regions is a function of the scoring parameters for the alignment.
In this case, five bases at both ends of the alignments, amounting
to ~14% of all data, needed to be removed (Supplementary Fig.
1) to eliminate biases in estimates of sequence divergence.
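The end-trimming step can be sketched as follows. This is a minimal illustration, assuming each three-way alignment is held as equal-length strings; the function name and the example sequences are hypothetical.

```python
def trim_alignment_ends(rows, n_trim=5):
    """Drop n_trim columns from each end of an alignment given as equal-length strings."""
    lengths = {len(row) for row in rows}
    if len(lengths) != 1:
        raise ValueError("all rows must have the same length")
    (length,) = lengths
    if length <= 2 * n_trim:
        return ["" for _ in rows]  # nothing is left after trimming
    return [row[n_trim:length - n_trim] for row in rows]

# Hypothetical three-way alignment: Neanderthal, human, chimpanzee.
alignment = [
    "ACGTACGTTAGCTAGGCTAACGTAGC",
    "ACGTACGTCAGCTAGGCTAACGTAGC",
    "ACGTACGTCAGCTAGACTAACGTAGC",
]
trimmed = trim_alignment_ends(alignment)
print(trimmed)
```

Removing the forced-identity ends before counting differences keeps the terminal columns from deflating the divergence estimate.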
`Each autosomal nucleotide position in the alignment that did not
`contain a deletion in the Neanderthal, the human or the chimpanzee
`sequences and was associated with a chimpanzee genome position
with quality score ≥30 was classified according to which species share
`the same bases (Fig. 5). A total of 736,941 positions contained the
`same base in all three groups. The next largest category comprises
`10,167 positions in which the human and Neanderthal base are ident-
`ical, but the chimpanzee base is different. These positions are likely to
`have changed either on the hominid lineage before the divergence
`between human and Neanderthal sequences or on the chimpanzee
`lineage. At 3,447 positions, the Neanderthal base differs from both
`the human and chimpanzee bases, which are identical to each other.
`As suggested by the analysis of the mtDNA sequences, this category
`contains positions that have changed on the Neanderthal lineage, as
well as a large proportion of errors that derive both from base damage
that has accumulated in the ancient DNA and from sequencing
`errors. At 434 positions, the human base differs from both the
`Neanderthal and chimpanzee bases, which are identical to each other.
These positions are likely to have changed on the human lineage after
the divergence from Neanderthal. Finally, a total of 51 positions
contain different bases in all three groups.

[Figure 4 plot labels: panel note 'Example of region with apparent 2X hit density'; scale bar 25 megabases. Average Neanderthal bases per 10,000 by chromosome: Chr 1, 3.79; Chr 2, 3.87; Chr 3, 3.66; Chr 4, 3.48; Chr 5, 3.27; Chr 6, 3.44; Chr 7, 3.44; Chr 8, 3.67; Chr 9, 3.17; Chr 10, 3.80; Chr 11, 3.72; Chr 12, 3.48; Chr 13, 3.59; Chr 14, 3.52; Chr 15, 3.54; Chr 16, 4.06; Chr 17, 3.69; Chr 18, 3.39; Chr 19, 4.19; Chr 20, 3.89; Chr 21, 2.94; Chr 22, 3.81; Chr X, 2.18; Chr Y, 1.62. Legend entries: non-uniquely mapping hits; uniquely mapping hits; 2X chr avg; chr avg hit density; 0.5X chr avg; cytogenetic band; gaps in human reference sequence; chr avg bases per 10,000; human hit density; Neanderthal hit density.]

Figure 4 | Location on the human karyotype of Neanderthal DNA
sequences. All sequences longer than 30 nucleotides whose best alignments
were to the human genome are shown. The blue lines above each
chromosome mark the position of all alignments that are unique in terms of
bit-score within the human genome. Orange lines are alignments that have
more than one alignment of equal bit-score. To the left of each chromosome,
the average number of Neanderthal bases per 10,000 is given. Lines
(Neanderthal, blue; human, red) within each chromosome show the hit
density, on a log-base 2 scale, within sliding windows of 3 megabases along
each chromosome. The centre black lines indicate the average hit-density for
the chromosomes. The purple lines above and below indicate hit densities of
2X and 1/2X the chromosome average, respectively. On chromosome 5, an
example of a region of increased sequence density is highlighted. Sequence
gaps in the human reference sequence are indicated by dark grey regions.
Chromosomal banding pattern is indicated by light grey regions.

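The classification of aligned positions described in this section can be sketched as a per-column comparison. The following is a minimal illustration, assuming gap characters are written as '-' and omitting the chimpanzee quality-score filter; the function names, category labels and toy sequences are hypothetical.

```python
from collections import Counter

def classify_column(neanderthal, human, chimp):
    """Assign one aligned position to a category by which species share a base."""
    if neanderthal == human == chimp:
        return "all identical"
    if human == neanderthal:
        return "chimpanzee differs"   # change on the chimpanzee or shared hominid lineage
    if human == chimp:
        return "Neanderthal differs"  # Neanderthal-specific change or error
    if neanderthal == chimp:
        return "human differs"        # change on the human lineage
    return "all different"

def classify_alignment(neanderthal, human, chimp):
    """Tally categories over gap-free columns of a three-way alignment."""
    counts = Counter()
    for column in zip(neanderthal, human, chimp):
        if "-" in column:
            continue  # skip positions with a deletion in any sequence
        counts[classify_column(*column)] += 1
    return counts

counts = classify_alignment(
    "CGTTAGCTAG-CTAAC",  # Neanderthal (toy)
    "CGTCAGCTAGGCTAGC",  # human (toy)
    "CGTCAGCAAGACTGAC",  # chimpanzee (toy)
)
print(dict(counts))
```

Applied to every trimmed alignment, a tally of this kind yields the five category totals reported in the text.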
`Because the 454 sequencing technology allows the base in a base
`pair from which a sequence is derived to be determined, the relative
`frequencies of each of the 12 possible categories of base changes can
`be estimated for each evolutionary lineage. As seen in Fig. 5, the
`patterns of the chimpanzee-specific and human-specific changes
`are similar to each other in that the eight transversional changes
`are of approximately equal frequency and about fourfold less fre-
`quent than each of the four transitional changes, yielding a transition
`to transversion ratio of 2.04, typical of closely related mammalian
`genomes43. For the Neanderthal-specific changes the pattern is very
`different in that mismatches are dominated by C to T and G to A
`differences. Thus, the pattern of change seen among the Neanderthal-
`specific alignment mismatches is typical of the nucleotide substi-
`tution pattern observed in PCR of ancient DNA.
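The quoted transition to transversion ratio follows from the relative class frequencies: four transition classes, each about four times as frequent as each of eight transversion classes, give roughly 16/8 = 2. The following is a minimal sketch with idealised, hypothetical counts; the two-letter labels follow the Figure 5 axis labels.

```python
TRANSITIONS = {"CT", "TC", "GA", "AG"}

def ts_tv_ratio(change_counts):
    """Transition/transversion ratio from counts keyed by two-letter change labels."""
    transitions = sum(n for change, n in change_counts.items() if change in TRANSITIONS)
    transversions = sum(n for change, n in change_counts.items() if change not in TRANSITIONS)
    if transversions == 0:
        raise ValueError("no transversions observed")
    return transitions / transversions

# Idealised counts (hypothetical): each transition is four times as frequent
# as each transversion.
labels = ["CT", "GA", "TC", "AG", "TA", "AT", "GC", "CG", "GT", "CA", "TG", "AC"]
idealised = {label: (400 if label in TRANSITIONS else 100) for label in labels}
ratio = ts_tv_ratio(idealised)
print(ratio)
```

The idealised value of 2 is consistent with the reported ratio of 2.04 for the human-specific and chimpanzee-specific changes.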
`Consistent with this, modern human sequences determined by
`454 sequencing show no excess amount of C to T or G to A differences
`(Supplementary Fig. 2), indicating that lesions in the ancient DNA
`rather than sequencing errors account for the majority of the errors in
`the Neanderthal sequences. Assuming that the evolutionary rate of
`DNA change was the same on the Neanderthal and human lineages,
`the majority of observed differences specific to the Neanderthal lin-
`eage are artefacts. All Neanderthal-specific changes were therefore
`disregarded in the subsequent analyses and the Neanderthal
`sequences were used solely to assign changes to the human or chim-
`panzee lineage where the human and chimpanzee genome sequences
`differ and the Neanderthal sequence carries either the human or the
`chimpanzee base.
`
`Genomic divergence between Neanderthals and humans
`Assuming that the rates of DNA sequence change along the chim-
`panzee lineage and the human lineage were similar, it can be esti-
`mated that 8.2% of the DNA sequence changes that have occurred on
`the human lineage since the divergence from the chimpanzee lineage
`occurred after the divergence of the Neanderthal lineage. However,
`although the Neanderthal-specific changes that are heavily influ-
`enced by errors are not used for this analysis, some errors in the
`single-pass sequencing reads from the Neanderthal extract will create
`positions where the Neanderthal is identical either to human or
`chimpanzee sequences, and thus affect the estimates of sequence
`change on the human and chimpanzee lineages. When the effects
`
`of such errors in the Neanderthal sequences are quantified and
removed (see Supplementary Methods), ~7.9% of the sequence
changes along the human lineage are estimated to have occurred after
divergence from the Neanderthal. If the human–chimpanzee divergence
time is set to 6,500,000 years (refs 40, 41, 44), this implies an
average human–Neanderthal DNA sequence divergence time of
~516,000 years. A 95% confidence interval generated by bootstrap
`re-sampling of the alignment data gives a range of 465,000 to
`569,000 years. Obviously, these divergence estimates are dependent
`on the human–chimpanzee divergence time, which is a much larger
`source of uncertainty.
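The arithmetic behind these estimates can be reproduced from the position counts. The following is a minimal sketch, assuming (as the text does) equal rates on the human and chimpanzee lineages, so that half of all human–chimpanzee differences fall on the human lineage. It is a reconstruction that matches the reported figures, not the authors' procedure, which is described in the Supplementary Methods; the function name is hypothetical, and the corrected counts are those shown in Fig. 5.

```python
def fraction_after_split(human_specific, shared_human_neanderthal):
    """Share of human-lineage changes that postdate the Neanderthal split.

    Assumes equal rates on the human and chimpanzee lineages, so half of all
    human-chimpanzee differences are attributed to the human lineage.
    """
    total_differences = human_specific + shared_human_neanderthal
    human_lineage_changes = total_differences / 2
    return human_specific / human_lineage_changes

HUMAN_CHIMP_DIVERGENCE_YEARS = 6_500_000

raw = fraction_after_split(434, 10_167)        # counts as observed
corrected = fraction_after_split(422, 10_208)  # error-corrected counts (Fig. 5)
divergence_years = corrected * HUMAN_CHIMP_DIVERGENCE_YEARS

print(f"raw: {raw:.1%}, corrected: {corrected:.1%}, "
      f"divergence: {divergence_years:,.0f} years")
```

Because the result scales linearly with the assumed human–chimpanzee divergence time, any change in that calibration shifts the human–Neanderthal estimate proportionally.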
`We analysed the DNA sequences generated from a contemporary
`human using the same sequencing protocol as was used for the
`Neanderthal. Although ancient DNA is degraded and damaged, this
`comparison controls for many of the aspects of the analysis including
sequencing and alignment methodology. In this case, ~7.1% of
the divergence along the human lineage is assigned to the time
subsequent to the divergence of the two human sequences. The average
divergence time between alleles within humans is thus
~459,000 years with a 95% confidence interval between 419,000
`and 498,000 years. As expected, this estimate of the average human
`diversity is less than the divergence seen between the human and the
`Neanderthal sequences, but constitutes a large fraction of it because
`much of the human sequence diversity is expected to predate the
`human–Neanderthal split25. Neanderthal genetic differences to
`humans must therefore be interpreted within the context of human
`diversity.
`
`Ancestral population size
`Humans differ from apes in that their effective population size is of
`the order of 10,000 while those of chimpanzees, gorillas and orang-
`utans are two to four times larger45–47. Furthermore, the population
size of the ancestor of humans and chimpanzees was found to be
similar to those of apes, rather than to humans42,48. The
Neanderthal sequence data now allow us to ask if the effective size
`of the population ancestral to humans and Neanderthals was large, as
`is the case for apes and the human–chimpanzee ancestor, or small, as
`for present-day humans.
`We applied a method42 that co-estimates the ancestral effective
`population size and the split time between Neanderthal and human
`populations (Fig. 6a; see Supplementary Methods). As seen in Fig. 6b,
`we recover a line describing combinations of population sizes and
`split times compatible with the data and lack power to be more
`

[Figure 5 panel labels: Human=Neanderthal=Chimpanzee, 736,941 total (739,966 corrected), shown by Neanderthal base (A, C, G, T); Human=Neanderthal; Chimpanzee different, p+h=10,167 (10,208 corrected); Human=Chimpanzee; Neanderthal different, n=3,447 (422 corrected); Neanderthal=Chimpanzee; Human different, s=434 (422 corrected). Change classes on the horizontal axes: CT, GA, TC, AG, TA, AT, GC, CG, GT, CA, TG, AC. Axis names: Neanderthal base; Aligned base.]

Figure 5 | Schematic tree illustrating the number of nucleotide changes
inferred to have occurred on hominoid lineages. In blue is the distribution
of all aligned positions that did not change on any lineage. In brown are the
changes that occurred either on the chimpanzee lineage (p) or on the
hominid lineage (h) before the human and Neanderthal lineages diverged. In
red are the changes that are unique to the Neanderthal lineage (n), including
all changes due to base-damage and base-calling errors. In yellow are changes
unique to the human lineage. The distributions of types of changes in each
category are also given. The numbers of changes in each category, corrected
for base-calling errors in the Neanderthal sequence (see Supplementary
Methods), are shown within parentheses.

`
`humans carry a single nucleotide polymorphism (SNP). The latter
`case identifies SNPs that were present in the common ancestor of
`Neanderthals and present-day humans. Using the SNPs that overlap
`with our data from two large genome-wide data sets (HapMap49, 786
`SNPs and Perlegen50, 318 SNPs), we find that the Neanderthal sample
has the derived allele in ~30% of all SNPs. This number is
presumably an ov