`
GENOMICS
`Noninvasive Whole-Genome Sequencing of a
`Human Fetus
`Jacob O. Kitzman,1* Matthew W. Snyder,1 Mario Ventura,1,2 Alexandra P. Lewis,1 Ruolan Qiu,1
`LaVone E. Simmons,3 Hilary S. Gammill,3,4 Craig E. Rubens,5,6 Donna A. Santillan,7
`Jeffrey C. Murray,8 Holly K. Tabor,5,9 Michael J. Bamshad,1,5 Evan E. Eichler,1,10 Jay Shendure1*
`
`Analysis of cell-free fetal DNA in maternal plasma holds promise for the development of noninvasive prenatal
`genetic diagnostics. Previous studies have been restricted to detection of fetal trisomies, to specific paternally
`inherited mutations, or to genotyping common polymorphisms using material obtained invasively, for example,
`through chorionic villus sampling. Here, we combine genome sequencing of two parents, genome-wide maternal
`haplotyping, and deep sequencing of maternal plasma DNA to noninvasively determine the genome sequence
of a human fetus at 18.5 weeks of gestation. Inheritance was predicted at 2.8 × 10^6 parental heterozygous sites
`with 98.1% accuracy. Furthermore, 39 of 44 de novo point mutations in the fetal genome were detected, albeit
`with limited specificity. Subsampling these data and analyzing a second family trio by the same approach in-
`dicate that parental haplotype blocks of ~300 kilo–base pairs combined with shallow sequencing of maternal
`plasma DNA is sufficient to substantially determine the inherited complement of a fetal genome. However,
`ultradeep sequencing of maternal plasma DNA is necessary for the practical detection of fetal de novo mutations
`genome-wide. Although technical and analytical challenges remain, we anticipate that noninvasive analysis of
`inherited variation and de novo mutations in fetal genomes will facilitate prenatal diagnosis of both recessive
`and dominant Mendelian disorders.
`
`INTRODUCTION
`On average, ~13% of cell-free DNA isolated from maternal plasma dur-
`ing pregnancy is fetal in origin (1). The concentration of cell-free fetal
`DNA in the maternal circulation varies between individuals, increases
`during gestation, and is rapidly cleared postpartum (2, 3). Despite this
`variability, cell-free fetal DNA has been successfully targeted for non-
`invasive prenatal diagnosis including for development of targeted assays
`for single-gene disorders (4). More recently, several groups have dem-
`onstrated that shotgun, massively parallel sequencing of cell-free DNA
`from maternal plasma is a robust approach for noninvasively diag-
`nosing fetal aneuploidies such as trisomy 21 (5, 6).
`Ideally, it should be possible to noninvasively predict the whole-
`genome sequence of a fetus to high accuracy and completeness, poten-
`tially enabling the comprehensive prenatal diagnosis of Mendelian
`disorders and obviating the need for invasive prenatal diagnostic proce-
`dures such as chorionic villus sampling with their attendant risks. How-
`ever, several key technical obstacles must be overcome for this goal to
`be achieved using cell-free DNA from maternal plasma. First, the sparse
`representation of fetal-derived sequences poses the challenge of detecting
low-frequency alleles inherited from the paternal genome as well as those
arising from de novo mutations in the fetal genome. Second, maternal
DNA predominates in the mother’s plasma, making it difficult to assess
maternally inherited variation at individual sites in the fetal genome.

1Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA.
2Department of Biology, University of Bari, Bari 70126, Italy. 3Department of Obstetrics
and Gynecology, University of Washington, Seattle, WA 98195, USA. 4Division of Clinical
Research, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA.
5Department of Pediatrics, University of Washington School of Medicine, Seattle, WA 98195,
USA. 6Global Alliance to Prevent Prematurity and Stillbirth, an initiative of Seattle
Children’s, Seattle, WA 98101, USA. 7Department of Obstetrics and Gynecology,
University of Iowa Hospitals and Clinics, Iowa City, IA 52242, USA. 8Department of Pediatrics,
University of Iowa, Iowa City, IA 52242, USA. 9Treuman Katz Center for Pediatric
Bioethics, Seattle Children’s Research Institute, Seattle, WA 98101, USA. 10Howard
Hughes Medical Institute, Seattle, WA 98195, USA.
*To whom correspondence should be addressed. E-mail: shendure@uw.edu (J.S.);
kitz@uw.edu (J.O.K.)

`Recently, Lo et al. showed that fetal-derived DNA is distributed
`sufficiently evenly in maternal plasma to support the inference of fetal
`genotypes, and furthermore, they demonstrated how knowledge of
`parental haplotypes could be leveraged to this end (7). However, their
`study was limited in several ways. First, the proposed method depended
`on the availability of parental haplotypes, but at the time of their work,
`no technologies existed to measure these experimentally on a genome-
`wide scale. Therefore, an invasive procedure, chorionic villus sampling,
`was used to obtain placental material for fetal genotyping. Second, pa-
`rental genotypes and fetal genotypes obtained invasively were used to
`infer parental haplotypes. These haplotypes were then used in combi-
`nation with the sequencing of DNA from maternal plasma to predict
`the fetal genotypes. Although necessitated by the lack of genome-wide
`haplotyping methods, the circularity of these inferences makes it diffi-
`cult to assess how well the method would perform in practice. Third,
`their analysis was restricted to several hundred thousand parentally het-
`erozygous sites of common single-nucleotide polymorphisms (SNPs)
`represented on a commercial genotyping array. These common SNPs
`are only a small fraction of the several million heterozygous sites present
`in each parental genome and include few of the rare variants that pre-
`dominantly underlie Mendelian disorders (8). Fourth, Lo et al. did not
`ascertain de novo mutations in the fetal genome. Because de novo mu-
`tations underlie a substantial fraction of dominant genetic disorders,
`their detection is critical for comprehensive prenatal genetic diagnostics.
`Therefore, although the Lo et al. study demonstrated the first successful
`construction of a genetic map of a fetus, it required an invasive proce-
`dure and did not attempt to determine the whole-genome sequence
`of the fetus. We and others recently demonstrated methods for exper-
`imentally determining haplotypes for both rare and common variation
`
`www.ScienceTranslationalMedicine.org
`
`6 June 2012
`
`Vol 4 Issue 137 137ra76
`
`R E S E A R C H A R T I C L E
`
`on a genome-wide scale (9–12). Here, we set out to integrate the haplotype-
`resolved genome sequence of a mother, the shotgun genome sequence
`of a father, and the deep sequencing of cell-free DNA in maternal plasma
`to noninvasively predict the whole-genome sequence of a fetus.
`
`RESULTS
`
`We set out to predict the whole-genome sequence of a fetus in each of
`two mother-father-child trios (I1, a first trio at 18.5 weeks of gestation;
`G1, a second trio at 8.2 weeks of gestation). We focus here primarily on
`the trio for which considerably more sequence data were generated
`(I1) (Table 1).
`In brief, the haplotype-resolved genome sequence of the mother
`(I1-M) was determined by first performing shotgun sequencing of ma-
`ternal genomic DNA from blood to 32-fold coverage (coverage = median-
`fold coverage of mapping reads to the reference genome after discarding
`duplicates). Next, by sequencing complex haploid subsets of maternal
`genomic DNA while preserving long-range contiguity (9), we directly
phased 91.4% of 1.9 × 10^6 heterozygous SNPs into long haplotype
blocks [N50 of 326 kilo–base pairs (kbp)]. The shotgun genome sequence
of the father (I1-P) was determined by sequencing of paternal
genomic DNA to 39-fold coverage, yielding 1.8 × 10^6 heterozygous
`SNPs. However, paternal haplotypes could not be assessed because only
`relatively low–molecular weight DNA obtained from saliva was availa-
`ble. Shotgun DNA sequencing libraries were also constructed from 5 ml
`of maternal plasma (obtained at 18.5 weeks of gestation), and this
`composite of maternal and fetal genomes was sequenced to 78-fold
`nonduplicate coverage. The fetus was male, and fetal content in these
`libraries was estimated at 13% (Fig. 1A). To properly assess the accuracy
`of our methods for determining the fetal genome solely from samples
`obtained noninvasively at 18.5 weeks of gestation, we also performed
`shotgun genome sequencing of the child (I1-C) to 40-fold coverage
`via cord blood DNA obtained after birth.
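Haplotype contiguity is summarized above as an N50 of 326 kbp. As a minimal sketch of how that statistic is conventionally computed, the function below returns the block length L such that blocks of length at least L account for at least half of the total phased length. The function name and the example block lengths are hypothetical, and the text does not state whether the reported N50 was weighted by span or by SNP count.

```python
def n50(block_lengths):
    """Return the N50 of a collection of block lengths (any consistent unit)."""
    lengths = sorted(block_lengths, reverse=True)
    half_total = sum(lengths) / 2.0
    running = 0
    for length in lengths:
        running += length
        # The first block at which the running total reaches half is the N50.
        if running >= half_total:
            return length
    raise ValueError("block_lengths must be non-empty")


# Hypothetical haplotype block lengths in kbp (illustrative values only).
example_blocks_kbp = [900, 500, 326, 200, 100, 50]
print(n50(example_blocks_kbp))
```

Because the statistic is length-weighted, a few long blocks can dominate it, which is why the text also reports the fraction of heterozygous SNPs that were phased.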
`Our analysis comprised four parts: (i) predicting the subset of
`“maternal-only” heterozygous variants (homozygous in the father)
`transmitted to the fetus; (ii) predicting the subset of “paternal-only”
`heterozygous variants (homozygous in the mother) transmitted to
`the fetus; (iii) predicting transmission at sites heterozygous in both par-
`ents; (iv) predicting sites of de novo mutation—that is, variants occurring
`only in the genome of the fetus. Allelic imbalance in maternal plasma,
`manifesting across experimentally determined maternal haplotype
`blocks, was used to predict their maternal transmission (Fig. 1B). The
`observation (or lack thereof) of paternal alleles in shotgun libraries
`derived from maternal plasma was used to predict paternal transmission
`
Table 1. Summary of sequencing. Individuals sequenced, type of
starting material, and final fold coverage of the reference genome after
discarding PCR or optical duplicate reads.

Individual         Sample                                       Depth of coverage
Mother (I1-M)      Plasma (5 ml, gestational age 18.5 weeks)    78
Mother (I1-M)      Whole blood (<1 ml)                          32
Father (I1-P)      Saliva                                       39
Offspring (I1-C)   Cord blood at delivery                       40
`
`(Fig. 1C). Finally, a strict analysis of alleles rarely observed in maternal
`plasma, but never in maternal or paternal genomic DNA, enabled the
`genome-wide identification of candidate de novo mutations (Fig. 1D).
`Fetal genotypes are trivially predicted at sites where the parents are both
`homozygous (for the same or different allele).
`We first sought to predict transmission at maternal-only hetero-
`zygous sites. Given the fetal-derived proportion of ~13% in cell-free
`DNA, the maternal-specific allele is expected in 50% of reads aligned
`to such a site if it is transmitted versus 43.5% if the allele shared with
`the father is transmitted. However, even with 78-fold coverage of the
`maternal plasma “genome,” the variability of sampling is such that site-
`by-site prediction results in only 64.4% accuracy (Fig. 2). We therefore
`examined allelic imbalance across blocks of maternally heterozygous
`sites defined by haplotype-resolved genome sequencing of the mother
`(Fig. 1B). As anticipated given the haplotype assembly N50 of 326 kbp,
`the overwhelming majority of experimentally defined maternal hap-
`lotype blocks were wholly transmitted, with partial inheritance in a
`small minority of blocks (0.6%, n = 72) corresponding to switch errors
`from haplotype assembly and to sites of recombination. We developed a
`hidden Markov model (HMM) to identify likely switch sites and thus
`more accurately infer the inherited alleles at maternally heterozygous
`sites (Figs. 3 and 4 and Supplementary Materials). With the use of this
model, accuracy of the inferred inherited alleles at 1.1 × 10^6 phased,
`maternal-only heterozygous sites increased from 98.6 to 99.3% (Table 2).
`Remaining errors were concentrated among the shortest maternal
`haplotype blocks (fig. S1), which provide less power to detect allelic
`imbalance in plasma DNA data compared with long blocks. Among
`the top 95% of sites ranked by haplotype block length, prediction accu-
`racy rose to 99.7%, suggesting that remaining inaccuracies can be miti-
`gated by improvements in haplotyping.
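The authors' HMM is specified in the Supplementary Materials and is not reproduced here. As a hedged illustration of the idea only, the sketch below implements a generic two-state HMM (haplotype A transmitted versus haplotype B transmitted) with binomial emissions derived from the expected allele fractions quoted above (50% versus 43.5% at 13% fetal content). The per-site switch probability `switch_prob`, the function names, and the toy read counts are assumptions made for illustration.

```python
import math


def expected_hap_a_fraction(hap_a_transmitted, a_is_maternal_specific, fetal_fraction):
    """Expected plasma read fraction of the haplotype A allele at one site."""
    maternal_part = 0.5 * (1.0 - fetal_fraction)  # the mother is heterozygous
    if a_is_maternal_specific:
        # Father is homozygous for the haplotype B allele.
        fetal_a = 0.5 if hap_a_transmitted else 0.0
    else:
        # Father is homozygous for the haplotype A allele.
        fetal_a = 1.0 if hap_a_transmitted else 0.5
    return maternal_part + fetal_fraction * fetal_a


def _log_add(x, y):
    m = max(x, y)
    return m + math.log(math.exp(x - m) + math.exp(y - m))


def posterior_hap_a(sites, fetal_fraction=0.13, switch_prob=1e-3):
    """Forward-backward posterior P(haplotype A transmitted) at each site.

    sites: list of (hap_a_reads, hap_b_reads, a_is_maternal_specific).
    switch_prob: hypothetical per-site probability of a recombination or
    haplotype-assembly switch between adjacent sites.
    """
    if not sites:
        return []
    stay, switch = math.log(1.0 - switch_prob), math.log(switch_prob)
    # emit[i][s]: binomial log-likelihood with the constant term dropped;
    # state 0 = haplotype A transmitted, state 1 = haplotype B transmitted.
    emit = []
    for a, b, a_specific in sites:
        row = []
        for state in (0, 1):
            p = expected_hap_a_fraction(state == 0, a_specific, fetal_fraction)
            row.append(a * math.log(p) + b * math.log(1.0 - p))
        emit.append(row)
    n = len(sites)
    fwd = [[math.log(0.5) + emit[0][s] for s in (0, 1)]]
    for i in range(1, n):
        prev = fwd[-1]
        fwd.append([emit[i][s] + _log_add(prev[s] + stay, prev[1 - s] + switch)
                    for s in (0, 1)])
    bwd = [[0.0, 0.0] for _ in range(n)]
    for i in range(n - 2, -1, -1):
        nxt = bwd[i + 1]
        bwd[i] = [_log_add(stay + emit[i + 1][s] + nxt[s],
                           switch + emit[i + 1][1 - s] + nxt[1 - s])
                  for s in (0, 1)]
    posterior = []
    for i in range(n):
        log_a = fwd[i][0] + bwd[i][0]
        log_b = fwd[i][1] + bwd[i][1]
        posterior.append(math.exp(log_a - _log_add(log_a, log_b)))
    return posterior


# Toy block: 30 sites whose counts match haplotype A, then 30 matching haplotype B.
toy_sites = [(39, 39, True)] * 30 + [(34, 44, True)] * 30
toy_posterior = posterior_hap_a(toy_sites)
print(round(toy_posterior[0], 3), round(toy_posterior[-1], 3))
```

In this toy block the posterior is near 1 at the start and near 0 at the end, which is the behavior a block-wide test cannot capture when a switch falls inside the block.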
`We performed simulations to characterize how the accuracy of
`haplotype-based fetal genotype inference depended on haplotype
`block length, maternal plasma sequencing depth, and the fraction of
`fetal-derived DNA. To mimic the effect of less successful phasing, we
`split the maternal haplotype blocks into smaller fragments to create a
`series of assemblies with decreasing contiguity. We then subsampled a
`range of sequencing depths from the pool of observed alleles in ma-
`ternal plasma and predicted the maternally contributed allele at each
`site as above (Fig. 5A). The results suggest that inference of the inherited
`allele is robust either to decreasing sequencing depth of maternal plasma
`or to shorter haplotype blocks, but not both. For example, using only
`10% of the plasma sequence data (median depth = 8x) in conjunction
`with full-length haplotype blocks, we successfully predicted inheritance
`at 94.9% of maternal-only heterozygous sites. We achieved nearly iden-
`tical accuracy (94.8%) at these sites when highly fragmented haplotype
`blocks (N50 = 50 kbp) were used with the full set of plasma sequences.
`We next simulated decreased proportions of fetal DNA in the maternal
`plasma by spiking in additional depth of both maternal alleles at each
`site and subsampling from these pools, effectively diluting away the sig-
`nal of allelic imbalance used as a signature of inheritance (Fig. 5B).
`Again, we found the accuracy of the model to be robust to either lower
`fetal DNA concentrations or shorter haplotype blocks, but not both.
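The trade-off between block length and sequencing depth can be illustrated with a small Monte Carlo sketch. This is not the authors' procedure, which fragmented real haplotype blocks and subsampled observed plasma reads; it assumes idealized binomial sampling at a fixed depth, maternal-only heterozygous sites, and a block-wide likelihood-ratio decision. All names and parameter values are hypothetical.

```python
import math
import random


def block_prediction_accuracy(sites_per_block, depth, fetal_fraction, trials, seed=0):
    """Fraction of simulated blocks whose transmitted haplotype is called correctly."""
    rng = random.Random(seed)
    p_transmitted = 0.5                              # maternal-specific allele inherited
    p_untransmitted = 0.5 * (1.0 - fetal_fraction)   # shared allele inherited
    # Per-read log-likelihood-ratio weights for specific and shared alleles.
    w_specific = math.log(p_transmitted / p_untransmitted)
    w_shared = math.log((1.0 - p_transmitted) / (1.0 - p_untransmitted))
    correct = 0
    for trial in range(trials):
        transmitted = trial % 2 == 0  # alternate the truth so classes are balanced
        p = p_transmitted if transmitted else p_untransmitted
        llr = 0.0
        for _ in range(sites_per_block):
            k = sum(rng.random() < p for _ in range(depth))  # maternal-specific reads
            llr += k * w_specific + (depth - k) * w_shared
        if (llr > 0) == transmitted:
            correct += 1
    return correct / trials


for n_sites in (1, 10, 50):
    print(n_sites, block_prediction_accuracy(n_sites, depth=78, fetal_fraction=0.13, trials=200))
```

Under these idealized assumptions, single-site accuracy is somewhat higher than the 64.4% observed on real data, plausibly because real coverage varies between sites and reads contain errors; the qualitative point is that pooling sites within a block sharply improves accuracy.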

Fig. 1. Experimental approach. (A) Sequenced individuals in a family trio.
Maternal plasma DNA sequences were ~13% fetal-derived on the basis of
read depth at chromosome Y and alleles specific to each parent. WGS, whole-genome
shotgun. (B) Inheritance of maternally heterozygous alleles inferred
using long haplotype blocks. Among plasma DNA sequences, maternal-specific
alleles are more abundant when transmitted (expected, 50% versus
43.5%), but there is substantial overlap between the distributions of allele
frequencies when considering sites in isolation (left histogram: yellow,
shared allele transmitted; green, maternal-specific allele transmitted).
Taking average allele balances across haplotype blocks (right histogram)
provides much greater separation, permitting more accurate inference of
maternally transmitted alleles. (C) Histogram of fractional read depth
among plasma data at paternal-specific heterozygous sites. In the overwhelming
majority of cases when the allele specific to the father was not
detected, the opposite allele had been transmitted (96.8%, n = 561,552).
(D) De novo missense mutation in the gene ACMSD detected in 3 of 93 maternal
plasma reads and later validated by PCR and resequencing. The mutation,
which is not observed in dbSNP nor among coding exons sequenced
from >4000 individuals as part of the National Heart, Lung, and Blood Institute
Exome Sequencing Project (http://evs.gs.washington.edu), creates
a leucine-to-proline substitution at a site conserved across all aligned
mammalian genomes (University of California, Santa Cruz, Genome Browser)
in a gene implicated in Parkinson’s disease by genome-wide association
studies (25).
[Panel labels: (A) plasma, 87% and 13%; mother, WGS + haplotypes; father, WGS; offspring, WGS
(validation). (B) Dilution pool whole-genome phasing; x axis, plasma read fraction of
maternal-specific alleles. (C) x axis, plasma paternal-specific reads per site. (D) Maternal
plasma reads at chr2:135,596,281: C, 3/93 (3.2%); T, 90/93 (96.8%); p.Leu10Pro.]

We next sought to predict transmission at paternal-only heterozygous
sites. At these sites, when the father transmits the shared allele,
the paternal-specific allele should be entirely absent among the fetal-derived
sequences. If instead the paternal-specific allele is transmitted,
it will on average constitute half the fetal-derived reads within the
maternal plasma genome (about five reads given 78-fold coverage,
assuming 13% fetal content). To assess these, we performed a site-by-site
log-odds test; this amounted to taking the observation of one or more
reads matching the paternal-specific allele at a given site as evidence of
its transmission and, conversely, the lack of such observations as evidence
of nontransmission (Fig. 1C). In contrast to maternal-only heterozygous
sites, this simple site-by-site model was sufficient to correctly predict
inheritance at 1.1 × 10^6 paternal-only heterozygous sites with 96.8%
accuracy (Table 2). We anticipate that accuracy could likely be improved
by deeper sequence coverage of the maternal plasma DNA (fig. S2) or,
alternatively, by taking a haplotype-based approach if high–molecular
weight genomic DNA from the father is available.
`
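The "about five reads" figure for paternal-only heterozygous sites follows from simple arithmetic, sketched below together with the chance of seeing no paternal-specific read at a site where that allele was in fact transmitted. The binomial model with uniform depth is an assumption for illustration; it ignores sequencing error and coverage variation, so it should not be expected to reproduce the observed 96.8% accuracy.

```python
def expected_paternal_specific_reads(depth, fetal_fraction):
    """Expected reads carrying a transmitted paternal-specific allele."""
    # Half of the fetal-derived reads at a heterozygous fetal site carry the allele.
    return depth * fetal_fraction / 2.0


def prob_no_paternal_reads(depth, fetal_fraction):
    """P(zero reads carry the transmitted allele) under a binomial model."""
    return (1.0 - fetal_fraction / 2.0) ** depth


print(round(expected_paternal_specific_reads(78, 0.13), 2))  # 5.07
for depth in (20, 40, 78):
    print(depth, prob_no_paternal_reads(depth, 0.13))
```

The miss probability falls geometrically with depth, which is consistent with the expectation stated in the text that deeper plasma sequencing would improve accuracy at these sites.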
`We next considered transmission at sites heterozygous in both par-
`ents. We predicted maternal transmission at such shared sites phased
`using neighboring maternal-only heterozygous sites in the same hap-
`lotype block. This yielded predictions at 576,242 of 631,721 (91.2%) of
`shared heterozygous sites with an estimated accuracy of 98.7% (Table 2).
`Although we did not predict paternal transmission at these sites, we an-
`ticipate that analogous to the case of maternal transmission, this could be
`done with high accuracy given paternal haplotypes. We note that shared
`heterozygous sites primarily correspond to common alleles (fig. S3),
`which are less likely to contribute to Mendelian disorders in nonconsan-
`guineous populations.
`
`
Fig. 3. HMM-based predictions correctly predict maternally transmitted
alleles across ~1 Mbp on chromosome 10 despite site-to-site variability of
allelic representation among maternal plasma DNA sequences (red).
[Panels A and B; plotted quantities: haplotype A allele frequency (0 to 100%) and per-block
confidence score versus position on chromosome 10 (43.4 to 44.2 Mb).]

Fig. 4. HMM-based detection of recombination events and haplotype
assembly switch errors. A maternal haplotype block of 917 kbp on chromosome
12q is shown, with red points representing the frequency of haplotype
A alleles among plasma reads and the black line indicating the posterior
probability of transmission for haplotype A computed by the HMM at each
site. A block-wide odds ratio (OR) test predicts transmission of the entire
haplotype B, resulting in incorrect prediction at 272 of 587 sites (46.3%).
The HMM predicts a switch between chromosomal coordinates 115,955,900
and 115,978,082, and predicts transmission of haplotype B alleles from the
centromeric end of the block to the switch point, and haplotype A alleles
thereafter, resulting in correct predictions at all 587 sites. All three overlapping
informative clones support the given maternal phasing of the SNPs
adjacent to the switch site, suggesting that the switch predicted by the
HMM results from a maternal recombination event rather than an error
of haplotype assembly.
[Panels A and B; plotted quantities: haplotype A allele frequency (0 to 100%) and probability
that haplotype A was transmitted (OR and HMM) versus position on chromosome 12 (115.6 to
116.2 Mb).]
`
De novo mutations in the fetal genome are expected to appear
within the maternal plasma DNA sequences as “rare alleles” (Fig. 1D),
similar to transmitted paternal-specific alleles. However, the detection
of de novo mutations poses a much greater challenge: Unlike the 1.8 × 10^6
paternally heterozygous sites defined by sequencing the father (of
which ~50% are transmitted), the search space for de novo sites is effectively
the full genome, throughout which there may be only ~60 sites
given a prior mutation rate estimate of ~1 × 10^-8 (13). Indeed, whole-genome
sequencing of the offspring (I1-C) revealed only 44 high-confidence
point mutations (“true de novo sites”; table S1). Taking all
positions in the genome at which at least one plasma-derived read had a
high-quality mismatch to the reference sequence, and excluding variants
present in the parental whole-genome sequencing data, we found
2.5 × 10^7 candidate de novo sites, including 39 of the 44 true de novo
sites. At baseline, this corresponds to sensitivity of 88.6% with a
signal-to-noise ratio of 1-to-6.4 × 10^5.
We applied a series of increasingly stringent filters (fig. S4) intended
to remove sites prone to sequencing or mapping artifacts. We first removed
alleles found in at least one read among any other individual
sequenced in this study, known polymorphisms from dbSNP (release
135), and sites adjacent to 1- to 3-mer repeats, reducing the number of
candidate de novo sites to 1.8 × 10^7. We next filtered out sites with insufficient
evidence (fewer than two independent reads supporting the
variant allele, or variant base qualities summing to less than 105) as well
as those with excessive reads supporting the variant allele (uncorrected
P < 0.05, per-site one-sided binomial test using fetal-derived fraction of
13%), cutting the total number of candidate sites to 3884, including 17
true de novo sites. This candidate set is substantially depleted for sites of
systematic error and is instead likely dominated by errors originating
during polymerase chain reaction (PCR), because even a single round of
amplification with a proofreading DNA polymerase with an error rate
of 1 × 10^-7 would introduce hundreds of false-positive candidate sites.
Notably, this ~2800-fold improvement in signal-to-noise ratio reduced
the candidate set to a size that is an order of magnitude fewer than the
number of candidate de novo sites requiring validation in a previous
study involving pure genomic DNA from parent-child trios within a
nuclear family (14). In a clinical setting, validation efforts would be
targeted to those sites considered most likely to be pathogenic. For example,
only 33 of the 3884 candidate sites (0.84%) are predicted to alter
protein sequence, and only a subset of these are in genes associated with
Mendelian disorders.

Fig. 2. Accuracy of fetal genotype inference from maternal plasma DNA sequencing.
Accuracy is shown for paternal-only heterozygous sites and for
phased maternal-only heterozygous sites, either using maternal phase information
(black) or instead predicting inheritance on a site-by-site basis (gray).
[Rows: paternal, maternal, and maternal without haplotypes, for trio I1 and trio G1; x axis:
prediction accuracy, 50 to 100%.]

Table 2. Accuracy of fetal genome inference. Number of sites and accuracy
of fetal genotype inference from maternal plasma sequencing (percentage
of transmitted alleles correct out of all predicted) by parental genotype and
phasing status. Sites later determined by trio sequencing (including the offspring)
to have poor genotype quality scores or genotypes that violated
Mendelian inheritance were discarded for the purpose of evaluating accuracy
(14,000 maternal-only, 32,233 paternal-only, and 480 shared heterozygous
sites, or 1.5% of all sites). ND, not determined.

Individual      Site                       Other parental genotype   Number of sites   Accuracy (%)
Mother (I1-M)   Heterozygous, phased       Homozygous                1,064,255         99.3
Mother (I1-M)   Heterozygous, phased       Heterozygous              576,242           98.7*
Mother (I1-M)   Heterozygous, not phased   All                       121,425           ND
Father (I1-P)   Heterozygous               Homozygous                1,134,192         96.8
Father (I1-P)   Heterozygous               Heterozygous              631,721           ND

*Among biparentally heterozygous sites, accuracy was assessed only where the offspring was
homozygous (48.8%, n = 631,721), allowing the “true” transmitted alleles to be unambiguously
inferred from trio genotypes.

Fig. 5. Simulation of effects of reduced coverage, haplotype length, and fetal
DNA concentration on fetal genotype inference accuracy, defined as the percentage
of sites at which the inherited allele was correctly identified out of all
sites where prediction was attempted. (A and B) Heat maps of accuracy after in
silico fragmentation of haplotype blocks and (A) shallower sequencing of maternal
plasma or (B) reduced fetal concentration among plasma sequences.
[Axes: haplotype block N50 (full-length, then 325 kb down to 50 kb) versus (A) subsampling
percentage or (B) fetal percentage in plasma; color scale: accuracy, <70% to >98%.]

DISCUSSION

We have demonstrated noninvasive prediction of the whole-genome
sequence of a human fetus through the combination of haplotype-resolved
genome sequencing (9) of a mother, shotgun genome sequencing
of a father, and deep sequencing of maternal plasma DNA.
Notably, the types and quantities of materials used were consistent
with those routinely collected in a clinical setting (Table 1). To replicate
these results, we repeated the full experiment for a second trio (G1) from
which maternal plasma was collected earlier in the pregnancy, at 8.2 weeks
after conception (tables S2 and S3). Both the overall sequencing depth
and the fetal-derived proportion were each lower relative to the first trio
(by 28 and 51%, respectively), resulting in an average of fewer than four
fetal-derived reads per site. Nevertheless, we achieved 95.7% accuracy
for prediction of inheritance at maternal-only sites, consistent with accuracy
obtained under simulation with data from the first trio (Fig. 5).
These results underscore the importance of specific technical parameters
in determining performance, namely, the length and completeness
of haplotype-resolved sequencing of parental DNA, and the depth and
complexity of sequencing libraries derived from low starting masses of
plasma-derived DNA (less than 5 ng for both I1 and G1 in our study).
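The statement that trio G1 averaged fewer than four fetal-derived reads per site can be checked with back-of-envelope arithmetic. The sketch below derives G1's depth and fetal fraction from the relative reductions quoted in the text (28% and 51%) applied to the I1 values (78-fold, 13%); the measured G1 values in tables S2 and S3 are not reproduced here, so these derived numbers are an approximation.

```python
def fetal_reads_per_site(depth, fetal_fraction):
    """Average number of fetal-derived reads covering a site."""
    return depth * fetal_fraction


i1_depth, i1_fetal_fraction = 78.0, 0.13
# Derived from the relative reductions quoted in the text (assumption).
g1_depth = i1_depth * (1.0 - 0.28)
g1_fetal_fraction = i1_fetal_fraction * (1.0 - 0.51)

print(round(fetal_reads_per_site(i1_depth, i1_fetal_fraction), 2))
print(round(fetal_reads_per_site(g1_depth, g1_fetal_fraction), 2))
```

On these derived values, G1 has roughly a third of the fetal-derived signal per site that I1 has, which makes the reported 95.7% accuracy at maternal-only sites a useful test of the haplotype-based approach at low signal.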
`There remain several key avenues for improvement. First, although
we predicted inheritance at 2.8 × 10^6 heterozygous sites with high accuracy
(98.2% overall), there were 7.5 × 10^5 sites for which we did not
attempt prediction (Table 2). These include 6.3 × 10^5 shared sites heterozygous
in both parents for which we could not assess paternal transmission
and 1.2 × 10^5 maternal-only heterozygous sites that were not
`included in our haplotype assembly. The shared sites are in principle
`accessible but require haplotype-resolved (rather than solely shotgun)
`sequencing of paternal DNA, which was not possible here with either
`the I1 or the G1 trio owing to unavailability of high–molecular weight
`DNA from each father. The unphased maternal sites are also in principle
`accessible but require improvements to haplotyping technology to en-
`able phasing of SNPs residing within blocks of relatively low heterozy-
`gosity as well as within segmental duplications. More generally, despite
`recent innovations from our group and others (9–12), there remains a
`critical need for genome-wide haplotyping protocols that are at once
`robust, scalable, and comprehensive. Significant reductions in cost,
`along with standardization and automation, will be necessary for com-
`patibility with large-scale clinical application.
`Second, although we were successful in detecting nearly 90% of de
`novo single-nucleotide mutations by deep sequencing of maternal plas-
`ma DNA, this was with very low specificity. The application of a series
`of filters resulted in a ~2800-fold gain in specificity at a ~2-fold cost in
`terms of sensitivity. However, there is clearly room for improvement if
`we are to enable the sensitive and specific prenatal detection of poten-
`tially pathogenic de novo point mutations at a genome-wide scale, a goal
`that will likely require deeper than 78-fold coverage of the maternal
`plasma genome (15) in combination with targeted validation of poten-
`tially pathogenic candidate de novo mutations.
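The sensitivity and specificity figures for de novo detection can be reproduced from the counts reported in the Results (39 of 44 true sites among 2.5 × 10^7 candidates before filtering; 17 true sites among 3884 candidates after). The sketch below also illustrates the read-level evidence filters. The function names are hypothetical, and using half the fetal fraction as the expected variant-allele fraction in the binomial test is an assumption, since the text specifies only that a fetal-derived fraction of 13% was used.

```python
import math


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1.0 - p)**(n - i) for i in range(k, n + 1))


def passes_read_filters(variant_reads, depth, quality_sum, fetal_fraction=0.13,
                        min_reads=2, min_quality_sum=105, alpha=0.05):
    """Apply the evidence filters described in the text to one candidate site."""
    if variant_reads < min_reads or quality_sum < min_quality_sum:
        return False  # insufficient evidence
    # Assumption: a de novo variant is heterozygous in the fetus only.
    expected_fraction = fetal_fraction / 2.0
    # Reject sites with more variant reads than a fetal-only variant would explain.
    return binomial_upper_tail(variant_reads, depth, expected_fraction) >= alpha


baseline_true_fraction = 39 / 2.5e7
filtered_true_fraction = 17 / 3884
print(round(39 / 44 * 100, 1))                                  # baseline sensitivity, %
print(round(filtered_true_fraction / baseline_true_fraction))   # fold gain in signal to noise
print(round(39 / 17, 1))                                        # fold loss in sensitivity

# Read counts from Fig. 1D (3 of 93 reads); the quality sum of 110 is hypothetical.
print(passes_read_filters(variant_reads=3, depth=93, quality_sum=110))
```

The fold gain computed this way agrees with the ~2800-fold figure, and the sensitivity loss is a little over twofold, matching the trade-off described above.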
`
`Third, our analyses focused exclusively on single-nucleotide var-
`iants, which are by far the most common form of both nonpathogenic
`and pathogenic genetic variation in human genomes (16, 17). Clinical
`application of noninvasive fetal genome sequencing will require more
`robust methods for detecting other forms of variation, for example,
`insertion-deletions, copy number changes, repeat expansions, and struc-
`tural rearrangements. Ideally, techniques for the detection of other forms
`of variation could derive from short sequencing reads in a manner that is
`directly integrated with experimental methods and algorithms for haplotype-
`resolved genome sequencing (18).
`The ability to noninvasively sequence a fetal genome to high accu-
`racy and completeness will undoubtedly have profound implications for
`the future of prenatal genetic diagnostics. Although individually rare,
`when considered collectively, the ~3500 Mendelian disorders with a
`known molecular basis (19) contribute substantially to morbidity and
`mortality (20). Currently, routine obstetric practice includes offering a
`spectrum of screening and diagnostic options to all women. Prenatal
`screening options have imperfect sensitivity and focus mainly on a small
`number of specific disorders, including trisomies, major congenital
`anomalies, and specific Mendelian disorders. Diagnostic tests, generally
`performed through invasive procedures, such as chorionic villus sampling
`and amniocentesis, also focus on specific disorders and confer risk of
`pregnancy loss that may inversely correlate with access to high-quality
`care. Noninvasive, comprehensive diagnosis of Mendelian disorders
`early in pregnancy would provide much more information to expectant
`parents, with the greater accessibility inherent to a noninvasive test and
`without tangible risk of pregnancy loss. The less tangible implication of
`incorporating this level of information into prenatal decision-making
`raises many ethical questions that must be considered carefully within
`the scientific community and on a societal level. A final point is that as
`in other areas of clinical genetics, our capacity to generate data is out-
`stripping our ability to interpret it in ways that are useful to physicians
`and patients. That is, although the noninvasive prediction of a fetal ge-
`nome may be technically feasible, its interpretation—even for known
`Mendelian disorders—will remain a major challenge (21).
`
`MATERIALS AND METHODS
`
`Whole-genome shotgun library preparation and sequencing
`Genomic DNA was extracted from whole blood, as available, or alter-
`natively from saliva, with the Gentra Puregene Kit (Qiagen) or Oragene
`Dx (DNA Genotek), respectively. Purified DNA was fragmented by
`sonication with a Covaris S2 instrument. Indexed shotgun sequencing
`libraries were prepared with the KAPA Library Preparation Kit (Kapa
`Biosystems), following the manufacturer’s instructions. All libraries
`were sequenced on HiSeq 2000 instruments (Illumina) using paired-
`end 101-bp reads with an index read of 9 bp.
`
`Maternal plasma library preparation and sequencing
`Maternal plasma was collected by standard methods and split into 1-ml
`aliquots, which were individually purified with the QIAamp Circu-
`lating Nucleic Acid kit (Qiagen). DNA yield was measured with a
`Qubit fluorometer (Invitrogen). Sequencing libraries were prepared
`with the ThruPLEX-FD kit (Rubicon Genomics), comprising a pro-
`prietary series of end-repair, ligation, and amplification reactions.
`Index read sequencing primers compatible with the whole-genome
`sequencing and fosmid libraries from this study were included during
`sequencing of maternal plasma libraries to permit detection of any
`contamination from other libraries. The percentage of fetal-derived
`sequences was estimated from plasma sequences by counting alleles
`specific to each parent as well as sequences mapping specifically to
`the Y chromosome (fig. S5).
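The estimator behind fig. S5 is not described in this section. As a hedged sketch of one common way to turn the two signals named above into a fetal fraction: at sites where the parents are homozygous for different alleles the fetus is an obligate heterozygote, so the paternal allele should make up half the fetal-derived reads; and for a male fetus, chromosome Y depth should be half the fetal share of autosomal depth. Function names and the example counts are hypothetical.

```python
def fetal_fraction_from_paternal_alleles(paternal_allele_reads, total_reads):
    """Estimate from sites where the parents are homozygous for different alleles."""
    # The paternal allele is carried by half of the fetal-derived reads.
    return 2.0 * paternal_allele_reads / total_reads


def fetal_fraction_from_chr_y(y_depth, autosomal_depth):
    """Estimate for a male fetus: one Y copy versus two autosomal copies."""
    return 2.0 * y_depth / autosomal_depth


# Hypothetical pooled counts chosen to illustrate a ~13% fetal fraction.
print(fetal_fraction_from_paternal_alleles(650, 10000))
print(fetal_fraction_from_chr_y(5.07, 78.0))
```

Agreement between the two estimates is a useful consistency check, since the Y-based estimate depends on mappability of Y-specific sequence while the allele-based estimate depends on genotype accuracy in the parents.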
`
`Maternal haplotype resolution via clone pool
`dilution sequencing
`Haplotype-resolved genome sequencing was performed essentially as
`previously described (9), with minor updates to facilitate processing
`in a 96-well format. Briefly, high–molecular weight DNA was mechan-
`ically sheared to mean size of ~38 kbp using a HydroShear instrument
(Digilab), with the following settings: volume = 120 µl, cycles = 20, speed
`code = 16. Sheared DNA was electrophoresed throug