RESEARCH ARTICLE

GENOMICS

Noninvasive Whole-Genome Sequencing of a Human Fetus

Jacob O. Kitzman,1* Matthew W. Snyder,1 Mario Ventura,1,2 Alexandra P. Lewis,1 Ruolan Qiu,1 LaVone E. Simmons,3 Hilary S. Gammill,3,4 Craig E. Rubens,5,6 Donna A. Santillan,7 Jeffrey C. Murray,8 Holly K. Tabor,5,9 Michael J. Bamshad,1,5 Evan E. Eichler,1,10 Jay Shendure1*

Analysis of cell-free fetal DNA in maternal plasma holds promise for the development of noninvasive prenatal genetic diagnostics. Previous studies have been restricted to detection of fetal trisomies, to specific paternally inherited mutations, or to genotyping common polymorphisms using material obtained invasively, for example, through chorionic villus sampling. Here, we combine genome sequencing of two parents, genome-wide maternal haplotyping, and deep sequencing of maternal plasma DNA to noninvasively determine the genome sequence of a human fetus at 18.5 weeks of gestation. Inheritance was predicted at 2.8 × 10^6 parental heterozygous sites with 98.1% accuracy. Furthermore, 39 of 44 de novo point mutations in the fetal genome were detected, albeit with limited specificity. Subsampling these data and analyzing a second family trio by the same approach indicate that parental haplotype blocks of ~300 kilo–base pairs combined with shallow sequencing of maternal plasma DNA is sufficient to substantially determine the inherited complement of a fetal genome. However, ultradeep sequencing of maternal plasma DNA is necessary for the practical detection of fetal de novo mutations genome-wide. Although technical and analytical challenges remain, we anticipate that noninvasive analysis of inherited variation and de novo mutations in fetal genomes will facilitate prenatal diagnosis of both recessive and dominant Mendelian disorders.

INTRODUCTION
On average, ~13% of cell-free DNA isolated from maternal plasma during pregnancy is fetal in origin (1). The concentration of cell-free fetal DNA in the maternal circulation varies between individuals, increases during gestation, and is rapidly cleared postpartum (2, 3). Despite this variability, cell-free fetal DNA has been successfully targeted for noninvasive prenatal diagnosis including for development of targeted assays for single-gene disorders (4). More recently, several groups have demonstrated that shotgun, massively parallel sequencing of cell-free DNA from maternal plasma is a robust approach for noninvasively diagnosing fetal aneuploidies such as trisomy 21 (5, 6).
Ideally, it should be possible to noninvasively predict the whole-genome sequence of a fetus to high accuracy and completeness, potentially enabling the comprehensive prenatal diagnosis of Mendelian disorders and obviating the need for invasive prenatal diagnostic procedures such as chorionic villus sampling with their attendant risks. However, several key technical obstacles must be overcome for this goal to be achieved using cell-free DNA from maternal plasma. First, the sparse representation of fetal-derived sequences poses the challenge of detecting low-frequency alleles inherited from the paternal genome as well as those

1Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA. 2Department of Biology, University of Bari, Bari 70126, Italy. 3Department of Obstetrics and Gynecology, University of Washington, Seattle, WA 98195, USA. 4Division of Clinical Research, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA. 5Department of Pediatrics, University of Washington School of Medicine, Seattle, WA 98195, USA. 6Global Alliance to Prevent Prematurity and Stillbirth, an initiative of Seattle Children’s, Seattle, WA 98101, USA. 7Department of Obstetrics and Gynecology, University of Iowa Hospitals and Clinics, Iowa City, IA 52242, USA. 8Department of Pediatrics, University of Iowa, Iowa City, IA 52242, USA. 9Treuman Katz Center for Pediatric Bioethics, Seattle Children’s Research Institute, Seattle, WA 98101, USA. 10Howard Hughes Medical Institute, Seattle, WA 98195, USA.
*To whom correspondence should be addressed. E-mail: shendure@uw.edu (J.S.); kitz@uw.edu (J.O.K.)

arising from de novo mutations in the fetal genome. Second, maternal DNA predominates in the mother’s plasma, making it difficult to assess maternally inherited variation at individual sites in the fetal genome.
Recently, Lo et al. showed that fetal-derived DNA is distributed sufficiently evenly in maternal plasma to support the inference of fetal genotypes, and furthermore, they demonstrated how knowledge of parental haplotypes could be leveraged to this end (7). However, their study was limited in several ways. First, the proposed method depended on the availability of parental haplotypes, but at the time of their work, no technologies existed to measure these experimentally on a genome-wide scale. Therefore, an invasive procedure, chorionic villus sampling, was used to obtain placental material for fetal genotyping. Second, parental genotypes and fetal genotypes obtained invasively were used to infer parental haplotypes. These haplotypes were then used in combination with the sequencing of DNA from maternal plasma to predict the fetal genotypes. Although necessitated by the lack of genome-wide haplotyping methods, the circularity of these inferences makes it difficult to assess how well the method would perform in practice. Third, their analysis was restricted to several hundred thousand parentally heterozygous sites of common single-nucleotide polymorphisms (SNPs) represented on a commercial genotyping array. These common SNPs are only a small fraction of the several million heterozygous sites present in each parental genome and include few of the rare variants that predominantly underlie Mendelian disorders (8). Fourth, Lo et al. did not ascertain de novo mutations in the fetal genome. Because de novo mutations underlie a substantial fraction of dominant genetic disorders, their detection is critical for comprehensive prenatal genetic diagnostics. Therefore, although the Lo et al. study demonstrated the first successful construction of a genetic map of a fetus, it required an invasive procedure and did not attempt to determine the whole-genome sequence of the fetus. We and others recently demonstrated methods for experimentally determining haplotypes for both rare and common variation

www.ScienceTranslationalMedicine.org    6 June 2012    Vol 4 Issue 137 137ra76
on a genome-wide scale (9–12). Here, we set out to integrate the haplotype-resolved genome sequence of a mother, the shotgun genome sequence of a father, and the deep sequencing of cell-free DNA in maternal plasma to noninvasively predict the whole-genome sequence of a fetus.

RESULTS

We set out to predict the whole-genome sequence of a fetus in each of two mother-father-child trios (I1, a first trio at 18.5 weeks of gestation; G1, a second trio at 8.2 weeks of gestation). We focus here primarily on the trio for which considerably more sequence data were generated (I1) (Table 1).
In brief, the haplotype-resolved genome sequence of the mother (I1-M) was determined by first performing shotgun sequencing of maternal genomic DNA from blood to 32-fold coverage (coverage = median-fold coverage of mapping reads to the reference genome after discarding duplicates). Next, by sequencing complex haploid subsets of maternal genomic DNA while preserving long-range contiguity (9), we directly phased 91.4% of 1.9 × 10^6 heterozygous SNPs into long haplotype blocks [N50 of 326 kilo–base pairs (kbp)]. The shotgun genome sequence of the father (I1-P) was determined by sequencing of paternal genomic DNA to 39-fold coverage, yielding 1.8 × 10^6 heterozygous SNPs. However, paternal haplotypes could not be assessed because only relatively low–molecular weight DNA obtained from saliva was available. Shotgun DNA sequencing libraries were also constructed from 5 ml of maternal plasma (obtained at 18.5 weeks of gestation), and this composite of maternal and fetal genomes was sequenced to 78-fold nonduplicate coverage. The fetus was male, and fetal content in these libraries was estimated at 13% (Fig. 1A). To properly assess the accuracy of our methods for determining the fetal genome solely from samples obtained noninvasively at 18.5 weeks of gestation, we also performed shotgun genome sequencing of the child (I1-C) to 40-fold coverage via cord blood DNA obtained after birth.
Our analysis comprised four parts: (i) predicting the subset of “maternal-only” heterozygous variants (homozygous in the father) transmitted to the fetus; (ii) predicting the subset of “paternal-only” heterozygous variants (homozygous in the mother) transmitted to the fetus; (iii) predicting transmission at sites heterozygous in both parents; (iv) predicting sites of de novo mutation, that is, variants occurring only in the genome of the fetus. Allelic imbalance in maternal plasma, manifesting across experimentally determined maternal haplotype blocks, was used to predict their maternal transmission (Fig. 1B). The observation (or lack thereof) of paternal alleles in shotgun libraries derived from maternal plasma was used to predict paternal transmission

Table 1. Summary of sequencing. Individuals sequenced, type of starting material, and final fold coverage of the reference genome after discarding PCR or optical duplicate reads.

Individual         Sample                                       Depth of coverage
Mother (I1-M)      Plasma (5 ml, gestational age 18.5 weeks)    78
Mother (I1-M)      Whole blood (<1 ml)                          32
Father (I1-P)      Saliva                                       39
Offspring (I1-C)   Cord blood at delivery                       40

(Fig. 1C). Finally, a strict analysis of alleles rarely observed in maternal plasma, but never in maternal or paternal genomic DNA, enabled the genome-wide identification of candidate de novo mutations (Fig. 1D). Fetal genotypes are trivially predicted at sites where the parents are both homozygous (for the same or different allele).
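The four site classes used in the analysis, plus the trivial case where both parents are homozygous, can be sketched as a small classifier over parental genotypes. This is only an illustration: the function names, the category labels, and the encoding of a genotype as a pair of allele strings are assumptions made here, not the authors' code.

```python
def classify_site(mother_gt, father_gt):
    """Classify a biallelic site by parental genotypes.

    Genotypes are 2-tuples of allele strings, e.g. ("A", "G").
    The returned labels are hypothetical names for the categories in the text.
    """
    mother_het = mother_gt[0] != mother_gt[1]
    father_het = father_gt[0] != father_gt[1]
    if mother_het and father_het:
        return "shared_heterozygous"
    if mother_het:
        return "maternal_only_heterozygous"
    if father_het:
        return "paternal_only_heterozygous"
    return "both_homozygous"


def trivial_fetal_genotype(mother_gt, father_gt):
    """Fetal genotype when both parents are homozygous: one allele from each."""
    if classify_site(mother_gt, father_gt) != "both_homozygous":
        raise ValueError("only defined when both parents are homozygous")
    return tuple(sorted((mother_gt[0], father_gt[0])))
```

Under this encoding, a site with a homozygous C/C mother and a homozygous T/T father yields an obligate C/T fetus, ignoring de novo mutation.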
We first sought to predict transmission at maternal-only heterozygous sites. Given the fetal-derived proportion of ~13% in cell-free DNA, the maternal-specific allele is expected in 50% of reads aligned to such a site if it is transmitted versus 43.5% if the allele shared with the father is transmitted. However, even with 78-fold coverage of the maternal plasma “genome,” the variability of sampling is such that site-by-site prediction results in only 64.4% accuracy (Fig. 2). We therefore examined allelic imbalance across blocks of maternally heterozygous sites defined by haplotype-resolved genome sequencing of the mother (Fig. 1B). As anticipated given the haplotype assembly N50 of 326 kbp, the overwhelming majority of experimentally defined maternal haplotype blocks were wholly transmitted, with partial inheritance in a small minority of blocks (0.6%, n = 72) corresponding to switch errors from haplotype assembly and to sites of recombination. We developed a hidden Markov model (HMM) to identify likely switch sites and thus more accurately infer the inherited alleles at maternally heterozygous sites (Figs. 3 and 4 and Supplementary Materials). With the use of this model, accuracy of the inferred inherited alleles at 1.1 × 10^6 phased, maternal-only heterozygous sites increased from 98.6 to 99.3% (Table 2). Remaining errors were concentrated among the shortest maternal haplotype blocks (fig. S1), which provide less power to detect allelic imbalance in plasma DNA data compared with long blocks. Among the top 95% of sites ranked by haplotype block length, prediction accuracy rose to 99.7%, suggesting that remaining inaccuracies can be mitigated by improvements in haplotyping.
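The expected allele fractions quoted above (50% versus 43.5%) follow directly from the fetal fraction, and a short calculation shows why pooling reads across a haplotype block separates the two cases far better than a single site can. The sketch below is an idealized model, assuming purely binomial read sampling, equal prior odds of transmission, and a midpoint decision threshold; the function names are hypothetical. Real data carry additional noise, so the accuracy observed in the study need not match this idealized figure.

```python
import math


def expected_fraction(fetal_fraction, transmitted):
    """Expected plasma fraction of the maternal-specific allele."""
    maternal_part = 0.5 * (1.0 - fetal_fraction)  # mother is heterozygous
    fetal_part = 0.5 * fetal_fraction if transmitted else 0.0
    return maternal_part + fetal_part


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with each term computed in log space."""
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log(1.0 - p))
        total += math.exp(log_pmf)
    return min(total, 1.0)


def idealized_accuracy(total_reads, fetal_fraction):
    """Accuracy of a midpoint-threshold call, assuming binomial sampling only."""
    p_yes = expected_fraction(fetal_fraction, True)
    p_no = expected_fraction(fetal_fraction, False)
    threshold = math.floor(total_reads * (p_yes + p_no) / 2.0)
    correct_no = binom_cdf(threshold, total_reads, p_no)        # count <= threshold
    correct_yes = 1.0 - binom_cdf(threshold, total_reads, p_yes)  # count > threshold
    return 0.5 * (correct_yes + correct_no)


single_site = idealized_accuracy(78, 0.13)          # one site at 78-fold depth
pooled_block = idealized_accuracy(78 * 100, 0.13)   # 100 phased sites pooled
```

Pooling 100 phased sites is an arbitrary illustration of a long block; the point is only that the two binomial distributions stop overlapping once enough reads are combined.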
We performed simulations to characterize how the accuracy of haplotype-based fetal genotype inference depended on haplotype block length, maternal plasma sequencing depth, and the fraction of fetal-derived DNA. To mimic the effect of less successful phasing, we split the maternal haplotype blocks into smaller fragments to create a series of assemblies with decreasing contiguity. We then subsampled a range of sequencing depths from the pool of observed alleles in maternal plasma and predicted the maternally contributed allele at each site as above (Fig. 5A). The results suggest that inference of the inherited allele is robust either to decreasing sequencing depth of maternal plasma or to shorter haplotype blocks, but not both. For example, using only 10% of the plasma sequence data (median depth = 8x) in conjunction with full-length haplotype blocks, we successfully predicted inheritance at 94.9% of maternal-only heterozygous sites. We achieved nearly identical accuracy (94.8%) at these sites when highly fragmented haplotype blocks (N50 = 50 kbp) were used with the full set of plasma sequences. We next simulated decreased proportions of fetal DNA in the maternal plasma by spiking in additional depth of both maternal alleles at each site and subsampling from these pools, effectively diluting away the signal of allelic imbalance used as a signature of inheritance (Fig. 5B). Again, we found the accuracy of the model to be robust to either lower fetal DNA concentrations or shorter haplotype blocks, but not both.
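The N50 statistic used to describe haplotype contiguity, and the effect of in silico fragmentation on it, can be illustrated with a short sketch. The text does not state how blocks were split, so cutting every block into pieces of at most a fixed length is an assumption made here for illustration, as are the function names.

```python
def n50(block_lengths):
    """Length L such that blocks of length >= L cover at least half the total."""
    total = sum(block_lengths)
    running = 0
    for length in sorted(block_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def fragment_blocks(block_lengths, max_length):
    """Split every block into consecutive pieces no longer than max_length."""
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    pieces = []
    for length in block_lengths:
        full, remainder = divmod(length, max_length)
        pieces.extend([max_length] * full)
        if remainder:
            pieces.append(remainder)
    return pieces


# Toy block lengths in kbp: fragmenting lowers contiguity.
original = [900, 300, 100]
fragmented = fragment_blocks(original, 250)
```

Here `n50(original)` is 900 and `n50(fragmented)` is 250, mirroring how a series of progressively less contiguous assemblies could be produced from one set of phased blocks.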
We next sought to predict transmission at paternal-only heterozygous sites. At these sites, when the father transmits the shared allele, the paternal-specific allele should be entirely absent among the fetal-derived sequences. If instead the paternal-specific allele is transmitted, it will on average constitute half the fetal-derived reads within the maternal plasma genome (about five reads given 78-fold coverage,
[Figure 1: four-panel schematic. Recoverable labels: (A) plasma, 87% maternal and 13% fetal; WGS + haplotypes; WGS; WGS (validation). (B) Dilution pool whole-genome phasing; plasma read fraction of maternal-specific alleles, using individual sites versus using maternal haplotypes. (C) Plasma paternal-specific reads per site, by allele transmitted (shared or paternal-specific). (D) Maternal plasma reads at chr2:135,596,281 (ACMSD, p.Leu10Pro): C in 3/93 (3.2%), T in 90/93 (96.8%).]

Fig. 1. Experimental approach. (A) Sequenced individuals in a family trio. Maternal plasma DNA sequences were ~13% fetal-derived on the basis of read depth at chromosome Y and alleles specific to each parent. WGS, whole-genome shotgun. (B) Inheritance of maternally heterozygous alleles inferred using long haplotype blocks. Among plasma DNA sequences, maternal-specific alleles are more abundant when transmitted (expected, 50% versus 43.5%), but there is substantial overlap between the distributions of allele frequencies when considering sites in isolation (left histogram: yellow, shared allele transmitted; green, maternal-specific allele transmitted). Taking average allele balances across haplotype blocks (right histogram) provides much greater separation, permitting more accurate inference of maternally transmitted alleles. (C) Histogram of fractional read depth among plasma data at paternal-specific heterozygous sites. In the overwhelming majority of cases when the allele specific to the father was not detected, the opposite allele had been transmitted (96.8%, n = 561,552). (D) De novo missense mutation in the gene ACMSD detected in 3 of 93 maternal plasma reads and later validated by PCR and resequencing. The mutation, which is not observed in dbSNP nor among coding exons sequenced from >4000 individuals as part of the National Heart, Lung, and Blood Institute Exome Sequencing Project (http://evs.gs.washington.edu), creates a leucine-to-proline substitution at a site conserved across all aligned mammalian genomes (University of California, Santa Cruz, Genome Browser) in a gene implicated in Parkinson’s disease by genome-wide association studies (25).

assuming 13% fetal content). To assess these, we performed a site-by-site log-odds test; this amounted to taking the observation of one or more reads matching the paternal-specific allele at a given site as evidence of its transmission and, conversely, the lack of such observations as evidence of nontransmission (Fig. 1C). In contrast to maternal-only heterozygous sites, this simple site-by-site model was sufficient to correctly predict inheritance at 1.1 × 10^6 paternal-only heterozygous sites with 96.8% accuracy (Table 2). We anticipate that accuracy could likely be improved by deeper sequence coverage of the maternal plasma DNA (fig. S2) or, alternatively, by taking a haplotype-based approach if high–molecular weight genomic DNA from the father is available.
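The "about five reads" figure and the simple presence or absence rule can be worked through numerically. The sketch below assumes that each plasma read independently carries a transmitted paternal-specific allele with probability equal to half the fetal fraction, and it ignores sequencing error and coverage variation; the function names are hypothetical and this is not the authors' log-odds implementation.

```python
def expected_paternal_specific_reads(depth, fetal_fraction):
    """Mean number of reads carrying a transmitted paternal-specific allele."""
    return depth * fetal_fraction / 2.0


def prob_allele_unobserved(depth, fetal_fraction):
    """Chance that a transmitted paternal-specific allele appears in zero reads."""
    return (1.0 - fetal_fraction / 2.0) ** depth


def call_paternal_transmission(paternal_specific_reads):
    """Presence/absence rule: one or more supporting reads implies transmission."""
    return paternal_specific_reads >= 1


mean_reads = expected_paternal_specific_reads(78, 0.13)
miss_rate = prob_allele_unobserved(78, 0.13)
```

Under these assumptions a transmitted allele is missed at well under 1% of sites at uniform 78-fold depth, so most errors in practice would be expected to come from sites with lower local coverage or from erroneous reads that mimic the paternal-specific allele.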
We next considered transmission at sites heterozygous in both parents. We predicted maternal transmission at such shared sites phased using neighboring maternal-only heterozygous sites in the same haplotype block. This yielded predictions at 576,242 of 631,721 (91.2%) shared heterozygous sites with an estimated accuracy of 98.7% (Table 2). Although we did not predict paternal transmission at these sites, we anticipate that, analogous to the case of maternal transmission, this could be done with high accuracy given paternal haplotypes. We note that shared heterozygous sites primarily correspond to common alleles (fig. S3), which are less likely to contribute to Mendelian disorders in nonconsanguineous populations.
[Figure 3: plot along chromosome 10, positions 43.4 to 44.2 Mb. Axes: haplotype A allele frequency (0 to 100%) and per-block confidence score.]

Fig. 3. HMM-based predictions correctly predict maternally transmitted alleles across ~1 Mbp on chromosome 10 despite site-to-site variability of allelic representation among maternal plasma DNA sequences (red).

[Figure 4: plot along chromosome 12, positions 115.6 to 116.2 Mb. Axes: haplotype A allele frequency (0 to 100%) and probability that haplotype A was transmitted (0.0 to 1.0); predictions labeled OR and HMM.]

Fig. 4. HMM-based detection of recombination events and haplotype assembly switch errors. A maternal haplotype block of 917 kbp on chromosome 12q is shown, with red points representing the frequency of haplotype A alleles among plasma reads and the black line indicating the posterior probability of transmission for haplotype A computed by the HMM at each site. A block-wide odds ratio (OR) test predicts transmission of the entire haplotype B, resulting in incorrect prediction at 272 of 587 sites (46.3%). The HMM predicts a switch between chromosomal coordinates 115,955,900 and 115,978,082, and predicts transmission of haplotype B alleles from the centromeric end of the block to the switch point, and haplotype A alleles thereafter, resulting in correct predictions at all 587 sites. All three overlapping informative clones support the given maternal phasing of the SNPs adjacent to the switch site, suggesting that the switch predicted by the HMM results from a maternal recombination event rather than an error of haplotype assembly.

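The per-site posterior probability described in the Fig. 4 caption can be illustrated with a minimal two-state hidden Markov model run with the forward-backward algorithm. This is a simplified sketch, not the model from the Supplementary Materials: the two states (haplotype A or B transmitted), the binomial emissions derived from the fetal fraction, the uniform prior, and the fixed per-site `switch_prob` are all assumptions made here, and the function names are hypothetical.

```python
import math


def emission_probs(hap_a_is_maternal_specific, fetal_fraction):
    """Expected plasma frequency of the haplotype A allele under each state.

    Returns (p if haplotype A was transmitted, p if haplotype B was transmitted).
    """
    half = fetal_fraction / 2.0
    if hap_a_is_maternal_specific:
        return 0.5, 0.5 - half
    return 0.5 + half, 0.5


def posterior_hap_a(sites, fetal_fraction=0.13, switch_prob=1e-3):
    """Forward-backward posterior P(haplotype A transmitted) at each site.

    sites: list of (hap_a_reads, total_reads, hap_a_is_maternal_specific).
    switch_prob: assumed per-site chance of a recombination or switch error.
    """
    stay = 1.0 - switch_prob
    emissions = []
    for hap_a_reads, total_reads, specific in sites:
        log_like = []
        for p in emission_probs(specific, fetal_fraction):
            log_like.append(hap_a_reads * math.log(p)
                            + (total_reads - hap_a_reads) * math.log(1.0 - p))
        top = max(log_like)  # rescale so the larger likelihood equals 1.0
        emissions.append((math.exp(log_like[0] - top), math.exp(log_like[1] - top)))

    forward = []
    prev_a, prev_b = 0.5, 0.5  # uniform prior over the two haplotypes
    for i, (e_a, e_b) in enumerate(emissions):
        if i == 0:
            a, b = prev_a * e_a, prev_b * e_b
        else:
            a = (prev_a * stay + prev_b * switch_prob) * e_a
            b = (prev_b * stay + prev_a * switch_prob) * e_b
        prev_a, prev_b = a / (a + b), b / (a + b)
        forward.append((prev_a, prev_b))

    backward = [(1.0, 1.0)] * len(sites)
    next_a, next_b = 1.0, 1.0
    for i in range(len(sites) - 2, -1, -1):
        e_a, e_b = emissions[i + 1]
        a = stay * e_a * next_a + switch_prob * e_b * next_b
        b = stay * e_b * next_b + switch_prob * e_a * next_a
        next_a, next_b = a / (a + b), b / (a + b)
        backward[i] = (next_a, next_b)

    return [f_a * b_a / (f_a * b_a + f_b * b_b)
            for (f_a, f_b), (b_a, b_b) in zip(forward, backward)]


# Toy block: haplotype A transmitted for 20 sites, then a switch to haplotype B.
toy_sites = [(39, 78, True)] * 20 + [(34, 78, True)] * 20
toy_posterior = posterior_hap_a(toy_sites)
```

On this toy block the posterior for haplotype A starts near 1 and ends near 0, whereas a single block-wide test would have to assign one haplotype to all 40 sites.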
[Figure 2: bar chart of prediction accuracy (50 to 100%) for Trio I1 and Trio G1; bars for paternal, maternal, and maternal without haplotypes.]

Fig. 2. Accuracy of fetal genotype inference from maternal plasma DNA sequencing. Accuracy is shown for paternal-only heterozygous sites and for phased maternal-only heterozygous sites, either using maternal phase information (black) or instead predicting inheritance on a site-by-site basis (gray).

De novo mutations in the fetal genome are expected to appear within the maternal plasma DNA sequences as “rare alleles” (Fig. 1D), similar to transmitted paternal-specific alleles. However, the detection of de novo mutations poses a much greater challenge: Unlike the 1.8 × 10^6 paternally heterozygous sites defined by sequencing the father (of which ~50% are transmitted), the search space for de novo sites is effectively the full genome, throughout which there may be only ~60 sites given a prior mutation rate estimate of ~1 × 10^-8 (13). Indeed, whole-genome sequencing of the offspring (I1-C) revealed only 44 high-confidence point mutations (“true de novo sites”; table S1). Taking all positions in the genome at which at least one plasma-derived read had a high-quality mismatch to the reference sequence, and excluding variants present in the parental whole-genome sequencing data, we found 2.5 × 10^7 candidate de novo sites, including 39 of the 44 true de novo sites. At baseline, this corresponds to sensitivity of 88.6% with a signal-to-noise ratio of 1-to-6.4 × 10^5.
We applied a series of increasingly stringent filters (fig. S4) intended to remove sites prone to sequencing or mapping artifacts. We first removed alleles found in at least one read among any other individual sequenced in this study, known polymorphisms from dbSNP (release 135), and sites adjacent to 1- to 3-mer repeats, reducing the number of candidate de novo sites to 1.8 × 10^7. We next filtered out sites with insufficient evidence (fewer than two independent reads supporting the variant allele, or variant base qualities summing to less than 105) as well as those with excessive reads supporting the variant allele (uncorrected P < 0.05, per-site one-sided binomial test using fetal-derived fraction of 13%), cutting the total number of candidate sites to 3884, including 17 true de novo sites. This candidate set is substantially depleted for sites of systematic error and is instead likely dominated by errors originating during polymerase chain reaction (PCR), because even a single round of amplification with a proofreading DNA polymerase with an error rate of 1 × 10^-7 would introduce hundreds of false-positive candidate sites. Notably, this ~2800-fold improvement in signal-to-noise ratio reduced the candidate set to a size that is an order of magnitude fewer than the number of candidate de novo sites requiring validation in a previous study involving pure genomic DNA from parent-child trios within a nuclear family (14). In a clinical setting, validation efforts would be targeted to those sites considered most likely to be pathogenic. For example, only 33 of the 3884 candidate sites (0.84%) are predicted to alter protein sequence, and only a subset of these are in genes associated with Mendelian disorders.

DISCUSSION

We have demonstrated noninvasive prediction of the whole-genome sequence of a human fetus through the combination of haplotype-resolved genome sequencing (9) of a mother, shotgun genome sequencing of a father, and deep sequencing of maternal plasma DNA. Notably, the types and quantities of materials used were consistent with those routinely collected in a clinical setting (Table 1). To replicate

Table 2. Accuracy of fetal genome inference. Number of sites and accuracy of fetal genotype inference from maternal plasma sequencing (percentage of transmitted alleles correct out of all predicted) by parental genotype and phasing status. Sites later determined by trio sequencing (including the offspring) to have poor genotype quality scores or genotypes that violated Mendelian inheritance were discarded for the purpose of evaluating accuracy (14,000 maternal-only, 32,233 paternal-only, and 480 shared heterozygous sites, or 1.5% of all sites). ND, not determined.

Individual      Site                        Other parental genotype   Number of sites   Accuracy (%)
Mother (I1-M)   Heterozygous, phased        Homozygous                1,064,255         99.3
Mother (I1-M)   Heterozygous, phased        Heterozygous              576,242           98.7*
Mother (I1-M)   Heterozygous, not phased    All                       121,425           ND
Father (I1-P)   Heterozygous                Homozygous                1,134,192         96.8
Father (I1-P)   Heterozygous                Heterozygous              631,721           ND

*Among biparentally heterozygous sites, accuracy was assessed only where the offspring was homozygous (48.8%, n = 631,721), allowing the “true” transmitted alleles to be unambiguously inferred from trio genotypes.

[Figure 5: two heat maps of accuracy (color scale from <70% to >98%). Vertical axis in both panels: haplotype block N50 (full-length, 325 kb, 300 kb, 250 kb, 200 kb, 150 kb, 100 kb, 50 kb). Horizontal axis: (A) subsampling percentage; (B) fetal percentage in plasma.]

Fig. 5. Simulation of effects of reduced coverage, haplotype length, and fetal DNA concentration on fetal genotype inference accuracy, defined as the percentage of sites at which the inherited allele was correctly identified out of all sites where prediction was attempted. (A and B) Heat maps of accuracy after in silico fragmentation of haplotype blocks and (A) shallower sequencing of maternal plasma or (B) reduced fetal concentration among plasma sequences.

these results, we repeated the full experiment for a second trio (G1) from which maternal plasma was collected earlier in the pregnancy, at 8.2 weeks after conception (tables S2 and S3). Both the overall sequencing depth and the fetal-derived proportion were each lower relative to the first trio (by 28 and 51%, respectively), resulting in an average of fewer than four fetal-derived reads per site. Nevertheless, we achieved 95.7% accuracy for prediction of inheritance at maternal-only sites, consistent with accuracy obtained under simulation with data from the first trio (Fig. 5).
These results underscore the importance of specific technical parameters in determining performance, namely, the length and completeness of haplotype-resolved sequencing of parental DNA, and the depth and complexity of sequencing libraries derived from low starting masses of plasma-derived DNA (less than 5 ng for both I1 and G1 in our study).
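The "fewer than four fetal-derived reads per site" figure for the second trio can be checked with a short calculation. The sketch assumes that the quoted 28% and 51% are relative reductions from the first trio's 78-fold depth and 13% fetal fraction; the exact G1 values are given in tables S2 and S3, which are not reproduced here, and the variable and function names are illustrative.

```python
I1_DEPTH = 78.0           # plasma coverage for trio I1 (Table 1)
I1_FETAL_FRACTION = 0.13  # estimated fetal content for trio I1


def fetal_reads_per_site(depth, fetal_fraction):
    """Average number of fetal-derived reads covering a site."""
    return depth * fetal_fraction


# Assumed interpretation: relative reductions of 28% (depth) and 51% (fetal fraction).
g1_depth = I1_DEPTH * (1.0 - 0.28)
g1_fetal_fraction = I1_FETAL_FRACTION * (1.0 - 0.51)

i1_fetal_reads = fetal_reads_per_site(I1_DEPTH, I1_FETAL_FRACTION)
g1_fetal_reads = fetal_reads_per_site(g1_depth, g1_fetal_fraction)
```

Under this reading, G1 has roughly a third of the fetal-derived reads per site available for I1, which is consistent with the statement in the text.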
There remain several key avenues for improvement. First, although we predicted inheritance at 2.8 × 10^6 heterozygous sites with high accuracy (98.2% overall), there were 7.5 × 10^5 sites for which we did not attempt prediction (Table 2). These include 6.3 × 10^5 shared sites heterozygous in both parents for which we could not assess paternal transmission and 1.2 × 10^5 maternal-only heterozygous sites that were not included in our haplotype assembly. The shared sites are in principle accessible but require haplotype-resolved (rather than solely shotgun) sequencing of paternal DNA, which was not possible here with either the I1 or the G1 trio owing to unavailability of high–molecular weight DNA from each father. The unphased maternal sites are also in principle accessible but require improvements to haplotyping technology to enable phasing of SNPs residing within blocks of relatively low heterozygosity as well as within segmental duplications. More generally, despite recent innovations from our group and others (9–12), there remains a critical need for genome-wide haplotyping protocols that are at once robust, scalable, and comprehensive. Significant reductions in cost, along with standardization and automation, will be necessary for compatibility with large-scale clinical application.
Second, although we were successful in detecting nearly 90% of de novo single-nucleotide mutations by deep sequencing of maternal plasma DNA, this was with very low specificity. The application of a series of filters resulted in a ~2800-fold gain in specificity at a ~2-fold cost in terms of sensitivity. However, there is clearly room for improvement if we are to enable the sensitive and specific prenatal detection of potentially pathogenic de novo point mutations at a genome-wide scale, a goal that will likely require deeper than 78-fold coverage of the maternal plasma genome (15) in combination with targeted validation of potentially pathogenic candidate de novo mutations.
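The sensitivity and signal-to-noise figures quoted for de novo detection can be reproduced from the counts reported in the Results: 44 true de novo sites, 2.5 × 10^7 unfiltered candidates containing 39 of them, and 3884 filtered candidates containing 17. The sketch below is plain arithmetic on those counts; the function name is hypothetical.

```python
TRUE_DE_NOVO_SITES = 44


def summarize(candidate_sites, true_sites_found, total_true=TRUE_DE_NOVO_SITES):
    """Return (sensitivity, candidate sites per true site)."""
    sensitivity = true_sites_found / total_true
    candidates_per_true_site = candidate_sites / true_sites_found
    return sensitivity, candidates_per_true_site


baseline_sensitivity, baseline_noise = summarize(2.5e7, 39)
filtered_sensitivity, filtered_noise = summarize(3884, 17)

fold_gain_in_signal_to_noise = baseline_noise / filtered_noise
fold_cost_in_sensitivity = baseline_sensitivity / filtered_sensitivity
```

These counts give a baseline sensitivity of about 88.6% at roughly 6.4 × 10^5 candidates per true site, and after filtering about 38.6% sensitivity at roughly 230 candidates per true site, which corresponds to the ~2800-fold gain and ~2-fold cost stated above.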
Third, our analyses focused exclusively on single-nucleotide variants, which are by far the most common form of both nonpathogenic and pathogenic genetic variation in human genomes (16, 17). Clinical application of noninvasive fetal genome sequencing will require more robust methods for detecting other forms of variation, for example, insertion-deletions, copy number changes, repeat expansions, and structural rearrangements. Ideally, techniques for the detection of other forms of variation could derive from short sequencing reads in a manner that is directly integrated with experimental methods and algorithms for haplotype-resolved genome sequencing (18).
The ability to noninvasively sequence a fetal genome to high accuracy and completeness will undoubtedly have profound implications for the future of prenatal genetic diagnostics. Although individually rare, when considered collectively, the ~3500 Mendelian disorders with a known molecular basis (19) contribute substantially to morbidity and mortality (20). Currently, routine obstetric practice includes offering a spectrum of screening and diagnostic options to all women. Prenatal screening options have imperfect sensitivity and focus mainly on a small number of specific disorders, including trisomies, major congenital anomalies, and specific Mendelian disorders. Diagnostic tests, generally performed through invasive procedures, such as chorionic villus sampling and amniocentesis, also focus on specific disorders and confer risk of pregnancy loss that may inversely correlate with access to high-quality care. Noninvasive, comprehensive diagnosis of Mendelian disorders early in pregnancy would provide much more information to expectant parents, with the greater accessibility inherent to a noninvasive test and without tangible risk of pregnancy loss. The less tangible implication of incorporating this level of information into prenatal decision-making raises many ethical questions that must be considered carefully within the scientific community and on a societal level. A final point is that as in other areas of clinical genetics, our capacity to generate data is outstripping our ability to interpret it in ways that are useful to physicians and patients. That is, although the noninvasive prediction of a fetal genome may be technically feasible, its interpretation, even for known Mendelian disorders, will remain a major challenge (21).

MATERIALS AND METHODS

Whole-genome shotgun library preparation and sequencing
Genomic DNA was extracted from whole blood, as available, or alternatively from saliva, with the Gentra Puregene Kit (Qiagen) or Oragene Dx (DNA Genotek), respectively. Purified DNA was fragmented by sonication with a Covaris S2 instrument. Indexed shotgun sequencing libraries were prepared with the KAPA Library Preparation Kit (Kapa Biosystems), following the manufacturer’s instructions. All libraries were sequenced on HiSeq 2000 instruments (Illumina) using paired-end 101-bp reads with an index read of 9 bp.

`Maternal plasma library preparation and sequencing
`Maternal plasma was collected by standard methods and split into 1-ml
`aliquots, which were individually purified with the QIAamp Circu-
`lating Nucleic Acid kit (Qiagen). DNA yield was measured with a
`Qubit fluorometer (Invitrogen). Sequencing libraries were prepared
`with the ThruPLEX-FD kit (Rubicon Genomics), comprising a pro-
`prietary series of end-repair, ligation, and amplification reactions.
`Index read sequencing primers compatible with the whole-genome
`sequencing and fosmid libraries from this study were included during
`sequencing of maternal plasma libraries to permit detection of any
`contamination from other libraries. The percentage of fetal-derived
`sequences was estimated from plasma sequences by counting alleles
`specific to each parent as well as sequences mapping specifically to
`the Y chromosome (fig. S5).
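
The allele-counting estimate described above can be illustrated with a minimal sketch. It assumes, as one common convention and not necessarily the exact procedure behind fig. S5, that only sites where the mother is homozygous for one allele and the father is homozygous for the other are used. At such sites the fetus is an obligate heterozygote, so the paternal-specific allele makes up about half of the fetal-derived reads, and the fetal fraction is roughly twice the paternal-specific read fraction. The function name, input layout, and example counts are hypothetical.

```python
from typing import Iterable, Tuple


def estimate_fetal_fraction(sites: Iterable[Tuple[int, int]]) -> float:
    """Estimate the fetal fraction of plasma DNA from pooled allele counts.

    Each element of `sites` is (paternal_specific_reads, total_reads) at a
    site where the mother is homozygous for one allele and the father is
    homozygous for the other (an assumption of this sketch). The fetus is
    then heterozygous, so paternal-specific reads are ~half of fetal reads.
    """
    paternal_reads = 0
    total_reads = 0
    for paternal_count, depth in sites:
        if paternal_count < 0 or depth < paternal_count:
            raise ValueError("invalid read counts at a site")
        paternal_reads += paternal_count
        total_reads += depth
    if total_reads == 0:
        raise ValueError("no reads at informative sites")
    # Pool counts across sites, then double the paternal-specific fraction.
    return 2.0 * paternal_reads / total_reads


# Hypothetical counts, for illustration only.
example_sites = [(6, 100), (7, 100), (13, 200)]
print(f"{estimate_fetal_fraction(example_sites):.3f}")
```

Pooling counts across sites before dividing weights each site by its depth, which is more stable than averaging per-site ratios when coverage is shallow. The estimate from reads mapping specifically to the Y chromosome, applicable to a male fetus, is not sketched here.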
`
`Maternal haplotype resolution via clone pool
`dilution sequencing
`Haplotype-resolved genome sequencing was performed essentially as
`previously described (9), with minor updates to facilitate processing
`in a 96-well format. Briefly, high–molecular weight DNA was mechanically
`sheared to a mean size of ~38 kbp using a HydroShear instrument
`(Digilab), with the following settings: volume = 120 µl, cycles = 20, speed
`code = 16. Sheared DNA was electrophoresed throug
