`Author manuscript
`Nat Med. Author manuscript; available in PMC 2014 November 01.
`
`Published in final edited form as:
`Nat Med. 2014 May ; 20(5): 548–554. doi:10.1038/nm.3519.
`
`An ultrasensitive method for quantitating circulating tumor DNA
`with broad patient coverage
`
`Aaron M. Newman1,2,*, Scott V. Bratman1,3,*, Jacqueline To3, Jacob F. Wynne3, Neville C.
`W. Eclov3, Leslie A. Modlin3, Chih Long Liu2, Joel W. Neal2, Heather A. Wakelee2, Robert E.
`Merritt4, Joseph B. Shrager4, Billy W. Loo Jr.3, Ash A. Alizadeh1,2,5, and Maximilian
`Diehn1,3,6
`1Institute for Stem Cell Biology and Regenerative Medicine, 265 Campus Drive, Stanford, CA
`94305
`
`2Division of Oncology, Department of Medicine, Stanford Cancer Institute, 875 Blake Wilbur
`Drive, Stanford, CA 94305
`
`3Department of Radiation Oncology, 875 Blake Wilbur Drive, Stanford, CA 94305
`
`4Division of Thoracic Surgery, Department of Cardiothoracic Surgery, Stanford School of
`Medicine, 300 Pasteur Drive, Stanford, CA 94305
`
`5Division of Hematology, Department of Medicine, Stanford Cancer Institute, 875 Blake Wilbur
`Drive, Stanford, CA 94305
`
`6Stanford Cancer Institute, 875 Blake Wilbur Drive, Stanford, CA 94305
`
`Abstract
`Circulating tumor DNA (ctDNA) represents a promising biomarker for noninvasive assessment of
`cancer burden, but existing methods have insufficient sensitivity or patient coverage for broad
`clinical applicability. Here we introduce CAncer Personalized Profiling by deep Sequencing
`(CAPP-Seq), an economical and ultrasensitive method for quantifying ctDNA. We implemented
`CAPP-Seq for non-small cell lung cancer (NSCLC) with a design covering multiple classes of
`somatic alterations that identified mutations in >95% of tumors. We detected ctDNA in 100% of
`stage II–IV and 50% of stage I NSCLC patients, with 96% specificity for mutant allele fractions
`down to ~0.02%. Levels of ctDNA significantly correlated with tumor volume, distinguished
`between residual disease and treatment-related imaging changes, and provided earlier response
assessment than radiographic approaches. Finally, we explored biopsy-free tumor screening and
genotyping with CAPP-Seq. We envision that CAPP-Seq could be routinely applied clinically to
detect and monitor diverse malignancies, thus facilitating personalized cancer therapy.

Users may view, print, copy, download and text and data-mine the content in such documents, for the purposes of academic research,
subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms
Correspondence to: Ash A. Alizadeh; Maximilian Diehn.
*These authors contributed equally.

Author Contributions
AMN, SVB, AAA, and MD developed the concept, designed the experiments, analyzed the data, and wrote the manuscript. SVB
performed the molecular biology experiments and AMN performed the bioinformatics analyses. CLL helped develop analytical
pipeline software. SVB, JT, JFW, NCWE, LAM, JWL, HAW, REM, JBS, BWL, and MD provided patient specimens. AAA and MD
contributed equally as senior authors. All authors commented on the manuscript at all stages.

Foresight EX1044
Foresight v Personalis
Personalis EX2152.1
`
`Introduction
`
`Analysis of ctDNA has the potential to revolutionize detection and monitoring of tumors.
`Noninvasive access to malignant DNA is particularly attractive for solid tumors, which
`cannot be repeatedly sampled without invasive procedures. In NSCLC, PCR-based assays
have been used to detect recurrent point mutations in genes such as KRAS (Kirsten rat
`sarcoma viral oncogene homolog) or EGFR (epidermal growth factor receptor) in plasma
`DNA1–4, but the majority of patients lack mutations in these genes. Recently, approaches
`employing massively parallel sequencing have been used to detect ctDNA5–12. However, the
`methods reported to date have been limited by modest sensitivity13, applicability to only a
`minority of patients, the need for patient-specific optimization, and/or cost. To overcome
`these limitations, we developed a novel strategy for analysis of ctDNA. Our method, called
`CAPP-Seq, combines optimized library preparation methods for low DNA input masses
`with a multi-phase bioinformatics approach to design a “selector” consisting of biotinylated
`DNA oligonucleotides that target recurrently mutated regions in the cancer of interest. To
`monitor ctDNA, the selector is applied to tumor DNA to identify a patient’s cancer-specific
`genetic aberrations and then directly to circulating DNA to quantify them (Fig. 1a). Here we
`demonstrate the technical performance and explore the clinical utility of CAPP-Seq in
`patients with early and advanced stage NSCLC.
`
`Results
`
`Design of a CAPP-Seq selector for NSCLC
`
`For the initial implementation of CAPP-Seq we focused on NSCLC, although our approach
`is generalizable to any cancer for which recurrent mutations have been identified. To design
`a selector for NSCLC (Fig. 1b, Supplementary Table 1, and Methods), we began by
`including exons covering recurrent mutations in potential driver genes from COSMIC14 and
`other sources15,16. Next, using whole exome sequencing (WES) data from 407 patients with
`NSCLC profiled by The Cancer Genome Atlas (TCGA), we applied an iterative algorithm to
`maximize the number of missense mutations per patient while minimizing selector size
`(Supplementary Fig. 1 and Supplementary Table 1).
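
The trade-off described above, more mutation coverage for less selector sequence, can be pictured with a greedy heuristic. The sketch below is not the authors' iterative algorithm, which is detailed in the Supplementary Methods. It is a simplified stand-in that ranks candidate regions by newly covered patients per base pair under a size budget, rather than by missense mutations per patient. The function name, region names, lengths, and patient IDs are all hypothetical.

```python
def greedy_selector(regions, max_size_bp):
    """Pick regions that add the most newly covered patients per base pair.

    regions: dict mapping region name -> (length_bp, set of patient IDs
             with at least one mutation in that region).
    Hypothetical stand-in for the design step; not the published algorithm.
    """
    chosen, covered, size = [], set(), 0
    remaining = dict(regions)
    while remaining:
        # Score each candidate by newly covered patients per bp.
        def gain(name):
            length, patients = remaining[name]
            return len(patients - covered) / length

        best = max(sorted(remaining), key=gain)
        length, patients = remaining.pop(best)
        if not (patients - covered):
            break  # the best candidate adds nothing, so no candidate does
        if size + length > max_size_bp:
            continue  # over budget; try the next-best region
        chosen.append(best)
        covered |= patients
        size += length
    return chosen, covered, size


# Toy input (hypothetical exons and patient IDs, not study data).
toy = {
    "exonA": (200, {"p1", "p2", "p3"}),
    "exonB": (100, {"p3", "p4"}),
    "exonC": (1000, {"p5"}),
    "exonD": (150, {"p1"}),
}
chosen, covered, size = greedy_selector(toy, max_size_bp=400)
print(chosen, sorted(covered), size)
```

In this toy case the large, rarely mutated region is skipped because it would exceed the size budget, which mirrors the goal of keeping the selector small.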
`
`Approximately 8% of NSCLCs harbor rearrangements involving the receptor tyrosine
`kinases, ALK (anaplastic lymphoma receptor tyrosine kinase), ROS1 (c-ros oncogene 1
`tyrosine kinase) or the RET proto-oncogene17–19. To utilize the personalized nature and
`lower false detection rate inherent in the unique junctional sequences of structural
`rearrangements5,6, we included the introns and exons spanning recurrent fusion breakpoints
`in these genes in the final design phase (Fig. 1b). To detect fusions in tumor and plasma
`DNA, we developed a breakpoint-mapping algorithm (Methods). Application of this
`algorithm to next generation sequencing (NGS) data from two NSCLC cell lines known to
`harbor fusions with previously uncharacterized breakpoints22,23 readily identified the
`breakpoints at nucleotide resolution (Supplementary Fig. 2).
`
`Collectively, the NSCLC selector design targets 521 exons and 13 introns from 139
`recurrently mutated genes, in total covering ~125 kb (Fig. 1b). Within this small target
`(0.004% of the human genome), the selector identifies a median of 4 single nucleotide
`variants (SNVs) and covers 96% of patients with lung adenocarcinoma or squamous cell
`carcinoma. To validate the number of mutations covered per tumor, we examined the
`selector region in WES data from an independent cohort of 183 lung adenocarcinoma
patients20. The selector covered 88% of patients with a median of 4 SNVs per patient, ~4-fold
more than would be expected from random sampling of the exome (P < 1.0 × 10⁻⁶; Fig.
1c), thus validating our selector design algorithm.
`
`Methodological optimization and performance assessment
`
`We performed deep sequencing with the NSCLC selector to achieve ~10,000x coverage
`(pre-duplication removal) based on considerations of sequencing depth, median number of
`reporters, and ctDNA detection limit (Fig. 1d). We profiled a total of 90 samples, including
`two NSCLC cell lines, 17 primary tumor samples and matched peripheral blood leukocytes
`(PBLs), and 40 plasma samples from 18 human subjects, including five healthy adults and
`13 patients with NSCLC (Supplementary Table 2). To assess and optimize selector
`performance, we first applied it to circulating DNA purified from healthy control plasma,
`observing efficient and uniform capture of genomic DNA (Supplementary Table 2).
`Sequenced plasma DNA fragments had a median length of ~170 bp (Fig. 2a), closely
`corresponding to the length of DNA contained within a chromatosome24. By optimizing
`library preparation from small quantities of plasma DNA, we increased recovery efficiency
`by >300% and decreased bias for libraries constructed from as little as 4 ng (Supplementary
`Fig. 3). Consequently, fluctuations in sequencing depth were minimal (Fig. 2b,c).
`
`The detection limit of CAPP-Seq is affected by (i) the input number and recovery rate of
`circulating DNA molecules, (ii) sample cross-contamination, (iii) potential allelic bias in the
`capture reagent, and (iv) PCR or sequencing errors. We examined each of these elements in
`turn. First, by comparing the number of input DNA molecules per sample with estimates of
`library complexity (Supplementary Fig. 4a and Supplementary Methods), we calculated a
`circulating DNA molecule recovery rate of ≥49% (Supplementary Table 2). This was in
`agreement with molecule recovery yields calculated following PCR (Supplementary Fig.
`4b). Second, by analyzing patient-specific homozygous SNPs across samples, we found
`cross-contamination of ~0.06% in multiplexed plasma DNA (Supplementary Fig. 4c and
`Supplementary Methods), prompting us to exclude any tumor-derived SNV from further
`analysis if found as a germline SNP in another profiled patient. Next, we evaluated the
`allelic skew in heterozygous germline SNPs within patient PBL samples and observed
`minimal bias toward capture of reference alleles (Supplementary Fig. 4d). Finally, we
`analyzed the distribution of non-reference alleles across the selector for the 40 plasma DNA
`samples, excluding tumor-derived SNVs and germline SNPs. We found mean and median
`background rates of 0.006% and 0.0003%, respectively (Fig. 2d), both considerably lower
`than previously reported NGS-based methods for ctDNA analysis8,10.
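
Factor (i) above, the number of input molecules and their recovery rate, can be made concrete with a rough calculation. The sketch below assumes the commonly used approximation of about 3.3 pg of DNA per haploid human genome, which is not a figure from this study. The 32 ng input, the 49% recovery rate, and the 0.02% allele fraction are taken from values quoted elsewhere in the text. The function names are hypothetical.

```python
HAPLOID_GENOME_PG = 3.3  # assumed approximate mass of one haploid human genome (pg)


def haploid_genome_equivalents(input_ng):
    """Convert an input DNA mass (ng) to approximate haploid genome copies."""
    return input_ng * 1000.0 / HAPLOID_GENOME_PG


def expected_mutant_molecules(input_ng, recovery_rate, allele_fraction):
    """Expected number of recovered mutant molecules at a single locus."""
    return haploid_genome_equivalents(input_ng) * recovery_rate * allele_fraction


copies = haploid_genome_equivalents(32)
mutant = expected_mutant_molecules(32, 0.49, 0.0002)
print(round(copies), round(mutant, 2))
```

Under these assumptions, roughly one mutant molecule per locus is expected at a 0.02% allele fraction, which suggests why tracking several reporters per patient helps near the detection limit.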
`
`In addition to technical reasons, non-germline plasma DNA could be present in the absence
`of cancer due to contributions from pre-neoplastic cells from diverse tissues, and such
`“biological” background may impact sensitivity. We hypothesized that biological
`background, if present, would be particularly high for recurrently mutated positions in
`known cancer driver genes and therefore analyzed mutation rates of 107 cancer-associated
`SNVs25 in all 40 plasma samples, excluding somatic mutations found in a patient’s tumor.
`Though the median fractional abundance was comparable to the global selector background
`(~0%), the mean was marginally higher at ~0.01% (Fig. 2e). Strikingly, we detected one
`mutational hotspot (tumor suppressor TP53, R175H) at a median frequency of ~0.18%
`across all plasma DNA samples, including patients and healthy subjects (Fig. 2f). Since this
`TP53 mutant allele is observed significantly above global background (P < 0.01), we
`hypothesize that it reflects true biological clonal heterogeneity, and thus excluded it as a
`potential reporter. To address background more generally, we also normalized for allele-
`specific differences in background rate when assessing the significance of ctDNA detection
`(Supplementary Methods). As a result, we found that biological background is not a major
`factor for ctDNA quantitation at detection limits above ~0.01%.
`
`Next, we empirically benchmarked the detection limit and linearity of CAPP-Seq (Fig. 2g
`and Supplementary Fig. 5a). We accurately detected defined inputs of NSCLC DNA at
fractional abundances between 0.025% and 10% with high linearity (R² ≥ 0.994). We
observed only marginal improvements in error metrics above a threshold of 4 SNV reporters
`(Fig. 2h,i and Supplementary Fig. 5b,c), equivalent to the median number of SNVs per
`tumor identified by the selector. Moreover, the fractional abundance of fusion breakpoints,
`indels (insertions and deletions), and CNVs (copy number variants) correlated highly with
expected concentrations (R² ≥ 0.97; Supplementary Fig. 5d).
`
`Somatic mutation detection and tumor burden quantitation
`
`We next applied CAPP-Seq to the discovery of somatic mutations in tumors collected from
`17 patients with NSCLC (Table 1 and Supplementary Table 3), including formalin fixed
`surgical resection or needle biopsy specimens and malignant pleural fluid. At a mean
`sequencing depth of ~5,000x (pre-duplicate removal) in tumor and paired germline samples
`(Supplementary Table 2), we detected 100% of previously identified SNVs and fusions and
`discovered many additional somatic variants (Table 1 and Supplementary Table 3).
`Moreover, we characterized breakpoints at base-pair resolution and identified partner genes
`for each of eight known fusions involving ALK or ROS1 (Supplementary Fig. 2). Tumors
`containing fusions were almost exclusively from never smokers and contained fewer SNVs
`than those lacking fusions, as expected21 (Supplementary Fig. 2). Excluding patients with
`fusions, we identified a median of 6 SNVs (3 missense) per patient (Table 1), in line with
`our selector design-stage predictions (Fig. 1b,c).
`
`Next, we assessed the sensitivity and specificity of CAPP-Seq for disease monitoring and
`minimal residual disease detection using plasma samples from five healthy controls and 35
`samples collected from 13 patients with NSCLC (Table 1 and Supplementary Table 4). We
`integrated information content across multiple instances and classes of somatic mutations
`into a ctDNA detection index. This index is analogous to a false positive rate and is based on
`a decision tree in which fusion breakpoints take precedence due to their nonexistent
`background and in which p-values from multiple reporter types are integrated (Methods).
`Applying this approach in an ROC analysis, CAPP-Seq achieved an area under the curve
`(AUC) of 0.95, with maximal sensitivity and specificity of 85% and 96%, respectively, for
`all plasma DNA samples from pretreated patients and healthy controls. Sensitivity among
`stage I tumors was 50% and among stage II–IV patients was 100% with a specificity of 96%
(Fig. 3a,b). Moreover, when considering both pre- and post-treatment samples, CAPP-Seq
`exhibited robust performance, with AUC values of 0.89 for all stages and 0.91 for stages II–
`IV (P < 0.0001; Supplementary Fig. 6). Furthermore, by adjusting the ctDNA detection
`index, we could increase specificity up to 98% while still capturing 2/3 of all cancer-positive
`samples and 3/4 of stages II–IV cancer-positive samples (Supplementary Fig. 6). Thus,
`CAPP-Seq can achieve robust assessment of tumor burden and can be tuned to deliver a
`desired sensitivity and specificity.
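
The ROC analysis and threshold tuning described above can be sketched as follows. Because the ctDNA detection index is analogous to a false positive rate, lower values are treated here as stronger evidence of ctDNA. The index values below are hypothetical and are not study data, and the function names are illustrative only.

```python
def auc_from_detection_index(case_idx, control_idx):
    """AUC as the probability that a cancer sample has a lower (more
    significant) detection index than a control sample; ties count one half."""
    wins = 0.0
    for c in case_idx:
        for h in control_idx:
            if c < h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(case_idx) * len(control_idx))


def sensitivity_specificity(case_idx, control_idx, threshold):
    """Call a sample ctDNA-positive when its index is <= threshold."""
    sens = sum(c <= threshold for c in case_idx) / len(case_idx)
    spec = sum(h > threshold for h in control_idx) / len(control_idx)
    return sens, spec


# Hypothetical index values, for illustration only.
cases = [0.001, 0.01, 0.02, 0.2]
controls = [0.05, 0.3, 0.5, 0.9]
auc = auc_from_detection_index(cases, controls)
sens, spec = sensitivity_specificity(cases, controls, threshold=0.03)
print(auc, sens, spec)
```

Lowering the threshold in this sketch trades sensitivity for specificity, which is the kind of tuning the text refers to.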
`
`Monitoring of NSCLC tumor burden in plasma samples
`
`We next asked whether significantly detectable levels of ctDNA correlate with
`radiographically measured tumor volumes and clinical responses to therapy. Fractions of
`ctDNA detected in plasma by SNV and/or indel reporters ranged from ~0.02% to 3.2%
`(Table 1), with a median of ~0.1% in pre-treatment samples. Absolute levels of ctDNA in
`pre-treatment plasma were significantly correlated with tumor volume as measured by
computed tomography (CT) and positron emission tomography (PET) imaging (R² = 0.89, P
= 0.0002; Fig. 3c).
`
`To determine whether ctDNA concentrations reflect disease burden in longitudinal samples,
`we analyzed plasma DNA from three patients with advanced NSCLC undergoing distinct
`therapies (Fig. 4a–c). As in pre-treatment samples, ctDNA levels were highly correlated
with tumor volumes during therapy (R² = 0.95 for P15; R² = 0.85 for P9). This behavior was
`observed whether the mutation type measured was a collection of SNVs and an indel (P15,
`Fig. 4a), multiple fusions (P9, Fig. 4b), or SNVs and a fusion (P6, Fig. 4c). Of note, in one
`patient (P9) we identified both a classic EML4-ALK fusion and two previously unreported
`fusions involving ROS1: FYN-ROS1 and ROS1-MKX (Supplementary Fig. 2). All fusions
`were confirmed by qPCR amplification of genomic DNA and were independently recovered
`in plasma samples (Supplementary Table 4). To the best of our knowledge this is the first
`observation of ROS1 and ALK fusions in the same individual with NSCLC.
`
`We designed the NSCLC CAPP-Seq selector to detect multiple SNVs per tumor. In one
`patient (P5), this design allowed us to identify a dominant clone with an activating EGFR
`mutation as well as an erlotinib-resistant subclone with a “gatekeeper” EGFR T790M
`mutation26. The ratio between clones was identical in a tumor biopsy and simultaneously
`sampled plasma (Fig. 4d), demonstrating that our method has potential for detecting and
`quantifying clinically relevant subclones.
`
`Patients with stages II–III NSCLC undergoing definitive radiotherapy often have
`surveillance CT or PET/CT scans that are difficult to interpret due to radiation-induced
`inflammatory and fibrotic changes in the lung and surrounding tissues. For patient P13, who
`was treated with radiotherapy for stage IIB NSCLC, follow-up imaging showed a large mass
`that was felt to represent residual disease. However, ctDNA at the same time point was
`undetectable (Fig. 4e) and the patient remained disease free 22 months later, supporting the
`ctDNA result. Another patient (P14) was treated with chemoradiotherapy for stage IIIB
`NSCLC and follow-up imaging revealed a near complete response (Fig. 4f). However, the
`ctDNA concentration slightly increased following therapy, suggesting progression of occult
`microscopic disease. Indeed, clinical progression was detected 7 months later and the patient
`ultimately succumbed to NSCLC. These data highlight the promise of ctDNA analysis for
`identifying patients with residual disease after therapy.
`
`We next asked whether the low detection limit of CAPP-Seq would allow monitoring in
`early stage NSCLC. Patients P1 (Fig. 4g) and P16 (Fig. 4h) underwent surgery and
`stereotactic ablative radiotherapy (SABR), respectively, for stage IB NSCLC. We detected
`ctDNA in pre-treatment plasma of P1 but not at 3 or 32 months following surgery,
`suggesting this patient was free of disease and likely cured. For patient P16, the initial
`surveillance PET-CT scan following SABR showed a residual mass that was interpreted as
`representing either residual tumor or post-radiotherapy inflammation. We detected no
`evidence of residual disease by ctDNA, supporting the latter, and the patient remained free
`of disease at last follow-up 21 months after therapy. Taken together, these results
`demonstrate the potential utility of CAPP-Seq for measuring tumor burden in early and
`advanced stage NSCLC and for monitoring ctDNA during distinct types of therapy.
`
`Biopsy-free cancer screening and tumor genotyping
`
`Finally, we explored whether CAPP-Seq analysis of ctDNA could potentially be used for
`cancer screening and biopsy-free tumor genotyping. As proof-of-principle, we blinded
`ourselves to the mutations present in each patient’s tumor and applied a novel statistical
`method to test for the presence of cancer DNA in each plasma sample in our cohort
`(Supplementary Fig. 7). By implementing our cancer screening method for high specificity,
`we correctly classified 100% of patient plasma samples with ctDNA above fractional
`abundances of 0.4% with a false positive rate of 0% (Fig. 4i and Supplementary Methods).
`CAPP-Seq could therefore potentially improve upon the low positive predictive value of
`low-dose CT screening in patients at high risk of developing NSCLC29.
`
`Separately, when we specifically examined the ability to non-invasively detect actionable
`mutations in EGFR and KRAS25, we correctly identified 100% of mutations at allelic
`fractions greater than 0.1% with 99% specificity. CAPP-Seq may therefore have utility for
`biopsy-free tumor genotyping in locally advanced or metastatic patients. However,
`methodological improvements will be required to detect and genotype stage I tumors
`without prior knowledge of tumor genotype.
`
`Discussion
`
`In this study, we present CAPP-Seq as a new method for ctDNA quantitation. Key features
`include high sensitivity and specificity, coverage of nearly all patients with NSCLC, lack of
`patient-specific optimization, and low cost. By incorporating optimized library construction
`and bioinformatics methods, CAPP-Seq achieves the lowest background error rate and
`lowest detection limit of any NGS-based method used for ctDNA analysis to date. Our
`approach also reduces the potential impact of stochastic noise and biological variability
`(e.g., mutations near the detection limit or subclonal tumor evolution) on tumor burden
`quantitation by integrating information content across multiple instances and classes of
`somatic mutations. These features facilitated the detection of minimal residual disease, and
`the first report of ctDNA quantitation from stage I NSCLC tumors using deep sequencing.
`Although we focused on NSCLC, our method could be applied to any malignancy for which
`recurrent mutation data are available.
`
`In many patients, levels of ctDNA are considerably lower than the detection thresholds of
`previously described sequencing-based methods13. For example, pre-treatment ctDNA
`concentration is <0.5% in the majority of patients with lung and colorectal
`carcinomas1,30,31. Following therapy, ctDNA concentrations typically drop, thus requiring
`even lower detection thresholds. Previously published methods employing amplicon8,10,11,
`whole exome12, or whole genome9,32,33,24 sequencing would not be sensitive enough to
`detect ctDNA in most patients with NSCLC, even at 10-fold or greater sequencing costs
`(Fig. 1d and Supplementary Fig. 8).
`
`To further expand the potential clinical applications of ctDNA quantitation, additional gains
`in the detection threshold are desirable. Potential approaches include using barcoding
`strategies that suppress PCR errors resulting from library preparation34,35 and increasing the
amount of plasma used for ctDNA analysis above the average of ~1.5 mL used in our study.
`A second limitation of CAPP-Seq is the potential for inefficient capture of fusions, which
`could lead to underestimates of tumor burden (e.g., P9; Supplementary Methods). However,
`this bias can be analytically addressed when other reporter types are present (e.g., P6;
`Supplementary Table 4). Finally, while we found that CAPP-Seq could quantitate CNVs,
`our current selector design did not prioritize these types of aberrations. We anticipate that
`adding coverage for certain CNVs will prove useful for monitoring various types of cancers.
`
`In summary, targeted hybrid capture and high-throughput sequencing of plasma DNA
`allows for highly sensitive and non-invasive detection of ctDNA in the vast majority of
`patients with NSCLC at low cost. CAPP-Seq could therefore be routinely applied clinically
`and has the potential for accelerating the personalized detection, therapy, and monitoring of
`cancer. We anticipate that CAPP-Seq will prove valuable in a variety of clinical settings,
`including the assessment of cancer DNA in alternative biological fluids and specimens with
`low cancer cell content.
`
`Online Methods
`
`Patient selection
`
`Between April 2010 and June 2012, patients undergoing treatment for newly diagnosed or
`recurrent NSCLC were enrolled in a study approved by the Stanford University Institutional
`Review Board and provided informed consent. Enrolled patients had not received blood
`transfusions within 3 months of blood collection. Patient characteristics are in
`Supplementary Table 3. All treatments and radiographic examinations were performed as
`part of standard clinical care. Volumetric measurements of tumor burden were based on
visible tumor on CT and calculated according to the ellipsoid formula:
volume = (length/2) × width².
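
As a quick illustration of the volume formula above, a minimal sketch follows. The function name and the example dimensions are hypothetical and are not taken from the study. With length and width in centimeters, the result is in cubic centimeters.

```python
def tumor_volume(length, width):
    """Ellipsoid approximation from the text: (length / 2) * width ** 2."""
    return (length / 2.0) * width ** 2


# Hypothetical lesion measuring 4 cm x 3 cm on CT.
volume_cm3 = tumor_volume(4.0, 3.0)
print(volume_cm3)
```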
`
`Sample collection and processing
`
`Peripheral blood from patients was collected in EDTA Vacutainer tubes (BD). Blood
`samples were processed within 3 h of collection. Plasma was separated by centrifugation at
`2,500 × g for 10 min, transferred to microcentrifuge tubes, and centrifuged at 16,000 × g for
`10 min to remove cell debris. The cell pellet from the initial spin was used for isolation of
`germline genomic DNA from PBLs (peripheral blood leukocytes) with the DNeasy Blood &
`Tissue Kit (Qiagen). Matched tumor DNA was isolated from FFPE specimens or from the
`cell pellet of pleural effusions. Genomic DNA was quantified by Quant-iT PicoGreen
`dsDNA Assay Kit (Invitrogen).
`
`Cell-free DNA purification and quantification
`
`Circulating DNA was isolated from 1–5 mL plasma with the QIAamp Circulating Nucleic
`Acid Kit (Qiagen). The concentration of purified plasma DNA was determined by
quantitative PCR (qPCR) using an 81 bp amplicon on chromosome 1 (ref. 24) and a dilution series
`of intact male human genomic DNA (Promega) as a standard curve. Power SYBR Green
`was used for qPCR on a HT7900 Real Time PCR machine (Applied Biosystems), using
`standard PCR thermal cycling parameters.
`
`NGS library construction
`
`Indexed Illumina NGS libraries were prepared from plasma DNA and shorn tumor,
germline, and cell line genomic DNA. For patient plasma DNA, 7–32 ng DNA was used
`for library construction without additional fragmentation. For tumor, germline, and cell line
`genomic DNA, 69–1000 ng DNA was sheared prior to library construction with a Covaris
`S2 instrument using the recommended settings for 200 bp fragments. See Supplementary
`Table 2 for details.
`
`The NGS libraries were constructed using the KAPA Library Preparation Kit (Kapa
`Biosystems) employing a DNA Polymerase possessing strong 3′-5′ exonuclease (or
`proofreading) activity and displaying the lowest published error rate (i.e. highest fidelity) of
`all commercially available B-family DNA polymerases36,37. The manufacturer’s protocol
`was modified to incorporate with-bead enzymatic and cleanup steps using Agencourt
AMPure XP beads (Beckman-Coulter)38. Ligation was performed for 16 h at 16 °C using
`100-fold molar excess of indexed Illumina TruSeq adapters. Single-step size selection was
`performed by adding 40 μL (0.8X) of PEG buffer to enrich for ligated DNA fragments. The
`ligated fragments were then amplified using 500 nM Illumina backbone oligonucleotides
and 4–9 PCR cycles, depending on input DNA mass. Library purity and concentration were
`assessed by spectrophotometer (NanoDrop 2000) and qPCR (KAPA Biosystems),
`respectively. Fragment length was determined on a 2100 Bioanalyzer using the DNA 1000
`Kit (Agilent).
`
`Library design for hybrid selection
`
`Hybrid selection was performed with a custom SeqCap EZ Choice Library (Roche
`NimbleGen). This library was designed through the NimbleDesign portal (v1.2.R1) using
`genome build hg19 NCBI Build 37.1/GRCh37 and with Maximum Close Matches set to 1.
`Input genomic regions were selected according to the most frequently mutated genes and
`exons in NSCLC. These regions were identified from the COSMIC database, TCGA, and
`other published sources as described in the Supplementary Methods. Final selector
`coordinates are provided in Supplementary Table 1.
`
`Hybrid selection and NGS
`
`NimbleGen SeqCap EZ Choice was used according to the manufacturer’s protocol with
`modifications. Between 9 and 12 indexed Illumina libraries were included in a single
`capture hybridization. Following hybrid selection, the captured DNA fragments were
`amplified with 12 to 14 cycles of PCR using 1X KAPA HiFi Hot Start Ready Mix and 2 μM
`Illumina backbone oligonucleotides in 4 to 6 separate 50 μL reactions. The reactions were
`then pooled and processed with the QIAquick PCR Purification Kit (Qiagen). Multiplexed
`libraries were sequenced using 2 × 100 bp paired-end runs on an Illumina HiSeq 2000.
`
`Mapping and quality control
`
`Paired-end reads were mapped to the hg19 reference genome with BWA 0.6.2 (default
`parameters)39, and sorted and indexed with SAMtools40. QC was assessed using a custom
`Perl script to collect a variety of statistics, including mapping characteristics, read quality,
`and selector on-target rate (i.e., number of unique reads that intersect the selector space
divided by all aligned reads), generated respectively by SAMtools flagstat, FastQC
(http://www.bioinformatics.babraham.ac.uk/projects/fastqc/), and BEDTools coverageBed41. Plots
`of fragment length distribution and sequence depth and coverage were automatically
`generated for visual QC assessment. To mitigate the impact of sequencing errors, analyses
`not involving fusions were restricted to properly paired reads, and only bases with Phred
`quality scores ≥30 (≤0.1% probability of a sequencing error) were further analyzed.
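
The base-quality cutoff above follows from the standard Phred definition, in which a score Q corresponds to an error probability of 10^(−Q/10). The sketch below shows that conversion and a simple per-base filter. The function names and the example read are hypothetical and do not come from the authors' pipeline.

```python
def phred_to_error_probability(q):
    """Phred score Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


def high_quality_bases(sequence, qualities, min_q=30):
    """Return (position, base) pairs whose Phred score is at least min_q."""
    return [(i, b) for i, (b, q) in enumerate(zip(sequence, qualities)) if q >= min_q]


# Hypothetical 4-base read with per-base Phred scores.
kept = high_quality_bases("ACGT", [35, 12, 30, 29])
print(phred_to_error_probability(30), kept)
```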
`
`Detection thresholds
`
`Two dilution series were performed to assess the linearity and accuracy of CAPP-Seq for
quantitating ctDNA. In one experiment, shorn genomic DNA from an NSCLC cell line
`(HCC78) was spiked into circulating DNA from a healthy individual, while in a second
`experiment, shorn genomic DNA from one NSCLC cell line (NCI-H3122) was spiked into
`shorn genomic DNA from a second NSCLC line (HCC78). A total of 32 ng DNA was used
`for library construction. Following mapping and quality control, homozygous reporters were
`identified as alleles unique to each sample with at least 20x sequencing depth and an allelic
`fraction >80%. Fourteen such reporters were identified between HCC78 genomic DNA and
`plasma DNA (Fig. 2g,h), whereas 24 reporters were found between NCI-H3122 and HCC78
`genomic DNA (Supplementary Fig. 5).
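
The reporter criteria above (at least 20x depth, allelic fraction above 80%, and uniqueness to one sample) can be expressed as a simple filter. This is a minimal sketch under stated assumptions: variant calls are held in dictionaries keyed by (chromosome, position, alternate allele), and an allele counts as unique if it is absent from the other sample's calls. The data layout, function name, and example calls are hypothetical.

```python
def homozygous_reporters(sample_calls, other_calls, min_depth=20, min_af=0.80):
    """Select alleles unique to one sample that look homozygous.

    Each calls dict maps (chrom, pos, alt) -> (depth, allele_fraction).
    """
    reporters = []
    for key, (depth, af) in sample_calls.items():
        if depth >= min_depth and af > min_af and key not in other_calls:
            reporters.append(key)
    return sorted(reporters)


# Hypothetical calls, for illustration only.
cell_line = {
    ("chr1", 1000, "T"): (150, 0.99),  # passes all criteria
    ("chr2", 2000, "G"): (15, 1.00),   # depth below 20x
    ("chr3", 3000, "A"): (200, 0.48),  # allelic fraction too low
    ("chr4", 4000, "C"): (300, 0.97),  # also present in the other sample
}
other = {("chr4", 4000, "C"): (250, 0.51)}
reporters = homozygous_reporters(cell_line, other)
print(reporters)
```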
`
`Bioinformatics pipeline
`
`Details of bioinformatics methods are supplied in the Supplementary Methods. Briefly, for
detection of SNVs and indels, we employed VarScan 2 (ref. 42) with strict postprocessing filters to
`improve variant call confidence, and for fusion identification and breakpoint
`characterization we used a novel algorithm, called FACTERA (Supplementary Methods).
`To quantify tumor burden in plasma DNA, allele frequencies of reporter SNVs and indels
`were assessed using the output of SAMtools mpileup40, and fusions, if detected, were
`enumerated with FACTERA.
`
`Statistical analyses
`
`The NSCLC selector was validated in silico using an independent cohort of lung
`adenocarcinomas20 (Fig. 1c). To assess statistical significance, we analyzed the same cohort
`using 10,000 random selectors sampled from the exome, each with an identical size
`distribution to the CAPP-Seq NSCLC selector. The performance of random selectors had a
`normal distribution, and p-values were calculated accordingly. Of note, all identified
`somatic lesions were considered in this analysis.
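
The significance test above can be sketched by fitting a normal distribution to the performance of the random selectors and computing an upper-tail p-value for the observed selector. The null values below are hypothetical placeholders, not the 10,000 random selectors from the study, and the function name is illustrative.

```python
import math
import statistics


def normal_upper_tail_p(observed, null_values):
    """Z-score and one-sided p-value of `observed` against a normal
    distribution fitted to `null_values` (sample mean and standard deviation)."""
    mu = statistics.mean(null_values)
    sd = statistics.stdev(null_values)
    z = (observed - mu) / sd
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p


# Hypothetical null: SNVs per patient captured by three random selectors.
z, p = normal_upper_tail_p(4, [1, 2, 3])
print(z, p)
```

In practice the null would contain one value per random selector, each selector matched in size distribution to the real one, as described in the text.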
`
`Related to Fig. 1d, the probability P of recovering at least two reads of a single mutant allele
`in plasma for a given depth and detection limit was modeled by a binomial distribution.
`Given P, the probability of detecting all identified tumor mutations in plasma (e.g., median
`of 4 for CAPP-Seq) was modeled by a geometric distribution. Estimates are based on 250
`million 100 bp reads per lane (e.g., using an Illumina HiSeq 2000 platform). Moreover, an
`on-target rate of 60% was assumed for CAPP-Seq and WES.
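
The binomial step and the read-budget assumptions above can be sketched as follows. The read count, read length, on-target rate, and ~125 kb selector size come from the text. The choice of 12 samples per lane is an assumption for illustration, based on the 9 to 12 libraries pooled per capture, and the source does not state that one capture pool occupies one lane. The geometric-distribution step for combining multiple mutations is not reproduced here. The function names are hypothetical.

```python
def prob_at_least_two_mutant_reads(depth, allele_fraction):
    """P(X >= 2) for X ~ Binomial(depth, allele_fraction)."""
    p0 = (1 - allele_fraction) ** depth
    p1 = depth * allele_fraction * (1 - allele_fraction) ** (depth - 1)
    return 1 - p0 - p1


def mean_depth_per_sample(reads_per_lane, read_len_bp, on_target, target_bp, samples_per_lane):
    """Approximate mean on-target depth per sample, before duplicate removal."""
    return reads_per_lane * read_len_bp * on_target / target_bp / samples_per_lane


# 250 million 100 bp reads, 60% on target, ~125 kb selector, 12 samples (assumed).
depth = mean_depth_per_sample(250e6, 100, 0.60, 125_000, 12)
p_detect = prob_at_least_two_mutant_reads(10_000, 0.0002)
print(round(depth), round(p_detect, 3))
```

Under these assumptions the depth works out to about 10,000x per sample, in line with the target coverage quoted in the Results, and a single reporter at a 0.02% allele fraction yields two or more mutant reads a little more than half the time.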
`
`To evalu