Clinical Chemistry 59:1 211–224 (2013)
`
`Cancer Diagnostics
`
`Cancer Genome Scanning in Plasma:
`Detection of Tumor-Associated Copy Number Aberrations,
`Single-Nucleotide Variants, and Tumoral Heterogeneity by
`Massively Parallel Sequencing
`
`K.C. Allen Chan,1,2,3 Peiyong Jiang,1,2 Yama W.L. Zheng,1,2 Gary J.W. Liao,1,2 Hao Sun,1,2 John Wong,4
`Shing Shun N. Siu,5 Wing C. Chan,6 Stephen L. Chan,3,7 Anthony T.C. Chan,3,7 Paul B.S. Lai,4
`Rossa W.K. Chiu,1,2 and Y.M.D. Lo1,2,3*
`
`BACKGROUND: Tumor-derived DNA can be found in
`the plasma of cancer patients. In this study, we ex-
`plored the use of shotgun massively parallel sequencing
`(MPS) of plasma DNA from cancer patients to scan a
`cancer genome noninvasively.
`
`METHODS: Four hepatocellular carcinoma patients and a
`patient with synchronous breast and ovarian cancers were
`recruited. DNA was extracted from the tumor tissues, and
`the preoperative and postoperative plasma samples of
`these patients were analyzed with shotgun MPS.
`
`RESULTS: We achieved the genomewide profiling of
`copy number aberrations and point mutations in the
`plasma of the cancer patients. By detecting and
`quantifying the genomewide aggregated allelic loss
`and point mutations, we determined the fractional
`concentrations of tumor-derived DNA in plasma
`and correlated these values with tumor size and sur-
`gical treatment. We also demonstrated the potential
`utility of this approach for the analysis of complex
`oncologic scenarios by studying the patient with 2
`synchronous cancers. Through the use of multire-
`gional sequencing of tumoral tissues and shotgun
`sequencing of plasma DNA, we have shown that
`plasma DNA sequencing is a valuable approach for
`studying tumoral heterogeneity.
`
`CONCLUSIONS: Shotgun DNA sequencing of plasma is a
`potentially powerful tool for cancer detection, moni-
`toring, and research.
`© 2012 American Association for Clinical Chemistry
`
`The presence of tumor-derived DNA in the plasma of
`cancer patients offers exciting opportunities for the de-
`tection and monitoring of cancer (1, 2 ). Indeed,
`cancer-associated microsatellite alterations (3, 4 ), gene
`mutations (5–9 ), DNA-methylation changes (10, 11 ),
`and viral nucleic acids (12 ) have been found in the
`plasma of patients with different cancer types. Most of
`the previously published work on plasma DNA as a
`cancer marker has focused on the detection of specific
`and predetermined molecular targets known to be as-
`sociated with cancer by means of such methods as the
`PCR (3, 4, 12 ), digital PCR (5–7 ), and digital ligation
assays (9 ). With the advent of massively parallel sequencing
(MPS),8 several groups have incorporated
this approach for developing new plasma DNA-based
`cancer markers. One approach is to use MPS on tumor
`samples to first identify specific genomic rearrange-
`ments that can subsequently be detected in plasma
`(13, 14 ). Another approach is based on the use of tar-
`geted amplicon sequencing to search for mutations of
`genes that are commonly found in cancer (8 ).
`
`1 Li Ka Shing Institute of Health Sciences, 2 Department of Chemical Pathology,
`3 State Key Laboratory in Oncology in South China, Sir Y.K. Pao Centre for
`Cancer, 4 Department of Surgery, and 5 Department of Obstetrics and Gynae-
`cology, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin,
`New Territories, Hong Kong SAR, China; 6 Department of Surgery, North District
`Hospital, Sheung Shui, New Territories, Hong Kong SAR, China; 7 Department of
`Clinical Oncology, The Chinese University of Hong Kong, Prince of Wales
`Hospital, Shatin, New Territories, Hong Kong SAR, China.
`* Address correspondence to this author at: Department of Chemical Pathology,
The Chinese University of Hong Kong, Prince of Wales Hospital, 30–32 Ngan
Shing St., Shatin, New Territories, Hong Kong SAR. Fax +852-26365090; e-mail
`loym@cuhk.edu.hk.
`Received September 6, 2012; accepted September 27, 2012.
`Previously published online at DOI: 10.1373/clinchem.2012.196014
`8 Nonstandard abbreviations: MPS, massively parallel sequencing; HCC, hep-
`atocellular carcinoma; SOAP2, Short Oligonucleotide Alignment Program 2;
`SNP, single-nucleotide polymorphism; LOH, loss of heterozygosity; LOESS,
`locally weighted scatterplot smoothing; SNV, single nucleotide variant;
`GAAL, genomewide aggregated allelic loss.
`
`Owing to their targeted nature, the approaches
`outlined above can provide only a partial glimpse of
`the tumor genome in the plasma of cancer patients.
`For a genomewide view of the tumor genome in the
circulation, a nontargeted, random (shotgun)
`sequencing approach would be desirable. In this re-
`gard, there has been much progress in the field of non-
`invasive prenatal diagnosis because of the results ob-
`tained with shotgun MPS of DNA from the plasma of
`pregnant women (15 ). This approach has allowed the
`noninvasive detection of fetal chromosomal aneup-
`loidies (16 –18 ) and fetal genomic scanning (19, 20 ).
`In this article, we report the use of shotgun MPS to
`obtain a noninvasive, genomewide view of cancer-
`associated copy number variations and mutations in
`DNA in plasma. We have also sought to demonstrate
`the use of this approach for elucidating important tu-
`moral characteristics, with tumoral heterogeneity used
`as an example.
`
`Materials and Methods
`
`SAMPLE COLLECTION
`Hepatocellular carcinoma (HCC) patients and carriers
`of chronic hepatitis B were recruited from the Depart-
`ment of Surgery and the Department of Medicine and
`Therapeutics, respectively, of the Prince of Wales Hos-
`pital, Hong Kong, and informed consent and institu-
`tional review board approval were obtained. All HCC
`patients had Barcelona Clinic Liver Cancer stage A1
`disease. Informed consent was obtained after the na-
`ture and possible consequences of the studies were ex-
`plained. The patient with synchronous breast and
`ovarian cancers was recruited from the Department of
`Clinical Oncology, Prince of Wales Hospital. Periph-
`eral blood samples from all participants were collected
`into EDTA-containing tubes. The tumor tissues of the
`HCC patients were obtained during their cancer-
`resection surgeries.
`
`PROCESSING OF BLOOD
`Peripheral blood samples were centrifuged at 1600g for
`10 min at 4 °C. The plasma portion was recentrifuged
at 16 000g for 10 min at 4 °C and then stored at −80 °C.
Cell-free DNA molecules from 4.8 mL of plasma were
extracted according to the blood and body fluid protocol
of the QIAamp DSP DNA Blood Mini Kit (Qiagen).
The plasma DNA was concentrated with a SpeedVac®
Concentrator (Savant DNA120; Thermo Scientific)
into a 40-µL final volume per case for subsequent
preparation of the DNA-sequencing library.
`
`GENOMIC DNA EXTRACTION
`Genomic DNA was extracted from patients’ buffy coat
`samples according to the blood and body fluid protocol
`of the QIAamp DSP DNA Blood Mini Kit. DNA was
`extracted from tumor tissues with the QIAamp DNA
`Mini Kit (Qiagen).
`
`DNA SEQUENCING
`Sequencing libraries of the genomic DNA samples were
`constructed with the Paired-End Sample Preparation
Kit (Illumina) according to the manufacturer’s
instructions. In brief, 1–5 µg genomic DNA was first
sheared with a Covaris S220 Focused-ultrasonicator to
200-bp fragments. Afterward, DNA molecules were
end-repaired with T4 DNA polymerase and Klenow
polymerase; T4 polynucleotide kinase was then used to
phosphorylate the 5′ ends. A 3′ overhang was created
with a 3′-to-5′ exonuclease-deficient Klenow
fragment. Illumina adapter oligonucleotides were ligated
`to the sticky ends. The adapter-ligated DNA was en-
`riched with a 12-cycle PCR. Because the plasma DNA
`molecules were short fragments (21 ) and the amounts
`of total DNA in the plasma samples were relatively
`small, we omitted the fragmentation steps and used a
`15-cycle PCR when constructing the DNA libraries
`from the plasma samples.
`An Agilent 2100 Bioanalyzer (Agilent Technolo-
`gies) was used to check the quality and size of the
`adapter-ligated DNA libraries. DNA libraries were
`then measured by a KAPA Library Quantification Kit
`(Kapa Biosystems) according to the manufacturer’s
`instructions.
`The DNA library was diluted and hybridized to the
`paired-end sequencing flow cells. DNA clusters were
`generated on a cBot cluster generation system (Illu-
`mina) with the TruSeq PE Cluster Generation Kit v2
(Illumina), followed by 51 × 2 cycles or 76 × 2 cycles of
`sequencing on a HiSeq 2000 system (Illumina) with the
`TruSeq SBS Kit v2 (Illumina).
`
`SEQUENCE ALIGNMENT AND FILTERING
`The paired-end sequencing data were analyzed by
`means of the Short Oligonucleotide Alignment Pro-
`gram 2 (SOAP2) in the paired-end mode (22 ). For each
`paired-end read, 50 bp or 75 bp from each end was
`aligned to the non–repeat-masked reference human
`genome (Hg18). Up to 2 nucleotide mismatches were
`allowed for the alignment of each end. The genomic
`coordinates of these potential alignments for the 2 ends
`were then analyzed to determine whether any combi-
`nation would allow the 2 ends to be aligned to the same
`chromosome with the correct orientation, spanning an
insert size ≤600 bp, and mapping to a single location in
`the reference human genome. Duplicated reads were
`defined as paired-end reads in which the insert DNA
`molecule showed identical start and end locations in
`the human genome; the duplicate reads were removed
`as previously described (19 ).
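The alignment-level filters and duplicate removal above can be sketched in Python as follows. This is a minimal illustration, not the pipeline used in the study: it assumes each read pair has already been reduced to a hypothetical `PairAlignment` record (the field names are illustrative and do not correspond to SOAP2's output format), and keeping the first copy of each duplicate set is also an assumption.

```python
from dataclasses import dataclass

MAX_INSERT_BP = 600  # insert size must be <= 600 bp


@dataclass(frozen=True)
class PairAlignment:
    """Hypothetical, simplified record for one aligned read pair."""
    chrom1: str
    chrom2: str
    start: int               # leftmost coordinate of the insert (1-based)
    end: int                 # rightmost coordinate of the insert (inclusive)
    proper_orientation: bool
    n_locations: int         # number of genomic locations the pair maps to


def passes_filters(pair: PairAlignment) -> bool:
    """Same chromosome, correct orientation, insert <= 600 bp, unique mapping."""
    insert_size = pair.end - pair.start + 1
    return (
        pair.chrom1 == pair.chrom2
        and pair.proper_orientation
        and 0 < insert_size <= MAX_INSERT_BP
        and pair.n_locations == 1
    )


def filter_and_deduplicate(pairs):
    """Keep pairs passing the filters; drop later pairs with identical start/end."""
    seen = set()
    kept = []
    for pair in pairs:
        if not passes_filters(pair):
            continue
        key = (pair.chrom1, pair.start, pair.end)
        if key in seen:
            continue  # duplicate: same chromosome, start, and end of the insert
        seen.add(key)
        kept.append(pair)
    return kept
```

Keying duplicates on the insert's chromosome, start, and end mirrors the definition in the text; a real pipeline would typically also track strand.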
`
`MICROARRAY ANALYSIS
`DNA extracted from the buffy coat and the tumor tis-
`sues of the HCC patients was genotyped with the Af-
`fymetrix Genome-Wide Human SNP Array 6.0 system,
`as previously described (23 ). The microarray data were
`processed with the Affymetrix Genotyping Console
`version 4.1. Genotyping analysis and single-nucleotide
`polymorphism (SNP) calling were performed with
`the Birdseed v2 algorithm, as previously described
`(24 ). The genotyping data for the buffy coat and the
`tumor tissues were used for identifying loss-of-
`heterozygosity (LOH) regions and for performing the
`copy number analysis. Copy number analysis was per-
`formed with the Genotyping Console with default pa-
`rameters from Affymetrix and with a minimum
`genomic-segment size of 100 bp and a minimum of 5
`genetic markers within the segment. Regions with LOH
`were identified as regions having 1 copy in the tumor
`tissue and 2 copies in the buffy coat, with the SNPs
`within these regions being heterozygous in the buffy
`coat but homozygous in the tumor tissue. For a
`genomic region exhibiting LOH in a tumor tissue, the
`SNP alleles that were present in the buffy coat but were
`absent from or of reduced intensity in the tumor tissues
`were considered to be the alleles on the deleted segment
`of the chromosomal region. The alleles that were pres-
`ent in both the buffy coat and the tumor tissue were
`deemed as having been derived from the nondeleted
`segment of the chromosomal region.
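The allele assignment for LOH regions can be expressed as a small function. The sketch below is a simplification under stated assumptions: it works on hard genotype calls (two-character strings such as "AG") and integer copy numbers, so the "reduced intensity" case described above is not modeled. The function name and inputs are hypothetical.

```python
from typing import Optional, Tuple


def classify_loh_alleles(
    buffy_genotype: str, tumor_genotype: str, tumor_cn: int, buffy_cn: int
) -> Optional[Tuple[str, str]]:
    """Return (nondeleted_allele, deleted_allele) for an LOH SNP, else None.

    LOH criteria as described in the text: 1 copy in the tumor, 2 copies in the
    buffy coat, heterozygous in the buffy coat, homozygous in the tumor.
    """
    buffy_alleles = set(buffy_genotype)
    tumor_alleles = set(tumor_genotype)
    is_loh = (
        tumor_cn == 1
        and buffy_cn == 2
        and len(buffy_alleles) == 2
        and len(tumor_alleles) == 1
    )
    if not is_loh:
        return None
    (retained,) = tumor_alleles
    if retained not in buffy_alleles:
        return None  # tumor allele absent from constitutional DNA: not a simple LOH
    (deleted,) = buffy_alleles - tumor_alleles
    return retained, deleted
```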
`
`ARRAY COMPARATIVE GENOMIC HYBRIDIZATION ANALYSIS
`DNA samples extracted from the buffy coat and the
`tumor tissues of the HCC patients were analyzed with
`the SurePrint G3 Human High Resolution Microarray
`Kit (Agilent) as previously described (25 ). Array com-
`parative genomic hybridization data for the HCC pa-
`tients were analyzed for copy number variation with
the Partek® Genomics Suite. In brief, the raw probe
`intensities were adjusted according to the GC content
`of the sequence. This adjustment was followed by
`probe-level normalization of signal intensity while si-
`multaneously adjusting for fragment length and probe
`sequences across all samples. Copy number gains and
`losses were detected by applying the default parameters
`of the Genomic Segmentation algorithm available in
`Partek Genomics Suite version 6.5 to obtain the differ-
`ent partitions of the copy number state.
`
DETECTION OF COPY NUMBER ABERRATION IN TUMOR TISSUE SAMPLES BY SEQUENCING
To investigate genomic copy number aberrations (e.g., copy number gains and copy number losses), we divided the genome into equal-sized segments (1 Mb per window/bin) and tallied the numbers of sequence reads mapping to each bin. Owing to the presence of GC-dependent sequencing biases with high-throughput sequencing technologies (26 ), we used a statistical correction method, locally weighted scatterplot smoothing (LOESS), to correct the GC-associated bias (27 ). In this method, a correction factor is calculated for each bin according to the LOESS regression model, as previously described (28 ). Then, the read counts of each bin were adjusted with the bin-specific correction factor and normalized with the median read counts of all bins. After GC correction, the ratio of the adjusted read counts of the tumor to those of the buffy coat was calculated with the following equation:

$$R = \frac{A_{\mathrm{tumor}}}{A_{\mathrm{BC}}},$$

where $A_{\mathrm{tumor}}$ is the normalized GC-adjusted read count of the tumor tissue and $A_{\mathrm{BC}}$ is the normalized GC-adjusted read count of the buffy coat.
We then constructed a frequency distribution of log2(R) for all bins. This distribution plot was used to estimate the proportion of tumor cells (F) in the tumor tissue that showed a particular copy number distribution and, subsequently, the copy number change at each bin.
On the frequency-distribution plot, a central peak at which R is approximately equal to 1 [i.e., log2(R) = 0] was identified; this peak represents the genomic regions without copy number aberrations. Then, the peaks lying to the left and right of the central peak were identified. These peaks represented regions with a 1-copy loss and a 1-copy gain, respectively. The distances of the left and right peaks from the central peak were used to determine the proportion of tumor cells (F) in the tumor tissue, according to the following equation:

$$F = R_{\mathrm{right}} - R_{\mathrm{left}},$$

where $R_{\mathrm{right}}$ is the R value of the right peak and $R_{\mathrm{left}}$ is the R value of the left peak.
Then, the copy number change (CN) values for all 1-Mb bins were calculated with the following equation:

$$\mathrm{CN} = \frac{R_{\mathrm{bin}} - R_{\mathrm{cen}}}{0.5 \times F},$$

where $R_{\mathrm{bin}}$ is the R value of the bin and $R_{\mathrm{cen}}$ is the R value of the central peak.
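The ratio, tumor-cell-fraction, and copy-number-change equations can be applied with a few lines of NumPy. The sketch assumes the per-bin read counts have already been GC corrected (the LOESS step is not reproduced) and that the left, central, and right peak positions have already been read off the log2(R) distribution; peak finding is left out. Function names are illustrative.

```python
import numpy as np


def median_normalize(counts):
    """Divide each bin's (GC-corrected) count by the median count of all bins."""
    counts = np.asarray(counts, dtype=float)
    return counts / np.median(counts)


def tumor_to_buffy_ratio(tumor_counts, buffy_counts):
    """R = A_tumor / A_BC for every bin."""
    return median_normalize(tumor_counts) / median_normalize(buffy_counts)


def tumor_cell_fraction(r_left, r_right):
    """F = R_right - R_left (1-copy-gain peak minus 1-copy-loss peak)."""
    return r_right - r_left


def copy_number_change(r_bin, r_cen, f):
    """CN = (R_bin - R_cen) / (0.5 * F)."""
    return (np.asarray(r_bin, dtype=float) - r_cen) / (0.5 * f)


# Illustration with made-up peak positions: with F = 0.6, a 1-copy gain shifts
# R by +0.3 and a 1-copy loss by -0.3 relative to the central peak.
f = tumor_cell_fraction(r_left=0.7, r_right=1.3)
cn = copy_number_change([1.0, 1.3, 0.7, 1.6], r_cen=1.0, f=f)
```

Because F sits in the denominator of CN, an underestimated tumor cell fraction inflates every apparent copy number change by the same factor.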
`
`DETECTION OF COPY NUMBER ABERRATIONS IN PLASMA
`We analyzed the genomic representation of plasma
`DNA for different genomic regions. First, the entire
`genome was divided into 1-Mb windows, similar to the
`analysis of copy number aberrations in the tumor tis-
`sues. The GC-corrected read count was then deter-
`mined (as described above) for each 1-Mb window. A z
`score statistic was used to determine if the plasma DNA
`representation in a 1-Mb window would be signifi-
`cantly increased or decreased when compared with the
`reference group. The reference group consisted of the
`plasma samples from 16 healthy control individuals. In
`the current study, the GC-corrected read counts of
`each 1-Mb bin were normalized to the median GC-
`corrected read counts of all bins in the sample. The
`normalized plasma DNA representation was then
`compared with the data from the controls. A z score
`was then calculated for each 1-Mb window by using the
mean and SDs of the controls. Regions with z scores
of <−3 and >3 were regarded as significantly under-
and overrepresented, respectively.
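The per-window z-score comparison can be sketched as below. It assumes the GC-corrected counts for the plasma sample and for the control group (16 healthy individuals in this study) are already available as arrays, and it uses the sample standard deviation (ddof=1) of the controls, which is an assumption since the text does not specify the estimator.

```python
import numpy as np


def window_z_scores(sample_counts, control_counts):
    """z score per 1-Mb window for one plasma sample.

    sample_counts:  shape (n_windows,), GC-corrected read counts of the sample.
    control_counts: shape (n_controls, n_windows), GC-corrected counts of controls.
    """
    sample = np.asarray(sample_counts, dtype=float)
    controls = np.asarray(control_counts, dtype=float)
    # Normalize each sample to the median GC-corrected count of its own windows.
    sample_norm = sample / np.median(sample)
    controls_norm = controls / np.median(controls, axis=1, keepdims=True)
    mean = controls_norm.mean(axis=0)
    sd = controls_norm.std(axis=0, ddof=1)
    return (sample_norm - mean) / sd


def call_windows(z, cutoff=3.0):
    """+1 = overrepresented (z > 3), -1 = underrepresented (z < -3), 0 = normal."""
    z = np.asarray(z)
    return np.where(z > cutoff, 1, np.where(z < -cutoff, -1, 0))
```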
`
NUMBER OF MOLECULES REQUIRED FOR IDENTIFYING COPY NUMBER ABERRATIONS IN PLASMA
`For copy number aberration analysis, the sensitivity
`and specificity of detecting tumor-associated copy
`number aberrations in plasma were determined by the
`precision of measuring the representation of plasma
`DNA in a chromosomal region and the fractional con-
`centration of the tumor-derived DNA in the plasma of
`the cancer patient. The precision of measuring the
`plasma DNA representation in turn was affected by the
number of plasma DNA molecules analyzed. In this
regard, we performed simulation analyses to determine
the relationship between the fractional concentration
of tumor-derived DNA in plasma and the number of
plasma DNA molecules that must be analyzed to achieve
a sensitivity of 95% for the detection of
tumor-associated copy number aberrations. Computer
simulations were performed for scenarios in which the
affected region had a copy number change of −1, +1, or
+2 and for fractional concentrations of tumor-derived
DNA ranging from 1% to 50%. In each
`simulation analysis, the entire genome was divided into
`3000 bins. This number was similar to the one we used
`in the actual experimental analysis when a 1-Mb reso-
`lution was used.
`We assumed that 10% of the bins would exhibit
`chromosomal aberrations in the tumor tissue. In the tu-
`mor tissue, the expected fraction (P) of total molecules
`falling into a bin within an affected region would be:
`
$$P = \frac{2 + \mathrm{CN}}{2 \times 3000 + 3000 \times 10\% \times \mathrm{CN}},$$
`
`where CN is the copy number change. From this infor-
`mation, we calculated the expected change in the
`plasma.
`In the plasma, the expected proportion of the total
`molecules (E) falling into a bin within an affected re-
`gion can be calculated as:
`
$$E = P \times f + \frac{1 - f}{3000},$$

where $f$ is the fractional concentration of tumor-derived DNA in plasma.
`Simulations of 1000 normal cases and 1000 cancer
`cases were performed on the assumption of a binomial
`distribution of the plasma DNA molecules, with the
`expected plasma representations as calculated above
`and with an increasing number of molecules being an-
`alyzed until the 95% detection rate was reached. The
`simulation was conducted with the rbinom function in
`R (http://www.r-project.org/).
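The simulation was run in R with rbinom; an analogous sketch in Python, using NumPy's `Generator.binomial`, is shown below. Several details are assumptions rather than the published settings: an affected window counts as detected when |z| > 3 relative to the simulated normal cases, the number of molecules is increased geometrically by 10% per step, and a fixed random seed is used.

```python
import numpy as np

N_BINS = 3000             # genome divided into 3000 bins (~1-Mb resolution)
ABERRANT_FRACTION = 0.10  # 10% of bins affected in the tumor


def tumor_fraction_in_bin(cn):
    """P: expected fraction of tumor molecules falling into one affected bin."""
    return (2 + cn) / (2 * N_BINS + N_BINS * ABERRANT_FRACTION * cn)


def plasma_fraction_in_bin(cn, f):
    """E = P * f + (1 - f) / 3000 for a tumor DNA fraction f in plasma."""
    return tumor_fraction_in_bin(cn) * f + (1 - f) / N_BINS


def detection_rate(n_total, cn, f, n_sims=1000, z_cut=3.0, seed=0):
    """Fraction of simulated cancer cases whose affected bin has |z| > z_cut."""
    rng = np.random.default_rng(seed)
    normal = rng.binomial(n_total, 1 / N_BINS, size=n_sims)
    cancer = rng.binomial(n_total, plasma_fraction_in_bin(cn, f), size=n_sims)
    z = (cancer - normal.mean()) / normal.std(ddof=1)
    return float(np.mean(np.abs(z) > z_cut))


def molecules_per_bin_needed(cn, f, target=0.95, start=10.0, step=1.1):
    """Smallest molecules-per-bin value (on a geometric grid) reaching target."""
    if not 0 < f <= 1:
        raise ValueError("f must be in (0, 1]")
    per_bin = start
    while detection_rate(int(per_bin * N_BINS), cn, f) < target:
        per_bin *= step
    return per_bin
```

For example, `molecules_per_bin_needed(2, 0.4)` estimates the molecules per 1-Mb window needed to detect a 2-copy gain at a 40% tumor DNA fraction; exact values depend on the seed and grid step.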
`
DETECTION OF TUMOR-ASSOCIATED SINGLE-NUCLEOTIDE VARIANTS
`We sequenced the paired tumor and constitutional
`DNA samples to identify the tumor-associated single-
`nucleotide variants (SNVs). We focused on the SNVs
`occurring at homozygous sites in the constitutional
`DNA (i.e., buffy coat DNA). In principle, any nucleo-
`tide variation detected in the sequencing data of the
`tumor tissues but absent in the constitutional DNA
could be a potential mutation (i.e., an SNV). Because of
sequencing errors (0.1%–0.3% of sequenced nucleotides)
(29 ), however, millions of false positives would
`be identified in the genome if a single occurrence of any
`nucleotide change in the sequencing data of the tumor
`tissue were to be regarded as a tumor-associated SNV.
`One way to reduce the number of false positives would
`be to institute the criterion of observing multiple
`occurrences of the same nucleotide change in the se-
`quencing data in the tumor tissue before a tumor-
`associated SNV would be called. Because the occur-
`rence of sequencing errors is a stochastic process, the
`number of false positives due to sequencing errors
`would decrease exponentially with the increasing num-
`ber of occurrences required for an observed SNV to be
`qualified as a tumor-associated SNV. On the other
`hand, the number of false positives would increase ex-
`ponentially with increasing sequencing depth. These
`relationships could be predicted with Poisson and bi-
`nomial distribution functions. In this regard, we have
`developed a mathematical algorithm to determine the
`dynamic threshold of occurrence for qualifying an ob-
`served SNV as tumor associated. This algorithm takes
`into account the actual coverage of the particular nu-
`cleotide in the tumor sequencing data, the sequencing
`error rate, the maximum false-positive rate allowed,
`and the desired sensitivity for mutation detection.
In this study, we set very stringent criteria to reduce false positives. We required a mutation to be completely absent from the constitutional DNA sequencing data, and the sequencing depth at the particular nucleotide position had to be >20-fold. The threshold of occurrence was set to keep the false-positive detection rate below 1 × 10⁻⁷. In this algorithm, we also filtered out SNVs within centromeric, telomeric, and low-complexity regions to minimize false positives due to alignment artifacts. In addition, putative SNVs mapping to known SNPs in the dbSNP build 135 database were removed.
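The dynamic occurrence threshold can be sketched with the binomial distribution alone. This is an illustration under assumptions, not the authors' algorithm: it treats `error_rate` as the per-read probability of observing the same non-reference base at a position, applies the >20-fold depth requirement and the 1 × 10⁻⁷ false-positive limit stated above, and optionally checks sensitivity for a hypothetical expected mutant-allele fraction.

```python
from math import comb


def binomial_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p)."""
    if k <= 0:
        return 1.0
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def occurrence_threshold(depth, error_rate, max_fp_rate=1e-7, min_depth=20,
                         mutant_allele_fraction=None, min_sensitivity=None):
    """Smallest number of supporting reads k that keeps error-only calls rare.

    Returns None if the depth is not above min_depth, or if the optional
    sensitivity requirement cannot be met at that k.
    """
    if depth <= min_depth:
        return None
    for k in range(1, depth + 1):
        if binomial_tail(depth, error_rate, k) < max_fp_rate:
            if mutant_allele_fraction is not None and min_sensitivity is not None:
                # Sensitivity only falls as k rises, so the smallest valid k is
                # the best case; if it fails here, no larger k can pass.
                if binomial_tail(depth, mutant_allele_fraction, k) < min_sensitivity:
                    return None
            return k
    return None
```

At 30-fold depth with a 0.3% error rate, going from 4 to 5 supporting reads lowers the error-only tail probability from roughly 2 × 10⁻⁶ to roughly 3 × 10⁻⁸, which is why the threshold rises only slowly with depth.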
`
`Results
`
`TUMOR-ASSOCIATED COPY NUMBER ABERRATIONS IN PLASMA
`We investigated whether tumor-associated copy num-
`ber aberrations could be detected in the plasma of can-
`cer patients by shotgun MPS. Peripheral blood samples
`were obtained both before and 1 week after surgical
`resection with curative intent from 4 HCC patients.
`The blood samples were fractionated into plasma and
`blood cells. DNA was also obtained from each of the
`tumors. Copy number aberrations in the 4 tumor sam-
`ples were analyzed with MPS and with 1 or 2 microar-
`ray platforms (Affymetrix and Agilent). Copy number
`aberrations were analyzed in 1-Mb windows across the
`genome in the tumor tissues and compared with the
`plasma samples from a group of 16 healthy control in-
`dividuals. The data were consistent across the 3 plat-
`forms (see Fig. 1 in the Data Supplement that accom-
`panies the online version of this article at http://
`www.clinchem.org/content/vol59/issue1).
`We then used MPS to analyze the pre- and post-
`resection plasma samples obtained from all 4 HCC
`patients. The mean sequencing depth was 17-fold
`coverage of the haploid human genome (range, 15.2-
`fold to 18.5-fold). Fig. 1 shows Circos plots (30 ) of
`the copy number aberrations across the genome in
`the tumor, the pre-resection plasma sample, and the
`post-resection plasma sample, for each patient. In
`each case, characteristic copy number aberrations
`seen in the tumor tissue sample were also observed in
the pre-resection plasma sample (Fig. 1). A significant
change in the regional representation of plasma
DNA was defined as >3 SDs from the mean
representation of the 16 healthy controls for the
corresponding 1-Mb window.
`For all cases, such copy number aberrations disap-
`peared almost completely in the post-resection plasma
`sample (Fig. 1). The detectability of the different classes
`of tumor-associated genetic alterations in plasma is
`shown in Fig. 2. For comparison, we used the same
`approach to analyze plasma DNA samples from 4 hep-
`atitis B carriers without HCC (Fig. 1E; see Fig. 2 in the
`online Data Supplement). These individuals were fol-
`lowed up for 1 additional year after blood sampling and
`had no evidence of HCC. For these individuals, 99% of
`the sequenced bins showed normal representations in
`plasma (see Table 1 in the online Data Supplement).
`
`Similarly, a mean of 98.9% of the sequenced bins in the
`16 healthy controls showed normal representations in
`plasma (see Table 2 and Fig. 3 in the online Data Sup-
`plement). These results indicate that the analysis of
`copy number aberrations in plasma is specific for dif-
`ferentiating between cancer patients and individuals
without cancer; however, the specificity of plasma
copy number analysis appeared to be reduced in the
HCC patients. Specifically, in the 4 HCC patients, a median
of 15% (range, 2%–48%) of the regions at which no
copy number aberrations occurred in the corresponding
tumor tissue showed an aberrant plasma DNA
representation (see Table 1 in the online Data
Supplement). This issue is considered in more detail in the
Discussion section.
`
FRACTIONAL CONCENTRATION OF TUMOR DNA IN PLASMA DETERMINED BY GENOMEWIDE AGGREGATED ALLELIC LOSS ANALYSIS
`The fractional concentrations of tumor-derived DNA
`in plasma were determined by analyzing, in a genome-
`wide manner, the allelic counts for SNPs exhibiting
`LOH in the plasma shotgun MPS data, which we term
`“genomewide aggregated allelic loss” (GAAL) analysis.
`For such an analysis, we chose SNPs that exhibited
`LOH in the tumors as demonstrated with the Af-
`fymetrix SNP 6.0 microarray. The alleles deleted in the
`tumors would have lower concentrations in the plasma
`than those that were not deleted. The difference in their
`concentrations was related to the concentration of
`tumor-derived DNA in the plasma sample. Thus, the
`plasma concentration of the tumor-derived DNA (C)
`can be deduced with the following equation:
`
$$C = \frac{N_{\mathrm{nondel}} - N_{\mathrm{del}}}{N_{\mathrm{nondel}}},$$
`
`where Nnondel represents the number of sequenced
`reads carrying the nondeleted alleles in the tumor
`tissues, and Ndel represents the number of sequenced
`reads carrying the deleted alleles in the tumor
`tissues.
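As a worked check of the GAAL equation, the function below recomputes the preoperative estimates in Table 1 from the listed read counts. The reasoning, under the simplifying assumption that nontumor DNA carries both alleles equally: the deleted allele comes only from nontumor DNA, whereas the retained allele comes from both tumor and nontumor DNA, so the relative deficit of the deleted allele equals the tumor-derived fraction.

```python
def gaal_fraction(n_nondel: int, n_del: int) -> float:
    """C = (N_nondel - N_del) / N_nondel."""
    if n_nondel <= 0:
        raise ValueError("n_nondel must be positive")
    return (n_nondel - n_del) / n_nondel


# Preoperative read counts (N_nondel, N_del) from Table 1.
PREOP_COUNTS = {
    "HCC1": (88_499, 42_829),
    "HCC2": (56_611, 53_428),
    "HCC3": (198_688, 190_163),
    "HCC4": (2_465, 2_277),
}

for case, (n_nondel, n_del) in PREOP_COUNTS.items():
    print(f"{case}: {gaal_fraction(n_nondel, n_del):.1%}")
```

This prints 51.6%, 5.6%, 4.3%, and 7.6%, matching Table 1 (which rounds the first value to 52%).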
`Table 1 lists the fractional concentrations of tumor
`DNA in the plasma samples for each of the 4 cases. The
`size of the tumor appears to be correlated with the es-
`timated fractional concentration of tumor-derived
`DNA in plasma before surgical resection. For example,
`we estimated that tumor-derived DNA accounted for
`52% of the total plasma DNA in the patient who had
`the largest tumor (13 cm) of the 4 HCC cases. For each
`of the 4 cases, we observed a reduction in the fractional
`concentration of tumor-derived DNA after surgical re-
`section of the tumor (Table 1).
`
`Fig. 1. Copy number aberrations detected in the tumor tissue sample (inner ring), presurgery plasma sample (middle
`ring), and postsurgery plasma sample (outer ring) for the 4 HCC cases (A–D) and genomic representation of plasma
`DNA samples for a hepatitis B virus carrier (HBV1) without HCC (E).
`Chromosome ideograms (outside the plots) are oriented pter to qter in a clockwise direction. For the tumor tissues, the numbers
`of copies gained (green) or lost (red) are presented. For the plasma samples, regions with increased and reduced representation
in plasma are shown in green and red, respectively. Regions with z scores between −3 and +3 are represented by gray dots.
`For HCC1, the distance between 2 consecutive horizontal lines represents a z score of 30. For the other cases, the distance
`represents a z score of 10.
`
`Fig. 2. (A), Detection rates for different classes of tumor-associated copy number aberrations in the plasma of the
`HCC patients.
The fractional concentration of tumor-derived DNA in plasma, determined for each case by GAAL analysis, is given in parentheses. (B),
`Number of molecules required for analysis in each region of interest (1-Mb window) for detecting 95% of the cancer-associated
`copy number aberrations in plasma.
`
FACTORS INFLUENCING THE DETECTION OF COPY NUMBER ABERRATIONS IN PLASMA
`The fractional concentration of tumor DNA in plasma
`and the class of copy number aberrations strongly influ-
`enced the detectability of such alterations in plasma. Fig. 1
`and Fig. 2A show that the proportions of tumor-
`associated genetic aberrations that could be seen in the
`plasma DNA–sequencing results were correlated with the
fractional concentrations of tumor DNA in plasma. For
example, case HCC1, which had the largest tumor and the
`highest fractional concentration of tumor DNA in
`plasma, also had the largest proportion of tumor-
`associated copy number aberrations detected in plasma
`(Fig. 1A).
Case HCC1 had a fractional concentration of tumor-derived DNA of 52%. Before treatment, most of the tumoral copy number aberrations could also be seen in the plasma. At most of the chromosomal regions with a single-copy gain in the tumor (e.g., chromosomes 1p, 3, and 6), the z scores were >20, indicating that the plasma representation was more than 20 SDs above the mean representation of the healthy control individuals for these regions. On the other hand, case HCC3 had a fractional tumor DNA concentration in plasma of 4.3%, and a smaller proportion of the cancer-associated aberrations could be observed in the plasma. None of the regions with a single-copy gain (7q, 8q, 13q, and 14p) had a z score of >10 in the plasma.

Table 1. Fractional concentration of tumor-derived DNA in plasma by GAAL analysis.

| Case | Maximal dimension of the tumor, cm | SNPs with LOH in tumor, n | Plasma collection time | Nnondel^a | Ndel | Fractional concentration of tumor-derived DNA |
|---|---|---|---|---|---|---|
| HCC1 | 13.0 | 11 310 | Before surgery | 88 499 | 42 829 | 52% |
| | | | After surgery | 94 442 | 93 584 | 0.9% |
| HCC2 | 6.2 | 6040 | Before surgery | 56 611 | 53 428 | 5.6% |
| | | | After surgery | 63 694 | 63 083 | 0.9% |
| HCC3 | 4.2 | 24 783 | Before surgery | 198 688 | 190 163 | 4.3% |
| | | | After surgery | 236 655 | 233 280 | 1.4% |
| HCC4 | 6.2 | 498 | Before surgery | 2465 | 2277 | 7.6% |
| | | | After surgery | 2773 | 2699 | 2.7% |

^a Nnondel, number of sequenced reads carrying the nondeleted alleles in tumor tissues; Ndel, number of sequenced reads carrying the deleted alleles in the tumor tissues.

Table 2. Fractional concentrations of tumor-derived DNA in plasma determined by SNV analysis.

| Case | No. of SNVs detected in tumor tissue | Time point | No. of tumor-associated SNVs sequenced from plasma (percentage of SNVs seen in tumor tissue) | No. of plasma DNA sequence reads showing SNVs (p) | No. of plasma DNA sequence reads showing wild-type sequence (q) | Deduced fractional concentration of tumor-derived DNA by SNV analysis, 2p/(p + q) | Deduced fractional concentration of tumor-derived DNA by GAAL analysis |
|---|---|---|---|---|---|---|---|
| HCC1 | 2840 | Before surgery | 2569 (94%) | 11 389 | 31 602 | 53% | 52% |
| | | After surgery | 44 (1.5%) | 91 | 46 898 | 0.4% | 0.9% |
| HCC2 | 3105 | Before surgery | 1097 (35%) | 1490 | 57 865 | 5.0% | 5.6% |
| | | After surgery | 72 (2.3%) | 206 | 66 692 | 0.6% | 0.9% |
| HCC3 | 3171 | Before surgery | 461 (15%) | 525 | 48 886 | 2.1% | 4.3% |
| | | After surgery | 31 (1%) | 67 | 58 862 | 0.2% | 1.4% |
| HCC4 | 1334 | Before surgery | 201 (15%) | 248 | 18 527 | 2.6% | 7.6% |
| | | After surgery | 74 (5.5%) | 149 | 22 144 | 1.3% | 2.7% |

The classes of copy number aberrations that were
`studied included 1-copy losses, 1-copy gains, and
`2-copy gains. The percentages of such changes that
`were detected in plasma in each of the 4 HCC cases are
`plotted in Fig. 2A and listed in Table 3 in the online
`Data Supplement. For each case, 2-copy gains could be
`detected with higher sensitivity in plasma than 1-copy
`changes. For all 4 HCC patients, most of these tumor-
`associated chromosomal aberrations disappeared after
`surgical resection of the tumor (Fig. 1; see Table 3 in the
`online Data Supplement).
`Previous work on noninvasive prenatal diagnosis
`has revealed that a greater sequencing depth would en-
`able the detection in plasma of an aneuploid fetus at a
`lower fractional fetal DNA concentration (31 ). Using
`computer simulation, we explored the relationship be-
`tween the depths of sequencing that would be needed
`to detect different classes of tumor-associated copy
`number aberrations in plasma at different fractional
`concentrations of tumor DNA in plasma (Fig. 2B). For
`illustration purposes we fixed the detection rate at 95%
`and explored 3 classes of genetic aberrations—1-copy
`loss, 1-copy gain, and 2-copy gain. When the fractional
`concentration of tumor-derived DNA was 40%, the de-
`tection of aberrations with 2-copy gains and 1-copy
`gains would require the analysis of approximately 180
`and 800 molecules, respectively, per 1-Mb window.
`When the fractional concentration of tumor-derived
`DNA drops to 10%, the analysis of approximately 2500
`and 12 000 molecules per 1-Mb window is necessary to
detect these respective changes. This steep increase in the number of molecules required as the fractional concentration of tumor-derived DNA decreases (roughly 14- to 15-fold for a 4-fold drop, i.e., approximately inversely proportional to the square of the fractional concentration) was consistent with the requirement for noninvasive prenatal diagnosis of fetal chromosomal aneuploidies via analysis of maternal plasma DNA (32 ).
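Why the required number of molecules climbs so quickly can be seen from a normal approximation, offered as a rough cross-check rather than a substitute for the simulation. Assume a normal window's count has mean N and SD √N (Poisson approximation), and an affected window's count has mean N(1 + s) and SD √(N(1 + s)), where s = 3000E − 1 = f(3000P − 1) is the relative shift implied by the Methods' equations for P and E. Requiring the affected window to clear the 3-SD cutoff with 95% probability gives N ≥ [(3 + 1.645√(1 + s))/|s|]². Because s is proportional to f, N grows roughly as 1/f². The function names and the one-sided 1.645 quantile are assumptions of this sketch.

```python
from math import sqrt

N_BINS = 3000
ABERRANT_FRACTION = 0.10


def relative_shift(cn, f):
    """s = 3000*E - 1, i.e. f * (3000*P - 1), using the Methods' P and E."""
    p = (2 + cn) / (2 * N_BINS + N_BINS * ABERRANT_FRACTION * cn)
    e = p * f + (1 - f) / N_BINS
    return N_BINS * e - 1.0


def approx_molecules_per_window(cn, f, z_cut=3.0, z_power=1.645):
    """Normal-approximation estimate of molecules per window for 95% detection."""
    s = relative_shift(cn, f)
    k = z_cut + z_power * sqrt(1.0 + s)
    return (k / abs(s)) ** 2


for cn in (2, 1):
    for f in (0.4, 0.1):
        print(cn, f, round(approx_molecules_per_window(cn, f)))
```

The estimates land in the same order of magnitude as the simulated values quoted above (about 180, 800, 2500, and 12 000 molecules); the simulation remains the reference figure.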
`
`TUMOR-DERIVED SNVs IN PLASMA
`We next explored the genomewide detection of tumor-
`derived SNVs in the plasma of the 4 HCC patients. We
`sequenced tumor DNA and buffy coat DNA to mean
`depths of 29.5-fold (range, 27-fold to 33-fold) and 43-
`fold (range, 39-fold to 46-fold) haploid genome cover-
`age, respectively. The MPS data from the tumor DNA
`and the buffy coat DNA from each of the 4 HCC pa-
`tients were compared, and SNVs present in the tumor
`DNA but not in the buffy coat DNA were mined with a
`stringent bioinformatics algorithm. This algorithm re-
`quired a putative SNV to be present in at least a thresh-
`old number of sequenced tumor DNA fragments be-
`fore it would be classified as a true SNV. The threshold
`number was determined by taking into account the se-
`quencing depth of a particular nucleotide and the se-
`quencing error rate.
`The number of tumor-associated SNVs ranged
`from 1334 to 3171 in the 4 HCC cases. The proportions
`of such SNVs that were detectable in plasma are listed in
`Table 2. Before treatment, 15%–94% of the tumor-
`associated SNVs were detected in plasma. The fractional
`concentrations of tumor-derived DNA in plasma were
`determined by the fractional counts of the mutant with
`respect to the total (i.e., mutant plus wild type) sequences
`(Table 2). These fractional concentrations were well cor-
`related with those determined with GAAL analysis and
`were reduced after surgery (Table 2).
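The SNV-based estimate in Table 2 is 2p/(p + q). The factor of 2 presumably reflects that a tumor-associated SNV is usually present on only one of the two alleles, so mutant reads make up about half of the tumor-derived reads at that position; this interpretation is ours, as the text gives only the formula. A minimal sketch that recomputes the preoperative values from the read counts in Table 2, alongside the GAAL estimates from the same table:

```python
def snv_fraction(p_mutant: int, q_wild_type: int) -> float:
    """Fractional concentration of tumor-derived DNA as 2p / (p + q)."""
    total = p_mutant + q_wild_type
    if total == 0:
        raise ValueError("no informative plasma reads")
    return 2 * p_mutant / total


# Preoperative (p, q) read counts and GAAL estimates from Table 2.
PREOP = {
    "HCC1": (11_389, 31_602, 0.52),
    "HCC2": (1_490, 57_865, 0.056),
    "HCC3": (525, 48_886, 0.043),
    "HCC4": (248, 18_527, 0.076),
}

for case, (p, q, gaal) in PREOP.items():
    print(f"{case}: SNV {snv_fraction(p, q):.1%} vs GAAL {gaal:.1%}")
```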
`To estimate the specificity of the SNV analysis
`approach, we analyzed the plasma of the healthy
`controls for the tumor-associated SNVs (se