`http://genomemedicine.com/content/5/4/30
`
RESEARCH
`Open Access
`Tumor-associated copy number changes in the
`circulation of patients with prostate cancer
`identified through whole-genome sequencing
`Ellen Heitzer1†, Peter Ulz1†, Jelena Belic1, Stefan Gutschi2, Franz Quehenberger3, Katja Fischereder2,
`Theresa Benezeder1, Martina Auer1, Carina Pischler1, Sebastian Mannweiler4, Martin Pichler5, Florian Eisner5,
`Martin Haeusler6, Sabine Riethdorf7, Klaus Pantel7, Hellmut Samonigg5, Gerald Hoefler4, Herbert Augustin2,
`Jochen B Geigl1* and Michael R Speicher1*
`
`Abstract
`
`Background: Patients with prostate cancer may present with metastatic or recurrent disease despite initial curative
`treatment. The propensity of metastatic prostate cancer to spread to the bone has limited repeated sampling of
`tumor deposits. Hence, considerably less is understood about this lethal metastatic disease, as it is not commonly
`studied. Here we explored whole-genome sequencing of plasma DNA to scan the tumor genomes of these
`patients non-invasively.
`Methods: We wanted to make whole-genome analysis from plasma DNA amenable to clinical routine applications
and developed an approach based on a benchtop high-throughput platform, that is, Illumina's MiSeq instrument.
`We performed whole-genome sequencing from plasma at a shallow sequencing depth to establish a genome-
`wide copy number profile of the tumor at low costs within 2 days. In parallel, we sequenced a panel of 55 high-
`interest genes and 38 introns with frequent fusion breakpoints such as the TMPRSS2-ERG fusion with high
coverage. After intensive testing of our approach with samples from 25 individuals without cancer, we analyzed 13
plasma samples derived from five patients with castration-resistant (CRPC) and four patients with castration-sensitive
prostate cancer (CSPC).
`Results: The genome-wide profiling in the plasma of our patients revealed multiple copy number aberrations
`including those previously reported in prostate tumors, such as losses in 8p and gains in 8q. High-level copy number
`gains in the AR locus were observed in patients with CRPC but not with CSPC disease. We identified the TMPRSS2-
`ERG rearrangement associated 3-Mbp deletion on chromosome 21 and found corresponding fusion plasma
`fragments in these cases. In an index case multiregional sequencing of the primary tumor identified different copy
`number changes in each sector, suggesting multifocal disease. Our plasma analyses of this index case, performed 13
years after resection of the primary tumor, revealed novel chromosomal rearrangements that were stable in serial
plasma analyses over a 9-month period, consistent with the presence of one metastatic clone.
`Conclusions: The genomic landscape of prostate cancer can be established by non-invasive means from plasma
DNA. Our approach provides specific genomic signatures within 2 days and may therefore serve as a ‘liquid
biopsy’.
`
* Correspondence: jochen.geigl@medunigraz.at; michael.speicher@medunigraz.at
`† Contributed equally
`1Institute of Human Genetics, Medical University of Graz, Harrachgasse 21/8,
`A-8010 Graz, Austria
`Full list of author information is available at the end of the article
`
`© 2013 Heitzer et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons
`Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in
`any medium, provided the original work is properly cited.
`
`
`
`
`Heitzer et al. Genome Medicine 2013, 5:30
`http://genomemedicine.com/content/5/4/30
`
`
`Background
`Prostate cancer is the most common malignancy in
`men. In Europe each year an estimated number of 2.6
`million new cases is diagnosed [1]. The wide application
`of PSA testing has resulted in a shift towards diagnosis
`at an early stage so that many patients do not need
`treatment or are cured by radical surgery [2]. However,
`patients still present with metastatic or recurrent disease
`despite initial curative treatment [3]. In these cases pros-
`tate-cancer progression can be inhibited by androgen-
`deprivation therapy (ADT) for up to several years. How-
`ever, disease progression is invariably observed with
`tumor cells resuming proliferation despite continued
`treatment (termed castration-resistant prostate cancer or
`CRPC) [4]. CRPC is a strikingly heterogeneous disease
`and the overall survival can be extremely variable [5].
`Scarcity of predictive and prognostic markers underlines
`the growing need for a better understanding of the
`molecular makeup of these lethal tumors.
`However, acquiring tumor tissue from patients with
`metastatic prostate cancer often represents a challenge.
Due to the propensity of metastatic prostate cancer to
spread to bone, biopsies can be technically challenging, which
limits repeated sampling of tumor deposits. As a consequence,
considerably less is understood about the later
`acquired genetic alterations that emerge in the context of
`the selection pressure of an androgen-deprived milieu [6].
`Consistent and frequent findings from recent genomic
`profiling studies in clinical metastatic prostate tumors
`include the TMPRSS2-ERG fusion in approximately
`50%, 8p loss in approximately 30% to 50%, 8q gain in
`approximately 20% to 40% of cases, and the androgen
`receptor (AR) amplification in approximately 33% of
`CRPC cases [7-10]. Several whole-exome or whole-gen-
`ome sequencing studies consistently reported low over-
`all mutation rates even in heavily treated CRPCs [9-14].
`The difficulties in acquiring tumor tissue can partly be
`addressed by elaborate procedures such as rapid autopsy
`programs to obtain high-quality metastatic tissue for
`analysis [15]. However, this material can naturally only
`be used for research purposes, but not for biomarker
`detection for individualized treatment decisions. This
`makes blood-based assays crucially important to indivi-
`dualize management of prostate cancer [16]. Profiling of
`blood offers several practical advantages, including the
`minimally invasive nature of sample acquisition, relative
`ease of standardization of sampling protocols, and the
`ability to obtain repeated samples over time. For exam-
`ple, the presence of circulating tumor cells (CTCs) in
`peripheral blood is a prognostic biomarker and a mea-
`sure of therapeutic response in patients with prostate
cancer [17-20]. Novel microfluidic devices enhance CTC
capture [21-23] and allow the establishment of a non-invasive
measure of intratumoral AR signaling before and after
hormonal therapy [24]. Furthermore, prospective studies
`have demonstrated that mRNA expression signatures
`from whole blood can be used to stratify patients with
`CRPC into high- and low-risk groups [25,26].
Another option is the analysis of plasma DNA
`[27]. One approach is the identification of known altera-
`tions previously found in the resected tumors from the
`same patients in plasma DNA for monitoring purposes
`[28,29]. Furthermore, recurrent mutations can be identi-
`fied in plasma DNA in a subset of patients with cancer
`[30-32]. Given that chromosomal copy number changes
`occur frequently in human cancer, we developed an
`approach allowing the mapping of tumor-specific copy
`number changes from plasma DNA employing array-
`CGH [33]. At the same time, massively parallel sequen-
`cing of plasma DNA from the maternal circulation is
emerging as a clinical tool for the routine detection of
`fetal aneuploidy [34-36]. Using essentially the same
`approach, that is, next-generation sequencing from
`plasma, the detection of chromosomal alterations in the
`circulation of three patients with hepatocellular carci-
`noma and one patient with both breast and ovarian can-
`cer [37] and from 10 patients with colorectal and breast
`cancer [38] was reported.
`However, the costs of the aforementioned plasma
`sequencing studies necessary for detection of rearrange-
`ments were prohibitive for routine clinical implementa-
`tion [37,38]. In addition, these approaches are very
`time-consuming. Previously it had been shown that
`whole-genome sequencing with a shallow sequencing
`depth of about 0.1x is sufficient for a robust and reliable
`analysis of copy number changes from single cells [39].
`Hence, we developed a different whole-genome plasma
`sequencing approach employing a benchtop high-
`throughput sequencing instrument, that is, the Illumina
`MiSeq, which is based on the existing Solexa sequen-
`cing-by-synthesis chemistry, but has dramatically
`reduced run times compared to the Illumina HiSeq [40].
`Using this instrument we performed whole-genome
`sequencing from plasma DNA and measured copy
`number from sequence read depth. We refer to this
`approach as plasma-Seq. Furthermore, we enriched 1.3
`Mbp consisting of exonic sequences of 55 high-interest
cancer genes and 38 introns of genes where fusion
breakpoints have been described, and subjected the DNA
`to next-generation sequencing at high coverage
`(approximately 50x). Here we present the implementa-
`tion of our approach with 25 plasma samples from indi-
`viduals without cancer and results obtained with whole
`genome sequencing of 13 plasma DNA samples derived
`from nine patients (five CRPC, four CSPC) with prostate
`cancer.
`
`
`Methods
`Patient eligibility criteria
`This study was conducted among men with prostate can-
`cer (Clinical data in Additional file 1, Table S1) who met
the following criteria: histologically proven (based on a
biopsy), metastasized prostate cancer. We distinguished
`between CRPC and CSPC based on the guidelines on
`prostate cancer from the European Association of Urology
`[41], that is: 1, castrate serum levels of testosterone (testos-
`terone <50 ng/dL or <1.7 nmol/L); 2, three consecutive
`rises of PSA, 1 week apart, resulting in two 50% increases
`over the nadir, with a PSA >2 ng/mL; 3, anti-androgen
`withdrawal for at least 4 weeks for flutamide and for at
`least 6 weeks for bicalutamide; 4, PSA progression, despite
`consecutive hormonal manipulations. Furthermore, we
`focused on patients who had ≥5 CTCs per 7.5 mL [19]
`and/or a biphasic plasma DNA size distribution as
`described previously by us [33].
`The study was approved by the ethics committee of
`the Medical University of Graz (approval numbers 21-
`228 ex 09/10, prostate cancer, and 23-250 ex 10/11, pre-
`natal plasma DNA analyses), conducted according to the
`Declaration of Helsinki, and written informed consent
`was obtained from all patients and healthy blood
`donors. Blood from prostate cancer patients and from
`male controls without malignant disease was obtained
`from the Department of Urology or the Division of
`Clinical Oncology, Department of Internal Medicine, at
`the Medical University of Graz. From prostate cancer
`patients we obtained a buccal swab in addition. Blood
`samples from pregnant females and from female con-
`trols without malignant disease were collected at the
`Department of Obstetrics and Gynecology, Medical Uni-
`versity of Graz. The blood samples from the pregnant
`females were taken prior to an invasive prenatal diag-
`nostic procedure.
`
`Plasma DNA preparation
`Plasma DNA was prepared using the QIAamp DNA
`Blood Mini Kit (Qiagen, Hilden, Germany) as previously
`described [33]. Samples selected for sequence library
`construction were analyzed by using the Bioanalyzer
`instrument (Agilent Technologies, Santa Clara, CA,
`USA) to observe the plasma DNA size distribution. In
`this study we included samples with a biphasic plasma
`DNA size distribution as previously described [33].
`
`Enumeration of CTCs
`We performed CTC enumeration using the automated
and FDA-approved CellSearch assay. Blood samples (7.5
`mL each) were collected into CellSave tubes (Veridex,
`Raritan, NJ, USA). The Epithelial Cell Kit (Veridex) was
`applied for CTC enrichment and enumeration with the
`CellSearch system as described previously [42,43].
`
`Array-CGH
`Array-CGH was carried out using a genome-wide oligo-
`nucleotide microarray platform (Human genome CGH
`60K microarray kit, Agilent Technologies, Santa Clara,
`CA, USA), following the manufacturer’s instructions
`(protocol version 6.0) as described [33]. Evaluation was
`done based on our previously published algorithm
`[33, 44, 45].
`
`HT29 dilution series
`Sensitivity of our plasma-Seq approach was determined
`using serial dilutions of DNA from HT29 cell line (50%,
`20%, 15%, 10%, 5%, 1%, and 0%) in the background of nor-
`mal DNA (Human Genomic DNA: Female; Promega,
Fitchburg, WI, USA). Since quantification using absorption
or fluorescence is often not reliable, we
`used quantitative PCR to determine the amount of ampli-
`fiable DNA and normalized the samples to a standard
`concentration using the Type-it CNV SYBR Green PCR
`Kits (Qiagen, Hilden, Germany). Dilution samples were
`then fragmented using the Covaris S220 System (Covaris,
`Woburn, MA, USA) to a maximum of 150-250 bp and 10
`ng of each dilution were used for library preparation to
`simulate plasma DNA condition.
`
`Plasma-Seq
`Shotgun libraries were prepared using the TruSeq DNA
`LT Sample preparation Kit (Illumina, San Diego, CA,
USA) following the manufacturer's instructions with
`three exceptions. First, due to limited amounts of plasma
`DNA samples we used 5-10 ng of input DNA. Second,
`we omitted the fragmentation step since the size distribu-
`tion of the plasma DNA samples was analyzed on a Bioa-
`nalyzer High Sensitivity Chip (Agilent Technologies,
`Santa Clara, CA, USA) and all samples showed an enrich-
`ment of fragments in the range of 160 to 340 bp. Third,
`for selective amplification of the library fragments that
`have adapter molecules on both ends we used 20-25 PCR
`cycles. Four libraries were pooled equimolarily and
`sequenced on an Illumina MiSeq (Illumina, San Diego,
`CA, USA).
`The MiSeq instrument was prepared following routine
`procedures. The run was initiated for 1x150 bases plus
`1x25 bases of SBS sequencing, including on-board clus-
`tering and paired-end preparation, the sequencing of the
`respective barcode indices and analysis. On the comple-
`tion of the run, data were base called and demultiplexed
`on the instrument (provided as Illumina FASTQ 1.8 files,
`Phred+33 encoding). FASTQ format files in Illumina 1.8
`format were considered for downstream analysis.
`
`Calculation of segments with identical log2 ratio values
`We employed a previously published algorithm [46] to
`create a reference sequence. The pseudo-autosomal region
`
`(PAR) on the Y chromosome was masked and the mapp-
`ability of each genomic position examined by creating vir-
`tual 150 bp reads for each position in the PAR-masked
`genome. Virtual sequences were mapped to the PAR-
`masked genome and mappable reads were extracted. Fifty
`thousand genomic windows were created (mean size,
`56,344 bp) each having the same amount of mappable
`positions.
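The window construction can be pictured as a partition of the genome by cumulative mappability. The following is a minimal Python sketch under stated assumptions, not the published algorithm [46]: the function name `equal_mappability_windows` and the per-position 0/1 mappability list are hypothetical and chosen only for illustration.

```python
def equal_mappability_windows(mappable, n_windows):
    """Split positions 0..len(mappable)-1 into n_windows half-open intervals
    [start, end) that each hold (nearly) the same number of mappable positions.

    `mappable` is a hypothetical list of 0/1 flags, one per genomic position.
    """
    total = sum(mappable)
    if n_windows <= 0 or total < n_windows:
        raise ValueError("need at least one mappable position per window")
    windows = []
    start = 0
    seen = 0   # mappable positions counted so far
    k = 1      # 1-based index of the window currently being filled
    for pos, flag in enumerate(mappable):
        seen += flag
        # Close window k once it reaches its cumulative share (integer arithmetic).
        if flag and k < n_windows and seen * n_windows >= k * total:
            windows.append((start, pos + 1))
            start = pos + 1
            k += 1
    windows.append((start, len(mappable)))  # last window takes the remainder
    return windows
```

Windows built this way differ in physical length (unmappable stretches widen them) but are comparable in the number of positions that can attract reads, which is the property the text relies on.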
`Low-coverage whole-genome sequencing reads were
`mapped to the PAR-masked genome and reads in differ-
`ent windows were counted and normalized by the total
`amount of reads. We further normalized read counts
`according to the GC content using LOWESS-statistics. In
`order to avoid position effects we normalized the sequen-
`cing data with GC-normalized read counts of plasma
`DNA of our healthy controls and calculated log2 ratios.
`Resulting normalized ratios were segmented using circu-
`lar binary segmentation (CBS) [47] and GLAD [48] by
`applying the CGHweb [49] framework in R [50]. These
`segments were used for calculation of the segmental z-
`scores by adding GC-corrected read-count ratios (read-
`counts in window divided by mean read-count) of all the
`windows in a segment. Z-scores were calculated by sub-
`tracting mean sum of GC-corrected read-count ratios of
`individuals without cancer (10 for men and 9 for women)
`of same sex and dividing by their standard-deviation.
$$
z_{\mathrm{segments}} = \frac{\sum \mathrm{ratio}_{\mathrm{GC\text{-}corr}} - \mathrm{mean}\left(\sum \mathrm{ratio}_{\mathrm{GC\text{-}corr,controls}}\right)}{\mathrm{SD}\left(\sum \mathrm{ratio}_{\mathrm{GC\text{-}corr,controls}}\right)}
$$
`
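The segmental z-score described above can be illustrated with a short Python sketch. It assumes that GC correction has already been applied and that the sample standard deviation (n-1 denominator) is intended; the source does not state which estimator was used. The function names are hypothetical.

```python
from statistics import mean, stdev

def count_ratios(counts):
    """Read count in each window divided by the mean read count over all windows."""
    m = mean(counts)
    return [c / m for c in counts]

def segment_sum(ratios, start, end):
    """Sum of GC-corrected read-count ratios over windows start..end-1."""
    return sum(ratios[start:end])

def segmental_z(sample_ratios, control_ratios, start, end):
    """z-score of one segment: sample sum versus the sums of same-sex controls.

    `control_ratios` is a list with one ratio list per control individual.
    Assumption: sample SD (n-1) is used, as computed by statistics.stdev.
    """
    sample = segment_sum(sample_ratios, start, end)
    controls = [segment_sum(c, start, end) for c in control_ratios]
    return (sample - mean(controls)) / stdev(controls)
```

For example, with three controls whose segment sums are 2.0, 2.2, and 1.8 (mean 2.0, SD 0.2), a sample sum of 2.6 gives a segmental z-score of about 3.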
`Calculation of z-scores for specific regions
`In order to check for the copy-number status of genes
`previously implicated in prostate-cancer initiation or
`progression we applied z-score statistics for each region
`focusing on specific targets (mainly genes) of variable
`length within the genome. At first we counted high-
`quality alignments against the PAR-masked hg19 gen-
`ome within genes for each sample and normalized by
`expected read counts.
`
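$$
\mathrm{ratio} = \frac{\mathrm{reads}_{\mathrm{region}}}{\mathrm{reads}_{\mathrm{expected}}}
$$

Here expected reads are calculated as

$$
\mathrm{reads}_{\mathrm{expected}} = \frac{\mathrm{length}_{\mathrm{region}}}{\mathrm{length}_{\mathrm{genome}}} \cdot \mathrm{reads}_{\mathrm{total}}
$$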
`
`Then we subtracted the mean ratio of a group of con-
`trols and divided it by the standard deviation of that
`group.
`
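$$
z_{\mathrm{region}} = \frac{\mathrm{ratio}_{\mathrm{sample}} - \mathrm{mean}\left(\mathrm{ratio}_{\mathrm{controls}}\right)}{\mathrm{SD}\left(\mathrm{ratio}_{\mathrm{controls}}\right)}
$$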
`
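The region-specific calculation can be sketched in a few lines of Python. This is an illustration under assumptions rather than the authors' code: the function names are hypothetical, and the sample standard deviation is assumed.

```python
from statistics import mean, stdev

def region_ratio(reads_region, length_region, length_genome, reads_total):
    """Observed reads in a region divided by the reads expected from its length."""
    reads_expected = length_region / length_genome * reads_total
    return reads_region / reads_expected

def region_z(ratio_sample, ratio_controls):
    """z-score of the sample ratio against a group of control ratios.

    Assumption: sample SD (n-1), as computed by statistics.stdev.
    """
    return (ratio_sample - mean(ratio_controls)) / stdev(ratio_controls)
```

As a worked case with made-up numbers: a 1-Mbp region in a 3,000-Mbp genome with 300,000 total reads is expected to receive 100 reads, so 300 observed reads give a ratio of about 3.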
`Calculation of genome-wide z-scores
`In order to establish a genome-wide z-score to detect
`aberrant genomic content in plasma, we divided the
`genome into equally-sized regions of 1 Mbp length and
`calculated z-scores therein.
`Under the condition that all ratios were drawn from
`the same normal distribution, z-scores are distributed
`proportionally to Student’s t-distribution with n-1
`degrees of freedom. For controls, z-scores were calcu-
`lated using cross-validation. In brief, z-score calculation
`of one control is based on means and standard deviation
`of the remaining controls. This prevents controls from
`serving as their own controls.
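The cross-validation step for controls amounts to a leave-one-out calculation. A minimal Python sketch follows; the function name `loo_z_scores` is hypothetical, and the sample standard deviation is an assumption.

```python
from statistics import mean, stdev

def loo_z_scores(control_values):
    """Leave-one-out z-scores for one window across a group of controls.

    Each control is scored against the mean and SD of the remaining
    controls, so no control contributes to its own reference.
    Needs at least three values (two must remain to estimate an SD).
    """
    scores = []
    for i, value in enumerate(control_values):
        rest = control_values[:i] + control_values[i + 1:]
        scores.append((value - mean(rest)) / stdev(rest))
    return scores
```

Because the held-out control does not pull the reference mean toward itself, an outlying control receives a larger absolute z-score than it would if it were included in its own reference.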
`The variance of these cross-validated z-scores of con-
`trols is slightly higher than the variance of z-scores of
`tumor patients. Thus ROC performance is underesti-
`mated. This was confirmed in the simulation experiment
`described below.
In order to summarize the high or low z-scores that were
observed in many tumor patients, squared z-scores were
summed up:

$$
S = \sum_{i \,\in\, \text{all windows}} z_i^2
$$
`
`Genome-wide z-scores were calculated from S-scores.
`Other methods of aggregation of z-score information,
such as sums of absolute values or PA scores [38], performed
worse and were therefore not considered. Per
`window z-scores were clustered hierarchically by the
`hclust function of R using Manhattan distance that
`summed up the distance of each window.
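The S statistic above is simple to compute. The text does not spell out how the genome-wide z-score is derived from S, so the second function below is an assumption: it standardizes a sample's S against the S values of controls, by analogy with the other z-scores in this section. Both function names are hypothetical.

```python
from statistics import mean, stdev

def s_score(window_z_scores):
    """Sum of squared per-window z-scores; gains and losses both raise S."""
    return sum(z * z for z in window_z_scores)

def genome_wide_z(s_sample, s_controls):
    """ASSUMPTION: standardize S against control S values (sample SD).

    The source only says genome-wide z-scores were calculated from S-scores.
    """
    return (s_sample - mean(s_controls)) / stdev(s_controls)
```

Squaring makes the measure insensitive to the direction of a change, so a genome with balanced gains and losses still scores high, whereas a plain sum of z-scores could cancel out.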
`In order to validate the diagnostic performance of the
`genome-wide z-score in silico, artificial cases and con-
`trols were simulated from mean and standard deviations
`of ratios from 10 healthy controls according to a normal
`distribution. Simulated tumor cases were obtained
`through multiplication of the mean by the empirical
`copy number ratio of 204 prostate cancer cases [9]. Seg-
`mented DNA-copy-number data were obtained via the
`cBio Cancer Genomics Portal [51].
`To test the specificity of our approach at varying
`tumor DNA content, we performed in-silico dilutions of
`simulated tumor data. To this end we decreased the
tumor signal using the formula below, where λ is the
ratio of tumor DNA to normal DNA:

$$
(1 - \lambda) + \lambda \cdot \mathrm{ratio}_{\mathrm{segment}}
$$
`
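The dilution formula can be written as a one-line function. In this sketch λ is treated as the tumor-derived share of the mixture, between 0 and 1, which is how the formula weights its two terms; the function name `dilute` is hypothetical.

```python
def dilute(ratio_segment, lam):
    """Copy-number ratio of a segment after mixing tumor signal with normal DNA.

    lam = 0 returns the normal ratio of 1; lam = 1 returns the undiluted
    tumor ratio. Values outside [0, 1] are rejected.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie between 0 and 1")
    return (1 - lam) + lam * ratio_segment
```

For instance, a segment with a tumor ratio of 1.5 (a single-copy gain in a diploid genome) mixed at λ = 0.1 yields a ratio of about 1.05, which illustrates why low tumor fractions are hard to separate from noise.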
`We performed ROC analyses of 500 simulated con-
`trols and 102 published prostate tumor data and their
`respective dilutions using the pROC R-package [52].
`The prostate tumor data were derived from a previously
`published dataset [9] and the 102 cases were selected
`based on their copy number profiles.
`
`Gene-Breakpoint Panel: target enrichment of cancer
`genes, alignment and SNP-calling, SNP-calling results
`We enriched 1.3 Mbp of seven plasma DNAs (four
`CRPC cases, CRPC1-3 and CRPC5; three CSPC cases,
`CSPC1-2 and CSPC4) including exonic sequences of 55
cancer genes and 38 introns of 18 genes where fusion
breakpoints have been described, using the SureSelect Custom
DNA Kit (Agilent, Santa Clara, CA, USA) following
`the manufacturer’s recommendations. Since we had very
`low amounts of input DNA we increased the number of
`cycles in the enrichment PCR to 20. Six libraries were
`pooled equimolarily and sequenced on an Illumina
`MiSeq (Illumina, San Diego, CA, USA).
We generated a mean of 7.78 million 150-bp paired-end reads
(range, 3.62-14.96 million) on an Illumina
`MiSeq (Illumina, San Diego, CA, USA). Sequences were
`aligned using BWA [53] and duplicates were marked
`using picard [54]. We subsequently performed realigning
`around known indels and applied the Unified Genotyper
`SNP-calling software provided by the GATK [55].
`We further annotated resulting SNPs by employing
`annovar [56] and reduced the SNP call set by removing
`synonymous variants, variants in segmental duplications
and variants listed in the 1000 Genomes Project [57] or the
Exome Variant Server (NHLBI Exome Sequencing Project (ESP),
Seattle, WA) [58] with
`allele frequency >0.01.
`We set very stringent criteria to reduce false positives
`according to previously published values [37]: a mutation
`had to be absent from the constitutional DNA sequencing
`and the sequencing depth for the particular nucleotide
`position had to be >20-fold. Furthermore, all putative
`mutations or breakpoint spanning regions were verified by
`Sanger sequencing.
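The filtering criteria listed in this section can be summarized as a single predicate. The Python sketch below is illustrative only: the dictionary keys are hypothetical and do not correspond to the output fields of annovar or GATK, and Sanger confirmation is not modeled.

```python
def passes_filters(variant):
    """Return True if a called variant survives the filters described in the text.

    Hypothetical keys:
      synonymous        - True for synonymous variants (removed)
      in_segdup         - True if inside a segmental duplication (removed)
      pop_af            - allele frequency in population databases, 0.0 if unlisted
      in_constitutional - True if also seen in the patient's constitutional DNA
      depth             - sequencing depth at the position
    """
    return (
        not variant["synonymous"]
        and not variant["in_segdup"]
        and variant["pop_af"] <= 0.01      # listed variants with AF > 0.01 are removed
        and not variant["in_constitutional"]
        and variant["depth"] > 20          # depth must exceed 20-fold
    )
```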
`
`Split-read analysis
`Since plasma DNA is fragmented the read pair method is
`not suitable for identification of structural rearrange-
`ments [59] and therefore we performed split-read analy-
`sis of 150 bp reads. We used the first and the last 60 bp
`of each read (leaving a gap of 30 bp) and mapped these
`independently. We further analyzed discordantly mapped
`split-reads by focusing on targeted regions and filtering
`out split-reads mapping within repetitive regions and
alignments having a low mapping quality (<25). Reads
whose split halves mapped discordantly were
`aligned to the human genome using BLAT [60] to further
`specify putative breakpoints.
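The read-splitting step described above can be sketched as follows. This is a minimal Python illustration with hypothetical function names; mapping itself (and the BLAT step) is outside its scope.

```python
def split_read(sequence, anchor_len=60):
    """Return the first and last `anchor_len` bases of a read.

    For a 150 bp read and 60 bp anchors, 30 bp in the middle are left unused.
    """
    if len(sequence) < 2 * anchor_len:
        raise ValueError("read too short to split without overlap")
    return sequence[:anchor_len], sequence[-anchor_len:]

def keep_alignment(mapping_quality, in_repetitive_region, min_quality=25):
    """Keep a split-read alignment unless it is low quality (<25) or in a repeat."""
    return mapping_quality >= min_quality and not in_repetitive_region
```

Mapping the two anchors independently means that a read spanning a fusion breakpoint can place its halves at two different loci, which a fragment-length-based read-pair approach cannot detect reliably in short plasma DNA fragments.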
`
`Data deposition
`All sequencing raw data were deposited at the European
`Genome-phenome Archive (EGA) [61], which is hosted by
the EBI, under accession numbers EGAS00001000451
(Plasma-Seq) and EGAS00001000453 (Gene-Breakpoint
`Panel).
`
`Results
`Implementation of our approach
`Previously, we demonstrated that tumor-specific, somatic
`chromosomal alterations can be detected from plasma of
`patients with cancer using array-CGH [33]. In order to
`extend our method to a next-generation sequencing-
`based approach, that is, plasma-Seq, on a benchtop Illu-
`mina MiSeq instrument, we first analyzed plasma DNA
`from 10 men (M1 to M10) and nine women (F1 to F9)
`without malignant disease. On average we obtained 3.3
`million reads per sample (range, 1.9-5.8 million; see
`Additional file 1, Table S2) and applied a number of fil-
`tering steps to remove sources of variation and to remove
`known GC bias effects [62-64] (for details see Material
`and Methods).
`We performed sequential analyses of 1-Mbp windows
`(n=2,909 for men; n=2,895 for women) throughout the
`genome and calculated for each 1-Mbp window the
`z-score by cross-validating each window against the
`other control samples from the same sex (details in
`Material and Methods). We defined a significant
`change in the regional representation of plasma DNA
`as >3 SDs from the mean representation of the other
`healthy controls for the corresponding 1-Mbp window.
`A mean of 98.5% of the sequenced 1-Mbp windows
`from the 19 normal plasma samples showed normal
`representations in plasma (Figure 1a). The variation
`among the normalized proportions of each 1-Mbp win-
`dow in the plasma from normal individuals was very
low (average, 47 windows had a z-score ≤-3 or ≥3;
`range of SD, ±52%) (Figure 1a).
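The share of windows with normal representation follows directly from the per-window z-scores. The sketch below uses the >3 SD definition of a significant change given above; the function name `fraction_normal` is hypothetical.

```python
def fraction_normal(window_z_scores, threshold=3.0):
    """Fraction of windows whose z-score lies within +/- threshold.

    A window counts as aberrant only if |z| exceeds the threshold,
    following the '>3 SDs' definition in the text.
    """
    if not window_z_scores:
        raise ValueError("no windows given")
    aberrant = sum(1 for z in window_z_scores if abs(z) > threshold)
    return 1 - aberrant / len(window_z_scores)
```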
`In addition, we calculated ‘segmental z-scores’ where
`the z-scores are not calculated for 1-Mbp windows but
`for chromosomal segments with identical copy number.
`In order to determine such segments we employed an
`algorithm for the assignment of segments with identical
`log2 ratios [39,46] (Material and Methods) and calculated
`a z-score for each of these segments (hence, ‘segmental
`z-scores’). As sequencing analyses of chromosome con-
`tent in the maternal circulation are now frequently being
`used for detection of fetal aneuploidy [34,36] and as our
`mean sequencing depth is lower compared to previous
`studies, we wanted to test whether our approach would
`be feasible for this application. To this end we obtained
`two plasma samples each of pregnancies with euploid
`and trisomy 21 fetuses and one each of pregnancies with
`trisomies of chromosomes 13 and 18, respectively. In the
`trisomy cases the respective chromosomes were identi-
`fied as segments with elevated log2 ratios and accordingly
`also increased z-scores (Additional file 2).
`
`
`Figure 1 Implementation of our approach using plasma DNA samples from individuals without cancer and simulations. (a) Z-scores
`calculated for sequential 1-Mbp windows for 10 male (upper panel) and 9 female (lower panel) individuals without malignant disease. (b)
`Detection of tumor DNA in plasma from patients with prostate cancer using simulated copy-number analyses. ROC analyses of simulated
`mixtures of prostate cancer DNA with normal plasma DNA using the genome-wide z-score. Detection of 10% circulating tumor DNA could be
`achieved with a sensitivity of >80% and specificity of >80%. (c) Hierarchical cluster analysis (Manhattan distances of chromosomal z-scores) with
`normal female controls and the HT29 serial dilution series. One percent of tumor DNA still had an increased genome-wide z-score and did not
`cluster together with the controls (for details see text).
`
`Sensitivity and specificity of our approach
`We wanted to gain insight into the sensitivity of our
`approach to detect tumor-derived sequences in a patient’s
`plasma. To this end we calculated a genome-wide z-score
`for each sample (Material and Methods). The main pur-
`pose of the genome-wide z-score is to distinguish between
`aneuploid and euploid plasma samples. The genome-wide
`z-score from the plasma of male individuals ranged from
`-1.10 to 2.78 and for female individuals from -0.48 to 2.64.
`We performed receiver operating characteristic (ROC)
`analyses of simulated next-generation sequencing data
`from 102 published prostate cancer data and 500 simu-
`lated controls based on the data from our healthy indivi-
`duals. Using the equivalent of one-quarter MiSeq run,
`these analyses suggested that using the genome-wide
`z-score tumor DNA concentrations at levels ≥10% could
`be detected in the circulation of patients with prostate
`cancers with a sensitivity of >80% and specificity of >80%
(Figure 1b).
`To test these estimates with actual data we fragmented
`DNA from the colorectal cancer cell line HT29 to sizes
`of approximately 150-250 bp to reflect the degree of frag-
`mented DNA in plasma and performed serial dilution
`experiments with the fragmented DNA (that is, 50%,
`20%, 15%, 10%, 5%, 1%, and 0%). We established the
`copy-number status of this cell line with undiluted, that
`is, 100%, DNA using both array-CGH and our next-
`generation sequencing approach (Additional file 3) and
`confirmed previously reported copy number changes
[65,66]. Calculating the genome-wide z-score for each
dilution we noted its expected decrease with increasing
dilution. Whereas the genome-wide z-score was 429.74
`for undiluted HT29 DNA, it decreased to 7.75 for 1%
`(Additional file 1, Table S2). Furthermore, when we per-
`formed hierarchical cluster analysis the female controls
`were separated from the various HT29 dilutions, further
`confirming that our approach may indicate aneuploidy in
the presence of 1% circulating tumor DNA (Figure 1c).
`
`Plasma analysis from patients with cancer
`Our analysis of plasma samples from patients with can-
`cer is two-fold (Figure 2): (a) we used plasma-Seq to
`calculate the genome-wide z-score as a general measure
`for aneuploidy and the segmental z-scores to establish a
`genome-wide copy number profile. The calculation of
`the segments with identical log2 ratios takes only 1 h
`and also provides a first assessment of potential copy
`number changes. Calculation of the z-scores for all seg-
`ments and thus definite determination of over- and
`under-represented regions requires about 24 h. (b) In
`addition, we sequenced with high coverage (approxi-
`mately 50x) 55 genes frequently mutated in cancer
`according to the COSMIC [67] and Cancer Gene Cen-
`sus [68] databases (Additional file 1, Table S3), and 38
`introns often involved in structural somatic rearrange-
`ments, including recurrent gene fusions involving mem-
`bers of the E twenty-six (ETS) family of transcription
`factors to test for TMPRSS2-ERG-positive prostate
`
`
`Figure 2 Outline of our whole-genome plasma analysis strategy. After blood draw, plasma preparation, and DNA-isolation we start our
`analysis, which is two-fold: first (left side of the panel), an Illumina shotgun library is prepared (time required, approximately 24 h). Single-read
`whole genome plasma sequencing is performed with a shallow sequencing depth of approximately 0.1x (approximately 12 h). After alignment
`we calculate several z-scores: a genome-wide z-score, segments with identical log2-ratios required to establish corresponding segmental z-
`scores, and gene-specific z-scores, for example, for the AR-gene. Each of these z-scores calculations takes approximately 2 h so that these
`analyses are completed within 48 h and the material costs are only approximately €300. Second (right side of the panel), we prepare a library
`using the SureSelect Kit (Agilent) and perform sequence enrichment with our GB-panel (approximately 48-72 h), consisting of 55 high-interest
`genes and 38 introns with frequent fusion breakpoints. The GB-panel is sequenced by paired-end sequencing with an approximately 50x
`coverage (around 26 h). The evaluation of the sequencing results may take several hours, the confirmation by Sanger sequencing several days.
`Hence, complete analysis of the entire GB-panel analysis will normally require around 7 days.
`
`cancers (herein referred to as GB-panel (Gene-Break-
`point panel)). In a further step identified mutations were
`verified by Sanger sequencing from both plasma DNA
and constitutional DNA (obtained from a buccal swab)
to distinguish between somatic and germline mutations.
`If needed, somatic mutations can then be used to esti-
`mate by deep sequencing the fraction of mutated tumor
`DNA in the plasma.
`
`
`Plasma-Seq and GB-panel of patients with prostate cancer
`We then obtained 13 plasma samples from nine patients
`with metastatic prostate cancer (five with castration-
resistant disease, CRPC1 to CRPC5, and four with castration-sensitive
disease, CSPC1 to CSPC4). Furthermore,
`from each of patients CRPC1 and CSPC1 we obtained
`three samples at different time points (Clinical data in
`Additional file 1, Table S1). Applying plasma-Seq, we
obtained on average 3.2 million reads per sample (range, 1.1
(CSPC4) to 5.2 (CRPC5) million reads) for the plasma
samples from patients with prostate cancer
(see Additional file 1, Table S2).
`To assess whether plasma-Seq allows discrimination
`between plasma samples from healthy men and men
`with prostate cancer we first calculated the genome-
`wide z-score. In contrast to the male controls (Figure
1a), the 1-Mbp window z-scores showed substantial
variability (Figure 3a) and only a mean of 79.3% of the
`sequenced 1-Mbp windows from the 13 plasma samples
`showed normal representations in plasma in contrast to
`99.0% of the cross-validated z-scores in the sample of
`controls (P=0.00007, Wilcoxon test on sample percen-
`tages). Accordingly, the genome-wide z-score was ele-
`vated for all prostate cancer patients and ranged from
`125.14 (CRPC4) to 1155.77 (CSPC2) (see Additional file
`1, Table S2). Furthermore, when we performed hierarch-
`ical clustering the normal samples were separated from
`the tumor samples (Figure 3b), suggesting that we can
`indeed distinguish plasma samples from individuals
`without malignant disease from those with prostate
`cancer.
`Applying the GB-panel, we achieved on average a
`coverage of ≥50x for 71.8% of target sequence (range,
`67.3% (CSPC4) to 77.6% (CSPC2)) (see Additional file 1,
`
`Figure 3 Copy number analyses of plasma samples from men with prostate cancer. (a) Z-scores calculated for 1-Mbp windows from the
`13 plasma samples of patients with prostate cancer showed a high variability (compare with same calculations from men without malignant
`disease in Figure 1a, upper panel). (b) Hierarchical clustering (Manhattan distances of chromosomal z-scores)