(12) Patent Application Publication
Diehn et al.
(10) Pub. No.: US 2014/0296081 A1
(43) Pub. Date: Oct. 2, 2014

(54) IDENTIFICATION AND USE OF CIRCULATING TUMOR MARKERS

(71) Applicant: The Board of Trustees of the Leland Stanford Junior University, Palo Alto, CA (US)

(72) Inventors: Maximilian Diehn, Stanford, CA (US); Arash Ash Alizadeh, San Mateo, CA (US); Aaron M. Newman, Palo Alto, CA (US); Scott V. Bratman, Palo Alto, CA (US)

(21) Appl. No.: 14/209,807
(22) Filed: Mar. 13, 2014

Related U.S. Application Data
(60) Provisional application No. 61/798,925, filed on Mar. 15, 2013.

Publication Classification
(51) Int. Cl.
G06F 19/22 (2006.01)
C12Q 1/68 (2006.01)
(52) U.S. Cl.
CPC .............. G06F 19/22 (2013.01); C12Q 1/6886 (2013.01)
USPC ................................................... 506/2; 506/8

(57) ABSTRACT

Methods for creating a library of recurrently mutated genomic regions and for using the library to analyze cancer-specific and patient-specific genetic alterations in a patient are provided. The methods can be used to measure tumor-derived nucleic acids in patient blood and thus to monitor the progression of disease. The methods can also be used for cancer screening.
`
`
`
`
Sheet 1 of 19: Figure 1. [Drawing not reproduced. Legible labels: CAPP-Seq selector; Lung adenocarcinomas; Training; Validation; Random.]
`
`
`
Sheet 2 of 19: Figure 1 (cont.). [Drawing not reproduced.]
`
`
`
Sheet 3 of 19: Figure 2. [Flowchart not reproduced. Legible labels: Fusion discovery; Fusion(s)?; SNV/indel(s)?; SNV/indel detection; Adjust freq.; Variant annotation.]
`
`
`
Sheet 4 of 19: Figure 3. [Plots not reproduced. Legible labels: Known/suspected drivers; Recurrence Index (No. of patients per kb); No. of patients per exon.]
`
`
`
Sheet 5 of 19: Figure 4. [Schematic not reproduced. Legible labels: Breakpoint identification; Breakpoint validation; Breakpoint adjustment; case labels for the read orientations.]
`
`
`
Sheet 6 of 19: Figure 5. [Read pile-up not reproduced. Legible labels: ROS1 intron 34; SLC34A2 intron 4.]
`
`
`
Sheet 7 of 19: Figure 6. [Drawing not reproduced.]
`
`
`
Sheet 8 of 19: Figure 7. [Plots not reproduced. Legible panel labels: Adapter ligation duration and temperature; SPRI bead processing (With-Bead, Control); Adapter:fragment molar ratio; DNA polymerase used for PCR; Library Preparation Kit.]
`
`
`
Sheet 9 of 19: Figure 8. [Plots not reproduced. Legible labels: KAPA With-Bead library preparations at several input amounts given in ng.]
`
`
`
Sheet 10 of 19. [Drawing not reproduced. The figure label did not survive extraction; by the order of the sheets this is presumably Figure 9.]
`
`
`
Sheet 11 of 19: Figure 10. [Plots not reproduced. Legible labels: Known fraction (%); No. of reporters considered; Mean correlation; 0.1%, 1%, and 10% spikes; SNVs and indel; Fusion.]
`
`
`
Sheet 12 of 19: Figure 11. [Drawing not reproduced.]
`
`
`
Sheet 13 of 19: Figure 11 (cont.). [Drawing not reproduced.]
`
`
`
Sheet 14 of 19: Figure 12. [Drawing not reproduced. Legible labels: Predicted ALK Fusion Genes; EML4 (chr2); KIF5B; ALK (chr2); SLC34A2 (chr4); ROS1 (chr6); CD74 (chr5).]
`
`
`
Sheet 15 of 19: Figure 13. [Plot not reproduced. Legible labels: Smoking history (Heavy, Light, None); Fusions absent; Fusion(s) present.]
`
`
`
Sheet 16 of 19: Figure 14. [Plot not reproduced. Legible labels: Crizotinib initiated; Months since initiation of therapy; SNV: TP53; SNV: TMEM132D; SNV: NAV3; Fusion: KIF5B-ALK.]
`
`
`
Sheet 17 of 19: Figure 15. [Plot not reproduced. Legible label: EpCAM-APC.]
`
`
`
Sheet 18 of 19: Figure 16. [Drawing not reproduced. Legible labels: Remaining patients; Discordant reads; Soft-clipped.]
`
`
`
Sheet 19 of 19: Figure 17. [Plots not reproduced. Legible labels: (0) Pre-filter; (1) Germline filter; (2) cfDNA background filter; (3) Outlier detection; No. tags (deduped); Robust Mahalanobis distance; Cancer plasma cfDNA (P6); Post-Op plasma cfDNA (P1).]
`
`
`
`
IDENTIFICATION AND USE OF
CIRCULATING TUMOR MARKERS
`
`STATEMENT OF GOVERNMENTAL SUPPORT
[0001] This invention was made with government support under grant number W81XWH-12-1-0285 awarded by the Department of Defense. The government has certain rights in the invention.
`
`BACKGROUND OF THE INVENTION
[0002] Analysis of cancer-derived cell-free DNA (cfDNA) has the potential to revolutionize detection and monitoring of cancer. Noninvasive access to malignant DNA is particularly attractive for solid tumors, which cannot be repeatedly sampled without invasive procedures. In non-small cell lung cancer (NSCLC), PCR-based assays have been used previously to detect recurrent point mutations in genes such as KRAS or EGFR in plasma DNA (Taniguchi et al. (2011) Clin. Cancer Res. 17:7808-7815; Gautschi et al. (2007) Cancer Lett. 254:265-273; Kuang et al. (2009) Clin. Cancer Res. 15:2630-2636; Rosell et al. (2009) N. Engl. J. Med. 361:958-967), but the majority of patients lack mutations in these genes. Other studies have proposed identifying patient-specific chromosomal rearrangements in tumors via whole genome sequencing (WGS), followed by breakpoint qPCR from cfDNA (Leary et al. (2010) Sci. Transl. Med. 2:20ra14; McBride et al. (2010) Genes Chrom. Cancer 49:1062-1069). While sensitive, such methods require optimization of molecular assays for each patient, limiting their widespread clinical application. More recently, several groups have reported amplicon-based deep sequencing methods to detect cfDNA mutations in up to 6 recurrently mutated genes (Forshew et al. (2012) Sci. Transl. Med. 4:136ra168; Narayan et al. (2012) Cancer Res. 72:3492-3498; Kinde et al. (2011) Proc. Natl. Acad. Sci. USA 108:9530-9535). While powerful, these approaches are limited by the number of mutations that can be interrogated (Rachlin et al. (2005) BMC Genomics 6:102) and the inability to detect genomic fusions.
[0003] PCT International Patent Publication No. 2011/103236 describes methods for identifying personalized tumor markers in a cancer patient using "mate-paired libraries." The methods are limited to monitoring somatic chromosomal rearrangements, however, and must be personalized for each patient, thus limiting their applicability and increasing their cost.
[0004] U.S. Patent Application Publication No. 2010/0041048 A1 describes the quantitation of tumor-specific cell-free DNA in colorectal cancer patients using the "BEAMing" technique (Beads, Emulsion, Amplification, and Magnetics). While this technique provides high sensitivity and specificity, this method is for single mutations and thus any given assay can only be applied to a subset of patients and/or requires patient-specific optimization. U.S. Patent Application Publication No. 2012/0183967 A1 describes additional methods to identify and quantify genetic variations, including the analysis of minor variants in a DNA population, using the "BEAMing" technique.
[0005] U.S. Patent Application Publication No. 2012/0214678 A1 describes methods and compositions for detecting fetal nucleic acids and determining the fraction of cell-free fetal nucleic acid circulating in a maternal sample. While sensitive, these methods analyze polymorphisms occurring between maternal and fetal nucleic acids rather than polymorphisms that result from somatic mutations in tumor cells. In addition, methods that detect fetal nucleic acids in maternal circulation require much less sensitivity than methods that detect tumor nucleic acids in cancer patient circulation, because fetal nucleic acids are much more abundant than tumor nucleic acids.
[0006] U.S. Patent Application Publication Nos. 2012/0237928 A1 and 2013/0034546 describe methods for determining copy number variations of a sequence of interest in a test sample comprising a mixture of nucleic acids. While potentially applicable to the analysis of cancer, these methods are directed to measuring major structural changes in nucleic acids, such as translocations, deletions, and amplifications, rather than single nucleotide variations.
[0007] U.S. Patent Application Publication No. 2012/0264121 A1 describes methods for estimating a genomic fraction, for example, a fetal fraction, from polymorphisms such as small base variations or insertions-deletions. These methods do not, however, make use of optimized libraries of polymorphisms, such as, for example, libraries containing recurrently-mutated genomic regions.
[0008] U.S. Patent Application Publication No. 2013/0024127 A1 describes computer-implemented methods for calculating a percent contribution of cell-free nucleic acids from a major source and a minor source in a mixed sample. The methods do not, however, provide any advantages in identifying or making use of optimized libraries of polymorphisms in the analysis.
[0009] PCT International Publication No. WO 2010/141955 A2 describes methods of detecting cancer by analyzing panels of genes from a patient-obtained sample and determining the mutational status of the genes in the panel. The methods rely on a relatively small number of known cancer genes, however, and they do not provide any ranking of the genes according to effectiveness in detection of relevant mutations. In addition, the methods were unable to detect the presence of mutations in the majority of serum samples from actual cancer patients.
[0010] There is thus a need for new and improved methods to detect and monitor tumor-related nucleic acids in cancer patients.
`
`SUMMARY OF THE INVENTION
[0011] The present invention addresses these and other problems by providing novel methods and systems relating to the characterization, diagnosis, and monitoring of cancer. In particular, according to one aspect, the invention provides methods for creating a library of recurrently mutated genomic regions comprising:
[0012] identifying a plurality of genomic regions from a group of genomic regions that are recurrently mutated in a specific cancer;
[0013] wherein the library comprises the plurality of genomic regions;
[0014] the plurality of genomic regions comprises at least 10 different genomic regions; and
[0015] at least one mutation within the plurality of genomic regions is present in at least 60% of all subjects with the specific cancer.
[0016] In specific embodiments of these methods, the plurality of genomic regions comprises at least 25, at least 50, at least 100, at least 150, at least 200, or at least 500 different genomic regions.
[0017] In other specific method embodiments, at least two mutations within the plurality of genomic regions or at least three mutations within the plurality of genomic regions are present in at least 60% of all subjects with the specific cancer.
[0018] In still other specific method embodiments, at least one mutation within the plurality of genomic regions is present in at least 60%, 70%, 80%, 90%, 95%, 98%, 99%, or 99.9% of all subjects with the specific cancer.
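The coverage criterion in the preceding paragraphs (at least one, two, or three mutations within the library regions in at least 60% of subjects) can be illustrated with a minimal Python sketch. This is not the applicants' implementation: the function name `covered_fraction`, the patient identifiers, and the region names are all hypothetical, and the input is assumed to list, for each patient, the region containing each observed mutation.

```python
from typing import Dict, List, Set


def covered_fraction(
    mutations_by_patient: Dict[str, List[str]],
    library: Set[str],
    min_mutations: int = 1,
) -> float:
    """Fraction of patients with at least `min_mutations` mutations in library regions.

    Each list entry names the region that holds one mutation, so a region that
    carries two mutations in one patient appears twice in that patient's list.
    """
    if not mutations_by_patient:
        return 0.0
    covered = sum(
        1
        for regions in mutations_by_patient.values()
        if sum(1 for region in regions if region in library) >= min_mutations
    )
    return covered / len(mutations_by_patient)


# Hypothetical toy cohort; patient IDs and region names are illustrative only.
cohort = {
    "P1": ["KRAS_exon2", "TP53_exon5"],
    "P2": ["EGFR_exon19"],
    "P3": ["TP53_exon5", "TP53_exon5"],
    "P4": ["STK11_exon1"],
    "P5": [],
}
library = {"KRAS_exon2", "TP53_exon5", "EGFR_exon19"}

one_or_more = covered_fraction(cohort, library)     # counts P1, P2, and P3
two_or_more = covered_fraction(cohort, library, 2)  # counts P1 and P3
meets_threshold = one_or_more >= 0.60               # the 60% criterion
```

In practice the cohort would come from a tumor sequencing data set rather than a hand-written dictionary; the sketch only shows how the percentage thresholds relate to per-patient mutation counts.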
[0019] In some embodiments, the identifying step comprises, for each genomic region in the plurality of genomic regions, ranking the genomic region to maximize the number of all subjects with the specific cancer having at least one mutation within the genomic region.
[0020] In other embodiments, the identifying step comprises, for each genomic region in the plurality of genomic regions, ranking the genomic region to maximize the ratio between the number of all subjects with the specific cancer having at least one mutation within the genomic region and the length of the genomic region.
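The two ranking criteria above (number of mutated subjects, and number of mutated subjects per unit length) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the method as actually implemented: the `Region` type, the function names, and the toy data are hypothetical, and the ratio is expressed in patients per kilobase.

```python
from typing import FrozenSet, List, NamedTuple


class Region(NamedTuple):
    name: str
    length_bp: int
    patients: FrozenSet[str]  # subjects with at least one mutation in the region


def patients_per_kb(region: Region) -> float:
    """Ratio of mutated subjects to region length, in patients per kilobase."""
    return len(region.patients) / (region.length_bp / 1000.0)


def rank_by_patient_count(regions: List[Region]) -> List[Region]:
    """Rank regions by the number of subjects carrying a mutation in them."""
    return sorted(regions, key=lambda r: len(r.patients), reverse=True)


def rank_by_ratio(regions: List[Region]) -> List[Region]:
    """Rank regions by mutated subjects per kilobase of sequence."""
    return sorted(regions, key=patients_per_kb, reverse=True)


# Hypothetical regions: a short hotspot exon, a long exon, and a rarely mutated exon.
regions = [
    Region("exon_A", 200, frozenset({"P1", "P2", "P3", "P4"})),
    Region("exon_B", 1000, frozenset({"P1", "P2", "P3", "P5", "P6", "P7"})),
    Region("exon_C", 500, frozenset({"P8"})),
]

by_count = [r.name for r in rank_by_patient_count(regions)]
by_ratio = [r.name for r in rank_by_ratio(regions)]
```

With these toy values the long exon ranks first by patient count, while the short hotspot exon ranks first by ratio, which is why a length-normalized ranking favors a compact library.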
[0021] In some embodiments, the library comprises a plurality of genomic regions encoding a plurality of driver sequences, more specifically known driver sequences or driver sequences that are recurrently mutated in the specific cancer.
[0022] In some embodiments, the library comprises a plurality of genomic regions that are recurrently rearranged in the specific cancer.
[0023] In preferred embodiments, the specific cancer is a carcinoma, and in more preferred embodiments, the carcinoma is an adenocarcinoma, a non-small cell lung cancer, or a squamous cell carcinoma.
[0024] In specific embodiments, the cumulative length of the plurality of genomic regions is at most 30 Mb, 20 Mb, 10 Mb, 5 Mb, 2 Mb, 1 Mb, 500 kb, 200 kb, 100 kb, 50 kb, 20 kb, or 10 kb.
[0025] In another aspect, the invention provides methods for analyzing a cancer-specific genetic alteration in a subject comprising the steps of:
[0026] obtaining a tumor nucleic acid sample and a genomic nucleic acid sample from a subject with a specific cancer;
[0027] sequencing a plurality of target regions in the tumor nucleic acid sample and in the genomic nucleic acid sample to obtain a plurality of tumor nucleic acid sequences and a plurality of genomic nucleic acid sequences; and
[0028] comparing the plurality of tumor nucleic acid sequences to the plurality of genomic nucleic acid sequences to identify a patient-specific genetic alteration in the tumor nucleic acid sample;
[0029] wherein the plurality of target regions are selected from a plurality of genomic regions that are recurrently mutated in the specific cancer;
[0030] the plurality of genomic regions comprises at least 10 different genomic regions; and
[0031] at least one mutation within the plurality of genomic regions is present in at least 60% of all subjects with the specific cancer.
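The comparing step above can be pictured as a set difference between variant calls from the tumor sample and from the matched genomic (germline) sample, restricted to the target regions. The sketch below assumes variants have already been called and are represented as simple tuples; the type aliases, function names, and coordinates are hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chromosome, position, reference allele, alternate allele)
Region = Tuple[str, int, int]        # (chromosome, start, end), 1-based inclusive


def in_targets(variant: Variant, targets: List[Region]) -> bool:
    """True if the variant position falls inside any target region."""
    chrom, pos, _, _ = variant
    return any(chrom == c and start <= pos <= end for c, start, end in targets)


def patient_specific_alterations(
    tumor_calls: Iterable[Variant],
    germline_calls: Iterable[Variant],
    targets: Iterable[Region],
) -> List[Variant]:
    """Variants seen in the tumor but not the germline sample, within the targets."""
    germline: Set[Variant] = set(germline_calls)
    target_list = list(targets)
    return sorted(
        v for v in set(tumor_calls)
        if v not in germline and in_targets(v, target_list)
    )


# Hypothetical coordinates, for illustration only.
targets = [("chr12", 25398200, 25398400), ("chr17", 7578300, 7578600)]
tumor = [
    ("chr12", 25398284, "C", "A"),  # tumor only, inside a target region
    ("chr17", 7578406, "C", "T"),   # also present in the germline sample
    ("chr1", 1000, "G", "A"),       # tumor only, but outside the target regions
]
germline = [("chr17", 7578406, "C", "T")]

somatic = patient_specific_alterations(tumor, germline, targets)
```

A real pipeline would also weigh read depth and allele frequency before accepting a call; the sketch only shows the tumor-versus-germline comparison itself.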
[0032] In specific embodiments of this aspect of the invention, the plurality of genomic regions comprises at least 25, at least 50, at least 100, at least 150, at least 200, or at least 500 different genomic regions.
[0033] In other specific embodiments, at least two mutations within the plurality of genomic regions or at least three mutations within the plurality of genomic regions are present in at least 60% of all subjects with the specific cancer.
[0034] In still other specific embodiments, at least one mutation within the plurality of genomic regions is present in at least 60%, 70%, 80%, 90%, 95%, 98%, 99%, or 99.9% of all subjects with the specific cancer.
[0035] In some embodiments, each genomic region in the plurality of genomic regions is identified by ranking the genomic region to maximize the number of all subjects with the specific cancer having at least one mutation within the genomic region.
[0036] In other embodiments, each genomic region in the plurality of genomic regions is identified by ranking the genomic region to maximize the ratio between the number of all subjects with the specific cancer having at least one mutation within the genomic region and the length of the genomic region.
[0037] In some embodiments, the plurality of genomic regions comprises genomic regions encoding a plurality of driver sequences, more specifically known driver sequences or driver sequences that are recurrently mutated in the specific cancer.
[0038] In some embodiments, the plurality of genomic regions comprises genomic regions that are recurrently rearranged in the specific cancer.
[0039] In preferred embodiments, the specific cancer is a carcinoma, and in more preferred embodiments, the carcinoma is an adenocarcinoma, a non-small cell lung cancer, or a squamous cell carcinoma.
[0040] In specific embodiments, the cumulative length of the plurality of genomic regions is at most 30 Mb, 20 Mb, 10 Mb, 5 Mb, 2 Mb, 1 Mb, 500 kb, 200 kb, 100 kb, 50 kb, 20 kb, or 10 kb.
[0041] In some embodiments, the methods further comprise the steps of:
[0042] obtaining a cell-free nucleic acid sample from the subject; and
[0043] identifying the patient-specific genetic alteration in the cell-free nucleic acid sample.
[0044] In specific embodiments, the step of identifying the patient-specific genetic alteration in the cell-free nucleic acid sample comprises sequencing a genomic region comprising the patient-specific genetic alteration in the cell-free sample.
[0045] In other specific embodiments, the step of obtaining a tumor nucleic acid sample and a genomic nucleic acid sample comprises the step of enriching the plurality of target regions in the tumor nucleic acid sample and the genomic nucleic acid sample, and in more specific embodiments, the enriching step comprises use of a custom library of biotinylated DNA.
[0046] In still other specific embodiments, the step of obtaining a cell-free nucleic acid sample comprises the step of enriching the plurality of target regions in the cell-free nucleic acid sample, and in still more specific embodiments, the enriching step comprises use of a custom library of biotinylated DNA.
[0047] In some embodiments, the methods further comprise the step of quantifying the cancer-specific genetic alteration in the cell-free sample.
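One simple way to picture the quantifying step is as a mutant allele fraction: reads supporting the alteration divided by all reads covering its position, optionally pooled over several patient-specific alterations. The sketch below is an assumption about how such a quantity could be computed, not a statement of the applicants' method; the function names and read counts are hypothetical.

```python
from typing import List, Tuple


def allele_fraction(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads at a position that support the mutant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def pooled_fraction(reporters: List[Tuple[int, int]]) -> float:
    """Pool (alt_reads, total_reads) pairs from several alterations into one fraction."""
    alt = sum(a for a, _ in reporters)
    total = sum(t for _, t in reporters)
    return allele_fraction(alt, total)


# Hypothetical read counts at three patient-specific alterations in a cell-free sample.
reporters = [(12, 6000), (8, 4000), (0, 5000)]

per_site = [allele_fraction(a, t) for a, t in reporters]
pooled = pooled_fraction(reporters)  # 20 mutant reads out of 15000 total reads
```

Pooling across several alterations lets a sample with no mutant reads at one site still be quantified from the others.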
[0048] In yet another aspect, the invention provides methods for screening a cancer-specific genetic alteration in a subject comprising the steps of:
[0049] obtaining a cell-free nucleic acid sample from a subject;
[0050] sequencing a plurality of target regions in the cell-free sample to obtain a plurality of cell-free nucleic acid sequences; and
[0051] identifying a cancer-specific genetic alteration in the cell-free sample;
[0052] wherein the plurality of target regions are selected from a plurality of genomic regions that are recurrently mutated in the specific cancer;
[0053] the plurality of genomic regions comprises at least 10 different genomic regions; and
[0054] at least one mutation within the plurality of genomic regions is present in at least 60% of all subjects with the specific cancer.
[0055] In specific embodiments, the plurality of genomic regions comprises at least 25, at least 50, at least 100, at least 150, at least 200, or at least 500 different genomic regions.
[0056] In other specific embodiments, at least two mutations within the plurality of genomic regions or at least three mutations within the plurality of genomic regions are present in at least 60% of all subjects with the specific cancer.
[0057] In still other specific embodiments, at least one mutation within the plurality of genomic regions is present in at least 60%, 70%, 80%, 90%, 95%, 98%, 99%, or 99.9% of all subjects with the specific cancer.
[0058] In particular embodiments, each genomic region in the plurality of genomic regions is identified by ranking the genomic region to maximize the number of all subjects with the specific cancer having at least one mutation within the genomic region.
[0059] In other particular embodiments, each genomic region in the plurality of genomic regions is identified by ranking the genomic region to maximize the ratio between the number of all subjects with the specific cancer having at least one mutation within the genomic region and the length of the genomic region.
[0060] In still other particular embodiments, the plurality of genomic regions comprises genomic regions encoding a plurality of driver sequences, and, more particularly, the driver sequences are known driver sequences or are recurrently mutated in the specific cancer.
[0061] In yet still other particular embodiments, the plurality of genomic regions comprises genomic regions that are recurrently rearranged in the specific cancer.
[0062] In some embodiments, the specific cancer is a carcinoma, including, for example, an adenocarcinoma, a non-small cell lung cancer, or a squamous cell carcinoma.
[0063] In specific embodiments, the cumulative length of the plurality of genomic regions is at most 30 Mb, 20 Mb, 10 Mb, 5 Mb, 2 Mb, 1 Mb, 500 kb, 200 kb, 100 kb, 50 kb, 20 kb, or 10 kb.
[0064] In other specific embodiments, the step of obtaining a cell-free nucleic acid sample comprises the step of enriching the plurality of target regions in the cell-free nucleic acid sample, and, in some embodiments, the enriching step comprises use of a custom library of biotinylated DNA.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
[0065] FIG. 1. Development of CAncer Personalized Profiling by Deep Sequencing (CAPP-Seq). (a) Schematic depicting design of CAPP-Seq selectors and their application for assessing circulating tumor DNA. (b) Multi-phase design of the NSCLC CAPP-Seq selector. (c) Analysis of the number of SNVs per lung adenocarcinoma covered by the NSCLC CAPP-Seq selector in the TCGA WES cohort (Training; N=229) and an independent lung adenocarcinoma WES data set (Validation; N=183) (Imielinski et al. (2012) Cell 150:1107-1120). (d) Number of SNVs per patient identified by the NSCLC CAPP-Seq selector in WES data from three adenocarcinomas from TCGA, colon (COAD), rectal (READ), and endometrioid (UCEC) cancers. (e-f) Quality parameters from a representative CAPP-Seq analysis of plasma cfDNA, including length distribution of sequenced cfDNA fragments (e), and depth of sequencing coverage across all genomic regions in the selector (f). (g) Variation in sequencing depth across cfDNA samples from 4 patients.
[0066] FIG. 2. CAPP-Seq computational pipeline. Major steps of the bioinformatics pipeline for mutation discovery and quantitation in plasma are schematically illustrated.
[0067] FIG. 3. Statistical enrichment of recurrently mutated NSCLC exons captures known drivers.
[0068] FIG. 4. Development of the FACTERA algorithm. Major steps used by FACTERA (see Detailed Methods) to precisely identify genomic breakpoints from aligned paired-end sequencing data are anecdotally illustrated using two hypothetical genes, w and v. (a) Improperly paired, or "discordant," reads (indicated in yellow) are used to locate genes involved in a potential fusion (in this case, w and v). (b) Because truncated (i.e., soft-clipped) reads may indicate a fusion breakpoint, any such reads within genomic regions delineated by w and v are also further analyzed. (c) Consider soft-clipped reads, R1 and R2, whose non-clipped segments map to w and v, respectively. If R1 and R2 derive from a fragment encompassing a true fusion between w and v, then the mapped portion of R1 should match the soft-clipped portion of R2, and vice versa. This is assessed by FACTERA using fast k-mer indexing and comparison. (d) Four possible orientations of R1 and R2 are depicted. However, only Cases 1a and 2a can generate valid fusions (see Detailed Methods). Thus, prior to k-mer comparison (panel c), the reverse complement of R1 is taken for Cases 1b and 2b, respectively, converting them into Cases 1a and 2a. (e) In some cases, short sequences immediately flanking the breakpoint are identical, preventing unambiguous determination of the breakpoint. Let iterators i and j denote the first matching sequence positions between R1 and R2. To reconcile sequence overlap, FACTERA arbitrarily adjusts the breakpoint in R2 (i.e., bp2) to match R1 (i.e., bp1) using the sequence offset determined by differences in distance between bp2 and i, and bp1 and j. Two cases are illustrated, corresponding to sequence orientations described in (d).
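The k-mer comparison of panel (c) and the reverse-complement step of panel (d) can be illustrated with a short Python sketch. This is not the FACTERA implementation: the function names, the choice of k = 6, and the sequences for the hypothetical genes w and v are assumptions made for illustration, and the check is reduced to asking whether the two sequences share at least one k-mer.

```python
from typing import Dict, List

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written with A, C, G, T, N."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_index(seq: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer in `seq` to the positions where it starts."""
    index: Dict[str, List[int]] = {}
    for i in range(len(seq) - k + 1):
        index.setdefault(seq[i:i + k], []).append(i)
    return index


def shares_kmer(mapped: str, clipped: str, k: int) -> bool:
    """True if any k-mer of the clipped segment occurs in the mapped segment."""
    index = kmer_index(mapped, k)
    return any(clipped[i:i + k] in index for i in range(len(clipped) - k + 1))


def supports_fusion(r1_mapped: str, r1_clipped: str,
                    r2_mapped: str, r2_clipped: str, k: int = 6) -> bool:
    """Mapped part of R1 must match clipped part of R2, and vice versa."""
    return (shares_kmer(r1_mapped, r2_clipped, k)
            and shares_kmer(r2_mapped, r1_clipped, k))


# Hypothetical gene sequences; the fused fragment is w followed by v.
w = "ACGTTGCAAGGCTTAGCCTGAT"
v = "TTGACCATGGCATCGATCGGAA"
r1_mapped, r1_clipped = w[-14:], v[:8]   # R1 maps to w, clipped tail lies in v
r2_clipped, r2_mapped = w[-8:], v[:14]   # R2 maps to v, clipped head lies in w
same_strand = supports_fusion(r1_mapped, r1_clipped, r2_mapped, r2_clipped)

# R1 reported on the opposite strand: compare before and after flipping it back.
r1_rc_mapped = reverse_complement(r1_mapped)
r1_rc_clipped = reverse_complement(r1_clipped)
before_flip = supports_fusion(r1_rc_mapped, r1_rc_clipped, r2_mapped, r2_clipped)
after_flip = supports_fusion(reverse_complement(r1_rc_mapped),
                             reverse_complement(r1_rc_clipped),
                             r2_mapped, r2_clipped)
```

Indexing the mapped segment once and looking up each k-mer of the clipped segment avoids a full pairwise alignment, which is the practical appeal of a k-mer comparison when many candidate read pairs must be screened.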
[0069] FIG. 5. Application of FACTERA to NSCLC cell lines NCI-H3122 and HCC78, and Sanger validation of breakpoints. (a) Pile-up of a subset of soft-clipped reads mapping to the EML4-ALK fusion identified in NCI-H3122 along with the corresponding Sanger chromatogram. (b) Same as (a), but for the SLC34A2-ROS1 translocation identified in HCC78.
[0070] FIG. 6. Improvements in CAPP-Seq performance with optimized library preparation procedures.
[0071] FIG. 7. Optimizing allele recovery from low input cfDNA during Illumina library preparation.
[0072] FIG. 8. CAPP-Seq performance with various amounts of input cfDNA.
[0073] FIG. 9. Analysis of CAPP-Seq background, allele detection threshold, and linearity. (a) Analysis of background rate for 6 NSCLC patient plasma samples and a healthy individual (Detailed Methods). (b) Analysis of biological background in (a) focusing on 107 recurrent somatic mutations from a previously reported SNaPshot panel (Su et al. (2011) J. Mol. Diagn. 13:74-84). Mutations found in a given patient's tumor were excluded. The mean frequency for each patient (horizontal red line) was within confidence limits of the mean background limit of 0.007% (horizontal blue line; panel a). A single outlier mutation (TP53 R175H) is indicated by an orange diamond. (c) Individual mutations from (b) ranked by most to least recurrent, according to median frequency across the 7 samples. (d) Dilution series analysis of expected versus observed frequencies of mutant alleles using CAPP-Seq. Dilution series were generated by spiking fragmented HCC78 DNA into control cfDNA. (e) Analysis of the effect of the number of SNVs considered on the estimates of fractional abundance (95% confidence intervals shown in gray). (f) Analysis of the effect of the number of SNVs considered on the mean correlation coefficient between expected and observed cancer fractions (blue dashed line) using data from panel (d). 95% confidence intervals are shown for (a)-(c). Statistical variation for (d) is shown as s.e.m.
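The linearity analysis of panels (d)-(f) rests on two simple quantities: an estimate of fractional abundance taken over some number of SNVs, and a correlation coefficient between expected and observed fractions. The sketch below shows one plausible way to compute both, assuming a plain mean over reporter allele frequencies and a Pearson correlation; the function names and all numeric values are hypothetical and are not data from the figure.

```python
import math
from statistics import mean
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def estimate_fraction(reporter_frequencies: List[float], n: int) -> float:
    """Mean allele frequency over the first n reporters (SNVs) considered."""
    return mean(reporter_frequencies[:n])


# Hypothetical dilution series: expected versus observed mutant fractions (%).
expected = [0.1, 1.0, 10.0]
observed = [0.12, 0.95, 10.3]
r = pearson_r(expected, observed)

# Hypothetical per-reporter frequencies (%) in a single spiked sample.
frequencies = [0.8, 1.2, 1.0, 1.0]
estimate_two = estimate_fraction(frequencies, 2)
estimate_all = estimate_fraction(frequencies, len(frequencies))
```

Averaging over more reporters reduces the influence of any single noisy site, which is the effect the figure panels examine as the number of SNVs considered grows.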
[0074] FIG. 10. Empirical spiking analysis of CAPP-Seq using two NSCLC cell lines. (a) Expected and observed (by CAPP-Seq) fractions of NCI-H3122 DNA spiked into control HCC78 DNA are linear for all fractions tested (0.1%, 1%, and 10%; R² = 1). (b) Using data from (a), analysis of the effect of the number of SNVs considered on the estimates of fractional abundance (95% confidence intervals shown in gray). (c) Analysis of the effect of the number of SNVs considered on the mean correlation coefficient and coefficient of variation between expected and observed cancer fractions (dashed lines) using data from panel (a). (d) Expected and observed fractions of the EML4-ALK fusion present in HCC78 are linear (R² = 0.995) over all spiking concentrations tested (see FIG. 5(b) for breakpoint verification). The observed EML4-ALK fractions were normalized based on the relative abundance of the fusion in 100% H3122 DNA (see Detailed Methods for details). Moreover, a single heterozygous insertion (indel) discovered within the selector space of NCI-H3122 (chr7:107416855, +T) was concordant with defined concentrations (shown are observed fractions adjusted for zygosity).
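The two adjustments mentioned in panel (d) can be read as simple rescalings. The sketch below assumes that "adjusted for zygosity" means scaling an observed allele fraction by the ratio of total to mutant copies (a factor of 2 for a heterozygous variant in a diploid region), and that normalization divides by the fraction measured in the undiluted cell line. Both readings are assumptions, and the function names and values are hypothetical.

```python
def adjust_for_zygosity(observed_fraction: float,
                        mutant_copies: int = 1,
                        total_copies: int = 2) -> float:
    """Scale an allele fraction to a sample fraction given the variant's copy state."""
    if not 0 < mutant_copies <= total_copies:
        raise ValueError("mutant_copies must be between 1 and total_copies")
    return observed_fraction * total_copies / mutant_copies


def normalize_to_pure_sample(observed_fraction: float,
                             fraction_in_pure_sample: float) -> float:
    """Express an observed fraction relative to the value seen in 100% tumor DNA."""
    if fraction_in_pure_sample <= 0:
        raise ValueError("fraction_in_pure_sample must be positive")
    return observed_fraction / fraction_in_pure_sample


# Hypothetical values: a heterozygous indel observed at 0.5% allele fraction,
# and a fusion observed at 0.2% whose abundance in the pure cell line is 25%.
indel_adjusted = adjust_for_zygosity(0.005)
fusion_normalized = normalize_to_pure_sample(0.002, 0.25)
```

Under these assumptions the heterozygous indel at 0.5% corresponds to a 1% spike, and the fusion signal is rescaled so that the undiluted cell line would read as 100%.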
[0075] FIG. 11. Application of CAPP-Seq for noninvasive detection and monitoring of circulating tumor DNA. (a) Characteristics of 11 patients included in this study (Table 3). P-values reflect a two-sided paired t-test for patients with reporter SNVs detected at both time points; other p-values were determined as described in Methods. ND, mutant DNA was not detected above background. Dashes, plasma sample not available. Smoking history, ≥20 pack years (heavy), >0 pack years (light). (b-d) Disease monitoring using CAPP-Seq. Mutant allele frequencies (left y-axis) and absolute concentrations (right y-axis) are shown. The lower limit of detection (defined in FIG. 2(a)-(b)) is indicated by the dashed lines. (b) Pre- and post-surgery circulating tumor DNA levels quantified by CAPP-Seq in a Stage IB and a Stage IIIA NSCLC patient. Complete resections were achieved in both cases. (c) Disease burden changes in response to chemotherapy in a Stage IV NSCLC patient with three rearrangement breakpoints identified by CAPP-Seq. Tumor volume based on CT measurements and CAPP-Seq mutant allele frequencies are shown. Tu, tumor; Ef, pleural