`
`doi:10.1038/nature12634
`
`Mutational landscape and significance
`across 12 major cancer types
`
`Cyriac Kandoth1*, Michael D. McLellan1*, Fabio Vandin2, Kai Ye1,3, Beifang Niu1, Charles Lu1, Mingchao Xie1, Qunyuan Zhang1,3,
`Joshua F. McMichael1, Matthew A. Wyczalkowski1, Mark D. M. Leiserson2, Christopher A. Miller1, John S. Welch4,5,
`Matthew J. Walter4,5, Michael C. Wendl1,3,6, Timothy J. Ley1,3,4,5, Richard K. Wilson1,3,5, Benjamin J. Raphael2 & Li Ding1,3,4,5
`
`The Cancer Genome Atlas (TCGA) has used the latest sequencing and analysis methods to identify somatic variants across
`thousands of tumours. Here we present data and analytical results for point mutations and small insertions/deletions
`from 3,281 tumours across 12 tumour types as part of the TCGA Pan-Cancer effort. We illustrate the distributions of
`mutation frequencies, types and contexts across tumour types, and establish their links to tissues of origin, environmental/
`carcinogen influences, and DNA repair defects. Using the integrated data sets, we identified 127 significantly mutated genes
`from well-known (for example, mitogen-activated protein kinase, phosphatidylinositol-3-OH kinase, Wnt/β-catenin and
`receptor tyrosine kinase signalling pathways, and cell cycle control) and emerging (for example, histone, histone
`modification, splicing, metabolism and proteolysis) cellular processes in cancer. The average number of mutations in
`these significantly mutated genes varies across tumour types; most tumours have two to six, indicating that the number of
`driver mutations required during oncogenesis is relatively small. Mutations in transcriptional factors/regulators show
`tissue specificity, whereas histone modifiers are often mutated across several cancer types. Clinical association analysis
`identifies genes having a significant effect on survival, and investigations of mutations with respect to clonal/subclonal
`architecture delineate their temporal orders during tumorigenesis. Taken together, these results lay the groundwork for
`developing new diagnostics and individualizing cancer treatment.
`
`The advancement of DNA sequencing technologies now enables the
`processing of thousands of tumours of many types for systematic
`mutation discovery. This expansion of scope, coupled with appreciable
`progress in algorithms1–5, has led directly to characterization of signifi-
`cant functional mutations, genes and pathways6–18. Cancer encompasses
`more than 100 related diseases19, making it crucial to understand the
`commonalities and differences among various types and subtypes.
`TCGA was founded to address these needs, and its large data sets
`are providing unprecedented opportunities for systematic, integrated
`analysis.
`We performed a systematic analysis of 3,281 tumours from 12 cancer
`types to investigate underlying mechanisms of cancer initiation and
`progression. We describe variable mutation frequencies and contexts
`and their associations with environmental factors and defects in DNA
`repair. We identify 127 significantly mutated genes (SMGs) from diverse
`signalling and enzymatic processes. The finding of a TP53-driven
`breast, head and neck, and ovarian cancer cluster with a dearth of other
`mutations in SMGs suggests common therapeutic strategies might be
`applied for these tumours. We determined interactions among muta-
`tions and correlated mutations in BAP1, FBXW7 and TP53 with det-
`rimental phenotypes across several cancer types. The subclonal structure
`and transcription status of underlying somatic mutations reveal the
`trajectory of tumour progression in patients with cancer.
`
`Standardization of mutation data
`Stringent filters (Methods) were applied to ensure high quality muta-
`tion calls for 12 cancer types: breast adenocarcinoma (BRCA), lung
`adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC),
`uterine corpus endometrial carcinoma (UCEC), glioblastoma multiforme
`
`(GBM), head and neck squamous cell carcinoma (HNSC), colon and
`rectal carcinoma (COAD, READ), bladder urothelial carcinoma (BLCA),
`kidney renal clear cell carcinoma (KIRC), ovarian serous carcinoma
`(OV) and acute myeloid leukaemia (LAML; conventionally called
`AML) (Supplementary Table 1). A total of 617,354 somatic mutations,
`consisting of 398,750 missense, 145,488 silent, 36,443 nonsense, 9,778
`splice site, 7,693 non-coding RNA, 523 non-stop/readthrough, 15,141
`frameshift insertions/deletions (indels) and 3,538 inframe indels, were
`included for downstream analyses (Supplementary Table 2).
`
`Distinct mutation frequencies and sequence context
`Figure 1a shows that AML has the lowest median mutation frequency
`and LUSC the highest (0.28 and 8.15 mutations per megabase (Mb),
`respectively). Besides AML, all types average over 1 mutation per Mb,
`substantially higher than in paediatric tumours20. Clustering21 illus-
`trates that mutation frequencies for KIRC, BRCA, OV and AML are
`normally distributed within a single cluster, whereas other types have
`several clusters (for example, 5 and 6 clusters in UCEC and COAD/
`READ, respectively) (Fig. 1a and Supplementary Table 3a, b). In UCEC,
`the largest patient cluster has a frequency of approximately 1.5 muta-
`tions per Mb, and the cluster with the highest frequency is more than
`150 times greater. Multiple clusters suggest that factors other than age
`contribute to development in these tumours14,16. Indeed, there is a
`significant correlation between high mutation frequency and DNA
`repair pathway genes (for example, PRKDC, TP53 and MSH6) (Sup-
`plementary Table 3c). Notably, PRKDC mutations are associated with
`high frequency in BLCA, COAD/READ, LUAD and UCEC, whereas
`TP53 mutations are associated with higher frequencies in AML, BLCA,
`BRCA, HNSC, LUAD, LUSC and UCEC (all P < 0.05). Mutations in
`
`1The Genome Institute, Washington University in St Louis, Missouri 63108, USA. 2Department of Computer Science, Brown University, Providence, Rhode Island 02912, USA. 3Department of Genetics,
`Washington University in St Louis, Missouri 63108, USA. 4Department of Medicine, Washington University in St Louis, Missouri 63108, USA. 5Siteman Cancer Center, Washington University in St Louis,
`Missouri 63108, USA. 6Department of Mathematics, Washington University in St Louis, Missouri 63108, USA.
`*These authors contributed equally to this work.
`
`17 OCTOBER 2013 | VOL 502 | NATURE | 333
`©2013 Macmillan Publishers Limited. All rights reserved
`
`Genome Ex. 1021
`Page 1 of 20
`
`
`
`of C>T transitions are between 59% and 67%, substantially higher
`than the approximately 40% in other cancer types. Higher frequencies
`of transition mutations at CpG in gastrointestinal tumours, including
`colorectal, were previously reported22. We found three additional
`cancer types (GBM, AML and UCEC) clustered in the C>T mutation
`at CpG, consistent with previous findings of aberrant DNA methylation
`in endometrial cancer23 and glioblastoma24. BLCA has a unique
`signature for C>T transitions compared to the other types (enriched
`for TC) (Extended Data Fig. 1).
`
`Significantly mutated genes
`Genes under positive selection, either in individual or multiple tumour
`types, tend to display higher mutation frequencies above background.
`Our statistical analysis3, guided by expression data and curation (Methods),
`identified 127 such genes (SMGs; Supplementary Table 4). These SMGs
`are involved in a wide range of cellular processes, broadly classified
`into 20 categories (Fig. 2), including transcription factors/regulators,
`histone modifiers, genome integrity, receptor tyrosine kinase signalling,
`cell cycle, mitogen-activated protein kinases (MAPK) signalling,
`phosphatidylinositol-3-OH kinase (PI(3)K) signalling, Wnt/β-catenin
`signalling, histones, ubiquitin-mediated proteolysis, and splicing (Fig. 2).
`The identification of MAPK, PI(3)K and Wnt/β-catenin signalling pathways
`is consistent with classical cancer studies. Notably, newer categories
`(for example, splicing, transcription regulators, metabolism, proteolysis
`and histones) emerge as exciting guides for the development of new
`therapeutic targets. Genes categorized as histone modifiers (Z = 0.57),
`PI(3)K signalling (Z = 1.03), and genome integrity (Z = 0.66) all relate
`to more than one cancer type, whereas transcription factor/regulator
`(Z = 0.40), TGF-β signalling (Z = 0.66), and Wnt/β-catenin signalling
`(Z = 0.55) genes tend to associate with single types (Methods).
`Notably, 3,053 out of 3,281 total samples (93%) across the Pan-
`Cancer collection had at least one non-synonymous mutation in at
`least one SMG. The average number of point mutations and small
`indels in these genes varies across tumour types, with the highest (~6
`mutations per tumour) in UCEC, LUAD and LUSC, and the lowest
`(~2 mutations per tumour) in AML, BRCA, KIRC and OV. This
`suggests that the numbers of both cancer-related genes (only 127
`identified in this study) and cooperating driver mutations required
`during oncogenesis are small (most cases only had 2–6) (Fig. 3),
`although large-scale structural rearrangements were not included in
`this analysis.
`
`Common mutations
`The most frequently mutated gene in the Pan-Cancer cohort is TP53
`(42% of samples). Its mutations predominate in serous ovarian (95%)
`and serous endometrial carcinomas (89%) (Fig. 2). TP53 mutations
`are also associated with basal subtype breast tumours. PIK3CA is the
`second most commonly mutated gene, occurring frequently (>10%)
`in most cancer types except OV, KIRC, LUAD and AML. PIK3CA
`mutations are frequent in UCEC (52%) and BRCA (33.6%), being
`specifically enriched in luminal subtype tumours. Tumours lacking PIK3CA
`mutations often had mutations in PIK3R1, with the highest occur-
`rences in UCEC (31%) and GBM (11%) (Fig. 2).
`Many cancer types carried mutations in chromatin re-modelling
`genes. In particular, histone-lysine N-methyltransferase genes (MLL2
`(also known as KMT2D), MLL3 (KMT2C) and MLL4 (KMT2B)) clus-
`ter in bladder, lung and endometrial cancers, whereas the lysine (K)-
`specific demethylase KDM5C is prevalently mutated in KIRC (7%).
`Mutations in ARID1A are frequent in BLCA, UCEC, LUAD and
`LUSC, whereas mutations in ARID5B predominate in UCEC (10%)
`(Fig. 2).
`KRAS and NRAS mutations are typically mutually exclusive, with
`recurrent activating mutations (KRAS (Gly 12), KRAS (Gly 13) and
`NRAS (Gln 61)) common in COAD/READ (30%, 5% and 5%, respect-
`ively), UCEC (15%, 4% and 2%) and LUAD (24%, 1% and 2%). EGFR
`mutations are frequent in GBM (27%) and LUAD (11%). Recurrent,
`
`RESEARCH ARTICLE
`
`[Figure 1 graphic. Panel a: number of mutations per Mb (log scale, 0.01 to 100)
`for AML, BRCA, OV, KIRC, UCEC, GBM, COAD/READ, HNSC, BLCA, LUAD and LUSC.
`Panel b: Ti/Tv frequency (%) for the categories C>T, C>G, C>A, A>T, A>G and A>C.
`Panel c: correlation coefficient (colour scale 0.75 to 0.95) for the same six categories.]
`
`
`Figure 1 | Mutation frequencies, spectra and contexts across 12 cancer
`types. a, Distribution of mutation frequencies across 12 cancer types. Dashed
`grey and solid white lines denote average across cancer types and median for
`each type, respectively. b, Mutation spectrum of six transition (Ti) and
`transversion (Tv) categories for each cancer type. c, Hierarchically clustered
`mutation context (defined by the proportion of A, T, C and G nucleotides
`within ±2 bp of variant site) for six mutation categories. Cancer types
`correspond to colours in a. Colour denotes degree of correlation: yellow
`(r = 0.75) and red (r = 1).
`
`POLQ and POLE associate with high frequencies in multiple cancer types;
`POLE association in UCEC is consistent with previous observations14.
`Comparison of spectra across the 12 types (Fig. 1b and Supplementary
`Table 3d) reveals that LUSC and LUAD contain increased C>A
`transversions, a signature of cigarette smoke exposure10. Sequence
`context analysis across 12 types revealed the largest difference being
`in C>T transitions and C>G transversions (Fig. 1c). The frequency
`of thymine 1-bp (base pair) upstream of C>G transversions is markedly
`higher in BLCA, BRCA and HNSC than in other cancer types
`(Extended Data Fig. 1). GBM, AML, COAD/READ and UCEC have
`similar contexts in that the proportions of guanine 1 base downstream
`
`
`
`
`
`[Figure 2, first panel. Percentage of samples mutated, by gene and cancer type.]
`Gene BLCA BRCA COAD/READ GBM HNSC KIRC AML LUAD LUSC OV UCEC Pan-Cancer
`
`MAPK signalling
`KRAS 0.0 0.8 45.1 0.7 0.3 0.2 4.0 26.3 1.2 0.6 20.0 6.7
`NF1 7.1 2.5 1.0 11.0 2.7 1.7 1.0 11.8 10.3 3.8 3.5 4.4
`MAP3K1 3.1 7.2 0.0 2.1 1.0 1.2 0.0 1.8 1.7 0.3 3.5 2.7
`BRAF 2.0 0.4 3.6 2.1 1.0 0.2 0.0 6.6 4.6 0.6 0.9 1.5
`NRAS 2.0 0.1 8.8 0.3 0.0 0.0 7.5 1.8 0.6 0.6 2.6 1.5
`MAP2K4 0.0 4.1 2.6 0.0 0.3 0.0 0.0 1.3 0.6 0.3 1.3 1.4
`MAPK8IP1 2.0 0.3 2.1 0.7 0.7 0.5 0.0 1.8 1.2 0.3 0.4 0.7
`
`PI(3)K signalling
`PIK3CA 17.4 33.6 17.6 11.0 20.6 2.9 0.0 4.4 14.9 0.6 52.2 17.8
`PTEN 3.1 3.8 1.0 30.7 1.3 4.3 0.0 2.2 8.1 0.6 63.5 9.7
`PIK3R1 1.0 2.5 2.1 11.4 1.7 0.5 0.0 1.3 0.6 0.3 30.9 4.4
`TLR4 2.0 1.2 0.0 0.3 2.0 0.5 0.5 11.4 5.8 1.0 0.4 1.9
`PIK3CG 2.0 0.4 0.5 2.4 2.7 0.7 0.0 5.3 7.5 1.0 1.3 1.7
`AKT1 0.0 2.5 0.0 0.3 0.7 0.5 0.0 0.0 0.6 0.0 1.3 0.9
`
`TGF-β signalling
`SMAD4 2.0 0.4 9.8 0.3 2.0 0.5 0.0 3.1 2.9 0.0 0.0 1.4
`TGFBR2 3.1 0.4 2.6 0.7 3.0 0.2 0.0 0.9 1.7 1.0 1.3 1.1
`ACVR1B 0.0 0.7 3.6 0.0 1.3 1.0 0.0 2.2 1.2 0.3 1.7 1.0
`SMAD2 1.0 0.5 5.7 0.0 1.0 0.5 0.0 0.9 1.2 0.0 1.3 0.9
`ACVR2A 1.0 0.5 2.6 0.0 0.7 0.2 0.0 0.9 1.2 0.0 0.4 0.6
`
`Wnt/β-catenin signalling
`APC 4.1 0.5 81.9 0.3 4.0 1.4 0.0 9.2 4.0 2.2 5.7 7.3
`CTNNB1 2.0 0.1 4.7 0.3 0.7 0.2 0.0 3.5 1.7 0.6 28.3 2.9
`AXIN2 3.1 0.1 3.6 0.3 1.7 0.2 0.0 0.9 0.6 0.3 2.6 0.9
`TBL1XR1 2.0 1.1 0.0 0.0 1.0 0.7 0.0 2.2 1.2 0.3 1.3 0.8
`SOX17 0.0 0.0 0.5 0.3 0.3 0.0 0.0 0.4 0.0 0.0 3.0 0.3
`
`Histone
`HIST1H1C 1.0 0.4 1.0 0.7 1.3 0.2 0.0 0.4 0.6 1.3 0.0 0.6
`H3F3C 0.0 0.0 0.0 0.7 0.7 0.0 0.0 1.8 1.2 0.0 0.9 0.4
`HIST1H2BD 1.0 0.0 0.0 0.0 1.3 0.0 0.0 0.0 1.2 0.3 2.6 0.4
`
`Proteolysis
`FBXW7 9.2 0.8 11.4 0.3 5.0 0.2 0.0 1.3 5.2 1.0 11.7 3.0
`KEAP1 3.1 0.1 0.0 0.0 4.0 0.5 0.0 17.1 12.1 0.3 1.3 2.6
`SPOP 1.0 0.1 0.0 0.0 1.0 0.0 0.0 0.4 0.6 0.3 6.5 0.7
`
`Splicing
`SF3B1 4.1 1.8 1.0 0.7 0.7 1.0 0.5 2.2 2.3 0.0 2.2 1.3
`U2AF1 1.0 0.3 0.5 0.0 1.3 0.0 4.0 2.6 0.0 0.0 0.9 0.8
`PCBP1 1.0 0.0 2.6 0.0 0.0 0.2 0.0 0.4 0.0 0.0 0.9 0.3
`
`HIPPO signalling
`CDH1 5.1 7.0 0.5 0.3 1.3 0.5 0.0 1.3 1.7 0.3 3.0 2.5
`AJUBA 2.0 0.1 0.0 0.3 6.0 0.5 0.0 0.9 0.0 0.0 0.0 0.8
`
`DNA methylation
`DNMT3A 0.0 0.5 1.0 0.0 1.7 1.2 25.5 4.0 4.0 1.0 1.3 2.8
`TET2 3.1 0.4 0.0 0.7 0.3 1.9 8.5 3.1 2.3 0.0 2.2 1.6
`
`Metabolism
`IDH1 3.1 0.3 0.0 5.2 0.3 0.5 9.5 0.9 1.2 0.0 0.9 1.5
`IDH2 0.0 0.0 1.6 0.0 0.0 0.0 10.0 0.4 0.0 0.0 0.4 0.8
`
`NFE2L
`NFE2L2 9.2 0.1 0.0 0.0 5.3 1.2 0.0 2.2 14.9 0.0 5.2 2.3
`NFE2L3 3.1 0.8 0.0 0.3 1.3 0.2 0.0 0.0 2.3 0.3 1.7 0.8
`
`Protein phosphatase
`PPP2R1A 1.0 0.1 1.6 0.0 1.3 1.2 0.0 1.3 4.6 1.3 8.7 1.5
`PTPN11 0.0 0.1 1.0 1.7 0.3 0.2 4.5 2.6 1.7 0.3 0.9 1.0
`
`Ribosome
`RPL22 0.0 0.0 0.0 0.3 0.7 0.5 0.0 0.4 0.0 0.0 10.9 1.0
`RPL5 0.0 0.4 0.0 2.8 0.0 1.4 0.0 0.4 1.2 0.0 0.9 0.7
`
`TOR signalling
`MTOR 2.0 1.4 3.6 1.4 1.3 6.0 0.0 7.5 4.6 1.9 5.2 3.0
`STK11 0.0 0.3 0.0 0.0 0.3 0.2 0.0 8.8 1.7 0.0 0.4 0.9
`
`Other
`NAV3 5.1 1.4 2.1 1.0 7.3 1.4 0.0 21.5 19.0 1.3 5.2 4.6
`NOTCH1 5.1 0.4 0.0 0.0 19.3 1.0 0.5 3.1 8.1 0.6 1.7 3.1
`LRRK2 5.1 0.7 2.6 1.0 5.0 1.4 0.0 6.6 11.5 2.9 3.5 2.8
`MALAT1 15.3 1.1 0.0 0.0 6.3 1.9 0.0 9.7 5.8 1.0 0.0 2.7
`ARHGAP35 5.1 0.9 0.5 0.7 3.7 1.2 0.5 4.0 5.8 1.6 10.0 2.5
`POLQ 7.1 0.8 0.5 1.0 4.3 1.2 0.0 5.7 9.2 1.0 3.9 2.4
`NCOR1 8.2 3.9 0.5 0.7 3.3 0.7 0.0 2.6 3.5 0.3 1.3 2.2
`USP9X 3.1 1.2 0.0 0.7 4.3 1.0 0.5 5.3 4.6 0.3 6.5 2.1
`NPM1 0.0 0.0 0.0 0.3 0.3 0.0 27.0 0.9 0.0 0.0 0.4 1.8
`HGF 1.0 0.5 0.0 0.3 2.7 0.2 0.0 10.5 5.8 0.6 1.3 1.7
`EPPK1 2.0 0.3 0.0 2.8 2.7 0.7 0.0 3.1 4.0 0.3 3.0 1.4
`AR 1.0 0.7 2.1 0.0 2.3 0.5 0.0 1.8 3.5 0.3 3.5 1.2
`LIFR 1.0 0.8 5.2 0.0 2.7 0.5 0.0 0.9 1.7 0.6 1.7 1.2
`PRX 5.1 0.5 0.5 0.7 1.7 1.2 0.5 0.4 1.2 0.3 1.3 0.9
`CRIPAK 2.0 0.3 0.0 0.3 1.0 0.5 0.0 5.3 0.0 0.0 0.4 0.7
`EGR3 1.0 0.3 0.0 0.3 0.0 0.2 0.0 0.4 1.2 0.0 0.4 0.3
`B4GALT3 0.0 0.1 0.5 0.0 0.0 0.2 0.0 0.0 1.2 0.0 0.9 0.2
`MIR142 0.0 0.1 0.0 0.0 0.0 0.0 2.0 0.0 0.0 0.0 0.0 0.2
`
`[Figure 2, second panel. Percentage of samples mutated, by gene and cancer type.]
`Gene BLCA BRCA COAD/READ GBM HNSC KIRC AML LUAD LUSC OV UCEC Pan-Cancer
`
`Transcription factor/regulator
`VHL 0.0 0.0 0.0 0.0 0.0 52.3 0.0 0.0 0.6 0.0 0.9 6.9
`GATA3 1.0 10.6 1.0 0.0 2.0 0.0 0.0 2.6 2.9 0.3 0.4 3.2
`TSHZ3 2.0 0.7 3.1 0.7 1.3 1.2 0.5 14.9 6.3 1.0 3.9 2.6
`EP300 17.4 0.8 2.1 0.3 8.0 1.4 0.0 0.9 4.6 0.3 5.2 2.5
`CTCF 2.0 2.4 1.6 0.0 3.3 0.5 0.5 1.3 0.0 0.3 16.5 2.4
`TAF1 2.0 1.1 1.6 1.4 2.3 1.2 0.0 4.0 6.9 1.6 8.7 2.3
`TSHZ2 4.1 0.9 3.1 2.4 1.3 0.7 0.0 6.6 3.5 1.0 1.7 1.8
`RUNX1 1.0 3.3 1.0 0.0 0.7 0.0 9.0 0.4 0.0 0.0 1.3 1.6
`MECOM 5.1 0.5 1.0 1.4 1.7 1.0 0.0 3.5 4.6 0.6 3.0 1.5
`TBX3 3.1 2.4 1.0 0.0 0.7 0.0 0.0 4.4 2.9 1.0 1.3 1.4
`SIN3A 1.0 0.5 0.5 0.7 0.7 0.5 0.0 1.8 2.9 0.6 5.2 1.1
`WT1 0.0 0.1 1.0 0.7 0.0 0.7 6.0 3.5 2.3 0.0 0.4 1.0
`EIF4A2 2.0 0.5 2.6 0.0 0.0 0.7 0.0 1.8 1.2 0.6 1.3 0.8
`FOXA1 4.1 1.7 0.0 1.0 0.7 0.0 0.0 0.4 0.6 0.0 0.0 0.8
`PHF6 3.1 0.4 0.0 0.3 0.3 0.5 3.0 0.9 1.2 0.3 1.3 0.8
`CBFB 1.0 2.1 0.0 0.0 0.0 0.2 1.0 0.4 0.6 0.0 0.4 0.7
`SOX9 0.0 0.1 4.2 1.0 0.7 0.7 0.0 1.3 0.6 0.0 0.4 0.7
`ELF3 8.2 0.1 3.6 0.0 0.3 0.0 0.0 0.4 0.0 0.3 0.4 0.6
`VEZF1 2.0 0.9 0.0 0.7 0.7 0.0 0.0 0.9 1.7 0.0 0.0 0.6
`CEBPA 0.0 0.0 0.0 0.0 0.0 0.2 6.5 0.0 0.6 0.0 0.0 0.5
`FOXA2 1.0 0.0 0.5 0.0 0.0 0.0 0.0 0.0 0.0 0.6 4.8 0.5
`
`Histone modifier
`MLL3 24.5 6.4 2.6 3.1 7.3 3.6 0.5 18.4 15.5 1.9 5.2 6.6
`MLL2 25.5 1.6 1.6 1.7 17.9 3.1 0.5 8.8 20.1 0.6 8.3 5.9
`ARID1A 27.6 2.0 5.7 0.7 3.0 2.9 0.5 6.1 6.3 1.0 30.0 5.4
`PBRM1 6.1 0.4 0.0 0.7 2.3 32.9 0.0 1.8 3.5 0.3 2.6 5.4
`SETD2 6.1 1.2 2.6 1.7 2.3 11.5 0.5 7.9 2.9 1.9 2.6 3.6
`NSD1 6.1 0.3 0.5 0.3 10.6 1.0 0.0 3.1 5.2 0.6 5.7 2.4
`SETBP1 2.0 0.4 1.6 1.4 3.0 1.4 1.0 12.7 5.2 0.0 2.2 2.2
`KDM5C 1.0 0.5 0.5 0.7 1.0 6.5 0.0 4.8 2.9 1.9 2.2 2.0
`KDM6A 26.5 1.1 0.0 1.0 2.7 1.0 1.5 0.9 4.0 0.0 0.9 2.0
`MLL4 7.1 0.7 2.1 2.1 2.7 1.0 0.0 1.8 4.0 0.3 8.3 2.0
`ARID5B 3.1 0.4 0.0 0.3 3.3 0.7 0.0 2.2 1.7 0.6 9.6 1.6
`ASXL1 3.1 0.4 1.6 0.0 3.0 1.0 2.5 1.3 5.2 0.0 0.9 1.3
`EZH2 1.0 0.1 0.0 1.0 0.3 0.7 1.5 2.2 2.3 0.0 1.7 0.8
`
`Genome integrity
`TP53 50.0 32.9 58.6 28.3 69.8 2.2 7.5 51.8 79.3 94.6 27.8 42.0
`ATM 11.2 2.1 5.7 1.4 2.7 2.9 0.0 7.9 4.0 1.3 6.5 3.3
`ATRX 8.2 1.2 1.0 5.5 4.3 1.9 0.0 6.1 5.8 0.6 3.0 2.8
`BRCA2 6.1 1.7 1.6 1.4 3.7 1.9 0.0 5.7 5.8 3.2 4.4 2.7
`ATR 4.1 0.8 2.1 1.4 5.3 1.2 0.0 5.7 4.0 0.6 7.0 2.4
`STAG2 10.2 0.9 1.0 4.1 0.7 1.7 3.0 2.6 3.5 1.0 3.9 2.2
`BAP1 4.1 0.3 0.0 0.7 1.0 10.1 0.0 1.3 0.6 0.6 2.2 2.0
`BRCA1 4.1 1.6 0.0 1.0 2.7 1.0 0.0 3.5 5.2 3.5 0.9 1.9
`SMC1A 3.1 0.8 1.6 1.7 1.0 0.5 3.5 1.3 0.6 1.3 4.4 1.5
`SMC3 1.0 0.4 0.0 1.4 1.7 1.2 3.5 2.6 2.3 0.3 0.4 1.2
`CHEK2 2.0 0.4 0.0 1.7 2.3 0.7 0.0 0.9 1.2 0.3 1.3 0.9
`RAD21 2.0 0.5 1.0 0.3 1.0 0.0 2.5 2.6 1.2 0.3 0.9 0.9
`ERCC2 12.2 0.1 0.5 0.0 0.3 0.2 0.0 1.3 0.0 0.3 0.4 0.7
`
`RTK signalling
`EGFR 1.0 0.7 1.6 26.6 4.7 1.7 1.0 11.4 2.9 1.9 1.3 4.6
`FLT3 2.0 0.4 0.0 1.7 0.7 0.5 26.5 4.0 4.0 1.0 0.9 2.7
`EPHA3 1.0 0.5 3.1 1.0 3.7 0.5 0.5 8.8 6.3 1.0 2.2 2.1
`ERBB4 2.0 0.8 3.6 0.3 4.3 1.4 0.0 7.5 5.2 0.0 2.6 2.1
`PDGFRA 6.1 0.4 1.0 3.8 1.0 1.4 0.5 6.6 4.0 1.0 1.3 1.9
`EPHB6 3.1 0.4 0.0 1.4 1.3 1.2 0.0 9.7 3.5 0.3 1.7 1.6
`FGFR2 2.0 0.9 0.0 0.3 0.7 0.2 0.0 3.1 2.3 0.0 10.4 1.5
`KIT 1.0 0.5 1.0 1.0 1.0 0.7 4.0 1.8 3.5 1.9 2.2 1.4
`FGFR3 8.2 0.1 0.5 1.4 1.7 1.4 0.0 0.4 2.3 0.3 0.4 1.0
`
`Cell cycle
`CDKN2A 4.1 0.0 0.5 0.7 21.3 1.0 0.0 6.6 14.9 0.0 0.4 3.6
`RB1 14.3 1.8 0.5 8.3 3.0 0.2 0.0 5.3 6.9 1.9 3.9 3.2
`CDK12 4.1 0.9 1.6 0.3 1.7 1.4 0.0 3.1 0.6 2.9 2.2 1.5
`CDKN1B 2.0 0.9 1.0 0.3 0.7 0.0 0.0 1.8 0.0 0.3 0.9 0.7
`CCND1 2.0 0.1 0.0 0.0 0.3 0.0 0.0 0.9 0.6 0.0 5.2 0.6
`CDKN1A 12.2 0.0 0.0 0.3 0.0 0.2 0.0 0.4 1.2 0.3 0.0 0.6
`CDKN2C 0.0 0.0 0.0 1.0 0.0 0.2 0.0 0.0 0.6 0.6 0.0 0.2
`
`Figure 2 | The 127 SMGs from 20 cellular processes in cancer identified in
`12 cancer types. Percentages of samples mutated in individual tumour types
`
`and Pan-Cancer are shown, with the highest percentage in each gene among 12
`cancer types in bold.
`
`gain-of-function mutations in IDH1 (Arg 132) and/or IDH2 (Arg 172)
`typify GBM and AML (Supplementary Table 2 and Fig. 2). Although
`KRAS residues Gly 12 and Gly 13 are commonly mutated in LUAD,
`COAD/READ and UCEC, the proportion of Gly12Cys changes is
`significantly higher in lung cancer (P < 3.2 × 10⁻¹⁰), resulting from
`the high C>A transversion rate (Extended Data Fig. 2).
`
`Tumour-type-specific mutations
`Signature mutations exclusive to KIRC include those affecting VHL
`(52%) and PBRM1 (33%) (Fig. 2). Mutations to BAP1 (10%) and
`SETD2 (12%) are also most common in KIRC. Transcription factor
`CTCF, ribosome component RPL22, and histone modifiers ARID1A
`and ARID5B have the highest frequencies in UCEC. Predominant
`COAD/READ-specific mutations are those affecting APC (82%) and
`
`Wnt/β-catenin signalling (93% of samples). Several mutations occur
`exclusively in AML, including recurrent mutations in NPM1 (27%)
`and FLT3 (27%), and rare mutations affecting MIR142 (Fig. 2).
`Mutations of methylation and chromatin modifiers are also typical
`in AML, mostly affecting DNMT3A and TET2. BRCA-specific muta-
`tions include GATA3 and MAP3K1, whereas KEAP1 mutations pre-
`dominate in lung cancer (LUAD 17%, LUSC 12%). EPHA3 (9%),
`SETBP1 (13%) and STK11 (9%) are characteristic in LUAD.
`
`Shared and cancer type-specific mutation signatures
`Cluster analysis on mutations in SMGs (Fig. 4 and Extended Data Fig. 3)
`showed 72% (1,881 of 2,611) of tumours are adjacent to those from the
`same tissue type. Notably, clustering identified several dominant groups
`in UCEC, COAD, GBM, AML, KIRC, OV and BRCA. Two major
`
`
`
`
`Mutual exclusivity and co-occurrence among SMGs
`Pairwise exclusivity and co-occurrence analysis for the 127 SMGs
`found 14 mutually exclusive (false discovery rate (FDR) < 0.05) and
`148 co-occurring (FDR < 0.05) pairs (Supplementary Table 6). TP53
`and CDH1 are exclusive in BRCA, with mutations enriched in different
`subtypes13, as are TP53 and CTNNB1 in UCEC. Cohort analysis identified
`pairs where at least one gene has mutations strongly associated
`(corrected P < 0.05) with one cancer type, and also identified TP53 and
`PIK3CA with significant exclusivity (Extended Data Fig. 4). Pairs with
`significant co-occurrence include IDH1 and ATRX in GBM29, TP53
`and CDKN2A in HNSC, and TBX3 and MLL4 in LUAD.
`Dendrix30 identified a set of five genes (TP53, PTEN, VHL, NPM1
`and GATA3) having strong mutual exclusivity (P < 0.01) (Extended
`Data Fig. 5a and Supplementary Table 7). Not surprisingly, many are
`associated (P < 0.05) with one cancer type (for example, VHL mutations
`in KIRC), demonstrating a strong relationship between exclusivity
`and tissue of origin. When 600 non-cancer-type-specific genes
`were added to the analysis (Methods), we identified another set
`consisting of TP53, PIK3CA, PIK3R1, SETD2 and WT1 (P < 0.01;
`Extended Data Fig. 5b and Supplementary Table 7). Dendrix also
`finds genes with strong mutual exclusivity from each cancer type
`separately (Extended Data Fig. 5c), allowing calculation of ‘cancer
`exclusivity’. KIRC has the strongest exclusivity from the other 11
`cancer types, followed by AML with clear exclusivity from BRCA
`and UCEC. Conversely, COAD/READ displayed the greatest co-
`occurrence with other cancer types (Extended Data Fig. 5d).
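
As an illustration of how pairwise mutual exclusivity can be scored, the following minimal sketch applies a one-sided Fisher's exact test to a 2×2 table of mutation status for two genes. This is an assumption made for illustration: the test actually used for Supplementary Table 6 is described in Methods, and the function name and the counts below are hypothetical.

```python
from math import comb

def exclusivity_p_value(both, only_a, only_b, neither):
    """One-sided Fisher's exact test for mutual exclusivity.

    Returns the probability of seeing `both` or fewer co-mutated samples
    if mutations in gene A and gene B were independent (hypergeometric tail).
    """
    n_a = both + only_a                  # samples with gene A mutated
    n_b = both + only_b                  # samples with gene B mutated
    total = both + only_a + only_b + neither
    smallest_overlap = max(0, n_a + n_b - total)
    ways = sum(
        comb(n_a, k) * comb(total - n_a, n_b - k)
        for k in range(smallest_overlap, both + 1)
    )
    return ways / comb(total, n_b)

# Hypothetical cohort: 30 samples, 10 mutated in each gene, none in both.
p = exclusivity_p_value(both=0, only_a=10, only_b=10, neither=10)
print(f"one-sided P = {p:.4f}")
```

A small P value indicates fewer co-mutated samples than expected by chance. In a screen over all pairs of 127 genes, such P values would still need multiple-testing correction, as the FDR thresholds in the text imply.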
`
`Clinical correlation across tumour types
`We examined how clinical features (Supplementary Table 8) correlate
`with somatic events in 127 SMGs within tumour types. Some findings
`are unsurprising, such as the correlation of TP53 mutations with generally
`unfavourable indicators, for example in tumour stage (P = 0.01,
`Fisher’s exact test) and elapsed time to death (P = 0.006, Wilcoxon) in
`HNSC, age (P = 0.002, Wilcoxon rank test) and time to death (P =
`0.09, Wilcoxon) in AML, and vital status in OV (P = 0.04, Fisher). In
`UCEC, mutations in several genes are correlated with the endometrioid
`rather than serous subtype: PTEN, CTNNB1, PIK3R1, KRAS, ARID1A,
`CTCF, RPL22 and ARID5B (all P < 0.03) (Supplementary Table 9).
`
`
`[Figure 3 graphic. Box plot of the number of non-synonymous mutations in SMGs
`(axis 0 to 12) for OV (n = 316), BRCA (n = 763), GBM (n = 290), KIRC (n = 417),
`AML (n = 200), COAD/READ (n = 193), HNSC (n = 301), LUAD (n = 228),
`LUSC (n = 174), BLCA (n = 98) and UCEC (n = 230).]
`
`
`Figure 3 | Distribution of mutations in 127 SMGs across Pan-Cancer
`cohort. Box plot displays median numbers of non-synonymous mutations,
`with outliers shown as dots. In total, 3,210 tumours were used for this analysis
`(hypermutators excluded).
`
`endometrial endometrioid clusters were found, one having mutations
`in PIK3CA, PTEN and ARID1A, and the other containing mutations in
`two additional genes (PIK3R1 and CTNNB1). Five major breast cancer
`clusters were observed, with mutations in CDH1, GATA3, MAP3K1,
`PIK3CA and TP53 as drivers for respective clusters. The TP53-driven
`cluster is adjacent to an HNSC cluster and an ovarian cancer cluster, all
`having a dearth of other SMG mutations (Fig. 4). The glioblastoma
`cluster is characterized by mutations in EGFR. Two kidney clear cell
`cancer clusters were detected; both have VHL as the common driver
`and one has additional mutations in PBRM1 and/or BAP1 (refs 25–27).
`PBRM1 and BAP1 mutations are mutually exclusive in KIRC (P = 0.006),
`consistent with previous reports26,28. AML has three major clusters
`represented by various combinations of DNMT3A, NPM1 and
`FLT3 mutations, and one cluster dominated by RUNX1 mutations.
`One cluster having APC and KRAS mutations was almost exclusively
`detected in COAD/READ. Tumours from BLCA, HNSC, LUAD and
`LUSC are largely scattered over the Pan-Cancer cohort, indicating
`extensive heterogeneity in these diseases.
`
`[Figure 4 graphic. Clustered mutation status matrix. Tumours are labelled by type
`(BLCA, BRCA, COAD/READ, GBM, HNSC, KIRC, AML, LUAD, LUSC, OV, UCEC). Genes shown:
`TP53, PIK3CA, PTEN, PBRM1, VHL, KRAS, APC, MLL3, EGFR, ARID1A, CTNNB1, PIK3R1,
`MLL2, NF1, GATA3, SETD2, MAP3K1, NOTCH1, CDKN2A, ATM, RB1, CDH1, NAV3, FBXW7,
`DNMT3A, NPM1, FLT3, MTOR, BAP1, IDH1, ATRX, BRCA2, CTCF, NCOR1, STAG2, TAF1, RUNX1.]
`
`
`Figure 4 | Unsupervised clustering based on mutation status of SMGs.
`Tumours having no mutation or more than 500 mutations were excluded. A
`mutation status matrix was constructed for 2,611 tumours. Major clusters of
`
`mutations detected in UCEC, COAD, GBM, AML, KIRC, OV and BRCA were
`highlighted. Complete gene list shown in Extended Data Fig. 3.
`
`
`
`
`
`We examined which genes correlate with survival using the Cox
`proportional hazards model, first analysing individual cancer types
`using age and gender as covariates; an average of 2 genes (range: 0–4)
`with mutation frequency ≥2% were significant (P ≤ 0.05) in each
`type (Supplementary Table 10a and Extended Data Fig. 6). KDM6A
`and ARID1A mutations correlate with better survival in BLCA
`(P = 0.03, hazard ratio (HR) = 0.36, 95% confidence interval (CI):
`0.14–0.92) and UCEC (P = 0.03, HR = 0.11, 95% CI: 0.01–0.84),
`respectively, but mutations in SETBP1, recently identified with worse
`prognosis in atypical chronic myeloid leukaemia (aCML)31, have a
`significant detrimental effect in HNSC (P = 0.006, HR = 3.21, 95% CI:
`1.39–7.44). BAP1 strongly correlates with poor survival (P = 0.00079,
`HR = 2.17, 95% CI: 1.38–3.41) in KIRC. Conversely, BRCA2 mutations
`(P = 0.02, HR = 0.31, 95% CI: 0.12–0.85) associate with better
`survival in ovarian cancer, consistent with previous reports32,33; BRCA1
`mutations showed positive correlation with better survival, but did not
`reach significance here.
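
The reported hazard ratios, confidence intervals and P values can be cross-checked against one another. The sketch below assumes a Wald-type interval, exp(log HR ± 1.96 × SE), which is a common default for Cox models but is not stated in the text; the function name is hypothetical. Applied to the BAP1 result in KIRC (HR = 2.17, 95% CI: 1.38–3.41), it gives a two-sided P value close to the reported 0.00079.

```python
import math

def wald_p_from_hr_ci(hr, ci_low, ci_high, z_crit=1.959964):
    """Recover SE of log(HR) from a 95% CI, then return (z, two-sided P).

    Assumes the interval was built as exp(log(HR) +/- z_crit * SE).
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = math.log(hr) / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# BAP1 in KIRC, values taken from the paragraph above.
z, p = wald_p_from_hr_ci(2.17, 1.38, 3.41)
print(f"z = {z:.2f}, P = {p:.5f}")
```

Because the published HR and CI are rounded to two decimals, the recovered P value is only approximate; a large mismatch would more likely point to a different test (for example, a likelihood-ratio test) than to an error.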
`We extended our survival analysis across cancer types, restricting
`our attention to the subset of 97 SMGs whose mutations appeared
`in ≥2% of patients having survival data in ≥2 tumour types. Taking
`type, age and gender as covariates, we found 7 significant genes: BAP1,
`DNMT3A, HGF, KDM5C, FBXW7, BRCA2 and TP53 (Extended Data
`Table 1). In particular, BAP1 was highly significant (P = 0.00013,
`HR = 2.20, 95% CI: 1.47–3.29, more than 53 mutated tumours out
`of 888 total), with mutations associating with detrimental outcome in
`four tumour types and notable associations in KIRC (P = 0.00079),
`consistent with a recent report28, and in UCEC (P = 0.066). Mutations in
`several other genes are detrimental, including DNMT3A (HR = 1.59),
`previously identified with poor prognosis in AML34, and KDM5C
`(HR = 1.63), FBXW7 (HR = 1.57) and TP53 (HR = 1.19). TP53 has
`significant associations with poor outcome in KIRC (P = 0.012), AML
`(P = 0.0007) and HNSC (P = 0.00007). Conversely, BRCA2 (P = 0.05,
`HR = 0.62, 95% CI: 0.38 to 0.99) correlates with survival benefit in six
`types, including OV and UCEC (Supplementary Table 10a, b). IDH1
`mutations are associated with improved prognosis across the Pan-Cancer
`set (HR = 0.67, P = 0.16) and also in GBM (HR = 0.42,
`P = 0.09) (Supplementary Table 10a, b), consistent with previous work35.
`
`Driver mutations and tumour clonal architecture
`To understand the temporal order of somatic events, we analysed the
`variant allele fraction (VAF) distribution of mutations in SMGs across
`AML, BRCA and UCEC (Fig. 5a and Supplementary Table 11a) and
`other tumour types (Extended Data Fig. 7). To minimize the effect of
`copy number alterations, we focused on mutations in copy neutral
`segments. Mutations in TP53 have higher VAFs on average in all three
`cancer types, suggesting early appearance during tumorigenesis, although
`
`[Figure 5 graphic. Panel a: VAF (0 to 100) of mutations in SMGs, with ChrX genes marked.
`UCEC: TP53, SOX17, PPP2R1A, ARID1A, PTEN, CCND1, FBXW7, FGFR2, FOXA2, PIK3CA, RPL22,
`PIK3R1, SPOP, CTNNB1, MECOM, CTCF, ARHGAP35, TAF1, KRAS, HIST1H2BD, ARID5B, MLL4, AR,
`NRAS, ATR, SIN3A, NFE2L2, USP9X.
`BRCA: AKT1, TP53, MAP2K4, NCOR1, TBX3, ARID1A, FOXA1, CDH1, RB1, PIK3CA, PTEN, MAP3K1,
`CTCF, SF3B1, GATA3, RUNX1, NF1, KRAS, MLL3, PIK3R1, TBL1XR1.
`AML: PHF6, SMC1A, SMC3, TP53, DNMT3A, U2AF1, IDH2, STAG2, RUNX1, IDH1, WT1, RAD21,
`CEBPA, FLT3, KRAS, TET2, PTPN11, KIT, NRAS.
`Panel b: for each sample, VAF density (a.u.) in 2-copy regions with model fit and component
`fits, and tumour coverage against VAF with clusters, ChrX variants and autosome variants.
`AML TCGA-AB-2968: 1 NRAS (Q61H), 2 DNMT3A (E477*), 3 DNMT3A (A741B).
`BRCA TCGA-BH-A18P: 1 FOXA1 (I176M), 2 PIK3R1 (E635K), 3 MLL3 (E1992Q).
`UCEC TCGA-B5-A0JV: 1 NRAS (Q61K), 2 ARID1A (R693*), 3 PTEN (P96T), 4 KRAS (G12D),
`5 CTCF (R566H), 6 PIK3CA (E542K), 7 PMS2 (N432S).]
`
`Figure 5 | Driver initiation and progression mutations and tumour clonal
`architecture. a, Variant allele fraction (VAF) distribution of mutations in
`SMGs across tumours from AML, BRCA and UCEC for mutations (≥20×
`coverage) in copy neutral segments. SMGs having ≥5 mutation data points
`were included. ChrX, chromosome X. b, In AML sample TCGA-AB-2968
`(WGS), two DNMT3A mutations are in the founding clone, and one NRAS
`
`mutation is in the subclone. In BRCA tumour TCGA-BH-A18P (exome), one
`FOXA1 mutation is in the founding clone, and PIK3R1 and MLL3 mutations
`are in the subclone. In UCEC tumour TCGA-B5-A0JV (exome), PIK3CA,
`ARID1A and CTCF mutations are in the founding clone, and NRAS, PTEN and
`KRAS mutations are in the secondary clone. Asterisk denotes stop codon.
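
To make the quantities in Fig. 5a concrete, the following sketch computes VAF as variant-supporting reads divided by total reads, and applies the two thresholds given in the legend (≥20× coverage per mutation, ≥5 data points per gene). The record layout, function name, gene labels and read counts are hypothetical, and the per-gene median is one possible summary; this is not the authors' pipeline.

```python
from collections import defaultdict
from statistics import median

MIN_COVERAGE = 20   # legend: mutations need >=20x coverage
MIN_POINTS = 5      # legend: genes need >=5 mutation data points

def median_vaf_by_gene(records):
    """records: iterable of (gene, variant_reads, total_reads) tuples.

    Returns {gene: median VAF in percent} for genes passing both filters.
    """
    vafs = defaultdict(list)
    for gene, variant_reads, total_reads in records:
        if total_reads >= MIN_COVERAGE:
            vafs[gene].append(100.0 * variant_reads / total_reads)
    return {g: median(v) for g, v in vafs.items() if len(v) >= MIN_POINTS}

# Hypothetical read counts, for illustration only.
example = (
    [("GENE_A", 20, 40)] * 5      # five calls at 50% VAF: kept
    + [("GENE_B", 5, 50)] * 4     # only four calls: gene dropped
    + [("GENE_A", 9, 10)]         # coverage below 20x: call ignored
)
result = median_vaf_by_gene(example)
print(result)
```

In a copy neutral diploid segment, a heterozygous mutation present in every tumour cell of a pure sample would sit near 50% VAF, which is why lower values are read as evidence of subclonality.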
`
`
`
`
`
`it is possible that a later mutation contributing to tumour cell expan-
`sion might have a high VAF. It is worth noting that copy neutral loss of
`heterozygosity is commonly found in classical tumour suppressors
`such as TP53, BRCA1, BRCA2 and PTEN, leading to increased VAFs
`in these genes. In AML, DNMT3A (permutation test P = 0), RUNX1
`(P = 0.0003) and SMC3 (P = 0.05) have significantly higher VAFs
`than average among SMGs (Fig. 5a and Supplementary Table 11b).
`In breast cancer, AKT1, CBFB, MAP2K4, ARID1A, FOXA1 and PIK3CA
`have relatively high average VAFs. For endometrial cancer, multiple
`SMGs (for example, PIK3CA, PIK3R1, PTEN, FOXA2 and ARID1A)
`have similar median VAFs. Conversely, KRAS and/or NRAS mutations
`tend to have lower VAFs in all three tumour types (Fig. 5a), suggesting
`NRAS (for example, P = 0 in AML) and KRAS (for example, P = 0.02
`in BRCA) have a