© 2012 USCAP, Inc. All rights reserved 0023-6837/12 $32.00
`
`Evaluating tumor heterogeneity in
`immunohistochemistry-stained breast cancer tissue
`Steven J Potts1, Joseph S Krueger1, Nicholas D Landis1, David A Eberhard2, G David Young1,
`Steven C Schmechel3 and Holger Lange1
`
Quantitative clinical measurement of heterogeneity in immunohistochemistry staining would be useful in evaluating patient therapeutic response and in identifying underlying issues in histopathology laboratory quality control. A heterogeneity scoring approach (HetMap) was designed to visualize an individual patient’s immunohistochemistry heterogeneity in the context of a patient population. HER2 semiquantitative analysis was combined with ecological diversity statistics to evaluate cell-level heterogeneity (consistency of protein expression within neighboring cells in a tumor nest) and tumor-level heterogeneity (differences of protein expression across a tumor as represented by a tissue section). This approach was evaluated on HER2 immunohistochemistry-stained breast cancer samples using 200 specimens across two different laboratories with three pathologists per laboratory, each outlining regions of tumor for scoring by automatic cell-based image analysis. HetMap was evaluated using three different scoring schemes: HER2 scoring according to American Society of Clinical Oncology and College of American Pathologists (ASCO/CAP) guidelines, H-score, and a new continuous HER2 score (HER2cont). The two definitions of heterogeneity, cell-level and tumor-level, provided useful independent measures of heterogeneity. Cases where pathologists disagreed over reads in the area of clinical importance (1+ and 2+) had statistically significantly higher levels of tumor-level heterogeneity. Cell-level heterogeneity, reported either as an average or the maximum area of heterogeneity across a slide, depended little on the pathologist’s choice of regions, while tumor-level heterogeneity measurements depended more on that choice. HetMap is a measure of heterogeneity by which pathologists, oncologists, and drug development organizations can view cell-level and tumor-level heterogeneity for a patient for a given marker in the context of an entire patient cohort. Heterogeneity analysis can be used to identify tumors with differing degrees of heterogeneity, or to highlight slides that should be rechecked for QC issues. Tumor heterogeneity plays a significant role in discordant reads between pathologists.
`Laboratory Investigation (2012) 92, 1342–1357; doi:10.1038/labinvest.2012.91; published online 16 July 2012
`
`KEYWORDS: breast cancer; digital pathology; HER2; immunohistochemistry; pathology; tumor heterogeneity
`
The companion diagnostic approach seeks to dictate therapeutic strategy based on a molecular description of a patient’s disease, with drugs targeting the HER2 receptor among the most widely adopted examples. There are well-established guidelines for selecting patients for anti-HER2 adjuvant therapies in breast cancer treatment. However, even with patient selection, many trastuzumab-treated patients do not benefit from therapy, as their disease progresses or becomes recurrent. For example, about 1/3 of breast cancer patients given Herceptin fail to respond (de novo resistance), and about 1/5 of the responsive patients become refractory (acquired resistance).1 The proportion of patients who are not responsive to therapy, even with the inclusion of a companion diagnostic to predict patient response, indicates that the current approaches to treatment strategy and patient selection may not be as robust as possible.
The current HER2 immunohistochemistry (IHC) score methodology does not account for heterogeneity. Since 2007, the American Society of Clinical Oncology and College of American Pathologists (ASCO/CAP) have recommended
`
`1Flagship Biosciences, Westminster, CO, USA; 2Department of Pathology and Lineberger Comprehensive Cancer Center, University of North Carolina Chapel Hill, NC,
`USA and 3Department of Laboratory Medicine and Pathology, University of Minnesota, Minneapolis, MN, USA
`Correspondence: Dr SJ Potts, PhD, Flagship Biosciences, 10955 Westmoor Dr., Suite 400, Westminster, CO 80021, USA.
`E-mail: steve@flagshipbio.com
`
`Received 30 October 2011; revised 24 March 2012; accepted 2 April 2012
`
`1342
`
`Laboratory Investigation | Volume 92 September 2012 | www.laboratoryinvestigation.org
`
`IMMUNOGEN 2165, pg. 1
`Phigenix v. Immunogen
`IPR2014-00676
`
`
`
`Heterogeneity in breast cancer
`SJ Potts et al
`
specific guidelines for HER2 scoring.2 These guidelines call for a consistent process of sample preparation and staining (IHC) or hybridization (FISH), as well as score reporting. The ASCO guidelines also suggest using the terminologies ‘positive’ (3+), ‘equivocal’ (2+), or ‘negative’ (1+ to 0) to define HER2 scoring. According to the immunohistochemical (IHC) scoring methodology, the difference between a 1+ and a 2+ score is a description of ‘faint’ (0/1+) compared to ‘weak-to-moderate’ (2+) membrane staining in more than 10% of the tumor cells. In contrast, a 3+ score is described as uniform intense membrane staining of >30% of tumor cells. Thus, this widely used scoring approach is semiquantitative, as it relies on a threshold percentage of positive cells to determine the score. Importantly, this overall score does not include any additional information about the percentages of tumor cells that score beyond the threshold levels. The ASCO/CAP guidelines for HER2 FISH scoring also rely on a stratified HER2/CEP17 ratio for the score.
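The threshold logic described in this paragraph can be illustrated with a short Python sketch. It is a simplification under stated assumptions, not the guideline text: the function name `her2_category` is hypothetical, the inputs are taken to be percentages of tumor cells already binned as 1+, 2+, or 3+, and counting stronger-staining cells toward the weaker thresholds is an assumption made here for illustration.

```python
def her2_category(pct_1plus, pct_2plus, pct_3plus):
    """Map cell-class percentages to a categorical IHC score.

    Assumption (not from the guideline text): cells of a stronger class
    also count toward the threshold of a weaker class.
    """
    if not 0 <= pct_1plus + pct_2plus + pct_3plus <= 100:
        raise ValueError("percentages must sum to between 0 and 100")
    if pct_3plus > 30:                           # >30% intense staining
        return "3+"
    if pct_2plus + pct_3plus > 10:               # >10% weak-to-moderate or stronger
        return "2+"
    if pct_1plus + pct_2plus + pct_3plus > 10:   # >10% faint or stronger
        return "1+"
    return "0"


# Terminology suggested in the text for each categorical score.
INTERPRETATION = {"3+": "positive", "2+": "equivocal", "1+": "negative", "0": "negative"}

# Two very different cell populations receive the same categorical score,
# which is the loss of information the paragraph points out.
for profile in [(0, 0, 35), (0, 0, 95)]:
    score = her2_category(*profile)
    print(profile, score, INTERPRETATION[score])
```

Both hypothetical profiles above are reported as 3+ even though one contains nearly three times as many intensely stained cells as the other.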
This lack of information about variability within the tumor, or between tumors with the same score, blinds clinicians to a potential readout that could represent a biology responsible for non-effective responses to therapy. It is intuitive that differential cell populations within or between tumors could contribute to clinical refractoriness to therapy and thereby affect patient outcomes. Not all potential factors within individual patients that contribute to a lack of response are known, but cancer biologists have long hypothesized that such disparate populations within the tumor can be selected for outgrowth and emerge as a resistant tumor. This concept of tumor heterogeneity leading to drug resistance was debated as early as the 1950s as the ‘Greenstein Hypothesis’, and has become part of cancer biology doctrine.3 In more recent times, as more targeted therapies are being developed, the issue of tumor heterogeneity has re-emerged as a factor significant to clinical strategy. Thus, there is a need for clinical evaluation of tumor heterogeneity that is aligned with the emerging understanding of cancer biology.
Studies of intratumoral heterogeneity from the same site demonstrate that heterogeneity can affect prognosis in 2+ scored tissues.2,4 Another study found 16% of 3+ score cases exhibiting tumor heterogeneity.5 A recent case6 documented the personal significance of tumor heterogeneity, where a patient with invasive breast carcinoma demonstrated HER2 gene amplification on core biopsy, but relapsed while on adjuvant trastuzumab therapy after mastectomy, dying 15 months after diagnosis. Metastases harvested at autopsy demonstrated no evidence of HER2 gene amplification, but retrospective examination of the carcinoma in the patient’s mastectomy specimen revealed only focal HER2 amplification within the tumor, localized to the region of the prior core biopsy site, highlighting the importance of both adequate sampling and awareness of heterogeneity issues. Another case was noted7 where a patient with breast cancer had areas of the tumor that were 3+ positive and areas that were negative for HER2/neu by IHC, adjacent to each other. These cases represent an underlying biology of tumor heterogeneity, which contributes to the clinical outcome.
The assessment of HER2 protein expression status in breast cancer provides a useful working example of tumor heterogeneity for future biomarker studies. There are substantial biological and clinical implications of intratumor clonal heterogeneity.8,9 This heterogeneity may reside within a single tumor (intratumoral), or between tumors at different sites (intertumoral). Consequently, researchers have attempted to identify the levels of clinically observed heterogeneity in multiple studies of HER2/neu in breast carcinoma, the results of which are summarized in Table 1. Eight different studies of HER2 heterogeneity between primary breast tumor and metastasis demonstrate low discordance rates, between 0 and 13%, with the majority of studies under 5% discordant. Thus, determining the disparity between primary tumor and metastases may not be of high clinical priority. However, one recent study found discordance rates of 14% between core needle and excisional biopsies, suggesting that tumor heterogeneity could contribute to misclassification when needle biopsies are used.10 The ASCO/CAP guidelines define HER2 genetic heterogeneity in FISH testing as >5%, and noted that the incidence of intratumor heterogeneity by this definition ranged in the literature from 5 to 30%.11
Accordingly, the ability to measure tumor heterogeneity may assist clinicians in verifying the predictive value of the HER2 score. It is critically important that the profession begin to develop improved approaches to reporting heterogeneity in samples. In the discipline of stereology, unbiased sampling is obtained by utilizing an entire tissue block, and randomly sampling both the sections and regions within a section to eliminate bias.12 However, a heterogeneity measurement seeks to start with the entire population, and then sample in an unbiased manner to determine a representative variation. In addition, in clinical trials, it is difficult, often nearly impossible, to obtain the blocks required for stereology sampling, so the industry is left with one or several tissue slides as the specimen from which to obtain heterogeneity assessments.
As pathology evolves into a more digital and quantitative discipline, the challenge of quantifying tumor heterogeneity comes more clearly into focus. Whole slide imaging and quantification techniques for the evaluation of IHC biomarkers facilitate an approach for measuring tumor heterogeneity. The ability to distinguish and score individual cells across the whole tissue provides sufficient content to reliably assess the diversity of a biomarker within the sample. Combined with a mathematical approach to describe a measure of variation within the sample, a heterogeneity index can be created. In this report, a novel, functional approach is described that assigns a numerical value to HER2 score diversity within a tumor sample, and thus serves to quantify
Table 1 Studies of tumor heterogeneity in HER2/neu

| Tumor type and stage | Analytical approach and tissue extraction technique | Result |
| --- | --- | --- |
| Evaluation of heterogeneity on different samples from the same tumor site taken at the same time | | |
| 38 Invasive breast carcinomas grade 2 tumors evenly split on FISH gene amplification (4) | Evaluated IHC and FISH on additional slides of the same tumor | For the 2+ cases with no FISH gene amplification, 72% had a 1+ IHC score on at least one additional slide, 22% remained 2+ and 6% had one slide scored 1+. For the 2+ cases with FISH gene amplification, 55% had a 3+ IHC score on at least one additional slide, 40% remained 2+ in IHC, and 5% had a slide scored 1+. The authors noted that significant intratumoral heterogeneity accounts for many breast cancers with 2+ HER2 protein expression |
| 44 Breast carcinomas (all grades) and 5 normal breast tissues (10) | 1-mm TMA cores in triplicate, evaluating seven proteins by IHC: HER2, ER, PR, E-cad, EGFR, p53, and MIB-1 | Intratumoral heterogeneity was seen with ER, PR, HER2, p53, and MIB-1, but not with E-cad and EGFR. Results indicate that core needle biopsies are problematic as indicators of status for the entire tumor |
| 21 Breast cancer tumors (11) | HER2, topoisomerase IIα, c-myc, and cyclinD1 were evaluated with q-PCR in macroscopically and microscopically separate areas of individual tumors | HER2 mRNA expression by q-PCR had much lower levels of heterogeneity than the other genes, with heterogeneity occurring in 36% of amplified cases. C-myc and cyclinD1 exhibited heterogeneity in 100% and 83% of cases, respectively |
| 48 Cases of grade 3 breast cancer were analyzed (5) | Multiple sites of morphologically similar tumor were analyzed with HER2 FISH | Intratumoral heterogeneity for HER2/neu gene amplification was demonstrated in only 5 (16%) of 31 cases, where morphologically similar areas of a single tumor were analyzed |
| Heterogeneity between needle core biopsy and excisional biopsy | | |
| 100 Patients with needle core biopsies and subsequent excisional biopsy samples (12) | HER2 FISH and IHC | The concordance rate between FISH results determined on the needle core biopsy and subsequent excisional biopsy of the same tumor was 86%. Of the 15 patients who received neoadjuvant chemotherapy, 93 and 87% had no change in HER2/neu status as determined by IHC or FISH, respectively, in the excisional biopsy specimen when compared with that determined on the prior core biopsy sample |
| Frequency of heterogeneity in a test population, determined by FISH | | |
| Single institution results of all testing in calendar year 2006, total of 742 consecutive cases of breast carcinoma (13) | Reported on clinical results as run by CAP guidelines | Genotypic heterogeneity, defined as >5% but <50% of the tumor cells demonstrating HER2 gene amplification, was observed in 5% (40/742) of the cases |
| Heterogeneity comparison between DCIS and cancer | | |
| Multiple foci from 23 breast tumors with DCIS only and 20 cases with synchronous DCIS and infiltrating cancer (14) | Multiple foci extracted and microsatellite markers by PCR for allelic losses in individual foci for loci in chromosomes 6q, 9p, 11q, 13q, 16q, 17q, and 17p | Patterns of allelic losses were generally conserved in the synchronous infiltrating tumors, supporting the paradigm that infiltrating tumors are clonally derived from the in situ lesions. However, in 8 (40%) of the 20 DCIS cases with invasive cancer, heterogeneous patterns of allelic loss at one or more loci were observed |
| Concordance between primary site and metastasis | | |
| 44 Breast cancer patients with asynchronous metastasis or recurrence (15) | Samples from primary breast cancers and metastatic lesions were analyzed for p53, ER, PR, and HER2 IHC and HER2 FISH | Discordance rates between primary and secondary tumor were 4.5% for HER2 with IHC, with FISH results consistent with IHC |
`
`
Table 1 Continued

| Tumor type and stage | Analytical approach and tissue extraction technique | Result |
| --- | --- | --- |
| Concordance between primary site and metastasis | | |
| 21 Breast cancer patients with metastasis (16) | Samples from primary breast cancers and metastatic lesions were analyzed with IHC for HER2, p53, ER, and PR | Discordance rate between primary and secondary tumors was 0% for HER2 and p53. Expression levels in breast cancer cells were almost unchanged as the disease progressed, regardless of hormone receptor status |
| 47 Breast cancer patients with metastasis along with literature review of other similar studies (17) | Samples from primary breast cancers and lymph node metastatic lesions were analyzed with IHC and CISH/FISH for HER2 | No cases of drastic changes in HER2 expression between primary and lymph node metastasis were reported. Authors conclude that breast cancer lymph node metastases generally overexpress HER2 to the same extent as the corresponding primary, including distant metastases |
| 58 Breast cancer patients with metastasis (18) | Samples from primary breast cancers and metastatic lesions were analyzed by IHC and FISH for HER2 | Discordance rate between primary and metastatic tumors was 14% (8 of 58 patients), with the majority (7) positive in metastasis and negative in primary. FISH results were concordant with IHC for the data set |
| 789 Breast cancer patients with metastasis (19) | Samples from primary breast cancers and recurrent tumors were analyzed by IHC for ER, PR, and HER2, and by FISH for HER2 | Discordance rate for ER, PR, and HER2 was 18.4%, 40.3%, and 13.6%, respectively. Patients with concordance have significantly better post-recurrence survival than discordant cases |
| 205 Breast cancer cases from 20 institutions with matching primary and recurrent tumor (20) | Samples from primary and recurrent were analyzed for ER, PR, and HER2 by IHC | Discordance rates for ER, PR, and HER2 were 10.2%, 24.8%, and 2.9%, respectively, with no significant difference in locoregional or distant recurrence. The switch in receptor status led to a change in the subsequent treatment plan for 17.5% of the patients |
| 107 Patients with primary breast cancer and at least one distant metastatic lesion (21) | HER2 levels were analyzed by IHC and FISH | Discordance rate of 6% with IHC, with all six discordant cases showing greater HER2 overexpression in the metastatic tumor. By FISH the discordant rate was 5%, and the discordant cases were split between under- and overexpression in the metastatic tumor |
| Breast cancer primary sites and matched metastatic lymph nodes (22) | HER2, p53, bcl-2, topoisomerase IIα, HSP27, and HSP70 were evaluated by IHC | Discordance rates were 2% for HER2, 6% for p53, 15% for bcl-2, 19% for topoisomerase IIα, 24% for HSP27, and 30% for HSP70 |
| Heterogeneity before and after treatment | | |
| 39 Patients with locally advanced breast cancers who received neoadjuvant chemotherapy and 60 breast cancer patients who did not receive neoadjuvant chemotherapy (23) | IHC for HER2 was performed on paraffin sections of the core biopsy before treatment and the excised specimen following chemotherapy | HER2 IHC scores decreased in 28.5% (15/39) of patients receiving neoadjuvant chemotherapy compared to 11.7% (7/60) of patients in the control (P<0.013). HER2/neu IHC status changed from strongly positive to negative (3+ to 0) in 5 of 39 (12.5%) in the study group and in 2 of 60 (3.3%) in the control group (P=0.104) |
`
heterogeneity. This output can be included with other digital pathology-based measurements of IHC biomarkers to provide a more contextual value to the numerical score.
Two definitions are introduced to further assist with describing heterogeneity: cell-level and tumor-level heterogeneity (Figure 1). Cell-level heterogeneity (Hetcell) is the variability of cells within a nest of tumor, and tumor-level heterogeneity (Hettumor) is the variability of nests of cells across an entire tumor. There is only one score per slide for Hettumor, but as each nest or sampled region in a tumor has its own Hetcell score, it is challenging to combine these into a single measure for a given slide. Thus, several approaches are examined to aggregate measures of cell-level heterogeneity across a slide.
`
`
`
`
Figure 1 Definitions of cell-level (above) and tumor-level heterogeneity (below). Slide-level heterogeneity is a sampling substitute for tumor-level heterogeneity. The lower panel also illustrates some contributions of anatomic heterogeneity, as parts of the lesser stained areas are ductal carcinoma in situ (DCIS).

Numerical Indices of Tumor Cell Diversity
Diversity measurement is a well-established field in the ecological sciences, and numerous approaches to quantifying the variability of species have been utilized in this discipline. Ecologists describe diversity in terms of richness and evenness, and areas can be ranked differently depending on the weighting of these concepts. For example, one area might have only two species, each covering half the area. The second area might have six different species, with one dominant species covering 95% of the area, and the other five each only covering 1%. Defined in terms of richness, the second area with six different species would be considered more diverse. Defined in terms of evenness of distribution, the first area would be more diverse as it avoids having one type dominating over all others. Two commonly used diversity indices for measuring plant and animal species diversity are the Shannon index13 and the Simpson index.14 The Shannon index of diversity is defined as:

$$\mathrm{Shannon} = -\sum_{i=1}^{N} p_i \ln p_i$$

where $N$ is the number of biological types and $p_i$ the proportional abundance of the $i$th type. This index, ranging in theory from 0 to infinity, estimates the average uncertainty in predicting to which species type a randomly selected subunit of area belongs. The Simpson index is defined as:

$$\mathrm{Simpson} = 1 - \sum_{i=1}^{N} p_i p_i$$

Producing values from 0 to 1, Simpson’s index defines the probability that two randomly selected equal-sized subunits of terrain belong to different species. A recent evaluation of tumor heterogeneity pioneered the use of both Shannon and Simpson indices in evaluating 8q24 copy number gain in both CD24+ and CD44+ cell populations in ductal carcinoma in situ and invasive regions of tumors.15 Copy numbers at each of three levels were considered as separate ‘species’ and the indices applied to deliver a measure of heterogeneity within each sample. Two distinct tumor subtypes of high and low diversity of 8q24 copy number were identified, as measured by the Shannon index, and the group with lower diversity contained fewer samples of HER2+ tumors. There was no difference between diversity of the luminal A tumors and the normal cells, although basal-like tumors tended to have higher diversity scores. In this study, few qualitative differences were seen between Shannon and Simpson indices, although the data set was small. The Shannon index tends to blur distinctions of species richness and evenness, while the Simpson index can be dominated by the most abundant species in the population.
The disadvantage of both Shannon and Simpson indices is that they do not account for taxonomic distance between species. In the world of clinical anatomic pathology, most cells are binned and scored as one of three or four classes. In HER2 scoring methodology, pathologists (or pathologist-trained computer programs) score cells as populations of either 0+, 1+, 2+, or 3+ intensity. Consider two regions: Region A with ten 0+ cells and ten 3+ cells, and Region B with ten 1+ cells and ten 2+ cells. Clearly, Region A has a higher level of heterogeneity than Region B, but Shannon and Simpson indices would score these as equal heterogeneity. To overcome this problem, an ecological diversity approach known as Rao’s quadratic entropy (QE)16 was used. A distance matrix is incorporated in the diversity index, where, for example, a difference between a 0+ and 3+ cell would be weighted a ‘3’, and a 1+ to 2+ would be weighted a ‘1’. When all weights are the same, the scoring schemes tend to be equivalent to those mentioned previously.
The equation is as follows:

$$\mathrm{QE} = \sum_{i,j=1}^{N} d_{ij}\, p_i\, p_j$$

$$D = \begin{array}{c|cccc}
 & 3+ & 2+ & 1+ & 0+ \\ \hline
3+ & 0 & 1 & 2 & 3 \\
2+ & 1 & 0 & 1 & 2 \\
1+ & 2 & 1 & 0 & 1 \\
0+ & 3 & 2 & 1 & 0
\end{array}$$
`
where $N$ is the total number of species, $p_i$ and $p_j$ are the proportions of the $i$th and $j$th species, respectively, in the sampling unit, and $d_{ij}$ is a member of the symmetric taxonomic distance matrix $D$ ($d_{ij} = d_{ji}$ and $d_{ii} = 0$). The values of $D$ can be adjusted to match differences in classes in a particular biological application, and the matrix shown above was utilized in this study for HER2-expressing cells. As an example of the flexibility of this approach, studies in plant ecology17 have utilized the following matrix, where $d_{ij} = 1$ if both species belong to the same genus, $d_{ij} = 2$ if both species belong to the same family but different genera, $d_{ij} = 3$ if both species belong to the same order but different families, $d_{ij} = 3.5$ if both species belong to the same subclass but different orders, $d_{ij} = 4$ if both species belong to the same class but different subclasses, $d_{ij} = 4.5$ if both species are angiosperms but are from different classes, and finally $d_{ij} = 5$ otherwise.
The differences in diversity index scores are illustrated in Table 2, where several different distributions of samples in a given region are assumed. The QE weighs distances between species and will generally range from 0 (entirely homogeneous population) to 1.5 (split evenly between extremes), although the upper range depends on the distance matrix used. Using a distance matrix also minimizes the effect of minor changes between cells classed into different adjacent categories. Minimizing minor cell classification changes is important, as many researchers have noted the relative nature of IHC, and the difficulties associated with using IHC for quantitative analyses.18 One can further increase the numeric value between two classes in the distance matrix to make changes from one class to an adjacent class far less important than a change across several classes (eg, the distances from 0+ to 1+ to 2+ to 3+ could be changed from 0, 1, 2, 3 to 0, 1, 4, 9, respectively). The appropriate distance matrix should be discussed within the context of a particular protein and the therapeutic goals for prognosis decisions.
In this study, a simple approach to heterogeneity is sought that can be utilized in the clinic as anatomic pathology is practiced today, to ensure immediate assistance to improving clinical trials practice. The constraints include: (1) dealing with a single slide; (2) working in brightfield IHC (there are 16 brightfield FDA clearances for protein expression in tissue and none in fluorescence); (3) utilizing image analysis scoring approaches that have already been cleared for use in clinical practice and are familiar to practicing pathologists; and (4) delivering a scoring system that is easily communicated between pathologists and oncologists. A measure of heterogeneity (HetMap) was developed that incorporates both cell- and tumor-level heterogeneity measures. The approach was evaluated on HER2 IHC-stained breast cancer samples, using 200 specimens across two different laboratories, with three pathologists at each laboratory outlining 10–25 regions of tumor for scoring by automatic image analysis.
Table 2 Example diversity indices and their scores for various hypothetical regions

| 3+ Cells | 2+ Cells | 1+ Cells | 0+ Cells | Shannon | Simpson | QE |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | 0 | 100 | 0.00 | 0.00 | 0.00 |
| 0 | 0 | 1 | 99 | 0.06 | 0.02 | 0.02 |
| 99 | 1 | 0 | 0 | 0.06 | 0.02 | 0.02 |
| 1 | 0 | 0 | 99 | 0.06 | 0.02 | 0.06 |
| 3 | 3 | 3 | 91 | 0.40 | 0.17 | 0.33 |
| 0 | 0 | 9 | 91 | 0.30 | 0.17 | 0.16 |
| 9 | 0 | 0 | 91 | 0.30 | 0.17 | 0.49 |
| 25 | 25 | 25 | 25 | 1.39 | 0.76 | 1.25 |
| 50 | 0 | 0 | 50 | 0.69 | 0.51 | 1.50 |
| 90 | 0 | 0 | 10 | 0.33 | 0.18 | 0.54 |
| 0 | 50 | 50 | 0 | 0.69 | 0.51 | 0.50 |
| 50 | 0 | 50 | 0 | 0.69 | 0.51 | 1.00 |
| 49 | 1 | 49 | 1 | 0.79 | 0.52 | 1.02 |
| 4900 | 100 | 4900 | 100 | 0.79 | 0.52 | 1.02 |

MATERIALS AND METHODS
Two slide sets prepared and scored by two different clinical laboratories were used for this study. One slide set of 100 breast carcinomas was selected with an equal distribution of slides scored from 0+ to 3+, and a second slide set of 100 breast carcinomas was taken from routine operation with a distribution of slides representative of the target population. The tissues were formalin-fixed, paraffin-embedded breast tissue specimens immunohistochemically stained using the Dako in vitro diagnostic FDA-approved HercepTest (Dako, Carpinteria, CA, USA). All slides were scanned on an Aperio ScanScope, and three board-certified pathologists for each slide set manually drew between 10 and 20 regions of interest of tumor on the slides using the Aperio ImageScope interface. The pathologists were asked to draw representative regions of tumor on each slide for routine HER2 scoring using automatic image analysis. A total of 8549 tumor regions on 100 slides were drawn electronically by three pathologists on the first slide set and 6002 tumor regions on 100 slides were drawn by three other pathologists on the second slide set. The Aperio HER2 membrane algorithm was adjusted and run on these regions of interest to identify cells and classify them as 0+, 1+, 2+, or 3+ staining (http://www.aperio.com). The algorithm was adjusted by consensus of the pathologists before the study on control slides, and then used with a fixed parameter set for each of the slide sets. HER2 IHC protein expression status was classified per cell following ASCO/CAP guidelines. Data from the automated image analysis scoring of cells for each region were exported to tab-delimited file format and then analyzed with R.19
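The three diversity indices compared in Table 2 can be sketched in a few lines of Python. This is an illustrative sketch only (the study itself used R), and the function names are hypothetical. It assumes that the QE sum runs over all ordered pairs of classes, which is the reading that gives 1.5 for an even split between 3+ and 0+ cells, and it uses the Simpson formula as written in the text. The tabulated Simpson values are slightly higher in some rows (0.51 rather than 0.50 for an even two-way split), which may reflect a finite-sample form of the index; that is a guess, and no such correction is applied here.

```python
import math

# Distance matrix D for HER2 classes, ordered 3+, 2+, 1+, 0+.
D = [
    [0, 1, 2, 3],
    [1, 0, 1, 2],
    [2, 1, 0, 1],
    [3, 2, 1, 0],
]


def proportions(counts):
    """Convert per-class cell counts into proportions."""
    total = sum(counts)
    if total == 0:
        raise ValueError("region contains no cells")
    return [c / total for c in counts]


def shannon(counts):
    """Shannon index: -sum(p_i * ln p_i), skipping empty classes."""
    return -sum(p * math.log(p) for p in proportions(counts) if p > 0)


def simpson(counts):
    """Simpson index as written in the text: 1 - sum(p_i * p_i)."""
    return 1.0 - sum(p * p for p in proportions(counts))


def rao_qe(counts, dist=D):
    """Rao's quadratic entropy: sum over all ordered pairs of d_ij * p_i * p_j."""
    p = proportions(counts)
    n = len(p)
    return sum(dist[i][j] * p[i] * p[j] for i in range(n) for j in range(n))


# Regions A and B from the text, as counts of (3+, 2+, 1+, 0+) cells.
region_a = [10, 0, 0, 10]   # ten 3+ cells and ten 0+ cells
region_b = [0, 10, 10, 0]   # ten 2+ cells and ten 1+ cells

for name, counts in (("A", region_a), ("B", region_b)):
    print(name, round(shannon(counts), 2), round(simpson(counts), 2), round(rao_qe(counts), 2))
```

Under these assumptions, Regions A and B receive identical Shannon and Simpson values, while QE separates them (1.5 versus 0.5). Passing a different `dist` matrix, such as one built from squared class differences, changes how strongly jumps across several classes are weighted.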
`
`
0.3; 9%: 0.4; and 10%: 0.5 = critical threshold for next higher score).
The two measurement components of the HER2cont score complement each other as the HER2 score moves from one score to the next higher score. Because an HER2 score is assigned to each individual cell, histogram data can be generated, which tally the quantum (0, 1+, 2+, 3+) scores for each region. The HER2cont scoring approach examines the components of this histogram and assigns a threshold value based on a weighted mean of the data values of this histogram. Thus, this score fully represents the two components of data captured in a histogram: (1) the percent of total cells with each quantum score; and (2) consideration of the population profile of these cells. For example, a classic HER2 score of 2+ can be represented by an HER2cont score ranging from 1.5 to 2.4. The specific value of the HER2cont score consists, in part, of the sum of the quantum score multiplied by the number of cells given to each score, and divided by the number of cells. However, the percentages of these cells that contributed most to this mean value are weighted to determine a threshold value of the score. To understand how this approach works, we will use an example of a region that contained 10 cells, of which 4 cells had a score of 3+, 3 cells had a score of 2+, and the remaining 3 cells had a score of 1+. Intuitively, we could deduce the mean of this score as (3+3+3+3+2+2+2+1+1+1)/10 = 21/10 = 2.1. However, in actuality, the percentages of cells that contributed to the HER2cont score are weighted to yield the final score. To understand how this weighting process contributes to the final HER2cont score, we can examine the components of a score of 1.5, right at the threshold: if back-calculated, we would find that there are exactly 10% of 2+ or 3+ cells. In contrast, a score of 1.9 must result from either 82% of 2+ or 3+ cells, or 4% of 3+ cells. Finally, a score of 2.4 must result from 9% of 3+ cells, moving the score very close to the next higher 3+ score. Thus, the HER2cont score better captures the effects of intra-regional variability, which contribute to the overall profile of cells within a given summary HER2 score for that region.
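The unweighted part of the worked example above, the mean of the per-cell quantum scores, can be checked with a small Python sketch. The function names and the dictionary layout of the histogram are assumptions made for illustration. The threshold weighting that turns this mean into the final HER2cont score is not fully specified in this section, so it is deliberately not implemented here.

```python
def mean_quantum_score(histogram):
    """Mean of per-cell quantum scores.

    `histogram` maps a quantum score (0, 1, 2, or 3) to the number of
    cells in the region that received that score.
    """
    total = sum(histogram.values())
    if total == 0:
        raise ValueError("region contains no cells")
    return sum(score * n for score, n in histogram.items()) / total


def fraction_at_or_above(histogram, threshold):
    """Fraction of cells whose quantum score is at least `threshold`."""
    total = sum(histogram.values())
    if total == 0:
        raise ValueError("region contains no cells")
    return sum(n for score, n in histogram.items() if score >= threshold) / total


# Region from the text: 4 cells at 3+, 3 cells at 2+, 3 cells at 1+.
region = {3: 4, 2: 3, 1: 3, 0: 0}

print(mean_quantum_score(region))        # 2.1
print(fraction_at_or_above(region, 2))   # 0.7
```

The second helper reports the share of 2+ or 3+ cells, which is the kind of percentage the text says is weighted when the final score is assigned.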
The HER2cont score was designed to address a deficiency in the classic HER2 scoring approach, which has been in part addressed by the new HER2 scoring according to the ASCO/CAP guidelines, using the 30% of 3+ cells threshold for a 3+ score. The newer ASCO/CAP threshold approach to defining a 3+ score is based on a more stringent requirement for HER2 positivity, based on the now widely held understanding that the cells with the most HER2 expression are most responsive to trastuzumab. To understand how this decision was critical to predicting trastuzumab sensitivity, we can use the demonstration of how a classic H-scoring approach can result in an identical score for two dissimilar population profiles. For example, if a tumor had 30% of the cells being 3+ with the remaining cells being 0+, the
`H-score would be 9