PERSPECTIVE

A Consumer’s Guide to Subgroup Analyses
Andrew D. Oxman, MD, and Gordon H. Guyatt, MD

The extent to which a clinician should believe and act on the results of subgroup analyses of data from randomized trials or meta-analyses is controversial. Guidelines are provided in this paper for making these decisions. The strength of inference regarding a proposed difference in treatment effect among subgroups is dependent on the magnitude of the difference, the statistical significance of the difference, whether the hypothesis preceded or followed the analysis, whether the subgroup analysis was one of a small number of hypotheses tested, whether the difference was suggested by comparisons within or between studies, the consistency of the difference, and the existence of indirect evidence that supports the difference. Application of these guidelines will assist clinicians in making decisions regarding whether to base a treatment decision on overall results or on the results of a subgroup analysis.

Annals of Internal Medicine. 1992;116:78-84.

From McMaster University Health Sciences Centre, Hamilton, Ontario. For current author addresses, see end of text.

© 1992 American College of Physicians

Biogen Exhibit 2069, Mylan v. Biogen, IPR 2018-01403

Clinicians faced with a treatment decision about a particular patient are interested in the evidence that pertains most directly to that individual. Thus, it is frequently of interest to examine a particular category of participants in a clinical trial: for example, the women, those in a certain age group, or those with a specific pattern of disease. In observational studies, these examinations, or subgroup analyses, are routine. They are also frequently encountered in reports of clinical trials. In a survey of 45 clinical trials reported in three leading medical journals, Pocock and colleagues (1) found at least one subgroup analysis that compared the response to treatment in different categories of patients in 51% of the reports.

The results of subgroup analyses have had major effects, sometimes harmful, on treatment recommendations. For example, many patients with suspected myocardial infarction who could have benefited from thrombolytic therapy may not have received this treatment as a result of subgroup analyses based on the duration of symptoms before treatment (2) and the conclusion that streptokinase was only effective in patients treated within 6 hours after the onset of pain (3, 4). A later, larger trial showed that streptokinase was effective up to 24 hours after the onset of symptoms (5).

Conclusions based on subgroup analyses can have adverse consequences both when a particular category of patients is denied effective treatment (a “false-negative” conclusion), as in the above example, and when ineffective or even harmful treatment is given to a subgroup of patients (a “false-positive” conclusion). Because of these risks and their frequency, the appropriateness of drawing conclusions from subgroup analyses has been challenged (6, 7), and it has been argued that treatment recommendations based on subgroup analyses may do more harm than good. This hypothesis is currently being tested empirically by comparing treatment recommendations generated from early trials of new treatments based on subgroup analyses with treatment recommendations that would have been made had subgroup analyses been ignored, assessing “whether they lead to more patients receiving treatments that are worthwhile and fewer patients receiving treatments that are not.” (Sackett DL. Personal communication.)

Although we agree that subgroup analyses are potentially misleading and that there is a tendency to overemphasize the results of subgroup analyses, in this paper we will present an alternative point of view. The essence of our argument is that subgroup analysis is both informative and potentially misleading. Rather than arguing for or against the merits of subgroup analysis, we will present guidelines in this article for deciding how believable the results of subgroup analyses are and, consequently, when to act on recommendations based on subgroup analyses and when to ignore them.

Our discussion will focus on randomized trials and meta-analyses of randomized trials (systematic overviews), although the same principles apply to any other research design. The assumption from which we start in this discussion is that the underlying design of the studies being examined is sound. For treatment trials, sound design involves elements of randomization, masking, completeness of follow-up, and other strategies for minimizing both random error and bias (8, 9). If the study is not sound, the overall conclusion is suspect, let alone conclusions based on subgroup analyses.

Even given a rigorous study design, the extent to which subgroup analyses should be done, or believed, is highly controversial. Although there are those who ignore scientific principles in the subgroup analyses they undertake and report, go on fishing expeditions, and indulge in data-dredging exercises (10, 11), there are also those who mix apples and oranges, drown in the data they pool (12), reach meaningless conclusions about “average” effects (13), and fail to detect clinically important effects because of the heterogeneity of their study groups (14). Although the debate between these two camps is entertaining and can lead to some useful insights, practical advice for assessing the strength of inferences based on subgroup analyses is also important. In providing such advice, we will build on criteria that have been suggested by other authors (15-18).

Our criteria are summarized in Table 1 and are described in detail below. An example of a hypothesized difference in subgroup response and the extent to which it meets our proposed criteria is given in Table 2. We will use this example in the text to highlight some of the relevant issues. It should be noted from the outset that our criteria, like any guidelines for making an inference, do not provide hard and fast rules; they simply represent an organized approach to making reasonable judgments.

Table 1. Guidelines for Deciding whether Apparent Differences in Subgroup Response Are Real

1. Is the magnitude of the difference clinically important?
2. Was the difference statistically significant?
3. Did the hypothesis precede rather than follow the analysis?
4. Was the subgroup analysis one of a small number of hypotheses tested?
5. Was the difference suggested by comparisons within rather than between studies?
6. Was the difference consistent across studies?
7. Is there indirect evidence that supports the hypothesized difference?

Guidelines for Deciding whether Apparent Differences in Subgroup Response Are Real

Conceptual Approach Underlying the Guidelines

Subgroup analyses of data from randomized trials or meta-analyses are undertaken to identify “effect modifiers,” characteristics of the patients or treatment that modify the effect of the intervention under study. Statistical “interactions” in a set of data are measured to estimate effect modification (an epidemiologic concept) in the population represented by the study sample (19). The term interaction is sometimes (but not in this paper) also used to refer to the concept of synergism or antagonism, a biologic mechanism of action in which the combined effect of two or more factors differs from the sum of their solitary effects (20). In the following discussion, we use the term “interaction” to refer to situations in which the observed effectiveness of an intervention differs across subgroups.

The premise underlying the hypothesis that subgroup analyses do more harm than good is that “unanticipated qualitative interactions” are unusual and, when apparent unanticipated interactions are discovered, they are usually artifacts due to chance. The same position can be taken with respect to apparent differences between treatment effects in drugs of a single class: this would suggest that the best estimate of the effect of any one drug is the overall effect of the group of drugs across all methodologically adequate, randomized, controlled trials (21). There is confusion, however, over the fundamental distinction between a “qualitative interaction” and a “quantitative interaction” (22). Although a strict definition of a qualitative interaction would mean that there is a sign reversal (22) (meaning that the treatment is beneficial in one group and harmful in another), it is also used to refer to a substantial quantitative interaction (that is, a difference in the magnitude of effect that is clinically important). From a clinical point of view, it is important to recognize that a substantial quantitative interaction can be as important as a qualitative interaction. For instance, the side effects of a treatment may be such that it is worth administering to patients in whom the magnitude of the treatment effect is large, but not to patients in whom the treatment effect is small or moderate.

Having said this, it is still reasonable to distinguish between interactions that are clinically trivial and those that are clinically important. The former can be ignored, and that is the point at which our guidelines begin.

Once the clinician has decided that an interaction, if real, would be important, the subsequent six criteria can be used to help decide on the credibility of the proposed subgroup difference. Three of the criteria (2 to 4) are markers of the potential for random error (that is, mistakes due to chance); one (criterion 5) is a marker of the potential for systematic errors; and the last two address the consistency of the evidence (criterion 6) and its biologic plausibility (criterion 7).

Table 2. An Example of a Hypothesized Difference in Subgroup Response: Digoxin Is More Effective in Patients with More Severe Heart Failure

Criterion: Result
1. Magnitude of the difference: Clinically important differentiation between responders and nonresponders.
2. Statistical significance: Yes. P values were less than 0.01 in both studies.
3. A priori hypothesis: Yes, the hypothesis was suggested by results of one study and tested in a second study.
4. Small number of hypotheses: If viewed as severity of heart failure, yes. If viewed as components (for example, heart size, third heart sound, ejection fraction), no.
5. Within-study comparisons: Yes. In two crossover trials, comparisons were within studies.
6. Consistency across studies: Yes, in two studies tested. However, it was not tested in other trials, and this is necessary for confirmation.
7. Indirect evidence: Yes. Biologically plausible that clinically important response is restricted to those with more severe heart failure.

1 January 1992 · Annals of Internal Medicine · Volume 116 · Number 1

The Guidelines

1. Is the Magnitude of the Difference Clinically Important?

Given the extent of biologic variability, it would be surprising not to find interactions between treatment effects and various other factors. Differences in the effect of treatment are likely to be associated with differences in patient characteristics, differences in the administration of the treatment (such as different surgeons or different drug doses), and differences in the primary end point. However, it is only when these differences or interactions are practically important (that is, when they are large enough that they would lead to different clinical decisions for different subgroups) that there is any point in considering them further.

As a rule, the larger the difference between the effect in a particular subgroup (or with a particular drug or dosage of drug) and the overall effect, the more plausible it is that the difference is real. At the same time, as the difference in effect size between the anomalous subgroup and the remainder of the patients becomes larger, the clinical importance of the difference increases.

Unfortunately, if the results of subgroup analysis are only reported for the subgroups within which sizable treatment differences are found, the estimates of the magnitude of the interaction will be biased because only the extreme estimates are reported (23). This is analogous to regression to the mean (the tendency for extreme findings, such as unusually high blood pressure values, to revert toward less extreme values on repeated examination) (24). Moreover, when the overall treatment effect is modest, there is a good chance of finding a “qualitative” interaction even when only two subgroups are examined (17).

When they report the results of subgroup analyses, authors should make clear to readers how many comparisons were made and how it was decided which ones to report. Given current publication practices, however, were the reader simply to conclude that a reported interaction is real just because it is large, he or she would be wrong more often than right. Thus, having determined that an interaction, if real, is large enough to be important, it is essential to consider other criteria.

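The bias from reporting only the most extreme subgroup, discussed under guideline 1, can be illustrated with a small simulation. This is a minimal sketch, not an analysis from the article: the number of subgroups, the standard error, the seed, and the function name `simulate_selective_reporting` are all assumptions chosen for illustration. Every subgroup has the same true effect, yet the estimate that would be singled out for reporting is systematically larger.

```python
import random
import statistics


def simulate_selective_reporting(n_subgroups=10, true_effect=0.0, se=1.0,
                                 n_trials=2000, seed=0):
    """Return (mean of all subgroup estimates, mean of the largest estimate per trial).

    Every subgroup shares the same true effect, so any spread between
    subgroup estimates is sampling error only (an illustrative assumption).
    """
    rng = random.Random(seed)
    all_estimates = []
    reported = []
    for _ in range(n_trials):
        # One simulated trial: a noisy estimate of the same effect in each subgroup.
        estimates = [rng.gauss(true_effect, se) for _ in range(n_subgroups)]
        all_estimates.extend(estimates)
        # Selective reporting: only the most extreme subgroup is written up.
        reported.append(max(estimates))
    return statistics.mean(all_estimates), statistics.mean(reported)


mean_all, mean_reported = simulate_selective_reporting()
print(f"average of all subgroup estimates: {mean_all:.2f}")
print(f"average of the estimate that gets reported: {mean_reported:.2f}")
```

The average over all subgroups stays close to the true effect, while the average of the selected subgroup does not, which is the same mechanism as regression to the mean.
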
2. Was the Difference Statistically Significant?

Any large data set has, imbedded within it, a certain number of apparent, but in fact spurious, interactions. Statistical tests of significance can be used to assess the likelihood that a given interaction might have arisen due to chance alone. For example, Yusuf and colleagues (25), in an overview of randomized trials of beta blocker treatment for myocardial infarction, compared agents with and without intrinsic sympathomimetic activity (ISA) and found that the agents without ISA seemed to produce a larger effect than the ones with it. This difference was significant at the 0.01 level, indicating that it was unlikely to have occurred due to chance alone. Yet, two subsequent trials, one of an agent with ISA and one of an agent without ISA, showed the opposite result and, when added to the overview, eliminated the statistical significance of the interaction (26). There are several possible explanations for this, including chance. In other words, although events that occur one out of a hundred times might be considered rare, they do occur. Of course, the lower a P value is, the less likely it is that an observed interaction can be explained by chance alone.

Conversely, just as it is possible to observe spurious interactions, chance is likely to lead to some studies (among a large group) in which even a real interaction is not apparent. This is particularly true if the studies are small and the clinical end points of interest are infrequent. In this case, the power to detect an interaction would be low. Because subgroup analyses always include fewer patients than does the overall analysis, they carry a greater risk for making a type II error (falsely concluding that there is no difference).

Statistical techniques for conducting subgroup analysis include the Breslow-Day technique and regression approaches (27). With the Breslow-Day technique and similar approaches (28), it is possible to use a test for homogeneity to estimate the probability that an observed interaction might have arisen due to chance alone. More commonly, authors simply conduct a number of comparisons for different subgroups and apply chi-square tests or t-tests without formally testing for interactions.

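The difference between testing each subgroup separately and formally testing the interaction can be sketched as follows. This is not the Breslow-Day procedure named above; it is a simpler z-test on the difference between two subgroup log odds ratios, swapped in because it needs only the standard library. The event counts are hypothetical, and the function names are illustrative.

```python
import math


def log_odds_ratio(events_t, n_t, events_c, n_c):
    """Log odds ratio (treatment vs control) and its standard error.

    Assumes all four cells of the 2x2 table are nonzero.
    """
    a, b = events_t, n_t - events_t
    c, d = events_c, n_c - events_c
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se


def interaction_test(subgroup_1, subgroup_2):
    """z statistic and two-sided P value for the difference in log odds ratios."""
    lor1, se1 = log_odds_ratio(*subgroup_1)
    lor2, se2 = log_odds_ratio(*subgroup_2)
    z = (lor1 - lor2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p


# Hypothetical counts: (events on treatment, n on treatment, events on control, n on control)
subgroup_a = (30, 200, 50, 200)
subgroup_b = (45, 200, 50, 200)

z, p = interaction_test(subgroup_a, subgroup_b)
print(f"interaction z = {z:.2f}, two-sided P = {p:.3f}")
```

With these made-up counts the treatment effect looks clearly larger in the first subgroup, yet the formal interaction test does not reach the 0.05 level. Comparing the two subgroup-specific P values would suggest a difference that the data do not actually support.
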
This practice, together with only reporting subgroups within which sizable treatment differences are found, can lead to an overestimate of the significance as well as the size of the difference. One way of adjusting for this bias is to use Bayes or empiric Bayes methods, which shrink the extreme estimates toward the overall estimate of treatment effect (23, 29, 30). Both a point estimate of the magnitude of the difference and a confidence interval can be obtained using these approaches.
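
The idea of shrinking extreme subgroup estimates toward the overall estimate can be sketched in a few lines. This is a simplified empirical Bayes style calculation under stated assumptions, not the specific methods of references 23, 29, and 30: the between-subgroup variance is estimated by the method of moments, estimates are shrunk toward the precision-weighted mean, and the function name and inputs are hypothetical.

```python
def shrink_subgroup_estimates(estimates, variances):
    """Shrink subgroup effect estimates toward their precision-weighted mean.

    estimates: observed subgroup effects (for example, log odds ratios)
    variances: their sampling variances (squared standard errors)
    Returns (overall estimate, between-subgroup variance, shrunken estimates).
    """
    k = len(estimates)
    if k < 2:
        raise ValueError("need at least two subgroups")
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    overall = sum(w * y for w, y in zip(weights, estimates)) / total_w
    # Q measures the observed spread relative to what sampling error alone predicts.
    q = sum(w * (y - overall) ** 2 for w, y in zip(weights, estimates))
    c = total_w - sum(w * w for w in weights) / total_w
    # Method-of-moments estimate of true between-subgroup variance, floored at zero.
    tau2 = max(0.0, (q - (k - 1)) / c)
    shrunk = []
    for y, v in zip(estimates, variances):
        keep = tau2 / (tau2 + v)  # fraction of the deviation that is retained
        shrunk.append(overall + keep * (y - overall))
    return overall, tau2, shrunk


# Hypothetical subgroup log odds ratios with equal sampling variances.
observed = [-0.9, -0.3, -0.25, -0.2, 0.3]
overall, tau2, shrunk = shrink_subgroup_estimates(observed, [0.04] * 5)
for y, s in zip(observed, shrunk):
    print(f"observed {y:+.2f} -> shrunken {s:+.2f}")
```

When the spread between subgroups is no larger than sampling error would produce, the estimated between-subgroup variance is zero and every subgroup is pulled all the way back to the overall estimate.
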
Regression models, such as logistic regression (28), can also be used for analysis of interactions if the interactions are modeled by product terms. This approach allows for testing the significance of an interaction while controlling for other factors. If there are many subgroup factors, however, the number of product terms necessary for an adequate modeling of the interactions may be greater than the number of observations; an analysis of the interactions is then impossible. An additional problem with this approach is deciding which of many possible interaction terms to enter into the model as well as the potential for bias in their selection.

Methods for selecting factors to include have been proposed (31) in addition to other approaches to subgroup analysis (15, 18, 23, 27). Although it is not important for clinical readers to understand the details of these approaches, it is important to understand the concepts of statistical significance and power in subgroup analysis. Statistical analysis is a useful tool for assessing whether an observed interaction might have been due to chance alone, but it is not a substitute for clinical judgment.

3. Did the Hypothesis Precede Rather than Follow the Analysis?

Surveying patterns of data that suggest possible interactions may, in fact, prompt the analysis that “confirms” the existence of a possible interaction. As a result, the credibility of any apparent interaction that arises out of post-hoc exploration of a data set is questionable.

An example of this was the apparent finding that aspirin had a beneficial effect in preventing stroke in men with cerebrovascular disease but not in women (32). This interaction, which was “discovered” in the first large trial of aspirin in patients with transient ischemic attacks, was subsequently found, in other studies and in a meta-analysis summarizing these studies (33), to be spurious. This finding, like the streptokinase example, is an example of a “false negative” subgroup analysis. In this instance, many physicians withheld aspirin for women with cerebrovascular disease for a considerable period.

Whether a hypothesis preceded analysis of a data set is not necessarily a black or white issue. At one extreme, unexpected results might be clearly responsible for generating a new hypothesis. At the other extreme, a subgroup analysis might be clearly planned for in a study protocol to test a hypothesis suggested by previous research. Between these two extremes lies a range of possibilities, and the extent to which a hypothesis arose before, during, or after the data were collected and analyzed is frequently not clear. For example, if data monitoring detects a seeming interaction in a long-term study, it may be possible to state the hypothesis and then test it in future analyses (34). This technique may be most appropriate if additional study patients are still to be accrued.

Although post-hoc analyses will sometimes yield plausible results, they should generally be viewed as hypothesis-generating exercises rather than as hypothesis testing. Decisions about which analyses to do and which ones to report are much more likely to be data driven with post-hoc analyses and thereby more likely to be spurious. On the other hand, when a hypothesis has been clearly and unequivocally suggested by a different data set, it moves from a hypothesis-generating toward a hypothesis-testing framework. In Bayesian terms, the higher prior probability increases the posterior probability (after the subgroup analysis) of an interaction being real (29, 30).

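The Bayesian statement above can be made concrete with a short calculation. This is a sketch under assumed numbers, not figures from the article: the prior probabilities, the power of 0.5, and the significance level of 0.05 are illustrative, and `posterior_probability_real` is a hypothetical helper that applies Bayes' rule to a "significant" subgroup result.

```python
def posterior_probability_real(prior, power=0.5, alpha=0.05):
    """Probability that an interaction is real, given a significant subgroup test.

    prior: probability, before the analysis, that the interaction is real
    power: chance of a significant result if the interaction is real (assumed)
    alpha: chance of a significant result if it is not (assumed)
    """
    true_positive = prior * power
    false_positive = (1.0 - prior) * alpha
    return true_positive / (true_positive + false_positive)


scenarios = [
    ("post-hoc finding, low prior", 0.05),
    ("hypothesis suggested by an earlier data set", 0.50),
]
for label, prior in scenarios:
    posterior = posterior_probability_real(prior)
    print(f"{label}: prior {prior:.2f} -> posterior {posterior:.2f}")
```

Under these assumptions the same significant result leaves a post-hoc finding with a posterior probability of roughly 0.34, compared with roughly 0.91 for a hypothesis that already had even odds before the analysis.
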
If a hypothesis about an interaction has arisen from exploration of a data set from a study, then an argument can be made for excluding that study from a meta-analysis in which the hypothesis is tested. Certainly, if the hypothesis is confirmed in a meta-analysis that excludes data from the study that originally suggested the interaction, the inference rests on stronger ground. If the statistical significance of the interaction disappears or is substantially weakened when data from the original study are excluded, the strength of inference is reduced.

When considering post-hoc analyses, it should be kept in mind that they are more susceptible to bias as well as to spurious results. The reader should be particularly cautious about analysis of subgroups of patients that are delineated by variables measured after baseline, even if the hypothesis preceded the analysis. If the treatment can influence whether a participant becomes a member of a particular subgroup, the conclusions of the analysis are open to bias. For instance, one might hypothesize that compliers will do better if they are in the treatment group than in the control group but that noncompliers will do equally well in both groups. The reasons for compliance and noncompliance, however, probably differ in the treatment and control groups. As a result, in this comparison, the advantages of randomization (and with it, the validity of the analysis) are lost.

An example of the evolution of a hypothesis concerning responsive subgroups comes from the investigation of the efficacy of digoxin in preventing clinically important exacerbations of heart failure in heart-failure patients in sinus rhythm (see Table 2). Lee and colleagues (35) conducted a crossover study in which they found the drug to be effective. They did a regression analysis that suggested that only one factor, the presence of a third heart sound, predicted who would benefit from the drug. Only patients with a third heart sound were better off while taking digoxin. The hypothesis that this might be one of the predictors appears to have preceded the study. Nevertheless, on the basis of the foregoing discussion, the investigators were perhaps too ready to conclude that digoxin use in heart-failure patients in sinus rhythm should be restricted to those with a third heart sound.

4. Was the Subgroup Analysis One of a Small Number of Hypotheses Tested?

Post-hoc hypotheses based on subgroup analysis often arise from exploration of a data set in which many such hypotheses are considered. The greater the number of hypotheses tested, the greater the number of interactions that will be discovered by chance. Even if investigators have clearly specified their hypotheses in advance, the strength of inference associated with the apparent confirmation of any single hypothesis will decrease if it is one of a large number that have been tested. In their regression analysis, Lee and colleagues (35) included 16 variables. This relatively large number increases the level of skepticism with which the presence of a third heart sound as an important predictor of response to digoxin should be viewed.

Unfortunately, as noted above, the reader may not always be sure about the number of possible interactions that were tested. If the investigators chose to withhold this information, despite admonitions not to do so, and reported only those that were “significant,” the reader is likely to be misled.

The Beta-Blocker Heart Attack Trial (BHAT) randomized approximately 4000 patients to propranolol or placebo after a myocardial infarction (36). Subsequently, 146 subgroup comparisons were done (37). Although the estimated effects of the treatment clustered around the overall effect, the treatment appeared to be either much more effective or ineffective in some small subgroups. The overall pattern, which approximated a “normal” distribution, would suggest that most of the observed difference in effect among the various subgroups was due to sampling error rather than to true interactions.

Another way to consider this is in terms of the effect of multiple comparisons on P values. The more hypotheses that are tested, the more likely it is to make a type I error, that is, to reject one of the null hypotheses even if all are actually true. Assuming that no true differences exist, if 100 different comparisons are made, five can be expected to yield a P value of 0.05 or less by chance alone. In this situation, a more appropriate analysis would account for the number of subgroups, their relation to other subgroups, and the size of the effect within subgroups and overall (23).

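The arithmetic behind the multiple-comparisons argument can be checked directly. The sketch below assumes that the comparisons are independent, which subgroup analyses of a single trial rarely are, so the probabilities are only indicative. The Bonferroni threshold is included as one common adjustment; it is not the analysis the article refers to in reference 23. All function names are illustrative.

```python
def expected_false_positives(n_comparisons, alpha=0.05):
    """Expected number of 'significant' results when no true differences exist."""
    return n_comparisons * alpha


def prob_at_least_one_false_positive(n_comparisons, alpha=0.05):
    """Chance of one or more false positives, assuming independent comparisons."""
    return 1.0 - (1.0 - alpha) ** n_comparisons


def bonferroni_threshold(n_comparisons, alpha=0.05):
    """Per-comparison P value threshold that caps the overall error rate at alpha."""
    return alpha / n_comparisons


# 16 and 146 are the numbers of variables and comparisons mentioned in the text.
for n in (1, 16, 100, 146):
    print(
        f"{n:>3} comparisons: expect {expected_false_positives(n):.1f} false positives, "
        f"P(at least one) = {prob_at_least_one_false_positive(n):.3f}, "
        f"Bonferroni threshold = {bonferroni_threshold(n):.5f}"
    )
```

For 100 comparisons this reproduces the expectation of five chance findings stated in the text, and under the independence assumption at least one such finding is almost certain.
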
5. Was the Difference Suggested by Comparisons within Rather than between Studies?

Making inferences about different effect sizes in different groups on the basis of between-study differences entails a high risk compared with inferences made on the basis of within-study differences. For instance, one would be reluctant to conclude that propranolol results in a different magnitude of risk reduction for death after myocardial infarction than does metoprolol on the basis