PERSPECTIVE

A Consumer's Guide to Subgroup Analyses

Andrew D. Oxman, MD, and Gordon H. Guyatt, MD

The extent to which a clinician should believe and act on the results of subgroup analyses of data from randomized trials or meta-analyses is controversial. Guidelines are provided in this paper for making these decisions. The strength of inference regarding a proposed difference in treatment effect among subgroups depends on the magnitude of the difference, the statistical significance of the difference, whether the hypothesis preceded or followed the analysis, whether the subgroup analysis was one of a small number of hypotheses tested, whether the difference was suggested by comparisons within or between studies, the consistency of the difference, and the existence of indirect evidence that supports the difference. Application of these guidelines will assist clinicians in deciding whether to base a treatment decision on overall results or on the results of a subgroup analysis.

Annals of Internal Medicine. 1992;116:78-84.

From McMaster University Health Sciences Centre, Hamilton, Ontario. For current author addresses, see end of text.

Clinicians faced with a treatment decision about a particular patient are interested in the evidence that pertains most directly to that individual. Thus, it is frequently of interest to examine a particular category of participants in a clinical trial: for example, the women, those in a certain age group, or those with a specific pattern of disease. In observational studies, these examinations, or subgroup analyses, are routine. They are also frequently encountered in reports of clinical trials. In a survey of 45 clinical trials reported in three leading medical journals, Pocock and colleagues (1) found at least one subgroup analysis that compared the response to treatment in different categories of patients in 51% of the reports.

The results of subgroup analyses have had major effects, sometimes harmful, on treatment recommendations. For example, many patients with suspected myocardial infarction who could have benefited from thrombolytic therapy may not have received this treatment as a result of subgroup analyses based on the duration of symptoms before treatment (2) and the conclusion that streptokinase was only effective in patients treated within 6 hours after the onset of pain (3, 4). A later, larger trial showed that streptokinase was effective up to 24 hours after the onset of symptoms (5).

Conclusions based on subgroup analyses can have adverse consequences both when a particular category of patients is denied effective treatment (a "false-negative" conclusion), as in the above example, and when ineffective or even harmful treatment is given to a subgroup of patients (a "false-positive" conclusion). Because of these risks and their frequency, the appropriateness of drawing conclusions from subgroup analyses has been challenged (6, 7), and it has been argued that treatment recommendations based on subgroup analyses may do more harm than good. This hypothesis is currently being tested empirically by comparing treatment recommendations generated from early trials of new treatments based on subgroup analyses with treatment recommendations that would have been made had subgroup analyses been ignored, assessing "whether they lead to more patients receiving treatments that are worthwhile and fewer patients receiving treatments that are not" (Sackett DL. Personal communication.)

Although we agree that subgroup analyses are potentially misleading and that there is a tendency to overemphasize their results, in this paper we will present an alternative point of view. The essence of our argument is that subgroup analysis is both informative and potentially misleading. Rather than arguing for or against the merits of subgroup analysis, we will present guidelines for deciding how believable the results of subgroup analyses are and, consequently, when to act on recommendations based on subgroup analyses and when to ignore them. Our discussion will focus on randomized trials and meta-analyses of randomized trials (systematic overviews), although the same principles apply to any other research design. The assumption from which we start is that the underlying design of the studies being examined is sound. For treatment trials, sound design involves elements of randomization, masking, completeness of follow-up, and other strategies for minimizing both random error and bias (8, 9). If the study is not sound, the overall conclusion is suspect, let alone conclusions based on subgroup analyses.

Even given a rigorous study design, the extent to which subgroup analyses should be done, or believed, is highly controversial. Although there are those who ignore scientific principles in the subgroup analyses they undertake and report, go on fishing expeditions, and indulge in data-dredging exercises (10, 11), there are also those who mix apples and oranges, drown in the data they pool (12), reach meaningless conclusions about "average" effects (13), and fail to detect clinically important effects because of the heterogeneity of their study groups (14). Although the debate between these two camps is entertaining and can lead to some useful insights, practical advice for assessing the strength of inferences based on subgroup analyses is also important. In providing such advice, we will build on criteria that have been suggested by other authors (15-18).

Table 1. Guidelines for Deciding whether Apparent Differences in Subgroup Response Are Real

1. Is the magnitude of the difference clinically important?
2. Was the difference statistically significant?
3. Did the hypothesis precede rather than follow the analysis?
4. Was the subgroup analysis one of a small number of hypotheses tested?
5. Was the difference suggested by comparisons within rather than between studies?
6. Was the difference consistent across studies?
7. Is there indirect evidence that supports the hypothesized difference?

Our criteria are summarized in Table 1 and are described in detail below. An example of a hypothesized difference in subgroup response and the extent to which it meets our proposed criteria is given in Table 2. We will use this example in the text to highlight some of the relevant issues. It should be noted from the outset that our criteria, like any guidelines for making an inference, do not provide hard and fast rules; they simply represent an organized approach to making reasonable judgments.

Guidelines for Deciding whether Apparent Differences in Subgroup Response Are Real

Conceptual Approach Underlying the Guidelines

Subgroup analyses of data from randomized trials or meta-analyses are undertaken to identify "effect modifiers," characteristics of the patients or treatment that modify the effect of the intervention under study. Statistical "interactions" in a set of data are measured to estimate effect modification (an epidemiologic concept) in the population represented by the study sample (19). The term interaction is sometimes (but not in this paper) also used to refer to the concept of synergism or antagonism, a biologic mechanism of action in which the combined effect of two or more factors differs from the sum of their solitary effects (20). In the following discussion, we use the term "interaction" to refer to situations in which the observed effectiveness of an intervention differs across subgroups.

The premise underlying the hypothesis that subgroup analyses do more harm than good is that "unanticipated qualitative interactions" are unusual and, when apparent unanticipated interactions are discovered, they are usually artifacts due to chance. The same position can be taken with respect to apparent differences between treatment effects in drugs of a single class; this would suggest that the best estimate of the effect of any one drug is the overall effect of the group of drugs across all methodologically adequate, randomized, controlled trials (21). There is confusion, however, over the fundamental distinction between a "qualitative interaction" and a "quantitative interaction" (22). Although a strict definition of a qualitative interaction would mean that there is a sign reversal (22) (meaning that the treatment is beneficial in one group and harmful in another), it is also used to refer to a substantial quantitative interaction (that is, a difference in the magnitude of effect that is clinically important). From a clinical point of view, it is important to recognize that a substantial quantitative interaction can be as important as a qualitative interaction. For instance, the side effects of a treatment may be such that it is worth administering to patients in whom the magnitude of the treatment effect is large, but not to patients in whom the treatment effect is small or moderate.

Having said this, it is still reasonable to distinguish between interactions that are clinically trivial and those that are clinically important. The former can be ignored, and that is the point at which our guidelines begin. Once the clinician has decided that an interaction, if real, would be important, the subsequent six criteria can be used to help decide on the credibility of the proposed subgroup difference. Three of the criteria (2 to 4) are markers of the potential for random error (that is, mistakes due to chance); one (criterion 5) is a marker of the potential for systematic errors; and the last two address the consistency of the evidence (criterion 6) and its biologic plausibility (criterion 7).

The Guidelines

1. Is the Magnitude of the Difference Clinically Important?

Given the extent of biologic variability, it would be surprising not to find interactions between treatment effects and various other factors. Differences in the effect of treatment are likely to be associated with differences in patient characteristics, differences in the administration of the treatment (such as different surgeons or different drug doses), and differences in the primary end point. However, it is only when these differences or interactions are practically important (that is, when they are large enough that they would lead to different clinical decisions for different subgroups) that there is any point in considering them further.

Table 2. An Example of a Hypothesized Difference in Subgroup Response: Digoxin is More Effective in Patients with More Severe Heart Failure

1. Magnitude of the difference: Clinically important differentiation between responders and nonresponders.
2. Statistical significance: Yes, P values were less than 0.01 in both studies.
3. A priori hypothesis: Yes, the hypothesis was suggested by the results of one study and tested in a second study.
4. Small number of hypotheses: If viewed as severity of heart failure, yes. If viewed as components (for example, heart size, third heart sound, ejection fraction), no.
5. Within-study comparisons: Yes, in two crossover trials, comparisons were within studies.
6. Consistency across studies: Yes, in the two studies tested. However, it was not tested in other trials, and this is necessary for confirmation.
7. Indirect evidence: Yes, it is biologically plausible that clinically important response is restricted to those with more severe heart failure.

As a rule, the larger the difference between the effect in a particular subgroup (or with a particular drug or dosage of drug) and the overall effect, the more plausible it is that the difference is real. At the same time, as the difference in effect size between the anomalous subgroup and the remainder of the patients becomes larger, the clinical importance of the difference increases.

Unfortunately, if the results of subgroup analysis are only reported for the subgroups within which sizable treatment differences are found, the estimates of the magnitude of the interaction will be biased because only the extreme estimates are reported (23). This is analogous to regression to the mean (the tendency for extreme findings, such as unusually high blood pressure values, to revert toward less extreme values on repeated examination) (24). Moreover, when the overall treatment effect is modest, there is a good chance of finding a "qualitative" interaction even when only two subgroups are examined (17).

When they report the results of subgroup analyses, authors should make clear to readers how many comparisons were made and how it was decided which ones to report. Given current publication practices, however, were the reader simply to conclude that a reported interaction is real just because it is large, he or she would be wrong more often than right. Thus, having determined that an interaction, if real, is large enough to be important, it is essential to consider other criteria.

2. Was the Difference Statistically Significant?

Any large data set has, embedded within it, a certain number of apparent, but in fact spurious, interactions. Statistical tests of significance can be used to assess the likelihood that a given interaction might have arisen due to chance alone. For example, Yusuf and colleagues (25), in an overview of randomized trials of beta-blocker treatment for myocardial infarction, compared agents with and without intrinsic sympathomimetic activity (ISA) and found that the agents without ISA seemed to produce a larger effect than the ones with it. This difference was significant at the 0.01 level, indicating that it was unlikely to have occurred due to chance alone. Yet, two subsequent trials, one of an agent with ISA and one of an agent without ISA, showed the opposite result and, when added to the overview, eliminated the statistical significance of the interaction (26). There are several possible explanations for this, including chance. In other words, although events that occur one out of a hundred times might be considered rare, they do occur. Of course, the lower a P value is, the less likely it is that an observed interaction can be explained by chance alone.

Conversely, just as it is possible to observe spurious interactions, chance is likely to lead to some studies (among a large group) in which even a real interaction is not apparent. This is particularly true if the studies are small and the clinical end points of interest are infrequent. In this case, the power to detect an interaction would be low. Because subgroup analyses always include fewer patients than does the overall analysis, they carry a greater risk for making a type II error (falsely concluding that there is no difference).

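To make the loss of power concrete, the following sketch (an illustration added to this discussion, not part of the original article; the counts are hypothetical) compares the standard error of an overall log odds ratio with that of an interaction estimated as the difference between two subgroup log odds ratios. Because the interaction is a difference between two less precise subgroup estimates, its standard error is roughly twice that of the overall effect, so its confidence interval is correspondingly wider and its power correspondingly lower.

```python
# A minimal sketch (hypothetical counts) of why tests for interaction have
# low power: the interaction is the difference between two subgroup log odds
# ratios, and its standard error is larger than that of either subgroup
# estimate and roughly twice that of the overall estimate.
import math

def log_or_and_se(a, b, c, d):
    """Log odds ratio and its standard error from a 2x2 table
    (a, b = events/non-events on treatment; c, d = events/non-events on control)."""
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se

# Hypothetical trial of 2000 patients split into two equal subgroups.
sub1 = (40, 460, 60, 440)   # subgroup 1: 40/500 events on treatment, 60/500 on control
sub2 = (45, 455, 55, 445)   # subgroup 2: 45/500 events on treatment, 55/500 on control

lor1, se1 = log_or_and_se(*sub1)
lor2, se2 = log_or_and_se(*sub2)

# Overall effect, collapsing the two subgroups (a simplification for illustration).
overall = tuple(x + y for x, y in zip(sub1, sub2))
lor_all, se_all = log_or_and_se(*overall)

# Interaction = difference between subgroup log odds ratios; its variance is the sum.
interaction = lor1 - lor2
se_int = math.sqrt(se1 ** 2 + se2 ** 2)

print(f"overall log odds ratio   {lor_all:+.3f}  (SE {se_all:.3f})")
print(f"subgroup difference      {interaction:+.3f}  (SE {se_int:.3f})")
print(f"z statistic, interaction {interaction / se_int:+.2f}")
```
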
Statistical techniques for conducting subgroup analysis include the Breslow-Day technique and regression approaches (27). With the Breslow-Day technique and similar approaches (28), it is possible to use a test for homogeneity to estimate the probability that an observed interaction might have arisen due to chance alone. More commonly, authors simply conduct a number of comparisons for different subgroups and apply chi-square tests or t-tests without formally testing for interactions.

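For readers who want to see what such a test for homogeneity looks like in practice, the sketch below (added for illustration; the counts are hypothetical, and it assumes the StratifiedTable interface of the Python statsmodels library rather than any software used by the original investigators) pools two subgroup-specific 2x2 tables and applies the Breslow-Day test.

```python
# A sketch of a formal test of homogeneity of odds ratios across subgroups,
# using the Breslow-Day test as implemented in statsmodels' StratifiedTable
# (an assumed interface); the counts below are hypothetical.
import numpy as np
from statsmodels.stats.contingency_tables import StratifiedTable

# One 2x2 table per subgroup: rows = treatment/control, columns = event/no event.
subgroup_tables = [
    np.array([[30, 470], [50, 450]]),   # subgroup A (hypothetical counts)
    np.array([[55, 445], [60, 440]]),   # subgroup B (hypothetical counts)
]

strat = StratifiedTable(subgroup_tables)

# Pooled (Mantel-Haenszel) odds ratio across the subgroups.
print("pooled odds ratio:", round(strat.oddsratio_pooled, 2))

# Breslow-Day test of the null hypothesis that the odds ratio is the same in
# every subgroup; a small P value suggests a real interaction.
homogeneity = strat.test_equal_odds()
print("Breslow-Day statistic:", round(homogeneity.statistic, 2))
print("P value:", round(homogeneity.pvalue, 3))
```
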
This practice, together with only reporting subgroups within which sizable treatment differences are found, can lead to an overestimate of the significance as well as the size of the difference. One way of adjusting for this bias is to use Bayes or empiric Bayes methods, which shrink the extreme estimates toward the overall estimate of treatment effect (23, 29, 30). Both a point estimate of the magnitude of the difference and a confidence interval can be obtained using these approaches.

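The sketch below (added for illustration; the subgroup estimates and standard errors are hypothetical, and this simple random-effects style of shrinkage is only one of several ways the cited methods can be implemented) shows the basic idea: each subgroup estimate is pulled toward the overall estimate, and the noisier the subgroup, the harder it is pulled.

```python
# A minimal sketch of empirical Bayes shrinkage: each subgroup's estimated
# treatment effect (log odds ratio) is pulled toward the overall estimate,
# with noisier subgroups pulled harder. All numbers are hypothetical.
import numpy as np

effects = np.array([-0.10, -0.45, -0.30, -0.85])   # subgroup log odds ratios
se = np.array([0.20, 0.25, 0.15, 0.40])            # their standard errors

w = 1.0 / se ** 2
overall = np.sum(w * effects) / np.sum(w)          # precision-weighted overall effect

# Method-of-moments estimate of the between-subgroup variance (tau^2),
# truncated at zero.
q = np.sum(w * (effects - overall) ** 2)
tau2 = max(0.0, (q - (len(effects) - 1)) / (np.sum(w) - np.sum(w ** 2) / np.sum(w)))

# Shrinkage factor: the larger a subgroup's standard error relative to the
# between-subgroup variance, the more it is pulled toward the overall estimate.
shrink = tau2 / (tau2 + se ** 2)
posterior = overall + shrink * (effects - overall)

for raw, post in zip(effects, posterior):
    print(f"raw estimate {raw:+.2f}  ->  shrunken estimate {post:+.2f}")
```
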
Regression models, such as logistic regression (28), can also be used for analysis of interactions if the interactions are modeled by product terms. This approach allows for testing the significance of an interaction while controlling for other factors. If there are many subgroup factors, however, the number of product terms necessary for an adequate modeling of the interactions may be greater than the number of observations; an analysis of the interactions is then impossible. An additional problem with this approach is deciding which of many possible interaction terms to enter into the model, as well as the potential for bias in their selection.

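As an illustration of the product-term approach (not taken from the article; the data are simulated and the variable names treatment, severe, and outcome are hypothetical, and the sketch assumes the formula interface of the Python statsmodels library), the code below fits a logistic regression in which the coefficient and P value for the product term provide the test of interaction.

```python
# A sketch of testing an interaction with a logistic regression product term,
# using statsmodels' formula interface (an assumed API); the data are simulated
# and the column names (treatment, severe, outcome) are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "treatment": rng.integers(0, 2, n),   # 1 = active treatment
    "severe": rng.integers(0, 2, n),      # 1 = more severe disease (the subgroup factor)
})

# Simulate an outcome in which treatment helps mainly the severe subgroup.
linpred = -1.0 + 0.5 * df.severe - 0.1 * df.treatment - 0.8 * df.treatment * df.severe
df["outcome"] = rng.binomial(1, 1.0 / (1.0 + np.exp(-linpred)))

# "treatment * severe" expands to both main effects plus their product term;
# the coefficient and P value for treatment:severe constitute the test of interaction.
model = smf.logit("outcome ~ treatment * severe", data=df).fit(disp=False)
print(model.summary().tables[1])
```
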
Methods for selecting factors to include have been proposed (31), in addition to other approaches to subgroup analysis (15, 18, 23, 27). Although it is not important for clinical readers to understand the details of these approaches, it is important to understand the concepts of statistical significance and power in subgroup analysis. Statistical analysis is a useful tool for assessing whether an observed interaction might have been due to chance alone, but it is not a substitute for clinical judgment.

3. Did the Hypothesis Precede Rather than Follow the Analysis?

Surveying patterns of data that suggest possible interactions may, in fact, prompt the analysis that "confirms" the existence of a possible interaction. As a result, the credibility of any apparent interaction that arises out of post-hoc exploration of a data set is questionable.

An example of this was the apparent finding that aspirin had a beneficial effect in preventing stroke in men with cerebrovascular disease but not in women (32). This interaction, which was "discovered" in the first large trial of aspirin in patients with transient ischemic attacks, was subsequently found, in other studies and in a meta-analysis summarizing these studies (33), to be spurious. This finding, like the streptokinase example, is an example of a "false-negative" subgroup analysis. In this instance, many physicians withheld aspirin from women with cerebrovascular disease for a considerable period.

Whether a hypothesis preceded analysis of a data set is not necessarily a black-or-white issue. At one extreme, unexpected results might be clearly responsible for generating a new hypothesis. At the other extreme, a subgroup analysis might be clearly planned for in a study protocol to test a hypothesis suggested by previous research. Between these two extremes lies a range of possibilities, and the extent to which a hypothesis arose before, during, or after the data were collected and analyzed is frequently not clear. For example, if data monitoring detects a seeming interaction in a long-term study, it may be possible to state the hypothesis and then test it in future analyses (34). This technique may be most appropriate if additional study patients are still to be accrued.

Although post-hoc analyses will sometimes yield plausible results, they should generally be viewed as hypothesis-generating exercises rather than as hypothesis testing. Decisions about which analyses to do and which ones to report are much more likely to be data driven with post-hoc analyses and thereby more likely to be spurious. On the other hand, when a hypothesis has been clearly and unequivocally suggested by a different data set, it moves from a hypothesis-generating toward a hypothesis-testing framework. In Bayesian terms, the higher prior probability increases the posterior probability (after the subgroup analysis) of an interaction being real (29, 30).

If a hypothesis about an interaction has arisen from exploration of a data set from a study, then an argument can be made for excluding that study from a meta-analysis in which the hypothesis is tested. Certainly, if the hypothesis is confirmed in a meta-analysis that excludes data from the study that originally suggested the interaction, the inference rests on stronger ground. If the statistical significance of the interaction disappears or is substantially weakened when data from the original study are excluded, the strength of inference is reduced.

When considering post-hoc analyses, it should be kept in mind that they are more susceptible to bias as well as to spurious results. The reader should be particularly cautious about analysis of subgroups of patients that are delineated by variables measured after baseline, even if the hypothesis preceded the analysis. If the treatment can influence whether a participant becomes a member of a particular subgroup, the conclusions of the analysis are open to bias. For instance, one might hypothesize that compliers will do better if they are in the treatment group than in the control group but that noncompliers will do equally well in both groups. The reasons for compliance and noncompliance, however, probably differ in the treatment and control groups. As a result, in this comparison, the advantages of randomization (and with it, the validity of the analysis) are lost.

An example of the evolution of a hypothesis concerning responsive subgroups comes from the investigation of the efficacy of digoxin in preventing clinically important exacerbations of heart failure in heart-failure patients in sinus rhythm (see Table 2). Lee and colleagues (35) conducted a crossover study in which they found the drug to be effective. They did a regression analysis that suggested that only one factor, the presence of a third heart sound, predicted who would benefit from the drug. Only patients with a third heart sound were better off while taking digoxin. The hypothesis that this might be one of the predictors appears to have preceded the study. Nevertheless, on the basis of the foregoing discussion, the investigators were perhaps too ready to conclude that digoxin use in heart-failure patients in sinus rhythm should be restricted to those with a third heart sound.

4. Was the Subgroup Analysis One of a Small Number of Hypotheses Tested?

Post-hoc hypotheses based on subgroup analysis often arise from exploration of a data set in which many such hypotheses are considered. The greater the number of hypotheses tested, the greater the number of interactions that will be discovered by chance. Even if investigators have clearly specified their hypotheses in advance, the strength of inference associated with the apparent confirmation of any single hypothesis will decrease if it is one of a large number that have been tested. In their regression analysis, Lee and colleagues (35) included 16 variables. This relatively large number increases the level of skepticism with which the presence of a third heart sound as an important predictor of response to digoxin should be viewed.

Unfortunately, as noted above, the reader may not always be sure about the number of possible interactions that were tested. If the investigators chose to withhold this information, despite admonitions not to do so, and reported only those that were "significant," the reader is likely to be misled.

The Beta-Blocker Heart Attack Trial (BHAT) randomized approximately 4000 patients to propranolol or placebo after a myocardial infarction (36). Subsequently, 146 subgroup comparisons were done (37). Although the estimated effects of the treatment clustered around the overall effect, the treatment appeared to be either much more effective or ineffective in some small subgroups. The overall pattern, which approximated a "normal" distribution, would suggest that most of the observed difference in effect among the various subgroups was due to sampling error rather than to true interactions.

Another way to consider this is in terms of the effect of multiple comparisons on P values. The more hypotheses that are tested, the more likely it is to make a type I error, that is, to reject one of the null hypotheses even if all are actually true. Assuming that no true differences exist, if 100 different comparisons are made, five can be expected to yield a P value of 0.05 or less by chance alone. In this situation, a more appropriate analysis would account for the number of subgroups, their relation to other subgroups, and the size of the effect within subgroups and overall (23).

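The arithmetic behind this point can be shown directly. The short sketch below (added for illustration, and assuming for simplicity that the comparisons are independent, which real subgroup comparisons are not) gives the expected number of spurious "significant" results and the probability of at least one, for several numbers of comparisons, including the 146 made in BHAT.

```python
# Expected number of spurious "significant" subgroup findings, and the chance
# of at least one, assuming independent comparisons at the 0.05 level.
alpha = 0.05
for k in (1, 5, 20, 100, 146):   # 146 matches the number of BHAT subgroup comparisons
    expected_false_positives = k * alpha
    prob_at_least_one = 1 - (1 - alpha) ** k
    print(f"{k:>3} comparisons: expect {expected_false_positives:5.2f} false positives; "
          f"P(at least one) = {prob_at_least_one:.2f}")
```
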
5. Was the Difference Suggested by Comparisons within Rather Than between Studies?

Making inferences about different effect sizes in different groups on the basis of between-study differences entails a high risk compared with inferences made on the basis of within-study differences. For instance, one would be reluctant to conclude that propranolol results in a different magnitude of risk reduction for death after myocardial infarction than does metoprolol on the basis of data from two studies, one that compared propranolol with placebo and another that compared metoprolol with placebo. This could be thought of as an indirect comparison. A direct comparison would involve, in a single study, patients being randomized to receive either placebo, propranolol, or metoprolol. If, in such a direct comparison, clinically important and statistically significant differences in magnitude of effect between the two active treatments were demonstrated, the inference would be quite strong.

An example that illustrates this point comes from an overview examining the effectiveness of prophylaxis for gastrointestinal bleeding in critically ill patients (38). Histamine2-receptor (H2) antagonists and antacids, when individually compared with placebo, had comparable effects in reducing overt bleeding (common odds ratios of 0.35 in both cases). In contrast, direct comparisons from studies in which patients were randomized to receive H2 antagonists or antacids have shown a statistically significantly greater reduction in bleeding with the latter (common odds ratio, 0.56).

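As a purely numerical illustration of one problem with such indirect comparisons (this sketch is not from the article; the standard errors are hypothetical, and the calculation subtracts the two placebo-controlled log odds ratios, an approach formalized later as the adjusted indirect comparison), note that an indirect estimate inherits the uncertainty of both trials, so its confidence interval is wide even before the more fundamental problem of noncomparable study populations is considered.

```python
# An indirect comparison of two treatments, each tested against placebo in a
# separate trial. The odds ratios echo the example above; the standard errors
# are hypothetical. Note how the variances of both trials add, widening the
# confidence interval of the indirect estimate.
import math

log_or_a, se_a = math.log(0.35), 0.15   # treatment A vs placebo (hypothetical SE)
log_or_b, se_b = math.log(0.35), 0.18   # treatment B vs placebo (hypothetical SE)

# Indirect comparison of A with B through the common placebo comparator.
log_or_ab = log_or_a - log_or_b
se_ab = math.sqrt(se_a ** 2 + se_b ** 2)

lo = math.exp(log_or_ab - 1.96 * se_ab)
hi = math.exp(log_or_ab + 1.96 * se_ab)
print(f"indirect odds ratio, A vs B: {math.exp(log_or_ab):.2f} (95% CI {lo:.2f} to {hi:.2f})")
```
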
The reason that inference on the basis of between-study differences is so potentially misleading is that there may be a myriad of factors, aside from the most salient difference (which is the basis of the inference being made), that could explain the interaction. For instance, aside from differences in the specific drugs used, different populations (varying in risk for adverse outcomes, for example), varying degrees of co-intervention, or varying criteria for gastrointestinal bleeding each could explain the results. These differences would not be plausible explanations if the inference were based on within-study differences in randomized trials in which the populations studied, control of co-intervention, and outcome criteria were all identical.

Stated simply, between-study inferences are based on comparisons between noncomparable groups: even when all of the individual studies were randomized, patients were not randomized to one study or another. Clinical decisions based on between-study comparisons should be made cautiously, if at all. As a rule, inferences based on between-study comparisons should be viewed as preliminary and as requiring confirmation from direct within-study comparisons. This is true whether the between-study comparison has to do with different groups or different interventions.

6. Was the Difference Consistent across Studies?

A hypothesis concerning differential response in a subgroup of patients may be generated by examination of data from a single study. The interaction becomes far more credible if it is also found in other studies. The extent to which a comprehensive scientific overview of the relevant literature finds an interaction to be consistently present is probably the best single index as to whether it should be believed.

In other words, the replication of an interaction in independent, unbiased studies provides strong support for its believability. On the other hand, there are two reasons to be cautious in applying this criterion. The first goes back to sample size. Because subgroup analyses often include small numbers of patients, the results tend to be imprecise, and the extent to which results from different studies are consistent can be uncertain. The second caution relates to making between-study comparisons. For the same reason that it is risky to base conclusions on between-study differences, it is only reasonable to expect variation in the results of trials of the same therapy, due to differences in the study populations, the interventions, the outcomes, and the study designs, as well as the play of chance. Thus, when assessing the consistency of results, it is important to consider both the power of the comparisons (or their statistical certainty) and other differences between studies that might influence the results.

The hypothesis concerning a third heart sound as a predictor of response to digoxin in heart-failure patients in sinus rhythm was tested in a second crossover, randomized trial (39). The presence of a third heart sound proved a weaker predictor than in the initial study, although its association with response to digoxin did reach conventional levels of statistical significance. However, a number of factors that, like a third heart sound, reflect greater severity of heart failure were associated with response to digoxin. Thus, support for a more general hypothesis, that response is related to the severity of heart failure, was provided by the second study.

Other studies have examined the efficacy of digoxin in heart-failure patients in sinus rhythm, and these have been summarized in a meta-analysis (40). Unfortunately, none of these studies has conducted subgroup analyses addressing the issue of differential response according to different severity of heart failure. Had these analyses been done in the other studies, the hypothesis would likely have been confirmed or refuted with substantially greater confidence. As it is, we would be inclined to view the conclusion as tentative; the strength of inference is only moderate.

7. Is There Indirect Evidence to Support the Hypothesized Difference?

We are generally more ready to believe a hypothesized interaction if indirect evidence (such as from animal studies or analogous situations in human biology) makes the interaction more plausible. That is, to the extent that a hypothesis is consistent with our current understanding of the biologic mechanisms of disease, we are more likely to believe it. Such understanding comes from three types of indirect evidence: from studies of different populations (including animal studies); from observations of interactions for similar interventions; and from results of studies of other, related outcomes (particularly intermediary outcomes).

The extent to which indirect evidence strengthens an inference about a hypothesized interaction varies substantially. In general, evidence from intermediary outcomes is the strongest type of indirect evidence. Evidence of differences in immune response, for example, can provide strong support for a conclusion that there is an imp

