Treating Individuals 2

Subgroup analysis in randomised controlled trials: importance, indications, and interpretation

Peter M Rothwell

Lancet 2005; 365: 176–86

Stroke Prevention Research Unit, Radcliffe Infirmary, Oxford OX2 6HE, UK (P M Rothwell FRCP)

Large pragmatic trials provide the most reliable data about the effects of treatments, but should be designed,
analysed, and reported to enable the most effective use of treatments in routine practice. Subgroup analyses are
important if there are potentially large differences between groups in the risk of a poor outcome with or without
treatment, if there is potential heterogeneity of treatment effect in relation to pathophysiology, if there are
practical questions about when to treat, or if there are doubts about benefit in specific groups, such as elderly
people, which are leading to potentially inappropriate undertreatment. Analyses must be predefined, carefully
justified, and limited to a few clinically important questions, and post-hoc observations should be treated with
scepticism irrespective of their statistical significance. If important subgroup effects are anticipated, trials should
either be powered to detect them reliably or pooled analyses of several trials should be undertaken. Formal rules
for the planning, analysis, and reporting of subgroup analyses are proposed.
`
`Introduction
`
`“The essence of tragedy has been described as the
`destructive collision of two sets of protagonists, both
`of whom are correct. The statisticians are right in
`denouncing subgroups that are formed post hoc from
`exercises in pure data dredging. The clinicians are also
right, however, in insisting that a subgroup is
respectable and worthwhile when established a priori
from pathophysiological principles.”

A R Feinstein, 19981
`
Randomised controlled trials (RCTs) and systematic reviews are the most reliable methods of determining the
effects of treatments. However, when trials were first developed for use in agriculture, researchers were
presumably concerned about the effect of interventions on the overall size and quality of the crop rather than
on the wellbeing of any individual plant. Clinicians have to make decisions about individuals, and how best to
use results of RCTs and systematic reviews to do this has generated considerable debate. Unfortunately, this
debate has polarised, with statisticians and predominantly nonclinical (or non-practising) epidemiologists
warning of the dangers of subgroup analysis and other attempts to target treatment, and clinicians warning of
the dangers of applying the overall results of large trials to individual patients without consideration of
pathophysiology or other determinants of individual response. This rift, described by Feinstein as a
“clinicostatistical tragedy”,1 has been widened by some of the more enthusiastic proclamations on the extent to
which the overall results of trials can properly inform decisions at the bedside or in the clinic.
`
The results of small explanatory trials with well-defined eligibility criteria should be easy to apply, but
generalisability is often undermined by highly selective recruitment, resulting in trial populations that are
unrepresentative even of the few patients in routine practice who fit the eligibility criteria. Recruitment of a
higher proportion of eligible patients is a major strength of large pragmatic trials, but deliberately broad and
sometimes ill-defined entry criteria mean that the overall result can be difficult to apply to particular groups,
and that subgroup analyses are necessary if heterogeneity of treatment effect is likely to occur. Yet despite the
adverse effects on patient care that can result from misinterpreted or inappropriate subgroup analyses (table 1),
there are no reviews or guidelines on the clinical indications for subgroup analysis and no consensus on the
implications for trial design, analysis, and interpretation of subgroup effects, and the CONSORT statement on
reporting of trials includes only a few lines on subgroup analysis. This article discusses arguments for and
against subgroup analyses, the clinical situations in which they can be useful, and rules for their performance
and interpretation. Illustrative examples are taken mainly from treatments for cerebrovascular or cardiovascular
disease but the principles are relevant to all areas of medicine and surgery.

Table 1: Examples of apparent subgroup differences in treatment effect that were later shown to be false

Aspirin is ineffective in secondary prevention of stroke in women
Antihypertensive treatment for primary prevention is ineffective in women
Angiotensin-converting enzyme inhibitors do not reduce mortality and hospital admission in patients with heart
failure who are also taking aspirin
β blockers are ineffective after acute myocardial infarction in elderly people, and in patients with inferior
myocardial infarction
Thrombolysis is ineffective >6 hours after acute myocardial infarction
Thrombolysis for acute myocardial infarction is ineffective or harmful in patients with a previous myocardial
infarction
Tamoxifen citrate is ineffective in women with breast cancer aged <50 years
Benefit from carotid endarterectomy for symptomatic stenosis is reduced in patients taking only low-dose aspirin
due to an increased operative risk
Amlodipine reduces mortality in patients with chronic heart failure due to non-ischaemic cardiomyopathy but not
in patients with ischaemic cardiomyopathy

www.thelancet.com Vol 365 January 8, 2005
For personal use. Only reproduce with permission from Elsevier Ltd
Biogen Exhibit 2070, Mylan v. Biogen, IPR 2018-01403
`
`Arguments against subgroup analysis
`
`“ . . . it would be unfortunate if desire for the perfect (ie,
`knowledge of exactly who will benefit from treatment)
`were to become the enemy of the possible (ie, knowledge
`of the direction and approximate size of the effects of
`treatment of wide categories of patient).”
`
`S Yusuf et al, 19844
`
The main argument against subgroup analysis is that qualitative heterogeneity of relative treatment effect
(defined as the treatment effect being in different directions in different groups of patients, ie, benefit in
one subgroup and harm in another) is very rare.2–5 However, this observation is much less reassuring than it
seems. First, it automatically excludes most treatments because they do not have a substantial risk of harm and
can only be effective or ineffective. Yet use of an ineffective treatment can be highly detrimental if this
prevents the use of a more effective alternative or if adverse effects impair quality of life. Second, the
observation refers only to so-called unanticipated heterogeneity.2–5 As outlined below, there are many
examples in which qualitative heterogeneity of relative treatment effect has been correctly anticipated. Third, the
observation only applies to single outcome events; it is argued that subgroup analyses based on composite
outcomes are inappropriate.2–5,51 However, since qualitative heterogeneity of relative treatment effect is only
possible for treatments that have a risk of harm, and such treatments almost always need a composite outcome to
express the balance of both risk and benefit, qualitative heterogeneity as defined will inevitably be rare: a
Catch-22, in fact.

Panel 1: Rules of subgroup analysis: a proposed guideline for design, analysis, interpretation, and reporting

Trial design
• Subgroup analyses should be defined before starting the trial and should be limited to a small number of
  clinically important questions.
• Expert clinical input into the design of subgroup analyses is needed to ensure that all relevant baseline
  clinical and other data are recorded.
• The direction and magnitude of anticipated subgroup effects should be stated at the outset.
• The exact definitions and categories of the subgroup variables should be defined explicitly at the outset in
  order to avoid post hoc data-dependent variable or category definitions. For continuous or hierarchical
  variables the cut-off points for analysis should be predefined.
• Stratification of randomisation by important subgroup variables should be considered.
• If important subgroup-treatment effect interactions are anticipated, trials should ideally be powered to detect
  them reliably.
• Trial stopping rules should take into account anticipated subgroup-treatment effect interactions and not simply
  the overall effect of treatment.
• If relative treatment effect is likely to be related to baseline risk, the analysis plan should include a
  stratification of the results by predicted risk. The risk score or model should be selected in advance so that
  the relevant baseline data can be recorded.

Analysis and reporting
• The above design issues should be reported in the methods section along with details of how and why subgroups
  were selected.
• Significance of the effect of treatment in individual subgroups should not be reported; rates of false negative
  and false positive results are extremely high. The only reliable statistical approach is to test for a
  subgroup-treatment effect interaction.
• All subgroup analyses that were done should be reported, ie, not only the number of subgroup variables but also
  the number of different outcomes analysed by subgroup, different lengths of follow-up etc.
• Significance of pre hoc subgroup-treatment effect interactions should be adjusted when multiple subgroup
  analyses are done.
• Subgroup analyses should be reported as absolute risk reductions and relative risk reductions. Where relevant
  the statistical significance of differences in absolute risk reductions should be tested.
• Ideally, only one outcome should be studied and this should usually be the primary trial outcome, irrespective
  of whether this is one outcome or a clinically important composite outcome.
• Comparability of treatment groups for prognostic factors should be checked within subgroups.
• If multiple subgroup-treatment effect interactions are identified, further analysis is needed to check whether
  their effects are independent.

Interpretation
• Reports of the significance of the effect of treatment in individual subgroups should be ignored, especially
  reports of lack of benefit in a particular subgroup in a trial in which there is overall benefit, unless there
  is a significant subgroup-treatment effect interaction.
• Genuine unanticipated subgroup-treatment effect interactions are rare (assuming that expert clinical opinion was
  sought in order to pre-define potentially important subgroups) and so apparent interactions that are discovered
  post hoc should be interpreted with caution. No test of significance is reliable in this situation.
• Pre hoc subgroup analyses are not intrinsically valid and should still be interpreted with caution. The false
  positive rate for tests of subgroup-treatment effect interaction when no true interaction exists is 5% per
  subgroup.
• The best test of validity of subgroup-treatment effect interactions is their reproducibility in other trials.
• Few trials are powered to detect subgroup effects and so the false negative rate for tests of
  subgroup-treatment effect interaction when a true interaction exists will usually be high.
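The recommendation to report both absolute and relative risk reductions, and to stratify results by predicted risk, can be illustrated with a short calculation. The sketch below assumes a constant relative risk reduction, a fixed absolute risk of treatment-related harm, and three baseline-risk strata; all of the numbers and function names are hypothetical and are not taken from any trial discussed here.

```python
# Hypothetical illustration: the same relative risk reduction (RRR) gives very
# different net absolute risk reductions (ARR) across strata of baseline risk.

def absolute_risk_reduction(control_risk, rrr, treatment_harm=0.0):
    """Net ARR = control risk x RRR, minus any absolute risk caused by treatment."""
    return control_risk * rrr - treatment_harm

def number_needed_to_treat(arr):
    """NNT is the reciprocal of a positive ARR; None when there is no net benefit."""
    return 1.0 / arr if arr > 0 else None

# Assumed values, chosen only for illustration.
RRR = 0.30
HARM = 0.01  # a 1% absolute risk of a serious treatment complication
strata = {"low": 0.02, "medium": 0.10, "high": 0.30}

results = {}
for name, risk in strata.items():
    arr = absolute_risk_reduction(risk, RRR, HARM)
    results[name] = (arr, number_needed_to_treat(arr))
    print(f"{name:>6}: control risk {risk:.0%}, net ARR {arr:+.1%}, NNT {results[name][1]}")
```

Under these assumptions the relative effect is identical in every stratum, yet the low-risk stratum has no net benefit, which is why a relative risk reduction on its own can be uninformative for an individual patient.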
Panel 2: The four main clinical indications for subgroup analysis

Potential heterogeneity of treatment effect related to risk
• Differences in risks of treatment
• Differences in risk without treatment

Potential heterogeneity of treatment effect related to pathophysiology
• Multiple pathologies underlying a clinical syndrome
• Differences in the biological response to a single pathology
• Genetic variation

Clinically important questions related to the practical application of treatment
• Does benefit differ with severity of disease?
• Does benefit differ with stage in the natural history of disease?
• Is benefit related to the timing of treatment after a clinical event?
• Is benefit dependent on comorbidity?

Underuse of treatment in routine clinical practice due to uncertainty about benefit
• Underuse of treatment in specific groups of patients, eg, elderly people
• Confinement of treatment according to a narrow range of values of a relevant physiological variable, eg,
  treatment thresholds for cholesterol level or blood pressure

There are several other arguments against attempts to target treatment. First, it is said that clinicians already
tend to undertreat patients,52 and we should not risk effective treatments being further restricted. However, one
of the main purposes of subgroup analysis is to extend the use of treatments to subgroups that are not currently
treated in routine practice. Subgroup analyses in epidemiological studies and trials often show that benefit from
treatment is likely to be more universal than expected and that current indications for treatments in routine
clinical practice are inappropriately narrow, as is now clear, for example, with treatment thresholds for blood
pressure lowering or lipid lowering.53,54 Second, it is argued that subgroup analyses are almost always
underpowered,55–60 but this is simply an argument for larger trials and for meta-analysis of individual patient
data. Third, it has also been argued that false positive subgroup effects might be more common than genuine
heterogeneity,2–5,55–60 and these false observations might harm patients: “subgroups kill people.”61 Subgroup
analyses have certainly led to mistaken clinical recommendations (table 1), but these analyses would not have
satisfied the rules suggested in panel 1. Moreover, not doing subgroup analysis can also be harmful. Properly
powered subgroup analyses most commonly show that relative treatment effect is consistent across subgroups
and/or that treatments should be used more extensively than is currently the case.53,62,63 Without such
evidence, unfounded clinical concerns about possible heterogeneity or inappropriately narrow indications for
treatment would reduce the use of effective treatments in routine practice.26 Not doing subgroup analyses has
very probably killed more people.
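The argument that subgroup analyses are usually underpowered can be made concrete with a small calculation. The sketch below assumes a binary outcome, 1:1 randomisation, two equal-sized subgroups, and identical event risks in both subgroups; the sample size and risks are hypothetical. Under those assumptions the standard error of the interaction (the difference between the two subgroup effects) is twice that of the overall effect, so detecting an interaction as large as the overall effect with the same power would need roughly four times as many patients.

```python
import math

def se_risk_difference(p_treated, p_control, n_per_arm):
    """Standard error of a difference between two independent proportions."""
    return math.sqrt(p_treated * (1 - p_treated) / n_per_arm
                     + p_control * (1 - p_control) / n_per_arm)

# Assumed design (hypothetical numbers): 2000 patients, 1:1 randomisation,
# two equal-sized subgroups, same event risks in both subgroups.
n_total = 2000
p_control, p_treated = 0.20, 0.15

se_overall = se_risk_difference(p_treated, p_control, n_total // 2)
se_subgroup = se_risk_difference(p_treated, p_control, n_total // 4)
# The interaction is the difference between two independent subgroup effects,
# so its variance is the sum of the two subgroup variances.
se_interaction = math.sqrt(2) * se_subgroup

ratio = se_interaction / se_overall
print(f"SE of overall effect: {se_overall:.4f}")
print(f"SE of interaction:    {se_interaction:.4f}")
print(f"ratio:                {ratio:.2f}")
```

With unequal subgroup sizes the ratio is larger still, which is one reason why pooled analyses of several trials are often needed.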
`
Situations in which subgroup analyses should be considered
`
`“The tragedy of excluding cogent pathophysiologic
`subgroup analyses merely because they happen to be
`subgroups will occur if statisticians do not know the
`distinction, and if clinicians who do know it remain
`mute, inarticulate or intimidated.”
`
`A R Feinstein, 19981
`
`Subgroup analyses should be predefined and carefully
`justified. Feinstein and others have emphasised the need
`for determination of pathophysiological heterogeneity,
but there are three other indications for subgroup analysis
(panel 2), each of which is discussed below and which are
probably more important.
`
Heterogeneity related to risk
Clinically important heterogeneity of treatment effect is common when different groups of patients have very
different absolute risks with or without treatment. The need for reliable data about risks and benefits in
subgroups and individuals is greatest for potentially harmful interventions, such as warfarin or carotid
endarterectomy, which are of overall benefit but which kill or disable a proportion of patients. However,
evidence-based guidelines usually recommend these treatments in all cases similar to those in the relevant
RCTs.64–66 In considering this approach, it is useful to draw an analogy with the criminal justice system.
Suppose that research showed that individuals charged by the police with specific crimes were usually guilty. Few
would argue that they should therefore be sentenced without trial. Automatic sentencing would, on average, do
more good than harm, with most criminals correctly convicted, but any avoidable miscarriages of justice are widely
regarded as unacceptable. In contrast, relatively high rates of treatment-related death or disability
(miscarriages of treatment) are tolerated by the medical scientific community precisely because, on average,
treatment will do more good than harm. In both situations systems need to be in place to avoid doing harm. Yet
the contrast between the effort that is put into the defence of the accused in order to avoid wrongful conviction
and the very limited efforts of the medical scientific community to identify patients at high risk of harm is
obvious. Admittedly, determination of guilt in a criminal trial is based on knowledge of past events, which can
often be established with certainty, whereas probable benefit or harm from medical treatment depends on future
events, which are usually less certain. However, the probable balance of risk and benefit in individual patients
can be predicted to some extent with subgroup analysis and risk models, as has been shown, for example, with
carotid endarterectomy. In view of the fact that treatment complications are now a leading cause of death in
developed countries, effort is needed to more effectively target potentially harmful interventions.
Differences in the risk of a poor outcome without treatment can also lead to clinically important heterogeneity
of treatment effect. Trial populations are often skewed in terms of control group risk, with a few individuals
contributing much of the observed risk, and treatment may be ineffective or harmful in the low risk majority. In
vascular medicine, this is the case with endarterectomy for symptomatic carotid stenosis, anticoagulation for
uncomplicated non-valvular atrial fibrillation, coronary artery bypass grafting, and anti-arrhythmic drugs after
myocardial infarction.75 Clinically important heterogeneity of relative treatment effect by baseline risk has also
been shown for blood pressure lowering, aspirin, and lipid lowering in primary prevention of vascular disease,
and in treatment of acute coronary syndromes with clopidogrel, and with enoxaparin versus unfractionated heparin.
There are many similar examples in other areas of medicine, and this issue is the subject of the next article in
this series.

Pathophysiological heterogeneity
Differences between groups of patients in underlying pathology, biology, or genetics can each lead to clinically
important heterogeneity of treatment effects. Examples will probably be identified more frequently as our
understanding of the molecular mechanisms of disease is enhanced.

Multiple underlying pathologies
Clinicians often have to treat patients with ill-defined clinical syndromes, which probably have many underlying
pathologies, rather than one disease. Primary generalised epilepsy is a typical example in which treatment effects
differ between patients, probably because of the different underlying molecular pathologies. In vascular disease,
clinically important heterogeneity of treatment effect in relation to underlying pathology is seen with
thrombolysis for acute ischaemic stroke, with aspirin in primary prevention of vascular disease (in which benefit
may be largely confined to men with elevated levels of C-reactive protein, probably indicating underlying
atherosclerosis), and with blood pressure-lowering in secondary prevention of transient ischaemic attack and
stroke, in which guidelines suggest that all patients be treated. However, there is clinical concern about
patients with carotid stenosis or occlusion in whom cerebral perfusion is often severely impaired. Table 2 shows
stroke risk by systolic blood pressure in patients with and without flow-limiting (≥70%) carotid stenosis who were
randomly assigned to medical treatment in RCTs of endarterectomy. Major increases in stroke risk were noted in
patients with flow-limiting stenosis, but only if systolic blood pressure was <150 mm Hg: 5-year risk in patients
with bilateral (≥70%) stenosis was 64·3% versus 24·2% (p=0·002) at higher blood pressures. This difference in
risk was absent in patients who had been randomly assigned to endarterectomy (13·4% vs 18·3%, p=0·6), suggesting a
causal effect and indicating that aggressive blood pressure-lowering would very probably be harmful in patients
with bilateral severe carotid disease in whom endarterectomy was not possible.

Table 2: Stroke risk by systolic blood pressure in patients with and without flow-limiting (≥70%) carotid stenosis
who were randomly assigned to medical treatment in RCTs of endarterectomy (table body not recoverable from the
source)

Biological heterogeneity
Subgroup analyses can also be useful when there are predictable differences in the biological response to the
underlying disease. For example, perioperative administration of antilymphocyte antibodies reduces rejection in
cadaveric renal transplantation by 30%, but is expensive and has serious adverse effects. Clinical concern that
benefit might depend on pre-existing immune sensitisation prompted a meta-analysis of individual patient data from
five RCTs. As predicted, treatment was highly effective in sensitised patients (hazard ratio for allograft failure
at 5 years=0·20, 95% CI=0·09–0·47) but was ineffective in the remaining 85% (0·97, 0·71–1·32). The
subgroup-treatment effect interaction was significant (p=0·009), ie, the effect of treatment was significantly
different between the subgroups. A similar pre-specified immunological subgroup analysis in a large trial of
roxithromycin versus placebo after coronary angioplasty showed that treatment reduced restenosis and the need for
revascularisation if the titre of Chlamydia pneumoniae antibody was high but was ineffective or harmful if the
titre was low (interaction p=0·006).95

Genetic heterogeneity
Individuals respond differently to some drugs and this tendency can be inherited.96,97 Genotype is an important
determinant of both the response to treatment and the susceptibility to adverse reactions for a wide range of
drugs.98,99 For example, response to chemotherapy is dependent on gene expression in both colon cancer100 and
breast cancer,101 and HDL cholesterol response to oestrogen replacement therapy is highly dependent on sequence
variants in the gene encoding oestrogen receptor α.102 In each of these cases, significant subgroup-treatment
effect interactions have been reported. There is also great interest in the effects of genetics on the response to
treatment in patients with HIV-1.103 Subgroup analyses based on genotype have particular methodological problems
since many genotypes may be studied and analyses will often be post hoc.

Heterogeneity related to practical application
Many of the arguments used against subgroup analyses misinterpret their main function. The main potential of
subgroup analysis is not in the identification of groups that differ in their response to treatment for reasons of
pathophysiology, but is in answering practical questions about how treatments should be used most effectively,
such as at what stage of the disease is treatment most effective, how soon after a clinical event is treatment
sufficiently safe or most effective, or how are the risks and benefits related to comorbidity? Subgroup analyses
related to questions of the practical application of interventions can be vital to effective clinical practice.

Severity or stage of disease
Treatment effects often depend on severity of disease. In primary prevention of vascular disease, a pooled
analysis of RCTs of pravastatin showed that the relative risk reduction with treatment increased with baseline LDL
cholesterol (interaction p=0·01): relative risk reduction=3% in the lowest quintile and 29% in the two highest
quintiles.104 In stroke medicine, carotid endarterectomy is highly effective for ≥70% recently symptomatic
stenosis, modestly effective for 50–69% stenosis, but harmful for <50% stenosis (interaction p<0·0001).105 In
cardiology, thrombolysis for acute myocardial infarction is ineffective or harmful in patients with ST segment
depression, but highly beneficial in patients with ST elevation (interaction p<0·01),106 and early invasive
treatment of unstable angina is of no benefit in patients with only minor ST segment change but of major benefit
in patients with more marked changes (interaction p=0·006).107 The stage of disease can also determine the effect
of treatment of non-vascular disease, as is seen in people with cancer,108,109 or HIV/AIDS.110–112

Timing of treatment and comorbidity
Effect of treatment is often critically dependent on timing, as shown in figure 1, for benefit from endarterectomy
for recently symptomatic carotid stenosis. The risk of a stroke is very high during the first few days and weeks
after a transient ischaemic attack,113 especially in patients with carotid stenosis,114 but falls rapidly with
time, as therefore does benefit from endarterectomy.70 Similar time-dependence has been shown for benefit from
thrombolysis for both acute myocardial infarction106 and acute ischaemic stroke.115
Treatment effects may also depend on comorbidity. For example, angiotensin-converting enzyme inhibitors and
angiotensin II receptor blocking drugs are harmful in patients with renovascular disease but highly beneficial in
other hypertensive patients.116 Benefit from diltiazem after myocardial infarction may depend on the presence of
heart failure because of the negative chronotropic and inotropic effects of the drug.117

[Bar chart. x axis: weeks from event to randomisation (0–2, 2–4, 4–12, 12+); y axis: ARR (%), 95% CI. Values shown
above the bars, in source order: 30·2, 17·6, 14·8, 11·4, 8·9, 3·3, 4·0, –2·9.]
Figure 1: Effect of carotid endarterectomy in patients with 50–69% and ≥70% symptomatic stenosis in relation to
time from last symptomatic ischaemic event to randomisation70
Numbers above bars indicate actual absolute risk reduction. Vertical bars are 95% CIs. ARR=absolute risk reduction.

Underuse of treatment in specific groups
Treatments that are effective in trials are often underused in specific groups of patients in routine practice.
For example, statins were not used in elderly people for many years until the drugs were proved highly effective by
subgroup analysis in the Heart Protection Study.53 Proof of some benefit by subgroup analysis was also needed to
counter underuse in elderly patients of thrombolysis for acute myocardial infarction,106 and similar underuse of
endarterectomy for symptomatic carotid stenosis.70 In each case, treatment had already been shown to be highly
effective overall. Use of treatment in routine clinical practice is also often inappropriately limited to
patients with measurements of physiological parameters above specific arbitrary cut-off points, such as treatment
thresholds for blood pressure and total cholesterol in prevention of vascular disease. There is increasing
evidence from subgroup analysis in large trials that such thresholds are inappropriate.53,87 Proof of the
generalisability of benefit is a major function of subgroup analysis. However, such analyses should be
sufficiently powered to detect benefit, and pooled analyses of multiple trials will often be needed for subgroups
such as elderly people who are commonly under-represented in trials.26

               Events/patients
Day of birth   Surgical   Medical   ARR (%)   95% CI           p value
Sunday         7/56       6/41      3·1       –11·3 to 17·5    0·34
Monday         4/66       10/44     16·7      3·0 to 30·3      0·008
Tuesday        8/76       6/28      10·5      –6·9 to 27·9     0·12
Wednesday      8/67       13/47     18·3      2·3 to 34·2      0·01
Thursday       9/75       9/36      12·8      –3·8 to 29·4     0·07
Friday         1/56       6/37      15·1      2·3 to 27·9      0·01
Saturday       6/51       8/41      9·5       –6·6 to 25·6     0·12
Total          43/447     58/274    12·3      6·5 to 18·1      <0·001

Heterogeneity: p=0·83

Figure 2: Effect of carotid endarterectomy in patients with ≥70% symptomatic stenosis in ECST126 according to day
of week on which patients were born
`
`Estimation and interpretation of subgroup
`effects
`
`“Far better an approximate answer to the right question,
`which is often vague, than an exact answer to the wrong
`question, which can always be made precise.”
`
`J W Tukey, 1962118
`
`Multiplicity, post hoc analyses, and publication bias
In one trial of β blockers after myocardial infarction,119 146
`subgroup analyses were done,120 several of which showed
`apparent differences in the effect of treatment. However,
`none of the differences were confirmed by subsequent
`studies.40 Pocock reviewed 50 trials published in major
`journals in 1997 and noted that 70% reported a median of
`four subgroup analyses,55 which was little changed from
`10 years previously.121 The reliability of these subgroups
`depends to a great extent on whether they were predefined
`and how many other analyses were done but not reported.
`Selective reporting of post hoc subgroup observations,
`which are generated by the data rather than tested by
`them, is analogous to placing a bet on a horse after
`watching the race. There is certainly evidence of selective
`reporting of significant analyses,122–124 but this is difficult to
`judge when assessing an individual trial. The only
`solution is for a small number of potentially important
`subgroups to be pre-defined in the trial protocol, along
`
`with their anticipated directions. Post hoc observations are
`not automatically invalid (many medical discoveries have
`been fortuitous), but they should be regarded as unreliable
`unless they can be replicated.
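The scale of the multiplicity problem can be sketched with elementary probability. The example below assumes that every test is independent, that no true subgroup effect exists, and that each test is judged at p<0·05; these are simplifying assumptions, and the function names are hypothetical. The Bonferroni division shown is one simple adjustment, used here only for illustration.

```python
def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """Chance of one or more false positives among independent tests of true nulls."""
    return 1 - (1 - alpha) ** n_tests

def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of false positives when every null hypothesis is true."""
    return n_tests * alpha

def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold that keeps the overall error rate near alpha."""
    return alpha / n_tests

for k in (1, 4, 10, 146):
    print(f"{k:>3} analyses: P(any false positive) = "
          f"{prob_at_least_one_false_positive(k):.3f}, "
          f"expected false positives = {expected_false_positives(k):.1f}, "
          f"Bonferroni threshold = {bonferroni_threshold(k):.5f}")
```

In practice subgroup analyses within one trial are correlated, so these figures are only a rough guide, but they show why an unreported number of analyses makes any single significant result hard to interpret.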
`
`Statistical significance
Subgroup analyses can be wrong in two ways.
`First, they can falsely indicate that treatment is beneficial
`in a particular subgroup when the trial shows no overall
`effect—the situation in which subgroup analyses are most
`commonly done.56,57 Simulations of RCTs powered to
`determine the overall effect of treatment suggest that false
`subgroup effects will be noted by chance in 7%–21% of
analyses depending on other factors.58 Second, and more commonly (in
41%–66% of simulated subgroups), they can falsely
indicate that there is no treatment effect in a particular
subgroup when the trial shows benefit overall.58 Benefit is
`most likely to be absent in small subgroups, which
`probably explains the recurrent and usually mistaken
`finding that treatments are ineffective in women29,32,125 and
`in elderly people,32,35 who tend to be under-represented in
`RCTs.26 The correct analysis is not the significance of the
`treatment effect in one subgroup or the other, but whether
`the effect differed significantly between the subgroups—
`the test of subgroup-treatment effect interaction. For
`example, although endarterectomy for severe stenosis in
`the European Carotid Surgery Trial (ECST)126 was only
`significantly beneficial in patients born on specific days of
`the week (figure 2), this was, of course, due to chance and
`there was no subgroup-treatment effect
`interaction
`(p=0·83). Data from simulation studies have shown that
`tests of subgroup-treatment effect interaction are reliable,
`with a false positive rate of 5% at p<0·05, which is robust
`to differences in the size of subgroups, the number of
`categories, and to continuous data.58 However, although
`testing of subgroup-treatment effect interactions is widely
`recommended,51,55–57,121 Pocock’s review showed that 37% of
`RCTs reported only p values for treatment effect within
`subgroups and only 43% reported tests of interaction.55
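The difference between testing within subgroups and testing for interaction can be shown with a small worked example. The sketch below uses hypothetical event counts in which the true effect is identical in a large and a small subgroup, and a simple Wald z-test on the difference in absolute risk reductions. That test is an assumption made for illustration, not the method used in the trials cited here.

```python
import math

def risk_difference(events_control, n_control, events_treated, n_treated):
    """Return (ARR, SE), where ARR = control risk minus treated risk."""
    pc = events_control / n_control
    pt = events_treated / n_treated
    se = math.sqrt(pc * (1 - pc) / n_control + pt * (1 - pt) / n_treated)
    return pc - pt, se

def two_sided_p(z):
    """Two-sided p value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical counts: 20% vs 14% event risk in both subgroups,
# but subgroup B is a quarter of the size of subgroup A.
arr_a, se_a = risk_difference(120, 600, 84, 600)
arr_b, se_b = risk_difference(30, 150, 21, 150)

p_a = two_sided_p(arr_a / se_a)  # treatment effect within subgroup A
p_b = two_sided_p(arr_b / se_b)  # treatment effect within subgroup B
z_interaction = (arr_a - arr_b) / math.sqrt(se_a ** 2 + se_b ** 2)
p_interaction = two_sided_p(z_interaction)

print(f"subgroup A: ARR {arr_a:.1%}, p = {p_a:.3f}")
print(f"subgroup B: ARR {arr_b:.1%}, p = {p_b:.3f}")
print(f"interaction: p = {p_interaction:.3f}")
```

Here the within-subgroup p values differ only because of sample size: the estimated effects are the same, and the interaction test correctly gives no evidence that treatment works differently in the smaller subgroup.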
`
`
                 Events/patients
Month of birth   Surgical   Medical   ARR (%)   95% CI
May–Jun          6/83       18/47     33·4      18·2 to 48·6
Jul–Aug          8/84       16/58     20·7      7·0 to 34·4
Sept–Oct         10/87      7/34      9·6       –6·2 to 25·3
Nov–Dec          6/56       9/39      11·2      –5·2 to 27·6
Jan–Feb          9/73       6/43      0·1       –13·1 to 13·2
Mar–Apr          12/64      6/53      –7·7      –20·8 to 5·3
Total            51/447     62/274    11·6      5·6 to 17·6

Heterogeneity: p<0·0001

Figure 3: Effect of carotid endarterectomy in patients with ≥70% symptomatic stenosis in ECST126 according to
month of birth in six 2 month periods
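The behaviour of meaningless subgroups such as day of birth can be explored by simulation. The sketch below is a minimal illustration under stated assumptions: every patient has the same true benefit (event risk 20% on control, 10% on treatment), day of birth is random noise, subgroup effects are tested with a Wald z-test, and heterogeneity is tested with Cochran's Q against the 5% critical value of the chi-square distribution with 6 degrees of freedom. The trial size, risks, seed, and function names are all hypothetical, and the simulation does not reproduce the ECST data.

```python
import math
import random

CHI2_CRIT_DF6 = 12.592  # 95th percentile of chi-square with 6 degrees of freedom

def simulate_trial(rng, n=1400, risk_control=0.20, risk_treated=0.10):
    """One trial with the same true benefit for everyone; 'day' is pure noise."""
    # counts[day][arm] = [events, patients]; arm 0 = control, arm 1 = treated
    counts = [[[0, 0], [0, 0]] for _ in range(7)]
    for _ in range(n):
        day, arm = rng.randrange(7), rng.randrange(2)
        risk = risk_treated if arm else risk_control
        counts[day][arm][0] += rng.random() < risk
        counts[day][arm][1] += 1
    return counts

def analyse(counts):
    """Return (number of significant days, number of usable days, Cochran's Q)."""
    effects = []
    for (ec, nc), (et, nt) in counts:
        if nc == 0 or nt == 0:
            continue
        pc, pt = ec / nc, et / nt
        se = math.sqrt(pc * (1 - pc) / nc + pt * (1 - pt) / nt)
        if se > 0:
            effects.append((pc - pt, se))
    n_significant = sum(abs(arr / se) > 1.96 for arr, se in effects)
    weights = [1 / se ** 2 for _, se in effects]
    pooled = sum(w * arr for w, (arr, _) in zip(weights, effects)) / sum(weights)
    q = sum(w * (arr - pooled) ** 2 for w, (arr, _) in zip(weights, effects))
    return n_significant, len(effects), q

rng = random.Random(2005)  # fixed seed so the run is repeatable
n_trials = 200
some_day_not_significant = 0
heterogeneity_flagged = 0
for _ in range(n_trials):
    n_sig, n_days, q = analyse(simulate_trial(rng))
    some_day_not_significant += n_sig < n_days
    heterogeneity_flagged += q > CHI2_CRIT_DF6

print(f"trials with at least one 'ineffective' day: {some_day_not_significant / n_trials:.0%}")
print(f"trials with significant heterogeneity:      {heterogeneity_flagged / n_trials:.0%}")
```

Under these assumptions almost every simulated trial contains at least one day of birth on which treatment appears not to work, whereas the heterogeneity test flags only a small proportion of trials, which is the pattern shown in figure 2.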
`
`Chance
`The effect of chance on subgroup analyses is usually
`illustrated with the ISIS-2 trial example (aspirin vs placebo
`in acute myocardial infarction), in which aspirin was
`ineffective in patients born under the star signs of Libra
`and Gemini (150 deaths on aspirin vs 147 on placebo,
`2p=0·5), but was beneficial in the remainder (654 deaths
`on aspirin vs 869 on placebo, 2p<<0·0001).3–5 The
`significance of this subgroup treatment effect interaction
`has never been reported, but it seems to be p=0·01
`(Breslow Day test). However, Li