`Statist. Med. 2003; 22:169–186 (DOI: 10.1002/sim.1425)
`
`Non-inferiority trials: design concepts and issues – the
`encounters of academic consultants in statistics
`
Ralph B. D’Agostino Sr.1,2,∗,†, Joseph M. Massaro1,2,3 and Lisa M. Sullivan1,3

1Boston University Statistics and Consulting Unit, 111 Cummington Street, Boston, MA 02215, U.S.A.
2Harvard Clinical Research Institute, 930 Commonwealth Avenue, Boston, MA 02215, U.S.A.
3Boston University, Biostatistics, 715 Albany Street, Boston, MA 02118, U.S.A.
`
`SUMMARY
Placebo-controlled trials are the ideal for evaluating medical treatment efficacy. They allow for control of
the placebo effect and are most efficient, requiring the smallest numbers of patients to detect a treatment
effect. A placebo control is ethically justified if no standard treatment exists, if the standard treatment
has not been proven efficacious, if there are no risks associated with delaying treatment, or if escape clauses
are included in the protocol. Where possible and justified, they should be the first choice for medical
treatment evaluation. Given the large number of proven effective treatments, placebo-controlled trials are
often unethical. In these situations active-controlled trials are generally appropriate. The non-inferiority
trial is appropriate for evaluation of the efficacy of an experimental treatment versus an active control
when it is hypothesized that the experimental treatment may not be superior to a proven effective
treatment, but is clinically and statistically not inferior in effectiveness. These trials are not easy to
design. An active control must be selected. Good historical placebo-controlled trials documenting the
efficacy of the active control must exist. From these historical trials statistical analysis must be performed
and clinical judgement applied in order to determine the non-inferiority margin M and to assess assay
sensitivity. The latter refers to establishing that the active drug would be superior to the placebo in
the setting of the present non-inferiority trial (that is, the constancy assumption). Further, a putative
placebo analysis of the new treatment versus the placebo using data from the non-inferiority trial and the
historical active versus placebo-controlled trials is needed. Useable placebo-controlled historical trials
for the active control are often not available, and determination of assay sensitivity and an appropriate
M is difficult and debatable. Serious consideration of expansions of, and alternatives to, non-inferiority
trials is needed. Copyright © 2003 John Wiley & Sons, Ltd.
`
KEY WORDS: control group; clinical trial; placebo control; active control; equivalence;
non-inferiority; assay sensitivity
`
`1. INTRODUCTION
`
The randomized clinical trial (RCT) is one of the most important advances in the twentieth
century [1–3]. Its importance grew as evidence-based medicine became the norm for establishing
efficacy of drugs, biologics and medical devices. In the early 1900s the efficacy of
`
∗ Correspondence to: Ralph B. D’Agostino, Boston University Statistics and Consulting Unit, 111 Cummington
Street, Boston, MA 02215, U.S.A.
† E-mail: Ralph@bu.edu
`
`
`Received October 2002
`Accepted October 2002
`
medical treatments was based on anecdotal evidence, often gathered on one or several patients
(medical reports and case series). Some treatments had profound effects such that evidence
based on few patients was convincing (for example, penicillin). In general this was not the
case. Later, more rigorous studies followed in which several patients were given the same
treatment and evaluated. Many of these studies, however, were uncontrolled. Bradford Hill
pointed out the problems of these and set the stage for RCTs in the medical arena [4]. Others
illustrated the importance of RCTs and the potential deception of uncontrolled clinical trials
by contrasting the ‘positive results’ reported in uncontrolled trials versus RCTs [5–7]. Spilker
gave a review in four major clinical areas: psychiatry, depression, respiratory distress, and
rheumatoid arthritis [5]. In each area, a substantially higher proportion of positive findings
were reported in uncontrolled trials as compared to RCTs. For example, in psychiatric therapy
trials, 83 per cent of uncontrolled trials reported positive findings, as compared to only 25
per cent of RCTs [6]. In rheumatoid arthritis trials, 62 per cent of uncontrolled trials reported
positive findings, as compared to only 25 per cent of RCTs [7]. The RCT can distinguish the
effects of a medical treatment from other effects, such as spontaneous changes in the course
of the disease, the body’s natural healing, improvement due to participating in a study (that
is, the placebo effect), and biases in observation and measurement. Few now doubt the virtues
of RCTs for assessing medical treatment efficacy.
`The United States’ Food and Drug Administration (FDA) emphasizes the need for RCTs for
`medical treatment (drugs, biologics and devices) approval. For example, the Code of Federal
`Regulations (CFR) Title 21, Part 314, outlines the procedures for applications to the FDA for
`approval to market new drugs and Section 126 outlines the criteria of ‘adequate and well-
`controlled’ studies [8]. Focus is on the RCT. The same emphasis holds in the international
`setting. The International Conference on Harmonisation (ICH) is attempting to consolidate
`procedures for the registration of pharmaceuticals in the European Union, Japan and the United
`States. The ICH E9 guidance document discusses statistical principles for clinical trials [9].
`The ICH E10 guidance document discusses the selection of appropriate controls in clinical
trials [10, 11]. The latter document describes five types of controls (placebo, no treatment,
dose–response, active and historical), and outlines the advantages and disadvantages of each.
The first four controls are concurrent controls. These controls in randomized clinical trials are
`preferable to historical controls as patients for both the test and control treatments are drawn
`from the same population and studied under similar conditions, thereby minimizing bias in
`the comparison. Of all the possible RCTs, to many the ideal is the placebo-controlled RCT.
In the absence of effective treatments, placebo-controlled RCTs are uncontroversial. When,
however, a proven effective treatment exists, the ethics of the placebo-controlled trials are
`questionable. In this setting, the attacks against placebo-controlled trials are many and sub-
`stantial [12–15]. Of most importance is the Declaration of Helsinki [16]. Article II.3 of this
`states ‘In any medical study, every patient – including those of a control group, if any – should
`be assured of the best proven diagnostic and therapeutic method. This does not exclude the
`use of inert placebo studies where no proven diagnostic or therapeutic methods exists’. Many
interpret this to mean that when an effective treatment exists the use of a placebo is unethical
and should not be included in an RCT. Others, including prestigious groups such as the American
Medical Association and the World Health Organization, leave room for the possible use
`of placebo-controlled RCTs under certain circumstances (see Section 2) [17–21].
`The active-controlled trial has been one response to the attack on placebo-controlled trials.
`Here the new experimental treatment is compared to a proven active control treatment. The
new treatment may not be superior to the active treatment in terms of efficacy, but it may
be equivalent. Borrowing ideas from the field of bioequivalency, medical researchers including
clinicians and statisticians developed equivalency trials with their design issues and the
necessary statistical testing procedures [22–27]. Upon further clarification of the issues, it became
clear that what was desired were non-inferiority trials (or more precisely, non-inferiority
active-controlled RCTs), even if the term ‘equivalency trials’ is often used. The objective of a
non-inferiority clinical trial is to establish that the effect of the new treatment, when compared
to the active control, is not below some pre-stated non-inferiority margin.
`The designing, implementation and analysis of non-inferiority trials have presented substan-
`tial challenges and issues for the pharmaceutical, biologics and medical device industries. The
FDA and its scientists are well aware of these [11, 28, 29]. In our roles as academic consultants,
we are constantly asked by industry sponsors to advise on when a non-inferiority trial
is warranted, to clarify the unique design concepts and the issues involved, to help
design, implement and perform the trial, and ultimately to aid in the analysis and interpretation
of the study. In this paper we focus on the design concepts and issues involved. We illustrate
these with real-world examples, many of which we have encountered.
In Section 2 we review the usefulness of the placebo-controlled trial and the situations
where it may be justified, even when proven active treatments exist. Section 3 discusses
`two major issues in active-controlled non-inferiority trials: (i) the statistical hypotheses and
`tests involved in a non-inferiority trial and (ii) the selection of the non-inferiority margin.
`The latter includes discussion of clinical meaningfulness, assay sensitivity (which relates to
`establishing that the active treatment and in turn the experimental treatment would have been
`superior to placebo had a placebo been used in the trial), and the fear of what is called
`‘biocreep’. Section 4 concerns the putative placebo analysis as a means of establishing that
`the new treatment is superior to placebo. Section 5 deals with selecting the appropriate sample
`to use for the statistical analysis. In Section 6 we discuss the role of interim analysis. Then in
`Section 7 we expand the non-inferiority trial to consider safety issues and also review some
`alternatives to non-inferiority trials. Finally, in Section 8 we give a brief closing discussion
`and some recommendations.
`
`2. PLACEBO-CONTROLLED TRIALS
`
`An appropriate control group is always essential and, when feasible, a placebo control is
`optimal. Figures 1 and 2 demonstrate the problem when a study does not contain a placebo
`control. The comparison of the active control C with the test treatment T in Figures 1 and 2
`indicates that the two treatments are similar. However, if a placebo group is not included in
`the study, then one can never be sure if the new treatment is better than the placebo, as
Figure 1 indicates, or not different from the placebo, as Figure 2 indicates. Figure 1 corresponds
to both C and T being effective, Figure 2 to neither being effective.
Historically, a placebo control group was the usual optimal control group for establishing
efficacy of an experimental treatment. It has been the basis for many FDA approvals. Superiority
of the experimental treatment over placebo in two well controlled and performed
RCTs justified approval. At times it was essential to establish that the trial had sensitivity
(sometimes called assay sensitivity) and an active control was added as, for example, in
analgesic studies [30, 31]. Here the comparison of the active control to the placebo was an
essential component of the analysis. The comparison of the active control to the experimental
treatment was not required. The ideal was a study with a placebo, an active control and an
experimental treatment.

[Figure 1. Comparison of test treatment (T) with active control (C) and unobserved
placebo (P) (T and C superior to P). Bars: Control, Treatment, Placebo; axis: % Effective.]

[Figure 2. Comparison of test treatment (T) with active control (C) and unobserved
placebo (P) (T and C not superior to P). Bars: Control, Treatment, Placebo; axis: % Effective.]

Now with the large array of proven effective treatments, ethical considerations cast doubts on
the appropriateness of using a placebo control. Dose response trials are possible alternatives,
but they also raise ethical problems since the low dose may not be any different than a
placebo. So when is a placebo control justified in the presence of proven active treatments?
We agree with Ellenberg and Temple [21] ‘that placebo controls are ethical when delaying
or omitting available treatment has no permanent adverse consequences for the patient and
as long as patients are fully informed about the alternatives’. We also believe escape clauses
should be included in the protocol.
`An active control arm may be included in the RCT, but the active control is there for reasons
`such as assay sensitivity. It is not necessary for comparison with the experimental treatment.
`Thus for many over-the-counter drug situations such as pain, headaches, upset stomach and
`the treatment of the common cold, placebo-controlled trials are ethical. Ellenberg and Temple
`[20, 21] discuss numerous prescription drug situations involving, for example, antidepressants
`and short term trials (such as some anti-hypertensive trials), and settings where the available
‘effective treatment’ may not be uniformly accepted as standard treatment and so placebo-controlled
trials are justified.
`
3. ACTIVE-CONTROLLED TRIALS/NON-INFERIORITY TRIALS
`
`Now let us move to the situation where the placebo control is considered unethical or for
`some other reason is deemed inappropriate. This leads us to active-controlled trials in which
the experimental treatment is compared directly to a proven effective active control. If the
`sponsor believes the experimental treatment is superior to the active control, then a standard
`superiority trial with the objective of showing that the experimental treatment is statistically
`and clinically superior to the active control is appropriate.
`What, however, if anticipated superiority is not the case? Then a non-inferiority trial (that
`is, a trial with the objective of showing that the experimental treatment is statistically and
clinically not inferior to the active control) may be appropriate. A sponsor of an experimental
treatment may logically decide to conduct a non-inferiority trial even when the sponsor believes
the active control’s efficacy cannot be surpassed. Why? The new product may offer safety
advantages. For example, a new anti-infective product may produce no resistant bacteria, a
new respiratory distress product for premature infants may be synthetic as opposed to animal
derived and pose less risk, a new asthma treatment inhaler may have no chlorofluorocarbons
in contrast to the standard product [23]. In the case of HIV treatments, new products may
have simpler regimens promoting adherence and potentially reducing resistance. It is even
possible that costs, marketing and potential profits are the underlying reasons. For example,
the new product may be less expensive or the sponsor may have better access to
the markets.
`
`3.1. Statistical algorithm for assessing non-inferiority
The statistical algorithms for assessing non-inferiority (and equivalency) are in Blackwelder’s
paper [22]. We give a brief summary here and in Table I. Let T and ‘Test’ represent the value
of the efficacy variable for the new (experimental) treatment. Similarly let C and ‘Control’
and P and ‘Placebo’ represent the values of the efficacy variable for the active control and
placebo, respectively. Further, say we have a trial where higher values of this efficacy variable
are desirable. The standard null and alternative hypotheses for proving non-inferiority are
H0: C − T ≥ M (C is superior to T)
H1: C − T < M (T is not inferior to C)

Table I. Hypotheses for a non-inferiority trial.
H0: C − T ≥ M (C superior to T)
H1: C − T < M (T not inferior to C)
Here T is the new treatment, C is the active control and M is the non-inferiority margin.
`
Here, M is the non-inferiority margin, that is, how much C can exceed T with T still being
considered non-inferior to C (M > 0). The null hypothesis states that the active control C
exceeds the experimental treatment T by at least M; if this cannot be rejected, then the active
control is considered superior to the experimental treatment with respect to efficacy. The
alternative hypothesis states that the active control may indeed have better efficacy than the
experimental treatment, but by no more than M. In such a case, we say the investigational
product is not inferior to the active control. Rejection of the null hypothesis is needed to
conclude non-inferiority.
One of the major issues today in non-inferiority clinical trials is the choice of M. We discuss
this in Section 3.2. We should note here that the above displays the statistical hypotheses as
differences between the treatments. The hypotheses could be in terms of means or proportions
of successes. Also, depending on the application the hypotheses could be stated in terms of
ratios (C/T ≥ M), logs (log C − log T ≥ M), etc.
In order to assess if non-inferiority is met (that is, whether the null hypothesis is rejected)
we can perform a one-sided hypothesis test at the α level of significance. Equivalently, we can
compute a 100(1 − 2α) per cent two-sided confidence interval for the difference (C − T). If the
confidence interval’s upper bound is less than M, then with 100(1 − 2α) per cent confidence,
we say the active control is more efficacious than the investigational product by no more than
M, hence allowing us to claim non-inferiority of the experimental product as compared to the
active control at an α level of significance.
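
As a minimal sketch of the equivalence just described, the following assumes a dichotomous success outcome and a simple Wald (normal approximation) interval for the difference in proportions; neither choice is prescribed by the text, and the function name and the counts used in the usage note are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def noninferiority_by_ci(x_c, n_c, x_t, n_t, margin, alpha=0.025):
    """Assess H0: C - T >= M against H1: C - T < M for success proportions.

    Assumption (not from the paper): a Wald normal-approximation interval.
    Returns (upper bound of the two-sided 100(1 - 2*alpha)% CI for C - T,
             one-sided p-value, non-inferiority decision).
    """
    p_c = x_c / n_c                      # observed success rate, active control
    p_t = x_t / n_t                      # observed success rate, test treatment
    diff = p_c - p_t                     # estimate of C - T
    se = sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    z = NormalDist().inv_cdf(1 - alpha)  # e.g. about 1.96 for alpha = 0.025
    upper = diff + z * se                # upper bound of the two-sided CI
    # One-sided test: small values of (diff - margin) / se favour H1.
    p_value = NormalDist().cdf((diff - margin) / se)
    return upper, p_value, upper < margin
```

With hypothetical counts of 160/200 successes on control and 158/200 on test, `noninferiority_by_ci(160, 200, 158, 200, margin=0.10)` concludes non-inferiority, while the same data with `margin=0.05` do not; in both cases the confidence interval decision and the one-sided test at level α agree.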
`
`3.2. Choosing the non-inferiority margin M
Prior to mounting the active-controlled non-inferiority trial (or at least before the blinding of
the trial is broken) we need to state the non-inferiority margin M, that is, how close the new
treatment T must be to the active control treatment C on the efficacy variable in order for the
new treatment to be considered non-inferior to the active control. The ICH documents offer
two guidelines [10]:

1. The determination of the margin in a non-inferiority trial is based on both statistical
reasoning and clinical judgement, and should reflect uncertainties in the evidence on
which the choice is based, and should be suitably conservative.
2. This non-inferiority margin cannot be greater than the smallest effect size that the active
drug would be reliably expected to have compared with placebo in the setting of a
placebo-controlled trial.
`
While the first guideline mentions ‘clinical judgement’ we have never seen a case where
this has actually been employed. Although there is often talk that C and T should be within some
percentage of one another (for example, the sponsor says 20 per cent while the FDA says 10
per cent), clinical judgement does not seem to be the deciding factor. Rather, the determination
becomes a statistical discussion usually focusing on trying to extract information from
historical data. To the dismay of some, the statisticians seem to have taken control of this
issue.

[Figure 3. Considerations in the determination of non-inferiority margin M. (a) Assay
sensitivity in a non-inferiority trial: the ability of a specific trial to detect a difference between
treatments if one exists (that is, assay is working and can detect a difference). Step 1: the
historical effect of active control versus placebo is of a specified size and there is belief that
it is maintained in the present trial (C > P). (b) Assessment of non-inferiority and putative
placebo comparison. Step 2: the trial has the ability to recognize when the test drug is within
the non-inferiority margin (M) of control. Step 3: the test drug is superior to a placebo by a
specified amount (shown as 0.8(C − P)).]

Attempts have been made to take a statistical approach; specifically to combine data from
historical placebo-controlled trials of the active drug C and determine M so that it reflects
the uncertainty in the historical data and is not greater than the smallest reliable effect size
of the active treatment versus a placebo [32, 33].
`
3.2.1. Our summary of the determination of the non-inferiority margin M. In our review
of the field, the determination of M must address three steps or issues. We present them here
and display them in Figure 3.
`First, in the non-inferiority trial we must have assurance that the active control would have
`been superior to a placebo if a placebo were employed. This is the need to demonstrate
`or establish assay sensitivity. The use of past placebo-controlled trials often accomplishes
`this. We must have available historical data in which it has been established that the active
control C is superior to the placebo P. Further, we must invoke a very strong assumption, the
constancy assumption, namely, that the historical difference between the active control and
placebo would hold in the setting of the new trial had a placebo control been used.
`This is step 1 in Figure 3.
`Second, the non-inferiority active-controlled trial should demonstrate that the new treatment
`T is within the non-inferiority margin M of the active control C (step 2 in Figure 3). This
`margin should have clinical relevance.
Third, it is then necessary to use the C versus T data (step 2 of Figure 3) in conjunction
with the C versus P historical placebo-controlled trial data (step 1 of Figure 3) to demonstrate
that T is superior to P. This step is the putative placebo comparison. In conjunction with
this step it is often necessary to establish that not only is the new treatment superior to the
placebo, but that it also retains at least a certain amount of the superiority of the active
control over placebo (say, 80 per cent or 50 per cent). Figure 3, step 3, illustrates this last
step. If we think of (C − P) as representing the difference between the active control and the
placebo and (T − P) as the difference between the new treatment and the placebo, then the
amount retained by the new treatment is (T − P)/(C − P). Jones et al. favour 50 per cent
[32]. This seems to be where the clinical community is leaning.
One way of viewing M is that it should be no larger than X(C − P) where C and P are
based on historical placebo-controlled trials of the active control C versus the placebo P and
X is 1 minus the amount of the difference (C − P) we desire to retain with the experimental
treatment (for example, X = 1 − 0.8 = 0.2 or 1 − 0.5 = 0.5).
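
The link between the bound on M and the fraction retained can be written out as a short derivation. This is a sketch that treats C − P as a known positive constant and ignores sampling variability; the symbol f for the fraction retained (so that X = 1 − f) is notation introduced here, not in the original.

```latex
% Assumptions: C - P > 0 is treated as known; f is the fraction to retain, X = 1 - f.
\begin{align*}
  C - T &< M \le X\,(C - P) = (1 - f)\,(C - P), \\
  T - P &= (C - P) - (C - T) > (C - P) - (1 - f)\,(C - P) = f\,(C - P), \\
  \frac{T - P}{C - P} &> f .
\end{align*}
```

So rejecting the null hypothesis with a margin no larger than X(C − P) implies, under these idealized assumptions, that the new treatment retains more than the fraction f of the active control's advantage over placebo.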
To employ the above, the historical difference (C − P) in Figure 3 must be estimated and
this estimate must incorporate the variability in the historical data. Ideally, good historical
placebo-controlled data from more than one study are available. In such an ideal situation
(C − P) could be estimated as follows. Estimate C − P for each study and its corresponding
two-sided 95 per cent confidence interval. Of all the confidence intervals, use the ‘smallest’
lower bound (that is, the lower bound that yields the smallest value of C − P). This
is the most conservative estimate of (C − P). Another approach would be to perform a
meta-analysis of the historical studies and use the average estimate of (C − P) or the lower
confidence limit. Hauck and Anderson [24] discuss more formal approaches for estimating
(C − P) and M from previous active versus placebo trials, accounting for both within-trial
and across-trial variability. At the present time there is no universally accepted way of doing
this.
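
The ‘smallest lower bound’ rule above can be sketched as follows. This is an illustration only: it assumes dichotomous outcomes and Wald intervals, the function names are hypothetical, and the trial counts in the usage note are invented for demonstration rather than taken from any real study.

```python
from math import sqrt
from statistics import NormalDist


def diff_ci(x_c, n_c, x_p, n_p, level=0.95):
    """Two-sided Wald CI for C - P from one historical trial (assumed method)."""
    p_c, p_p = x_c / n_c, x_p / n_p
    se = sqrt(p_c * (1 - p_c) / n_c + p_p * (1 - p_p) / n_p)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    d = p_c - p_p
    return d - z * se, d + z * se


def conservative_margin(trials, retain=0.5):
    """M = X * (smallest lower confidence bound of C - P), with X = 1 - retain.

    `trials` is a list of (x_c, n_c, x_p, n_p) success counts and sample sizes.
    """
    smallest_lower = min(diff_ci(*t)[0] for t in trials)
    if smallest_lower <= 0:
        # A non-positive bound means C was not reliably better than P in at
        # least one trial, so this rule cannot supply a usable margin.
        raise ValueError("smallest lower bound of C - P is not positive")
    return (1 - retain) * smallest_lower
```

For two invented historical trials, `conservative_margin([(150, 200, 100, 200), (140, 200, 110, 200)], retain=0.5)` gives a margin of roughly 0.028, driven entirely by the second, less favourable trial. The `ValueError` branch corresponds to the situation raised in the caveats below, where the smallest estimate of (C − P) is not statistically significant.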
`
3.2.2. Some caveats. The determination of M described above is subject to several caveats:
`
1. Assay sensitivity. As we mentioned above, in some areas, such as the analgesic field,
there is a need to include both a placebo control and an active control in the same trial
in order to ensure assay sensitivity [30]. No matter how much historical data exists there
is no assurance that the next trial will have assay sensitivity. One can argue, for those
fields, the use of historical data does tell us about the historical difference between the
active and placebo controls, but not necessarily anything about assay sensitivity for the
non-inferiority trial.
`2. Constancy assumption. With the rapid changes in medical practice and standard of care
`we may not be correct in saying that the historical di(cid:2)erence between the active control
`and placebo is valid for the present day. In our experience this constancy assumption is
`often a major issue, at times putting an end to a discussion for a formal determination
`on M .
3. Variability of (C − P). Suppose the estimate of C − P differs markedly across previous
active versus placebo clinical trials. Which is the most appropriate estimate to use for
determining M for the non-inferiority study? To be conservative, the smallest estimate
of (C − P) should be used, but is that too conservative? What if the smallest estimate
of (C − P) is not statistically significant? What if the smallest difference is a case where
assay sensitivity was not established?
`4. Small number of available historical placebo-controlled studies. Historical placebo-
`controlled trials are often not plentiful; it is the experience of the authors that for many
`indications, only one historical placebo-controlled trial exists. The estimate of (C − P)
from only one study often is called into question by regulatory agencies since there is
not an adequate estimate of the variability of the estimate of C − P.
`5. No available placebo-controlled studies. In our experience there are cases where there
`are no placebo-controlled studies. In such situations, one may try to work with previous
`dose response studies of the active control where the marketed dose of the active control
was compared with a low dose. Here the low dose effect may or may not be an adequate
substitute for a placebo effect.
`
3.2.3. Biocreep. Biocreep is the phenomenon that can occur when a slightly inferior treatment
becomes the active control for the next generation of non-inferiority trials and so
on until the active controls become no better than a placebo. This is a real possibility,
but it is easy to address. The active control comparator should always be the ‘best’
comparator.
`
3.2.4. Two examples. Example 1: No available placebo-controlled trials. Studies in
vancomycin-resistant-enterococcal (VRE) infection (where the outcome is success defined as cure of
the infection) often use the marketed product linezolid as the active control comparator.
Unfortunately, there are no published placebo-controlled studies of linezolid. The results of a
study have been published comparing high dose (that is, the marketed dose) linezolid versus
low dose. The results showed a difference in success rates of 14 per cent. While this approach
is conservative and may underestimate the true C − P, it is better than a simple guess at C − P.
The best value to use for M, however, is still not clear. For example, is one-half of 14 per
cent too conservative? At the very least, 7 per cent will lead to very large sample sizes, which
is problematic due to the very small number of patients with VRE. In this particular example,
because there is only one study comparing the marketed dose with a low dose, the reliability
of the estimate is also questionable.
Example 2: M and the history of anti-infective trials. The choice of a margin is quite
difficult and somewhat controversial in anti-infective trials. To underscore this fact, consider
a non-inferiority anti-infective trial comparing an experimental product to an active control
(the non-inferiority study design is quite common for anti-infectives, given the large number
of generic and non-generic anti-infectives already marketed). Suppose the outcome is cure or
improvement of infection (dichotomous ‘success’) at the ‘test-of-cure visit’ (which occurs at
a predetermined time interval after the last application of study treatment). Although there
are no official guidelines for the choice of M, a common recommendation from regulatory
agencies is to use M = 10 per cent, regardless of the specific type or severity of infection.
Until recently, however, the FDA considered a ‘step function’ for M. Here M = 0.10 (or 10
per cent) when it was thought that the cure rate of the active control and investigational drugs
was ≥90 per cent, an M of 15 per cent when the cure rate was thought to be between 80
and 90 per cent, and an M of 20 per cent when the cure rate was 80 per cent or below. The
FDA no longer suggests this step-down function for non-inferiority trials and has disclaimed
it on its web site.
The removal of the step-down function, and the unofficial FDA guideline of an M of 10 per
cent, caused a major concern in the anti-infective industry [34]. The FDA is now being more
conservative with M because of its concern over biocreep. This concern is understandable.
However, the concern over biocreep can be counteracted by the FDA regulation of the choice
of a comparator in such trials (for example, always use the ‘best’ comparator). Overall the
anti-infective industry is very concerned with using M = 0.10, especially in rare, serious infections,
since the sample size, cost and time implications can be enormous. For example, if the success
rate of both treatments is assumed to be 70 per cent and a non-inferiority margin of M = 15
per cent is used in the trial, then the number of evaluable subjects required is approximately
400 (this assumes a one-sided significance level of 0.025 and power of 0.90). The sample size
increases to approximately 900 evaluable subjects when the non-inferiority margin is reduced
to M = 10 per cent. Enrolling such numbers of patients can be practically impossible for rare,
serious infections.
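
The sample sizes quoted above can be approximately reproduced with a standard normal-approximation formula. The sketch below assumes equal allocation, a true difference of zero between treatments, and the unpooled variance 2p(1 − p); the paper does not state which formula was used, so this is one plausible reconstruction, and the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p, margin, alpha=0.025, power=0.90):
    """Evaluable subjects per arm for a non-inferiority test of two proportions.

    Assumptions (not from the paper): both arms share true success rate p,
    equal allocation, normal approximation with variance 2 * p * (1 - p).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # one-sided significance level
    z_beta = NormalDist().inv_cdf(power)
    return ceil((z_alpha + z_beta) ** 2 * 2 * p * (1 - p) / margin ** 2)


total_m15 = 2 * n_per_group(0.70, 0.15)  # 2 * 197 = 394, close to 400
total_m10 = 2 * n_per_group(0.70, 0.10)  # 2 * 442 = 884, close to 900
```

Under these assumptions the required total grows with 1/M², so tightening the margin from 15 to 10 per cent multiplies the sample size by (0.15/0.10)² = 2.25.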
`
`4. PUTATIVE PLACEBO ANALYSIS
`
Assay sensitivity of the active control is determined from the historical active- versus placebo-controlled
trials. In the above, the putative placebo comparison of the new experimental
treatment to the placebo was satisfied by requiring that the new experimental treatment retains
a portion of the active control’s superiority to the placebo. A second approach due to Lloyd
Fisher [35] has been published by Hasselblad and Kong [36]. This method involves estimating
the effect of the new experimental treatment compared to the placebo by a set of ratios as
follows:
`
T versus P = T/P = (T/C) × (C/P)

T/C and C/P can be, for example, the relative risks comparing treatments. Note T/C is from
the non-inferiority trial and C/P is from a meta-analysis of the historical placebo-controlled
trials, so the Cs are from different data sets. The approach is clever because from the above
we can obtain an estimate of the variance of the effect of the new treatment relative to placebo.
We obtain this simply by taking logs

ln(T/P) = ln(T/C) + ln(C/P)

and

var(ln(T/P)) = var(ln(T/C)) + var(ln(C/P))

Here var denotes variance. Note that all the quantities on the right side of the equations
are obtainable from existing data. Odds ratios can be dealt with using ratios directly. Others
have suggested similar methods [24, 37] and even a Bayesian approach has been developed
[38].
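
The two equations above translate directly into a short calculation. The sketch below assumes the two log relative risks are independent (consistent with their coming from different data sets) and approximately normal; the function name and the numerical inputs in the usage note are hypothetical and chosen only to illustrate the arithmetic.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def putative_placebo(rr_tc, se_log_tc, rr_cp, se_log_cp, level=0.95):
    """Indirect estimate of T/P from T/C (current trial) and C/P (historical).

    Assumptions (illustrative): inputs are relative risks with standard errors
    on the log scale, the two estimates are independent, and a normal
    approximation is used for the confidence interval.
    Returns (estimate, lower bound, upper bound) on the ratio scale.
    """
    log_tp = log(rr_tc) + log(rr_cp)            # ln(T/P) = ln(T/C) + ln(C/P)
    se = sqrt(se_log_tc ** 2 + se_log_cp ** 2)  # variances add
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return exp(log_tp), exp(log_tp - z * se), exp(log_tp + z * se)
```

For instance, with an invented T/C relative risk of 1.05 (log-scale standard error 0.08) for an adverse outcome and an invented historical C/P relative risk of 0.70 (standard error 0.10), `putative_placebo(1.05, 0.08, 0.70, 0.10)` gives an estimated T/P of 0.735 with a 95 per cent upper bound below 1, which would be read as indirect evidence that the new treatment is better than placebo.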
`For