3. Recent developments in the design of phase II clinical trials

Peter F. Thall and Richard M. Simon

Introduction
Clinical trials of new medical treatments may be classified into three successive phases. Phase I trials typically are small pilot studies to determine the therapeutic dose of a drug, biological agent, radiation schedule, or a combination of these regimens (cf. [1]). In cancer therapeutics, the underlying idea is that a higher dose of the therapeutic agent kills more cancer cells but also is more likely to harm and possibly kill the patient. Consequently, toxicity is the usual criterion for determining a maximum tolerable dose (MTD), and most phase I cancer trials involve very small groups of patients, usually three to six patients per dose, with each successive group receiving a higher dose until it is likely that the MTD has been reached. A more refined approach that continually updates an estimate of the probability of toxicity has also been proposed by O'Quigley, Pepe and Fisher [2].
Once a dose and schedule of a new experimental regimen E have been determined, its therapeutic efficacy is evaluated in a phase II trial. Phase II trials are usually single-arm studies involving roughly n = 14 to 90 patients treated with E, with n usually well under 60. These studies typically are carried out within a single institution and are most prominent in clinical environments where there are many new treatments to be tested. The primary goal is to determine whether E has a level of antidisease activity sufficiently promising to warrant its evaluation in a subsequent phase III trial (described below). Phase II results also frequently serve as the basis for additional single-arm studies involving E in other combination regimens or dosage schedules. The main statistical objective of a phase II trial thus is to provide an estimator of the response rate associated with E (cf. [3]). Treatment success generally is characterized by a binary patient response, such as 50% or more shrinkage of a solid tumor or complete remission of leukemia, and the scientific focus is p, the probability of response with E. Patient response usually is defined over a relatively short time period in phase II, based on the underlying idea that short-term response is a necessary precursor to improved long-term survival and reduction in morbidity. Phase II trials are important because they are the primary means of selecting
`
Peter F. Thall (ed), RECENT ADVANCES IN CLINICAL TRIAL DESIGN AND ANALYSIS. Copyright © 1995 Kluwer Academic Publishers, Boston. All rights reserved. ISBN 9784-4613-5505.
`
Genentech 2066
Celltrion v. Genentech
IPR2017-01122
`
treatments for phase III evaluation, and moreover, many patients receive treatment within the context of a phase II trial.
The ultimate standard for evaluation of medical treatments is the randomized comparative phase III clinical trial. Phase III trials generally are large, multi-institutional studies with treatments evaluated in terms of long-term patient response, such as survival or time to disease progression. Phase III trials are designed and conducted to evaluate the effectiveness of a treatment relative to an appropriate control and with regard to endpoints that represent patient benefit, such as survival. To achieve such objectives, the trial design is based on statistical tests of one or more hypotheses and may require approximate balance and minimal sample size within important patient subgroups. Because they are larger and of longer duration than phase II trials, and typically involve multiple institutions, phase III trials are usually much more costly and logistically complicated. The results of phase III trials are broadly disseminated within the medical community and form the basis for changes and advances in general medical practice.
The simplest phase II design is a single-arm, single-stage trial in which n patients are treated with E. The data consist of the random variable Yn, namely, the number of successes after n patients are evaluated, which is binomial in n and p. The sample size is determined so that, given a fixed standard rate p0 that is of no clinical interest, a test of H0: p ≤ p0 versus H1: p ≥ p1 has type I error probability (significance level) ≤ α and type II error probability ≤ β for a given target response probability p1 = p0 + δ. The test is determined by a cutoff r, with H0 rejected if Yn ≥ r and H1 rejected if Yn < r. A type I error occurs if it is concluded that E is promising compared to standard therapy, i.e., if H1 is accepted, when in fact p ≤ p0. The consequences of this are that an uninteresting or even inferior treatment is likely to become the basis for a phase III trial, and that if future phase II trials using a combination therapy based on E are conducted, the patients in those trials will be treated with an inferior agent while phase II trials of other potentially promising new treatments are delayed. A type II error occurs if it is concluded that E is not promising compared to standard therapy, i.e., if H0 is accepted when in fact p ≥ p0 + δ. The power of the test is 1 − β, the probability of correctly accepting H1 when E really has success rate p0 + δ. The consequence of a type II error is that a promising treatment has been lost or its detection delayed. The required sample size n and test cutoff r are determined by specifying α, β, p0, and δ. Since there is a trade-off between type I and type II error, in practice typically (α, β) = (0.10, 0.10), (0.05, 0.20), or (0.05, 0.10). We shall refer to α and β, and more generally any parameters that describe a design's behavior, as its operating characteristics.
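The operating characteristics of a candidate (n, r) pair follow directly from binomial tail probabilities, and this is how entries such as those in table 1 can be checked. The sketch below is our own illustration (not the authors' software); it verifies the table 1 entry n = 25, r = 6 for p0 = 0.10, p1 = 0.30 against the (α, β) = (0.05, 0.20) limits:

```python
from math import comb

def binom_tail(n, r, p):
    """P(Y >= r) for Y ~ Binomial(n, p)."""
    return sum(comb(n, y) * p**y * (1 - p)**(n - y) for y in range(r, n + 1))

def errors(n, r, p0, p1):
    """Type I error P(Y >= r | p0) and type II error P(Y < r | p1)."""
    alpha = binom_tail(n, r, p0)
    beta = 1.0 - binom_tail(n, r, p1)
    return alpha, beta

alpha, beta = errors(25, 6, 0.10, 0.30)
print(alpha, beta)  # both fall within the (0.05, 0.20) limits
```

The same two tail probabilities, evaluated over a grid of n and r, are all that is needed to reproduce a design table of this kind.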
`_
`.
`
Smaller treatment advances δ are harder to detect, i.e., they require a larger sample size for given p0, α, and β. A very large δ requires a trivially small sample size, i.e., it is easy to detect a large treatment advance. Reasonable values are thus δ = 0.15 to 0.20, since δ < 0.15 usually leads to
unrealistically large n while δ > 0.20 leads to a trial yielding very little information about E and in many cases is intellectually dishonest. Parameters of some typical single-stage designs are given in table 1.
An alternative to designing a single-stage trial in terms of hypothesis testing, which is a formal method for deciding whether E is promising compared to the fixed standard success probability p0, is to choose n to obtain a confidence interval of given width and level (coverage probability) to estimate p. A good approximate confidence interval, due to Ghosh [4], is

    [p̂ + A/2 ± z{p̂(1 − p̂)/n + A/(4n)}^(1/2)] / (1 + A),

where p̂ = Yn/n; z = 1.645, 1.96, or 2.576 for a 90%, 95%, or 99% coverage probability, respectively; and A = z²/n. The exact binomial confidence interval of Clopper and Pearson [5] also may be used, although the above approximation is quite adequate for planning purposes. An important caveat is that the commonly used approximate interval p̂ ± z{p̂(1 − p̂)/n}^(1/2) is rather inaccurate for many values of n and p encountered in phase II trials [4] and is not recommended. Table 2 gives the sample sizes needed to obtain 90% or 95% confidence intervals for p of given width, based on values of p̂ from 0.20 to 0.50. The sample sizes for p̂ = 0.50 + Δ and 0.50 − Δ are identical. For example, if it is anticipated that the empirical rate Yn/n will be approximately 0.30 or 0.70, then a sample of 34 patients is required to obtain a 90% confidence interval for p having width at most 0.25. Given an observed rate of 10/34, one could be 90% certain that the true success probability of E is somewhere between 0.185 and 0.434.
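Ghosh's interval is simple to compute for planning purposes. The short sketch below (ours, not part of the chapter) reproduces the 10/34 example at the 90% level:

```python
from math import sqrt

def ghosh_ci(y, n, z=1.645):
    """Ghosh's approximate CI for a binomial proportion p, given y
    successes in n trials; z = 1.645, 1.96, 2.576 for 90%, 95%, 99%."""
    phat = y / n
    A = z * z / n
    half = z * sqrt(phat * (1 - phat) / n + A / (4 * n))
    lo = (phat + A / 2 - half) / (1 + A)
    hi = (phat + A / 2 + half) / (1 + A)
    return lo, hi

lo, hi = ghosh_ci(10, 34)
print(round(lo, 3), round(hi, 3))  # 0.185 0.434, matching the text
```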
Although the single-stage design is easy to understand and implement, it has several severe practical limitations. Each of the designs described in the following sections was created to address one or more of the following problems.
`
Table 1. Single-stage designs: conclude p ≥ p1 at level α and power 1 − β if Yn/n ≥ r/n.

                               (α, β)
  δ      p0     p1     (0.10, 0.10)   (0.05, 0.20)   (0.05, 0.10)
 0.20   0.10   0.30        5/25           6/25           7/33
 0.20   0.20   0.40       11/36          12/35          15/47
 0.20   0.30   0.50       16/39          17/39          22/53
 0.20   0.40   0.60       21/41          23/42          29/56
 0.15   0.10   0.25        7/40           8/40          10/55
 0.15   0.20   0.35       17/61          17/56          22/77
 0.15   0.30   0.45       27/71          27/67          36/93
 0.15   0.40   0.55       36/75          36/71          46/94
`
Table 2. Single-stage n to obtain a confidence interval of given level and width ≤ W.

                     Anticipated p̂ = Yn/n
 Level     W       0.20    0.30    0.40    0.50
  90%     0.20      44      55      63      66
  90%     0.25      26      34      40      42
  90%     0.30      19      24      26      28
  95%     0.20      64      78      89      94
  95%     0.25      39      48      56      58
  95%     0.30      26      33      38      40
`
1. The most serious limitation of the single-stage design is that it ignores all data prior to observation of Yn, and in particular has no provision for early termination if the interim observed response rate is unacceptably low. For example, if p0 = 0.30 is the established response rate with standard treatment and E also has rate p = 0.30, then an initial run of 12 failures occurs with probability 0.014, and if p > 0.30 then such a run has probability close to 0. Most clinicians would be strongly inclined to discontinue use of E at or before this point, especially in trials of treatments for rapidly fatal diseases or other circumstances where early failure increases morbidity or reduces survival. Designs with early stopping rules address this problem (cf. [6-14]).
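The run probability quoted above follows directly from the binomial model: if each patient responds independently with probability p, an initial run of k failures has probability (1 − p)^k. A one-line check (illustrative only):

```python
p = 0.30                              # response rate of both standard therapy and E
prob_12_failures = (1 - p) ** 12      # chance the first 12 patients all fail
print(round(prob_12_failures, 3))  # 0.014, as quoted
```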
2. Reporting results of a phase II trial entails augmenting or replacing significance test results with a confidence interval for p, since the real goal of a phase II trial is estimation [3]. If rules for early stopping are included in the design, however, then computation of the confidence interval for p based on the final data must account for the fact that the trial continued through its intermediate stages, since the usual unadjusted confidence intervals are biased in this case. Methods for computing a confidence interval for p after a multistage trial have been given by numerous authors, including Jennison and Turnbull [15], Tsiatis, Rosner, and Mehta [16], Atkinson and Brown [17], and Duffy and Santner [18].
3. Another problem, addressed by Thall and Simon [19], is that p0 often is estimated from historical data and hence is a statistic p̂0, not a fixed value. Since this estimator has an associated variance, the usual test statistic Yn/n − p̂0 has variance p(1 − p)/n + var(p̂0). The sample size computation that ignores var(p̂0) is incorrect, and the actual type I and type II error rates are larger than their nominal values.
4. In some settings several new treatments may be ready simultaneously for phase II evaluation. The question then arises of whether to carry out a sequence of single-arm trials or one randomized trial, and in either case strategies are needed for prioritizing treatments and for selecting one or
more promising treatments from those tested. Several approaches to this general problem have been proposed. Simon, Wittes, and Ellenberg [20] propose a randomized phase II trial; Whitehead [21] proposes a combined phase II-III strategy; Thall, Simon, and Ellenberg [22,23] propose 'select then test' designs for comparing the best of several experimental treatments to a standard; and Strauss and Simon [24] examine properties of a sequence of 'play the winner' randomized phase II trials.
5. The assumption that patient response can be characterized effectively by a single variable is rather strong, even for short-term response, and it may be necessary to monitor more than one patient outcome. For example, in most cancer chemotherapy trials, toxicity is an important issue, and it is highly desirable to have an early stopping rule to protect future patients from unacceptably high rates of toxicity. Many phase II trials include such a rule either formally or informally in their protocols, but they ignore the interdependence between toxicity and response in the design. Designs accounting for multiple outcomes have been proposed by Etzioni and Pepe [25] and Thall, Simon, and Estey [26].
6. Patient-to-patient variability is often high, even in clinical trials with very specific entry criteria. Since phase II trials are relatively small, a study with an unusually high proportion of either poor-prognosis or good-prognosis patients may give a misleadingly pessimistic or optimistic indication of how E would behave in the general patient population.
7. Although most phase II designs regard treatment response rate p as a fixed unknown quantity, many clinicians regard p as random. For example, when asked to specify p0, the clinician may respond by giving a range rather than a single value, and may even describe the probability distribution of p0 within that range. In such circumstances, a Bayesian design, based on random values of p0 and p, may be more appropriate. Bayesian phase II designs have been proposed by Sylvester and Staquet [27,28], Sylvester [29], Etzioni and Pepe [25], Thall and Simon [12-14], and Thall, Simon and Estey [30].
`
Refinements of the phase I-II-III paradigm
`
When the best available therapy has little or no effect against the disease, the phase II trial's objective is to determine whether E has any antidisease activity at all. This is a phase IIA trial. Since p0 = 0 or possibly 0.05 in this case, type II error is the main consideration. Gehan [6] proposed the first phase IIA design, a two-stage design in which n1 patients are treated at stage 1, the trial is stopped if Yn1 = 0, and an additional n2 patients are treated in stage 2 if Yn1 > 0. The stage 1 sample size is chosen to control type II error, specifically n1 ≥ log(β)/log(1 − p1) for targeted success rate p1. The stage 2 sample size is chosen to obtain p̂ having standard error no larger than a given magnitude, and n2 also depends on Yn1. For example, if β = 0.05 and
p1 = 0.20, then n1 = 14 patients are required at stage 1. If Y14 > 0, then to obtain an estimate of p having standard error 0.10 requires n2 = 1, 6, 9, or 11 if Y14 is 1, 2, 3, or ≥4, respectively.
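Gehan's stage 1 size is the smallest n1 for which a completely negative result (zero responses) is unlikely under the target rate, i.e., (1 − p1)^n1 ≤ β. A minimal sketch of this computation (our illustration; the stage 2 sizes above come from Gehan's standard-error calculations and are not reproduced here):

```python
from math import ceil, log

def gehan_stage1(p1, beta):
    """Smallest n1 with (1 - p1)**n1 <= beta: if E has true response rate p1,
    the chance of stopping after 0/n1 responses is at most beta."""
    return ceil(log(beta) / log(1 - p1))

print(gehan_stage1(0.20, 0.05))  # 14, as in the text
```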
When there exists a standard treatment, say S, having some level of activity (i.e., when p0 > 0), then the goal is to identify new treatments that are promising compared to p0. This is a phase IIB trial. In this case, there are compelling data, arising from clinical trials or in vitro testing, indicating that E is likely to be active at a level exceeding p0. An important consideration in IIB trials is that it is clinically undesirable to continue a trial of an experimental treatment that proves to be not promising compared to S. For example, when p0 = 0.40 and p1 = 0.55, if interim trial results strongly indicate that p < 0.40, then it is unethical to continue; if it is likely that 0.40 ≤ p < 0.55, then it may be desirable to terminate the trial to make way for other, potentially more promising new treatments. It is also important to recognize the comparative aspect of phase IIB trials, which may lead to formal use of historical data on S in the evaluation of E, and possibly to a randomized trial [19]. This issue will be discussed below.
If several new treatments are simultaneously available for phase II testing, then the problem of choosing among them arises. Since the number of patients in any clinic is limited, this situation frequently occurs in institutions with high levels of research activity in growth factors or pharmacologic agents. Thall and Estey [30] propose a pre-phase II Bayesian strategy in which patients having a prognosis more favorable than that of phase I patients but less favorable than that of the target group of the subsequent phase II trial are randomized among several experimental treatments. The response rate distribution in each treatment arm is updated continually during the trial and is compared to early termination cutoffs, and the best final treatment must satisfy a minimal posterior efficacy criterion before it is evaluated in a subsequent phase II trial. This type of study, the phase 1.5 trial, bridges the gap between phase I and phase IIB. It provides an ethical means of giving poor-prognosis patients experimental treatments while replacing the usual informal pre-phase II treatment selection process with a fair comparison formally based on a combination of prior opinion and clinical data.
`
As an example, a phase 1.5 trial might be carried out in patients who have acute myelogenous leukemia (AML) with at least one prior relapse and poor-prognosis cytogenetic characteristics, in order to select a treatment for phase II testing in untreated AML patients who have good-prognosis cytogenetics. If the accrual rate is 40 per year in the poor-prognosis group, then a phase 1.5 trial of three treatments with up to 10 patients per treatment arm could be carried out in nine months. Assuming a prior mean response rate of 0.40 for all three arms, Thall and Estey [30] recommend a design in which a treatment arm is terminated if there are 0 responses in the first 4 patients; otherwise, 10 patients are accrued in that arm. The best treatment, among
those not terminated, must have at least 4 responses to be selected for the phase II trial.
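One property of this termination rule is easy to check directly: an arm whose true response rate equals the prior mean of 0.40 is wrongly dropped (0 responses in its first 4 patients) with probability 0.6^4 ≈ 0.13. The one-line illustration below is ours, not a computation from [30]:

```python
p_true = 0.40                         # assumed true response rate of an active arm
p_early_stop = (1 - p_true) ** 4      # 0/4 responses triggers termination
print(round(p_early_stop, 3))  # 0.13
```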
`
The response rates obtained in different phase II trials of the same treatment often vary widely. Simon, Wittes, and Ellenberg [20] cite a number of factors as the sources of this variability, including patient selection, definition of response, interobserver variability in response evaluation, drug dosage and schedule, reporting procedures, and sample size. To deal with these problems, these authors propose randomizing patients among several experimental treatments in phase II, with ranking and selection methods rather than hypothesis testing used to evaluate treatments. They recommend the use of conventional phase II sample sizes and early stopping criteria in each treatment arm, and that a standard treatment arm not be included. Specifically, they propose that sample size be computed to ensure that, if one group of treatments has response rate p0 + δ and the rest have rate p0, then a 'select the best' strategy will choose one of the superior treatments with a desired probability. For example, if p0 = 0.20 and δ = 0.15, then 44 patients in each of three arms will ensure a 90% chance of choosing a treatment with response rate 0.35.
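The selection probability in this example can be computed exactly from binomial distributions. The sketch below is our own illustration; it assumes the arm with the highest observed response count is selected and that ties are broken at random, which may differ in detail from the computation in [20]:

```python
from math import comb

def pmf(n, p):
    """Binomial(n, p) probability mass function as a list indexed by count."""
    return [comb(n, y) * p**y * (1 - p)**(n - y) for y in range(n + 1)]

def p_select_best(n, p_best, p_other, k_other):
    """P(the superior arm is selected) when the highest observed count wins
    and ties are broken at random among the tied arms."""
    f_best, f_oth = pmf(n, p_best), pmf(n, p_other)
    below = [0.0] * (n + 1)            # below[y] = P(an inferior arm scores < y)
    for y in range(1, n + 1):
        below[y] = below[y - 1] + f_oth[y - 1]
    total = 0.0
    for y in range(n + 1):
        for j in range(k_other + 1):   # j inferior arms tie at y, rest fall below
            p_config = comb(k_other, j) * f_oth[y]**j * below[y]**(k_other - j)
            total += f_best[y] * p_config / (j + 1)
    return total

# One arm at 0.35, two arms at 0.20, 44 patients per arm:
print(round(p_select_best(44, 0.35, 0.20, 2), 2))
```

Under this tie-breaking convention the selection probability comes out close to the 90% figure quoted in the text.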
Strategies for phase II evaluation of new treatments that become available sequentially over time have been considered by Whitehead [31] and by Strauss and Simon [24]. Whitehead is motivated in part by the desire to examine the properties of small sample sizes for phase II studies. He assumes that the success rates of the experimental treatments are random and may be considered as independent draws from a beta prior distribution. Given N equal to the total number of patients for all the trials, he derives the number of trials k and number of patients per trial n that maximize the expected success probability E(π) of the selected treatment, subject to nk = N. For example, if N = 60 and the mean experimental success rate is 0.20, then depending upon prior variability, the optimal integer values of (n, k) and E(π) vary from (4, 15) with E(π) = 0.426, to (6, 10) with E(π) = 0.292.
Strauss and Simon [24] study properties of a sequence of comparative phase II trials. At each of k stages, 2n patients are randomized between a new experimental treatment and the better of the two treatments from the previous stage, starting with a known standard S at stage 1. The better of the two treatments at each stage (the 'winner') thus becomes the new standard, and is then compared to the next experimental treatment. The goal is to select a single treatment for phase III evaluation. Similar to Whitehead [31], Strauss and Simon assume that the success probabilities of the experimental treatments are independent draws from a beta prior distribution, either with fixed mean equal to that of S or with distribution adapted to the data in that its mean equals that of the latest winner. This approach, however, is more robust against time trends in the selection of patients. Given a total of N = nk patients, they examine the manner in which the expected success probability E(p) of the final selected treatment
varies with n, k, and N. They identify conditions under which such a sequence of phase II trials is more likely than a single phase II trial to identify a promising experimental treatment.
Whitehead [21] also proposes an integrated approach to the problem of evaluating several new treatments. A sequence of single-arm phase II trials is conducted; the most promising experimental treatment among them is selected, and it is then compared to the standard in a phase III trial. Assuming that the success rates of the experimental treatments are random and may be considered as independent draws from a beta prior distribution, Whitehead derives strategies for dividing patients between the two phases (given the number of phase II trials and the total number of patients) that maximize the probability π of obtaining a significant result in the phase III trial. For example, if N = 300 patients are available and there are five new agents to be tested, then allocating 18 patients to each of the five phase II trials and 210 to phase III ensures that π = 0.52. If instead N = 500, then the optimal allocation is 31 per phase II trial with 345 in phase III, which ensures that π = 0.63. Whitehead notes that, when using this strategy, the main trade-off is between the total numbers of patients allocated to the two phases.
`
Some practical considerations
`
Because phase II trials are developmental, their design and conduct must include several ethical and logistical considerations. These include the appropriateness of treating patients with E, the relevance of the trial within the larger context of treatment development, the patient accrual rate, definition of patient response, and the monetary cost of the trial. In any phase II setting, a priori there must be a reasonable basis for the belief that E may provide an improvement over the standard, whether p0 = 0 or p0 > 0. If in the course of the trial it becomes clear that this is unlikely, then it may be desirable to terminate early, and here the unavoidable conflict between type I and type II error comes into play. The trade-off is between protecting patients from an ineffective or dangerous experimental regimen and risking the loss of a treatment advance. If an adverse outcome, such as toxicity, is monitored along with the usual efficacy outcome, then an alternative goal may be to decrease the adverse event rate while maintaining a given response rate. Designs which monitor multiple events, such as response and toxicity, are discussed in a later section.
Ethical considerations are most pressing for rapidly fatal diseases, and the standards of clinical conduct for such diseases may provide a basis for analogous decisions in less extreme circumstances. The desirability of a particular treatment E in a phase II trial must be assessed from the viewpoints of the individual patient, all patients in the trial taken as a group, and future patients after the trial is completed. A general consideration is that patients are more likely to choose a physician rather than a treatment and to
rely on their physician's advice regarding treatment choice. The centuries-old process of entrusting one's life and well-being to one's physician is a fundamental part of medicine, informed consent notwithstanding. Thus, the trial must be designed so that trial objectives and individual patient benefit are not in conflict. The situation is most desperate in phase IIA trials of treatments for rapidly fatal diseases for which no effective treatment exists. The trade-off for both the individual patient and for the trial is between the risk of adverse treatment effects and the likelihood of any therapeutic benefit. For nonfatal diseases, the potential severity of adverse effects first must be weighed against the effects of the disease itself, and it is inappropriate to conduct a trial of E if its effects are likely to be worse than those of the disease. Phase IIB trials often evaluate combination therapies whose components are already known to have antidisease activity. Consequently, a new combination regimen with an activity level below that of the standard is usually not promising for future development. Two exceptions are a trial in which a reduced likelihood of early response may be an acceptable trade-off for improved overall survival, and a trial in which the real goal is to reduce toxicity and a small reduction in response rate is considered an acceptable trade-off. Examples of such trials are given in a later section.
Patient accrual and monetary cost are absolute limits on the size of any clinical trial. If either the number of patients or the available resources are insufficient to achieve initial goals, then a smaller trial may be appropriate. However, the magnitudes of α and β and the reliability of the final estimate of p should be kept in mind when reducing sample size due to low accrual rate or limited resources. The results of very small trials often are of limited value and, due to their high variability, are potentially misleading. If resources are inadequate to conduct a trial that will produce useful results, then it is inappropriate to conduct the trial.
A simple but critical issue in trial design and conduct is definition of patient outcome. For example, in AML, treatment response is typically complete remission (CR), which is defined in terms of several parameters (e.g., blast count, platelet recovery, white cell count, etc.), as measured within a given timeframe. It is essential that CR be defined formally in the protocol and that, however CR is defined, all clinicians involved in the trial adhere to that definition. Otherwise, one clinician's CR may be another's failure, which renders the recorded trial results virtually meaningless. The same considerations apply to definition of adverse outcomes, since there are various grades of toxicity, etc. This problem is potentially more severe in multi-institutional phase II trials; hence, an even stronger effort must be made to define and score patient outcomes consistently.
Short-term response in a phase II trial is used as the measure of treatment effect. For solid tumors, however, partial response often is not a validated measure of patient benefit. In general, the comparison of survival between responders and nonresponders is not valid for demonstrating that treatment has extended survival for responders [32]. Because response is often viewed
as a necessary but not sufficient condition for extending survival, response may be used in phase II trials for screening promising treatments. To evaluate the effectiveness of a regimen in prolonging survival, however, a phase III trial of survival is required.
`
Historical data and Bayesian designs
`
Most phase II trials evaluate one or more new treatments relative to a standard therapy S; hence, they are inherently comparative, even though a standard treatment arm usually is not included. In designing the single-stage, single-arm trial described in the introduction to this chapter, a common practice is to assume that p0 is a known constant (and hence that the statistic p̂ − p0 = (Yn/n) − p0 has variance var(p̂) = p(1 − p)/n) and to determine n to obtain a test of p ≤ p0 versus p ≥ p0 + δ having given type I and type II error rates α and β. For phase IIB trials, where p0 represents the activity level of available regimens, the numerical value of p0 used in this computation is often a statistical estimate p̂0 based on historical data, rather than a known constant. The empirical difference p̂1 − p̂0, which is the basis for the test, is thus the difference between two statistics and has variance larger than the assumed p(1 − p)/n. Consequently, the sample size computed under a model ignoring the fact that p̂0 is a statistic is incorrect. This common practice may be due to the belief that the variability of p̂0 is of no practical consequence or to the absence of a theoretical basis and associated statistical software for computing sample sizes correctly.
Thall and Simon [19] derive optimal single-stage phase II designs that incorporate historical data from one or more trials of S and account for the variability inherent in p̂0. They consider both binary and normally distributed responses. Because the variability between historical pilot studies sometimes exceeds what is predicted by a binomial model for binary responses, they use a beta-binomial model to account for possible extrabinomial variation. Their results indicate that it is sometimes best to randomize a proportion of patients to S, and they derive the total sample size and optimal proportions for allocation to E and S that minimize var(p̂1 − p̂0). Their results indicate that an unbalanced randomization may be superior to a single-arm trial of E alone, and that ignoring var(p̂0) may lead to trials with actual values of α and β much higher than their nominal values. For example, consider a trial in which p̂0 = 0.20 is based on three historical trials of 20 patients each. To obtain a test that detects an improvement of δ = 0.20, i.e., for alternative p1 = 0.40, with α = 0.05 and β = 0.20, the optimal design requires 85 patients with 27 allocated to S and 58 to E. If the variability in p̂0 is ignored and a single-arm trial of E is conducted, the standard computation yields n = 35, and the resulting test will have actual α = 0.14 and β = 0.27. Since the numerical computations to incorporate the historical data and obtain the optimal design are somewhat complicated,
a menu-driven computer program written in S-plus has been made available.
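The direction of the type I error inflation can be illustrated with a simple normal approximation. This is our simplified sketch, not the exact beta-binomial computation of [19] (which, as quoted above, gives α = 0.14 for this example): suppose p̂0 = 0.20 is based on 60 historical patients and a single-arm trial of n = 35 patients is designed as if p0 = 0.20 were known.

```python
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

p0, n_hist, n = 0.20, 60, 35
z_alpha = 1.645                                  # nominal one-sided alpha = 0.05

se_assumed = sqrt(p0 * (1 - p0) / n)             # treats p0 as a known constant
se_actual = sqrt(p0 * (1 - p0) / n + p0 * (1 - p0) / n_hist)  # adds var(p0_hat)

crit = z_alpha * se_assumed                      # rejection cutoff for p1_hat - p0_hat
actual_alpha = 1.0 - norm_cdf(crit / se_actual)
print(round(actual_alpha, 3))                    # well above the nominal 0.05
```

Even this crude approximation nearly doubles the nominal 0.05, in the same direction as the exact result.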
`
The above method for dealing with the variability of an estimate of p0 may be regarded as a particular approach to a more general problem. Given that in a phase II trial the success rate of E ultimately must be compared to that of S, and that uncertainty regarding the response rate of S will always exist, the general problem is to account for this uncertainty when planning the trial and interpreting its results. A different statistical approach is based on the Bayesian framework, in which the success probabilities of E and S are regarded as random rather than fixed parameters. To underscore this distinction, we denote the random response probabilities by θE and θS. Although the theoretical basis for Bayesian methods is well established, practical methods for clinical trials have been proposed only recently, notably by Freedman and Spiegelhalter [33,34], Spiegelhalter and Freedman [35,36], Racine et al. [37], and Berry [38,39].
Sylvester and Staquet [28] and Sylvester [29] propose decision-theoretic Bayesian methods for phase II clinical trials. They optimize the sample size and decision cutoff of a single-stage design where n is fixed, to determine whether a new drug is active, by minimizing the Bayes risk. Their approach assumes that Pr[θE = p1] = 1 − Pr[θE = p2], with p2 > p1, where p2 and p1 are response rates at which E would and would not be considered promising, respectively; i.e., they assume that θE may take on two possible values.
`
Herson [7] proposes the use of predictive probability (PP) as a criterion for early termination of phase II trials to minimize the number of patients exposed to an ineffective therapy. The PP of an event, such as concluding that E is or is not promising according to some decision rule, is the conditional probability of that event given the current data, computed by first averaging over the prior distributions of the parameters, which are θS and θE in the present context. Mehta and Cain [9] provide charts of early stopping rules based on the posterior probability of [θE > p0], where p0 is a fixed level at which E would be considered active.
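Posterior quantities of this kind are easy to compute under a beta prior: with a Beta(a, b) prior and y responses in n patients, the posterior of θE is Beta(a + y, b + n − y), and for integer parameters Pr[θE > p0] reduces to a binomial sum. The sketch below is our illustration of this type of rule (the flat prior and interim data are hypothetical, not taken from [9]):

```python
from math import comb

def post_prob_exceeds(a, b, y, n, p0):
    """Pr(theta > p0) for theta ~ Beta(a + y, b + n - y), integer a, b.
    Uses the identity Pr(Beta(A, B) <= x) = Pr(Binomial(A + B - 1, x) >= A)."""
    A, B = a + y, b + n - y
    m = A + B - 1
    # Pr(theta > p0) = Pr(Binomial(m, p0) <= A - 1)
    return sum(comb(m, j) * p0**j * (1 - p0)**(m - j) for j in range(A))

# Flat Beta(1, 1) prior, 4 responses in the first 10 patients, p0 = 0.20:
print(round(post_prob_exceeds(1, 1, 4, 10, 0.20), 3))  # 0.95
```

A monitoring rule of the Mehta-Cain type would compare such a posterior probability to prespecified cutoffs after each small group of patients.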
`
Palmer [40] proposes a Bayesian procedure for identifying the best of three treatments E1, E2, E3. He assumes that their respective success probabilities are π1 = (a,b,b), π2 = (b,a,b), or π3 = (b,b,a) with prior probability 1/3 each, where b < a are known fixed standards, analogous to p0 and p0 + δ in the hypothesis-testing context. Given a maximum sample size N, patients are first randomized among the treatments in triplets, and based on the posterior probabilities of {π1, π2, π3} the worst treatment may be dropped. Patients are then randomized between the two remaining treatments in pairs, and the worse of the two is subsequently dropped based on the posterior distribution. The optimality criterion is to maximize the expected number of future treatment successes.
