ELSEVIER
`
Secondary Endpoints Cannot Be Validly Analyzed if the Primary Endpoint Does Not Demonstrate Clear Statistical Significance

Robert T. O’Neill, PhD
Office of Epidemiology and Biostatistics, Center for Drug Evaluation and Research/FDA, Rockville, Maryland

ABSTRACT: There is lack of consensus surrounding the interpretation of observed treatment effects for secondary clinical endpoints when the primary endpoint for which the clinical trial was initially designed does not meet the objective of a demonstrated effect. We provide some arguments to support caution in making inferences for secondary endpoints in this situation. We examine the definitions of primary and secondary endpoints within the context of a hypothesis-testing framework for multiple endpoints, and we address the relationship of the correlation structure of these endpoints and the statistical adjustments needed to preserve experiment-wise type I error for a valid inference. We also address the hypothesis-testing framework and the estimation framework for valid inference, focusing on the interpretation of p-values associated with differentially powered hypothesis tests for each endpoint to detect an important clinical effect. We point out the limitations on the strength of evidence (and quantification of uncertainty) for a secondary endpoint effect that can be derived from only one study and introduce the likelihood of replication of the finding in another study of identical size and design as a useful concept to guide this interpretation. Controlled Clin Trials 1997;18:550-556 © Elsevier Science Inc. 1997
`
KEY WORDS: Primary endpoints, secondary endpoints, statistical adjustments, valid inference, hypothesis tests, p-values
`
This article argues in favor of the provocative premise that secondary endpoints cannot be validly analyzed if the primary endpoint does not demonstrate clear statistical significance. I ask the following questions: What are the definitions of a primary and a secondary endpoint, what is the difference between them, and what are the other categories of endpoint definitions? What is the impact of the correlation structure among these endpoints? What is meant by a valid analysis within the context of a hypothesis-testing decision rule for evidence? What is meant by the concept of clear statistical significance? Also, I will discuss the relationship between the sample size of a clinical trial planned from the perspective of power against a particular alternative to test a hypothesis and the corresponding precision of the estimate of treatment effects derived from
`
Address reprint requests to Robert T. O’Neill, PhD, Office of Epidemiology and Biostatistics, Center for Drug Evaluation and Research/FDA, Room 158-45, HFD-700, 5600 Fishers Lane, Rockville, MD 20857.
Received February 17, 1997; revised April 7, 1997; accepted April 22, 1997.
`
Controlled Clinical Trials 18:550-556 (1997)
© Elsevier Science Inc. 1997
655 Avenue of the Americas, New York, NY 10010

0197-2456/97/$17.00
PII S0197-2456(97)00075-5
`
`Page 1 of 7
`
`YEDA EXHIBIT NO. 2081
`MYLAN PHARM. v YEDA
`IPR2015-00644
`
that sample size. This discussion allows us to contrast the differing concepts of clear statistical significance and precise estimates of treatment effects. Finally, as another comment on the notion of clear statistical significance and how much evidence we may derive from a single clinical trial, I will briefly discuss the concept of the chance of replication of statistically significant results (i.e., p-values less than a prespecified level, say 0.05) in a second clinical trial as a basis of confirmatory evidence of a potentially serendipitous secondary endpoint finding observed in a single initial study.
First, to clarify the distinction between a primary endpoint and a secondary endpoint, I define a primary endpoint as a clinical endpoint that provides evidence sufficient to fully characterize clinically the effect of a treatment in a manner that would support a regulatory claim for the treatment. Because evaluation of the impact of treatment on a primary endpoint is the major purpose of a clinical trial, the sample size of the trial is based upon the power of the trial to detect a specified clinical benefit on the primary endpoint. A secondary endpoint is a clinical endpoint that provides additional clinical characterization of treatment effect but that is not sufficient to characterize fully the benefit or to support a claim for a treatment effect. By definition, a secondary endpoint could not, by itself, provide convincing evidence of clinically significant treatment effects, even if it were observed to be statistically significant. Defined in this way, a secondary endpoint could not become a primary endpoint after the fact.
This distinction in definitions does not, however, explain why controversy exists concerning whether a statistically significant secondary endpoint should be considered valid. I believe the controversy arises when there is a multiplicity of endpoints whose collective use has not been considered in advance and when none of these endpoints may fully characterize a treatment effect. For example, the validity of a secondary endpoint analysis becomes difficult to interpret when composite endpoints, composed of both primary and secondary endpoints, are themselves considered as primary or secondary endpoints. Another example that is difficult to interpret occurs when an endpoint is categorized as secondary not because it would not characterize the clinical benefit of treatment, but because the planned size of the clinical trial gives a priori low statistical power to detect treatment-induced changes. If we permit a secondary endpoint to become a primary endpoint solely on the basis of its observed statistical significance, then it is very important to formulate, in advance, the statistical structure of the decision rule for judging clear statistical evidence.
The clinical trial literature supports the principle of parsimony in the choice and selection of clinical endpoints used to characterize and test the effects of treatment on disease [1]. Trialists recognize that the multiplicity of treatment endpoints within a confirmatory hypothesis-testing paradigm can impact interpretation of results. The quantification of statistical uncertainty of any conclusion of treatment benefit (e.g., a valid analysis) should weigh the possible scenarios for judging a result of a clinical trial as successful. Thus, clinical trial investigators have usually chosen a few primary endpoints on which to base the design of the trial and relegated to secondary status other endpoints that are clinically interesting, corroborative, or suggestive but not of convincing clinical importance. Mortality, when considered as a secondary endpoint, seems to be one of the few exceptions to this strategy because of the clinical impact of a statistically significant finding. One of the usual reasons for designating
`
`
Table 1  Overall and Adjusted Type I Errors for Two Decision Criteria Designed to Evaluate Four Equally Correlated Endpoints*

                    Criterion 1: At Least One of the      Criterion 2: Each of the Four
                    Four Endpoints Must Be                Endpoints Must Be
                    Significant at 0.05                   Significant at 0.05
Correlation Among   Overall    Adjusted Type I for        Overall    Adjusted Type I for
the Four Endpoints  Type I     Endpoints to Maintain      Type I     Endpoints to Maintain
                               Overall 0.05                          Overall 0.05
0.0                 0.186      0.0127                     <0.001     0.473
0.2                 0.173      0.013                      0.0002     0.376
0.4                 0.155      0.014                      0.0014     0.289
0.6                 0.133      0.017                      0.005      0.209
0.8                 0.106      0.022                      0.013      0.136
1.0                 0.05       0.05                       0.05       0.05

* Capizzi and Zhang [2].
`
mortality as a secondary endpoint is that the trialist believes a priori that there is little chance a treatment effect will be observed, given the sample sizes and the power to detect a clinically important effect on mortality.
Further, when a clinical trial employs secondary endpoints that incorporate all cause and cause-specific mortality, nonfatal events, composites of fatal and nonfatal events, and composites of highly correlated endpoints and competing multiple risk endpoints, the interpretation of the multiplicity of outcomes becomes complex. This is especially true if the decision rules and multiplicity adjustments to control false conclusions are not properly planned in advance as objectives of a clinical trial.
We assume, as most clinical trial protocols do, a hypothesis-testing formulation, and it is within this framework that statistical significance of treatment effects associated with the primary and secondary endpoints is the criterion used in judging the uncertainty of the result. We interpret a valid analysis to mean that the observed strength of evidence, as represented by a p-value and a confidence interval, is considered well within acceptable bounds for controlling overall type I error for the trial, and other hypothesis-testing considerations, such as adjustments for multiplicity of endpoints, are satisfied.
`
MULTIPLE CORRELATED ENDPOINTS: THE DECISION RULE FOR A “WIN” IN A CLINICAL TRIAL
`
Primary and secondary endpoints may be considered as a collection of multiple endpoints, each of which, if not held to a protocol-defined criterion for valid interpretation, could produce a variety of outcomes whose uncertainty is difficult to quantify. Table 1, adapted from Capizzi and Zhang [2], illustrates the impact that two different decision rules have on the overall type I error in a hypothesis-testing framework when there are four correlated multiple endpoints in a trial.
The table considers four endpoints with an equicorrelation structure in which the correlation ranges from 0 to 1. We consider two decision criteria within
`
`
the hypothesis-testing framework. Criterion 1 requires that at least one of the four clinical endpoints demonstrate a statistically significant finding at a 0.05 type I level. Depending upon the correlation among the four endpoints, the overall type I error for the decision rule can range from 0.05 to 0.186. To maintain an overall 0.05 error rate, the adjusted type I levels for the individual endpoints can range between 0.05 and 0.0127. Clearly, the validity of the inference is sensitive to the correlation structure for this decision rule.
The other decision rule, criterion 2, requires that all four of the clinical endpoints demonstrate a statistically significant result at the 0.05 level. The overall type I error rate for this decision rule ranges from 0.05 when correlation is 1 to less than 0.0001 when the endpoints are uncorrelated. To maintain an overall 0.05 error rate, the adjusted levels for the individual endpoints range from 0.05 to 0.473. Thus, both decision rules are valid for their intended purposes. The inferences made from each are valid, but they differ in terms of both clinical and statistical interpretation.
The message derived from the information in Table 1 is that the number of endpoints, the correlation structure among the endpoints, and the decision rule for a “win” all matter in judging the validity of the inference. When the criterion for that win is that at least one endpoint must be statistically significant at the 0.05 level, then the need for statistical adjustments to maintain a constant overall 0.05 level decreases as the correlation among the four endpoints increases toward 1. On the other hand, no statistical adjustments may be needed and, in fact, the overall type I error of 0.05 is conservative when the win criterion specifies that each of four clinical endpoints must demonstrate statistical significance at the 0.05 level.
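The dependence of the overall type I error on the decision rule and on the correlation can be sketched numerically. The block below is only an illustration under stated assumptions, not the calculation used by Capizzi and Zhang [2]: it assumes four one-sided z-tests whose statistics are equicorrelated standard normal under the null, and all function names are hypothetical. At correlation 0 and 1 the results follow from elementary probability (for example 1 - 0.95^4, about 0.186, for the at-least-one rule with independent endpoints); values at intermediate correlations depend on the assumed test and need not match Table 1.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def prob_all_below(c, rho, k=4, steps=4000):
    """P(Z_1 < c, ..., Z_k < c) for equicorrelated standard normal statistics.

    Assumes the one-factor form Z_i = sqrt(rho)*W + sqrt(1-rho)*E_i and
    integrates over the shared factor W with the trapezoidal rule.
    """
    if rho <= 0.0:
        return STD.cdf(c) ** k          # independent endpoints
    if rho >= 1.0:
        return STD.cdf(c)               # endpoints move as one
    lo, hi = -8.0, 8.0
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        w = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        inner = STD.cdf((c - sqrt(rho) * w) / sqrt(1.0 - rho))
        total += weight * STD.pdf(w) * inner ** k
    return total * h


def overall_at_least_one(alpha, rho, k=4):
    """Overall type I error when a win needs at least one of k significant."""
    c = STD.inv_cdf(1.0 - alpha)
    return 1.0 - prob_all_below(c, rho, k)


def overall_all(alpha, rho, k=4):
    """Overall type I error when a win needs all k endpoints significant."""
    c = STD.inv_cdf(1.0 - alpha)
    # By symmetry, P(all Z_i > c) equals P(all Z_i < -c).
    return prob_all_below(-c, rho, k)


def per_endpoint_level_independent(overall, k=4):
    """Per-endpoint levels that hold the overall error at `overall` when rho = 0."""
    at_least_one = 1.0 - (1.0 - overall) ** (1.0 / k)
    all_k = overall ** (1.0 / k)
    return at_least_one, all_k


if __name__ == "__main__":
    for rho in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0):
        print(rho, overall_at_least_one(0.05, rho), overall_all(0.05, rho))
    print(per_endpoint_level_independent(0.05))
```

For independent endpoints the last function returns per-endpoint levels of roughly 0.0127 and 0.473, which shows why the at-least-one rule needs a much stricter per-endpoint level while the all-four rule could tolerate a far looser one.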
`
INFERENCE ON THE SECONDARY ENDPOINT IS CONDITIONAL ON WHETHER THE PRIMARY ENDPOINT IS OR IS NOT STATISTICALLY SIGNIFICANT
`
Consider the implications of making inference on the secondary endpoint conditional on what has occurred with the primary endpoint. When there is correlation between the primary and secondary endpoints, as indeed there would be when some endpoints are functions or composites of other endpoints, the information conveyed in the secondary endpoint differs according to whether the primary endpoint is or is not significant. The conditional nature of this inference raises some interesting issues. For example, the primary endpoint usually forms the basis for the design, sample size, and power of a clinical trial. Thus, it should be more likely to observe significant p-values for the primary endpoint when the trial is well powered and when the alternative hypothesis (a specified treatment effect) is true. If the clinical trial is underpowered for the secondary endpoint, as it might be when mortality is categorized as a secondary endpoint, we should expect to observe significant p-values associated with the secondary endpoint to a lesser extent, even if the alternative for the secondary endpoint is true. Thus, differentially powered tests for each of the multiple endpoints play some role in the interpretation of observed outcomes.
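A small calculation can make the idea of differentially powered tests concrete. The sketch below assumes a two-arm trial with equal allocation analyzed by a two-sided z-test; the sample size (85 per arm) and the standardized effect sizes (0.5 for the primary endpoint, 0.2 for the secondary endpoint) are hypothetical values chosen only for illustration, and the function name is not from the article.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def power_two_arm(effect_size, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two arm means.

    effect_size is the true mean difference divided by the common
    standard deviation (a standardized effect size).
    """
    z_crit = STD.inv_cdf(1.0 - alpha / 2.0)
    shift = effect_size * sqrt(n_per_arm / 2.0)  # mean of the z statistic
    # Probability of rejecting in either tail.
    return STD.cdf(shift - z_crit) + STD.cdf(-shift - z_crit)


if __name__ == "__main__":
    n = 85  # hypothetical per-arm sample size
    print("primary   (ES 0.5):", power_two_arm(0.5, n))
    print("secondary (ES 0.2):", power_two_arm(0.2, n))
```

Under these assumptions the same trial has roughly 90% power for the primary endpoint but only about 26% power for the secondary endpoint, so a nonsignificant secondary result would be the expected outcome even when the secondary effect is real.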
An example that may be more problematic occurs when the secondary endpoint is a composite endpoint such as total mortality, which also includes a primary endpoint like cardiovascular mortality. The validity of the analysis should be influenced by the correlation structure induced by inclusion of the
`
primary endpoint into the secondary endpoint and by the proportional contribution of the primary endpoint treatment effect to the secondary composite endpoint treatment effect. If the treatment effect on the primary endpoint is not observed to be statistically significant, then the interpretation of a valid inference for the secondary (composite) endpoint should address the conditional nature of the criteria. For example, if a primary endpoint that comprises 75% of a composite secondary endpoint is not statistically significant, even when the trial is powered for that primary endpoint, there is now information that a component of the composite secondary endpoint is not likely to be impacted by treatment. The clinical and statistical interpretation of a valid inference for the secondary endpoint seems to me to involve a conditional inference, the statistical adjustments for which can be viewed in several ways. I am unaware of research that directly sheds light on the properties of the inference for these situations, but I question the validity of the inference.
`
SAMPLE SIZE AND THE PRECISION OF THE EXPECTED TREATMENT EFFECT: ANOTHER MEASURE OF VALIDITY
`
While most clinical trialists recognize that the precision of the estimate of treatment effect is important to the characterization of the treatment effect, clinical trials are usually not designed to estimate precisely a treatment effect. Rather, the precision of the estimate of the treatment effect is a byproduct of the clinical trial, the size of which is based on detecting a posited clinically important treatment effect in a hypothesis-testing framework. A valid inference may consider the clinical utility of the precision of the estimate of the treatment effect on the secondary endpoint as compared with that on the primary endpoint or a multiplicity of endpoints of different variabilities.
Figure 1 illustrates the expected width of a 95% confidence interval for the planned treatment effect size associated with a sample size designed to detect a univariate endpoint effect size with 90% power, using a hypothesis test as the method of inference [3]. The sample sizes, calculated according to the traditional hypothesis-testing paradigm, are designed to detect important treatment effect sizes ranging from 0.1 to 1.0, with 90% study power. A treatment effect size is defined as the clinically important difference in average endpoint response in the test and control groups divided by the standard deviation of the endpoint response. Figure 1 may provide insight into the extent to which different endpoints with different treatment effect sizes will have, for a fixed-sample-size trial, different precisions for the estimates. A given trial sample size may be more or less sufficient to provide the necessary power for any primary or secondary endpoint, depending upon the variations among each of the endpoints in control group response and its variability and upon the relative treatment impact on each of the endpoints.
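The link between a power-based sample size and the precision it delivers can be sketched with the usual normal-approximation formulas. This is an illustration under assumptions that the article does not spell out (two arms, equal allocation, two-sided 0.05 test, known common standard deviation), and the function names are hypothetical; the calculation behind Figure 1 in [3] may differ in detail.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def n_per_arm(effect_size, power=0.90, alpha=0.05):
    """Per-arm sample size to detect a standardized effect size.

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})^2 / ES^2.
    """
    z_alpha = STD.inv_cdf(1.0 - alpha / 2.0)
    z_beta = STD.inv_cdf(power)
    return ceil(2.0 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


def ci_width(n, level=0.95):
    """Width, in SD units, of the CI for a difference of two arm means."""
    z = STD.inv_cdf(0.5 + level / 2.0)
    return 2.0 * z * sqrt(2.0 / n)


if __name__ == "__main__":
    for es in (1.0, 0.5, 0.2, 0.1):
        n = n_per_arm(es)
        print(f"ES={es}: n per arm={n}, expected 95% CI width={ci_width(n):.3f}")
```

Under these assumptions an effect size of 1.0 needs 22 subjects per arm and an effect size of 0.1 needs 2102, and in every case the expected interval width is about 1.2 times the planned effect size. A trial sized only for 90% power therefore estimates the effect with an interval wider than the effect it was designed to detect.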
`
INTERPRETATION OF THE P-VALUE FOR PRIMARY AND SECONDARY ENDPOINTS
`
A clinical trial may be substantially underpowered for a secondary endpoint relative to the primary endpoint for their respective treatment effects. For a given fixed sample size of a trial, assuming a true treatment effect, D, exists
`
`
[Figure 1 graphic not reproduced. Recoverable labels: horizontal axis “Effect Size Δ/σ”, running from 0 to 1.7; one horizontal bar per design, labeled N=22, N=26, N=33, N=43, N=59, N=85, N=132, N=234, N=528, and N=2102.]

Figure 1  Expected width of 95% confidence interval for studies powered at 90% for an assumed treatment effect of ES.
`
for a secondary endpoint, the expected distribution of the p-values associated with the test of treatment effect on the secondary endpoint may differ from the distribution of expected p-values for the primary endpoints. Statistical adjustments used in multiple testing situations are designed to preserve the overall experiment-wise type I error rate for the variety of comparisons envisioned. Statistical adjustment procedures for multiple comparisons assume the null hypothesis is true. The expected distribution of p-values for primary and secondary endpoints should differ as a function of the power to detect a specific alternate treatment effect size for each endpoint [4]. An observed p-value of 0.05 for an underpowered (secondary) endpoint may be more impressive than an observed p-value of 0.05 for a substantially overpowered (primary) endpoint. The interpretation of the strength of the evidence on behalf of a treatment effect on a secondary or primary endpoint should involve knowledge of the distribution of the p-value under both the null and the alternative hypotheses for each of the endpoints.
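How the distribution of the p-value shifts with power can be illustrated for a simple case. The sketch below assumes a two-sided z-test whose statistic is normal with unit variance and mean equal to a noncentrality shift; it is not the derivation in [4], and the function names and the example power levels (90% and 30%) are hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def p_value_cdf(x, shift):
    """P(two-sided p-value <= x) when the z statistic has mean `shift`."""
    z = STD.inv_cdf(1.0 - x / 2.0)
    return STD.cdf(shift - z) + STD.cdf(-shift - z)


def shift_for_power(power, alpha=0.05):
    """Mean of the z statistic giving about `power` at two-sided level alpha.

    The small rejection probability in the opposite tail is ignored.
    """
    return STD.inv_cdf(1.0 - alpha / 2.0) + STD.inv_cdf(power)


if __name__ == "__main__":
    for label, shift in (("null", 0.0),
                         ("90% power", shift_for_power(0.90)),
                         ("30% power", shift_for_power(0.30))):
        probs = [round(p_value_cdf(x, shift), 3) for x in (0.001, 0.01, 0.05)]
        print(label, probs)
```

Under the null the p-value is uniform, so `p_value_cdf(x, 0.0)` returns `x`. Under an alternative the probability of a small p-value grows with power, which is why the same observed p-value carries different weight for a well powered endpoint and for an underpowered one.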
A final point: Under certain situations, a secondary endpoint finding may be considered an exploratory result, especially when criteria for assessing its importance have not been clearly specified in the protocol in advance. If we use the observed magnitude of the p-value as a measure of the validity of the inference, it is worth considering how likely it is, given the p-value observed in that initial study, that a second study would replicate the statistical evidence observed in the initial study. This is one conceptual approach to verifying whether a statistically significant finding for a secondary endpoint is serendipitous, especially if considered in light of an exploratory or hypothesis-generating finding. Exploratory findings, especially those derived from analyses not prespecified in a protocol, should be confirmed in a second study designed in part
`
`
Table 2  Probability of Observing a Statistically Significant Result (p < 0.05) upon Repetition of a Clinical Trial when the Effect, ES, Observed in the First Trial Is Assumed to Be the True Effect

Observed p-Value    Probability of a Significant Result (Power)
0.10                0.37
0.05                0.50
0.03                0.58
0.01                0.73
0.005               0.80
0.001               0.91
`
to replicate or confirm the “valid” analysis from the first study. Some ideas of Goodman [5] can be adapted to illustrate that if a second trial were conducted in a manner identical to that of the initial trial that produced the secondary endpoint result, and if one used the same sample size as was used in the initial study, and if one assumed that the treatment effect size observed in the first trial was the true effect size against which to calculate power of the second study, the chance of having a statistically significant result in the repeat study (i.e., p-value less than 0.05) can be calculated. Table 2 presents the probability of a statistically significant result in the second study (power) as a function of the observed p-value in the first study. For an initial study with an observed p-value of 0.10, the chance of an observed p-value of 0.05 or less in a repeat study of the same sample size is about 37%. It is not until one observes a p-value of 0.001 that the chance of observing a p-value of 0.05 or less in the second study is about 90%. Viewed in this manner, an observed p = 0.05 for a secondary endpoint might not be convincing evidence against the null hypothesis.
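The replication argument reduces to a short calculation. The sketch below assumes a z-test with a two-sided observed p-value, takes the effect observed in the first trial as the true effect, gives the repeat trial the same size, and counts only significance in the same direction; the function name is hypothetical and this is an adaptation in the spirit of [5] rather than a reproduction of it.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def replication_probability(p_observed, alpha=0.05):
    """Chance that an identical repeat trial reaches p < alpha.

    Assumes the observed z statistic equals the true noncentrality, so the
    repeat statistic is normal with mean z_obs and unit variance.
    """
    z_obs = STD.inv_cdf(1.0 - p_observed / 2.0)
    z_crit = STD.inv_cdf(1.0 - alpha / 2.0)
    return STD.cdf(z_obs - z_crit)


if __name__ == "__main__":
    for p in (0.10, 0.05, 0.03, 0.01, 0.005, 0.001):
        print(p, replication_probability(p))
```

For an observed p-value of 0.05 this gives exactly 0.50, because the assumed true effect then sits on the significance boundary and the repeat trial is equally likely to land on either side of it.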
Finally, after presenting these arguments to support the position that secondary endpoints cannot be validly interpreted when the primary endpoints are not statistically significant, I would conclude by saying, “Never say never.” There are too many situations that have not been thoroughly explored.
`
`REFERENCES
`
1. CPMP Working Party on Efficacy of Medicinal Products. Note for guidance: biostatistical methodology in clinical trials in applications for marketing authorization for medical products. Stat Med 1995;14:1659-1682.
2. Capizzi T, Zhang J. Testing the hypothesis that matters for multiple primary endpoints. Drug Info J 1996;30:949-956.
3. Bristol DR. Sample sizes for constructing confidence intervals and testing hypotheses. Stat Med 1989;8:803-811.
4. Hung HMJ, O’Neill R, Bauer P, Kohne K. The behavior of the p-value when the alternative hypothesis is true. Biometrics 1997;53:11-22.
5. Goodman SN. A comment on replication, p-values and evidence. Stat Med 1992;11:875-879.
`