equations for the local-maximum-likelihood estimates $\hat\mu_i$, $\hat\Sigma_i$, and $\hat P(\omega_i)$:

$$\hat P(\omega_i) = \frac{1}{n}\sum_{k=1}^{n} \hat P(\omega_i \mid x_k, \hat\theta) \tag{14}$$

$$\hat\mu_i = \frac{\displaystyle\sum_{k=1}^{n} \hat P(\omega_i \mid x_k, \hat\theta)\,x_k}{\displaystyle\sum_{k=1}^{n} \hat P(\omega_i \mid x_k, \hat\theta)} \tag{15}$$

$$\hat\Sigma_i = \frac{\displaystyle\sum_{k=1}^{n} \hat P(\omega_i \mid x_k, \hat\theta)\,(x_k - \hat\mu_i)(x_k - \hat\mu_i)^t}{\displaystyle\sum_{k=1}^{n} \hat P(\omega_i \mid x_k, \hat\theta)} \tag{16}$$

where

$$\hat P(\omega_i \mid x_k, \hat\theta) = \frac{p(x_k \mid \omega_i, \hat\theta_i)\,\hat P(\omega_i)}{\displaystyle\sum_{j=1}^{c} p(x_k \mid \omega_j, \hat\theta_j)\,\hat P(\omega_j)} = \frac{|\hat\Sigma_i|^{-1/2}\exp\bigl[-\tfrac12(x_k - \hat\mu_i)^t\hat\Sigma_i^{-1}(x_k - \hat\mu_i)\bigr]\hat P(\omega_i)}{\displaystyle\sum_{j=1}^{c}|\hat\Sigma_j|^{-1/2}\exp\bigl[-\tfrac12(x_k - \hat\mu_j)^t\hat\Sigma_j^{-1}(x_k - \hat\mu_j)\bigr]\hat P(\omega_j)}. \tag{17}$$
While the notation may make these equations appear to be rather formidable, their interpretation is actually quite simple. In the extreme case where $\hat P(\omega_i \mid x_k, \hat\theta)$ is one when $x_k$ is from class $\omega_i$ and zero otherwise, $\hat P(\omega_i)$ is the fraction of samples from $\omega_i$, $\hat\mu_i$ is the mean of those samples, and $\hat\Sigma_i$ is the corresponding sample covariance matrix. More generally, $\hat P(\omega_i \mid x_k, \hat\theta)$ is between zero and one, and all of the samples play some role in the estimates. However, the estimates are basically still frequency ratios, sample means, and sample covariance matrices.
The problems involved in solving these implicit equations are similar to the problems discussed in Section 6.4.1, with the additional complication of having to avoid singular solutions. Of the various techniques that can be used to obtain a solution, the most obvious approach is to use initial estimates to evaluate Eq. (17) for $\hat P(\omega_i \mid x_k, \hat\theta)$ and then to use Eqs. (14)-(16) to update these estimates. If the initial estimates are very good, having perhaps been obtained from a fairly large set of labelled samples, convergence can be quite rapid. However, the results do depend upon the starting point, and the problem of multiple solutions is always present. Furthermore, the repeated computation and inversion of the sample covariance matrices can be quite time consuming.
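The iteration just described can be sketched in a few lines of Python. The following fragment is not from the text; it applies Eqs. (14)-(17) to a one-dimensional normal mixture, and the synthetic data, starting values, and number of iterations are illustrative assumptions only.

```python
import numpy as np

def normal_mixture_mle(x, mu, sigma2, prior, n_iter=50):
    """Iterate Eqs. (14)-(17) for a one-dimensional normal mixture.
    x: (n,) samples; mu, sigma2, prior: (c,) initial estimates."""
    x = np.asarray(x, float)
    for _ in range(n_iter):
        # Eq. (17): posterior probability of each class for each sample.
        dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / sigma2) / np.sqrt(2 * np.pi * sigma2)
        post = dens * prior
        post /= post.sum(axis=1, keepdims=True)
        w = post.sum(axis=0)                                      # effective class counts
        prior = w / len(x)                                        # Eq. (14)
        mu = (post * x[:, None]).sum(axis=0) / w                  # Eq. (15)
        sigma2 = (post * (x[:, None] - mu) ** 2).sum(axis=0) / w  # Eq. (16)
    return mu, sigma2, prior

# Illustrative two-component data; the true means are -2 and 2 (values chosen arbitrarily).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(2.0, 1.0, 700)])
mu, sigma2, prior = normal_mixture_mle(x, np.array([-1.0, 1.0]),
                                        np.array([1.0, 1.0]), np.array([0.5, 0.5]))
```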
Considerable simplification can be obtained if it is possible to assume that the covariance matrices are diagonal. This has the added virtue of reducing the number of unknown parameters, which is very important when the number of samples is not large. If this assumption is too strong, it still may be possible to obtain some simplification by assuming that the c covariance matrices are equal, which also eliminates the problem of singular solutions. The derivation of the appropriate maximum likelihood equations for this case is treated in Problems 5 and 6.
`
6.4.4 A Simple Approximate Procedure

Of the various techniques that can be used to simplify the computation and accelerate convergence, we shall briefly consider one elementary, approximate method. From Eq. (17), it is clear that the probability $\hat P(\omega_i \mid x_k, \hat\theta)$ is large when the squared Mahalanobis distance $(x_k - \hat\mu_i)^t\hat\Sigma_i^{-1}(x_k - \hat\mu_i)$ is small. Suppose that we merely compute the squared Euclidean distance $\|x_k - \hat\mu_i\|^2$, find the mean $\hat\mu_m$ nearest to $x_k$, and approximate $\hat P(\omega_i \mid x_k, \hat\theta)$ as

$$\hat P(\omega_i \mid x_k, \hat\theta) \approx \begin{cases} 1 & i = m \\ 0 & \text{otherwise.} \end{cases}$$

Then the iterative application of Eq. (15) leads to the following procedure* for finding $\hat\mu_1, \ldots, \hat\mu_c$:
`
Procedure: Basic Isodata

1. Choose some initial values for the means $\hat\mu_1, \ldots, \hat\mu_c$.
Loop: 2. Classify the $n$ samples by assigning them to the class of the closest mean.
3. Recompute the means as the average of the samples in their class.
4. If any mean changed value, go to Loop; otherwise, stop.
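For concreteness, a minimal Python sketch of this procedure follows (it is not part of the text); the sample data and the choice of two one-dimensional clusters are assumptions made purely for illustration.

```python
import numpy as np

def basic_isodata(x, mu):
    """Basic Isodata: classify samples by the nearest mean, recompute the means,
    and repeat until no mean changes (steps 1-4 of the procedure above)."""
    x, mu = np.atleast_2d(x).astype(float), np.atleast_2d(mu).astype(float)
    while True:
        # Step 2: squared Euclidean distance from every sample to every mean.
        d2 = ((x[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Step 3: recompute each mean as the average of the samples assigned to it.
        new_mu = np.array([x[labels == i].mean(axis=0) if np.any(labels == i) else mu[i]
                           for i in range(len(mu))])
        # Step 4: stop when no mean changed value.
        if np.allclose(new_mu, mu):
            return new_mu, labels
        mu = new_mu

# Illustrative one-dimensional data with two clusters (all values are made up).
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2.0, 1.0, 50), rng.normal(2.0, 1.0, 50)]).reshape(-1, 1)
means, labels = basic_isodata(x, [[-1.0], [1.0]])
```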
`
This is typical of a class of procedures that are known as clustering procedures. Later on we shall place it in the class of iterative optimization procedures, since the means tend to move so as to minimize a squared-error criterion function. At the moment we view it merely as an approximate way to obtain maximum likelihood estimates for the means. The values obtained can be accepted as the answer, or can be used as starting points for the more exact computations.

* Throughout this chapter we shall name and describe various iterative procedures as if they were computer programs. All of these procedures have in fact been programmed, often with much more elaborate provisions for doing such things as breaking ties, avoiding trap states, and allowing more sophisticated terminating conditions. Thus, we occasionally include the word "basic" in their names to emphasize the fact that our interest is limited to explaining essential concepts.
It is interesting to see how this procedure behaves on the example data in Table 6-1. Figure 6.4 shows the sequence of values for $\hat\mu_1$ and $\hat\mu_2$ obtained for several different starting points. Since interchanging $\hat\mu_1$ and $\hat\mu_2$ merely interchanges the labels assigned to the data, the trajectories are symmetric about the line $\hat\mu_1 = \hat\mu_2$. The trajectories lead either to the point $\hat\mu_1 = -2.176$, $\hat\mu_2 = 1.684$ or to its image. This is close to the solution found by the maximum likelihood method (viz., $\hat\mu_1 = -2.130$ and $\hat\mu_2 = 1.668$), and the trajectories show a general resemblance to those shown in Figure 6.3. In general, when the overlap between the component densities is small, the maximum likelihood approach and the Isodata procedure can be expected to give similar results.

[FIGURE 6.4. Trajectories for the Basic Isodata Procedure.]
`
`
`
`
6.5 UNSUPERVISED BAYESIAN LEARNING

6.5.1 The Bayes Classifier

Maximum likelihood methods do not consider the parameter vector $\theta$ to be random; it is just unknown. Prior knowledge about likely values for $\theta$ is irrelevant, although in practice such knowledge may be used in choosing good starting points for hill-climbing procedures. In this section we shall take a Bayesian approach to unsupervised learning. We shall assume that $\theta$ is a random variable with a known a priori distribution $p(\theta)$, and we shall use the samples to compute the a posteriori density $p(\theta \mid \mathcal{X})$. Interestingly enough, the analysis will virtually parallel the analysis of supervised Bayesian learning, showing that the two problems are formally very similar.
We begin with an explicit statement of our basic assumptions. We assume that:

1. The number of classes is known.
2. The a priori probabilities $P(\omega_j)$ for each class are known, $j = 1, \ldots, c$.
3. The forms for the class-conditional probability densities $p(x \mid \omega_j, \theta_j)$ are known, $j = 1, \ldots, c$, but the parameter vector $\theta = (\theta_1, \ldots, \theta_c)$ is not known.
4. Part of our knowledge about $\theta$ is contained in a known a priori density $p(\theta)$.
5. The rest of our knowledge about $\theta$ is contained in a set $\mathcal{X}$ of $n$ samples $x_1, \ldots, x_n$ drawn independently from the mixture density

$$p(x \mid \theta) = \sum_{j=1}^{c} p(x \mid \omega_j, \theta_j)\,P(\omega_j). \tag{1}$$
`
At this point we could go directly to the calculation of $p(\theta \mid \mathcal{X})$. However, let us first see how this density is used to determine the Bayes classifier. Suppose that a state of nature is selected with probability $P(\omega_i)$ and a feature vector $x$ is selected according to the probability law $p(x \mid \omega_i, \theta_i)$. To derive the Bayes classifier we must use all of the information at our disposal to compute the a posteriori probability $P(\omega_i \mid x)$. We exhibit the role of the samples explicitly by writing this as $P(\omega_i \mid x, \mathcal{X})$. By Bayes rule,

$$P(\omega_i \mid x, \mathcal{X}) = \frac{p(x \mid \omega_i, \mathcal{X})\,P(\omega_i \mid \mathcal{X})}{\displaystyle\sum_{j=1}^{c} p(x \mid \omega_j, \mathcal{X})\,P(\omega_j \mid \mathcal{X})}.$$
`
`
`
Since the selection of the state of nature $\omega_i$ was done independently of the previously drawn samples, $P(\omega_i \mid \mathcal{X}) = P(\omega_i)$, and we obtain

$$P(\omega_i \mid x, \mathcal{X}) = \frac{p(x \mid \omega_i, \mathcal{X})\,P(\omega_i)}{\displaystyle\sum_{j=1}^{c} p(x \mid \omega_j, \mathcal{X})\,P(\omega_j)}. \tag{18}$$
We introduce the unknown parameter vector by writing

$$p(x \mid \omega_i, \mathcal{X}) = \int p(x, \theta \mid \omega_i, \mathcal{X})\, d\theta = \int p(x \mid \theta, \omega_i, \mathcal{X})\, p(\theta \mid \omega_i, \mathcal{X})\, d\theta.$$

Since the selection of $x$ is independent of the samples, $p(x \mid \theta, \omega_i, \mathcal{X}) = p(x \mid \omega_i, \theta_i)$. Similarly, since knowledge of the state of nature when $x$ is selected tells us nothing about the distribution of $\theta$, $p(\theta \mid \omega_i, \mathcal{X}) = p(\theta \mid \mathcal{X})$. Thus we obtain

$$p(x \mid \omega_i, \mathcal{X}) = \int p(x \mid \omega_i, \theta_i)\, p(\theta \mid \mathcal{X})\, d\theta. \tag{19}$$

That is, our best estimate of $p(x \mid \omega_i)$ is obtained by averaging $p(x \mid \omega_i, \theta_i)$ over $\theta_i$. Whether or not this is a good estimate depends on the nature of $p(\theta \mid \mathcal{X})$, and thus our attention turns at last to that density.
`
6.5.2 Learning the Parameter Vector

Using Bayes rule, we can write

$$p(\theta \mid \mathcal{X}) = \frac{p(\mathcal{X} \mid \theta)\, p(\theta)}{\int p(\mathcal{X} \mid \theta)\, p(\theta)\, d\theta}, \tag{20}$$

where the independence of the samples yields

$$p(\mathcal{X} \mid \theta) = \prod_{k=1}^{n} p(x_k \mid \theta). \tag{21}$$

Alternatively, letting $\mathcal{X}^n$ denote the set of $n$ samples, we can write Eq. (20) in the recursive form

$$p(\theta \mid \mathcal{X}^n) = \frac{p(x_n \mid \theta)\, p(\theta \mid \mathcal{X}^{n-1})}{\int p(x_n \mid \theta)\, p(\theta \mid \mathcal{X}^{n-1})\, d\theta}. \tag{22}$$
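As a small check (this step is not spelled out in the text), Eq. (22) follows from Eqs. (20) and (21) by factoring the last sample out of the product:

$$p(\theta \mid \mathcal{X}^n) = \frac{p(x_n \mid \theta)\,p(\mathcal{X}^{n-1} \mid \theta)\,p(\theta)}{\int p(x_n \mid \theta)\,p(\mathcal{X}^{n-1} \mid \theta)\,p(\theta)\,d\theta} = \frac{p(x_n \mid \theta)\,p(\theta \mid \mathcal{X}^{n-1})}{\int p(x_n \mid \theta)\,p(\theta \mid \mathcal{X}^{n-1})\,d\theta},$$

where the second equality follows from dividing the numerator and denominator by $\int p(\mathcal{X}^{n-1} \mid \theta)\,p(\theta)\,d\theta$ and applying Eq. (20) with $n-1$ samples.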
`
These are the basic equations for unsupervised Bayesian learning. Eq. (20) emphasizes the relation between the Bayesian and the maximum likelihood solutions. If $p(\theta)$ is essentially uniform over the region where $p(\mathcal{X} \mid \theta)$ peaks, then $p(\theta \mid \mathcal{X})$ peaks at the same place. If the only significant peak occurs at $\theta = \hat\theta$ and if the peak is very sharp, then Eqs. (19) and (18) yield

$$p(x \mid \omega_i, \mathcal{X}) \approx p(x \mid \omega_i, \hat\theta_i)$$

and

$$P(\omega_i \mid x, \mathcal{X}) \approx \frac{p(x \mid \omega_i, \hat\theta_i)\,P(\omega_i)}{\displaystyle\sum_{j=1}^{c} p(x \mid \omega_j, \hat\theta_j)\,P(\omega_j)}.$$

That is, these conditions justify the use of the maximum likelihood estimate as if it were the true value of $\theta$ in designing the Bayes classifier.
Of course, if $p(\theta)$ has been obtained by supervised learning using a large set of labelled samples, it will be far from uniform, and it will have a dominant influence on $p(\theta \mid \mathcal{X}^n)$ when $n$ is small. Eq. (22) shows how the observation of an additional unlabelled sample modifies our opinion about the true value of $\theta$, and emphasizes the ideas of updating and learning. If the mixture density $p(x \mid \theta)$ is identifiable, then each additional sample tends to sharpen $p(\theta \mid \mathcal{X}^n)$, and under fairly general conditions $p(\theta \mid \mathcal{X}^n)$ can be shown to converge (in probability) to a Dirac delta function centered at the true value of $\theta$. Thus, even though we do not know the categories of the samples, identifiability assures us that we can learn the unknown parameter vector $\theta$, and thereby learn the component densities $p(x \mid \omega_i, \theta_i)$.
This, then, is the formal Bayesian solution to the problem of unsupervised learning. In retrospect, the fact that unsupervised learning of the parameters of a mixture density is so similar to supervised learning of the parameters of a component density is not at all surprising. Indeed, if the component density is itself a mixture, there would appear to be no essential difference between the two problems.

However, there are some significant differences between supervised and unsupervised learning. One of the major differences concerns the problem of identifiability. With supervised learning, lack of identifiability merely means that instead of obtaining a unique parameter vector we obtain an equivalence class of parameter vectors. However, since all of these yield the same component density, lack of identifiability presents no theoretical difficulty. With unsupervised learning, lack of identifiability is much more serious. When $\theta$ cannot be determined uniquely, the mixture cannot be decomposed into its true components. Thus, while $p(x \mid \mathcal{X}^n)$ may still converge to $p(x)$, $p(x \mid \omega_i, \mathcal{X}^n)$ given by Eq. (19) will not in general converge to $p(x \mid \omega_i)$, and a theoretical barrier to learning exists.

Another serious problem for unsupervised learning is computational complexity. With supervised learning, the possibility of finding sufficient statistics allows solutions that are analytically pleasing and computationally feasible. With unsupervised learning, there is no way to avoid the fact that the samples are obtained from a mixture density,

$$p(x \mid \theta) = \sum_{j=1}^{c} p(x \mid \omega_j, \theta_j)\,P(\omega_j), \tag{1}$$

and this gives us little hope of ever finding simple exact solutions for $p(\theta \mid \mathcal{X})$. Such solutions are tied to the existence of a simple sufficient statistic, and the factorization theorem requires the ability to factor $p(\mathcal{X} \mid \theta)$ as

$$p(\mathcal{X} \mid \theta) = g(s, \theta)\,h(\mathcal{X}).$$
`
But from Eqs. (21) and (1),

$$p(\mathcal{X} \mid \theta) = \prod_{k=1}^{n}\left[\sum_{j=1}^{c} p(x_k \mid \omega_j, \theta_j)\,P(\omega_j)\right].$$

Thus, $p(\mathcal{X} \mid \theta)$ is the sum of $c^n$ products of component densities. Each term in this sum can be interpreted as the joint probability of obtaining the samples $x_1, \ldots, x_n$ bearing a particular labelling, with the sum extending over all of the ways that the samples could be labelled. Clearly, this results in a thorough mixture of $\theta$ and the $x$'s, and no simple factoring should be expected. An exception to this statement arises if the component densities do not overlap, so that as $\theta$ varies only one term in the mixture density is non-zero. In that case, $p(\mathcal{X} \mid \theta)$ is the product of the $n$ nonzero terms, and may possess a simple sufficient statistic. However, since that case allows the class of any sample to be determined, it actually reduces the problem to one of supervised learning, and thus is not a significant exception.
Another way to compare supervised and unsupervised learning is to substitute the mixture density for $p(x_n \mid \theta)$ in Eq. (22) and obtain

$$p(\theta \mid \mathcal{X}^n) = \frac{\displaystyle\sum_{j=1}^{c} p(x_n \mid \omega_j, \theta_j)\,P(\omega_j)}{\displaystyle\int \sum_{j=1}^{c} p(x_n \mid \omega_j, \theta_j)\,P(\omega_j)\, p(\theta \mid \mathcal{X}^{n-1})\, d\theta}\; p(\theta \mid \mathcal{X}^{n-1}). \tag{23}$$

If we consider the special case where $P(\omega_1) = 1$ and all the other a priori probabilities are zero, corresponding to the supervised case in which all samples come from class $\omega_1$, then Eq. (23) simplifies to

$$p(\theta \mid \mathcal{X}^n) = \frac{p(x_n \mid \omega_1, \theta_1)}{\displaystyle\int p(x_n \mid \omega_1, \theta_1)\, p(\theta \mid \mathcal{X}^{n-1})\, d\theta}\; p(\theta \mid \mathcal{X}^{n-1}). \tag{24}$$
`
`
`
`
Let us compare Eqs. (23) and (24) to see how observing an additional sample changes our estimate of $\theta$. In each case we can ignore the denominator, which is independent of $\theta$. Thus, the only significant difference is that in the supervised case we multiply the "a priori" density for $\theta$ by the component density $p(x_n \mid \omega_1, \theta_1)$, while in the unsupervised case we multiply it by the mixture density $\sum_{j=1}^{c} p(x_n \mid \omega_j, \theta_j)\,P(\omega_j)$. Assuming that the sample really did come from class $\omega_1$, we see that the effect of not knowing this category membership in the unsupervised case is to diminish the influence of $x_n$ on changing $\theta$. Since $x_n$ could have come from any of the $c$ classes, we cannot use it with full effectiveness in changing the component(s) of $\theta$ associated with any one category. Rather, we must distribute its effect over the various categories in accordance with the probability that it arose from each category.
`
6.5.3 An Example

Consider the one-dimensional, two-component mixture with $p(x \mid \omega_1) \sim N(\mu, 1)$, $p(x \mid \omega_2, \theta) \sim N(\theta, 1)$, where $\mu$, $P(\omega_1)$, and $P(\omega_2)$ are known. Here

$$p(x \mid \theta) = \frac{P(\omega_1)}{\sqrt{2\pi}}\exp\bigl[-\tfrac12(x - \mu)^2\bigr] + \frac{P(\omega_2)}{\sqrt{2\pi}}\exp\bigl[-\tfrac12(x - \theta)^2\bigr].$$

Viewed as a function of $x$, this mixture density is a superposition of two normal densities, one peaking at $x = \mu$ and the other peaking at $x = \theta$. Viewed as a function of $\theta$, $p(x \mid \theta)$ has a single peak at $\theta = x$. Suppose that the a priori density $p(\theta)$ is uniform from $a$ to $b$. Then after one observation

$$p(\theta \mid x_1) = \alpha\, p(x_1 \mid \theta)\, p(\theta) = \begin{cases} \alpha'\Bigl\{P(\omega_1)\exp\bigl[-\tfrac12(x_1 - \mu)^2\bigr] + P(\omega_2)\exp\bigl[-\tfrac12(x_1 - \theta)^2\bigr]\Bigr\} & a \le \theta \le b \\ 0 & \text{otherwise,} \end{cases}$$
`
where $\alpha$ and $\alpha'$ are normalizing constants, independent of $\theta$. If the sample $x_1$ is in the range $a \le x_1 \le b$, then $p(\theta \mid x_1)$ peaks at $\theta = x_1$. Otherwise it peaks either at $\theta = a$ if $x_1 < a$ or at $\theta = b$ if $x_1 > b$. Note that the additive constant $\exp[-\tfrac12(x_1 - \mu)^2]$ is large if $x_1$ is near $\mu$, and thus the peak of $p(\theta \mid x_1)$ is less pronounced if $x_1$ is near $\mu$. This corresponds to the fact that if $x_1$ is near $\mu$, it is more likely to have come from the $p(x \mid \omega_1)$ component, and hence its influence on our estimate for $\theta$ is diminished.
`
`
`
With the addition of a second sample $x_2$, $p(\theta \mid x_1)$ changes to

$$p(\theta \mid x_1, x_2) = \beta\, p(x_2 \mid \theta)\, p(\theta \mid x_1) = \begin{cases} \beta'\bigl\{P(\omega_1)P(\omega_1)\exp[-\tfrac12(x_1-\mu)^2 - \tfrac12(x_2-\mu)^2] \\ \quad {}+ P(\omega_1)P(\omega_2)\exp[-\tfrac12(x_1-\mu)^2 - \tfrac12(x_2-\theta)^2] \\ \quad {}+ P(\omega_2)P(\omega_1)\exp[-\tfrac12(x_1-\theta)^2 - \tfrac12(x_2-\mu)^2] \\ \quad {}+ P(\omega_2)P(\omega_2)\exp[-\tfrac12(x_1-\theta)^2 - \tfrac12(x_2-\theta)^2]\bigr\} & a \le \theta \le b \\ 0 & \text{otherwise.} \end{cases}$$
`
Unfortunately, the primary thing we learn from this expression is that $p(\theta \mid \mathcal{X}^n)$ is already complicated when $n = 2$. The four terms in the sum correspond to the four ways in which the samples could have been drawn from the two component populations. With $n$ samples there will be $2^n$ terms, and no simple sufficient statistics can be found to facilitate understanding or to simplify computations.
It is possible to use the relation

$$p(\theta \mid \mathcal{X}^n) = \frac{p(x_n \mid \theta)\, p(\theta \mid \mathcal{X}^{n-1})}{\int p(x_n \mid \theta)\, p(\theta \mid \mathcal{X}^{n-1})\, d\theta}$$

and numerical integration to obtain an approximate numerical solution for $p(\theta \mid \mathcal{X}^n)$. This was done for the data in Table 6-1 using the values $\mu = -2$, $P(\omega_1) = 1/3$, and $P(\omega_2) = 2/3$. An a priori density $p(\theta)$ uniform from $-4$ to $4$ encompasses the data in that table. When this was used to start the recursive computation of $p(\theta \mid \mathcal{X}^n)$, the results shown in Figure 6.5 were obtained. As $n$ goes to infinity we can confidently expect $p(\theta \mid \mathcal{X}^n)$ to approach an impulse centered at $\theta = 2$. This graph gives some idea of the rate of convergence.

[FIGURE 6.5. Unsupervised Bayesian learning.]
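A computation of this kind can be sketched in a few lines. The Python fragment below is not from the text; it evaluates the recursive update on a discrete grid of $\theta$ values for the mixture of this example, and the unlabelled samples are generated synthetically, since Table 6-1 is not reproduced here.

```python
import numpy as np

# Example mixture: p(x | omega_1) ~ N(mu, 1) with mu known; p(x | omega_2, theta) ~ N(theta, 1).
mu, p1, p2 = -2.0, 1.0 / 3.0, 2.0 / 3.0
theta = np.linspace(-4.0, 4.0, 801)          # grid covering the support of the uniform prior
dtheta = theta[1] - theta[0]
posterior = np.ones_like(theta)              # p(theta) uniform on [-4, 4] (up to normalization)

def mixture(x, theta):
    """p(x | theta) for the two-component normal mixture."""
    return (p1 * np.exp(-0.5 * (x - mu) ** 2) +
            p2 * np.exp(-0.5 * (x - theta) ** 2)) / np.sqrt(2.0 * np.pi)

# Synthetic unlabelled samples standing in for Table 6-1 (true theta taken to be 2).
rng = np.random.default_rng(2)
from_first = rng.random(25) < p1
x = np.where(from_first, rng.normal(mu, 1.0, 25), rng.normal(2.0, 1.0, 25))

for xk in x:                                 # recursive application of Eq. (22)
    posterior *= mixture(xk, theta)          # multiply by p(x_n | theta)
    posterior /= posterior.sum() * dtheta    # normalize by numerical integration
# posterior now approximates p(theta | X^n); it should peak near theta = 2.
```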
One of the main differences between the Bayesian and the maximum likelihood approaches to unsupervised learning appears in the presence of the a priori density $p(\theta)$. Figure 6.6 shows how $p(\theta \mid \mathcal{X}^n)$ changes when $p(\theta)$ is assumed to be uniform from 1 to 3, corresponding to more certain initial knowledge about $\theta$. The results of this change are most pronounced when $n$ is small. It is here also that the differences between the Bayesian and the maximum likelihood solutions are most significant. As $n$ increases, the importance of prior knowledge diminishes, and in this particular case the curves for $n = 25$ are virtually identical. In general, one would expect the difference to be small when the number of unlabelled samples is several times the effective number of labelled samples used to determine $p(\theta)$.

[FIGURE 6.6. The effect of narrowing the a priori density.]
`
6.5.4 Decision-Directed Approximations

Although the problem of unsupervised learning can be stated as merely the problem of estimating parameters of a mixture density, neither the maximum likelihood nor the Bayesian approach yields analytically simple results. Exact solutions for even the simplest nontrivial examples lead to computational requirements that grow exponentially with the number of samples. The problem of unsupervised learning is too important to abandon just because exact solutions are hard to find, however, and numerous procedures for obtaining approximate solutions have been suggested.
Since the basic difference between supervised and unsupervised learning is the presence or absence of labels for the samples, an obvious approach to unsupervised learning is to use the a priori information to design a classifier and to use the decisions of this classifier to label the samples. This is called the decision-directed approach to unsupervised learning, and it is subject to many variations. It can be applied sequentially by updating the classifier each time an unlabelled sample is classified. Alternatively, it can be applied in parallel by waiting until all $n$ samples are classified before updating the classifier. If desired, this process can be repeated until no changes occur in the way the samples are labelled.* Various heuristics can be introduced to make the extent of any corrections depend upon the confidence of the classification decision.
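As a rough illustration of the sequential variant, the sketch below (not a procedure from the text) classifies each incoming sample with the current means and immediately updates the mean of the chosen class; the two-class normal setting, the pseudo-count initialization, and all numerical values are assumptions made only for this example.

```python
import numpy as np

def decision_directed(x, mu):
    """Sequential decision-directed updating: classify each unlabelled sample with the
    current classifier (here, the nearest mean) and immediately update that class's mean."""
    mu = np.array(mu, float)
    counts = np.ones(len(mu))                 # pseudo-counts keep the earliest updates stable
    for xk in x:
        i = int(np.argmin((xk - mu) ** 2))    # decision: label xk with the closest mean
        counts[i] += 1.0
        mu[i] += (xk - mu[i]) / counts[i]     # running-mean update for the chosen class only
    return mu

# Illustrative stream of unlabelled one-dimensional samples (all values are made up).
rng = np.random.default_rng(3)
x = rng.permutation(np.concatenate([rng.normal(-2.0, 1.0, 100), rng.normal(2.0, 1.0, 100)]))
print(decision_directed(x, [-0.5, 0.5]))
```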
There are some obvious dangers associated with the decision-directed approach. If the initial classifier is not reasonably good, or if an unfortunate sequence of samples is encountered, the errors in classifying the unlabelled samples can drive the classifier the wrong way, resulting in a solution corresponding roughly to one of the lesser peaks of the likelihood function. Even if the initial classifier is optimal, the resulting labelling will not in general be the same as the true class membership; the act of classification will exclude samples from the tails of the desired distribution, and will include samples from the tails of the other distributions. Thus, if there is significant
`
* The Basic Isodata procedure described in Section 6.4.4 is essentially a decision-directed procedure of this type.
`
`