throbber
I
`
`- I
`
`326
`
`Chap. 6
`
`Theory and Implementation of Hidden Markov M
`Odels
`
`called a hidden Markov model) is a doubly embedded stochastic process with an und 1 .
`.
`.
`I (. • h. dd ) b
`er Ytng
`stochastic process that is not directly observab e It is 1 en ut can be observed on]
`through another set of stochastic processes that_produce the sequence of observations. Y
`To illustrate the basic concepts of the hidden Markov model, we will use se
`u, b
`.
`.
`veraI
`.
`.
`.
`.
`simple examples including simple co1n-toss1ng expenments. ne eg1n with a revie
`.
`.
`w~
`some basic ideas of probability in the following exercise.
`Exercise 6.1
`Given a single fair coin, i.e., P(Heads) = P(Tails) = 0.5, which you toss once and observe
`Tails,
`1. What is the probability that the next 10 tosses will provide the sequence
`(HHTHITHITH)?
`2. What is the probability that the next IO tosses will produce the sequence
`(HHHHHHHHHH)?
`3. What is the probability that 5 of the next 10 tosses will be tails? What is the expected
`number of tails over the next IO tosses?
`
`Solution 6.1
`1. For a fair coin, with independent coin tosses, the probability of any specific observation
`sequence of length 10 (10 tosses) is (1/2) 10 since there are i1° such sequences and all
`are equally probable. Thus:
`
`2.
`
`P(H HTHTTHTTH) = GY°.
`P(HHHHHHHHHH)= GY°
`
`Thus a specified run of length 10 is as likely as a specified run of interlaced H and T.
`3. The probability of 5 tails in the next 10 tosses is just the number of observation sequences
`with 5 tails and 5 heads (in any order) and this is
`
`P(5H, 5T) = ( 150) (!) JO = 252 rv O 25
` ) ways of getting SH and 5T in IO tosses, and each sequence has
`since there are (
`probability of ( ! ) 10
`• The expected n~mber of tails in IO tosses is
`E(T in 10 tosses) = L d ( ~) G) = s.
`
`2
`
`1024
`
`•
`
`lO
`
`10
`
`15°
`
`d=O
`Thus, on average, there will be 5H and 5T in IO tosses but the probability of exactly
`5H and 5T is only 0.25.
`'
`
`6.3.1 <::oin-Toss Models
`
`• v
`Assume the foil owing
`h
`•
`•
`scenano. iou are 1n a room with a barrier (e.g., a curtain) throug
`
`IPR2023-00035
`Apple EX1015 Page 177
`
`

`

`sec. 6.3
`
`Extensions to Hidden Markov Models
`
`327
`
`which you cannot see what is happening. On the other side of the barrier is another person
`who is performing a coin-tossing experiment (using one or more coins). The person wilJ not
`tell you which coin he selects at any time; he will only tell you the result of each coin flip.
`Thus a sequence of hidden coin-tossing experiments is performed, with the observation
`sequence consisting of a series of heads and tails. A typical observation sequence would
`be
`
`... H)
`
`0 = ( 0 I 02 03 .. , Or)
`= (HHTTTHTTH
`where H stands for heads and T stands for tails.
`the question is, How do we build an HMM to explain
`Given the above scenario,
`(model) the observed sequence of heads and tails? The first problem we face is deciding
`what the states in the model correspond to, and then deciding how many states should be
`in the model. One possible choice would be to assume that only a single biased coin was
`being tossed. In this case, we could model the situation with a two-state model in which
`each state corresponds
`to the outcome of the previous toss (i.e., heads or tails). This model
`is depicted in Figure 6.3a. In this case, the Markov model is observable, and the only issue
`for complete specification of the model would be to decide on the best value for the single
`parameter of the model (i.e., the probability of, say, heads). Interestingly, an equivalent
`HMM to that of Figure 6.3a would be a degenerate one-state model in which the state
`corresponds to the single biased coin, and the unknown parameter is the bias of the coin.
`A second HMM for explaining the observed sequence of coin toss outcomes is given
`in Figure 6.3b. In this case there are two states in the model, and each state corresponds to a
`different, biased coin being tossed. Each state is characterized by a probability distribution
`of heads and tails, and transitions between states are characterized by a state-transition
`matrix. The physical mechanism
`that accounts for how state transitions are selected could
`itself be a set of independent coin tosses or some other probabilistic event.
`A third form of HMM for explaining the observed sequence of coin toss outcomes
`is given in Figure 6.3c. This model corresponds to using three biased coins, and choosing
`from among the three, based on some probabilistic event.
`Given the choice among the three models shown in Figure 6.3 for explaining the
`observed sequence of heads and tails, a natural question would be which model best matches
`the actual observations.
`It should be clear that the simple one-coin model of Figure 6.3a
`has only one unknown parameter;
`the two-coin model of Figure 6.3b has four unknown
`parameters; and the three-coin model of Figure 6.3c has nine unknown parameters. Thus,
`with the greater degrees of freedom, the larger HMMs would seem to be inherently more
`capable of modeling a series of coin-tossing experiments than would equivalently smaller
`models. Although this is theoretically true, we will see later in this chapter that practical
`considerations impose some strong limitations on the size of models that we can consider.
`A fundamental question here is whether the observed head-tail sequence is long and rich
`enough to be able to specify a complex model. Also, it might just be the case that only
`a single coin is being tossed. Then using the three-coin model of Figure 6.3c would be
`inappropriate because we would be using an underspecified system.
`
`IPR2023-00035
`Apple EX1015 Page 178
`
`

`

`328
`
`Chap. 6
`
`Theory and Implementation of Hidden Markov Models
`
`(a)
`
`PlH)
`
`1-P(Hl
`
`1-COIN MODEL
`(OBSERVABLE MARKOV MODEL)
`
`HEADS
`
`TAILS
`
`O=HHTTHTHHTTH
`S=l
`1221211221
`
`.. .
`.. .
`
`.. .
`.. .
`
`2-COINS MODEL
`(HIDDEN MARKOV MODEL)
`
`O = HHTTHTHHTTH
`S=21122212212
`
`(b)
`
`(c)
`
`P(H) = Pi
`P(T) = l-P1
`
`P(n = l-P2
`
`G33
`
`_l_
`P1
`1-P,
`
`P(H)
`
`P(T)
`
`STATE
`__i_
`P2
`l-P2
`
`_L_
`P3
`l-P3
`
`3-COINS MODEL
`(HIDDEN MARKOV MODEL)
`
`O=HHTTHTHHTTH
`S=31233112313
`
`.. .
`.. .
`
`Figure 6.3 Three possible Markov models that can account for the results of hidden coin-tossing
`experiments. (a) one-coin model, (b) two-coins model, (c) three-coins model.
`
`6.3.2 The Urn-and-Ball Model
`
`To extend the ideas of the HMM to a somewhat more complicated situation, consider the
`um-and-ball system of Figure 6.4. We assume that there are N (large) glass urns in a room.
`Within each um is a large quantity of colored balls. We assume there are M distinct colors
`of the balls. The physical process for obtaining observations is as follows. A genie is in
`the room, and, according to some random procedure, it chooses an initial um. From this
`um, a ball is chosen at random, and its color is recorded as the observation. The ball is
`
`IPR2023-00035
`Apple EX1015 Page 179
`
`

`

`sec. 6.3
`
`Extensions to Hidden Markov Models
`
`329
`
`• • •
`
`URN 1
`
`URN 2
`
`• b1< 1)
`P(RED)
`P( BLUE)
`• b1(2)
`P(GREEN) • b1(3)
`P(YELLOW) • b1(4)
`•
`•
`•
`P(ORANGE) • b1 (M)
`
`P(REO)
`• bz( 1)
`P(BLUE) • b2(2)
`P(GREEN) • b2(3)
`P(YELLOW) • bz( 4)
`•
`•
`•
`P(ORANGE) • b2(M)
`
`URN N
`
`P(RED)
`• bN( 1)
`P(BLUE) • bN(2)
`P(GREEN) • bN(3)
`P(YELLOW) 1: bN(4)
`
`P(ORANGE) • bN(M)
`
`0• {GREEN, GREEN, BLUE, RED, YELLOW, RED, .......
`
`, BLUE}
`
`Figure 6.4 An N-state um-and-ball model illustrating the general case of a discrete
`symbol HMM.
`
`then replaced in the urn from which it was selected. A new um is then selected according
`to the random selection procedure associated with the current um, and the ball selection
`process is repeated. This entire process generates a finite observation sequence of colors,
`which we would like to model as the observable output of an HMM.
`It should be obvious that the simplest HMM that corresponds to the um-and-ball
`process is one in which each state corresponds to a specific um, and for which a (ball) color
`probability is defined for each state. The choice of urns is dictated by the state-transition
`matrix of the HMM.
`It should be noted that the ball colors in each um may be the same, and the distinction
`among various urns is in the way the collection of colored balls is composed. Therefore,
`an isolated observation of a particular color ball does not immediately tell which um it is
`drawn from.
`
`6.3.3 Elements of an HMM
`
`The above examples give us some idea of what an HMM is and how it can be applied to
`some simple scenarios. We now formally define the elements of an HMM.
`An HMM for discrete symbol observations such as the above um-and-ball model is
`characterized by the foil owing:
`
`1. N, the number of states in the model. Although the states are hidden, for many
`practical applications there is often some physical significance attached to the states
`or to sets of states of the model. Thus, in the coin-tossing experiments, each state
`corresponded to a distinct biased coin. In the um-and-ball model, the states cor(cid:173)
`responded to the urns. Generally the states are interconnected in such a way that
`any state can be reached from any other state (i.e., an ergodic model); however, we
`will see later in this chapter that other possible interconnections of states are often
`
`IPR2023-00035
`Apple EX1015 Page 180
`
`

`

`330
`
`Chap. 6
`
`Theory and Implementation of Hidden Markov Models
`
`(6.7)
`
`of interest and may better suit speech applications. We label the individual states as
`{ 1, 2, ... , N}, and denote the state at time t as q,.
`2. M, the number of distinct observation symbols per state-i.e., the discrete alphabet
`size. The observation symbols correspond to the physical output of the system being
`modeled. For the coin-toss experiments the observation symbols were simply heads
`or tails; for the ball-and-um model they were the colors of the balls selected from the
`urns. We denote the individual symbols as V = { v 1 , v2, ... , v M}.
`3. The state-transition probability distribution A = { aiJ} where
`aiJ = P[qr+l = jlq, = i],
`1 < i,j < N.
`For the special case in which any state can reach any other state in a single step, we
`have aij > 0 for all i,j. For other types of HMMs, we would have aij = 0 for one or
`more (i ,j) pairs.
`4. The observation symbol probability distribution, B = { bj(k)}, in which
`bj(k) = P[o, = vklq1 = }],
`1 < k < M,
`defines the symbol distribution in state},}= 1, 2, ... , N.
`5. The initial state distribution 1r = { 1r;} in which
`1r; = P[q1 = i],
`I < i < N.
`
`(6.8)
`
`(6.9)
`
`It can be seen from the above discussion that a complete specification of an HMM
`requires specification of two model parameters, N and M, specification of observation
`symbols, and the specification of the three sets of probability measures A, B, and 1r. For
`convenience, we use the compact notation
`;\ = (A,B, 1r)
`to indicate the complete parameter set of the model. This parameter set, of course, defines
`a probability measure for 0, i.e. P(OI;\), which we discuss in the next section. We use the
`tenninology HMM to indicate the parameter set .X and the associated probability measure
`interchangeably without ambiguity.
`
`(6.10)
`
`6.3.4 HMM Generator of Observations
`
`Given appropriate values of N,M,A,B, and 1r, the HMM can be used as a generator to give
`an observation sequence
`
`(6.11)
`
`(in which each observation Or is one of the symbols from V, and T is the number of
`observations in the sequence) as follows:
`
`1. Choose an initial state qi = i according to the initial state distribution 1r.
`2. Sett= 1.
`3. Choose Or = vk according to the symbol probability distribution in state i, i.e., bj(k).
`
`IPR2023-00035
`Apple EX1015 Page 181
`
`

`

`Extensions to Hidden Markov Models
`331
`sec. 6.3
`4. Transit to a new state q,+ 1 = j according to the state-transition probability distribution
`for state i, i.e., aij.
`s. Set t == t + 1; return to step 3 if t < T; otherwise, terminate the procedure.
`
`The following table shows the sequence of states and observations generated by the above
`procedure:
`
`time, t
`state
`observation
`
`2
`
`3
`
`4
`
`5
`
`6
`
`T
`
`The above procedure can be used as both a generator of observations and as a model to
`simulate how a given observation sequence was generated by an appropriate HMM.
`Exercise 6.2
`Consider an HMM representation (parametrized by >.) of a coin-tossing experiment. Assume
`a three-state model (corresponding
`to three different coins) with probabilities
`
`State I
`0.5
`0.5
`
`P(H)
`P(T)
`
`State 2
`0.75
`0.25
`
`State 3
`0.25
`0.75
`
`and with all state-transition probabilities equal to I /3. (Assume initial state probabilities of
`1/3.)
`1. You observe the sequence
`
`0 = (HHHHTHTTTT).
`What state sequence is most likely? What is the probability of the observation sequence
`and this most likely state sequence?
`2. What is the probability that the observation sequence came entirely from state I?
`3. Consider the observation sequence
`6 = (HTTHTHHTTH).
`How would your answers to parts a and b change?
`4. If the state-transition probabilities were
`a31 = 0.45
`a21 = 0.45
`a11 = 0.9
`,
`a32 = 0.45
`a,2 = 0.05
`, a22=0.l
`, a23 = 0.45
`a13 = 0.05
`033 = 0.1
`that is, a new model ')..', how would your answers to parts 1-3 change? What does this
`suggest about the type of sequences generated by the models?
`
`,
`,
`
`'
`
`Solution 6.2
`1. Given O = (HHHHTH777T)
`and that all state transitions are equiprobable, the most
`likely state sequence is the one for which the probability of each individual observation
`
`IPR2023-00035
`Apple EX1015 Page 182
`
`

`

`332
`
`Chap. 6
`
`Theory and Implementation of Hidden Markov Models
`
`is maximum. Thus for each H, the most likely_ state is 2 and for each T the most likely
`state is 3. Thus the most likely state sequence 1s
`q = (2 2 2 2 3 2 3 3 3 3).
`
`The probability of O and q (given the model) is
`
`P(O, qj-\) = (0.75) 10
`
`( 1) 10
`
`3
`
`2. The probability of O given that q is
`q = (1111 11 1 1 1 1)
`
`is
`
`P(O,qj-\) = (0.50) 10 (j) 10
`
`The ratio of P(O, qj>.) to P(O, qj-\) is:
`
`R = P(O, qj-\) = (~) 10 = 57.67
`P(O, qj-\)
`2
`which shows, as expected, that q is more likely than q.
`3. Given O which has the same number of Hs and Ts, the answers to parts 1 and 2 would
`remain the same, as the most likely states occur the same number of times in both cases.
`4. The new probability of O and q becomes
`
`P(O, qj-\') = (0.75) 10
`
`( j) (0.1)6(0.45)3.
`
`The new probability of O and q becomes
`
`The ratio is
`
`( 1) (0.9)9.
`P(O, qj-\') = (0.50) 10
`R = (~) ro (if (1) 3 = 1.36 X 10-s.
`In other words, because of the nonuniform transition probabilities, q is more likely than q. (The
`rea~er is encouraged to find the most likely state sequence in this case.) Now, the probability
`of O and q is not the same as the probability of O and q. We have
`P(O, qj-\') = 3 (0.1 )6 (0.45)3 (0.25)4 (0.75)6
`P(O, qj-\') = (0.50) 10 ( j") (0.9)9
`
`-
`
`1
`
`with ratio
`
`R = (if (lf (1)4
`Clearly, because a1, = 0.9, q is more likely.
`
`6
`
`(~)
`
`= 1.67 X 10- 1
`
`.
`
`IPR2023-00035
`Apple EX1015 Page 183
`
`

`

`sec. 6.4
`
`The Three Basic Problems for HMMs
`
`333
`
`THE THREE BASIC: PROBLEMS FOR HMMS
`
`6,4
`
`Given the fonn of HMM of the previous section, three basic problems of interest must
`be solved for the model to be useful in real-world applications. These problems are the
`following:
`
`Problem 1
`
`sequence O = (o, o ... or), and a mode1 A = (A,B, 1r), how do we
`Given the observation
`efficiently compute P(OI .X). the probability of the observation sequence, given the model?
`
`Problem 2
`
`sequence O = (o, o ... Or), and the model A, how do we choose
`Given the observation
`state sequence q = (q,q2 ... qr) that is optimal in some sense (i.e., best
`a corresponding
`"explains"
`the observations)?
`
`Problem 3
`How do we adjust the model parameters A = (A, B, 11') to maximize P(Oj.X)?
`Problem I is the evaluation problem; namely, given a model and a sequence of
`observations, how do we compute the probability that the observed sequence was produced
`by the model? We can also view the problem as one of scoring how well a given model
`matches a given observation sequence. The latter viewpoint is extremely useful. For
`example, if we consider the case in which we are trying to choose among several competing
`models, the solution
`to Problem 1 allows us to choose the model that best matches the
`observations.
`Problem 2 is the one in which we attempt to uncover the hidden part o·~ the model(cid:173)
`that is, to find the "correct" state sequence. It should be clear that for all but the case of
`degenerate models, there is no "correct" state sequence to be found. Hence for practical
`situations, we usually use an optimality criterion to solve this problem as best as possible.
`As we will see, several reasonable optimal cy criteria can be imposed, and hence the choice
`of criterion is a strong function of the intended use for the uncovered state sequence. Typical
`uses might be to learn about the structure of the model, to find optimal state sequences for
`continuous speech recognition, or to get average statistics of individual states, etc.
`Problem 3 is the one in which we attempt to optimize the model parameters to best
`describe how a given observation sequence comes about. The observation sequence used
`to adjust the model parameters is called a training sequence because it is used to "train" the
`HMM. The training problem is the crucial one for most applications of HMMs, because
`it allows us to optimally adapt model parameters to observed training data-i.e.,
`to create
`best models for real phenomena.
`To fix ideas, consider the following simple isolated-word speech recognizer. For each
`word of a W word vocabulary, we want to design a separateN-state HMM. We represent the
`speech signal of a given word as a time sequence of coded spectral vectors. We assume that
`the coding is done using a spectral codebook with M unique spectral vectors; hence each
`observation is the index of the spectral vector closest (in some spectral distortion sense) to
`the original speech signal. Thus, for each vocabulary word, we have a training sequence
`
`IPR2023-00035
`Apple EX1015 Page 184
`
`

`

`334
`
`Chap. 6
`
`Theory and Implementation of Hidden Markov Models
`
`consisting of a number of repetitions of sequences of codebook indices of the word (by
`one or more talkers). The first task is to build individual word models. This task is done
`by using the solution to Problem 3 to optimally estimate model parameters for each word
`model. To develop an understanding of the physical meaning of the model states, we use
`the solution to Problem 2 to segment each of the word training sequences into states, and
`then study the properties of the spectral vectors that lead to the observations occurring in
`each state. The goal here is to make refinements of the model (e.g., more states, different
`codebook size) to improve its capability of modeling the spoken word sequences. Finally,
`once the set of W HMMs has been designed and optimized, recognition of an unknown
`word is performed using the solution to Problem I to score each word model based upon
`the given test observation sequence, and select the word whose model score is highest (i.e.,
`the highest likelihood).
`In the next sections we present formal mathematical solutions to each fundamental
`problem for HMMs. We shall see that the three problems are tightly linked together under
`the probabilistic framework.
`
`6.4.1 Solution to Problem 1-Probability Evaluation
`
`We wish to calculate the probability of the observation sequence, 0 = ( o I o ... or), given
`the model >., i.e., P(Oj>.). The most straightforward way of doing this is through enumer(cid:173)
`ating every possible state sequence of length T (the number of observations). There are N7
`such state sequences. Consider one such fixed-state sequence
`
`(6.12)
`
`where q 1 is the initial state. The probability of the observation sequence O given the state
`sequence of Eq. (6.12) is
`
`P(Ojq, >.) = IT P(o,lq,, >.)
`
`T
`
`t=I
`where we have assumed statistical independence of observations. Thus we get
`
`P(Olq, >.) = bq, (01) • bq 2(02)
`The probability of such a state sequence q can be written as
`
`... bqr(or).
`
`(6.13a)
`
`(6.13b)
`
`P(qi>.) = 1rq,aq,q2aq2q3 • • • aqr-1Qr·
`The joint probability of O and q, i.e., the probability that O and q occur simultaneously, is
`simply the product of the above two terms, i.e.,
`
`(6.14)
`
`P(O, qi>.) = P(Olq, >.)P(qi>.).
`
`(6. 15)
`
`The probability of O (given the model) is obtained by summing this joint probability over
`all possible state sequences q, giving
`P(OI>.) = L P(Olq, >.)P(qi>.)
`
`(6.16)
`
`all q
`
`IPR2023-00035
`Apple EX1015 Page 185
`
`

`

`sec. 6.4
`
`The Three Basic Problems for HMMs
`
`335
`
`(6.17)
`
`QI, Q2, ..• , QT
`
`The interpretation of the computation in the above equation is the following. Initially (at
`time t == 1) we are in state Q1 with probability 7rq 1 , and generate the symbol o1 (in this state)
`with probability bq1 (01). The clock changes from time t tot+
`I (time= 2) and we make
`a transition to state Q2 from state Qi with probability aq 1q2 , and generate symbol 0i with
`probability bq2 (02). This process continues in this manner until we make the last transition
`(at time n from state QT-I to state QT with probability aqr-,qr and generate symbol Or with
`probability bq/0T ).
`A little thought should convince the reader that the calculation of P(Oj.X), according
`to its direct definition (Eq. (6.17)) involves on the order of 2T • NT calculations, since
`at every t = 1, 2, ... , T, there are N possible states that can be reached (i.e., there are
`NT possible state sequences), and for each such state sequence about 2T calculations are
`(To be precise, we need (2T - 1 )NT
`required for each term in the sum of Eq. ( 6.17).
`multiplications, and NT - I additions.) This calculation is computationally infeasible, even
`for small values of N and T; e.g., for N = 5 (states), T = 100 (observations), there are
`on the order of 2 • 100 • 5 100 ::::::: 1072 computations! Clearly a more efficient procedure is
`required to solve problem I. Fortunately such a procedure (called the forward procedure)
`exists.
`
`6.4.1.1 The Forward Procedure
`
`Consider the forward variable a,(i) defined as
`... o,, q, = ij.X)
`a,(i) = P(o102
`that is, the probability of the partial observation sequence, 01 Oi ... o,, (until time t) and
`state i at time t, given the model >.. We can solve for a,(i) inductively, as follows:
`
`(6.18)
`
`1. Initialization
`
`2. Induction
`
`3. Termination
`
`a1 (i) = 1r;b;(o1),
`
`N
`
`at+i (j) = [~ a,(i)au] bj(0 1+i),
`
`t=I
`
`1::;1::;r-1
`1 :s;j:s;N
`
`N
`
`P(OI.X) = L aT(i).
`
`i=I
`
`(6.19)
`
`(6.20)
`
`(6.21)
`
`Step 1 initializes the forward probabilities as the joint probability of state i and initial
`observation o1. The induction step, which is the heart of the forward calculation, is
`illustrated in Figure 6.5(a). This figure shows how state j can be reached at time t + 1
`from the N possible states, i, 1 < i < N, at time t. Since a,(i) is the probability of the
`joint event that o 1 o2 ... o, are observed, and the state at time tis i, the product a,(i)aij is
`
`IPR2023-00035
`Apple EX1015 Page 186
`
`

`

`336
`
`Chap. 6
`
`Theory and Implementation of Hidden Markov Models
`
`(O)
`
`(bl
`
`➔
`➔
`~
`
`T
`
`N
`
`U.I
`
`._
`C ._
`
`u,
`
`2
`
`2
`
`3
`OBSERVATION,
`
`t
`
`Figure 6.5
`(a) Illustration of the sequence of operations required for
`the computation of the forward variable o 1+ 1 (j).
`(b) Implementation
`of the computation of a:1(i) in terms of a lattice of observations t, and
`states i.
`
`then the probability of the joint event that o 1 o2 ... 0 1 are observed, and state j is reached
`at time t + 1 via state i at time t. Summing this product over all the N possible states, i,
`1 ~ i ~ Nat time t results in the probability of j at time t + I with all the accompanying
`previous partial observations. Once this is done and j is known, it is easy to see that a,+ 1 (j)
`is obtained by accounting for observation o,+1 in state j, i.e., by multiplying the summed
`quantity by the probability bj(o,+ 1). The computation of Eq. (6.20) is performed for all
`states j, I ~ j ~ N, for a given t; the computation is then iterated for t = 1, 2 ... , T - 1.
`Finally, step 3 gives the desired calculation of P(OI..\) as the sum of the terminal forward
`variables o.T(i). This is the case since, by definition,
`
`and hence P(Oj..\) is just the sum of the aT(i)'s.
`If we examine the computation involved in the calculation of o.,(j), I < t ~ T,
`I ~ j ~ N, we see that it requires on the order of N 2T calculations, rather than 2TNr as
`I) + N
`required by the direct calculation. (Again, to be precise, we need N(N + I )(T -
`
`(6.22)
`
`IPR2023-00035
`Apple EX1015 Page 187
`
`

`

`337
`
`The Three Basic Problems for HMMs
`
`sec. 6.4
`rnultiplications and N(N - 1 )(T - 1) additions.) For N = 5, T = l 00, we need about 3000
`cornputations for the forward method, versus 1072 computations for the direct calculation,
`a savings of about 69 orders of magnitude.
`is, in effect, based upon the lattice (or trellis)
`The forward probability calculation
`stn1cture shown in Figure 6.5(b). The key is that, because there are only N states (nodes
`at each time slot in the lattice), all the possible state sequences will remerge into these N
`nodes, no matter how long the observation sequence. At time t = I (the first time slot in
`the lattice), we need to calculate values of a 1 (i), l ~ i ~ N. At times t = 2, 3, ... , T, we
`need only calculate values of a,(j), I < j < N, where each calculation involves only the N
`previous values of a,_ 1 (i) because each of the N grid points can be reached from only the
`N grid points at the previous time slot.
`
`6.4.1.2 The Backward Procedure
`
`In a similar manner, we can consider a backward variable /3,(i) defined as
`/J,(i) = P(o,+10,+2
`.. . orlq, = i,A)
`(6.23)
`that is, the probability of the partial observation sequence from t + l to the end, given state
`i at time t and the model .A. Again we can solve for /3,(i) inductively, as follows:
`
`1. Initialization
`
`2. Induction
`
`/3-r(i) = 1,
`
`1 ~ i ~ N.
`
`N
`
`/3,(i) = L aijbj(Or+1)/3,+1U),
`t = T-
`
`j=l
`1, T-
`
`2, ... , 1,
`
`1 ~ i ~ N.
`
`(6.24)
`
`(6.25)
`
`The initialization step 1 arbitrarily defines /3r(i) to be 1 for all i. Step 2, which is illustrated
`in Figure 6.6, shows that in order to have been in state i at time t, and to account for the
`observation sequence from time t + 1 on, you have to consider all possible states j at time
`t + 1, accounting for the transition from i to j (the aij term), as well as the observation
`Or+ 1 in state j (the bj( Or+ 1) term), and then account for the remaining partial observation
`sequence from state j (the /3,+1 (j) term). We will see later how the backward as well as the
`forward calculations are used to help solve fundamental Problems 2 and 3 of HMMs.
`Again, the computation of /3,(i), 1 < t ~ T, 1 ~ i ~ N, requires on the order of N2T
`calculations, and can be computed in a lattice structure similar to that of Figure 6.5(b ).
`
`6.4.2 Solution to Problem 2-"0ptimal" State Sequence
`
`Unlike Problem 1, for which an exact solution can be given, there are several possible
`ways of solving Problem 2-namely,
`finding the "optimal" state sequence associated with
`the given observation sequence. The difficulty lies with the definition of the optimal state
`sequence-that
`is, there are several possible optimality criteria. For example, one possible
`
`IPR2023-00035
`Apple EX1015 Page 188
`
`

`

`338
`
`Chap. 6
`
`Theory and Implementation of Hidden Markov Models
`
`Si
`
`t + 1
`
`Figure 6.6 Sequence of operations required for the computa(cid:173)
`tion of the backward variable {3,(i).
`
`optimality criterion is to choose the states q1 that are individually most likely at each time t.
`This optimality criterion maximizes the expected number of correct individual states. To
`implement this solution to Problem 2, we can define the a posteriori probability variable
`,r(i) = P(q, = ilO, A)
`
`(6.26)
`
`that is, the probability of being in state i at time t, given the observation sequence 0, and
`the model A. We can express 11(i) in several forms, including
`"'(,(i) = P(q, = i I 0, A)
`P(O, q, = i I A)
`P(O I A)
`P(O, q, = i I A)
`N L P(O, q, = i I A)
`
`(6.27)
`
`i=l
`
`Since P(O, q, = i I ,\) is equal to o:r(i)/3,(i), we can write ,,(i) as
`
`(6.28)
`
`o,(i)/3,(i)
`,,(,) = _N ___
`_
`
`L o,<i)/3,<i)
`
`i=l
`where we see that o,(i) accounts for the partial observation sequence o1 o2 . .. o, and state
`i at t, while /3i(i) accounts for the remainder of the observation sequence o,+10,+2 ... or,
`given state q, = i at t.
`
`IPR2023-00035
`Apple EX1015 Page 189
`
`

`

`I
`
`sec. 6.4
`
`The Three Basic Problems for HMMs
`
`Using ,y,(i), we can solve for the individually most likely state q; at time t, as
`
`q~ = arg min ['y,(i)],
`lsJ-5,N
`
`I ~ t ~ T.
`
`339
`
`(6.29)
`
`Although Eq. (6.29) maximizes the expected number of correct states (by choosing the most
`likely state for each t), there could be some problems with the resulting state sequence.
`for example, when the HMM has state transitions which have zero probability (aij = O for
`some i andj), the "optimal" state sequence may, in fact, not even be a valid state sequence.
`This is because the solution of Eq. (6.29) simply determines the most likely state at every
`instant, without regard to the probability of occurrence of sequences of states.
`One possible solution
`to the above problem is to modify the optimality criterion.
`For example, one could solve for the state sequence that maximizes the expected number
`of correct pairs of states (q,, q,+ i), or triples of states (q,, qt+ 1, q,+2), etc. Although these
`criteria might be reasonable for some applications, the most widely used criterion is to find
`is, to maximize P(qjO, A), which is equivalent to
`the single best state sequence (path)-that
`maximizing P( q, 0 I A). A formal technique for finding this single best state sequence exists,
`based on dynamic programming methods, and is called the Viterbi algorithm [15, 16].
`
`6.4.2.1 The Viterbi Algorithm
`
`To find the single best state sequence, q = (q 1 q2 ... qr), for the given observation sequence
`0 = (o 1 02 ... or), we need to define the quantity
`8,(i) = max
`P[q1q2 ... q,-1, q, = i, 0102 ... o,jA]
`
`(6.30)
`
`Qt ,q2,····q,_,
`
`that is, 81(i) is the best score (highest probability) along a single path, at time t, which
`accounts for the first t observations and ends in state i. By induction we have
`
`(6.31)
`
`To actually retrieve the state sequence, we need to keep track of the argument that maximized
`Eq. (6.31), for each t and j. We do this via the array 'lj;,(j). The complete procedure for
`finding the best state sequence can now be stated as follows:
`
`1. Initialization
`
`2. Recursion
`
`81 (i) = 1r;b;(oi),
`'Ip) (i) = 0.
`
`8,(j) = max [8,-1(i)aij]bj(o,),
`lsJ$.N
`
`'l/J,(j) = arg max [8,_ 1(i)aij],
`I $_;$.N
`
`2 ~ t ~ T
`I ~j ~N
`2 ~ t ~ T
`1 ~j ~ N.
`
`'
`
`(6.32a)
`(6.32b)
`
`(6.33a)
`
`(6.33b)
`
`IPR2023-00035
`Apple EX1015 Page 190
`
`

`

`340
`
`Chap. 6
`
`Theory and Implementation of Hidden Markov Models
`
`3. Tennination
`
`p• = max [6r(i)]
`Qr = arg max [ 8r(i)].
`
`l<i<N
`
`J<i<N
`
`4. Path ( state sequence) backtracking
`q; = 1P1+1(q7+1),
`
`t = T - 1, T - 2, ... , 1.
`
`(6.34a)
`
`(6.34b)
`
`(6.35)
`
`It should be noted that the Viterbi algorithm is similar (except for the backtracking step) in
`implementation to the forward calculation of Eqs. (6.19)-(6.21). The major difference is
`the maximization in Eq. (6.33a) over previous states, which is used in place of the summing
`procedure in Eq. (6.20). It also should be clear that a lattice (or trellis) structure efficiently
`implements the computation of the Viterbi procedure.
`
`6.4.2.2 Alternative Viterbi Implementation
`
`~J
`.,,
`~~
`f
`
`By ta1cing logarithms of the model parameters, the Viterbi algorithm of the preceding
`section can be implemented without the need for any multiplications. Thus:
`
`0. Preprocessing
`
`1. Initiali1.ation
`
`ii-; = log (1r;),
`1 < i < N
`b;(o,) = log [b;(o,)], 1 < i < N, 1 < t < T
`aij = log (aiJ),
`1 < i, j < N
`
`-
`-
`81 (i) = log (81 (i)) = i; + b;(o1), 1 < i < N
`VJI (i) = 0,
`1 < i < N
`
`2. Recursion
`6,(1) = log (8,(1)) = max [i,-1(i) + aiJ] + hj(o,)
`1/J,(j) = arg max [8,-1(i) + aiJ],
`2 < t < T, 1 <j < N
`
`l<i<N
`
`l<i<N
`
`3. Termination
`
`4. Backtracking
`
`P* = max [6r(i)]
`qi = arg max [67(i)]
`
`l<i<N
`
`I<i<N
`
`t = T - 1, T - 2, ... , 1
`The calculation required for this alternative implementation is on the order of N2T additions
`(plus the calculation for preprocessing). Because the preprocessing needs to be perform

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket