INTERNATIONAL COMPUTER SCIENCE INSTITUTE
1947 Center St., Suite 600, Berkeley, California 94704

Markov Models and Hidden Markov Models: A Brief Tutorial

Eric Fosler-Lussier

TR-98-041

December 1998

Abstract

This tutorial gives a gentle introduction to Markov models and Hidden Markov models as mathematical abstractions, and relates them to their use in automatic speech recognition. This material was developed for a Fall-semester offering of CS 188: Introduction to Artificial Intelligence at the University of California, Berkeley. It is targeted at introductory AI courses; basic knowledge of probability theory (e.g. Bayes' Rule) is assumed. This version is slightly updated from the original, including a few minor error corrections, a short "Further Reading" section, and exercises that were given as a homework in that class.
1 Markov Models

Let's talk about the weather. Here in Berkeley, we have three types of weather: sunny, rainy, and foggy. Let's assume for the moment that the weather lasts all day, i.e. it doesn't change from rainy to sunny in the middle of the day.

Weather prediction is all about trying to guess what the weather will be like tomorrow based on a history of observations of weather. Let's assume a simplified model of weather prediction: we'll collect statistics on what the weather was like today based on what the weather was like yesterday, the day before, and so forth. We want to collect the following probabilities:

    P(w_n | w_{n-1}, w_{n-2}, ..., w_1)                                     (1)

Using expression (1), we can give probabilities of types of weather for tomorrow and the next day using n days of history. For example, if we knew that the weather for the past three days was {sunny, sunny, foggy} in chronological order, the probability that tomorrow would be rainy is given by:

    P(w_4 = Rainy | w_3 = Foggy, w_2 = Sunny, w_1 = Sunny)                  (2)

Here's the problem: the larger n is, the more statistics we must collect. Suppose that n = 5; then we must collect statistics for 3^5 = 243 past histories. Therefore, we will make a simplifying assumption, called the Markov Assumption:

In a sequence {w_1, w_2, ..., w_n}:

    P(w_n | w_{n-1}, w_{n-2}, ..., w_1) ≈ P(w_n | w_{n-1})                  (3)

This is called a first-order Markov assumption, since we say that the probability of an observation at time n depends only on the observation at time n-1. A second-order Markov assumption would have the observation at time n depend on the observations at times n-1 and n-2. In general, when people talk about Markov assumptions, they usually mean first-order Markov assumptions; I will use the two terms interchangeably.

We can then express the joint probability using the Markov assumption:

    P(w_1, ..., w_n) = ∏_{i=1}^{n} P(w_i | w_{i-1})                         (4)

So this now has a profound effect on the number of histories that we have to find statistics for: we now only need 3 × 3 = 9 numbers to characterize the probabilities of all of the sequences. This assumption may or may not be valid depending on the situation (in the case of weather, it's probably not valid), but we use it to simplify the situation.

So let's arbitrarily pick some numbers for P(w_tomorrow | w_today), expressed in Table 1.

                              Tomorrow's Weather
                            Sunny    Rainy    Foggy
    Today's      Sunny       0.8      0.05     0.15
    Weather      Rainy       0.2      0.6      0.2
                 Foggy       0.2      0.3      0.5

Table 1: Probabilities of tomorrow's weather based on today's weather

For first-order Markov models, we can use these probabilities to draw a probabilistic finite state automaton. For the weather domain, you would have three states (Sunny, Rainy, and Foggy), and every day you would transition to a (possibly) new state based on the probabilities in Table 1.

(Footnote: One question that comes to mind is "What is w_0?" In general, one can think of w_0 as the START word, so P(w_1 | w_0) is the probability that w_1 can start a sentence. We can also just multiply the prior probability of w_1 by the product ∏_{i=2}^{n} P(w_i | w_{i-1}); it's just a matter of definitions.)

Such an automaton would look like this:
[Figure: State diagram of the weather automaton. Each state (Sunny, Rainy, Foggy) has a self-loop and transitions to the other two states, labeled with the probabilities from Table 1: Sunny→Sunny 0.8, Sunny→Foggy 0.15, Sunny→Rainy 0.05; Foggy→Foggy 0.5, Foggy→Sunny 0.2, Foggy→Rainy 0.3; Rainy→Rainy 0.6, Rainy→Sunny 0.2, Rainy→Foggy 0.2.]
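
To make the bookkeeping concrete, here is a small Python sketch (my own illustration, not part of the original tutorial) that encodes Table 1 as a transition table and scores a weather sequence using equation (4); the names TRANS and sequence_probability are choices of this sketch.

    # Transition probabilities P(tomorrow | today) from Table 1.
    TRANS = {
        "Sunny": {"Sunny": 0.8, "Rainy": 0.05, "Foggy": 0.15},
        "Rainy": {"Sunny": 0.2, "Rainy": 0.6,  "Foggy": 0.2},
        "Foggy": {"Sunny": 0.2, "Rainy": 0.3,  "Foggy": 0.5},
    }

    def sequence_probability(weather, first_day_prob=1.0):
        """P(w_1, ..., w_n) = P(w_1) * prod_i P(w_i | w_{i-1}), as in equation (4)."""
        p = first_day_prob                        # prior for day 1, or 1.0 if day 1 is given
        for yesterday, today in zip(weather, weather[1:]):
            p *= TRANS[yesterday][today]          # one first-order transition per day
        return p

    # Example: P(w_2 = Sunny, w_3 = Rainy | w_1 = Sunny) = 0.8 * 0.05 = 0.04
    print(sequence_probability(["Sunny", "Sunny", "Rainy"]))

The later sketches below reuse this TRANS table.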

1.1 Questions

1. Given that today is sunny, what's the probability that tomorrow is sunny and the day after is rainy?

   Well, this translates into:

   P(w_2 = Sunny, w_3 = Rainy | w_1 = Sunny)
       = P(w_3 = Rainy | w_2 = Sunny, w_1 = Sunny) P(w_2 = Sunny | w_1 = Sunny)
       = P(w_3 = Rainy | w_2 = Sunny) P(w_2 = Sunny | w_1 = Sunny)
       = (0.05)(0.8)
       = 0.04

   You can also think about this as moving through the automaton, multiplying the probabilities as you go.

2. Given that today is foggy, what's the probability that it will be rainy two days from now?

   There are three ways to get from foggy today to rainy two days from now: {foggy, foggy, rainy}, {foggy, rainy, rainy}, and {foggy, sunny, rainy}. Therefore we have to sum over these paths:

   P(w_3 = Rainy | w_1 = Foggy)
       = P(w_2 = Foggy, w_3 = Rainy | w_1 = Foggy)
         + P(w_2 = Rainy, w_3 = Rainy | w_1 = Foggy)
         + P(w_2 = Sunny, w_3 = Rainy | w_1 = Foggy)
       = P(w_3 = Rainy | w_2 = Foggy) P(w_2 = Foggy | w_1 = Foggy)
         + P(w_3 = Rainy | w_2 = Rainy) P(w_2 = Rainy | w_1 = Foggy)
         + P(w_3 = Rainy | w_2 = Sunny) P(w_2 = Sunny | w_1 = Foggy)
       = (0.3)(0.5) + (0.6)(0.3) + (0.05)(0.2)
       = 0.34

Note that you have to know where you start from. Usually Markov models start with a null start state, and have transitions to other states with certain probabilities. In the above problems, you can just add a start state with a single arc with probability 1 to the initial state (sunny in problem 1, foggy in problem 2).
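
The second question can also be checked mechanically: summing over the intermediate day's weather is a one-line loop over the TRANS table from the sketch above (again my own illustration, not the tutorial's).

    def prob_after_two_days(start, end):
        """P(w_3 = end | w_1 = start), summing over every possible weather on day 2."""
        return sum(TRANS[start][mid] * TRANS[mid][end] for mid in TRANS)

    print(prob_after_two_days("Foggy", "Rainy"))   # 0.5*0.3 + 0.3*0.6 + 0.2*0.05 = 0.34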

2 Hidden Markov Models

So what makes a Hidden Markov Model? Well, suppose you were locked in a room for several days, and you were asked about the weather outside. The only piece of evidence you have is whether the person who comes into the room bringing your daily meal is carrying an umbrella or not.

Let's suppose the following probabilities:

    Weather    Probability of Umbrella
    Sunny      0.1
    Rainy      0.8
    Foggy      0.3

Table 2: Probabilities of seeing an umbrella based on the weather

Remember, the equation for the weather Markov process before you were locked in the room was:

    P(w_1, ..., w_n) = ∏_{i=1}^{n} P(w_i | w_{i-1})                         (5)

Now we have to factor in the fact that the actual weather is hidden from you. We do that by using Bayes' Rule:

    P(w_1, ..., w_n | u_1, ..., u_n) = P(u_1, ..., u_n | w_1, ..., w_n) P(w_1, ..., w_n) / P(u_1, ..., u_n)        (6)

where u_i is true if your caretaker brought an umbrella on day i, and false if the caretaker didn't. The probability P(w_1, ..., w_n) is the same as the Markov model from the last section, and the probability P(u_1, ..., u_n) is the prior probability of seeing a particular sequence of umbrella events (e.g. {True, False, True}). The probability P(u_1, ..., u_n | w_1, ..., w_n) can be estimated as ∏_{i=1}^{n} P(u_i | w_i), if you assume that, for all i, given w_i, u_i is independent of all u_j and w_j, for all j ≠ i.
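
As an illustrative sketch (mine, not the tutorial's), the quantity that results from combining equation (5) with this independence assumption, ∏_i P(w_i | w_{i-1}) P(u_i | w_i), can be scored directly. P_UMBRELLA encodes Table 2, the function name is a choice of this sketch, and TRANS is reused from the first sketch.

    P_UMBRELLA = {"Sunny": 0.1, "Rainy": 0.8, "Foggy": 0.3}   # Table 2: P(umbrella | weather)

    def joint_score(first_day, weather, umbrellas):
        """P(u_2, ..., u_n, w_2, ..., w_n | w_1 = first_day)
        = prod_i P(w_i | w_{i-1}) P(u_i | w_i) under the HMM assumptions."""
        p, prev = 1.0, first_day
        for w, u in zip(weather, umbrellas):
            p_obs = P_UMBRELLA[w] if u else 1.0 - P_UMBRELLA[w]   # P(u_i | w_i)
            p *= TRANS[prev][w] * p_obs                           # transition * observation
            prev = w
        return p

Dividing such a score by the prior probability of the umbrella observations (taken to be 0.5 per day in the questions below) gives the posteriors worked out by hand in the next section.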

2.1 Questions

1. Suppose the day you were locked in it was sunny. The next day, the caretaker carried an umbrella into the room. Assuming that the prior probability of the caretaker carrying an umbrella on any day is 0.5, what's the probability that the second day was rainy?

   P(w_2 = Rainy | w_1 = Sunny, u_2 = True)
       = P(w_2 = Rainy, w_1 = Sunny | u_2 = True) / P(w_1 = Sunny | u_2 = True)
   (u_2 and w_1 independent)
       = P(w_2 = Rainy, w_1 = Sunny | u_2 = True) / P(w_1 = Sunny)
   (Bayes' Rule)
       = P(u_2 = True | w_1 = Sunny, w_2 = Rainy) P(w_2 = Rainy, w_1 = Sunny) / [P(w_1 = Sunny) P(u_2 = True)]
   (Markov assumption)
       = P(u_2 = True | w_2 = Rainy) P(w_2 = Rainy, w_1 = Sunny) / [P(w_1 = Sunny) P(u_2 = True)]
   (P(A, B) = P(A | B) P(B))
       = P(u_2 = True | w_2 = Rainy) P(w_2 = Rainy | w_1 = Sunny) P(w_1 = Sunny) / [P(w_1 = Sunny) P(u_2 = True)]
   (cancel P(w_1 = Sunny))
       = P(u_2 = True | w_2 = Rainy) P(w_2 = Rainy | w_1 = Sunny) / P(u_2 = True)
       = (0.8)(0.05) / 0.5
       = 0.08
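
With the joint_score sketch from above (again my illustration, not the tutorial's code), the same number falls out directly:

    # P(w_2 = Rainy | w_1 = Sunny, u_2 = True), using the assumed prior P(u = True) = 0.5
    print(joint_score("Sunny", ["Rainy"], [True]) / 0.5)   # (0.8 * 0.05) / 0.5 = 0.08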

2. Suppose the day you were locked in the room it was sunny; the caretaker brought in an umbrella on day 2, but not on day 3. Again assuming that the prior probability of the caretaker bringing an umbrella is 0.5, what's the probability that it's foggy on day 3?

   P(w_3 = Foggy | w_1 = Sunny, u_2 = True, u_3 = False)
       = P(w_2 = Foggy, w_3 = Foggy | w_1 = Sunny, u_2 = True, u_3 = False)
         + P(w_2 = Rainy, w_3 = Foggy | ...)
         + P(w_2 = Sunny, w_3 = Foggy | ...)
       = P(u_3 = F | w_3 = F) P(u_2 = T | w_2 = F) P(w_3 = F | w_2 = F) P(w_2 = F | w_1 = S) P(w_1 = S) / [P(u_3 = F) P(u_2 = T) P(w_1 = S)]
         + P(u_3 = F | w_3 = F) P(u_2 = T | w_2 = R) P(w_3 = F | w_2 = R) P(w_2 = R | w_1 = S) P(w_1 = S) / [P(u_3 = F) P(u_2 = T) P(w_1 = S)]
         + P(u_3 = F | w_3 = F) P(u_2 = T | w_2 = S) P(w_3 = F | w_2 = S) P(w_2 = S | w_1 = S) P(w_1 = S) / [P(u_3 = F) P(u_2 = T) P(w_1 = S)]
       = P(u_3 = F | w_3 = F) P(u_2 = T | w_2 = F) P(w_3 = F | w_2 = F) P(w_2 = F | w_1 = S) / [P(u_3 = F) P(u_2 = T)]
         + P(u_3 = F | w_3 = F) P(u_2 = T | w_2 = R) P(w_3 = F | w_2 = R) P(w_2 = R | w_1 = S) / [P(u_3 = F) P(u_2 = T)]
         + P(u_3 = F | w_3 = F) P(u_2 = T | w_2 = S) P(w_3 = F | w_2 = S) P(w_2 = S | w_1 = S) / [P(u_3 = F) P(u_2 = T)]
       = (0.7)(0.3)(0.5)(0.15) / [(0.5)(0.5)]
         + (0.7)(0.8)(0.2)(0.05) / [(0.5)(0.5)]
         + (0.7)(0.1)(0.15)(0.8) / [(0.5)(0.5)]
       = 0.119
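
The same answer can be checked with the joint_score sketch by summing over the three possible values of w_2 (again an illustration of mine, not part of the original):

    total = sum(joint_score("Sunny", [w2, "Foggy"], [True, False]) for w2 in TRANS)
    print(total / (0.5 * 0.5))   # (0.01575 + 0.0056 + 0.0084) / 0.25 = 0.119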

3 Relationship to Speech

Let's get away from umbrellas and such for a moment and talk about real things, like speech. In speech recognition, the basic idea is to find the most likely string of words given some acoustic input, or:

    argmax_{w ∈ L} P(w | y)                                             (7)

where w is a string of words, L is the language you're interested in, and y is the set of acoustic vectors that you've gotten from your front-end processor. To compare this to the weather example, the acoustics are our observations, similar to the umbrella observations, and the words are similar to the weather on successive days.

Remember that the basic equation of speech recognition is Bayes' Rule:

    argmax_{w ∈ L} P(w | y) = argmax_{w ∈ L} P(y | w) P(w) / P(y)       (8)

For a single speech input (e.g. one sentence), the acoustics (y) will be constant, and therefore, so will P(y). So we only need to find:

    argmax_{w ∈ L} P(y | w) P(w)                                        (9)

The first part of this expression is called the model likelihood, and the second part is a prior probability of the word string.
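
As a toy illustration (not from the tutorial), once the two factors of expression (9) are available for a set of candidate word strings, recognition reduces to an argmax; the candidate strings and their probabilities below are made up purely for the example.

    # Hypothetical candidates with made-up model likelihoods P(y | w) and priors P(w).
    candidates = {
        "recognize speech":   (1.2e-5, 3.0e-6),
        "wreck a nice beach": (1.5e-5, 1.0e-7),
    }
    best = max(candidates, key=lambda w: candidates[w][0] * candidates[w][1])
    print(best)   # the w maximizing P(y | w) P(w); P(y) is constant and can be ignored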

3.1 Word pronunciations

First, we want to be able to calculate P(y | w), the probability of the acoustics given the word. Note that many acoustic vectors can make up a word, so we choose an intermediate representation: the phone. Here is a word model for the word "of":

[Figure 1: Word model for "of". From the start state q_a, the model moves to state q_b (phone "ax") with probability 0.7, labeled P(q_b | q_a), or to state q_d (phone "ah") with probability 0.3. Both paths continue to state q_c (phone "v") and then to the End state with probability 1.0.]

In this model, we represent phone states as q_i's. We can now break up P(y | w) into P(y | q, w) and P(q | w). Since the picture above only talks about one word, w_of, we leave the w out of the conditioning here.

    P(y | w_of) = P(y | Start ax v End) + P(y | Start ah v End)                        (10)

    P(y | w_of) = P(q_b | q_a) P(y_1 | q_b) P(q_c | q_b) P(y_2 | q_c)
                  + P(q_d | q_a) P(y_1 | q_d) P(q_c | q_d) P(y_2 | q_c)                (11)

The individual P(y_i | q_i) are given by a Gaussian or multi-layer-perceptron likelihood estimator. The transitions P(q_j | q_i) depend on the pronunciations of words.
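
Here is a small numeric sketch (mine, not the tutorial's) of equation (11) for the word model in Figure 1, with made-up per-frame likelihoods standing in for the Gaussian or multi-layer-perceptron estimator:

    # Hypothetical frame likelihoods P(y_i | q) for the two frames y_1 and y_2 of "of".
    frame_likelihood = [
        {"ax": 0.02,  "ah": 0.01,  "v": 0.001},   # P(y_1 | phone state)
        {"ax": 0.001, "ah": 0.002, "v": 0.03},    # P(y_2 | phone state)
    ]

    # Two paths through the model: Start -> ax -> v (0.7) and Start -> ah -> v (0.3);
    # the arcs into "v" have probability 1.0.
    p_of = (0.7 * frame_likelihood[0]["ax"] * 1.0 * frame_likelihood[1]["v"]
            + 0.3 * frame_likelihood[0]["ah"] * 1.0 * frame_likelihood[1]["v"])
    print(p_of)   # P(y | w_of), summing over the two pronunciations as in equation (11)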

(Footnote: Each acoustic vector is usually an n-dimensional vector which represents about 10 milliseconds of speech; a typical value for n varies based on the type of system, so your mileage may vary.)

(Footnote: I've made a simplifying assumption here that for the word "of" we only have two acoustic vectors, y_1 and y_2. However, we don't know a priori how long a word is. In order to handle multiple lengths of words, self-loops are often added to the pronunciation graph. Thus, one can stay in state q_b with probability P(q_b | q_b); we can then handle these probabilities like any other transition probabilities on the graph.)

3.2 Bigram grammars

In the second part of expression (9), we were concerned with P(w). This is the prior probability of the string of words we are considering. Notice that we can also replace this with a Markov model, using the following equation:

    P(w) = ∏_{i=1}^{n} P(w_i | w_{i-1})                                  (12)

What is this saying? Well, suppose that we had the string "I like cabbages". The probability of this string would be:

    P(I like cabbages) = P(I | START) P(like | I) P(cabbages | like)     (13)

This type of grammar is called a bigram grammar (since it relates two (bi) words (grams)). Using a higher-order Markov assumption allows more context into the grammars; for instance, a second-order Markov language model is a trigram grammar, where the probabilities of word n are given by P(w_n | w_{n-1}, w_{n-2}).
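
A minimal sketch (mine, not the tutorial's) of a bigram grammar as a lookup table, with invented probabilities:

    # Hypothetical bigram probabilities P(word | previous word).
    BIGRAM = {
        ("<START>", "I"): 0.2,
        ("I", "like"): 0.1,
        ("like", "cabbages"): 0.001,
    }

    def string_prior(words):
        """P(w) = prod_i P(w_i | w_{i-1}), with a START token before the first word."""
        p, prev = 1.0, "<START>"
        for w in words:
            p *= BIGRAM[(prev, w)]
            prev = w
        return p

    print(string_prior(["I", "like", "cabbages"]))   # P(I | START) P(like | I) P(cabbages | like)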

4 Further reading

I have only really scratched the surface of automatic speech recognition (ASR) here; for the sake of brevity I have omitted many of the details that one would need to consider in building a speech recognizer. For a more mathematically rigorous introduction to Hidden Markov Models and ASR systems, I would suggest (Rabiner 1989). A recent book by Jelinek (1997) gives a good overview of how to build large-vocabulary ASR systems, at about the same level of rigor as Rabiner. Hidden Markov models can also be viewed as one instance of statistical graphical models (Smyth et al. 1997). Other recommended introductions to HMMs, ASR systems, and speech technology are Rabiner & Juang (1993), Deller et al. (1993), and Morgan & Gold (forthcoming).

5 Exercises

Imagine that you work at ACME Chocolate Factory, confectioners extraordinaire. Your job is to keep an eye on the conveyor belt, watching the chocolates as they come out of the press one at a time.

Suppose that ACME makes two types of chocolates: ones with almonds and ones without. For the first few problems, assume you can tell with 100% accuracy what the chocolate contains. In the control room, there is a lever that switches the almond control on and off. When the conveyor is turned on at the beginning of the day, there is a 50% chance that the almond lever is on, and a 50% chance that it is off. As soon as the conveyor belt is turned on, it starts making a piece of candy.
Unfortunately, someone has let a monkey loose in the control room, and it has locked the door and started the conveyor belt. The lever cannot be moved while a piece of candy is being made. Between pieces, however, there is a 30% chance that the monkey switches the lever to the other position (i.e. turns almonds on if it was off, or off if it was on).

1. Draw a Markov Model that represents the situation. Hint: you need three states.

Now assume that there is a coconut lever as well, so that there are four types of candy: Plain, Almond, Coconut, and Almond+Coconut. Again, there is a 50% chance of the coconut lever being on at the beginning of the day, and the chance of the monkey switching the state of the second lever between candies is also 30%. Assume that the switching of the levers is independent of each other.
`
2. Now draw a Markov Model for the production of all four types of chocolates.

3. What is the probability that the machine will produce, in order: {Plain, Almond, Almond, Almond+Coconut}?
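
For readers who want to sanity-check their answers empirically, here is a small Monte Carlo sketch of the candy process (my own illustration, not part of the exercises); it uses the 50% initial-lever and 30% switching figures from the setup above, and the names are choices of this sketch.

    import random

    SWITCH = 0.3   # chance the monkey flips a lever between candies

    def simulate(num_candies, num_levers=2):
        """One run of the machine: each candy is a tuple of lever states,
        (almond_on, coconut_on); (False, False) is Plain, (True, True) is Almond+Coconut."""
        levers = [random.random() < 0.5 for _ in range(num_levers)]   # 50% on at start of day
        candies = []
        for _ in range(num_candies):
            candies.append(tuple(levers))
            levers = [(not s) if random.random() < SWITCH else s for s in levers]
        return candies

    # Estimate P(Plain, Almond, Almond, Almond+Coconut) for exercise 3 by sampling.
    target = [(False, False), (True, False), (True, False), (True, True)]
    runs = 100_000
    hits = sum(simulate(4) == target for _ in range(runs))
    print(hits / runs)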

Now assume that you can't tell what's inside the chocolate candy, only that the chocolate is light or dark. Use the following table for P(color | stuff inside):

    Inside            P(color=light)   P(color=dark)
    Plain                  …                …
    Almond                 …                …
    Coconut                …                …
    Almond+Coconut         …                …

Assume that the prior probabilities P(color=light) = P(color=dark) = 0.5, and that the color of one chocolate is independent of the next, given the filling.

4. If the first two chocolates were dark and light respectively, what is the probability that they are Almond and Coconut respectively?

Acknowledgments

The weather story for Markov models has been floating around the speech community for a while; it is presented (probably originally) by Rabiner (1989). Thanks to Su-Lin Wu, Nelson Morgan, Nikunj Oza, Brian Kingsbury, and Jitendra Malik for proofreading and comments. Any mistakes in this tutorial, however, are the responsibility of the author, and not of the reviewers.

References

Deller, J., J. Proakis, & J. Hansen. 1993. Discrete-Time Processing of Speech Signals. New York: Macmillan Publishing.

Jelinek, F. 1997. Statistical Methods for Speech Recognition. Language, Speech and Communication Series. Cambridge, MA: MIT Press.

Morgan, N., & B. Gold. forthcoming. Speech and Audio Signal Processing. Wiley Press.

Rabiner, L. 1989. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 77(2):257-286.

Rabiner, L., & B.-H. Juang. 1993. Fundamentals of Speech Recognition. Prentice Hall Signal Processing Series. Englewood Cliffs, NJ: Prentice Hall.

Smyth, P., D. Heckerman, & M.I. Jordan. 1997. Probabilistic independence networks for hidden Markov probability models. Neural Computation 9(2):227-269.