7.4 The Level Building (LB) Algorithm

m_2(\ell) = \max_{1 \le v \le V} \left[ m_{v2}(\ell) \right]    (7.17)
and for each frame, m, in the composite ending range, m_1(\ell) \le m \le m_2(\ell), we need to store

D_\ell(m) = \min_{1 \le v \le V} \left[ D_\ell^v(m) \right]    - best distance at level ℓ to frame m    (7.18)

N_\ell(m) = \arg\min_{1 \le v \le V} \left[ D_\ell^v(m) \right]    - reference pattern index which gave best distance at level ℓ to frame m    (7.19)

F_\ell(m)    - back pointer to best ending frame at previous level that achieves D_\ell(m)    (7.20)

(By definition F_1(m) = 0 for all m, since the ending frame of the 0th level is 0.) By storing only D_\ell(m), N_\ell(m), and F_\ell(m), we significantly reduce the storage at each level and yet we retain all the information required to pick up the optimal path through the entire grid, as shown below.
The second-level computation does not begin until the first-level computation is finished. To see how the computation is picked up at level 2, consider Figure 7.9, which shows a series of time warps that span a set of beginning frames and provide a new set of ending frames.
For each reference pattern, R_v, the range of starting frames is m_1(1) \le m \le m_2(1), because every ending point of the first level is a possible starting point in the second level. Hence for each frame m in the beginning range, and for each frame at the beginning of the reference pattern, we must consider paths coming from both the current reference pattern and the previous level. Other than the broadened beginning region, each DTW at level 2 is essentially identical to those at level 1. Thus for reference R_1 the range of ending frames at level 2 is m_{11}(2) \le m \le m_{12}(2); for reference R_2 the range of ending frames at level 2 is m_{21}(2) \le m \le m_{22}(2), etc. We again derive the composite ending range at the end of level 2 as
m_1(2) = \min_{1 \le v \le V} \left[ m_{v1}(2) \right]    (7.21)

m_2(2) = \max_{1 \le v \le V} \left[ m_{v2}(2) \right]    (7.22)
and for each frame in m_1(2) \le m \le m_2(2) we determine the best distance, D_2(m), the reference with the best distance, N_2(m), and the backpointer, F_2(m).
We can continue the level building procedure through all levels up to level L_max in the above manner, and we obtain, as the final solution, D* as

D^* = \min_{1 \le \ell \le L_{max}} \left[ D_\ell(M) \right]    (7.23)
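Given the stored per-level arrays, the optimal word string is recovered by a simple backtrace; below is a sketch under the same assumed layout, where N_levels[l][m] and F_levels[l][m] hold the Eq. (7.19)-(7.20) arrays for level l.

def backtrack(N_levels, F_levels, best_level, M):
    """Recover the word string for the best path ending at test frame M."""
    words = []
    m = M
    for l in range(best_level, 0, -1):
        words.append(N_levels[l][m])   # word recognized at level l
        m = F_levels[l][m]             # ending frame at level l - 1
    return list(reversed(words))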
It should be clear that by performing the computation in levels (i.e., a word at a time) and by doing appropriate minimization within levels, we avoid much of the computation of the two-level DP algorithm described in the previous section. However, the negative feature of the level building algorithm is that the computation is level synchronous, not time synchronous; that is, we can go back to a given test frame at many levels. Hence, real-time hardware implementations of level building are difficult, if not impossible, to implement. (We will return to these issues later in this section.)
Figure 7.9 Implementation of level 2 of the level building algorithm (after Myers and Rabiner [4]).
To gain a better understanding of the basic concepts of the level building algorithm, consider the simple example shown in Figure 7.10. Here we assume that the vocabulary consists of two words (which we call A and B for simplicity) and that the two reference patterns, R_A and R_B, are of equal length. We also assume we are only interested in an ℓ = 4 level solution. (This greatly simplifies the range of the dynamic programming search parallelogram which, as seen in Figure 7.10, is essentially identical to the case for isolated word template matching. In the next section, where multiple level considerations are discussed, the dynamic programming search range will be shown to be significantly more complicated to specify.) Since both patterns are of equal length, the ending regions for both words, at each level, are identical. Hence at each level ending region we choose the reference pattern (A or B) that gave the smallest accumulated distance to that frame. In this simple example, there are 6 ending frames at level 1, with the best path corresponding to R_A for the first two frames, and R_B for the next 4 frames. At level 2 there are 10 ending frames; at level 3 there are 6 ending frames; and finally at level 4 there is only one ending frame, corresponding to frame M, the end of the test utterance. By tracing back the best path ending at m = M, we see the sequence of reference patterns giving the best score, with test frames e_1, e_2, e_3, and e_4 = M corresponding to the last frames of the 4 words in the sequence.
Figure 7.10 Simple example illustrating level building on two reference patterns of equal length (after Myers and Rabiner [4]).
7.4.2 Multiple Level Considerations
In the more realistic case in which we want the level building solutions for all feasible levels (i.e., all levels for which there is a possible solution), there are some very simple techniques that can be used to eliminate unnecessary computation. To understand this issue, consider the "standard" warping range of the level building algorithm as shown in Figure 7.11. (We assume here that we use a DTW algorithm with a maximum expansion of 2 and a minimum expansion of 1/2.)
Figure 7.11 Computation region of level building algorithm for a fixed-length reference pattern (after Myers and Rabiner [4]).
If we define lower and upper constraint lines as

L(m) = (m + 1)/2    (7.24)

U(m) = 2(m - 1) + 1    (7.25)
then for a fixed-length reference pattern we get the computation and ending regions shown in Figure 7.11. (We denote the computation region at level ℓ as G_ℓ, with ending region E_ℓ.) It should be clear that some of the computation of Figure 7.11 is unnecessary, since there do not exist paths from some of the computed points to ends of reference patterns at any level.
We can use the constraint that, in order to do the computation at any grid point, the path from that grid point must be capable of reaching the end of the reference pattern before the end of the test pattern, as shown in Figure 7.12. Here we have drawn lines, at each level, of slope 2, from the last frame of the test pattern and the last frame of the reference pattern, and used them as constraints on the grid. We have also drawn a line, at level L_max, of slope 1/2 from the last test frame and the last reference frame to further constrain the grid at the last level. The lower and upper constraints of the simplified grid can be described by the equations

L(m) = \max \left[ \frac{m + 1}{2},\ 2(m - M) + Q(\ell) \right]    (7.26)
U(m) = \min \left[ 2(m - 1) + 1,\ \frac{m - M}{2} + Q(\ell) \right]    (7.27)
in which the reference pattern length function, Q(ℓ), is the accumulated number of frames of the reference patterns used up to level ℓ, where (m + 1)/2 = 2(m - M) + Q(ℓ) at some frame m < M. The resulting region of computation is somewhat reduced from that shown in Figure 7.11.

Figure 7.12 Reduced computation region using upper- and lower-level constraints (after Myers and Rabiner [4]).
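As a concrete illustration, the constrained search region implied by Eqs. (7.26)-(7.27) (as reconstructed above) can be sketched as follows; Q_l stands for Q(ℓ), and the real-valued bounds would be rounded to integer frame indices in practice.

def search_range(m, M, Q_l, last_level=False):
    """Bounds (low, high) on the reference frame index n at test frame m,
    or None if the range is empty; a sketch of Eqs. (7.26)-(7.27)."""
    low = max((m + 1) / 2.0, 2 * (m - M) + Q_l)    # Eq. (7.26)
    high = 2.0 * (m - 1) + 1                       # slope-2 line from origin
    if last_level:
        high = min(high, (m - M) / 2.0 + Q_l)      # slope-1/2 line at L_max
    return (low, high) if low <= high else None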
Since the length of each reference pattern is, in general, different, the actual computation regions for a level building search based on variable-length reference patterns are as shown in Figure 7.13. Here we show the computation regions, at each level, for the shortest and for the longest reference patterns. Fundamentally, the pattern of computation is similar to that of Figure 7.11.

Figure 7.13 Overall computation pattern of level building algorithm for variable length reference patterns (after Myers and Rabiner [4]).
7.4.3 Computation of the Level Building Algorithm

From the above discussion it should be clear that the basic computation of the level building algorithm is a series of V time warps at each level, where V is the size of the vocabulary. If we assume the maximum number of levels is L_max, then we need V · L_max time warps. An overbound on the size of each time warp is N · M/3 grid points, where N is the average length of each reference pattern and M is the number of frames in the test pattern. Hence the total computation of the level building algorithm is

C_{LB} = V \cdot L_{max} \cdot N \cdot M/3 \quad (grid points)    (7.28)

with storage

S_{LB} = 3M \cdot L_{max}    (7.29)

since we need storage for D_ℓ, N_ℓ, and F_ℓ at each value of m and for each level ℓ.
Using the same values for M, N, and V as used in determining the computation and storage of the two-level DP method, namely M = 300, N = 40, V = 10, and using L_max = 7, we get C_LB = 280,000 and S_LB = 6300. The basic computation of the level building algorithm is a factor of 4.7 less than that of the two-level DP method; the storage of both methods is about the same.
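These counts are easy to verify; for example, in Python:

M, N, V, L_max = 300, 40, 10, 7
C_LB = V * L_max * N * M // 3    # Eq. (7.28): 280,000 grid points
S_LB = 3 * M * L_max             # Eq. (7.29): 6,300 storage locations
print(C_LB, S_LB)                # -> 280000 6300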
The above calculations are based on full DTWs at each level for each reference pattern. Because of the overlaps of regions in the (m, n) plane at different levels, some of the assumed computation is redundant. The following exercise illustrates this point.
Exercise 7.2
In implementing different levels of the level building algorithm, we have assumed all levels are independent. Thus, at each level, we have performed a full DTW, for each reference pattern, where the size of the DTW was N · M/3 (grid points). An alternative, and, we hope, more efficient, approach is to realize that for each reference pattern, a significant portion of the computation at each level (namely that of distance computation) may have previously been performed at an earlier level, and therefore only the combinatorics part of the DTW is truly independent at each level.

1. Show that for the assumed parameters of the system (i.e., M = number of test frames = 300, N = average number of frames per reference pattern = 40, V = size of vocabulary = 10, L_max = maximum number of levels = 7), the alternative implementation of storing all previously computed distances is more efficient than the standard implementation. (Assume that the cost of the combinatorics in a DTW procedure is negligible.)
2. What is the ratio of computation of the standard implementation to the stored distance implementation?
3. If we assume the cost of the combinatorics at each grid point is 1/5 the cost of a distance computation, how do the results to parts 1 and 2 change?
Solution 7.2
1. The simplest way of exploiting previously computed distances is to precompute the entire grid of distances from each reference pattern frame to each test frame. (Clearly this is not the most efficient implementation, since many of these full grid distances will never be required in practice.) If we do this we need

C_{DIST} = V \cdot M \cdot N \quad (distances) = 120{,}000.

The number of combinatorics is just the number of grid points used in the level building method; hence

C_{COMB} = V \cdot L_{max} \cdot N \cdot M/3 \quad (grid points) = 280{,}000.

The total computation of this approach is then

C_{TOTAL} = C_{DIST} + \alpha \cdot C_{COMB}

in which \alpha is the weight for combinatorics. If we assume combinatorics are negligible, \alpha = 0, we get

C_{TOTAL} = C_{DIST} = 120{,}000,

which is significantly less than the 280,000 distances for the standard implementation (i.e., one per grid point).
2. The ratio of computation of the two approaches is

\frac{C_{LB}}{C_{TOTAL}} = \frac{V \cdot L_{max} \cdot N \cdot M/3}{V \cdot M \cdot N} = \frac{L_{max}}{3} = 2.33

and is independent of V, M, and N; it depends only on L_max.
3. If we assume the cost of combinatorics is 1/5 the cost of a distance computation, we get

C_{TOTAL} = V \cdot M \cdot N + V \cdot L_{max} \cdot N \cdot \frac{M}{3} \cdot \frac{1}{5} = 176{,}000

with the ratio in computation being

\frac{C_{LB}}{C_{TOTAL}} = \frac{V \cdot L_{max} \cdot N \cdot M/3}{V \cdot M \cdot N + V \cdot L_{max} \cdot N \cdot M/15} = \frac{L_{max}}{3 + L_{max}/5} = 1.59.
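A short numerical check of parts 1-3, with the parameter values assumed in the exercise:

M, N, V, L_max = 300, 40, 10, 7
C_LB   = V * L_max * N * M / 3    # standard: one distance per grid point
C_DIST = V * M * N                # precomputed full distance grid
print(C_LB / C_DIST)              # part 2: L_max / 3 = 2.33

alpha = 1.0 / 5.0                 # combinatorics at 1/5 the distance cost
C_TOTAL = C_DIST + alpha * C_LB   # part 3: 120,000 + 56,000 = 176,000
print(C_TOTAL, C_LB / C_TOTAL)    # -> 176000.0 1.59...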
7.4.4 Implementation Aspects of Level Building
`
`Even though we have already shown that the computation for the level building approach
`to connected word recognition is significantly less than that of the two-level DP approach
`of Section 7 .2, there are several ways of even further reducing the computational load of
`the algorithm. These include the following:
`
`1. Beginning range reduction in which we reduce the size of the initial region (range of
`m) for which valid paths to a given level can begin.
`2. Global range reduction in which we reduce the size of the region (width of the
`template) that is tracked, within a level, to determine a best path to each possible
`level ending frame, m.
`3. Test pattern ending range in which we increase the range over which the global path
`can match the connected word string; this procedure provides some robustness to
`ending frame errors.
`4. Reference pattern uncertainty ranges in which we increase the range of search at
`the beginning and end of reference patterns to allow for some degree of word
`coarticulation at reference word boundaries.
`
`The way in which we implement these features is as follows.
`
1. Beginning range reduction (M_T)
For the complete level building algorithm, at level ℓ - 1, we retain the best score, D_{ℓ-1}(m), for each frame m in the ending region m_1(ℓ - 1) \le m \le m_2(ℓ - 1). It should be obvious that, in general, the best global path is not at either boundary but somewhere in the middle range of m. Hence if we could eliminate some of the ending range at level ℓ - 1, the search at level ℓ would involve less computation. (To see this, consider the limiting case where we make the ending region, at each level, a single frame; then the computation at each new level is a simple DTW with a single initial frame, much as occurs at level 1.)
To determine how to reduce the ending range at level ℓ - 1, we need to normalize the best accumulated distance scores by the number of frames. Thus we first find the locally best (minimum) normalized accumulated distance as

\phi_{\ell-1} = \min_{m_1(\ell-1) \le m \le m_2(\ell-1)} \left[ \frac{D_{\ell-1}(m)}{m} \right]    (7.30)
We now define a reduced-range level threshold as M_T \cdot \phi_{\ell-1}, where M_T is a defined parameter, and search the range m_1(ℓ - 1) \le m \le m_2(ℓ - 1) to find the indices S_1^ℓ and S_2^ℓ such that

S_1^\ell = \min \left\{ m : D_{\ell-1}(m)/m \le M_T \cdot \phi_{\ell-1} \right\}    (7.31)

S_2^\ell = \max \left\{ m : D_{\ell-1}(m)/m \le M_T \cdot \phi_{\ell-1} \right\}    (7.32)
To see what is achieved by the beginning range reduction, consider Figure 7.14, which shows a sequence of levels and the resulting ending ranges with and without range reduction (top graph), and the way in which the reduced range is determined (bottom graph). For the level shown, the best normalized distance score, \phi_{\ell-1}, is determined as the smallest value of the curve, and the threshold M_T \cdot \phi_{\ell-1} is shown as a level above \phi_{\ell-1} (clearly M_T \ge 1.0). The initial reduced range point, S_1^ℓ, is the last index (beginning from m = m_1(ℓ - 1)) where the curve is always above the threshold; similarly, the index S_2^ℓ is the last index (beginning from m = m_2(ℓ - 1)) where the curve is always above the threshold. The reduced range means that the computation at the next level is smaller, as seen at the top of Figure 7.14. Depending on the value of M_T, the reduced range can get smaller (as M_T → 1) or larger (as M_T → ∞). It should be clear that too small a value of M_T will allow the best path to be prematurely eliminated; hence proper choice of M_T is essential.
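A sketch of this reduction in Python, using the min/max formalization of Eqs. (7.31)-(7.32) given above (the boundary conventions of the original may differ slightly):

import numpy as np

def reduced_ending_range(D_prev, m1, m2, M_T):
    """Keep only ending frames whose frame-normalized score is within a
    factor M_T of the best one; D_prev[m] = best distance at level l - 1."""
    frames = np.arange(m1, m2 + 1)
    norm = D_prev[frames] / frames          # normalize by frame count
    phi = norm.min()                        # Eq. (7.30)
    keep = frames[norm <= M_T * phi]        # frames inside the threshold
    return keep.min(), keep.max()           # (S1, S2), Eqs. (7.31)-(7.32)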
2. Global range reduction (ε)
The idea behind global range reduction is to reduce the search range along the reference axis, for each test frame, by tracking the global minimum and allowing only a range around the global minimum. Thus for each test frame, m, at each level ℓ, and for each reference pattern, R_v, we determine the local minimum, c(m), as

c(m) = \arg \min_{c(m-1) - \varepsilon \le n \le c(m-1) + \varepsilon} \left[ D_\ell(m - 1, n) \right]    (7.33)

in which D_ℓ(m - 1, n) is defined to be the best distance at level ℓ using reference R_v at test frame m - 1 to reference frame n (as determined by the local alignment path), and with c(1) defined to be 1. Figure 7.15 illustrates the global range reduction for a typical level building search. For global range reduction to be effective, the reduced range, 2ε + 1, must be smaller than the typical reference pattern width.
3. Test pattern ending range (δ_END)
The idea here is to allow a range of test pattern ending frames, rather than restricting the ending frame to m = M. This feature provides a measure of robustness to test pattern endpoint errors. If we extend the end of the test pattern by δ_END frames, the global level building solution is modified to be

D^* = \min_{1 \le \ell \le L_{max}} \; \min_{M - \delta_{END} \le m \le M} \left[ D_\ell(m) \right]    (7.34)
Figure 7.14 Illustration of beginning range reduction (after Myers and Rabiner [4]).
4. Reference pattern uncertainty regions (δ_R1, δ_R2)
To account for coarticulation of words across word boundaries, the level building algorithm allows a range of beginning and ending frames of the reference pattern. In this manner, at any level, the path can begin over the range 1 \le n \le 1 + δ_R1 (where n is the reference pattern index), and end at any frame in the range N_v - δ_R2 \le n \le N_v. For appropriate values of δ_R1 and δ_R2, it is possible to have a path which skips (δ_R1 + δ_R2) frames at highly coarticulated word boundaries (e.g., the boundary between six and seven in the string "six-seven"). Figure 7.16 illustrates the use of reference pattern uncertainty regions in level building.
Figure 7.15 Illustration of global range reduction (after Myers and Rabiner [4]).

Figure 7.16 Illustration of the use of reference pattern uncertainty regions (after Myers and Rabiner [4]).

A summary of the four implementation aspects of level building, as discussed in this section, is shown in Figure 7.17, in which all four features are combined in a single level building search. It can be shown that with judicious choice of the implementation parameters, M_T, ε, δ_END, δ_R1, and δ_R2, the overall computation can be substantially reduced from that of the standard level building approach.

Figure 7.17 Summary of level building implementation of computation reduction methods (after Myers and Rabiner [4]).
7.4.5 Integration of a Grammar Network
We have been implicitly assuming that, for connected word recognition, each word in the string can be followed by any other word in the string. This implicit form of grammar is most appropriate for things like digit strings, in which any digit can follow any other digit. However, for some connected word recognition tasks there is an explicit set of rules (a grammar) governing which words can logically follow other words to form valid sentences in the language [10]. Although the form of such a grammar can be of several different types (we will discuss this in more detail in Chapter 8), we will restrict ourselves here to those tasks in which we can represent the grammar by a finite state network (FSN) or a finite state automaton (FSA) of the form

G = A(Q, V, \delta, q_0, Z)    (7.35)
where

Q = set of states
V = set of vocabulary words
\delta = set of transitions
q_0 \in Q = initial state
Z \subseteq Q = set of terminal states

and the set of transitions obeys the rule

\delta(q, v) = s    (7.36)

meaning that word v drives the state from q to s.
To integrate the FSN grammar network into the level building algorithm, we must do the following:

1. Identify levels with states rather than word position, so that word candidates at the ℓth level need not be temporally contiguous to those at the (ℓ + 1)st level.
2. Partition the vocabulary so that only the reference patterns for words leaving the ℓth state are matched at the ℓth level.
3. Retain state backtracking pointers for recovering the best matching string.

(It should be noted that for the most efficient computation, the states in Q should be topologically sorted.)
To illustrate how a simple grammar FSN can be integrated into the LB algorithm, consider the following example, with the 13-word vocabulary:
1. I        5. ONE      9. BOOKS    13. OLD
2. WANT     6. A       10. COAT
3. NEED     7. AN      11. COATS
4. THREE    8. BOOK    12. NEW
Current    Words           Predecessor    Current    Predecessor
State      Used            State          Level      Levels
2          I               1              1          0
3          WANT            2              2          1
4          NEED            2              3          1
5          THREE           3              4          2
6          A               4              5          3
7          AN              4              6          3
8          ONE             3              7          2
8          NEW             6              8          5
8          OLD             7              9          6
9*         BOOK, COAT      8              10         7,8,9
9*         BOOKS, COATS    5              11         4
If we study this simple example, we see that levels 2 and 3 both pick up from level 1; similarly, both levels 4 and 7 pick up from level 2. By building up the computation by levels (keeping track of the correct predecessor level), and by backtracking the final result by states (according to the grammar FSN), we can essentially use all the techniques described to efficiently find the best grammatically correct string.
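A sketch of the grammar bookkeeping for this example follows; the transition table delta[(state, word)] = next_state is a hypothetical encoding of Eq. (7.36), and the helper names are illustrative only.

delta = {
    (1, "I"): 2,
    (2, "WANT"): 3,  (2, "NEED"): 4,
    (3, "THREE"): 5, (3, "ONE"): 8,
    (4, "A"): 6,     (4, "AN"): 7,
    (6, "NEW"): 8,   (7, "OLD"): 8,
    (5, "BOOKS"): 9, (5, "COATS"): 9,
    (8, "BOOK"): 9,  (8, "COAT"): 9,
}

def words_leaving(state):
    """Vocabulary partition (integration rule 2): only these words are
    matched at the level associated with `state`."""
    return [w for (q, w) in delta if q == state]

def predecessor_states(state):
    """States whose levels feed this state's level (for backtracking)."""
    return sorted({q for (q, _), s in delta.items() if s == state})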
7.4.6 Examples of LB Computation of Digit Strings

Figures 7.18 and 7.19 show two examples of connected digit strings matched using the LB algorithm. Figure 7.18 is for the string "51560" and shows the computation building up level by level. For this example, the locally best path at each level (shown by the digit to the right of the last test frame) is actually the globally best digit. At the end of level four the algorithm provided the four best choices, with the string "5157" having the lowest score of 0.553. At the end of level 5 there were six string choices, with the correct string "51560" having the lowest average accumulated distance of 0.333.

Figure 7.18 Level building of the string "51560" (after Myers and Rabiner [4]).

The example in Figure 7.19 is for the string "99211," which did not provide a match nearly as good as the previous example. We see here that the locally best string at each level, namely "90111," is not globally best, and actually is incorrect in two positions. Although the algorithm gets the correct string as the best score, the second-best string is "901," which has two digit deletions and one digit-substitution error.

Figure 7.19 Level building of the string "99211" (after Myers and Rabiner [4]).
7.5 THE ONE-PASS (ONE-STATE) ALGORITHM

The third general approach to the connected word recognition problem is a procedure which was originally proposed by Vintsyuk in 1971 [5] and which has been "rediscovered" several times in the last two decades [6-8] and generalized in scope [9]. The algorithm has been called the one-pass procedure or the one-state algorithm, or most recently, the frame-synchronous level building (FSLB) method. The basic idea behind the algorithm is illustrated in Figure 7.20, which shows a grid with the test pattern, T, mapped to the horizontal axis, and the set of reference patterns, \{R_1, R_2, \ldots, R_V\}, mapped to the vertical axis.

Figure 7.20 The one-pass connected word recognition algorithm (after Bridle et al. [6]).
Using the standard notation of m to represent the test frame index, 1 \le m \le M, v to represent the reference pattern (R_v) index, 1 \le v \le V, and n to represent the reference frame index of pattern R_v, 1 \le n \le N_v, then for each test frame we calculate the accumulated distance, d_A(m, n, v), as

d_A(m, n, v) = d(m, n, v) + \min_{n-2 \le j \le n} \left[ d_A(m - 1, j, v) \right]    (7.37)

for 2 \le n \le N_v, 1 \le v \le V, where d(m, n, v) is the local distance between test frame t(m) and reference frame r_v(n), and we assume a maximum path expansion of 2 to 1 (hence we search back by 2 reference pattern frames in the combinatoric stage). The recursion of Eq. (7.37) is carried out for all internal frames of each reference pattern (i.e., n \ge 2). At the reference pattern boundary, i.e., n = 1, we have the simple recursion

d_A(m, 1, v) = d(m, 1, v) + \min \left[ \min_{1 \le r \le V} \left[ d_A(m - 1, N_r, r) \right],\; d_A(m - 1, 1, v) \right].    (7.38)
Thus the combinatorics for internal and boundary frames are as shown in Figure 7.21. Figure 7.21a shows that for internal frames the combinatorics choose the best internal path within the reference pattern, whereas at boundary frames the combinatorics choose either a straight (horizontal) path from within the reference pattern (subject to the constraint that the path cannot remain constant for more than one frame) or the best ending frame of any reference pattern. (Within the dynamic programming framework, of course, incorporation of a set of local constraints that differ from those of Eq. (7.38) is possible, subject to pragmatic considerations.)

Figure 7.21 Combinatorics for the one-pass algorithm.
The final solution for the best path (corresponding to the best word string) is

D^* = \min_{1 \le v \le V} \left[ d_A(M, N_v, v) \right].    (7.39)
Thus the one-pass algorithm computes best paths to every reference pattern frame at every test frame and eventually is able to backtrack the best score (from Eq. (7.39)) to give the best word sequence, as shown in Figure 7.20.
The major problem with the one-pass algorithm is that no mechanism is provided for controlling the resulting string length; that is, for giving a best path for a string of arbitrary length. The algorithm inherently finds a single best path whose string length is whatever it turns out to be. Thus there is no simple way of exploiting given constraints on string length within the fundamental procedure.
There is, however, a simple and straightforward way of incorporating a level (i.e., string-length) constraint in the computation. We do this by extending the accumulated distance to include level information; that is, we extend the recursion to compute the accumulated distance at level ℓ as

d_A^\ell(m, n, v) = d(m, n, v) + \min_{n-2 \le j \le n} \left[ d_A^\ell(m - 1, j, v) \right]    (7.40)

where the computation is for 1 \le \ell \le L_{max}, 2 \le n \le N_v, 1 \le v \le V, 1 \le m \le M. At each boundary frame the computation now becomes

d_A^\ell(m, 1, v) = d(m, 1, v) + \min \left[ \min_{1 \le r \le V} \left[ d_A^{\ell-1}(m - 1, N_r, r) \right],\; d_A^\ell(m - 1, 1, v) \right]    (7.41)

with

D^* = \min_{1 \le \ell \le L_{max}} \; \min_{1 \le v \le V} \left[ d_A^\ell(M, N_v, v) \right].    (7.42)

The key difference is in Eq. (7.41), which only allows a path to an ending frame at level ℓ - 1 to become a path at a beginning frame at level ℓ.
The main advantage of the one-pass algorithm is that the computation for a given test frame, m, can be done frame synchronously; hence the one-pass algorithm is well suited to real-time implementation on processors that are capable of doing all the necessary computations of Eqs. (7.40) and (7.41) in a single frame interval. Although at first glance it seems as though the computation of the one-pass algorithm is significantly greater than that of the LB approach, it is easily seen that the computation of d(m, n, v) of Eq. (7.40) is independent of level, ℓ; hence it can be computed once (e.g., at level 1) and stored, and used in subsequent levels with no extra computation. Because of its suitability for real-time implementation, the level-based version of the one-pass algorithm is generally the one used for connected word recognition tasks.
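A compact sketch of the recursions of Eqs. (7.37)-(7.39) in Python follows; it omits the backtracking pointers, the level-constrained variant of Eqs. (7.40)-(7.41), and the one-frame limit on horizontal moves at n = 1, and the m = 1 initialization is an assumption made for the sketch.

import numpy as np

def one_pass(local_dist, N_v, M):
    """Best accumulated distance of Eq. (7.39).

    local_dist(m, n, v): local distance d(m, n, v) (m, n are 1-based).
    N_v[v]: number of frames in reference pattern v.
    """
    V = len(N_v)
    # dA[v][n] = accumulated distance to frame n of reference v (index 0 unused)
    dA = [np.full(N_v[v] + 1, np.inf) for v in range(V)]
    for m in range(1, M + 1):
        prev = [a.copy() for a in dA]
        # best ending score of any reference at frame m - 1; at m = 1 a word
        # may start with zero accumulated distance (assumed initialization)
        best_end = 0.0 if m == 1 else min(prev[r][N_v[r]] for r in range(V))
        for v in range(V):
            # boundary frame, Eq. (7.38): continue within v or start a new word
            dA[v][1] = local_dist(m, 1, v) + min(best_end, prev[v][1])
            # internal frames, Eq. (7.37): search back up to 2 reference frames
            for n in range(2, N_v[v] + 1):
                dA[v][n] = local_dist(m, n, v) + prev[v][max(1, n - 2):n + 1].min()
    return min(dA[v][N_v[v]] for v in range(V))   # Eq. (7.39)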
7.6 MULTIPLE CANDIDATE STRINGS

In the previous sections we have been discussing ways of determining the globally best path through the extended grid, corresponding to the best match to the spoken word string. There are many ways of obtaining multiple candidate strings from which we can determine, at least in theory, the second-best string match, the third-best string match, etc. Multiple string candidates are particularly useful when using grammar networks in order to provide robust recognition decisions. The way in which we obtain multiple candidate strings is simple; namely, we keep track of both the best distance and the second-best distance to each ending frame at each level (to get the second-best string match). Since we have computed all reference pattern distances already (as needed to determine the best distance), all that is required is additional storage (to keep track of the array of second-best distances) and the bookkeeping to determine the second-best string. (In Chapter 4 we discussed the general problem of using dynamic programming methods to determine the N-best paths through a grid. What is described here is essentially what was called the parallel algorithm in Chapter 4, Section 4.7.5. The main difference is that here the path ranking is in terms of word candidate strings instead of frame-based time warping paths. This difference gives rise to some extra optimality considerations, as discussed in this section.) We illustrate the procedure in Figure 7.22, which shows a simple two-level LB search with tracking of the two best strings at each grid point. At the end of level 2 there are two distinct paths to the last frame of the test pattern; namely, the best distance (labeled 1 and shown as a solid path) and the second-best distance (labeled 2 and shown as a dashed line). The best path from level 2 branches off to two other paths in level 1, with one being a best path and the other being a second-best path. Similarly, the second-best path from level 2 branches off to two other paths in level 1. Thus a total of four paths are followed; namely, the 11 path (best from level 2, best from level 1), the 12 path (best from level 2, second best from level 1), the 21 path (second best from level 2, best from level 1 to the beginning point of the level 2 path), and the 22 path (second best from level 2, second best from level 1). The overall best path is, by definition, the 11 path through the grid. The second-best path, according to this method, is (with some pathological exceptions, to be explained next) either the 12 path or the 21 path; the 22 path can never be as good as the 21 path.
The procedure described above can be extended trivially to the best three candidates at each level, in which case a total of 3^L scores are obtained after L levels. This is illustrated at the bottom of Figure 7.22 for a two-level match in which there are nine string candidates.
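The per-frame bookkeeping is minimal; a sketch of keeping the two best distinct-string candidates at an ending frame (the tuples are illustrative values taken from Figure 7.18, not computed here):

def two_best(candidates):
    """Best and second-best (distance, string) pairs with distinct strings."""
    seen, kept = set(), []
    for dist, string in sorted(candidates):
        if string not in seen:
            seen.add(string)
            kept.append((dist, string))
        if len(kept) == 2:
            break
    return kept

print(two_best([(0.569, "1157"), (0.553, "5157"), (0.591, "5957")]))
# -> [(0.553, '5157'), (0.569, '1157')]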
Figure 7.22 Description of procedure for determining multiple candidate scores (after Myers and Rabiner [4]).
Figure 7.23 shows a case of using L = 4 levels with two best candidates. There are now four possible choices for the second-best string, including the 1112 path, the 1121 path, the 1211 path, and the 2111 path.
