IPR2020-00613, No. 1011 Exhibit - Exhibit 1011 (P.T.A.B. Feb. 20, 2020)

`
`Page 1 of 15
`
`SAMSUNG EXHIBIT1011
`
`
`
`Reducing the computations of the SVD array given by Brent and Luk
`
`B. Yang and J.F. Béhme
`
`Ruhr-Universitét, Departmentof Electrical Engineering
`4630 Bochum, West Germany
`
`ABSTRACT
`
`A new,efficient two plane rotations (TPR) method for computing two-sided rotations involved in
`singular value decomposition (SVD)is presented. By exploiting the commutative properties of some special
`types of 2x2 matrices, we show that a two-sided rotation can be computed by only two planerotations and a
`few additions. Moreover, if we use coordinate rotation digital computer (CORDIC) processors to implement
`the processing elements (PEs) of the SVD array given by Brent and Luk, the computational overhead of the
`diagonal PEs due to angle calculations can be avoided. The resulting SVD array has a homogeneous
`structure with identical diagonal and off-diagonal PEs.
`
`1.
`
`INTRODUCTION
`
`the SVD. Particular
`One important problem in linear algebra and digital signal processing is
`applications arise in beamforming and direction finding, spectrum analysis, digital image processing etc.
`Recently, there is a massive interest in parallel architectures for computing SVD due to the growing
`importanceof real-time signal processing and advances in VLSI devices. Brent and Luk2” have shown how
`a Jacobi method with parallel ordering can efficiently compute the SVD, and how the method can be
`implemented by a two-dimensional systolic array. The method is based on, as commonforall two-sided
`approaches, applying a sequence of two-sided rotations to 2x2 submatrices of the original matrix. The
`computational complexity is thus determined by how to compute the two-sidedrotations.
`
`Usually, a two-sided rotation is realized by four plane rotations, where two of them are applied from
`left to the two column vectors of the 2x2 submatrix and the other ones are applied from right to the row
`vectors, respectively. For the diagonal PEs of the SVD array, additional operations for calculating the
`rotation angles are required. This leads to an inhomogeneous array architecture containing two different
`types of PEs.In this paper, we develop a new TPR methodfor computing two-sided rotations. We show that
`the above computational complexity is reduced significantly, and the SVD array becomes homogeneous
`when using CORDICprocessors.
`
`The paper is organized as follows. In section 2, we briefly re-examine the Jacobi SVD method and the
`SVD array. Then, we develop the TPR methodin section 3. The CORDIC algorithm is described in section
`4. Different techniques for scaling correction are discussed, and examples of scaling corrected CORDIC
`sequences for different data formats are given. In section 5, an unified CORDIC implementation of all PEs
`
`92 / SPIE Vol. 1152 Advanced Algorithms and Architectures for Signal Processing IV (1989)
`
`Page 1 of 15
`
`SAMSUNG EXHIBIT 1011
`
`

`of the SVDarrayis presented. Finally, some convergence aspects are discussed in section 6.
`
`The SVDof a matrix MeRis given by
`
`2.
`
`JACOBI SVD METHOD
`
`(1)
`M=UZV',
`where UeRN™® and VerRN* are orthogonal matrices and LERis a diagonal matrix of singular values.
`Based on an extension of the Jacobi eigenvalue algorithm, Kogbetliantz’, Forsythe and Henrici> proposed to
`diagonalize M by a sequence of two-sided rotations,
`
`Mo=M,=Muy = UxMyVy (k=0,1,2,-- +). (2)
`
`U, and V, represent rotations in the (i,j)-plane (1Si<jsN). The rotation angles are chosen to annihilate the
`elements of My at the positions (i,j) and (j,i). Usually, several sweeps are necessary to complete the SVD,
`where a sweep is a sequence of N(N-1)/2 two-sided rotations according to a special ordering of the
`N(N-1)/2 different index pairs (i,j). Unfortunately, the best known cyclic orderings, namely, the cyclic row
`ordering
`
`(ij) = (1,2), ... , (LN), (2,3), «. » (2,N), «> (N-LN)
`and the equivalent cyclic column ordering are not suitable for parallel computations.
`
`(3)
`
`Recently, Brent and Luk”? suggested a parallel ordering allowing a high degree of parallelism. It
`enables
`the use of
`a
`square
`systolic
`array with
`[N/2]x[N/2] PEs to implement the Jacobi SVD method
`(Figure 1). For doing this, the matrix M is partitioned into
`2x2 submatrices. Each PE contains one submatrix and
`
`performs a two-sided rotation B = R(0,)'AR(®,) ,
`
`where
`
`(4)
`
`am (2 2] ae ie |
`-
`
`
`
`= ~Lbg,Lan ag) b22 ©)
`
`denote the input and output matrices, and
`
`cos@ sin@
`R®)=|sind cos0 (6)
`
`Fi
`1. The
`SVD
`ma eric ae
`array given by
`describes
`a
`plane
`rotation
`through
`the
`angle
`8,
`respectively. Notice
`that
`there
`are
`two
`different
`computation modes. In a diagonal PE, the rotation angles
`@, and @ are generated to diagonalize the 2x2 submatrix (b;2=b2;=0) stored. We call this the generation
`
` i
`
`SPIE Vol. 1152 Advanced Algorithms and Architectures for Signal Processing IV (1989) / 93
`
`Page 2 of 15
`
`Page 2 of 15
`
`

`
`
`
`
`mode. Then, the rotation angles are propagated to all off-diagonal PEs in the same row and the same
`column, in which pure rotations (4) with the received angles are performed. Wecall this the rotation mode.
`Clearly, if we directly compute (4) in the rotation mode, we require four plane rotations. For the generation
`mode, additional operations for calculating 6, and @, are needed.
`
`3. TPR METHOD FOR COMPUTING TWO-SIDED ROTATIONSNeSEEERLAMONS
`
`In order to develop the TPR method for computing two-sided rotations more efficiently, we first
`discuss the commutative properties of two special types, the rotation-type and the reflection-type, of 2x2
`matrices, namely,
`
`and A= i. I | eseR}
`A =| L§ | | c.ser }
`Obviously, the plane rotation matrix and the Givensreflection matrix® with c2+s2=1 are two elements of the
`sets &™ and «™*, respectively. The following results can be shown by elementary manipulations.
`Lemma 1.
`If Aj #™™ and Aye .«™,
`then A;Ao=AoA,e M™,
`Lemma 2.
`If Aj M™* and Age #™, then A,Ap=ALA€ 4,
`
`In particular, if we consider two planerotations, we know,
`Lemma 3.
`If R(;) and R(@) are plane rotations described by (6), then R(8,)R(62)=R(6)+6,) and
`R(,)"R(O2)=R(0,-0)).
`
`Now,we give a theorem describing the rotation mode of the TPR method.
`Theorem.
`If the 2x2 matrix A and the tworotation angles @, and 62 are given, then the two-sided
`rotation (4) can be computed by twoplanerotations, ten additions and four scalings by 1/2:
`Pi = (a22 + a3))/2
`P2 = (a2 — a31)/2
`{a = (a2) —ay)/2 ’
`{ oo = (a1 + ap)/2 ’
`6_ = 6-6, ,
`6, = 62+ 0;
`7
`
`(7)
`(8)
`
`a] = x00[2]
`fn) = Ref
`tp|
`=
`R@d|g|
`ty)
`=
`R@ Iq)
`big = +t) + ty
`by = 1 -r
`era ™1+% °
`(eee aia
`Proof. Using (7), the input matrix A can be reformulated by
`
`»
`
`9)
`(10)
`
`A = A,+A) = |
`| —P2
`Pi =]
`a4:
`Pi
`a2 P2
`Clearly, R(®;), R(®2) in (4) and A, are elements of #'™ while A, belongs to #™. This leads to the
`following expression for the output matrix B by using the lemmas 1-3,
`
`94 / SPIE Vol. 1152 Advanced Algorithms and Architectures for Signal Processing IV (1989)
`
`Page 3 of 15
`
`Page 3 of 15
`
`

`ll
`
`w
`
`RO)"ARO.) = R(O)'A:R(G,) + RO)"A2R(2)
`
`RQ)"R(O)Ai + RO)"RO)"Ay = R(O2-0))A; + ROz+0))"Ay
`—P2 es
`sad| qi
`T
`nie G2 P2
`
`-q1
`Pi
`
`Pi
`
`A direct consequence of the above theorem shows how a two-sided rotation in the generation mode can
`be computed in a similar way.
`Corollary.
`If the 2x2 matrix A is given, we can diagonalize A and calculate the corresponding rotation
`angles 0, and 02 by two Cartesian to polar coordinates conversions, eight additions and four scalings by 1/2:
`Pi = (a2 + a33)/2
`P2 = (a2 — a43)/2
`a
`{a = (a2) —aj)/2 ’”
`Ve = (a2) + aj2)/2 ’”
`(11)
`ry = sign(p:){p4 + qi
`13: = sign(p2){p? + @ ;
`(12)
`{o. = arctan(q;/Ppi)
`le, = arctan(q2/p2)
`
` ddeesee
`
`tT
`tq)
`
`-ty Rb
`TY +
`tg
`In}
`
`°
`
`=
`
`This completes the proof.
`
`0, = (0,-0)/2 ,
`
`0 = (0,+0)/2 ,
`
`bi = 1-2,
`
`by = +12.
`
`(13)
`
`(14)
`
`Proof. Regarding equation (10), t;=t2=0 follows directly from bj.=b2);=0.
`
`In equation (12), we choose the rotation through the smaller angle. All vectors lying in the first or the
`fourth quadrant are rotated to the positive x-axis, and all vectors lying in the second andthe third quadrant
`are rotated to the negative x-axis. For vectors on the y-axis, the rotation direction is arbitrary. Thus, the
`generated rotation angles @_ and 6, satisfy |@_|,
`|®,| < 7/2. This results in
`
`|0,| <7/2
`
`and
`
`|| <sx/2
`
`(15)
`
`due to (13).
`
`Weremark that Delosme’” has also proposed to use (14) for computing b,; and b22 in the genération
`mode. However, he did not recognize the fact that the above approach is generally possible because of the
`commutative properties of the rotation-type and the reflection-type of matrices. So, he still requires four
`plane rotations for computing a two-sided rotation in the rotation mode.
`
`4. THE CORDIC ALGORITHM
`
`In the previous section, we have seen that the main operations of the TPR-methodare plane rotations
`and Cartesian to polar coordinates conversions. Although these operations can be carried out by multiplier-
`
`SPIE Vol. 1152 Advanced Algorithms and Architectures for Signal Processing IV (1989) /. 95
`
`Page 4 of 15
`
`Page 4 of 15
`
`

`
`
`adder based processors supported by software or special hardware units, a simpler approach is the use of
`dedicated processors that map algorithms more effectively to hardware. The CORDIC processor is a
`powerful one for calculating these functions.
`
`The CORDIC algorithm was originally designed by Volder® as an iterative procedure for computing
`plane rotations and Cartesian to polar coordinates conversions. It was later generalized and unified by
`Walther? enabling a CORDICprocessorto calculate more functions, including hyperbolic functions as well
`as multiplications and divisions, in a similar way. In the following, only Volder's CORDIC algorithm is
`considered for the sake of simplicity because only trigonometric functions are involved in SVD applications.
`
`The CORDICalgorithm consists of iterative shift-add operations on a three-componentvector,
`
`Bs | 7 a _
`Yai
`y,+0,5.x.
`Zi,
`= 2,-€6.0,
`
`cos(a.,)
`|
`1
`“05C) G;sin(@,)
`(i=0,1,....n-1)
`,
`
`-9,sin(a.) *]
`
`cos(a,)
`y;
`
`diay
`
`(17)
`
`in whichthe iteration stepsize 0<8.<1 is defined by
`(18)
`6, = tan(a,) = 20,
`Theset of integers {S(i)} is called CORDIC sequence. Equation (16) can be interpreted, except for a scaling
`factor of
`
`=
`
`1
`
`=
`
`(19)
`148° ,
`k ~ Cos a, ~
`as a rotation of (x,; y)" through the angle a, where o.=11 gives the rotation direction. After n iterations
`(16) and (17),the roast data are given by
`x
`cosa -sinw Xo
`Yn = sin® cosa Yos
`(20)
`ZO Z) — €&
`(21)
`with the overall scaling factor K = i k; and the cumulative rotation angle o = . 6... Now,if the CORDIC
`sequencesatisfies the following paeeiedace condition
`
`(i=0,1,....n-2) ,
`
`to force Y,, OF Z, to zero, provided that the input data x
`
`(22)
`
`0 Yo
`
`n-1
`a- 2asca
`*
`jsitt J
`cc
`we can choose O,=sign(x,y;) or 0,=sign(€z.)
`and Zo lie in the convergence region
`es |
`yl tar
`9,
`
`
`n-1 fory+0arctan(y,/x
`C=
`(23)
`for z_ +0
`Izo|
`Both, plane rotations as well as Cartesian to polar coordinates conversions can be calculated in this way
`
`i=o '
`
`96 / SPIE Vol. 1152 AdvancedAlgorithms andArchitectures for Signal Processing IV (1989)
`
`Page 5 of 15
`
`Page 5 of 15
`
`

`(Table 1), where e=+1 is a sign parameter.
`
`Zot€arctan(yo/xXo9)|Yn = K(exosinzg+ yocosZo)
`
`X, = K( xpcoszp—€ypsinzo)
`
`ra Ksign(xo){x3+y9
`
`Table 1 : CORDICtrigonometric functions
`
`In Table 1, only the principal values |arctan(yo/xo) |<m/2 are calculated when computing Cartesian to
`polar coordinates conversions. Correspondingly, x, may be the positive or the negative vector norm
`according to the sign of x9. These properties for computing (12) are enabled by the particular choice of
`o.=-sign(x,y.), in contrast to o.=-sign(y,) in most other CORDIC publications. So, it is guaranteed that a
`vector is always rotated through the smaller angle to the x-axis. In this case, a convergence region of C2900
`is sufficient for the generation mode.
`
`A disadvantage of the CORDIC algorithm is the need to correct the scaling factor K. The traditional
`CORDIC sequence
`
`{S@} = (0,1, 2,3,...,p-1,p }
`
`(24)
`
`with n=p+1 CORDICiterations for a mantissa length p delivers, for example, a scaling factor of K=1.64676
`for p=16. Using multiplications of x, and y, by K? will degrade the algorithm performancesignificantly.
`Welike to compensate the scaling effect by using a minimum numberofshift-add operations.
`
`SPIE Vol. 1152 Advanced Algorithms and Architectures for Signal Processing IV (1989) / 97
`
`For a two-sided rotation that is implemented by four plane rotations, each matrix element undergoes
`two plane rotations so that the total scaling factor to be corrected is K. In this case, Cavallaro and Luk!°
`pointed out that there is a simple systematic approach for scaling correction if the traditional CORDIC
`sequence (24) is used. They proposed to use [p/4] scaling iterations of the type x — x-2Ix with jeJ={1, 3,
`5, ... ,2[p/4]-1} and oneshift operation 21. The remaining scaling error is boundedasfollows),
`:
`.
`?p
`:
`1-271 (1-2)-K?| J ja -2? 0-2). (142-7)|
`< gel
`
`jes
`
`jes
`
`i=0
`
`(25)
`
`This approach, however,fails in the TPR method. Here, each matrix element undergoes only one plane
`rotation. So, the scaling factor to be corrected is K. For solving this more difficult problem, different
`techniques have been developed. Haviland and Tuszynski!! used similar scaling iterations as Cavallaro and
`Luk. Ahmed}? proposed to makeKto a powerof the machine radix by repeating some CORDICiterations.
`Delosme!*
`combined both methods above for minimizing the computational overhead. Deprettere et al.4
`suggested the double-shift concept for providing more freedom by optimizing CORDIC sequences.
`
`Wedesigned a computer program)> for a systematic search of CORDIC sequences with comparable
`
`) Whenreplacing [p/4] by [(p-1)/4] or [(p+1)/4], the upper boundin (25) becomes 2-P or 2-P-2,respectively.
`
`Page 6 of 15
`
`Page 6 of 15
`
`

`
`
`complexity for correcting K and found sequences requiring the same number [p/4] of additional iterations.
`We examinedall relevant combinations ofthe shift parameters S(i) (i=0,1,....n-1). In particular, we allowed
`a difference two between adjacentshifts, for example S(i)={0, 1, 3, 3, 3, ...}, such that much more CORDIC
`sequences could be tested. The resulting scaling factor K is corrected by a sequence of nx shift-add
`operations,
`
`ge TT[14n@27®) -K = 14+AK
`
`—(T(j)integers,
`
`(j)=+1) ,
`
`(26)
`
`j=l
`that are characterized by the set of parameters {T(0), n(1)T(1),... , 1(n,)T(n,)}. This results in a total
`number of L=n+n, iterations. We note that the remaining scaling error AK is a systematic error in contrast
`to the other two types of CORDICerrors, the angle resolution and the roundingerrors. So, it is particularly
`important
`to keep AK much smaller than 27, especially if repeated CORDIC operations have to be
`performed on the same data such as in SVD applications.
`
`In the following, for five different mantissa lengthes p=16, 20, 24, 28 and 32, examples of such
`CORDIC sequences, including the total numberofiterations L=n+n,, the remaining scaling error AK and
`the convergenceregionCare given.
`
`
`
`L=17+4, C=100°, AK=-271601 2)
`
`p=16:
`
`p=20:
`
`p=24:
`
`p=28 :
`
`p=32:
`
`{S@} = (0123 --.... 15 16} ,
`{n@T@} = {142-5 +9 410},
`{S@} = (0123 ...... 19 20} ,
`L=21+5, C=100°, AK=2~23-5
`{n@T@} = {1 +2-5 +9 +10 +16} ,
`{S@} = (1123345566788910 ---.-. 23 24},
`{n@TG)} = {0-2 +6},
`L=294+2, C=91°, AK=-2-2%13
`{S@} = (112334556678891011 1213141415 ...... 27 28},
`{n@T)} = {0-2 +6},
`L=3442, C=91°, AK=2732-53
`{S@} = {(001333456789910 ...... 31 32},
`{n@T@} = {1-3-8 +16 -25 -27},
`L=36+5, C=145°, AK~-2739-93
`In order to meet the convergence condition (22) and to provide a convergence region C2>90°, the minimum
`number of CORDICiterations for a mantissa length of p is p+1. So, for all CORDIC sequences given above,
`the number L-(p+1) of additional iterations for scaling correction is p/4. Moreover, except for the first
`CORDICsequence, the remaining scaling error AK is always significantly smaller than 27?.
`
`that similar results can also be obtained by using Deprettere's double-shift concept.
`We suspect
`However, this method requires a slightly increased hardware complexity and will not be discussed in this
`paper.
`
`2) When appendingan additional scaling iteration with (5)T(5)=+16, the scaling accuracy can be enhanced to AK=2-23.
`
`98 / SPIE Vol. 1152 Advanced Algorithms and Architectures for Signal Processing IV (1989)
`
`
`Page 7 of 15
`
`Page 7 of 15
`
`

`5. CORDIC IMPLEMENTATIONS OF THE SVD PES
`
`Wefirst introduce a CORDIC processor symbol as shown in Figure 2. The descriptions inside the box
`determine uniquely the function modeof the processor. The output data x, y and z are assumed to bescaling
`corrected.
`
`
`
`The operations (7)-(10) and (11)-(14) of the TPR method can be mapped onto a two CORDIC
`processor architecture. In Figure 3, the diagonal PEs of the SVD
`array working in the generation mode are implemented by two
`CORDIC processors computing Cartesian to polar coordinates
`conversions and eight adders. The dotted inputs of the adders
`represent negated inputs. In Figure 4, the off-diagonal PEs of the
`SVD array working in the rotation mode are implemented by two
`CORDIC processors computing plane rotations and ten adders.
`Clearly, both CORDIC implementations have nearly the same
`architecture. All PEs of the SVD array can thus be implemented by
`one unified CORDIC SVD module (Figure 5) without considerably increased hardware cost. In this case, the
`SVD array depicted in Figure 1 becomes homogeneous. The different computation modes of the diagonal
`and off-diagonal PEs can be easily selected by one control bit.
`
`Figure 2. CORDIC symbol
`
`
`
`Weremark that Figure 5 is more a "graphic program" describing the sequence of operations to be
`computed rather than a hardware block diagram.In particular, the twelve adders that are paired into three
`pre-butterflys and three post-butterflys can be integrated into the two CORDIC processors without separate
`hardware implementations. Because of the recursive nature of the Jacobi SVD method, only recursive
`CORDICprocessors can be used. Each of them consists mainly of two barrel-shifters and three adders for
`computing the iterative shift-add operations (16) and (17). So,it is natural to modify the CORDICprocessor
`architecture slightly and to use the existing six adders contained in the two CORDIC processors for
`computing both, the pre-butterfly and the post-butterfly operations. The only computational overhead are
`two iterations resulting in a total computation time of L+2 iterations for one CORDIC SVD module. In
`Figure 6, the principal architecture of such a two CORDIC processor SVD module is shown, where the
`dashed lines and boxes represent the additional hardware components enabling the CORDIC processors to
`compute the butterfly operations. It can be easily verified that the upper four adders devoted to x and y can
`perform the following types of operations 2xt274y, xt2Sy (CORDICiteration) and xt2Sx (scaling
`iteration), while the lower two adders devoted to z can compute 2712,42712, and zta.
`
`Now, we compare the area and time complexity of the above CORDIC SVD module with that of the
`CORDIC SVD modules proposed by Cavallaro and Luk!®, Let Acsva and Tesyq denote the area and time
`complexity of a CORDIC SVD module and A, and T, those of a CORDIC processor, respectively.
`Cavallaro and Luk have shown thattheir most efficient parallel diagonalization method requires A,,yg=2A,
`
`SPIE Vol. 1152 Advanced Algorithms and Architectures for Signal Processing IV (1989) / 99
`
`_
`
`Page 8 of 15
`
`Page 8 of 15
`
`

`a1
`
`Figure 3. CORDIC implementation of the
`diagonal PE of the SVD array
`
`an
`
`421
`
`822
`
`12
`
`26,
`
`
`
`421
`
`aN
`of the unified CORDIC SVD module
`
`Figure 4. CORDIC implementation of the
`off-diagonal PE of the SVD array
`
`post--butterflys
`pre—butterflys
`
`CateanCau,
`
`Figure 6. The principal architecture
`
`Figure 5. An unified CORDIC SVD module
`for implementing all PEs of the SVD array
`
`100 / SPIE Vol. 1152 Advanced Algorithms and Architectures for Signal Processing IV (1989)
`
`Page9of 15
`
`Page 9 of 15
`
`

`and T.syg=3T, for the diagonal PEs and Agsyq=2A, and Tosyq=2T,, for the off-diagonal PEs. By using the
`TPR method, we need only Agsyg=2A, and T,syg=T, for all PEs. In other words, the time complexity is
`reduced by more than 50 percent.
`
`6.
`SOME CONVERGENCE CONSIDERATIONS
`Forsythe and Henrici> have proven the convergence of the Jacobi SVD method with the cyclic row
`ordering or the cyclic column ordering if the rotation angles 6; and 0, in (4) are restricted to a closed
`interval H inside the openinterval (-1/2, 7/2). They have also demonstrated that this condition may fail to
`hold, i.e. 6; and @, may have the values +n/2, if the off-diagonal elements by. and b2) in (5) have to be
`exactly annihilated. As a remedy, they suggested an under- or over-rotation by computing (4) with angles
`(1-¥)@, and (1-y)®, (-l<y<1) and proved its convergence. This causes complications for CORDIC
`implementations. Because no longerthe rotation angles @, and 62 generated according to (12) and (13) are
`used, an additional two-sided rotation (4) in the rotation mode with the modified angles (1—y)6, and (1—y)®2
`has to be performed in the diagonal PEs. However, the finite mantissa length in the real arithmetic allows
`only an approximative computation of the rotation angles and implies under- or over-rotations. In the
`practical applications, the Jacobi SVD method converges without using under- or Over-rotations as shown by
`the experimental results of Brent and Luk’.
`
`Recently, Luk and Park!” have shown that various parallel orderings, including the parallel ordering of
`Brent and Luk? for odd N,are equivalent to the cyclic orderings and share the same convergenceproperties.
`Although the parallel architectures for computing SVD for these parallel orderings may be different, the
`basic operations are always two-sided rotations (4). So, the TPR method can be applied to any parallel
`orderings.
`
`7. CONCLUSION
`
`In this paper, we have presented a novel algorithm for computing two-sided rotations requiring only
`two plane rotations and a few additions. This results in a reduced computational complexity of the Jacobi
`SVD method. Moreover, we have presented an unified CORDIC SVD module for implementing all PEs of
`the SVD array given by Brent and Luk. This leads to a homogeneous SVDarray that is simple, regular and
`offers twice the computational speed.
`
`REFERENCES
`
`1.
`
`Speiser, J.M., "Signal Processing Computational Needs", Proc. SPIE Advanced Algorithms and
`Architectures for Signal Processing I, vol.696, pp.2-6, 1986
`2. Brent, R.P. and Luk, F.T., "The Solution of Singular Value and Symmetric Eigenvalue Problems on
`Multiprocessor Arrays" SIAM J. Sci. Stat. Comput., vol.6, no.1, pp.69-84, 1985
`3. Brent, R.P., Luk, F.T. and Van Loan, C.F., "Computation of the Singular Value Decomposition Using
`
`&
`
`Page 10 of 15
`
`SPIE Vol. 1152 Advanced Algorithms and Architectures for Signal Processing IV (1989) / 101
`
`
`
`
`
`Page 10 of 15
`
`

`Mesh-Connected Processors", Journal of VLSI and Computer Systems, vol.1, no.3, pp.242-270, 1985
`Kogbetliantz, E.G., "Solution of Linear Equations by Diagonalization of Coefficient Matrices", Quart.
`Appla. Math., vol.13, pp.123-132, 1955
`Forsythe, G.E. and Henrici, P., "The Cyclic Jacobi Method for Computing the Principal Values of a
`Complex Matrix", Trans. Amer. Math. Soc., vol.94, pp.1—23, 1960
`Golub, G.H. and Van Loan, C.F., Matrix Computations, pp.44, Johns Hopkins University Press,
`Baltimore, 1983
`Delosme, J.M., "A Processor for Two-dimensional Symmetric Eigenvalue and Singular Value Arrays",
`Proc. 21st Asilomar Conference on Circuits, Systems and Computers, Nov. 1987
`Volder, J.E.,
`"The CORDIC Trigonometric Computing Technique", JRE Trans. on Electronic
`Computers, vol.8, pp.330-334, 1959
`Walther, J.S., "An Unified Algorithm for Elementary Functions", Proc. Spring Joint Computer
`Conference, pp.379-385, 1971
`Cavallaro, J.R. and Luk, F.T., "CORDIC Arithmetic for an SVD Processor", Journal of Parallel and
`Distributed Computing, vol.5, pp.271-290, 1988
`Haviland, G.L. and Tuszynski, A.A., "A CORDIC Arithmetic Processor Chip", JEEE Trans. on
`Computers, vol.29, no.2, pp.68-79, 1980
`Ahmed, H.M., Signal Processing Algorithms and Architectures, Ph.D., Department of Electrical
`Engineering, Stanford University, Dec. 1981
`Delosme, J.M., "VLSI Implementation of Rotations in Pseudo-Euclidean Spaces", Proc. IEEE ICASSP,
`pp.927—930, Boston, 1983
`Deprettere, E.F., Dewilde, P. and Udo, R., "Pipeliend CORDIC Architectures for Fast VLSI Filtering
`and Array Processing", Proc. IEEE ICASSP, pp.41A.6.1-41A.6.4, San Diego, March 1984
`K6nig, D.,
`(in German) Optimierung des CORDIC-Verfahrens fir Prozessoren mit verschiedenen
`Datenformaten, Tech. Report, Lehrstuhl fiir Signaltheorie, Ruhr-Universitit Bochum, FRG, 1989
`Cavallaro, J.R. and Luk, F.T., "Architectures for a CORDIC SVD Processor", Proc. SPIE Real-Time
`Signal Processing IX, vol.698, pp.45-53, 1986
`Luk, F.T. and Park, H., "On the Equivalence and Convergence of Parallel Jacobi SVD Algorithms",
`Proc. SPIE Advanced Algorithms and Architectures for Signal Processing II, vol.826, pp.152-159,
`1987
`
`
`
`102 / SPIE Vol. 1152 Advanced Algorithms and Architectures for Signal Processing IV (1989)
`
`Page 11 of 15 |
`
`
`
`10.
`
`11.
`
`12.
`
`13.
`
`14.
`
`15.
`
`16.
`
`17.
`
`Page 11 of 15
`
`

`PROCEEDINGS
`@SPIE—The International Society for Optical Engineering
`
`Franklin T. Luk
`Chair/Editor
`
`8-10 August 1989
`San Diego, California
`
`Advanced Algorithms
`and Architectures for
`Signal Processing IV
`
` Page 12 of 15
`
`
`
`Volume 1152
`
`;
`
`- a
`
`_
`
`Page 12 of 15
`
`

`
`
`PROCEEDINGS
`SPIE—TheInternational Society for Optical Engineering
`
`Advanced Algorithms
`‘and Architectures for
`Signal Processing [V
`
`Franklin T. Luk
`Chair/Editor
`
`8-10 August 1989
`San Diego, California
`
`Sponsored by
`SPIE—TheInternational Society for Optical Engineering
`
`Cooperating Organizations
`Applied Optics Laboratory/New Mexico State University
`Center for Applied Optics Studies/Rose-HulmanInstitute of Technology
`Center for Applied Optics/University of Alabama in Huntsville
`Center for Electro-Optics/University of Dayton
`Center for Excellence in Optical Data Processing/Carnegie Mellon University
`Jet Propulsion Laboratory/California Institute of Technology
`Optical Sciences Center/University of Arizona
`Optoelectronic Computing Systems Center/University of Colorado,
`Colorado State University
`
`Published by
`SPIE—TheInternational Society for Optical Engineering
`P.O. Box 10, Bellingham, Washington 98227-0010 USA
`Telephone 206/676-3290 (Pacific Time) @ Telex 46-7053
`
`a)
`
`PROCEEDINGS
`
`1
`Volume 1152
`
`
`
`SPIE (TheSociety of Photo-Optical Instrumentation Engineers) is a nonprofit society dedicated to advancing engineering
`andscientific applications of optical, electro-optical, and optoelectronic instrumentation, systems, and technology.
`
`Page 13 of 15
`
`Page 13 of 15
`
`

`
`
`ADVANCED ALGORITHMS AND ARCHITECTURES FOR SIGNAL PROCESSING IV
`
`Volume 1152
`
`CONTENTS
`
`1152-01
`
`SESSION 1
`1152-02
`
`1152-03
`
`1152-04
`
`1152-05
`
`1152-06
`
`1152-07
`
`SESSION 2
`1152-08
`
`1152-09
`
`1152-10
`
`1152-11
`
`1152-12
`
`1152-13
`
`1152-14
`
`1152-15
`
`1152-16
`
`Conference Committee ... 020 s0cccccccccces cu cccce nna ste e ened eee e eens gue e eee reece rene reemerneaees vi
`THtrOduction . ccs sas 6.6 dec dative eu epie esis corspeeeree ereewin osnn cia aieis ae ¥i4i6.a'4.6 5a a9 8 CWS RIS eypgiael arene ereece «oe vii
`
`KEYNOTE ADDRESS
`Algorithmic engineering: an emerging discipline
`J. G. McWhirter, Royal Signals and Radar Establishment (UK). ...-.6- 0+ secre e eset eee e entre eee e ence eens Z
`
`ADAPTIVE FILTERING
`Chaotic model of sea clutter using a neural network
`S. Haykin, H. Leung, McMaster Univ. (Canada). .....- 6. esse cece cece eee e tence nee nee e cence ence ee cees 18
`Multichannelfast transversalfilter algorithms for adaptive broadband beamforming
`D. T.M.Slock, Philips Research Lab. (Belgium); T. Kailath, Stanford Univ: nhs ccs savewwewueses soe este eee 22
`Data equalization using highly nonlinear adaptive architectures
`C. F. N. Cowan,G.J. Gibson, S. Siu, Edinburgh Univ. (UK). ..... esses cece eee eee eee ener eee e ee enes 34
`VLSI architectures for square root covariance Kalmanfiltering
`F. M.Gaston,G. Irwin, Queen’s Univ. (UK). 00... cece eee eee cece e eee ene ee ene ene e ete nee ene eeees 44
`QR-decomposition basedlattice filter algorithms
`L K.Proudler, J. G. McWhirter, T. J. Shepherd, Royal Signals and Radar Establishment CUR). ss scosessicce ma areaeracs 6 56
`Parallel weight extraction by a systolic least squares algorithm
`J. E. Hudson, STC Technology Ltd. (UK); T. J. Shepherd, RoyalSignals and RadarEstablishment (UK). ....... 68
`
`NUMERICAL TECHNIQUES
`Updating singular value decompositions: a parallel implementation
`M. Moonen,Catholic Univ. of Leuven (Belgium); P. Van Dooren,Philips Research Lab. (Belgium);
`J. Vandewalle, Catholic Univ. of Leuven (Belgium)...........0e sees cence eee ener nen e ener ene e een e ene ees 80
`Reducing the computations of the singular value decomposition array given by Brent and Luk
`B. Yang, J. F. Bohme, Ruhr-Univ. (FRG). 00.6.0 e cece cece renee eee e eee een n ene teen ease cn se ences eenees 92
`Lineararray for covariance differencing via hyperbolic singular value decomposition
`A. W.Bojanczyk, A. O. Steinhardt, Cornell Univ... ..... 0. eee e cece eee e ener n ener e nen en cnr n eee e nee. 103
`Transputer implementation of system identification and control using singular value decompositions
`W. E. Larimore, D. Martin, Computational Engineering, Inc.; F. T. Luk, Cornell Univ. .....--++++s+sseeeees 108
`Jacobi-like algorithm for computing the generalized Schur form of a regular pencil
`J.P. Charlier, P. Van Dooren,Philips Research Lab. (Belgium). ......0--0 sss seer eetee etre tree e eres ee ees 119
`CORDICalgorithms: theory and extensions
`JicML Delosmee; Yale Unive. g:o:5 sree aces sais oie oie owt eiore con nin nishsiafoiatn gesesitin teins tivie/#ntefe )cleke olerele/ se elese.sisiece 131
`Asymptotic analysis of the total least squares ESPRIT algorithm
`B. E. Ottersten, Stanford Univ.; M. Viberg, Stanford Univ. and Linkdping Univ. (Sweden); T. Kailath,
`Stanford Univ. ....ccccccccccccvvcccccccveccccnssececcsceteeseseccrerewseeeereeeeceeeeeeseseeees 146
`Analysis of the constrained total least squares method andits relation to harmonic super-resolution
`T. Abatzoglou, Lockheed Palo Alto Research Lab. ......0ssseeee essen eee e nese nese nett nent seen eeennes 158
`Separation of sources using higher order cumulants
`P. Comon, Thomson SINTRA (France). 0.0... sec cece cece cence nee ene n eee n tee ee sees eee eee eee ences 170
`W
`i
`.
`4 tar ARft
`(continued)
`bbevt*
`&
`eemEt
`“epA BA-= yo.
`Tiss7-SeoNV¥dvancedAlgorithms andArchitectures forSignalProcessingIV(1989) / iii
`
`Page 14 of 15
`
`Page 14 of 15
`
`

`
`
`
`
`“AKSIOVS
`sASVSR
`1a
`
`
`
`
`
`The papers appearingin this book comprise the proceedings of the meeting mentioned on
`the cover andtitle page. They reflect the authors’ opinions and are published as presented
`and without change,in the interests of timely dissemination. Their inclusion in this
`publication does not necessarily constitute endorsement by the editors or by SPIE.
`
`Please use the following formatto cite material from this book:
`Author(s), “Title of Paper,” Advanced Algorithms and Architectures for Signal
`Processing IV, Franklin T. Luk, Editor, Proc. SPIE 1152, page numbers(1989).
`
`Library of Congress Catalog Card No. 89-43265
`ISBN 0-8194-0188-9
`
`Copyright © 1989,The Society of Photo-Optical Instrumentation Engineers.
`Copying of material in this bookfor sale orfor internal or personal use beyondthefair use provisions granted by
`the U.S. Copyright Law is subject to paymentof copying fees. The Transactional Reporting Service basefee for
`this volume is $2.00 perarticle and should be paid directly to Copyright Clearance Center, 27 CongressStreet,
`Salem, MA 01970.For those organizations that have been granted a photocopylicense by CCC, a separate
`system of payment has been arranged. The fee code for users of the Transactional Reporting Service is
`0-8194-0188-9/89/$2.00.
`‘
`
`Individual readers of this book and nonprofitlibraries acting for them are permitted to makefair use of the
`materialin it, such as to copy anarticle for teaching or research, without paymentof a fee. Republication or
`systematic or multiple reproduction of any material in this book (including abstracts) is prohibited except with
`the permission of SPIE and oneof the authors.
`,
`
`Permission is granted to quote excerpts from articles in this book in other scientific or technical works with
`acknowledgmentofthe source, including the author's name,thetitle of the book, SPIE volume number, page
`number(s), and year. Reproduction of figures and tablesis likewise permitted in other articles and books
`provided that the same acknowledgmentof the source is printed with the

This document is available on Docket Alarm but you must sign up to view it.

Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

Up-to-date information for this case.
Email alerts whenever there is an update.
Full text search for other cases.
Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.

Access Government Site

We are redirecting you
to a mobile optimized page.

Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket

Supplemental Search

Search for PTAB Motions

PTAB Analytics

TTAB Analytics

Basic Search

Filters

Party Search

Advanced

Selected Courts

Recently Selected Courts

Find PTAB Decisions

PTAB Analytics

Special PTAB Alerts

Orange Book

Directly Search Federal Courts

Search Trademark ...

This document is available on Docket Alarm but you must sign up to view it.

Accessing this document will incur an additional charge of $.

Still Working On It

A few More Minutes ... Still Working

This document could not be displayed.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

One Moment Please

Your document is on its way!

Sealed Document

We are redirecting youto a mobile optimized page.

Document Unreadable or Corrupt

We are unable to display this document.

STEP 2 of 2

Choose your membership type

Flat-Fee

Pay-As-You-Go

Add your payment information

Login or Join

Enter your corporate Email

Thousands of your peers are saving time and gaining a competitive advantage with Docket Alarm.

Join Docket Alarm to perform smarter legal research.

Download this document and millions of others instantly with a Docket Alarm membership.

Join Docket Alarm and start performing smarter legal research.

Start tracking this docket instantly with a Docket Alarm membership.

Join thousands of your peers and start performing smarter legal research.

STEP 1 of 2

Millions of Documents | 15 Seconds to Signup

Hi !

Welcome to Docket Alarm

Welcome to Docket Alarm!

Explore Litigation Insights andManage Your Cases

Reset Password

What is PACER?

Why do I need it?

What will I be charged?

Do other courts have fees?

Basic Free Access

Welcome

Thank you

Check Firm Account

We are redirecting you
to a mobile optimized page.

Explore Litigation Insights and
Manage Your Cases