throbber

`
`
`
`Reducing the computations of the SVD array given by Brent and Luk
`
`B. Yang and J.F.B6hme
`
`Ruhr—Universitiit, Department of Electrical Engineering
`
`4630 Bochum, West Germany
`
`ABSTRACT
`
`A new, efficient two plane rotations (TPR) method for computing two—sided rotations involved in
`singular value decomposition (SVD) is presented. By exploiting the commutative properties of some special
`types of 2x2 matrices, we show that a two—sided rotation can be computed by only two plane rotations and a
`few additions. Moreover, if we use coordinate rotation digital computer (CORDIC) processors to implement
`the processing elements (PEs) of the SVD array given by Brent and Luk, the computational overhead of the
`diagonal PEs due to angle calculations can be avoided. The resulting SVD array has a homogeneous
`structure with identical diagonal and off-diagonal PEs.
`
`1.
`
`INTRODUCTION
`
`the SVD. Particular
`One important problem in linear algebra and digital signal processing is
`applications arise in beamforming and direction finding, Spectrum analysis, digital image processing etc.1
`Recently, there is a massive interest in parallel architectures for computing SVD due to the growing
`importance of real—time signal processing and advances in VLSI devices. Brent and Lulcz’3 have shown how
`a Jacobi method with parallel ordering can efficiently compute the SVD, and how the method can be
`implemented by a two-dimensional systolic array. The method is based on, as common for all two-sided
`approaches, applying a sequence of two—sided rotations to 2x2 submatrices of the original matrix. The
`computational complexity is thus determined by how to compute the two-sided rotations.
`
`Usually, a two—sided rotation is realized by four plane rotations, where two of them are applied from
`left to the two column vectors of the 2x2 submatrix and the other ones are applied from right to the row
`vectors, respectively. For the diagonal PEs of the SVD array, additional operations for calculating the
`rotation angles are required. This leads to an inhomogeneous array architecture containing two different
`types of PBS. In this paper, we develop a new TPR method for computing two—sided rotations. We show that
`the above computational complexity is reduced significantly, and the SVD array becomes homogeneous
`when using CORDIC processors.
`
`The paper is organized as follows. In section 2, we briefly re—examine the Jacobi SVD method and the
`SVD array. Then, we develop the TPR method in section 3. The CORDIC algorithm is described in section
`4. Different techniques for scaling correction are discussed, and examples of scaling corrected CORDIC
`sequences for different data formats are given. In section 5, an unified CORDIC implementation of all PEs
`
`92 / SPIE Vol. I 152 A dvanced Algorithms and Architectures for Signal PracessMg lV (1 .989)
`
`Page 1 of 15
`
`SAMSUNG EXHIBIT 1011
`
`Page 1 of 15
`
`SAMSUNG EXHIBIT 1011
`
`

`

`of the SVD array is presented. Finally, some convergence aspects are discussed in section 6.
`
`The SVD of a matrix Me IRNXN is given by
`
`2.
`
`JACOBI SVD METHOD
`
`M = Usz ,
`
`(1)
`
`where UElRNXN and VEIRNXN are orthogonal matrices and ZEIRNXN is a diagonal matrix of singular values.
`Based on an extension of the Jacobi eigenvalue algorithm, Kogbetliantz4, Forsythe and Henrici5 proposed to
`
`diagonalize M by a sequence of two-sided rotations,
`Mo = M ,
`Mk+1 = UIMka
`
`(k=0,1,2,- - -) .
`
`(2)
`
`Uk and Vk represent rotations in the (i,j)—plane (15i<jSN). The rotation angles are chosen to annihilate the
`elements of Mk at the positions (i,j) and (Li). Usually, several sweeps are necessary to complete the SVD,
`where a sweep is a sequence of N(N-1)/2 two—sided rotations according to a special ordering of the
`N(N—l)/2 different index pairs (i,j). Unfortunately, the best known cyclic orderings, namely, the cyclic row
`
`ordering
`
`(i,j) = (1.2),
`
`.(1,N),(2.3),
`
`, (LN),
`
`, (N-1,N)
`
`(3)
`
`and the equivalent cyclic column ordering are not suitable for parallel computations.
`
`Recently, Brent and Lukz’3 suggested a parallel ordering allowing a high degree of parallelism. It
`enables
`the use of
`a
`square
`systolic
`array with
`
`[N/2]x|'N/2] PBS to implement the Jacobi SVD method
`(Figure 1). For doing this, the matrix M is partitioned into
`2x2 submatrices. Each PE contains one submatrix and
`
`performs a two—sided rotation B = R(61)TAR(62),
`
`where
`
`A_[an a12]
`—
`a21
`a22
`
`d B
`an
`
`—
`
`[bu 1312]
`b21
`b22
`
`denote the input and output matrices, and
`
`sine
`cost)
`Rm): —sin9 cose
`
`(4)
`
`5
`()
`
`(6)
`
`F'
`1. Th
`SVD
`31:31:: and Lil:
`
`army given by
`
`plane
`a
`describes
`respectively. Notice
`
`through
`rotation
`that
`there
`are
`
`the
`two
`
`9,
`angle
`different
`
`computation modes. In a diagonal PE, the rotation angles
`61 and 62 are generated to diagonalize the 2x2 submatrix (b12=b21=0) stored. We call this the generation
`
` i
`
`SP/E Vol.
`
`1 152 Advanced Algorithms and Architectures for Signal Processing IV (1 989) / 93
`
`Page 2 of 15
`
`Page 2 of 15
`
`

`

`
`
`
`
`mode. Then, the rotation angles are propagated to all off-diagonal PBS in the same row and the same
`column, in which pure rotations (4) with the received angles are performed. We call this the rotation mode.
`Clearly, if we directly compute (4) in the rotation mode, we require four plane rotations. For the generation
`mode, additional operations for calculating 91 and 62 are needed.
`
`~—-—%———
`3. TPR METHOD FOR COMPUTING TWO—SIDED ROTATIONS
`
`In order to develop the TPR method for computing two-sided rotations more efficiently, we first
`discuss the commutative properties of two special types, the rotation-type and the reflection—type, of 2x2
`matrices, namely,
`
`flr°f={ [: j] ‘c,se|R}.
`and
`‘c,selR}
`flr°[={ [‘2 z]
`Obviously, the plane rotation matrix and the Givens reflection matrix6 with cZ+s2=1 are two elements of the
`sets J! m and .1! M, respectively. The following results can be shown by elementary manipulations.
`
`lemma 1.
`
`If Ale 4"“ and A25 I“, then A1A2=A2Ale Jam.
`
`Lemma 2.
`
`If Ale M“ and A26 45’“,
`
`then A1A2=A§Ale 4’“.
`
`In particular, if we consider two plane rotations, we know,
`
`If R(9]) and R(92) are plane rotations described by (6), then R(61)R(92)=R(91+67) and
`Lemma 3.
`R<61)TR<62)=R<02491).
`
`Now, we give a theorem describing the rotation mode of the TPR method.
`
`Theorem If the 2x2 matrix A and the two rotation angles 61 and 62 are given, then the two—sided
`rotation (4). can be computed by two plane rotations, ten additions and four scalings by 1/2:
`
`P1 = (a22+a11)/2
`{q1 = (azl—aw/z ’
`
`P2 = (azz—an)/2
`{q2 = (a2] +a12)/2 ’
`
`0_=62—01,
`
`6+=62+61,
`
`["l - M [m]
`tl‘ ”ql’
`
`[’2] - R 9 [p2]
`t2_ (9,1,.
`
`"
`
`b12 = -tl+t2
`bll = rl-fz
`{b22= r1+r2‘
`{b21=t1+t2’
`Proof. Using (7), the input matrix A can be reformulated by
`
`(7)
`
`(8)
`
`(9)
`
`(10)
`
`A=A1+A2=[
`pl «11] + [-pz <12]
`Clearly, R(61), me?) in (4) and A1 are elements of 1’“ while A2 belongs to 4"“. This leads to the
`following expression for the output matrix B by using the lemmas 1—3,
`
`‘12 P2
`
`‘11
`
`P1
`
`94 / SPIE Vol. 1152 AdvancedA/gorithms andArch/(ectures for Signal Processing IV {1989)
`
`Page 3 of 15
`
`Page 3 of 15
`
`

`

`B = meoTARte» = R<61)TA1R(62) + R<el>TA2R<ez>
`
`R<eoTR<eaA1+ R<el)TR<ez>TA2 = R<ez4al>A1+ R(92+91)TA2
`
`P1
`
`R(9_)[ ‘11
`
`1'
`
`
`
`-p2 <12]]+R(6+) [ 92 P2
`
`II
`
`1‘1
`t1
`
`-t1
`r1 +
`
`-l'2 t2
`t2
`r2
`
`'
`
`This completes the proof.
`
`A direct consequence of the above theorem shows how a two-sided rotation in the generation mode can
`
`be computed in a similar way.
`
`Corollary.
`
`If the 2x2 matrix A is given, we can diagonalize A and calculate the corresponding rotation
`
`angles 01 and 62 by two Cartesian to polar coordinates conversions, eight additions and four scalings by 1/2:
`
`P2 =' (fizz — a11)/2
`
`<12 = (a21+a12)/2 ’
`
`P1 = (322 + a11)/2
`
`<11 = (an-aifl/2 ’
`
`{
`
`I1
`
`{9-
`
`=
`
`.
`
`51910,”in + (1% ,
`ar“MIMI/Pl)
`
`{
`{ I'2
`
`.
`
`2
`
`2
`
`Slgn(P2)JP2 + ‘12
`
`9+ = arcran(q2/p2)
`
`’
`
`(12)
`
`(13)
`
`(14)
`
`91 = (9+—9_)/2 ,
`
`92 = (9++9_)/2 ,
`
`bu = Il-rz .
`
`b22 = r1+I2 -
`
`Proof. Regarding equation (10), t1=t2=0 follows directly from b12=b21=0.
`
`In equation (12), we choose the rotation through the smaller angle. All vectors lying in the first or the
`
`fourth quadrant are rotated to the positive x—axis, and all vectors lying in the second and the third quadrant
`
`are rotated to the negative x—axis. For vectors on the y—axis, the rotation direction is arbitrary. Thus, the
`
`generated rotation angles 0_ and 6+ satisfy |9_| ,
`
`|6+| S 1t/2. This results in
`
`|01|S1rl2
`
`and
`
`|92|Sn/2
`
`(15)
`
`due to (13).
`
`We remark that Delosme7 has also proposed to use (14) for computing bu and bn in the generation
`
`mode. However, he did not recognize the fact that the above approach is generally possible because of the
`
`commutative properties of the rotation—type and the reflection-type of matrices. So, he still requires four
`
`plane rotations for computing a two-sided rotation in the rotation mode.
`
`4. THE CORDIC ALGORITHM
`
`In the previous section, we have seen that the main operations of the TPR—method are plane rotations
`
`and Cartesian to polar coordinates conversions. Although these operations can be carried out by multiplier—
`
`SP/E Vol. 1752 Advanced Algorithms andArchiteclures for Signal Processing IV (1989/ /. 95
`
`K—________—
`Page 4 of 15
`
`
`
`Page 4 of 15
`
`

`

`
`
`adder based processors supported by software or special hardware units, a simpler approach is the use of
`dedicated processors that map algorithms more effectively to hardware. The CORDIC processor is a
`powerful one for calculating these functions.
`
`The CORDIC algorithm was originally designed by Volder8 as an iterative procedure for computing
`plane rotations and Cartesian to polar coordinates conversions. It was later generalized and unified by
`Waltherg enabling a CORDIC processor to calculate more functions, including hyperbolic functions as well
`as multiplications and divisions, in a similar way. In the following, only Volder's CORDIC algorithm is
`considered for the sake of simplicity because only trigonometric functions are involved in SVD applications.
`
`The CORDIC algorithm consists of iterative shift-add operations on a three-component vector,
`
`cos(0ti) —Gisin((xi)
`[
`1
`[XHI ] = [ xi—O'ioiyi J =
`yi+1
`yi+<)'i8ixi m 0's1n(0t)
`cos(oci)
`2H1
`= zi — soiai
`(i=0,1,...,n—1)
`,
`
`
`
`l xi ]
`yi
`
`in which the iteration stepsize 0<8i<1 is defined by
`ai = tan(oci) = 2‘5“) .
`
`(16)
`
`(17)
`
`(18)
`
`The set of integers {S(i)} is called CORDIC sequence. Equation (16) can be interpreted, except for a scaling
`factor of
`
`=
`
`1
`(19)
`1+8.2
`cosocl =
`i
`as a rotation of (xi, yiT) through the angle eti, where O'i=+1 gives the rotation direction. After 11 iterations
`(16) and (17), the resulting data are given by
`
`Xu
`cosoc —sin0t
`y
`= K sinoc
`cosoc
`(20)
`2n
`= zo—ea
`(21)
`with the overall scaling factor K: 1'_I ki and the cumulative rotation angle a: 2 6ioni. Now, if the CORDIC
`sequence satisfies the following convergence condition
`"
`n— 1
`
`x0
`y0
`
`’
`
`0c. — 2 on. S or
`1
`j=i+1 J
`“‘1
`we can choose oi=—sign(xiy.) or ci=sign(£zi) to force yn or zn to zero, provided that the input data x
`and 20 lie1n the convergence region
`
`(i=0,1,...,n—2) ,
`
`(22)
`
`0’ yo
`
`n—l
`
`[larctan(y0/x0)|
`
`for y —»0
`
`“
`
`.
`
`(23)
`
`.96 / SPIE Vol. 1 752 Advanced Algorithms and Architectures for Signal Processing /V I7.989)
`
`Page 5 of 15
`
`Page 5 of 15
`
`

`

`(Table 1), where 8:11 is a sign parameter.
`
`yn = K(exosinzo+ yocoszo)
`
`- Ksign(x0)Jx6+y3I
`zo+earctan(yo/xo)
`
`xn = K( xocoszo—eyosinzo)
`
`Table 1 : CORDIC trigonometric functions
`
`In Table 1, only the principal values |arctan(yo/xo) |Srt/2 are calculated when computing Cartesian to
`
`polar coordinates conversions. Correspondingly, xn may be the positive or the negative vector norm
`according to the sign of XO. These properties for computing (12) are enabled by the particular choice of
`oi=-sign(xiyi), in contrast to 6i=—sign(yi) in most other CORDIC publications. So, it is guaranteed that a
`vector is always rotated through the smaller angle to the x-axis. In this case, a convergence region of C2900
`
`is sufficient for the generation mode.
`
`A disadvantage of the CORDIC algorithm is the need to correct the scaling factor K. The traditional
`
`CORDIC sequence
`
`{S(i)} = {0,1, 2, 3,
`
`, p—l, p}
`
`(24)
`
`with n=p+1 CORDIC iterations for a mantissa length p delivers, for example, a scaling factor of Kzl.64676
`for p=16. Using multiplications of xn and yn by K‘1 will degrade the algorithm performance significantly.
`
`We like to compensate the scaling effect by using a minimum number of shift—add operations.
`
`For a two—sided rotation that is implemented by four plane rotations, each matrix element undergoes
`two plane rotations so that the total scaling factor to be corrected is K2. In this case, Cavallaro and Luk10
`
`pointed out that there is a simple systematic approach for scaling correction if the traditional CORDIC
`sequence (24) is used. They proposed to use [p/4] scaling iterations of the type x «— x—2—2jx with je J={1, 3,
`5,
`, 2fp/41—1} and one shift operation 2—1. The remaining scaling error is bounded as followsl) ,
`.
`.
`p
`.
`
`SPIE Vol. I 752 AdvancedA/gorithms and Architectures for Signal Processing /V (1989/ / 97
`
`1—2'1r1(1—"21)-K2‘ = 11—24110—245-11(1+2‘2‘)
`
`jeJ
`
`jeJ
`
`i=0
`
`< 2‘1"1 .
`
`(25)
`
`This approach, however, fails in the TPR method. Here, each matrix element undergoes only one plane
`
`rotation. So, the scaling factor to be corrected is K. For solving this more difficult problem, different
`techniques have been developed. Haviland and Tuszynski11 used similar scaling iterations as Cavallaro and
`Luk. Ahmed12 proposed to make K to a power of the machine radix by repeating some CORDIC iterations.
`Delosme13 combined both methods above for minimizing the computational overhead. Deprettere et al.14
`suggested the double-shift concept for providing more freedom by optimizing CORDIC sequences.
`
`We designed a computer program15 for a systematic search of CORDIC sequences with comparable
`
`1) When replacing [p/4] by [(p—1)/4] or [(p+1)/4], the upper bound in (25) becomes 2—p or 2—p—2, respectively.
`
`h
`
`Page 6 of 15
`
`Page 6 of 15
`
`

`

`
`
`
`
`complexity for correcting K and found sequences requiring the same number [p/4] of additional iterations.
`We examined all relevant combinations of the shift parameters S(i) (i=O,1,...,n—1). In particular, we allowed
`a difference two between adjacent shifts, for example S(i)={0, 1, 3, 3, 3, ...}, such that much more CORDIC
`sequences could be tested. The resulting scaling factor K is corrected by a sequence of nk shift-add
`operations,
`
`24(0) fik[1+na)2‘TG)] -K = 1+AK
`
`j=1
`
`(TO) integers, n0)=i1) ,
`
`(26)
`
`, n(nk)T(nk)}. This results in a total
`that are characterized by the set of parameters {T(0), 11(1)T(1),
`number of L=n+nk iterations. We note that the remaining scaling error AK is a systematic error in contrast
`to the other two types of CORDIC errors, the angle resolution and the rounding errors. So, it is particularly
`important
`to keep AK much smaller than 2"), especially if repeated CORDIC operations have to be
`performed on the same data such as in SVD applications.
`
`In the following, for five different mantissa lengthes p=16, 20, 24, 28 and 32, examples of such
`CORDIC sequences, including the total number of iterations L=n+nk, the remaining scaling error AK and
`the convergence region C are given.
`
`p=l6:
`
`{S(i)} = {o 1 2 3 ------ 1516} ,
`
`p=20:
`
`p=24:
`
`p=28:
`
`p=32:
`
`{mum} = {1 +2 —5 +9 +10} ,
`
`L=17+4, C=100°, AK=—2"16'01 2)
`
`19 20} ,
`{S(i)} = {0123 ------
`mama} = {1 +2 —5 +9 +10 +16} ,
`
`L=21+5, C=100°, AK=2'23'05
`
`{S(i)}:{1123345566788910 ------ 2324},
`mama} = {0 —2 +6} ,
`L=29+2, c=91°, AK=—2_29'13
`
`{S(i)}:{1123345566788910111213141415 ------ 2728},
`mamp} = {o —2 +6} ,
`L=34+2, c=91°, AK=2'32'53
`
`{S(i)}:{001333456789910 ------ 3132},
`mama} = {1 —3-—8 +16 —25 —27} ,
`L=36+5, c=145°, AK=—2‘.,39'93
`
`In order to meet the convergence condition (22) and to provide a convergence region C2900, the minimum
`number of CORDIC iterations for a mantissa length of p is p+1. So, for all CORDIC sequences given above,
`the number L—(p+1) of additional iterations for scaling correction is p/4. Moreover, except for the first
`CORDIC sequence, the remaining scaling error AK is always significantly smaller than 2—p.
`
`that similar results can also be obtained by using Deprettere's double—shift concept.
`We suspect
`However, this method requires a slightly increased hardware complexity and will not be discussed in this
`paper.
`
`
`2) When appending an additional scaling iteration with n(5)T(5)=+16, the scaling accuracy can be enhanced to AK=2—23.
`
`98 / SPIE Vol. 7 152 AdvancedA/gorithms andArch/tectures for Signal Processing /V (1989/
`
`
`Page 7 of 15
`
`Page 7 of 15
`
`

`

`5. CORDIC IMPLEMENTATIONS OF THE SVD PES
`
`We first introduce a CORDIC processor symbol as shown in Figure 2. The descriptions inside the box
`
`determine uniquely the function mode of the processor. The output data x, y and z are assumed to be scaling
`corrected.
`
`The operations (7)—(10) and (11)—(14) of the TPR method can be mapped onto a two CORDIC
`
`
`
`Figure 2' CORDIC symbol
`
`processor architecture. In Figure 3, the diagonal PEs of the SVD
`
`array working in the generation mode are implemented by two
`
`CORDIC processors computing Cartesian to polar coordinates
`
`conversions and eight adders. The dotted inputs of the adders
`
`represent negated inputs. In Figure 4, the off-diagonal PEs of the
`
`SVD array working in the rotation mode are implemented by two
`
`CORDIC processors computing plane rotations and ten adders.
`Clearly, both CORDIC implementations have nearly the same
`
`architecture. All PEs of the SVD array can thus be implemented by
`
`one unified CORDIC SVD module (Figure 5) without considerably increased hardware cost. In this case, the
`
`SVD array depicted in Figure 1 becomes homogeneous. The different computation modes of the diagonal
`
`and off-diagonal PEs can be easily selected by one control bit.
`
`We remark that Figure 5 is more a "graphic program" describing the sequence of operations to be
`
`computed rather than a hardware block diagram. In particular, the twelve adders that are paired into three
`
`pre-butterflys and three post-butterflys can be integrated into the two CORDIC processors without separate
`
`hardware implementations. Because of the recursive nature of the Jacobi SVD method, only recursive
`
`CORDIC processors can be used. Each of them consists mainly of two barrel-shifters and three adders for
`
`computing the iterative shift-add operations (16) and (17). So, it is natural to modify the CORDIC processor
`
`architecture slightly and to use the existing six adders contained in the two CORDIC processors for
`
`computing both, the pre—butterfly and the post-butterfly operations. The only computational overhead are
`
`two iterations resulting in a total computation time of L+2 iterations for one CORDIC SVD module. In
`
`Figure 6, the principal architecture of such a two CORDIC processor SVD module is shown, where the
`
`dashed lines and boxes represent the additional hardware components enabling the CORDIC processors to
`
`compute the butterfly operations. It can be easily verified that the upper four adders devoted to x and y can
`perform the following types of operations 2'lxi2"ly, xfl'sy (CORDIC iteration) and xi2'sx (scaling
`iteration), while the lower two adders devoted to 2 can compute 2—121fl'122 and lid.
`
`Now, we compare the area and time complexity of the above CORDIC SVD module with that of the
`CORDIC SVD modules proposed by Cavallaro and Luk“. Let Acsvd and Tam denote the area and time
`
`complexity of a CORDIC SVD module and Ac and Te those of a CORDIC processor, respectively.
`
`Cavallaro and Luk have shown that their most efficient parallel diagonalization method requires Amwd'z2Ac
`
`SP/E Vol. 1 152 Advanced Algorithms and Architectures for Signal Processing lV (1989) / 99
`
`
`
`k—
`
`Page 8 of 15
`
`
`
`Page 8 of 15
`
`

`

`
`
`
`
`pos t- butterfly:
`of the unified CORDIC SVD module
`
`Figure 3. CORDIC implementation of the
`diagonal PE of the SVD array
`
`Figure 4. CORDIC implementation of the
`off—diagonal PE of the SVD array
`
`pre— bu tterflys
`
`
`
`Figure 6. The principal architecture
`
`Figure 5. An unified CORDIC SVD module
`for implementing all PEs of the SVD array
`
`100 / SPIE Vol. 1 152 Advanced Algorithms and Architectures for Signal Processing IV (7989)
`
`WOHS
`
`Page 9 of 15
`
`

`

`and Tcsvd=3Tc for the diagonal PBS and AMWFZAc and de=2Tc for the off-diagonal PEs. By using the
`TPR method, we need only Ac,“,d:=2Ac and Tcsvd=Tc for all PEs. In other words, the time complexity is
`
`reduced by more than 50 percent.
`
`6.
`
`SOME CONVERGENCE CONSIDERATIONS
`
`Forsythe and Henrici5 have proven the convergence of the Jacobi SVD method with the cyclic row
`ordering or the cyclic column ordering if the rotation angles 61 and 62 in (4) are restricted to a closed
`interval H inside the open interval (—1t/2, 1r/2). They have also demonstrated that this condition may fail to
`
`hold, i.e. 91 and 92 may have the values int/2, if the off-diagonal elements bu and b2] in (5) have to be
`exactly annihilated. As a remedy, they suggested an under- or over—rotation by computing (4) with angles
`(l—y)91 and (1—y)62 (—1<'y<1) and proved its convergence. This causes complications for CORDIC
`implementations. Because no longer the rotation angles 61 and 92 generated according to (12) and (13) are
`used, an additional two—sided rotation (4) in the rotation mode with the modified angles (1—y)61 and (l—y)62
`has to be performed in the diagonal PEs. However, the finite mantissa length in the real arithmetic allows
`only an approximative computation of the rotation angles and implies under- or over-rotations. In the
`practical applications, the Jacobi SVD method converges without using under- or over—rotations as shown by
`the experimental results of Brent and Luk3.
`
`Recently, Luk and Park17 have shown that various parallel orderings, including the parallel ordering of
`Brent and Luk2 for odd N, are equivalent to the cyclic orderings and share the same convergence properties.
`Although the parallel architectures for computing SVD for these parallel orderings may be different, the
`basic operations are always two—sided rotations (4). So, the TPR method can be applied to any parallel
`
`orderings.
`
`7. CONCLUSION
`
`In this paper, we have presented a novel algorithm for computing two-sided rotations requiring only
`
`two plane rotations and a few additions. This results in a reduced computational complexity of the Jacobi
`
`SVD method. Moreover, we have presented an unified CORDIC SVD module for implementing all PEs of
`
`the SVD array given by Brent and Luk. This leads to a homogeneous SVD array that is simple, regular and
`
`offers twice the computational speed.
`
`REFERENCES
`
`l.
`
`Speiser, J.M., "Signal Processing Computational Needs", Proc. SPIE Advanced Algorithms and
`
`Architectures for Signal Processing I, vol.696, pp.2—6, 1986
`
`2. Brent, RP. and Luk, F.T., "The Solution of Singular Value and Symmetric Eigenvalue Problems on
`
`Multiprocessor Arrays" SIAM J. Sci. Stat. Comput., vol.6, no.1, pp.69—84, 1985
`
`3. Brent, R.P., Luk. ET. and Van Loan, C.F., "Computation of the Singular Value Decomposition Using
`
`It
`
`Page 10 of 15
`
`SPIE Vol. 7 752 Advanced Algorithms andArchitectures far Signal Processing /V (7989) / 107
`
`
`
`
`
`Page 10 of 15
`
`

`

`7. Delosme, J.M., "A Processor for Two-dimensional Symmetric Eigenvalue and Singular Value Arrays",
`Proc. 2] st Asilomar Conference on Circuits, Systems and Computers, Nov. 1987
`8. Volder, J.E.,
`"The CORDIC Trigonometn'c Computing Technique",
`IRE Trans. on Electronic
`Computers, vol.8, pp.330—334, 1959
`
`9. Walther, J.S., "An Unified Algorithm for Elementary Functions", Proc. Spring Joint Computer
`Conference, pp.379—385, 1971
`
`10. Cavallaro, J.R. and Luk, F.T., "CORDIC Arithmetic for an SVD Processor", Journal of Parallel and
`Distributed Computing, vol.5, pp.271—290, 1988
`
`11. Haviland, G.L. and Tuszynski, A.A., "A CORDIC Arithmetic Processor Chip", IEEE Trans. on
`Computers, vol.29, no.2, pp.68—79, 1980
`
`12. Ahmed, H.M., Signal Processing Algorithms and Architectures, Ph.D., Department of Electrical
`Engineering, Stanford University, Dec. 1981
`13. Delosme, J.M., "VLSI Implementation of Rotations in Pseudo-Euclidean Spaces", Proc. IEEE ICASSP,
`pp.927—930, Boston, 1983
`
`
`
`Mesh-Connected Processors", Journal of VLSI and Computer Systems, vol.1, no.3, pp.242—270, 1985
`4. Kogbetliantz, E.G., "Solution of Linear Equations by Diagonalization of Coefficient Matrices", Quart.
`Appla. Math., vol.13, pp.123—132, 1955
`
`5.
`
`Forsythe, GE. and Henrici, P., "The Cyclic Jacobi Method for Computing the Principal Values of a
`Complex Matrix", Trans. Amer. Math. Soc., vol.94, pp.1—23, 1960
`6. Golub, G.H. and Van Loan, C.F., Matrix Computations, pp.44, Johns Hopkins University Press,
`Baltimore, 1983
`
`702 / SPIE Vol. I 752 AdvancedA/gon'thms and Architectures for Signal Processing IV (7 989)
`
`14. Deprettere, E.F., Dewilde, P. and Udo, R., "Pipeliend CORDIC Architectures for Fast VLSI Filtering
`and Array Processing", Proc. IEEE ICASSP, pp.41A.6.141A.6.4, San Diego, March 1984
`15. Konig, D.,
`(in German) Optimierung des CORDIC-Verfahrens fiir Prozessoren mit verschiedenen
`Datenformaten, Tech. Report, Lehrstuhl fiir Signaltheorie, Ruhr-Universitiit Bochum, FRG, 1989
`16. Cavallaro, J.R. and Luk, F.T., "Architectures for a CORDIC SVD Processor", Proc. SPIE Real-Time
`Signal Processing IX, vol.698, pp.45—53, 1986
`17. Luk, RT. and Park, H., "On the Equivalence and Convergence of Parallel Jacobi SVD Algorithms",
`Proc. SPIE Advanced Algorithms and Architectures for Signal Processing 11, vol.826, pp.152—159,
`1987
`
`Page 11 0f15 .
`
`Page 11 of 15
`
`

`

`
`
`PROCEEDINGS
`
`@SPIE—The International Society for Optical Engineering
`
`Advanced Algorithms
`and Architectures for
`Signal Processing IV
`
`Franklin T. Luk
`
`Chair/Editor
`
`8—10 August 1989
`San Diego, California
`
`
`
`Volume 1 152
`
`
`
`Page 12 0f15
`
`
`
`'
`7m
`W
`N '
`'
`———J
`
`W
`
`Page 12 of 15
`
`

`

`
`
`PROCEEDINGS
`SPIE—The International Society for Optical Engineering
`
`Advanced Algorithms
`//and Architectures for
`Signal Processing IV
`
`Franklin T. Luk
`
`Chair/Editor
`
`8-10 August 1989
`San Diego, California
`
`Sponsored by
`SPIE—The International Society for Optical Engineering
`
`Cooperating Organizations
`Applied Optics Laboratory/New Mexico State University
`Center for Applied Optics Studies/Roseel-Iulman Institute of Technology
`Center for Applied Optics/University of Alabama in Huntsville
`Center for Electro—Optics/University of Dayton
`Center for Excellence in Optical Data Processing/Carnegie Mellon University
`Jet Propulsion Laboratory/California Institute of Technology
`Optical Sciences Center/University of Arizona
`Optoelectronic Computing Systems Center/University of Colorado,
`Colorado State University
`
`Published by
`SPIE—The lntemational Society for Optical Engineering
`P.O. Box 10, Bellingham, Washington 98227—0010 USA
`Telephone 206/6763290 (Pacific Time) . Telex 46—7053
`
`E R l E 2
`
`PROCEEDINGS
`'3
`
`l 5 2
`V l
`0 ume 1
`
`
`
`SPIE (The Society of PhotoOptical Instrumentation Engineers) is a nonprofit society dedicated to advancing engineering
`and scientific applications of optical, electranptical, and optoelectronic instrumentation, systems, and technology.
`
`Page 13 of 15
`
`Page 13 of 15
`
`

`

`
`
`ADVANCED ALGORITHMS AND ARCHITECTURES FOR SIGNAL PROCESSING IV
`
`
`
`Volume 1152
`
`CONTENTS
`
`1152—01
`
`SESSION 1
`1 152—02
`
`1 152—03
`
`1 152—04
`
`1 152—05
`
`1 152—06
`
`1 152—07
`
`SESSION 2
`1 152—08
`
`1152—09
`
`1152—10
`
`1152—11
`
`1152—12
`
`1152—13
`
`1152—14
`
`1152—15
`
`1152—16
`
`Conference Committee ............................................................................ vi
`Introduction .................................................................................... vii
`
`KEYNOTE ADDRESS
`
`Algorithmic engineering: an emerging discipline
`J. G. McWhirter, Royal Signals and Radar Establishment (UK). .......................................... 2
`
`ADAPTIVE FILTERING
`Chaotic model of sea clutter using a neural network
`S. Haykin, H. Leung, McMaster Univ. (Canada)....................................................... 18
`Multichannel fast transversal filter algorithms for adaptive broadband beamforming
`D. T. M. Slock, Philips Research Lab. (Belgium); T. Kailath, Stanford Univ. ............................... 22
`Data equalization using highly nonlinear adaptive architectures
`C. F. N. Cowan, G. J. Gibson, S. Siu, Edinburgh Univ. (UK). ........................................... 34
`VLSI architectures for square root covariance Kalman filtering
`F. M. Gaston, G. Irwin, Queen‘s Univ. (UK). ........................................................ 44
`QR—decomposition based lattice filter algorithms
`I. K. Proudler, J. G. McWhirter, T. J. Shepherd, Royal Signals and Radar Establishment (UK)................. 56
`Parallel weight extraction by a systolic least squares algorithm
`J. E. Hudson, STC Technology Ltd. (UK); T. J. Shepherd, Royal Signals and Radar Establishment (UK). ....... 68
`
`NUMERICAL TECHNIQUES
`Updating singular value decompositions: a parallel implementation
`M. Moonen, Catholic Univ. of Leuven (Belgium); P. Van Dooren. Philips Research Lab. (Belgium);
`J. Vandewalle, Catholic Univ. of Leuven (Belgium)..................................................... 80
`Reducing the computations of the singular value decomposition array given by Brent and Luk
`B. Yang, J. F. Bohme, Ruhr—Univ. (FRG)............................................................. 92
`Linear array for covariance differencing via hyperbolic singular value decomposition
`A. W. Bojanczyk, A. O. Steinhardt. Cornell Univ..................................................... 103
`Transputer implementation of system identification and control using singular value decompositions
`W. E. Larimore, D. Martin, Computational Engineering, Inc.; F. T. Luk, Cornell Univ. ..................... 108
`Jacobi-like algorithm for computing the generalized Schur form of a regular pencil
`J.—P. Charlier, P. Van Dooren, Philips Research Lab. (Belgium).......................................... 119
`CORDIC algorithms: theory and extensions
`J.—M. Delosme, Yale Univ......................................................................... 131
`Asymptotic analysis of the total least squares ESPRIT algorithm
`B. E. Ottersten, Stanford Univ.; M. Viberg, Stanford Univ. and Linkoping Univ. (Sweden); T. Kailath,
`Stanford Univ. ................................................................................. 146
`Analysis of the constrained total least squares method and its relation to harmonic super-resolution
`T. Abatzoglou, Lockheed Palo Alto Research Lab..................................................... 158
`Separation of sources using higher order cumulants
`P. Comon, Thomson SINTRA (France)............................................................. 170
`V.
`.
`-
`v
`, 3... «l91;; {
`(continued)
`.2
`.
`’
`O
`twig51::ééfipiflz
`dvfsnced Algorithms and Architectures for Signal Processing /V (1989) / iii
`
`¥Page 14 of 15
`
`Page 14 of 15
`
`

`

`
`
`
`
`~l lg: to 1. ‘3
`.
`t\- O");
`
`\c’i "c‘i
`
`
`
`The papers appearing in this book comprise the proceedings of the meeting mentioned on
`the cover and title page. They reflect the authors' opinions and are published as presented
`and without change, in the interests of timely dissemination Their inclusion in this
`publication does not necessarily constitute endorsement by the editors or by SPIE.
`
`Please use the following format to cite material from this book:
`Authorls), ”Title of Paper," Advanced Algorithms and Architectures for Signal
`Processing IV, Franklin T. Luk, Editor, Proc. SPIE 1152, page numbers (1989).
`
`Library of Congress Catalog Card No. 89-43265
`ISBN 0-8194-0188-9
`
`Copyright © 1989, The Society of Photo-Optical Instrumentation Engineers.
`
`Copying of material in this book for sale or for internal or personal use beyond the fair use provisions granted by
`the US. Copyright Law is subject to payment of copying fees. The Transactional Reporting Service base fee for
`this volume is $2.00 per article and should be paid directly to Copyright Clearance Center, 27 Congress Street,
`Salem, MA 01970. For those organizations that have been granted a photocopy license by CCC, a separate
`system of payment has been arranged. The fee code for users of the Transactional Reporting Service is
`0-8194-0188-9/89/S200.
`
`Individual readers of this book and nonprofit libraries acting for them are permitted to make fair use of the
`material in it, such as to copy an article for teaching or research, without payment of a fee. Republication or
`systematic or multiple reproduction of any material in this book (including abstracts) is prohibited except with
`the permission of SPIE and one of the authors.
`~
`
`Permission is granted to quote excerpts from articles in this book in other scientific or technical works with
`acknowledgment of the source, including the author's name, the title of the book, SPIE volume number, page
`numberls), and year. Reproduction of figures and tables is likewise per

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket