throbber
1016
`
`IEEE TRANSACTIONS ON COMPUTERS, VOL. 41, NO. 8, AUGUST 1992
`
`Constant-Factor Redundant CORDIC
`for Angle Calculation and Rotation
`
`Jeong-A Lee, Member, IEEE, and Tomas Lang
`
`Abstraci— We develop a Constant-Factor Redundant-CORDIC
`(CFR-CORDIC) scheme, where the scale factor is forced to be
`constant while computing an angle for plane rotations. The di-
`rection of rotation is determined from an estimate of the sign and
`convergence is assured by suitably placed correcting iterations.
`Moreover, the number of iterations in the CORDIC rotation
`unit is reduced by about 25% by expressing the direction of the
`rotation in radix-2 and radix-4, and conversion to conventional
`representation is done on-the-fly. We estimate the performance of
`CFR-CORDIC and compare it with previously proposed schemes
`and show thatit provides a similar execution time as redundant
`CORDICwith a variable scaling factor, with a significant saving
`in area.
`
`Index Terms— Angle calculation and rotation, constant scale
`factor, CORDIC, digital signal processing, matrix computations,
`matrix triangularization, redundant arithmetic.
`
`I].
`
`INTRODUCTION
`
`ODERN digital signal processing requires the solution
`of systems of linear equations and the computation of
`eigenvalues, eigenvectors, singular values, and singular vectors
`(1]. Basic algorithms for these computations are matrix trian-
`gularization and singular value decomposition [2]. Since these
`applications are computationally intensive and require real-
`time response, parallel algorithms and pipelined and parallel
`architectures have been proposed to achieve high throughput
`(3]-[7]. In particular, linear, triangular, and mesh-connected
`arrays are very suitable because these matrix algorithms can
`be effectively mapped onto these arrays [8], [3], [5]. The key
`operations of these parallel algorithms are the computation of
`2 x 2 rotation matrices (rotation angles) and their application
`to appropriate submatrices. The rotation angle is computed
`in boundary or diagonal processors and broadcast
`to other
`processors wherethe rotations are performed. One calculation
`approach is to compute the sine and cosine of the angle
`(by means of a sequence of operations involving squaring,
`addition, multiplication, square root, and division) and then
`
`Manuscript received November 1, 1991; revised February 18, 1992. This
`work was done while the authors were at the Computer Science Department,
`U.C.L.A., and was supported in part by the NSF Grant MIP-8813340
`Composite Operations Using On-Line Arithmetic for Application-Specific Par-
`allel Architectures: Algorithms, Design, and Experimental Studies. Based on
`“Matrix Triangularization by Fixed-Point Redundant CORDIC with a Constant
`Scale Factor” by J. Lee and T. Lang which appeared in Proceedings of SPIE
`Conference on Advanced Signal Processing Algorithms, Architectures, and
`Implementation, vol. 1348, pp. 430-447, San Diego, CA, July 1990.
`J. Lee is with the Department of Electrical Engineering, University of
`Houston, Houston, TX 77204-4793,
`T. Lang is with the Computer Architecture Department, Universitat Politec-
`nica de Catalunya, Barcelona, Spain,
`[EEE Log Numbver 9201909,
`
`to perform the rotations by multiplications and additions. The
`main drawback of this approach is that the angle calculation
`requires a long sequence of operations resulting in a large
`number of modules and a long execution time.
`Another approach is to use the CORDIC algorithm [9],
`[10], which is utilized directly both to calculate the angle
`and to perform the rotations.
`[1 is characterized by its simple
`and regular architecture, which mainly consists of shifters and
`adders and is, therefore, well suited for VLSI implementation
`[11]-[13].
`The CORDICalgorithm is relatively slow because each
`iteration requires an addition. A solution is to use a redundant
`number representation (for example, carry-save or signed-
`digit) to achieve carry-free addition. This can be applied di-
`rectly to the rotation mode; however,for the angle-calculation
`mode it
`is necessary to detect
`the sign of the redundant
`representation, which is slow. In a sequential implementation
`of CORDIC,this sign detection can be done in parallel with
`the variable shifting. On the other hand,
`in an unfolded
`implementation the variable shifters are replaced by wired
`connections, resulting in lower delay (and a higher throughput
`if the implementation is pipelined). In such a case, the sign
`detection is in the critical path and degrades the performance.
`In [14] a scheme was developed, similar to what has been
`standard for other recurrences such as division and square
`root, in which an estimate of the sign is used. This approach
`produces a significant speedup for matrix computations but
`complicates the scaling required for rotation.
`In this paper we develop a Constant-Factor Redundant-
`CORDIC (CFR-CORDIC) scheme for the angle calculation
`and rotation application. The scale factor
`is forced to be
`constant while computing the angles, so that the scaling for
`the associated rotations is performed as in the nonredundant
`CORDIC case. The approach, which is based on sign esti-
`mation and additional iterations to assure convergence,
`is an
`extension of that proposed in [15] for the calculation of sine
`and casine. The angle calculation and rotation application is
`more complicated because the sign estimation is affected by
`the interdependence of the two CORDIC recurrence equations
`and because of the scaling required in the rotation.
`We also propose a scheme to reduce by about 25% the
`number of iterations in the CORDIC rotation unit. This is
`
`achieved by expressing the direction of rotation in radix-2 and
`radix-4 (this can be used in both nonredundant and redundant
`CORDIC). Moreover, the transformation of the rotated output
`into nonredundant representation is done on-the-fly during
`the iterations, using an cxtension of the approach presented
`9018-9340/92$03.00 © 1992 IEEE
`
`Page 1 of 10
`
`SAMSUNG EXHIBIT 1044
`
`Page 1 of 10
`
`SAMSUNG EXHIBIT 1044
`
`

`

`LEE AND LANG: CONSTANT-FACTOR REDUNDANT CORDIC
`
`in [16]. The method here is more complex because of the
`iteration structure of the rotation. Finally, we discuss tradeoffs
`in the implementation, evaluate an estimate of performance
`and compare with previously proposed schemes. For matrix
`triangularization, we estimate that the execution time is similar
`to that of redundant CORDIC with variable scale factor, at
`significant savings in area.
`
`Il. REVIEW OF CORDIC SCHEMES
`
`117
`
`vector
`(a,b)
`rotated vector
`CORDIC
`
`
`Rotation unit
`
`©, dr)
`(X and ¥ rec.)
`
`Fig. 1. General architecture for matrix computations with CORDIC.
`
`the following CORDICscaling operation is
`(X,.|0]. ¥,[0]),
`necessary, where (x®y") is the rotated vector.
`
`(nonredundant) and
`We briefly review the conventional
`redundant CORDIC schemes [10], [14] to compute an angle
`x -t Xn} where
`and perform rotations.
`
`y® K|Y-(n]
`n-1 a
`A. Nonredundant CORDIC
`K = [J t/cosoi2™) = [] fi +022. ©)
`=f
`i=0)
`
`=
`@
`angle
`Angle calculation(vectoring mode): The
`tan~!(Y¥, [0]/Xq{0]) in n-bit precision is obtained from Z,.[n}
`using the following recurrence equations with Z,(0] = 0.
`Xali + 1) = X,[i] + 027° Yai)
`Y, [i + 1] = Y, {i} — 427° X,[i]
`Zq'i +1) = Z,[i] + 0; tan71(27*)
`
`()
`
`is assumed positive and the direction of the
`where X,[0]
`rotation is obtained as
`
`oad
`| =1
`
`if Y,{i] > 0
`(2)
`if Y,{i] <0.
`Rotation(rotating-mode): To perform a plane rotation on
`the input vector (X,.(0], ¥,[0}) by an angle (@ = Z,{0]), the
`following recurrence equations are used:
`X,fi+ UY = X,[i] + oi27°Y, [i]
`¥,[¢ + 1] = ¥,{¢] — 27° X, {9
`Z,{i + 1) = Z,[2] + o; tan71(27*)
`
`(3)
`
`where
`
`safc
`i +1
`
`if Z,[i) > 0
`(4)
`if Z,[i] <0.
`The recurrence equations are the same for both angle and
`rotation modes.
`Rotation with angle in decomposed form: Note that for an
`application in which an angle is calculated by the vectoring
`mode andthenthis angle is used for rotation, it is not necessary
`to implement
`the Z recurrences in any of the two modes,
`since the angle can be used in its o-decomposed form. This
`is the type of application we have in mind, so that we do not
`include the Z recurrences in the sequel and the o for rotationis
`obtained from the angle calculation. The general architecture
`of these applications is shown in Fig.1.
`In both operation modes, the iteration time corresponds to
`the delay of the variable shifter plus the delay of the adder.
`If the recurrences are unfolded, the variable shifter is replaced
`by wiring, which reduces the iteration time to the delay of
`the adder. Moreover,
`the unfolded implementation can be
`pipelined to increase the throughput.
`Scaling: Since the CORDIC rotation changes the length of
`the vector,
`to maintain the same length of the input vector
`
`Page 2 of 10
`
`Note that K is constant for nonredundant CORDIC since
`o? = 1 for all
`i.
`The scaling operation can be done by multiplication by
`1/K. However, as proposed in [17]-[19] it might be simpler
`
`to include scaling iterations of the form X,[i] — X,|[i] +
`2-4X, [i and Y,{i) — Y,[‘] + 2-4Y¥,(2] and repetitions of
`CORDICiterations to make the scaling factor equal to a power
`of two (really 1 or 2) and perform the scaling by shifting.
`For an efficient
`implementation,
`the total number of these
`additional iterations need to be minimized; solutions have been
`found that require about 25% moreiterations to force K to be
`2 or
`1
`[19],
`[20].
`
`B. Redundant CORDIC
`
`To reduce theiteration time,it is possible to use a redundant
`representation (for example carry-save or signed-digit) and
`the corresponding (carry-free) adder. In the developmentthat
`follows we consider
`the signed-digit case;
`the carry-save
`alternative is similar, We now discuss the effect that the use
`of the redundant representation has on the angle computation
`and on the rotation.
`Angle computation: To compute a from the redundant
`representation, a sign detection is required, which is slow;
`however,
`in a sequential (not unfolded) implementation this
`sign detection is not
`in the critical path since it can be
`performed in parallel with the variable shifter. On the other
`hand, for an unfolded implementation, since the shifters are
`eliminated,
`the delay of this sign detection determines the
`iteration time. In [14] a scheme was proposed that computes 6;
`based on an estimate ofY{i]. This requires the modification of
`the recurrence and selection function to determine the direction
`of rotation. By making Wi] = 2'Y,[7] we get from (1),
`
`X,fi +1) = X,[i] + e277 Whi]
`Wi +1] = {WT[i] - 6X,(i).
`
`(6)
`
`The value of &; is obtained from Wi], an estimate produced
`by truncating W’{i} to one fractional (signed) bit. It is shown in
`[14] that to assure convergence the value of a; has to belong
`
`Page 2 of 10
`
`

`

`1018
`
`IEEE TRANSACTIONS ON COMPUTERS, VOL. 41, NO. 8 AUGUST 1992
`
`to the set {—1,0,1} and
`
`if WE > 5
`1
`if Wii] =0
`a=40
`-1 if W[i) <-3
`
`(7)
`
`where X,(0] and ¥,(0] (W[0}) are fractional values, one of
`them is normalized, and X,[0] > 0. With these conditions, a
`normalized X,(1] is obtained, that is, 1/2 < Xq[1] < 2.
`Rotation: Redundant adders can be used for the iterations.
`No ¢ is calculated since it is obtained from the angle calcu-
`lation.
`
`Scaling: The same scaling operation for rotation as in
`nonredundant CORDIC has to be performed. However,
`in
`this redundant CORDIC approach,
`the scale factor K is
`not a constant as G;
`{—1,0,1}. Therefore, K must be
`computed for each G-decomposed angle. This calculation can
`be performed in two steps [14]: 1) K? computation using
`Plj +1) = Plj]+|é;|2~-~ Ply] with initial condition P[0| = 1
`and 2) K = VP. The K computation can be overlapped with
`the angle computation, so it does not increase the computation
`time. However, a division operation is needed for the final
`scaling operation.
`It was estimated that this scheme produces a speedup of 4.5
`for matrix triangularization and of 4 for SVD, when compared
`with the nonredundant scheme [14].
`
`Ili. CONSTANT-FACTOR REDUNDANT CORDIC
`
`To reduce the implementation cost of redundant CORDIC,
`especially for the scale factor calculation and the scaling
`operation,
`there is a need to develop a redundant CORDIC
`scheme with a constant scale factor. This is achieved if & is
`restricted to the set {—1,1}. However, since to eliminate the
`delay of exact sign detection, & is obtained from an estimate
`of Y,{i], this does not assure convergence. We now discuss
`how to modify the scheme to assure convergence, first in the
`calculation of the sine and cosine functions (a review of a
`schemeproposedin [15]) and then in angle calculation/rotation
`mode (a new scheme).
`
`B. Angle Calculation/Rotation
`In the case of angle calculation, the problem of having a
`constantscale factor is more complicated becausethe direction
`of the rotation is obtained from an estimate of Y,,[#] and it is
`necessary to deal with the inter-dependency of the recurrences
`of X and Y. To use the estimate effectively, we utilize the
`modified recurrences of expressions (6). However, to achieve a
`constantscaling factor, instead of the G-selection of expression
`(7), we use
`
`
`
`. {!=
`
`-l
`
`if W[i] > 0
`if W[i] <0
`
`®)
`
`where Wii] is an estimate obtained by truncating the signed-
`digit representation of W [i] to the th fractionalbit. As before,
`this requires X,(1] > 1/2, which is achieved in the same
`manner.
`
`The use of this a selection does not assure convergence
`because Wi] can attain values outside the bound required
`for convergence. We now compute the convergence bound,
`determine the amount by which W’[i] can surpass this bound
`because of the 6-selection used, and show how correcting
`iterations can restore the bound.
`Theorem 3.1 (Bound): When the direction of the angle o;
`is obtained using the exact sign, that is,
`
`a 1
`i=)
`_y
`
`if W[i] >0
`if Wii] <0
`
`W [i] is bounded as follows [14]:
`
`|W [i] < 2X [i].
`
`Proof: Since o;_, is determined by the sign of W[i— 1],
`the largest magnitude of W[:] is obtained when W[i — 1] = 0
`(o;-1 = 1) in
`
`W(t] = 2(Wit — 1] - o31X[i - 1).
`
`Thus,
`
`A. Sine and Cosine Calculations
`
`|W [i] < 2X [i - 1].
`
`In [15] a correcting iteration scheme is used to have a
`constant scale factor when redundant CORDIC computes
`the cosine and sine functions. For these computations,
`the
`recurrences (3) are used and a; is obtained as in (4), except
`that an estimate of 27] is utilized. Because of this estimate,
`to assure convergenceit is necessary to repeat someiterations,
`called correcting iterations. These repetitions are done at fixed
`intervals, where the frequency of repetition depends on the
`precision of the estimate.
`This approach could be directly extended to perform ro-
`tations by decomposing the given angle. However,
`it
`is not
`directly applicable when the angle is computed as tan !(a/b)
`and this angle is used in decomposed form for the rotations.
`We develop the required modifications now.
`
`As shownin [21], X[é] < X{i+ 1] for all 7. Thus,
`
`|W fi]| < 2X fi -— 1] < 2X (i.
`
`
`
`O
`
`Fig. 2 shows the bound of Theorem 3.1. This bound cannot be
`maintained with the selection function of the CFR-CORDIC
`scheme [expression (8)], as shown by the following lemma.
`Lewmma 3.1: Assume that W[p] satisfies the bound of
`Theorem 3.1,
`that
`is |W[p]| < 2X[p]. If Gp is determined
`from W [p] where W[p] is computed using ¢ fractional bits of
`W |p], then W[p+ 1] satisfies the new bound
`
`-2X [p+ 1) -2*27(1+277) < W[pt1] < 2X[p+ ll.
`
`Page 3 of 10
`
`Page 3 of 10
`
`€
`

`

`LEE AND LANG: CONSTANT-FACTOR REDUNDANTCORDIC
`
`1019
`
`Wii)
`
`Wii-1]
`
`-2X[i) -2X[i-1)
`
`0
`
`2X[i-1]
`
`2X[i]
`
`Fig. 2. Convergence bound.
`
`Proof: Three regions of Wp] need to be considered.
`Case a): W[p] > 0 (Gp = 1).
`In signed-digit number system, if W[p] > 0 then W'{p] > 0,
`since W{p] is computed using t fractional bits of W[p]. Thus,
`Gp = Op = 1. From Theorem 3.1,
`|W [p + 1]| < 2X[p + 1).
`Case b): W[p] = 0 (Gp = 1).
`If W[p] = 0 then —2-' < W[p] < 27! We divide this region
`into two. In the first, when —27' < W[p] < 0, ap = —1 (not
`equal to 4). From the recurrence,
`
`Wip + 1] = 2(W[p] — X[p))
`Xp +1) = X[p] + 277? WiIp].
`
`Consequently,
`
`Wl[p+1]
`
`2W[p] — 2(X [p + 1] — 2-7PW[p])
`= —2X [p+ 1] + 2W[p](1 + 277?)
`
`and for the most negative value of W[p|
`
`W[p +1] > -2X [p+ 1] —2+27*(1+ 2777).
`
`On the other hand, when 0 < W{p] < 27! then o, = 6, = 1.
`Thus, from Theorem 3.1,
`|W[p + 1]| < 2X[p + 1].
`Case c): W[p] < 0
`Similar to case a), 6, = o, = —1. Thus, from the Theorem
`3.1,
`|W[p + 1]| < 2X[p + 1].
`0
`From a), b), and c), the proof is done.
`is out of the bound
`Fig. 3 shows the case when Wp + 1]
`due to the use of an estimate, W[p]. Once W[#]
`is outside
`of the bound required for convergence, the amount outside of
`the bound accumulates in the following iterations, as shown
`in the following lemma.
`Lemma 3.2:
`If —2X[p+1]—2+#27'(14+277?) < W[p+1l] <
`~2X [p + 1],
`then the bound of iteration W[p + k] where
`k = {2,3,---} becomes
`
`—2X [p+ k] — 2*-#(1 4277?) < Wi[p + k] < -2X[p +k).
`
`Proof by induction:
`1) Basis condition for k = 2,
`Since —2X[p + 1] — 2% 27'(1+4+2°-) < Wp+l<
`—2X[p + 1], we obtain W{p + 1] < 0. Thus, 6,4; = -1.
`From the lower bound and the recurrence, we have
`
`W[p + 2] > 2(-2X[p + 1] -—2*27'(14+277") + X[p + 1])
`
`W[p + 2] > 2(—X[p + 2] - 2-7"*) Wp + 1]
`—2*27-*(14+277?))
`W[p + 2] > -2X [p + 2] — 2?-*(1 +277”).
`
`Page 4 of 10
`
`2Wip])
`
`27P+! wip}
`
` at
`
`-2X[p]
`
`-2X[p+1] 2 0
`
`2X[p+1]
`
`2X[p]
`
`Fig. 3. Out of the bound due to the use of an estimate.
`
`From the upper bound and the recurrence we have
`Wp + 2] < 2(-2X[p + 1] + X[p + 1]) = 2(-Alp + 1)
`W[p +2] < 2(—X[p + 2] - 2-2°* Ow p + 1)
`< -2X [p+ 2).
`
`Thus,
`
`—2X [p+ 2) -2?-*(1 +277?) < W[pt 2] < —2X[p + 2].
`
`2) Induction step. Assume that it is true when & = 2 and
`prove that it is true when k = 7+ 1.
`By assumption,
`—2X [p+ i] — 2°-*(1 4.2777) < W[p+i] < -2X[p+i.
`
`Therefore, W[p +i] < 0 and 6,4; = —1. From lower bound
`of assumption and from recurrence,
`W[p +it1] > 2(-2X[p + i] — 2'-'(1 4277”) + X[p + 4)
`W([p+it i) > 2(-X[p+it1]-27°@OW[p+ ij
`— 2'-*(1 4.27)
`W[p+itl1) > —2X[p+it 1j—2't-(1 4277).
`
`From upper bound of assumption and from recurrence,
`W[p+it1] <2(-2X[p+i] + X[p+ i]) = 2(-X |p + @))
`W[p+it1] <2(-X[p+it 1] — 2-779 Wip + i)
`< -2X[p+i+l1].
`
`Thus,
`
`~2X[p+i+ 1j)—2°t'-*(1 42-7?)
`<W[p+itl) <-2X[pt+it+l).
`
`oO
`Therefore, from 1) and 2), the proof is done.
`Fig. 4 shows the bound of W[p + 2]
`to explain how the
`amount outside of the bound accumulates once W|[p+ 1]
`is
`out of the bound.
`To recoverthe original bound of |W[i+ 1]| < 2X[¢+ l], a
`correcting iteration needs to be performed. The sameangle of
`the previous iteration is used for the correcting iteration,i.e.,
`2—instead of 2-“+) is used with Y [i + 1] as shown below.
`Xofi+ 1] = Xfi 4+ 1] 4682 °Y E+ 1]
`YCfi+ 1) =Y¥fi+1) —6027' Xfi + 1].
`
`The corresponding shifted recurrence becomes
`X°[i+ 1) = X[i4+ 1) 4 6027-7 Whi + YJ
`Wi 4+ 1] = Wii 4 1] — 26Xfi + YY.
`
`
`
`Page 4 of 10
`
`

`

`
`
`é = { 1
`
`if Whi] >0
`
`-1 if Wii] <0.
`
`(9)
`
`Page 5 of 10
`
`1020
`
`IEEE ‘TRANSACTIONS ON COMPUTERS, VOL. 41, NO. 8, AUGUST 1992
`
`
`
`-2X{p+2]
`
`-2X[p+1]
`
`0
`
`2X[p+1]
`
` 2X[p+2]
`
`Fig. 4. Accumulating the amount outside of the bound.
`
`The direction of the angle ¢© is determined from W{i + 1}
`instead of Wi]. However, notice that
`the same selection
`function in (8) is used for this iteration. Fig. 5 shows the case
`where a correcting iteration recovers the bound of Theorem
`3.1. The next theorem determines the frequency of correcting
`iterations required.
`|W[p]} < 2X[p| and the di-
`Theorem 3.2: Assume that
`rection of the angle, 6,+%,
`is determined from the estimate
`W[p +4]forall k. If ¢ fractional bits are used in the estimate,
`then a maximum of ¢ — 1
`iterations can be done before a
`correcting iteration to recover the convergence bound.
`Proof: Let us assumethat & iterations are performed after
`the pth iteration. From Lemma3.2, the bound of W[p + k]is
`expanded as follows due to the use of estimate
`—2X(p+ kj —2*-"(1 4.277?) < W[p tk] < 2X[p+ &].
`If |Wip + &]| < 2X[p + &],
`then it satisfies the bound of
`Theorem 3.1 and a correcting iteration is not needed to recover
`the bound.
`Since we are interested in finding out how many iterations
`can be performed before recovering the bound bya correcting
`iteration, we need to consider the case when W[p + kj is in
`the following range:
`~2NX[p + kj] — 2*-*(1 4.2777) < Wp +k] < —2X[p+ kj.
`In this case, 7,4, = ~1 since W[p+ k] < 0. By applying the
`correcting iteration, W[p + k] becomes
`W°[p+ k] = W[p + k] + 2X[p +k] > -2*-"1 427-7),
`
`To recover the bound of Theorem 3.1, we need to find k
`satisfying the following inequality,
`| — 2*-*(1 4.27?P)| < 2X [p + kj.
`
`Therefore,
`
`okt
`
`2X[p + k]
`(1+ 2-2P).
`
`c
`
`Wii+1]
`
`Witt]
`
`2Xfi+l}
`-2X[i+1]
`0
`
`Fig. 5. Recovering the convergence bound by a correcting iteration.
`
`0
`Thus, the largest integer & becomes ¢ — 1.
`As indicated by Theorem 3.2, a correcting iteration puts W”
`inside the convergence bound,if it was outside of the bound
`before this iteration. However, as shown in the next lemma,
`this correcting iteration might take Wout of the bound ifit
`was inside the bound before.
`Lemma 3.3:
`If |W[i+ 1)| < 2X [i + 1), the boundafter a
`correcting iteration
`WCli+ 1] = Wii + 1] - 26F X[i + 1]
`
`becomes
`
`~2-' 2X [i+ 1] < Woe + 1) < 2X [i+ 2.
`
`Proof: The proof is very similar to that of Lemma 3.1
`and is omitted.
`O
`From Lemma3.2, Theorem 3.2, and Lemma 3.3, we obtain
`the following corollary.
`Corollary 3.1: If ¢ fractional bits are used for the estimate,
`then the interval between correcting iterations should be less
`
`than or equalto (t — 2).
`O
`The fact that a correcting iteration does not have to be
`included exactly every (t—2) iterations provides flexibility and
`helps to minimize the numberof additional iterations required
`to avoid an explicit scaling operation. Also, note that
`it
`is
`possible to repeat a correcting iteration more than once ifit
`helps to minimize the total numberof iterations for a certain
`K. This additional correcting iterations should not be counted
`as one of the iterations of CORDIC when determining the
`index of the next correcting iteration.
`Since the frequency of correcting iterations is determined by
`t, to reduce their numberit is necessary to increase f. However,
`this makes the @ selection slower. Consequently, the value of
`t has to be chosen to reduce the overall execution time.
`
`C. Reducing the Number of Correcting Iterations
`From the previous discussion we can determine the mini-
`mum number ofcorrecting iterations required. We now show
`that it is sufficient to have only one correcting iteration in the
`second half, if for that part the a-selection of expression (7)
`is used. That is, we split the iterations of CORDIC (for i = 0
`to i = n — 1) into two groups as follows:
`1) Selection function for 0 < i < n/2
`
`The value of & is determined by considering the smallest
`possible value of the right-hand side. For this, we consider the
`smallest value of 2X [p+ k] and the largest of (1+ 27"), ie.,
`we should consider the smallest value of p. Since we assume
`that the input W’{0] is in conventional numberrepresentation,
`the direction of the angle Gp is correct, regardless of the use
`of an estimate. Thus, the smallest p to consider is 1. Now we
`want to determine the smallest value of X[p+ &l. In [22] it is
`As shownearlier, correcting iterations must be included for
`shownthat the smallest value of X[p+ k] is 3/8. Therefore,
`convergence for
`this group and the number of fractional
`log Gem log?
`
`bits used for W(] determines the frequency of correcting
`kat ORE)5OBS
`iterations.
`log 2
`log 2
`
`Page 5 of 10
`
`

`

`LEE AND LANG: CONSTANT-FACTOR REDUNDANT CORDIC
`
`1021
`
`2) Selection function for n[2 <i <n
`12
`0
`
`(10)
`
`Utilizing the fact that the value of AK in n-bit precision is not
`affected by a; for n/2 < i < (nm — 1), the selection function
`of redundant CORDIC (with a variable scale factor) shown
`in (7) is used for the second group. This assures convergence
`without correcting iterations; however, we need to include one
`additional correcting iteration (independentof n) since the last
`correcting iteration of the first group might fall into the case
`of Lemma 3.3.
`
`D, Making Scale Factor a Power of Two
`Sincethe scale factor is a constant,it is possible to utilize the
`same methods for scaling used for the conventional CORDIC
`algorithm. As mentioned before, the most effective method is
`to use additional iterations (CORDIC and scaling) to make
`the scale factor
`a power of two,
`since in that case the
`scaling can be done by shifting, which is implemented by
`wiring. As in the conventional case, the issue is to minimize
`the number of additional iterations; however, the problem is
`somewhat more complex because of the correcting iterations.
`As indicated in the previous section, there is some freedom in
`the placement of these iterations; therefore, the problem is to
`obtain the minimum numberof iterations with the correcting
`iterations suitably placed. We developed an efficient method
`of exhaustive search, called decomposed search, to reduce the
`searching time from T to VT [22]. For the case in which
`six fractional bits are used in the estimate,
`the number of
`additional iterations for 16-bit and 32-bit precision is shown
`in Table I. The corresponding scale factor is equal to two.
`The additional iterations require an increase in the number
`of guardbits since this number should be at least log m, where
`mis the numberof iterations. For the solutions shown in the
`
`Table I, we get five guard bits for the 16-bit case and six
`guard bits for the 32-bit one.
`
`TV. REDUCING THE NUMBER OF
`ITERATIONS AND ON-THE-FLY CONVERSION
`
`We now describe a method for reducing the numberofiter-
`ations for rotation, which can be applied both to nonredundant
`and redundant CORDIC. Moreover, we develop a scheme for
`performing the conversion of the result from redundant
`to
`2’s complement in an on-the-fly manner. This scheme was
`not necessary for the redundant CORDIC with variable scale
`factor, since the on-the-fly conversion can be done during the
`division for scale factor correction.
`
`A. Radix-2 and Radix-4 Iterations for Rotation
`
`To speed up the CORDIC rotation, we propose a scheme
`that reduces the last n/2 iterations to n/4 by the use of radix-
`4 rotations. One possibility would be to directly decompose
`the angle into radix-4 o’s; however, that would complicate
`significantly the selection function. Consequently, we continue
`
`TABLE I
`EXAMPLES OF ADDITIONAL ITERATIONS FOR t = 6
`.
`No. of
`we
`_
`Additional CORDIC
`additional
`iterations
`iterations
`6
`4, 8,9
`12
`4, 8, 9, 11, 15, 17
`
`ae
`Scaling tierations
`+2, +10, —5
`+2, +10, +17, —5, —22, -28
`
`No. of
`bits
`16-bit
`32-bit
`
`decomposing the angle into radix-2 a’s and combine these to
`form a radix-4 rotation. This can be easily done only when
`tan7!(2-*) = 27* for the required precision. Moreover,
`in
`order not to change the scaling factor, this should be done
`only for i > n/2. The following theorem showsthat this is
`satisfied for i > n/2.
`Theorem 4.1: Let mbe the total number ofiterations. Then,
`Viena(27! -— tan71(2™")) < 2-(ntles™) form < 2"/?.
`Proof: Theseries expansion for tan x where|x| < 1is
`: egt
`=@-—+ > -5t
`ane
`3 + 5
`7
`
`t
`
`Thus,
`
`ee
`e-tanta2=—- +3 a
`
`
`For « = 27%,
`;
`.
`2-* tan"! Q7*=
`an
`
`Thus,
`
`Q-3t
`3
`
`Q-5t
`—-—— tid
`BT
`
`
`g-3t
`2
`
`
`So (27% = tan *(2-)) < 2PM $2 HE He)
`i=§
`
`< 45-150 < g—(n+log m) |
`7
`
`Consequently, a sufficient condition is
`
`m< 2,
`
`Oo
`
`Therefore, when 7 > 5, a rotation (iteration) with an angle
`6; = o; tan! 2—* can be regarded asan iteration with an angle
`g,27', so that two consecutive iterations can be expressed with
`one radix-4 direction of rotation since
`
`6; + Gig, = 0:27" + 054127) = (20; + 054 ,)27°0Y,
`The set of values of (20; + 0:41) is {43, +1} if o € {41}
`and {+3,+2,+1,0} if o €
`{+1,0}. To avoid multiplica-
`tion by +3, we developed recoding schemes into the set
`{£2,+1,0} [22].
`One problem that appears with this reduction in the number
`of iteration in the rotation,
`is the imbalance between the
`number of iterations of angle calculation (all radix-2) and
`of rotation (radix-2 and radix-4). Moreover, to perform the
`radix-4 rotations it
`is necessary to have performed already
`the first of the pair of radix-2 angle iterations. Fortunately,
`the scaling iterations for making the scale factor a power of
`two have to be done only for the rotation (and not for the
`angle). Consequently, if these scaling iterations are done at the
`beginning together with the first half of the angle iterations,
`
`Page 6 of 10
`
`Page 6 of 10
`
`

`

`it is possible to match the timing. To achieve this match, a
`solution with a suitable number of scaling iterations should
`be selected.
`
`IEEE TRANSACTIONS ON COMPUTERS, VOL. 41, NO, 8, AUGUST 1992
`
`Review of the on-the-fly conversion scheme presented in
`[16]: Consider a unit (such as SRT division) which produces
`a signed-digit result (we describe the radix-2 case)
`
`P= So po",
`i=l
`
`pi € {-1,0,1}
`
`one digit per iteration beginning from the most significant.
`Then P can be converted on-the-fly into its 2’s complement
`representation
`
`m
`Q=-qt+ So a2™,
`7=1
`
`gi € {0.1}.
`
`The final result Q = Alm].
`On-the-fly conversion scheme for redundant rotation: For
`CORDIC rotation both X and Y have to be converted; here
`we describe the conversion of X only since the other one is
`similar. The previous algorithm cannot be applied directly to
`B. On-the-Fly Conversion
`the CORDIC case because the result is not obtained one bit
`the rotation has to be
`In most
`instances the result of
`per iteration. However, as shown in Fig. 6,
`the conversion
`converted to conventional representation. For the scheme pre-
`schemeutilizes the fact that in the calculation of X[¢ + 1] the
`sented in [14], since the scale factor is variable this conversion
`operand¥[2] is shifted right by 7 positions. Consequently, the
`is done as part of the scaling operation (division) using the on-
`most-significant part can only be modified by a positive or
`the fly conversion method described in [16]. Since the explicit
`by a negative carry coming fromthe least-significant part. As
`scaling operation by division is avoided in CFR-CORDIC,
`shownin the figure, Xi} can be divided into three regions:
`there is a need to develop an on-the-fly scheme to convert
`1)
`the converted region of j bits (represented by two
`while the CORDIC rotation is performed. The property of
`conditional forms JZ] and AM[i] = Afi] — 27)
`after CORDIC which is the generation of approximately one
`2) the window #; corresponding to bits 7 + 1 to 7.
`bit of the output per iteration starting from most significant
`3)
`the region beginning from bit
`(¢ + 1), where the addi-
`bit, makes it possible to use a similar conversion scheme.
`tion/subtraction is performed. Note that the first 7 bits
`However,the fact that the output is not really produced in an
`of ¥[2] in this region are already converted.
`on-line manner(one bit periteration) [23] makes the scheme
`Since we want to select only between two conditional forms,
`more complex. Wefirst briefly review the conversion scheme
`we restrict the carryout of the window to be either 0 or minus
`presented in [16] and then present the adaptation for CORDIC
`rotation.
`1. The window A;
`is used to absorb the carry produced by
`the addition/subtraction so that only a minus I carry can cross
`into the first region.
`We now need to determine the numberofbits of the window
`F;. This dependson the set of values of 7. We have developed
`the three possible cases that appear in the rotation scheme
`described before, namely, 1) in radix-2, 2) in radix-4, and 3)
`repetition of two radix-2 stages [22]. Here we only discuss
`the radix-2 case.
`for the
`the Ath bit of X(t], with & = 1
`We call X;[7]
`most significant bit (similarly for Y[i]).
`In Fig. 6, dj2-@+D
`denotes the summation of the three operands, X;,1[7], 4,Y; [i],
`and a transfer digit from the next lower weight. Since 6;
`can be —1, 4;Y,[i] is a signed-bit. As a result, d; belongs
`to the set {-—3, —2,-1,0,1, 2,3} because the three operands
`are in {—1,0,1}. In one cycle of the conversion the following
`equality has to hold (see the figure):
`
`2* Ry +d; = (2* carryout + cb;) «2°84 Ri,
`
`(11)
`
`The algorithm produces a converted digit q; as it obtains a
`signed-digit p, from most significant to least significant bit.
`The simple algorithm Q[k] = Q[k -1)+px2-* produces Q =
`Q{m);, however, when p, = —1, this requires the propagation
`ofa carry andis therefore slow. To avoid this propagation, the
`on-the-fly conversion algorithm keeps two conditional forms:
`A{k — 1] expecting p, = 1 or py = 0, and B[k — 1] expecting
`pe = —1, where Blk - 1) = Afk —1) — 2-0-9),
`The initial values of A[1] and B{1] are
`
`where carryout is either O or —1, which determines the choice
`of the conditional form A or AM, cb;
`{0,1} is the newly
`converted bit, and ndelay is the width of the window. The
`analysis shows that we need to delay at least three signed bits,
`resulting in the decision logic summarized in Table Il. On the
`
`other hand, as shown

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket