`
`lt-Ll-ZE TRANSACTIONS ON COMPUTERS. VOL. 41, NO. 8. AUGUST 1992
`
`Constant-Factor Redundant CORDIC
`
`for Angle Calculation and Rotation
`
`Jeong—A Lee, Member. IEEE, and Tomas Lang
`
`Abstract-'— We develop a Constant-Factor Redundant-CORDIC
`(CFR-CORDICJ scheme, where the scale factor is forced to be
`constant while computing an angle for plane rotations. The di-
`rection of rotation is determined from an estimate of the Sign and
`convergence is assured by suitably placed correcting iterations.
`Moreover, the number of iterations in the CORDIC rotation
`unit is reduced by about 25 % by expressing the direction of the
`rotation in radix-2 and radix-4, and conversion to conventional
`representation is done on-the-tly. We estimate the performance of
`CPR-CORDIC and compare it with previously proposed schemes
`and show that it provides a similar execution time as redundant
`CORDIC with a variable scaling factor, with a significant saving
`in area.
`
`Index Terms—Angle calculation and rotation, constant scale
`factor, CORDIC, digital signal processing, matrix computations,
`matrix ti'iangularization, redundant arithmetic.
`
`I.
`
`INTRODUCTION
`
`ODERN digital signal processing requires the solution
`of systems of linear equations and the computation of
`eigenvalues, cigenvectors, singular values, and singular vectors
`[1]. Basic algorithms for these computations are matrix trian—
`gularization and singular value decomposition [2]. Since these
`applications are computationally intensive and require real-
`time response, parallel algorithms and pipelined and parallel
`architectures have been proposed to achieve high throughput
`[3]—[7]. In particular, linear, triangular, and mesh-connected
`arrays are very suitable because these matrix algorithms can
`be effectively mapped onto these arrays [8], [3], [5]. The key
`operations of these parallel algorithms are the computation of
`2 x 2 rotation matrices (rotation angles) and their application
`to appropriate submatrices. The rotation angle is computed
`in boundary or diagonal processors and broadcast
`to other
`processors where the rotations are performed. One calculation
`approach is to compute the sine and cosine of the angle
`{by means of a sequence of operations involving squaring,
`addition, multiplication. square root, and division) and then
`
`Manuscript received November I. l99l: revised February 18. 1992.. This
`work was done while the authors were at the Computer Science Department.
`U.C.L.A., and was supported in part by the NSF Grant MIP—RSBMO
`Composite Operations Using Orr-Line Arithmetic for Application-.S'pecrfic Par,
`all?! Architectures: .4. fgort‘t‘hms, Design. and Experiments? Studios. Based on
`“Matrix Triangulariration by Fixed—Point Redundant CORDIC with a Constant
`Scale Factor" by J. Lee and T. Lang which appeared in Proceedings of SPIE
`Conference on Advanced Signal Processing Algorithms, Architectures, and
`Implementation. vol. 1348, pp. 430—447. San Diego, CA. July 1990.
`J. Lee is wilh the Department of Electrical Engineering. University of
`Houston, Houston. TX TT204—4i93.
`T. Lang is with the Computer Architecture Department, Universital Poll-lees
`nica dc Cataiunya. Barcelona. Spain.
`IEEE Log Numbver 9201909.
`
`to perform the rotations by multiplications and additions. The
`main drawback of this approach is that the angle calculation
`requires a long sequence of operations resulting in a large
`number of modules and a long execution time.
`Another approach is to use the CORDIC algorithm [9],
`[10], which is utilized directly both to calculate the angle
`and to perform the rotations. It is characterized by its simple
`and regular architecture, which mainly consists of shifters and
`adders and is, therefore, well suited for VLSI implementation
`[ll]—[13].
`The CORDIC algorithm is relatively slow because each
`iteration requires an addition. A solution is to use a redundant
`number representation (for example, carry-save or signed—
`digit) to achieve carry-free addition. This can be applied di-
`rectly to the rotation mode; however, for the angle-calculation
`mode it
`is necessary to detect
`the sign of the redundant
`representation, which is slow. In a sequential implementation
`of CORDIC, this sign detection can be done in parallel with
`the variable shifting. On the other hand,
`in an unfolded
`implementation the variable shifters are replaced by wired
`connections, resulting in lower delay (and a higher throughput
`if the implementation is pipelincd). In such a case, the sign
`detection is in the critical path and degrades the performance.
`In [14] a scheme was developed, Similar to what has been
`standard [or other recurrences such as division and square
`root, in which an estimate of the sign is used. This approach
`produces a significant speedup for matrix computations but
`complicates the scaling required for rotation.
`In this paper we develop a Constant-Factor Redundant-
`CORDIC (CFR—CORDIC) scheme for the angle calculation
`and rotation application. The scale factor
`is forced to be
`constant while computing the angles, so that the scaling for
`the associated rotations is performed as in the nonredundant
`CORDIC case. The approach, which is based on sign esti-
`mation and additional iterations to assure convergence,
`is an
`extension of that proposed in [15] for the calculation of sine
`and cosine. The angle calculation and rotation application is
`more complicated because the sign estimation is affected by
`the interdependence of the two CORDIC recurrence equations
`and because of the scaling required in the rotation.
`We also propose a scheme to reduce by about 25% the
`number of iterations in the CORDIC rotation unit. This is
`
`achieved by expressing the direction of rotation in radix—2 and
`radix-4 {this can be used in both nonredundant and redundant
`CORDIC]. Moreover, the transformation of the rotated output
`into nonredundant representation is done on-the-fly during
`the iterations, using an extension of the approach presented
`OBIS—93409230100 © 1992 IEEE
`
`Page 1 of 10
`
`SAMSUNG EXHIBIT 1044
`
`Page 1 of 10
`
`SAMSUNG EXHIBIT 1044
`
`
`
`LEE AND LRN'G: (“ONS'l'K‘tNT-FACFOR REDUNDANT CORDIC‘
`
`ll|l7I
`
`in [16]. The method here is more complex because of the
`iteration structure of the rotation. Finally, we discuss tradeoffs
`in the implementation, evaluate an estimate of performance
`and compare with previously proposed schemes. For matrix
`triangularization, we estimate that the execution time is similar
`to that of redundant CORDIC with variable scale factor, at
`
`significant savings in area.
`
`1]. REVIEW OF CORDIC SCHEMES
`
`(nonredundant) and
`We briefly review the Conventional
`redundant CORDIC schemes [10]. [14] to compute an angle
`and perform rotations.
`
`A. Nonrert‘undam CORDIC
`
`:
`0
`angle
`Angie calculationfvectorirtg mode): The
`ton—1 [YulllMXaEUD in nebil precision is obtained from Z,,[rr]
`using the following recurrence equations with Z,,[()] = 0.
`
`X, [i + l] : Xu[z'.1+ o;2_iYu[-i]
`up + l] = r; [it _ era-trap]
`xii-+1] = 2n] + Mario“)
`
`(1)
`
`is assumed positive and the direction of the
`where Xaitl]
`rotation is obtained as
`
`+1
`Of
`’_ —l
`
`if 1-1. [i] 3 o
`if Yup] < o.
`
`[2)
`
`Rorarion(rorrtring-mode): To perform a plane rotation on
`the input vector [X,.[O].l"..[0]) by an angle (6 : 240]} the
`following recurrence equations are used:
`
`mwedvccwr
`
`Rotation uni:
`(cr .d.)
`
`
`( X and Y rec.)
`
`
`Fig. 1. General archilccture for matrix computations with CORDIC.
`
`the following CORDIC scaling operation is
`(X,.101.Y,.[l}]).
`necessary, where (.e“.y”} is the rotated vector.
`
`IR = J— X" [n] where
`3) R
`K Yr in]
`n— l
`n - l
`K : fl mimics—t") -: H Jig—Ema:
`i=0
`5:0
`
`
`
`(5)
`
`Note that K is constant for nonredundant CORDIC since
`
`i.
`a"? = l for all
`The scaling operation can be done by multiplication by
`UK. However, as proposed in [l7]i[l9] it might be simpler
`
`to include scaling iterations of the form X..[i] a. X.,.['i] ::
`2‘-lX,.lil and leil «— rapt i rims and repetitions of
`CORDIC iterations to make the scaling factor equal to a power
`of two (really 1 or 2) and perform the scaling by shifting.
`For an efficient
`implementation.
`the total number of these
`additional iterations need to be minimized: solutions have been
`
`found that require about 25% more iterations to force K to be
`2 or
`i
`[19],
`[20].
`
`where
`
`Xrfi + 15 _ Xr[i] + ar2“"l’r[t]
`r1. [-2 + 1] : r3. [r1] — 042’*.X',[r3]
`2,.[r + 1] = arr-r] + Jinan—1&4)
`
`._ at
`
`a“ F { +1
`
`it are] 3 U
`
`if an] < 0.
`
`B. Redundant CORDIC
`
`To reduce the iteration time, it is possible to use a redundant
`representation {for example carry—save or signed-digit) and
`the corresponding (carry-free) adder. In the development that
`follows we consider
`the signed—digit case;
`the carry—save
`alternative is similar. We now discuss the effect that the use
`
`of the redundant representation has on the angle computation
`and on the rotation.
`
`(3)
`
`(4)
`
`The recurrence equations are the same for both angle and
`rotation modes.
`
`Rotation with angle or decomposed form: Note that for an
`application in which an angle is calculated by the vectoring
`mode and then this angle is used for rotation, it is not necessary
`to implement
`the Z recurrences in any of the two modes,
`since the angle can be used in its o—decomposed form. This
`is the type of application we have in mind, so that we do not
`include the Z recurrences in the sequel and the rr for rotation is
`obtained from the angle calculation. The general architecture
`of these applications is shown in Fig. l.
`in both operation modes. the iteration time corresponds to
`the delay of the variable shifter plus the delay of the adder.
`If the recurrences are unfolded, the variable shifter is replaced
`by wiring, which reduces the iteration time to the delay of
`the adder. Moreover,
`the unfolded implementation can he
`pipelined to increase the throughput.
`Scaring: Since the CORDIC rotation changes the length of
`the vector,
`to maintain the same length of the input vector
`
`from the redundant
`Angie computation: To compute or
`representation, a sign detection is required. which is slow;
`however,
`in a sequential (not unfolded) implementation this
`sign detection is not
`in the critical path since it can be
`performed in parallel with the variable shifter. On the other
`hand, for an unfolded implementation, since the shifters are
`eliminated,
`the delay of this sign detection determines the
`iteration time. In [14] a scheme was proposed that computes (it,-
`based on an estimate of Hi]. This requires the modification of
`the recurrence and selection function to determine the direction
`of rotation. By making Wit] : 2‘Y,,[i] we get from (1),
`
`mt + 1] 2 mt] + Fri-Q’fiWI'i]
`
`W[r. + 1] = grit-tr] _ an}, [in
`
`[6)
`
`The value of Fr,- is obtained from Wnli], an estimate produced
`by truncating Wit] to one fractional (signed) bit. It is shown in
`[14} that to assure convergence the value of c’r, has to belong
`
`Page 2 of 10
`
`Page 2 of 10
`
`
`
`1018
`
`[EEE TRANSACTIONS ON COMPUTERS. VOL. 41. N0. 8, AUGUST 1902
`
`to the set {—l,[l.1} and
`
`B. Angie CoicuiationiRotation
`
`a.- z
`
`1
`0
`—1
`
`if Wit-I 2%
`if nip: 2 n
`if W[i] 5 —%
`
`(7)
`
`where Xc[tl] and Ya[[]] (W[[]]) are fractional values, one of
`them is normalized, and Xa[(}] > 0. With these conditions, a
`normalized Xa[l] is obtained, that is, 1/2 5 Xn[1] < 2.
`Rotation: Redundant adders can be used for the iterations.
`
`No (it
`lation.
`
`is calculated since it is obtained from the angle calcu-
`
`Scaling: The same scaling operation for rotation as in
`nonredundant CORDIC has to be performed. However,
`in
`this redundant CORDIC approach,
`the scale factor K is
`not a constant as [7,- E {r1._0.1}. Therefore, K must be
`computed for each tit-decomposed angle. This calculation can
`be performed in two steps [14]: 1) K2 computation using
`P[j+ l] : Plj]+|&j|2_2jPLj] with initial condition PM = 1
`and 2) K = x/F The K computation can be overlapped with
`the angle computation, so it does not increase the computation
`time. However, a division operation is needed for the final
`scaling operation.
`it was estimated that this scheme produces a speedup of 4.5
`for matrix triangularization and of 4 for SVD, when compared
`with the nonredundant scheme [14].
`
`III. CONSTANT—FACTOR REDUNDANT CORDIC
`
`To reduce the implementation cost of redundant CORDIC,
`especially for the scale factor calculation and the scaling
`operation,
`there is a need to develop a redundant CORDIC
`scheme with a constant scale factor. This is achieved if 6 is
`
`restricted to the set {—1, 1}. However, since to eliminate the
`delay of exact sign detection, & is obtained from an estimate
`of Yap], this does not assure convergence. We now discuss
`how to modify the scheme to assure convergence, first in the
`calculation of the sine and cosine functions (a review of a
`scheme proposed in [15]) and then in angle calculationtrotation
`mode (a new scheme).
`
`In the case of angle calculation, the problem of having a
`constant scale factor is more complicated because the direction
`of the rotation is obtained from an estimate of Y},[i.] and it is
`necessary to deal with the inter-dependency of the recurrences
`of X and Y. To use the estimate effectively, we utilize the
`modified recurrences of expressions (6). However, to achieve a
`constant scaling factor, instead of the rir—selection of expression
`(7), we use
`
`175:
`
`.
`
`{1
`
`—1
`
`if nip] 2 0
`if Wli] < 0
`
`(8)
`
`where WE] is an estimate obtained by truncating the signed-
`digit representation of W[i] to the ith fractional bit. As before,
`this requires Xa[1] 2 1/2, which is achieved in the same
`manner.
`
`The use of this 5 selection does not assure convergence
`because W[i.] can attain values outside the bound required
`for convergence. We now compute the convergence bound,
`determine the amount by which W [i] can surpass this bound
`because of the 6-selection used, and show how correcting
`iterations can restore the bound.
`
`Theorem 3.] (Bound): When the direction of the angle a,-
`is obtained using the exact sign, that is,
`
`o-— 1
`.,_
`_1
`
`ifW[i] 20
`if wn] <0
`
`W[i] is bounded as follows [14]:
`
`|W[-i]| < 2X[i].
`
`Proof: Since o,_1 is determined by the sign of W[i— 1],
`the largest magnitude of W[i] is obtained when W[t' — 1] = 0
`(03-4 = 1) in
`
`W[i] = 2(Wli ~ 1] — ois1X[i — 1]).
`
`Thus,
`
`A. Sine and Cosine Calculations
`
`|W[i]] < 2X[i.— 1].
`
`in [15] a correcting iteration scheme is used to have a
`constant scale factor when redundant CORDIC computes
`the cosine and sine functions. For these computations,
`the
`recurrences (3) are used and Er,-
`is obtained as in (4), except
`that an estimate of Z[i] is utilized. Because of this estimate,
`to assure convergence it is necessary to repeat some iterations,
`called correcting iterations. These repetitions are done at fixed
`intervals, where the frequency of repetition depends on the
`precision of the estimate.
`This approach could be directly extended to perform ro—
`tations by decomposing the given angle. However,
`it
`is not
`directly applicable when the angle is computed as tan—1(a/b)
`and this angle is used in decomposed form for the rotations.
`We develop the required modifications now.
`
`As shown in [21], X[i] g X[i + 1] for all i. Thus,
`
`|W[i]| < 2X[i _ 1] g 2m].
`
`
`
`:
`
`Fig. 2 shows the bound of Theorem 3.1. This bound cannot be
`maintained with the selection function of the CFR—CORDIC
`
`scheme [expression (8)], as shown by the following lemma.
`Lewmma 3.1: Assume that W [p] satisfies the bound of
`Theorem 3.1,
`that
`is |W[p]| < 2X[p]. If tip is determined
`from W[p] where W[p] is computed using 1 fractional bits of
`Who], then Wk) + 1] satisfies the new bound
`
`—2X[p+ 1] — 2324(1 + 2—21’]< W[p +114. 2X[p+ 1].
`
`Page 3 of 10
`
`Page 3 of 10
`
`
`
`LEE AND LANG: CONSTANT-FACTOR REDL'NDANT CORDIC
`
`JUN
`
`Wti]
`
`wp—n
`
`-ZX[i]
`
`-2X[i-l]
`
`0
`
`2X[i-ll
`
`211m
`
`Fig. 2. Convergence hound,
`
`Proof: Three regions of sz] need to be considered.
`Case a): WLp] > 0 (5p 2 1).
`In signed-digit number system, if Who] > 0 then W[p] > 0.
`since Whit] is computed using 1 fractional bits of Wlp] Thus,
`63, = 0-,, = 1. From Theorem 3.1, ]W[p +1]| < 2X[p +1].
`Case b): who] 2 0 tap : 1),
`[1' Wm] : D then —2_E < [rt/[p] < 2‘1 We divide this region
`into two. In the first, when —2_t < W[p] < 0, 0,, : —1 (not
`equal to a), From the recurrence,
`
`W]-p+l]=2(]—W[p X]p])
`X]p+l]:X[p]+22pmp].
`
`Consequently,
`
`WLD + 1] : 21m] _ 2{X[p +1]— 2-2pm“)
`: —2X[p + 1] + 2W[p](1+ 2—2?)
`
`and for the most negative value of W[p]
`
`Wlp + 1] > —2th + 1] — 2 :1 2—‘(1 + 2—29).
`
`011 the other hand, when 0 < W[_p] < 2" then 17,, : Er, : 1.
`Thus, from Theorem 3.1,
`|W[p + 1]] < 2X[p + 1].
`Case e): W [p] < 0
`Similar to case a), 6,, = 0,, : —1. Thus, from the Theorem
`3.1,
`iW[p + 1]| < 2X[p +1].
`El
`From a), b), and c), the proof is done.
`is out of the bound
`Fig. 3 shows the case when W by + 1]
`due to the use of an estimate, W[p]. Once Wle']
`is outside
`of the bound required for convergence, the amount outside of
`the bound accumulates in the foIIOwing iterations, as shown
`in the following lemma.
`Lemma 3.2: 1f72X@+]]72a2—‘(1+2‘2P) < W[p+1] <
`72X[p + 1],
`then the bound of iteration W[p + k] where
`k: = {2.3.---} becomes
`
`—2XLD+ k] — 21-10 + 2—23”) < Mp + 1] < —2X[p+ 1:].
`
`Proof by induction:
`1) Basis condition for k : 2,
`Since —2X[p + 1] — 2 at 24(1 + 2—2?) < W[p +1] <
`—2X[p+ 1], we obtain tit/[p + 1] < 0. Thus, a,“ = —1.
`From the lower bound and the recurrence, we have
`
` W[p —— 2] > 2(—2X[p + 1] —2 112—11 +2—2P) + le+ 1])
`
`W’Lp —— 2] > 2(—X[p + 2] — 2—2111“)th +1]
`— 2 , 2—‘(1 + 2411)]
`
`who * 2] > —2X[p + 211 22—’(1 + 241’].
`
`Page 4 of 10
`
`2W[p]
`
`2‘1Pt1wm
`
`—2X[p] —2X[p+l]
`
`72
`
`0
`
` 4
`
`1X[p+l]
`
`2X[p]
`
`Fig. 3, Out of the bound due to the use of an estimate.
`
`From the upper bound and the recurrence we have
`
`W11» + 21< 2t—2X1p+ 11+ ti +111 = 21—ti + 111
`Who + 2] < 2(—X['p + 2] — 2-2<P+UW[p + 1])
`< —2X[p + 2].
`
`Thus,
`
`—2X[p+2]— 221 (1+2 2" )<W[p+2]<——2XL;;+2].
`
`2) Induction step. Assume that it is true when k z i and
`prove that it is true when k = 1' + 1.
`By assumption,
`
`—2X[p +1] — 2"—*(1 + 241’) < W[p+ 1:] < —2X]p +12].
`
`Therefore, W[p + if] < 0 and 6,,“- = —1. From lower bound
`of assumption and from recurrence,
`
`W[p+ 2' + 1] > 2(A2XL11+1] — 21"(1+2‘2P) + X[p+'i])
`W[p +1+ 1] > 2(—X[p+ 1' + 1] — 2—2(P+”W[p +1]
`— 2*-*(1 + 2-2?»
`th+1+11> —2X[p+ i + 1] — 2"11—‘(1 + 2—29).
`
`From upper bound of assumption and from recurrence,
`
`W[p+i+1] < 2(—2Xfi1+1‘]+X[p+i]) = 2(—X|p+1‘])
`W[p+ 1: + 1] < 2(—X[p +1+ 1] — 2—2meVLu-t— 1])
`< —2X]p + 1' + 1].
`
`Thus,
`
`—2X]]J +1+ 1] — 2"+1**(1 + 2’2?)
`
`< Wlp+t+ 1] < —2X[p+1'+1].
`
`CI
`Therefore, from 1) and 2), the proof is done.
`Fig. 4 shows the bound of W[p + 2]
`to explain how the
`amount outside of the bound accumulates once Wk) + 1]
`is
`out of the bound.
`
`To recover the original bound of ]W[i + 1]] < 2X[i + l], a
`correcting iteration needs to be performed. The same angle of
`the previous iteration is used for the correcting iteration,
`i. e.,
`2 ' instead of 2 5+1):5 used with Y[i + 1] as shown below.
`
`X011: + 1] = 1([1‘ + 1] + (35'2’EY[1‘+ t]
`YC{1‘+ 1] : 11-1 + 1] — a?2-*‘.vr[z' + 1].
`
`The corresponding shifted recurrence becomes
`
`X" [1: + 1] =
`
`WC[1+ 1] =
`
`X[i + 1] + 592-21—111/[1 + 1]
`W[1'+ 1] — 26,-0X[1'+1].
`
`
`
`Page 4 of 10
`
`
`
`
`
`lllill
`
`lLiEl; 'I'HANSNL'TIDNS ON COMPUTERS, VOL. 4t, N0 8. AUGUST 1992
`
`CW
`
`li+l]
`
`axiom]
`
`
`
`o
`
`2xn+n
`
`Fig. 5. Recovering the convergence heund by a correcting iteration.
`
`El
`Thus, the largest integer it becomes 1‘. e 1.
`As indicated by Theorem 3.2, a correcting iteration puts W
`inside the convergence bound, if it was outside of the bound
`before this iteration. However, as shown in the next lemma,
`this correcting iteration might take W out of the bound if it
`was inside the bound before.
`
`If ]Wii + 1“ < ‘ZXii + 1]. the bound after a
`Lemma 3.3:
`correcting iteration
`
`WCl-i + 1] ~ Wi-i+1]— 2af‘xn + 1]
`
`becomes
`
`~2‘r — 2X[.r:+1]< Wait + 1] < 2m + 1].
`
`Proof: The proof is very similar to that of Lemma 3.1
`and is omitted.
`El
`From Lemma 3.2, Theorem 3.2, and Lemma 3.3, we obtain
`the following corollary.
`Corollary ii: If t fractional bits are used for the estimate,
`then the interval between correcting iterations should be less
`
`than or equal to (t — 2).
`l:
`The fact that a correcting iteration does not have to be
`included exactly every (if?) iterations provides flexibility and
`helps to minimize the number of additional iterations required
`to avoid an explicit sealing operation. Also, note that
`it
`is
`possible to repeat a correcting iteration more than once if it
`helps to minimize the total number of iterations for a certain
`K. This additional correcting iterations should not be counted
`as one of the iterations of CORDIC when determining the
`index of the next correcting iteration.
`Since the frequency of correcting iterations is determined by
`it, to reduce their number it is necessary to increase 5. However,
`this makes the or selection slower. Consequently, the value of
`t has to be chosen to reduce the overall execution time,
`
`C. Reducing the Number of Correcting iterations
`
`From the previous discussion we can determine the mini-
`mum numbcr of correcting iterations required. We now show
`that it is sufficient to have only one correcting iteration in the
`second half, if for that part the deselection of expression (7)
`is used. That is, we split the iterations of CORDIC (for -i : t)
`to 1' : rt — 1} into two groups as follows:
`1) Selection function for t] g i 5 tit/2
`
`- ,
`(If; —
`
`1
`*1
`
`if Wit] 2 it
`.
`r,
`.
`1f W [t] < 0.
`
`(9)
`
`As shown earlier. correcting iterations must be included for
`convergence for
`this group and the number of fractional
`bits used for W[-.€] determines the frequency of correcting
`iterations.
`
`
`
`axing]
`
`-2X[p+1]
`
`0
`
`2X[p+1].
`
`2le+2]
`
`Fig. 4. Accumulating the amount outside of the bound.
`
`The direction of the angle of is determined from Witt-1]
`instead of tilt]. However, notice that
`the same selection
`function in (8) is used for this iteration. Fig. 5 shows the case
`where a correcting iteration recovers the bound of Theorem
`3.1. The next theorem determines the frequency of correcting
`iterations required.
`ll‘l’ip“ < 2X[pi and the di—
`Theorem 3.2: Assume that
`rection of the angle, aw“ is determined from the estimate
`l-i-’[p+k] for all k. 1ft fractional bits are used in the estimate,
`then a maximum of t’. a l
`iterations can be done before a
`
`correcting iteration to recover the convergence bound.
`Proof: Let us assume that k iterations are performed after
`the path iteration. From Lemma 3.2, the bound of W[p + it] is
`expanded as follows due to the use of estimate
`
`—‘2X[‘p + M _ 2"”(1 + 2-”) < WLD + k] < ext” + k].
`
`then it satisfies the bound of
`If it‘l'ip + is“ < 2X[-p + k}.
`Theorem 3.] and a correcting iteration is not needed to recover
`the hound.
`
`Since we are interested in finding out how many iterations
`can be performed before recovering the bound by a correcting
`iteration, we need to consider the case when Wit? + k] is in
`the following range:
`
`—2X[p + t] — 2*'-t(1+ 24!); 4 mp + k] < —-2X[p+ at.
`
`in this case, op“. : -—l_sincc W[p+ k] < 0. By applying the
`correcting iteration. Wc'lp + k] becomes
`
`ltd-"hi + it] = My + t] + 2X[p + a] > _2*'—'{i + 241’).
`
`To recover the bound of Theorem 3.1, we need to find it
`satisfying the following inequality.
`
`|— ilk—’[l + 2-2pm z. 2m) + it].
`
`Therefore,
`
`2*“f <
`
`'3le + kl
`(t + 2—31“).
`
`is determined by considering the smallest
`The value of it:
`possible value of the right—hand side. For this, we consider the
`smallest value of 2X[])+ is] and the largest of (1 + 2’2”), i.e.,
`we should consider the smallest value of p. Since we assume
`that the input ”[0] is in conventional number representation,
`the direction of the angle do is correct, regardless of the use
`of an estimate. Thus, the smallest if) to consider is 1. Now we
`want to determine the smallest value of X [p+ kl. In [22] it is
`shown that the smallest value of Xfin + k] is 3/8. Therefore,
`2K“ J-Lii"
`.
`
`R—l‘< —l0gU—+g—Jfl% (log;
`log ‘2
`log ‘2'
`
`Page 5 of 10
`
`Page 5 of 10
`
`
`
`LEE AND LANG: CONSTANT-FACTOR REDUNDANT CORDIC
`
`102]
`
`2) Selection function for n/2 < i < n
`
`a, __.
`
`1
`o
`—1
`
`if Vito] a g
`if tit/[r] = o
`if we] 3 —%.
`
`(10)
`
`Utilizing the fact that the value of K in n—bit precision is not
`affected by r}, for n/2 < i S (o i l}, the selection function
`of redundant CORDIC (with a variable scale factor} shown
`in (7) is used for the second group. This assures convergence
`without correcting iterations; however, we need to include one
`additional correcting iteration (independent of it) since the last
`correcting iteration of the first group might fall into the case
`of Lemma 3.3.
`
`D. Making Scale Factor (1 Power of Two
`
`Since the scale factor is a constant, it is possible to utilize the
`same methods for scaling used for the conventional CORDIC
`algorithm. As mentioned before, the most effective method is
`to use additional iterations (CORDIC and scaling) to make
`the scale factor
`a power of two.
`since in that case the
`scaling can be done by shifting, which is implemented by
`wiring. As in the conventional case, the issue is to minimize
`the number of additional iterations; however, the problem is
`somewhat more complex because of the correcting iterations.
`As indicated in the previous section, there is some freedom in
`the placement of these iterations; therefore, the problem is to
`obtain the minimum number of iterations with the correcting
`iterations suitably placed. We developed an efficient method
`of exhaustive search, called decomposed search, to reduce the
`searching time from T to if? [22]. For the case in which
`six fractional bits are used in the estimate,
`the number of
`additional iterations for 16-bit and 32-bit precision is shown
`in Table l. The corresponding scale factor is equal to two.
`The additional iterations require an increase in the number
`of guard bits since this number shOuld be at least log or, where
`m is the number of iterations. For the solutions shown in the
`
`Table l, we get five guard bits for the 16-bit case and six
`guard bits for the 32-bit one.
`
`IV. REDUCING THE NUMBER OF
`Irenarrous AND ON-THE-FLY Convaasrou
`
`We now describe a method for reducing the number of iter-
`ations for rotation, which can be applied both to nonredundant
`and redundant CORDIC. Moreover, we develop a scheme for
`performing the conversion of the result from redundant
`to
`22’s complement in an on-the-fiy manner. This scheme was
`not necessary for the redundant CORDIC with variable scale
`factor, since the onitherfiy conversion can be done during the
`division for scale factor correction.
`
`A. Radix—2 and Radix-4 Iterations for Rotation
`
`To speed up the CORDIC rotation, we propose a scheme
`that reduces the last 11/2 iterations to NM by the use of radix-
`4 rotations. One possibility would be to directly decompose
`the angle into radix-4 o’s; however, that would complicate
`significantly the selection function. Consequently, we continue
`
`TABLE I
`EXAMPLES or ADDITIONAL ITEMI'IONS Port 1 = 6
`.
`No. of
`.
`.
`No. of
`..
`.
`.
`.
`.
`Additional CORDIC
`iterations
`bits
`additional
`Scaling llBleanS
`iteralions
`
`16-bit
`6
`+2, +10, —5
`4, 8, 9
`
`+2, +10, +17, —5, —22, w281232bit 4, S, 9, 11, 15, IT
`
`
`
`decomposing the angle into radix-2 0's and combine these to
`form a radix—4 rotation. This can be easily done only when
`tan—K24) : 2‘ii for the required precision. Moreover,
`in
`order not to change the scaling factor, this should be done
`only for i 2 2-1/2. The following theorem shows that this is
`satisfied for i 2 n/2.
`Theorem 4.1: Let m be the total number of iterations. Then,
`Sin/2Q"- — tan—1(2")) < 2_(“+‘°5m) for m < 2””.
`Proof? The series expansion for tan’1 3 where he! 5 1 is
`
`t
`
`an
`
`x7
`1:5
`3:3
`1
`‘ =e—# ———+
`3 + 5
`7
`.r'
`
`Thus,
`
`For a: = 2 ‘,
`
`J:5.
`x3
`m-tan’lx=—;-—7+H
`.i
`o
`
`_
`.
`2_'—t. _12-t:
`an
`
`
`273:
`3
`
`2—51'
`__..
`5 +
`
`.. <
`
`
`2#31
`2
`
`Thus,
`
`
`2(2 IIitzin1(2“I))<21'5"1(1+2 3+2-‘5+..-)
`L—gf
`
`< 32—1-5” < 2—(n+logrn)-
`
`Consequently, a sufficient condition is
`
`m < 2"”.
`
`El
`
`Therefore, when 3' 3 12‘, a rotation (iteration) with an angle
`91' = or, tan‘1 2* can be regarded as an iteration with an angle
`0,2“, so that two consecutive iterations can be expressed with
`one radix-4 direction of rotation since
`
`9i + 6H1: 052—!- +0r+r2_”+n = (20'; +Ut+1)2_(i+1)-
`
`The set of values of (20, +o,-+1) is {3:3, :|:1} ifo 6 {i1}
`and {:I:S,:l:2,:l:1,0} if or E {$1.0}. To avoid multiplica-
`tion by 21:3, we developed recoding schemes into the set
`{5:2,:t1,0} [22].
`One problem that appears with this reduction in the number
`of iteration in the rotation,
`is the imbalance between the
`number of iterations of angle calculation (all radix72) and
`of rotation (radix-2 and radix—4). Moreover, to perform the
`radix-4 rotations it
`is necessary to have performed already
`the first of the pair of radix-2 angle iterations. Fortunately,
`the scaling iterations for making the scale factor a power of
`two have to be done only for the rotation (and not for the
`angle). Consequently, if these scaling iterations are done at the
`beginning together with the first half of the angie iterations,
`
`Page 6 of 10
`
`Page 6 of 10
`
`
`
`“J22.
`
`IFEE TRANSACTIONS ON COMPUTERS, VOL 41, N0. 8. AUGUST 1992
`
`it is possible to match the timing. To achieve this match, a
`solution with a suitable number of scaling iterations should
`be selected.
`
`3. ()n-tl‘te—F1y Conversion
`in most
`instances the result of
`
`the rotation has to be
`
`converted to conventional representation. For the scheme pre—
`sented in [14]. since the scale factor is variable this conversion
`is done as part of the scaling operation (division) using the on-
`the fly conversion method described in [16]. Since the explicit
`sealing Operation by division is avoided in CFR—CORDIC,
`there is a need to develop an on-the-fly scheme to convert
`while the CORDIC rotation is performed. The preperty of
`after CORDIC which is the generation of approximately one
`bit of the output per iteration starting from most significant
`bit, makes it possible to use a similar conversion scheme.
`However, the fact that the output is not really produced in an
`on-line manner (one bit per iteration) [23] makes the scheme
`more complex. We first briefly review the conversion scheme
`presented in [16] and then present the adaptation for CORDIC
`rotation.
`
`Review of the on-rhe-flv conversion scheme presented in
`[16}: Consider a unit {such as SRT division) which produces
`a signed—digit result (we describe the radix-2 case)
`
`P=Zp12_i.
`i=1
`
`p1c[ 1,0.1}
`
`one digit per iteration beginning from the most significant.
`Then i” can be converted on—lhe-fiy into its 2’s complement
`representation
`
`i'i'i
`
`Q=rqo+2v.2_'.
`i:1
`
`q;e{0.l}.
`
`The algorithm produces a converted digit q,- as it obtains a
`signed—digit p, from most significant to least significant bit.
`The simple algorithm QM] = Q“; — l] +pk2‘i“ produces Q =
`Qirn]; however, when pt, -_ —1, this requires the propagation
`ofa carry and is therefore slow. To avoid this propagation, the
`on—the—fly conversion algorithm keeps two conditional forms:
`All: — 1] expecting pi. = 1 or 3);, : 0, and BM — I] expecting
`pr. : —1, where Bile ~1]:A[k — 1] — 2 “i U.
`The initial values of A[l] and B[l] are
`
`_ [LN-HIS)
`
`Am _ { 1.1(_n_s)
`Bili‘{l.0(—l)
`
`_ 011(0)
`
`if pi : l
`
`ifpl = ,1
`ifpl:—1
`
`if p1 = l
`
`Note that pt 75 0 since P is normalized.
`
`The recurrences on Able] and B[k] for it 3— l are
`
`A[k] 2
`
`BIL-t 2
`
`if'pk =
`A“: - 15+ 2—}:
`1f int- = (l
`AM —- 1]
`B[k—1i+2“k ifpkz—l
`A“: - 1]
`if m. _1
`Bit- -1]+ 2"“
`it 3),, = o
`BER—1]
`ifka-nl”
`
`Page 7 of 10
`
`The final result Q = Abra].
`Orr-tltc—fly conversion scheme for redundant rotation: For
`CORDIC rotation both X and Y have to be converted; here
`we describe the conversion of X only since the other one is
`similar. The previous algorithm cannot be applied directly to
`the CORDIC case because the result is not obtained one hit
`
`the conversion
`per iteration. However, as shown in Fig. 6,
`scheme utilizes the fact that in the calculation of X['1i+ 1] the
`operand VB] is shifted right by :2 positions. Consequently, the
`most—significant part can only be modified by a posnive or
`by a negative carry coming from the least-Significant part. As
`shown in the figure, XM can be divided into three regions:
`1)
`the converted region of _;: bits (represented by two
`conditional forms Ali] and AMp] : Ali] — 2‘3')
`2) the window it,- corresponding to bits 3' + 1 to r}.
`3)
`the region beginning from bit
`(i. + 1), where the addi-
`tiom’subtraction is performed. Note that the first j bits
`of Yii] in this region are already converted.
`Since we want to select only between two conditional forms,
`we restrict the carryout of the window to be either 0 or minus
`1. The window R,-
`is used to absorb the carry produced by
`the addition/subtraction so that only a minus I carry can cross
`into the first region.
`We now need to determine the number of bits of the window
`
`Hi. This depends on the set of values of a. We have developed
`the three possible cases that appear in the rotation scheme
`described before, namely. 1} in radix—2, 2) in radix-4, and 3)
`repetition of two radix—2 sta