throbber
1010
`
`lt-Ll-ZE TRANSACTIONS ON COMPUTERS. VOL. 41, NO. 8. AUGUST 1992
`
`Constant-Factor Redundant CORDIC
`
`for Angle Calculation and Rotation
`
`Jeong—A Lee, Member. IEEE, and Tomas Lang
`
`Abstract-'— We develop a Constant-Factor Redundant-CORDIC
`(CFR-CORDICJ scheme, where the scale factor is forced to be
`constant while computing an angle for plane rotations. The di-
`rection of rotation is determined from an estimate of the Sign and
`convergence is assured by suitably placed correcting iterations.
`Moreover, the number of iterations in the CORDIC rotation
`unit is reduced by about 25 % by expressing the direction of the
`rotation in radix-2 and radix-4, and conversion to conventional
`representation is done on-the-tly. We estimate the performance of
`CPR-CORDIC and compare it with previously proposed schemes
`and show that it provides a similar execution time as redundant
`CORDIC with a variable scaling factor, with a significant saving
`in area.
`
`Index Terms—Angle calculation and rotation, constant scale
`factor, CORDIC, digital signal processing, matrix computations,
`matrix ti'iangularization, redundant arithmetic.
`
`I.
`
`INTRODUCTION
`
`ODERN digital signal processing requires the solution
`of systems of linear equations and the computation of
`eigenvalues, cigenvectors, singular values, and singular vectors
`[1]. Basic algorithms for these computations are matrix trian—
`gularization and singular value decomposition [2]. Since these
`applications are computationally intensive and require real-
`time response, parallel algorithms and pipelined and parallel
`architectures have been proposed to achieve high throughput
`[3]—[7]. In particular, linear, triangular, and mesh-connected
`arrays are very suitable because these matrix algorithms can
`be effectively mapped onto these arrays [8], [3], [5]. The key
`operations of these parallel algorithms are the computation of
`2 x 2 rotation matrices (rotation angles) and their application
`to appropriate submatrices. The rotation angle is computed
`in boundary or diagonal processors and broadcast
`to other
`processors where the rotations are performed. One calculation
`approach is to compute the sine and cosine of the angle
`{by means of a sequence of operations involving squaring,
`addition, multiplication. square root, and division) and then
`
`Manuscript received November I. l99l: revised February 18. 1992.. This
`work was done while the authors were at the Computer Science Department.
`U.C.L.A., and was supported in part by the NSF Grant MIP—RSBMO
`Composite Operations Using Orr-Line Arithmetic for Application-.S'pecrfic Par,
`all?! Architectures: .4. fgort‘t‘hms, Design. and Experiments? Studios. Based on
`“Matrix Triangulariration by Fixed—Point Redundant CORDIC with a Constant
`Scale Factor" by J. Lee and T. Lang which appeared in Proceedings of SPIE
`Conference on Advanced Signal Processing Algorithms, Architectures, and
`Implementation. vol. 1348, pp. 430—447. San Diego, CA. July 1990.
`J. Lee is wilh the Department of Electrical Engineering. University of
`Houston, Houston. TX TT204—4i93.
`T. Lang is with the Computer Architecture Department, Universital Poll-lees
`nica dc Cataiunya. Barcelona. Spain.
`IEEE Log Numbver 9201909.
`
`to perform the rotations by multiplications and additions. The
`main drawback of this approach is that the angle calculation
`requires a long sequence of operations resulting in a large
`number of modules and a long execution time.
`Another approach is to use the CORDIC algorithm [9],
`[10], which is utilized directly both to calculate the angle
`and to perform the rotations. It is characterized by its simple
`and regular architecture, which mainly consists of shifters and
`adders and is, therefore, well suited for VLSI implementation
`[ll]—[13].
`The CORDIC algorithm is relatively slow because each
`iteration requires an addition. A solution is to use a redundant
`number representation (for example, carry-save or signed—
`digit) to achieve carry-free addition. This can be applied di-
`rectly to the rotation mode; however, for the angle-calculation
`mode it
`is necessary to detect
`the sign of the redundant
`representation, which is slow. In a sequential implementation
`of CORDIC, this sign detection can be done in parallel with
`the variable shifting. On the other hand,
`in an unfolded
`implementation the variable shifters are replaced by wired
`connections, resulting in lower delay (and a higher throughput
`if the implementation is pipelincd). In such a case, the sign
`detection is in the critical path and degrades the performance.
`In [14] a scheme was developed, Similar to what has been
`standard [or other recurrences such as division and square
`root, in which an estimate of the sign is used. This approach
`produces a significant speedup for matrix computations but
`complicates the scaling required for rotation.
`In this paper we develop a Constant-Factor Redundant-
`CORDIC (CFR—CORDIC) scheme for the angle calculation
`and rotation application. The scale factor
`is forced to be
`constant while computing the angles, so that the scaling for
`the associated rotations is performed as in the nonredundant
`CORDIC case. The approach, which is based on sign esti-
`mation and additional iterations to assure convergence,
`is an
`extension of that proposed in [15] for the calculation of sine
`and cosine. The angle calculation and rotation application is
`more complicated because the sign estimation is affected by
`the interdependence of the two CORDIC recurrence equations
`and because of the scaling required in the rotation.
`We also propose a scheme to reduce by about 25% the
`number of iterations in the CORDIC rotation unit. This is
`
`achieved by expressing the direction of rotation in radix—2 and
`radix-4 {this can be used in both nonredundant and redundant
`CORDIC]. Moreover, the transformation of the rotated output
`into nonredundant representation is done on-the-fly during
`the iterations, using an extension of the approach presented
`OBIS—93409230100 © 1992 IEEE
`
`Page 1 of 10
`
`SAMSUNG EXHIBIT 1044
`
`Page 1 of 10
`
`SAMSUNG EXHIBIT 1044
`
`

`

`LEE AND LRN'G: (“ONS'l'K‘tNT-FACFOR REDUNDANT CORDIC‘
`
`ll|l7I
`
`in [16]. The method here is more complex because of the
`iteration structure of the rotation. Finally, we discuss tradeoffs
`in the implementation, evaluate an estimate of performance
`and compare with previously proposed schemes. For matrix
`triangularization, we estimate that the execution time is similar
`to that of redundant CORDIC with variable scale factor, at
`
`significant savings in area.
`
`1]. REVIEW OF CORDIC SCHEMES
`
`(nonredundant) and
`We briefly review the Conventional
`redundant CORDIC schemes [10]. [14] to compute an angle
`and perform rotations.
`
`A. Nonrert‘undam CORDIC
`
`:
`0
`angle
`Angie calculationfvectorirtg mode): The
`ton—1 [YulllMXaEUD in nebil precision is obtained from Z,,[rr]
`using the following recurrence equations with Z,,[()] = 0.
`
`X, [i + l] : Xu[z'.1+ o;2_iYu[-i]
`up + l] = r; [it _ era-trap]
`xii-+1] = 2n] + Mario“)
`
`(1)
`
`is assumed positive and the direction of the
`where Xaitl]
`rotation is obtained as
`
`+1
`Of
`’_ —l
`
`if 1-1. [i] 3 o
`if Yup] < o.
`
`[2)
`
`Rorarion(rorrtring-mode): To perform a plane rotation on
`the input vector [X,.[O].l"..[0]) by an angle (6 : 240]} the
`following recurrence equations are used:
`
`mwedvccwr
`
`Rotation uni:
`(cr .d.)
`
`
`( X and Y rec.)
`
`
`Fig. 1. General archilccture for matrix computations with CORDIC.
`
`the following CORDIC scaling operation is
`(X,.101.Y,.[l}]).
`necessary, where (.e“.y”} is the rotated vector.
`
`IR = J— X" [n] where
`3) R
`K Yr in]
`n— l
`n - l
`K : fl mimics—t") -: H Jig—Ema:
`i=0
`5:0
`
`
`
`(5)
`
`Note that K is constant for nonredundant CORDIC since
`
`i.
`a"? = l for all
`The scaling operation can be done by multiplication by
`UK. However, as proposed in [l7]i[l9] it might be simpler
`
`to include scaling iterations of the form X..[i] a. X.,.['i] ::
`2‘-lX,.lil and leil «— rapt i rims and repetitions of
`CORDIC iterations to make the scaling factor equal to a power
`of two (really 1 or 2) and perform the scaling by shifting.
`For an efficient
`implementation.
`the total number of these
`additional iterations need to be minimized: solutions have been
`
`found that require about 25% more iterations to force K to be
`2 or
`i
`[19],
`[20].
`
`where
`
`Xrfi + 15 _ Xr[i] + ar2“"l’r[t]
`r1. [-2 + 1] : r3. [r1] — 042’*.X',[r3]
`2,.[r + 1] = arr-r] + Jinan—1&4)
`
`._ at
`
`a“ F { +1
`
`it are] 3 U
`
`if an] < 0.
`
`B. Redundant CORDIC
`
`To reduce the iteration time, it is possible to use a redundant
`representation {for example carry—save or signed-digit) and
`the corresponding (carry-free) adder. In the development that
`follows we consider
`the signed—digit case;
`the carry—save
`alternative is similar. We now discuss the effect that the use
`
`of the redundant representation has on the angle computation
`and on the rotation.
`
`(3)
`
`(4)
`
`The recurrence equations are the same for both angle and
`rotation modes.
`
`Rotation with angle or decomposed form: Note that for an
`application in which an angle is calculated by the vectoring
`mode and then this angle is used for rotation, it is not necessary
`to implement
`the Z recurrences in any of the two modes,
`since the angle can be used in its o—decomposed form. This
`is the type of application we have in mind, so that we do not
`include the Z recurrences in the sequel and the rr for rotation is
`obtained from the angle calculation. The general architecture
`of these applications is shown in Fig. l.
`in both operation modes. the iteration time corresponds to
`the delay of the variable shifter plus the delay of the adder.
`If the recurrences are unfolded, the variable shifter is replaced
`by wiring, which reduces the iteration time to the delay of
`the adder. Moreover,
`the unfolded implementation can he
`pipelined to increase the throughput.
`Scaring: Since the CORDIC rotation changes the length of
`the vector,
`to maintain the same length of the input vector
`
`from the redundant
`Angie computation: To compute or
`representation, a sign detection is required. which is slow;
`however,
`in a sequential (not unfolded) implementation this
`sign detection is not
`in the critical path since it can be
`performed in parallel with the variable shifter. On the other
`hand, for an unfolded implementation, since the shifters are
`eliminated,
`the delay of this sign detection determines the
`iteration time. In [14] a scheme was proposed that computes (it,-
`based on an estimate of Hi]. This requires the modification of
`the recurrence and selection function to determine the direction
`of rotation. By making Wit] : 2‘Y,,[i] we get from (1),
`
`mt + 1] 2 mt] + Fri-Q’fiWI'i]
`
`W[r. + 1] = grit-tr] _ an}, [in
`
`[6)
`
`The value of Fr,- is obtained from Wnli], an estimate produced
`by truncating Wit] to one fractional (signed) bit. It is shown in
`[14} that to assure convergence the value of c’r, has to belong
`
`Page 2 of 10
`
`Page 2 of 10
`
`

`

`1018
`
`[EEE TRANSACTIONS ON COMPUTERS. VOL. 41. N0. 8, AUGUST 1902
`
`to the set {—l,[l.1} and
`
`B. Angie CoicuiationiRotation
`
`a.- z
`
`1
`0
`—1
`
`if Wit-I 2%
`if nip: 2 n
`if W[i] 5 —%
`
`(7)
`
`where Xc[tl] and Ya[[]] (W[[]]) are fractional values, one of
`them is normalized, and Xa[(}] > 0. With these conditions, a
`normalized Xa[l] is obtained, that is, 1/2 5 Xn[1] < 2.
`Rotation: Redundant adders can be used for the iterations.
`
`No (it
`lation.
`
`is calculated since it is obtained from the angle calcu-
`
`Scaling: The same scaling operation for rotation as in
`nonredundant CORDIC has to be performed. However,
`in
`this redundant CORDIC approach,
`the scale factor K is
`not a constant as [7,- E {r1._0.1}. Therefore, K must be
`computed for each tit-decomposed angle. This calculation can
`be performed in two steps [14]: 1) K2 computation using
`P[j+ l] : Plj]+|&j|2_2jPLj] with initial condition PM = 1
`and 2) K = x/F The K computation can be overlapped with
`the angle computation, so it does not increase the computation
`time. However, a division operation is needed for the final
`scaling operation.
`it was estimated that this scheme produces a speedup of 4.5
`for matrix triangularization and of 4 for SVD, when compared
`with the nonredundant scheme [14].
`
`III. CONSTANT—FACTOR REDUNDANT CORDIC
`
`To reduce the implementation cost of redundant CORDIC,
`especially for the scale factor calculation and the scaling
`operation,
`there is a need to develop a redundant CORDIC
`scheme with a constant scale factor. This is achieved if 6 is
`
`restricted to the set {—1, 1}. However, since to eliminate the
`delay of exact sign detection, & is obtained from an estimate
`of Yap], this does not assure convergence. We now discuss
`how to modify the scheme to assure convergence, first in the
`calculation of the sine and cosine functions (a review of a
`scheme proposed in [15]) and then in angle calculationtrotation
`mode (a new scheme).
`
`In the case of angle calculation, the problem of having a
`constant scale factor is more complicated because the direction
`of the rotation is obtained from an estimate of Y},[i.] and it is
`necessary to deal with the inter-dependency of the recurrences
`of X and Y. To use the estimate effectively, we utilize the
`modified recurrences of expressions (6). However, to achieve a
`constant scaling factor, instead of the rir—selection of expression
`(7), we use
`
`175:
`
`.
`
`{1
`
`—1
`
`if nip] 2 0
`if Wli] < 0
`
`(8)
`
`where WE] is an estimate obtained by truncating the signed-
`digit representation of W[i] to the ith fractional bit. As before,
`this requires Xa[1] 2 1/2, which is achieved in the same
`manner.
`
`The use of this 5 selection does not assure convergence
`because W[i.] can attain values outside the bound required
`for convergence. We now compute the convergence bound,
`determine the amount by which W [i] can surpass this bound
`because of the 6-selection used, and show how correcting
`iterations can restore the bound.
`
`Theorem 3.] (Bound): When the direction of the angle a,-
`is obtained using the exact sign, that is,
`
`o-— 1
`.,_
`_1
`
`ifW[i] 20
`if wn] <0
`
`W[i] is bounded as follows [14]:
`
`|W[-i]| < 2X[i].
`
`Proof: Since o,_1 is determined by the sign of W[i— 1],
`the largest magnitude of W[i] is obtained when W[t' — 1] = 0
`(03-4 = 1) in
`
`W[i] = 2(Wli ~ 1] — ois1X[i — 1]).
`
`Thus,
`
`A. Sine and Cosine Calculations
`
`|W[i]] < 2X[i.— 1].
`
`in [15] a correcting iteration scheme is used to have a
`constant scale factor when redundant CORDIC computes
`the cosine and sine functions. For these computations,
`the
`recurrences (3) are used and Er,-
`is obtained as in (4), except
`that an estimate of Z[i] is utilized. Because of this estimate,
`to assure convergence it is necessary to repeat some iterations,
`called correcting iterations. These repetitions are done at fixed
`intervals, where the frequency of repetition depends on the
`precision of the estimate.
`This approach could be directly extended to perform ro—
`tations by decomposing the given angle. However,
`it
`is not
`directly applicable when the angle is computed as tan—1(a/b)
`and this angle is used in decomposed form for the rotations.
`We develop the required modifications now.
`
`As shown in [21], X[i] g X[i + 1] for all i. Thus,
`
`|W[i]| < 2X[i _ 1] g 2m].
`
`
`
`:
`
`Fig. 2 shows the bound of Theorem 3.1. This bound cannot be
`maintained with the selection function of the CFR—CORDIC
`
`scheme [expression (8)], as shown by the following lemma.
`Lewmma 3.1: Assume that W [p] satisfies the bound of
`Theorem 3.1,
`that
`is |W[p]| < 2X[p]. If tip is determined
`from W[p] where W[p] is computed using 1 fractional bits of
`Who], then Wk) + 1] satisfies the new bound
`
`—2X[p+ 1] — 2324(1 + 2—21’]< W[p +114. 2X[p+ 1].
`
`Page 3 of 10
`
`Page 3 of 10
`
`

`

`LEE AND LANG: CONSTANT-FACTOR REDL'NDANT CORDIC
`
`JUN
`
`Wti]
`
`wp—n
`
`-ZX[i]
`
`-2X[i-l]
`
`0
`
`2X[i-ll
`
`211m
`
`Fig. 2. Convergence hound,
`
`Proof: Three regions of sz] need to be considered.
`Case a): WLp] > 0 (5p 2 1).
`In signed-digit number system, if Who] > 0 then W[p] > 0.
`since Whit] is computed using 1 fractional bits of Wlp] Thus,
`63, = 0-,, = 1. From Theorem 3.1, ]W[p +1]| < 2X[p +1].
`Case b): who] 2 0 tap : 1),
`[1' Wm] : D then —2_E < [rt/[p] < 2‘1 We divide this region
`into two. In the first, when —2_t < W[p] < 0, 0,, : —1 (not
`equal to a), From the recurrence,
`
`W]-p+l]=2(]—W[p X]p])
`X]p+l]:X[p]+22pmp].
`
`Consequently,
`
`WLD + 1] : 21m] _ 2{X[p +1]— 2-2pm“)
`: —2X[p + 1] + 2W[p](1+ 2—2?)
`
`and for the most negative value of W[p]
`
`Wlp + 1] > —2th + 1] — 2 :1 2—‘(1 + 2—29).
`
`011 the other hand, when 0 < W[_p] < 2" then 17,, : Er, : 1.
`Thus, from Theorem 3.1,
`|W[p + 1]] < 2X[p + 1].
`Case e): W [p] < 0
`Similar to case a), 6,, = 0,, : —1. Thus, from the Theorem
`3.1,
`iW[p + 1]| < 2X[p +1].
`El
`From a), b), and c), the proof is done.
`is out of the bound
`Fig. 3 shows the case when W by + 1]
`due to the use of an estimate, W[p]. Once Wle']
`is outside
`of the bound required for convergence, the amount outside of
`the bound accumulates in the foIIOwing iterations, as shown
`in the following lemma.
`Lemma 3.2: 1f72X@+]]72a2—‘(1+2‘2P) < W[p+1] <
`72X[p + 1],
`then the bound of iteration W[p + k] where
`k: = {2.3.---} becomes
`
`—2XLD+ k] — 21-10 + 2—23”) < Mp + 1] < —2X[p+ 1:].
`
`Proof by induction:
`1) Basis condition for k : 2,
`Since —2X[p + 1] — 2 at 24(1 + 2—2?) < W[p +1] <
`—2X[p+ 1], we obtain tit/[p + 1] < 0. Thus, a,“ = —1.
`From the lower bound and the recurrence, we have
`
` W[p —— 2] > 2(—2X[p + 1] —2 112—11 +2—2P) + le+ 1])
`
`W’Lp —— 2] > 2(—X[p + 2] — 2—2111“)th +1]
`— 2 , 2—‘(1 + 2411)]
`
`who * 2] > —2X[p + 211 22—’(1 + 241’].
`
`Page 4 of 10
`
`2W[p]
`
`2‘1Pt1wm
`
`—2X[p] —2X[p+l]
`
`72
`
`0
`
` 4
`
`1X[p+l]
`
`2X[p]
`
`Fig. 3, Out of the bound due to the use of an estimate.
`
`From the upper bound and the recurrence we have
`
`W11» + 21< 2t—2X1p+ 11+ ti +111 = 21—ti + 111
`Who + 2] < 2(—X['p + 2] — 2-2<P+UW[p + 1])
`< —2X[p + 2].
`
`Thus,
`
`—2X[p+2]— 221 (1+2 2" )<W[p+2]<——2XL;;+2].
`
`2) Induction step. Assume that it is true when k z i and
`prove that it is true when k = 1' + 1.
`By assumption,
`
`—2X[p +1] — 2"—*(1 + 241’) < W[p+ 1:] < —2X]p +12].
`
`Therefore, W[p + if] < 0 and 6,,“- = —1. From lower bound
`of assumption and from recurrence,
`
`W[p+ 2' + 1] > 2(A2XL11+1] — 21"(1+2‘2P) + X[p+'i])
`W[p +1+ 1] > 2(—X[p+ 1' + 1] — 2—2(P+”W[p +1]
`— 2*-*(1 + 2-2?»
`th+1+11> —2X[p+ i + 1] — 2"11—‘(1 + 2—29).
`
`From upper bound of assumption and from recurrence,
`
`W[p+i+1] < 2(—2Xfi1+1‘]+X[p+i]) = 2(—X|p+1‘])
`W[p+ 1: + 1] < 2(—X[p +1+ 1] — 2—2meVLu-t— 1])
`< —2X]p + 1' + 1].
`
`Thus,
`
`—2X]]J +1+ 1] — 2"+1**(1 + 2’2?)
`
`< Wlp+t+ 1] < —2X[p+1'+1].
`
`CI
`Therefore, from 1) and 2), the proof is done.
`Fig. 4 shows the bound of W[p + 2]
`to explain how the
`amount outside of the bound accumulates once Wk) + 1]
`is
`out of the bound.
`
`To recover the original bound of ]W[i + 1]] < 2X[i + l], a
`correcting iteration needs to be performed. The same angle of
`the previous iteration is used for the correcting iteration,
`i. e.,
`2 ' instead of 2 5+1):5 used with Y[i + 1] as shown below.
`
`X011: + 1] = 1([1‘ + 1] + (35'2’EY[1‘+ t]
`YC{1‘+ 1] : 11-1 + 1] — a?2-*‘.vr[z' + 1].
`
`The corresponding shifted recurrence becomes
`
`X" [1: + 1] =
`
`WC[1+ 1] =
`
`X[i + 1] + 592-21—111/[1 + 1]
`W[1'+ 1] — 26,-0X[1'+1].
`
`
`
`Page 4 of 10
`
`

`

`
`
`lllill
`
`lLiEl; 'I'HANSNL'TIDNS ON COMPUTERS, VOL. 4t, N0 8. AUGUST 1992
`
`CW
`
`li+l]
`
`axiom]
`
`
`
`o
`
`2xn+n
`
`Fig. 5. Recovering the convergence heund by a correcting iteration.
`
`El
`Thus, the largest integer it becomes 1‘. e 1.
`As indicated by Theorem 3.2, a correcting iteration puts W
`inside the convergence bound, if it was outside of the bound
`before this iteration. However, as shown in the next lemma,
`this correcting iteration might take W out of the bound if it
`was inside the bound before.
`
`If ]Wii + 1“ < ‘ZXii + 1]. the bound after a
`Lemma 3.3:
`correcting iteration
`
`WCl-i + 1] ~ Wi-i+1]— 2af‘xn + 1]
`
`becomes
`
`~2‘r — 2X[.r:+1]< Wait + 1] < 2m + 1].
`
`Proof: The proof is very similar to that of Lemma 3.1
`and is omitted.
`El
`From Lemma 3.2, Theorem 3.2, and Lemma 3.3, we obtain
`the following corollary.
`Corollary ii: If t fractional bits are used for the estimate,
`then the interval between correcting iterations should be less
`
`than or equal to (t — 2).
`l:
`The fact that a correcting iteration does not have to be
`included exactly every (if?) iterations provides flexibility and
`helps to minimize the number of additional iterations required
`to avoid an explicit sealing operation. Also, note that
`it
`is
`possible to repeat a correcting iteration more than once if it
`helps to minimize the total number of iterations for a certain
`K. This additional correcting iterations should not be counted
`as one of the iterations of CORDIC when determining the
`index of the next correcting iteration.
`Since the frequency of correcting iterations is determined by
`it, to reduce their number it is necessary to increase 5. However,
`this makes the or selection slower. Consequently, the value of
`t has to be chosen to reduce the overall execution time,
`
`C. Reducing the Number of Correcting iterations
`
`From the previous discussion we can determine the mini-
`mum numbcr of correcting iterations required. We now show
`that it is sufficient to have only one correcting iteration in the
`second half, if for that part the deselection of expression (7)
`is used. That is, we split the iterations of CORDIC (for -i : t)
`to 1' : rt — 1} into two groups as follows:
`1) Selection function for t] g i 5 tit/2
`
`- ,
`(If; —
`
`1
`*1
`
`if Wit] 2 it
`.
`r,
`.
`1f W [t] < 0.
`
`(9)
`
`As shown earlier. correcting iterations must be included for
`convergence for
`this group and the number of fractional
`bits used for W[-.€] determines the frequency of correcting
`iterations.
`
`
`
`axing]
`
`-2X[p+1]
`
`0
`
`2X[p+1].
`
`2le+2]
`
`Fig. 4. Accumulating the amount outside of the bound.
`
`The direction of the angle of is determined from Witt-1]
`instead of tilt]. However, notice that
`the same selection
`function in (8) is used for this iteration. Fig. 5 shows the case
`where a correcting iteration recovers the bound of Theorem
`3.1. The next theorem determines the frequency of correcting
`iterations required.
`ll‘l’ip“ < 2X[pi and the di—
`Theorem 3.2: Assume that
`rection of the angle, aw“ is determined from the estimate
`l-i-’[p+k] for all k. 1ft fractional bits are used in the estimate,
`then a maximum of t’. a l
`iterations can be done before a
`
`correcting iteration to recover the convergence bound.
`Proof: Let us assume that k iterations are performed after
`the path iteration. From Lemma 3.2, the bound of W[p + it] is
`expanded as follows due to the use of estimate
`
`—‘2X[‘p + M _ 2"”(1 + 2-”) < WLD + k] < ext” + k].
`
`then it satisfies the bound of
`If it‘l'ip + is“ < 2X[-p + k}.
`Theorem 3.] and a correcting iteration is not needed to recover
`the hound.
`
`Since we are interested in finding out how many iterations
`can be performed before recovering the bound by a correcting
`iteration, we need to consider the case when Wit? + k] is in
`the following range:
`
`—2X[p + t] — 2*'-t(1+ 24!); 4 mp + k] < —-2X[p+ at.
`
`in this case, op“. : -—l_sincc W[p+ k] < 0. By applying the
`correcting iteration. Wc'lp + k] becomes
`
`ltd-"hi + it] = My + t] + 2X[p + a] > _2*'—'{i + 241’).
`
`To recover the bound of Theorem 3.1, we need to find it
`satisfying the following inequality.
`
`|— ilk—’[l + 2-2pm z. 2m) + it].
`
`Therefore,
`
`2*“f <
`
`'3le + kl
`(t + 2—31“).
`
`is determined by considering the smallest
`The value of it:
`possible value of the right—hand side. For this, we consider the
`smallest value of 2X[])+ is] and the largest of (1 + 2’2”), i.e.,
`we should consider the smallest value of p. Since we assume
`that the input ”[0] is in conventional number representation,
`the direction of the angle do is correct, regardless of the use
`of an estimate. Thus, the smallest if) to consider is 1. Now we
`want to determine the smallest value of X [p+ kl. In [22] it is
`shown that the smallest value of Xfin + k] is 3/8. Therefore,
`2K“ J-Lii"
`.
`
`R—l‘< —l0gU—+g—Jfl% (log;
`log ‘2
`log ‘2'
`
`Page 5 of 10
`
`Page 5 of 10
`
`

`

`LEE AND LANG: CONSTANT-FACTOR REDUNDANT CORDIC
`
`102]
`
`2) Selection function for n/2 < i < n
`
`a, __.
`
`1
`o
`—1
`
`if Vito] a g
`if tit/[r] = o
`if we] 3 —%.
`
`(10)
`
`Utilizing the fact that the value of K in n—bit precision is not
`affected by r}, for n/2 < i S (o i l}, the selection function
`of redundant CORDIC (with a variable scale factor} shown
`in (7) is used for the second group. This assures convergence
`without correcting iterations; however, we need to include one
`additional correcting iteration (independent of it) since the last
`correcting iteration of the first group might fall into the case
`of Lemma 3.3.
`
`D. Making Scale Factor (1 Power of Two
`
`Since the scale factor is a constant, it is possible to utilize the
`same methods for scaling used for the conventional CORDIC
`algorithm. As mentioned before, the most effective method is
`to use additional iterations (CORDIC and scaling) to make
`the scale factor
`a power of two.
`since in that case the
`scaling can be done by shifting, which is implemented by
`wiring. As in the conventional case, the issue is to minimize
`the number of additional iterations; however, the problem is
`somewhat more complex because of the correcting iterations.
`As indicated in the previous section, there is some freedom in
`the placement of these iterations; therefore, the problem is to
`obtain the minimum number of iterations with the correcting
`iterations suitably placed. We developed an efficient method
`of exhaustive search, called decomposed search, to reduce the
`searching time from T to if? [22]. For the case in which
`six fractional bits are used in the estimate,
`the number of
`additional iterations for 16-bit and 32-bit precision is shown
`in Table l. The corresponding scale factor is equal to two.
`The additional iterations require an increase in the number
`of guard bits since this number shOuld be at least log or, where
`m is the number of iterations. For the solutions shown in the
`
`Table l, we get five guard bits for the 16-bit case and six
`guard bits for the 32-bit one.
`
`IV. REDUCING THE NUMBER OF
`Irenarrous AND ON-THE-FLY Convaasrou
`
`We now describe a method for reducing the number of iter-
`ations for rotation, which can be applied both to nonredundant
`and redundant CORDIC. Moreover, we develop a scheme for
`performing the conversion of the result from redundant
`to
`22’s complement in an on-the-fiy manner. This scheme was
`not necessary for the redundant CORDIC with variable scale
`factor, since the onitherfiy conversion can be done during the
`division for scale factor correction.
`
`A. Radix—2 and Radix-4 Iterations for Rotation
`
`To speed up the CORDIC rotation, we propose a scheme
`that reduces the last 11/2 iterations to NM by the use of radix-
`4 rotations. One possibility would be to directly decompose
`the angle into radix-4 o’s; however, that would complicate
`significantly the selection function. Consequently, we continue
`
`TABLE I
`EXAMPLES or ADDITIONAL ITEMI'IONS Port 1 = 6
`.
`No. of
`.
`.
`No. of
`..
`.
`.
`.
`.
`Additional CORDIC
`iterations
`bits
`additional
`Scaling llBleanS
`iteralions
`
`16-bit
`6
`+2, +10, —5
`4, 8, 9
`
`+2, +10, +17, —5, —22, w281232bit 4, S, 9, 11, 15, IT
`
`
`
`decomposing the angle into radix-2 0's and combine these to
`form a radix—4 rotation. This can be easily done only when
`tan—K24) : 2‘ii for the required precision. Moreover,
`in
`order not to change the scaling factor, this should be done
`only for i 2 2-1/2. The following theorem shows that this is
`satisfied for i 2 n/2.
`Theorem 4.1: Let m be the total number of iterations. Then,
`Sin/2Q"- — tan—1(2")) < 2_(“+‘°5m) for m < 2””.
`Proof? The series expansion for tan’1 3 where he! 5 1 is
`
`t
`
`an
`
`x7
`1:5
`3:3
`1
`‘ =e—# ———+
`3 + 5
`7
`.r'
`
`Thus,
`
`For a: = 2 ‘,
`
`J:5.
`x3
`m-tan’lx=—;-—7+H
`.i
`o
`
`_
`.
`2_'—t. _12-t:
`an
`
`
`273:
`3
`
`2—51'
`__..
`5 +
`
`.. <
`
`
`2#31
`2
`
`Thus,
`
`
`2(2 IIitzin1(2“I))<21'5"1(1+2 3+2-‘5+..-)
`L—gf
`
`< 32—1-5” < 2—(n+logrn)-
`
`Consequently, a sufficient condition is
`
`m < 2"”.
`
`El
`
`Therefore, when 3' 3 12‘, a rotation (iteration) with an angle
`91' = or, tan‘1 2* can be regarded as an iteration with an angle
`0,2“, so that two consecutive iterations can be expressed with
`one radix-4 direction of rotation since
`
`9i + 6H1: 052—!- +0r+r2_”+n = (20'; +Ut+1)2_(i+1)-
`
`The set of values of (20, +o,-+1) is {3:3, :|:1} ifo 6 {i1}
`and {:I:S,:l:2,:l:1,0} if or E {$1.0}. To avoid multiplica-
`tion by 21:3, we developed recoding schemes into the set
`{5:2,:t1,0} [22].
`One problem that appears with this reduction in the number
`of iteration in the rotation,
`is the imbalance between the
`number of iterations of angle calculation (all radix72) and
`of rotation (radix-2 and radix—4). Moreover, to perform the
`radix-4 rotations it
`is necessary to have performed already
`the first of the pair of radix-2 angle iterations. Fortunately,
`the scaling iterations for making the scale factor a power of
`two have to be done only for the rotation (and not for the
`angle). Consequently, if these scaling iterations are done at the
`beginning together with the first half of the angie iterations,
`
`Page 6 of 10
`
`Page 6 of 10
`
`

`

`“J22.
`
`IFEE TRANSACTIONS ON COMPUTERS, VOL 41, N0. 8. AUGUST 1992
`
`it is possible to match the timing. To achieve this match, a
`solution with a suitable number of scaling iterations should
`be selected.
`
`3. ()n-tl‘te—F1y Conversion
`in most
`instances the result of
`
`the rotation has to be
`
`converted to conventional representation. For the scheme pre—
`sented in [14]. since the scale factor is variable this conversion
`is done as part of the scaling operation (division) using the on-
`the fly conversion method described in [16]. Since the explicit
`sealing Operation by division is avoided in CFR—CORDIC,
`there is a need to develop an on-the-fly scheme to convert
`while the CORDIC rotation is performed. The preperty of
`after CORDIC which is the generation of approximately one
`bit of the output per iteration starting from most significant
`bit, makes it possible to use a similar conversion scheme.
`However, the fact that the output is not really produced in an
`on-line manner (one bit per iteration) [23] makes the scheme
`more complex. We first briefly review the conversion scheme
`presented in [16] and then present the adaptation for CORDIC
`rotation.
`
`Review of the on-rhe-flv conversion scheme presented in
`[16}: Consider a unit {such as SRT division) which produces
`a signed—digit result (we describe the radix-2 case)
`
`P=Zp12_i.
`i=1
`
`p1c[ 1,0.1}
`
`one digit per iteration beginning from the most significant.
`Then i” can be converted on—lhe-fiy into its 2’s complement
`representation
`
`i'i'i
`
`Q=rqo+2v.2_'.
`i:1
`
`q;e{0.l}.
`
`The algorithm produces a converted digit q,- as it obtains a
`signed—digit p, from most significant to least significant bit.
`The simple algorithm QM] = Q“; — l] +pk2‘i“ produces Q =
`Qirn]; however, when pt, -_ —1, this requires the propagation
`ofa carry and is therefore slow. To avoid this propagation, the
`on—the—fly conversion algorithm keeps two conditional forms:
`All: — 1] expecting pi. = 1 or 3);, : 0, and BM — I] expecting
`pr. : —1, where Bile ~1]:A[k — 1] — 2 “i U.
`The initial values of A[l] and B[l] are
`
`_ [LN-HIS)
`
`Am _ { 1.1(_n_s)
`Bili‘{l.0(—l)
`
`_ 011(0)
`
`if pi : l
`
`ifpl = ,1
`ifpl:—1
`
`if p1 = l
`
`Note that pt 75 0 since P is normalized.
`
`The recurrences on Able] and B[k] for it 3— l are
`
`A[k] 2
`
`BIL-t 2
`
`if'pk =
`A“: - 15+ 2—}:
`1f int- = (l
`AM —- 1]
`B[k—1i+2“k ifpkz—l
`A“: - 1]
`if m. _1
`Bit- -1]+ 2"“
`it 3),, = o
`BER—1]
`ifka-nl”
`
`Page 7 of 10
`
`The final result Q = Abra].
`Orr-tltc—fly conversion scheme for redundant rotation: For
`CORDIC rotation both X and Y have to be converted; here
`we describe the conversion of X only since the other one is
`similar. The previous algorithm cannot be applied directly to
`the CORDIC case because the result is not obtained one hit
`
`the conversion
`per iteration. However, as shown in Fig. 6,
`scheme utilizes the fact that in the calculation of X['1i+ 1] the
`operand VB] is shifted right by :2 positions. Consequently, the
`most—significant part can only be modified by a posnive or
`by a negative carry coming from the least-Significant part. As
`shown in the figure, XM can be divided into three regions:
`1)
`the converted region of _;: bits (represented by two
`conditional forms Ali] and AMp] : Ali] — 2‘3')
`2) the window it,- corresponding to bits 3' + 1 to r}.
`3)
`the region beginning from bit
`(i. + 1), where the addi-
`tiom’subtraction is performed. Note that the first j bits
`of Yii] in this region are already converted.
`Since we want to select only between two conditional forms,
`we restrict the carryout of the window to be either 0 or minus
`1. The window R,-
`is used to absorb the carry produced by
`the addition/subtraction so that only a minus I carry can cross
`into the first region.
`We now need to determine the number of bits of the window
`
`Hi. This depends on the set of values of a. We have developed
`the three possible cases that appear in the rotation scheme
`described before, namely. 1} in radix—2, 2) in radix-4, and 3)
`repetition of two radix—2 sta

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket