throbber
1992 IEEE
`INTERNATIONAL SYMPOSIUM ON
`CIRCUITS AND SYSTEMS
`
`Es
`
`Volume 1 of 6
`
`Sheraton Hotel
`San Diego, CA
`May 10-13, 1992
`
`92CH3139-3
`
`Page 1 of 5
`
`SAMSUNG EXHIBIT 1043
`
`Page 1 of 5
`
`SAMSUNG EXHIBIT 1043
`
`

`

`
`
`A Normalization Scheme to Reduce Numerical Errors in Inverse
`Tangent Computations on a Fixed-point CORDIC Processor
`
`Kishore Kota
`
`Joseph R. Cavallaro
`Department of Electrical & Computer Engineering
`Rice University, Houston, TX 77251
`
`Z-Reduction (for Vector Rotations, sine, cosine functions)
`Initial Values
`Final values of variables
`
`Abstract
`
`CO-ordinate Rotation Digital Computer (CORDIC) Is a unified
`arithmetic algorithm that allows efficient VLSI implementation
`of several elementary functions. Many sp~cial-purpose systems
`for real-time signal processing applications use a fixed-point rep-
`resentation of numbers due to the ease of implementation. An
`analysis of fixed-point CORDIC in the Y-reduction mode, which
`allows computation of the inverse tangent function, shows that
`unnormalized input values can result in large numerical errors.
`This paper preseuts a method tu integrate ble wormalization op-
`eration with CORDIC iterations for efficient implementation in
`O(n1*) hardware.
`1
`Intreduction
`
`The CO-ordinate Rotation Digital Computer (CORDIC) is an
`arithmetic technique that allows efficient computation of a vari-
`ety of elementary functions. The CORDIC algorithm is attrac-
`tive from a hardware point of view since it uses only primitive
`operations such as shifts and additions to implement more com-
`plex functions including sine, cosine, tangent, arctangent, sinh,
`cosh, tanh, arctanh, In and exp. A numberofreal-time signal
`processing and image processing algorithms have been developed
`to make efficient use of CORDIC in special-purpose parallel ar-
`rays.
`Interest in CORDIC has spurred recent work to improve
`the basic algorithm. Techniques have been developed to increase
`the range ot convergence [4], speedup the computation by reduc-
`ing the number ofiterations (7] and pipeline the computation
`through conventional techniques or through the use of on-line
`arithmetic [3].
`The basic CORDIC iterations as specified by Walther [9] are:
`
`zy + mbjy;27*
`Liat
`Vier = wer mdz",
`Tig. = 2 + 6j;tan727-*
`
`(1)
`(2)
`(3)
`
`where z;, yj and z; are the states of the variables r, y and = at
`the start of the iiteration, 0 <i <n, and 6; € {-1, +1}. The
`inverse tangents, a; = tan7!27', are pre-computed and stored
`in a table of angles. CORDIC iterations can be performed in
`three different modes, hyperbolic, linear or circular, depending
`on the choice of m € {-1,0,1}. All results in this paper have
`been obtained for the circular mode of CORDIC. The different
`functions implemented in this mode are summarized in Table 1.
`The computation of interest in this paper is inverse tangent,
`which requires CORDIC operation in the vectoring or Y-reduction
`mode. The Y-reduction mode is selected by a choice of 6; at each
`iteration given by:
`
`1
`
`6 = { 4
`
`if ya; > 0
`if vizi< 0.
`
`(4)
`
`The above choice (10] corresponds to the choice of 6;, which trics
`to reduce the magnitude of y,; to zero.
`
`244
`
`0-7803-0593-0/92 $3.00 1992 IEEE
`
`Page 2 of 5
`
`
`
`(29 cos @ + yosin#)/K,
`(yo cos @ ~ zqgsin 9)/K,
`0
`
`25+ ¥h/Kn
`0
`@ = tan! (
`
`Yo
`To
`
`)
`
`Table 1: CORDIC Functionality in the Circular Mode
`
`CORDIC has previously been implemented primarily using a
`fixed-point representation of the numbers. Design of a floating-
`point CORDIC unit requires handling of numerous special cases.
`which complicates design (2].
`In particular, a floating-point im-
`plementation of a system that uses CORDIC requires normaliza-
`tion at every stage of computation, which is not necessary with
`a fixed-point representation.
`
`2 Error Analysis of CORDIC Y-Reduction
`In the study of numerical error in CORDIC,it is convenient to
`split the errors due to different factors as an approximation error
`and a truncation error [5]. The approximationerroris a result of
`the finite number of iterations in an implementation of CORDIC.
`A truncationerroris caused by the finite word length of the data
`paths. The following notation is used in the derivations.
`
`e 2), 9i, 2; represent the values obtained in a real-world im-
`plementation of CORDIC. These numbers include the effects
`of both approximation errors and truncation errors.
`
`Li, Yi, 2 Tepresent the values that will be obtained in a hy-
`pothetical CORDIC unit with infinite precision, where every
`iteration uses the same sequence of 6;s used in the compu-
`tation of Z,, ¥; and 2,.
`
`© 2, fi, 2; represent the values that should be obtained for the
`desired function with infinite precision, as mathematically
`dofined.
`
`Let Q[.] denote the quantization operator. The result of the
`quantization operator on a vector is defined as the vector ob-
`tained through truncation of each vector element. The followin
`lemma, relating the vector vn = (tn, yn]? and ¥a = [in,jn]
`can be derived from the z and y iterations given by Equation 1
`and Equation 2 respectively.
`
`Page 2 of 5
`
`

`

`2 g F
`
`tan-'(1) =
`the computed value of
`in
`Error
`igure 2:
`tan—! (yo/yo) versus yo in 2 CORDIC module that implements
`the partial normalization scheme
`
`s4
`
`Equation 1, Equation 2 and Equation 3 with the choice of 6; at
`each iteration given by Equation 4. The convergence property
`of the CORDIC algorithmguarantees that J, is reduced to close
`to zero. A detailed error analysis |6] of Y-reduction shows that
`the approximation error forms an insignificant fraction of the fi-
`nal error and does not affect our results. Hence, in this paper
`we present a simplified analysis that neglects the approximation
`error.
`
`In a hypothetical CORDIC unit with infinite precision, the fol-
`lowing relation holds:
`(8)
`Yn = (yo cosa — zosina)/ Ky
`where a has been defined in lemma 2.2. Let r and @ be defined
`as
`
`= ,/rity2-
`tané
`zh+yg ;
`r
`Equation 8 can now be rewritten as:
`Kayan = rsin(@-a)
`
`_ »
`.
`
`(9)
`
`\@-a| =
`
`sin7! (5Vz6 +93
`
`
`
`is bounded by
`Using lemma 2.1, we can conclude that y, — §,
`(n+1)/2e. Similarly, using lemma2.2, we can conclude that|y~—
`a|, which is the angle |2,, — z,|, is bounded by ny. Substituting
`these worst case bounds, we obtain the following relation
`
`(10)
`
`J@-gl< ie
`
`in| + np
`
`Vr5 +98
`Here, @ is the true inverse tangent that is to be evaluated. The
`final value accumulated in the » variable, 2, = y, ie the angle
`actually obtained. Thus relation (10) gives the numerical error
`in inverse tangent calculations.
`Figure 1 shows the errors observed in the computation ofin-
`voreo tangont for various initial valuce. It chowo that when oy
`and yo are close to zero, a large error results. An intuitive ex-
`planation of this error is the successively larger right shift in the
`CORDIC iterations given by Equation 1 and Equation 2, which
`can result in a loss ofall significant bits if rg and yo are not large
`enough. A similar problem does not occur in a floating-point
`CORDIC module, since in such an implementation the relative
`
`Value of YO
`
`1
`
`1
`
`0 9 8 7 6 5 4 3 2
`
`o-
`
`T
`
`No.ofbitsinerror
`
`Figure 1: Error in the computed value of tan~'(1)=tan— (yo/yo)
`versus Yo in a CORDIC module without normalization. The error
`has been quantized into the number of bits in error. A large
`errer is observed for emall values of yg. Thie experiment hao
`been performed on a datapath that is 16 bits wide. The selected
`range of yo forms about 4th ofthe entire range of yo.
`
`Lemma2.1
`
`[va — Val < V2e (1435j=0 n-1
`
`n=l
`
`I] &
`inj
`
`(5)
`
`Proof This is a result derived by Hu (5]. The quantity ¢ is the
`precision of the xy-data path. The scale factor A; is defined as
`K, = [Uzhcosa;, The aboverelation holds forfixed-point data
`paths. A plot of Dyce (mRs} a versus n shows thatit is very
`close to n for all practical values of n such as 16, 32 or 64. Thus
`the result reduces to:
`(6)
`|e. —val < /2e(1 +n)
`The number of iterations, n, and the width of the data path
`are related. Since a wider datapath implies greater precision, an
`approximation error that is of the same order, can be achieved
`vuly through more iterations.
`
`o
`The truncation error due to finite precision of the z datapath can
`be quantified by the following lemma.
`Lemina 2.2
`n=-1
`n-1
`a- >> 6Qlai]= >> bi; where
`i=0
`1=0
`
`pi =a;-Qlai
`
`(7)
`
`Proof The angle a is defined as a = )-?7) d;a;. Hence, a is
`the total angle that is either added to or subtracted from zo in a
`hypothetical CORDIC unit withinfinite precision. The resultfol-
`lows from definitions of a and u;. If a fixed-point representation
`is used, thenall the y;s are bounded above by y, the precision of
`the z-data path.
`
`a
`
`Z.L Perturbation in the Computed Value of the In-
`verse Tangent
`The goal of Y-reduction iterations is to compute the inverse tan-
`gent 9 = tan-!(yo/z9). Initially, zo is set to zero and the inverse
`tangent is accumulated in z,. The angle actually accumulated
`in 2, is wy. This is achieved by performing iterations given by
`
`Page 3 of 5
`
`Page 3 of 5
`
`

`

`
`
`begin
`for i:=0 tondo
`if (z; < MAX_X and y; < MAX.Y) then
`Choose rinder such that
`xgshift[zinder] © Wf AX_X and
`xpshift|zinder+ J] > MAXX ;
`Choose yinder such that
`yiashift[yindez] < xfAX¥ and
`y2shift[yinder+1] > MAK_Y
`inder := min( xindex, yindex ):
`else
`
`’
`
`indez := 0;
`
`end
`
`Normalization Shift p := shift(indez']
`Perform modified CORDIC Iteration;
`end
`end
`
`
`Figure 3: Modified CORDIC Iterations,
`reduction when the input is not normalized
`
`invoked during Y-
`
`ie. = Q [si - 64:2-'] 2°,
`where i is the iteration index and Q{.] is the quantization opera.
`tor. This is followed by (n —q/p) unmodified CORDIC iterations.
`The two inverse tangents computed, t and t', will be identical if
`q < &, where €
`is an upper bound given by
`
`(12)
`
`€ = 2p?
`
`Figure 1 shows that a large error in the computed inverse tan-
`gent can occur if [Vol is small. However, if the initial values are
`bounded below by a value that is in the order of 2, since the
`numerator in Equation 10 is hannded ahave hy W,,/9(n + 1,
`the error will be bounded. Hence, a lower bound on |¥o|, enforced
`through normalization of the input values, can be used to control
`the error. If this normalization takes the form of a left shift of
`both the inputs. it appears in the final output, +, as a shift and
`does not affect the computed inverse tangent, except for better
`accuracy.
`We observed the above problem when designing a processor(6]
`that computes the Singular Value Decomposition (SVD) of a ma-
`trix using CORDIC arithmetic [1],
`In a fixed-point implementa-
`tion, when CORDIC is used to perform operations on numbers
`obtained in a long chain of calculations, one cannot guarantee
`that the numbers are always in a normalized form. A straight-
`forward way to eliminate the resulting numerical problemsis to
`explicitly normalize the inputs before proceeding with CORDIC.
`Implementation of normalization requires hardware to deter-
`mine the amount by which components of the vector can be
`shifted left, and hardware to implement the shifts. The simplest
`implementation uses two leading-zero encoders to determine the
`amounts through which each component, #o and jo, need to be
`shifted; a comparator to choose the smaller of the twoshifts; and
`a barrel shifter to implement the shifts. This generic solution
`to achieve normalization incurs an O(1) time penalty and O(n?)
`area penalty.
`A tradeoff between time and area [8] can be made by imple-
`menting a shifter that can shift through selected powers of two,
`and performing the desired normalization shift in logn cycles.
`Such an approach reduces the complexity of the barrel shifter to
`Proof This result is obtained by observing that theleft shift
`of p introduced at the end of a CORDICiteration introduces p
`O(nlogn), while the complexity of the leading-zero encoders is
`zeros in the lower order bits that can be safely shifted out with-
`stil O(n*).
`‘Lhe implementation is considerably more complex
`out losing any significant bits. However, as the iteration index
`and incurs a time penalty of O(log n),
`increases, the right shift at each iteration increases, while the
`Jn this section we present a new normalization technique that
`left shift introduced by normalization is a constant. The upper
`incurs an area penalty of O(n!) but no time penalty. This tech-
`nique, which we call partial normalization, achieves the desired
`bound ¢ is the point at which the p js no longersufficient to pre-
`vent anyloss of bits. A detailed proof of this result can be found
`normalization shift as a combination of smaller shifts merged with
`in a thesis [6].
`the CORDICiterations. At the end of eachiteration, the vari-
`o
`ables @; and # are shifted left by a few bits, decided by control
`This thevrem slows that if the CORDIC Iterations can be modi-
`logic, until the desired total shift is achieved. The design goal is
`to minimize the maximum shift that needs to be introduced at
`fied to include a constant left shift p then normalization through
`any multiple of p but less than 2p? can be achieved. In practice,
`any iteration. The normalization shift introduced at every iter-
`the parameter p can be made a variable, allowing one of several
`ation prevents CORDIC from shifting out and losing significant
`shifts to be chosen at each iteration. An appropriate choice of
`bits in subsequent iterations. The above scheme results in two
`these shifts can achieve any desired total shift.
`competing processes. The partial normalization scheme can only
`Suppose pin Equation 11 can be chosen fromatotal of 4 shifts,
`implementa total shift that grows linearly as the numberofiter-
`and the condition shift(A — 1] > shift{A —2] > +--+ > 0 holds. The
`ations. CORDIC, however, tends to shift out bits as a square of
`algorithm given in Figure 2, gives a choice of shifts at each it-
`the numberof iterations duc to the successively iucceasing shift,
`eration to normalize the input. The parameters MAX_X and
`i, at each iteration. The following theorem quantifies this idea.
`MAX.Y are the maximum allowed values of zp and yo chosen
`Theorem 3.1 Let t'=tan7!(§o27/Z927), be the value calcu-
`to prevent overflow. The required control can be implemented
`lated by preshifting Zo and to and using an unmodified CORDIC
`using combinational logic: a pair of leading scro oncoders and o
`unit, where q ts an integral mulliple of a constant p. Let t =
`comparator to select the smaller shift from the output of the en-
`tan—"(#o/%o) be the value calculated by using a modified CORDIC
`coders. The additional hardware required to implement partial
`untt and unnormalized initial values. The modified CORDIC unit
`normalization is a pair of shifters, which implement a subset of
`initially performs q/p iterations of the form
`all the shifts implemented by a complete barrel shifter. Conse-
`quently, the leading-zero encoders need to encode very few bits
`to determine the appropriate shift.
`
`error, rather than the absolute error, is bounded at each itera-
`tion.
`Intuitively, the errors in such an implementation have to
`be smaller since the initial values are, by definition, always in a
`normalized form.
`
`3 Normalization for CORDIC Y-Reduction
`
`Sar = Q [is + 6:g:27'] 2”
`
`(11)
`
`246
`
`Page 4 of 5
`
`Page 4 of 5
`
`

`

`
`
`
`
`logic grows
`control
`the
`Control Logic: The size of
`O((Maximum Shift required)?) = O(,/") = O(w).
`Time Penalty: Since the shifting is performed as part of the
`CORDIC iterations, the only time penalty incurred is a de-
`crease in the clock rate due to the extra propagation de-
`lay caused by the presence of an additional shifter in the
`CORDIC data path.
`4 Conclusions
`
`as
`
`An analysis of CORDIC Y-reduction, assuming a fixed-point rep-
`resentation of numbers, shows that unnormalized input values
`can result in large numerical errors in the computed inverse tan-
`gent. The complexity of a floating-point implementation uf the
`entire system can be avoided by locally normalizing the values
`before using CORDIC. We have implemented the normalization
`using a method that integrates it with the CORDIC iterations
`resulting in an elegant solution to the problem. This method -
`requires only O(n!) extra hardware and does notaffect the la-
`tency. We believe this scheme can be extended to the other modes
`of CORDIC,
`
`Acknowledgements
`This work was supported in part by the National Science Foun-
`dation under Research Initiation Award MIP-8909498.
`Heferences
`
`(1) J. R. Cavallaro and F. T. Luk. CORDIC Arithmetic for an
`SVD Processor. Journal of Parallel and Distributed Com-
`puting, 5(3):271-290, June 1988.
`
`(2
`
`[3]
`
`[4
`
`{5}
`
`(6
`
`[7]
`
`J. R. Cavallaro and F. T. Luk. Floating-Point CORDIC
`for Matrix Computations.
`[EEE Int. Conf. on Computer
`Design, pages 40-42, October 1988.
`
`Redundant and On-
`M. D. Ercegovac and T. Lang.
`Line CORDIC: Application to Matrix Triangularization and
`SVD. IEEE Trans. Computers, 39(6):725-740, June 1990.
`
`X. Hu, R. G. Harber, and 5. C. Bass. Expanding the Range
`of the CORDIC Algorithm.
`[EEE Trans. on Computers,
`pages 13-21, Jan. 1991.
`
`Y¥. H. Wu. The Quantization Dffects of the CORDIC Alyu-
`rithm. Intl. Conf. on Accoustics, Speech and Signal Process-
`ing, pages 1822-1825, April, 1988.
`
`kK. Kota. Architectural, Numerical and Implementation Is-
`sues in the VLSI Design of an Integrated CORDIC SVD
`Processor. Master's thesis, Rice University, Department of
`Electrical and Computer Engineering, May 1991.
`
`IL. Hu, and W. J. Yu. Doubly Pipelined
`T. Y. Sung, Y.
`CORDIC Array for Digital Signal Processing Algorithms.
`IEEE Int. Conf. on Acoustics, Speech, and Signal Process-
`ing, 2:1169-1172, April 1986.
`
`(8} J. D. Ullman. Computational Aspects of VLSI. Computer
`Science Press, Rockville, MD, 1984.
`
`(9] J. S. Walther. A Unified Algorithm for Elementary Func-
`tions. AFIPS Spring Joint Computer Conys., pages 379-385,
`1971.
`
`(l0] B. Yangand J. F. Béhme. Reducing the Computationsof the
`SVD Array Given by Brent and Luk. SPIE Advanced Algo-
`rithms and Architectures for Signal Processing, 1152:92-102,
`August, 1989.
`
`XY-CHIP
`
`.TEESeS
`ooEncoder
`
`Normal-
`ization
`Shifter
`
`Peston
`
`Figure 4: Implementation of the CORDIC Iterations in Hardware
`
`3.1 Choice of a Minimal Set of Shifts
`The following is an example of a CORDIC unit with a data-width
`w = 16 bits and nr = 16 iterations. The higher order three bits
`correspond to one sign bit and two overflow bits. Hence, for
`this example, the parameters MAX_X and MAXLYare set to the
`values of x and y corresponding to the bit pattern “0010 0000
`0000 0000”. The algorithm in figure 3 chooses the largest of
`{0,1,3} shifts, such that the resulting bit string after, has three
`leading zeros (or leading ones if negative).
`« To permit unmodified iterations in a modified CORDIC unit
`a shift of zero is necessary > shift[0] = 0
`An overall normalization shift of 1 can be achieved only by
`choosing p — 1. Thus a shift of 1 is necessary in any imple
`mentation = shift(1] = 1.
`Multiple iterations with p = 1 can achieve a shift up toa
`maximum of 2p? = 2. A shift of 3, however, cannot be
`iaipleuimeuted as thace modified CORDIC iterativous with p=
`1, since this will result in a loss of significant bits. This
`necessitates the inclusion of the shift p = 3 to the set >
`shi ft{2] = 3.
`The shift p = 3 can achieve any shift up to a maximum
`2p* = 18. Since the maximumshift required is 13, the nor-
`malization shifter does not have to implement any shift other
`than 0,
`1 and 3. Any shift, q, which is not a multiple of 3
`is performed as shifts of 3 for |q/3] iterations, followed by
`qmod 3 iterations with p= 1.
`Figure 4 shows a hardware implementation of the CORDIC unit
`that includes the above normalization scheme. The numberof
`bite in error using such a scleaue is bounded even for siuall initial
`values, as shown in Figure 2.
`
`3.2 Cost of Partial Normalization Implementation
`Por a CORDIC implementation with the data width of the # and
`y datapaths w bits, a shifter with a maximum shift
`\/w/2 is
`required. The shifts at each iteration are obtained from leading-
`zero encoders that encode the \/w/2 most significant bits of the
`z and y variables. The costs involved in these operations are:
`
`Shifter: The area complexity [8] of the shifter grows as
`O(w x MaximumShift required) = O(w x yw) = O(w1),
`
`247
`
`Page 5 of 5
`
`Page 5 of 5
`
`

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket