throbber
1992 IEEE
`
`INTERNATIONAL SYMPOSIUM ON
`
`CIRCUITS AND SYSTEMS
`
`5
`
`Volume 1 of 6
`
`Sheraton Hotel
`
`San Diego, CA
`May 10-13, 1992
`
`92CH3139-3
`
`Page 1 of 5
`
`SAMSUNG EXHIBIT 1043
`
`Page 1 of 5
`
`SAMSUNG EXHIBIT 1043
`
`

`

`
`
`A Normalization Scheme to Reduce Numerical Errors in Inverse
`
`Tangent Computations on a Fixed-point CORDIC Processor
`
`Kisbore Kota
`
`Joseph R. Cavallaro
`Department of Electrical .9; Computer Engineering
`Rice University, Houston, TX 77251
`
`Abstract
`
`ISO-ordinate Rotation Dlgtal Computer (COMIC) is a unified
`arithmetic algorithm that allows efficient V'LSI implementation
`of several elementary functions. Many spncial-purpose systems
`for real-time signal processing applications use a. fixed-point rep
`resentation of numbers due to the ease of implementation. An
`analysis of fixed-point CORDIC in the Y-reduction mode. which
`allows computation of the inverse tangent function. shows that
`unnornialized input values can result in large numerical errors.
`This paper Pres-ileum: a methud LU lute-251:“: the riurulaflimntiuu up-
`eration with CORDIC iterations for efficient implementation in
`00.215) hardware.
`1
`Introduction
`
`The CO-ordinate Rotation Dlgital Computer (CORDIC) is an
`arithmetic technique that allows efficient computation of a. vari-
`ety of elementary functions. The CORDIC algorithm is attrac-
`tive from a hardware point of view since it uses only primitive
`operations such as shifts and additions to implement more com-
`plex functions including sine, cosine. tangent, arctangent, sinll.
`cosh, tanh, arctanh, 1n and exp. A number of real-time signal
`processing and image processing algorithms have been developed
`to make efficient use of CORDIC in special-purpose parallel ar-
`rays.
`interest in CORDIC has spurred recent work to improve
`the basic algorithm. Techniques have been developed to increase
`the range of convergence [4], speedup the computation by reduc-
`ing the number of iterations [T] and pipeline the computation
`through conventional techniques or through the use of on-line
`arithmetic [3].
`The basic CORDIC iterations as specified by Walther [9] are:
`
`2;,“ : £i+1116;y[2_i
`Hm
`In - mattr‘l'fi .
`2;.“
`z.- + Egan—12“
`
`(1)
`l2}
`(I!)
`
`where r;, y.‘ and z.- are the states of the variables I, y and 3 at
`the start of the 2"” iteration. 0 <Ii < re and 6.- E {—1.+1]. The
`inverse tangents, o, = tan" 2". are pie-computed and stored
`in a table of angles. CORDIC iterations can he performed in
`three different modes, hyperbolic, linear or circular, depending
`on the choice of m E {-I.0.1}. All results in this paper have
`been obtained for the circular mode of CORDIC. The different
`functions implemented in this mode are summarized in Table I.
`The computation of interest in this paper is inverse tangent,
`which requires CORDIC operation in the restoring or Y—mn‘ucir'on
`mode. The Y-reduction mode is selected by a choice ofti; at each
`iteration given by:
`
`1
`
`a; = { _1
`
`if yiz.‘ 2 0
`if 91:; < 0.
`
`{4)
`
`The above choice [10] corresponds to the choice of 5;, which tries
`to reduce the magnitude of y.- to zero.
`
`Z-Reduction ( for Vector Rotations, sine. cosine functions}
`Initial Values
`Final values of variables
`
`(to coed + ygsindlfh’n
`
`{3.1342039 - :0 sin Lin/Kn
`
`Table l: CORDIC Functionality in the Circular Mode
`
`CORDIC has previously been implemented primarily using a
`fixed-point representation of the numbers. Design of a floating-
`point CORDIC unit requires handling of numerous special cases.
`which complicates design [2].
`In particular, a floating-point im-
`plementation of a. system that uses CORDIC requires normaliza-
`tion at every stage of computation. which is not necessary with
`a fixed-point representation.
`
`2 Error Analysis of CORDIC Y-Reduction
`In the study of numerical error in CURDIC, it is convenient to
`split the errors due to different. factors as an approximation error
`and a truncation error [5]. The approximation error is a result of
`the finite number ofiterations in an implementation olCORDlC.
`A truncation error is caused by the finite word length of the data
`paths. The following notation is used in the derivations.
`
`o 23;» lit. 5; represent the values obtained in a realvworld im—
`plementation ofCORDlC. These numbers include the effects
`of both approximation errors and truncation errors.
`
`A 2;, y“ 2; represent the values that will be obtained in a hy-
`pothetical CORDIC unit with infinite precision, where every
`iteration uses the same sequence of 6,-3 used in the compu-
`tation orig, 5“,- nnd 2‘.
`
`o 2", fig, 5.- represent the values that should be obtained for the
`desired function with infinite precision, as mathematically
`defined.
`
`Let Q[] denote the quantization operator. The result of the
`quantization operator on a vector is defined as the vector ob-
`tained through truncation of each vector element. The followin
`lemma, relating the vector v“ = [em ifan and in. = [sum]
`can be derived from the c and y iterations given by Equation 1
`and Equation '2 respectively.
`
`244
`
`0-7803-0593-W92 $3.00 1992 IEEE
`
`Page 2 of 5
`
`Page 2 of 5
`
`

`

`_.G
`
`HNwhmm-thfl
`
`i.5
`
`g 2
`
`'
`
`tan—l0) =
`the computed value of
`in
`Error
`Figure ‘2:
`tan-1 {yo/yo) versus ya in a CORDIC module that implements
`the partial normalization scheme
`
`Equation 1. Equation 2 and Equation 3 with the choice of 6,- at
`each iteration given by Equation 4. The convergence property
`of the CORDIC algorithgguarantees that 5",, is reduced to close
`to zero. A detailed error analysis [6] oi Y-reduction shows that
`the approximation error forms an insignificant fraction of the li-
`nal error and does not affect our results. Hence, in this paper
`we present a. simplified analysis that neglects the approximation
`El‘l'Dl'.
`
`In a hypothetical CORDIC unit with infinite precision, the fol-
`lowing relation holds:
`(8)
`y“ = (yocoso—zgsinafllfn
`where or has been defined in lemma 2.2. Let r and 3 be defined
`as
`
`_.
`
`.
`2
`i2
`tanfl
`30+?o ,
`:-
`Equation 8 can now be rewritten as:
`Knyn = rsinfli' — l1]
`
`__-W
`o.
`
`(9)
`
`[9-04
`
`
`
`sin‘1(—I‘"3")
`
`WE + lid
`
`
`
`is bounded by
`Using lemma 2.1, we can conclude that y" — I}:
`[n+1]\/‘Z£, Similarly, using lemma. 2.2, we can conclude that Itpi
`oI, which is the angle {in — znl, is bounded by M1. Substituting
`these worst case bounds, we obtain the following relation
`
`I3 - wl <
`
`Nuaum-ommc
`
`L-
`
`.5
`§In1..
`
`O6z
`
`Figure : Error in the computed value of tan “(1]=tan‘1(ya/yo)
`versus yo in a. CORDIC module without normalization. The error
`has been quantized into the number of bits in error. A large
`error is observed fol: small values of ya. This unporimunt lion
`been performed on a datapath that is 16 bits wide. The selected
`range of ya forms about ith of the entire range of ya.
`
`Lemma 2.1
`
`11—1
`
`lvn—vni<\/§r (ll-E,'_a "in1-,-
`
`Pmof This is a result derived by Hu {5]. The quantity 6 is the
`precision of the icy-data path. The Scale factor K; is defined as
`Xi = {12;}, cos 0;. The above relation holds for fixed-point data
`paths. A plot of 5:6 [UL-,1 Kri versus in shows that it is very
`close to n. for all practical values of n such as 16, 32 or 64. Thus
`the result reduces to:
`
`(a)
`I?-—Vq|(x/§€(l-i-n)
`The number of iterations, n, and the width of the data. path
`are related. Since a wider datapath implies greater precision, an
`approximation error that is of the same order, can be achieved
`only through more iterations.
`
`CI
`The truncation error due to finite precision of the z datapath can
`be quantified by the following lemma.
`Lcuuuq 2.2
`n—l
`
`11—1
`
`0 - ): 5rQlail = Z 5% Where
`t=0
`i=0
`
`in = 0i - Q [or]
`
`[7]
`
`Proof The angle or is defined as o = 293-01 dim. Hence, a is
`the total angle that is either added to or subtracted from so in a
`hypothetical CORDIC unit with infinite precision. The result fol-
`low from definitions of o: and in. If a fixedwoint representation
`is used. then all the pigs are bounded above by p, the precision of
`the z-dala path.
`
`D
`
`2.1 Perturbation 1n the lLlorrtpui'ocl Value of the in-
`verse Tangent
`The goal of Y-reduction iterations is to compute the inverse tan.
`gent 9 = tan-1(po/zo). Initially, 20 is set to zero and the inverse
`tangent is accumulated in 2‘... The angle actually accumulated
`in 2,,
`is to. This is achieved by performing iterations given by
`
`245
`
`Page 3 of 5
`
`
`
`sin"'II (“K"y")
`
`
`
`+ M:
`
`(10)
`
`i/13+ vi
`Here, 3 is the true inverse tangent that is to be evaluated. The
`final value accumulated in the 1 variabla, 5,. = rp,
`is the angle
`actually obtained. Thus relation {10} fives the numerical error
`in inverse tangent calculations.
`Figure 1 shows the errors observed in the computation of in-
`vma langonl for various: initial Valued.
`II. ohowo that when 47"
`and ya are close to zero, a large error results. An intuitive ex-
`planation of this error is the successively larger right shift in the
`CORDIC iterations given by Equation 1 and Equation 2, which
`can result in a loss of all significant bits if an and ya are not large
`enough. A similar problem does not occur in a floating-point
`CORDIC module, since in such an implementation the relative
`
`Page 3 of 5
`
`

`

`error, rather than the absolute error, is bounded at each itera-
`tion.
`Intuitiveiy, the errors in such an implementation have to
`be smaller since the initial values are, by definition, always in a.
`normalized form.
`
`3 Normalization for CORDIC Y-Reduction
`
`Figure 1 shows that a large error in the computed inverse tan-
`gent can occur if [vol is small. However, if the initial values are
`bounded below by a value that is in the order of 2":, since the
`nlirneratnr in Equation in is hounded ahmm hp lint/'30 4. 1):.
`the error will be bounded. Hence, a. lower bound on [sigh enforced
`through normalization of the input values, can be used to control
`the error. If this normalization takes the form of a left shift of
`both the inputs. it appears in the final output. 9... as a shift and
`does not afi'ect the computed inverse tangent, except for better
`accuracy.
`“he observed the above problem when designing a processor [ii]
`that computes the Singular Value Decomposition [SVD) of a. ma
`trix using CORDIC arithmetic [i].
`In a. fixedvpoint implementa—
`tion, when CORDIC is used to perform operations on numbers
`obtained in a long chain of calculations, one cannot guarantee
`that the numbers are always in a normalized form. A straight-
`forward way to eliminate the resulting numerical problems is to
`explicitly normalize the inputs before proceeding With CORDIC.
`Implementation of normalization requires hardware to deter—
`mine the amount by which components of the vector can be
`shifted left, and hardware to implement the shifts. The simplest
`implementation uses two leading—zero encoders to determine the
`amounts through which each component, 520 and in, need to be
`shifted: a comparator to choose the smaller of the two shifts: and
`a. barrel shifter to implement the shifts. This generic solution
`to achieve normalization incurs an 0(1) time penalty and 001:}
`area penalty.
`A tradeofi' between time and area. [8] can be made by imple-
`menting a shifter that can shift through selected powers of two.
`and performing the desired normalization shift in logic cycles.
`Such an approach reduces the complexity of the barrel shifter to
`0(n logo), while the complexity of the leading-zero encoders is
`still Uiu‘}.
`'l'he Implementation is considerably more complex
`and incurs a time penalty of Oflog n).
`In this section we present a new normalization technique that
`incurs an area penalty of0(n]‘5) but no time penalty. This liECll‘
`nique, which we call partial normalisation, achieva the desired
`normalization shift as a combination of smaller shi its merged with
`the CORDIC iterations. At the and of each iteration, the sari:
`ables i,- and 3}. are shifted left by a. few hits, decided by control
`logic, until the desired total shift is achieved. The design goal is
`to minimize the maximum shift that needs to be introduced at
`any iteration. The normalization shift introduced at every iter-
`ation prevents CORDIC from shifting out and. losing significant
`bits in subsequent iterations. The above scheme results iti two
`competing processes. The partial normalization scheme can only
`implement a. total shift that grows linearly as the number ofiter-
`atioris. CORDIC, however, tends to shift out bits as a square of
`the number of iteration: due: to the oucccaaivdy humming aliift,
`i, at each iteration. The following theorem quantifies this idea.
`
`tan-‘(jo2'fio29L be the value calcu-
`Theorem 3.1 Let t"
`lated til." Dreshifiinn :50 and fig and using an unmodified CUR DIE
`unit. where q is an integral multiple of a constant p. Let I :
`tan—lfgfrg/tio) be the value calculated by using a modified CGRDIC
`unit and unnormob'zeo’ initial values. The modified CORDIC unit
`initially performs nfp iterations of the form
`
`i.“ = Mimosa" 2”
`
`(11)
`
`
`
`begin
`fori:=0tondo
`“(at < MAXK and y; 1: MARX) then
`Choose cinder such that
`
`zesffiftlmdfll < MALX and
`zizshifllrinder-Fl] > MAXI ;
`CHOOSE yfnde: such that
`
`trashifllyinderl c MAJLY and
`inShiftlyindcr+1l > NleJl! ;
`index := min{ xindex. yinrfmr ):
`else
`
`index :: I]:
`
`end
`
`Normalization Shift p := shiftiindex]
`Perform modified CORDIC Iteration:
`end
`end
`
`
`Figure 3: Modified CORDIC Iterations,
`reduction when the input is not normalized
`
`invoked during Y-
`
`ili+l = Qifii-fiiifl'ilz”,
`where 1' is the iteration indies: rmrf OH is the qrmni'i'mri‘nn Dram.
`tor. This isfoiiotued by (n ~q/p} unmodified CORDIC iterations.
`The two inverse tangents computed. t and t", will be identical if
`q < E,
`tuber-e E is an upper bound given by
`
`(12]
`
`5:21]?
`
`Proof This result is obtained by observing that the left shift
`ofp introduced at the end of a CORDIC iteration introduces p
`zeros in the lower order bits that can be safely shifted out With-
`out losing any significant bits. However, as the iteration index
`increases. the right shift at each iteration increases, while the
`left shift introd uced by normalization is a constant. The upper
`bound 4 is the point at which the p is no longer sulTicient to pre
`vent any loss of hits. A detailed proof of this result can be found
`in a thesis [6]
`D
`Tlfio tltculmll allows that if the 6011ch Iterations can be modi-
`fied to include a. constant left shift p then normalization through
`any multiple of p but less than 2;)2 can be achieved. In practice,
`the parameter p can be made a variable, allowing one of several
`shifts to be chosen at each iteration. An appropriate choice of
`these shifts can achieve any desired total shift.
`Suppose p in Equation 11 can be chosen from a. total of). shifts,
`and the condition shiftfl - L] > shiftli — 2} >
`2 0 holds. The
`algorithm given in Figure 3, give: or choice of shifts at each it.-
`eratiou to normalize the input. The parameters MAXJC and
`MAXI are the maximum allowed values of so and yo chosen
`to prevent overflow. The required control can be implemented
`using combinational logic: a. pair of leading sore encodes-o and. n.
`comparator to select the smaller shift from the output of the en-
`coders. The additional hardware required to implement partial
`normalization 'n a pair of shifters, which implement a subset of
`all the shifts implemented by a complete barrel shifter. Conse-
`quently, the leading-zero encoders need to encode very few bits
`to determine the appropriate shift.
`
`245
`
`Page 4 of 5
`
`Page 4 of 5
`
`

`

`
`
`
`
`‘orma -
`L .T
`Iml...
`‘orrnn -
`nation
`
`H Shifter
`
`Figure 4: Implementation ofthe CORDIC Iterations in Hardware
`
`3.1 Choice of a Minimal Set of Shifts
`The following is an example ofa. CORDIC unit with a data‘width
`in = 16 bits and n = 16 iterations. The higher order three hits
`correspond to one Sign bit and two overflow bits. Hence. for
`this example. the parameters MAXJC and MAPLY are set to the
`values of I and y corresponding to the hit patio“: “0010 0000
`00-00 0000". The algorithm in figure 3 chooses the largest of
`{0.1.3} shifts, such that the resulting bit string alter. has three
`leading zeros (or leading ones if negative}.
`0 To permit unmodified iterations in a, modified CORDIC unit
`a. shift of zero is necessary :> shifth’l] = 0.
`o An overall normalization shift of 1 can be achieved only by
`chucking 3) no 1. Thus a shift of 1 is necessary in any inlplo
`mentarion =9 shift[l] = 1.
`a Multiple iterations with p = 1 can achieve a shift up to a.
`maximum of 2p2
`2. A shift of 3. however, cannot be
`iuipleiueiitctl a.» Line: inudiliml CORDIC iteration; wiLh p :
`1, since this will result in a loss of significant bits. This
`necessitates the inclusion of the shift p = 3 to the set
`::r
`shiftp} = 3.
`u The shirt 1) 2 .3 can achieve any shift up to a. maximum
`2})2 = is. Since the maximum shiit required is 13. the nor-
`mali ration shifter does not have to implemen t any shift other
`than D,
`1 and 3. Any shift, q, which is not a. multiple ol 3
`is performed as shifts 013 for erf3] iterations, followed by
`q rnod3 ilerations with p 2 1.
`Figure 4 shows a hardware implementation of the CORDIC unit
`that includes the above normalization scheme. The number of
`lJlI-b in 51101 trains bLLLlI. n achuruc in lruurulctl. over: for entail initial
`values. as shown in Figure 2.
`
`3.2 Cost of Partial Normalization Inlplementation
`For :- CORDIC implementation will: the data. width of the a and.
`y datapaths to hits. a shifter with a. maximum shift «1112':
`is
`required. The shifts at each iteration are obtained from leading-
`zero encoders that encode the Jun?) most significant bits of the
`.1: and y mriables. The costs involved in these operations are:
`
`Shifter: The area. complexity [S] of the shifter grows as
`Olw x Maximum Shift required) = 0(w >< fl) = Ofwl's].
`
`247
`
`logic grows
`control
`the
`Control Logic: The size of
`O[(Maximum Shift required?) 2 Ohflu‘?) = 0(w}.
`Time. Penalty: Since the shifting is performed as part of the
`CORDIC iterations. the only time penalty incurred is a de-
`crease in the clock rate due to the extra propagation the
`lay caused by the presence of an additional shifter in the
`CORDIC data. path.
`4 Conclusions
`
`as
`
`An analysis of CORDIC Y—rednction, assuming a fixed-point rep-
`resentation of numbers. shows that unnorrnalized input values
`can result in large numerical errors in the computed inverse tan-
`gent. The complexity of a. floating-point implenmntotiun of the
`entire system can be avoided by locally normalizing the values
`before using CORDIC. We have implemented the normalization
`using a. method that integrates it with the CORDIC iterations
`resulting in an elegant solution to the problem. This method.
`requires only 001”) extra hardware and does not afl'cct the la-
`tency. We believe this scheme can be extended to the other modes
`of CORDIC.
`
`Acknowledgements
`This work was supported in part by the National Science Foun-
`dation under Research Initiation Award NIP-8909498.
`References
`
`[1] J. R. Cavallaro and F. T. Luk. CORDIC Arithmetic for an
`SVD Processor. Journal of Parallel and Distributed Com-
`nutina. 5(3l:271—290. June 1988.
`
`[2] J. R. Cavallaro and F. T. Luk. Floating—Point CORDIC
`for Matrix Computations.
`lEEE Int. Confi on Computer
`Design, pages 40-42, October 1988.
`
`[3|
`
`[4]
`
`[5i
`
`[Lil
`
`Redundant and 011-
`M. D. Ercegovac and T. Lang.
`Line CORD 1C: Application to Matrix 'I'riangularization and
`SVD. IEEE Trans. Computers, 39(5):?‘25—740. June 1990.
`
`X. Hu, R. G. Harber, and S. C. Bass. Expanding the Range
`of the CORDIC Algorithm.
`IEEE Trims. on Computers.
`pages 13-21. Jan. 1991.
`
`Y. H. H“. The Quantization Effect: of thc CORDIC Alger
`rithm. In“. Conf. on .rlccouslics, Speech and Signal Process-
`frir, pages 1322-1825, April, 1983.
`
`Ii. Kola. Architectural. Numerical and Implementation Is-
`sues in the VLSI Design of an Integrated CORDIC SVD
`Processor. Master’s thesis, Rice University, Department of
`Electrical and Computer Engineering, May 1991.
`
`[T] T. Y. Sung, Y. 11. Ho, and ll. .1. Yu. Doubly Pipelinod
`CORDIC Array for Digital Signal Processing Algorithms.
`IEEE Int. Conf. on Acoustics, Speech, and Signal Process-
`ing, 21169-1172, April 1986.
`
`[3} J. D. Ullman. Computational Aspects of VLSl'. Computer
`Science Prms, Rocltville, MD, 1984.
`
`[9] l. S. Walther. A Unified Algorithm for Elementary Func-
`tions. AFIPSJpr-mg Jain: Computer Cont. pages dis—J56,
`1971.
`
`[10] B. Yang and J. F. B'ohme. Reducing the Computations of the
`SVD Array Given by Brent and Luk. SPIE Advanced Algo-
`rithms and Architecture: for Signal Processing. 1152112402.
`August, 1939.
`
`Page 5 of 5
`
`Page 5 of 5
`
`

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket