Chapter 4 Arithmetic for Computers
With the explanation of computer arithmetic in this chapter comes a description of much more of the MIPS instruction set. One point of confusion is the instructions covered in these chapters versus instructions executed by MIPS chips versus the instructions accepted by MIPS assemblers. The next two figures try to make this clear.

Figure 4.52 lists the MIPS instructions covered in Chapters 3 and 4. We call the set of instructions on the left-hand side of the figure the MIPS core. The instructions on the right we call the MIPS arithmetic core. On the left of Figure 4.53 are the instructions the MIPS processor executes that are not found in Figure 4.52. We call the full set of hardware instructions MIPS I. On the right of Figure 4.53 are the instructions accepted by the assembler that are not part of MIPS I. We call this set of instructions Pseudo MIPS.
...taken on a life of its own. Although Intel firmly stands behind the quality of the current version of the Pentium processor, we recognize that many users have concerns. We want to resolve these concerns. Intel will exchange the current version of the Pentium processor for an updated version, in which this floating-point divide flaw is corrected, for any owner who requests it, free of charge anytime during the life of their computer." Analysts estimate that this recall cost Intel $300 million.

This story brings up a few points for everyone to ponder. How much cheaper would it have been to fix the bug in July 1994? What was the cost to repair the damage to Intel's reputation? And what is the corporate responsibility in disclosing bugs in a product so widely used and relied upon as a microprocessor?

In April 1997 another floating-point bug was revealed in the Pentium Pro and Pentium II microprocessors. When the floating-point-to-integer store instructions (fist, fistp) encounter a negative floating-point number that is too large to fit in a 16- or 32-bit word after being converted to integer, they set the wrong bit in the FPU status word (precision exception instead of invalid operation exception). To Intel's credit, this time they publicly acknowledged the bug and offered a software patch to get around it, quite a different reaction from what they did in 1994.
4.11 Concluding Remarks

Computer arithmetic is distinguished from paper-and-pencil arithmetic by the constraints of limited precision. This limit may result in invalid operations through calculating numbers larger or smaller than the predefined limits. Such anomalies, called "overflow" or "underflow," may result in exceptions or interrupts, emergency events similar to unplanned subroutine calls. Chapter 5 discusses exceptions in more detail.

Floating-point arithmetic has the added challenge of being an approximation of real numbers, and care needs to be taken to ensure that the computer number selected is the representation closest to the actual number. The challenges of imprecision and limited representation are part of the inspiration for the field of numerical analysis.

Over the years, computer arithmetic has become largely standardized, greatly enhancing the portability of programs. Two's complement binary integer arithmetic and IEEE 754 binary floating-point arithmetic are found in the vast majority of computers sold today. For example, every desktop computer sold since this book was first printed follows these conventions.

A side effect of the stored-program computer is that bit patterns have no inherent meaning. The same bit pattern may represent a signed integer, unsigned integer, floating-point number, instruction, and so on. It is the instruction that operates on the word that determines its meaning.
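The point that bit patterns have no inherent meaning can be seen directly: the same 32 bits yield three different values depending on whether an instruction treats them as an IEEE 754 single, a two's complement integer, or an unsigned integer. A small illustration (ours, not from the book), using Python's struct module to reinterpret one pattern:

```python
import struct

# One 32-bit pattern, obtained by encoding -12.5625 as an IEEE 754 single.
bits = struct.pack("<f", -12.5625)

as_float,    = struct.unpack("<f", bits)  # interpreted as IEEE 754 single
as_signed,   = struct.unpack("<i", bits)  # interpreted as two's complement
as_unsigned, = struct.unpack("<I", bits)  # interpreted as unsigned

print(f"pattern:     0x{as_unsigned:08X}")
print(f"as float:    {as_float}")     # -12.5625
print(f"as signed:   {as_signed}")    # a large negative integer (sign bit set)
print(f"as unsigned: {as_unsigned}")  # the same bits read as a huge positive value
```

Only the interpretation changes; the word in memory never does.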
MIPS core instructions:

    add                                  add      R
    add immediate                        addi     I
    add unsigned                         addu     R
    add immediate unsigned               addiu    I
    subtract                             sub      R
    subtract unsigned                    subu     R
    and                                  and      R
    and immediate                        andi     I
    or                                   or       R
    or immediate                         ori      I
    shift left logical                   sll      R
    shift right logical                  srl      R
    load upper immediate                 lui      I
    load word                            lw       I
    store word                           sw       I
    load byte unsigned                   lbu      I
    store byte                           sb       I
    branch on equal                      beq      I
    branch on not equal                  bne      I
    jump                                 j        J
    jump and link                        jal      J
    jump register                        jr       R
    set less than                        slt      R
    set less than immediate              slti     I
    set less than unsigned               sltu     R
    set less than immediate unsigned     sltiu    I

MIPS arithmetic core:

    multiply                             mult     R
    multiply unsigned                    multu    R
    divide                               div      R
    divide unsigned                      divu     R
    move from Hi                         mfhi     R
    move from Lo                         mflo     R
    move from system control (EPC)       mfc0     R
    floating-point add single            add.s    R
    floating-point add double            add.d    R
    floating-point subtract single       sub.s    R
    floating-point subtract double       sub.d    R
    floating-point multiply single       mul.s    R
    floating-point multiply double       mul.d    R
    floating-point divide single         div.s    R
    floating-point divide double         div.d    R
    load word to floating-point single   lwc1     I
    store word to floating-point single  swc1     I
    branch on floating-point true        bc1t     I
    branch on floating-point false       bc1f     I
    floating-point compare single        c.x.s    R
      (x = eq, neq, lt, le, gt, ge)
    floating-point compare double        c.x.d    R
      (x = eq, neq, lt, le, gt, ge)

FIGURE 4.52 The MIPS instruction set covered so far. This book concentrates on the instructions in the MIPS core (the first list).
Figure 4.54 gives the popularity of the MIPS instructions for two programs: gcc and spice. All instructions are listed that were responsible for at least 0.5% of the instructions executed. The table following summarizes that information:
    Instruction subset        gcc    spice
    MIPS core                 95%    45%
    MIPS arithmetic core       0%    49%
    Remaining MIPS I           5%     6%

Note that although programmers and compiler writers may use MIPS I to have a richer menu of options, MIPS core instructions dominate gcc execution, and the integer core plus arithmetic core dominate spice.
Remaining MIPS I:

    exclusive or (rs ⊕ rt)               xor      R
    exclusive or immediate               xori     I
    nor (¬(rs ∨ rt))                     nor      R
    shift right arithmetic               sra      R
    shift left logical variable          sllv     R
    shift right logical variable         srlv     R
    shift right arithmetic variable      srav     R
    move to Hi                           mthi     R
    move to Lo                           mtlo     R
    load halfword                        lh       I
    load halfword unsigned               lhu      I
    store halfword                       sh       I
    load word left (unaligned)           lwl      I
    load word right (unaligned)          lwr      I
    store word left (unaligned)          swl      I
    store word right (unaligned)         swr      I
    branch on less than zero             bltz     I
    branch on less or equal zero         blez     I
    branch on greater than zero          bgtz     I
    branch on ≥ zero                     bgez     I
    branch on ≥ zero and link            bgezal   I
    branch on < zero and link            bltzal   I
    jump and link register               jalr     R
    return from exception                rfe      R
    system call                          syscall  R
    break (cause exception)              break    R
    move from FP to integer              mfc1     R
    move to FP from integer              mtc1     R
    FP move (s or d)                     mov.f    R
    FP absolute value (s or d)           abs.f    R
    FP negate (s or d)                   neg.f    R
    FP convert (w, s, or d)              cvt.f.f  R
    FP compare un (s or d)               c.xn.f   R

Pseudo MIPS:

    move                                          move    rd,rs
    absolute value                                abs     rd,rs
    not (¬rs)                                     not     rd,rs
    negate (signed or unsigned)                   negs    rd,rs
    rotate left                                   rol     rd,rs,rt
    rotate right                                  ror     rd,rs,rt
    multiply & don't check oflw (signed or uns.)  muls    rd,rs,rt
    multiply & check oflw (signed or uns.)        mulos   rd,rs,rt
    divide and check overflow                     div     rd,rs,rt
    divide and don't check overflow               divu    rd,rs,rt
    remainder (signed or unsigned)                rems    rd,rs,rt
    load immediate                                li      rd,imm
    load address                                  la      rd,addr
    load double                                   ld      rd,addr
    store double                                  sd      rd,addr
    unaligned load word                           ulw     rd,addr
    unaligned store word                          usw     rd,addr
    unaligned load halfword (signed or uns.)      ulhs    rd,addr
    unaligned store halfword                      ush     rd,addr
    branch                                        b       Label
    branch on equal zero                          beqz    rs,L
    branch on ≥ (signed or unsigned)              bges    rs,rt,L
    branch on > (signed or unsigned)              bgts    rs,rt,L
    branch on ≤ (signed or unsigned)              bles    rs,rt,L
    branch on < (signed or unsigned)              blts    rs,rt,L
    set equal                                     seq     rd,rs,rt
    set not equal                                 sne     rd,rs,rt
    set greater or equal (signed or unsigned)     sges    rd,rs,rt
    set greater than (signed or unsigned)         sgts    rd,rs,rt
    set less or equal (signed or unsigned)        sles    rd,rs,rt
    set less than (signed or unsigned)            slts    rd,rs,rt
    load to floating point (s or d)               l.f     rd,addr
    store from floating point (s or d)            s.f     rd,addr

FIGURE 4.53 Remaining MIPS I and "Pseudo MIPS" instruction sets. Appendix A describes all these instructions. f means single (s) and double precision (d) versions of the floating-point instruction, and s means signed and unsigned (u) versions.

[Figure 4.54: a table of per-instruction execution frequencies, as percentages of instructions executed, for gcc and spice; the left half covers the MIPS core and the right half the arithmetic core plus remaining MIPS I.]

FIGURE 4.54 The frequency of the MIPS instructions for two programs, gcc and spice. Calculated from "pixie" output of the full MIPS I. (Pixie is an instruction measurement tool from MIPS.) All instructions that accounted for at least 0.5% of the instructions executed in either gcc or spice are included in the table. Thus the integer multiply and divide instructions are not listed because they were responsible for less than 0.5% of the instructions executed. Pseudoinstructions are converted into MIPS I before execution, and hence do not appear here.
For the rest of the book, we concentrate on the MIPS core instructions (the integer instruction set excluding multiply and divide) to make the explanation of computer design easier. As we can see, the MIPS core includes the most popular MIPS instructions, and be assured that understanding a computer that runs the MIPS core will give you sufficient background to understand even more ambitious machines.
4.12 Historical Perspective and Further Reading

Gresham's Law ("Bad money drives out Good") for computers would say, "The Fast drives out the Slow even if the Fast is wrong."

W. Kahan, 1992
At first it may be hard to imagine a subject of less interest than the correctness of computer arithmetic or its accuracy, and harder still to understand why a subject so old and mathematical should be so controversial. Computer arithmetic is as old as computing itself, and some of the subject's earliest notions, like the economical reuse of registers during serial multiplication and division, still command respect today. Maurice Wilkes [1985] recalled a conversation about that notion during his visit to the United States in 1946, before the earliest stored-program machine had been built:

... a project under von Neumann was to be set up at the Institute of Advanced Studies in Princeton. ... Goldstine explained to me the principal features of the design, including the device whereby the digits of the multiplier were put into the tail of the accumulator and shifted out as the least significant part of the product was shifted in. I expressed some admiration at the way registers and shifting circuits were arranged ... and Goldstine remarked that things of that nature came very easily to von Neumann.
There is no controversy here; it can hardly arise in the context of exact integer arithmetic so long as there is general agreement on what integer the correct result should be. However, as soon as approximate arithmetic enters the picture, so does controversy, as if one person's "negligible" must be another's "everything."
The First Dispute

Floating-point arithmetic kindled disagreement before it was ever built. John von Neumann was aware of Konrad Zuse's proposal for a computer in Germany in 1939 that was never built, probably because the floating point made it appear too complicated to finish before the Germans expected World War II to end. Hence von Neumann refused to include it in the machine he built at Princeton. In an influential report coauthored in 1946 with H. H. Goldstine and A. W. Burks, he gave the arguments for and against floating point. In favor:

... to retain in a sum or product as many significant digits as possible and ... to free the human operator from the burden of estimating and inserting into a problem "scale factors," multiplication constants which serve to keep numbers within the limits of the machine.
Floating point was excluded for several reasons:

There is, of course, no denying the fact that human time is consumed in arranging for the introduction of suitable scale factors. We only argue that the time consumed is a very small percentage of the total time we will spend in preparing an interesting problem for our machine. The first advantage of the floating point is, we feel, somewhat illusory. In order to have such a floating point, one must waste memory capacity which could otherwise be used for carrying more digits per word. It would therefore seem to us not at all clear whether the modest advantages of a floating binary point offset the loss of memory capacity and the increased complexity of the arithmetic and control circuits.

The argument seems to be that most bits devoted to exponent fields would be bits wasted. Experience has proved otherwise.
One software approach to accommodate reals without floating-point hardware was called floating vectors; the idea was to compute at runtime one scale factor for a whole array of numbers, choosing the scale factor so that the array's biggest number would barely fill its field. By 1951, James H. Wilkinson had used this scheme extensively for matrix computations. The problem proved to be that a program might encounter a very large value, and hence the scale factor must accommodate these rare large numbers. The common numbers would thus have many leading 0s, since all numbers had to use a single scale factor. Accuracy was sacrificed because the least significant bits had to be lost on the right to accommodate leading 0s. This wastage became obvious to practitioners on early machines that displayed all their memory bits as dots on cathode ray tubes (like TV screens) because the loss of precision was visible.
Where floating point deserved to be used, no practical alternative existed. Thus true floating-point hardware became popular because it was useful. By 1957, floating-point hardware was almost ubiquitous. A decimal floating-point unit was available for the IBM 650, and soon the IBM 704, 709, 7090, 7094 ... series would offer binary floating-point hardware for double as well as single precision.

As a result, everybody had floating point, but every implementation was different.
Diversity versus Portability

Since roundoff introduces some error into almost all floating-point operations, to complain about another bit of error seems picayune. So for 20 years nobody complained much that those operations behaved a little differently on different machines. If software required clever tricks to circumvent those idiosyncrasies and finally deliver results correct in all but the last several bits, such tricks were deemed part of the programmer's art. For a long time, matrix computations mystified most people who had no notion of error analysis; perhaps this continues to be true. That may be why people are still surprised that
numerically stable matrix computations depend upon the quality of arithmetic in so few places, far fewer than are generally supposed. Books by Wilkinson and widely used software packages like Linpack and Eispack sustained a false impression, widespread in the early 1970s, that a modicum of skill sufficed to produce portable numerical software.

Portable here means that the software is distributed as source code in some standard language to be compiled and executed on practically any commercially significant machine, and that it will then perform its task as well as any other program performs that task on that machine. Insofar as numerical software has often been thought to consist entirely of machine-independent mathematical formulas, its portability has often been taken for granted; the mistake in that presumption will become clear shortly.

Packages like Linpack and Eispack cost so much to develop (over a hundred dollars per line of Fortran delivered) that they could not have been developed without U.S. government subsidy; their portability was a precondition for that subsidy. But nobody thought to distinguish how various components contributed to their cost. One component was algorithmic: devise an algorithm that deserves to work on at least one computer despite its roundoff and over/underflow limitations. Another component was the software engineering effort required to achieve and confirm portability to the diverse computers commercially significant at the time; this component grew more onerous as ever more diverse floating-point arithmetics blossomed in the 1970s.

And yet scarcely anybody realized how much that diversity inflated the cost of such software packages.
A Backward Step

Early evidence that somewhat different arithmetics could engender grossly different software development costs was presented in 1964. It happened at a meeting of SHARE, the IBM mainframe users' group, at which IBM announced System/360, the successor to the 7094 series. One of the speakers described the tricks he had been forced to devise to achieve a level of quality for the S/360 library that was not quite so high as he had previously achieved for the 7094.

Part of the trouble could have been foretold by von Neumann had he still been alive. In 1948 he and Goldstine had published a lengthy error analysis so difficult and so pessimistic that hardly anybody paid attention to it. It did predict correctly, however, that computations with larger arrays of data would probably fall prey to roundoff more often. IBM S/360s had bigger memories than 7094s, so data arrays could grow bigger, and they did. To make matters worse, the S/360s had narrower single precision words (32 bits versus 36) and used a cruder arithmetic (hexadecimal or base 16 versus binary or base 2) with consequently poorer worst-case precision (21 significant bits versus 27) than
old 7094s. Consequently, software that had almost always provided (barely) satisfactory accuracy on 7094s too often produced inaccurate results when run on S/360s. The quickest way to recover adequate accuracy was to replace old codes' single precision declarations with double precision before recompilation for the S/360. This practice exercised S/360 double precision far more than had been expected.

The early S/360s' worst troubles were caused by lack of a guard digit in double precision. This lack showed up in multiplication as a failure of identities like 1.0 * x = x, because multiplying x by 1.0 dropped x's last hexadecimal digit (4 bits). Similarly, if x and y were very close but had different exponents, subtraction dropped off the last digit of the smaller operand before computing x - y. This last aberration in double precision undermined a precious theorem that single precision then (and now) honored: if 1/2 ≤ x/y ≤ 2, then no rounding error can occur when x - y is computed; it must be computed exactly. Innumerable computations had benefited from this minor theorem, most often unwittingly, for several decades before its first formal announcement and proof. We had been taking all this stuff for granted.
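On any machine with guard digits and IEEE 754 arithmetic, this theorem (now usually credited to Sterbenz) is easy to check empirically. A small sketch of ours, comparing floating-point subtraction against exact rational arithmetic via Python's fractions module:

```python
from fractions import Fraction

def subtraction_is_exact(x, y):
    """True if the floating-point result of x - y equals the exact difference."""
    return Fraction(x) - Fraction(y) == Fraction(x - y)

# Whenever 1/2 <= x/y <= 2, x - y incurs no rounding error at all,
# even for "messy" operands like 0.1 and 0.15.
assert subtraction_is_exact(0.1, 0.15)   # ratio 2/3
assert subtraction_is_exact(0.3, 0.2)    # ratio 3/2
assert subtraction_is_exact(1.75, 1.0)   # ratio 7/4

# Outside that range the guarantee evaporates: here y is so small that
# it is entirely rounded away, and the computed difference is inexact.
assert not subtraction_is_exact(1.0, 1e-20)
```

Note that exactness of x - y says nothing about x and y themselves; 0.1 and 0.15 are already rounded decimals, but their floating-point difference is computed without any further error.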
The identities and theorems about exact relationships that persisted, despite roundoff, with reasonable implementations of approximate arithmetic were not appreciated until they were lost. Previously, all that had been thought to matter were precision (how many significant digits were carried) and range (the spread between over/underflow thresholds). Since the S/360s' double precision had more precision and wider range than the 7094s', software was expected to continue to work at least as well as before. But it didn't.

Programmers who had matured into program managers were appalled at the cost of converting 7094 software to run on S/360s. A small subcommittee of SHARE proposed improvements to the S/360 floating point. This committee was surprised and grateful to get a fair part of what they asked for from IBM, including all-important guard digits. By 1968, these had been retrofitted to S/360s in the field at considerable expense; worse than that was customers' loss of faith in IBM's infallibility (a lesson learned by Intel 30 years later). IBM employees who can remember the incident still shudder.
The People Who Built the Bombs

Seymour Cray was associated for decades with the CDC and Cray computers that were, when he built them, the world's biggest and fastest. He always understood what his customers wanted most: speed. And he gave it to them even if, in so doing, he also gave them arithmetics more "interesting" than anyone else's. Among his customers have been the great government laboratories like those at Livermore and Los Alamos, where nuclear weapons were designed. The challenges of "interesting" arithmetics were pretty tame to people who had to overcome Mother Nature's challenges.
Perhaps all of us could learn to live with arithmetic idiosyncrasy if only one computer's idiosyncrasies had to be endured. Instead, when accumulating different computers' different anomalies, software dies the Death of a Thousand Cuts. Here is an example from Cray's machines:

if (x == 0.0) y = 17.0 else y = z/x

Could this statement be stopped by a divide-by-zero error? On a CDC 6600 it could. The reason was a conflict between the 6600's adder, where x was compared with 0.0, and the multiplier and divider. The adder's comparison examined x's leading 13 bits, which sufficed to distinguish zero from normal nonzero floating-point numbers x. The multiplier and divider examined only 12 leading bits. Consequently, tiny numbers existed that were nonzero to the adder but zero to the multiplier and divider! To avoid disasters with these tiny numbers, programmers learned to replace statements like the one above by

if (1.0 * x == 0.0) y = 17.0 else y = z/x
In case you think the statement above is safe to use now for portable software, since computers of the CDC 6600 era are no longer commercially significant, you should be warned that it can lead to overflow on a Cray computer even if z is almost as tiny as x; the trouble here is that the Cray computes not z/x but z * (1/x), and the reciprocal can overflow even though the desired quotient is unexceptionable. A similar difficulty troubles the Intel i860s used in its massively parallel computers. The would-be programmer of portable code faces countless dilemmas like these whenever trying to program for the full range of existing computers.
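The reciprocal-multiply hazard is easy to reproduce even on an IEEE 754 machine (a sketch of ours, not a Cray): when x is subnormal, 1/x overflows to infinity even though the quotient z/x itself is a perfectly ordinary number.

```python
x = 1e-310   # subnormal double: its reciprocal exceeds the overflow threshold
z = 2e-310   # about twice x, so the true quotient is near 2.0

direct = z / x                    # IEEE division: well-behaved, close to 2.0
via_reciprocal = z * (1.0 / x)    # the Cray-style z * (1/x) scheme: the
                                  # reciprocal overflows to inf first

print(direct)
print(via_reciprocal)
```

The two expressions are algebraically identical; only the intermediate result differs, and that is enough to turn an unexceptionable quotient into an overflow.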
Rounding error anomalies that are far worse than the over/underflow anomaly just discussed also affect Cray computers. The worst error comes from the lack of a guard digit in add/subtract, an affliction of IBM S/360s. Further bad luck for software is occasioned by the way Cray economized his multiplier; about one-third of the bits that normal multiplier arrays generate have been left out of his multipliers because they would contribute less than a unit to the last place of the final Cray-rounded product. Consequently, a Cray's multiplier errs by almost a bit more than might have been expected. This error is compounded when division takes three multiplications to improve an approximate reciprocal of the divisor and then multiply the numerator by it. Square root compounds a few more multiplication errors.

The fast way drove out the slow, even though the fast was occasionally slightly wrong.
Making the World Safe for Floating Point, or Vice Versa

William Kahan was an undergraduate at the University of Toronto in 1953 when he learned to program its Ferranti-Manchester Mark-I computer. Because he entered the field early, Kahan became acquainted with a wide range of devices and a large proportion of the personalities active in computing; the numbers of both were small at that time. He has performed computations on slide rules, desktop mechanical calculators, tabletop analog differential analyzers, and so on; he used all but the earliest electronic computers and calculators mentioned in this book.

Kahan's desire to deliver reliable software led to an interest in error analysis that intensified during two years of postdoctoral study in England, where he became acquainted with Wilkinson. In 1960, he resumed teaching at Toronto, where an IBM 7090 had been acquired, and was granted free rein to tinker with its operating system, Fortran compiler, and runtime library. (He denies that he ever came near the 7090 hardware with a soldering iron but admits asking to do so.) One story from that time illuminates how misconceptions and numerical anomalies in computer systems can incur awesome hidden costs.
A graduate student in aeronautical engineering used the 7090 to simulate the wings he was designing for short takeoffs and landings. He knew such a wing would be difficult to control if its characteristics included an abrupt onset of stall, but he thought he could avoid that. His simulations were telling him otherwise. Just to be sure that roundoff was not interfering, he had repeated many of his calculations in double precision and gotten results much like those in single; his wings had stalled abruptly in both precisions. Disheartened, the student gave up.

Meanwhile Kahan replaced IBM's logarithm program (ALOG) with one of his own, which he hoped would provide better accuracy. While testing it, Kahan reran programs using the new version of ALOG. The student's results changed significantly; Kahan approached him to find out what had happened.

The student was puzzled. Much as the student preferred the results produced with the new ALOG (they predicted a gradual stall), he knew they must be wrong because they disagreed with his double precision results. The discrepancy between single and double precision results disappeared a few days later when a new release of IBM's double precision arithmetic software for the 7090 arrived. (The 7090 had no double precision hardware.) He went on to write a thesis about it and to build the wings; they performed as predicted. But that is not the end of the story.

In 1963, the 7090 was replaced by a faster 7094 with double precision floating-point hardware but with otherwise practically the same instruction set as the 7090. Only in double precision and only when using the new hardware did the wing stall abruptly again. A lot of time was spent to find out why. The 7094 hardware turned out, like the superseded 7090 software and the subsequent early S/360s, to lack a guard bit in double precision. Like so many programmers on those machines and on Cray's, the student discovered a trick to
compensate for the lack of a guard digit; he wrote the expression (0.5 - x) + 0.5 in place of 1.0 - x. Nowadays we would blush if we had to explain why such a trick might be necessary, but it solved the student's problem.

Meanwhile the lure of California was working on Kahan and his family; they came to Berkeley and he to the University of California. An opportunity presented itself in 1974 when accuracy questions induced Hewlett-Packard's calculator designers to call in a consultant. The consultant was Kahan, and his work dramatically improved the accuracy of HP calculators, but that is another story. Fruitful collaboration with congenial co-workers, however, fortified him for the next and crucial opportunity.

It came in 1976, when John F. Palmer at Intel was empowered to specify the "best possible" floating-point arithmetic for all of Intel's product line. The 8086 was imminent, and an 8087 floating-point coprocessor for the 8086 was contemplated. (A coprocessor is simply an additional chip that accelerates a portion of the work of a processor; in this case, it accelerated floating-point computation.)

Palmer had obtained his Ph.D. at Stanford a few years before and knew whom to call for counsel of perfection: Kahan. They put together a design that obviously would have been impossible only a few years earlier and looked not quite possible at the time. But a new Israeli team of Intel employees led by Rafi Nave felt challenged to prove their prowess to Americans and leaped at an opportunity to put something impossible on a chip: the 8087.

By now, floating-point arithmetics that had been merely diverse among mainframes had become chaotic among microprocessors, one of which might be host to a dozen varieties of arithmetic in ROM firmware or software. Robert G. Stewart, an engineer prominent in IEEE activities, got fed up with this anarchy and proposed that the IEEE draft a decent floating-point standard. Simultaneously, word leaked out in Silicon Valley that Intel was going to put on one chip some awesome floating point well beyond anything its competitors had in mind. The competition had to find a way to slow Intel down, so they formed a committee to do what Stewart requested.

Meetings of this committee began in late 1977 with a plethora of competing drafts from innumerable sources and dragged on into 1985
