`
`Chapter 4 Arithmetic for Computers
`
`4.11 Concluding Remarks
`
`309
`
With the explanation of computer arithmetic in this chapter comes a description of much more of the MIPS instruction set. One point of confusion is the instructions covered in these chapters versus instructions executed by MIPS chips versus the instructions accepted by MIPS assemblers. The next two figures try to make this clear.
Figure 4.52 lists the MIPS instructions covered in Chapters 3 and 4. We call the set of instructions on the left-hand side of the figure the MIPS core. The instructions on the right we call the MIPS arithmetic core. On the left of Figure 4.53 are the instructions the MIPS processor executes that are not found in Figure 4.52. We call the full set of hardware instructions MIPS I. On the right of Figure 4.53 are the instructions accepted by the assembler that are not part of MIPS I. We call this set of instructions Pseudo MIPS.
`
taken on a life of its own. Although Intel firmly stands behind the quality of the current version of the Pentium processor, we recognize that many users have concerns. We want to resolve these concerns. Intel will exchange the current version of the Pentium processor for an updated version, in which this floating-point divide flaw is corrected, for any owner who requests it, free of charge anytime during the life of their computer." Analysts estimate that this recall cost Intel $300 million.
This story brings up a few points for everyone to ponder. How much cheaper would it have been to fix the bug in July 1994? What was the cost to repair the damage to Intel's reputation? And what is the corporate responsibility in disclosing bugs in a product so widely used and relied upon as a microprocessor?
In April 1997 another floating-point bug was revealed in the Pentium Pro and Pentium II microprocessors. When the floating-point-to-integer store instructions (fist, fistp) encounter a negative floating-point number that is too large to fit in a 16- or 32-bit word after being converted to integer, they set the wrong bit in the FPU status word (precision exception instead of invalid operation exception). To Intel's credit, this time they publicly acknowledged the bug and offered a software patch to get around it, quite a different reaction from what they did in 1994.
`
4.11 Concluding Remarks
`
Computer arithmetic is distinguished from paper-and-pencil arithmetic by the constraints of limited precision. This limit may result in invalid operations through calculating numbers larger or smaller than the predefined limits. Such anomalies, called "overflow" or "underflow," may result in exceptions or interrupts, emergency events similar to unplanned subroutine calls. Chapter 5 discusses exceptions in more detail.
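The mechanics of integer overflow can be sketched in a few lines. The fragment below is illustrative only: it emulates a 32-bit two's complement add with masking (Python's own integers never overflow), and the returned flag merely mimics the condition under which a MIPS add would raise an exception.

```python
# Emulate a 32-bit two's complement add; Python integers are unbounded,
# so masking stands in for the fixed word width of the hardware.

MASK = 0xFFFFFFFF
INT_MIN, INT_MAX = -2**31, 2**31 - 1

def to_signed(u):
    """Reinterpret a 32-bit pattern as a signed two's complement value."""
    return u - 2**32 if u > INT_MAX else u

def add32(a, b):
    """Return the wrapped 32-bit sum and whether signed overflow occurred."""
    wrapped = to_signed((a + b) & MASK)
    overflow = not (INT_MIN <= a + b <= INT_MAX)
    return wrapped, overflow

print(add32(INT_MAX, 1))  # (-2147483648, True): the sum wraps around
print(add32(1, 2))        # (3, False)
```

The wraparound result is what MIPS addu silently delivers; the True flag marks the case where MIPS add would instead trap.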
Floating-point arithmetic has the added challenge of being an approximation of real numbers, and care needs to be taken to ensure that the computer number selected is the representation closest to the actual number. The challenges of imprecision and limited representation are part of the inspiration for the field of numerical analysis.
Over the years, computer arithmetic has become largely standardized, greatly enhancing the portability of programs. Two's complement binary integer arithmetic and IEEE 754 binary floating-point arithmetic are found in the vast majority of computers sold today. For example, every desktop computer sold since this book was first printed follows these conventions.
A side effect of the stored-program computer is that bit patterns have no inherent meaning. The same bit pattern may represent a signed integer, unsigned integer, floating-point number, instruction, and so on. It is the instruction that operates on the word that determines its meaning.
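A short sketch makes this concrete (the particular 32-bit pattern below is an arbitrary choice): decoded three ways, the same bits yield three unrelated values.

```python
# One 32-bit pattern, three meanings, depending on how we choose to
# interpret it (just as the instruction decides in hardware).
import struct

bits = 0xC0490FDB  # an arbitrary 32-bit pattern

as_unsigned = bits                                           # unsigned integer view
as_signed = struct.unpack('<i', struct.pack('<I', bits))[0]  # two's complement view
as_float = struct.unpack('<f', struct.pack('<I', bits))[0]   # IEEE 754 single view

print(as_unsigned, as_signed, as_float)
# The signed and unsigned views differ by exactly 2**32, and this
# particular pattern happens to encode approximately -3.1415927 as a float.
```

Nothing in the word itself says which reading is "right"; only the operation applied to it does.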
`
MIPS core instructions                     Name    Format
add                                        add     R
add immediate                              addi    I
add unsigned                               addu    R
add immediate unsigned                     addiu   I
subtract                                   sub     R
subtract unsigned                          subu    R
and                                        and     R
and immediate                              andi    I
or                                         or      R
or immediate                               ori     I
shift left logical                         sll     R
shift right logical                        srl     R
load upper immediate                       lui     I
load word                                  lw      I
store word                                 sw      I
load byte unsigned                         lbu     I
store byte                                 sb      I
branch on equal                            beq     I
branch on not equal                        bne     I
jump                                       j       J
jump and link                              jal     J
jump register                              jr      R
set less than                              slt     R
set less than immediate                    slti    I
set less than unsigned                     sltu    R
set less than immediate unsigned           sltiu   I

MIPS arithmetic core                       Name    Format
multiply                                   mult    R
multiply unsigned                          multu   R
divide                                     div     R
divide unsigned                            divu    R
move from Hi                               mfhi    R
move from Lo                               mflo    R
move from system control (EPC)             mfc0    R
floating-point add single                  add.s   R
floating-point add double                  add.d   R
floating-point subtract single             sub.s   R
floating-point subtract double             sub.d   R
floating-point multiply single             mul.s   R
floating-point multiply double             mul.d   R
floating-point divide single               div.s   R
floating-point divide double               div.d   R
load word to floating-point single         lwc1    I
store word to floating-point single        swc1    I
branch on floating-point true              bc1t    I
branch on floating-point false             bc1f    I
floating-point compare single
  (x = eq, neq, lt, le, gt, ge)            c.x.s   R
floating-point compare double
  (x = eq, neq, lt, le, gt, ge)            c.x.d   R

FIGURE 4.52 The MIPS instruction set covered so far. This book concentrates on the instructions in the left column.
`
Figure 4.54 gives the popularity of the MIPS instructions for two programs: gcc and spice. All instructions are listed that were responsible for at least 0.5% of the instructions executed. The table following summarizes that information:
`
`
`
`
`310
`
`Chapter 4 Arithmetic for Computers
`
`4.11 Concluding Remarks
`
`311
`
Instruction subset       gcc    spice
MIPS core                95%    45%
MIPS arithmetic core      0%    49%
Remaining MIPS I          5%     6%

Note that although programmers and compiler writers may use MIPS I to have a richer menu of options, MIPS core instructions dominate gcc execution, and the integer core plus arithmetic core dominate spice.
Remaining MIPS I: exclusive or (xor), exclusive or immediate (xori), nor (nor), shift right arithmetic (sra), shift left logical variable (sllv), shift right logical variable (srlv), shift right arithmetic variable (srav), move to Hi (mthi), move to Lo (mtlo), load halfword (lh), load halfword unsigned (lhu), store halfword (sh), load word left, unaligned (lwl), load word right, unaligned (lwr), store word left, unaligned (swl), store word right, unaligned (swr), branch on less than zero (bltz), branch on less or equal zero (blez), branch on greater than zero (bgtz), branch on greater or equal zero (bgez), branch on greater or equal zero and link (bgezal), branch on less than zero and link (bltzal), jump and link register (jalr), return from exception (rfe), system call (syscall), break, causing exception (break), move from FP to integer (mfc1), move to FP from integer (mtc1), FP move (mov.f), FP absolute value (abs.f), FP negate (neg.f), FP convert (cvt.f.f), FP compare un (c.xn.f).

Pseudo MIPS: move (move), absolute value (abs), not (not), negate, signed or unsigned (neg), rotate left (rol), rotate right (ror), multiply without overflow check (mul), multiply and check overflow (mulo), divide and check overflow (div), divide without overflow check (divu), remainder (rem), load immediate (li), load address (la), load double (ld), store double (sd), unaligned load word (ulw), unaligned store word (usw), unaligned load halfword (ulh), unaligned store halfword (ush), branch (b), branch on equal zero (beqz), branch on greater or equal (bge), branch on greater than (bgt), branch on less or equal (ble), branch on less than (blt), set equal (seq), set not equal (sne), set greater or equal (sge), set greater than (sgt), set less or equal (sle), set less than (slt), load to floating point (l.f), store from floating point (s.f).

FIGURE 4.53 Remaining MIPS I and "Pseudo MIPS" instruction sets. Appendix A describes all these instructions. f means single (s) and double precision (d) versions of the floating-point instruction, and s means signed (s) and unsigned (u) versions.

FIGURE 4.54 The frequency of the MIPS instructions for two programs, gcc and spice. Calculated from "pixie" output of the full MIPS I. (Pixie is an instruction measurement tool from MIPS.) All instructions that accounted for at least 0.5% of the instructions executed in either gcc or spice are included in the table. Thus the integer multiply and divide instructions are not listed because they were responsible for less than 0.5% of the instructions executed. Pseudoinstructions are converted into MIPS I before execution, and hence do not appear here.
`
For the rest of the book, we concentrate on the MIPS core instructions, the integer instruction set excluding multiply and divide, to make the explanation of computer design easier. As we can see, the MIPS core includes the most popular MIPS instructions; be assured that understanding a computer that runs the MIPS core will give you sufficient background to understand even more ambitious machines.
`
`
`
`312
`
`Chapter 4 Arithmetic for Computers
`
`4.12 Historical Perspective and Further Reading
`
`313
`
4.12 Historical Perspective and Further Reading
`
`Gresham's Law ("Bad money drives out Good") for computers would say, "The
`Fast drives out the Slow even if the Fast is wrong."
`
`W. Kahan, 1992
`
At first it may be hard to imagine a subject of less interest than the correctness of computer arithmetic or its accuracy, and harder still to understand why a subject so old and mathematical should be so controversial. Computer arithmetic is as old as computing itself, and some of the subject's earliest notions, like the economical reuse of registers during serial multiplication and division, still command respect today. Maurice Wilkes [1985] recalled a conversation about that notion during his visit to the United States in 1946, before the earliest stored-program machine had been built:
`
... a project under von Neumann was to be set up at the Institute of Advanced Studies in Princeton. ... Goldstine explained to me the principal features of the design, including the device whereby the digits of the multiplier were put into the tail of the accumulator and shifted out as the least significant part of the product was shifted in. I expressed some admiration at the way registers and shifting circuits were arranged ... and Goldstine remarked that things of that nature came very easily to von Neumann.
`
There is no controversy here; it can hardly arise in the context of exact integer arithmetic so long as there is general agreement on what integer the correct result should be. However, as soon as approximate arithmetic enters the picture, so does controversy, as if one person's "negligible" must be another's "everything."
`
`The First Dispute
`
Floating-point arithmetic kindled disagreement before it was ever built. John von Neumann was aware of Konrad Zuse's proposal for a computer in Germany in 1939 that was never built, probably because the floating point made it appear too complicated to finish before the Germans expected World War II to end. Hence von Neumann refused to include it in the machine he built at Princeton. In an influential report coauthored in 1946 with H. H. Goldstine and A. W. Burks, he gave the arguments for and against floating point. In favor:
`
... to retain in a sum or product as many significant digits as possible and ... to free the human operator from the burden of estimating and inserting into a problem "scale factors": multiplication constants which serve to keep numbers within the limits of the machine.
`
`Floating point was excluded for several reasons:
`
There is, of course, no denying the fact that human time is consumed in arranging for the introduction of suitable scale factors. We only argue that the time consumed is a very small percentage of the total time we will spend in preparing an interesting problem for our machine. The first advantage of the floating point is, we feel, somewhat illusory. In order to have such a floating point, one must waste memory capacity which could otherwise be used for carrying more digits per word. It would therefore seem to us not at all clear whether the modest advantages of a floating binary point offset the loss of memory capacity and the increased complexity of the arithmetic and control circuits.
`
The argument seems to be that most bits devoted to exponent fields would be bits wasted. Experience has proved otherwise.
One software approach to accommodate reals without floating-point hardware was called floating vectors; the idea was to compute at runtime one scale factor for a whole array of numbers, choosing the scale factor so that the array's biggest number would barely fill its field. By 1951, James H. Wilkinson had used this scheme extensively for matrix computations. The problem proved to be that a program might encounter a very large value, and hence the scale factor must accommodate these rare large numbers. The common numbers would thus have many leading 0s, since all numbers had to use a single scale factor. Accuracy was sacrificed because the least significant bits had to be lost on the right to accommodate leading 0s. This wastage became obvious to practitioners on early machines that displayed all their memory bits as dots on cathode ray tubes (like TV screens) because the loss of precision was visible.
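The floating-vector scheme can be sketched as follows. The 15-bit field width, power-of-two scale factors, and round-to-nearest are assumptions chosen for illustration, not details of any historical machine, but the fate of the small array element shows exactly the wastage described above.

```python
# Sketch of "floating vectors": one runtime scale factor for a whole array,
# chosen so the array's biggest magnitude barely fills a fixed-point field.

FIELD_BITS = 15  # assumed fraction width of the fixed-point field

def block_scale(xs):
    """Quantize a whole array with a single power-of-two scale factor."""
    biggest = max(abs(x) for x in xs)
    if biggest == 0:
        return list(xs)
    k = 0
    while biggest * 2**k < 0.5:   # scale tiny arrays up
        k += 1
    while biggest * 2**k >= 1.0:  # scale large arrays down
        k -= 1
    # Every element is rounded to FIELD_BITS fractional bits at this one
    # scale, so elements much smaller than the biggest lose low-order bits.
    return [round(x * 2**k * 2**FIELD_BITS) / 2**FIELD_BITS / 2**k for x in xs]

print(block_scale([1000.0, 0.001]))  # [1000.0, 0.0]: the tiny element is lost
```

The large value survives intact, but the value six orders of magnitude smaller rounds away entirely: the precision loss Wilkinson's users could see on the CRT.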
Where floating point deserved to be used, no practical alternative existed. Thus true floating-point hardware became popular because it was useful. By 1957, floating-point hardware was almost ubiquitous. A decimal floating-point unit was available for the IBM 650, and soon the IBM 704, 709, 7090, 7094 ... series would offer binary floating-point hardware for double as well as single precision.
As a result, everybody had floating point, but every implementation was different.
`
Diversity versus Portability

Since roundoff introduces some error into almost all floating-point operations, to complain about another bit of error seems picayune. So for 20 years nobody complained much that those operations behaved a little differently on different machines. If software required clever tricks to circumvent those idiosyncrasies and finally deliver results correct in all but the last several bits, such tricks were deemed part of the programmer's art. For a long time, matrix computations mystified most people who had no notion of error analysis; perhaps this continues to be true. That may be why people are still surprised that
`
numerically stable matrix computations depend upon the quality of arithmetic in so few places, far fewer than are generally supposed. Books by Wilkinson and widely used software packages like Linpack and Eispack sustained a false impression, widespread in the early 1970s, that a modicum of skill sufficed to produce portable numerical software.
Portable here means that the software is distributed as source code in some standard language to be compiled and executed on practically any commercially significant machine, and that it will then perform its task as well as any other program performs that task on that machine. Insofar as numerical software has often been thought to consist entirely of machine-independent mathematical formulas, its portability has often been taken for granted; the mistake in that presumption will become clear shortly.
Packages like Linpack and Eispack cost so much to develop (over a hundred dollars per line of Fortran delivered) that they could not have been developed without U.S. government subsidy; their portability was a precondition for that subsidy. But nobody thought to distinguish how various components contributed to their cost. One component was algorithmic: devise an algorithm that deserves to work on at least one computer despite its roundoff and over/underflow limitations. Another component was the software engineering effort required to achieve and confirm portability to the diverse computers commercially significant at the time; this component grew more onerous as ever more diverse floating-point arithmetics blossomed in the 1970s.
And yet scarcely anybody realized how much that diversity inflated the cost of such software packages.
`
`A Backward Step
`
`Early evidence that somewhat different arithmetics could engender grossly
`different software development costs was presented in 1964. It happened at a
`meeting of SHARE, the IBM mainframe users' group, at which IBM
`announced System/360, the successor to the 7094 series. One of the speakers
`described the tricks he had been forced to devise to achieve a level of quality
`for the S/360 library that was not quite so high as he had previously achieved
`for the 7094.
Part of the trouble could have been foretold by von Neumann had he still been alive. In 1948 he and Goldstine had published a lengthy error analysis so difficult and so pessimistic that hardly anybody paid attention to it. It did predict correctly, however, that computations with larger arrays of data would probably fall prey to roundoff more often. IBM S/360s had bigger memories than 7094s, so data arrays could grow bigger, and they did. To make matters worse, the S/360s had narrower single precision words (32 bits versus 36) and used a cruder arithmetic (hexadecimal, or base 16, versus binary, or base 2) with consequently poorer worst-case precision (21 significant bits versus 27) than
`
`
old 7094s. Consequently, software that had almost always provided (barely) satisfactory accuracy on 7094s too often produced inaccurate results when run on S/360s. The quickest way to recover adequate accuracy was to replace old codes' single precision declarations with double precision before recompilation for the S/360. This practice exercised S/360 double precision far more than had been expected.
The early S/360s' worst troubles were caused by lack of a guard digit in double precision. This lack showed up in multiplication as a failure of identities like 1.0 * x = x, because multiplying x by 1.0 dropped x's last hexadecimal digit (4 bits). Similarly, if x and y were very close but had different exponents, subtraction dropped off the last digit of the smaller operand before computing x - y. This last aberration in double precision undermined a precious theorem that single precision then (and now) honored: If 1/2 ≤ x/y ≤ 2, then no rounding error can occur when x - y is computed; it must be computed exactly.
Innumerable computations had benefited from this minor theorem, most often unwittingly, for several decades before its first formal announcement and proof. We had been taking all this stuff for granted.
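The theorem (often attributed to Sterbenz) still holds in IEEE 754 arithmetic, and it is easy to check empirically; the sketch below compares every qualifying float subtraction against an exact rational reference.

```python
# Empirical check of the theorem: if y/2 <= x <= 2y, then x - y is computed
# exactly in binary floating point (here, IEEE 754 double precision).
from fractions import Fraction
import random

def exact_difference(x, y):
    """True if the float subtraction x - y incurred no rounding error."""
    return Fraction(x) - Fraction(y) == Fraction(x - y)

random.seed(1)
all_exact = True
for _ in range(10_000):
    y = random.uniform(1e-3, 1e3)
    x = y * random.uniform(0.5, 2.0)
    if y / 2 <= x <= 2 * y:  # the hypothesis; scaling by 2 is itself exact
        all_exact = all_exact and exact_difference(x, y)

print(all_exact)  # True: no rounding error in any qualifying subtraction
```

On a machine without a guard digit, such as the early S/360 in double precision, the corresponding check would fail for operands with different exponents.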
The identities and theorems about exact relationships that persisted, despite roundoff, with reasonable implementations of approximate arithmetic were not appreciated until they were lost. Previously, all that had been thought to matter were precision (how many significant digits were carried) and range (the spread between over/underflow thresholds). Since the S/360s' double precision had more precision and wider range than the 7094s', software was expected to continue to work at least as well as before. But it didn't.
`Programmers who had matured into program managers were appalled at
`the cost of converting 7094 software to run on S/360s. A small subcommittee
`of SHARE proposed improvements to the S/360 floating point. This committee
`was surprised and grateful to get a fair part of what they asked for from IBM,
`including all-important guard digits. By 1968, these had been retrofitted to
`S/360s in the field at considerable expense; worse than that was customers'
`loss of faith in IBM's infallibility (a lesson learned by Intel 30 years later). IBM
`employees who can remember the incident still shudder.
`
The People Who Built the Bombs
Seymour Cray was associated for decades with the CDC and Cray computers that were, when he built them, the world's biggest and fastest. He always understood what his customers wanted most: speed. And he gave it to them even if, in so doing, he also gave them arithmetics more "interesting" than anyone else's. Among his customers have been the great government laboratories like those at Livermore and Los Alamos, where nuclear weapons were designed. The challenges of "interesting" arithmetics were pretty tame to people who had to overcome Mother Nature's challenges.
`
`
Perhaps all of us could learn to live with arithmetic idiosyncrasy if only one computer's idiosyncrasies had to be endured. Instead, when accumulating different computers' different anomalies, software dies the Death of a Thousand Cuts. Here is an example from Cray's machines:

if (x == 0.0) y = 17.0 else y = z/x
Could this statement be stopped by a divide-by-zero error? On a CDC 6600 it could. The reason was a conflict between the 6600's adder, where x was compared with 0.0, and the multiplier and divider. The adder's comparison examined x's leading 13 bits, which sufficed to distinguish zero from normal nonzero floating-point numbers x. The multiplier and divider examined only 12 leading bits. Consequently, tiny numbers existed that were nonzero to the adder but zero to the multiplier and divider! To avoid disasters with these tiny numbers, programmers learned to replace statements like the one above by

if (1.0 * x == 0.0) y = 17.0 else y = z/x
But this statement is unsafe to use in would-be portable software because it malfunctions obscurely on other computers designed by Cray, the ones marketed by Cray Research, Inc. If x is so huge that 2.0 * x would overflow, then 1.0 * x may overflow too! Overflow happens because Cray computers check the product's exponent before the product's exponent has been normalized, just to save the delay of a single AND gate.
In case you think the statement above is safe to use now for portable software, since computers of the CDC 6600 era are no longer commercially significant, you should be warned that it can lead to overflow on a Cray computer even if z is almost as tiny as x; the trouble here is that the Cray computes not z/x but z * (1/x), and the reciprocal can overflow even though the desired quotient is unexceptionable. A similar difficulty troubles the Intel i860s used in its massively parallel computers. The would-be programmer of portable code faces countless dilemmas like these whenever trying to program for the full range of existing computers.
Rounding error anomalies that are far worse than the over/underflow anomaly just discussed also affect Cray computers. The worst error comes from the lack of a guard digit in add/subtract, an affliction of IBM S/360s. Further bad luck for software is occasioned by the way Cray economized his multiplier; about one-third of the bits that normal multiplier arrays generate have been left out of his multipliers because they would contribute less than a unit to the last place of the final Cray-rounded product. Consequently, a Cray's multiplier errs by almost a bit more than might have been expected. This error is compounded when division takes three multiplications to improve an approximate reciprocal of the divisor and then multiply the numerator by it. Square root compounds a few more multiplication errors.
The fast way drove out the slow, even though the fast was occasionally slightly wrong.
`
`
Making the World Safe for Floating Point, or Vice Versa
William Kahan was an undergraduate at the University of Toronto in 1953 when he learned to program its Ferranti-Manchester Mark-I computer. Because he entered the field early, Kahan became acquainted with a wide range of devices and a large proportion of the personalities active in computing; the numbers of both were small at that time. He has performed computations on slide rules, desktop mechanical calculators, tabletop analog differential analyzers, and so on; he used all but the earliest electronic computers and calculators mentioned in this book.
Kahan's desire to deliver reliable software led to an interest in error analysis that intensified during two years of postdoctoral study in England, where he became acquainted with Wilkinson. In 1960, he resumed teaching at Toronto, where an IBM 7090 had been acquired, and was granted free rein to tinker with its operating system, Fortran compiler, and runtime library. (He denies that he ever came near the 7090 hardware with a soldering iron but admits asking to do so.) One story from that time illuminates how misconceptions and numerical anomalies in computer systems can incur awesome hidden costs.
A graduate student in aeronautical engineering used the 7090 to simulate the wings he was designing for short takeoffs and landings. He knew such a wing would be difficult to control if its characteristics included an abrupt onset of stall, but he thought he could avoid that. His simulations were telling him otherwise. Just to be sure that roundoff was not interfering, he had repeated many of his calculations in double precision and gotten results much like those in single; his wings had stalled abruptly in both precisions. Disheartened, the student gave up.
`Meanwhile Kahan replaced IBM's logarithm program (ALOG) with one of
`his own, which he hoped would provide better accuracy. While testing it,
`Kahan reran programs using the new version of ALOG. The student's results
`changed significantly; Kahan approached him to find out what had happened.
The student was puzzled. Much as the student preferred the results produced with the new ALOG (they predicted a gradual stall), he knew they
`must be wrong because they disagreed with his double precision results. The
`discrepancy between single and double precision results disappeared a few
`days later when a new release of IBM's double precision arithmetic software
`for the 7090 arrived. (The 7090 had no double precision hardware.) He went on
`to write a thesis about it and to build the wings; they performed as predicted.
`But that is not the end of the story.
In 1963, the 7090 was replaced by a faster 7094 with double precision floating-point hardware but with otherwise practically the same instruction set as the 7090. Only in double precision and only when using the new hardware did the wing stall abruptly again. A lot of time was spent to find out why. The 7094 hardware turned out, like the superseded 7090 software and the subsequent early S/360s, to lack a guard bit in double precision. Like so many programmers on those machines and on Cray's, the student discovered a trick to
`
`
compensate for the lack of a guard digit; he wrote the expression (0.5 - x) + 0.5 in place of 1.0 - x. Nowadays we would blush if we had to explain why such a trick might be necessary, but it solved the student's problem.
Meanwhile the lure of California was working on Kahan and his family; they came to Berkeley and he to the University of California. An opportunity presented itself in 1974, when accuracy questions induced Hewlett-Packard's calculator designers to call in a consultant. The consultant was Kahan, and his work dramatically improved the accuracy of HP calculators, but that is another story. Fruitful collaboration with congenial co-workers, however, fortified him for the next and crucial opportunity.
It came in 1976, when John F. Palmer at Intel was empowered to specify the "best possible" floating-point arithmetic for all of Intel's product line. The 8086 was imminent, and an 8087 floating-point coprocessor for the 8086 was contemplated. (A coprocessor is simply an additional chip that accelerates a portion of the work of a processor; in this case, it accelerated floating-point computation.)
Palmer had obtained his Ph.D. at Stanford a few years before and knew whom to call for counsel of perfection: Kahan. They put together a design that obviously would have been impossible only a few years earlier and looked not quite possible at the time. But a new Israeli team of Intel employees led by Rafi Nave felt challenged to prove their prowess to Americans and leaped at an opportunity to put something impossible on a chip: the 8087.
`.
`By now, floating-point arithmetics that had been merely d1ver~e am?ng
`mainframes had become chaotic among microprocessors, one of which might
`be host to a dozen varieties of arithmetic in ROM firmware or software. Robert
`G. Stewart, an engineer prominent in IEEE activities, got fed up with this a~(cid:173)
`archy and proposed that the IEEE draft a decent floating-point ~tandard. Si(cid:173)
`multaneously, word leaked out in Silicon Valley that Intel ':as ~omg to p~t on
`one chip some awesome floating point well beyond anything its competitors
`had in mind. The competition had to find a way to slow Intel down, so they
`formed a committee to do what Stewart requested.
`Meetings of this committee began in late 1977 with a plethora of competing
`drafts from innumerable sources and dragged on into 1985