Huppenthal Reference 1

PATENT OWNER DIRECTSTREAM, LLC
EX. 2102, p. 1

We believe the key to the PDP-10’s longevity is its basically simple, clean structure with an adequately large (one Mbyte) address space that allows users to get work done. In this way, it has evolved easily with use and with technology. An equally significant factor in its success is a single operating system environment enabling user program sharing among all machines. The machine has thus attracted users who have built significant languages and applications in a variety of environments. These user-developers are thus the dominant system architects-implementors.

In retrospect, the machine turned out to be larger and further from a minicomputer than we expected. As such it could easily have died or destroyed the tiny DEC organization that started it. We hope that this paper has provided insight into the interactions of its development.

Acknowledgments. Dan Siewiorek deserves our greatest thanks for helping with a complete editing of the text. The referees and editors have been especially helpful. The important program contributions by users are too numerous for us to give by name, but here are most of them: APL, Basic, BLISS, DDT, Lisp, Pascal, Simula, SOS, TECO, and TENEX. Likewise, there have been so many contributions to the 10’s architecture and implementations within DEC and throughout the user community that we dare not give what would be a partial list.

Received April 1977; revised September 1977

Computer Systems
G. Bell, S. H. Fuller, and D. Siewiorek, Editors

The CRAY-1 Computer System
Richard M. Russell
Cray Research, Inc.

This paper describes the CRAY-1, discusses the evolution of its architecture, and gives an account of some of the problems that were overcome during its manufacture.

The CRAY-1 is the only computer built to date that satisfies ERDA’s Class VI requirement (a computer capable of processing from 20 to 60 million floating-point operations per second) [1].

The CRAY-1’s Fortran compiler (CFT) is designed to give the scientific user immediate access to the benefits of the CRAY-1’s vector processing architecture. CFT is an optimizing compiler that “vectorizes” innermost DO loops. It is compatible with the ANSI 1966 Fortran Standard and with many commonly supported Fortran extensions, and it does not require any source program modifications or the use of additional nonstandard Fortran statements to achieve vectorization. The user’s investment of hundreds of man-months of effort to develop Fortran programs for other contemporary computers is therefore protected.
Key Words and Phrases: architecture, computer systems
CR Categories: 1.2, 6.2, 6.3

Copyright © 1977, Association for Computing Machinery, Inc. General permission to republish, but not for profit, all or part of this material is granted provided that ACM’s copyright notice is given and that reference is made to the publication, to its date of issue, and to the fact that reprinting privileges were granted by permission of the Association for Computing Machinery.
Author’s address: Cray Research Inc., Suite 213, 7850 Metro Parkway, Minneapolis, MN 55420.
Communications of the ACM, January 1978, Volume 21, Number 1

Introduction

Vector processors are not yet commonplace machines in the larger-scale computer market. At the time of this writing we know of only 12 non-CRAY-1 vector processor installations worldwide. Of these 12, the most powerful processor is the ILLIAC IV (1 installation), the most populous is the Texas Instruments Advanced Scientific Computer (7 installations), and the most publicized is Control Data’s STAR 100 (4 installations). In its report on the CRAY-1, Auerbach Computer Technology Reports published a comparison of the CRAY-1, the ASC, and the STAR 100 [2]. That comparison shows the CRAY-1 to be more powerful than any of its main competitors and estimates it to be the equivalent of five IBM 370/195s.

Independent benchmark studies have shown the CRAY-1 fully capable of supporting computational rates of 138 million floating-point operations per second (MFLOPS) for sustained periods, and even higher rates of 250 MFLOPS in short bursts [3, 4]. Such comparatively high performance results from the CRAY-1 internal architecture. That architecture is designed to accommodate the computational needs of carrying out many calculations in discrete steps, with each step producing interim results used in subsequent steps. Through a technique called “chaining,” the CRAY-1 vector functional units, in combination with scalar and vector registers, generate interim results and use them again immediately. This avoids the additional memory references that slow down the computational process in other contemporary computer systems.

Other features enhancing the CRAY-1’s computational capabilities are:
- its small size, which reduces the distances electrical signals must travel within the computer’s framework and allows a 12.5 nanosecond clock period (the CRAY-1 is the world’s fastest scalar processor);
- a one million word semiconductor memory equipped with error detection and correction logic (SECDED);
- its 64-bit word size;
- its optimizing Fortran compiler.

Architecture
The CRAY-1 has been called “the world’s most expensive love-seat” [5]. Certainly, most people’s first reaction to the CRAY-1 is that it is so small. But in computer design it is a truism that smaller means faster: the greater the separation of components, the longer the time taken for a signal to pass between them. A cylindrical shape was chosen for the CRAY-1 in order to keep wiring distances small.

Figure 1 shows the physical dimensions of the machine. The mainframe is composed of 12 wedge-like columns arranged in a 270° arc. This leaves room for a reasonably trim individual to gain access to the interior of the machine. Note that the love-seat disguises the power supplies and some plumbing for the Freon cooling system. The photographs (Figures 2 and 3) show the interior of a working CRAY-1 and an exterior view of a column with one module in place. Figure 4 is a photograph of the interior of a single module.

An Analysis of the Architecture
Table I details important characteristics of the CRAY-1 Computer System. The CRAY-1 is equipped with 12 I/O channels, 16 memory banks, 12 functional units, and more than 4K bytes of register storage. Access to memory is shared by the I/O channels and high-speed registers. The most striking features of the CRAY-1 are its use of only four chip types, its main memory speed, its cooling system, and its computation section.

Fig. 1. Physical organization of mainframe.
Dimensions:
- Base: 103½ inches diameter by 19 inches high
- Columns: 56½ inches diameter by 77 inches high, including height of base
- 24 chassis
- 1662 modules; 113 module types
- Each module contains up to 288 IC packages
- Power consumption approximately 115 kW input for maximum memory size
- Freon cooled with Freon/water heat exchange
- Three memory options
- Weight 10,500 lbs (maximum memory size)
- Three basic chip types: 5/4 NAND gates, memory chips, register chips

Four Chip Types
Only four chip types are used to build the CRAY-1. These are 16 x 4-bit bipolar register chips (6 nanosecond cycle time), 1024 x 1-bit bipolar memory chips (50 nanosecond cycle time), and bipolar logic chips with subnanosecond propagation times. The logic chips are all simple low- or high-speed gates with both a 5-wide and a 4-wide gate (5/4 NAND). Emitter-coupled logic (ECL) circuit technology is used throughout the CRAY-1.

The printed circuit board used in the CRAY-1 is a 5-layer board. The two outer surfaces are used for signal runs, and the three inner layers carry the -5.2V, -2.0V, and ground power supplies. The boards are 6 inches wide and 8 inches long, and fit into the chassis as shown in Figure 3.

All integrated circuit devices used in the CRAY-1 are packaged in 16-pin hermetically sealed flat packs supplied by both Fairchild and Motorola. This type of package was chosen for its reliability and compactness. Compactness is of special importance: as many as 288 packages may be added to a board to fabricate a module (there are 113 module types), and as many as 72 modules may be inserted into a 28-inch-high chassis. Such component densities inevitably lead to a mammoth cooling problem (described below).

Fig. 2. The CRAY-1 Computer.
Fig. 3. CRAY-1 modules in place.
Fig. 4. A single module.

Table I. CRAY-1 CPU characteristics summary

Computation Section
- Scalar and vector processing modes
- 12.5 nanosecond clock period operation
- 64-bit word size
- Integer and floating-point arithmetic
- Twelve fully segmented functional units
- Eight 24-bit address (A) registers
- Sixty-four 24-bit intermediate address (B) registers
- Eight 64-bit scalar (S) registers
- Sixty-four 64-bit intermediate scalar (T) registers
- Eight 64-element vector (V) registers (64 bits per element)
- Vector length and vector mask registers
- One 64-bit real time clock (RT) register
- Four instruction buffers of sixty-four 16-bit parcels each
- 128 basic instructions
- Prioritized interrupt control

Memory Section
- 1,048,576 64-bit words (plus 8 check bits per word)
- 16 independent banks of 65,536 words each
- 4 clock period bank cycle time
- 1 word per clock period transfer rate for B, T, and V registers
- 1 word per 2 clock periods transfer rate for A and S registers
- 4 words per clock period transfer rate to instruction buffers (up to 16 instructions per clock period)

I/O Section
- 24 I/O channels organized into four 6-channel groups
- Each channel group contains either 6 input or 6 output channels
- Each channel group served by memory every 4 clock periods
- Channel priority within each channel group
- 16 data bits, 3 control bits per channel, and 4 parity bits
- Maximum channel rate of one 64-bit word every 100 nanoseconds
- Maximum data streaming rate of 500,000 64-bit words/second
- Channel error detection

Main Memory Speed
CRAY-1 memory is organized in 16 banks, with 72 modules per bank. Each module contributes 1 bit to a 72-bit stored word. Sixty-four of these bits hold data, and the other 8 bits store an 8-bit check byte required for single-bit error correction, double-bit error detection (SECDED). Data words are stored in 1-bank increments throughout memory, so consecutive addresses fall in consecutive banks. This organization allows 16-way interleaving of memory accesses and prevents bank conflicts, except in the case of memory accesses that step through memory with either an 8- or 16-word increment.
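
The following Python sketch shows why only 8- and 16-word increments cause conflicts. It models 16 banks, each of which stays busy for the 4 clock period bank cycle time given in Table I, and it issues at most one reference per clock period. This is a simplified, hypothetical model, not a description of the CRAY-1's actual memory control logic. The function `clocks_for_stride` and its issue rule are assumptions.

```python
BANKS = 16       # independent memory banks (Table I)
BANK_CYCLE = 4   # clock periods a bank stays busy per reference (Table I)


def clocks_for_stride(stride: int, n_refs: int = 64) -> int:
    """Clock periods needed to issue n_refs references whose word addresses
    step by `stride`, issuing at most one reference per clock period and
    stalling whenever the target bank is still in its cycle."""
    busy_until = [0] * BANKS  # first clock period at which each bank is free again
    clock = 0
    for n in range(n_refs):
        bank = (n * stride) % BANKS           # consecutive words fall in consecutive banks
        clock = max(clock, busy_until[bank])  # stall until that bank finishes its cycle
        busy_until[bank] = clock + BANK_CYCLE
        clock += 1                            # the next reference may issue one clock later
    return clock


for stride in (1, 2, 4, 8, 16):
    print(f"stride {stride:2d}: {clocks_for_stride(stride)} clock periods for 64 words")
```

Under this model, strides of 1, 2, and 4 words need 64 clock periods for 64 references. A stride of 8 needs 126 clock periods and a stride of 16 needs 253, because those strides return to the same bank (or the same pair of banks) before its 4 clock period cycle has finished.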
Cooling System
The CRAY-1 generates about four times as much heat per cubic inch as the CDC 7600. To cool the CRAY-1, a new cooling technology was developed. It is also based on Freon, but it employs available metal conductors in a new way. Within each chassis, vertical aluminum/stainless steel cooling bars line each column wall. The Freon refrigerant is passed through a stainless steel tube within the aluminum casing. When modules are in place, heat is dissipated through the inner copper heat transfer plate in the module to the column walls and thence into the cooling bars. The modules are mated with the cold bar by using stainless steel pins to pinch the copper plate against the aluminum outer casing of the bar.

Fig. 5. Block diagram of registers. (Diagram showing the vector registers V0 through V7; the scalar registers S0 through S7 with the T registers T00 through T77; the address registers A0 through A7 with the B registers B00 through B77; the VL, VM, RTC, and XA registers; the vector, floating-point, scalar, and address functional units; the instruction buffers; I/O control; exchange control; and memory.)
To assure component reliability, the cooling system was designed to provide a maximum case temperature of 130°F (54°C). To meet this goal, the following temperature differentials are observed:
- Temperature at center of module: 130°F (54°C)
- Temperature at edge of module: 118°F (48°C)
- Cold plate temperature at wedge: 78°F (25°C)
- Cold bar temperature: 70°F (21°C)
- Refrigerant tube temperature: 70°F (21°C)

Functional Units
There are 12 functional units, organized in four groups: address, scalar, vector, and floating point. Each functional unit is pipelined into single clock segments. Functional unit times are shown in Table II. All of the functional units can operate concurrently. In addition to the benefits of pipelining (each functional unit can be driven at a result rate of 1 per clock period), this gives parallelism across the units.

Note the absence of a divide unit in the CRAY-1. To have a completely segmented divide operation, the CRAY-1 performs floating-point division by the method of reciprocal approximation. This technique has been used before (e.g., IBM System/360 Model 91).
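
The paper does not say how the reciprocal is formed or refined. As a generic illustration of division by reciprocal approximation, this Python sketch computes a/b as a x (1/b). It starts from a crude linear estimate of the reciprocal and refines it with the Newton-Raphson step x <- x(2 - bx), which roughly doubles the number of correct bits at each step. The seed formula, the step count, and the function names are assumptions chosen for the example. They are not CRAY-1 hardware parameters.

```python
import math


def reciprocal(b: float, steps: int = 5) -> float:
    """Approximate 1/b (b != 0) by Newton-Raphson refinement of a linear seed."""
    if b == 0:
        raise ZeroDivisionError("reciprocal of zero")
    m, e = math.frexp(abs(b))         # |b| = m * 2**e with 0.5 <= m < 1
    x = 48 / 17 - 32 / 17 * m         # linear seed for 1/m, relative error at most 1/17
    for _ in range(steps):
        x = x * (2.0 - m * x)         # Newton-Raphson step: the error is roughly squared
    return math.copysign(math.ldexp(x, -e), b)  # 1/b = (1/m) * 2**-e, with b's sign


def divide(a: float, b: float) -> float:
    """Division as multiplication by an approximate reciprocal."""
    return a * reciprocal(b)


print(divide(1.0, 3.0))   # close to 0.3333333333333333
```

Every step is a multiply or a subtract. For a pipelined machine, this means division needs no separate, unsegmented divide unit.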
Registers
Figure 5 shows the CRAY-1 registers in relationship to the functional units, instruction buffers, I/O channel control registers, and memory. The basic set of programmable registers is as follows:
- 8 24-bit address (A) registers
- 64 24-bit address-save (B) registers
- 8 64-bit scalar (S) registers
- 64 64-bit scalar-save (T) registers
- 8 64-word (4096-bit) vector (V) registers
Expressed in 8-bit bytes rather than 64-bit words, that is a total of 4,888 bytes of high-speed (6 ns) register storage.

The functional units take input operands only from A, S, and V registers, and they store results only to those registers. The large amount of register storage is therefore a crucial factor in the CRAY-1’s architecture. Chaining could not take place if vector register space were not available for the storage of final or intermediate results. The B and T registers greatly assist scalar performance. Temporary scalar values can be stored from and reloaded to the A and S registers in two clock periods. Figure 5 shows the CRAY-1’s register paths in detail.

The speed of the CFT Fortran IV compiler would be seriously impaired if it were unable to keep the many Pass 1 and Pass 2 tables it needs in register space. Without the register storage provided by the B, T, and V registers, the CRAY-1’s bandwidth of only 80 million words/second would be a serious impediment to performance.
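
The 4,888-byte figure can be checked against the register list above. This short Python sketch multiplies each register count by its width, sums the bits, and converts the total to 8-bit bytes. The dictionary and function names are illustrative only.

```python
# (number of registers, bits per register) from the list above
REGISTER_FILES = {
    "A": (8, 24),
    "B": (64, 24),
    "S": (8, 64),
    "T": (64, 64),
    "V": (8, 64 * 64),  # 64 elements of 64 bits each
}


def register_bytes(files: dict[str, tuple[int, int]]) -> int:
    """Total register storage in 8-bit bytes."""
    total_bits = sum(count * width for count, width in files.values())
    return total_bits // 8


print(register_bytes(REGISTER_FILES))  # 4888
```

The V registers account for 32,768 of the 39,104 bits, so vector storage dominates the total.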

Instruction Formats
Instructions are expressed in either one or two 16-bit parcels. Two-parcel instructions may overlap memory-word boundaries. The general form of a CRAY-1 instruction is:

Parcel 1 (bit positions 0-15):
- g: bit positions 0-3 (4 bits)
- h: bit positions 4-6 (3 bits)
- i: bit positions 7-9 (3 bits)
- j: bit positions 10-12 (3 bits)
- k: bit positions 13-15 (3 bits)
Parcel 2 (bit positions 16-31):
- m: bit positions 16-31 (16 bits)

The computation section processes instructions at a maximum rate of one parcel per clock period.

Table II. CRAY-1 functional units (register usage; functional unit time in clock periods)

Address functional units
- address add unit: A; 2
- address multiply unit: A; 6
Scalar functional units
- scalar add unit: S; 3
- scalar shift unit: S; 2, or 3 if double-word shift
- scalar logical unit: S; 1
- population/leading zero count unit: S; 3
Vector functional units
- vector add unit: V; 3
- vector shift unit: V; 4
- vector logical unit: V; 2
Floating-point functional units
- floating-point add unit: S and V; 6
- floating-point multiply unit: S and V; 7
- reciprocal approximation unit: S and V; 14

For arithmetic and logical instructions, a 7-bit operation code (gh) is followed by three 3-bit register designators. The first field, i, designates the result register. The j and k fields designate the two operand registers, or are combined to designate a B or T register.

The shift and mask instructions consist of a 7-bit operation code (gh) followed by a 3-bit i field and a 6-bit jk field. The i field designates the operand register. The combined jk field specifies a shift or mask count.

Immediate operand, read and store memory, and branch instructions require the two-parcel instruction word format. The immediate operand and the read and store memory instructions combine the j, k, and m fields to define a 22-bit quantity or memory address. In addition, the read and store memory instructions use the h field to specify an operating register for indexing. The branch instructions combine the i, j, k, and m fields into a 24-bit memory address field. This allows branching to any one of the four parcel positions in any 64-bit word, whether in memory or in an instruction buffer.
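
The Python below is a sketch of the field layout described above. It splits a 16-bit parcel into its g, h, i, j, and k fields and forms the combined gh, jk, and jkm values the text describes. It assumes bit 0 is the leftmost (most significant) bit of the parcel, so that g occupies the top four bits. The function and type names are hypothetical, and no opcode meanings are implied.

```python
from typing import NamedTuple


class Parcel(NamedTuple):
    g: int  # bit positions 0-3
    h: int  # bit positions 4-6
    i: int  # bit positions 7-9
    j: int  # bit positions 10-12
    k: int  # bit positions 13-15


def decode_parcel(parcel: int) -> Parcel:
    """Split a 16-bit parcel into fields, counting bit 0 as the leftmost bit."""
    if not 0 <= parcel < 1 << 16:
        raise ValueError("a parcel is 16 bits")
    return Parcel(
        g=(parcel >> 12) & 0o17,
        h=(parcel >> 9) & 0o7,
        i=(parcel >> 6) & 0o7,
        j=(parcel >> 3) & 0o7,
        k=parcel & 0o7,
    )


def opcode(p: Parcel) -> int:
    """7-bit operation code gh."""
    return (p.g << 3) | p.h


def jk(p: Parcel) -> int:
    """6-bit combined jk field (e.g., a shift or mask count)."""
    return (p.j << 3) | p.k


def jkm(p: Parcel, m: int) -> int:
    """22-bit quantity formed from j, k, and the 16-bit second parcel m."""
    return (jk(p) << 16) | (m & 0xFFFF)


p = decode_parcel(0o030123)
print(p)               # Parcel(g=3, h=0, i=1, j=2, k=3)
print(oct(opcode(p)))  # 0o30
```

Writing the parcel in octal makes the fields easy to read. The value 0o030123 decodes to gh = 030, i = 1, j = 2, and k = 3, which are the same digits in the same order.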
Operating Registers
Five types of registers are provided in the CRAY-1: three primary types (A, S, and V) and two intermediate types (B and T).

A registers: eight 24-bit A registers serve a variety of applications. They are primarily used as address registers for memory references and as index registers. They are also used to provide values for shift counts, loop control, and channel I/O operations. In address applications, they are used to index the base address for scalar memory references and to provide both a base address and an index address for vector memory references.

The 24-bit integer functional units modify values (such as program addresses) by adding, subtracting, and multiplying A register quantities. The results of these operations are returned to A registers.

Data can be transferred directly from memory to A registers, or can be placed in B registers as an intermediate step. This allows buffering of the data between A registers and memory. Data can also be transferred between A and S registers, and from an A register to the vector length register. The eight A registers are individually designated by the symbols A0, A1, A2, A3, A4, A5, A6, and A7.
B registers: there are sixty-four 24-bit B registers, which are used as auxiliary storage for the A registers. The transfer of an operand between an A and a B register requires only one clock period. Typically, B registers contain addresses and counters that are referenced over a longer period than would permit their being retained in A registers. A block of data in B registers may be transferred to or from memory at the rate of one clock period per register. It is therefore feasible to store the contents of these registers in memory before calling a subroutine that requires their use. The sixty-four B registers are individually designated by the symbols B0, B1, B2, . . . , and B77 (octal).
S registers: eight 64-bit S registers are the principal data handling registers for scalar operations. The S registers serve as both source and destination registers for scalar arithmetic and logical instructions. Scalar quantities involved in vector operations are held in S registers. Logical, shift, fixed-point, and floating-point operations may be performed on S register data. The eight S registers are individually designated by the symbols S0, S1, S2, S3, S4, S5, S6, and S7.
T registers: sixty-four 64-bit T registers are used as auxiliary storage for the S registers. The transfer of an operand between S and T registers requires one clock period. Typically, T registers contain operands that are referenced over a longer period than would permit their being retained in S registers. T registers allow intermediate results of complex computations to be held in intermediate access storage rather than in memory. A block of data in T registers may be transferred to or from memory at the rate of one word per clock period. The sixty-four T registers are individually designated by the symbols T0, T1, T2, . . . , and T77 (octal).
V registers: eight 64-element V registers provide operands to, and receive results from, the functional units at a one clock period rate. Each element of a V register holds a 64-bit quantity. When associated data is grouped into successive elements of a V register, the register may be considered to contain a vector. Examples of vector quantities are rows and columns of a matrix, or similarly related elements of a table. Computational efficiency is achieved by processing each element of the vector identically. Vector merge and test instructions are provided in the CRAY-1 to allow operations to be performed on individual elements designated by the content of the vector mask (VM) register. The number of vector register elements to be processed is contained in the vector length (VL) register. The eight V registers are individually designated by the symbols V0, V1, V2, V3, V4, V5, V6, and V7.

Supporting Registers
The CPU contains a variety of additional registers that support the control of program execution. These are:
- the vector length (VL) and vector mask (VM) registers;
- the program counter (P);
- the base address (BA) and limit address (LA) registers;
- the exchange address (XA) register;
- the flag (F) register;
- the mode (M) register.

VM register: the 64-bit vector mask (VM) register controls vector element designation in vector merge and test instructions. Each bit of the VM register corresponds to a vector register element. In the vector test instruction, the VM register content is defined by testing each element of a V register for a specific condition.
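
A minimal Python sketch of the test and merge idea follows. A test over the elements sets one VM bit for each element that meets a condition. A merge then takes each element, for the first VL elements, from one of two sources according to its VM bit. The mapping of VM bit 0 (leftmost) to element 0 is an assumption for illustration, as are the function names. The actual instruction forms are not described here.

```python
from collections.abc import Callable

VL_MAX = 64  # elements per V register


def vector_test(v: list[int], condition: Callable[[int], bool]) -> int:
    """Build a 64-bit VM value with a bit set for each element meeting `condition`.

    Assumption: the leftmost VM bit (bit 0) corresponds to element 0.
    """
    vm = 0
    for element, value in enumerate(v[:VL_MAX]):
        if condition(value):
            vm |= 1 << (63 - element)
    return vm


def vector_merge(vm: int, a: list[int], b: list[int], vl: int) -> list[int]:
    """For the first vl elements, take a[i] where VM bit i is set, else b[i]."""
    return [a[i] if (vm >> (63 - i)) & 1 else b[i] for i in range(vl)]


v = [3, -1, 0, 7]
vm = vector_test(v, lambda x: x > 0)
print(vector_merge(vm, v, [0] * len(v), len(v)))  # [3, 0, 0, 7]
```

In this example, the merge replaces every non-positive element with zero.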

P register: the 24-bit P register specifies the memory parcel address of the current program instruction. The high-order 22 bits specify a memory address, and the low-order two bits indicate a parcel number. This parcel address is advanced by one as each instruction parcel in a nonbranching sequence is executed. It is replaced whenever program branching occurs.
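
The parcel addressing in P can be sketched in Python as shown below. The word address is P shifted right by two bits, and the parcel number is the low two bits. Advancing past parcel 3 therefore moves to parcel 0 of the next word. The function name is hypothetical.

```python
def split_parcel_address(p: int) -> tuple[int, int]:
    """Split a 24-bit P value into (22-bit word address, 2-bit parcel number)."""
    if not 0 <= p < 1 << 24:
        raise ValueError("P is a 24-bit register")
    return p >> 2, p & 0b11


p = 10 * 4 + 3                      # word 10, parcel 3
print(split_parcel_address(p))      # (10, 3)
print(split_parcel_address(p + 1))  # (11, 0): advancing by one parcel moves to the next word
```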
BA register: the 18-bit base address (BA) register contains the upper 18 bits of a 22-bit memory address. The lower four bits of this address are considered zeros. Just prior to initial or continued execution of a program, a process known as the “exchange sequence” stores into the BA register the upper 18 bits of the lowest memory address to be referenced during program execution. As the program executes, the address portion of each instruction that references memory has its content added to that of the BA register. The sum then serves as the absolute address used for the memory reference, which ensures that memory addresses lower than the contents of the BA register are not accessed. All instructions that reference memory must therefore contain relative addresses in their address portions. This process supports program loading and memory protection operations. In producing an absolute address, it does not affect the content of the instruction buffer, BA, or memory.
LA register: the 18-bit limit address (LA) register contains the upper 18 bits of a 22-bit memory address. The lower 4 bits of this address are considered zeros. Just prior to initial or continued execution of a program, the “exchange sequence” process stores into the LA register the upper 18 bits of the absolute address that is one greater than the highest address the program is allowed to reference. When program execution begins, each instruction that references a memory location has the absolute address for that reference (determined by summing its address portion with the BA register contents) checked against the LA register content. If the absolute address equals or exceeds the LA register content, an out-of-range error condition is flagged and program execution terminates. This process supports the memory protection operation.
XA register: the 8-bit exchange address (XA) register contains the upper eight bits of a 12-bit memory address. The lower four bits of the address are considered zeros. Because only twelve bits are used, and the lower four are always zeros, exchange addresses can reference only every 16th memory address, beginning with address 0000 and ending with address 4080. Each of these addresses designates the first word of a 16-word set, so 256 sets (of 16 memory words each) can be specified. Prior to initiation or continuation of a program’s execution, the XA register contains the first memory address of a particular 16-word set, or exchange package. The exchange package contains the contents of certain operating and support registers, as required for operations following an interrupt. The XA register supports the exchange sequence operation, and the contents of XA are stored in an exchange package whenever an exchange sequence occurs.
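
Exchange package addressing reduces to a shift. This Python sketch, with a hypothetical function name, maps each 8-bit XA value to the first word of its 16-word set. It confirms that there are 256 sets, running from address 0 to address 4080.

```python
def exchange_package_address(xa: int) -> int:
    """First word address of the 16-word exchange package selected by an 8-bit XA value."""
    if not 0 <= xa < 1 << 8:
        raise ValueError("XA is an 8-bit register")
    return xa << 4  # the low four bits of the 12-bit address are always zero


addresses = [exchange_package_address(xa) for xa in range(256)]
print(len(addresses), addresses[0], addresses[1], addresses[-1])  # 256 0 16 4080
```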
F register: the 9-bit F register contains flags that, whenever set, indicate interrupt conditions causing initiation of an exchange sequence. The interrupt conditions are:
- normal exit;
- error exit;
- I/O interrupt;
- uncorrected memory error;
- program range error;
- operand range error;
- floating-point overflow;
- real-time clock interrupt;
- console interrupt.
M register: the M (mode) register is a three-bit register that contains part of the exchange package for a currently active program. The three bits are selectively set during an exchange sequence. Bit 37, the floating-point error mode flag, can be set or cleared during the execution interval for a program through use of the 0021 and 0022 instructions. The other two bits (bits 38 and 39) are not altered during the execution interval for the exchange package. They can only be altered when the exchange package is inactive in storage. Bits are assigned as follows in word two of the exchange package:
- Bit 37: Floating-point error mode flag. When this bit is set, interrupts on floating-point errors are enabled.
- Bit 38: Uncorrectable memory error mode flag. When this bit is set, interrupts on uncorrectable memory parity errors are enabled.
- Bit 39: Monitor mode flag. When this bit is set, all interrupts other than parity errors are inhibited.
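
To read the M register bits from word two of an exchange package, bit positions must be counted from the left of the 64-bit word, so that bit 0 is the most significant bit. The legend of the exchange package figure notes this convention ("bit position from left of word"). The Python below is a hypothetical helper built on that assumption.

```python
MODE_BITS = {
    37: "floating-point error mode",
    38: "uncorrectable memory error mode",
    39: "monitor mode",
}


def bit_from_left(word: int, position: int) -> int:
    """Bit `position` of a 64-bit word, with bit 0 the leftmost (most significant)."""
    return (word >> (63 - position)) & 1


def active_modes(word_two: int) -> list[str]:
    """Names of the M register flags (bits 37-39) set in exchange package word two."""
    return [name for pos, name in sorted(MODE_BITS.items()) if bit_from_left(word_two, pos)]


word_two = (1 << (63 - 37)) | (1 << (63 - 39))
print(active_modes(word_two))  # ['floating-point error mode', 'monitor mode']
```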

Integer Arithmetic
All integer arithmetic is performed in 24-bit or 64-bit two’s complement form.
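
The text gives only the widths. This Python helper sketches what 24-bit and 64-bit two's complement arithmetic implies by wrapping sums at the chosen width. Treating overflow as silent wraparound is an assumption here, and the function names are hypothetical.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1                                      # keep only the register width
    return value - (1 << bits) if value >> (bits - 1) else value  # negative if the sign bit is set


def add(a: int, b: int, bits: int) -> int:
    """Two's complement addition that wraps at the register width (24 or 64)."""
    return to_signed(a + b, bits)


print(add(0x7FFFFF, 1, 24))   # -8388608: the largest 24-bit value wraps to the most negative
print(add(-1, 1, 24))         # 0
```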

Floating-Point Arithmetic
Floating-point numbers are represented in signed magnitude form. The format is a packed signed binary fraction and a biased binary integer exponent. The fraction is a 49-bit signed magnitude value, and the exponent is 15-bit biased. With the subscript 8 marking octal values, the unbiased exponent range is

$$2^{-20000_8} \text{ to } 2^{+17777_8},$$

or approximately

$$10^{-2500} \text{ to } 10^{+2500}.$$

The floating-point functional units recognize an exponent equal to or greater than $2^{+20000_8}$ as an overflow condition. This condition causes an interrupt if floating-point interrupts are enabled.
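
A Python sketch of decoding this format follows. It rests on several assumptions that the text does not state. It assumes a 64-bit word laid out as a sign bit, a 15-bit exponent, and a 48-bit fraction magnitude (the sign plus 48 bits make up the 49-bit signed magnitude value). It also assumes an exponent bias of 40000 octal and a binary point to the left of the fraction. The function names are hypothetical.

```python
import math

BIAS = 0o40000        # assumed exponent bias (40000 octal)
EXP_MASK = 0o77777    # 15-bit exponent field
FRACTION_BITS = 48    # magnitude bits of the fraction


def decode(word: int) -> float:
    """Decode sign | 15-bit biased exponent | 48-bit fraction into a Python float.

    Assumes the binary point lies to the left of the fraction, so the value is
    (-1)**sign * (fraction / 2**48) * 2**(exponent - BIAS).
    """
    sign = (word >> 63) & 1
    exponent = ((word >> FRACTION_BITS) & EXP_MASK) - BIAS   # unbiased exponent
    fraction = word & ((1 << FRACTION_BITS) - 1)
    value = math.ldexp(fraction, exponent - FRACTION_BITS)   # fraction * 2**(exponent - 48)
    return -value if sign else value


def overflows(word: int) -> bool:
    """True when the unbiased exponent is +20000 octal or more (the overflow condition)."""
    return ((word >> FRACTION_BITS) & EXP_MASK) - BIAS >= 0o20000


print(decode(0x4001800000000000))   # 1.0
print(decode(0xC001800000000000))   # -1.0
print(overflows(0o60000 << 48))     # True
```

`math.ldexp` raises `OverflowError` for results beyond the range of a Python float. That range is far narrower than the exponent range quoted above, so this decoder handles only modest exponents.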
Chaining
The chaining technique takes advantage of the parallel operation of functional units. Parallel vector operations may be processed in two ways:
(a) using different functional units and V registers;
(b) chaining, that is, using the result stream to one vector register simultaneously as the operand set for another operation in a different functional unit.

Parallel operations on vectors allow the generation of two or more results per clock period. A vector operation uses as its sources of operands either two vector registers, or one scalar register and one vector register. Vectors exceeding 64 elements are processed in 64-element segments.
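
Processing a long vector in 64-element segments (often called strip mining) can be sketched as shown below. The Python splits an n-element vector into segments of at most 64 elements. Each segment stands in for one vector operation with VL set to the segment length. Placing the short segment last is an arbitrary choice here, and the function names are hypothetical.

```python
VL_MAX = 64  # elements held by one V register


def segments(n: int, vl_max: int = VL_MAX) -> list[tuple[int, int]]:
    """Split an n-element vector into (start, length) pieces of at most vl_max elements."""
    return [(start, min(vl_max, n - start)) for start in range(0, n, vl_max)]


def vector_add(a: list[float], b: list[float]) -> list[float]:
    """Add two equal-length vectors one segment (one VL-limited operation) at a time."""
    out = [0.0] * len(a)
    for start, vl in segments(len(a)):
        for i in range(start, start + vl):  # stands in for a single vector add of length vl
            out[i] = a[i] + b[i]
    return out


print(segments(200))  # [(0, 64), (64, 64), (128, 64), (192, 8)]
```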

Basically, chaining occurs when results issuing from one functional unit (at a rate of one per clock period) are immediately fed into another functional unit, and so on. In other words, intermediate results do not have to be stored to memory, and they can be used even before the vector operation that created them runs to completion.

Chaining has been compared to the technique of “data forwarding” used in the IBM 360/195. Like data forwarding, chaining takes place automatically. Data forwarding consists of hardware facilities within the 195 floating-point processor that communicate automatically by transferring “name tags,” or internal codes, between themselves [6]. The CRAY-1’s vector registers are accessible to the user, but the 195’s data-forwarding buffers are not. And, of course, the 195 can only forward scalar values, not entire vectors.
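
A back-of-the-envelope Python model shows why chaining helps. It assumes each unit delivers its first result after its functional unit time from Table II (7 clock periods for floating-point multiply, 6 for floating-point add) and then one result per clock period. Without chaining, the add waits for all VL products. With chaining, the add consumes each product as it emerges. The model ignores memory loads, instruction issue, and register transfer delays, so its numbers are illustrative only. The function names are hypothetical.

```python
FP_MULTIPLY = 7  # functional unit times in clock periods (Table II)
FP_ADD = 6


def unchained(vl: int, *latencies: int) -> int:
    """Approximate clock periods when each operation waits for the previous one to finish."""
    return sum(lat + vl for lat in latencies)


def chained(vl: int, *latencies: int) -> int:
    """Approximate clock periods when each unit consumes results as soon as they emerge."""
    return sum(latencies) + vl


vl = 64
print(unchained(vl, FP_MULTIPLY, FP_ADD))  # 141
print(chained(vl, FP_MULTIPLY, FP_ADD))    # 77
```

For a 64-element multiply followed by an add, the model gives 141 clock periods unchained versus 77 chained. The saving grows with each additional chained operation, because the VL term is paid only once.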
Interrupts and Exchange Sequence
Interrupts are handled cleanly by the CRAY-1 hardware. The hardware terminates instruction issue upon detection of an interrupt condition. All memory bank activity is allowed to complete, as are any vector instructions that are in execution, and then an exchange sequence is activated. The Cray Operating System (COS) is always one partner of any exchange sequence. The cause of an interrupt is analyzed during an exchange sequence, and all interrupts are processed until none remain.

Only the address and scalar registers are maintained in a program’s exchange package (Fig. 6). The operating system saves the user’s B, T, and V registers in the user’s Job Table Area.

Fig. 6. Exchange package.
(Diagram of the 16-word exchange package at words n through n+15. Words n through n+7 hold A0 through A7 alongside the control fields listed below, and words n+8 through n+15 hold S0 through S7.)

M - Modes (bit positions):
- 36 Interrupt on correctable memory error
- 37 Interrupt on floating point
- 38 Interrupt on uncorrectable memory error
- 39 Monitor mode

F - Flags (bit positions):
- 31 Console interrupt
- 32 RTC interrupt
- 33 Floating point error
- 34 Operand range
- 35 Program range
- 36 Memory error
- 37 I/O interrupt
- 38 Error exit
- 39 Normal exit

Bit positions are counted from the left of the word.

Registers:
- S Syndrome bits
- RAB Read address for error (where B is bank)
- P Program address
- BA Base address
- LA Limit address
- XA Exchange address
- VL Vector length
