PARALLEL OPERATION IN THE CONTROL DATA 6600
James E. Thornton
Control Data Corporation
Minneapolis, Minnesota
`
HISTORY
About four years ago, in the summer of 1960, Control Data began a project which culminated last month in the delivery of the first 6600 Computer. In 1960 it was apparent that brute force circuit performance and parallel operation were the two main approaches to any advanced computer.

This paper presents some of the considerations having to do with the parallel operations in the 6600. A most important and fortunate event coincided with the beginning of the 6600 project. This was the appearance of the high-speed silicon transistor, which survived early difficulties to become the basis for a nice jump in circuit performance.
`
SYSTEM ORGANIZATION
The computing system envisioned in that project, and now called the 6600, paid special attention to two kinds of use: the very large scientific problem and the time sharing of smaller problems. For the large problem, a high-speed floating point central processor with access to a large central memory was obvious. Not so obvious, but important to the 6600 system idea, was the isolation of this central arithmetic from any peripheral activity.
`
Figure 1. Control Data 6600. (Block diagram: ten peripheral and control processors, each with a 4096-word core memory, grouped around the 6600 central memory and the 6600 central processor.)
`
`PROCEEDINGS—SPRING JOINT COMPUTER CONFERENCE, 1964
`
It was from this general line of reasoning that the idea of a multiplicity of peripheral processors was formed (Fig. 1). Ten such peripheral processors have access to the central memory on one side and the peripheral channels on the other. The executive control of the system is always in one of these peripheral processors, with the others operating on assigned peripheral or control tasks. All ten processors have access to twelve input-output channels and may "change hands," monitor channel activity, and perform other related jobs. These processors have access to central memory, and may pursue independent transfers to and from this memory.

Each of the ten peripheral processors contains its own memory for program and buffer areas, thereby isolating and protecting the more critical system control operations in the separate processors. The central processor operates from the central memory with relocating register and file protection for each program in central memory.
`
PERIPHERAL AND CONTROL PROCESSORS
The peripheral and control processors are housed in one chassis of the main frame. Each processor contains 4096 memory words of 12 bits length. There are 12- and 24-bit instruction formats to provide for direct, indirect, and relative addressing. Instructions provide logical operations, addition, subtraction, shifting, and conditional branching. Instructions also provide single word or block transfers to and from any of twelve peripheral channels, and single word or block transfers to and from central memory. Central memory words of 60 bits length are assembled from five consecutive peripheral words. Each processor has instructions to interrupt the central processor and to monitor the central program address.
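
The assembly of one 60-bit central memory word from five consecutive 12-bit peripheral words can be illustrated with a short Python sketch. It is not taken from the 6600 design: the names `assemble` and `disassemble` are hypothetical, and the ordering (first peripheral word in the most significant 12 bits) is an assumption the text does not state.

```python
WORD_BITS = 12            # peripheral word length (from the text)
WORDS_PER_CENTRAL = 5     # five peripheral words per central word
MASK12 = (1 << WORD_BITS) - 1


def assemble(peripheral_words):
    """Pack five 12-bit words into one 60-bit integer.

    Assumption: the first word occupies the most significant 12 bits.
    """
    if len(peripheral_words) != WORDS_PER_CENTRAL:
        raise ValueError("expected exactly five peripheral words")
    central = 0
    for word in peripheral_words:
        if not 0 <= word <= MASK12:
            raise ValueError("peripheral word out of 12-bit range")
        central = (central << WORD_BITS) | word
    return central


def disassemble(central):
    """Split a 60-bit integer into five 12-bit words, in the same order."""
    if not 0 <= central < (1 << (WORD_BITS * WORDS_PER_CENTRAL)):
        raise ValueError("central word out of 60-bit range")
    return [(central >> (WORD_BITS * n)) & MASK12
            for n in reversed(range(WORDS_PER_CENTRAL))]
```

Since 12 bits correspond to four octal digits, a central word written in octal is simply the five peripheral words written side by side.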
To get this much processing power with reasonable economy and space, a time-sharing design was adopted (Fig. 2). This design contains a register "barrel" around which moves the dynamic information for all ten processors. Such things as program address, accumulator contents, and other pieces of information totalling 52 bits are shifted around the barrel. Each complete trip around requires one major cycle or one thousand nanoseconds. A "slot" in the barrel contains adders, assembly networks, distribution network, and interconnections to perform one step of any peripheral instruction. The time to perform this step or, in other words, the time through the slot, is one minor cycle or one hundred nanoseconds. Each of the ten processors, therefore, is allowed one minor cycle of every ten to perform one of its steps. A peripheral instruction may require one or more of these steps, depending on the kind of instruction.

Figure 2. 6600 Peripheral and Control Processors. (Diagram: processor registers, time-shared instruction control, and processor memories; a read pyramid and a write pyramid with 12-, 24-, 36-, 48-, and 60-bit stages connecting to the 60-bit central memory; I/O channels, real time clock, and external equipment.)
In effect, the single arithmetic and the single distribution and assembly network are made to appear as ten. Only the memories are kept truly independent. Incidentally, the memory read-write cycle time is equal to one complete trip around the barrel, or one thousand nanoseconds.
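
The barrel timing described above can be sketched as a round-robin schedule. This is only an illustration of the arithmetic: the function names are hypothetical, and the sketch assumes that an instruction advances by exactly one step on each trip around the barrel.

```python
MINOR_CYCLE_NS = 100      # time through the slot (from the text)
PROCESSORS = 10           # peripheral processors sharing the slot
MAJOR_CYCLE_NS = MINOR_CYCLE_NS * PROCESSORS   # one trip around the barrel


def slot_schedule(minor_cycles):
    """List (start_time_ns, processor) for each minor cycle.

    Processors are numbered 0..9 here; that numbering is an assumption.
    """
    return [(t * MINOR_CYCLE_NS, t % PROCESSORS) for t in range(minor_cycles)]


def instruction_time_ns(steps):
    """Major-cycle time consumed by an instruction of `steps` steps,
    assuming one step per trip around the barrel."""
    if steps < 1:
        raise ValueError("an instruction needs at least one step")
    return steps * MAJOR_CYCLE_NS
```

In this model each processor sees the slot once every 1000 nanoseconds, which matches the memory read-write cycle time quoted in the text.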
Input-output channels are bi-directional, 12-bit paths. One 12-bit word may move in one direction every major cycle, or 1000 nanoseconds, on each channel. Therefore, a maximum burst rate of 120 million bits per second is possible using all ten peripheral processors. A sustained rate of about 50 million bits per second can be maintained in a practical operating system. Each channel may service several peripheral devices and may interface to other systems, such as satellite computers.
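
The burst rate quoted above follows from the channel width and the major cycle. The short Python sketch below reproduces the arithmetic; the function names are hypothetical, and it assumes that each of the ten processors drives a separate channel at the same time.

```python
BITS_PER_WORD = 12            # channel width (from the text)
MAJOR_CYCLE_NS = 1000         # one word per channel per major cycle
PERIPHERAL_PROCESSORS = 10
NS_PER_SECOND = 1_000_000_000


def channel_rate_bps():
    """Bits per second on one channel moving one word per major cycle."""
    return BITS_PER_WORD * NS_PER_SECOND // MAJOR_CYCLE_NS


def burst_rate_bps(active_processors=PERIPHERAL_PROCESSORS):
    """Aggregate rate, assuming one busy channel per active processor."""
    return active_processors * channel_rate_bps()


print(channel_rate_bps())   # 12000000
print(burst_rate_bps())     # 120000000
```

Integer arithmetic is used so that the result is exact; the sustained figure of about 50 million bits per second is an observed value in the text and is not derived here.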
Peripheral and control processors access central memory through an assembly network and a dis-assembly network. Since five peripheral memory references are required to make up one central memory word, a natural assembly network of five levels is used. This allows five references to be "nested" in each network during any major cycle. The central memory is organized in independent banks with the ability to transfer central words every minor cycle. The peripheral processors, therefore, introduce at most about 2% interference at the central memory address control.

A single real time clock, continuously running, is available to all peripheral processors.
`
CENTRAL PROCESSOR

The 6600 central processor may be considered the high-speed arithmetic unit of the system (Fig. 3). Its program, operands, and results are held in the central memory. It has no connection to the peripheral processors except through memory and except for two single controls. These are the exchange jump, which starts or interrupts the central processor from a peripheral processor, and the central program address, which can be monitored by a peripheral processor.

Figure 3. Block Diagram of 6600. (Diagram: ten peripheral and control processors with 12 input-output channels; central memory with upper and lower boundary; 24 operating registers; functional units labeled add, multiply, multiply, divide, long add, shift, boolean, increment, increment, and branch.)

A key description of the 6600 central processor, as you will see in later discussion, is "parallel by function." This means that a number of arithmetic functions may be performed concurrently. To this end, there are ten functional units within the central processor. These are the two increment units, floating add unit, fixed add unit, shift unit, two multiply units, divide unit, boolean unit, and branch unit. In a general way, each of these units is a three address unit. As an example, the floating add unit obtains two 60-bit operands from the central registers and produces a 60-bit result which is returned to a register.

Information to and from these units is held in the central registers, of which there are twenty-four. Eight of these are considered index registers, are 18 bits in length, and include one which always contains zero. Eight are considered address registers, are 18 bits in length, and serve to address the five read central memory trunks and the two store central memory trunks. Eight are considered floating point registers, are 60 bits in length, and are the only central registers to access central memory during a central program.
In a sense, just as the whole central processor is hidden behind central memory from the peripheral processors, so, too, the ten functional units are hidden behind the central registers from central memory. As a consequence, a considerable instruction efficiency is obtained and an interesting form of concurrency is feasible and practical. The fact that a small number of bits can give meaningful definition to any function makes it possible to develop forms of operand and unit reservations needed for a general scheme of concurrent arithmetic.

Instructions are organized in two formats, a 15-bit format and a 30-bit format, and may be mixed in an instruction word (Fig. 4).

Field   f       m       i          j            k
Bits    3       3       3          3            3         (15 bits)
Use     operation code  result     1st operand  2nd operand
                        reg.       reg.         reg.
                        (1 of 8)   (1 of 8)     (1 of 8)

Figure 4. 15-Bit Instruction Format.

As an example, a 15-bit instruction may call for an ADD, designated by the f and m octal digits, from registers designated by the j and k octal digits, the result going to the register designated by the i octal digit. In this example, the addresses of the three-address, floating add unit are only three bits in length, each address referring to one of the eight floating point registers. The 30-bit format follows this same form but substitutes for the k octal digit an 18-bit constant K which serves as one of the input operands. These two formats provide a highly efficient control of concurrent operations.
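
The two formats can be decoded with simple shifts and masks, as in the sketch below. The names `Instruction`, `decode15`, and `decode30` are hypothetical. The sketch assumes the fields sit in the order f, m, i, j, k from most significant to least significant, as Fig. 4 reads from left to right. How the hardware tells a 15-bit instruction from a 30-bit one is not described in the text, so the caller chooses the decoder.

```python
from typing import NamedTuple, Optional


class Instruction(NamedTuple):
    f: int                # first octal digit of the operation code
    m: int                # second octal digit of the operation code
    i: int                # result register designator
    j: int                # first operand register designator
    k: Optional[int]      # second operand register (15-bit format only)
    K: Optional[int]      # 18-bit constant (30-bit format only)


def decode15(word: int) -> Instruction:
    """Split a 15-bit instruction into five 3-bit octal digits."""
    if not 0 <= word < (1 << 15):
        raise ValueError("not a 15-bit value")
    f, m, i, j, k = ((word >> shift) & 0o7 for shift in (12, 9, 6, 3, 0))
    return Instruction(f, m, i, j, k, None)


def decode30(word: int) -> Instruction:
    """Split a 30-bit instruction into four octal digits and constant K."""
    if not 0 <= word < (1 << 30):
        raise ValueError("not a 30-bit value")
    f, m, i, j = ((word >> shift) & 0o7 for shift in (27, 24, 21, 18))
    return Instruction(f, m, i, j, None, word & 0o777777)
```

For instance, `decode15(0o30123)` yields f=3, m=0, i=1, j=2, k=3; the operation code digits in this example are arbitrary and are not claimed to match any actual 6600 operation.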
As a background, consider the essential difference between a general purpose device and a special device in which high speeds are required. The designer of the special device can generally improve on the traditional general purpose device by introducing some form of concurrency. For example, some activities of a housekeeping nature may be performed separately from the main sequence of operations in separate hardware. The total time to complete a job is then optimized to the main sequence and excludes the housekeeping. The two categories operate concurrently.

It would be, of course, most attractive to provide in a general purpose device some generalized scheme to do the same kind of thing. The organization of the 6600 central processor provides just this kind of scheme. With a multiplicity of functional units and of operand registers, and with a simple and highly efficient addressing system, a generalized queue and reservation scheme is practical. This is called the scoreboard.
The scoreboard maintains a running file of each central register, of each functional unit, and of each of the three operand trunks to and from each unit. Typically, the scoreboard file is made up of two-, three-, and four-bit quantities identifying the nature of register and unit usage. As each new instruction is brought up, the conditions at the instant of issuance are set into the scoreboard. A snapshot is taken, so to speak, of the pertinent conditions. If no waiting is required, the execution of the instruction is begun immediately under control of the unit itself. If waiting is required (for example, an input operand may not yet be available in the central registers), the scoreboard controls the delay, and when released, allows the unit to begin its execution. Most important, this activity is accomplished in the scoreboard and the functional unit, and does not necessarily limit later instructions from being brought up and issued.
`
In this manner, it is possible to issue a series of instructions, some related, some not, until no functional units are left free or until a specific register is to be assigned more than one result. With just those two restrictions on issuing (unit free and no double result), several independent chains of instructions may proceed concurrently. Instructions may issue every minor cycle in the absence of the two restraints. The instruction executions, in comparison, range from three minor cycles for fixed add, through 10 minor cycles for floating multiply, to 29 minor cycles for floating divide.
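
The two issue restrictions (unit free and no double result) can be modeled in a few lines. This is a deliberately reduced sketch, not the scoreboard itself: the class and method names are hypothetical, the unit counts are taken from the list of ten functional units given earlier, and the waiting for input operands that the scoreboard handles after issue is not modeled.

```python
# Unit counts follow the text: ten functional units in all.
UNIT_COUNTS = {
    "increment": 2, "floating add": 1, "fixed add": 1, "shift": 1,
    "multiply": 2, "divide": 1, "boolean": 1, "branch": 1,
}


class Scoreboard:
    """Tracks only what the two issue restrictions need."""

    def __init__(self, unit_counts):
        self.free_units = dict(unit_counts)   # unit kind -> units still free
        self.pending_results = set()          # registers awaiting a result

    def can_issue(self, unit, dest):
        unit_free = self.free_units.get(unit, 0) > 0
        no_double_result = dest not in self.pending_results
        return unit_free and no_double_result

    def issue(self, unit, dest):
        """Reserve a unit and a result register; False means issue stalls."""
        if not self.can_issue(unit, dest):
            return False
        self.free_units[unit] -= 1
        self.pending_results.add(dest)
        return True

    def complete(self, unit, dest):
        """Release the unit and the result register when execution ends."""
        self.free_units[unit] += 1
        self.pending_results.discard(dest)
```

With this model, two multiplies to different registers issue back to back, a third multiply stalls until a multiply unit is released, and an add whose result register is still awaiting a multiply result stalls on the double-result rule.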
To provide a relatively continuous source of instructions, one buffer register of 60 bits is located at the bottom of an instruction stack capable of holding 32 instructions (Fig. 5). Instruction words from memory enter the bottom register of the stack, pushing up the old instruction words. In straight line programs, only the bottom two registers are in use, the bottom being refilled as quickly as memory conflicts allow. In programs which branch back to an instruction in the upper stack registers, no refills are allowed after the branch, thereby holding the program loop completely in the stack. As a result, memory access or memory conflicts are no longer involved, and a considerable speed increase can be had.
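
The benefit of holding a loop in the stack can be seen in a rough model. The sketch below is an assumption-heavy simplification: it treats the stack as an eight-word first-in, first-out store searched by address, whereas the text describes a mechanism in which refills stop after a backward branch. The names `InstructionStack` and `run_loop` are hypothetical.

```python
from collections import deque

STACK_WORDS = 8                       # eight 60-bit words (Fig. 5)
INSTRUCTIONS_PER_WORD = 60 // 15      # four 15-bit instructions per word


class InstructionStack:
    def __init__(self, depth=STACK_WORDS):
        # When full, appending a new word discards the oldest one.
        self.words = deque(maxlen=depth)
        self.memory_fetches = 0

    def fetch(self, address, memory):
        """Return the instruction word, going to memory only on a miss."""
        for held_address, word in self.words:
            if held_address == address:
                return word
        word = memory[address]
        self.memory_fetches += 1
        self.words.append((address, word))
        return word


def run_loop(stack, memory, start, length, iterations):
    """Execute a loop of `length` words and count memory fetches."""
    for _ in range(iterations):
        for address in range(start, start + length):
            stack.fetch(address, memory)
    return stack.memory_fetches
```

In this model a five-word loop run 100 times touches memory only five times, while a loop longer than the stack goes to memory for every word on every pass. The capacity check `STACK_WORDS * INSTRUCTIONS_PER_WORD` gives the 32 instructions quoted in the text.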
Five memory trunks are provided from memory into the central processor to five of the floating point registers (Fig. 6). One address register is assigned to each trunk (and therefore to the floating point register). Any instruction calling for an address register result implicitly initiates a memory reference on that trunk. These instructions are handled through the scoreboard and therefore tend to overlap memory access with arithmetic. For example, a new memory word to be loaded in a floating point register can be brought in from memory but may not enter the register until all previous uses of that register are completed. The central registers, therefore, provide all of the data to the ten functional units, and receive all of the unit results. No storage is maintained in any unit.
`
Central memory is organized in 32 banks of 4096 words. Consecutive addresses call for a different bank; therefore, adjacent addresses in one bank are in reality separated by 32. Addresses may be issued every 100 nanoseconds. A typical central memory information transfer rate is about 250 million bits per second.
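
The bank interleaving can be sketched as follows. The mapping (low-order part of the address selects the bank) is an assumption, chosen because it is consistent with the statement that consecutive addresses fall in different banks and that neighbors within a bank are 32 apart. The function names are hypothetical. The peak rate computed here assumes one 60-bit word moves every minor cycle and is an upper bound, not the typical rate quoted in the text.

```python
BANKS = 32
WORDS_PER_BANK = 4096
WORD_BITS = 60
MINOR_CYCLE_NS = 100
TOTAL_WORDS = BANKS * WORDS_PER_BANK      # 131072 words


def _check(address):
    if not 0 <= address < TOTAL_WORDS:
        raise ValueError("address outside central memory")


def bank_of(address):
    """Bank selected by an address (assumed: address modulo 32)."""
    _check(address)
    return address % BANKS


def offset_in_bank(address):
    """Word position inside the selected bank."""
    _check(address)
    return address // BANKS


def peak_bits_per_second():
    """Upper bound: one 60-bit word every 100 nanoseconds."""
    return WORD_BITS * 1_000_000_000 // MINOR_CYCLE_NS
```

Under these assumptions any 32 consecutive addresses touch 32 different banks, and the peak of 600 million bits per second sits well above the typical 250 million bits per second given in the text.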
As mentioned before, the functional units are hidden behind the registers. Although the units might appear to increase hardware duplication, a pleasant fact emerges from this design. Each unit may be trimmed to perform its function without regard to others. Speed increases are had from this simplified design.

Figure 5. 6600 Instruction Stack Operation. (Diagram: instruction stack of eight 60-bit words, instruction registers, and buffer register.)

Figure 6. Central Processor Operating Registers. (Diagram: 60-bit operand registers X0 through X7, 18-bit address registers, central memory, the 10 functional units, the instruction registers, and the instruction stack of up to 8 60-bit words, linked by operand, result, and instruction paths.)

As an example of special functional unit design, the floating multiply accomplishes the coefficient multiplication in nine minor cycles plus one minor cycle to put away the result, for a total of 10 minor cycles, or 1000 nanoseconds. The multiply uses layers of carry save adders grouped in two halves. Each half concurrently forms a partial product, and the two partial products finally merge while the long carries propagate. Although this is a fairly large complex of circuits, the resulting device was sufficiently smaller than originally planned to allow two multiply units to be included in the final design.
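
The idea of forming two partial products with carry save adders and merging them at the end can be illustrated in software. The sketch below shows the general carry save technique only; it does not reproduce the 6600 circuit. The function names are hypothetical, and the default width of 48 bits is an assumed coefficient length that the text does not give.

```python
def carry_save_add(a, b, c):
    """3-to-2 reduction: returns (s, carry) with a + b + c == s + carry.

    No carry propagates across bit positions in this step.
    """
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def partial_product(multiplicand, multiplier, bits):
    """Accumulate shifted multiplicands for the chosen multiplier bits,
    keeping the running total in carry save form (sum word, carry word)."""
    s, c = 0, 0
    for bit in bits:
        if (multiplier >> bit) & 1:
            s, c = carry_save_add(s, c, multiplicand << bit)
    return s, c


def multiply(a, b, width=48):
    """Multiply two non-negative integers below 2**width using two halves."""
    if not (0 <= a < (1 << width) and 0 <= b < (1 << width)):
        raise ValueError("operands must fit in the given width")
    half = width // 2
    low_s, low_c = partial_product(a, b, range(0, half))
    high_s, high_c = partial_product(a, b, range(half, width))
    # Merge the two halves, still without propagating carries.
    s, c = carry_save_add(low_s, low_c, high_s)
    s, c = carry_save_add(s, c, high_c)
    # One final carry-propagating addition produces the product.
    return s + c
```

In hardware the two halves would work at the same time; here they run one after the other, so the sketch shows the arithmetic and not the timing.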
To sum up the characteristics of the central processor, remember that the broad-brush description is "concurrent operation." In other words, any program operating within the central processor utilizes some of the available concurrency. The program need not be written in a particular way, although certainly some optimization can be done. The specific method of accomplishing this concurrency involves issuing as many instructions as possible while handling most of the conflicts during execution. Some of the essential requirements for such a scheme include:

1. Many functional units
2. Units with three address properties
3. Many transient registers with many trunks to and from the units
4. A simple and efficient instruction set
`
CONSTRUCTION
Circuits in the 6600 computing system use all-transistor logic (Fig. 7). The silicon transistor operates in saturation when switched "on" and averages about five nanoseconds of stage delay. Logic circuits are constructed in a cordwood plug-in module of about 2 1/2 inches by 2 1/2 inches by 0.8 inch. An average of about 50 transistors are contained in these modules.

Figure 7. 6600 Printed Circuit Module.

Memory circuits are constructed in a plug-in module of about six inches by six inches by 2 1/2 inches (Fig. 8). Each memory module contains a coincident current memory of 4096 12-bit words. All read-write drive circuits and bit drive circuits plus address translation are contained in the module. One such module is used for each peripheral processor, and five modules make up one bank of central memory.

Figure 8. 6600 Memory Module.

Logic modules and memory modules are held in upright hinged chassis in an X shaped cabinet (Fig. 9). Interconnections between modules on the chassis are made with twisted pair transmission lines. Interconnections between chassis are made with coaxial cables.
Both maintenance and operation are accomplished at a programmed display console (Fig. 10). More than one of these consoles may be included in a system if desired. Dead start facilities bring the ten peripheral processors to a condition which allows information to enter from any chosen peripheral device. Such loads normally bring in an operating system which provides a highly sophisticated capability for multiple users, maintenance, and so on.
The 6600 Computer has taken advantage of certain technology advances, but more particularly, logic organization advances which now appear to be quite successful. Control Data is exploring advances in technology upward within the same compatible structure, and identical technology downward, also within the same compatible structure.
`
Figure 9. 6600 Main Frame Section.

Figure 10. 6600 Display Console.