`
MASA: A Multithreaded Processor Architecture for Parallel Symbolic Computing

Robert H. Halstead, Jr.
M.I.T. Laboratory for Computer Science
545 Technology Square
Cambridge, Mass. 02139, U.S.A.

Tetsuya Fujita
NEC Corporation
1-10, Nisshincho
Fuchu City, Tokyo, 183, Japan
`
Abstract

MASA is a "first cut" at a processor architecture intended as a building block for a multiprocessor that can execute parallel Lisp programs efficiently. MASA features a tagged architecture, multiple contexts, fast trap handling, and a synchronization bit in every memory word. MASA's principal novelty is its use of multiple contexts both to support multithreaded execution (interleaved execution from separate instruction streams) and to speed up procedure calls and trap handling in the same manner as register windows. A project is under way to evaluate MASA-like architectures for executing programs written in Multilisp.
`
`1.
`
`Introduction
`
MASA (Multilisp Architecture for Symbolic Applications) is a "first cut" at a "communication-oriented" processor architecture for efficiently executing parallel Lisp (especially Multilisp [8,9] or QLISP [6]) programs. The system structure envisioned for MASA consists of a collection of dozens to hundreds of nodes, each including a MASA processor and some memory (including caches), interconnected by a high-speed communication net. Individual memory modules are physically associated with particular nodes, but the system has one unified address space (in line with the "shared memory" philosophy of Multilisp and QLISP). This structure resembles that of current systems such as the IBM RP3 [21] and the BBN Butterfly [3]. Although the design of the communication net and caching strategy are important subjects, this paper focuses on MASA's processor architecture.
`
MASA aims to speed up several parallel Lisp operations, notably generic operations on tagged data, task spawning, synchronization (in particular, using futures [1,8,9]), and garbage collection, which are expensive on "mainstream" processor architectures. The "communication-oriented" character of MASA comes from its use of multiple contexts to support multithreaded execution, in which each processor can have several active tasks loaded at once and can switch from one to another on every clock cycle. Multithreading offers a way to improve MASA's tolerance of communication and synchronization delays by allowing other tasks to run while some are blocked, even if only briefly. Although this has been done before (on the HEP-1 [16,22]), MASA's principal design contribution is its scheme for graceful management of and interaction between these contexts. In MASA, one resource, a bank of register sets, is shared flexibly among various uses, supporting multithreading but also used in the style of register windows [15,20,24] to reduce the average cost of context saving and restoring for procedure calls and traps.

This research was supported in part by NEC Corp. (by supporting the second author for a year at M.I.T.) and in part by the Defense Advanced Research Projects Agency, monitored by the Office of Naval Research under contract numbers N00014-83-K-0125 and N00014-84-K-0099.
Since MASA is a "first cut," many other details are still unresolved. We have focused on the choice of operations implemented, their allocation among pipeline stages, and the design of novel mechanisms such as those for interaction between tasks. Other issues, such as the packing of bits into instruction words, the size of tag and data fields, and the number and size of register sets, involve low-level optimizations that interact weakly with the major design issues. They are best settled on the basis of empirical experience that has not yet accumulated, so they are not yet fixed.
In the following sections, we first give a broad overview of MASA's architectural goals and mechanisms. Then we discuss in greater detail MASA's context management mechanisms, including their application to multithreading, procedure linkage, and trap handling. Next, we give an overview of a possible pipelined implementation of MASA. We close by discussing the performance impact of MASA's design decisions.
2. Architectural Overview

MASA employs a simple, register-based, load/store architecture augmented with hardware mechanisms for tagged data manipulation, garbage collection, synchronization, and multithreaded execution using multiple contexts. Essentially all details of MASA's architecture are direct consequences of the style of these mechanisms. Space constraints require this paper to omit many details, which can be found in [4].
2.1 Manipulating Tagged Data

As in the Berkeley SPUR [15,24], data values manipulated by MASA are tagged pointers containing three fields, shown in Figure 1: a data field, a type tag field, and a generation tag field. MASA handles simple cases of generic operations (such as adding small integers) and frequently occurring checks (such as tag comparisons) directly in hardware. For complex cases that rarely occur but must be checked for frequently (such as adding rational numbers), MASA relies on a software trap handler to perform the operation. This is essentially SPUR's approach, but MASA devotes more attention than SPUR to streamlining trap handling, so traps can be used heavily.
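As a concrete illustration of this format, the C sketch below models a tagged pointer and the simple-case check an ADD would make in hardware; the field widths and type codes are arbitrary assumptions of ours, since the paper leaves them unspecified.

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical field widths; the paper leaves these unspecified. */
    typedef struct {
        uint32_t data : 24;   /* data field: immediate value or address */
        uint32_t type : 6;    /* type tag field (e.g., FIXNUM, CONS, FUTURE) */
        uint32_t gen  : 2;    /* generation tag field */
    } tagged_ptr;

    /* Placeholder type codes, for illustration only. */
    enum { TYPE_FIXNUM = 0, TYPE_CONS = 1, TYPE_NIL = 2, TYPE_FUTURE = 3 };

    /* The check an ADD makes in the simple case: both operands must be
       FIXNUMs; anything else is left to a software trap handler. */
    static int add_needs_trap(tagged_ptr a, tagged_ptr b)
    {
        return a.type != TYPE_FIXNUM || b.type != TYPE_FIXNUM;
    }

    int main(void)
    {
        tagged_ptr x = { .data = 3, .type = TYPE_FIXNUM, .gen = 0 };
        tagged_ptr y = { .data = 4, .type = TYPE_FUTURE, .gen = 1 };
        printf("trap needed: %d\n", add_needs_trap(x, y));   /* prints 1 */
        return 0;
    }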
2.2 Garbage Collection

MASA supports parallel, generation-based, incremental garbage collection. Generation-based garbage collection reduces garbage collection costs in large address spaces [17]. Incremental garbage collection is desirable both to avoid long pauses for garbage collection and to help in load balancing on a multiprocessor [13].

MASA, like SPUR, supports generation-based garbage collection using generation tags. A tagged pointer's generation tag indicates the generation of the storage area containing the object pointed to. As objects survive garbage collections, they are moved to areas of storage with successively smaller generation numbers.
`
`
`
`
Figure 1: A MASA tagged pointer (data field, type tag field, generation tag field).

Figure 2: Memory word format (a tagged pointer plus an old/new bit and an empty/full bit).

Figure 3: Representation of a future (a FUTURE-tagged pointer referencing a value cell and a task queue cell).
`
Store instructions normally cause a generation trap if the generation tag of a value being stored is greater than the generation tag of the pointer to the location where it is being stored.

To support incremental garbage collection, MASA associates an old/new bit with each memory location and erects a barrier [17] that checks all pointers being loaded into the processor, causing a transport trap whenever an attempt is made to load an "old" pointer. Generation tags and old/new bits duplicate information that can be computed from a pointer's address, but by avoiding this computation, they expedite loads and stores at the expense of adding a few bits to every memory word.
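The two checks can be summarized by the following illustrative C sketch; it models only the trap conditions, not the hardware that detects them, and the structure layouts are assumptions of ours.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Tagged pointer and memory word, reduced to the fields these checks use. */
    typedef struct { uint32_t data, type, gen; } tagged_ptr;
    typedef struct { tagged_ptr word; bool old_bit; } mem_word;

    /* Generation trap check made by store instructions (when enabled): trap if
       the value being stored carries a larger (younger) generation tag than
       the pointer to the location being stored into. */
    static bool store_generation_trap(tagged_ptr dest_ptr, tagged_ptr value)
    {
        return value.gen > dest_ptr.gen;
    }

    /* Transport trap check made by load instructions (when enabled): trap
       whenever an "old" pointer is about to be loaded into the processor, so
       the collector's barrier can transport the object first. */
    static bool load_transport_trap(const mem_word *w)
    {
        return w->old_bit;
    }

    int main(void)
    {
        tagged_ptr area  = { 0x1000, 0, 1 };  /* pointer into a generation-1 area */
        tagged_ptr young = { 0x2000, 0, 2 };  /* value from a younger generation */
        mem_word old_word = { young, true };
        printf("%d %d\n", store_generation_trap(area, young),
               load_transport_trap(&old_word));   /* prints 1 1 */
        return 0;
    }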
`
2.3 Synchronization

MASA is designed to support two kinds of synchronization constructs: futures [1,8,9] and locks on individual memory locations. Locks facilitate simple atomic updates. Futures are values that serve as placeholders. A future is initially unresolved, and later resolves to a value supplied by some computation. A future can be copied without waiting for it to resolve. Strict operations like comparisons and arithmetic cannot operate on an unresolved future: they need the value to which it resolves. Strict operations touch their operands; touching an unresolved future causes the touching task to be suspended.

MASA supports futures and locks by associating a full/empty bit with each memory location. A "full" location contains valid data and may be accessed. Setting the bit to "empty" effectively locks the location. Load instructions normally trap if asked to read from empty locations. Full/empty bits were used on the HEP-1 processor [16,22], though attempts to read an empty location on the HEP-1 busy-waited rather than trapping.

Figure 2 shows the MASA memory word format, which includes space for a tagged pointer, a full/empty bit, and an old/new bit. A future can be represented by a tagged pointer of type FUTURE, as indicated by Figure 3. The pointer points to a pair of memory locations. The first location (the value cell) contains the future's value when the future is resolved; otherwise this location is empty. When a strict operation has an operand of type FUTURE, the contents of the future's value cell should be used as the operand instead. If the value cell is empty, the task touching the future must be suspended until the future resolves. While the future is unresolved, the second location points to a list of such suspended tasks. Additional locations could be added to hold other desired information.
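A software model of this representation might look like the C sketch below; the structure layout and helper names are ours, not MASA's, and the suspension path is reduced to a status code rather than an actual task switch.

    #include <stdbool.h>
    #include <stdio.h>

    /* Minimal task descriptor, enough to thread a waiting list. */
    typedef struct task task;
    struct task { struct task *next; };

    /* Model of the two-word future object of Figure 3. */
    typedef struct {
        long value;        /* value cell contents (valid only when full) */
        bool value_full;   /* full/empty bit of the value cell */
        task *waiters;     /* while unresolved: list of suspended tasks */
    } future;

    typedef enum { TOUCH_OK, TOUCH_SUSPEND } touch_result;

    /* Strict operations "touch" their operands: use the value if resolved,
       otherwise the touching task must be suspended on the waiters list. */
    static touch_result touch(future *f, task *t, long *out)
    {
        if (f->value_full) { *out = f->value; return TOUCH_OK; }
        t->next = f->waiters;          /* enqueue the touching task */
        f->waiters = t;
        return TOUCH_SUSPEND;
    }

    /* Resolving the future fills the value cell; re-activating the waiting
       tasks is omitted here. */
    static void resolve(future *f, long v) { f->value = v; f->value_full = true; }

    int main(void)
    {
        future f = { 0, false, NULL };
        task t = { NULL };
        long v;
        printf("%d\n", touch(&f, &t, &v));         /* 1: unresolved, suspend */
        resolve(&f, 42);
        printf("%d %ld\n", touch(&f, &t, &v), v);  /* 0 42 */
        return 0;
    }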
`
2.4 Multiple Contexts

MASA has N_f task frames; a separate task can be pre-loaded into each. Each task frame has a set of auxiliary registers (such as a program counter) and a set of N_r general registers. N_f and N_r are not yet decided, but we expect each to be either 8 or 16. Each general register holds a full tagged pointer (Figure 1), but the full/empty and old/new bits are not copied into registers when a word is loaded from memory.

Conventional processors face unattractive alternatives during synchronization or remote memory access delays (which can take tens of processor clock cycles or more to complete): either idle until execution can resume, or pay the cost (usually at least dozens of instructions) of a context switch to a new task. In MASA, any loaded task that is not suspended is eligible to issue the next instruction; instructions from different tasks can be issued on consecutive clock cycles. This fast context switching mechanism allows the MASA processor to perform useful work on other tasks when one task is blocked even for a period of only a few clock cycles.
`
MASA's multiple register sets also expedite procedure calling, process creation, and trap handling, all common in the parallel Lisp programs that interest us [19]. These operations all allocate a new context (task frame) and initiate execution in that context. They differ only in the mechanism for communicating values between the newly created context and its activator, and in the synchronization (or lack thereof) between the two. Thus MASA's non-overlapping multiple register sets, introduced for fast context switching, also support procedures and traps in a manner analogous to register windows [15,20,24].
`
To represent intertask relationships, each task frame's auxiliary registers include a parent and a child task register to hold the task frame numbers of the task's parent and child, or an indication that there is none. In a procedure call, the caller's child frame is the callee and the callee's parent frame is the caller. If a trap occurs, the trap handler's parent frame is that of the task that trapped.
`
3. Context Management in MASA
`
3.1 Trap Conditions

An instruction is generally subject to several trap conditions, which may cause execution of the instruction to trap. Table 1 lists all of MASA's trap conditions, along with their mnemonic abbreviations, in decreasing order of priority (thus, if both the FT and FX conditions are met, the FT trap will occur since it appears earlier in the table). Some trap conditions (e.g., arithmetic overflow) may be enabled or disabled in an instruction's operation code; others are always enabled or disabled for particular families of instructions. Variations from the standard set of enabled trap conditions are noted in MASA assembly language syntax by clauses of the form "+cond" or "-cond" following an instruction's operation code, where the mnemonic cond names a trap condition that is enabled or disabled for that instruction.

Table 1: MASA's trap conditions.
`
Table 1 classifies traps according to the phase of instruction execution where they are detected:

• Instruction decode can signal the Illegal Instruction trap.

• Tag checking can uncover a discrepancy during an instruction's execution or address calculation phase. When enabled, the Future Touch trap will occur if either of the instruction's operands is of type FUTURE. Arithmetic and logical instructions normally require operands of type FIXNUM: if an operand's type is neither FIXNUM nor FUTURE, the Not FIXNUM trap will occur. Load and store instructions can require the address used to be a pointer of type CONS (or else signal the Not CONS trap during address calculation), or to be one of CONS or NIL (to support the Common Lisp car and cdr operations [23]). Other tag checking traps are listed in Table 1.

Having many tag checking traps streamlines trap handling by saving trap handlers from recomputing information that the hardware can detect at the time a trap is signalled. MASA could have just a single "tag mismatch" trap and let its trap handler analyze the problem, but since MASA's hardware distinguishes important special cases such as touching a future, handlers for these cases need not take time to analyze the situation before going to work. As a further piece of streamlining, the Future Touch trap has two vectors, according to whether the first or the second operand was found to be of type FUTURE.

• Software trap. A program can explicitly cause a trap using the TRAP instruction. A useful variant of TRAP is TRTC, which traps if its first operand's type tag field is not equal to its second operand's data field.

• Arithmetic computation can signal the Arithmetic Overflow trap.

• Memory access in a load or store instruction can signal an Empty Location, Full Location, or Transport trap (if enabled) if the accessed location's full/empty or old/new bit is not in the desired state.

• Task frame management can cause the Task Frame Unavailable or Return to Saved Task traps discussed in Section 3.5.
`
3.2 Register Specifiers

A register in the MASA processor is uniquely identified by an absolute register specifier (t, r) where t is the number of a task frame and r names a register within the task frame (either a general register such as R1 or an auxiliary register such as PC, CHILD, or PARENT). Instructions, however, contain relative register specifiers, which are translated to absolute register specifiers at instruction issue time. The assembly syntax for a relative register specifier is "reltask.r" where reltask ∈ {CURTASK, CHILD, PARENT} and r is as in an absolute register specifier. "CURTASK.r" refers to a task's own task frame (we use the simplified syntax r in this case); the CHILD and PARENT forms allow access to child and parent task frames. Register-window architectures generally include some global registers to hold constants or pointers to system-wide data structures [15,24]. This feature could easily be added to MASA by adding GLOBAL as a fourth possibility for reltask and making it always refer to a designated task frame never used for instruction execution.

When used as a source of data, a relative register specifier can have an optional field specifier appearing as a postfix of the form ".field". If no field specifier is given, the source operand is the entire tagged pointer contained in the specified register. If a field specifier ".DATA", ".GEN", or ".TYPE" is used, the data field, generation tag field, or type tag field, respectively, of the source register is selected as the data field of the operand, with FIXNUM and 0 supplied as the operand's type and generation tags, respectively. Field specifiers allow the type tag fields of two registers to be compared, for example, using the instruction "CMP R1.TYPE,R2.TYPE". Section 4 discusses the cost in cycle time of this feature. It was not deemed worthwhile for destination registers to have field specifiers; instead, the tag insertion instructions INST and INSG permit type and generation tags of values in registers to be selectively modified.
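The translation from relative to absolute specifiers at issue time can be pictured with the small C sketch below; the enum names mirror the assembly syntax, while the frame-number fields and the NO_FRAME sentinel are modeling assumptions of ours.

    #include <stdio.h>

    #define NO_FRAME (-1)   /* assumed encoding for "no parent/child" */

    /* Per-task-frame auxiliary state relevant to specifier translation. */
    typedef struct {
        int parent;   /* task frame number of parent, or NO_FRAME */
        int child;    /* task frame number of child, or NO_FRAME */
    } task_frame;

    typedef enum { CURTASK, CHILD, PARENT } reltask;

    /* Relative specifier (reltask, r) -> absolute specifier (t, r).
       Returns the absolute frame number t, or NO_FRAME if the named
       relative frame does not exist. */
    static int translate_frame(const task_frame *frames, int cur, reltask rt)
    {
        switch (rt) {
        case CURTASK: return cur;
        case CHILD:   return frames[cur].child;
        case PARENT:  return frames[cur].parent;
        }
        return NO_FRAME;
    }

    int main(void)
    {
        task_frame frames[4] = { {NO_FRAME, 1}, {0, NO_FRAME} };
        /* "CHILD.R2" issued by task frame 0 resolves to (1, R2). */
        printf("CHILD of frame 0 -> frame %d\n", translate_frame(frames, 0, CHILD));
        return 0;
    }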
`
`3.3
`
`Instruction
`
`Set
`
`logical, and
`the usual repertoire of arithmetic,
`incorporates
`MASA
`that
`take their operands
`from and (where ap-
`instructions
`comparison
`plicable) store their results in processor registers.
`(Optionally,
`the sec-
`ond operand can be an immediate constant of type FIXNUN.) The data
`field of an arithmetic or logical
`instruction’s
`result
`is computed
`from
`the data fields of its operands;
`the result’s
`tag fields are those of the
`first operand. By default,
`these operations enable the Future Touch
`and Not FIXNUM
`trap conditions; however,
`these traps can be dis-
`abled (e.g., for address arithmetic).
`In MASA assembly syntax,
`the
`flow of data
`is from right
`to left: RI is the destination
`in both
`“ADD
`Rl , R2, R3” and “MOVE Rl , RZ”
`
`445
`
`SAMSUNG-1012
`Page 3 of 9
`
`
`
Load instructions can leave the accessed memory word either full or empty. Their assembly syntax is "LDx dest,offset(base)" where x ∈ {E,F} (indicating the resulting state of the accessed location's full/empty bit), dest and base are relative register specifiers, and offset is an integer constant. The address of the accessed memory location is calculated by adding offset to the data field of the base operand (to allow for absolute addressing, R0 of every task frame has 0 "wired" into it). Transport and Empty Location traps are enabled by default.

Store instructions are written as "STy offset(base),source" where source is a relative register specifier indicating the value to be stored, and base and offset are as for the LDx family of instructions. The old/new bit of the location stored into is set to "new" if y is N, and to "old" if y is O. Generation traps are enabled by default.
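The following C sketch models the default behaviour of the LDx and STy families as just described (effective address, full/empty handling, and the generation check). It is an illustration only; in particular, the assumption that a store marks the location full is ours, and the transport check of Section 2.2 is omitted.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef struct { uint32_t data, type, gen; } tagged_ptr;
    typedef struct { tagged_ptr word; bool full; bool old_bit; } mem_word;

    typedef enum { TRAP_NONE, TRAP_EMPTY_LOCATION, TRAP_GENERATION } trap;

    /* LDx dest,offset(base): effective address = offset + data field of base;
       x (E or F) chooses the full/empty state the location is left in. */
    static trap do_load(mem_word *mem, tagged_ptr base, int32_t offset,
                        bool leave_full, tagged_ptr *dest)
    {
        mem_word *loc = &mem[base.data + offset];
        if (!loc->full)                  /* Empty Location trap, on by default */
            return TRAP_EMPTY_LOCATION;
        *dest = loc->word;               /* full/empty, old/new bits not copied */
        loc->full = leave_full;          /* LDF leaves it full, LDE empties it */
        return TRAP_NONE;
    }

    /* STy offset(base),source: y (N or O) chooses the old/new marking. */
    static trap do_store(mem_word *mem, tagged_ptr base, int32_t offset,
                         tagged_ptr source, bool mark_new)
    {
        if (source.gen > base.gen)       /* Generation trap, on by default */
            return TRAP_GENERATION;
        mem_word *loc = &mem[base.data + offset];
        loc->word = source;
        loc->full = true;                /* assumption: a store fills the location */
        loc->old_bit = !mark_new;
        return TRAP_NONE;
    }

    int main(void)
    {
        mem_word mem[16] = {0};
        tagged_ptr r0 = {0, 0, 0}, v = {7, 0, 0}, out;
        do_store(mem, r0, 5, v, true);                      /* like STN 5(R0),v */
        printf("%d %u\n", do_load(mem, r0, 5, false, &out), out.data); /* 0 7 */
        printf("%d\n", do_load(mem, r0, 5, false, &out));   /* 1: now empty */
        return 0;
    }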
`
The packing of bits into instruction words forces compromises in virtually all processor architectures. A processor's data paths may easily accommodate a very orthogonal instruction set, but a more highly encoded, less orthogonal instruction format is often chosen solely to reduce instruction length. As a result, more than one instruction may be required to perform an operation that the underlying data paths could perform in one step. We have aimed for a high degree of orthogonality and regularity in MASA's instruction set, rather than apply preconceptions about the combinations of operations and addressing modes that will be the most useful. Therefore, MASA is lavish in its allocation of function to instructions, which is governed by only one constraint: no instruction should require time-consuming "magic" in any pipeline stage. MASA's instruction repertoire must eventually also be constrained by instruction encoding requirements, but we defer applying such constraints until we accumulate more experience. The flexibility of MASA's register and field specifiers, in particular, is probably extravagant in its use of instruction word bits, but we would like to see which combinations of register specifiers and operation codes are valuable before restricting the permissible combinations.
`
3.4 Task Frame Allocation and Instruction Issue

The state of a task frame may be represented as a pair (RState, FState) where

RState ∈ {Ready, Running, CallWait, TrapFailed, Suspended, Free}

and FState ∈ {Enabled, Frozen} (the use of FState is discussed in Section 3.5). Only tasks in the (Ready, Enabled) state are eligible to issue instructions. When a task issues an instruction, its RState goes from Ready to Running, returning to Ready when a previously issued instruction completes; thus a task never has more than one instruction executing at a time. This restriction reduces MASA's speed when only one task is active below that of processors that issue instructions without waiting for previously issued instructions to finish, but it simplifies the processor design and does not reduce speed when many tasks are active. Section 5 discusses ways to relax this restriction.

The RQFR ("request frame") instruction allocates a Free task frame F (if one exists), setting its RState to Suspended. (To facilitate this operation, a processor-wide task usage register (TUR) has one bit corresponding to each task frame indicating whether the task frame is Free.) The requesting task frame R is made the parent of F and F is made the child of R. When a trap occurs, the trapping instruction does not write its normal results or update its program counter; instead, an unused task frame H is allocated (as by RQFR) for a trap handler to run in. The initial value of H's program counter is determined by the trap condition that occurred. The trapping frame T is made the parent of H and T's RState is set to Suspended, but the parent and child task registers of T are not disturbed. H can re-execute the trapping instruction simply by making T Ready again; alternatively, it can proceed to the next instruction in T by re-activating T after explicitly incrementing T's program counter.
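A minimal C sketch of the allocation step shared by RQFR and trap handling, with the TUR modeled as a bit mask and our own encoding of the state names, might look like this (the interactions with the frame saver of Section 3.5 are omitted).

    #include <stdint.h>
    #include <stdio.h>

    #define N_FRAMES 8          /* illustrative; the paper expects 8 or 16 */
    #define NO_FRAME (-1)

    typedef enum { READY, RUNNING, CALLWAIT, TRAPFAILED, SUSPENDED, FREE } rstate;

    typedef struct {
        rstate r;
        int parent, child;      /* task frame numbers, or NO_FRAME */
        uint32_t pc;
    } frame;

    static frame frames[N_FRAMES];
    static uint32_t tur;        /* task usage register: bit i set => frame i Free */

    /* Grab a Free frame, mark it Suspended, and record its parent. */
    static int alloc_frame(int parent)
    {
        for (int f = 0; f < N_FRAMES; f++) {
            if (tur & (1u << f)) {
                tur &= ~(1u << f);
                frames[f].r = SUSPENDED;
                frames[f].parent = parent;
                return f;
            }
        }
        return NO_FRAME;        /* no Free frame: the frame saver must intervene */
    }

    /* RQFR: the requester additionally records the new frame as its child. */
    static int rqfr(int requester)
    {
        int f = alloc_frame(requester);
        if (f != NO_FRAME)
            frames[requester].child = f;
        return f;
    }

    /* A trap: handler frame H starts at the trap vector and becomes Ready; the
       trapping frame T is Suspended, and T's own parent/child registers are
       left untouched. */
    static int take_trap(int t, uint32_t vector)
    {
        int h = alloc_frame(t);
        if (h == NO_FRAME) {
            frames[t].r = TRAPFAILED;
            return NO_FRAME;
        }
        frames[h].pc = vector;
        frames[h].r = READY;
        frames[t].r = SUSPENDED;
        return h;
    }

    int main(void)
    {
        tur = (1u << N_FRAMES) - 1u;              /* all frames initially Free */
        for (int i = 0; i < N_FRAMES; i++)
            frames[i] = (frame){ FREE, NO_FRAME, NO_FRAME, 0 };
        tur &= ~1u;                               /* frame 0 holds a running task */
        frames[0].r = READY;
        printf("RQFR -> frame %d\n", rqfr(0));
        printf("trap -> handler frame %d\n", take_trap(0, 0x100));
        return 0;
    }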
`
CALLER: RQFR                     ; allocate child frame
        MOVE  CHILD.R1,R2        ; transfer arguments to child
        MOVE  CHILD.R2,R5
        CALL  PROC               ; set child's PC and activate
        MOVE  R2,CHILD.R1        ; retrieve result from child
        RLFR  CHILD              ; release child frame

PROC:   ADD   R1,R1,R2           ; add arguments
        RETN                     ; return

Figure 4: An example of procedure linkage.
`
The RLFR ("release frame") instruction deallocates a task frame by resetting its RState to Free. "RLFR CURTASK", "RLFR PARENT", and "RLFR CHILD" deallocate the current frame, its parent, and its child, respectively. The instructions RETN ("return") and RERL ("return and release") are useful for returning from procedures and trap handlers, respectively. RETN makes the current task's parent Ready again, but at the same time leaves the current task Suspended. RERL is like RETN but leaves the current task Free instead.

The instruction "CALL addr" sets the child task's program counter to addr, makes the child task Ready, and sets the current task's RState to CallWait. In the example of procedure call and return given in Figure 4, PROC receives two arguments in its R1 and R2, stores its result in its R1, and then returns control to its caller. Then the caller takes the result from its child's R1 and releases PROC's task frame. As with register windows, the caller's registers save the caller's state while the callee task works.
`
3.5 The Frame Saver

Deadlock can occur if a trap handler uses up the last available task frame and then traps. To prevent deadlock, a trap occurring when only one task frame is available is converted into a Task Frame Unavailable trap. To prevent RQFR from using up the last available task frame, it too causes a Task Frame Unavailable trap if only one task frame is available. In either case, the trapping task's RState is set to TrapFailed. The Task Frame Unavailable trap is serviced by the frame saver, which unloads some task frames to make space. The frame saver must not unload the Suspended parent of a loaded trap handler (since a trap handler may reference its parent's registers) or the Suspended child C of a loaded task P (since P may access C's registers as part of the procedure call protocol illustrated in Figure 4). Any other frame unloading policy is acceptable. In particular, the frame saver may unload frames that are in the CallWait state waiting for a procedure call to finish and frames in the Ready or TrapFailed states that are candidates for being moved to another processor for load balancing. The frame saver needs to record the memory area to which a frame has been saved so that it may be reloaded when needed.

The frame saver must be careful not to unload a task in the middle of executing an instruction or on the basis of outdated observations of its state. The frame saver can avoid such eventualities by setting the FState of selected frames to Frozen so that a consistent observation of their state may be made before the final decision on which frames to unload. Space limitations unfortunately preclude detailed discussion of frame saver algorithms here, but examination of the RStates of tasks to be saved, their parents, and their children, using FState as needed to ensure that a consistent picture is observed, suffices to implement the frame saver correctly. (It is noteworthy that the only RState values that need to be distinguished in the absence of the frame saver are Ready, TrapFailed, Free, and "other." The remaining RState values exist only to preserve information valuable to the frame saver.)
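The unloading constraint can be captured as a small predicate; the C sketch below is only one possible reading of the rule, and the loaded and is_trap_handler flags are bookkeeping we have assumed rather than part of MASA.

    #include <stdbool.h>
    #include <stdio.h>

    #define NO_FRAME (-1)

    typedef enum { READY, RUNNING, CALLWAIT, TRAPFAILED, SUSPENDED, FREE } rstate;

    typedef struct {
        rstate r;
        int parent, child;        /* frame numbers, or NO_FRAME */
        bool is_trap_handler;     /* assumed flag: frame runs a trap handler */
        bool loaded;              /* frame currently resident in the processor */
    } frame;

    /* One reading of the rule: never unload the Suspended parent of a loaded
       trap handler, nor the Suspended child of a loaded task; CallWait, Ready,
       and TrapFailed frames are fair game. */
    static bool may_unload(const frame *frames, int f)
    {
        const frame *t = &frames[f];
        if (t->r != SUSPENDED)
            return t->r == READY || t->r == CALLWAIT || t->r == TRAPFAILED;
        if (t->child != NO_FRAME && frames[t->child].loaded &&
            frames[t->child].is_trap_handler)
            return false;                 /* parent of a loaded trap handler */
        if (t->parent != NO_FRAME && frames[t->parent].loaded)
            return false;                 /* Suspended child of a loaded task */
        return true;
    }

    int main(void)
    {
        frame frames[2] = {
            { SUSPENDED, NO_FRAME, 1, false, true },  /* trapped task, child = 1 */
            { READY, 0, NO_FRAME, true, true },       /* its loaded trap handler */
        };
        printf("%d %d\n", may_unload(frames, 0), may_unload(frames, 1)); /* 0 1 */
        return 0;
    }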
`
`446
`
`SAMSUNG-1012
`Page 4 of 9
`
`
`
If a trap (or RQFR) occurs when no task frames are free, the trapping task simply enters the TrapFailed state and the trap is forgotten, under the assumption that the frame saver is already running. When the frame saver finishes, it causes the trapping instruction to be re-executed in each task that has entered the TrapFailed state. The search for these tasks is speeded by means of a processor-wide suspended frame requests register (SFR) with a bit corresponding to each task frame indicating whether the frame is currently in the TrapFailed state.
`
Since a frame waiting for a procedure call to return may be unloaded, it is possible for a callee to execute a RETN instruction when its caller is not loaded into any task frame. This condition can be detected because the callee's parent task register will contain a null value put there when the caller frame was saved. A RETN instruction executed by such a task causes a Return to Saved Task trap, whose service routine locates the caller's saved state, allocates a new task frame for it, and re-executes the RETN instruction after properly updating the caller's and callee's parent and child task registers.
`
3.6 Trap Handling

If a register-to-register instruction traps, the absolute register specifiers of the source and destination registers are saved in a trap information register of the trap handler task. The trap handler may then access the trapping instruction's source and destination registers directly using the relative register specifiers SR1, SR2, and DST. If a load or store instruction traps, its source and destination registers are accessible in this way, and additionally the effective address is written into R1 in the trap handler's task frame. Section 4 discusses the implementation of this mechanism.

As an example of trap handling, consider the trap that occurs when a strict operation such as ADD discovers that one of its operands is a future. If futures are represented as in Figure 3, the two trap handlers (depending on whether the future is the first or second operand) are

TOUCH1ST: LDF   SR1,0(SR1)
          RERL              ; release trapping frame and re-execute instruction

TOUCH2ND: LDF   SR2,0(SR2)
          RERL

These trap handlers replace the future in the trapping task's register set by the contents of the future's value cell. If the future is unresolved, an Empty Location trap occurs in the LDF instruction. The handler for this trap can diagnose the situation and take the steps needed to suspend the task that touched the future. Although touching an unresolved future involves two traps and a fair amount of computation, touching a resolved future (the more common case [19]) requires only two instructions to be executed in the trap handler.
3.7 Task Creation

Task spawning is like procedure calling except that the parent task does not wait after activating its child. Synchronization and data transfer from the child are handled instead using a future; Figure 5 gives an example. The parent task first allocates and initializes a future for the child task's value. It then allocates a task frame for the child, fills some of the child's registers with parameters (including the future just created), and uses ACTK to set the child task's program counter to TASK2 and initiate its execution. ACTK is just like CALL except that ACTK allows the current task to continue executing, while CALL suspends the current task until its child executes a RETN instruction.

Although the new task created by this code helps fill up pipeline slots on the same processor, this code does nothing to distribute tasks to other processors in a multiprocessor. For load balancing among processors, each processor must periodically run a process related to the frame saver that can save excess active task frames in memory so other processors can pick them up and execute them.
`
; The parent task:
TASK1:  LDE     R1,FREE-PTR        ; allocate 2 words for the future
        ADD-FX  R2,R1,2
        STN     FREE-PTR,R2
        INST    R1,R1,FUTURE-TAG   ; put FUTURE tag on
        LDE     R3,0(R1)           ; make value cell empty
        LDF     R2,NIL
        STN     1(R1),R2           ; store NIL as waiting task queue
        RQFR                       ; allocate child task frame
        MOVE    CHILD.R1,R1        ; give the future to child
        ...                        ; transfer any arguments
        ACTK    TASK2              ; set child's PC and activate task
        ADD     R3,R1,R2           ; touch the future in R1

; The child task:
TASK2:  ...                        ; compute value (say, in R4)
        STN+FL  0(R1),R4           ; store value in future
        LDF     R2,NIL
        LDE     R3,1(R1)           ; read and empty task queue
        STN     1(R1),R2           ; replace with NIL
        ...                        ; activate all tasks on the queue headed by R3
        RLFR    CURTASK            ; de-allocate own task frame

Figure 5: Creating and using a future.
`
4. MASA's Data Paths

Figure 6 gives a block diagram of a possible implementation illustrating the feasibility of the MASA processor architecture. Execution of an instruction begins in the Instruction Fetch Unit (IFU). The IFU operates on the Task Database, which contains the auxiliary registers of all the task frames. On each clock cycle, the IFU (1) determines the frame number of a task T that is in the (Ready, Enabled) state (if one exists), (2) reads T's program counter, (3) fetches the corresponding instruction from the Instruction Cache, and (4) sets T's RState to Running. Step (1) can be performed quickly by combinational logic similar to a priority encoder that reads a bit vector identifying (Ready, Enabled) frames. In step (3), if the Instruction Cache misses, a cycle might be wasted, but the IFU can attempt to issue instructions for other tasks on subsequent cycles while the cache is being refilled.
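Step (1) amounts to a priority encode over a "ready and enabled" bit vector; the C sketch below models that selection. The fixed lowest-frame-first priority is an arbitrary choice, and real hardware might rotate priority for fairness.

    #include <stdint.h>
    #include <stdio.h>

    #define N_FRAMES 16           /* illustrative */

    /* Combinational step (1) of the IFU: given a bit vector with bit i set
       when task frame i is in the (Ready, Enabled) state, pick one frame. */
    static int select_frame(uint32_t ready_enabled)
    {
        if (ready_enabled == 0)
            return -1;                          /* no frame can issue this cycle */
        for (int i = 0; i < N_FRAMES; i++)
            if (ready_enabled & (1u << i))
                return i;
        return -1;
    }

    int main(void)
    {
        /* Frames 3 and 7 are (Ready, Enabled); the encoder picks frame 3. */
        printf("issue from frame %d\n", select_frame((1u << 3) | (1u << 7)));
        return 0;
    }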
`
During the next cycle ("Stage 2"), an issued instruction can fetch two operands from the Register File, which contains all the general registers, or from the Task Database. The principal operations in Stage 2 are (1) translate relative register specifiers to absolute register specifiers; (2) read the indicated registers; and (3) select the desired fields of the registers, or substitute an immediate constant from the instruction word. The IFU's output to the Operand Fetch Unit includes the task number of the task T issuing the instruction, along with T's child task, parent task, and trap information registers for use in step (1), which can be performed quickly by a small multiplexer. Step (3) can be performed by four-input multiplexers and should not add much delay because the multiplexers' select inputs can be set up concurrently with step (2); only the multiplexers' data-to-output propagation delay time will add to the delay of Stage 2.
`
Tag checking, arithmetic, and logical operations happen in the following cycle of an instruction's execution (Stage 3). Register-to-register instructions perform their computation step in this pipeline stage; load/store operations perform a base-plus-displacement address calculation here. Following Stage 3, a register-to-register instruction proceeds immediately to the final pipeline stage (Stage 4), where results are written