throbber
MASA:
`
`Architecture
`Processor
`A Multithreaded
`for Parallel
`Symbolic
`Computing
`
`Jr.
`Robert H. Halstead,
`M.I.T. Laboratory
`for Computer Science
`545 Technology Square
`Cambridge, Mass. 02139, U.S.A.
`
`Tetsuya Fujita
`NEC Corporation
`l-10, Nisshincho
`FIIC~~I City, Tokyo, 183, Japan
`
`Abstract
`
`intended as a
`is a “first cut” at a processor architecture
`MASA
`building block for a multiprocessor
`that can execute parallel Lisp pro
`grams efficiently. MASA
`features a tagged architecture, multiple con-
`texts,
`fast trap handling, and a synchronization
`bit
`in every memory
`word. MASA’s principal novelty
`is its use of multiple contexts both to
`support multithreaded
`execution-interleaved
`execution
`from separate
`instruction
`streams-and
`to speed up procedure calls and trap han-
`dling in the same manner as register windows. A project
`is under way
`to evaluate MASA-like architectures
`for executing programs written
`in
`Multilisp.
`
`1.
`
`Introduction
`
`is a
`for Symbolic Applications)
`Architecture
`(Multilisp
`MASA
`for
`Yirst cut” at a “communication-oriented”
`processor architecture
`efficiently executing parallel Lisp (especially Multilisp
`[8,9] or QLISP
`[6]) programs. The system structure envisioned
`for MASA consists of
`a collection of dozens to hundreds of nodes, each including a MASA
`processor and some memory
`(including
`caches),
`interconnected
`by a
`high-speed communication
`net,
`Individual memory modules are phys-
`ically associated with particular nodes, but the system has one unified
`address space (in line with
`the “shared memory” philosophy of Multi-
`lisp and &LISP). This structure
`resembles that of current systems such
`es the IBM RP3 [21] and the BBN Butterfly
`[3]. Although
`the design
`of the communication net and caching strategy are important subjects,
`this paper focuses on MASA’s processor architecture.
`
`MASA aims to speed up several parallel Lisp operations, notably
`generic operations on tagged data,
`task spawning, synchronization
`(in
`and garbage collection, which are ex-
`particular,
`using futures
`[l&9]),
`pensive on “mainstream”
`processor architectures.
`The
`“communica-
`tion-oriented”
`character of MASA comes from its use of multiple con-
`texts
`to support multithreaded
`execution
`in which each processor can
`have several active
`tasks loaded at once and can switch
`from one to
`another on every clock cycle. Multithreading
`offers a way to improve
`MASA’s
`tolerance of communication and synchronization
`delays by al-
`lowing other
`tasks to run while some are blocked, even if only briefly.
`Although
`this has been done before (on the HEP-1 116,22]), MASA’s
`principal design contribution
`is its scheme for graceful management
`of and interaction between
`these contexts.
`In MASA, one resource-a
`bank of register sets-is
`shared flexibly among various uses, supporting
`
`in part by NEC Corp. (by supporting
`This research was supported
`the second author
`for a year at M.I.T.) and in part by the Defense
`Advanced Research Projects Agency, monitored by the OWce of Naval
`Research under contract numbers N00014-83-K-0125 and N00014-84
`K-0099.
`
`[15,20,24]
`but also used in the style of register windows
`multithreading
`to reduce the average cost of context saving and restoring
`for procedure
`calls and traps.
`is a ‘Wrst cut,” many other details are still unresolved.
`Since MASA
`We have focused on the choice of operations
`implemented,
`their alloca
`tion among pipeline stages, and the design of novel mechanisms such as
`those for interaction
`between
`tasks. Other
`issues, such as the packing
`of bits into
`instruction words,
`the size of tag and data fields, and the
`number and size of register sets, involve
`low-level optimizations
`that
`interact weakly with
`the major design issues. They are best settled on
`the basis of empirical experience
`that has not yet accumulated, so they
`are not yet fixed.
`In the following sections, we first give a broad overview of MASA’s
`architectural
`goals and mechanisms. Then we discuss in greater detail
`MASA’s context management mechanisms,
`including
`their application
`to multithreading,
`procedure
`linkage, and trap handling. Next, we give
`an overview of a possible pipelined
`implementation
`of MASA. We close
`by discussing
`the performance
`impact of MASA’s design decisions.
`2. Architectural
`Overview
`architecture
`load/store
`MASA employs a simple,
`register-based,
`augmented with hardware mechanisms
`for tagged data manipulation,
`garbage collection, synchronization,
`and multithreaded
`execution using
`multiple contexts. Essentially all details of MASA’s architecture are
`direct consequences of the style of these mechanisms. Space constraints
`require
`this paper to omit many details, which can be found
`in [4].
`2.1 Manipulating
`Tagged Data
`by
`As in
`the Berkeley SPUR
`[15,24], data values manipulated
`I:
`MASA are tagged pointers containing
`three fields, shown in Figure
`a data field, a type
`tag field, and a generation
`tag field. MASA han-
`dles simple cases of generic operations (such as adding small integers)
`and frequently occurring checks (such as tag comparisons) directly in
`hardware. For complex cases that rarely occur but must be checked
`for frequently
`(such as adding
`rational numbers), MASA
`relies on a
`software
`trap handler
`to performshe
`operation.
`This
`is essentially
`SPUR’s approach, but MASA devotes more attention
`than SPUR
`to
`streamlining trap handling, so traps can be used heavily.
`2.2 Garbage Collection
`incremental garbage
`MASA supports parallel, generation-based,
`reduces garbage collec-
`collection. Generation-based
`garbage collection
`tion costs in large address spaces [17]. Incremental garbage collection
`is desirable both to avoid long pauses for garbage collection and to help
`in load balancing on a multiprocessor
`[13].
`MASA, like SPUR, supports generation-based
`garbage collection
`tags. A tagged pointer’s generation
`tag indicates
`the
`using generation
`generation of the storage area containing
`the object pointed
`to. As
`objects survive garbage collections,
`they are moved to areas of storage
`
`CH2545-2/88/0000/0443$01.00 0 1988 IEEE
`
`443
`
`SAMSUNG-1012
`Page 1 of 9
`
`

`

`I
`
`SiL Tag
`
`Figure 1: A MASA
`
`tagged pointer.
`
`Data
`Type Tag
`Generation Tag
`Old/New Bit
`Empty/Full
`Bit
`
`Figure 2: Memory word
`
`format.
`
`1 future
`
`1
`
`l +----+F~
`
`1
`1
`
`FI
`
`I
`
`value
`task queue
`
`I
`
`Figure 3: Representation of a future.
`
`nor-
`with successively smaller generation numbers. Store instructions
`mally cause a generation
`trap
`if the generation
`tag of a value being
`stored is greater
`than the generation
`tag of the pointer
`to the location
`where it is being stored.
`
`incremental garbage collection, MASA associates an
`To support
`old/new bit with each memory
`location and erects a barrier
`[17] that
`checks all pointers being loaded into the processor, causing a transport
`trap whenever an attempt
`is made to load an “old” pointer. Generation
`tags and old/new bits duplicate
`information
`that can be computed from
`a pointer’s address, but by avoiding
`this computation,
`they expedite
`loads and stores at the expense of adding a few bits to every memory
`word.
`
`2.3 Synchronization
`
`con-
`two kinds of synchronization
`to support
`is designed
`MASA
`structs:
`futures
`[1,8,9] and locks on individual memory locations. Locks
`facilitate simple atomic updates. Futures are values that serve as place-
`holders. A future
`is initially
`unresolved, and later resolves to a value
`supplied by some computation. A future can be copied without wait-
`ing for it to resolve. Strict operations
`like comparisons and arithmetic
`cannot operate on an unresolved
`future:
`they need the value to which
`it resolves. Strict operations
`touch their operands;
`touching an unre-
`solved future causes the touching
`task to be suspended.
`
`bit
`futures and locks by associating a full/empty
`MASA supports
`with each memory
`location. A “full”
`location contains valid data and
`may be accessed. Setting
`the bit
`to “empty” effectively
`locks the lo-
`cation. Load instructions
`normally
`trap
`if asked to read from empty
`locations. Full/empty
`bits were used on the HEP-1 processor
`[16,22],
`though attempts
`to read an empty
`location on the HEP-1 busy-waited
`rather
`than trapping.
`
`includes
`format, which
`Figure 2 shows the MASA memory word
`space for a tagged pointer, a full/empty
`bit, and an old/new bit. A
`
`future can be represented by a tagged pointer of type FUTURE, as indi-
`cated by Figure 3. The pointer points
`to a pair of memory
`locations.
`The first
`location
`(the value cell) contains
`the future’s value when the
`future
`is resolved; otherwise
`this location
`is empty. When a strict op-
`eration has an operand of type FUTURE, the contents of the future’s
`value cell should be used as the operand
`instead.
`If the value cell is
`empty, the task touching
`the future must be suspended until
`the future
`resolves. While
`the future
`is unresolved,
`the second location points
`to
`a list of such suspended tasks. Additional
`locations could be added to
`hold other desired information.
`
`2.4 Multiple
`
`Contexts
`
`into
`MASA has Nr task frames; a separate task can be pre-loaded
`each. Each task frame haa a set of auxiliary
`registers (such aa a program
`counter) and a set of N? general
`registers. Nr and N, are not yet
`decided, but we expect each to be either 8 or 16. Each general register
`holds a full tagged pointer
`(Figure
`I), but the full/empty
`and old/new
`bits are not copied into registers when a word is loaded from memory.
`
`alternatives during syn-
`Conventional processors face unattractive
`chronization
`or remote memory access delays (which can take tens of
`processor clock cycles--or more-to
`complete): either
`idle until execu-
`tion can resume, or pay the cost (usually at least dozens of instructions)
`is
`of a context switch
`to a new task.
`In MASA, any loaded task that
`not suspended is eligible
`to issue the next instruction;
`instructions
`from
`different
`tasks can be issued on consecutive clock cycles. Thii
`fast con-
`text switching mechanism allows the MASA processor to perform useful
`work on other
`tasks when one task is blocked even for a period of only
`a few clock cycles.
`
`register sets also expedite procedure calling, pro-
`MASA’s multiple
`cess creation, and trap handling, all common
`in the parallel Lisp pro-
`grams that interest us [19]. These operations all allocate a new context
`(tank
`frame) and initiate execution
`in that context. They differ only
`
`SAMSUNG-1012
`Page 2 of 9
`
`

`

`Table 1: MASA’s
`
`trap conditions.
`
`values between the newly created
`in the mechanism for communicating
`context and its activator, and in the synchronization
`(or lack thereof)
`between
`the two. Thus MASA’s non-overlapping multiple
`register sets,
`introduced
`for fast context switching, also support procedures and traps
`in a manner analogous
`to register windows
`[15,20,24].
`
`each task frame’s auxiliary
`relationships,
`intertask
`To represent
`registers
`include a parent and a child task register
`to hold
`the task
`frame numbers of the task’s parent and child, or an indication
`that
`there is none.
`In a procedure call, the caller’s child frame is the callee
`and the callee’s parent
`frame
`is the caller.
`If a trap occurs,
`the trap
`handler’s parent
`frame is that of the task that
`trapped.
`
`3. Context
`
`Management
`
`in MASA
`
`3.1 Trap Conditions
`
`is generally subject to several trap conditions, which
`An instruction
`may cause execution of the instruction
`to trap. Table 1 lists all of
`MASA’s
`trap conditions, along with
`their mnemonic abbreviations,
`in
`decreasing order of priority
`(thus,
`if both
`the FT and FX conditions
`are met, the FT trap will occur since it appears earlier
`in the table).
`Some trap conditions
`(e.g., arithmetic
`overflow) may be enabled or
`disab!ed in an instruction’s
`operation code; others are always enabled
`or disabled
`for particular
`families of instructions. Variations
`from
`the
`standard set of enabled
`trap conditions are noted in MASA assembly
`language syntax by clauses of the form
`“+con~
`or “-cond”
`following
`an instruction’s
`operation
`code, where
`the mnemonic cond names a
`trap condition
`that
`is enabled or disabled
`for that
`instruction.
`
`Table 1 classifies traps according
`cution where they are detected:
`Instruction
`decode can signal the Illegal
`
`l
`
`Instruction
`
`trap.
`
`to the phase of instruction
`
`exe-
`
`ex-
`l Tag checking can uncover a discrepancy during an instruction’s
`ecution or address calculation phase. When enabled,
`the Future
`Touch
`trap will occur
`if either of the instruction’s
`operands
`is of
`type FUTURE. Arithmetic
`and logical
`instructions
`normally
`require
`operands of type FIXNUH: if an operand’s
`type is neither FIXNUR nor
`FU’IURE, the Not FIXNUM
`trap will occur‘. Load and store instruc-
`tions can require
`the address used to be a pointer of type CONS (or
`else signal
`the Not CONS
`trap during address calculation),
`or to
`be one of CONS or NIL (to support
`the Common Lisp car and cdr
`operations
`[23]). Other
`tag checking traps are listed in Table 1.
`
`trap handling by saving
`Having many tag checking traps streamlines
`that the hardware can
`trap handlers
`from recomputing
`information
`detect at the time a trap is signalled. MASA could have just a single
`“tag mismatch”
`trap and let its trap handler analyze
`the problem,
`but since MASA’s hardware distinguishes
`important
`special cases
`such as touching a future, handlers for these cases need not take time
`to analyze
`the situation before going to work. As a further piece
`of streamlining,
`the Future Touch
`trap has two vectors, according
`to whether
`the first or the second operand was found to be of type
`FUTURE.
`cause a trap using the TRAP
`trap. A program can explicitly
`l Software
`instruction. A useful variant of TRAP is TRTC, which traps if its first
`operand’s
`type
`tag field
`is not equal to its second operand’s data
`field.
`trap.
`can signal the Arithmetic Overflow
`computation
`l Arithmetic
`l Memory access in a load or store instruction
`can signal an Empty
`Location, Full Location, or Transport
`trap
`(if enabled)
`if the ac-
`cessed location’s
`full/empty
`or old/new bit
`is not
`in the desired
`state.
`l Task frame management can cause the Task Frame Unavailable or
`Return
`to Saved Task traps discussed in Section 3.5.
`
`3.2 Register
`
`Specifiers
`
`identified by an ab-
`is uniquely
`in the MASA processor
`A register
`solute
`register specifier
`(t,r) where 1 is the number of a task frame
`and P names a register within
`the task
`frame
`(either a general
`reg-
`ister such as Rl or an auxiliary
`register such ss PC, CHILD, or PAR-
`EBT). Instructions,
`however, contain
`relative
`register specifiers, which
`are translated
`to absolute
`register specifiers at instruction
`issue time.
`The assembly syntax
`for a relative
`register specifier
`is “reJtasL. r” where
`r&ask E {CURTASK, CKILD, PARENT} and r is as in an absolute
`register
`specifier.
`“CURTASK.
`refers to a task’s own task frame (we use the
`simplified syntax
`r in this case); the CHILD and PARENT forms allow
`access to child and parent
`task frames. Register-window
`architectures
`generally
`include some global
`registers
`to hold constants or pointers
`to system-wide data structures
`[15,24]. This
`feature could easily be
`added to MASA by adding GLOBAL as a fourth possibility
`for reJlasE;
`and making
`it always refer
`to a designated
`task frame never used for
`instruction
`execution.
`
`register specifier can have
`When used as a source of data, a relative
`an optional
`field specifier appearing ss a postfix of the form
`“.fieJd”.
`If no field specifier
`is given,
`the source operand
`is the entire
`tagged
`pointer contained
`in the specified register,
`If a field specifier
`“. DATA” I
`is used, the data field, generation
`tag field, or type
`“.GEN”
`or “*TYPE”
`tag field, respectively, of the source register
`is selected as the data field
`of the operand, with FIXNUH and 0 supplied as the operand’s
`type and
`generation
`tags, respectively. Field specifiers allow
`the type tag fields
`of two registers
`to be compared,
`for example, using the instruction
`“CMP
`Rl .TYPE.RZ.TYPE”. Section 4 discusses the cost in cycle time of this
`feature.
`It was not deemed worthwhile
`for destination
`registers to have
`field specifiers;
`instead,
`the tag insertion
`instructions
`INST and INSG
`permit
`type and generation
`tags of values in registers
`to be selectively
`modified.
`
`3.3
`
`Instruction
`
`Set
`
`logical, and
`the usual repertoire of arithmetic,
`incorporates
`MASA
`that
`take their operands
`from and (where ap-
`instructions
`comparison
`plicable) store their results in processor registers.
`(Optionally,
`the sec-
`ond operand can be an immediate constant of type FIXNUN.) The data
`field of an arithmetic or logical
`instruction’s
`result
`is computed
`from
`the data fields of its operands;
`the result’s
`tag fields are those of the
`first operand. By default,
`these operations enable the Future Touch
`and Not FIXNUM
`trap conditions; however,
`these traps can be dis-
`abled (e.g., for address arithmetic).
`In MASA assembly syntax,
`the
`flow of data
`is from right
`to left: RI is the destination
`in both
`“ADD
`Rl , R2, R3” and “MOVE Rl , RZ”
`
`445
`
`SAMSUNG-1012
`Page 3 of 9
`
`

`

`full
`can leave the accessed memory word either
`Load instructions
`OF empty.
`Their assembly syntax
`is “LDz dest,oflset(base)”
`where
`I E {E,F}
`(indicating
`the resulting state of the accessed location’s
`full/empty
`bit), dest and base ate relative
`register specifiers, and o&t
`is an integer constant. The address of the accessed memory
`location
`is calculated by adding oflsef to the data field of the base operand
`(to
`allow for absolute addressing, RO of every task frame has 0 “wired”
`into
`it). Transport and Empty Location
`traps ate enabled by default.
`
`ate written as “STy oflsel( base) , source” where
`Store instructions
`source is a relative
`register specifier
`indicating
`the value to be stored,
`and base and offset are as for
`the LDr
`family of instructions.
`The
`old/new bit of the location stored into is set to “new”
`if y is N, and to
`“old”
`if y is 0. Generation
`traps are enabled by default.
`
`in
`forces compromises
`instruction words
`The packing of bits into
`virtually
`all processor architectures. A processor’s data paths may ea.+
`ily accommodate a very orthogonal
`instruction
`set, but a more highly
`encoded,
`less orthogonal
`instruction
`format
`is often chosen solely
`to
`reduce instruction
`length. As a result, mote than one instruction may
`be required
`to perform an operation
`that
`the underlying
`data paths
`could perform
`in one step. We have aimed for a high degree of orthog-
`onality and regularity
`in MASA’s
`instruction
`set., rather
`than apply
`preconceptions
`about
`the combinations of operations and addressing
`modes
`that will be the mosst useful. Therefore, MASA
`is lavish
`in
`its allocation
`of function
`to instructions, which
`is governed by only
`one constraint:
`no instruction
`should require
`time-consuming
`“magic”
`in any pipeline stage. MASA’s
`instruction
`repertoire must eventually
`also be constrained by instruction encoding requirements, but we defer
`applying such constraints until we accumulate more experience. The
`flexibility
`of MASA’s
`register and field specifiers,
`in particular,
`is prob-
`ably extravagant
`in its use of instruction word bits, but we would
`like
`to see which combinations of register specifiers and operation codes ate
`valuable before restricting
`the permissible combinations.
`
`3.4 Task Frame Allocation
`
`and
`
`Instruction
`
`Issue
`
`The state of a task frame may be represented as a pair (RStale,
`FSfafe) where
`
`RState E {Ready, Running, CallWait, TmpFailed, Suspended, Fee}
`
`is discussed in Sec-
`(the use of FState
`and FState E {Enabled, Frozen}
`tion 3.5). Only
`tasks in the (Ready, Enabled) state ate eligible
`to issue
`instructions. When a task issues an instruction,
`its RState goes from
`Ready to Running,
`returning
`to Ready when a previously
`issued in-
`struction
`completes;
`thus a task never has more than one instruction
`executing at a time. This restriction
`reduces MASA’s speed when only
`one task is active below that of processors that issue instructions with-
`out waiting
`for previously
`issued instructions
`to finish, but it simplifies
`the processor design and does not reduce speed when many tasks ate
`active. Section 5 discusses ways to relax
`this restriction.
`
`instruction allocates a Free task ftame
`frame”)
`The RQFR (“request
`F (if one exists), setting
`its RState
`to Suspended.
`(To facilitate
`this
`operation, a processor-wide
`task usage register (TUR) has one bit cotre-
`sponding
`to each task frame indicating whether
`the task frame is Free.)
`The requesting
`task frame R is made the parent of F and F is made
`the child of R. When a trap occuts,
`the trapping
`instruction
`does not
`write
`its normal results ot update
`its program counter;
`instead, an un-
`used task frame H is allocated
`(as by RQFR) for a trap handler
`to run
`in. The initial value of H’s program counter
`is determined by the trap
`condition
`that occurred. The trapping
`frame Tis made the parent of H
`and T’s RState is set to Suspended but the parent and child task tegis-
`tets of T are not disturbed. H can re-execute
`the trapping
`instruction
`simply by making T Ready again; alternatively,
`it can proceed
`to the
`next
`instruction
`in T by te-activating
`T after explicitly
`incrementing
`T’s program counter.
`
`CALLER: RQFR
`UOVE CHILD.Rl,RZ
`CHILD.R2,R5
`MOVE
`CALL
`PROC
`MOVE RZ,CHILD.Rl
`RLFR
`CHILD
`
`; allocate
`; transfer
`
`frame
`child
`arguments
`to child
`
`; set child’s
`; retrieve
`; release
`
`PC and activate
`result
`from child
`child
`frame
`
`PROC :
`
`Rl,Rl,R2
`
`; add arguments
`ADD
`RETN
`; return
`Figure 4: An example of procedure
`
`linkage.
`
`deallocates a task ftamr
`instruction
`frame”)
`The RLFR (“release
`“RLFR CURTASK”, “RLFR PARENT”, awl
`by resetting
`its RState to Fw.
`“RLFR CHILD” deallocate
`the current
`frame,
`its parent, and its child,
`respectively. The instructions RETN (“return”)
`and RERL (“return
`and
`release”) are useful
`for returning
`from procedures and trap handlers,
`respectively. RETN makes the current
`task’s parent Ready again, but at
`the same time leaves the current
`task Suspended. RERL is like RETN but
`leaves the current
`task Free instead.
`
`“CALL add? sets the child task’s program counter
`The instruction
`to addr, makes the child task Ready, and S&R
`the CJJtrent.
`task’s RStntr
`to CallWait.
`In the example of procedure call and return given
`in
`Figure 4, PROC receives two arguments
`in its RI and R2, stores its result
`in its Ri, and then returns control
`to its caller. Then
`the caller
`takes
`the result
`from
`its child’s Ri and releases PROC’s task frame. As with
`register windows,
`the caller’s
`registers save the caller’s state while
`the
`callee task works.
`
`3.5 The Frame Saver
`
`task
`Deadlock can occur if a trap handler uses up the last available
`frame and then traps. To prevent deadlock, a trap occurring when only
`one task frame is available
`is converted
`into a Task Frame Unavailable
`trap. To prevent RQFR from using up the last available
`task frame, it too
`causes a Task Frame Unavailable
`trap if only one task frame is available.
`In either case, the trapping
`task’s RState is set to TrapFailed. The Task
`Frame Unavailable
`trap
`is serviced by the frame saver, which unloads
`some task frames to make space. The frame saver must not unload
`the
`Suspended patent of a loaded
`trap handler
`(since a trap handler may
`reference
`its patent’s
`registers) or the Suspended child C of a loaded
`task P (since P may access C’s registers as part of the procedure call
`protocol
`illustrated
`in Figure 4). Any other
`frame unloading policy
`is
`acceptable.
`In particular,
`the frame saver may unload
`frames that are
`in the CallWait state waiting
`for a procedure call to finish and frames in
`the Ready ot napFailed
`states that are candidates
`for being moved to
`another processor
`for load balancing. The frame saver needs to record
`the memory area to which a frame has been saved so that
`it may be
`reloaded when needed.
`
`The frame saver must be careful not to unload a task in the middle
`of executing an instruction
`or on the basis of outdated observations of
`its state. The frame saver can avoid such eventualities
`by setting
`the
`FState of selected frames to Frozen so that a consistent observation of
`their state may be made before
`the final decision on which
`frames
`to
`unload. Space limitations
`unfortunately
`preclude detailed discussion of
`of the RStates of tasks
`frame saver algorithms here, but examination
`to be saved, their patents, and their children, using FSlate as needed
`to ensure that a consistent picture
`is observed, suffice to implement
`the
`frame saver correctly.
`(It is noteworthy
`that the only RState values that
`need to be distinguished
`in the absence of the frame saver are Ready,
`TmpFuiled, Free, and “other.” The remaining RStale values exist only
`to preserve
`information
`valuable
`to the frame saver.)
`
`446
`
`SAMSUNG-1012
`Page 4 of 9
`
`

`

`If a trap (or RQFR) occurs when no task frames are free, the trap-
`ping task simply enters
`the TrapFailed state and the trap
`is forgot-
`ten, under
`the assumption
`that
`the frame saver
`is already
`running.
`When the frame saver finishes,
`it causes the trapping
`instruction
`to be
`re-executed
`in each task that has entered
`the TrapFailed state. The
`search for
`these tasks
`is speeded by means of a processor-wide
`sus-
`pended frame requests register
`(SFR) with a bit corresponding
`to each
`task frame indicating whether
`the frame is currently
`in the TrapFailed
`state.
`
`for a procedure call to return may be un-
`Since a frame waiting
`loaded, it is possible for a callee to execute a RETN instruction when its
`caller is not loaded into any task frame. This condition can be detected
`because the callee’s parent
`task register will contain a null value put
`there when the caller frame was saved. A REZTN instruction executed by
`such a task causes a Return
`to Saved Task trap, whose service routine
`locates
`the caller’s saved state, allocates a new task frame
`for it, and
`re-executes
`the RETN instruction
`after properly updating
`the caller’s
`and callee’s parent and child task registers.
`
`3.6 Trap Handling
`
`traps, the absolute register spec-
`instruction
`If a register-to-register
`ifiers of the source and destination
`registers are saved in a trap informa-
`tion register of the trap handler
`task. The trap handler may then access
`the trapping
`instruction’s
`source and destination
`registers directly us-
`ing the relative
`register specifiers SRi, SR2, and DST. If a load or store
`instruction
`traps,
`its source and destination
`registers are accessible in
`this way, and additionally
`the effective address is written
`into Rl in the
`trap handler’s
`task frame. Section 4 discusses the implementation
`of
`this mechanism.
`
`As an example of trap handling, consider the trap that occurs when
`a strict operation such as ADD discovers
`that one of its operands
`is a
`future.
`If futures are represented as in Figure 3, the two trap handlers
`(depending on whether
`the future
`is the first or second operand) are
`TOUCHIST: LDF
`SRl,O(SRl)
`RERL
`frame and re-execute
`; release
`trapping
`instruction
`
`TO’JCHZND: LDF
`RERL
`
`SR2,O(SR2)
`
`task’s register set
`in the trapping
`These trap handlers replace the future
`by the contents of the future’s value cell,
`If the future
`is unresolved,
`an Empty Location
`trap occurs
`in the LDF instruction.
`The handler
`for this trap can diagnose the situation and take the steps needed to
`suspend the task that
`touched
`the future. Although
`touching an un-
`resolved
`future
`involves
`two traps and a fair amount of computation,
`touching a resolved
`future
`(the more common case [lg])
`requires only
`two instructions
`to be executed
`in the trap handler.
`3.7 Task Creation
`
`the parent
`that
`is like procedure calling except
`Task spawning
`aud data
`task does not wait after activating
`its child. Synchronization
`transfer
`from
`the chiId are handled
`instead using a future: Figure 5
`gives an example. The parent task first allocates and initializes a future
`for the child task’s value.
`It then allocates a task frame for the child,
`fills some of the child’s registers with parameters
`(including
`the future
`just created), and uses ACTK to set the child task’s program counter
`to
`TASK2 and initiate
`its execution. ACTK is just like CALL except that ACTK
`allows the current
`task to continue executing, while CALL suspends the
`current
`task until
`its child executes a RETN instruction.
`
`the new task created by this code helps fill up pipeline
`Although
`slots on the same processor,
`this code does nothing
`to distribute
`tasks
`to other processors
`in a multiprocessor.
`For load balancing among
`processors, each processor must periodically
`run a process related
`to
`the frame saver that can save excess active
`task frames in memory so
`other processors can pick them up and execute
`them.
`
`task queue
`; store NIL as waiting
`frame
`; allocate
`child
`task
`; give
`future
`to child
`; transfer
`any arguments
`; set child’s
`PC and activate
`
`task
`
`the
`
`; The parent
`
`task:
`
`; allocate
`
`2 words
`
`for
`
`the
`
`future
`
`RI, FREESTR
`TASXl: LDE
`ADD-FX R2.Rl.2
`FREE-PTR. R2
`STN
`; put FUTURE tag on
`Rl,Rl,FUTURE-TAG
`INST
`cell
`empty
`; make value
`RS,O(Rl)
`LDE
`R2, NIL
`LDF
`I(Rl),RZ
`STN
`RDFR
`MOVE CHILD.Rl,Ri
`
`ACTK
`
`TASK2
`
`ADD
`
`R3,Rl,R?
`
`; touch
`
`the
`
`future
`
`in Rl
`
`; The child
`
`task:
`
`TASK2 :
`
`; compute value
`; store
`value
`
`in
`
`(say,
`future
`
`in R4)
`
`STN+FL O(RI),R4
`R2,NIL
`LDF
`task queue
`; read and empty
`R3,1(Rl)
`LDE
`; replace with NIL
`I(RI),R2
`STN
`on the queue headed by RS
`all
`; activate
`:
`tasks
`; de-allocate
`own task
`fr.?ime
`CURTASK
`RLFR
`Figure 5: Creating and using a future.
`
`4. MASA’s
`
`Data Paths
`
`illus-
`Figure 6 gives a block diagram of a possible implementation
`trating
`the feasibility of the MASA processor architecture. Execution of
`an instruction
`begins in the Instruction Fetch Unit (IFU). The IFU op-
`erates on the Task Database, which contains the auxiliary
`registers of all
`the task frames. On each clock cycle, the IFU (1) determines
`the frame
`number of a task T that
`is in the (Ready, Enabled) state (if one exists),
`(2) reads T’s program counter,
`(3) fetches the corresponding
`instruc-
`to Running.
`tion
`from
`the Instruction Cache, and (4) sets T’s RSlate
`Step (1) can be performed quickly by combinational
`logic similar
`to
`a priority encoder
`that reads a bit vector
`identifying
`(Ready, Enabled)
`frames.
`In step (3), if the Instruction Cache misses, a cycle might be
`wasted, but the IFU can attempt
`to issue instructions
`for other
`tasks
`on subsequent cycles while
`the cache is being refilled.
`
`can fetch
`the next cycle (“Stage 2”), an issued instruction
`During
`two operands
`from the Register File, which contains al1 the general reg-
`isters, or from
`the Task Database. The principal operations
`in Stage 2
`are (1) translate
`relative
`register specifiers
`to absolute
`register speci-
`fiers; (2) read the indicated
`registers; and (3) select the desired fields
`of the registers, or substitute an immediate constant
`from
`the instruc-
`tion word. The IFU’s output
`to the Operand Fetch Unit
`includes
`the
`task number of the task T issuing the instruction,
`along with T’s child
`task, parent
`task, and trap
`information
`registers
`for use in step (l),
`which can be performed quickly by a small multiplexer.
`Step (3) can
`be performed by four-input multiplexers
`and should not add much de-
`Iay because the multiplexers
`select inputs can be set up concurrently
`with step (2); only
`the multiplexers’
`data-to-output
`propagation delay
`time will add to the delay of Stage 2.
`
`in the
`and logical operations happen
`Tag checking, arithmetic,
`following
`cycle of an instruction’s
`execution
`(Stage 3). Register-to-
`register
`instructions
`perform
`their computation
`step in this pipeline
`stage; load/store operations perform a base-plus-displacement
`address
`calculation
`here. Following Stage 3, a register-to-register
`instruction
`proceeds immediately
`to the final pipeline stage (Stage 4), where results
`are written

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket