US005394524A
[11] Patent Number: 5,394,524
[45] Date of Patent: Feb. 28, 1995
FOREIGN PATENT DOCUMENTS
WO88/09539   12/1988   WIPO   G06F 15/72
WO89/01664   2/1989    WIPO   G06F 12/02
`
`
`
Primary Examiner: Mark R. Powell
Assistant Examiner: Kee M. Tung
Attorney, Agent, or Firm: William A. Kinnaman; Duke
W. Yee; Andrew J. Dillon
`[57]
`ABSTRACT
`In a graphics subsystem, a highly interactive two-di-
`mensional (2D) data stream and a computationally in-
`tensive three-dimensional (3D) data stream are pro-
`cessed concurrently in such a manner that processing of
`the 2D data stream is not held up by processing of the
`3D data stream. A 3D geometry subsystem having a
`parallel pipeline architecture is used to process the 3D
data stream, while a 2D subsystem concurrently processes
the 2D data stream in parallel with the 3D subsystem.
A reordering device couples the processed 2D
`and 3D data streams to a common raster subsystem. The
`reordering device, which contains an internal buffer,
reorders any order-dependent elements of the 3D data
`stream appearing at the output of the 3D geometry
`subsystem in an order different from the order in which
`they were supplied to the input end. The reordering
`device prioritizes the 2D data stream relative to the 3D
`data stream so that elements of the 2D data stream
`arriving from the 2D subsystem are passed to the raster
`subsystem almost immediately, without having to wait
`for elements of the 3D data stream.
`
`16 Claims, 6 Drawing Sheets
`
United States Patent [19]
DiNicola et al.

[54] METHOD AND APPARATUS FOR PROCESSING TWO GRAPHICS DATA
STREAMS IN PARALLEL

[75] Inventors: Paul D. DiNicola, Hurley; Joseph C.
Kantz, Saugerties; Omar M. Rahim,
Kingston; David A. Rice, New Paltz;
Edward M. Ruddick, Woodstock, all
of N.Y.
`
[73] Assignee: International Business Machines
Corporation, Armonk, N.Y.

[21] Appl. No.: 983,455

[22] Filed: Nov. 30, 1992

Related U.S. Application Data
[63] Continuation-in-part of Ser. No. 926,724, Aug. 7, 1992,
Pat. No. 5,315,701.

[51] Int. Cl.5: G06F 3/14
[52] U.S. Cl.: 395/163 ...
[58] Field of Search: 395/650, ... 5/119, 1..., 141, 162-164;
345/24, 112, 133, 204, 214;
364/200 MS File, 900 MS File, 228

[56] References Cited
U.S. PATENT DOCUMENTS
4,550,386   10/1985   Hirosawa et al.
4,737,921   4/1983    Goldwasser et al.   395/163
4,987,550   1/1991    Leonard et al.      395/150
5,045,995   9/1991    Levinthal et al.    364/200
5,136,593   8/1992    Rice                364/DIG. 1

[Front-page figure: block diagram. Legible labels: ADDRESS/DATA BUS; COMMUNICATIONS PATH; TO SYSTEM BUS; RASTER SUBSYSTEM; REORDERING DEVICE. Legible reference numerals: 325, 328, 330.]

`
U.S. Patent   Feb. 28, 1995   Sheet 1 of 6   5,394,524
[Drawing sheet 1 of 6: only reference numeral 52 is legible.]

`
[Drawing sheet 2 of 6: figure labels not legible.]

`
[Drawing sheet 3 of 6: figure labels not legible.]

`
[Drawing sheet 4 of 6: FIFO diagrams and a port-servicing state diagram. Legible FIFO labels: 3D PORT; IN; END TAG; SEQ. NO.; DATA. State labels: PROCESS 2D PORT; PROCESS SELECTED 3D PORT. Transition labels: DATA IN 2D PORT; NEW SEQ. NO. OR END TAG; DATA IN SELECTED 3D PORT AND NOT IN 2D PORT.]

`
[Drawing sheet 5 of 6: flowchart. Legible box text, in scan order: CURRENT SEQUENCE NUMBER (CUR_SEQ_NUM); CHECK BOTTOM OF EACH ENABLED FIFO; IS THERE DATA* AT THE BOTTOM OF ANY FIFO? (*: DATA AS OPPOSED TO SEQUENCE NUMBERS, END TAGS, OR EMPTY FIFOS); NO; SELECT LOWEST SEQUENCE NUMBER FROM BOTTOM WORDS OF FIFOS; YES, WAIT FOR NEW SEQ NUMBER; ARE ANY FIFOS EMPTY?; NO; ERROR CONDITION; END; YES (ORDER INDEPENDENT WORK); NO (ORDER DEPENDENT WORK); LOWEST SEQUENCE NUMBER ... CUR_SEQ_NUM+1?; INCREMENT CUR_SEQ_NUM. Legible reference numerals: 702, 708, 712, 720, 724, 726.]

`
[Drawing sheet 6 of 6: flowchart, continued. Legible box text: MOVE DATA FROM FIFO TO OUTPUT UNTIL A NEW SEQUENCE NUMBER OR END TAG IS ENCOUNTERED, OR UNTIL THE FIFO IS EMPTY; MOVE DATA FROM CP FIFO TO OUTPUT UNTIL AN END TAG IS ENCOUNTERED; DISCARD SEQUENCE NUMBER (THIS STARTS TRANSFER OF DATA FROM THIS FIFO).]

`
`METHOD AND APPARATUS FOR PROCESSING
`TWO GRAPHICS DATA STREAMS IN PARALLEL
`
`REFERENCE TO RELATED APPLICATION
`
`This application is a continuation-in-part of applica-
`tion Ser. No. 07/926,724, filed Aug. 7, 1992, now U.S.
`Pat. No. 5,315,701, entitled “A Method and System for
`Processing Graphics Data Streams Utilizing Scalable
`Processing Nodes”.
`
`BACKGROUND OF THE INVENTION
`1. Field of the Invention
`The present invention relates in general to a method
`and system for improved graphical computation and in
`particular to a method and system for utilizing graphical
`computation to process a data stream. Still more partic-
`ularly, the present invention relates to a method and
`system of graphical computation to efficiently process a
`graphics data stream.
`2. Description of the Related Art
`Data processing systems such as personal computers
`and workstations are commonly utilized to run comput-
`er-aided design (CAD) applications, computer-aided
`manufacturing (CAM) applications, and computer-
`aided software engineering (CASE) tools. Engineers,
`scientists, technicians, and others employ these applica-
`tions daily. These applications involve complex calcula-
`tions, such as finite element analysis, to model stress in
`structures. Other applications include chemical or mo-
`lecular modelling applications. CAD/CAM/CASE
`applications are normally graphics intensive in terms of
`the information relayed to the user. Other data process-
`ing system users may employ other graphics intensive
`applications such as desktop publishing applications.
`Ideally, such systems should be able to process two
`graphics data streams in parallel and interleave the re-
`sulting drawing information without mutual interfer-
`ence. One of the data streams might consist of two-di-
`mensional (2D) drawing primitives and window manip-
`ulation commands, while the other might be primarily
`three-dimensional (3D) drawing primitives and attri-
`butes. The 3D data stream processing should be ex-
`tremely high performance, while the 2D processing
`should be very low latency. In addition, the time re-
`quired to swap between these two data streams should
`be minimal. The system should be able to use current
`processor technology. Overall, the system should pro-
`vide consistent high-performance, low-latency 2D pro-
`cessing in conjunction with providing a scalable range
`of 3D processing.
`Systems which are currently on the market providing
`2D and 3D data stream support process these data
`streams sequentially, i.e., by time-multiplexing them on
`a single processor or processor complex. They process
`one data stream for a period of time, then they process
`the second for a period of time, and then they return to
`the first. This approach is an unacceptable solution since
`intermixing a data stream which is computationally
`intensive with one that is highly interactive generally
`degrades both. The computationally intensive one (3D)
`does not get as much processor time as it might, and the
`interactive one (2D) must wait for the 3D data stream to
`be processed before getting an opportunity to display
`the interactive information that the user is waiting for.
`Currently available systems require large amounts of
`context information to be swapped in order to switch
`from processing 3D information to processing 2D infor-
`mation and back.
A system that provides fast 3D graphics running
alongside (or within) an interactive 2D windowed
environment (e.g., X Windows) must be able to process
these two data streams efficiently without mutual
interference. However, the traditional approach of
time-slicing between the two types of data streams can
cause serious performance problems, as noted above.
`
`SUMMARY OF THE INVENTION
`
In general, the present invention contemplates a
scalable parallel pipeline graphics system with separate
`processor complexes for the 2D data stream (the con-
`trol processor) and for the 3D data stream (attribute and
`node processors). The 3D subsystem is optimized to
`provide extremely high floating-point performance,
`which is required for 3D graphics. The 2D subsystem
`has less processing capacity, but has faster, more direct
`access to the raster subsystem that is used to actually
`modify the pixels seen on the screen.
`In accordance with the present invention, a compos-
`ite graphics data stream comprising a highly interactive
`2D data stream and a computationally intensive 3D data
`stream is partitioned into its constituent 2D and 3D
`streams, which are sent to separate 2D and 3D subsys-
`tems operating in parallel with one another. The pro-
`cessed 2D and 3D data streams are coupled to a com-
`mon raster subsystem by a reordering device or priorit-
`izer, which prioritizes the 2D data stream relative to the
`3D stream so that elements of the 2D data stream arriv-
`ing from the 2D subsystem are passed to the raster
`subsystem almost immediately, without having to wait
`for elements of the 3D data stream.
`
`Preferably, the 3D subsystem comprises a parallel
`pipeline system having a plurality of processing nodes,
`each of which contains a processor pipeline. Segments
`of the 3D data stream are distributed to the various
`processing nodes in such a manner as to balance the
`workload among the nodes. To maintain the relative
`sequence of 3D primitives that must be processed by the
`raster subsystem in a given order (and are therefore
`order dependent),
`the 3D segments are assigned se-
`quence numbers as they are distributed to the process-
`ing nodes. Successively dispatched order-independent
`segments are assigned the same sequence number, while
`order-dependent segments are assigned successively
`increasing sequence numbers. In addition to the se-
`quence numbers, end tags are sent to the processing
`nodes to indicate hiatuses in the incoming 3D data
`stream.
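The numbering rule described above can be illustrated with a minimal Python sketch. The function name `assign_sequence_numbers` and the representation of a segment as a `(payload, order_dependent)` pair are assumptions made for illustration only; the patent does not specify them. The sketch also assumes that a run of consecutive order-independent segments shares one number and that each order-dependent segment receives a number of its own.

```python
from typing import Iterable, List, Tuple


def assign_sequence_numbers(
    segments: Iterable[Tuple[str, bool]], start: int = 0
) -> List[Tuple[int, str]]:
    """Tag each (payload, order_dependent) segment with a sequence number.

    Hypothetical helper: consecutive order-independent segments share a
    number; every order-dependent segment gets the next higher number.
    """
    tagged = []
    seq = start
    prev_dependent = None  # None until the first segment has been seen
    for payload, dependent in segments:
        # Advance the number whenever ordering matters on either side of
        # the boundary between the previous segment and this one.
        if prev_dependent is not None and (dependent or prev_dependent):
            seq += 1
        tagged.append((seq, payload))
        prev_dependent = dependent
    return tagged


stream = [("a", False), ("b", False), ("c", True),
          ("d", True), ("e", False), ("f", False)]
tagged = assign_sequence_numbers(stream)
```

Here segments `a` and `b` share a number, `c` and `d` each get their own, and `e` and `f` share the next one.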
`
`Segments of the 2D data stream that are sent to the
`2D subsystem are not assigned sequence numbers; al-
`though the processing of these segments is generally
`order dependent, they necessarily retain their original
`order since, unlike the 3D subsystem, the 2D subsystem
`does not have parallel processing channels. On the
`other hand, as in the 3D subsystem, end tags are sent to
`the 2D subsystem to indicate hiatuses in the incoming
`2D data stream.
`The prioritizer interposed between the 2D and 3D
`subsystems and the raster subsystem has a 2D port for
`the 2D subsystem and a 3D port for each processing
`node of the 3D subsystem. Each port has associated
`with it a FIFO for buffering incoming data pending its
`further processing. In general, the prioritizer processes
`the 3D data (by dispatching it to the raster subsystem) in
`
`order of sequence number, so that order-dependent
`primitives maintain their original sequence. The priorit-
`izer services each of the 3D ports in turn in recirculat-
`ing fashion, servicing a given port until it encounters
`either a new sequence number or an end tag indicating
`a temporarily empty port. Before proceeding to service
`the next 3D port, however, the prioritizer checks the
`2D port to determine whether it is empty. If not, the
`prioritizer services the 2D port until it encounters an
`end tag (indicating a gap in the 2D data stream), at
`which time it switches to the next 3D port.
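The servicing policy just described can be sketched in Python as follows. This is only an illustration of the interleaving (2D port checked before each 3D burst, 3D ports visited in recirculating order); selection by lowest sequence number is deliberately omitted. The entry encoding as `("seq", n)`, `("data", x)`, and `("end", None)` tuples and the function name `interleave` are assumptions, not taken from the patent, and all FIFO contents are assumed to be present up front rather than arriving over time.

```python
from collections import deque


def interleave(port_2d, ports_3d):
    """Drain the FIFOs, checking the 2D port before every 3D burst."""
    out = []
    q2d = deque(port_2d)
    q3d = [deque(p) for p in ports_3d]
    current = 0  # index of the 3D port to service next
    while q2d or any(q3d):
        # 2D has priority: service it until an end tag (or it runs dry).
        while q2d:
            kind, value = q2d.popleft()
            if kind == "end":
                break
            out.append(value)
        if q3d:
            fifo = q3d[current]
            started = False
            # One 3D burst: stop at an end tag, or at a sequence number
            # that begins a new group (which stays queued for later).
            while fifo:
                kind, value = fifo[0]
                if kind == "seq" and started:
                    break
                fifo.popleft()
                started = True
                if kind == "end":
                    break
                if kind == "data":
                    out.append(value)
            current = (current + 1) % len(q3d)
    return out


order = interleave(
    [("data", "m1"), ("data", "m2"), ("end", None),
     ("data", "w1"), ("end", None)],
    [[("seq", 0), ("data", "a1"), ("data", "a2"),
      ("seq", 1), ("data", "a3"), ("end", None)],
     [("seq", 0), ("data", "b1"), ("end", None)]],
)
```

In this example the 2D items `m1` and `m2` go out first, one 3D group follows, and the 2D item `w1` is passed through before the next 3D port is serviced.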
`The primary advantage of this system over the prior
`art is that it allows the 2D and 3D data streams to be
`processed concurrently and interleaved in such a way
`that the 2D data stream is not forced to wait for large
`amounts of 3D data to be processed before it can be
`processed.
`As an example, in some systems, if a computationally
`intense piece of 3D work is given to the system to do
`(such as a NURBS surface or a high quality factor cir-
`cle), all 2D work on the system must stop while the 3D
`computations are completed. The 3D work may take
`many seconds or even minutes to complete. During this
`time, if the user wants to pop up a menu or open a new
`window, he will find that the system will not respond to
`the request until the 3D work is done. This is very
`disconcerting to the user and may even lead him to
`believe that the system is dead. In the present system, by
`contrast, the 3D output is temporarily interrupted while
`the 2D work goes on, so the menu or window appears
`almost as quickly as if the 3D work were not going on.
`Furthermore, the 3D output only is affected. The 3D
`processing continues with the output being buffered
`until the prioritizer again selects the 3D subsystem.
`Note that 3D processing is never halted.
`The separate 2D subsystem, with a direct, prioritized
path into the raster subsystem via the prioritizer, provides
the consistent high-performance, low-latency
`processing for a 2D (e.g., X Windows) data stream in
`conjunction with a 3D subsystem which is indepen-
`dently scalable to meet a range of processing needs.
`An additional advantage of this system is the reduc-
`tion in the amount of data which must be saved and
`restored when switching between the 3D and 2D pro-
`cessing. In current systems, since a single processor or
`processor complex is processing both data streams, it
`must completely save the state of the process in order to
`switch from one to the other; in the case of a 3D pro-
`cess, this is typically a large amount of data. In the
`present system, this is unnecessary, since the state of
`each process is maintained on independent processors in
`the 2D and 3D subsystems.
`The above as well as additional objects, features, and
`advantages of the present invention will become appar-
`ent in the following detailed written description.
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`FIG. 1 depicts a pictorial representation of a com-
`puter system in which the present invention may be
`implemented in accordance with a preferred embodi-
`ment of the present invention;
`FIG. 2 is a block diagram of selected components in
`a personal computer in which a preferred embodiment
`of the present invention may be implemented;
`FIG. 3 depicts a block diagram of a graphics subsys-
`tem constructed in accordance with a preferred em-
`bodiment of the present invention;
`
`FIG. 4 is a block diagram of the FIFO associated
`with the 2D port of the reordering device shown in
`FIG. 3;
`FIG. 5 is a block diagram of the FIFO associated
`with each 3D port of the reordering device shown in
`FIG. 3;
`FIG. 6 is a state diagram illustrating how the reorder-
`ing device interleaves servicing of its 2D and 3D ports;
`and
`FIG. 7 depicts a high level flowchart of a method and
`system for recombining processed Work Groups.
`
`DESCRIPTION OF THE PREFERRED
`EMBODIMENT
`
`With reference now to the figures and in particular
`with reference to FIG. 1, there is depicted a pictorial
`representation of a computer system in which the pres-
`ent invention may be implemented in accordance with a
`preferred embodiment of the present invention. A com-
`puter 50 is depicted which includes a system unit 52, a
`video display terminal 54, a keyboard 56, and a mouse
`58. Computer 50 may be implemented utilizing any
`suitable computer such as an IBM PS/2® personal
`computer or an IBM RISC System/6000® worksta-
`tion, both products of International Business Machines
`Corporation. (RISC System/6000 and PS/2 are regis-
`tered trademarks of International Business Machines
`Corporation.) A preferred embodiment of the present
`invention may be implemented in other types of data
`processing systems, for example, host-attached graphics
`systems such as the IBM 5080 and 6090 graphics sys-
`tems or minicomputers.
`Referring now to FIG. 2, there is depicted a block
`diagram of selected components in computer 50 in
`which a preferred embodiment of the present invention
`may be implemented. System unit 52 preferably in-
`cludes a system bus 60 for interconnecting and estab-
`lishing communication between various components in
`system unit 52. Microprocessor 62 is connected to sys-
`tem bus 60 and may also have numeric coprocessor 64
`connected to it. DMA controller 66 is also connected to
`system bus 60 and allows various devices to appropriate
`cycles from microprocessor 62 during large I/O trans-
`fers.
`
Read only memory (ROM) 68 is mapped into the
microprocessor 62 address space. ROM 68 and random
access memory (RAM) 70 are also connected to system
bus 60. ROM 68 contains the
`power-on self test (POST) and the Basic Input/Output
`System (BIOS) which control hardware operations,
`such as those involving disk drives and the keyboard.
`CMOS RAM 72 is attached to system bus 60 and con-
`tains system configuration information.
`Also connected to system bus 60 are memory control-
`ler 74, bus controller 76, and interrupt controller 78
`which serve to aid in the control of data flow through
`system bus 60 between various peripherals, adapters,
and devices. System unit 52 also contains various
input/output (I/O) controllers such as keyboard and mouse
`controller 80, video controller 82, parallel controller 84,
`serial controller 86, and diskette controller 88. Key-
`board and mouse controller 80 provides a hardware
`interface for keyboard 90 and mouse 92. Video control-
`ler 82 provides a hardware interface for video display
`terminal 94. Parallel controller 84 provides a hardware
`interface for devices such as printer 96. Serial controller
`86 provides a hardware interface for devices such as a
`modem 98. Diskette controller 88 provides a hardware
`interface for floppy disk unit 100. Expansion cards may
`also be added to system bus 60, such as disk controller
`102, which provides a hardware interface for hard disk
`unit 104. Empty slots 106 are provided so that other
`peripherals, adapters, and devices may be added to
`system unit 52. A preferred embodiment of the present
`invention may be added to system unit 52 in the form of
`a graphics adapter placed into empty slots 106.
`Those skilled in the art will appreciate that the hard-
`ware depicted in FIG. 2 may vary for specific applica-
`tions. For example, other peripheral devices such as:
`optical disk media, audio adapters, or chip program-
`ming devices such as a PAL or EPROM programming
`device, and the like may also be utilized in addition to or
`in place of the hardware already depicted.
`In accordance with a preferred embodiment of the
`present invention, processors may be arranged in paral-
`lel pipelines to form processing nodes. These processing
`nodes are utilized to perform the bulk of the graphics
`computations for a data processing system. The proces-
`sors receive data from input communications paths and
`perform required computations, such as transforma-
`tions, clipping, lighting, etc. Each processor in a pro-
`cessing node passes intermediate data to the following
`processor to allow it to continue the calculations. This
`allows the computations to be spread among the proces-
`sors within a processing node. Each processor may
`have its own memory, and the communications paths
`are designed to allow data movement to occur without
`impacting the ability of the processors to access their
`code and data memory in accordance with a preferred
`embodiment of the present invention.
`FIG. 3 is a block diagram of a graphics subsystem 300
`constructed in accordance with a preferred embodi-
`ment of the present invention. Graphics subsystem 300,
`which is contained within video controller 82 (FIG. 2),
`includes a 2D subsystem 301 and a 3D subsystem 303.
`The 3D subsystem 303 is in turn formed of a plurality of
processing pipelines or nodes 305, as described below.
`Graphics subsystem 300 receives interleaved 2D and
`3D graphics data streams through a bus interface 302,
`which is coupled to the system bus 60 of the host system
`52 utilizing presently available techniques well known
`to those skilled in the art. The 3D graphics data stream
`may be divided up or partitioned into Work Elements.
A Work Element (WE) may be (1) a drawing primitive,
which is a command to draw, e.g., a line, a polygon, a
triangle, or text; (2) an attribute primitive, which is a
command to change an attribute (also called an attribute
change), e.g., color or line style; or (3) a context primitive,
which is context information for an area of display or a
window. Both the 2D and the 3D graphics data streams
may be stored in a work element RAM 304.
`An attribute processor (AP) 306 performs prepro-
`cessing of the incoming 2D and 3D data streams (such
`as graphics attribute processing) and dispatches work to
`the 3D processing nodes 305 or to the 2D subsystem
`301, as appropriate. Attribute processor 306 may be
`either a suitably programmed general-purpose proces-
`sor or a special-purpose logic circuit.
`Attribute processor 306 reads work from an input
`FIFO, memory or other input path and moves work
`groups to the appropriate processing node 305. This
`processor is also responsible for operations such as in-
`cluding a sequence number with the work groups so
`that the work groups may be reordered after processing
`by the processing nodes 305. Also, for some graphics
`data streams, the processor may perform display list
`processing and non-drawing processing.
`Attribute processor 306 is utilized to parse or parti-
`tion the 3D data stream into multiple segments in accor-
`dance with a preferred embodiment of the present in-
`vention. Each segment is also called a work group
`(WG), and each work group may contain one or more
`work elements. The number of work elements in a work
`group may be determined by various factors such as the
`amount of processing time that it takes to process a
`work group versus the amount of processing time it
`takes to group work elements into a work group. Attri-
`bute processor 306 is coupled to a RAM 308, which is
`employed to store various instructions or data utilized
`by attribute processor 306. Additionally, attribute pro-
`cessor 306 may move data by utilizing other devices
`such as DMA controllers, processors, or with internal
`features within the attribute processor itself. Attribute
`processor 306 may perform graphics processing and
`supply current attribute data to the processing nodes
`305 along with the work to be done.
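As a rough illustration of partitioning work elements into work groups and distributing them across processing nodes, consider the Python sketch below. The fixed `group_size`, the per-element cost estimate, and the least-loaded dispatch rule are all assumptions introduced for this example; the text states only that group size depends on processing-time trade-offs and that the workload is balanced among the nodes.

```python
from typing import List, Sequence, Tuple

WorkElement = Tuple[str, int]  # (name, estimated cost), hypothetical


def make_work_groups(
    elements: Sequence[WorkElement], group_size: int
) -> List[List[WorkElement]]:
    """Split the stream into consecutive work groups of at most group_size."""
    return [list(elements[i:i + group_size])
            for i in range(0, len(elements), group_size)]


def dispatch(groups: List[List[WorkElement]], node_count: int):
    """Send each group to the node with the least accumulated cost."""
    loads = [0] * node_count
    assignment = []
    for group in groups:
        node = loads.index(min(loads))  # ties go to the lowest node index
        loads[node] += sum(cost for _, cost in group)
        assignment.append(node)
    return assignment, loads


elements = [("line", 1), ("polygon", 5), ("triangle", 2),
            ("text", 1), ("polygon", 5)]
groups = make_work_groups(elements, group_size=2)
assignment, loads = dispatch(groups, node_count=2)
```

With two nodes, the three groups (costs 6, 3, and 5) land on nodes 0, 1, and 1 respectively.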
`A video RAM (VRAM) 310 stores attribute informa-
`tion, in the form of processed attribute primitives, from
`the data streams along with font information and other
`context-related data in accordance with a preferred
`embodiment of the present invention. Attribute proces-
`sor 306 copies attribute data from the graphics data
`streams into VRAM 310. A shared RAM 312 is utilized
`to store font and context data. Both VRAM 310 and
`shared RAM 312 are shared memory areas utilized for
`storing globally accessed data, such as graphics context
`information, fonts, and attribute data. This type of mem-
`ory may be accessible by all of the processors, but is
`accessed relatively infrequently. As a result, contention
`for bus access to this type of memory has minimal im-
`pact on performance.
`Attribute processor 306 distributes work groups to
`the processing nodes 305 through communications
`paths 313. Communications paths 313 are utilized for
`passing data between the various processors in accor-
`dance with a preferred embodiment of the present in-
`vention. These communications paths may be memory
`ports, or any type of hardware well known to those
`skilled in the art that provides a data path to another
`processor.
Although not necessary for an understanding of the
`present invention, further details of the operation of
`attribute processor 306 and other elements of the graph-
`ics subsystem 300 may be found in the above-identified
`copending application Ser. No. 07/926,724, the specifi-
`cation of which is incorporated herein by reference.
`Each of the processing nodes 305 includes a first
`processor 314 coupled to a RAM 316 and a second
`processor 318 coupled to a RAM 320. Processor 314
`and processor 318 are serially coupled to each other.
Processors 314 and 318 are TMS320C40 processors
manufactured by Texas Instruments Incorporated in
accordance with a preferred embodiment of the present
invention. Information on programming and utilizing
TMS320C40 processors may be found in TMS320C4x
User's Guide, available from Texas Instruments
Incorporated. RAM 316 and RAM 320 are utilized to store
`instructions and data for processor 314 and processor
`318 respectively.
`The number of processing nodes 305 may vary in
accordance with a preferred embodiment of the present
`invention. Although the depicted embodiment shows
`only two processors per processing node 305, it is con-
`templated that other numbers of processors may be
`utilized in each processing node. Additionally, if more
`than one processor is in a processing node 305, it is not
`necessary that all of the processors in the processing
`node be of the same type or make.
`Processing nodes 305 are separated by bus transceiv-
`ers 321a, 321b, and 321c, which are well known in the
`art. These bus transceivers control access to VRAM
`310 and Shared RAM 312 by the processing nodes 305.
`Closing the bus transceivers creates a single bus, while
`opening the bus transceivers creates two buses. When
`the bus transceivers are all open, node processors 318
`have access to shared RAM 312, while node processors
`314 have access to VRAM 310. Closing all of the bus
`transceivers results in all of the processors in the pro-
`cessing nodes 305 being able to access both shared
`RAM 312 and VRAM 310. Although only three bus
`transceivers and one shared RAM and one VRAM are
`shown in the depicted embodiment, other numbers of
`bus transceivers, and various numbers and types of
`RAM may be utilized in accordance with a preferred
`embodiment of the present invention.
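The effect of the bus transceivers on memory access can be summarized in a small Python sketch. The function name `accessible_memories` and the string labels are hypothetical; only the two cases stated in the text (all transceivers open, all transceivers closed) are modeled.

```python
VRAM = "VRAM 310"
SHARED_RAM = "shared RAM 312"


def accessible_memories(processor: int, transceivers_closed: bool) -> set:
    """Return the memories a node processor (314 or 318) can reach."""
    if processor not in (314, 318):
        raise ValueError("expected node processor 314 or 318")
    if transceivers_closed:
        # A single bus: every node processor reaches both memories.
        return {VRAM, SHARED_RAM}
    # Two separate buses: 314 sees VRAM, 318 sees shared RAM.
    return {VRAM} if processor == 314 else {SHARED_RAM}
```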
`As work groups are processed within the processing
`nodes 305, the processed work groups are sent from the
`processing nodes, via a bus 324, to a reordering device
`322 en route to a raster subsystem 326.
`Reordering device 322 combines the processed 3D
`data from the processing nodes 305 into a single 3D data
`stream for transmission to the raster subsystem 326.
Reordering device 322 also merges the processed 3D
`data stream from 3D subsystem 303 with the processed
`2D data stream from 2D subsystem 301 to form a single
`combined data stream for the raster subsystem 326. In
`this particular embodiment, reordering device 322 is an
`application-specific integrated circuit (ASIC). How-
`ever, reordering device 322 may also be a processor or
`other specialized logic circuit.
`As noted above, processed work groups are recom-
`bined to produce a processed graphics data stream,
which is sent to raster subsystem 326, which may be a
specialized ASIC or a processor, for display of a pixel
`image on video display terminal 94 (FIG. 2). The reor-
`dering or recombining of the processed work groups is
`accomplished by assigning a tag or sequence number to
`each work group in accordance with a preferred em-
`bodiment of the present invention. Reordering device
`322 utilizes the synchronization tags to determine the
`order in which to place work groups to produce a data
`stream.
`
`In some cases, the order in which work groups are
`placed may be extremely important, and in other cases,
`the order of work groups may be unimportant. As a
`result, in addition to dividing up a graphics data stream
`into segments, attribute processor 306 may be utilized to
`determine the order in which the segments are reor-
`dered or reassembled at reordering device 322 in accor-
`dance with a preferred embodiment of the present in-
`vention. Furthermore, attribute processor 306 deter-
`mines whether or not the order of a work group is
`important and assigns synchronization tags or sequence
`numbers to each work group to reflect this in accor-
`dance with a preferred embodiment of the present in-
`vention. This determination may be dependent on vari-
`ous factors such as the type of graphics data stream
`being processed or their drawing locations on the
`screen. These synchronization tags or sequence num-
`bers are utilized by reordering device 322 to determine
`the order in which to send processed graphics data to
`raster subsystem 326 in accordance with a preferred
`embodiment of the present invention. Work groups
`which do not require any temporal order may be as-
`signed the same synchronization tag or sequence num-
`ber. Reordering device 322 passes these primitives to
`raster subsystem 326 as it encounters them; it will not
`force one to be drawn before another. When order-
`dependent primitives are encountered, attribute proces-
`sor 306 assigns successive sequence numbers that are
`then used by reordering device 322 to output the primi-
`tives in the correct order to raster subsystem 326. The
`disclosed system thus allows those primitives that can
`be drawn without regard to order to be drawn at will,
`while those that must be drawn sequentially are drawn
`sequentially.
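A minimal Python sketch of recombining node outputs by sequence number is given below. It assumes, for illustration, that each node's output is a list of `(sequence_number, payload)` groups already in that node's arrival order and that all output is buffered before merging; an actual reordering device would have to wait when a FIFO is temporarily empty. The function name `reorder` is hypothetical.

```python
from collections import deque


def reorder(node_outputs):
    """Merge per-node outputs so sequence numbers never decrease."""
    fifos = [deque(output) for output in node_outputs]
    merged = []
    while any(fifos):
        # Pick the non-empty FIFO whose oldest entry carries the lowest
        # sequence number; equal numbers may be taken in any order.
        port = min(
            (i for i, fifo in enumerate(fifos) if fifo),
            key=lambda i: fifos[i][0][0],
        )
        merged.append(fifos[port].popleft())
    return merged


merged = reorder([
    [(0, "a"), (2, "c")],
    [(0, "b"), (1, "x"), (3, "d")],
])
```

Groups `a` and `b` share sequence number 0 and could be emitted in either order; the groups numbered 1, 2, and 3 follow in sequence regardless of which node produced them.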
`Reordering device 322 has a 2D port 323 for receiv-
`ing the processed 2D data stream from 2D subsystem
`301 and a 3D port 325 for receiving processed 3D data
`from each processing node 305 of 3D subsystem 303;
`each 3D port 325 is associated with a particular node
`305 of the 3D subsystem 303.
`Referring now to FIG. 4, associated with the 2D port
`323 of reordering device 322 is a FIFO 400 for receiv-
`ing and storing the 2D data stream from 2D subsystem
`301 while awaiting dispatching to raster subsystem 326.
`Incoming elements are added to the top of the occupied
`area of FIFO 400 as shown in the figure, while outgoing
`elements are removed from the bottom of the occupied
`FIFO area as shown in the same figure. Any suitable
`means known in the art, such as pointers to an address-
`able memory, may be used to realize FIFO 400. At a
`given instant in time, FIFO 400 might contain a plural-
`ity of data entries 402, constituting elements of the pro-
`cessed data stream from 2D subsystem 301, and an
`“end” tag 404 at the top of the occupied buffer area
`indicating a gap in the 2D data stream. In the embodi-
`ment shown, end tag 404 is added to the stream either
`by attribute processor 306 or by the 2D subsystem pro-
`cessor (to be described) when it detects a gap in the 2D
`data stream.
`In a similar manner, referring now to FIG. 5, associ-
`ated with each 3D port 325 of reordering device 322 is
`a FIFO 500 for receiving and storing 3D data from the
`corresponding node 305 of 3D subsystem 303 while
`awaiting dispatching to raster subsystem 326. As with
`the 2D FIFO 400, incoming elements of the 3D data
`stream are added to the top of the occupied area of each
`FIFO 500 as shown in FIG. 5, while outgoing elements
`are removed from the bottom of the occupied FIFO
`area as shown in the same figure. Any suitable means
`known in the art, such as pointers to an addressable
`memory, may be used to realize FIFO 500. At a given
`instant in time, each 3D FIFO 500 might contain a
`plurality of groups of data entries 504 constituting ele-
`ments of the processed 3D data stream from the corre-
`sponding processing node 305 of 3D subsystem 303,
`with each group of entries being preceded by a se-
`quence number 502 for that group. An “end” tag 506 at
`the top of the occupied buffer area indicates a gap in the
`3D data stream. In the embodiment shown, attribute
`processor 306 adds sequence numbers 502 to the por-
`tions of the 3D data stream that it distributes to the
`processing nodes 305 to indicate the order in which the
`primitives are to be recombined for processing by raster
subsystem 326. Attribute processor 306 adds end tag 506
to each distributed portion of the 3D data stream when
it detects a gap in the 3D data stream.
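The layout of a 3D FIFO (a sequence number, then that group's data entries, with an end tag marking a gap) can be illustrated with the following Python sketch. The tuple encoding of entries and the function name `split_groups` are assumptions for illustration; entries are listed oldest first, i.e., from the bottom of the occupied FIFO area upward.

```python
def split_groups(entries):
    """Split FIFO entries into (sequence_number, [data...]) groups.

    Returns the groups found and whether an end tag was reached.
    """
    groups = []
    ended = False
    for kind, value in entries:
        if kind == "seq":
            groups.append((value, []))  # a sequence number opens a group
        elif kind == "data":
            if not groups:
                raise ValueError("data entry before any sequence number")
            groups[-1][1].append(value)
        elif kind == "end":
            ended = True  # gap in the stream: nothing more for now
            break
        else:
            raise ValueError(f"unknown entry kind: {kind!r}")
    return groups, ended


fifo_500 = [("seq", 4), ("data", "p0"), ("data", "p1"),
            ("seq", 5), ("data", "p2"), ("end", None)]
groups, ended = split_groups(fifo_500)
```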
`Reordering device 322 recombines 3D data arriving
`from the various nodes 305 in such a manner as to en-
`sure that the data reaches raster subsystem 326 in the
`correct order. Reordering device 322 receives the se-
`quence number 502 of each primitive, as indicated
`above, and selects the next sequential primitive to draw.
`The sequence number is incremented after each order-
`dependent primitive (or set of order-independent primi-
`tives) is passed to the raster subsystem 326. FIFOs 400
`and 500 contain sufficient buffering capability to allow
`the 2D subsystem 301 and the processing nodes 305 to
`write their output to the reordering device 322 and
`continue processing, even if their output data is not
`currently selected. Preferably, reordering device 322
`also allows data t