`Kcllcher
`
`[19]
`
`[54] PARALLEL-PROCESSOR GRAPHICS
`ARCHITECTURE
`
`[75]
`
`Inventor: Brian Michael Kelleher. Palo Alto.
`(‘ali.f.
`
`[73] Assignee: Dynamic Pictures, Inc" Palo Alto.
`Calif.
`
`[21] Appl. No.: 570.088
`[22]
`Filed:
`Dec. 11, 1995
`
`Int. Cl.‘5 .
`[S11
`[52] U.S. Cl.
`..
`
`...................................... G06F 15/80
`.. 395/505; 395/509: 395/497.04:
`395/412
`[58] Field of Search ..................................... 395/502. 505.
`395/506. 507. 509. 401. 402. 497.0l—497.04.
`412-419. 515. 516; 382/303. 304
`
`[56]
`
`4.761.736
`4.949.280
`5.117.468
`5.276.798
`5.315.699
`5.377.320
`5.388206
`
`References Cited
`U.S. PATENT DOCUMENTS
`8/1938 Di Ono
`8/1990 Littlefield .
`5/1992 Hind et al.
`1/1994 Peaslee et a1.
`5/1994 Imai et al.
`12/1994 Abi-Ezzi et al.
`2/1995 Poulton et al.
`
`......
`
`PC!
`
`395/402
`395/505
`382/304
`395/S05
`.. 395/505
`395/502
`.. 395/S05
`
`it)
`'
`
`IlllllllllllllllllllllllllllllllllllllllIlllllllllllllllllllllllllll
`US0057940 1 6A
`
`[11] Patent Number:
`
`[45] Date of Patent:
`
`5,794,016
`
`Aug. 11, 1998
`
`5.392.393
`5.-103.606
`5,434,967
`5.-140,682
`5.493.643
`
`
`
`2/1995 Deering et al.
`395/S05
`4/1995 Eckar1
`.. 395/502
`7/1995 Tanuenhaum et al
`395/506
`
`8/1995 Deering ..........
`395/503
`.. 395/505
`2/1996 Soderberg et al.
`U1‘HER PUBLICATIONS
`
`“Simulation and Expected Performance Analysis of Mul-
`tiple Processor Z—Bu1fer Systems”. Frederick I. Parke. Com-
`puter Graphics. vol. 14. No. 3. pp. 46-56 (Jul.. 1980).
`“High—Performance Polygon Rendering". Kurt Ake1ey.Tom
`Jermoluk. Silicon Graphics. Inc.. pp. 6-12. (1988).
`
`Primary Irixaminz-r——Kee M. Tung-
`Attomcy, Agent, or Firm4awyer & Associates
`[57]
`ABSTRACT
`
`A parallel-processor graphics architecture appropriate for
`multimedia graphics workstations that is scalable to the
`needs of a user. The graphics architecture includes one or
`more rendering processors and a graphics memory that is
`partitioned into blocks. Noncontiguous groups of the blocks
`are then assigned to different processors. The parallel»
`processor graphics architecture is scalable by the number of
`rendering processors utilized. and is configurable with
`respect to the allocation of the groups of the blocks to
`specific rendering processors.
`
`10 Claims, 8 Drawing Sheets
`
`COMMAND
`QUEUE
`
`5*‘
`
`CONFIG REG.
`
`
`7-‘l~.
`
`. X-density
`
`ts-1 -— I
`
`DISPATCHER
`
`GEOMFTRIC
`SETUP
`
`no J
`
`7" "‘
`
`A'[‘TRlBLITF,
`,
`_
`p
`WT“
`1
`
`_.
`-.
`(ilII\F.l-Z./\TOR
`
`M _,
`AI)I)RLsS
`
`
` 1
`
`PI l\"Sl(‘/\|
`MF.'VlUl'{Y
`+\Dl)RF.SS
`
`
`
`PIXEL
`ATTRI HUT!-.
`l).v\'I .»’\
`
`0001
`
`Volkswagen 1005
`
`76
`
`msPA'rcH
`E
`
`..
`SET‘-P
`34;;
`
`lll’I)A'l‘l:
`E
`
`
`
`
`
`
`
`
` Block Enable /
`
`0001
`
`Volkswagen 1005
`
`
`
`U-S- Patent
`
`Aug. 11, 1993
`
`Sheet 1 of 3
`
`5,794,016
`
`SYS.
`CHIP SET
`
`MAIN
`MEMORY
`
`PROCESSOR
`
`B’
`
`0002
`
`
`
`U.S. Patent
`
`Aug. 11,1993
`
`Sheet 2 of8
`
`5,794,016
`
`SYS.
`CHIP SET
`
`MAIN
`MEMORY
`
`I)
`..0A
`
`7
`«OB
`
`PROCESSOR
`pf
`
`to
`vdac1
`
`PROCESSOR
`1
`
`32A
`
`3213 ~'"’.
`
`0003
`
`
`
`U.S. Patent
`
`Aug. 11, 1993
`
`Sheet 3 of 3
`
`5,794,016
`
`BRIDGE
`
`33
`
`FI5III1lII1I
`
`0004
`
`
`
`U.S. Patent
`
`0...UA
`
`I1
`
`11!
`
`89911
`
`Sheet 4 of 8
`
`0005
`
`
`
`U.S. Patent
`
`la.12.UA
`
`89911
`
`Sheet 5 of 8
`
`0006
`
`
`
`U.S. Patent
`
`Aug. 11, 1993
`
`Sheet 6 of 8
`
`5,794,016
`
`DISPA -
`80TCH
`
`COM M AND
`QUEUE
`
`-
`
`Block Enable
`
`DISPATCHER
`
`GEOM ETRIC
`SETUP
`
`ATTRIBUTE
`SETU P
`
`(R.G.B.Z,T)
`
`Gf:‘{f€§f1§gR
`
`INTERPOLATOR
`
`PHYSICAL
`MEMORY
`ADDRESS
`
`PIXEL
`ATTRIB UTE
`DATA
`
`0007
`
`
`
`U.S. Patent
`
`Aug. 11, 1993
`
`Sheet 7 of 3
`
`5,794,016
`
`0008
`
`
`
`U.S. Patent
`
`Aug. 11, 1993
`
`Sheet 8 of 8
`
`5,794,016
`
`0009
`
`
`
`5.794.016
`
`1
`PARALLEL-PROCESSOR GRAPHICS
`ARCHITECTURE
`
`FIELD OF THE INVENTION
`
`5
`
`The present invention relates to computer graphics and
`more particularly to a parallel-processor graphics architec-
`ture appropriate for multimedia graphics workstations that is
`scalable to the needs of a user.
`
`BACKGROUND OF THE INVENTION
`
`The tasks of traditional geometric graphics applications
`can be partitioned into three functional categories: structure
`traversal. geometry processing. and rendering. Structure
`traversal refers to the traversal of an application’s graphics
`data structure. either by the application or by a graphics
`library.
`Geometry processing refers to floating point intensive
`operations. such as vertex transformation and shading. that
`convert the image data from an application's format into a
`geometric format of vertices comprising the image and
`vertex properties. such as color. Finally. rendering refers to
`the process of calculating individual pixel values for the
`image that are stored in graphics memory based on the
`transformed geometric data.
`Designers have recognized that substantial gains in graph-
`ics system performance are possible through the use of
`multiple processors operating in parallel. One problem with
`multi-processor graphic subsystem designs is determining
`how to distribute the rendering tasks among the processors.
`Prior methods have divided graphics memory or display
`screen into partitions and allocated different parts of the
`memory or screen to separate processors. However. those
`methods are ineflicient. In some methods. the processors
`worked on rendering different parts of the same object and
`the processors were therefore dependent on one another. If
`one processor became overloaded and slowed down.
`it
`caused the other processors also to slow down.
`In other methods. the processors were assigned to large
`contiguous areas of memory. For instance. in a four proces-
`sors system. the display screen would be divided into four
`large adjacent blocks with each processor assigned to a
`dilferent block. The problem with this method is that it
`reduces the efliciency of the parallelism of the graphics
`system. If there were no objects for a processor to render in
`its area. then the processor sat idle and under utilized. while
`the other processors may have been overloaded. An extreme
`solution to this problem would be to assign one processor to
`one pixel. but as more and more processors are added.
`additional gains in performance decrease. while costs
`increase dramatically.
`Typical graphic systems include a general-purpose base
`system with its own central processing unit and main
`memory. and special—purpose graphics hardware to acceler-
`ate the graphics applications. Since the introduction of the
`first special-purpose graphics hardware. physical memory
`restrictions have burdened low-level graphics software and
`firmware developers. To solve this problem. some graphics
`software interfaces provide a virtual memory abstraction to
`their applications to allow the application a view of an
`unlimited number of coexistent images it can create and
`maintain through the software interface.
`Early examples of graphics virtual memory systems
`include a simple memory-mapped frame-buffer method and
`a rendering processor method The memory-mapped frame-
`buffer method achieved virtual memory rendering by allow-
`
`2
`ing the CPU to implement the rendering algorithms using a
`virtual memory system in the base system as the graphic
`memory system. Similarly. the rendering processor method
`also used the main memory of the base system. but inter-
`faced the main memory with a special-purpose rendering
`processor through a wide bus. While these systems have
`very diiferent implementations.
`they are similar in their
`ability to render directly into processor virtual memory.
`Each of these methods tightly integrates the graphics with
`the main memory of the base system. Unfortunately. graphic
`systems marketed today do not satisfy the requirements of a
`full ra.nge of customers. For example. a workstation that uses
`the CPU to perform rendering operations may be an accept-
`able low-cost solution; however. some customers will
`undoubtedly demand higher performance. Similarly. a work-
`station that utilizes multiple processors to increase system
`performance may not be scalable downward to please cost-
`sensitive customers.
`
`Therefore. what is needed is a graphics system architec-
`ture that includes a low-cost entry configuration with limited
`graphics hardware. and allows a higher performance multi-
`processor configuration in which processor efficiency is
`maximized. The present invention addresses such a need
`
`SUMMARY OF THE INVENTION
`
`The present invention provides a parallel graphics system
`and method for rendering an image on a display screen. The
`system includes a plurality of processors for rendering an
`image on the display screen. and a memory organized into
`noncontiguous groups of blocks. The system and method
`further comprises block enable means for allocating at least
`one of the noncontiguous groups of blocks to each of the
`plurality of processors for rendering. thereby increasing the
`efflciency of the parallelism of the graphics system.
`According to the system and method disclosed herein. the
`present invention is scalable by the number of rendering
`processors utilized. and is configurable with respect to the
`allocation of the groups of the blocks to specific rendering
`processors.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`FIG. 1 depicts a low cost implementation of a graphics
`system having one rendering processor.
`FIG. 2 is a block diagram depicting the graphics system
`with two rendering processors.
`FIG. 3 depicts a high—end implementation of the graphics
`system with N rendering processors.
`FIG. 4 is a block diagram depicting a graphics memory
`that has been divided into a plurality of blocks.
`FIGS. 5 and 6 are block diagrams depicting the noncon-
`tiguous block organization of the present invention within a
`4x4 array of adjacent blocks.
`FIG. 7 is a block diagram of a rendering processor.
`FIG. 8 is a block diagram showing three triangles. triangle
`A. triangle B. and triangle C that are to be rendered in blocks
`0 and 1.
`
`FIG. 9 is a block diagram showing a representative pixel
`and pixelmap within a graphics virtual memory 90.
`FIG. 10 is a block diagram illustrating a rendering pro-
`cessor and an associated memory segment.
`FIG. 11 is a functional block diagram of an address
`generation circuit.
`FIG. 12 is a block diagram illustrating the contents of a
`TLB entry.
`
`0010
`
`
`
`5.794.0 I 6
`
`_
`
`3
`DESCRIPTION OF THE INVENTION
`
`The present invention relates to an improved parallel-
`processor graphics architecture. The following description is
`presented to enable one of ordinary skill in the art to make
`and use the invention and is provided in the context of a
`patent application and its requirements. Various modifica-
`tions to the preferred embodiment will be readily apparent to
`those skilled in the art and the generic principles herein may
`be applied to other embodiments. Thus. the present inven-
`tion is not intended to be limited to the embodiment shown
`but is to be accorded the widest scope consistent with the
`principles and features described herein.
`FIGS. 1-3 are block diagrams depicting a parallel-
`-processor graphics system 10 of the present invention that is
`.scalable to the needs of a user. The system may be config-
`ured with one rendering processor. or with multiple render-
`ing processors. FIG.'l depicts a low-cost implementation of
`the graphics system 10A having one rendering processor
`20A. FIG. 2 is a block diagram depicting the graphics
`system 10B with two rendering processors 20A and 2013.
`FIG. 3 depicts a high-end implementation of the graphics
`system 10C with N rendering processors 20A-20N. In the
`multiple—processor configurations. the efliciency of the mul-
`tiple processors is maximized by effectively distributing the
`rendering tasks. explained further below.
`The graphics system 10A. 10B. and 10C (hereinafter
`referred to as graphics system 10) is part of a general-
`purpose base computer system 12 including a CPU 14. a
`system chip set 16. and main memory 18. Each rendering
`processor 20A—2l)N (hereinafter referred to as rendering
`processor 26) in the graphics system 10 may be coupled to
`a graphics memory 22. a video digital-to-analog converter
`(VDAC) 24. a peripheral component interconnect (PCI)
`ROM 26. and a VGA processor 28.
`The purpose of the rendering processor 20 is to accept a
`geometric description of an object and scan-convert the
`object into the graphics memory 22. The graphics memory
`22. also known as a frame buffer. stores pixel data at
`memory addresses corresponding to pixels on a display
`device. such as a cathode ray tube (CRT) 29. The intercon-
`nect of the rendering processor 20 comprises four major
`buses: a PCI bus 30. a memory bus 32. a video bus 34. and
`a utility bus (Ubus) 36.
`The PCI bus 30 is a 32-bit bus that connects the rendering
`processor 20 to the general-purpose computer system 12. In
`the multiple-processor configurations 10B and 10C.
`the
`rendering processors 20 are connected to a system PCI bus
`30A through a PCI bridge 38. Since only one load may be
`placed on the system PCI bus 30A. the PCI bridge 38 is used
`to translate the system PCI bus 30A into a second PCI bus
`30B.
`
`The video bus 34 provides pixel data from the rendering
`processors 20 to the VDAC 24 via a 64-bit interconnect The
`VDAC 24 is a display interface that receives ordered pixel
`data from each of the rendering processors 20 in the form of
`digital color data. and passes the data to a CRT 29.
`The Ubus 36 provides an inputloutput data path for the
`rendering processor’s 20 ancillary chips. such as the VDAC
`24. the VGA processor 28 and the PCI ROM 26. The VGA
`processor 28 is provided so that the graphics system 10 is
`compatible with legacy systems.
`The rendering processor 20 is the primary component of
`the graphics system 10 of the present invention. The ren-
`dering processor 20 implements graphics and multimedia
`algorithms and interfaces with the PCI bus 30 and all other
`
`4
`components in the system 10. The rendering processor 20
`integrates video technologies including 2D and 3D geomet-
`ric graphics. photorcalistic rendering. stereoscopic viewing.
`windowing operations. and live video.
`More particularly. each rendering processor 20 imple-
`ments a set of parallel rendering algorithms including
`smooth shadowing. depth-buffering. dithering. antialiasing.
`transparency. blending. alpha testing. stenciling. stippling.
`accumulation buffering. and texture mapping. In addition.
`the rendering processor 20 provides a video pipeline for
`multimedia applications through which live video data can
`be scaled. filtered. and converted into red. green. and blue
`color components for storage in graphics memory 22 and
`subsequent display.
`in order to increase the
`Referring to FIGS. 2 and 3.
`efliciency of the rendering processors 20.
`the graphics
`memory subsystem 22 is partitioned into physical memory
`segments 42 and each memory segment 42 is coupled to a
`difierent rendering processor 20. Each rendering processor
`20 accesses only its memory segment 42 of the graphics
`memory 22 and is required to write only those pixels falling
`within that memory segment 42. For example. rendering
`processor 20B is coupled to memory segment 42B and
`writes only to those pixels contained within memory seg-
`ment 42B of the graphics memory subsystem 22.
`The memory bus 32 connects each rendering processor 20
`to its corresponding graphics memory segment 42 via a
`64-bit
`interconnect.
`In a preferred embodiment of the
`present invention. the memory segments 42 comprise syn-
`chronous dynamic random access memory (SDRAM). In the
`low-cost one-processor configuration of FIG. 1. four 1M><16
`SDRAMs are used to construct an eight Mbyte graphics
`memory system 22. In the two-processor configuration of
`FIG. 2. eight 1M><l6 SDRAMs are used to construct a
`sixteen Mbyte graphics memory system 22. In a higher-end
`four-processor configuration. sixteen IMX 16 SDRAMs are
`used to construct a thirty—two Mbyte graphics memory
`system 22. The graphics memory 22 could also be imple-
`mented using VRAM. in which case. the graphics memory
`22 would connect directly to the VDAC 24.
`Prior methods have divided graphics memory or display
`screen into partitions and allocated ditferent parts of the
`memory or screen to separate processors. However. those
`methods are inefficient in the manner that the memory is
`allocated to the processors. which also creates dependencies
`among the processors.
`The present invention provides an improved method and
`system for allocating dilferent parts of the graphics memory
`22 to multiple rendering processors 20. The method and
`system for allocating the graphics memory 22 increases the
`eficiency of the graphics system. and is configurable such
`that additional rendering processors 20 may be added to the
`graphic architecture. making the system scalable.
`To more particularly illustrate the method and system for
`allocating graphics memory 22 to the rendering processors
`20 in accordance with the present invention. refer now to
`FIG. 4 depicting a block diagram of one embodiment of such
`a system.
`FIG. 4 is a block diagram of a graphics memory 22 that
`has been partitioned into a plurality of pixel blocks 52 that
`are tiled in the x- and y-direction of the graphics memory 22.
`In a preferred embodiment. the graphics memory 22 is large
`enough to render a 1280x1024 screen display.
`According to the present invention. the pixel blocks 52 are
`organized into noncontiguous groups of blocks 52. where
`blocks 52 in the same group are designated by the same
`
`0011
`
`
`
`5.794.016
`
`5
`number. As an example. the graphics memory 22 in FIG. 4
`is shown comprising sixteen groups of pixel blocks (0-15).
`The blocks 52 labeled “O” belong in group “O”. the blocks
`labeled
`belong in group “i”.
`the blocks labeled “3"
`belong in group “3" et cetera.
`The groups of blocks 52 in the graphics memory 22 are
`then assigned to the rendering processors 20 for rendering.
`Each rendering processor 20 writes to only those pixels that
`are located in the blocks 52 of the assigned groups. The
`groups of block 52 may be assigned to the rendering
`processors 20 in varying ratios. Assuming there are sixteen
`rendering processors present in the multi-processor system
`10C. then the sixteen groups of blocks 52 could be assigned
`to the processors 20 on a one-to—one basis. In this case. the
`blocks in group “O” may be assigned to rendering processor
`0; the blocks in group “l” may be assigned to rendering
`processor 1; the blocks in group “2" may be assigned to
`rendering processor 2. and so on.
`FIGS. 5 and 6 are block diagrams depicting the noncon-
`tiguous block organization of the present invention within
`one 4x4 array of adjacent blocks 52in the graphics memory
`22 (the 4x4 array‘ is replicated throughout the graphics
`memory 22). The sixteen groups of pixel blocks 52 within
`each 4><4 array are tiled as in FIG. 4. where a particular
`group is designated by the middle number in the blocks 52.
`and the number in the upper right corner of the blocks 52
`represents the processor to which a particular block is
`assigned
`When the number of rendering processors 20 is less than
`the number of groups of blocks 52. then the groups of blocks
`52 may be assigned to the rendering processors 20 in higher
`ratios. such as two-to-one or four-to-one. rather than the
`one-to-one ratio above.
`
`Referring to FIG. 5. what is shown is an example allo-
`cation of groups of blocks 52 when four rendering proces-
`sors 20 (0-3) are present. When sixteen groups are allocated
`among four rendering processors 20. the group of blocks 52
`are assigned to the rendering processors 20 in a four-to-one
`ratio. As shown. processor 0 is assigned to groups 0. 6. 11.
`and 13; processor 1 is assigned to groups 1. 7. 10. and 12:
`processor 2 is assigned to groups 2. 4. 9. and 15; and
`processor 3 is assigned to groups 3. 5. 8. and 14.
`Referring to FIG. 6. what is shown is an example allo-
`cation of groups of blocks 52 when two rendering processors
`20 (0-1) are present. When sixteen groups are allocated
`among two rendering processors 20. the group of blocks 52
`are assigned to the rendering processors 20 in an eight-to-
`one ratio. As shown. processor 0 is assigned to groups 0. 2.
`5. 7. 8. 10. 13. and 15; and processor 1 is assigned to groups
`1. 3. 4. 6. 9. 11. 12. and 14.
`The assignment or mapping of various groups of blocks
`52 to rendering processors 20 is referred to as a processor
`map. As shown in the processor maps of FIGS. 4-6. the
`blocks 52 within a particular group that are assigned to a
`particular processor 20 are arranged in a noncontiguous
`organization. and are spaced about the display. Therefore.
`the rendering demands placed on each of the rendering
`processors 20 should be equalized. That is. as the rendering
`processors 20 operate in parallel no rendering processor
`should be idle while others are overloaded Therefore. the
`noncontiguous block organization of the present invention
`effectively increases the overall efiiciency and performance
`of the graphics system 10.
`When a polygon to be rendered crosses a block boundary
`and enters into another block 52. the two rendering proces-
`sors 20 assigned to each of the respective group of blocks 52
`
`6
`
`must duplicate calculations for the same polygon. which
`wastes parallelism Therefore. in a preferred embodiment of
`the present invention. each pixel block 52 is 128x128 pixels
`in size. This block size has been determined to efiectively
`divide the parallelism of the rendering processors 20. Since
`polygons tend to be particularly small. and 128x128 blocks
`52 are relatively large. polygons will not commonly cross
`block boundaries. The block size. however. is not so large as
`to slow the rendering processors 20 down when intensive
`rendering is required in a block 52. As stated above. when
`the blocks 52 are too large. the distribution of polygons
`becomes ineffective.
`
`Referring now to FIG. 7. a block diagram of a rendering
`processor 20 is shown. Each rendering processor 20 includes
`a configuration register 60. a command queue 62. a dis-
`patcher circuit 64. a geometric setup circuit 66. an attribute
`setup circuit 70. an address generation circuit 68 and an
`interpolator circuit 72.
`According to the present invention. the assignment of
`various blocks to particular rendering processors is not
`fixed. but is rather alterable through programming. The
`block assignments are programmed through the configura-
`tion register 60. which sets the global state of the rendering
`processor 20. The configuration register 60 includes a block
`enable field 61. an X-density field 74 and Y-density field 76.
`The block enable field 61 determines which groups of
`blocks 52 within the graphics memory 22 are allocated to
`and controlled by the rendering processor 20. It is through
`the block enable field 61 that the parallelism among the
`rendering processors 20 may be configured.
`The block enable field 61 includes a plurality of bits.
`where each bit corresponds to a particular group of blocks
`52. A set bit implies that a rendering processor 20 accesses
`the group of blocks 52 corresponding to that bit. For
`example. if bit zero is set within the block enable field 61 of
`a particular rendering processor 20. then that rendering
`processor 20 will access each block 52 in group “O". The bits
`in the block enable field 61 must be mutually exclusive
`between rendering processors 20.
`In a preferred embodiment. the block enable field 61 is
`sixteen-bits in length. implying there are sixteen group of
`blocks 52 that can be distributed among the processors 20.
`Rather than using a hardware register. the allocation of a
`group of blocks 52 to a particular processor 20 could also be
`implemented through software running on each of the ren-
`dering processors 20. or through a global register or software
`located remote from the rendering processors.
`Referring again to FIG. 6. for example. the block enable
`field 61 for the rendering processor 0 would contain the
`hexadecimal value “a5a5." This means that bits 0. 2. 5. 7.8.
`10. 13 and 15 would be set to instruct the processor to render
`those groups of blocks 52. The block enable field 61 for
`rendering processor 1 would contain the complementary
`hexadecimal value “5a5a.” This means that bits 1. 3. 4. 6. 9.
`11. 12. and 14 would be set to instruct the processor to render
`those groups of blocks 52. As in FIG. 4. the noncontiguous
`block organization shown in FIGS. 5-6 distributes work
`among the rendering processors 20 in a more etficient
`manner than prior implementations. thereby improving par-
`allelism.
`
`Referring again to FIGS. 3-7. as stated previously. each
`rendering processor 20 writes only those pixels that fall
`within the groups of blocks 52 to which it has been assigned.
`The pixels within the assigned groups of blocks 52 are
`rendered in the memory segment 42 connected to that
`rendering processor 20.
`
`0012
`
`
`
`5.794.016
`
`7
`
`According to the present invention. the use of memory is
`conserved within the graphics system 10 through the
`X-density and Y-density fields 74 and 76 in the configuration
`register 60. The purpose of the X-density and Y-density
`fields 74 and 76 is to instruct the rendering processor 20 how
`much of the memory segment 42 to reserve for pixels falling
`within the assigned groups of blocks 52. Stated another way.
`the X-density and Y-density fields 74 and 76 instruct the
`rendering processor 20 not to reserve physical memory for
`pixels falling within unassigned groups of blocks 52. thus
`saving memory in each of the memory segments 42.
`For an eflicient organization of memory in which no
`memory is wasted. the blocks 52 within each row of blocks
`52 must be allocated to the same number of rendering
`processors 20. and similarly. the blocks 52 within each
`column of blocks 52 must be allocated to the same number
`of rendering processors 20. Furthermore. for ease of
`implementation. the number of processors 20 corresponding
`to the blocks 52 in each row of blocks 52 and to the blocks
`52 in each column of blocks 52 must be a power of two. The
`following discussion assumes such an organization.
`The X-density and Y-density fields 74 and 76 indicate the
`minimum size of a two-dimensional array of blocks 52 that.
`at any location within the graphics memory 22 aligned on a
`block boundary that is a multiple of the values of the
`X-density and Y-density 74 and 76 in the X and Y
`dimensions. respectively. include one and only one block
`allocated to each of the processors 20 in the system. If such
`a two-dimensional array is unachievable within the configu-
`ration’s processor map.
`then the configuration can not
`achieve optimal memory usage within the scope of the
`present invention and the processor map should be altered
`accordingly. With an optimal processor map. the number of
`blocks 52 in the two-dimensional array determined by
`X-density and Y-density fields 74 and 76 should equal the
`number of processors 20 in the system.
`Referring again to FIG. 5. what is shown is an example in
`which the X- and Y-density fields 74 and 76 may be set to
`two and two. respectively. to four and one respectively. or to
`one and four. respectively: each of these settings will pro-
`vide the sarne results. Referring again to FIG. 6. what is
`shown is an example in which the X- and Y-density fields 74
`and 76 may be set to two and one respectively. or to one and
`two. respectively: each of these settings will provide the
`same results.
`
`As stated above. the X- and Y-density fields 74 and 76
`instruct the rendering processors 20 not to reserve physical
`memory for pixels within unassigned blocks 52. Arendering
`processor 20 uses its X-density and Y-density fields 74 and
`76 to create ‘aliases’ of pixel coordinate addresses to physi-
`cal memory addresses. Specifically. all blocks 52 within a
`two-dimensional array of blocks 52 that is X-density by
`Y-density blocks 52 in size. and is aligned on a block
`boundary that is a multiple of X- and Y-density in the X and
`Y dimensions.
`respectively. share physical memory
`addresses.
`
`For example. referring to FIG. 5. if the X-density and
`Y-density fields 74 and 76 are set
`to two and two.
`respectively. then the 2x2 array formed from the groups of
`blocks 0. 1. 4. and 5 will share physical memory addresses;
`the groups of blocks 2. 3. 6. and 7 will share physical
`memory addresses; the groups of blocks 3. 9. 12. and 13 will
`share physical memory addresses; and the groups of blocks
`10. ll. 14. and 15 will share physical memory addresses.
`Since a rendering processor 20 updates only the memory
`within the blocks 52 allocated to it by the block enable field.
`
`8
`only the aliases corresponding to the allocated blocks 52 will
`be updated by the rendering processor 20. Referring again to
`FIG. 5 for example. in the 2x2 array formed from the groups
`of blocks 0. 1. 4. and 5. only the block in group 0 is allocated
`to processor 0. Therefore.
`the aliases corresponding to
`groups 1. 4. and 5 will not be updated by processor 0. And
`as a result. the physical memory addresses shared by groups
`0. 1.4. and 5 will contain only the data corresponding to the
`block in group 0.
`Consequently. the physical memory requirements within
`each rendering processor 20 are reduced by a factor equiva-
`lent to the size of the two—dimensional array determined by
`X-density and Y-density fields 74 and 76. which is also
`equivalent to the number of rendering processors 20 in a
`configuration with an optimal processor map.
`As with the block enable field 61. the X- and Y-density
`fields 74 and 76 can be implemented through software or
`through software or hardware located remote from the
`rendering processor.
`The operation of the parallel graphics architecture of the
`present invention will be explained by way of example. FIG.
`8 is a block diagram showing three triangles. triangle A.
`triangle B. and triangle C that are to be rendered in a block
`52 belonging to group “O” and in a block 52 belonging to
`group “l". As explained above. through the appropriate
`assignment of groups of blocks 52 to rendering processors
`20 by the block enable fields 61 of the configuration registers
`60. all objects appearing in the group “0" blocks (block 0)
`will be rendered by rendering processor 0. and all objects
`appearing in the group “1” blocks (block 1) will be rendered
`by rendering processor 1.
`Referring to FIGS. 3 and 8. the triangles originate from a
`graphics application executing on the CPU 14. which per-
`forms all 3D pipeline calculations. such as transformations
`and shading computations. At
`the last stage of the 3D
`graphics pipeline. the CPU 14 generates graphic command
`packets. which typically include 3D geometry information
`in the form of (x. y) coordinates and pixel intensity values
`(Red. Green. Blue) for the ve.r1:ioes comprising each triangle.
`The CPU 14 is aware of the checkerboard pattern of the
`graphics memory 22. and from the triangle vertices deter-
`mines in which blocks 52 the triangles will be visible or
`partially visible. The CPU 14 sends command packets for
`triangle A to processor 0. and the command packets for
`triangle C to processor 1. Since triangle B is visible in both
`block 0 and block 1. the CPU 14 sends the command packets
`for triangle B to both processors 0 and 1. However. proces-
`sors 0 and 1 will render only those pixels of triangle B that
`are viewable in the block 52 corresponding to the respective
`rendering processor 20.
`Referring again to FIG. 7. in response to graphic com-
`mands sent from the CPU 14. each of the rendering proces-
`sors 2|) independently scan-converts the geometric objects
`into the memory blocks 52 indicated by their block enable
`held 61. The rendering processor 20 first reads the command
`packets sent to it over the PCI bus 30 and stores them in the
`command queue 62.
`The rendering processor 20 then processes the command
`packets in a pipeline process comprising a dispatch stage 80.
`a setup stage 82. and an update stage 84. In the dispatch
`stage 80.
`the dispatcher circuit 64 reads the command
`packets from the command queue 62 and dispatches the
`vertex data in the command to the next stage in the pipeline.
`the setup stage 82.
`The setup stage 82 includes the geometric setup circuit 66
`and the attribute setup circuit 70. which operate in parallel.
`
`0013
`
`
`
`5.794.016
`
`9
`Both the geometric setup circuit 66 and the attribute setup
`circuit 70 accept the triangle vertex data in the form of (x.
`y) coordinate data. and setup the triangles for scan-
`conversion by the update stage 84.
`From the triangle Vertex data. the geometric setup circuit
`66 determines initial geometry values in the form of (X. y)
`coordinates for the attribute setup circuit 70. Similarly. the
`attribute setup circuit 70 calculates initial attribute values for
`the interpolator circuit 72. The attributes include red. green.
`blue. 2 (depth). and transparency values.
`The update stage 84 includes the address generation
`circuit 68 and the interpolator circuit 72. which also operate
`in parallel. The interpolator circuit 72 uses the initial
`attribute values in order to interpolate final attribute values
`for the pixels in each triangle. The output of the interpolator
`circuit 72 is used to update the pixel attributes in memory
`once the physical address for
Accessing this document will incur an additional charge of $.
After purchase, you can access this document again without charge.
Accept $ ChargeStill Working On It
This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.
Give it another minute or two to complete, and then try the refresh button.
A few More Minutes ... Still Working
It can take up to 5 minutes for us to download a document if the court servers are running slowly.
Thank you for your continued patience.
This document could not be displayed.
We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.
Your account does not support viewing this document.
You need a Paid Account to view this document. Click here to change your account type.
Your account does not support viewing this document.
Set your membership
status to view this document.
With a Docket Alarm membership, you'll
get a whole lot more, including:
- Up-to-date information for this case.
- Email alerts whenever there is an update.
- Full text search for other cases.
- Get email alerts whenever a new case matches your search.
One Moment Please
The filing “” is large (MB) and is being downloaded.
Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.
Your document is on its way!
If you do not receive the document in five minutes, contact support at support@docketalarm.com.
Sealed Document
We are unable to display this document, it may be under a court ordered seal.
If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.
Access Government Site