throbber
United States Patent
`
`[19]
`
`[11] Patent Number:
`
`5,794,016
`
`Kelleher
`
`[45] Date of Patent:
`
`Aug. 11, 1998
`
`US0057940 1 6A
`
`5.392.393
`5.408.606
`5.434.967
`
`5.440.682
`5.493.643
`
`[54] PARALLEL-PROCESSOR GRAPHICS
`ARCHHECTURE
`
`[75]
`
`Invcmor: Brian Nfichael Keueher. pal0A1t0.
`Ca“-f_
`
`[73] Assignee: Dynamic Pictures, Inc.. Palo Alto.
`Calif,
`
`[21] AWL No; 570,083
`
`[22]
`
`Filfidi
`
`Dec 11. 1995
`
`15/80
`Int. CL6 ......................................................
`[52] U.S. Cl.
`.................... 395/505; 395/509; 395/497.04;
`395/412
`[5 3] Field of Search ..................................... 395/502. 505.
`395/509 5U7~ 509~ 401- 402- 497-01-497-04~
`412‘419~ 515~ 5161332/303~ 304
`
`References Cited
`U_S_ PATENT DOCUMENTS
`_
`_
`8/1933 D1 0110 ................................... 395/409
`8/1990 Littlefield
`395/505
`.. 382/304
`5/1992 Hino etal.
`1/1994 Peaslee eta1_ .
`" 395/505
`5/1994 Imai etal.
`...... ..
`.. 395/505
`12/1994 Abi-Ezzi et al,
`.. 395/502
`.
`2/1995 Poulton et al.
`.. 395/505
`
`
`
`4.761.736
`4.949.280
`5.117.468
`$276,798
`5,315,699
`5.377.320
`5.388206
`
`[56]
`
`“Simulation and Expected Performance Analysis of Mul-
`tiple Processor Z—Bufier Systems”. Frederick I. Parke. Com-
`puter Graphics. vol. 14. No. 3. pp. 4_-6-56 (Jul.. 1980).
`“High—Performance Polygon Rendering". Kurt Akeley. Tom
`Jermoluk. Silicon Graphics. Inc.. pp. 6-12. (1988).
`
`Primary Examincr——Kcc M_ Tung
`Attorney, Agent, or Firm—Sawyer & Associates
`
`ABSTRACT
`[57]
`A parallel-processor graphics architecture appropriate for
`multimedia graphics workstations that is scalable to the
`needs of a user. The graphics architecture includes one or
`more rendering processors and a graphics memory that is
`partitioned into blocks. Npncontiguous groups of the blocks
`are then assigned to dilferent processors. The parallel-
`processor graphics architecture is scalabjc
`thg nu]-nbcr Qf
`rendering processors utilized. and is configurable with
`-
`resptjsjcit to t(1!1e.a]locat.1on of the groups of the blocks to
`59°‘ “C” “mg P’°°°$$°'S‘
`
`10 Claims, 8 Drawing Sheets
`
`COMMAND
`( UEUE
`2
`
`,
`
`CUNFIG REG.
`Block Enable
`
`.X-density Y-density
`
`l)1SPATL_‘HER
`
`DISPATCH
`m
`
`GEUMETRIC
`sum?
`
`AT§§ii3}§TE
`
`SETUP
`3;;
`
`ADDRESS
`(iliNERA'[‘OR
`
`,
`
`“""’”"'
`E
`
`l
`I‘
`j‘
`v-
`ADIJRESS
`
`V
`PIXEL
`A'rTRI_nUTi-;
`D-*\ 1 A
`
`TEXAS INSTRUMENTS EX. 1005 - 1/16
`
`2/1995 Deexing et al.
`4/1995 Eckart .....................
`7/1995 Tanneubaum et al.
`.
`
`8/1995 Deering ...................
`2/1996 Soderberg et al.
`OTHER PUBLICATIONS
`
`.
`
`
`
`395/505
`. 395/502
`. 395/506
`
`. 395/503
`395/505
`
`TEXAS INSTRUMENTS EX. 1005 - 1/16
`
`

`
`U.S. Patent
`
`Aug. 11, 1993
`
`Sheet 1 of 3
`
`5,794,016
`
`
`
`SYS.
`CHIP SET
`
`MAIN
`MEMORY
`
`
`
`
`PCI
`
`43 R)
`
`20A
`
`30A
`
`PROCESSOR
`H
`
`34
`
`64
`
`VIDEO
`
`5
`
`PCI
`
`m
`
`VDAC
`
`CRT
`
`FIG. 1
`
`Z>:u
`
`TEXAS INSTRUMENTS EX. 1005 - 2/16
`
`TEXAS INSTRUMENTS EX. 1005 - 2/16
`
`

`
`U.S. Patent
`
`Aug. 11, 1998
`
`Sheet 2 of 3
`
`5,794,016
`
`
`
`SYS.
`CHIP SET
`
`MAIN
`MEMORY
`
`PCI
`
`30”‘
`
`BRIDGE
`
`38
`
`30B
`
`
`
`
`34
`
`64
`
`VIDEO
`
`20A
`
`PROCESSOR
`
`[0
`
`9;
`
`vdac.
`
`32A
`
`04
`
`"""" —’
`
`s
`D
`R
`A
`M
`
`.
`
`s
`D
`R
`A
`M
`
`1’ “
`1
`1
`1
`1
`1
`
`1 11
`
`22"1___4_2§ __________________ ——
`
`36
`
`26
`
`1
`
`PCI
`
`to
`
`CRT
`
`24
`
`29
`
`R A
`
`M
`
`FIG. 2
`
`TEXAS INSTRUMENTS EX. 1005 - 3/16
`
`TEXAS INSTRUMENTS EX. 1005 - 3/16
`
`

`
`U.S. Patent
`
`Aug. 11, 1998
`
`Sheet 3 of 8
`
`5,794,016
`
`
`SYS.
`CHIP SET
`
`32
`
`34
`
`64
`
`VIDEO
`
`CRT
`
`43C
`42B
`42A
`L,————————————————————————————————— "J
`K 22
`36
`Ubus
`8
`
`26
`
`28
`
`PCI
`
`24
`
`29
`
`R A
`
`M
`
`FIG. 3
`
`TEXAS INSTRUMENTS EX. 1005 - 4/16
`
`
`
`.r—'
`
`‘ " ” ’ ‘
`s
`D
`R
`A
`M
`
`S
`D
`R
`A
`M
`
`_ _ _ T’
`
`S
`D
`R
`A
`M
`
`S
`D
`R
`A
`M
`
`S
`D
`R
`A
`M
`
`s
`D
`R
`A
`M
`
`1
`l
`,
`1
`1
`
`1I }
`
`TEXAS INSTRUMENTS EX. 1005 - 4/16
`
`

`
`aPS”U
`
`tn
`
`mHflflflflflflflfl
`
`U
`
`A_
`
`0w.1
`
`nG9Tl1FH,
`
`4
`
`4a6LuS
`
`4
`
`mmoT
`
`.0/,m9EsW2R2T
`
`Mm0,a
`
`/.m
`
`S
`
`S
`
`6aM
`
`TEXAS INSTRUMENTS EX. 1005 - 5/16
`
`
`

`
`LlH6LI.aP3U
`
`UA
`
`m0..
`
`mmm
`
`8
`
`4
`
`%G9T1
`.,F
`
`/0
`
`...mE5T
`
`9m79NsW
`
`6m11.0»a
`
`W
`
`URTSW
`
`6W5O
`
`TEXAS INSTRUMENTS EX. 1005 - 6/16
`
`

`
`U.S. Patent
`
`Aug. 11, 1998
`
`Sheet 6 of 3
`
`5,794,016
`
`Block Enable
`
`DISPATCHER
`
`DISIEZTCH
`
`SETUP
`
`ATTRIBUTE
`
`(R.G.B,Z,T)
`
`82
`
`ADDRESS
`GENERATOR
`
`INTERPOLATOR
`
`UPDATE
`E
`
`PHYSICAL
`MEMORY
`ADDRESS
`
`PIXEL
`ATTRIB UTE
`DATA
`
`TEXAS INSTRUMENTS EX. 1005 - 7/16
`
`TEXAS INSTRUMENTS EX. 1005 - 7/16
`
`

`
`U.S. Patent
`
`Aug. 11, 1998
`
`Sheet 7 of 3
`
`5,794,016
`
`BLOCK H
`
`g
`
`128
`
`BLOCK 1
`
`[J U1 U’:
`
`I
`
`'
`
`52
`
`FIG. 8
`
`TEXAS INSTRUMENTS EX. 1005 - 8/16
`
`TEXAS INSTRUMENTS EX. 1005 - 8/16
`
`

`
`U.S. Patent
`
`Aug. 11, 1993
`
`Sheet 8 of 8
`
`5,794,016
`
`FIG. 10
`
`FIG. 11
`
`1 00
`
`VPN
`
`PID
`
`102
`
`TEXAS INSTRUMENTS EX. 1005 - 9/16
`
`TEXAS INSTRUMENTS EX. 1005 - 9/16
`
`

`
`1
`PARALLEL-PROCESSOR GRAPHICS
`ARCHITECTURE
`
`FIELD OF THE INVENTION
`
`‘The present invention relates to computer graphics and
`more particularly to a parallel-processor graphics architec-
`ture appropriate for multimedia graphics workstations that is
`scalable to the needs of a user.
`
`BACKGROUND OF THE DJVENTION
`
`The tasks of traditional geometric graphics applications
`can be partitioned into three functional categories: structure
`traversal. geometry processing. and rendering. Structure
`traversal refers to the traversal of an application’s graphics
`data structure. either by the application or by a graphics
`library.
`Geometry processing refers to floating point intensive
`operations. such as vertex transformation and shading. that
`convert the image data from an application's format into a
`geometric format of vertices comprising the image and
`vertex properties. such as color. Finally. rendering refers to
`the process of calculating individual pixel values for the
`image that are stored in graphics memory based on the
`transformed geometric data.
`Designers have recognized that substantial gains in graph-
`ics system performance are possible through the use of
`multiple processors operating in parallel. One problem with
`multi-processor graphic subsystem designs is determining
`how to distribute the rendering tasks among the processors.
`Prior methods have divided graphics memory or display
`screen into partitions and allocated different parts of the
`memory or screen to separate processors. However. those
`methods are ineflicient. In some methods. the processors
`worked on rendering dilferent parts of the same object and
`the processors were therefore dependent on one another. If
`one processor became overloaded and slowed down. it
`caused the other processors also to slow down.
`In other methods. the processors were assigned to large
`contiguous areas of memory. For instance. in a four proces-
`sors system. the display screen would be divided into four
`large adjacent blocks with each processor assigned to a
`different block. The problem with this method is that it
`reduces the efliciency of the parallelism of the graphics
`system. If there were no objects for a processor to render in
`its area. then the processor sat idle and under utilized. while
`the other processors may have been overloaded. An extreme
`solution to this problem would be to assign one processor to
`one pixel. but as more and more processors are added.
`additional gains in performance decrease. while costs
`increase dramatically.
`Typical graphic systems include a general-purpose base
`system with its own central processing unit and main
`memory. and special-purpose graphics hardware to acceler-
`ate the graphics applications. Since the introduction of the
`first special-purpose graphics hardware. physical memory
`restrictions have burdened low-level graphics software and
`firmware developers. To solve this problem. some graphics
`software interfaces provide a virtual memory abstraction to
`their applications to allow the application a view of an
`unlimited number of coexistent images it can create and
`maintain through the software interface.
`Early examples of graphics virtual memory systems
`include a simple memory-mapped frame-buffer method and
`a rendering processor method. The memory-mapped frame-
`buifer method achieved virtual memory rendering by allow-
`
`5.794.016
`
`2
`
`ing the CPU to implement the rendering algorithms using a
`virtual memory system in the base system as the graphic
`memory system. Similarly. the rendering processor method
`also used the main memory of the base system. but inter-
`faced the main memory with a special-purpose rendering
`processor through a wide bus. While these systems have
`very different
`implementations.
`they are similar in their
`ability to render directly into processor virtual memory.
`Each of these methods tightly integrates the graphics with
`the main memory of the base system. Unfortunately. graphic
`systems marketed today do not satisfy the requirements of a
`full range of customers. For example. a workstation that uses
`the CPU to perform rendering operations may be an accept-
`able low-cost solution: however. some customers will
`undoubtedly demand higher performance. Similarly. a work-
`station that utilizes multiple processors to increase system
`performance may not be scalable downward to please cost-
`sensitive customers.
`
`Therefore. what is needed is a graphics system architec-
`ture that includes a low-cost entry configuration with limited
`graphics hardware. and allows a higher performance multi-
`processor configuration in which processor efficiency is
`maximized. The present invention addresses such a need
`
`SUMMARY OF THE INVENTION
`
`The present invention provides a parallel graphics system
`and method for rendering an image on a display screen. The
`system includes a plurality of processors for rendering an
`image on the display screen. and a memory organized into
`noncontiguous groups of blocks. The system and method
`further comprises block enable means for allocating at least
`one of the noncontiguous groups of blocks to each of the
`plurality of processors for rendering. thereby increasing the
`efficiency of the parallelism of the graphics system.
`According to the system and method disclosed herein. the
`present invention is scalable by the number of rendering
`processors utilized. and is configurable with respect to the
`allocation of the groups of the blocks to specific rendering
`processors.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`FIG. 1 depicts a low cost implementation of a graphics
`system having one rendering processor.
`FIG. 2 is a block diagram depicting the graphics system
`with two rendering processors.
`FIG. 3 depicts a high-end implementation of the graphics
`system with N rendering processors.
`FIG. 4 is a block diagram depicting a graphics memory
`that has been divided into a plurality of blocks.
`FIGS. 5 and 6 are block diagrams depicting the noncon-
`tiguous block organization of the present invention within a
`4x4 array of adjacent blocks.
`FIG. 7 is a block diagram of a rendering processor.
`FIG. 8 is a block diagram showing three triangles. triangle
`A. triangle B. and triangle C that are to be rendered in blocks
`0 and 1.
`
`FIG. 9 is a block diagram showing a representative pixel
`and pixelmap within a graphics virtual memory 90.
`FIG. 10 is a block diagram illustrating a rendering pro-
`cessor and an associated memory segment.
`FIG. 11 is a functional block diagram of an address
`generation circuit.
`FIG. 12 is a block diagram illustrating the contents of a
`TLB entry.
`
`5
`
`20
`
`25
`
`30
`
`35
`
`45
`
`50
`
`55
`
`65
`
`TEXAS INSTRUMENTS EX. 1005 - 10/16
`
`TEXAS INSTRUMENTS EX. 1005 - 10/16
`
`

`
`3
`DESCRIPTION OF THE INVENTION
`
`5.794.0 I 6
`
`4
`
`The present invention relates to an improved parallel-
`processor graphics architecture. The following description is
`presented to enable one of ordinary skill in the art to make
`and use the invention and is provided in the context of a
`patent application and its requirements. Various modifica-
`tions to the preferred embodiment will be readily apparent to
`those skilled in the art and the generic principles herein may
`be applied to other embodiments. Thus. the present inven-
`tion is not intended to be limited to the embodiment shown
`but is to be accorded the widest scope consistent with the
`principles and features described herein.
`FIGS. 1-3 are block diagrams depicting a parallel-
`‘. -processor graphics system 10 of the present invention that is
`.scalable to the needs of a user. The system may be config-
`ured with one rendering processor. or with multiple render-
`ing processors. FIG.'l depicts a low-cost implementation of
`the graphics system 10A having one rendering processor
`20A. FIG. 2 is a block diagram depicting the graphics
`system 10B with two rendering processors 20A and 20B.
`FIG. 3 depicts a high-end implementation of the graphics
`system 10C with N rendering processors 20A—20N. In the
`multiple-processor configurations. the efliciency of the mul-
`tiple processors is maximized by eifectively distributing the
`rendering tasks. explained further below.
`The graphics system 10A. 10B. and 10C (hereinafter
`referred to as graphics system 10) is part of a general-
`ptnpose base computer system 12 including a CPU 14. a
`system chip set 16. and main memory 18. Each rendering
`processor 20A—20N (hereinafter referred to as rendering
`processor 20) in the graphics system 10 may be coupled to
`a graphics memory 22. a video digital-to-analog converter
`(VDAC) 24. a peripheral component interconnect (PCI)
`ROM 26. and a VGA processor 28.
`The purpose of the rendering processor 20 is to accept a
`geometric description of an object and scan-convert the
`object into the graphics memory 22. The graphics memory
`22. also known as a frame buffer. stores pixel data at
`memory addresses corresponding to pixels on a display
`device. such as a cathode ray tube (CRT) 29. The intercon-
`nect of the rendering processor 20 comprises four major
`buses: a PCI bus 30. a memory bus 32. a video bus 34. and
`a utility bus (Ubus) 36.
`The PCI bus 30 is a 32-bit bus that connects the rendering
`processor 20 to the general-purpose computer system 12. In
`the multiple-processor configurations 10B and 10C.
`the
`rendering processors 20 are connected to a system PCI bus
`30A through a PCI bridge 38. Since only one load may be
`placed on the system PCI bus 30A. the PCI bridge 38 is used
`to translate the system PCI bus 30A into a second PCI bus
`30B.
`
`The video bus 34 provides pixel data from the rendering
`processors 20 to the VDAC 24 via a 64-bit interconnect The
`VDAC 24 is a display interface that receives ordered pixel
`data from each of the rendering processors 20 in the form of
`digital color data. and passes the data to a CRT 29.
`The Ubus 36 provides an inputloutput data path for the
`rendering processor’s 20 ancillary chips. such as the VDAC
`24. the VGA processor 28 and the PCI ROM 26. The VGA
`processor 28 is provided so that the graphics system 10 is
`compatible with legacy systems.
`The rendering processor 20 is the primary component of
`the graphics system 10 of the present invention. The ren-
`dering processor 20 implements graphics and multimedia
`algorithms and interfaces with the PCI bus 30 and all other
`
`10
`
`20
`
`25
`
`30
`
`35
`
`45
`
`50
`
`55
`
`65
`
`components in the system 10. The rendering processor 20
`integrates video technologies including 2D and 3D geomet-
`ric graphics. photorealistic rendering. stereoscopic viewing.
`windowing operations. and live video.
`More particularly. each rendering processor 20 imple-
`ments a set of parallel rendering algorithms including
`smooth shadowing. depth-buffering. dithering. antialiasing.
`transparency. blending. alpha testing. stenciling. stippling.
`accumulation buffering. and texture mapping. In addition.
`the rendering processor 20 provides a video pipeline for
`multimedia applications through which live video data can
`be scaled. filtered. and converted into red. green. and blue
`color components for storage in graphics memory 22 and
`subsequent display.
`in order to increase the
`Referring to FIGS. 2 and 3.
`efficiency of the rendering processors 20.
`the graphics
`memory subsystem 22 is partitioned into physical memory
`segments 42 and each memory segment 42 is coupled to a
`different rendering processor 20. Each rendering processor
`20 accesses only its memory segment 42 of the graphics
`memory 22 and is required to write only those pixels falling
`within that memory segment 42. For example. rendering
`processor 20B is coupled to memory segment 42B and
`writes only to those pixels contained within memory seg-
`ment 42B of the graphics memory subsystem 22.
`The memory bus 32 connects each rendering processor 20
`to its corresponding graphics memory segment 42 via a
`64-bit
`interconnect.
`In a preferred embodiment of the
`present invention. the memory segments 42 comprise syn-
`chronous dynamic random access memory (SDRAM). In the
`low-cost one-processor configuration of FIG. 1. four lM><16
`SDRAMs are used to construct an eight Mbyte graphics
`memory system 22. In the two-processor configuration of
`FIG. 2. eight 1M><16 SDRAMs are used to construct a
`sixteen Mbyte graphics memory system 22. In a higher-end
`four-processor configuration. sixteen 1M>< 16 SDRAMS are
`used to construct a thirty-two Mbyte graphics memory
`system 22. The graphics memory 22 could also be imple-
`mented using VRAM. in which case. the graphics memory
`22 would connect directly to the VDAC 24.
`Prior methods have divided graphics memory or display
`screen into partitions and allocated different parts of the
`memory or screen to separate processors. However. those
`methods are inefiicient in the manner that the memory is
`allocated to the processors. which also creates dependencies
`among the processors.
`The present invention provides an improved method and
`system for allocating dilferent parts of the graphics memory
`22 to multiple rendering processors 20. The method and
`system for allocating the graphics memory 22 increases the
`efficiency of the graphics system. and is configurable such
`that additional rendering processors 20 may be added to the
`graphic architecture. making the system scalable.
`To more particularly illustrate the method and system for
`allocating graphics memory 22 to the rendering processors
`20 in accordance with the present invention. refer now to
`FIG. 4 depicting a block diagram of one embodiment of such
`a system.
`FIG. 4 is a block diagram of a graphics memory 22 that
`has been partitioned into a plurality of pixel blocks 52 that
`are tiled in the x- and y-direction of the graphics memory 22.
`In a preferred embodiment. the graphics memory 22 is large
`enough to render a 1280x1024 screen display.
`According to the present invention. the pixel blocks 52 are
`organized into noncontiguous groups of blocks 52. where
`blocks 52 in the same group are designated by the same
`
`TEXAS INSTRUMENTS EX. 1005 - 11/16
`
`TEXAS INSTRUMENTS EX. 1005 - 11/16
`
`

`
`5
`
`6
`
`5.794.016
`
`number. As an example. the graphics memory 22 in FIG. 4
`is shown comprising sixteen groups of pixel blocks (0-15).
`The blocks 52 labeled “0” belong in group “O”. the blocks
`labeled “I” belong in group “I”. the blocks labeled “3"
`belong in group “3” et cetera.
`The groups of blocks 52 in the graphics memory 22 are
`then assigned to the rendering processors 20 for rendering.
`Each rendering processor 20 writes to only those pixels that
`are located in the blocks 52 of the assigned groups. The
`groups of block 52 may be assigned to the rendering
`processors 20 in varying ratios. Assuming there are sixteen
`rendering processors present in the multi-processor system
`10C. then the sixteen groups of blocks 52 could be assigned
`to the processors 20 on a one-to-one basis. In this case. the
`blocks in group “O” may be assigned to rendering processor
`0; the blocks in group “1” may be assigned to rendering
`processor 1; the blocks in group “2" may be assigned to
`rendering processor 2. and so on.
`FIGS. 5 and 6 are block diagrams depicting the noncon-
`tiguous block organization of the present invention within
`one 4x4 array of adjacent blocks 52 in the graphics memory
`22 (the 4x4 array‘ is replicated throughout the graphics
`memory 22). The sixteen groups of pixel blocks 52 within
`each 4><4 array are tiled as in FIG. 4. where a particular
`group is designated by the middle number in the blocks 52.
`and the number in the upper right corner of the blocks 52
`represents the processor to which a particular block is
`assigned.
`When the number of rendering processors 20 is less than
`the number of groups of blocks 52. then the groups of blocks
`52 may be assigned to the rendering processors 20 in higher
`ratios. such as two-to-one or four-to-one. rather than the
`one-to-one ratio above.
`
`Referring to FIG. 5. what is shown is an example allo-
`cation of groups of blocks 52 when four rendering proces-
`sors 20 (0-3) are present. When sixteen groups are allocated
`among four rendering processors 20. the group of blocks 52
`are assigned to the rendering processors 20 in a four-to-one
`ratio. As shown. processor 0 is assigned to groups 0. 6. 11.
`and 13; processor 1 is assigned to groups 1. 7. 10. and 12;
`processor 2 is assigned to groups 2. 4. 9. and 15; and
`processor 3 is assigned to groups 3. 5. 8. and 14.
`Referring to FIG. 6. what is shown is an example allo-
`cation of groups of blocks 52 when two rendering processors
`20 (0-1) are present. When sixteen groups are allocated
`among two rendering processors 20. the group of blocks 52
`are assigned to the rendering processors 20 in an eight-to-
`one ratio. As shown. processor 0 is assigned to groups 0. 2.
`5. 7. 8. 10. 13. and 15; and processor 1 is assigned to groups
`1. 3. 4. 6. 9. 11. 12. and 14.
`
`The assignment or mapping of various groups of blocks
`52 to rendering processors 20 is referred to as a processor
`map. As shown in the processor maps of FIGS. 4-6. the
`blocks 52 within a particular group that are assigned to a
`particular processor 20 are arranged in a noncontiguous
`organization. and are spaced about the display. Therefore.
`the rendering demands placed on each of the rendering
`processors 20 should be equalized. That is. as the rendering
`processors 20 operate in parallel no rendering processor
`should be idle while others are overloaded Therefore. the
`
`must duplicate calculations for the same polygon. which
`wastes parallelism. Therefore. in a preferred embodiment of
`the present invention. each pixel block 52 is 128x128 pixels
`in size. This block size has been determined to effectively
`divide the parallelism of the rendering processors 20. Since
`polygons tend to be particularly small. and 128x128 blocks
`52 are relatively large. polygons will not commonly cross
`block boundaries. The block size. however. is not so large as
`to slow the rendering processors 20 down when intensive
`rendering is required in a block 52. As stated above. when
`the blocks 52 are too large. the distribution of polygons
`becomes ineffective.
`
`Referring now to FIG. 7. a block diagram of a rendering
`processor 20 is shown. Each rendering processor 20 includes
`a configuration register 60. a command queue 62. a dis-
`patcher circuit 64. a geometric setup circuit 66. an attribute
`setup circuit 70. an address generation circuit 68 and an
`interpolator circuit 72.
`According to the present invention. the assignment of
`various blocks to particular rendering processors is not
`fixed. but is rather alterable through programming. The
`block assignments are programmed through the configura-
`tion register 60. which sets the global state of the rendering
`processor 20. The configuration register 60 includes a block
`enable field 61. an X-density field 74 and Y-density field 76.
`The block enable field 61 determines which groups of
`blocks 52 within the graphics memory 22 are allocated to
`and controlled by the rendering processor 20. It is through
`the block enable field 61 that the parallelism among the
`rendering processors 20 may be configured.
`The block enable field 61 includes a plurality of bits.
`where each bit corresponds to a particular group of blocks
`52. A set bit implies that a rendering processor 20 accesses
`the group of blocks 52 corresponding to that bit. For
`example. if bit zero is set within the block enable field 61 of
`a particular rendering processor 20.
`then that rendering
`processor 20 will access each block 52in group “O”. The bits
`in the block enable field 61 must be mutually exclusive
`between rendering processors 20.
`In a preferred embodiment. the block enable field 61 is
`sixteen-bits in length. implying there are sixteen group of
`blocks 52 that can be distributed among the processors 20.
`Rather than using a hardware register. the allocation of a
`group of blocks 52 to a particular processor 20 could also be
`implemented through software running on each of the ren-
`dering processors 20. or through a global register or software
`located remote from the rendering processors.
`Referring again to FIG. 6. for example. the block enable
`field 61 for the rendering processor 0 would contain the
`hexadecimal value “aSa5.” This means that bits 0. 2. 5. 7.8.
`10. 13 and 15 would be set to inst:ruct the processor to render
`those groups of blocks 52. The block enable field 61 for
`rendering processor 1 would contain the complementary
`hexadecimal value “5a5a.” This means that bits 1. 3. 4. 6. 9.
`11. 12. and 14 would be set to instruct the processor to render
`those groups of blocks 52. As in FIG. 4. the noncontiguous
`block organization shown in FIGS. 5-6 distributes work
`among the rendering processors 20 in a more efiicient
`manner than prior implementations. thereby improving par-
`a.l.lelism.
`
`20
`
`25
`
`30
`
`35
`
`45
`
`50
`
`55
`
`noncontiguous block organization of the present invention
`effectively increases the overall etficiency and performance
`of the graphics system 10.
`When a polygon to be rendered crosses a block boundary
`and enters into another block 52. the two rendering proces-
`sors 20 assigned to each of the respective group of blocks 52
`
`65
`
`Referring again to FIGS. 3-7. as stated previously. each
`rendering processor 20 writes only those pixels that fall
`within the groups of blocks 52 to which it has been assigned.
`The pixels within the assigned groups of blocks 52 are
`rendered in the memory segment 42 connected to that
`rendering processor 20.
`
`TEXAS INSTRUMENTS EX. 1005 - 12/16
`
`TEXAS INSTRUMENTS EX. 1005 - 12/16
`
`

`
`5.794.016
`
`7
`
`According to the present invention. the use of memory is
`conserved within the graphics system 10 through the
`X-density and Y-density fields 74 and 76 in the configuration
`register 60. The purpose of the X-density and Y-density
`fields 74 and 76 is to instruct the rendering processor 20 how
`much of the memory segment 42 to reserve for pixels falling
`within the assigned groups of blocks 52. Stated another way.
`the X-density and Y-density fields 74 and 76 instruct the
`rendering processor 20 not to reserve physical memory for
`pixels falling within unassigned groups of blocks 52. thus
`saving memory in each of the memory segments 42.
`For an eflicient organization of memory in which no
`memory is wasted. the blocks 52 within each row of blocks
`52 must be allocated to the same number of rendering
`processors 20. and similarly.
`the blocks 52 within each
`column of blocks 52 must be allocated to the same number
`of rendering processors 20. Furthermore. for ease of
`implementation. the number of processors 20 corresponding
`to the blocks 52 in each row of blocks 52 and to the blocks
`52 in each column of blocks 52 must be a power of two. The
`following discussion assumes such an organization.
`The X-density and Y-density fields 74 and 76 indicate the
`minimum size of a two-dimensional array of blocks 52 that.
`at any location within the graphics memory 22 aligned on a
`block boundary that is a multiple of the values of the
`X-density and Y-density 74 and 76 in the X and Y
`dimensions. respectively. include one and only one block
`allocated to each of the processors 20 in the system. If such
`a two-dimensional array is unachievable within the configu-
`ration’s processor map.
`then the configuration can not
`achieve optimal memory usage within the scope of the
`present invention and the processor map should be altered
`accordingly. With an optimal processor map. the number of
`blocks 52 in the two-dimensional array determined by
`X-density and Y-density fields 74 and 76 should equal the
`number of processors 20 in the system.
`Referring again to FIG. 5. what is shown is an example in
`which the X- and Y-density fields 74 and 76 may be set to
`two and two. respectively. to four and one respectively. or to
`one and four. respectively; each of these settings will pro-
`vide the same results. Referring again to FIG. 6. what is
`shown is an example in which the X- and Y-density fields 74
`and 76 may be set to two and one respectively. or to one and
`two. respectively; each of these settings will provide the
`same results.
`
`As stated above. the X- and Y-density fields 74 and 76
`instruct the rendering processors 20 not to reserve physical
`memory for pixels within unassigned blocks 52. Arendering
`processor 20 uses its X-density and Y-density fields 74 and
`76 to create ‘aliases’ of pixel coordinate addresses to physi-
`cal memory addresses. Specifically. all blocks 52 within a
`two-dimensional array of blocks 52 that is X-density by
`Y-density blocks 52 in size. and is aligned on a block
`boundary that is a multiple of X- and Y-density in the X and
`Y dimensions.
`respectively. share physical memory
`addresses.
`
`For example. referring to FIG. 5. if the X-density and
`Y-density fields 74 and 76 are set
`to two and two.
`respectively. then the 2x2 array formed from the groups of
`blocks 0. 1. 4. and 5 will share physical memory addresses;
`the groups of blocks 2. 3. 6. and 7 will share physical
`memory addresses; the groups of blocks 8. 9. 12. and 13 will
`share physical memory addresses; and the groups of blocks
`10. 11. 14. and 15 will share physical memory addresses.
`Since a rendering processor 20 updates only the memory
`within the blocks 52 allocated to it by the block enable field.
`
`8
`only the aliases corresponding to the allocated blocks 52 will
`be updated by the rendering processor 20. Referring again to
`FIG. 5 for example. in the 2x2 array formed from the groups
`of blocks 0. 1. 4. and 5. only the block in group 0 is allocated
`to processor 0. Therefore.
`the aliases corresponding to
`groups 1. 4. and 5 will not be updated by processor 0. And
`as a result. the physical memory addresses shared by groups
`0. 1. 4. and 5 will contain only the data corresponding to the
`block in group 0.
`Consequently. the physical memory requirements within
`each rendering processor 20 are reduced by a factor equiva-
`lent to the size of the two-dimensional array determined by
`X-density and Y-density fields 74 and 76. which is also
`equivalent to the number of rendering processors 20 in a
`configuration with an optimal processor map.
`As with the block enable field 61. the X- and Y-density
`fields 74 and 76 can be implemented through software or
`through software or hardware located remote from the
`rendering processor.
`The operation of the parallel graphics architecture of the
`present invention will be explained by way of example. FIG.
`8 is a block diagram showing three triangles. triangle A.
`triangle B. and triangle C that are to be rendered in a block
`52 belonging to group “O” and in a block 52 belonging to
`group “1". As explained above. through the appropriate
`assignment of groups of blocks 52 to rendering processors
`20 by the block enable fields 61 of the configuration registers
`60. all objects appearing in the group “0" blocks (block 0)
`will be rendered by rendering processor 0. and all objects
`appearing in the group “l” blocks (block 1) will be rendered
`by rendering processor 1.
`Referring to FIGS. 3 and 8. the triangles originate from a
`graphics application executing on the CPU 14. which per-
`fomis all 3D pipeline calculations. such as transformations
`and shading computations. At
`the last stage of the 3D
`graphics pipeline. the CPU 14 generates graphic command
`packets. which typically include 3D geometry information
`in the form of (x. y) coordinates and pixel intensity values
`(Red. Green. Blue) for the ve11:ioes comprising each triangle.
`The CPU 14 is aware of the checkerboard pattern of the
`graphics memory 22. and from the triangle vertices deter-
`mines in which blocks 52 the triangles will be visible or
`partially visible. The CPU 14 sends command packets for
`triangle A to processor 0. and the command packets for
`triangle C to processor 1. Since triangle B is visible in both
`block 0 and block 1. the CPU 14 sends the command packets
`for triangle B to both processors 0 and 1. However. proces-
`sors 0 and 1 will render only those pixels of triangle B that
`are viewable in the block 52 corresponding to the respective
`rendering processor 20.
`Referring again to FIG. 7. in response to graphic com-
`mands sent from the CPU 14. each of the rendering proces-
`sors 20 independently scan-converts the geometric objects
`into the memory blocks 52 indicated by their block enable
`field 61. The rendering processor 20 first reads the command
`packets sent to it over the PCI bus 30 and stores them in the
`command queue 62.
`The rendering processor 20 then processes the command
`packets in a pipeline process comprising a dispatch stage 80

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket