`US008933945B2
`
(12) United States Patent
     Leather et al.

(10) Patent No.: US 8,933,945 B2
(45) Date of Patent: Jan. 13, 2015

(54) DIVIDING WORK AMONG MULTIPLE GRAPHICS PIPELINES USING A SUPER-TILING TECHNIQUE

(75) Inventors: Mark M. Leather, Saratoga, CA (US); Eric Demers, Palo Alto, CA (US)

(73) Assignee: ATI Technologies ULC, Markham, Ontario (CA)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 1808 days.

(21) Appl. No.: 10/459,797

(22) Filed: Jun. 12, 2003

(65) Prior Publication Data
     US 2004/0100471 A1    May 27, 2004

Related U.S. Application Data

(60) Provisional application No. 60/429,641, filed on Nov. 27, 2002.

(51) Int. Cl.
     G06T 1/20     (2006.01)
     G06F 13/14    (2006.01)
     G06F 12/02    (2006.01)
     G06T 11/40    (2006.01)
     G06T 15/00    (2011.01)
     G09G 5/36     (2006.01)

(52) U.S. Cl.
     CPC: G06T 11/40 (2013.01); G06T 1/20 (2013.01); G06T 15/005 (2013.01); G09G 5/363 (2013.01)
     USPC: 345/506; 345/519; 345/544

(58) Field of Classification Search
     USPC: 345/506, 530, 505, 588, 544, 545, 532, 345/501, 502, 531, 519
     See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS
`
4,885,703 A      12/1989   Deering
5,179,640 A *     1/1993   Duffy ............................ 345/596
5,550,962 A       8/1996   Nakamura et al.
5,745,118 A *     4/1998   Alcorn et al. .................... 345/587
5,794,016 A *     8/1998   Kelleher ......................... 345/505
5,818,469 A      10/1998   Lawless et al.
5,905,506 A *     5/1999   Hamburg .......................... 345/672
5,977,997 A      11/1999   Vainsencher
5,999,196 A      12/1999   Storm et al.
6,118,452 A       9/2000   Gannett
`(Continued)
`OTHER PUBLICATIONS
`
Elias, Hugo. "Polygon Scan Converting." http://freespace.virgin.net/hugo.elias/graphics/x_polysc.htm. *
`(Continued)
`
`Primary Examiner- Joni Richer
`(74) Attorney, Agent, or Firm- Faegre Baker Daniels LLP
`
(57) ABSTRACT

A graphics processing circuit includes at least two pipelines operative to process data in a corresponding set of tiles of a repeating tile pattern, a respective one of the at least two pipelines operative to process data in a dedicated tile, wherein the repeating tile pattern includes a horizontally and vertically repeating pattern of square regions. A graphics processing method includes receiving vertex data for a primitive to be rendered; generating pixel data in response to the vertex data; determining the pixels within a set of tiles of a repeating tile pattern to be processed by a corresponding one of at least two graphics pipelines in response to the pixel data, the repeating tile pattern including a horizontally and vertically repeating pattern of square regions; and performing pixel operations on the pixels within the determined set of tiles by the corresponding one of the at least two graphics pipelines.
`
`21 Claims, 5 Drawing Sheets
`
`
`Volkswagen 1001
`
`

`
`
`(56)
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`
6,184,906 B1 *        2/2001   Wang et al. ...................... 345/532
6,219,062 B1          4/2001   Matsuo et al.
6,222,550 B1          4/2001   Rosman et al.
6,292,200 B1          9/2001   Bowen et al.
6,323,860 B1         11/2001   Zhu et al.
6,344,852 B1          2/2002   Zhu et al.
6,353,439 B1          3/2002   Lindholm et al.
6,380,935 B1          4/2002   Heeschen et al.
6,384,824 B1          5/2002   Morgan et al.
6,407,736 B1          6/2002   Regan
6,417,858 B1          7/2002   Bosch et al.
6,424,345 B1          7/2002   Smith et al.
6,557,083 B1          4/2003   Sperber et al.
6,570,579 B1 *        5/2003   Macinnis et al. .................. 345/629
6,573,893 B1          6/2003   Naqvi et al.
6,636,232 B2         10/2003   Larson
6,650,327 B1         11/2003   Airey et al.
6,650,330 B2         11/2003   Lindholm et al.
6,697,063 B1          2/2004   Zhu
6,714,203 B1 *        3/2004   Morgan et al. .................... 345/506
6,724,394 B1          4/2004   Zatz et al.
6,731,289 B1          5/2004   Peercy et al.
6,750,867 B1          6/2004   Gibson
6,753,878 B1 *        6/2004   Heirich et al. ................... 345/629
6,762,763 B1 *        7/2004   Migdal et al. .................... 345/506
6,778,177 B1 *        8/2004   Furtner .......................... 345/544
6,791,559 B2          9/2004   Baldwin
6,801,203 B1         10/2004   Hussain
6,809,732 B2         10/2004   Zatz et al.
6,864,893 B2          3/2005   Zatz
6,864,896 B2 *        3/2005   Perego ........................... 345/542
6,897,871 B1          5/2005   Morein et al.
6,980,209 B1         12/2005   Donham et al.
7,015,913 B1          3/2006   Lindholm et al.
7,061,495 B1          6/2006   Leather
7,170,515 B1          1/2007   Zhu
2002/0145612 A1      10/2002   Blythe et al.
2003/0076320 A1       4/2003   Collodi
2003/0164830 A1 *     9/2003   Kent ............................. 345/505
2004/0041814 A1       3/2004   Wyatt et al.
2004/0164987 A1       8/2004   Aronson et al.
2005/0068325 A1       3/2005   Lefebvre et al.
2005/0200629 A1       9/2005   Morein et al.
2006/0170690 A1       8/2006   Leather
`
`OTHER PUBLICATIONS
`
European Search Report from European Patent Office; European Application No. 03257464.2; dated Apr. 4, 2006.
Foley, James et al.; Computer Graphics, Principles and Practice; Addison-Wesley Publishing Company; 1990; pp. 873-899.
Crockett, Thomas W.; An introduction to parallel rendering; Elsevier Science B.V.; 1997; pp. 819-843.
Montrym, John S. et al.; InfiniteReality: A Real-Time Graphics System; Silicon Graphics Computer Systems; 1997; pp. 293-302.
Humphreys, Greg et al.; WireGL: A Scalable Graphics System for Clusters; ACM Siggraph; 2001; pp. 129-140.
Akeley, K. et al.; High-Performance Polygon Rendering; ACM Computer Graphics; vol. 22, No. 4; 1988; pp. 239-246.
Breternitz, Jr., Mauricio et al.; Compilation, Architectural Support, and Evaluation of SIMD Graphics Pipeline Programs on a General-Purpose CPU; IEEE; 2003; pp. 1-11.
International Search Report for PCT Patent Application PCT/IB2004/003821 dated Mar. 22, 2005.
Fuchs, Henry et al.; Pixel-Planes 5: A Heterogeneous Multiprocessor Graphics System Using Processor-Enhanced Memories; Computer Graphics; vol. 23, No. 3; Jul. 1989; pp. 79-88.
`* cited by examiner
`
[Sheet 1 of 5]

[FIG. 1 (PRIOR ART): display device 10 with a screen 12 partitioned into vertical strips 13-18; primitives 20-23 are shown within strip 13.]

[FIG. 5: graphics processing circuit 54, showing front end circuitry, two scan converters, back end circuitry A0, back end circuitry B0 and memory controller 146.]
`
`

`
[Sheet 2 of 5]

[FIG. 2: graphics processing system showing a host, AGP 32, graphics processing circuit 54, and a graphics processing circuit with graphics pipelines 101 and 102 (front end circuitry 35, two scan converters, back end circuitry A and back end circuitry B), a memory controller, a switch, graphics memory 48 and display 51.]
`
`

`
[Sheet 3 of 5]

[FIG. 3: memory 48 partitioned into tiles labeled A and B, overlaid with a triangle whose vertices include V1 and V2.]

[FIG. 4: frame buffer 68 partitioned into tiles labeled A0, A1, B0 and B1.]
`
`

`
[Sheet 4 of 5]

[FIG. 6: flow chart with the following boxes.]
START
100: RECEIVE VERTEX DATA FOR A PRIMITIVE TO BE RENDERED
102: GENERATE PIXEL DATA IN RESPONSE TO THE VERTEX DATA
104: DETERMINE THE PIXELS WITHIN A SET OF TILES OF THE REPEATING TILE PATTERN TO BE PROCESSED BY A CORRESPONDING ONE OF THE AT LEAST TWO GRAPHICS PIPELINES IN RESPONSE TO THE PIXEL DATA, THE REPEATING TILE PATTERN INCLUDING A HORIZONTALLY AND VERTICALLY REPEATING PATTERN OF SQUARE REGIONS
    105: DETERMINE THE SET OF TILES THAT THE CORRESPONDING GRAPHICS PIPELINE IS RESPONSIBLE FOR
    106: PROVIDE POSITION COORDINATES OF THE PIXELS WITHIN THE DETERMINED SET OF TILES TO BE PROCESSED
108: PERFORM PIXEL OPERATIONS ON THE PIXELS WITHIN THE DETERMINED SET OF TILES BY THE CORRESPONDING ONE OF THE AT LEAST TWO GRAPHICS PIPELINES
109: decision with branches Y and N
END
`
`

`
[Sheet 5 of 5]

[FIG. 7: triangle with corners labeled vertex 0, vertex 1 and vertex 2, enclosed by a dashed bounding box.]

[FIG. 8: four panels labeled STS = 1, STS = 2, STS = 4 and STS = 8.]

DIVIDING WORK AMONG MULTIPLE GRAPHICS PIPELINES USING A SUPER-TILING TECHNIQUE

This application claims the benefit of U.S. Provisional Application Ser. No. 60/429,641 filed Nov. 27, 2002, entitled "Dividing Work Among Multiple Graphics Pipelines Using a Super-Tiling Technique", having as inventors Mark M. Leather and Eric Demers, and owned by instant assignee.

RELATED CO-PENDING APPLICATION

This is a related application to a co-pending application entitled "Parallel Pipeline Graphics System", having Ser. No. 10/724,384, having Leather et al. as the inventors, filed on Nov. 26, 2003, owned by the same assignee and hereby incorporated by reference in its entirety.

FIELD OF THE INVENTION

The present invention generally relates to graphics processing circuitry and, more particularly, to dividing graphics processing operations among multiple pipelines.
`
BACKGROUND OF THE INVENTION

Computer graphics systems, set top box systems or other graphics processing systems typically include a host processor, graphics (including video) processing circuitry, memory (e.g. frame buffer), and one or more display devices. The host processor may have a graphics application running thereon, which provides vertex data for a primitive (e.g. triangle) to be rendered on the one or more display devices to the graphics processing circuitry. The display device, for example, a CRT display, includes a plurality of scan lines comprised of a series of pixels. When appearance attributes (e.g. color, brightness, texture) are applied to the pixels, an object or scene is presented on the display device. The graphics processing circuitry receives the vertex data and generates pixel data including the appearance attributes which may be presented on the display device according to a particular protocol. The pixel data is typically stored in the frame buffer in a manner that corresponds to the pixel's location on the display device.

FIG. 1 illustrates a conventional display device 10, having a screen 12 partitioned into a series of vertical strips 13-18. The strips 13-18 are typically 1-4 pixels in width. In like manner, the frame buffer of conventional graphics processing systems is partitioned into a series of vertical strips having the same screen space width. Alternatively, the frame buffer and the display device may be partitioned into a series of horizontal strips. Graphics calculations, for example, lighting, color, texture and user viewing information are performed by the graphics processing circuitry on each of the primitives provided by the host. Once all calculations have been performed on the primitives, the pixel data representing the object to be displayed is written into the frame buffer. Once the graphics calculations have been repeated for all primitives associated with a specific frame, the data stored in the frame buffer is rendered to create a video signal that is provided to the display device.

The amount of time taken for an entire frame of information to be calculated and provided to the frame buffer becomes a bottleneck in graphics systems as the calculations associated with the graphics become more complicated. Contributing to the increased complexity of the graphics calculation is the increased need for higher resolution video, as well as the need for more complicated video, such as 3-D video. The video image observed by the human eye becomes distorted or choppy when the amount of time taken to render an entire frame of video exceeds the amount of time in which the display device must be refreshed with a new graphic or frame in order to avoid perception by the human eye. To decrease processing time, graphics processing systems typically divide primitive processing among several graphics processing circuits where, for example, one graphics processing circuit is responsible for one vertical strip (e.g. 13) of the frame while another graphics processing circuit is responsible for another vertical strip (e.g. 14) of the frame. In this manner, the pixel data is provided to the frame buffer within the required refresh time.

Load balancing is a significant drawback associated with the partitioning systems as described above. Load balancing problems occur, for example, when all of the primitives 20-23 of a particular object or scene are located in one strip (e.g. strip 13) as illustrated in FIG. 1. When this occurs, only the graphics processing circuit responsible for strip 13 is actively processing primitives; the remaining graphics processing circuits are idle. This results in a significant waste of computing resources as at most only half of the graphics processing circuits are operating. Consequently, graphics processing system performance is decreased as the system is only operating at a maximum of fifty percent capacity.

Changing the width of the strips has been employed to counter the system performance problems. However, when the width of a strip is increased, the load balancing problem is exacerbated as more primitives are located within a single strip; thereby, increasing the processing required of the graphics processing circuit responsible for that strip, while the remaining graphics processing circuits remain idle. When the width of the strip is decreased (e.g. four bits to two bits), cache (e.g. texture cache) efficiency is decreased as the number of cache lines employed in transferring data is reduced in proportion to the decreased width of the strip. In either case, graphics processing system performance is still decreased due to the idle graphics processing circuits.

Frame based subdivision has been used to overcome the performance problems associated with conventional partitioning systems. In frame based subdivision, each graphics processor is responsible for processing an entire frame, not strips within the same frame. The graphics processors then alternate frames. However, frame subdivision introduces one or more frames of latency between the user and the screen, which is unacceptable in real-time interactive environments, for example, providing graphics for a flight simulator application.
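
The load balancing drawback of strip partitioning described in this background can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that vertical strips are handed to graphics processing circuits in round-robin order; the function names `strip_owner` and `pixels_per_processor` are hypothetical and do not appear in the patent.

```python
def strip_owner(x, strip_width, num_processors):
    """Index of the processor owning the vertical strip that contains column x.

    Assumption: strips are assigned to processors round-robin, left to right.
    """
    return (x // strip_width) % num_processors


def pixels_per_processor(pixels, strip_width, num_processors):
    """Count how many of the given (x, y) pixels each processor must handle."""
    counts = {p: 0 for p in range(num_processors)}
    for x, _y in pixels:
        counts[strip_owner(x, strip_width, num_processors)] += 1
    return counts


# A small object whose pixels all fall in columns 0-3, i.e. inside one
# 4-pixel-wide strip, mirroring the FIG. 1 case where every primitive
# lies in a single strip.
clustered = [(x, y) for x in range(4) for y in range(100)]
print(pixels_per_processor(clustered, strip_width=4, num_processors=2))
# prints {0: 400, 1: 0}
```

In this sketch the second processor receives no work at all, which is the idle-circuit situation the background describes.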
`
BRIEF DESCRIPTION OF THE DRAWINGS

The present invention and the related advantages and benefits provided thereby, will be best appreciated and understood upon review of the following detailed description of a preferred embodiment, taken in conjunction with the following drawings, where like numerals represent like elements, in which:

FIG. 1 is a schematic block diagram of a conventional display partitioned into several vertical strips;
FIG. 2 is a schematic block diagram of a graphics processing system employing an exemplary multi-pipeline graphics processing circuit according to one embodiment of the present invention;
FIG. 3 is a schematic block diagram of a memory partitioned into an exemplary super-tile pattern according to the present invention;
FIG. 4 is a schematic block diagram of a memory partitioned into a super-tile pattern according to an alternate embodiment of the present invention;
FIG. 5 is a schematic block diagram of an exemplary multi-pipeline graphics processing circuit used in a multi processor configuration according to an alternate embodiment of the present invention;
FIG. 6 is a flow chart of the operations performed by the graphics processing circuit according to the present invention;
FIG. 7 is a diagram illustrating a polygon bounding box used to determine whether a polygon fits in a tile or super tile; and
FIG. 8 is a schematic block diagram of an exemplary multi-pipeline graphics processing circuit used in a multi processor configuration according to an alternate embodiment of the present invention.
`
DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT

A multi-pipeline graphics processing circuit includes at least two pipelines operative to process data in a corresponding tile of a repeating tile pattern, a respective one of the at least two pipelines is operative to process data in a dedicated tile, wherein the repeating tile pattern includes a horizontally and vertically repeating pattern of square regions. The multi-pipeline graphics processing circuit may be coupled to a frame buffer that is subdivided into a replicating pattern of square regions (e.g. tiles), where each region is processed by a corresponding one of the at least two pipelines such that load balancing and texture cache utilization is enhanced.

A multi-pipeline graphics processing method includes receiving vertex data for a primitive to be rendered, generating pixel data in response to the vertex data, determining the pixels within a set of tiles of a repeating tile pattern to be processed by a corresponding one of at least two graphics pipelines in response to the pixel data, the repeating tile pattern including a horizontally and vertically repeating pattern of square regions, and performing pixel operations on the pixels within the determined set of tiles by the corresponding one of the at least two graphics pipelines. An exemplary embodiment of the present invention will now be described with reference to FIGS. 2-6.

FIG. 2 is a schematic block diagram of an exemplary graphics processing system 30 employing an example of a multi-pipeline graphics processing circuit 34 according to one embodiment of the present invention. The graphics processing system 30 can be implemented with a single graphics processing circuit 34 or with two or more graphics processing circuits 34, 54. The components and corresponding functionality of the graphics processing circuits 34, 54 are substantially the same. Therefore, only the structure and operation of graphics processing circuit 34 will be described in detail. An alternate embodiment, employing both graphics processing circuits 34 and 54 will be discussed in greater detail below with reference to FIGS. 4-5.

Graphics data 31, for example, vertex data of a primitive (e.g. triangle) 80 (FIG. 3) is transmitted as a series of strips to the graphics processing circuit 34. As used herein, graphics data 31 can also include video data or a combination of video data and graphics data. The graphics processing circuit 34 is preferably a portion of a stand-alone graphics processor chip or may also be integrated with a host processor or other circuit, if desired, or part of a larger system. The graphics data 31 is provided by a host (not shown). The host may be a system processor (not shown) or a graphics application running on the system processor. In an alternate embodiment, an Accelerated Graphics Port (AGP) 32 or other suitable port receives the graphics data 31 from the host and provides the graphics data 31 to the graphics processing circuit 34 for further processing.

The graphics processing circuit 34 includes a first graphics pipeline 101 operative to process graphics data in a first set of tiles as discussed in greater detail below. The first pipeline 101 includes front end circuitry 35, a scan converter 37, and back end circuitry 39. The graphics processing circuit 34 also includes a second graphics pipeline 102, operative to process graphics data in a second set of tiles as discussed in greater detail below. The first graphics pipeline 101 and the second graphics pipeline 102 operate independently of one another. The second graphics pipeline 102 includes the front end circuitry 35, a scan converter 40, and back end circuitry 42. Thus, the graphics processing circuit 34 of the present invention is configured as a multi-pipeline circuit, where the back end circuitry 39 of the first graphics pipeline 101 and the back end circuitry 42 of the second graphics pipeline 102 share the front end circuitry 35, in that the first and second graphics pipelines 101 and 102 receive the same pixel data 36 provided by the front end circuitry 35. Alternatively, the back end circuitry 39 of the first graphics pipeline 101 and the back end circuitry 42 of the second pipeline 102 may be coupled to separate front end circuits. Additionally, it will be appreciated that a single graphics processing circuit can be configured in similar fashion to include more than two graphics pipelines. The illustrated graphics processing circuit 34 has the first and second pipelines 101-102 present on the same chip. However, in alternate embodiments, the first and second graphics pipelines 101-102 may be present on multiple chips interconnected by suitable communication circuitry or a communication path, for example, a synchronization signal or data bus interconnecting the respective memory controllers.

The front end circuitry 35 may include, for example, a vertex shader, set up circuitry, rasterizer or other suitable circuitry operative to receive the primitive data 31 and generate pixel data 36 to be further processed by the back end circuitry 39 and 42, respectively. The front end circuitry 35 generates the pixel data 36 by performing, for example, clipping, lighting, spatial transformations, matrix operations and rasterizing operations on the primitive data 31. The pixel data 36 is then transmitted to the respective scan converters 37 and 40 of the two graphics pipelines 101-102.

The scan converter 37 of the first graphics pipeline 101 receives the pixel data 36 and sequentially provides the position (e.g. x, y) coordinates 60 in screen space of the pixels to be processed by the back end circuitry 39 by determining or identifying those pixels of the primitive, for example, the pixels within portions 81-82 of the triangle 80 (FIG. 3) that intersect the tile or set of tiles that the back end circuitry 39 is responsible for processing. The particular tile(s) that the back end circuitry 39 is responsible for is determined based on the tile identification data present on the pixel identification line 38 of the scan converter 37. The pixel identification line 38 is illustrated as being hard wired to ground. Thus, the tile identification data corresponds to a logical zero. This corresponds to the back end circuitry 39 being responsible for processing the tiles labeled "A" (e.g. 72 and 75) in FIG. 3. Although the pixel identification line 38 is illustrated as being hard wired to a fixed value, it is to be understood and appreciated that the tile identification data can be programmable data, for example, from a suitable driver and such a configuration is contemplated by the present invention and is within the spirit and scope of the instant disclosure.

Back end circuitry 39 may include, for example, pixel shaders, blending circuits, z-buffers or any other circuitry for performing pixel appearance attribute operations (e.g. color, texture blending, z-buffering) on those pixels located, for example, in tiles 72, 75 (FIG. 3) corresponding to the position coordinates 60 provided by the scan converter 37. The processed pixel data 43 is then transmitted to graphics memory 48 via memory controller 46 for storage therein at locations corresponding to the position coordinates 60.

The scan converter 40 of the second graphics pipeline 102 receives the pixel data 36 and sequentially provides position (e.g. x, y) coordinates 61 in screen space of the pixels to be processed by the back end circuitry 42 by determining those pixels of the primitive, for example, the pixels within portions 83-84 of the triangle 80 (FIG. 3) that intersect the tiles that the back end circuitry 42 is responsible for processing. Back end circuitry 42 tile responsibility is determined based on the tile identification data present on the pixel identification line 41 of the scan converter 40. The pixel identification line 41 is illustrated as being hard wired to Vcc; thus, the tile identification data corresponds to a logical one. This corresponds to the back end circuitry 42 being responsible for processing the tiles labeled "B" (e.g. 73-74) in FIG. 3. Although the pixel identification line 41 is illustrated as being hard wired to a fixed value, it is to be understood and appreciated that the tile identification data can be programmable data, for example, from a suitable driver and such configuration is contemplated by the present invention and is within the spirit and scope of the instant disclosure.
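
The role of the two scan converters can be sketched in Python as a filter over the shared pixel data: each pipeline keeps only the positions whose tile matches its own tile identification value (0 for the "A" tiles, 1 for the "B" tiles). The checkerboard formula in `tile_id`, the 16x16 tile size taken from the description of FIG. 3, and the function names are assumptions made for illustration; the patent does not state how the hardware computes tile ownership.

```python
TILE_SIZE = 16  # assumed here; the description of FIG. 3 uses 16x16 pixel tiles


def tile_id(x, y, tile_size=TILE_SIZE):
    """Return 0 for an "A" tile and 1 for a "B" tile.

    Assumption: tiles alternate in a checkerboard, so that diagonal
    neighbours (such as tiles 72 and 75) share the same identification.
    """
    return ((x // tile_size) + (y // tile_size)) % 2


def scan_convert(pixel_positions, my_tile_id):
    """Yield, in order, only the positions this pipeline is responsible for."""
    for x, y in pixel_positions:
        if tile_id(x, y) == my_tile_id:
            yield (x, y)


# Both pipelines see the same pixel positions from the shared front end.
pixels = [(0, 0), (15, 15), (16, 0), (0, 16), (16, 16), (31, 17)]
coords_60 = list(scan_convert(pixels, 0))  # identification line tied to ground
coords_61 = list(scan_convert(pixels, 1))  # identification line tied to Vcc
print(coords_60)  # prints [(0, 0), (15, 15), (16, 16), (31, 17)]
print(coords_61)  # prints [(16, 0), (0, 16)]
```

Every position is claimed by exactly one of the two filters, so the pipelines never process the same pixel twice.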

Back end circuitry 42 may include, for example, pixel shaders, blending circuits, z-buffers or any suitable circuitry for performing pixel appearance attribute operations on those pixels located, for example, in tiles 73 and 74 (FIG. 3) corresponding to the position coordinates 61 provided by the scan converter 40. The processed pixel data 44 is then transmitted to the graphics memory 48, via memory controller 46, for storage therein at locations corresponding to the position coordinates 61.

The memory controller 46 is operative to transmit and receive the processed pixel data 43-44 from the back end circuitry 39 and 42; transmit and retrieve pixel data 49 from the graphics memory 48; and in a single circuit implementation, transmit pixel data 50 for presentation on a suitable display 51. The display 51 may be a monitor, a CRT, a high definition television (HDTV) or any other device or combination thereof.

Graphics memory 48 may include, for example, a frame buffer that also stores one or more texture maps. Referring to FIG. 3, the frame buffer portion of the graphics memory 48 is partitioned in a repeating tile pattern of horizontal and vertical square regions or tiles 72-75, where the regions 72-75 provide a two dimensional partitioning of the frame buffer portion of the memory 48. Each tile is implemented as a 16x16 pixel array. The repeating tile pattern of the frame buffer 48 corresponds to the partitioning of the corresponding display 51 (FIG. 2). When rendering a primitive (e.g. triangle) 80, the first graphics pipeline 101 processes only those pixels in portions 81, 82 of the primitive 80 that intersect tiles labeled "A", for example, 72 and 75, as the back end circuitry 39 is responsible for the processing of tiles corresponding to tile identification 0 present on pixel identification line 38 (FIG. 2). In corresponding fashion, the second graphics pipeline 102 processes only those pixels in portions 83, 84 of the primitive 80 that intersect tiles labeled "B", for example 73-74, as the back end circuitry 42 (FIG. 2) is responsible for the processing of tiles corresponding to tile identification 1 present on pixel identification line 41 (FIG. 2).

By configuring the frame buffer 48 according to the present invention, as the primitive data 31 is typically written in strips, the tiles (e.g. 72 and 75) being processed by the first graphics pipeline 101 and the tiles (e.g. 73 and 74) being processed by the second graphics pipeline 102 will be substantially equal in size, notwithstanding the primitive 80 orientation. Thus, the amount of processing performed by the first graphics pipeline 101 and the second graphics pipeline 102, respectively, are substantially equal; thereby, effectively eliminating the load balance problems exhibited by conventional techniques.
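
The claim that both pipelines receive substantially equal work can be checked on a concrete case with the following Python sketch. It rasterizes one triangle with a simple edge-function test and counts how many covered pixels land in "A" tiles and in "B" tiles. The rasterizer, the checkerboard ownership rule, the chosen triangle and all function names are assumptions for illustration, not the circuitry described in the patent.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed area test: which side of the edge a->b the point p lies on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def rasterize(v0, v1, v2):
    """Yield every integer pixel inside or on the triangle v0, v1, v2."""
    xs = (v0[0], v1[0], v2[0])
    ys = (v0[1], v1[1], v2[1])
    for y in range(min(ys), max(ys) + 1):
        for x in range(min(xs), max(xs) + 1):
            w0 = edge(*v1, *v2, x, y)
            w1 = edge(*v2, *v0, x, y)
            w2 = edge(*v0, *v1, x, y)
            # Accept either winding order; boundary pixels count as covered.
            if (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0):
                yield (x, y)


def share_per_pipeline(pixels, tile_size=16):
    """Count pixels per pipeline under an assumed checkerboard of square tiles."""
    counts = [0, 0]  # index 0: "A" tiles, index 1: "B" tiles
    for x, y in pixels:
        counts[((x // tile_size) + (y // tile_size)) % 2] += 1
    return counts


triangle_pixels = list(rasterize((0, 0), (255, 0), (0, 255)))
print(share_per_pipeline(triangle_pixels))  # prints [16384, 16512]
```

For this triangle the two pipelines differ by 128 pixels out of 32,896, so each handles close to half of the work.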

FIG. 4 is a schematic block diagram of a frame buffer 68 partitioned into a super-tile pattern according to an alternate embodiment of the present invention. Such a partitioning would be used, for example, in conjunction with a multi-processor implementation to be discussed below with reference to FIG. 5. As illustrated, the frame buffer 68 is partitioned into a repeating tile pattern where the tiles, for example, 92-99 that form the repeating tile pattern are the responsibility of and processed by a corresponding one of the graphics pipelines provided by the multi-processor implementation.
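
A repeating super-tile pattern shared by four pipelines can be sketched in Python as a small lookup table that is indexed by tile coordinates and wraps in both directions. The pipeline labels A0, A1, B0 and B1 follow the tile labels of FIG. 4, but the 2x2 arrangement in `SUPER_TILE`, the 16x16 tile size and the function name `owner` are hypothetical choices made for illustration and are not the layout shown in FIG. 4.

```python
TILE_SIZE = 16  # assumption: same tile size as the FIG. 3 example

# Hypothetical 2x2 super-tile; each entry names the pipeline that owns that tile.
SUPER_TILE = (
    ("A0", "B0"),
    ("B1", "A1"),
)


def owner(x, y, pattern=SUPER_TILE, tile_size=TILE_SIZE):
    """Return the pipeline label responsible for pixel (x, y).

    The pattern repeats horizontally and vertically across the frame buffer.
    """
    rows = len(pattern)
    cols = len(pattern[0])
    return pattern[(y // tile_size) % rows][(x // tile_size) % cols]


# Tally ownership over a 64x64 pixel region (4x4 tiles).
tally = {}
for y in range(64):
    for x in range(64):
        label = owner(x, y)
        tally[label] = tally.get(label, 0) + 1
print(tally)  # prints {'A0': 1024, 'B0': 1024, 'B1': 1024, 'A1': 1024}
```

A table-driven mapping like this makes it easy to try other arrangements by editing `SUPER_TILE` alone.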

FIG. 5 is a schematic block diagram of a graphics processing circuit 54 which may be coupled with the graphics processing circuit 34 (FIG. 2), for example, by the AGP 32 or other suitable port, to form one embodiment of a multi-processor implementation. The graphics processing circuit 54 is preferably a portion of a stand-alone graphics processor chip or may also be integrated with a host processor or other circuit, if desired, or part of a larger system. The multi-processor implementation exhibits an increased fill rate of, for example, 9.6 billion pixels/sec with a triangle rate of 300 million triangles/sec. This represents a tremendous performance increase as compared to conventional graphics processing systems. The triangle rate is defined as the number of triangles the graphics processing circuit can generate per second. The fill rate is defined as the number of pixels the graphics processing circuit can render per second.
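
The two example rates can be related with a short calculation, sketched here in Python. Dividing the fill rate by the triangle rate gives the average triangle size at which both rates would be reached together. The per-pipeline figure assumes four pipelines (two in each of the circuits 34 and 54) that share the fill rate evenly; that even split and the function names are assumptions for illustration.

```python
# Example figures quoted for the multi-processor implementation.
FILL_RATE = 9_600_000_000     # pixels per second
TRIANGLE_RATE = 300_000_000   # triangles per second


def pixels_per_triangle(fill_rate, triangle_rate):
    """Average pixels per triangle at which both rates are reached together."""
    return fill_rate / triangle_rate


def per_pipeline_fill_rate(fill_rate, num_pipelines):
    """Fill rate per pipeline, assuming the load is split evenly."""
    return fill_rate / num_pipelines


print(pixels_per_triangle(FILL_RATE, TRIANGLE_RATE))  # prints 32.0
print(per_pipeline_fill_rate(FILL_RATE, 4))           # prints 2400000000.0
```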

Referring briefly to FIG. 2, in the multi-processor implementation, processed pixel data 52 from the graphics processing circuit 34 is provided as a first of two inputs to a high speed switch 70. The second input to the high speed switch 70 is the processed pixel data 55 from the graphics processing circuit 54. The high speed switch 70 has a switching frequency (f) sufficient to provide the pixel information 71 to a suitable display device without any detectable latency.

Returning to FIG. 5, the graphics processing circuit 54 includes a third graphics pipeline 201 operative to process graphics data in a third set of tiles. The third graphics pipeline 201 includes front end circuitry 135, which may be the front end circuitry 35 discussed with reference to FIG. 2, a scan converter 137 and back end circuitry 139. The graphics processing circuit 54 also includes a fourth graphics pipeline 202, operative to process graphics data in a fourth set of tiles. The fourth graphics pipeline 202 includes the front end circuitry 135, a scan converter 140 and back end circuitry 142. The third graphics pipeline 201 and the fourth graphics pipeline 202 also operate independently of one another. Thus, the graphics processing circuit 54 is configured as a multi-pipeline circuit, where the back end circuitry 139 of the third graphics pipeline 201 and the back end circuitry 142 of the fourth graphics pipeline 202 share the front end circuitry 135, in that the respective back end circuitry 139 and 142 receives the same pixel data from the front end circuitry 135. As illustrated, the components of the third and fourth graphics pipelines are present on a single chip. Additionally, the back end circuitry 139 and the back end circuitry 142 may be configured to share the front end circuitry
