`US008933945B2
`
`(12) United States Patent
`Leather et al.
`
`(10) Patent No.: US 8,933,945 B2
`(45) Date of Patent: Jan. 13, 2015
`
`(54) DIVIDING WORK AMONG MULTIPLE
`GRAPHICS PIPELINES USING A
`SUPER-TILING TECHNIQUE
`
`(75) Inventors: Mark M. Leather, Saratoga, CA (US);
`Eric Demers, Palo Alto, CA (US)
`
`(73) Assignee: ATI Technologies ULC, Markham,
`Ontario (CA)
`
`(*) Notice: Subject to any disclaimer, the term of this
`patent is extended or adjusted under 35
`U.S.C. 154(b) by 1808 days.
`
`(21) Appl. No.: 10/459,797
`
`(22) Filed: Jun. 12, 2003
`
`(65) Prior Publication Data
`
`US 2004/0100471 A1 May 27, 2004
`
`Related U.S. Application Data
`
`(60) Provisional application No. 60/429,641, filed on Nov. 27, 2002.
`
`(51) Int. Cl.
`G06T 1/20 (2006.01)
`G06F 13/14 (2006.01)
`G06F 12/02 (2006.01)
`G06T 11/40 (2006.01)
`G06T 15/00 (2011.01)
`G09G 5/36 (2006.01)
`
`(52) U.S. Cl.
`CPC G06T 11/40 (2013.01); G06T 1/20 (2013.01);
`G06T 15/005 (2013.01); G09G 5/363 (2013.01)
`USPC 345/506; 345/519; 345/544
`
`(58) Field of Classification Search
`USPC 345/506, 530, 505, 588, 544, 545, 532,
`345/501, 502, 531, 519
`See application file for complete search history.
`
`
`Primary Examiner- Joni Richer
`(74) Attorney, Agent, or Firm- Faegre Baker Daniels LLP
`
`(57) ABSTRACT
`
`A graphics processing circuit includes at least two pipelines
`operative to process data in a corresponding set of tiles of a
`repeating tile pattern, a respective one of the at least two
`pipelines operative to process data in a dedicated tile, wherein
`the repeating tile pattern includes a horizontally and vertically
`repeating pattern of square regions. A graphics processing
`method includes receiving vertex data for a primitive to be
`rendered; generating pixel data in response to the vertex data;
`determining the pixels within a set of tiles of a repeating tile
`pattern to be processed by a corresponding one of at least two
`graphics pipelines in response to the pixel data, the repeating
`tile pattern including a horizontally and vertically repeating
`pattern of square regions; and performing pixel operations on
`the pixels within the determined set of tiles by the
`corresponding one of the at least two graphics pipelines.
`
`21 Claims, 5 Drawing Sheets
`
`
`
`
`
`
`
`[Drawing Sheet 1 of 5. FIG. 1 (PRIOR ART): display device 10 with
`screen 12 partitioned into vertical strips 13-18, with primitives
`20-23. FIG. 5: graphics processing circuit 54 with front end
`circuitry 135, two scan converters, two back end circuits, and
`memory controller 146.]
`
`
`
`[Drawing Sheet 2 of 5. FIG. 2: graphics processing system with a
`host, AGP 32, graphics processing circuit 34 (front end circuitry
`35, two scan converters, back end circuitry A and B, memory
`controller), graphics processing circuit 54, a switch, graphics
`memory 48, and display 51.]
`
`
`
`[Drawing Sheet 3 of 5. FIG. 3: memory 48 partitioned into tiles
`labeled A and B, overlaid with a triangle. FIG. 4: frame buffer 68
`partitioned into tiles labeled A0, A1, B0 and B1.]
`
`
`
`[Drawing Sheet 4 of 5. FIG. 6: flow chart with reference numerals
`100-109. Boxes: START; receive vertex data for a primitive to be
`rendered; generate pixel data in response to the vertex data;
`determine the pixels within a set of tiles of the repeating tile
`pattern to be processed by a corresponding one of the at least two
`graphics pipelines in response to the pixel data, the repeating
`tile pattern including a horizontally and vertically repeating
`pattern of square regions; determine the set of tiles that the
`corresponding graphics pipeline is responsible for; provide
`position coordinates of the pixels within the determined set of
`tiles to be processed; perform pixel operations on the pixels
`within the determined set of tiles by the corresponding one of the
`at least two graphics pipelines; decision (Y/N); END.]
`
`
`
`[Drawing Sheet 5 of 5. FIG. 7: triangle with vertex 0, vertex 1 and
`vertex 2 enclosed in a dashed bounding box. FIG. 8: four panels
`labeled STS = 1, STS = 2, STS = 4 and STS = 8.]
`
`
`
`DIVIDING WORK AMONG MULTIPLE
`GRAPHICS PIPELINES USING A
`SUPER-TILING TECHNIQUE
`
`This application claims the benefit of U.S. Provisional
`Application Ser. No. 60/429,641 filed Nov. 27, 2002, entitled
`"Dividing Work Among Multiple Graphics Pipelines Using a
`Super-Tiling Technique", having as inventors Mark M.
`Leather and Eric Demers, and owned by instant assignee.
`
`RELATED CO-PENDING APPLICATION
`
`This is a related application to a co-pending application
`entitled "Parallel Pipeline Graphics System", having Ser. No.
`10/724,384, having Leather et al. as the inventors, filed on
`Nov. 26, 2003, owned by the same assignee and hereby
`incorporated by reference in its entirety.
`
`FIELD OF THE INVENTION
`
`The present invention generally relates to graphics processing
`circuitry and, more particularly, to dividing graphics
`processing operations among multiple pipelines.
`
`BACKGROUND OF THE INVENTION
`
`Computer graphics systems, set top box systems or other
`graphics processing systems typically include a host processor,
`graphics (including video) processing circuitry, memory
`(e.g. frame buffer), and one or more display devices. The host
`processor may have a graphics application running thereon,
`which provides vertex data for a primitive (e.g. triangle) to be
`rendered on the one or more display devices to the graphics
`processing circuitry. The display device, for example, a CRT
`display, includes a plurality of scan lines comprised of a series
`of pixels. When appearance attributes (e.g. color, brightness,
`texture) are applied to the pixels, an object or scene is
`presented on the display device. The graphics processing
`circuitry receives the vertex data and generates pixel data
`including the appearance attributes which may be presented on the
`display device according to a particular protocol. The pixel
`data is typically stored in the frame buffer in a manner that
`corresponds to the pixel's location on the display device.
`FIG. 1 illustrates a conventional display device 10, having
`a screen 12 partitioned into a series of vertical strips 13-18.
`The strips 13-18 are typically 1-4 pixels in width. In like
`manner, the frame buffer of conventional graphics processing
`systems is partitioned into a series of vertical strips having the
`same screen space width. Alternatively, the frame buffer and
`the display device may be partitioned into a series of
`horizontal strips. Graphics calculations, for example, lighting,
`color, texture and user viewing information are performed by the
`graphics processing circuitry on each of the primitives
`provided by the host. Once all calculations have been performed
`on the primitives, the pixel data representing the object to be
`displayed is written into the frame buffer. Once the graphics
`calculations have been repeated for all primitives associated
`with a specific frame, the data stored in the frame buffer is
`rendered to create a video signal that is provided to the display
`device.
`The amount of time taken for an entire frame of information
`to be calculated and provided to the frame buffer
`becomes a bottleneck in graphics systems as the calculations
`associated with the graphics become more complicated.
`Contributing to the increased complexity of the graphics
`calculation is the increased need for higher resolution video, as
`well as the need for more complicated video, such as 3-D video.
`The video image observed by the human eye becomes
`distorted or choppy when the amount of time taken to render an
`entire frame of video exceeds the amount of time in which the
`display device must be refreshed with a new graphic or frame
`in order to avoid perception by the human eye. To decrease
`processing time, graphics processing systems typically
`divide primitive processing among several graphics processing
`circuits where, for example, one graphics processing circuit
`is responsible for one vertical strip (e.g. 13) of the frame
`while another graphics processing circuit is responsible for
`another vertical strip (e.g. 14) of the frame. In this manner, the
`pixel data is provided to the frame buffer within the required
`refresh time.
`Load balancing is a significant drawback associated with
`the partitioning systems as described above. Load balancing
`problems occur, for example, when all of the primitives 20-23
`of a particular object or scene are located in one strip (e.g.
`strip 13) as illustrated in FIG. 1. When this occurs, only the
`graphics processing circuit responsible for strip 13 is actively
`processing primitives; the remaining graphics processing
`circuits are idle. This results in a significant waste of computing
`resources as at most only half of the graphics processing
`circuits are operating. Consequently, graphics processing
`system performance is decreased as the system is only
`operating at a maximum of fifty percent capacity.
`Changing the width of the strips has been employed to
`counter the system performance problems. However, when
`the width of a strip is increased, the load balancing problem is
`worsened as more primitives are located within a single strip,
`thereby increasing the processing required of the graphics
`processing circuit responsible for that strip, while the
`remaining graphics processing circuits remain idle. When the width
`of the strip is decreased (e.g. four bits to two bits), cache (e.g.
`texture cache) efficiency is decreased as the number of cache
`lines employed in transferring data is reduced in proportion to
`the decreased width of the strip. In either case, graphics
`processing system performance is still decreased due to the
`idle graphics processing circuits.
`Frame based subdivision has been used to overcome the
`performance problems associated with conventional
`partitioning systems. In frame based subdivision, each graphics
`processor is responsible for processing an entire frame, not
`strips within the same frame. The graphics processors then
`alternate frames. However, frame subdivision introduces one
`or more frames of latency between the user and the screen,
`which is unacceptable in real-time interactive environments,
`for example, providing graphics for a flight simulator
`application.
`
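The load-balancing weakness of strip partitioning described in this background can be illustrated with a minimal sketch. It assumes, purely for illustration, that vertical strips are handed to the graphics processing circuits in round-robin order; the function names `strip_owner` and `load_per_circuit` and the sample object are hypothetical and do not come from the patent.

```python
def strip_owner(x, strip_width, num_circuits):
    """Index of the circuit that owns the vertical strip containing column x.

    Round-robin assignment of strips to circuits is an assumption.
    """
    return (x // strip_width) % num_circuits


def load_per_circuit(pixels, strip_width, num_circuits):
    """Count how many of the given (x, y) pixels fall to each circuit."""
    load = [0] * num_circuits
    for x, _y in pixels:
        load[strip_owner(x, strip_width, num_circuits)] += 1
    return load


# A small object, 4 pixels wide and 64 pixels tall, starting at column 8.
small_object = [(x, y) for x in range(8, 12) for y in range(64)]

# With 4-pixel strips and two circuits, columns 8-11 form a single strip,
# so one circuit receives every pixel and the other sits idle.
print(load_per_circuit(small_object, strip_width=4, num_circuits=2))  # prints [256, 0]
```

Narrowing the strips to 2 pixels splits this particular object evenly, which is the trade-off against cache efficiency that the background mentions.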
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`The present invention and the related advantages and benefits
`provided thereby will be best appreciated and understood
`upon review of the following detailed description of a
`preferred embodiment, taken in conjunction with the following
`drawings, where like numerals represent like elements, in
`which:
`FIG. 1 is a schematic block diagram of a conventional
`display partitioned into several vertical strips;
`FIG. 2 is a schematic block diagram of a graphics processing
`system employing an exemplary multi-pipeline graphics
`processing circuit according to one embodiment of the
`present invention;
`FIG. 3 is a schematic block diagram of a memory partitioned
`into an exemplary super-tile pattern according to the
`present invention;
`FIG. 4 is a schematic block diagram of a memory partitioned
`into a super-tile pattern according to an alternate
`embodiment of the present invention;
`FIG. 5 is a schematic block diagram of an exemplary
`multi-pipeline graphics processing circuit used in a multi processor
`configuration according to an alternate embodiment of the
`present invention;
`FIG. 6 is a flow chart of the operations performed by the
`graphics processing circuit according to the present invention;
`FIG. 7 is a diagram illustrating a polygon bounding box used to
`determine whether a polygon fits in a tile or super tile; and
`FIG. 8 is a schematic block diagram of an exemplary
`multi-pipeline graphics processing circuit used in a multi processor
`configuration according to an alternate embodiment of the
`present invention.
`
`DETAILED DESCRIPTION OF A PREFERRED
`EMBODIMENT
`
`A multi-pipeline graphics processing circuit includes at
`least two pipelines operative to process data in a corresponding
`tile of a repeating tile pattern, a respective one of the at
`least two pipelines is operative to process data in a dedicated
`tile, wherein the repeating tile pattern includes a horizontally
`and vertically repeating pattern of square regions. The
`multi-pipeline graphics processing circuit may be coupled to a
`frame buffer that is subdivided into a replicating pattern of
`square regions (e.g. tiles), where each region is processed by
`a corresponding one of the at least two pipelines such that
`load balancing and texture cache utilization are enhanced.
`A multi-pipeline graphics processing method includes
`receiving vertex data for a primitive to be rendered, generating
`pixel data in response to the vertex data, determining the
`pixels within a set of tiles of a repeating tile pattern to be
`processed by a corresponding one of at least two graphics
`pipelines in response to the pixel data, the repeating tile
`pattern including a horizontally and vertically repeating
`pattern of square regions, and performing pixel operations on the
`pixels within the determined set of tiles by the corresponding
`one of the at least two graphics pipelines. An exemplary
`embodiment of the present invention will now be described
`with reference to FIGS. 2-6.
`FIG. 2 is a schematic block diagram of an exemplary
`graphics processing system 30 employing an example of a
`multi-pipeline graphics processing circuit 34 according to
`one embodiment of the present invention. The graphics
`processing system 30 can be implemented with a single graphics
`processing circuit 34 or with two or more graphics processing
`circuits 34, 54. The components and corresponding
`functionality of the graphics processing circuits 34, 54 are
`substantially the same. Therefore, only the structure and operation of
`graphics processing circuit 34 will be described in detail. An
`alternate embodiment, employing both graphics processing
`circuits 34 and 54 will be discussed in greater detail below
`with reference to FIGS. 4-5.
`Graphics data 31, for example, vertex data of a primitive
`(e.g. triangle) 80 (FIG. 3) is transmitted as a series of strips to
`the graphics processing circuit 34. As used herein, graphics
`data 31 can also include video data or a combination of video
`data and graphics data. The graphics processing circuit 34 is
`preferably a portion of a stand-alone graphics processor chip
`or may also be integrated with a host processor or other
`circuit, if desired, or part of a larger system. The graphics data
`31 is provided by a host (not shown). The host may be a
`system processor (not shown) or a graphics application
`running on the system processor. In an alternate embodiment, an
`Accelerated Graphics Port (AGP) 32 or other suitable port
`receives the graphics data 31 from the host and provides the
`graphics data 31 to the graphics processing circuit 34 for
`further processing.
`The graphics processing circuit 34 includes a first graphics
`pipeline 101 operative to process graphics data in a first set of
`tiles as discussed in greater detail below. The first pipeline
`101 includes front end circuitry 35, a scan converter 37, and
`back end circuitry 39. The graphics processing circuit 34 also
`includes a second graphics pipeline 102, operative to process
`graphics data in a second set of tiles as discussed in greater
`detail below. The first graphics pipeline 101 and the second
`graphics pipeline 102 operate independently of one another.
`The second graphics pipeline 102 includes the front end
`circuitry 35, a scan converter 40, and back end circuitry 42.
`Thus, the graphics processing circuit 34 of the present
`invention is configured as a multi-pipeline circuit, where the back
`end circuitry 39 of the first graphics pipeline 101 and the back
`end circuitry 42 of the second graphics pipeline 102 share the
`front end circuitry 35, in that the first and second graphics
`pipelines 101 and 102 receive the same pixel data 36 provided
`by the front end circuitry 35. Alternatively, the back end
`circuitry 39 of the first graphics pipeline 101 and the back end
`circuitry 42 of the second pipeline 102 may be coupled to
`separate front end circuits. Additionally, it will be appreciated
`that a single graphics processing circuit can be configured in
`similar fashion to include more than two graphics pipelines.
`The illustrated graphics processing circuit 34 has the first and
`second pipelines 101-102 present on the same chip. However,
`in alternate embodiments, the first and second graphics
`pipelines 101-102 may be present on multiple chips
`interconnected by suitable communication circuitry or a
`communication path, for example, a synchronization signal or data bus
`interconnecting the respective memory controllers.
`The front end circuitry 35 may include, for example, a
`vertex shader, set up circuitry, rasterizer or other suitable
`circuitry operative to receive the primitive data 31 and
`generate pixel data 36 to be further processed by the back end
`circuitry 39 and 42, respectively. The front end circuitry 35
`generates the pixel data 36 by performing, for example,
`clipping, lighting, spatial transformations, matrix operations and
`rasterizing operations on the primitive data 31. The pixel data
`36 is then transmitted to the respective scan converters 37 and
`40 of the two graphics pipelines 101-102.
`The scan converter 37 of the first graphics pipeline 101
`receives the pixel data 36 and sequentially provides the
`position (e.g. x, y) coordinates 60 in screen space of the pixels to
`be processed by the back end circuitry 39 by determining or
`identifying those pixels of the primitive, for example, the
`pixels within portions 81-82 of the triangle 80 (FIG. 3) that
`intersect the tile or set of tiles that the back end circuitry 39 is
`responsible for processing. The particular tile(s) that the back
`end circuitry 39 is responsible for is determined based on the
`tile identification data present on the pixel identification line
`38 of the scan converter 37. The pixel identification line 38 is
`illustrated as being hard wired to ground. Thus, the tile
`identification data corresponds to a logical zero. This corresponds
`to the back end circuitry 39 being responsible for processing
`the tiles labeled "A" (e.g. 72 and 75) in FIG. 3. Although the
`pixel identification line 38 is illustrated as being hard wired to
`a fixed value, it is to be understood and appreciated that the
`tile identification data can be programmable data, for
`example, from a suitable driver and such a configuration is
`contemplated by the present invention and is within the spirit
`and scope of the instant disclosure.
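The tile-selection behavior of the scan converters can be sketched in software as follows. This is only an illustrative model, not the patented circuit: it assumes the "A" and "B" tiles alternate like a checkerboard and uses the 16x16 tile size given later in the description. The names `tile_id` and `scan_convert` and the sample pixel stream are hypothetical.

```python
TILE_SIZE = 16  # each tile is a 16x16 pixel array, per the description


def tile_id(x, y, tile_size=TILE_SIZE):
    """Return 0 for an "A" tile and 1 for a "B" tile.

    The checkerboard arrangement of A and B tiles is an assumption.
    """
    return ((x // tile_size) + (y // tile_size)) % 2


def scan_convert(pixels, pipeline_tile_id):
    """Yield only the (x, y) positions that this pipeline is responsible for."""
    for x, y in pixels:
        if tile_id(x, y) == pipeline_tile_id:
            yield (x, y)


# Both pipelines receive the same pixel stream from the shared front end.
pixel_stream = [(0, 0), (15, 15), (16, 0), (0, 16), (16, 16), (40, 5)]

# Tile identification 0 models line 38 tied to ground; 1 models line 41 tied to Vcc.
pipeline_a = list(scan_convert(pixel_stream, pipeline_tile_id=0))
pipeline_b = list(scan_convert(pixel_stream, pipeline_tile_id=1))

print(pipeline_a)  # prints [(0, 0), (15, 15), (16, 16), (40, 5)]
print(pipeline_b)  # prints [(16, 0), (0, 16)]
```

Making `pipeline_tile_id` a parameter rather than a constant mirrors the remark that the tile identification data may be programmable instead of hard wired.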
`Back end circuitry 39 may include, for example, pixel
`shaders, blending circuits, z-buffers or any other circuitry for
`performing pixel appearance attribute operations (e.g. color,
`texture blending, z-buffering) on those pixels located, for
`example, in tiles 72, 75 (FIG. 3) corresponding to the position
`coordinates 60 provided by the scan converter 37. The
`processed pixel data 43 is then transmitted to graphics memory
`48 via memory controller 46 for storage therein at locations
`corresponding to the position coordinates 60.
`The scan converter 40 of the second graphics pipeline 102
`receives the pixel data 36 and sequentially provides position
`(e.g. x, y) coordinates 61 in screen space of the pixels to be
`processed by the back end circuitry 42 by determining those
`pixels of the primitive, for example, the pixels within portions
`83-84 of the triangle 80 (FIG. 3) that intersect the tiles that the
`back end circuitry 42 is responsible for processing. Back end
`circuitry 42 tile responsibility is determined based on the tile
`identification data present on the pixel identification line 41 of
`the scan converter 40. The pixel identification line 41 is
`illustrated as being hard wired to Vcc; thus, the tile identification
`data corresponds to a logical one. This corresponds to the
`back end circuitry 42 being responsible for processing the
`tiles labeled "B" (e.g. 73-74) in FIG. 3. Although the pixel
`identification line 41 is illustrated as being hard wired to a
`fixed value, it is to be understood and appreciated that the tile
`identification data can be programmable data, for example,
`from a suitable driver and such configuration is contemplated
`by the present invention and is within the spirit and scope of
`the instant disclosure.
`Back end circuitry 42 may include, for example, pixel
`shaders, blending circuits, z-buffers or any suitable circuitry
`for performing pixel appearance attribute operations on those
`pixels located, for example, in tiles 73 and 74 (FIG. 3)
`corresponding to the position coordinates 61 provided by the scan
`converter 40. The processed pixel data 44 is then transmitted
`to the graphics memory 48, via memory controller 46, for
`storage therein at locations corresponding to the position
`coordinates 61.
`The memory controller 46 is operative to transmit and
`receive the processed pixel data 43-44 from the back end
`circuitry 39 and 42; transmit and retrieve pixel data 49 from
`the graphics memory 48; and in a single circuit
`implementation, transmit pixel data 50 for presentation on a suitable
`display 51. The display 51 may be a monitor, a CRT, a high
`definition television (HDTV) or any other device or
`combination thereof.
`Graphics memory 48 may include, for example, a frame
`buffer that also stores one or more texture maps. Referring to
`FIG. 3, the frame buffer portion of the graphics memory 48 is
`partitioned in a repeating tile pattern of horizontal and vertical
`square regions or tiles 72-75, where the regions 72-75 provide
`a two dimensional partitioning of the frame buffer portion of
`the memory 48. Each tile is implemented as a 16x16 pixel
`array. The repeating tile pattern of the frame buffer 48
`corresponds to the partitioning of the corresponding display 51
`(FIG. 2). When rendering a primitive (e.g. triangle) 80, the
`first graphics pipeline 101 processes only those pixels in
`portions 81, 82 of the primitive 80 that intersect tiles labeled
`"A", for example, 72 and 75, as the back end circuitry 39 is
`responsible for the processing of tiles corresponding to tile
`identification 0 present on pixel identification line 38 (FIG.
`2). In corresponding fashion, the second graphics pipeline
`102 processes only those pixels in portions 83, 84 of the
`primitive 80 that intersect tiles labeled "B", for example
`73-74, as the back end circuitry 42 (FIG. 2) is responsible for
`the processing of tiles corresponding to tile identification 1
`present on pixel identification line 41 (FIG. 2).
`By configuring the frame buffer 48 according to the present
`invention, as the primitive data 31 is typically written in
`strips, the tiles (e.g. 72 and 75) being processed by the first
`graphics pipeline 101 and the tiles (e.g. 73 and 74) being
`processed by the second graphics pipeline 102 will be
`substantially equal in size, notwithstanding the primitive 80
`orientation. Thus, the amounts of processing performed by the
`first graphics pipeline 101 and the second graphics pipeline
`102, respectively, are substantially equal, thereby effectively
`eliminating the load balance problems exhibited by
`conventional techniques.
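The load-balancing effect of the tile pattern can be checked numerically with a small sketch. It rasterizes one large triangle with a simple edge-function test and counts the pixels landing in "A" and "B" tiles. The checkerboard layout, the sample triangle, and all function names are assumptions made for illustration; the rasterizer is a generic textbook one, not the front end circuitry of the patent.

```python
TILE_SIZE = 16  # tile edge in pixels, as stated in the description


def tile_id(x, y, tile_size=TILE_SIZE):
    """0 for an "A" tile, 1 for a "B" tile, assuming a checkerboard layout."""
    return ((x // tile_size) + (y // tile_size)) % 2


def edge(a, b, p):
    """Signed test of which side of the directed edge a->b the point p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(v0, v1, v2):
    """Yield integer pixel positions covered by the triangle (edges included)."""
    xs = (v0[0], v1[0], v2[0])
    ys = (v0[1], v1[1], v2[1])
    for y in range(min(ys), max(ys) + 1):
        for x in range(min(xs), max(xs) + 1):
            p = (x, y)
            w = (edge(v0, v1, p), edge(v1, v2, p), edge(v2, v0, p))
            if all(v >= 0 for v in w) or all(v <= 0 for v in w):
                yield p


def pipeline_loads(triangle):
    """Count the triangle's pixels handled by pipeline A (index 0) and B (index 1)."""
    loads = [0, 0]
    for x, y in rasterize(*triangle):
        loads[tile_id(x, y)] += 1
    return loads


triangle = ((0, 0), (255, 0), (0, 255))
print(pipeline_loads(triangle))  # prints [16384, 16512]
```

For this triangle the two pipelines differ by 128 pixels out of 32,896, well under one percent.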
`FIG. 4 is a schematic block diagram of a frame buffer 68
`partitioned into a super-tile pattern according to an alternate
`embodiment of the present invention. Such a partitioning
`would be used, for example, in conjunction with a
`multi-processor implementation to be discussed below with
`reference to FIG. 5. As illustrated, the frame buffer 68 is
`partitioned into a repeating tile pattern where the tiles, for
`example, 92-99 that form the repeating tile pattern are the
`responsibility of and processed by a corresponding one of the
`graphics pipelines provided by the multi-processor
`implementation.
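A super-tile pattern for four pipelines can be modeled as a small lookup table that repeats across the frame buffer. The sketch below is hypothetical: the labels A0, A1, B0 and B1 follow the tile labels visible in FIG. 4, but the 2x2 arrangement chosen here, the name `SUPER_TILE`, and the function `owner` are assumptions rather than the layout actually drawn in the figure.

```python
TILE_SIZE = 16  # tile edge in pixels; the 16x16 size is taken from the FIG. 3 discussion

# Hypothetical 2x2 super-tile: entry [row][col] names the pipeline owning that tile.
SUPER_TILE = (
    ("A0", "B0"),
    ("B1", "A1"),
)


def owner(x, y, tile_size=TILE_SIZE, pattern=SUPER_TILE):
    """Return the label of the pipeline responsible for pixel (x, y)."""
    tile_col = x // tile_size
    tile_row = y // tile_size
    return pattern[tile_row % len(pattern)][tile_col % len(pattern[0])]


# Count how a 64x64 pixel region is shared among the four pipelines.
counts = {}
for y in range(64):
    for x in range(64):
        label = owner(x, y)
        counts[label] = counts.get(label, 0) + 1

print(counts)  # prints {'A0': 1024, 'B0': 1024, 'B1': 1024, 'A1': 1024}
```

Because the pattern is data rather than logic, a different arrangement or a larger super-tile only requires changing `SUPER_TILE`.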
`FIG. 5 is a schematic block diagram of a graphics processing
`circuit 54 which may be coupled with the graphics
`processing circuit 34 (FIG. 2), for example, by the AGP 32 or
`other suitable port, to form one embodiment of a
`multi-processor implementation. The graphics processing circuit 54 is
`preferably a portion of a stand-alone graphics processor chip
`or may also be integrated with a host processor or other
`circuit, if desired, or part of a larger system. The
`multi-processor implementation exhibits an increased fill rate of,
`for example, 9.6 billion pixels/sec with a triangle rate of 300
`million triangles/sec. This represents a tremendous
`performance increase as compared to conventional graphics
`processing systems. The triangle rate is defined as the number of
`triangles the graphics processing circuit can generate per
`second. The fill rate is defined as the number of pixels the
`graphics processing circuit can render per second.
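To put the quoted example rates in perspective, a short calculation relates fill rate and triangle rate. Only the two rates come from the text; the 60 Hz refresh rate is an assumption added for illustration, and the results are simple ratios of peak figures rather than measured performance.

```python
FILL_RATE = 9_600_000_000    # pixels per second (example figure from the text)
TRIANGLE_RATE = 300_000_000  # triangles per second (example figure from the text)
REFRESH_HZ = 60              # assumed display refresh rate, not from the text

# Average triangle size at which both peak rates are reached together.
pixels_per_triangle = FILL_RATE // TRIANGLE_RATE

# Peak work available within one refresh interval.
pixels_per_frame = FILL_RATE // REFRESH_HZ
triangles_per_frame = TRIANGLE_RATE // REFRESH_HZ

print(pixels_per_triangle)   # prints 32
print(pixels_per_frame)      # prints 160000000
print(triangles_per_frame)   # prints 5000000
```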
`Referring briefly to FIG. 2, in the multi-processor
`implementation, processed pixel data 52 from the graphics
`processing circuit 34 is provided as a first of two inputs to a high
`speed switch 70. The second input to the high speed switch 70
`is the processed pixel data 55 from the graphics processing
`circuit 54. The high speed switch 70 has a switching
`frequency (f) sufficient to provide the pixel information 71 to a
`suitable display device without any detectable latency.
`Returning to FIG. 5, the graphics processing circuit 54
`includes a third graphics pipeline 201 operative to process
`graphics data in a third set of tiles. The third graphics pipeline
`201 includes front end circuitry 135, which may be the front
`end circuitry 35 discussed with reference to FIG. 2, a scan
`converter 137 and back end circuitry 139. The graphics
`processing circuit 54 also includes a fourth graphics pipeline
`202, operative to process graphics data in a fourth set of tiles.
`The fourth graphics pipeline 202 includes the front end
`circuitry 135, a scan converter 140 and back end circuitry 142.
`The third graphics pipeline 201 and the fourth graphics
`pipeline 202 also operate independently of one another. Thus, the
`graphics processing circuit 54 is configured as a
`multi-pipeline circuit, where the back end circuitry 139 of the third
`graphics pipeline 201 and the back end circuitry 142 of the
`fourth graphics pipeline 202 share the front end circuitry 135,
`in that the respective back end circuitry 139 and 142 receives
`the same pixel data from the front end circuitry 135. As
`illustrated, the components of the third and fourth graphics
`pipelines are present on a single chip. Additionally, the back
`end circuitry 139 and the back end circuitry 142 may be
`configured to share the front end circuitry