(12) United States Patent
Leather et al.

(10) Patent No.: US 8,933,945 B2
(45) Date of Patent: Jan. 13, 2015

(54) DIVIDING WORK AMONG MULTIPLE GRAPHICS PIPELINES USING A SUPER-TILING TECHNIQUE

(75) Inventors: Mark M. Leather, Saratoga, CA (US); Eric Demers, Palo Alto, CA (US)

(73) Assignee: ATI Technologies ULC, Markham, Ontario (CA)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 1808 days.

(21) Appl. No.: 10/459,797

(22) Filed: Jun. 12, 2003

(65) Prior Publication Data
US 2004/0100471 A1, May 27, 2004

Related U.S. Application Data
(60) Provisional application No. 60/429,641, filed on Nov. 27, 2002.

(51) Int. Cl.
G06T 1/20 (2006.01)
G06F 13/14 (2006.01)
G06F 12/02 (2006.01)
G06T 11/40 (2006.01)
G06T 15/00 (2011.01)
G09G 5/36 (2006.01)

(52) U.S. Cl.
CPC: G06T 11/40 (2013.01); G06T 1/20 (2013.01); G06T 15/005 (2013.01); G09G 5/363 (2013.01)
USPC: 345/506; 345/519; 345/544

(58) Field of Classification Search
USPC: 345/506, 530, 505, 588, 544, 545, 532, 501, 502, 531, 519
See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS

4,885,703 A 12/1989 Deering
5,179,640 A * 1/1993 Duffy ... 345/596
5,550,962 A 8/1996 Nakamura et al.
5,745,118 A * 4/1998 Alcorn et al. ... 345/587
5,794,016 A * 8/1998 Kelleher ... 345/505
5,818,469 A 10/1998 Lawless et al.
5,905,506 A * 5/1999 Hamburg ... 345/672
5,977,997 A 11/1999 Vainsencher
5,999,196 A 12/1999 Storm et al.
6,118,452 A 9/2000 Garmett

(Continued)

OTHER PUBLICATIONS

Elias, Hugo. “Polygon Scan Converting.” http://freespace.virgin.net/hugo.elias/graphics/x_polysc.htm. *

(Continued)

Primary Examiner: Joni Richer
(74) Attorney, Agent, or Firm: Faegre Baker Daniels LLP

(57) ABSTRACT

A graphics processing circuit includes at least two pipelines operative to process data in a corresponding set of tiles of a repeating tile pattern, a respective one of the at least two pipelines operative to process data in a dedicated tile, wherein the repeating tile pattern includes a horizontally and vertically repeating pattern of square regions. A graphics processing method includes receiving vertex data for a primitive to be rendered; generating pixel data in response to the vertex data; determining the pixels within a set of tiles of a repeating tile pattern to be processed by a corresponding one of at least two graphics pipelines in response to the pixel data, the repeating tile pattern including a horizontally and vertically repeating pattern of square regions; and performing pixel operations on the pixels within the determined set of tiles by the corresponding one of the at least two graphics pipelines.

21 Claims, 5 Drawing Sheets

[Front-page drawing: flow chart. Legible boxes: receive vertex data for a primitive to be rendered; generate pixel data in response to the vertex data; determine the pixels within a set of tiles of the repeating tile pattern to be processed by a corresponding one of the at least two graphics pipelines in response to the pixel data, the repeating tile pattern including a horizontally and vertically repeating pattern of square regions; determine the set of tiles that the corresponding graphics pipeline is responsible for; provide position coordinates of the pixels within the determined set of tiles to be processed; perform pixel operations on the pixels within the determined set of tiles by the corresponding one of the at least two graphics pipelines; processing complete.]

TEXAS INSTRUMENTS EX. 1001 - 1/13

(56) References Cited

U.S. PATENT DOCUMENTS

6,184,906 B1 * 2/2001 Wang et al.
6,219,062 B1 4/2001 Matsuo et al.
6,222,550 B1 4/2001 Rosman et al.
6,292,200 B1 9/2001 Bowen et al.
6,323,860 B1 11/2001 Zhu et al.
6,344,852 B1 2/2002 Zhu et al.
6,353,439 B1 3/2002 Lindholm et al.
6,380,935 B1 4/2002 Heeschen et al.
6,384,824 B1 5/2002 Morgan et al.
6,407,736 B1 6/2002 Regan
6,417,858 B1 7/2002 Bosch et al.
6,424,345 B1 7/2002 Smith et al.
6,557,083 B1 4/2003 Sperber et al.
6,570,579 B1 * 5/2003 MacInnis et al.
6,573,893 B1 6/2003 Naqvi et al.
6,636,232 B2 10/2003 Larson
6,650,327 B1 11/2003 Airey et al.
6,650,330 B2 11/2003 Lindholm et al.
6,697,063 B1 2/2004 Zhu
6,714,203 B1 * 3/2004 Morgan et al.
6,724,394 B1 4/2004 Zatz et al.
6,731,289 B1 5/2004 Peercy et al.
6,750,867 B1 6/2004 Gibson
6,753,878 B1 * 6/2004 Heirich et al.
6,762,763 B1 * 7/2004 Migdal et al.
6,778,177 B1 * 8/2004 Furtner
6,791,559 B2 9/2004 Baldwin
6,801,203 B1 10/2004 Hussain
6,809,732 B2 10/2004 Zatz et al.
6,864,893 B2 3/2005 Zatz
6,864,896 B2 * 3/2005 Perego ... 345/542
6,897,871 B1 5/2005 Morein et al.
6,980,209 B1 12/2005 Donham et al.
7,015,913 B1 3/2006 Lindholm et al.
7,061,495 B1 6/2006 Leather
7,170,515 B1 1/2007 Zhu
2002/0145612 A1 10/2002 Blythe et al.
2003/0076320 A1 4/2003 Collodi
2003/0164830 A1 * 9/2003 Kent ... 345/505
2004/0041814 A1 3/2004 Wyatt et al.
2004/0164987 A1 8/2004 Aronson et al.
2005/0068325 A1 3/2005 Lefebvre et al.
2005/0200629 A1 9/2005 Morein et al.
2006/0170690 A1 8/2006 Leather

OTHER PUBLICATIONS

European Search Report from European Patent Office; European Application No. 032574642; dated Apr. 4, 2006.
Foley, James et al.; Computer Graphics, Principles and Practice; Addison-Wesley Publishing Company; 1990; pp. 873-899.
Crockett, Thomas W.; An introduction to parallel rendering; Elsevier Science B.V.; 1997; pp. 819-843.
Montrym, John S. et al.; InfiniteReality: A Real-Time Graphics System; Silicon Graphics Computer Systems; 1997; pp. 293-302.
Humphreys, Greg et al.; WireGL: A Scalable Graphics System for Clusters; ACM Siggraph; 2001; pp. 129-140.
Akeley, K. et al.; High-Performance Polygon Rendering; ACM Computer Graphics; Vol. 22, No. 4; 1988; pp. 239-246.
Breternitz, Jr., Mauricio et al.; Compilation, Architectural Support, and Evaluation of SIMD Graphics Pipeline Programs on a General-Purpose CPU; IEEE; 2003; pp. 1-11.
International Search Report for PCT Patent Application PCT/IB2004/003821 dated Mar. 22, 2005.
Fuchs, Henry et al.; Pixel-Planes 5: A Heterogeneous Multiprocessor Graphics System Using Processor-Enhanced Memories; Computer Graphics; Vol. 23, No. 3; Jul. 1989; pp. 79-88.

* cited by examiner

[Drawing Sheet 1 of 5: FIG. 1 and FIG. 5. Legible labels: converter; back end circuitry A0; back end circuitry B0; memory controller.]

[Drawing Sheet 2 of 5: FIG. 2. Legible labels: graphics processing circuit; converter; back end circuitry A; back end circuitry B; graphics memory.]

[Drawing Sheet 3 of 5]

[Drawing Sheet 4 of 5: FIG. 6, flow chart.]

[Drawing Sheet 5 of 5: FIG. 7, polygon bounding box. Legible labels: bounding box; vertex 2.]

DIVIDING WORK AMONG MULTIPLE GRAPHICS PIPELINES USING A SUPER-TILING TECHNIQUE

This application claims the benefit of U.S. Provisional Application Ser. No. 60/429,641 filed Nov. 27, 2002, entitled “Dividing Work Among Multiple Graphics Pipelines Using a Super-Tiling Technique”, having as inventors Mark M. Leather and Eric Demers, and owned by the instant assignee.

RELATED CO-PENDING APPLICATION

This is a related application to a co-pending application entitled “Parallel Pipeline Graphics System”, having Ser. No. 10/724,384, having Leather et al. as the inventors, filed on Nov. 26, 2003, owned by the same assignee and hereby incorporated by reference in its entirety.

FIELD OF THE INVENTION

The present invention generally relates to graphics processing circuitry and, more particularly, to dividing graphics processing operations among multiple pipelines.

BACKGROUND OF THE INVENTION

Computer graphics systems, set top box systems or other graphics processing systems typically include a host processor, graphics (including video) processing circuitry, memory (e.g. frame buffer), and one or more display devices. The host processor may have a graphics application running thereon, which provides the graphics processing circuitry with vertex data for a primitive (e.g. triangle) to be rendered on the one or more display devices. The display device, for example a CRT display, includes a plurality of scan lines, each comprised of a series of pixels. When appearance attributes (e.g. color, brightness, texture) are applied to the pixels, an object or scene is presented on the display device. The graphics processing circuitry receives the vertex data and generates pixel data including the appearance attributes, which may be presented on the display device according to a particular protocol. The pixel data is typically stored in the frame buffer in a manner that corresponds to the pixels' locations on the display device.

FIG. 1 illustrates a conventional display device 10, having a screen 12 partitioned into a series of vertical strips 13-18. The strips 13-18 are typically 1-4 pixels in width. In like manner, the frame buffer of conventional graphics processing systems is partitioned into a series of vertical strips having the same screen space width. Alternatively, the frame buffer and the display device may be partitioned into a series of horizontal strips. Graphics calculations, for example, lighting, color, texture and user viewing information, are performed by the graphics processing circuitry on each of the primitives provided by the host. Once all calculations have been performed on the primitives, the pixel data representing the object to be displayed is written into the frame buffer. Once the graphics calculations have been repeated for all primitives associated with a specific frame, the data stored in the frame buffer is rendered to create a video signal that is provided to the display device.

The amount of time taken for an entire frame of information to be calculated and provided to the frame buffer becomes a bottleneck in graphics systems as the calculations associated with the graphics become more complicated. Contributing to the increased complexity of the graphics calculation is the increased need for higher resolution video, as well as the need for more complicated video, such as 3-D video.

The video image observed by the human eye becomes distorted or choppy when the amount of time taken to render an entire frame of video exceeds the amount of time within which the display device must be refreshed with a new graphic or frame for the refresh to go unnoticed by the human eye. To decrease processing time, graphics processing systems typically divide primitive processing among several graphics processing circuits where, for example, one graphics processing circuit is responsible for one vertical strip (e.g. 13) of the frame while another graphics processing circuit is responsible for another vertical strip (e.g. 14) of the frame. In this manner, the pixel data is provided to the frame buffer within the required refresh time.

Load balancing is a significant drawback associated with the partitioning systems described above. Load balancing problems occur, for example, when all of the primitives 20-23 of a particular object or scene are located in one strip (e.g. strip 13), as illustrated in FIG. 1. When this occurs, only the graphics processing circuit responsible for strip 13 is actively processing primitives; the remaining graphics processing circuits are idle. This results in a significant waste of computing resources, as at most half of the graphics processing circuits are operating. Consequently, graphics processing system performance is decreased, as the system is operating at a maximum of fifty percent capacity.

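To make the strip-partitioning problem concrete, the Python sketch below counts how many covered pixels each of two renderers would receive when an object falls entirely inside one strip. It assumes 4-pixel-wide strips dealt out alternately to two renderers and models the object as a plain list of covered pixels; the function names are illustrative and are not taken from the patent.

```python
from typing import Iterable, List, Tuple

Pixel = Tuple[int, int]


def strip_owner(x: int, strip_width: int = 4, num_units: int = 2) -> int:
    """Renderer responsible for column x when strips are dealt out round-robin (assumed scheme)."""
    return (x // strip_width) % num_units


def strip_loads(pixels: Iterable[Pixel], strip_width: int = 4, num_units: int = 2) -> List[int]:
    """Number of covered pixels each renderer must process."""
    loads = [0] * num_units
    for x, _y in pixels:
        loads[strip_owner(x, strip_width, num_units)] += 1
    return loads


def utilization(loads: List[int]) -> float:
    """Achieved fraction of peak throughput: total work / (units * busiest unit)."""
    return sum(loads) / (len(loads) * max(loads))


# Stand-in for primitives 20-23 of FIG. 1: a tall object lying entirely
# within the first strip (columns 0-3). The size is an arbitrary assumption.
narrow_object = [(x, y) for x in range(4) for y in range(64)]

loads = strip_loads(narrow_object)
print(loads)               # [256, 0]
print(utilization(loads))  # 0.5
```

With two renderers the busier one sets the frame time, so the pair runs at 50 percent of its combined capacity, which is the fifty percent figure given above.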
Changing the width of the strips has been employed to counter these system performance problems. However, when the width of a strip is increased, the load balancing problem is exacerbated, as more primitives are located within a single strip, thereby increasing the processing required of the graphics processing circuit responsible for that strip while the remaining graphics processing circuits remain idle. When the width of the strip is decreased (e.g. four bits to two bits), cache (e.g. texture cache) efficiency is decreased, as the number of cache lines employed in transferring data is reduced in proportion to the decreased width of the strip. In either case, graphics processing system performance is still decreased due to the idle graphics processing circuits.

Frame-based subdivision has been used to overcome the performance problems associated with conventional partitioning systems. In frame-based subdivision, each graphics processor is responsible for processing an entire frame, not strips within the same frame. The graphics processors then alternate frames. However, frame subdivision introduces one or more frames of latency between the user and the screen, which is unacceptable in real-time interactive environments, for example, providing graphics for a flight simulator application.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention and the related advantages and benefits provided thereby will be best appreciated and understood upon review of the following detailed description of a preferred embodiment, taken in conjunction with the following drawings, where like numerals represent like elements, in which:

FIG. 1 is a schematic block diagram of a conventional display partitioned into several vertical strips;
FIG. 2 is a schematic block diagram of a graphics processing system employing an exemplary multi-pipeline graphics processing circuit according to one embodiment of the present invention;
FIG. 3 is a schematic block diagram of a memory partitioned into an exemplary super-tile pattern according to the present invention;
FIG. 4 is a schematic block diagram of a memory partitioned into a super-tile pattern according to an alternate embodiment of the present invention;
FIG. 5 is a schematic block diagram of an exemplary multi-pipeline graphics processing circuit used in a multi-processor configuration according to an alternate embodiment of the present invention;
FIG. 6 is a flow chart of the operations performed by the graphics processing circuit according to the present invention;
FIG. 7 is a diagram illustrating a polygon bounding box used to determine whether a polygon fits in a tile or super-tile; and
FIG. 8 is a schematic block diagram of an exemplary multi-pipeline graphics processing circuit used in a multi-processor configuration according to an alternate embodiment of the present invention.

DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT

A multi-pipeline graphics processing circuit includes at least two pipelines operative to process data in a corresponding tile of a repeating tile pattern, a respective one of the at least two pipelines being operative to process data in a dedicated tile, wherein the repeating tile pattern includes a horizontally and vertically repeating pattern of square regions. The multi-pipeline graphics processing circuit may be coupled to a frame buffer that is subdivided into a replicating pattern of square regions (e.g. tiles), where each region is processed by a corresponding one of the at least two pipelines such that load balancing and texture cache utilization are enhanced.

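One way to express a horizontally and vertically repeating pattern of square regions is a function that maps a pixel to the pipeline owning its tile. The Python sketch below assumes the 16x16-pixel tiles described for FIG. 3 and a checkerboard assignment of tiles to two pipelines; the function name and the modulo formula are illustrative choices, not a formula stated in the patent.

```python
TILE_SIZE = 16  # tile edge in pixels (FIG. 3 describes 16x16-pixel tiles)


def tile_owner(x: int, y: int, num_pipelines: int = 2, tile_size: int = TILE_SIZE) -> int:
    """Return the ID of the pipeline owning the tile that contains pixel (x, y).

    Hypothetical mapping: with num_pipelines == 2 it yields a checkerboard,
    so horizontally and vertically adjacent tiles belong to different pipelines.
    """
    tile_x, tile_y = x // tile_size, y // tile_size
    return (tile_x + tile_y) % num_pipelines


# Print the owner ("A" = 0, "B" = 1) of a 4x4 block of tiles.
for ty in range(4):
    print("".join("AB"[tile_owner(tx * TILE_SIZE, ty * TILE_SIZE)] for tx in range(4)))
# ABAB
# BABA
# ABAB
# BABA
```

Because ownership depends only on tile coordinates, each pipeline can evaluate it on its own from its pipeline ID, without exchanging anything with the other pipeline; under this assumed mapping the pattern repeats every two tiles in each direction.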
A multi-pipeline graphics processing method includes receiving vertex data for a primitive to be rendered, generating pixel data in response to the vertex data, determining the pixels within a set of tiles of a repeating tile pattern to be processed by a corresponding one of at least two graphics pipelines in response to the pixel data, the repeating tile pattern including a horizontally and vertically repeating pattern of square regions, and performing pixel operations on the pixels within the determined set of tiles by the corresponding one of the at least two graphics pipelines. An exemplary embodiment of the present invention will now be described with reference to FIGS. 2-6.

FIG. 2 is a schematic block diagram of an exemplary graphics processing system 30 employing an example of a multi-pipeline graphics processing circuit 34 according to one embodiment of the present invention. The graphics processing system 30 can be implemented with a single graphics processing circuit 34 or with two or more graphics processing circuits 34, 54. The components and corresponding functionality of the graphics processing circuits 34, 54 are substantially the same. Therefore, only the structure and operation of graphics processing circuit 34 will be described in detail. An alternate embodiment, employing both graphics processing circuits 34 and 54, will be discussed in greater detail below with reference to FIGS. 4-5.

Graphics data 31, for example, vertex data of a primitive (e.g. triangle) 80 (FIG. 3), is transmitted as a series of strips to the graphics processing circuit 34. As used herein, graphics data 31 can also include video data or a combination of video data and graphics data. The graphics processing circuit 34 is preferably a portion of a stand-alone graphics processor chip, or may also be integrated with a host processor or other circuit, if desired, or be part of a larger system. The graphics data 31 is provided by a host (not shown). The host may be a system processor (not shown) or a graphics application running on the system processor. In an alternate embodiment, an Accelerated Graphics Port (AGP) 32 or other suitable port receives the graphics data 31 from the host and provides the graphics data 31 to the graphics processing circuit 34 for further processing.

The graphics processing circuit 34 includes a first graphics pipeline 101 operative to process graphics data in a first set of tiles, as discussed in greater detail below. The first pipeline 101 includes front end circuitry 35, a scan converter 37, and back end circuitry 39. The graphics processing circuit 34 also includes a second graphics pipeline 102, operative to process graphics data in a second set of tiles, as discussed in greater detail below. The first graphics pipeline 101 and the second graphics pipeline 102 operate independently of one another. The second graphics pipeline 102 includes the front end circuitry 35, a scan converter 40, and back end circuitry 42. Thus, the graphics processing circuit 34 of the present invention is configured as a multi-pipeline circuit, where the back end circuitry 39 of the first graphics pipeline 101 and the back end circuitry 42 of the second graphics pipeline 102 share the front end circuitry 35, in that the first and second graphics pipelines 101 and 102 receive the same pixel data 36 provided by the front end circuitry 35. Alternatively, the back end circuitry 39 of the first graphics pipeline 101 and the back end circuitry 42 of the second pipeline 102 may be coupled to separate front end circuits. Additionally, it will be appreciated that a single graphics processing circuit can be configured in similar fashion to include more than two graphics pipelines. The illustrated graphics processing circuit 34 has the first and second pipelines 101-102 present on the same chip. However, in alternate embodiments, the first and second graphics pipelines 101-102 may be present on multiple chips interconnected by suitable communication circuitry or a communication path, for example, a synchronization signal or data bus interconnecting the respective memory controllers.

The front end circuitry 35 may include, for example, a vertex shader, set up circuitry, rasterizer or other suitable circuitry operative to receive the primitive data 31 and generate pixel data 36 to be further processed by the back end circuitry 39 and 42, respectively. The front end circuitry 35 generates the pixel data 36 by performing, for example, clipping, lighting, spatial transformations, matrix operations and rasterizing operations on the primitive data 31. The pixel data 36 is then transmitted to the respective scan converters 37 and 40 of the two graphics pipelines 101-102.

The scan converter 37 of the first graphics pipeline 101 receives the pixel data 36 and sequentially provides the position (e.g. x, y) coordinates 60 in screen space of the pixels to be processed by the back end circuitry 39, by determining or identifying those pixels of the primitive, for example, the pixels within portions 81-82 of the triangle 80 (FIG. 3), that intersect the tile or set of tiles that the back end circuitry 39 is responsible for processing. The particular tile(s) that the back end circuitry 39 is responsible for are determined based on the tile identification data present on the pixel identification line 38 of the scan converter 37. The pixel identification line 38 is illustrated as being hard wired to ground; thus, the tile identification data corresponds to a logical zero. This corresponds to the back end circuitry 39 being responsible for processing the tiles labeled “A” (e.g. 72 and 75) in FIG. 3. Although the pixel identification line 38 is illustrated as being hard wired to a fixed value, it is to be understood and appreciated that the tile identification data can be programmable data, for example, from a suitable driver, and such a configuration is contemplated by the present invention and is within the spirit and scope of the instant disclosure.

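The division of labor between the shared front end and the per-pipeline scan converters can be sketched as follows: every pipeline receives the same primitive, rasterizes it, and keeps only the pixels in tiles whose ID matches its own tile identification value (0 for a line tied to ground, 1 for a line tied to VCC). This Python sketch assumes a bounding-box rasterizer with edge-function tests at pixel centers, a common technique that the patent does not identify as its own, together with a hypothetical checkerboard mapping of 16x16-pixel tiles to pipelines.

```python
import math
from typing import Iterator, Tuple

Point = Tuple[float, float]
Pixel = Tuple[int, int]


def tile_owner(x: int, y: int, num_pipelines: int = 2, tile_size: int = 16) -> int:
    """Hypothetical checkerboard tile-to-pipeline mapping."""
    return ((x // tile_size) + (y // tile_size)) % num_pipelines


def edge(a: Point, b: Point, px: float, py: float) -> float:
    """Edge function: positive on one side of the line a->b, negative on the other."""
    return (b[0] - a[0]) * (py - a[1]) - (b[1] - a[1]) * (px - a[0])


def covered_pixels(v0: Point, v1: Point, v2: Point) -> Iterator[Pixel]:
    """Yield pixels whose centers lie inside the triangle (either winding order)."""
    min_x = math.floor(min(v0[0], v1[0], v2[0]))
    max_x = math.ceil(max(v0[0], v1[0], v2[0]))
    min_y = math.floor(min(v0[1], v1[1], v2[1]))
    max_y = math.ceil(max(v0[1], v1[1], v2[1]))
    for y in range(min_y, max_y):
        for x in range(min_x, max_x):
            cx, cy = x + 0.5, y + 0.5
            w0, w1, w2 = edge(v1, v2, cx, cy), edge(v2, v0, cx, cy), edge(v0, v1, cx, cy)
            if (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0):
                yield x, y


def scan_convert(v0: Point, v1: Point, v2: Point, tile_id: int,
                 num_pipelines: int = 2) -> Iterator[Pixel]:
    """Per-pipeline scan converter: emit only pixels in tiles owned by tile_id."""
    for x, y in covered_pixels(v0, v1, v2):
        if tile_owner(x, y, num_pipelines) == tile_id:
            yield x, y


# Both pipelines see the same triangle; each keeps a disjoint share of it.
tri = ((0.0, 0.0), (64.0, 0.0), (0.0, 64.0))
pipe_a = set(scan_convert(*tri, tile_id=0))  # e.g. identification line tied to ground
pipe_b = set(scan_convert(*tri, tile_id=1))  # e.g. identification line tied to VCC
print(len(pipe_a), len(pipe_b), len(pipe_a | pipe_b))  # 1024 1056 2080
```

The union of the two outputs is exactly the triangle's coverage and their intersection is empty, which is what lets the two back ends work without coordinating. A practical implementation could skip tiles it does not own rather than testing every pixel; the per-pixel filter is used here only for readability.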
Back end circuitry 39 may include, for example, pixel shaders, blending circuits, z-buffers or any other circuitry for performing pixel appearance attribute operations (e.g. color, texture blending, z-buffering) on those pixels located, for example, in tiles 72, 75 (FIG. 3) corresponding to the position coordinates 60 provided by the scan converter 37. The processed pixel data 43 is then transmitted to graphics memory 48 via memory controller 46 for storage therein at locations corresponding to the position coordinates 60.

The scan converter 40 of the second graphics pipeline 102 receives the pixel data 36 and sequentially provides position (e.g. x, y) coordinates 61 in screen space of the pixels to be processed by the back end circuitry 42, by determining those pixels of the primitive, for example, the pixels within portions 83-84 of the triangle 80 (FIG. 3), that intersect the tiles that the back end circuitry 42 is responsible for processing. The tile responsibility of back end circuitry 42 is determined based on the tile identification data present on the pixel identification line 41 of the scan converter 40. The pixel identification line 41 is illustrated as being hard wired to VCC; thus, the tile identification data corresponds to a logical one. This corresponds to the back end circuitry 42 being responsible for processing the tiles labeled “B” (e.g. 73-74) in FIG. 3. Although the pixel identification line 41 is illustrated as being hard wired to a fixed value, it is to be understood and appreciated that the tile identification data can be programmable data, for example, from a suitable driver, and such a configuration is contemplated by the present invention and is within the spirit and scope of the instant disclosure.

Back end circuitry 42 may include, for example, pixel shaders, blending circuits, z-buffers or any suitable circuitry for performing pixel appearance attribute operations on those pixels located, for example, in tiles 73 and 74 (FIG. 3) corresponding to the position coordinates 61 provided by the scan converter 40. The processed pixel data 44 is then transmitted to the graphics memory 48, via memory controller 46, for storage therein at locations corresponding to the position coordinates 61.

The memory controller 46 is operative to receive the processed pixel data 43-44 from the back end circuitry 39 and 42; to transmit pixel data 49 to, and retrieve pixel data 49 from, the graphics memory 48; and, in a single circuit implementation, to transmit pixel data 50 for presentation on a suitable display 51. The display 51 may be a monitor, a CRT, a high definition television (HDTV) or any other device or combination thereof.

Graphics memory 48 may include, for example, a frame buffer that also stores one or more texture maps. Referring to FIG. 3, the frame buffer portion of the graphics memory 48 is partitioned into a repeating tile pattern of horizontal and vertical square regions or tiles 72-75, where the regions 72-75 provide a two-dimensional partitioning of the frame buffer portion of the memory 48. Each tile is implemented as a 16x16 pixel array. The repeating tile pattern of the frame buffer 48 corresponds to the partitioning of the corresponding display 51 (FIG. 2). When rendering a primitive (e.g. triangle) 80, the first graphics pipeline 101 processes only those pixels in portions 81, 82 of the primitive 80 that intersect tiles labeled “A”, for example, 72 and 75, as the back end circuitry 39 is responsible for the processing of tiles corresponding to tile identification 0 present on pixel identification line 38 (FIG. 2). In corresponding fashion, the second graphics pipeline 102 processes only those pixels in portions 83, 84 of the primitive 80 that intersect tiles labeled “B”, for example, 73-74, as the back end circuitry 42 (FIG. 2) is responsible for the processing of tiles corresponding to tile identification 1 present on pixel identification line 41 (FIG. 2).
By configuring the frame buffer 48 according to the present invention, and since the primitive data 31 is typically written in strips, the tiles (e.g. 72 and 75) being processed by the first graphics pipeline 101 and the tiles (e.g. 73 and 74) being processed by the second graphics pipeline 102 will be substantially equal in size, notwithstanding the orientation of the primitive 80. Thus, the amounts of processing performed by the first graphics pipeline 101 and the second graphics pipeline 102, respectively, are substantially equal, thereby effectively eliminating the load balance problems exhibited by conventional techniques.

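The balancing claim can be checked numerically under a hypothetical checkerboard mapping of 16x16-pixel tiles to two pipelines. The Python sketch below counts the pixels each pipeline would receive for two differently shaped objects: a narrow object confined to a single 4-pixel-wide column, the situation that starves one circuit under strip partitioning, and a large right triangle with a diagonal edge. Coverage is modeled as pixel centers inside the shape; the tile size, the mapping and both test shapes are assumptions for illustration.

```python
from typing import Iterable, List, Tuple


def tile_owner(x: int, y: int, num_pipelines: int = 2, tile_size: int = 16) -> int:
    """Hypothetical checkerboard tile-to-pipeline mapping."""
    return ((x // tile_size) + (y // tile_size)) % num_pipelines


def tile_loads(pixels: Iterable[Tuple[int, int]], num_pipelines: int = 2) -> List[int]:
    """Number of covered pixels assigned to each pipeline."""
    loads = [0] * num_pipelines
    for x, y in pixels:
        loads[tile_owner(x, y, num_pipelines)] += 1
    return loads


# Object confined to columns 0-3, i.e. inside a single 4-pixel-wide strip.
narrow = [(x, y) for x in range(4) for y in range(64)]

# Right triangle with vertices (0, 0), (256, 0) and (0, 256): pixel (x, y)
# is covered when its center satisfies (x + 0.5) + (y + 0.5) <= 256.
triangle = [(x, y) for y in range(256) for x in range(256 - y)]

print(tile_loads(narrow))    # [128, 128]
print(tile_loads(triangle))  # [16384, 16512]
```

In the triangle case the only imbalance comes from the partially covered tiles along the diagonal edge, which under a checkerboard all belong to the same pipeline; even so, the split stays within 1 percent (16,512 versus 16,384 pixels).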
FIG. 4 is a schematic block diagram of a frame buffer 68 partitioned into a super-tile pattern according to an alternate embodiment of the present invention. Such a partitioning would be used, for example, in conjunction with a multi-processor implementation, discussed below with reference to FIG. 5. As illustrated, the frame buffer 68 is partitioned into a repeating tile pattern where the tiles, for example, 92-99, that form the repeating tile pattern are the responsibility of, and processed by, a corresponding one of the graphics pipelines provided by the multi-processor implementation.

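For the multi-processor arrangement, the same idea extends to more pipelines by tiling the frame buffer with a small ownership table that repeats horizontally and vertically. The text does not spell out how tiles 92-99 of FIG. 4 are assigned, so the 2x4 table below is a hypothetical arrangement in which each of four pipelines (101, 102, 201 and 202, numbered 0-3 here) owns two of the eight tiles in one period. Splitting a global ID into a chip index and a local pipeline index is likewise an assumption made for illustration.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical repeating pattern of pipeline IDs; IDs 0-1 are assumed to sit on
# graphics processing circuit 34 and IDs 2-3 on graphics processing circuit 54.
PATTERN: List[List[int]] = [
    [0, 1, 2, 3],
    [2, 3, 0, 1],
]
TILE_SIZE = 16  # tile edge in pixels (assumed, as in FIG. 3)


def owner(x: int, y: int, pattern: List[List[int]] = PATTERN, tile_size: int = TILE_SIZE) -> int:
    """Global pipeline ID owning pixel (x, y); the table repeats in both directions."""
    rows, cols = len(pattern), len(pattern[0])
    return pattern[(y // tile_size) % rows][(x // tile_size) % cols]


def chip_and_local(pipeline_id: int, pipes_per_chip: int = 2) -> Tuple[int, int]:
    """Split a global ID into (chip index, pipeline index on that chip)."""
    return divmod(pipeline_id, pipes_per_chip)


# Each pipeline owns the same number of tiles in one period of the pattern.
per_period = Counter(owner(tx * TILE_SIZE, ty * TILE_SIZE) for ty in range(2) for tx in range(4))
print(sorted(per_period.items()))     # [(0, 2), (1, 2), (2, 2), (3, 2)]
print(chip_and_local(owner(48, 0)))  # (1, 1)
```

In this table no two edge-adjacent tiles share an owner, so, as in the two-pipeline case, a primitive spanning a few tiles tends to be spread across several pipelines and both chips.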
FIG. 5 is a schematic block diagram of a graphics processing circuit 54 which may be coupled with the graphics processing circuit 34 (FIG. 2), for example, by the AGP 32 or other suitable port, to form one embodiment of a multi-processor implementation. The graphics processing circuit 54 is preferably a portion of a stand-alone graphics processor chip, or may also be integrated with a host processor or other circuit, if desired, or be part of a larger system. The multi-processor implementation exhibits an increased fill rate of, for example, 9.6 billion pixels/sec with a triangle rate of 300 million triangles/sec. This represents a tremendous performance increase compared to conventional graphics processing systems. The triangle rate is defined as the number of triangles the graphics processing circuit can generate per second. The fill rate is defined as the number of pixels the graphics processing circuit can render per second.

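As a rough check on the example figures, dividing the quoted fill rate by the quoted triangle rate gives the average triangle size at which the two limits coincide. The per-pipeline figure additionally assumes, purely for illustration, that the fill rate is shared evenly by the four pipelines 101, 102, 201 and 202.

```latex
\frac{9.6\times 10^{9}\ \text{pixels/s}}{300\times 10^{6}\ \text{triangles/s}} = 32\ \text{pixels per triangle},
\qquad
\frac{9.6\times 10^{9}\ \text{pixels/s}}{4\ \text{pipelines}} = 2.4\times 10^{9}\ \text{pixels/s per pipeline}.
```

At these peak rates, triangles averaging more than about 32 pixels would be limited by the fill rate rather than by the triangle rate.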
Referring briefly to FIG. 2, in the multi-processor implementation, processed pixel data 52 from the graphics processing circuit 34 is provided as a first of two inputs to a high speed switch 70. The second input to the high speed switch 70 is the processed pixel data 55 from the graphics processing circuit 54. The high speed switch 70 has a switching frequency (f) sufficient to provide the pixel information 71 to a suitable display device without any detectable latency.

Returning to FIG. 5, the graphics processing circuit 54 includes a third graphics pipeline 201 operative to process graphics data in a third set of tiles. The third graphics pipeline 201 includes front end circuitry 135 (which may be the front end circuitry 35 discussed with reference to FIG. 2), a scan converter 137 and back end circuitry 139. The graphics processing circuit 54 also includes a fourth graphics pipeline 202 operative to process graphics data in a fourth set of tiles. The fourth graphics pipeline 202 includes the front end circuitry 135, a scan converter 140 and back end circuitry 142. The third graphics pipeline 201 and the fourth graphics pipeline 202 also operate independently of one another. Thus, the graphics processing circuit 54 is configured as a multi-pipeline circuit in which the back end circuitry 139 of the third graphics pipeline 201 and the back end circuitry 142 of the fourth graphics pipeline 202 share the front end circuitry 135, in that the back end circuitry 139 and 142 each receive the same pixel data from the front end circuitry 135. As illustrated, the components of the third and fourth graphics pipelines are present on a single chip. Additionally, the back end circuitry 139 and the back end circuitry 142 may be configured to share the front end circuitry 35 of the graphics processing circuit 34. Alternatively, the third and fourth
graphics pipelines may be configured to be on multiple chips interconnected by a communication path, for example, a synchronization signal or data bus.
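Dividing the screen into the first through fourth sets of tiles can be sketched as a repeating super-tile pattern. In the Python sketch below, the 2x2 pattern, the 32-pixel tile size, and the assignment of pipelines 1 and 2 to circuit 34 and pipelines 3 and 4 to circuit 54 are assumptions chosen for illustration; the passage itself only states that each pipeline processes its own set of tiles.

```python
from collections import defaultdict
from math import ceil
from typing import Dict, Set, Tuple

TILE = 32  # hypothetical tile edge length in pixels (not given in this passage)

# Hypothetical 2x2 super-tile, repeated across the screen. Entry [ty % 2][tx % 2]
# is the pipeline (1-4) that owns tile (tx, ty). Pipelines 1-2 are assumed to be
# in circuit 34 and 3-4 in circuit 54, so each circuit's tiles form a checkerboard.
SUPER_TILE = [[1, 3],
              [4, 2]]


def pipeline_for_tile(tx: int, ty: int) -> int:
    return SUPER_TILE[ty % 2][tx % 2]


def pipeline_for_pixel(x: int, y: int) -> int:
    return pipeline_for_tile(x // TILE, y // TILE)


def tile_sets(width: int, height: int) -> Dict[int, Set[Tuple[int, int]]]:
    """Partition all screen tiles (partial edge tiles included) into one set per pipeline."""
    sets: Dict[int, Set[Tuple[int, int]]] = defaultdict(set)
    for ty in range(ceil(height / TILE)):
        for tx in range(ceil(width / TILE)):
            sets[pipeline_for_tile(tx, ty)].add((tx, ty))
    return dict(sets)


if __name__ == "__main__":
    for pipeline, tiles in sorted(tile_sets(128, 64).items()):
        print(pipeline, sorted(tiles))
```

Under this assumed pattern every tile belongs to exactly one pipeline and neighbouring tiles go to different pipelines, which tends to spread a localized cluster of small triangles across all four.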
The front end circuitry 135 may include, for example, a vertex shader, setup circuitry, a rasterizer or other suitable circuitry operative to receive the primitive data 31 from the AGP 32 and generate pixel data 136 to be processed by the third graphics pipeline 201 and the fourth graphics pipeline 202. The front end circuitry 135 generates the pixel data 136 by performing, for example, clipping, lighting, spatial transformations, matrix operations, rasterization or any suitable primitive operations or combination thereof on the primitive data 31. The pixel data 136 is then transmitted to the respective scan converters 137 and 140 of the two graphics pipelines 201-202.
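Because the front end circuitry 135 sends identical pixel data 136 to both scan converters, one way for each pipeline to avoid duplicating work is to keep only the positions that fall in its own tiles. The Python sketch below models that step under stated assumptions: `front_end` rasterizes only an axis-aligned rectangle (a deliberately simplified stand-in for the rasterization performed by front end circuitry 135), and tile ownership follows a hypothetical 2x2 super-tile pattern with 32-pixel tiles. All function names are illustrative.

```python
from typing import Dict, Iterable, List, Tuple

Coord = Tuple[int, int]

TILE = 32  # hypothetical tile edge length in pixels
# Hypothetical 2x2 super-tile: entry [ty % 2][tx % 2] is the owning pipeline (1-4).
SUPER_TILE = [[1, 3],
              [4, 2]]


def owner(x: int, y: int) -> int:
    """Pipeline that owns the tile containing pixel (x, y)."""
    return SUPER_TILE[(y // TILE) % 2][(x // TILE) % 2]


def front_end(x0: int, y0: int, x1: int, y1: int) -> List[Coord]:
    """Simplified stand-in for front end circuitry 135: 'rasterize' the
    half-open rectangle [x0, x1) x [y0, y1) into pixel positions."""
    return [(x, y) for y in range(y0, y1) for x in range(x0, x1)]


def scan_converter(pipeline: int, pixel_data: Iterable[Coord]) -> List[Coord]:
    """Keep, in order, only the positions inside this pipeline's tiles."""
    return [(x, y) for (x, y) in pixel_data if owner(x, y) == pipeline]


def broadcast(pixel_data: List[Coord], pipelines: Iterable[int]) -> Dict[int, List[Coord]]:
    """Hand identical pixel data to every listed pipeline's scan converter."""
    return {p: scan_converter(p, pixel_data) for p in pipelines}


if __name__ == "__main__":
    pixel_data_136 = front_end(0, 0, 64, 64)
    per_pipe = broadcast(pixel_data_136, [3, 4])  # scan converters 137 and 140
    print({p: len(coords) for p, coords in per_pipe.items()})  # {3: 1024, 4: 1024}
```

Under this assumed pattern, the other two 32x32 quadrants of the rectangle would fall to pipelines 1 and 2.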
The scan converter 137 of the third graphics pipeline 201 receives the pixel data 136 and sequentially provides the position (e.g., x, y) coordinates 160 in screen space of the pixels to be processed by the back end circuitry 139, based on the tile identification data pre
