`(12) United States Patent
`Leather et al.
`
`(10) Patent No.: US 7,633,506 B1
`(45) Date of Patent: Dec. 15, 2009
`
`(54) PARALLEL PIPELINE GRAPHICS SYSTEM
`
`(75) Inventors: Mark M. Leather, Saratoga, CA (US);
`Eric Demers, Palo Alto, CA (US)
`
`(73) Assignee: ATI Technologies ULC, Markham,
`Ontario (CA)
`
`( * ) Notice: Subject to any disclaimer, the term of this
`patent is extended or adjusted under 35
`U.S.C. 154(b) by 709 days.
`
`(21) Appl. No.: 10/724,384
`
`(22) Filed: Nov. 26, 2003
`
`Related U.S. Application Data
`(60) Provisional application No. 60/429,976, filed on Nov.
`27, 2002.
`
`(51) Int. Cl.
`G06T 1/20 (2006.01)
`(52) U.S. Cl. ........................ 345/506; 345/505; 345/519
`(58) Field of Classification Search ................. 345/506,
`345/505, 519
`See application file for complete search history.
`
`(56) References Cited
`
`U.S. PATENT DOCUMENTS
`
`4,885,703 A *   12/1989  Deering ................ 345/422
`5,179,640 A      1/1993  Duffy
`5,550,962 A      8/1996  Nakamura et al.
`5,745,118 A      4/1998  Alcorn et al.
`5,794,016 A      8/1998  Kelleher
`5,818,469 A     10/1998  Lawless et al.
`5,905,506 A      5/1999  Hamburg
`5,977,997 A *   11/1999  Vainsencher ............ 345/519
`5,999,196 A *   12/1999  Storm et al. ........... 345/506
`6,118,452 A      9/2000  Gannett
`6,184,906 B1     2/2001  Wang et al.
`6,219,062 B1 *   4/2001  Matsuo et al. .......... 345/426
`6,222,550 B1 *   4/2001  Rosman et al. .......... 345/419
`6,292,200 B1 *   9/2001  Bowen et al. ........... 345/506
`6,323,860 B1    11/2001  Zhu et al.
`6,344,852 B1     2/2002  Zhu et al.
`6,353,439 B1     3/2002  Lindholm et al.
`6,380,935 B1     4/2002  Heeschen et al.
`6,384,824 B1     5/2002  Morgan et al.
`6,407,736 B1 *   6/2002  Regan .................. 345/422
`6,417,858 B1     7/2002  Bosch et al.
`
`(Continued)
`
`OTHER PUBLICATIONS
`
`Akeley, K. et al., “High-Performance Polygon Rendering”, ACM
`Computer Graphics vol. 22 No. 4, 1988, pp. 239-246.*
`(Continued)
`
`Primary Examiner: Chante Harrison
`Assistant Examiner: Michelle K Lay
`(74) Attorney, Agent, or Firm: Vedder Price P.C.
`
`(57) ABSTRACT
`The present invention relates to a parallel pipeline graphics
`system. The parallel pipeline graphics system includes a
`back-end configured to receive primitives and combinations
`of primitives (i.e., geometry) and process the geometry to
`produce values to place in a frame buffer for rendering on
`screen. Unlike prior single pipeline implementations, some
`embodiments use two or four parallel pipelines, though other
`configurations having 2^n pipelines may be used. When
`geometry data is sent to the back-end, it is divided up and
`provided to one of the parallel pipelines. Each pipeline is a
`component of a raster back-end, where the display screen is
`divided into tiles and a defined portion of the screen is sent
`through a pipeline that owns that portion of the screen’s tiles.
`In one embodiment, each pipeline comprises a scan converter,
`a hierarchical-Z unit, a Z buffer logic, a rasterizer, a shader,
`and a color buffer logic.
`
`21 Claims, 15 Drawing Sheets
`
`MEDIATEK, Ex. 1001, Page 1
`
`
`
`U.S. PATENT DOCUMENTS (continued)
`
`6,424,345        7/2002  Smith et al.
`6,557,083        4/2003  Sperber et al. ......... 711/144
`6,570,579        5/2003  MacInnis et al.
`6,573,893        6/2003  Naqvi et al.
`6,636,232       10/2003  Larson
`6,650,327       11/2003  Airey et al.
`6,650,330       11/2003  Lindholm et al.
`6,697,063        2/2004  Zhu .................... 345/421
`6,714,203        3/2004  Morgan et al.
`6,724,394        4/2004  Zatz et al.
`6,731,289        5/2004  Peercy et al.
`6,753,878        6/2004  Heirich et al.
`6,762,763        7/2004  Migdal et al.
`6,778,177        8/2004  Furtner ................ 345/544
`6,791,559        9/2004  Baldwin ................ 345/557
`6,801,203       10/2004  Hussain
`6,809,732       10/2004  Zatz et al.
`6,864,893        3/2005  Zatz
`6,864,896        3/2005  Perego
`6,897,871        5/2005  Morein et al.
`6,980,209       12/2005  Donham et al. .......... 345/426
`7,015,913        3/2006  Lindholm et al.
`7,061,495        6/2006  Leather
`7,170,515        1/2007  Zhu .................... 345/422
`2002/0145612    10/2002  Blythe et al. .......... 345/581
`2003/0076320     4/2003  Collodi
`2003/0164830     9/2003  Kent ................... 345/505
`2004/0041814 A1  3/2004  Wyatt et al.
`2004/0100471 A1  5/2004  Leather et al.
`2004/0164987 A1  8/2004  Aronson et al.
`2005/0068325 A1  3/2005  Lefebvre et al.
`2005/0200629 A1  9/2005  Morein et al.
`
`OTHER PUBLICATIONS (continued)
`
`Elias, Hugo; Polygon Scan Converting; from http://freespace.virgin.
`net/hugo.elias/graphics/x_polysc.htm; pp. 1-7; Jul. 26, 2005.*
`Breternitz, Jr., Mauricio et al.; Compilation, Architectural Support,
`and Evaluation of SIMD Graphics Pipeline Programs on a General
`Purpose CPU; IEEE; 2003; pp. 1-11.
`International Search Report for PCT Patent Application PCT/
`IB2004/003821 dated Mar. 22, 2005.
`European Search Report from European Patent Office; European
`Application No. 03257464.2; dated Apr. 4, 2006.
`Foley, James et al.; Computer Graphics, Principles and Practice;
`Addison-Wesley Publishing Company; 1990; pp. 873-899.
`Crockett, Thomas W.; An introduction to parallel rendering; Elsevier
`Science B.V.; 1997; pp. 819-843.
`Montrym, John S. et al.; InfiniteReality: A Real-Time Graphics
`System; Silicon Graphics Computer Systems; 1997; pp. 293-302.
`Humphreys, Greg et al.; WireGL: A Scalable Graphics System for
`Clusters; ACM Siggraph; 2001; pp. 129-140.
`Fuchs, Henry et al.; Pixel-Planes 5: A Heterogeneous Multiprocessor
`Graphics System Using Processor-Enhanced Memories; Computer
`Graphics; vol. 23, No. 3; Jul. 1989; pp. 79-88.
`
`* cited by examiner
`
`
`
`[Sheet 1 of 15. FIG. 1: block diagram of the parallel pipeline
`graphics system architecture; drawing text not recoverable.]
`
`
`
`[Sheet 2 of 15. FIGURE 2: flowchart.]
`200: Front-end receives graphics instructions and outputs geometry.
`210: Back-end obtains geometry as input.
`220: Determine which pipelines to use to process the geometry.
`230: Use appropriate pipelines to operate on the geometry.
`240: Numerical values associated with the pixels that define the
`geometry are placed in a frame buffer.
`
`
`
`[Sheet 3 of 15. FIG. 3: block diagram of the parallel pipeline
`graphics system architecture of another embodiment; drawing text
`not recoverable.]
`
`
`
`[Sheet 4 of 15. FIGURE 4: flowchart.]
`400: Front-end receives graphics instructions and outputs geometry.
`410: Back-end obtains geometry as input.
`420: Determine which pipelines own which portion of the geometry.
`430: Use appropriate pipelines to operate on the geometry.
`440: Numerical values associated with the pixels that define the
`geometry are placed in a 256 bit frame buffer.
`
`
`
`[Sheet 5 of 15. FIGURE 5: raster back-end block diagram. Legible
`labels: Transformed Vertices; Graphics Assembly 510; Setup 515
`(with logic 530); Rasterization Pipeline A; Rasterization
`Pipeline B; To other Rasterization Pipelines (as needed);
`Scan Converter 540; Hierarchical Z Interface 550; Early Z
`Interface; Z Buffer Logic 555; Rasterizer (parameter
`interpolator) 560; Color Buffer Logic 590.]
`
`
`
`[Sheet 6 of 15. FIGURE 6: a primitive enclosed by its bounding
`box. Legible labels: Bounding box; Vertex 0; Vertex 2.]
`
`
`
`[Sheet 7 of 15. FIG. 7: drawing text not recoverable.]
`
`
`
`[Sheet 8 of 15. FIGURE 8: computer execution environment. Legible
`labels: Keyboard; Mouse 811; Mass Storage 812; Processor 813;
`Video Mem; Main Memory; Video Amp; CRT 817; I/O 819; Comm
`Int 820; Network Link 821; Local Network 822; Host 823; ISP 824;
`Internet 825; Server 826.]
`
`
`
`[Sheet 9 of 15. FIGURE 9: unified shader block diagram. Legible
`labels: Rasterizer; Texture Unit; Unified Shader; Frame Buffer.
`Reference numerals on the sheet: 1100, 1110, 1120, 1130.]
`
`
`
`[Sheet 10 of 15. FIGURE 10: unified shader architecture. Legible
`labels: Rasterizer (rs); Output FIFO/Formatter; Control 1244;
`constant; inst; Texture Unit. Reference numeral on the sheet: 1200.]
`
`
`
`[Sheet 11 of 15. FIGURE 11: shader code partitioned into levels.
`Reference numerals on the sheet: 1300, 1310, 1320, 1325, 1330,
`1340, 1350. Listing as far as it can be read from the scan:]
`tex t0;                  LEVEL 0 TEXTURE INSTRUCTIONS
`dp3 t4.r, t0, t1;        LEVEL 0 ALU INSTRUCTIONS
`dp3 t4.g, t0, t2;
`dp3 t4.b, t0, t3;
`tex t4;                  LEVEL 1 TEXTURE INSTRUCTIONS
`dp3 t5.r, t0, t1;        LEVEL 1 ALU INSTRUCTIONS
`dp3 t5.g, t0, t2;
`dp3 t5.b, t0, t3;
`tex t5;                  LEVEL 2 TEXTURE INSTRUCTIONS
`mad t0, t5, r0, r1;      LEVEL 2 ALU INSTRUCTIONS
`
`
`
`[Sheet 12 of 15. FIGURE 12: shader control logic. Legible labels:
`from Rasterizer; Input Machine; Instruction Store; to ALUs;
`to SRAMs (Read Addr); Level 0 to Level 3 Tex Machines; Level 0
`to Level 3 ALU Machines; texture command; Arbiter; Output
`Machine; to output formatter. Reference numerals on the sheet:
`1400, 1410, 1430, 1435, 1445, 1450, 1485, 1486.]
`
`
`
`[Sheet 13 of 15. FIGURE 13: register subsystem. Legible labels:
`Rasterizer (rs); Shader Subsystem side door; Register subsystem;
`Global Register Load Bus; Frame Buffer (cb); bus widths 288 and
`256. Reference numerals on the sheet: 1500, 1510, 1520, 1550.]
`
`
`
`[Sheet 14 of 15. FIG. 14: multiple shader system; drawing text
`not recoverable.]
`
`
`
`[Sheet 15 of 15. FIGURE 15: timing diagram in which the phase
`signal cycles repeatedly through 0, 1, 2, 3.]
`
`
`
`PARALLEL PIPELINE GRAPHICS SYSTEM
`
`This application claims priority to U.S. Provisional Application
`No. 60/429,976, filed Nov. 27, 2002.
`This is a related application to a co-pending U.S. patent
`application entitled “DIVIDING WORK AMONG MULTIPLE
`GRAPHICS PIPELINES USING A SUPER-TILING
`TECHNIQUE”, having Ser. No. 10/459,797, filed Jun. 12,
`2003, having Leather et al. as the inventors, owned by the
`same assignee and hereby incorporated by reference in its
`entirety.
`
`BACKGROUND OF THE INVENTION
`
`1. Field of the Invention
`The present invention relates to computer graphics chips.
`Portions of the disclosure of this patent document contain
`material that is subject to copyright protection. The copyright
`owner has no objection to the facsimile reproduction by anyone
`of the patent document or the patent disclosure as it
`appears in the Patent and Trademark Office file or records, but
`otherwise reserves all copyright rights whatsoever.
`2. Background Art
`Computer systems are often used to generate and display
`graphics on an output device such as a monitor. When complex
`and realistic graphics are desired, there are often additional
`components, or chips, that are added to the computer
`system to assist it with the complex instruction processing
`that it must perform to render the graphics to the screen.
`Graphics chips may be considered as having a front-end and
`a back-end. The front-end typically receives graphics instructions
`and generates “primitives” that form the basis for the
`back-end’s work. The back-end receives the primitives and
`performs the operations necessary to send the data to a frame
`buffer where it will eventually be rendered to the screen. As
`will be further described below, graphics chip back-ends are
`currently inadequate. Before further discussing this problem,
`an overview of a graphics system is provided.
`Graphics System
`Display images are made up of thousands of tiny dots,
`where each dot is one of thousands or millions of colors.
`These dots are known as picture elements, or “pixels”. Each
`pixel has multiple attributes associated with it, including a
`color and a texture which is represented by a number value
`stored in the computer system. A three dimensional display
`image, although displayed using a two dimensional array of
`pixels, may in fact be created by rendering of a plurality of
`graphical objects. Examples of graphical objects include
`points, lines, polygons, and three dimensional solid objects.
`Points, lines, and polygons represent rendering primitives
`which are the basis for most rendering instructions. More
`complex structures, such as three dimensional objects, are
`formed from a combination or mesh of such primitives. To
`display a particular scene, the visible primitives associated
`with the scene are drawn individually by determining those
`pixels that fall within the edges of the primitive, and obtaining
`the attributes of the primitive that correspond to each of those
`pixels. The obtained attributes are used to determine the
`displayed color values of applicable pixels.
`Sometimes, a three dimensional display image is formed
`from overlapping primitives or surfaces. A blending function
`based on an opacity value associated with each pixel of each
`primitive is used to blend the colors of overlapping surfaces or
`layers when the top surface is not completely opaque. The
`final displayed color of an individual pixel may thus be a
`blend of colors from multiple surfaces or layers.
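The blending described above can be sketched as follows. This is a minimal illustration, not the patented blend logic: it assumes the conventional "over" operator (the text does not name a specific blending function), and the names `blend_over` and `composite` are hypothetical.

```python
def blend_over(src_rgb, src_alpha, dst_rgb):
    """Blend a source color over a destination color.

    Assumption: the conventional "over" operator,
    out = alpha * src + (1 - alpha) * dst, where alpha is the opacity
    of the top (source) surface.
    """
    if not 0.0 <= src_alpha <= 1.0:
        raise ValueError("opacity must lie in [0, 1]")
    return tuple(src_alpha * s + (1.0 - src_alpha) * d
                 for s, d in zip(src_rgb, dst_rgb))


def composite(layers, background):
    """Composite (rgb, opacity) layers, ordered back to front."""
    color = background
    for rgb, alpha in layers:
        color = blend_over(rgb, alpha, color)
    return color
```

A fully opaque top layer (opacity 1.0) simply replaces whatever lies beneath it, which matches the statement that blending only matters when the top surface is not completely opaque.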
`In some cases, graphical data is rendered by executing
`instructions from an application that is drawing data to a
`display. During image rendering, three dimensional data is
`processed into a two dimensional image suitable for display.
`The three dimensional image data represents attributes such
`as color, opacity, texture, depth, and perspective information.
`The draw commands from a program drawing to the display
`may include, for example, X and Y coordinates for the vertices
`of the primitive, as well as some attribute parameters for
`the primitive (color and depth or “Z” data), and a drawing
`command. The execution of drawing commands to generate a
`display image is known as graphics processing.
`Graphics Processing Chips
`When complex graphics processing is required, such as
`using primitives as a basis for rendering instructions or
`texturing geometric patterns, graphics chips are added to the
`computer system. Graphics chips are specifically designed to
`handle the complex and tedious instruction processing that
`must be used to render the graphics to the screen. Graphics
`chips have a front-end and a back-end. The front-end typically
`receives graphics instructions and generates the primitives
`or combination of primitives that define geometric patterns.
`The primitives are then processed by the back-end where
`they might be textured, shaded, colored, or otherwise prepared
`for final output. When the primitives have been fully
`processed by the back-end, the pixels on the screen will each
`have a specific number value that defines a unique color
`attribute the pixel will have when it is drawn. This final value
`is sent to a frame buffer in the back-end, where the value is
`used at the appropriate time.
`Modern graphics processing chip back-ends are equipped
`to handle three-dimensional data, since three-dimensional
`data produces more realistic results to the screen. When
`processing three-dimensional data, memory bandwidth becomes
`a limitation on performance. The progression of graphics
`processing back-ends has been from a 32 bit system, to a 64
`bit system, and to a 128 bit system. Moving to a 256 bit
`system, where 512 bits may be processed in a single logic
`clock cycle, presents problems. In particular, the efficient
`organization and use of data “words” with a 256 bit wide
`DDR frame buffer is problematic because the granularity is
`too coarse. Increasing the width of the frame buffer to 256 bits
`requires innovations in the input and output (I/O) system used
`by the graphics processing back-end.
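As a rough worked example of the figures quoted above: the sketch below assumes that "DDR" means two transfers per clock on the 256 bit interface, which is one reading that yields the stated 512 bits per clock. The 32-bit pixel size is purely an illustrative assumption, and the function names are hypothetical.

```python
def bits_per_clock(bus_width_bits, transfers_per_clock=2):
    """Bits moved per logic clock; DDR is assumed to give 2 transfers."""
    return bus_width_bits * transfers_per_clock


def pixels_per_clock(bus_width_bits, bits_per_pixel=32, transfers_per_clock=2):
    """Whole pixels covered by one clock's worth of data.

    bits_per_pixel=32 is an illustrative assumption, not a figure
    taken from the text.
    """
    return bits_per_clock(bus_width_bits, transfers_per_clock) // bits_per_pixel


for width in (32, 64, 128, 256):
    print(width, bits_per_clock(width), pixels_per_clock(width))
```

Under these assumptions each clock on a 256 bit interface touches 16 pixels' worth of data, which illustrates why the access granularity becomes coarse when a primitive covers only a few pixels.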
`
`SUMMARY OF THE INVENTION
`
`The present invention relates to a parallel array graphics
`system. In one embodiment, the parallel array graphics system
`includes a back-end configured to receive primitives and
`combinations of primitives (i.e., geometry) and process the
`geometry to produce values to place in a frame buffer for
`eventual rendering on a screen. In one embodiment, the
`graphics system includes two parallel pipelines. When data
`representing the geometry is presented to the back-end of the
`graphics chip, it is divided into data words and provided to
`one or both of the parallel pipelines.
`In some embodiments, four parallel pipelines or other pipeline
`configurations having 2^n pipelines may be used. Each
`pipeline is a component of a raster back-end, where the display
`screen is divided into tiles and a defined portion of the
`screen (i.e., one or more tiles) is sent through a pipeline that
`owns that portion of the screen’s tiles.
`In one embodiment, each parallel pipeline comprises a
`raster back-end having a scan converter to step through the
`geometric patterns passed to the back-end, a “hierarchical-Z”
`
`component to more precisely define the borders of the geometry,
`a “Z-buffer” for performing three-dimensional operations
`on the data, a rasterizer for computing texture addresses
`and color components for a pixel, a unified shader for combining
`multiple characteristics for a pixel and outputting a
`single value, and a color buffer logic unit for taking the
`incoming shader color and blending it into the frame buffer
`using the current frame buffer blend operations. A plurality of
`FIFO (First-In, First-Out) units are used to balance load
`among the pipelines.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`These and other features, aspects and advantages of the
`present invention will become better understood with regard
`to the following description, appended claims and accompanying
`drawings where:
`FIG. 1 is a parallel pipeline graphics system architecture
`according to an embodiment of the present invention.
`FIG. 2 is a flowchart showing the operation of a parallel
`pipeline graphics system according to an embodiment of the
`present invention.
`FIG. 3 is a parallel pipeline graphics system architecture
`according to another embodiment of the present invention.
`FIG. 4 is a flowchart showing the operation of a parallel
`pipeline graphics system according to another embodiment of
`the present invention.
`FIG. 5 is a raster back-end portion of a pipeline according
`to another embodiment of the present invention.
`FIG. 6 is a bounding box illustrating an embodiment of the
`invention.
`FIG. 7 shows an apparatus for synchronizing graphics data
`and state according to an embodiment of the present invention.
`FIG. 8 is an embodiment of a computer execution environment
`suitable for the present invention.
`FIG. 9 is a block diagram of a unified shader according to
`an embodiment of the present invention.
`FIG. 10 shows a unified shader architecture according to an
`embodiment of the present invention.
`FIG. 11 shows how shader code is partitioned according to
`an embodiment of the present invention.
`FIG. 12 shows how control logic is used according to an
`embodiment of the present invention.
`FIG. 13 shows a register subsystem according to an
`embodiment of the present invention.
`FIG. 14 shows a multiple shader system according to an
`embodiment of the present invention.
`FIG. 15 shows an ALU according to an embodiment of the
`present invention.
`
`DETAILED DESCRIPTION OF THE INVENTION
`
`The invention relates to a parallel pipeline graphics system.
`In the following description, numerous specific details are set
`forth to provide a more thorough description of embodiments
`of the invention. It will be apparent, however, to one skilled in
`the art, that the invention may be practiced without these
`specific details. In other instances, well known features have
`not been described in detail so as not to obscure the invention.
`Parallel Array Graphics System
`One embodiment of the present invention is shown in the
`block diagram of FIG. 1. Graphics processing chip 100 comprises
`a front-end 110 and a back-end 120. The front-end 110
`receives graphics instructions 115 as input and generates
`geometry 116 as output. The back-end 120 is used to process
`the geometry 116 it receives as input. For instance, the back-end
`120 might operate by texturing, shading, scanning, coloring,
`or otherwise preparing a pixel for final output.
`When the geometry 116 has been fully processed by back-end
`120, the pixels on the screen will each have a specific
`number value that defines a unique color attribute the pixel
`will have when it is drawn. The number values are passed to
`a frame buffer 130 where they are stored for use at the appropriate
`time, for instance, when they are rendered on display
`device 160. Back-end 120 includes two parallel pipelines,
`designated pipeline 140 and pipeline 150. When data representing
`the geometry is presented to the back-end 120 of the
`graphics chip, it is divided into data words and provided to
`one or both of the parallel pipelines 140 and 150.
`FIG. 2 provides a flowchart showing the operation of the
`architecture of FIG. 1 according to an embodiment of the
`present invention. At step 200 a graphics chip front-end
`receives graphics instructions as input and generates geometry
`as output. At step 210, a graphics chip back-end obtains
`the geometry as input. Next, it is determined which pipelines
`to use to operate on the geometry at step 220. At step 230 the
`appropriate pipelines operate on the geometry; for instance,
`the pipelines might texture, shade, scan, color, or otherwise
`prepare the geometry for final output. Then, at step 240, the
`numerical values that are associated with the pixels that
`define the geometry are put into a frame buffer. The size of the
`frame buffer may vary.
`In another embodiment, two or more pipelines are used and
`each pipeline is a component of a raster back-end. The display
`screen is divided into tiles and a defined portion of the screen
`(i.e., one or more tiles) is sent through a pipeline that owns
`that portion of the screen’s tiles. This embodiment is shown in
`FIG. 3. Graphics processing chip 300 comprises a front-end
`310 and a back-end 320. The front-end 310 receives graphics
`instructions 315 as input and generates geometry 316 as output.
`The back-end 320 is used to process the geometry 316 it
`receives as input. For instance, the back-end 320 might operate
`by texturing, shading, scanning, coloring, or otherwise
`preparing a pixel for final output.
`When the geometry 316 has been fully processed by back-end
`320, the pixels on the screen will each have a specific
`number value that defines a unique color attribute the pixel
`will have when it is drawn. The number values are passed to
`a frame buffer 330 where they are stored for use at the appropriate
`time, for instance, when they are rendered on display
`device 360. Back-end 320 includes 2^n parallel pipelines,
`designated pipeline 0 through pipeline n-1. When data representing
`the geometry is presented to the back-end 320 of the
`graphics chip 300, it is analyzed by back-end 320 to determine
`which geometry (or portions of geometry) fall within a
`given tile. For instance, if pipeline 0 owns tile 0 on display
`device 360, then the geometry in tile 0 is passed to pipeline 0.
`FIG. 4 provides a flowchart showing the operation of the
`architecture of FIG. 3 according to an embodiment of the
`present invention. At step 400 a graphics chip front-end
`receives graphics instructions as input and generates geometry
`as output. At step 410, a graphics chip back-end obtains
`the geometry as input. Next, at step 420 the back-end analyzes
`the geometry to determine which pipeline owns which portion
`of the geometry; for instance, if a geometry falls within
`two tiles, then the geometry processing is divided among the
`pipelines that own those tiles. At step 430 the appropriate
`pipelines operate on the geometry; for instance, the pipelines
`might texture, shade, scan, color, or otherwise prepare the
`geometry for final output. Then, at step 440, the numerical
`values that are associated with the pixels that define the
`geometry are put into a frame buffer.
`Embodiment of a Back-End Graphics Chip
`In one embodiment, each parallel pipeline comprises a
`raster back-end having a scan converter to step through the
`geometric patterns passed to the back-end, a “hierarchical-Z”
`component to more precisely define the borders of the geometry,
`a “Z-buffer” for performing three-dimensional operations
`on the data, a rasterizer for computing texture addresses
`and color components for a pixel, a unified shader for combining
`multiple characteristics for a pixel and outputting a
`single value, and a color buffer logic unit for taking the
`incoming shader color and blending it into the frame buffer
`using the current frame buffer blend operations.
`In operation, graphics assembly unit 510 takes transformed
`vertices data and assembles complete graphics primitives,
`such as triangles or parallelograms. A set-up unit 515
`receives the data output from graphics assembly 510 and
`generates slope and initial value information for each of the
`texture address, color, or Z parameters associated with the
`primitive. The resulting set-up information is passed to two or
`more identical pipelines. In the current example there are two
`pipelines, pipeline 520 and pipeline 525, but the present
`invention contemplates any configuration of parallel pipelines.
`In this example, each pipeline 520 and 525 owns one
`half of the screen’s pixels. Allocation of work between the
`pipelines is made based on a repeating square pixel tile
`pattern. In one embodiment, logic 530 in the set-up unit 515
`intersects the graphics primitives with the repeating tile pattern
`such that a primitive is only sent to a pipeline if it is likely
`that it will result in the generation of covered pixels. The
`functionality of a set-up unit is further described in commonly
`owned co-pending U.S. patent application entitled “Scalable
`Rasterizer Interpolator”, with Ser. No. 10/730,864, filed Dec.
`8, 2003, which is hereby fully incorporated by reference.
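The slope and initial value information mentioned above can be illustrated with a short sketch. It assumes plain linear (screen-space) interpolation of one parameter over a triangle and ignores perspective correction; the function names `setup_parameter` and `evaluate` are hypothetical and are not taken from the patent or the referenced application.

```python
def setup_parameter(v0, v1, v2):
    """Return (initial_value, dp_dx, dp_dy) for one parameter.

    Each vertex is (x, y, p) in screen space, where p is the parameter
    (for example a color component, a texture address, or Z). The
    initial value is taken at vertex 0. Assumption: the parameter
    varies linearly in screen space.
    """
    (x0, y0, p0), (x1, y1, p1), (x2, y2, p2) = v0, v1, v2
    # Twice the signed triangle area; zero means a degenerate triangle.
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    if det == 0:
        raise ValueError("degenerate triangle")
    dp_dx = ((p1 - p0) * (y2 - y0) - (p2 - p0) * (y1 - y0)) / det
    dp_dy = ((p2 - p0) * (x1 - x0) - (p1 - p0) * (x2 - x0)) / det
    return p0, dp_dx, dp_dy


def evaluate(setup, origin, x, y):
    """Evaluate the parameter at pixel (x, y) from its set-up values."""
    p0, dp_dx, dp_dy = setup
    x0, y0 = origin
    return p0 + dp_dx * (x - x0) + dp_dy * (y - y0)
```

Once the slopes are known, any pipeline can evaluate the parameter at any pixel it owns without seeing the other pipelines' pixels, which is one reason set-up results can be shared by identical pipelines.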
`In one embodiment of the present invention, the set-up unit
`manages the distribution of polygons to the pipelines. As
`noted above, the display is divided into multiple tiles and each
`pipeline is responsible for a subset of the tiles. It should be
`noted that any number of square or non-square tiles can be
`used in the present invention.
`A polygon can be composed of 1, 2, or 3 vertices. Vertices
`are given by the graphics application currently executing on a
`host system. The vertices are converted from object space
`3-dimensional homogeneous coordinate system to a display
`(screen) based coordinate system. This conversion can be
`done on the host processor or in a front end section of the
`graphics chip (i.e. vertex transformation). The screen based
`coordinate system has at least X and Y coordinates for each
`vertex. The set-up unit 515 creates a bounding box based on
`the screen space X, Y coordinates of each vertex as shown in
`FIG. 6. The bounding box is then compared against a current
`tile pattern. The tiling pattern is based on the number of
`graphics pipelines currently active. For example, in a two (A
`and B) pipeline system, the upper left and lower right pixel
`tiles of a four tile quad are assigned to pipeline A and the
`upper right and lower left tiles to pipeline B (or vice versa). In
`a single pipeline system, all tiles are assigned to pipeline A. In
`one embodiment, the setup unit computes initial value (at
`vertex 0) and slopes for each of up to 42 parameters associated
`with the current graphics primitive.
`The bounding box’s four corners are mapped to the tile
`pattern simply by discarding the lower bits of X and Y. The four
`corners map to the same or different tiles. If they all map to the
`same tile, then only the pipeline that is associated with that
`tile receives the polygon. If it maps to only tiles that are
`associated with only one pipeline, then again only that pipe
`line receives the polygon. In one embodiment, if it maps to
`tiles that are associated with multiple pipelines, then the
`entire polygon is sent to all pipelines.
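The tile ownership and bounding-box test described above can be sketched as follows. This is an illustration under stated assumptions rather than the set-up unit's actual logic: the 16×16-pixel tile size (`TILE_SHIFT = 4`) is assumed because the text does not fix one, only the one- and two-pipeline patterns from the text are handled, and all names are hypothetical. The sketch checks every tile spanned by the bounding box rather than only its four corner tiles, which gives the same answer for small boxes and stays conservative for boxes that span more than two tiles.

```python
TILE_SHIFT = 4  # assumed 16x16-pixel tiles; the text does not fix a size


def tile_of(x, y, shift=TILE_SHIFT):
    """Map integer screen coordinates to a tile by discarding low bits."""
    return x >> shift, y >> shift


def owner(tile_x, tile_y, num_pipes=2):
    """Pipeline owning a tile: 0 is pipeline A, 1 is pipeline B.

    With two pipelines, the upper-left and lower-right tiles of each
    four-tile quad go to A and the other two to B (a checkerboard).
    """
    if num_pipes == 1:
        return 0
    if num_pipes != 2:
        raise ValueError("sketch covers only one or two pipelines")
    return (tile_x + tile_y) & 1


def pipelines_for_polygon(vertices, num_pipes=2, shift=TILE_SHIFT):
    """Return the set of pipelines that should receive the polygon."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    tx0, ty0 = tile_of(min(xs), min(ys), shift)
    tx1, ty1 = tile_of(max(xs), max(ys), shift)
    owners = set()
    for ty in range(ty0, ty1 + 1):
        for tx in range(tx0, tx1 + 1):
            owners.add(owner(tx, ty, num_pipes))
    if len(owners) == 1:
        return owners                 # only one pipeline is involved
    return set(range(num_pipes))      # otherwise send to all pipelines
```

For example, a small triangle that lies entirely inside the top-left tile is sent only to pipeline A, while one whose bounding box straddles two horizontally adjacent tiles is sent to both.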
`Each pipeline contains an input FIFO 535 used to balance
`the load over different pipelines. A scan converter 540 works
`in conjunction with the Hierarchical Z interface of Z buffer logic
`555 to step through the geometry (e.g., triangle or parallelogram)
`within the bounds of the pipeline’s tile pattern. In one
`embodiment, initial stepping is performed at a coarse level.
`For each of the coarse level tiles, a minimum (i.e., closest) Z
`value is computed. This is compared with the farthest Z value
`for the tile stored in a hierarchical-Z buffer 550. If the compare
`fails, the tile is rejected. The functionality of the scan
`converter and Hierarchical Z interface is further described in
`commonly owned co-pending U.S. patent application entitled
`“Scalable Rasterizer Interpolator”, with Ser. No. 10/730,864,
`filed Dec. 8, 2003, which is hereby fully incorporated by reference.
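The coarse rejection test described above can be sketched as follows. The sketch assumes that smaller Z means closer to the viewer and that a tile survives when its closest Z is not beyond the stored farthest Z; the dictionary-based buffer and the function names are hypothetical stand-ins for the hardware structures.

```python
def coarse_tile_passes(prim_min_z, stored_farthest_z):
    """Conservative hierarchical-Z compare for one coarse tile.

    Assumption: smaller Z is closer. If even the closest point of the
    primitive in this tile lies behind the farthest depth already
    stored for the tile, nothing in the tile can be visible.
    """
    return prim_min_z <= stored_farthest_z


def cull_coarse_tiles(prim_min_z_by_tile, hier_z):
    """Keep only the coarse tiles that survive the compare.

    prim_min_z_by_tile: {tile: closest Z of the primitive in that tile}
    hier_z:             {tile: farthest Z currently stored for the tile}
    """
    return [tile for tile, z in prim_min_z_by_tile.items()
            if coarse_tile_passes(z, hier_z[tile])]
```

Rejected tiles never reach the fine-level stepping stage, so the test trades one compare per coarse tile for the per-pixel work it avoids.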
`The second section of the scan converter 540 works in
`conjunction with the Early Z interface of the Z buffer logic
`555 to step through the coarse tile at a fine level. In one
`embodiment, the coarse tile is subdivided into 2×2 regions
`(called “quads”). For each quad, coverage and Z (depth)
`information is computed. A single bit mode register specifies
`where Z buffering takes place. If the current Z buffering mode
`is set to “early”, each quad is passed to the Z buffer 555 where
`its Z values are compared against the values stored in the Z
`buffer at that location. Z values for those covered pixels which
`“pass” the Z compare are written back into the Z buffer, and
`a modified coverage mask describing the result of the Z
`compare test is passed back to the scan converter 540. At this
`stage, those quads for which none of the covered pixels
`passed the Z compare test are discarded. The early Z functionality
`attempts to minimize the amount of work applied by
`the unified shader and texture unit to quads which are not
`visible. The functionality of the scan converter and Early Z
`interface is further described in commonly owned co-pending
`U.S. patent application entitled “Scalable Rasterizer Interpolator”,
`with Ser. No. 10/730,864, filed Dec. 8, 2003, which is
`hereby fully incorporated by reference.
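The early Z step for a single quad can be sketched as follows. The sketch assumes a "less than" depth compare with smaller Z closer to the viewer (the text does not state the compare function), models the Z buffer as a dictionary keyed by pixel coordinates, and uses the hypothetical name `early_z_quad`.

```python
def early_z_quad(quad_z, coverage, zbuffer, x, y):
    """Run the early Z test on one 2x2 quad whose top-left pixel is (x, y).

    quad_z:   four depth values, ordered (x, y), (x+1, y), (x, y+1), (x+1, y+1)
    coverage: four booleans, True where the primitive covers the pixel
    zbuffer:  {(px, py): stored depth}; missing entries count as farthest
    Returns the modified coverage mask. Passing Z values are written
    back into zbuffer. Assumption: a pixel passes when its Z is
    strictly less than the stored Z.
    """
    offsets = ((0, 0), (1, 0), (0, 1), (1, 1))
    mask = []
    for (dx, dy), z, covered in zip(offsets, quad_z, coverage):
        key = (x + dx, y + dy)
        passed = covered and z < zbuffer.get(key, float("inf"))
        if passed:
            zbuffer[key] = z
        mask.append(passed)
    return mask
```

A quad whose returned mask contains no `True` entry would be discarded before shading, which is the work saving the passage describes.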
`Rasterizer 560 computes one or more sets of 2D or 3D
`perspective correct texture addresses and colors for each
`quad. The time taken to transfer data for each quad depends
`on the total number of texture addresses and colors required
`by that quad.
`A unified shader 570 works in conjunction with the texture
`unit 585 and applies a programmed sequence of instructions
`to the rasterized values. These instructions may involve
`simple mathematical functions (add, multiply, etc.) and may
`also involve requests to the texture unit. A un