`
`(12) United States Patent
`Zhu
`
(10) Patent No.: US 6,697,063 B1
(45) Date of Patent: Feb. 24, 2004
`
`(54) RENDERING PIPELINE
`
(75) Inventor: Ming Benjamin Zhu, Palo Alto, CA (US)
`
`(73) Assignee: Nvidia U.S. Investment Company,
`Santa Clara, CA (US)
`
( * ) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.
`
`(21) Appl. No.: 08/978,491
`
(22) Filed: Nov. 25, 1997
`
`Related U.S. Application Data
`(60) Provisional application No. 60/050,912, filed on Jun. 13,
`1997, and provisional application No. 60/035,744, filed on
`Jan. 3, 1997.
`
(51) Int. Cl.7: G06T 15/00
(52) U.S. Cl.: 345/421; 345/422
(58) Field of Search: 345/423, 421, 422
`
(56) References Cited

U.S. PATENT DOCUMENTS

4,876,651 A * 10/1989 Dawson et al. 345/511
5,339,386 A * 8/1994 Sodenberg et al. 345/422
5,509,110 A * 4/1996 Latham 345/421
5,561,750 A * 10/1996 Lentz 345/422
5,613,050 A * 3/1997 Hochmuth et al. 345/422
5,886,701 A * 3/1999 Chauvin et al. 345/418
5,936,642 A * 8/1999 Yumoto et al. 345/504

OTHER PUBLICATIONS

Foley et al., "Computer Graphics: Principles and Practice", pp. 863, 1120, 1996.*
Foley et al., "Computer Graphics: Principles and Practices", Addison-Wesley Publishing Co., 2nd Ed. in C, pp. 91-92, 673, 873-874, 1997.*
Carpenter, "The A-buffer, an Antialiased Hidden Surface Method", ACM, pp. 103-108, 1984.*

* cited by examiner

Primary Examiner: Almis R. Jankus
(74) Attorney, Agent, or Firm: Townsend and Townsend and Crew LLP

(57) ABSTRACT

A rendering pipeline system for a computer environment uses screen space tiling (SST) to eliminate the memory bandwidth bottleneck due to frame buffer access and performs screen space tiling efficiently, while avoiding the breaking up of primitives. The system also reduces the buffering size required by SST. High quality, full-scene anti-aliasing is easily achieved because only the on-chip multi-sample memory corresponding to a single tile of the screen is needed. The invention uses a double-z scheme that decouples the scan conversion/depth-buffer processing from the more general rasterization and shading processing through a scan/z engine. The scan/z engine externally appears as a fragment generator but internally resolves visibility and allows the rest of the rendering pipeline to perform setup for only visible primitives and shade only visible fragments. The resulting reduced raster/shading requirements can lead to reduced hardware costs because one can process all parameters with generic parameter computing units instead of with dedicated parameter computing units. The invention processes both opaque and transparent geometries.
`
`29 Claims, 20 Drawing Sheets
`
`
`MEDIATEK, Ex. 1029, Page 1
`IPR2018-00101
`
`
`
Sheet 1 of 20
FIG. 1 (PRIOR ART): triangles 101, strip 102, and fan 103 forms of basic primitives.
`
`
`
Sheet 2 of 20
FIG. 2 (PRIOR ART): screen geometries 201; raster 202; shading 203; z/alpha blending/pixel-ops 204; color/z frame buffer 205; video-out 206.
`
`
`
Sheet 3 of 20
FIG. 3 (PRIOR ART): screen 301 partitioned into 16 tiles numbered 1 to 16 (reference numerals 302 to 312 among them), with triangles a 313, b 314, c 315, and d 316.
`
`
`
Sheet 4 of 20
FIG. 4: geometries 401; screen space tiler 412; tile screen x, y, z 402 (annotated "2-opaque; 1-transparent"); tile scan/z with depth buffer 403; visible fragment FIFO 404; tile geometries 411; raster 405; shading 406; tile z/alpha blending/pixel-ops with color/z frame buffer 407. Also shown: memory, texture memory, color frame buffer, and video-out (reference numerals 408 to 410).
`
`
`
Sheet 5 of 20
FIG. 5: two triangle strips; reference numerals 501 to 508 (vertex and tile labels not recoverable).
`
`
`
Sheet 6 of 20
FIG. 6: 16 tiles numbered 1 to 16, each annotated with a page count (1, 3, 4, and 2 pages per tile in successive rows, listed from the row containing tiles 13 to 16 to the row containing tiles 1 to 4); reference numerals 601 to 616.
`
`
`
Sheet 7 of 20
FIG. 7: blocks labeled visible fragments, screen geometries, input, primitive parameter setup, fragment parameter computation, texture access, shading, and "to z/alpha blending/pixel-ops engine"; reference numerals 701 to 708.
`
`
`
Sheet 8 of 20
FIG. 8: triangle with vertices [x0, y0; P0], [x1, y1; P1], and [x2, y2; P2]; reference numerals 801 to 806.
`
`
`
Sheet 9 of 20
FIG. 9: dedicated setup units with inputs [x0,y0;r0] [x1,y1;r1] [x2,y2;r2] through [x0,y0;a0] [x1,y1;a1] [x2,y2;a2] and outputs ri, drdx, drdy through ai, dadx, dady ("two more units for g and b"); generic parameter setup pipeline with inputs [x0,y0;r0,g0,b0,a0], [x1,y1;r1,g1,b1,a1], [x2,y2;r2,g2,b2,a2] and outputs ai, dadx, dady; bi, dbdx, dbdy; gi, dgdx, dgdy; ri, drdx, drdy. Reference numerals 901 and 902.
`
`
`
Sheet 10 of 20
FIG. 10: drawing content not recoverable; reference numerals 1001, 1002, and 1003.
`
`
`
Sheet 11 of 20
FIG. 11: binning mailbox; binning next frame; page-release read pointer; unused page pool; current frame page list with entries [1;2;1,2], [2;3;3,4,5], [3;1;6], [4;4;7,8,9,10]; rendering read pointer; rendering current frame. Reference numerals 1101 to 1108.
`
`
`
Sheet 12 of 20
FIG. 12: screen geometries; screen space tiler; tile screen x, y, z (annotated "2-opaque; 1-transparent"); tile scan/z with depth buffer; visible fragment FIFO; tile geometries; raster; memory; to downstream pipeline. Reference numerals 1201 to 1208.
`
`
`
Sheet 13 of 20
FIG. 13: blocks labeled Model, View, Project, and P-divide; reference numerals 1301 to 1304.
`
`
`
Sheet 14 of 20
FIG. 14: multipass buffer and color buffer; reference numerals 1401 to 1403.
`
`
`
Sheet 15 of 20
FIG. 15: color buffer and multipass buffer; reference numerals 1501 and 1502.
`
`
`
Sheet 16 of 20
FIG. 16: color merging/composition unit, multipass buffer, and color buffer; reference numerals 1601 to 1603.
`
`
`
Sheet 17 of 20
FIG. 17: texture parameter setup; screen xy setup; screen z setup; color parameter setup; span screen-z, span color, and span texture interpolation; pixel screen-z, pixel color, and pixel texture interpolation; edge walk; pixel walk; z-buffer/color-buffer; shading. Reference numerals 1701 to 1714.
`
`
`
Sheet 18 of 20
FIG. 18: screen xy setup; screen z setup; attribute setup (4); pixel generation; screen z interpolation; z buffering; attribute computation (4); data composition (4). Reference numerals 1802 to 1808.
`
`
`
Sheet 19 of 20
FIG. 19: pixel buffer; attribute setup; attribute interpolator; pixel lut/assembly; color composition; color buffer/blender. Reference numerals 1901 to 1906.
`
`
`
Sheet 20 of 20
FIG. 20: pixel buffer; attribute setup; attribute interpolator; pixel lut/assembly; color buffer/compositor. Reference numerals 2001 to 2005.
`
`
`
RENDERING PIPELINE

This application claims benefit of No. 60/035,744 filed Jan. 3, 1997 and claims benefit of No. 60/050,912 filed Jun. 13, 1997.
`
`BACKGROUND OF THE INVENTION
`
`1. Technical Field
`The invention relates to the rendering of graphics in a
`computer environment. More particularly, the invention
`relates to a rendering pipeline system that renders graphical
`primitives displayed in a computer environment.
`2. Description of the Prior Art
`Graphical representations and user interfaces are no
`longer an optional feature but rather a requirement for
`computer applications. There is a pressing need to produce
`high performance, high quality, and low cost 3D graphics
`rendering pipelines because of this demand.
Some geometry processing units (e.g. general-purpose host processors or specialized dedicated geometry engines) process geometries in model space into geometries in screen space. Screen space geometries are a collection of geometric primitives represented by screen space vertices and their connectivity information. A screen space vertex typically contains screen x, y, z coordinates, multiple sets of colors, multiple sets of texture attributes (including the homogeneous components), and possibly vertex normals. Referring to FIG. 1, the connectivity information is conveyed using basic primitives such as points, lines, and triangles 101, or strip 102 or fan 103 forms of these basic primitives.
In a traditional architecture, raster or rasterization refers to the following process:
Given screen x and y positions as well as all other parameter values for all vertices of a primitive, perform parameter setup computation in the form of plane equations; scan convert the primitive into fragments based on screen x and y positions; and compute parameter values at these fragment locations.
Referring to FIG. 2, a traditional rendering pipeline is shown. Screen geometries 201 are rasterized 202. The shading process 203 is then performed on the graphics primitives. The z/alpha blending process 204 places the final output into the color/z frame buffer 205, which is destined for the video output 206. There is a serious concern with the memory bandwidth between the z/alpha-blending/pixel-op process 204 and the frame buffer in the memory 205. Consider z-buffering 100 Mpixels/s, assuming 4 bytes/pixel for RGBA color, 2 bytes/pixel for z, and 50% of the pixels actually being written into the frame buffer on average due to z-buffering. The memory bandwidth is computed as follows:

100 Mpixels/s*(2 bytes+50%*(4 bytes+2 bytes))/pixel=500 Mbytes/s

The equation assumes a hypothetical perfect prefetch of pixels from frame buffer memory into a local pixel cache without either page miss penalty or wasteful pixels.
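The arithmetic behind the 500 Mbytes/s figure can be checked with a short sketch. The function name and parameter names below are illustrative assumptions, not terms from the patent; the model is the idealized one stated above (one z read per fragment, color and z written for the fraction of fragments that pass).

```python
def zbuffer_bandwidth_mbytes_per_s(fill_mpixels_per_s, z_bytes=2,
                                   color_bytes=4, write_fraction=0.5):
    """Idealized frame buffer traffic (Mbytes/s) for z-buffered rendering.

    Hypothetical helper: every fragment reads z, and only write_fraction
    of the fragments write both color and z back to the frame buffer.
    """
    bytes_per_pixel = z_bytes + write_fraction * (color_bytes + z_bytes)
    return fill_mpixels_per_s * bytes_per_pixel


print(zbuffer_bandwidth_mbytes_per_s(100))  # 500.0
```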
The actual memory bandwidth is substantially higher because the read-modify-write cycle required for z-buffering cannot be implemented efficiently without a complicated pipeline and long delay. Alpha blending increases the bandwidth requirement even further. The number is dramatically increased if full-scene anti-aliasing is performed. For example, 4-subsample multi-sampling requires the frame buffer memory access bandwidth by the z/alpha-blending/pixel-op engine 204 to roughly quadruple, i.e. at least 2 Gbytes/s of memory bandwidth is required to do 4-subsample multi-sampling at 100 Mpixels/s. Full-scene anti-aliasing is extremely desirable for improving rendering quality; however, it is impractical to implement under a traditional rendering pipeline architecture unless one accepts either massive memory bandwidth (e.g. through interleaving multiple processors/memories), which leads to a rapid hardware cost increase, or compromised pixel fill performance. Full-scene anti-aliasing also requires the frame buffer size to increase significantly, e.g. to quadruple in the case of 4-subsample multi-sampling.
Another drawback with the traditional rendering pipeline is that all primitives, regardless of whether they are visible, are completely rasterized and the corresponding fragments are shaded. Considering a pixel fill rate of 400 Mpixels/s for non-anti-aliased geometries and assuming a screen resolution of 1280x1024 with a 30 Hz frame rate, the average depth complexity is 10. Even if there is anti-aliasing, the average depth complexity is still between 6 and 7 for an average triangle size of 50 pixels. The traditional pipeline therefore wastes a large amount of time rasterizing and shading geometries that do not contribute to final pixel colors.
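The depth complexity of about 10 follows from dividing the fill rate by the number of screen pixels drawn per second. A minimal sketch of that calculation, with a hypothetical function name:

```python
def average_depth_complexity(fill_pixels_per_s, width, height, frame_rate_hz):
    """Fragments generated per screen pixel per frame (hypothetical helper)."""
    return fill_pixels_per_s / (width * height * frame_rate_hz)


dc = average_depth_complexity(400e6, 1280, 1024, 30)
print(round(dc, 2))  # 10.17
```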
There are other approaches which attempt to resolve these problems. With respect to memory bandwidth, two solutions exist. One approach is to use a more specialized memory design by either placing sophisticated logic on Dynamic Random Access Memory (DRAM) (e.g. customized memory chips such as 3DRAM) or placing a large amount of DRAM on logic. While this can alleviate the memory bandwidth problem to a large extent, it is not currently cost-effective due to economies of scale. In addition, the frame buffer size in the memory grows dramatically for full-scene anti-aliasing.
The other alternative is to cache the frame buffer on-chip, which is also called virtual buffering. Only a portion of the frame buffer can be cached at any time because on-chip memory is limited. One type of virtual buffering uses the on-chip memory as a general pixel cache, i.e. a window into the frame buffer memory. Pixel caching can take advantage of spatial coherence; however, the same location of the screen might be cached in and out of the on-chip memory many times during a frame. Therefore, it uses very little intra-frame temporal coherence (in the form of depth complexity).
The only way to take advantage of intra-frame temporal coherence reliably is through screen space tiling (SST). First, all geometries are binned into tiles (also called screen subdivisions, which are based on screen locations). For example, with respect to FIG. 3, the screen 301 is partitioned into 16 square, disjoint tiles, numbered 1 302, 2 303, 3 304, up to 16 312. Four triangles a 313, b 314, c 315, and d 316 are binned as follows:
`tile 5 306: a 313
`tile 6 307: a 313, b 314, c 315
`tile 7 308: c 315, d 316
`tile 9 309: a 313
`tile 10 310: a 313, b 314, c 315, d 316
`tile 11 311: c 315, d 316
Second, the screen tiles are swept through one at a time: a tile's worth of geometry is processed using an on-chip tile frame buffer, producing the final pixel colors corresponding to the tile, which are then output to the frame buffer. Here, the external frame buffer access bandwidth is limited to the final pixel color output. There is no external memory bandwidth difference between non-anti-aliasing and full-scene anti-aliasing. The memory footprint in the external frame buffer is identical regardless of whether non-anti-aliasing or full-scene anti-aliasing is used. There is effectively no external depth-buffer memory bandwidth, and the depth buffer need not exist in the external memory. The disadvantage is that extra screen space binning is introduced, which implies an extra frame of latency.
Two main approaches exist with respect to depth complexity. One requires geometries to be sorted from front to back and rendered in that order, with no shading of invisible fragments.
The disadvantages to this first approach are: 1) spatial sorting needs to be performed off-line, and thus only works reliably for static scenes, while dynamics dramatically reduce its effectiveness; 2) front-to-back sorting requires depth priorities to be adjusted per frame by the application programs, which places a significant burden on the host processors; and 3) front-to-back sorting tends to break other forms of coherence, such as texture access coherence or shading coherence. Without front-to-back sorting, one-pass shading-after-z for random applications gives some improvement over the traditional rendering pipeline; however, performance improvement is not assured.
The other approach is deferred shading, where: 1) primitives are fully rasterized and their fragments are depth-buffered with their surface attributes; and 2) the (partially) visible fragments left in the depth buffer are shaded using the associated surface attributes when all geometries are processed at the end of a frame. This guarantees that only visible fragments are shaded.
The main disadvantages with this approach are: 1) deferred shading breaks shading coherence; 2) deferred shading requires full rasterization of all primitives, including invisible primitives and invisible fragments; 3) deferred shading requires shading all subsamples when multi-sample anti-aliasing is applied; and 4) deferred shading does not scale well with a varying number of surface attributes (because it has to handle the worst case).
It would be advantageous to provide a rendering pipeline system that lowers the system cost by reducing the memory bandwidth consumed by the rendering system. It would further be advantageous to provide an efficient rendering pipeline system that writes visible fragments once into the color buffer and retains coherence.

SUMMARY OF THE INVENTION
The invention provides a rendering pipeline system for a computer environment. The invention uses a rendering pipeline design that efficiently renders visible fragments by decoupling the scan conversion/depth buffer processing from the rasterization/shading process. It further provides a rendering pipeline system that reduces the memory bandwidth consumed by frame buffer accesses through screen space tiling. In the invention, raster or rasterization refers to the following process:
For each visible primitive, parameter setup computation is performed to generate plane equations. For each visible fragment of said visible primitive, parameter values are computed. Scan conversion is excluded from the rasterization process.
The invention uses screen space tiling (SST) to eliminate the memory bandwidth bottleneck due to frame buffer access. Quality is also improved by using full-scene anti-aliasing. This is possible under SST because only on-chip memory corresponding to a single tile of the screen, as opposed to the full screen, is needed. A 32x32 tile anti-aliased frame buffer is easily implemented on-chip, and a larger tile size can later be accommodated. Additionally, the invention performs screen space tiling efficiently, while avoiding the breaking up of primitives. The invention also reduces the buffering size through the use of single+ buffering.
The invention uses a double-z scheme that decouples the scan conversion/depth-buffer processing from the more general rasterization and shading processing. The core of double-z is the scan/z engine, which externally looks like a fragment generator but internally resolves visibility. It allows the rest of the rendering pipeline to rasterize only visible primitives and shade only visible fragments. Consequently, the raster/shading rate is decoupled from the scan/z rate. The invention also allows both opaque and transparent geometries to work seamlessly under this framework.
The raster/shading engine is alternatively modified to take advantage of the reduced raster/shading requirements. Instead of using dedicated parameter computing units, one can share a generic parameter computing unit to process all parameters.
Other aspects and advantages of the invention will become apparent from the following detailed description in combination with the accompanying drawings, illustrating, by way of example, the principles of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic diagram of triangle, strip and fan forms of basic primitives;
FIG. 2 is a block schematic diagram of the data flow of a traditional rendering pipeline;
FIG. 3 is a schematic diagram of a screen partition;
FIG. 4 is a block schematic diagram of the data flow of a preferred embodiment of the invention;
FIG. 5 is a schematic diagram of two triangle strips in an ideal binning situation according to the invention;
FIG. 6 is a schematic diagram of a depth complexity distribution that arises frequently in graphics applications due to perspective according to the invention;
FIG. 7 is a block schematic diagram of the data flow of the raster/shading engine in a preferred embodiment of the invention;
FIG. 8 is a schematic diagram of a triangle and its visible fragments according to the invention;
FIG. 9 is a block schematic diagram of the data flow of primitive parameter setup pipelines according to the invention;
FIG. 10 is a schematic diagram of a subsample, pixel, and visible opaque fragment according to the invention;
FIG. 11 is a block schematic diagram of the data flow of the page allocation/release synchronization in screen space tiling in a preferred embodiment of the invention;
FIG. 12 is a block schematic diagram of the module communications in a preferred embodiment of the invention;
FIG. 13 is a block schematic diagram of the data flow involved in geometry transformation in a preferred embodiment of the invention;
FIG. 14 is a block schematic diagram of two schemes for anti-aliased multipass rendering according to the invention;
FIG. 15 is a block schematic diagram of the data flow of a revised scheme for anti-aliased multipass rendering according to the invention;
FIG. 16 is a block schematic diagram of the data flow of a further refined scheme for anti-aliased multipass rendering according to the invention;
FIG. 17 is a block schematic diagram of the data flow of a traditional polygon rasterization engine according to the invention;
FIG. 18 is a block schematic diagram of the data flow of a decoupled rasterization engine in a preferred embodiment of the invention;
FIG. 19 is a block schematic diagram of the data flow of a fine-grain multipass rendering engine in a preferred embodiment of the invention; and
FIG. 20 is a block schematic diagram of the data flow of a coarse-grain multipass rendering engine in a preferred embodiment of the invention.
`DETAILED DESCRIPTION OF THE
`INVENTION
As shown in the drawings for purposes of illustration, the invention provides a rendering pipeline system in a computer environment. A system according to the invention provides efficient use of processing capabilities and memory bandwidth through the intelligent management of primitive rendering and memory usage while retaining coherence.
The invention uses screen space tiling (SST) to eliminate the memory bandwidth bottleneck due to frame buffer access. Quality is also improved by using full-scene anti-aliasing. This is possible under SST because only on-chip memory corresponding to a single tile of the screen, as opposed to the full screen, is needed. A 32x32 tile anti-aliased frame buffer is easily implemented on-chip, and a larger tile size can later be accommodated. Additionally, the invention performs screen space tiling efficiently while avoiding the breaking up of primitives, and reduces the buffering size required by SST.
The invention uses a double-z scheme that decouples the scan conversion/depth-buffer processing from the more general rasterization and shading processing. The core of double-z is the scan/z engine, which externally looks like a fragment generator but internally resolves visibility. It allows the rest of the rendering pipeline to compute parameters for only visible primitives and shade only visible fragments. Consequently, the raster/shading rate is decoupled from the scan/z rate. The invention also allows both opaque and transparent geometries to work seamlessly under this framework.
The raster/shading engine is alternatively modified to take advantage of the reduced raster/shading requirements. Another option in the invention is to replace the dedicated processing units for each surface parameter with generic parameter pipelines that are shared by all parameters.
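To illustrate what a shared, generic parameter pipeline computes, the following sketch runs one plane-equation setup routine over every surface parameter of a triangle instead of using a separate unit per parameter. It is a software analogy only; the function names, the dictionary layout, and the sample values are assumptions made for illustration and do not come from the patent.

```python
def plane_setup(v0, v1, v2):
    """Return (p0, dpdx, dpdy) for one parameter over a triangle.

    Each vertex is (x, y, p). The plane p(x, y) passes through all three
    vertices. Raises ValueError for a zero-area triangle.
    """
    (x0, y0, p0), (x1, y1, p1), (x2, y2, p2) = v0, v1, v2
    dx1, dy1, dp1 = x1 - x0, y1 - y0, p1 - p0
    dx2, dy2, dp2 = x2 - x0, y2 - y0, p2 - p0
    det = dx1 * dy2 - dx2 * dy1  # twice the signed triangle area
    if det == 0:
        raise ValueError("degenerate triangle")
    dpdx = (dp1 * dy2 - dp2 * dy1) / det
    dpdy = (dp2 * dx1 - dp1 * dx2) / det
    return p0, dpdx, dpdy


def evaluate(plane, origin, x, y):
    """Evaluate a plane equation at fragment location (x, y)."""
    p0, dpdx, dpdy = plane
    x0, y0 = origin
    return p0 + dpdx * (x - x0) + dpdy * (y - y0)


def setup_all(xy, params):
    """Run the same setup routine over every named parameter."""
    return {name: plane_setup(*[(x, y, p) for (x, y), p in zip(xy, values)])
            for name, values in params.items()}


# Hypothetical triangle with a red and an alpha value at each vertex.
xy = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
planes = setup_all(xy, {"r": (0.0, 1.0, 0.0), "a": (1.0, 1.0, 0.5)})
print(planes["r"])                             # (0.0, 0.25, 0.0)
print(evaluate(planes["a"], xy[0], 2.0, 2.0))  # 0.75
```

Because only visible primitives reach setup, a single routine of this kind may suffice where a traditional pipeline would need one unit per parameter to keep up.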
Referring to FIG. 4, the basic data flow of the invention is shown. The geometries in model space 401 are transformed into screen space, and the screen space tiler 412 bins a frame's worth of geometries into screen tiles. The visibility of all geometries is determined up front using only screen x, y, z coordinates 402 in the scan/z engine 403 for each tile. Visibility information 404 is sent out for rasterization 405 and shading 406. The visibility information 404 is combined with the tile geometries 411 for each tile so that only visible geometries are set up for rasterization. Only visible fragments are fully rasterized and shaded in the raster 405/shading 406 engine. The resulting fragments are sent to the blending engine 407. The blending engine 407 alpha-blends incoming fragments. The blending engine 407 resolves and outputs pixel colors into the frame buffer at the end-of-tile. The screen space tiler 412, scan/z 403, raster 405/shading 406, and blending 407 engines operate in parallel to balance the load of the various processes. This does introduce one frame of latency. If the extra latency is objectionable, then the scan/z 403, raster 405/shading 406, and blending 407 engines operate in parallel, with the screen space tiler 412 operating serially before them.
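The idea of resolving visibility before shading can be sketched in software as two passes over the same fragments. This is a simplified illustration under stated assumptions, not the patent's hardware design: it handles opaque geometry only, treats a smaller z as nearer, represents fragments as plain tuples, and all names are hypothetical.

```python
def scan_z_visible_fragments(fragments):
    """Return only the fragments that survive depth resolution.

    fragments: iterable of (prim_id, x, y, z), opaque only, smaller z nearer.
    """
    fragments = list(fragments)
    depth = {}
    # Pass 1: resolve the nearest depth at every covered pixel.
    for _, x, y, z in fragments:
        if z < depth.get((x, y), float("inf")):
            depth[(x, y)] = z
    # Pass 2: scan again and emit only fragments matching the resolved depth.
    visible, emitted = [], set()
    for prim_id, x, y, z in fragments:
        if z == depth[(x, y)] and (x, y) not in emitted:
            emitted.add((x, y))
            visible.append((prim_id, x, y, z))
    return visible


def visible_primitives(visible):
    """Primitives that own at least one visible fragment."""
    return sorted({prim_id for prim_id, _, _, _ in visible})


frags = [("far", 0, 0, 0.9), ("far", 1, 0, 0.9),
         ("near", 0, 0, 0.2), ("near", 1, 0, 0.2),
         ("hidden", 0, 0, 0.5)]
vis = scan_z_visible_fragments(frags)
print(visible_primitives(vis))  # ['near']
print(len(vis), "of", len(frags), "fragments reach shading")
```

In this toy input only one of three primitives needs parameter setup and two of five fragments need shading, which is the kind of saving the downstream raster/shading stages are meant to exploit.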
`
`
`Screen Space Tiling
`2.1 Overview
Screen space tiling (SST) partitions a screen into disjoint (rectangular) regions (called tiles). It bins all geometries in screen space into tiles that the geometries intersect. Primitives crossing multiple tiles will be binned in all relevant tiles.
Referring to FIG. 3, for example, a screen 301 is partitioned into 16 square, disjoint tiles, numbered 1 302, 2 303, 3 304, up to 16 312. Four triangles a 313, b 314, c 315, and d 316 are binned as follows:
`tile 5 306: a 313
`tile 6 307: a 313, b 314, c 315
`tile 7 308: c 315, d 316
`tile 9 309: a 313
`tile 10 310: a 313, b 314, c 315, d 316
`tile 11 311: c 315, d 316
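A minimal binning sketch is given below. It assumes row-major tile numbering starting at 1, on-screen vertex coordinates, and a simple bounding-box test, which can be conservative; the vertex coordinates are made up for illustration and are not those of FIG. 3.

```python
from collections import defaultdict


def bin_triangle(vertices, tile_size, tiles_per_row):
    """Tile numbers (1-based, row-major) overlapped by the bounding box."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    tx0, tx1 = int(min(xs) // tile_size), int(max(xs) // tile_size)
    ty0, ty1 = int(min(ys) // tile_size), int(max(ys) // tile_size)
    return [ty * tiles_per_row + tx + 1
            for ty in range(ty0, ty1 + 1)
            for tx in range(tx0, tx1 + 1)]


def bin_scene(triangles, tile_size, tiles_per_row):
    """Map each tile number to the names of the triangles binned into it."""
    bins = defaultdict(list)
    for name, vertices in triangles.items():
        for tile in bin_triangle(vertices, tile_size, tiles_per_row):
            bins[tile].append(name)
    return dict(bins)


# Hypothetical 4x4 grid of 32-pixel tiles and two made-up triangles.
triangles = {"a": [(40, 40), (60, 70), (90, 45)],
             "b": [(5, 5), (20, 10), (10, 25)]}
bins = bin_scene(triangles, tile_size=32, tiles_per_row=4)
print(sorted(bins.items()))
# [(1, ['b']), (6, ['a']), (7, ['a']), (10, ['a']), (11, ['a'])]
```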
This binning process completes when a frame boundary is reached. Then the binned geometries are handed to the rendering pipeline for rendering. Meanwhile, geometry transformation and binning for the next frame gets started. Ideally, geometry transformation and binning of the next frame is performed in parallel with the rendering of the current frame.
One potential drawback with SST is the extra frame of latency it introduces; however, this is generally tolerable.
A tile's worth of geometries is rendered without external frame buffer access by maintaining a tile frame buffer on-chip. The final pixel colors are output to the external color buffer in the memory only after geometries for the whole tile have been processed. Therefore, the memory bandwidth bottleneck in a traditional rendering pipeline caused by frame buffer access is eliminated.
Because SST requires binning one frame of geometries, and because of the geometry size, the binned geometries have to be stored in external memory. Both writing binned geometries into memory during binning and reading binned geometries from memory during tile rendering consume memory bandwidth. The memory bandwidth requirement for both reading and writing is examined next. Assume that 1 Mtriangles are represented in strip form. The average triangle size is 50 pixels, and the average vertex size is 20-25 bytes with screen x, y, z coordinates, 2 sets of colors, 1 set of 2D texture coordinates, and 1/w in packed form of adequate precision. The average triangle strip size within a tile is about 8 vertices, which gives 1.33 vertex/tri. In addition, up to 50% of the triangles need to be duplicated across multiple tiles. Therefore, the memory bandwidth required for SST is roughly:

2(write/read)*1 Mtris*1.5*1.33 vtx/tri*20-25 bytes/vtx=80-100 Mbytes
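The estimate can be reproduced with a few lines of arithmetic. The helper names are hypothetical; the inputs are the figures stated above (8-vertex strips, 50% duplication, one write plus one read).

```python
def vertices_per_triangle(strip_vertices):
    """A strip of n vertices encodes n - 2 triangles."""
    return strip_vertices / (strip_vertices - 2)


def sst_traffic_mbytes(mtris, bytes_per_vertex,
                       vertices_per_tri=1.33, duplication=1.5):
    """Binned-geometry traffic in Mbytes: one write plus one read."""
    return 2 * mtris * duplication * vertices_per_tri * bytes_per_vertex


print(round(vertices_per_triangle(8), 2))   # 1.33
print(round(sst_traffic_mbytes(1, 20), 1))  # 79.8
print(round(sst_traffic_mbytes(1, 25), 2))  # 99.75
```

The two results round to the 80-100 Mbytes range quoted in the text.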
`
The memory bandwidth number for SST stays the same whether full-scene anti-aliasing is implemented or not, or geometries are only z-buffered, or complex alpha-blending is performed, because the tile frame buffer is on-chip. This bandwidth scales linearly with polygon performance. For example, a performance of 5 Mtris/s and 10 Mtris/s requires 400-500 Mbytes/s and 800-1000 Mbytes/s memory bandwidth respectively. The bandwidth goes down when the average vertex size decreases. In addition, the bandwidth number goes down as the average triangle size becomes smaller, because a tile can now contain longer strips, and the likelihood of triangle duplication in multiple tiles due to tile border crossing is reduced. The asymptotic rate approaches 40-50 Mbytes per 1M triangles as the average triangle size
`
`what tile(s) to output, and what the new state is. The state
`information contains two parts:
`a) what tile(s) was the previous triangle in the binning
`strip/fan output to?
`b) where is the new triangle?
`For b), there are three main state values and the corre-
`sponding actions:
`1) when all three vertices are in the same tile, then output
`the triangle to that tile only.
2) when all three vertices are in two horizontally or vertically adjacent tiles, then output the triangle to both tiles.
`3) otherwise, bin the triangle to all tiles that intersect with
`the bounding box of the triangle.
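The three cases for locating a new triangle can be sketched as follows. This covers only part b) of the state (where the new triangle is), not the history of the previous triangle in the strip or fan; tiles are identified by hypothetical (column, row) indices, and the function names are assumptions.

```python
def tile_of(x, y, tile_size):
    """(column, row) index of the tile containing point (x, y)."""
    return (int(x // tile_size), int(y // tile_size))


def tiles_for_triangle(vertices, tile_size):
    """Tiles to which a triangle is output, following cases 1) to 3)."""
    tiles = {tile_of(x, y, tile_size) for x, y in vertices}
    if len(tiles) == 1:
        return sorted(tiles)  # case 1: all vertices in one tile
    if len(tiles) == 2:
        (ax, ay), (bx, by) = sorted(tiles)
        if abs(ax - bx) + abs(ay - by) == 1:
            return sorted(tiles)  # case 2: two edge-adjacent tiles
    # Case 3: every tile overlapped by the triangle's bounding box.
    cols = [tx for tx, _ in tiles]
    rows = [ty for _, ty in tiles]
    return [(tx, ty)
            for ty in range(min(rows), max(rows) + 1)
            for tx in range(min(cols), max(cols) + 1)]


print(tiles_for_triangle([(1, 1), (5, 2), (3, 6)], 32))       # [(0, 0)]
print(tiles_for_triangle([(30, 5), (34, 6), (31, 20)], 32))   # [(0, 0), (1, 0)]
print(tiles_for_triangle([(10, 10), (40, 40), (12, 50)], 32))
# [(0, 0), (1, 0), (0, 1), (1, 1)]
```

The last call shows the conservative behavior of the bounding-box case: it returns all four tiles of the box whether or not the triangle covers each of them.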
`Clearly, the handling of case 3) may be too conservative
`by binning nonintersecting triangles because of the co