(12) United States Patent
Zhu

(10) Patent No.: US 6,697,063 B1
(45) Date of Patent: Feb. 24, 2004

(54) RENDERING PIPELINE

(75) Inventor: Ming Benjamin Zhu, Palo Alto, CA (US)

(73) Assignee: Nvidia U.S. Investment Company, Santa Clara, CA (US)

( * ) Notice: Subject to any disclaimer, the term of this patent is
extended or adjusted under 35 U.S.C. 154(b) by 0 days.

(21) Appl. No.: 08/978,491

(22) Filed: Nov. 25, 1997

Related U.S. Application Data
(60) Provisional application No. 60/050,912, filed on Jun. 13,
1997, and provisional application No. 60/035,744, filed on
Jan. 3, 1997.

(51) Int. Cl.7: G06T 15/00
(52) U.S. Cl.: 345/421; 345/422
(58) Field of Search: 345/423, 421, 345/422

(56) References Cited

U.S. PATENT DOCUMENTS

4,876,651 A * 10/1989 Dawson et al. 345/511
5,339,386 A * 8/1994 Sodenberg et al. 345/422
5,509,110 A * 4/1996 Latham 345/421
5,561,750 A * 10/1996 Lentz 345/422
5,613,050 A * 3/1997 Hochmuth et al. 345/422
5,886,701 A * 3/1999 Chauvin et al. 345/418
5,936,642 A * 8/1999 Yumoto et al. 345/504

OTHER PUBLICATIONS

Foley et al. "Computer Graphics: Principles and Practice"
pp 863, 1120 1996.*

Foley et al. "Computer Graphics: Principles and Practices",
Addison-Wesley Publishing Co., 2nd Ed. in C. pp. 91-92,
673, 873-874, 1997.*

Carpenter "The A-buffer, an Antialiased Hidden Surface
Method" ACM pp 103-108 1984.*

* cited by examiner

Primary Examiner: Almis R. Jankus
(74) Attorney, Agent, or Firm: Townsend and Townsend
and Crew LLP

(57) ABSTRACT

A rendering pipeline system for a computer environment
uses screen space tiling (SST) to eliminate the memory
bandwidth bottleneck due to frame buffer access and performs
screen space tiling efficiently, while avoiding the
breaking up of primitives. The system also reduces the
buffering size required by SST. High quality, full-scene
anti-aliasing is easily achieved because only the on-chip
multi-sample memory corresponding to a single tile of the
screen is needed. The invention uses a double-z scheme that
decouples the scan conversion/depth-buffer processing from
the more general rasterization and shading processing
through a scan/z engine. The scan/z engine externally
appears as a fragment generator but internally resolves
visibility and allows the rest of the rendering pipeline to
perform setup for only visible primitives and shade only
visible fragments. The resulting reduced raster/shading
requirements can lead to reduced hardware costs because
one can process all parameters with generic parameter
computing units instead of with dedicated parameter
computing units. The invention processes both opaque and
transparent geometries.

29 Claims, 20 Drawing Sheets
`
[Representative drawing (same diagram as FIG. 4): screen geometries, screen space tiler, tile scan/z with depth buffer, visible fragment FIFO, raster, shading, tile z/alpha blending/pixel-ops with color/z frame buffer, color frame buffer, video-out; memory holding tile geometries and texture memory]

`MEDIATEK, Ex. 1029, Page 1
`IPR2018-00101
`
`

`

[Sheet 1 of 20. FIG. 1 (PRIOR ART): reference numerals 101, 102, 103]

[Sheet 2 of 20. FIG. 2 (PRIOR ART): screen geometries 201, raster 202, shading 203, z/alpha blending/pixel-ops 204, color/z frame buffer 205, video-out 206]

[Sheet 3 of 20. FIG. 3 (PRIOR ART): screen 301 partitioned into tiles 1 to 16, with reference numerals 302 to 316]

[Sheet 4 of 20. FIG. 4: screen geometries, screen space tiler, tile screen x,y,z, tile scan/z with depth buffer, visible fragment FIFO, raster, shading, tile z/alpha blending/pixel-ops with color/z frame buffer, color frame buffer, video-out; memory holding tile geometries and texture memory; reference numerals 401 to 412]

[Sheet 5 of 20. FIG. 5: reference numerals 501 to 508]

[Sheet 6 of 20. FIG. 6: tiles 1 to 16, each annotated with a page count of 1, 2, 3, or 4 pages; reference numerals 601 to 616]

[Sheet 7 of 20. FIG. 7: screen geometries input, visible fragments, primitive parameter setup, fragment parameter computation, texture access, shading, output to z/alpha blending/pixel-ops engine; reference numerals 701 to 708]

[Sheet 8 of 20. FIG. 8: triangle with vertices [x0, y0; P0], [x1, y1; P1], [x2, y2; P2]; reference numerals 801 to 806]

[Sheet 9 of 20. FIG. 9: 901, separate setup units per parameter (r and a shown, two more units for g and b), each producing a value and its x and y gradients (ri, drdx, drdy; ai, dadx, dady); 902, generic parameter setup pipeline taking vertices [x, y; r, g, b, a] and producing the same outputs for r, g, b, and a]

[Sheet 10 of 20. FIG. 10: reference numerals 1001, 1002, 1003]

[Sheet 11 of 20. FIG. 11: binning mailbox, binning next frame, page-release read pointer, unused page pool, current frame page list (entries [1;2;1,2], [2;3;3,4,5], [3;1;6], [4;4;7,8,9,10]), rendering read pointer, rendering current frame; reference numerals 1101 to 1108]

[Sheet 12 of 20. FIG. 12: screen geometries, screen space tiler, tile screen x,y,z, tile scan/z with depth buffer, visible fragment FIFO, tile geometries, raster, memory, output to downstream pipeline; reference numerals 1201 to 1208]

[Sheet 13 of 20. FIG. 13: Model, View, Project, P-divide; reference numerals 1301 to 1304]

[Sheet 14 of 20. FIG. 14: multipass buffer, color buffer; reference numerals 1401 to 1403]

[Sheet 15 of 20. FIG. 15: color buffer, multipass buffer; reference numerals 1501, 1502]

[Sheet 16 of 20. FIG. 16: color merging/composition unit, multipass buffer, color buffer; reference numerals 1601 to 1603]

[Sheet 17 of 20. FIG. 17: texture parameter setup, screen xy setup, screen z setup, color parameter setup, edge walk, pixel walk, span and pixel interpolation of screen z, color, and texture, z-buffer/color-buffer, shading; reference numerals 1701 to 1714]

[Sheet 18 of 20. FIG. 18: screen xy setup, screen z setup, pixel generation, screen z interpolation, z buffering, attribute setup, attribute computation, data composition; reference numerals 1802 to 1808]

[Sheet 19 of 20. FIG. 19: pixel buffer, attribute setup, attribute interpolator, pixel lut/assembly, color composition, color buffer/blender; reference numerals 1901 to 1906]

[Sheet 20 of 20. FIG. 20: pixel buffer, attribute setup, attribute interpolator, pixel lut/assembly, color buffer/compositor; reference numerals 2001 to 2005]

RENDERING PIPELINE

This application claims the benefit of provisional application
No. 60/035,744, filed Jan. 3, 1997, and of provisional
application No. 60/050,912, filed Jun. 13, 1997.

`
`BACKGROUND OF THE INVENTION
`
`1. Technical Field
`The invention relates to the rendering of graphics in a
`computer environment. More particularly, the invention
`relates to a rendering pipeline system that renders graphical
`primitives displayed in a computer environment.
`2. Description of the Prior Art
`Graphical representations and user interfaces are no
`longer an optional feature but rather a requirement for
`computer applications. There is a pressing need to produce
`high performance, high quality, and low cost 3D graphics
`rendering pipelines because of this demand.
`Some geometry processing units (e.g. general-purpose
`host processors or specialized dedicated geometry engines)
`process geometries in model space into geometries in screen
`space. Screen space geometries are a collection of geometric
`primitives represented by screen space vertices and their
`connectivity information. A screen space vertex typically
`contains screen x, y, z coordinates, multiple sets of colors,
`and multiple sets of texture attributes (including the homo-
`geneous components), and possibly vertex normals. Refer-
`ring to FIG. 1, the connectivity information is conveyed
`using basic primitives such as points, lines, triangles 101, or
`strip 102, or fan 103 forms of these basic primitives.
`In a traditional architecture, raster or rasterization refers to
`the following process:
`Given screen x and y positions as well as all other
`parameter values for all vertices of a primitive, perform
parameter setup computation in the form of plane equations;
`scan convert the primitive into fragments based on screen x
`and y positions; compute parameter values at these fragment
`locations. Referring to FIG. 2, a traditional rendering pipe-
`line is shown. Screen geometries 201 are rasterized 202. The
`shading process 203 is then performed on the graphics
`primitives. The z/alpha blending process 204 places the final
`output into the color/z frame buffer 205 which is destined for
the video output 206. There is a serious concern with the
memory bandwidth between the z/alpha-blending/pixel-op
process 204 and the frame buffer in the memory 205.
Consider z-buffering 100 Mpixels/s, assuming 4 bytes/pixel
for RGBA color, 2 bytes/pixel for z, and 50% of the pixels
actually being written into the frame buffer on average due to
z-buffering. The memory bandwidth is computed as follows:

100 Mpixels/s * (2 bytes + 50% * (4 bytes + 2 bytes))/pixel = 500 Mbytes/s
`
The equation assumes a hypothetical perfect prefetch of
pixels from frame buffer memory into a local pixel cache
without either page miss penalty or wasteful pixels.
The actual memory bandwidth is substantially higher
because the read-modify-write cycle required for z-buffering
cannot be implemented efficiently without a complicated
pipeline and long delay. Alpha blending increases the
bandwidth requirement even further. The number is dramatically
increased if full-scene anti-aliasing is performed. For
example, 4-subsample multi-sampling requires the frame
buffer memory access bandwidth of the z/alpha-blending/
pixel-op engine 204 to roughly quadruple, i.e. at least 2
Gbytes/s of memory bandwidth is required to do
4-subsample multi-sampling at 100 Mpixels/s. Full-scene
anti-aliasing is extremely desirable for improving rendering
quality; however, it is impractical to implement under a
traditional rendering pipeline architecture unless either
massive memory bandwidth is applied (e.g. through
interleaving multiple processors/memories), which leads to a
rapid increase in hardware cost, or pixel fill performance is
compromised. Full-scene anti-aliasing also requires
the frame buffer size to increase significantly, e.g. to
quadruple in the case of 4-subsample multi-sampling.
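
The bandwidth arithmetic above can be checked with a short sketch. The function name and parameters are illustrative and do not appear in the patent. The sketch assumes one reading of the formula: every pixel costs one z access, and the fraction of pixels that pass the depth test additionally costs a color write and a z write. The "roughly quadruple" estimate for 4-subsample multi-sampling is modeled as a plain multiplier.

```python
def zbuffer_bandwidth(pixel_rate, color_bytes=4, z_bytes=2,
                      write_fraction=0.5, subsamples=1):
    """Return frame buffer traffic in bytes/s for z-buffered rendering.

    Assumed reading of the formula: each pixel costs one z access
    (z_bytes); the fraction that passes the depth test also costs a
    color write and a z write. Perfect prefetch is assumed, as in the
    text, so page misses and wasted pixels are ignored.
    """
    bytes_per_pixel = z_bytes + write_fraction * (color_bytes + z_bytes)
    return pixel_rate * bytes_per_pixel * subsamples


base = zbuffer_bandwidth(100e6)                 # no anti-aliasing
msaa4 = zbuffer_bandwidth(100e6, subsamples=4)  # 4-subsample multi-sampling
print(f"no anti-aliasing: {base / 1e6:.0f} Mbytes/s")
print(f"4 subsamples:     {msaa4 / 1e9:.0f} Gbytes/s")
```

Under these assumptions the two calls reproduce the 500 Mbytes/s and 2 Gbytes/s figures quoted in the text.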
Another drawback with the traditional rendering pipeline
is that all primitives, regardless of whether they are visible,
are completely rasterized and the corresponding fragments are
shaded. Considering a pixel fill rate of 400 Mpixels/s for
non-anti-aliased geometries and assuming a screen resolution
of 1280x1024 with a 30 Hz frame rate, the average
depth complexity is 10. Even if there is anti-aliasing, the
average depth complexity is still between 6 and 7 for an average
triangle size of 50 pixels. The traditional pipeline therefore
wastes a large amount of time rasterizing and shading
geometries that do not contribute to final pixel colors.
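
As a quick check of the depth complexity figure, the fill rate can be divided by the number of screen pixels drawn per second. This is a minimal sketch; the function name is an assumption made for illustration.

```python
def average_depth_complexity(fill_rate, width, height, frame_rate):
    """Average number of times each screen pixel is filled per frame."""
    pixels_per_second = width * height * frame_rate
    return fill_rate / pixels_per_second


dc = average_depth_complexity(400e6, 1280, 1024, 30)
print(f"average depth complexity: about {dc:.1f}")
```

A depth complexity of about 10 means roughly nine of every ten fragments filled are later overwritten or rejected, which is the wasted work the paragraph describes.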
`There are other approaches which attempt to resolve these
`problems. With respect to memory bandwidth, two solutions
`exist. One approach is to use a more specialized memory
`design by either placing sophisticated logic on Dynamic
`Random Access Memory (DRAM) (e.g. customized
`memory chips such as 3DRAM) or placing a large amount
`of DRAM on logic. While this can alleviate the memory
`bandwidth problem to a large extent, it is not currently
cost-effective due to economies of scale. In addition, the
`frame buffer size in the memory grows dramatically for
`full-scene anti-aliasing.
The other alternative is to cache the frame buffer
on-chip, which is also called virtual buffering. Only a
portion of the frame buffer can be cached at any time because
on-chip memory is limited. One type of virtual buffering
uses the on-chip memory as a general pixel cache, i.e. a
window into the frame buffer memory. Pixel caching can
take advantage of spatial coherence; however, the same
location of the screen might be cached in and out of the
on-chip memory many times during a frame. Therefore, it
uses very little intra-frame temporal coherence (in the form
of depth complexity).
The only way to take advantage of intra-frame temporal
coherence reliably is through screen space tiling (SST).
First, all geometries are binned into tiles (also called screen
subdivisions, which are based on screen locations). For
`example, with respect to FIG. 3, the screen 301 is partitioned
`into 16 square, disjoint tiles, numbered 1 302, 2 303, 3 304,
`up to 16 312. Four triangles a 313, b 314, c 315, and d 316
`are binned as follows:
`tile 5 306: a 313
`tile 6 307: a 313, b 314, c 315
`tile 7 308: c 315, d 316
`tile 9 309: a 313
`tile 10 310: a 313, b 314, c 315, d 316
`tile 11 311: c 315, d 316
Second, the screen tiles are swept through, processing a
tile's worth of geometry at a time using an on-chip tile
frame buffer, producing the final pixel colors corresponding
to the tile, and outputting them to the frame buffer. Here, the
external frame buffer access bandwidth is limited to the final
pixel color output. There is no external memory bandwidth
difference between non-anti-aliasing and full-scene
anti-aliasing. The memory footprint in the external frame buffer
is identical regardless of whether non-anti-aliasing or
full-scene anti-aliasing is used. There is effectively no
external depth-buffer memory bandwidth, and the depth-buffer
need not exist in the external memory. The disadvantage is
that extra screen space binning is introduced, which implies
an extra frame of latency.
Two main approaches exist with respect to depth complexity.
One requires that geometries be sorted front-to-back
and rendered in that order, with no shading of invisible
fragments.
The disadvantages to this first approach are: 1) spatial
sorting needs to be performed off-line, and thus only works
reliably for static scenes (dynamics dramatically reduce the
effectiveness); 2) front-to-back sorting requires depth
priorities to be adjusted per frame by the application programs,
which places a significant burden on the host processors; and
3) front-to-back sorting tends to break other forms of
coherence, such as texture access coherence or shading
coherence. Without front-to-back sorting, one-pass
shading-after-z for random applications gives some improvement
over the traditional rendering pipeline; however,
performance improvement is not assured.
The other approach is deferred shading where: 1) primitives
are fully rasterized and their fragments are
depth-buffered with their surface attributes; and 2) the (partially)
visible fragments left in the depth-buffer are shaded using
the associated surface attributes when all geometries are
processed at the end of a frame. This guarantees that only
visible fragments are shaded.
The main disadvantages with this approach are: 1)
deferred shading breaks shading coherence; 2) deferred
shading requires full rasterization of all primitives, including
invisible primitives and invisible fragments; 3) deferred
shading requires shading all subsamples when multi-sample
anti-aliasing is applied; and 4) deferred shading does not
scale well with a varying number of surface attributes
(because it has to handle the worst case).
It would be advantageous to provide a rendering pipeline
system that lowers the system cost by reducing the memory
bandwidth consumed by the rendering system. It would
further be advantageous to provide an efficient rendering
pipeline system that writes visible fragments once into the
color buffer and retains coherence.

SUMMARY OF THE INVENTION
The invention provides a rendering pipeline system for a
computer environment. The invention uses a rendering
pipeline design that efficiently renders visible fragments by
decoupling the scan conversion/depth buffer processing
from the rasterization/shading process. It further provides a
rendering pipeline system that reduces the memory
bandwidth consumed by frame buffer accesses through screen
space tiling. In the invention, raster or rasterization refers to
the following process:
For each visible primitive, parameter setup computation is
performed to generate plane equations. For each visible
fragment of said visible primitive, parameter values are
computed. Scan conversion is excluded from the
rasterization process.
The invention uses screen space tiling (SST) to eliminate
the memory bandwidth bottleneck due to frame buffer
access. Quality is also improved by using full-scene
anti-aliasing. This is possible under SST because only on-chip
memory corresponding to a single tile of the screen, as
opposed to the full screen, is needed. A 32x32 tile
anti-aliased frame buffer is easily implemented on-chip, and a
larger tile size can later be accommodated. Additionally, the
invention performs screen space tiling efficiently, while
avoiding the breaking up of primitives. The invention also
reduces the buffering size through the use of single+
buffering.
The invention uses a double-z scheme that decouples the
scan conversion/depth-buffer processing from the more
general rasterization and shading processing. The core of
double-z is the scan/z engine, which externally looks like a
fragment generator but internally resolves visibility. It
allows the rest of the rendering pipeline to rasterize only
visible primitives and shade only visible fragments.
Consequently, the raster/shading rate is decoupled from the
scan/z rate. The invention also allows both opaque and
transparent geometries to work seamlessly under this
framework.
The raster/shading engine is alternatively modified to take
advantage of the reduced raster/shading requirements.
Instead of using dedicated parameter computing units, one
can share a generic parameter computing unit to process all
parameters.
Other aspects and advantages of the invention will
become apparent from the following detailed description in
combination with the accompanying drawings, illustrating,
by way of example, the principles of the invention.

`BRIEF DESCRIPTION OF THE DRAWINGS
`FIG. 1 is a schematic diagram of triangle, strip and fan
`forms of basic primitives;
`FIG. 2 is a block schematic diagram of the data flow of a
`traditional rendering pipeline;
`FIG. 3 is a schematic diagram of a screen partition;
`FIG. 4 is a block schematic diagram of the data flow of a
`preferred embodiment of the invention;
`FIG. 5 is a schematic diagram of two triangle strips in an
`ideal binning situation according to the invention;
`FIG. 6 is a schematic diagram of a depth complexity
`distribution that arises frequently in graphics applications
`due to perspective according to the invention;
FIG. 7 is a block schematic diagram of the data flow of the
raster/shading engine in a preferred embodiment of the
invention;
`FIG. 8 is a schematic diagram of a triangle and its visible
`fragments according to the invention;
FIG. 9 is a block schematic diagram of the data flow of
primitive parameter setup pipelines according to the
invention;
`FIG. 10 is a schematic diagram of a subsample, pixel, and
`visible opaque fragment according to the invention;
FIG. 11 is a block schematic diagram of the data flow of
the page allocation/release synchronization in screen space
tiling in a preferred embodiment of the invention;
`FIG. 12 is a block schematic diagram of the module
`communications in a preferred embodiment of the invention;
`FIG. 13 is a block schematic diagram of the data flow
`involved in geometry transformation in a preferred embodi-
`ment of the invention;
`FIG. 14 is a block schematic diagram of two schemes for
`anti-aliased multipass rendering according to the invention;
FIG. 15 is a block schematic diagram of the data flow of
a revised scheme for anti-aliased multipass rendering
according to the invention;
`FIG. 16 is a block schematic diagram of the data flow of
`a further refined scheme for anti-aliased multipass rendering
`according to the invention;
`FIG. 17 is a block schematic diagram of the data flow of
`a traditional polygon rasterization engine according to the
`invention;
`FIG. 18 is a block schematic diagram of the data flow of
`a decoupled rasterization engine in a preferred embodiment
`of the invention;
`FIG. 19 is a block schematic diagram of the data flow of
`a fine-grain multipass rendering engine in a preferred
`embodiment of the invention; and
`FIG. 20 is a block schematic diagram of the data flow of
`a coarse-grain multipass rendering engine in a preferred
`embodiment of the invention.
`DETAILED DESCRIPTION OF THE
`INVENTION
`As shown in the drawings for purposes of illustration, the
`invention provides a rendering pipeline system in a com-
`puter environment. A system according to the invention
`provides efficient use of processing capabilities and memory
`bandwidth through the intelligent management of primitive
`rendering and memory usage while retaining coherence.
`The invention uses screen space tiling (SST) to eliminate
`the memory bandwidth bottleneck due to frame buffer
`access. Quality is also improved by using full-scene anti-
`aliasing. This is possible under SST because only on-chip
`memory corresponding to a single tile of screen as opposed
`to the full screen is needed. A 32x32 tile anti-aliased frame
`buffer is easily implemented on-chip, and a larger tile size
`can later be accommodated. Additionally, the invention
performs screen space tiling efficiently while avoiding the
breaking up of primitives and reduces the buffering size
required by SST.
`The invention uses a double-z scheme that decouples the
`scan conversion/depth-buffer processing from the more gen-
`eral rasterization and shading processing. The core of
`double-z is the scan/z engine, which externally looks like a
`fragment generator but internally resolves visibility. It
`allows the rest of the rendering pipeline to compute param-
`eters for only visible primitives and shade only visible
`fragments. Consequently, the raster/shading rate is
`decoupled from the scan/z rate. The invention also allows
`both opaque and transparent geometries to work seamlessly
`under this framework.
The raster/shading engine is alternatively modified to take
advantage of the reduced raster/shading requirements.
Another option in the invention is to replace the dedicated
processing units for each surface parameter with generic
parameter pipelines that are shared by all parameters.
Referring to FIG. 4, the basic data flow of the invention
is shown. The geometries in model space 401 are
transformed into screen space and the screen space tiler 412 bins
a frame's worth of geometries into screen tiles. The visibility
of all geometries is determined up front using only screen x,
y, z coordinates 402 in the scan/z engine 403 for each tile.
Visibility information 404 is sent out for rasterization 405
and shading 406. The visibility information 404 is
combined with the tile geometries 411 for each tile so that only
visible geometries are set up for rasterization. Only visible
fragments are fully rasterized and shaded in the raster
405/shading 406 engine. The resulting fragments are sent to
the blending engine 407. The blending engine 407
alpha-blends incoming fragments. The blending engine 407
resolves and outputs pixel colors into the frame buffer at the
end-of-tile. The tasks of the screen space tiler 412, scan/z
403, raster 405/shading 406, and blending 407 engines
operate in parallel for the load-balancing of the various
processes. This does introduce one frame of latency. If the
extra latency is objectionable, then the scan/z 403, raster
405/shading 406, and blending 407 engines operate in
parallel with the screen space tiler 412 operating serially
before them.
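
The division of work between the scan/z engine and the raster/shading engine can be sketched in software. This is not the patented hardware design: it is one plausible reading of "double-z" for opaque geometry only, and it assumes that each primitive has already been scan converted into (x, y, z) fragments. The function name `resolve_visibility` and the data layout are hypothetical.

```python
def resolve_visibility(tile_prims):
    """Return the visible (prim_id, x, y) fragments of one tile.

    tile_prims is a list of (prim_id, fragments) pairs, where fragments
    is a list of (x, y, z) tuples and a smaller z is nearer the viewer.
    Only x, y, z are consulted; no colors or textures are needed here.
    """
    # First z pass: find the nearest depth at every covered pixel.
    depth = {}
    for _prim_id, frags in tile_prims:
        for x, y, z in frags:
            if z < depth.get((x, y), float("inf")):
                depth[(x, y)] = z

    # Second z pass: emit only the fragment that won each pixel.
    visible = []
    for prim_id, frags in tile_prims:
        for x, y, z in frags:
            if depth.get((x, y)) == z:
                visible.append((prim_id, x, y))
                del depth[(x, y)]  # first submitted primitive wins ties
    return visible


prims = [
    ("far",    [(0, 0, 0.9), (1, 0, 0.9)]),
    ("near",   [(0, 0, 0.2)]),
    ("hidden", [(0, 0, 0.95)]),
]
fragments = resolve_visibility(prims)
visible_prims = {prim_id for prim_id, _x, _y in fragments}
print(fragments)      # fragments that would be sent on for shading
print(visible_prims)  # primitives that would need parameter setup
```

In this toy tile, "hidden" never reaches parameter setup and the occluded fragment of "far" is never shaded, which mirrors the claim that only visible primitives are set up and only visible fragments are shaded.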
`
`Screen Space Tiling
`2.1 Overview
Screen space tiling (SST) partitions a screen into disjoint
(rectangular) regions (called tiles). It bins all geometries in
screen space into the tiles that the geometries intersect.
Primitives crossing multiple tiles will be binned in all relevant
tiles.
Referring to FIG. 3, for example, a screen 301 is
partitioned into 16 square, disjoint tiles, numbered 1 302, 2 303,
3 304, up to 16 312. Four triangles a 313, b 314, c 315, and
d 316 are binned as follows:
`tile 5 306: a 313
`tile 6 307: a 313, b 314, c 315
`tile 7 308: c 315, d 316
`tile 9 309: a 313
`tile 10 310: a 313, b 314, c 315, d 316
`tile 11 311: c 315, d 316
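
The binning step can be sketched as follows. The patent gives no coordinates for FIG. 3, so the screen size, the triangle vertices, and the row-major tile numbering starting at 1 are assumptions made for illustration. The sketch also uses the simplest conservative test, a bounding box, which may bin a triangle into a tile it does not actually intersect.

```python
def bin_triangles(triangles, screen_w, screen_h, tiles_x, tiles_y):
    """Map tile number -> names of triangles whose bounding box overlaps it.

    triangles maps a name to three (x, y) screen-space vertices.
    Tiles are numbered row-major starting at 1 (an assumption).
    """
    tile_w = screen_w / tiles_x
    tile_h = screen_h / tiles_y

    def tile_index(v, size, count):
        # Clamp so vertices on the far screen edge stay in the last tile.
        return max(0, min(int(v // size), count - 1))

    bins = {}
    for name, verts in triangles.items():
        xs = [x for x, _ in verts]
        ys = [y for _, y in verts]
        tx0, tx1 = tile_index(min(xs), tile_w, tiles_x), tile_index(max(xs), tile_w, tiles_x)
        ty0, ty1 = tile_index(min(ys), tile_h, tiles_y), tile_index(max(ys), tile_h, tiles_y)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins.setdefault(ty * tiles_x + tx + 1, []).append(name)
    return bins


# Hypothetical 400x400 screen split into 4x4 tiles of 100x100 pixels.
example = {
    "a": [(150, 150), (180, 150), (150, 180)],  # inside one tile
    "b": [(150, 120), (250, 120), (150, 180)],  # straddles two tiles
}
bins = bin_triangles(example, 400, 400, 4, 4)
print(bins)
```

As in the list above, a triangle that crosses a tile border appears in the bin of every tile it may touch.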
This binning process completes when a frame boundary is
reached. Then the binned geometries are handed to the
rendering pipeline for rendering. Meanwhile, geometry
transformation and binning for the next frame get started.
Ideally, geometry transformation and binning of the next
frame are performed in parallel with the rendering of the
current frame.
`One potential drawback with SST is the extra frame of
`latency it introduces; however, this is generally tolerable.
A tile's worth of geometries is rendered without external
frame buffer access by maintaining a tile frame buffer
on-chip. The final pixel colors are output to the external
color buffer in the memory only after geometries for the
whole tile have been processed. Therefore, the memory
bandwidth bottleneck in a traditional rendering pipeline
caused by frame buffer access is eliminated.
Because SST requires binning one frame of geometries,
and because of the geometry size, the binned geometries have
to be stored in external memory. Both writing binned geometries
into memory during binning and reading binned geometries
from memory during tile rendering consume memory
bandwidth. The memory bandwidth requirement for both reading
and writing is examined next. Assume that 1 Mtriangles are
represented in strip form. The average triangle size is 50
pixels, and the average vertex size is 20-25 bytes with screen x,
y, z coordinates, 2 sets of colors, 1 set of 2D texture
coordinates, and 1/w in packed form of adequate precision.
The average triangle strip size within a tile is about 8
vertices, which gives 1.33 vertex/tri. In addition, up to 50%
of the triangles need to be duplicated across multiple tiles.
Therefore, the memory bandwidth required for SST is
roughly:

2 (write/read) * 1 Mtris * 1.5 * 1.33 vtx/tri * 20-25 bytes/vtx = 80-100 Mbytes
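
The estimate can be reproduced with a short sketch. The function and parameter names are illustrative assumptions. The vertex-per-triangle ratio is derived from the strip length: a strip of n vertices yields n - 2 triangles, so 8 vertices give 8/6, or about 1.33.

```python
def sst_geometry_traffic(triangles, bytes_per_vertex, strip_vertices=8,
                         duplication=1.5, passes=2):
    """Bytes moved to write and then read back the binned geometry.

    passes=2 covers one write during binning and one read during tile
    rendering; duplication=1.5 models up to 50% of the triangles being
    binned into more than one tile.
    """
    vertices_per_triangle = strip_vertices / (strip_vertices - 2)
    return passes * triangles * duplication * vertices_per_triangle * bytes_per_vertex


low = sst_geometry_traffic(1e6, 20)
high = sst_geometry_traffic(1e6, 25)
print(f"about {low / 1e6:.0f} to {high / 1e6:.0f} Mbytes per 1 Mtriangles")
```

Because the result is proportional to the triangle count, multiplying the input by a triangle rate gives the per-second bandwidth directly.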
`
The memory bandwidth number for SST stays the same
whether full-scene anti-aliasing is implemented or not, or
geometries are only z-buffered, or complex alpha-blending
is performed, because the tile frame buffer is on-chip. This
bandwidth scales linearly with polygon performance. For
example, a performance of 5 Mtris/s and 10 Mtris/s requires
400-500 Mbytes/s and 800-1000 Mbytes/s memory
bandwidth respectively. The bandwidth goes down when the
average vertex size decreases. In addition, the bandwidth
number goes down as the average triangle size becomes
smaller, because a tile can now contain longer strips, and the
likelihood of triangle duplication in multiple tiles due to tile
border crossing is reduced. The asymptotic rate approaches
40-50 Mbytes per 1M triangles as the average triangle size
`
`what tile(s) to output, and what the new state is. The state
`information contains two parts:
`a) what tile(s) was the previous triangle in the binning
`strip/fan output to?
`b) where is the new triangle?
`For b), there are three main state values and the corre-
`sponding actions:
`1) when all three vertices are in the same tile, then output
`the triangle to that tile only.
2) when all three vertices are in two horizontally or
vertically adjacent tiles, then output the triangle to both
tiles.
`3) otherwise, bin the triangle to all tiles that intersect with
`the bounding box of the triangle.
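
The three cases for locating a new triangle can be sketched as a small classifier. This is a minimal illustration that covers only part b) of the state, not the tracking of the previous triangle in the strip or fan. It assumes square tiles of a given size, non-negative screen coordinates, and tiles identified by (column, row) pairs; case 2 is read as two tiles that share an edge. All names are hypothetical.

```python
def tiles_for_triangle(verts, tile_size):
    """Return the (column, row) tiles a triangle is output to."""
    vertex_tiles = {(int(x // tile_size), int(y // tile_size)) for x, y in verts}

    # Case 1: all three vertices fall in the same tile.
    if len(vertex_tiles) == 1:
        return sorted(vertex_tiles)

    # Case 2: the vertices fall in exactly two tiles that share an edge.
    if len(vertex_tiles) == 2:
        (ax, ay), (bx, by) = sorted(vertex_tiles)
        if abs(ax - bx) + abs(ay - by) == 1:
            return sorted(vertex_tiles)

    # Case 3: every tile overlapped by the triangle's bounding box.
    cols = [c for c, _ in vertex_tiles]
    rows = [r for _, r in vertex_tiles]
    return [(c, r)
            for r in range(min(rows), max(rows) + 1)
            for c in range(min(cols), max(cols) + 1)]


print(tiles_for_triangle([(10, 10), (20, 10), (10, 20)], 100))    # case 1
print(tiles_for_triangle([(90, 10), (110, 10), (95, 20)], 100))   # case 2
print(tiles_for_triangle([(90, 90), (110, 90), (90, 110)], 100))  # case 3
```

Case 2 is exact because two edge-adjacent tiles form a rectangle, and a triangle whose vertices all lie in that rectangle cannot leave it. Case 3 can include tiles that the triangle only grazes or misses.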
`Clearly, the handling of case 3) may be too conservative
`by binning nonintersecting triangles because of the co
