`Case 3:14-cv-00757-REP-DJN Document 87-1 Filed 04/16/15 Page 1 of 17 Page|D# 14240
`
`
`
`
`
`
`
`
`
`
`
`
`
EXHIBIT A
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
Case 3:14-cv-00757-REP-DJN Document 87-1 Filed 04/16/15 Page 2 of 17 PageID# 14241

US007339590B1

(12) United States Patent
Moskal et al.

(10) Patent No.: US 7,339,590 B1
(45) Date of Patent: Mar. 4, 2008

(54) VERTEX PROCESSING UNIT SUPPORTING VERTEX TEXTURE MAPPING

(75) Inventors: Jeffrey B. Moskal, Austin, TX (US); [first name illegible in scan] Tananbaum, Austin, TX (US); Jakob Nebeker, [city illegible in scan], CA (US)

(73) Assignee: NVIDIA Corporation, Santa Clara, CA (US)

( * ) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 536 days.

(21) Appl. No.: 10/934,119

(22) Filed: Sep. 2, 2004

(51) Int. Cl.
G06F 15/00 (2006.01)
G06T 1/00 (2006.01)
G06T 11/40 (2006.01)
G09G 5/36 (2006.01)

(52) U.S. Cl. ......................... 345/501; 345/552; 345/557

(58) Field of Classification Search ................ 345/501, 345/552, 557
See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS

5,856,829 A *  1/1999 Gray et al. ................. 345/422
6,717,577 B1*  4/2004 Cheng et al. ................ 345/419
6,847,369 B2*  1/2005 Lavelle et al. .............. 345/558
6,897,871 B1*  5/2005 Morein et al. ............... 345/501
6,900,800 B2*  5/2005 Baldwin .................... 345/419
[two entries illegible in scan]
7,027,062 B2*  4/2006 Lindholm et al. ............ 345/552
7,109,987 B2*  9/2006 Goel et al. ................. 345/423

* cited by examiner

Primary Examiner: Jin-Cheng Wang
(74) Attorney, Agent, or Firm: Townsend and Townsend and Crew, LLP

(57) ABSTRACT

A graphics processing subsystem includes a vertex processing unit that allows vertex shader programs to arbitrarily access data stored in vertex texture maps. The vertex processing unit includes a vertex texture fetch unit and vertex processing engines. The vertex processing engines operate in parallel to execute vertex shader programs that specify operations to be performed on vertices. In response to a vertex texture load instruction, a vertex processing engine dispatches a vertex texture request to the vertex texture fetch unit. The vertex texture fetch unit retrieves the corresponding vertex texture map data. While the vertex texture fetch unit is processing a vertex texture request, the requesting vertex processing engine is adapted to evaluate whether instructions that follow the vertex texture load instruction are dependent on the vertex texture map data, and if the instructions are not dependent on the vertex texture map data, to execute the additional instructions.

13 Claims, 6 Drawing Sheets

[Front-page drawing: FIG. 4, a block diagram of vertex texture fetch unit 400, showing input FIFOs 405, sample unit 410, address translation unit 415, vertex texture cache 420, general cache 425, memory interface 430, external memory 435, and data format and dispatch unit 440.]
`
Case 3:14-cv-00757-REP-DJN Document 87-1 Filed 04/16/15 Page 3 of 17 PageID# 14242

U.S. Patent          Mar. 4, 2008          Sheet 1 of 6          US 7,339,590 B1

[FIG. 1: block diagram of computer system 100: CPU 105, memory 110, storage 115, user input 120, network interface 125, and data bus 160; graphics subsystem with interface, GPU 135, additional memory 145, additional GPUs 155, and graphics coprocessor 165.]
`
`
Case 3:14-cv-00757-REP-DJN Document 87-1 Filed 04/16/15 Page 4 of 17 PageID# 14243

U.S. Patent          Mar. 4, 2008          Sheet 2 of 6          US 7,339,590 B1

[FIG. 2: rendering pipeline 200: vertex processing unit 205, setup unit 215, rasterizer unit 220, color assembly unit 225, fragment processing unit 230, and raster operations unit 235.]
`
`
`
Case 3:14-cv-00757-REP-DJN Document 87-1 Filed 04/16/15 Page 5 of 17 PageID# 14244

U.S. Patent          Mar. 4, 2008          Sheet 3 of 6          US 7,339,590 B1

[FIG. 3: portion 300 of the vertex processing unit: vertex processing engines (VPE) 303, 305, 307, 309, 311, and 313, each connected to vertex texture fetch unit 320.]
`
`
`
Case 3:14-cv-00757-REP-DJN Document 87-1 Filed 04/16/15 Page 6 of 17 PageID# 14245

U.S. Patent          Mar. 4, 2008          Sheet 4 of 6          US 7,339,590 B1

[FIG. 4: vertex texture fetch unit 400: input FIFOs 405, sample unit 410, address translation unit 415, vertex texture cache 420, general cache 425, memory interface 430, external memory 435, and data format and dispatch unit 440.]
`
`
`
Case 3:14-cv-00757-REP-DJN Document 87-1 Filed 04/16/15 Page 7 of 17 PageID# 14246

U.S. Patent          Mar. 4, 2008          Sheet 5 of 6          US 7,339,590 B1

[FIG. 5A: a sample unit with a 1-D texture offset path 505 and a 2-D texture offset path 510. FIG. 5B: a sample unit with a combined 1-D/2-D texture offset path 555.]
`
`
`
Case 3:14-cv-00757-REP-DJN Document 87-1 Filed 04/16/15 Page 8 of 17 PageID# 14247

U.S. Patent          Mar. 4, 2008          Sheet 6 of 6          US 7,339,590 B1

[FIG. 6: portion 600 of a vertex processing engine and its interface to the vertex texture fetch unit; the drawing text is rotated in the scan and largely illegible, though labels including VERTEX TEXTURE FETCH UNIT, VERTEX, TEXTURE, and DATA are recoverable.]
`
Case 3:14-cv-00757-REP-DJN Document 87-1 Filed 04/16/15 Page 9 of 17 PageID# 14248
`VERTEX PROCESSING UNIT SUPPORTING
`VERTEX TEXTURE MAPPING
`
`CROSS-REFERENCES TO RELATED
`APPLICATIONS
`
This application is being filed concurrently with U.S.
application Ser. No. 10/934,120, entitled “VERTEX TEX-
TURE CACHE RETURNING HITS OUT OF ORDER,”
filed Sep. 2, 2004 by Jakob Nebeker and Jeffrey B. Moskal,
now U.S. Pat. No. 6,972,769, the disclosure of which is
incorporated herein by reference.
`
`BACKGROUND OF THE INVENTION
`
`The present invention relates to the field of computer
`graphics. Many computer graphic images are created by
`mathematically modeling the interaction of light with a three
`dimensional scene from a given viewpoint. This process,
`called rendering, generates a two-dimensional image of the
`scene from the given viewpoint, and is analogous to taking
`a photograph of a real-world scene.
`As the demand for computer graphics, and in particular
`for real-time computer graphics, has increased, computer
`systems with graphics processing subsystems adapted to
`accelerate the rendering process have become widespread.
`In these computer systems, the rendering process is divided
`between a computer’s general purpose central processing
`unit (CPU) and the graphics processing subsystem. Typi-
`cally,
`the CPU performs high level operations, such as
`determining the position, motion, and collision of objects in
`a given scene. From these high level operations, the CPU
`generates a set of rendering commands and data defining the
`desired rendered image or images. For example, rendering
`commands and data can define scene geometry, lighting,
`shading, texturing, motion, and/or camera parameters for a
`scene. The graphics processing subsystem creates one or
`more rendered images from the set of rendering commands
`and data.
`
`Scene geometry is typically represented by geometric
`primitives, such as points,
`lines, polygons (for example,
`triangles and quadrilaterals), and curved surfaces, defined by
`one or more two- or three-dimensional vertices. Each vertex
`
`may have additional scalar or vector attributes used to
`determine qualities such as the color, transparency, lighting,
`shading, and animation of the vertex and its associated
`geometric primitives.
`Many graphics processing subsystems are highly pro-
`grammable, enabling implementation of, among other
`things, complicated lighting and shading algorithms. In
`order to exploit
`this programmability, applications can
`include one or more graphics processing subsystem pro-
`grams, which are executed by the graphics processing sub-
`system in parallel with a main program executed by the
`CPU. Although not confined to merely implementing shad-
`ing and lighting algorithms, these graphics processing sub-
`system programs are often referred to as shading programs
`or shaders.
`
`One portion of a typical graphics processing subsystem is
`a vertex processing unit. To enable a variety of per-vertex
`algorithms, for example for visual effects, the vertex pro-
`cessing unit is highly programmable. The vertex processing
`unit executes one or more vertex shader programs in parallel
`with the main CPU. While executing, each vertex shader
`program successively processes vertices and their associated
`attributes to implement the desired algorithms. Additionally,
`vertex shader programs can be used to transform vertices to
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`2
`
`a coordinate space suitable for rendering, for example a
`screen space coordinate system. Vertex shader programs can
`implement algorithms using a wide range of mathematical
and logical operations on vertices and data, and can include
conditional and branching execution paths.
`Unfortunately, vertex shader programs typically cannot
`arbitrarily access data stored in memory. This prevents
vertex shader programs from using data structures such as
`arrays. Using scalar or vector data stored in arrays enables
`vertex shader programs to perform a variety of additional
`per-vertex algorithms, including but not limited to advanced
`lighting effects, geometry effects such as displacement map-
`ping, and complex particle motion simulations. Arrays of
`data could also be used to implement per-vertex algorithms
that are impossible, impractical, or inefficient to implement
`otherwise.
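The displacement-mapping use case described above can be sketched in a few lines. The code below is a hypothetical illustration, not the patent's implementation: a "vertex texture map" is modeled as a plain 2-D array of heights, and each vertex's texture coordinates index it arbitrarily (names such as `displace` and `heights` are invented for this sketch).

```python
def displace(vertices, heights, scale=1.0):
    """Offset each vertex along z by a height sampled from a 2-D array.

    Each vertex is ((x, y, z), (u, v)): a position plus integer texture
    coordinates into the `heights` vertex texture map.
    """
    out = []
    for (x, y, z), (u, v) in vertices:
        h = heights[v][u]              # arbitrary per-vertex array access
        out.append((x, y, z + scale * h))
    return out

heights = [[0, 1], [2, 3]]             # a 2x2 vertex texture map
verts = [((0.0, 0.0, 0.0), (0, 0)), ((1.0, 0.0, 0.0), (1, 1))]
print(displace(verts, heights))        # z offset by the sampled height
```

This is exactly the kind of per-vertex array access that, per the passage above, conventional vertex shader programs could not perform.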
`
`One barrier to allowing vertex shader programs to arbi-
`trarily access data in memory is that arbitrary memory
`accesses typically have large latencies, especially when
`accessing external memory. When the vertex processing unit
`must stop vertex shader program execution until data is
`returned from memory, performance is severely decreased.
Caches alone do little to reduce the occurrence of these
pipeline stalls, as the arrays used by some per-vertex
algorithms are too large to be cached entirely.
`It is therefore desirable for a vertex processing unit of a
`graphics processing subsystem to enable vertex shader pro-
`grams to arbitrarily access array data. It is further desirable
`that the vertex processing unit efficiently access array data
`while minimizing the occurrence and impact of pipeline
`stalls due to memory latency.
`
`BRIEF SUMMARY OF THE INVENTION
`
An embodiment of the invention includes a graphics
processing subsystem having a vertex processing unit that
`allows vertex shader programs to arbitrarily access data
`stored in vertex texture maps. The vertex processing unit
`includes a vertex texture fetch unit and vertex processing
`engines. The vertex processing engines operate in parallel to
`execute vertex shader programs that specify operations to be
`performed on vertices. In response to a vertex texture load
`instruction, a vertex processing engine dispatches a vertex
`texture request to the vertex texture fetch unit. The vertex
`texture fetch unit retrieves the corresponding vertex texture
`map data. While the vertex texture fetch unit is processing
`a vertex texture request, the requesting vertex processing
`engine is adapted to evaluate whether instructions that
`follow the vertex texture load instruction are dependent on
`the vertex texture map data, and if the instructions are not
`dependent on the vertex texture map data, to execute the
`additional instructions.
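The latency-hiding scheme summarized above can be illustrated with a small dependency check. This is a sketch under assumptions, not the patented hardware: instructions are modeled as (opcode, destination, sources) tuples, and after a texture load is dispatched, following instructions run only if they neither read nor write a register whose value is still in flight.

```python
def schedule(instrs, load_dest):
    """Partition instructions that follow a texture load into those that can
    run while the fetch is pending and those that must wait for the data."""
    runnable, stalled = [], []
    blocked = {load_dest}              # registers whose value is in flight
    for op, dest, srcs in instrs:
        if blocked & set(srcs) or dest in blocked:
            blocked.add(dest)          # its result derives from pending data
            stalled.append((op, dest, srcs))
        else:
            runnable.append((op, dest, srcs))
    return runnable, stalled

prog = [("MUL", "r2", ["r0", "r1"]),   # independent of r4: may proceed
        ("ADD", "r5", ["r4", "r2"]),   # reads r4: must wait for the fetch
        ("MOV", "r6", ["r5"])]         # depends on r5, hence on r4
run, wait = schedule(prog, "r4")       # texture load destination is r4
```

Here `MUL` executes while the fetch for `r4` is outstanding; `ADD` and `MOV` are held back, mirroring the behavior the summary attributes to the requesting vertex processing engine.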
`
`In an embodiment, a graphics processing subsystem com-
`prises a vertex texture fetch unit and a plurality of vertex
`processing engines. Each vertex processing engine is
`adapted to execute a vertex shader program specifying a
`series of operations to be performed on a vertex. In response
`to a vertex texture load instruction of the vertex shader
`
`program, the vertex processing engine is adapted to dispatch
`a vertex texture request to the vertex texture fetch unit. The
`vertex texture fetch unit is adapted to receive a plurality of
`vertex texture requests from the vertex processing engines.
`Each vertex texture request includes at least one texture
`coordinate specifying a location of vertex texture map data
`within a vertex texture map. In response to the vertex texture
`request from one of the plurality of vertex processing
`engines, the vertex texture fetch unit is adapted to retrieve
`
`
`
Case 3:14-cv-00757-REP-DJN Document 87-1 Filed 04/16/15 Page 10 of 17 PageID# 14249
`the vertex texture map data from the location within the
`vertex texture map specified by the at least one index value
`of the vertex texture request and to return the vertex texture
`map data to the requesting vertex processing engine.
`In another embodiment, the vertex texture fetch unit is
`adapted to associate a vertex processing engine ID with the
`vertex texture request in response to receiving a vertex
`texture request from one of the plurality of vertex processing
`engines. The vertex processing engine ID specifies the
`vertex processing engine dispatching the vertex texture
`request. The vertex texture fetch unit is adapted to return the
`vertex texture map data to the vertex processing engine
`specified by the vertex processing engine ID.
`In a further embodiment, each vertex processing engine
`includes a vertex shader program instruction queue adapted
`to store a set of vertex shader instructions included in the
`
vertex shader program. The set of vertex shader instructions
includes a vertex texture load instruction. The vertex process-
`ing engine also has a register file including a set of data
`registers adapted to store data used while executing the
`vertex shader program. Each data register includes a control
`portion adapted to restrict access to the data register while a
`vertex texture request is pending. In response to the vertex
`texture load instruction requesting a vertex texture map data
and specifying one of the set of data registers as a destination
`data register for storing the vertex texture map data, the
`vertex processing engine is adapted to dispatch a vertex
`texture request to a vertex texture fetch unit. In an additional
`embodiment,
`the vertex processing engine is adapted to
`fetch an additional instruction that follows the vertex texture
`
`load instruction in the vertex shader program instruction
queue, to evaluate whether the additional instruction is
`dependent on the vertex texture map data, and in response to
`a determination that the additional instruction is not depen-
`dent on the vertex texture map data, to execute the additional
`instruction.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`FIG. 1 is a block diagram of an example computer system
`suitable for implementing an embodiment of the invention;
`FIG. 2 illustrates a block diagram of a rendering pipeline
`of a graphics processing subsystem according to an embodi-
`ment of the invention;
`FIG. 3 illustrates a portion of the vertex processing unit
`according to an embodiment of the invention;
`FIG. 4 illustrates a vertex texture fetch unit according to
`an embodiment of the invention;
`FIGS. 5A and 5B illustrate sample units according to an
`embodiment of the invention; and
`FIG. 6 illustrates a portion of a vertex processing engine
`according to an embodiment of the invention.
`
`DETAILED DESCRIPTION OF THE
`INVENTION
`
`FIG. 1 is a block diagram of a computer system 100, such
`as a personal computer, video game console, personal digital
`assistant, or other digital device, suitable for practicing an
`embodiment of the invention. Computer
`system 100
`includes a central processing unit (CPU) 105 for running
`software applications and optionally an operating system. In
`an embodiment, CPU 105 is actually several separate central
`processing units operating in parallel. Memory 110 stores
`applications and data for use by the CPU 105. Storage 115
`provides non-volatile storage for applications and data and
`may include fixed disk drives, removable disk drives, flash
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`4
`
`memory devices, and CD-ROM, DVD-ROM, or other opti-
`cal storage devices. User input devices 120 communicate
`user inputs from one or more users to the computer system
`100 and may include keyboards, mice, joysticks,
`touch
`screens, and/or microphones. Network interface 125 allows
`computer system 100 to communicate with other computer
`systems via an electronic communications network, and may
`include wired or wireless communication over local area
`networks and wide area networks such as the Internet. The
`
`components of computer system 100, including CPU 105,
`memory 110, data storage 115, user input devices 120, and
`network interface 125, are connected via one or more data
buses 160. Examples of data buses include ISA, PCI, AGP,
PCI-Express, and HyperTransport data buses.
`A graphics subsystem 130 is further connected with data
`bus 160 and the components of the computer system 100.
`The graphics subsystem may be integrated with the com-
`puter system motherboard or on a separate circuit board
`fixedly or removably connected with the computer system.
`The graphics subsystem 130 includes a graphics processing
`unit (GPU) 135 and graphics memory. Graphics memory
`includes a display memory 140 (e.g., a frame buffer) used
`for storing pixel data for each pixel of an output image. Pixel
`data can be provided to display memory 140 directly from
`the CPU 105. Alternatively, CPU 105 provides the GPU 135
`with data and/or commands defining the desired output
`images, from which the GPU 135 generates the pixel data of
`one or more output images. The data and/or commands
`defining the desired output images is stored in additional
`memory 145. In an embodiment, the GPU 135 generates
`pixel data for output images from rendering commands and
`data defining the geometry,
`lighting, shading,
`texturing,
motion, and/or camera parameters for a scene.
In another embodiment, display memory 140 and/or addi-
tional memory 145 are part of memory 110 and are shared
with the CPU 105. Alternatively, display memory 140 and/or
`additional memory 145 is one or more separate memories
`provided for the exclusive use of the graphics subsystem
`130. The graphics subsystem 130 periodically outputs pixel
data for an image from display memory 140 to be displayed
`on display device 150. Display device 150 is any device
`capable of displaying visual information in response to a
`signal from the computer system 100, including CRT, LCD,
`plasma, and OLED displays. Computer system 100 can
`provide the display device 150 with an analog or digital
`signal.
`In a further embodiment, graphics processing subsystem
`130 includes one or more additional GPUs 155, similar to
`GPU 135. In an even further embodiment, graphics process-
`ing subsystem 130 includes a graphics coprocessor 165.
`Graphics processing coprocessor 165 and additional GPUs
`155 are adapted to operate in parallel with GPU 135, or in
`place of GPU 135. Additional GPUs 155 generate pixel data
`for output images from rendering commands, similar to
`GPU 135. Additional GPUs 155 can operate in conjunction
`with GPU 135 to simultaneously generate pixel data for
`different portions of an output image, or to simultaneously
`generate pixel data for different output
`images.
`In an
`embodiment, graphics coprocessor 165 performs rendering
`related tasks such as geometry transformation, shader com-
`putations, and backface culling operations for GPU 135
`and/or additional GPUs 155.
`Additional GPUs 155 can be located on the same circuit
`
board as GPU 135, sharing a connection with GPU 135
`to data bus 160, or can be located on additional circuit
`boards separately connected with data bus 160. Additional
`GPUs 155 can also be integrated into the same module or
`
`
`
Case 3:14-cv-00757-REP-DJN Document 87-1 Filed 04/16/15 Page 11 of 17 PageID# 14250
`chip package as GPU 135. Additional GPUs 155 can have
`their own display and additional memory, similar to display
`memory 140 and additional memory 145, or can share
`memories 140 and 145 with GPU 135. In an embodiment,
`the graphics coprocessor 165 is integrated with the computer
`system chipset (not shown), such as with the Northbridge or
`Southbridge chip used to control the data bus 160.
`FIG. 2 illustrates a block diagram of a rendering pipeline
`200 of a graphics processing subsystem according to an
`embodiment of the invention. Pipeline 200 may be imple-
mented in GPU 135 and/or the additional GPUs 155 described above. Pipeline 200
`includes a vertex processing unit 205, a viewport and culling
`(VPC) unit 210, a setup unit 215, a rasterizer unit 220, a
`color assembly block 225, and a fragment processing unit
`230.
`
`Vertex processing unit 205, which may be of generally
`conventional design, receives a geometric representation of
`a three-dimensional scene to be rendered. In one embodi-
`
ment, the scene data includes definitions for objects (e.g.,
`a table, a mountain, a person, or a tree) that may be present
in the scene. Objects are typically represented as one or more
`geometric primitives, such as points,
`lines, polygons (for
`example, triangles and quadrilaterals), and curved surfaces.
`Geometric primitives are typically defined by one or more
`vertices, each having a position that is typically expressed in
`a two- or three-dimensional coordinate system. In addition
`to a position, each vertex also has various attributes asso-
`ciated with it. In general, attributes of a vertex may include
`any property that is specified on a per-vertex basis. In an
`embodiment, the vertex attributes include scalar or vector
`attributes used to determine qualities such as the color,
`transparency, lighting, shading, and animation of the vertex
`and its associated geometric primitives.
`It is typically more convenient to express and manipulate
`portions of the three-dimensional scene in different coordi-
`nate systems. For example, each object may have one or
`more local coordinate systems. Because objects may have
`their own coordinate systems, additional data or commands
`are advantageously provided to position the objects relative
`to each other within a scene, for example with reference to
a global coordinate system. Additionally, rendering may be
`performed by transforming all or portions of a scene from a
`global coordinate system to a viewport or screen-space
`coordinate system. In an embodiment, the vertex processing
`unit 205 transforms vertices from their base position in a
`local coordinate system through one or more frames of
`reference to a destination coordinate system, such as a global
`or screen space coordinate system. In a further embodiment,
`vertices are specified using a homogeneous coordinate sys-
`tem to facilitate coordinate transformations by the vertex
`processing unit 205.
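The transformation through homogeneous coordinates described above amounts to a 4x4 matrix multiply followed by a perspective divide. A minimal sketch (generic linear algebra, not the unit 205 implementation; the function name is invented):

```python
def transform(m, v):
    """Apply a 4x4 homogeneous transform m (row-major) to a vertex (x, y, z)."""
    x, y, z = v
    col = (x, y, z, 1.0)               # promote to homogeneous coordinates
    tx, ty, tz, tw = (sum(m[r][c] * col[c] for c in range(4)) for r in range(4))
    return (tx / tw, ty / tw, tz / tw)  # perspective divide back to 3-D

# A translation by (5, 0, 0), expressible as a single matrix only because
# of the homogeneous fourth coordinate:
translate = [[1, 0, 0, 5],
             [0, 1, 0, 0],
             [0, 0, 1, 0],
             [0, 0, 0, 1]]
transform(translate, (1.0, 2.0, 3.0))   # -> (6.0, 2.0, 3.0)
```

Chaining such matrices is what lets the vertex processing unit carry a vertex through several frames of reference in one pass.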
`Additionally, as discussed in more detail below, the vertex
`processing unit is highly programmable and can execute
`vertex shader programs specified by the rendering applica-
`tion. While executing, each vertex shader program succes-
`sively processes vertices and their associated attributes to
`implement a variety of visual effects. Numerous examples of
`such “per-vertex” operations are known in the art and a
detailed description is omitted as not being critical to
`understanding the present invention. Vertex shader programs
`can implement algorithms using a wide range of mathemati-
`cal and logical operations on vertices and data, and can
include conditional and branching execution paths.
`In addition to performing mathematical and logical opera-
`tions on vertices, vertex shader programs can arbitrarily
`access memory to retrieve additional scalar or vector data
`stored in an array. Vertex shader programs can use the array
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`6
`data to enable vertex shader programs to perform a variety
`of additional per-vertex algorithms, including but not lim-
`ited to advanced lighting effects, geometry effects such as
`displacement mapping, and complex particle motion simu-
`lations. Vertex shader programs can also use arrays of data
`to implement per-vertex algorithms that are impossible,
`unpractical, or inefficient to implement otherwise.
In the simplest implementation, the data elements of
`a one- or two-dimensional array are associated with vertices
`based upon the vertices’ positions. For this reason, an array
`of data associated with one or more vertices is typically
`referred to as a vertex texture map. However, the term vertex
`texture map also includes arrays of scalar or vector data of
`any size and number of dimensions and associated with
`vertices in any arbitrary way. Unlike texture maps associated
`with pixels, vertex texture maps do not specify the color,
`transparency, or other attributes of individual pixels; rather,
`vertex texture maps specify the attributes of vertices.
`Additionally, multiple vertex texture maps can be asso-
`ciated with a set of vertices, for example, with each vertex
`texture map supplying different parameters to a vertex
`shader program. The vertex shader program can use data
`from multiple vertex texture maps separately, can combine
`data from multiple vertex texture maps, and can use data
`from one vertex texture map to specify the location of data
`in another vertex texture map.
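The last capability mentioned above is a dependent (indirect) lookup: data fetched from one vertex texture map supplies the coordinate into a second map. A hypothetical sketch, with maps modeled as flat Python lists and invented names:

```python
def fetch(tex, u):
    """Fetch one element from a 1-D vertex texture map."""
    return tex[u]

index_map = [2, 0, 1]            # first map: per-vertex indices
value_map = [10.0, 20.0, 30.0]   # second map: the actual parameters

def indirect_fetch(vertex_id):
    u = fetch(index_map, vertex_id)   # first lookup...
    return fetch(value_map, u)        # ...addresses the second map

indirect_fetch(0)   # index_map[0] == 2, so value_map[2] == 30.0
```

Separate, combined, and chained uses of multiple maps all reduce to compositions of fetches like this.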
`The viewport and culling unit 210 culls or discards
`geometric primitives and/or portions thereof that are outside
`the field of view or otherwise unseen in the rendered image.
`By discarding geometric primitives that are not seen in the
`rendered image, culling decreases the number of geometric
`primitives to be processed by downstream processing stages
`of the rendering pipeline 200 and thus increases rendering
`speed.
`Setup unit 215 assembles one or more vertices into a
`geometric primitive, such as a triangle or quadrilateral. The
`rasterization stage 220 then converts each geometric primi-
`tive into one or more pixel fragments. A pixel fragment
`defines a set of one or more pixels to be potentially displayed
`in the rendered image. Each pixel fragment includes infor-
`mation defining the appearance of its pixels, for example
`screen position, texture coordinates, color values, and nor-
`mal vectors.
`
`Color assembly block 225 associates the pixel fragments
`received from rasterizer 220 with the per-vertex attributes,
`such as vertex colors, depth values, vertex normal vectors,
`and texture coordinates, received from VPC block 210 and
`generates additional attributes for interpolating per-vertex
`attribute values at any point within the pixel fragments. The
`pixel fragments and associated attributes are provided to
`fragment processor 230.
`Fragment processor 230 uses the information associated
`with each pixel fragment to determine the output color value
`of each pixel to be potentially displayed. Like the vertex
processor 205, the fragment processing unit is program-
mable. A pixel fragment program, also referred to as a pixel
`shader, is executed on each pixel fragment to determine an
`output color value for a pixel. Although the pixel fragment
`program operates independently of the vertex shader pro-
`gram, the pixel fragment program may be dependent upon
`information created by or passed through previous stream
`processing units, including information created by a vertex
`program. Rendering applications can specify the pixel frag-
`ment program to be used for any given set of pixel frag-
`ments. Pixel fragment programs can be used to implement a
`
`
`
Case 3:14-cv-00757-REP-DJN Document 87-1 Filed 04/16/15 Page 12 of 17 PageID# 14251
variety of visual effects, including lighting and shading
effects, reflections, texture mapping and procedural texture
generation.
`The set of pixels are then output to the raster operations
`and storage unit 235. The raster operations unit 235 inte-
`grates the set of pixels output from the fragment processing
`unit 230 with the rendered image. Pixels can be blended or
`masked with pixels previously written to the rendered
`image. Depth buffers, alpha buffers, and stencil buffers can
`also be used to determine the contribution of each incoming
`pixel, if any, to the rendered image. The combination of each
`incoming pixel and any previously stored pixel values is
`then output to a frame buffer, stored for example in display
`memory 140, as part of the rendered image.
`FIG. 3 illustrates a portion 300 of the vertex processing
`unit according to an embodiment of the invention. Portion
`300 includes vertex processing engines 303, 305, 307, 309,
`311, 313. Each vertex processing engine can independently
`execute a vertex shader program. As discussed above, each
`vertex processing engine executes its vertex shader program
`and successively processes vertices and their associated
`attributes to implement a per-vertex algorithm. The per-
`vertex algorithm can be used to perform any combination of
`a wide variety of lighting, shading, coordinate transforma-
`tion, geometric, and animation effects.
`The use of multiple vertex processing units operating in
`parallel improves execution performance. In the example of
`portion 300,
`there are a total of six vertex processing
`engines; therefore, portion 300 can simultaneously execute
`a total of six different vertex shading programs, six instances
`of the same vertex shading program, or any combination in
`between. In alternate embodiments, the vertex processing
`unit can include any number of vertex processing engines. In
`an additional embodiment, each vertex processing engine is
`multithreaded, enabling the execution of multiple vertex
`shading programs.
`In the embodiment of portion 300, each vertex processing
`engine dispatches vertex texture requests to a vertex texture
`fetch (VTF) unit 320. Each vertex processing engine dis-
patches a vertex texture request to the VTF unit 320 when
`its vertex shader program includes an instruction to access
`data from an array. A vertex texture request includes a set of
`attributes used to locate the requested data. In an embodi-
`ment, these attributes include one or more texture coordi-
`nates (depending upon the number of dimensions in the
`vertex texture map), texture ID, and a texture level of detail
`(LOD). Here, the one or more texture coordinates represent
`index value(s) specifying a location of vertex texture map
`data within a vertex texture map. In a further embodiment,
`these attributes also include a thread ID specifying the
`execution thread of the vertex processing engine requesting
`data from a vertex texture map. All of these attributes can be
`expressed in any numerical format, including integer, fixed-
`point, and floating-point formats.
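The request attributes enumerated above map naturally onto a small record type. This sketch is illustrative only; the field names are invented, not taken from the patent:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class VertexTextureRequest:
    """Attributes a vertex texture request might carry, per the text above."""
    coords: Tuple[float, ...]   # one texture coordinate per map dimension
    texture_id: int             # selects which vertex texture map to read
    lod: float                  # texture level of detail
    thread_id: int              # execution thread of the requesting engine

req = VertexTextureRequest(coords=(0.25, 0.75), texture_id=3, lod=0.0,
                           thread_id=1)
```

The number of entries in `coords` tracks the dimensionality of the addressed map, and any field could equally be integer, fixed-point, or floating-point, as the text notes.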
`FIG. 4 illustrates a vertex texture fetch (VTF) unit 400
`according to an embodiment of the invention. VTF unit 400
includes a set of input buffers 405; in an embodiment, each
buffer receives vertex texture requests from one of the vertex
processing engines. Sample unit 410 reads vertex texture
`requests from each of the set of input buffers 405 in turn, for
`example in a round-robin manner. In an embodiment, as the
`sample unit 410 reads a vertex texture request from one of
`the set of input buffers 405, it assigns a vertex processing
`engine ID to the request, so that once the request is com-
`pleted, the data is returned to the correct vertex processing
`engine.
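The round-robin arbitration and engine-ID tagging just described can be sketched as follows. This is a simplified model with invented names, not the sample unit 410 hardware: per-engine FIFOs are drained one request at a time in rotation, and each request is tagged with the index of the FIFO (engine) it came from.

```python
from collections import deque

def round_robin(buffers):
    """Drain per-engine FIFOs one request per turn in round-robin order,
    tagging each request with the ID of the engine that issued it."""
    tagged = []
    while any(buffers):
        for engine_id, fifo in enumerate(buffers):
            if fifo:
                tagged.append((engine_id, fifo.popleft()))
    return tagged

buffers = [deque(["a1", "a2"]), deque(["b1"]), deque(["c1", "c2"])]
round_robin(buffers)
# interleaves: (0,'a1'), (1,'b1'), (2,'c1'), (0,'a2'), (2,'c2')
```

The tag is what lets completed texture data be routed back to the correct vertex processing engine, as the passage describes.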
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`8
`Using the texture ID of the vertex texture request, the
`sample unit 410 retrieves texture state information associ-
`ated with the requested vertex texture map. Texture state
`information includes a base memory address for the vertex
`texture map, the number of dimensions of the vertex texture
`map, the size of each dimension of the vertex texture map,
`the format of the data in the vertex texture map (for example,
`scalar or vector), and texture boundary conditions (for
`example, when texture coordinates fall outside the vertex
`texture map, the texture coordinates can be clamped at the
`boundary, mirrored, or wrapped around to the other side of
`the vertex texture map). It should be noted that the VTF unit
`400 can retrieve data from vertex texture maps of any
`arbitrary siz
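The three boundary conditions named in the texture state description (clamp, mirror, wrap) can be sketched for integer coordinates as below. This is a hypothetical illustration assuming conventional GL-style semantics; the function name and mode strings are invented:

```python
def apply_boundary(coord, size, mode):
    """Map an out-of-range integer texture coordinate into [0, size)."""
    if mode == "clamp":
        return min(max(coord, 0), size - 1)   # pin to the nearest edge texel
    if mode == "wrap":
        return coord % size                   # tile the texture periodically
    if mode == "mirror":
        period = 2 * size                     # forward then reversed copy
        c = coord % period
        return c if c < size else period - 1 - c
    raise ValueError(mode)

[apply_boundary(9, 8, m) for m in ("clamp", "wrap", "mirror")]
# clamp -> 7, wrap -> 1, mirror -> 6
```

Applying one of these per dimension is how a fetch unit keeps every generated address inside the bounds of the vertex texture map.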