TO ALL TO WHOM THESE PRESENTS SHALL COME:

UNITED STATES DEPARTMENT OF COMMERCE
United States Patent and Trademark Office

THIS IS TO CERTIFY THAT ANNEXED HERETO IS A TRUE COPY FROM
THE RECORDS OF THIS OFFICE OF:

May 9, 2023

By Authority of the
Under Secretary of Commerce for Intellectual Property
and Director of the United States Patent and Trademark Office

Miguel Tarver
Certifying Officer

PATENT NUMBER: 7,038,685
ISSUE DATE: May 2, 2006

Realtek Ex. 1005
Case No. IPR2023-00922
Page 1 of 22
`
`

`

(12) United States Patent
     Lindholm

(10) Patent No.: US 7,038,685 B1
(45) Date of Patent: May 2, 2006

(54) PROGRAMMABLE GRAPHICS PROCESSOR FOR MULTITHREADED EXECUTION OF
     PROGRAMS

(75) Inventor: John Erik Lindholm, Saratoga, CA (US)

(73) Assignee: NVIDIA Corporation, Santa Clara, CA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is
     extended or adjusted under 35 U.S.C. 154(b) by 134 days.

(21) Appl. No.: 10/609,967

(22) Filed: Jun. 30, 2003

(51) Int. Cl.
     G06F 15/00 (2006.01)
     G06F 13/00 (2006.01)
     G06F 12/02 (2006.01)
     G06F 9/46 (2006.01)
     G06T 1/00 (2006.01)

(52) U.S. Cl. ........ 345/501; 345/543; 345/536; 718/104

(58) Field of Classification Search ........ 345/501,
     345/502, 530, 531, 522, 418, 419, 426, 427,
     345/543, 505, 536; 718/100, 104, 103
     See application file for complete search history.

(56) References Cited

     U.S. PATENT DOCUMENTS

     5,020,115 A *     5/1991  Black ................ 382/298
     5,969,726 A      10/1999  Rentschler et al.
     6,630,935 B1 *   10/2003  Taylor et al. ........ 345/522
     6,731,289 B1 *    5/2004  Peercy et al. ........ 345/503
     2003/0041173 A1   2/2003  Hoyle

     * cited by examiner
`
`Primary Examiner—Kee M. Tung
`(74) Attorney, Agent, or Firm—Patterson & Sheridan, LLP
`
(57) ABSTRACT

A programmable graphics processor for multithreaded execution of
program instructions including a thread control unit. The programmable
graphics processor is programmed with program instructions for
processing primitive, pixel and vertex data. The thread control unit
has a thread storage resource including locations allocated to store
thread state data associated with samples of two or more types. Sample
types include primitive, pixel and vertex. A number of threads
allocated to processing a sample type may be dynamically modified.

45 Claims, 9 Drawing Sheets
`
[Representative drawing: Execution Pipeline containing Multithreaded
Processing Unit 400, with a Thread Selection Unit, Thread Control
Unit, TSR 325, Resource Scoreboard, Instruction Scheduler 430,
Instruction Dispatcher 440, Execution Unit 470, and Register File 350
(see FIG. 4).]
`
`

`

[Sheet 1 of 9. FIG. 1: Computing System 100, comprising Host Computer
110 (Host Memory 112, Host Processor 114, System Interface 115) and
Graphics Subsystem 170 (Graphics Processor 105 with Graphics Interface
117, Memory Controller 120, Front End 130, IDX 135, Programmable
Graphics Processing Pipeline 150, Raster Analyzer 160, and Output
Controller 180), producing an Output.]
`
`

`

[Sheet 2 of 9. Block diagram of Programmable Graphics Processing
Pipeline 150: input from 135, Primitive Assembly/Setup 205, Raster
Unit 210, Pixel Input Buffer 215, Vertex Input Buffer 220, four
Execution Pipelines 240, Vertex Output Buffer 260, Pixel Output
Buffer, Texture Unit 225, and Texture Cache 230, with a connection
to 120.]
`
`

`

[Sheet 3 of 9. FIG. 3: Execution Pipeline 240 containing a
Multithreaded Processing Unit with Thread Control Unit 320 and a
Register File; inputs from 215 and 220, connection to 225.]
`
`

`

[Sheet 4 of 9. Block diagram of an Execution Pipeline containing a
Multithreaded Processing Unit: Instruction Cache, Thread Selection
Unit 415, Thread Control Unit 420 with TSR 325, Resource Scoreboard
460, Sequencer 425, Instruction Scheduler 430, Instruction Dispatcher
440, Execution Unit 470, and Register File 350; inputs from 215 and
220.]
`
`

`

[Sheet 5 of 9. FIG. 5A: flow diagram with steps Receive Pointer to a
Program 510, Pixel or Vertex? 515, Assign Vertex Thread, and Assign
Pixel Thread 545. FIG. 5B: flow diagram with steps Receive Pointer to
a Program 510, Pixel or Vertex? 515, Pass Priority Test? (520, 535),
Vertex Thread Available? 525, Pixel Thread Available? 540, Assign
Vertex Thread, and Assign Pixel Thread 545.]
`
`

`

[Sheet 6 of 9. FIG. 6B; reference numerals 620, 625, 630, 635, 640,
and 645 appear on this sheet.]
`
`

`

[Sheet 7 of 9. FIG. 7A: flow diagram with steps Allocating threads to
a first sample type 710, Allocating threads to a second sample type
715, Execute First Program Instructions 720, and Execute Second
Program Instructions 725. FIG. 7B: flow diagram with steps Determine
allocations 750, Allocating threads to a first sample type, Allocating
threads to a second sample type 760, Execute First Program
Instructions 765, Execute Second Program Instructions 770, and
Allocating threads to the first sample type.]
`
`

`

[Sheet 8 of 9. FIG. 8A: flow diagram with steps Receive Sample 810,
Identify Sample Type 815, Thread Available? 820, and Assign Thread
825. FIG. 8B: flow diagram with steps Receive Sample 850, Identify
Sample Type 855, PIOR disabled? and Position Hazard? (860, 865),
Thread Available? 870, Assign Thread, Resources Available? 877,
Execute Thread 880, and Deallocate Resources.]
`
`

`

[Sheet 9 of 9. FIG. 9A: flow diagram with steps Identify assigned
thread(s) 910, Select Thread(s) 915, Read Program Counter(s) 920, and
Update Program Counter(s) 925. FIG. 9B: flow diagram with steps
Determine thread priority 950, Identify assigned thread(s) for
priority 955, No threads? 960, Select Thread(s) 965, Read Program
Counter(s) 970, Update Program Counter(s) 975, and Identify next
priority 980.]
`
`

`

PROGRAMMABLE GRAPHICS PROCESSOR
FOR MULTITHREADED EXECUTION OF
PROGRAMS

FIELD OF THE INVENTION

One or more aspects of the invention generally relate to
multithreaded processing, and more particularly to processing graphics
data in a programmable graphics processor.

BACKGROUND

Current graphics data processing includes systems and methods
developed to perform a specific operation on graphics data, e.g.,
linear interpolation, tessellation, rasterization, texture mapping,
depth testing, etc. These graphics processors include several fixed
function computation units to perform such specific operations on
specific types of graphics data, such as vertex data and pixel data.
More recently, the computation units have a degree of programmability
to perform user specified operations such that the vertex data is
processed by a vertex processing unit using vertex programs and the
pixel data is processed by a pixel processing unit using pixel
programs. When the amount of vertex data being processed is low
relative the amount of pixel data being processed, the vertex
processing unit may be underutilized. Conversely, when the amount of
vertex data being processed is high relative the amount of pixel data
being processed, the pixel processing unit may be underutilized.

Accordingly, it would be desirable to provide improved approaches to
processing different types of graphics data to better utilize one or
more processing units within a graphics processor.

SUMMARY

A method and apparatus for processing and allocating threads for
multithreaded execution of graphics programs is described. A graphics
processor for multithreaded execution of program instructions
associated with threads to process at least two sample types includes
a thread control unit including a thread storage resource configured
to store thread state data for each of the threads to process the at
least two sample types.

Alternatively, the graphics processor includes a multithreaded
processing unit. The multithreaded processing unit includes a thread
control unit configured to store pointers to program instructions
associated with threads, each thread processing a sample type of
vertex, pixel or primitive. The multithreaded processing unit also
includes at least one programmable computation unit configured to
process data under control of the program instructions.

A method of multithreaded processing of graphics data includes
receiving a pointer to a vertex program to process vertex samples. A
first thread is assigned to a vertex sample. A pointer to a shader
program to process pixel samples is received. A second thread is
assigned to a pixel sample. The vertex program is executed to process
the vertex sample and produce a processed vertex sample. The shader
program is executed to process the pixel sample and produce a
processed pixel sample.

Alternatively, the method of multithreaded processing of graphics data
includes allocating a first number of processing threads for a first
sample type. A second number of processing threads is allocated for a
second sample type. First program instructions associated with the
first sample type are executed to process the graphics data and
produce processed graphics data.

A method of assigning threads for processing of graphics data includes
receiving a sample to be processed. A sample type of vertex, pixel, or
primitive, associated with the sample is determined. A thread is
determined to be available for assignment to the sample. The thread is
assigned to the sample.

A method of selecting at least one thread for execution includes
identifying one or more assigned threads from threads including at
least a thread assigned to a pixel sample and a thread assigned to a
vertex sample. At least one of the one or more assigned threads is
selected for processing.

A method of improving performance of multithreaded processing of
graphics data using at least two sample types includes dynamically
allocating a first number of threads for processing a first portion of
the graphics data to a first sample type and dynamically allocating a
second number of threads for processing a second portion of the
graphics data to a second sample type.

BRIEF DESCRIPTION OF THE DRAWINGS

Accompanying drawing(s) show exemplary embodiment(s) in accordance
with one or more aspects of the present invention; however, the
accompanying drawing(s) should not be taken to limit the present
invention to the embodiment(s) shown, but are for explanation and
understanding only.

FIG. 1 illustrates one embodiment of a computing system according to
the invention including a host computer and a graphics subsystem.

FIG. 2 is a block diagram of an embodiment of the Programmable
Graphics Processing Pipeline of FIG. 1.

FIG. 3 is a block diagram of an embodiment of the Execution Pipeline
of FIG. 1.

FIG. 4 is a block diagram of an alternate embodiment of the Execution
Pipeline of FIG. 1.

FIGS. 5A and 5B are flow diagrams of exemplary embodiments of thread
assignment in accordance with one or more aspects of the present
invention.

FIGS. 6A and 6B are exemplary embodiments of a portion of the Thread
Storage Resource storing thread state data within an embodiment of the
Thread Control Unit of FIG. 3 or FIG. 4.

FIGS. 7A and 7B are flow diagrams of exemplary embodiments of thread
allocation in accordance with one or more aspects of the present
invention.

FIGS. 8A and 8B are flow diagrams of exemplary embodiments of thread
assignment in accordance with one or more aspects of the present
invention.

FIGS. 9A and 9B are flow diagrams of exemplary embodiments of thread
selection in accordance with one or more aspects of the present
invention.

DETAILED DESCRIPTION

In the following description, numerous specific details are set forth
to provide a more thorough understanding of the present invention.
However, it will be apparent to one of skill in the art that the
present invention may be practiced without one or more of these
specific details. In other instances, well-known features have not
been described in order to avoid obscuring the present invention.

FIG. 1 is an illustration of a Computing System generally designated
100 and including a Host Computer 110 and a Graphics Subsystem 170.
Computing System 100 may be a desktop computer, server, laptop
computer, palm-sized computer, tablet computer, game console, cellular
telephone, computer based simulator, or the like. Host Computer 110
includes Host Processor 114 that may include a system memory
controller to interface directly to Host Memory 112 or may communicate
with Host Memory 112 through a System Interface 115. System Interface
115 may be an I/O (input/output) interface or a bridge device
including the system memory controller to interface directly to Host
Memory 112. Examples of System Interface 115 known in the art include
Intel® Northbridge and Intel® Southbridge.

Host Computer 110 communicates with Graphics Subsystem 170 via System
Interface 115 and a Graphics Interface 117 within a Graphics Processor
105. Data received at Graphics Interface 117 can be passed to a Front
End 130 or written to a Local Memory 140 through Memory Controller
120. Graphics Processor 105 uses graphics memory to store graphics
data and program instructions, where graphics data is any data that is
input to or output from components within the graphics processor.
Graphics memory can include portions of Host Memory 112, Local Memory
140, register files coupled to the components within Graphics
Processor 105, and the like.

Graphics Processor 105 includes, among other components, Front End 130
that receives commands from Host Computer 110 via Graphics Interface
117. Front End 130 interprets and formats the commands and outputs the
formatted commands and data to an IDX (Index Processor) 135. Some of
the formatted commands are used by Programmable Graphics Processing
Pipeline 150 to initiate processing of data by providing the location
of program instructions or graphics data stored in memory. IDX 135,
Programmable Graphics Processing Pipeline 150 and a Raster Analyzer
160 each include an interface to Memory Controller 120 through which
program instructions and data can be read from memory, e.g., any
combination of Local Memory 140 and Host Memory 112. When a portion of
Host Memory 112 is used to store program instructions and data, the
portion of Host Memory 112 can be uncached so as to increase
performance of access by Graphics Processor 105.

IDX 135 optionally reads processed data, e.g., data written by Raster
Analyzer 160, from memory and outputs the data, processed data and
formatted commands to Programmable Graphics Processing Pipeline 150.
Programmable Graphics Processing Pipeline 150 and Raster Analyzer 160
each contain one or more programmable processing units to perform a
variety of specialized functions. Some of these functions are table
lookup, scalar and vector addition, multiplication, division,
coordinate-system mapping, calculation of vector normals,
tessellation, calculation of derivatives, interpolation, and the like.
Programmable Graphics Processing Pipeline 150 and Raster Analyzer 160
are each optionally configured such that data processing operations
are performed in multiple passes through those units or in multiple
passes within Programmable Graphics Processing Pipeline 150.
Programmable Graphics Processing Pipeline 150 and a Raster Analyzer
160 also each include a write interface to Memory Controller 120
through which data can be written to memory.

In a typical implementation Programmable Graphics Processing Pipeline
150 performs geometry computations, rasterization, and pixel
computations. Therefore Programmable Graphics Processing Pipeline 150
is programmed to operate on surface, primitive, vertex, fragment,
pixel, sample or any other data. For simplicity, the remainder of this
description will use the term “samples” to refer to graphics data such
as surfaces, primitives, vertices, pixels, fragments, or the like.

Samples output by Programmable Graphics Processing Pipeline 150 are
passed to a Raster Analyzer 160, which optionally performs near and
far plane clipping and raster operations, such as stencil, z test, and
the like, and saves the results or the samples output by Programmable
Graphics Processing Pipeline 150 in Local Memory 140. When the data
received by Graphics Subsystem 170 has been completely processed by
Graphics Processor 105, an Output 185 of Graphics Subsystem 170 is
provided using an Output Controller 180. Output Controller 180 is
optionally configured to deliver data to a display device, network,
electronic control system, other Computing System 100, other Graphics
Subsystem 170, or the like. Alternatively, data is output to a film
recording device or written to a peripheral device, e.g., disk drive,
tape, compact disk, or the like.

FIG. 2 is an illustration of Programmable Graphics Processing Pipeline
150 of FIG. 1. At least one set of samples is output by IDX 135 and
received by Programmable Graphics Processing Pipeline 150 and the at
least one set of samples is processed according to at least one
program, the at least one program including graphics program
instructions. A program can process one or more sets of samples.
Conversely, a set of samples can be processed by a sequence of one or
more programs.

Samples, such as surfaces, primitives, or the like, are received from
IDX 135 by Programmable Graphics Processing Pipeline 150 and stored in
a Vertex Input Buffer 220 including a register file, FIFO (first in
first out), cache, or the like (not shown). The samples are broadcast
to Execution Pipelines 240, four of which are shown in the figure.
Each Execution Pipeline 240 includes at least one multithreaded
processing unit, to be described further herein. The samples output by
Vertex Input Buffer 220 can be processed by any one of the Execution
Pipelines 240. A sample is accepted by an Execution Pipeline 240 when
a processing thread within the Execution Pipeline 240 is available as
described further herein. Each Execution Pipeline 240 signals to
Vertex Input Buffer 220 when a sample can be accepted or when a sample
cannot be accepted. In one embodiment Programmable Graphics Processing
Pipeline 150 includes a single Execution Pipeline 240 containing one
multithreaded processing unit. In an alternative embodiment,
Programmable Graphics Processing Pipeline 150 includes a plurality of
Execution Pipelines 240.

Execution Pipelines 240 may receive first samples, such as
higher-order surface data, and tessellate the first samples to
generate second samples, such as vertices. Execution Pipelines 240 may
be configured to transform the second samples from an object-based
coordinate representation (object space) to an alternatively based
coordinate system such as world space or normalized device coordinates
(NDC) space. Each Execution Pipeline 240 may communicate with Texture
Unit 225 using a read interface (not shown in FIG. 2) to read program
instructions and graphics data such as texture maps from Local Memory
140 or Host Memory 112 via Memory Controller 120 and a Texture Cache
230. Texture Cache 230 is used to improve memory read performance by
reducing read latency. In an alternate embodiment Texture Cache 230 is
omitted. In another alternate embodiment, a Texture Unit 225 is
included in each Execution Pipeline 240. In another alternate
embodiment program instructions are stored within Programmable
Graphics Processing Pipeline 150. In another alternate embodiment each
Execution Pipeline 240 has a dedicated instruction read interface to
read program instructions from Local Memory 140 or Host Memory 112 via
Memory Controller 120.

Execution Pipelines 240 output processed samples, such as vertices,
that are stored in a Vertex Output Buffer 260 including a register
file, FIFO, cache, or the like (not shown). Processed vertices output
by Vertex Output Buffer 260 are received by a Primitive Assembly/Setup
Unit 205. Primitive Assembly/Setup Unit 205 calculates parameters,
such as deltas and slopes, to rasterize the processed vertices and
outputs parameters and samples, such as vertices, to a Raster Unit
210. Raster Unit 210 performs scan conversion on samples, such as
vertices, and outputs samples, such as fragments, to a Pixel Input
Buffer 215. Alternatively, Raster Unit 210 resamples processed
vertices and outputs additional vertices to Pixel Input Buffer 215.
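
By way of illustration only, the acceptance signalling between Vertex
Input Buffer 220 and Execution Pipelines 240 described above may be
sketched as follows. The class and function names are hypothetical,
and the choice of the first accepting pipeline is an assumption; the
description states only that a sample is accepted when a processing
thread is available.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ExecutionPipeline:
    """Hypothetical stand-in for an Execution Pipeline 240."""
    free_threads: int                        # processing threads not yet assigned
    accepted: List[str] = field(default_factory=list)

    def can_accept(self) -> bool:
        # The pipeline signals acceptance only while a thread is available.
        return self.free_threads > 0

    def accept(self, sample: str) -> None:
        self.free_threads -= 1
        self.accepted.append(sample)


def offer_sample(pipelines: List[ExecutionPipeline], sample: str) -> Optional[int]:
    """Offer a sample to every pipeline; the first one able to accept takes it.

    Returns the index of the accepting pipeline, or None if the sample must wait.
    Picking the first accepting pipeline is an assumed arbitration policy.
    """
    for index, pipeline in enumerate(pipelines):
        if pipeline.can_accept():
            pipeline.accept(sample)
            return index
    return None


pipelines = [ExecutionPipeline(free_threads=0), ExecutionPipeline(free_threads=1)]
first = offer_sample(pipelines, "vertex-0")   # pipeline 0 is full, pipeline 1 accepts
second = offer_sample(pipelines, "vertex-1")  # no thread is free, so the sample waits
```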
`
Pixel Input Buffer 215 outputs the samples to each Execution Pipeline
240. Samples, such as pixels and fragments, output by Pixel Input
Buffer 215 are each processed by only one of the Execution Pipelines
240. Pixel Input Buffer 215 determines which one of the Execution
Pipelines 240 to output each sample to depending on an output pixel
position, e.g., (x,y), associated with each sample. In this manner,
each sample is output to the Execution Pipeline 240 designated to
process samples associated with the output pixel position. In an
alternate embodiment, each sample output by Pixel Input Buffer 215 is
processed by one of any available Execution Pipelines 240.
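
As a minimal sketch of position-based routing, the function below maps
an output pixel position (x,y) to one Execution Pipeline 240. The
mapping itself (2x2 pixel tiles interleaved across four pipelines) and
the function name are assumptions chosen for illustration; the
description requires only that the pipeline be determined by the
output pixel position.

```python
def pipeline_for_position(x: int, y: int, tile: int = 2, num_pipelines: int = 4) -> int:
    """Return the index of the pipeline designated for output pixel position (x, y).

    Assumption: positions are grouped into tile-by-tile blocks, and the blocks
    are interleaved across the pipelines in a repeating 2-by-2 pattern.
    """
    tile_x = x // tile
    tile_y = y // tile
    return (tile_x % 2 + 2 * (tile_y % 2)) % num_pipelines


# Every sample at a given position is routed to the same pipeline.
top_left = pipeline_for_position(0, 0)
same_tile = pipeline_for_position(1, 1)
right_tile = pipeline_for_position(2, 0)
lower_tile = pipeline_for_position(0, 2)
diagonal_tile = pipeline_for_position(3, 3)
```

Because the mapping is a pure function of position, samples that
target the same output pixel are always handled by the same pipeline.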
Each Execution Pipeline 240 signals to Pixel Input Buffer 215 when a
sample can be accepted or when a sample cannot be accepted as
described further herein. Program instructions configure programmable
computation units (PCUs) within an Execution Pipeline 240 to perform
operations such as tessellation, perspective correction, texture
mapping, shading, blending, and the like. Processed samples are output
from each Execution Pipeline 240 to a Pixel Output Buffer 270. Pixel
Output Buffer 270 optionally stores the processed samples in a
register file, FIFO, cache, or the like (not shown). The processed
samples are output from Pixel Output Buffer 270 to Raster Analyzer
160.
FIG. 3 is a block diagram of an embodiment of Execution Pipeline 240
of FIG. 1 including at least one Multithreaded Processing Unit 300. An
Execution Pipeline 240 can contain a plurality of Multithreaded
Processing Units 300, each Multithreaded Processing Unit 300
containing at least one PCU 375. PCUs 375 are configured using program
instructions read by a Thread Control Unit 320 via Texture Unit 225.
Thread Control Unit 320 gathers source data specified by the program
instructions and dispatches the source data and program instructions
to at least one PCU 375. PCUs 375 perform computations specified by
the program instructions and output data to at least one destination,
e.g., Pixel Output Buffer 270, Vertex Output Buffer 260 and Thread
Control Unit 320.
`
A single program may be used to process several sets of samples.
Thread Control Unit 320 receives samples or pointers to samples stored
in Pixel Input Buffer 215 and Vertex Input Buffer 220. Thread Control
Unit 320 receives a pointer to a program to process one or more
samples. Thread Control Unit 320 assigns a thread to each sample to be
processed. A thread includes a pointer to a program instruction
(program counter), such as the first instruction within the program,
thread state information, and storage resources for storing
intermediate data generated during processing of the sample. Thread
state information is stored in a TSR (Thread Storage Resource) 325.
TSR 325 may be a register file, FIFO, circular buffer, or the like. An
instruction specifies the location of source data needed to execute
the instruction. Source data, such as intermediate data generated
during processing of the sample is stored in a Register File 350. In
addition to Register File 350, other source data may be stored in
Pixel Input Buffer 215 or Vertex Input Buffer 220. In an alternate
embodiment source data is stored in Local Memory 140, locations in
Host Memory 112, and the like.
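
The composition of a thread described above (program counter, thread
state information, and storage for intermediate data) may be sketched
as a simple record, with one thread assigned per sample. The field and
function names are hypothetical and the sketch is not a description of
any particular hardware layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Thread:
    program_counter: int                                    # pointer to a program instruction
    sample: str                                             # sample, or pointer to it
    state: Dict[str, int] = field(default_factory=dict)     # thread state information
    registers: List[float] = field(default_factory=list)    # intermediate data storage


def assign_threads(samples: List[str], program_start: int) -> List[Thread]:
    """Assign one thread per sample, each starting at the program's first instruction."""
    return [Thread(program_counter=program_start, sample=s) for s in samples]


# A single program (starting at a hypothetical address 0x40) processes three samples.
threads = assign_threads(["pixel-0", "pixel-1", "pixel-2"], program_start=0x40)
```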
Alternatively, in an embodiment permitting multiple programs for two
or more thread types, Thread Control Unit 320 also receives a program
identifier specifying which one of the two or more programs the
program counter is associated with. Specifically, in an embodiment
permitting simultaneous execution of four programs for a thread type,
two bits of thread state information are used to store the program
identifier for a thread. Multithreaded execution of programs is
possible because each thread may be executed independent of other
threads, regardless of whether the other threads are executing the
same program or a different program. PCUs 375 update each program
counter associated with the threads in Thread Control Unit 320
following the execution of an instruction. For execution of a loop,
call, return, or branch instruction the program counter may be updated
based on the loop, call, return, or branch instruction.
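
As an illustrative sketch, a two-bit program identifier can select
among four programs and can be stored alongside the program counter.
The bit layout (identifier in the low two bits) and the function names
are assumptions; the description states only that two bits of thread
state information hold the identifier.

```python
from typing import Optional, Tuple

PROGRAM_ID_BITS = 2                           # four simultaneous programs per thread type
PROGRAM_ID_MASK = (1 << PROGRAM_ID_BITS) - 1


def pack_state(program_id: int, program_counter: int) -> int:
    """Pack a program identifier and a program counter into one state word.

    Assumed layout: identifier in the low two bits, counter in the bits above.
    """
    if not 0 <= program_id <= PROGRAM_ID_MASK:
        raise ValueError("program_id must fit in two bits")
    return (program_counter << PROGRAM_ID_BITS) | program_id


def unpack_state(word: int) -> Tuple[int, int]:
    """Return (program_id, program_counter) from a packed state word."""
    return word & PROGRAM_ID_MASK, word >> PROGRAM_ID_BITS


def next_program_counter(program_counter: int, branch_target: Optional[int] = None) -> int:
    """Advance sequentially, or jump when a loop, call, return, or branch gives a target."""
    return program_counter + 1 if branch_target is None else branch_target
```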
For example, each fragment or group of fragments within a primitive
can be processed independently from the other fragments or from the
other groups of fragments within the primitive. Likewise, each vertex
within a surface can be processed independently from the other
vertices within the surface. For a set of samples being processed
using the same program, the sequence of program instructions
associated with each thread used to process each sample within the set
will be identical, although the program counter for each thread may
vary. However, it is possible that, during execution, the threads
processing some of the samples within a set will diverge following the
execution of a conditional branch instruction. After the execution of
a conditional branch instruction, the sequence of executed
instructions associated with each thread processing samples within the
set may differ and each program counter stored in TSR 325 within
Thread Control Unit 320 for the threads may differ accordingly.
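
The divergence described above can be illustrated with a toy
interpreter in which two threads run the same program on different
samples. The instruction set and the program are hypothetical and
exist only to show the program counters taking different paths after a
conditional branch.

```python
# Hypothetical program: a list of (opcode, operand) pairs.
PROGRAM = [
    ("branch_if_negative", 3),   # 0: jump to instruction 3 when the sample is < 0
    ("scale", 2.0),              # 1: sample *= 2
    ("end", None),               # 2
    ("negate", None),            # 3: sample = -sample
    ("end", None),               # 4
]


def run_thread(sample: float):
    """Execute PROGRAM for one sample; return the result and the program counters visited."""
    pc, trace = 0, []
    while True:
        trace.append(pc)
        opcode, operand = PROGRAM[pc]
        if opcode == "end":
            return sample, trace
        if opcode == "branch_if_negative" and sample < 0:
            pc = operand             # program counter updated by the branch
            continue
        if opcode == "scale":
            sample *= operand
        elif opcode == "negate":
            sample = -sample
        pc += 1                      # otherwise advance to the next instruction


result_a, trace_a = run_thread(1.5)    # branch not taken
result_b, trace_b = run_thread(-4.0)   # branch taken, so the threads diverge
```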
FIG. 4 is an illustration of an alternate embodiment of Execution
Pipeline 240 containing at least one Multithreaded Processing Unit
400. Thread Control Unit 420 includes a TSR 325 to retain thread state
data. In one embodiment TSR 325 stores thread state data for each of
at least two thread types, where the at least two thread types may
include pixel, primitive, and vertex. Thread state data for a thread
may include, among other things, a program counter, a busy flag that
indicates if the thread is either assigned to a sample or available to
be assigned to a sample, a pointer to a source sample to be processed
by the instructions associated with the thread or the output pixel
position and output buffer ID of the source sample to be processed,
and a pointer specifying a destination location in Vertex Output
Buffer 260 or Pixel Output Buffer 270. Additionally, thread state data
for a thread assigned to a sample may include the sample type, e.g.,
pixel, vertex, primitive, or the like. The type of data a thread
processes identifies the thread type, e.g., pixel, vertex, primitive,
or the like. For example, a thread may process a primitive, producing
a vertex. After the vertex is rasterized and fragments are generated,
the thread may process a fragment.
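
A minimal sketch of thread state data held in a thread storage
resource is given below. The class names, field names, and the
first-free-entry search are assumptions for illustration; the fields
mirror the items listed above (program counter, busy flag, source
pointer, destination pointer, and sample type).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ThreadEntry:
    """State data for one thread; field names are illustrative only."""
    program_counter: int = 0
    busy: bool = False                          # True when assigned to a sample
    source_pointer: Optional[int] = None        # pointer to the source sample
    destination_pointer: Optional[int] = None   # location in an output buffer
    sample_type: Optional[str] = None           # "pixel", "vertex", or "primitive"


class ThreadStorageResource:
    """Hypothetical model of a TSR holding a fixed number of thread entries."""

    def __init__(self, num_threads: int) -> None:
        self.entries: List[ThreadEntry] = [ThreadEntry() for _ in range(num_threads)]

    def assign(self, sample_type: str, source: int, dest: int, pc: int) -> Optional[int]:
        """Claim the first free entry for a sample; return its index, or None if all are busy."""
        for index, entry in enumerate(self.entries):
            if not entry.busy:
                self.entries[index] = ThreadEntry(pc, True, source, dest, sample_type)
                return index
        return None

    def release(self, index: int) -> None:
        """Mark an entry as available again once its sample has been processed."""
        self.entries[index] = ThreadEntry()
```

In this sketch any entry can hold any sample type, so the same
location may serve a vertex thread at one time and a pixel thread at
another.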
Source samples are stored in either Pixel Input Buffer 215 or Vertex
Input Buffer 220. Thread allocation priority, as described further
herein, is used to assign a thread to a source sample. A thread
allocation priority is specified for each sample type and Thread
Control Unit 420 is configured to assign threads to samples or
allocate locations in a Register File 350 based on the priority
assigned to each sample type. The thread allocation priority may be
fixed, programmable, or dynamic. In one embodiment the thread
allocation priority may be fixed, always giving priority to allocating
vertex threads and pixel threads are only allocated if vertex samples
are not available for assignment to a thread.
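
The fixed thread allocation priority of the embodiment above may be
sketched as follows, with the two input buffers modelled as queues.
The function name and the use of queues are assumptions made for the
sketch.

```python
from collections import deque
from typing import Deque, Optional, Tuple


def next_sample_fixed_priority(
    vertex_input: Deque[str], pixel_input: Deque[str]
) -> Optional[Tuple[str, str]]:
    """Pick the sample that receives the next free thread, vertex samples first.

    A pixel sample is chosen only when no vertex sample is waiting.
    """
    if vertex_input:
        return "vertex", vertex_input.popleft()
    if pixel_input:
        return "pixel", pixel_input.popleft()
    return None


vertex_buffer = deque(["v0"])
pixel_buffer = deque(["p0", "p1"])
order = [next_sample_fixed_priority(vertex_buffer, pixel_buffer) for _ in range(4)]
```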
`In an alternate embodiment, Thread Control Unit 420 is
`configured to assign threads to source samples or allocate
`ocations in Register File 350 using thread allocation pri-
`orities based on an amount of sample data in Pixel Inpu
`Buffer 215 and another amount of sample data in Vertex
`nput Buffer 220. Dynamically modifying a thread alloca-
`ion priority for vertex samples based on the amount o
`sample data in Vertex Input Buffer 220 permits Vertex Inpu
`Buffer 220 to drain faster and fill Vertex Output Buffer 260
`and Pixel Input Buffer 215 faster or drain slower and fil
`Vertex Output Buffer 260 and Pixel Input Buffer 215 slower.
`Dynamically modifying a thread allocation priority for pixel
`samples based on the amount of sample data in Pixel Inpu
`Buffer 215 permits Pixel Input Buffer 215 to drain faster and
`fill Pixel Output Buffer 270 faster or drain slower and fil
`Pixel Output Buffer 270 slower.
`In a further alternate embodiment, Thread Control Unit 420
`is configured to assign threads to source samples or allocate
`locations in Register File 350 using thread allocation
`priorities based on graphics primitive size (number of pixels
`or fragments included in a primitive) or a number of graphics
`primitives in Vertex Output Buffer 260. For example, a
`dynamically determined thread allocation priority may be
`based on a number of “pending” pixels, i.e., the number of
`pixels to be rasterized from the primitives in Primitive
`Assembly/Setup 205 and in Vertex Output Buffer 260.
`Specifically, the thread allocation priority may be tuned such
`that the number of pending pixels produced by processing
`vertex threads is adequate to achieve maximum utilization of
`the computation resources in Execution Pipelines 240
`processing pixel threads.
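The pending-pixel idea can be sketched as a simple threshold test. The text does not give a formula, so everything concrete here is assumed: the per-primitive pixel counts are taken as known inputs, and `target_pending_pixels` is a hypothetical tuning parameter representing how much pixel work is considered enough to keep the execution pipelines busy.

```python
def priority_from_pending_pixels(primitive_pixel_counts, target_pending_pixels):
    """Choose a thread allocation priority from the pending pixel count.

    `primitive_pixel_counts` lists the number of pixels each not yet
    rasterized primitive is expected to produce (assumed to be known).
    `target_pending_pixels` is a hypothetical tuning threshold.
    """
    pending = sum(primitive_pixel_counts)
    # Too little pixel work queued: run vertex threads to produce more.
    # Enough pixel work queued: spend threads on pixel samples instead.
    return "vertex" if pending < target_pending_pixels else "pixel"
```

With a target of 1000 pending pixels, three primitives covering 100, 250, and 50 pixels leave the pipeline short of pixel work, so vertex threads are favored; two primitives covering 600 and 700 pixels exceed the target, so pixel threads are favored.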
`Once a thread is assigned to a source sample, the thread
`is allocated storage resources such as locations in a Register
`File 350 to retain intermediate data generated during execu-
`tion of program instructions associated with the thread.
`Alternatively, source data is stored in storage resources
`including Local Memory 140, locations in Host Memory
`112, and the like.
`A Thread Selection Unit 415 reads one or more thread
`entries, each containing thread state data, from Thread
`Control Unit 420. Thread Selection Unit 415 may read
`thread entries to process a group of samples. For example,
`in one embodiment a group of samples, e.g., a number of
`vertices defining a primitive, four adjacent fragments
`arranged in a square, or the like, are processed
`simultaneously. In the one embodiment computed values such as
`derivatives are shared within the group of samples thereby
`reducing the number of computations needed to process the
`group of samples compared with processing the group of
`samples without sharing the computed values.
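As an illustration of sharing derivatives within a group of four adjacent fragments arranged in a square, the sketch below uses finite differences between neighboring fragments. The text does not say how the derivatives are computed, so the finite-difference method, the quad layout, and the name `quad_derivatives` are assumptions.

```python
def quad_derivatives(quad):
    """Compute shared screen-space differences for a 2x2 fragment quad.

    `quad` is ((top_left, top_right), (bottom_left, bottom_right)), each
    entry being one scalar value (for example a texture coordinate)
    already computed for that fragment.
    Returns the horizontal difference per row and the vertical difference
    per column; each fragment reads the entry for its own row and column.
    """
    (tl, tr), (bl, br) = quad
    return {
        "ddx": (tr - tl, br - bl),  # (top row, bottom row)
        "ddy": (bl - tl, br - tr),  # (left column, right column)
    }
```

In this sketch the four differences are computed once for the whole group from values the fragments already hold, rather than each fragment evaluating its neighbors' values on its own.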
`In Multithreaded Processing Unit 400, a thread execution
`priority is specified for each thread type and Thread Selec-
`tion Unit 415 is configured to read thread entries based on
`the thread execution priority assigned to each thread type. A
`thread execution priority may be fixed, programmable, or
`dynamic. In one embodiment the thread execution priority
`may be fixed, always giving priority to execution of vertex
`threads, with pixel threads executed only if vertex threads
`are not available for execution.
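Thread selection by execution priority can be sketched as a scan over thread entries in priority order. The entry layout (a dictionary with `type` and `ready` fields) and the `priority_order` parameter are hypothetical; the text states only that each entry contains thread state data and that, in the fixed embodiment, vertex threads are executed ahead of pixel threads.

```python
def select_thread(thread_entries, priority_order=("vertex", "pixel")):
    """Return the first ready thread entry of the highest-priority type.

    `thread_entries` is a list of dicts with hypothetical fields
    "id", "type", and "ready". The default `priority_order` models the
    fixed embodiment; passing another order models a programmable one.
    Returns None when no thread is ready.
    """
    for thread_type in priority_order:
        for entry in thread_entries:
            if entry["type"] == thread_type and entry["ready"]:
                return entry
    return None
```

A pixel thread is returned by the default order only when no vertex thread is ready, which mirrors the fixed-priority behavior described above.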
`In anothe
