
`
`
`THIS IS TO CERTIFY THAT ANNEXED HERETO IS A TRUE COPY FROM
`THE RECORDS OF THIS OFFICE OF:
`
`May 9, 2023
`
`PATENT NUMBER: 7,015,913
`ISSUE DATE: March 21, 2006
`
`TO ALL TO WHOM THESE PRESENTS SHALL COME:
`UNITED STATES DEPARTMENT OF COMMERCE
`United States Patent and Trademark Office
`
`By Authority of the
`Under Secretary of Commerce for Intellectual Property
`and Director of the United States Patent and Trademark Office
`
`Certifying Officer
`
`Realtek Ex. 1006
`Case No. IPR2023-00922
`Page 1 of 21
`
`

`

`US007015913B1
`
`(12) United States Patent
`Lindholm et al.
`(10) Patent No.: US 7,015,913 B1
`(45) Date of Patent: Mar. 21, 2006
`
`(54) METHOD AND APPARATUS FOR
`MULTITHREADED PROCESSING OF DATA
`IN A PROGRAMMABLE GRAPHICS
`PROCESSOR
`
`(75) Inventors: John Erik Lindholm, Saratoga, CA (US); Rui M. Bastos, Santa Clara, CA (US); Harold Robert Feldman Zatz, Palo Alto, CA (US)
`
`(73) Assignee: NVIDIA Corporation, Santa Clara, CA (US)
`
`(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 164 days.
`
`(21) Appl. No.: 10/608,346
`
`(22) Filed: Jun. 27, 2003
`
`(51) Int. Cl.
`G06F 15/00 (2006.01)
`G06F 12/02 (2006.01)
`G06F 9/46 (2006.01)
`G06T 1/00 (2006.01)
`(52) U.S. Cl. 345/501; 345/543; 718/102; 718/104
`(58) Field of Classification Search 345/501, 345/504, 520, 522, 420, 423, 503, 519, 506, 345/543, 557; 709/208, 231; 712/23, 28, 712/31-32; 718/102, 104
`See application file for complete search history.
`
`(56) References Cited
`
`U.S. PATENT DOCUMENTS
`
`5,818,469 A * 10/1998 Lawless et al. 345/522
`5,946,487 A * 8/1999 Dangelo 717/148
`6,088,044 A * 7/2000 Kwok et al. 345/505
`6,753,878 B1 * 6/2004 Heirich et al. 345/629
`6,765,571 B1 * 7/2004 Sowizral et al. 345/420
`2003/0140179 A1 * 7/2003 Wilt et al. 709/321
`2004/0160446 A1 * 8/2004 Gosalia et al. 345/503
`
`* cited by examiner
`
`Primary Examiner—Kee M. Tung
`(74) Attorney, Agent, or Firm—Patterson & Sheridan, LLP
`
`(57) ABSTRACT
`
`A graphics processor and method for executing a graphics program as a plurality of threads where each sample to be processed by the program is assigned to a thread. Although threads share processing resources within the programmable graphics processor, the execution of each thread can proceed independent of any other threads. For example, instructions in a second thread are scheduled for execution while execution of instructions in a first thread is stalled waiting for source data. Consequently, a first received sample (assigned to the first thread) may be processed after a second received sample (assigned to the second thread). A benefit of independently executing each thread is improved performance because a stalled thread does not prevent the execution of other threads.
`
`33 Claims, 9 Drawing Sheets
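The abstract's central idea, dispatching a ready thread while an older thread waits for source data, can be illustrated with a toy scheduler. This is a minimal Python model under stated assumptions, not the patented hardware: each thread carries a hypothetical `ready_cycle` at which its source data arrive, one instruction is dispatched per cycle, and the oldest ready thread is chosen first.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    """Hypothetical thread record used only in this sketch."""
    thread_id: int    # lower ID = sample received earlier
    sample: str
    ready_cycle: int  # cycle at which source data become available (assumed)


def dispatch_order(threads, max_cycles=1000):
    """Return (cycle, sample) pairs in dispatch order.

    Each cycle, the oldest thread whose source data are available is
    dispatched; a stalled thread never blocks a younger ready thread.
    """
    pending = sorted(threads, key=lambda t: t.thread_id)
    dispatched = []
    for cycle in range(max_cycles):
        if not pending:
            break
        ready = [t for t in pending if t.ready_cycle <= cycle]
        if ready:
            chosen = ready[0]  # pending is age-sorted, so this is the oldest ready thread
            pending.remove(chosen)
            dispatched.append((cycle, chosen.sample))
    return dispatched


# The first received sample waits for data until cycle 5; the second is ready at once.
print(dispatch_order([Thread(0, "first", 5), Thread(1, "second", 0)]))
# [(0, 'second'), (5, 'first')]
```

The second received sample is dispatched first, which is the out-of-order behavior the abstract describes.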
`
`[Front-page drawing: Execution Pipeline 240 containing Multithreaded Processing Unit 400, with Instruction Cache 410, Thread Control Buffer 420, Instruction Scheduler 430, Instruction Dispatcher 440, Register File 450, Resource Scoreboard 460, and Execution Unit; inputs from 215, 220, and 225; outputs to 225, 260, and 270.]
`
`

`

`[Drawing sheet 1 of 9, FIG. 1: Computing System 100, comprising Host Computer 110 (Host Processor 114, Host Memory 112, System Interface 115) coupled to Graphics Subsystem 170 (Graphics Processor 105 with Graphics Interface 117, Front End 130, IDX 135, Memory Controller 120, Programmable Graphics Processing Pipeline 150, and Raster Analyzer 160; Local Memory 140; Output Controller 180).]
`
`

`

`[Drawing sheet 2 of 9, FIG. 2: Programmable Graphics Processing Pipeline 150, with Vertex Input Buffer 220 (fed from IDX 135), four Execution Pipelines 240, Texture Unit 225, Texture Cache 230, Vertex Output Buffer 260, Primitive Assembly/Setup 205, Raster Unit 210, Pixel Input Buffer 215, Pixel Output Buffer 270, and connections to Memory Controller 120.]
`
`

`

`[Drawing sheet 3 of 9, FIG. 3: instruction Sequence 330 of program instructions 331 through 344, with Threads 350, 360, and 370 each pointing to an instruction in the sequence.]
`
`

`

`[Drawing sheet 4 of 9, FIG. 4: Execution Pipeline 240 containing Multithreaded Processing Unit 400, with Instruction Cache 410, Thread Control Buffer 420, Instruction Scheduler 430, Instruction Dispatcher 440, Register File 450, Resource Scoreboard 460, and Execution Unit; inputs from 215, 220, and 225; output to 225.]
`
`

`

`[Drawing sheet 5 of 9, FIG. 5A: flow chart with boxes Receive a Sample; Receive Another Sample; Source Data Available for the Sample?; Source Data Available for the Other Sample?; Dispatch Instruction for Processing the Sample; Dispatch Instruction for Processing the Other Sample.]
`
`

`

`[Drawing sheet 6 of 9, FIG. 5B: flow chart with boxes Receive Sample; Identify Thread Type Needed; PIOR disabled for pixel threads?; Pixel Thread Position Hazard?; Thread Available?; Assign Thread; Allocate Resources; Fetch Instruction; Can Schedule?; Update PC and Resource Scoreboard; Dispatch Instruction; Process Sample; More Instructions?; Deallocate Resources.]
`
`

`

`[Drawing sheet 7 of 9, FIG. 6: flow chart with boxes Instruction in window?; Timeout?; Remove instruction from window; Check synch mode; Synchronization enabled?; Instruction synched?; Remove instruction from window; Sort by thread age; Read scoreboard; Check resources; Schedule instruction; Update scoreboard; Update PC; Output instruction.]
`
`

`

`[Drawing sheet 8 of 9, FIG. 7A: flow chart with boxes Issue Function Call to enable PIOR; PIOR configuration complete. FIG. 7B: flow chart with boxes Configure enable PIOR; Configure to disable PIOR; Render intersecting objects; Render non-intersecting objects.]
`
`

`

`[Drawing sheet 9 of 9, FIG. 7C: flow chart with boxes Configure enable PIOR; Configure to disable PIOR; Render opaque objects; Render non-opaque objects.]
`
`

`

`
`1
`METHOD AND APPARATUS FOR
`MULTITHREADED PROCESSING OF DATA
`IN A PROGRAMMABLE GRAPHICS
`PROCESSOR
`
`FIELD OF THE INVENTION
`
`One or more aspects of the invention generally relate to
`multithreaded processing, and more particularly to process-
`ing graphics data in a programmable graphics processor.
`
`BACKGROUND
`
`Current graphics data processing is exemplified by systems and methods developed to perform a specific operation on several graphics data elements, e.g., linear interpolation, tessellation, texture mapping, depth testing. Traditionally, graphics processing systems were implemented as fixed function computation units, and more recently the computation units are programmable to perform a limited set of operations. In either system, the graphics data elements are processed in the order in which they are received by the graphics processing system. Within the graphics processing system, when a resource, e.g., a computation unit or data, required to process a graphics data element is unavailable, the processing of the element stalls, i.e., does not proceed, until the resource becomes available. Because the system is pipelined, the stall propagates back through the pipeline, stalling the processing of later received elements that may not require the resource and reducing the throughput of the system.
`For the foregoing reasons, there is a need for improved
`approaches to processing graphics data elements.
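The stall-propagation problem described above can be made concrete with a toy model. This Python sketch assumes at most one element issues per cycle and that each element has a single cycle at which its required resource becomes available; the numbers are illustrative and do not come from the patent.

```python
def in_order_issue(ready_cycles):
    """Issue cycle of each element when elements must issue strictly in
    arrival order, at most one per cycle.

    ready_cycles[i] is the (assumed) cycle at which the resource needed by
    the i-th received element becomes available.
    """
    issue_cycles = []
    next_free = 0  # earliest cycle at which another element may issue
    for ready in ready_cycles:
        cycle = max(ready, next_free)  # wait for the resource and for all older elements
        issue_cycles.append(cycle)
        next_free = cycle + 1
    return issue_cycles


# The first element waits until cycle 5; the next two were ready at cycle 0
# but are held behind it.
print(in_order_issue([5, 0, 0]))  # [5, 6, 7]
```

Two elements that needed nothing are delayed by five or more cycles only because they arrived behind a stalled one, which is the throughput loss the background describes.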
`
`SUMMARY
`
`The present invention is directed to a system and method
`that satisfies the need for a programmable graphics proces-
`sor that supports processing of graphics data elements in an
`order independent from the order in which the graphics data
`elements are received by the programmable graphics pro-
`cessing pipeline within the programmable graphics proces-
`sor.
`Various embodiments of the invention include a computing system comprising a host processor, a host memory, a system interface configured to interface with the host processor, and the programmable graphics processor for multithreaded execution of program instructions. The graphics processor includes at least one multithreaded processing unit configured to receive samples in a first order to be processed by program instructions associated with at least one thread. Each multithreaded processing unit includes a scheduler configured to receive the program instructions, determine availability of source data, and schedule the program instructions for execution in a second order independent of the first order. Each multithreaded processing unit further includes a resource tracking unit configured to track the availability of the source data, and a dispatcher configured to output the program instructions in the second order to be executed by the at least one multithreaded processing unit.
`Further embodiments of the invention include an application programming interface for a programmable graphics processor comprising a function call to configure a multithreaded processing unit within the programmable graphics processor to enable processing of samples independent of an order in which the samples are received.
`
`2
`Yet further embodiments of the invention include an application programming interface for a programmable graphics processor comprising a function call to configure a multithreaded processing unit within the programmable graphics processor to disable processing of samples independent of an order in which the samples are received.
`Various embodiments of a method of the invention include processing a first program instruction associated with a first thread and a second program instruction associated with a second thread. A first sample to be processed by a program instruction associated with a first thread is received before a second sample to be processed by a program instruction associated with a second thread is received. First source data required to process the program instruction associated with the first thread are determined to be not available. Second source data required to process the program instruction associated with the second thread are determined to be available. The program instruction associated with the second thread to process the second sample in the execution unit is dispatched prior to dispatching the program instruction associated with the first thread to process the first sample in the execution unit.
`Further embodiments of a method of the invention include using a function call to configure the graphics processor. Support for processing samples of at least one sample type independent of an order in which the samples are received by a multithreaded processing unit within the graphics processor is detected. The function call to configure the multithreaded processing unit within the graphics processor to enable processing of the samples independent of an order in which the samples are received is issued for the at least one sample type.
`Yet further embodiments of a method of the invention include rendering a scene using the graphics processor. The multithreaded processing unit within the graphics processor is configured to enable processing of samples independent of an order in which the samples are received. The multithreaded processing unit within the graphics processor processes the samples independent of the order in which the samples are received to render at least a portion of the scene.
`
`BRIEF DESCRIPTION OF THE VARIOUS
`VIEWS OF THE DRAWINGS
`
`Accompanying drawing(s) show exemplary embodiment(s) in accordance with one or more aspects of the present invention; however, the accompanying drawing(s) should not be taken to limit the present invention to the embodiment(s) shown, but are for explanation and understanding only.
`FIG. 1 illustrates one embodiment of a computing system according to the invention including a host computer and a graphics subsystem;
`FIG. 2 is a block diagram of an embodiment of the Programmable Graphics Processing Pipeline of FIG. 1;
`FIG. 3 is a conceptual diagram of the relationship between a program and threads;
`FIG. 4 is a block diagram of an embodiment of the Execution Pipeline of FIG. 2;
`FIGS. 5A and 5B illustrate embodiments of methods utilizing the Execution Pipeline illustrated in FIG. 4;
`FIG. 6 illustrates an embodiment of a method utilizing the Execution Pipeline illustrated in FIG. 4;
`FIGS. 7A, 7B, and 7C illustrate embodiments of methods utilizing the Computing System illustrated in FIG. 1.
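The function-call embodiments summarized above (detect support per sample type, then enable or disable processing of samples independent of the order received) might look like the following from a driver's point of view. Everything here is hypothetical: `MultithreadedUnit`, `set_pior`, and `enable_pior_where_supported` are illustrative names, and the "PIOR" abbreviation is borrowed from the labels in FIGS. 5B and 7A to 7C rather than from any published API.

```python
from enum import Enum


class SampleType(Enum):
    PRIMITIVE = "primitive"
    VERTEX = "vertex"
    PIXEL = "pixel"


class MultithreadedUnit:
    """Hypothetical stand-in for a multithreaded processing unit's
    configuration state; not a real driver interface."""

    def __init__(self, supported_types):
        self.supported_types = frozenset(supported_types)
        self.pior_enabled = set()  # sample types processed independent of order received

    def set_pior(self, sample_type, enable):
        """Hypothetical function call enabling or disabling PIOR for one sample type."""
        if enable and sample_type not in self.supported_types:
            raise ValueError(f"PIOR unsupported for {sample_type.value} samples")
        if enable:
            self.pior_enabled.add(sample_type)
        else:
            self.pior_enabled.discard(sample_type)


def enable_pior_where_supported(unit, requested_types):
    """Detect support first, then issue the enable call only for the
    requested sample types that the unit supports."""
    enabled = []
    for sample_type in requested_types:
        if sample_type in unit.supported_types:
            unit.set_pior(sample_type, True)
            enabled.append(sample_type)
    return enabled


unit = MultithreadedUnit(supported_types={SampleType.PIXEL})
print([t.value for t in enable_pior_where_supported(unit, [SampleType.VERTEX, SampleType.PIXEL])])  # ['pixel']
unit.set_pior(SampleType.PIXEL, False)  # e.g., before rendering work whose order must be preserved
print(unit.pior_enabled)  # set()
```

Checking support before issuing the call mirrors the order of steps in the summary; raising an error on an unsupported enable is a design choice of this sketch, not something the patent specifies.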
`
`

`

`3
`DISCLOSURE OF THE INVENTION
`The current invention involves new systems and methods for processing graphics data elements in an order independent from the order in which the graphics data elements are received by a multithreaded processing unit within a graphics processor.
`FIG. 1 is an illustration of a Computing System generally designated 100 and including a Host Computer 110 and a Graphics Subsystem 170. Computing System 100 may be a desktop computer, server, laptop computer, palm-sized computer, tablet computer, game console, cellular telephone, computer based simulator, or the like. Host Computer 110 includes Host Processor 114 which may include a system memory controller to interface directly to Host Memory 112 or may communicate with Host Memory 112 through a System Interface 115. System Interface 115 may be an I/O (input/output) interface or a bridge device including the system memory controller to interface directly to Host Memory 112. Examples of System Interface 115 known in the art include Intel® Northbridge and Intel® Southbridge.
`Host Computer 110 communicates with Graphics Subsystem 170 via System Interface 115 and a Graphics Interface 117 within a Graphics Processor 105. Data received at Graphics Interface 117 can be passed to a Front End 130 or written to a Local Memory 140 through Memory Controller 120. Graphics Processor 105 uses graphics memory to store graphics data and program instructions, where graphics data is any data that is input to or output from components within the graphics processor. Graphics memory can include portions of Host Memory 112, Local Memory 140, register files coupled to the components within Graphics Processor 105, and the like.
`Graphics Processor 105 includes, among other components, Front End 130 that receives commands from Host Computer 110 via Graphics Interface 117. Front End 130 interprets and formats the commands and outputs the formatted commands and data to an IDX (Index Processor) 135. Some of the formatted commands are used by Programmable Graphics Processing Pipeline 150 to initiate processing of data by providing the location of program instructions or graphics data stored in memory. IDX 135, Programmable Graphics Processing Pipeline 150 and a Raster Analyzer 160 each include an interface to Memory Controller 120 through which program instructions and data can be read from memory, e.g., any combination of Local Memory 140 and Host Memory 112. When a portion of Host Memory 112 is used to store program instructions and data, the portion of Host Memory 112 can be uncached so as to increase performance of access by Graphics Processor 105.
`IDX 135 optionally reads processed data, e.g., data written by Raster Analyzer 160, from memory and outputs the data, processed data and formatted commands to Programmable Graphics Processing Pipeline 150. Programmable Graphics Processing Pipeline 150 and Raster Analyzer 160 each contain one or more programmable processing units to perform a variety of specialized functions. Some of these functions are table lookup, scalar and vector addition, multiplication, division, coordinate-system mapping, calculation of vector normals, tessellation, calculation of derivatives, interpolation, and the like. Programmable Graphics Processing Pipeline 150 and Raster Analyzer 160 are each optionally configured such that data processing operations are performed in multiple passes through those units or in multiple passes within Programmable Graphics Processing Pipeline 150. Programmable Graphics Processing Pipeline 150 and a Raster Analyzer 160 also each include a write interface to Memory Controller 120 through which data can be written to memory.
`
`4
`In a typical implementation, Programmable Graphics Processing Pipeline 150 performs geometry computations, rasterization, and pixel computations. Therefore, Programmable Graphics Processing Pipeline 150 is programmed to operate on surface, primitive, vertex, fragment, pixel, sample, or any other data. A fragment is at least a portion of a pixel, i.e., a pixel includes at least one fragment. For simplicity, the remainder of this description will use the term "samples" to refer to surfaces, primitives, vertices, pixels, or fragments.
`Samples output by Programmable Graphics Processing Pipeline 150 are passed to a Raster Analyzer 160, which optionally performs near and far plane clipping and raster operations, such as stencil, z test, and the like, and saves the results or the samples output by Programmable Graphics Processing Pipeline 150 in Local Memory 140. When the data received by Graphics Subsystem 170 has been completely processed by Graphics Processor 105, an Output 185 of Graphics Subsystem 170 is provided using an Output Controller 180. Output Controller 180 is optionally configured to deliver data to a display device, network, electronic control system, other Computing System 100, other Graphics Subsystem 170, or the like.
`FIG. 2 is an illustration of Programmable Graphics Processing Pipeline 150 of FIG. 1. At least one set of samples is output by IDX 135 and received by Programmable Graphics Processing Pipeline 150, and the at least one set of samples is processed according to at least one program, the at least one program including graphics program instructions. A program can process one or more sets of samples. Conversely, a set of samples can be processed by a sequence of one or more programs.
`Samples, such as surfaces, primitives, or the like, are received from IDX 135 by Programmable Graphics Processing Pipeline 150 and stored in a Vertex Input Buffer 220 in a register file, FIFO (first in first out), cache, or the like (not shown). The samples are broadcast to Execution Pipelines 240, four of which are shown in the figure. Each Execution Pipeline 240 includes at least one multithreaded processing unit, to be described further herein. The samples output by Vertex Input Buffer 220 can be processed by any one of the Execution Pipelines 240. A sample is accepted by an Execution Pipeline 240 when a processing thread within the Execution Pipeline 240 is available, as described further herein. Each Execution Pipeline 240 signals to Vertex Input Buffer 220 when a sample can be accepted or when a sample cannot be accepted. In one embodiment, Programmable Graphics Processing Pipeline 150 includes a single Execution Pipeline 240 containing one multithreaded processing unit. In an alternative embodiment, Programmable Graphics Processing Pipeline 150 includes a plurality of Execution Pipelines 240.
`Execution Pipelines 240 can receive first samples, such as higher-order surface data, and tessellate the first samples to generate second samples, such as vertices. Execution Pipelines 240 can be configured to transform the second samples from an object-based coordinate representation (object space) to an alternatively based coordinate system such as world space or normalized device coordinates (NDC) space. Each Execution Pipeline 240 communicates with Texture Unit 225 using a read interface (not shown in FIG. 2) to read program instructions and graphics data such as texture maps from Local Memory 140 or Host Memory 112 via Memory Controller 120 and a Texture Cache 230. Texture Cache 230 is used to improve memory read performance by reducing
`
`

`

`
`5
`read latency. In an alternate embodiment, Texture Cache 230 is omitted. In another alternate embodiment, a Texture Unit 225 is included in each Execution Pipeline 240. In yet another alternate embodiment, program instructions are stored within Programmable Graphics Processing Pipeline 150.
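The accept/reject signaling between Vertex Input Buffer 220 and the Execution Pipelines 240 described above behaves like simple backpressure. This Python sketch assumes each pipeline has a fixed number of threads and that the buffer hands each sample to the first pipeline signalling that it can accept. The patent states only that any one of the pipelines may process a vertex sample, so this first-available policy, and the class and function names, are assumptions.

```python
from collections import deque


class ExecutionPipeline:
    """Toy stand-in for an Execution Pipeline 240 with a fixed pool of threads."""

    def __init__(self, num_threads):
        self.free_threads = num_threads
        self.accepted = []

    def can_accept(self):
        # The "can accept" / "cannot accept" signal: true while a thread is free.
        return self.free_threads > 0

    def accept(self, sample):
        self.free_threads -= 1  # the sample now occupies one thread
        self.accepted.append(sample)


def drain_input_buffer(buffer, pipelines):
    """Offer samples from the head of the input buffer to the pipelines.

    The first pipeline signalling that it can accept takes the sample
    (a hypothetical policy). Stops when every pipeline signals that it
    cannot accept; returns the number of samples left in the buffer.
    """
    while buffer:
        target = next((p for p in pipelines if p.can_accept()), None)
        if target is None:
            break  # all pipelines busy; remaining samples stay buffered
        target.accept(buffer.popleft())
    return len(buffer)


vertex_input_buffer = deque(["v0", "v1", "v2", "v3", "v4"])
pipelines = [ExecutionPipeline(num_threads=2), ExecutionPipeline(num_threads=2)]
left = drain_input_buffer(vertex_input_buffer, pipelines)
print(left, [p.accepted for p in pipelines])  # 1 [['v0', 'v1'], ['v2', 'v3']]
```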
`Execution Pipelines 240 output processed samples, such as vertices, that are stored in a Vertex Output Buffer 260 in a register file, FIFO, cache, or the like (not shown). Processed vertices output by Vertex Output Buffer 260 are received by a Primitive Assembly/Setup 205. This unit calculates parameters, such as deltas and slopes, to rasterize the processed vertices. Primitive Assembly/Setup 205 outputs parameters and samples, such as vertices, to Raster Unit 210. The Raster Unit 210 performs scan conversion on samples, such as vertices, and outputs samples, such as fragments, to a Pixel Input Buffer 215. Alternatively, Raster Unit 210 resamples processed vertices and outputs additional vertices to Pixel Input Buffer 215.
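The text says only that Primitive Assembly/Setup 205 computes "deltas and slopes" for rasterization. One common form of such setup, shown here as a Python sketch, computes per-edge coordinate deltas and the inverse slope dx/dy that a scanline rasterizer steps by; the specific parameters are an assumption, not the patent's disclosed implementation.

```python
def edge_setup(v0, v1):
    """Per-edge deltas and inverse slope dx/dy for the edge v0 -> v1.

    Vertices are (x, y) screen positions. A scan converter stepping one
    scanline at a time can advance x by dx/dy per unit step in y. The slope
    is None for a horizontal edge (dy == 0), which crosses no scanlines.
    """
    dx = v1[0] - v0[0]
    dy = v1[1] - v0[1]
    dxdy = dx / dy if dy != 0 else None
    return dx, dy, dxdy


def triangle_setup(v0, v1, v2):
    """Setup parameters for the three edges of a triangle."""
    return [edge_setup(a, b) for a, b in ((v0, v1), (v1, v2), (v2, v0))]


print(triangle_setup((0, 0), (4, 2), (2, 4)))
# [(4, 2, 2.0), (-2, 2, -1.0), (-2, -4, 0.5)]
```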
`Pixel Input Buffer 215 outputs the samples to each Execution Pipeline 240. Samples, such as pixels and fragments, output by Pixel Input Buffer 215 are each processed by only one of the Execution Pipelines 240. Pixel Input Buffer 215 determines which one of the Execution Pipelines 240 to output each sample to depending on an output pixel position, e.g., (x,y), associated with each sample. In this manner, each sample is output to the Execution Pipeline 240 designated to process samples associated with the output pixel position. In an alternate embodiment, each sample output by Pixel Input Buffer 215 is processed by an available Execution Pipeline 240.
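The patent does not give the mapping from output pixel position to Execution Pipeline. A minimal Python sketch, assuming four pipelines (matching the four shown in FIG. 2) that own pixels in an interleaved 2 by 2 screen pattern, could be:

```python
def pipeline_for_pixel(x, y, tiles_x=2, tiles_y=2):
    """Map an output pixel position (x, y) to an Execution Pipeline index.

    Hypothetical screen-interleaved scheme: pipelines own a repeating
    tiles_x-by-tiles_y pattern of pixels, so tiles_x * tiles_y pipelines
    (four by default) share the screen. The result depends only on (x, y).
    """
    return (x % tiles_x) + tiles_x * (y % tiles_y)


for pos in [(0, 0), (1, 0), (0, 1), (1, 1), (2, 2), (5, 3)]:
    print(pos, pipeline_for_pixel(*pos))  # indices 0, 1, 2, 3, 0, 3
```

Because the result is a pure function of (x, y), every fragment that lands on a given pixel is sent to the same pipeline, which is the property the paragraph above describes; the interleave pattern itself is an arbitrary choice.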
`A sample is accepted by an Execution Pipeline 240 when a processing thread within the Execution Pipeline 240 is available, as described further herein. Each Execution Pipeline 240 signals to Pixel Input Buffer 215 when a sample can be accepted or when a sample cannot be accepted. Program instructions associated with a thread configure programmable computation units within an Execution Pipeline 240 to perform operations such as texture mapping, shading, blending, and the like. Processed samples are output from each Execution Pipeline 240 to a Pixel Output Buffer 270. Pixel Output Buffer 270 optionally stores the processed samples in a register file, FIFO, cache, or the like (not shown). The processed samples are output from Pixel Output Buffer 270 to Raster Analyzer 160.
`Execution Pipelines 240 are optionally configured using program instructions read by Texture Unit 225 such that data processing operations are performed in multiple passes through at least one multithreaded processing unit, to be described further herein, within Execution Pipelines 240. Intermediate data generated during multiple passes can be stored in graphics memory.
`FIG. 3 is a conceptual diagram illustrating the relationship between a program and threads. A single program is used to process several sets of samples. Each program, such as a vertex program or shader program, includes a sequence of program instructions, such as a Sequence 330 of program instructions 331 to 344. The at least one multithreaded processing unit within an Execution Pipeline 240 supports multithreaded execution. Therefore, the program instructions in instruction Sequence 330 can be used by the at least one multithreaded processing unit to process each sample or each group of samples independently, i.e., the at least one multithreaded processing unit may process each sample asynchronously relative to other samples. For example, each fragment or group of fragments within a primitive can be processed independently from the other fragments or from
`
`6
`the other groups of fragments within the primitive. Likewise, each vertex within a surface can be processed independently from the other vertices within the surface. For a set of samples being processed using the same program, the sequence of program instructions associated with each thread used to process each sample within the set will be identical. However, it is possible that, during execution, the threads processing some of the samples within a set will diverge following the execution of a conditional branch instruction. After the execution of a conditional branch instruction, the sequence of executed instructions associated with each thread processing samples within the set may differ.
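Divergence after a conditional branch follows from each thread keeping its own program counter. The Python sketch below runs one hypothetical five-instruction program for two samples; the opcodes (`add`, `mul`, `brz` for branch-if-zero, `halt`) are invented for illustration and are not the patent's instruction set.

```python
# Hypothetical program: (opcode, operand) pairs indexed by program counter.
PROGRAM = [
    ("add", 1),      # pc 0: value += 1
    ("brz", 3),      # pc 1: if value == 0, jump to pc 3
    ("mul", 2),      # pc 2: value *= 2
    ("add", 10),     # pc 3: value += 10
    ("halt", None),  # pc 4
]


def run_thread(sample):
    """Run PROGRAM for one sample using this thread's own program counter.

    Returns the program counters executed, in order.
    """
    pc, value, trace = 0, sample, []
    while True:
        trace.append(pc)
        op, arg = PROGRAM[pc]
        if op == "halt":
            return trace
        if op == "add":
            value += arg
        elif op == "mul":
            value *= arg
        elif op == "brz" and value == 0:
            pc = arg  # branch taken: this thread's sequence now differs
            continue
        pc += 1


print(run_thread(5))   # [0, 1, 2, 3, 4]
print(run_thread(-1))  # [0, 1, 3, 4]
```

Both threads share one instruction sequence, yet the sample that makes the branch condition true skips program counter 2.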
`In FIG. 3, program instructions within instruction Sequence 330 are stored in graphics memory, i.e., Host Memory 112, Local Memory 140, register files coupled to the components within Graphics Processor 105, and the like. Each program counter (0 through 13) in instruction Sequence 330 corresponds to a program instruction within instruction Sequence 330. The program counters are conventionally numbered sequentially and can be used as an index to locate a specific program instruction within Sequence 330. The first instruction 331 in Sequence 330 is the program instruction corresponding to program counter 0. A base address, corresponding to the graphics memory location where the first instruction 331 in a program is stored, can be used in conjunction with a program counter to determine the location where a program instruction corresponding to the program counter is stored.
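The base-address computation can be illustrated with a short Python sketch. It assumes fixed-length instructions stored contiguously; the 16-byte instruction size and the base address are hypothetical values, since the patent does not specify an encoding.

```python
INSTRUCTION_BYTES = 16  # hypothetical fixed instruction size


def instruction_address(base_address, program_counter, instr_bytes=INSTRUCTION_BYTES):
    """Graphics-memory address of the instruction at program_counter,
    assuming instructions are stored contiguously from base_address."""
    if program_counter < 0:
        raise ValueError("program counter must be non-negative")
    return base_address + program_counter * instr_bytes


base = 0x1000  # hypothetical location of the first instruction (331)
print(hex(instruction_address(base, 0)))   # 0x1000, first instruction 331
print(hex(instruction_address(base, 11)))  # 0x10b0, instruction 342 (PC 11)
```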
`In this example, program instructions within Sequence 330 are associated with three threads. A Thread 350, a Thread 360, and a Thread 370 are each assigned to a different sample, and each thread is uniquely identified by a thread identification code. A program instruction within Sequence 330 is associated with a thread using a program counter that is stored as a portion of thread state data, as described further herein. Thread 350 thread state data includes a program counter of 1, as shown in Sequence 330. The program counter associated with Thread 350 is a pointer to the program instruction in Sequence 330 corresponding to program counter 1 and stored at location 332. The instruction stored at location 332 is the next instruction to be used to process the sample assigned to Thread 350. Alternatively, an instruction stored at location 332 is the most recently executed instruction to process the sample assigned to Thread 350.
`The thread state data for Thread 360 and Thread 370 each include a program counter of 11, as shown in FIG. 3, referencing the program instruction corresponding to program counter 11 in Sequence 330 and stored at location 342. Program counters associated with threads to process samples within a primitive, surface, or the like, are not necessarily identical because the threads can be executed independently. When branch instructions are not used, Thread 350, Thread 360, and Thread 370 each execute all of the program instructions in Sequence 330.
`The number of threads that can be executed simultaneously is limited to a predetermined number in each embodiment and is related to the number of Execution Pipelines 240, the amount of storage required for thread state data, the latency of Execution Pipelines 240, and the like.
`Each sample is a specific type, e.g., primitive, vertex, or pixel, corresponding to a program type. A primitive type sample, e.g., primitive, is processed by a primitive program, a vertex type sample, e.g., surface or vertex, is processed by a vertex program, and a pixel type sample, e.g., fragment or
`pixel, is processed by a shader program. Likewise, a primi-
`
`

`

`
`7
`tive thread is associated with program instructions within a
`primitive program, a vertex thread is associated with pro-
`gram instructions within a vertex program, and a pixel
`thread is associated with program instructions within a
`shader program.
`A number of threads of each thread type that may be executed simultaneously is predetermined in each embodiment. Therefore, not all samples within a set of samples of a type can be processed simultaneously when the number of threads of the type is less than the number of samples. Conversely, when the number of threads of a type exceeds the number of samples of the type within a set, more than one set can be processed simultaneously. Furthermore, when the number of threads of a type exceeds the number of samples of the type within one or more sets, more than one program of the type can be executed on the one or more sets and the thread state data can include data indicating the program associated with each thread.
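The counting argument in the paragraph above can be sketched in Python. The thread count and set sizes are hypothetical, and the sketch ignores threads being freed while others are still running; it shows only what can start at the same time.

```python
def samples_waiting(set_size, threads_of_type):
    """Samples in a set that cannot start at once because every thread of
    their type is already assigned."""
    return max(0, set_size - threads_of_type)


def whole_sets_in_flight(set_sizes, threads_of_type):
    """How many leading sets (in arrival order) can have every sample
    assigned to a thread at the same time."""
    count, used = 0, 0
    for size in set_sizes:
        if used + size > threads_of_type:
            break
        used += size
        count += 1
    return count


THREADS = 32  # hypothetical predetermined number of threads of one type
print(samples_waiting(100, THREADS))                # 68
print(whole_sets_in_flight([10, 12, 20], THREADS))  # 2
```

With fewer threads than samples, part of a set must wait; with more threads than samples in a set, a second set fits alongside the first.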
`FIG. 4 is an illustration of a Execution Pipeline 240
`containing at least one Multithreaded Processing Unit 400.
`A Execution Pipeline 240 can contain a plurality of Multi-
`threaded Processing Units 400. Within each Multithreaded
`Processing Unit 400, a Thread Control Buffer 420 receives
`samples from Pixel Input Buffer 215 or Vertex Input Buffer
`220. Thread Control Buffer 420 includes storage resources
`to retain thread state data for a subset of the predetermined
`number of threads. In one embodiment, Thread Control
`Buffer 420 includes storage resources for each of at least two
`thread types, where the at least two thread types can include
`pixel, primitive, and vertex. At least a portion of Thread
`Control Buffer 420 is a register file, FIFO, circular buffer, or
`the like. Thread state data for a thread can include, among
`other things, a program counter, a busy flag that indicates if
`the thread is either assigned to a sample or available to be
`assigned to a sample, a pointer to the source sample to be
`processed by the instructions associated with the thread or
`the output pixel position and output buffer ID of the sample
`to be processed, and a pointer specifying a destination
`location in Vertex Output Buffer 260 or Pixel Output Buffer
`270. Additionally, thread state data for a thread assigned to
`a sample can include the sample type, e.g., pixel, vertex,
`primitive, or the like.
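The thread state data listed above maps naturally onto a record type, with the Thread Control Buffer holding separate storage for each thread type. The Python sketch below is a hypothetical software model, not the hardware design; `ThreadState`, `ThreadControlBuffer`, and all field names are assumptions chosen to mirror the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ThreadState:
    """State data for one thread (field names are illustrative)."""
    program_counter: int = 0
    busy: bool = False  # True once the thread is assigned to a sample
    # Either a source-sample pointer, or an output pixel position plus buffer ID.
    source_sample: Optional[int] = None
    output_position: Optional[Tuple[int, int]] = None
    output_buffer_id: Optional[int] = None
    destination: Optional[int] = None  # location in an output buffer
    sample_type: Optional[str] = None  # "pixel", "vertex", "primitive", ...


class ThreadControlBuffer:
    """Separate state storage per thread type, each sized to a subset of the limit."""

    def __init__(self, slots_per_type: Dict[str, int]):
        self.slots: Dict[str, List[ThreadState]] = {
            t: [ThreadState() for _ in range(n)] for t, n in slots_per_type.items()
        }

    def available(self, thread_type: str) -> int:
        """Count threads of this type whose busy flag is clear."""
        return sum(not s.busy for s in self.slots[thread_type])


tcb = ThreadControlBuffer({"pixel": 4, "vertex": 2})
tcb.slots["vertex"][0] = ThreadState(busy=True, source_sample=17,
                                     destination=3, sample_type="vertex")
print(tcb.available("vertex"), tcb.available("pixel"))  # 1 4
```

Keeping one slot list per thread type reflects the embodiment in which storage is reserved for each of at least two thread types, so one type cannot exhaust another's state storage.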
`The source sample is stored in either Pixel Input Buffer
`215 or Vertex Input Buffer 220. When a thread is assigned
`to a sample, the thread is allocated storage resources to
`retain intermediate data generated during execution of
`program instructions associated with the thread. The thread
`identification code for a thread may be the address of a
`location in Thread Control Buffer 420 in which the thread
`state data for the thread is stored. In one embodiment,
`priority is specified for each thread type and Thread Control
`Buffer 420 is configured to assign threads to samples or
`allocate storage resources based on the priority assigned to
`each thread type. In an alternate embodiment, Thread
`Control Buffer 420 is configured to assign threads to samples or
`allocate storage resources based on an amount of sample
`data in Pixel Input Buffer 215 and another amount of sample
`data in Vert
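The priority-based embodiment, together with the idea that a thread's identification code can be the address of its slot in Thread Control Buffer 420, can be sketched as follows. This hypothetical Python model (`ThreadAllocator` and its methods are invented for illustration) assumes that a larger priority value means a thread type is served first and that free slots are handed out lowest address first. It does not model the alternate embodiment based on the amount of sample data in each input buffer.

```python
from typing import Dict, List, Optional, Tuple


class ThreadAllocator:
    """Hypothetical model: a flat slot array in which a thread's ID is its slot address."""

    def __init__(self, num_slots: int, priority: Dict[str, int]):
        self.slots: List[Optional[Tuple[str, int]]] = [None] * num_slots
        self.priority = priority  # assumed convention: higher value is served first

    def assign(self, pending: Dict[str, List[int]]) -> List[Tuple[int, str, int]]:
        """Assign free slots to pending samples, highest-priority thread type first.

        Consumes samples from the queues in `pending` and returns a
        (thread_id, thread_type, sample) tuple for each new assignment.
        """
        new: List[Tuple[int, str, int]] = []
        free = [i for i, s in enumerate(self.slots) if s is None]
        for thread_type in sorted(pending, key=lambda t: self.priority[t], reverse=True):
            queue = pending[thread_type]
            while free and queue:
                thread_id = free.pop(0)  # the slot address doubles as the thread ID
                sample = queue.pop(0)
                self.slots[thread_id] = (thread_type, sample)
                new.append((thread_id, thread_type, sample))
        return new

    def release(self, thread_id: int) -> None:
        """Free a slot when its thread finishes."""
        self.slots[thread_id] = None


alloc = ThreadAllocator(3, {"vertex": 2, "pixel": 1})
print(alloc.assign({"pixel": [10, 11], "vertex": [20, 21]}))
# [(0, 'vertex', 20), (1, 'vertex', 21), (2, 'pixel', 10)]
```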