(12) United States Patent
Morein et al.

(10) Patent No.: US 8,760,454 B2
(45) Date of Patent: *Jun. 24, 2014

(54) GRAPHICS PROCESSING ARCHITECTURE EMPLOYING A UNIFIED SHADER

(75) Inventors: Stephen L. Morein, Cambridge, MA (US); Laurent Lefebvre, Lachenaie (CA); Andrew E. Gruber, Arlington, MA (US); Andi Skende, Shrewsbury, MA (US)

(73) Assignee: ATI Technologies ULC, Markham, Ontario (CA)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.
This patent is subject to a terminal disclaimer.

(21) Appl. No.: 13/109,738

(22) Filed: May 17, 2011

(65) Prior Publication Data
US 2011/0216077 A1, Sep. 8, 2011

Related U.S. Application Data
(63) Continuation of application No. 12/791,597, filed on Jun. 1, 2010, now abandoned, which is a continuation of application No. 11/842,256, filed on Aug. 21, 2007, now abandoned, which is a continuation of application No. 11/117,863, filed on Apr. 29, 2005, now Pat. No. 7,327,369, which is a continuation of application No. 10/718,318, filed on Nov. 20, 2003, now Pat. No. 6,897,871.

(51) Int. Cl.
G06F 5/00 (2006.01)

(52) U.S. Cl.
USPC 345/501

(58) Field of Classification Search
CPC G06T 15/005; G06T 15/80; G06T 1/20; G06T 1/60; G09G 5/363; G06F 3/14
USPC 345/501
See application file for complete search history.
(56) References Cited

U.S. PATENT DOCUMENTS
5,485,559 A 1/1996 Sakaibara et al.
5,550,962 A 8/1996 Nakamura et al.
(Continued)

FOREIGN PATENT DOCUMENTS
EP 2296116 A2 3/2011
EP 2299408 A2 3/2011
EP 2309460 A1 4/2011

OTHER PUBLICATIONS
European Patent Office Examination Report; EP Application No. 04798938.9; dated Nov. 9, 2006; pp. 1-3.
(Continued)
Primary Examiner: Kee M. Tung
Assistant Examiner: Frank Chen
(74) Attorney, Agent, or Firm: Faegre Baker Daniels LLP
(57) ABSTRACT
A graphics processing architecture in one example performs vertex manipulation operations and pixel manipulation operations by transmitting vertex data to a general purpose register block, and performing vertex operations on the vertex data by a processor unless the general purpose register block does not have enough available space therein to store incoming vertex data; and continues pixel calculation operations that are to be or are currently being performed by the processor based on instructions maintained in an instruction store until enough registers within the general purpose register block become available.

11 Claims, 5 Drawing Sheets
`
[Representative drawing (FIG. 4A): indices, arbiter, MUX, unified shader, vertex texture cache, memory, memory controller, render back end, parameter and position cache, primitive assembly, rasterization engine, display controller, display.]
`TCL 1001
`
`
`
`
(56) References Cited

U.S. PATENT DOCUMENTS
5,818,469 A 10/1998 Lawless et al.
6,118,452 A 9/2000 Gannett
6,353,439 B1 3/2002 Lindholm et al.
6,384,824 B1 5/2002 Morgan et al.
6,417,858 B1 7/2002 Bosch et al.
6,573,893 B1 6/2003 Naqvi et al.
6,650,327 B1 11/2003 Airey et al.
6,650,330 B2 11/2003 Lindholm et al.
6,697,074 B2 2/2004 Parikh et al. ........ 345/522
6,704,018 B1 3/2004 Mori et al.
6,724,394 B1 4/2004 Zatz et al.
6,731,289 B1 5/2004 Peercy et al.
6,809,732 B2 10/2004 Zatz et al.
6,864,893 B2 3/2005 Zatz
6,897,871 B1 5/2005 Morein et al.
6,980,209 B1 12/2005 Donham et al.
7,015,909 B1 3/2006 Morgan, III et al.
7,015,913 B1 3/2006 Lindholm et al.
7,038,685 B1 5/2006 Lindholm ........ 345/426
7,239,322 B2 7/2007 Lefebvre et al.
7,327,369 B2 2/2008 Morein et al.
7,646,817 B2 1/2010 Shen et al.
7,742,053 B2 6/2010 Lefebvre et al.
7,746,348 B2 6/2010 Lefebvre et al.
2003/0030643 A1 2/2003 Taylor et al.
2003/0076320 A1 4/2003 Collodi
2003/0164830 A1 9/2003 Kent
2004/0041814 A1 3/2004 Wyatt et al.
2004/0164987 A1 8/2004 Aronson et al.
2005/0068325 A1 3/2005 Lefebvre et al.
2005/0200629 A1 9/2005 Morein et al.
2007/0222785 A1 9/2007 Lefebvre et al.
2007/0222786 A1 9/2007 Lefebvre et al.
2007/0222787 A1 9/2007 Lefebvre et al.
2007/0285427 A1 12/2007 Morein et al.
2010/0156915 A1 6/2010 Lefebvre et al.
2010/0231592 A1 9/2010 Morein et al.
`
OTHER PUBLICATIONS
Purcell, Timothy J. et al.; Ray Tracing on Programmable Graphics Hardware; SIGGRAPH '02; San Antonio, TX; ACM Transactions on Graphics; Jul. 2002; vol. 21, No. 3; pp. 703-712.
Mark, William R. et al.; CG: A System for programming graphics hardware in a C-like language; SIGGRAPH '03; San Diego, CA; ACM Transactions on Graphics; Jul. 2002; vol. 22, No. 3; pp. 896-907.
Breternitz, Jr., Mauricio et al.; Compilation, Architectural Support, and Evaluation of SIMD Graphics Pipeline Programs on a General Purpose CPU; IEEE; 2003; pp. 1-11.
International Search Report and Written Opinion; International Application No. PCT/IB2004/003821; dated Mar. 22, 2005.
EP Supplemental Search Report; EP Application No. 10075688.1; dated Feb. 25, 2011.
EP Supplemental Search Report; EP Application No. 10075686.5; dated Feb. 25, 2011.
EP Supplemental Search Report; EP Application No. 10075687.3; dated Feb. 25, 2011.
EP Supplemental Search Report; EP Application No. 10075685.7; dated Feb. 25, 2011.
Eldridge, Matthew et al.; Pomegranate: A Fully Scalable Graphics Architecture; Computer Graphics, SIGGRAPH 2000 Conference Proceedings; Jul. 23, 2000.
Owens, John D. et al.; Polygon Rendering on a Stream Architecture; Proceedings 2000 SIGGRAPH/Eurographics Workshop on Graphics Hardware; Aug. 21, 2000.
Chinese Office Action; Chinese Application No. 2004800405708; dated Sep. 2008.
Chinese Office Action; Chinese Application No. 2004800405708; dated Nov. 2009.
Chinese Office Action; Chinese Application No. 2004800405708; dated Sep. 2010.
`* cited by examiner
`
`
`
`
[Drawing Sheet 1 of 5: FIG. 1 (PRIOR ART), conventional shader 10 receiving shape (x, y, z), color and texture inputs.]
[Drawing Sheet 2 of 5: FIG. 2A (PRIOR ART) and FIG. 2B (PRIOR ART), cube 30 before and after perspective correction.]
[Drawing Sheet 3 of 5: FIG. 3 (PRIOR ART), graphics processor 40 with memory 55, texture map 57, vertex fetch 42, vertex cache 44, vertex shader 46, vertex store 48, primitive assembly 50, rasterization engine 52, texture cache 56, pixel shader 54 and post raster processing.]
[Drawing Sheet 4 of 5: FIG. 4A, graphics processor 60 with indices input, arbiter 64, multiplexer 66, unified shader 62, vertex texture cache 68, parameter and position cache 70, primitive assembly 72, rasterization engine 74, render back end 76, memory controller 78, display controller, memory 82 and display 84.]
[Drawing Sheet 5 of 5: FIG. 4B, vertex fetch 61A and vertex cache 61B supplying indices; unified shader detail with input from the MUX, memory fetch 67, constants, source registers B and C, CPU 96 and scalar section 96A.]
`
`
`
`GRAPHICS PROCESSING ARCHITECTURE
`EMPLOYING A UNIFIED SHADER
`
`
`RELATED APPLICATIONS
`
`
This application is a continuation of co-pending U.S. application Ser. No. 12/791,597, filed Jun. 1, 2010, entitled “GRAPHICS PROCESSING ARCHITECTURE EMPLOYING A UNIFIED SHADER,” having as inventors Steven Morein et al., owned by instant assignee and incorporated herein by reference, which is a continuation of co-pending U.S. application Ser. No. 11/842,256, filed Aug. 21, 2007, entitled “GRAPHICS PROCESSING ARCHITECTURE EMPLOYING A UNIFIED SHADER,” having as inventors Steven Morein et al., owned by instant assignee and incorporated herein by reference, which is a continuation of U.S. application Ser. No. 11/117,863, filed Apr. 29, 2005, which has issued into U.S. Pat. No. 7,327,369, entitled “GRAPHICS PROCESSING ARCHITECTURE EMPLOYING A UNIFIED SHADER,” having as inventors Steven Morein et al., owned by instant assignee and incorporated herein by reference, which is a continuation of U.S. application Ser. No. 10/718,318, filed on Nov. 20, 2003, which has issued into U.S. Pat. No. 6,897,871, entitled “GRAPHICS PROCESSING ARCHITECTURE EMPLOYING A UNIFIED SHADER,” having as inventors Steven Morein et al., owned by instant assignee and incorporated herein by reference.
`
`FIELD OF THE INVENTION
`
The present invention generally relates to graphics processors and, more particularly, to a graphics processor architecture employing a single shader.
`
`BACKGROUND OF THE INVENTION
`
`
In computer graphics applications, complex shapes and structures are formed through the sampling, interconnection and rendering of simpler objects, referred to as primitives. An example of such a primitive is a triangle or another suitable polygon. These primitives, in turn, are formed by the interconnection of individual pixels. Color and texture are then applied to the individual pixels that comprise the shape based on their location within the primitive and the primitive's orientation with respect to the generated shape, thereby generating the object that is rendered to a corresponding display for subsequent viewing.
The interconnection of primitives and the application of color and textures to generated shapes are generally performed by a graphics processor. Conventional graphics processors include a series of shaders that specify how, and with what corresponding attributes, a final image is drawn on a screen or other suitable display device. As illustrated in FIG. 1, a conventional shader 10 can be represented as a processing block 12 that accepts a plurality of bits of input data, such as, for example, object shape data (14) in object space (x, y, z); material properties of the object, such as color (16); texture information (18); luminance information (20); and viewing angle information (22), and provides output data (28) representing the object with texture and other appearance properties applied thereto (x, y, z).
In exemplary fashion, as illustrated in FIGS. 2A-2B, the shader accepts the vertex coordinate data representing cube 30 (FIG. 2A) as inputs and provides data representing, for example, a perspectively corrected view of the cube 30' (FIG. 2B) as an output. The corrected view may be provided, for example, by applying an appropriate transformation matrix to the data representing the initial cube 30. More specifically, the representation illustrated in FIG. 2B is provided by a vertex shader that accepts as inputs the data representing, for example, vertices V1, V2 and V3, among others, of cube 30 and provides angularly oriented vertices V1', V2' and V3', including any appearance attributes of corresponding cube 30'.
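The "appropriate transformation matrix" step can be illustrated with a minimal sketch, assuming a 4x4 row-major matrix applied to homogeneous (x, y, z, 1) coordinates followed by a divide by w. The matrix below is a hypothetical toy projection chosen only to show that matching corners farther from the viewer land closer to the center of the view; it is not taken from the patent.

```python
def transform_vertex(matrix, vertex):
    """Apply a 4x4 row-major matrix to an (x, y, z) vertex, then divide by w."""
    x, y, z = vertex
    v = (x, y, z, 1.0)  # homogeneous coordinates
    out = [sum(matrix[row][col] * v[col] for col in range(4)) for row in range(4)]
    w = out[3]
    return (out[0] / w, out[1] / w, out[2] / w)


# Hypothetical toy projection: copies x, y, z and sets w = -z, so points
# farther down the -z axis shrink toward the center of the view.
# (Every z maps to -1, i.e. this toy matrix does not preserve depth.)
PERSPECTIVE = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, -1.0, 0.0],
]

# Matching corners on the near face (z = -2) and far face (z = -3) of a cube.
near_corner = transform_vertex(PERSPECTIVE, (1.0, 1.0, -2.0))
far_corner = transform_vertex(PERSPECTIVE, (1.0, 1.0, -3.0))
print(near_corner, far_corner)
```

A real vertex shader would also carry the appearance attributes along with each transformed position. Only the geometric part of the transform is shown here.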
In addition to the vertex shader discussed above, a shader processing block that operates on the pixel level, referred to as a pixel shader, is also used when generating an object for display. Generally, the pixel shader provides the color value associated with each pixel of a rendered object. Conventionally, the vertex shader and pixel shader are separate components, each configured to perform only a single transformation or operation. Thus, in order to perform a position and a texture transformation of an input, at least two shading operations, and hence at least two shaders, need to be employed. Conventional graphics processors require the use of both a vertex shader and a pixel shader in order to generate an object. Because both types of shaders are required, known graphics processors are relatively large in size, with most of the real estate being taken up by the vertex and pixel shaders.
In addition to the real estate penalty associated with conventional graphics processors, there is also a corresponding performance penalty. In conventional graphics processors, the vertex shader and the pixel shader are juxtaposed in a sequential, pipelined fashion, with the vertex shader positioned before the pixel shader and operating on vertex data before the pixel shader can operate on individual pixel data. As a result, pixel processing depends on the completion of vertex processing, and whichever shader is not the current bottleneck can sit idle.
Thus, there is a need for an improved graphics processor employing a shader that is both space efficient and computationally effective.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
The present invention and the associated advantages and features thereof will become better understood and appreciated upon review of the following detailed description of the invention, taken in conjunction with the following drawings, where like numerals represent like elements, in which:
FIG. 1 is a schematic block diagram of a conventional shader;
FIGS. 2A-2B are graphical representations of the operations performed by the shader illustrated in FIG. 1;
FIG. 3 is a schematic block diagram of a conventional graphics processor architecture;
FIG. 4A is a schematic block diagram of a graphics processor architecture according to the present invention;
FIG. 4B is a schematic block diagram of an optional input component to the graphics processor according to an alternate embodiment of the present invention; and
FIG. 5 is an exploded schematic block diagram of the unified shader employed in the graphics processor illustrated in FIG. 4A.
`
`DETAILED DESCRIPTION OF THE INVENTION
`
Briefly stated, the present invention is directed to a graphics processor that employs a unified shader capable of performing both the vertex operations and the pixel operations in a space-saving and computationally efficient manner. In an exemplary embodiment, a graphics processor according to the present invention includes an arbiter circuit for selecting one of a plurality of inputs for processing in response to a control signal; and a shader, coupled to the arbiter, operative to process the selected one of the plurality of inputs, the shader including means for performing vertex operations and pixel operations, and wherein the shader performs one of the vertex operations or pixel operations based on the selected one of the plurality of inputs.
The shader includes a general purpose register block for storing at least the plurality of selected inputs, a sequencer for storing logical and arithmetic instructions that are used to perform vertex and pixel manipulation operations, and a processor capable of executing both floating point arithmetic and logical operations on the selected inputs according to the instructions maintained in the sequencer. The shader of the present invention is referred to as a “unified” shader because it is configured to perform both vertex and pixel operations. By employing the unified shader of the present invention, the associated graphics processor is more space efficient than conventional graphics processors, because the unified shader takes up less real estate than the conventional multi-shader processor architecture.
In addition, according to the present invention, the unified shader is more computationally efficient because it allows the shader to be flexibly allocated to pixels or vertices based on workload.
Referring now to FIG. 3, illustrated therein is a graphics processor incorporating a conventional pipeline architecture. As shown, the graphics processor 40 includes a vertex fetch block 42, which receives vertex information relating to a primitive to be rendered from an off-chip memory 55 on line 41. The fetched vertex data is then transmitted to a vertex cache 44 for storage on line 43. Upon request, the vertex data maintained in the vertex cache 44 is transmitted to a vertex shader 46 on line 45. As discussed above, examples of the information that is requested by and transmitted to the vertex shader 46 include the object shape, material properties (e.g., color), texture information, and viewing angle. Generally, the vertex shader 46 is a programmable mechanism which applies a transformation position matrix to the input position information (obtained from the vertex cache 44), thereby providing data representing a perspectively corrected image of the object to be rendered, along with any texture or color coordinates thereof.
After performing the transformation operation, the data representing the transformed vertices are then provided to a vertex store 48 on line 47. The vertex store 48 then transmits the modified vertex information contained therein to a primitive assembly block 50 on line 49. The primitive assembly block 50 assembles, or converts, the input vertex information into a plurality of primitives to be subsequently processed. Suitable methods of assembling the input vertex information into primitives are known in the art and will not be discussed in greater detail here. The assembled primitives are then transmitted to a rasterization engine 52, which converts the previously assembled primitives into pixel data through a process referred to as walking. The resulting pixel data is then transmitted to a pixel shader 54 on line 53.
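The description names the rasterization process "walking" but gives no details. As a stand-in, the sketch below uses a common software technique: it walks the triangle's bounding box and keeps each pixel whose center passes three edge-function tests. This is an assumption for illustration, not necessarily how rasterization engine 52 works. The sketch also assumes counter-clockwise vertex order and applies no top-left fill rule.

```python
import math


def edge(ax, ay, bx, by, px, py):
    """Edge function: positive when (px, py) lies to the left of A->B."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def rasterize_triangle(v0, v1, v2):
    """Walk the triangle's bounding box and keep pixels whose centers are inside.

    Assumes counter-clockwise vertex order; pixels whose centers fall exactly
    on an edge are kept (no top-left fill rule).
    """
    xs = [v[0] for v in (v0, v1, v2)]
    ys = [v[1] for v in (v0, v1, v2)]
    covered = []
    for py in range(math.floor(min(ys)), math.floor(max(ys)) + 1):
        for px in range(math.floor(min(xs)), math.floor(max(xs)) + 1):
            cx, cy = px + 0.5, py + 0.5  # sample at the pixel center
            if (edge(*v0, *v1, cx, cy) >= 0
                    and edge(*v1, *v2, cx, cy) >= 0
                    and edge(*v2, *v0, cx, cy) >= 0):
                covered.append((px, py))
    return covered


pixels = rasterize_triangle((0, 0), (4, 0), (0, 4))
print(len(pixels), pixels[:4])
```

Each covered pixel corresponds to one unit of pixel data that would be passed on to the pixel shader.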
The pixel shader 54 generates the color and additional appearance attributes that are to be applied to a given pixel, and applies the appearance attributes to the respective pixels. In addition, the pixel shader 54 is capable of fetching texture data from a texture map 57, as indexed by the pixel data from the rasterization engine 52, by transmitting such information on line 55 to the texture map. The requested texture data is then transmitted back from the texture map 57 on line 57 and stored in a texture cache 56 before being routed to the pixel shader on line 58. Once the texture data has been received, the pixel shader 54 performs specified logical or arithmetic operations on the received texture data to generate the pixel color or other appearance attribute of interest. The generated pixel appearance attribute is then combined with a base color, as provided by the rasterization engine on line 53, to provide a pixel color for the pixel at the position of interest. The pixel appearance attribute present on line 59 is then transmitted to post raster processing blocks (not shown).
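Here is a small sketch of the texture fetch and the combine step. It makes two assumptions: texture coordinates select a texel by nearest-neighbour lookup, and the texel is combined with the base color by componentwise multiplication ("modulation"). The description does not say which combine operation is used, and the checkerboard texture is hypothetical.

```python
def fetch_texel(texture, u, v):
    """Nearest-neighbour lookup; texture is a list of rows of RGB tuples, u and v in [0, 1]."""
    height, width = len(texture), len(texture[0])
    x = min(int(u * width), width - 1)
    y = min(int(v * height), height - 1)
    return texture[y][x]


def shade_pixel(texture, u, v, base_color):
    """Fetch a texel and modulate it with the interpolated base color."""
    texel = fetch_texel(texture, u, v)
    return tuple(t * b for t, b in zip(texel, base_color))


# Hypothetical 2x2 checkerboard texture (white / black).
CHECKER = [
    [(1.0, 1.0, 1.0), (0.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
]
base = (1.0, 0.5, 0.25)  # e.g. interpolated color from the rasterizer
print(shade_pixel(CHECKER, 0.25, 0.25, base))  # white texel: base color kept
print(shade_pixel(CHECKER, 0.75, 0.25, base))  # black texel: pixel goes black
```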
As described above, the conventional graphics processor 40 requires the use of two separate shaders: a vertex shader 46 and a pixel shader 54. A drawback associated with such an architecture is that the overall footprint of the graphics processor is relatively large, as the two shaders take up a large amount of real estate. Another drawback associated with conventional graphics processor architectures is that they can exhibit poor computational efficiency, because each shader can operate only on its own type of data and the pixel shader depends on the output of the vertex shader.
Referring now to FIG. 4A, in an exemplary embodiment, the graphics processor 60 of the present invention includes a multiplexer 66 having vertex (e.g., indices) data provided at a first input thereto and interpolated pixel parameter (e.g., position) data and attribute data from a rasterization engine 74 provided at a second input. A control signal generated by an arbiter 64 is transmitted to the multiplexer 66 on line 63. The arbiter 64 determines which of the two inputs to the multiplexer 66 is transmitted to a unified shader 62 for further processing. The arbitration scheme employed by the arbiter 64 is as follows: the vertex data on the first input of the multiplexer 66 is transmitted to the unified shader 62 on line 65 if there are enough resources available in the unified shader to operate on the vertex data; otherwise, the interpolated pixel parameter data present on the second input is passed to the unified shader 62 for further processing.
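The arbitration rule reduces to one decision per cycle, sketched here. The resource check is modeled as a count of free entries in the general purpose register block (sixty-four in the embodiment described below). The names `RegisterBlock` and `arbitrate`, the per-batch register cost, and the "nothing waiting" case are hypothetical illustrations.

```python
from dataclasses import dataclass


@dataclass
class RegisterBlock:
    """Hypothetical model of the unified shader's general purpose register block."""
    total: int = 64
    used: int = 0

    @property
    def free(self) -> int:
        return self.total - self.used


def arbitrate(registers, vertex_pending, vertex_regs_needed, pixel_pending):
    """Pick which multiplexer input goes to the unified shader this cycle.

    Vertex data wins when it is waiting and fits in the register block;
    otherwise interpolated pixel data is passed if any is waiting.
    Returns "vertex", "pixel", or None when neither can be issued.
    """
    if vertex_pending and registers.free >= vertex_regs_needed:
        return "vertex"
    if pixel_pending:
        return "pixel"
    return None


print(arbitrate(RegisterBlock(used=0), True, 8, True))   # room for vertex data
print(arbitrate(RegisterBlock(used=60), True, 8, True))  # only 4 free: pixels go
```

Vertex data gets priority whenever it fits. Pixel work fills the gaps, which matches the scheme described above.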
Referring briefly to FIG. 5, the unified shader 62 will now be described. As illustrated, the unified shader 62 includes a general purpose register block 92; a plurality of source registers, including source register A 93, source register B 95 and source register C 97; a processor (e.g., CPU) 96; and a sequencer 99. The general purpose register block 92 includes sixty-four registers, or available entries, for storing the information transmitted from the multiplexer 66 on line 65 or any other information to be maintained within the unified shader. The data present in the general purpose register block 92 is transmitted to the plurality of source registers via line 109. The processor 96 may be a dedicated piece of hardware or may be configured as part of a general purpose computing device (e.g., a personal computer). In an exemplary embodiment, the processor 96 is adapted to perform 32-bit floating point arithmetic operations as well as a complete series of logical operations on corresponding operands. As shown, the processor is logically partitioned into two sections. Section 96 is configured to execute, for example, the 32-bit floating point arithmetic operations of the unified shader. The second section, 96A, is configured to perform the scalar operations (e.g., log, exponent, reciprocal square root) of the unified shader.
The sequencer 99 includes a constants block 91 and an instruction store 98. The constants block 91 contains, for example, the several transformation matrices used in connection with vertex manipulation operations. The instruction store 98 contains the necessary instructions that are executed by the processor 96 in order to perform the respective arithmetic and logic operations on the data maintained in the general purpose register block 92, as provided by the source registers 93, 95 and 97. The instruction store 98 further includes memory fetch instructions that, when executed, cause the unified shader 62 to fetch texture and other types of data from memory 82 (FIG. 4A). In operation, the sequencer 99 determines whether the next instruction to be executed (from the instruction store 98) is an arithmetic or logical instruction or a memory (e.g., texture fetch) instruction. If the next instruction is a memory instruction or request, the sequencer 99 sends the request to a fetch block (not shown), which retrieves the required information from memory 82 (FIG. 4A). The retrieved information is then transmitted to the sequencer 99 through the vertex texture cache 68 (FIG. 4A), as described in greater detail below.
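The sequencer's per-instruction decision can be sketched as a simple dispatch. The sketch assumes instructions are tuples whose first element is an opcode. The opcode names, the `fetch_block` and `alu` callables, and the tuple format are all hypothetical: the description only distinguishes memory (e.g., texture fetch) instructions from arithmetic or logical ones.

```python
# Hypothetical opcode names; the description only distinguishes memory
# (e.g. texture fetch) instructions from arithmetic/logical ones.
MEMORY_OPCODES = {"tex_fetch", "vtx_fetch"}


def dispatch(instruction, fetch_block, alu):
    """Route one instruction as the sequencer is described as doing."""
    opcode = instruction[0]
    if opcode in MEMORY_OPCODES:
        # Memory request: hand it to the fetch block, which reads memory.
        return ("fetch", fetch_block(instruction))
    # Arithmetic or logical instruction: send it to the processor.
    return ("alu", alu(instruction))


def run_program(program, fetch_block, alu):
    return [dispatch(instr, fetch_block, alu) for instr in program]


# Usage with stand-in memory and ALU callables.
memory = {0x10: (1.0, 0.0, 0.0)}
results = run_program(
    [("tex_fetch", 0x10), ("mul", 2.0, 3.0)],
    fetch_block=lambda instr: memory[instr[1]],
    alu=lambda instr: instr[1] * instr[2],
)
print(results)
```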
If the next instruction to be executed is an arithmetic or logical instruction, the sequencer 99 causes the appropriate operands to be transferred from the general purpose register block 92 into the appropriate source registers (93, 95, 97) for execution, and an appropriate signal is sent to the processor 96 on line 101 indicating what operation or series of operations is to be executed on the several operands present in the source registers. At this point, the processor 96 executes the instructions on the operands present in the source registers and provides the result on line 85. The information present on line 85 may be transmitted back to the general purpose register block 92 for storage, or transmitted to succeeding components of the graphics processor 60.
As discussed above, the instruction store 98 maintains both vertex manipulation instructions and pixel manipulation instructions. Therefore, the unified shader 62 of the present invention is able to perform both vertex and pixel operations, as well as execute memory fetch operations. As such, the unified shader 62 is able to perform both the vertex shading and pixel shading operations on data in the context of a graphics controller, based on information passed from the multiplexer. By being adapted to perform memory fetches, the unified shader of the present invention is able to perform additional processes that conventional vertex shaders cannot perform, while at the same time performing pixel operations.
The unified shader 62 has the ability to simultaneously perform vertex manipulation operations and pixel manipulation operations at various degrees of completion by being able to switch freely and very quickly between such programs or instructions maintained in the instruction store 98. In operation, vertex data to be processed is transmitted into the general purpose register block 92 from multiplexer 66. The instruction store 98 then passes the corresponding control signals to the processor 96 on line 101 to perform such vertex operations. However, if the general purpose register block 92 does not have enough available space therein to store the incoming vertex data, such information will not be transmitted, as the arbitration scheme of the arbiter 64 is not satisfied. In this manner, any pixel calculation operations that are to be, or are currently being, performed by the processor 96 are continued, based on the instructions maintained in the instruction store 98, until enough registers within the general purpose register block 92 become available. Thus, through the sharing of resources within the unified shader 62, processing of image data is enhanced, as there is no down time associated with the processor 96.
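The resource-sharing behaviour can be illustrated with a toy cycle model. Its assumptions are all hypothetical: each issued vertex job holds a fixed number of registers for a fixed number of cycles, each cycle the processor issues either one vertex job or one unit of pixel work, and registers are released at the start of the cycle after a job's hold time expires. The model shows the described effect. Pixel work fills every cycle in which vertex data cannot be admitted, and the processor idles only when no pixel work is waiting either.

```python
def simulate(vertex_jobs, pixel_jobs, total_regs=64, regs_per_vertex=32, vertex_cycles=3):
    """Toy cycle model of register sharing in a unified shader.

    Each issued vertex job occupies regs_per_vertex general purpose registers
    for vertex_cycles cycles. When too few registers are free, the arbiter
    withholds vertex data and the processor keeps working on pixel jobs.
    Returns (issue_log, idle_cycles): 'V' = vertex issued, 'P' = pixel work,
    '-' = processor idle.
    """
    free_regs = total_regs
    in_flight = []  # cycles remaining for each vertex job holding registers
    log, idle = [], 0
    while vertex_jobs or pixel_jobs:
        # Free the registers of vertex jobs whose hold time has expired.
        done = sum(1 for t in in_flight if t == 0)
        free_regs += done * regs_per_vertex
        in_flight = [t for t in in_flight if t > 0]
        if vertex_jobs and free_regs >= regs_per_vertex:
            vertex_jobs -= 1
            free_regs -= regs_per_vertex
            in_flight.append(vertex_cycles)
            log.append("V")
        elif pixel_jobs:
            pixel_jobs -= 1
            log.append("P")
        else:
            idle += 1
            log.append("-")
        in_flight = [t - 1 for t in in_flight]  # advance one clock
    return "".join(log), idle


print(simulate(4, 3))  # pixel work fills the stalls
print(simulate(4, 0))  # no pixel work available: one idle cycle
```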
Referring back to FIG. 4A, the graphics processor 60 further includes a cache block 70, including a parameter cache 70A and a position cache 70B, which accepts the pixel-based output of the unified shader 62 on line 85 and stores the respective pixel parameter and position information in the corresponding cache. The pixel information present in the cache block 70 is then transmitted to the primitive assembly block 72 on line 71. The primitive assembly block 72 is responsible for assembling the information transmitted thereto from the cache block 70 into a series of triangles, or other suitable primitives, for further processing. The assembled primitives are then transmitted on line 73 to rasterization engine block 74, where the transmitted primitives are converted into individual pixel data information through a walking process or any other suitable pixel generation process. The resulting pixel data from the rasterization engine block 74 is the interpolated pixel parameter data that is transmitted to the second input of the multiplexer 66 on line 75.
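The key architectural point here is the loop: shader output is assembled into primitives, the primitives are rasterized, and the resulting pixel data returns to the multiplexer feeding the same shader. The loop is sketched below under two assumptions, a triangle-list topology (every three cached positions form one triangle) and a stand-in rasterizer callable. Both are illustrations and are not specified by the description.

```python
from collections import deque


def primitive_assembly(positions):
    """Group shaded vertex positions into triangles (triangle-list order assumed)."""
    usable = len(positions) - len(positions) % 3  # ignore an incomplete tail
    return [tuple(positions[i:i + 3]) for i in range(0, usable, 3)]


def feed_back(position_cache, rasterize, pixel_queue):
    """Rasterize assembled primitives and queue the pixel data for the MUX."""
    for triangle in primitive_assembly(position_cache):
        pixel_queue.extend(rasterize(triangle))
    return pixel_queue


# Stand-in rasterizer that emits one pixel record per triangle.
queue = feed_back(
    [(0, 0), (4, 0), (0, 4), (1, 1), (2, 2)],
    rasterize=lambda tri: [("pixel", tri[0])],
    pixel_queue=deque(),
)
print(list(queue))
```

The `deque` plays the role of the multiplexer's second input. An arbiter like the one described earlier would draw pixel work from it whenever vertex data cannot be admitted.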
In those situations when vertex data is transmitted to the unified shader 62 through the multiplexer 66, the resulting vertex data generated by the processor 96 is transmitted to a render back end block 76, which converts the resulting vertex data into at least one of several formats suitable for later display on display device 84. For example, if a stained glass appearance effect is to be applied to an image, the information corresponding to such appearance effect is associated with the appropriate position data by the render back end 76. The information from the render back end 76 is then transmitted to memory 82 and to a display controller on line 80, via memory controller 78. Such appropriately formatted information is then transmitted on line 83 for presentation on display device 84.
Referring now to FIG. 4B, shown therein is a vertex block 61, which is used to provide the vertex information at the first input of the multiplexer 66 according to an alternate embodiment of the present invention. The vertex block 61 includes a vertex fetch block 61A, which is responsible for retrieving vertex information from memory 82, if requested, and transmitting that vertex information into the vertex cache 61B. The information stored in the vertex cache 61B comprises the vertex information that is coupled to the first input of multiplexer 66.
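A minimal sketch of the vertex fetch and vertex cache pairing follows. It assumes vertices are requested by index and that memory 82 is read only on a cache miss. The `VertexBlock` class is hypothetical, and its cache is unbounded for simplicity, whereas a hardware cache has finite capacity and an eviction policy.

```python
class VertexBlock:
    """Hypothetical model: vertex fetch 61A filling vertex cache 61B."""

    def __init__(self, memory):
        self.memory = memory  # index -> vertex attributes (stands in for memory 82)
        self.cache = {}       # unbounded for simplicity; real caches are finite
        self.memory_reads = 0

    def vertex_for_index(self, index):
        if index not in self.cache:
            # Miss: fetch from memory and keep a copy in the cache.
            self.cache[index] = self.memory[index]
            self.memory_reads += 1
        return self.cache[index]


block = VertexBlock({0: "v0", 1: "v1", 2: "v2", 3: "v3"})
# Two triangles sharing an edge: indices 1 and 2 are reused.
stream = [block.vertex_for_index(i) for i in (0, 1, 2, 2, 1, 3)]
print(stream, block.memory_reads)
```

Shared vertices between adjacent primitives are served from the cache, so six index lookups here cost only four memory reads.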
As discussed above, the graphics processor 60 of the present invention incorporates a unified shader 62 which is capable of performing both vertex manipulation operations and pixel manipulation operations based on the instructions stored in the instruction store 98. In this fashion, the graphics processor 60 of the present invention takes up less real estate than conventional graphics processors, as separate vertex shaders and pixel shaders are no longer required. In addition, as the unified shader 62 is capable of alternating between performing vertex manipulation operations and pixel manipulation operations, graphics processing efficiency is enhanced, as one type of data operations is not dependent upon another type of data operations. Therefore, any performance penalties experienced as a result of dependent operations in conventional graphics processors are overcome.
The above detailed description of the present invention and the examples described therein have been presented for the purposes of illustration and description. It is therefore contemplated that the present invention cover any and all modifications, variations and equivalents that fall within the spirit and scope of the basic underlying principles disclosed and claimed herein.
`What is claimed is:
`1. A method carried out by a unified shader comprising:
`performing vertex manipulation operations and pixel
`manipulation operations by transmitting vertex data to a
`general purpose register block, and performing vertex
`operations on the vertex data by a processor within the
`unified shader unless the general purpose register block
`does not have enough available space therein to store
`incoming vertex data; and
`continuing pixel calculation operations that are to be or are
`currently being performed by the processor based on
`instructions maintained in an instruction store until
`enough registers within the general purpose register
`block become available.
2. A unified shader, comprising:
a general purpose register block for maintaining data;
a processor unit;
a sequencer, coupled to the general purpose register block and the processor unit, the sequencer maintaining instructions operative to cause the processor unit to execute vertex calculation and pixel calculation operations on selected data maintained in the general purpose register block; and
wherein the processor unit executes instructions that generate a pixel color in response to selected data from the general purpose register block and generates vertex position and appearance data in response to selected data from the general purpose register block.
3. A unified shader comprising:
a processor unit operative to perform vertex calculation operations and pixel calculation operations; and
shared resources, operatively coupled to the processor unit;
the processor unit operative to use the shared resources for either vertex data or pixel information and operative to perform pixel calculation operations until enough shared resources become available and then use the shared resources to perform vertex calculation operations.
4. A unified shader comprising:
a processor unit operative to perform vertex calculation operations and pixel calculation operations; and
shared resources, operatively coupled to the processor unit;
the processor unit operative to use the shared resources for either vertex data or pixel information and operative to perform vertex calculation operations until enough shared resources become available and then use the shared resources to perform pixel calculation operations.
5. A unified shader comprising:
a processor unit;
a sequencer coupled to the processor unit, the sequencer maintaining instructions operative to cause the processor unit to execute vertex calculation and pixel calculation operations on selected data maintained in a store depending upon an amount of space available in the store.
`6. The shader of claim 5, wherein the sequencer further
`includes circuitry operative to fetch data from a memory.
`7. The shader of claim 5, further including a selection
`circuit operative to provide inform