`Gossett et al.
`
(10) Patent No.: US 6,259,460 B1
(45) Date of Patent: Jul. 10, 2001

(54) METHOD FOR EFFICIENT HANDLING OF TEXTURE CACHE MISSES BY RECIRCULATION
`
(75) Inventors: Carroll Philip Gossett, Mountain View; Mark Goudy, Berkeley; Ole Bentz, Mountain View, all of CA (US)

(73) Assignee: Silicon Graphics, Inc., Mountain View, CA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.
`
(21) Appl. No.: 09/048,099

(22) Filed: Mar. 26, 1998

(51) Int. Cl.7: G06F 1/20; G09G 5/00
(52) U.S. Cl.: 345/552; 345/506
(58) Field of Search: 345/501-506, 345/520, 521, 514; 711/118, 144, 145, 169
`
(56) References Cited

U.S. PATENT DOCUMENTS

5,831,640 * 11/1998 Wang et al. .......... 345/521
5,987,567 * 11/1999 Rivard et al. ......... 711/118
6,000,019 * 12/1999 Dykstal et al. ........ 711/157
6,011,565 * 1/2000 Kuo et al. ............ 345/513
`
OTHER PUBLICATIONS

OpenGL Reference Manual, The Official Reference Document for OpenGL, Release 1, by the OpenGL Architecture Review Board, Addison-Wesley Publishing Company, 1992, Table of Contents (pp. vii-ix), pp. 1-26, and diagram entitled "The OpenGL Machine".
OpenGL Programming Guide, Jackie Neider, Tom Davis and Mason Woo, Addison-Wesley Publishing Company, 1993, Table of Contents (pp. xv-xxiii), pp. 259-290, 412-415, and 491-504.
Principles of Interactive Computer Graphics, Second Edition, William M. Newman and Robert F. Sproull, McGraw-Hill Book Company, 1979, Table of Contents (pp. vii-xii), pp. 3-8, and 406-408.
U.S. application No. 08/956,537, Wingett et al., filed Oct. 23, 1997.
U.S. application No. 09/048,024, Gossett et al., filed Mar. 26, 1998.
U.S. application No. 09/048,098, Gossett et al., Mar. 26, 1998.
The OpenGL Graphics System: A Specification (Version 1.1), Mark Segal, Kurt Akeley; Editor: Chris Frazier, Table of Contents (pp. i-iv), pp. 9-11, 67, 68, and 85-105; unpublished; dated Mar. 4, 1997; Silicon Graphics, Inc.
`
`(List continued on next page.)
`
Primary Examiner: Ulka J. Chauhan
(74) Attorney, Agent, or Firm: Staas & Halsey LLP
`
(57) ABSTRACT

A method of a computer graphics system recirculates texture cache misses into a graphics pipeline without stalling the graphics pipeline, increasing the processing speed of the computer graphics system. The method reads data from a texture cache memory by a read request placed in the graphics pipeline sequence, then reads the data from the texture cache memory if the data is stored in the texture cache memory and places the data in the pipeline sequence. If the data is not stored in the texture cache memory, the method recirculates the read request in the pipeline sequence by indicating in the pipeline sequence that the data is not stored in the texture cache memory, placing the read request at a subsequent, determined place in the pipeline sequence, reading the data into the texture cache memory from a main memory, and executing the read request from the subsequent, determined place and after the data has been read into the texture cache memory.
`
`25 Claims, 10 Drawing Sheets
`
`
`
`
`
OTHER PUBLICATIONS
`
Computer Graphics, Principles and Practice, Second Edition in C, James D. Foley, Andries van Dam, Steven K. Feiner, and John F. Hughes, Addison-Wesley Publishing Company, Jul. 1997, Table of Contents (pp. xvii-xxiii), pp. 38-40, 67-227 (including plates I.1-I.34), 229-283, 471-531, 611, 614, 616-647 (including plates II.1-II.39), 741-745, 800, 805-853 (including plates III.1-III.28), 855-922, 923-1006, 1015-1018, and 1083-1112.
OpenGL Programming Guide, Second Edition, The Official Guide to Learning OpenGL, Version 1.1, Mason Woo, Jackie Neider, and Tom Davis, Addison-Wesley Developers Press, Jul. 1997, Table of Contents (pp. v-xv), pp. 1-14, 317-373, 529-553 and 609-627.
* cited by examiner
`
[Sheet 1 of 10] FIG. 1 (PRIOR ART): OpenGL processing pipeline 10, with blocks labeled display list 14, evaluator 16, per-vertex operations and primitive assembly 18, rasterization 20, per-fragment operations 22, framebuffer 24, pixel operations 26, and texture memory 28.
`
`
`
[Sheet 2 of 10] FIG. 2 (PRIOR ART): texture image 29 and the s and t coordinates used to access it.
`
`
`
[Sheet 3 of 10] FIG. 3: System block diagram of graphics system 30, with labels for CPUs 32₁ through 32ₙ, host memory 36, graphics subsystems 42₁ through 42ₘ, display backend 44, DAC 46, other I/O (disk, etc.), and a connection to a modem.
`
`
`
[Sheet 4 of 10] FIG. 4: Hardware organization of graphics subsystem 42 and graphics chip 43. Recoverable labels: XTALK, CROSSTALK interface 48 (with SDRAM DMA and SDRAM Cmd FIFO), transform engine, raster unit, texture unit, shader unit, display unit 60, pins/pads 61, JTAG unit 63, SDRAM 50 (SDRAM texture and SDRAM frame buffer), and DAC 46.
`
`
`
[Sheet 5 of 10] FIG. 5: Overview of texture cache miss recirculation in graphics pipeline 62. Recoverable labels: primitives, rasterization unit, quads, graphics pipeline, texturing pipeline, recirculation path 84, control 76, SDRAM 70, and texture cache 82. Recoverable annotations: "Rasterizer opens a slot in pipeline (sends nil data)"; "The recirculated quad arrives here at the same time as the open slot and replaces it in the subsequent pipe stages"; "Hit/miss is determined"; "L1 ≥ L2".
`
`
`
[Sheet 6 of 10] FIG. 6: Detailed diagram of raster unit 54. Recoverable block labels: from primitives, primitive parser, vertex denorm, reciprocal, line equation, subdivider, coverage, delta Z, subpixel select, line stipple, line gamma, line antialias, multiplier, bilerp, perspective, frame buffer cache (to/from SDRAM 50), scheduler/fault controller, TCACHE control, GFE, GBE, and address to SDRAM 50. Recoverable output labels: IDX, bary a,b, alpha, and CVG.
`
`
`
[Sheet 7 of 10] FIG. 7: Detailed diagram of the texture unit in relation to raster unit 54 and shader unit 58. Recoverable block labels: primitives, scheduler, texture bilerps 122, texture address reformatter 124, level of detail unit 126, texture address unit, texture cache, TCACHE control, texture mux, format, texture filters 140, texture LUTs 143, fault 133, and from SDRAM 50. Recoverable signal labels: IDX, alpha, S,T,R, LOD, and ADDR.
`
`
`
[Sheet 8 of 10] FIG. 8: Detailed diagram of recirculation control. Recoverable labels: raster unit 54, schedule controller, 133, pipelines 92, 93, and 95, selectors 150 ("select recirculation signals or signals from raster unit"), 154, texture unit 56, 122, 148, and "from recirculation tap point".
`
`
`
[Sheet 9 of 10] FIG. 9: Example of a graphic footprint likely to produce a texture cache miss (reference numerals 160, 162, and 164).
`
`
`
[Sheet 10 of 10] FIG. 10: Flowchart of texture cache miss recirculation. Step labels appearing in the figure: S10, S20, S30, S40, S50, S60, S90, and S100. Recoverable box text, in reading order:
Input: new (un-recirculated) quads or open slots.
S10: Determine 2x2 regions for each sleeping pixel in quad in both fine and coarse LODs.
S20: Compute a footprint in each LOD that covers as many of the above 2x2 regions as possible, with a max footprint size of 4x4 texels.
Read two footprints from texture cache.
Select the 2x2 regions (up to 4) from each footprint corresponding to S10 based on offset of each 2x2 within each footprint.
Wake up the sleeping pixels whose 2x2 regions were completely covered by the two footprints computed in S20.
At some point, but not necessarily before the quad returns to S10, the required texels will be retrieved from SDRAM 50.
Raster unit 54 opens empty slot in pipeline. Quad is recirculated. Attribute tokens, tags, and barycentric coordinates are synchronized with open slot.
Processing continues.
`
`
`
METHOD FOR EFFICIENT HANDLING OF TEXTURE CACHE MISSES BY RECIRCULATION

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is related to U.S. patent application Ser. No. 09/048,098, entitled A Method for Improving Texture Locality for Pixel Quads by Diagonal Level-of-Detail Calculation, by Carroll Philip Gossett, filed concurrently herewith and which is incorporated herein by reference; U.S. patent application Ser. No. 09/048,024, entitled A Method for Accelerating Minified Texture Cache Access, by Carroll Philip Gossett, now U.S. Pat. No. 6,104,415, filed concurrently herewith and which is incorporated herein by reference; and U.S. patent application Ser. No. 08/956,537, entitled A Method and Apparatus For Providing Image and Graphics Processing Using A Graphics Rendering Engine, filed Oct. 23, 1997 and which is incorporated herein by reference.
`
`BACKGROUND OF THE INVENTION
`
shown in FIG. 1 receives commands, and may store the commands for future processing in a display list 14 or execute the commands immediately. The OpenGL processing pipeline includes an evaluator 16, which approximates curve and surface geometry by evaluating polynomial commands of the input values; per-vertex operations and primitive assembly 18, in which geometric primitives such as points, line segments, and polygons, described by vertices, are processed, such as transforming and lighting the vertices, and clipping the primitives into a viewing volume; rasterization 20 produces a series of frame buffer addresses and associated values, and converts a projected point, line, or polygon, or the pixels of a bitmap or image, to fragments, each corresponding to a pixel in the framebuffer; per-fragment operations 22 performs operations such as conditional updates to the frame buffer 24 based on incoming and previously stored depth values (to effect depth buffering) and blending of incoming pixel colors with stored colors, masking, and other logical operations on pixel values. The final pixel values are then stored in the frame buffer 24.
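
The conditional frame buffer update based on incoming and stored depth values can be illustrated with a minimal sketch. The function name, the list-of-lists buffers, and the choice of a "nearer wins" (less-than) comparison are assumptions made for illustration; the text above does not specify them.

```python
def depth_test_write(framebuffer, depthbuffer, x, y, color, depth):
    """Write a fragment only if it is nearer than the stored depth.

    Hypothetical helper: a less-than comparison is assumed here.
    """
    if depth < depthbuffer[y][x]:
        depthbuffer[y][x] = depth      # remember the new nearest depth
        framebuffer[y][x] = color      # conditional update of the pixel
        return True
    return False                       # fragment is hidden; buffers unchanged


framebuffer = [[None]]
depthbuffer = [[1.0]]
first = depth_test_write(framebuffer, depthbuffer, 0, 0, "red", 0.5)
second = depth_test_write(framebuffer, depthbuffer, 0, 0, "blue", 0.7)
```

In this usage the first fragment is written and the second, being farther away, is discarded.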
Pixel operations 26 processes input data from the commands 12 which is in the form of pixels rather than vertices. The result of the pixel operations 26 is stored in texture memory 28, for use in rasterization 20. The resulting fragments are merged into the frame buffer 24 as if the fragments were generated from geometric data.
In addition, if texturing is enabled during rasterization 20, a texel is generated from texture memory 28 for each fragment and applied to the fragment. A texel is a texture element obtained from texture memory and represents the color of the texture to be applied to a corresponding fragment. Texturing maps a portion of a specified texture image onto each primitive.
Texture mapping is accomplished by using the color (Red (R), Green (G), Blue (B), or Alpha (A)) of the texture image at the location indicated by the fragment's (s, t, and r) coordinates. In the case of a 2-dimensional image (2-D image), s and t coordinates are applicable, and in the case of a 3-dimensional image (3-D image), then s, t, and r coordinates are applicable.
An example of a texture image 29 and the coordinates used to access it is shown in FIG. 2. FIG. 2 shows a two-dimensional texture image with n×m dimensions of n=3 and m=2. A one-dimensional texture would comprise a single strip. The values, α and β, used in blending adjacent texels to obtain a texture value are also shown. As shown in FIG. 2, values of s and t coordinates are each in the range of 0.0 to 1.0.
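
As a rough illustration of how α and β might be used to blend adjacent texels, the following sketch performs conventional bilinear filtering. It assumes that α and β are the fractional horizontal and vertical offsets within the 2x2 texel neighborhood, that texel centers sit at half-integer positions, and that out-of-range indices are clamped to the image edge; none of these conventions is stated in the text, and the function name is hypothetical.

```python
import math


def bilinear_sample(texture, s, t):
    """Blend the 2x2 texels around (s, t), with s and t in [0.0, 1.0]."""
    height = len(texture)
    width = len(texture[0])
    # Map normalized coordinates to texel space (texel centers at +0.5).
    u = s * width - 0.5
    v = t * height - 0.5
    i0 = math.floor(u)
    j0 = math.floor(v)
    alpha = u - i0   # horizontal blend weight
    beta = v - j0    # vertical blend weight

    def texel(i, j):
        # Clamp to the image edge (an assumed wrap mode).
        i = min(max(i, 0), width - 1)
        j = min(max(j, 0), height - 1)
        return texture[j][i]

    top = (1 - alpha) * texel(i0, j0) + alpha * texel(i0 + 1, j0)
    bottom = (1 - alpha) * texel(i0, j0 + 1) + alpha * texel(i0 + 1, j0 + 1)
    return (1 - beta) * top + beta * bottom


image = [[0.0, 1.0],
         [2.0, 3.0]]
center = bilinear_sample(image, 0.5, 0.5)
corner = bilinear_sample(image, 0.25, 0.25)
```

Sampling at the center of this 2x2 image weights all four texels equally, while sampling exactly at a texel center returns that texel unchanged.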
In implementing the OpenGL processing pipeline in the related art, a memory stores textures of images. Some of the textures are read into a texture cache during system initialization, while others are read into the texture cache upon a texture cache miss. Although there are many reasons that a system may experience a texture cache fault, such as during heavy minification of an image, most texture cache faults are data-dependent, and depend upon the s, t, and r coordinates which are calculated in the OpenGL processing pipeline.
If a texture cache fault occurs in the related art, then the OpenGL processing pipeline must be stalled to allow the system to retrieve the required data from the memory, and store it in the texture cache. However, the OpenGL processing pipeline is difficult to stall in that a performance penalty is assessed in completing the OpenGL processing pipeline and displaying an image. In addition, stalling the OpenGL processing pipeline would typically require a gated clock
`
1. Field of the Invention
The present invention is related to computer graphics hardware for which OPENGL (GRAPHICS LIBRARY) software is an interface thereto, and, in particular, to efficiently recirculating texture cache misses in the computer graphics hardware.
2. Description of the Related Art
Interactive graphics display, in general, is explained in Computer Graphics: Principles and Practices, Foley, van Dam, Feiner, and Hughes, Addison-Wesley, 1992, and in Principles of Interactive Computer Graphics, William M. Newman and Robert F. Sproull, Second Edition, McGraw-Hill Book Company, New York, 1979. Interactive graphics display generally includes a frame buffer storing pixels (or picture elements), a display, and a display controller that transmits the contents of the frame buffer to the display.
The OpenGL graphics system is a software interface to graphics hardware, and is explained in the OpenGL Programming Guide, The Official Guide to Learning OpenGL, Second Edition, Release 1.1, by the OpenGL Architecture Review Board, Jackie Neider, Tom Davis, Mason Woo, Addison-Wesley Developers Press, Reading, Mass., 1997, in the OpenGL Programming Guide, The Official Guide to Learning OpenGL, Release 1, by the OpenGL Architecture Review Board, Jackie Neider, Tom Davis, Mason Woo, Addison-Wesley Publishing Company, Reading, Mass., 1993, and in the OpenGL Reference Manual, The Official Reference Document for OpenGL, Release 1, the OpenGL Architecture Review Board, Addison-Wesley Publishing Company, Reading, Mass., 1993.
A computer model for interpretation of OpenGL commands is a client-server model. An application program being executed by one computer, typically the client computer, issues commands, which are interpreted and processed by another computer, typically the server computer, on which OpenGL is implemented. The client may or may not operate on the same computer as the server. A computer, then, can make calls through an OpenGL software interface to graphics hardware, and the graphics hardware can reside either on the same computer making the calls or on a remote computer.
A tool for describing how data is processed in OpenGL is a processing pipeline. The OpenGL processing pipeline 10
`
`
`
and/or a multiplexer to be placed at the input of every flip-flop used in the OpenGL processing pipeline.
A problem in the related art is that texture cache faults occur in retrieving textures from the texture cache for pixels already launched in the OpenGL processing pipeline, requiring that the OpenGL processing pipeline be stalled.
Another problem in the related art is that performance of the OpenGL processing pipeline is reduced when texture cache faults occur.
A further problem in the related art is that the OpenGL processing pipeline must be stalled to allow data to be retrieved from the memory and read into the texture cache when texture cache faults occur.
`
construction and operation as more fully hereinafter described and claimed, reference being had to the accompanying drawings forming a part hereof, wherein like numerals refer to like parts throughout.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
FIG. 1 is a diagram of an OpenGL processing pipeline of the related art;
FIG. 2 is a diagram of a texture image and the coordinates used to access it;
FIG. 3 is a block diagram of a graphics system according to the present invention;
FIG. 4 is a block diagram of a hardware organization of a chip implementing the present invention;
FIG. 5 is an overview of the present invention;
FIG. 6 is a detailed diagram of raster unit of a chip implementing the present invention;
FIG. 7 is a detailed diagram of a texture unit of the present invention in relation to a raster unit and a shader unit of the graphics chip implementing the present invention;
FIG. 8 is a detailed diagram of recirculation control in the present invention;
FIG. 9 is a diagram showing an example of a graphic footprint likely to produce a texture cache miss; and
FIG. 10 is a flowchart of the method of texture cache miss recirculation of the present invention.
`
DESCRIPTION OF THE PREFERRED EMBODIMENTS
`
The present invention resides in a single-chip implementation of OpenGL, in which matrices are composed on a host computer and evaluated on the single chip.
The present invention is directed to recirculating a texture cache request resulting in a texture cache miss into the OpenGL processing pipeline without stalling the OpenGL processing pipeline. The present invention is explained in detail herein below, after an explanation of the preferred embodiment of the graphics subsystem 42 of the graphics system 30 in which the present invention is implemented.
As shown in FIG. 3, in the graphics system 30 according to the present invention, central processing units (CPUs) 32₁ through 32ₙ execute OpenGL software commands 12 (not shown in FIG. 3), using memory agent 34 and host memory 36. A command stream, which is analogous to subroutine calls calling an OpenGL API library, is pushed immediately by the CPU to be executed by the hardware implementing the OpenGL system, and, accordingly, a push model is representative of the OpenGL system.
The memory agent 34 then transmits the commands 12 to crossbar switch 40. Then, commands 12 are transmitted to graphic subsystems 42₁ through 42ₘ, which process the commands 12 in a token stream (commands, including GL commands, are mapped to integer tokens), as described in further detail herein below. After graphic subsystems 42₁ through 42ₘ process the commands 12, a display backend 44 transfers the processed data to digital-to-analog (DAC) converter 46, then to a monitor for display.
FIG. 4 is a block diagram of a graphic subsystem 42 of the present invention. Graphics subsystem 42 comprises graphics chip 43 and SDRAM 50, coupled to graphics chip 43.
As shown in FIG. 4, CROSSTALK interface 48, which is also referred to as a graphics front end, interfaces to the rest of the graphic system 30 through XTALK (or CROSSTALK). CROSSTALK is a router/connection unit
`
SUMMARY OF THE INVENTION

The present invention solves the above-mentioned problems of the related art.
An object of the present invention is to recirculate texture cache misses into the OpenGL processing pipeline.
Another object of the present invention is to process texture cache misses without stalling the OpenGL processing pipeline.
A further object of the present invention is to maintain OpenGL processing pipeline performance if a texture cache fault occurs.
The present invention overcomes the problems in the related art and accomplishes the above-mentioned objects.
The present invention recirculates a texture cache request (i.e., a texture cache read request) resulting in a texture cache miss into a predetermined, earlier stage of the OpenGL processing pipeline, without stalling the OpenGL processing pipeline.
The present invention increases the performance of a graphics chip implementing the present invention by enabling the graphics chip to run at a relatively higher clock rate, increasing the performance of the graphics chip, but not stalling the graphics pipeline of approximately 150 stages being executed by the graphics chip. The present invention can be implemented at a low cost and with minimal design complexity.
If a texture cache request resulting in a texture cache miss occurs, the present invention processes the next texture cache request without stalling the OpenGL processing pipeline while the data requested by the faulted texture cache request is read in from the texture memory. Instead of stalling the OpenGL processing pipeline as in the prior art, the present invention transmits a signal to circuitry corresponding to a prior step in the OpenGL processing pipeline, and reinserts the texture cache request that resulted in the texture cache miss into the prior step, while, concurrently, the requested data is read from the texture memory and stored in the texture cache. Consequently, when the previously-faulted texture cache request is again presented to the texture cache, the requested data is stored and available in the texture cache, for retrieval and use in response to the texture cache request. If the requested data remains unavailable when the texture cache request is again presented to the texture cache, then the texture cache request is recirculated subsequently, and repeatedly, until the requested data is available for retrieval from the texture cache when the texture cache request is presented to the texture cache.
By the method of the present invention, texture cache requests are processed without stalling the OpenGL processing pipeline.
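
The recirculation behavior described above can be sketched as a toy cycle-by-cycle simulation. This is a simplified software model under stated assumptions, not the hardware design: one request is presented to the cache per cycle, `fetch_latency` and `loop_length` are made-up cycle counts, all names are hypothetical, and any ordering requirements between requests are ignored.

```python
from collections import deque


def run_pipeline(requests, cached, fetch_latency=3, loop_length=2):
    """Return texel addresses in the order their reads complete.

    requests: addresses in issue order; cached: addresses initially resident.
    """
    cache = set(cached)
    pending = {}                 # address -> cycle at which its fetch lands
    new_work = deque(requests)
    recirculating = deque()      # (arrival_cycle, address) of missed requests
    completed = []
    cycle = 0
    while new_work or recirculating:
        # Fetches from texture memory finish in the background.
        for addr, ready in list(pending.items()):
            if ready <= cycle:
                cache.add(addr)
                del pending[addr]
        # A recirculated request takes the slot; otherwise issue new work.
        if recirculating and recirculating[0][0] <= cycle:
            addr = recirculating.popleft()[1]
        elif new_work:
            addr = new_work.popleft()
        else:
            addr = None          # empty slot: nothing to present this cycle
        if addr is not None:
            if addr in cache:
                completed.append(addr)                      # hit
            else:
                pending.setdefault(addr, cycle + fetch_latency)
                recirculating.append((cycle + loop_length, addr))  # miss
        cycle += 1
    return completed


order = run_pipeline(["a", "b", "c"], cached={"a", "c"})
```

In this run, request "b" misses, is recirculated once while its data is still in flight, and completes on its second pass; request "c" is processed in the meantime rather than waiting behind the miss.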
These together with other objects and advantages which will be subsequently apparent, reside in the details of
`
`
`
available from Silicon Graphics, Inc. Graphics front end 48 comprises a XTALK peer-to-peer interface, a DMA engine (including formatting), and a command first-in, first-out (fifo) buffer. The command fifo is maintained in SDRAM 50, as shown in FIG. 4. The command fifo is nominally 1 ms. worth of latency.
Graphics front end 48 also directs incoming streams to intended destinations, provides all support for chained graphics subsystems 42 in multi-subsystems 42 configurations, provides access in and out for the SDRAM 50, provides DMA channels for graphics and video, formats input and output streams, manages context switching and context states, provides a read path for graphics registers, and provides access to the display backend 44 through Display Unit 60.
Also as shown in FIG. 4, Transform Engine 52 interfaces to both the graphics front end 48 and the raster unit 54 on a first-in, first-out basis, receives commands and data from the graphics front end 48, and sends computed data to raster unit 54. The main computations performed in the transform engine 52 include geometry transforms of both vertices and normals (MVP and MV matrices). Transform Engine 52 is responsible for all geometry calculations in graphics subsystem 42, including performing vertex transforms and lighting computations for Phong Shading, and Gouraud Shading. In addition, Transform Engine 52 performs texture transform.
Raster unit 54 of FIG. 4 parses command tokens transmitted from the Transform Engine 52, schedules all SDRAM 50 memory transactions, rasterizes each primitive by recursive subdivision, and generates perspective-corrected barycentric parameters which are used to bi-lerp (bilinear interpolate) among the 3 vertices for each triangle. Raster unit 54 also includes the framebuffer cache.
In addition, raster unit 54 includes line and point antialiasing, and the control for a framebuffer cache. Frustum clipping is effectively performed by the recursive subdivision rasterization in raster unit 54, and user clipped planes are performed using the sign bit of the bi-lerps for the texture coordinates to invalidate user-clipped pixels.
Barycentric coordinates for a triangle are a set of three numbers a, b, and c, each in the range of (0,1), with a+b+c=1 and which uniquely specify any point within the triangle or on the triangle's boundary. For example, a point P in a triangle having vertices A, B, and C, and area a from the triangle having vertices P, B, and C (the edge within the triangle opposite from the vertex A), area b from the triangle having vertices P, C, and A (the edge within the triangle opposite from the vertex B), and area c from the triangle having vertices P, A, and B (the edge within the triangle opposite from the vertex C) is given by
P=(A×a+B×b+C×c)/(a+b+c).
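
The area-based definition above can be checked numerically with a short sketch. The function names and the use of 2-D signed areas are illustrative assumptions; the text does not describe how the hardware computes these values.

```python
def signed_area(p, q, r):
    """Signed area of triangle p, q, r (positive when counter-clockwise)."""
    return 0.5 * ((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1]))


def barycentric(p, va, vb, vc):
    """Return (a, b, c): sub-triangle areas normalized so that a + b + c = 1."""
    total = signed_area(va, vb, vc)
    a = signed_area(p, vb, vc) / total   # sub-triangle opposite vertex A
    b = signed_area(p, vc, va) / total   # sub-triangle opposite vertex B
    c = signed_area(p, va, vb) / total   # sub-triangle opposite vertex C
    return a, b, c


def interpolate(weights, value_a, value_b, value_c):
    """Evaluate (A*a + B*b + C*c) / (a + b + c) for per-vertex values."""
    a, b, c = weights
    return (value_a * a + value_b * b + value_c * c) / (a + b + c)


A, B, C = (0.0, 0.0), (4.0, 0.0), (0.0, 4.0)
weights = barycentric((1.0, 1.0), A, B, C)
x = interpolate(weights, A[0], B[0], C[0])
y = interpolate(weights, A[1], B[1], C[1])
```

Interpolating the vertex positions themselves with these weights reproduces the original point, which is a quick sanity check on the formula.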
Raster unit 54 also generates an attribute tag pipeline 95 and a barycentric pipeline 93, which are discussed in further detail herein below, and which are generated after generation of the barycentric coordinates. Raster unit 54 performs Hilbert curve rasterization.
A graphics pipeline may include attributes, a coverage mask, and barycentric coordinates. In the present invention, the graphics pipeline would include the attribute token pipeline 92, the barycentric pipeline 93, the attribute tag pipeline 95, and hardware elements comprising the raster unit 54, the texture unit 56, and the shader unit 58, explained in detail herein below.
Raster unit 54 receives the attribute token pipeline 92 from software executed on a host computer. The attribute token pipeline 92 transmits data originating from OpenGL calls executed on the host computer. The attribute token pipeline 92 is formed when the above-mentioned OpenGL calls are translated into the data by a driver available from Silicon Graphics, Inc. running on the host computer and are transmitted to the graphics chip 43.
Also as shown in FIG. 4, texture unit 56 includes level of detail calculation, texture addressing, control for the texture cache, the texture tree filter for lerps (linearly interpolate) and the TLUT (texture color look-up table). Texture unit 56 also includes an SRAM for an on-chip texture cache, and the texture cache SRAM is organized as 16 banks×512 words×48 bits. Texture unit 56 is explained in further detail herein below.
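
The stated organization implies a total cache capacity that is easy to work out. The arithmetic below follows directly from the figures in the text; the `bank_and_row` mapping is a purely hypothetical interleaving shown only to illustrate how a banked memory might be addressed, and is not described in the source.

```python
BANKS = 16
WORDS_PER_BANK = 512
BITS_PER_WORD = 48

total_words = BANKS * WORDS_PER_BANK        # 8192 words
total_bits = total_words * BITS_PER_WORD    # 393216 bits
total_kib = total_bits / 8 / 1024           # 48.0 KiB


def bank_and_row(word_index):
    """Hypothetical mapping: consecutive words land in consecutive banks."""
    return word_index % BANKS, word_index // BANKS
```

Under these figures the on-chip texture cache holds 48 KiB in total.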
Shader unit 58 shown in FIG. 4 includes shading and depth bi-lerps, per-pixel lighting, pre-lighting texture environments and post-lighting texture environments, fog, multi-fragment polygon antialiasing, and per-fragment tests and operations.
A display unit 60 is provided in each graphics subsystem 42, as shown in FIG. 4. The display backend 44 shown in FIG. 3 includes the display units 60 of the graphics subsystems 42, and additional functions. The display unit 60 shown in FIG. 4 includes all of the pixel processing which occurs between the framebuffer and the output of the graphics subsystem 42 to the DAC or display backend 44. The display backend 44 combines the output from multiple graphic subsystems 42 for output by the DAC 46, or divides the output from the graphics system 30 to up to 4 DACs 46.
The display unit 60 transfers pixels and overlay/WID data from the framebuffer interface into first-in, first-out queues (FIFOs), changes pixels from the framebuffer format into a standard format RGB component format, maps color indexed into final RGB pixel values and applies gamma correction to all pixel data, generates all timing control signals for the various parts of the display unit 60, and provides read and write access to registers and memories in the display unit 60.
The graphics chip 43 of the present invention also includes pins/pads 61 for physical access to the graphics chip 43, and JTAG unit 63 for chip testing purposes.
The focus of the present invention resides in Raster Unit 54 and Texture Unit 56, which are described in greater detail with reference to FIGS. 6-10, after an overview of texture cache miss recirculation in accordance with the present invention as shown in FIG. 5.
FIG. 5 is a diagram showing the general operation of the present invention in graphics pipelines, and is applicable to graphics pipelines including OpenGL, Microsoft DIRECT 3D, and other graphics pipelines. Texture cache recirculation in accordance with the present invention involves graphics pipeline 62, including a rasterization unit 64, a texture unit 66, a shading unit 68, and an SDRAM (synchronous dynamic random access memory) 70.
As shown in FIG. 5, primitives at the level of a triangle enter the rasterization unit 64 and are rasterized into pixels covering the area of a primitive. The pixels are grouped into sets of four referred to as pixel quads (or quads) and transmitted into the L2 portion of the pipeline 74. In the texturing unit 66, the x, y, z address of the pixels and the corresponding texture address s, t, and r are examined. Each s, t, and r address corresponds to a given pixel, and the s, t, and r addresses flow through the texturing pipeline 74 after the gating unit 80. However, for the purposes of explanation, pixel quads are referred to in the remainder of FIG. 5. In addition, the present invention is applicable to both 2-dimensional images and 3-dimensional images.
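
As a small illustration of grouping rasterized pixels into sets of four, the sketch below buckets pixel coordinates into 2x2 blocks. Treating a quad as an aligned 2x2 screen block is an assumption made for this example (the text says only that pixels are grouped into sets of four), and the function name is hypothetical.

```python
from collections import defaultdict


def group_into_quads(pixels):
    """Group (x, y) pixel coordinates into 2x2 blocks keyed by block origin."""
    quads = defaultdict(list)
    for x, y in pixels:
        origin = (x - x % 2, y - y % 2)   # top-left pixel of the 2x2 block
        quads[origin].append((x, y))
    return dict(quads)


covered = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0)]
quads = group_into_quads(covered)
```

Here the first four pixels form one full quad and the fifth starts a second, partially covered quad.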
`
The present invention recirculates a texture cache miss without stalling the graphics pipeline below controller 78, as shown in FIG. 5. Therefore, the constraint L1≥L2 (the delay of a pixel quad transmitted through the L1 portion of the graphics pipeline must be greater than or equal to the delay through L2) is imposed by the raster controller 78 so that the raster controller 78 has enough time to create an open (or empty) slot in the raster pipeline 72 to send nil data instead of transmitting another pixel quad. Accordingly, when a pixel quad is being recirculated back to the top of the texture unit 66, the constraint of L1≥L2 allows the recirculated pixel quad to arrive at the gating circuit 80 at the same time as the open slot created by the raster controller 78. The recirculated pixel quad replaces the open slot in the remaining stages of the graphics pipeline 62. L2 and L1 are predetermined numbers, dependent upon the physical design of a graph