(12) United States Patent
Gossett et al.

(10) Patent No.: US 6,259,460 B1
(45) Date of Patent: Jul. 10, 2001

(54) METHOD FOR EFFICIENT HANDLING OF TEXTURE CACHE MISSES BY RECIRCULATION

(75) Inventors: Carroll Philip Gossett, Mountain View; Mark Goudy, Berkeley; Ole Bentz, Mountain View, all of CA (US)

(73) Assignee: Silicon Graphics, Inc., Mountain View, CA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.

(21) Appl. No.: 09/048,099

(22) Filed: Mar. 26, 1998

(51) Int. Cl.⁷ ................................. G06F 1/20; G09G 5/00
(52) U.S. Cl. ............................................. 345/552; 345/506
(58) Field of Search ..................................... 345/501-506, 345/520, 521, 514; 711/118, 144, 145, 169
(56) References Cited

U.S. PATENT DOCUMENTS

5,831,640 * 11/1998 Wang et al. ......................... 345/521
5,987,567 * 11/1999 Rivard et al. ....................... 711/118
6,000,019 * 12/1999 Dykstal et al. ...................... 711/157
6,011,565 * 1/2000 Kuo et al. ............................ 345/513

OTHER PUBLICATIONS

Open GL Reference Manual, The Official Reference Document for Open GL, Release 1 by the Open GL Architecture Review Board, Addison-Wesley Publishing Company, 1992, Table of Contents (pp. vii-ix), pp. 1-26, and diagram entitled "The OpenGL Machine".

Open GL Programming Guide, Jackie Neider, Tom Davis and Mason Woo, Addison-Wesley Publishing Company, 1993, Table of Contents (pp. xv-xxiii), pp. 259-290, 412-415, and 491-504.

Principles of Interactive Computer Graphics, Second Edition, William M. Newman and Robert F. Sproull, McGraw-Hill Book Company, 1979, Table of Contents (pp. vii-xii), pp. 3-8, and 406-408.

U.S. application No. 08/956,537, Wingett et al., filed Oct. 23, 1997.

U.S. application No. 09/048,024, Gossett et al., filed Mar. 26, 1998.

U.S. application No. 09/048,098, Gossett et al., Mar. 26, 1998.

The OpenGL Graphics System: A Specification (Version 1.1), Mark Segal, Kurt Akeley; Editor: Chris Frazier, Table of Contents (pp. i-iv), pp. 9-11, 67, 68, and 85-105; unpublished; dated Mar. 4, 1997; Silicon Graphics, Inc.

(List continued on next page.)
`
Primary Examiner: Ulka J. Chauhan
(74) Attorney, Agent, or Firm: Staas & Halsey LLP
`
(57) ABSTRACT

A method of a computer graphics system recirculates texture cache misses into a graphics pipeline without stalling the graphics pipeline, increasing the processing speed of the computer graphics system. The method reads data from a texture cache memory by a read request placed in the graphics pipeline sequence, then reads the data from the texture cache memory if the data is stored in the texture cache memory and places the data in the pipeline sequence. If the data is not stored in the texture cache memory, the method recirculates the read request in the pipeline sequence by indicating in the pipeline sequence that the data is not stored in the texture cache memory, placing the read request at a subsequent, determined place in the pipeline sequence, reading the data into the texture cache memory from a main memory, and executing the read request from the subsequent, determined place and after the data has been read into the texture cache memory.
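The recirculation summarized in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the function name `run_pipeline`, the `fetch_latency` parameter, and the choice to re-queue a missed request at the tail of a queue (standing in for the "subsequent, determined place" in the pipeline sequence) are all hypothetical.

```python
from collections import deque


def run_pipeline(requests, cache, main_memory, fetch_latency=3):
    """Toy model: serve texture read requests, recirculating misses.

    Assumption: one request is examined per cycle, and a missed request
    is re-queued at the tail instead of stalling the requests behind it.
    """
    pending = deque(requests)   # pipeline sequence of read requests
    in_flight = {}              # address -> cycle at which the fill completes
    results = []                # (address, data) in completion order
    cycle = 0
    while pending:
        # Finish any main-memory fills whose latency has elapsed.
        for addr, ready in list(in_flight.items()):
            if ready <= cycle:
                cache[addr] = main_memory[addr]
                del in_flight[addr]
        addr = pending.popleft()
        if addr in cache:
            results.append((addr, cache[addr]))   # hit: data enters the pipeline
        else:
            # Miss: start the fill once, then recirculate the request.
            in_flight.setdefault(addr, cycle + fetch_latency)
            pending.append(addr)
        cycle += 1
    return results, cycle


results, cycles = run_pipeline(["b", "a"], {"a": 1}, {"a": 1, "b": 2})
print(results, cycles)
```

In this model the request for `"a"` completes while the fill for `"b"` is still outstanding, which is the behavior the abstract describes: later requests proceed while the missed request circulates until its data is cached.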
`
`25 Claims, 10 Drawing Sheets
`
[Representative drawing: excerpt of FIG. 5, annotated "The recirculated quad arrives here at the same time as the open slot and replaces it in the subsequent pipe stages. Hit/miss is determined. L1 ≥ L2."]
`
`Volkswagen 1008
`
`

`
OTHER PUBLICATIONS (continued from page 1)

Computer Graphics, Principles and Practice, Second Edition in C, James D. Foley, Andries van Dam, Steven K. Feiner, and John F. Hughes, Addison-Wesley Publishing Company, Jul. 1997, Table of Contents (pp. xvii-xxiii), pp. 38-40, 67-227 (including plates I.1-I.34), 229-283, 471-531, 611, 614, 616-647 (including plates II.1-II.39), 741-745, 800, 805-853 (including plates III.1-III.28), 855-922, 923-1006, 1015-1018, and 1083-1112.

Open GL Programming Guide, Second Edition, The Official Guide to Learning Open GL, Version 1.1, Mason Woo, Jackie Neider, and Tom Davis, Addison-Wesley Developers Press, Jul. 1997, Table of Contents (pp. v-xv), pp. 1-14, 317-373, 529-553 and 609-627.

* cited by examiner
`
`

`
[Sheet 1 of 10. FIG. 1 (PRIOR ART): block diagram of OpenGL processing pipeline 10, showing display list 14, evaluator 16, per-vertex operations and primitive assembly 18, rasterization 20, per-fragment operations 22, framebuffer 24, pixel operations 26, and texture memory 28.]
`
`

`
[Sheet 2 of 10. FIG. 2 (PRIOR ART): texture image with texel indices along the s and t coordinate axes; the s and t coordinate values each span 0.0 to 1.0.]
`
`

`
[Sheet 3 of 10. FIG. 3: system block diagram of graphics system 30, showing CPUs 32₁ to 32ₙ, host memory, graphics subsystems 42₁ to 42ₘ, display backend 44, DAC 46, other I/O (disk, etc.), and a connection to a modem.]
`
`

`
[Sheet 4 of 10. FIG. 4: block diagram of graphics subsystem 42 and graphics chip 43, showing the XTALK connection, CROSSTALK interface 48 (with SDRAM DMA and SDRAM command FIFO), transform engine, raster unit, texture unit, shader unit, display unit 60, pins/pads 61, JTAG unit 63, SDRAM 50 (texture and frame buffer regions), and DAC 46.]
`
`

`
[Sheet 5 of 10. FIG. 5: overview of texture cache miss recirculation in graphics pipeline 62. Primitives enter the rasterization unit, which emits quads into the texturing pipeline; a recirculation path returns missed quads to the top of the texturing pipeline; SDRAM 70 feeds the texture cache. Annotations: "Rasterizer opens a slot in pipeline (sends nil data)." "The recirculated quad arrives here at the same time as the open slot and replaces it in the subsequent pipe stages." "Hit/miss is determined." "L1 ≥ L2".]
`
`

`
[Sheet 6 of 10. FIG. 6: detailed diagram of raster unit 54. Blocks shown include primitive parser, vertex denorm, reciprocal, line equation, subdivider, coverage, delta Z, subpixel select, line stipple, line gamma, line antialias, multiplier, bilerp, perspective, frame buffer cache (to/from SDRAM 50), and scheduler/fault controller. Outputs: IDX 92, barycentric 93, and CVG 95, with alpha.]
`
`

`
[Sheet 7 of 10. FIG. 7: detailed diagram of the texture unit between raster unit 54 and shader unit 58. Blocks shown include scheduler, texture bilerps 122, texture address reformatter 124, level of detail unit 126, texture address unit, texture cache with TCACHE control, texture mux, format, texture filters 140, and texture LUTs 143. Signals shown include pipelines 92, 93 and 95, the S, T, R coordinates, LOD, fault 133, and data from SDRAM 50.]
`
`

`
[Sheet 8 of 10. FIG. 8: detailed diagram of recirculation control between raster unit 54 and the texture unit. Elements shown include the scheduler/controller, fault signal 133, pipelines 92, 93 and 95, a recirculation select signal, signals from the raster unit, elements 148, 150, 152 and 154, and the input "from recirculation tap point".]
`
`

`
[Sheet 9 of 10. FIG. 9: example of a graphic footprint likely to produce a texture cache miss; reference numerals 160, 162 and 164.]
`
`

`
[Sheet 10 of 10. FIG. 10: flowchart of texture cache miss recirculation, with step labels S10 through S100. Recoverable box text, in reading order:

- Input: new (un-recirculated) quads or open slots.
- Determine 2×2 regions for each sleeping pixel in quad in both fine and coarse LODs.
- Compute a footprint in each LOD that covers as many of the above 2×2 regions as possible, with a max footprint size of 4×4 texels.
- Read two footprints from texture cache.
- Select the 2×2 regions (up to 4) from each footprint corresponding to S10 based on offset of each 2×2 within each footprint.
- Wake up the sleeping pixels whose 2×2 regions were completely covered by the two footprints computed in S20.
- At some point, but not necessarily before the quad returns to S10, the required texels will be retrieved from SDRAM 50.
- Raster unit 54 opens empty slot in pipeline. Quad is recirculated. Attribute tokens, tags, and barycentric coordinates are synchronized with open slot.
- Processing continues.

The text of the decision boxes (yes/no branches) is not recoverable.]
`
`

`
METHOD FOR EFFICIENT HANDLING OF TEXTURE CACHE MISSES BY RECIRCULATION

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is related to U.S. patent application Ser. No. 09/048,098, entitled A Method for Improving Texture Locality for Pixel Quads by Diagonal Level-of-Detail Calculation, by Carroll Philip Gossett, filed concurrently herewith and which is incorporated herein by reference; U.S. patent application Ser. No. 09/048,024, entitled A Method for Accelerating Minified Texture Cache Access, by Carroll Philip Gossett, now U.S. Pat. No. 6,104,415, filed concurrently herewith and which is incorporated herein by reference; and U.S. patent application Ser. No. 08/956,537, entitled A Method and Apparatus For Providing Image and Graphics Processing Using A Graphics Rendering Engine, filed Oct. 23, 1997 and which is incorporated herein by reference.
`
`BACKGROUND OF THE INVENTION
`
1. Field of the Invention

The present invention is related to computer graphics hardware for which OPENGL (GRAPHICS LIBRARY) software is an interface thereto, and, in particular, to efficiently recirculating texture cache misses in the computer graphics hardware.

2. Description of the Related Art

Interactive graphics display, in general, is explained in Computer Graphics: Principles and Practices, Foley, vanDam, Feiner, and Hughes, Addison-Wesley, 1992, and in Principles of Interactive Computer Graphics, William M. Newman and Robert F. Sproull, Second Edition, McGraw-Hill Book Company, New York, 1979. Interactive graphics display generally includes a frame buffer storing pixels (or picture elements), a display, and a display controller that transmits the contents of the frame buffer to the display.

The OpenGL graphics system is a software interface to graphics hardware, and is explained in the OpenGL Programming Guide, The Official Guide to Learning OpenGL, Second Edition, Release 1.1, by the OpenGL Architecture Review Board, Jackie Neider, Tom Davis, Mason Woo, Addison-Wesley Developers Press, Reading, Mass., 1997, in the OpenGL Programming Guide, The Official Guide to Learning OpenGL, Release 1, by the OpenGL Architecture Review Board, Jackie Neider, Tom Davis, Mason Woo, Addison-Wesley Publishing Company, Reading, Mass., 1993, and in the OpenGL Reference Manual, The Official Reference Document for OpenGL, Release 1, the OpenGL Architecture Review Board, Addison-Wesley Publishing Company, Reading, Mass., 1993.

A computer model for interpretation of OpenGL commands is a client-server model. An application program being executed by one computer, typically the client computer, issues commands, which are interpreted and processed by another computer, typically the server computer, on which OpenGL is implemented. The client may or may not operate on the same computer as the server. A computer, then, can make calls through an OpenGL software interface to graphics hardware, and the graphics hardware can reside either on the same computer making the calls or on a remote computer.

A tool for describing how data is processed in OpenGL is a processing pipeline. The OpenGL processing pipeline 10 shown in FIG. 1 receives commands, and may store the commands for future processing in a display list 14 or execute the commands immediately. The OpenGL processing pipeline includes an evaluator 16, which approximates curve and surface geometry by evaluating polynomial commands of the input values; per-vertex operations and primitive assembly 18, in which geometric primitives such as points, line segments, and polygons, described by vertices, are processed, such as transforming and lighting the vertices, and clipping the primitives into a viewing volume; rasterization 20 produces a series of frame buffer addresses and associated values, and converts a projected point, line, or polygon, or the pixels of a bitmap or image, to fragments, each corresponding to a pixel in the framebuffer; per-fragment operations 22 performs operations such as conditional updates to the frame buffer 24 based on incoming and previously stored depth values (to effect depth buffering) and blending of incoming pixel colors with stored colors, masking, and other logical operations on pixel values. The final pixel values are then stored in the frame buffer 24.

Pixel operations 26 processes input data from the commands 12 which is in the form of pixels rather than vertices. The result of the pixel operations 26 is stored in texture memory 28, for use in rasterization 20. The resulting fragments are merged into the frame buffer 24 as if the fragments were generated from geometric data.

In addition, if texturing is enabled during rasterization 20, a texel is generated from texture memory 28 for each fragment and applied to the fragment. A texel is a texture element obtained from texture memory and represents the color of the texture to be applied to a corresponding fragment. Texturing maps a portion of a specified texture image onto each primitive.

Texture mapping is accomplished by using the color (Red (R), Green (G), Blue (B), or Alpha (A)) of the texture image at the location indicated by the fragment's (s, t, and r) coordinates. In the case of a 2-dimensional image (2-D image), s and t coordinates are applicable, and in the case of a 3-dimensional image (3-D image), then s, t, and r coordinates are applicable.

An example of a texture image 29 and the coordinates used to access it is shown in FIG. 2. FIG. 2 shows a two-dimensional texture image with n×m dimensions of n=3 and m=2. A one-dimensional texture would comprise a single strip. The values, α and β, used in blending adjacent texels to obtain a texture value are also shown. As shown in FIG. 2, values of s and t coordinates are each in the range of 0.0 to 1.0.

In implementing the OpenGL processing pipeline in the related art, a memory stores textures of images. Some of the textures are read into a texture cache during system initialization, while others are read into the texture cache upon a texture cache miss. Although there are many reasons that a system may experience a texture cache fault, such as during heavy minification of an image, most texture cache faults are data-dependent, and depend upon the s, t, and r coordinates which are calculated in the OpenGL processing pipeline.

If a texture cache fault occurs in the related art, then the OpenGL processing pipeline must be stalled to allow the system to retrieve the required data from the memory, and store it in the texture cache. However, the OpenGL processing pipeline is difficult to stall in that a performance penalty is assessed in completing the OpenGL processing pipeline and displaying an image. In addition, stalling the OpenGL processing pipeline would typically require a gated clock
`
and/or a multiplexer to be placed at the input of every flipflop used in the OpenGL processing pipeline.

A problem in the related art is that texture cache faults occur in retrieving textures from the texture cache for pixels already launched in the OpenGL processing pipeline, requiring that the OpenGL processing pipeline be stalled.

Another problem in the related art is that performance of the OpenGL processing pipeline is reduced when texture cache faults occur.

A further problem in the related art is that the OpenGL processing pipeline must be stalled to allow data to be retrieved from the memory and read into the texture cache when texture cache faults occur.

SUMMARY OF THE INVENTION

The present invention solves the above-mentioned problems of the related art.

An object of the present invention is to recirculate texture cache misses into the OpenGL processing pipeline.

Another object of the present invention is to process texture cache misses without stalling the OpenGL processing pipeline.

A further object of the present invention is to maintain OpenGL processing pipeline performance if a texture cache fault occurs.

The present invention overcomes the problems in the related art and accomplishes the above-mentioned objects.

The present invention recirculates a texture cache request (i.e., a texture cache read request) resulting in a texture cache miss into a predetermined, earlier stage of the OpenGL processing pipeline, without stalling the OpenGL processing pipeline.

The present invention increases the performance of a graphics chip implementing the present invention by enabling the graphics chip to run at a relatively higher clock rate, increasing the performance of the graphics chip, but not stalling the graphics pipeline of approximately 150 stages being executed by the graphics chip. The present invention can be implemented at a low cost and with minimal design complexity.

If a texture cache request resulting in a texture cache miss occurs, the present invention processes the next texture cache request without stalling the OpenGL processing pipeline while the data requested by the faulted texture cache request is read in from the texture memory. Instead of stalling the OpenGL processing pipeline as in the prior art, the present invention transmits a signal to circuitry corresponding to a prior step in the OpenGL processing pipeline, and reinserts the texture cache request that resulted in the texture cache miss into the prior step, while, concurrently, the requested data is read from the texture memory and stored in the texture cache. Consequently, when the previously-faulted texture cache request is again presented to the texture cache, the requested data is stored and available in the texture cache, for retrieval and use in response to the texture cache request. If the requested data remains unavailable when the texture cache request is again presented to the texture cache, then the texture cache request is recirculated subsequently, and repeatedly, until the requested data is available for retrieval from the texture cache when the texture cache request is presented to the texture cache.

By the method of the present invention, texture cache requests are processed without stalling the OpenGL processing pipeline.

These together with other objects and advantages which will be subsequently apparent, reside in the details of construction and operation as more fully hereinafter described and claimed, reference being had to the accompanying drawings forming a part hereof, wherein like numerals refer to like parts throughout.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram of an OpenGL processing pipeline of the related art;
FIG. 2 is a diagram of a texture image and the coordinates used to access it;
FIG. 3 is a block diagram of a graphics system according to the present invention;
FIG. 4 is a block diagram of a hardware organization of a chip implementing the present invention;
FIG. 5 is an overview of the present invention;
FIG. 6 is a detailed diagram of raster unit of a chip implementing the present invention;
FIG. 7 is a detailed diagram of a texture unit of the present invention in relation to a raster unit and a shader unit of the graphics chip implementing the present invention;
FIG. 8 is a detailed diagram of recirculation control in the present invention;
FIG. 9 is a diagram showing an example of a graphic footprint likely to produce a texture cache miss; and
FIG. 10 is a flowchart of the method of texture cache miss recirculation of the present invention.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention resides in a single-chip implementation of OpenGL, in which matrices are composed on a host computer and evaluated on the single chip.

The present invention is directed to recirculating a texture cache request resulting in a texture cache miss into the OpenGL processing pipeline without stalling the OpenGL processing pipeline. The present invention is explained in detail herein below, after an explanation of the preferred embodiment of the graphics subsystem 42 of the graphics system 30 in which the present invention is implemented.

As shown in FIG. 3, in the graphics system 30 according to the present invention, central processing units (CPUs) 32₁ through 32ₙ execute OpenGL software commands 12 (not shown in FIG. 3), using memory agent 34 and host memory 36. A command stream, which is analogous to subroutine calls calling an OpenGL API library, is pushed immediately by the CPU to be executed by the hardware implementing the OpenGL system, and, accordingly, a push model is representative of the OpenGL system.

The memory agent 34 then transmits the commands 12 to crossbar switch 40. Then, commands 12 are transmitted to graphic subsystems 42₁ through 42ₘ, which process the commands 12 in a token stream (commands, including GL commands, are mapped to integer tokens), as described in further detail herein below. After graphic subsystems 42₁ through 42ₘ process the commands 12, a display backend 44 transfers the processed data to digital-to-analog (DAC) converter 46, then to a monitor for display.

FIG. 4 is a block diagram of a graphic subsystem 42 of the present invention. Graphics subsystem 42 comprises graphics chip 43 and SDRAM 50, coupled to graphics chip 43.

As shown in FIG. 4, CROSSTALK interface 48, which is also referred to as a graphics front end, interfaces to the rest of the graphic system 30 through XTALK (or CROSSTALK). CROSSTALK is a router/connection unit
`

`
available from Silicon Graphics, Inc. Graphics front end 48 comprises a XTALK peer-to-peer interface, a DMA engine (including formatting), and a command first-in, first-out (fifo) buffer. The command fifo is maintained in SDRAM 50, as shown in FIG. 4. The command fifo is nominally 1 ms. worth of latency.

Graphics front end 48 also directs incoming streams to intended destinations, provides all support for chained graphics subsystems 42 in multi-subsystems 42 configurations, provides access in and out for the SDRAM 50, provides DMA channels for graphics and video, formats input and output streams, manages context switching and context states, provides a read path for graphics registers, and provides access to the display backend 44 through Display Unit 60.

Also as shown in FIG. 4, Transform Engine 52 interfaces to both the graphics front end 48 and the raster unit 54 on a first-in, first-out basis, receives commands and data from the graphics front end 48, and sends computed data to raster unit 54. The main computations performed in the transform engine 52 include geometry transforms of both vertices and normals (MVP and MV matrices). Transform Engine 52 is responsible for all geometry calculations in graphics subsystem 42, including performing vertex transforms and lighting computations for Phong Shading, and Gouraud Shading. In addition, Transform Engine 52 performs texture transform.

Raster unit 54 of FIG. 4 parses command tokens transmitted from the Transform Engine 52, schedules all SDRAM 50 memory transactions, rasterizes each primitive by recursive subdivision, and generates perspective-corrected barycentric parameters which are used to bi-lerp (bilinear interpolate) among the 3 vertices for each triangle. Raster unit 54 also includes the framebuffer cache.

In addition, raster unit 54 includes line and point antialiasing, and the control for a framebuffer cache. Frustum clipping is effectively performed by the recursive subdivision rasterization in raster unit 54, and user clipped planes are performed using the sign bit of the bi-lerps for the texture coordinates to invalidate user-clipped pixels.
Barycentric coordinates for a triangle are a set of three numbers a, b, and c, each in the range of (0,1), with a+b+c=1 and which uniquely specify any point within the triangle or on the triangle's boundary. For example, a point P in a triangle having vertices A, B, and C, and area a from the triangle having vertices P, B, and C (the edge within the triangle opposite from the vertex A), area b from the triangle having vertices P, C, and A (the edge within the triangle opposite from the vertex B), and area c from the triangle having vertices P, A, and B (the edge within the triangle opposite from the vertex C) is given by

P = (A×a + B×b + C×c)/(a+b+c).
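As a worked illustration of the area-based definition above, the following sketch computes the three sub-triangle areas for a point inside a 2-D triangle and then recovers the point from the weighted vertices. The function names are hypothetical, and the unsigned-area formulation assumes P lies inside or on the boundary of the triangle.

```python
def triangle_area(p, q, r):
    """Unsigned area of the 2-D triangle with vertices p, q, r."""
    return abs((q[0] - p[0]) * (r[1] - p[1])
               - (r[0] - p[0]) * (q[1] - p[1])) / 2.0


def barycentric_weights(p, a, b, c):
    """Areas opposite A, B and C, normalized so they sum to 1.

    Assumes p is inside (or on the boundary of) triangle ABC.
    """
    area_a = triangle_area(p, b, c)   # sub-triangle opposite vertex A
    area_b = triangle_area(p, c, a)   # sub-triangle opposite vertex B
    area_c = triangle_area(p, a, b)   # sub-triangle opposite vertex C
    total = area_a + area_b + area_c
    return area_a / total, area_b / total, area_c / total


def point_from_weights(weights, a, b, c):
    """P = (A*a + B*b + C*c) / (a + b + c), applied per coordinate."""
    wa, wb, wc = weights
    total = wa + wb + wc
    return tuple((a[i] * wa + b[i] * wb + c[i] * wc) / total
                 for i in range(2))


A, B, C = (0.0, 0.0), (4.0, 0.0), (0.0, 4.0)
w = barycentric_weights((1.0, 1.0), A, B, C)
print(w, point_from_weights(w, A, B, C))
```

The same weights can be used to interpolate any per-vertex quantity (color, depth, texture coordinates) by substituting that quantity for the vertex positions in `point_from_weights`.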
Raster unit 54 also generates an attribute tag pipeline 95 and a barycentric pipeline 93, which are discussed in further detail herein below, and which are generated after generation of the barycentric coordinates. Raster unit 54 performs Hilbert curve rasterization.

A graphics pipeline may include attributes, a coverage mask, and barycentric coordinates. In the present invention, the graphics pipeline would include the attribute token pipeline 92, the barycentric pipeline 93, the attribute tag pipeline 95, and hardware elements comprising the raster unit 54, the texture unit 56, and the shader unit 58, explained in detail herein below.

Raster unit 54 receives the attribute token pipeline 92 from software executed on a host computer. The attribute token pipeline 92 transmits data originating from OpenGL calls executed on the host computer. The attribute token pipeline 92 is formed when the above-mentioned OpenGL calls are translated into the data by a driver available from Silicon Graphics, Inc. running on the host computer and are transmitted to the graphics chip 43.
Also as shown in FIG. 4, texture unit 56 includes level of detail calculation, texture addressing, control for the texture cache, the texture tree filter for lerps (linearly interpolate) and the TLUT (texture color look-up table). Texture unit 56 also includes an SRAM for an on-chip texture cache, and the texture cache SRAM is organized as 16 banks×512 words×48 bits. Texture unit 56 is explained in further detail herein below.
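The stated organization of the texture cache SRAM fixes its raw capacity. The short calculation below is only arithmetic on the three figures given in the text; the function name and the returned dictionary keys are hypothetical, and nothing is assumed about how texels are packed into the 48-bit words.

```python
def sram_capacity(banks, words_per_bank, bits_per_word):
    """Raw storage of a banked SRAM, from its stated organization."""
    total_words = banks * words_per_bank
    total_bits = total_words * bits_per_word
    return {
        "words": total_words,
        "bits": total_bits,
        "bytes": total_bits // 8,
        "kib": total_bits / 8 / 1024,
    }


# 16 banks x 512 words x 48 bits, as stated for the texture cache SRAM.
capacity = sram_capacity(banks=16, words_per_bank=512, bits_per_word=48)
print(capacity)
```

With these figures the cache holds 8,192 words, or 393,216 bits, which is 49,152 bytes (48 KiB) of raw storage.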
Shader unit 58 shown in FIG. 4 includes shading and depth bi-lerps, per-pixel lighting, pre-lighting texture environments and post-lighting texture environments, fog, multi-fragment polygon antialiasing, and per-fragment tests and operations.

A display unit 60 is provided in each graphics subsystem 42, as shown in FIG. 4. The display backend 44 shown in FIG. 3 includes the display units 60 of the graphics subsystems 42, and additional functions. The display unit 60 shown in FIG. 4 includes all of the pixel processing which occurs between the framebuffer and the output of the graphics subsystem 42 to the DAC or display backend 44. The display backend 44 combines the output from multiple graphic subsystems 42 for output by the DAC 46, or divides the output from the graphics system 30 to up to 4 DACs 46.

The display unit 60 transfers pixels and overlay/WID data from the framebuffer interface into first-in, first-out queues (FIFOs), changes pixels from the framebuffer format into a standard format RGB component format, maps color indexed into final RGB pixel values and applies gamma correction to all pixel data, generates all timing control signals for the various parts of the display unit 60, and provides read and write access to registers and memories in the display unit 60.

The graphics chip 43 of the present invention also includes pins/pads 61 for physical access to the graphics chip 43, and JTAG unit 63 for chip testing purposes.

The focus of the present invention resides in Raster Unit 54 and Texture Unit 56, which are described in greater detail with reference to FIGS. 6- U, after an overview of texture cache miss recirculation in accordance with the present invention as shown in FIG. 5.
FIG. 5 is a diagram showing the general operation of the present invention in graphics pipelines, and is applicable to graphics pipelines including OpenGL, Microsoft DIRECT 3D, and other graphics pipelines. Texture cache recirculation in accordance with the present invention involves graphics pipeline 62, including a rasterization unit 64, a texture unit 66, a shading unit 68, and an SDRAM (synchronous dynamic random access memory) 70.

As shown in FIG. 5, primitives at the level of a triangle enter the rasterization unit 64 and are rasterized into pixels covering the area of a primitive. The pixels are grouped into sets of four referred to as pixel quads (or quads) and transmitted into the L2 portion of the pipeline 74. In the texturing unit 66, the x, y, z address of the pixels and the corresponding texture address s, t, and r are examined. Each s, t, and r address corresponds to a given pixel, and the s, t, and r addresses flow through the texturing pipeline 74 after the gating unit 80. However, for the purposes of explanation, pixel quads are referred to in the remainder of FIG. 5. In addition, the present invention is applicable to both 2-dimensional images and 3-dimensional images.
`
The present invention recirculates a texture cache miss without stalling the graphics pipeline below controller 78, as shown in FIG. 5. Therefore, the constraint L1≥L2 (the delay of a pixel quad transmitted through the L1 portion of the graphics pipeline must be greater than or equal to the delay through L2) is imposed by the raster controller 78 so that the raster controller 78 has enough time to create an open (or empty) slot in the raster pipeline 72 to send nil data instead of transmitting another pixel quad. Accordingly, when a pixel quad is being recirculated back to the top of the texture unit 66, the constraint of L1≥L2 allows the recirculated pixel quad to arrive at the gating circuit 80 at the same time as the open slot created by the raster controller 78. The recirculated pixel quad replaces the open slot in the remaining stages of the graphics pipeline 62. L2 and L1 are predetermined numbers, dependent upon the physical design of a graph
