United States Patent [19]
Baldwin

US005815166A
[11] Patent Number: 5,815,166
[45] Date of Patent: Sep. 29, 1998

[54] GRAPHICS SUBSYSTEM WITH SLAVEABLE RASTERIZER

[75] Inventor: David Robert Baldwin, Weybridge, United Kingdom

[73] Assignee: 3DLabs Inc., Ltd., Hamilton, Bermuda

[21] Appl. No.: 410,354

[22] Filed: Mar. 24, 1995

[51] Int. Cl.6 ..... G06T 1/20
[52] U.S. Cl. ..... 345/506; 345/509; 345/520
[58] Field of Search ..... 395/162, 163, 164, 118, 119, 140-143, 502, 506, 520, 513, 509, 522; 345/112, 185, 189, 502, 506, 509, 513, 520, 522, 418, 419, 440-443

[56] References Cited

U.S. PATENT DOCUMENTS

4,727,363   2/1988    Ishii ..... 345/190
5,136,664   8/1992    Bersack et al. ..... 395/141
5,185,599   2/1993    Doornink et al. ..... 345/200
5,251,322   10/1993   Doyle et al. ..... 395/162
5,287,442   2/1994    Alcorn et al. ..... 395/141

OTHER PUBLICATIONS

Electronic Imaging '88, International Electronic Imaging Exposition & Conference, Mar. 28-31, 1988, Published by Institute for Graphic Comm., Inc., "A Low Cost Imaging Workstation Using the Commodore Amiga and NEC's Image Pipelined Processors", by Miner et al., pp. 422-427.
Deering et al., "Leo: A System for Cost Effective 3D Shaded Graphics," Computer Graphics, pp. 101-108 (1993).
Dunnett et al., "The Image Chip for High Performance 3D Rendering," IEEE Computer Graphics & Applications, pp. 41-52 (1992).
Gharachorloo et al., "Subnanosecond Pixel Rendering with Million Transistor Chips," Computer Graphics, vol. 4, pp. 41-49 (1988).
Akeley et al., "High-Performance Polygon Rendering," Computer Graphics, vol. 22, pp. 239-246 (1988).
Molnar et al., "PixelFlow: High-Speed Rendering Using Image Composition," Computer Graphics, vol. 26, pp. 231-240 (1992).
Akeley, "RealityEngine Graphics," Computer Graphics Proc., pp. 109-116 (1993).
Harrell, "Graphics Rendering Architecture for a High Performance Desktop Workstation," Computer Graphics Proc., pp. 93-100 (1993).
Juan Pineda, "A Parallel Algorithm for Polygon Rasterization," 22 Computer Graphics, pp. 17-20 (1988).
Kirk et al., "The Rendering Architecture of the DN10000VS," Computer Graphics, vol. 24, pp. 299-307 (1990).

Primary Examiner-Kee M. Tung
Attorney, Agent, or Firm-Robert Groover; Betty Formby; Matthew Anderson

[57] ABSTRACT

A graphics processing system with a message-passing architecture, in which the rasterizer can be bypassed by a particular type of message from the host. This permits rasterization to be slaved to host downloads and bit masks, so that images and patterns can be applied to lines and polygons, rather than just to rectangles as in the prior art.

21 Claims, 7 Drawing Sheets

[Representative drawing: Graphics Processor FIFO (in) and Graphics Processor FIFO (out); Local Buffer Interface with Local Buffer and Local Buffer Bypass; Framebuffer Interface with Frame-Buffer and Framebuffer Bypass; labelled "GLiNT2".]

Volkswagen 1012


U.S. Patent    Sep. 29, 1998    Sheet 1 of 7    5,815,166

[FIG. 1A: flowchart. World coordinates (3D) -> transform into view coordinates and canonical view volume -> view coordinates (3D) -> clip against canonical view volume -> view coordinates (3D) -> project onto view plane -> view coordinates (2D) -> map into viewport -> normalized device coordinates -> transform to physical device coordinates -> physical device coordinates -> render.]

[FIGS. 4A and 4B: triangle with its dominant side and subordinate sides marked; further labels "Trapezoid B", "count1", "count2" and "count3".]


[FIG. 1B: OpenGL flow of operations. Vertex, Normal, Color, Index and TexCoord commands set the current vertex, normal, color and texture coordinates, and RasterPos sets the current raster position. Vertices pass through the model-view matrix, lighting and coloring, texgen and the texture matrix, primitive assembly, view volume clipping, application-specific clipping, the projection matrix, and divide by W/viewport, then to rasterization, which produces fragments for the per-fragment operations and the frame buffer. DrawPixels, ReadPixels and TexImage operate through the pixel storage modes and pixel transfer modes, with TexImage loading texture memory.]


[FIG. 2A: block diagram of the graphics rendering chip. The host interface and Graphics Processor FIFO (in) feed a chain of units: Rasterizer, Scissor/Stipple, Color DDA, Texture/Fog/Color, Alpha Test, Local Buffer Read, GID/Depth/Stencil, Local Buffer Write, Framebuffer Read, Alpha Blend, Dither, Logical Ops, Framebuffer Write and Host Out, which feeds the Graphics Processor FIFO (out). A Local Buffer Interface Unit and a Framebuffer Interface Unit each have 24-bit read and write address buses; the local buffer data buses are 52 bits wide and the framebuffer data buses 32 bits wide (bus widths in bits unless otherwise noted).]


[FIG. 2B: block diagram of the alternative embodiment with additional texture capability. As FIG. 2A, with a Texture Unit containing Texture Address (TAddr) and Texture Read (TRd) blocks, and with Local Buffer Bypass and Framebuffer Bypass paths. The graphics core comprises Rasterizer, Stipple, Scissor, Color DDA, Texture, Fog, Color, Alpha Test, LB Rd, GID/Z/Stencil, LB Wr, FB Rd, Blend, Dither, Logical Ops, FB Wr and Host Out, between the Graphics Processor FIFO (in) and FIFO (out); labels "GLiNT2", "Frame Buffer" and "Local Buffer".]


[FIG. 2C: sequence of operations: Rasterizer -> Scissor Test -> Stipple -> Color DDA -> Texture -> Fog -> Anti-alias Application -> Alpha Test -> LB Read -> Pixel Ownership (GID) -> Stencil Test -> Depth Test -> LB Write (to the local buffer) -> FB Read (from the framebuffer) -> Alpha Blend -> Color Format (Dither) -> Logical Op/Write Mask -> FB Write (to the framebuffer) -> Host Out.]


[FIG. 3A: plug-in card on the PCI local bus carrying a GLINT 300SX with its local buffer, VRAM and LUT-DAC; the host CPU is also on the PCI local bus.]

[FIG. 3B: plug-in card on the PCI local bus carrying a PCI-PCI bridge, a local geometry processor, and a GLINT 300SX with its local buffer, VRAM and LUT-DAC.]


[FIG. 3C: plug-in card on the PCI local bus with a PCI-PCI bridge, a GUI accelerator and a GLINT 300SX (with its own local buffer) sharing a framebuffer and LUT-DAC.]

[FIG. 3D: plug-in card on the PCI local bus with a PCI-PCI bridge, a video coprocessor and a GLINT 300SX (with its own local buffer) sharing a framebuffer and LUT-DAC.]


GRAPHICS SUBSYSTEM WITH SLAVEABLE RASTERIZER

BACKGROUND AND SUMMARY OF THE INVENTION

The present application relates to computer graphics and animation systems, and particularly to graphics rendering hardware.

Background: Computer Graphics and Rendering

Modern computer systems normally manipulate graphical objects as high-level entities. For example, a solid body may be described as a collection of triangles with specified vertices, or a straight line segment may be described by listing its two endpoints with three-dimensional or two-dimensional coordinates. Such high-level descriptions are a necessary basis for high-level geometric manipulations, and also have the advantage of providing a compact format which does not consume memory space unnecessarily.

Such higher-level representations are very convenient for performing the many required computations. For example, ray-tracing or other lighting calculations may be performed, and a projective transformation can be used to reduce a three-dimensional scene to its two-dimensional appearance from a given viewpoint. However, when an image containing graphical objects is to be displayed, a very low-level description is needed. For example, in a conventional CRT display, a "flying spot" is moved across the screen (one line at a time), and the beam from each of three electron guns is switched to a desired level of intensity as the flying spot passes each pixel location. Thus at some point the image model must be translated into a data set which can be used by a conventional display. This operation is known as "rendering."

The graphics-processing system typically interfaces to the display controller through a "frame store" or "frame buffer" of special two-port memory, which can be written to randomly by the graphics processing system, but also provides the synchronous data output needed by the video output driver. (Digital-to-analog conversion is also provided after the frame buffer.) Such a frame buffer is usually implemented using VRAM memory chips (or sometimes with DRAM and special DRAM controllers). This interface relieves the graphics-processing system of most of the burden of synchronization for video output. Nevertheless, the amounts of data which must be moved around are very sizable, and the computational and data-transfer burden of placing the correct data into the frame buffer can still be very large.

Even if the computational operations required are quite simple, they must be performed repeatedly on a large number of datapoints. For example, in a typical 1995 high-end configuration, a display of 1280x1024 elements may need to be refreshed at 72 Hz, with a color resolution of 24 bits per pixel. If blending is desired, additional bits (e.g. another 8 bits per pixel) will be required to store an "alpha" or transparency value for each pixel. This implies manipulation of more than 3 billion bits per second, without allowing for any of the actual computations being performed. Thus it may be seen that this is an environment with unique data manipulation requirements.

If the display is unchanging, no demand is placed on the rendering operations. However, some common operations (such as zooming or rotation) will require every object in the image space to be re-rendered. Slow rendering will make the rotation or zoom appear jerky. This is highly undesirable.

Thus efficient rendering is an essential step in translating an image representation into the correct pixel values. This is particularly true in animation applications, where newly rendered updates to a computer graphics display must be generated at regular intervals.

The rendering requirements of three-dimensional graphics are particularly heavy. One reason for this is that, even after the three-dimensional model has been translated to a two-dimensional model, some computational tasks may be bequeathed to the rendering process. (For example, color values will need to be interpolated across a triangle or other primitive.) These computational tasks tend to burden the rendering process. Another reason is that since three-dimensional graphics are much more lifelike, users are more likely to demand a fully rendered image. (By contrast, in the two-dimensional images created e.g. by a GUI or simple game, users will learn not to expect all areas of the scene to be active or filled with information.)

FIG. 1A is a very high-level view of other processes performed in a 3D graphics computer system. A three-dimensional image which is defined in some fixed 3D coordinate system (a "world" coordinate system) is transformed into a viewing volume (determined by a view position and direction), and the parts of the image which fall outside the viewing volume are discarded. The visible portion of the image volume is then projected onto a viewing plane, in accordance with the familiar rules of perspective. This produces a two-dimensional image, which is now mapped into device coordinates. It is important to understand that all of these operations occur prior to the operations performed by the rendering subsystem of the present invention. FIG. 1B is an expanded version of FIG. 1A, and shows the flow of operations defined by the OpenGL standard.

A vast amount of engineering effort has been invested in computer graphics systems, and this area is one of increasing activity and demands. Numerous books have discussed the requirements of this area; see, e.g., ADVANCES IN COMPUTER GRAPHICS (ed. Enderle 1990-); Chellappa and Sawchuk, DIGITAL IMAGE PROCESSING AND ANALYSIS (1985); COMPUTER GRAPHICS HARDWARE (ed. Reghbati and Lee 1988); COMPUTER GRAPHICS: IMAGE SYNTHESIS (ed. Joy et al.); Foley et al., FUNDAMENTALS OF INTERACTIVE COMPUTER GRAPHICS (2.ed. 1984); Foley, COMPUTER GRAPHICS PRINCIPLES & PRACTICE (2.ed. 1990); Foley, INTRODUCTION TO COMPUTER GRAPHICS (1994); Giloi, Interactive Computer Graphics (1978); Hearn and Baker, COMPUTER GRAPHICS (2.ed. 1994); Hill, COMPUTER GRAPHICS (1990); Latham, DICTIONARY OF COMPUTER GRAPHICS (1991); Magnenat-Thalmann, IMAGE SYNTHESIS THEORY & PRACTICE (1988); Newman and Sproull, PRINCIPLES OF INTERACTIVE COMPUTER GRAPHICS (2.ed. 1979); PICTURE ENGINEERING (ed. Fu and Kunii 1982); PICTURE PROCESSING & DIGITAL FILTERING (2.ed. Huang 1979); Prosise, HOW COMPUTER GRAPHICS WORK (1994); Rimmer, BIT MAPPED GRAPHICS (2.ed. 1993); Salmon, COMPUTER GRAPHICS SYSTEMS & CONCEPTS (1987); Schachter, COMPUTER IMAGE GENERATION (1990); Watt, THREE-DIMENSIONAL COMPUTER GRAPHICS (2.ed. 1994); Scott Whitman, MULTIPROCESSOR METHODS FOR COMPUTER GRAPHICS RENDERING; the SIGGRAPH PROCEEDINGS for the years 1980-1994; and the IEEE Computer Graphics and Applications magazine for the years 1990-1994.
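The data-rate estimate given above for a 1995 high-end display can be reproduced with a short calculation. This is an illustrative sketch only: it counts pixel storage alone (no rendering work), assumes every stored bit is touched once per refresh, and the function name `raw_bits_per_second` is hypothetical.

```python
def raw_bits_per_second(width: int, height: int, refresh_hz: int,
                        color_bits: int, alpha_bits: int = 0) -> int:
    """Bits that must be moved per second just to refresh the display.

    Counts only per-pixel storage (color plus optional alpha) read once per
    refresh; no rendering computation is included.
    """
    bits_per_pixel = color_bits + alpha_bits
    return width * height * refresh_hz * bits_per_pixel


# The configuration quoted in the text: 1280x1024 at 72 Hz, 24-bit color.
color_only = raw_bits_per_second(1280, 1024, 72, color_bits=24)
with_alpha = raw_bits_per_second(1280, 1024, 72, color_bits=24, alpha_bits=8)
print(f"{color_only / 1e9:.2f} Gbit/s without alpha")   # 2.26 Gbit/s without alpha
print(f"{with_alpha / 1e9:.2f} Gbit/s with 8-bit alpha")  # 3.02 Gbit/s with 8-bit alpha
```

With the extra 8 alpha bits the total is just over 3 billion bits per second, consistent with the text; 24-bit color alone would come to roughly 2.26 billion.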

Background: Graphics Animation

In many areas of computer graphics a succession of slowly changing pictures is displayed rapidly one after the other, to give the impression of smooth movement, in much the same way as for cartoon animation. In general the higher the speed of the animation, the smoother (and better) the result.

When an application is generating animation images, it is normally necessary not only to draw each picture into the frame buffer, but also to first clear down the frame buffer, and to clear down auxiliary buffers such as depth (Z) buffers, stencil buffers, alpha buffers and others. A good treatment of the general principles may be found in Computer Graphics: Principles and Practice, James D. Foley et al., Reading Mass.: Addison-Wesley. A specific description of the various auxiliary buffers may be found in The OpenGL Graphics System: A Specification (Version 1.0), Mark Segal and Kurt Akeley, SGI.

In most applications the value written, when clearing any given buffer, is the same at every pixel location, though different values may be used in different auxiliary buffers. Thus the frame buffer is often cleared to the value which corresponds to black, while the depth (Z) buffer is typically cleared to a value corresponding to infinity.

The time taken to clear down the buffers is often a significant portion of the total time taken to draw a frame, so it is important to minimize it.
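As a minimal sketch, assuming one color buffer and one depth buffer stored as flat arrays, a clear writes a single constant into every location of each buffer, so its cost grows with the pixel count times the number of buffers cleared. The class and constant names are hypothetical; a real depth buffer would hold fixed-point values and use its largest code rather than a floating-point infinity.

```python
from typing import List

BLACK = 0x000000          # assumed packed 24-bit RGB value for black
FAR_DEPTH = float("inf")  # stands in for "a value corresponding to infinity"


class FrameBuffers:
    """Hypothetical model of a color buffer plus a depth (Z) buffer."""

    def __init__(self, width: int, height: int) -> None:
        self.width = width
        self.height = height
        n = width * height
        self.color: List[int] = [0] * n
        self.depth: List[float] = [0.0] * n

    def clear(self, color_value: int = BLACK,
              depth_value: float = FAR_DEPTH) -> int:
        """Write one constant into every pixel of each buffer.

        Returns the number of per-pixel writes performed.
        """
        n = self.width * self.height
        self.color[:] = [color_value] * n
        self.depth[:] = [depth_value] * n
        return 2 * n


buffers = FrameBuffers(1280, 1024)
print(buffers.clear())  # 2 buffers x 1,310,720 pixels = 2621440 writes
```

Every additional auxiliary buffer (stencil, alpha and so on) adds another full pass of writes, which is why clearing can take a significant share of frame time.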

Background: Parallelism in Graphics Processing

Due to the large number of at least partially independent operations which are performed in rendering, many proposals have been made to use some form of parallel architecture for graphics (and particularly for rendering). See, for example, the special issue of Computer Graphics on parallel rendering (September 1994). Other approaches may be found in earlier patent filings by the assignee of the present application and its predecessors, e.g. U.S. Pat. No. 5,195,186, and published PCT applications PCT/GB90/00987, PCT/GB90/01209, PCT/GB90/01210, PCT/GB90/01212, PCT/GB90/01213, PCT/GB90/01214, PCT/GB90/01215, and PCT/GB90/01216.

Background: Pipelined Processing Generally

There are several general approaches to parallel processing. One of the basic approaches to achieving parallelism in computer processing is a technique known as pipelining. In this technique the individual processors are, in effect, connected in series in an assembly-line configuration: one processor performs a first set of operations on one chunk of data, and then passes that chunk along to another processor which performs a second set of operations, while at the same time the first processor performs the first set of operations again on another chunk of data. Such architectures are generally discussed in Kogge, THE ARCHITECTURE OF PIPELINED COMPUTERS (1981).
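The assembly-line behavior described above can be illustrated with a small timing model. This sketch assumes every stage takes exactly one cycle per chunk and that stages never stall; all names are hypothetical.

```python
from typing import List, Optional


def sequential_cycles(stages: int, chunks: int) -> int:
    """Unpipelined: each chunk goes through every stage before the next starts."""
    return stages * chunks


def pipelined_cycles(stages: int, chunks: int) -> int:
    """Pipelined: the first chunk needs `stages` cycles; each later one adds 1."""
    return stages + chunks - 1 if chunks > 0 else 0


def pipeline_schedule(stages: int, chunks: int) -> List[List[Optional[int]]]:
    """Row t gives the chunk held by each stage during cycle t (None = idle)."""
    rows = []
    for t in range(pipelined_cycles(stages, chunks)):
        # Chunk c enters stage s at cycle c + s, so stage s holds chunk t - s.
        rows.append([t - s if 0 <= t - s < chunks else None
                     for s in range(stages)])
    return rows


print(sequential_cycles(3, 4), pipelined_cycles(3, 4))  # 12 6
for t, row in enumerate(pipeline_schedule(3, 4)):
    print(t, row)
```

In the middle rows all three stages are busy on different chunks at once, which is the source of the speed-up.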

Background: The OpenGL™ Standard

The "OpenGL" standard is a very important software standard for graphics applications. In any computer system which supports this standard, the operating system(s) and application software programs can make calls according to the OpenGL standards, without knowing exactly what the hardware configuration of the system is.

The OpenGL standard provides a complete library of low-level graphics manipulation commands, which can be used to implement three-dimensional graphics operations. This standard was originally based on the proprietary standards of Silicon Graphics, Inc., but was later transformed into an open standard. It is now becoming extremely important, not only in high-end graphics-intensive workstations, but also in high-end PCs. OpenGL is supported by Windows NT™, which makes it accessible to many PC applications.

The OpenGL specification provides some constraints on the sequence of operations. For instance, the color DDA operations must be performed before the texturing operations, which must be performed before the alpha operations. (A "DDA," or digital differential analyzer, is a conventional piece of hardware used to produce linear gradation of color (or other) values over an image area.)
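The essence of a DDA is that, after a one-time setup of a step value, each new pixel's color costs only one addition. The sketch below applies this to one integer color channel along a span; the 16-bit fixed-point fraction is an assumption chosen for illustration, not a format given in the text.

```python
from typing import List

FRAC_BITS = 16  # assumed fixed-point format: 16 fractional bits


def color_dda(start: int, end: int, steps: int) -> List[int]:
    """Linearly interpolate an integer color channel with one add per step.

    Returns steps + 1 values running from `start` to `end`.
    """
    if steps <= 0:
        return [start]
    value = start << FRAC_BITS
    delta = ((end - start) << FRAC_BITS) // steps  # set up once per span
    out = []
    for _ in range(steps + 1):
        out.append(value >> FRAC_BITS)  # integer part is this pixel's color
        value += delta                  # the only per-pixel operation
    return out


print(color_dda(0, 255, 5))  # [0, 51, 102, 153, 204, 255]
```

Fixed-point accumulation keeps the per-pixel work to an integer add, which is why this structure suits hardware; when the step does not divide evenly, the last value may fall slightly short of `end`.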

Other graphics interfaces (or "APIs"), such as PHIGS or XGL, are also current as of 1995; but at the lowest level, OpenGL is a superset of most of these.

The OpenGL standard is described in the OPENGL PROGRAMMING GUIDE (1993), the OPENGL REFERENCE MANUAL (1993), and a book by Segal and Akeley (of SGI) entitled THE OPENGL GRAPHICS SYSTEM: A SPECIFICATION (Version 1.0).

FIG. 1B is an expanded version of FIG. 1A, and shows the flow of operations defined by the OpenGL standard. Note that the most basic model is carried in terms of vertices, and these vertices are then assembled into primitives (such as triangles, lines, etc.). After all manipulation of the primitives has been completed, the rendering operations will translate each primitive into a set of "fragments." (A fragment is the portion of a primitive which affects a single pixel.) Again, it should be noted that all operations above the block marked "Rasterization" would be performed by a host processor, or possibly by a "geometry engine" (i.e. a dedicated processor which performs rapid matrix multiplies and related data manipulations), but would normally not be performed by a dedicated rendering processor such as that of the presently preferred embodiment.
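To make the primitive-to-fragment translation concrete, the sketch below turns one triangle into fragments by testing each pixel center against the triangle's three edge functions, the approach of the Pineda paper cited above. It is a swapped-in illustrative technique, not the span-stepping method of the preferred embodiment (compare the dominant and subordinate sides of FIG. 4A), and it omits the tie-breaking rule real rasterizers use for pixels lying exactly on a shared edge.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def edge(a: Point, b: Point, p: Point) -> float:
    """Edge function: > 0 when p lies to the left of the directed edge a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def triangle_fragments(v0: Point, v1: Point, v2: Point) -> List[Tuple[int, int]]:
    """Return the (x, y) pixel of every fragment whose pixel center is covered."""
    area = edge(v0, v1, v2)
    if area == 0:
        return []                  # degenerate: covers no pixel centers
    if area < 0:
        v1, v2 = v2, v1            # normalize the winding order
    xs = (v0[0], v1[0], v2[0])
    ys = (v0[1], v1[1], v2[1])
    frags = []
    for y in range(math.floor(min(ys)), math.floor(max(ys)) + 1):
        for x in range(math.floor(min(xs)), math.floor(max(xs)) + 1):
            p = (x + 0.5, y + 0.5)  # sample at the pixel center
            if (edge(v0, v1, p) >= 0 and edge(v1, v2, p) >= 0
                    and edge(v2, v0, p) >= 0):
                frags.append((x, y))
    return frags


print(len(triangle_fragments((0, 0), (4, 0), (0, 4))))  # 10
```

Each returned coordinate corresponds to one fragment in the sense defined above: the part of the primitive affecting a single pixel.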

Innovative System and Preferred System Context

The present inventions provide a rendering system with multiple processors pipelined in a message-passing architecture. A key unit at the start of the pipeline is a rasterizer which normally translates primitives into sequences of rendering commands. However, the rasterizer can be bypassed by a particular type of message from the host. Thus, the rasterizer can be slaved to downloads and bitmasks from the host. This permits images and patterns to be applied to lines and polygons rather than just to rectangles as in the prior art. This capability is also particularly advantageous for diagnostics, and hence for rapid product development.

BRIEF DESCRIPTION OF THE DRAWING

The disclosed inventions will be described with reference to the accompanying drawings, which show important sample embodiments of the invention and which are incorporated in the specification hereof by reference, wherein:
FIG. 1A, described above, is an overview of key elements and processes in a 3D graphics computer system.
FIG. 1B is an expanded version of FIG. 1A, and shows the flow of operations defined by the OpenGL standard.
FIG. 2A is an overview of the graphics rendering chip of the presently preferred embodiment.
FIG. 2B is an alternative embodiment of the graphics rendering chip of FIG. 2A, which includes additional texture-manipulation capabilities.
FIG. 2C is a more schematic view of the sequence of operations performed in the graphics rendering chip of FIG. 2A.
FIG. 3A shows a sample graphics board which incorporates the chip of FIG. 2A.
FIG. 3B shows another sample graphics board implementation, which differs from the board of FIG. 3A in that more memory and an additional component are used to achieve higher performance.
FIG. 3C shows another graphics board, in which the chip of FIG. 2A shares access to a common frame store with a GUI accelerator chip.
FIG. 3D shows another graphics board, in which the chip of FIG. 2A shares access to a common frame store with a video coprocessor (which may be used for video capture and playback functions).
FIG. 4A illustrates the definition of the dominant side and the subordinate sides of a triangle.
FIG. 4B illustrates the sequence of rendering an Anti-aliased Line primitive.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The numerous innovative teachings of the present application will be described with particular reference to the presently preferred embodiment (by way of example, and not of limitation). The presently preferred embodiment is a GLINT™ 300SX™ 3D rendering chip. The Hardware Reference Manual and Programmer's Reference Manual for this chip describe further details of this sample embodiment. Both are available, as of the effective filing date of this application, from 3Dlabs Inc. Ltd., 2010 N. 1st St., Suite 403, San Jose, Calif. 95131.

Definitions

The following definitions may help in understanding the exact meaning of terms used in the text of this application:
application: a computer program which uses graphics animation.
depth (Z) buffer: A memory buffer containing the depth component of a pixel. Used to, for example, eliminate hidden surfaces.
blt double-buffering: A technique for achieving smooth animation, by rendering only to an undisplayed back buffer, and then copying the back buffer to the front once drawing is complete.
FrameCount Planes: Used to allow higher animation rates by enabling DRAM local buffer pixel data, such as depth (Z), to be cleared down quickly.
frame buffer: An area of memory containing the displayable color buffers (front, back, left, right, overlay, underlay). This memory is typically separate from the local buffer.
local buffer: An area of memory which may be used to store non-displayable pixel information: depth (Z), stencil, FrameCount and GID planes. This memory is typically separate from the framebuffer.
pixel: Picture element. A pixel comprises the bits in all the buffers (whether stored in the local buffer or frame buffer), corresponding to a particular location in the framebuffer.
stencil buffer: A buffer used to store information about a pixel which controls how subsequent stencilled pixels at the same location may be combined with the current value in the framebuffer. Typically used to mask complex two-dimensional shapes.

Preferred Chip Embodiment - Overview

The GLINT™ high performance graphics processors combine workstation class 3D graphics acceleration and state-of-the-art 2D performance in a single chip. All 3D rendering operations are accelerated by GLINT, including Gouraud shading, texture mapping, depth buffering, anti-aliasing, and alpha blending.

The scalable memory architecture of GLINT makes it ideal for a wide range of graphics products, from PC boards to high-end workstation accelerators.

There will be several of the GLINT family of graphics processors: the GLINT 300SX™ is the primary preferred embodiment which is described herein in great detail, and the GLINT 300TX™ is a planned alternative embodiment which is also mentioned hereinbelow. The two devices are generally compatible, with the 300TX adding local texture storage and texel address generation for all texture modes.

FIG. 2A is an overview of the graphics rendering chip of the presently preferred embodiment (i.e. the GLINT 300SX™).

General Concept

The overall architecture of the GLINT chip is best viewed using the software paradigm of a message passing system. In this system all the processing blocks are connected in a long pipeline with communication with the adjacent blocks being done through message passing. Between each block there is a small amount of buffering, the size being specific to the local communications requirements and speed of the two blocks.

The message rate is variable and depends on the rendering mode. The messages do not propagate through the system at a fixed rate typical of a more traditional pipeline system. If the receiving block cannot accept a message, because its input buffer is full, then the sending block stalls until space is available.

The message structure is fundamental to the whole system as the messages are used to control, synchronize and inform each block about the processing it is to undertake. Each message has two fields: a 32 bit data field and a 9 bit tag field. (This is the minimum width guaranteed, but some local block to block connections may be wider to accommodate more data.) The data field will hold color information, coordinate information, local state information, etc. The tag field is used by each block to identify the message type so it knows how to act on it.

Each block, on receiving a message, can do one of several things:
Not recognize the message so it just passes it on to the next block.
Recognize it as updating some local state (to the block) so the local state is updated and the message terminated, i.e. not passed on to the next block.
Recognize it as a processing action, and if appropriate to the unit, the processing work specific to the unit is done. This may entail sending out new messages such as Color and/or modifying the initial message before sending it on. Any new messages are injected into the message stream before the initial message is forwarded on. Some examples will clarify this.

When the Depth Block receives a message 'new fragment', it will calculate the corresponding depth and do the depth test. If the test passes then the 'new fragment' message is passed to the next unit. If the test fails then the message is modified and passed on. The temptation is not to pass the message on when the test fails (because the pixel is not going to be updated), but other units downstream need to keep their local DDA units in step.

(In the present application, the messages are being described in general terms so as not to be bogged down in detail at this stage. The details of what a 'new fragment' message actually specifies (i.e. coordinate, color information) is left till later. In general, the term "pixel" is used to describe the picture element on the screen or in memory. The term "fragment" is used to describe the part of a polygon or other primitive which projects onto a pixel. Note that a fragment may only cover a part of a pixel.)
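The three possible reactions to a message, and the Depth Block's choice to modify rather than drop a failing fragment, can be modeled in software. In this hypothetical sketch the tag names, the `CulledFragment` marker and the block classes are invented; the depth value is carried in the message instead of being computed by a local DDA; and the blocks process the whole stream one after another rather than concurrently with FIFOs between them, which preserves message order but not timing.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Message:
    tag: str      # hypothetical tag names; the chip uses a 9-bit tag field
    data: object  # stands in for the 32-bit data field


class Block:
    """Default behavior: an unrecognized message is passed on unchanged."""

    def handle(self, msg: Message) -> List[Message]:
        return [msg]


class DepthBlock(Block):
    def __init__(self) -> None:
        self.enabled = True
        self.z: Dict[Tuple[int, int], float] = {}

    def handle(self, msg: Message) -> List[Message]:
        if msg.tag == "DepthEnable":      # local state: update and terminate
            self.enabled = bool(msg.data)
            return []
        if msg.tag == "NewFragment" and self.enabled:
            x, y, depth = msg.data
            if depth < self.z.get((x, y), float("inf")):
                self.z[(x, y)] = depth
                return [msg]              # test passed: forward as-is
            # Test failed: modify rather than drop, so downstream units
            # can still step their DDAs once per fragment.
            return [replace(msg, tag="CulledFragment")]
        return [msg]


class ColorWriteBlock(Block):
    """Downstream unit with its own DDA; writes only fragments that passed."""

    def __init__(self) -> None:
        self.dda_steps = 0
        self.written: List[Tuple[int, int]] = []

    def handle(self, msg: Message) -> List[Message]:
        if msg.tag in ("NewFragment", "CulledFragment"):
            self.dda_steps += 1           # stays in step either way
            if msg.tag == "NewFragment":
                self.written.append(msg.data[:2])
            return []                     # last fragment consumer in this sketch
        return [msg]


def run(pipeline: List[Block], messages: List[Message]) -> List[Message]:
    """Feed messages through the blocks in order; return what reaches the end."""
    stream = list(messages)
    for block in pipeline:
        out: List[Message] = []
        for msg in stream:
            out.extend(block.handle(msg))
        stream = out
    return stream


depth, color = DepthBlock(), ColorWriteBlock()
leftover = run([Block(), depth, color], [
    Message("NewFragment", (0, 0, 0.5)),
    Message("NewFragment", (0, 0, 0.7)),  # behind the first: fails depth test
    Message("NewFragment", (1, 0, 0.2)),
    Message("Sync", None),                # recognized by no block: reaches the end
])
print(color.written, color.dda_steps, leftover)
```

The downstream DDA steps three times even though only two fragments are written, and the unrecognized `Sync` message emerges at the end of the chain, much like the host synchronization loopback the application describes for the real pipeline.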

When the Texture Read Unit (if enabled) gets a 'new fragment' message, it will calculate the texture map addresses, and will accordingly provide 1, 2, 4 or 8 texels to the next unit together with the appropriate number of interpolation coefficients.
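As an illustration of the 4-texel case, the sketch below computes the texel addresses and interpolation coefficients for bilinear filtering. It assumes a row-major texture of `width` x `height` texels, normalized (u, v) coordinates, texel centers at half-integer positions and wrap-around addressing; none of these details come from the text, and the function name is hypothetical. (In general graphics practice the 8-texel case corresponds to trilinear filtering across two mipmap levels, which is not shown.)

```python
import math
from typing import List, Tuple


def bilinear_texels(u: float, v: float, width: int, height: int
                    ) -> List[Tuple[int, float]]:
    """Return four (texel_address, weight) pairs for bilinear filtering.

    Assumes row-major storage, texel centers at integer + 0.5 coordinates
    and wrap-around (repeat) addressing at the texture edges.
    """
    x = u * width - 0.5
    y = v * height - 0.5
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0            # the interpolation coefficients
    out = []
    for dy, wy in ((0, 1.0 - fy), (1, fy)):
        for dx, wx in ((0, 1.0 - fx), (1, fx)):
            tx = (x0 + dx) % width
            ty = (y0 + dy) % height
            out.append((ty * width + tx, wx * wy))
    return out


print(bilinear_texels(0.5, 0.5, 4, 4))
```

Sampling the exact center of a 4x4 texture falls midway between texels 5, 6, 9 and 10, so each receives a weight of 0.25; the four weights always sum to 1.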

Each unit and the message passing are conceptually running asynchronously to all the others. However, in the presently preferred embodiment there is considerable synchrony because of the common clock.

How does the host process send messages? The message data field is the 32 bit data written by the host, and the message tag is the bottom 9 bits of the address (excluding the byte resolution address lines). Writing to a specific address causes the message type associated with that address to be inserted into the message queue. Alternatively, the on-chip DMA controller may fetch the messages from the host's memory.
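The address-to-tag mapping just described can be written out directly. This sketch assumes 32-bit word writes, so that address bits 0 and 1 are the byte resolution lines, and uses a hypothetical, suitably aligned base address and tag value; no real GLINT register addresses or tag numbers are implied.

```python
BYTE_ADDRESS_BITS = 2   # assumed: 32-bit words, so address bits 0-1 pick a byte
TAG_BITS = 9
TAG_MASK = (1 << TAG_BITS) - 1


def message_from_write(address: int, data: int) -> tuple:
    """Decode a host write into (tag, 32-bit data) as the text describes."""
    tag = (address >> BYTE_ADDRESS_BITS) & TAG_MASK
    return tag, data & 0xFFFFFFFF


def address_for_tag(region_base: int, tag: int) -> int:
    """Address a host would write to in order to send `tag`.

    `region_base` is a hypothetical base address, assumed aligned so that
    its low BYTE_ADDRESS_BITS + TAG_BITS bits are zero.
    """
    if not 0 <= tag <= TAG_MASK:
        raise ValueError("tag must fit in 9 bits")
    return region_base + (tag << BYTE_ADDRESS_BITS)


addr = address_for_tag(0x8000, 0x1A3)
print(hex(addr), message_from_write(addr, 0x12345678))  # 0x868c (419, 305419896)
```

Because the tag lives in the address, a single ordinary 32-bit store from the host delivers a complete message, with no separate write needed to specify the message type.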

The message throughput, in the presently preferred embodiment, is 50M messages per second and this gives a fragment throughput of up to 50M per second, depending on what is being rendered. Of course, this rate will predictably be further increased over time, with advances in process technology and clock rates.

Linkage

The block diagram of FIG. 2A shows how the units are connected together in the GLINT 300SX embodiment, and the block diagram of FIG. 2B shows how the units are connected together in the GLINT 300TX embodiment. Some general points are:
The following functionality is present in the 300TX, but missing from the 300SX: the Texture Address (TAddr) and Texture Read (TRd) Units are missing. Also, the router and multiplexer are missing from this section, so the unit ordering is Scissor/Stipple, Color DDA, Texture Fog Color, Alpha Test, LB Rd, etc.
In the embodiment of FIG. 2B, the order of the units can be configured in two ways. The most general order (Router, Color DDA, Texture Unit, Alpha Test, LB Rd, GID/Z/Stencil, LB Wr, Multiplexer) will work in all modes of OpenGL. However, when the alpha test is disabled it is much better to do the Graphics ID, depth and stencil tests before the texture operations rather than after. This is because the texture operations have a high processing cost, and this should not be spent on fragments which are later rejected because of window, depth or stencil tests.
The loop back to the host at the bottom provides a simple synchronization mechanism. The host can insert a Sync command, and when all the preceding rendering has finished the Sync command will reach the bottom host interface, which will notify the host that the sync event has occurred.
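The trade-off behind the two unit orderings described above can be sketched as follows. The general order is quoted from the text; the alternative order for the alpha-test-disabled case is an assumption, since the text says only that the GID, depth and stencil tests should precede texturing. The fragment counts in the usage lines are made up for illustration.

```python
from typing import List

# The "most general order" quoted in the text for the FIG. 2B embodiment.
GENERAL_ORDER = ["Router", "Color DDA", "Texture Unit", "Alpha Test",
                 "LB Rd", "GID/Z/Stencil", "LB Wr", "Multiplexer"]

# Assumed order for the alpha-test-disabled case: only "GID/Z/Stencil before
# Texture Unit" comes from the text; the other placements are a guess.
EARLY_TEST_ORDER = ["Router", "Color DDA", "LB Rd", "GID/Z/Stencil", "LB Wr",
                    "Texture Unit", "Alpha Test", "Multiplexer"]


def unit_order(alpha_test_enabled: bool) -> List[str]:
    """Pick a unit ordering based on whether the alpha test is in use."""
    return list(GENERAL_ORDER if alpha_test_enabled else EARLY_TEST_ORDER)


def fragments_textured(order: List[str], total: int, surviving_tests: int) -> int:
    """Texture work done: every fragment if texturing precedes the
    GID/Z/Stencil tests, otherwise only the fragments that survive them."""
    if order.index("Texture Unit") < order.index("GID/Z/Stencil"):
        return total
    return surviving_tests


# e.g. 1000 fragments, of which 400 are hidden or not owned by the window
print(fragments_textured(unit_order(True), 1000, 600))   # 1000
print(fragments_textured(unit_order(False), 1000, 600))  # 600
```

The general order must texture every fragment because a later alpha test may still reject it, and so the depth and stencil writes cannot safely happen first; with the alpha test off, that constraint disappears and the early-test order avoids the texturing cost for rejected fragments.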

Benefits

The very modular nature of this architecture gives great benefits. Each unit lives in isolation from all the others and has a very well defined set of input and output messages. This allows the internal structure of a unit (or group of units) to be changed to make algorithmic/speed/gate count trade-offs.

The isolation and well defined logical and behavioral interface to each unit allows much better testing and verification of the correctness of a unit.

The message passing paradigm is easy to simulate with software, and the hardware design is nicely partitioned. The architecture is self synchronizing for mode or primitive changes.

The host can mimic any block in the chain by ins
