Baldwin

US005815166A
[11] Patent Number: 5,815,166
[45] Date of Patent: Sep. 29, 1998

[54] GRAPHICS SUBSYSTEM WITH SLAVEABLE RASTERIZER

[75] Inventor: David Robert Baldwin, Weybridge, United Kingdom

[73] Assignee: 3DLabs Inc., Ltd., Hamilton, Bermuda

[21] Appl. No.: 410,354

[22] Filed: Mar. 24, 1995

[51] Int. Cl.6: G06T 1/20
[52] U.S. Cl.: 345/506; 345/509; 345/520
[58] Field of Search: 395/162, 163, 164, 118, 119, 140-143, 502, 506, 520, 513, 509, 522; 345/112, 185, 189, 502, 506, 509, 513, 520, 522, 418, 419, 440-443

[56] References Cited

U.S. PATENT DOCUMENTS

4,727,363   2/1988   Ishii             345/190
5,136,664   8/1992   Bersack et al.    395/141
5,185,599   2/1993   Doornink et al.   345/200
5,251,322   10/1993  Doyle et al.      395/162
5,287,442   2/1994   Alcorn et al.     395/141

OTHER PUBLICATIONS

Electronic Imaging '88, International Electronic Imaging Exposition & Conference, Mar. 28-31, 1988, Published by Institute for Graphic Comm., Inc, "A low Cost Imaging Workstation Using the Commodore Amiga and NEC's Image Pipelined Processors", by Miner et al., pp. 422-427.
Deering et al., "Leo: A System for Cost Effective 3D Shaded Graphics," Computer Graphics, pp. 101-108 (1993).
Dunnett et al., "The Image Chip for High Performance 3D Rendering," IEEE Computer Graphics & Applications, pp. 41-52 (1992).
Gharachorloo et al., "Subnanosecond Pixel Rendering with Million Transistor Chips," Computer Graphics, vol. 4, pp. 41-49 (1988).
Akeley et al., "High-Performance Polygon Rendering," Computer Graphics, vol. 22, pp. 239-246 (1988).
Molnar et al., "Pixelflow: High-Speed Rendering Using Image Composition," Computer Graphics, vol. 26, pp. 231-240 (1992).
Akeley, "RealityEngine Graphics," Computer Graphics Proc., 109-16 (1993).
Harrell, "Graphics Rendering Architecture for a High Performance Desktop Workstation," Computer Graphics Proc., pp. 93-100 (1993).
Juan Pineda, "A Parallel Algorithm for Polygon Rasterization" 22 Computer Graphics, pp. 17-20 (1988).
Kirk et al., "The Rendering Architecture of the DN10000VS," Computer Graphics, vol. 24, pp. 299-307 (1990).
`
Primary Examiner: Kee M. Tung
Attorney, Agent, or Firm: Robert Groover; Betty Formby; Matthew Anderson

[57] ABSTRACT

A graphics processing system with a message-passing architecture, in which the rasterizer can be bypassed by a particular type of message from the host. This permits rasterization to be slaved to the host downloads and bit masks, so that images and patterns can be applied to lines and polygons, rather than just rectangles as is the case for prior art.

21 Claims, 7 Drawing Sheets
`
`
U.S. Patent    Sep. 29, 1998    Sheet 1 of 7    5,815,166

[FIG. 1A: transform flow. World coordinates (3D) -> transform into view coordinates and canonical view volume -> view coordinates (3D) -> clip against canonical view volume -> view coordinates (3D) -> project onto view plane -> view coordinates (2D) -> map into viewport -> normalized device coordinates -> transform to physical device coordinates -> physical device coordinates -> render.]

[FIG. 4A: triangle with its subordinate sides labeled.]

[FIG. 4B: Trapezoid B, with count1, count2 and count3 labeled.]

`
U.S. Patent    Sep. 29, 1998    Sheet 2 of 7    5,815,166

[FIG. 1B: OpenGL flow of operations. Labeled blocks: VERTEX, NORMAL, COLOR/INDEX, TEXCOORD and RASTERPOS inputs; current normal, current color and current texture coordinates; model view matrix; lighting and coloring; TEXGEN; texture matrix; primitive assembly; view volume clipping; application specific clipping; projection matrix; divide by W and viewport; current raster position; READ PIXELS, DRAWPIXELS and TEXIMAGE; pixel storage modes; pixel transfer modes; texture memory; rasterization; per-fragment operations; frame buffer. Data paths are labeled vertices, primitives, fragments and pixels.]

`
U.S. Patent    Sep. 29, 1998    Sheet 3 of 7    5,815,166

[FIG. 2A: block diagram of the graphics rendering chip. Labeled blocks: host interface, graphics processor FIFO (out), rasterizer, scissor/stipple, color DDA, texture, fog, alpha test, local buffer read, GID/depth/stencil, local buffer write, framebuffer read, alpha blend, dither, logical ops, framebuffer write, host out. The local buffer interface unit has 24-bit read and write address buses and 52-bit read and write data buses; the framebuffer interface unit has 24-bit read and write address buses and 32-bit read and write data buses.]

`
U.S. Patent    Sep. 29, 1998    Sheet 4 of 7    5,815,166

[FIG. 2B: block diagram of the alternative chip embodiment (labeled GLiNT2). Labeled blocks: graphics processor FIFO (in), graphics processor FIFO (out), bypass, local buffer bypass, router, rasterizer, scissor/stipple, color DDA, texture unit (TAddr, TRd, texture, fog color), alpha test, LB Rd, GID/Z/stencil, LB Wr, multiplexer, FB Rd, blend, dither, logical ops, FB Wr, host out, graphics core, local buffer interface with local buffer, framebuffer interface with framebuffer.]

`
U.S. Patent    Sep. 29, 1998    Sheet 5 of 7    5,815,166

[FIG. 2C: schematic sequence of operations. Labeled blocks: rasterizer, scissor test, stipple, color DDA, texture, fog, antialias application, alpha test, LB read, pixel ownership (GID), stencil test, depth test, LB write, FB read, alpha blend, color format (dither), logical op / FB write mask, host out. The LB read and LB write blocks connect to the localbuffer; the FB read and FB write blocks connect to the framebuffer.]

`
U.S. Patent    Sep. 29, 1998    Sheet 6 of 7    5,815,166

[FIG. 3A: plug-in card with GLINT 300SX, localbuffer, VRAM and LUT-DAC, connected to the host CPU over the PCI local bus.]

[FIG. 3B: plug-in card with PCI-PCI bridge, local geometry processor, GLINT 300SX, localbuffer, VRAM and LUT-DAC, connected to the PCI local bus.]

`
U.S. Patent    Sep. 29, 1998    Sheet 7 of 7    5,815,166

[FIG. 3C: plug-in card with PCI-PCI bridge, GUI accelerator, GLINT 300SX, localbuffer, framebuffer and LUT-DAC, connected to the PCI local bus.]

[FIG. 3D: plug-in card with PCI-PCI bridge, video coprocessor, GLINT 300SX, localbuffer, framebuffer and LUT-DAC, connected to the PCI local bus.]

`
GRAPHICS SUBSYSTEM WITH SLAVEABLE RASTERIZER

BACKGROUND AND SUMMARY OF THE INVENTION

The present application relates to computer graphics and animation systems, and particularly to graphics rendering hardware.

Background: Computer Graphics and Rendering

Modern computer systems normally manipulate graphical objects as high-level entities. For example, a solid body may be described as a collection of triangles with specified vertices, or a straight line segment may be described by listing its two endpoints with three-dimensional or two-dimensional coordinates. Such high-level descriptions are a necessary basis for high-level geometric manipulations, and also have the advantage of providing a compact format which does not consume memory space unnecessarily.

Such higher-level representations are very convenient for performing the many required computations. For example, ray-tracing or other lighting calculations may be performed, and a projective transformation can be used to reduce a three-dimensional scene to its two-dimensional appearance from a given viewpoint. However, when an image containing graphical objects is to be displayed, a very low-level description is needed. For example, in a conventional CRT display, a "flying spot" is moved across the screen (one line at a time), and the beam from each of three electron guns is switched to a desired level of intensity as the flying spot passes each pixel location. Thus at some point the image model must be translated into a data set which can be used by a conventional display. This operation is known as "rendering."

The graphics-processing system typically interfaces to the display controller through a "frame store" or "frame buffer" of special two-port memory, which can be written to randomly by the graphics processing system, but also provides the synchronous data output needed by the video output driver. (Digital-to-analog conversion is also provided after the frame buffer.) Such a frame buffer is usually implemented using VRAM memory chips (or sometimes with DRAM and special DRAM controllers). This interface relieves the graphics-processing system of most of the burden of synchronization for video output. Nevertheless, the amounts of data which must be moved around are very sizable, and the computational and data-transfer burden of placing the correct data into the frame buffer can still be very large.

Even if the computational operations required are quite simple, they must be performed repeatedly on a large number of datapoints. For example, in a typical 1995 high-end configuration, a display of 1280x1024 elements may need to be refreshed at 72 Hz, with a color resolution of 24 bits per pixel. If blending is desired, additional bits (e.g. another 8 bits per pixel) will be required to store an "alpha" or transparency value for each pixel. This implies manipulation of more than 3 billion bits per second, without allowing for any of the actual computations being performed. Thus it may be seen that this is an environment with unique data manipulation requirements.

If the display is unchanging, no demand is placed on the rendering operations. However, some common operations (such as zooming or rotation) will require every object in the image space to be re-rendered. Slow rendering will make the rotation or zoom appear jerky. This is highly undesirable. Thus efficient rendering is an essential step in translating an image representation into the correct pixel values. This is particularly true in animation applications, where newly rendered updates to a computer graphics display must be generated at regular intervals.

The rendering requirements of three-dimensional graphics are particularly heavy. One reason for this is that, even after the three-dimensional model has been translated to a two-dimensional model, some computational tasks may be bequeathed to the rendering process. (For example, color values will need to be interpolated across a triangle or other primitive.) These computational tasks tend to burden the rendering process. Another reason is that since three-dimensional graphics are much more lifelike, users are more likely to demand a fully rendered image. (By contrast, in the two-dimensional images created e.g. by a GUI or simple game, users will learn not to expect all areas of the scene to be active or filled with information.)

FIG. 1A is a very high-level view of other processes performed in a 3D graphics computer system. A three dimensional image which is defined in some fixed 3D coordinate system (a "world" coordinate system) is transformed into a viewing volume (determined by a view position and direction), and the parts of the image which fall outside the viewing volume are discarded. The visible portion of the image volume is then projected onto a viewing plane, in accordance with the familiar rules of perspective. This produces a two-dimensional image, which is now mapped into device coordinates. It is important to understand that all of these operations occur prior to the operations performed by the rendering subsystem of the present invention. FIG. 1B is an expanded version of FIG. 1A, and shows the flow of operations defined by the OpenGL standard.

A vast amount of engineering effort has been invested in computer graphics systems, and this area is one of increasing activity and demands. Numerous books have discussed the requirements of this area; see, e.g., ADVANCES IN COMPUTER GRAPHICS (ed. Enderle 1990-); Chellappa and Sawchuk, DIGITAL IMAGE PROCESSING AND ANALYSIS (1985); COMPUTER GRAPHICS HARDWARE (ed. Reghbati and Lee 1988); COMPUTER GRAPHICS: IMAGE SYNTHESIS (ed. Joy et al.); Foley et al., FUNDAMENTALS OF INTERACTIVE COMPUTER GRAPHICS (2.ed. 1984); Foley, COMPUTER GRAPHICS PRINCIPLES & PRACTICE (2.ed. 1990); Foley, INTRODUCTION TO COMPUTER GRAPHICS (1994); Giloi, Interactive Computer Graphics (1978); Hearn and Baker, COMPUTER GRAPHICS (2.ed. 1994); Hill, COMPUTER GRAPHICS (1990); Latham, DICTIONARY OF COMPUTER GRAPHICS (1991); Magnenat-Thalma, IMAGE SYNTHESIS THEORY & PRACTICE (1988); Newman and Sproull, PRINCIPLES OF INTERACTIVE COMPUTER GRAPHICS (2.ed. 1979); PICTURE ENGINEERING (ed. Fu and Kunii 1982); PICTURE PROCESSING & DIGITAL FILTERING (2.ed. Huang 1979); Prosise, HOW COMPUTER GRAPHICS WORK (1994); Rimmer, BIT MAPPED GRAPHICS (2.ed. 1993); Salmon, COMPUTER GRAPHICS SYSTEMS & CONCEPTS (1987); Schachter, COMPUTER IMAGE GENERATION (1990); Watt, THREE-DIMENSIONAL COMPUTER GRAPHICS (2.ed. 1994); Scott Whitman, MULTIPROCESSOR METHODS FOR COMPUTER GRAPHICS RENDERING; the SIGGRAPH PROCEEDINGS for the years 1980-1994; and the IEEE Computer Graphics and Applications magazine for the years 1990-1994.
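The frame-buffer data rate quoted in the background discussion (a 1280x1024 display refreshed at 72 Hz, with 24 color bits plus 8 alpha bits per pixel) can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name is hypothetical, and it assumes every pixel is touched exactly once per refresh, which is the simplest reading of the text.

```python
# Figures taken from the text: 1280x1024 display, 72 Hz refresh,
# 24 color bits plus 8 alpha bits per pixel.
WIDTH, HEIGHT = 1280, 1024
REFRESH_HZ = 72
BITS_PER_PIXEL = 24 + 8


def bits_per_second(width, height, refresh_hz, bits_per_pixel):
    """Raw bits handled per second if each pixel is touched once per refresh.

    Hypothetical helper for illustration; it ignores all computation cost.
    """
    return width * height * refresh_hz * bits_per_pixel


rate = bits_per_second(WIDTH, HEIGHT, REFRESH_HZ, BITS_PER_PIXEL)
print(rate)  # 3019898880
```

The result, roughly 3.02 billion bits per second, agrees with the "more than 3 billion bits per second" figure.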
`
Background: Graphics Animation

In many areas of computer graphics a succession of slowly changing pictures are displayed rapidly one after the other, to give the impression of smooth movement, in much the same way as for cartoon animation. In general the higher the speed of the animation, the smoother (and better) the result.

When an application is generating animation images, it is normally necessary not only to draw each picture into the frame buffer, but also to first clear down the frame buffer, and to clear down auxiliary buffers such as depth (Z) buffers, stencil buffers, alpha buffers and others. A good treatment of the general principles may be found in Computer Graphics: Principles and Practice, James D. Foley et al., Reading Mass.: Addison-Wesley. A specific description of the various auxiliary buffers may be found in The OpenGL Graphics System: A Specification (Version 1.0), Mark Segal and Kurt Akeley, SGI.

In most applications the value written, when clearing any given buffer, is the same at every pixel location, though different values may be used in different auxiliary buffers. Thus the frame buffer is often cleared to the value which corresponds to black, while the depth (Z) buffer is typically cleared to a value corresponding to infinity.

The time taken to clear down the buffers is often a significant portion of the total time taken to draw a frame, so it is important to minimize it.
`
Background: Parallelism in Graphics Processing

Due to the large number of at least partially independent operations which are performed in rendering, many proposals have been made to use some form of parallel architecture for graphics (and particularly for rendering). See, for example, the special issue of Computer Graphics on parallel rendering (September 1994). Other approaches may be found in earlier patent filings by the assignee of the present application and its predecessors, e.g. U.S. Pat. No. 5,195,186, and published PCT applications PCT/GB90/00987, PCT/GB90/01209, PCT/GB90/01210, PCT/GB90/01212, PCT/GB90/01213, PCT/GB90/01214, PCT/GB90/01215, and PCT/GB90/01216.
`
Background: Pipelined Processing Generally

There are several general approaches to parallel processing. One of the basic approaches to achieving parallelism in computer processing is a technique known as pipelining. In this technique the individual processors are, in effect, connected in series in an assembly-line configuration: one processor performs a first set of operations on one chunk of data, and then passes that chunk along to another processor which performs a second set of operations, while at the same time the first processor performs the first set of operations again on another chunk of data. Such architectures are generally discussed in Kogge, THE ARCHITECTURE OF PIPELINED COMPUTERS (1981).
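The assembly-line behavior described above can be illustrated with a small tick-by-tick simulation. This is a minimal sketch under stated assumptions, not a model of any particular hardware: the function name, the single latch between the stages, and the two example operations are all hypothetical, and the stage operations are assumed never to return None.

```python
def simulate_two_stage_pipeline(chunks, first_op, second_op):
    """Simulate two processors in series, one latch between them.

    On every tick both stages work at once: stage two consumes the value
    latched on the previous tick while stage one processes the next chunk.
    Returns (results, ticks).
    """
    pending = list(chunks)
    latch = None          # value handed from stage one to stage two
    results = []
    ticks = 0
    while pending or latch is not None:
        # Stage two works on what stage one produced last tick.
        out = second_op(latch) if latch is not None else None
        # Stage one, during the same tick, starts on the next chunk.
        latch = first_op(pending.pop(0)) if pending else None
        if out is not None:
            results.append(out)
        ticks += 1
    return results, ticks


results, ticks = simulate_two_stage_pipeline(
    [1, 2, 3, 4],
    first_op=lambda x: x + 1,   # hypothetical "first set of operations"
    second_op=lambda x: x * 2,  # hypothetical "second set of operations"
)
print(results, ticks)  # [4, 6, 8, 10] 5
```

Four chunks finish in five ticks, where a single processor performing both sets of operations in turn would need eight steps; the saving comes entirely from the overlap between the two stages.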
`
Background: The OpenGL™ Standard

The "OpenGL" standard is a very important software standard for graphics applications. In any computer system which supports this standard, the operating system(s) and application software programs can make calls according to the OpenGL standards, without knowing exactly what the hardware configuration of the system is.

The OpenGL standard provides a complete library of low-level graphics manipulation commands, which can be used to implement three-dimensional graphics operations. This standard was originally based on the proprietary standards of Silicon Graphics, Inc., but was later transformed into an open standard. It is now becoming extremely important, not only in high-end graphics-intensive workstations, but also in high-end PCs. OpenGL is supported by Windows NT™, which makes it accessible to many PC applications.

The OpenGL specification provides some constraints on the sequence of operations. For instance, the color DDA operations must be performed before the texturing operations, which must be performed before the alpha operations. (A "DDA" or digital differential analyzer, is a conventional piece of hardware used to produce linear gradation of color (or other) values over an image area.)

Other graphics interfaces (or "APIs"), such as PHIGS or XGL, are also current as of 1995; but at the lowest level, OpenGL is a superset of most of these.
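The digital differential analyzer mentioned above produces a linear gradation by repeatedly adding a fixed increment. The following sketch shows the idea in one dimension for a single color channel. The function name and the 16.16 fixed-point format are assumptions made for illustration only; the text does not specify any particular number format.

```python
FRACTION_BITS = 16  # assumed 16.16 fixed-point format, for illustration only


def color_dda(start, end, steps):
    """Yield steps + 1 integer values graded linearly from start to end.

    The increment is computed once; each further value costs one addition,
    which is what makes the scheme attractive in hardware.
    """
    if steps <= 0:
        raise ValueError("steps must be positive")
    value = start << FRACTION_BITS
    delta = ((end - start) << FRACTION_BITS) // steps
    for _ in range(steps + 1):
        yield value >> FRACTION_BITS  # drop the fractional part
        value += delta


print(list(color_dda(0, 255, 5)))  # [0, 51, 102, 153, 204, 255]
```

When the range does not divide evenly by the step count, the fractional part is carried in the low bits of the accumulator and the emitted values are truncated.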
The OpenGL standard is described in the OPENGL PROGRAMMING GUIDE (1993), the OPENGL REFERENCE MANUAL (1993), and a book by Segal and Akeley (of SGI) entitled THE OPENGL GRAPHICS SYSTEM: A SPECIFICATION (Version 1.0).

FIG. 1B is an expanded version of FIG. 1A, and shows the flow of operations defined by the OpenGL standard. Note that the most basic model is carried in terms of vertices, and these vertices are then assembled into primitives (such as triangles, lines, etc.). After all manipulation of the primitives has been completed, the rendering operations will translate each primitive into a set of "fragments." (A fragment is the portion of a primitive which affects a single pixel.) Again, it should be noted that all operations above the block marked "Rasterization" would be performed by a host processor, or possibly by a "geometry engine" (i.e. a dedicated processor which performs rapid matrix multiplies and related data manipulations), but would normally not be performed by a dedicated rendering processor such as that of the presently preferred embodiment.

`
Innovative System and Preferred System Context

The present inventions provide a rendering system with multiple processors pipelined in a message-passing architecture. A key unit at the start of the pipeline is a rasterizer which normally translates primitives into sequences of rendering commands. However, the rasterizer can be bypassed by a particular type of message from the host. Thus, the rasterizer can be slaved to downloads and bitmasks from the host. This permits images and patterns to be applied to lines and polygons rather than just rectangles as is the case for prior art. This capability is also particularly advantageous for diagnostics, and hence for rapid product development.
`
BRIEF DESCRIPTION OF THE DRAWING

The disclosed inventions will be described with reference to the accompanying drawings, which show important sample embodiments of the invention and which are incorporated in the specification hereof by reference, wherein:

FIG. 1A, described above, is an overview of key elements and processes in a 3D graphics computer system.
FIG. 1B is an expanded version of FIG. 1A, and shows the flow of operations defined by the OpenGL standard.
FIG. 2A is an overview of the graphics rendering chip of the presently preferred embodiment.
FIG. 2B is an alternative embodiment of the graphics rendering chip of FIG. 2A, which includes additional texture-manipulation capabilities.
FIG. 2C is a more schematic view of the sequence of operations performed in the graphics rendering chip of FIG. 2A.
`
`
`
FIG. 3A shows a sample graphics board which incorporates the chip of FIG. 2A.
FIG. 3B shows another sample graphics board implementation, which differs from the board of FIG. 3A in that more memory and an additional component is used to achieve higher performance.
FIG. 3C shows another graphics board, in which the chip of FIG. 2A shares access to a common frame store with GUI accelerator chip.
FIG. 3D shows another graphics board, in which the chip of FIG. 2A shares access to a common frame store with a video coprocessor (which may be used for video capture and playback functions).
FIG. 4A illustrates the definition of the dominant side and the subordinate sides of a triangle.
FIG. 4B illustrates the sequence of rendering an Antialiased Line primitive.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The numerous innovative teachings of the present application will be described with particular reference to the presently preferred embodiment (by way of example, and not of limitation). The presently preferred embodiment is a GLINT™ 300SX™ 3D rendering chip. The Hardware Reference Manual and Programmer's Reference Manual for this chip describe further details of this sample embodiment. Both are available, as of the effective filing date of this application, from 3Dlabs Inc. Ltd., 2010 N. 1st St., suite 403, San Jose Calif. 95131.

Definitions

The following definitions may help in understanding the exact meaning of terms used in the text of this application:

application: a computer program which uses graphics animation.

depth (Z) buffer: A memory buffer containing the depth component of a pixel. Used to, for example, eliminate hidden surfaces.

blt double-buffering: A technique for achieving smooth animation, by rendering only to an undisplayed back buffer, and then copying the back buffer to the front once drawing is complete.

FrameCount Planes: Used to allow higher animation rates by enabling DRAM local buffer pixel data, such as depth (Z), to be cleared down quickly.

frame buffer: An area of memory containing the displayable color buffers (front, back, left, right, overlay, underlay). This memory is typically separate from the local buffer.

local buffer: An area of memory which may be used to store non-displayable pixel information: depth (Z), stencil, FrameCount and GID planes. This memory is typically separate from the framebuffer.

pixel: Picture element. A pixel comprises the bits in all the buffers (whether stored in the local buffer or framebuffer), corresponding to a particular location in the framebuffer.

stencil buffer: A buffer used to store information about a pixel which controls how subsequent stencilled pixels at the same location may be combined with the current value in the framebuffer. Typically used to mask complex two-dimensional shapes.

Preferred Chip Embodiment - Overview

The GLINT™ high performance graphics processors combine workstation class 3D graphics acceleration, and state-of-the-art 2D performance in a single chip. All 3D rendering operations are accelerated by GLINT, including Gouraud shading, texture mapping, depth buffering, anti-aliasing, and alpha blending.

The scalable memory architecture of GLINT makes it ideal for a wide range of graphics products, from PC boards to high-end workstation accelerators.

There will be several of the GLINT family of graphics processors: the GLINT 300SX™ is the primary preferred embodiment which is described herein in great detail, and the GLINT 300TX™ is a planned alternative embodiment which is also mentioned hereinbelow. The two devices are generally compatible, with the 300TX adding local texture storage and texel address generation for all texture modes.

FIG. 2A is an overview of the graphics rendering chip of the presently preferred embodiment (i.e. the GLINT 300SX™).

General Concept

The overall architecture of the GLINT chip is best viewed using the software paradigm of a message passing system. In this system all the processing blocks are connected in a long pipeline with communication with the adjacent blocks being done through message passing. Between each block there is a small amount of buffering, the size being specific to the local communications requirements and speed of the two blocks.

The message rate is variable and depends on the rendering mode. The messages do not propagate through the system at a fixed rate typical of a more traditional pipeline system. If the receiving block can not accept a message, because its input buffer is full, then the sending block stalls until space is available.

The message structure is fundamental to the whole system as the messages are used to control, synchronize and inform each block about the processing it is to undertake. Each message has two fields: a 32 bit data field and a 9 bit tag field. (This is the minimum width guaranteed, but some local block to block connections may be wider to accommodate more data.) The data field will hold color information, coordinate information, local state information, etc. The tag field is used by each block to identify the message type so it knows how to act on it.

Each block, on receiving a message, can do one of several things:

Not recognize the message so it just passes it on to the next block.

Recognize it as updating some local state (to the block) so the local state is updated and the message terminated, i.e. not passed on to the next block.

Recognize it as a processing action, and if appropriate to the unit, the processing work specific to the unit is done. This may entail sending out new messages such as Color and/or modifying the initial message before sending it on. Any new messages are injected into the message stream before the initial message is forwarded on. Some examples will clarify this.

When the Depth Block receives a message 'new fragment', it will calculate the corresponding depth and do the depth test. If the test passes then the 'new fragment' message is passed to the next unit. If the test fails then the message is modified and passed on. The temptation is not to pass the message on when the test fails (because the pixel is not going to be updated), but other units downstream need to keep their local DDA units in step.

(In the present application, the messages are being described in general terms so as not to be bogged down in detail at this stage. The details of what a 'new fragment' message actually specifies (i.e. coordinate, color information) is left till later. In general, the term "pixel" is used to describe the picture element on the screen or in memory. The term "fragment" is used to describe the part of a polygon or other primitive which projects onto a pixel. Note that a fragment may only cover a part of a pixel.)
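The three ways a block can respond to a message (pass it on, absorb it as a local state update, or act on it) can be sketched in software. The example below is a loose illustration only. The tag values, the `culled` flag standing in for "the message is modified", and the choice to carry a depth value directly in the data field are all assumptions made for the sketch; in the text the Depth Block calculates the depth itself, and the real tag encodings are not given here.

```python
from dataclasses import dataclass

TAG_BITS, DATA_BITS = 9, 32     # field widths stated in the text

# Hypothetical tag values, chosen only for this sketch.
TAG_SET_DEPTH_REF = 0x001       # updates local state in the depth block
TAG_NEW_FRAGMENT = 0x002        # a processing action
TAG_UNRELATED = 0x003           # not recognized by the depth block


@dataclass(frozen=True)
class Message:
    tag: int
    data: int
    culled: bool = False        # hypothetical marker for a modified message

    def __post_init__(self):
        if not 0 <= self.tag < (1 << TAG_BITS):
            raise ValueError("tag does not fit in 9 bits")
        if not 0 <= self.data < (1 << DATA_BITS):
            raise ValueError("data does not fit in 32 bits")


class DepthBlock:
    """Toy stand-in for one pipeline block."""

    def __init__(self):
        self.depth_ref = 0      # local state

    def receive(self, msg):
        """Return the messages forwarded to the next block, in order."""
        if msg.tag == TAG_SET_DEPTH_REF:
            self.depth_ref = msg.data
            return []           # state update: message terminates here
        if msg.tag == TAG_NEW_FRAGMENT:
            if msg.data <= self.depth_ref:      # assumed depth test
                return [msg]                    # pass: forward unchanged
            # Fail: still forwarded, but modified, so that downstream
            # units can keep their own DDAs in step.
            return [Message(msg.tag, msg.data, culled=True)]
        return [msg]            # unrecognized: pass on untouched


block = DepthBlock()
print(block.receive(Message(TAG_SET_DEPTH_REF, 100)))   # []
print(block.receive(Message(TAG_NEW_FRAGMENT, 50)))
print(block.receive(Message(TAG_NEW_FRAGMENT, 200)))
print(block.receive(Message(TAG_UNRELATED, 7)))
```

Returning a list from `receive` leaves room for the case where a block injects new messages ahead of the one it forwards.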
When the Texture Read Unit (if enabled) gets a 'new fragment' message, it will calculate the texture map addresses, and will accordingly provide 1, 2, 4 or 8 texels to the next unit together with the appropriate number of interpolation coefficients.

Each unit and the message passing are conceptually running asynchronous to all the others. However, in the presently preferred embodiment there is considerable synchrony because of the common clock.

How does the host process send messages? The message data field is the 32 bit data written by the host, and the message tag is the bottom 9 bits of the address (excluding the byte resolution address lines). Writing to a specific address causes the message type associated with that address to be inserted into the message queue. Alternatively, the on-chip DMA controller may fetch the messages from the host's memory.
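The mapping from a host write to a message can be sketched as a small decoding function. This is an illustration under one stated assumption: that the "byte resolution address lines" are the two least significant address bits, as would follow from 32 bit (4 byte) words. The function name and the example addresses are hypothetical.

```python
BYTE_ADDRESS_BITS = 2   # assumption: 32 bit words, so 2 byte-select bits
TAG_MASK = 0x1FF        # 9 bit tag field
DATA_MASK = 0xFFFFFFFF  # 32 bit data field


def decode_host_write(address, value):
    """Turn a host write (address, value) into a (tag, data) message pair.

    The tag is taken from the bottom 9 bits of the word address, i.e. the
    address with the assumed byte-select bits shifted out.
    """
    tag = (address >> BYTE_ADDRESS_BITS) & TAG_MASK
    data = value & DATA_MASK
    return tag, data


print(decode_host_write(0x8004, 0x12345678))  # (1, 305419896)
```

Under this assumption consecutive tags sit four bytes apart in the host address space, so a write to the next word up selects the next message type.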
The message throughput, in the presently preferred embodiment, is 50M messages per second and this gives a fragment throughput of up to 50M per second, depending on what is being rendered. Of course, this rate will predictably be further increased over time, with advances in process technology and clock rates.

Linkage

The block diagram of FIG. 2A shows how the units are connected together in the GLINT 300SX embodiment, and the block diagram of FIG. 2B shows how the units are connected together in the GLINT 300TX embodiment. Some general points are:

The following functionality is present in the 300TX, but missing from the 300SX: The Texture Address (TAddr) and Texture Read (TRd) Units are missing. Also, the router and multiplexer are missing from this section, so the unit ordering is Scissor/Stipple, Color DDA, Texture Fog Color, Alpha Test, LB Rd, etc.

In the embodiment of FIG. 2B, the order of the units can be configured in two ways. The most general order (Router, Color DDA, Texture Unit, Alpha Test, LB Rd, GID/Z/Stencil, LB Wr, Multiplexer) will work in all modes of OpenGL. However, when the alpha test is disabled it is much better to do the Graphics ID, depth and stencil tests before the texture operations rather than after. This is because the texture operations have a high processing cost and this should not be spent on fragments which are later rejected because of window, depth or stencil tests.

The loop back to the host at the bottom provides a simple synchronization mechanism. The host can insert a Sync command and when all the preceding rendering has finished the sync command will reach the bottom host interface which will notify the host the sync event has occurred.
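The choice between the two unit orderings in the FIG. 2B embodiment can be expressed as a simple selection. The general order below is taken from the text. The second order is an assumption: the text says only that the Graphics ID, depth and stencil tests should come before the texture operations when the alpha test is disabled, so the exact sequence shown, like the function name, is hypothetical.

```python
# Order given in the text; stated to work in all modes of OpenGL.
GENERAL_ORDER = [
    "Router", "Color DDA", "Texture Unit", "Alpha Test",
    "LB Rd", "GID/Z/Stencil", "LB Wr", "Multiplexer",
]

# Assumed early-test order: the local buffer tests are moved ahead of the
# costly texture work. The exact sequence is not given in the text.
EARLY_TEST_ORDER = [
    "Router", "LB Rd", "GID/Z/Stencil", "LB Wr",
    "Color DDA", "Texture Unit", "Alpha Test", "Multiplexer",
]


def unit_order(alpha_test_enabled):
    """Pick a unit ordering for the current rendering mode."""
    if alpha_test_enabled:
        return list(GENERAL_ORDER)
    return list(EARLY_TEST_ORDER)


print(unit_order(alpha_test_enabled=False))
```

The reordering is only safe with the alpha test disabled because an enabled alpha test can itself reject a fragment after texturing, and the local buffer must not be written for a fragment that is later rejected.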
Benefits

The very modular nature of this architecture gives great benefits. Each unit lives in isolation from all the others and has a very well defined set of input and output messages. This allows the internal structure of a unit (or group of units) to be changed to make algorithmic/speed/gate count trade-offs.

The isolation and well defined logical and behavioral interface to each unit allows much better testing and verification of the correctness of a unit.

The message passing paradigm is easy to simulate with software, and the hardware design is nicely partitioned. The architecture is self synchronizing for mode or primitive changes.

`The host can mimic any block in the chain by ins