throbber
(12)
`
`United States Patent
`Wolfe
`
`(10) Patent No.:
`(45) Date of Patent:
`
`US 7,079,133 B2
`Jul.18, 2006
`
`US0070791.33B2
`
`(54) SUPERSCALAR 3D GRAPHICS ENGINE
`(75) Inventor: Andrew Wolfe, Los Gatos, CA (US)
`(73) Assignee: S3 Graphics Co., Ltd., Grand Cayman
`(KN)
`
`- r
`(*) Notice:
`
`Subject to any disclaimer, the term of this
`patent is extended or adjusted under 35
`U.S.C. 154(b) by 0 days.
`
`(21) Appl. No.: 11/069,821
`
`(22) Filed:
`
`Feb. 28, 2005
`
`(65)
`
`Prior Publication Data
`US 2005/O195197 A1
`Sep. 8, 2005
`Related U.S. Application Data
`63) Continuati
`faroplication No. 09/715,701 filed
`(63) Continuation of application No
`, /UT, Illed On
`Nov. 16, 2000.
`(51) Int. Cl.
`(2006.01)
`G06T IS/00
`(2006.01)
`G06T L/20
`(2006.01)
`G06F 5/16
`(52) U.S. Cl. ...................... 345/419; 34.5/502; 34.5/506;
`345/574
`(58) Field of Classification Search ........ 345/418-428,
`345/501-506, 537 538,520, 522,553,574,
`345/545, 530: 382/103, 190; 711/100, 117
`See application file for complete search history.
`References Cited
`
`(56)
`
`U.S. PATENT DOCUMENTS
`S. A
`s
`g NS et al.
`4.967:39. A
`10, 1990
`less al
`5,010,515 A
`4/1991 Torborg, Jr.
`5,325,485 A
`6, 1994 Hockmuth et al.
`5,347,618 A
`9/1994 Akeley
`
`5,357,599 A 10/1994 Luken
`5,392,385 A
`2/1995 Evangelisti
`5,408,605 A
`4/1995 Deering
`5,422,991 A
`6, 1995 Fowler
`5,440,682 A
`8/1995 Deering
`5,452,412 A
`9/1995 Johnson, Jr. et al.
`5.457,775 A 10/1995 Johnson, Jr. et al.
`5,459,835 A 10, 1995 Trevett
`5.499,326 A
`3/1996 Narayanaswami
`5.502.462 A
`3, 1996 Mical et al.
`5,539,911 A
`7/1996 Nguyen et al.
`5,560,032 A
`9/1996 Nguyen et al.
`5,572,634 A * 11/1996 Duluk, Jr. ................... 345/419
`5,572,657 A 11/1996 Pindeo et al.
`(Continued)
`OTHER PUBLICATIONS
`Wolfe, Andrew et al., “A Superscalar 3D Graphics Engine.”
`s All Symposium on Microarchitecture, Nov. 17,
`, Israel.
`Primary Examiner Richard Hjerpe
`y
`Assistant Examiner—Wesner Sajous
`(74) Attorney, Agent, or Firm—Carr & Ferrell LLP
`
`ABSTRACT
`(57)
`A method, apparatus and computer program product for
`parallel execution of primitives in 3D graphics engines. It
`includes detection and preservation of dependences between
`graphics primitives with the ability to execute multiple
`independent primitives concurrently while preserving their
`ordering because the architecture of the graphics engine for
`the present invention further provides concurrent resources
`for parallel execution. In a first preferred embodiment,
`primitives are executed in parallel using an in-order dispatch
`unit capable of detecting dependencies between primitives.
`In a second preferred embodiment, an out-of-order dispatch
`unit is used such that not only are primitives executed
`concurrently; but, the primitives may be executed in any
`order when dependencies are detected.
`
`10 Claims, 8 Drawing Sheets
`
`St.
`
`
`
`S46a
`
`
`
`issue unit
`
`6.a 0
`
`22.2-(2-r
`Z2A gra-
`candidate 5ters
`eSWalt alon
`suetus - stro
`
`accelerator i
`
`acceleraic 8
`
`
`
`
`
`T-cache
`
`T-cache
`
`merriory interface unit (MU}
`
`SS a
`
`cardigate bufers
`
`destratii
`resertail StarS
`
`
`
`Site
`reservator slators
`
`w
`
`tag
`
`rimitive
`
`ce wector
`
`y
`
`trixx
`
`MEDIATEK, Ex. 1023, Page 1
`IPR2018-00102
`
`

`

`US 7,079,133 B2
`Page 2
`
`U.S. PATENT DOCUMENTS
`5,574,835 A 1 1/1996 Duluk, Jr. et al.
`5,574,836 A 1 1/1996 Broemmelsiek
`5,574,847 A 1 1/1996 Eckart et al.
`5,596,686 A
`1/1997 Duluk, Jr.
`5,602,570 A
`2/1997 Capps et al.
`5,606,657 A
`2f1997 Dennison et al.
`5,632,030 A
`5, 1997 Takano et al.
`5,649,173 A
`7, 1997 Lentz
`6,108,014 A
`8/2000 Dye
`
`4/2001 Rosman et al. ............. 345/419
`6,222,550 B1
`3. R S. sing et al.
`- - -
`a
`6,417,848 B1
`7/2002 Battle
`6,552,723 B1
`4/2003 Duluk et al.
`6,611.264 B1
`8/2003 Regan
`2002/0158865 A1 * 10/2002 Dye et al. ................... 345/419
`2004/O130552 A1* 7/2004 Duluk et al. ................ 345,506
`
`* cited by examiner
`
`MEDIATEK, Ex. 1023, Page 2
`IPR2018-00102
`
`

`

`U.S. Patent
`
`Jul.18, 2006
`
`Sheet 1 of 8
`
`US 7,079,133 B2
`
`triangles
`
`
`
`world
`transform
`
`E:
`
`i
`
`i
`
`projection
`transform
`
`Figure
`
`clipping Hrasterization
`/2
`(-e J.
`
`rendered trame
`
`MEDIATEK, Ex. 1023, Page 3
`IPR2018-00102
`
`

`

`U.S. Patent
`
`Jul.18, 2006
`
`Sheet 2 of 8
`
`US 7,079,133 B2
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`bdel
`
`Figure 2:
`
`Span 0
`
`
`
`><><>KR-><><
`GGGEGGG
`as feas
`Eas
`SEA
`><>3.255.22
`V. NARS Ssss
`asygia:NA Y
`
`Span 3
`
`span 4
`span 5
`Spar 8
`
`Spar 7
`-> span 8
`fire
`>>3><-> --><
`tail it is ti -> span 9
`
`
`
`MEDIATEK, Ex. 1023, Page 4
`IPR2018-00102
`
`

`

`U.S. Patent
`
`Jul.18, 2006
`
`Sheet 3 of 8
`
`US 7,079,133 B2
`
`fatch Queue
`(unexamined)
`
`in flight
`(running)
`
`/XA
`
`FIG. 3a
`
`fetch queue
`(unexamined)
`
`4.
`/X
`
`FIG. 3b
`
`ir flight
`(running)
`
`MEDIATEK, Ex. 1023, Page 5
`IPR2018-00102
`
`

`

`U.S. Patent
`
`Jul. 18, 2006
`
`Sheet 4 of 8
`
`US 7,079,133 B2
`
`fetch queue
`(unexamined)
`
`in flight
`(running)
`
`FIG. 4a
`
`letch queue
`(unexamined)
`
`in flight
`(running)
`
`AN
`
`fetch dueue
`(unexamined)
`
`s
`
`FIG. 4b
`
`res. Stations
`(waiting)
`
`in flight
`(running)
`
`4|AA
`A
`
`FIG. 4c
`
`MEDIATEK, Ex. 1023, Page 6
`IPR2018-00102
`
`

`

`U.S. Patent
`
`Jul. 18, 2006
`
`Sheet 5 of8
`
`US 7,079,133 B2
`
`fetch unit
`
`issue unit
`
`Ssa é
`
`memory interface unit (MIU)
`
`ew,
`Soa
`
`candioate buffers
`
`destination
`reservation) Slalons
`
`source
`reservation slavons
`
`
`
`FIG. 5
`
`C22a-nv
`
`MEDIATEK,Ex. 1023, Page 7
`IPR2018-00102
`
`MEDIATEK, Ex. 1023, Page 7
`IPR2018-00102
`
`

`

`U.S. Patent
`
`Jul.18, 2006
`
`Sheet 6 of 8
`
`US 7,079,133 B2
`
`MOVE PLURALITY OF
`PRIMTVES INTO ISSUE UNIT
`
`601
`
`STORE DESTINATION
`REGION AND SOURCE
`REGION BOUNDING BOX
`COORDINATES IN
`RESERVATION STATIONS
`
`602
`
`BEGIN EXECUT ON IN
`PARALLEL OF MULTIPLE
`PRMITIVES BY FIRST
`COMPUTING DEPENDENCE
`VECTOR FOR EACH
`PRIMTVE
`
`603
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`PASS PRMITVE TO
`AVAILABLE GRAPHICS
`ACCEE RATOR/
`RASTERIZER
`
`
`
`F.G. 6a
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`WAIT UNTIL PRMIT VE
`CAUSING
`DEPENDENCY S
`COMPLETED
`PROCESSING AND
`THEN PROCESS NEXT
`PRMITIVE IN
`CANDIDATE BUFFER
`
`605
`
`
`
`COMPLETE
`PROCESSING OF
`PRIMTIVE AND CLEAR
`ENTRIES IN
`RESERVATION
`STATIONS
`
`606
`
`PASS NEXT PRIMTVE
`FROM FETCH UNIT TO
`ISSUE UNIT AND
`SEARCH CAND DATE
`BUFFER FOR NEXT
`ODEST ELGIBLE
`PRMTVE
`
`
`
`607
`
`MEDIATEK, Ex. 1023, Page 8
`IPR2018-00102
`
`

`

`U.S. Patent
`
`Jul.18, 2006
`
`Sheet 7 of 8
`
`US 7,079,133 B2
`
`MOVE PLURALITY OF
`PRMITIVES INTO ISSUE UNIT
`
`610
`
`STORE DESTINATION
`REGION AND SOURCE
`REGION BOUNDING BOX
`COORDNATES IN
`RESERVATION STATIONS
`
`615
`
`BEGIN EXECUTION IN
`PARALLEL OF MULTIPLE
`PRMTVES BY FIRST
`COMPUTING DEPENDENCE
`VECTOR FOR EACH
`PRMITIVE
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`PASS PRIMTIVE TO
`AVAILABLE
`GRAPHCS
`ACCE LERATOR/
`RASTERIZER
`
`
`
`
`
`
`
`
`
`
`
`
`
`F.G. 6b
`
`
`
`
`
`BYPASS PRIMITIVE
`FROM PROCESSSING
`AND CAL CULATE
`DEPENDENCY FOR
`NEXT PRIMITIVE IN
`CAND DATE BUFFER
`
`630
`
`
`
`COMPLETE
`PROCESSING OF
`PRIMITIVE AND CLEAR
`ENTRIES IN
`RESERVATION
`STATIONS
`
`635
`
`PASS NEW PRIMITIVE
`FROM FETCH UNIT TO
`ISSUE UNIT AND
`SEARCH CAND DATE
`BUFFER FOR NEXT
`OLDEST ELGBLE
`PRMITIVE
`640
`
`MEDIATEK, Ex. 1023, Page 9
`IPR2018-00102
`
`

`

`U.S. Patent
`
`Jul.18, 2006
`
`Sheet 8 of 8
`
`US 7,079,133 B2
`
`destination
`reservation statioS
`
`initi
`w primitive
`
`new dependence
`
`
`
`
`
`
`
`
`
`
`
`wector IP
`|
`|| T
`H || ||
`--
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`sorce
`reservation stations
`
`FIG. 7
`
`MEDIATEK, Ex. 1023, Page 10
`IPR2018-00102
`
`

`

`US 7,079,133 B2
`
`1
`SUPERSCALAR 3D GRAPHICS ENGINE
`
`The present application is a continuation and claims the
`priority benefit of U.S. patent application Ser. No. 09/715,
`701 filed Nov. 16, 2000 and entitled “Superscalar 3D Graph-
`ics Engine.” The disclosure of this commonly owned appli-
`cation is incorporated herein by reference.
`
`FIELD OF THE INVENTION
`
`The invention relates to 3D graphics systems. More
`specifically,
`the invention relates to an architecture for
`parallel processing of graphics primitives which allows the
`concurrent execution of multiple graphics primitives while
`maintaining exact sequential semantics.
`
`BACKGROUND OF THE INVENTION
`
`Since the widespread introduction of graphicaluserinter-
`faces in computers more than 15 years ago, special-purpose
`graphics accelerators have been an integral component of
`desktop computer systems. Recently, 3D applications such
`as computer gaming, CAD, and visualization applications
`have been pushing high-performance 3D acceleration from
`specialty work-stations into mainstream PCs. The demand
`for increased 3D performanceis currently insatiable. Single-
`chip 3D accelerators today are 50 times faster than those
`available 3-4 years ago. Each generation substantially
`improvesthe quality of the images and yet another factor of
`100 still would not producetruly realistic scenes.
`Like all computing systems, in order to achieve such rapid
`increases in performance,
`it is necessary to improve the
`microarchitecture of each generation to increasingly take
`advantage of parallelism. Over the past few years, 3D
`graphics engines have moved from 32 to 64 to 128-bit
`memory buses in much the same way that microprocessors
`grew from 4 to 8 to 16 and eventually to 64 bit buslines.
`However,
`this advancement
`in 3D graphics engines has
`diminishing returns, especially as applications move toward
`a larger numberofprimitives for each scene to be displayed.
`Current high-performance microprocessors
`all use
`instruction-level parallelism (ILP) to further increase per-
`formance.
`ILP exploits information about dependences
`between instructions to allow parallel execution of multiple
`instructions while maintaining the execution semantics of a
`sequential program. Manydifferent ILP mechanisms can be
`effectively employed to improve performance, however
`dynamically scheduled, out-of-order, superscalar micropro-
`cessors are the commercially dominant microarchitecture at
`the present time.
`This invention includes an approach for using dynami-
`cally scheduled, out-of-order, superscalar principles for
`increasing the available parallelism in a 3D graphics engine
`while maintaining sequential execution semantics. In the
`abstract, it would seem that graphics, and particularly 3D
`graphics, is a massively parallel application that would not
`need ILP technology for high performance.In fact, in many
`simple graphics applications, it would be possible to render
`each pixel
`independently; however,
`in practice, graphics
`applications have very similar characteristics to traditional
`sequential programs. Standard APIs
`like Direct3D or
`OpenGLare used to create graphics applications. These are
`translated via software drivers to a sequence of graphics
`primitives to be rendered by a graphics engine that consists
`of some combination of additional hardware and software.
`
`The programming model assumesthat these primitives will
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`2
`be executed sequentially and atomically, in much the same
`mannerthat it is assumed that instructions in a traditional
`
`sequential ISA are executed.
`Typically, 3D graphics applications will allow blocks of
`frame buffer memory to be directly read or written by the
`main processor. This requires that precise frame buffer state
`be available whenevera direct access executes. This creates
`
`a dependence between previous primitives and a direct read
`or subsequent primitives and a direct write. As in general-
`purpose computing systems, it is possible to build massively
`parallel systems that provide excellent performance on a
`limited set of applications that have been programmed with
`parallel execution in mind. However, in order to be com-
`patible with a large existing base of software using widely
`accepted programminginterfaces and programmingstyles, it
`is necessary to detect the dependences between graphics
`primitives, extract independent primitives from the instruc-
`tion stream, and execute them concurrently. Therefore, in
`order to implement a parallel system for executing this
`sequence of primitives, the semantics of sequential execu-
`tion must be maintained. More particularly, several factors
`cause dependences between graphics primitives which that
`can prevent concurrentor parallel execution of primitives in
`a 3D graphics engine.
`Z-buffering
`Realistic 3D graphics usually include hidden surface
`removal. More specifically, objects that are behind other
`objects from the perspective of the viewer should not be
`visible in the final image. Typically, a Z-buffer is used to
`implement hidden-surface removal. The Z-buffer stores the
`distance from the viewpoint to a currently drawn pixel so
`that when drawing a new pixel, it can be determined if the
`new pixel is in front of or behind the currently drawn pixel.
`A well-implemented Z-buffering algorithm should produce
`the sameresult even if the triangles are drawn in a different
`order. However, if the primitives are executed concurrently,
`then two triangles may be drawn concurrently. If each
`primitive attempts to concurrently read the same Z-buffer
`value, modify it, and then write the new value to the Z-buffer
`using commonread and write operations, incorrect results
`can be produced. A special type of dependence thus exists
`between any two primitives that must access the same
`Z-buffer value. Although they can execute in either order,
`there is currently no known process for executing these
`primitives concurrently.
`
`Alpha-blending
`Alpha-blending is an operation that uses a transparency
`value (alpha) to permit some portion of an occluded object
`to be visible through a foreground object. Unfortunately, the
`primitives that execute alpha-blending operations to make
`objects appear transparent must be executed in order. The
`foreground and the background must be executed in order to
`make the transparent effect appear correct on the image.
`Accordingly, ifthe 3D graphics engine does not maintain the
`semantics of the sequence in which primitives are executed,
`the image will be incorrect.
`
`Dynamic Textures, Procedural Textures, Environment Map-
`ping
`In another feature of realistic 3D graphics, an image
`(called a texture) can be mapped onto another image; for
`example,
`it may be part of the image to have objects
`reflected off of water displayed in the image. Often these
`textures have limited life-times. Procedural
`textures are
`
`created on-the-fly by program code. Dynamic textures are
`loaded into the graphics system memory space from some
`backing store for a limited time. Environment mapping is a
`MEDIATEK,Ex. 1023, Page 11
`IPR2018-00102
`
`MEDIATEK, Ex. 1023, Page 11
`IPR2018-00102
`
`

`

`US 7,079,133 B2
`
`5
`
`10
`
`15
`
`3
`technique for reflections where the 3D objects which is to be
`reflected is drawn and then copied as an image to be mapped
`onto a reflective surface. In each of these cases, there is a
`dependence between the primitives that create the texture
`and the primitives that render the reflective surface or
`polygon upon which the texture is projected. Once again, if
`the 3D graphics engine does not maintain the semantics of
`the sequence in which primitives are executed, the image
`will be incorrect.
`2D BLITS
`Often in 3D graphics, it is advantageous to be able to mix
`2D block copy and drawing operations with 3D rendering.
`If overlapping 2D objects are read or written out of order, the
`resulting image is incorrect.
`Direct Frame Buffer Access
`Common graphics APIs allow blocks of frame buffer
`memory to be directly read from or written to at the same
`time by the processor. This requires that the precise state of
`the frame buffer be available and known whenever an access
`to the frame buffer memory executes. This creates a depen
`dence between any previously executed primitives and a
`direct read or any future executed primitives and a direct
`write.
`Generally, a 3D application creates a series of frames. A
`25
`3D graphics engine then identifies each of the objects in a
`frame and breaks the surface of the object down into a
`collection of triangles for processing (typically the process
`ing and drawings of the pixels within these triangles are
`represented by a serious of executable instructions which are
`referred to as primitives which are processed individually).
`Each triangle or primitive is specified by three vertices in a
`3D space, one or more surface normals, and a description of
`how to draw the triangle's Surface, i.e. texturing, alpha
`blending parameters, etc. Accordingly, from the point of
`view of a 3D graphics engine, a frame consists of a collec
`tion of triangles or primitives which are all processed and
`executed separately thereby rendering the entire frame or
`image. The 3D graphics engine is responsible for processing
`each triangle or primitive and converting them each into
`pixels, which when displayed render the entire 3D frame.
`FIG. 1 illustrates a block diagram which shows a prior art
`3D processing pipeline resident in a prior art 3D graphics
`engine. Generally, the graphics engine identifies the trian
`gular coordinates for each primitive within the shared world
`space of the entire image, applies lighting to the triangles or
`primitives, transforms each triangle or primitive from the 3D
`space used by the application into 2D screen coordinates,
`and draws the appropriate pixels into the frame buffer
`(applying any shading, Z-buffering, alpha bending etc.).
`Referring now to FIG. 1, and more specifically, a first
`stage in a pipeline is a world transform stage 105, in which
`the graphics engine converts the vertices and normals of the
`triangle from the real world object space, which may be
`different for each object in the scene, to the shared world
`space, which is space shared by all of the objects to be
`rendered in the entire scene. This transform consists of a
`matrix-vector multiplication for each vertex and each nor
`mal. In a second stage of the pipeline, a lighting stage 110.
`the graphics engine takes the triangle's color and Surface
`normal(s) and computes the effect of one or more light
`Sources. The result is a color at each vertex. At the next stage
`in the pipeline, a view transform stage 115, the graphics
`engine converts the vertices from the world space to a
`camera space, with the viewer (or camera) at the center or
`origin and all vertices then mapped relative from that origin.
`Additionally, in the view transform stage 115, the graphics
`
`35
`
`4
`engine applies a matrix-vector multiplication to each vertex
`calculated for the camera space.
`As further shown in FIG. 1, the next stage in the pipeline
`is a projection transform stage 120. At the projection trans
`form stage 120, the graphics engine maps the vertices for the
`camera space to the actual view space. This includes the
`perspective transformation from 3D to 2D. Accordingly, at
`this point in the pipeline, the vertices are effectively two
`dimensional to which perspective effects (i.e., depth fore
`shortening) have been applied. Accordingly, the third (Z)
`coordinate is only needed to indicate the relative front-to
`back ordering of the vertices when the objects are rendered
`or drawn within the view space. Like the other two trans
`form stages in the pipeline, the projection transform stage
`requires the application of a matrix-vector multiplication per
`each vertex. In a clipping stage 125, the graphics engine
`clips the triangles or primitives to fit within the view space.
`Accordingly, the triangles or primitives which lie entirely off
`the side of the screen or behind the viewer are removed.
`Meanwhile, triangles or primitives which are only partially
`out of bounds are trimmed. This generally requires splitting
`the resulting polygon into additional multiple triangles or
`primitives and processing each one of these additional
`triangles or primitives separately. Finally, in a rasterization
`stage 130, the graphics engine converts those triangles to be
`displayed within the view space into pixels and computes
`the color value to be displayed at each pixel. This includes
`visible-surface determination (dropping pixels which are
`obscured by a triangle closer to the viewer), texture map
`ping, and alpha blending (transparency effects).
`FIG. 2 further illustrates how a prior art rasterizer stage
`130 in a 3D graphics engine operates. First, the rasterizer
`calculates the centers for each pixel in the triangle or
`primitive and assigns X and y values to these centers. The
`rasterizer stage then converts each triangle or primitive into
`a series of horizontal spans, with one span generated for
`each integery value that falls inside the triangle. For each
`horizontal span, the rasterizer computes the two endpoints,
`i.e. the points where the horizontal span crosses the edges or
`boundaries of the triangle or primitive. The rasterizer will
`also interpolate color values and perspective-corrected tex
`ture coordinates for the endpoints. Next, the rasterizer
`generates the series of pixels along the span, again interpo
`lating color and texture coordinates for each pixel between
`the two endpoints of the horizontal span. Several operations
`are then performed at each pixel. First each pixel has its Z
`(depth) value compared to the Z (depth) value for the
`currently displayed pixel in the same location. The currently
`displayed pixel has its Z (depth) value stored in a Z buffer. If
`the comparison indicates that this new pixel is behind the old
`one, the new pixel is discarded. If the comparison indicates
`that this new pixel is in front of the old pixel then the Z test
`Succeeds and the new pixel color is computed. This can
`include texture mapping and alpha blending. Accordingly,
`prior art 3D graphics are computed serially because each Z
`(depth) value must be compared to the previously displayed
`Z (depth) value.
`Generally, a prior art 3D graphics engine will serially
`perform these steps on each triangle or primitive one at a
`time, such that the triangles or primitives are processed in an
`orderly fashion one after the other in series. One reason this
`is done is to avoid any dependencies which may occur
`between the primitives or triangles as they are each
`executed. As explained earlier, as each primitive is executed
`for processing, the Z values for each pixel location in the
`triangle or primitive is compared with the Z value previously
`displayed in that same location on the two dimensional
`
`30
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`MEDIATEK, Ex. 1023, Page 12
`IPR2018-00102
`
`

`

`US 7,079,133 B2
`
`5
`screen in order to determine whether the new pixel should
`overwrite (appear in front of) the old value or be ignored
`(appear behind). If the triangles are not executed in order,
`then the Z value-test results will be faulty.
`However, the present invention is directed toward a
`method, apparatus and computer program product for par
`allel execution of primitives in 3D graphics engines. It
`includes detection and preservation of dependences between
`graphics primitives. Accordingly, the present invention has
`the ability to execute multiple independent primitives con
`currently while preserving their ordering because the archi
`tecture of the graphics engine for the present invention
`further provides concurrent resources for parallel execution.
`In a first preferred embodiment, primitives are executed in
`parallel using an in-order dispatch unit capable of detecting
`dependencies between primitives. In a second preferred
`embodiment, an out-of-order dispatch unit is used such that
`not only are primitives executed concurrently; but, the
`primitives may be executed in any order when dependencies
`are detected.
`
`10
`
`15
`
`6
`multiple instruction/multiple data MIMD fashion, which
`means the graphics engine is capable of processing multiple
`primitives at the same time.
`In Order Dispatch Unit
`FIGS. 3a–b illustrate how a preferred embodiment of a
`graphics engine uses an in-order dispatch system in order to
`dispatch primitives for concurrent execution by two or more
`graphics accelerators in accordance with the present inven
`tion. In this embodiment, each primitive is ranked in order
`in an input list and is dispatched from the head of the input
`list, in order, as an accelerator becomes available. As shown
`in FIG. 3a, a 3D image is broken into a number of triangles
`or primitives, i.e. each triangle is a primitive which must be
`processed for display by the graphics engine. In a typical
`prior art 3D graphics engine, the coordinates for the vertices
`of the primitives are stored in a queue for serial execution/
`processing by the 3D graphics engine. The primitives are
`executed in order serially one at a time from the queue.
`However, in the present invention more than one primitive
`may be executed at a time so the queue has what is referred
`to as an in-order dispatch system.
`Assume in the preferred embodiment illustrated that there
`are two graphics accelerators capable of executing two
`graphics primitives in parallel such that two primitives may
`be dispatched to the two simultaneously executing accelera
`tors/rasterizers in the 3D graphics engine. Accordingly, FIG.
`3b shows primitives 1 and 2 being executed while primitives
`3, 4 and 5 remain in the queue. When using an in-order
`dispatch unit in accordance with a preferred embodiment of
`the present invention, each Subsequent primitive is tested
`against currently executing primitives in order to detect any
`dependencies. Accordingly, in our example, primitive 3 will
`be tested against primitives 1 and 2 for dependencies. Since
`primitive 3 overlaps primitives 1 and 2, primitive 3 cannot
`be executed until primitives 1 and 2 are both complete. This
`is because triangles associated with primitives 1 and 2 are
`both overlapped by the triangle represented by primitive 3.
`Thus, even if primitive 1 completes before primitive 2.
`primitive 3 cannot begin processing/execution until primi
`tive 2 has completed. In this embodiment, the graphics
`engine cannot execute primitive 4 because it includes an
`in-order dispatch system which does not permit out of order
`execution and processing of the primitives. Accordingly, the
`in-order dispatch system prevents the next primitive in the
`list from executing until all previous primitives that overlap
`that primitive have been completed and all primitives which
`are in the queue before that primitive have executed. This
`allows for parallel processing and execuion of primitives
`while preserving the ordering and integrity of the primitives
`thereby insuring that the final graphic is properly drawn.
`Out of Order Dispatch
`FIG. 4a–c illustrate how a preferred embodiment of a
`graphics engine uses an out-of-order dispatch system in
`order to dispatch primitives for concurrent execution by two
`or more graphics accelerators in accordance with the present
`invention. As shown in FIG. 4a, a 3D image is broken into
`a number of triangles, each triangle represented by a primi
`tive. The primitives are stored in a queue for execution/
`processing by the 3D graphics engine.
`Assume in the preferred embodiment illustrated that there
`are two graphics accelerators capable of executing two
`graphics primitives in parallel such that two primitives may
`be dispatched to the two simultaneously executing accelera
`tors/rasterizers in the 3D graphics engine. However, unlike
`the preferred embodiment which utilizes an in-order dis
`patch unit, in the embodiment which utilizes an out of order
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`25
`
`35
`
`FIG. 1 illustrates a block diagram which shows a prior art
`3D processing pipeline resident in a prior art 3D graphics
`engine;
`FIG. 2 illustrates further how a prior art rasterizer stage
`130 in a 3D graphics engine operates;
`FIGS. 3a–b illustrate how a first preferred embodiment of
`30
`a 3D graphics engine uses an in-order dispatch mechanism
`which dispatches each primitive from the head of the input
`list, in order, as an accelerator becomes available;
`FIG. 4a–c illustrate the basic concepts used in the process
`of 3D graphics in accordance with a second preferred
`embodiment of the present invention which uses an out-of
`order dispatch mechanism;
`FIG. 5 illustrates a functional block diagram showing an
`out-of-order dispatch mechanism in accordance with a pre
`ferred embodiment of the present invention;
`40
`FIGS. 6a and 6b illustrate the process executed when
`processing graphics primitives and using either an in-order
`or an out-of-order dispatch mechanism in accordance with a
`preferred embodiment of the present invention; and
`FIG. 7 illustrates a block diagram which shows a pre
`ferred embodiment of the hardware used to compute depen
`dency vectors for each primitive.
`
`45
`
`DETAILED DESCRIPTION OF PREFERRED
`EMBODIMENTS
`
`In the present invention, an out-of-order dispatch 3D
`graphics engine detects dependences and maintains sequen
`tial semantics at the rasterizer stage to extract ILP from the
`graphics primitive stream. Multiple rasterization engines are
`used to provide execution resources for the graphics primi
`tives such that multiple graphics triangles or primitives may
`be executed at the same time. Since primitives can have long
`execution times, from several cycles to several hundred
`thousand cycles, it is not very important to dispatch more
`than one primitive per cycle, although it is possible to do so.
`However, it is important to have many primitives executing
`concurrently. Also, since the primitives do many memory
`accesses and require an indeterminate number of cycles to
`execute, it is best to have the rasterization engines run
`independently. Accordingly, in a preferred embodiment of
`the present invention, the graphics engine operates in a
`
`50
`
`55
`
`60
`
`65
`
`MEDIATEK, Ex. 1023, Page 13
`IPR2018-00102
`
`

`

`US 7,079,133 B2
`
`7
`dispatch unit, the primitives may be executed out-of-order
`since the queue has an out-of-order dispatch system.
`Accordingly, once again assume in the example illustrated
`that two primitives may be dispatched to two simultaneously
`executing accelerators/rasterizers in the 3D graphics engine.
`FIG. 4b shows primitives 1 and 2 being executed while
`primitives 3, 4 and 5 remain in the queue. When using the
`out-of-order dispatch unit in accordance with a preferred
`embodiment of the present invention, each Subsequent
`primitive is tested against currently executing primitives in
`order to detect any dependencies. Accordingly, in our
`example, primitive 3 will be tested against primitives 1 and
`2 for dependencies. Since primitive 3 overlaps primitives 1
`and 2, it cannot be executed until primitives 1 and 2 are both
`complete. Using the out-of-order dispatch unit, if primitive
`1 completes before primitive 2, primitive 3 may be bypassed
`and primitive 4 is then processed/executed. Once primitive
`2 completes processing, then primitive 3 may be executed.
`Accordingly, the out-of-order dispatch system prevents the
`next primitive in the list from executing until all previous
`primitives that overlap that primitive have been completed
`while allowing Subsequent primitives which do not overlap
`to be executed. This allows for parallel processing and
`execution of primitives while preserving the ordering and
`integrity of the primitives thereby insuring that the final
`graphic is properly drawn.
`Therefore, utilizing the preferred embodiment of the
`present invention in which an out-of-order dispatch unit is
`used, although primitive 3 cannot be approved for execution
`since it overlaps with the currently executing primitives, in
`order to extract additional parallelism, it is preferable to
`allow out of-of-order dispatch of other primitives in the
`queue. In this case, stalled operations or primitives which
`cannot be executed are placed into central reservation sta
`tions (i.e. primitive 3) and other operations that can proceed
`independently (i.e. other primitives which can be processed
`immediately since there is no overlap) are allowed to bypass
`them. Accordingly, if either primitive 1 or 2 completes
`before the other, primitive 4 can begin execution, as shown
`in FIG. 4c.
`Dependency Checking
`In both embodiments (i.e. using either an in-order or an
`out-of-order dispatch mechanism), all incoming primitives
`are placed in reservation stations and the oldest eligible
`candidate (i.e., the oldest primitive eligible for execution
`which has no dependencies) is dispatched each cycle as an
`accelerator/rasterizer becomes available. Accordingly, in
`both embodiments, it is necessary to have a mechanism for
`checking the dependency of each incoming p

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket