throbber
(12) United States Patent
`Furtner
`
`(10) Patent No.:
`(45) Date of Patent:
`
`US 6,778,177 B1
`Aug. 17, 2004
`
`US0067781 77131
`
`1/1999 Buekelew et al.
`5,864,512 A *
`6/1999 Aleksic
`5,914,722 A *
`2/2001 Min
`6,184,907 B1 *
`5/2001 Watldns .
`6,236,408 B1 *
`FOREIGN PATENT DOCUMENTS
`19600431 A1
`9/1996
`2297018 A
`7/1996
`OTHER PUBLICATIONS
`
`
`
`365/230.01
`345/423
`345/558
`345/582
`
`(54) METHOD FOR RASTERIZING A GRAPHICS
`BASIC COMPONENT
`
`(75)
`
`Inventor: Wblfgang Furtner,Fuerstenfeldbruck
`(DE)
`
`(73) Assignee: SP3D Chip Design GmhH, Starnberg
`(DE)
`
`DE
`GB
`
`(_ * ) Notice:
`
`Subject to any disclaimer, the term of this
`patent is extended or adjusted under 35
`U.S.C. 154(b) by 0 days.
`
`(21) Appl. No.:
`(22) PCT Filed:
`
`09/958,297
`Apr. 10, 2000
`
`PCT/EP00/03175
`
`(86) PCT No.:
`§ 371 ('C)(1)a
`Feb. 19, 2002
`(2), (4) Date:
`(87) PCT Pub. No.: W000/63846
`PCT Pub. Date: Oct. 26, 2000
`
`Foreign Application Priority Data
`(30)
`Apr. 15, 1999
`(DE)
`....................................... .. 199 17 092
`
`Int. Cl.7 ........................................ .. G06F 12/02
`(:51)
`(52) U.S. Cl.
`. . . . . . .
`. . . . . . . . . .. 345/544; 345/443; 345/506
`(58) Field of Search ............................... .. 345/544, 501,
`345/502, 503, 505, 506, 530, 418, 440,
`441, 442, 443
`
`('56)
`
`References Cited
`U.S. PATENT DOCUMENTS
`
`A
`A
`
`11/1994 Bowen el al.
`'10/1998 Watkins
`
`J.D. Foley, et al., “Grundlagender Computergraphik”, 1. Ed.
`Addison—Wesley, 1994, ISBN: 3—89319—647—1; pp 75-82
`and 101-106.
`
`cited by examiner
`Primary Examiner—Kee M. Tung
`(74) Attorney, Agent, or Firm—Dougherty, Clements,
`Ilofer & Bernard
`
`(5 7,)
`
`ABSTRACT
`
`A method for rasterizing a graphic primitive (120) in a
`graphics system generates, starting from graphic primitive
`description data, pixel data for the graphic primitive, the
`graphics system comprising a memory which is divided up
`into a plurality of blocks (a, a+1, b, b+1) which are each
`associated with a predetermined one of a plurality of areas
`on a mapping screen (114). Each block of the plurality of
`blocks (a, a+1, b, b+1) is associated with a memory page in
`the memory. The i11etl1od includes scanning the pixels asso-
`ciated With the graphic primitive (120) in one of the plurality
`of blocks (a) into which the graphic primitive extends,
`repeating the preceding steps until all of the pixels associ-
`ated with the graphic primitive have been scaimed in each of
`the plurality of blocks into which the graphic primitive
`extends, and outputting the pixel data.
`
`15 Claims, 12 Drawing Sheets
`
`[042
`
`14_)(,a
`
`10“,
`
`10611
`
`108:!
`
`
`
`Rendering
`
`engine
`
`..w —
`4*’
`Vmex Processor
`data
`
`
`
`0001
`
`Volkswagen 1004
`
`0001
`
`Volkswagen 1004
`
`

`
`U.S. Patent
`
`Aug. 17, 2004
`
`Sheet 1 of 12
`
`US 6,778,177 B1
`
`Rendering
`engine
`
`
`
`In?Ffiifififlfisjut?fiyguli
`
`0002
`
`

`
`U.S. Patent
`
`Aug. 17, 2004
`
`Sheet 2 of 12
`
`US 6,778,177 B1
`
`E+dex
`
`Em ,
`
`E+(dex+dey)
`
`132
`
`Cluster:
`
`Non-visible :E
`
`Visible
`
`-E‘
`
`2
`
`0,,
`
`0003
`
`

`
`U.S. Patent
`
`UA
`
`one
`
`40027:1
`
`Sheet 3 of 12
`
`US 6,778,177 B1
`
`II.
`
`__________
`
`4
`
`2.
`
`2, 4,
`Rendering engine
`Texture engine
`
`
`
`.l.:.l.I1..|l7.
`
`0004
`
`

`
`U.S. Patent
`
`Aug. 17, 2004
`
`Sheet 4 of 12
`
`US 6,778,177 B1
`
`Starting values
`t.
`lgeltas.
`d.
`Ccifitslglrnfgorxlliec Ion
`Interleave factors
`Interleave offset
`
`V.
`.b_]_
`151
`1 it
`Start command
`damp y
`mination _> Visibility
`-
`O '
`I
`“mt
`Var ap
`
`Leading pixel
`
`Dir_X 4.‘
`IIII 25110
`II+L>Dir_Y
`
`0005
`
`

`
`U.S. Patent
`
`Aug. 17, 2004
`
`Sheet 5 of 12
`
`US 6,778,177 B1
`
`x_mn mod clx
`[y_sun mod cly|
`
`ccfx = - (x_slar( mod clx)
`ccfy I -(_y_:L1n mod cly)
`
`ccfx = (x_surt mod cix) . (clx .1)
`ccfy = (y Vstafi mod cly)~ (sly —l)
`
`d1I_x = 0[dir_y =0]
`
`(x_slanlclx) mod ilfx =
`[(y_sLan/ciy) mod ilfy =1
`
`dir-x = I
`
`|dir_y =l]
`
`(x_starUcIx) mod ilfx :
`[(y_sl.1:1/cly) mod ilfy =]
`
`iiox[i1oy)
`
`icfx=(ilox—((x_sIan/clx) mod il{x)) mod ilfx
`icfy =(iloy~((y_stz:1/ciy) mod iify)) 1-nod ilfy
`
`icfx=(((K_S\art/clx) mod ilfx)-ilox) mod ilfx
`Icfy=(((y_:IxnIc}y) mod ilfy)-iloy) mod ilfy
`
`Definition
`
`ccfx + cix ' Icfx. Start correction factor in the x direction
`
`“TY * CIY ‘ icfy. Slarl correction factor in the y direction
`
`cI_star1. + scfic ‘ cl_d.x + scfy ‘ el_dy
`
`enl_sui1. + scfiz ‘ e.n1_dx + scfy ‘ r.ni_dy
`
`cn2_:un 1- scfir ‘ a12_dx # scfy ‘ m2_dy
`
`if dir_x = 0 then x_sun‘ = x_xun + scfx cls: x_xun‘ = x_slan - scfx
`if dir_y = 0 then y_sur1‘ = y_sur1 + scfy else y_xun‘ = y_slIrl . scfy
`ccm_sun v scfx
`
`u:n\_:un ~ xcfy
`
`0006
`
`

`
`U.S. Patent
`
`Aug. 17, 2004
`
`Sheet 6 of 12
`
`US 6,778,177 B1
`
`log2(i1fy * cly)
`log2(i1fx * clx)
`
`E‘ Sensor for syl
`el+dx+2dy
`
`calculation:
`
`“
`:5
`f“fif C
`
`;_ T3
`
`an
`
`0007
`
`

`
`U.S. Patent
`
`Aug. 17, 2004
`
`Sheet 7 of 12
`
`US 6,778,177 B1
`
`Term
`
`Definition
`
`ccn1_dccr
`
`scnt_decr
`
`ccnt - clx ‘ ilfx
`
`scnl — cly ‘ ilfy
`
`(cl_x, cnl_x, cn2_x)
`
`(cl, cnl, cn2) + clx ‘ ilfx * (cl_dx, cn1_dx, cn2_dx)
`
`(c1_y, cn1__y, cn2_y)
`
`(cl, cnl, cn2) + cly “ ilfy ' (elwdy, cn1_dy, cn2_dy)
`
`(cl_xy, cn1_xy, cn2_xy)
`
`(cl, cnl, cn2) + clx ‘ ilf‘ (cl_dx, cnl_dx, cn2_dx) + cly " iify ‘ (cl_dy, cn1_dy,
`cn2*dy)
`
`syl
`
`(cl + (clx - 1) “ cl_dx + cly ‘ ilfy “ cl_dy) >= 0
`
`scnt_dccr >= 0
`
`((cnl + clx ‘ ilf‘ cnl_dx )>= 0 AND (cn2 + cl): ‘ Hf‘ cn2_dx) >= 0) OR
`((cn1 + clx “ ilf‘ cnl_dx + cn1_dy) >= 0 AND (en2 + clx ‘ i1f* cn2_dx +
`cn2_dy) >= 0) OR
`
`((cnI + clx " ilf‘ cniwdx + (cly -I) ’ cn!_dy) >= 0 AND (cn2 + clx ‘ iii" *
`cn2_dx + (cly ~I) “ cn2_dy) >= 0)
`
`ccnt_dccr>= O
`
`ifdir_x = 0 then cos = ((x div ilfx) mod stw >= stw - clx) clsc cos = ((x div ilfx)
`mod slw < clx )
`
`Movement
`
`Condition
`
`x direction
`
`y direction
`
`xy direction
`
`Edge return
`
`Next stri
`
`I3
`
`(/cos AND sxn AND soc) on ((cos AND /sf AND sxn AND
`soc) AND ((/cf AND /syl) OR /ssc))
`
`'
`
`'
`
`(cos OR /sxn OR /scc) AND ssc AND /ef AND syl
`
`/sxn AND ssc AND /cf AND /syl AND /sf AND soc
`
`(cos OR /sxn OR /scc) AND ssc AND cf
`
`(/ssc AND sf AND /sxn AND soc) OR ((cos OR /sxn) AND /cf
`AND /syl AND sf AND scc) OR (605 AND sf AND /ssc & soc)
`
`Next primitive
`
`(/ssc AND /sf AND /(sxn AND sec» on User: AND /syl & /ct)
`OR (/scc AND /ssc)
`
`0008
`
`

`
`U.S. Patent
`
`Aug. 17, 2004
`
`Sheet 8 of 12
`
`US 6,778,177 B1
`
`Register
`
`Value to be loaded
`
`Condition
`
`(el, enl, en2)
`
`(el_x, en1_x, en2__x)
`
`(el y, enl_y, en2_y)
`
`(cl_xy, cn1__xy, en2_xy)
`
`(eI_edge, en1_edge, en2_edge)
`
`(el_sLrip, en I_strip, en2_strip)
`
`hon
`
`vctt
`
`diag
`
`edge
`
`strip
`
`(el_edge, en1_edge,
`en2_edge)
`
`(el__strip, enl_strip,
`en2_slrip)
`
`(eI_starl, en I _start, en 2_start)
`
`newp
`
`(elj, enlj, enlj)
`
`horz AND syl AND /ef
`
`_fl
`
`(e1_x_ en1_x, en2_x)
`
`(vert OR edge) AND sxn AND
`sec AND /sf
`
`ccnl
`
`ccnt__decr
`
`horz OR diag
`
`mp
`
`horz AND syl AND /ef
`
`(vcrt OR edge) AND sxn AND
`scc AND /sf
`
`vert OR edge OR diag
`
`ncwp
`
`(ven OR edge) AND sxn & sec
`AND /sf
`
`(van OR edge) AND sxn AND
`soc AND sf
`
`syl AND horz
`
`/(syl AND horz)
`
`(vcrt OR edge) AND sxn AND
`SCC
`
`strip OR ncwp OR diag
`
`ccnl_Sta11
`
`cont
`
`ccnt_decr
`
`scnl_decr
`
`scnt_sta11
`
`1
`
`1s
`
`cnt_st1ip + 1
`
`0
`
`ccnt_edge
`
`0009
`
`

`
`U.S. Patent
`
`Aug. 17, 2004
`
`Sheet 9 of 12
`
`US 6,778,177 B1
`
`Condition
`
`Comment
`
`11012 AND /cf
`
`horz AND ef
`
`go to the next pixel in the scan line,
`track edge return position
`
`go to the next pixel in the scan line,
`secure edge return position
`
`(vert OR edge) AND /sf
`
`(VCI1 OR edge) AND sf
`
`go to the edge entry position,
`track strip entry position
`
`go to the edge entry position,
`secure strip entry position
`
`diag
`
`SWP
`
`go diagonally, secure no return position
`
`go to the strip entry point
`
`go to the next primitive
`
`vis(cofs,rofs) = (el + cofs * el_dx + rofs * el_dy) >= 0 AND
`(enl + cofs * en1_dx + rofs * enl_dy) >= 0 AND
`(en2 + cofs * en2_dx + rofs * en2_dy) >= 0 AND
`(cent - cofs) >= 0 AND (sent - rofs) >= 0
`
`(°°f5 / mfs)
`
`Leading pixel
`
`0010
`
`

`
`U.S. Patent
`
`Aug. 17, 2004
`
`Sheet 10 of 12
`
`US 6,778,177 B1
`
`EEEEEEEE
`u:>—-IQ)v—dDJL.)—-
`
`Efiififififiififii
`Ix)IQIQl\)
`
`EEEEEEEH
`U1L»)Lo.)l—‘1,)—-
`
`
`L‘)1-1EH
`
`EEEEEEEE
`
`WW
`
`W E
`
`! W
`
`Interpolator
`
`0011
`
`

`
`U.S. Patent
`
`Aug. 17, 2004
`
`Sheet 11 of 12
`
`US 6,778,177 B1
`
`Parallel
`scan COD-
`
`version
`
`Interpolator
`commands
`
`Parallel parameter
`interpolation
`
`Memory subsystem
`
`30
`
`(Xbyl)
`
`BUSY) : (X‘Xo) (Y1'Y0) ' (Y'Yo) (X1'X0)
`
`P(xp,yP)
`
`X Q(Xq:yq)
`
`X
`
`E(XpaYp) 2 O
`
`E(xq’yq) > O
`
`E(xf>y[) < 0
`
`E(x+ 1)’) : E(xxy) +dex
`
`dex : U’1*)’o)
`
`E(xy+ 1) = E(xy)+dey
`
`day = —(x.-xo)
`
`0012
`
`

`
`U.S. Patent
`
`Aug. 17, 2004
`
`21f021tCChS
`
`US 6,778,177 B1
`
`Block (bank 1)
`
`Block (bank 0)
`
`3 pixels (32 bits each)
`
`0013
`
`

`
`US 6,778,177 B1
`
`1
`METHOD FOR RASTERIZING A GRAPHICS
`BASIC COMPONENT
`
`FIELD OF THE INVENTION
`
`The present invention relates to a method for rasterizing
`a graphic primitive and,
`in particular,
`to an accelerated
`method for rasterizing a graphic primitive in a graphics
`system in order to generate pixel data for the graphic
`primitive from graphic primitive description data.
`BACKGROUND OF THE INVENTION AN
`DESCRIPTION OF THE PRIOR ART
`
`For accelerating the process of image-rendering of three-
`dimensional images, it is known to use multi—processors or
`hardware pipelines in parallel. Each of these units acts upon
`a sub-set of the information contained in an entire image, as
`has been described by James D. Foly et. al in “Computer-
`graphic Principles and Practice”, second edition, 1990,
`pages 887 to 902. This task can be divided up by either
`processing, in parallel, objects (polygons) in the image, or
`by processing certain sections of the image in parallel. Mere
`implementation of the division of objects leads to a subdi-
`vision of the object level description of a scene (vertex list),
`so that each of the processors is equally loaded. This division
`is carried out
`independently of the arrangement of the
`respective objects in the three-dimensional world or in a
`frame buffer.
`
`The implementation of the division of the task by forming
`sections in an image is effected by subdividing a frame
`buffer into sub-sections which normally have the same size.
`With regard to dividing the frame buffer,
`there is the
`possibility of either associating the same with large, con-
`tinuous pixel blocks or of effecting the association in an
`interleaved manner.
`
`FIG. 21 shows the above-described possibilities of parti-
`tioning a frame buffer with regard to the case of a graphics
`system operating with four graphics processing engines.
`FIG. 21a shows the association of large continuous pixel
`blocks to the respective graphics processing engines. As can
`be seen,
`in this exemplary case,
`the frame buffer 10 is
`subdivided into four equallysized blocks which are associ-
`ated to the engines. FIG. 21b shows the interleaved frame
`partitioning of the frame buffer 10, and it can be seen that the
`processing of the individual pixels 12, which are represented
`by the boxes in FIG. 21, is effected in an interleaved manner
`by the four graphics processing engines of the graphics
`system.
`Interleaved partitioning is used very frequently, since it
`offers the advantage that the workload of the individual
`processors is automatically balanced. Except for the smallest
`polygons, all polygons are located in all partitions of the
`frame, so that almost every image renderer is supplied with
`the same number of pixels. Interleaved frame buffer parti-
`tioning is also referred to as “distributed frame buffer”.
`FIG. 22 shows a block diagram of a conventional graphics
`system having a pipeline for pixel processing. The graphics
`system, in its entirety, is denoted by reference numeral 14
`and includes a scan converter 16 receiving, at its input, data
`which write onto the graphic primitive, e.g., a polygon, to be
`processed. The scan converter 16 processes the received data
`and produces, at its output, interpolator commands which
`are entered into a parameter interpolator 18. The output of
`the parameter interpolator 18 is connected to a first input of
`a pixel pipeline 20. The output of the pixel pipeline 20 is
`connected to a memory subsystem 24 via a packing unit 22.
`
`2
`Data from the memory subsystem 24 are supplied to the
`second input of the pixel pipeline 20 via a depacking unit 26.
`FIG. 23 shows a block diagram of a conventional graphics
`system with a plurality of pipelines working in parallel. The
`graphics system,
`in its entirety,
`is denoted by reference
`numeral 28, and identical or similar elements, such as in the
`system in FIG. 22, are provided with the same reference
`numerals. Unlike the graphics system illustrated in FIG. 22,
`the scan converter 16 is designed as a parallel scan converter
`and, similarly, the parameter interpolator 18 is designed as
`a parallel parameter interpolator. This parallel parameter
`interpolator 18 has a plurality of outputs for supplying data
`to a plurality of pixel pipelines 200—20,,, outputs of the pixel
`pipelines being connected with the packing unit 22. The
`depacking unit 26 is connected with the second inputs of the
`respective pixel pipeline 200-20”.
`Parallel image processing using interleaved frame parti-
`tioning constitutes a very suitable method for hardware
`implementation of image-rendering pipelines, as shown in
`FIG. 23. The memory subsystem 24 typically manages
`so-called memory words containing a plurality of pixels. A
`128-bit word, for example, contains four color pixels (true
`color pixels), with each pixel including 32 bits. The memory
`subsystem 24 can either read or write such a word during a
`clock cycle. In a graphics system having a single pixel
`pipeline, such as is shown in FIG. 22, the depacking unit 26
`must, for fragment calculation (e.g.,
`texture fade—overs,
`reflecting additions,
`target
`fade—overs, dithering, raster
`operations, and the like), extract one pixel per clock and
`convert it into the internal color format. Packing unit 22
`converts the results of the pixel pipeline calculation into the
`color format stored in the memory and unites several pixels
`to form one memory word.
`Systems having several image-rendering pipelines, as are
`shown in FIG. 23, can process, in parallel, several pixels
`contained in one memory word. If the number of pixel
`pipelines is equal to the number of pixels per memory word,
`packing and depacking the same becomes trivial.
`Graphics processing systems mostly use image-rendering
`engines whose primitives are polygons. In addition, these
`polygons are limited to certain types, such as triangles or
`quadrilateral elements. More complex polygons can then be
`defined using these graphic primitives.
`The basic challenge in processing graphic primitives is
`that determining whether a point in a screen area is within
`or outside the graphic primitive to be rendered must be as
`simple as possible. For triangles, this can be achieved, for
`example, in that the three edges forming the graphic primi-
`tive are written onto by means of linear edge functions.
`FIG. 24 shows an example of a linear edge function. In 130
`Cartesian co-ordinate system in FIG. 24, an edge 30 of a
`graphic primitive is illustrated by way of example, and 1162
`starting point and the end point, respectively, of the edge are
`determined by the co-ordinates X0 and yo and x1 and y1,
`respectively.
`It can be determined by the edge function indicated in [18
`right-hand section of FIG. 24 whether a point within [16
`Cartesian co-ordinate system is located to the left or the right
`of the edge or on the edge. Point P is located on the edge 30
`and, in this case, the value for the edge function is 0. Point
`Q is located to the right of edge 30 and, in this case, [16
`result of the edge function is larger than 0, whereas for point
`R, which is located to the left of edge 30, the result of [16
`edge function is smaller than 0. In other words, each of [16
`linear edge functions yields a value of 0 for co-ordinates
`which are located exactly on the edge or on the line, a
`
`8
`
`0014
`
`

`
`US 6,778,177 B1
`
`'
`
`3
`positive value for co-ordinates located to one side of the line
`or edge, and a negative value for co-ordinates located to the
`other side of the line or edge. The sign of the linear edge
`f1I11ction subdivides the drawing surface into two l1alf-
`planes.
`Linear edge functions are further described in the follow-
`ing articles: J. Pineda “A Parallel Algorithm for Polygon
`Rasterisation” Seggraph Proceedings, Vol. 22, No. 4, 1988,
`pages 17 to 20; H. Fuchs et. al, “Fast Spheres Shadows,
`Textures, Transparences, and Image Enhancements in Pixel-
`Planes”; Seggraph Proceedings, Vol. 19, No. 3, 1985, pages
`111 to 120; Dunnet, White, Lister, (jrinsdale University of
`Sussex, “The Image Chip for High Performance”, IEEE
`Computer Graphics and Applications, November 1992,
`pages 41 to 51.
`By multiplying the edge functions with the value of -1,
`the sign for the half—planes can be inverted, and the edge
`function can further be normalized for indicating a distance
`of a point from the edge, as has been described by A.
`Schilling in “A New, Simple and Efficient Antialiasing with
`Subpixel Marks”, Seggraph Proceedings, Vol. 25, No. 4,
`1991, pages 1, 2, 3 to 141. This is useful, in particular, for
`pixel overlap calculations for performing edge antialiasing
`(antialiasing=measure for reducing image distortions).
`The linear edge functions are calculated incrementally
`from a given starting point, which is particularly desirable
`for hardware implementations, since this offers the possi-
`bility of merely using simple adders instead of costly
`multipliers. FIG. 25 shows an example of edge function
`increments, wherein the starting point
`is denoted by E,
`E+de,, indicates the incrementation in the x direction, and
`E+dey indicates the incrementation in the y direction. The
`right—hand part of FIG. 25 describes the determination of the
`incremental values of de_,_ and dey, respectively. If the edge
`function is, itself, normalized or inverted, it is required to
`also normalize and invert the delta values for the incremen-
`tal steps, indicated in FIG. 25.
`For a triangle, the three edge functions can be arranged
`such that all three edge functions supply positive values only
`for such co-ordinates which are located within the triangle.
`If at least one edge function yields a negative value, the
`co-ordinate in question, i.e., the pixel in question, is located
`outside the triangle. FIG. 26A shows the sign distribution for
`the three edges 30a, 30b, 30c of a triangle-shaped graphic
`primitive 32. The boxes 12 shown in FIG. 26 each illustrate
`an illustratable pixel. As can be seen, the edge functions for
`the edges 30a to 30d yield a negative value whenever the
`co-ordinate is located outside the graphic primitive 32, and
`a result with a positive sign is output only when the
`co-ordinate is located within the same.
`
`Typically, the scan conversion hardware obtains the edge
`function values of all three edges for a given starting point
`together with the delta values for the x and y directions, so
`as to enable incremental calculation of the successive
`co-ordinates. With each clock, the scan converter advances
`by one pixel in the horizontal direction or by one pixel in the
`vertical direction. FIG. 26B shows a potential traversing
`algorithm for passing through the triangle 32 already shown
`in FIG. 26A. The scan path is shown in FIG. 26B and, as can
`be seen, the triangle is passed through up to the last pixel 36
`in the manner shown, starting from a starting pixel 34. From
`here, the algorithm jumps to a further graphic primitive to be
`processed. In traversing the graphic primitive, edge function
`values for older positions can be stored so as to enable a
`return to the same or to their neighbors. The aim is to
`consume as few clock cycles per triangle as possible or, in
`
`4
`to avoid the scanning of pixels outside 1:16
`other words,
`triangle, which will be referred to as invisible pixels in 1:10
`further course of the description. For example, a simple
`method might consist in traversing all pixels which are
`contained within the enclosure triangle of the graphic primi-
`tive and in verifying the same with regard to their visibili y.
`This would evidently mean that at least 50% of non-visible
`pixels would have to be traversed. In contrast to this, tie
`algorithm shown in FIG. 26B is developed further, after it
`has scanned the triangle on a scan line-by-scan line basis,
`with a leading edge of the triangle being tracked. T1e
`leading edge of the triangle is that which exhibits the largest
`extension in a direction perpendicular to the scanning direc-
`tion or to the scan line. With most triangle forms, traversing
`invisible pixels to a large extent is thereby avoided, and 1163
`percentage of scanned invisible pixels rises only for very
`narrow triangles.
`The scan lines may be defined either horizontally or
`vertically or even with changing orientations, depending on
`the triangle to be examined. In practice, it is expedient to
`restrict scan conversion to horizontal scan lines, as this
`aligns the scan with the display—refreshing scan and,
`moreover, a memory access can typically be optimized only
`for one scan axis. If the scan lines are horizontally defined,
`the leading edge of the triangle is defined by the two vertices
`exhibiting the largest difference regarding their y
`co-ordinates. In order to assure symmetrical behavior after
`rasterization or scan conversion, it is desirable to change 116
`vertical and horizontal scanning directions as a function of
`t1e inclination of the leading edge and as a function of 1:16
`orientation of the triangle, respectively. FIG. 27 shows
`eifferent scanning directions for different types of triangles.
`As can be seen, for the triangle of type A, the horizon al
`scanning direction is defined in the positive x direction, and
`t1e vertical scanning direction is defined in the positive y
`cirection. For the triangle of type B, the horizontal scanning
`eirection is defined in the negative x direction, and 1162
`Vertical scanning direction is defined in the positive y
`cirection. For the type C triangle, the horizontal scanning
`cirection is defined in the positive x direction, and he
`vertical scanning direction is defined in the negative y
`cirection, and for the type D triangle, the vertical scanning
`cirection is defined in the negative y direction, and he
`horizontal scanning direction is defined in the negative X
`( irection.
`
`In the following, a more detailed description will be given
`of the memory subsystem mentioned with regard to FIGS.
`22 and 23. Known graphics systems typically use dynamic
`random access memories (e.g., synchronous DRAMs) for
`frame buffer storage. After the performance of the rasterizer
`has been determined by the memory bandwidth, it is desir-
`able to communicate with the memory in an efficient man-
`ner
`
`Large frame buffers (e.g., 1600><1280><32 bits-8M bits)
`can be accommodated in only a small number of memory
`components. For assuring an adequate bandwidth,
`the
`memory is accessed via a broad path, and the same is limited
`only by the number of inputs/outputs (I/Os) present at the
`connection between the graphics chip and the memory (e.g.,
`128 data bits). Using modern technologies, such as double
`data rate transmission, frame buffers having bandwidths of
`more than 2 GByte/sec per graphics control can be achieved.
`However, this bandwidth is not available for the entirely
`random access.
`
`A DRAM array consists of rows and columns, and access
`within one row (page) to varying columns will normally be
`very fast. Synchronous DPAMs can transfer data in each
`
`0015
`
`

`
`US 6,778,177 B1
`
`5
`clock cycle, provided that they remain in the same row.
`Passing to a different row is equivalent to consuming several
`clock cycles for closing the old column and opening the new
`one.
`
`These cycles cannot be utilized for actual data
`transmission, so that the overall bandwidth is reduced. In
`order to minimize this effect, modern DRAMs contain some,
`2 to 4, memory banks in which different rows may be open.
`An efficient image rendering system must take these prop-
`erties into account in order to be able to yield optimum '
`performance.
`A known technique in memories is referred to as
`“memory tiling”, i.e., the subdivision of the memory into
`blocks or blocks. In this case, rectangular-shaped areas of a
`mapping screen are mapped to blocks (blocks)
`in the
`memory. Small triangles have a tendency to completely fall
`into one block, which means that these do not lead, during
`image rendering, to page defaults in accessing the memory.
`The graphics systems properties for processing triangles
`which intersect several blocks,
`i.e., which extend over
`several blocks, can be enhanced by mapping adjacent blocks
`onto different memory banks in the form of a chessboard.
`One example of a potential memory partitioning is shown in
`FIG. 28, in which each block has a size of 2 Kbytes.
`From U.S. Pat. No. 5,367,632, a graphics system is known
`which has a plurality of graphics rendering elements
`arranged in the manner of pipelines, each pipeline being
`associated with a rasterization with a corresponding
`memory. The individual memories are conventional memory
`elements which per se each form a frame buffer for the
`respective pipeline. The memories are not arranged in any
`specific organization.
`U.S. Pat. No. 5,821,944 describes “memory tiling”,
`wherein a screen area, onto which a graphic primitive is to
`be mapped, is subdivided into a plurality of fields or blocks.
`Specification of the blocks is followed by a two—step scan,
`and it is established which of the blocks comprise a portion
`of the graphic primitive to be processed. Subsequently, the
`blocks which have just been determined are scanned in the
`second step. The individual blocks are selected so as to be
`associated with corresponding memory areas, the memory
`areas associated with the respective blocks being filed in a
`cache memory during the processing.
`The graphics systems known from the prior art for pro-
`cessing three-dimensional
`images are disadvantageous,
`however, in that optimum utilization of the memory capaci-
`ties is not ensured. For this reason, and on the grounds of the
`rasterization methods known from the prior art, the perfor-
`mance of these systems is limited.
`SUMMARY OF THE INVENTION
`
`It is the object of the present invention to provide a
`method for rasterizing a graphic primitive which exhibits
`increased performance compared with the methods known
`in the prior art.
`In accordance with a first aspect, the present invention
`provides a method for rasterizing a graphic primitive in a
`graphics system for generating pixel data for the graphic
`primitive, starting from graphic primitive description data,
`with the graphics system having a memory divided up into
`a plurality of blocks, each of which is associated with a
`predetermined one of a plurality of areas on a mapping
`screen. In a first step, the pixels associated with the graphic
`primitive are scanned in one of the plurality of blocks into
`which the graphic primitive extends, and this step is repeated
`until all pixels associated with the graphic primitive have
`
`6
`been scanned in each of the plurality of blocks into which
`the graphic primitive extends. Subsequently, the pixel data
`obtained are output for further processing.
`In accordance with a second aspect, a method for raster-
`izing a graphic primitive in a graphics system is provided for
`generating pixel data for the graphic primitive, starting from
`graphic primitive description data,
`the graphics system
`including a plurality of graphics processing pipelines.
`Initially, a plurality of adjacent pixels are simultaneously
`scanned, with the adjacent pixels forming a cluster, at least
`one of the plurality of adjacent pixels being associated with
`the graphic primitive, and with the number of the pixels
`being simultaneously scanned depending on the number of
`graphics processing pipelines in the graphics system.
`Subsequently, this step is repeated until all pixels associated
`with the graphic primitive have been scanned, and, finally,
`all the pixel data are output.
`The present invention is based on the realization that the
`performance of graphics processing systems can be
`increased in that, on the one hand, the graphic primitives to
`be scanned are traversed in an “intelligent” manner and/or
`that, on the other hand, the performance of the system is
`increased by a further parallelization of data processing.
`In accordance with the present invention, a method is
`taught which implements a “monolithic algorithm” in which
`all of the aspects explained above can be used together,
`individually or in any combination so as to increase the
`system’s performance. This results in a “scalable architec-
`ture” of the graphics processing means to be used.
`Several image-rendering pipelines are supported on one
`individual chip such that each of the same processes a
`diiferent pixel of a memory word. This requires that the
`parallel scan converter functions in an operating mode
`referred to as locked scan. This means that
`the pixels
`processed in parallel always have a fixed geometric rela-
`tionship with one another (pixel cluster). This facilitates
`hardware implementation with regard to the memory sub-
`system. Furthermore, this enables application of the method
`to chips with several image-rendering pipelines, indepen-
`dent of the chip layout.
`A further advantage of the method is that it is possible to
`combine several individual chips (see above) in one system
`so as to increase the performance thereof with each chip
`, added. In addition, different chips in the system may serve
`to fulfil different tasks and to process a different number of
`pixels, i.e., clusters of different sizes, in parallel. In this case,
`it is not necessary for the scan converters of the parallel
`image-rendering chips in the system to be interlocked, since
`each of same has its own frame buffer memory, and the
`supply of the polygon data can be decoupled using FIFOs.
`A further advantage of the present invention consists in
`memory utilization. Memory utilization mainly depends on
`the efficiency of memory control and the memory decision
`circuit (arbitration circuit). However, even with an ideal
`memory interface unit, the randomness of the pixel accesses
`may ruin memory utilization, in particular when scanning
`small triangles, this effect being even further aggravated in
`parallel image rendering, where the triangles are subdivided
`into smaller sections. This problem is avoided in accordance
`with the present invention, since the same is based on the
`realization that the number of page defaults per triangle can
`be minimized in the event
`that
`the scan converter has
`knowledge with regard to mapping the screen areas onto the
`memory address area (tiling). Further, the average number of
`memory banks which are simultaneously open may also be
`reduced which, again, reduces potential collisions in systems
`
`0016
`
`

`
`US 6,778,177 B1
`
`7
`where several requests (texture reading operation, graphics
`rendering engine reading/writing operation, display screen
`reading operation) are effected with regard to a shared
`memory element (linked memory).
`Another advantage of the present invention is that the
`efficiency of cache storage of texture data can be improved
`by the method in accordance with the invention. Typically,
`bi-linear or tri-linear filtering is used for texture mapping.
`Without latching the texture data, four (bi-linear filtering) or
`even eight (tri-hnear filtering) unfiltered texels become '
`necessary which would have to be provided by the memory
`subsystem per pixel. A texture cache memory can benefit
`from the fact that adjacent texels can be reused during the
`passing of a scan line. The extent of the reuse strongly
`depends on the magnification/reduction chosen and is sig-
`nificant if a suitable MIP-map level is selected. Within one
`scan line, only a very small
`texture cache memory is
`required in order to benefit from this advantage. In order to
`reuse the adjacent texels of a previous scan line, however,
`the texture cache memory must contain a complete scan line
`of the texels.
`In practice, a cache memory size will be
`selected which is capable of storing scan lines for triangles
`or graphic primitives of an average size, whereby the
`efficiency for larger triangles somewhat decreases. In this
`connection, a further advantage of the present invention is
`that a maximum length of a scan line can be guaranteed by
`the scan converter, so that the cache memory can be accu-
`rately dimensioned and is normally considerably smaller
`than that required for storing scan lines for average triangles.
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`In the following, preferred embodiments of the present
`invention will be explained in more detail with reference to
`the attached drawings, in which:
`FIG. 1 is a schematic illustration of a graphics system;
`FIG. 2 is an illustration of the inventive rasterization
`method in accordance with a first embodiment;
`FIG. 3 is an illustration of the inventive rasterization
`method in accordance with a second embodiment;
`FIG. 4 is an illustration of the inventive rasterization
`method in accordance with a third embodiment;
`FIG. 5 is a table which shows dimensions for different
`clusters that may be used in accordance with

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket