`Furtner
`
`(10) Patent No.:
`(45) Date of Patent:
`
`US 6,778,177 B1
`Aug. 17, 2004
`
`US0067781 77131
`
`1/1999 Buekelew et al.
`5,864,512 A *
`6/1999 Aleksic
`5,914,722 A *
`2/2001 Min
`6,184,907 B1 *
`5/2001 Watldns .
`6,236,408 B1 *
`FOREIGN PATENT DOCUMENTS
`19600431 A1
`9/1996
`2297018 A
`7/1996
`OTHER PUBLICATIONS
`
`
`
`365/230.01
`345/423
`345/558
`345/582
`
`(54) METHOD FOR RASTERIZING A GRAPHICS
`BASIC COMPONENT
`
`(75)
`
`Inventor: Wblfgang Furtner,Fuerstenfeldbruck
`(DE)
`
`(73) Assignee: SP3D Chip Design GmhH, Starnberg
`(DE)
`
`DE
`GB
`
`(_ * ) Notice:
`
`Subject to any disclaimer, the term of this
`patent is extended or adjusted under 35
`U.S.C. 154(b) by 0 days.
`
`(21) Appl. No.:
`(22) PCT Filed:
`
`09/958,297
`Apr. 10, 2000
`
`PCT/EP00/03175
`
`(86) PCT No.:
`§ 371 ('C)(1)a
`Feb. 19, 2002
`(2), (4) Date:
`(87) PCT Pub. No.: W000/63846
`PCT Pub. Date: Oct. 26, 2000
`
`Foreign Application Priority Data
`(30)
`Apr. 15, 1999
`(DE)
`....................................... .. 199 17 092
`
`Int. Cl.7 ........................................ .. G06F 12/02
`(:51)
`(52) U.S. Cl.
`. . . . . . .
`. . . . . . . . . .. 345/544; 345/443; 345/506
`(58) Field of Search ............................... .. 345/544, 501,
`345/502, 503, 505, 506, 530, 418, 440,
`441, 442, 443
`
`('56)
`
`References Cited
`U.S. PATENT DOCUMENTS
`
`A
`A
`
`11/1994 Bowen el al.
`'10/1998 Watkins
`
`J.D. Foley, et al., “Grundlagender Computergraphik”, 1. Ed.
`Addison—Wesley, 1994, ISBN: 3—89319—647—1; pp 75-82
`and 101-106.
`
`cited by examiner
`Primary Examiner—Kee M. Tung
`(74) Attorney, Agent, or Firm—Dougherty, Clements,
`Ilofer & Bernard
`
`(5 7,)
`
`ABSTRACT
`
`A method for rasterizing a graphic primitive (120) in a
`graphics system generates, starting from graphic primitive
`description data, pixel data for the graphic primitive, the
`graphics system comprising a memory which is divided up
`into a plurality of blocks (a, a+1, b, b+1) which are each
`associated with a predetermined one of a plurality of areas
`on a mapping screen (114). Each block of the plurality of
`blocks (a, a+1, b, b+1) is associated with a memory page in
`the memory. The i11etl1od includes scanning the pixels asso-
`ciated With the graphic primitive (120) in one of the plurality
`of blocks (a) into which the graphic primitive extends,
`repeating the preceding steps until all of the pixels associ-
`ated with the graphic primitive have been scaimed in each of
`the plurality of blocks into which the graphic primitive
`extends, and outputting the pixel data.
`
`15 Claims, 12 Drawing Sheets
`
`[042
`
`14_)(,a
`
`10“,
`
`10611
`
`108:!
`
`
`
`Rendering
`
`engine
`
`..w —
`4*’
`Vmex Processor
`data
`
`
`
`0001
`
`Volkswagen 1004
`
`0001
`
`Volkswagen 1004
`
`
`
`U.S. Patent
`
`Aug. 17, 2004
`
`Sheet 1 of 12
`
`US 6,778,177 B1
`
`Rendering
`engine
`
`
`
`In?Ffiifififlfisjut?fiyguli
`
`0002
`
`
`
`U.S. Patent
`
`Aug. 17, 2004
`
`Sheet 2 of 12
`
`US 6,778,177 B1
`
`E+dex
`
`Em ,
`
`E+(dex+dey)
`
`132
`
`Cluster:
`
`Non-visible :E
`
`Visible
`
`-E‘
`
`2
`
`0,,
`
`0003
`
`
`
`U.S. Patent
`
`UA
`
`one
`
`40027:1
`
`Sheet 3 of 12
`
`US 6,778,177 B1
`
`II.
`
`__________
`
`4
`
`2.
`
`2, 4,
`Rendering engine
`Texture engine
`
`
`
`.l.:.l.I1..|l7.
`
`0004
`
`
`
`U.S. Patent
`
`Aug. 17, 2004
`
`Sheet 4 of 12
`
`US 6,778,177 B1
`
`Starting values
`t.
`lgeltas.
`d.
`Ccifitslglrnfgorxlliec Ion
`Interleave factors
`Interleave offset
`
`V.
`.b_]_
`151
`1 it
`Start command
`damp y
`mination _> Visibility
`-
`O '
`I
`“mt
`Var ap
`
`Leading pixel
`
`Dir_X 4.‘
`IIII 25110
`II+L>Dir_Y
`
`0005
`
`
`
`U.S. Patent
`
`Aug. 17, 2004
`
`Sheet 5 of 12
`
`US 6,778,177 B1
`
`x_mn mod clx
`[y_sun mod cly|
`
`ccfx = - (x_slar( mod clx)
`ccfy I -(_y_:L1n mod cly)
`
`ccfx = (x_surt mod cix) . (clx .1)
`ccfy = (y Vstafi mod cly)~ (sly —l)
`
`d1I_x = 0[dir_y =0]
`
`(x_slanlclx) mod ilfx =
`[(y_sLan/ciy) mod ilfy =1
`
`dir-x = I
`
`|dir_y =l]
`
`(x_starUcIx) mod ilfx :
`[(y_sl.1:1/cly) mod ilfy =]
`
`iiox[i1oy)
`
`icfx=(ilox—((x_sIan/clx) mod il{x)) mod ilfx
`icfy =(iloy~((y_stz:1/ciy) mod iify)) 1-nod ilfy
`
`icfx=(((K_S\art/clx) mod ilfx)-ilox) mod ilfx
`Icfy=(((y_:IxnIc}y) mod ilfy)-iloy) mod ilfy
`
`Definition
`
`ccfx + cix ' Icfx. Start correction factor in the x direction
`
`“TY * CIY ‘ icfy. Slarl correction factor in the y direction
`
`cI_star1. + scfic ‘ cl_d.x + scfy ‘ el_dy
`
`enl_sui1. + scfiz ‘ e.n1_dx + scfy ‘ r.ni_dy
`
`cn2_:un 1- scfir ‘ a12_dx # scfy ‘ m2_dy
`
`if dir_x = 0 then x_sun‘ = x_xun + scfx cls: x_xun‘ = x_slan - scfx
`if dir_y = 0 then y_sur1‘ = y_sur1 + scfy else y_xun‘ = y_slIrl . scfy
`ccm_sun v scfx
`
`u:n\_:un ~ xcfy
`
`0006
`
`
`
`U.S. Patent
`
`Aug. 17, 2004
`
`Sheet 6 of 12
`
`US 6,778,177 B1
`
`log2(i1fy * cly)
`log2(i1fx * clx)
`
`E‘ Sensor for syl
`el+dx+2dy
`
`calculation:
`
`“
`:5
`f“fif C
`
`;_ T3
`
`an
`
`0007
`
`
`
`U.S. Patent
`
`Aug. 17, 2004
`
`Sheet 7 of 12
`
`US 6,778,177 B1
`
`Term
`
`Definition
`
`ccn1_dccr
`
`scnt_decr
`
`ccnt - clx ‘ ilfx
`
`scnl — cly ‘ ilfy
`
`(cl_x, cnl_x, cn2_x)
`
`(cl, cnl, cn2) + clx ‘ ilfx * (cl_dx, cn1_dx, cn2_dx)
`
`(c1_y, cn1__y, cn2_y)
`
`(cl, cnl, cn2) + cly “ ilfy ' (elwdy, cn1_dy, cn2_dy)
`
`(cl_xy, cn1_xy, cn2_xy)
`
`(cl, cnl, cn2) + clx ‘ ilf‘ (cl_dx, cnl_dx, cn2_dx) + cly " iify ‘ (cl_dy, cn1_dy,
`cn2*dy)
`
`syl
`
`(cl + (clx - 1) “ cl_dx + cly ‘ ilfy “ cl_dy) >= 0
`
`scnt_dccr >= 0
`
`((cnl + clx ‘ ilf‘ cnl_dx )>= 0 AND (cn2 + cl): ‘ Hf‘ cn2_dx) >= 0) OR
`((cn1 + clx “ ilf‘ cnl_dx + cn1_dy) >= 0 AND (en2 + clx ‘ i1f* cn2_dx +
`cn2_dy) >= 0) OR
`
`((cnI + clx " ilf‘ cniwdx + (cly -I) ’ cn!_dy) >= 0 AND (cn2 + clx ‘ iii" *
`cn2_dx + (cly ~I) “ cn2_dy) >= 0)
`
`ccnt_dccr>= O
`
`ifdir_x = 0 then cos = ((x div ilfx) mod stw >= stw - clx) clsc cos = ((x div ilfx)
`mod slw < clx )
`
`Movement
`
`Condition
`
`x direction
`
`y direction
`
`xy direction
`
`Edge return
`
`Next stri
`
`I3
`
`(/cos AND sxn AND soc) on ((cos AND /sf AND sxn AND
`soc) AND ((/cf AND /syl) OR /ssc))
`
`'
`
`'
`
`(cos OR /sxn OR /scc) AND ssc AND /ef AND syl
`
`/sxn AND ssc AND /cf AND /syl AND /sf AND soc
`
`(cos OR /sxn OR /scc) AND ssc AND cf
`
`(/ssc AND sf AND /sxn AND soc) OR ((cos OR /sxn) AND /cf
`AND /syl AND sf AND scc) OR (605 AND sf AND /ssc & soc)
`
`Next primitive
`
`(/ssc AND /sf AND /(sxn AND sec» on User: AND /syl & /ct)
`OR (/scc AND /ssc)
`
`0008
`
`
`
`U.S. Patent
`
`Aug. 17, 2004
`
`Sheet 8 of 12
`
`US 6,778,177 B1
`
`Register
`
`Value to be loaded
`
`Condition
`
`(el, enl, en2)
`
`(el_x, en1_x, en2__x)
`
`(el y, enl_y, en2_y)
`
`(cl_xy, cn1__xy, en2_xy)
`
`(eI_edge, en1_edge, en2_edge)
`
`(el_sLrip, en I_strip, en2_strip)
`
`hon
`
`vctt
`
`diag
`
`edge
`
`strip
`
`(el_edge, en1_edge,
`en2_edge)
`
`(el__strip, enl_strip,
`en2_slrip)
`
`(eI_starl, en I _start, en 2_start)
`
`newp
`
`(elj, enlj, enlj)
`
`horz AND syl AND /ef
`
`_fl
`
`(e1_x_ en1_x, en2_x)
`
`(vert OR edge) AND sxn AND
`sec AND /sf
`
`ccnl
`
`ccnt__decr
`
`horz OR diag
`
`mp
`
`horz AND syl AND /ef
`
`(vcrt OR edge) AND sxn AND
`scc AND /sf
`
`vert OR edge OR diag
`
`ncwp
`
`(ven OR edge) AND sxn & sec
`AND /sf
`
`(van OR edge) AND sxn AND
`soc AND sf
`
`syl AND horz
`
`/(syl AND horz)
`
`(vcrt OR edge) AND sxn AND
`SCC
`
`strip OR ncwp OR diag
`
`ccnl_Sta11
`
`cont
`
`ccnt_decr
`
`scnl_decr
`
`scnt_sta11
`
`1
`
`1s
`
`cnt_st1ip + 1
`
`0
`
`ccnt_edge
`
`0009
`
`
`
`U.S. Patent
`
`Aug. 17, 2004
`
`Sheet 9 of 12
`
`US 6,778,177 B1
`
`Condition
`
`Comment
`
`11012 AND /cf
`
`horz AND ef
`
`go to the next pixel in the scan line,
`track edge return position
`
`go to the next pixel in the scan line,
`secure edge return position
`
`(vert OR edge) AND /sf
`
`(VCI1 OR edge) AND sf
`
`go to the edge entry position,
`track strip entry position
`
`go to the edge entry position,
`secure strip entry position
`
`diag
`
`SWP
`
`go diagonally, secure no return position
`
`go to the strip entry point
`
`go to the next primitive
`
`vis(cofs,rofs) = (el + cofs * el_dx + rofs * el_dy) >= 0 AND
`(enl + cofs * en1_dx + rofs * enl_dy) >= 0 AND
`(en2 + cofs * en2_dx + rofs * en2_dy) >= 0 AND
`(cent - cofs) >= 0 AND (sent - rofs) >= 0
`
`(°°f5 / mfs)
`
`Leading pixel
`
`0010
`
`
`
`U.S. Patent
`
`Aug. 17, 2004
`
`Sheet 10 of 12
`
`US 6,778,177 B1
`
`EEEEEEEE
`u:>—-IQ)v—dDJL.)—-
`
`Efiififififiififii
`Ix)IQIQl\)
`
`EEEEEEEH
`U1L»)Lo.)l—‘1,)—-
`
`
`L‘)1-1EH
`
`EEEEEEEE
`
`WW
`
`W E
`
`! W
`
`Interpolator
`
`0011
`
`
`
`U.S. Patent
`
`Aug. 17, 2004
`
`Sheet 11 of 12
`
`US 6,778,177 B1
`
`Parallel
`scan COD-
`
`version
`
`Interpolator
`commands
`
`Parallel parameter
`interpolation
`
`Memory subsystem
`
`30
`
`(Xbyl)
`
`BUSY) : (X‘Xo) (Y1'Y0) ' (Y'Yo) (X1'X0)
`
`P(xp,yP)
`
`X Q(Xq:yq)
`
`X
`
`E(XpaYp) 2 O
`
`E(xq’yq) > O
`
`E(xf>y[) < 0
`
`E(x+ 1)’) : E(xxy) +dex
`
`dex : U’1*)’o)
`
`E(xy+ 1) = E(xy)+dey
`
`day = —(x.-xo)
`
`0012
`
`
`
`U.S. Patent
`
`Aug. 17, 2004
`
`21f021tCChS
`
`US 6,778,177 B1
`
`Block (bank 1)
`
`Block (bank 0)
`
`3 pixels (32 bits each)
`
`0013
`
`
`
`US 6,778,177 B1
`
`1
`METHOD FOR RASTERIZING A GRAPHICS
`BASIC COMPONENT
`
`FIELD OF THE INVENTION
`
`The present invention relates to a method for rasterizing
`a graphic primitive and,
`in particular,
`to an accelerated
`method for rasterizing a graphic primitive in a graphics
`system in order to generate pixel data for the graphic
`primitive from graphic primitive description data.
`BACKGROUND OF THE INVENTION AN
`DESCRIPTION OF THE PRIOR ART
`
`For accelerating the process of image-rendering of three-
`dimensional images, it is known to use multi—processors or
`hardware pipelines in parallel. Each of these units acts upon
`a sub-set of the information contained in an entire image, as
`has been described by James D. Foly et. al in “Computer-
`graphic Principles and Practice”, second edition, 1990,
`pages 887 to 902. This task can be divided up by either
`processing, in parallel, objects (polygons) in the image, or
`by processing certain sections of the image in parallel. Mere
`implementation of the division of objects leads to a subdi-
`vision of the object level description of a scene (vertex list),
`so that each of the processors is equally loaded. This division
`is carried out
`independently of the arrangement of the
`respective objects in the three-dimensional world or in a
`frame buffer.
`
`The implementation of the division of the task by forming
`sections in an image is effected by subdividing a frame
`buffer into sub-sections which normally have the same size.
`With regard to dividing the frame buffer,
`there is the
`possibility of either associating the same with large, con-
`tinuous pixel blocks or of effecting the association in an
`interleaved manner.
`
`FIG. 21 shows the above-described possibilities of parti-
`tioning a frame buffer with regard to the case of a graphics
`system operating with four graphics processing engines.
`FIG. 21a shows the association of large continuous pixel
`blocks to the respective graphics processing engines. As can
`be seen,
`in this exemplary case,
`the frame buffer 10 is
`subdivided into four equallysized blocks which are associ-
`ated to the engines. FIG. 21b shows the interleaved frame
`partitioning of the frame buffer 10, and it can be seen that the
`processing of the individual pixels 12, which are represented
`by the boxes in FIG. 21, is effected in an interleaved manner
`by the four graphics processing engines of the graphics
`system.
`Interleaved partitioning is used very frequently, since it
`offers the advantage that the workload of the individual
`processors is automatically balanced. Except for the smallest
`polygons, all polygons are located in all partitions of the
`frame, so that almost every image renderer is supplied with
`the same number of pixels. Interleaved frame buffer parti-
`tioning is also referred to as “distributed frame buffer”.
`FIG. 22 shows a block diagram of a conventional graphics
`system having a pipeline for pixel processing. The graphics
`system, in its entirety, is denoted by reference numeral 14
`and includes a scan converter 16 receiving, at its input, data
`which write onto the graphic primitive, e.g., a polygon, to be
`processed. The scan converter 16 processes the received data
`and produces, at its output, interpolator commands which
`are entered into a parameter interpolator 18. The output of
`the parameter interpolator 18 is connected to a first input of
`a pixel pipeline 20. The output of the pixel pipeline 20 is
`connected to a memory subsystem 24 via a packing unit 22.
`
`2
`Data from the memory subsystem 24 are supplied to the
`second input of the pixel pipeline 20 via a depacking unit 26.
`FIG. 23 shows a block diagram of a conventional graphics
`system with a plurality of pipelines working in parallel. The
`graphics system,
`in its entirety,
`is denoted by reference
`numeral 28, and identical or similar elements, such as in the
`system in FIG. 22, are provided with the same reference
`numerals. Unlike the graphics system illustrated in FIG. 22,
`the scan converter 16 is designed as a parallel scan converter
`and, similarly, the parameter interpolator 18 is designed as
`a parallel parameter interpolator. This parallel parameter
`interpolator 18 has a plurality of outputs for supplying data
`to a plurality of pixel pipelines 200—20,,, outputs of the pixel
`pipelines being connected with the packing unit 22. The
`depacking unit 26 is connected with the second inputs of the
`respective pixel pipeline 200-20”.
`Parallel image processing using interleaved frame parti-
`tioning constitutes a very suitable method for hardware
`implementation of image-rendering pipelines, as shown in
`FIG. 23. The memory subsystem 24 typically manages
`so-called memory words containing a plurality of pixels. A
`128-bit word, for example, contains four color pixels (true
`color pixels), with each pixel including 32 bits. The memory
`subsystem 24 can either read or write such a word during a
`clock cycle. In a graphics system having a single pixel
`pipeline, such as is shown in FIG. 22, the depacking unit 26
`must, for fragment calculation (e.g.,
`texture fade—overs,
`reflecting additions,
`target
`fade—overs, dithering, raster
`operations, and the like), extract one pixel per clock and
`convert it into the internal color format. Packing unit 22
`converts the results of the pixel pipeline calculation into the
`color format stored in the memory and unites several pixels
`to form one memory word.
`Systems having several image-rendering pipelines, as are
`shown in FIG. 23, can process, in parallel, several pixels
`contained in one memory word. If the number of pixel
`pipelines is equal to the number of pixels per memory word,
`packing and depacking the same becomes trivial.
`Graphics processing systems mostly use image-rendering
`engines whose primitives are polygons. In addition, these
`polygons are limited to certain types, such as triangles or
`quadrilateral elements. More complex polygons can then be
`defined using these graphic primitives.
`The basic challenge in processing graphic primitives is
`that determining whether a point in a screen area is within
`or outside the graphic primitive to be rendered must be as
`simple as possible. For triangles, this can be achieved, for
`example, in that the three edges forming the graphic primi-
`tive are written onto by means of linear edge functions.
`FIG. 24 shows an example of a linear edge function. In 130
`Cartesian co-ordinate system in FIG. 24, an edge 30 of a
`graphic primitive is illustrated by way of example, and 1162
`starting point and the end point, respectively, of the edge are
`determined by the co-ordinates X0 and yo and x1 and y1,
`respectively.
`It can be determined by the edge function indicated in [18
`right-hand section of FIG. 24 whether a point within [16
`Cartesian co-ordinate system is located to the left or the right
`of the edge or on the edge. Point P is located on the edge 30
`and, in this case, the value for the edge function is 0. Point
`Q is located to the right of edge 30 and, in this case, [16
`result of the edge function is larger than 0, whereas for point
`R, which is located to the left of edge 30, the result of [16
`edge function is smaller than 0. In other words, each of [16
`linear edge functions yields a value of 0 for co-ordinates
`which are located exactly on the edge or on the line, a
`
`8
`
`0014
`
`
`
`US 6,778,177 B1
`
`'
`
`3
`positive value for co-ordinates located to one side of the line
`or edge, and a negative value for co-ordinates located to the
`other side of the line or edge. The sign of the linear edge
`f1I11ction subdivides the drawing surface into two l1alf-
`planes.
`Linear edge functions are further described in the follow-
`ing articles: J. Pineda “A Parallel Algorithm for Polygon
`Rasterisation” Seggraph Proceedings, Vol. 22, No. 4, 1988,
`pages 17 to 20; H. Fuchs et. al, “Fast Spheres Shadows,
`Textures, Transparences, and Image Enhancements in Pixel-
`Planes”; Seggraph Proceedings, Vol. 19, No. 3, 1985, pages
`111 to 120; Dunnet, White, Lister, (jrinsdale University of
`Sussex, “The Image Chip for High Performance”, IEEE
`Computer Graphics and Applications, November 1992,
`pages 41 to 51.
`By multiplying the edge functions with the value of -1,
`the sign for the half—planes can be inverted, and the edge
`function can further be normalized for indicating a distance
`of a point from the edge, as has been described by A.
`Schilling in “A New, Simple and Efficient Antialiasing with
`Subpixel Marks”, Seggraph Proceedings, Vol. 25, No. 4,
`1991, pages 1, 2, 3 to 141. This is useful, in particular, for
`pixel overlap calculations for performing edge antialiasing
`(antialiasing=measure for reducing image distortions).
`The linear edge functions are calculated incrementally
`from a given starting point, which is particularly desirable
`for hardware implementations, since this offers the possi-
`bility of merely using simple adders instead of costly
`multipliers. FIG. 25 shows an example of edge function
`increments, wherein the starting point
`is denoted by E,
`E+de,, indicates the incrementation in the x direction, and
`E+dey indicates the incrementation in the y direction. The
`right—hand part of FIG. 25 describes the determination of the
`incremental values of de_,_ and dey, respectively. If the edge
`function is, itself, normalized or inverted, it is required to
`also normalize and invert the delta values for the incremen-
`tal steps, indicated in FIG. 25.
`For a triangle, the three edge functions can be arranged
`such that all three edge functions supply positive values only
`for such co-ordinates which are located within the triangle.
`If at least one edge function yields a negative value, the
`co-ordinate in question, i.e., the pixel in question, is located
`outside the triangle. FIG. 26A shows the sign distribution for
`the three edges 30a, 30b, 30c of a triangle-shaped graphic
`primitive 32. The boxes 12 shown in FIG. 26 each illustrate
`an illustratable pixel. As can be seen, the edge functions for
`the edges 30a to 30d yield a negative value whenever the
`co-ordinate is located outside the graphic primitive 32, and
`a result with a positive sign is output only when the
`co-ordinate is located within the same.
`
`Typically, the scan conversion hardware obtains the edge
`function values of all three edges for a given starting point
`together with the delta values for the x and y directions, so
`as to enable incremental calculation of the successive
`co-ordinates. With each clock, the scan converter advances
`by one pixel in the horizontal direction or by one pixel in the
`vertical direction. FIG. 26B shows a potential traversing
`algorithm for passing through the triangle 32 already shown
`in FIG. 26A. The scan path is shown in FIG. 26B and, as can
`be seen, the triangle is passed through up to the last pixel 36
`in the manner shown, starting from a starting pixel 34. From
`here, the algorithm jumps to a further graphic primitive to be
`processed. In traversing the graphic primitive, edge function
`values for older positions can be stored so as to enable a
`return to the same or to their neighbors. The aim is to
`consume as few clock cycles per triangle as possible or, in
`
`4
`to avoid the scanning of pixels outside 1:16
`other words,
`triangle, which will be referred to as invisible pixels in 1:10
`further course of the description. For example, a simple
`method might consist in traversing all pixels which are
`contained within the enclosure triangle of the graphic primi-
`tive and in verifying the same with regard to their visibili y.
`This would evidently mean that at least 50% of non-visible
`pixels would have to be traversed. In contrast to this, tie
`algorithm shown in FIG. 26B is developed further, after it
`has scanned the triangle on a scan line-by-scan line basis,
`with a leading edge of the triangle being tracked. T1e
`leading edge of the triangle is that which exhibits the largest
`extension in a direction perpendicular to the scanning direc-
`tion or to the scan line. With most triangle forms, traversing
`invisible pixels to a large extent is thereby avoided, and 1163
`percentage of scanned invisible pixels rises only for very
`narrow triangles.
`The scan lines may be defined either horizontally or
`vertically or even with changing orientations, depending on
`the triangle to be examined. In practice, it is expedient to
`restrict scan conversion to horizontal scan lines, as this
`aligns the scan with the display—refreshing scan and,
`moreover, a memory access can typically be optimized only
`for one scan axis. If the scan lines are horizontally defined,
`the leading edge of the triangle is defined by the two vertices
`exhibiting the largest difference regarding their y
`co-ordinates. In order to assure symmetrical behavior after
`rasterization or scan conversion, it is desirable to change 116
`vertical and horizontal scanning directions as a function of
`t1e inclination of the leading edge and as a function of 1:16
`orientation of the triangle, respectively. FIG. 27 shows
`eifferent scanning directions for different types of triangles.
`As can be seen, for the triangle of type A, the horizon al
`scanning direction is defined in the positive x direction, and
`t1e vertical scanning direction is defined in the positive y
`cirection. For the triangle of type B, the horizontal scanning
`eirection is defined in the negative x direction, and 1162
`Vertical scanning direction is defined in the positive y
`cirection. For the type C triangle, the horizontal scanning
`cirection is defined in the positive x direction, and he
`vertical scanning direction is defined in the negative y
`cirection, and for the type D triangle, the vertical scanning
`cirection is defined in the negative y direction, and he
`horizontal scanning direction is defined in the negative X
`( irection.
`
`In the following, a more detailed description will be given
`of the memory subsystem mentioned with regard to FIGS.
`22 and 23. Known graphics systems typically use dynamic
`random access memories (e.g., synchronous DRAMs) for
`frame buffer storage. After the performance of the rasterizer
`has been determined by the memory bandwidth, it is desir-
`able to communicate with the memory in an efficient man-
`ner
`
`Large frame buffers (e.g., 1600><1280><32 bits-8M bits)
`can be accommodated in only a small number of memory
`components. For assuring an adequate bandwidth,
`the
`memory is accessed via a broad path, and the same is limited
`only by the number of inputs/outputs (I/Os) present at the
`connection between the graphics chip and the memory (e.g.,
`128 data bits). Using modern technologies, such as double
`data rate transmission, frame buffers having bandwidths of
`more than 2 GByte/sec per graphics control can be achieved.
`However, this bandwidth is not available for the entirely
`random access.
`
`A DRAM array consists of rows and columns, and access
`within one row (page) to varying columns will normally be
`very fast. Synchronous DPAMs can transfer data in each
`
`0015
`
`
`
`US 6,778,177 B1
`
`5
`clock cycle, provided that they remain in the same row.
`Passing to a different row is equivalent to consuming several
`clock cycles for closing the old column and opening the new
`one.
`
`These cycles cannot be utilized for actual data
`transmission, so that the overall bandwidth is reduced. In
`order to minimize this effect, modern DRAMs contain some,
`2 to 4, memory banks in which different rows may be open.
`An efficient image rendering system must take these prop-
`erties into account in order to be able to yield optimum '
`performance.
`A known technique in memories is referred to as
`“memory tiling”, i.e., the subdivision of the memory into
`blocks or blocks. In this case, rectangular-shaped areas of a
`mapping screen are mapped to blocks (blocks)
`in the
`memory. Small triangles have a tendency to completely fall
`into one block, which means that these do not lead, during
`image rendering, to page defaults in accessing the memory.
`The graphics systems properties for processing triangles
`which intersect several blocks,
`i.e., which extend over
`several blocks, can be enhanced by mapping adjacent blocks
`onto different memory banks in the form of a chessboard.
`One example of a potential memory partitioning is shown in
`FIG. 28, in which each block has a size of 2 Kbytes.
`From U.S. Pat. No. 5,367,632, a graphics system is known
`which has a plurality of graphics rendering elements
`arranged in the manner of pipelines, each pipeline being
`associated with a rasterization with a corresponding
`memory. The individual memories are conventional memory
`elements which per se each form a frame buffer for the
`respective pipeline. The memories are not arranged in any
`specific organization.
`U.S. Pat. No. 5,821,944 describes “memory tiling”,
`wherein a screen area, onto which a graphic primitive is to
`be mapped, is subdivided into a plurality of fields or blocks.
`Specification of the blocks is followed by a two—step scan,
`and it is established which of the blocks comprise a portion
`of the graphic primitive to be processed. Subsequently, the
`blocks which have just been determined are scanned in the
`second step. The individual blocks are selected so as to be
`associated with corresponding memory areas, the memory
`areas associated with the respective blocks being filed in a
`cache memory during the processing.
`The graphics systems known from the prior art for pro-
`cessing three-dimensional
`images are disadvantageous,
`however, in that optimum utilization of the memory capaci-
`ties is not ensured. For this reason, and on the grounds of the
`rasterization methods known from the prior art, the perfor-
`mance of these systems is limited.
`SUMMARY OF THE INVENTION
`
`It is the object of the present invention to provide a
`method for rasterizing a graphic primitive which exhibits
`increased performance compared with the methods known
`in the prior art.
`In accordance with a first aspect, the present invention
`provides a method for rasterizing a graphic primitive in a
`graphics system for generating pixel data for the graphic
`primitive, starting from graphic primitive description data,
`with the graphics system having a memory divided up into
`a plurality of blocks, each of which is associated with a
`predetermined one of a plurality of areas on a mapping
`screen. In a first step, the pixels associated with the graphic
`primitive are scanned in one of the plurality of blocks into
`which the graphic primitive extends, and this step is repeated
`until all pixels associated with the graphic primitive have
`
`6
`been scanned in each of the plurality of blocks into which
`the graphic primitive extends. Subsequently, the pixel data
`obtained are output for further processing.
`In accordance with a second aspect, a method for raster-
`izing a graphic primitive in a graphics system is provided for
`generating pixel data for the graphic primitive, starting from
`graphic primitive description data,
`the graphics system
`including a plurality of graphics processing pipelines.
`Initially, a plurality of adjacent pixels are simultaneously
`scanned, with the adjacent pixels forming a cluster, at least
`one of the plurality of adjacent pixels being associated with
`the graphic primitive, and with the number of the pixels
`being simultaneously scanned depending on the number of
`graphics processing pipelines in the graphics system.
`Subsequently, this step is repeated until all pixels associated
`with the graphic primitive have been scanned, and, finally,
`all the pixel data are output.
`The present invention is based on the realization that the
`performance of graphics processing systems can be
`increased in that, on the one hand, the graphic primitives to
`be scanned are traversed in an “intelligent” manner and/or
`that, on the other hand, the performance of the system is
`increased by a further parallelization of data processing.
`In accordance with the present invention, a method is
`taught which implements a “monolithic algorithm” in which
`all of the aspects explained above can be used together,
`individually or in any combination so as to increase the
`system’s performance. This results in a “scalable architec-
`ture” of the graphics processing means to be used.
`Several image-rendering pipelines are supported on one
`individual chip such that each of the same processes a
`diiferent pixel of a memory word. This requires that the
`parallel scan converter functions in an operating mode
`referred to as locked scan. This means that
`the pixels
`processed in parallel always have a fixed geometric rela-
`tionship with one another (pixel cluster). This facilitates
`hardware implementation with regard to the memory sub-
`system. Furthermore, this enables application of the method
`to chips with several image-rendering pipelines, indepen-
`dent of the chip layout.
`A further advantage of the method is that it is possible to
`combine several individual chips (see above) in one system
`so as to increase the performance thereof with each chip
`, added. In addition, different chips in the system may serve
`to fulfil different tasks and to process a different number of
`pixels, i.e., clusters of different sizes, in parallel. In this case,
`it is not necessary for the scan converters of the parallel
`image-rendering chips in the system to be interlocked, since
`each of same has its own frame buffer memory, and the
`supply of the polygon data can be decoupled using FIFOs.
`A further advantage of the present invention consists in
`memory utilization. Memory utilization mainly depends on
`the efficiency of memory control and the memory decision
`circuit (arbitration circuit). However, even with an ideal
`memory interface unit, the randomness of the pixel accesses
`may ruin memory utilization, in particular when scanning
`small triangles, this effect being even further aggravated in
`parallel image rendering, where the triangles are subdivided
`into smaller sections. This problem is avoided in accordance
`with the present invention, since the same is based on the
`realization that the number of page defaults per triangle can
`be minimized in the event
`that
`the scan converter has
`knowledge with regard to mapping the screen areas onto the
`memory address area (tiling). Further, the average number of
`memory banks which are simultaneously open may also be
`reduced which, again, reduces potential collisions in systems
`
`0016
`
`
`
`US 6,778,177 B1
`
`7
`where several requests (texture reading operation, graphics
`rendering engine reading/writing operation, display screen
`reading operation) are effected with regard to a shared
`memory element (linked memory).
`Another advantage of the present invention is that the
`efficiency of cache storage of texture data can be improved
`by the method in accordance with the invention. Typically,
`bi-linear or tri-linear filtering is used for texture mapping.
`Without latching the texture data, four (bi-linear filtering) or
`even eight (tri-hnear filtering) unfiltered texels become '
`necessary which would have to be provided by the memory
`subsystem per pixel. A texture cache memory can benefit
`from the fact that adjacent texels can be reused during the
`passing of a scan line. The extent of the reuse strongly
`depends on the magnification/reduction chosen and is sig-
`nificant if a suitable MIP-map level is selected. Within one
`scan line, only a very small
`texture cache memory is
`required in order to benefit from this advantage. In order to
`reuse the adjacent texels of a previous scan line, however,
`the texture cache memory must contain a complete scan line
`of the texels.
`In practice, a cache memory size will be
`selected which is capable of storing scan lines for triangles
`or graphic primitives of an average size, whereby the
`efficiency for larger triangles somewhat decreases. In this
`connection, a further advantage of the present invention is
`that a maximum length of a scan line can be guaranteed by
`the scan converter, so that the cache memory can be accu-
`rately dimensioned and is normally considerably smaller
`than that required for storing scan lines for average triangles.
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`In the following, preferred embodiments of the present
`invention will be explained in more detail with reference to
`the attached drawings, in which:
`FIG. 1 is a schematic illustration of a graphics system;
`FIG. 2 is an illustration of the inventive rasterization
`method in accordance with a first embodiment;
`FIG. 3 is an illustration of the inventive rasterization
`method in accordance with a second embodiment;
`FIG. 4 is an illustration of the inventive rasterization
`method in accordance with a third embodiment;
`FIG. 5 is a table which shows dimensions for different
`clusters that may be used in accordance with