`
`[191
`
`[11] Patent Number:
`
`5,757,385
`
`
`
`[45] Date of Patent: May 26, 1998
`Narayanaswarni et al.
`
`US005?5733SA
`
`Graham Rowan. "Real Images in real—-time". Computer
`Graphics 88. pp. 13-23. Oct. 1983.
`Anthony Reeves. "Region Growing on a Highly Parallel
`Mesh-Connected SIMD Computer”. IEEE Computer Soci-
`ety Press. pp. 93-100. Oct. 1988.
`Michael Palmer and Stephen Taylor. "Rotation Invariant
`Partitioning for Concurrent Scientific Visualization". Paral-
`lel CFD‘94 Conference. p. 409-416. May I994.
`Tong—Yee Lee. C.S. Raghavendra. John B. Nicholas. "Load
`Balancing Strategies for Ray Tracing on Parallel Proces-
`sors”. IEEE. Region Annual International Conference. pp.
`1Ta'—l8l. 1994.
`Marc H. Willebeelr-l_reM.air. Anthony Reeves. Strategies for
`Dynamic Load Balancing on Highly Parallel Cornputersz.
`]EEETransacl.ions on Parallel and Distributed Systems. pp.
`-999-993. Sep. 1993.
`Computer Graphics Principles and Prrzrice. Second Edition.
`Foley et al. Chapter 18. "Advanced Raster Graphics Archi-
`tecture". S. Molnar et al. pp. 855-922.
`
`Primary F.t'.mrr.irrer—-K.-‘.'.c M. Tliltlg
`Amuse); Agent, or Fr‘rm—Volel Ernilc: Alan L. Carlson:
`Andrew J. Dillon
`
`[ST]
`
`ABSTRACT
`
`An apparatus for utilizing multiple processors to render
`graphical objects for display including apparatus for storing
`in memory a list of pixel locations assigned to each of the
`processors. apparanis for scan convening each received
`graphical object into pixels. and each processor including
`apparatus for rendering graphical object pixels at pixel
`locations assigned to the processor. In addition. a method of
`utilizing multiple processors to render graphical objects for
`display including the steps or storing in memtry a list of
`pixel locations assigned to each of the processors. scan
`convening each received graphical object into pixels. and
`each processor rendering graphical object pixels at pixel
`locations assigned to the processor.
`
`18 Clahns. as nnwtng Sheets
`
`[S4] MEITIOD AND APPARATIJS FOR
`MANAGING MIIJLTIPROCBSOR
`GRAPHICAL WORKLOAD DISTRIBUTION
`
`[751
`
`Inventors: Chandrasekhar Narayaluswnml:
`Avijit Saba. both of Austin. Tex.
`International llllsiness Machines
`Corporation. Armonk. N.Y.
`
`[73] Assignee:
`
`[21] A.ppL No.:6295W
`[22] Filed:
`Jun.3o.199e
`
`Related U.S. Applimlion Data
`
`[53] Continuation ntfscr. No. ‘2.86.‘i'l.i.. Jul. 21. I994.
`[51]
`Int. cl.‘ ......
`.............. Gear 15:30
`[52] us. CI.
`........
`.345.r5o5: 3955575
`.,..
`[53] Field orsearcn .............
`395:12s—133.
`3951501. 502. 505. 50?. 515. 675; 345124.
`2123. 133. 185. 139. 203. 505. 501. 502.
`501.515
`
`[56]
`
`References Cited
`U.S. PATENT nocumzems
`
`
`
`1011990
`4,961,392
`611993
`5.220.650
`2t'I994
`5.28".-‘,-I38
`311994
`5343553
`2t'1995
`5.392.393
`-#1995
`5.408.606
`FOREIGN PATENT DOCUMENTS
`3lTl'96l
`[E1991
`Japan.
`UPI-IER PUBIJCATIONS
`
`.. 395505
`395505
`395E132
`3951125
`3954505
`39S.I'163
`
`IEEE Computer Graphics and Applications Magazine.
`"Dynamic load balancing for parallel polygon rendering“. S.
`Whitman. Jul. 1994. pp. 41-48.
`
`...__.. - - - -
`
`FlE|l0‘lllBLE HEM E
`
`
`
`Blfl
`mm: some ..__,
`GRAPHICS
`hon,“
`
`1E
`
`LUT 55!
`use 25:
`
`
`
`
`eounnea
`
`ms
`—
`
`comics
`0iITP|iT|.'IE\I'lGEiS]
`
`-'1’
`
`0001
`0001
`
`Volkswagen 1008
`Volkswagen 1008
`
`mm
`nemesis:
`E
`auteur
`Di\|lCEl5I
`H
`
`.9}
`
`
`
`US. Patent
`
`Mmm
`
`Sheet 1 of 6
`
`5,757,385
`
`%Ems.m._m§o.._#_
`
`
`.......--vaflmofiae.
`
`flw.
`
`mu_:n_§o
`
`$5.34
`
`>mo:m=
`
`mo_=._§_o
`
`EH42
`
`mommmuoma
`
`02mo_E<mo
`
`
`
`§m.o_>3S.E._o
`
`8_:%mo
`
`$5.3:
`
`
`
` wexm_.__....=.._<_._0flE.:oEz89..
`
`2.5..
`
`Emomm.u.uoE
`
`E>___2__m___._2::
`
`m2mm5n___._oo
`
`0002
`
`
`
`U.S. Patent
`
`May 26, 1993
`
`Sheet 2 of 5
`
`5,757,385
`
`OPERATING
`
`OPERATING
`svsnau
`KEFINEL E!
`
`HOST 3.9‘?
`COMPUTER
`MICROCODE
`
`GRAPHICS
`APPLICATION
`
`GRAPHICS
`APPLICATION
`
`E
`
`E2
`
`APPLICATION
`PROGRAM
`INTERFACE
`(API)
`3_a£
`
`APPLICATION
`PROGRAM
`INTERFACE
`(API)
`Sfl
`
`GRAPHICS
`APPLICATION
`INTERFACE
`(Gm)
`152
`
`GRAPHICS
`APPLICATION
`INTERFACE
`(GA!) E
`
`DEVICE DRIVER
`(GRAPHICS
`KEFINEL)
`
`370
`
`ADAPTER
`HICROCODE
`
`330
`
`0003
`
`
`
`5,757,385
`
`0004
`
`
`
`U.S. Patent
`
`8mmV.aM
`
`Sheet 4 of 6
`
`5,757,385
`
`IIIIIIIIIII1
`IIIIIIIIIIH2
`
`3
`
`0005
`
`
`
`U.S. Patent
`
`May 26, 1993
`
`Sheet 5 of 5
`
`5,757,385
`
`!
`
`A
`
`BLOCK
`AT N
`LL09,»
`‘O
`
`USER
`ALLOCAHON
`?
`
`MIXED
`ALLOCATION
`?
`
`FIG. 5
`
`ALLOCATE PFIOCESSOR
`OWNERSHIP FIOUND-ROBIN
`m UNIFOFIMBLOCKS
`
`5_o_5
`
`ALLOCATE PROCESSOR
`OWNERSHIP ROUND-ROBIN
`IN UNIFORM STRIPES
`
`515
`
`GE!’ BUFFER FROM
`USER FOR SPECIFIED
`OWNERSHIP
`
`@
`
`RETHIEVE OBJECT
`
`E
`
`EXTRACT SUBOBJECT
`
`5_4_9
`
`COMPUTE SUBOBJECT
`COORDINATES
`
`ffi
`
`550
`COMPUTE&STOHE
`SUBOBJECT BOUNDING BOX —
`
`555
`
`NO
`
`LAST
`SUBOBJECT
`?
`
`YES
`
`Es
`
`BUILD
`PIXEUFIEG ION
`OWNERSHIP
`BUFFEFI
`
`@
`
`0006
`
`
`
`U.S. Patent
`
`May 26, 1998
`
`Sheet 6 of 6
`
`5,757,385
`
`HETRIEVE OBJECT
`
`SQ
`
`EXTRACT SUBOBUECT
`
`BE
`
`CALCULATE VARIABLES FOR
`SHADING, TEXTUHING,
`LIGHTING, EIC. FOR
`SUBOBJECT
`
`510
`
`SCAN CONVERT
`SUBOBJECT INTO
`PIXELS
`
`0007
`
`
`
`5.757.385
`
`1
`METHOD AND APPARATIJS FOR
`MANAGING l\‘l'U'LTlPROCESSOR
`GRAPIIICAL WORKLOAD DISTRIBUTION
`
`2
`FIG. 6 is a flowchart illustrating a preferred method for
`each processor to handle the graphical workload while
`determining ownership of pixels or regions.
`
`This is a continuation of application Ser. No. 0B)‘286.?l l.
`filed Jul. 21. 1994.
`
`5
`
`BEST MODE FOR CARRYING OUT THE
`JNVE-2~l'l"ION
`
`TECHNICAL FIELD
`
`invention relates generally to computer
`The present
`graphics systems and more particularly to a rnethod and
`apparatus for managing a graphical workload across mul-
`tiple processors.
`
`BACKGROUND ARI‘
`
`In computer graphics systems. it is desirable to render
`multiple objects into a frame bufier for display as quickly as
`possible. However. the rendering process has become more
`complex and computationally intensive as users demand
`more detailed results using more objects rendered more
`quickly. including providing reallirne motion. while using
`more computationally intensive processing techniques such
`as color. texture. fighting. transparency and other rendering
`techniques. As a result. multiprocessing has been utilized to
`handle this eva increasing graphical workload.
`Many techniques for utilizing multiple processors are
`described in pages 855 to 922 of “Computer Grapltics
`Principles and Practice". second edition. 1990. by Foley.
`Van Dam. Feiner. and Hughes. For example. each object to
`be rendered may be assigned to a specific processor for
`processing including the use of a processor for each object
`in very high performance systems.
`
`DISCLOSURE OF THE INVENTION
`
`An apparatus for utilizing multiple processors to render
`graphical objects for display including apparatus for storing
`in memory a list of pixel locations assigned to each of the
`processors. apparatus for scan converting each received
`graphical object into pixels. and each processor including
`apparatus for rendering graphical object pixels at pixel
`locations assigned taothe processor. In addition. a method of
`utilizing multiple processors to render graphical objects for
`display including the steps of storing in memory a list of
`pixel locations assigned to each of the processors. scan
`converting each received graplrical object into pixels. and
`each processor rendering graphical object pixels at pixel
`locations assigned to the processor.
`A ftnther understanding of the nature and advantages of
`the present invention may be realized by reference to the
`remaining portions of the specification and the drawings.
`BRIEF DESCRIPTION OF THE DRAWING
`
`FIG. 1 is a diagram of a typical digital computer utilized
`by a preferred embodiment of the invention:
`FIG. 2 is a block diagram illustrating the layers of code
`typically utilized by the host computer and graphics adapter
`to pfl'fOl'1'I1 graphics functions;
`FIGS. 3A—3C show various ways a graphical workload
`can be di.sI:r1'buted by pixel location according to a preftnred
`embodiment of the invention;
`FIG. 4 is a block diagram illustrating data structures that
`may be used in the graphics adapter memory in the preferred
`embodiment of the invention;
`FIG. 5 is a flowchart illustrating a prefured method for as
`generating a pixel ownership bulfer or a region ownership
`list; and
`
`This disclosure describes an improved method and appa-
`ratus for managing a graphical workload such as rendering
`across multiple processors by pixel location or by display
`region. That is. a processor may process rendering of pixels
`for particular regions of a display or window based on
`various allocation techniques. These allocation techniques
`may be dynamic such that a particular pixel or display region
`maybeprocessedby afirst processor foraperiod oftime
`and then rendered by a second. processor for another period
`of time. This allows for great flexibility in managing the
`graphical workload across multiple processors.
`FIG. 1 is a blockdiagramofatypical digital computer 100
`utilized by a in-eferred embodiment of the invention. The
`computer includes main processor(s) 110 coupled to a
`memory 120 and a hard disk 125 in computer box 105 with
`input dev-ice(s) 130 and output device(s) 140 attached. Main
`processor(s) 110 may include a single processor or multiple
`processcrs. Input device(s) 13! may include a keyboard.
`mouse. tablet or other types of input devices. Output device
`(s) 140 may include a text monitor. plotter or other types of
`output devices. Computer readable removable media 190.
`such as a magnetic diskette or a compact disc. may be
`inserted into an inputfoutpul device -180. such as a disk drive
`or a CD-ROM (compact disc—read only memory) drive.
`Data is read from or written to the removable media by the
`I10 device undo the control of the H0 device controller 170.
`The U0 device controller comrnunicates with the main
`processor through across bus 160. Main memory 120. hard
`disk 125 and removable media 190 are all refa-red to as
`me.rnory for storing data for processing by main processorfs)
`11 .
`-
`The main processor may also be coupled to graphics
`output devioc(s) 150 such as a graphics display through a
`graphics adapter 3130. Graphics adapter 200 receives instru c-
`tions regarding graphics from main processor(s) 110 on bus
`160. The graphics adapter then executes those instructions
`with graphics adapter prooesstxs 220 coupled to a graphics
`adapter memory 230. The graphics processors in the graph-
`ics adapter then execute those instructions and updates
`frame buffets) 240 based on those instructions. Graphics
`processors 220 may be a pipeline of processors in series. a
`set of parallel processors. or some combination thereof.
`wltereeachprooessorrnay handleaportion ofatasktobe
`completed. Graphic processors 2240 may also include spe-
`cialized rendering hardware for rendering specific types of
`prirnitives. Graphics memory 230 is used by the graphics
`processors to store information being processed. such as
`received object data. intermediate calculated data (such as a
`stencil btdfer or partially rendaed object data). and com-
`pleted data being loaded into the firanre httfier 240. Frame
`bulferls) 240 includes data for every pixel to be displayed on
`the graphics output device. A RAMDAC (random access
`memory digital-to-analog converter) 2.50 converts the digital
`data stored in the frame bullets into RGB signals to be
`provided to the graphics display 150 thereby rendering the
`desired graphics output from the main processor.
`FIG. 2 is a block diagram illustrating the layers of code
`typically utilized by the host computer and graphics adapter
`to perform graphics functions. An operating system 300
`such as UNIX provides the primary control of the host
`
`0008
`
`
`
`5.757.385
`
`3
`computer. Coupled to the operating system is an opaatiug
`system kernel 310 which provides the hardware intensive
`tasks for the operating system. The operating system kernel
`communicates directly with the host computer microcode
`320. The host computer microcode is the primary iusi:ruct:ion
`set executed by the host computer processor. Coupled to the
`operating system 300 are graphics applications 330 and 332.
`This graphics application software can include software
`packages such as Silicon Graphic‘s GL. IBM's graPli1GS.
`M1'l“s PEX. etc. This soltware provides the primary func-
`tions of two dimensional or three dimensional graphics.
`Graphics applications 33. and 332 are coupled to grapltics
`application API (application program interface) 340 and
`342. respectively. The API provides many of the computa-
`tionally intensive tasks for the graphics application and
`provides an interface between the application software and
`software closer to the graphics hardware such as a device
`driver for the graphics adapter. For example. API 340 and
`342 may communicate with a GAI (graphics application
`interface) 350 and 352. respectively. The GA] provides an
`interface between the application API and a graphics adapter
`device driver 370. In some graphics systems. the API also
`performs the function of the GAI.
`The graphics application. AP]. and GA] are considered by
`the operating system and the device driver to be a single
`rxocess. ‘That is. graphics applications 330 and 332. AP] 349
`and 342. and GA! 350 and 352 are considered by operating
`system 300 and device driver 3'70 to be processes 360 and
`362. respectively. The processes are identified by the oper-
`ating system and the device driver by a process identifier
`(PID) that is assigned to the process by the operating system
`kernel. Processes 360 and 362 may use the same code that
`is being executed twice simultaneously. such as two execu-
`tions of a programin two separate windows. ‘The PID is used
`to dislingtlish the sqaarate executions of the same code.
`The device driver is a graphics kernel which is an exten-
`sion of the operating system kernel 310. The graphics lranel
`communicates directly with microcode of the graphics
`adapter 380. In many graphics systems. the GAL or the API
`i.f no GAI layer is used. may request direct access from the
`GA] or API to the adapter microcode by sending an
`request instruction to the device driver. In addition. many
`graphics systems also allow the adapter microcode to
`request direct access from the adapter microcode to the GAI
`orAPIi.fnoGAIis usedby sendinganinitialrequest
`instruction to the device driver. Both processes will herein-
`after he referred to as direct memory access (DMA). DMA
`is typically used when transferring large blocks of data.
`DMA provides for a quiclrer transmission of data between
`the host computer andthe adaptubyeliminating the aeedto
`go through the display driver other than the
`request ft:
`the device driver to set up the DMA. In some cases. the
`adapter microcode utilizes context switching which allows
`the adapt: microcode to replace the ctnrent attributes being
`utilized by the adapter microcode. Context switching is used
`when the adapter microcode is to receive an instruction from
`a graphics application that utilizes difl°etent a.l:lIt'butes than
`the adapted microcode is currently using. The context switch
`is typically initiated by the device driver which recognizes
`the attribute changes.
`Blocks 3010-34! are software code layers that are typi-
`cally independent of the type of graphics adapter being
`utilized Blocks 350-380 are software code layers that are
`typically dependent upon the type of graphics adapter being
`utilized. For example. ifa diifereut graphics adapter were to
`be used by the graphics application software. then a new
`GAL graphics kernel and adapter microcode would be
`
`4
`needed. In addition. blocks 300-370 typically reside on and
`are executed by the host computer. Howeva. the adapter
`microcode 380 typically resides on and is executed by the
`graphics adapter. However.
`in some cases.
`the adapter
`microcode is loaded into the graphics adapter by the host
`computa during initialization of the graphics adapter.
`In typical graphics systems. the usu instructs the graphics
`application to consnud an image from a two or three
`dimensional model. The user firs: selects the lomtion and
`type of light sorn-oes. The user then instructs the application
`software to build the desired model from a set of predefined
`or usa defined objects. Each object may include one or more
`coplanar drawing primitives describing the object. For
`example. a set of drawing primitives such as many triangles
`may be used to define the surface of an object. The user then
`provides a perspective in a window to view the model.
`thereby defining the desired image. The application sofiware
`then starts the rendering of the image from the model by
`sending the drawing primitives describing the objects to the
`adapter microcode through the API. the GA]. and then the
`device driver unless DMA is used. The adapter microcode
`then renders the image on the graphics display by clipping
`(Le. not using) those drawing primitives not visible in the
`window and the adapter microcode breaks each remaining
`drawing primitive into visible pixels born the perspective
`given by the user. The pixels are then loaded into the frarne
`bt.rffer.ofienwitbflteuseofadept|1hulfer‘inthecase ofa
`three dimensional model. This step is very computationally
`intensive due to the number of drawing primitives. variables.
`and pixels involved The resulting image stored in the frame
`butter and displayed on the graphics display typically does
`not carry the original information such as which drawing
`primitive or object the pixel was daived from. As a result.
`the image may need to be rerendered in part or in whole if
`the window. the user perspective. the model. the fighting.
`etc. are modified
`
`The techniques of the present invention could be utilized
`in many locations such as the adapter microcode which is
`close to the adapter frame buffa. This approach would also
`be relatively quick and fairly easy to implement. In addition.
`the present invention could be applied in the graphics
`application software wherein the rendered image is also
`stored in system memory either prior to the image being
`rendered or subsequently by the graphics adapter passing the
`data back up to the gra.pln'cs application software. This
`approach would probably be slowa but would allow for
`utilization of this technique on preexisting graphics adapt-
`ers. The present invention could also be implemented in
`hardware in the graphics adapterprocessor. This approach is
`extremely quick but may necessitate specialized hardware.
`As would be obvious to one of ordinary slrill in the art. the
`present invention could be applied in many other locations
`within the host computer or graphics adapter.
`FIGS. 3A~3vC show various ways a graphical workload
`can be distribtrted by pixel location accccrtling to a preferred
`embodiment of the invention. Each figrne shows a window
`or display where certain regions are allocated to various
`processors for rendering pixels in those regions.
`FIG. 3A shows a round robin approach for distributing the
`graphical workload. A display or window 400 may be
`divided into multiple regions of pixels that are distributed
`across multiple processors in a round robin fashion. In the
`present example there are sixteen regions and three proces-
`gg SUITS P‘. P] and P2.
`FIG. 3B shows a stripe allocation approach for distribut-
`ing the grapltical workload by region of pixels. A display or
`
`0009
`
`
`
`5.757.385
`
`5
`window 400 may be divided into multiple stripes where each
`processor is assigned one snipe. In the present example there
`are six stripes and six processors P0. P1. P2. P3. P4 and PS.
`FIG. 3C shows a mixed allocation procedure where the
`processors may be assigned regions of pixels using various
`tecl1n.iques. In the present example there are many regions
`allocated to three processors P0. P1 and P2. The mixed
`allocation approach allows the allocation to be optimized
`based on various parameters. For example. the breakdown
`and allocation of pixels or regions of pixels may be based on
`the number of objects in a particular areas of the display. The
`breakdown and allocation may also be based on the amount
`of workload each processor already has. The breakdown and
`allocation of pixels or regions of pixels may also be dynamic
`whereby the pixels may be reassigned to dilferent processors
`over time based on various factors. In any approach. the
`present invention allows for great flexibility in distributing
`the workload.
`FIG. 4 is a block diagram illustrating data stn1ch.tres that
`may be used in the graphics adapter memory 230 in the
`preferred embodiment of the invention.
`A pixel ownership bulfa 231 is used to store identifiers
`E) of the processor that owns each pixel. That is. there is
`one entry corresponding to every pixel in the frame bufier.
`and that entry contains an identifier of the processor that
`handles processing for that pixel including rendering. These
`entries may be modified over time to dynamically manage
`the processor workload.
`A region ownership list 235 may be used as an alternative
`to the pixel ownership butter 23] described above. In this
`alternative. the region ownership list includes one entry for
`each region and may be fixed or variable length. Variable
`length would allow for greater flexibility in allocating
`regions but may he dificult to manage. Each entry includes
`five elements that describe the minimum X value (X1). the
`rnaximuni X value (K2). the minimum Y value (Y1). the
`maximum Y value (Y2). and the identifier of the processor
`that owns the region. As a result. the size and location ofthe
`region is defined and ownership is established of the region.
`In another alternative embodiment. the size and location of
`the regions may be fixed such that only the identifier of the
`processor that owns the region may be displayed.
`FIG. 5 is a flowchart illustrating a preferred method for
`generating a pixel ownership buifer or a region ownership
`list. This process may be exealted at any time to establish
`processor ownership of regions or pixels. In addition. this
`process may be repeatedly executed to constantly optimize
`the allocation of the graphical workload
`In a first step 500. it is determined whether block alloca-
`tion of the pixels may be performed This type of allocation
`has the advantage of being simple and quick. Ifyes. then in
`step 505. the processor ownership is allocated in a round-
`robin fashion in uniform blocks of pixels or regions accord-
`ing to the prefared embodiment as shown in FIG. 3A.
`Dining this process. the data structures shown in FIG. 4 or
`their equivalents may be generated which indicate which
`processors own which pixels or regions. However. in alter-
`native embodiments. the regions need not be uniform and
`may be allocated in some other manner other than round-
`robin. Upon completion of step 505. processing ends for this
`process until it is reexecuted. If no in step 500 above. then
`processing continues to step 510.
`In step 510. it is determined whether stripe allocation of
`the pixels may be pa-formed."l‘his type of allocation also has
`the advantage of being simple and quick. Ifyes. then in step
`515. the processor ownership is allocated in a round-robin
`
`6
`fashion in uniforrn stripes of pixels or regions according to
`the preferred embodiment as shown in FIG. 313. During this
`process. the data structures shown in FIG. 4 or their equiva-
`lents may be generated which indicate which processors
`own which pixels or regions. However.
`in alternative
`embodiments. the regions need not be uniform and may be
`allocated in some other manner other than round-robin.
`Upon completion of step 515. processing ends for this
`process until it is reexecuted. If no in step 510 above. then
`processing continues to step 520.
`In step 520. it is determined whether the user (or an
`application program) may wish to allocate the pixels or
`regions. This type of allocation has the advantage of allow-
`ing the user or application programs to generate optimized
`ownership patterns that anticipate upcoming graphical
`workloads. Ifyes. then in step 525. a bufier or list is obtained
`from the user or application program for storage in the
`graphics adapter memory. Upon completion of step 525.
`processing ends for this process until it is reexecuted. I! no
`in step 520 above. then processing continues to step 530.
`In step 530. it is determined whetha a mixed allocation
`of the pixels may he performed. 'I'his type of allocation has
`the advantage of being extremely flexible for optimizing
`allocation of pixel ownership. If no in step 530.
`then
`processing ends for this process until
`it
`is reexecuted.
`However. since no allocation has occurred. then a default or
`previous allocation continues to control which processors
`own which pixels or regions. If yes in step 530.
`then
`processing continues to step 535.
`In step 535. an object to be processed is retrieved In step
`540. the object may be split into subobjects and a first one
`of the subobjects is extracted by generating or retrieving its
`vertices. This splitting process is particularly useful for
`handling three dimensional objects which have many faces.
`each face a subobject. In step 545. the window coordinates
`of the subobject are computed for processing. In step 550. a
`bounding box containing the subcbject is calculated and
`stored. A bounding box is a region that contains the winch-
`ject and may have the maximum and minimum values for X.
`Y and Z coordinates in its boundary. The bounding box is
`preferably stored with other parameters that describe the
`object. In step 555. it is detcnnined whether the last subcla-
`ject for the given object has been processed. If not. then
`processing returns to step 540 for processing the next
`subobject. If yes. then processing continues to step 560
`where it
`is dettnrnined whether the last object has been
`processed. If not. then processing retttrns to step 535 for
`processing the next object. Ifyes. then processing continues
`to step 565.
`the pixel ownership bulfer or the region
`In step 5455.
`ownership list may be generated from all the previously
`calculated bounding boxes. In the preferred ernbodiruent. the
`K-D tree method may be used to allocate the pixels or
`regions based on the distribution of the object bounding
`boxes in the display or window. The K-D method is a well
`known technique for allocating limited resources and is
`described in Algorithm in Combinatorial Geometry. H.
`Edelsbrunner. Springa-—V"er1ag publisher. 1987. Of course.
`other optimization techniques may be utilized.
`The above described steps 500. 510. 520 and 530 may be
`dynamically controlled. That is. they may be controlled by
`the user or an application program. In addition. they may be
`controlled by an optirnization process that determines which
`is the optiminn approach to use for a given situation.
`FIG. 6 is a flowchart illustrating a preferred method for
`each processor to handle the graphical workload while
`
`0010
`
`
`
`S .75 7.3 85
`
`5
`
`7
`determining ownership of pixels or regions. This process
`may be executed concurrently and in parallel by all the
`processors.
`In a first step 600. an object is retrieved for processing.
`The object retrieval may include comparing the bounding
`box for the object (previously determined and stored with
`the object in step 550 described above) to the regions that the
`processor owns it‘ a region ownership list is utilized. Of
`course. other typical clipping techniques may be utilized.
`Once retrieved. in step 605. the object may be split into
`subobjecrs as described above with refuence to 540. Of
`course. other splitting techniques may be utilized in this step
`based on the type of graphical activities the processor is
`handling.
`Once the subobject is extracted. then in step 1510. various
`variables may be calculated for use in calculating colors.
`shading. textures. fighting. transparency. dithering. and other
`possible variables. The type of variables calculated may
`change depending on the types of graphical activities being
`performed by the processor.
`In step 615. the subobject is then scan convuted into
`pixels. In stqi 620. each pixel is checked to see ifit is owned
`by the processor-by comparilzgtheidentifierof the processor
`with the identifier stored in the pixel ownership butter or the
`region ownership list for that pixel.
`If yes in step 620. indicating that the processor owns this
`pixel. then the processor processes the pixel This processing
`may include using the variables from step 610 and other-
`known variables to calculate the color.
`lighting.
`transparency. etc. of that pixel. The calculated pixel values
`are then stored in the Ethnic btlflcr for display. This process
`is typically known as rendering. Clearly. if a processor does
`not own a pixel. many cornputationally intensive calcula-
`tions are avoided. thereby speeding up the rendering pro-
`cess.
`
`If no in step 620 or if step 625 has been completed. tben
`processing continues to step 630. In step 630. it is deter-
`mined whether the last pixel for the subobject has been
`rendered If no. then processing returns to step 615 fer
`generating the next pixel for processing. If yes in stqa 630.
`then processing continues to stqa 635. In step 635.
`it is
`determined whether the last subobject for the object has
`been processed. If no. then processing returns to step 605 for
`generating the next subobjecl. If yes in step 635.
`then
`processing continues to step 640. In step 64!. it is deta-
`mined whom: the last object has been processed. If no. then
`processing returns to step 600 fcr retrieving the next object.
`If yes in step 640. then processing ends for the processor
`until ft:t1l1I::r objects are available for processing. It is at this
`time that the processor may reexecure the allocation process
`described in FIG. 5 above to reallocate some workload to the
`now idle processor.
`The present invention has many advantages over the prior
`art. The present invention is very flexible and dynamic and
`allows optimization techniques to be used to optimize opera-
`tion of the rendering. These optimizafion teclmiques may be
`applied in a predetermined fashion or they may be dynami-
`Cally applied during runtirnc. The present invention may be
`optimized based on the current or planned workload of each
`processor. die numba‘ of processors. the number and loca-
`tion of objects on a display. as well as other meastrres of
`optimization.
`Although the present invention has been fully described
`above with reference to specific embodiments. other alter-
`native embodiments will be apparent to those of ordinary
`skill in the art. Therefore. the above description should not
`
`8
`be taken as limiting the scope of the present invention which
`is defined by the appended claims.
`What is claimed is:
`1. An apparatus for utilizing a plurality of [Iocessors to
`render graphical objects for display comprising:
`means for assigning selected processors. to predetermined
`pixels. wherein said means for assigning is in response
`to a ctnrent or a planned workload;
`means for storing in memory said assignrnent of said
`selected processors to said predetermined pixels:
`means for scan converting each received graphical object
`into pixels;
`means for dynamically modifying said assignment. to
`reassign said predetermined pixels to alternate proces-
`sors so as to optimize the processors workload. wherein
`said reassignment is in response to said current cr said
`planned workload; and
`wherein eadr procmsor includes means for rendering
`graphical object pixels only at pixel locations assigned
`to said processor according to said assignment stored in
`said memory.
`2. The apparatus of claim 1 wherein said means for storing
`includes means for dynamically determining which pixel
`locations are to be assigned to each of said processors.
`3. The apparabrs of claim 2 further contprising means for
`providing to each processor graphical objects that intersect
`pixel locations assigned to said selected processors.
`4. The apparatus of claim 3 whuein die means for scan
`converting includes each processor including means for scan
`converting graphical objects received by said processor into
`pixels.
`5. The apparanrs of claim 6 wherein the means for storing
`includes means for utilizing user input
`to dynamically
`determine which pixel locations are to be assigned to each
`of said processors.
`6. The apparatus of claim 4 wherein the means for storing
`includes means for utilizing optimization techniques to
`dynamically deteunine which pixel
`locations are to be
`assigned to each of said processors.
`7. A method of utilizing a plurality of processors to render
`graphical objects for display coanprising the steps of:
`storing in memory assignments of pixel locations to a
`selected processor in response to a current or a planned
`workload;
`scan converting each received graphical object into pix-
`els:
`dynamically modifying said assignments to reassign pixel
`locations to alternate processors so as to optimize the
`processors workload. wherein said reassignment is in
`response to a current tr planned workload: and
`each processor rendering graphical object pixels only at
`pixel locations assigned to said processor according to
`said assignments stored in said memory.
`3. The method of clai