0201

Volkswagen 1002 - Part 3 of 8
SIGGRAPH 2001, Los Angeles, California, August 12-17, 2001

... Pixel-Planes 5: A Heterogeneous Multiprocessor Graphics System Using Processor-Enhanced Memories. Proceedings of SIGGRAPH 89, pages 79-88, July 1989.

[8] P. D. Heermann. Production Visualization for the ASCI One TeraFLOPS Machine. Proceedings of IEEE Visualization, pages 459-462, October 1998.

[9] A. Heirich and L. Moll. Scalable Distributed Visualization Using Off-the-Shelf Components. IEEE Parallel Visualization and Graphics Symposium, pages 55-59, October 1999.

[10] G. Humphreys, I. Buck, M. Eldridge, and P. Hanrahan. Distributed Rendering for Scalable Displays. IEEE Supercomputing 2000, October 2000.

[11] H. Igehy, M. Eldridge, and P. Hanrahan. Parallel Texture Caching. Proceedings of SIGGRAPH/Eurographics Workshop on Graphics Hardware, pages 95-106, August 1999.

[12] H. Igehy, G. Stoll, and P. Hanrahan. The Design of a Parallel Graphics Interface. Proceedings of SIGGRAPH 98, pages 141-150, July 1998.

[13] M. Kilgard. GLR, an OpenGL Render Server Facility. Proceedings of X Technical Conference, February 1996.

[14] M. Levoy, K. Pulli, B. Curless, S. Rusinkiewicz, D. Koller, L. Pereira, M. Ginzton, S. Anderson, J. Davis, J. Ginsberg, J. Shade, and D. Fulk. The Digital Michelangelo Project: 3D Scanning of Large Statues. Proceedings of SIGGRAPH 2000, pages 131-144, July 2000.

[15] W. E. Lorensen and H. E. Cline. Marching Cubes: A High Resolution 3D Surface Construction Algorithm. Proceedings of SIGGRAPH 87, pages 163-169, July 1987.

[16] S. Molnar, M. Cox, D. Ellsworth, and H. Fuchs. A Sorting Classification of Parallel Rendering. IEEE Computer Graphics and Applications, pages 23-32, July 1994.

[17] S. Molnar, J. Eyles, and J. Poulton. PixelFlow: High-Speed Rendering Using Image Composition. Proceedings of SIGGRAPH 92, pages 231-240, August 1992.

[18] J. Montrym, D. Baum, D. Dignam, and C. Migdal. InfiniteReality: A Real-Time Graphics System. Proceedings of SIGGRAPH 97, pages 293-302, August 1997.

[19] C. Mueller. The Sort-First Rendering Architecture for High-Performance Graphics. 1995 Symposium on Interactive 3D Graphics, pages 75-84, April 1995.

[20] OpenGL Specifications. http://www.opengl.org/Documentation/Specs.html.

[21] W. C. Reynolds and M. Fatica. Stanford Center for Integrated Turbulence Simulations. IEEE Computing in Science and Engineering, pages 54-63, April 2000.

[22] J. Rohlf and J. Helman. IRIS Performer: A High Performance Multiprocessing Toolkit for Real-Time 3D Graphics. Proceedings of SIGGRAPH 94, pages 381-395, July 1994.

[23] R. Samanta, T. Funkhouser, K. Li, and J. P. Singh. Sort-First Parallel Rendering with a Cluster of PCs. SIGGRAPH 2000 Technical Sketch, August 2000.

[24] R. Samanta, J. Zheng, T. Funkhouser, K. Li, and J. P. Singh. Load Balancing for Multi-Projector Rendering Systems. Proceedings of SIGGRAPH/Eurographics Workshop on Graphics Hardware, pages 107-116, August 1999.

[25] SGI multipipe. http://www.sgi.com/software/multipipe/.

[26] SGI vizserver. http://www.sgi.com/software/vizserver/.

[27] G. Stoll, M. Eldridge, D. Patterson, A. Webb, S. Berman, R. Levy, C. Caywood, M. Taveira, S. Hunt, and P. Hanrahan. Lightning-2: A High-Performance Display Subsystem for PC Clusters. Proceedings of SIGGRAPH 2001, August 2001.
0202
Europäisches Patentamt
European Patent Office
Office européen des brevets

Generaldirektion 1 / Directorate General 1 / Direction générale 1

Howe, Steven
Lloyd Wise
Commonwealth House,
1-19 New Oxford Street
London WC1A 1LW
GRANDE BRETAGNE

Reference: SH-46739
Applicant/Proprietor: ATI Technologies Inc.
Application No./Patent No.: 03257464.2 - 2218
Tel.: +31 (0)70 340 45 99

COMMUNICATION

The European Patent Office herewith transmits as an enclosure the European search report (under R. 44 or R. 45 EPC) for the above-mentioned European patent application.

If applicable, copies of the documents cited in the European search report are attached.

Additional set(s) of copies of the documents cited in the European search report is (are) enclosed as well.

The following specifications given by the applicant have been approved by the Search Division:

    Abstract
    Title

The abstract was modified by the Search Division and the definitive text is attached to this communication.

The following figure will be published together with the abstract: 2

Refund of search fee

If applicable under Article 10 of the Rules relating to Fees, a separate communication from the Receiving Section on the refund of the search fee will be sent later.

0203

European Patent Office

EUROPEAN SEARCH REPORT

Application Number: EP 03 25 7464

DOCUMENTS CONSIDERED TO BE RELEVANT
(Citation of document with indication, where appropriate, of relevant passages | Relevant to claim | Classification of the application (IPC))

FOLEY JAMES ET AL: "Computer Graphics, Principles and Practice"
COMPUTER GRAPHICS, PRINCIPLES AND PRACTICE, READING, ADDISON WESLEY, US, 1990, pages 873-899, XP002360077
* pages 874, 876 *
* pages 887, 888 *

CROCKETT T W: "An introduction to parallel rendering"
PARALLEL COMPUTING, ELSEVIER, NETHERLANDS, vol. 23, no. 7, July 1997 (1997-07), pages 819-843, XP004729969
ISSN: 0167-8191
* pages 822-823 *
* page 828 *

MONTRYM J S ET AL: "InfiniteReality: a real-time graphics system"
COMPUTER GRAPHICS PROCEEDINGS, SIGGRAPH 97, ACM NEW YORK, NY, USA, 1997, pages 293-302, XP002360078
ISBN: 0-89791-896-7
* pages 294-296; figures 1-3 *

US 6 424 345 B1 (SMITH WADE K ET AL) 23 July 2002 (2002-07-23)
* column 3, line 31 - line 60; figure 7 *
* column 6, line 20 - line 40 *

-/--

CLASSIFICATION OF THE APPLICATION (IPC): G06T15/00
TECHNICAL FIELDS SEARCHED (IPC): G06T, G09G

The present search report has been drawn up for all claims.
Place of search: Munich
Date of completion of the search: 19 December 2005
Examiner: Rem, H

CATEGORY OF CITED DOCUMENTS
X : particularly relevant if taken alone
Y : particularly relevant if combined with another document of the same category
A : technological background
O : non-written disclosure
P : intermediate document
T : theory or principle underlying the invention
E : earlier patent document, but published on, or after the filing date
D : document cited in the application
L : document cited for other reasons
& : member of the same patent family, corresponding document

EPO FORM 1503 03.82 (P04C01)

0204

HUMPHREYS G ET AL: "WireGL: a scalable graphics system for clusters"
COMPUTER GRAPHICS PROCEEDINGS, SIGGRAPH 2001, ACM NEW YORK, NY, USA, 2001, pages 129-140, XP001049381
ISBN: 1-58113-374-X
* abstract; figures 1,2,5 *
* pages 129,130 *
* page 135 *

Application Number: EP 03 25 7464
Relevant to claim: 1- , 20-24
CLASSIFICATION OF THE APPLICATION (IPC)
TECHNICAL FIELDS SEARCHED (IPC)

The present search report has been drawn up for all claims.
Place of search: Munich
Date of completion of the search: 19 December 2005

CATEGORY OF CITED DOCUMENTS
X : particularly relevant if taken alone
Y : particularly relevant if combined with another document of the same category
A : technological background
O : non-written disclosure
P : intermediate document
T : theory or principle underlying the invention
E : earlier patent document, but published on, or after the filing date
D : document cited in the application
L : document cited for other reasons
& : member of the same patent family, corresponding document

EPO FORM 1503 03.82 (P04C01)

0205

ANNEX TO THE EUROPEAN SEARCH REPORT
ON EUROPEAN PATENT APPLICATION NO. EP 03 25 7464

This annex lists the patent family members relating to the patent documents cited in the above-mentioned European search report. The members are as contained in the European Patent Office EDP file on 19-12-2005.
The European Patent Office is in no way liable for these particulars, which are merely given for the purpose of information.

Patent document cited in search report | Publication date | Patent family member(s) | Publication date
US 6424345 B1 | 23-07-2002 | NONE |

For more details about this annex: see Official Journal of the European Patent Office, No. 12/82

0206

XP-002360077

SECOND EDITION

Computer Graphics
PRINCIPLES AND PRACTICE

James D. Foley
Georgia Institute of Technology

Andries van Dam
Brown University

Steven K. Feiner
Columbia University

John F. Hughes
Brown University

ADDISON-WESLEY PUBLISHING COMPANY
Reading, Massachusetts • Menlo Park, California • New York
Don Mills, Ontario • Wokingham, England • Amsterdam • Bonn
Sydney • Singapore • Tokyo • Madrid • San Juan • Milan • Paris

0207

Introduction to Multiprocessing    873
Summing the floating-point requirements for all of the geometry stages gives a total of 2,220,000 multiplications/divisions and 1,470,000 additions/subtractions per frame. Since a new frame is calculated every 1/10 second, a total of 22.2 million multiplications/divisions and 14.7 million additions/subtractions (36.9 million aggregate floating-point operations) is required per second, a very substantial number.
Rasterization calculations and frame-buffer accesses. Let us now estimate the number of pixel calculations and frame-buffer memory accesses required in each frame. We assume that z values and RGB triples each occupy one word (32 bits) of frame-buffer memory (typical in most current high-performance systems). For each pixel that is initially visible (i.e., results in an update to the frame buffer), z, R, G, and B values are calculated (4 additions per pixel if forward differences are used), a z value is read from the frame buffer (1 frame-buffer cycle), the z values are compared (1 subtraction), and new z values and colors are written (2 frame-buffer cycles). For each pixel that is initially not visible, only the z value needs to be calculated (1 addition), a z value is read from the frame buffer (1 frame-buffer cycle), and the two z values are compared (1 subtraction). Note that initially visible pixels may get covered, but initially invisible pixels can never be exposed.

Since we assume that one-half of the pixels of each triangle are visible in the final scene, a reasonable guess is that three-quarters of the pixels are initially visible and one-quarter of the pixels are initially invisible. Each triangle covers 100 pixels, so 3/4 · 100 · 10,000 = 750,000 pixels are initially visible and 1/4 · 100 · 10,000 = 250,000 pixels are initially invisible. To display an entire frame, therefore, a total of (750,000 · 5) + (250,000 · 2) = 4.25 million additions and (750,000 · 3) + (250,000 · 1) = 2.5 million frame-buffer accesses is required. To initialize each frame, both color and z-buffers must be cleared, an additional 1280 · 1024 · 2 = 2.6 million frame-buffer accesses. The total number of frame-buffer accesses per frame, therefore, is 2.5 million + 2.6 million = 5.1 million. If 10 frames are generated per second, 42.5 million additions and 51 million frame-buffer accesses are required per second.
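As a cross-check of the arithmetic above, the estimate can be recomputed directly. The short C sketch below is not from the book; it simply encodes the stated assumptions (10,000 triangles covering 100 pixels each, a 1280 x 1024 frame buffer, 10 frames per second, and the per-pixel costs given in the text).

```c
#include <stdio.h>

/* Rough per-second workload estimate for the sample database described
 * in the text. All constants are the book's assumptions, not measurements. */
int main(void)
{
    const double triangles      = 10000.0;
    const double pixels_per_tri = 100.0;
    const double frames_per_sec = 10.0;

    /* Three-quarters of covered pixels are initially visible. */
    double visible   = 0.75 * pixels_per_tri * triangles;   /* 750,000 */
    double invisible = 0.25 * pixels_per_tri * triangles;   /* 250,000 */

    /* Per-pixel costs from the text: 5 additions and 3 frame-buffer cycles
     * for an initially visible pixel, 2 additions and 1 cycle otherwise. */
    double adds_per_frame = visible * 5 + invisible * 2;    /* 4.25 million  */
    double fb_per_frame   = visible * 3 + invisible * 1;    /* 2.5  million  */
    double fb_clear       = 1280.0 * 1024.0 * 2;            /* ~2.6 million  */

    printf("additions per second        : %.1f million\n",
           adds_per_frame * frames_per_sec / 1e6);          /* 42.5          */
    printf("frame-buffer accesses/second: %.1f million\n",
           (fb_per_frame + fb_clear) * frames_per_sec / 1e6); /* ~51         */
    return 0;
}
```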
In 1989, the fastest floating-point processors available computed approximately 20 million floating-point operations per second, the fastest integer processors computed approximately 40 million integer operations per second, and DRAM memory systems had cycle times of approximately 100 nanoseconds. The floating-point and integer requirements of our sample application, therefore, are just at the limit of what can be achieved in a single CPU. The number of frame-buffer accesses, however, is much higher than is possible in a conventional memory system. As we mentioned earlier, this database is only modestly sized for systems available in 1989. In the following sections, we show how multiprocessing can be used to achieve the performance necessary to display databases that are this size and larger.
18.4 INTRODUCTION TO MULTIPROCESSING

Displaying large databases at high frame rates clearly requires dramatic system performance, both in terms of computations and of memory bandwidth. We have seen that the geometry portion of a graphics system can require more processing power than a single CPU can provide. Likewise, rasterization can require more bandwidth into memory than a single memory system can provide. The only way to attain such performance levels is to

0208

874    Advanced Raster Graphics Architecture

Fig. 18.9 Basic forms of multiprocessing: (a) pipelining, and (b) parallelism.

perform multiple operations concurrently and to perform multiple reads and writes to memory concurrently; we need concurrent processing.

Concurrent processing, or multiprocessing, is the basis of virtually all high-performance graphics architectures. Multiprocessing has two basic forms: pipelining and parallelism (we reserve the term concurrency for multiprocessing in general). A pipeline processor contains a number of processing elements (PEs), arranged such that the output of one becomes the input of the next, in pipeline fashion (Fig. 18.9a). The PEs of a parallel processor are arranged side by side and operate simultaneously on different portions of the data (Fig. 18.9b).
18.4.1 Pipelining

To pipeline a computation, we partition it into stages that can be executed sequentially in separate PEs. Obviously, a pipeline can run only as fast as its slowest stage, so the processing load should be distributed evenly over the PEs. If this is not possible, PEs can be sized according to the jobs they must perform.

An important issue in pipeline systems is throughput versus latency. Throughput is the overall rate at which data are processed; latency is the time required for a single data element to pass from the beginning to the end of the pipeline. Some calculations can be pipelined using a large number of stages to achieve very high throughput. Pipeline latency increases with pipeline length, however, and certain computations can tolerate only a limited amount of latency. For example, real-time graphics systems, such as flight simulators, must respond quickly to changes in flight controls. If more than one or two frames are in the rendering pipeline at once, the system's interactivity may be impaired, regardless of the frame rate.
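The throughput/latency trade-off can also be written as a small worked formula. This restatement is mine, not the book's, and assumes a pipeline of k stages with per-stage processing times t_1, ..., t_k:

```latex
\text{throughput} = \frac{1}{\max_i t_i}, \qquad
\text{latency} = \sum_{i=1}^{k} t_i
```

For example, four stages of 1 ms each deliver 1000 items per second, yet every item spends 4 ms in flight; adding stages raises throughput only while it also lengthens that delay.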
0209

Introduction to Multiprocessing    875

18.4.2 Parallelism

To parallelize a computation, we partition the data into portions that can be processed independently by different PEs. Frequently, PEs can execute the same program. Homogeneous parallel processors contain PEs of the same type; heterogeneous parallel processors contain PEs of different types. In any parallel system, the overall computation speed is determined by the time required for the slowest PE to finish its task. It is important, therefore, to balance the processing load among the PEs.

A further distinction is useful for homogeneous parallel processors: whether the processors operate in lock step or independently. Processors that operate in lock step generally share a single code store and are called single-instruction multiple-data (SIMD) processors. Processors that operate independently must have a separate code store for each PE and are called multiple-instruction multiple-data (MIMD) processors.

SIMD processors. Because all the PEs in a SIMD processor share a single code store, SIMD processors are generally less expensive than MIMD processors. However, they do not perform well on algorithms that contain conditional branches or that access data using pointers or indirection. Since the path taken in a conditional branch depends on data specific to a PE, different PEs may follow different branch paths. Because all the PEs in a SIMD processor operate in lock step, they all must follow every possible branch path. To accommodate conditional branches, PEs generally contain an enable register to qualify write operations: only PEs whose enable registers are set write the results of computations. By appropriately setting and clearing the enable register, PEs can execute conditional branches (see Fig. 18.10a).

Algorithms with few conditional branches execute efficiently on SIMD processors. Algorithms with many conditional branches can be extremely inefficient, however, since
(a) SIMD expression:

    statement 1;
    if not condition then
        enable = FALSE;
    statement 2;
    toggle enable;
    statement 3;
    statement 4;
    enable = TRUE;
    statement 5;
    statement 6;

    Total operations: 10 if condition evaluates TRUE, 10 if condition evaluates FALSE

(b) MIMD expression:

    statement 1;
    if condition then
        statement 2;
    else
        begin
        statement 3;
        statement 4;
        end;
    statement 5;
    statement 6;

    Total operations: 5 if condition evaluates TRUE, 6 if condition evaluates FALSE

Fig. 18.10 (a) SIMD and (b) MIMD expressions of the same algorithm. In a SIMD program, conditional branches transform into operations on the enable register. When the enable register of a particular PE is FALSE, the PE executes the current instruction, but does not write the result.
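For readers more comfortable with C than with the book's pseudocode, here is a minimal sketch (mine, not the book's) of the same idea: a SIMD machine is emulated by evaluating both sides of the branch for every element and using a per-element enable mask to decide which results are actually written, while the MIMD version simply branches.

```c
#include <stdio.h>

#define N 8   /* number of emulated PEs */

/* MIMD style: each PE branches independently. */
static int mimd(int x) { return (x % 2 == 0) ? x * 2 : x + 1; }

int main(void)
{
    int data[N] = {1, 2, 3, 4, 5, 6, 7, 8};
    int out[N];

    /* SIMD style: every PE executes both paths in lock step;
     * the enable mask qualifies which result is written. */
    for (int i = 0; i < N; i++) {
        int enable = (data[i] % 2 == 0);   /* condition, per PE         */
        int then_r = data[i] * 2;          /* "then" path, always run   */
        int else_r = data[i] + 1;          /* "else" path, always run   */
        out[i] = enable ? then_r : else_r; /* write qualified by enable */
    }

    for (int i = 0; i < N; i++)
        printf("%d %d\n", out[i], mimd(data[i]));   /* the two agree */
    return 0;
}
```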
0210
876    Advanced Raster Graphics Architecture

most PEs may be disabled at any given time. Data structures containing pointers (such as linked lists or trees) or indexed arrays cause similar problems. Since a pointer or array index may contain a different value at each PE, all possible values must be enumerated to ensure that each PE can make its required memory reference. For large arrays or pointers, this is an absurd waste of processing resources. A few SIMD processors provide separate address wires for each PE in order to avoid this problem, but this adds size and complexity to the system.

MIMD processors. MIMD processors are more expensive than SIMD processors, since each PE must have its own code store and controller. PEs in a MIMD processor often execute the same program. Unlike SIMD PEs, however, they are not constrained to operate in lock step. Because of this freedom, MIMD processors suffer no disadvantage when they encounter conditional branches; each PE makes an independent control-flow decision, skipping instructions that do not need to be executed (see Fig. 18.10b). As a result, MIMD processors achieve higher efficiency on general types of computations. However, since processors may start and end at different times and may process data at different rates, synchronization and load balancing are more difficult, frequently requiring FIFO buffers at the input or output of each PE.
18.4.3 Multiprocessor Graphics Systems

Pipeline and parallel processors are the basic building blocks of virtually all current high-performance graphics systems. Both techniques can be used to accelerate the front-end and back-end subsystems of a graphics system, as shown in Fig. 18.11. In the following sections, we examine each of these strategies. Sections 18.5 and 18.6 discuss pipeline and parallel front-end architectures. Sections 18.8 and 18.9 discuss pipeline and parallel back-end architectures. Section 18.10 discusses back-end architectures that use parallel techniques in combination.

Fig. 18.11 Pipelining and parallelism can be used to accelerate both front-end and back-end portions of a graphics system.

0211

18.5    Pipeline Front-End Architectures    877

18.5 PIPELINE FRONT-END ARCHITECTURES

Recall from Section 18.3 that the front end of a graphics display system has two major tasks: traversing the display model and transforming primitives into screen space. As we have seen, to achieve the rendering rates required in current applications, we must use concurrency to speed these computations. Both pipelining and parallelism have been used for decades to build front ends of high-performance graphics systems. Since the front end is intrinsically pipelined, its stages can be assigned to separate hardware units. Also, the large numbers of primitives in most graphics databases can be distributed over multiple processors and processed in parallel. In this section, we discuss pipeline front-end systems. We discuss parallel front-end systems in Section 18.6.

In introducing the standard graphics pipeline of Fig. 18.7, we mentioned that it provides a useful conceptual model of the rendering process. Because of its linear nature and fairly even allocation of processing effort, it also maps well onto a physical pipeline of processors. This has been a popular approach to building high-performance graphics systems since the 1960s, as described in [MYER68], a classic paper on the evolution of graphics architectures. Each stage of the pipeline can be implemented in several ways: as an individual general-purpose processor, as a custom hardware unit, or as a pipeline or parallel processor itself. We now discuss implementations for each stage in the front-end pipeline.
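As a purely illustrative picture of what "assigning stages to separate units" means, the sketch below (mine, not the book's; the stage names and the Vertex type are shorthand) models the front end as a chain of stage functions applied to each vertex. In software the stages run one after another; in a hardware pipeline each stage is a separate processor working on a different vertex at the same time.

```c
#include <stddef.h>

typedef struct { float x, y, z, w, r, g, b; } Vertex;

/* Each front-end stage transforms a vertex in place. */
typedef void (*Stage)(Vertex *v);

static void model_view_transform(Vertex *v) { (void)v; /* multiply by current matrix */ }
static void light(Vertex *v)                { (void)v; /* compute vertex color       */ }
static void clip_test(Vertex *v)            { (void)v; /* trivial accept/reject      */ }
static void perspective_divide(Vertex *v)   { v->x /= v->w; v->y /= v->w; v->z /= v->w; }

int main(void)
{
    Stage pipeline[] = { model_view_transform, light, clip_test, perspective_divide };
    Vertex v = { 1, 2, 3, 2, 1, 1, 1 };

    /* A hardware pipeline would run these stages concurrently,
     * each stage operating on a different vertex. */
    for (size_t i = 0; i < sizeof pipeline / sizeof pipeline[0]; i++)
        pipeline[i](&v);
    return 0;
}
```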
18.5.1 Application Program and Display Traversal

Some processor must execute the application program that drives the entire graphics system. In addition to feeding the graphics pipeline, this processor generally handles input devices, file I/O, and all interaction with the user. In systems using immediate-mode traversal, the display model is generally stored in the CPU's main memory. The CPU must therefore traverse the model as well as run the application. In systems using retained mode, the model is generally (but not always) stored in the display processor's memory, with the display processor performing traversal. Because such systems use two processors for these tasks, they are potentially faster, although they are less flexible and have other limitations, as discussed in Section 7.2.2.

Where very high performance is desired, a single processor may not be powerful enough to traverse the entire database with sufficient speed. The only remedy is to partition the database and to traverse it in parallel. This relatively new technique is discussed in Section 18.6.1.
18.5.2 Geometric Transformation

The geometric transformation stages (modeling transformation and viewing transformation) are highly compute-intensive. Fortunately, vector and matrix multiplications are simple calculations that require no branching or looping, and can readily be implemented in hardware.

The most common implementation of these stages is a single processor or functional unit that sequentially transforms a series of vertices. A pioneering processor of this type was the Matrix Multiplier [SUTH68], which could multiply a four-element vector by a

0212

878    Advanced Raster Graphics Architecture

homogeneous transformation matrix in 20 microseconds. Other special-purpose geometry processors have been developed since then, most notably Clark's Geometry Engine, which can perform clipping as well (see Section 18.5.5). Recent geometry processors have exploited the power and programmability of commercial floating-point chips.

If pipelining does not provide enough performance, transformation computations can be parallelized in several ways (a minimal sketch of the underlying per-vertex transform follows this list):

  • Individual components of a vertex may be calculated in parallel. Four parallel processors, each containing the current transformation matrix, can evaluate the expressions for x, y, z, and w in parallel.

  • Multiple vertices can be transformed in parallel. If primitives are all of a uniform type, say, triangles, the three vertices of each triangle can be transformed simultaneously.

  • Entire primitives can be transformed in parallel. If n transformation engines are available, each processor can transform every nth primitive. This technique has many of the advantages and disadvantages of parallel front-end systems, which we will discuss in Section 18.6.
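The sketch referred to above is mine, not the book's: it shows the per-vertex work that each of the three strategies distributes differently, namely multiplying a homogeneous point by the current 4 x 4 transformation matrix. Parallelizing "by vertex" or "by primitive" amounts to handing different iterations of the outer loop to different processors; the "by component" strategy splits the four row dot products instead.

```c
typedef struct { float v[4]; }    Vec4;   /* homogeneous point (x, y, z, w) */
typedef struct { float m[4][4]; } Mat4;   /* current transformation matrix  */

/* Transform one vertex: out = M * in. Four dot products; the
 * component-parallel strategy evaluates the four rows on four PEs. */
static Vec4 transform(const Mat4 *M, Vec4 in)
{
    Vec4 out;
    for (int row = 0; row < 4; row++) {
        out.v[row] = 0.0f;
        for (int col = 0; col < 4; col++)
            out.v[row] += M->m[row][col] * in.v[col];
    }
    return out;
}

/* Transform a vertex array. The "by vertex" and "by primitive"
 * strategies split the iterations of this loop across engines. */
void transform_all(const Mat4 *M, Vec4 *verts, int n)
{
    for (int i = 0; i < n; i++)
        verts[i] = transform(M, verts[i]);
}
```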
18.5.3 Trivial Accept/Reject Classification

Trivial accept and reject tests are straightforward to implement, since they require at worst a dot product and at best a single floating-point comparison (or subtract) to determine on which side of each clipping plane each vertex lies. Because these tests require little computation, they are generally performed by the processor that transforms primitives.
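To make the preceding description concrete, here is a small sketch of my own (using the usual outcode idea against the normalized view volume -w <= x, y, z <= w, not any specific hardware): one comparison per plane per vertex, all outcodes zero means trivial accept, a shared set bit means trivial reject, anything else needs real clipping.

```c
typedef struct { float x, y, z, w; } Vertex4;

/* One bit per clipping plane of the canonical view volume. */
static unsigned outcode(Vertex4 v)
{
    unsigned code = 0;
    if (v.x < -v.w) code |= 1u << 0;   /* left   */
    if (v.x >  v.w) code |= 1u << 1;   /* right  */
    if (v.y < -v.w) code |= 1u << 2;   /* bottom */
    if (v.y >  v.w) code |= 1u << 3;   /* top    */
    if (v.z < -v.w) code |= 1u << 4;   /* near   */
    if (v.z >  v.w) code |= 1u << 5;   /* far    */
    return code;
}

/* 1 = trivially accepted, -1 = trivially rejected, 0 = must be clipped. */
int classify_triangle(Vertex4 a, Vertex4 b, Vertex4 c)
{
    unsigned ca = outcode(a), cb = outcode(b), cc = outcode(c);
    if ((ca | cb | cc) == 0) return  1;   /* all vertices inside every plane */
    if ((ca & cb & cc) != 0) return -1;   /* all outside the same plane      */
    return 0;                             /* straddles the view volume       */
}
```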
18.5.4 Lighting

Like geometric transformation, lighting calculations are straightforward and are floating-point-intensive. A specialized hardware processor can calculate vertex colors based on a polygon's color and the light vector. More frequently, lighting calculations are performed using a programmable floating-point processor. In lower-performance systems, lighting calculations can be done in the same processor that transforms vertices. Note that if Phong shading is used, lighting calculations are deferred until the rasterization stage.
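As a purely illustrative example of the kind of per-vertex arithmetic involved (a simple Lambertian diffuse term only; the book does not prescribe this particular formula here), a vertex color can be computed from the polygon color, the vertex normal, and the direction toward the light:

```c
#include <math.h>

typedef struct { float x, y, z; } Vec3;

static float dot3(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

static Vec3 normalize3(Vec3 v)
{
    float len = sqrtf(dot3(v, v));
    Vec3 r = { v.x / len, v.y / len, v.z / len };
    return r;
}

/* Diffuse (Lambertian) vertex color: polygon color scaled by N.L,
 * clamped at zero so light behind the surface contributes nothing. */
Vec3 light_vertex(Vec3 polygon_color, Vec3 normal, Vec3 to_light)
{
    float d = dot3(normalize3(normal), normalize3(to_light));
    if (d < 0.0f) d = 0.0f;
    Vec3 c = { polygon_color.x * d, polygon_color.y * d, polygon_color.z * d };
    return c;
}
```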
18.5.5 Clipping

Polygon clipping was once considered cumbersome, since the number of vertices can change during the clipping process and concave polygons can fragment into multiple polygons during clipping. Sutherland and Hodgman [SUTH74] showed that arbitrary convex or concave polygons can be clipped to a convex view volume by passing the polygon's vertices through a single processing unit multiple times. Each pass through the unit clips the polygon to a different plane. In 1980, Clark proposed unwrapping this processing loop into a simple pipeline of identical processors, each of which could be implemented in a single VLSI chip, which he named the Geometry Engine [CLAR82]. The Geometry Engine was general enough that it could transform primitives and perform perspective division as well.
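Here is a minimal sketch, my own rather than the book's, of one pass of the re-entrant clipping loop just described: clipping a polygon against a single plane (the half-plane x >= 0 is used for concreteness). A Geometry Engine style pipeline simply chains one such unit per clipping plane.

```c
typedef struct { float x, y; } Pt;

/* One stage of Sutherland-Hodgman clipping: clip the polygon "in"
 * (n vertices) against the half-plane x >= 0, writing the result to
 * "out", which must have room for up to 2*n points. Returns the new
 * vertex count. A full clipper chains one such pass per plane. */
int clip_against_x0(const Pt *in, int n, Pt *out)
{
    int m = 0;
    for (int i = 0; i < n; i++) {
        Pt a = in[i], b = in[(i + 1) % n];          /* edge a -> b            */
        int a_in = (a.x >= 0.0f), b_in = (b.x >= 0.0f);

        if (a_in) out[m++] = a;                     /* keep vertices inside   */
        if (a_in != b_in) {                         /* edge crosses the plane */
            float t = a.x / (a.x - b.x);            /* intersection parameter */
            Pt p = { 0.0f, a.y + t * (b.y - a.y) };
            out[m++] = p;
        }
    }
    return m;
}
```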
0213

18.5    Pipeline Front-End Architectures    879
Clipping using a Geometry Engine (or similar processor) can be performed either by a single processor that clips each polygon by as many planes as necessary, or by a pipeline of clipping processors, one for each clipping plane. The technique chosen affects the worst-case performance of the graphics system: systems with only one clipping processor may bog down during frames in which large numbers of primitives need to be clipped, whereas systems with a clipping processor for each clipping plane can run at full speed. However, most of the clipping processors are idle for most databases and views in the latter approach.

General-purpose floating-point units recently have begun to replace custom VLSI transformation and clipping processors. For example, Silicon Graphics, which for many years employed custom front-end processors, in 1989 used the Weitek 3332 floating-point chip for transformations and clipping in their POWER IRIS system (described in detail in Section 18.8.2). The delicate balance between performance and cost now favors commodity processors. This balance may change again in the future if new graphics-specific functionality is needed and cannot be incorporated economically into general-purpose processors.
18.5.6 Division by w and Mapping to 3D Viewport

Like geometric transformation and lighting, the calculations in this stage are straightforward but require substantial floating-point resources. A floating-point divide is time consuming even for most floating-point processors (many processors use an iterative method to do division). Again, these stages can be implemented in custom functional units or in a commercial floating-point processor. In very high-performance systems, these calculations can be performed in separate, pipelined processors.
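A small illustrative sketch of what this stage computes (the viewport parameterization and names are mine, not the book's): divide each component by w, then scale and offset the normalized coordinates into screen space.

```c
typedef struct { float x, y, z, w; } ClipVertex;
typedef struct { float x, y, z; }    ScreenVertex;

/* Divide by w and map from [-1, 1] normalized coordinates into a
 * width x height viewport; z is kept in [0, 1] for depth buffering. */
ScreenVertex viewport_map(ClipVertex v, float width, float height)
{
    float inv_w = 1.0f / v.w;                 /* the costly divide */
    ScreenVertex s;
    s.x = (v.x * inv_w * 0.5f + 0.5f) * width;
    s.y = (v.y * inv_w * 0.5f + 0.5f) * height;
    s.z =  v.z * inv_w * 0.5f + 0.5f;
    return s;
}
```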
18.5.7 Limitations of Front-End Pipelines

Even though pipelining is the predominant technique for building high-performance front-end systems, it has several limitations that are worth considering. First, a different algorithm is needed for each stage of the front-end pipeline. Thus, either a variety of hard-wired functional units must be designed or, if programmable processors are used, different programs must be written and loaded into each processor. In either case, processor or functional-unit capabilities must be carefully matched to their tasks, or bottlenecks will occur.

Second, since the rendering algorithm is committed to hardware (or at least to firmware, since few systems allow users to reprogram pipeline processors), it is difficult to add new features. Even if users have programming support for the pipeline processors, the distribution of hardware resources in the system may not adequately support new features such as complex primitives or collision detection between primitives.

A final shortcoming of pipelined front ends is that the approach breaks down when display traversal can no longer be performed by a single processor, and this inevitably occurs at some performance level. For example, if we assume that traversal is performed by a 20-MHz processor and memory system, that the description of each triangle in the database requires 40 words of data (for vertex coordinates, normal vectors, colors, etc.), and that each word sent to the pipeline requires two memory/processor cycles (one to read it

0214

880    Advanced Raster Graphics Architecture

from memory, another to load it into the pipeline), then a maximum of 20,000,000 / (2 · 40) = 250,000 triangles per second can be displayed by the system, no matter how powerful the processors in the pipeline are. Current systems are rapidly approaching such limits.
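Stated generally (my restatement, not an equation that appears in the text), with f the traversal processor's clock rate, c the memory/processor cycles needed per word, and w the words per triangle:

```latex
\text{triangles per second} \;\le\; \frac{f}{c \, w}
\qquad\text{e.g.}\qquad
\frac{20{,}000{,}000}{2 \times 40} = 250{,}000 .
```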
What else can be done, then, to achieve higher performance? The alternative to pipelining front-end calculations is to parallelize them. The following section describes this second way to build high-performance front-end systems.
18.6 PARALLEL FRONT-END ARCHITECTURES

Since graphics databases are regular, typically consisting of a large number of primitives that receive nearly identical processing, an alternate way to add concurrency is to partition the data into separate streams and to process them independently. For most stages of the front-end subsystem, such partitioning is readily done; for example, the geometric transformation stages can use any of the parallel techniques described in Section 18.5. However, stages in which data streams diverge (display traversal) or converge (between the front end and back end) are problematic, since they must handle the full data bandwidth.
18.6.1 Display Traversal

Almost all application programs assume a single, contiguous display model or database. In a parallel front-end system, the simplest technique is to traverse the database in a single processor (serial traversal) and then to distribute primitives to the parallel processors. Unfortunately, this serial traversal can become the bottleneck in a parallel front-end system. Several techniques can be used to accelerate serial traversal:

  • Traversal routines can be optimized or written in assembly code.

  • The database can be stored in faster memory (i.e., SRAM instead of DRAM).

  • A faster traversal processor (or one optimized for the particular structure format) can be used.
If these optimizations are not enough, the only alternative is to traverse the database in parallel. The database either can be stored in a single memory system that allows parallel access by multiple processors (a shared-memory model), or can be distributed over multiple processors, each with its own memory system (a distributed-memory model).

The advantage of the shared-memory approach is that the database can remain in one place, although traversal must be divided among multiple processors. Presumably, each processor is assigned a certain portion of the database to traverse. Unfortunately, inherited attributes in a hierarchical database model mean that processors must contend for access to the same data. For example, each processor must have access to the current transformation matrix and to other viewing and lighting parameters. Since the data bandwidth to and from a shared-memory system may not be much higher than that of a conventional memory system, the shared-memory approach may not provide enough performance.

In the distributed-memory approach, each processor contains a portion of the database in its local memory. It traverses its portion of the database for each frame and may also perform other front-end computations. Distributing the database presents its own problems, however: unless the system gives the application programmer the illusion of a contiguous database, it cannot support portable graphics libraries. Also, the load must be balanced

0215
Parallel Front-End Architectures    881

over the traversal processors if system resources are to be utilized fully. Hierarchical databases exacerbate both of these problems, since attributes in one level of a hierarchy affect primitives below them, and structures deep in a hierarchy may be referenced by [...]

The following two sections examine two ways to distribute a hierarchical database over multiple processors: by structure, where each traversal processor is given a complete branch of the structure hierarchy; or by primitive, where each traversal processor is given a fraction of the primitives at each block in the hierarchy.
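Before turning to those two schemes, here is a minimal sketch (my own, not from the book) of the simpler "by primitive" idea for a flat display list: primitive i is handed to traversal processor i mod n, so each processor traverses an equal share regardless of how the model is organized. Hierarchical models complicate this, as the surrounding text explains, because inherited attributes must be replicated or communicated.

```c
#define NUM_PROCESSORS 4

typedef struct { int id; /* vertices, attributes, ... */ } Primitive;

/* Round-robin assignment of primitives to traversal processors:
 * owner[i] names the processor that traverses primitive i each frame. */
void distribute_by_primitive(const Primitive *prims, int count, int *owner)
{
    (void)prims;   /* in this simple scheme the assignment depends only on the index */
    for (int i = 0; i < count; i++)
        owner[i] = i % NUM_PROCESSORS;
}
```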
Distributing by structure. Distributi
