throbber
IEEE
`
`3 0 05 0
`a -H L
`e 13 xj
`0 05 -H Qj
`U U _J >
`■H 'H
`111 I—< Ll Qj
`lu acn u
`ixi au Q)
`^ 05 D q:
`
`MEDIATEK, Ex. 1021, Page 1
`IPR2018-00101
`
`

`

`CompuhrGrapHic^
`
`AND/applications
`
`f
`
`IEEE Computer Society
`10662 Los Vaqueros Circle
`PO Box 3014
`Los Alamitos, CA 90720-1264
`
`Editor in Chief
`Peter R. Wilson
`
`Submissions: Please submit six copies of all articles and
`proposals for special issues to Peter R. Wilson, editor in chief,
`NIST, Bldg. 220, Rra. A127, Gaithersburg, MD 20899, e-mail
`pwilson@cme.nist.gov. All submissions are subject to editing
`for style, clarity, and space considerations. An author’s guide
`is available from the publications office or by e-mail from
`nhays@computer.org.
`
`Editorial Board
`
`Frank W. Bliss
`Jack E. Bresenham
`Thomas A. DeFanti
`John C. Dill
`Rae A. Earnshaw
`Jose L. Encarnacao
`Nick England
`Andrew Glassner
`Bertram Herzog
`Tosiyasu L. Kunii
`Carl Machover
`Gregory M. Nielson
`Michael J. Potel
`Michael L. Rhodes
`Philip K. Robertson
`Lawrence J. Rosenblum
`
`EDS/GM Corp.
`Consultant
`U. of Illinois at Chicago
`Simon Eraser Univ.
`University of Leeds
`Technical U. of Darmstadt
`U. of North Carolina
`Xerox PARC
`Consultant
`U. of Aizu
`Machover Associates
`Arizona State U.
`Taligent
`Toshiba America MRI
`CSIRO, Australia
`Office of Naval Research
`
`Managing Editor
`Staff Editor
`Assistant Editor
`Publisher
`Assistant Publisher
`Editorial Dir., Magazines
`Advertising Manager
`Advertising Coordinator
`Member/Circ. Promos Mgr.
`Editorial Secretary
`Art Direction
`
`Nancy Hays
`Linda World
`Karen Whitehouse
`True Seaborn
`Matt Loeb
`Marilyn Potes
`Heidi Rex
`Marian Tibayan
`John Gill
`Carole Danner
`Joe Daigle
`
`Publications Board
`
`Barry W. Johnson (chair),
`Jon T. Butler, James J. Farrell III,
`Tadao Ichikawa, Mary Jane Irwin, Raymond Miller,
`Michael C. Mulder, Theo Pavlidis,
`John P. Riganati, Harold Stone, Ronald D. Williams
`
`Editorial: IEEE Computer Graphics and Applications serves
`both users and designers of graphics hardware, software, and
`systems. Its readers work in industry, business, the arts, and
`universities.Unless otherwise stated, bylined articles and
`departments, as well as descriptions of products and services,
`reflect the author’s or firm’s opinion; inclusion in this
`publication does not necessarily constitute endorsement by the
`IEEE or the Computer Society.
`
`Copyright and reprint permission: Abstracting is permitted
`with credit to the source. Libraries are permitted to photocopy
`beyond the limits of US copyright law for private use of
`patrons those articles that carry a code at the bottom of the
`first page, provided the per-copy fee indicated in the code is
`paid through the Copyright Clearance Center, 29 Congress St.,
`Salem, MA 01970. Instructors are permitted to photocopy
`isolated articles for noncommercial classroom use without fee.
`For other copying, reprint, or republication permission, write
`to Permissions Editor, IEEE Computer Graphics and
`Applications, 10662 Los Vaqueros Circle, PO Box 3014, Los
`Alamitos, CA 90720-1264. All rights reserved. Copyright ©
`1994 by the Institute of Electrical and Electronics Engineers,
`Inc.
`
`Circulation: IEEE Computer Graphics and Applications (ISSN
`0272-1716) is published bimonthly by the IEEE Computer
`Society, 10662 Los Vaqueros Circle, PO Box 3014, Los
`Alamitos, CA 90720-1264; (714) 821-8380. IEEE Computer
`Society Headquarters: 1730 Massachusetts Ave., Washington,
`DC 20036-1903. IEEE Headquarters: 345 East 47th St., New
`York, NY 10017-2394. Annual subscription: $29 in addition to
`any IEEE group or society dues. Members of other technical
`organizations: $53. Nonmember subscription rates are
`available on request. Back issues: members, $10; nonmembers,
`$20. This journal is also available in microfiche form.
`
`Postmaster: Send address changes and undelivered copies to
`IEEE Computer Graphics and Applications, IEEE Computer
`Society, 10662 Los Vaqueros Circle, PO Box 3014, Los
`Alamitos, CA 90720-1264. Second class postage is paid at New
`York, NY, and at additional mailing offices. Canadian GST
`#125634188. Printed in USA.
`
`Moving?
`
`PLEASE NOTIFY Mail to
`US 4 WEEKS
`IEEE Computer Society
`IN ADVANCE
`10662 Los Vaqueros Circle
`PO Box 3014
`Los Alamitos, CA 90720-1264
`• This notice of address change will apply to all IEEE publications to
`which you subscribe.
`• List your new address below.
`• If you have a question about your subscription, place your address label
`here and clip this form to your letter.
`ATTACH
`LABEL
`HERE
`
`Magazine Operations Committee
`
`Name (please print)
`
`James J. Farrel III (chair),
`Valdis Berzins, B. Chandrasekaran,
`Carl Chang, Dante Del Corso, Ramesh Jain,
`John A.N. Lee, Ted Lewis, Michael J. Quinn,
`Ahmed Sameh, Ken Wagner, Peter R. Wilson
`
`New address
`
`City
`
`State/Country
`
`^ ZIP
`
`MEDIATEK, Ex. 1021, Page 2
`IPR2018-00101
`
`

`

`SHORT NOTE
`
`Bump Shading for Volume Textures
`Nelson L. Max and Barry G. Becker
`Bump mapping works well for parameterized surfaces. Here we interpret the 3D texture as a
`displacement function to be added to the surface position.
`
`PARALLEL RENDERING
`
`21
`23
`
`33
`
`41
`
`49
`
`59
`
`69
`
`Guest Editors’ Introduction: Recent Developments in Parallel Rendering
`Scott Whitman, Charles D. Hansen, and Thomas W. Crockett
`
`A Sorting Classification of Parallel Rendering
`Steven Molnar, Michael Cox, David Ellsworth, and Henry Fuchs
`This conceptual model defines three broad sorting classes that illuminate the trade-offs
`between different parallel rendering approaches.
`
`A New Algorithm for Interactive Graphics on Multicomputers
`David A. Ellsworth
`This polygon rendering algorithm uses a new load-balancing method that exploits the frame-
`to-frame coherence of interactive applications.
`
`Dynamic Load Balancing for Parallel Polygon Rendering
`Scott Whitman
`A new graphics renderer incorporates novel partitioning methods for efficient execution on
`a parallel computer. It requires little overhead, executes efficiently, and demands minimal
`processor synchronization.
`
`Communication Costs for Parallel Volume-Rendering Algorithms
`Ulrich Neumann
`Analyzing the intrinsic communication costs of three classes of parallel volume-rendering
`algorithms yielded estimates of how varying system and data sizes would affect a system s
`communication time.
`
`Parallel Volume Rendering Using Binary-Swap Compositing
`Kwan-Liu Ma, James S. Painter, Charles D. Hansen, and Michael F. Krogh
`This algorithm distributes data and computations to individual processing nodes for
`rendering subimages, then composites the final image with a method that uses all nodes
`at all times.
`
`Distributed Data and Control for Ray Tracing in Parallel
`Didier Badouel, Kadi Bouatouch, and Thierry Priol
`Distributed-memory parallel computers offer high performance for ray tracing when
`supplied with shared virtual memory. They also avoid complex message-passing techniques.
`
`July 1994
`
`Vol. 14 No. 4 (ISSN 0272-1716)
`
`Published by IEEE Computer Society
`
`MEDIATEK, Ex. 1021, Page 3
`IPR2018-00101
`
`

`

`'1
`
`DEPARTMENTS
`4 Letters to the Editor
`Fanning the Flames for Fortran
`5 About the Cover
`Building Castles Out of Pixels
`8 Displays on Display
`Mark of the Hand—by Mouse
`12 Visualization Blackboard
`The Cosmic Worm
`15 Applications
`Decriminalizing the Fingerprint
`78 Jim Blinn’s Corner
`Quantization Error and Dithering
`83 Graphic News
`Is There Life After Commodore?;
`Campuses Chosen for New Media Centers
`DOD Funds Flat-Panel Display R&D
`86 Conferences
`The Future of VR;
`State of VR Art in Medicine
`89 Calendar
`92 New Products
`96 Advertiser/Product Index
`CIII Final Focus
`
`Change of address form, p. 1
`Computer Society Information, p. 17
`Reader Service Card, p. 96a & b
`
`MEDIATEK, Ex. 1021, Page 4
`IPR2018-00101
`
`

`

`A Sorting Classification of
`Parallel Rendering
`
`Steven Molnar
`University of North Carolina at Chapel Hill
`Michael Cox
`Princeton University
`David Ellsworth and Henry Fuchs
`University of North Carolina at Chapel Hill
`
`This conceptual model
`defines three broad sorting
`classes that illuminate the
`trade-offs between
`different parallel
`rendering approaches.
`
`Graphics rendering is notoriously compute intensive, par­
`
`rasterization are performed in parallel. This classification
`ticularly for realistic images and fast updates. Demanding
`scheme supports the analysis of computational and communi­
`applications, such as scientific visualization, CAD, vehicle sim­
`cation costs, and encompasses the bulk of current and proposed
`ulation, and virtual reality, can require hundreds of Mflops of
`highly parallel renderers—both hardware and software.
`performance and gigabytes per second of memory bandwidth—
`We begin by reviewing the standard feed-forward rendering
`far beyond the capabilities of a single processor. For these rea­
`pipeline, showing how different ways of parallelizing it lead to
`sons, parallelism has become a crucial tool for building
`three classes of rendering algorithms. Next, we consider each of
`high-performance graphics systems, whether special-purpose
`these classes in detail, analyzing their aggregate processing and
`hardware systems or software systems for general-purpose mul­
`communication costs, possible variations, and constraints they
`ticomputers.
`may impose on rendering applications. Finally, we use these
`We can employ parallelism of various types at many levels.
`analyses to compare the classes and identify when each is likely
`For example, functional parallelism (pipelining) can speed crit­
`to be preferable.
`ical calculations, and data parallelism can allow multiple results
`to be computed at once. Common data-parallel approaches are
`by object (object parallelism) and by pixel or portion of the
`screen (pixel or image parallelism).
`Several taxonomies of parallel rendering algorithms have been
`proposed.’ '* These taxonomies help in classifying and under­
`standing systems but do not lend themselves easily to compari­
`son or analysis. Some rendering systems have been analyzed in
`isolation.”'’ However, these analyses tend to focus on unique sys­
`tem attributes and make comparisons between systems difficult.
`In this article, we describe a classification scheme that we be­
`lieve provides a more structured framework for reasoning about
`parallel rendering. The scheme is based on where the sort from
`object coordinates to screen coordinates occurs, which we be­
`lieve is fundamental whenever both geometry processing and
`
`About the title photograph
`The photograph above shows the simulation results of
`communications traffic between 36 sort-first processors in
`a 2D mesh rendering the National Computer Graphics
`Association's "rotating head" Picture-Level Benchmark
`{GPC Quarterly Report, Vol. 2, No. 4, 1992).
`Arrow color indicates the number of primitives
`transferred between processors for the 4.5 degree
`rotation of the head model between these two successive
`frames. Range is 0 (black) to 800 (white) using a heated-
`object spectrum.
`
`July 1994
`
`0272-17-16/94/$4.00 © 1994 IEEE
`
`23
`
`MEDIATEK, Ex. 1021, Page 5
`IPR2018-00101
`
`

`

`Parallel Rendering
`
`Graphics database traversal
`
`i
`
`G
`
`G
`
`G
`
`G
`
`Geometry
`processing
`
`V
`
`R
`
`R
`
`R
`
`R
`
`Rasterization
`
`Display
`
`Parallel rendering as
`a sorting problem
`Figure 1 shows a simplified version of the standard, feed­
`forward rendering pipeline, adapted for parallel rendering. It
`consists of two principal parts: geometry processing (transfor­
`mation, clipping, lighting, and so on) and rasterization (scan-
`conversion, shading, and visibility determination). In this article,
`we target rendering rates high enough that both geometry pro­
`cessing and rasterization must be performed in parallel. We say
`such systems are fully parallel.
`Geometry processing usually is parallelized by assigning each
`processor a subset of the primitives (objects) in the scene. Ras­
`terization usually is parallelized by assigning each processor a
`portion of the pixel calculations.
`The essence of the rendering task is to calculate the effect of
`each primitive on each pixel. Due to the arbitrary nature of the
`modeling and viewing transformations, a primitive can fall any­
`where on (or off) the screen. Thus, we can view rendering as a
`problem of sorting primitives to the screen, as noted by Suther­
`land, Sproull, and Schumacher in their seminal paper on visible-
`surface algorithms.* For fully parallel renderers, this sort
`involves a redistribution of data between processors, because re­
`sponsibility for primitives and pixels is distributed.
`The location of this sort largely determines the structure of
`the resulting parallel rendering system. Understanding the va­
`riety of system structures possible within the constraints of this
`distributed sort and realizable with available computational re­
`sources is the main challenge for designers of fully parallel ren­
`dering systems.
`The sort can, in general, take place anywhere in the render­
`ing pipeline: during geometry processing (sort-first), between
`geometry processing and rasterization (sort-middle), or during
`rasterization (sort-last). Sort-first means redistributing “raw”
`primitives—before their screen-space parameters are known.
`Sort-middle means redistributing screen-space primitives. Sort-
`last means redistributing pixels, samples, or pixel fragments.
`Each of these choices leads to a separate class of parallel ren­
`dering algorithms with distinct properties. We describe the
`classes briefly now and examine them in more detail later.
`
`Sort-first
`The aim in sort-first is to distribute primitives early in the
`rendering pipeline—during geometry processing—to proces­
`sors that can do the remaining rendering calculations (Figure 2).
`We can do this by dividing the screen into disjoint regions and
`making processors (called renderers) responsible for all ren­
`dering calculations that affect their respective screen regions.
`Initially, we assign primitives to renderers in some arbitrary
`
`Figure 1. Graphics pipeline in a fully parallel rendering system.
`Processors G perform geometry processing. Processors R perform
`rasterization.
`
`Figure 2. Sort-first redistributes raw primitives
`during geometry processing.
`
`Graphics database
`(arbitrarily partitioned)
`
`(Pre-transform)
`
`Geometry
`processing
`
`Rasterization
`
`fashion. When rendering begins, each renderer does enough
`transformation to determine into which region(s) each primitive
`falls, generally computing the screen-space bounding box of
`the primitive. We call this pre-transformation, and it may or
`may not involve actually transforming the primitive. In some
`cases, primitives will fall into the screen regions of renderers
`other than the one on which they reside. These primitives must
`then be redistributed over an interconnect network to the
`appropriate renderer (or renderers), which then perform the re­
`mainder of the geometry-processing and rasterization calcula­
`tions for these primitives.
`This redistribution of primitives at the beginning of the ren­
`dering process is the distinguishing feature of sort-first. It clearly
`involves overhead, since it performs some geometry processing
`for some primitives on the wrong renderer. The results of these
`calculations must either be sent to or recomputed on the new
`renderer (s).
`Sort-first is the least explored of the three classes. To our
`knowledge, no one has built a sort-first system. Although sort-
`first may seem impractical at first, we will see later that it can re­
`quire much less communication bandwidth than the other
`approaches, if primitives are tessellated or if frame-to-frame
`coherence can be exploited. We will discuss its potential ad­
`vantages and disadvantages in more detail later.
`
`Sort-middle
`In sort-middle, primitives are redistributed in the middle of
`the rendering pipeline—between geometry processing and ras­
`terization. Primitives at this point have been transformed into
`screen coordinates and are ready for rasterization. Since many
`systems perform geometry processing and rasterization on sep­
`arate processors, this is a natural place to break the pipeline.
`In a sort-middle system, geometry processors are assigned,
`arbitrary subsets of the primitives to be displayed; rasterizers are
`assigned a portion of the display screen (generally a contiguous
`region of pixels, as in sort-first). The two processor groups can
`be separate sets of processors, or they can time-share the same
`physical processors.
`During each frame, geometry processors transform, light.
`
`24
`
`IEEE Computer Graphics and Applications
`
`MEDIATEK, Ex. 1021, Page 6
`IPR2018-00101
`
`

`

`A Sorting Classification of Parallel Rendering
`
`Figure 3. Sort-middle redistributes screen-space primitives between
`geometry processing and rasterization.
`
`Figure 4. Sort-last redistributes pixels, samples,
`or pixel fragments during rasterization.
`
`Graphics database
`(arbitrarily partitioned)
`
`and perform other geometric operations on their portion of the
`primitives, and classify them with respect to screen region
`boundaries. Then they transmit all these screen-space primi­
`tives to the appropriate rasterizer or rasterizers, as shown in
`Figure 3.
`Sort-middle is general and straightforward. It has been the
`most common approach to date for both hardware”' and soft­
`ware'’'’'^ parallel rendering systems. We will examine its ad­
`vantages and disadvantages in more detail later.
`
`Sort-last
`Sort-last defers sorting until the end of the rendering
`pipeline—after primitives have been rasterized into pixels, sam­
`ples, or pixel fragments. Sort-last systems assign each processor
`(renderer) arbitrary subsets of the primitives. Each renderer
`computes pixel values for its subset, no matter where they fall
`in the screen. Renderers then transmit these pixels over an in­
`terconnect network to compositing processors, which resolve
`the visibility of pixels from each renderer (Figure 4).
`In sort-last, renderers operate independently until the visi­
`bility stage—the very end of the rendering pipeline. The inter­
`connect network, however, must handle all the pixel data
`generated on all the renderers. For interactive or real-time ap­
`plications rendering high-quality images, this can result in very
`high data rates.
`Sort-last can be done in at least two ways. One approach,
`which we call SL-sparse, minimizes communication by dis­
`tributing only those pixels actually produced by rasterization.
`The second approach, called SL-full, stores and transfers a full
`image from each renderer. Both methods have advantages, as
`we will see later.
`Sort-last systems have existed in various forms for more than
`20 years. The 1967 GE NASA II flight simulator used a simple
`version of SL-full that assigned a processor to each primitive.'^
`Since then, several primitive-per-processor’ '"' and multiple-
`primitive-per-processor"’'"’ SL-full systems have been proposed.
`Several recent commercial systems have used the SL-sparse
`approach.
`
`Processing and
`communication model
`We now analyze each of the three rendering methods in more
`detail. The aim is to build a quantitative model of their pro­
`cessing and communication costs to use as a basis for compar­
`ing them. We first consider the processing required to render on
`a uniprocessor. Then we evaluate the additional processing and
`communication requirements of the parallel methods. We focus
`
`Terms used in the analysis
`
`N
`
`dr. 3d
`
`Or.O,
`
`frJd
`
`T
`S
`c
`
`The resolution of the screen, in pixels.
`The number of processors in the multiprocessor.
`The number of raw and display primitives,
`respectively.
`The average size, in pixels, of a raw and display
`primitive.
`The average number of screen regions that raw
`and display primitives overlap.
`The fraction of pre-tessellation and post­
`tessellation geometry processing that is
`duplicated when a raw primitive overlaps
`multiple screen regions.
`The tessellation ratio, njn^.
`The number of samples per pixel.
`The fraction of raw primitives that must be
`redistributed each frame when sort-first takes
`advantage of frame-to-frame coherence.
`
`on the inherent overhead in the methods and do not specifi­
`cally model factors such as load balancing, buffering, and trans­
`port delay. Such factors affect performance significantly, but
`depend on the detailed system implementation and the graph­
`ics scene itself. We will, however, diseuss these factors (partic­
`ularly load balancing) where appropriate.
`See the sidebar for definitions of terms used in the analyses.
`
`Uniprocessor pipeline
`For the analysis that follows, we refine the rendering pipeline
`as shown in Figure 5 on the next page. First, some rendering
`systems tessellate primitives to generate higher quality images
`(RenderMan is one widely used example). Tessellation decom­
`poses larger primitives into smaller ones, typically into polygons
`or polygonal meshes. Not all rendering packages perform tes­
`sellation, but we include it in the pipeline because it can greatly
`expand the number of primitives that need to be displayed.
`
`July 1994
`
`25
`
`MEDIATEK, Ex. 1021, Page 7
`IPR2018-00101
`
`

`

`Parallel Rendering
`
`Pipeline stage
`
`Processing cost
`
`Geometry:
`Pre-tessellation
`Post-tessellation
`Pixel rendering
`
`Visibility
`
`Of «geom-pre-tess» -i-
`Pel «geom-post-tess»
`«rend-setup» +
`«rend»
`«comp»
`
`Second, we break rasterization into two stages called pixel
`rendering (computing pixel values) and visibility (determining
`which pixels are visible), since some algorithms perform these
`on separate processors. Finally, we do not explicitly mention
`shading, which can be a major consumer of processing cycles,
`but can occur almost anywhere in the pipeline. Shading should
`be considered a part of geometry processing or of pixel ren­
`dering, as appropriate.
`We assume that we are rendering a data set containing raw
`primitives with average size a,. We will call primitives that re­
`sult from tessellation display primitives. If T is the tessellation
`ratio, there are = Tn^ display primitives, with average size
`= aJT. If there is no tessellation, 7 = 1,
`and = a,. We
`assume that we are rendering an image containing A pixels and
`that we are to compute S samples per pixel. For simplicity, we
`assume that all primitives are potentially visible (that is, lie
`within the viewing frustum).
`
`Processing costs
`Figure 5 lists the processing costs for each stage of the unipro­
`cessor rendering pipeline. First, the raw primitives are pro­
`cessed by the stages of the geometry pipeline up to tessellation.
`The cost for this is «geom-pre-tess» (“«...»” denotes cost,
`most naturally in units of time). The resulting display prim­
`itives are then processed by the geometry pipeline stages fol­
`lowing tessellation, at a cost of «geom-post-tess».
`Rasterization follows. Pixel rendering is the first stage and con­
`sists of two parts: setup and per-sample rendering operations,
`whose costs are «rend-setup» and n^a^S «rend», respectively.
`Finally, the cost of compositing is n^a^S «comp», where «comp»
`is the cost of compositing one sample (generally a z comparison
`and conditional write). There is no data redistribution or “sort”
`in the uniprocessor model.
`Sort-first analysis
`We begin our analysis with sort-first. Figure 6 shows its ren­
`dering pipeline and what it costs beyond uniprocessor render­
`ing. Our analysis assumes that primitives are redistributed as
`early in the rendering pipeline as possible and that renderers dis­
`card transformed data when they send a primitive to a new ren-
`derer. (An alternative is to send the transformed data. Although
`we will not focus on this alternative, its analysis is similar to the
`case we present here.)
`
`Processing and communication costs
`The first steps in sort-first are to “pre-transform” the raw
`primitives so that their screen extents are known and to classify
`them with respect to screen-region buckets. Each bucket be­
`longs to a region, and a primitive falls in several buckets when
`
`Figure 5. Rendering pipeline and processing costs in a uniprocessor
`implementation.
`
`Figure 6. Sort-first processing and communication overhead
`(communication costs indicated by box).
`
`Pipeline stage
`
`SF overhead
`
`Pre-transform
`Bucketization
`Redistribution
`Geometry:
`Pre-tessellation
`Post-tessellation
`Pixel rendering
`Visibility
`
`«pre-xform»
`nPr «bucket^>
`enPr «prim;»
`
`nr(Or-1)/r«geom-pre-tess» +
`nJ^Ocj-^) frf«geom-post-tess»
`1) «rend-setup» +
`—
`
`it overlaps several regions. We define an overlap factor O^,
`which is the average number of regions a raw primitive overlaps.
`If the cost to precompute a primitive’s screen coordinates is
`«pre-xform» and the cost to put the primitive in each of its buck­
`ets is «bucket,», then the overhead for these stages is n, «pre-
`xform» and «bucket,.».
`Next, primitives on the wrong renderer must be distributed
`to the correct renderer(s). The number of primitives redis­
`tributed depends on the application and whether frame-to-
`frame coherence is employed. For example, if we are rendering
`a single frame, almost all the primitives must be sent. However,
`if we render multiple frames in an animation or real-time ap­
`plication and the scene does not change much between frames,
`we may need to send only a small fraction of the primitives.
`(Note that this type of coherence is only available in sort-first,
`as it is the only technique that distributes raw primitives. The
`other methods distribute data that has undergone view-depen-
`dent processing.)
`To take this application-dependent behavior into account,
`we define the parameter c, the fraction of primitives that must
`be redistributed. The total network bandwidth required is then
`cn,0, «primr», where «prim,.» is the size of the data structure rep­
`resenting a raw primitive. We will not explicitly tally the pro­
`cessing cost of communication, but note that it is proportional
`to this term.
`After redistribution, sort-first algorithms may accrue other
`overhead that a uniprocessor pipeline would not. If a raw prim­
`itive falls in exactly one bucket, then it undergoes geometry
`processing exactly once, and the costs are the same as they
`would be on a uniprocessor. If the primitive overlaps more than
`one region, however, there will be duplication of effort. Each
`additional processor responsible for a given raw primitive must
`duplicate some fraction of pre-tessellation geometry pra-
`cessing and some fraction/^ of post-tessellation geometry pro­
`cessing. These fractions will depend on the algorithms chosen
`for geometry processing. For example, explicit clipping to region
`boundaries requires duplication of effort, while implementa­
`tions that avoid clipping avoid this extra work. From these con­
`siderations, the additional cost of geometry processing for the
`
`26
`
`IEEE Computer Graphics and Applications
`
`MEDIATEK, Ex. 1021, Page 8
`IPR2018-00101
`
`

`

`A Sorting Classification of Parallel Rendering
`
`Figure 7. Sort-middle processing and communication overhead
`(communication costs indicated by box).
`
`Pipeline stage
`
`SM overhead
`
`Geometry
`Bucketization
`Redistribution
`Pixel rendering
`Visibility
`
`_
`«bucket(j»
`ndOd«prim£^>
`nJ^Ofd-^) «rend-setup»
`—
`
`/I, raw primitives is - 1)/, «geom-pre-tess». For the Tn^
`display primitives, it is n^O^- \ )f,, «geom-post-tess».
`Finally, each display primitive that overlaps several screen re­
`gions requires pixel rendering setup for each region. This adds
`processing overhead of
`1) «rend-setup».
`
`Frame-to-frame coherence
`If we are rendering a single frame, most primitives will be on
`the wrong renderer and will require redistribution—unless we
`are very lucky. Hence, c will be close to 1. However, if we ren­
`der several related frames in sequence, we may be able to take
`advantage of frame-to-frame coherence to reduce overhead.
`To do this, processors that render a primitive during one frame
`must retain that primitive for the next frame. If there is signifi­
`cant spatial coherence between frames, fewer primitives will
`require redistribution during the next frame and c will be close
`to 0. Of course, the application must support retained-mode
`(display-list) rendering, and bookkeeping is needed to track
`primitive ownership.
`To get an estimate of c, we have analyzed traces from two ren­
`dering sequences: the rotating head model in the NCGA Graph­
`ics Performance Committee’s Picture-Level Benchmark^” (see
`the title-page photo) and an architectural walkthrough of a
`building interior seen through a head-mounted display. In both
`cases, we assumed a display size of 1,280 x 1,024 pixels and a re­
`gion size of 64 x 64 pixels. In the Picture-Level Benchmark, the
`head contains 60,000 triangles and rotates 4.5 degrees each
`frame. Here, c varied between 0.13 and 0.16. The architectural
`model contained 64,796 triangles. During a 60-second explo­
`ration of a room within the model, c varied between 0.0 and
`0.64 with a mean value of 0.15. Thus, we see that c can be quite
`small in practice, making sort-first appealing for applications
`with substantial coherence.
`
`Tessellation and oversampling
`If a system employs tessellation and oversampling, then each
`raw primitive generates T display primitives, and each of these
`in turn generates samples. This means that for each raw
`primitive redistributed by sort-first, sort-middle must redis­
`tribute Tdisplay primitives and sort-last must redistribute Ta^S
`samples. If T and S are large, it may make the most sense to re­
`distribute raw primitives, since there are far fewer of them. We
`will see after considering sort-middle and sort-last that sort-
`first has the lowest cost under these circumstances.
`
`Load balancing
`Sort-first is susceptible to several types of load imbalance (as
`are the other two approaches). The most obvious is the initial dis­
`tribution of primitives across processors. Even if equal numbers
`of primitives are assigned to each processor, primitives may re­
`quire different amounts of work, so load imbalances can result.
`
`A second type of load imbalance arises from the distribution
`of primitives over the screen. In sort-first, some regions of the
`screen will very likely receive many more primitives than oth­
`ers. Also, some primitives may take longer to process than oth­
`ers. A way to combat this is to make regions smaller and make
`each processor responsible for more than one region. This can
`be done statically, which is simple, but might not achieve an
`optimal load distribution for any given frame. It can also be
`done dynamically, which increases algorithm complexity but
`may achieve better results. When using frame-to-frame coher­
`ence, we must constrain dynamic load balancing so that each re­
`gion is assigned to the same processor in successive frames.
`Also, dividing the screen more finely to improve the load bal­
`ance tends to increase c.
`
`Advantages and disadvantages
`In summary, sort-first has two advantages:
`
`• Communication requirements are low when the tessellation
`ratio and the degree of oversampling are high, or when frame-
`to-frame coherence can be exploited.
`• Processing nodes implement the entire rendering pipeline for
`a portion of the screen.
`
`Its disadvantages, however, are
`
`• Susceptibility to load imbalance. Primitives may clump into re­
`gions, concentrating the work on a few renderers.
`• Necessity of retained mode and complex data handling code
`to take advantage of frame-to-frame coherence.
`
`Sort-middle analysis
`Sort-middle algorithms are in some sense the most natural. In
`contrast to sort-first, they redistribute primitives after geome­
`try processing, that is, after screen coordinates have already
`been computed. Figure 7 shows the sort-middle rendering
`pipeline and its additional costs.
`
`Processing and communication costs
`In sort-middle, the cost of geometry processing is the same as
`on a uniprocessor. Sort-middle algorithms first accrue overhead
`when they redistribute display primitives. Each of the prim­
`itives must be placed into buckets at a total cost of
`«buckeL». The bandwidth required to send these buckets is
`^dOd «prim^,», and the processing cost of communication is pro­
`portional. After redistribution, display primitives that overlap
`more than one region require extra pixel rendering setup. The
`cost for this is (O^-1) times the cost on a uniprocessor. Visibility
`calculations cost the same in sort-middle as they would on a
`uniprocessor.
`
`July 1994
`
`V
`
`27
`
`MEDIATEK, Ex. 1021, Page 9
`IPR2018-00101
`
`

`

`Parallel Rendering
`
`Figure 8. Sort-last processing and communication overhead
`(communication costs indicated by box).
`
`Pipeline stage
`
`SL-sparse overhead SL-full overhead
`
`Geometry
`Pixel rendering
`Redistribution
`Visibility
`
`— —
`
`«sample»
`—
`
`NAS «sample»
`NAS «comp»
`
`From Figure 7 we can see that the overhead for sort-middle
`depends on the number of display primitives and on the dis­
`play-primitive overlap factor

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket