`Realtek Ex. 1016
`Case No. IPR2023-00922
`Page 1 of 29
`
`
Focus on Computer Graphics

Tutorials and Perspectives in Computer Graphics
`
`Edited by W.T. Hewitt, R. Gnatz, and W. Hansmann
`
`
`A. Kaufman (Ed.)
`
`Rendering, Visualization
`
`and Rasterization Hardware
`
`With 100 Figures
`
Springer-Verlag
`Berlin Heidelberg New York
`London Paris Tokyo
`Hong Kong Barcelona
`Budapest
`
`Focus on Computer Graphics
`Edited by W. T. Hewitt, R. Gnatz, and W. Hansmann
for EUROGRAPHICS -
The European Association for Computer Graphics
P.O. Box 16, CH-1288 Aire-la-Ville, Switzerland

Volume Editor

Arie Kaufman
Department of Computer Science
State University of NY at Stony Brook
Stony Brook, NY 11794-4400, USA

Cover picture:
H. Selzer, Fraunhofer-Institut für Graphische Datenverarbeitung (see also contribution p. 37)
`
ISBN 3-540-56787-9 Springer-Verlag Berlin Heidelberg New York
ISBN 0-387-56787-9 Springer-Verlag New York Berlin Heidelberg
`
Library of Congress Cataloging-in-Publication Data

Rendering, visualization and rasterization hardware / A. Kaufman, (ed.). p. cm. - (Focus on computer graphics) "Comprehensive record of the contributions to the Sixth Eurographics Workshop on Graphics Hardware held on 1-2 September, 1991 in Vienna, Austria, in conjunction with the Eurographics '91 Conference" - Pref. Includes bibliographical references and index. ISBN 0-387-56787-9 (U.S.) 1. Computer graphics-Congresses. 2. Computer input-output equipment-Congresses. I. Kaufman, Arie. II. Eurographics Workshop on Graphics Hardware (6th: 1991: Vienna, Austria). III. EUROGRAPHICS (1991: Vienna, Austria). IV. Series. T385.R458 1993 621.39'9-dc20
`
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law.

© 1993 EUROGRAPHICS The European Association for Computer Graphics
Printed in Germany

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Typesetting: Camera ready copy by authors/editors
45/3140-5 4 3 2 1 0 - Printed on acid-free paper
`
`
`Preface
`
`
The material in this book represents a comprehensive record of the contributions to the Sixth Eurographics Workshop on Graphics Hardware held on 1-2 September 1991 in Vienna, Austria, in conjunction with the Eurographics '91 Conference. The Sixth Eurographics Workshop on Graphics Hardware is the sixth in an established series of workshops. These workshops have been an excellent forum for an exchange of information and ideas on the latest development and work-in-progress report in the field of graphics hardware.

The papers in this book are revised versions of those presented at the Workshop. The papers were revised based on the reviewers' comments and the discussions during the Workshop.

The book has five parts and a keynote paper. The keynote paper is by Kurt Akeley, Vice President and Chief Engineer of Silicon Graphics, who delivered the keynote address on "Issues and Directions for Graphics Hardware Accelerators" at the Workshop. The first part of the book concerns graphics hardware design. The papers in this part discuss simulation and silicon compilers for such a design. The second part contains two papers on graphics systems: a high-performance graphics system and the I.M.O.G.E.N.E. machine. The third part focuses on volume (voxel-based) machines. The papers in this part describe two devices to facilitate transformations of volumes. The fourth part of this book includes papers on rasterization systems, including character rasterization and scan-conversion of triangular faces. The papers in the last part of the book focus on rendering machines. They include a programmable rendering engine, primitive shaders, and radiosity implementation on a parallel architecture.

The book is a testimony that there are flourishing activities in the development of novel architectural and algorithmic ideas in graphics hardware. Specifically, the impact of VLSI technology, newly developed algorithms and approaches, and the increasing diversity of application encourage new hardware solutions and keep the graphics hardware topic a viable research and development area.

I am very grateful for the amount of time and energy put into the refereeing process and the planning of the Workshop by the members of the Program Committee. In addition, I would like to thank the Eurographics Association for supporting the Workshop series; Max Mehl from FhG-AGD, Darmstadt, for his effort in organizing the Workshop; the Technical University of Vienna for hosting the event; Gerhard Hiess from TU Vienna for local organization; and my students Claudio Silva and Juliana Silva for preparing the book for publication. Last, but not least, my thanks go to the authors of the papers for the careful preparation of their manuscripts.

Stony Brook, New York
Spring 1993

Arie Kaufman
`
Table of Contents

Keynote Speaker
1 Issues and Directions for Graphics Hardware Accelerators
Kurt Akeley

I Graphics Hardware Design
2 XInPosse: Structural Simulation for Graphics Hardware
M.A. Guravage, E.H. Blake, A.A.M. Kuijk
3 Silicon Compilers for Graphics Hardware Design?
Oliver Renz, Alwin Groene

II Graphics Systems
4 Dynamic Load Balancing within a High Performance Graphics System
Harald Selzer
5 The I.M.O.G.E.N.E. Machine: Some Hardware Elements
V. Lefevere, S. Karpf, C. Chaillou, M. Mériaux

III Volume Machines
6 The Conveyor - an Interconnection Device for Parallel Volumetric Transformations
Daniel Cohen, Reuven Bakalash
7 The Flipping Cube: A Device for Rotating 3D Rasters
Roni Yagel

IV Rasterization
8 Hardware Outline Character Rasterization
Marc Morgan, Roger D. Hersch
9 Accurate Scanconversion of Triangulated Surfaces
Jarek R. Rossignac

V Rendering Machines
10 Testing Geometric Primitive Shaders
M. White, P. F. Lister, G. J. Dunnett, R. L. Grimsdale
11 An Architecture for a High Performance Rendering Engine
Hans-Josef Ackermann, Christoph Hornung
12 Space Partitioning for Mapping Radiosity Computations onto a Pipelined Parallel Architecture (II)
L.S. Shen, F.A.J. Laarakker, E. Deprettere

List of Contributors
`
`
`4 Dynamic Load Balancing within a High Performance
`Graphics System
`Harald Selzer
`
ABSTRACT Interactive 3D graphics applications require significant arithmetic processing to meet the ever-increasing desire for higher image complexity and higher resolution in displayed images.
This paper describes a graphics processor architecture with a high degree of parallelism connected to a distributed frame buffer. The architecture can be configured with an arbitrary number of identical, high level programmable processors operating in parallel.
Within the architecture an automatic load balancing mechanism is presented which distributes the processing load between geometry and rendering section.
After the unique features of the architecture are described the load balancing mechanism is analyzed and the increase of performance is demonstrated.
`
`4.1 Introduction
`
Since human visual perception is the most effective way to take in a large amount of information in a short time, the photorealistic rendering for the visualization of medical, physical or technical data requires speed improvements and demands the development of innovative architectures.
Modern workstations with a state of the art graphics platform incorporate some form of hardware support for graphics applications to release the CPU from the burden of visualization tasks. Sophisticated user interfaces within any CAX application, in conjunction with high interactivity and realistic images, require the system to be split and parallelised in order to distribute the overall computing load.
This paper describes considerations made within the work for the GRACE project¹, a development which tries to satisfy the requirements of a graphics processor architecture.
`
`4.2 Background
`4.2.1 Contemporary Architectures
Common to all raster display systems is the frame buffer, which stores the image on a pixel by pixel basis and decouples the image generation and video refresh processes. The design of the frame buffer, with its partitioning related to object or screen space and the degree of parallel access it allows, is a key feature of a system's merit [16].
Attempting to satisfy the demands of increased calculation rates, a lot of architectures with different basic concepts have been developed [9], showing the demand for integrating more system functionality on a single chip.
A well known approach making extensive use of full custom VLSI devices is the design of the Pixel Planes [15],[8]. To achieve higher rendering performance and to overcome the frame buffer bottleneck, the rasterization processor and the frame buffer memory are integrated on the same chip. Similar approaches were proposed in the Scan Line Access Memory [7] and the Smart Image Memory [4].
Other architectures try to parallelize functional modules of the image generation process, e.g. by mapping the geometry section to a multistage pipeline of customized VLSI devices [5]. This design was enhanced and is now available as a fully parallelized state of the art workstation [2],[3].
Another more general architecture is the Pixel Machine, a MIMD computer based on an array of asynchronous processor nodes with parallel access to a large frame buffer [14]. The advantage of this approach is the homogeneous structure and the programmability which allows all algorithms to be implemented in software.

¹ This project was funded by the Commission of the European Community in the ESPRIT-II-Program, Project No 2569 (EuroWorkStation)

4.2.2 Goals
4.2.2.1 Principal Considerations
1. Frame Buffer

The memory in which an image is stored on a pixel by pixel basis is called the frame buffer or image memory. This memory is accessed on the one hand by the rendering processor, which writes data into the memory, and on the other hand by the video refresh controller, which reads from the memory and conveys pixel data to the video output circuitry and the display monitor.
An image memory built up with conventional DRAMs can hamper the image generation process at the rendering processor side as well as at the video refresh side. Using today's available video RAMs (VRAMs) improves the speed of frame buffer access dramatically (Whit84). Nevertheless a certain level of performance implies the need for parallelism within the frame buffer. A resolution of 1024x1280 visible pixels with a 60 Hz refresh rate (noninterlaced) requires a pixel frequency of about 110 MHz or, equivalently, a transfer rate of 330 Mbyte/s for full colour representation with 24 bits/pixel. A monolithic frame buffer can not achieve that. The maximum clock frequency of the VRAM shift register measures 30-40 MHz, so a single shift register is limited to resolutions of 640x480 pixels with 60 Hz video refresh rate (noninterlaced) or equivalent.
On the other side, display processors with 25ns cycle times have to compete for the random access port of a VRAM with a normal cycle time of about 150ns (no page, nibble or static column mode is taken into account), slowing down image generation.
The solution appears to be found in writing multiple pixels into the memory in parallel, the basic concept of the distributed frame buffer. The frame buffer could be divided into rows, columns, or arrays [9] [16] and each of these parts is attached to a separate rendering processor, thus overcoming the memory access bottleneck.
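
The bandwidth figures quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch rather than anything taken from the GRACE design: the function names are hypothetical, and the pixel clock of about 110 MHz is taken from the text as given (it is higher than the visible pixel rate because of the blanking intervals).

```python
import math

def visible_pixel_rate(width, height, refresh_hz):
    """Visible pixels that must be scanned out per second (blanking excluded)."""
    return width * height * refresh_hz

def transfer_rate_bytes(pixel_clock_hz, bits_per_pixel):
    """Bytes per second the video side must read at a given pixel clock."""
    return pixel_clock_hz * bits_per_pixel // 8

def banks_needed(pixel_clock_hz, shift_clock_hz):
    """Smallest number of parallel VRAM shift registers that reaches the pixel clock."""
    return math.ceil(pixel_clock_hz / shift_clock_hz)

PIXEL_CLOCK = 110_000_000  # about 110 MHz, as quoted in the text

print(visible_pixel_rate(1280, 1024, 60))     # visible pixels per second
print(transfer_rate_bytes(PIXEL_CLOCK, 24))   # bytes per second at 24 bits/pixel
print(banks_needed(PIXEL_CLOCK, 40_000_000))  # banks with 40 MHz shift registers
print(banks_needed(PIXEL_CLOCK, 30_000_000))  # banks with 30 MHz shift registers
```

Under these assumptions the video side alone needs three to four shift registers working in parallel, which is the motivation for the distributed frame buffer.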
`
2. Floating Point Versus Fixpoint Calculation

While designing a system architecture the central question arises which processor is the most suitable for that design.
Graphics applications are very arithmetic intensive tasks and therefore need processing devices with very powerful arithmetic and logic units (ALUs).
State of the art processors include a floating point unit on chip. Nevertheless only a few of these processors incorporate sufficiently powerful arithmetic units to perform floating point operations as fast as integer operations. Especially if the system is designed with application specific integrated circuits (ASICs) it is worth while to consider which accuracy is needed for mathematical calculations in a graphics system. Numerically intensive operations are performed in the geometry and the rendering section. Typical tasks within the geometry section requiring floating point calculation with high accuracy are the following:
- Transforming objects with world coordinates to image space
- Interpolating vertex normals (Phong shading)
- Normalizing interpolated vertex normals (Phong shading)
- Performing the lighting calculations (Phong and Gouraud shading).
Using a single precision floating point number results in a maximum inaccuracy of 2exp-150 (decimal equivalent: 7*10exp-46) per operation [13]. This is a sufficient precision for the operations mentioned above without visible effects. The rendering section comprises operations like
- colour interpolation (Gouraud shading)
- z-value interpolation for z-buffering
- transparency calculation
- algorithms for image processing.
For an image with a limited resolution most of these operations could be done with fixpoint arithmetic in an appropriate precision.
Assuming a resolution of 2048 x 2048 pixels and a fixpoint representation with a fractional part consisting of 16 bits, an RGB model colour interpolation over a whole scanline would incorporate a binary error of 2exp-6 - a deviation not perceptible for the human eye on today's monitors.
This shows that for the mathematical calculation in the geometry section floating point units are necessary, but in the rendering section the mathematical computations could be done with fixpoint precision. Therefore, if a straightforward architecture for a specific application is implemented with no parallelism on board or module level, fixpoint arithmetic may suit well - an approach that was realized and tested well for a fast Gouraud triangle shader [1].
In the system architecture discussed in this paper the processors should be able to perform rendering tasks as well as geometry calculations. This argument mainly led to the decision to incorporate digital signal processors (DSPs) which have a floating point unit on chip and were at the time of system design the fastest processors available on the market.
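
The fixpoint error estimate for colour interpolation can be illustrated with a small experiment. This is a sketch under stated assumptions, not the author's derivation: it assumes incremental interpolation (one addition per pixel), integer colour values at both ends of the scanline, and that the only error source is the per-pixel increment rounded to 16 fractional bits. The function name `worst_fixpoint_error` is hypothetical.

```python
def worst_fixpoint_error(c0, c1, n_pixels=2048, frac_bits=16):
    """Largest deviation of a fixpoint colour ramp from the exact ramp.

    c0 and c1 are integer colour values at the two ends of the scanline.
    """
    scale = 1 << frac_bits
    # Per-pixel increment rounded to the nearest representable fixpoint value;
    # the rounding error of this increment is at most 2**-(frac_bits + 1).
    step = round((c1 - c0) * scale / (n_pixels - 1))
    acc = c0 * scale  # exact, because c0 is an integer
    worst = 0.0
    for i in range(n_pixels):
        exact = c0 + (c1 - c0) * i / (n_pixels - 1)
        worst = max(worst, abs(acc / scale - exact))
        acc += step  # incremental interpolation, one addition per pixel
    return worst

for c1 in (1, 100, 255):
    err = worst_fixpoint_error(0, c1)
    print(c1, err, err <= 2 ** -6)
```

Under these assumptions the accumulated error over 2048 pixels stays below 2047 * 2exp-17, which is just under the 2exp-6 figure quoted in the text and far below one colour step.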
`
4.2.2.2 System Characteristics and Design Goals

The architecture to be realized should be capable of generating high quality images within a moderate time. That means the hardware should be as fast as possible but not as big as possible. The employed computational power should be used very effectively. Other characteristics are:
- A flexible system, high level programmable to enable the implementation of all graphics functions necessary and various algorithms for image generation.
- Parallelism should be implemented wherever possible.
- Homogeneity. To become familiar with an off-the-shelf VLSI device needs some time. To become familiar with a few different such devices needs a lot of time. Therefore the number of different off-the-shelf VLSI components had to be reduced to a minimum to ease system use and shorten software development time.
- The arithmetical and logical units (ALUs) should be available off-the-shelf.
- The frame buffer design should overcome the access bottleneck on the generation side as well as on the video side and incorporate hardware support for fast window handling.
- The frame buffer resolution is 1280 * 1024 pixels with a video refresh rate of 60 Hz (noninterlaced). Every pixel has 24 bit colour and is double buffered as well as z-buffered.
- The frame buffer should provide double buffering in order to accommodate dynamics, and z-buffering too.

4.3 The Architecture
4.3.1 Overview
Taking into account the demands of the different tasks within the image generation process, the mapping of the functional sections to hardware suggested the splitting into units as shown in Figure 4.1.
Above the frame buffer there are three different units handling the image generation process:
- The Master Module
- The Geometry Module
- The Rendering Module.
The master module is the system's supervisor, handles the communication to the host processor and is responsible for start-up and synchronising activities.
The geometry module transforms and clips the graphic primitives, subdivides biparametric patches, and performs the necessary lighting calculations and similar tasks.
The rendering module performs the shading algorithms and transfers pixel data to the frame buffer. The rendering module also supports all functions of the geometry module (Figure 4.1).
All modules contain a digital signal processor (DSP) with up to 256k * 32 bit wide, fast static memory for instruction and data storage. This type of processor was chosen because of its 60ns instruction cycles, the on-chip cache, the floating point unit and the two independent, parallel bus interfaces [10].
`
`
`
`
FIGURE 4.1. Mapping functional sections to hardware (Master Module: traversal of graphics data structure; Geometry Module: geometry calculations, transformation and clipping, lighting; Rendering Module: rendering, scan-conversion and shading)
`
`
`
4.3.1.1 The Master Module

The communication to the host processor is handled over a 256k * 32 bit dual ported memory, which allows data to be transferred and processed in parallel. The interface is asynchronous and interrupt driven for fast response and transfers data at up to 20 Mbyte/s.
The master module traverses the graphics data structure and feeds graphics data to a special first-in-first-out memory (FIFO) for delivery to the appropriate processors.
In the case of synchronizing or updating (e.g. graphics context, colour lookup tables, etc.) the master takes over system control and bypasses the pipeline with a direct access to the appropriate resource.
`
`4.3.1.2 The Geometry Module
`
Graphics data are transferred to the geometry modules at a rate of 33 Mbyte/s. The geometry module performs the transformation, clipping, polygon and patch subdivision, normal interpolation and renormalisation, and lighting operations in an appropriate manner and delivers the processed graphical primitives to the rendering module data FIFOs.
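
One of the geometry tasks listed above, normal interpolation followed by renormalisation, is easy to illustrate. The sketch below is generic textbook code and not the DSP implementation described in the paper; the helper names `lerp`, `length`, `normalize` and `interpolated_normal` are hypothetical.

```python
import math

def lerp(a, b, t):
    """Linear interpolation between two 3D vectors, 0 <= t <= 1."""
    return tuple((1.0 - t) * x + t * y for x, y in zip(a, b))

def length(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))

def normalize(v):
    """Scale a vector to unit length."""
    n = length(v)
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return tuple(x / n for x in v)

def interpolated_normal(n0, n1, t):
    """Interpolate two vertex normals and renormalise the result."""
    return normalize(lerp(n0, n1, t))

n0 = (1.0, 0.0, 0.0)
n1 = (0.0, 1.0, 0.0)
print(length(lerp(n0, n1, 0.5)))                 # shorter than 1 before renormalisation
print(length(interpolated_normal(n0, n1, 0.5)))  # unit length afterwards
```

The square root and division in the renormalisation step are the kind of operation for which the floating point unit of the processor is needed.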
`
`4.3.1.3 The Rendering Module
`
The structure of the rendering module is similar to that of the geometry module. For rendering calculations like shading and scan conversion the processor fetches data from its data FIFO and conveys the calculated pixel values to the frame buffer. For image processing purposes data are read from the frame buffer, manipulated and written back.
Because the rendering module can act as a geometry module too, it can also directly fetch graphics data from the master data FIFO and deliver processed data to the FIFOs of the appropriate rendering modules (see Section 4.5).
`
`
FIGURE 4.2. Basic module structure

FIGURE 4.3. Frame buffer interleaving
`
`4.3.1.4 The Frame Buffer
`
The frame buffer is distributed and divided into 5 parts, with an overall resolution of 1280x1024 pixels with 88 bits per pixel (2x24 bit colour, 24 bit z-buffer, 8 bit transparency, 8 bit window identifier) and a video refresh rate of 60Hz (noninterlaced).
Clipping at arbitrarily shaped windows is supported by hardware, as well as fast copying of windows and bit block operations at high data transfer rates [11].
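
The pixel layout described above translates directly into memory requirements. The following sketch adds up the bit planes and the memory per frame buffer part. The plane names are descriptive labels chosen here, and the mapping `bank_of` is an assumption for illustration only: the text does not state which pixels go to which of the five parts, so a simple column interleave is assumed.

```python
WIDTH, HEIGHT, BANKS = 1280, 1024, 5

# Bit planes per pixel as listed in the text.
PLANES = {
    "colour buffer A": 24,
    "colour buffer B": 24,
    "z-buffer": 24,
    "transparency": 8,
    "window identifier": 8,
}

def bits_per_pixel():
    """Total number of bits stored for every pixel."""
    return sum(PLANES.values())

def bytes_per_bank():
    """Memory needed by each of the frame buffer parts, in bytes."""
    total_bits = WIDTH * HEIGHT * bits_per_pixel()
    return total_bits // 8 // BANKS

def bank_of(x, y):
    """ASSUMPTION: consecutive pixels of a row go to consecutive parts."""
    return x % BANKS

print(bits_per_pixel())
print(bytes_per_bank())
print([bank_of(x, 0) for x in range(7)])
```

With such an interleave neighbouring pixels fall into different parts, so the rendering processors attached to them can write a short run of pixels in parallel.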
`4.3.2 Overall Architecture
The system is organized as a pipeline with additional parallelism on the functional level, obtained by multiplying geometry and rendering modules (see Figure 4.4). The number of rendering modules is fixed to a multiple of five for technical reasons, whereas the number of geometry modules is theoretically unlimited. The current configuration comprises three geometry and five rendering modules.
Three independent busses enable parallel data transfer to and from multiple resources of the system.
All modules are connected to the geometry bus, which acts as the system bus. All system resources are accessible by the master. System, graphics or update data are transferred in single or broadcast mode with 33 Mbyte/s.
The rendering bus is designated to convey only rendering primitives to the data FIFOs on the rendering modules. For speed reasons data are transferred synchronously with up to 132 Mbyte/s.
The init bus allows a direct access to the video and cursor planes, used for fast update of the colour look up tables (CLUT) and for generating the cursor in separate cursor planes.
Each rendering processor writes data with 33 Mbyte/s to the frame buffer bank attached to it, which results in a total transfer rate of 165 Mbyte/s.
The frame buffer and the video/cursor plane memories can also be accessed by the host processor in order to bypass the graphics pipeline. This supports e.g. the handling of pixel maps if the host processor wants to transfer pixel values to or read them back from the image memory.
`
`4.4 Dataflow
`
In the entire system graphics data are processed simultaneously and transferred to the subsequent modules in parallel. From stage to stage the number of elements per object increases as the content of information per element decreases (Figure 4.5).
The master module traverses the graphics data structure and puts the high order primitives like splines, polygons, meshes or triangles into the data FIFO. If a geometry module has finished the last task, it accesses the geometry bus and fetches the next primitive or task automatically. All geometry calculations are done within a single module.
The logical interface between the geometry and rendering calculations transfers triangles, vectors, pixels and trapeziums with edges parallel to the screen y axis columns. The data structure incorporates processor specific data (due to the distributed frame buffer) and common data. The latter are broadcast to the rendering modules.
`
FIGURE 4.4. System architecture
`
FIGURE 4.5. Graphics data processing (data stages: graphical data structure; graphical objects such as splines, polygons, meshes; graphical primitives; pixels. Processing units: Master Module, G-Module, GR-Module, frame buffer)
`4.5 Load Balancing
4.5.1 Automatical Regulation
The effort of computation in the geometry and rendering section depends on size and position of the geometrical objects. Small triangles or short vectors parallel to the x or y axis require only a small number of rendering operations. In fact the time consumed to initialize the rendering processor for primitives producing only a few pixels is greater than the rendering time itself. On the other hand the number of geometric calculations for interpolating shading methods is independent of the resulting size of the primitive.
An analysis of scene complexity has shown that in most cases the image is generated from a lot of small triangles (1-10 pixels), a number of medium sized ones (11-100 pixels) and a few large ones (101-1000 pixels) [6].
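
The dependence of the bottleneck on triangle size can be made concrete with a simple throughput model. All timing constants below (`t_geom`, `t_init`, `t_pixel`) are hypothetical values chosen only for illustration and are not measurements from the paper; the module counts of three geometry and five rendering modules follow the current configuration described in Section 4.3.2.

```python
def stage_rates(pixels_per_triangle, n_geometry=3, n_rendering=5,
                t_geom=100.0, t_init=30.0, t_pixel=1.0):
    """Triangles per time unit that each section can sustain.

    t_geom:  geometry time per triangle, independent of its size (hypothetical)
    t_init:  set-up time of the rendering processor per triangle (hypothetical)
    t_pixel: rendering time per generated pixel (hypothetical)
    """
    geometry_rate = n_geometry / t_geom
    rendering_rate = n_rendering / (t_init + pixels_per_triangle * t_pixel)
    return geometry_rate, rendering_rate

def bottleneck(pixels_per_triangle, **kwargs):
    """Name of the section that limits throughput for a given triangle size."""
    geometry_rate, rendering_rate = stage_rates(pixels_per_triangle, **kwargs)
    return "geometry" if geometry_rate < rendering_rate else "rendering"

for size in (5, 50, 1000):
    print(size, bottleneck(size))
```

With these invented constants small triangles leave the rendering modules waiting for the geometry section, while large triangles reverse the situation, which is why a fixed split of processors between the two sections cannot suit every scene.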
Further investigations with less complex scenes (no more than 5000 triangles) have shown a more extreme distribution of the size of the triangles incorporated (see the statistics shown below). The pictures are shown at the end of this paper.
The reason is the way of modeling a scene, i.e. things of interest are generated with a lot of primitives (triangles) to get a fine grained surface. The rest of the scene, especially the background, is defined with only a few but very large primitives.

FIGURE 4.6. Image statistics for "breakfast" (bar chart; x axis: pixels per triangle in the bins 0-30, 30-60, 60-300, 300-1200, 1200-6000, 6000-30000, 30000-60000, >60000; y axis: 0 to 100; legend: % of total triangles, % of total triangle area)

The pictures analyzed in the statistics below were rendered with a resolution of 1024 x 1280 pixels. The chessman figure is an example of a picture defined without background.
Additionally, the future increase in graphics performance will be used to display more complex scenes rather than displaying the same number of objects faster. Those images will comprise a lot of very small triangles, shifting the load of computation to the geometry section.
Nevertheless the size and the number of triangles an image consists of may vary from scene to scene or even from view to view. This will cause idle states within a fixed balanced architecture. With the intention of exploiting all the distributed computational power of the system, the processing units have to be able to adapt their activities to the actual processing requirements of the scene.
To enable such a dynamic load balancing and to speed up geometry calculation dynamically if required, the rendering modules are capable of performing all the geometry