`
Springer-Verlag
`
`Realtek Ex. 1009
`
`Case No. IPR2023-00922
`Page 1 of 26
`
`Focus on Computer Graphics
`Tutorials and Perspectives in Computer Graphics
`Edited by W.T. Hewitt, R. Gnatz, and W. Hansmann
`
`A. Kaufman (Ed.)
`
`Rendering, Visualization
`and Rasterization Hardware
`
`With 100 Figures
`
Springer-Verlag
Berlin Heidelberg New York
London Paris Tokyo
Hong Kong Barcelona
Budapest
`
`Focus on Computer Graphics
`Edited by W.T. Hewitt, R. Gnatz, and W. Hansmann,
for EUROGRAPHICS
`The European Association for Computer Graphics
`P. O. Box 16, CH-1288 Aire-la-Ville, Switzerland
`
`Volume Editor
`
`Arie Kaufman
`Department of Computer Science
`State University of NY at Stony Brook
`Stony Brook, NY 11794-4400, USA
`
`
`
`
`
`
`
`
Cover picture: H. Selzer, Fraunhofer-Institut
für Graphische Datenverarbeitung (see also contribution p. 37)
`
`ISBN 3-540-56787-9 Springer-Verlag Berlin Heidelberg New York
`ISBN 0-387-56787-9 Springer-Verlag New York Berlin Heidelberg
Library of Congress Cataloging-in-Publication Data
Rendering, visualization and rasterization hardware / A. Kaufman (ed.). p. cm. -- (Focus on
computer graphics) "Comprehensive record of the contributions to the Sixth Eurographics
Workshop on Graphics Hardware held on 1-2 September, 1991 in Vienna, Austria, in conjunction
with the Eurographics '91 Conference" -- Pref. Includes bibliographical references and index.
ISBN 0-387-56787-9 (U.S.) 1. Computer graphics--Congresses. 2. Computer input-output
equipment--Congresses. I. Kaufman, Arie. II. Eurographics Workshop on Graphics Hardware
(6th : 1991 : Vienna, Austria). III. EUROGRAPHICS (1991 : Vienna, Austria). IV. Series.
T385.R458 1993
621.39'9-dc20
This work is subject to copyright. All rights are reserved, whether the whole or part of the material
is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication
of this publication or parts thereof is permitted only under the provisions of the German Copyright
Law of September 9, 1965, in its current version, and permission for use must always be obtained
from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law.
© 1993 EUROGRAPHICS The European Association for Computer Graphics
Printed in Germany
The use of general descriptive names, registered names, trademarks, etc. in this publication does not
imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
Typesetting: Camera ready copy by authors/editors
45/3140 - 5 4 3 2 1 0 - Printed on acid-free paper
`
`Preface
`
The material in this book represents a comprehensive record of the contributions to the
Sixth Eurographics Workshop on Graphics Hardware, held on 1-2 September 1991 in Vienna,
Austria, in conjunction with the Eurographics '91 Conference. It was the sixth in an
established series of workshops. These workshops have been an excellent forum for the
exchange of information and ideas on the latest developments and work-in-progress reports
in the field of graphics hardware. The papers in this book are revised versions of those
presented at the Workshop. The papers were revised based on the reviewers' comments and
the discussions during the Workshop.
The book has five parts and a keynote paper. The keynote paper is by Kurt Akeley,
Vice President and Chief Engineer of Silicon Graphics, who delivered the keynote address
on "Issues and Directions for Graphics Hardware Accelerators" at the Workshop. The
first part of the book concerns graphics hardware design. The papers in this part discuss
simulation and silicon compilers for such a design. The second part contains two papers on
graphics systems: a high-performance graphics system and the I.M.O.G.E.N.E. machine.
The third part focuses on volume (voxel-based) machines. The papers in this part describe
two devices to facilitate transformations of volumes. The fourth part of this book includes
papers on rasterization systems, including character rasterization and scan-conversion of
triangular faces. The papers in the last part of the book focus on rendering machines. They
include a programmable rendering engine, primitive shaders, and radiosity implementation
on a parallel architecture.
The book is a testimony that there are flourishing activities in the development of novel
architectural and algorithmic ideas in graphics hardware. Specifically, the impact of VLSI
technology, newly developed algorithms and approaches, and the increasing diversity of
applications encourage new hardware solutions and keep the graphics hardware topic a
viable research and development area.
I am very grateful for the amount of time and energy put into the refereeing process and
the planning of the Workshop by the members of the Program Committee. In addition,
I would like to thank the Eurographics Association for supporting the Workshop series;
Max Mehl from FhG-AGD, Darmstadt, for his effort in organizing the Workshop; the
Technical University of Vienna for hosting the event; Gerhard Hiess from TU Vienna for
local organization; and my students Cláudio Silva and Juliana Silva for preparing the
book for publication. Last, but not least, my thanks go to the authors of the papers for
the careful preparation of their manuscripts.
`
`Stony Brook, New York
`Spring 1993
`
`Arie Kaufman
`
`Sixth Eurographics Workshop on Graphics Hardware
Vienna, Austria
1-2 September 1991

Workshop Chairman
Professor A. Kaufman, State University of New York at Stony Brook, USA

Local Organisation Chairman
Dr. M. Mehl, FhG-AGD, Darmstadt, Germany

Workshop Programme Committee
Prof. R.L. Grimsdale (University of Sussex, UK)
Dr. F. Kitson (HP Labs, Palo Alto, CA, USA)
Dr. P. Leray (CCETT, France)
Drs. A.A.M. Kuijk (Centre for Mathematics and Computer Science, Amsterdam, NL)
Prof. W. Strasser (University of Tuebingen, Germany)
Dr. S. Molnar (University of North Carolina, Chapel Hill, USA)
Dr. J.R. Rossignac (IBM Thomas J. Watson Research Center, USA)
Dr. C. Shaw (University of Alberta, Canada)
`
`Table of Contents
`
Keynote Speaker

1 Issues and Directions for Graphics Hardware Accelerators
Kurt Akeley

I Graphics Hardware Design

2 XInPosse: Structural Simulation for Graphics Hardware
M.A. Guravage, E.H. Blake, A.A.M. Kuijk

3 Silicon Compilers for Graphics Hardware Design?
Oliver Renz, Alwin Groene

II Graphics Systems

4 Dynamic Load Balancing within a High Performance Graphics System
Harald Selzer

5 The I.M.O.G.E.N.E. Machine: Some Hardware Elements
V. Lefévére, S. Karpf, C. Chaillou, M. Mériaux

III Volume Machines

6 The Conveyor - an Interconnection Device for Parallel Volumetric Transformations
Daniel Cohen, Reuven Bakalash

7 The Flipping Cube: A Device for Rotating 3D Rasters
Roni Yagel

IV Rasterization

8 Hardware Outline Character Rasterization
Marc Morgan, Roger D. Hersch

9 Accurate Scanconversion of Triangulated Surfaces
Jarek R. Rossignac
`
`
`
V Rendering Machines

10 Testing Geometric Primitive Shaders
G. J. Dunnett, M. White, P. F. Lister, R. L. Grimsdale

11 An Architecture for a High Performance Rendering Engine
Hans-Josef Ackermann, Christoph Hornung

12 Space Partitioning for Mapping Radiosity Computations onto a Pipelined Parallel Architecture (I)
L.S. Shen, F.A.J. Laarakker, E. Deprettere

List of Contributors
`
`Keynote Speaker
`
`Kurt Akeley
`
4 Dynamic Load Balancing within a High Performance Graphics System
`
`Harald Selzer
`
ABSTRACT Interactive 3D graphics applications require significant arithmetic processing
to meet the ever-increasing desire for higher image complexity and higher
resolution in displayed images.
This paper describes a graphics processor architecture with a high degree of parallelism
connected to a distributed frame buffer. The architecture can be configured
with an arbitrary number of identical, high level programmable processors operating
in parallel.
Within the architecture an automatic load balancing mechanism is presented which
distributes the processing load between the geometry and rendering sections.
After the unique features of the architecture have been described, the load balancing
mechanism is analyzed and the increase in performance is demonstrated.
`
4.1 Introduction
`
Since human visual perception is the most effective way to take in a large amount of
information in a short time, photorealistic rendering for the visualization of medical,
physical or technical data requires speed improvements and calls for the development of
innovative architectures.

Modern workstations with a state of the art graphics platform incorporate some form
of hardware support for graphics applications to release the CPU from the burden of
visualization tasks. Sophisticated user interfaces within any CAX application, in conjunction
with high interactivity and realistic images, make it necessary to split and parallelise
the system in order to distribute the overall computing load.

This paper describes considerations made within the work for the GRACE project¹, a
development which tries to satisfy the requirements of a graphics processor architecture.
`
`4.2 Background
`
`4.2.1 Contemporary Architectures
`
Common to all raster display systems is the frame buffer, which stores the image on a
pixel by pixel basis and decouples the image generation and video refresh processes. The
design of the frame buffer, with its partitioning related to object or screen space and its
degree of parallel access, is a key feature of a system's merit [16].
In an attempt to satisfy the demand for increased calculation rates, many architectures
with different basic concepts have been developed [9], showing the demand for integrating
more system functionality on a single chip.
A well known approach making extensive use of full custom VLSI devices is the design
of Pixel Planes [15], [8].
To achieve higher rendering performance and to overcome the frame buffer bottleneck,
the rasterization processor and the frame buffer memory are integrated on the same chip.
Similar approaches were proposed in the Scan Line Access Memory [7] and the Smart
Image Memory [4].
Other architectures try to parallelize functional modules of the image generation process,
e.g. by mapping the geometry section to a multistage pipeline of customized VLSI devices
[5]. This design was enhanced and is now available as a fully parallelized state of the art
workstation [2], [3].
Another, more general architecture is the Pixel Machine, a MIMD computer based on
an array of asynchronous processor nodes with parallel access to a large frame buffer [14].
The advantage of this approach is the homogeneous structure and the programmability,
which allows all algorithms to be implemented in software.

¹ This project was funded by the Commission of the European Community in the
ESPRIT-I-Program, Project No. 2569 (EuroWorkStation)
`
`4.2.2 Goals
`
`4.2.2.1 Principal Considerations
`1. Frame Buffer
The memory in which an image is stored on a pixel by pixel basis is called the frame
buffer or image memory. This memory is accessed on the one hand by the rendering
processor, which writes data into the memory, and on the other hand by the video refresh
controller, which reads from the memory and conveys pixel data to the video output
circuitry and the display monitor.
An image memory built from conventional DRAMs can slow down the image generation
process on the rendering processor side as well as on the video refresh side. Using today's
video RAMs (VRAMs) improves the speed of frame buffer access dramatically (Whit84).
Nevertheless, a certain level of performance implies the need for parallelism within the frame
buffer. A resolution of 1280x1024 visible pixels with a 60 Hz refresh rate (noninterlaced)
requires a pixel frequency of about 110 MHz, equivalent to a transfer rate of 330 Mbyte/s for
full colour representation with 24 bits/pixel. A monolithic frame buffer cannot achieve
that. The maximum clock frequency of the VRAM shift register is 30-40 MHz, which
limits a single shift register to resolutions of about 640x480 pixels at a 60 Hz video refresh
rate (noninterlaced) or equivalent.
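The arithmetic behind these figures can be retraced with a short sketch. The function names are illustrative and not taken from the paper. The gap between the visible pixel rate (about 79 Mpixel/s) and the quoted 110 MHz pixel frequency is presumably the blanking overhead of the video timing, which is treated here as an assumption rather than modelled.

```python
import math


def visible_pixel_rate_mhz(width: int, height: int, refresh_hz: float) -> float:
    """Rate of visible pixels only, in Mpixel/s (blanking intervals not included)."""
    return width * height * refresh_hz / 1e6


def refresh_bandwidth_mbyte_s(pixel_clock_mhz: float, bits_per_pixel: int) -> float:
    """Video-side transfer rate in Mbyte/s for a given pixel clock."""
    return pixel_clock_mhz * bits_per_pixel / 8


def min_parallel_shift_registers(pixel_clock_mhz: float, shift_clock_mhz: float) -> int:
    """Smallest number of VRAM shift registers that together sustain the pixel clock."""
    return math.ceil(pixel_clock_mhz / shift_clock_mhz)


if __name__ == "__main__":
    print(visible_pixel_rate_mhz(1280, 1024, 60))     # visible pixels per second, in millions
    print(refresh_bandwidth_mbyte_s(110, 24))         # Mbyte/s at the quoted 110 MHz
    print(min_parallel_shift_registers(110, 30))      # banks needed with 30 MHz shift registers
    print(min_parallel_shift_registers(110, 40))      # banks needed with 40 MHz shift registers
```

Under these numbers at least three to four shift registers have to feed the video output in parallel, which is consistent with the argument for a distributed frame buffer.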
On the other side, display processors with 25 ns cycle times have to compete for the
random access port of a VRAM with a normal cycle time of about 150 ns (no page, nibble
or static column mode is taken into account), which slows down image generation.
The solution appears to lie in writing multiple pixels into the memory in parallel,
the basic concept of the distributed frame buffer. The frame buffer can be divided into
rows, columns, or arrays [9], [16], with each of these parts attached to a separate rendering
processor, thus overcoming the memory access bottleneck.
`
2. Floating Point Versus Fixpoint Calculation
When designing a system architecture, the central question is which processor
is the most suitable for that design.
Graphics applications are very arithmetic intensive tasks and therefore need processing
devices with very powerful arithmetic and logic units (ALUs).
State of the art processors include a floating point unit on chip. Nevertheless, only a
few of these processors incorporate sufficiently powerful arithmetic units to perform floating
point operations as fast as integer operations. Especially if the system is designed with
application specific integrated circuits (ASICs), it is worthwhile to consider which accuracy
is needed for mathematical calculations in a graphics system. Numerically intensive
operations are performed in the geometry and the rendering sections. Typical tasks within the
geometry section requiring floating point calculation with high accuracy are the following:
- Transforming objects from world coordinates to image space
- Interpolating vertex normals (Phong shading)
- Normalizing interpolated vertex normals (Phong shading)
- Performing the lighting calculations (Phong and Gouraud shading).
Using a single precision floating point number results in a maximum inaccuracy of 2^-150
(decimal equivalent: 7 x 10^-46) per operation [13]. This precision is sufficient for
the operations mentioned above without visible effects. The rendering section comprises
operations like
- colour interpolation (Gouraud shading)
- z-value interpolation for z-buffering
- transparency calculation
- algorithms for image processing.
For an image with a limited resolution most of these operations can be done with
fixpoint arithmetic in an appropriate precision.
Assuming a resolution of 2048 x 2048 pixels and a fixpoint representation with a
fractional part consisting of 16 bits, an RGB model colour interpolation
over a whole scanline would incur a binary error of 2^-6, a deviation not
perceptible to the human eye on today's monitors.
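One way to arrive at the 2^-6 figure is sketched below. It assumes incremental (forward differencing) interpolation in which the per-pixel colour step is rounded once to 16 fractional bits, so that each of the 2048 additions contributes at most half a least significant bit. This reading of the error bound is an assumption, not a derivation given in the paper, and the function names are illustrative.

```python
FRAC_BITS = 16      # fractional bits of the fixpoint format (from the text)
SCANLINE = 2048     # pixels per scanline (from the text)


def to_fixed(value: float, frac_bits: int = FRAC_BITS) -> int:
    """Round a real value to the nearest fixpoint integer."""
    return round(value * (1 << frac_bits))


def worst_case_error(n_pixels: int = SCANLINE, frac_bits: int = FRAC_BITS) -> float:
    """Assumed bound: n additions, each off by at most half an LSB of the step."""
    return n_pixels * 2.0 ** -(frac_bits + 1)


def interpolation_error(c0: float, c1: float, n_pixels: int = SCANLINE,
                        frac_bits: int = FRAC_BITS) -> float:
    """Interpolate c0 -> c1 with integer additions; return the error at the last pixel."""
    scale = 1 << frac_bits
    step = to_fixed((c1 - c0) / n_pixels, frac_bits)   # quantised once per span
    acc = to_fixed(c0, frac_bits)
    for _ in range(n_pixels):
        acc += step                                    # one integer add per pixel
    return abs(acc / scale - c1)


if __name__ == "__main__":
    print(worst_case_error())                 # bound in colour levels
    print(interpolation_error(0.0, 254.3))    # observed error for one example span
```

The error is expressed in units of one colour level, so a bound of 2^-6 stays far below the step between two displayable 8 bit intensities.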
This shows that floating point units are necessary for the mathematical calculations in
the geometry section, whereas in the rendering section the computations can
be done with fixpoint precision. Therefore, if a straightforward architecture for a specific
application is implemented with no parallelism on board or module level, fixpoint
arithmetic may suit well, an approach that was realized and tested for a fast Gouraud
triangle shader [1].
In the system architecture discussed in this paper the processors should be able to
perform rendering tasks as well as geometry calculations. This argument mainly led to the
decision to incorporate digital signal processors (DSPs), which have a floating point unit
on chip and were, at the time of system design, the fastest processors available on the
market.
`
`4.2.2.2 System Characteristics and Design Goals
`
The architecture to be realized should be capable of generating high quality images within
a moderate time. That means the hardware should be as fast as possible but not as big
as possible. The employed computational power should be used very effectively. Other
characteristics are:
- A flexible system, high level programmable, to enable the implementation of all necessary
graphics functions and of various algorithms for image generation.
- Parallelism should be implemented wherever possible.
- Homogeneity. Becoming familiar with one off-the-shelf VLSI device takes some time.
Becoming familiar with several different such devices takes a lot of time. Therefore the
number of different off-the-shelf VLSI components had to be reduced to a minimum to
ease system use and shorten software development time.
- The arithmetical and logical units (ALUs) should be available off-the-shelf.
- The frame buffer design should overcome the access bottleneck on the generation side
as well as on the video side and incorporate hardware support for fast window handling.
- The frame buffer resolution is 1280 x 1024 pixels with a video refresh rate of 60 Hz
(noninterlaced). Every pixel has 24 bit colour and is double buffered as well as z-buffered.
- The frame buffer should provide double buffering in order to accommodate dynamics,
and z-buffering too.
`
`4.3 The Architecture
`
`4.3.1 Overview
`
Taking into account the demands of the different tasks within the image generation process,
the mapping of the functional sections to hardware suggested a split into units as
shown in Figure 4.1.
Above the frame buffer there are three different units handling the image generation
process:
- The Master Module
- The Geometry Module
- The Rendering Module.
The master module is the system's supervisor; it handles the communication with the host
processor and is responsible for start-up and synchronising activities.
The geometry module transforms and clips the graphic primitives, subdivides biparametric
patches, and performs the necessary lighting calculations and similar tasks.
The rendering module performs the shading algorithms and transfers pixel data to the
frame buffer. The rendering module also supports all functions of the geometry module
(Figure 4.1).
All modules contain a digital signal processor (DSP) with up to 256k x 32 bit of
fast static memory for instruction and data storage. This type of processor was chosen
because of its 60 ns instruction cycles, the on-chip cache, the floating point unit and
the two independent, parallel bus interfaces [10].
`
[Figure: Master Module: traversal of the graphics data structure. Geometry Module: geometry
calculations (transformation and clipping, lighting). Rendering Module: rendering (scan
conversion and shading).]

FIGURE 4.1. Mapping functional sections to hardware
`
4.3.1.1 The Master Module

The communication with the host processor is handled via a 256k x 32 bit dual ported
memory, which allows data to be transferred and processed in parallel. The interface is
asynchronous and interrupt driven for fast response and transfers data at up to 20 Mbyte/s.
The master module traverses the graphics data structure and feeds graphics data to a
special first-in-first-out memory (FIFO) for delivery to the appropriate processors.
For synchronizing or updating (e.g. graphics context, colour lookup tables,
etc.) the master takes over system control and bypasses the pipeline with direct access
to the appropriate resource.
`
`4.3.1.2 The Geometry Module
`
Graphics data are transferred to the geometry modules at a rate of 33 Mbyte/s. The
geometry module performs the transformation, clipping, polygon and patch subdivision,
normal interpolation and renormalisation, and lighting operations as required,
and delivers the processed graphical primitives to the rendering module data FIFOs.
`
`4.3.1.3 The Rendering Module
`
`The structure of the rendering module is similar to that of the geometry module. For
`rendering calculations like shading and scan conversion the processor fetches data from
`its data FIFO and conveys the calculated pixel values to the frame buffer. For image
`processing purposes data are read from the frame buffer, manipulated and written back.
`Because the rendering module can act as a geometry module too, it can also directly
`fetch graphics data from the master data FIFO and deliver processed data to the FIFOs
`of the appropriate rendering modules (see Section 4.5).
`
FIGURE 4.3. Frame buffer interleaving
`
`4.3.1.4 The Frame Buffer
`
The frame buffer is distributed and divided into 5 parts. It has an overall resolution of
1280x1024 pixels with 88 bits per pixel (2x24 bit colour, 24 bit z-buffer, 8 bit transparency,
8 bit window identifier) and a video refresh rate of 60 Hz (noninterlaced).
Clipping at arbitrarily shaped windows is supported by hardware, as are fast copying
of windows and bit block operations at high data transfer rates [11].
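The per-pixel layout translates into a memory budget that can be checked with a few lines. The class and function names are illustrative. The mapping of pixels to the five banks is not recoverable from this text, so the `bank_of_pixel` rule below (consecutive pixels of a row go to consecutive banks) is purely a hypothetical example of interleaving, not the scheme of the paper.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PixelLayout:
    """Bits stored per pixel, as listed in the text."""
    colour_bits: int = 2 * 24        # double buffered 24 bit colour
    z_bits: int = 24
    transparency_bits: int = 8
    window_id_bits: int = 8

    @property
    def bits_per_pixel(self) -> int:
        return (self.colour_bits + self.z_bits
                + self.transparency_bits + self.window_id_bits)


def frame_buffer_bytes(width: int = 1280, height: int = 1024,
                       layout: PixelLayout = PixelLayout()) -> int:
    """Total frame buffer size in bytes."""
    return width * height * layout.bits_per_pixel // 8


def bank_of_pixel(x: int, y: int, banks: int = 5) -> int:
    """HYPOTHETICAL interleaving rule: column x belongs to bank x mod banks."""
    return x % banks


if __name__ == "__main__":
    total = frame_buffer_bytes()
    print(PixelLayout().bits_per_pixel, total, total // 5)
    print(Counter(bank_of_pixel(x, 0) for x in range(1280)))
```

With 1280 columns and five banks, a column-wise rule of this kind gives every bank the same number of pixels per scanline, which is one plausible reason for a five-way split at this resolution.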
`
`4.3.2 Overall Architecture
`
The system is organized as a pipeline with additional parallelism on the functional level,
obtained by replicating geometry and rendering modules (see Figure 4.4). The number of
rendering modules is fixed to a multiple of five for technical reasons, whereas the number
of geometry modules is theoretically unlimited. The current configuration
comprises three geometry and five rendering modules.
Three independent busses enable parallel data transfer to and from multiple resources
of the system.
All modules are connected to the geometry bus, which acts as the system bus. All system
resources are accessible by the master. System, graphics or update data are transferred
in single or broadcast mode at 33 Mbyte/s.
The rendering bus is designated to convey only rendering primitives to the data FIFOs
on the rendering modules. For speed reasons data are transferred synchronously at up
to 132 Mbyte/s.
The init bus allows direct access to the video and cursor planes; it is used for fast update
of the colour look up tables (CLUT) and for generating the cursor in separate cursor planes.
Each rendering processor writes data at 33 Mbyte/s to the frame buffer bank attached
to it, which results in a total transfer rate of 165 Mbyte/s.
The frame buffer and the video/cursor plane memories can also be accessed by the host
processor in order to bypass the graphics pipeline. This supports e.g.
the handling of pixel maps if the host processor wants to transfer pixel values to or read
them back from the image memory.
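The configuration rule and the aggregate write rate can be captured in a small sketch. `Configuration` and its field names are hypothetical. The linear scaling of the write rate with the number of rendering modules is only confirmed by the text for the five-module case and is an assumption beyond that.

```python
from dataclasses import dataclass

RENDER_GROUP = 5            # rendering modules come in multiples of five (from the text)
BANK_WRITE_MBYTE_S = 33     # write rate of one rendering processor to its own bank


@dataclass(frozen=True)
class Configuration:
    geometry_modules: int
    rendering_modules: int

    def __post_init__(self) -> None:
        if self.geometry_modules < 0:
            raise ValueError("number of geometry modules cannot be negative")
        if (self.rendering_modules < RENDER_GROUP
                or self.rendering_modules % RENDER_GROUP != 0):
            raise ValueError("rendering modules must be a positive multiple of five")

    @property
    def frame_buffer_write_mbyte_s(self) -> int:
        """Aggregate write rate, assuming every module writes to its own bank in parallel."""
        return self.rendering_modules * BANK_WRITE_MBYTE_S


if __name__ == "__main__":
    current = Configuration(geometry_modules=3, rendering_modules=5)
    print(current.frame_buffer_write_mbyte_s)
```

For the configuration described in the text (three geometry and five rendering modules) this reproduces the quoted total of 165 Mbyte/s.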
`
4.4 Dataflow

Throughout the system, graphics data are processed simultaneously and transferred to the
subsequent modules in parallel. From stage to stage the number of elements per object
increases while the information content per element decreases (Figure 4.5).
The master module traverses the graphics data structure and puts high order primitives
such as splines, polygons, meshes or triangles into the data FIFO. When a geometry module
has finished its last task, it accesses the geometry bus and fetches the next primitive or
task automatically. All geometry calculations are done within a single module.
The logical interface between the geometry and rendering calculations transfers triangles,
vectors, pixels and trapeziums with edges parallel to the screen y axis (columns). The
data structure incorporates processor specific data (due to the distributed frame buffer)
and common data. The latter are broadcast to the rendering modules.
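The pull-based hand-over from the master data FIFO to the geometry modules can be illustrated in software. This is a minimal sketch with threads standing in for the DSP modules and Python queues standing in for the hardware FIFOs. The function `run_pipeline` and the sentinel mechanism are illustrative assumptions, not part of the described hardware.

```python
import queue
import threading


def run_pipeline(primitives, n_geometry=3):
    """Each geometry worker fetches the next primitive as soon as it is free."""
    master_fifo = queue.Queue()      # stands in for the master data FIFO
    rendering_fifo = queue.Queue()   # stands in for the rendering module data FIFOs

    def geometry_module(module_id):
        while True:
            primitive = master_fifo.get()
            if primitive is None:            # sentinel: the master has no more work
                break
            # Placeholder for transformation, clipping and lighting.
            rendering_fifo.put((module_id, primitive))

    workers = [threading.Thread(target=geometry_module, args=(i,))
               for i in range(n_geometry)]
    for worker in workers:
        worker.start()
    for primitive in primitives:             # master traverses the data structure
        master_fifo.put(primitive)
    for _ in workers:                        # one sentinel per worker
        master_fifo.put(None)
    for worker in workers:
        worker.join()
    return [rendering_fifo.get() for _ in range(rendering_fifo.qsize())]


if __name__ == "__main__":
    for module_id, primitive in run_pipeline(["spline", "mesh", "triangle", "polygon"]):
        print(module_id, primitive)
```

Because a free worker always takes the next entry, the work spreads itself over the geometry modules without any explicit scheduling by the master.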
`
[Figure: block diagram of the master module, the GR-modules (each with DSP, program memory,
FIFO and arbiter), the frame buffer and the video/cursor planes, connected by the geometry
bus, rendering bus, bank bus and init bus.]

FIGURE 4.4. System architecture
`
[Figure: graphical data structure, then graphical objects (splines, polygons, meshes ...),
then primitives, then pixels, passing through the Master Module, G-Module, GR-Module and
frame buffer.]

FIGURE 4.5. Graphics data processing
`
`4.5 Load Balancing
`
4.5.1 Automatic Regulation

The computational effort in the geometry and rendering sections depends on the size and
position of the geometrical objects. Small triangles or short vectors parallel to the x or
y axis require only a small number of rendering operations. In fact, the time needed
to initialize the rendering processor for primitives producing only a few pixels is greater
than the rendering time itself. On the other hand, the number of geometric calculations
for interpolating shading methods is independent of the resulting size of the primitive.
An analysis of scene complexity has shown that in most cases the image is generated
from a large number of small triangles (1-10 pixels), a number of medium sized ones
(11-100 pixels) and a few large ones (101-1000 pixels) [6].
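Statistics of this kind, the share of each size class in the number of triangles and in the covered area, can be gathered with a short routine. The class boundaries follow the three ranges named above. The extra class for triangles above 1000 pixels, the function name and the sample data are assumptions made for illustration.

```python
from bisect import bisect_left

UPPER_EDGES = [10, 100, 1000]        # inclusive upper limits in pixels per triangle
LABELS = ["small (1-10)", "medium (11-100)", "large (101-1000)", "very large (>1000)"]


def size_statistics(pixels_per_triangle):
    """Return {class label: (% of total triangles, % of total triangle area)}."""
    if not pixels_per_triangle:
        raise ValueError("need at least one triangle")
    counts = [0] * len(LABELS)
    areas = [0] * len(LABELS)
    for pixels in pixels_per_triangle:
        size_class = bisect_left(UPPER_EDGES, pixels)   # 10 -> small, 11 -> medium, ...
        counts[size_class] += 1
        areas[size_class] += pixels
    total_count = len(pixels_per_triangle)
    total_area = sum(areas)
    if total_area == 0:
        raise ValueError("triangles cover no pixels")
    return {label: (100.0 * count / total_count, 100.0 * area / total_area)
            for label, count, area in zip(LABELS, counts, areas)}


if __name__ == "__main__":
    # Made-up scene: many small triangles, a few medium ones and one large one.
    scene = [5] * 90 + [50] * 9 + [500]
    for label, (count_pct, area_pct) in size_statistics(scene).items():
        print(f"{label:20s} {count_pct:6.1f}% of triangles {area_pct:6.1f}% of area")
```

Even in this made-up example a single large triangle covers more area than ninety small ones, which is the imbalance between geometry load (per triangle) and rendering load (per pixel) that the section discusses.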
Further investigations with less complex scenes (no more than 5000 triangles) have
shown a more extreme distribution of triangle sizes (see the statistics in
Figures 4.6 to 4.8). The pictures are shown at the end of this paper.
The reason is the way a scene is modeled, i.e. things of interest are generated with a lot
of primitives to get a fine grained surface. The rest of the scene, especially the background,
is defined with only a few but very large primitives (triangles).
The pictures analyzed in the statistics were rendered at a resolution of 1024 x
1280 pixels.
The chessmen figure is an example of a picture defined without background.

[Figure: bar chart of % of total triangles and % of total triangle area versus pixels per
triangle.]

FIGURE 4.6. Image statistics for "breakfast"
Additionally, future increases in graphics performance will be used to display more
complex scenes rather than to display the same number of objects faster. Those images
will comprise a lot of very small triangles, shifting the computational load to the geometry
section.

Nevertheless, the size and the number of triangles an image consists of may vary from
scene to scene or even from view to view. This causes idle states within an architecture
with a fixed balance. In order to exploit all the distributed computational power of
the system, the processing units have to be able to adapt their activities to the actual
processing requirements of the scene.
[Figure: bar chart of % of total triangles and % of total triangle area versus pixels per
triangle.]

FIGURE 4.7. Image statistics for "billiards"

To enable such dynamic load balancing and to speed up geometry calculation dynamically
if required, the rendering modules are capable of performing all the geometry
calculations too.
After rendering an object, a processor may become idle if there is no rendering
data in its input buffer. In that case it performs a task switch, requests new unprocessed
geometry objects and continues with geometry calculations. In this way automatic load
balancing is achieved across all the processors. When several rendering modules are doing
geometry calculations, the overall rendering performance is reduced in favour of geometry
processing power. As a result, the exploitation of the available processing power exceeds
95% and no computational power is wasted by a rendering module
running into an idle state.
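The effect of this rule can be explored with a toy simulation. The sketch below is a coarse tick-based model under several assumptions that are not in the paper: one shared rendering queue instead of per-bank FIFOs, integer processing times, and made-up workloads. It does not reproduce the measurements reported here. It only illustrates why idle rendering processors that pick up geometry work raise utilisation when triangles are small.

```python
from collections import deque


def simulate(objects, n_geometry=3, n_rendering=5, balance=True, switch_cost=1):
    """objects: list of (geometry_ticks, rendering_ticks), both >= 1.

    Returns (makespan in ticks, utilisation of the rendering processors).
    Ticks spent on a task switch are counted as busy time.
    """
    if not objects:
        return 0, 0.0
    if any(g < 1 or r < 1 for g, r in objects) or switch_cost < 0:
        raise ValueError("tick counts must be >= 1 and switch_cost >= 0")
    if n_geometry < 1 or n_rendering < 1:
        raise ValueError("need at least one module of each kind")

    master_fifo = deque(objects)     # unprocessed geometry objects
    render_fifo = deque()            # rendering jobs (simplified: one shared queue)
    procs = [{"geometry": i < n_geometry, "left": 0, "emit": None}
             for i in range(n_geometry + n_rendering)]
    busy = 0
    ticks = 0
    while master_fifo or render_fifo or any(p["left"] for p in procs):
        for p in procs:              # hand work to idle processors
            if p["left"]:
                continue
            if not p["geometry"] and render_fifo:
                p["left"] = render_fifo.popleft()
            elif master_fifo and (p["geometry"] or balance):
                g, r = master_fifo.popleft()
                p["left"] = g if p["geometry"] else g + switch_cost
                p["emit"] = r        # rendering job produced when geometry is done
        for p in procs:              # advance time by one tick
            if not p["left"]:
                continue
            p["left"] -= 1
            if not p["geometry"]:
                busy += 1
            if p["left"] == 0 and p["emit"] is not None:
                render_fifo.append(p["emit"])
                p["emit"] = None
        ticks += 1
    return ticks, busy / (ticks * n_rendering)


if __name__ == "__main__":
    small_triangles = [(10, 1)] * 300    # geometry-bound scene (made-up numbers)
    print("fixed   :", simulate(small_triangles, balance=False))
    print("balanced:", simulate(small_triangles, balance=True))
```

In this model a scene of small triangles leaves the rendering processors almost idle when the assignment is fixed, while with balancing they spend nearly all their time on useful work and the total time drops accordingly.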
`
[Figure: bar chart of % of total triangles and % of total triangle area versus pixels per
triangle.]

FIGURE 4.8. Image statistics for "chessmen"

4.5.2 Task Switching
The capability of automatically distributing the work load between geometry and rendering
modules inherently means task switching between two jobs within the same application.
Supported by the large local memory (up to 256k x 32 bit), the switching is reduced
to saving and restoring all processor registers, processing the interrupt control and
inspecting the rendering FIFO status. This is performed in less than 5 µs. Since this time is
very small with respect to the time consumed for geometry processing, the self balancing
is useful even for scenes comprising only very small triangles (see below).
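To put the 5 µs into perspective, it can be expressed in instruction cycles and as a fraction of the work it enables. The 60 ns cycle time and the 5 µs bound come from the text. The geometry time per object is a hypothetical input, since no such figure is given in this section.

```python
CYCLE_NS = 60       # DSP instruction cycle time quoted in the text
SWITCH_US = 5.0     # upper bound for one task switch quoted in the text


def switch_cycles(switch_us: float = SWITCH_US, cycle_ns: float = CYCLE_NS) -> float:
    """Task switch duration expressed in instruction cycles."""
    return switch_us * 1000.0 / cycle_ns


def overhead_fraction(geometry_us: float, switch_us: float = SWITCH_US) -> float:
    """Share of time lost to switching when one switch precedes geometry_us of work."""
    return switch_us / (switch_us + geometry_us)


if __name__ == "__main__":
    print(switch_cycles())             # cycles available for one task switch
    print(overhead_fraction(95.0))     # hypothetical: 95 us of geometry work per switch
```

A budget of roughly eighty instruction cycles is consistent with a switch that only saves and restores registers and checks a FIFO status flag.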
`
4.5.3 Performance Increase by Task Switching
The task switching capability of the rendering processors accelerates the geometry
calculation, but as mentioned above the switching itself consumes time. This section
evaluates how much calculation time is consumed by task switching and by which factor
geometry calculations are accelerated when a rendering processor switches to geometry
processing.
The peak performance of this architecture is delivered when all rendering processor
power is exploited for rendering calculations and the rendering modules are working
continuously. If the balancing mechanism is activated the peak performance is not achieved,
but the processing power of the rendering modules is used to speed up geometry calculation.
This has two effects:
- it prevents the rendering modules from running into an idle state and
- it speeds up image generation by substantially supporting geometry processing.

[Figure: exploitation in % for Case A and Case B, with and without load balancing.]

FIGURE 4.9. Exploitation of the rendering processor computation power
For simulation of the architecture the DARENDER graphics software was chosen [12].
This is a functional implementation for PHIGS-PLUS/PEX. It is written entirely in C and
incorporates no optimizations in the form of assembler routines or similar. The hardware for
simulation was a 32 MHz application board of the Digital Signal Processor development
toolkit. The scene used for exploitation measurement shows several goblets in 3D space.
The goblets were fed into the architecture in a b-spline representation with different
parametrizations:
- Case A: The goblets were tessellated into 10082 triangles covering 81389
pixels (about 8 pixels/triangle).
- Case B: The go