`Realtek Ex. 1016
`Case No. IPR2023-00922
`Page 1 of 29
`
`

`

`
`

`

Focus on Computer Graphics

Tutorials and Perspectives in Computer Graphics
`
`Edited by W.T. Hewitt, R. Gnatz, and W. Hansmann
`
`...
`
`
`

`

`A. Kaufman (Ed.)
`
`Rendering, Visualization
`
`and Rasterization Hardware
`
`With 100 Figures
`
Springer-Verlag
`
`
`Berlin Heidelberg New York
`London Paris Tokyo
`Hong Kong Barcelona
`Budapest
`
`
`

`

Focus on Computer Graphics
Edited by W. T. Hewitt, R. Gnatz, and W. Hansmann
for EUROGRAPHICS -
The European Association for Computer Graphics
P.O. Box 16, CH-1288 Aire-la-Ville, Switzerland

Volume Editor

Arie Kaufman
Department of Computer Science
State University of NY at Stony Brook
Stony Brook, NY 11794-4400, USA

Cover picture: H. Selzer, Fraunhofer-Institut für Graphische Datenverarbeitung (see also contribution p. 37)
`
`ISBN 3-540-56787-9 Springer-Verlag Berlin Heidelberg New York
`
`
`
`
`
`ISBN 0-387-56787-9 Springer-Verlag New York Berlin Heidelberg
`
Library of Congress Cataloging-in-Publication Data

Rendering, visualization and rasterization hardware / A. Kaufman, (ed.). p. cm. - (Focus on computer graphics) "Comprehensive record of the contributions to the Sixth Eurographics Workshop on Graphics Hardware held on 1-2 September, 1991 in Vienna, Austria, in conjunction with the Eurographics '91 Conference" - Pref. Includes bibliographical references and index. ISBN 0-387-56787-9 (U.S.) 1. Computer graphics-Congresses. 2. Computer input-output equipment-Congresses. I. Kaufman, Arie. II. Eurographics Workshop on Graphics Hardware (6th: 1991: Vienna, Austria). III. EUROGRAPHICS (1991: Vienna, Austria). IV. Series. T385.R458 1993 621.39'9-dc20
`
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law.

© 1993 EUROGRAPHICS The European Association for Computer Graphics
Printed in Germany

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Typesetting: Camera ready copy by authors/editors
45/3140-5 4 3 2 1 0 - Printed on acid-free paper
`
`
`

`

`Preface
`
The material in this book represents a comprehensive record of the contributions to the Sixth Eurographics Workshop on Graphics Hardware held on 1-2 September 1991 in Vienna, Austria, in conjunction with the Eurographics '91 Conference. The Sixth Eurographics Workshop on Graphics Hardware is the sixth in an established series of workshops. These workshops have been an excellent forum for an exchange of information and ideas on the latest development and work-in-progress report in the field of graphics hardware.

The papers in this book are revised versions of those presented at the Workshop. The papers were revised based on the reviewers' comments and the discussions during the Workshop.

The book has five parts and a keynote paper. The keynote paper is by Kurt Akeley, Vice President and Chief Engineer of Silicon Graphics, who delivered the keynote address on "Issues and Directions for Graphics Hardware Accelerators" at the Workshop. The first part of the book concerns graphics hardware design. The papers in this part discuss simulation and silicon compilers for such a design. The second part contains two papers on graphics systems: a high-performance graphics system and the I.M.O.G.E.N.E. machine. The third part focuses on volume (voxel-based) machines. The papers in this part describe two devices to facilitate transformations of volumes. The fourth part of this book includes papers on rasterization systems, including character rasterization and scan-conversion of triangular faces. The papers in the last part of the book focus on rendering machines. They include a programmable rendering engine, primitive shaders, and radiosity implementation on a parallel architecture.

The book is a testimony that there are flourishing activities in the development of novel architectural and algorithmic ideas in graphics hardware. Specifically, the impact of VLSI technology, newly developed algorithms and approaches, and the increasing diversity of application encourage new hardware solutions and keep the graphics hardware topic a viable research and development area.

I am very grateful for the amount of time and energy put into the refereeing process and the planning of the Workshop by the members of the Program Committee. In addition, I would like to thank the Eurographics Association for supporting the Workshop series; Max Mehl from FhG-AGD, Darmstadt, for his effort in organizing the Workshop; the Technical University of Vienna for hosting the event; Gerhard Hiess from TU Vienna for local organization; and my students Claudio Silva and Juliana Silva for preparing the book for publication. Last, but not least, my thanks go to the authors of the papers for the careful preparation of their manuscript.

Stony Brook, New York
Spring 1993

Arie Kaufman
`
`
`

`

`

`

Table of Contents

Keynote Speaker    1
1 Issues and Directions for Graphics Hardware Accelerators    3
  Kurt Akeley

I Graphics Hardware Design    7
2 XInPosse: Structural Simulation for Graphics Hardware    9
  M.A. Guravage, E.H. Blake, A.A.M. Kuijk
3 Silicon Compilers for Graphics Hardware Design?    20
  Oliver Renz, Alwin Groene

II Graphics Systems    35
4 Dynamic Load Balancing within a High Performance Graphics System    37
  Harald Selzer
5 The I.M.O.G.E.N.E. Machine: Some Hardware Elements    54
  V. Lefevere, S. Karpf, C. Chaillou, M. Mériaux

III Volume Machines    75
6 The Conveyor - an Interconnection Device for Parallel Volumetric Transformations    77
  Daniel Cohen, Reuven Bakalash
7 The Flipping Cube: A Device for Rotating 3D Rasters    86
  Roni Yagel

IV Rasterization    101
8 Hardware Outline Character Rasterization    103
  Marc Morgan, Roger D. Hersch
9 Accurate Scanconversion of Triangulated Surfaces    116
  Jarek R. Rossignac
`
`

`

V Rendering Machines    139
10 Testing Geometric Primitive Shaders    141
  M. White, P.F. Lister, G.J. Dunnett, R.L. Grimsdale
11 An Architecture for a High Performance Rendering Engine    157
  Hans-Josef Ackermann, Christoph Hornung
12 Space Partitioning for Mapping Radiosity Computations onto a Pipelined Parallel Architecture (II)    175
  L.S. Shen, F.A.J. Laarakker, E. Deprettere

List of Contributors    191
`
`

`

`4 Dynamic Load Balancing within a High Performance
`Graphics System
`Harald Selzer
`
ABSTRACT Interactive 3D graphics applications require significant arithmetic processing to meet the ever-increasing desire for higher image complexity and higher resolution in displayed images.
This paper describes a graphics processor architecture with a high degree of parallelism connected to a distributed frame buffer. The architecture can be configured with an arbitrary number of identical, high level programmable processors operating in parallel.
Within the architecture an automatic load balancing mechanism is presented which distributes the processing load between geometry and rendering section.
After the unique features of the architecture are described the load balancing mechanism is analyzed and the increase of performance is demonstrated.
`
`4.1 Introduction
`
Since human visual perception is the most effective way to take in a lot of information in a short time, the photorealistic rendering for the visualization of medical, physical or technical data requires speed improvements and demands the development of innovative architectures.
Modern workstations with a state of the art graphics platform incorporate some form of hardware support for graphics applications to release the CPU from the burden of visualization tasks. Sophisticated user interfaces within any CAX application in conjunction with high interactivity and realistic images require splitting and parallelising the system to distribute the overall computing load.
This paper describes considerations made within the work for the GRACE project¹, a development which tries to satisfy the requirements of a graphics processor architecture.
`
`4.2 Background
`4.2.1 Contemporary Architectures
Common to all raster display systems is the frame buffer, which stores the image on a pixel by pixel basis and decouples image generation and video refresh process. The design of the frame buffer with its partitioning related to object or screen space and the degree of parallel access possibilities are a key feature to systems merit [16].
Attempting to satisfy the demands of increased calculation rates a lot of architectures with different basic concepts have been developed [9] showing the demands of integrating more system functionality on a single chip.
A well known approach making extensive use of full custom VLSI devices is the design of the Pixel Planes [15],[8]. To achieve higher rendering performance and to overcome the frame buffer bottleneck the rasterization processor and the frame buffer memory are integrated on the same chip. Similar approaches were proposed in the Scan Line Access Memory [7] and the Smart Image Memory [4].
Other architectures try to parallelize functional modules of the image generation process e.g. by mapping the geometry section to a multistage pipeline of customized VLSI devices [5]. This design was enhanced and is now available as a full parallelized state of the art workstation [2],[3].
Another more general architecture is the Pixel Machine, a MIMD computer based on an array of asynchronous processor nodes with parallel access to a large frame buffer [14]. The advantage of this approach is the homogeneous structure and the programmability which allows all algorithms to be implemented in software.

¹ This project was funded by the Commission of the European Community in the ESPRIT-II-Program, Project No 2569 (EuroWorkStation)

4.2.2 Goals
4.2.2.1 Principal Considerations
1. Frame Buffer

The memory in which an image is stored on a pixel by pixel basis is called the frame buffer or image memory. This memory is accessed on the one hand by the rendering processor, which writes data into the memory and on the other hand by the video refresh controller, which reads from the memory and conveys pixel data to the video output circuitry and the display monitor.
An image memory built up with conventional DRAMs can hamper the image generation process at the rendering processor side as well as at the video refresh side. Using today's available video RAMs (VRAMs) improves the speed of frame buffer access dramatically (Whit84). Nevertheless a certain level of performance implies the need of parallelism within the frame buffer. A resolution of 1024x1280 visible pixels with a 60 Hz refresh rate (noninterlaced) requires a pixel frequency of about 110 MHz or equivalently a 330 Mbyte/s transfer rate for full colour representation with 24 bits/pixel. A monolithic frame buffer can not achieve that. The maximum clock frequency of the VRAM shift register measures 30-40 MHz and is therefore limited to resolutions of 640x480 pixels with 60 Hz video refresh rate (noninterlaced) or equivalent.
On the other side display processors with 25 ns cycle times have to compete for the random access port of a VRAM with a normal cycle time of about 150 ns (no page, nibble or static column mode is taken into account), slowing down image generation.
The solution appears to be found in writing multiple pixels into the memory in parallel, the basic concept of the distributed frame buffer. The frame buffer could be divided into rows, columns, or arrays [9] [16] and each of these parts is attached to a separate rendering processor thus overcoming the memory access bottleneck.
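
The bandwidth figures in this subsection can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the gap between the visible pixel rate (about 78.6 MHz) and the quoted 110 MHz pixel frequency is assumed here to come from video blanking intervals, which the text does not spell out.

```python
import math


def visible_pixel_rate_mhz(width: int, height: int, refresh_hz: int) -> float:
    """Rate of visible pixels only (no blanking overhead), in MHz."""
    return width * height * refresh_hz / 1e6


def required_bandwidth_mbyte_s(pixel_clock_mhz: float, bits_per_pixel: int) -> float:
    """Video refresh bandwidth in Mbyte/s for a given pixel clock."""
    return pixel_clock_mhz * bits_per_pixel / 8


def banks_needed(pixel_clock_mhz: float, vram_shift_clock_mhz: float) -> int:
    """Smallest number of parallel VRAM banks whose shift registers
    together sustain the pixel clock (idealised, no other overhead)."""
    return math.ceil(pixel_clock_mhz / vram_shift_clock_mhz)
```

Under these assumptions, `required_bandwidth_mbyte_s(110, 24)` reproduces the 330 Mbyte/s quoted above, and `banks_needed` yields three or four banks for shift register clocks of 40 MHz and 30 MHz respectively, which illustrates why a monolithic frame buffer falls short.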
`
`
`

`

`
`
2. Floating Point Versus Fixpoint Calculation

While designing a system architecture the central question arises which processor is the most suitable for that design.
Graphics applications are very arithmetic intensive tasks and therefore need processing devices with very powerful arithmetic and logic units (ALUs).
State of the art processors include a floating point unit on chip. Nevertheless only a few of these processors incorporate sufficiently powerful arithmetic units to perform floating point operations as fast as integer operations. Especially if the system is designed with application specific integrated circuits (ASICs) it is worthwhile to consider which accuracy is needed for mathematical calculations in a graphics system. Numerically intensive operations are performed in the geometry and the rendering section. Typical tasks within the geometry section requiring floating point calculation with high accuracy are the following:
- Transforming objects with world coordinates to image space
- Interpolating vertex normals (Phong shading)
- Normalizing interpolated vertex normals (Phong shading)
- Performing the lighting calculations (Phong and Gouraud shading).
Using a single precision floating point number results in a maximum inaccuracy of 2^-150 (decimal equivalent: 7*10^-46) per operation [13]. This is a sufficient precision for the operations mentioned above without visible effects. The rendering section comprises operations like
- colour interpolation (Gouraud shading)
- z-value interpolation for z-buffering
- transparency calculation
- algorithms for image processing.
For an image with a limited resolution most of these operations could be done with fixpoint arithmetic in an appropriate precision.
Assuming a resolution of 2048 x 2048 pixels and a fixpoint representation with a fractional part consisting of 16 bits, an RGB model colour interpolation over a whole scanline would incorporate a binary error of 2^-6 - a deviation not perceptible for the human eye on today's monitors.
This shows that for the mathematical calculation in the geometry section floating point units are necessary but in the rendering section the mathematical computations could be done with fixpoint precision. Therefore, if a straightforward architecture for a specific application is implemented with no parallelism on board or module level, fixpoint arithmetic may suit well - an approach that was realized and tested well for a fast Gouraud triangle shader [1].
In the system architecture discussed in this paper the processors should be able to perform rendering tasks as well as geometry calculations. This argument mainly led to the decision to incorporate digital signal processors (DSPs) which have a floating point unit on chip and were at the time of system design the fastest processors available on the market.
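
The 2^-6 figure for fixpoint colour interpolation can be made plausible with a small experiment. The sketch below is one possible reading, not the author's derivation: it assumes incremental (DDA style) interpolation in which the per-pixel colour increment is rounded once to 16 fractional bits and then added 2047 times across a 2048 pixel scanline. The rounding error of at most 2^-17 per step then accumulates to less than 2^-6. All function names are hypothetical.

```python
from fractions import Fraction

FRAC_BITS = 16
SCALE = 1 << FRAC_BITS  # value of 1.0 in the fixpoint format


def fixed_step(c0: int, c1: int, n_pixels: int) -> int:
    """Per-pixel colour increment, rounded to FRAC_BITS fractional bits."""
    return round(Fraction((c1 - c0) * SCALE, n_pixels - 1))


def max_interpolation_error(c0: int, c1: int, n_pixels: int) -> Fraction:
    """Worst deviation between incremental fixpoint interpolation and the
    exact linear ramp from c0 to c1 over n_pixels samples."""
    step = fixed_step(c0, c1, n_pixels)
    acc = c0 * SCALE  # fixpoint accumulator, starts exactly at c0
    worst = Fraction(0)
    for i in range(n_pixels):
        exact = c0 + Fraction((c1 - c0) * i, n_pixels - 1)
        worst = max(worst, abs(Fraction(acc, SCALE) - exact))
        acc += step
    return worst
```

Exact rational arithmetic (`fractions.Fraction`) is used for the reference ramp so that the measured error is not itself affected by floating point rounding. For any 8 bit colour pair the result stays below 1/64, well under one colour step of a 24 bit display.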
`
`
`

`

4.2.2.2 System Characteristics and Design Goals

The architecture to be realized should be capable of generating high quality images within a moderate time. That means the hardware should be as fast as possible but not as big as possible. The employed computational power should be used very effectively. Other characteristics are:
- A flexible system, high level programmable to enable the implementation of all graphics functions necessary and various algorithms for image generation.
- Parallelism should be implemented wherever possible.
- Homogeneity. To become familiar with an off-the-shelf VLSI device needs some time. To become familiar with a few different such devices needs a lot of time. Therefore the number of different off-the-shelf VLSI components had to be reduced to a minimum to ease system use and shorten software development time.
- The arithmetical and logical units (ALUs) should be available off-the-shelf.
- The frame buffer design should overcome the access bottleneck on the generation side as well as on the video side and incorporate hardware support for fast window handling.
- The frame buffer resolution is 1280 * 1024 pixels with a video refresh rate of 60 Hz (noninterlaced). Every pixel has 24 bit colour and is double buffered as well as z-buffered.
- The frame buffer should provide double buffering in order to accommodate dynamics and z-buffering too.

4.3 The Architecture
4.3.1 Overview
Taking into account the demands of the different tasks within the image generation process the mapping of the functional sections to hardware suggested the splitting into units as shown in Figure 4.1.
Above the frame buffer there are three different units handling the image generation process:
- The Master Module
- The Geometry Module
- The Rendering Module.
The master module is the system's supervisor, handles the communication to the host processor and is responsible for start-up and synchronising activities.
The geometry module transforms and clips the graphic primitives, subdivides biparametric patches, and performs the necessary lighting calculations and similar tasks.
The rendering module performs the shading algorithms and transfers pixel data to the frame buffer. The rendering module also supports all functions of the geometry module (Figure 4.1).

All modules contain a digital signal processor (DSP) with up to 256k * 32 bit wide, fast static memory for instruction and data storage. This type of processor was chosen because of its 60 ns instruction cycles, the on-chip cache, the floating point unit and the two independent, parallel bus interfaces [10].
`
`
`
`
`

`

[Figure: traversal of the graphics data structure (Master Module) -> geometry calculations: transformation and clipping, lighting (Geometry Module) -> rendering: scan-conversion and shading (Rendering Module)]

FIGURE 4.1. Mapping functional sections to hardware
`
`
`
4.3.1.1 The Master Module

The communication to the host processor is handled over a 256k * 32 bit dual ported memory, which allows data to be transferred and processed in parallel. The interface is asynchronous and interrupt driven for fast response and transfers data at up to 20 Mbyte/s.
The master module traverses the graphics data structure and feeds graphics data to a special first-in-first-out memory (FIFO) for delivery to the appropriate processors.
In the case of synchronizing or updating (e.g. graphics context, colour lookup tables, etc.) the master takes over system control and bypasses the pipeline with a direct access to the appropriate resource.
`
4.3.1.2 The Geometry Module

Graphics data are transferred to the geometry modules at a rate of 33 Mbyte/s. The geometry module performs the transformation, clipping, polygon and patch subdivision, normal interpolation and renormalisation and lighting operations in an appropriate manner and delivers the processed graphical primitives to the rendering module data FIFOs.
`
4.3.1.3 The Rendering Module

The structure of the rendering module is similar to that of the geometry module. For rendering calculations like shading and scan conversion the processor fetches data from its data FIFO and conveys the calculated pixel values to the frame buffer. For image processing purposes data are read from the frame buffer, manipulated and written back.
Because the rendering module can act as a geometry module too, it can also directly fetch graphics data from the master data FIFO and deliver processed data to the FIFOs of the appropriate rendering modules (see Section 4.5).
`
`
`

`

FIGURE 4.2. Basic module structure

[Figure: screen of width 1280 with the repeating part labels 1 2 3 4 5]

FIGURE 4.3. Frame buffer interleaving
`
4.3.1.4 The Frame Buffer

The frame buffer is distributed and divided into 5 parts with an overall resolution of 1280x1024 pixels with 88 bits per pixel (2x24 bit colour, 24 bit z-buffer, 8 bit transparency, 8 bit window identifier) and a video refresh rate of 60 Hz (noninterlaced).
Clipping at arbitrarily shaped windows is supported by hardware as well as fast copying of windows and bit block operations at high data transfer rates [11].
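
As an illustration of a five-way distributed frame buffer, the following sketch maps screen pixels to banks and computes the storage per bank. The column-wise cyclic interleave (`x % 5`) is an assumption suggested by Figure 4.3, not something the text states explicitly, and all names are hypothetical.

```python
N_BANKS = 5
WIDTH, HEIGHT = 1280, 1024
# 2 x 24 bit colour + 24 bit z-buffer + 8 bit transparency + 8 bit window id
BITS_PER_PIXEL = 2 * 24 + 24 + 8 + 8


def bank_of(x: int, y: int) -> int:
    """Bank holding pixel (x, y), assuming cyclic column interleaving."""
    return x % N_BANKS


def local_address(x: int, y: int) -> int:
    """Pixel index inside its bank (row-major over WIDTH // N_BANKS columns)."""
    return y * (WIDTH // N_BANKS) + x // N_BANKS


def bytes_per_bank() -> int:
    """Storage each bank needs for its share of the full 88 bit pixels."""
    return WIDTH * HEIGHT * BITS_PER_PIXEL // 8 // N_BANKS
```

With this mapping five horizontally adjacent pixels always fall into five different banks, so five rendering processors can write them at the same time. Each bank then holds 2,883,584 bytes, roughly 2.75 Mbyte.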
`4.3.2 Overall Architecture
The system is organized as a pipeline with additional parallelism on the functional level by multiplying geometry and rendering modules (see Figure 4.4). The number of rendering modules is fixed to a multiple of five for technical reasons, whereas the number of geometry modules is theoretically unlimited. The current configuration comprises three geometry and five rendering modules.
Three independent busses enable parallel data transfer to and from multiple resources of the system.
All modules are connected to the geometry bus which acts as the system bus. All system resources are accessible by the master. System, graphics or update data are transferred in single or broadcast mode with 33 Mbyte/s.
The rendering bus is designated to convey only rendering primitives to the data FIFOs on the rendering modules. For speed reasons data are transferred synchronously with up to 132 Mbyte/s.
The init bus allows a direct access to the video and cursor planes used for fast update of the colour look up tables (CLUT) and for generating the cursor in separate cursor planes.
Each rendering processor writes data with 33 Mbyte/s to the frame buffer bank attached to it which results in a total transfer rate of 165 Mbyte/s.
The frame buffer and the video/cursor plane memories can also be accessed by the host processor in order to bypass the graphics pipeline. This supports e.g. the handling of pixel maps if the host processor wants to transfer pixel values to or read back from the image memory.
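
The aggregate frame buffer write rate follows directly from the per-bank rate. The sketch below restates that arithmetic and adds a rough bound that is not from the text: the time to write one full screen of 24 bit colour values at the aggregate rate, ignoring z-buffer traffic and any bus or set-up overhead. Function names are hypothetical.

```python
BANK_WRITE_RATE_MBYTE_S = 33  # per rendering processor, as stated in the text


def aggregate_write_rate(n_render_modules: int) -> int:
    """Total frame buffer write rate in Mbyte/s for a valid configuration."""
    if n_render_modules <= 0 or n_render_modules % 5 != 0:
        raise ValueError("rendering modules come in multiples of five")
    return n_render_modules * BANK_WRITE_RATE_MBYTE_S


def full_frame_write_time_s(width: int, height: int, bytes_per_pixel: int,
                            rate_mbyte_s: float) -> float:
    """Idealised lower bound on the time to write every pixel once."""
    return width * height * bytes_per_pixel / (rate_mbyte_s * 1e6)
```

For the current configuration of five rendering modules this gives the 165 Mbyte/s stated above; writing 1280 x 1024 pixels of 3 bytes each at that rate takes at least about 24 ms.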
`
`4.4 Dataflow
`
In the entire system graphics data are processed simultaneously and transferred to the subsequent modules in parallel. From stage to stage the number of elements per object increases as the content of information per element decreases (Figure 4.5).
The master module traverses the graphics data structure and puts the high order primitives like splines, polygons, meshes or triangles into the data FIFO. If a geometry module has finished the last task, it accesses the geometry bus and fetches the next primitive or task automatically. All geometry calculations are done within a single module.
The logical interface between the geometry and rendering calculations transfers triangles, vectors, pixels and trapeziums with edges parallel to the screen y axis columns. The data structure incorporates processor specific data (due to the distributed frame buffer) and common data. The latter are broadcast to the rendering modules.
`
`
`

`

FIGURE 4.4. System architecture
`
`

`

[Figure: graphical data structure -> (Master Module) -> graphical objects (splines, polygons, meshes ...) -> (G-Module) -> graphical primitives -> (GR-Module) -> pixels (Frame buffer)]

FIGURE 4.5. Graphics data processing
`4.5 Load Balancing
`4.5.1 Automatical Regulatio n
`The effort of computation in the geometry and rendering section depends on size and
`
`
`position of the geometrical objects. Small triangles or short vectors parallel to the x or
`
`y axis require only a small number of rendering operations. In fact the lime consumed
`
`
`lo initialize the rendering processor for primitives producing only a few pixels is greater
`
`than the rendering time itself. On the other hand the number of geometric calculations
`
`
`
`for interpolating shading methods is independent from the resulting size of the primitive.
`An anlysis of scene complexity has shown, that in most cases the image is generated
`
`
`from a lot of small triangles (1-10 pixels), a number of medium sized (11-100 pixels) and
`a few large ones (101-1000 pixels) [6].
`Further investigations with less complex scenes (no more than 5000 triangles) have
`
`
`
`
`
`
`shown a more extrem distribution of the size of triangles incorporated (s. statistics shown
`below). The pictures arc shown at the end of this paper.
`The reason is the way of modeling a scene i. c. things of interest arc generated with a lot
`
`
`
`\
`
`·------------,
`
`I I I I I I I I I I I I I I I I I I I I I
`
`I I I I I
`
`r------1.--: I
`�---_JI
`------------
`...J
`
`Realtek Ex. 1016
`Case No. IPR2023-00922
`Page 18 of 29
`
`

`

`46 Harald Selzer
`
of primitives to get a fine-grained surface. The rest of the scene, especially the background, is defined with only a few but very large primitives (triangles).

The pictures analyzed in the statistics below were rendered with a resolution of 1024 x 1280 pixels. The chessman figure is an example of a picture defined without background.

FIGURE 4.6. Image statistics for "breakfast" (bar chart; x axis: pixels per triangle in the bins 0-30, 30-60, 60-300, 300-1200, 1200-6000, 6000-30000, 30000-60000, >60000; y axis: 0 to 100; legend: % of total triangles, % of total triangle area)
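Image statistics of the kind plotted for "breakfast" (percentage of total triangles and percentage of total triangle area per size class) could be computed along the following lines. This is only a sketch: the bin edges are read off the x axis of the chart, while the function names, the treatment of values that fall exactly on a bin edge, and the sample data are assumptions made for illustration.

```python
from bisect import bisect_right

# Upper bin edges in pixels per triangle, read off the x axis of the chart.
BIN_EDGES = [30, 60, 300, 1200, 6000, 30000, 60000]
BIN_LABELS = ["0-30", "30-60", "60-300", "300-1200", "1200-6000",
              "6000-30000", "30000-60000", ">60000"]


def triangle_area(p0, p1, p2):
    """Screen-space area of a triangle (in pixels) from three (x, y) vertices."""
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    return abs((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)) / 2.0


def image_statistics(areas):
    """Return {bin label: (% of total triangles, % of total triangle area)}."""
    counts = [0] * len(BIN_LABELS)
    area_sums = [0.0] * len(BIN_LABELS)
    for area in areas:
        # Assumption: an area exactly on an edge is counted in the higher bin.
        i = bisect_right(BIN_EDGES, area)
        counts[i] += 1
        area_sums[i] += area
    n, total = len(areas), sum(areas)
    return {
        label: (100.0 * c / n if n else 0.0,
                100.0 * a / total if total else 0.0)
        for label, c, a in zip(BIN_LABELS, counts, area_sums)
    }


# Made-up scene: nine small triangles and one large background triangle.
stats = image_statistics([10.0] * 9 + [910.0])
for label, (pct_count, pct_area) in stats.items():
    print(f"{label:>12}: {pct_count:5.1f}% of triangles, {pct_area:5.1f}% of area")
```

In this made-up scene the small triangles account for 90% of the primitives but only 9% of the covered area, which is the kind of skew the modeling argument above leads one to expect.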
Additionally, the future increase in graphics performance will be used to display more complex scenes rather than displaying the same number of objects faster. Those images will comprise a lot of very small triangles, shifting the load of computation to the geometry calculation section.

Nevertheless, the size and the number of triangles an image consists of may vary from scene to scene or even from view to view. This will cause idle states within a fixed balanced architecture. With the intention of exploiting all the distributed computational power of the system, the processing units have to be able to adapt their activities to the actual processing requirements of the scene.
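The idle states of a fixed balanced architecture can be illustrated with a deliberately idealized sketch. It assumes that a frame consists of a given amount of geometry work and rendering work (in arbitrary, hypothetical units), that work can be divided perfectly among the units of a section, and that redistribution costs nothing. The dynamic variant assumes, as one possible form of adaptation, that rendering units may take over geometry work while geometry units cannot render. None of the function names or numbers come from the system described here.

```python
def fixed_frame_time(geom_work, render_work, n_geom, n_render):
    """Frame time when each section only does its own kind of work."""
    return max(geom_work / n_geom, render_work / n_render)


def dynamic_frame_time(geom_work, render_work, n_geom, n_render):
    """Frame time when rendering units may also take over geometry work.

    Idealized: work is perfectly divisible and redistribution is free.
    Geometry units still cannot render, so rendering stays a lower bound.
    """
    shared = (geom_work + render_work) / (n_geom + n_render)
    return max(shared, render_work / n_render)


def idle_fraction(geom_work, render_work, n_geom, n_render, frame_time):
    """Share of the total unit-time within one frame that is spent idle."""
    capacity = (n_geom + n_render) * frame_time
    return 1.0 - (geom_work + render_work) / capacity


# Hypothetical geometry-bound scene: many very small triangles.
scene = dict(geom_work=900.0, render_work=100.0, n_geom=2, n_render=2)
t_fixed = fixed_frame_time(**scene)
t_dynamic = dynamic_frame_time(**scene)
print(t_fixed, idle_fraction(**scene, frame_time=t_fixed))
print(t_dynamic, idle_fraction(**scene, frame_time=t_dynamic))
```

With these made-up numbers the fixed split leaves the units idle for roughly 44% of the frame, whereas the adaptive variant keeps all units busy. For a rendering-bound scene the same model shows no gain, because geometry units are assumed to be unable to render.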
To enable such dynamic load balancing and to speed up geometry calculation dynamically if required, the rendering modules are capable of performing all the geometry ...