Realizing OpenGL: Two Implementations of One Architecture
Mark J. Kilgard
Silicon Graphics, Inc.

Abstract
The OpenGL Graphics System provides a well-specified, widely-accepted dataflow for 3D graphics and imaging. OpenGL is an architecture; an OpenGL-capable computer is a hardware manifestation or implementation of that architecture. The Onyx2 InfiniteReality and O2 workstations exemplify two very different implementations of OpenGL. The two designs respond to different cost, performance, and capability goals.

Common practice is to describe a graphics hardware implementation based on how the hardware itself operates. However, this paper discusses two OpenGL hardware implementations based on how they embody the OpenGL architecture. An important thread throughout is how OpenGL implementations can be designed not merely based on graphics price-performance considerations, but also with consideration of larger system issues such as memory architecture, compression, and video processing. Just as OpenGL is influenced by wider system concerns, OpenGL itself can provide a clarifying influence on system capabilities not conventionally thought of as graphics-related.
CR Categories: I.3.1 [Computer Graphics]: Hardware Architecture; I.3.6 [Computer Graphics]: Methodology and Techniques—Standards

Keywords: OpenGL, Graphics Hardware Architecture, InfiniteReality, O2
1 Introduction

The OpenGL Graphics System provides a well-specified, widely-accepted dataflow for 3D graphics and imaging. While programmers may think of OpenGL as simply a programming interface [7], we take the view that OpenGL defines an architecture.

We say a set of implementations manifests an architecture when three conditions are met:

1. The implementations must all have an identical interface and generate functionally equivalent outputs given the same inputs and initial state.

2. The determiner of functional equivalence is something other than a particular implementation.

3. The determiner of functional equivalence does not necessitate that all implementations be operationally identical. (There must be multiple ways to implement the architecture.)

Implementations that are simply "compatible" do not necessarily manifest an architecture. Our definition allows for an implementation to belong to an architecture but have additional capabilities beyond those defined by the architecture.

By our definition, OpenGL is clearly an architecture. While the determiner of functional equivalence is not required to be a codified specification,¹ OpenGL's architecture is indeed defined by its specification [11].

Implementations of an architecture typically accrue significant advantages not available to ad hoc implementations or sets of implementations that are compatible yet do not manifest an architecture. Architectures gain an advantage from compatibility, but also tend to be more adaptable and to foster innovative implementations through the freedom granted designers in how they realize the architecture. Architectures also tend to be easy to extend because an implementation's behavior is typically not specified for situations not defined by the architecture's functional equivalence.

The intent of this paper is to explore OpenGL's adaptability as an architecture. What we refer to as the adaptability of an architecture is not measured by units sold or market share. Instead, we contend that the adaptability of an architecture should be judged by the architecture's ability to codify well-understood functionality, its potential to be cleanly extended to support new capabilities, and its ability to positively influence issues outside the scope of the architecture itself.
Our approach is to consider two manifestations of the OpenGL architecture: the Onyx2 InfiniteReality graphics supercomputer and the O2 desktop workstation. Our examples were chosen because each is the result of quite different cost, performance, and capability goals, but both concretely demonstrate our primary contention that OpenGL is technically successful as an architecture because it is extensible to encompass new capabilities within the scope of interactive graphics and because OpenGL can positively influence system issues not directly graphics-related. Our approach is novel because, while we consider concrete implementations, we are fundamentally evaluating OpenGL as a graphics system architecture, not a particular hardware implementation.

Section 2 reviews the OpenGL architecture's scope, philosophy, functionality, and means of extensibility. Section 3 describes how OpenGL is instantiated by the Silicon Graphics Onyx2 InfiniteReality. Section 4 describes how OpenGL is instantiated by the Silicon Graphics O2 workstation. Section 5 contrasts the two implementations based on how they distinctly manifest the OpenGL architecture. Section 6 discusses how the OpenGL architecture influenced and even clarified several non-OpenGL design considerations in both example implementations. Section 7 argues that the OpenGL architecture is "good" because it provides us a framework for building innovative, evolvable, well-integrated graphics systems.

¹ The PC architecture lacks a codified specification, but what constitutes a PC has evolved beyond the point that a PC can be described operationally by a single implementation, as was originally the case.
`45
`
`1
`
`APPLE 1037
`
`APPLE 1037
`
`1
`
`

`

Figure 1: The dataflow within the OpenGL architecture's conceptual state machine.
2 OpenGL is a Visualization Architecture

The OpenGL architecture addresses the task of efficiently converting vertex- and pixel-based data representations into images. While the "GL" in OpenGL stands for Graphics Library, we consider OpenGL's functionality mandate to be larger than that of a traditional 3D graphics library. OpenGL manipulates vertex and pixel data with comparable ease. Moreover, texture mapping provides a "bridge" to effectively combine the rasterization of vertex- and pixel-based data representations.

We consider SGI's early IRIS GL implementation to exemplify the conventional feature set of a 3D graphics library. Over time IRIS GL added texture mapping and image processing operations to its repertoire. These additions served as the motivation for rethinking the purpose of a graphics library during the design of OpenGL. Because OpenGL is well-suited for manipulating both vertex and pixel data, supports texture mapping, and embodies an architecture, we refer to OpenGL as a visualization architecture.
2.1 State Machine Philosophy

OpenGL is specified as a state machine. OpenGL commands either set state variables, retrieve state variables, retrieve framebuffer contents, compile or call display lists, or introduce vertex or pixel data into the state machine. Vertex and pixel data introduced into the state machine are processed based on the current OpenGL state settings, with the results sent to the framebuffer, texture objects, display lists, or selection/feedback buffer depending on OpenGL's current settings. Figure 1 shows the high-level dataflow within the OpenGL architecture's conceptual state machine.

Beyond OpenGL's state machine model, several philosophical choices help make OpenGL both extensible and adaptable to unexpected situations. In later discussion, we note how these choices are manifested in the two example implementations considered.

OpenGL's state variables are orthogonal. In general, enabling or reconfiguring one OpenGL feature does not interfere with other features. For example, lighting calculations can be enabled or disabled independently of the current depth buffering mode. This means programmers can combine features with predictable results. An often unforeseen advantage of feature orthogonality is that multiple independent features can often be combined in useful but unanticipated ways. Much of OpenGL's ease of extensibility is predicated on feature independence. Without orthogonality, multiple architectural extensions lead to confusing interdependencies or even create feature conflicts.
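As an illustration of this orthogonality, the following minimal C sketch (core OpenGL 1.1 calls only, assuming a current rendering context) toggles lighting and depth buffering independently; neither enable depends on, or alters, the other's state:

    #include <GL/gl.h>

    /* Orthogonal state: each enable affects exactly one feature and can be
       combined freely with the others. */
    void configure_state(void)
    {
        static const GLfloat light_pos[4] = { 1.0f, 1.0f, 1.0f, 0.0f };

        glLightfv(GL_LIGHT0, GL_POSITION, light_pos);
        glEnable(GL_LIGHTING);      /* per-vertex lighting on */
        glEnable(GL_LIGHT0);

        glDepthFunc(GL_LESS);       /* depth buffering is configured...   */
        glEnable(GL_DEPTH_TEST);    /* ...independently of lighting state */

        glDisable(GL_LIGHTING);     /* disabling lighting leaves the depth
                                       test and all other state untouched */
    }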
Figure 2: The extended OpenGL pixel path including the convolution, histogram, color matrix, and color table extensions.

The OpenGL architecture is client-server in the abstract sense, not necessarily in a networked sense. Client-server means that the interface between an OpenGL application and an OpenGL implementation is strictly defined and all data passing between the application and implementation is explicit. The client-server separation defines the boundary between OpenGL implementation state and that of the application. This clear boundary makes possible network-extensible OpenGL implementations [5] and allows OpenGL to be used as a direct hardware interface.
The OpenGL architecture is data format rich. Immediate mode transfer of pixel and vertex data can be accomplished using OpenGL's wide variety of data sizes and formats. This allows applications to easily transfer their vertex and pixel data to OpenGL by traversing application-dictated data structures. Applications can supply pixel data using various strides, offsets, and component packings. Application performance typically benefits from avoiding data reformatting when transferring data to OpenGL. However, OpenGL implementations must be ready to accept OpenGL's multitude of possible data formats.
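As an example of this format richness, the sketch below (core OpenGL 1.1 calls only; the image dimensions and application data layout are hypothetical) hands OpenGL pixel data taken directly from a larger application-owned image by describing its row length and offsets with glPixelStorei, rather than repacking it:

    #include <GL/gl.h>

    /* Draw a 64x64 subrectangle out of a larger 512-pixel-wide application
       image without repacking it: glPixelStorei describes the memory layout. */
    void draw_subimage(const unsigned char *image512, int skip_x, int skip_y)
    {
        glPixelStorei(GL_UNPACK_ROW_LENGTH, 512);      /* full row stride   */
        glPixelStorei(GL_UNPACK_SKIP_PIXELS, skip_x);  /* offset into a row */
        glPixelStorei(GL_UNPACK_SKIP_ROWS, skip_y);    /* starting row      */
        glPixelStorei(GL_UNPACK_ALIGNMENT, 1);         /* tightly packed    */

        glDrawPixels(64, 64, GL_RGB, GL_UNSIGNED_BYTE, image512);
    }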
The OpenGL architecture is configurable, but not programmable. The OpenGL state machine can be thought of as a pipeline with a fixed topology (though various stages may be switched in or out). This mimics the layout of high-performance graphics subsystems where rendering steps are decomposed and instantiated by specialized hardware. The OpenGL architecture clearly encourages this style of implementation. This does create situations where features such as programmable shaders [8] or generalized image processing chains [12] are difficult to express as extensions to the OpenGL architecture.

2.2 Functional Decomposition

Sections 3 and 4 discuss how OpenGL (as specified in version 1.1) is instantiated by our example implementations. Therefore, this section briefly reviews OpenGL's functionality from an architectural standpoint. The operations are explained "bottom up," starting with the lowest-level operations that update the framebuffer and moving to the highest-level operations that accept commands.
2.2.1 Per-Fragment Processing and Rasterization

A fragment in OpenGL is the bundle of state required to update a specific pixel in the framebuffer. Fragments are generated during rasterization. The per-fragment operations are pixel ownership, scissoring, alpha testing, stencil testing, depth testing, blending, dithering, and logic op. The operations are performed in the order listed, though which operations are enabled depends on OpenGL's per-fragment state variables.

Rasterization is the process of breaking a primitive up into fragments that are passed to the per-fragment processing stage. OpenGL supports five types of primitives: points, lines, polygons, pixel rectangles, and bitmaps. The first step in rasterization is determining whether a framebuffer pixel is updated by the primitive. Depending on the primitive being rasterized, the current raster position, face culling, point size, line width, line stipple, polygon stipple, and antialiasing state affect which pixels are updated. The next rasterization step determines the fragment depth and color of affected pixels. The alpha color component is altered based on the antialiasing state of geometric primitives. The depth of geometric primitives can be altered depending on the polygon offset state. When enabled, texture mapping and fog modify the color of both geometric and pixel primitives.
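A short sketch of how an application configures several of these per-fragment stages follows (core OpenGL 1.1 calls; the particular test values are only illustrative). Whatever order the application sets them in, the architecture applies the stages in its fixed pipeline order:

    #include <GL/gl.h>

    /* Enable and configure a few per-fragment stages; OpenGL always applies
       them in its fixed order: scissor, alpha, stencil, depth, blend, ... */
    void setup_fragment_pipeline(void)
    {
        glScissor(0, 0, 256, 256);
        glEnable(GL_SCISSOR_TEST);

        glAlphaFunc(GL_GREATER, 0.5f);   /* discard mostly transparent fragments */
        glEnable(GL_ALPHA_TEST);

        glDepthFunc(GL_LEQUAL);
        glEnable(GL_DEPTH_TEST);

        glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
        glEnable(GL_BLEND);
    }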
2.2.2 Texture Mapping and Management

Texturing maps a portion of a specified image onto each primitive for which texturing is enabled. Texture coordinates determine what portion of the image is mapped to the primitive. OpenGL supports both 1D and 2D textures in a wide variety of formats. Texture parameters and the texture environment determine the method of filtering texels and how texels are combined with fragments generated during rasterization.

Texture objects provide the capability to switch between multiple texture images without the overhead of respecifying the texture image each time. Rectangular regions of textures can be incrementally updated using subtexture loads. When a texture image is specified, the constituent pixels are passed through the OpenGL pixel pipeline, so the same operations discussed below that apply to drawing, copying, or reading pixel rectangles also transform texture images when they are specified.
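The texture-object and subtexture-load mechanisms described above correspond to the OpenGL 1.1 calls sketched here (texture sizes and data pointers are placeholder assumptions):

    #include <GL/gl.h>

    GLuint tex_name;

    /* Create a texture object once; later frames simply bind it. */
    void create_texture(const unsigned char *rgba256)
    {
        glGenTextures(1, &tex_name);
        glBindTexture(GL_TEXTURE_2D, tex_name);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, 256, 256, 0,
                     GL_RGBA, GL_UNSIGNED_BYTE, rgba256);
    }

    /* Incrementally update a 32x32 region without respecifying the image. */
    void update_corner(const unsigned char *rgba32)
    {
        glBindTexture(GL_TEXTURE_2D, tex_name);
        glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 32, 32,
                        GL_RGBA, GL_UNSIGNED_BYTE, rgba32);
    }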
2.2.3 Both Vertex and Pixel Processing

OpenGL transforms application-supplied vertex coordinates to window coordinates, clipping the primitives as necessary. Per-vertex lighting is performed if enabled. Texture coordinates are either explicitly supplied by the application or generated based on the vertex coordinates.

OpenGL defines a pixel path to process pixels. The pixel path can be configured to perform component scaling, biasing, and remapping via table lookups. Pixels are transformed by the pixel path when pixels are drawn to the framebuffer, read back from the framebuffer, copied within the framebuffer, or downloaded into texture memory. Each pixel transfer case shares the identical pixel processing machinery.
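For instance, a scale, bias, and lookup-table remapping of the red component can be configured with the core pixel-transfer calls below; the same settings then apply whether pixels are drawn, read, copied, or loaded as a texture (the scale factors and table contents are only illustrative):

    #include <GL/gl.h>

    /* Configure the pixel path: scale and bias red, then remap it through a
       4-entry lookup table; applies to draws, reads, copies, and texture loads. */
    void setup_pixel_path(void)
    {
        static const GLfloat red_map[4] = { 0.0f, 0.25f, 0.5f, 1.0f };

        glPixelTransferf(GL_RED_SCALE, 2.0f);
        glPixelTransferf(GL_RED_BIAS, -0.5f);

        glPixelMapfv(GL_PIXEL_MAP_R_TO_R, 4, red_map);
        glPixelTransferi(GL_MAP_COLOR, GL_TRUE);   /* enable the table lookups */
    }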
2.2.4 Other Capabilities

Display lists provide a way to cache repeated command sequences for potentially faster execution. Evaluators provide a means to efficiently specify Bézier curves and surfaces. Feedback and selection redirect the results of vertex processing back to the application instead of on to rasterization.
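A display list, for example, is compiled once and then replayed by name on later frames; the sketch below uses only core OpenGL 1.1 calls:

    #include <GL/gl.h>

    GLuint list_name;

    /* Compile a command sequence into a display list once... */
    void build_list(void)
    {
        list_name = glGenLists(1);
        glNewList(list_name, GL_COMPILE);
        glBegin(GL_TRIANGLES);
        glVertex3f(0.0f, 0.0f, 0.0f);
        glVertex3f(1.0f, 0.0f, 0.0f);
        glVertex3f(0.0f, 1.0f, 0.0f);
        glEnd();
        glEndList();
    }

    /* ...and replay it cheaply every frame thereafter. */
    void draw_frame(void)
    {
        glCallList(list_name);
    }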
2.3 Extensibility

One key to an architecture's adaptability is its extensibility. OpenGL can be incrementally enhanced through its proven API extension mechanisms. OpenGL's rendering functionality can be extended by adding extensions to OpenGL's core rendering model. Extensions can also be made to OpenGL's window system dependent interface to address issues outside OpenGL's rendering model.

Various OpenGL vendors have already implemented dozens of extensions, and the OpenGL 1.1 update was the result of the OpenGL Architecture Review Board's efforts to fold successful, proven extensions back into the core OpenGL architecture. OpenGL 1.1 added vertex arrays, polygon offset, RGBA logic operations, texture objects, and further texture functionality enabled by texture objects.
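Applications discover which extensions an implementation exposes at run time by parsing the extension string; a minimal check looks like this (the extension name searched for is just an example, and a production check should match whole, space-delimited tokens rather than substrings):

    #include <GL/gl.h>
    #include <string.h>

    /* Return nonzero if the current OpenGL implementation advertises the
       named extension (e.g. "SGI_color_table") in its extension string. */
    int has_extension(const char *name)
    {
        const char *ext = (const char *) glGetString(GL_EXTENSIONS);
        return ext != NULL && strstr(ext, name) != NULL;
    }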
The following extensions are important for later discussion.

2.3.1 Imaging Extensions

A key set of OpenGL extensions² is the imaging extensions [10]: color table, convolution, color matrix, histogram, and new per-fragment blending modes. Figure 2 shows the extended pixel path.
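As an illustration of how such an extension plugs into the existing pixel path, the sketch below enables a 3x3 convolution ahead of pixel drawing. It assumes the implementation advertises EXT_convolution and that its entry point glConvolutionFilter2DEXT and the GL_CONVOLUTION_2D_EXT token are exposed by the platform's GL headers:

    #include <GL/gl.h>
    #include <GL/glext.h>   /* assumed to declare the EXT_convolution tokens */

    /* Install a 3x3 blur kernel in the pixel path (EXT_convolution). Once
       enabled, subsequent pixel draws and texture downloads are filtered. */
    void enable_blur(void)
    {
        static const GLfloat kernel[9] = {
            1.f/9, 1.f/9, 1.f/9,
            1.f/9, 1.f/9, 1.f/9,
            1.f/9, 1.f/9, 1.f/9
        };

        glConvolutionFilter2DEXT(GL_CONVOLUTION_2D_EXT, GL_LUMINANCE,
                                 3, 3, GL_LUMINANCE, GL_FLOAT, kernel);
        glEnable(GL_CONVOLUTION_2D_EXT);
    }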
2.3.2 Hardware Accelerated Off-screen Rendering

Hardware accelerated off-screen rendering is critical for a multitude of techniques that must reliably read back or reuse rendering results. A window system dependent extension for pixel buffers (commonly called pbuffers) enables hardware accelerated off-screen rendering.
3 OpenGL as Instantiated by InfiniteReality

Onyx2 InfiniteReality implements the bulk of OpenGL's dataflow within the InfiniteReality graphics subsystem. InfiniteReality is designed to be a "real time" graphics machine, meaning that sustained 30 hertz and higher frame rates are achievable even for demanding applications. InfiniteReality's intended application domains are visual simulation, film and video production, real-time image processing, volume rendering, and large-scale CAD.

InfiniteReality is a hardware-intensive design consisting of 13 distinct Application Specific Integrated Circuits (ASICs).³ InfiniteReality is a multiple-board graphics subsystem with the same board-level architecture as the RealityEngine [1], InfiniteReality's predecessor. A single Transform Manager board connects to 1, 2, or 4 Raster Manager boards and a single Display Generator board. Figure 3 shows an ASIC-level block diagram of InfiniteReality. Figure 4 shows how OpenGL's conceptual state machine (originally shown in Figure 1) roughly maps to InfiniteReality's rendering ASICs. Starting at the host interface and working towards the framebuffer and display back end, the following discussion shows how the OpenGL architecture is instantiated by InfiniteReality.

² Under consideration for inclusion in OpenGL 1.2.

³ Other sources of information about InfiniteReality are likely to refer to the boards and ASICs that constitute InfiniteReality by "working names" that grew out of historical SGI jargon and tradition. In a few cases, the working names inadequately describe the ASIC or board's true function in the context of OpenGL. For example, the Geometry Engine ASIC handles both vertex and pixel data, so we refer to it here as a Transform Engine to better suit our purpose of describing how InfiniteReality manifests the OpenGL architecture.
Figure 3: ASIC-level diagram showing the InfiniteReality graphics subsystem architecture.
3.1 Host Interface

The client-server structure of OpenGL makes it possible for essentially the entire OpenGL feature set to be implemented within the InfiniteReality graphics subsystem. The host-based OpenGL library is largely used to set up efficient data transfers to and from the graphics subsystem. For example, an immediate mode glVertex3f call returns in 7 instructions. This consists of jumping through a redirection table, writing the Vertex3f token followed by the three floating point coordinates to the graphics FIFO address, and returning.
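A hypothetical sketch of such a dispatch routine is shown below; it is not InfiniteReality's actual library code, only an illustration of how a command token plus three floats can be written straight to a memory-mapped FIFO address:

    /* Illustrative only: write an immediate-mode vertex directly into a
       memory-mapped graphics FIFO (token word, then three float words). */
    typedef unsigned int gfx_word;

    extern volatile gfx_word *gfx_fifo;     /* mapped FIFO address (hypothetical) */

    #define TOKEN_VERTEX3F 0x42u            /* hypothetical command token */

    static void fifo_vertex3f(float x, float y, float z)
    {
        union { float f; gfx_word w; } u;

        *gfx_fifo = TOKEN_VERTEX3F;
        u.f = x; *gfx_fifo = u.w;
        u.f = y; *gfx_fifo = u.w;
        u.f = z; *gfx_fifo = u.w;
    }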
OpenGL commands and data enter InfiniteReality via a high-bandwidth proprietary IO bus, where they are received by the Host Interface Processor (HIP) that decodes and dispatches OpenGL command streams. Commands can be sent either by programmed IO or via Direct Memory Access (DMA).

The HIP's Input Control and Mapping (ICU) logic arbitrates the OpenGL command stream from one of three sources: the host-filled graphics FIFO, the host-activated input DMA stream, or a local DMA stream used for calling locally cached display lists. The ICU performs basic OpenGL command stream error checking and directs commands for subsequent processing. Pixel and vertex commands and some mode changes are simply passed along for further processing. To process OpenGL command streams with data rates over 300 MB/second, the ICU must be very fast. More complex OpenGL commands involving display lists, more complicated state management, DMA setup, or non-rendering tasks can be redirected to a microcoded 32-bit RISC core. Most of the RISC core's microcode is written in C.

Display lists are cached in 15 of the 16 megabytes of external memory managed by the RISC core (one megabyte is used for state and microcode). The HIP's local DMA facility allows cached display lists to be passed through the ICU just as if the command sequence were generated by the host. Most immediate mode OpenGL calls result in IO writes to the hardware's graphics FIFO address. The graphics FIFO is mapped into the address space of direct rendering OpenGL applications [6]. OpenGL command streams can also be "pulled into" the HIP via input DMA. Large textures, pixel arrays, vertex arrays, and host-resident display lists can all be transferred this way. Because DMA transfers involve fixing host physical memory mappings, DMA is initiated with operating system support.
The HIP is also responsible for returning OpenGL data back to the host. The results of glGet*, feedback, selection, and glReadPixels are all returned via DMA. The HIP is responsible for any data reassembly required before returning the data to the host.

Figure 4: How the conceptual OpenGL state machine roughly maps to InfiniteReality's rendering ASICs.
3.2 Vertex and Pixel Transform Subsystem

The HIP sends the partially decoded OpenGL command stream to the Transform Engine Distributor (TED). The TED front end is responsible for converting OpenGL's data format rich command stream into a canonical format in preparation for handing the data to the Transform Engines (TEs) for processing. For example, double precision floating point or integer coordinates are forced to single precision floating point. Pixel data is also reformatted as necessary. Commands to change OpenGL state are mostly passed through unaltered. Given the high data bandwidths involved and the flexibility that OpenGL allows, the TED front end must be very fast.

The TED back end distributes bundles of work to 2 or 4 TEs that perform the actual vertex and pixel transformations required. Managing OpenGL's glBegin/glEnd and per-vertex state is done through a microcoded state machine. The TED also must ensure that OpenGL transformation state is synchronized among the multiple TEs to guarantee proper OpenGL command serialization semantics despite multiple active TEs. The TED performs a mapping of OpenGL command tokens to TE microcode addresses so that the TE can immediately begin command execution. Work is typically assigned to the least busy TE.
The TE ASIC is a custom microcoded floating point processor. Each TE has a peak performance of 540 megaflops, achieved using three SIMD floating point cores. The TEs use custom support logic to accelerate graphics-specific operations such as clipping. A carefully tuned memory system is essential to keep the floating point units continually busy. To minimize the amount of microcode required given the variety of geometry and pixel transformations potentially enabled, microcode modules are "stitched" together based on the current OpenGL geometry or pixel transformation state. For example, the lighting microcode module would only be added to the TE's geometry microcode sequence if lighting is currently enabled.
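The stitching idea can be pictured as composing a per-vertex pipeline from optional stages selected by the current enables. The C sketch below is purely illustrative (the stage functions, vertex type, and flags are hypothetical), not InfiniteReality microcode:

    /* Illustrative model of "stitching": build a vertex-processing chain from
       only the stages the current OpenGL enables require. Hypothetical types. */
    typedef struct vertex vertex;
    typedef void (*stage_fn)(vertex *v);

    extern void transform_stage(vertex *v);
    extern void lighting_stage(vertex *v);
    extern void texgen_stage(vertex *v);
    extern void clip_project_stage(vertex *v);

    static stage_fn pipeline[8];
    static int      num_stages;

    void stitch_pipeline(int lighting_enabled, int texgen_enabled)
    {
        num_stages = 0;
        pipeline[num_stages++] = transform_stage;
        if (lighting_enabled)
            pipeline[num_stages++] = lighting_stage;   /* only when enabled */
        if (texgen_enabled)
            pipeline[num_stages++] = texgen_stage;
        pipeline[num_stages++] = clip_project_stage;
    }

    void process_vertex(vertex *v)
    {
        int i;
        for (i = 0; i < num_stages; i++)
            pipeline[i](v);
    }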
The TEs implement the pixel path functionality, including the extended pixel path functionality described in Section 2.3.1. Special care is taken in the TED and TEs to manage pixel distribution when pixel convolution is enabled. Another pixel path challenge is memory management for the various lookup tables, convolution kernels, histogram bins, and other pixel path state that must be maintained within each TE. Both pixel rectangles and texture downloads flow through the TEs, so the identical microcode transforms both types of pixel data identically, as required by OpenGL.

The complete Transform Manager subsystem can sustain geometry transformation rates of over 11 million polygons/second.
3.3 Transformation to Rasterization Crossbar

The transformed vertices and pixels from the TEs flow out in packets that must be reordered by the Back End FIFO (BEF). The BEF is a 4 megabyte FIFO intended to minimize stalling the TEs during framebuffer clears or the rasterization of very large polygons or pixel rectangles.

The BEF broadcasts the contents of its FIFO across the Transform/Rasterization Crossbar connecting the BEF to 1, 2, or 4 Raster Manager boards. Two main types of requests are sent over the crossbar: texture (or load) requests and rendering (or draw) requests. The crossbar also feeds back to the HIP to implement selection/feedback, state retrieval, and context switching.

The BEF actually maintains two distinct FIFOs: the draw FIFO for rendering and the load FIFO for texture download. The draw FIFO takes priority over the load FIFO, but the load FIFO drains whenever the draw path is stalled. The draw path can stall because it has gotten backed up with rasterization work or because it is waiting on a texture to download. Waiting for a texture to fully download provides an interlock that ensures textures are always properly loaded before use. The advantage of this scheme is that textures can be downloaded concurrently with rendering to increase overall throughput.
3.4 Primitive Rasterization

Geometric and image primitives, texture data, and mode changes are all broadcast over the Transform/Rasterization Crossbar to the Raster Manager boards. The crossbar can sustain a maximum bandwidth of 400 MB/second. The Pixel Generator (PG) and Texel Generator (TG) ASICs on each Raster Manager listen for the data flowing from the BEF. Both the PG and TG rasterize image and geometry primitives sent over the crossbar. The PG almost completely rasterizes primitives. Depending upon the current OpenGL rasterization state, the highly pipelined PG scan converts geometric primitives, pixel zooms images, scissors, interpolates color and depth between vertices, calculates coverage alpha values for antialiasing, and applies the polygon stipple. The only rasterization steps not done in the PG are texture and fog application. The PG can sustain the rasterization of over 12 million polygons a second.
3.5 Texturing

InfiniteReality is balanced to render just as fast with its highest quality (linear mipmap linear) texturing enabled as when rendering with texturing disabled. This requires a very fast and sophisticated texture subsystem.
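For reference, "linear mipmap linear" is the trilinear filtering mode an application requests through the core texture parameter interface, as in this brief sketch (gluBuild2DMipmaps is just one convenient way to supply the mipmap levels; the image size is a placeholder):

    #include <GL/gl.h>
    #include <GL/glu.h>

    /* Request OpenGL's highest-quality fixed filtering: trilinear mipmapping. */
    void set_trilinear(const unsigned char *rgb256)
    {
        gluBuild2DMipmaps(GL_TEXTURE_2D, GL_RGB, 256, 256,
                          GL_RGB, GL_UNSIGNED_BYTE, rgb256);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER,
                        GL_LINEAR_MIPMAP_LINEAR);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    }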
Using data received over the Transform/Rasterization Crossbar and rasterization results passed to it from the PG, the TG initiates texel fetches for textured primitives in parallel with the rasterization work done by the PG. The TG needs to rasterize textured primitives only to the point that the TG can generate the necessary per-fragment texture coordinates interpolated across the primitive.

Texture coordinate information is broadcast to 8 Texture Memory (TM) ASICs. Each Raster Manager board is configured with either 16 or 64 megabytes of texture memory split evenly among the TMs. Texture accesses tend to be highly redundant, as nearby texels are often needed multiple times in the course of filtering the texels for a given textured primitive. The TMs act as specialized memory controllers that are optimized for texel access patterns.

InfiniteReality includes numerous texture extensions introduced by RealityEngine including sharpen texture, detail texture, 3D texture for volume rendering, and post-filtering texture lookup tables. InfiniteReality also includes new texture features such as clipmapping for rendering continuous terrain and various modes for better video texture mapping.
3.6 Fragment Processing

Texels from the TMs and texture coordinate information from the TG are combined in one of 4 Texture Fragment (TF) ASICs. The TFs also receive the actual fragments generated by the PG. The information from the TMs and TG is used to perform OpenGL's texture filtering modes such as linear mipmap linear filtering. A post-filtering stage can optionally scale, bias, and perform a table lookup on the filtered texels. These extra steps are OpenGL extensions that are useful for image processing and volume rendering effects. Fully filtered texels are then combined with the fragments from the PG based on the current OpenGL texture environment. If enabled, fog is applied. The last operation done by the TF is the per-fragment alpha test.

Each TF is connected to 5 Image Memory Processor (IMP) ASICs. Each IMP ASIC contains 4 instances of the IMP core. Each IMP core manages 1 megabyte of external memory containing the framebuffer. The IMPs manage 80 megabytes total per Raster Manager. Each IMP core manages a scattered distribution of pixels and receives fragments from its TF. The IMP core performs all OpenGL per-fragment operations except alpha testing, which is done in the TF, and scissoring, which is done in the PG.

The IMPs maintain multiple depth and color samples per pixel to realize order-independent antialiasing. The IMPs also perform OpenGL's accumulation buffer [4] operations.

A single Raster Manager board can sustain textured pixel fill rates of 200 megapixels per second. The combined textured fill rate with four Raster Managers is therefore 800 megapixels per second.
3.7 Display Generator Subsystem

The Display Generator board is responsible for generating analog video streams based on the current contents of the framebuffer maintained by the IMPs in the Raster Manager. InfiniteReality supports 2 or 8 analog video output channels. Each Video Output Channel (VOC) ASIC generates video requests sent over a serial interface to the IMPs. The IMPs respond with the requested framebuffer color information on the Video Bus. The core OpenGL architecture does not directly concern itself with video issues, so further details about the Display Generator are deferred until Section 6.
Figure 5: Block diagram showing the O2 system-level architecture.

Figure 6: How the conceptual OpenGL state machine roughly maps to O2's various ASICs.
3.8 Reading and Copying Pixel Data

The OpenGL pixel path draws pixels and downloads textures, but must also transform pixels that are copied (glCopyPixels) or read back to the host (glReadPixels). When a framebuffer read or copy is initiated, the IMPs send framebuffer pixels to the TFs, which transfer the data over the Readback Bus to the TED. The TED feeds the framebuffer pixel data through the TEs much as if it were pixel data originating from the host.

The fetched pixel data is transformed by the TEs and then is either rendered back into the framebuffer in the case of glCopyPixels (just like the glDrawPixels case) or is transferred back to the host in the case of glReadPixels. When reading pixels, the BEF directs the pixels across the Transform/Rasterization Crossbar, where the HIP reassembles the pixel data before DMAing the pixels back to the host.

OpenGL's requirement that texture memory must be retrievable necessitates a pathway for texels to be returned to the host. The TMs can pass texture contents to the TF, where the data passes over the Readback Bus and eventually back to the host. Unlike glReadPixels, retrieved texture contents are not transformed by the pixel path.
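From the application's side, the transfer cases that exercise this readback path are the familiar core calls; a small sketch (region sizes and positions are illustrative) follows:

    #include <GL/gl.h>
    #include <stdlib.h>

    /* The pixel-path transfer directions as seen by an application. */
    void pixel_transfers(void)
    {
        unsigned char *pixels = malloc(64 * 64 * 4);

        /* Read a 64x64 region back to host memory (transformed by the pixel path). */
        glReadPixels(0, 0, 64, 64, GL_RGBA, GL_UNSIGNED_BYTE, pixels);

        /* Copy a region within the framebuffer, also through the pixel path. */
        glRasterPos2i(128, 128);
        glCopyPixels(0, 0, 64, 64, GL_COLOR);

        /* Download a region of the framebuffer into texture memory. */
        glCopyTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, 0, 0, 64, 64, 0);

        free(pixels);
    }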
3.9 Offscreen Rendering

Excess framebuffer memory can be allocated to pbuffers for offscreen rendering, as described in Section 2.3.2. The amount of renderable offscreen memory is limited and depends on the resolution of the framebuffer. While pbuffers allow full speed offscreen rendering, because pbuffers are carved from "excess" framebuffer space, pbuffers on InfiniteReality can suffer from thrashing or volatility when pbuffer resources come into contention with other pbuffers or the "deep" ancillary buffers belonging to windows. Window framebuffer state always takes priority over pbuffers.
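For orientation, the sketch below shows pbuffer creation using the later-standardized GLX 1.3 form of the interface; the window system extension the paper refers to (GLX_SGIX_pbuffer) exposes analogous, differently suffixed entry points. Error checking is omitted for brevity:

    #include <GL/glx.h>

    /* Create a 256x256 hardware-accelerated offscreen pbuffer (GLX 1.3 API)
       and make it current; assumes configs[0] is a usable choice. */
    GLXPbuffer make_pbuffer(Display *dpy, GLXContext *ctx_out)
    {
        static const int fb_attribs[] = {
            GLX_DRAWABLE_TYPE, GLX_PBUFFER_BIT,
            GLX_RENDER_TYPE,   GLX_RGBA_BIT,
            None
        };
        static const int pb_attribs[] = {
            GLX_PBUFFER_WIDTH,  256,
            GLX_PBUFFER_HEIGHT, 256,
            None
        };
        int n = 0;
        GLXFBConfig *configs =
            glXChooseFBConfig(dpy, DefaultScreen(dpy), fb_attribs, &n);
        GLXPbuffer pbuffer = glXCreatePbuffer(dpy, configs[0], pb_attribs);

        *ctx_out = glXCreateNewContext(dpy, configs[0], GLX_RGBA_TYPE,
                                       NULL, True);
        glXMakeContextCurrent(dpy, pbuffer, pbuffer, *ctx_out);
        XFree(configs);
        return pbuffer;
    }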
3.10 Context Switching

OpenGL permits multiple concurrent contexts. InfiniteReality context switches as necessary to support multiple processes each using OpenGL. Context switches can be synchronous, such as when a process changes to a different rendering context with glXMakeCurrent, or completely asynchronous due to the operating system's scheduling of multiple concurrently rendering processes [13]. Both cases are handled basically the same way from the hardware's point of view.

A special context switch token is generated by the operating system when a context switch is required. The token "pushes" HIP, TED, TE, and BEF state out over the Transform/Rasterization Crossbar, where it is DMAed back to the host. Commands preceding the context
