`
Mark J. Kilgard
`Silicon Graphics, Inc.
`
Abstract

The OpenGL Graphics System provides a well-specified, widely-accepted
dataflow for 3D graphics and imaging. OpenGL is an architecture; an
OpenGL-capable computer is a hardware manifestation or implementation
of that architecture. The Onyx2 InfiniteReality and O2 workstations
exemplify two very different implementations of OpenGL. The two
designs respond to different cost, performance, and capability goals.

Common practice is to describe a graphics hardware implementation
based on how the hardware itself operates. However, this paper
discusses two OpenGL hardware implementations based on how they
embody the OpenGL architecture. An important thread throughout is how
OpenGL implementations can be designed not merely based on graphics
price-performance considerations, but also with consideration of
larger system issues such as memory architecture, compression, and
video processing. Just as OpenGL is influenced by wider system
concerns, OpenGL itself can provide a clarifying influence on system
capabilities not conventionally thought of as graphics-related.

CR Categories: I.3.1 [Computer Graphics]: Hardware Architecture;
I.3.6 [Computer Graphics]: Methodology and Techniques - Standards

Keywords: OpenGL, Graphics Hardware Architecture, InfiniteReality, O2

1 Introduction

The OpenGL Graphics System provides a well-specified, widely-accepted
dataflow for 3D graphics and imaging. While programmers may think of
OpenGL as simply a programming interface [7], we take the view that
OpenGL defines an architecture.

We say a set of implementations manifest an architecture when three
conditions are met:

1. The implementations must all have an identical interface and
generate functionally equivalent outputs given the same inputs
and initial state.

2. The determiner of functional equivalence is something other
than a particular implementation.

3. The determiner of functional equivalence does not necessitate
that all implementations be operationally identical. (There
must be multiple ways to implement the architecture.)

Implementations that are simply “compatible” do not necessarily
manifest an architecture. Our definition allows for an implementation
to belong to an architecture but have additional capabilities beyond
those defined by the architecture.

By our definition, OpenGL is clearly an architecture. While the
determiner of functional equivalence is not required to be a codified
specification,¹ OpenGL’s architecture is indeed defined by its
specification [11].

Implementations of an architecture typically accrue significant
advantages not available to ad hoc implementations or sets of
implementations that are compatible yet do not manifest an
architecture. Architectures gain an advantage from compatibility, but
also tend to be more adaptable and foster innovative implementations
through the freedom granted designers in how they realize the
architecture. Architectures also tend to be easy to extend because an
implementation’s behavior is typically not specified for situations
not defined by the architecture’s functional equivalence.

The intent of this paper is to explore OpenGL’s adaptability as an
architecture. What we refer to as the adaptability of an architecture
is not measured by units sold or market share. Instead, we contend
that the adaptability of an architecture should be judged by the
architecture’s ability to codify well-understood functionality, its
potential to be cleanly extended to support new capabilities, and its
ability to influence positively issues outside the scope of the
architecture itself.
`
Permission to make digital/hard copies of all or part of this material for
personal or classroom use is granted without fee provided that the copies
are not made or distributed for profit or commercial advantage, the
copyright notice, the title of the publication and its date appear, and
notice is given that copyright is by permission of the ACM, Inc. To copy
otherwise, to republish, to post on servers or to redistribute to lists,
requires specific permission and/or fee.
1997 SIGGRAPH/Eurographics Workshop
Copyright 1997 ACM 0-89791-961-0/97/8...$3.50
`
Our approach is to consider two manifestations of the OpenGL
architecture: the Onyx2 InfiniteReality graphics supercomputer and
the O2 desktop workstation. Our examples were chosen because each is
the result of quite different cost, performance, and capability
goals, but both concretely demonstrate our primary contention that
OpenGL is technically successful as an architecture because it is
extensible to encompass new capabilities within the scope of
interactive graphics and because OpenGL can positively influence
system issues not directly graphics-related. Our approach is novel
because, while we consider concrete implementations, we are
fundamentally evaluating OpenGL as a graphics system architecture,
not a particular hardware implementation.

Section 2 reviews the OpenGL architecture’s scope, philosophy,
functionality, and means of extensibility. Section 3 describes how
OpenGL is instantiated by the Silicon Graphics Onyx2 InfiniteReality.
Section 4 describes how OpenGL is instantiated by the Silicon
Graphics O2 workstation. Section 5 contrasts the two implementations
based on how they distinctly manifest the OpenGL architecture.
Section 6 discusses how the OpenGL architecture influenced and even
clarified several non-OpenGL design considerations in both example
implementations. Section 7 argues that the OpenGL architecture is
“good” because it provides us a framework for building innovative,
evolvable, well-integrated graphics systems.
`
1 The PC architecture lacks a codified specification, but what constitutes
a PC has evolved beyond the point that a PC can be described operationally
by a single implementation, as was originally the case.
`
Figure 1: The dataflow within the OpenGL architecture’s conceptual
state machine.
`
2 OpenGL is a Visualization Architecture

The OpenGL architecture addresses the task of efficiently converting
vertex- and pixel-based data representations into images. While the
“GL” in OpenGL stands for Graphics Library, we consider OpenGL’s
functionality mandate to be larger than that of a traditional 3D
graphics library. OpenGL manipulates vertex and pixel data with
comparable ease. Moreover, texture mapping provides a “bridge” to
effectively combine the rasterization of vertex- and pixel-based data
representations.

We consider SGI’s early IRIS GL implementation to exemplify the
conventional feature set of a 3D graphics library. Over time IRIS GL
added texture mapping and image processing operations to its
repertoire. These additions served as the motivation for rethinking
the purpose of a graphics library during the design of OpenGL.
Because OpenGL is well-suited for manipulating both vertex and pixel
data, supports texture mapping, and embodies an architecture, we
refer to OpenGL as a visualization architecture.
`
`2.1 State Machine Philosophy
`
OpenGL is specified as a state machine. OpenGL commands either set
state variables, retrieve state variables, retrieve framebuffer
contents, compile or call display lists, or introduce vertex or pixel
data into the state machine. Vertex and pixel data introduced into
the state machine are processed based on the current OpenGL state
settings with the results sent to the framebuffer, texture objects,
display lists, or selection/feedback buffer depending on OpenGL’s
current settings. Figure 1 shows the high-level dataflow within the
OpenGL architecture’s conceptual state machine.

Beyond OpenGL’s state machine model, several philosophical choices
help make OpenGL both extensible and adaptable to unexpected
situations. In later discussion, we note how these choices are
manifested in the two example implementations considered.

OpenGL’s state variables are orthogonal. In general, the enabling or
reconfiguring of OpenGL features does not interfere with other
features. For example, lighting calculations can be enabled or
disabled independently from the current depth buffering mode. This
means programmers can combine features with predictable results. An
often unforeseen advantage of feature orthogonality is that multiple
independent features can often be combined in useful but
unanticipated ways. Much of OpenGL’s ease of extensibility is
predicated on feature independence. Without orthogonality, multiple
architectural extensions lead to confusing interdependencies or even
create feature conflicts.
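
The orthogonality property can be illustrated with a toy state machine
in which each capability is an independent flag. This is only a sketch
and not OpenGL itself: the class ToyState and its methods are
hypothetical, and the capability names merely echo OpenGL's lighting
and depth test enables.

```python
class ToyState:
    """Toy model of orthogonal state: each capability is an independent flag."""

    def __init__(self):
        # Hypothetical capability names, loosely echoing GL_LIGHTING
        # and GL_DEPTH_TEST.
        self.enabled = {"LIGHTING": False, "DEPTH_TEST": False}

    def enable(self, cap):
        self.enabled[cap] = True

    def disable(self, cap):
        self.enabled[cap] = False

    def is_enabled(self, cap):
        return self.enabled[cap]


state = ToyState()
state.enable("DEPTH_TEST")
depth_before = state.is_enabled("DEPTH_TEST")
state.enable("LIGHTING")
state.disable("LIGHTING")
# Toggling lighting leaves the depth-test flag exactly as it was.
depth_after = state.is_enabled("DEPTH_TEST")
```

Because no flag reads or writes another, a new capability could be
added to the dictionary without revisiting the existing ones, which is
the property the extensibility argument relies on.
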
The OpenGL architecture is client-server in the abstract sense, not
necessarily in a networked sense. Client-server means that the
interface between an OpenGL application and an OpenGL implementation
is strictly defined and all data passing between the application and
implementation is explicit. The client-server separation defines the
boundary between OpenGL implementation state and that of the
application. This clear boundary makes possible network-extensible
OpenGL implementations [5] and allows OpenGL to be used as a direct
hardware interface.

Figure 2: The extended OpenGL pixel path including the convolution,
histogram, color matrix, and color table extensions.

The OpenGL architecture is data format rich. Immediate mode transfer
of pixel and vertex data can be accomplished using OpenGL’s wide
variety of data sizes and formats. This allows applications to easily
transfer their vertex and pixel data to OpenGL by traversing
application-dictated data structures. Applications can supply pixel
data using various strides, offsets, and component packings.
Application performance typically benefits from avoiding data
reformatting when transferring data to OpenGL. However, OpenGL
implementations must be ready to accept OpenGL’s multitude of
possible data formats.
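
To make the idea of strides and offsets concrete, the following sketch
reads vertex coordinates directly out of an application-defined record
layout without first copying them into a packed array. The record
layout (a 4-byte id followed by three floats) and the function name
read_vertices are assumptions made for illustration; they are not part
of OpenGL.

```python
import struct


def read_vertices(buffer, count, stride, offset):
    """Return `count` (x, y, z) float triples from `buffer`.

    `stride` is the byte distance between consecutive records and
    `offset` is the byte position of the coordinates within a record.
    """
    vertices = []
    for i in range(count):
        start = i * stride + offset
        vertices.append(struct.unpack_from("<3f", buffer, start))
    return vertices


# Hypothetical application record: a 4-byte id followed by three floats.
record = struct.Struct("<I3f")
data = record.pack(7, 1.0, 2.0, 3.0) + record.pack(8, 4.0, 5.0, 6.0)
verts = read_vertices(data, count=2, stride=record.size, offset=4)
```

The application keeps its own structure and only describes it; the
cost of understanding the layout moves to the consumer of the data,
which is the burden the paragraph above assigns to OpenGL
implementations.
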
The OpenGL architecture is configurable, but not programmable. The
OpenGL state machine can be thought of as a pipeline with a fixed
topology (though various stages may be switched in or out). This
mimics the layout of high-performance graphics subsystems where
rendering steps are decomposed and instantiated by specialized
hardware. The OpenGL architecture clearly encourages this style of
implementation. This does create situations where features such as
programmable shaders [8] or generalized image processing chains [12]
are difficult to express as extensions to the OpenGL architecture.
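
A pipeline that is configurable but not programmable can be sketched
as a fixed sequence of stages, each of which can only be switched in
or out and parameterized. The stage names and the run_pipeline helper
below are hypothetical and chosen only to show the shape of the idea.

```python
# Stage order is fixed; callers may only switch stages in or out.
PIPELINE = ("scale", "bias", "clamp")  # hypothetical stage names

STAGES = {
    "scale": lambda v, p: v * p["scale"],
    "bias": lambda v, p: v + p["bias"],
    "clamp": lambda v, p: min(max(v, 0.0), 1.0),
}


def run_pipeline(value, enabled, params):
    """Apply every enabled stage to `value` in the fixed PIPELINE order."""
    for name in PIPELINE:
        if name in enabled:
            value = STAGES[name](value, params)
    return value


params = {"scale": 2.0, "bias": 0.25}
all_on = run_pipeline(0.5, {"scale", "bias", "clamp"}, params)
no_clamp = run_pipeline(0.5, {"bias", "scale"}, params)
```

Note that the caller cannot reorder stages or insert a new one: the
set passed as `enabled` carries no ordering. That restriction is what
lets each stage map onto dedicated hardware, and it is also why an
arbitrary shader or image processing chain does not fit the model.
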
`
`2.2 Functional Decomposition
`
Sections 3 and 4 discuss how OpenGL (as specified in version 1.1) is
instantiated by our example implementations. Therefore, this section
briefly reviews OpenGL’s functionality from an architectural
standpoint. The operations are explained “bottom up” starting with
the lowest level operations that update the framebuffer and moving to
the highest level operations that accept commands.
`
`
2.2.1 Per-Fragment Processing and Rasterization

A fragment in OpenGL is the bundle of state required to update a
specific pixel in the framebuffer. Fragments are generated during
rasterization. The per-fragment operations are pixel ownership,
scissoring, alpha testing, stencil testing, depth testing, blending,
dithering, and logic op. The operations are performed in the order
listed, though what operations are enabled depends on OpenGL’s
per-fragment state variables.

Rasterization is the process of breaking a primitive up into
fragments that are passed to the per-fragment processing stage.
OpenGL supports five types of primitives: points, lines, polygons,
pixel rectangles, and bitmaps. The first step in rasterization is
determining if a framebuffer pixel is updated by the primitive.
Depending on the primitive being rasterized, the current raster
position, face culling, point size, line width, line stipple, polygon
stipple, and antialiasing state affect which pixels are updated. The
next rasterization step determines the fragment depth and color of
affected pixels. The alpha color component is altered based on the
antialiasing state of geometric primitives. The depth of geometric
primitives can be altered depending on the polygon offset state. When
enabled, texture mapping and fog modify the color of both geometric
and pixel primitives.

2.2.2 Texture Mapping and Management

Texturing maps a portion of a specified image onto each primitive for
which texturing is enabled. Texture coordinates determine what
portion of the image is mapped to the primitive. OpenGL supports both
1D and 2D textures in a wide variety of formats. Texture parameters
and the texture environment determine the method of filtering texels
and how texels are combined with fragments generated during
rasterization.

Texture objects provide the capability to switch between multiple
texture images without the overhead of respecifying the texture image
each time. Rectangular regions of textures can be incrementally
updated using subtexture loads. When a texture image is specified,
the constituent pixels are passed through the OpenGL pixel pipeline
so the same operations discussed below that apply to drawing,
copying, or reading pixel rectangles also transform texture images
when they are specified.

2.2.3 Both Vertex and Pixel Processing

OpenGL transforms application-supplied vertex coordinates to window
coordinates, clipping the primitives as necessary. Per-vertex
lighting is performed if enabled. Texture coordinates are either
explicitly supplied by the application or generated based on the
vertex coordinates.

OpenGL defines a pixel path to process pixels. The pixel path can be
configured to perform component scaling, biasing, and remapping via
table lookups. Pixels are transformed by the pixel path when pixels
are drawn to the framebuffer, read back from the framebuffer, copied
within the framebuffer, or downloaded into texture memory. Each pixel
transfer case shares the identical pixel processing machinery.

2.2.4 Other Capabilities

Display lists provide a way to cache repeated command sequences for
potentially faster execution. Evaluators provide a means to
efficiently specify Bézier curves and surfaces. Feedback and
selection redirect the results of vertex processing back to the
application instead of on to rasterization.

2.3 Extensibility

One key to an architecture’s adaptability is its extensibility.
OpenGL can be incrementally enhanced through its proven API extension
mechanisms. OpenGL’s rendering functionality can be extended by
adding extensions to OpenGL’s core rendering model. Extensions also
can be made to OpenGL’s window system dependent interface to address
issues outside OpenGL’s rendering model.
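
Applications conventionally discover which extensions an
implementation offers by examining the space-separated list of names
returned by glGetString(GL_EXTENSIONS). The sketch below shows one
careful way to test such a list: matching whole tokens rather than
substrings, so that one extension name that happens to be a prefix of
another is not reported by mistake. The helper has_extension and the
sample string are illustrative assumptions, not output from either
machine discussed in this paper.

```python
def has_extension(extension_string, name):
    """Return True if `name` appears as a whole token in `extension_string`."""
    return name in extension_string.split()


# Illustrative extension string; a real one comes from the implementation.
sample = "GL_EXT_convolution GL_EXT_histogram GL_SGI_color_matrix"

supports_histogram = has_extension(sample, "GL_EXT_histogram")
# A substring test would wrongly accept this truncated name.
supports_truncated = has_extension(sample, "GL_EXT_histo")
```
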
`
Various OpenGL vendors have already implemented dozens of extensions,
and the OpenGL 1.1 update was the result of the OpenGL Architectural
Review Board’s efforts to fold successful, proven extensions back
into the core OpenGL architecture. OpenGL 1.1 added vertex arrays,
polygon offset, RGBA logic operations, texture objects, and further
texture functionality enabled by texture objects.

The following extensions are important for later discussion.
`
2.3.1 Imaging Extensions

A key set of OpenGL extensions² are the imaging extensions [10]:
color table, convolution, color matrix, histogram, and new
per-fragment blending modes. Figure 2 shows the extended pixel path.
`
`2.3.2 Hardware Accelerated Off-screen Rendering
`
Hardware accelerated offscreen rendering is critical for a multitude
of techniques that must reliably read back or reuse rendering
results. A window system dependent extension for pixel buffers
(commonly called pbuffers) enables hardware accelerated offscreen
rendering.
`
3 OpenGL as Instantiated by InfiniteReality
`
Onyx2 InfiniteReality implements the bulk of OpenGL’s dataflow within
the InfiniteReality graphics subsystem. InfiniteReality is designed
to be a “real time” graphics machine, meaning that sustained 30 hertz
and higher frame rates are achievable even for demanding
applications. InfiniteReality’s intended application domains are
visual simulation, film & video production, real-time image
processing, volume rendering, and large-scale CAD.

InfiniteReality is a hardware-intensive design consisting of 13
distinct Application Specific Integrated Circuits (ASICs).³
InfiniteReality is a multiple-board graphics subsystem with the same
board-level architecture as the RealityEngine [1], InfiniteReality’s
predecessor. A single Transform Manager board connects to 1, 2, or 4
Raster Manager boards and a single Display Generator board. Figure 3
shows an ASIC-level block diagram of InfiniteReality.

Figure 4 shows how OpenGL’s conceptual state machine (originally
shown in Figure 1) roughly maps to InfiniteReality’s rendering ASICs.
Starting at the host interface and working towards the framebuffer
and display back-end, the following discussion shows how the OpenGL
architecture is instantiated by InfiniteReality.
`
2 Under consideration for inclusion in OpenGL 1.2.
3 Other sources of information about InfiniteReality are likely to refer to
the boards and ASICs that constitute InfiniteReality by “working names” that
grew out of historical SGI jargon and tradition. In a few cases, the working
names inadequately describe the ASIC or board’s true function in the
context of OpenGL. For example, the Geometry Engine ASIC handles both
vertex and pixel data so we refer to it here as a Transform Engine to better
suit our purpose of describing how InfiniteReality manifests the OpenGL
architecture.
`
`
`Figure 3: ASIC-level diagram showing the InfiniteReality graphics subsystem architecture.
`
`3.1 Host Interface
`
The client-server structure of OpenGL makes it possible for
essentially the entire OpenGL feature set to be implemented within
the InfiniteReality graphics subsystem. The host-based OpenGL library
is largely used to set up efficient data transfers to and from the
graphics subsystem. For example, an immediate mode glVertex3f call
returns in 7 instructions. This consists of jumping through a
redirection table, writing the Vertex3f token followed by the three
floating point coordinates to the graphics FIFO address, and
returning.
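
The shape of such a thin host-side call can be modeled in a few lines.
The sketch below is a software stand-in, not the SGI library: the
token value, the GraphicsFifo class, and the dictionary used as a
redirection table are all hypothetical. It only mirrors the sequence
described above: look up the command, write its token, then write
three single-precision coordinates.

```python
import struct

VERTEX3F_TOKEN = 0x0301  # hypothetical token value, for illustration only


class GraphicsFifo:
    """Stand-in for the memory-mapped graphics FIFO: collects written words."""

    def __init__(self):
        self.words = []

    def write(self, word):
        self.words.append(word)


def float_bits(value):
    """Return the 32-bit IEEE 754 single-precision pattern of `value`."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def make_vertex3f(fifo):
    def vertex3f(x, y, z):
        fifo.write(VERTEX3F_TOKEN)
        for coord in (x, y, z):
            fifo.write(float_bits(coord))
    return vertex3f


fifo = GraphicsFifo()
# Redirection table: command name -> current implementation of the command.
dispatch = {"Vertex3f": make_vertex3f(fifo)}
dispatch["Vertex3f"](1.0, 0.0, -1.0)
```

The indirection through the table is what lets the library swap the
implementation of a command (for example, while compiling a display
list) without the application noticing.
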
OpenGL commands and data enter InfiniteReality via a high-bandwidth
proprietary IO bus where they are received by the Host Interface
Processor (HIP) that decodes and dispatches OpenGL command streams.
Commands can be sent either by programmed IO or via Direct Memory
Access (DMA).
The HIP’s Input Control and Mapping (ICU) logic arbitrates the OpenGL
command stream from one of three sources: the host-filled graphics
FIFO, the host-activated input DMA stream, or a local DMA stream used
for calling locally cached display lists. The ICU performs basic
OpenGL command stream error checking and directs commands for
subsequent processing. Pixel and vertex commands and some mode
changes are simply passed along for further processing. To process
OpenGL command streams with data rates over 300 MB/second, the ICU
must be very fast. More complex OpenGL commands involving display
lists, more complicated state management, DMA setup, or non-rendering
tasks can be redirected to a microcoded 32-bit RISC core. Most of the
RISC core’s microcode is written in C.
Display lists are cached in 15 of the 16 megabytes of external memory
managed by the RISC core (one megabyte is used for state and
microcode). The HIP’s local DMA facility allows cached display lists
to be passed through the ICU just as if the command sequence was
generated by the host. Most immediate mode OpenGL calls result in IO
writes to the hardware’s graphics FIFO address. The graphics FIFO is
mapped into the address space of direct rendering OpenGL applications
[6]. OpenGL command streams can also be “pulled into” the HIP via
input DMA. Large textures, pixel arrays, vertex arrays, and
host-resident display lists can all be transferred this way. Because
DMA transfers involve fixing host physical memory mappings, DMA is
initiated with operating system support.

The HIP is also responsible for returning OpenGL data back to the
host. The results of glGet*, feedback, selection, and glReadPixels
are all returned via DMA. The HIP is responsible for any data
reassembly required before returning the data to the host.

Figure 4: How the conceptual OpenGL state machine roughly maps to
InfiniteReality’s rendering ASICs.
`
`3.2 Vertex and Pixel Transform Subsystem
`
The HIP sends the partially decoded OpenGL command stream to the
Transform Engine Distributor (TED). The TED front end is responsible
for converting OpenGL’s data format rich command stream into a
canonical format in preparation for handing the data to the Transform
Engines (TEs) for processing. For example, double precision floating
point or integer coordinates are forced to single precision floating
point. Pixel data is also reformatted as necessary. Commands to
change OpenGL state are mostly passed through unaltered. Given the
high data bandwidths involved and the flexibility that OpenGL allows,
the TED front end must be very fast.
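
The effect of forcing coordinates to a single canonical format can be
sketched in software. The helper names to_single and canonicalize are
hypothetical; the sketch only demonstrates the numerical consequence
of the conversion described above, not how the TED performs it.

```python
import struct


def to_single(value):
    """Round a Python number (int or double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", float(value)))[0]


def canonicalize(coords):
    """Convert a coordinate tuple of mixed numeric types to float32 values."""
    return tuple(to_single(c) for c in coords)


canonical_ints = canonicalize((3, -2, 0))
canonical_doubles = canonicalize((0.1, 0.5, 1.0))
```

Integers and values such as 0.5 survive unchanged, while a double such
as 0.1 is rounded to its nearest single-precision neighbor. Downstream
stages then need to handle only one coordinate format.
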
The TED back end distributes bundles of work to 2 or 4 TEs that
perform the actual vertex and pixel transformations required.
Managing OpenGL’s glBegin/glEnd and per-vertex state is done through
a microcoded state machine. The TED also must ensure that OpenGL
transformation state is synchronized among the multiple TEs to
guarantee proper OpenGL command serialization semantics despite
multiple active TEs. The TED performs a mapping of OpenGL command
tokens to TE microcode addresses so that the TE can immediately begin
command execution. Work is typically assigned to the least busy TE.

The TE ASIC is a custom microcoded floating point processor. Each TE
has a peak performance of 540 megaflops achieved using three SIMD
floating point cores. The TEs use custom support logic to accelerate
graphics-specific operations such as clipping. A carefully tuned
memory system is essential to keep the floating point units
continually busy. To minimize the amount of microcode required given
the variety of geometry and pixel transformations potentially
enabled, microcode modules are “stitched” together based on the
current OpenGL geometry or pixel transformation state. For example,
the lighting microcode module would only be added to the TE’s
geometry microcode sequence if lighting is currently enabled.

The TEs implement the pixel path functionality including the extended
pixel path functionality described in Section 2.3.1. Special care is
taken in the TED and TEs to manage pixel distribution when pixel
convolution is enabled. Another pixel path challenge is memory
management for the various lookup tables, convolution kernels,
histogram bins, and other pixel path state that must be maintained
within each TE. Both pixel rectangles and texture downloads flow
through the TEs and so the identical microcode transforms both types
of pixel data identically as required by OpenGL.

The complete Transform Manager subsystem can sustain geometry
transformation rates of over 11 million polygons/second.

3.3 Transformation to Rasterization Crossbar

The transformed vertices and pixels from the TEs flow out in packets
that must be reordered by the Back End FIFO (BEF). The BEF is a 4
megabyte FIFO intended to minimize stalling the TEs during
framebuffer clears or the rasterization of very large polygons or
pixel rectangles.

The BEF broadcasts the contents of its FIFO across the
Transform/Rasterization Crossbar connecting the BEF to 1, 2, or 4
Raster Manager boards. Two main types of requests are sent over the
crossbar: texture (or load) requests and rendering (or draw)
requests. The crossbar also feeds back to the HIP to implement
selection/feedback, state retrieval, and context switching.

The BEF actually maintains two distinct FIFOs: the draw FIFO for
rendering and the load FIFO for texture download. The draw FIFO takes
priority over the load FIFO, but the load FIFO drains whenever the
draw path is stalled. The draw path can stall because it has gotten
backed up with rasterization work or because it is waiting on a
texture to download. Waiting for a texture to fully download provides
an interlock that ensures textures are always properly loaded before
use. The advantage of this scheme is that textures can be downloaded
concurrently with rendering to increase overall throughput.

3.4 Primitive Rasterization

Geometric and image primitives, texture data, and mode changes are
all broadcast over the Transform/Rasterization Crossbar to the Raster
Manager boards. The crossbar can sustain a maximum bandwidth of 400
MB/second. The Pixel Generator (PG) and Texel Generator (TG) ASICs on
each Raster Manager listen for the data flowing from the BEF. Both
the PG and TG rasterize image and geometry primitives sent over the
crossbar. The PG almost completely rasterizes primitives. Depending
upon the current OpenGL rasterization state, the highly pipelined PG
scan converts geometric primitives, pixel zooms images, scissors,
interpolates color and depth between vertices, calculates coverage
alpha values for antialiasing, and applies the polygon stipple. The
only rasterization steps not done in the PG are texture and fog
application. The PG can sustain the rasterization of over 12 million
polygons a second.

3.5 Texturing
`
InfiniteReality is balanced to render just as fast with its highest
quality (linear mipmap linear) texturing enabled as when rendering
with texturing disabled. This requires a very fast and sophisticated
texture subsystem.

Using data received over the Transform/Rasterization Crossbar and
rasterization results passed to it from the PG, the TG needs to
initiate texel fetches for textured primitives in parallel with the
rasterization work done by the PG. The TG needs to rasterize only
textured primitives to the point that the TG can generate the
necessary per-fragment texture coordinates interpolated across the
primitive.

Texture coordinate information is broadcast to 8 Texture Memory (TM)
ASICs. Each Raster Manager board is configured with either 16 or 64
megabytes of texture memory split evenly among the TMs. Texture
accesses tend to be highly redundant as nearby texels are often
needed multiple times in the course of filtering the texels for a
given textured primitive. The TMs act as specialized memory
controllers that are optimized for texel access patterns.

InfiniteReality includes numerous texture extensions introduced by
RealityEngine including sharpen texture, detail texture, 3D texture
for volume rendering, and post-filtering texture lookup tables.
InfiniteReality also includes new texture features such as
clipmapping for rendering continuous terrain and various modes for
better video texture mapping.
`
`3.6 Fragment Processing

Texels from the TMs and texture coordinate information from the TG
are combined in one of 4 Texture Fragment (TF) ASICs. The TFs also
receive the actual fragments generated by the PG. The information
from the TMs and TG are used to perform OpenGL’s texture filtering
modes such as linear mipmap linear filtering. A post-filtering stage
can optionally scale, bias, and perform a table lookup on the
filtered texels. These extra steps are OpenGL extensions that are
useful for image processing and volume rendering effects. Fully
filtered texels are then combined with the fragments from the PG
based on the current OpenGL texture environment. If enabled, fog is
applied. The last operation done by the TF is the per-fragment alpha
test.

Each TF is connected to 5 Image Memory Processor (IMP) ASICs. Each
IMP ASIC contains 4 instances of the IMP core. Each IMP core manages
1 megabyte of external memory containing the framebuffer. The IMPs
manage 80 megabytes total per Raster Manager. Each IMP core manages a
scattered distribution of pixels and receives fragments from its TF.
The IMP core performs all OpenGL per-fragment operations except alpha
testing, which is done in the TF, and scissoring, which is done in
the PG.
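
Although InfiniteReality divides the per-fragment operations among
the PG, TF, and IMP, the result must match what a single sequential
pass would produce. The sketch below models a subset of those
operations in the order given in Section 2.2.1. It is a hypothetical
software model, not a description of the hardware: colors are single
floats, the alpha test passes when alpha exceeds a reference value,
the depth test passes when the fragment is nearer, and blending
weights by source alpha. Stencil, dithering, and logic op are
omitted.

```python
def process_fragment(frag, pixel, state):
    """Apply scissor, alpha test, depth test, and blend, in that order.

    `pixel` is a (color, depth) pair. Returns the updated pair, or the
    unchanged pair if the fragment is discarded by any enabled test.
    """
    x, y = frag["pos"]
    if state.get("scissor"):
        x0, y0, w, h = state["scissor_box"]
        if not (x0 <= x < x0 + w and y0 <= y < y0 + h):
            return pixel
    if state.get("alpha_test") and not frag["alpha"] > state["alpha_ref"]:
        return pixel
    color, depth = pixel
    if state.get("depth_test"):
        if not frag["depth"] < depth:
            return pixel
        depth = frag["depth"]
    if state.get("blend"):
        a = frag["alpha"]
        new_color = a * frag["color"] + (1.0 - a) * color
    else:
        new_color = frag["color"]
    return (new_color, depth)


state = {"scissor": True, "scissor_box": (0, 0, 4, 4),
         "alpha_test": True, "alpha_ref": 0.1,
         "depth_test": True, "blend": True}
background = (0.0, 1.0)
visible = {"pos": (1, 1), "alpha": 0.5, "color": 1.0, "depth": 0.5}
clipped = {"pos": (5, 1), "alpha": 0.5, "color": 1.0, "depth": 0.5}
updated = process_fragment(visible, background, state)
untouched = process_fragment(clipped, background, state)
```

Because each test either discards the fragment or passes it along
unchanged, the early tests can be hoisted into earlier hardware units
(as InfiniteReality does with scissoring and the alpha test) without
altering the final pixel.
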

The IMPs maintain multiple depth and color samples per pixel to
realize order-independent antialiasing. The IMPs also perform
OpenGL’s accumulation buffer [4] operations.

A single Raster Manager board can sustain textured pixel fill rates
of 200 megapixels per second. The combined textured fill rate with
four Raster Managers is therefore 800 megapixels per second.
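
The framebuffer capacity and fill rate figures in this section follow
from simple multiplication, which the short check below reproduces.
The constants come from the text; reading the 4 TFs as a per Raster
Manager count and the linear scaling for the two-board configuration
are assumptions, since the text states only the one-board and
four-board rates.

```python
TFS_PER_RASTER_MANAGER = 4   # read here as a per-board count (assumption)
IMP_ASICS_PER_TF = 5
CORES_PER_IMP_ASIC = 4
MEGABYTES_PER_CORE = 1

framebuffer_mb = (TFS_PER_RASTER_MANAGER * IMP_ASICS_PER_TF
                  * CORES_PER_IMP_ASIC * MEGABYTES_PER_CORE)

FILL_RATE_PER_RM = 200  # textured megapixels per second, from the text
# Assumes fill rate scales linearly with the number of Raster Managers.
fill_rates = {boards: boards * FILL_RATE_PER_RM for boards in (1, 2, 4)}
```
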
`
`3.7 Display Generator Subsystem
`
The Display Generator board is responsible for generating analog
video streams based on the current contents of the framebuffer
maintained by the IMPs in the Raster Manager. InfiniteReality
supports 2 or 8 analog video output channels. Each Video Output
Channel (VOC) ASIC generates video requests sent over a serial
interface to the IMPs. The IMPs respond with the requested
framebuffer color information on the Video Bus. The core OpenGL
architecture does not directly concern itself with video issues so
further details about the Display Generator are put off until
Section 6.

Figure 5: Block diagram showing the O2 system-level architecture.

Figure 6: How the conceptual OpenGL state machine roughly maps to
O2’s various ASICs.
`
`3.8 Reading and Copying Pixel Data
`
The OpenGL pixel path draws pixels and downloads textures, but must
also transform pixels that are copied (glCopyPixels) or read back to
the host (glReadPixels). When a framebuffer read or copy is
initiated, the IMPs send framebuffer pixels to the TFs that transfer
the data over the Readback Bus to the TED. The TED feeds the
framebuffer pixel data through the TEs much as if it were pixel data
originated from the host.

The fetched pixel data is transformed by the TEs and then is either
rendered back into the framebuffer in the case of glCopyPixels (just
like the glDrawPixels case) or is transferred back to the host in the
case of glReadPixels. When reading pixels, the BEF directs the pixels
across the Transform/Rasterization Crossbar where the HIP reassembles
the pixel data before DMAing the pixels back to the host.

OpenGL’s requirement that texture memory must be retrievable
necessitates a pathway for texels to be returned to the host. The TMs
can pass texture contents to the TF where data passes over the
Readback Bus and eventually back to the host. Unlike glReadPixels,
retrieved texture contents are not transformed by the pixel path.
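
The sharing of one pixel path among drawing, reading, and copying,
with texture retrieval bypassing it, can be summarized in a small
model. All function names below are hypothetical, pixels are single
floats in the range 0 to 1, and the transfer operations are reduced
to scale, bias, clamp, and an optional lookup table.

```python
def pixel_transfer(pixels, scale=1.0, bias=0.0, table=None):
    """Apply scale, bias, clamp, and an optional lookup table to each value."""
    out = []
    for p in pixels:
        v = min(max(p * scale + bias, 0.0), 1.0)
        if table is not None:
            v = table[round(v * (len(table) - 1))]
        out.append(v)
    return out


def draw_pixels(framebuffer, pixels, **xfer):
    # Host pixels go through the pixel path on their way in.
    framebuffer[:] = pixel_transfer(pixels, **xfer)


def read_pixels(framebuffer, **xfer):
    # Framebuffer pixels go through the same path on their way out.
    return pixel_transfer(framebuffer, **xfer)


def copy_pixels(framebuffer, **xfer):
    # A copy is a read fed straight back into a draw.
    framebuffer[:] = pixel_transfer(framebuffer, **xfer)


def get_texture(texture):
    # Retrieved texture contents bypass the pixel path entirely.
    return list(texture)


fb = [0.0, 0.0, 0.0]
draw_pixels(fb, [0.0, 0.25, 0.5], scale=2.0)
read_back = read_pixels(fb, bias=0.5)
inverted = read_pixels(fb, table=[1.0, 0.0])
texels = get_texture([0.2, 0.8])
```

Routing every case through one function is the software analogue of
feeding framebuffer pixels back through the TEs: there is a single
implementation of the transfer operations to keep consistent.
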
`
`3.9 Offscreen Rendering
`
Excess framebuffer memory can be allocated to pbuffers for offscreen
rendering as described in Section 2.3.2. The amount of renderable
offscreen memory is limited and depends on the resolution of the
framebuffer. While pbuffers allow full speed offscreen rendering,
because pbuffers are carved from “excess” framebuffer space, pbuffers
on InfiniteReality can suffer from thrashing or volatility when
pbuffer resources come into contention with other pbuffers or the
“deep” ancillary buffers belonging to windows. Window framebuffer
state always takes priority over pbuffers.
`
`3.10 Context Switching
`
OpenGL permits multiple concurrent contexts. InfiniteReality context
switches as necessary to support multiple processes each using
OpenGL. Context switches can be synchronous, such as when a process
changes to a different rendering context with glXMakeCurrent, or
completely asynchronous due to the operating system’s scheduling of
multiple concurrently rendering processes [13]. Both cases are
handled basically the same way from the hardware’s point of view.
A special context switch token is generated by the operating
system when a context switch is required. The token “pushes”
HIP, TED, TE, and BEF state out over the Transform/Rasterization
Crossbar where it is DMAed back to the host. Commands preceding
`the conte