`MEDIATEK, Ex. 1013, Page 1
`IPR2018-00101
`
`

`

`
`
PROCEEDINGS
Annual Conference Series 2001

SIGGRAPH 2001
Conference Proceedings
August 12-17, 2001
Papers Chair: Eugene Fiume

A Publication of ACM SIGGRAPH

Sponsored by the ACM's Special
Interest Group on Computer
Graphics
`
`
`
`
`

`

SIGGRAPH 2001, Los Angeles, California, August 12-17, 2001
`
`The Association for Computing Machinery, Inc.
`1515 Broadway
New York, New York 10036

Copyright © 2001 by the Association for Computing Machinery, Inc. (ACM). Permission to make digital or hard copies of
portions of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyright for
components of this work owned by others than ACM must be honored. Abstracting with credit is permitted.

To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Request permission to republish from: Publications Department, ACM, Inc. Fax +1-212-869-0481 or e-mail
permissions@acm.org.

For other copying of articles that carry a code at the bottom of the first or last page, copying is permitted provided that the
per-copy fee indicated in the code is paid through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923.

Notice to Past Authors of ACM-Published Articles
ACM intends to create a complete electronic archive of all articles and/or other material previously published by ACM. If you
have written a work that was previously published by ACM in any journal or conference proceedings prior to 1978, or any
SIG newsletter at any time, and you do NOT want this work to appear in the ACM Digital Library, please inform
permissions@acm.org, stating the title of the work, the author(s), and where and when published.

ACM ISBN: 1-58113-374-X

Additional copies may be ordered prepaid from:

ACM Order Department
P.O. Box 11405
Church Street Station
New York, NY 10286-1405

Phone: 1-800-342-6626
(USA and Canada)
+1-212-626-0500
(All other countries)
Fax: +1-212-944-1318

E-mail: acmhelp@acm.org
`
`ACM Order Number: 428010
`
`Printed in the USA
`
`
`

`

Computer Graphics Proceedings, Annual Conference Series, 2001

A User-Programmable Vertex Engine

Erik Lindholm
erikl@nvidia.com

Mark J. Kilgard
mjk@nvidia.com

Henry Moreton
moreton@nvidia.com

NVIDIA Corporation

ABSTRACT
In this paper we describe the design, programming interface, and implementation of a very efficient user-programmable vertex engine. The vertex engine of NVIDIA's GeForce3 GPU evolved from a highly tuned fixed-function pipeline requiring considerable knowledge to program. Programs operate only on a stream of independent vertices traversing the pipe. Embedded in the broader fixed function pipeline, our approach preserves parallelism sacrificed by previous approaches. The programmer is presented with a straightforward programming model, which is supported by transparent multi-threading and bypassing to preserve parallelism and performance.

In the remainder of the paper we discuss the motivation behind our design and contrast it with previous work. We present the programming model, the instruction set selection process, and details of the hardware implementation. Finally, we discuss important API design issues encountered when creating an interface to such a device. We close with thoughts about the future of programmable graphics devices.

Keywords
Graphics Hardware, Graphics Systems.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SIGGRAPH 2001, 12-17 August 2001, Los Angeles, CA.
© 2001 ACM 1-58113-374-X/01/08...$5.00

1 INTRODUCTION

Figure 1: Graphics Processing Unit (GPU). Legible block labels: Host Interface, Primitive Assembly/Setup, Raster/Texture, Framebuffer Interface.

The dramatic increases in the computational power of graphics processing units (GPUs, Figure 1) have been fueled both by innovation and the continuing improvement in process technologies. The need for increased performance has driven, and been driven by, increasingly rich graphics APIs. The motivation behind the creation of the user-programmable geometry engine described in this paper is twofold. First, the increasing configurability required by continually evolving graphics APIs requires a programmable device to support the combinatorial explosion of mode combinations. Second, high-performance programmability is an end unto itself. Given the right programming model, with a sufficient degree of target processor independence, the need for rapidly evolving graphics APIs is reduced, and an opportunity is created for inventiveness unconstrained by fixed-function, modally configured APIs and hardware. Further, compatibility across hardware generations and platforms will increase the lifespan and utility of programs written for geometry processors.

The programming model and design of the geometry engine in the GeForce3 was guided by several factors: commodity pricing, design time, area, legacy performance, programmable performance, programmability, and platform independence. Ultimately, all of these influence the commercial viability of the design. Design time obviously determines time to market. Area is directly linked to product cost. Previously existing applications must exhibit higher performance on new products. There can only be a slight performance penalty paid for taking advantage of programmability. To gain acceptance, the engine must be easy to program. Finally, to promote adoption across vendors, a standard interface is required and thus the functionality cannot be too tightly coupled to a specific hardware implementation; for example, CPU implementations must be viable.

We provide a taxonomic description of previous programmable graphics processors, comparing them to our device. We show how the programming model can be effectively supported by a custom processor design. We describe how a programmable processing element can be incorporated into an existing graphics API. Finally, we illustrate how the programming model and interface may be used to efficiently implement complex custom effects.

2 PREVIOUS WORK
Geometric calculations have been accelerated for over 30 years, starting with early flight simulators. Among the best known is the Geometry Engine [5]. A system was built from 12 instances of the GE, coupled with a raster subsystem built out of AMD2903s. The GE was fabricated using a 3µm feature size and housed in a 40-pin package. The GeForce3 GPU is manufactured using a 0.18µm process with a ~550-pin package. So while available logic has increased by a factor of 300, the relative amount of available bandwidth has only increased by a factor of 14. Note that increases in clock frequency cancel in this relative measure. We provide these numbers simply to illustrate that the problem is continually evolving, and that the natural amount of computation performed by the GPU today is far more than was performed in years past, and probably a fraction of what will be appropriate tomorrow.

The various products and technologies applied to performing the standard geometry processing tasks can be categorized by a small number of attributes: technology, arrangement, and programmability. The technology is one of ASIC, DSP, RISC
`
`

`

CPU, and CPU extensions. Arrangement refers to the approaches to exploiting parallelism, such as SIMD or MIMD. Each system's programmability may be characterized by whether it was intended for end-user programming, and the relative ease with which it was programmed.
`
The only non-parallel implementation, the Stellar GS1000 [4], used a supercomputer-like vector processor, and was driven by hand-coded assembly for critical paths.
`
Pixar's CHAP [17] and the Ikonas [7] are early examples of fine-grain SIMD processors, based on the AMD2903, user micro-codable by skilled programmers. These machines operated in parallel on pixel and vertex components. The only coarse-grained SIMD implementation of which we are aware is the geometry subsystem of the Indigo Extreme [11]. It was implemented using a hand micro-coded ASIC. The Indigo processed eight triangles in parallel, stalling if any of the group were clipped, or otherwise required branching.
`
Following the original Geometry Engine, the IRIS GT [3] and The Pixel Machine [24] were the only machines to arrange floating point DSPs in pipeline fashion. As has been observed by many, the slowest processor in the pipeline gated these machines' performance. Since it was only practical to distribute the geometry tasks statically, the pipelines were inefficient for certain workloads.
`
MIMD machines dominate the history of geometry processors. In each case the individual processors operated on single triangles. The Raster Tech GX4000 [26],[27] was the earliest example, followed by Pixel-Planes 5 [10], the outoooovs [15], Pixel Flow [19], and the RealityEngine [2]. The GX4000 used a Weitek floating point DSP, while all but one of the remaining machines used the i860XP [13], a 64-bit microprocessor. The last of the MIMD geometry subsystems was the InfiniteReality [23], using a custom micro-coded ASIC built to exceed the performance available in third party processors. The InfiniteReality's processor was micro-coded in SIMD fashion within each of the processors in a MIMD array of configurable size.
`
Alternatives to the above large high-performance machines are the processor extensions, all of which exploit fine-grained SIMD parallelism similar to the CHAP and Ikonas. Each of these exploits the existing resources and clock rate of a general purpose CPU to deliver high performance. MIPS-3D ASE [18] and 3DNow! [1] perform paired single SIMD floating point operations. Intel's SSE instructions [14] express 4-wide SIMD processing. Motorola's AltiVec [9] delivers the full 4-wide SIMD performance. Sony's Emotion Engine [16] has two 4-wide SIMD processors. The first is interfaced to the main CPU as a coprocessor, executing instructions directly from the application's instruction stream. The second processor is more loosely coupled, running loaded subroutines, typically performing standard geometry processing tasks.
`
In all cases, experts were required to very carefully craft assembly code to achieve processor performance approaching theoretical peaks. Close attention to pipeline latency, hazards, and stall conditions was necessary to produce good results. While compilers were generally available, generated code was typically of inadequate performance.
`
In contrast to virtually all of these systems, our geometry engine only exposes the programmability of a small part of the larger geometry pipeline. Tasks such as vertex load/store, format conversion, primitive assembly, clipping, and triangle setup occur completely in parallel, in pipeline fashion. We use 4-wide fine-grained SIMD floating point to provide the necessary performance, and run multiple execution threads to maintain efficiency and provide a very simple programming model.
`
3 PROGRAMMING MODEL
In this section we describe our programming model for geometry processing and discuss the design in the areas of input, output, data path, and instruction set selection. We include the rationale for choices made in the design process.
`
3.1 Vertex Processing
There were two main possibilities for processing the vertex stream: as independent vertices or as part of a geometric primitive, for example a triangle. The advantage of primitive-level information is enabling operations such as culling, reducing processing time. However, we determined that the increases in complexity and loss of parallelism in the primitive model did not justify the perceived benefits. We chose an independent vertex program model to exploit the parallel nature of the task, and greatly simplify the resulting programming task.
We preserved the latter stages of the fixed function programming model, there being no benefit to their programmability. In fact, incorrect clipping could freeze a hardware rasterizer. As such we leave frustum clipping, perspective divide, and viewport scale and bias to subsequent implementation-specific processing. The programming model is capable of expressing everything in the fixed function pipeline except user clip planes. We instead recommend encoding plane distances into texture coordinates and using fragment level operations to implement this functionality.
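The clip-plane recommendation can be pictured with a minimal sketch, which is an illustration under stated assumptions and not the authors' implementation. It assumes the vertex program evaluates the signed plane distance as a 4-component dot product and writes it to a texture coordinate, and that a fragment-level test then rejects negative interpolated values. The function names are hypothetical.

```python
def plane_distance(plane, position):
    """Signed distance term: dot of plane (a, b, c, d) with position (x, y, z, w)."""
    return sum(p * v for p, v in zip(plane, position))

def fragment_visible(interpolated_distance):
    """Hypothetical fragment-level test: keep fragments on the positive side."""
    return interpolated_distance >= 0.0

# The half-space x >= 1 written as the plane (1, 0, 0, -1).
plane = (1.0, 0.0, 0.0, -1.0)
inside = plane_distance(plane, (3.0, 0.0, 0.0, 1.0))    # vertex on the kept side
outside = plane_distance(plane, (0.0, 0.0, 0.0, 1.0))   # vertex on the clipped side
```

Because the distance is linear in position, interpolating it across a triangle gives the per-fragment distance, which is why a texture coordinate is a natural carrier for it.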
3.2 Precision and Data Type
IEEE single precision floating point has been used for many years as the standard precision for 3D transformations, and to keep the model simple it was adopted as the only data type. The common data in 3D graphics are 3 and 4 component vectors, for example position, normal, texture coordinates and colors. The basic data type is therefore the quad-float vector written as [x,y,z,w].
`
3.3 Scalar and Vector Handling
It was critical to deal efficiently with scalar packing/extraction and vector data in this design since the 3D transform pipeline mixes these operations. Two simple concepts can resolve this:
1. On input, vectors can have their components arbitrarily rearranged/replicated (swizzled).

2. Any operation generating a scalar must generate that scalar replicated across all components, and output writes have a component write mask.

A scalar value in a vector register can be replicated into a vector through (1), and then stored again as a scalar through (2). Swizzling is very useful for doing cross products efficiently, where the source vectors need to be rotated. Another use is converting constants such as [-1,0,1,2] into others such as [0,0,1,0] or [-1,-1,-1,1].
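As a hedged illustration of input swizzling, the sketch below models a quad-float as a Python tuple and a swizzle as a four-letter pattern. The helper name `swizzle` and the source constant (-1, 0, 1, 2) are assumptions chosen for the example, not part of the hardware interface.

```python
def swizzle(vec, pattern, negate=False):
    """Rearrange/replicate components of a quad-float, optionally negating them."""
    index = {"x": 0, "y": 1, "z": 2, "w": 3}
    sign = -1.0 if negate else 1.0
    return tuple(sign * vec[index[c]] for c in pattern)

c = (-1.0, 0.0, 1.0, 2.0)           # one stored constant
unit_z = swizzle(c, "yyzy")         # (0, 0, 1, 0)
mixed = swizzle(c, "xxxz")          # (-1, -1, -1, 1)
splat = swizzle(c, "wwww")          # scalar w replicated across all components
```

A single stored constant can therefore stand in for several others, which saves constant memory at no instruction cost.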
`
3.4 Program Model
The program model is illustrated in Figure 2. The current vertex attributes are available in the input (source) registers, and the processed vertex is written into the output (destination) registers. The constant bank holds transform and light parameters, and the register file (R) holds temporary results. A function unit (F) implements the instruction set.

Making the vertex source read-only by the vertex program, and the destination write-only, recognizes the streaming nature of the design and simplifies implementation.
`
`
(clamped, only valid for points). Having a fog output permits more general fog effects than using the position's z or w values, and is interpolated before use as a distance in the standard fog equations. We allow for up to eight texture coordinate sets that can be used for traditional texturing as well as more novel effects in combination with GeForce3's texture shader and register combiner per-fragment functionality [20]. Texture coordinates are assumed to be full precision and range, as well as perspective correct when used in pixel programs.
All instruction writes have an optional 4-component write mask.

Table 1: Output Attributes

All vertex output registers are initialized to (0.0,0.0,0.0,1.0) at the start of a vertex program. Subsequent writes then apply the output write mask to update the selected components. This avoids any problems with undefined outputs, and having to verify raster subsystem input options.
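The interaction between output initialization and the write mask can be sketched as follows. This is a toy model under the assumption that an output register is a 4-tuple; `masked_write` is a hypothetical helper name.

```python
def masked_write(dest, value, mask="xyzw"):
    """Return dest with only the components named in mask replaced by value."""
    index = {"x": 0, "y": 1, "z": 2, "w": 3}
    out = list(dest)
    for c in mask:
        out[index[c]] = value[index[c]]
    return tuple(out)

OUTPUT_DEFAULT = (0.0, 0.0, 0.0, 1.0)   # state at the start of a vertex program

# A program that only writes the x and y components of a texture coordinate:
texcoord = masked_write(OUTPUT_DEFAULT, (0.25, 0.75, 9.0, 9.0), mask="xy")
```

The unwritten z and w components keep their initial 0.0 and 1.0, so later pipeline stages always see defined values.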
`
3.7 Instruction Set
The instruction set consists of 17 operations. These can be divided into vector, scalar, and miscellaneous operations. We discuss the instructions selected after explaining the constraints we chose to impose.

Figure 2: Program Model
`
`
`
Input Attributes
There are 16 quad-float vertex source attribute registers. Fixed function mode typically requires a position, normal, two colors, up to eight texture coordinate sets, skin weights, fog, and point size. These are sent from the host in many formats including shorts, integers, and floats, with conversion to floating point done before the data is accessed. Unspecified attribute components default to 0.0 for the second and third components, and 1.0 for the fourth. The attributes are all persistent, that is they retain their data until they are changed by subsequent API calls. They are addressed from 0 to 15. An API write to attribute 0 (the position when in fixed function mode) will invoke the vertex program. Only one vertex attribute may be read per instruction.
`
`
To hold constants such as matrices, light positions, and plane equations that are used in typical vertex programs, there is a constant bank of 96 quad-floats. It may only be loaded before vertices are processed (for example outside of Begin/End). The size was chosen based on fixed function memory usage, and to allow a reasonably large set of matrices for indexed skinning. As with source attributes, only one constant may be read by one instruction. The program may not write to constants since it would create a dependency between vertices, forcing serialization and causing a serious performance impact. There is one integer address register that may be loaded using an instruction (ARL). This address register allows for indexed constant reads, with out-of-range reads returning (0,0,0,0).
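A minimal sketch of indexed constant access through the address register follows. The float-to-integer conversion by floor and the helper names `arl` and `read_constant` are assumptions made for illustration; the text only states that the register holds a signed integer and that out-of-range reads return zeros.

```python
import math

NUM_CONSTANTS = 96
constants = [(0.0, 0.0, 0.0, 0.0)] * NUM_CONSTANTS
constants[10] = (1.0, 2.0, 3.0, 4.0)    # e.g. one row of a skinning matrix

def arl(scalar):
    """Load the address register: float to signed integer (floor is an assumption)."""
    return math.floor(scalar)

def read_constant(base, address):
    """Indexed constant read; out-of-range addresses return (0, 0, 0, 0)."""
    i = base + address
    if 0 <= i < NUM_CONSTANTS:
        return constants[i]
    return (0.0, 0.0, 0.0, 0.0)
```

Returning zeros for a bad index, instead of faulting, keeps every vertex on the same instruction path.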
`
The register file is 12 quad-floats in size and allows three reads and one write per instruction. The size was chosen to allow reasonably simple modular code design, where some of the registers would be used for storage of variables across multiple modules. All registers are initialized to (0,0,0,0) per vertex. The same vector read may be sourced as multiple operands, and individually swizzled/negated each time: see Figure 2. Since any source can be negated there is no need for a subtract instruction.
`
Output Attributes
Since vertex program outputs merge back into the fixed function pipeline at the homogeneous clip space point, there is a standard set of output attributes. Position is used for clipping. Vertex color components are automatically clamped to the range 0.0 to 1.0. There is also a fog distance, and point size output
`
3.7.1 No Branching
The fixed function transform paths in OpenGL [25] and Direct3D [6] are both controlled by global state that does not depend on the actual data supplied with each vertex. This allows for driver optimizations at the time the first vertex is supplied by the application since all subsequent vertices (until a new state change) can then share this carefully optimized path. The result is a code segment that removes state checking and branching. It is therefore possible to support the full fixed function transform path (at least to homogeneous clip space) without branching. The decision was therefore made to not support branching, keeping the hardware as simple as possible. Also, late binding changes in control flow disrupt pipeline efficiency. Simple if/then/else evaluation is still supported through sum-of-products using 1.0 and 0.0, which can be generated with SLT and SGE.
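The sum-of-products idiom can be sketched as below. This is an illustrative model of the comparison instructions on 4-tuples; the `select` helper is a hypothetical name for the combination, not an instruction.

```python
def slt(a, b):
    """Set-on-less-than: 1.0 where a < b, else 0.0 (component-wise)."""
    return tuple(1.0 if x < y else 0.0 for x, y in zip(a, b))

def sge(a, b):
    """Set-on-greater-or-equal: 1.0 where a >= b, else 0.0 (component-wise)."""
    return tuple(1.0 if x >= y else 0.0 for x, y in zip(a, b))

def select(a, b, if_less, otherwise):
    """Branch-free 'a < b ? if_less : otherwise' as a sum of products."""
    lt, ge = slt(a, b), sge(a, b)
    return tuple(l * t + g * f for l, g, t, f in zip(lt, ge, if_less, otherwise))

result = select((0.0, 5.0, 2.0, 2.0), (1.0, 1.0, 2.0, 3.0),
                (10.0, 10.0, 10.0, 10.0), (20.0, 20.0, 20.0, 20.0))
```

Both candidate values are always computed; the 1.0/0.0 masks merely decide which one survives the multiply and add.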
3.7.2 Constant Latency
One instruction set constraint we imposed was that our hardware implementation must issue any instruction per clock and execute
`
option. We also wanted an accurate power function conforming to the [...] model, hence known approximations would not suffice. It is possible to implement the LIT instruction with about 10 other instructions, but the performance loss is extreme.
`
The LOG base 2 instruction returns an output accurate to about 11 mantissa bits as well as two partial results: the exponent and mantissa of the source scalar. A more accurate user-programmed approximation based on the limited range mantissa can be computed, with the result added to the exponent. The EXP base 2 instruction also returns an output accurate to about 11 mantissa bits as well as two partial results: two raised to the power of floor(source), and fraction(source). A more accurate user-programmed approximation based on the limited range fraction can be computed, with the result multiplied by the power output. The precision of these instructions was based on the desired 8-bit color accuracy of the specular LIT operation. It takes about 10 instructions to achieve full accuracy LOG and EXP evaluation.
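The role of the partial results can be shown with a short sketch. It assumes the mantissa is normalized to [1, 2) and uses the library `math.log2` and `**` as stand-ins for the user-programmed polynomial approximations, which the text does not specify. All function names are hypothetical.

```python
import math

def log_partials(x):
    """Split x > 0 into (exponent, mantissa) with mantissa in [1, 2)."""
    m, e = math.frexp(x)            # x = m * 2**e with m in [0.5, 1)
    return float(e - 1), m * 2.0

def refined_log2(x):
    """Exponent plus a log2 evaluated only on the limited-range mantissa."""
    exponent, mantissa = log_partials(x)
    return exponent + math.log2(mantissa)    # stand-in for a user polynomial

def refined_exp2(x):
    """2**floor(x) multiplied by 2**fraction(x), with the fraction in [0, 1)."""
    whole = math.floor(x)
    return math.ldexp(1.0, whole) * 2.0 ** (x - whole)
```

Restricting the approximation to a narrow interval is what makes a short polynomial accurate enough; the exponent or power-of-two part is exact.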
`
The MIN and MAX operations allow for clamping and absolute value computations (MAX of source and -source). Related to these are the SLT and SGE instructions that return 1.0 if the component comparison is true and 0.0 if false.
`
The ARL instruction was added to allow support of vertex-dependent constant access such as a matrix or plane equation. It converts a floating-point scalar into a signed integer, which can be used as an offset into the constant memory. Out-of-range reads from constant memory return (0,0,0,0).
`
Sources are negated by prefixing a - sign, and can be swizzled via four optional subscripts that describe the component rearrangement desired. For example:
MOV R0, -R1.wyzy;
moves the negated w component of register R1 into the x component of register R0, moves the negated y and z components across, and uses the negated y component again to place into the R0 w component.
`
The destination of an instruction has an optional write mask of the desired xyzw components to be written. For example:
ADD R0.xw, R1, R2;
updates the x and w components of R0 with the sum of R1 and R2.
`
4 HARDWARE IMPLEMENTATION
4.1 Overview
The hardware implementation of vertex programs is divided into two main blocks: the vertex attribute buffer (VAB) and the floating point core.

Figure 3: Hardware Units. Legible block labels: Vertex In, Vector FP Core, Vertex Out.

The VAB is responsible for vertex attribute persistence, and the floating-point core processes the instruction set.
`
all instructions with the same latency, limiting the complexity of any instruction. This improves programmability and simplifies the hardware. All operands are immediately available, limiting the size of register and memory banks.
3.7.3 Instruction Set Rationale
Since we wanted to use the same instruction set for vertex programs and fixed function (non-programmable) mode, we started by analyzing the fixed function implementation of a previous architecture. We found that the equivalents of the MOV, MUL, ADD, and MAD instructions were used about 50% of the time, and that the DP3 and DP4 equivalents were used about 40% of the time. We support dot products for their coding convenience, and also because as the number of cycles spent on a vertex decreases over architectural generations, it becomes more important to have powerful concise instructions. Cross products are also important, and they can be done via an efficient MUL, MAD sequence with source vector rotations. For example, R1 = R0xR2 is done as:
MUL R1, R0.zxyw, R2.yzxw;
MAD R1, R0.yzxw, R2.zxyw, -R1;
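The two-instruction cross product can be checked with a small emulation. This is a sketch that models registers as 4-tuples; the helpers `swz`, `mul`, `mad`, and `neg` are illustrative stand-ins for the hardware's swizzle, MUL, MAD, and source negation.

```python
IDX = {"x": 0, "y": 1, "z": 2, "w": 3}

def swz(v, pattern):
    """Swizzle: pick source components in the order given by pattern."""
    return tuple(v[IDX[c]] for c in pattern)

def mul(a, b):
    return tuple(x * y for x, y in zip(a, b))

def mad(a, b, c):
    return tuple(x * y + z for x, y, z in zip(a, b, c))

def neg(v):
    return tuple(-x for x in v)

def cross(r0, r2):
    r1 = mul(swz(r0, "zxyw"), swz(r2, "yzxw"))              # MUL R1, R0.zxyw, R2.yzxw
    return mad(swz(r0, "yzxw"), swz(r2, "zxyw"), neg(r1))   # MAD R1, R0.yzxw, R2.zxyw, -R1
```

The first instruction forms the subtracted products, and the second forms the added products and subtracts the first result through source negation, so no separate subtract is needed.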
`
We support reciprocal (RCP) instead of division due to the constant latency restriction. The RCP instruction is also scalar since the main use of it is in the perspective division of w in homogeneous clip space (done after the vertex program), which involves the multiply of the (x,y,z) vector with the scalar 1/w.
`
The reciprocal square root (RSQ) is mainly used in normalizing vectors to be used in lighting equations. The typical sequence is a DP3 to find the vector length squared, an RSQ to get the reciprocal length, and a MUL to normalize the vector. It is very convenient to use the vector w component for storing the length squared and reciprocal length values. RSQ is also a scalar operator.
To avoid problems with vector lengths of 0.0 causing RSQ to return infinity, we mandated that 0.0 times anything be 0.0. This is also useful in conditional evaluation when multiplying by 0.0. Another mandate is that 1.0 times anything be the same value.
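A minimal sketch of the normalization sequence, including the zero-times-anything rule, is given below. The helper `gpu_mul` is a hypothetical name for a multiply that follows the mandate; plain IEEE arithmetic would produce NaN for 0.0 times infinity.

```python
import math

def gpu_mul(a, b):
    """Multiply under the mandate that 0.0 times anything is 0.0."""
    return 0.0 if a == 0.0 or b == 0.0 else a * b

def rsq(x):
    """Reciprocal square root; returns infinity for 0.0."""
    return math.inf if x == 0.0 else 1.0 / math.sqrt(x)

def normalize(v):
    length_sq = sum(c * c for c in v[:3])                # DP3: length squared
    inv_len = rsq(length_sq)                             # RSQ: reciprocal length
    return tuple(gpu_mul(c, inv_len) for c in v[:3])     # MUL: scale the vector
```

With this rule a zero-length vector normalizes to (0, 0, 0) without any special-case test in the program.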
`
A major exception to our goal of similar performance in fixed function and program mode involved lighting. The previous architecture design has a separate hard-wired lighting engine. Since it was too hard to expose this engine in program mode, the decision was made to turn it off when running vertex programs. Fixed function performance with heavy lighting can therefore be twice as fast as a comparable vertex program. To alleviate this problem, two instructions were included: DST and LIT. The DST instruction assists in constructing attenuation factors of the form:
(K0,K1,K2)·(1,d,d*d) = K0 + K1*d + K2*d*d
where d is some distance. Since d*d and 1/d are natural byproducts of the vector normalization process, these values are input as (NA,d*d,d*d,NA) and (NA,1/d,NA,1/d) to DST, which then returns the (1,d,d*d,1/d) vector. The last 1/d term can be used with a DP4 operation if desired.
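The DST data flow can be sketched as follows. The per-component mapping in `dst` is chosen to reproduce the inputs and result quoted above (y is a product of the two y inputs, z and w are passed through); the `attenuation` helper is a hypothetical name for the dot product that follows.

```python
def dst(src0, src1):
    """(NA, d*d, d*d, NA) and (NA, 1/d, NA, 1/d) -> (1, d, d*d, 1/d)."""
    return (1.0, src0[1] * src1[1], src0[2], src1[3])

def attenuation(k, d):
    """K0 + K1*d + K2*d*d as a 3-component dot product with the DST result."""
    dd, inv_d = d * d, 1.0 / d       # byproducts of normalization
    vec = dst((0.0, dd, dd, 0.0), (0.0, inv_d, 0.0, inv_d))
    return sum(ki * vi for ki, vi in zip(k, vec[:3]))
```

The distance d itself is never computed with a square root here; it falls out of multiplying d*d by 1/d.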
The LIT instruction does the fairly complex ambient, diffuse, and specular calculations with clamping based on N·L, N·H, and the power p. The calculations are:
Output.x = 1.0;                       // ambient
Output.y = max(N·L, 0.0);             // diffuse
Output.z = 0.0;                       // specular
if (N·L > 0.0 && p == 0.0)
    Output.z = 1.0;
else if (N·L > 0.0 && N·H > 0.0)
    Output.z = (N·H)^p;
Output.w = 1.0;
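The LIT rules translate directly into a short function. This is a sketch, with the power evaluated through a base-2 log, a multiply, and a base-2 exponential, using Python's `math.log2` and `**` as stand-ins for the hardware sequence; the function name `lit` and its argument names are illustrative.

```python
import math

def lit(n_dot_l, n_dot_h, p):
    """Return (ambient, diffuse, specular, 1.0) following the LIT rules."""
    specular = 0.0
    if n_dot_l > 0.0 and p == 0.0:
        specular = 1.0
    elif n_dot_l > 0.0 and n_dot_h > 0.0:
        # (N.H)^p computed as exp2(p * log2(N.H))
        specular = 2.0 ** (p * math.log2(n_dot_h))
    return (1.0, max(n_dot_l, 0.0), specular, 1.0)
```

A surface facing away from the light (N·L not positive) gets neither diffuse nor specular contribution, which is the clamping the instruction folds in.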
`
Since LIT implements the specular power function via use of a log, multiply, and exp sequence, we also decided to expose the LOG and EXP instructions. Since the power is a variable in the LIT source, a table needing a pre-known specular power was not an
4.2 Attribute Input
Vertex attributes are converted to floating point representation before arriving at the VAB, which has room for the 16 vertex attributes. The contents of each address default to (0.0,0.0,0.0,1.0)
`
when an attribute write arrives, and are then overwritten by the valid data components. This is required since the API allows for sending fewer than four components; defaulting the remainder saves bandwidth into the GPU.

Figure 4: VAB
`
The VAB drains into a number of input buffers (IB) that are used to feed the floating-point core in a round-robin fashion. Dirty bits are maintained in the VAB so that only changed attributes are updated when the same buffer is again the drain target. The transfer of a vertex is triggered by a write to address 0, corresponding to the vertex position in fixed function mode. To prevent bubbles during simultaneous loading and draining of the VAB, incoming writes may push out the contents of the target address, superseding a default drain sequence.
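The persistence, defaulting, and dirty-bit behavior can be modeled with a toy class. This is a sketch under stated assumptions, not a description of the actual hardware: the class name, the number of input buffers, and the bookkeeping with Python sets are all illustrative, and the push-out optimization for simultaneous load and drain is omitted.

```python
DEFAULT = (0.0, 0.0, 0.0, 1.0)

class AttributeBuffer:
    """Toy model: 16 persistent attributes, per-buffer dirty bits, round-robin drain."""

    def __init__(self, num_input_buffers=3):     # buffer count is an assumption
        self.attrs = [DEFAULT] * 16
        self.buffers = [[DEFAULT] * 16 for _ in range(num_input_buffers)]
        self.dirty = [set(range(16)) for _ in range(num_input_buffers)]
        self.target = 0
        self.copies = 0                           # attribute transfers performed

    def write(self, address, *components):
        # Missing trailing components take the (0, 0, 0, 1) defaults.
        self.attrs[address] = tuple(components) + DEFAULT[len(components):]
        for bits in self.dirty:
            bits.add(address)
        if address == 0:                          # position write launches the vertex
            self.drain()

    def drain(self):
        buf, bits = self.buffers[self.target], self.dirty[self.target]
        for address in bits:
            buf[address] = self.attrs[address]    # copy only changed attributes
        self.copies += len(bits)
        bits.clear()
        self.target = (self.target + 1) % len(self.buffers)
```

In this model, once every input buffer has been filled once, a vertex that changes only its position costs a single attribute transfer.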
`
4.3 The Floating-Point Core
The floating-point core is a multi-threaded vector processor operating on quad-float data. Vertex data is read from the input buffers and transformed into the output buffers (OB). The latency of the vector and special function units are equal and multiple vertex threads are used to hide this latency.
The SIMD Vector Unit is responsible for the MOV, MUL, ADD, MAD, DP3, DP4, DST, MIN, MAX, SLT, and SGE operations. The Special Function Unit is responsible for the RCP, RSQ, LOG, EXP, and LIT operations.

Figure 5: Floating Point Core
`
The Vector Unit floating-point precision is approximately IEEE. There is no support for denormalized numbers or exceptions, and rounding is always towards negative infinity. The hardware outputs 0.0 for a multiply with any source of 0.0, including 0.0*infinity and 0.0*NaN. The Special Function Unit calculates the RCP and RSQ functions to within about 1.5 bits of IEEE precision using two-pass Newton-Raphson iteration from a seed table. While lighting may suffice with a lower precision RSQ, texture and position evaluation can require much higher precision. It was not felt necessary to provide a low-precision RSQ option. The hardware accepts one instruction per clock and fully implements all instruction set input/output options with no performance penalty. All input vectors are available with no latency.
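Seeded Newton-Raphson refinement for the reciprocal can be sketched as below. The 64-entry table, its midpoint seeds, and the function name `rcp` are assumptions for illustration; the paper states only that two passes start from a seed table. The sketch runs in double precision, so its accuracy figures do not model the hardware's single-precision result.

```python
import math

SEED_BITS = 6    # table size is an assumption; the paper does not give one
SEEDS = [1.0 / (0.5 + (i + 0.5) / 2 ** (SEED_BITS + 1)) for i in range(2 ** SEED_BITS)]

def rcp(a):
    """Approximate 1/a for finite nonzero a: table seed plus two Newton-Raphson passes."""
    m, e = math.frexp(abs(a))                    # |a| = m * 2**e with m in [0.5, 1)
    x = SEEDS[int((m - 0.5) * 2 ** (SEED_BITS + 1))]
    for _ in range(2):
        x = x * (2.0 - m * x)                    # Newton-Raphson step toward 1/m
    return math.copysign(math.ldexp(x, -e), a)
```

Each pass roughly squares the relative error, so a seed good to about 7 bits reaches well beyond single-precision accuracy after two passes.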
`
5 PROGRAMMING INTERFACES
Given the predominance of OpenGL and Direct3D, the integration of programmable geometry into these 3D programming interfaces is vital to its widespread availability and quick adoption. The discussion below concentrates on how we integrated programmable geometry into OpenGL through an extension named NV_vertex_program. Where Direct3D makes alternative design choices, such choices are noted.
`
`5.1 Design Goals
`Existing OpenGL applications
`I. Backward compatibilitv.
`unaware of programmable
`geometry
`should
`operate
`unchanged.
`
`2.
`
`3.
`
`It should be relatively straightforward to
`Ease ofadoption.
`integrate
`prograrmnable
`geometry
`into
`an
`existing
`application without overhauling the way in which vertex data
`is presented to OpenGL. Moreover. applications should be
`able to mix existing fixed function vertex processing with
`programmable geometry.
`
`in our view. programmable geometry frees
`Fonvard focus.
`programmers from existing API conventions of what a
`“vertex normal“ or a “light direction” is; the vertex program
`supplies these semantic connections. transcending per-vertex
`attributes and vertex-related naming. By not constraining
`programmable geometry to existing conventions, we hope
`this will encourage novel applications for programmable
`geometry. including automatic generation of vertex programs
`by higher-level software [22].
`
`4.
`
`Preparation to expate firrwe progranrmabr‘llgr. We believe
`that other
`functionality beyond vertex processing in
`OpenGL's dataflow will eventually be programmable as
`well. The programming interface should be amenable to
`exposing other types ofprogrammability.
`5. Well-defined execution environment. Preliminary feedback
`   from developers and our own thinking convinced us that an
`   unconstrained execution environment for programmable
`   geometry would lead to headaches for developers. Unlike
`   textures that can usually be down-sampled if too large,
`   vertex programs that require more instructions, registers, or
`   other resources that are not available on a given
`   implementation cannot be easily simplified to cope with
`   implementation limitations. For this reason, we chose to
`   require a strict, well-defined execution environment.
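
The fifth goal can be pictured as an all-or-nothing resource check: a program either fits the fixed environment or is rejected, since there is no analogue of down-sampling a texture. The sketch below is a minimal illustration, not the extension's actual validation logic. The 16 attributes and 96 parameters are the figures quoted in Section 5.2; the instruction and temporary-register limits, and every name in the code (`Limits`, `Requirements`, `check_program`), are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    """Fixed execution environment (values partly assumed, see lead-in)."""
    max_instructions: int = 128  # assumed placeholder
    max_temporaries: int = 12    # assumed placeholder
    max_attributes: int = 16     # figure quoted in Section 5.2
    max_parameters: int = 96     # figure quoted in Section 5.2


@dataclass(frozen=True)
class Requirements:
    """Resources a hypothetical vertex program needs."""
    instructions: int
    temporaries: int
    attributes: int
    parameters: int


def check_program(req, limits=Limits()):
    """Return the violated limits; an empty list means the program fits."""
    checks = [
        ("instructions", req.instructions, limits.max_instructions),
        ("temporaries", req.temporaries, limits.max_temporaries),
        ("attributes", req.attributes, limits.max_attributes),
        ("parameters", req.parameters, limits.max_parameters),
    ]
    return [
        f"{name}: needs {needed}, limit is {limit}"
        for name, needed, limit in checks
        if needed > limit
    ]
```

Because the limits are identical on every conforming implementation in this model, a program that passes the check once is known to run everywhere, which is the predictability the goal asks for.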
`
`5.2 Programming Model
`NV_vertex_program augments OpenGL vertex processing with a
`new mode known as vertex program mode. Initially, vertex
`program mode is disabled. When disabled, vertices are
`transformed by OpenGL's conventional vertex-processing
`functionality, consisting of coordinate transformation, vertex
`lighting, texture coordinate generation, and user-defined clip
`planes.
`
`Vertex program state affects the OpenGL dataflow only when
`vertex program mode is enabled, so vertex program mode being
`initially disabled ensures backward compatibility.
`Vertex program mode is enabled as follows:
`
`glEnable(GL_VERTEX_PROGRAM_NV);
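
The effect of the enable switch can be modeled as a simple dispatch. The sketch below is a toy model in Python rather than the OpenGL API: the class `VertexProcessor` and its method names are hypothetical, and the two processing paths are stand-in callables supplied by the caller.

```python
class VertexProcessor:
    """Toy model of vertex program mode; not the OpenGL API itself."""

    def __init__(self, fixed_function, vertex_program):
        self.fixed_function = fixed_function  # conventional path
        self.vertex_program = vertex_program  # programmable path
        self.program_mode_enabled = False     # the mode starts disabled

    def enable_program_mode(self):
        self.program_mode_enabled = True

    def disable_program_mode(self):
        self.program_mode_enabled = False

    def process(self, vertex):
        # Only the enable flag decides which path a vertex takes, so an
        # application that never sets it sees unchanged behavior.
        if self.program_mode_enabled:
            return self.vertex_program(vertex)
        return self.fixed_function(vertex)


# Usage: the same processor mixes both paths by toggling the mode.
processor = VertexProcessor(
    fixed_function=lambda v: ("fixed", v),
    vertex_program=lambda v: ("program", v),
)
before = processor.process((0.0, 0.0, 0.0, 1.0))
processor.enable_program_mode()
during = processor.process((0.0, 0.0, 0.0, 1.0))
```

Toggling the flag between batches of vertices is how this model expresses the second design goal, mixing fixed function processing with programmable geometry in one application.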
`
`When enabled, a glVertex command (or equivalent) initiates
`vertex program execution. The current vertex program processes
`the current 16 vertex attributes and 96 program parameters as
`described in Section 3.5. At vertex program completion, the
`vertex result registers contain a transformed vertex that is further
`processed to screen space and forwarded to primitive assembly.
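
As a rough illustration of this execution model, the sketch below runs a hypothetical one-purpose "program" over 16 attribute vectors and 96 parameter vectors and writes a result register. It assumes that attribute 0 holds the position and that parameters 0 to 3 hold the rows of a 4x4 transform; that layout, the result name `"position"`, and the function names are choices made for the example, not requirements stated in the text.

```python
NUM_ATTRIBUTES = 16   # per-vertex inputs, as stated above
NUM_PARAMETERS = 96   # program parameters, as stated above


def dp4(a, b):
    """Four-component dot product."""
    return sum(x * y for x, y in zip(a, b))


def run_transform_program(attributes, parameters):
    """Transform attribute 0 by the matrix stored in parameters 0..3."""
    if len(attributes) != NUM_ATTRIBUTES:
        raise ValueError("expected 16 vertex attributes")
    if len(parameters) != NUM_PARAMETERS:
        raise ValueError("expected 96 program parameters")
    position = attributes[0]  # assumed layout: position in slot 0
    # One dot product per matrix row fills the result register.
    return {"position": tuple(dp4(parameters[row], position) for row in range(4))}


# Usage: translate the vertex (1, 2, 3, 1) by 10 units along x.
attributes = [(0.0, 0.0, 0.0, 1.0)] * NUM_ATTRIBUTES
attributes[0] = (1.0, 2.0, 3.0, 1.0)
parameters = [(0.0, 0.0, 0.0, 0.0)] * NUM_PARAMETERS
parameters[0] = (1.0, 0.0, 0.0, 10.0)
parameters[1] = (0.0, 1.0, 0.0, 0.0)
parameters[2] = (0.0, 0.0, 1.0, 0.0)
parameters[3] = (0.0, 0.0, 0.0, 1.0)
results = run_transform_program(attributes, parameters)
```

The returned dictionary stands in for the vertex result registers; in the pipeline described above, later stages would take that transformed vertex on to screen space and primitive assembly.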
`
`5.2.1 Vertex Program Objects
`Multiple vertex programs are managed via program objects, but
`there is a single current vertex program that i
