(12) United States Patent
Rubinstein et al.

(10) Patent No.: US 6,856,320 B1
(45) Date of Patent: Feb. 15, 2005

(54) DEMAND-BASED MEMORY SYSTEM FOR GRAPHICS APPLICATIONS

(75) Inventors: Oren Rubinstein, Sunnyvale, CA (US); Ming Benjamin Zhu, San Jose, CA (US)

(73) Assignee: NVIDIA U.S. Investment Company, Santa Clara, CA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 294 days.

(21) Appl. No.: 09/709,964

(22) Filed: Nov. 10, 2000

Related U.S. Application Data

(63) Continuation-in-part of application No. 08/978,491, filed on Nov. 25, 1997.

(51) Int. Cl.7 ................ G06F 12/02
(52) U.S. Cl. ................ 345/543; 345/537; 345/538; 345/545; 345/562
(58) Field of Search ........ 345/543, 582, 423, 516, 517, 553, 568, 545, 537, 562; 711/208, 207, 205, 206, 209

(56) References Cited

U.S. PATENT DOCUMENTS

6,344,852 B1 * 2/2002 Zhu et al.
6,380,935 B1 * 4/2002 Heeschen et al.
6,396,473 B1 * 5/2002 Callahan et al.

* cited by examiner

Primary Examiner—Matthew C. Bella
Assistant Examiner—Hau Nguyen
(74) Attorney, Agent, or Firm—Townsend and Townsend and Crew LLP

(57) ABSTRACT

A memory system and methods of operating the same that drastically increase the efficiency in memory use and allocation in graphics systems. In a graphics system using a tiled architecture, instead of pre-allocating a fixed amount of memory for each tile, the invention dynamically allocates varying amounts of memory per tile depending on the demand. In one embodiment all or a portion of the available memory is divided into smaller pages that are preferably equal in size. Memory allocation is done by page based on the amount of memory required for a given tile.

6 Claims, 5 Drawing Sheets
[Representative drawing (FIG. 1): graphics system 100 — CPU, interface controller 106, system memory 108 (AGP), geometry processor, binning engine, rendering engine 118, memory controller 114, video backend, graphics memory 116, and frame buffer.]

MEDIATEK Ex. 1006, Page 1
[Sheet 1 of 5 — FIG. 1: block diagram of graphics system 100, including the CPU, interface controller 106, system memory 108, geometry processor, binning engine, memory controller, rendering engine, graphics memory 116, and frame buffer. FIG. 2: portion 200 of a frame segmented into tiles 202a, 202b, 202c.]
[Sheet 2 of 5 — FIG. 3: exemplary page descriptor with LOGICAL ADDRESS, INDEX, PHYSICAL ADDRESS, and VALID fields. FIGS. 4A and 4B: memory configurations showing pages marked "P" alongside the frame buffer and back buffer (each 800x600x16 bit).]
[Sheet 3 of 5 — FIG. 5: memory controller with input interface 514 (to host CPU, binning engine, and render engine), TMU 504, MMU 502, tile descriptor cache, page descriptor cache, output interface 510 (to memory), and registers 512. FIG. 6: MMU 502 detail with request arbitration 612 (triggers from TMU), page walk state machine 602, page prefetch state machine 604, page allocation state machine 606, page freeing state machine 608, registers read/write accesses, and page descriptor address logic issuing memory access requests to the page descriptor cache.]
[Sheet 4 of 5 — FIG. 7: tile table 700 (one of two) holding tile descriptors with Last_Addr, current index, and valid fields, and their relationship to a page descriptor 706.]
[Sheet 5 of 5 — FIG. 8: buffer-swap flowchart — binning engine "close buffer" and rendering subsystem "close buffer" paths each wait ("raster buffer closed?" / "binning buffer closed?") before the swap-buffers step. FIG. 9: TMU detail with input interface, command arbitration 904, TMU state machine 900, buffer swap state machine 902, registers read/write, page access triggers and memory access requests to the MMU and tile descriptor cache.]

MEDIATEK Ex. 1006, Page 6
DEMAND-BASED MEMORY SYSTEM FOR GRAPHICS APPLICATIONS
CROSS-REFERENCES TO RELATED APPLICATIONS

This application is a continuation-in-part of commonly assigned U.S. patent application Ser. No. 08/978,491, titled "Rendering Pipeline," by Zhu, filed Nov. 25, 1997, and hereby incorporated by reference in its entirety.
BACKGROUND OF THE INVENTION

The present invention relates in general to memory systems, and in particular to a memory system that provides for efficient use and allocation of memory resources in graphics applications.

Limited memory bandwidth has been a major obstacle in improving the performance of traditional 3-D graphics pipelines. Tiling or "chunking" has evolved as an alternative architecture that reduces the demand on memory bandwidth in 3-D graphics pipelines. A graphics pipeline using a tiling architecture segments the rasterization area into a regular grid of tiles. Input geometry and commands are then "binned" into separate memory regions for each tile, such that only the geometry and commands which affect rasterization of a given tile need be read by the rasterization hardware. By storing, for example, one tile's worth of data in memory that is local to the processor, external memory bandwidth requirements are significantly reduced.

While this type of tile-based architecture improves memory bandwidth, it also increases the demand for larger memory. Pipelining techniques, whereby one frame is rendered at the same time the next frame is being binned, require even larger memory since a rendering buffer is provided in addition to a binning buffer. It therefore becomes necessary to allocate memory resources in this type of tiled architecture as efficiently as possible. Conventional approaches, however, pre-allocate a fixed amount of memory for each tile. In this type of memory system inefficiencies arise when, for example, the amount of geometry data for a given tile is larger than the pre-allocated amount of memory. This results in the inability of the graphics hardware to process all the data and render the tile. Conversely, a particular tile with a small amount of geometry data may require less memory than that provided by the pre-allocated memory, resulting in portions of the pre-allocated memory going unused.

There is therefore a need for a memory system that provides for more efficient use and allocation of memory resources in graphics applications.
SUMMARY OF THE INVENTION

The present invention provides a memory system and methods of operating the same that drastically increase the efficiency in memory use and allocation in graphics systems. Broadly, in a graphics system using a tiled architecture, instead of pre-allocating a fixed amount of memory for each tile, the present invention dynamically allocates varying amounts of memory per tile depending on the demand. In one embodiment all or a portion of the available memory is divided into smaller "pages" that are preferably equal in size. Memory allocation is done by page based on the amount of memory required for a given tile. The smaller the size of the page the higher the granular resolution, and therefore the more efficient memory allocation becomes. In one embodiment, the size of the page is made configurable. This page-based memory system removes the limitation on the memory size for a single tile by simply allocating more pages when necessary. Therefore, a tile wherein a large amount of geometry may lie would not cause an exception. In one embodiment, when all geometry data stored in a page of memory has been rendered, the page is freed and made available for reallocation. This effectively reduces the overall amount of memory used for binning, eliminating the need for double-buffering for a system that pipelines the binning and the rendering process.

Accordingly, in one embodiment, the present invention provides a method of processing graphics data comprising dividing an image rendering area into a plurality of tiles; binning input graphics data; organizing memory into a plurality of pages; and allocating one or more pages of memory to store binned data per tile, wherein the number of pages allocated is determined by the size of the binned data for each tile.

In another embodiment, the present invention provides a memory system comprising memory divided into a plurality of pages and configured to store graphics data; a memory management unit coupled to the memory and configured to maintain a plurality of page descriptors corresponding to the plurality of pages; and a tile management unit coupled to the memory and configured to maintain a plurality of tile descriptors corresponding to a plurality of screen tiles.

In yet another embodiment, the present invention provides a computer system for processing graphics data, comprising a binning engine coupled to receive graphics data and configured to bin graphics data into screen tiles; memory coupled to the binning engine; and a rendering engine coupled to the memory and configured to render binned graphics data, wherein the memory is divided into a plurality of pages, and is configured to allocate one or more pages per tile depending on tile requirements.

The following detailed description and the accompanying drawings provide a better understanding of the nature and advantages of the page-based memory system of the present invention.
BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a graphics system that processes graphics data in accordance with an embodiment of the present invention;

FIG. 2 illustrates a portion of an exemplary frame comprising a number of tiles;

FIG. 3 shows an exemplary page descriptor according to the present invention;

FIGS. 4A and 4B show examples of memory configurations according to the present invention for graphics memory and system memory, respectively;

FIG. 5 is a block diagram for an implementation of a memory controller according to an exemplary embodiment of the present invention;

FIG. 6 is a functional block diagram of an exemplary implementation for a memory management unit according to the present invention;

FIG. 7 shows a tile descriptor table and the relationship between tile and page tables according to an illustrative embodiment of the present invention;

FIG. 8 illustrates an embodiment of buffer swapping according to the invention; and

FIG. 9 is a functional block diagram of an exemplary implementation for a tile management unit according to the present invention.
DETAILED DESCRIPTION OF SPECIFIC EMBODIMENTS

FIG. 1 is a block diagram of a graphics system 100 that processes graphics data in accordance with an embodiment
of the present invention. A graphics processor 102 is coupled to a central processing unit (CPU) 104 by a bus, such as an advanced graphics protocol (AGP) bus, via an interface controller 106. System memory 108 also couples to CPU 104 via interface controller 106. Graphics processor 102 includes a geometry processor 110 that receives graphics data from CPU 104. Graphics data typically includes geometry data and mode data. Geometry data comprises information relating to various polygons (e.g., triangles, parallelograms, rectangles, circles, etc.) which are processed to produce a complete image. Geometry data specifies, for example, the vertices (e.g., in X,Y,Z coordinates) and color (e.g., red-green-blue (RGB)) combinations for various polygons. Mode data comprises information relating to various modes affecting the appearance of one or more geometries when displayed. For example, for a given geometry, mode data can define or specify one or more "textures" (e.g., fur, brick, bark, sky), blending effects, translucence effects, and the like, which may be applied to the rendered geometry.

Geometry processor 110 supplies the graphics data to a binning engine 112. Using the graphics data, binning engine 112 reproduces the associated geometries and modes in an image frame which comprises a plurality of grid-like regions referred to as tiles. FIG. 2 shows a portion of a frame segmented into a number of tiles. A tile may be defined, for example, by a 32 pixel by 32 pixel square area of the frame. Binning engine 112 determines which tiles are "touched" by each geometry. For every tile, the graphics data for each geometry touching the tile is linked to the tile. This linked data is output by binning engine 112 per tile. A memory controller 114 is coupled to binning engine 112 and routes the tile data for storage into various portions, or bins, of a graphics memory 116. The portion of graphics memory 116 that is designated for storing binning data is sometimes referred to herein as binning memory. A rendering engine 118 is coupled to memory controller 114 and accesses the binned graphics data contained in graphics memory 116 to render an image for display. In the rendering engine a tile frame buffer may be cached locally to eliminate external frame buffer accesses. The process of rendering an image is thus optimized by storing graphics data into local graphics memory (or binning memory) such that it can be readily retrieved by rendering engine 118 to generate an image frame.

It is to be understood that the various functional blocks in the graphics system 100 may be implemented by a combination of hardware and/or software, and that in specific implementations some or all of the functionality of the various blocks may be combined into an integrated solution. For example, while the diagram shows graphics memory 116 as a separate block from graphics processor 102, a certain amount of memory for binning or other purposes may be provided in the graphics processor block. Another implementation may combine interface controller and graphics processor, and combine graphics and system memory resources into an integrated chipset. Geometry processor 110 may be implemented as part of CPU 104, eliminating the need for a separate processor. Also, in alternative embodiments, the binning function may be implemented by software on a general purpose CPU or DSP.

The memory requirements for such a system will next be examined using exemplary numbers for illustrative purposes. To render, for example, 1 Million triangles per second (1 Mtris/s), at an exemplary frame rate of 30 Hz, the amount of memory needed to buffer the geometry is approximately:

30~60 Mbytes/s / 30/s = 1~2 Mbytes

The buffering size scales linearly with the performance such that, for example, for 5 Mtris/s, the amount of required memory would be approximately:

5 * 1~2 Mbytes = 5~10 Mbytes

This increase in demand for memory is exacerbated by the desire to bin geometries of a current frame in parallel with rendering geometries of a previous frame. Double buffering has been proposed as a means to enable this type of parallel processing of consecutive frames. While binning of the current frame is going on in one buffer, the rendering of the previous frame is using the other buffer. The roles of these two buffers are simply switched at the frame boundary. Unfortunately, double buffering requires too much buffering memory to accommodate a rendering performance of 5~10 Mtris/s.

To address the ever increasing demand for memory by such graphics systems, the present invention implements a demand-based scheme to drastically increase efficiency in use and allocation of memory resources. Broadly, the buffering memory is segmented into pages of sufficient granularity, one or more of which can then be allocated per tile depending on that tile's requirements. The invention divides the pages of memory into two types: pages currently allocated to tiles, and unused pages. Unused pages are kept track of in an unused page pool. Each tile is allocated with a variable number of pages scattered in the memory. In one embodiment, pages in the unused page pool or each tile are organized in a linked list referred to herein as a chain. Whenever the screen tiler needs more pages so that it can store more data to a tile, it attempts to capture pages from the unused pool. If it succeeds, the references of these pages are removed from the unused page pool, and these pages are allocated to the requesting tile. If it fails, the screen tiler stalls (which in turn stalls the upstream geometry processing) and waits until pages get released back into the unused pool. After the content of a page has been consumed by the rendering pipeline, the page is released back into the unused page pool. This allows the rendering buffer and the binning buffer to share memory resources. The page-based scheme also removes the issue of limiting the memory size for a single tile, because more pages can always be allocated to a single tile as long as there are unused pages left in the page pool. Therefore, the case of all geometries lying in a single tile does not cause an exception under this scheme. It is to be noted that this scheme can be applied to either the graphics memory or system memory.

In a specific embodiment, the present invention configures the memory into multiple pages as follows. There is a default size used for all the pages. This size can be made configurable through a global register, and can be set to, for example, 0.5, 1, 2, 4 or 8 Kbytes. Each page has an associated page descriptor that is stored in a page table in memory. A page descriptor includes information as to current availability of the page, the page physical and logical addresses, as well as a page index that indicates the next page in the chain. Pages are preferably processed in chains to allow for pre-fetching. In a specific example, a page descriptor includes a valid bit that indicates the availability of the page. If the valid bit is, for example, 0, it signifies that memory has not been allocated for this page. If a new page is to be allocated to the binning engine and no more valid pages are available, the page allocation logic stalls until more pages become available. The page descriptor further includes a physical address for the beginning of the page. This address may use, for example, the top 23 bits of the page descriptor, of which the bottom 1~4 bits are ignored for

MEDIATEK Ex. 1006, Page 8

page sizes of 1, 2, 4 or 8 Kbytes. A logical address is also provided for the beginning of the page, using for example 17 bits for positions [25:9] in the page descriptor (with address [31:26]=0). The logical address may be allocated on the fly when the page is added to a chain. The page descriptor also includes a page index to identify the next page in the chain (e.g., 15 bits). In one example, all "1's" signifies this is the last page in the chain. FIG. 3 shows an exemplary embodiment of a page descriptor 300 with the fields as described above.
Pages start by being free and get allocated as a frame is binned. The used pages remain reserved until rendering begins. According to one embodiment of the present invention, pages can be allocated from different memory resources including, for example, both the graphics memory as well as the system memory. To accomplish this, separate chains of free pages are provided for each memory resource. The pages are returned to the beginning of the correct chain, which is selected based on their physical address, when they are freed. At initialization time, all the pages are linked in the correct free chain (in system or graphics memory) and, in order to be usable by the page allocation logic, the registers storing the list of free pages are initialized with the values of the first page descriptor in each chain. FIGS. 4A and 4B show examples of memory configurations for graphics memory and system memory, respectively. The small squares marked "P" are the binning/render buffer pages. To take advantage of all the available memory, the virtual memory pages, the tile tables, and the page table can be allocated in the unused memory between larger areas, for example, at the end of the frame buffer and back buffer, which typically have "odd" sizes (not a power of 2).
Referring back to FIG. 1, managing and allocating pages, maintaining page descriptors and all other related memory functions are performed by memory controller 114. FIG. 5 shows a block diagram for an implementation of memory controller 114 according to an exemplary embodiment of the present invention. The two main blocks within memory controller 114 are memory management unit (MMU) 502, which performs all the functions related to maintaining pages of memory, and tile management unit (TMU) 504, which maintains tile descriptors and implements the algorithm that frees and re-allocates pages of memory for each tile. In this specific embodiment, memory controller 114 is provided with internal caches to reduce the latency of various operations. A page descriptor cache (PDC) 506 couples to MMU 502, and a tile descriptor cache (TDC) 508 couples to TMU 504. MMU 502 and TMU 504 access the memory via their respective caches and an interface circuit 510. A set of registers 512 couple to both MMU 502 and TMU 504 as well as another interface circuit 514. Interface circuit 514 facilitates the interface between the memory controller and the graphics processor (binning engine and rendering engine) as well as a host CPU.

Memory Management Unit

In one embodiment, MMU 502 is implemented by a state machine that performs all page related memory operations. Among the various functions performed by MMU 502 is page allocation, including moving a page from a "free" pool to a tile list. MMU 502 allocates pages dynamically as the binning engine writes into a tile. When the address falls past the end of the current page, the next page from the free pool is assigned for that particular tile. In order to optimize sequential page accesses when the rendering engine 118 reads the tile data, MMU 502 prefetches the descriptor of the next page into the internal registers. In case an address falls outside a prefetched page, MMU 502 is able to skip pages
in a chain. This is referred to herein as "page walking" and is accomplished by reading the link index from the page descriptor of a page to be skipped and fetching the descriptor of the following page in the chain from the page table one or more times, until the page descriptor containing the virtual address to be accessed is fetched.
When the rendering engine moves from one page to the next, the old page is freed. MMU 502 returns a freed page to the free pool. If page walking is involved, all the skipped pages are freed. Pages that are freed explicitly by the rendering engine are the last pages in each tile, which are freed when the rendering engine closes the tile. MMU 502 also operates to transfer pages from the list used by the binning engine to the one used by the rendering engine. An example of a typical life cycle of a page is as follows:

1. Free -> Binning (added at the end of a tile which is part of the binning buffer)
2. Binning -> Reserved (close the binning buffer for write)
3. Reserved -> Rendering (open the render buffer for read)
4. Rendering -> Free
FIG. 6 provides a functional block diagram of an exemplary implementation for MMU 502. A state machine is provided for each of the various functions performed by the MMU, including a page walk state machine 602, a page prefetch state machine 604, a page allocation state machine 606 and a page freeing state machine 608. Each one of these state machines communicates with the registers for read/write purposes. A page descriptor address logic 610 receives inputs from the various state machines and communicates memory access requests to the page descriptor cache. MMU 502 also includes a request arbitration block 612 that receives requests for the various operations (e.g., page allocation, prefetching, etc.) from TMU 504. The requests may occur at any time, and at any given time there may be several requests pending. Request arbitration block 612 decides the sequence of the requests and applies the request signals to the individual state machines.

Tile Management Unit

In order to optimize the memory utilization, the present invention provides TMU 504 to implement the algorithm by which pages from the rendering buffer are freed when they are no longer needed and then re-allocated to the binning buffer. In one embodiment, each tile is allocated a virtual address space of, for example, up to 64 Mbytes (26 address bits). For systems where two frames are being processed at the same time, two tile descriptor tables are provided in memory, for each of the rendering buffer and binning buffer. Alternative embodiments may include one or more than two tile descriptor tables.

A tile descriptor table according to an illustrative embodiment of the present invention is shown in FIG. 7, which also shows a tile descriptor 702(i) that includes the following fields:

Last virtual address (Last_Addr) used for writing to the tile. In this example, the bottom 13 bits of the field are designated for this information. Whenever a tile is written to, the corresponding "last access" address is updated.

Valid bit indicating the tile "exists". TMU 504 ignores the entries that contain, for example, a "0" in this bit. If the rendering engine requests a tile to be opened for read and the tile is invalid, it will receive a value of "0" as the "last access" address, indicating an empty tile.

Start page index for the tile (e.g., 15 bits).

Current page index (e.g., 15 bits).

Replica of the page descriptor for the last accessed page in the tile. Storing this information redundantly, both in the page table and in the tile descriptor, reduces the number of
memory read requests, and therefore the latency of the tile open operations.

FIG. 7 also depicts the relationship between tile and page tables. A tile table 700 stores the tile descriptors 702, an example of which (702i) shows the various fields. An exemplary allocation of pages for the tile with tile descriptor 702i is shown in page descriptor table 704. FIG. 7 also shows the various fields of a page descriptor 706.

Another function performed by the TMU is buffer swapping to allow for parallel processing of two frames. An embodiment of buffer swapping according to the invention is depicted in FIG. 8. When the binning engine finishes binning a new frame it performs a "close for write" command on the binning buffer. From this point on, all "tile open" operations issued by the binning engine are stalled until the rendering buffer is also closed and the two buffers can be swapped. Subsequent "close for write" commands from the binning engine are also stalled until the rendering buffer is closed and then reopened. In the exemplary embodiment, a third close is illegal. When the rendering engine finishes rendering the current frame, it performs a "close for read" command, which signals the memory controller that it can swap the binning/rendering buffers, if the binning buffer was previously closed by the binning engine; otherwise, the "tile open" operations from the rendering engine will be stalled. That is, whichever block finishes first waits for the other to also finish to be able to support the system with two buffers. It is to be understood, however, that it is possible to add more buffers (i.e., a third tile descriptor table) and work in a triple buffer mode.

FIG. 9 is a functional block diagram of an exemplary implementation for TMU 504. A TMU state machine 900 is provided that implements the tile management algorithm. A buffer swap state machine 902 couples to TMU state machine 900, and performs the buffer swapping function. A command arbitration block 904 decides the sequence of commands from the binning or the rendering engine (e.g., "open tile," "close buffer," etc.) before supplying them to TMU state machine 900. TMU state machine 900 is also coupled to read/write registers and the MMU as shown.

As mentioned above, to reduce the latency of the operations that require accessing the tables, the memory controller includes two caches: page descriptor cache 506 for the page descriptor table, and tile descriptor cache 508 for the tile descriptor table. To reduce the latency of the accesses to the page descriptor table, memory controller (114 in FIG. 1) includes memory, such as static random access memory (SRAM), to provide a caching mechanism for this table. In one embodiment, a freed page is returned to the beginning (i.e., as the first page) of the chain of free pages. This algorithm increases the temporal locality of the page descriptor accesses and therefore improves the hit ratio of the page descriptor cache. Not appending the free page at the end of the chain also saves a read operation since this algorithm requires that only the "first page" register be updated.

In systems using two tile descriptor tables, one for the binning buffer and one for the rendering buffer, the access pattern to the two tables of tile descriptors is very different. The binning engine may jump back and forth between different tiles, because the geometry data can refer to triangles that are in different places on the screen. The rendering engine, on the other hand, opens a tile, renders everything inside it, and then moves to another tile. The binning engine, therefore, requires a lot more tile operations, the latency of which directly impacts the overall binning performance, whereas the one-time latency of the "tile open" operation in the rendering engine is acceptable. Given this difference, in one embodiment, the present invention provides a cache for the binning tile descriptor table and no cache for the rendering tile descriptor table. In a specific embodiment, when a frame is binned in its entirety, before the rendering engine starts rendering the tiles in that frame, the tile descriptor cache is flushed. The operation of the tile descriptor cache will be described in greater detail hereinafter in the context of a specific exemplary embodiment. In this example, the binning buffer tiles are organized in "super-tiles" made up of multiple tiles, e.g., four tiles arranged as a two-tiles-wide by two-tiles-high square, each numbered as follows:

0 1
2 3

The tile descriptor cache contains eight cache lines, each line containing the four tile descriptors for a "super-tile." The tile descriptor cache in this exemplary embodiment is organized as a fully associative cache. The replacement policy is "random" using, for example, an 8-bit random number generator. In order to optimize cache fills or spills, the tiles in a "super-tile" use sequential entries in the tile table. In one embodiment, the correspondence between the tile number and its position in the "super-tile" is calculated by the hardware. The following is an example of a tile table for an 800x600 screen (25x19 tiles), using a 32x20 organization. The 0, 1, 32, 33, 2, 3, 34, 35, ... organization corresponds to the following screen:

  0   1   2   3   4   5   6   7 ...  24  25 ...  31
 32  33  34  35  36  37  38  39 ...  56  57 ...  63
 64  65  66  67  68  69  70  71 ...  88  89 ...  95
 96  97  98  99 100 101 102 103 ... 120 121 ... 127
128 129 130 131 132 133 134 135 ... 152 153 ... 159
160 161 162 163 164 165 166 167 ... 184 185 ... 191
...
576 ...
608 ...