US006856320B1

Rubinstein et al.

(10) Patent No.: US 6,856,320 B1
(45) Date of Patent: Feb. 15, 2005

(54) DEMAND-BASED MEMORY SYSTEM FOR GRAPHICS APPLICATIONS

(75) Inventors: Oren Rubinstein, Sunnyvale, CA (US); Ming Benjamin Zhu, San Jose, CA (US)

(73) Assignee: NVIDIA U.S. Investment Company, Santa Clara, CA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 294 days.

(21) Appl. No.: 09/709,964

(22) Filed: Nov. 10, 2000

Related U.S. Application Data

(63) Continuation-in-part of application No. 08/978,491, filed on Nov. 25, 1997.

(51) Int. Cl.7: G06F 12/02
(52) U.S. Cl.: 345/543; 345/537; 345/538; 345/545; 345/562
(58) Field of Search: 345/543, 582, 423, 516, 517, 553, 568, 545, 537, 562; 711/208, 207, 205, 206, 209

(56) References Cited

U.S. PATENT DOCUMENTS

6,344,852 B1 *  2/2002  Zhu et al.
6,380,935 B1 *  4/2002  Heeschen et al.
6,396,473 B1 *  5/2002  Callahan et al.

* cited by examiner

Primary Examiner: Matthew C. Bella
Assistant Examiner: Hau Nguyen
(74) Attorney, Agent, or Firm: Townsend and Townsend and Crew LLP

(57) ABSTRACT

A memory system and methods of operating the same that drastically increase the efficiency in memory use and allocation in graphics systems. In a graphics system using a tiled architecture, instead of pre-allocating a fixed amount of memory for each tile, the invention dynamically allocates varying amounts of memory per tile depending on the demand. In one embodiment all or a portion of the available memory is divided into smaller pages that are preferably equal in size. Memory allocation is done by page based on the amount of memory required for a given tile.

6 Claims, 5 Drawing Sheets
[Front-page drawing (FIG. 1): graphics system 100: CPU, interface controller, system memory 108 (AGP), geometry processor, binning engine, rendering engine 118, memory controller 114, video backend, graphics processor 102, graphics memory 116 with frame buffer]
[Sheet 1 of 5, FIG. 1: block diagram of graphics system 100: CPU, system memory, interface controller 106, geometry processor, binning engine, rendering engine, memory controller, video backend, graphics memory 116 with frame buffer]
[Sheet 1 of 5, FIG. 2: portion 200 of an image frame divided into tiles 202a, 202b, 202c]
[Sheet 2 of 5, FIG. 3: page descriptor fields: logical address, index, physical address, valid]
[Sheet 2 of 5, FIGS. 4A and 4B: example memory configurations; pages marked "P" allocated around a frame buffer (800x600x16 bit) and a back buffer (800x600x16 bit)]
[Sheet 3 of 5, FIG. 5: memory controller block diagram: input interface 514 (to host CPU, binning engine, rendering engine), TMU 504 with tile descriptor cache 508, MMU 502 with page descriptor cache 506, registers 512, output interface 510 to memory]
[Sheet 3 of 5, FIG. 6: MMU block diagram: request arbitration 612 (triggers from TMU), page walk state machine 602, page prefetch state machine 604, page allocation state machine 606, page freeing state machine 608, registers read/write accesses, page descriptor address logic 610 issuing memory access requests to the page descriptor cache]
[Sheet 4 of 5, FIG. 7: tile table 700 (one of two) holding tile descriptors with fields Last_Addr, Current and Valid, linked to the page descriptor table 704; page descriptor 706 with fields L. Addr., Index, P. Addr., Valid]
[Sheet 5 of 5, FIG. 8: buffer-swap flowchart: the binning engine's "close buffer" and the rendering subsystem's "close buffer" each wait until both the binning buffer and the raster buffer are closed, then the buffers are swapped]
[Sheet 5 of 5, FIG. 9: TMU block diagram: input interface, command arbitration 904, TMU state machine 900, buffer swap state machine 902, registers read/write, page access triggers to the MMU, memory access requests to the tile descriptor cache]
`DEMAND-BASED MEMORY SYSTEM FOR
`GRAPHICS APPLICATIONS
`
CROSS-REFERENCES TO RELATED APPLICATIONS

This application is a continuation-in-part of commonly assigned U.S. patent application Ser. No. 08/978,491, titled "Rendering Pipeline," by Zhu, filed Nov. 25, 1997, and hereby incorporated by reference in its entirety.
`
`BACKGROUND OF THE INVENTION
`
The present invention relates in general to memory systems, and in particular to a memory system that provides for efficient use and allocation of memory resources in graphics applications.

Limited memory bandwidth has been a major obstacle in improving the performance of traditional 3-D graphics pipelines. Tiling or "chunking" has evolved as an alternative architecture that reduces the demand on memory bandwidth in 3-D graphics pipelines. A graphics pipeline using a tiling architecture segments the rasterization area into a regular grid of tiles. Input geometry and commands are then "binned" into separate memory regions for each tile, such that only the geometry and commands which affect rasterization of a given tile need be read by the rasterization hardware. By storing, for example, one tile's worth of data in memory that is local to the processor, external memory bandwidth requirements are significantly reduced.

While this type of tile-based architecture improves memory bandwidth, it also increases the demand for larger memory. Pipelining techniques, whereby one frame is rendered at the same time the next frame is being binned, require even larger memory since a rendering buffer is provided in addition to a binning buffer. It therefore becomes necessary to allocate memory resources in this type of tiled architecture as efficiently as possible. Conventional approaches, however, pre-allocate a fixed amount of memory for each tile. In this type of memory system inefficiencies arise when, for example, the amount of geometry data for a given tile is larger than the pre-allocated amount of memory. This results in the inability of the graphics hardware to process all the data and render the tile. Conversely, a particular tile with a small amount of geometry data may require less memory than that provided by the pre-allocated memory, resulting in portions of the pre-allocated memory going unused.

There is therefore a need for a memory system that provides for more efficient use and allocation of memory resources in graphics applications.
`
`SUMMARY OF THE INVENTION
`
The present invention provides a memory system and methods of operating the same that drastically increase the efficiency in memory use and allocation in graphics systems. Broadly, in a graphics system using a tiled architecture, instead of pre-allocating a fixed amount of memory for each tile, the present invention dynamically allocates varying amounts of memory per tile depending on the demand. In one embodiment all or a portion of the available memory is divided into smaller "pages" that are preferably equal in size. Memory allocation is done by page based on the amount of memory required for a given tile. The smaller the size of the page, the higher the granular resolution, and therefore the more efficient memory allocation becomes. In one embodiment, the size of the page is made configurable. This page-based memory system removes the limitation on the memory size for a single tile by simply allocating more pages when necessary. Therefore, a tile wherein a large amount of geometry may lie would not cause an exception. In one embodiment, when all geometry data stored in a page of memory has been rendered, the page is freed and made available for reallocation. This effectively reduces the overall amount of memory used for binning, eliminating the need for double-buffering for a system that pipelines the binning and the rendering process.
Accordingly, in one embodiment, the present invention provides a method of processing graphics data comprising dividing an image rendering area into a plurality of tiles; binning input graphics data; organizing memory into a plurality of pages; and allocating one or more pages of memory to store binned data per tile, wherein the number of pages allocated is determined by the size of the binned data for each tile.
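As a toy illustration of the allocation rule just described (not part of the claimed method), the number of pages a tile receives can be derived from the size of its binned data by rounding up to whole pages; the 2 Kbyte page size in the example below is only one of the configurable sizes mentioned later in the specification.

    #include <stddef.h>

    /* Pages allocated to a tile, driven by the size of that tile's binned
     * data.  Illustrative sketch only; page_size is assumed to be one of
     * the configurable sizes (e.g., 0.5-8 Kbytes) described herein.       */
    static size_t pages_needed(size_t binned_bytes, size_t page_size)
    {
        return (binned_bytes + page_size - 1) / page_size;   /* round up */
    }

    /* Example: pages_needed(5000, 2048) == 3, so a tile with 5000 bytes of
     * binned data receives three 2 Kbyte pages, while an empty tile gets
     * none.                                                               */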
In another embodiment, the present invention provides a memory system comprising memory divided into a plurality of pages and configured to store graphics data; a memory management unit coupled to the memory and configured to maintain a plurality of page descriptors corresponding to the plurality of pages; and a tile management unit coupled to the memory and configured to maintain a plurality of tile descriptors corresponding to a plurality of screen tiles.

In yet another embodiment, the present invention provides a computer system for processing graphics data, comprising a binning engine coupled to receive graphics data and configured to bin graphics data into screen tiles; memory coupled to the binning engine; and a rendering engine coupled to the memory and configured to render binned graphics data, wherein the memory is divided into a plurality of pages and is configured to allocate one or more pages per tile depending on tile requirements.

The following detailed description and the accompanying drawings provide a better understanding of the nature and advantages of the page-based memory system of the present invention.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of a graphics system that processes graphics data in accordance with an embodiment of the present invention;

FIG. 2 illustrates a portion of an exemplary frame comprising a number of tiles;

FIG. 3 shows an exemplary page descriptor according to the present invention;

FIGS. 4A and 4B show examples of memory configurations according to the present invention for graphics memory and system memory, respectively;

FIG. 5 is a block diagram for an implementation of a memory controller according to an exemplary embodiment of the present invention;

FIG. 6 is a functional block diagram of an exemplary implementation for a memory management unit according to the present invention;

FIG. 7 shows a tile descriptor table and the relationship between tile and page tables according to an illustrative embodiment of the present invention;

FIG. 8 illustrates an embodiment of buffer swapping according to the invention; and

FIG. 9 is a functional block diagram of an exemplary implementation for a tile management unit according to the present invention.
`
`15
`
`25
`
`40
`
`45
`
`55
`
`65
`
DETAILED DESCRIPTION OF SPECIFIC EMBODIMENTS
FIG. 1 is a block diagram of a graphics system 100 that processes graphics data in accordance with an embodiment of the present invention. A graphics processor 102 is coupled to a central processing unit (CPU) 104 by a bus, such as an advanced graphics protocol (AGP) bus, via an interface controller 106. System memory 108 also couples to CPU 104 via interface controller 106. Graphics processor 102 includes a geometry processor 110 that receives graphics data from CPU 104. Graphics data typically includes geometry data and mode data. Geometry data comprises information relating to various polygons (e.g., triangles, parallelograms, rectangles, circles, etc.) which are processed to produce a complete image. Geometry data specifies, for example, the vertices (e.g., in X, Y, Z coordinates) and color (e.g., red-green-blue (RGB)) combinations for various polygons. Mode data comprises information relating to various modes affecting the appearance of one or more geometries when displayed. For example, for a given geometry, mode data can define or specify one or more "textures" (e.g., fur, brick, bark, sky), blending effects, translucence effects, and the like, which may be applied to the rendered geometry.

Geometry processor 110 supplies the graphics data to a binning engine 112. Using the graphics data, binning engine 112 reproduces the associated geometries and modes in an image frame which comprises a plurality of grid-like regions referred to as tiles. FIG. 2 shows a portion of a frame segmented into a number of tiles. A tile may be defined, for example, by a 32 pixel by 32 pixel square area of the frame. Binning engine 112 determines which tiles are "touched" by each geometry. For every tile, the graphics data for each geometry touching the tile is linked to the tile. This linked data is output by binning engine 112 per tile. A memory controller 114 is coupled to binning engine 112 and routes the tile data for storage into various portions, or bins, of a graphics memory 116. The portion of graphics memory 116 that is designated for storing binning data is sometimes referred to herein as binning memory. A rendering engine 118 is coupled to memory controller 114 and accesses the binned graphics data contained in graphics memory 116 to render an image for display. In the rendering engine a tile frame buffer may be cached locally to eliminate external frame buffer accesses. The process of rendering an image is thus optimized by storing graphics data into local graphics memory (or binning memory) such that it can be readily retrieved by rendering engine 118 to generate an image frame.

It is to be understood that the various functional blocks in the graphics system 100 may be implemented by a combination of hardware and/or software, and that in specific implementations some or all of the functionality of the various blocks may be combined into an integrated solution. For example, while the diagram shows graphics memory 116 as a separate block from graphics processor 102, a certain amount of memory for binning or other purposes may be provided in the graphics processor block. Another implementation may combine the interface controller and graphics processor, and combine graphics and system memory resources into an integrated chipset. Geometry processor 110 may be implemented as part of CPU 104, eliminating the need for a separate processor. Also, in alternative embodiments, the binning function may be implemented by software on a general purpose CPU or DSP.

The memory requirements for such a system will next be examined using exemplary numbers for illustrative purposes. To render, for example, 1 million triangles per second (1 Mtris/s) at an exemplary frame rate of 30 Hz, the amount of memory needed to buffer the geometry is approximately:

30~60 Mbytes/s / 30/s = 1~2 Mbytes

The buffering size scales linearly with the performance such that, for example, for 5 Mtris/s, the amount of required memory would be approximately:

5 * (1~2 Mbytes) = 5~10 Mbytes

This increase in demand for memory is exacerbated by the desire to bin geometries of a current frame in parallel with rendering geometries of a previous frame. Double buffering has been proposed as a means to enable this type of parallel processing of consecutive frames. While binning of the current frame is going on in one buffer, the rendering of the previous frame is using the other buffer. The roles of these two buffers are simply switched at the frame boundary. Unfortunately, double buffering requires too much buffering memory to accommodate a rendering performance of 5~10 Mtris/s.

To address the ever increasing demand for memory by such graphics systems, the present invention implements a demand-based scheme to drastically increase efficiency in the use and allocation of memory resources. Broadly, the buffering memory is segmented into pages of sufficient granularity, one or more of which can then be allocated per tile depending on that tile's requirements. The invention divides the pages of memory into two types: pages currently allocated to tiles, and unused pages. Unused pages are kept track of in an unused page pool. Each tile is allocated a variable number of pages scattered in the memory. In one embodiment, the pages in the unused page pool or in each tile are organized in a linked list referred to herein as a chain. Whenever the screen tiler needs more pages so that it can store more data to a tile, it attempts to capture pages from the unused pool. If it succeeds, the references of these pages are removed from the unused page pool, and the pages are allocated to the requesting tile. If it fails, the screen tiler stalls (which in turn stalls the upstream geometry processing) and waits until pages get released back into the unused pool. After the content of a page has been consumed by the rendering pipeline, the page is released back into the unused page pool. This allows the rendering buffer and the binning buffer to share memory resources. The page-based scheme also removes the issue of limiting the memory size for a single tile, because more pages can always be allocated to a single tile as long as there are unused pages left in the page pool. Therefore, the case of all geometries lying in a single tile does not cause an exception under this scheme. It is to be noted that this scheme can be applied to either the graphics memory or the system memory.

In a specific embodiment, the present invention configures the memory into multiple pages as follows. There is a default size used for all the pages. This size can be made configurable through a global register, and can be set to, for example, 0.5, 1, 2, 4 or 8 Kbytes. Each page has an associated page descriptor that is stored in a page table in memory. A page descriptor includes information as to the current availability of the page, the page physical and logical addresses, as well as a page index that indicates the next page in the chain. Pages are preferably processed in chains to allow for pre-fetching. In a specific example, a page descriptor includes a valid bit that indicates the availability of the page. If the valid bit is, for example, 0, it signifies that memory has not been allocated for this page. If a new page is to be allocated to the binning engine and no more valid pages are available, the page allocation logic stalls until more pages become available. The page descriptor further includes a physical address for the beginning of the page. This address may use, for example, the top 23 bits of the page descriptor, of which the bottom 1-4 bits are ignored for page sizes of 1, 2, 4 or 8 Kbytes. A logical address is also provided for the beginning of the page, using for example 17 bits for positions [25:9] in the page descriptor (with address bits [31:26]=0). The logical address may be allocated on the fly when the page is added to a chain. The page descriptor also includes a page index to identify the next page in the chain (e.g., 15 bits). In one example, all "1's" signifies that this is the last page in the chain. FIG. 3 shows an exemplary embodiment of a page descriptor 300 with the fields as described above.
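The field widths given above (a valid bit, a 23-bit physical address whose bottom 1-4 bits are ignored for 1-8 Kbyte pages, a 17-bit logical address at bit positions [25:9], and a 15-bit next-page index) suggest the following sketch in C. The struct layout, the 512-byte granularity assumed for the physical address field, and the helper functions are illustrative assumptions, not the packed hardware format of FIG. 3.

    #include <stdbool.h>
    #include <stdint.h>

    #define PAGE_CHAIN_END 0x7FFF        /* all 1's: last page in the chain */

    /* Sketch of the page descriptor fields of FIG. 3 (assumed layout). */
    struct page_descriptor {
        uint32_t phys;    /* 23 bits: physical start address, 512-byte units
                             (bottom 1-4 bits ignored for 1-8 Kbyte pages)   */
        uint32_t logical; /* 17 bits: logical address bits [25:9], [31:26]=0 */
        uint16_t next;    /* 15 bits: index of the next page in the chain    */
        bool     valid;   /* 0 = no memory has been allocated for this page  */
    };

    /* Byte addresses recovered from the fields, assuming 512-byte granularity
     * for the physical field and the [25:9] placement of the logical field. */
    static inline uint64_t page_phys_addr(const struct page_descriptor *d)
    {
        return (uint64_t)d->phys << 9;
    }

    static inline uint32_t page_logical_addr(const struct page_descriptor *d)
    {
        return d->logical << 9;
    }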
Pages start by being free and get allocated as a frame is binned. The used pages remain reserved until rendering begins. According to one embodiment of the present invention, pages can be allocated from different memory resources including, for example, both the graphics memory as well as the system memory. To accomplish this, separate chains of free pages are provided for each memory resource. The pages are returned to the beginning of the correct chain, which is selected based on their physical address, when they are freed. At initialization time, all the pages are linked in the correct free chain (in system or graphics memory) and, in order to be usable by the page allocation logic, the registers storing the list of free pages are initialized with the values of the first page descriptor in each chain. FIGS. 4A and 4B show examples of memory configurations for graphics memory and system memory, respectively. The small squares marked "P" are the binning/render buffer pages. To take advantage of all the available memory, the virtual memory pages, the tile tables, and the page table can be allocated in the unused memory between larger areas, for example, at the end of the frame buffer and back buffer, which typically have "odd" sizes (not a power of 2).
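Continuing the page_descriptor sketch above, the per-resource free chains and the rule of returning a freed page to the head of the chain selected by its physical address can be illustrated as follows. The resource-boundary test, the register model, and the structure names are assumptions for illustration only.

    /* One "first page" register per memory resource, as described above. */
    enum mem_resource { GRAPHICS_MEM = 0, SYSTEM_MEM = 1, NUM_RESOURCES };

    struct free_chains {
        uint16_t first_free[NUM_RESOURCES];  /* registers holding the first
                                                free page of each chain     */
        struct page_descriptor *page_table;  /* all page descriptors        */
        uint64_t graphics_mem_limit;         /* assumed resource boundary   */
    };

    /* Select the chain from the page's physical address (assumed test). */
    static enum mem_resource resource_of(const struct free_chains *fc,
                                         const struct page_descriptor *d)
    {
        return page_phys_addr(d) < fc->graphics_mem_limit ? GRAPHICS_MEM
                                                          : SYSTEM_MEM;
    }

    /* Freed pages go back to the beginning of their chain, so only the
     * "first page" register and the freed page's next index are written. */
    static void free_page(struct free_chains *fc, uint16_t page_index)
    {
        struct page_descriptor *d = &fc->page_table[page_index];
        enum mem_resource r = resource_of(fc, d);

        d->next = fc->first_free[r];
        fc->first_free[r] = page_index;
    }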
Referring back to FIG. 1, managing and allocating pages, maintaining page descriptors, and all other related memory functions are performed by memory controller 114. FIG. 5 shows a block diagram for an implementation of memory controller 114 according to an exemplary embodiment of the present invention. The two main blocks within memory controller 114 are memory management unit (MMU) 502, which performs all the functions related to maintaining pages of memory, and tile management unit (TMU) 504, which maintains tile descriptors and implements the algorithm that frees and re-allocates pages of memory for each tile. In this specific embodiment, memory controller 114 is provided with internal caches to reduce the latency of various operations. A page descriptor cache (PDC) 506 couples to MMU 502, and a tile descriptor cache (TDC) 508 couples to TMU 504. MMU 502 and TMU 504 access the memory via their respective caches and an interface circuit 510. A set of registers 512 couples to both MMU 502 and TMU 504 as well as to another interface circuit 514. Interface circuit 514 facilitates the interface between the memory controller and the graphics processor (binning engine and rendering engine) as well as a host CPU.
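For orientation only, the blocks of FIG. 5 can be summarized as plain data; the real blocks are hardware units, so this composition sketch (and its type names) is purely illustrative.

    /* Structural sketch of memory controller 114 as drawn in FIG. 5. */
    struct memory_controller {
        struct mmu     *mmu;              /* MMU 502: page-level management  */
        struct tmu     *tmu;              /* TMU 504: tile-level management  */
        struct cache   *page_desc_cache;  /* PDC 506, serves the MMU         */
        struct cache   *tile_desc_cache;  /* TDC 508, serves the TMU         */
        struct mem_if  *memory_interface; /* interface circuit 510 to memory */
        struct regs    *registers;        /* register set 512                */
        struct host_if *host_interface;   /* interface circuit 514 to the
                                             binning engine, rendering engine
                                             and host CPU                    */
    };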
`Memory Management Unit
In one embodiment, MMU 502 is implemented by a state machine that performs all page-related memory operations. Among the various functions performed by MMU 502 is page allocation, including moving a page from a "free" pool to a tile list. MMU 502 allocates pages dynamically as the binning engine writes into a tile. When the address falls past the end of the current page, the next page from the free pool is assigned to that particular tile. In order to optimize sequential page accesses when the rendering engine 118 reads the tile data, MMU 502 prefetches the descriptor of the next page into the internal registers. In case an address falls outside a prefetched page, MMU 502 is able to skip pages in a chain. This is referred to herein as "page walking" and is accomplished by reading the link index from the page descriptor of a page to be skipped and fetching the descriptor of the following page in the chain from the page table, one or more times, until the page descriptor containing the virtual address to be accessed is fetched.
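A minimal sketch of this page-walking step, reusing the structures defined above: starting from a page that does not contain the requested virtual address, the link indices are followed until the covering page is found. The containment test assumes each page spans page_size bytes of the tile's logical address space starting at its logical address.

    /* Follow next-page indices through a chain until the page covering
     * target_logical is found; every page visited on the way is skipped. */
    static uint16_t page_walk(const struct page_descriptor *page_table,
                              uint16_t start_index,
                              uint32_t target_logical,
                              uint32_t page_size)
    {
        uint16_t idx = start_index;

        while (idx != PAGE_CHAIN_END) {
            const struct page_descriptor *d = &page_table[idx];
            uint32_t base = page_logical_addr(d);

            if (target_logical >= base && target_logical < base + page_size)
                return idx;            /* descriptor containing the address */
            idx = d->next;             /* skip this page, fetch the next    */
        }
        return PAGE_CHAIN_END;         /* address not in this chain         */
    }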
When the rendering engine moves from one page to the next, the old page is freed. MMU 502 returns a freed page to the free pool. If page walking is involved, all the skipped pages are freed. Pages that are freed explicitly by the rendering engine are the last pages in each tile, which are freed when the rendering engine closes the tile. MMU 502 also operates to transfer pages from the list used by the binning engine to the one used by the rendering engine. An example of a typical life cycle of a page is as follows:

1. Free -> Binning (added at the end of a tile which is part of the binning buffer)
2. Binning -> Reserved (close the binning buffer for write)
3. Reserved -> Rendering (open the render buffer for read)
4. Rendering -> Free
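The four states of this life cycle, and the Free -> Binning step of popping a page from the free pool and appending it to a tile's chain, might look as follows. This again reuses the sketches above; the tile_chain type and the empty-tile convention are assumptions.

    enum page_state { PAGE_FREE, PAGE_BINNING, PAGE_RESERVED, PAGE_RENDERING };

    struct tile_chain {
        uint16_t first;      /* start page index for the tile           */
        uint16_t current;    /* last (current) page in the tile's chain */
    };

    /* Take the first page from the free chain of resource r and append it
     * at the end of the tile's chain; returns -1 when no free page is
     * available, which is where the binning side would stall.            */
    static int allocate_page_to_tile(struct free_chains *fc,
                                     enum mem_resource r,
                                     struct tile_chain *tile)
    {
        uint16_t idx = fc->first_free[r];
        if (idx == PAGE_CHAIN_END)
            return -1;                                  /* must stall        */

        fc->first_free[r] = fc->page_table[idx].next;   /* pop from the pool */
        fc->page_table[idx].next = PAGE_CHAIN_END;      /* new end of chain  */

        if (tile->first == PAGE_CHAIN_END)
            tile->first = idx;                          /* first page of tile */
        else
            fc->page_table[tile->current].next = idx;   /* link after current */
        tile->current = idx;
        return 0;
    }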
FIG. 6 provides a functional block diagram of an exemplary implementation for MMU 502. A state machine is provided for each of the various functions performed by the MMU, including a page walk state machine 602, a page prefetch state machine 604, a page allocation state machine 606 and a page freeing state machine 608. Each one of these state machines communicates with the registers for read/write purposes. A page descriptor address logic 610 receives inputs from the various state machines and communicates memory access requests to the page descriptor cache. MMU 502 also includes a request arbitration block 612 that receives requests for the various operations (e.g., page allocation, prefetching, etc.) from TMU 504. The requests may occur at any time, and at any given time there may be several requests pending. Request arbitration block 612 decides the sequence of the requests and applies the request signals to the individual state machines.
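The text does not state how request arbitration block 612 orders pending requests, so the fixed priority in the following sketch is purely an assumption; it only illustrates the idea of several pending request types with one being dispatched to its state machine at a time.

    #include <stdbool.h>

    enum mmu_request { REQ_PAGE_WALK, REQ_PREFETCH, REQ_ALLOCATE, REQ_FREE,
                       REQ_COUNT, REQ_NONE };

    /* Pick the next pending request to forward to its state machine.
     * The priority order below (freeing first) is an assumed policy.  */
    static enum mmu_request arbitrate(const bool pending[REQ_COUNT])
    {
        static const enum mmu_request priority[REQ_COUNT] = {
            REQ_FREE, REQ_ALLOCATE, REQ_PREFETCH, REQ_PAGE_WALK
        };
        for (int i = 0; i < REQ_COUNT; i++)
            if (pending[priority[i]])
                return priority[i];
        return REQ_NONE;               /* nothing pending this cycle */
    }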
`Tile Management Unit
In order to optimize the memory utilization, the present invention provides TMU 504 to implement the algorithm by which pages from the rendering buffer are freed when they are no longer needed and then re-allocated to the binning buffer. In one embodiment, each tile is allocated a virtual address space of, for example, up to 64 Mbytes (26 address bits). For systems where two frames are being processed at the same time, two tile descriptor tables are provided in memory, one for the rendering buffer and one for the binning buffer. Alternative embodiments may include one, or more than two, tile descriptor tables.
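The handshake by which these two tile descriptor tables trade roles is described with FIG. 8 further below: the binning side issues "close for write", the rendering side issues "close for read", and the swap happens only once both are closed. A minimal sketch of that handshake, with assumed structure and flag names:

    #include <stdbool.h>

    /* Two tile descriptor tables, identified here simply by index 0 or 1. */
    struct buffer_swap {
        int  binning_table;      /* table currently used for binning   */
        int  rendering_table;    /* table currently used for rendering */
        bool binning_closed;     /* "close for write" received         */
        bool rendering_closed;   /* "close for read" received          */
    };

    /* Swap only when both sides have closed; whichever side finished first
     * simply keeps waiting (its "tile open" operations stay stalled).      */
    static bool try_swap(struct buffer_swap *s)
    {
        if (!(s->binning_closed && s->rendering_closed))
            return false;                        /* one side still busy */

        int tmp             = s->binning_table;  /* exchange the tables */
        s->binning_table    = s->rendering_table;
        s->rendering_table  = tmp;
        s->binning_closed   = false;             /* both buffers reopen */
        s->rendering_closed = false;
        return true;
    }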
A tile descriptor table, according to an illustrative embodiment of the present invention, is shown in FIG. 7, which also shows a tile descriptor 702(i) that includes the following fields:

Last virtual address (Last_Addr) used for writing to the tile. In this example, the bottom 13 bits of the field are designated for this information. Whenever a tile is written to, the corresponding "last access" address is updated.

Valid bit indicating that the tile "exists." TMU 504 ignores the entries that contain, for example, a "0" in this bit. If the rendering engine requests a tile to be opened for read and the tile is invalid, it will receive a value of "0" as the "last access" address, indicating an empty tile.

Start page index for the tile (e.g., 15 bits).

Current page index (e.g., 15 bits).

Replica of the page descriptor for the last accessed page in the tile. Storing this information redundantly, both in the page table and in the tile descriptor, reduces the number of memory read requests, and therefore the latency of the tile open operations.

FIG. 7 also depicts the relationship between tile and page tables. A tile table 700 stores the tile descriptors 702, an example of which (702i) shows the various fields. An exemplary allocation of pages for the tile with tile descriptor 702i is shown in page descriptor table 704. FIG. 7 also shows the various fields of a page descriptor 706.

Another function performed by the TMU is buffer swapping to allow for parallel processing of two frames. An embodiment of buffer swapping according to the invention is depicted in FIG. 8. When the binning engine finishes binning a new frame it performs a "close for write" command on the binning buffer. From this point on, all "tile open" operations issued by the binning engine are stalled until the rendering buffer is also closed and the two buffers can be swapped. Subsequent "close for write" commands from the binning engine are also stalled until the rendering buffer is closed and then reopened. In the exemplary embodiment, a third close is illegal. When the rendering engine finishes rendering the current frame, it performs a "close for read" command, which signals the memory controller that it can swap the binning/rendering buffers if the binning buffer was previously closed by the binning engine; otherwise, the "tile open" operations from the rendering engine will be stalled. That is, whichever block finishes first waits for the other to also finish, to be able to support the system with two buffers. It is to be understood, however, that it is possible to add more buffers (i.e., a third tile descriptor table) and work in a triple buffer mode.

FIG. 9 is a functional block diagram of an exemplary implementation for TMU 504. A TMU state machine 900 is provided that implements the tile management algorithm. A buffer swap state machine 902 couples to TMU state machine 900 and performs the buffer swapping function. A command arbitration block 904 decides the sequence of commands from the binning or the rendering engine (e.g., "open tile," "close buffer," etc.) before supplying them to TMU state machine 900. TMU state machine 900 is also coupled to read/write registers and the MMU as shown.

As mentioned above, to reduce the latency of the operations that require accessing the tables, the memory controller includes two caches: page descriptor cache 506 for the page descriptor table, and tile descriptor cache 508 for the tile descriptor table. To reduce the latency of the accesses to the page descriptor table, memory controller (114 in FIG. 1) includes memory, such as static random access memory (SRAM), to provide a caching mechanism for this table. In one embodiment, when a page is freed it is linked back at the beginning (i.e., as the first page) of the chain of free pages. This algorithm increases the temporal locality of the page descriptor accesses and therefore improves the hit ratio of the page descriptor cache. Not appending the free page at the end of the chain also saves a read operation, since this algorithm requires that only the "first page" register be updated.

In systems using two tile descriptor tables, one for the binning buffer and one for the rendering buffer, the access pattern to the two tables of tile descriptors is very different. The binning engine may jump back and forth between different tiles, because the geometry data can refer to triangles that are in different places on the screen. The rendering engine, on the other hand, opens a tile, renders everything inside it, and then moves to another tile. The binning engine, therefore, requires a lot more tile operations, the latency of which directly impacts the overall binning performance, whereas the one-time latency of the "tile open" operation in the rendering engine is acceptable. Given this difference, in one embodiment, the present invention provides a cache for the binning tile descriptor table and no cache for the rendering tile descriptor table. In a specific embodiment, when a frame is binned in its entirety, before the rendering engine starts rendering the tiles in that frame, the tile descriptor cache is flushed. The operation of the tile descriptor cache will be described in greater detail hereinafter in the context of a specific exemplary embodiment. In this example, the binning buffer tiles are organized in "super-tiles" made up of multiple tiles, e.g., four tiles arranged as a two-tiles-wide by two-tiles-high square, each numbered as follows:

0 1
2 3

The tile descriptor cache contains eight cache lines, each line containing the four tile descriptors for a "super-tile." The tile descriptor cache in this exemplary embodiment is organized as a fully associative cache. The replacement policy is "random" using, for example, an 8-bit random number generator. In order to optimize cache fills or spills, the tiles in a "super-tile" use sequential entries in the tile table. In one embodiment, the correspondence between the tile number and its position in the "super-tile" is calculated by the hardware. The following is an example of a tile table for an 800x600 screen (25x19 tiles), using a 32x20 organization. The 0, 1, 32, 33, 2, 3, 34, 35, ... organization corresponds to the following screen:
`
`0
`32
`64
`96
`128
`160
`
`i
`33
`65
`97
`129
`161
`
`2
`34
`66
`98
`130
`162
`
`3
`35
`67
`99
`131
`163
`
`4
`36
`68
`100
`132
`164
`
`s
`37
`69
`101
`133
`165
`
`6
`38
`70
`102
`134
`166
`
`7
`39
`