(12) United States Patent
Rubinstein et al.

(10) Patent No.: US 7,102,646 B1
(45) Date of Patent: *Sep. 5, 2006

(54) DEMAND-BASED MEMORY SYSTEM FOR GRAPHICS APPLICATIONS

(75) Inventors: Oren Rubinstein, Sunnyvale, CA (US); Ming Benjamin Zhu, San Jose, CA (US)

(73) Assignee: NVIDIA U.S. Investment Company, Santa Clara, CA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.

    This patent is subject to a terminal disclaimer.

(21) Appl. No.: 10/888,951

(22) Filed: Jul. 9, 2004

Related U.S. Application Data

(63) Continuation of application No. 09/709,964, filed on Nov. 10, 2000, now Pat. No. 6,856,320, which is a continuation-in-part of application No. 08/978,491, filed on Nov. 25, 1997, now Pat. No. 6,697,063.

(51) Int. Cl.
    G06F 13/28 (2006.01)
    G06F 12/06 (2006.01)
    G06F 12/02 (2006.01)

(52) U.S. Cl. .......... 345/570; 345/543; 345/571; 345/572

(58) Field of Classification Search .......... 345/582, 345/423, 570–572, 543, 538, 568, 426; 711/207, 711/208
    See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS

4,876,651 A    10/1989  Dawson et al.
5,339,386 A     8/1994  Sodenberg et al.
5,509,110 A     4/1996  Latham
5,561,750 A    10/1996  Lentz
5,613,050 A     3/1997  Hochmuth et al.
5,886,701 A     3/1999  Chauvin et al.
5,936,642 A     8/1999  Yumoto et al.
6,205,531 B1    3/2001  Hussain
6,331,854 B1   12/2001  Rogers et al.
6,344,852 B1    2/2002  Zhu et al.
6,380,935 B1    4/2002  Heeschen et al.
6,396,473 B1    5/2002  Callahan et al.
6,697,063 B1*   2/2004  Zhu .......... 345/421
6,856,320 B1*   2/2005  Rubinstein et al. .......... 345/543

OTHER PUBLICATIONS

Carpenter, "The A-Buffer, an Antialiased Hidden Surface Method," ACM, pp. 103–108, 1984.
Foley et al., "Computer Graphics: Principles and Practice," pp. 863–1120, 1996.
Foley et al., "Computer Graphics: Principles and Practice," Addison-Wesley Publishing Co., 2nd Ed. in C, pp. 91–92, 673, 873–874, 1997.

* cited by examiner

Primary Examiner–Kee M. Tung
Assistant Examiner–Hau Nguyen
(74) Attorney, Agent, or Firm–Townsend and Townsend and Crew LLP

(57) ABSTRACT

A memory system and methods of operating the same that drastically increase the efficiency in memory use and allocation in graphics systems. In a graphics system using a tiled architecture, instead of pre-allocating a fixed amount of memory for each tile, the invention dynamically allocates varying amounts of memory per tile depending on the demand. In one embodiment all or a portion of the available memory is divided into smaller pages that are preferably equal in size. Memory allocation is done by page based on the amount of memory required for a given tile.

17 Claims, 6 Drawing Sheets
`
[Front-page figure: block diagram of graphics system 100: interface controller 106 (AGP), system memory 108, graphics processor 102 containing geometry processor 110, binning engine 112, memory controller 114, rendering engine 118, and video backend; graphics memory 116 and frame buffer.]

MEDIATEK, Ex. 1026, Page 1
IPR2018-00101
`
`
[Sheet 1, FIG. 1: block diagram of graphics system 100: CPU 104, interface controller 106 (AGP), system memory 108, geometry processor 110, binning engine 112, memory controller 114, graphics memory 116, and frame buffer.]

[Sheet 1, FIG. 2: portion of an exemplary frame 200 segmented into a number of tiles.]
`
`
`
[Sheet 2, FIG. 1A: data-flow diagram: screen geometries with screen x, y, z coordinates 152 from model space 151 enter the screen space tiler 162; tile scan/z with depth buffer 153 produces visible fragments 154 for the raster 155 and shading 156 engines, which read texture memory 160 and tile geometries 161 from memory; the blending/pixel-ops engine 157 works with the color/frame buffer and tile buffer 158, feeding video out 159.]
`
`
`
[Sheet 3, FIG. 3: page descriptor fields: LOGICAL ADDRESS, INDEX, PHYSICAL ADDRESS, VALID.]

[Sheet 3, FIG. 4A: graphics memory configuration: pages marked "P," tile tables, page table, back buffer (800x600x16 bit), and frame buffer (800x600x16 bit).]

[Sheet 3, FIG. 4B: system memory configuration: an array of pages marked "P."]
`
`
`
[Sheet 4, FIG. 5: memory controller block diagram: input interface 514 (to host CPU, binning engine, and render engine), TMU with tile descriptor cache 508, MMU 502 with page descriptor cache, registers 512, and output interface 516 to memory.]

[Sheet 4, FIG. 6: MMU block diagram: request arbitration 612 (triggers from TMU), page walk state machine 602, page prefetch state machine 604, page allocation state machine 606, page freeing state machine 608, and page descriptor address logic 610; registers read/write accesses; memory access requests to the page descriptor cache.]
`
`
`
[Sheet 5, FIG. 7: tile table (one of two) 700 with tile descriptor fields (Last Addr, Current, Start, Valid, and a copy of the page descriptor for the last page); page table 702 holding page descriptors 704; page descriptor fields 706: L.Addr, Index, P.Addr, Valid.]
`
`
`
[Sheet 6, FIG. 8: buffer-swap flowchart: the binning engine "close buffer" path checks whether the raster buffer is closed, and the rendering subsystem "close buffer" path checks whether the binning buffer is closed; when both are closed, the buffers are swapped.]

[Sheet 6, FIG. 9: TMU block diagram: input interface 906, command arbitration 904, TMU state machine with registers read/write, and buffer swap state machine 902; page access triggers to the MMU and memory access triggers to the tile descriptor cache.]
`
`
`
`DEMAND-BASED MEMORY SYSTEM FOR
`GRAPHICS APPLICATIONS
`
`CROSS-REFERENCES TO RELATED
`APPLICATIONS
`
This application is a continuation of U.S. patent application Ser. No. 09/709,964, filed Nov. 10, 2000, now U.S. Pat. No. 6,856,320, which is a continuation-in-part of commonly-assigned U.S. patent application Ser. No. 08/978,491, titled "Rendering Pipeline," by Zhu, filed Nov. 25, 1997, now U.S. Pat. No. 6,697,063. The complete disclosures of these applications are hereby incorporated by reference in their entirety for all purposes.
`
`BACKGROUND OF THE INVENTION
`
The present invention relates in general to memory systems, and in particular to a memory system that provides for efficient use and allocation of memory resources in graphics applications.

Limited memory bandwidth has been a major obstacle in improving the performance of traditional 3-D graphics pipelines. Tiling or "chunking" has evolved as an alternative architecture that reduces the demand on memory bandwidth in 3-D graphics pipelines. A graphics pipeline using a tiling architecture segments the rasterization area into a regular grid of tiles. Input geometry and commands are then "binned" into separate memory regions for each tile, such that only the geometry and commands which affect rasterization of a given tile need be read by the rasterization hardware. By storing, for example, one tile's worth of data in memory that is local to the processor, external memory bandwidth requirements are significantly reduced.

While this type of tile-based architecture improves memory bandwidth, it also increases the demand for larger memory. Pipelining techniques, whereby one frame is rendered at the same time the next frame is being binned, require even larger memory, since a rendering buffer is provided in addition to a binning buffer. It therefore becomes necessary to allocate memory resources in this type of tiled architecture as efficiently as possible. Conventional approaches, however, pre-allocate a fixed amount of memory for each tile. In this type of memory system, inefficiencies arise when, for example, the amount of geometry data for a given tile is larger than the pre-allocated amount of memory. This results in the inability of the graphics hardware to process all the data and render the tile. Conversely, a particular tile with a small amount of geometry data may require less memory than that provided by the pre-allocated memory, causing portions of the pre-allocated memory to go unused.

There is therefore a need for a memory system that provides for more efficient use and allocation of memory resources in graphics applications.
`
`
`SUMMARY OF THE INVENTION
`
The present invention provides a memory system and methods of operating the same that drastically increase the efficiency in memory use and allocation in graphics systems. Broadly, in a graphics system using a tiled architecture, instead of pre-allocating a fixed amount of memory for each tile, the present invention dynamically allocates varying amounts of memory per tile depending on the demand. In one embodiment, all or a portion of the available memory is divided into smaller "pages" that are preferably equal in size. Memory allocation is done by page based on the amount of memory required for a given tile. The smaller the size of the page, the higher the granular resolution, and therefore the more efficient memory allocation becomes. In one embodiment, the size of the page is made configurable. This page-based memory system removes the limitation on the memory size for a single tile by simply allocating more pages when necessary. Therefore, a tile in which a large amount of geometry lies would not cause an exception. In one embodiment, when all geometry data stored in a page of memory has been rendered, the page is freed and made available for reallocation. This effectively reduces the overall amount of memory used for binning, eliminating the need for double-buffering in a system that pipelines the binning and rendering processes.
Accordingly, in one embodiment, the present invention provides a method of processing graphics data comprising dividing an image rendering area into a plurality of tiles; binning input graphics data; organizing memory into a plurality of pages; and allocating one or more pages of memory to store binned data per tile, wherein the number of pages allocated is determined by the size of the binned data for each tile.
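The method just claimed lends itself to a compact software illustration. The sketch below is purely illustrative (the tile size, page size, primitive representation, and function names are assumptions for this example, not the patented hardware): it bins primitives by the tiles their bounding boxes touch, then allocates whole pages per tile in proportion to the size of each tile's binned data.

```python
# Illustrative software model of the claimed method; all sizes are assumptions.
TILE = 32          # tile edge in pixels (example value from the specification)
PAGE_BYTES = 512   # one configurable page, e.g. 0.5 Kbytes

def tiles_touched(bbox, tiles_w, tiles_h):
    """Yield (tx, ty) indices of tiles overlapped by a primitive's pixel bbox."""
    x0, y0, x1, y1 = bbox
    for ty in range(max(0, y0 // TILE), min(tiles_h - 1, y1 // TILE) + 1):
        for tx in range(max(0, x0 // TILE), min(tiles_w - 1, x1 // TILE) + 1):
            yield tx, ty

def bin_and_allocate(prims, width, height):
    """Bin primitives per tile, then count pages needed per tile."""
    tiles_w, tiles_h = -(-width // TILE), -(-height // TILE)  # ceiling division
    binned = {}                                   # (tx, ty) -> bytes binned
    for bbox, nbytes in prims:
        for t in tiles_touched(bbox, tiles_w, tiles_h):
            binned[t] = binned.get(t, 0) + nbytes
    # Demand-based step: the page count follows each tile's actual demand.
    return {t: -(-size // PAGE_BYTES) for t, size in binned.items()}
```

For an 800x600 screen this model yields the 25x19 tile grid that appears later in the specification; tiles touched by no geometry receive no pages at all.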
In another embodiment, the present invention provides a memory system comprising memory divided into a plurality of pages and configured to store graphics data; a memory management unit coupled to the memory and configured to maintain a plurality of page descriptors corresponding to the plurality of pages; and a tile management unit coupled to the memory and configured to maintain a plurality of tile descriptors corresponding to a plurality of screen tiles.

In yet another embodiment, the present invention provides a computer system for processing graphics data, comprising a binning engine coupled to receive graphics data and configured to bin graphics data into screen tiles; memory coupled to the binning engine; and a rendering engine coupled to the memory and configured to render binned graphics data, wherein the memory is divided into a plurality of pages and is configured to allocate one or more pages per tile depending on tile requirements.

The following detailed description and the accompanying drawings provide a better understanding of the nature and advantages of the page-based memory system of the present invention.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
FIG. 1 is a block diagram of a graphics system that processes graphics data in accordance with an embodiment of the present invention;
FIG. 1A is a block schematic diagram of the data flow of a preferred embodiment of the invention;
FIG. 2 illustrates a portion of an exemplary frame comprising a number of tiles;
FIG. 3 shows an exemplary page descriptor according to the present invention;
FIGS. 4A and 4B show examples of memory configurations according to the present invention for graphics memory and system memory, respectively;
FIG. 5 is a block diagram for an implementation of a memory controller according to an exemplary embodiment of the present invention;
FIG. 6 is a functional block diagram of an exemplary implementation for a memory management unit according to the present invention;
FIG. 7 shows a tile descriptor table and the relationship between tile and page tables according to an illustrative embodiment of the present invention;
`
FIG. 8 illustrates an embodiment of buffer swapping according to the invention; and
FIG. 9 is a functional block diagram of an exemplary implementation for a tile management unit according to the present invention.
`
`DETAILED DESCRIPTION OF SPECIFIC
`EMBODIMENTS
`
FIG. 1 is a block diagram of a graphics system 100 that processes graphics data in accordance with an embodiment of the present invention. A graphics processor 102 is coupled to a central processing unit (CPU) 104 by a bus, such as an advanced graphics protocol (AGP) bus, via an interface controller 106. System memory 108 also couples to CPU 104 via interface controller 106. Graphics processor 102 includes a geometry processor 110 that receives graphics data from CPU 104. Graphics data typically includes geometry data and mode data. Geometry data comprises information relating to various polygons (e.g., triangles, parallelograms, rectangles, circles, etc.) which are processed to produce a complete image. Geometry data specifies, for example, the vertices (e.g., in X, Y, Z coordinates) and color (e.g., red-green-blue (RGB)) combinations for various polygons. Mode data comprises information relating to various modes affecting the appearance of one or more geometries when displayed. For example, for a given geometry, mode data can define or specify one or more "textures" (e.g., fur, brick, bark, sky), blending effects, translucence effects, and the like, which may be applied to the rendered geometry.

Referring to FIG. 1A, the basic data flow of the invention is shown. The geometries in model space 151 are transformed into screen space, and the screen space tiler 162 bins a frame's worth of geometries into screen tiles. The visibility of all geometries is determined up front using only screen x, y, z coordinates 152 in the scan/z engine 153 for each tile. Visibility information 154 is sent out for rasterization 155 and shading 156. The visibility information 154 is combined with the tile geometries 161 for each tile so that only visible geometries are set up for rasterization. Only visible fragments are fully rasterized and shaded in the raster 155/shading 156 engine. The resulting fragments are sent to the blending engine 157. The blending engine 157 alpha-blends incoming fragments, and resolves and outputs pixel colors into the frame buffer at the end-of-tile. The screen space tiler 162, scan/z 153, raster 155/shading 156, and blending 157 engines operate in parallel for load-balancing of the various processes. This does introduce one frame of latency. If the extra latency is objectionable, then the scan/z 153, raster 155/shading 156, and blending 157 engines operate in parallel, with the screen space tiler 162 operating serially before them.

Geometry processor 110 supplies the graphics data to a binning engine 112. Using the graphics data, binning engine 112 reproduces the associated geometries and modes in an image frame which comprises a plurality of grid-like regions referred to as tiles. FIG. 2 shows a portion of a frame segmented into a number of tiles. A tile may be defined, for example, by a 32-pixel by 32-pixel square area of the frame. Binning engine 112 determines which tiles are "touched" by each geometry. For every tile, the graphics data for each geometry touching the tile is linked to the tile. This linked data is output by binning engine 112 per tile. A memory controller 114 is coupled to binning engine 112 and routes the tile data for storage into various portions, or bins, of a graphics memory 116. The portion of graphics memory 116 that is designated for storing binning data is sometimes referred to herein as binning memory. A rendering engine 118 is coupled to memory controller 114 and accesses the binned graphics data contained in graphics memory 116 to render an image for display. In the rendering engine, a tile frame buffer may be cached locally to eliminate external frame buffer accesses. The process of rendering an image is thus optimized by storing graphics data into local graphics memory (or binning memory) such that it can be readily retrieved by rendering engine 118 to generate an image frame.

It is to be understood that the various functional blocks in the graphics system 100 may be implemented by a combination of hardware and/or software, and that in specific implementations some or all of the functionality of the various blocks may be combined into an integrated solution. For example, while the diagram shows graphics memory 116 as a separate block from graphics processor 102, a certain amount of memory for binning or other purposes may be provided in the graphics processor block. Another implementation may combine the interface controller and graphics processor, and combine graphics and system memory resources into an integrated chipset. Geometry processor 110 may be implemented as part of CPU 104, eliminating the need for a separate processor. Also, in alternative embodiments, the binning function may be implemented by software on a general-purpose CPU or DSP.

The memory requirements for such a system will next be examined using exemplary numbers for illustrative purposes. To render, for example, 1 million triangles per second (1 Mtris/s) at an exemplary frame rate of 30 Hz, the amount of memory needed to buffer the geometry is approximately:

30–60 Mbytes/s ÷ 30 frames/s = 1–2 Mbytes

The buffering size scales linearly with the performance such that, for example, for 5 Mtris/s, the amount of required memory would be approximately:

5 × (1–2 Mbytes) = 5–10 Mbytes

This increase in demand for memory is exacerbated by the desire to bin geometries of a current frame in parallel with rendering geometries of a previous frame. Double buffering has been proposed as a means to enable this type of parallel processing of consecutive frames. While binning of the current frame is going on in one buffer, the rendering of the previous frame is using the other buffer. The roles of these two buffers are simply switched at the frame boundary. Unfortunately, double buffering requires too much buffering memory to accommodate a rendering performance of 5–10 Mtris/s.

To address the ever-increasing demand for memory by such graphics systems, the present invention implements a demand-based scheme to drastically increase efficiency in use and allocation of memory resources. Broadly, the buffering memory is segmented into pages of sufficient granularity, one or more of which can then be allocated per tile depending on that tile's requirements. The invention divides the pages of memory into two types: pages currently allocated to tiles, and unused pages. Unused pages are kept track of in an unused page pool. Each tile is allocated a variable number of pages scattered in the memory. In one embodiment, pages in the unused page pool or in each tile are organized in a linked list referred to herein as a chain. Whenever the screen tiler needs more pages so that it can store more data to a tile, it attempts to capture pages from the unused pool. If it succeeds, the references of these pages are removed from the unused page pool, and these pages are
`
`
`
allocated to the requesting tile. If it fails, the screen tiler stalls (which in turn stalls the upstream geometry processing) and waits until pages get released back into the unused pool. After the content of a page has been consumed by the rendering pipeline, the page is released back into the unused page pool. This allows the rendering buffer and the binning buffer to share memory resources. The page-based scheme also removes the issue of limiting the memory size for a single tile, because more pages can always be allocated to a single tile as long as there are unused pages left in the page pool. Therefore, the case of all geometries lying in a single tile does not cause an exception under this scheme. It is to be noted that this scheme can be applied to either the graphics memory or the system memory.
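The capture/stall/release behavior just described can be modeled in a few lines. The following is a hypothetical software sketch, not the hardware implementation (class and method names are invented): the tiler captures pages from an unused pool, must stall when the pool is empty, and the renderer releases consumed pages back into the pool so both buffers share one set of physical pages.

```python
class PagePool:
    """Toy model of the unused-page pool shared by binning and rendering."""
    def __init__(self, num_pages):
        self.unused = list(range(num_pages))  # references to unused pages
        self.tiles = {}                       # tile id -> pages allocated to it

    def capture(self, tile):
        """Tiler side: try to capture one unused page for a tile.
        Returns False (caller must stall) when the pool is empty."""
        if not self.unused:
            return False
        page = self.unused.pop()              # reference leaves the unused pool
        self.tiles.setdefault(tile, []).append(page)
        return True

    def release(self, tile):
        """Renderer side: a tile's pages have been consumed; free them."""
        for page in self.tiles.pop(tile, []):
            self.unused.append(page)
```

In this model the binning side makes progress again as soon as the rendering side releases any tile, which is exactly why double-buffering the whole binning memory becomes unnecessary.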
In a specific embodiment, the present invention configures the memory into multiple pages as follows. There is a default size used for all the pages. This size can be made configurable through a global register, and can be set to, for example, 0.5, 1, 2, 4 or 8 Kbytes. Each page has an associated page descriptor that is stored in a page table in memory. A page descriptor includes information as to the current availability of the page, the page physical and logical addresses, as well as a page index that indicates the next page in the chain. Pages are preferably processed in chains to allow for pre-fetching. In a specific example, a page descriptor includes a valid bit that indicates the availability of the page. If the valid bit is, for example, 0, it signifies that memory has not been allocated for this page. If a new page is to be allocated to the binning engine and no more valid pages are available, the page allocation logic stalls until more pages become available. The page descriptor further includes a physical address for the beginning of the page. This address may use, for example, the top 23 bits of the page descriptor, of which the bottom 1–4 bits are ignored for page sizes of 1, 2, 4 or 8 Kbytes. A logical address is also provided for the beginning of the page, using for example 17 bits for positions [25:9] in the page descriptor (with address [31:26] = 0). The logical address may be allocated on the fly when the page is added to a chain. The page descriptor also includes a page index to identify the next page in the chain (e.g., 15 bits). In one example, all "1's" signifies that this is the last page in the chain. FIG. 3 shows an exemplary embodiment of a page descriptor 300 with the fields as described above.
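The descriptor fields just enumerated can be modeled as bit fields. The exact word layout is not fully specified above, so the packing below (a single wide word with field offsets chosen for illustration) is an assumption; only the field widths come from the text: a valid bit, a 23-bit physical address, a 17-bit logical address, and a 15-bit chain index whose all-ones value marks the last page.

```python
# Field widths from the specification; the word layout itself is assumed.
LAST = (1 << 15) - 1   # all-ones 15-bit index marks the last page in a chain

def pack_descriptor(valid, phys23, log17, index15):
    """Pack page descriptor fields into one integer (illustrative layout)."""
    assert phys23 < (1 << 23) and log17 < (1 << 17) and index15 < (1 << 15)
    return (valid & 1) | (phys23 << 1) | (log17 << 24) | (index15 << 41)

def unpack_descriptor(d):
    """Recover the fields from the packed integer."""
    return {
        "valid":   d & 1,
        "phys":    (d >> 1) & ((1 << 23) - 1),
        "logical": (d >> 24) & ((1 << 17) - 1),
        "index":   (d >> 41) & ((1 << 15) - 1),
    }
```

The index field is what links descriptors into the chains used for pre-fetching; a descriptor whose index equals LAST terminates its chain.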
Pages start by being free and get allocated as a frame is binned. The used pages remain reserved until rendering begins. According to one embodiment of the present invention, pages can be allocated from different memory resources including, for example, both the graphics memory as well as the system memory. To accomplish this, separate chains of free pages are provided for each memory resource. The pages are returned to the beginning of the correct chain, which is selected based on their physical address, when they are freed. At initialization time, all the pages are linked in the correct free chain (in system or graphics memory) and, in order to be usable by the page allocation logic, the registers storing the list of free pages are initialized with the values of the first page descriptor in each chain. FIGS. 4A and 4B show examples of memory configurations for graphics memory and system memory, respectively. The small squares marked "P" are the binning/render buffer pages. To take advantage of all the available memory, the virtual memory pages, the tile tables, and the page table can be allocated in the unused memory between larger areas, for example, at the end of the frame buffer and back buffer, which typically have "odd" sizes (not a power of 2).
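The per-resource free chains can be sketched as follows. The address ranges and names here are invented for illustration: a freed page is pushed back onto the front of whichever chain (graphics or system memory) its physical address falls in, mirroring the initialization and return rules just described.

```python
# Hypothetical address range for the graphics-memory resource (assumed).
GFX_BASE, GFX_TOP = 0x0000_0000, 0x0800_0000

class FreeChains:
    """Separate chains of free pages, one per memory resource."""
    def __init__(self, gfx_pages, sys_pages):
        # At initialization every page is linked into the correct free chain.
        self.chains = {"gfx": list(gfx_pages), "sys": list(sys_pages)}

    def resource_of(self, phys_addr):
        """Select a chain based on the page's physical address."""
        return "gfx" if GFX_BASE <= phys_addr < GFX_TOP else "sys"

    def free(self, phys_addr):
        # Freed pages return to the *beginning* of the correct chain.
        self.chains[self.resource_of(phys_addr)].insert(0, phys_addr)

    def allocate(self, resource):
        chain = self.chains[resource]
        return chain.pop(0) if chain else None  # None -> allocation stalls
```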
`
Referring back to FIG. 1, managing and allocating pages, maintaining page descriptors, and all other related memory functions are performed by memory controller 114. FIG. 5 shows a block diagram for an implementation of memory controller 114 according to an exemplary embodiment of the present invention. The two main blocks within memory controller 114 are the memory management unit (MMU) 502, which performs all the functions related to maintaining pages of memory, and the tile management unit (TMU) 504, which maintains tile descriptors and implements the algorithm that frees and re-allocates pages of memory for each tile. In this specific embodiment, memory controller 114 is provided with internal caches to reduce the latency of various operations. A page descriptor cache (PDC) 506 couples to MMU 502, and a tile descriptor cache (TDC) 508 couples to TMU 504. MMU 502 and TMU 504 access the memory via their respective caches and an interface circuit 510. A set of registers 512 couples to both MMU 502 and TMU 504, as well as to another interface circuit 514. Interface circuit 514 facilitates the interface between the memory controller and the graphics processor (binning engine and rendering engine), as well as a host CPU.
`Memory Management Unit
In one embodiment, MMU 502 is implemented by a state machine that performs all page-related memory operations. Among the various functions performed by MMU 502 is page allocation, including moving a page from a "free" pool to a tile list. MMU 502 allocates pages dynamically as the binning engine writes into a tile. When the address falls past the end of the current page, the next page from the free pool is assigned to that particular tile. In order to optimize sequential page accesses when the rendering engine 118 reads the tile data, MMU 502 prefetches the descriptor of the next page into the internal registers. In case an address falls outside a prefetched page, MMU 502 is able to skip pages in a chain. This is referred to herein as "page walking," and is accomplished by reading the link index from the page descriptor of a page to be skipped and fetching the descriptor of the following page in the chain from the page table, one or more times, until the page descriptor containing the virtual address to be accessed is fetched.
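Page walking can be sketched as a loop over chain links. The page-table shape and page size below are simplified assumptions for this example: the loop reads each skipped descriptor's link index and fetches the next descriptor until it reaches the page whose address range covers the requested virtual address (freeing of skipped pages, described next, is omitted here).

```python
PAGE_SIZE = 1024   # assumed page size for this sketch
LAST = 0x7FFF      # all-ones 15-bit index: end of chain

def page_walk(page_table, first, vaddr):
    """Follow chain links until the descriptor covering vaddr is fetched.
    Each descriptor is a dict with 'logical' (page base) and 'index' (next)."""
    i = first
    while i != LAST:
        desc = page_table[i]
        if desc["logical"] <= vaddr < desc["logical"] + PAGE_SIZE:
            return i                  # descriptor containing vaddr found
        i = desc["index"]             # skip this page: read its link index
    raise LookupError("address not in this chain")
```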
When the rendering engine moves from one page to the next, the old page is freed. MMU 502 returns a freed page to the free pool. If page walking is involved, all the skipped pages are freed. Pages that are freed explicitly by the rendering engine are the last pages in each tile, which are freed when the rendering engine closes the tile. MMU 502 also operates to transfer pages from the list used by the binning engine to the one used by the rendering engine. An example of a typical life cycle of a page is as follows:
1. Free -> Binning (added at the end of a tile which is part of the binning buffer)
2. Binning -> Reserved (close the binning buffer for write)
3. Reserved -> Rendering (open the render buffer for read)
4. Rendering -> Free
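The four-step life cycle above maps directly onto a small transition table. The states and legal transitions come from the list; the table form itself is just an illustration.

```python
# Legal page-state transitions taken from the life cycle described above.
TRANSITIONS = {
    "Free":      {"Binning"},    # added to a tile in the binning buffer
    "Binning":   {"Reserved"},   # binning buffer closed for write
    "Reserved":  {"Rendering"},  # render buffer opened for read
    "Rendering": {"Free"},       # content consumed, page returns to the pool
}

def advance(state, target):
    """Move a page to the next state, rejecting illegal transitions."""
    if target not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {target}")
    return target
```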
FIG. 6 provides a functional block diagram of an exemplary implementation for MMU 502. A state machine is provided for each of the various functions performed by the MMU, including a page walk state machine 602, a page prefetch state machine 604, a page allocation state machine 606, and a page freeing state machine 608. Each one of these state machines communicates with the registers for read/write purposes. A page descriptor address logic 610 receives inputs from the various state machines and communicates memory access requests to the page descriptor cache. MMU 502 also includes a request arbitration block 612 that
receives requests for the various operations (e.g., page allocation, prefetching, etc.) from TMU 504. The requests may occur at any time, and at any given time there may be several requests pending. Request arbitration block 612 decides the sequence of the requests and applies the request signals to the individual state machines.
`
A buffer swap state machine 902 couples to TMU state machine 900 and performs the buffer swapping function. A command arbitration block 904 decides the sequence of commands from the binning or the rendering engine (e.g., "open tile," "close buffer," etc.) before supplying them to TMU state machine 900. TMU state machine 900 is also coupled to read/write registers and the MMU, as shown.
As mentioned above, to reduce the latency of the operations that require accessing the tables, the memory controller includes two caches: page descriptor cache 506 for the page descriptor table, and tile descriptor cache 508 for the tile descriptor table. To reduce the latency of the accesses to the page descriptor table, the memory controller (114 in FIG. 1) includes memory, such as static random access memory (SRAM), to provide a caching mechanism for this table. In one embodiment, when the memory controller allocates a page to the binning engine, the page is taken from the beginning of one of the chains of free pages. When a page is no longer needed, because its content has already been read by the rendering engine, it is freed and inserted at the beginning (i.e., as the first page) of a chain of free pages. This algorithm increases the temporal locality of the page descriptor accesses and therefore improves the hit ratio of the page descriptor cache. Not appending the free page at the end of the chain also saves a read operation, since this algorithm requires that only the "first page" register be updated.
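The benefit of inserting freed pages at the head of the chain can be demonstrated with a toy model. The cache below is an invented stand-in for the page descriptor cache, not the hardware design: with head insertion, the page allocated next is always the page freed most recently, so its descriptor is still resident and almost every descriptor access hits.

```python
class ToyDescriptorCache:
    """Tiny fully associative cache of recently touched page descriptors."""
    def __init__(self, capacity=4):
        self.capacity, self.lines, self.hits, self.misses = capacity, [], 0, 0

    def touch(self, page):
        if page in self.lines:
            self.hits += 1
            self.lines.remove(page)
        else:
            self.misses += 1
            if len(self.lines) == self.capacity:
                self.lines.pop(0)         # evict the least recently used line
        self.lines.append(page)

# Free chain used head-first: allocation takes the most recently freed page.
cache, free_pages = ToyDescriptorCache(), [1, 2, 3, 4, 5, 6, 7, 8]
for _ in range(8):
    page = free_pages.pop()      # allocate from the head of the chain
    cache.touch(page)            # descriptor access on allocation
    free_pages.append(page)      # freed again: reinserted at the head
```

Because the same just-freed page keeps being reallocated, only the first access misses; appending freed pages at the tail instead would cycle through all eight descriptors and defeat the cache.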
In systems using two tile descriptor tables, one for the binning buffer and one for the rendering buffer, the access pattern to the two tables of tile descriptors is very different. The binning engine may jump back and forth between different tiles, because the geometry data can refer to triangles that are in different places on the screen. The rendering engine, on the other hand, opens a tile, renders everything inside it, and then moves to another tile. The binning engine therefore requires many more tile operations, the latency of which directly impacts the overall binning performance, whereas the one-time latency of the "tile open" operation in the rendering engine is acceptable. Given this difference, in one embodiment, the present invention provides a cache for the binning tile descriptor table and no cache for the rendering tile descriptor table. In a specific embodiment, when a frame is binned in its entirety, before the rendering engine starts rendering the tiles in that frame, the tile descriptor cache is flushed. The operation of the tile descriptor cache will be described in greater detail hereinafter in the context of a specific exemplary embodiment. In this example, the binning buffer tiles are organized in "super-tiles" made up of multiple tiles, e.g., four tiles arranged as a two-tiles-wide by two-tiles-high square, each numbered as follows:
0 1
2 3
The tile descriptor cache contains eight cache lines, each line containing the four tile descriptors for a "super-tile." The tile descriptor cache in this exemplary embodiment is organized as a fully associative cache. The replacement policy is "random," using, for example, an 8-bit random number generator. In order to optimize cache fills or spills, the tiles in a "super-tile" use sequential entries in the tile table. In one embodiment, the correspondence between the tile number and its position in the "super-tile" is calculated by the hardware. The following is an example of a tile table for an 800x600 screen (25x19 tiles), using a 32x20 organization. The 0, 1, 32, 33, 2, 3, 34, 35, ... organization corresponds to the following screen:
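The hardware's correspondence between a screen tile and its table entry can be expressed arithmetically. The formula below is reconstructed from the stated 0, 1, 32, 33, 2, 3, 34, 35, ... ordering for a 32-tile-wide organization and should be read as an illustration rather than the patent's exact circuit: the four tiles of each 2x2 super-tile occupy four sequential table entries, so a cache fill or spill touches one contiguous block.

```python
TILES_W = 32   # tile-table organization width (32x20 for an 800x600 screen)

def table_entry(tx, ty):
    """Tile-table entry index for screen tile (tx, ty), grouping each 2x2
    super-tile into four sequential entries (illustrative reconstruction)."""
    supertile = (ty // 2) * (TILES_W // 2) + (tx // 2)   # super-tile number
    within = (ty % 2) * 2 + (tx % 2)                     # position 0..3 inside
    return supertile * 4 + within
```

Reading entries 0 through 7 back in table order yields screen tiles 0, 1, 32, 33, 2, 3, 34, 35 (row-major tile numbers for a 32-wide grid), matching the listing above.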
`
`
Tile Management Unit
In order to optimize the memory utilization, the present invention provides TMU 504 to implement the algorithm by which pages from the rendering buffer are freed when they are no longer needed and then re-allocated to the binning
`buffer. In one embodim