US006104417A

United States Patent [19]
Nielsen et al.

[11] Patent Number: 6,104,417
[45] Date of Patent: *Aug. 15, 2000

[54] UNIFIED MEMORY COMPUTER ARCHITECTURE WITH DYNAMIC GRAPHICS MEMORY ALLOCATION

[75] Inventors: Michael J. K. Nielsen, San Jose; Zahid S. Hussain, Palo Alto, both of Calif.

[73] Assignee: Silicon Graphics, Inc., Mountain View, Calif.

[*] Notice: This patent issued on a continued prosecution application filed under 37 CFR 1.53(d), and is subject to the twenty year patent term provisions of 35 U.S.C. 154(a)(2).

[21] Appl. No.: 08/713,779

[22] Filed: Sep. 13, 1996

[51] Int. Cl.7 .............................. G06F 13/36
[52] U.S. Cl. .............................. 345/521; 345/512; 345/503; 345/524
[58] Field of Search .............................. 345/...

[56] References Cited

U.S. PATENT DOCUMENTS

5,450,542  9/1995  Lehman et al. .............. 345/512
5,640,543  6/1997  Farrell et al. .............. 345/512

Primary Examiner—Kee M. Tung
Attorney, Agent, or Firm—Wagner, Murabito & Hao

[57] ABSTRACT

A computer system provides dynamic memory allocation for graphics. The computer system includes a memory controller, a unified system memory, and memory clients each having access to the unified system memory via the memory controller. Memory clients can include a graphics rendering engine, a CPU, an image processor, a data compression/expansion device, an input/output device, and a graphics back end device. The computer system provides read/write access to the unified system memory, through the memory controller, for each of the memory clients. Translation hardware is included for mapping virtual addresses of pixel buffers to physical memory locations in the unified system memory. Pixel buffers are dynamically allocated as tiles of physically contiguous memory. Translation hardware is implemented in each of the computational devices which are included as memory clients in the computer system, including primarily the rendering engine.

29 Claims, 13 Drawing Sheets
[Cover drawing: the unified system memory coupled through the memory controller to the memory clients: graphics rendering engine 208, graphics back end IC 212, input/output IC 210, image processor 214, and data compression/expansion device 215.]

Volkswagen 1003
[Sheet 1 of 13 — FIG. 1 (Prior Art): block diagram of a prior art computer system with a CPU, a graphics processor, and an image processor, each served by its own controller (main memory controller 110, graphics memory controller, IP memory controller) and its own memory unit (main memory, dedicated graphics memory, dedicated IP memory).]
[Sheet 2 of 13 — FIG. 2A: unified system memory 202 coupled via the memory controller to the graphics rendering engine, input/output IC, graphics back end IC, image processor, and data compression/expansion device.]
[Sheet 3 of 13 — FIG. 2B: the graphics rendering and memory controller IC, containing the rendering engine, with its external buses (64-bit connections at 100 MHz toward the CPU side, 133 MHz connections toward the graphics back end and input/output) and the unified system memory (USM).]
[Sheet 4 of 13 — FIG. 2C: internal block diagram of the graphics rendering and memory controller IC: CPU/IPCE interface, GBE interface, input/output interface, and memory controller.]
[Sheet 5 of 13 — FIG. 3A: an exemplary tile 300 (512 bytes per row). FIG. 3B: an exemplary pixel buffer 302 composed of tiles 300.]
[Sheet 6 of 13 — FIG. 3C: address translation scheme. A TLB index selects among TLB types: frame buffer TLB A (256 x 16 bits), frame buffer TLB B (256 x 16 bits), frame buffer TLB C (256 x 16 bits), texture TLB (112 x 16 bits), CID TLB (16 x 16 bits), linear A TLB (32 x 32 bits), and linear B TLB (32 x 32 bits); the selected entry is combined with an offset.]
[Sheet 7 of 13 — FIG. 4: memory controller block diagram. Command pipe signals: CS_N, RAS_N, CAS_N, WE_N, Mem_Addr, Memmask_out, ECCmask. Data pipe signals: Memdata2mem_out, Memdata2mem_in, Memdata2client_in, Memmask_in. A decode stall signal ('1' = hold, '0' = enable) gates the pipes, with ECC generate and ECC correct stages.]
[Sheet 8 of 13 — FIG. 5: timing diagram for memory client requests: Clk, Clientreq.valid, Clientreq.adr, Clientreq.cmd, Clientreq.msg, Clientreq.ecc, and Clientres.gnt; the annotation marks the cycle at which a request is latched into the queue.]
[Sheet 9 of 13 — FIG. 6: timing diagram for memory client write data: Clientres.wrrdy, Clientres.oe, Clientres.wrmsg, Memdata2mem_in (Data0...Data3), and Memmask_in (Mask0...Mask2). FIG. 7: timing diagram for memory client read data: Clientres.rdrdy, Clientres.rdmsg, and Memdata2client_out (Data0...Data3).]
[Sheet 10 of 13 — FIG. 8: timing diagram for a write to a new page (We_n, Mem_addr, Memdata2mem_out, Memmask_out, Ecc_out, Eccmask), showing the precharge, activate, and write commands.]
[Sheet 11 of 13 — FIG. 9: timing diagram for a read to a new page (Cs_n, Ras_n, Cas_n, We_n, Mem_addr, Memdata2mem_out, Memmask_out, Ecc_out, Eccmask), showing the precharge and activate commands.]
[Sheet 12 of 13 — FIG. 10: address formats for external banks. For a 16 Mbit bank, the address from the CPU (bits 29-0) is split into EBS, IBS, row address, and column address fields and mapped to an internal address (bits 24-0); a corresponding mapping is shown for a 64 Mbit bank. Key: IBS = Internal Bank Select; EBS = External Bank Select.]
[Sheet 13 of 13 — FIG. 11: flow diagram for the bank state machines, with transitions taken on RR or RW when Tras = 0. Key: PR = Page Read; PW = Page Write; RR = Random Read; RW = Random Write.]
UNIFIED MEMORY COMPUTER ARCHITECTURE WITH DYNAMIC GRAPHICS MEMORY ALLOCATION
BACKGROUND OF THE INVENTION

The present invention relates to the field of computer systems. Specifically, the present invention relates to a computer system architecture including dynamic memory allocation of pixel buffers for graphics and image processing.
BACKGROUND OF THE INVENTION
Typical prior art computer systems often rely on peripheral processors and dedicated peripheral memory units to perform various computational operations. For example, peripheral graphics display processors are used to render graphics images (synthesis) and peripheral image processors are used to perform image processing (analysis). In typical prior art computer systems, CPU main memory is separate from peripheral memory units, which can be dedicated to graphics rendering, image processing, or other computational functions.

With reference to Prior Art FIG. 1, a prior art computer graphics system 100 is shown. The prior art computer graphics system 100 includes three separate memory units: a main memory 102, a dedicated graphics memory 104, and a dedicated image processing memory (image processor memory) 105. Main memory 102 provides fast access to data for a CPU 106 and an input/output device 108. The CPU 106 and input/output device 108 are connected to main memory 102 via a main memory controller 110. Dedicated graphics memory 104 provides fast access to graphics data for a graphics processor 112 via a graphics memory controller 114. Dedicated image processor memory 105 provides fast access to buffers of data used by an image processor 116 via an image processor memory controller 118. In the prior art computer graphics system 100, CPU 106 has read/write access to main memory 102 but not to dedicated graphics memory 104 or dedicated image processor memory 105. Likewise, the image processor 116 has read/write access to dedicated image processor memory 105, but not to main memory 102 or dedicated graphics memory 104. Similarly, graphics processor 112 has read/write access to dedicated graphics memory 104 but not to main memory 102 or dedicated image processor memory 105.

Certain computer system applications require that data stored in main memory 102 or in one of the dedicated memory units 104, 105 be operated upon by a processor other than the processor which has access to the memory unit in which the desired data is stored. Whenever data stored in one particular memory unit is to be processed by a designated processor other than the processor which has access to that particular memory unit, the data must be transferred to a memory unit to which the designated processor has access. For example, certain image processing applications require that data stored in main memory 102 or dedicated graphics memory 104 be processed by the image processor 116. Image processing is defined as any function that applies to two-dimensional blocks of pixels. These pixels may be in the format of file system images, fields, or frames of video entering the prior art computer system 100 through video ports, mass storage devices such as CD-ROMs, fixed-disk subsystems, and local or wide area network ports. In order to enable image processor 116 to access data stored in main memory 102 or in dedicated graphics memory 104, the data must be transferred or copied to dedicated image processor memory 105.
One problem with the prior art computer graphics system 100 is the cost of high performance peripheral dedicated memory systems such as the dedicated graphics memory unit 104 and dedicated image processor memory 105. Another problem with the prior art computer graphics system 100 is the cost of high performance interconnects for multiple memory systems. Another problem with the prior art computer graphics system 100 is that the above-discussed transfers of data between memory units require time and processing resources.

Thus, what is needed is a computer system architecture with a single unified memory system which can be shared by multiple processors in the computer system without transferring data between multiple dedicated memory units.
SUMMARY OF THE INVENTION
The present invention pertains to a computer system providing dynamic memory allocation for graphics. The computer system includes a memory controller, a unified system memory, and memory clients each having access to the system memory via the memory controller. Memory clients can include a graphics rendering engine, a central processing unit (CPU), an image processor, a data compression/expansion device, an input/output device, and a graphics back end device. In a preferred embodiment, the rendering engine and the memory controller are implemented on a first integrated circuit (first IC) and the image processor and the data compression/expansion device are implemented on a second IC. The computer system provides read/write access to the unified system memory, through the memory controller, for each of the memory clients. Translation hardware is included for mapping virtual addresses of pixel buffers to physical memory locations in the unified system memory. Pixel buffers are dynamically allocated as tiles of physically contiguous memory. Translation hardware, for mapping the virtual addresses of pixel buffers to physical memory locations in the unified system memory, is implemented in each of the computational devices which are included as memory clients in the computer system.

In a preferred embodiment, the unified system memory is implemented using synchronous DRAM. Also in the preferred embodiment, tiles are comprised of 64 kilobytes of physically contiguous memory arranged as 128 rows of 128 pixels, wherein each pixel is a 4-byte pixel. However, the present invention is also well suited to using tiles of other sizes. Also in the preferred embodiment, the dynamically allocated pixel buffers are comprised of n² tiles, where n is an integer.
The computer system of the present invention provides functional advantages for graphical display and image processing. There are no dedicated memory units in the computer system of the present invention aside from the unified system memory. Therefore, it is not necessary to transfer data from one dedicated memory unit to another when a peripheral processor is called upon to process data generated by the CPU or by another peripheral device.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which like reference numerals refer to similar elements:

Prior Art FIG. 1 is a circuit block diagram of a typical prior art computer system including peripheral processors and associated dedicated memory units.

FIG. 2A is a circuit block diagram of an exemplary unified system memory computer architecture according to the present invention.

FIG. 2B is an internal circuit block diagram of a graphics rendering and memory controller IC including a memory controller (MC) and a graphics rendering engine integrated therein.

FIG. 2C is an internal circuit block diagram of the graphics rendering and memory controller IC of FIG. 2B.

FIG. 3A is an illustration of an exemplary tile for dynamic allocation of pixel buffers according to the present invention.

FIG. 3B is an illustration of an exemplary pixel buffer comprised of n² tiles according to the present invention.

FIG. 3C is a block diagram of an address translation scheme according to the present invention.

FIG. 4 is a block diagram of a memory controller according to the present invention.

FIG. 5 is a timing diagram for memory client requests issued to the unified system memory according to the present invention.

FIG. 6 is a timing diagram for memory client write data according to the present invention.

FIG. 7 is a timing diagram for memory client read data according to the present invention.

FIG. 8 is a timing diagram for an exemplary write to a new page performed by the unified system memory according to the present invention.

FIG. 9 is a timing diagram for an exemplary read to a new page performed by the unified system memory according to the present invention.

FIG. 10 shows external banks of the memory controller according to the present invention.

FIG. 11 shows a flow diagram for bank state machines according to the present invention.

DETAILED DESCRIPTION OF THE INVENTION
In the following detailed description of the present invention, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be obvious to one skilled in the art that the present invention may be practiced without these specific details. In other instances, well known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the present invention.

Reference will now be made in detail to the preferred embodiments of the present invention, a computer system architecture having dynamic memory allocation for graphics, examples of which are illustrated in the accompanying drawings. While the invention will be described in conjunction with the preferred embodiments, it will be understood that they are not intended to limit the invention to these embodiments. On the contrary, the invention is intended to cover alternatives, modifications and equivalents, which may be included within the spirit and scope of the invention as defined by the appended claims. Furthermore, in the following detailed description of the present invention, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be obvious to one of ordinary skill in the art that the present invention may be practiced without these specific details. In other instances, well known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the present invention.

With reference to FIG. 2A, a computer system 200, according to the present invention, is shown. Computer system 200 includes a unified system memory 202 which is shared by various memory system clients including a CPU 206, a graphics rendering engine 208, an input/output IC 210, a graphics back end IC 212, an image processor 214, a data compression/expansion device 215, and a memory controller 204.

With reference to FIG. 2B, an exemplary computer system 201, according to the present invention, is shown. Computer system 201 includes the unified system memory 202 which is shared by various memory system clients including the CPU 206, the input/output IC 210, the graphics back end IC 212, an image processing and compression and expansion IC 216, and a graphics rendering and memory controller IC 218. The image processing and compression and expansion IC 216 includes the image processor 214 and a data compression and expansion unit 215. GRMC IC 218 includes the graphics rendering engine (rendering engine) 208 and the memory controller 204 integrated therein. The graphics rendering and memory controller IC 218 is coupled to unified system memory 202 via a high bandwidth memory data bus (HBWMD BUS) 225. In a preferred embodiment of the present invention, HBWMD BUS 225 includes a demultiplexer (SD-MUX) 220, a first bus 222 coupled between the graphics rendering and memory controller IC 218 and SD-MUX 220, and a second bus 224 coupled between SD-MUX 220 and unified system memory 202. In the preferred embodiment of the present invention, BUS 222 includes 144 lines cycled at 133 MHz and BUS 224 includes 288 lines cycled at 66 MHz. SD-MUX 220 demultiplexes the 144 lines of BUS 222, which are cycled at 133 MHz, to double the number of lines, 288, of BUS 224, which are cycled at half the frequency, 66 MHz. CPU 206 is coupled to the graphics rendering and memory controller IC 218 by a third bus 226. In the preferred embodiment of the present invention, BUS 226 is 64 bits wide and carries signals cycled at 100 MHz. The image processing and compression and expansion IC 216 is coupled to BUS 226 by a fourth bus 228. In the preferred embodiment of the present invention, BUS 228 is 64 bits wide and carries signals cycled at 100 MHz. The graphics back end IC 212 is coupled to the graphics rendering and memory controller IC 218 by a fifth bus 230. In the preferred embodiment of the present invention, BUS 230 is 64 bits wide and carries signals cycled at 133 MHz. The input/output IC 210 is coupled to the graphics rendering and memory controller IC 218 by a sixth bus 232. In the preferred embodiment of the present invention, BUS 232 is 32 bits wide and carries signals cycled at 133 MHz.

The input/output IC 210 of FIG. 2A contains all of the input/output interfaces including: keyboard & mouse, interval timers, serial, parallel, I2C, audio, video in & out, and fast Ethernet. The input/output IC 210 also contains an interface to an external 64-bit PCI expansion bus, BUS 231, that supports five masters (two SCSI controllers and three expansion slots).

With reference to FIG. 2C, an internal circuit block diagram is shown of the graphics rendering and memory controller IC 218 according to an embodiment of the present invention. As previously mentioned, rendering engine 208 and memory controller 204 are integrated within the graphics rendering and memory controller IC 218. The graphics rendering and memory controller IC 218 also includes a CPU/IPCE interface 238, an input/output interface 240, and a GBE interface 236.

With reference to FIGS. 2A and 2B, GBE interface 236 buffers and transfers display data from unified system memory 202 to the graphics back end IC 212 in 16x32-byte
bursts. GBE interface 236 buffers and transfers video capture data from the graphics back end IC 212 to unified system memory 202 in 16x32-byte bursts. GBE interface 236 issues GBE interrupts to CPU/IPCE interface 238. BUS 230, shown in both FIG. 2A and FIG. 2B, couples GBE interface 236 to the graphics back end IC 212 (FIG. 2A). The input/output interface 240 buffers and transfers data from unified system memory 202 to the input/output IC 210 in 8x32-byte bursts. The input/output interface 240 buffers and transfers data from the input/output IC 210 to unified system memory 202 in 8x32-byte bursts. The input/output interface 240 issues the input/output IC interrupts to CPU/IPCE interface 238. BUS 232, shown in both FIG. 2A and FIG. 2B, couples the input/output interface 240 to the input/output IC 210 (FIG. 2A). A bus, BUS 226, provides coupling between CPU/IPCE interface 238 and CPU 206 and the image processing and compression and expansion IC 216.
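As a rough check on the demultiplexed memory bus arrangement described earlier (144 lines at 133 MHz on BUS 222, 288 lines at 66 MHz on BUS 224), the peak bandwidth on the two sides of SD-MUX 220 can be computed. This is a sketch assuming one bit per line per clock cycle and ignoring protocol overhead; the helper name is illustrative, not from the patent.

```python
def peak_bandwidth_bytes(lines: int, mhz: float) -> float:
    """Peak bus bandwidth in bytes/s, assuming one bit per line per cycle."""
    return lines / 8 * mhz * 1e6

# BUS 222: 144 lines cycled at 133 MHz (GRMC IC 218 side of SD-MUX 220).
bw_222 = peak_bandwidth_bytes(144, 133)
# BUS 224: 288 lines cycled at 66 MHz (unified system memory side):
# twice the width at half the rate, so the two sides nearly balance.
bw_224 = peak_bandwidth_bytes(288, 66)
```

Doubling the width while halving the clock keeps the two sides of the demultiplexer within about 1% of each other in peak throughput (the small mismatch comes from 66 MHz being slightly less than half of 133 MHz).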
With reference to FIG. 2A, the memory controller 204 is the interface between memory system clients (CPU 206, rendering engine 208, input/output IC 210, graphics back end IC 212, image processor 214, and data compression/expansion device 215) and the unified system memory 202. As previously mentioned, the memory controller 204 is coupled to unified system memory 202 via HBWMD BUS 225, which allows fast transfer of large amounts of data to and from unified system memory 202. Memory clients make read and write requests to unified system memory 202 through the memory controller 204. The memory controller 204 converts requests into the appropriate control sequences and passes data between memory clients and unified system memory 202. In the preferred embodiment of the present invention, the memory controller 204 contains two pipeline structures, one for commands and another for data. The request pipe has three stages: arbitration, decode, and issue/state machine. The data pipe has only one stage, ECC. Requests and data flow through the pipes in the following manner. Clients place their requests in a queue. The arbitration logic looks at all of the requests at the top of the client queues and decides which request to start through the pipe. From the arbitration stage, the request flows to the decode stage. During the decode stage, information about the request is collected and passed on to an issue/state machine stage.
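The three-stage request pipe just described can be sketched in software. The round-robin arbitration policy, the queue layout, and the function names below are illustrative assumptions; the text specifies only that arbitration chooses among the requests at the heads of the client queues before the decode and issue stages.

```python
from collections import deque

def run_request_pipe(client_queues):
    """Drain per-client request queues through arbitrate -> decode -> issue."""
    issued = []
    clients = list(client_queues)
    rr = 0  # round-robin pointer (assumed policy, not from the patent)
    while any(client_queues[c] for c in clients):
        # Arbitration stage: choose among the requests at the queue heads.
        for i in range(len(clients)):
            c = clients[(rr + i) % len(clients)]
            if client_queues[c]:
                req = client_queues[c].popleft()
                rr = (rr + i + 1) % len(clients)
                break
        # Decode stage: collect information about the request.
        decoded = {"client": c, **req}
        # Issue/state-machine stage: hand the decoded request onward.
        issued.append(decoded)
    return issued
```

With two clients queued, requests interleave rather than letting one client drain first, which is the point of arbitrating at the queue heads.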
With reference to FIG. 2A, the rendering engine 208 is a 2-D and 3-D graphics coprocessor which can accelerate rasterization. In a preferred embodiment of the present invention, the rendering engine 208 is also cycled at 66 MHz and operates synchronously to the unified system memory 202. The rendering engine 208 receives rendering parameters from the CPU 206 and renders directly to frame buffers stored in the unified system memory 202 (FIG. 2A). The rendering engine 208 issues memory access requests to the memory controller 204. Since the rendering engine 208 shares the unified system memory 202 with other memory clients, the performance of the rendering engine 208 will vary as a function of the load on the unified system memory 202. The rendering engine 208 is logically partitioned into four major functional units: a host interface, a pixel pipeline, a memory transfer engine, and a memory request unit. The host interface controls reading and writing from the host to programming interface registers. The pixel pipeline implements a rasterization and rendering pipeline to a frame buffer. The memory transfer engine performs memory bandwidth byte aligned clears and copies on both linear buffers and frame buffers. The memory request unit arbitrates between requests from the pixel pipeline and queues up memory requests to be issued to the memory controller 204.
The computer system 200 includes dynamic memory allocation of virtual pixel buffers in the unified system memory 202. Pixel buffers include frame buffers, texture maps, video maps, image buffers, etc. Each pixel buffer can include multiple color buffers, a depth buffer, and a stencil buffer. In the present invention, pixel buffers are allocated in units of contiguous memory called tiles, and address translation buffers are provided for dynamic allocation of pixel buffers.

With reference to FIG. 3A, an illustration is shown of an exemplary tile 300 for dynamic allocation of pixel buffers according to the present invention. In a preferred embodiment of the present invention, each tile 300 includes 64 kilobytes of physically contiguous memory. A 64 kilobyte tile size can be comprised of 128x128 pixels for 32-bit pixels, 256x128 pixels for 16-bit pixels, or 512x128 pixels for 8-bit pixels. In the present invention, tiles begin on 64 kilobyte aligned addresses. An integer number of tiles can be allocated for each pixel buffer. For example, a 129x129 pixel buffer and a 256x256 pixel buffer would both require four (128x128) pixel tiles.
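The tile geometry above follows directly from the fixed 64 KB tile size. A sketch (helper names are illustrative; the 128-row tile height for every pixel depth is taken from the 128x128, 256x128, and 512x128 figures above):

```python
import math

TILE_BYTES = 64 * 1024      # each tile: 64 KB of physically contiguous memory
TILE_ROWS = 128             # every tile spans 128 rows of pixels

def tile_width_px(bytes_per_pixel: int) -> int:
    """Width in pixels of one 64 KB, 128-row tile for a given pixel depth."""
    return TILE_BYTES // (TILE_ROWS * bytes_per_pixel)

def tiles_needed(width_px: int, height_px: int, bytes_per_pixel: int) -> int:
    """Whole tiles required to cover a width x height pixel buffer."""
    return (math.ceil(width_px / tile_width_px(bytes_per_pixel))
            * math.ceil(height_px / TILE_ROWS))
```

For 32-bit pixels, both a 129x129 buffer and a 256x256 buffer round up to a 2x2 grid of tiles, i.e. four tiles, matching the example above.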
With reference to FIG. 3B, an illustration is shown of an exemplary pixel buffer 302 according to the present invention. In the computer system 200 of the present invention, translation hardware maps virtual addresses of pixel buffers 302 to physical memory locations in unified system memory 202. Each of the computational units of the computer system 200 (image processing and compression and expansion IC 216, graphics back end IC 212, the input/output IC 210, and rendering engine 208) includes translation hardware for mapping virtual addresses of pixel buffers 302 to physical memory locations in unified system memory 202. Each pixel buffer 302 is partitioned into n² tiles 300, where n is an integer. In a preferred embodiment of the present invention, n=4.
The rendering engine 208 supports a frame buffer address translation buffer (TLB) to translate frame buffer (x, y) addresses into physical memory addresses. This TLB is loaded by CPU 206 with the base physical memory addresses of the tiles which compose a color buffer and the stencil-depth buffer of a frame buffer. In a preferred embodiment of the present invention, the frame buffer TLB has enough entries to hold the tile base physical memory addresses of a 2048x2048 pixel color buffer and a 2048x2048 pixel stencil-depth buffer. Therefore, the TLB has 256 entries for color buffer tiles and 256 entries for stencil-depth buffer tiles.
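A sketch of the (x, y)-to-physical translation that such a frame buffer TLB performs, for 32-bit pixels. The row-major ordering of tile entries and the dict standing in for the TLB are illustrative assumptions; only the tile size (128x128 pixels, 64 KB) and the entry count (256 per 2048x2048 buffer, i.e. a 16x16 tile grid) come from the text above.

```python
BYTES_PER_PIXEL = 4                 # 32-bit pixels: one tile is 128x128 pixels
TILE_DIM = 128
TILES_PER_ROW = 2048 // TILE_DIM    # 16 tiles across a 2048x2048 buffer

def fb_translate(x: int, y: int, tlb: dict) -> int:
    """Map a frame-buffer (x, y) pixel address to a physical byte address."""
    # Select the TLB entry holding the base physical address of the tile.
    tile_index = (y // TILE_DIM) * TILES_PER_ROW + (x // TILE_DIM)
    base = tlb[tile_index]
    # Byte offset of the pixel within its 128x128 tile.
    offset = ((y % TILE_DIM) * TILE_DIM + (x % TILE_DIM)) * BYTES_PER_PIXEL
    return base + offset
```

Because each lookup touches only one tile entry, the tiles themselves can sit anywhere in physical memory, which is what makes dynamic, scattered allocation possible.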
Tiles provide a convenient unit for memory allocation. By allowing tiles to be scattered throughout memory, tiling makes the amount of memory which must be contiguously allocated manageable. Additionally, tiling provides a means of reducing the amount of system memory consumed by frame buffers. Rendering to invisible tiles, tiles which do not contain any pixels pertinent for display, can be easily clipped out, and hence no memory needs to be allocated for these tiles. For example, a 1024x1024 virtual frame buffer consisting of front and back RGBA buffers and a depth buffer would consume 12 MB of memory if fully resident. However, if each 1024x1024 buffer were partitioned into 64 (128x128) tiles of which only four tiles contained non-occluded pixels, only memory for those visible tiles would need to be allocated. In this case, only ¾ MB would be consumed.
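The arithmetic in the example above can be verified directly. The variable names are illustrative; the sizes follow from 1024x1024 pixels at 4 bytes per pixel and 64 KB tiles.

```python
TILE_BYTES = 64 * 1024
MB = 1024 * 1024

# Fully resident: front RGBA, back RGBA, and depth, each 1024x1024 x 4 bytes.
fully_resident = 3 * (1024 * 1024 * 4)     # 12 MB total
# Sparse: each of the three buffers spans 64 tiles, only 4 of them visible.
sparse = 3 * (4 * TILE_BYTES)              # 12 resident tiles in total
```

Twelve resident 64 KB tiles come to three-quarters of a megabyte, a 16x reduction over the fully resident case.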
In the present invention, memory system clients (e.g., CPU 206, rendering engine 208, input/output IC 210, graphics back end IC 212, image processor 214, and data
compression/expansion device 215) share the unified system memory 202. Since each memory system client has access to memory shared by each of the other memory system clients, there is no need for transferring data from one dedicated memory unit to another. For example, data can be received by the input/output IC 210, decompressed (or expanded) by the data compression/expansion device 215, and stored in the unified system memory 202. This data can then be accessed by the CPU 206, the rendering engine 208, the input/output IC 210, the graphics back end IC 212, or the image processor 214. As a second example, the CPU 206, the rendering engine 208, the input/output IC 210, the graphics back end IC 212, or the image processor 214 can use data generated by the CPU 206, the rendering engine 208, the input/output IC 210, the graphics back end IC 212, or the image processor 214. Each of the computational units (CPU 206, input/output IC 210, the graphics back end IC 212, the image processing and compression and expansion IC 216, the graphics rendering and memory controller IC 218, and the data compression/expansion device 215) has translation hardware for determining the physical addresses of pixel buffers, as is discussed below.

There are numerous video applications for which the present invention computer system 200 provides functional advantages over prior art computer system architectures. These applications range from video conferencing to video editing. There is significant variation in the processing required for the various applications, but a few processing steps are common to all applications: capture, filtering, scaling, compression, blending, and display. In operation of computer system 200, input/output IC 210 can bring in a compressed stream of video data which can be stored into unified system memory 202. The input/output IC 210 can access the compressed data stored in unified system memory 202, via a path through the graphics rendering and memory controller IC 218. The input/output IC 210 can then decompress the accessed data and store the decompressed data into unified system memory 202. The stored image data can then be used, for example, as a texture map by rendering engine 208 for mapping the stored image onto another image. The resultant image can then be stored into a pixel buffer which has been allocated dynamically in unified system memory 202. If the resultant image is stored into a frame buffer, allocated dynamically in unified system memory 202, then the resultant image can be displayed by the graphics back end IC 212 or the imag

SDRAM components and populated on the front only or the front and back side of the DIMM. Two DIMMs are required to make an external SDRAM bank. 1Mx16 SDRAM components construct a 32 Mbyte external bank, while 4Mx16 SDRAM components construct a 128 Mbyte external bank. Unified system memory 202 can range in size from 32 Mbytes to 1 Gbyte.

FIG. 3C shows a block diagram of an address translation scheme according to the present invention. FIG. 4 shows a block diagram of the memory controller 204 of the present invention.

A memory client interface contains the signals listed in Table 1, below.

TABLE 1

Memory client interface signals

Signal              Pin Name       Description
clientreq.cmd       internal only  type of request: 1 - read; 3 - write; 4 - rmw
clientreq.adr       internal only  address of request
clientreq.msg       internal only  message sent with request
clientreq.valid     internal only  1 - valid; 0 - not valid
clientreq.ecc       internal only  1 - ecc is valid; 0 - ecc not valid
clientres.gnt       internal only  1 - room in client queue; 0 - no room
clientres.wrrdy     internal only  1 - MC is ready for write data; 0 - MC not ready for write data
clientres.rdrdy     internal only  1 - valid read data; 0 - not valid read data
clientres.oe        internal only  1 - enable client driver; 0 - disable client driver
clientres.rdmsg     internal only  read message sent with read data
clientres.wrmsg     internal only  write message sent with wrrdy
memdata2mem_in      internal only  memory data from client going to unified system memory
memmask_in          internal only  memory mask from client going to unified system memory: 0 - write byte; 1 - don't write byte; memmask_in(0) is matched with memdata2mem_in(7:0) and so on
memdata2client_out  internal only  memory data from unified system memory going to the client

With reference to FIG. 5, a timing diagram for memory client requests is shown. A memory client makes a request to the memory controller 204 by asserting clientreq.valid while setting the clientreq.adr, clientreq.msg, clientreq.cmd and clientreq.ecc lines to the appropriate values. If there is room in the queue, the request is latched into the memory client queue. Only two of the memory clients, the rendering engine 208 and the input/output IC 210, use clientreq.msg. The message specifies which subsystem within the input/output IC 210 or the rendering engine 208 made the request. When an error occurs, this message is saved along with other pertinent information to aid in the debug process. For the rendering engine 208, the message is passed through the request pipe and returned with other pertinent information to
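The client request handshake of Table 1 and FIG. 5 can be sketched as follows. Field names and the cmd codes (1 = read, 3 = write, 4 = rmw) follow Table 1; the queue depth and the class name are illustrative assumptions.

```python
from collections import deque

class ClientRequestQueue:
    """Latches client requests per the FIG. 5 handshake (depth is assumed)."""

    def __init__(self, depth: int = 4):
        self.q = deque()
        self.depth = depth

    @property
    def gnt(self) -> int:
        """clientres.gnt: 1 - room in client queue; 0 - no room."""
        return 1 if len(self.q) < self.depth else 0

    def latch(self, valid: int, cmd: int, adr: int,
              msg: int = 0, ecc: int = 0) -> bool:
        """Latch a request only when clientreq.valid is asserted and gnt shows room."""
        if valid and self.gnt:
            self.q.append({"cmd": cmd, "adr": adr, "msg": msg, "ecc": ecc})
            return True
        return False
```

A full queue simply deasserts gnt, so a client that keeps valid asserted retries on a later cycle rather than losing the request.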