(12) United States Patent
Perego

(10) Patent No.: US 6,864,896 B2
(45) Date of Patent: Mar. 8, 2005

US006864896B2

(54) SCALABLE UNIFIED MEMORY ARCHITECTURE

(75) Inventor: Richard E. Perego, San Jose, CA (US)

(73) Assignee: Rambus Inc., Los Altos, CA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 396 days.

(21) Appl. No.: 09/858,836

(22) Filed: May 15, 2001

(65) Prior Publication Data
US 2002/0171652 A1 Nov. 21, 2002

(51) Int. Cl.7 ............... G06F 15/167
(52) U.S. Cl. ............... 345/542; 345/543; 345/544
(58) Field of Search ...... 345/421, 542, 505, 531, 532, 515, 520, 535, 543, 544; 710/269; 712/201; 711/209, 120, 170, 111, 105, 103; 341/51; 375/240.12

(56) References Cited

U.S. PATENT DOCUMENTS

5,325,493 A *  6/1994  Herrell et al. .......... 712/201
5,867,180 A *  2/1999  Katayama et al. ........ 345/542
6,104,417 A *  8/2000  Nielsen et al. ......... 345/542
6,208,273 B1 * 3/2001  Dye et al. .............. 341/51
6,690,726 B1 * 2/2004  Yavits et al. ....... 375/240.12

* Cited by examiner

Primary Examiner—
Assistant Examiner—Dalip K. Singh
(74) Attorney, Agent, or Firm—Lee & Hayes, PLLC

(57) ABSTRACT

A memory architecture includes a memory controller coupled to multiple modules. Each module includes a computing engine coupled to a shared memory. Each computing engine is capable of receiving instructions from the memory controller and processing the received instructions. The shared memory is configured to store main memory data and graphical data. Certain computing engines are capable of processing graphical data. The memory controller may include a graphics controller that provides instructions to the computing engine. An interconnect on each module allows multiple modules to be coupled to the memory controller.

34 Claims, 7 Drawing Sheets
[Front-page drawing: FIG. 3 — CPU/memory controller subsystem 302 (CPU, memory controller/graphics controller, I/O controller, video input, display interface) coupled to memory modules 304, each containing a rendering engine and a shared main and graphics memory]
[Sheet 1 of 7 — FIG. 1 (Prior Art): computer architecture with a memory controller coupled to an I/O controller and to a discrete graphics controller, which supports a local graphics memory and a display interface]
[Sheet 2 of 7 — FIG. 2 (Prior Art): unified memory architecture; CPU/memory controller subsystem 202 with a combined memory controller/graphics controller, video input, display interface, I/O controller, and a shared main memory and graphics memory]
[Sheet 3 of 7 — FIG. 3: scalable unified memory architecture; CPU/memory controller subsystem 302 (memory controller/graphics controller, video input, display interface, I/O controller) coupled to memory modules, each with a rendering engine 312 and a shared main and graphics memory 314]
[Sheet 4 of 7 — FIG. 4: flow diagram; the memory controller/graphics controller receives a processing task, partitions it into multiple portions, distributes each portion to a rendering engine, waits until all rendering engines have finished, then reassembles the finished portions of the task]
[Sheet 5 of 7 — FIG. 5: rendering surface divided into horizontal and vertical pixel bands of tiles]
[Sheet 6 of 7 — FIGS. 6 and 7: memory modules, each with a rendering engine coupled to memory devices 704 by a memory interconnect and to module interconnects 706 and 708]
[Sheet 7 of 7 — FIG. 8: memory module with two rendering engines, memory devices 804 and 812, memory interconnects 806, and a common module interconnect 808]
SCALABLE UNIFIED MEMORY ARCHITECTURE

TECHNICAL FIELD

The present invention relates to memory systems and, in particular, to scalable unified memory architecture systems that support parallel processing of data.
BACKGROUND

Various systems are available for storing data in memory devices and retrieving data from those memory devices. FIG. 1 illustrates a computer architecture 100 in which a discrete graphics controller supports a local graphics memory. Computer architecture 100 includes a central processing unit (CPU) 102 coupled to a memory controller 104. Memory controller 104 is coupled to a main memory 108, an I/O controller 106, and a graphics controller 110. The main memory is used to store program instructions which are executed by the CPU and data which are referenced during the execution of these programs. The graphics controller 110 is coupled to a discrete graphics memory 112. Graphics controller 110 receives video data through a video input 114 and transmits video data to other devices through a display interface 116.
The architecture of FIG. 1 includes two separate memories (main memory 108 and graphics memory 112), each controlled by a different controller (memory controller 104 and graphics controller 110, respectively). Typically, graphics memory 112 includes faster and more expensive memory devices, while main memory 108 has a larger storage capacity, but uses slower, less expensive memory devices.
Improvements in integrated circuit design and manufacturing technologies allow higher levels of integration, thereby allowing an increasing number of subsystems to be integrated into a single device. This increased integration reduces the total number of components in a system, such as a computer system. As subsystems with high memory performance requirements (such as graphics subsystems) are combined with the traditional main memory controller, the resulting architecture may provide a single high-performance main memory interface.
Another type of computer memory architecture is referred to as a unified memory architecture (UMA). In a UMA system, the graphics memory is statically or dynamically partitioned off from the main memory pool, thereby saving the cost associated with dedicated graphics memory. UMA systems often employ less total memory capacity than systems using discrete graphics memory to achieve similar levels of graphics performance. UMA systems typically realize additional cost savings due to the higher levels of integration between the memory controller and the graphics controller.
FIG. 2 illustrates another prior art memory system 200 that uses a unified memory architecture. A CPU/Memory Controller subsystem 202 includes a CPU 208 and a memory controller and a graphics controller combined into a single device 210. The subsystem 202 represents an increased level of integration as compared to the architecture of FIG. 1. Subsystem 202 is coupled to a shared memory 204, which is used as both the main memory and the graphics memory. Subsystem 202 is also coupled to an I/O controller 206, a video input 212, and a display interface 214.
The memory controller/graphics controller 210 controls all memory access, both for data stored in the main memory portion of shared memory 204 and for data stored in the graphics memory portion of the shared memory. The shared memory 204 may be partitioned statically or dynamically. A static partition allocates a fixed portion of the shared memory 204 as “main memory” and the remaining portion is the “graphics memory.” A dynamic partition allows the allocation of shared memory 204 between main memory and graphics memory to change depending on the needs of the system. For example, if the graphics memory portion is full, and the graphics controller needs additional memory, the graphics memory portion may be expanded if a portion of the shared memory 204 is not currently in use or if the main memory allocation can be reduced.
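As an illustration of the dynamic case (a sketch only; the class, names, and sizes below are assumptions, not taken from the patent), the partition can be modeled as a movable boundary inside a single pool, where a request to grow the graphics region succeeds only if unallocated space exists or the main memory allocation can be reduced:

    class SharedMemoryPartition:
        """Toy model of a dynamically partitioned shared memory pool."""

        def __init__(self, total_mb, main_mb, graphics_mb):
            assert main_mb + graphics_mb <= total_mb
            self.total_mb = total_mb
            self.main_mb = main_mb
            self.graphics_mb = graphics_mb

        @property
        def free_mb(self):
            # Space allocated to neither main memory nor graphics memory.
            return self.total_mb - self.main_mb - self.graphics_mb

        def grow_graphics(self, request_mb, min_main_mb):
            # Claim unused space first, then shrink main memory, but never
            # below the floor the rest of the system requires.
            from_free = min(request_mb, self.free_mb)
            from_main = request_mb - from_free
            if from_main > self.main_mb - min_main_mb:
                return False  # request cannot be satisfied
            self.main_mb -= from_main
            self.graphics_mb += request_mb
            return True

    pool = SharedMemoryPartition(total_mb=256, main_mb=160, graphics_mb=64)
    assert pool.grow_graphics(request_mb=48, min_main_mb=128)  # 32 MB free + 16 MB from main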
Regardless of the system architecture, graphics rendering performance is often constrained by the memory bandwidth available to the graphics subsystem. In the system of FIG. 1, graphics controller 110 interfaces to a dedicated graphics memory 112. Cost constraints for the graphics subsystem generally dictate that a limited capacity of dedicated graphics memory 112 must be used. This limited amount of memory, in turn, dictates a maximum number of memory devices that can be supported. In such a memory system, the maximum graphics memory bandwidth is the product of the number of memory devices and the bandwidth of each memory device. Device-level cost constraints and technology limitations typically set the maximum memory device bandwidth. Consequently, graphics memory bandwidth, and therefore graphics performance, are generally bound by the small number of devices that can reasonably be supported in this type of system configuration.
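A worked instance of that product, with illustrative figures that are not from the patent: a cost-limited graphics subsystem with four memory devices at 1.6 GB/s each tops out at 6.4 GB/s, regardless of how capable the graphics controller is.

    # Peak graphics memory bandwidth = number of devices x per-device bandwidth.
    # Both figures below are assumed for illustration.
    num_devices = 4          # small device count dictated by cost constraints
    device_bw_gb_s = 1.6     # per-device bandwidth in GB/s
    peak_bw_gb_s = num_devices * device_bw_gb_s
    print(f"peak graphics memory bandwidth: {peak_bw_gb_s:.1f} GB/s")  # 6.4 GB/s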
Unified memory architectures such as that shown in FIG. 2 help alleviate cost constraints as described above, and generally provide lower cost relative to systems such as that shown in FIG. 1. However, memory bandwidth for the system of FIG. 2 is generally bound by cost constraints on the memory controller/graphics controller 210. Peak memory bandwidth for this system is the product of the number of conductors on the memory data interface and the communication bandwidth per conductor. The communication bandwidth per conductor is often limited by the choice of memory technology and the topology of the main memory interconnect. The number of conductors that can be used is generally bound by cost constraints on the memory controller/graphics controller package or system board design. However, the system of FIG. 2 allows theoretical aggregate bandwidth to and from the memory devices to scale linearly with system memory capacity, which is typically much larger than the capacity of dedicated graphics memory. The problem is that this aggregate bandwidth cannot be exploited due to the limiting factors described above relating to bandwidth limitations at the memory controller/graphics controller.
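The two limits can be put side by side. In the sketch below (assumed figures, not values from the patent), the controller-side interface caps deliverable bandwidth at the conductor product, while the device-side aggregate, which grows with the number of modules and devices, is several times larger; the architecture described later is aimed at exploiting that device-side aggregate.

    # Controller-bound peak bandwidth = conductors x bandwidth per conductor.
    conductors = 64
    bw_per_conductor_gb_s = 0.1                          # 100 MB/s per pin (assumed)
    controller_bw = conductors * bw_per_conductor_gb_s   # 6.4 GB/s ceiling

    # Aggregate device-side bandwidth scales with memory capacity.
    modules, devices_per_module, device_bw_gb_s = 4, 8, 1.6
    aggregate_bw = modules * devices_per_module * device_bw_gb_s  # 51.2 GB/s

    print(f"controller-bound: {controller_bw:.1f} GB/s, "
          f"device aggregate: {aggregate_bw:.1f} GB/s")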
A system architecture which could offer the cost savings advantages of a unified memory architecture, while providing scalability options to higher levels of aggregate memory bandwidth (and therefore graphics performance) relative to systems using dedicated graphics memory, would be advantageous.
SUMMARY

The systems and methods described herein achieve these goals by supporting the capability of locating certain processing functions on the memory modules, while providing the capability of partitioning tasks among multiple parallel functional units or modules.
In one embodiment, an apparatus includes a memory controller coupled to one or more modules. Each installed module includes a computing engine coupled to a shared memory. The computing engine is configured to receive information from the memory controller. The shared memory includes storage for instructions or data, which includes a portion of the main memory for a central processing unit (CPU).
In another embodiment, the computing engine is a graphics rendering engine.

In a particular implementation of the system, the shared memory is configured to store main memory data and graphical data.

Another embodiment provides that the computing engine is coupled between the memory controller and the shared memory.

In a particular embodiment, the memory controller includes a graphics controller.

In a described implementation, one or more modules are coupled to a memory controller. Each installed module includes a computing engine and a shared memory coupled to the computing engine.
BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a prior art computer architecture in which a discrete graphics controller supports a local graphics memory.

FIG. 2 illustrates another prior art memory system that uses a unified memory architecture.

FIG. 3 illustrates an embodiment of a scalable unified memory architecture that supports parallel processing of graphical data.

FIG. 4 is a flow diagram illustrating a procedure for partitioning a processing task into multiple portions and distributing the multiple portions to different rendering engines.

FIG. 5 illustrates a graphical rendering surface divided into sixteen different sections.

FIGS. 6 and 7 illustrate different memory modules containing a rendering engine and multiple memory devices.

FIG. 8 illustrates a memory module containing two different rendering engines and multiple memory devices associated with each rendering engine.
DETAILED DESCRIPTION

The architecture described herein provides one or more discrete memory modules coupled to a common memory controller. Each memory module includes a computing engine and a shared memory. Thus, the data processing tasks can be distributed among the various computing engines and the memory controller to allow parallel processing of different portions of the various processing tasks.

Various examples are provided herein for purposes of explanation. These examples include the processing of graphical data by one or more rendering engines on one or more memory modules. It will be understood that the systems described herein are not limited to processing graphical data. Instead, the systems described herein may be applied to process any type of data from any data source.
FIG. 3 illustrates an embodiment of a scalable unified memory architecture 300 that supports parallel processing of graphical data and/or graphical instructions. A CPU/memory controller subsystem 302 includes a CPU 308 coupled to a memory controller/graphics controller 310. One or more memory modules 304 are coupled to memory controller/graphics controller 310 in subsystem 302. Each memory module 304 includes a rendering engine 312 and a shared memory 314 (i.e., main memory and graphics memory). The main memory typically contains instructions and/or data used by the CPU. The graphics memory typically contains instructions and/or data used to process, render, or otherwise handle graphical images or graphical information.
Rendering engine 312 may also be referred to as a “compute engine” or a “computing engine.” The shared memory 314 typically includes multiple memory devices coupled together to form a block of storage space. Each rendering engine 312 is capable of performing various memory access and/or data processing functions. For the embodiment shown in FIG. 3, memory controller/graphics controller 310 is also coupled to an I/O controller 306 which controls the flow of data into and out of the system. An optional video input port 316 provides data to memory controller/graphics controller 310 and a display interface 318 provides data output to one or more devices (such as display devices or storage devices). For systems which support video input or capture capability, a video input port on the memory controller/graphics controller 310 is one way to handle the delivery of video source data. Another means of delivery of video input data to the system would include delivering the data from a peripheral module through the I/O controller 306 to device 310.
In the example of FIG. 3, CPU/memory controller subsystem 302 is coupled to four distinct memory modules 304. Each memory module includes a rendering engine and a shared memory. Each rendering engine 312 is capable of performing various data processing functions. Thus, the four rendering engines are capable of performing four different processing functions simultaneously (i.e., parallel processing). Further, each rendering engine 312 is capable of communicating with other rendering engines on other memory modules 304.
The memory controller/graphics controller 310 distributes particular processing tasks (such as graphical processing tasks) to the different rendering engines, and performs certain processing tasks itself. These tasks may include data to be processed and/or instructions to be processed. Although four memory modules 304 are shown in FIG. 3, alternate systems may contain any number of memory modules coupled to a common memory controller/graphics controller 310. This ability to add and remove memory modules 304 provides an upgradeable and scalable memory and computing architecture.
The architecture of FIG. 3 allows the memory controller/graphics controller 310 to issue high level primitive commands to the various rendering engines 312, thereby reducing the volume or bandwidth of data that must be communicated between the controller 310 and the memory modules 304. Thus, the partitioning of memory among multiple memory modules 304 improves graphical data throughput relative to systems in which a single graphics controller performs all processing tasks and reduces bandwidth contention with the CPU. This bandwidth reduction occurs because the primitive commands typically contain significantly less data than the amount of data referenced when rendering the primitive. Additionally, the system partitioning described allows aggregate bandwidth between the rendering engines and the memory devices to be much higher than the bandwidth between the controller and memory modules. Thus, effective system bandwidth is increased for processing graphics tasks.
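The scale of that reduction is easy to see with assumed numbers (illustrative only, not from the patent): a triangle primitive can be described to a rendering engine in a few dozen bytes, while rendering it centrally would move every covered pixel across the controller interface.

    # A primitive command vs. the memory traffic it implies (assumed sizes).
    command_bytes = 96            # vertices plus attributes for one triangle
    pixels_covered = 10_000       # pixels the triangle rasterizes to
    bytes_per_pixel = 4           # 32-bit color; depth traffic would add more
    rendered_bytes = pixels_covered * bytes_per_pixel

    ratio = rendered_bytes / command_bytes
    print(f"{command_bytes} B of command vs {rendered_bytes} B of pixel traffic "
          f"(~{ratio:.0f}x)")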
FIG. 4 is a flow diagram illustrating a procedure for partitioning a processing task into multiple portions and distributing the multiple portions to different rendering engines. Initially, the memory controller/graphics controller receives a processing task (block 402). The memory controller/graphics controller partitions the processing task into multiple portions (block 404) and distributes each portion of the processing task to a rendering engine on a memory module (block 406).

When all rendering engines have finished processing their portion of the processing task, the memory controller/graphics controller reassembles the finished portions of the task (block 410). In certain situations, the memory controller/graphics controller may perform certain portions of the processing task itself rather than distributing them to a rendering engine. In other situations, the processing task may be partitioned such that one or more rendering engines perform multiple portions of the processing task. In another embodiment, the processing task may be partitioned such that one or more rendering engines are idle (i.e., they do not perform any portion of the processing task).
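A minimal sketch of this flow, assuming a generic process(engine, portion) callable that stands in for the work a rendering engine performs on its memory module (none of these names come from the patent):

    from concurrent.futures import ThreadPoolExecutor

    def run_task(task, num_engines, process):
        # Block 404: partition the task into one portion per rendering engine.
        portions = [task[i::num_engines] for i in range(num_engines)]
        # Block 406: distribute the portions; the pool waits until all finish.
        with ThreadPoolExecutor(max_workers=num_engines) as pool:
            results = list(pool.map(process, range(num_engines), portions))
        # Block 410: reassemble the finished portions into one result.
        return [item for portion in results for item in portion]

    # Illustrative use: four "engines" each double the values in their portion.
    out = run_task(list(range(8)), 4, lambda engine, p: [x * 2 for x in p])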
For typical graphics applications, primitive data (e.g., triangles or polygons) are sorted by the graphics controller or CPU according to the spatial region of the rendering surface (e.g., the x and y coordinates) covered by that primitive. The rendering surface is generally divided into multiple rectangular regions of pixels (or picture elements), referred to as “tiles” or “chunks.” Since the tiles generally do not overlap spatially, the rendering of the tiles can be partitioned among multiple rendering engines.
An example of the procedure described with respect to FIG. 4 will be described with reference to FIG. 5, which illustrates a graphical rendering surface divided into sixteen different sections or “tiles” (four rows and four columns). The graphical rendering surface may be stored in, for example, an image buffer or displayed on a display device. If a particular image processing task is to be performed on the graphical rendering surface shown in FIG. 5, the memory controller/graphics controller may divide the processing tasks into different portions based on the set of tiles intersected by the primitive elements of the task. For example, four of the sixteen tiles of the surface are assigned to a first rendering engine (labeled “RE0”) on a first memory module. Another four tiles of the surface are assigned to a different rendering engine (labeled “RE1”), and so on. This arrangement allows the four different rendering engines (RE0, RE1, RE2, and RE3) to process different regions of the surface simultaneously. This parallel processing significantly reduces the processing burden on the memory controller/graphics controller. In this example, a significant portion of the set of graphics rendering tasks is performed by the four rendering engines, which allows the memory controller/graphics controller to perform other tasks while these rendering tasks are being processed.
In an alternate embodiment, each of the rendering engines is assigned to process a particular set of rows or columns of the rendering surface, where these rows or columns represent a band of any size of pixels (at least one pixel wide). As shown in FIG. 5, the rendering surface is divided into two or more horizontal pixel bands (also referred to as pixel rows) and divided into two or more vertical pixel bands (also referred to as pixel columns). The specific example of FIG. 5 shows a rendering surface divided into four horizontal pixel bands and four vertical pixel bands. However, in alternate embodiments the rendering surface may be divided into any number of horizontal pixel bands and any number of vertical pixel bands. For example, the rendering surface may be divided into four sections or “quadrants” (i.e., two columns and two rows), in which each of the rendering engines is assigned to a particular quadrant of the rendering surface.
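One possible assignment, sketched below with assumed conventions (the patent does not prescribe a specific mapping): a 4x4 grid of tiles spread across four rendering engines so that each engine owns four tiles; a band or quadrant scheme is just a different mapping function.

    NUM_ENGINES = 4

    def engine_for_tile(row, col):
        # Diagonal interleave: each row rotates the engine order, so every
        # engine owns one tile per row and per column. One choice among many.
        return (row + col) % NUM_ENGINES

    for row in range(4):
        print(" ".join(f"RE{engine_for_tile(row, col)}" for col in range(4)))
    # Vertical-band assignment would instead be: engine = col % NUM_ENGINES
    # Quadrant assignment (2x2): engine = (row // 2) * 2 + (col // 2)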
The example discussed above with respect to FIG. 5 is related to graphics or image processing. However, the system described herein may be applied to any type of processing in which one or more processing functions can be divided into multiple tasks that can be performed in parallel by multiple computing engines. Other examples include floating point numerical processing, digital signal processing, block transfers of data stored in memory, compression of data, decompression of data, and cache control. For example, a task such as decompression of a compressed MPEG (Moving Picture Experts Group) video stream can begin by partitioning the video stream into multiple regions or “blocks.” Each block is decompressed individually and the results of all block decompressions are combined to form a complete video image. In this example, each block is associated with a particular rendering engine.
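Sketched in code (decompress_block below is a hypothetical stand-in; real MPEG decoding is far more involved), this is the same partition, process, and reassemble flow as FIG. 4:

    from concurrent.futures import ThreadPoolExecutor

    def decompress_block(block):
        # Hypothetical per-block decoder; a real one would decode MPEG
        # macroblocks. Blocks are independent, so they can proceed in
        # parallel, conceptually one block group per rendering engine.
        return bytes(reversed(block))

    def decompress_frame(blocks, num_engines=4):
        with ThreadPoolExecutor(max_workers=num_engines) as pool:
            decoded = list(pool.map(decompress_block, blocks))
        return b"".join(decoded)  # combine results into the complete image

    frame = decompress_frame([b"abc", b"def", b"ghi", b"jkl"])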
FIGS. 6 and 7 illustrate different memory modules containing a rendering engine and multiple memory devices. The multiple memory devices represent the shared memory 314 shown in FIG. 3. Although the memory modules shown in FIGS. 6 and 7 each contain eight memory devices, alternate memory modules may contain any number of memory devices coupled to a rendering engine.
Referring to FIG. 6, a memory module 600 includes eight memory devices 604 coupled to a rendering engine 602. The rendering engine 602 is also coupled to a module interconnect 606 which couples the memory module 600 to an associated interconnect on another module, motherboard, device, or other system. A memory interconnect 608 couples the memory devices 604 to the rendering engine 602. The memory interconnect 608 may allow parallel access to multiple memory devices 604, and may use either a multi-drop topology, point-to-point topology, or any combination of these topologies.
Referring to FIG. 7, a memory module 700 includes multiple memory devices 704 coupled to a rendering engine 702. The rendering engine 702 is also coupled to a pair of module interconnects 706 and 708, which couple the memory module 700 to another module, motherboard, device, or system. A memory interconnect 710 couples the memory devices 704 to the rendering engine 702 and may allow parallel access to multiple memory devices 704. In an alternate embodiment, module interconnects 706 and 708 are connected to form a contiguous interconnect through the module 700 connecting the rendering engine 702 to other devices in the system.
As discussed above, each memory module has its own memory interconnect for communicating data between the memory module’s rendering engine and the various memory devices. The memory interconnect on each memory module is relatively short and provides significant bandwidth for communicating with the multiple memory devices on the module. Further, the simultaneous processing and communication of data on each memory module reduces the volume or bandwidth of data communicated between the memory modules and the memory controller/graphics controller. The memory interconnects discussed herein may be a bus, a series of point-to-point links, or any combination of these types of interconnects.
FIG. 8 illustrates a memory module 800 containing two different rendering engines 802 and 810, and multiple memory devices 804 and 812 associated with each rendering engine. In this embodiment, two different memory interconnects 806 and 814 couple the memory devices to the rendering engines. The two rendering engines 802 and 810 are coupled to a common module interconnect 808. Thus, rendering engines 802 and 810 can process data and/or tasks simultaneously. In another embodiment, each rendering engine 802 and 810 is coupled to an independent module interconnect (not shown).
A particular embodiment of the memory modules shown in FIGS. 6, 7, and 8 uses multiple “Direct Rambus Channels” developed by Rambus Inc. of Mountain View, Calif. The Direct Rambus Channels connect the multiple memory devices to the associated rendering engine. In this embodiment, the memory devices are Rambus DRAMs (or RDRAMs®). However, other memory types and topologies may also be used within the scope of this invention. For example, memory types such as Synchronous DRAMs (SDRAMs), Double-Data-Rate (DDR) SDRAMs, Fast-Cycle (FC) SDRAMs, or derivatives of these memory types may also be used. These devices may be arranged in a single rank of devices or multiple ranks, where a rank is defined as a group of devices which respond to a particular class of commands including reads and writes.
In a particular embodiment of the architecture shown in FIG. 3, processing functions that are associated with typical graphics controller interface functions such as video input processing and display output processing are handled by the memory controller/graphics controller 310. Processing functions that are associated with high-bandwidth rendering or image buffer creation are handled by the rendering engines 304 on the memory modules.
Although particular examples are discussed herein as having one or two rendering engines, alternate implementations may include any number of rendering engines or computing engines on a particular module. Further, a particular module may contain any number of memory devices coupled to any number of rendering engines and/or computing engines.
Thus, a system has been described that provides multiple discrete memory modules coupled to a common memory controller. Each memory module includes a computing engine and a shared memory. A data processing task from the memory controller can be partitioned among the different computing engines to allow parallel processing of the various portions of the processing task.
Although the description above uses language that is specific to structural features and/or methodological acts, it is to be understood that the invention defined in the appended claims is not limited to the specific features or acts described. Rather, the specific features and acts are disclosed as exemplary forms of implementing the invention.
What is claimed is:

1. An apparatus comprising:
a memory controller configured to partition processing tasks among a plurality of computing engines; and
at least one module coupled to the memory controller, each module including:
at least one computing engine configured to receive information associated with at least one of the partitioned processing tasks from the memory controller; and
a shared memory coupled to the computing engine, wherein the shared memory includes storage for data which includes a portion of the main memory for a central processing unit (CPU).

2. An apparatus as recited in claim 1 further including an interconnect configured to couple a plurality of modules to the memory controller.

3. An apparatus as recited in claim 1 wherein the computing engine is a graphics rendering engine.
4. An apparatus as recited in claim 1 wherein the shared memory includes at least one memory device.

5. An apparatus as recited in claim 1 wherein the shared memory is configured to store main memory instructions and data in addition to graphical data.

6. An apparatus as recited in claim 1 wherein the computing engine is coupled between the memory controller and the shared memory.

7. An apparatus as recited in claim 1 wherein the memory controller is configured to receive video data.

8. An apparatus as recited in claim 1 wherein the memory controller includes a graphics controller.

9. An apparatus as recited in claim 1 wherein each module includes a plurality of computing engines.

10. An apparatus as recited in claim 9 wherein computing engines on the same module can communicate with one another.

11. An apparatus as recited in claim 1 wherein computing engines on different modules can communicate with one another.

12. An apparatus as recited in claim 1 wherein the information received by the computing engine from the memory controller includes instructions to be processed by the computing engine.

13. An apparatus as recited in claim 1 wherein the information received by the computing engine from the memory controller includes data to be processed by the computing engine, the data associated with at least one section in a graphical rendering surface.
14. An apparatus comprising:
a memory controller configured to partition processing tasks among a plurality of rendering engines;
a plurality of memory modules coupled to the memory controller, each module including:
at least one rendering engine configured to receive data to be processed from the memory controller; and
a shared memory coupled to the rendering engine, wherein the shared memory includes storage for a portion of the main memory for a central processing unit (CPU) coupled to the memory controller.

15. An apparatus as recited in claim 14 wherein each shared memory includes a plurality of memory devices.

16. An apparatus as recited in claim 14 wherein the memory controller is configured to receive video data.

17. An apparatus as recited in claim 14 wherein each partitioned processing task is associated with at least one section of a graphical rendering surface.

18. An apparatus as recited in claim 14 further including an interconnect configured to couple the plurality of modules to the memory controller.

19. An apparatus as recited in claim 14 wherein each module includes a plurality of rendering engines.

20. An apparatus as recited in claim 14 wherein rendering engines on different modules can communicate with one another.
21. A module comprising:
a rendering engine configured to receive information from a memory controller, the information associated with at least one processing task partitioned by the memory controller; and
a shared memory coupled to the rendering engine, wherein the shared memory is configured to store main memory data as well as graphical data, wherein the module is configured to be coupled to or decoupled from the memory controller.

22. A module as recited in claim 21 wherein the rendering engine and the shared memory are mounted on a common substrate.
23. A module as recited in claim 21 wherein the rendering engine is a graphics rendering engine.

24. A module as recited in claim 21 wherein the shared memory includes at least one memory device.

25. A module as recited in claim 21 further including an interconnect configured to couple the module to a plurality of other modules.

26. A module as recited in claim 21 wherein the rendering engine communicates with rendering engines on other modules.

27. A module as recited in claim 21 wherein the information received by the rendering engine from the memory controller includes instructions to be processed by the rendering engine.

28. A module as recited in claim 21 wherein the information received by the rendering engine from the memory controller includes data to be processed by the rendering engine, the data associated with at least one section in a graphical rendering surface.

29. A method comprising:
receiving a processing task at a memory controller;
partitioning, by the memory controller, the processing task into a plurality of portions; and
distributing, by the memory controller, each portion of the processing task t
