`
`(12)
`
`(cid:6)(cid:27)&(cid:11)(cid:11)(cid:12)(cid:19) (cid:11)(cid:14)(cid:11)(cid:20)(cid:24)(cid:12)(cid:6)
`EP 1 820 309 B1
`
`(11)
`
`EUROPEAN PATENT SPECIFICATION
(51) Int Cl.: H04L 12/56 (2006.01)
`
`(45) Date of publication and mention
`of the grant of the patent:
`27.08.2008 Bulletin 2008/35
`
`(21) Application number: 05850071.1
`
`(22) Date of filing: 30.11.2005
`
`(54) STREAMING MEMORY CONTROLLER
STREAMING-SPEICHERSTEUERUNG
`CONTROLEUR DE MEMOIRE EN CONTINU
`
`(84) Designated Contracting States:
`AT BE BG CH CY CZ DE DK EE ES FI FR GB GR
`HU IE IS IT LI LT LU LV MC NL PL PT RO SE SI
`SK TR
`Designated Extension States:
`AL BA HR MK YU
`
`(30) Priority: 03.12.2004 EP 04106274
`
`(43) Date of publication of application:
`22.08.2007 Bulletin 2007/34
(73) Proprietor: Koninklijke Philips Electronics N.V.
5621 BA Eindhoven (NL)
`
`(72) Inventors:
`• BURCHARD, Artur
NL-5656 AA Eindhoven (NL)
`
`(86) International application number:
`PCT/IB2005/053970
`
`(87) International publication number:
WO 2006/059283 (08.06.2006 Gazette 2006/23)
`
• HEKSTRA-NOWACKA, Ewa
NL-5656 AA Eindhoven (NL)
• HARMSZE, Francoise, J.
NL-5656 AA Eindhoven (NL)
• VAN DEN HAMER, Peter
NL-5656 AA Eindhoven (NL)
`
`(74) Representative: van der Veer, Johannis Leendert
`et al
`NXP Semiconductors B.V.
`IP&L Department
`High Tech Campus 32
5656 AE Eindhoven (NL)
`
`(56) References cited:
US-A- 5 751 951
US-B1- 6 405 256
US-A1- 2002 034 162
`
`Note: Within nine months of the publication of the mention of the grant of the European patent in the European Patent
`Bulletin, any person may give notice to the European Patent Office of opposition to that patent, in accordance with the
`Implementing Regulations. Notice of opposition shall not be deemed to have been filed until the opposition fee has been
`paid. (Art. 99(1) European Patent Convention).
`
`Printed by Jouve, 75001 PARIS (FR)
`
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2172, p. 1
`
`
`
`
`Description
[0001] The present invention relates to a memory controller and a method for coupling a network and a memory.
[0002] The complexity of advanced mobile and portable devices is increasing. The ever more demanding applications of such devices, together with their complexity, flexibility, and programmability requirements, intensify the data exchange inside the devices. The devices implementing such applications often consist of several functions or processing blocks, here called subsystems. These subsystems are typically implemented as separate ICs, each having a different internal architecture that consists of local processors, busses, memories, etc. Alternatively, various subsystems may be integrated on a single IC. At system level, these subsystems communicate with each other via a top-level interconnect that provides certain services, often with real-time support. Examples of subsystems in a mobile phone architecture are, among others, a base-band processor, a display, a media processor, or a storage element. For support of multimedia applications, these subsystems exchange most of the data in a streamed manner. As an example of data streaming, reference is made to the readout of an MP3-encoded audio file from local storage by a media processor and the sending of the decoded stream to the speakers. Fig. 1 shows a basic representation of such a communication, which can be described as a graph of processes P1-P4 connected via FIFO buffers B. Such a representation is often referred to as a Kahn process network. The Kahn process network can be mapped onto the system architecture, as described in E.A. de Kock et al., "YAPI: Application modeling for signal processing systems", in Proc. of the 37th Design Automation Conference, Los Angeles, CA, June 2000, pages 402-405, IEEE, 2000. In such an architecture the processes are mapped onto the subsystems, the FIFO buffers onto the memories SMEM, and the communications onto the system-level interconnect IM.
[0003] Buffering is essential for proper support of data streaming between the involved processes. Typically, FIFO buffers are used for streaming, which is in accordance with (bounded) Kahn process network models of streaming applications. With an increasing number of multimedia applications that can run simultaneously, the number of processes and real-time streams, as well as the number of associated FIFOs, increases substantially.
[0004] There exist two extreme implementations of streaming with respect to memory usage and FIFO allocation. The first uses physically distributed memory, where the FIFO buffers are allocated in a local memory of a subsystem. The second uses physically and logically unified memory, where all FIFO buffers are allocated in a shared, often off-chip, memory. A combination thereof is also possible.
[0005] The FIFO buffers can be implemented in a shared memory using an external DRAM memory technology. SDRAM and DDR-SDRAM are the technologies that deliver large-capacity external memory at low cost, with a very attractive cost to silicon area ratio.
[0006] Fig. 2 shows a basic architecture of a system on chip with a shared memory streaming framework. The processing units C, S communicate with each other via the buffer B. The processing units C, S as well as the buffer are each associated with an interface unit IU for coupling them to an interconnect means IM. In the case of a shared memory data exchange, the memory can also be used for other purposes. The memory can, for example, also be used for code execution or dynamic memory allocation for the processes of a program running on a main processor.
[0007] Such a communication architecture or network, including the interconnect means, the interface units, as well as the processing units C, S and the buffer B, may provide specific transport facilities and a respective infrastructure giving certain data transport guarantees, such as a guaranteed throughput, a guaranteed delivery for an error-free transport of data, or a synchronization service for synchronizing source and destination elements such that no data is lost due to underflow or overflow of buffers. This becomes important if real-time streaming processing is to be performed by the system and real-time support is required for all of the components.
[0008] Within many systems-on-chip (SoC) and microprocessor systems, background memory (DRAM) is used for buffering of data. When the data is communicated in a streaming manner, and buffered as a stream in the memory, pre-fetch buffering can be used. This means that the data from the SDRAM is read beforehand and kept in a special (pre-fetch) buffer. When a read request arrives, it can be served from the local pre-fetch buffer, usually implemented in on-chip SRAM, without the latency otherwise introduced by the background memory (DRAM). This is similar to known caching techniques of random data for processors. For streaming, a contiguous (or, more precisely, a predictable) addressing of data is used in a pre-fetch buffer, rather than the random addresses used in a cache. For more details, please refer to J. L. Hennessy and D. A. Patterson, "Computer Architecture -- A Quantitative Approach".
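The pre-fetch principle described above can be illustrated with a toy sketch; this is a minimal model for illustration only, not the controller's implementation, and the list-based DRAM and the burst length are assumed values:

```python
# Toy sketch of pre-fetch buffering: sequential reads are served from a
# small on-chip buffer that is refilled from the (modelled) DRAM one
# burst at a time. BURST and the list-based DRAM are assumptions.
BURST = 8  # words fetched from DRAM per refill (illustrative)

class PrefetchBuffer:
    def __init__(self, dram):
        self.dram = dram       # background memory, modelled as a list
        self.buf = []          # pre-fetched words (the on-chip SRAM)
        self.next_addr = 0     # next DRAM address to pre-fetch from

    def read(self):
        if not self.buf:       # buffer empty: fetch the next burst
            self.buf = self.dram[self.next_addr:self.next_addr + BURST]
            self.next_addr += BURST
        return self.buf.pop(0) # serve the request from the local buffer

pf = PrefetchBuffer(list(range(32)))
print([pf.read() for _ in range(10)])  # sequential words 0..9
```

Because the addressing is predictable, the buffer can always be refilled with the next contiguous burst, which is exactly what distinguishes this scheme from a random-address cache.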
[0009] On the other hand, due to the DRAM technology, it is better to access (read or write) the DRAM in bursts. Therefore, often a write-back buffer is implemented, which gathers many single data accesses into a burst of accesses of a certain size. Once the initial processing is done for the first DRAM access, every next data word, with an address in a certain relation to the previous one (e.g. next or previous, depending on the burst policy), accessed in every next cycle of the memory, can be stored without any further delay (within 1 cycle), for a specified number of accesses (2/4/8/full page). Therefore, for streaming accesses to memory, where addresses are increased or decreased in the same way for every access (e.g. contiguous addressing), burst access provides the best performance at the lowest power dissipation.
For more information regarding the principles of a DRAM memory, please refer to Micron's 128-Mbit DDRRAM specifications, http://download.micron.com/pdf/datasheets/dram/ddr/128MbDDRx4x8x16.pdf, which is incorporated by reference.
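The write-back gathering described in [0009] can be sketched as follows; the burst length and the list that stands in for the burst writes to DRAM are illustrative assumptions, not details from the specification:

```python
# Toy sketch of a write-back buffer: single-word writes are gathered and
# issued to the DRAM as one burst once BURST words have accumulated.
# BURST and the bursts list are assumptions for illustration only.
BURST = 8

class WriteBackBuffer:
    def __init__(self):
        self.pending = []        # single writes gathered so far
        self.bursts = []         # stands in for burst writes to DRAM

    def write(self, word):
        self.pending.append(word)
        if len(self.pending) == BURST:   # a full burst is ready
            self.bursts.append(self.pending)
            self.pending = []

wb = WriteBackBuffer()
for w in range(20):
    wb.write(w)
print(len(wb.bursts), len(wb.pending))  # 2 full bursts issued, 4 pending
```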
[0010] Until now, controllers of external DRAM were designed to work in bus-based architectures. Buses provide limited services for data transport, simple medium access control, and best-effort data transport only. In such architectures, the unit that gets access to the bus automatically gets access to the shared memory. Moreover, the memory controllers used in such systems are no more than access blocks optimized to perform numerous low-latency reads or writes, often tweaked for processor random cache-like burst accesses. As a side effect of the low-latency, high-bandwidth, and high-speed optimizations of the controllers, the power dissipation of the external DRAM is relatively high.
[0011] The above-mentioned network services are, however, only applicable within the network. As soon as a data exchange occurs with any component outside the network, the network service guarantees are no longer met. Within a shared memory architecture, data which is to be buffered will typically be exchanged via the physically unified memory, such that data needs to be transported to and from the memory, whereby the data transport breaks the services provided by the network, as neither a memory controller nor the memory itself supports any of the network services.
[0012] US 5,751,951 discloses a memory controller for coupling a memory to a network. The memory controller comprises a first interface for connecting the memory controller to the network. The memory controller furthermore comprises a streaming memory unit having a buffer for temporarily storing at least part of the data streams and a buffer managing unit for managing the temporary storing of data streams in the buffer. The memory controller furthermore comprises an interface for exchanging data with the memory in bursts.
[0013] It is an object of the invention to provide a memory controller for coupling a network and a memory, as well as a method for coupling a network and a memory, which together with the memory improve the predictable behavior of the communication between the network and the memory.
[0014] This object is solved by a memory controller according to claim 1 and by a method for coupling a network and a memory according to claim 3.
[0015] A memory controller is provided for coupling a memory to a network. The memory controller comprises a first interface, a streaming memory unit, and a second interface. The first interface is used for connecting the memory controller to the network for receiving and transmitting data streams. The streaming memory unit is coupled to the first interface for controlling the data streams between the network and the memory. The streaming memory unit comprises a buffer for temporarily storing at least part of the data streams and a buffer managing unit for managing the temporary storing of the data streams in the buffer. The second interface is coupled to the streaming memory unit for connecting the memory controller to the memory in order to exchange data with the memory in bursts. The streaming memory unit is provided to implement network services of the network onto the memory.
[0016] Accordingly, with such a memory controller, a memory which does not implement the network services as provided by a network can be integrated with a communication network supporting specific network services. In other words, the same services will be applicable to the data being communicated within a network or to data which is exchanged with the memory sub-system.
[0017] According to an aspect of the invention, the first interface is implemented as a PCI-Express interface such that the properties and network services of a PCI-Express network can be implemented by the memory controller.
[0018] According to a further aspect of the invention, the memory is at least partly organized as FIFOs, and a stream identifier is associated with every data stream from the network. The streaming memory unit is provided to control the data streams from/to the network by directing a particular data stream to a particular FIFO in the memory according to the stream identifier of the data stream. Furthermore, an arbitration is performed between the different data streams for accessing the memory. The second interface is arranged to exchange a relatively coarse-grain stream of data with the memory and a relatively fine-grain stream of data with the network. As the stream identifier of a data stream is used to map the data stream onto a FIFO in the memory, a simple addressing scheme is realized.
[0019] According to a further aspect of the invention, the network is implemented as a PCI-Express network and a PCI-Express ID is used in the network for addressing purposes. The first interface is then implemented as a PCI-Express interface. The streaming memory unit converts a PCI-Express ID into a FIFO memory address as well as a FIFO memory address into a PCI-Express ID. Accordingly, the PCI-Express device addressing scheme is used to address the FIFO buffers within the memory.
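The ID-to-address conversion can be sketched as follows; the FIFO size, the base address, and the helper names are hypothetical, since the patent does not fix a memory map, only that the stream identifier selects the FIFO:

```python
# Hypothetical sketch of mapping a stream identifier onto a FIFO region
# in the shared memory. FIFO_SIZE and FIFO_AREA_BASE are assumed values.
FIFO_SIZE = 64 * 1024        # bytes reserved per stream FIFO (assumed)
FIFO_AREA_BASE = 0x0         # start of the FIFO area in DRAM (assumed)

def fifo_base(stream_id):
    """Stream ID -> base address of that stream's FIFO."""
    return FIFO_AREA_BASE + stream_id * FIFO_SIZE

def fifo_address(stream_id, offset):
    """Wrap an access offset inside the stream's circular FIFO."""
    return fifo_base(stream_id) + (offset % FIFO_SIZE)

print(hex(fifo_address(3, 70000)))
```

With such a layout the controller never needs an explicit memory address from the network: the stream identifier and an internal access pointer are sufficient.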
[0020] The invention also relates to a method for coupling a memory to a network. Data streams are received and transmitted via a first interface (PI) for connecting a memory controller to the network. The data streams between the network and the memory are controlled by a streaming memory unit (SMU). At least part of the data streams is temporarily stored in a buffer, and the temporary storing of the data streams in the buffer is managed. The streaming memory controller is coupled to the memory via a second interface, and data is exchanged with the memory in bursts. Network services of the network are implemented onto the memory.
[0021] The invention relates to the idea of introducing a streaming memory controller associated with a shared memory. The streaming memory controller is able to provide the same services as the network. Such services may be flow control, virtual channels, and memory bandwidth arbitration tuned to the network bandwidth arbitration. Services guaranteed by the network will then also be guaranteed by the memory controller if data leaves the network in order to be buffered in the memory. The integrity of the network services will thus be preserved from the source of the data to its destination.
[0022] Other aspects of the invention are subject to the dependent claims.
[0023] These and other aspects of the invention are apparent from and will be elucidated with reference to the embodiments described hereinafter and with respect to the following figures.
`
`Fig. 1 shows a basic representation of a Kahn process network and mapping of it onto a shared memory architecture;
`Fig. 2 shows a basic architecture of a system on chip with a shared memory streaming framework;
`Fig. 3 shows a block diagram of a system on chip according to the first embodiment;
`Fig. 4 shows the logical architecture of a SDRAM for the state when the memory clock is enabled;
`Fig. 5 shows a block diagram of a streaming memory controller SMC according to a second embodiment;
`Fig. 6 shows a block diagram of a logical view of the streaming memory controller SMC;
`Fig. 7 shows a block diagram of an architecture of a system on chip according to a third embodiment;
Fig. 8 shows a format of an ID within a PCI-Express network;
`Fig. 9 shows a configuration within a PCI-(cid:3)Express system;
`Fig. 10 shows a block diagram of a system on chip according to the fourth embodiment;
`Fig. 11 shows an example of the memory allocation within the memory of Fig. 10; and
Fig. 12 shows the power dissipation of an external DDR-SDRAM versus the burst size of the access and the worst-case delay versus the buffer size in network packets.
[0024] Fig. 3 shows a block diagram of a system on chip according to the first embodiment. A consumer C and a producer P are coupled to a PCI-Express network PCIE. The communication between the producer and consumer P, C is performed via the network PCIE and a streaming memory controller SMC to an (external) memory MEM. The (external) memory MEM can be implemented as a DRAM or an SDRAM. As the communication between the producer P and the consumer C is a stream-based communication, FIFO buffers are provided in the external memory MEM for this communication.
[0025] The streaming memory controller SMC according to Fig. 3 has two interfaces: one towards the PCI Express fabric, and a second towards the DRAM memory MEM. The PCI Express interface of the streaming memory controller SMC must perform traffic shaping on the data retrieved from the SDRAM memory MEM to comply with the traffic rules of the PCI Express network PCIE. On the other interface of the streaming memory controller SMC, the access to the DRAM is performed in bursts, since this mode of accessing data stored in DRAM has the biggest advantage with respect to power consumption. The streaming memory controller SMC itself must provide intelligent arbitration of the access to the DRAM among the different streams such that throughput and latency of access are guaranteed. Additionally, the SMC also provides functionality for smart FIFO buffer management.
[0026] The basic concept of a PCI-Express network is described in "PCI Express Base Specification, Revision 1.0", PCI-SIG, July 2002, www.pcisig.org.
[0027] The features of a PCI Express network which are taken into consideration in the design of the streaming memory controller are: isochronous data transport support, flow control, and a specific addressing scheme. The isochronous support is primarily based on the segregation of isochronous and non-isochronous traffic by means of Virtual Channels VCs. Consequently, network resources like bandwidth and buffers are explicitly reserved in the switch fabric for specific streams, such that it is guaranteed that streams in different virtual channels VC do not interfere with each other. Additionally, the isochronous traffic in the switch fabric is regulated by scheduling, namely admission control and a service discipline.
[0028] The flow control is performed on a credit basis to guarantee that no data is lost in the network PCIE due to buffer under/overflows. Each network node is only allowed to transmit a network packet through a network link to the other network node when the receiving node has enough space to receive the data. Every virtual channel VC comprises a dedicated flow control infrastructure. Therefore, a synchronization between the source and the destination can be realized, through chained PCI Express flow control, separately for every virtual channel VC.
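The credit mechanism can be illustrated with a toy model; the two-slot receiver buffer and the class names are illustrative assumptions, not details of the PCI Express protocol:

```python
# Toy model of credit-based link flow control: a sender may only
# transmit while the receiver has advertised free buffer space, so no
# packet is ever dropped due to overflow. Receiver capacity is assumed.
class Link:
    def __init__(self, receiver_slots):
        self.credits = receiver_slots  # free slots advertised by receiver
        self.rx_buffer = []

    def send(self, packet):
        if self.credits == 0:
            return False               # sender must stall; nothing is lost
        self.credits -= 1
        self.rx_buffer.append(packet)
        return True

    def consume(self):
        self.rx_buffer.pop(0)
        self.credits += 1              # credit flows back to the sender

link = Link(receiver_slots=2)
print(link.send("p0"), link.send("p1"), link.send("p2"))  # True True False
link.consume()                         # receiver drains one packet
print(link.send("p2"))                 # True: a credit was returned
```

Because every virtual channel has its own credit loop, one stalled stream does not block the others, which is what the streaming memory controller later exploits for per-stream stalling.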
[0029] The PCI Express addressing scheme typically uses 32- or 64-bit memory addresses. As no explicit memory addresses are to be used, device and function IDs, i.e. stream IDs, are used to differentiate between the different streams. The memory controller SMC itself will generate/convert the stream IDs into the actual memory addresses.
[0030] In order to simplify the addressing scheme even further, the ID of the virtual channel VC is used as a stream identifier. Since PCI Express allows up to eight virtual channels VCs, half of them can be used for identifying incoming streams and the other half for identifying outgoing streams from the external memory. Therefore, the maximum number of streams that can access the memory through the memory controller SMC is limited to eight. Please note that this limitation is due to PCI Express, which allows for arbitration between streams in different VCs, but not between those inside the same virtual channel VC. However, this limitation is specific to PCI Express based systems; it is not fundamental to the concepts of the present invention.
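The VC-based stream identification can be sketched as follows; assigning the lower four VCs to incoming streams and the upper four to outgoing streams is an assumption made here, since the text only states that the eight channels are split into halves:

```python
# Sketch of using the VC number as a stream identifier: the eight
# virtual channels are split into halves for incoming and outgoing
# streams. The concrete VC numbering (low = incoming) is assumed.
def classify(vc):
    assert 0 <= vc <= 7, "PCI Express allows at most eight VCs"
    direction = "incoming" if vc < 4 else "outgoing"
    return direction, vc % 4           # direction and stream index 0..3

print(classify(1))
print(classify(6))
```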
[0031] Summarizing, the PCI Express interface of the memory controller SMC consists of a full PCI Express interface, equipped additionally with some logic necessary for address translation and stream identification.
`
`5
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`4
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2172, p. 4
`
`
`
[0032] In the first embodiment a (DDR) SDRAM memory is used. As an example one can refer to Micron's 128-Mbit DDR-SDRAM as described in Micron's 128-Mbit DDRRAM specifications, http://download.micron.com/pdf/datasheets/dram/ddr/128MbDDRx4x8x16.pdf. Such a technology is preferable since it provides desirable power consumption and timing behavior. However, the design is parameterized, and the memory controller SMC can be configured to work also with single data rate memory. Since the DDR-SDRAM behaves similarly to SDRAM, except for the timing of the data lines, we explain the basics using SDRAM concepts.
[0033] The PCI Express network PCIE provides network services, e.g. guaranteed real-time data transport, through exclusive resource/bandwidth reservation in the devices that are traversed by the real-time streams. When an external DRAM supported by a standard controller is connected to the PCI Express fabric without an intelligent memory controller in between, the bandwidth and delay guarantees typically provided by the PCI Express will not be fulfilled by the memory, since it does not give any guarantees and acts as a "slave" towards the incoming traffic.
[0034] The design of a standard memory controller focuses on delivering the highest possible bandwidth at the lowest possible latency. Such an approach is suited for processor data and instruction (cache) accesses and not for isochronous traffic. To be able to provide the predictable behavior of the PCI Express network extended with the external DRAM, a streaming memory controller is needed which guarantees a predictable behavior of the external memory for streaming. In addition, we aim to design the memory controller not only for guaranteeing throughput and latency, but also for reducing the power consumption while accessing this DRAM.
[0035] Fig. 4 shows the logical architecture of an SDRAM for the state when the memory clock is enabled, i.e. the memory is in one of the power-up modes. The SDRAM comprises a logic unit L, a memory array AR, and data rows DR. When the clock is disabled, the memory is in a low-power state (power-down mode).
[0036] Typical commands applied to a memory are activate ACT, pre-charge PRE, read/write RD/WR, and refresh. The activate command takes care that a bank and row address are selected and that the data row (often referred to as a page) is transferred to the sense amplifiers. The data remains in the sense amplifiers until the pre-charge command restores the data to the appropriate cells in the array. When data is available in the sense amplifiers SAM, the memory is said to be in the active state. During this state, reads and writes can take place. After the pre-charge command, the memory is said to be in the pre-charge state, where all data is stored in the cell array. Another interesting aspect of memory operation is the refresh. The memory cells of the SDRAM store data using small capacitors, and these must be recharged regularly to guarantee the integrity of the data. When powered up, the SDRAM memory is instructed by the controller to perform the refresh. When powered down, the SDRAM is in self-refresh mode (i.e. no clock is enabled), and the memory performs the refresh on its own. This state consumes very little power. Getting the memory out of the self-refresh mode to the state in which data can be asserted for read or write takes more time than for other modes (e.g. 200 clock cycles, specifically for DDR-SDRAM).
[0037] The timing and power management of the memory is important for a proper design of the memory controller SMC, which must provide specific bandwidth, latency, and power guarantees. Reading a full page (equal to 1 Kbyte) from an activated SDRAM may take about 2560 clock cycles (~19.2 µs) for a burst length of 1 read, 768 clock cycles (~5.8 µs) for a burst length of 8 reads, and only 516 clock cycles (~3.9 µs) for a full-page burst. These values are based on the specific 128-Mbit DDR-SDRAM with a clock period of 7.5 ns as described in Micron's 128-Mbit DDRRAM specifications, http://download.micron.com/pdf/datasheets/dram/ddr/128MbDDRx4x8x16.pdf.
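The quoted microsecond values follow directly from the cycle counts and the 7.5 ns clock period; a quick cross-check (the quoted figures round 5.76 µs to 5.8 µs and 3.87 µs to 3.9 µs):

```python
# Cross-check of the access times quoted above: cycles * 7.5 ns should
# reproduce the microsecond figures to one decimal place.
CLOCK_NS = 7.5  # clock period of the 128-Mbit DDR-SDRAM

for burst, cycles in [("1", 2560), ("8", 768), ("full page", 516)]:
    us = cycles * CLOCK_NS / 1000.0
    print(f"burst length {burst}: {cycles} cycles = {us:.2f} us")
```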
[0038] Fig. 5 shows a block diagram of a streaming memory controller SMC according to a second embodiment. The streaming memory controller SMC comprises a PCI-Express interface PI, a streaming memory unit SMU, and a further interface MI which serves as the interface to an (external) SDRAM memory. The streaming memory unit SMU comprises a buffer manager unit BMU, a buffer B, which may be implemented as an SRAM memory, as well as an arbiter ARB. The streaming memory unit SMU, which implements the buffering in SRAM, is used together with the buffer manager for buffering accesses to the SDRAM via the PCI-Express interface. The buffer manager unit BMU serves to react to read or write accesses to the SDRAM from the PCI-Express interface, to manage the buffers (update the pointer registers), and to relay data from/to the buffers (SRAM) and from/to the SDRAM. In particular, the buffer manager unit BMU may comprise a FIFO manager and a stream access unit SAU.
[0039] The stream access unit SAU provides a stream ID, an access type, and the actual data for each stream. For each packet received from the PCI Express interface, based on its virtual channel number VC0 - VC7, the stream access unit SAU forwards the data to an appropriate input buffer, implemented in the local shared SRAM memory. For data retrieved from the (DDR-)SDRAM's FIFOs and placed in the output buffer B in the local SRAM, it generates the destination address and passes the data to the PCI Express interface PI. The arbiter ARB decides which stream can access the (DDR-)SDRAM. The SRAM memory implements the input/output buffering, i.e. for pre-fetching and write-back purposes. The FIFO manager, which is at the heart of the SMC, implements the FIFO functionality for the memory through address generation for the streams, access pointer updates, and additional controls.
[0040] Fig. 6 shows a block diagram of a logical view of the streaming memory controller SMC. Each of the streams ST1-ST4 is associated with a separate buffer. As only one stream at a time can access the external DRAM, an arbiter ARB is provided which performs the arbitration in combination with a multiplexer MUX.
`
`5
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`5
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2172, p. 5
`
`
`
`5
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`EP 1 820 309 B1
[0041] The arbitration of the memory access between the different real-time streams is essential for guaranteeing throughput and bounded access delay. Assume that whenever data is written to or read from the memory, a full page is either written or read, i.e. the access is performed in bursts. The time needed to access one page (slightly different for read and write operations) can be referred to as a time slot. A service cycle is defined as consisting of a fixed number of time slots. The access sequence repeats and resets as every new service cycle is started.
[0042] The arbitration algorithm between the streams according to the second embodiment is credit-based. Each stream gets a number of credits (time slots) reserved, the same for every service cycle. The number of credits reflects the bandwidth requirements of the stream. Each time an access is granted to a stream, the number of credits available for the granted stream decreases. The credit count per stream is updated every time the arbitration occurs. Furthermore, the credits are reset at the end of the service cycle to guarantee the periodicity of the arbitration process. The credit counts can also be only refreshed (e.g. all decreased by the lowest value of all counts) to provide the arbitration with a memory of previous service cycles, in case adaptive arbitration over a longer time is needed. In the extreme case, a single, infinitely long service cycle can be used.
[0043] When multiple streams want to access the memory in the same time slot, the credit count is used as the arbitration criterion. The stream that has used the least of its credits (relatively, measured as the ratio between used and reserved credits per current service cycle) gets the access. The denied request is buffered and scheduled (or arbitrated with another incoming request) for the next time slot. In case the credit ratios are the same for two requesting streams, the one that requires the lower access latency gets the access first (e.g. a read over a write).
[0044] In this way, every stream (if requesting) gets, in the worst case, the reserved number of accesses to the memory per service cycle, regardless of the order of the incoming requests or the behavior of the other streams. This guarantees that the bandwidth requirement of every stream is met.
[0045] Now an example of the credit-based arbitration algorithm is described in more detail. A time slot is defined as equal to a page (1 KB) access to the SDRAM memory MEM which, as calculated before, is equal to 3.9 µs. Moreover, it is assumed that the service cycle has 60 time slots, so it is equal to 234 µs. Therefore, there will be 4273 service cycles per second, which results in a total memory bandwidth of about 2 Gbit/s (4273*60*1KB). It is assumed that 3 streams with bandwidth requirements of 350 Mbit/s, 700 Mbit/s, and 1050 Mbit/s, respectively, are provided. Therefore, the reserved credit count per service cycle of the first stream ST1 will be 350/2100 times 60 slots, which equals 10 slots. Streams 2 and 3 ST2, ST3 will have 20 and 30 reserved credits, respectively. Table 1 shows the stream schedule (row Sdl) that results from the arbitration. It also shows the credit (bandwidth) utilization levels that determine the arbitration result (rows CS1, CS2, CS3, measured as the ratio between used and reserved credits per current service cycle) per each time slot (row Slot).

Table 1. Example of the Credit Based Arbitration

Slot  1    2     3     4     5     6    7    8     9     10    11
CS1   0.1  0.1   0.1   0.1   0.1   0.1  0.2  0.2   0.2   0.2   0.2
CS2   0    0.05  0.05  0.05  0.1   0.1  0.1  0.15  0.15  0.15  0.2
CS3   0    0     0.03  0.06  0.06  0.1  0.1  0.1   0.13  0.16  0.16
Sdl   S1   S2    S3    S3    S2    S3   S1   S2    S3    S3    S2

[0046] While the reserved bandwidth is always guaranteed for each stream, the reserved but unused slots can be reused by other streams if necessary. This also enables a flexible allocation of the bandwidth. While keeping all guarantees, it enables flexible handling of the unavoidable fluctuations in the network.
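The credit-based arbitration of the example can be simulated with a short sketch. The tie-break on equal credit ratios (lowest stream index wins) is an assumption made here so that the run reproduces the schedule of the example; the text itself breaks ties by access latency (e.g. a read over a write):

```python
# Sketch of the credit-based arbitration of [0042]-[0045]. Reserved
# credits 10/20/30 correspond to the 350/700/1050 Mbit/s streams in a
# 60-slot service cycle. Tie-break by lowest stream index is an assumed
# policy, not taken from the patent.
from fractions import Fraction

def arbitrate(used, reserved):
    """Grant the slot to the stream with the lowest used/reserved ratio."""
    return min(range(len(used)), key=lambda s: Fraction(used[s], reserved[s]))

def schedule(reserved, n_slots):
    used = [0] * len(reserved)
    grants = []
    for _ in range(n_slots):
        g = arbitrate(used, reserved)
        used[g] += 1           # the granted stream consumes one credit
        grants.append(g + 1)   # report streams as S1, S2, S3
    return grants

print(schedule([10, 20, 30], 11))  # first 11 slots of the service cycle
```

Over a full 60-slot service cycle this policy grants each stream exactly its reserved number of slots (10, 20, and 30), which is the bandwidth guarantee stated in [0044].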
[0047] Furthermore, sufficient buffering of the incoming requests must be provided to ensure that the above scheme works. A mechanism for stalling the requesting streams in case other streams are granted the access is also required. The stalling mechanism may be implemented using the PCI Express flow control, which enables the delaying of any stream, separately per virtual channel VC. The minimal buffering required can therefore be equal to the size of the data accessed from the memory during one time slot, i.e. one page. Increasing the access buffering beyond this is therefore not needed; however, it will decrease the access latency, as such buffers then behave as pre-fetch or write-back buffers.
[0048] The mentioned over-dimensioning of the I/O buffers relaxes the arbitration. The proposed arbitration algorithm is fully parameterized. Most aspects of the arbitration can be programmed. For example, the particular arbitration strategy can be chosen at configuration time, the granularity of memory access (a time slot) can be changed from a page to a burst of another length, and finally the number of tim