`Ebeling
`
(10) Patent No.: US 7,254,691 B1
(45) Date of Patent: Aug. 7, 2007
`
`(54) QUEUING AND ALIGNING DATA
`
`(75) Inventor: Christopher D. Ebeling, San Jose, CA
`(US)
`
(73) Assignee: Xilinx, Inc., San Jose, CA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is
extended or adjusted under 35 U.S.C. 154(b) by 335 days.

(21) Appl. No.: 11/072,106
(22) Filed: Mar. 4, 2005

(51) Int. Cl.
G06F 12/00 (2006.01)
(52) U.S. Cl. ....................... 711/202; 711/103; 711/104;
711/154; 711/158; 341/100; 341/101; 365/194
(58) Field of Classification Search ................ 711/202,
711/103, 104, 154, 158; 341/100, 101; 365/194
See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS

5,574,849 A 1/1996 SN, al. ... 714/12
5,867,501 A 2/1999 Horst et al. ... 370/474
5,914,953 A * 6/1999 Krause et al. ... 370/392
6,157,967 A * 12/2000 Horst et al. ... 710/19
6,181,609 B1 * 1/2001 Muraoka ... 365/189.05
6,237,122 B1 * 5/2001 Maki ... 714/730
6,594,752 B1 * 7/2003 Baxter ... 712/43
6,961,842 B2 * 11/2005 Baxter ... 712/43
6,985,096 B1 * 1/2006 Sasaki et al. ... 341/100
7,187,200 B2 * 3/2007 Young ... 326/41

OTHER PUBLICATIONS

U.S. Appl. No. 10/919,900, filed Aug. 17, 2004, Sasaki.
U.S. Appl. No. 10/683,944, filed Oct. 10, 2003, Young.

* cited by examiner

Primary Examiner Stephen C. Elmore
(74) Attorney, Agent, or Firm W. Eric Webostad

(57) ABSTRACT

Queuing and ordering data is described. Data is stored or
queued in concatenated memories where each of the memories
has a respective set of data out ports. An aligner having
multiplexers arranged in a lane sequence is coupled to each
set of the data out ports. A virtual-to-physical address
translator is configured to translate a virtual address to
provide physical addresses and select signals, where the
physical addresses are locations of at least a portion of data
words of a cell stored in the concatenated memories in
successive order. The multiplexers are coupled to receive the
select signals as control select signaling to align the at least
one data word obtained from each of the concatenated
memories for lane aligned output from the aligner.

20 Claims, 9 Drawing Sheets
`
`Emerson Exhibit 1042
`Emerson Electric v. Ollnova
`IPR2023-00626
`Page 00001
`
`
`
[Sheet 1 of 9, FIG. 1A: block diagram of the queuing and alignment
system. Legible labels: SERDES; parallel data streams 113P;
pre-processor 106; aligner 102 (input “Word from Memory #1”,
output “Aligned Word 1”); addresses 120; selects 121;
translator/controller.]
`
`
`
[Sheet 2 of 9, FIG. 1B: memory array 170 formed by BRAM 0 through
BRAM 3, with rows labeled PA 1 through PA 1024 and stored words
labeled WORD 0 through WORD 3; reference numerals 171 and 172.]
`
`
`
[Sheet 3 of 9, FIG. 1C: virtual memory space (VMS) 195 mapped onto
physical addresses.]

PA N-1 175:
VIRTUAL ADDRESS=(N-1,1,1) 190-1
VIRTUAL ADDRESS=(N-1,1,0) 190-2
VIRTUAL ADDRESS=(N-1,0,1) 190-3
VIRTUAL ADDRESS=(N-1,0,0) 190-4

PA N 176:
VIRTUAL ADDRESS=(N,1,1) 190-5
VIRTUAL ADDRESS=(N,1,0) 190-6
VIRTUAL ADDRESS=(N,0,1) 190-7
VIRTUAL ADDRESS=(N,0,0) 190-8

PA N+1 177:
VIRTUAL ADDRESS=(N+1,1,1) 190-9
VIRTUAL ADDRESS=(N+1,1,0) 190-10
VIRTUAL ADDRESS=(N+1,0,1) 190-11
VIRTUAL ADDRESS=(N+1,0,0) 190-12

PA N+2 178:
VIRTUAL ADDRESS=(N+2,1,1) 190-13
VIRTUAL ADDRESS=(N+2,1,0) 190-14
VIRTUAL ADDRESS=(N+2,0,1) 190-15
VIRTUAL ADDRESS=(N+2,0,0) 190-16
`
`
`
[Sheet 4 of 9, FIG. 1D: record array 180, with entries labeled
PA 0 through PA 4096 and a record labeled REC 4096.]
`
`
`
[Sheet 5 of 9, FIG. 2: translator/controller 101. Inputs: virtual
address 112, load 110, increment 111. Address generators starting
at 201-1 output “Address for BRAM 0” through “Address for BRAM M-1”
as addresses 120. Alignment select generators starting at 202-1
output selects 121.]
`
`
`
[Sheet 6 of 9, FIG. 3: flow diagram of address generator operation.
Legible blocks:
301 OPERATION IS INCREMENT?
302 OPERATION IS LOAD?
303 INCREMENT CURRENT OUTPUT ADDRESS
304 MEMORY # LESS THAN S?
305 SET ADDRESS TO P
306 SET ADDRESS TO P PLUS ONE
310 OUTPUT ADDRESS]
`
`
`
[Sheet 7 of 9, FIG. 4: flow diagram of alignment select generator
operation. Legible blocks:
402 OPERATION IS LOAD?
403 SELECT M EQUALS S+M MODULO 2^Q
404 NO CHANGE TO OUTPUT SELECT SIGNAL
405 OUTPUT SELECT SIGNAL]
`
`
`
[Sheet 8 of 9, FIG. 5: memory coupled to aligner 102. Legible labels:
BRAM 0 Output; BRAM 1 Output; SELECT inputs; LANE-2 523; reference
numerals 501, 502, 503, 522, 524, 531, 533, 534.]
`
`
`
[Sheet 9 of 9: FIG. 6 (drawing text not recoverable).]
`
`
`
`QUEUING AND ALIGNING DATA
`
`
`FIELD OF THE INVENTION
`
`One or more aspects of the invention relate generally to
`aligning data and, more particularly, to queuing and aligning
`data.
`
`BACKGROUND OF THE INVENTION
`
`
Programmable logic devices (“PLDs”) are a well-known type of
integrated circuit that can be programmed to perform specified
logic functions. One type of PLD, the field programmable gate
array (“FPGA”), typically includes an array of programmable
tiles. These programmable tiles can include, for example,
input/output blocks (“IOBs”), configurable logic blocks
(“CLBs”), dedicated random access memory blocks (“BRAMs”),
multipliers, digital signal processing blocks (“DSPs”),
processors, clock managers, delay lock loops (“DLLs”), and so
forth. Notably, as used herein, “include” and “including” mean
including without limitation.
Each programmable tile typically includes both programmable
interconnect and programmable logic. The programmable
interconnect typically includes a large number of interconnect
lines of varying lengths interconnected by programmable
interconnect points (“PIPs”). The programmable logic implements
the logic of a user design using programmable elements that can
include, for example, function generators, registers, arithmetic
logic, and so forth.
The programmable interconnect and programmable logic are
typically programmed by loading a stream of configuration data
into internal configuration memory cells that define how the
programmable elements are configured. The configuration data can
be read from memory (e.g., from an external programmable read
only memory (“PROM”)) or written into the FPGA by an external
device. The collective states of the individual memory cells
then determine the function of the FPGA.
Another type of PLD is the Complex Programmable Logic Device
(“CPLD”). A CPLD includes two or more “function blocks”
connected together and to input/output (“I/O”) resources by an
interconnect switch matrix. Each function block of the CPLD
includes a two-level AND/OR structure similar to those used in
Programmable Logic Arrays (“PLAs”) and Programmable Array Logic
(“PAL”) devices. In some CPLDs, configuration data is stored
on-chip in non-volatile memory. In other CPLDs, configuration
data is stored on-chip in non-volatile memory, then downloaded
to volatile memory as part of an initial configuration sequence.
For all of these PLDs, the functionality of the device is
controlled by data bits provided to the device for that purpose.
The data bits can be stored in volatile memory (e.g., static
memory cells, as in FPGAs and some CPLDs), in non-volatile
memory (e.g., FLASH memory, as in some CPLDs), or in any other
type of memory cell.
Other PLDs are programmed by applying a processing layer, such
as a metal layer, that programmably interconnects the various
elements on the device. These PLDs are known as mask
programmable devices. PLDs can also be implemented in other
ways, e.g., using fuse or antifuse technology. The terms “PLD”
and “programmable logic device” include these exemplary devices,
as well as encompassing devices that are only partially
programmable.
An incoming information stream (“data stream”), which may
include control information and data, to an FPGA may be at a
data rate that is higher than that which may be processed by
FPGA programmable circuitry (“FPGA fabric”). The FPGA fabric can
include block random access memories (BRAMs), which are dual
ported memory blocks, such as those found in the Virtex II FPGA
chip from Xilinx, Inc. of San Jose, Calif. For example, for a
high-bandwidth application, memory read or write cycle time of
BRAMs of FPGA fabric may be insufficient to process each
incoming word of a data stream sequentially. To address this
imbalance between data rate of an incoming data stream and data
rate which may be handled by BRAMs, one may write and read
multiple words of the data stream into and out of BRAMs
configured to be N words wide, for N a positive integer.
Conventionally, multi-word data is written to and retrieved from
N word wide queues. Accessing N words at a time from the same
physical address is known as single address pointer access.
BRAMs of an FPGA may be configured to provide one or more
circular queues to receive an incoming data stream. However,
configuring BRAMs as a circular queue with a width of N words
has a drawback when a packet or cell boundary is not an integer
multiple of integer N. In reads or writes, where a packet or
cell boundary is not an integer multiple of integer N, a
starting point (“word zero”) of a packet or cell may appear in
any of N possible word locations of such a memory configuration.
In other words, a packet or cell word zero may not be aligned to
a physical address boundary. Because a packet or cell starting
point may be in any of the N possible word locations, this
randomness complicates reconstruction of a packet or a cell.
Others have addressed this data alignment issue by implementing
what is known as “barrel shifting.” However, implementing a
barrel shifter in an FPGA may consume a significant amount of
resources and power.
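The drift of word zero described above can be illustrated with a minimal sketch. It is an illustration only, not taken from the patent: the function name `start_slots` and the example cell lengths are assumptions chosen for clarity. It packs cells back to back into an N-word-wide queue and reports the word location at which each cell begins.

```python
def start_slots(cell_lengths, n_words):
    """Return the word location (0..n_words-1) where each cell starts
    when cells are stored back to back in an n_words-wide queue."""
    slots = []
    offset = 0  # running count of words written so far
    for length in cell_lengths:
        slots.append(offset % n_words)
        offset += length
    return slots


# Cell lengths that are multiples of N always start at word location 0.
aligned = start_slots([8, 4, 12, 8], n_words=4)

# Five-word cells drift through every possible word location.
drifting = start_slots([5, 5, 5, 5, 5], n_words=4)
```

With N equal to four, the second call shows word zero landing in each of the four possible word locations in turn, which is the situation a barrel shifter or the aligner described herein has to correct.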
`Accordingly, it would be desirable and useful to provide
`means by which packets or cells may be accessed where a
`first word of a retrieved packet or cell appears in a same
`location regardless of where such word originated in a
`physical memory space.
`
`SUMMARY OF THE INVENTION
`
`One or more aspects of the invention relate generally to
`aligning data and more particularly to queuing and aligning
`data.
`An aspect of the invention is a method for queuing and
`ordering data, including: receiving a set of bits (for example,
`a cell) at a first data rate; interrogating the set of bits at a
`second data rate slower than the first data rate, where the set
`of bits is interrogated to obtain data information for the set
`of bits; generating physical addresses for storing data
`obtained from the set of bits; storing the data from the set of
`bits in memory responsive to the physical addresses, where
`the storing includes writing multiple words during a write
`operation; creating a record of the data information for the
`set of bits, the record including a virtual address associated
`with a starting location of the physical addresses; retrieving
`the record to read the data stored in the memory; translating
`the record to provide the physical addresses for accessing
`the data stored in the memory and to provide select signals;
`reading the data stored in the memory responsive to the
`physical addresses provided from translating the record; and
`aligning the data accessed from the memory, the data aligned
`responsive to the select signals.
Another aspect of the invention is an integrated circuit for
queuing and ordering data, including: a serializer-deserializer
configured to receive a serial stream of information at a first
data rate and configured to convert the serial stream of
information to parallel streams of information at a second data
rate slower than the first data rate; a pre-processor coupled to
receive the parallel streams of information and configured to
interrogate the parallel streams of information to locate an
initial data word of a cell and to obtain information on size of
the cell, where the pre-processor is configured to generate a
physical address responsive to the initial data word location
and to generate a virtual address responsive to the physical
address and a slot index, the slot index indicating a memory
unit in which the initial data word is to be stored; a memory
coupled to receive the parallel streams of information and the
physical address associated therewith for storage of cell data,
where the memory includes concatenated memory blocks such that
the physical address associated with the memory unit being a
respective one of the concatenated memory blocks; a pointer
circuit coupled to receive the virtual address and the
information on size of the cell from the pre-processor and
configured to create a record thereof, where the physical
address for the initial data word is associated with a row of
the concatenated memory blocks; a translator coupled to retrieve
the record and configured to generate: physical addresses
responsive to the virtual address and the information on size of
the cell obtained from the record retrieved, and select signals
associated with the concatenated memory blocks responsive to the
slot index and successive increments thereof responsive to the
information on size of the cell; and an aligner coupled to
receive the cell data read from the memory responsive to the
physical addresses and coupled to receive the select signals,
the aligner configured to provide lane alignment of the cell
data read responsive to the select signals.
An integrated circuit for aligning data, including: memories,
where each of the memories has a respective set of data out
ports; an aligner having multiplexers arranged in a lane
sequence, where the multiplexers are coupled to each set of the
data out ports of the memories to receive at least one data word
from each of the memories associated with a cell having data
words; and a virtual-to-physical address translator coupled to
the memories and to the aligner and configured to translate a
virtual address to provide physical addresses and select
signals, where the physical addresses are locations of at least
a portion of the data words of the cell stored in the memories
in successive order. The virtual-to-physical address translator
is configured to generate the select signals responsive to a
memory index in the virtual address, where the memory index is
associated with a memory of the memories having stored therein a
first data word of the data words of the cell. The virtual
address includes a physical address of the first data word of
the cell stored in the memory. The multiplexers are coupled to
receive the select signals as control select signaling to align
the at least one data word obtained from each of the memories
for lane aligned output from the aligner, where the first data
word is output from a multiplexer of the multiplexers associated
with an initial lane of the lane sequence.
`
BRIEF DESCRIPTION OF THE DRAWINGS

Accompanying drawing(s) show exemplary embodiment(s) in
accordance with one or more aspects of the invention; however,
the accompanying drawing(s) should not be taken to limit the
invention to the embodiment(s) shown, but are for explanation
and understanding only.
FIG. 1A is a high-level block diagram depicting an exemplary
embodiment of a queuing and alignment system.
FIG. 1B is a high-level block diagram depicting an exemplary
embodiment of a memory array for the queuing and alignment
system of FIG. 1A.
FIG. 1C is a block diagram depicting an exemplary embodiment of
virtual-to-physical memory mapping.
FIG. 1D is a high-level block diagram depicting an exemplary
embodiment of a storage array for the queuing and alignment
system of FIG. 1A.
FIG. 2 is a high-level block diagram depicting an exemplary
embodiment of translator/controller.
FIG. 3 is a flow diagram depicting an exemplary embodiment of
operation of an address generator.
FIG. 4 is a flow diagram depicting an exemplary embodiment of
operation of an alignment select generator.
FIG. 5 is a high-level block/schematic diagram depicting an
exemplary embodiment of memory coupled to an aligner.
FIG. 6 is a high-level block diagram depicting an exemplary
embodiment of a field programmable gate array architecture
including different programmable tiles.

DETAILED DESCRIPTION OF THE DRAWINGS

In the following description, numerous specific details are set
forth to provide a more thorough description of the specific
embodiments of the invention. It should be apparent, however, to
one skilled in the art, that the invention may be practiced
without all the specific details given below. In other
instances, well known features have not been described in detail
so as not to obscure the invention. For ease of illustration,
the same number labels are used in different diagrams to refer
to the same items; however, in alternative embodiments the items
may be different. Moreover, for purposes of clarity, a single
signal or multiple signals may be referred to or illustratively
shown as a signal to avoid encumbering the description with
multiple signal lines. Moreover, along those same lines, a
multiplexer or a register, among other circuit elements, may be
referred to or illustratively shown as a single multiplexer or a
single register though such reference or illustration may be
representing multiples thereof. Furthermore, though particular
signal bit widths, data rates, and frequencies are described
herein for purposes of clarity by way of example, it should be
understood that the scope of the description is not limited to
these particular numerical examples as other values may be used.
From the description that follows, it will become apparent that
post processing of an incoming data stream is simplified by
providing a capability to access any starting word, as well as
subsequent words, from stored multiple-word data using a virtual
memory space aligned to cell or packet boundaries. Moreover,
this may be used where storage, and thus access thereof, of
multi-word data spans more than one physical address.
FIG. 1A is a high-level block diagram depicting an exemplary
embodiment of a queuing and alignment system (“system”) 100.
System 100 may be located in an FPGA 160 where an incoming
serial data stream 113S is received at a data rate faster than
can be processed by FPGA fabric 150 in which system 100 is
located. Notably, configurable logic, dedicated logic, or a
combination of dedicated and configurable logic may be used to
provide system 100. Thus, it should be appreciated that memory
103, aligner 102, translator/controller 101, finite state
machine (“FSM”) 107, and pointer logic (“POINTER”) 105 may
operate in a clock domain of clock signal 162, which in one
embodiment is asynchronous to clock signal 161. In an
alternative embodiment clock signal 162 can be synchronous with
clock signal 161.
Incoming data stream 113S is converted by a
serializer-deserializer (“SERDES”) 104 from a serial data stream
113S to parallel data streams 113P. SERDES 104 is configured to
operate at a clock rate of incoming data stream 113S and to
output parallel data streams 113P at a clock rate of clock
signal 161 which is slower than the clock rate of incoming data
stream 113S. An example of a SERDES in an FPGA that may be used
is described in additional detail in a patent application
entitled “Bimodal Serial to Parallel Converter with Bitslip
Controller”, by Paul T. Sasaki et al., Ser. No. 10/919,900,
filed Aug. 17, 2004, now U.S. Pat. No. 6,985,096. Notably,
incoming data stream 113S may be provided with a forwarded clock
signal to provide a source synchronous interface, as is known.
`For purposes of clarity by way of example and not
`limitation, numerical examples shall be used. However, it
`should be understood that other values than those used
`herein may be employed.
It shall be assumed that incoming serial data stream 113S is an
approximately 500 MHz double-data rate (“DDR”) signal, and it
shall be assumed that FPGA fabric 150 operates at approximately
one quarter the data rate of an incoming data stream 113S.
Accordingly, for each clock cycle of clock signal 161, four
words of incoming data stream 113S may be written from parallel
data streams 113P into memory 103.
Parallel data streams 113P are provided to pre-processor 106.
Pre-processor 106 provides data from parallel data streams 113P
to memory 103 along with information as to where in memory 103
such data is to be written. For example, data may be part of a
cell. A cell is a set or group of bits and may be any of a
packet, a frame, a sub-packet, or a sub-frame. The type of
information provided is format dependent. Thus, pre-processor
106 extracts from a cell a starting location of the data to be
written to memory 103 and a size of the cell data. It should be
appreciated that cell size may be greater than memory width. The
physical address as to where the cell data is to start being
written to memory, along with the starting memory unit, is
provided from pre-processor 106 to memory 103 along with the
cell data.
`As shall be more clearly understood from the description
`that follows, the physical address of a starting word of cell
`data and the memory unit is used to provide a virtual
`address, and this virtual address along with cell size is
`provided from pre-processor 106 to pointer 105 to be stored
`as a record in record array 180 of pointer 105.
Memory 103 is configured with M memory units, where each memory
unit may be 2^N deep, for M and N positive integers. Each memory
unit may be one or more words wide. For purposes of illustration
and not limitation, it shall be assumed that M is equal to four
and that N is equal to 10. Furthermore, it shall be assumed, for
purposes of illustration, in one exemplary embodiment of the
present invention, that each memory unit is a block memory unit,
such as BRAM, of an FPGA. However, in other embodiments of the
present invention each memory unit can be any discrete memory
unit in any type of integrated circuit. Notably, because of
control overhead, for example such as cell start point (“SP”)
and cell end point (“EP”) control words, it may be useful to
include additional BRAMs in memory 103. Moreover, it may be
desirable to write more than four words at a time, and thus
additional BRAMs or wider BRAMs may be used. However, for
purposes of illustration and not limitation, it shall be assumed
that the physical address space in this example is four words
wide and approximately 1024 words deep, where each BRAM is one
word wide. Moreover, though BRAM is used in this example, it
should be understood that distributed memory, such as
distributed RAM, may be used.
`For example, a record in record array 180 may be physical
`addresses in memory 103 for a SP and an EP of a cell as
`determined by pre-processor 106.
`FIG. 1B is a high-level block diagram depicting an
`exemplary embodiment of a memory array 170 for the
`above-mentioned example. Memory array 170 may be part
`of or separate from memory 103 of FIG. 1A. However, for
`purposes of clarity, it shall be assumed that memory array
`170 is part of memory 103. Four memory units are indicated
`as BRAMs 0 through 3, where each BRAM has 1024 rows.
`Concatenating BRAMs 0 through 3, each physical address
(“PA”) is four words wide. In addition to BRAMs 0 through
`3 forming a physical address space for memory array 170,
`other memory units may be used to provide a physical
`address space.
`Again, each BRAM may be more than one word wide. For
`example, a BRAM may be two words wide, in which
`implementation indexing may be to the first and second
`word stored in a BRAM. However, for purposes of illustra
`tion and not limitation, it shall be assumed that each BRAM
`is one word wide.
In the example, cell 171 stored in memory array 170 occupies a
portion of two physical address spaces, namely a portion of a
physical address space associated with PA 1 and a portion of a
physical address space associated with PA 2. Both a PA for a
starting location of a cell, as well as an index indicating
where within the PA a cell starts, namely which memory unit, may
be used for accessing a cell. In this example, the notation Word
0 is used to indicate a control or starting word of cell 171.
Notably, for this example, the BRAM in which Word 0 is written
will affect whether one or two PAs are used for cell 171.
`FIG. 1C is a block diagram depicting an exemplary
`embodiment of virtual memory space 195. Continuing the
`above example of four BRAMs, a virtual-to-physical
`memory mapping ratio of four is used, though other ratios
`may be employed.
Example physical addresses, PA N-1 175, PA N 176, PA N+1 177,
and PA N+2 178, each may have one of four respective virtual
addresses associated therewith, namely virtual addresses 190-1
through 190-4, 190-5 through 190-8, 190-9 through 190-12, and
190-13 through 190-16, respectively, depending on the location
of a starting word of a cell. For the example, if a starting
word of a cell is stored in slot 1 at PA N 176, then virtual
address 190-7 is generated by pre-processor 106 of FIG. 1A,
where (0,1) indicates slot 1. Notably, start of cell sequences
are conventionally used to indicate a starting location, which
may be detected by pre-processor 106 to identify a starting word
of a cell. Pre-processor 106 may be configured to concatenate
the physical address with a slot address to provide a virtual
address for each starting word, where the slot address is
associated with a memory in which the starting word is stored.
Each virtual address pointer includes a physical address and a
slot index. Notably, the physical address is sequentially
incremented (“pushed”) if a cell spans more than one physical
address. For the example of M equal to four, then Q, the number
of slot index bits, may equal two, namely the two least
significant bits (“LSBs”) of each virtual address in this
example. Notably, ordering of bits may vary from application to
application. For example, most significant bits may be used.
However, for purposes of illustration by way of example, it
shall be assumed that LSBs are used though other bit orders may
be used. Thus, for example bits 00 may correspond to BRAM 0
(“slot 0”). For example, for virtual addresses 190-5 through
190-8, the physical address value for each is N, each
corresponding to PA N 176, and the slot value for each is two
bits, one set for each slot, namely (0,0) for slot 0, (0,1) for
slot 1, (1,0) for slot 2, and (1,1) for slot 3 in this example.
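The concatenation of a physical address with a slot index can be sketched as follows. This is a minimal illustration under the running example (M equal to four, N equal to 10, slot index in the two LSBs); the function names, the zero-based row numbering, and the range checks are assumptions made for the sketch and are not part of the described hardware.

```python
Q = 2   # slot-index bits; 2**Q = 4 memory units (M = 4 in the example)
N = 10  # row-address bits; each BRAM is 2**N = 1024 rows deep


def make_virtual_address(physical_address, slot):
    """Concatenate the row address with the slot index held in the Q LSBs."""
    if not 0 <= physical_address < (1 << N):
        raise ValueError("physical address out of range")
    if not 0 <= slot < (1 << Q):
        raise ValueError("slot out of range")
    return (physical_address << Q) | slot


def split_virtual_address(virtual_address):
    """Recover (physical address, slot index) from a virtual address."""
    return virtual_address >> Q, virtual_address & ((1 << Q) - 1)


# A starting word stored in slot 1 of row 5 (assumed example values).
va = make_virtual_address(5, 1)
pa, slot = split_virtual_address(va)
```

Because the slot index occupies the LSBs, the four virtual addresses that share one physical address are consecutive integers, which mirrors the four-to-one virtual-to-physical mapping ratio of the example.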
With simultaneous reference to FIGS. 1A through 1C,
virtual-to-physical mapping is further described. Notably, for
writing to memory array 170, words may be written in groups of
four words at a time to successive available slots in the
current example. Notably, where data streams 113P begin writing
to memory array 170 may be in any of slots 0 through 3. For
reading from memory array 170, virtual addresses are used to
provide for four concurrent reads at a time in the instant
example responsive to a virtual address pointer pointing to a
first word of a cell stored at a physical address in memory 103
and successive increments thereafter for subsequent data cells.
For a “full bandwidth” system, there is little or no “dead”
space, and thus the number of words written at a time generally
equals the number of words read at a time. Of course, less than
a full bandwidth implementation may be used.
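The write side of this mapping can be sketched in software. The sketch below is an illustration only: it models memory array 170 as a list of rows with one word per slot, and the function name `write_cell`, the eight-row depth, and the word labels are assumptions. It shows how words written to successive slots wrap from one physical address to the next when the starting slot is not slot 0.

```python
M = 4  # number of concatenated memory units (BRAM 0 through BRAM 3)


def write_cell(memory, row, slot, words):
    """Store words in successive slots starting at (row, slot), wrapping
    to the next row after slot M - 1. memory[row][slot] holds one word."""
    for i, word in enumerate(words):
        position = slot + i
        memory[row + position // M][position % M] = word


# A small array of eight rows, each M slots wide (assumed depth).
memory = [[None] * M for _ in range(8)]

# A four-word cell whose Word 0 lands in slot 1 of PA 1.
write_cell(memory, row=1, slot=1, words=["W0", "W1", "W2", "W3"])
```

Starting in slot 1 leaves Words 0 through 2 at PA 1 and pushes Word 3 into slot 0 of PA 2, so a four-word cell occupies portions of two physical addresses, as with cell 171 in the example.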
FIG. 1D is a high-level block diagram depicting an exemplary
embodiment of a record array 180. Record array 180 may be part
of or separate from pointer 105 of FIG. 1A. Continuing the above
example, record array 180 may be implemented as a first-in,
first-out buffer (“FIFO”) for a virtual address space for
queuing virtual address pointers. Alternatively, a BRAM or a
circular queue may be used to provide a record array 180. Record
array 180 may have a depth that is M times 2^N. Width of record
array 180 is sufficient for storing a pointer, such as pointer
190-1 of FIG. 1C for example, along with cell size, for example
four words, which make up a record (“REC”), such as any of
records 0 through 4096 in this example. Again, other record
formats may be used.
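A record array of this kind can be modeled, as a rough sketch, with a software FIFO. The record layout as a (virtual address, cell size) tuple, the function names, and the overflow behavior are assumptions for illustration; the text above only states that a record holds a pointer along with cell size.

```python
from collections import deque

M, N = 4, 10
RECORD_DEPTH = M * 2 ** N  # depth of M times 2^N, as in the running example

record_array = deque()


def push_record(virtual_address, cell_size):
    """Queue a record for a newly stored cell (assumed record layout)."""
    if len(record_array) >= RECORD_DEPTH:
        raise OverflowError("record array full")
    record_array.append((virtual_address, cell_size))


def pop_record():
    """Retrieve the oldest record so its cell can be read from memory."""
    return record_array.popleft()


push_record(virtual_address=21, cell_size=4)
push_record(virtual_address=25, cell_size=6)
oldest = pop_record()
```

Records come back in the order the cells were stored, so the read side can translate each virtual address in turn without tracking cell boundaries itself.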
FIG. 2 is a high-level block diagram depicting an exemplary
embodiment of translator/controller 101 of FIG. 1A.
Translator/controller 101 is a virtual address
translator/controller. Translator/controller 101 includes
address generators 201 and alignment select generators 202. For
M a positive integer, translator/controller 101 is configured to
generate M address signals (“addresses”) 120 and M select
control signals (“selects”) 121 respectively by address
generators 201-1 through 201-M and alignment select generators
202-1 through 202-M.
`Addresses 120 may include respective addresses for
`BRAM 0 through BRAM M-1. Selects 121 may include
`respective selects for Lane 0 through Lane M-1. Addresses
`120 and selects 121 are generated respectively by generators
`201 and 202 responsive to inputs, including a virtual address
`signal 112, a load signal 110, and an increment signal 111.
A virtual address may be described by a set of values, (P, S),
where P and S are positive integers, respectively such as a
value for a row pointer and a value for a column pointer of
virtual address signal 112. A value X, in binary format, is for
bit width. The value X is equal to the sum of N and S. Again,
depth of each BRAM is 2^N, as N bits are used to access memory
contents. Moreover, M is equal to or greater than 2^S, which is
the number of words that may be accessed in a single read
operation.
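How a virtual address (P, S) might be turned into per-BRAM addresses and per-lane selects can be sketched as follows. This is a software illustration only, not the described circuit. It assumes that a BRAM whose index is less than S reads row P plus one while the others read row P, and that lane k selects the output of BRAM (S + k) modulo M; both assumptions are inferred from the wrap-around storage order and the block labels of FIGS. 3 and 4. All function names are hypothetical, and address wrap-around at the end of the memory is not modeled.

```python
M = 4  # memory units (BRAM 0..3) and output lanes in the running example


def load_addresses(p, s):
    """Load: BRAMs below start slot s hold words that wrapped into the
    next row, so they read row p + 1; the others read row p."""
    return [p + 1 if m < s else p for m in range(M)]


def increment_addresses(addresses):
    """Increment: every BRAM advances to its next row."""
    return [a + 1 for a in addresses]


def lane_selects(s):
    """Lane k selects the output of BRAM (s + k) mod M; the selects are
    left unchanged on an increment."""
    return [(s + lane) % M for lane in range(M)]


def read_aligned(memory, addresses, selects):
    """Read one word per BRAM, then let each lane multiplexer pick its word."""
    bram_out = [memory[addresses[m]][m] for m in range(M)]
    return [bram_out[sel] for sel in selects]


# An eight-word cell whose Word 0 sits in slot 1 of row 1 (assumed data).
memory = [
    [None, None, None, None],
    [None, "W0", "W1", "W2"],
    ["W3", "W4", "W5", "W6"],
    ["W7", None, None, None],
]
p, s = 1, 1
selects = lane_selects(s)
addresses = load_addresses(p, s)
first = read_aligned(memory, addresses, selects)   # load operation
addresses = increment_addresses(addresses)
second = read_aligned(memory, addresses, selects)  # increment operation
```

In this sketch Word 0 always appears on lane 0 regardless of the slot it was stored in, and only a per-lane multiplexer select is needed rather than a full barrel shift of the row.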
With simultaneous reference to FIGS. 1A through 1D and FIG. 2,
system 100 is further described. Address generation by a
generator of generators 201 is responsive to either load signal
110 or increment signal 111, depending on which operation is
being done. Moreover, address generation by a generator of
generators 201 is responsive to the value of P (“row pointer”)
and the value of S (“column pointer”) of virtual address signal
112. Recall from the example, four successive reads are done at
a time to read four words, such as Words 0 through 3 of a cell
171. For an initial set of successive reads, load signal 110 is
asserted. However, recall that cell size may be wider than
memory width. Thus, for example if a cell spans more than four
words in the current example, a physical address is incremented
for the next set of successive reads. Thus, for example, to read
the next part of a cell, namely a sub-cell 172 where a sub-cell
171 in combination with sub-cell 172 form at least part of a
cell, increment signal 111 is asserted. Notably, as all cell
data is successively written in memory 103, by successively
asserting increment signal 111 until the entire cell size is
spanned by sets of Succ