DECLARATION OF GORDON MACPHERSON

I, Gordon MacPherson, am over twenty-one (21) years of age. I have never been convicted of a felony, and I am fully competent to make this declaration. I declare the following to be true to the best of my knowledge, information and belief:

1. I am Director, Board Governance & Policy Development of The Institute of Electrical and Electronics Engineers, Incorporated (“IEEE”).

2. IEEE is a neutral third party in this dispute.

3. I am not being compensated for this declaration and IEEE is only being reimbursed for the cost of the article I am certifying.

4. Among my responsibilities as Director, Board Governance & Policy Development, I act as a custodian of certain records for IEEE.

5. I make this declaration based on my personal knowledge and information contained in the business records of IEEE.

6. As part of its ordinary course of business, IEEE publishes and makes available technical articles and standards. These publications are made available for public download through the IEEE digital library, IEEE Xplore.

7. It is the regular practice of IEEE to publish articles and other writings, including article abstracts, and make them available to the public through IEEE Xplore. IEEE maintains copies of publications in the ordinary course of its regularly conducted activities.

8. The article below has been attached as Exhibit A to this declaration:

A. A. Radulescu, et al., “An efficient on-chip network interface offering guaranteed services, shared-memory abstraction, and flexible network configuration,” published in Proceedings Design, Automation and Test in Europe Conference and Exhibition; date of conference: February 16-20, 2004.

9. I obtained a copy of Exhibit A through IEEE Xplore, where it is maintained in the ordinary course of IEEE’s business. Exhibit A is a true and correct copy of the Exhibit, as it existed on or about October 30, 2023.

445 Hoes Lane, Piscataway, NJ 08854

DocuSign Envelope ID: D785B3EF-83B0-43A2-820A-BE0763E7DBDE

Samsung Ex. 1017
Page 1
10. The article and abstract from IEEE Xplore show the date of publication. IEEE Xplore populates this information using the metadata associated with the publication.

11. A. Radulescu, et al., “An efficient on-chip network interface offering guaranteed services, shared-memory abstraction, and flexible network configuration,” published in Proceedings Design, Automation and Test in Europe Conference and Exhibition; date of conference: February 16-20, 2004. Copies of the conference proceedings were made available no later than the last day of the conference. The article is currently available for public download from the IEEE digital library, IEEE Xplore.

12. I hereby declare that all statements made herein of my own knowledge are true and that all statements made on information and belief are believed to be true, and further that these statements were made with the knowledge that willful false statements and the like are punishable by fine or imprisonment, or both, under 18 U.S.C. § 1001.

I declare under penalty of perjury that the foregoing statements are true and correct.

Executed on: 10/30/2023
EXHIBIT A
An Efficient On-Chip Network Interface Offering Guaranteed Services, Shared-Memory Abstraction, and Flexible Network Configuration

Andrei Rădulescu, John Dielissen, Kees Goossens, Edwin Rijpkema, and Paul Wielage
Philips Research Laboratories, Eindhoven, The Netherlands

Abstract

In this paper we present a network interface for an on-chip network. Our network interface decouples computation from communication by offering a shared-memory abstraction, which is independent of the network implementation. We use a transaction-based protocol to achieve backward compatibility with existing bus protocols such as AXI, OCP and DTL. Our network interface has a modular architecture, which allows flexible instantiation. It provides both guaranteed and best-effort services via connections. These are configured via network interface ports using the network itself, instead of a separate control interconnect. An example instance of this network interface with 4 ports has an area of 0.143 mm² in a 0.13 µm technology, and runs at 500 MHz.
1 Introduction

Networks on chip (NoC) have been proposed as a solution to the interconnect problem for highly complex chips [2, 3, 5, 9, 12, 14, 15, 17, 21, 27]. NoCs help designing chips in several ways: they (a) structure and manage wires in deep submicron technologies [2, 3, 9, 12, 21], (b) allow good wire utilization through sharing [5, 9, 12, 21], (c) scale better than buses [14, 21], (d) can be energy efficient and reliable [2, 5], and (e) decouple computation from communication through well-defined interfaces, enabling IP modules and interconnect to be designed in isolation, and to be integrated more easily [2, 13, 21, 24].

Networks are composed of routers, which transport the data from one place to another, and network interfaces (NI), which implement the interface to the IP modules. In a previous article [21], we have shown the trade-offs in designing a cost-effective router combining guaranteed with best-effort traffic. In this paper, we focus on the other network component, the network interface.

Network interface design has received considerable attention for parallel computers [8, 25], and computer networks [6, 7]. These designs are optimized for performance (high throughput, low latency), and often consist of a dedicated processor and a large amount of buffering. As a consequence, their cost is too large to be applicable on chip.

On-chip network interfaces must provide a low area overhead, because the size of IP modules attached to the NoC is relatively small. Designs of network interfaces with a low area have been proposed [4, 28]. However, they do not provide throughput or latency guarantees, which are essential for a compositional construction of complex SoCs.

Our NI is intended for systems on chip (SoC); hence, it must have a low area. To enable the reuse of existing IP modules, we must provide a smooth transition from buses to NoCs. A shared-memory abstraction via transactions (e.g., read, write) ensures this. Further, we also have to provide a simple and flexible configuration, preferably using the NoC itself to avoid the need for a separate scalable interconnect.

We achieve a low-cost implementation of the NI by implementing the protocol stack in hardware, and by exploiting on-chip characteristics (such as the absence of transmission errors, relatively static configuration, and tight synchronization) to implement only the relevant parts of a complete OSI stack. A hardware implementation of the protocol stack provides a much lower latency overhead compared to a software implementation. Further, a hardware implementation allows both hardware and software cores to be reused without change [4].

Our NI provides services at the transport layer in the ISO-OSI reference model [22], because this is the first layer where offered services are independent of the network implementation. This is a key ingredient in achieving the decoupling between computation and communication [16, 24], which allows IP modules and interconnect to be designed independently from each other. We provide transport-layer services by defining connections (e.g., point-to-point or multicast) configured for specific properties (e.g., throughput, ordering).

We offer guaranteed services as they are essential for a compositional construction (design and programming) of SoCs. The reasons are that they limit the possible interactions of IPs with the communication environment [12, 13], separate the IP requirements and their implementation, and make application quality of service independent of the IP and NoC implementations. Examples of such guarantees are lower bounds on throughput, and upper bounds on latency.
Our NoC, called Æthereal, offers a shared-memory abstraction to the IP modules. Communication is performed using a transaction-based protocol, where master IP modules issue request messages (e.g., read and write commands at an address, possibly carrying data) that are executed by the addressed slave modules, which may respond with a response message (i.e., status of the command execution, and possibly data) [23]. We adopt this protocol to provide backward compatibility to existing on-chip communication protocols (e.g., AXI [1], OCP [18], DTL [19]), and also to allow future protocols better suited to NoCs.
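The request/response protocol described above can be sketched as a minimal software model. This is purely illustrative: the field names and the toy `Slave` class are assumptions for exposition, not the Æthereal message format.

```python
from dataclasses import dataclass, field

@dataclass
class Request:
    """Request message issued by a master: a command at an address,
    possibly carrying write data (illustrative fields only)."""
    command: str                       # e.g. "read" or "write"
    address: int
    write_data: list = field(default_factory=list)

@dataclass
class Response:
    """Optional response from the addressed slave: execution status,
    possibly carrying read data."""
    status: str                        # e.g. "ok" or "error"
    read_data: list = field(default_factory=list)

class Slave:
    """A toy memory-mapped slave that executes transactions."""
    def __init__(self, size):
        self.mem = [0] * size

    def execute(self, req: Request) -> Response:
        if req.command == "write":
            for i, word in enumerate(req.write_data):
                self.mem[req.address + i] = word
            return Response("ok")
        if req.command == "read":
            return Response("ok", [self.mem[req.address]])
        return Response("error")

slave = Slave(16)
slave.execute(Request("write", 4, [0xAB]))
resp = slave.execute(Request("read", 4))
assert resp.read_data == [0xAB]
```

In the NoC these messages are sequentialized and carried as packets by the network, transparently to the IP modules.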
We provide a modular NI, which can be configured at design time. That is, the number of ports and their type (i.e., configuration port, master port, or slave port), the number of connections at each port, the memory allocated for the queues, the level of services per port, and the interface to the IP modules are all configurable at design (instantiation) time using an XML description [11].

The NI allows flexible NoC configuration at run time. Each connection can be configured individually, requiring configurable NoC components (i.e., router and NI). However, instead of using

Proceedings of the Design, Automation and Test in Europe Conference and Exhibition (DATE’04), 1530-1591/04 $20.00 © 2004 IEEE

Authorized licensed use limited to: IEEE Staff. Downloaded on October 30, 2023 at 16:50:25 UTC from IEEE Xplore. Restrictions apply.
a separate control interconnect to program them, the NoC is used to program itself. This is performed through configuration ports using DTL-MMIO (memory-mapped IO) transactions [19]. The NoC can be configured in a distributed fashion (i.e., via multiple configuration ports), or centralized (i.e., via a single port).

The paper is organized as follows. In the next section, the services that we implement, and the interface offered to the IP modules are described. In Section 3, we show that NoCs can be configured both in a distributed and in a centralized way, and we present the trade-offs between the two approaches. In Section 4, we present a modular network interface architecture, which is split into a kernel, providing core functionality, and a number of shells to extend functionality, e.g., wrappers to provide an interface to existing bus protocols, such as AXI or DTL. In this section, we also show how the NI allows NoC configuration using the NoC itself as opposed to via a separate control interconnect. In Section 5, we demonstrate the feasibility of our network interface design through a prototype implementation in a 0.13 µm technology, and we conclude in Section 6.
2 NoC Services

As mentioned in the previous section, the communication services of the Æthereal NoC are defined to meet the following goals: (a) decouple computation (IP modules) from communication (NoC), (b) provide backward compatibility to existing bus protocols, (c) provide support for real-time communication, and (d) have a low-cost implementation.

Decoupling computation from communication is a key ingredient in managing the complexity of designing chips with billions of transistors, because it allows the IP modules and the interconnect to be designed independently [16, 24]. In NoCs, this decoupling is achieved by positioning the network services at the transport level [3, 21] or above in the ISO-OSI reference model [22]. At the transport level, the offered services are end to end between communicating IP modules, thus hiding the network internals, such as topology, routing scheme, etc.

Backward compatibility with existing protocols, such as AXI or DTL, is achieved by using a model based on transactions [23]. In a transaction-based model, there are two types of IP modules: masters and slaves. Masters initiate transactions by issuing requests, which can be further split into commands and write data (corresponding to the address and write signal groups in AXI). Examples of commands are read and write. One or more slaves receive and execute each transaction. Optionally, a transaction can also include a response issued by the slave to the master to return data or an acknowledgment of the transaction execution (corresponding to the read data and write response groups in AXI).

In the Æthereal NoC, all these signals are sequentialized in request and response messages, which are supplied to the NoC, where they are transported by means of packets. Sequentialization is performed to reduce the number of wires, increasing their utilization, and to simplify arbitration. Packetization is performed by the NI, and is thus transparent to the IP modules.
The Æthereal NoC offers its services on connections, which can be point to point (one master, one slave), multicast (one master, multiple slaves, all slaves executing each transaction), and narrowcast (one master, multiple slaves, a transaction is executed by only one slave) [23]. Connections are composed of unidirectional point-to-point channels (between a single master and a single slave). To each channel, properties are attached, such as guaranteed message delivery or not, in-order or unordered message delivery, and with or without timing guarantees. As a result, different properties can be attached to the request and response parts of a connection, or for different slaves within the same connection. Connections can be opened and closed at any time. Opening and closing of connections takes time, and is intended to be performed at a granularity larger than that of individual transactions.
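The connection structure described above can be illustrated with a small data model. This is a sketch: the property names paraphrase the text and are not the actual configuration fields of the Æthereal NI.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Channel:
    """Unidirectional point-to-point channel with per-channel properties
    (guaranteed delivery, ordering, timing guarantees)."""
    src: str
    dst: str
    guaranteed_delivery: bool = True
    ordered: bool = True
    timing_guarantees: bool = False    # GT (guaranteed) vs. BE (best effort)

@dataclass
class Connection:
    """A connection between one master and one or more slaves, composed of
    request and response channels; properties may differ per channel."""
    master: str
    request_channels: List[Channel]
    response_channels: List[Channel]

# A narrowcast-style connection: one master, two slaves, with timing
# guarantees only on the request path to the first slave.
conn = Connection(
    master="cpu",
    request_channels=[
        Channel("cpu", "mem0", timing_guarantees=True),
        Channel("cpu", "mem1"),
    ],
    response_channels=[
        Channel("mem0", "cpu"),
        Channel("mem1", "cpu"),
    ],
)
assert conn.request_channels[0].timing_guarantees
assert not conn.response_channels[0].timing_guarantees
```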
Support for real-time communication is achieved by providing throughput, latency and jitter guarantees. In Æthereal, this is implemented by configuring connections as pipelined time-division-multiplexed circuits over the network. Time multiplexing is only possible when the network routers have a notion of synchronicity, which allows slots to be reserved consecutively in a sequence of routers [13, 21]. This scheme [21] has smaller packet buffers, and, hence, a lower implementation cost compared to alternatives, such as rate-based packet switching [29], or deadline-based packet switching [20].

Throughput guarantees are given by the number of slots reserved for a connection. Each slot corresponds to a given bandwidth Bi, and, therefore, reserving N slots for a connection results in a total bandwidth of N × Bi. The latency bound is given by the waiting time until the reserved slot arrives and the number of routers the data passes to reach its destination. Jitter is given by the maximum distance between two slot reservations.
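The slot-based bounds above can be made concrete with a small calculation over a TDM slot table. This is a sketch under stated assumptions: a uniform per-router forwarding delay of one slot time, and a worst-case latency modeled as the wait for the next reserved slot plus path traversal; the exact formulas are illustrative, not taken from the paper.

```python
def guarantee_bounds(slot_table, conn, slot_bandwidth, hops, slot_time):
    """Derive worst-case bounds from a cyclic TDM slot table.

    slot_table: list of connection ids, one per slot, repeated forever
                (assumes `conn` holds at least one slot).
    Throughput: N reserved slots of bandwidth Bi give N * Bi.
    Jitter: maximum distance between two consecutive reservations.
    Latency (assumed model): worst-case wait for the next reserved slot
    plus hops * slot_time to traverse the routers on the path.
    """
    n_slots = len(slot_table)
    reserved = [i for i, c in enumerate(slot_table) if c == conn]
    throughput = len(reserved) * slot_bandwidth
    # Gaps between consecutive reservations, wrapping around the table.
    gaps = [(reserved[(i + 1) % len(reserved)] - s) % n_slots or n_slots
            for i, s in enumerate(reserved)]
    jitter = max(gaps)
    latency = (jitter - 1) * slot_time + hops * slot_time
    return throughput, jitter, latency

# 8-slot table; connection "A" holds slots 0 and 4 (evenly spaced).
thr, jit, lat = guarantee_bounds(
    ["A", None, None, None, "A", None, None, None],
    "A", slot_bandwidth=2.0, hops=3, slot_time=1)
assert thr == 4.0   # N=2 slots times Bi=2.0
assert jit == 4     # slots 0 and 4 are 4 apart
```

Evenly spaced reservations minimize jitter for a given throughput, which is why slot placement, not just slot count, matters.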
Protocol stacks that are used in networks to implement communication services require additional cost compared to buses. Protocol stacks are necessary in networks to manage their complexity, and to offer differentiated services. The pressure to keep the protocol stack small is higher on-chip than off-chip, because the size of the IP modules attached to the NoC is relatively small. However, for NoCs, the protocol stacks can be reduced by exploiting the on-chip characteristics (e.g., no transfer errors, short wires) [23]. In the Æthereal NoC, we optimize the performance and minimize the cost of the protocol stack by implementing it in hardware, rather than in software. We support this claim in Section 5.

3 Network Configuration

Before the Æthereal NoC can be used by an application, it must be configured. NoC (re)configuration means opening and closing connections in the system. Connections are set up depending on the application or the mode the system is running. Therefore, we must be able to open and close connections while the system is running. (Re)configuration can be partial or total (some or all connections are opened/closed, respectively).

Opening a connection involves setting several registers, and allocating shared resources (for more details see Section 4). In the case of the current prototype of the Æthereal NoC, for each pair of one master and one slave of a connection, there are 5 and 3 registers written at the master and slave network interfaces, respectively. The shared resources consist of the slots allocated to the connections. These slots can be configured using either a distributed or a centralized model.

In the distributed case, a connection can be opened/closed from multiple network interface ports. Multiple configuration operations can be performed simultaneously; however, potential conflicts must also be solved (e.g., connection configurations initiated at two configuration ports may try to reserve the same slot in a router). Information about the slots is maintained in the routers, which also accept or reject a tentative slot allocation.

In a centralized system, there is only one place that performs NoC configuration. In such a case, the slot information can be
stored in the configuration module instead of the routers, which simplifies the design, and, in the case of small NoCs, may even speed up configuration. For large NoCs, however, centralized configuration can introduce a bottleneck.

In the initial prototype of the Æthereal NoC, we opt for centralized configuration, because it is able to satisfy the needs of a small NoC (around 10 routers), and has a simpler design and lower cost. We use transactions to program the NoC, both for the connection registers in the NIs, and for the slot information. We present details of how NoC configuration is performed in Section 4.

4 Network Interface Architecture

The network interface (NI) is the component that provides the conversion of the packet-based communication of the NoC to the higher-level protocol that IP modules use. We split the design of the network interface in two parts (see Figure 1): (a) the NI kernel, which implements the channels, packetizes messages and schedules them to the routers, implements the end-to-end flow control, and the clock domain crossing, and (b) the NI shells, which implement the connections (e.g., narrowcast, multicast), transaction ordering for connections, and other higher-level issues specific to the protocol offered to the IP.

Figure 1. NI kernel and shells (diagram; an NI kernel with ports toward the router network, surrounded by narrowcast, multicast, DTL-adapter and AXI-adapter shells that offer DTL and AXI NI ports to the user)

4.1 NI Kernel Architecture

The NI kernel (see Figure 2) receives and provides messages, which contain the data provided by the IP modules via their protocol after sequentialization. The message structure may vary depending on the protocol used by the IP module. However, the message structure is irrelevant for the NI kernel, as it just sees messages as pieces of data to be transported over the NoC.

Figure 2. Network interface kernel (diagram; data ports with connid/msg signals feed source queues at the clock domain boundary; a packetization unit (Pck), the Path, BE/GT, Limit, Space and Credit registers, the STU and a scheduler drive the router port; a depacketization unit (Depck) fills the destination queues; a memory-mapped configuration port controls the kernel)

The NI kernel communicates with the NI shells via ports. At each port, point-to-point connections can be configured, their maximum number being selected at NI instantiation time. A port can have multiple connections to allow differentiated traffic classes, in which case there are also connid signals to select on which connection a message is supplied or consumed.

In the NI kernel, there are two message queues for each point-to-point connection (one source queue, for messages going to the NoC, and one destination queue, for messages coming from the NoC). Their size is also selected at NI instantiation time. In our NI, queues are implemented using custom-made hardware fifos, and are also used to provide the clock domain crossing between the network and the IP modules. Each port can, therefore, have a different clock frequency.

Each channel is configured individually. In a first prototype of the Æthereal NI, we can configure whether a channel provides time guarantees (GT) or not (we call this best effort, BE), reserve slots for GT connections, and configure the end-to-end flow control and the routing information.

End-to-end flow control ensures that no data is sent unless there is enough space in the destination buffer to accommodate it. This is implemented using credits [26]. For each channel, there is a counter (Space) tracking the empty buffer space of the remote destination queue. This counter is initialized with the remote buffer size. When data is sent from the source queue, the counter is decremented. When data is consumed by the IP module at the other side, credits are produced in a counter (Credit) to indicate that more empty space is available. These credits are sent to the producer of data to be added to its Space counter. In the Æthereal prototype, we piggyback credits in the header of the packets for the data in the other direction to improve NoC efficiency. Note that at most Space data items can be transmitted before credits are received. We call the minimum of the number of data items in the queue and the value in the counter Space the sendable data.

From the source queues, data is packetized (Pck) and sent to the NoC via a single link. A packet header consists of the routing information (NI address for destination routing, and path for source routing), the remote queue id (i.e., the queue of the remote NI in which the data will be stored), and piggybacked credits.

As there are multiple channels which may require data transmission, we implement a scheduler to arbitrate between them. The scheduler checks if the current slot is reserved for a GT channel. If the slot is reserved, the GT channel has data to transmit, and there is space in the channel's destination buffer, then the channel is granted data transmission. Otherwise,
Figure 3. Narrowcast shell (diagram; msg_req/msg_resp and connid signals pass through a Conn block that selects the point-to-point connection (cid_req) and a Resp block that orders responses (cid_resp) using the connection-id history)

Figure 4. Multi-connection shell (diagram; a Sched block selects, based on queue filling (que_fill), the connection from which request messages are consumed, and a Resp block schedules responses using the connection-id history)
`
Figure 5. Master shell (diagram; a Seq block sequentializes cmd+flags, addr and wr_data into request messages, and a Deseq block desequentializes response messages into rd_data and wr_resp)

Figure 6. Slave shell (diagram; a Deseq block desequentializes request messages into cmd+flags, addr and wr_data, and a Seq block sequentializes rd_data and wr_resp into response messages)
`
the scheduler selects a BE channel with data and remote space using some arbitration scheme, e.g., round-robin, weighted round-robin, or based on the queue filling.

To optimize the NoC utilization, it is preferable to send longer packets. To achieve this, we implemented a configurable threshold mechanism, which skips a channel as long as its sendable data is below the threshold. This is applicable for both BE and GT channels. To prevent starvation at the user/application level (e.g., due to write data on which the IP module awaits an acknowledge being buffered indefinitely), we also provide a flush signal for each channel (and a bit in the message header) to temporarily override the threshold. When the flush signal is high for a cycle, a snapshot of the source queue filling is taken, and as long as all the words in the queue at the time of flushing have not been sent, the threshold for that queue is bypassed.

A similar threshold is set for credit transmission. The reason is that, when there is no data on which the credits can be piggybacked, the credits are sent as empty packets, thus consuming extra bandwidth. To minimize the bandwidth consumed by credits, a credit threshold is set, which allows credits to be transmitted only when their sum is above the threshold. Similarly to the data case, to prevent possible starvation, we provide a flush signal to force credits to be sent even when they are below their threshold.

As credits are piggybacked on packets, a queue becomes eligible for scheduling when either the amount of sendable data is above a first threshold, or when the amount of credits is above a second threshold. However, once a queue is selected, a packet containing the largest possible amount of credits and data will be produced. Note that the amount of credits is bounded by the number of bits allocated in the packet header, and packets have a maximum length to avoid links being used exclusively by a single packet/channel, which would cause congestion.

On the outgoing path, packets are depacketized, credits are added to the counter Space, and data is stored in its corresponding queue, which is given by a queue id field in the header.
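The end-to-end flow control and the threshold rules described above can be sketched together in software. This is a simplified model: the `space` and `credit` counters follow the Space/Credit counters in the text, but the class itself is illustrative, not the Æthereal hardware.

```python
class FlowControlledChannel:
    """Credit-based flow control with data/credit thresholds and flush."""
    def __init__(self, remote_buffer_size, data_threshold, credit_threshold):
        self.queue = []                     # source queue
        self.space = remote_buffer_size     # known empty remote space
        self.credit = 0                     # credits to return to the peer
        self.data_threshold = data_threshold
        self.credit_threshold = credit_threshold
        self.flush = False

    def sendable(self):
        """At most `space` items may be sent before credits come back."""
        return min(len(self.queue), self.space)

    def eligible(self):
        """Schedulable when sendable data or accumulated credits pass
        their thresholds; flush overrides the data threshold so a short
        message the IP module waits on is not starved."""
        if self.flush and self.sendable() > 0:
            return True
        return (self.sendable() >= self.data_threshold
                or self.credit >= self.credit_threshold)

    def send_packet(self):
        """Once selected, send as much data (plus piggybacked credits)
        as possible, decrementing the space counter."""
        n = self.sendable()
        del self.queue[:n]
        self.space -= n
        piggybacked, self.credit = self.credit, 0
        return n, piggybacked

ch = FlowControlledChannel(remote_buffer_size=8,
                           data_threshold=4, credit_threshold=8)
ch.queue.extend(range(2))
assert not ch.eligible()          # 2 words: below the data threshold
ch.flush = True
assert ch.eligible()              # flush bypasses the threshold
assert ch.send_packet() == (2, 0)
assert ch.space == 6              # space shrinks until credits return
```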
`
Figure 7. Message format examples. Request message format: cmd length | flags | seq no. | trans id | address | write data 1 … write data N. Response message format: error | seq no. | trans id | read data 1 … read data N.
`
4.2 NI Shells Architectures

With the NI kernel described in the previous section, point-to-point connections (i.e., between one master and one slave) can be supported directly. These types of connections are useful in systems involving chains of modules communicating point to point with one another (e.g., video pixel processing [10]).

For more complex types of connections, such as narrowcast or multicast, and to provide conversions to other protocols, we add shells around the NI kernel. As an example, in Figure 1, we show a NI with two DTL and two AXI ports. All ports provide point-to-point connections. In addition to this, the two DTL ports provide narrowcast connections, and one DTL and one AXI port provide multicast connections. Note that these shells add specific functionality, and can be plugged in or left out at design time according to the requirements. NoC instantiation is simple, as we use an XML description to automatically generate the VHDL code for the NIs as well as for the NoC topology.

In Figure 3, we show an example of a narrowcast shell. Narrowcast connections are connections between one master and several slaves, where each transaction is executed by a single slave selected based on the address provided in the transaction [23]. Narrowcast connections provide a simple, low-cost solution for a single shared address space mapped on multiple memories. The shell implements the splitting/merging of data going to/coming from these memories.

We implement the narrowcast connection as a collection of point-to-point connections, one for each master-slave pair. Within a narrowcast connection, the slave for which a transaction is destined is selected based on the address (Conn block). The address range assigned to a slave is configurable in the narrowcast module. To provide in-order response delivery, the narrowcast shell must also keep a history of the connection identifiers of the transactions including responses (e.g., reads, and acknowledged writes), and the lengths of these responses. In-order delivery per slave of request messages is already provided by the point-to-point connections.

When a slave using a connectionless protocol (e.g., DTL) is connected to a NI port supporting multiple connections, a multi-connection shell must be included to arbitrate between the connections. A multi-connection shell (see Figure 4) includes a scheduler to select the connections from which messages are consumed, based, e.g., on their filling. As for the narrowcast, the multi-connection shell has a connection-id history for scheduling the responses.
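The address-based slave selection and the response-ordering history described above can be sketched as follows. This is illustrative only: the address ranges and connection ids are assumptions, not values from the Æthereal prototype.

```python
class Narrowcast:
    """Address-decode step of a narrowcast shell: each transaction is
    routed to exactly one slave, selected by configured address ranges;
    a history of connection ids preserves in-order response delivery
    when responses are merged back toward the master."""
    def __init__(self, ranges):
        # ranges: list of (base, limit, connection_id), one entry per slave
        self.ranges = ranges
        self.history = []   # connection ids of transactions with responses

    def route(self, address, expects_response=True):
        for base, limit, conn_id in self.ranges:
            if base <= address < limit:
                if expects_response:   # e.g. reads, acknowledged writes
                    self.history.append(conn_id)
                return conn_id
        raise ValueError("address maps to no slave")

nc = Narrowcast([(0x0000, 0x1000, "to_mem0"),
                 (0x1000, 0x2000, "to_mem1")])
assert nc.route(0x0800) == "to_mem0"
assert nc.route(0x1800) == "to_mem1"
assert nc.history == ["to_mem0", "to_mem1"]  # merge order for responses
```

Per-slave request ordering comes for free from the underlying point-to-point channels; the history is only needed to interleave responses from different slaves in issue order.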
`
`A
`
`Cfg
`
`NI1
`cfg
`
`NI1
`data
`wr path, rqid
`wr space
`wr be, enable
`wr path, rqid
`wr space
`wr be, enable
`
`wr path, rqid
`wr space
`wr be, enable
`wr path, rqid
`wr space
`wr be, enable
`
`NI2
`data
`
`NI 2
`cfg
`
`B
`
`1. Setting up
` request channel
` NI1 -> NI2
`
`Setting up
`configuration
`connection
`
`2. Setting up
` response channel
` NI2 -> NI1
`
`3. Setting up
` response channel
` A -> B
`
`Setting up
`connection
`from B to A
`
`4. Setting up
` request channel
` B -> A
`
`,
`
`w r
`
` d a t a
`r d
`
`data
`
`B can issue
`requests to A. and
`A can respond
`
`Figure 9. Connection configuration example
`
`uring NI2 (B’s NI), the previously set up configuration connection
`is used. For configuring NI1, the NI1’s configuration port is ac-
`cessed directly via Config Shell. First, the channel from the
`slave module A to the master module B is configured at NI1 (Step
`3). Second, the channel from the master module B to the slave
`module A is configured (Step 4) through messages to NI2.
`
`5 Implementation
`In the previous section, we describe a prototype of a config-
`urable NI architecture.
`In this section, we discuss the synthe-
`sized area and speed figures for the network interface components:
`NI kernel, narrowcast, multichannel and configuration shells, and
`master and slave shells for a simplified version of DTL.
`We have synthesized an instance of a NI kernel with a STU of
`8 slots, and 4 ports having 1, 1, 2, and 4 channels, respectively,
`with all queues being 32-bit wide and 8-word deep. The queues
`are area-efficient custom-made hardware fifos. We use these fi-
`fos instead of RAMs, because we need simultaneous access at all
`NI ports (possibly running at different speeds) as well as simulta-
`neous read and write access for incoming and outgoing packets,
`which cannot be offered with a single RAM. Finally, for the small
`queues needed in the NI, multiple RAMs have a too large area
`overhead. Moreover, the hardware fifos implement the clock do-
`main boundary allowing each NI port to run at a different clock
`frequency. The router side of the NI kernel runs at a frequency of
`500 MHz, which matches our prototype router frequency [21], and
`delivers a bandwidth toward the router of 16 Gbit/s in each direc-
`tion. The synthesized area for this NI-kernel instance is 0.11 mm2
`in a 0.13µm technology.
`
`B
`(master)
`
`DTL
`
`DTL
`Shells
`
`B->A
`
`A->B
`
`Shells
`DTL
`
`DTL
`
`A
`(slave)
`
`D
`(slave)
`
`DTL
`
`DTL
`Shells
`
`Router
`network
`
`NI
`kernel
`
`NI
`kernel
`
`Shells
`DTL
`
`DTL
`
`C
`(master)
`
`DTL
`
`NI2
`
`NI2->Cfg
`
`Cfg->NI2
`
`NI1
`
`Config Shell
`
`DTL
`
`Cfg
`(master)
`
`DTL
`MMIO
`
`CNIP
`(DTL MMIO)
`
`CNIP
`(DTL MMIO)
`
`Figure 8. NI configuration
`
In Figures 5 and 6, we show master and slave shells that implement a simplified version of a protocol such as AXI. The basic functionality of such a shell is to sequentialize commands and their flags, addresses, and write data into request messages, and to desequentialize messages into read data and write responses. Examples of the message structures (i.e., after sequentialization) passing between the NI shells and the NI kernel are shown in Figure 7. In full-fledged master and slave shells, more blocks would be added to implement, e.g., unbuffered writes at the master side, and read-linked and write-conditional operations at the slave side.
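The (de)sequentialization a shell performs can be pictured as flattening a transaction's fields into a stream of words, and splitting a returned message back into fields. A minimal Python sketch under assumed field widths; the field names and the 32-bit word layout here are illustrative, not the paper's exact message format:

```python
# Illustrative master-shell behavior: sequentialize a write command,
# address, and write data into a flat request message (a list of
# 32-bit words), and desequentialize a response message into fields.
WORD_MASK = 0xFFFF_FFFF  # 32-bit words, matching the assumed queue width

def sequentialize_write(cmd_flags: int, addr: int, data: list[int]) -> list[int]:
    """Flatten command/flags, address, and write data into one request message."""
    return [cmd_flags & WORD_MASK, addr & WORD_MASK] + [w & WORD_MASK for w in data]

def desequentialize_response(msg: list[int]) -> tuple[int, list[int]]:
    """Split a response message into a status word and read data."""
    return msg[0], msg[1:]

request = sequentialize_write(cmd_flags=0x1, addr=0x8000_0000, data=[0xAA, 0xBB])
status, read_data = desequentialize_response([0x0, 0xCAFE])
```

The same pattern runs in reverse at the slave shell, which desequentializes request messages and sequentializes responses.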
4.3 NI Configuration

As mentioned in Section 3, in our prototype Æthereal NoC we opt for centralized configuration. This means that there is a single configuration module that configures the whole NoC, and that slot tables can be removed from the routers. Consequently, only the NIs need to be configured when opening/closing connections.

NIs are configured via a configuration port (CNIP), which offers a memory-mapped view of all control registers in the NIs. This means that the registers in the NI are readable and writable by any master using normal read and write transactions.
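Because the CNIP exposes NI control registers as ordinary memory-mapped locations, opening a connection reduces to a handful of register writes. A hedged sketch follows; the register offsets and names (`SLOT_TABLE_BASE`, `CHANNEL_ENABLE`) are invented for illustration, as the paper does not give the register map:

```python
# Hypothetical memory-mapped view of one NI's control registers, as seen
# through its CNIP. Offsets and register names are illustrative only.
class CNIP:
    SLOT_TABLE_BASE = 0x00   # one entry per TDMA slot (assumed layout)
    CHANNEL_ENABLE  = 0x40   # per-channel enable bits (assumed layout)

    def __init__(self):
        self.regs = {}

    def write(self, offset: int, value: int):
        """An ordinary memory-mapped write transaction from any master."""
        self.regs[offset] = value

    def read(self, offset: int) -> int:
        """An ordinary memory-mapped read transaction."""
        return self.regs.get(offset, 0)

# Opening a connection: reserve slots 2 and 5, then enable channel 0.
ni = CNIP()
for slot in (2, 5):
    ni.write(CNIP.SLOT_TABLE_BASE + slot, 1)
ni.write(CNIP.CHANNEL_ENABLE, 0b1)
```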
Configuration is performed using the NoC itself (i.e., no separate control interconnect is needed for NoC configuration). Consequently, the CNIPs are connected to the NoC like any other slave (see the CNIP at NI2 in Figure 8). At the configuration module Cfg's NI, we introduce a configuration shell (Config Shell), which, based on the address, either configures the local NI (NI1) or sends configuration messages via the NoC to other NIs. The configuration shell removes the need for an extra data port at NI1 connected to NI1's CNIP.
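The config shell's dispatch amounts to an address decode: writes falling in the local NI's configuration range go straight to NI1, while all others are wrapped into configuration messages sent over the NoC to the target NI's CNIP. A minimal sketch, assuming an illustrative address map (the ranges below are not from the paper):

```python
# Illustrative config-shell dispatch: configuration writes from Cfg are
# either applied to the local NI (NI1) or forwarded as messages over the
# NoC to a remote NI's CNIP. The address ranges below are assumed.
LOCAL_CFG_BASE, LOCAL_CFG_SIZE = 0x0000, 0x1000  # NI1's register range

local_writes = []   # writes applied directly to NI1
noc_messages = []   # writes wrapped into NoC configuration messages

def config_write(addr: int, value: int):
    if LOCAL_CFG_BASE <= addr < LOCAL_CFG_BASE + LOCAL_CFG_SIZE:
        local_writes.append((addr, value))    # configure NI1 directly
    else:
        noc_messages.append((addr, value))    # forward via NoC to a remote CNIP

config_write(0x0040, 1)   # lands in NI1's range: applied locally
config_write(0x2040, 1)   # outside the range: e.g., destined for NI2's CNIP
```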
In Figure 9, we show the necessary steps in setting up a connection between two modules
