`
`
`
`
`
`
`
`
`
`
`
`
`
`DECLARATION OF GORDON MACPHERSON
`
`I, Gordon MacPherson, am over twenty-one (21) years of age. I have never been
`convicted of a felony, and I am fully competent to make this declaration. I declare the following
`to be true to the best of my knowledge, information and belief:
`
`1. I am Director, Board Governance & Policy Development of The Institute of Electrical
`and Electronics Engineers, Incorporated (“IEEE”).
`
`2. IEEE is a neutral third party in this dispute.
`
`3. I am not being compensated for this declaration and IEEE is only being reimbursed
`for the cost of the article I am certifying.
`
`4. Among my responsibilities as Director, Board Governance & Policy Development, I
`act as a custodian of certain records for IEEE.
`
`5. I make this declaration based on my personal knowledge and information contained
`in the business records of IEEE.
`
`6. As part of its ordinary course of business, IEEE publishes and makes available
`technical articles and standards. These publications are made available for public
`download through the IEEE digital library, IEEE Xplore.
`
`7. It is the regular practice of IEEE to publish articles and other writings including
`article abstracts and make them available to the public through IEEE Xplore. IEEE
`maintains copies of publications in the ordinary course of its regularly conducted
`activities.
`
`8. The article below has been attached as Exhibit A to this declaration:
`
`
`A. A. Radulescu, et al.; “An efficient on-chip network interface offering
`guaranteed services, shared-memory abstraction, and flexible network
`configuration”, published in Proceedings Design, Automation and Test in
`Europe Conference and Exhibition, date of conference February 16-20,
`2004.
`
`
`
`
`
`9. I obtained a copy of Exhibit A through IEEE Xplore, where it is maintained in the
`ordinary course of IEEE’s business. Exhibit A is a true and correct copy of the
`Exhibit, as it existed on or about October 30, 2023.
`
`
`445 Hoes Lane Piscataway, NJ 08854
`
`DocuSign Envelope ID: D785B3EF-83B0-43A2-820A-BE0763E7DBDE
`
`Samsung Ex. 1017
`Page 1
`
`
`
`
`
10. The article and abstract from IEEE Xplore show the date of publication. IEEE
Xplore populates this information using the metadata associated with the publication.
`
`11. A. Radulescu, et al.; “An efficient on-chip network interface offering guaranteed
`services, shared-memory abstraction, and flexible network configuration”, published
`in Proceedings Design, Automation and Test in Europe Conference and Exhibition,
`date of conference February 16-20, 2004. Copies of the conference proceedings were
`made available no later than the last day of the conference. The article is currently
`available for public download from the IEEE digital library, IEEE Xplore.
`
`12. I hereby declare that all statements made herein of my own knowledge are true and
`that all statements made on information and belief are believed to be true, and further
`that these statements were made with the knowledge that willful false statements and
`the like are punishable by fine or imprisonment, or both, under 18 U.S.C. § 1001.
`
`I declare under penalty of perjury that the foregoing statements are true and correct.
`
`
`
`
`
`
`
`
`
`
Executed on: 10/30/2023
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`EXHIBIT A
`
`
`
`
`
`
`An Efficient On-Chip Network Interface Offering Guaranteed Services,
`Shared-Memory Abstraction, and Flexible Network Configuration
Andrei Rădulescu, John Dielissen, Kees Goossens, Edwin Rijpkema, and Paul Wielage
`Philips Research Laboratories, Eindhoven, The Netherlands
`
`Abstract
`
`In this paper we present a network interface for an on-chip
`network. Our network interface decouples computation from com-
`munication by offering a shared-memory abstraction, which is in-
`dependent of the network implementation. We use a transaction-
`based protocol to achieve backward compatibility with existing
`bus protocols such as AXI, OCP and DTL. Our network interface
`has a modular architecture, which allows flexible instantiation. It
`provides both guaranteed and best-effort services via connections.
These are configured via network interface ports using the network
itself, instead of a separate control interconnect. An example
instance of this network interface with 4 ports has an area of
0.143 mm² in a 0.13 µm technology, and runs at 500 MHz.
`
1 Introduction
`Networks on chip (NoC) have been proposed as a solution to
`the interconnect problem for highly complex chips [2, 3, 5, 9, 12,
14, 15, 17, 21, 27]. NoCs help in designing chips in several ways:
they (a) structure and manage wires in deep submicron technologies
[2, 3, 9, 12, 21], (b) allow good wire utilization through sharing
[5, 9, 12, 21], (c) scale better than buses [14, 21], (d) can be
energy efficient and reliable [2, 5], and (e) decouple computation
from communication through well-defined interfaces, enabling IP
modules and interconnect to be designed in isolation, and to be
integrated more easily [2, 13, 21, 24].
`Networks are composed of routers, which transport the data
`from one place to another, and network interfaces (NI), which im-
`plement the interface to the IP modules. In a previous article [21],
`we have shown the trade-offs in designing a cost-effective router
`combining guaranteed with best-effort traffic. In this paper, we
`focus on the other network component, the network interface.
`Network interface design has received considerable attention
`for parallel computers [8,25], and computer networks [6,7]. These
designs are optimized for performance (high throughput, low latency),
and often consist of a dedicated processor and a large
amount of buffering. As a consequence, their cost is too large
to be applicable on chip.
`On-chip network interfaces must provide a low-area overhead,
`because the size of IP modules attached to the NoC is relatively
`small. Designs of network interfaces with a low area have been
`proposed [4, 28]. However, they do not provide throughput or
`latency guarantees, which are essential for a compositional con-
`struction of complex SoCs.
`Our NI is intended for systems on chip (SoC), hence, it must
`have a low area. To enable the reuse of existing IP modules, we
must provide a smooth transition from buses to NoCs. A shared-memory
abstraction via transactions (e.g., read, write) ensures this.
`Further, we also have to provide a simple and flexible configura-
`tion, preferably using the NoC itself to avoid the need for a sepa-
`rate scalable interconnect.
`We achieve a low-cost implementation of the NI by implement-
`ing the protocol stack in hardware, and by exploiting on-chip char-
`acteristics (such as the absence of transmission errors, relatively
`static configuration, tight synchronization) to implement only the
`relevant parts of a complete OSI stack. A hardware implementa-
`tion of the protocol stack provides a much lower latency overhead
`compared to a software implementation. Further, a hardware im-
`plementation allows both hardware and software cores to be reused
`without change [4].
`Our NI provides services at the transport layer in the ISO-OSI
`reference model [22], because this is the first layer where offered
`services are independent of the network implementation. This
`is a key ingredient in achieving the decoupling between com-
`putation and communication [16, 24], which allows IP modules
`and interconnect to be designed independently from each other.
`We provide transport-layer services by defining connections (e.g.,
`point-to-point or multicast) configured for specific properties (e.g.,
`throughput, ordering).
`We offer guaranteed services as they are essential for a com-
`positional construction (design and programming) of SoC. The
`reasons are that they limit the possible interactions of IPs with
`the communication environment [12, 13], separate the IP require-
`ments and their implementation, and make application quality of
`service independent of the IP and NoC implementations. Exam-
`ples of such guarantees are lower bounds on throughput, and upper
`bounds on latency.
`Our NoC, called Æthereal, offers a shared-memory abstrac-
`tion to the IP modules. Communication is performed using a
`transaction-based protocol, where master IP modules issue request
`messages (e.g., read and write commands at an address, possibly
`carrying data) that are executed by the addressed slave modules,
`which may respond with a response message (i.e., status of the
`command execution, and possibly data) [23]. We adopt this proto-
`col to provide backward compatibility to existing on-chip commu-
`nication protocols (e.g., AXI [1], OCP [18], DTL [19]), and also
`to allow future protocols better suited to NoCs.
`We provide a modular NI, which can be configured at design
time. That is, the number of ports and their type (i.e., configuration
`port, master port, or slave port), the number of connections at each
`port, memory allocated for the queues, the level of services per
`port, and the interface to the IP modules are all configurable at
`design (instantiation) time using an XML description [11].
`The NI allows flexible NoC configuration at run time. Each
`connection can be configured individually, requiring configurable
NoC components (i.e., router and NI). However, instead of using
a separate control interconnect to program them, the NoC is used
to program itself. This is performed through configuration ports
using DTL-MMIO (memory-mapped IO) transactions [19]. The
NoC can be configured in a distributed fashion (i.e., via multiple
configuration ports), or centralized (i.e., via a single port).

Proceedings of the Design, Automation and Test in Europe Conference and Exhibition (DATE’04)
1530-1591/04 $20.00 © 2004 IEEE
Authorized licensed use limited to: IEEE Staff. Downloaded on October 30,2023 at 16:50:25 UTC from IEEE Xplore. Restrictions apply.

The paper is organized as follows. In the next section, the
services that we implement, and the interface offered to the IP
`modules are described. In Section 3, we show that NoCs can be
`configured both in a distributed and in a centralized way, and we
`present the trade-offs between the two approaches. In Section 4,
`we present a modular network interface architecture, which is split
`into a kernel, providing core functionality, and a number of shells
`to extend functionality, e.g., wrappers to provide an interface to
`existing bus protocols, such as AXI or DTL. In this section, we
`also show how the NI allows NoC configuration using the NoC
`itself as opposed to via a separate control interconnect. In Sec-
`tion 5, we demonstrate the feasibility of our network interface de-
`sign through a prototype implementation in a 0.13µm technology,
`and we conclude in Section 6.
`2 NoC Services
`As mentioned in the previous section, the communication ser-
`vices of the Æthereal NoC are defined to meet the following
`goals: (a) decouple computation (IP modules) from communica-
`tion (NoC), (b) provide backward compatibility to existing bus
`protocols, (c) provide support for real-time communication, and
`(d) have a low-cost implementation.
`Decoupling computation from communication is a key ingredi-
`ent in managing the complexity of designing chips with billions of
`transistors, because it allows the IP modules and the interconnect
`to be designed independently [16, 24]. In NoCs, this decoupling
`is achieved by positioning the network services at the transport
`level [3, 21] or above in the ISO-OSI reference model [22]. At the
transport level, the offered services are end to end between
communicating IP modules, thus hiding the network internals, such
as topology and routing scheme.
`Backward compatibility with existing protocols, such as AXI
`or DTL, is achieved by using a model based on transactions [23].
`In a transaction-based model, there are two types of IP modules:
masters and slaves. Masters initiate transactions by issuing requests,
which can be further split into commands and write data
(corresponding to the address and write signal groups in AXI).
`Examples of commands are read, and write. One or more slaves
`receive and execute each transaction. Optionally, a transaction can
`also include a response issued by the slave to the master to return
`data or an acknowledgment of the transaction execution (corre-
`sponding to the read data and write response groups in AXI).
`In the Æthereal NoC, all these signals are sequentialized in
`request and response messages, which are supplied to the NoC,
`where they are transported by means of packets. Sequentialization
`is performed to reduce the number of wires, increasing their uti-
`lization, and to simplify arbitration. Packetization is performed by
`the NI, and is thus transparent to the IP modules.
`The Æthereal NoC offers its services on connections, which
`can be point to point (one master, one slave), multicast (one mas-
`ter, multiple slaves, all slaves executing each transaction), and
`narrowcast (one master, multiple slaves, a transaction is executed
by only one slave) [23]. Connections are composed of unidirectional
point-to-point channels (between a single master and a single
slave). To each channel, properties are attached, such as guaranteed
message delivery or not, in order or un-ordered message
delivery, and with or without timing guarantees. As a result, different
properties can be attached to the request and response parts
`of a connection, or for different slaves within the same connection.
`Connections can be opened and closed at any time. Opening and
`closing of connections takes time, and is intended to be performed
`at a granularity larger than individual transactions.
`Support for real-time communication is achieved by providing
`throughput, latency and jitter guarantees. In Æthereal, this is im-
`plemented by configuring connections as pipelined time-division-
`multiplexed circuits over the network. Time multiplexing is only
`possible when the network routers have a notion of synchronic-
`ity which allows slots to be reserved consecutively in a sequence
`of routers [13, 21]. This scheme [21] has smaller packet buffers,
`and, hence, has lower implementation cost compared to alterna-
`tives, such as rate-based packet switching [29], or deadline-based
`packet switching [20].
Throughput guarantees are given by the number of slots reserved
for a connection. Each slot corresponds to a given bandwidth
B_i; therefore, reserving N slots for a connection results in a
total bandwidth of N × B_i. The latency bound is given by the waiting
time until the reserved slot arrives and the number of routers
the data passes to reach its destination. Jitter is given by the maximum
distance between two slot reservations.
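As a worked illustration of these three guarantees, the following minimal sketch derives them from a hypothetical slot reservation. The function name, the encoding of the reservation as a list of slot indices, and the assumption that each router hop adds one slot of delay are illustrative choices, not details taken from the Æthereal implementation.

```python
def slot_guarantees(reserved, table_size, slot_bandwidth, hops):
    """Derive throughput, a latency bound and jitter from a slot reservation.

    reserved       -- slot indices reserved for the connection (assumed encoding)
    table_size     -- number of slots in the revolving slot table
    slot_bandwidth -- bandwidth that a single slot corresponds to
    hops           -- number of routers the data passes (assumed: one slot each)
    """
    slots = sorted(set(s % table_size for s in reserved))
    if not slots:
        raise ValueError("at least one slot must be reserved")

    # Throughput: N reserved slots give N times the per-slot bandwidth.
    throughput = len(slots) * slot_bandwidth

    # Distance, in slots, from each reservation to the next one,
    # wrapping around the end of the table.
    gaps = [
        (slots[(k + 1) % len(slots)] - s - 1) % table_size + 1
        for k, s in enumerate(slots)
    ]

    # Jitter: the maximum distance between two slot reservations.
    jitter = max(gaps)

    # Latency bound (in slots): worst-case wait for a reserved slot,
    # taken conservatively as the largest gap, plus the hops to traverse.
    latency_bound = jitter + hops
    return throughput, latency_bound, jitter
```

For example, reserving slots 0, 2 and 3 of an 8-slot table gives three times the per-slot bandwidth, and the largest gap (from slot 3 around to slot 0) determines both the jitter and the waiting part of the latency bound.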
Protocol stacks, which are used in networks to implement communication
services, incur additional cost compared to buses.
`Protocol stacks are necessary in networks to manage the complex-
`ity of networks, and to offer differentiated services. The pressure
`to keep the protocol stack small is higher on-chip than off-chip,
`because the size of the IP modules attached to the NoC is rela-
`tively small. However, for NoCs, the protocol stacks can be re-
`duced by exploiting the on-chip characteristics (e.g., no transfer
`errors, short wires) [23]. In the Æthereal NoC, we optimize the
`performance and minimize the cost of the protocol stack by im-
`plementing it in hardware, rather than in software. We support this
`claim in Section 5.
`3 Network Configuration
`Before the Æthereal NoC can be used by an application, it must
`be configured. NoC (re)configuration means opening and closing
`connections in the system. Connections are set up depending on
`the application or the mode the system is running. Therefore, we
`must be able to open and close connections while the system is
`running. (Re)configuration can be partial or total (some or all con-
`nections are opened/closed, respectively).
`Opening a connection involves setting several registers, and al-
`locating shared resources (for more details see Section 4). In the
`case of the current prototype of the Æthereal NoC, for each pair
`of one master and one slave of a connection, there are 5 and 3
`registers written at the master and slave network interfaces, re-
`spectively. The shared resources consist of the slots allocated to
`the connections. These slots can be configured using either a dis-
`tributed or a centralized model.
`In the distributed case, a connection can be opened/closed from
multiple network interface ports. Multiple configuration operations
can be performed simultaneously; however, potential conflicts
must also be resolved (e.g., connection configurations initiated
`at two configuration ports may try to reserve the same slot in a
`router). Information about the slots is maintained in the routers,
`which also accept or reject a tentative slot allocation.
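The distributed model can be sketched as follows, under stated assumptions: each router keeps a slot table and accepts or rejects a tentative allocation, and a connection claims consecutive slots along its path. The class and function names, the rollback strategy on rejection, and the one-slot-per-hop offset are hypothetical and serve only to illustrate how a conflict between two configuration ports could be detected.

```python
class Router:
    """Router holding its own slot information (distributed model)."""

    def __init__(self, table_size):
        self.owner = [None] * table_size  # slot index -> connection id

    def try_reserve(self, slot, conn):
        # Accept the tentative allocation only if the slot is still free.
        if self.owner[slot] is not None:
            return False
        self.owner[slot] = conn
        return True

    def release(self, slot, conn):
        if self.owner[slot] == conn:
            self.owner[slot] = None


def open_connection(path, first_slot, conn):
    """Try to reserve consecutive slots along a path of routers.

    Returns True on success. On a rejection, all slots claimed so far
    are released again (assumed rollback policy) and False is returned.
    """
    table_size = len(path[0].owner)
    claimed = []
    for hop, router in enumerate(path):
        slot = (first_slot + hop) % table_size
        if not router.try_reserve(slot, conn):
            for r, s in claimed:
                r.release(s, conn)
            return False
        claimed.append((router, slot))
    return True
```

Two configuration ports that attempt to use the same slot in a shared router are serialized by that router: the first request is accepted and the second is rejected and must retry with another slot.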
In a centralized system, there is only one place that performs
NoC configuration. In such a case, the slot information can be
stored in the configuration module instead of the routers, which
simplifies the design, and, in the case of small NoCs, may even
speed up configuration. For large NoCs, however, centralized
configuration can introduce a bottleneck.
In the initial prototype of the Æthereal NoC, we opt for centralized
configuration, because it is able to satisfy the needs of a small
NoC (around 10 routers), and has a simpler design and lower cost.
We use transactions to program the NoC, both for connection
registers in the NIs, and for the slot information. We present details
of how NoC configuration is performed in Section 4.

4 Network Interface Architecture
The network interface (NI) is the component that provides the
conversion of the packet-based communication of the NoC to the
higher-level protocol that IP modules use. We split the design of
the network interface in two parts (see Figure 1): (a) the NI kernel,
which implements the channels, packetizes messages and schedules
them to the routers, implements the end-to-end flow control,
and the clock domain crossing, and (b) the NI shells, which implement
the connections (e.g., narrowcast, multicast), transaction
ordering for connections, and other higher-level issues specific to
the protocol offered to the IP.

[Figure 1. NI kernel and shells. Labels: network interface (NI), NI
kernel, NI kernel ports, router, network, narrowcast and multicast
shells, DTL and AXI adapters, NI ports (two DTL, two AXI), user.]

4.1 NI Kernel Architecture
The NI kernel (see Figure 2) receives and provides messages,
which contain the data provided by the IP modules via their protocol
after sequentialization. The message structure may vary depending
on the protocol used by the IP module. However, the
message structure is irrelevant for the NI kernel, as it just sees
messages as pieces of data to be transported over the NoC.
The NI kernel communicates with the NI shells via ports. At
each port, point-to-point connections can be configured, their maximum
number being selected at NI instantiation time. A port can
have multiple connections to allow differentiated traffic classes,
in which case there are also connid signals to select on which
connection a message is supplied or consumed.
In the NI kernel, there are two message queues for each point-to-point
connection (one source queue, for messages going to the
NoC, and one destination queue, for messages coming from the
NoC). Their size is also selected at the NI instantiation time. In our
NI, queues are implemented using custom-made hardware fifos,
and are also used to provide the clock domain crossing between
the network and the IP modules. Each port can, therefore, have a
different clock frequency.

[Figure 2. Network interface kernel. Labels: data ports (msg, connid,
chid), memory-mapped config. port, clock domain boundary, source
queues, destination queues, Path, BE/GT, Limit, Space, Credit, STU,
Scheduler, Pck, Depck, router port (pck).]

Each channel is configured individually. In a first prototype
of the Æthereal NI, we can configure if a channel provides time
guarantees (GT) or not (we call this best effort, BE), reserve slots
for GT connections, configure the end-to-end flow control, and the
routing information.
End-to-end flow control ensures that no data is sent unless
there is enough space in the destination buffer to accommodate
it. This is implemented using credits [26]. For each channel, there
is a counter (Space) tracking the empty buffer space of the remote
destination queue. This counter is initialized with the remote
buffer size. When data is sent from the source queue, the counter
is decremented. When data is consumed by the IP module at the
other side, credits are produced in a counter (Credit) to indicate
that more empty space is available. These credits are sent to the
producer of data to be added to its Space counter. In the Æthereal
prototype, we piggyback credits in the header of the packets
for the data in the other direction to improve NoC efficiency. Note
that at most Space data items can be transmitted before credits
are received. We call the minimum between the data items in the
queue and the value in the counter Space, the sendable data.
From the source queues, data is packetized (Pck) and sent to
the NoC via a single link. A packet header consists of the routing
information (NI address for destination routing, and path for
source routing), remote queue id (i.e., the queue of the remote NI
in which the data will be stored), and piggybacked credits.
Because multiple channels may require data transmission,
we implement a scheduler to arbitrate between them. The
scheduler checks if the current slot is reserved for a GT channel.
If the slot is reserved, if the GT channel has data which can
be transmitted, and if there is space in the channel’s destination
buffer, then the channel is granted data transmission. Otherwise,
the scheduler selects a BE channel with data and remote space
using some arbitration scheme: e.g. round-robin, weighted round-robin,
or based on the queue filling.

[Figure 3. Narrowcast shell]
[Figure 4. Multi-connection shell]
[Labels in Figures 3 and 4: Conn, Sched, Resp, msg_req, msg_resp,
connid, cid_req, cid_resp, que_fill.]
[Figure 5. Master shell]
[Figure 6. Slave shell]
[Labels in Figures 5 and 6: Seq, Deseq, cmd+flags, addr, wr_data,
rd_data, wr_resp, msg.]
[Figure 7, in part: request message format with fields cmd, length,
flags, seq no., trans id, address, write data 1 ... write data N;
response message fields error, seq no., trans id.]

`To optimize the NoC utilization, it is preferable to send longer
`packets. To achieve this, we implemented a configurable threshold
`mechanism, which skips a channel as long as the sendable data
`is below the threshold. This is applicable for both BE and GT
`channels. To prevent starvation at user/application level (e.g., due
`to write data being buffered indefinitely on which the IP module
`waits for an acknowledge), we also provide a flush signal for each
`channel (and a bit in the message header) to temporarily override
`the threshold. When the flush signal is high for a cycle, a snapshot
`of its source queue filling is taken, and as long as all the words in
`the queue at the time of flushing have not been sent, the threshold
`for that queue is bypassed.
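The interplay of slot reservation, threshold and flush can be sketched as below. This is a minimal model under stated assumptions: the `Channel` fields, the `schedule` and `send` helpers, and the use of plain round-robin for best-effort channels are hypothetical names and choices made for illustration, not the hardware design described in the paper.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    name: str
    guaranteed: bool     # True for a GT channel, False for BE
    queued: int = 0      # words waiting in the source queue
    space: int = 0       # Space counter: free words in the remote queue
    threshold: int = 1   # minimum sendable data before scheduling
    flush_left: int = 0  # words still covered by a flush snapshot

    def sendable(self):
        # Sendable data: minimum of queue filling and Space counter.
        return min(self.queued, self.space)

    def flush(self):
        # Snapshot the queue filling; the threshold is bypassed
        # until these words have been sent.
        self.flush_left = self.queued

    def eligible(self):
        s = self.sendable()
        return s > 0 and (s >= self.threshold or self.flush_left > 0)


def schedule(slot_owner, channels, rr_next=0):
    """Pick the channel to serve in the current slot, or None.

    slot_owner is the GT channel the current slot is reserved for
    (None if the slot is unreserved). BE channels are served
    round-robin starting at index rr_next (assumed arbitration).
    """
    if slot_owner is not None and slot_owner.guaranteed and slot_owner.eligible():
        return slot_owner
    best_effort = [c for c in channels if not c.guaranteed]
    for k in range(len(best_effort)):
        candidate = best_effort[(rr_next + k) % len(best_effort)]
        if candidate.eligible():
            return candidate
    return None


def send(channel, max_words):
    """Send up to max_words from the channel and update its counters."""
    n = min(channel.sendable(), max_words)
    channel.queued -= n
    channel.space -= n
    channel.flush_left = max(0, channel.flush_left - n)
    return n
```

In this sketch a BE channel holding a single word stays unscheduled while its threshold is 4, and becomes eligible as soon as its flush is raised.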
`A similar threshold is set for credit transmission. The reason
`is that, when there is no data on which the credits can be pig-
`gybacked, the credits are sent as empty packets, thus, consuming
`extra bandwidth. To minimize the bandwidth consumed by cred-
`its, a credit threshold is set, which allows credits to be transmitted
`only when their sum is above the threshold. Similarly to the data
`case, to prevent possible starvation, we provide a flush signal to
`force credits to be sent even when they are below their threshold.
As credits are piggybacked on packets, a queue becomes eligible
for scheduling when either the amount of sendable data is
above a first threshold, or when the amount of credits is above
a second threshold. However, once a queue is selected, a packet
containing the largest possible amount of credits and data will be
produced. Note that the amount of credits is bounded by the
implementation to the given number of bits in the packet header, and
packets have a maximum length to avoid links being used exclusively
by a packet/channel, which would cause congestion.
`On the outgoing path, packets are depacketized, credits are
`added to the counter Space, and data is stored in its correspond-
`ing queue, which is given by a queue id field in the header.
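The credit-based end-to-end flow control of this section can be illustrated with a small software model of the Space and Credit counters. The class and method names are hypothetical, packets are modeled as dictionaries, and the limit on the number of credit bits in the header is deliberately ignored; the sketch only shows how the counters move.

```python
class ChannelEnd:
    """One NI side of a channel pair: data goes out, credits come back."""

    def __init__(self, remote_queue_size):
        self.space = remote_queue_size  # Space: free words in the remote queue
        self.credit = 0                 # Credit: words consumed, not yet reported
        self.source = []                # source queue (towards the NoC)
        self.dest = []                  # destination queue (from the NoC)

    def make_packet(self, max_words):
        # Send at most the sendable data: min(queue filling, Space).
        n = min(len(self.source), self.space, max_words)
        payload, self.source = self.source[:n], self.source[n:]
        self.space -= n
        # Piggyback all pending credits in the packet header.
        credits, self.credit = self.credit, 0
        return {"credits": credits, "data": payload}

    def receive_packet(self, packet):
        # Depacketize: credits raise Space, data goes to the destination queue.
        self.space += packet["credits"]
        self.dest.extend(packet["data"])

    def consume(self, n):
        # The IP module consumes words, which produces credits.
        words, self.dest = self.dest[:n], self.dest[n:]
        self.credit += len(words)
        return words
```

With a remote queue of two words, a sender holding three words can transmit only two of them; the third is released once the receiver has consumed a word and returned the corresponding credit.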
`
[Figure 7. Message format examples. Recoverable from the response
message format: read data 1 ... read data N.]
`
`4.2 NI Shells Architectures
With the NI kernel described in the previous section, point-to-point
connections (i.e., between one master and one slave) can be
supported directly. These types of connections are useful in systems
involving chains of modules communicating point to point with
one another (e.g., video pixel processing [10]).
`For more complex types of connections, such as narrowcast or
`multicast, and to provide conversions to other protocols, we add
`shells around the NI kernel. As an example, in Figure 1, we show
`a NI with two DTL and two AXI ports. All ports provide point-to-
`point connections. In addition to this, the two DTL ports provide
`narrowcast connections, and one DTL and one AXI port provide
`multicast connections. Note that these shells add specific function-
`ality, and can be plugged in or left out at design time according to
`the requirements. NoC instantiation is simple, as we use an XML
`description to automatically generate the VHDL code for the NIs
`as well as for the NoC topology.
`In Figure 3, we show an example of a narrowcast shell. Nar-
`rowcast connections are connections between one master and sev-
`eral slaves, where each transaction is executed by a single slave
`selected based on the address provided in the transaction [23].
Narrowcast connections provide a simple, low-cost solution for a
single shared address space mapped on multiple memories. The shell
implements the splitting/merging of data going to/coming from these
memories.
`We implement the narrowcast connection as a collection of
`point-to-point connections, one for each master-slave pair. Within
`a narrowcast connection, the slave for which the transaction is des-
`tined is selected based on the address (Conn block). The address
`range assigned to a slave is configurable in the narrowcast mod-
`ule. To provide in-order response delivery, the narrowcast must
`also keep a history of connection identifiers of the transactions in-
`cluding responses (e.g., reads, and acknowledged writes), and the
`length of these responses. In-order delivery per slave of request
`messages is already provided by the point-to-point connections.
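A minimal sketch of this behavior, assuming a software model in which address ranges are given as (base, limit, connection id) tuples and responses arrive in per-connection queues, is shown below. The class name, method names and data layout are hypothetical; only the address-based selection and the connection-identifier history follow the description above.

```python
from collections import deque


class NarrowcastShell:
    def __init__(self, ranges):
        # ranges: list of (base, limit, connection id), limit exclusive.
        self.ranges = ranges
        # History of (connection id, response length) in issue order.
        self.history = deque()

    def select(self, address):
        # Conn block: pick the slave connection from the address.
        for base, limit, conn in self.ranges:
            if base <= address < limit:
                return conn
        raise ValueError("address not mapped to any slave")

    def issue(self, address, response_length):
        conn = self.select(address)
        # Only transactions that include a response are remembered.
        if response_length > 0:
            self.history.append((conn, response_length))
        return conn

    def next_response(self, pending):
        """Return the next in-order response, or None if it has not arrived.

        pending maps a connection id to a deque of received response words.
        """
        if not self.history:
            return None
        conn, length = self.history[0]
        if len(pending[conn]) < length:
            return None  # wait: an earlier transaction is still outstanding
        self.history.popleft()
        return [pending[conn].popleft() for _ in range(length)]
```

If a later transaction to a second slave is answered first, its response is held back until the earlier transaction to the first slave has been answered, which preserves the order seen by the master.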
`When a slave using a connectionless protocol (e.g., DTL) is
`connected to a NI port supporting multiple connections, a multi-
`connection shell must be included to arbitrate between the connec-
`tions. A multi-connection shell (see Figure 4) includes a scheduler
`to select connections from which messages are consumed, based
`e.g., on their filling. As for the narrowcast, the multi-connection
`shell has a connection id history for scheduling the responses.
`
`
`
`
[Figure 9. Connection configuration example. Participants: A, Cfg,
NI1 cfg, NI1 data, NI2 data, NI2 cfg, B. Setting up the configuration
connection: (1) request channel NI1 -> NI2, (2) response channel
NI2 -> NI1. Setting up the connection from B to A: (3) response
channel A -> B, (4) request channel B -> A. The register writes shown
are "wr path, rqid", "wr space" and "wr be, enable". Afterwards, B can
issue requests to A, and A can respond.]
`
`uring NI2 (B’s NI), the previously set up configuration connection
`is used. For configuring NI1, the NI1’s configuration port is ac-
`cessed directly via Config Shell. First, the channel from the
`slave module A to the master module B is configured at NI1 (Step
`3). Second, the channel from the master module B to the slave
`module A is configured (Step 4) through messages to NI2.
`
`5 Implementation
In the previous section, we described a prototype of a configurable
NI architecture. In this section, we discuss the synthesized
area and speed figures for the network interface components:
NI kernel, narrowcast, multichannel and configuration shells, and
master and slave shells for a simplified version of DTL.
`We have synthesized an instance of a NI kernel with a STU of
`8 slots, and 4 ports having 1, 1, 2, and 4 channels, respectively,
`with all queues being 32-bit wide and 8-word deep. The queues
`are area-efficient custom-made hardware fifos. We use these fi-
`fos instead of RAMs, because we need simultaneous access at all
`NI ports (possibly running at different speeds) as well as simulta-
`neous read and write access for incoming and outgoing packets,
which cannot be offered with a single RAM. For the small
queues needed in the NI, multiple RAMs would have too large an area
overhead. Moreover, the hardware fifos implement the clock domain
boundary, allowing each NI port to run at a different clock
frequency. The router side of the NI kernel runs at a frequency of
500 MHz, which matches our prototype router frequency [21], and
delivers a bandwidth toward the router of 16 Gbit/s in each direction.
The synthesized area for this NI-kernel instance is 0.11 mm²
in a 0.13 µm technology.
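The quoted link bandwidth can be checked with simple arithmetic, assuming that the link toward the router transfers one 32-bit word per clock cycle; this word width is an assumption that is consistent with the 32-bit queues but is not stated explicitly for the link. The helper name below is hypothetical, and the per-slot share further assumes that bandwidth is divided evenly over the 8 slots of the STU.

```python
def link_bandwidth_gbit(word_bits, clock_mhz):
    """Raw link bandwidth in Gbit/s for one word per clock cycle."""
    return word_bits * clock_mhz / 1000


# 32 bits per cycle at 500 MHz.
link = link_bandwidth_gbit(32, 500)

# Even share of one slot in an 8-slot table (assumed division).
per_slot = link / 8
```

Under these assumptions the link value matches the 16 Gbit/s reported above, and a single reserved slot would correspond to 2 Gbit/s.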
`
[Figure 8. NI configuration. Labels: modules B (master), A (slave),
C (master), D (slave) and Cfg (master), attached via DTL and shells
to the NI kernels of NI1 and NI2, which are connected through the
router network; Config Shell and DTL MMIO at Cfg; a CNIP (DTL MMIO)
at each NI; channels B->A, A->B, Cfg->NI2, NI2->Cfg.]
`
In Figures 5 and 6, we show master and slave shells that implement
a simplified version of a protocol such as AXI. The basic
functionality of such a shell is to sequentialize commands and their
flags, addresses, and write data into request messages, and to
desequentialize messages into read data and write responses. Examples
of the message structures (i.e., after sequentialization) passing
between the NI shells and the NI kernel are shown in Figure 7. In
full-fledged master and slave shells, more blocks would be added to
implement, e.g., the unbuffered writes at the master side, and read
linked, write conditional at the slave side.
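Sequentialization of a request can be sketched as packing the signal groups into a list of 32-bit words. The field names follow the request message format of Figure 7 (cmd, length, flags, seq no., trans id, address, write data); the bit widths, the field order within the header word, and the command encodings are assumptions made for this example only.

```python
CMD_READ, CMD_WRITE = 0, 1  # hypothetical command encodings


def sequentialize(cmd, flags, seq_no, trans_id, address, write_data=()):
    """Pack a request into words: header, address, then write data.

    Assumed header layout (32 bits): cmd[31:28], length[27:20],
    flags[19:16], seq_no[15:8], trans_id[7:0].
    """
    length = len(write_data)
    header = (
        ((cmd & 0xF) << 28)
        | ((length & 0xFF) << 20)
        | ((flags & 0xF) << 16)
        | ((seq_no & 0xFF) << 8)
        | (trans_id & 0xFF)
    )
    return [header, address & 0xFFFFFFFF] + [w & 0xFFFFFFFF for w in write_data]


def desequentialize(words):
    """Recover the request fields from a sequentialized message."""
    header, address, *data = words
    length = (header >> 20) & 0xFF
    if length != len(data):
        raise ValueError("length field does not match the write data")
    return {
        "cmd": (header >> 28) & 0xF,
        "flags": (header >> 16) & 0xF,
        "seq_no": (header >> 8) & 0xFF,
        "trans_id": header & 0xFF,
        "address": address,
        "write_data": data,
    }
```

The NI kernel would treat the resulting word list as opaque data; only the shells at both ends need to agree on the layout.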
`4.3 NI Configuration
`As mentioned in Section 3, in our prototype Æthereal NoC, we
`opt for centralized configuration. This means that there is a single
`configuration module that configures the whole NoC, and that slot
`tables can be removed from the routers. Consequently, only the
`NIs need to be configured when opening/closing connections.
`NIs are configured via a configuration port (CNIP), which of-
`fers a memory-mapped view on all control registers in the NIs.
`This means that the registers in the NI are readable and writable
`by any master using normal read and write transactions.
`Configuration is performed using the NoC itself (i.e., there is
`no separate control interconnect needed for NoC configuration).
`Consequently, the CNIPs are connected to the NoC like any other
`slave (see CNIP at NI2 in Figure 8). At the configuration module
`Cfg’s NI, we introduce a configuration shell (Config Shell), which,
`based on the address configures the local NI (NI1), or sends con-
`figuration messages via the NoC to other NIs. The configuration
`shell optimizes away the need for an extra data port at NI1 to be
`connected to the NI1’s CNIP.
`In Figure 9, we show the necessary steps in setting up a connec-
`tion between two module