IPR2015-01353, No. 1017 Exhibit - Benini Networks on Chips (P.T.A.B. Jun. 8, 2015)

Networks on Chips:
`A New SoC
`Paradigm
`
`On-chip micronetworks, designed with a layered methodology, will
`meet the distinctive challenges of providing functionally correct,
`reliable operation of interacting system-on-chip components.
`
`Luca Benini
`University of
`Bologna
`
`Giovanni
`De Micheli
`Stanford University
`
`System-on-chip (SoC) designs provide inte-
`grated solutions to challenging design
`problems in the telecommunications, mul-
`timedia, and consumer electronics do-
`mains. Much of the progress in these fields
`hinges on the designers’ ability to conceive complex
`electronic engines under strong time-to-market
`pressure. Success will rely on using appropriate
`design and process technologies, as well as on the
`ability to interconnect existing components—
`including processors, controllers, and memory
`arrays—reliably, in a plug-and-play fashion.
`By the end of the decade, SoCs, using 50-nm tran-
`sistors operating below one volt, will grow to 4 bil-
`lion transistors running at 10 GHz, according to the
`International Technology Roadmap for Semicon-
`ductors. The major challenge designers of these sys-
`tems must overcome will be to provide for function-
`ally correct, reliable operation of the interacting com-
`ponents. On-chip physical interconnections will pre-
`sent a limiting factor for performance and, possibly,
`energy consumption.
`Silicon technologies face other challenges.
`Synchronization of future chips with a single clock
`source and negligible skew will be extremely diffi-
`cult, if not impossible. The most likely synchro-
`nization paradigm for
`future chips—globally
`asynchronous and locally synchronous—involves
`using many different clocks. In the absence of a sin-
`gle timing reference, SoC chips become distributed
`systems on a single silicon substrate. Global con-
`trol of the information traffic is unlikely to succeed
`because the system needs to keep track of each com-
`ponent’s states. Thus, components will initiate data
`
`Computer
`
`transfers autonomously, according to their needs.
`The global communication pattern will be fully dis-
`tributed, with little or no global coordination.
`As SoC complexity scales, capturing the system’s
`functionality with fully deterministic operation
`models will become increasingly difficult. As global
`wires span multiple clock domains, synchroniza-
`tion failures in communicating between different
`domains will be rare but unavoidable events.1
`Moreover, energy and device reliability concerns
`will impose small logic swings and power supplies,
`most likely less than one volt. Electrical noise due
`to crosstalk, electromagnetic interference, and radi-
`ation-induced charge injection will likely produce
`data errors, also called upsets. Thus, transmitting
`digital values on wires will be inherently unreliable
`and nondeterministic. Other causes of nondeter-
`minism include design components with a high level
`of abstraction and coarse granularity and distrib-
`uted communication control.
`
`Focusing on using probabilistic metrics such as
`average values or variance to quantify design objec-
`tives such as performance and power will lead to a
`major change in design methodologies. Overall,
`SoC design will be based on both deterministic and
`stochastic models. Creating complex SoCs requires
`a modular, component-based approach to both
`hardware and software design.
`Based on the premise that interconnect technology
`will be the limiting factor for achieving SoCs’ opera-
`tional goals, we postulate that the layered design of
`reconfigurable micronetworks, which exploits the
`methods and tools used for general networks, can
`best achieve efficient communication on SoCs.
`
`0018-9162/02/$17.00 © 2002 IEEE
`
`APPLE 1017
`

`Wiring Delays
`
`Projections for future silicon technologies show that chip size
`will scale up slightly while gate delays decrease compared to
`wiring delays. A simple computation shows that delays on wires
`that span the chip will extend longer than the clock period. This
`trend is a trivial consequence of the finite propagation speed of
`electromagnetic waves, which is $v = 0.3/\sqrt{\varepsilon}$ mm per
`picosecond in a homogeneous medium with relative permittivity
`$\varepsilon$. In 50 nm technology, the projected chip die edge will be
`around 22 mm, with a clock frequency of 10 GHz.
`Thus, the delay for a signal traversing the chip diagonally will
`be approximately 100 picoseconds, or one clock period, in the
`ideal case that $\varepsilon = 1$. A lower bound of two clock periods
`applies to general media with $\varepsilon > 1$.1 Obviously, signal
`propagation on real-life interconnections is much slower than this lower
`bound, and optimistic predictions estimate propagation delays for
`highly optimized global wires, taking wire sizing and buffering
`into account, to be between six and 10 clock cycles for chips
`made using 50 nm technology.2
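
The ideal-case estimate in this sidebar can be checked with a few lines of arithmetic. The sketch below assumes a square die, so the diagonal is the edge length times the square root of two. The function names are ours, and the relative permittivity of 4 in the second call is an illustrative value rather than one given in the sidebar.

```python
import math

C_MM_PER_PS = 0.3  # speed of light in vacuum, about 0.3 mm per picosecond


def diagonal_delay_ps(die_edge_mm: float, rel_permittivity: float = 1.0) -> float:
    """Ideal time of flight across the diagonal of a square die, in ps."""
    velocity = C_MM_PER_PS / math.sqrt(rel_permittivity)  # mm per ps
    diagonal = die_edge_mm * math.sqrt(2.0)               # square die assumed
    return diagonal / velocity


def delay_in_clock_periods(delay_ps: float, clock_ghz: float) -> float:
    """Express a delay as a multiple of the clock period."""
    period_ps = 1000.0 / clock_ghz
    return delay_ps / period_ps


ideal = diagonal_delay_ps(22.0)        # relative permittivity of 1
slower = diagonal_delay_ps(22.0, 4.0)  # assumed relative permittivity of 4
print(ideal, delay_in_clock_periods(ideal, 10.0))
print(slower, delay_in_clock_periods(slower, 10.0))
```

With a 22 mm edge and a 10 GHz clock, the ideal case comes out at roughly one clock period, and quadrupling the permittivity doubles the delay, since velocity scales with the inverse square root of the permittivity.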
`
`References
`1. D. Sylvester and K. Keutzer, “A Global Wiring Paradigm for Deep
`Submicron Design,” IEEE Trans. CAD/ICAS, Feb. 2000, pp. 242-
`252.
`
`2. R. Ho, K. Mai, and M. Horowitz, “The Future of Wires,” Proc.
`IEEE, Apr. 2001, pp. 490-504.
`
`
`
`Network engineers have already gained experi-
`ence with using stochastic techniques and models
`for large-scale designs. We propose borrowing
`models, techniques, and tools from the network
`design field and applying them to SoC design.
`We view a SoC as a micronetwork of compo-
`nents. The network is the abstraction of the com-
`munication among components and must satisfy
`quality-of-service requirements—such as reliabil-
`ity, performance, and energy bounds—under the
`limitation of intrinsically unreliable signal trans-
`mission and significant communication delays on
`wires. We propose using the micronetwork stack
`paradigm, an adaptation of the protocol stack
`shown in Figure 1,2 to abstract the electrical, logic,
`and functional properties of the interconnection
`scheme.
`SoCs differ from wide area networks in their local
`proximity and because they exhibit less nondeter-
`minism. Local, high-performance networks, such
`as those developed for large-scale multiprocessors,
`have similar requirements and constraints. Some
`distinctive characteristics, such as energy constraints
`and design-time specialization, are unique to SoC
`networks, however.
`Whereas computation and storage energy greatly
`benefit from device scaling, which provides smaller
`gates and memory cells, the energy for global com-
`munication does not scale down. On the contrary,
`as the “Wiring Delays” sidebar indicates, projec-
`tions based on current delay optimization tech-
`niques for global wires3 show that global on-chip
`communication will require increasingly higher
`energy consumption. Hence, minimizing the energy
`used for communications will be a growing con-
`cern in future technologies. Further, network traf-
`fic control and monitoring can help better manage
`the power that networked computational resources
`consume. For example, the clock speed and volt-
`age of end nodes can vary according to available
`network bandwidth.
`
`Figure 1. Protocol stack from which the micronetwork stack paradigm can
`be adapted. Bottom up, the layers span increasing design abstraction
`levels. Layers, from top to bottom: software (application, system);
`architecture and control (transport, network, data link); physical
`(wiring).
`
`Another facet of the SoC network design prob-
`lem, design-time specialization, raises many new
`challenges. Macroscopic networks emphasize gen-
`eral-purpose communication and modularity.
`Communication network design has traditionally
`been decoupled from specific end applications and
`is strongly influenced by standardization and com-
`patibility constraints in legacy network infrastruc-
`tures. In SoC networks, these constraints are less
`restrictive because developers design the communi-
`cation network fabric on silicon from scratch. Thus,
`only the abstract network interface for the end
`nodes requires standardization. Developers can tai-
`lor the network architecture itself to the applica-
`tion, or class of applications, the SoC design targets.
`We thus envision a vertical design flow in which
`every layer of the micronetwork stack is special-
`ized and optimized for the target application
`domain. Such an application-specific on-chip net-
`work-synthesis paradigm represents an open and
`exciting research field. Specialization does not
`imply complete loss of flexibility, however. From a
`design standpoint, network reconfigurability will
`be key in providing plug-and-play component use
`because the components will interact with one
`another through reconfigurable protocols.
`
`January 2002
`

`ON-CHIP SIGNAL TRANSMISSION
`
`Wires are the physical realization of com-
`munication channels in SoCs and, for our
`purposes, buses function as wire ensembles.
`Intensive research4 into on-chip wiring has
`resulted in the commercial development of
`several physical design tools to support auto-
`mated wiring. Nevertheless, coping with
`global wires that span significant distances,
`such as those beyond one millimeter, requires
`a paradigm shift.
`Most likely, the reverse-scaled global wires
`will be routed on the top metal layers pro-
`vided by the technology. Wiring pitch and
`width increase in higher wiring levels so that wires
`at top levels can be much wider and thicker than
`low-level wires.5 Increased width reduces wire resis-
`tance, even considering the skin effect, while
`increased spacing around the wire prevents capac-
`itance growth. At the same time, inductance effects
`increase relative to resistance and capacitance. As
`a result, future global wires will function as lossy
`transmission lines,1 as opposed to today’s lumped
`or distributed resistance-capacitance models.
`In addition to facilitating high-speed communi-
`cation, reducing the voltage swing also has a ben-
`eficial effect on power dissipation. Reduced-swing,
`current-mode transmission requires careful receiver
`design, with good adaptation to line impedance and
`high-sensitivity sensing, possibly with the help of
`sense amplifiers.
`When using current technologies, most chip
`developers assume that electrical waveforms always
`carry correct on-chip information. Guaranteeing
`error-free information transfer at the physical level
`on global on-chip wires will become more difficult
`for several reasons.6 Signal swings will be reduced
`and noise—due to crosstalk, electromagnetic inter-
`ference, and other factors—will have increased
`impact. Thus, it will not be possible to abstract the
`physical layer of on-chip networks as a fully reliable,
`fixed-delay channel. At the micronetwork stack lay-
`ers atop the physical layer, noise is a source of local
`transient malfunctions. An upset is the abstraction of
`such malfunctions. Upset probability can vary over
`different physical channels and over time.
`In current designs, wiring-related effects are unde-
`sirable parasitics, and designers use specific, detailed
`physical techniques to reduce or cancel them. A
`well-balanced design should not try to achieve ideal
`wire behavior at the physical layer because the cor-
`responding cost in performance, energy efficiency,
`and modularity may be too high. Physical-layer
`design should find a compromise between satisfy-
`ing competing quality metrics and providing a clean
`and complete abstraction of channel characteristics
`for the micronetwork layers above.
`
`MICRONETWORK ARCHITECTURE AND CONTROL
`The architecture specifies the interconnection net-
`work’s topology and physical organization, while
`the protocols specify how to use network resources
`during system operation. Whereas both micronet-
`work and general network design must meet per-
`formance requirements, the need to satisfy tight
`energy bounds differentiates on-chip network
`implementations.
`
`Interconnection network architecture
`
`On-chip networks relate closely to interconnec-
`tion networks for high-performance parallel com-
`puters with multiple processors, in which each
`processor is an individual chip. Like multiprocessor
`interconnection networks, nodes are physically
`close to each other and have high link reliability.
`Further, developers have traditionally designed
`multiprocessor interconnections under stringent
`bandwidth and latency constraints to support effec-
`tive parallelization.7 Similar constraints will drive
`micronetwork design.
`Shared-medium networks. Most current SoCs have
`a shared-medium architecture, which has the sim-
`plest interconnect structures. In this architecture,
`all communication devices share the transmission
`medium. Only one device can drive the network
`at a time. These networks support broadcast as
`well, an advantage for the highly asymmetric com-
`munication that occurs when information flows
`from few transmitters to many receivers. Within
`current technologies, the backplane bus is the most
`common example of an on-chip, shared-medium
`structure. This convenient, low-overhead inter-
`connection handles a few active bus masters and
`many passive bus slaves that only respond to bus
`master requests.
`We need bus arbitration mechanisms when sev-
`eral processors attempt to use the bus simultane-
`ously. A bus arbiter module performs centralized
`arbitration in current on-chip buses. A processor
`seeking to communicate must first gain bus mas-
`tership from the arbiter. Because this process
`implies a control transaction and communication
`performance loss, arbitration should be as fast and
`rare as possible.
`Together with arbitration, the response time of
`slow bus slaves may cause serious performance
`losses because the bus remains idle while the mas-
`ter waits for the slave to respond. To minimize the
`bandwidth consumption, developers have devised
`split transaction protocols for high-performance
`buses. In these protocols, the network releases bus
`mastership upon request completion, and the slave
`must gain access to the bus to respond, possibly
`several bus cycles later. Thus, the bus can support
`multiple outstanding transactions.
`Obviously, bus masters and bus interfaces for
`split-transaction buses are more complex than
`those for simple atomic-transaction buses. For
`example, developers chose a 128-bit split-transac-
`tion bus for the Lucent Daytona chip,8 a multi-
`processor on a chip that contains four 64-bit
`processing elements that generate transactions of
`different sizes. To improve bus-bandwidth utiliza-
`tion and minimize the average latency caused by
`simultaneous requests, the bus partitions large
`transfers into smaller packets.
`Although well understood and widely used,
`shared-medium architectures have seriously lim-
`ited scalability. The bus-based organization remains
`convenient for current SoCs that integrate fewer
`than five processors and, rarely, more than 10 bus
`masters. Energy inefficiency is another critical lim-
`itation of shared-medium networks. In these archi-
`tectures, every data transfer is broadcast, meaning
`the data must reach each possible receiver at great
`energy cost. Future integrated systems will contain
`tens to hundreds of units generating information
`that must be transferred. For such systems, a bus-
`based network would become a critical perfor-
`mance and power bottleneck.
`
`Direct and indirect networks. The direct or point-
`to-point network overcomes the scalability prob-
`lems of shared-medium networks. In this archi-
`tecture, each node directly connects to a limited
`number of neighboring nodes. These on-chip com-
`putational units contain a network interface block,
`often called a router, that handles communication
`and directly connects to neighboring nodes’
`routers. Direct interconnect networks are popular
`for building large-scale systems because the total
`communication bandwidth also increases when the
`number of nodes in the system increases.
`The Raw Architecture Workstation (RAW) archi-
`tecture9 is an example of a direct network imple-
`mentation derived from a fully programmable SoC
`consisting of an array of identical computational
`tiles with local storage. Full programmability means
`that the compiler can program both the function of
`each tile and the interconnections among them.
`The term RAW derives from the “raw” hard-
`ware’s full exposure to the compiler. To accomplish
`programmable communication, each tile has
`a router. The compiler programs the routers
`on all tiles to issue a sequence of commands
`that determines exactly which set of wires
`connect at every cycle. Moreover, the com-
`piler pipelines the long wires to support high
`clock frequency.
`Indirect or switch-based networks offer an
`alternative to direct networks for scalable
`interconnection design. In these networks, a
`connection between nodes must go through a set of
`switches. The network adapter associated with each
`node connects to a switch’s port. Switches them-
`selves do not perform information processing—they
`only provide a programmable connection between
`their ports, setting up a communication path that
`can change over time.7 Significantly, the distinction
`between direct and indirect networks is blurring as
`routers in direct networks and switches in indirect
`networks become more complex and absorb each
`other’s functionality. As the “Virtex II FPGA” side-
`bar indicates, some field-programmable gate arrays
`are examples of indirect networks on chips.
`Hybrid networks. Introducing a controlled amount
`of nonuniformity in communication-network
`design provides several advantages. Multiple-back-
`plane and hierarchical buses are two notable exam-
`ples of the many heterogeneous or hybrid
`interconnection architectures that developers have
`proposed and implemented. These architectures
`cluster tightly coupled computational units with
`high communication bandwidth and provide lower
`bandwidth intercluster communication links.
`Because they use a fraction of the communication
`resources and energy to provide performance com-
`parable with homogeneous, high-bandwidth archi-
`tectures, energy efficiency is a strong driver toward
`using hybrid architectures.10
`
`Micronetwork control
`
`Using micronetwork architectures effectively
`requires relying on protocols—network control
`algorithms that are often distributed. Network
`control dynamically manages network resources
`during system operation, striving to provide the
`required quality of service. Following the micro-
`network stack layout shown in Figure 1, we
`describe the three architecture-and-control lay-
`ers—data link, network, and transport—from the
`bottom up.
`Data link layer. The physical layer is an unreliable
`digital link in which the probability of bit upsets is
`non-null. Data-link protocols increase the relia-
`bility of the link, up to a minimum required level,
`

`Virtex II FPGA
`
`Most current field-programmable gate arrays consist of a
`homogeneous fabric of programmable elements connected by a
`switch-based network. FPGAs can be seen as the archetype of
`future programmable SoCs: They contain many interconnected
`computing elements. Current FPGA communication networks differ from
`future SoC micronetworks in granularity and homogeneity.
`
`Figure A. Xilinx Virtex II, a field-programmable gate array architecture
`that exemplifies an indirect network over a heterogeneous fabric.
`
`Processing elements in traditional FPGAs implement simple
`bit-level functional blocks. Thus, communication channels in
`FPGAs are functionally equivalent to wires that connect logic
`gates. Because future SoCs will house complex processing ele-
`ments, interconnects will carry much coarser quantities of infor-
`mation. The different granularity of computational elements and
`communication requirements has far-reaching consequences for
`the complexity of the network interface circuitry associated with
`each communication channel. Interface circuitry and network
`control policies must be kept extremely simple for FPGAs, while
`they can be much more complex when supporting coarser-grain
`information transfers. The increased complexity will introduce
`greater degrees of freedom for optimizing communication as well.
`The concept of dynamically reconfiguring FPGAs applies well
`to micronetwork design. SoCs benefit from programmability
`in the field to match, for example, environmental constraints.
`This programmability also lets runtime reconfiguration adapt,
`for example, to a varying workload. Reconfigurable micronet-
`works exploit programmable routers, switches, or both. Their
`embodiment may leverage multiplexers whose control signals
`are set—as with FPGAs—by configuration bits in local storage.
`For example, Figure A shows the Xilinx Virtex II FPGA with
`various configurable elements to support reconfigurable digi-
`tal-signal-processor design. The internal configurable rectan-
`gular array contains configurable logic blocks (CLBs), random
`access memories (RAMs), multipliers (MUL), switches (SWT),
`I/O buffers (IOB), and digital clock managers (DCM). Routing
`switches facilitate programmable interconnection. Each pro-
`grammable element connects to a switch matrix, allowing mul-
`tiple connections to the general routing matrix. Values stored
`in static memory cells control all programmable elements,
`including the routing resources. Thus, Virtex II exemplifies an
`indirect network over a heterogeneous fabric.
`
`under the assumption that the physical layer by
`itself is not sufficiently reliable.
`In a shared-medium network, contention creates
`an additional error source. Contention resolution,
`fundamentally a nondeterministic process, is an
`additional noise source because it requires syn-
`chronization of a distributed system. In general,
`synchronization can virtually eliminate nondeter-
`minism at the price of some performance loss. For
`example, centralized bus arbitration eliminates con-
`tention-induced errors in a synchronous bus but
`the slow bus clock and bus request-and-release
`cycles impose a substantial performance penalty.
`Packetizing data deals effectively with commu-
`nication errors. Sending data on an unreliable chan-
`nel in packets makes error containment and
`recovery easier because the packet boundaries con-
`tain the effect of errors and allow error recovery on
`a packet-by-packet basis. Using error-correcting
`codes that add redundancy to the transferred infor-
`mation can achieve error correction at the data link
`layer. Packet-based error-detection and -recovery
`protocols that have been developed for traditional
`networks, such as alternating-bit, go-back-N, and
`selective repeat, can complement error correction.2
`Several parameters in these protocols, such as
`packet size and number of outstanding packets, can
`be adjusted to achieve maximum performance at a
`specified residual error probability, within given
`energy consumption bounds, or both.
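
To make the interplay between packet size and upset probability concrete, the following is a minimal sketch of an alternating-bit (stop-and-wait) transfer over a link that flips bits at random. It is an illustration under stated assumptions, not a description of any particular on-chip protocol: the frame layout, the use of CRC-32 as the error-detecting code, the error-free acknowledgment path, and all function names are ours.

```python
import random
import zlib


def make_packet(seq_bit: int, payload: bytes) -> bytes:
    """Frame layout (assumed): 1-byte sequence bit, payload, 4-byte CRC-32."""
    body = bytes([seq_bit]) + payload
    return body + zlib.crc32(body).to_bytes(4, "big")


def check_packet(frame: bytes):
    """Return (seq_bit, payload) if the CRC matches, otherwise None."""
    if len(frame) < 5:
        return None
    body, trailer = frame[:-4], frame[-4:]
    if zlib.crc32(body).to_bytes(4, "big") != trailer:
        return None
    return body[0], body[1:]


def noisy_link(frame: bytes, upset_prob: float, rng: random.Random) -> bytes:
    """Flip each bit independently with probability upset_prob."""
    out = bytearray(frame)
    for i in range(len(out)):
        for bit in range(8):
            if rng.random() < upset_prob:
                out[i] ^= 1 << bit
    return bytes(out)


def send_message(data: bytes, packet_size: int, upset_prob: float, seed: int = 0):
    """Alternating-bit transfer; returns (received bytes, frames transmitted)."""
    rng = random.Random(seed)
    received = bytearray()
    transmissions = 0
    seq = 0
    for start in range(0, len(data), packet_size):
        chunk = data[start:start + packet_size]
        while True:
            transmissions += 1
            frame = noisy_link(make_packet(seq, chunk), upset_prob, rng)
            result = check_packet(frame)
            # Assumption: acknowledgments are never lost, so a frame that
            # passes the CRC with the expected bit is accepted immediately.
            if result is not None and result[0] == seq:
                received += result[1]
                break
        seq ^= 1  # alternate the sequence bit for the next packet
    return bytes(received), transmissions
```

Calling `send_message` with different `packet_size` values at a fixed `upset_prob` shows the tradeoff the text describes: larger packets amortize the framing overhead but are retransmitted more often.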
`
`Network layer. This layer implements end-to-end
`delivery control in network architectures with
`many communication channels. In most current
`on-chip networks, all processing elements connect
`to the same channel: the on-chip bus, leaving the
`network layer empty. However, when a collection
`of links connects the processing elements, we must
`decide how to set up connections between succes-
`sive links and route information from its source to
`the final destination. Developers have studied these
`switching and routing tasks extensively in the con-
`text of both multiprocessor interconnects7 and gen-
`eral communication networks.2
`Switching algorithms can be grouped into three
`classes: circuit, packet, and cut-through switching.7
`These approaches trade off better average delivery
`time and channel utilization for increased variance
`and decreased predictability. The low latency of
`cut-through switching schemes will likely make
`them preferable for on-chip micronetworks from
`a performance standpoint. However, aggressive for-
`warding of data through switches can increase traf-
`fic and contention, which may waste energy.
`Depending on the application domain, nondeter-
`minism can be more or less tolerable.
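
A first-order latency model illustrates why cut-through switching is attractive. The formulas below are a common contention-free approximation and are not taken from this article. The function names, the one-word header, the one-word-per-cycle links, and the per-hop router delay are assumptions made for illustration, and "store-and-forward" stands for packet switching in which each switch buffers the whole packet.

```python
def store_and_forward_cycles(hops: int, words: int, router_delay: int = 1) -> int:
    """Each switch receives the whole packet before forwarding it."""
    return hops * (router_delay + words)


def cut_through_cycles(hops: int, words: int, router_delay: int = 1) -> int:
    """The header is forwarded at once; the remaining words follow in a pipeline."""
    return hops * (router_delay + 1) + (words - 1)


for hops in (1, 4, 8):
    print(hops, store_and_forward_cycles(hops, 16), cut_through_cycles(hops, 16))
```

In this model the two schemes coincide for a single hop, and the advantage of cut-through grows with the hop count. The model ignores contention, which is exactly where cut-through can waste energy and bandwidth on blocked packets.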
`Switching is tightly coupled to routing. Routing
`algorithms establish the path a message follows
`through the network to its final destination.
`Classifying, evaluating, and comparing on-chip
`routing schemes7 requires analyzing several trade-
`offs, such as
`
`• predictability versus average performance,
`• router complexity and speed versus achievable
`channel utilization, and
`• robustness versus aggressiveness.
`
`We can make a coarse distinction between deter-
`ministic and adaptive routing algorithms. Deter-
`ministic approaches always supply the same path
`between a given source-destination pair and offer
`the best choice for uniform or regular traffic pat-
`terns. In contrast, adaptive approaches use infor-
`mation about network traffic and channel con-
`ditions to avoid congested network regions. An
`adaptive approach is preferable when dealing with
`irregular traffic or in networks with unreliable nodes
`and links.
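
The distinction can be sketched on a two-dimensional mesh, a topology we assume here purely for illustration. With no congestion information, the function below performs dimension-order (X-then-Y) routing, a standard deterministic scheme. Given a congestion map, it becomes a simple minimal adaptive router that picks the less loaded of the moves that bring the packet closer to its destination. The function names and the congestion dictionary are hypothetical.

```python
def productive_moves(cur, dst):
    """Neighbors of cur that are one step closer to dst (X move listed first)."""
    (x, y), (dx, dy) = cur, dst
    moves = []
    if x != dx:
        moves.append((x + (1 if dx > x else -1), y))
    if y != dy:
        moves.append((x, y + (1 if dy > y else -1)))
    return moves


def route(src, dst, congestion=None):
    """Return the list of nodes visited from src to dst.

    congestion is None: deterministic X-then-Y routing.
    congestion is a dict node -> load: choose the least loaded productive move.
    """
    path = [src]
    cur = src
    while cur != dst:
        moves = productive_moves(cur, dst)
        if congestion is None:
            cur = moves[0]
        else:
            cur = min(moves, key=lambda node: congestion.get(node, 0))
        path.append(cur)
    return path


print(route((0, 0), (2, 1)))               # same path on every call
print(route((0, 0), (2, 1), {(1, 0): 5}))  # steers around the loaded node
```

The deterministic variant always returns the same path for a given source-destination pair, while the adaptive variant's path depends on the load it observes.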
`
`We conjecture that future on-chip micronetwork
`designs will emphasize speed and decentralization
`of routing decisions. Robustness and fault toler-
`ance will also be highly desirable. These factors,
`and the observation that traffic patterns for spe-
`cial-purpose SoCs tend to be irregular, seem to
`favor adaptive routing. However, when traffic pre-
`dictability is high and nondeterminism is undesir-
`able, deterministic routing may be the best choice.
`The “SPIN Micronetwork” sidebar describes a
`micronetwork that uses deterministic routing.11
`
`Transport layer. Atop the network layer, the trans-
`port layer decomposes messages into packets at
`the source. It also resequences and reassembles the
`messages at the destination. Packetization granu-
`larity presents a critical design decision because
`
`SPIN Micronetwork
`
`The Scalable, Programmable, Integrated Network (SPIN) on-chip
`micronetwork defines packets as sequences of 32-bit words, with the
`packet header fitting in the first word. SPIN uses a byte in the header to
`identify the destination, allowing the network to scale up to 256 termi-
`nal nodes. Other bits carry packet tagging and routing information, and
`the packet payload can be of variable size. A trailer—which does not con-
`tain data, but a checksum for error detection—terminates every packet.
`SPIN has a packetization overhead of two words. The payload should
`thus be significantly larger than two words to amortize the overhead.
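
The packet format described above can be sketched as follows. The sidebar states only that one header byte identifies the destination and that a trailer carries a checksum. Placing the destination in the most significant byte, treating the remaining 24 header bits as an opaque tag, and using a word-wise XOR as the checksum are our assumptions, as are all the names.

```python
WORD_MASK = 0xFFFFFFFF


def build_packet(dest: int, tag: int, payload: list[int]) -> list[int]:
    """Return header word, payload words, and trailer word (all 32-bit)."""
    if not 0 <= dest <= 0xFF:
        raise ValueError("destination must fit in one byte (256 terminals)")
    if not 0 <= tag <= 0xFFFFFF:
        raise ValueError("tag must fit in the remaining 24 header bits")
    header = (dest << 24) | tag            # assumed bit placement
    words = [header] + [w & WORD_MASK for w in payload]
    checksum = 0
    for w in words:
        checksum ^= w                      # assumed checksum: word-wise XOR
    return words + [checksum]


def is_intact(packet: list[int]) -> bool:
    """XOR of all words, trailer included, is zero for an undamaged packet."""
    acc = 0
    for w in packet:
        acc ^= w
    return acc == 0


def destination(packet: list[int]) -> int:
    return packet[0] >> 24


def overhead_fraction(payload_words: int) -> float:
    """Share of the packet taken by the header and trailer words."""
    return 2 / (payload_words + 2)


print(overhead_fraction(2), overhead_fraction(30))
```

A two-word payload spends half of the packet on overhead, which is why the payload should be much larger than the two overhead words.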
`The SPIN micronetwork adopts cut-through switching to minimize
`message latency and storage requirements in the design of network
`switches. However, it provides some extra buffering space on output links
`to store data from blocked packets. Figure B shows SPIN’s fat-tree net-
`work architecture, which derives its name from the progressively increas-
`ing communication bandwidth toward the root. The architecture is
`nonblocking when packet size is limited to a single word. Because pack-
`ets can span more than one switch, SPIN’s blocking is a side effect of cut-
`through switching alone.
`
`
`
`
`
`Figure B. SPIN architecture. R blocks are switches, N blocks are nodes.
`
`SPIN uses deterministic routing, with routing decisions set by the net-
`work architecture. In fat-tree networks, tree routing is the algorithm of
`choice. The network routes packets from a node, or tree leaf, toward the
`tree root until they reach a switch that is a common ancestor with the
`destination node. At that point, the network routes the packet toward
`the destination by following the unique path between the ancestor and
`destination nodes.
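
Tree routing as described above can be sketched on a logical tree. The sketch numbers the leaves consecutively and identifies each switch by its level and index. It abstracts away the replicated switches that give a fat tree its extra bandwidth, and the arity of 4 is an assumed default rather than a figure from this sidebar.

```python
def tree_route(src: int, dst: int, arity: int = 4) -> list[tuple[int, int]]:
    """Switches visited from leaf src to leaf dst, as (level, index) pairs.

    Level 1 is the switch layer directly above the leaves.
    """
    if src == dst:
        return []
    up, down = [], []
    level, s, d = 0, src, dst
    while s != d:
        level += 1
        s //= arity           # ancestor of the source at this level
        d //= arity           # ancestor of the destination at this level
        up.append((level, s))
        down.append((level, d))
    # The last entries of up and down are the same common ancestor, so the
    # downward leg skips it and replays the rest in reverse order.
    return up + down[-2::-1]


print(tree_route(0, 5))  # climbs two levels, then descends one switch
print(tree_route(0, 1))  # siblings share their level-1 switch
```

Because the climb stops at the first common ancestor, nearby nodes exchange packets without loading the upper levels of the tree.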
`
`most network-control algorithms are highly sen-
`sitive to packet size. Most macroscopic networks
`standardize packets to facilitate internetworking,
`extensibility, and the compatibility of the net-
`working hardware that different manufacturers
`produce. Packet standardization constraints can
`be relaxed in SoC micronetworks, which can be
`customized at design time.
`In general, either deterministic or statistical pro-
`cedures can provide the basis for flow control and
`negotiation. Deterministic approaches ensure that
`traffic meets specifications, and they provide hard
`bounds on delays or message losses. Deterministic
`techniques have the disadvantage of being based on
`worst cases, however, and they generally lead to sig-
`nificant underutilization of network resources.
`Statistical techniques offer more efficient resource
`utilization, but they cannot provide worst-case
`guarantees.
`
`The Silicon Backplane Micronetwork
`(http://www.sonicsinc.com), a shared-med-
`ium bus based on time-division multiplexing,
`offers an example of transport layer issues in
`micronetwork design. When a node wants to
`communicate, it must issue a request to the
`arbiter during a time slot. If arbitration is
`favorable, it may be granted access in the fol-
`lowing time slot. Hence, arbitration intro-
`duces a nondeterministic waiting time in
`transmission. To reduce nondeterminism, the
`micronetwork protocol provides a form of
`slot reservation: Nodes can reserve a fraction
`of the available time slots, thereby allocating bus
`bandwidth deterministically.
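
The effect of slot reservation can be sketched with a toy model of one time-division frame. This is not the Silicon Backplane protocol itself: the round-robin policy for unreserved slots, the data structures, and the node names are assumptions, and the one-slot delay between request and grant is not modeled.

```python
from collections import deque


def run_frame(num_slots: int, reserved: dict, requesters: list) -> list:
    """Return the owner of each slot in one frame (None if the slot idles).

    reserved maps a slot index to the node that owns it in every frame.
    requesters lists the nodes competing for the unreserved slots.
    """
    queue = deque(requesters)
    owners = []
    for slot in range(num_slots):
        if slot in reserved:
            owners.append(reserved[slot])  # deterministic allocation
        elif queue:
            node = queue.popleft()         # assumed policy: round-robin
            owners.append(node)
            queue.append(node)
        else:
            owners.append(None)
    return owners


frame = run_frame(4, {0: "cpu", 2: "cpu"}, ["dma", "dsp"])
print(frame)
```

Here the hypothetical `cpu` node is guaranteed half of the bus bandwidth regardless of how many other nodes request access, while the remaining slots are shared by arbitration.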
`The theoretical framework developed for large-scale networks provides a
`convenient environment for reasoning about on-chip micronetworks as
`well. Micronetwork design remains largely unexplored, and further work
`is needed to predict the tradeoff curves in this
`space. We also believe that this area offers signifi-
`cant room for innovation: On-chip micronetwork
`architectures and protocols can be tailored to spe-
`cific system configurations and application classes.
`Further, the impact of network design and control
`decisions on communication energy presents an
`important research theme that will become criti-
`cal as communication energy consumption scales
`up in SoC architectures.
`
`
`SOFTWARE LAYERS
`Network architectures and control algorithms
`constitute the infrastructure and provide commu-
`nication services to the end nodes, which are pro-
`grammable in most cases. The software layers for
`SoCs include system and application programs.
`
`System software
`The operating system captures the system pro-
`grams that support SoC operation. System support
`software in current SoCs usually consists of ad hoc
`routines designed for a specific integrated core
`processor under the assumption that a processor
`provides global, centralized system control. In
`future SoCs, the prevailing paradigm will be peer-
`to-peer interaction among several possibly hetero-
`geneous processing elements. Thus, we think that
`system software will be designed as a modular dis-
`tributed system. Each programmable component
`will be provided with system software to support its
`own operation, manage its communication with
`the micronetwork, and interact effectively with
`neighboring components’ system software.
`Seamless composition of micronetwork compo-
`nents will require system software that is config-
`urable according to the network’s requirements.
`System software configuration may be achieved in
`various ways, ranging from manual adaptation to
`automatic configuration. One end of the spectrum
`favors software optimization and compactness
`while the other end favors ease of design and fast
`turnaround time. With this vision, on-chip com-
`munication protocols should be programmable at
`the system software level to adapt the underlying
`layers to the components’ characteristics.
`Most SoCs are dedicated to a specific application,
`and system software seeks to provide the required
`quality of service within the physical constraints of
`that application. Consider, for example, an SoC for
`a wireless mobile video terminal. Quality of service
`relates to the video quality, which implies specific
`computation, storage element, and micronetwork
`performance levels. Constraints relate to the
`strength and signal-to-noise ratio of the radio-frequency
`signal and to the energy available in the
`battery. Thus, the system software must provide
`high performance by orchestrating the information
`processing within the service stations and optimiz-
`ing information flow. Moreover, the software should
`achieve this task while minimizing energy con-
`sumption.
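
The trade-off described above, meeting a quality-of-service target while spending as little energy as possible, can be sketched in a few lines. This is only an illustration under stated assumptions: the `ServiceLevel` type, the frame-rate and power figures, and the battery capacity are all hypothetical and do not come from the article.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ServiceLevel:
    """One operating point of a component (all figures are hypothetical)."""
    name: str
    throughput_fps: float  # video frames per second the unit can sustain
    power_mw: float        # average power drawn at this operating point


def cheapest_level(levels: Sequence[ServiceLevel],
                   required_fps: float) -> Optional[ServiceLevel]:
    """Return the lowest-power level that still meets the required frame rate.

    Returns None when no level is fast enough, i.e. the requested
    quality of service cannot be delivered by this component.
    """
    feasible = [lv for lv in levels if lv.throughput_fps >= required_fps]
    if not feasible:
        return None
    return min(feasible, key=lambda lv: lv.power_mw)


def playback_hours(battery_mwh: float, level: ServiceLevel) -> float:
    """Battery lifetime in hours if the component stays at the given level."""
    return battery_mwh / level.power_mw


# Hypothetical operating points for a video decoder core.
DECODER_LEVELS = [
    ServiceLevel("low", 10.0, 40.0),
    ServiceLevel("medium", 20.0, 90.0),
    ServiceLevel("high", 30.0, 160.0),
]

chosen = cheapest_level(DECODER_LEVELS, required_fps=15.0)
```

With these made-up numbers, a 15 frames-per-second target selects the "medium" level, and a 900 mWh battery would last 10 hours at that level. A real system would also have to account for the radio link and the micronetwork, which this sketch leaves out.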
`The system software provides an abstraction of
`the underlying hardware platform. We can view the
`system as a queuing network of service stations.
`Each service station models a computational or
`storage unit, while the queuing network abstracts
`the micronetwork. Moreover, we can assume the
`following:
`
`• Each service station can operate at various
`service levels, providing corresponding per-
`formance and energy consumption levels.
`This approach abstracts the physical imple-
`mentation of components with adjustable
`voltage or frequency levels, or both, along
`with the ability to disable their functions in
`full or in part.
`• The system software can control the information
`flow between the various units to provide
`the appropriate quality of service. This func
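
The queuing-network abstraction can be made concrete with a small model. The sketch below rests on assumptions that are not in the article: stations are arranged in series, jobs arrive as a Poisson stream, and each station behaves as an M/M/1 queue, so the mean time a job spends in a station is 1 / (mu - lambda). All names (`Level`, `Station`, `choose_levels`) and all numbers are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Level:
    """A service level: how fast a station works and what it costs."""
    service_rate: float  # jobs completed per millisecond (mu)
    power_mw: float      # power drawn at this level


@dataclass(frozen=True)
class Station:
    """A computational or storage unit with its selectable service levels."""
    name: str
    levels: Tuple[Level, ...]


def mean_delay(arrival_rate: float, level: Level) -> float:
    """Mean time in an M/M/1 station; infinite if the station is overloaded."""
    if level.service_rate <= arrival_rate:
        return float("inf")
    return 1.0 / (level.service_rate - arrival_rate)


def choose_levels(stations: Sequence[Station], arrival_rate: float,
                  max_delay: float) -> Optional[Tuple[float, Tuple[Level, ...]]]:
    """Search all level combinations exhaustively.

    Returns (total power, chosen levels) for the lowest-power combination
    whose summed mean delay stays within max_delay, or None if none does.
    """
    best = None
    for combo in product(*(s.levels for s in stations)):
        delay = sum(mean_delay(arrival_rate, lv) for lv in combo)
        if delay > max_delay:
            continue
        power = sum(lv.power_mw for lv in combo)
        if best is None or power < best[0]:
            best = (power, combo)
    return best


# Hypothetical two-station pipeline: a processor followed by a memory.
cpu = Station("cpu", (Level(2.0, 50.0), Level(4.0, 120.0)))
mem = Station("mem", (Level(3.0, 30.0), Level(6.0, 70.0)))

best = choose_levels([cpu, mem], arrival_rate=1.0, max_delay=1.0)
```

With these figures, only the faster processor level meets the 1 ms delay bound, and pairing it with the slower memory level gives the lowest total power (150 mW). Exhaustive search is used only because the example is tiny. The number of combinations grows exponentially with the number of stations, so a real system would need a more scalable policy.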
