A New SoC Paradigm
`
`On-chip micronetworks, designed with a layered methodology, will
`meet the distinctive challenges of providing functionally correct,
`reliable operation of interacting system-on-chip components.
`
`Luca Benini
`University of
`Bologna
`
`Giovanni
`De Micheli
`Stanford University
`
System-on-chip (SoC) designs provide integrated
solutions to challenging design problems in the
telecommunications, multimedia, and consumer
electronics domains. Much of the progress in these fields
`hinges on the designers’ ability to conceive complex
`electronic engines under strong time-to-market
`pressure. Success will rely on using appropriate
`design and process technologies, as well as on the
`ability to interconnect existing components—
`including processors, controllers, and memory
`arrays—reliably, in a plug-and-play fashion.
By the end of the decade, SoCs, using 50-nm transistors
operating below one volt, will grow to 4 billion
transistors running at 10 GHz, according to the
International Technology Roadmap for Semiconductors.
The major challenge designers of these systems
must overcome will be to provide for functionally
correct, reliable operation of the interacting components.
On-chip physical interconnections will present
a limiting factor for performance and, possibly,
energy consumption.
Silicon technologies face other challenges.
Synchronization of future chips with a single clock
source and negligible skew will be extremely difficult,
if not impossible. The most likely synchronization
paradigm for future chips, globally asynchronous
and locally synchronous, involves
`using many different clocks. In the absence of a sin-
`gle timing reference, SoC chips become distributed
`systems on a single silicon substrate. Global con-
`trol of the information traffic is unlikely to succeed
`because the system needs to keep track of each com-
`ponent’s states. Thus, components will initiate data
`
`Computer
`
`transfers autonomously, according to their needs.
`The global communication pattern will be fully dis-
`tributed, with little or no global coordination.
As SoC complexity scales, capturing the system’s
`functionality with fully deterministic operation
`models will become increasingly difficult. As global
`wires span multiple clock domains, synchroniza-
`tion failures in communicating between different
`domains will be rare but unavoidable events.1
`Moreover, energy and device reliability concerns
`will impose small logic swings and power supplies,
`most likely less than one volt. Electrical noise due
`to crosstalk, electromagnetic interference, and radi-
`ation-induced charge injection will likely produce
`data errors, also called upsets. Thus, transmitting
`digital values on wires will be inherently unreliable
and nondeterministic. Other causes of nondeterminism
include design components with a high level
`of abstraction and coarse granularity and distrib-
`uted communication control.
`
`Focusing on using probabilistic metrics such as
`average values or variance to quantify design objec-
`tives such as performance and power will lead to a
`major change in design methodologies. Overall,
SoC design will be based on both deterministic and
`stochastic models. Creating complex SoCs requires
`a modular, component-based approach to both
`hardware and software design.
`Based on the premise that interconnect technology
`will be the limiting factor for achieving SoCs’ opera-
`tional goals, we postulate that the layered design of
`reconfigurable micronetworks, which exploits the
`methods and tools used for general networks, can
`best achieve efficient communication on SoCs.
`
0018-9162/02/$17.00 © 2002 IEEE
`
Wiring Delays
`
Projections for future silicon technologies show that chip size
will scale up slightly while gate delays decrease compared to
wiring delays. A simple computation shows that delays on wires
that span the chip will extend longer than the clock period. This
trend is a trivial consequence of the finite propagation speed of
electromagnetic waves, which is $v = 0.3/\sqrt{\epsilon_r}$ mm per picosecond in
a homogeneous medium with relative permittivity $\epsilon_r$. In 50 nm
technology, the projected chip die edge will be around 22 mm,
with a clock frequency of 10 GHz.
Thus, the delay for a signal traversing the chip diagonally will
be approximately 100 picoseconds, or one clock period, in the
ideal case that $\epsilon_r = 1$. A lower bound of two clock periods applies
to general media with $\epsilon_r > 1$.¹ Obviously, signal propagation on
real-life interconnections is much slower than this lower bound,
and optimistic predictions estimate propagation delays for
highly optimized global wires, taking wire sizing and buffering
into account, to be between six and 10 clock cycles for chips
made using 50 nm technology.²
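
The sidebar's arithmetic can be reproduced in a few lines. This is a minimal sketch rather than anything taken from the article: the function names are hypothetical, and the relative permittivity of 3.9 (a typical figure for silicon dioxide) is an assumption introduced only to illustrate the two-clock-period bound.

```python
import math

# Speed of light in vacuum, expressed in mm per picosecond.
C_MM_PER_PS = 0.3


def diagonal_delay_ps(die_edge_mm: float, eps_r: float = 1.0) -> float:
    """Ideal time of flight across the diagonal of a square die."""
    diagonal_mm = die_edge_mm * math.sqrt(2.0)
    velocity = C_MM_PER_PS / math.sqrt(eps_r)  # v = 0.3 / sqrt(eps_r)
    return diagonal_mm / velocity


def clock_periods(delay_ps: float, clock_ghz: float) -> float:
    """Express a delay as a number of clock periods."""
    period_ps = 1000.0 / clock_ghz
    return delay_ps / period_ps


if __name__ == "__main__":
    ideal = diagonal_delay_ps(22.0)            # eps_r = 1
    oxide = diagonal_delay_ps(22.0, 3.9)       # assumed eps_r for SiO2
    print(f"ideal: {ideal:.0f} ps, {clock_periods(ideal, 10.0):.2f} periods")
    print(f"oxide: {oxide:.0f} ps, {clock_periods(oxide, 10.0):.2f} periods")
```

With a 22 mm die edge and a 10 GHz clock, the ideal case comes out at roughly one clock period and the assumed oxide case at roughly two, which matches the figures quoted in the sidebar.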
`
`References
1. D. Sylvester and K. Keutzer, “A Global Wiring Paradigm for Deep
Submicron Design,” IEEE Trans. CAD/ICAS, Feb. 2000, pp. 242-252.

2. R. Ho, K. Mai, and M. Horowitz, “The Future of Wires,” Proc.
IEEE, Apr. 2001, pp. 490-504.
`
`
`
`Network engineers have already gained experi-
`ence with using stochastic techniques and models
for large-scale designs. We propose borrowing
models, techniques, and tools from the network
design field and applying them to SoC design.
`We view a SoC as a micronetwork of compo-
`nents. The network is the abstraction of the com-
`munication among components and must satisfy
`quality-of-service requirements—such as reliabil-
`ity, performance, and energy bounds—under the
`limitation of intrinsically unreliable signal trans-
`mission and significant communication delays on
`wires. We propose using the micronetwork stack
`paradigm, an adaptation of the protocol stack
`shown in Figure 1,2 to abstract the electrical, logic,
`and functional properties of the interconnection
`scheme.
SoCs differ from wide area networks in their local
proximity and because they exhibit less nondeterminism.
Local, high-performance networks, such
as those developed for large-scale multiprocessors,
have similar requirements and constraints. Some
distinctive characteristics, such as energy constraints
and design-time specialization, are unique to SoC
networks, however.
`Whereas computation and storage energy greatly
`benefit from device scaling, which provides smaller
`gates and memory cells, the energy for global com-
`munication does not scale down. On the contrary,
as the “Wiring Delays” sidebar indicates, projections
based on current delay optimization techniques
for global wires3 show that global on-chip
`communication will require increasingly higher
`energy consumption. Hence, minimizing the energy
`used for communications will be a growing con-
`cern in future technologies. Further, network traf-
`fic control and monitoring can help better manage
`the power that networked computational resources
`consume. For example, the clock speed and volt-
`age of end nodes can vary according to available
`network bandwidth.
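
The idea of matching an end node's speed to the bandwidth the network can actually offer might be sketched as follows. Everything here is a hypothetical illustration: the operating points, the `bits_per_cycle` parameter, and the selection rule are assumptions, not values or policies from the article. The only physical fact relied on is that dynamic power in CMOS scales with the square of the supply voltage times the clock frequency.

```python
# Hypothetical (frequency in MHz, supply voltage in volts) pairs,
# sorted from slowest to fastest. These numbers are illustrative only.
OPERATING_POINTS = [(100, 0.6), (200, 0.8), (400, 1.0)]


def pick_operating_point(available_bw_mbps, bits_per_cycle=1.0,
                         points=OPERATING_POINTS):
    """Return the slowest point that can still fill the available bandwidth.

    Running faster than the network can absorb only wastes energy, so the
    node settles on the first point whose output rate covers the bandwidth.
    """
    for freq_mhz, volts in points:
        if freq_mhz * bits_per_cycle >= available_bw_mbps:
            return freq_mhz, volts
    return points[-1]  # network is faster than the node: run flat out


def relative_dynamic_power(freq_mhz, volts):
    """Dynamic power up to a constant factor: P is proportional to V^2 * f."""
    return volts ** 2 * freq_mhz


if __name__ == "__main__":
    for bw in (50, 150, 1000):
        f, v = pick_operating_point(bw)
        print(bw, "Mbit/s ->", f, "MHz at", v, "V,",
              "relative power", relative_dynamic_power(f, v))
```

When the network is congested and only a little bandwidth is available, the node drops to its lowest point and its dynamic power falls sharply, because both voltage and frequency are reduced.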
`
Figure 1. Protocol stack from which the micronetwork
stack paradigm can be adapted. Bottom up, the layers
span increasing design abstraction levels: physical
(wiring); architecture and control (data link, network,
transport); and software (system, application).
`
Another facet of the SoC network design problem,
design-time specialization, raises many new
challenges. Macroscopic networks emphasize
general-purpose communication and modularity.
`Communication network design has traditionally
`been decoupled from specific end applications and
`is strongly influenced by standardization and com-
`patibility constraints in legacy network infrastruc-
`tures. In SoC networks, these constraints are less
`restrictive because developers design the communi-
`cation network fabric on silicon from scratch. Thus,
`only the abstract network interface for the end
nodes requires standardization. Developers can tailor
the network architecture itself to the application,
or class of applications, the SoC design targets.
`We thus envision a vertical design flow in which
every layer of the micronetwork stack is specialized
and optimized for the target application
`domain. Such an application-specific on-chip net-
`work-synthesis paradigm represents an open and
`exciting research field. Specialization does not
`imply complete loss of flexibility, however. From a
`design standpoint, network reconfigurability will
be key in providing plug-and-play component use
`because the components will interact with one
`another through reconfigurable protocols.
`
`January 2002
`
`
`
`
ON-CHIP SIGNAL TRANSMISSION
`
`Wires are the physical realization of com-
`munication channels in SoCs and, for our
`purposes, buses function as wire ensembles.
Intensive research into on-chip wiring has
`resulted in the commercial development of
`several physical design tools to support auto-
`mated wiring. Nevertheless, coping with
`global wires that span significant distances,
`such as those beyond one millimeter, requires
`a paradigm shift.
Most likely, the reverse-scaled global wires
`will be routed on the top metal layers pro-
`vided by the technology. Wiring pitch and
`width increase in higher wiring levels so that wires
`at top levels can be much wider and thicker than
low-level wires.5 Increased width reduces wire resistance,
even considering the skin effect, while
`increased spacing around the wire prevents capac-
`itance growth. At the same time, inductance effects
`increase relative to resistance and capacitance. As
`a result, future global wires will function as lossy
transmission lines,1 as opposed to today’s lumped
`or distributed resistance-capacitance models.
`In addition to facilitating high-speed communi-
`cation, reducing the voltage swing also has a ben-
`eficial effect on power dissipation. Reduced-swing,
`current-mode transmission requires careful receiver
`design, with good adaptation to line impedance and
`high-sensitivity sensing, possibly with the help of
`sense amplifiers.
`When using current technologies, most chip
`developers assume that electrical waveforms always
`carry correct on-chip information. Guaranteeing
`error-free information transfer at the physical level
`on global on-chip wires will become more difficult
for several reasons. Signal swings will be reduced
and noise, due to crosstalk, electromagnetic interference,
and other factors, will have increased
impact. Thus, it will not be possible to abstract the
physical layer of on-chip networks as a fully reliable,
`fixed-delay channel. At the micronetwork stack lay-
`ers atop the physical layer, noise is a source of local
`transient malfunctions. An upset is the abstraction of
`such malfunctions. Upset probability can vary over
`different physical channels and over time.
`In current designs, wiring-related effects are unde-
`sirable parasitics, and designers use specific, detailed
`physical techniques to reduce or cancel them. A
well-balanced design should not try to achieve ideal
`wire behavior at the physical layer because the cor-
`responding cost in performance, energy efficiency,
`and modularity may be too high. Physical-layer
design should find a compromise between satisfying
competing quality metrics and providing a clean
`and complete abstraction of channel characteristics
`for the micronetwork layers above.
`
MICRONETWORK ARCHITECTURE AND CONTROL
`The architecture specifies the interconnection net-
`work’s topology and physical organization, while
`the protocols specify how to use network resources
`during system operation. Whereas both micronet-
`work and general network design must meet per-
`formance requirements, the need to satisfy tight
`energy bounds differentiates on-chip network
`implementations.
`
Interconnection network architectures
`
`On-chip networks relate closely to interconnec-
`tion networks for high-performance parallel com-
`puters with multiple processors, in which each
`processor is an individual chip. Like multiprocessor
`interconnection networks, nodes are physically
`close to each other and have high link reliability.
`Further, developers have traditionally designed
`multiprocessor interconnections under stringent
`bandwidth and latency constraints to support effec-
`tive parallelization.7 Similar constraints will drive
`micronetwork design.
Shared-medium networks. Most current SoCs have
a shared-medium architecture, which has the simplest
interconnect structures. In this architecture,
all communication devices share the transmission
medium. Only one device can drive the network
at a time. These networks support broadcast as
well, an advantage for the highly asymmetric communication
that occurs when information flows
`from few transmitters to many receivers. Within
`current technologies, the backplane bus is the most
`common example of an on-chip, shared-medium
structure. This convenient, low-overhead interconnection
handles a few active bus masters and
many passive bus slaves that only respond to bus
master requests.
We need bus arbitration mechanisms when several
processors attempt to use the bus simultaneously.
A bus arbiter module performs centralized
`arbitration in current on-chip buses. A processor
`seeking to communicate must first gain bus mas-
`tership from the arbiter. Because this process
`implies a control transaction and communication
`performance loss, arbitration should be as fast and
`rare as possible.
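
A centralized arbiter of the kind described above could look like the following minimal sketch. The round-robin policy is an assumption chosen for illustration; the article does not say which policy real on-chip arbiters use, and the class and method names are hypothetical.

```python
class RoundRobinArbiter:
    """Centralized bus arbiter with a rotating priority (assumed policy)."""

    def __init__(self, n_masters: int):
        self.n_masters = n_masters
        # Start so that master 0 is the first candidate examined.
        self.last_granted = n_masters - 1

    def grant(self, requests):
        """Pick one requesting master, or None if nobody wants the bus.

        `requests` is a set of master ids asking for bus mastership in
        this arbitration round.
        """
        for offset in range(1, self.n_masters + 1):
            candidate = (self.last_granted + offset) % self.n_masters
            if candidate in requests:
                self.last_granted = candidate
                return candidate
        return None


if __name__ == "__main__":
    arbiter = RoundRobinArbiter(3)
    # Masters 0 and 2 keep requesting; they take turns on the bus.
    print([arbiter.grant({0, 2}) for _ in range(4)])
```

Each call to `grant` stands for one arbitration round, which is exactly the control transaction whose cost the text says should be kept small and infrequent.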
`Together with arbitration, the response time of
`slow bus slaves may cause serious performance
losses because the bus remains idle while the master
waits for the slave to respond. To minimize this
wasted bandwidth, developers have devised
`split transaction protocols for high-performance
`buses. In these protocols, the network releases bus
`mastership upon request completion, and the slave
`must gain access to the bus to respond, possibly
`several bus cycles later. Thus, the bus can support
`multiple outstanding transactions.
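
The benefit of releasing the bus between request and response can be seen in a small cycle-count model. This is a sketch under simplifying assumptions that are not in the article: every request and every response occupies the bus for exactly one cycle, arbitration is free, each transaction targets a different slave, and all slaves have the same latency. The function names are hypothetical.

```python
def atomic_bus_cycles(n_transactions: int, slave_latency: int) -> int:
    """Bus is held for the request, the slave's wait, and the response."""
    return n_transactions * (1 + slave_latency + 1)


def split_bus_cycles(n_transactions: int, slave_latency: int) -> int:
    """Bus is released after each request; slaves re-acquire it to respond."""
    cycle = 0
    issued = 0
    completed = 0
    ready_at = []  # cycles at which pending responses become ready (sorted)
    while completed < n_transactions:
        if ready_at and ready_at[0] <= cycle:
            ready_at.pop(0)          # a slave uses this cycle to respond
            completed += 1
        elif issued < n_transactions:
            ready_at.append(cycle + 1 + slave_latency)  # a master requests
            issued += 1
        # otherwise the bus idles for this cycle
        cycle += 1
    return cycle


if __name__ == "__main__":
    print(atomic_bus_cycles(4, 6))  # 32
    print(split_bus_cycles(4, 6))   # 11
```

In this toy model four transactions to slaves with a six-cycle latency take 32 bus cycles atomically and 11 cycles when split, because the requests overlap with the slaves' response times. A single isolated transaction costs the same either way.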
`Obviously, bus masters and bus interfaces for
`split-transaction buses are more complex than
`those for simple atomic-transaction buses. For
example, developers chose a 128-bit split-transaction
bus for the Lucent Daytona chip,8 a multiprocessor
on a chip that contains four 64-bit
`processing elements that generate transactions of
`different sizes. To improve bus-bandwidth utiliza-
`tion and minimize the average latency caused by
`simultaneous requests, the bus partitions large
`transfers into smaller packets.
`Although well understood and widely used,
`shared-medium architectures have seriously lim-
`ited scalability. The bus-based organization remains
`convenient for current SoCs that integrate fewer
`than five processors and, rarely, more than 10 bus
masters. Energy inefficiency is another critical limitation
of shared-medium networks. In these architectures,
every data transfer is broadcast, meaning
`the data must reach each possible receiver at great
`energy cost. Future integrated systems will contain
`tens to hundreds of units generating information
`that must be transferred. For such systems, a bus-
`based network would become a critical perfor-
`mance and power bottleneck.
`
Direct and indirect networks. The direct or point-to-point
network overcomes the scalability problems
of shared-medium networks. In this architecture,
each node directly connects to a limited
`number of neighboring nodes. These on-chip com-
`putational units contain a network interface block,
`often called a router, that handles communication
`and directly connects to neighboring nodes’
`routers. Direct interconnect networks are popular
for building large-scale systems because the total
`communication bandwidth also increases when the
`number of nodes in the system increases.
The Raw Architecture Workstation (RAW) architecture9
is an example of a direct network implementation
derived from a fully programmable SoC
`consisting of an array of identical computational
`tiles with local storage. Full programmability means
`that the compiler can program both the function of
`each tile and the interconnections among them.
`The term RAW derives from the “raw” hard-
`ware’s full exposure to the compiler. To accomplish
`programmable communication, each tile has
`a router. The compiler programs the routers
`on all tiles to issue a sequence of commands
`that determines exactly which set of wires
connect at every cycle. Moreover, the compiler
pipelines the long wires to support high
`clock frequency.
`Indirect or switch-based networks offer an
`alternative to direct networks for scalable
`interconnection design. In these networks, a
`connection between nodes must go through a set of
`switches. The network adapter associated with each
`node connects to a switch’s port. Switches them-
`selves do not perform information processing—they
`only provide a programmable connection between
`their ports, setting up a communication path that
`can change over time.7 Significantly, the distinction
`between direct and indirect networks is blurring as
`routers in direct networks and switches in indirect
`networks become more complex and absorb each
`other’s functionality. As the “Virtex II FPGA” side-
`bar indicates, some field-programmable gate arrays
`are examples of indirect networks on chips.
Hybrid networks. Introducing a controlled amount
of nonuniformity in communication network
design provides several advantages. Multiple-backplane
and hierarchical buses are two notable examples
of the many heterogeneous or hybrid
interconnection architectures that developers have
`proposed and implemented. These architectures
`cluster tightly coupled computational units with
`high communication bandwidth and provide lower
`bandwidth intercluster communication links.
`Because they use a fraction of the communication
resources and energy to provide performance comparable
with homogeneous, high-bandwidth architectures,
energy efficiency is a strong driver toward
`using hybrid architectures.10
`
Micronetwork control
`
`Using micronetwork architectures effectively
`requires relying on protocols—network control
`algorithms that are often distributed. Network
`control dynamically manages network resources
`during system operation, striving to provide the
`required quality of service. Following the micro-
`network stack layout shown in Figure 1, we
`describe the three architecture-and-control lay-
`ers—data link, network, and transport—from the
`bottom up.
Data link layer. The physical layer is an unreliable
`digital link in which the probability of bit upsets is
`non-null. Data-link protocols increase the relia-
`bility of the link, up to a minimum required level,
`
`
`
`
Virtex II FPGA
`
`Most current field-programmable gate arrays consist of a
`homogeneous fabric of programmable elements connected by a
`switch-based network. FPGAs can be seen as the archetype of
`future programmable SoCs: They contain many interconnected
computing elements. Current FPGA communication networks
differ from future SoC micronetworks in granularity and
homogeneity.

Figure A. Xilinx Virtex II, a field-programmable gate array
architecture that exemplifies an indirect network over a
heterogeneous fabric.

Processing elements in traditional FPGAs implement simple
`bit-level functional blocks. Thus, communication channels in
`FPGAs are functionally equivalent to wires that connect logic
`gates. Because future SoCs will house complex processing ele-
`ments, interconnects will carry much coarser quantities of infor-
`mation. The different granularity of computational elements and
`communication requirements has far-reaching consequences for
`the complexity of the network interface circuitry associated with
`each communication channel. Interface circuitry and network
`control policies must be kept extremely simple for FPGAs, while
`they can be much more complex when supporting coarser-grain
`information transfers. The increased complexity will introduce
`greater degrees of freedom for optimizing communication as well.
`The concept of dynamically reconfiguring FPGAs applies well
to micronetwork design. SoCs benefit from programmability
in the field to match, for example, environmental constraints.
This programmability also lets runtime reconfiguration adapt,
for example, to a varying workload. Reconfigurable micronetworks
exploit programmable routers, switches, or both. Their
`embodiment may leverage multiplexers whose control signals
`are set—as with FPGAs—by configuration bits in local storage.
`For example, Figure A shows the Xilinx Virtex II FPGA with
`various configurable elements to support reconfigurable digi-
`tal-signal-processor design. The internal configurable rectan-
`gular array contains configurable logic blocks (CLBs), random
`access memories (RAMs), multipliers (MUL), switches (SWT),
I/O buffers (IOB), and digital clock managers (DCM). Routing
`switches facilitate programmable interconnection. Each pro-
`grammable element connects to a switch matrix, allowing mul-
`tiple connections to the general routing matrix. Values stored
`in static memory cells control all programmable elements,
`including the routing resources. Thus, Virtex II exemplifies an
`indirect network over a heterogeneous fabric.
`
`under the assumption that the physical layer by
`itself is not sufficiently reliable.
`In a shared-medium network, contention creates
`an additional error source. Contention resolution,
`fundamentally a nondeterministic process, is an
`additional noise source because it requires syn-
`chronization of a distributed system. In general,
`synchronization can virtually eliminate nondeter-
`minism at the price of some performance loss. For
`example, centralized bus arbitration eliminates con-
`tention-induced errors in a synchronous bus but
`the slow bus clock and bus request-and-release
`cycles impose a substantial performance penalty.
Packetizing data deals effectively with communication
errors. Sending data on an unreliable channel
in packets makes error containment and
recovery easier because the packet boundaries contain
the effect of errors and allow error recovery on
a packet-by-packet basis. Using error-correcting
codes that add redundancy to the transferred information
can achieve error correction at the data link
layer. Packet-based error-detection and -recovery
protocols that have been developed for traditional
networks, such as alternating-bit, go-back-N, and
selective repeat, can complement error correction.2
`Several parameters in these protocols, such as
`packet size and number of outstanding packets, can
`be adjusted to achieve maximum performance at a
`specified residual error probability, within given
`energy consumption bounds, or both.
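
To make the packet-by-packet recovery idea concrete, here is a minimal sketch of an alternating-bit style link with error detection. It is a simplification built on stated assumptions, not a protocol taken from the article: the link flips at most one bit per packet, acknowledgments are never lost (so the sequence bit is carried but never actually needed to discard a duplicate), and CRC-32 from Python's standard `zlib` module stands in for whatever code a real on-chip link would use. All function names are hypothetical.

```python
import random
import zlib


def make_packet(seq_bit: int, payload: bytes) -> bytes:
    """Frame = 1-byte sequence bit + payload + 4-byte CRC-32 trailer."""
    body = bytes([seq_bit]) + payload
    return body + zlib.crc32(body).to_bytes(4, "big")


def check_packet(packet: bytes):
    """Return (seq_bit, payload), or None if the CRC reveals an upset."""
    body, trailer = packet[:-4], packet[-4:]
    if zlib.crc32(body).to_bytes(4, "big") != trailer:
        return None
    return body[0], body[1:]


def noisy_link(packet: bytes, upset_prob: float, rng: random.Random) -> bytes:
    """Flip one random bit with probability upset_prob (assumed noise model)."""
    if rng.random() < upset_prob:
        corrupted = bytearray(packet)
        bit = rng.randrange(len(corrupted) * 8)
        corrupted[bit // 8] ^= 1 << (bit % 8)
        return bytes(corrupted)
    return packet


def send_all(payloads, upset_prob: float, seed: int = 0):
    """Retransmit each packet until it arrives intact; count transmissions."""
    rng = random.Random(seed)
    received, transmissions, seq = [], 0, 0
    for payload in payloads:
        while True:
            transmissions += 1
            result = check_packet(
                noisy_link(make_packet(seq, payload), upset_prob, rng))
            if result is not None and result[0] == seq:
                received.append(result[1])
                seq ^= 1  # alternate the sequence bit for the next packet
                break
    return received, transmissions


if __name__ == "__main__":
    data = [b"abc", b"de", b"f"]
    got, sent = send_all(data, upset_prob=0.3, seed=1)
    print(got == data, "transmissions:", sent)
```

The retransmission count is the energy and latency price paid for reliability. Raising `upset_prob` or lengthening the payloads shows the packet-size tradeoff the text mentions: a larger packet amortizes the framing overhead but costs more to resend after an upset.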
`
Network layer. This layer implements end-to-end
delivery control in network architectures with
many communication channels. In most current
on-chip networks, all processing elements connect
to the same channel: the on-chip bus, leaving the
`network layer empty. However, when a collection
`of links connects the processing elements, we must
decide how to set up connections between successive
links and route information from its source to
the final destination. Developers have studied these
switching and routing tasks extensively in the context
of both multiprocessor interconnects7 and general
communication networks.2
`Switching algorithms can be grouped into three
`classes: circuit, packet, and cut-through switching.7
`These approaches trade off better average delivery
`time and channel utilization for increased variance
`and decreased predictability. The low latency of
`cut-through switching schemes will likely make
`them preferable for on-chip micronetworks from
`a performance standpoint. However, aggressive for-
`warding of data through switches can increase traf-
`fic and contention, which may waste energy.
`Depending on the application domain, nondeter-
`minism can be more or less tolerable.
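
The latency advantage of cut-through switching can be illustrated with a first-order, contention-free model. This is a common textbook approximation, not a formula from the article, and it assumes that a packet is a sequence of equally sized flits, that each link moves one flit per cycle, and that every switch adds a fixed routing delay. The names are hypothetical.

```python
def store_and_forward_latency(hops: int, packet_flits: int,
                              router_delay: int = 1) -> int:
    """Packet switching: each switch receives the whole packet before
    forwarding it, so the full packet time is paid at every hop."""
    return hops * (router_delay + packet_flits)


def cut_through_latency(hops: int, packet_flits: int,
                        router_delay: int = 1) -> int:
    """Cut-through: the header is forwarded as soon as it is routed, so
    the packet body streams behind it and is paid for only once."""
    return hops * router_delay + packet_flits


if __name__ == "__main__":
    for hops in (1, 4, 8):
        print(hops,
              store_and_forward_latency(hops, packet_flits=8),
              cut_through_latency(hops, packet_flits=8))
```

The model deliberately ignores contention. Once packets block one another, a cut-through packet can occupy several switches at once, which is the source of the extra traffic, wasted energy, and reduced predictability noted above.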
`Switching is tightly coupled to routing. Routing
`algorithms establish the path a message follows
`through the network to its final destination.
Classifying, evaluating, and comparing on-chip
routing schemes7 requires analyzing several tradeoffs,
such as

• predictability versus average performance,
• router complexity and speed versus achievable
  channel utilization, and
• robustness versus aggressiveness.
`
`We can make a coarse distinction between deter-
`ministic and adaptive routing algorithms. Deter-
`ministic approaches always supply the same path
between a given source-destination pair and offer
the best choice for uniform or regular traffic patterns.
In contrast, adaptive approaches use information
about network traffic and channel conditions
to avoid congested network regions. An
`adaptive approach is preferable when dealing with
`irregular traffic or in networks with unreliable nodes
`and links.
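
The contrast between the two routing families can be sketched on a small example. The 2D mesh topology, the dimension-order (XY) rule used as the deterministic scheme, and the "least congested productive neighbor" rule used as the adaptive scheme are all assumptions chosen for illustration; the article does not prescribe any of them, and the function names are hypothetical.

```python
def productive_hops(cur, dst):
    """Neighbors of `cur` that move one step closer to `dst` on a 2D mesh."""
    (x, y), (dx, dy) = cur, dst
    hops = []
    if x != dx:
        hops.append((x + (1 if dx > x else -1), y))
    if y != dy:
        hops.append((x, y + (1 if dy > y else -1)))
    return hops


def xy_next_hop(cur, dst, congestion):
    """Deterministic: always correct the X coordinate first, then Y."""
    return productive_hops(cur, dst)[0]


def adaptive_next_hop(cur, dst, congestion):
    """Adaptive: among productive hops, pick the least congested node."""
    return min(productive_hops(cur, dst), key=lambda n: congestion.get(n, 0))


def route(src, dst, next_hop, congestion=None):
    """Follow `next_hop` decisions from src to dst and return the path."""
    congestion = congestion or {}
    path = [src]
    while path[-1] != dst:
        path.append(next_hop(path[-1], dst, congestion))
    return path


if __name__ == "__main__":
    busy = {(1, 0): 5}  # hypothetical congestion level at node (1, 0)
    print(route((0, 0), (2, 1), xy_next_hop, busy))
    print(route((0, 0), (2, 1), adaptive_next_hop, busy))
```

The deterministic router sends the packet straight through the congested node every time, which makes its path predictable. The adaptive router steers around it, at the cost of a path that depends on network state.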
`
We conjecture that future on-chip micronetwork
`designs will emphasize speed and decentralization
`of routing decisions. Robustness and fault toler-
`ance will also be highly desirable. These factors,
`and the observation that traffic patterns for spe-
`cial-purpose SoCs tend to be irregular, seem to
favor adaptive routing. However, when traffic predictability
is high and nondeterminism is undesirable,
deterministic routing may be the best choice.
The “SPIN Micronetwork” sidebar describes a
micronetwork that uses deterministic routing.
`
Transport layer. Atop the network layer, the transport
layer decomposes messages into packets at
the source. It also resequences and reassembles the
messages at the destination. Packetization granularity
presents a critical design decision because
`
SPIN Micronetwork
`
`The Scalable, Programmable, Integrated Network (SPIN) on-chip
`micronetwork defines packets as sequences of 32-bit words, with the
`packet header fitting in the first word. SPIN uses a byte in the header to
`identify the destination, allowing the network to scale up to 256 termi-
`nal nodes. Other bits carry packet tagging and routing information, and
`the packet payload can be of variable size. A trailer—which does not con-
`tain data, but a checksum for error detection—terminates every packet.
`SPIN has a packetization overhead of two words. The payload should
`thus be significantly larger than two words to amortize the overhead.
`The SPIN micronetwork adopts cut-through switching to minimize
`message latency and storage requirements in the design of network
`switches. However, it provides some extra buffering space on output links
`to store data from blocked packets. Figure B shows SPIN’s fat-tree net-
`work architecture, which derives its name from the progressively increas-
`ing communication bandwidth toward the root. The architecture is
nonblocking when packet size is limited to a single word. Because packets
can span more than one switch, SPIN’s blocking is a side effect of
cut-through switching alone.
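
The packet layout described in this sidebar, and the two-word overhead it implies, can be sketched as follows. Only the facts stated in the sidebar are taken as given: 32-bit words, a one-word header with a one-byte destination, a variable payload, and a one-word checksum trailer. The position of the destination byte within the header, the use of the remaining header bits as a single tag field, and the XOR checksum are assumptions made for illustration and are not SPIN's actual encoding.

```python
WORD_MASK = 0xFFFFFFFF


def spin_like_packet(dest: int, payload_words, tag: int = 0):
    """Build [header, payload..., trailer] as a list of 32-bit words."""
    if not 0 <= dest < 256:
        raise ValueError("destination must fit in one byte (256 nodes)")
    # Assumed layout: destination in the low byte, other bits as a tag.
    header = ((tag & 0xFFFFFF) << 8) | dest
    words = [header] + [w & WORD_MASK for w in payload_words]
    checksum = 0
    for word in words:
        checksum ^= word  # assumed checksum: XOR of all preceding words
    return words + [checksum]


def packet_is_intact(words) -> bool:
    """With an XOR trailer, the XOR of an intact packet's words is zero."""
    total = 0
    for word in words:
        total ^= word
    return total == 0


def payload_efficiency(n_payload_words: int) -> float:
    """Fraction of transmitted words that carry payload (overhead = 2)."""
    return n_payload_words / (n_payload_words + 2)


if __name__ == "__main__":
    pkt = spin_like_packet(0x2A, [1, 2, 3])
    print(len(pkt), packet_is_intact(pkt))
    print(payload_efficiency(2), payload_efficiency(30))
```

A two-word payload spends half of the link bandwidth on framing, while a 30-word payload spends about 6 percent, which is why the payload should be much larger than the overhead.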
`
`
`
`
`
Figure B. SPIN architecture. R blocks are switches, N blocks are nodes.
`
`SPIN uses deterministic routing, with routing decisions set by the net-
`work architecture. In fat-tree networks, tree routing is the algorithm of
`choice. The network routes packets from a node, or tree leaf, toward the
`tree root until they reach a switch that is a common ancestor with the
`destination node. At that point, the network routes the packet toward
`the destination by following the unique path between the ancestor and
`destination nodes.
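
The tree-routing rule just described can be sketched on a simplified model. The assumptions here are not from the sidebar: leaves are numbered consecutively, each logical switch has four children (the `arity` parameter), and the several physical switches that a fat tree provides at each level are collapsed into one logical switch identified by `(level, index)`. The function names are hypothetical.

```python
def common_ancestor_level(src: int, dst: int, arity: int = 4) -> int:
    """Number of levels a packet must climb before src and dst share
    an ancestor switch (0 means src and dst are the same node)."""
    level = 0
    while src != dst:
        src //= arity
        dst //= arity
        level += 1
    return level


def tree_route(src: int, dst: int, arity: int = 4):
    """Logical switches visited: up to the common ancestor, then down."""
    top = common_ancestor_level(src, dst, arity)
    up = [(lvl, src // arity ** lvl) for lvl in range(1, top + 1)]
    down = [(lvl, dst // arity ** lvl) for lvl in range(top - 1, 0, -1)]
    return up + down


if __name__ == "__main__":
    print(tree_route(0, 3))  # same first-level switch
    print(tree_route(0, 5))  # must climb to the second level
```

Nodes 0 and 3 share a first-level switch, so the packet turns around immediately. Nodes 0 and 5 do not, so the packet climbs one more level before following the unique downward path to its destination.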
`
`most network-control algorithms are highly sen-
`sitive to packet size. Most macroscopic networks
`standardize packets to facilitate internetworking,
`extensibility, and the compatibility of the net-
`working hardware that different manufacturers
`produce. Packet standardization constraints can
be relaxed in SoC micronetworks, which can be
`customized at design time.
`In general, either deterministic or statistical pro-
`cedures can provide the basis for flow control and
`negotiation. Deterministic approaches ensure that
`traffic meets specifications, and they provide hard
`bounds on delays or message losses. Deterministic
`techniques have the disadvantage of being based on
worst cases, however, and they generally lead to significant
underutilization of network resources.
`Statistical techniques offer more efficient resource
`utilization, but they cannot provide worst-case
`guarantees.
`
`The Silicon Backplane Micronetwork
`(http://www.sonicsinc.com), a shared-med-
`ium bus based on time-division multiplexing,
`offers an example of transport layer issues in
`micronetwork design. When a node wants to
`communicate, it must issue a request to the
`arbiter during a time slot. If arbitration is
`favorable, it may be granted access in the fol-
`lowing time slot. Hence, arbitration intro-
`duces a nondeterministic waiting time in
`transmission. To reduce nondeterminism, the
`micronetwork protocol provides a form of
`slot reservation: Nodes can reserve a fraction
`of the available time slots, thereby allocating bus
`bandwidth deterministically.
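
Slot reservation on a time-division-multiplexed bus can be sketched as below. This is an illustrative model, not the Silicon Backplane's actual protocol: the frame layout, the rule that an unused reserved slot falls back to ordinary arbitration, and the fallback policy (lowest node name wins) are all assumptions, and the node names are hypothetical.

```python
def tdm_schedule(n_slots: int, reservations: dict, requests: dict):
    """Assign each time slot of one frame to a node.

    reservations: slot index -> node that owns the slot.
    requests:     node -> number of transfers it wants to make.
    Returns a list with the node granted each slot (None if idle).
    """
    pending = dict(requests)
    grants = []
    for slot in range(n_slots):
        owner = reservations.get(slot)
        if owner is not None and pending.get(owner, 0) > 0:
            winner = owner  # reserved slot: deterministic access
        else:
            # Unreserved or unused slot: assumed fallback arbitration.
            waiting = sorted(n for n, count in pending.items() if count > 0)
            winner = waiting[0] if waiting else None
        if winner is not None:
            pending[winner] -= 1
        grants.append(winner)
    return grants


if __name__ == "__main__":
    frame = tdm_schedule(
        n_slots=4,
        reservations={0: "video", 2: "video"},
        requests={"audio": 3, "video": 2},
    )
    print(frame)
```

Because the hypothetical `video` node owns slots 0 and 2, it never waits more than one slot for access regardless of how many requests `audio` issues, which is the deterministic bandwidth allocation that reservation is meant to provide.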
The theoretical framework
developed for large-scale networks provides a convenient
environment for reasoning about on-chip
micronetworks as well. The micronetwork design space
is still largely unexplored, and further work is needed
to predict the tradeoff curves in this
`space. We also believe that this area offers signifi-
`cant room for innovation: On-chip micronetwork
`architectures and protocols can be tailored to spe-
`cific system configurations and application classes.
`Further, the impact of network design and control
`decisions on communication energy presents an
`important research theme that will become criti-
`cal as communication energy consumption scales
`up in SoC architectures.
`
`
SOFTWARE LAYERS
`Network architectures and control algorithms
`constitute the infrastructure and provide commu-
`nication services to the end nodes, which are pro-
`grammable in most cases. The software layers for
`SoCs include system and application programs.
`
System software
The operating system captures the system programs
that support SoC operation. System support
`software in current SoCs usually consists of ad hoc
`routines designed for a specific integrated core
`processor under the assumption that a processor
`provides global, centralized system control. In
future SoCs, the prevailing paradigm will be peer-to-peer
interaction among several possibly heterogeneous
processing elements. Thus, we think that
`system software will be designed as a modular dis-
`tributed system. Each programmable component
`will be provided with system software to support its
`own operation, manage its communication with
`the micronetwork, and interact effectively with
`neighboring components’ system software.
`
`Seamless composition of micronetwork compo-
`nents will require system software that is config-
`urable according to the network’s requirements.
`System software configuration may be achieved in
`various ways, ranging from manual adaptation to
`automatic configuration. One end of the spectrum
`favors software optimization and compactness
`while the other end favors ease of design and fast
`turnaround time. With this vision, on-chip com-
`munication protocols should be programmable at
`the system software level to adapt the underlying
`layers to the components’ characteristics.
`Most SoCs are dedicated to a specific application,
`and system software seeks to provide the required
`quality of service within the physical constraints of
that application. Consider, for example, a SoC for
`a wireless mobile video terminal. Quality of service
`relates to the video quality, which implies specific
`computation, storage element, and micronetwork
`performance levels. Constraints relate to the
strength and signal-to-noise ratio of the radio-frequency
signal and to the energy available in the
`battery. Thus, the system software must provide
`high performance by orchestrating the information
`processing within the service stations and optimiz-
`ing information flow. Moreover, the software should
`achieve this task while minimizing energy con-
`sumption.
`The system software provides an abstraction of
`the underlying hardware platform. We can view the
`system as a queuing network of service stations.
`Each service station models a computational or
`storage unit, while the queuing network abstracts
`the micronetwork. Moreover, we can assume the
`following:
`
• Each service station can operate at various
`service levels, providing corresponding per-
`formance and energy consumption levels.
`This approach abstracts the physical imple-
`mentation of components with adjustable
`voltage or frequency levels, or both, along
`with the ability to disable their functions in
`full or in part.
• The system software can control the information
flow between the various units to provide
`the appropriate quality of service. This func