`
`Performance simulation and analysis of a CMOS/nano hybrid nanoprocessor
`system
`
`Article in Nanotechnology · May 2009
`
`DOI: 10.1088/0957-4484/20/16/165203 · Source: PubMed
`
`2 authors, including:
`
`Shamik Das
`MITRE
`
`All content following this page was uploaded by Shamik Das on 23 May 2014.
`
`AMD EX1038
`U.S. Patent No. 6,239,614
`
`0001
`
`
`
`Performance Simulation and Analysis of a CMOS/Nano
`Hybrid Nanoprocessor System
`
`Adam C Cabe and Shamik Das
`Nanosystems Group, The MITRE Corporation, McLean, VA 22102 USA
`
`E-mail: sdas@mitre.org
`
`Abstract.
`This paper provides detailed simulation results and analysis of the prospective performance
`of hybrid CMOS/nano electronic processor systems based upon the Field-Programmable Nanowire
`Interconnect (FPNI) architecture. To evaluate this architecture, a complete design was developed for
`an FPNI implementation using 90-nm CMOS with 15-nm-wide nanowire interconnects. Detailed
`simulations of this design illustrate that critical design choices and tradeoffs exist beyond those
`specified by the architecture. This includes the selection of the types of junction nanodevices, as
`well as the implementation of low-level circuits. In particular, the simulation results presented here
`show that only nanodevices with an “on/off” current ratio of 200 or more are suitable to produce
`correct system-level behavior. Furthermore, the design of the CMOS logic gates in the FPNI system
`must be customized to accommodate the resistances of both “on”-state and “off”-state nanodevices.
`Using these customized designs together with models of suitable nanodevices, additional simulations
`demonstrate that, relative to conventional 90-nm CMOS FPGA systems, performance gains can be
`obtained of up to 70% greater speed or up to a nine-fold reduction in energy consumption.
`
Copyright © 2009 IOP Publishing Ltd.

This paper appears in Nanotechnology, vol. 20, no. 16, 22 Apr. 2009.
`
`0002
`
`
`
`Performance Simulation and Analysis of a Hybrid Nanoprocessor
`
`2
`
`1. Introduction
`
Hybrid micro-nano electronics systems [1–9] seek to combine the very best of industrial microelectronics,
namely complementary metal oxide semiconductor (CMOS) technology, with nanoelectronics,
whose chief advantage over CMOS is its capacity for ultra-dense integration of devices and interconnects.
`In so doing, such hybrid systems purport to offer performance that exceeds that of either CMOS or
`nanoelectronics alone. Specifically, hybrid systems promise greater computational speed, plus lower
`power and energy consumption, all within a smaller system form factor due to the increased density of
`integration.
`Two system proposals, in particular, have garnered much recent attention [10]. These systems
`are CMOL (“CMOS+nanowires+MOLecules”), developed by Likharev et al. [5], and its close relative
`FPNI (“Field-Programmable Nanowire Interconnect”), devised by the Hewlett-Packard corporation [8].
`The CMOL and FPNI architectures combine CMOS logic elements with nanowire crossbar arrays to
`form programmable interconnect fabrics akin to Field-Programmable Gate Arrays (FPGAs) [11, 12].
`Designs for both CMOL and FPNI systems have been specified very thoroughly at the architectural
`level by their respective designers. Initial, high-level analyses conducted by these designers indicate
`that both systems offer significant promise when measured according to metrics such as circuit speed
`and system area. Furthermore, the design of the FPNI architecture contains specific enhancements [8]
`to CMOL that are intended to make the manufacturing of such systems feasible using established
`nanofabrication technologies, such as nanoimprint lithography [13–16]. A clear next step would be
`laboratory experimentation to fabricate and test physical prototypes of these systems.
`In support of that objective, this paper presents detailed simulation results that demonstrate the
`critical design challenges and tradeoffs for FPNI that exist beyond the architectural specification of
`Snider and Williams [8]. In particular, it is shown here that any nanoelectronic switches to be used in
`FPNI systems must provide an “on/off” current ratio of 200 or more. This restricts options for near-
`term experiments to a small set of demonstrated nanodevices. Further simulations illustrate that by
`using such nanodevices, in conjunction with CMOS circuits that are customized to interface with them,
`performance gains may be achieved of up to 70% greater circuit speed or up to a nine-fold reduction
`in energy consumption, relative to conventional CMOS FPGAs.
`To begin to explain the approach that led to these findings, a detailed design is presented in section
`2 for an FPNI system that implements a simple logic circuit. This design is based upon 90-nm CMOS
`technology, combined with an FPNI-style nanowire crossbar [8] composed of 15-nm-wide nanowires.
`These dimensions were chosen since they can be achieved with technology that presently is accessible
`to the research community. Following the discussion of this design, simulation results are presented in
`section 3. These results elucidate the design choices that enable FPNI systems to function correctly,
`as well as those that permit functioning FPNI systems to be optimized. Section 4 provides a summary
`and conclusions.
`
`2. Detailed Design for an FPNI System
`
`Hybrid systems such as CMOL and FPNI consist of two interdependent components: an array of
`CMOS cells and a homogeneous, switchable array of crossed nanowires that resides atop these cells.
The design of such systems is constrained by the relative size scales of these components. For example,
`a complete CMOL design is determined almost entirely by the size ratio of the logic cell to the unit
`nanowire crossbar. This is because CMOL utilizes just one type of logic cell, the inverter. In contrast,
`the FPNI architecture permits multiple types of CMOS logic, such as NAND gates and flip-flops. These
`logic gates may vary widely in size. In order to pack these disparate gates into a homogeneous fabric,
`the FPNI fabric is partitioned into uniform, rectangular “hypercells” [8]. Each hypercell consists of a
`small number of unit cells, where a single unit cell corresponds to the smallest logic element, typically
`a buffer or inverter.
`The design of the hypercell must be customized prior to fabrication in order to optimize the FPNI
`system for its intended applications. For example, Snider and Williams present two hypercell variations
in their original work [8]: one variation consists of four two-cell NAND gates and eight single-cell buffer
gates in a 4 × 4 hypercell, and the other variation is a 6 × 7 hypercell that provides a flip-flop element
in addition to other, simpler logic gates.

Figure 1. Schematic and layout for a 4 × 4 FPNI hypercell. This hypercell can be tiled to make
larger FPNI fabrics. Figure (a) shows that this hypercell consists of 4 NAND gates, 4 inverters, and a
flip-flop. Figure (b) provides a cutaway view of a partial CMOS layout for the FPNI flip-flop hypercell.

`Figure 1(a) shows an alternative 4 × 4 hypercell design developed for this analysis. It provides four
`two-input NAND gates, four inverters, and a flip-flop. In comparison to the hypercells described by
`Snider and Williams [8], this design provides a greater density of flip-flops per unit cell, as is desirable
`for the pipelined arithmetic operations considered in section 3.
`Also, the FPNI design developed and considered here goes a step beyond the work of Snider and
`Williams in that it is specified all the way down to the CMOS layout and takes into account precise
`dimensions for all the CMOS components. A sample CMOS layout for this design is shown in figure
`1(b). As is shown in this layout, there can be a small area penalty to be paid in the form of empty
`space in some unit cells. This is due to the fact that the areas of the complex logic gates are not integer
`multiples of that of the inverter or buffer, as would be desirable to take full advantage of the density
`of the interconnect layer above. Thus, in comparison to a custom CMOS-only design, where there is
`no need to align to a uniform nanowire interconnect structure, the mapping of FPNI gates to integer
`unit cells can be inefficient. However, it should be noted that in designing the layout shown in figure
`1(b), circuit function was given priority over mapping efficiency.
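The area penalty of snapping gates to whole unit cells can be illustrated with simple arithmetic. The sketch below uses hypothetical gate footprints (expressed in inverter-cell units) that are assumptions for illustration only, not values measured from the layout in figure 1(b); it rounds each gate up to a whole number of unit cells and reports the fraction of reserved area left empty.

```python
import math

# Hypothetical gate footprints, in units of one inverter/buffer cell.
# These values are illustrative assumptions, not taken from figure 1(b).
GATE_AREAS = {"inverter": 1.0, "nand2": 1.6, "flip_flop": 3.6}


def cells_for(gate: str) -> int:
    """A gate must occupy a whole number of unit cells, so round its area up."""
    return math.ceil(GATE_AREAS[gate])


def allocated_cells(gate_counts: dict) -> int:
    """Total unit cells reserved for the given mix of gates."""
    return sum(cells_for(g) * n for g, n in gate_counts.items())


def padding_fraction(gate_counts: dict) -> float:
    """Fraction of the reserved unit-cell area that is left empty."""
    used = sum(GATE_AREAS[g] * n for g, n in gate_counts.items())
    return 1.0 - used / allocated_cells(gate_counts)


# Gate mix of the 4 x 4 hypercell in figure 1(a): 4 NAND2, 4 inverters, 1 flip-flop.
HYPERCELL = {"nand2": 4, "inverter": 4, "flip_flop": 1}

if __name__ == "__main__":
    print("unit cells reserved:", allocated_cells(HYPERCELL))
    print(f"empty fraction: {padding_fraction(HYPERCELL):.1%}")
```

With these assumed footprints the gate mix fills exactly the 16 unit cells of a 4 × 4 hypercell, yet 12.5% of that area remains empty, which is the kind of inefficiency a custom CMOS-only layout would avoid.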
`As with the design of the underlying CMOS logic gates, design choices also arise in the FPNI
`interconnect layer. This interconnect consists of the nanowire crossbar array and the programmable
`nanodevices that exist at each nanowire junction. The nanowire crossbar consists of two layers of
`parallel nanowires, one laid orthogonally over the other, creating a 2-D interconnect grid [2]. Nanowire
`crossbar arrays of the required scale have been demonstrated with pitches as low as 14 nm [13,14]. The
`nanowire pitch, together with the CMOS unit cell dimensions, determines the location and number of
`programmable connections between adjacent logic gates. Thus, the first design choice for the FPNI
`interconnect layer is to decide the nanowire pitch. For this design, a nanowire pitch of 30 nm was
`selected because it is aggressive, yet accessible with present technology.
`The second choice to be made is to decide upon an appropriate junction nanodevice. The
`configurable nanodevice at each junction must have both a high-conductivity (“on”) state and a low-
`conductivity (“off”) state, making it bi-stable. Applying a large positive or negative voltage across the
`device causes it to switch states. Although many device technologies provide this functionality, in order
`
`0004
`
`
`
`Performance Simulation and Analysis of a Hybrid Nanoprocessor
`
`4
`
`to select an appropriate technology, one must understand first how these devices are intended to work
`within FPNI circuits.
`The schematic in figure 1(a) shows how these nanodevices are employed to create functional circuits
`in the FPNI fabric. In this figure, a two-input NAND gate, with inputs ‘A’ and ‘B’, is connected to an
`inverter. The output of this inverter is connected to the flip-flop, whose output is marked ‘Z’. The figure
`shows the nanowires and junctions that are employed to create this circuit, as well as some nanowires
`in the vicinity that are unused, both in the circuit path and off the path. For clarity, some nanowires
`are omitted from the figure.
`This figure illustrates a design challenge that must be resolved through simulation. Specifically, the
`circuit path is dictated by programming the required junction nanodevices into their “on” conductive
`state. Thus, ideally, it is desired that the “off” state of these devices conduct no current, i.e., that the
`“on/off” ratio be infinite. In practice, such ratios occupy a wide variety of non-ideal, finite values that
`depend on the device composition.
`For example, self-assembled monolayers of molecules, such as rotaxanes and pseudo-rotaxanes
[17], yield “on/off” current ratios from 2 to 11 [18, 19]. Other molecular devices, such as an
`oligo(phenylene-ethynylene) (OPE) molecule with a nitro sidegroup [20], produce similar ratios of
`approximately 10.
`Inorganic nanodevices, such as those based upon metal oxides, also have been
`shown to exhibit useful switching characteristics. Examples include Cu2O, Al2O3, NiO, and TiO2. For
`such devices, “on/off” ratios of 100 or more have been demonstrated [21], with those of Cu2O [22]
`and TiO2 [23] as high as 1000.
`In particular, the latter material is central to the “memristive”
`nanodevice [24] proposed by the Hewlett-Packard team that invented the FPNI architecture. In addition
`to metal oxides, other nanowire-based inorganic junction nanodevices also have been demonstrated to
`achieve device “on/off” ratios on the order of 1000 or more [25, 26].
`Since practical “off”-state nanodevices necessarily will conduct some current, the various CMOS
`cells and hypercells will not be isolated completely. Thus, the design of circuits in FPNI fabrics must
`account not only for the CMOS logic and nanodevices to be used in the circuits, but also those that
`are unused, yet adjacent to the circuits. A simplified example is shown in figure 2. This figure depicts
`two CMOS logic gates, shown at the upper left and lower right, connected through a nanowire crossbar
`array. Other gates are shown to share this nanowire array (through connections that are not shown in
`the figure). Here, a logic ‘1’ is intended as the voltage signal transmitted via the topmost horizontal
`nanowire. However, the presence of the ‘0’ signals pulls down the output, denoted ‘?’, through the
`resistive bridge that is formed from the “on”-state junction and the parallel collection of “off”-state
`junctions, which have finite resistance.
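In this simplified example, there is one vertical nanowire to consider with, say, N “off”-state
resistors. The output nanowire is then the output of a voltage divider formed by the “on”-state
junction (to the ‘1’ source) and the N parallel “off”-state junctions (to the ‘0’ sources), so its
fractional deviation from the full ‘1’ level is $R_{ON}/(R_{ON} + R_{OFF}/N)$. If the maximum tolerable
error in the ‘1’ voltage is ε (as a fraction of the total voltage), then

$$\frac{R_{ON}}{R_{ON} + R_{OFF}/N} < \epsilon,$$

i.e., the “on/off” current ratio, equal to $R_{OFF}/R_{ON}$, must exceed $N(1/\epsilon - 1)$. For the
complete design presented in this paper, $N = 7$, and assuming $\epsilon = 0.1$, this provides a
theoretical requirement that the “on/off” ratio exceed 63.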
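The bound can be checked numerically. The sketch below is a minimal Python version of the linear-resistor model just described, with $R_{ON}$ normalized to 1; the function names are illustrative and not part of the original design flow.

```python
def divider_error(on_off_ratio: float, n_off: int) -> float:
    """Fractional error of the '1' level: R_ON / (R_ON + R_OFF / N), with R_ON = 1."""
    r_on = 1.0
    r_off = on_off_ratio * r_on
    return r_on / (r_on + r_off / n_off)


def min_on_off_ratio(n_off: int, eps: float) -> float:
    """Smallest R_OFF / R_ON that keeps the error below eps: N * (1/eps - 1)."""
    return n_off * (1.0 / eps - 1.0)


if __name__ == "__main__":
    N, EPS = 7, 0.1
    print(f"required on/off ratio > {min_on_off_ratio(N, EPS):.0f}")
    for ratio in (2, 20, 200, 2000, 20000):
        print(f"ratio {ratio:>6}: error = {divider_error(ratio, N):.4f}")
```

Evaluated at the ratios used later in figure 5, this linear model flags 2 and 20 as violating the 10% bound and 200 and above as satisfying it; as section 3.1 notes, the full circuit simulation is stricter than this estimate.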
`However, this simplified analysis does not consider the impact of the other nanowire junctions
`implied in the figure, nor does it consider the other possible configurations of the additional logic
`gates. Also, importantly, it models the nanowire junction nanodevices as linear resistors and omits the
`nonlinearities present in experimentally demonstrated nanodevices. Furthermore, this analysis neglects
`the impacts of parasitic components, such as the nanowire resistances and the capacitances that couple
`the nanowires. In conjunction with the junction and nanowire resistances, these capacitances can affect
`the propagation of signals through the crossbar array as these signals change in value.
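One way to include more of the crossbar while keeping the linear-resistor assumption is ordinary nodal analysis: Kirchhoff's current law is written at every undriven node and the resulting linear system is solved. The sketch below (using numpy; the function names and node numbering are hypothetical) does this for an arbitrary conductance matrix and, for the single-nanowire case above, reproduces the divider result. Additional floating nanowires, and hence multi-hop sneak paths, can be modeled by adding nodes; device nonlinearity and RC parasitics are still omitted.

```python
import numpy as np


def node_voltages(conductance, fixed):
    """Solve a linear resistor network by nodal analysis.

    conductance: symmetric n x n array; entry [i, j] is the conductance (S)
                 between nodes i and j, or 0 if they are not connected.
    fixed:       dict {node: voltage} for nodes driven by ideal sources
                 (CMOS gate outputs in this sketch).
    Returns an array with the voltage of every node.
    """
    g = np.asarray(conductance, dtype=float)
    n = g.shape[0]
    lap = np.diag(g.sum(axis=1)) - g  # KCL matrix (weighted graph Laplacian)
    fixed_idx = sorted(fixed)
    free_idx = [i for i in range(n) if i not in fixed]
    v = np.zeros(n)
    v[fixed_idx] = [fixed[i] for i in fixed_idx]
    # Net current into every undriven node is zero: L_ff v_f = -L_fd v_d
    l_ff = lap[np.ix_(free_idx, free_idx)]
    l_fd = lap[np.ix_(free_idx, fixed_idx)]
    v[free_idx] = np.linalg.solve(l_ff, -l_fd @ v[fixed_idx])
    return v


def single_nanowire_output(on_off_ratio, n_off, r_on=2.5e6, v_high=1.0):
    """Node 0 is the output nanowire; node 1 drives '1' through the "on" junction;
    nodes 2..n_off+1 drive '0' through "off" junctions (hypothetical layout)."""
    r_off = on_off_ratio * r_on
    n = n_off + 2
    g = np.zeros((n, n))
    g[0, 1] = g[1, 0] = 1.0 / r_on
    for k in range(2, n):
        g[0, k] = g[k, 0] = 1.0 / r_off
    fixed = {1: v_high, **{k: 0.0 for k in range(2, n)}}
    return node_voltages(g, fixed)[0]


if __name__ == "__main__":
    for ratio in (2, 20, 63, 200, 2000):
        print(ratio, round(single_nanowire_output(ratio, n_off=7), 4))
```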
`The most effective way to evaluate these nanodevice and interconnect issues accurately and
`exhaustively is via the use of detailed system simulation, which takes into account the behavior of
`individual devices and parasitic components, as well as their behavior in aggregate. Such simulations
`are discussed in the next section.
`
`0005
`
`
`
`Performance Simulation and Analysis of a Hybrid Nanoprocessor
`
`5
`
`Figure 2. Simplified schematic of the nanowire crossbar interconnecting a set of CMOS logic gates.
`For the configuration depicted here, the upper-left logic gate is modeled as providing the input to
`the lower-right gate through a linear resistor. As seen here, the other gates may produce conflicting
`signals that corrupt the output nanowire, denoted ‘?’, via the finite-resistance “off”-state junctions
`that connect them.
`
`3. Simulations of the FPNI System
`
`3.1. Simulations of System Functionality
`
`The inventors of CMOL and of FPNI evaluated their respective systems by mapping the Toronto 20
`benchmark circuits [27] into their fabrics and examining the overall performance [5, 8]. In doing this,
`they focused on three primary metrics: circuit area, critical path delay, and dynamic power consumption
`(i.e., the portion of the total power that primarily is capacitive and is consumed during transitions in
`the digital state of a circuit). The circuit area was calculated directly from the mapping. The other
`two metrics were estimated using high-level analytical techniques, such as Elmore delay modeling [8].
`However, the nonidealities of the nanodevice behaviors are likely to result in unexpected system-
`level performance issues in CMOL- and FPNI-based systems. Such nonidealities are not amenable
`to simple, high-level analytical modeling. Instead, detailed computer-based simulation is required to
`evaluate the impacts of these behaviors fully. This is well known to designers of deep-submicron and
`nanometer CMOS, where accounting for the nonidealities of interconnect behavior is a key factor in the
`characterization and optimization of system performance. No CMOS system design can be completed
`without simulation at the layout level of the system or its subsystems. This is certain to be true to an
`even greater extent for nanoelectronic and hybrid CMOS/nano designs, in which ultra-miniaturization
`exacerbates the parasitic behaviors of interconnects relative to those of the underlying devices.
`To study the impacts of such parasitics in a nanoprocessing system, a full adder circuit was
`designed and mapped into a simulated FPNI fabric. This adder circuit takes three single-bit inputs
`and produces a two-bit output that is the binary sum of the inputs. This circuit is ubiquitous in digital
`logic, and therefore, its performance is indicative of that of larger systems, including nanoprocessors.
`Thus, detailed simulation of this circuit is conducted in lieu of the detailed simulation of an entire
`nanoprocessor, which would be computationally intractable, just as would be the detailed simulation
`of an entire commercial microprocessor.
`The full adder design is intended especially to provide insight into how the FPNI architecture
`might scale to larger circuit sizes. The design uses 11 NAND gates and two flip-flops. It requires four
of the 4 × 4 hypercells described in section 2. (These hypercells were designed with the full-adder circuit
`in mind; the same circuit would require two 6 × 7 hypercells from Snider and Williams [8], who did not
`optimize their design for this application.)
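For reference, a full adder can be built from NAND gates alone. The sketch below uses the common nine-NAND construction as a purely functional model; it is not the 11-NAND, two-flip-flop FPNI mapping described above, and it ignores timing and pipelining entirely.

```python
from itertools import product


def nand(a: int, b: int) -> int:
    return 1 - (a & b)


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Nine-NAND full adder; returns (sum, carry_out)."""
    n1 = nand(a, b)
    n2 = nand(a, n1)
    n3 = nand(b, n1)
    p = nand(n2, n3)        # p = a XOR b
    n4 = nand(p, cin)
    n5 = nand(p, n4)
    n6 = nand(cin, n4)
    s = nand(n5, n6)        # sum = p XOR cin
    cout = nand(n1, n4)     # carry = (a AND b) OR (p AND cin)
    return s, cout


if __name__ == "__main__":
    for a, b, cin in product((0, 1), repeat=3):
        s, cout = full_adder(a, b, cin)
        print(a, b, cin, "->", cout, s)
```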
`Detailed simulations of this hybrid full-adder circuit were performed using a methodology developed
originally for simulating nanomemory and nanoprocessor systems. This methodology, together with
the CAD environment and nanodevice models, is discussed in detail in prior publications [28–30]. Four
main steps are involved. First, empirical data are obtained for the desired nanodevices and interconnect
structures. Second, these data are encapsulated in models written in the Verilog-A language [28–30].
Third, a system-level schematic is assembled within the Cadence Virtuoso modeling software [31], using
models for each CMOS device and each nanodevice. Finally, the electrical behavior of the circuit is
simulated using the Cadence Spectre simulator [31].

Figure 3. A portion of an FPNI circuit highlighting the current leakage paths through the system.
Given inputs A and B, the arrows denote the stray currents flowing through “off” devices that amass
at one particular unused inverter (second from top).

`The empirical data used for modeling the nanodevices were obtained from published experimental
`results on rotaxane-based nanodevices [18, 19, 32]. These nanodevices exhibit exponential current-
`voltage (I-V) behaviors that are characteristic of many of the resistive nanodevices demonstrated to
date [20–22, 25, 26]. Using these experimental data, parameterized Verilog-A models were developed,
`through which characteristics such as nanodevice resistance could be varied by adjusting the parameters.
`For example, in initial simulations, the nanodevice “on” resistance was assumed to be 2.5 MΩ, which
`is consistent with experimental data [17–22, 25, 26, 32].
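The Verilog-A models themselves are not reproduced here. As a stand-in, the sketch below uses a symmetric sinh-shaped I-V, one simple phenomenological form for exponential characteristics; the functional form and the value of `v0` are assumptions, not the authors' fit to rotaxane data. It shows how a single parameter can set the low-bias “on” resistance (e.g. 2.5 MΩ) while an “on/off” ratio scales the “off” state.

```python
import math


def junction_current(v: float, r_on: float, is_on: bool, on_off_ratio: float,
                     v0: float = 0.25) -> float:
    """Hypothetical symmetric exponential I-V: I = (v0 / R) * sinh(v / v0).

    R = r_on in the "on" state and r_on * on_off_ratio in the "off" state.
    For |v| << v0 the junction behaves like a linear resistor R; at larger
    bias the current grows exponentially. v0 (volts) is an assumed fit parameter.
    """
    r = r_on if is_on else r_on * on_off_ratio
    return (v0 / r) * math.sinh(v / v0)


if __name__ == "__main__":
    r_on, ratio = 2.5e6, 2000
    for v in (0.01, 0.25, 0.5, 1.0):
        i_on = junction_current(v, r_on, True, ratio)
        i_off = junction_current(v, r_on, False, ratio)
        print(f"V={v:4.2f} V  I_on={i_on:.3e} A  I_off={i_off:.3e} A")
```

In this simplified form the “on/off” ratio is independent of bias; measured devices generally do not behave that way, which is one reason empirical models are preferred.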
`Models for the nanowire interconnects were based upon resistor-capacitor networks. These
interconnect models were constructed using the method of Steinhögl et al. [33] that also was employed
`by Snider and Williams [8]. As stated in section 2, the nanowires were assumed to be 15 nm wide with a
`pitch of 30 nm. The wire resistivity was set at 8.88 µΩ·cm and the substrate and coupling capacitances
`were 2 pF/cm and 1 pF/cm, respectively.
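A first-order RC model of a nanowire can be assembled from the parameters just listed. The sketch below assumes a square 15 nm × 15 nm cross-section (the thickness is not stated here), treats each 30 nm pitch as one RC segment, lumps the coupling capacitance to ground, and estimates delay with the Elmore metric mentioned in section 3.1. It uses the stated resistivity directly rather than reproducing the size-dependent model of [33].

```python
# Interconnect parameters stated in the text (SI units)
RESISTIVITY = 8.88e-8   # ohm*m  (8.88 uOhm*cm)
WIDTH = 15e-9           # m
PITCH = 30e-9           # m
C_SUBSTRATE = 2e-10     # F/m    (2 pF/cm)
C_COUPLING = 1e-10      # F/m    (1 pF/cm)

# Assumption, not given in the text: a square 15 nm x 15 nm cross-section.
THICKNESS = 15e-9


def segment_rc(length: float = PITCH) -> tuple[float, float]:
    """Resistance and capacitance of one nanowire segment of the given length."""
    r = RESISTIVITY * length / (WIDTH * THICKNESS)
    # Simplification: coupling capacitance is lumped to ground (neighbors static).
    c = (C_SUBSTRATE + C_COUPLING) * length
    return r, c


def elmore_delay(r_driver: float, n_segments: int, length: float = PITCH) -> float:
    """Elmore delay (s) at the far end of a uniform RC ladder.

    The ladder is driven through r_driver (e.g. an "on"-state junction);
    capacitor k sits after k wire segments, so its upstream resistance
    is r_driver + k * r_seg.
    """
    r_seg, c_seg = segment_rc(length)
    return sum((r_driver + k * r_seg) * c_seg for k in range(1, n_segments + 1))


if __name__ == "__main__":
    r_seg, c_seg = segment_rc()
    print(f"per 30 nm segment: R = {r_seg:.2f} ohm, C = {c_seg:.2e} F")
    for r_junction in (2.5e3, 57e3, 2.5e6):
        print(f"R_on = {r_junction:.1e} ohm: Elmore delay over 100 segments = "
              f"{elmore_delay(r_junction, 100):.3e} s")
```

Under these assumptions, a 2.5 MΩ junction driving 100 segments contributes a delay term more than three orders of magnitude larger than the distributed wire term, so in this toy model the junction resistance, not the wire, dominates delay.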
`System schematics were assembled as follows. First, a detailed CMOS layout was designed for the
`FPNI hypercell shown in figure 1(a). This layout was used to determine the physical dimensions of
`the nanowire interconnect network. Given the physical dimensions, the schematics were completed by
`combining the aforementioned nanodevice and nanowire models with Cadence Spectre models of 90-nm
`CMOS transistor devices.
`Using these models and schematics, analyses were carried out in the Cadence Spectre simulator
`to establish the functionality of FPNI circuits. This was done by simulating the behavior of individual
`logic gates within the aforementioned schematics. These simulations revealed that there can exist
`undesired “sneak leakage” current paths flowing throughout the nanowire interconnect array. This issue
`is depicted in figure 3. Since the nanowire interconnect is based on resistive nanodevice connections
`that have finite “on/off” ratios, current flows through both the desired “on”-state nanodevices and the
`unselected “off”-state nanodevices. In figure 3, the bold wires highlight the intended circuit path using
diamonds to denote the “on”-state nanodevices and large circles to indicate the “off”-state nanodevices.
The arrows represent one example of stray currents in the nanowires.

Figure 4. Transistor-level designs for the NAND and inverter gates used in the FPNI schematic.
These designs are modified from standard CMOS and FPNI implementations through the addition of
the uppermost and lowermost transistors. The “EN” signals that drive these transistors permit the
disconnection of unused CMOS logic cells from the power supply, thus reducing the leakage power
consumption of these cells.

`Detailed circuit simulations demonstrate that these currents can be large enough to disturb the
`voltage states of internal nodes of the CMOS logic network. This can partially turn on CMOS transistors
`that are intended to be unused (and therefore off). As a result, there can be short-circuit current
`paths within the CMOS circuitry itself. Such CMOS leakage current can result in significant power
`consumption.
`Thus, before carrying out further simulations, the CMOS logic cells were redesigned to prevent the
`CMOS sneak leakage paths. Each CMOS logic gate in the revised design is implemented with “sleep”
`transistors that allow for power to be disconnected from the unused circuits. Example modified CMOS
`circuits are shown in figure 4. In these examples, the sleep transistors are inserted next to each power
`supply line.
`Using these design refinements, further simulations were conducted to assess the functionality of
FPNI systems. In particular, the “on/off” ratio was expected to have a significant impact on the currents
`that flow through both the intended and undesired interconnect paths. To verify this expectation,
`simulations were performed by varying the “on/off” ratio, keeping the “on” resistance fixed at 2.5 MΩ.
`Figure 5 shows the results of this simulation. The waveforms depicted in this figure confirm the
`existence of a minimum threshold “on/off” ratio in order to guarantee correct logic operation. At
`low “on/off” ratios, such as in curve (a), almost no correct output values are attained. However, as
`the “on/off” ratio is increased, the adder begins to work as intended. The simulation shows that the
`circuit functions at a ratio of 200, albeit with some voltage waveform degradation still visible during
`the transitions between ‘0’ and ‘1’ states. Further contrast between “on” and “off” resistances yields
`the correct, full ‘0’-to-‘1’ output. Because it takes many more details into account, this simulation
`improves upon and gives a somewhat higher, more accurate estimate of the “on/off” ratio requirement
`than does the illustrative, algebraic analysis that was conducted above in section 2.
`These simulations show that once fabricated, the FPNI architecture can be made to work using
`experimentally demonstrated nanodevices. However, the simulations illustrate that only nanodevices
`with an “on/off” ratio of 200 or more are suitable for this architecture. Of the devices that are suitable,
`there exist a variety of options for the “on” resistance, “off” resistance, and “on/off” ratio. Thus, it is
`important to examine how these parameters may be tuned to optimize the performance of a functioning
`FPNI design. Simulations to address such questions are described in the next subsection.
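The screening implied by this threshold can be written out using the ratios quoted in section 2. The values below are the upper figures reported there, which is an optimistic simplification; the dictionary and function names are illustrative.

```python
# Best "on/off" current ratios quoted in section 2 (order-of-magnitude values).
REPORTED_RATIOS = {
    "rotaxane / pseudo-rotaxane monolayers": 11,
    "OPE with nitro sidegroup": 10,
    'metal oxides, general ("100 or more")': 100,
    "Cu2O": 1000,
    "TiO2": 1000,
    "nanowire-based inorganic junctions": 1000,
}

MIN_RATIO = 200  # threshold found in the full-adder simulations


def suitable_devices(ratios: dict, threshold: float = MIN_RATIO) -> list:
    """Return, sorted by name, the device families meeting the threshold."""
    return sorted(name for name, ratio in ratios.items() if ratio >= threshold)


if __name__ == "__main__":
    print(suitable_devices(REPORTED_RATIOS))
```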
`
`0008
`
`
`
`Performance Simulation and Analysis of a Hybrid Nanoprocessor
`
`8
`
`1.25
`
`2.5
`
`3.75
`
`1.25
`
`2.5
`
`3.75
`
`1.25
`
`2.5
`
`3.75
`
`1.25
`
`2.5
`
`3.75
`
`5
`
`5
`
`5
`
`5
`
`(a)
`
`6.25
`
`7.5
`
`8.75
`
`10
`
`(b)
`
`6.25
`
`7.5
`
`8.75
`
`10
`
`(c)
`
`6.25
`
`7.5
`
`8.75
`
`10
`
`(d)
`
`6.25
`
`7.5
`
`8.75
`
`10
`
`(e)
`
`1.25
`
`2.5
`
`3.75
`
`5
`time (µs)
`
`6.25
`
`7.5
`
`8.75
`
`10
`
`01
`
`0
`
`01
`
`0
`
`01
`
`0
`
`01
`
`0
`
`01
`
`0
`
`voltage (V)
`
`Figure 5. Simulations of the FPNI full-adder circuit. The waveforms shown here depict the voltage
`of the carry output bit for various values of the junction nanodevice “on/off” ratio. This ratio was
`set via simulation to (a) 2, (b) 20, (c) 200, (d) 2,000, and (e) 20,000, respectively. The simulated
`waveforms show that correct behavior is not obtained for the two lowest values of the “on/off” ratio,
`and also that the waveform voltages achieve full ‘0’-to-‘1’ swing only for the two highest values.
`
[Figure 6 plot: (a) propagation delay (ns), (b) power (µW), and (c) energy per addition (fJ), each on logarithmic axes versus nanodevice resistance (Ohms), 1K to 100M.]
`
`Figure 6. Three-part plot showing the impact of nanodevice “on”-state resistance on circuit delay and
`energy consumption for the FPNI full adder circuit. This plot provides (a) circuit delay, (b) average
`power, and (c) energy per addition operation, all as a function of the nanodevice “on” resistance. The
`optimum resistances for circuit delay, power, and energy per addition are denoted by vertical dotted
`lines at 2.5 KΩ, 25 MΩ, and 57 KΩ, respectively. The “on/off” ratio is fixed at 2000 in all cases.
`
`3.2. Simulations to Optimize System Performance
`
`Simulations were carried out to determine the impact of nanodevice resistances on circuit performance
`for a functioning FPNI adder circuit.
`In these simulations, the circuit delay, power, and energy
`consumption were evaluated for various nanodevice “on” resistances. The “off” resistances also were
`varied so that the “on/off” ratio was fixed at 2000 in all cases (which ensures correct functionality).
`This ratio is a reasonable basis for further simulations since, as discussed above in section 2, several
`appropriate devices have been demonstrated with “on/off” ratios exceeding 1000.
The results of these simulations are shown in figure 6. Figure 6(a) details the impact on signal
propagation delay. This simulation shows a monotonic increase in delay with the nanodevice “on”
resistance, assuming a fixed “on/off” ratio. At 24 KΩ, the minimum nanodevice “on” resistance
proposed by Snider and Williams [8], the propagation delay is approximately 0.7 ns, supporting a clock
speed of up to 1.4 GHz. In contrast, figure 6(b) shows that power consumption for this circuit decreases
as the “on” resistance increases, reaching its minimum at 25 MΩ “on” resistance.

                              CMOS          FPNI full adder              Xilinx
                              full adder    Fastest      Least energy    Spartan-3
Circuit delay (ps)            148           354          698             610
Leakage power                 10.17 nW      33.89 µW     3.79 µW         36 nW
Dynamic energy (nW/MHz)       4.30          34.24        24.45           240
Energy per addition (fJ)      4.30          46.25        27.09           240

Table 1. Comparison of an FPNI full-adder circuit with conventional CMOS implementations.
The FPNI full-adder data are the best-case data exhibited in figure 6. This FPNI full adder is
compared with a custom, optimized CMOS adder and a Xilinx Spartan-3 [34–36] single-logic-slice
FPGA implementation. For each circuit, the circuit delay is given, together with the average leakage
power, average dynamic energy, and average total energy per addition.

`It is clear from figure 6 that there is an optimization tradeoff between delay and power.
`Furthermore, the product of these two metrics is a single metric that measures the energy per addition
`operation. This common metric strikes a useful balance between the optimization of speed and power
`consumption. Figure 6(c) provides this data. Here, it is seen that at low nanodevice resistances, total
`power dominates due to leakage, while delay remains relatively flat. Conversely, at high nanodevice
`resistances, delay dominates strongly, while power flattens out. As a result, in the power-delay product,
`which is the energy per addition, a minimum exists at approximately 57 KΩ. This presents system
`designers with a middle-ground option between optimizing for circuit speed and optimizing for power
`efficiency.
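The delay-power tradeoff can be made concrete with a toy model. The sketch below assumes delay rises linearly with the “on” resistance and power falls as 1/R; both forms and all parameter values are illustrative assumptions, not fits to figure 6. Their product then has a single interior minimum, $R^* = \sqrt{d_0 b / (a p_0)}$, which a log-spaced sweep recovers.

```python
import math

# Toy model (all forms and values are illustrative assumptions, not fits to figure 6):
#   delay D(R) = d0 + a * R   -- rises with the junction "on" resistance R
#   power P(R) = p0 + b / R   -- leakage through the crossbar falls as R rises
D0, A = 1e-10, 1e-15     # s, s/ohm
P0, B = 1e-6, 1e-2       # W, W*ohm


def energy(r: float) -> float:
    """Energy per operation = power-delay product."""
    return (P0 + B / r) * (D0 + A * r)


def analytic_optimum() -> float:
    # E(R) = P0*D0 + A*B + P0*A*R + B*D0/R, so dE/dR = P0*A - B*D0/R**2 = 0
    return math.sqrt(B * D0 / (P0 * A))


def swept_optimum(r_min: float = 1e3, r_max: float = 1e8, points: int = 2001) -> float:
    """Log-spaced sweep over R, like the resistance axis of figure 6."""
    lo, hi = math.log10(r_min), math.log10(r_max)
    grid = [10 ** (lo + i * (hi - lo) / (points - 1)) for i in range(points)]
    return min(grid, key=energy)


if __name__ == "__main__":
    print(f"analytic optimum: {analytic_optimum():.3e} ohm")
    print(f"swept optimum:    {swept_optimum():.3e} ohm")
```

The minimum exists because one factor grows with R while the other shrinks; the actual FPNI curves are not of this simple form, so the location of their optimum must come from simulation.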
`To place the results shown in figure 6 in context, it is valuable to compare these results against the
`performance of a conventional reference circuit. The ideal reference circuit would be one designed for
`a reconfigurable CMOS technology such as one of several commercially available FPGAs based upon
`90-nm CMOS [34, 35]. Alternatively, a custom CMOS full-adder circuit could be used as a reference.
`Such a circuit would be tailored specifically to compute additions and would not be reconfigurable.
`This circuit would provide an upper bound for 90-nm full adder performance in terms of speed and
`energy efficiency.
`Table 1 provides a comparison of two FPNI full adder versions to these reference circuits. The two
`FPNI versions, denoted “fastest” and “least energy” in the table, are the designs using nanodevice “on”
`resistances of 2.5 KΩ and 57 KΩ, respectively, as determined via the simulation data shown in figure
`6. As expected, the fully customized CMOS implementation outperforms FPNI both in delay and in
`energy consumption. However, when compared against a state-of-the-art reconfigurable CMOS FPGA,
`the FPNI versions perform better in simulation. As table 1 shows, the FPNI full adder can be made
`up to 70% faster than the FPGA and simultaneously five times as energy efficient. Alternatively, the
`FPNI version can be made nine times as energy efficient with only a slight cost to speed. Overall, the
FPNI full adders perform closer to the custom CMOS version than to the reconfigurable one.
`
`In particular, it is seen that the dynamic energy consumption of the programmable hybrid circuit is
`much closer to the custom CMOS than to the programmable CMOS FPGA. This is due to the reduced
`interconnect capacitance provided by the nanowire interconnect. In contrast, the leakage power is much
`higher in the FPNI circuit than in either CMOS reference circuit. As discussed above in section 3.1,
`this is due to the CMOS gate voltage offsets generated by the highly resistive nanodevice network.
`Nevertheless, due to the low dynamic energy consumption of the FPNI circuit, the overall energy
`consumption for this circuit is seen to be lower than that of its closest conventional kin, the CMOS
`FPGA.
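The tabulated totals appear consistent with a simple accounting in which the energy per addition equals the dynamic energy plus the leakage power multiplied by the circuit delay. The sketch below checks that reading against Table 1 and recomputes the speed and energy ratios quoted in the text; the accounting rule is an inference from the numbers, not a formula stated by the authors.

```python
# Table 1 values: delay (ps), leakage power (W), dynamic energy (fJ; 1 nW/MHz = 1 fJ),
# and the tabulated energy per addition (fJ).
TABLE1 = {
    "custom CMOS":       (148, 10.17e-9, 4.30, 4.30),
    "FPNI fastest":      (354, 33.89e-6, 34.24, 46.25),
    "FPNI least energy": (698, 3.79e-6, 24.45, 27.09),
    "Xilinx Spartan-3":  (610, 36e-9, 240.0, 240.0),
}


def energy_per_addition_fj(delay_ps: float, leak_w: float, dyn_fj: float) -> float:
    """Dynamic energy plus leakage power integrated over one circuit delay.

    Assumption: one addition occupies one delay period. 1 W * 1 ps = 1e-12 J = 1000 fJ.
    """
    return dyn_fj + leak_w * delay_ps * 1e3


if __name__ == "__main__":
    for name, (delay, leak, dyn, total) in TABLE1.items():
        print(f"{name:18s} computed {energy_per_addition_fj(delay, leak, dyn):7.2f} fJ, "
              f"tabulated {total:7.2f} fJ")
    fpga_delay, _, _, fpga_energy = TABLE1["Xilinx Spartan-3"]
    fast_delay, _, _, fast_energy = TABLE1["FPNI fastest"]
    _, _, _, lean_energy = TABLE1["FPNI least energy"]
    print(f"speed gain of fastest FPNI over FPGA:  {fpga_delay / fast_delay - 1:.0%}")
    print(f"energy ratio FPGA / FPNI fastest:      {fpga_energy / fast_energy:.1f}x")
    print(f"energy ratio FPGA / FPNI least energy: {fpga_energy / lean_energy:.1f}x")
```

The leakage term is negligible for both CMOS references but contributes roughly a quarter of the total for the fastest FPNI variant, matching the observation that FPNI leakage is high while its dynamic energy is low.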
`Thus, the detailed system simulation results provided here show that by using existing, experimen-
`tally demonstrated nanoelectronic devices [23,25,26], a range of system performance options superior to
`conventional CMOS is available to designers of prospective nano-enabled reconfigurable logic systems
`such as FPNI. This is the case even though the present state of junction nanodevice research has
`
`0010
`
`
`
`Performance Simulation and Analysis of a Hybrid Nanoprocessor
`
`10
`
`produced devices that have relatively high resistance values or that are otherwise less suitable than
`their conventional counterparts. As a result, near-term opportunities exist to improve performance over
`conventional CMOS by pursuing the fabrication and demonstration of entire systems that hybridize
`CMOS with presently available nanodevices.
`
`4. Summary and Conclusions
`
`In their paper introducing the FPNI architecture [8], Snider and Williams showed that FPNI
performance could be optimized by exploiting design flexibility at the architectural level. For example,
`by changing the number and/or type of gates within a hypercell, as well as the number of inputs to
`these gates, various area, delay, and power characteristics could be obtained for a number of different
`benchmark circuits.
`This paper goes beyond that work to show that nanodevice and circuit customizations play
`an even more fundamental role in the design and optimization of functioning FPNI systems. Such
`customizations determine whether or not the system will function correctly. Also, even in a correctly
`functioning system, the CMOS subsystem must be designed to compensate for the non-ideal behavior
`of the nanodevices. Integration of the design of the nanodevices with that of the CMOS circuits, where
`each is customized to function with the othe