`
`
`A 1.8-V 128-Mb 125-MHz Multilevel Cell Flash
`Memory With Flexible Read While Write
`
`Daniel Elmhurst and Matthew Goldman, Member, IEEE
`
`Abstract—Application of multilevel cell (MLC) technology
`to a flexible read-while-write flash memory has been achieved
`through the use of a highly optimized sensing architecture. The
`goal of this implementation is to provide performance on par with
`single-bit-per-cell implementations while significantly reducing
`the overall die size. In order to achieve the required high-speed
`operation using MLC structures, all offsets to the sense amplifier
`were minimized and the column load and local sense amplifier
`were optimized to provide ample differential gain. Through the
`use of these optimization techniques, a 1.8-V MLC-based flexible
`read-while-write memory with 125-MHz continuous burst and
`40-ns random read access time has been manufactured. Using
`a 0.13-µm technology, this new device provides a die size that
`is 25% of the size of the equivalent single-bit-per-cell device
`manufactured on a 0.18-µm technology.
`
`Index Terms—Active current mirror, address transition detec-
`tion (ATD), amplifier, drain biasing, flash, multilevel cell (MLC),
`nonvolatile memory, NOR flash, parallel sensing, read while write
`(RWW), serial sensing.
`
`I. INTRODUCTION
`
`MULTILEVEL cell (MLC) storage in Flash memory
`was first reported at the IEEE International Solid-State
`Circuits Conference in 1995 as a means to reduce the cost per
`bit by almost 50% for any given process lithography [1]. In
`recent years, more and more applications have come to require
`a low-cost high-performance flexible memory architecture that
`includes nonvolatile storage of both code and data [2]. This
`paper is a progression in the MLC development cycle with
`the introduction of a 0.13-µm 1.8-V 128-Mb Flash memory
`that achieves two-bits-per-cell storage. High-performance syn-
`chronous read operation for code execution and a simultaneous
`read and write operation with a flexible boundary between the
`code and data partitions are achieved. Direct execution of code
`can be performed with an initial latency of 40 ns followed by a
`burst operation at 125 MHz. During any asynchronous or syn-
`chronous read operation, the device can perform a background
`program or erase operation in another partition of the memory.
`This is the first 1.8-V MLC read-while-write (RWW) design
`that supports 125-MHz synchronous burst operation.
`This paper focuses primarily on the sensing architecture nec-
`essary to allow flexible RWW in an MLC memory. Section II
`reviews the serial sense architecture along with some other key
`architectural and design features that ensure a robust sensing op-
`eration. Section III discusses details of the design of the column
`
`Manuscript received April 11, 2003; revised July 2, 2003.
`The authors are with Intel Corporation, Folsom, CA 95630 USA (e-mail:
`matthew.goldman@intel.com).
`Digital Object Identifier 10.1109/JSSC.2003.818144
`
`Fig. 1. Serial sense reference levels and search algorithm.
`
`load and sense amplifier circuitry that convert the array-to-reference
`current differential into a logical one or zero. Section IV
`describes the sequence of events during a complete sensing op-
`eration. Section V provides some additional detail about the ref-
`erence cell array used to read and verify cell placement. Finally,
`Section VI summarizes the key points of the paper with refer-
`ences to the product implementation that first captured the ar-
`chitectural and design improvements described.
`
`II. SENSING ARCHITECTURE
`
`The MLC analog-to-digital conversion is achieved using a se-
`rial sensing technique [3]. With this technique, the array cell
`current is compared to a reference current that has been tuned
`to the approximate midpoint in the overall cell current range
`and the result is captured as the most significant bit (MSB) of
`data. Depending on whether the array cell develops more or less
`current than the initial reference cell, a new reference cell is
`chosen such that the updated reference current is approximately
`midway between the initial reference current and upper or lower
`boundary current respectively. The array current is then com-
`pared to the new reference and the least significant bit (LSB) of
`data is captured. Fig. 1 shows the relative reference levels and
`binary search algorithm used to read the data with a serial
`sense scheme.
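The two-pass search described above can be sketched in Python. This is only an illustration of the decision flow, not of the circuit: the reference current values, the function name `serial_sense`, and the convention that more current than the reference reads as a logical one are all assumptions, since the paper gives none of them.

```python
# Hypothetical reference currents in microamps; the paper gives no values.
R1, R2, R3 = 10.0, 20.0, 30.0  # R2 sits near the middle of the cell current range


def serial_sense(cell_current_ua):
    """Return (msb, lsb) for one cell using two sequential comparisons."""
    # Pass 1: compare against the midpoint reference to resolve the MSB.
    # Assumed convention: more current than the reference reads as 1.
    msb = 1 if cell_current_ua > R2 else 0
    # Pass 2: the MSB result selects the upper or lower reference.
    second_ref = R3 if msb else R1
    lsb = 1 if cell_current_ua > second_ref else 0
    return msb, lsb


for current in (5.0, 15.0, 25.0, 35.0):
    print(current, serial_sense(current))
```

Each cell is resolved with two comparisons on a single sense amplifier, which is the property the serial scheme trades against the extra sense pass.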
`Given the goal of minimizing the overall sense time and re-
`sultant access time, one might wonder why a parallel sensing
`scheme was not chosen. As shown in Fig. 2, a parallel sensing
`implementation would provide both the MSB and LSB data in
`the time it takes to complete a single sense operation (plus the
`incremental time to decode the outputs). In addition to the ob-
`vious area penalty of implementing three times the number of
`sense amplifiers, this approach introduces a significant capaci-
`tive offset between the two inputs to each amplifier. Notice that
`the array cell is connected to the input of each of the three am-
`plifiers while each reference is connected to only one amplifier
`
`0018-9200/03$17.00 © 2003 IEEE
`
`Petitioners SK hynix Inc., SK hynix America Inc. and SK hynix memory solutions Inc.
`Ex. 1024, p. 1929
`
`
`
`
`IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 38, NO. 11, NOVEMBER 2003
`
`reference currents, then the resultant output data could be com-
`promised. To address this concern, there are duplicate bitlines
`connected from the two inputs of the sense amplifier and run-
`ning across the length of the array. Depending on the address
`selected, one of these bitlines is connected to the desired cell
`in the array while the other is connected to a reference array.
`Referring to Fig. 4, the array flash cell is connected by a local
`bitline (LBL) to the odd global bitline (OGBL) and then to the
`odd sense input (OSEN). The even sense input (ESEN), on the
`other hand, is connected directly to the reference array, but is
`also connected to the even global bitline (EGBL) which is fur-
`ther connected to an equivalent number of deselected LBLs (not
`shown in the diagram). Although this implementation doubles
`the density of bitlines over the main array, it has the benefit of
`more closely matching the capacitance at the input of the sense
`amplifier. Furthermore, by repeatedly twisting the even and odd
`bitline pair along their length, the susceptibility to differential
`noise can be significantly reduced. The twisting ensures that
`each bitline of a given pair is adjacent to any spurious noise
`source that might occur along their length, essentially rendering
`it to be common mode to the sense amplifier.
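The effect of twisting can be illustrated with a toy coupling model. Everything here is an assumption made for the sketch: a single aggressor runs alongside the pair for its whole length, the line on the adjacent track picks up `k_near` units of noise per segment, and the line one track further away picks up `k_far`. The function name and the numbers are hypothetical.

```python
def differential_noise(segments, twisted, k_near=1.0, k_far=0.4):
    """Sum noise coupled onto a bitline pair from an aggressor on one side.

    segments: number of equal-length sections along the pair.
    k_near / k_far: assumed per-segment coupling onto the line adjacent to,
    or one track away from, the aggressor (arbitrary units).
    """
    even = odd = 0.0
    for i in range(segments):
        # With twisting, the pair swaps tracks at every segment boundary.
        even_is_near = (i % 2 == 0) if twisted else True
        even += k_near if even_is_near else k_far
        odd += k_far if even_is_near else k_near
    return even - odd  # what the sense amplifier sees differentially


print("untwisted:", differential_noise(8, twisted=False))
print("twisted:  ", differential_noise(8, twisted=True))
```

In this model the untwisted pair accumulates a differential error proportional to its length, while the twisted pair with an even number of segments picks up the same total on both lines, so the disturbance appears as common mode.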
`A second factor that must be accounted for when optimizing
`the speed and robustness of the design is the variation between
`the multiple sense amplifiers that are used to perform a high-
`bandwidth read. To ensure minimal layout induced offsets, the
`local sense amplifier is divided into two identical half sense cir-
`cuits so that this half sense circuit along with a single instance
`of the drain bias circuitry and active current mirror can be drawn
`in such a way that it can be easily stepped and repeated to create
`an array of sense amplifiers. In this way, each half sense am-
`plifier sees identical layout features on all sides. To complete
`the layout matching, even the half sense cells at the edge of the
`sensing array have a dummy half sense cell placed adjacent to
`them so that these edge amplifiers perform the same as the others
`in the array.
`The third factor that was considered when optimizing the
`speed and robustness of the design is variation that could occur
`between the placement of the verified cell during a program or
`erase operation and the data as determined during a subsequent
`read operation. To minimize this type of offset, it is critical that
`the exact same sense circuits be used for both the verify oper-
`ation that takes place as part of the write and erase algorithms
`and the read operation. As such, the design has to allow for a
`read in one partition while simultaneously supporting a verify
`operation in any other partition. Referring to Figs. 4 and 5, this
`is accomplished by creating two sets of identical global sense
`amplifiers plus duplicate busses between the local and global
`sense circuits as well as duplicate bus driver circuits. With this
`design, the same local serial sense circuit is used to verify the
`cell current as is used to read the cell current. The only differ-
`ence is that the output from the local sense amplifier is directed
`onto a different local-to-global bus. Since the local sense am-
`plifier is latched to full rail data, splitting the read and verify
`paths at this point creates negligible offsets. As shown in Fig. 5,
`since the local sense outputs are passed to the global sense cir-
`cuits as a differential signal, the same type of bus twisting as
`was described for the global bitlines is employed to reduce the
`susceptibility of the global sense amplifier to input offset. This
`
`Fig. 2. Parallel sense block diagram.
`
`Fig. 3. Hypothetical MLC threshold voltage histogram.
`
`input. The only two ways to overcome the inherent capacitance
`mismatch with this scheme are to either add even more devices
`to attempt to balance the loading or to allow more time in the
`sense cycle for the displacement currents that are charging this
`offset capacitance to dissipate.
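For comparison, a parallel sense can be sketched as three simultaneous comparisons followed by a decode step. The reference currents and the name `parallel_sense` are hypothetical, and the encoding of the three comparator outputs into two bits is an assumed convention rather than something the paper specifies.

```python
R1, R2, R3 = 10.0, 20.0, 30.0  # hypothetical reference currents in microamps


def parallel_sense(cell_current_ua):
    """Compare the cell against all three references at once, then decode."""
    # Three comparators fire together, giving a thermometer code.
    above = [cell_current_ua > ref for ref in (R1, R2, R3)]
    level = sum(above)             # number of references exceeded, 0..3
    return level >> 1, level & 1   # (msb, lsb)


# The array cell drives every amplifier; each reference drives only one.
ARRAY_INPUT_LOADS = 3
REFERENCE_INPUT_LOADS = 1

for current in (5.0, 15.0, 25.0, 35.0):
    print(current, parallel_sense(current))
```

The two load counts at the end restate the mismatch discussed above: the array-side node fans out to three amplifier inputs while each reference-side node sees one.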
`In addition to choosing the best sense architecture to mini-
`mize capacitive mismatch, a robust sense design must also focus
`on minimizing all the other sources of offset between a cell’s
`initially programmed value and the value read back through the
`sense path. Referring to Fig. 3, it can be seen that although the
`ideal distribution of programmed cells may provide a relatively
`wide spacing between any array cell threshold voltage and the
`nearest reference cell thresholds, the accumulation of offset and
`mismatch components can reduce the resulting differential cur-
`rent to such a small value that the read speed of the part, if a ro-
`bust read is even possible, must be severely reduced. Although
`there are many statistical and process related factors that impact
`the robustness of the read, three factors in particular are easily
`addressed through the design: capacitance mismatch and dif-
`ferential noise, sense-amplifier-to-sense-amplifier offsets, and
`read to verify offsets.
`The first of these factors is the reduction of non-data-related
`offsets at the inputs to the sensing circuits. For example, if a
`noise source were to cause a differential offset with a magni-
`tude on the same order as the differential between the array and
`
`
`ELMHURST AND GOLDMAN: MLC FLASH MEMORY WITH FLEXIBLE RWW
`
`
`Fig. 4. Serial sense architecture.
`
`SEN node [2], [3]. For 1.8-V high-performance MLC designs,
`this circuit had numerous disadvantages, such as voltage head-
`room and inadequate signal-to-noise margin. Voltage headroom
`is an even greater concern in MLC designs due to the larger required
`current window than in single-bit-per-cell designs. As a
`result, the column load creates a significant area impact for ei-
`ther an active or a passive device. Fig. 6 shows the implementa-
`tion of the active current mirror column load which was critical
`to realize the 40-ns access time. The characteristics of an ideal
`active current mirror as they apply to an MLC sensing scheme
`are illustrated in Fig. 7. The RIN branch I–V curve represents
`the diode-connected device, which provides a low impedance,
`resulting in fast settling for the reference. The reference column
`load device (MCL1) is biased with a flash cell reference current.
`The three points along the RIN I–V curve represent the transistor
`operating points for the three reference cells: R1, R2, and
`R3. The column load device (MCL0) is biased with an array cell
`current. The three I–V curves represent array cells with their
`threshold voltages placed in the state above or below each read
`reference cell. The SIN/RIN nodes are equalized at the intersec-
`tion of the two curves and then, when equalization is turned off,
`the small difference in array cell current relative to the reference
`current creates a large change in voltage. As a result, this scheme
`achieves large gain and quickly amplifies the signal, which fa-
`cilitates fast sense times.
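A first-order estimate shows why the mirrored node develops margin quickly. The sketch below applies dV = dI × R to both branches of the column load. The transconductance, the output resistance, and the current difference are assumed values chosen only for illustration; the paper does not give device parameters, and a real design would also be bounded by the 1.8-V supply.

```python
def node_swing_mv(delta_i_ua, impedance_kohm):
    """First-order small-signal swing: dV = dI * R (uA times kOhm gives mV)."""
    return delta_i_ua * impedance_kohm


# All values below are assumptions chosen for illustration only.
GM_US = 100.0                # transconductance of the diode-connected device, in uS
RIN_KOHM = 1000.0 / GM_US    # diode-connected branch looks like 1/gm = 10 kOhm
SIN_KOHM = 500.0             # output resistance seen at the mirrored (SIN) node
DELTA_I_UA = 1.0             # array-minus-reference current after equalization ends

rin_swing = node_swing_mv(DELTA_I_UA, RIN_KOHM)
sin_swing = node_swing_mv(DELTA_I_UA, SIN_KOHM)
print(f"RIN moves about {rin_swing:.0f} mV, SIN about {sin_swing:.0f} mV")
```

Under these assumed numbers the same current difference moves the high-impedance SIN node fifty times further than the low-impedance RIN node, which is the sense in which the scheme converts a small current difference into a large voltage change.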
`The drain bias signal in Fig. 6 provides the reference voltage
`for the cascode device (MCASC) to not only ensure that
`the flash cell meets minimum drain voltage requirements, but
`also to ensure the flash cell is not disturbed by a high drain
`
`Fig. 5. Read-while-write circuit implementation.
`
`can be particularly significant given the long routing necessary
`to bus the data from the farthest local sense amplifier circuit in
`the array to the global sense circuitry in the periphery.
`
`III. ACTIVE CURRENT MIRROR AND SENSE
`AMPLIFIER CIRCUITS
`
`The local sense amplifier is composed of the column load,
`latched sense amplifier, and the MSB sense reference latch. Pre-
`vious generation designs utilized an n-channel transistor as a re-
`sistor for the column load and a bitline precharger to charge the
`
`
`Fig. 6. Active current mirror column load.
`
`Fig. 7. Active current mirror column load I–V curve.
`
`voltage. Additionally, the cascode device provides isolation for
`the low capacitance fast switching SIN/RIN nodes from the
`high-capacitance bitlines, SENO/SENE. To maximize perfor-
`mance, the design guarantees that the diode-connected branch
`is always tied to the bitline that connects to the reference flash
`cell (MCLOE). By doing so, the main array flash cell
`bitline is tied to the higher-gain, lower-capacitance SIN node.
`Devices MEQ provide equalization for the SIN/RIN
`and SENE/SENO nodes to guarantee that they are at the same
`voltage level prior to building margin.
`
`IV. SENSE OPERATION
`
`The entire sense cycle is timed by signals that are generated
`by the address transition detector (ATD) circuit. At a clock tran-
`sition, the address is decoded to select the proper wordline and
`bitline (Fig. 8). Once the bitline is selected, the equalization de-
`vices equalize the bitlines while providing maximum charging
`
`Fig. 8. Sensing simulation waveforms.
`
`capability. The equalization devices are released and the dif-
`ferential current between the reference and selected flash cell
`rapidly creates a voltage differential on the inputs to the local
`sense amplifier. During this time, the local sense to global sense
`differential bus is being initialized to ground. To achieve fast
`grounding, the pull-down devices are distributed in the local
`sense amplifiers throughout the midsection of the chip. Once
`adequate differential margin is achieved, the local sense ampli-
`fier latches the MSB data. The local sense MSB data is driven
`to the global sense amplifier in differential pairs. At the same
`time, the MSB sense reference latch latches the MSB data to
`select the reference cell for the LSB sensing (Fig. 4). Since
`the main array bitline and reference bitline capacitances are
`already charged, only a short equalization time is necessary for
`the LSB sensing. When the equalization for the second sensing
`is complete, the equalization devices are released to rapidly
`develop differential margin at the local sense amplifier. The
`local-to-global differential bus is again initialized to ground.
`The local sense amplifier differential margin is latched in the
`local differential sense latch. The local sense LSB data is driven
`to the global sense amplifier in differential pairs and latched in
`the global sense amplifier, thus completing the two-bit sensing
`sequence. As illustrated in Fig. 8, a two-pass sensing
`operation typically completes in 30 ns. Each segment of the cycle,
`from wordline and bitline selection through equalization and
`margin building to latching the data locally and then globally,
`is timed and controlled by the ATD. This optimizes
`performance and minimizes power consumption.
`
`V. REFERENCE ORGANIZATION AND OPERATION
`
`As described in the prior section regarding sense operation,
`a small array of reference cells is required for each local sense
`amplifier. This array, which is physically located adjacent to the
`local sense as shown in Fig. 9, contains multiple distinct word-
`lines and bitlines that are used to select the appropriate reference
`cell within this “mini” array. As shown in Fig. 4, the selection of
`a particular local bitline through the serial sense reference de-
`code logic controls which of the read reference cells is used for
`the sense operation. The local bitline and corresponding reference
`cell are then connected to the sense amplifier through a secondary
`device indicated in the diagram as the reference global
`
`
`Fig. 9. 128-Mb die photo.
`
`Y-select (GYR). The biasing of these bitline select devices as
`well as the reference array wordline is identical to the biasing
`of the corresponding devices in the main array.
`In addition to the read reference wordline shown, there is
`a second wordline (not shown) connected to three additional
`reference cells that are used when verifying the cell threshold
`placement after programming an array cell. This second word-
`line is connected to the PV1, PV2, and PV3 reference cells refer-
`enced in Fig. 3. Bitline control for this second wordline is iden-
`tical to that used for a read operation. The MSB verify cycle
`compares an array cell to the PV2 reference cell, while the ref-
`erence for the LSB verify cycle (either PV1 or PV3) is chosen
`by the serial sense reference decoder based on the outcome from
`the MSB sense. As mentioned in Section II, the purpose of using
`distinct reference cells for the program verification is to ensure
`that there is enough space between the threshold voltage for any
`given array cell and the nearest read reference cell to overcome
`all sense related offsets and still ensure a robust read at the re-
`quired speed.
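The relationship between the verify and read references can be sketched as the same two-pass comparison run against two different reference sets. The threshold voltages below are hypothetical, as is the assumption that each PV level sits a fixed distance above the corresponding read reference; the paper only states that distinct verify references are used to guarantee spacing.

```python
# Hypothetical threshold voltages in volts; the paper gives no values.
READ_REFS = {"R1": 2.0, "R2": 3.0, "R3": 4.0}
VERIFY_REFS = {"PV1": 2.3, "PV2": 3.3, "PV3": 4.3}  # assumed to sit above each read reference


def serial_compare(cell_vt, low, mid, high):
    """Two-pass compare: the middle reference first, then low or high."""
    first = cell_vt >= mid
    second = cell_vt >= (high if first else low)
    return first, second


def read_sense(cell_vt):
    return serial_compare(cell_vt, *READ_REFS.values())


def program_verify(cell_vt):
    return serial_compare(cell_vt, *VERIFY_REFS.values())


# A cell only 0.1 V past R2 already reads as above R2, but fails verify
# against PV2, so in this sketch programming would continue.
print(read_sense(3.1), program_verify(3.1))
# Once the cell passes PV2 it clears R2 by at least the assumed 0.3 V.
print(read_sense(3.35), program_verify(3.35))
```

Because both functions call the same `serial_compare`, the sketch also mirrors the point made in Section II that read and verify share one sense path and differ only in the references presented to it.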
`
`VI. CONCLUSION
`
`The memory demands of code and data applications are met
`with a flexible RWW architecture [2]. The scheme described in
`this paper improves upon previously reported RWW architec-
`tures by utilizing MLC technology while still providing robust
`high-speed performance. As shown in Fig. 9, the RWW parti-
`tioning of the memory array is achieved through the use of 16
`8-Mb partitions. The 128-Mb device shown is manufactured in
`0.13-µm technology with a die size of 27 mm². This die size is
`25% of the die size of a single-bit-per-cell device manufactured
`on 0.18-µm technology.
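The 25% figure can be sanity-checked with an idealized scaling estimate. The sketch assumes that area scales with the square of the lithography dimension and inversely with bits per cell, and it ignores periphery, sense circuitry, and any layout differences, so it is a rough consistency check rather than a derivation of the reported number. The function name is hypothetical.

```python
def ideal_area_ratio(new_node_um, old_node_um, new_bits_per_cell, old_bits_per_cell=1):
    """Idealized die-area ratio for the same capacity on a new process."""
    litho = (new_node_um / old_node_um) ** 2           # linear shrink, squared
    density = old_bits_per_cell / new_bits_per_cell    # fewer cells per stored bit
    return litho * density


ratio = ideal_area_ratio(0.13, 0.18, new_bits_per_cell=2)
implied_old_die_mm2 = 27 / 0.25   # from the 27 mm² and 25% figures in the text

print(f"ideal ratio: {ratio:.3f}")
print(f"implied single-bit-per-cell die: {implied_old_die_mm2:.0f} mm²")
```

The idealized ratio comes out close to one quarter, which is consistent with the stated 25%, and the two figures in the text together imply a single-bit-per-cell die of about 108 mm².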
`The high throughput of this device is achieved through a
`highly optimized sensing architecture that provides support for
`both asynchronous page mode and synchronous burst-mode
`operations. For synchronous operations, a 64-bit page buffer
`stores four 16-bit words of sensed data that are accessed
`sequentially by a system clock. Fig. 10 illustrates actual data
`showing the part performing a 125-MHz continuous burst
`operation. The clock is operating at 125 MHz with a period of
`8 ns. At the falling edge of the address valid signal, the part
`latches the address and initiates a read. After an initial latency
`equal to the asynchronous random access time, a new data word
`is output on every clock cycle at a rate of 125 MHz. At the rise
`
`Fig. 10. 125-MHz continuous burst operation.
`
`of the clock, the output transitions from a one to a zero and at
`the next rise of the clock the output transitions from a zero to
`a one. Using this new architecture, a flexible RWW capability
`in MLC technology and 1.8-V operation has been achieved.
`This design provides 125-MHz continuous burst capability and
`40-ns random read access time.
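The burst figures quoted above can be tied together with a little arithmetic. The clock rate, page and word widths, access time, and typical sense time are taken from the text; the function name `word_ready_ns` and the idea of comparing the page drain time against the sense time are additions made for this sketch.

```python
CLOCK_MHZ = 125
PAGE_BITS = 64
WORD_BITS = 16
INITIAL_LATENCY_NS = 40      # random read access time quoted in the paper
TWO_PASS_SENSE_NS = 30       # typical two-pass sense time from Section IV

period_ns = 1000 / CLOCK_MHZ                 # 8.0 ns per clock
words_per_page = PAGE_BITS // WORD_BITS      # 4 words
page_drain_ns = words_per_page * period_ns   # 32.0 ns to clock out one page


def word_ready_ns(n):
    """Time at which the n-th word (1-based) of a burst appears."""
    return INITIAL_LATENCY_NS + (n - 1) * period_ns


throughput_mbit_s = WORD_BITS * CLOCK_MHZ    # 2000 Mbit/s sustained
print(period_ns, page_drain_ns, word_ready_ns(5), throughput_mbit_s)
```

One page takes 32 ns to clock out, slightly longer than the typical 30-ns two-pass sense. If the next page is sensed while the current one is being output, which the paper does not state explicitly, that margin would be what allows the burst to continue without wait states.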
`
`ACKNOWLEDGMENT
`
`The authors would like to thank the worldwide Intel Flash
`team.
`
`REFERENCES
`[1] M. Bauer et al., “A multilevel-cell 32 Mb flash memory,” in IEEE Int.
`Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 1995, pp. 132–133.
`[2] B. Pathak et al., “A 1.8-V 100-MHz flexible read while write flash
`memory,” in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, Feb.
`2001, pp. 32–33.
`[3] H. A. Castro et al., “A 125MHz burst mode 0.18µm 128Mbit 2 bits per
`cell flash memory,” in Symp. VLSI Circuits Dig. Tech. Papers, June 2002,
`pp. 304–307.
`[4] D. Elmhurst et al., “A 1.8-V 128-Mb 125-MHz multi-level cell flash
`memory with flexible read while write,” in IEEE Int. Solid-State Circuits
`Conf. Dig. Tech. Papers, Feb. 2003, pp. 286–287.
`
`Daniel Elmhurst received the B.S.E.E. degree from
`Purdue University, West Lafayette, IN, in 1996.
`In his seven years with Intel Corporation, Folsom,
`CA, his work has focused on multilevel cell (MLC)
`technology for Flash memories. His specific areas of
`interest are MLC program algorithms and MLC high-
`speed sensing.
`
`Matthew Goldman (M’85) received the B.S. degree
`in electrical engineering from Rutgers University,
`New Brunswick, NJ, in 1987, and the M.S. and Ph.D.
`degrees from the University of Arizona, Tucson, in
`1989 and 1993, respectively.
`He has worked as a Design Engineer with Intel
`Corporation, Folsom, CA, since 1993. Most recently,
`his work has focused on high-speed sensing for mul-
`tilevel cell Flash memories.
`