DRAM for graphics applications
`
`by T. Sunaga
`K. Hosokawa
`S. H. Dhong
`K. Kitamura
`
`A high-speed 2Mb CMOS DRAM with 32 data
I/Os is described. A 0.6-µm CMOS process
with a single polysilicon layer, two levels of
`metal, and substrate-plate trench-capacitor
`(SPT) memory cells is used to fabricate the
`chip. It is designed to provide the wide data
`bandwidth required by high-performance
`graphics applications. A 35-ns access
`time with an 80-ns cycle time has been
`demonstrated. The 32-bit data bus and the
`high-speed feature achieve more than two
`times better graphics performance than
`conventional dual-port memories. A sensing
method with a 2/3 VDD bit-line precharge
voltage and a limited bit-line voltage swing is
exploited to optimize speed and power. The
chip, which operates on a 5-V power supply,
draws 140 mA at the 80-ns cycle time.
`
`Introduction
`A remarkable advance in graphics display systems has
`been achieved in recent years. It is seen in a wide range of
`products from portable systems to high-end workstations.
`As a frame buffer memory for these display systems,
`DRAMs or DRAM-based memory chips have been used
`extensively because of density and cost advantages. As the
`number of colors and the screen size of a display increase,
`the memory density necessary for the frame buffer
becomes large. For example, frame buffers of one or two
megabytes are common in personal computer systems, and
multiple-megabyte frame buffers are not unusual in
high-end workstations. In
`addition to a large memory size, graphics applications also
`require a very high data rate for frame buffer memories.
`The data rate is the most important performance factor in
`maintaining a fast screen change. It plays a crucial role as
`the number of bits per pixel becomes large. Frame buffer
`memories have two functions, a screen refresh and data
`update. To display the memory contents on the screen, all
`data bits required for one screen must be read within a
`specific time which is determined by horizontal and
`vertical refresh periods. The read operation for the screen
`refresh is done periodically. On the other hand, the
`memory contents must be updated by the graphics
`controller or the main processor. Since the update
`operation happens randomly, it is done in the time slots
`between the periodical read operations for the screen
`refresh.
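
As a rough illustration of the screen-refresh load described above, the sketch below estimates how many bytes must be read per frame and per second. It borrows the 256-color, 1024 × 768-pixel, 70-Hz display used as an example later in the paper; the function name is hypothetical, and blanking intervals and off-screen memory are ignored, so the result is only an order-of-magnitude figure.

```python
# Back-of-the-envelope screen-refresh load on a frame buffer.
# Assumption: only visible pixels are counted; blanking time is ignored.

def refresh_read_rate(width, height, bits_per_pixel, refresh_hz):
    """Return (bytes per frame, bytes per second) needed to repaint the screen."""
    bytes_per_frame = width * height * bits_per_pixel // 8
    return bytes_per_frame, bytes_per_frame * refresh_hz

# 256 colors means 8 bits per pixel.
frame_bytes, read_rate = refresh_read_rate(1024, 768, 8, 70)

print(f"visible frame: {frame_bytes // 1024} KB")
print(f"refresh reads: {read_rate / 1e6:.1f} MB/s")
```

The visible frame alone occupies 768 KB, which is consistent with the 1MB frame buffer (including off-screen memory) quoted for this display later in the paper.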
`These performance and functional requirements are
`unique to graphics applications, and conventional DRAMs
`are not an optimal solution for them. Dual-port video
`memories (VRAMs) specially designed only for these
`particular applications have been announced in
`256Kb-4Mb generations [1-4]. They have a serial access
`memory (SAM) port for screen-refresh operations in
`addition to a random access memory (RAM) port. A single
`read access obtains either a full page or a half page of data
`and places it in the shift registers. A serial read operation
`through the SAM port is available for screen refreshing. It
`is done in parallel and independently of any access through
`
©Copyright 1995 by International Business Machines Corporation. Copying in printed form for private use is permitted without payment of royalty provided that (1) each
reproduction is done without alteration and (2) the Journal reference and IBM copyright notice are included on the first page. The title and abstract, but no other portions, of
this paper may be copied or distributed royalty free without further permission by computer-based and other information-service systems. Permission to republish any other
portion of this paper must be obtained from the Editor.

0018-8646/95/$3.00 © 1995 IBM

IBM J. RES. DEVELOP. VOL. 39 NO. 1/2 JANUARY/MARCH 1995

T. SUNAGA ET AL.

SAMSUNG EXHIBIT 1097
`
`
`
Table 1  Process technology features.

Design rule            0.6 µm
CMOS process           Retrograded n-well
                       p- epitaxial on p+ substrate
Cell structure
  Cell transistor      p-MOS
  Storage cell         Substrate-plate trench capacitor (SPT)
  Cell capacitance     80 fF
  Size                 1.9 × 4.35 µm²
Polysilicon            Ti polycide (3.0 Ω/□)
Diffusion              Ti salicide (2.0 Ω/□)
Metal
  M1                   Tungsten (0.16 Ω/□)
  M2                   Aluminum (0.03 Ω/□)
`
Table 2  Chip features and functions.

Chip size              5.85 × 8.70 mm²
Organization           64Kb × 32
                       Two sets of 8-bit address inputs
Function               Fast page
                       Read-modify-write
                       Write-per-bit
Refresh mode           RAS only
                       CAS before RAS
                       Hidden refresh
Refresh                256 cycles/4 ms
Package                80-pin PFP
Pin configuration      32 I/Os, 16 addresses,
                       RAS, CAS, WE, OE,
                       12 power and ground
`
`the RAM port. Therefore, for memory content updates,
`almost the full time is available in VRAM, while only a
`small fraction of the full time can be allocated in a
`conventional DRAM because the latter must share its
`data I/Os for both screen refresh and update operations.
`However, the RAM port in VRAM is basically the same
`as the data I/Os of a conventional DRAM, and there is
`no advantage in normal and page-mode cycle times for
`memory-content update operations. The difference in the
`time windows available for updates causes differences in
performance. This suggests that a simpler single-port
DRAM can outperform VRAM if it achieves a
higher data rate, one that compensates for the time lost to
screen-refresh operations.
`A 2Mb CMOS DRAM has been designed to realize a
`high graphics system performance by this single-port
`architecture. Key features needed to obtain the high data
`rate are a 32-bit-wide data bus and an 80-ns cycle time for
`normal RAS (row address strobe) access. Increasing the
number of data I/Os is an effective way of enhancing the
memory bandwidth; however, it also creates some difficult
circuit design problems. In particular, when the chip is
operated at a fast switching speed, active power becomes a
serious concern. This paper describes some CMOS circuit
design techniques that solve these problems and enable
high-speed operation of the 32-I/O DRAM. A detailed
circuit design approach to optimize DRAM speed and
active power is shown, in which the DRAM architecture
is tuned specifically for graphics system flexibility. Some
performance advantages for this specific application are
also explained.
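
To make the bandwidth argument concrete, the following sketch computes the peak transfer rate for one word per normal RAS cycle. The 32-bit width and 80-ns cycle time come from the text; the narrower widths are hypothetical comparison points, and the helper name is an assumption made for illustration.

```python
# Peak bandwidth for one data word per cycle.
# bytes per nanosecond equals GB/s, so multiply by 1000 to get MB/s (10**6 bytes/s).

def peak_bandwidth_mb_s(bus_bits, cycle_ns):
    """Peak transfer rate in MB/s when one bus-wide word moves per cycle."""
    return (bus_bits / 8) * 1000 / cycle_ns

# 32 bits is the chip described here; 8 and 16 bits are hypothetical comparisons.
for width in (8, 16, 32):
    rate = peak_bandwidth_mb_s(width, 80)
    print(f"{width:2d}-bit bus, 80-ns cycle: {rate:.1f} MB/s")
```

At a fixed cycle time the peak rate scales linearly with the number of data I/Os, which is why the wide bus is attractive despite the power and noise problems it creates.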
`
`Process technology
The CMOS process used in fabricating the 2Mb DRAM
features a 0.6-µm design rule, a single layer of polysilicon,
and two levels of metal. A p- epitaxial layer is grown on
a p+ substrate, and low-resistivity retrograde n-wells are
implemented by ion implantation. It is a smaller version of
the 0.8-µm process with substrate-plate trench-capacitor
(SPT) memory cells which is used for the 4Mb DRAM [5].
The 1.9 × 4.35-µm² SPT cell consists of an 80-fF trench
capacitor and a p-MOS FET transfer gate. The first level
of metal, M1, provides bit lines. Word lines consist of the
polycide layer and the second level of metal, M2.
Self-aligned silicide is formed on source and drain diffusion
regions of both p- and n-channel devices. The process
features are summarized in Table 1.
`
`Chip features and functions
For graphics application flexibility, the 64Kb × 32 chip
consists of two 64Kb × 16 arrays. There are two sets of
8-bit address receivers, one for each 1Mb array.
Each set accepts independent 8-bit row and 8-bit column
addresses, multiplexed in the conventional manner. One
set of control inputs, consisting of a row address strobe
(RAS), a column address strobe (CAS), a write enable (WE),
and an output enable (OE), controls both 1Mb arrays.
The chip has 72 pads around its peripheral area. However,
eight of them are monitoring pads for wafer tests, and only
64 pads are actually used: 32 data I/Os, 16 address pads,
four pads for RAS/CAS/WE/OE, and 12 VCC/VSS pads. To
reduce the switching noise caused by simultaneous activation
of the data I/Os, four pairs of VCC/VSS pads are used for
power and ground of the 32 data I/Os. Two other pairs are for
internal circuits only; there is no on-chip connection to
the VCC/VSS of the data I/Os, which protects the receiver
circuits from power and ground line noise induced by the
data I/Os. An 80-pin plastic flat package (PFP) is used
for the chip.
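
The address multiplexing and the pad budget described above can be sketched as follows. The text states only that each array takes an 8-bit row and an 8-bit column address; the choice of the upper byte as the row address, and all names in the code, are assumptions made for illustration.

```python
# Multiplexed addressing for one 64K-word (64Kb x 16) array.
# Assumption: the upper 8 bits are presented as the row address (with RAS)
# and the lower 8 bits as the column address (with CAS).
ROW_BITS = 8
COL_BITS = 8

def split_address(word_index):
    """Split a 16-bit word index into (row, column) for multiplexed entry."""
    if not 0 <= word_index < 1 << (ROW_BITS + COL_BITS):
        raise ValueError("word index out of range for a 64K-word array")
    row = word_index >> COL_BITS
    col = word_index & ((1 << COL_BITS) - 1)
    return row, col

# Pad budget as stated in the text.
active_pads = {"data I/O": 32, "address": 16, "RAS/CAS/WE/OE": 4, "VCC/VSS": 12}
wafer_test_pads = 8
total_pads = sum(active_pads.values()) + wafer_test_pads

print(split_address(0x1234))  # row 0x12, column 0x34
print(sum(active_pads.values()), "active pads,", total_pads, "pads in total")
```

Because each 1Mb array has its own address set, a controller could drive two different (row, column) pairs in the same cycle, which is what the graphics section later exploits.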

Figure 1  (a) Microphotograph of the 64Kb × 32 DRAM chip. (b) Floor plan of the 64Kb × 32 DRAM chip. Two 1Mb arrays are placed in the top
and the bottom halves of the chip. All 72 pads (64 active and eight for wafer test) are placed around the chip peripheral areas. Pad
identifications: W = WE, R = RAS, A = address, O = OE, C = CAS, V = internal VCC, G = internal circuit ground, D = data I/O,
VO = off-chip driver VCC, and GO = off-chip driver ground.

In addition to conventional operational modes such as
fast-page and read-modify-write, a write-per-bit function is
implemented. Write-mask operation for 32 individual bits
is defined by driving selected data I/O pins when RAS and
WE are turned low. The chip is designed for high-speed
operation, and 80 ns is a typical cycle time for its normal
read and write modes. It has CAS before RAS and hidden
refresh functions (the refresh requirement is 256 cycles
per 4 ms). Features and chip information are summarized
in Table 2.
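
The effect of a write-per-bit operation can be modeled in a few lines. This is a behavioral sketch only: the text says the mask is latched from the data I/O pins when RAS and WE go low, but it does not state the mask polarity, so treating a 1 bit as "write enabled" is an assumption, as are the function and constant names.

```python
# Behavioral model of a 32-bit write-per-bit (masked write).
# Assumption: a 1 in the mask enables writing that bit; 0 preserves the stored bit.
WORD_MASK = 0xFFFFFFFF

def write_per_bit(old_word, new_word, write_enable_mask):
    """Return the stored word after a masked 32-bit write."""
    mask = write_enable_mask & WORD_MASK
    kept = old_word & ~mask & WORD_MASK   # bits protected by the mask
    written = new_word & mask             # bits taken from the new data
    return kept | written

# Update only bits 8..15 of the stored word.
result = write_per_bit(0x12345678, 0xFFFFFFFF, 0x0000FF00)
print(hex(result))  # 0x1234ff78
```

In a frame buffer this lets a controller change selected bit planes or pixels within a 32-bit word without a separate read-modify-write cycle.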
`
`Chip organization
`
`• Floor plan
Figure 1(a) shows a microphotograph of the 2Mb CMOS
DRAM chip (actual chip size is 5.85 × 8.70 mm²). The
chip consists of two 1Mb arrays which are placed in
the top and bottom halves of the floor plan shown in
Figure 1(b). The floor plan and circuit block placements are
optimized to obtain the shortest signal lines from address
receivers to data I/Os. Each 1Mb (64Kb × 16) array has
its own eight address receivers and 16 data I/Os in the
peripheral areas. The eight address pads for the top-half
arrays are placed in the top peripheral area. True and
`complement pairs for the output bus of row and column
`addresses run vertically in the center portion of the chip,
`where redundancy and predecoder/decoder circuits are also
`placed. The scheme realizes effective connections from the
`address circuits to both redundancy and decoder circuits.
`Write buffers and I/O sense amplifiers are placed along the
`sides of the sense amplifier and column decoder blocks.
`Since two sets of eight data I/O circuits and pads for the
`top-half arrays are located in both left and right top-half
`peripheral areas, data stream paths from sense amplifiers to
`I/O pads are also short. The bottom-half arrays, which mirror
`those of the top half, have a similar structure. Therefore, top
`and bottom peripherals contain address receivers of eight
`bits each, and there are 16 data I/Os on both left and right
`
peripheral areas. In addition to the address receivers, the
bottom peripheral area contains all control signal input
circuits for RAS, CAS, WE, and OE.

Figure 2  Subarray organization of the 64Kb × 32 DRAM chip. The chip has a total of 16 such 128Kb subarrays. Two I/O pads are placed
close to their array blocks.

• Memory array
There are 16 subarray blocks in the chip. Figure 2 shows
one subarray block which contains 128Kb of storage cells.
Column decoders and sense amplifiers divide it into two
64Kb blocks, each containing 128 word lines and 512
folded bit-line pairs. There are 64 cells on each bit line.
In parallel with the 512-bit-long polycide word lines, M2
lines run above them, and these two layers of lines are
connected every 128 bits to minimize word-line delays.
A RAS access activates one of two 64Kb arrays, and
the column decoder selects two out of 512 bit switches to
transfer two data bits to or from two I/Os for a read or
write operation. The half-array activation, short bit lines
with 64 cells, and 512-bit-long M2-assisted word lines are
key array design features for achieving low active power,
fast access and cycle times, and a large signal margin.

• Power system
The chip is supplied by 5-V power only, but all internal
circuits, including the memory arrays, operate at 3.3 V.
The only exception is power for the off-chip drivers, which
is supplied directly from the external 5 V. The 3.3-V
internal voltage was selected because of the 0.6-µm
technology used, but it also contributes to a power
reduction. For high-speed operation of the 32-data-I/O
DRAM, a solid internal power system is desirable. Four
on-chip voltage regulators convert the externally supplied
5 V to the internal 3.3 V. There are two internal VCC pads,
one each in the top and the bottom peripheral areas of the
floor plan. To minimize voltage drop due to VCC power wiring
resistance, each VCC pad is connected directly to two
regulators on its left and right sides. Regulated 3.3-V
outputs are distributed all over the chip by wide M2 wiring
as an internal VDD bus. Cell transistors are p-channel
devices in an array n-well. The array n-well voltage must
be biased at about 1.1 V above the internal voltage, VDD,
to minimize subthreshold leakage currents. A charge-pump
voltage generator on the chip supplies this n-well potential.
It generates 4.4 V from the internal 3.3 V, shielding it from
the external 5-V variations.

Figure 3  Sense circuit of the 64Kb × 32 DRAM.

Circuits

• High-speed circuits
Some circuit design techniques explored in previous
high-speed DRAMs are also used [6-8]. Fast RAS access times
were the prime targets of these DRAMs. Since graphics
operations move data to and from frame buffer memories
consecutively in both normal and page modes, cycle times
are also important. The high-speed design in the 2Mb
DRAM is therefore focused on achieving fast access and
cycle times. As the floor plan shows, simple and short
signal propagation paths are used throughout, from address
inputs to data outputs. Row and column circuits are
physically independent of one another. They do not share
true and complement output buffer lines. The separate true
and complement buses eliminate an address control circuit
and the delays associated with it. Because redundancy
circuits are located close to row and column predecoders,
they require no modification of the decode timing. Since
the polycide word line is the key delay factor in RAS
access time, it is strapped to M2 every 128 bits, and
it connects only 512 cell transistors rather than the
conventional 1024 [8, 9]. The bit-line length is also half
that of typical 1Mb/4Mb DRAMs. Only 64 cells are
connected to the bit line. The short word and bit lines are
important for the fast cycle time, because they reduce the
time required for the precharge operation.
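
The array dimensions quoted in the text (16 subarrays, two 64Kb blocks per subarray, 128 word lines by 512 folded bit-line pairs, two data bits per subarray, M2 straps every 128 bits) can be cross-checked with simple arithmetic. The sketch below is only a consistency check; the variable names are hypothetical, and it assumes that every subarray contributes its two bits to each 32-bit access.

```python
# Consistency check of the array organization described in the text.
KB = 1024  # "Kb" in the paper counts bits in units of 1024

word_lines_per_block = 128
bitline_pairs_per_block = 512
blocks_per_subarray = 2
subarrays = 16
bits_per_subarray_per_access = 2
strap_interval_bits = 128

# Folded bit lines: each word line reaches one cell per bit-line pair.
block_bits = word_lines_per_block * bitline_pairs_per_block
subarray_bits = block_bits * blocks_per_subarray
chip_bits = subarray_bits * subarrays

# Each line of a folded pair carries the cells of half the word lines.
cells_per_bit_line = word_lines_per_block // 2

# Assumption: all 16 subarrays deliver their two bits on every access.
data_width = subarrays * bits_per_subarray_per_access

# Polycide word-line segments between M2 straps.
segments_per_word_line = bitline_pairs_per_block // strap_interval_bits

# Distinct rows to refresh when half of each subarray is activated per cycle.
refresh_rows = word_lines_per_block * blocks_per_subarray

print(block_bits // KB, "Kb per block")
print(subarray_bits // KB, "Kb per subarray")
print(chip_bits // (KB * KB), "Mb per chip")
print(cells_per_bit_line, "cells per bit line")
print(data_width, "data I/Os")
print(segments_per_word_line, "word-line segments of", strap_interval_bits, "cells")
print(refresh_rows, "refresh cycles")
```

The totals agree with the 2Mb capacity, the 32 data I/Os, the 64 cells per bit line, and the 256-cycle refresh listed in Table 2.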
`
• Sense circuits
The sensing scheme is the most important circuit in any
DRAM design, because signal development speed and
active power depend on it strongly. Since p-channel array
transistors are used in the 2Mb DRAM, the fastest signal
development can be obtained when bit lines are precharged
at the full internal supply voltage, VDD. However, this
causes a high active current, since one of the bit-line pair
must swing over the full rail-to-rail voltage. The signal
development speed and the active power can be traded
off against each other. As the optimum sensing scheme,
2/3 VDD sensing with a limited bit-line swing is used [8, 10].
Figure 3 shows the sense circuit. When cross-coupled
p- and n-channel transistors are activated by the p-set node
and the n-set node, they latch to the full VDD voltage.
However, one of the bit-line pair is clamped at the
threshold voltage, Vt, because of p-channel transistors
between the bit lines and the sense amplifier. The 2/3 VDD
precharge voltage is obtained by simply shorting the
bit-line pair. Since the p-set node pulls up one of the bit-line
pair from 2/3 VDD to full VDD when the sense amplifier is
turned on, the array current is about 1/3 of the full VDD
precharge sensing. The 2/3 VDD precharge voltage allows
a reasonable signal development speed to meet the 35-ns
access time. This sensing method is faster and uses less
power than the conventional 1/2 VDD sensing scheme.

Figure 4  Simulated waveforms of the sense circuit. Simulation was performed at 87°C and 3.3 V internal voltage. The first half of the
simulation is a read operation. When the sense amplifier is activated, the low-level bit line is clamped at Vt. The latter half of the
simulation is a write operation, which starts at around 170 ns, to write the opposite polarity of signal into the cell.

The use of a short 64-cell bit line and 2/3 VDD sensing
with a limited bit-line swing creates an ideal combination
for optimum sensing. The combination is also very
effective in reducing cycle time. Because of the small
capacitance ratio between bit line and cell, the word line is
not boosted in the write-back operation, and the chip can
keep a sufficient signal margin. The unboosted word line is
a favorable feature for short cycle time. Clamping the
bit-line voltage with grounded-gate p-channel devices is a
simple and cost-effective sensing scheme, but it has a
longer write-back time because of the source-follower
operation of the clamp devices. However, the short bit line
overcomes this drawback because of its small capacitance.
Generation of the 2/3 VDD precharge voltage by shorting
the bit-line pair is another good feature for fast precharge
because of the short bit line. Figure 4 shows simulated
waveforms of key signals.
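
The relation between the clamped bit-line level, the 2/3 VDD precharge voltage, and the "about 1/3" array current can be illustrated with a charge-sharing calculation. The sketch assumes equal capacitance on the two lines of a pair and takes the clamp level as VDD/3 (about 1.1 V at 3.3 V), a value chosen here so that shorting the pair reproduces the stated 2/3 VDD; the text gives the clamp level only as the threshold voltage Vt.

```python
# Charge-sharing sketch of 2/3 VDD precharge with a limited bit-line swing.
# Assumptions: equal bit-line capacitances; clamp level (Vt) taken as VDD / 3.

def shorted_precharge(v_a, v_b):
    """Voltage after shorting two equal-capacitance bit lines."""
    return (v_a + v_b) / 2.0

VDD = 3.3                 # internal supply voltage from the text
V_CLAMP = VDD / 3         # assumed clamp level of the low bit line

precharge = shorted_precharge(VDD, V_CLAMP)

# Voltage the supply must restore on the high-going bit line in each scheme.
limited_swing = VDD - precharge   # pull up from 2/3 VDD to VDD
full_swing = VDD - 0.0            # full rail-to-rail swing with VDD precharge
charge_ratio = limited_swing / full_swing

print(f"precharge level: {precharge:.2f} V ({precharge / VDD:.3f} x VDD)")
print(f"supply charge relative to full-VDD precharge: {charge_ratio:.3f}")
```

Since the supply charge per cycle is proportional to the swing it must restore, the one-third swing accounts for the roughly threefold reduction in array current quoted above.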
`
• Low-power design
Reducing active power is a very important task in
fast-cycle-time DRAM designs, because the active current,
which accounts for the majority of CMOS DRAM power, is
proportional to the reciprocal of the cycle time. In
addition, the wide data bus chip dissipates significantly
higher power in data path circuits than conventional
DRAMs. Furthermore, the 32 data I/Os also have a
considerable amount of power consumption. Thus,
low-active-power designs are the primary focus of the chip
architecture definition. Since a dominant part of the
total power in memory arrays is consumed when sense
amplifiers are fired and bit lines are restored to the
precharge state, selection of a sensing scheme and a
fractional array activation requires careful consideration.
As shown in the section on sense circuits, the 2/3 VDD
sensing with a limited bit-line swing is the lowest-power
sensing method, because bit lines swing only 1/3 of VDD.
The short bit line, which is half that of a conventional
DRAM, is also effective in reducing array power. It allows
a 1/2 fractional array activation with the standard refresh
requirement of 256 words per 4 ms.

Table 3  Chip characteristics at 25°C, 5 V, and 80-pF load
capacitance.

RAS access time (tRAC)                 35 ns
Cycle time (tRC)                       80 ns
RAS precharge time (tRP)               15 ns
Page cycle time (tPC)                  30 ns
Active current at 80-ns cycle time     140 mA

Figure 5  RAS access waveform. RAS access time is observed at 25°C, 5 V, and 80-pF load capacitance with 2-mA output current load.

Figure 6  Plot of VCC vs. RAS access time. Plot taken at 25°C and 80-pF load capacitance with 2-mA output current load.

Characteristics
Figure 5 shows waveforms of the RAS signal and the data
output at 25°C and 5 V. A RAS access time of 35 ns is
measured. Figure 6 shows a plot of VCC versus RAS access
time. Read-write operation with an 80-ns cycle time is also
demonstrated. The RAS precharge time of 15 ns is
achieved by the unboosted word line and the short bit line.
The typical active current is 140 mA at the 80-ns cycle
time. Some performance data for a 5.0-V power supply
voltage and a 25°C ambient temperature are summarized
in Table 3.

Graphics applications
The overall graphics performance of the 64Kb × 32
DRAM can be compared with that of dual-port memories.
For example, a typical 256-color, 1024 × 768-pixel screen
usually needs a 1MB frame buffer, including off-screen
memories. It therefore uses four 2Mb DRAMs with a
64-bit bus, or two 256Kb × 16 dual-port memories with
a 32-bit bus. With a vertical screen refresh rate of 70 Hz,
almost the full 14.28 ms (1/70 s) of the one-screen trace time
is available for memory content update by a CPU or a
graphics controller to the VRAM's RAM port. The time
window for a single-port DRAM is smaller, because the
time required for screen refresh must be subtracted from
14.28 ms. However, with a typical screen refresh buffer in
the graphics controller, the 2Mb DRAM loses only about
20% of the 14.28 ms for screen refresh because of the wide
data bus. The data rate for the memory-content update
operation is determined by the product of the bandwidth
and the ratio of available time to the full single-screen
trace time. The 20% loss of the DRAM can be well
compensated by the bandwidth. In horizontal line
accesses, the page mode is extensively used. If the same
page cycle time is assumed, the relative data rate for
DRAM with respect to VRAM is 1.6, obtained from the
double-width data bus and 80% of the available time
window. In vertical accesses such as line drawing, the
nominal access mode must be used. Because of the two
independent address inputs, the 2Mb DRAM can access
two vertical pixels in a single nominal access. A typical
cycle time for the nominal access mode in VRAM is
120 ns. The 1.5× advantage in cycle time for two pixels
(but only an 80% available time window) gives the 2Mb
DRAM about 2.4× better bandwidth than a typical VRAM.
The two independent address inputs also allow the single
DRAM to access a 2 by 2-pixel box area of an 8-bit-per-
pixel screen.
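
The 1.6× and 2.4× figures follow directly from the product rule stated above (bandwidth times available-time fraction). The sketch below reproduces them and also gives a rough estimate of the refresh overhead; that estimate assumes the screen is read in page mode at the 30-ns page cycle of Table 3 over the 64-bit bus, ignores blanking and row-change overhead, and uses hypothetical function and variable names.

```python
# Relative update data rate of the single-port DRAM versus a VRAM.

def relative_update_rate(data_ratio, cycle_ratio, available_fraction):
    """Product of data-per-access ratio, cycle-time ratio, and time available."""
    return data_ratio * cycle_ratio * available_fraction

AVAILABLE = 0.80  # the DRAM loses about 20% of the frame time to screen refresh

# Page mode: bus twice as wide (64 vs. 32 bits), same page cycle time assumed.
page_mode = relative_update_rate(2.0, 1.0, AVAILABLE)

# Vertical access: two pixels per access, 80-ns cycle vs. a typical 120 ns.
vertical = relative_update_rate(2.0, 120 / 80, AVAILABLE)

# Rough check of the 20% loss (assumption: page-mode reads at 30 ns, 8 bytes each).
refresh_bytes_per_s = 1024 * 768 * 1 * 70
page_read_bytes_per_s = 8 / 30e-9
refresh_fraction = refresh_bytes_per_s / page_read_bytes_per_s

print(f"page-mode advantage: {page_mode:.1f}x")
print(f"vertical-access advantage: {vertical:.1f}x")
print(f"estimated refresh share of frame time: {refresh_fraction:.0%}")
```

Under these simplifying assumptions the refresh share comes out near one fifth of the frame time, in line with the 20% loss quoted in the text.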
`
`Conclusion
`A 2Mb CMOS DRAM organized 64Kb x 32 has been
`developed for graphics applications. The 32-bit-wide data
`bus and an 80-ns cycle time provide the high memory
`bandwidth. The chip consists of two 1Mb arrays, and each
`64Kb X 16 block has independent 8-bit address inputs to
`realize flexible memory mapping to graphics screens. The
`chip supports fast-page, read-modify-write, and write-per-
`bit modes. Its package is an 80-pin PFP with 64 active
`pins.
`The floor plan is optimized to obtain short signal
`propagation paths from address receivers to data I/Os. Key
`circuit design features for achieving fast access time, short
cycle time, and low active power consumption include
M2-strapped 512-bit-long word lines, short bit lines with 64
cells, and 2/3 VDD sensing with a limited bit-line swing.
`A 35-ns RAS access time is observed with an 80-ns cycle
`time. A 140-mA active current is measured at the short
`80-ns cycle time. The single-port x32 DRAM can provide
`1.6-2.4 times better bandwidth for graphics systems than
`conventional dual-port video memories because of its wide
`data bus and high-speed design.
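
The measured 140 mA at an 80-ns cycle time can be turned into a few derived quantities. This is a sketch under stated assumptions: that the whole current is drawn from the 5-V supply, that the charge drawn per cycle is independent of cycle time (which is what "current proportional to the reciprocal of the cycle time" implies), and that standby current is negligible. The 160-ns case is a hypothetical example, not a measured value.

```python
# Derived quantities from the measured active current.
SUPPLY_V = 5.0
ACTIVE_MA = 140.0
CYCLE_NS = 80.0

# Power drawn from the external supply (assumes all current flows from 5 V).
power_w = SUPPLY_V * ACTIVE_MA / 1000.0

# Charge per cycle in nanocoulombs: mA x ns = pC, so divide by 1000.
charge_per_cycle_nc = ACTIVE_MA * CYCLE_NS / 1000.0

def active_current_ma(cycle_ns):
    """Active current if the same charge is drawn in every cycle (assumption)."""
    return charge_per_cycle_nc * 1000.0 / cycle_ns

print(f"active power: {power_w:.2f} W")
print(f"charge per cycle: {charge_per_cycle_nc:.1f} nC")
print(f"estimated current at a 160-ns cycle: {active_current_ma(160.0):.0f} mA")
```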
`
`Acknowledgments
`The authors are indebted to many individuals in the IBM
`Japan Yasu manufacturing and development organization
`who worked on the project. We would like to thank
`N. Tanigaki, T. Saito, K. Fujisawa, and M. Kazusawa
`for process technology support. We would also like to
`acknowledge the test and characterization assistance by
`T. Yoshikawa and S. Iwamoto. We are grateful as well to
`L. M. Terman for his encouragement of our work and his
`review of an earlier version of this paper.
`
`References
1. S. Ishimoto, A. Nagami, H. Watanabe, J. Kiyono,
`N. Hirakawa, and Y. Okuyama, "A 256K Dual Port
`Memory," Digest of Technical Papers, International
`Solid-State Circuits Conference, pp. 38-39 (1985).
2. K. Ohta, H. Kawai, M. Fujii, T. Nishimoto, S. Ueda, and
Y. Furuta, "A 1-Mbit DRAM with 33MHz Serial I/O
Ports," IEEE J. Solid-State Circuits SC-21, 649-654
`(October 1986).
3. R. Pinkham, D. Russell, A. Balistreri, T. H. Herndon,
D. Anderson, A. Metha, T. Nguyen, N. H. Hong, H.
Sakurai, S. Hatakoshi, and A. Guillemaud, "A 128k x 8
`70MHz Multiport Video RAM with Auto Register Reload
`and 8 X 8 Block Write Feature," IEEE J. Solid-State
`Circuits SC-23, 1133-1139 (October 1988).
`4. 4M-bit Dual Port Graphics Buffer, NEC Data Sheet,
`PD482445, NEC Corporation, Tokyo, Japan, April 1993.
5. N. C. C. Lu, P. B. Cottrell, W. J. Craig, S. Dash, D. L.
Critchlow, R. L. Mohler, B. J. Machesney, T. H. Ning,
W. P. Noble, R. M. Parent, R. E. Scheuerlein, E. J.
`Sprogis, and L. M. Terman, "A Substrate-Plate Trench-
`Capacitor (SPT) Memory Cell for Dynamic RAM's," IEEE
`J. Solid-State Circuits SC-21, 627-634 (October 1986).
6. N. C. C. Lu, H. H. Chao, W. Hwang, W. H. Henkels,
`T. V. Rajeevakumar, H. I. Hanai, L. M. Terman, and
`R. L. Franch, "A 20-ns 128-kbit x 4 High-Speed DRAM
`with 330-Mbit/s Data Rate," IEEE J. Solid-State Circuits
`SC-23, 1140-1149 (October 1988).
`7. N. C. C. Lu, G. B. Bronner, K. Kitamura, R. E.
`Scheuerlein, W. H. Henkels, S. H. Dhong, Y. Katayama,
`T. Kirihata, H. Niijima, R. L. Franch, W. Hwang,
M. Nishiwaki, F. L. Pesavento, T. V. Rajeevakumar,
`Y. Sakaue, Y. Suzuki, E. Yano, and Y. Iguchi, "A 22-ns
`1-Mbit CMOS High-Speed DRAM with Address
`Multiplexing," IEEE J. Solid-State Circuits SC-24,
`1198-1205 (October 1989).
`8. T. Kirihata, S. H. Dhong, K. Kitamura, T. Sunaga,
Y. Katayama, R. E. Scheuerlein, A. Satoh, Y. Sakaue,
K. Tobimatsu, K. Hosokawa, T. Saitoh, T. Yoshikawa,
H. Hashimoto, and M. Kazusawa, "A 14-ns 4-Mb CMOS
`DRAM with 300-mW Active Power," IEEE J. Solid-State
`Circuits SC-27, 1222-1228 (September 1992).
`9. T. Furuyama, T. Ohsawa, Y. Watanabe, H. Ishiuchi,
`T. Watanabe, T. Tanaka, K. Natori, and O. Ozawa, "An
`Experimental 4-Mbit CMOS DRAM," IEEE J. Solid-State
Circuits SC-21, 605-611 (October 1986).
10. S. H. Dhong, N. C. C. Lu, W. Hwang, and S. A. Parke,
"High Speed Sensing Scheme for CMOS DRAM's," IEEE
J. Solid-State Circuits SC-23, 34-40 (February 1988).
`
`Received May 25, 1994; accepted for publication
`November 17, 1994
`
Toshio Sunaga IBM Japan Ltd., Yasu Technology
Application Laboratory, 800 Ichimiyake, Yasu-cho, Yasu-gun,
Shiga-ken 520-23, Japan (SUNAGA at YSUVM1). Mr. Sunaga
`received the B.S. degree in applied physics from the Science
`University of Tokyo, Japan, and the M.S.E. degree in
`electrical engineering from Princeton University. He joined
`IBM Japan, Ltd. at the Fujisawa plant, Kanagawa, Japan,
`in 1970, and worked on analog circuit design and power
`semiconductor devices. After educational leave from IBM
`Japan at Princeton University, he worked on semiconductor
`process technology, VLSI design methodology, 1Mb, 4Mb,
`and 8Mb ROM development, and DRAM circuit design at the
`Yasu manufacturing and development facility, Shiga, Japan,
`and the Japan Science Institute (JSI), Tokyo, Japan. Mr.
`Sunaga is currently with the IBM Japan Yasu Technology
`Application Laboratory; he is responsible for semiconductor
`product development.
`
Koji Hosokawa IBM Japan Ltd., Yasu Technology
Application Laboratory, 800 Ichimiyake, Yasu-cho, Yasu-gun,
`Shiga-ken 520-23, Japan. Mr. Hosokawa received the B.S.
`degree in mechanical engineering from Hiroshima University,
`Japan, in 1983. Since joining IBM Japan Ltd. at the Yasu facility
`in 1983, he has been engaged in CMOS DRAM product
`development.
`
`
Sang H. Dhong IBM Research Division, Thomas J. Watson
`Research Center, P. O. Box 218, Yorktown Heights, New York
`10598 (DHONG at YKTVMV). Dr. Dhong received the
`B.S.E.E. degree from the Korea University, Seoul, in 1974
`and the M.S. and Ph.D. degrees in electrical engineering from
`the University of California at Berkeley in 1980 and 1983,
`respectively. He joined the IBM Research Division in
`Yorktown Heights, New York, in 1983 as a research staff
`member involved with the research and development of silicon
`processing technology, mainly bipolar devices and reactive-ion
`etching (RIE). From 1985 to 1992, he was engaged in research
`and development of DRAM designs spanning many
`generations of IBM DRAMs, from 1 Mb to 256 Mb. After
`spending three years in development in one of the IBM
`PowerPC™ chips as a circuit designer, he is currently working
`on low-power design aspects of the IBM PowerPC.
`
`Koji Kitamura IBM Japan Ltd., Semiconductor Operation,
`800 Ichimiyake, Yasu-cho, Yasu-gun, Shiga-ken 520-23, Japan.
`Mr. Kitamura received the B.S. degree from Osaka University
`and the M.S. degree from Kyoto University, Japan, both in
`chemical and solid-state physics. In 1979, he joined IBM Japan
`at the Yasu facility. He joined the semiconductor group in
1982, working in back-end-of-line (BEOL) processing. In 1988,
`he joined a process development group to work in fast-access
`DRAM fabrication and characterization. His technical interests
`include device simulation and design, DRAM characterization,
`nonvolatile memory design, and yield modeling. Mr. Kitamura
`is a manager of semiconductor process development at the
`IBM Yasu site.
`
`PowerPC is a trademark of International Business Machines Corporation.
`