
people involved made significant contributions to this project. We would like to thank Z. Fazarinc and R. Eschenbach of HP Labs and J. Horner and D. Hunsinger at the Santa Clara Division for their enthusiastic and continuous support of this project. We also benefited from the excellent help and cooperation of a team of people in IC R&D and CAA design, namely, I. Pesic, J. DeGrenier, K. Kikuta, K. Zenor, and E. Burke. A. Wang, I. Pecenco, and A. Motamedi were responsible for the IC process, and W. Scott and J. Struss for the production. H. Hofmann, K. Purdue, R. Eaton, and R. Wan wrote all the PLM tools software, which is such an important part of this short turnaround design cycle. Finally, we deeply appreciate the efforts of the many others involved in the mask making, processing, production, testing, and packaging phases of this project.

#### REFERENCES

- [1] E. B. Eichelberger and T. W. Williams, "A logic design structure for LSI testability," in *Proc. 14th Design Automation Conf.*, New Orleans, LA, June 1977, pp. 462-468.
- [2] F. F. Tsui, "In-situ testability design," *Proc. IEEE*, vol. 70, pp. 59-81, Jan. 1982.
- [3] W. Twadell, "Uncommitted IC logic," *EDN*, pp. 89-98, Apr. 5, 1980.
- [4] J. Birkner, "Reduce random-logic complexity," *Electron. Design*, pp. 98-105, Aug. 16, 1978.
- [5] W. R. Iversen, "3,000-gate array has 600-ps delay," *Electronics*,
- [6] D. Bursky, "Chips get denser, faster, as software shortens turnaround," *Electron. Design*, pp. 91-102, Dec. 9, 1982.
- [7] Z. E. Skokan, "Emitter function logic," *IEEE J. Solid-State Circuits*, vol. SC-8, Oct. 1973.
- [8] K. M. Ferguson and L. R. Dickstein, "Time synthesizer generates
- [9] B. W. Wong and W. D. Jackson, "A high-performance bipolar LSI counter chip using EFL and IIL circuits," *HP J.*, Jan. 1979.



Zdenek E. Skokan received his education in Prague, Czechoslovakia.

From 1963 to 1968 he worked in the Government Computer Research Institute, Prague. His experience there included high-speed instrumentation design and circuitry for thin-film memories. Since 1969 he has worked as a member of the Technical Staff at Hewlett-Packard Laboratories, Palo Alto, CA. His main area of interest is in high-speed bipolar integrated circuits for instrumentation and computation. He holds a number of patents in the area of logic circuit design. In his spare time, he does R&D work for electric automobiles as a founder and President of Electric Vehicles, Inc.

## A 20K-Gate CMOS Gate Array

TAKASHI SAIGO, HARUYUKI TAGO, MASAZUMI SHIOCHI, TAMOTSU HIWATASHI, KIYOSHI NIWA, SHOHEI SHIMA, AND TAKAHIKO MORIYA

Abstract-Combining an advanced 2 µm CMOS technology with a newly developed triple level metallization technology, a high-performance 20K-gate CMOS gate array has been developed. An advantage of triple level metallization for area saving in a large scale gate array was evaluated by a computer simulation. The typical gate delay is 1.5 ns with fan-out 3, and 3 mm metal interconnect length. As a test vehicle for verifying the high-performance gate array, a  $32 \times 32$  bit parallel multiplier has been successfully designed and fabricated. Cell utilization is about 65 percent. Typical multiplying time is 120 ns at a 5 MHz clock rate with a power dissipation of 400 mW.

#### I. INTRODUCTION

GATE arrays have been widely used because of their short turnaround time and cost/performance advantage. The CMOS approach in particular has become a dominant technology trend owing to advantages such as low power dissipation, high density, and high speed, as also reported in memory device papers [1]-[3].

Manuscript received April 19, 1983; revised June 27, 1983. The authors are with VLSI Application Department, Toshiba R and D Center, Toshiba Corporation, Kawasaki, Japan.


6K- and 8K-gate CMOS gate arrays with double level metallization have been developed and reported [4]-[6]. However, demand for larger scale gate arrays is still increasing, especially in applications to large computers.

Combining an advanced CMOS technology with a newly developed triple level metallization technology, a high-performance 20K-gate CMOS gate array has been developed. The first part of this paper describes the advantages of triple level metallization for large scale gate arrays. Then, the fabrication process is discussed. The basic cell, I/O cell, and chip configuration, as well as the basic performance of the array, are discussed next. Finally, the circuits, the design, and the performance of a $32 \times 32$ bit parallel multiplier, used as a test vehicle for the 20K-gate gate array, are described.

#### II. ADVANTAGE OF TRIPLE LEVEL METALLIZATION

In conventional double level metallization CMOS gate arrays, the second metal layer is usually used for power buses on the periphery of the chip. It is widely known that the power buses occupy a large area on a large scale gate array

0018-9200/83/1000-0578\$01.00 © 1983 IEEE



Fig. 1. Calculated ratio of area required for power buses to the total chip area.

chip. By employing the third metal layer as additional power buses, this problem can be eased drastically. Fig. 1 shows the calculated ratios of the area required for power buses to the total chip area for double and triple level metallizations. The hatched area is for power buses. The rest of the chip area is available for signal lines. The calculation is performed as follows.

The chip area is the sum of the core area and the power bus area which is surrounding the core area.

When the number of gates is given, the core area and power bus area are calculated as follows.

1) Core Area Calculation:

The basic cell size and wiring channel width are fixed at $90 \times 20.1\ \mu\text{m}$ and $100\ \mu\text{m}$, respectively.

The core area is calculated for the shape which is chosen to be as square as possible.

The core area does not depend on whether the metallization is double or triple.
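The core area step can be illustrated with a small sketch. It uses the cell size and channel width quoted above; the layout model (one wiring channel per column of cells, cells stacked along their 20.1 µm side) and the function name `squarest_core` are assumptions made for illustration, not the authors' stated procedure.

```python
import math

CELL_W_UM = 90.0      # basic cell width (from the text)
CELL_H_UM = 20.1      # basic cell height (from the text)
CHANNEL_W_UM = 100.0  # wiring channel width (from the text)


def squarest_core(num_cells):
    """Return (columns, cells_per_column, width_um, height_um) for the
    column count whose core outline is closest to a square.

    Assumption: every column of cells is accompanied by one wiring channel.
    """
    best = None
    for columns in range(1, num_cells + 1):
        rows = math.ceil(num_cells / columns)
        width = columns * (CELL_W_UM + CHANNEL_W_UM)
        height = rows * CELL_H_UM
        candidate = (abs(width - height), columns, rows, width, height)
        if best is None or candidate < best:
            best = candidate
    return best[1:]


if __name__ == "__main__":
    cols, rows, w, h = squarest_core(20000)
    print(cols, rows, round(w * h / 1e6, 1), "mm^2")
```

Under these assumptions a 20 000 gate core comes out as 46 columns of 435 cells, roughly 8.74 mm on a side.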

2) Power Bus Area Calculation:

The current density limit is  $10^5 \text{ A/cm}^2$ .

The thickness of each level metal line is 1  $\mu$ m.

Each basic cell operates at a clock rate of 25 MHz, and the power dissipation is 250 $\mu$W per basic cell ($V_{DD}$ = 5 V).

Every metal line on the core area is available for a signal line and power bus.

The first metal line is also available for a signal line and power bus.

The power bus area is calculated for both double level and triple level metallizations under the conditions mentioned above.
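As a rough illustration of how these conditions translate into bus width, the sketch below converts the per-cell power into a total supply current and then into the minimum aggregate metal width allowed by the current density limit. How that width is split between $V_{DD}$ and $V_{SS}$ or between metal levels is not stated in the text, so the functions (whose names are hypothetical) return only the aggregate figure.

```python
def total_supply_current(num_cells, power_per_cell_w=250e-6, vdd=5.0):
    """Total supply current in amperes if every basic cell dissipates
    power_per_cell_w at supply voltage vdd (values from the text)."""
    return num_cells * power_per_cell_w / vdd


def min_bus_width_um(current_a, j_max_a_per_cm2=1e5, thickness_um=1.0):
    """Minimum total metal width in micrometres that keeps the current
    density at or below j_max for a line of the given thickness."""
    thickness_cm = thickness_um * 1e-4
    width_cm = current_a / (j_max_a_per_cm2 * thickness_cm)
    return width_cm * 1e4


if __name__ == "__main__":
    current = total_supply_current(20000)
    print(f"{current:.2f} A, {min_bus_width_um(current):.0f} um of 1 um thick metal")
```

For 20 000 cells this gives about 1 A, which needs about 1 mm of 1 µm thick metal in total; that is why the bus area grows quickly with gate count when only one level is available for the peripheral buses.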

Then, the total chip area and the ratio are obtained. At the 20K-gate level, about 60 percent of the chip area is required for power buses in the double level metallization. However, in the triple level metallization, only 35 percent of the chip area is required for power buses. The result clearly shows that triple level metallization is useful for large scale gate arrays. Obviously, there exist many other advantages of triple level metallization for CMOS gate arrays as discussed in [7].

As the first application of the triple level metallization to CMOS gate arrays, we considered a case of the third metal layer being used only for  $V_{DD}$  and  $V_{SS}$  lines. In the earlier



Fig. 2. Equivalent circuit of  $V_{DD}$  and  $V_{SS}$  lines.



reported gate array [4], the first metal lines run vertically (along the channel) and the second metal lines run horizontally on the basic cell array. The first metal layer is used both for power buses on the array and for signal lines. The second metal layer is used for signal lines on the array and for power buses on the I/O area. Then, we also considered the case where the third metal lines used for the power buses run horizontally on the array (perpendicular to the first metal lines used for power buses) and along the periphery on the I/O area. In this case, connecting the third to the first metal lines on the array is desirable, in order to prevent voltage fluctuation along the power buses. Fig. 2 shows the equivalent circuit of the power buses for the double level metallization. $R_P$ and $L_P$ are the resistance and inductance along the inner leads of the package, respectively. Fig. 3 shows a part of the circuit including an equivalent large inverter composed of twenty pairs of p- and n-channel transistors in the basic cell, each connected in parallel. The inverter is located in the center of the circuit. The bold solid lines correspond to the third metal power buses. Fig. 4 shows the results of simulation for the circuit




Fig. 4. Voltage fluctuations along  $V_{DD}$  and  $V_{SS}$  lines in double level metallization.



Fig. 5. Voltage fluctuations along  $V_{DD}$  and  $V_{SS}$  lines in triple level metallization.

in the double level metallization. Values of resistances, capacitances, and inductances are estimated from the structure of the 20K-gate gate array described in Section IV. The upper waveforms show the input and output signals. The other waveforms show voltage fluctuations of the  $V_{DD}$  and  $V_{SS}$  lines at the points shown in Figs. 2 and 3. In the double level metallization, the fluctuation of the power buses is rather serious, as shown in Fig. 4. However, the fluctuation of the power buses is reasonably suppressed by employing triple level metallization, as shown in Fig. 5. From these results it is clear that the triple level metallization is effective in suppressing the fluctuation of power buses, when many basic cells operate simultaneously.

#### III. FABRICATION PROCESS

#### A. Master Process

The device structure is a conventional p-well with single level Si-gate, similar to that of 64K CMOS RAM [2]. Fig. 6 shows some of the key process parameters and design rules. For realizing the 2  $\mu$ m design rules, dry etching and ion implantation processes are fully utilized. The master process

| Parameter                | Value             |
|--------------------------|-------------------|
| Substrate                | N-type Si, 10 Ω-cm |
| Gate Oxide Thickness     | 400 Å             |
| Effective Channel Length | 1.5 µm            |
| Threshold Voltages       | ±0.8 V            |
| Contact Hole Size        | 2 µm square       |
| 1st Al Line/Pitch        | 2 µm/5 µm         |
| Via 1 Size               | 2 µm square       |
| 2nd Al Line/Pitch        | 3 µm/6.7 µm       |
| Via 2 Size               | 7 µm × 5 µm       |
| 3rd Al Line/Space        | 11 µm/5 µm        |

Fig. 6. Key process parameters and design rules.



is almost identical to the earlier reported 6K-gate gate array process [4].

#### **B.** Personalization Process

The personalization process uses triple level metallization and 2 µm design rules, unlike the 6K-gate gate array [4]. To introduce the tight design rules shown in Fig. 6, reactive ion etching (RIE) technology is essential. However, the steep pattern edge steps produced during the etching process tend to cause several problems. The two major problems are discontinuity and pattern deterioration of the upper level Al interconnecting lines, and poor Al step coverage at hole edges, which causes open failures in the interconnecting lines. Two new processes have been developed to overcome these difficulties. The first is a technique for rounding hole edges, which has already been described briefly [8]. Fig. 7 shows the flowchart of the hole edge rounding. In the first step, the hole is anisotropically etched using CF<sub>4</sub> and H<sub>2</sub>. In step (c), after the resist removal, RIE in a mixture of C<sub>3</sub>F<sub>8</sub> and H<sub>2</sub> bevels the edge of the holes. The step edge of the holes is rounded without any change in the bottom size. This technology is applied to the formation of contact holes and via 1 holes (between the first and second metal layers). The second is a low-temperature planarization technique using plasma SiN [9]. Fig. 8 shows the flowchart of the planarization. This technology utilizes the phenomenon that the silicon nitride etch rate at the groove bottom is suppressed in a CF<sub>4</sub> and H<sub>2</sub> gas environment. In step (c), after deposition of the SiO<sub>2</sub> and SiN films, the SiN film is etched so as to fill the SiO<sub>2</sub> grooves with SiN. The following RIE process is carried out with the same etch rate for the SiN and SiO<sub>2</sub> films. Then, the SiO<sub>2</sub> surface is planarized as shown in (d). This





Fig. 8. Flowchart of planarization.



Fig. 9. Microphotograph of triple level metallization.

planarization technique is applied to two interlayer  $SiO_2$  films. Fig. 9 is a SEM microphotograph of the triple level metallization. A 2  $\mu$ m minimum linewidth and good step coverage of every metal layer are realized.

#### IV. ARRAY OF BASIC CELL AND I/O CELL

#### A. Basic Cell

The basic cell consists of two pairs of n- and p-channel transistors. One pair has a polygate in common and the other has separate gates [4]. The basic cell size is $90 \times 20.1\ \mu m$. The channel width of the transistors is $24\ \mu m$ each. The effective channel length is $1.5\ \mu m$ each. The basic cell size is minimized by a tight Al pitch and fine alignment tolerance.

#### B. I/O Cell

The I/O cell is as important as the basic cell, since it determines performance and compatibility with the interface. In a large scale gate array with many I/O cells, chip size is highly dependent on the I/O cell size. Therefore, in order to minimize the I/O cell area, the I/O cell is designed as a one-stage inverter. The effective channel length is designed to be 2 $\mu$m in order to reduce leakage currents. For the input cell, the channel widths are 240 and 24 $\mu$m for the n- and p-channel transistors, respectively. The basic cell is used for interfacing the I/O cell and the array.



Fig. 10. Chip configuration of 20K-gate gate array.

#### C. Chip Configuration

Fig. 10 shows the chip configuration of the 20K-gate array. There are 46 columns on the chip, with 435 basic cells in each column. In total, 20 010 basic cells are laid out on the chip. Between the columns, there are 20 tracks of first Al interconnecting lines with a 5 $\mu$m pitch. The first Al interconnecting lines run vertically, and the second Al interconnecting lines run horizontally with a 6.7 $\mu$m pitch. The third Al interconnecting lines, which are used for power buses, run horizontally with a pitch of 47 basic cells. The third Al lines are connected indirectly, through the second Al lines, to the first Al lines used for power buses. The locations of the via 2 holes (between the second and third metal layers) and the third Al interconnecting line pattern are both fixed; therefore, four mask layers (contact hole, first metal, via 1, and second metal) are used for personalization.

There are 180 I/O cells on the periphery of the chip, each of which is programmable as an input, output, or tristate buffer with TTL compatibility. The chip size measures  $9.99 \times 9.99$  mm.
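The figures in this subsection can be cross-checked with a few lines of arithmetic. The sketch below assumes one wiring channel per column and a core centered on the die; neither is stated explicitly in the text. The frame width it returns is only an estimate of the space left for I/O cells and peripheral power buses, and the function names are hypothetical.

```python
def core_dimensions_um(columns=46, cells_per_column=435,
                       cell_w=90.0, cell_h=20.1,
                       tracks=20, track_pitch=5.0):
    """Core width and height in micrometres.

    The channel width follows from the 20 first-Al tracks at a 5 um pitch.
    Assumption: one wiring channel accompanies each column of cells.
    """
    channel_w = tracks * track_pitch
    width = columns * (cell_w + channel_w)
    height = cells_per_column * cell_h
    return width, height


def frame_width_um(chip_side_um=9990.0, **kwargs):
    """Space left on each side of a centered core, horizontally and
    vertically, on a square die of the given side length."""
    width, height = core_dimensions_um(**kwargs)
    return (chip_side_um - width) / 2, (chip_side_um - height) / 2


if __name__ == "__main__":
    print(46 * 435)               # total number of basic cells
    print(core_dimensions_um())   # core outline in micrometres
    print(frame_width_um())       # peripheral frame per side
```

With these assumptions the 20 tracks reproduce the 100 µm channel width used in Section II, the core is nearly square, and a frame a little over 0.6 mm wide remains on each side.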

#### D. Performance of Array

The performance of the array was evaluated by measuring the frequencies of various ring oscillators built from the basic cells. The gate delay time is typically 1.5 ns under the conditions of fan-out 3 and 3 mm metal interconnect length. The output buffer delay is less than 5 ns for an output load of one TTL gate and 15 pF capacitance. The sink and source currents are about 13 mA ($V_{OL} = 0.4$ V) and 5 mA ($V_{OH} = 4.5$ V), respectively. Fig. 11 shows the typical $I_{OL}$-$V_{OL}$ characteristics of the output buffer. The buffer has high drive capability.
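For readers unfamiliar with the measurement, the sketch below shows the standard relation between a ring oscillator's frequency and the per-stage gate delay, $f = 1/(2 N t_{pd})$ for $N$ inverting stages. The text does not give the number of stages or the measured frequencies, so the 31-stage ring in the example is purely hypothetical.

```python
def ring_frequency_hz(gate_delay_s, stages):
    """Oscillation frequency of a ring of `stages` inverting gates.

    One period is two trips around the ring: f = 1 / (2 * N * t_pd).
    """
    if stages < 1 or stages % 2 == 0:
        raise ValueError("a ring oscillator needs an odd number of inverting stages")
    return 1.0 / (2 * stages * gate_delay_s)


def gate_delay_s(frequency_hz, stages):
    """Per-stage delay recovered from a measured ring oscillator frequency."""
    if stages < 1 or stages % 2 == 0:
        raise ValueError("a ring oscillator needs an odd number of inverting stages")
    return 1.0 / (2 * stages * frequency_hz)


if __name__ == "__main__":
    # Hypothetical 31-stage ring built from gates with the quoted 1.5 ns delay.
    f = ring_frequency_hz(1.5e-9, 31)
    print(f"{f / 1e6:.2f} MHz")
    print(f"{gate_delay_s(f, 31) * 1e9:.2f} ns")
```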

#### V. $32 \times 32$ Bit Multiplier

#### A. Configuration of Multiplier

The 20K-gate gate array is applied to a 32 bit parallel multiplier. The utilization of the basic cells is about 65 percent. Fig. 12 shows the block diagram of the multiplier. This configuration is an array type. Fig. 13 shows the logic diagram of the basic multiplier cell. All partial products are generated simultaneously, and the product is obtained by adding these




Fig. 11. Characteristics of output buffer.



Fig. 12. Block diagram of  $32 \times 32$  bit parallel multiplier.



Fig. 13. Logic diagram of basic multiplier cell.

partial products. The TCX signal selects the signed or unsigned multiplication, and the RND signal selects the output rounding mode or nonrounding mode. The format of the multiplied output is controlled by the FORMAT SELECT signal.
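A bit-level sketch of an array-type multiplier may help make the structure concrete. Each row forms the partial-product bits $a_i \cdot b_j$ and adds them into a running sum through full-adder cells. This is a generic unsigned model written for illustration; it is not the cell of Fig. 13, and the TCX, RND, and FORMAT SELECT controls are not modeled.

```python
def full_adder(x, y, carry_in):
    """One-bit full adder: returns (sum, carry_out)."""
    s = x ^ y ^ carry_in
    carry_out = (x & y) | (carry_in & (x ^ y))
    return s, carry_out


def array_multiply(a, b, width=32):
    """Unsigned width x width multiplication built from AND gates and
    full adders, one row of cells per multiplier bit."""
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> j) & 1 for j in range(width)]
    acc = [0] * (2 * width)  # running sum, least significant bit first
    for j in range(width):
        carry = 0
        for i in range(width):
            partial = a_bits[i] & b_bits[j]  # partial-product bit
            acc[i + j], carry = full_adder(acc[i + j], partial, carry)
        acc[j + width] = carry  # this position has not been written yet
    return sum(bit << k for k, bit in enumerate(acc))


if __name__ == "__main__":
    print(hex(array_multiply(0xFFFFFFFF, 0xFFFFFFFF)))
```

In this simple model the carry ripples along every row, which is the delay problem that motivates a faster adder for the upper product bits.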

#### B. Carry Look Ahead Adder

In an array-type multiplier, the large propagation delay of the carry signal tends to limit the performance of the multiplier. To achieve fast operation, a carry look ahead adder (CLA) circuit is employed for the addition of the upper 31 bits. The 31 bit wide CLA is divided into blocks to minimize the carry path delay under the fan-out limitation of actual MOS circuits. Compared with a 4 bit × 8 organization, the 8 bit × 4 type needs more basic cells, but it offers a much shorter carry path







Fig. 15. Logic diagram of 8 bit CLA 1.

delay. Fig. 14 shows the configuration of the 8 bit × 4 type CLA. The 31 bit wide CLA is divided into four blocks. Fig. 15 shows the logic diagram of the 8 bit CLA. In this case, we choose NAND gates instead of NOR gates for the CMOS circuits. The number of input signals to a NAND gate is limited to eight.
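The block organization can be sketched as follows. Within each block, every carry is written as a flat function of the generate and propagate signals, which is the look-ahead idea; between blocks the carry is simply chained here, which may differ from the block-to-block scheme of Fig. 14. The gate-level NAND realization of Fig. 15 is not modeled, and the function names are hypothetical.

```python
def cla_block(a_bits, b_bits, carry_in):
    """Add two equal-length bit lists (LSB first) using look-ahead carries.

    c_i = g_{i-1} | p_{i-1} g_{i-2} | ... | p_{i-1} ... p_0 c_in
    """
    g = [x & y for x, y in zip(a_bits, b_bits)]  # generate
    p = [x ^ y for x, y in zip(a_bits, b_bits)]  # propagate
    carries = [carry_in]
    for i in range(1, len(g) + 1):
        carry, prod = 0, 1
        for j in range(i - 1, -1, -1):
            carry |= g[j] & prod
            prod &= p[j]
        carry |= prod & carry_in
        carries.append(carry)
    sums = [p[i] ^ carries[i] for i in range(len(g))]
    return sums, carries[-1]


def cla_add(a, b, width=31, block=8):
    """Add two unsigned `width`-bit numbers with look-ahead blocks of
    `block` bits; returns (sum modulo 2**width, carry_out)."""
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    carry, out = 0, []
    for lo in range(0, width, block):
        sums, carry = cla_block(a_bits[lo:lo + block], b_bits[lo:lo + block], carry)
        out.extend(sums)
    return sum(bit << i for i, bit in enumerate(out)), carry


if __name__ == "__main__":
    total, carry_out = cla_add(0x7FFFFFFF, 1)
    print(total, carry_out)
```

With `width=31` and `block=8` the adder splits into four blocks of 8, 8, 8, and 7 bits. The widest carry term in an 8 bit block has nine factors in this flat form, so an eight-input gate limit would require some regrouping in a real circuit.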

#### C. Design of Multiplier

The multiplier has been designed in three steps. The first is choice of the multiplier type mentioned above. The second is the logic design of the multiplier. The third is the layout and pattern design.

For logical verification of the multiplier, a logic simulation has been performed. In order to save CPU time and memory requirements, a basic multiplier cell is defined as a functional description in the simulation. By using this technique, the number of gates for the logic simulation is reduced to 4K gates. Fig. 16 shows a part of the simulation result. Before the layout, macrocells and super macrocells are prepared.
