`SAMSUNG EXHIBIT 1038
`Page 1 of 5
`
`Extended Abstracts of the 1996 International Conference on
`SOLID STATE DEVICES AND MATERIALS
`
Extended Abstracts of the 1996 International Conference on Solid State Devices and Materials, Yokohama, 1996, pp. 824-825
`
A New Three-Dimensional Multiport Memory for Shared Memory
in High Performance Parallel Processor System
`
K. Hirano, S. Kawahito, T. Matsumoto, Y. Kudoh, S. Pidin,
N. Miyakawa+, H. Itani*, T. Ichikizaki*, H. Tsukamoto* and M. Koyanagi

Dept. of Machine Intelligence and System Engineering, Tohoku University
Aramaki, Aoba-ku, Sendai 980-77, Japan
+ Fuji Xerox Co.
* Mitsubishi Heavy Industry Co.
`
We propose a new multiport memory with a three-dimensional (3D) structure for a parallel
processor system with real shared memories. This multiport memory can act as a real shared
memory without the bus bottleneck. Therefore, a high performance parallel processor system
with shared memories can be easily constructed using these multiport memories. The simulation
results for the basic memory operation and the broadcast operation in this 3D multiport memory
are described. Furthermore, a new 3D integration technology to fabricate this memory is also
proposed and the key technologies for this 3D integration are explained.
`
`1. Introduction
`
Parallel processing using multiple processors is a very effective way to dramatically improve
computational throughput. It is well known that a shared memory is very useful for building
a high performance parallel processor system with a simple configuration and architecture.
However, the conventional system with shared memories has the drawback that only a limited
number of processors can be connected through a shared memory. This is because the processors
are connected to the shared memory through common buses in the conventional system, and the
overall system performance is therefore eventually limited by the data transfer speed of the
buses. A high performance parallel processor system with shared memories can be constructed
using multiport memories, because a multiport memory can act as a real shared memory without
the bus bottleneck. However, it is not easy to design a high speed multiport memory with many
read/write ports using the conventional method, because one memory cell must drive many
bit-lines and the memory cell area increases significantly when many ports are connected to
the memory cell. To solve this problem, each port should be driven by its own memory cell.
This implies that several memory cells, each with its own port, are stacked on each other:
that is, multiple cells with multiple ports. Such a multiport memory can be achieved by using
three-dimensional integration technology. In this paper, we propose a new parallel processor
system with three-dimensional multiport memories as shared memories and a new
three-dimensional integration technology to achieve this system.
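
The idea of one memory cell per port can be illustrated with a small behavioural sketch in
Python. It is not the circuit proposed in this paper: the class name `StackedMultiportMemory`,
its methods, and the layer and word counts are assumptions chosen for illustration, and the
propagation of a write to the other layers is modelled as an instantaneous copy.

```python
class StackedMultiportMemory:
    """Behavioural model (hypothetical): one memory layer, i.e. one copy of every cell, per port."""

    def __init__(self, num_layers: int, num_words: int) -> None:
        # layers[k][addr] is the cell at address `addr` in layer k.
        self.layers = [[0] * num_words for _ in range(num_layers)]

    def write(self, layer: int, addr: int, value: int) -> None:
        """Write through the port of `layer`, then copy the value to every other layer."""
        self.layers[layer][addr] = value
        for k, cells in enumerate(self.layers):
            if k != layer:
                cells[addr] = value  # broadcast, modelled as an immediate copy

    def read(self, layer: int, addr: int) -> int:
        """Read through the port of `layer`; each layer serves its own reader."""
        return self.layers[layer][addr]


# Illustrative sizes only (assumed, not taken from the paper).
mem = StackedMultiportMemory(num_layers=4, num_words=16)
mem.write(layer=3, addr=5, value=1)
print([mem.read(k, 5) for k in range(4)])
```

Because every layer holds its own copy in this model, reads through different ports never
share a bit-line; the cost is that every write has to reach all copies.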
`
`2. Parallel Processor System with 3D Multiport
`Memories
`
A configuration of a parallel processor system with directory architecture for shared memory
is shown in Fig.1. Many processors are connected to one shared memory in a cluster of this
system. The computational throughput can be significantly improved within a cluster because
a real multiport memory is used as a shared memory in this system and hence the bus
bottleneck is no longer significant. Several clusters are connected by high speed buses
when it is required to connect more processors. In this case, shared data are stored in
several shared memories beyond one cluster. These shared data are supervised by the shared
memory directories. Therefore, the performance is [illegible] reduced when data transfer
among the processors and the shared memories is required beyond the clusters. Nevertheless,
the overall performance of the system is significantly improved by using real multiport
memories as shared memories.

A configuration of the three-dimensional shared memory system is shown in Fig.2, where
several layers of shared memory are connected by many short buses for the broadcast in the
vertical direction. A main memory is connected to the respective layer of shared memory by
the internal multi-buses with very high data transfer speed and data bandwidth. A processor
in a cluster is connected to the respective layer of shared memory. Data written to some
layer of shared memory by the respective processor are simultaneously transferred
(broadcast) to the other layers of the shared memory through the vertical multi-buses
(broadcast buses). Therefore, the memory cells with an identical memory address in all
memory layers of the shared memory hold identical data after the data transfer. These
identical data can be read out simultaneously and independently by several processors which
are connected to the respective layers of shared memory. Therefore, this shared memory acts
as a real shared memory without the bus bottleneck.

The circuit configuration to achieve this real shared memory is shown in Fig.3. A
three-dimensional (3D) multiport memory is used to achieve the real shared memory in the
figure, where a part of the memory layers is shown. A memory cell of this multiport memory
has two pairs of access transistors (two ports) for the horizontal access and the vertical
access. The horizontal access transistor is used to execute the conventional memory
operation within a memory layer, while the vertical access transistor is used to broadcast
the data in the vertical direction. The amplifier and rewrite circuits are installed in the
top layer of this multiport memory in order to amplify the data read out to the broadcast
buses and rewrite them to the memory cells in all memory layers of the multiport memory.
A cross-sectional view of the 3D multiport memory is shown in Fig.4, where a new 3D
integration technology is used to stack the memory layers.

Figure 5 shows the simulated waveforms of the multiport memory with 8 memory layers
(8 ports), which was designed based on a 2µm CMOS design rule. In the figure, the data "1"
and "0" are written to the memory cell in the eighth layer of the multiport memory and then
transferred to the memory cells in the other memory layers in the former part and the
latter part of the operation, respectively. It was confirmed from Fig.5 that the basic
memory operation and the broadcast operation are successfully executed. It takes about 13ns
for broadcasting the data to all layers of the 3D multiport memory.

Fig.1 Parallel processor system with directory architecture for shared memory.

Fig.2 3D shared memory system.

Fig.3 3D multi-port memory circuit for shared memory.

Fig.4 Cross-sectional view of 3D multi-port memory.

Fig.5 Simulated waveforms for 3D multi-port memory.

Fig.6 Fabrication sequence of 3D LSI (steps labelled in the figure include grinding and
polishing, and alignment and gluing).

Fig.7 SEM cross-section of oxidized deep trench filled with poly-Si.

Fig.10 Wafer alignment using 3D wafer aligner: (a) before alignment, (b) after alignment.

3. 3D Integration Technology for Multiport
Memory

A new 3D integration technology, shown in Fig.6, has been proposed. Thinned memory layers
are stacked on the bottom memory layer, which has a thickness of about 200µm, using a wafer
bonding technique. The upper memory wafer with many buried interconnections is glued to a
quartz substrate using a liquid wax and then thinned down to 20µm using grinding and
chemical-mechanical polishing (CMP) techniques. The buried interconnection consists of a
highly doped poly-Si
which is deposited onto the oxidized deep silicon trench. The thinned memory wafer is glued
to the lower memory layer, on the surface of which a UV-hardening adhesive material with a
thickness of 1µm is spin-coated. The three-dimensional memory structure for the multiport
memory is formed by repeating these sequences. Figure 7 shows an SEM cross-section of an
oxidized silicon trench with a depth of about 20µm which is filled with highly doped
poly-Si. This highly doped poly-Si is used as the buried interconnection. As is obvious in
the figure, the buried interconnection with a size of about 2µm and a length of about 20µm
is clearly formed. A photomicrograph of the back surface of the silicon wafer after thinning
the wafer down to 20µm by grinding and CMP is shown in Fig.8, where the bottom of the
silicon trench is clearly seen. The electrical contact between the upper and lower buried
interconnections is implemented using In/Au micro-bumps as shown in Fig.9. This micro-bump
is formed using a lift-off method. A newly developed 3D wafer aligner is used to glue the
upper memory wafer to the lower memory wafer. Infra-red light is used for the alignment of
the two wafers in this 3D wafer aligner. In addition, the movement of the wafer stage is
precisely controlled using piezo actuators. The controllability of the wafer stage is 50nm
in the x, y and z directions. Furthermore, the gap between the two wafers and the contact
force to glue the two wafers can be monitored and controlled in-situ before and after the
two wafers are contacted, respectively. Infra-red images of test wafers before and after the
alignment using the 3D wafer aligner are shown in Fig.10. The alignment accuracy of the two
wafers is around 1µm. Figure 11 shows the infra-red image of the test structure to evaluate
the contact resistance between two micro-bumps in the upper and lower layers. This image
was taken after bonding the two wafers. The UV-hardening adhesive material is coated on the
surface of the lower wafer and hence on the surface of the micro-bump in the lower wafer.
The adhesive material on the surface of the micro-bump is pushed aside when the force is
applied to contact the two wafers. It was confirmed using the test structure shown in Fig.11
that a good electrical contact between two bumps in the upper and lower wafers can be
obtained. Thus, the key technologies to fabricate the 3D multiport memory have been
developed.

Fig.8 Photomicrograph of the back surface of silicon wafer after grinding.

Fig.11 Photomicrograph of test structure to evaluate the contact resistance between
micro-bumps (1st layer and 2nd layer are labelled in the figure).
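
The thicknesses quoted in this section allow a rough estimate of the total stack height.
The sketch below is simple arithmetic under stated assumptions: the function name
`stack_thickness_um` is hypothetical, the bottom layer is taken as 200µm, each upper layer
as 20µm of thinned silicon plus 1µm of adhesive, and the height of the micro-bumps and any
thickness variation are ignored.

```python
def stack_thickness_um(num_layers: int,
                       bottom_um: float = 200.0,
                       thinned_um: float = 20.0,
                       adhesive_um: float = 1.0) -> float:
    """Approximate height of a stack of `num_layers` memory layers, in micrometres.

    Assumption: one thick bottom layer, and every further layer adds one thinned
    wafer plus one adhesive film. Micro-bump height is not modelled.
    """
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    upper_layers = num_layers - 1
    return bottom_um + upper_layers * (thinned_um + adhesive_um)


for n in (1, 2, 4, 8):
    print(n, stack_thickness_um(n))
```

Under these assumptions an eight-layer stack comes to 200 + 7 × 21 = 347µm, so the thinned
upper layers together add less than the thickness of the bottom layer.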
`
`4. Conclusion
`
We proposed a new multiport memory with a three-dimensional structure for a parallel
processor system with real shared memories. The basic memory operation and the broadcast
operation in this 3D multiport memory were confirmed by computer simulation. Furthermore,
a new 3D integration technology to fabricate this memory was also proposed and the key
technologies for this 3D integration were developed.
`
Fig.9 SEM cross-section of microbump.

Reference

[1] T. Matsumoto et al., Ext. Abst. of SSDM (1995) 1073.
`