`______________________
`
`BEFORE THE PATENT TRIAL AND APPEAL BOARD
`______________________
`MICROSOFT CORPORATION,
`Petitioner,
`v.
`DIRECTSTREAM, LLC,
`Patent Owner.
`_______________________
`IPR2018-01594 (Patent 6,434,687 B1)
`IPR2018-01599 (Patent 6,076,152)
`IPR2018-01600 (Patent 6,247,110 B1)
`IPR2018-01601 (Patent 7,225,324 B2)
`IPR2018-01602 (Patent 7,225,324 B2)
`IPR2018-01603 (Patent 7,225,324 B2)
`IPR2018-01604 (Patent 7,421,524 B2)
`IPR2018-01605 (Patent 7,620,800 B2)
`IPR2018-01606 (Patent 7,620,800 B2)
`IPR2018-01607 (Patent 7,620,800 B2)
`__________________________
`
`DECLARATION OF JON HUPPENTHAL
`
`Patent Owner FG SRC LLC
`IPR2021-00633, Ex. 2017, p. 1
`
`
`
`TABLE OF CONTENTS
`
I. INTRODUCTION .................................................................................................... 1
`
`II. QUALIFICATIONS ............................................................................................... 1
`
`III. STATE OF THE ART ........................................................................................... 3
`
`A. Cray Research and Cray Computer Corporation ........................................... 3
`
`B. SRC Computers ............................................................................................ 13
`
`C. SRC-6 Hi-Bar Crossbar Switch ................................................................. 16
`
`D. SRC-6 Processor .......................................................................................... 22
`
`E. SRC-6 Common Memory ............................................................................ 30
`
`F. SRC-6 Reconfigurable Processor ................................................................. 30
`
`G. MAP Development ....................................................................................... 35
`
`H. SRC Architecture and Focus Change .......................................................... 37
`
`I. Software Development .................................................................................. 48
`
`J. Applications ................................................................................................... 51
`
`K. Summary....................................................................................................... 54
`
`I. INTRODUCTION
`1. I am an inventor of U.S. Patents 6,076,152, 6,247,110, 6,434,687, 7,225,324,
`
`7,421,524, and 7,620,800 and one of the original employees of SRC Computers.
`
`2. Everything in this declaration is based on my personal knowledge and
`
professional judgment. Several of the documents referenced in Exhibit B and

attached to this declaration are within my personal knowledge from awareness of

them at the time of their creation, are documents I personally created, or are business

records of SRC Computers/DirectStream, LLC, of which I am a custodian. Furthermore, all

photographs in this document were taken by me at the time, with the exception of

the SRC-6e photograph, which was taken from the SRC Computers/DirectStream photo
`
`archive.
`
3. If called as a witness in this matter, I am prepared to testify competently

about the matters stated herein.
`
`II. QUALIFICATIONS
`4. My curriculum vitae is provided as Exhibit A. Relevant highlights are
`
`summarized below.
`
`5. I received a Bachelor’s Degree in Electrical Engineering from Purdue
`
`University in West Lafayette, Indiana in 1979, and am a named inventor on 27
`
`United States Patents, as well as numerous foreign counterparts. These patents cover
`
`methods and apparatus for wafer level testing of semiconductors, high-speed
`
`computer interconnect technologies, FPGA-based reconfigurable processor designs,
`
`heterogeneous computer system designs, optimal application programming
`
`techniques for reconfigurable processors, and methods for the use of heterogeneous
`
`computer systems.
`
`6. I also held a Top Secret SCI security clearance with SI TK endorsements. To
`
`achieve these clearances, I was subjected to extended background investigations by
`
`various U.S. Governmental Intelligence Services, which included polygraph testing.
`
`7. I am currently the Executive Vice-President and Chief Technology Officer for
`
`Systems at DirectStream, LLC. In this role I am responsible for, and actively
`
`participate in, the design and manufacture of all DirectStream FPGA-based computer
`
`systems.
`
`8. In 1996, I was asked by Seymour Cray, the father of supercomputing, to be one
`
`of the founders of SRC Computers LLC. I served as the Vice-President of Hardware
`
`Development for the company through December of 2003. In January 2004, I
`
`became the company’s Chief Executive Officer and Chief Technology Officer
`
`serving in that position until the company was acquired by DirectStream, LLC in
`
`February of 2016. While at SRC Computers I invented, developed and patented the
`
`FPGA-based MAP® processor, as well as the system architecture incorporating it and
`
`methods for its optimal use. I was also responsible for overseeing the entire
`
`intellectual property program at SRC.
`
9. Prior to SRC Computers, I was Manager of Electrical Design, initially for Cray

Research in 1988 and then for Cray Computer Corporation after its separation in 1989. I
`
`stayed in this role until March of 1995 and was responsible for the electrical design
`
`and testing at the wafer, module and system level of the Cray-3, Cray-4 and Cray-5
`
`Gallium Arsenide-based supercomputers.
`
`10. I have been a member of the Advisory Boards for the School of Electrical and
`
`Computer Engineering of the University of Colorado, Colorado Technical University
`
`and the Catholic University of America. At the 2010 World Conference in Computer
`
`Science, I gave the keynote address and received the Outstanding Achievement
`
`Award in recognition of my “leadership and outstanding research to the field of
`
`Heterogeneous Systems”.
`
III. STATE OF THE ART
`A. Cray Research and Cray Computer Corporation
`11. In order to understand the DirectStream patents under discussion, it is
`
`imperative to understand the high-performance computing (HPC) field for which
`
they were developed. This is reflected, for example, in patent number 6,076,152,

col. 1, lines 35-49; patent number 7,421,524, col. 1, line 21 and col. 1, line 28 – col. 2,

line 12; patent number 7,620,800, col. 1, lines 39-61; and patent number 6,434,687,

col. 1, line 20 and col. 1, lines 52-63. Such an understanding unquestionably starts with Seymour Cray and the
`
`Cray family of supercomputer systems.
`
`12. My first involvement with Seymour and HPC came in November of 1988.
`
`Earlier that year, Seymour had moved the portion of Cray Research responsible for
`
`the Cray-3 system, along with himself, to Colorado Springs, Colorado. I had spent
`
`the previous 8 years in Colorado Springs designing test systems and simulators at
`
`TRW for use in the manufacture of cryptographic systems for the National Security
`
`Agency (NSA). As luck would have it, Cray Research had a serious issue trying to
`
`manufacture and test the semiconductor wafers used in the Cray-3 and I was
`
`recruited to solve this problem. My experiences here would heavily influence the
`
`design choices that would be made at SRC Computers.
`
13. The Cray-3 was a very typical Seymour Cray architecture, building on the

Cray-1 and Cray-2, with a relatively small number (2 to 16) of very high-performance

processors connected to multiple shared memory banks through a crossbar switch. A
`
`more detailed description of these systems can be found in the Cray Research and
`
`Cray Computer Corporation documentation.1,2,3
`
`14. Unlike all other computers at the time, the Cray-3 used Gallium Arsenide
`
`(GaAs) instead of silicon to make its semiconductors. The reason for this was that
`
GaAs had significantly higher electron mobility than silicon, so we were able to

operate these circuits much faster than silicon circuits, which would yield a big performance

advantage over all other systems.
`
15. The drawback was that at that time GaAs chips could not be fabricated with as

small a feature size as silicon, so the number of gates that could fit in a given area

of a semiconductor wafer was smaller. This meant that we could not fit as much

circuitry on a GaAs chip as competitors using silicon, such as Intel, could.
`
`However, the performance gains were still significant enough that the decision was
`
`made to stay with GaAs and build the Cray-3 in such a way that we could package
`
`many bare GaAs ASIC die in a very dense fashion to minimize the size of the
`
`system.
`
`16. This meant that a single Cray-3 processor would be built not using a single or
`
`small number of microprocessor chips, but rather a set of four, 4"x4"x1/4" modules
`
containing a total of 4096 GaAs ASICs in a unique and complex 3D stacking

process.3
`
`
`
`4"x4"x1/4" Cray-3 Module with Bare GaAs Logic Board and Memory Boards
`
17. A complete 10 module processor assembly would consist of eight modules

making up two processors and two modules for I/O. These were then interconnected
`
`to the common memory banks using thousands of twisted pair wires making up a
`
wire mat. The term Memory Bank was well understood in the HPC industry,

myself included, as a group of interconnected memory devices that are accessible by
`
`a processor or other similar device through a single access port and was commonly
`
`used in Cray Research documentation.1,2,3
`
`
`
`Final Generation Dual Processor Cray-3 Assembly with Wire Mat
`
`18. The total volume consumed by a single Cray-3 processor was about the same
`
`
`
size as the much lower performance Intel Slot 2 Xeon processor that would come
`
`along a decade later. This same stacked module assembly process detailed in the
`
`Cray-3 documentation3 was also used to create the Common Memory Banks and the
`
`Crossbar Switch in the Cray-3. The Crossbar Switch in the Cray-3 was actually
`
`distributed among all the modules and was not a discrete assembly.
`
`
`
`32 Module 32 Bank Cray-3 Memory Bank Assembly and Wire Mat
`
`19. In total, the two processors, accompanying memory banks and crossbar
`
`
`
`switch, called an Octant, were made up of 18,432 GaAs die level ASICs along with
`
`18,432 memory die all mounted directly on circuit boards. While many of the die had
`
`identical functionality, there were still 480 unique GaAs ASIC designs. Due to the
`
`complex module assembly, and since there was no packaging to be used on any of
`
`these parts, it was imperative that the good and bad die be identified while the GaAs
`
`wafers were still in wafer form.
`
`20. At the time I joined Cray Research in 1988, the company had no effective way
`
`to test these wafers at full operating speed. As a result, completed modules had to be
`
`repeatedly disassembled and reassembled to try to find working ASICs. With 40,000
`
`opportunities for a bad part to be in the mix, the Cray-3 was making slow forward
`
`progress. My assignment was to develop a way to functionally test the ASICs on the
`
`wafers at 500MHz. At the time there were no commercial systems available to do
`
`this. The first step was to develop our own probe cards just to make the high number
`
`of contacts with the wafer in a way that would support our fast data rates. This design
`
`would ultimately lead to my first patent.
`
`500 MHz Cray-3 Wafer Probe Card
`
`
`
`
`21. With that accomplished, we then had to find a way to generate all the high-
`
`speed input data patterns that we needed, as well as a way to capture the high-speed
`
output signals. This would involve both very expensive commercial equipment and

circuits of our own design. Once the whole test system was functioning, we then had
`
`to develop the 480 test programs to run on it to test each ASIC type. After several
`
`years of effort, we were then able to successfully screen the good and bad ASICs at
`
`the wafer level.
`
22. Now that we could accurately evaluate the quality of the GaAs wafers, we

uncovered the next major issue. At that time there were only two commercial
`
`foundries, Gigabit Logic and Fujitsu, that would produce GaAs wafers to order.
`
`Unfortunately, what we found was that the process consistency both lot-to-lot and
`
`between vendors was horrible. As a result, we were unable to have a constant supply
`
`of good ASICs that would work with each other. This led to the company making the
`
`decision to build our own GaAs semiconductor processing foundry so that we could
`
`be assured of a consistent supply of ASICs. Once this very expensive effort was
`
completed and operational, we were finally able to build functional Cray-3 systems.
`
`23. In 1989 the Colorado Springs operation and Seymour Cray broke off from
`
`Cray Research to become Cray Computer Corporation (CCC) and I became Manager
`
`of Electrical Design for the new company. In this role I was responsible for all
`
`electrical design aspects of the system and Seymour would lead a team of about 8
`
`engineers who would design the logic that would go into the ASICs. This structure
`
`would continue our very close working relationship.
`
`24. Unfortunately, overcoming these major issues associated with the ASICs and
`
`the complex module assembly process had made the Cray-3 very late to market and
`
`the underlying ASIC performance was not as good as what we knew we could now
`
`build. Consequently, after fielding just one machine at the National Center for
`
`Atmospheric Research, most of us would focus on the Cray-4. This new system
`
`would leverage all that we had learned on the Cray-3 and would be assembled in a
`
similar fashion but with 2x faster GaAs ASICs that each contained 10x more
`
`circuitry. Of course, these faster ASICs would require new faster test systems and a
`
`new test program for every ASIC.
`
`25. At that time there was no commercial equipment that we could leverage so the
`
`entire wafer level test system was designed and built internally. One day, while
`
`debugging the first of these test systems, a circuit board caught fire. It was one of
`
`about 100 that fed the inputs to the ASICs so the impact was minimal. However,
`
`within 30 minutes the stock analysts had somehow found out and were already
`
`calling the front office asking what the impact would be to the Cray-4 program. This
`
`just highlighted to all of us the criticality of our testing efforts to the overall
`
`company.
`
`26. In the end, the HPC markets were now in enough of a state of confusion that
`
the Cray-4 was not enough to keep the company going, as discussed more thoroughly

by Seymour himself in the Newcray Business Plan:4
`
`
`
`
`
`27. The day we closed the company and informed the employees in March of
`
`1995, Seymour said to me "If we could just mothball what we have for five years the
`
`government would be clamoring for it". This statement impacted me because it
`
`confirmed we were on the right track long-term for building HPC systems, but
`
`needed the customers to exhaust other inefficient systems first. In the end he was
`
`right as after just a few years, the NSA started complaining that no one was
`
`producing vector processors and they started pumping money into Cray Research to
`
`keep them going in that direction. Unfortunately, Seymour would not live to see this
`
`happen.
`
`28. After helping liquidate Cray Computer, I would spend the next year as
`
Manager of Portable Product Engineering for Apple, where my exposure to

microprocessor designs, like all of my years in HPC at Cray, would have significant

influence on all the decisions that we would make at SRC Computers.
`
`B. SRC Computers
`29. In June of 1996, Seymour called and wanted to meet for lunch. At that
`
meeting he presented me with an offer package4 to join him at a new company he was
`
`starting. The idea was to go after the same HPC markets and customers that we had
`
`at Cray Computer Corporation but do it in a more cost-effective way. One of the key
`
`elements of this plan was to utilize Intel microprocessors but to do so in a very novel
`
`way that no other company had the understanding or expertise to do. It is particularly
`
`noted in the business plan that HP did not have the expertise to do what we were
`
`going to do and that as part of the plan we would share our technical capability with
`
them.4 The real differentiator of Newcray (ultimately called SRC Computers) systems
`
`would be that we would implement a classic Cray architecture having
`
`microprocessors connected through a crossbar switch to common memory banks.
`
`
`
`
`
`July 1996 White Board in Jon Huppenthal's Office Showing Initial
`SRC-6 Block Diagram
`
`
`
`30. To accomplish this, it meant that we had to design a high bandwidth crossbar
`
`switch to connect multiple Intel microprocessors to multiple memory banks that we
`
`would also have to design. The typical way of accomplishing such custom logic
`
`functions as practiced by our competitors was to develop ASICs. However, since
`
`these would require the same performance level as the microprocessors, ASICs
`
`would have to be fabricated using the same leading-edge semiconductor fabrication
`
`process as the microprocessors. My experience at Cray told me that to get the
`
`performance level that we wanted, particularly in the relatively low volumes that we
`
`would consume, would probably cost us several hundred million dollars. In addition,
`
`the rate at which we made feature additions and improvements at Cray would mean
`
`that these ASICs would probably have to be updated relatively often thus incurring
`
`additional cost. Not wanting to repeat my earlier ASIC experience, I suggested that
`
`we could find a way to accomplish what we needed using commodity Field
`
`Programmable Gate Arrays (FPGAs). These devices consisted of an array of
`
`identical circuit blocks that the user could program to perform whatever function they
`
`desired. Consequently, we could accomplish the custom designs that we required
`
`without the need to design, fabricate and test ASICs.
`
`31. In the summer of 1996, the highest clock rate FPGAs were built by Lucent.
`
`While Xilinx and Altera produced FPGAs that could hold somewhat more circuitry,
`
`the rate at which we could run them was significantly lower than the Lucent parts. As
`
`a result, we decided to repeat the path that we had followed at Cray and go with the
`
smaller but faster parts. This was again a very counterintuitive choice since most
`
`HPC designers at the time were trying to fit as much functionality as possible into a
`
`single chip to simplify the inter-chip communication and overall system design.
`
`Since most of the designers at SRC came from CCC, they were already very
`
`experienced at partitioning their designs to efficiently use multiple small chips.
`
`C. SRC-6 Hi-Bar Crossbar Switch
`32. As we detailed out the design in the late summer of 1996, it became apparent
`
`that there would be three major FPGA design efforts. The first would be a Bridge
`
`Chip designed to interface the commodity microprocessor to the Cross Bar Switch.
`
`The second would be the Cross Bar Switch itself and the third would be the Common
`
`Memory Banks. We decided that I would design the Cross Bar Switch first. Given
`
`the number of I/O pins on the largest FPGA package, it was decided that the switch
`
`would be made in two tiers such that the first tier would connect to a group of
`
`microprocessors and the second physically separate tier to 16 memory banks. This
`
`would allow the output of one tier to be connected to the inputs of the second such
`
`that all processors could access up to 256 memory banks in a fully populated 16
`
`segment system. The switch tiers were duplicated for both the read and write paths to
`
memory. For a variety of reasons, we vacillated between 16 and 20 on the number of

microprocessors that made up a group. Ultimately, we built the first

switches assuming 20 but only populated connectors for 16.
`
`
`
`Slide from February 1997 Showing Half of the SRC-6 Interconnect6
`
`33. To accomplish this design, each tier was built as two large high layer count
`
`
`
`circuit cards each containing 27 interconnected FPGAs. Two cards dealt with data
`
`traffic going to the memory banks and two with traffic coming from the memory
`
`banks. We would use 16 of each of the four switch board designs in a fully
`
`configured 16 segment system consuming 1728 FPGAs just for the switch. As we got
`
`into the details of the FPGA designs for the various switch chips we discovered that
`
`the Lucent design tools of the day could not adequately control the time delay of the
`
`quantity of signals inside an FPGA that we needed. To achieve the high performance
`
that we required, I ended up hand selecting all the routing resources in the FPGA so

as to ensure equal performance of all signal paths through the switch FPGAs.
`
`Basically, I hand routed all the switch FPGAs. To physically fit all of these FPGAs
`
`and I/O connectors on a single circuit card required a very large, high layer count
`
`printed circuit board.
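The component counts described above multiply out as follows; this is simply an arithmetic check using the figures stated in this declaration (27 FPGAs per board, four board designs, 16 copies of each, and 16 segments of 16 memory banks):

```python
# Arithmetic check of the fully configured SRC-6 crossbar scale, using
# only the figures stated above: four switch-board designs, 16 copies of
# each design, 27 FPGAs per board, and 16 segments of 16 memory banks.
FPGAS_PER_BOARD = 27
BOARD_DESIGNS = 4
COPIES_PER_DESIGN = 16
SEGMENTS = 16
BANKS_PER_SEGMENT = 16

switch_fpgas = FPGAS_PER_BOARD * BOARD_DESIGNS * COPIES_PER_DESIGN
banks = SEGMENTS * BANKS_PER_SEGMENT

print(switch_fpgas, banks)  # 1728 256
```

This reproduces the 1728 FPGAs consumed by the switch alone and the 256 memory banks accessible to every processor in a fully populated system.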
`
`
`
`One of Four Original SRC-6 17" x 22" Cross Bar Switch Boards
`
`
`
`
`34. However, due to the very high quantity of impedance controlled high speed
`
circuit board traces that were required to interconnect all of the FPGAs, it proved

physically impossible to manufacture the circuit boards using standard PCB processing

techniques of the day. This was because of the basic physics involved. To achieve the

desired interconnect signal quality, the board must maintain a specific
`
`impedance along its traces. This impedance is determined by the width and thickness
`
`of the trace, its distance to the nearest reference plane, and the type of material used
`
`to separate the two. This will then tell you how thick a single layer of the board must
`
`be. As you interconnect all the chips during the layout of the PCB you find out how
`
`many layers will be required to completely interconnect all chips without any traces
`
`crossing each other on the same layer. Now to interconnect the surface pads under
`
`each FPGA to the signals on the inner layers of the board, you must drill a hole
`
through the board and then plate the hole. This is called a via. The state of PCB

manufacturing at any point in time will tell you how deep you can drill and plate for a

given diameter hole, a ratio known as the aspect ratio. The larger the hole diameter, the
`
`thicker the board you can drill through. On top of that, the pad pattern of the FPGA
`
`package will determine what the largest diameter hole is that you can use without
`
shorting two pads together. Because of these aspect ratio issues, in the time frame

that we were building these boards, no traditional board shop was capable of

manufacturing the roughly 50 layer thick, very large boards that we required. This
`
`problem threatened to completely derail the program since we could not reduce the
`
`FPGA count and be able to achieve the system performance that we needed. After a
`
`global search we came across a circuit board technology called Multiwire. This
`
`process was only available from one shop in Japan and one in Georgia. What it did
`
`was to embed very small insulated wires into the circuit board resin instead of
`
`etching away copper like traditional circuit boards. Since the wires were insulated
`
`they could cross over each other in a single layer unlike regular board traces. This
`
resulted in about a 4x to 6x reduction in the number of layers required, which
`
`reduced the aspect ratio of the via by about the same amount. Consequently, they
`
`were able to produce our switch boards as designed using this technology.
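The aspect-ratio argument above can be illustrated numerically. In this sketch the roughly 50-layer board and the 4x to 6x Multiwire layer reduction come from the text, while the per-layer thickness, drill diameter, and achievable aspect ratio are purely hypothetical values chosen to show the calculation, not actual SRC or vendor figures:

```python
# Illustrative via aspect-ratio check. The ~50-layer count and the 4x
# Multiwire layer reduction come from the text; the per-layer thickness,
# drill diameter, and achievable aspect ratio below are hypothetical
# numbers chosen only to demonstrate the calculation.
LAYER_THICKNESS_IN = 0.006   # hypothetical dielectric + copper per layer, inches
HOLE_DIAMETER_IN = 0.012     # hypothetical largest drill the FPGA pad pattern allows
MAX_ASPECT_RATIO = 10        # hypothetical depth/diameter limit of the era

def board_is_drillable(layers):
    """A via must pass through the full board, so board thickness
    divided by hole diameter must stay under the aspect-ratio limit."""
    thickness = layers * LAYER_THICKNESS_IN
    return (thickness / HOLE_DIAMETER_IN) <= MAX_ASPECT_RATIO

print(board_is_drillable(50))       # ~50-layer conventional board: False
print(board_is_drillable(50 // 4))  # after a 4x Multiwire layer reduction: True
```

With these assumed numbers, the full 50-layer conventional board exceeds the drillable aspect ratio, while the Multiwire-reduced layer count falls comfortably within it, which is the shape of the problem described above.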
`
`
`
`
`17"x22" Multiwire SRC-6 Switch Board Layer Showing Close Up of Wires
`
`
`35. Over the years as FPGA process geometries improved and internal feature
`
`sizes shrunk, we were able to put more of the switch logic in each FPGA. This
`
`allowed us to reduce the number of FPGAs required to build the switch, thus
`
`reducing the amount of chip to chip interconnect. This reduction in interconnect
`
`reduced the PCB layer count and ultimately allowed us to stop using Multiwire
`
`boards and move back to traditional printed circuit boards. In March of 2004, we
`
`filed for trademark protection of the name Hi-Bar for our switch, which was issued
`
`in August of 2005 and is still used today.
`
`36. In a multi-processor HPC system with common memory banks such as we
`
`were building, it is imperative that all processors have equal access to all memory4.
`
`This is referred to as a Symmetric Multi-Processor computer system or SMP system.
`
`To accomplish this symmetric access, all portions of the switch must be in
`
`communication with each other. This allows memory accesses to be equitably
`
`granted to prevent any one processor from blocking access to a memory bank by
`
`other processors. Such a switch arbitration scheme that could be implemented in an
`
`FPGA and coordinate the routing activities of up to 1728 FPGAs that made up our
`
switch did not exist. As a result, we had to design one, which resulted in the first

issued SRC Computers patent, number 6,026,459.
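For illustration only, the kind of fair arbitration described above can be sketched as a generic round-robin arbiter. This is not the scheme of patent 6,026,459, merely a minimal example of granting bank access so that no one processor can block the others:

```python
# Generic round-robin arbitration sketch (illustrative only, not the
# patented SRC scheme): each memory bank grants one requesting processor
# per cycle, starting its search just past the processor granted last,
# so no requester can starve the others.
def arbitrate(requests, last_granted, num_procs):
    """requests: set of processor ids requesting this bank this cycle.
    Returns the id granted, or None if there are no requests."""
    for offset in range(1, num_procs + 1):
        candidate = (last_granted + offset) % num_procs
        if candidate in requests:
            return candidate
    return None

# Processors 0 and 2 keep requesting the same bank; grants alternate
# rather than one processor monopolizing the bank.
last = -1
grants = []
for _ in range(4):
    last = arbitrate({0, 2}, last, num_procs=4)
    grants.append(last)
print(grants)  # [0, 2, 0, 2]
```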
`
`37. Unfortunately, before we ever got to this point in the design process, Seymour
`
`would pass away as the result of a car accident. The HPC community truly lost a
`
`visionary (http://pages.cs.wisc.edu/~bezenek/cray.html, last accessed July 11, 2019).
`
`From that point on I was the final decision maker for all technical aspects of the
`
`system.
`
`D. SRC-6 Processor
`38. The next portion of the system to be dealt with was the microprocessor board.
`
`As discussed in the Newcray business plan4, we intended to use Intel's upcoming
`
`processor code named Merced. This processor had been initially designed by Hewlett
`
`Packard and was to be fabricated by Intel. It was also going to be Intel's first offering
`
`with a 64-bit address bus and 64-bit registers. We felt that this would finally be
`
`adequate to address the large common memory that HPC applications required, as
`
`well as having high enough performance for our HPC customers. In the summer of
`
`1996, as we started having meetings with Intel, significant issues with Merced started
`
`to come to light. When the engineers at HP initially designed the processor, their
`
`primary focus was to include all of the high-end processor features that they felt they
`
`would need, which was also well aligned with what we wanted for HPC.
`
`Unfortunately, this design did not appropriately take into account Intel's design rules.
`
`These rules were very important and are what allowed Intel to achieve the very high
`
`manufacturing yields that they were known for. The end result was that Merced,
`
`which would carry the product name of Itanium, was not going to be available in
`
`1997 as we had expected. Redesign cycles and production issues ended up ultimately
`
`causing its release to be delayed until June of 2001. It was clear to us by late summer
`
`of 1996 that an alternative to this processor had to be found.
`
39. When it came to HPC architectures, there were two basic models: the SMP

model that we were pursuing and the MPP, or Massively Parallel Processing, model.

The basic difference between the two is that MPP tries to use hundreds or thousands

of low performance processors all working together, whereas SMP uses a small
`
`number of high performance processors. Once, when Seymour Cray was asked about
`
`his thoughts on MPP he replied, "If you were plowing a field, which would you
`
`rather use? Two strong oxen or 1024 chickens?" The big problem with the MPP
`
`strategy at that point in time was that it was very difficult to program and coordinate
`
`a large number of processors to accomplish a single task. Seymour himself talks
`
`about this in the Newcray Business Plan4. Even today, very few computer
`
applications can even take advantage of the multiple processors found in a

standard packaged microprocessor device. Since we were developing an SMP
`
`system, our choices for microprocessors for use in the SRC-6 were limited to the
`
`highest performance full featured microprocessors of the day, which primarily came
`
`from Intel with whom we already had a relationship.
`
`40. To carry Seymour's ox and chicken quote a bit further I would add that
`
`choosing one over the other also has additional impact. While both the ox and
`
`chicken are generally categorized as livestock, there are distinct differences between
`
`the two. At night, when it is time to put them in the barn, the chicken farmer can
`
simply pick up a chicken and put it in the barn. However, since an ox weighs 1000x

more than a chicken, the chicken farmer's method of accomplishing the same task is

not relevant prior art, even though the end result of putting livestock in the barn is
`
`the same for both. Prior art methods are only relevant, and would be obvious to
`
`explore, if they apply to the features of the technology employed. In our case, the
`
`design of an HPC SMP system required the use of high end Intel microprocessors,
`
`which themselves had many unchangeable features much like the weight of an ox.
`
Therefore, solutions that were developed for other low end processors, without the

restrictions that the required high end processors had, often became irrelevant. This
`
`meant that we had to make high end Intel processors work for us and it did not make
`
`any sense to spend much time exploring what low end processor designers were up
`
`to.
`
`41. In those days Intel was made up of two camps. There was the 64-bit group that
`
`was developing Merced with all kinds of new features and consequently problems,
`
`and the 32-bit group which had developed all of Intel's previously successful
`
`products and was on a more evolutionary development path. We immediately started
`
fresh discussions with the 32-bit group about what other processors were in development

and nearing release. Their new high-end offering was code named Deschutes; while

they were not permitted by Intel to use 64 address bits, they were using 36. This
`
would allow us to offer 64Gbytes of shared common memory, which was at least 4

times greater than the largest Cray-3. Best of all, we could get samples in 1997 and
`
production parts in 1998.5 Armed with that information, in January 1997 we made
`
`the decision to go with this processor instead of Merced.
`
`
`
`42. With the processor nailed down it was time to work out the details of the
`
`processor board design. All of our previous designs using custom processors allowed
`
`the processor address and data bus to connect directly to the switch circuitry and on
`
to memory. This meant that a complete memory access on the Cray-3 was completed

in 22ns or 5 1/2 processor clocks.3 Given the Deschutes' 100MHz processor bus
`
`speed, and assuming the same number of clocks for a memory access meant that it
`
should take about 55ns to access memory. Unfortunately, to both Seymour's and my

surprise, Intel informed us that there were on the order of 10-20 clock cycles required
`
`for the bus protocol alone so random accesses to memory would be much slower than
`
`we expected. This was probably the first indication that trying to adapt a commodity
`
`microprocessor to the HPC market was not going to be straightforward.
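The timing expectation described above, and the protocol overhead that broke it, can be worked through directly using only the figures in the text (a 100MHz bus, an access assumed to take the same 5 1/2 clocks as on the Cray-3, and Intel's quoted 10-20 clocks of bus protocol overhead):

```python
# Memory-access timing estimate for the Deschutes bus, using only the
# figures given in the text. All times are in nanoseconds.
BUS_CLOCK_NS = 1e9 / 100e6        # 100 MHz bus -> 10 ns per clock

expected = 5.5 * BUS_CLOCK_NS     # Cray-3-style 5 1/2-clock access: 55 ns
protocol_low = 10 * BUS_CLOCK_NS  # Intel's quoted protocol overhead alone:
protocol_high = 20 * BUS_CLOCK_NS # 100-200 ns before any data moves

print(expected, protocol_low, protocol_high)  # 55.0 100.0 200.0
```

The protocol overhead alone was thus roughly two to four times the entire expected access time, which is why random accesses to memory were going to be much slower than anticipated.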
`
`Unfortunately, we did not have much choice but to move f