UNITED STATES PATENT AND TRADEMARK OFFICE
`
`BEFORE THE PATENT TRIAL AND APPEAL BOARD
`
`INTEL CORPORATION,
`
`Petitioner
`
`V.
`
`FG SRC LLC,
`
`Patent Owner
`
`CASE NO.: 2020-01449
`PATENT NO. 7,149,867
`
`SUPPLEMENTAL DECLARATION OF RAJESH K. GUPTA, PH.D.
`
`Mail Stop PATENT BOARD
`Patent Trial and Appeal Board
`U.S. Patent and Trademark Office
`P.O. Box 1450
`
`Alexandria, VA 22313-1450
`
`Intel Exhibit 1030 - 1
`
`

`

`I, Dr. Rajesh K. Gupta, declare as follows:
`
`1.
`
`I am currently Professor and Qualcomm Endowed Chair at the
`
`Department of Computer Science and Engineering at University of California, San
`
`Diego ("UCSD"). I have served in that role since 2002. My resume and a detailed
`
`description of my work on the MORPH/AMRM project can be found in my
`
`declaration dated August 10, 2020, which was filed in the above-captioned case as
`
`Exhibit 1010.
`
`2.
`
`I make this declaration based on my personal knowledge. I have not
`
`been compensated for my time or efforts in providing this testimony.
`
`Publications of MORPH/AMRM Papers
`
`3.
`
`Based on my personal knowledge as a member of the Institute of
`
`Electrical and Electronics Engineers, Inc. ("IEEE") and the IEEE Computer
`
`Society, as well as an organizer, frequent attendee, and presenter at many such
`
`IEEE conferences since at least 1996, the IEEE and the IEEE Computer Society
`
`sponsor many technical meetings each year. Those conferences were (and are)
`
`considered premier conferences in the computer science field and were attended by
`
`leading experts and skilled practitioners in that field. Papers presented at these
`
`conferences were published by the IEEE as part of conference proceedings that
`
`were distributed to attendees in hard copy and then online. Once conference
`
`proceedings were published, in print and/or online, the public had access to print
`
`versions in libraries, and subscribing members had full electronic access to
`
`individual conference articles, and the online abstracts and tables of contents were
`
`available for free to anyone.
`
`4.
`
`In the first half of 1996, Dr. Andrew A. Chien and I coauthored a
`
`paper entitled MORPH: A System Architecture for Robust Higher Performance
`
`Using Customization. We presented this paper at Frontiers '96, The Sixth
`
`Symposium on the Frontiers of Massively Parallel Computing ("Frontiers '96
`
`
`

`

`Conference"). The Frontiers '96 Conference was sponsored by the IEEE,
`
`specifically the IEEE Computer Society, and held in Annapolis, Maryland between
`
`October 27-31, 1996.
`
`5.
`
`I confirm based on my personal knowledge that this paper was
`
`submitted to the conference organizers before the Frontiers '96 Conference and
`
`that this paper was included in the IEEE printed publication that was distributed to
`
`the conference attendees during the conference. The paper that I co-authored was
`
`published as pp. 336-345 of the Frontiers '96 Conference Proceedings by the IEEE
`
`in 1996. As with other IEEE conferences, this paper was made available in 1996
`
`to conference attendees at the Frontiers '96 Conference no later than the last day of
`
`that conference. My recollection is also consistent with my overall personal
`
`experience in attending conferences sponsored by the IEEE and the general
`
`practice in the scientific and engineering community during this period. The
`
`conference organizers generally reviewed submitted papers to determine who
`
`would be invited to speak at the conference. The conference materials were
`
`typically printed in advance so they could be distributed at the conference, as it
`
`would have been impractical to mail the conference presentations to all the
`
`conference participants after the conference. I attended the Frontiers '96
`
`Conference and received a printed copy of these published papers no later than the
`
`last day of that conference. A copy of this paper from my personal files is attached
`
`to this declaration as Appendix G1001.
`
`6.
`
`The purpose, scope, nature of presentations, and intended audience of
`
`the Frontiers '96 Conference are described in the front matter to the printed
`
`proceedings, available from the IEEE Computer Society Digital Library at
`
`https://www.computer.org/csdl/proceedings/frontiers/1996/12OmNCaLEmP,
`
`attached as Appendix G1002. These materials refresh and confirm my
`
`recollections of the Frontiers '96 Conference.
`
`
`

`

`7.
`
`In 1997, after I had joined the faculty at University of California,
`
`Irvine ("UCI"), I co-authored a paper entitled Architectural Adaptation for
`
`Application-Specific Locality Optimizations with Xingbin Zhang, Ali Dasdan, and
`
`Dr. Chien (all at University of Illinois, Urbana-Champaign ("UIUC") at the time)
`
`and Martin Schulz (at the Institut für Informatik, Technische Universität
`
`München). This paper was presented at the International Conference on Computer
`
`Design - VLSI in Computers and Processors ("VLSI '97 Conference"). The
`
`VLSI '97 Conference was sponsored by the IEEE, specifically the IEEE Computer
`
`Society Technical Committee on Design Automation and the IEEE Circuits and
`
`Systems Society, and was held in Austin, Texas between October 12-15, 1997.
`
`8.
`
`I confirm based on my personal knowledge that this paper was
`
`submitted to the conference organizers before the VLSI '97 Conference and that
`
`this paper was included in the IEEE printed publication that was distributed to the
`
`conference attendees. This paper that I co-authored was published as pp. 150-156
`
`of the VLSI '97 Conference Proceedings by the IEEE in 1997. The copyright page
`
`of the VLSI '97 Conference Proceedings contains the following statement: "The
`
`papers in this book comprise the proceedings of the meeting mentioned on the
`
`cover and title page. They reflect the author's opinions and, in the interests of
`
`timely dissemination, are published as presented and without change." See Exhibit
`
`1003 at 2; see also Exhibit 1028, Appendix EDM01 at 4. This is consistent with
`
`my overall personal experience in attending conferences sponsored by the IEEE
`
`Computer Society and the general practice in the scientific and engineering
`
`community during this period. The conference organizers generally reviewed
`
`submitted papers to determine who would be invited to speak at the conference.
`
`The conference materials were typically printed in advance so they could be
`
`distributed at the conference, as it would have been impractical to mail the
`
`conference presentations to all the conference participants.
`
`
`

`

`9.
`
`The purpose, scope, nature of presentations, and intended audience of
`
`the VLSI '97 Conference are described in the front matter to the printed
`
`proceedings, available from the IEEE Computer Society Digital Library at
`
`https://www.computer.org/csdl/proceedings/iccd/1997/12OmNB8Cj8h, attached as
`
`Appendix G1003. These materials refresh and confirm my recollections of the
`
`purpose, scope, nature of presentations, and intended audience of the industry
`
`conferences sponsored by the IEEE and the IEEE Computer Society during this
`
`time frame.
`
`10.
`
`In 2000, I authored a research paper entitled Architectural Adaptation
`
`in AMRM Machines. I presented this paper at the Proceedings of the IEEE
`
`Computer Society Workshop on VLSI 2000 ("VLSI '00 Workshop"). 1 The
`
`VLSI '00 Workshop was sponsored by the IEEE, specifically the IEEE Computer
`
`Society Technical Committee on VLSI, and held in Orlando, Florida between April
`
`27-28, 2000.
`
`11.
`
`I confirm based on my personal knowledge that this paper was
`
`submitted to the conference organizers before the VLSI '00 Workshop and that this
`
`paper was included in the IEEE printed publication that was distributed to the
`
`conference attendees during the conference. The paper that I authored was
`
`published as pp. 75-79 of the VLSI '00 Workshop Proceedings by the IEEE in
`
`2000. As with other IEEE conferences, this paper was made available to the
`
`conference attendees at the VLSI '00 Workshop no later than the last day of that
`
`conference. My recollection is also consistent with my overall personal experience
`
`in attending conferences sponsored by the IEEE Computer Society and the general
`
`practice in the scientific and engineering community during this period. The
`
`conference organizers generally reviewed submitted papers to determine who
`
`1 I have reviewed my prior declaration (Ex. 1010) and confirm that the reference to
`"in 1997" in paragraph 25 was a typographical error and should read "in 2000."
`
`
`

`

`would be invited to speak at the conference. The conference materials were
`
`typically printed in advance so they could be distributed at the conference, as it
`
`would have been impractical to mail the conference presentations to all the
`
`conference participants. I attended the VLSI '00 Workshop and received a printed
`
`copy of these published papers no later than the last day of that conference.
`
`12. The purpose, scope, nature of presentations, and intended audience of
`
`the VLSI '00 Workshop are described in the front matter to the printed
`
`proceedings, available from the IEEE Computer Society Digital Library at
`
`https://www.computer.org/csdl/proceedings/wvlsid/2000/12OmNzl3WWV,
`
`attached as Appendix G1004. These materials refresh and confirm my
`
`recollections of the VLSI '00 Workshop.
`
`13. As stated in my prior declaration, I reviewed each of Exhibits 1003,
`
`1004, and 1005, and I confirm based on my personal knowledge as a co-author or
`
`author that each is what it purports to be. Specifically, Exhibit 1003 is a true and
`
`correct copy of the article that I co-authored and submitted for the International
`
`Conference on Computer Design - VLSI in Computers and Processors in 1997,
`
`titled Architectural Adaptation for Application-Specific Locality Optimizations;
`Ex. 1004 is a true and correct copy of the article that I authored and submitted for
`the Proceedings of the IEEE Computer Society Workshop on VLSI 2000 in 2000,
`
`titled Architectural Adaptation in AMRM Machines; and Ex. 1005 is a true and
`
`correct copy of the article that I co-authored and submitted for the Frontiers '96,
`
`The Sixth Symposium on the Frontiers of Massively Parallel Computing in 1996,
`
`titled MORPH: A System Architecture for Robust Higher Performance Using
`Customization.
`
`
`

`

`I declare under penalty of perjury that the foregoing is true and correct.
`
`Date: March 31, 2021
`
`
`

`

`Appendix G1001
`
`

`

`
`
`Frontiers '96
`
`The Sixth Symposium on
`The Frontiers of Massively Parallel Computation
`
`October 27 - 31, 1996
`
`Annapolis, Maryland
`
`Sponsored by
`IEEE Computer Society
`
`The Institute of Electrical and Electronics Engineers, Inc.
`50 Years of Service, 1946-1996
`
`

`

`Proceedings
`
`Frontiers ‘96
`
`The Sixth Symposium on the
`Frontiers of Massively Parallel Computing
`
`
`October 27-31, 1996
`Annapolis, Maryland
`
`Sponsored by
`IEEE Computer Society
`
`In cooperation with
`NASA Goddard Space Flight Center
`USRA/CESDIS
`
`IEEE Computer Society Press
`Los Alamitos, California
`Washington
`Brussels
`Tokyo
`
`

`

`
`
`
`
`IEEE Computer Society Press
`10662 Los Vaqueros Circle
`P.O. Box 3014
`Los Alamitos, CA 90720-1264
`
`Copyright © 1996 by The Institute of Electrical and Electronics Engineers, Inc.
`All rights reserved.
`
`Copyright and Reprint Permissions: Abstracting is permitted with credit to the source. Libraries may
`photocopy beyond the limits of US copyright law, for private use of patrons, those articles in this volume
`that carry a code at the bottom of the first page, provided that the per-copy fee indicated in the code is paid
`through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923.
`
`Other copying, reprint, or republication requests should be addressed to: IEEE Copyrights Manager, IEEE
`Service Center, 445 Hoes Lane, P.O. Box 1331, Piscataway, NJ 08855-1331.
`
`The papers in this book comprise the proceedings of the meeting mentioned on the cover and title page. They
`reflect the authors’ opinions and, in the interests of timely dissemination, are published as presented and
`without change. Their inclusion in this publication does not necessarily constitute endorsement by the
`editors, the IEEE Computer Society Press, or the Institute of Electrical and Electronics Engineers, Inc.
`
`IEEE Computer Society Press Order Number PRO7551
`IEEE Order Plan Catalog Number 96TB 100062
`ISBN 0-8186-7551-9
`Microfiche ISBN 0-8186-7553-5
`ISSN 1088-4955
`
`Additional copies may be ordered from:
`
`IEEE Computer Society Press
`Customer Service Center
`10662 Los Vaqueros Circle
`P.O. Box 3014
`Los Alamitos, CA 90720-1314
`Tel: +1-714-821-8380
`Fax: +1-714-821-4641
`Email: cs.books@computer.org
`
`IEEE Service Center
`445 Hoes Lane
`P.O. Box 1331
`Piscataway, NJ 08855-1331
`Tel: +1-908-981-1393
`Fax: +1-908-981-9667
`misc.custserv@computer.org
`
`IEEE Computer Society
`13, Avenue de l’Aquilon
`B-1200 Brussels
`BELGIUM
`Tel: +32-2-770-2198
`Fax: +32-2-770-8505
`euro.ofc@computr.org
`
`IEEE Computer Society
`Ooshima Building
`2-19-1 Minami-Aoyama
`Minato-ku, Tokyo 107
`JAPAN
`Tel: +81-3-3408-3118
`Fax: +81-3-3408-3553
`tokyo.ofc@computer.org
`
`Editorial production by Penny Storms
`Cover by Kerry Bedford and Alex Torres
`Printed in the United States of America by KNI, Inc.
`
`The Institute of Electrical and Electronics Engineers, Inc.
`
`

`

`
`Panel Session—Petaflops Alternative Paths
`Panel Chair: Paul Messina, California Institute of Technology
`
`Session 7: Invited Speaker
`Independence Day
`Steven Wallach, HP-Convex
`
`Session 8A: Synchronization
`A Fair Fast Distributed Concurrent-Reader Exclusive-Writer Synchronization .... 246
`T.J. Johnson and H. Yoon
`Lock Improvement Technique for Release Consistency in Distributed
`Shared Memory Systems .... 255
`S.S. Fu and N-F. Tzeng
`A Quasi-Barrier Technique to Improve Performance of an Irregular Application .... 263
`H.V. Shah and J.A.B. Fortes
`
`Session 8B: Networks
`Performance Analysis and Fault Tolerance of Randomized Routing on
`Clos Networks .... 272
`M. Bhatia and A. Youssef
`Performing BMMC Permutations in Two Passes through the Expanded
`Delta Network and MasPar MP-2 .... 282
`L.F. Wisniewski, T.H. Cormen, and T. Sundquist
`Macro-Star Networks: Efficient Low-Degree Alternatives to Star Graphs
`for Large-Scale Parallel Architectures
`C-H. Yeh and E. Varvarigos
`
`Session 9A: Performance Analysis
`Modeling and Identifying Bottlenecks in EOSDIS .... 300
`J. Demmel, M.Y. Ivory, and S.L. Smith
`Tools-Supported HPF and MPI Parallelization of the NAS Parallel Benchmarks .... 309
`C. Clémençon, K.M. Decker, V.R. Deshpande, A. Endo,
`J. Fritscher, P.A.R. Lorenzo, N. Masuda, A. Müller,
`R. Ruhl, W. Sawyer, B.J.N. Wylie, and F. Zimmerman
`A Comparison of Workload Traces from Two Production Parallel Machines .... 319
`K. Windisch, V. Lo, D. Feitelson, R. Moore, and B. Nitzberg
`Morphological Image Processing on Three Parallel Machines .... 327
`M.D. Theys, R.M. Born, M.D. Allemang, and H.J. Siegel
`
`Session 9B: Petaflops Computing / Point Design Studies
`MORPH: A System Architecture for Robust High Performance Using
`Customization (An NSF 100 TeraOps Point Design Study) .... 336
`A.A. Chien and R.K. Gupta
`Architecture, Algorithms and Applications for Future Generation
`Supercomputers
`V. Kumar, A. Sameh, A. Grama, and G. Karypis
`
`

`

`MORPH: A System Architecture for Robust High Performance
`Using Customization
`(An NSF 100 TeraOps Point Design Study)
`
`Andrew A. Chien
`Rajesh K. Gupta
`Department of Computer Science
`University of Illinois
`Urbana, Illinois 61801
`{achien,rgupta}@cs.uiuc.edu
`
`Abstract
`Achieving 100 TeraOps performance within a ten-year
`horizon will require massively-parallel architectures
`that exploit both commodity software and hardware
`technology for cost efficiency. Increasing clock
`rates and system diameter in clock periods will make
`efficient management of communication and coordination
`increasingly critical. Configurable logic presents a
`unique opportunity to customize bindings, mechanisms,
`and policies which comprise the interaction of processing,
`memory, I/O and communication resources. This
`programming flexibility, or customizability, can provide
`the key to achieving robust high performance.
`The MultiprocessOr with Reconfigurable Parallel
`Hardware (MORPH) uses reconfigurable logic blocks
`integrated with the system core to control policies,
`interactions, and interconnections. This integrated
`configurability can improve the performance of local memory
`hierarchy, increase the efficiency of interprocessor
`coordination, or better utilize the network bisection
`of the machine. MORPH provides a framework
`for exploring such integrated application-specific
`customizability. Rather than complicate the situation,
`MORPH’s configurability supports component software
`and interoperability frameworks, allowing direct support
`for application-specified patterns, objects, and structures.
`This paper reports the motivation and initial
`design of the MORPH system.
`
`1 Introduction
`
`Increasing reliance on computational techniques for
`scientific inquiry, complex systems design, faster than
`real life simulations, and higher fidelity human
`computer interaction continue to drive the need for ever
`higher performance computing systems. Despite rapid
`progress in basic device technology [1], and even in
`uniprocessor computing technology [2], these applications
`demand systems scalable in every aspect: processing,
`memory, I/O, and particularly communication.
`In addition to advances in raw processing power,
`we must also achieve dramatic improvements in system
`usability. Current day scalable systems are still quite
`difficult to program, and in many cases effectively
`precluding use of the most sophisticated (and most efficient)
`algorithms. Even when successfully used, most
`systems exhibit substantial performance fragility due
`to rigid architectural choices that do not work well
`across different applications.
`
`Based on technology projections for the 2007 design
`window for the NSF point design studies, our analyses
`indicate that increases in communication cost relative
`to computation (gate speeds) make configurable logic
`practical in an ever broader range of the system. The
`benefit of configurable logic is that it can be used to
`customize the machine’s behavior to better match that
`required by the application — in essence a machine can
`be tuned for each application with little or no performance
`penalty for this generality. While a broad variety
`of such architectures are possible, MORPH is a
`design point which explores the potential of integrating
`configurability deep into the system core.
`
`Because technology trends continue to increase the
`importance of communication, the MORPH architecture
`focuses on exploiting configurability to manage
`locality, communication, and coordination. In particular,
`the MORPH design study is exploring improved efficiency
`and scalability by exploring novel mechanisms
`for binding and mechanisms which comprise the interaction
`of processing, memory, I/O, and communication
`resources. Other innovations explore flexible
`hardware granularity (e.g. mechanisms and association
`with processors and memory) and memory system
`management (e.g. cache coherence, prefetching,
`
`1088-4955/96 $5.00 © 1996 IEEE
`
`336
`
`
`

`

`ae ininterchip wiring. However, commu-
`llTemain critical inachieving high perfor-
`
`Semiconductor Industry Associa-
`
`decade hence indicate advanced
`
`Hfeature size. However at the deep
`
`on evel feature
`size as measured by transistor
`
`ength is increasingly irrelevant for velocity-
`saturated carrier transport
`[3]. Both logic density and
`
`speed ared r
`by
`thein rconnectdensity: Pitch
`
`
`ect
`lengths are any-
`e pitch, that is, up
`delay. This limits
`
`comn
`
`
`
`

`

`of less than 5ns. As an alternative organization, current
`FPGA devices also offer up to 40 Kbits of multifunctional
`memory in addition to over 60,000 usable
`gates in implementation of specialized hardware
`functions such as arithmetic or DSP functions. Unlike
`standard SRAM parts, the embedded memory can
`be used as multiple-ported SRAM or FIFO providing
`much greater flexibility in system organization. Current
`efforts in FPGA design and architecture show
`significant improvements in the efficiency of the on-chip
`memory blocks. This trend in memory efficiency
`and utilization is expected to continue and close the
`gap between FPGA and SRAM densities and up to
`10 Mbits of multi-functional embedded memory would
`be available in addition to the logic blocks in single-
`
`nections are modified based on applications will soon
`be a common feature to allow programmable logic to be
`embedded in key modules of a system and provide on-line
`programmability to change hardware functionality.
`Tools for distributed hardware control synthesis to
`allow dynamic binding of hardware resources [7], and
`synthesis of protocols to low latency hardware [9, 10, 3]
`have been successfully demonstrated. With these CAD
`and synthesis capabilities, embedded programmable
`logic can be inserted into the key parts of systems,
`and used to alter behavior dramatically with modest
`performance overhead.
`
`
`_ Thearchitectural SensOneof these Mee)
`
`
`
`
`
`

`

`Figure 1: A Flexible 100 TeraOp Architecture
`
`either a cache-coherent machine, a non-cache coherent
`machine, or even clusters of cache coherent machines
`connected by put/get or message passing. Varying
`
`

`

`eucecheorraniecticn
`‘
`anization(blocksizeor objects)
`B= heyS. a
`
`
`
`Figure2: A ‘S
`(AddressSpace Seeorganizations
`<
`+ A wide rang
`,
`.
`ured.
`:
`‘Cache coherence) canbe config-
`
`ementereEeeryplicationsthephys:
`seevaplero, global network) andby adcs
`:
`urce
`m:

`Sadie
`special hardware structures [30, 31] such as fast barrier
`or broadcast support for machine subsets or the
`entire machine, to optimize performance. For example,
`experience over the last ten years demonstrates
`that intraprocessor communication mechanisms (data
`shared through the cache) are much more efficient than
`interprocessor mechanisms. When machine
`configuration granularity matches the application,
`extremely high performance can result. The programmable
`logic on both processor elements and memory
`elements
`us to dynamically associate mem-
`ies with
`SSOT
`ips,
`changing the node granu-J
`on set up. In another example, some
`5
`m low-latencybarriers,
`icast. Suchstructu
`
` even the bes
`
`
`

`

`feweaeeee likely to present too inflexible a
`dettnndi: P
`oth a wide range of applications and
`
`believe ie irregular, adaptive applications well. We
`wide
`at, achieving scalable high performance on a
`
`elesion of applications demands the development
`
`A
`Ologies (automatic and high level abstractions
`
`or programmer assisted decisions) to exploit the flexibility
`of our proposed architecture.
`
`There are two basic types of techniques for identifying
`opportunities for customization: static analysis
`(compiler analysis and directives) and dynamic adaptation
`(profiles and dynamic statistics) to rationally
`make use of the flexibility to optimize the mapping and
`execution of the program. It is imperative that good
`performance be achievable with modest effort and the
`evels
`of performance be available with reason-
`
`
`iqueswhichexploit aggressive
`s
`(35, 36, 37,38, 39, 40, 41, 42],
`statistics to optimize pro-
`essential to the pro-
`
`
`
`
`

`

`=
`
`ewing for Perfor-
`
`5 Application-Driven Customizability
`
`not possible. Critical issues in assessing mechanisms
`include hardware cost, cycle-time impact, configuration
`cost, effect on software, protection mechanism(s)
`needed, etc. We describe several illustrative examples
`below to show the leverage and importance of a flexible
`architecture. However, with such a small group,
`these are neither representative nor typical; however,
`they do illustrate the overall architectural framework
`that MORPH provides.
`
`Figure 3: A Software Architecture for a 100 TeraOp
`k
`td
`“4
`: Strid
`
`Sin a nut
`
`
`

`

`
`ee zations to be implemented with low over-
`latency etthe communication reduction and lower
`
`course,
`the efits without computational overhead. Of
`tomoe fre are a wealth of cache system optimiza-
`
`beans posedwithin parallel machines which could
`bi Nae tt lied in anapplication-specific mannerto achieve
`
`OSSt Performance(52, 53, 32).
`
`
`
`

`

`Integrated Circuits, ch. 8,
`
`

`

`We Ha net PP522-545; August 1993.
`
`-
`Hall, K, Kennedy, and K,8. McKinley, “In-
`parallel code gener-
`fesewi Augisalolereneoy
`ov. 199] ing (Supercomputing

`,
`Burke, and P. Carini,
`“Efficient
`srocedural com tation ofpointer-
`sidearea in amenee ey
`SIGPLAN
`foam
`$ of Programming
`Languages,
`
`
`
`
`
`

`

`
`
`
`

`

`Appendix G1002
`
`

`

`Proceedings
`
`Frontiers ‘96
`
`The Sixth Symposium on the
`Frontiers of Massively Parallel Computing
`
`October 27-31, 1996
`Annapolis, Maryland
`
`Sponsored by
`IEEE Computer Society
`
`In cooperation with
`NASA Goddard Space Flight Center
`USRA/CESDIS
`
`IEEE Computer Society Press
`Los Alamitos, California
`Washington
`Brussels
`
`Tokyo
`
`
`

`

`IEEE Computer Society Press
`10662 Los Vaqueros Circle
`P.O. Box 3014
`Los Alamitos, CA 90720-1264
`
`Copyright © 1996 by The Institute of Electrical and Electronics Engineers, Inc.
`All rights reserved.
`
`Copyright and Reprint Permissions: Abstracting is permitted with credit to the source. Libraries may
`photocopy beyond the limits of US copyright law, for private use of patrons, those articles in this volume
`that carry a code at the bottom of the first page, provided that the per-copy fee indicated in the code is paid
`through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923.
`
`Other copying, reprint, or republication requests should be addressed to: IEEE Copyrights Manager, IEEE
`Service Center, 445 Hoes Lane, P.O. Box 1331, Piscataway, NJ 08855-1331.
`
`The papers in this book comprise the proceedings of the meeting mentioned on the cover and title page. They
`reflect the authors’ opinions and, in the interests of timely dissemination, are published as presented and
`without change. Their inclusion in this publication does not necessarily constitute endorsement by the
`editors, the IEEE Computer Society Press, or the Institute of Electrical and Electronics Engineers, Inc.
`
`IEEE Computer Society Press Order Number PRO7551
`IEEE Order Plan Catalog Number 96TB 100062
`ISBN 0-8186-7551-9
`Microfiche ISBN 0-8186-7553-5
`ISSN 1088-4955
`
`Additional copies may be ordered from:
`
`IEEE Computer Society Press
`Customer Service Center
`10662 Los Vaqueros Circle
`P.O. Box 3014
`Los Alamitos, CA 90720-1314
`Tel: +1-714-821-8380
`Fax: +1-714-821-4641
`Email: cs.books@computer.org
`
`IEEE Service Center
`445 Hoes Lane
`P.O. Box 1331
`Piscataway, NJ 08855-1331
`Tel: +1-908-981-1393
`Fax: +1-908-981-9667
`misc.custserv@computer.org
`
`IEEE Computer Society
`13, Avenue de l’Aquilon
`B-1200 Brussels
`BELGIUM
`Tel: +32-2-770-2198
`Fax: +32-2-770-8505
`euro.ofc@computr.org
`
`IEEE Computer Society
`Ooshima Building
`2-19-1 Minami-Aoyama
`Minato-ku, Tokyo 107
`JAPAN
`Tel: +81-3-3408-3118
`Fax: +81-3-3408-3553
`tokyo.ofc@computer.org
`
`Editorial production by Penny Storms
`Cover by Kerry Bedford and Alex Torres
`Printed in the United States of America by KNI, Inc.
`
`The Institute of Electrical and Electronics Engineers, Inc.
`
`
`

`

`Contents
`
`Message from the General Chair .... ix
`Message from the Program Chair .... x
`Conference Committee .... xi
`Referees .... xiii
`
`Session 1: Invited Speaker
`From ASCI to Teraflops
`John Hopson, Accelerated Strategic Computing Initiative (ASCI)
`
`Session 2A: Scheduling 1
`Gang Scheduling for Highly Efficient Distributed Multiprocessor Systems .... 4
`H. Franke, P. Pattnaik, and L. Rudolph
`Integrating Polling, Interrupts, and Thread Management .... 13
`K. Langendoen, J. Romein, R. Bhoedjang, and H. Bal
`A Practical Processor Design for Multithreading .... 23
`M. Amamiya, T. Kawano, H. Tomiyasu, and S. Kusakabe
`
`Session 2B: Routing
`Analysis of Deadlock-Free Path-Based Wormhole Multicasting in
`Meshes in Case of Contentions .... 34
`E. Fleury and P. Fraigniaud
`Efficient Multicast in Wormhole-Routed 2D Mesh/Torus Multicomputers:
`A Network-Partitioning Approach .... 42
`S-Y. Wang, Y-C. Tseng, and C-W. Ho
`Turn Grouping for Efficient Multicast in Wormhole Mesh Networks .... 50
`K-P. Fan and C-T. King
`
`Session 3A: Applications and Algorithms
`A3: A Simple and Asymptotically Accurate Model for Parallel Computation .... 60
`A. Grama, V. Kumar, S. Ranka, and V. Singh
`Fault Tolerant Matrix Operations Using Checksum and Reverse Computation .... 70
`Y. Kim, J.S. Plank, and J.J. Dongarra
`A Statistically-Based Multi-Algorithmic Approach for Load-Balancing
`Sparse Matrix Computations .... 78
`S. Nastea, T. El-Ghazawi, and O. Frieder
`
`Session 3B: Petaflops Computing / Point Design Studies
`Pursuing a Petaflop: Point Designs for 100 TF Computers Using PIM Technologies .... 88
`P.M. Kogge, S.C. Bass, J.B. Brockman, D.Z. Chen, and E. Sha
`Hybrid Technology Multithreaded Architecture .... 98
`G. Gao, K.K. Likharev, P.C. Messina, and T.L. Sterling
`The Illinois Aggressive Coma Multiprocessor Project (I-ACOMA) .... 106
`J. Torrellas and D. Padua
`
`

`

`Panel Session-How Do We Break the Barrier to the Software Frontier?
`Panel Chair: Rick Stevens, Argonne National Laboratory
`
`Session 4: Invited Speaker
`
`Session 5A: Scheduling 2
`Largest-Job-First-Scan-All Scheduling Policy for 2D Mesh-Connected Systems .... 118
`S-M. Yoo and H.Y. Youn
`Scheduling for Large-scale Parallel Video Servers .... 126
`M-Y. Wu and W. Shu
`Effect of Variation in Compile Time Costs on Scheduling Tasks on
`Distributed Memory Systems .... 134
`S. Darbha and S. Pande
`
`Session 5B: SIMD
`Proce
