UNITED STATES PATENT AND TRADEMARK OFFICE
`____________________
`
`BEFORE THE PATENT TRIAL AND APPEAL BOARD
`____________________
`
`INTEL CORPORATION,
`
`Petitioner
`
`v.
`
`FG SRC LLC,
`
`Patent Owner
`
`____________________
`
`CASE NO.: 2020-01449
`PATENT NO. 7,149,867
`____________________
`
`DECLARATION OF AUSTIN M. SCHNELL
`
`
`
`
`
`
`
`
`
`Mail Stop PATENT BOARD
`Patent Trial and Appeal Board
`U.S. Patent and Trademark Office
`P.O. Box 1450
`
`Alexandria, VA 22313-1450
`
`
`
`Intel Exhibit 1029 - 1
`
`

`

`I, Austin M. Schnell, declare as follows:
`
`1.
`
`I am an associate with the law firm of Pillsbury Winthrop Shaw Pittman
`
`LLP. I have served in that role since 2020. I have personal knowledge of the matters
`
`set forth in this declaration.
`
`2.
`
`Attached hereto as SCHN01 is a true and correct copy of Rajesh Gupta,
`
`Architectural Adaptation in AMRM Machines, Proceedings of the IEEE Computer
`
`Society Workshop on VLSI 2000 (IEEE, April 27-28, 2000), 75-79 (“Gupta”). I
`
`retrieved a physical copy of the proceedings from the University of Texas library. In
`
`addition to Gupta, SCHN01 includes true and correct copies of the front and back
`
`covers of the proceedings, the spine and edges of the proceedings, table of contents,
`
`introductory materials, and publication information.
`
`3.
`
`Attached hereto as SCHN02 is a true and correct copy of Andrew A.
`
`Chien et al., MORPH: A System Architecture for Robust High Performance Using
`Customization (An NSF 100 TeraOps Point Design Study), Proceedings of Frontiers
`
`'96 - The Sixth Symposium on the Frontiers of Massively Parallel Computing
`
`(IEEE, October 27-31, 1996), 336-345 (“Chien”). I retrieved a physical copy of the
`
`proceedings from the University of Texas library. In addition to Chien, SCHN02
`
`includes true and correct copies of the front and back covers of the proceedings, the
`
`spine and edges of the proceedings, table of contents, introductory materials, and
`
`publication information.
`
`
`
`I declare under penalty of perjury that the foregoing is true and correct.
`
`Date: March 31, 2021
`
`
`
`Austin M. Schnell
`
`Intel Exhibit 1029 - 2
`
`

`

`Appendix SCHN01
`
`Intel Exhibit 1029 - 3
`
`

`

`Proceedings
`
`IEEE Computer Society
`
`
`Workshop on VLSI 2000
`
`System Design for a System-on-Chip Era
`Orlando, Florida
`27-28 April 2000
`
`Edited by
`
`
`Asim Smailagic, Robert Brodersen and Hugo De Man
`
`TK
`7874.7-5
`1354
`
`2000
`
`Sponsored by the IEEE Computer Society Technical Committee on VLSI
`
`Intel Exhibit 1029 - 4
`
`

`

`
`Intel Exhibit 1029 - 5
`
`

`

`Proceedings
`
`
`
`THE UNIVERSITY OF TEXAS AT AUSTIN
`
`THE GENERAL LIBRARIES
`
`DUE
`
`RETURNED
`
`JUN 03 2004
`
`Intel Exhibit 1029 - 6
`
`

`

`Proceedings
`
`IEEE Computer
`Society
`Workshop on VLSI 2000
`
`System Design for a System-on-Chip
`Era
`
`27-28 April 2000
`
`Orlando,
`Florida
`
`Edited by
`Asim Smailagic,
`Robert Brodersen,
`and Hugo De Man
`
`Sponsored
`by the
`IEEE Computer
`Society
`Technical
`Committee
`on VLSI
`
`IEEE
`COMPUTER
`SOCIETY
`
`♦
`Los Alamitos,
`California
`
`Washington • Brussels • Tokyo
`
`Intel Exhibit 1029 - 7
`
`

`

`
`
`Copyright © 2000 by The Institute of Electrical and Electronics Engineers, Inc.
`
`
`
`
`
`
`
`All rights reserved
`
`Copyright and Reprint Permissions: Abstracting is permitted with credit to the source. Libraries
`may photocopy beyond the limits of US copyright law, for private use of patrons, those articles in
`this volume that carry a code at the bottom of the first page, provided that the per-copy fee
`indicated in the code is paid through the Copyright Clearance Center, 222 Rosewood Drive,
`Danvers, MA 01923.
`
`Other copying, reprint, or republication requests should be addressed to: IEEE Copyrights
`Manager, IEEE Service Center, 445 Hoes Lane, P.O. Box 1331, Piscataway, NJ 08855-1331.
`
`The papers in this book comprise the proceedings of the meeting mentioned on the cover and title
`page. They reflect the authors' opinions and, in the interests of timely dissemination, are
`published as presented and without change. Their inclusion in this publication does not
`necessarily constitute endorsement by the editors, the IEEE Computer Society, or the Institute of
`Electrical and Electronics Engineers, Inc.
`
`IEEE Computer Society Order Number PR00534
`
`
`ISBN 0-7695-0534-1
`ISBN 0-7695-0536-8(microfiche)
`
`
`Library of Congress Number 99-069215
`
`
`
`
`
`
`
`Additional copies may be ordered from:
`
`IEEE Computer Society Customer Service Center
`10662 Los Vaqueros Circle
`P.O. Box 3014
`Los Alamitos, CA 90720-1314
`Tel: +1-714-821-8380
`Fax: +1-714-821-4641
`E-mail: cs.books@computer.org
`
`IEEE Service Center
`445 Hoes Lane
`P.O. Box 1331
`Piscataway, NJ 08855-1331
`Tel: +1-732-981-0060
`Fax: +1-732-981-9667
`http://shop.ieee.org/store/
`customer-service@ieee.org
`
`IEEE Computer Society Asia/Pacific Office
`Watanabe Bldg., 1-4-2 Minami-Aoyama
`Minato-ku, Tokyo 107-0062 JAPAN
`Tel: +81-3-3408-3118
`Fax: +81-3-3408-3553
`tokyo.ofc@computer.org
`
`
`
`
`
`Editorial production by Anne Rawlinson
`
`
`
`Cover art production by Joe Daigle/Studio Productions
`
`
`
`
`
`
`
`
`
`
`
`Printed in the United States of America by The Printing House
`
`
`
`IEEE
`COMPUTER
`SOCIETY
`
`♦
`
`Intel Exhibit 1029 - 8
`
`

`

`Table of Contents
`
`Message from the General Chairs .......... ix
`Message from the Technical Program Chairs .......... xi
`Workshop Committees .......... xiii
`Steering Committee .......... xv
`
`System Level Design Methods and Examples I
`
`Mead-Conway VLSI Design Approach and System Design Challenges Ahead .......... 3
`Lynn Conway
`
`Alternative Architectures for Video Signal Processing .......... 5
`W. Wolf
`
`PicoRadio: Ad-hoc Wireless Networking of Ubiquitous Low-energy Sensor/Monitor Nodes .......... 9
`J. Rabaey, J. Ammer, J. L. da Silva Jr., and D. Patel
`
`System Level Design Methods and Examples II
`
`A System-level Approach to Power/Performance Optimization in Wearable Computers .......... 15
`A. Smailagic, D. Reilly, and D. P. Siewiorek
`
`Emerging Trends in VLSI Test and Diagnosis .......... 21
`Y. Zorian
`
`Multilanguage Design of a Robot Arm Controller: Case Study .......... 29
`P. LeMarrec, G. Nicolescu, P. Coste, F. Hessel, and A. A. Jerraya
`
`Low Power Design
`
`Instruction Scheduling Based on Energy and Performance Constraints .......... 37
`A. Parikh, M. Kandemir, N. Vijaykrishnan, and M. J. Irwin
`
`Dynamic Voltage Scaling Techniques for Distributed Microsensor Networks .......... 43
`R. Min, T. Furrer, and A. Chandrakasan
`
`v
`
`Intel Exhibit 1029 - 9
`
`

`

`Reducing the Power Consumption in FPGAs with Keeping a High Performance Level .......... 47
`A. D. Garcia G., W. P. Burleson, and J. L. Danger
`
`Multiple Access Caches: Energy Implications .......... 53
`H. S. Kim, N. Vijaykrishnan, M. Kandemir and M. J. Irwin
`
`System Level Design Examples
`
`Improved Synchronization Methodologies for High Performance Digital Systems .......... 61
`S. Welch and K. Kornegay
`
`Low Power VLSI Architecture for 2D-Mesh Video Object Motion Tracking .......... 67
`W. Badawy and M. Bayoumi
`
`Timing Issues in System Design
`
`Architectural Adaptation in AMRM Machines .......... 75
`R. Gupta
`
`An Empirical and Analytical Comparison of Delay Elements and a New Delay Element Design .......... 81
`N. Mahapatra, S. Garimella and A. Tareen
`
`Software System Design and Design Environment
`
`Specification and Validation of Information Processing Systems by Process Encapsulation and Symbolic Execution .......... 89
`W. Boßung, T. Geyer, S. A. Huss, and L. Wehmeyer
`
`Interaction in Language Based System Level Design using an Advanced Compiler Generator Environment .......... 97
`I. Poulakis, G. Economakos, and P. Tsanakas
`
`On Design-for-reusability in Hardware Description Languages .......... 103
`J. M. Chang and S. K. Agun
`
`Analysis and Synthesis of Asynchronous Circuits
`
`Fine-grain Pipelined Asynchronous Adders for High-speed DSP Applications .......... 111
`M. Singh and S. M. Nowick
`
`A Low-latency FIFO for Mixed-clock Systems .......... 119
`T. Chelcea and S. M. Nowick
`
`Advances in Multiplier Design
`
`Reconfigurable Low Energy Multiplier for Multimedia System Design .......... 129
`S. Kim and M. C. Papaefthymiou
`
`An Algorithmic Approach to Building Datapath Multipliers Using (3,2) Counters .......... 135
`H. Al-Twaijry and M. Aloqeely
`
`vi
`
`Intel Exhibit 1029 - 10
`
`

`

`
`
`
`
`
`
`Issues in System Design
`
`RT-level Interconnect Optimization in DSM Regime .......... 143
`S. Katkoori and S. Alupoaei
`
`Validation of Complex Designs through Hardware Prototyping .......... 149
`G. Stoler
`
`On-line Error Detection in Multiplexor Based FPGAs .......... 155
`A. Jaekel
`
`Author Index .......... 161
`
`vii
`
`Intel Exhibit 1029 - 11
`
`

`

`
`
`Message from the General Chairs
`
`
`Welcome to Orlando, to WVLSI '2000. We hope and trust that those of you
`attending the workshop find it to be both enjoyable and a productive use of your
`time. WVLSI has become a regular annual forum for researchers to exchange
`ideas in the area of VLSI and system level design, in particular.
`
`This workshop has been successful over the past decade due to the many
`members of the VLSI community who have volunteered their efforts. In particular,
`we would like to thank the Program Co-chairs Asim Smailagic, Robert Brodersen
`and Hugo De Man for having put together a program of high technical
`excellence.
`
`Srinivas Katkoori and Vamsi Krishna helped in coordinating the publicity for the
`workshop and earn our thanks for their excellent job. We appreciate the help
`from the past General Co-Chairs, Nagarajan Ranganathan and Anantha
`Chandrakasan, in organizing the conference. Crucial administrative help came
`from the members of the IEEE Computer Society, in particular, Anne Marie Kelly,
`Mary-Kate Rada and Maggie Johnson.
`
`Welcome once again and enjoy the workshop program.
`
`Vijaykrishnan Narayanan
`
`
`
`The Pennsylvania State University, USA
`
`Mary Jane Irwin
`
`
`
`The Pennsylvania State University, USA
`
`ix
`
`Intel Exhibit 1029 - 12
`
`

`

`
`
`Message from the Technical Program Chairs
`
`It is our distinct pleasure to welcome you to the IEEE Computer Society Annual
`Workshop on VLSI in Orlando, FL.
`
`This Workshop explores emerging trends and novel concepts in the area of
`VLSI. The theme of the Workshop is System Design for a System-on-Chip Era.
`System Level Design has been identified as a dominant research theme for the
`next decade. System Design has been gaining significance and momentum
`recently due to the emergence of system-on-a-chip designs. New visionary
`approaches at the system design level are needed to exploit the great
`opportunities created by the continuous advances in technology and
`miniaturization of the semiconductor devices.
`
`System design is converging on a paradigm which includes general purpose
`commodity chips (i.e. processors, memories, DSP) and full custom mixed
`analog and digital application specific integrated circuits (ASICs) integrated via
`programmable gate arrays on custom printed circuit boards or complete silicon
`boards, System-on-a-Chip. These hardware systems will be driven by custom,
`real time software that utilizes the latest software design paradigms (i.e. object
`oriented languages, client-server architecture, browser interfaces) and wireless
`communications to provide users with unique functionality. To be effective, these
`systems must be optimized taking into account a variety of constraints including
`complexity, power consumption, heat dissipation, mechanical packaging,
`ergonomics, and design effort. Also, future system design methodologies are an
`important topic at the Workshop.
`
`We are glad to have a number of leading scientists and distinguished speakers
`on the workshop program, providing a unique opportunity for the attendees to
`hear the recent research results in this technical area. It is the face to face
`meetings with each other that attendees will probably value most, which is why
`we have tried to maintain a schedule permitting such interactions.
`
`We would like to acknowledge the effort and help from the program committee
`members, and thank the authors and invited speakers for their contributions to an
`outstanding technical program. We gratefully acknowledge the diligent work of
`Anne Rawlinson, of the IEEE Computer Society Press, on the workshop
`proceedings.
`
`xi
`
`Intel Exhibit 1029 - 13
`
`

`

`It is our sincere hope that each attendee will benefit greatly from participating in
`this conference, and will find these proceedings to be a valuable source of
`information for your future work.
`
`Asim Smailagic
`Carnegie Mellon University
`
`Robert Brodersen
`University of California at Berkeley
`
`Hugo De Man
`IMEC, Belgium
`
`xii
`
`Intel Exhibit 1029 - 14
`
`

`

`General Chairs
`
`Vijaykrishnan Narayanan
`The Pennsylvania State University, USA
`
`Mary Jane Irwin
`The Pennsylvania State University, USA
`
`Technical Program Chairs
`
`Asim Smailagic
`Carnegie Mellon University, USA
`
`Robert Brodersen
`University of California, Berkeley, USA
`
`Hugo De Man
`IMEC, Belgium
`
`Program Committee
`
`Gaetano Borriello, University of Washington, USA
`Don Bouldin, University of Tennessee, USA
`Francky Catthoor, IMEC, Belgium
`Anantha Chandrakasan, Massachusetts Institute of Technology, USA
`Mike Connors, Intel Corporation, USA
`Rolf Ernst, University of Braunschweig, Germany
`Georges Gielen, Katholieke Universiteit Leuven, Belgium
`Mike Gordon, Cambridge University, UK
`Rajesh Gupta, University of California, Irvine, USA
`Sorin Huss, Darmstadt University of Technology, Germany
`Fritz Prinz, Stanford University, USA
`Teresa Meng, Stanford University, USA
`
`xiii
`
`Intel Exhibit 1029 - 15
`
`

`

`Hidetoshi Onodera, Kyoto University, Japan
`Jef Van Meerbergen, Philips, Netherlands
`Pierre Paulin, SGS-Thomson, Grenoble, France
`Jan Rabaey, University of California, Berkeley, USA
`Zoran Salcic, University of Auckland, New Zealand
`Dan Siewiorek, Carnegie Mellon University, USA
`Mani Srivastava, UCLA, USA
`Diederik Verkest, IMEC, Belgium
`Vijaykrishnan Narayanan, The Pennsylvania State University, USA
`Jacob White, Massachusetts Institute of Technology, USA
`Wayne Wolf, Princeton University, USA
`
`Publicity Chairs
`
`Srinivas Katkoori
`University of South Florida, Tampa, USA
`
`Vamsi Krishna
`HP Labs, USA
`
`Registration Chair
`
`Srinivas Aruru
`Intel Corporation, USA
`
`xiv
`
`Intel Exhibit 1029 - 16
`
`

`

`
`
`Steering Committee
`
`A. Mukherjee
`University of Central Florida, Orlando, USA
`
`D. W. Bouldin
`University of Tennessee, Knoxville, USA
`
`N. Ranganathan
`University of South Florida, Tampa, USA
`
`P. A. Subramanyam
`AT&T Bell Labs, Murray Hill, USA
`
`J. A. B. Fortes
`Purdue University, West Lafayette, USA
`
`xv
`
`Intel Exhibit 1029 - 17
`
`

`

`
`
`
`
`Architectural Adaptation in AMRM Machines
`
`
`
`Rajesh Gupta
`
`Information and Computer Science
`
`
`
`
`University of California, Irvine
`
`Irvine, CA 92697
`
`rgupta@ics.uci.edu
`
`Abstract
`
`Application adaptive architectures use architectural mechanisms and policies to achieve
`system level performance goals. The AMRM project at UC Irvine focuses on adaptation of the
`memory hierarchy and its role in latency and bandwidth management. This paper describes
`the architectural principles and first implementation of the AMRM machine proof-of-concept
`prototype.
`
`1. Introduction
`
`Modern computer system architectures represent design tradeoffs and optimizations
`involving a large number of variables in a very large design space. Even when successfully
`implemented for high performance, which is benchmarked against a set of representative
`applications, the performance optimization is only in an average sense. Indeed, the
`performance variation across applications and against changing data set even in a given
`application can easily be by an order of magnitude [1]. In other words, delivered
`performance can be less than one tenth of the system performance that the underlying
`hardware is capable of.
`
`A primary reason for this fragility in performance is that rigid architectural choices
`related to organization of major system blocks (CPU, cache, memory, IO) do not work well
`across different applications.
`
`Architectural Adaptivity provides an attractive means to ensure robust high performance.
`Architectural adaptation refers to the capability of a machine to support multiple
`architectural mechanisms and policies that can be tailored to application and/or data
`needs [2]. There are a number of places where architectural adaptivity can be used, for
`instance, in tailoring the interaction of processing with I/O, customization of CPU
`elements (e.g., splittable ALU resources) etc.
`
`In view of the microelectronic technology trends that emphasize increasing importance of
`communication at all levels, from network interfaces to on-chip interconnection fabrics,
`communication represents the focus of our studies in architectural adaptation. In this
`context, memory system latency and bandwidth issues are key determining factors in
`performance of high performance machines because these can provide a constant multiplier
`on the achievable system performance [1]. Further, this multiplier decreases as the memory
`latency fails to improve as fast as processor clock speeds.
`
`Consider a hypothetical machine with processing elements running at 2 GHz with eight-way
`superscalar pipelines. Assuming a typical 1 microsecond round-trip latency for a cache
`miss, this corresponds to about 16K instructions, with an average 30% or 4800 instructions
`being load/store. For a single-thread execution a miss rate as low as 0.02% reduces
`computing efficiency by as much as 50%. This points to a need for very low miss rates to
`ensure that high-throughput CPUs can be kept busy. A similar analysis of the bisection
`bandwidth concludes that active bandwidth management is required to reduce the need for
`communication and to increase the number of operations before a communication is
`necessary.
`
`2. The AMRM Project
`
`The Adaptive Memory Reconfiguration Management, or the AMRM, project at the University of
`California, Irvine aims to find ways to improve the memory system performance of a
`computing system. The basic system architecture reflects the view that communication is
`already critical and getting increasingly so [3], and flexible
`
`0-7695-0534-1/00 $10.00 © 2000 IEEE
`
`75
`
`Intel Exhibit 1029 - 18
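The arithmetic behind the hypothetical machine in the Introduction above can be checked with a short calculation. The sketch below is an illustration only and is not part of the paper: the clock rate, issue width, miss latency, load/store share, and miss rate are the figures quoted in the text, while the function names and the stall model (every miss idles all issue slots for the full round trip, with no overlap) are assumptions.

```python
# Figures quoted in the Introduction.
CLOCK_HZ = 2_000_000_000      # 2 GHz processing element
ISSUE_WIDTH = 8               # eight-way superscalar pipeline
MISS_LATENCY_NS = 1_000       # 1 microsecond round trip per cache miss
LOAD_STORE_PERCENT = 30       # average share of load/store instructions


def slots_per_miss():
    """Instruction issue slots that elapse during one miss round trip."""
    cycles = CLOCK_HZ * MISS_LATENCY_NS // 1_000_000_000
    return cycles * ISSUE_WIDTH


def load_stores_per_window():
    """Load/store instructions among the instructions in one such window."""
    return slots_per_miss() * LOAD_STORE_PERCENT // 100


def efficiency(miss_rate):
    """Fraction of peak throughput retained at a given load/store miss rate.

    Assumed model (not from the paper): single thread, and each miss
    stalls every issue slot for the whole round trip.
    """
    stall_slots_per_instruction = (
        (LOAD_STORE_PERCENT / 100) * miss_rate * slots_per_miss()
    )
    return 1.0 / (1.0 + stall_slots_per_instruction)


if __name__ == "__main__":
    print(slots_per_miss())              # 16000, the "about 16K instructions"
    print(load_stores_per_window())      # 4800 load/store instructions
    print(round(efficiency(0.0002), 2))  # 0.51 at a 0.02% miss rate
```

Under this assumed model, a 0.02% miss rate on 4800 load/stores gives about 0.96 misses per 16K-instruction window, so roughly one full-latency stall per window. That is consistent with the stated loss of as much as 50% of computing efficiency.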
`
`

`

`
`
`
`
`interconnects can be used to replace static wires at competitive performance in
`interconnect dominated microelectronic technologies [4,5,6]. The AMRM machine uses
`reconfigurable logic blocks integrated with the system core to control policies,
`interactions, and interconnections of memory to processing [7]. The basic machine
`architecture supports application-specific cache organization and policies,
`hardware-assisted blocking, prefetching and dynamic cache structures (such as stream,
`victim caches, stride prediction and miss history buffers) that optimize the movement and
`placement of application data through the memory hierarchy. Depending upon the hardware
`technology used and the support available from the runtime environment this adaptation can
`be done statically or at run-time. In the following section we describe a specific
`mechanism for latency management that is shown to provide significant performance boost
`for the class of applications characterized by frequent accesses to linked data structures
`scattered in the physical memory. This includes algorithms that operate on sparse matrices
`and linked trees.
`
`2.1 Adaptation for Latency Management
`
`Latency management refers to techniques for hiding long latencies of memory accesses by
`useful computation. Most common technique for latency hiding is by prefetching of data to
`the CPU. Prefetching is combined with smaller and faster (cache) memory elements that
`attempt to prefetch application context(s) rather than single data elements. The
`pointer-based accesses to data items in memory hierarchies typically yield poor results
`because the indirection introduces main memory and memory hierarchy latencies into the
`innermost computational loop. Techniques such as software prefetching (loop unrolling and
`hoisting of loads) do not adequately solve the problem, as prefetching at the processor
`leaves multi-level memory hierarchy latency in the critical path. Purely hardware
`prefetching [8] is also often ineffective because the address references generated by an
`application may contain no particular address structure. We use an application-specific
`prefetching scheme that resides in dedicated hardware at arbitrary levels of the memory
`hierarchy, in all of them, or to bypass them completely. This hardware performs
`application-specific prefetching, based on the address ranges of data structures used. When
`there is a reference to an address inside this range, the prefetch hardware will prefetch
`the "next" element pointed to by the current element. The pointer field for the next
`element can be changed at runtime.
`
`This prefetch hardware is combined, for some applications, with address translation and
`compaction hardware in the memory controller that works well with data structures that do
`not quite fit into a single cache line. The address translation is done transparently from
`the application using a translate hardware assist in the cache controller and a
`corresponding gather hardware assist in the memory controller. Simulation results using
`this prefetch hardware show a 10X reduction in read miss rates and 100X reduction in data
`volume for sparse matrix multiply operations [9].
`
`3. The AMRM System Prototype
`
`While AMRM simulation results continue to provide valuable insights into the space of
`architectural mechanisms and their effectiveness [7][9][10], a system implementation is
`needed to bring together different parts of the AMRM project (including compiler and
`runtime system algorithms to support adaptivity). The AMRM system prototype is divided into
`two phases. First phase consists of implementation of a board-level prototype; followed by
`a second phase single-chip implementation of the cache memory system. At the time of this
`writing, the first phase of the project prototype implementation has recently been
`completed. The rest of this paper describes the system design and implementation of the
`Phase I prototype and its relationship to the ongoing second phase ASIC prototype.
`
`The AMRM phase I prototype board is designed to serve two purposes. It can simulate a range
`of memory hierarchies for applications running on a host processor. The board supports
`configurability of the cache memory via an on-board FPGA-based memory controller. The board
`is also designed to be used, in future, as a complete system platform, with on-board memory
`serving as main memory, via a mezzanine card containing the Phase II AMRM ASIC
`implementation.
`
`Whereas the goal of the AMRM prototype is to build an adaptive cache memory system, in
`general, the memory hierarchy performance cannot be de-coupled from the processor
`instruction set architecture, implementation, and compiler implementation. (This is
`particularly true of the CPU-L1 path that is often pipelined using non-blocking caches.) It
`would, therefore, be desirable to evaluate any proposed changes to the memory hierarchy for
`several processor architectures rather than being tied to one specific processor type and
`its
`
`76
`
`Intel Exhibit 1029 - 19
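The range-based "next element" prefetching described in Section 2.1 above can be sketched as a toy software model. This is a hedged illustration and not the AMRM hardware: the class and function names, the unbounded cache, and the assumption that a prefetch always completes before the next access are all hypothetical. Only the behavior being modeled is taken from the text (a configured address range, a pointer field that can be changed at runtime, and a prefetch of the element the current element points to).

```python
class RangePrefetcher:
    """Toy model of range-based "next element" prefetching.

    All names are hypothetical. The cache is unbounded and a prefetch is
    assumed to complete before the next access, which real hardware
    cannot guarantee.
    """

    def __init__(self, memory, lo, hi, next_field, enabled=True):
        self.memory = memory          # address -> record (dict of fields)
        self.lo, self.hi = lo, hi     # address range of the data structure
        self.next_field = next_field  # pointer field; may be changed at runtime
        self.enabled = enabled
        self.cached = set()
        self.misses = 0

    def load(self, addr):
        """Access one element, counting a miss if it was not cached."""
        if addr not in self.cached:
            self.misses += 1
            self.cached.add(addr)
        record = self.memory[addr]
        # A reference inside the configured range triggers a prefetch of
        # the element that the current element points to.
        if self.enabled and self.lo <= addr < self.hi:
            nxt = record.get(self.next_field)
            if nxt in self.memory:
                self.cached.add(nxt)
        return record


def walk(prefetcher, head):
    """Traverse a linked list through the prefetcher, returning its values."""
    values, addr = [], head
    while addr is not None:
        record = prefetcher.load(addr)
        values.append(record["value"])
        addr = record[prefetcher.next_field]
    return values


def scattered_list():
    """Build a four-node list whose nodes sit at unrelated addresses."""
    addrs = [0x1000, 0x5040, 0x2200, 0x7F00]
    memory = {}
    for i, addr in enumerate(addrs):
        nxt = addrs[i + 1] if i + 1 < len(addrs) else None
        memory[addr] = {"value": i, "next": nxt}
    return memory, addrs[0]


if __name__ == "__main__":
    memory, head = scattered_list()
    with_prefetch = RangePrefetcher(memory, 0x0000, 0x8000, "next")
    without = RangePrefetcher(memory, 0x0000, 0x8000, "next", enabled=False)
    walk(with_prefetch, head)
    walk(without, head)
    print(with_prefetch.misses, without.misses)  # 1 4
```

In this idealized model only the first node misses when prefetching is enabled, because the node addresses have no stride for a conventional hardware prefetcher to detect but the pointer field still identifies the next element. Timing effects, which decide how much of this benefit real hardware would see, are deliberately left out.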
`
`

`

`software. In order to be able to evaluate the effect with different processor architectures
`as well as to circumvent the implementation difficulties, our Phase I implementation
`attaches an additional memory hierarchy to a system through a standard peripheral bus.
`Thus, the board provides a PCI interface that allows a host processor to use the board as a
`part of its memory hierarchy. Applications running on the host processor are instrumented
`automatically using the AMRM compiler to use the memory on the AMRM board. Thus direct
`program execution can proceed on the host processor while the extra memory hierarchy is
`being exercised.
`
`3.1 AMRM System Goals
`
`One goal of the AMRM prototype system is that it be adaptable to many different memory
`hierarchy architectures. Another goal of the AMRM system is that it be useful for running
`real time program execution or even memory simulations. The latter is accomplished by
`making the AMRM memory available to the user and converting user program to access this
`memory "directly". The former is accomplished through the use of a sequence of
`address/command type requests "run" through various memory system configurations. The AMRM
`system is to be fast enough to support extensive execution or simulation.
`
`A CPU interfaces to the reconfigurable AMRM memory system through the PCI bus. AMRM accepts
`CPU PCI requests for memory operations, issues them to the attached memory system, and
`sends back the data for memory read operations as well as memory access time information.
`
`3.2 AMRM prototype architecture
`
`Figure 1 shows the main components of the AMRM prototype board. It consists of a general
`3-level memory hierarchy plus support for the AMRM ASIC chip implementing architectural
`assists within the CPU-L1 datapath. The host interface is managed by a Motorola PLX 9080
`processor. The FPGAs on the board contain controllers for the SRAM, DRAM and L1 cache. A
`1 MB SRAM is used for tag and data store for the L1 cache. A total of 512 MB of DRAM is
`provided to implement part of the cache hierarchy (and also to serve as main memory by
`reloading the memory controller into the FPGA.)
`
`The board implementation necessarily hard-wires certain parameters of the memory hierarchy.
`This includes the board's clock. In order to perform detailed and accurate simulation of
`diverse memory hierarchy configurations at any clock speed, a hardware "virtual" clock has
`been implemented as part of the performance monitoring hardware. Performance monitoring
`hardware primarily includes various event counters, which are memory-mapped and readable
`from the host processor. The "virtual clock" emulates a target system's clock: the clock
`rate is determined by the target system's memory hierarchy design and technology
`parameters. For example, the delay for an L1 cache hit, miss fetch etc in terms of virtual
`clocks can be configured by the host to emulate a given target cache design. Thus, the use
`of virtual clock allows us to simplify the hardware implementation. For instance, the tag
`and data stores of L1 cache can be a single RAM while the timing may reflect a design with
`two separate RAM's.
`
`3.3 Command Interface to the AMRM Board
`
`The memory hierarchy on the AMRM board can be used by an application running on the host
`processor by writing commands to specific addresses in the PCI address space. Each command
`consists of a set of four words that specify the operation (e.g., memory read/write,
`register read/write), the address of the location to access and data in case of a write.
`For read commands, a read response is generated and data is written into the host's memory.
`
`For debugging purposes and to enable the cache to be flushed by the host, there are
`commands to access memory banks directly, i.e., without going through the caches. Commands
`are also available to read/write the status, configuration registers and performance
`counters.
`
`The onboard command processor reads a command and launches its execution in the AMRM
`board. Data is read from the cache and sent back to the processor if it hits in the cache.
`It takes m Virtual Clock cycles. Otherwise it is requested from the next level in the
`hierarchy. Writes take n virtual clocks. Upon load command completion the data can be
`written into the system memory for access by the host processor. Both parameters n and m
`can be programmed under compiler control.
`
`3.4 Virtual Clock System
`
`The virtual clock system consists of a master clock counter, Vtime, the virtual clock
`signal, Vclk,
`
`77
`
`Intel Exhibit 1029 - 20
`
`

`

`Ready inputs, and associated virtual clock generation logic. The Ready input
`from each major memory hierarchy module specifies that this unit has
`completed the current virtual clock cycle activities. A new Vclk edge/period
`is generated when all Ready inputs reach 1. The Vtime can be read out by the
`host processor to determine the current virtual time. It can also be
`automatically supplied to the CPU via host memory (as opposed to the AMRM
`board memory).
`
`Each major unit in the memory hierarchy is designed to generate Ready and
`wait for the Vclk, when appropriate. In most cases the designs actually use
`AutoReady counters local to each unit which can be loaded with a programmable
`number of cycles. An AutoReady counter generates Ready using the Vclk while
`its output is non-zero. An idle unit not processing any requests also outputs
`a Ready signal every Vclk. For instance, consider the AMRM Read command
`addressed to the on-board memory hierar
`
`an AMRM chip that uses an ASIC implementation of the AMRM cache assist
`mechanisms. It is positioned between L1 cache and the rest of the system and
`can be accessed in parallel with the L2 cache. It can thus accept and supply
`data coming from or going to the L1 cache. For instance, it may contain a
`write buffer or a prefetch unit to access L2. It also has access to the
`memory interface and thus can, for instance, prefetch from memory. The AMRM
`ASIC design is currently in progress. This chip will include a processor core
`with adaptive memory hierarchy. When plugged into the AMRM board, the ASIC
`will use onboard DRAM as main memory by simply reconfiguring the memory
`controller in FPGA1.
`
`4. Summary
`
`Traditional computer system architectures are designed for best machine
`performance averaged across applications. Due to the static nature of these
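
The virtual clock rule stated in Section 3.4 (a new Vclk period is generated only when every Ready input is 1, and the master counter Vtime counts those periods) can be illustrated with a small software sketch. This is not the board's logic and is not taken from the paper: the names `Unit`, `VirtualClock`, `cycles_per_vclk` and `run` are hypothetical, and the per-unit countdown of board-clock cycles is a simplifying assumption that does not attempt to reproduce the AutoReady counters. A unit with a count of zero stands in for an idle unit that signals Ready in every period.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Unit:
    """One memory-hierarchy module driving a Ready input (hypothetical model)."""
    name: str
    cycles_per_vclk: int  # assumed: board-clock cycles needed per virtual cycle
    remaining: int = 0

    def begin_virtual_cycle(self) -> None:
        # Reload the countdown at the start of each virtual clock period.
        self.remaining = self.cycles_per_vclk

    def board_tick(self) -> None:
        # Consume one board-clock cycle of work, if any is left.
        if self.remaining > 0:
            self.remaining -= 1

    @property
    def ready(self) -> bool:
        # Ready is 1 once the unit has finished this virtual cycle's work.
        return self.remaining == 0


class VirtualClock:
    """Generates a Vclk period when all Ready inputs are 1 and counts Vtime."""

    def __init__(self, units: List[Unit]) -> None:
        self.units = units
        self.vtime = 0  # master clock counter
        for unit in self.units:
            unit.begin_virtual_cycle()

    def board_tick(self) -> bool:
        """Advance one board-clock cycle; return True if a Vclk edge occurred."""
        for unit in self.units:
            unit.board_tick()
        if all(unit.ready for unit in self.units):
            self.vtime += 1
            for unit in self.units:
                unit.begin_virtual_cycle()
            return True
        return False


def run(cycles_per_unit: List[int], board_ticks: int) -> int:
    """Simulate the given units for board_ticks cycles and return Vtime."""
    clock = VirtualClock(
        [Unit(f"unit{i}", c) for i, c in enumerate(cycles_per_unit)]
    )
    for _ in range(board_ticks):
        clock.board_tick()
    return clock.vtime
```

In this sketch the slowest unit sets the pace: with units needing 1, 3 and 0 board cycles per virtual cycle, `run([1, 3, 0], 12)` advances Vtime by one every three board ticks.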
