`
SEVENTH ANNUAL IEEE SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES
`
`Intel Exhibit 1011 - 1
`
`
`
`
`Proceedings
`
`
`April 21 - 23, 1999
`Napa Valley, California
`
`Sponsored by
`IEEE Computer Society Technical Committee on
`Computer Architecture
`
`Edited by
`Kenneth L. Pocek and Jeffrey M. Arnold
`
`
`Los Alamitos, California
`
`Washington
`
`Brussels
`
`Tokyo
`
`
`
`
Copyright © 1999 by The Institute of Electrical and Electronics Engineers, Inc.
`All rights reserved
`
`Copyright and Reprint Permissions: Abstracting is permitted with credit to the source. Libraries may
`photocopy beyond the limits of US copyright law, for private use of patrons, those articles in this volume
`that carry a code at the bottom of the first page, provided that the per-copy fee indicated in the code is paid
`through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923.
`
Other copying, reprint, or republication requests should be addressed to: IEEE Copyrights Manager, IEEE
Service Center, 445 Hoes Lane, P.O. Box 1331, Piscataway, NJ 08855-1331.
`
`The papers in this book comprise the proceedings of the meeting mentioned on the cover and title page.
`They reflect the authors’ opinions and, in the interests of timely dissemination, are published as presented
`and without change. Their inclusion in this publication does not necessarily constitute endorsement by the
`editors, the IEEE Computer Society, or the Institute of Electrical and Electronics Engineers, Inc.
`
`IEEE Computer Society Order Number PRO0375
`ISBN 0-7695-0375-6
`ISBN 0-7695-0377-2 (microfiche)
`IEEE Order Plan Catalog Number PRO0375
`ISSN 1082-3409
`
`Additional copies may be ordered from:
`
`IEEE Computer Society
`Customer Service Center
`10662 Los Vaqueros Circle
`P.O. Box 3014
Los Alamitos, CA 90720-1314
Tel: +1-714-821-8380
Fax: +1-714-821-4641
`E-mail: cs.books@computer.org
`
`IEEE Service Center
`445 Hoes Lane
`P.O. Box 1331
Piscataway, NJ 08855-1331
Tel: +1-732-981-1393
Fax: +1-732-981-9667
`mis.custserv@computer.org
`
`IEEE Computer Society
`Ooshima Building
`1-4-2 Minami-Aoyama
`Minato-ku, Tokyo 107-0062
`JAPAN
Tel: +81-3-3408-3118
`Fax: + 81-3-3408-3553
`Tokyo.ofc@computer.org
`
`Editorial production by Thomas Baldwin
Cover art design by Joseph Daigle/Studio Productions
`Printed in the United States of America by The Printing House
`
`
`
`
Table of Contents
SEVENTH ANNUAL IEEE SYMPOSIUM ON FIELD-PROGRAMMABLE
CUSTOM COMPUTING MACHINES (FCCM’99)
Co-chairs & Program Committee ..... x
SESSION 1: TOOLS 1
CHAIR: Stephen Smith
Macro-Based Hardware Compilation of Java™ Bytecodes into a Dynamic
Reconfigurable Computing System ..... 2
J.M.P. Cardoso, H.C. Neto
A CAD Suite for High-Performance FPGA Design ..... 12
B. Hutchings, P. Bellows, J. Hawkins, S. Hemmert, B. Nelson, M. Rytting
Formal Verification of Reconfigurable Cores ..... 25
S. Singh, C.J. Lillieroth
SESSION 2: NETWORK APPLICATIONS
CHAIR: Mark Shand
Transmutable Telecom System and Its Application ..... 34
T. Miyazaki, T. Murooka, M. Katayama, A. Takahara
Implementation and Evaluation of a Prototype Reconfigurable Router ..... 44
J.R. Hess, D.C. Lee, S.J. Harper, M.T. Jones, P.M. Athanas
SESSION 3: COMPILATION
CHAIR: André DeHon
Pipeline Vectorization for Reconfigurable Systems ..... 52
M. Weinhardt, W. Luk
Automatic Allocation of Arrays to Memories in FPGA Processors with
Multiple Memory Banks ..... 63
M.B. Gokhale, J.M. Stone
Parallelizing Applications into Silicon ..... 70
J. Babb, M. Rinard, C.A. Moritz, W. Lee, M. Frank, R. Barua, S. Amarasinghe
SESSION 4: ARCHITECTURES
CHAIR: Scott Hauck
Reconfigurable Elements for a Video Pipeline Processor ..... 82
M.R. Piacentino, G.S. van der Wal, M.W. Hansen
`
`
`
`
ConCISe: A Compiler-Driven CPLD-Based Instruction Set Accelerator ..... 92
B. Kastrup, A. Bink, J. Hoogerbrugge
SESSION 5: TOOLS 2
CHAIR: Roger Woods
CPR: A Configuration Profiling Tool ..... 104
S. Cadambi, S.C. Goldstein
Debugging Techniques for Dynamically Reconfigurable Hardware ..... 114
N. McKay, S. Singh
Improving Simulation Accuracy in Design Methodologies for Dynamically Reconfigurable
Logic Systems ..... 123
M. Vasilko, D. Cabanis
SESSION 6: GRAPHICS APPLICATIONS
CHAIR: Herman Schmit
Reconfigurable Computing for Augmented Reality ..... 136
W. Luk, T.K. Lee, J.R. Rice, N. Shirazi, P.Y.K. Cheung
Sepia: Scalable 3D Compositing using PCI Pamette ..... 146
L. Moll, A. Heirich, M. Shand
SESSION 7: APPLICATIONS
CHAIR: Mike Butts
An Edge-Endpoint-Based Configurable Hardware Architecture for
VLSI CAD Layout Design Rule Checking ..... 158
Z. Luo, M. Martonosi, P. Ashar
FAFNER-Accelerating Nesting Problems with FPGAs ..... 168
J.C. Alves, J.C. Ferreira, C. Albuquerque, J.F. Oliveira, J.S. Ferreira, J. Silva Matos
SESSION 8: DSP APPLICATIONS
CHAIR: Phil Kuekes
Field Programmable Gate Array Based Radar Front-End Digital Signal Processing ..... 178
T.J. Moeller, D.R. Martinez
Optimizing FPGA-Based Vector Product Designs ..... 188
D. Benyamin, W. Luk, J. Villasenor
`
`
`
`
SESSION 9: RUN TIME SYSTEMS
CHAIR: Satnam Singh
PCI-PipeRench and the SWORD API: A System for Stream-based Reconfigurable Computing ..... 200
R. Laufer, R.R. Taylor, H. Schmit
Safe and Protected Execution for the Morph/AMRM Reconfigurable Processor ..... 209
A.A. Chien, J.H. Byun
Implementing an API for Distributed Adaptive Computing Systems ..... 222
M. Jones, L. Scharf, J. Scott, C. Twaddle, M. Yaconis, K. Yao, P. Athanas, B. Schott
SESSION 10: ARITHMETIC
CHAIR: Steve Casselman
A Super-serial Galois Fields Multiplier for FPGAs and its Application to
Public-Key Algorithms ..... 232
G. Orlando, C. Paar
Automatic Floating to Fixed Point Translation and its Application to
Post-Rendering 3D Warping ..... 240
M.P. Leong, M.Y. Yeung, C.K. Yeung, C.W. Fu, P.A. Heng, P.H.W. Leong
Dynamic Precision Management for Loop Computations on Reconfigurable Architectures ..... 249
K. Bondalapati, V.K. Prasanna
POSTER SESSION 1
Accelerating Run-Time Reconfiguration on FCCMs ..... 260
J.-P. Heron, R.F. Woods
A Virtual Hardware Handler for RTR Systems ..... 262
R. Turner, R.F. Woods, S. Sezer, J.-P. Heron
Algorithm Analysis and Mapping Environment for Adaptive
Computing Systems: Further Results ..... 264
E.K. Pauer, P.D. Fiore, J.M. Smith
Development System for FPGA-Based Digital Circuits ..... 266
V. Sklyarov, J. Fonseca, R. Monteiro, A. Oliveira, A. Melo,
N. Lau, I. Skliarova, P. Neves, A. Ferrari
Design of a JTAG Based Run Time Reconfigurable System ..... 268
C. Cousineau, F. Laperle, Y. Savaria
Architectures for System-Level Applications of Adaptive Computing ..... 270
B. Schott, C. Chen, S. Crago, J. Czarnaski, M. French, I. Hom, T. Tho, T. Valenti
Task-level Partitioning and RTL Design Space Exploration for Multi-FPGA Architectures ..... 272
V. Srinivasan, R. Vemuri
Enabling Automatic Module Generation for FCCM Compilers ..... 274
A. Koch
`
`
`
`
POSTER SESSION 2
ICARUS: A Dynamically Reconfigurable Computer Architecture ..... 278
M. Baxter
SONIC-A Plug-In Architecture for Video Processing ..... 280
S.D. Haynes, P.Y.K. Cheung, W. Luk, J. Stone
A Reconfigurable Platform for Academic Purposes ..... 282
C. Teuscher, J.-O. Haenni, F.J. Gómez, H.F. Restrepo, E. Sanchez
VHDL Placement Directives for Parametric IP Blocks ..... 284
J. Hwang, C. Patterson, S. Mitra
Runlength Compression Techniques for FPGA Configurations ..... 286
S. Hauck, W.D. Wilson
POSTER SESSION 3
Accelerating An IR Automatic Target Recognition Application with FPGAs ..... 290
J. Jean, X. Liang, B. Drozd, K. Tomko
Mapping of an Automated Target Recognition Application from a Graphical Software
Environment to FPGA-based Reconfigurable Hardware ..... 292
B. Levine, S. Natarajan, C. Tan, D. Newport, D. Bouldin
Hybrid Data/Configuration Caching for Striped FPGAs ..... 294
D. Deshpande, A.K. Somani, A. Tyagi
On Reconfiguring Cache for Computing ..... 296
H.-S. Kim, A.K. Somani, A. Tyagi
Reconfigurable Pipelines in VLIW Execution Units ..... 298
R.D. Williams, B.D. Kuebert
Fast Online Placement for Reconfigurable Computing Systems ..... 300
K. Bazargan, M. Sarrafzadeh
POSTER SESSION 4
A Compact Fast Variable Key Size Elliptic Curve Cryptosystem Coprocessor ..... 304
L. Gao, S. Shrivastava, H. Lee, G.E. Sobelman
A Virtual Logic Algorithm for Solving Satisfiability Problems Using
Reconfigurable Hardware ..... 306
M. Abramovici, J.T. de Sousa
Reducing Compilation Time of Zhong’s FPGA-based SAT solver ..... 308
P.K. Chan, M.J. Boyd, S. Goren, K. Klenk, V. Kodavati, R. Kundu, M. Margolese,
J. Sun, K. Suzuki, E. Thorne, X. Wang, J. Xu, M. Zhu
FPGA-based Structures for On-line FFT and DCT ..... 310
D. Lau, A. Schneider, M.D. Ercegovac, J. Villasenor
`
`
`
`
An FPGA-based Fan Beam Image Reconstruction Module ..... 312
L. Maltar, F.M.G. França, V.C. Alves, C.L. Amorim
Bezier Curve Rendering on Virtex™ ..... 314
D. MacVicar, S. Singh, R. Slous

Author Index ..... 318
`
`
`
`
Safe and Protected Execution for the Morph/AMRM Reconfigurable Processor
`
Andrew A. Chien
Department of Computer Science and Engineering
University of California, San Diego
achien@cs.ucsd.edu

Jay H. Byun
Department of Computer Science
University of Illinois at Urbana-Champaign
jaybyun@cs.uiuc.edu
`
`April 1, 1999
`
`Abstract
`
Technology scaling of CMOS processes brings relatively faster transistors (gates) and slower interconnects (wires), making it viable to add reconfigurability to increase performance. In the Morph/AMRM system, we are exploring the addition of reconfigurable logic, deeply integrated with the processor core, employing the reconfigurability to manage the cache, datapath, and pipeline resources more effectively. However, integrating reconfigurable logic introduces significant protection and safety challenges for multiprocess execution. We analyze the protection structures in a state-of-the-art microprocessor core (R10000), identifying the few critical logic blocks and demonstrating that the majority of the logic in the processor core can be safely reconfigured. Subsequently, we propose a protection architecture for the Morph/AMRM reconfigurable processor which enables nearly the full power of reconfigurability in the processor core while requiring only a small number of fixed logic features to ensure safe, protected multiprocess execution.
`
`1. Introduction
`
Trends in semiconductor technology suggest that the use of reconfigurable logic blocks within the processor will be desirable in the future. Projections from the Semiconductor Industry Association (SIA) for the year 2007 indicate advanced semiconductor processes using 0.1 micron feature sizes [1]. However, this feature size, as measured by transistor channel length, is of decreasing importance to logic, circuit, and processor speed. In systems of that era, logic density, logic speed, and processor speed will be dominated by interconnect performance and wiring density. For 2007, the SIA projects a pitch of 0.4-0.6 microns for the finest interconnect. Between logic blocks, average interconnect lengths typically range from 1,000x to 10,000x the pitch, that is, up to 6 mm of intra-chip interconnect. For such an interconnect, the achievable global clock period would be limited to no less than approximately 1 nanosecond (roughly 1 GHz). Within a few technology generations a crossover will occur, and the average interconnect delay will surpass logic block delays: projections indicate that by the year 2007, average interconnect delay can be equivalent to five gate delays.
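As a quick check of the wire-length figures above, the range follows from multiplying the projected pitch by the typical length multiples. The sketch only restates numbers quoted in the text; the constant and function names are illustrative.

```python
# Wire-length range implied by the SIA 2007 figures quoted above.
# Lengths are kept in integer nanometres to avoid floating-point rounding.
PITCH_NM = (400, 600)        # finest interconnect pitch: 0.4-0.6 microns
MULTIPLES = (1_000, 10_000)  # typical inter-block wire length, in pitches


def length_mm(pitch_nm: int, multiple: int) -> float:
    """Convert (pitch in nm) x (number of pitches) to millimetres."""
    return pitch_nm * multiple / 1_000_000


shortest = length_mm(PITCH_NM[0], MULTIPLES[0])  # 400 nm x 1,000
longest = length_mm(PITCH_NM[1], MULTIPLES[1])   # 600 nm x 10,000
print(f"inter-block wires: {shortest} mm to {longest} mm")
```

This prints a range of 0.4 mm to 6.0 mm, matching the "up to 6 mm" figure in the text.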
`
`
`
`
Once past the crossover point, dynamic interconnect (reconfigurable interconnect or logic) can be introduced with modest impact, even on critical timing paths [2]. In such systems, dynamic configurability in the processor can be used to significant advantage [4, 5], improving performance by factors of 10 to 100 for computational kernels while avoiding the traditional disadvantages of custom computing approaches, such as I/O-based coprocessor coupling and slower logic [6]. In these systems, reprogrammable logic blocks will replace static interconnects in the processor core, paving the way for a new class of architectures that are customized to the application and deliver more robust, higher performance.
`
Figure 1. Reconfigurability in the processor core and the extended application-to-fixed-hardware interface
`
Reconfigurable, or application-adaptive, processors allow customization of mechanisms, bindings, and policies on a per-application basis. While current microprocessors implement a number of aggressive architectural techniques, such as speculative execution, branch prediction, block prefetching, and multi-level caching, to achieve higher execution speeds, these mechanisms and policies are tuned
`
`Authorized licensed use limited to: University of Texas at Austin. Downloaded on July 24,2020 at 16:51:20 UTC from IEEE Xplore. Restrictions apply.
`
`
`
`
for a broad suite of applications (e.g., SPEC) and thus cannot be tightly matched to the needs of a particular application, procedure, or even loop in an application. For example, the cache block size and organization are chosen to maximize performance over a suite of applications, but may not give the best performance on any particular application. Similar constraints apply to other performance-critical aspects such as value prediction, branch prediction, and data movement. In contrast, a processor incorporating reconfigurability can adopt optimal policies (and in some cases better mechanisms) for the application, enabling increased execution efficiency. Thus, the reconfigurable logic can be used to tune the processor to better match the application, rather than being viewed in the more traditional way as an add-on coprocessor. One example of such per-application tuning would be to adapt the cache line size to maximize performance for that application [3]. This approach is embodied in the Morph/AMRM (Adaptive Memory Reconfiguration Management) architecture [4, 5], and the basic change in perspective is that the reconfigurable hardware is an extension of the application program, extending the application/fixed-hardware interface to enable more efficient execution. The fixed hardware then has a somewhat richer (and in parts lower-level) interface, as shown in Figure 1. Studies of Morph/AMRM have demonstrated that performance increases of 10 to 100 times are possible [5]. In essence, this is an extension of the application binary interface (ABI), but it need not be a non-portable extension of the application programming interface (API) if appropriate CAD support is available. This approach is similar to one that has recently gained popularity in the software design community as "open implementations" [7], in which software architects recognize the need to open the implementation for customization by particular applications in order to achieve adequate performance.
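To make the per-application tuning argument concrete, the sketch below counts misses in a toy direct-mapped cache for several line sizes and picks the best line size for a given address trace. It is a hypothetical illustration, not the Morph/AMRM mechanism: the function names, the 1 KB cache size, and the two synthetic traces are assumptions chosen so that the two traces favour different line sizes.

```python
from typing import Iterable, List, Optional, Sequence


def count_misses(trace: Iterable[int], cache_bytes: int, line_bytes: int) -> int:
    """Count misses of a direct-mapped cache for a trace of byte addresses."""
    num_lines = cache_bytes // line_bytes
    tags: List[Optional[int]] = [None] * num_lines  # one tag per cache line
    misses = 0
    for addr in trace:
        block = addr // line_bytes   # memory block containing the address
        index = block % num_lines    # cache line the block maps to
        tag = block // num_lines     # high-order bits identifying the block
        if tags[index] != tag:
            misses += 1
            tags[index] = tag        # fill (or replace) the line on a miss
    return misses


def best_line_size(trace: Iterable[int], cache_bytes: int,
                   candidates: Sequence[int]) -> int:
    """Pick the candidate line size with the fewest misses (ties -> smaller)."""
    addrs = list(trace)  # the trace is replayed once per candidate
    return min(candidates,
               key=lambda size: (count_misses(addrs, cache_bytes, size), size))


CACHE_BYTES = 1024
SIZES = [16, 32, 64, 128]

streaming = range(0, 4096, 4)                 # one sequential pass, 4-byte words
scattered = [k * 144 for k in range(16)] * 4  # 16 scattered words, 4 passes

print(best_line_size(streaming, CACHE_BYTES, SIZES))  # 128
print(best_line_size(scattered, CACHE_BYTES, SIZES))  # 16
```

Under these assumptions the streaming trace favours 128-byte lines (fewer compulsory misses), while the scattered trace favours 16-byte lines: with large lines the 1 KB cache has only 8 lines, so the scattered words conflict on every pass. A single fixed line size has to compromise between such cases.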
Introducing application-controlled reconfigurability in the processor raises significant challenges for ensuring process isolation and protection (multiprocess isolation), a critical element of robust desktop and, to an increasing degree, embedded computing systems. Multiprocess isolation is an essential modularity element in software systems: without the guarantee of safely isolated and protected processes, the system can never be robust, since software faults cannot be contained and the system cannot be safely extended. For robust reconfigurable computing, it is essential that an application's customization affect only its own computation, not that of other applications. For example, if application-defined hardware were allowed to control hardware addressing, it could enable unauthorized corruption of operating system data or even the data of other application processes. If application-defined hardware were allowed to control data prefetching, it could swamp the memory system with spurious requests. If application-defined hardware were allowed to control privilege mode changes, it could compromise all traditional protection structures.
Our study examines the protection structures of traditional processors and operating systems and, based on these lessons, proposes a safe multiprocess execution architecture for reconfigurable systems. We analyze in detail the software and hardware mechanisms central to process protection in conventional processors and operating systems, specifically studying the MIPS R10000 [8] microprocessor, an exemplar of a system employing the Unix/RISC protection architecture. This study elucidates the key mechanisms and architectural features for Unix-style two-mode protection and address-based isolation. The key feature of this protection architecture is process isolation via address isolation and mediation. Specifically,
1. All access to hardware devices is mediated by the operating system,
2. The operating system manages address translation to isolate processes,
3. Application processes cannot change the address translation information,
4. Application processes cannot substitute other translation information,
5. All application accesses are subject to this translation, and
6. The hardware ensures these guarantees.
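The properties listed above can be illustrated with a toy model of a two-mode processor. The sketch is hypothetical (the class and method names do not come from the R10000 or any real OS): privileged operations, such as installing a translation or entering user mode, raise a fault unless the model is in kernel mode, and every memory reference, in either mode, is resolved through the OS-managed mapping.

```python
from typing import Dict


class ProtectionFault(Exception):
    """Raised when an operation violates the two-mode protection rules."""


class TwoModeCPU:
    """Toy model of Unix-style user/kernel protection (hypothetical names)."""

    def __init__(self) -> None:
        self.kernel_mode = True                  # the machine boots into the OS
        self.translations: Dict[int, int] = {}   # virtual page -> physical frame

    def _require_kernel(self, op: str) -> None:
        if not self.kernel_mode:
            raise ProtectionFault(f"{op} is privileged")

    def set_translation(self, vpage: int, frame: int) -> None:
        # Only the OS may create or change mappings (properties 2-4).
        self._require_kernel("set_translation")
        self.translations[vpage] = frame

    def enter_user_mode(self) -> None:
        self._require_kernel("enter_user_mode")
        self.kernel_mode = False

    def trap(self) -> None:
        # A system call or interrupt is the model's only route back to kernel
        # mode; a real CPU would also jump to an OS-chosen handler (not modeled).
        self.kernel_mode = True

    def translate(self, vpage: int) -> int:
        # Every access is translated, whatever the mode (property 5).
        if vpage not in self.translations:
            raise ProtectionFault(f"no mapping for virtual page {vpage}")
        return self.translations[vpage]


cpu = TwoModeCPU()
cpu.set_translation(0, 7)       # OS maps the process's page 0 to frame 7
cpu.enter_user_mode()
print(cpu.translate(0))         # 7
try:
    cpu.set_translation(1, 3)   # user code tries to map another frame
except ProtectionFault as err:
    print("blocked:", err)
```

In this model nothing prevents Python code from assigning `kernel_mode` directly; property 6 is precisely the requirement that real hardware offers no such path.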
`
We subsequently describe the Morph/AMRM architecture, outlining the dimensions of configurability and the hazards they pose for multiprocess protection. For the Morph/AMRM system, we then describe the protection architecture, detailing how each of the key properties of the operating system/processor protection architecture is provided. The key elements of this protection architecture are:
1. A hardwired control processor that controls instruction sequencing and privilege mode transitions
2. Hardwired control from the control processor to the TLB for address translation and TLB entry management
3. A requirement that all other configurable elements (system chip sets, input/output devices, memory controllers) deal in virtual addresses, with their accesses checked by local TLBs
4. Access to key shared interconnects, such as the system bus, controlled by fixed hardwired arbiters, with the system reserving the highest priority so that it can preempt these resources
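Element 4 can be sketched as a fixed-priority arbiter: the system port always wins when it requests the bus, so application-defined hardware cannot lock the system out. This is a hypothetical illustration; in particular, the round-robin sharing among the application-side requesters is an assumption made for the example and is not specified in the list above.

```python
from typing import Optional, Sequence

SYSTEM = 0  # hypothetical: requester 0 is the hardwired system (OS) port


class HardwiredArbiter:
    """Fixed bus arbiter: the system has top priority; the remaining
    (application-side) requesters share the bus round-robin."""

    def __init__(self, num_requesters: int) -> None:
        if num_requesters < 2:
            raise ValueError("need the system port plus at least one other")
        self.n = num_requesters
        self.last = num_requesters - 1  # so the first search starts at 1

    def grant(self, requests: Sequence[bool]) -> Optional[int]:
        """Return the index granted the bus this cycle, or None if idle."""
        if len(requests) != self.n:
            raise ValueError("one request line per requester")
        if requests[SYSTEM]:
            return SYSTEM  # the system wins every cycle in which it asks
        apps = self.n - 1
        for offset in range(1, apps + 1):
            cand = (self.last - 1 + offset) % apps + 1  # cycles over 1..n-1
            if requests[cand]:
                self.last = cand
                return cand
        return None


arb = HardwiredArbiter(4)
busy = [False, True, True, False]           # two application masters want the bus
print([arb.grant(busy) for _ in range(3)])  # [1, 2, 1]
print(arb.grant([True, True, True, True]))  # 0
```

Because the arbiter itself is not reconfigurable, no application request pattern can change the priority of the system port.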
`
This architecture enables configurability in the processor complex because it ensures multiprocess protection (safe configuration). We also believe it enables much of the useful configurability in the processor complex, notably policies for more efficient resource management, and even the addition of instructions, special functional units, or even processor state. The model provided to application programs is a private, configurable virtual machine that enables rich application customization, while these applications (and their customizations) remain cleanly isolated.
The remainder of the paper is organized as follows. Section 2 describes the basic problem of protected execution and process isolation in computer systems. Section 3 describes our analysis of the software and hardware mechanisms central to process protection in conventional processors and operating systems. Section 4 discusses the implications of reconfigurability for process protection and identifies the key requirements for safe process isolation in reconfigurable processors. In Section 5, we describe the Morph/AMRM system and a proposed protection architecture that meets the requirements set forth in Section 3. Section 6 discusses alternate approaches and the limitations on configurability imposed by the Morph/AMRM protection architecture. Section 7 summarizes the material covered in this paper and outlines future work.
`
`2. Process Isolation: the Problem
`
Figure 2: Multiprocess Protection based on Address Space Isolation
`
To understand the challenges of multiprocess isolation, it is instructive to first consider the ways in which multiprocess isolation can be compromised. In the simplest mode, an application corrupts the data of another, causing it to fail or compute incorrectly. In a more complex mode, the application somehow locks up the machine, so that no other application state is damaged, but neither can the machine make progress. One example would be jamming the memory bus or defeating the timer interrupt that ensures preemption. A more serious failure mode is corrupting the operating system's data, which can lead to a machine crash in which all applications suffer data corruption. Finally, an application could also corrupt input/output device state, confounding the operating system, the device (leading to data loss or misdirection), or the application data itself. In all of these cases, the failure results from allowing an application action that can affect the machine hardware state, another application's memory state, or operating system state.
The key issue in safe multiprocess execution is controlling access to hardware resources so that these accesses are non-interfering. In general, access to main memory, as well as to other architecturally visible state (processor data registers, control registers), system chip registers, and input/output device state, must be controlled. Traditional approaches partition memory access, virtualize processor resources such as the data registers through multitasking, and use operating system calls to mediate operations that require access to control registers, system chip sets, input/output device state, and so on. Note that isolation and virtualization must apply to any resource, at any level, whose ownership a process can claim. The final piece of the puzzle is that, to support virtualization and multitasking, transitions between the different entities must be carefully controlled to prevent compromise.
`
`3. Process Isolation in the MIPS R10000
`
The key issue in maintaining a safe multiprocessing environment is ensuring process isolation: the processor and the OS must prevent independent processes from interfering with each other's data and memory and with those of the operating system kernel. They must also prevent a malicious process from taking over the processor and locking up the system.
Through a detailed analysis of the R10000 architecture and operating system, we identify the hardware mechanisms and OS software structures that are central to process isolation. We chose the MIPS R10000 processor as an exemplar of a modern RISC processor that supports a relatively simple UNIX-style protection structure [9]. We first examine how a UNIX-style operating system ensures process isolation and thereby derive the hardware requirements it imposes. We then identify the corresponding support in the R10000 processor. In the following discussion, we assume that address translation is based on a simple paging system. Most of today's systems actually employ multi-level paging or segmented paging, but the address translation mechanism is fundamentally the same as in a simple paging system.
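Under the simple-paging assumption just stated, the translation path of a UNIX-style system can be sketched in a few lines. This is a hypothetical model, not R10000 or UNIX code: the 4 KB page size, the protection-bit encoding, and all names are assumptions. It models a page-table base (a reference to the current process's table) and a page-table length register, both reloaded from the process control block on a context switch, together with a bounds check, a valid-bit check, and a protection-bit check made alongside the address computation.

```python
from dataclasses import dataclass, field
from typing import List

PAGE_SIZE = 4096             # assumed page size
READ, WRITE, EXEC = 1, 2, 4  # hypothetical protection-bit encoding


class PageFault(Exception):
    """Reference outside the process's valid translations."""


class AccessViolation(Exception):
    """Reference of a type the PTE's protection bits do not grant."""


@dataclass
class PTE:
    frame: int   # physical frame number
    valid: bool  # valid/invalid bit
    prot: int    # OR of READ/WRITE/EXEC


@dataclass
class PCB:
    page_table: List[PTE]
    ptlr: int    # number of entries the process may use


@dataclass
class MMU:
    page_table: List[PTE] = field(default_factory=list)  # stands in for the PTBR
    ptlr: int = 0

    def context_switch(self, pcb: PCB) -> None:
        # Done by the OS in kernel mode: reload PTBR and PTLR from the PCB.
        self.page_table, self.ptlr = pcb.page_table, pcb.ptlr

    def translate(self, vaddr: int, access: int) -> int:
        page, offset = divmod(vaddr, PAGE_SIZE)
        if page >= self.ptlr:            # PTLR bounds check
            raise PageFault(f"page {page} is beyond PTLR={self.ptlr}")
        pte = self.page_table[page]      # PTBR + page index locates the PTE
        if not pte.valid:
            raise PageFault(f"page {page} is not mapped")
        if access & ~pte.prot:           # any requested right not granted?
            raise AccessViolation(f"access {access} denied on page {page}")
        return pte.frame * PAGE_SIZE + offset


a = PCB([PTE(5, True, READ | WRITE), PTE(0, False, 0)], ptlr=2)
b = PCB([PTE(9, True, READ | EXEC)], ptlr=1)
mmu = MMU()
mmu.context_switch(a)
print(mmu.translate(0x10, WRITE))   # 20496 (frame 5, offset 16)
mmu.context_switch(b)
print(mmu.translate(0x10, EXEC))    # 36880 (frame 9, offset 16)
```

Switching from `a` to `b` changes which frames the same virtual address reaches; because, by assumption, only the OS calls `context_switch`, a process cannot point the model at another process's table.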
`
3.1 Operating System-based Process Protection
`
3.1.1 Application and Operating System Memory Isolation
`
`
`
`
Application and operating system memory isolation is achieved through controlled address translation. The physical memory of each process is isolated by having the pages of the process's virtual address space map only to its own physical memory frames. To protect processes from modification by other processes, the memory-management hardware and the OS must prevent programs from changing their own address mappings. The UNIX kernel, for example, runs in a privileged mode (kernel mode or system mode) in which memory mapping may be controlled, whereas application processes run in an unprivileged mode (user mode). The page tables, which hold the mapping information for each process, reside in the memory space of the kernel so that they can be modified only by the OS running in kernel mode. In UNIX, this control of address translation to ensure isolation is achieved through the following mechanisms [9, 10].
`
1. Locating correct translation information for each process.
By using a special page table base register (PTBR), which is set from the process control block (PCB) on each process switch, the OS can correctly locate the page table for the executing process. The index portion of the virtual address, scaled by the size of a page table entry, is then added to the address held in the PTBR to locate the appropriate page table entry (PTE).
2. Distinguishing valid and invalid entries in page tables.
The page table can contain entries that are not used by the process. These unused entries correspond to pages that are not in the process's logical address space; using them would compromise process isolation. The OS uses valid/invalid bits to distinguish these entries. Alternatively, the page table can be implemented to contain only the entries actually used by the process. This implementation requires a special register containing the length of the process's page table, usually called the page-table length register (PTLR). The PTLR can be used to check that the page index portion of the virtual address is in range, and therefore that the access does not reach illegal translation information.
3. Controlling access types.
Even when the translation to a physical memory frame is valid, access to that frame would otherwise be unrestricted: the process could read, write, and execute it. It is safer, and more efficient, to control the type of access allowed. The protection bit field in the PTE provides this access-control information. At the same time that the physical address is being computed, the protection bits can be checked to verify that no access is being made that has not been granted. These bits usually indicate whether the process has read/write, read-only, or execute-only access. The type and number of protection bits provided depend on the underlying processor.
`
`
`
4. Managing TLB consistency.
The translation information, namely the PTE, is cached in the processor's TLB to avoid extra memory accesses to the page table. Using special privileged instructions, the OS updates the TLB with consistent mapping information when a miss occurs. But notice
`that after a context switch, although the new page table
`is pointed to by the n