`
`
Proceedings

SEVENTH ANNUAL

IEEE SYMPOSIUM ON

FIELD-PROGRAMMABLE
`
`CUSTOM COMPUTING MACHINES
`
`FCCM ’99
`
`
`
`
`
`
`Proceedings
`
`SEVENTH ANNUAL
`
`IEEE SYMPOSIUM ON
`
FIELD-PROGRAMMABLE

CUSTOM COMPUTING MACHINES

FCCM ’99

April 21-23, 1999
`
`Napa Valley, California
`
`Sponsored by
`
`IEEE Computer Society Technical Committee on
`Computer Architecture
`
`Edited by
`
`Kenneth L. Pocek and Jeffrey M. Arnold
`
IEEE®
COMPUTER
SOCIETY

Los Alamitos, California

Tokyo • Brussels • Washington
`
`
`
`
`
`Copyright © 1999 by The Institute of Electrical and Electronics Engineers, Inc.
`All rights reserved
`
Copyright and Reprint Permissions: Abstracting is permitted with credit to the source. Libraries may
photocopy beyond the limits of US copyright law, for private use of patrons, those articles in this volume
that carry a code at the bottom of the first page, provided that the per-copy fee indicated in the code is
paid through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923.

Other copying, reprint, or republication requests should be addressed to: IEEE Copyrights Manager, IEEE
Service Center, 445 Hoes Lane, P.O. Box 1331, Piscataway, NJ 08855-1331.

The papers in this book comprise the proceedings of the meeting mentioned on the cover and title page.
They reflect the authors' opinions and, in the interests of timely dissemination, are published as presented
and without change. Their inclusion in this publication does not necessarily constitute endorsement by the
editors, the IEEE Computer Society, or the Institute of Electrical and Electronics Engineers, Inc.
`
`IEEE Computer Society Order Number PR00375
ISBN 0-7695-0375-6
ISBN 0-7695-0377-2 (microfiche)
IEEE Order Plan Catalog Number PR00375
`ISSN 1082-3409
`
Additional copies may be ordered from:

IEEE Computer Society
Customer Service Center
10662 Los Vaqueros Circle
P.O. Box 3014
Los Alamitos, CA 90720-1314
Tel: +1-714-821-8380
Fax: +1-714-821-4641
Email: cs.books@computer.org

IEEE Service Center
445 Hoes Lane
P.O. Box 1331
Piscataway, NJ 08855-1331
Tel: +1-732-981-1393
Fax: +1-732-981-9667
mis.custserv@computer.org

IEEE Computer Society
Ooshima Building
1-4-2 Minami-Aoyama
Minato-ku, Tokyo 107-0062
JAPAN
Tel: +81-3-3408-3118
Fax: +81-3-3408-3553
tokyo.ofc@computer.org
`
`Editorial production by Thomas Baldwin
`
`Cover art design by Joseph Daigle/Studio Productions
`Printed in the United States of America by The Printing House
`
`
`
`
`
`
Table of Contents

SEVENTH ANNUAL IEEE SYMPOSIUM ON FIELD-PROGRAMMABLE
CUSTOM COMPUTING MACHINES (FCCM ’99)

Co-Chairs & Program Committee

SESSION 1: TOOLS 1
CHAIR: Stephen Smith

Macro-Based Hardware Compilation of Java™ Bytecodes into a Dynamic
Reconfigurable Computing System
J.M.P. Cardoso, H.C. Neto

A CAD Suite for High-Performance FPGA Design
B. Hutchings, P. Bellows, J. Hawkins, S. Hemmert, B. Nelson, M. Rytting

Formal Verification of Reconfigurable Cores
S. Singh, C.J. Lillieroth

SESSION 2: NETWORK APPLICATIONS
CHAIR: Mark Shand

Transmutable Telecom System and Its Application
T. Miyazaki, T. Murooka, M. Katayama, A. Takahara

Implementation and Evaluation of a Prototype Reconfigurable Router
J.R. Hess, D.C. Lee, S.J. Harper, M.T. Jones, P.M. Athanas

SESSION 3: COMPILATION
CHAIR: André DeHon

Pipeline Vectorization for Reconfigurable Systems
M. Weinhardt, W. Luk

Automatic Allocation of Arrays to Memories in FPGA Processors with
Multiple Memory Banks
M.B. Gokhale, J.M. Stone

Parallelizing Applications into Silicon
J. Babb, M. Rinard, C.A. Moritz, W. Lee, M. Frank, R. Barua, S. Amarasinghe

SESSION 4: ARCHITECTURES
CHAIR: Scott Hauck

Reconfigurable Elements for a Video Pipeline Processor
M.R. Piacentino, G.S. van der Wal, M.W. Hansen
`
`
`
`
`
`
`
`
ConCISe: A Compiler-Driven CPLD-Based Instruction Set Accelerator
B. Kastrup, A. Bink, J. Hoogerbrugge

SESSION 5: TOOLS 2
CHAIR: Roger Woods

CPR: A Configuration Profiling Tool
S. Cadambi, S.C. Goldstein

Debugging Techniques for Dynamically Reconfigurable Hardware
N. McKay, S. Singh

Improving Simulation Accuracy in Design Methodologies for Dynamically Reconfigurable
Logic Systems
M. Vasilko, D. Cabanis

SESSION 6: GRAPHICS APPLICATIONS
CHAIR: Herman Schmit

Reconfigurable Computing for Augmented Reality
W. Luk, T.K. Lee, J.R. Rice, N. Shirazi, P.Y.K. Cheung

Sepia: Scalable 3D Compositing using PCI Pamette
L. Moll, A. Heirich, M. Shand

SESSION 7: APPLICATIONS
CHAIR: Mike Butts

An Edge-Endpoint-Based Configurable Hardware Architecture for
VLSI CAD Layout Design Rule Checking
Z. Luo, M. Martonosi, P. Ashar

FAFNER - Accelerating Nesting Problems with FPGAs
J.C. Alves, J.C. Ferreira, C. Albuquerque, J.F. Oliveira, J.S. Ferreira, J. Silva Matos

SESSION 8: DSP APPLICATIONS
CHAIR: Phil Kuekes

Field Programmable Gate Array Based Radar Front-End Digital Signal Processing
T.J. Moeller, D.R. Martinez

Optimizing FPGA-Based Vector Product Designs
D. Benyamin, W. Luk, J. Villasenor
`
`
`
`
`
`
`
`
SESSION 9: RUN TIME SYSTEMS
CHAIR: Satnam Singh

PCI-PipeRench and the SWORDAPI: A System for Stream-based Reconfigurable Computing
R. Laufer, R.R. Taylor, H. Schmit

Safe and Protected Execution for the Morph/AMRM Reconfigurable Processor
A.A. Chien, J.H. Byun

Implementing an API for Distributed Adaptive Computing Systems
M. Jones, L. Scharf, J. Scott, C. Twaddle, M. Yaconis, K. Yao, P. Athanas, B. Schott

SESSION 10: ARITHMETIC
CHAIR: Steve Casselman

A Super-Serial Galois Fields Multiplier for FPGAs and its Application to
Public-Key Algorithms
G. Orlando, C. Paar

Automatic Floating to Fixed Point Translation and its Application to
Post-Rendering 3D Warping
M.P. Leong, M.Y. Yeung, C.K. Yeung, C.W. Fu, P.A. Heng, P.H.W. Leong

Dynamic Precision Management for Loop Computations on Reconfigurable Architectures
K. Bondalapati, V.K. Prasanna

POSTER SESSION 1

Accelerating Run-Time Reconfiguration on FCCMs
J.-P. Heron, R.F. Woods

A Virtual Hardware Handler for RTR Systems
R. Turner, R.F. Woods, S. Sezer, J.-P. Heron

Algorithm Analysis and Mapping Environment for Adaptive
Computing Systems: Further Results
E.K. Pauer, P.D. Fiore, J.M. Smith

Development System for FPGA-Based Digital Circuits
V. Sklyarov, J. Fonseca, R. Monteiro, A. Oliveira, A. Mela,
N. Lau, I. Skliarova, P. Neves, A. Ferrari

Design of a JTAG Based Run Time Reconfigurable System
C. Cousineau, F. Laperle, Y. Savaria

Architectures for System-Level Applications of Adaptive Computing
B. Schott, C. Chen, S. Crago, J. Czarnaski, M. French, I. Horn, T. Tho, T. Valenti

Task-level Partitioning and RTL Design Space Exploration for Multi-FPGA Architectures
V. Srinivasan, R. Vemuri

Enabling Automatic Module Generation for FCCM Compilers
A. Koch
`
`
`
`
`
`
POSTER SESSION 2

ICARUS: A Dynamically Reconfigurable Computer Architecture
M. Baxter

SONIC - A Plug-In Architecture for Video Processing
S.D. Haynes, P.Y.K. Cheung, W. Luk, J. Stone

A Reconfigurable Platform for Academic Purposes
C. Teuscher, J.-O. Haenni, F.J. Gomez, H.F. Restrepo, E. Sanchez

VHDL Placement Directives for Parametric IP Blocks
J. Hwang, C. Patterson, S. Mitra

Runlength Compression Techniques for FPGA Configurations
S. Hauck, W.D. Wilson

POSTER SESSION 3

Accelerating An IR Automatic Target Recognition Application with FPGAs
J. Jean, X. Liang, B. Drozd, K. Tomko

Mapping of an Automated Target Recognition Application from a Graphical Software
Environment to FPGA-based Reconfigurable Hardware
B. Levine, S. Natarajan, C. Tan, D. Newport, D. Bouldin

Hybrid Data/Configuration Caching for Striped FPGAs
D. Deshpande, A.K. Somani, A. Tyagi

On Reconfiguring Cache for Computing
H.-S. Kim, A.K. Somani, A. Tyagi

Reconfigurable Pipelines in VLIW Execution Units
R.D. Williams, B.D. Kuebert

Fast Online Placement for Reconfigurable Computing Systems
K. Bazargan, M. Sarrafzadeh

POSTER SESSION 4

A Compact Fast Variable Key Size Elliptic Curve Cryptosystem Coprocessor
L. Gao, S. Shrivastava, H. Lee, G.E. Sobelman

A Virtual Logic Algorithm for Solving Satisfiability Problems Using
Reconfigurable Hardware
M. Abramovici, J.T. de Sousa

Reducing Compilation Time of Zhong’s FPGA-based SAT solver
P.K. Chan, M.J. Boyd, S. Goren, K. Klenk, V. Kodavati, R. Kundu, M. Margolese,
J. Sun, K. Suzuki, E. Thome, X. Wang, J. Xu, M. Zhu

FPGA-based Structures for On-line FFT and DCT
D. Lau, A. Schneider, M.D. Ercegovac, J. Villasenor
`
`
An FPGA-based Fan Beam Image Reconstruction Module
L. Maltar, F.M.G. Franca, V.C. Alves, C.L. Amorim

Bézier Curve Rendering on Virtex™
D. MacVicar, S. Singh, R. Slous

Author Index
`
`1
`2
`3
`4
`5
`6
`7
`8
`9
`10
`11
`12
`13
`14
`15
`16
`17
`18
`19
`20
`21
`22
`23
`24
`25
`26
`27
`28
`29
`30
`31
`32
`33
`34
`35
`36
`37
`38
`39
`40
`41
`42
`43
`44
`45
`46
`47
`48
`49
`50
`51
`52
`53
`54
`55
`56
`
`01010101010101-b#-b-b-b##-b-b-bOJOJOJOJOJOJOJOJOJOJNNNNNNNNNN—‘t—‘t—‘t—‘t—‘t—‘t—‘t—‘t—‘t—‘t
`
`0301-503N—\O©OO\IO30‘I#OJN—‘tOGDOONam-ho.)NAOQDOONmm-bOJNAOQDOONmm-bOJN—‘to
`
`8
`
`
`
`1
`2
`3
`4
`5
`6
`7
`8
`9
`10
`11
`12
`13
`14
`15
`16
`17
`18
`19
`20
`21
`22
`23
`24
`25
`26
`27
`28
`29
`30
`31
`32
`33
`34
`35
`36
`37
`38
`39
`40
`41
`42
`43
`44
`45
`46
`47
`48
`49
`50
`51
`52
`53
`54
`55
`56
`
Safe and Protected Execution for the Morph/AMRM Reconfigurable
Processor

Andrew A. Chien
Department of Computer Science and Engineering
University of California, San Diego
achien@cs.ucsd.edu

Jay H. Byun
Department of Computer Science
University of Illinois at Urbana-Champaign
jaybyun@cs.uiuc.edu
`
`April 1, 1999
`
`Abstract
`
Technology scaling of CMOS processes brings relatively faster
transistors (gates) and slower interconnects (wires), making viable
the addition of reconfigurability to increase performance. In the
Morph/AMRM system, we are exploring the addition of
reconfigurable logic, deeply integrated with the processor core,
employing the reconfigurability to manage the cache, datapath,
and pipeline resources more effectively. However, integration of
reconfigurable logic introduces significant protection and safety
challenges for multiprocess execution. We analyze the protection
structures in a state-of-the-art microprocessor core (R10000),
identifying the few critical logic blocks and demonstrating that the
majority of the logic in the processor core can be safely
reconfigured. Subsequently, we propose a protection architecture
for the Morph/AMRM reconfigurable processor which enables
nearly the full range of power of reconfigurability in the processor
core while requiring only a small number of fixed logic features
to ensure safe, protected multiprocess execution.

1. Introduction

Trends in semiconductor technology suggest that the
use of reconfigurable logic blocks within the processor will
be desirable in the future. Projections from the Semiconductor
Industry Association (SIA) for the year 2007 indicate
advanced semiconductor processes using 0.1 micron feature
sizes [1]. However, this feature size, as measured by
transistor channel length, is of decreasing importance to
logic and circuit speed as well as processor speed. In systems
of that era, logic density, logic speed, and processor speed will
be dominated by interconnect performance and wiring
density. For 2007, the SIA projects a pitch for the finest
interconnect of 0.4-0.6 microns. Between logic blocks,
average interconnect lengths typically range from
1,000x to 10,000x pitch, that is, up to 6 mm of intra-chip
interconnect length. For such an interconnect, the
achievable global clock period would be limited to
approximately 1 nanosecond. Within a few technology
generations, a crossover will occur, and the average
interconnect delay will surpass logic block delays:
projections indicate that by the year 2007, average
interconnect delay can be equivalent to five gate delays.
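
The interconnect length quoted above follows directly from the projected pitch and the typical length range. The short Python sketch below reproduces that arithmetic using only the figures given in the text; the constant and function names are illustrative assumptions, not part of the SIA projections.

```python
# Figures quoted in the text (SIA projection for 2007).
PITCH_UM = (0.4, 0.6)                 # finest interconnect pitch, in microns
LENGTH_IN_PITCHES = (1_000, 10_000)   # typical average interconnect length, in pitches


def interconnect_length_mm(pitch_um: float, pitches: int) -> float:
    """Convert a length expressed in pitches to millimetres."""
    return pitch_um * pitches / 1_000.0   # 1 mm = 1,000 microns


# Shortest case: smallest pitch and smallest multiple.
shortest_mm = interconnect_length_mm(PITCH_UM[0], LENGTH_IN_PITCHES[0])
# Longest case: largest pitch and largest multiple.
longest_mm = interconnect_length_mm(PITCH_UM[1], LENGTH_IN_PITCHES[1])

print(f"{shortest_mm:.1f} mm to {longest_mm:.1f} mm")
```

The upper end of the range corresponds to the "up to 6 mm" figure in the paragraph above.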
`
` 1
`
Once past the crossover point, dynamic interconnect
(reconfigurable interconnect or logic) can be introduced at
modest impact even on critical timing paths [2]. In such
systems, the dynamic configurability in the processor can
be used to significant advantage [4, 5], improving
performance by factors of 10 to 100x for computational
kernels while avoiding the traditional disadvantages of
custom computing approaches such as I/O coprocessor
coupling and slower logic [6]. In these systems,
reprogrammable logic blocks will replace static
interconnects in the processor core, paving the way for a
new class of architectures which are customized to the
application, delivering more robust and higher performance.
`
`Figure 1. Reconfigurability in the processor core and the extended
`application to fixed hardware interface
`
`Reconfigurable, or application adaptive processors
`allow customization of mechanisms, bindings, and policies
`on a per application basis. While current microprocessors
`implement a number of aggressive architectural techniques
`such as speculative execution, branch prediction, block
`prefetching, multi-level caching, etc. to achieve higher
`execution speeds, these mechanisms and policies are tuned
`
`Authorized licensed use limited to: University of Texas at Austin. Downloaded on July 24,2020 at 16:51:20 UTC from IEEE Xplore. Restrictions apply.
`
`9
`
`
`
`1
`2
`3
`4
`5
`6
`7
`8
`9
`10
`11
`12
`13
`14
`15
`16
`17
`18
`19
`20
`21
`22
`23
`24
`25
`26
`27
`28
`29
`30
`31
`32
`33
`34
`35
`36
`37
`38
`39
`40
`41
`42
`43
`44
`45
`46
`47
`48
`49
`50
`51
`52
`53
`54
`55
`56
`
for a broad suite of applications (e.g. SPEC), and thus
cannot be tightly matched to the needs of a particular
application, procedure, or even loop in an application. For
example, the cache block size and organization are chosen to
maximize performance over a suite of applications, but may
not give the best performance on any particular application.
Similar constraints apply to other performance-critical
aspects such as value prediction, branch prediction, and
data movement. In contrast, a processor incorporating
reconfigurability can adopt optimal policies (and in some
cases better mechanisms) for the application, enabling
increased execution efficiency. Thus, the reconfigurable
logic can be used to tune the processor to better match the
application, rather than being viewed in the more traditional
way as an add-on coprocessor. One example of this
per-application tuning would be to adapt the cache line size
to maximize performance for that application [3]. This
approach is embodied in the Morph/AMRM (Adaptive Memory
Reconfiguration Management) architecture [4, 5], and the
basic change in perspective is that the reconfigurable
hardware is an extension of the application program,
extending the application-to-fixed-hardware interface to
enable more efficient execution. The fixed hardware then has
a somewhat richer (and in parts lower level) interface as
shown in Figure 1. Studies of Morph/AMRM have
demonstrated that performance increases of 10 to 100 times
are possible [5]. In essence, this is an extension of the
application binary interface (ABI), but need not
be a non-portable extension of the application programming
interface (API) if appropriate CAD support is available.
This approach is similar to that which has recently gained
popularity in the software design community as "open
implementations" [7], in which software architects
recognize the need to open the implementation for
customization for particular application uses in order to
achieve adequate performance.
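
As a rough illustration of per-application cache line tuning, the following Python sketch counts misses in a small direct-mapped cache for several candidate line sizes and picks the size with the fewest misses for a given address trace. The cache model, the 256-byte capacity, the two synthetic traces, and all function names are assumptions made for this example; they are not taken from the Morph/AMRM design or from [3].

```python
def count_misses(trace, cache_bytes, line_bytes):
    """Count misses of a direct-mapped cache for a list of byte addresses."""
    num_lines = cache_bytes // line_bytes
    resident = [None] * num_lines          # memory block held by each cache line
    misses = 0
    for address in trace:
        block = address // line_bytes      # memory block containing the address
        index = block % num_lines          # cache line that block maps to
        if resident[index] != block:       # miss: fetch the block, evict the old one
            resident[index] = block
            misses += 1
    return misses


def best_line_size(trace, cache_bytes, candidates):
    """Return the candidate line size that yields the fewest misses."""
    return min(candidates, key=lambda size: count_misses(trace, cache_bytes, size))


CACHE_BYTES = 256
CANDIDATES = [16, 32, 64]

# A streaming application: one sequential pass over 1 KB, one word at a time.
sequential = list(range(0, 1024, 4))

# A pointer-chasing style application: 8 scattered words, revisited 10 times.
scattered = [k * 80 for k in range(8)] * 10

print(best_line_size(sequential, CACHE_BYTES, CANDIDATES))
print(best_line_size(scattered, CACHE_BYTES, CANDIDATES))
```

In this toy setting the streaming trace favors the largest line (more spatial locality per fetch), while the scattered trace favors the smallest line (more independent lines, fewer conflicts), which is the kind of mismatch a single fixed line size cannot resolve.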
Introducing application-controlled reconfigurability in
the processor raises significant challenges for ensuring
process isolation and protection (multiprocess isolation), a
critical element of robust desktop and, to an increasing
degree, embedded computing systems. Multiprocess
isolation is an essential modularity element in software
systems: without the guarantee of safely isolated and
protected processes, the system can never be robust, since
software faults cannot be contained and the system cannot
be safely extended. It is essential for robust reconfigurable
computing that an application's customization affect only its
own computation, not that of other applications. For example,
if application-defined hardware were allowed to control
hardware addressing, it could allow unauthorized
corruption of operating system data or even the data of
other application processes. If application-defined
hardware were allowed to control data prefetching, it could
swamp the memory system with spurious requests. If
application-defined hardware were allowed to control
privilege mode changes, it could compromise all traditional
protection structures.
Our study examines the protection structures of
traditional processors and operating systems and, based on
these lessons, proposes a safe multiprocess execution
architecture for reconfigurable systems. We analyze in
detail the software and hardware mechanisms central to
process protection in conventional processors and operating
systems, specifically studying the MIPS R10000 [8]
microprocessor, an exemplar of a system employing a
Unix/RISC protection architecture. This study elucidates the
key mechanisms and architectural features for Unix-style
two-mode protection and addressing-based isolation. The key
feature of this protection architecture is process isolation
via address isolation and mediation. Specifically,
1. All access to hardware devices is mediated by the
operating system,
2. The operating system manages address translation to
isolate processes,
3. Application processes cannot change the address
translation information,
4. Application processes cannot substitute other
translation information,
5. All application accesses are subject to this
translation, and
6. The hardware ensures these guarantees.
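
The properties listed above can be sketched as a toy model. The Python class below is a hypothetical illustration, not the R10000 or any real operating system: only kernel-mode code may install translations, and every load or store goes through the current process's page table. Property 1 (mediation of device access) is not modeled, and all names (`Machine`, `map_page`, `run_user`, `trap`) are invented for this sketch.

```python
class ProtectionFault(Exception):
    """Raised when an access would violate process isolation."""


class Machine:
    """Toy two-mode machine with per-process page tables (hypothetical)."""

    PAGE_SIZE = 4096

    def __init__(self):
        self.kernel_mode = True      # the machine starts in the OS
        self.page_tables = {}        # pid -> {virtual page: physical frame}
        self.current_pid = None
        self.physical_memory = {}    # physical address -> stored value

    def map_page(self, pid, virtual_page, frame):
        # Properties 2-4: only the OS may create or change translations.
        if not self.kernel_mode:
            raise ProtectionFault("user mode cannot change translations")
        self.page_tables.setdefault(pid, {})[virtual_page] = frame

    def run_user(self, pid):
        # The OS selects the process's page table, then drops privilege.
        if not self.kernel_mode:
            raise ProtectionFault("only the OS can switch processes")
        self.current_pid = pid
        self.kernel_mode = False

    def trap(self):
        # Controlled entry into the OS (system call or interrupt).
        self.kernel_mode = True

    def _translate(self, virtual_address):
        # Properties 5-6: every access is translated by the "hardware".
        page, offset = divmod(virtual_address, self.PAGE_SIZE)
        table = self.page_tables.get(self.current_pid, {})
        if page not in table:
            raise ProtectionFault(f"page {page} not mapped for this process")
        return table[page] * self.PAGE_SIZE + offset

    def store(self, virtual_address, value):
        self.physical_memory[self._translate(virtual_address)] = value

    def load(self, virtual_address):
        return self.physical_memory.get(self._translate(virtual_address), 0)


machine = Machine()
machine.map_page(pid=1, virtual_page=0, frame=7)   # OS sets up process 1
machine.map_page(pid=2, virtual_page=0, frame=9)   # OS sets up process 2
machine.run_user(1)
machine.store(0x10, 111)
machine.trap()
machine.run_user(2)
machine.store(0x10, 222)       # same virtual address, different physical frame
machine.trap()
machine.run_user(1)
print(machine.load(0x10))      # process 1 reads back its own value
```

Because both processes use the same virtual address but are mapped to different frames, neither can observe or overwrite the other's data, and a user-mode call to `map_page` raises `ProtectionFault`.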
`
We subsequently describe the Morph/AMRM
architecture, outlining the dimensions of configurability and
the hazards for multiprocess protection they induce. For the
Morph/AMRM system, we then describe the protection
architecture, describing in detail how each of the key
properties of the operating system / processor protection
architecture is provided. The key elements of this
protection architecture are:
1. A hardwired control processor which controls
instruction sequence and privilege mode transitions
2. Hardwired control-processor-to-TLB control for
address translation and TLB entry management
3. A requirement that all other configurable elements
(system chip sets, input/output devices, memory
controllers) deal in virtual addresses, with their
accesses checked by local TLBs
4. Controlled access to key shared interconnects such as
the system bus, enforced by hardwired arbiters
which are not reconfigurable; the system reserves the
highest priority to allow preemption for these resources
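
Element 4 above can be illustrated with a fixed-priority arbiter. The following Python sketch is a minimal model under assumed names: the requesters `system`, `cpu`, and `prefetcher` and the single-grant-per-cycle bus are hypothetical and are not taken from the Morph/AMRM design. It shows only the idea that the priority order is fixed and that the system's reserved top priority lets it preempt a misbehaving configurable unit.

```python
# Fixed priority order, highest first. A tuple is used because the order
# stands in for hardwired logic that configuration cannot rewrite.
HARDWIRED_PRIORITY = ("system", "cpu", "prefetcher")


def grant(requesting):
    """Return the highest-priority requester, or None if the bus is idle."""
    for name in HARDWIRED_PRIORITY:
        if name in requesting:
            return name
    return None


def simulate(cycles, system_request_cycles):
    """Grant the bus each cycle while a prefetcher requests it continuously."""
    grants = []
    for cycle in range(cycles):
        requesting = {"prefetcher"}          # misbehaving unit asks every cycle
        if cycle in system_request_cycles:
            requesting.add("system")         # the OS needs the bus this cycle
        grants.append(grant(requesting))
    return grants


print(simulate(5, {2}))
```

Even though the prefetcher requests the bus on every cycle, the system is granted the bus on the cycle in which it asks, so a spurious-request flood cannot lock out the operating system.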
`
`This architecture enables configurability in the processor
`complex because it can ensure multiprocess protection (safe
`configuration). We also believe it enables much of the
`useful configurability in the processor complex, notably
`policies for improving efficient management of resources
`and even the addition of instructions, special functional
`
` 2
`
`Authorized licensed use limited to: University of Texas at Austin. Downloaded on July 24,2020 at 16:51:20 UTC from IEEE Xplore. Restrictions apply.
`
`10
`
`
`
`1
`2
`3
`4
`5
`6
`7
`8
`9
`10
`11
`12
`13
`14
`15
`16
`17
`18
`19
`20
`21
`22
`23
`24
`25
`26
`27
`28
`29
`30
`31
`32
`33
`34
`35
`36
`37
`38
`39
`40
`41
`42
`43
`44
`45
`46
`47
`48
`49
`50
`51
`52
`53
`54
`55
`56
`
`units, or even processor state. The model provided to
`application programs is a private, configurable, virtual
`machine which enables rich application customization.
`These applications (and their customizations) are cleanly
`isolated.
The remainder of the paper is organized as follows.
Section 2 describes the basic problem of protected
execution and process isolation in computer systems.
Section 3 describes our analysis of the software and
hardware mechanisms central to process protection in
conventional processors and operating systems. Section 4
discusses the implications of reconfigurability for process
protection and identifies the key requirements for safe
process isolation in reconfigurable processors. In Section 5,
we describe the Morph/AMRM system and a proposed
protection architecture that meets the requirements set
forth in Section 3. Section 6 discusses alternate approaches
and the limitations on configurability imposed by the
Morph/AMRM protection architecture. Section 7
summarizes future work and the material covered in this
paper.

2. Process Isolation: the Problem
`
`Figure 2: Multiprocess Protection based on Address Space
`Isolation
`
To understand the challenges of multiprocess isolation,
it is instructive to first consider the possible modalities in
which multiprocess isolation can be compromised. In the
simplest mode, an application corrupts the data of another,
causing it to fail or compute incorrectly. In a more complex
mode, the application somehow locks up the machine, so no
other application state is damaged, but neither can the
machine make progress. One example of this would be
jamming the memory bus or defeating the timer interrupt
which ensures preemption. A more serious failure mode is
to corrupt the operating system's data, which can lead to a
machine crash in which all applications have data
corruption. Finally, an application could also corrupt
input/output device state, confounding the operating
system, the device (leading to data loss or misdirection), or
application data itself. In all of these cases, the failure is
the result of allowing an application action which can affect
the machine hardware state, other application memory state,
or operating system state.
The key issue in safe multiprocess execution is to
control access to hardware resources, ensuring that these
accesses are non-interfering. In general, access to main
memory, as well as other architecturally visible state
(processor data registers, control registers), system chip
registers, and input/output device state must be controlled.
Traditional approaches partition memory access, virtualize
resources such as processor data resources with
multitasking, and use operating system calls to mediate
operations which require access to control registers, system
chip sets, input/output device state, etc. Note that isolation
and virtualization must apply to any resource, at any level,
of which a process can claim ownership. The final piece of
the puzzle is that, in order to support virtualization and
multitasking, transitions between the different entities must
be carefully controlled to prevent compromise.

3. Process Isolation in the MIPS R10000
`
The key issue in maintaining a safe multiprocessing
environment is ensuring process isolation: the processor
and the OS must prevent independent processes from
interfering with the data and memory of each other and of
the operating system kernel. They must also prevent a
malicious process from taking over the processor and
locking up the system.
Through a detailed analysis of the R10000 architecture
and operating system, we identify the hardware
mechanisms and OS software structures that are central to
process isolation. We chose the MIPS R10000 processor as
an exemplar of a modern RISC processor that supports a
relatively simple UNIX-style protection structure [9]. We
first examine how a UNIX-style operating system ensures
process isolation and thereby derive the hardware
requirements it imposes. We then identify the corresponding
support in the R10000 processor. In the following
discussion, we assume that address translation is based on a
simple paging system. Most of today’s systems actually
employ multilevel paging or segmented paging, but the
address translation mechanism is fundamentally the same as
in a simple paging system.
`
3.1 Operating System-based Process Protection

3.1.1 Application and Operating System Memory Isolation
`
`
Application and operating system memory isolation is
achieved through controlled address translation. The
physical memory of each process is isolated by having
each process's virtual address space pages map only to its
own physical memory frames. To protect processes from
modification by other processes, the memory-management
hardware and the OS must prevent programs from changing
their own address mappings. The UNIX kernel, for
example, runs in a privileged mode (kernel mode or system
mode) in which memory mapping may be controlled,
whereas application processes run in an unprivileged mode
(user mode). The page tables, which hold the mapping
information for each process, reside in the memory space of
the kernel so that they can only be modified by the OS
running in kernel mode. This address translation control to
ensure isolation is achieved through the following
mechanisms in UNIX [9, 10].
`
1. Locating correct translation information for each process.
By using a special page table base register (ptbr), which
is set from the process control block (PCB) on each
process switch, the OS can correctly locate the page
table for the executing process. The index portion
of the virtual address is then added to the address held
in the ptbr to locate the appropriate page table entry
(PTE).
2. Distinguishing valid and invalid entries in page tables.
Notice that the page table can contain entries that are
not used by the process. These unused entries
correspond to pages that are not in the process’s
logical address space, and using them would compromise
process isolation. The OS uses valid-invalid bits to
distinguish these entries. Alternatively, the page table
can be implemented to contain only the entries that are
actually used by the process. This implementation
requires a special register containing the length of the
process's page table, usually called the page-table length
register (PTLR). The PTLR can be used to check whether the
page index portion of the virtual address is in range and
therefore is not accessing illegal translation
inform