`Proceedings
`
`SEVENTH ANNUAL
`IEEE Symposium on
`FIELD-PROGRAMMABLE
`Custom Computing Machines
`FCCM’99
`
`XILINX 1011
`
`Proceedings
`
`SEVENTH ANNUAL
`IEEE Symposium on
`FIELD-PROGRAMMABLE
`Custom Computing Machines
`
`FCCM’99
`
`April 21-23, 1999
`Napa Valley, California
`
`Sponsored by
`IEEE Computer Society Technical Committee on
`Computer Architecture
`
`Edited by
`Kenneth L. Pocek and Jeffrey M. Arnold
`
`IEEE Computer Society
`Los Alamitos, California
`Tokyo
`Brussels
`Washington
`
`Copyright © 1999 by The Institute of Electrical and Electronics Engineers, Inc.
`All rights reserved
`
`Copyright and Reprint Permissions: Abstracting is permitted with credit to the source. Libraries may
`photocopy beyond the limits of US copyright law, for private use of patrons, those articles in this volume
`that carry a code at the bottom of the first page, provided that the per-copy fee indicated in the code is paid
`through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923.
`
`Other copying, reprint, or republication requests should be addressed to: IEEE Copyrights Manager, IEEE
`Service Center, 445 Hoes Lane, P.O. Box 1331, Piscataway, NJ 08855-1331.
`
`The papers in this book comprise the proceedings of the meeting mentioned on the cover and title page.
`They reflect the authors’ opinions and, in the interests of timely dissemination, are published as presented
`and without change. Their inclusion in this publication does not necessarily constitute endorsement by the
`editors, the IEEE Computer Society, or the Institute of Electrical and Electronics Engineers, Inc.
`
`IEEE Computer Society Order Number PR00375
`ISBN 0-7695-0375-6
`ISBN 0-7695-0377-2 (microfiche)
`IEEE Order Plan Catalog Number PR00375
`ISSN 1082-3409
`
`Additional copies may be ordered from:
`
`IEEE Computer Society
`Customer Service Center
`10662 Los Vaqueros Circle
`P.O. Box 3014
`Los Alamitos, CA 90720-1314
`Tel: + 1-714-821-8380
`Fax: + 1-714-821-4641
`E-mail: cs.books@computer.org
`
`IEEE Service Center
`445 Hoes Lane
`P.O. Box 1331
`Piscataway, NJ 08855-1331
`Tel: + 1-732-981-1393
`Fax: + 1-732-981-9667
`mis.custserv@computer.org
`
`IEEE Computer Society
`Ooshima Building
`1-4-2 Minami-Aoyama
`Minato-ku, Tokyo 107-0062
`JAPAN
`Tel: + 81-3-3408-3118
`Fax: + 81-3-3408-3553
`tokyo.ofc@computer.org
`
`Editorial production by Thomas Baldwin
`
`Cover art design by Joseph Daigle/Studio Productions
`
`Printed in the United States of America by The Printing House
`
`Table of Contents
`
`SEVENTH ANNUAL IEEE SYMPOSIUM ON FIELD-PROGRAMMABLE
`CUSTOM COMPUTING MACHINES (FCCM’99)
`
`Co-Chairs & Program Committee .......... x
`
`SESSION 1: TOOLS 1
`CHAIR: Stephen Smith
`
`Macro-Based Hardware Compilation of Java™ Bytecodes into a Dynamic
`Reconfigurable Computing System .......... 2
`J.M.P. Cardoso, H.C. Neto
`
`A CAD Suite for High-Performance FPGA Design .......... 12
`B. Hutchings, P. Bellows, J. Hawkins, S. Hemmert, B. Nelson, M. Rytting
`
`Formal Verification of Reconfigurable Cores .......... 25
`S. Singh, C.J. Lillieroth
`
`SESSION 2: NETWORK APPLICATIONS
`CHAIR: Mark Shand
`
`Transmutable Telecom System and Its Application .......... 34
`T. Miyazaki, T. Murooka, M. Katayama, A. Takahara
`
`Implementation and Evaluation of a Prototype Reconfigurable Router .......... 44
`J.R. Hess, D.C. Lee, S.J. Harper, M.T. Jones, P.M. Athanas
`
`SESSION 3: COMPILATION
`CHAIR: André DeHon
`
`Pipeline Vectorization for Reconfigurable Systems .......... 52
`M. Weinhardt, W. Luk
`
`Automatic Allocation of Arrays to Memories in FPGA Processors with
`Multiple Memory Banks .......... 63
`M.B. Gokhale, J.M. Stone
`
`Parallelizing Applications into Silicon .......... 70
`J. Babb, M. Rinard, C.A. Moritz, W. Lee, M. Frank, R. Barua, S. Amarasinghe
`
`SESSION 4: ARCHITECTURES
`CHAIR: Scott Hauck
`
`Reconfigurable Elements for a Video Pipeline Processor .......... 82
`M.R. Piacentino, G.S. van der Wal, M.W. Hansen
`
`ConCISe: A Compiler-Driven CPLD-Based Instruction Set Accelerator .......... 92
`B. Kastrup, A. Bink, J. Hoogerbrugge
`
`SESSION 5: TOOLS 2
`CHAIR: Roger Woods
`
`CPR: A Configuration Profiling Tool .......... 104
`S. Cadambi, S.C. Goldstein
`
`Debugging Techniques for Dynamically Reconfigurable Hardware .......... 114
`N. McKay, S. Singh
`
`Improving Simulation Accuracy in Design Methodologies for Dynamically Reconfigurable
`Logic Systems .......... 123
`M. Vasilko, D. Cabanis
`
`SESSION 6: GRAPHICS APPLICATIONS
`CHAIR: Herman Schmit
`
`Reconfigurable Computing for Augmented Reality .......... 136
`W. Luk, T.K. Lee, J.R. Rice, N. Shirazi, P.Y.K. Cheung
`
`Sepia: Scalable 3D Compositing using PCI Pamette .......... 146
`L. Moll, A. Heirich, M. Shand
`
`SESSION 7: APPLICATIONS
`CHAIR: Mike Butts
`
`An Edge-Endpoint-Based Configurable Hardware Architecture for
`VLSI CAD Layout Design Rule Checking .......... 158
`Z. Luo, M. Martonosi, P. Ashar
`
`FAFNER—Accelerating Nesting Problems with FPGAs .......... 168
`J.C. Alves, J.C. Ferreira, C. Albuquerque, J.F. Oliveira, J.S. Ferreira, J. Silva Matos
`
`SESSION 8: DSP APPLICATIONS
`CHAIR: Phil Kuekes
`
`Field Programmable Gate Array Based Radar Front-End Digital Signal Processing .......... 178
`T.J. Moeller, D.R. Martinez
`
`Optimizing FPGA-Based Vector Product Designs .......... 188
`D. Benyamin, W. Luk, J. Villasenor
`
`SESSION 9: RUN TIME SYSTEMS
`CHAIR: Satnam Singh
`
`PCI-PipeRench and the SwordAPI: A System for Stream-based Reconfigurable Computing .......... 200
`R. Laufer, R.R. Taylor, H. Schmit
`
`Safe and Protected Execution for the Morph/AMRM Reconfigurable Processor .......... 209
`A.A. Chien, J.H. Byun
`
`Implementing an API for Distributed Adaptive Computing Systems .......... 222
`M. Jones, L. Scharf, J. Scott, C. Twaddle, M. Yaconis, K. Yao, P. Athanas, B. Schott
`
`SESSION 10: ARITHMETIC
`CHAIR: Steve Casselman
`
`A Super-Serial Galois Fields Multiplier for FPGAs and its Application to
`Public-Key Algorithms .......... 232
`G. Orlando, C. Paar
`
`Automatic Floating to Fixed Point Translation and its Application to
`Post-Rendering 3D Warping .......... 240
`M.P. Leong, M.Y. Yeung, C.K. Yeung, C.W. Fu, P.A. Heng, P.H.W. Leong
`
`Dynamic Precision Management for Loop Computations on Reconfigurable Architectures .......... 249
`K. Bondalapati, V.K. Prasanna
`
`POSTER SESSION 1
`
`Accelerating Run-Time Reconfiguration on FCCMs .......... 260
`J.-P. Heron, R.F. Woods
`
`A Virtual Hardware Handler for RTR Systems .......... 262
`R. Turner, R.F. Woods, S. Sezer, J.-P. Heron
`
`Algorithm Analysis and Mapping Environment for Adaptive
`Computing Systems: Further Results .......... 264
`E.K. Pauer, P.D. Fiore, J.M. Smith
`
`Development System for FPGA-Based Digital Circuits .......... 266
`V. Sklyarov, J. Fonseca, R. Monteiro, A. Oliveira, A. Melo,
`N. Lau, I. Skliarova, P. Neves, A. Ferrari
`
`Design of a JTAG Based Run Time Reconfigurable System .......... 268
`C. Cousineau, F. Laperle, Y. Savaria
`
`Architectures for System-Level Applications of Adaptive Computing .......... 270
`B. Schott, C. Chen, S. Crago, J. Czarnaski, M. French, I. Hom, T. Tho, T. Valenti
`
`Task-level Partitioning and RTL Design Space Exploration for Multi-FPGA Architectures .......... 272
`V. Srinivasan, R. Vemuri
`
`Enabling Automatic Module Generation for FCCM Compilers .......... 274
`A. Koch
`
`POSTER SESSION 2
`
`ICARUS: A Dynamically Reconfigurable Computer Architecture .......... 278
`M. Baxter
`
`SONIC—A Plug-In Architecture for Video Processing .......... 280
`S.D. Haynes, P.Y.K. Cheung, W. Luk, J. Stone
`
`A Reconfigurable Platform for Academic Purposes .......... 282
`C. Teuscher, J.-O. Haenni, F.J. Gomez, H.F. Restrepo, E. Sanchez
`
`VHDL Placement Directives for Parametric IP Blocks .......... 284
`J. Hwang, C. Patterson, S. Mitra
`
`Runlength Compression Techniques for FPGA Configurations .......... 286
`S. Hauck, W.D. Wilson
`
`POSTER SESSION 3
`
`Accelerating An IR Automatic Target Recognition Application with FPGAs .......... 290
`J. Jean, X. Liang, B. Drozd, K. Tomko
`
`Mapping of an Automated Target Recognition Application from a Graphical Software
`Environment to FPGA-based Reconfigurable Hardware .......... 292
`B. Levine, S. Natarajan, C. Tan, D. Newport, D. Bouldin
`
`Hybrid Data/Configuration Caching for Striped FPGAs .......... 294
`D. Deshpande, A.K. Somani, A. Tyagi
`
`On Reconfiguring Cache for Computing .......... 296
`H.-S. Kim, A.K. Somani, A. Tyagi
`
`Reconfigurable Pipelines in VLIW Execution Units .......... 298
`R.D. Williams, B.D. Kuebert
`
`Fast Online Placement for Reconfigurable Computing System .......... 300
`K. Bazargan, M. Sarrafzadeh
`
`POSTER SESSION 4
`
`A Compact Fast Variable Key Size Elliptic Curve Cryptosystem Coprocessor .......... 304
`L. Gao, S. Shrivastava, H. Lee, G.E. Sobelman
`
`A Virtual Logic Algorithm for Solving Satisfiability Problems Using
`Reconfigurable Hardware .......... 306
`M. Abramovici, J.T. de Sousa
`
`Reducing Compilation Time of Zhong’s FPGA-based SAT Solver .......... 308
`P.K. Chan, M.J. Boyd, S. Goren, K. Klenk, V. Kodavati, R. Kundu, M. Margolese,
`J. Sun, K. Suzuki, E. Thorne, X. Wang, J. Xu, M. Zhu
`
`FPGA-based Structures for On-line FFT and DCT .......... 310
`D. Lau, A. Schneider, M.D. Ercegovac, J. Villasenor
`
`An FPGA-based Fan Beam Image Reconstruction Module .......... 312
`L. Maltar, F.M.G. Franca, V.C. Alves, C.L. Amorim
`
`Bézier Curve Rendering on Virtex™ .......... 314
`D. MacVicar, S. Singh, R. Slous
`
`Author Index .......... 318
`
`Safe and Protected Execution for the Morph/AMRM Reconfigurable
`Processor
`
`Andrew A. Chien
`Department of Computer Science and Engineering
`University of California, San Diego
`achien@cs.ucsd.edu
`
`Jay H. Byun
`Department of Computer Science
`University of Illinois at Urbana-Champaign
`jaybyun@cs.uiuc.edu
`
`April 1, 1999
`
`Abstract
`
`Technology scaling of CMOS processes brings relatively faster
`transistors (gates) and slower interconnects (wires), making viable
`the addition of reconfigurability to increase performance. In the
`Morph/AMRM system, we are exploring the addition of
`reconfigurable logic, deeply integrated with the processor core,
`employing the reconfigurability to manage the cache, datapath,
`and pipeline resources more effectively. However, integration of
`reconfigurable logic introduces significant protection and safety
`challenges for multiprocess execution. We analyze the protection
`structures in a state of the art microprocessor core (R10000),
`identifying the few critical logic blocks and demonstrating that the
`majority of the logic in the processor core can be safely
`reconfigured. Subsequently, we propose a protection architecture
`for the Morph/AMRM reconfigurable processor which enables
`nearly the full range of power of reconfigurability in the processor
`core while requiring only a small number of fixed logic features
`to ensure safe, protected multiprocess execution.
`
`1. Introduction
`
`Trends in semiconductor technology suggest that the
`use of reconfigurable logic blocks within the processor will
`be desirable in the future. Projections from the Semiconductor
`Industry Association (SIA) for the year 2007 indicate
`advanced semiconductor processes using 0.1 micron feature
`sizes [1]. However, this feature size, as measured by
`transistor channel length, is of decreasing importance to
`logic and circuit as well as processor speed. In systems of
`that era, logic density, logic speed, and processor speed will
`be dominated by interconnect performance and wiring
`density. For 2007, the SIA projects pitch for the finest
`interconnect at 0.4-0.6 microns. Between logic blocks,
`average interconnect lengths typically range from
`1,000x to 10,000x pitch -- up to 6mm of intra-chip
`interconnect length. For such an interconnect, the
`achievable global clock period would be limited to
`approximately 1 nanosecond. Within a few technology
`generations, a crossover will occur, and the average
`interconnect delay will surpass logic block delays --
`projections indicate that by the year 2007, average
`interconnect delay can be equivalent to five gate delays.
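The interconnect-length figure quoted above follows directly from the projected pitch. As a quick check (a sketch only; the variable and function names are ours, and only the pitch range and the 1,000x to 10,000x multipliers come from the text):

```python
# Projected finest interconnect pitch for 2007, in microns (from the text).
PITCH_UM = (0.4, 0.6)
# Typical interconnect length between logic blocks, in multiples of pitch.
LENGTH_IN_PITCHES = (1_000, 10_000)


def interconnect_length_mm(pitch_um: float, pitches: int) -> float:
    """Convert a length expressed in pitches to millimetres."""
    return pitch_um * pitches / 1_000.0  # 1 mm = 1,000 microns


shortest_mm = interconnect_length_mm(PITCH_UM[0], LENGTH_IN_PITCHES[0])
longest_mm = interconnect_length_mm(PITCH_UM[1], LENGTH_IN_PITCHES[1])
print(f"{shortest_mm:.1f} mm to {longest_mm:.1f} mm")
```

The upper end of the range, 10,000 pitches at 0.6 microns, is where the 6mm figure comes from.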
`
`Once past the cross-over point, dynamic interconnect
`(reconfigurable interconnect or logic) can be introduced at
`modest impact even on critical timing paths [2]. In such
`systems, the dynamic configurability in the processor can
`be used to significant advantage [4, 5], improving
`performance by factors of 10 to 100x for computational
`kernels while avoiding the traditional disadvantages of
`custom computing approaches such as I/O coprocessor
`coupling and slower logic [6]. In these systems,
`reprogrammable logic blocks will replace static
`interconnects in the processor core, paving the way for a
`new class of architectures which are customized to the
`application, delivering more robust and higher performance.
`
`Figure 1. Reconfigurability in the processor core and the extended
`application to fixed hardware interface
`
`Reconfigurable, or application adaptive processors
`allow customization of mechanisms, bindings, and policies
`on a per application basis. While current microprocessors
`implement a number of aggressive architectural techniques
`such as speculative execution, branch prediction, block
`prefetching, multi-level caching, etc. to achieve higher
`execution speeds, these mechanisms and policies are tuned
`for a broad suite of applications (e.g. SPEC), and thus
`cannot be tightly matched to the needs of a particular
`application, procedure, or even loop in an application. For
`example, the cache block size and organization is chosen to
`maximize performance over a suite of applications, but may
`not give best performance on any particular application.
`Similar constraints apply to other performance critical
`aspects such as value prediction, branch prediction, and
`data movement. In contrast, a processor incorporating
`reconfigurability can adopt optimal policies (and in some
`cases better mechanisms) for the application, enabling
`increased execution efficiency. Thus, the reconfigurable
`logic can be used to tune the processor to better match the
`application, rather than the more traditional view of
`thinking of it as an add-on coprocessor. One example of
`this per-application tuning would be to adapt the
`cache line size to maximize performance for that
`application [3]. This approach is embodied in the
`Morph/AMRM (Adaptive Memory Reconfiguration
`Management) architecture [4, 5], and the basic change in
`perspective is that the reconfigurable hardware is an
`extension of the application program, extending the
`application-to-fixed-hardware interface to enable more
`efficient execution. The fixed hardware then has a
`somewhat richer (and in parts lower level) interface as
`shown in Figure 1. Studies of Morph/AMRM have
`demonstrated that performance increases of ten to 100 times
`are possible [5]. In essence, this is an extension of the
`application binary interface (so-called ABI), but need not
`be a non-portable extension of the application programming
`interface (API) if appropriate CAD support is available.
`This approach is similar to that which has recently gained
`popularity in the software design community as "open
`implementations" [7], in which software architects
`recognize the need to open the implementation for
`customization for particular application uses in order to
`achieve adequate performance.
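To make the cache-line-size example concrete, the sketch below replays an address trace through a simple direct-mapped cache model at several line sizes and picks the size with the fewest misses. The cache model, the fixed 1 KB capacity, the candidate line sizes, the two synthetic traces, and the function names are all illustrative assumptions; this is not the Morph/AMRM mechanism itself.

```python
def count_misses(trace, line_size, cache_bytes=1024):
    """Count misses for a direct-mapped cache of fixed total capacity."""
    num_lines = cache_bytes // line_size
    tags = [None] * num_lines          # block number currently held by each line
    misses = 0
    for addr in trace:
        block = addr // line_size      # memory block containing this address
        index = block % num_lines      # cache line that the block maps to
        if tags[index] != block:
            misses += 1
            tags[index] = block
    return misses


def best_line_size(trace, candidates=(16, 32, 64, 128)):
    """Return the candidate line size with the fewest misses on this trace."""
    return min(candidates, key=lambda size: count_misses(trace, size))


# A sequential sweep with no reuse: longer lines exploit spatial locality.
sequential = list(range(0, 4096, 4))

# 64 scattered words revisited four times: they all fit when the cache has
# many short lines, but collide with each other when lines are long and few.
scattered = [i * 1024 + i * 16 for i in range(64)] * 4

print(best_line_size(sequential), best_line_size(scattered))
```

The two traces prefer opposite ends of the candidate range, which is the situation in which a single line size chosen for a whole benchmark suite serves neither application well.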
`
`Introducing application-controlled reconfigurability in
`the processor raises significant challenges for ensuring
`process isolation and protection (multiprocess isolation), a
`critical element of robust desktop and, to an increasing
`degree, embedded computing systems. Multiprocess
`isolation is an essential modularity element in software
`systems: without the guarantee of safely isolated and
`protected processes, the system can never be robust since
`software faults cannot be contained and the system cannot
`be safely extended. It is essential for robust reconfigurable
`computing that an application's customization only affect its
`own computation, not that of other applications. For example, if
`application-defined hardware were allowed to control
`hardware addressing, it could allow unauthorized
`corruption of operating system data or even the data of
`other application processes. If application-defined
`hardware were allowed to control data prefetching, it could
`swamp the memory system with spurious requests. If
`application-defined hardware were allowed to control
`privilege mode changes, it could compromise all traditional
`protection structures.
`
`Our study examines the protection structures of
`traditional processors and operating systems, and based on
`these lessons, proposes a safe multiprocess execution
`architecture for reconfigurable systems. We analyze in
`detail the software and hardware mechanisms central to
`process protection in conventional processors and operating
`systems, specifically studying the MIPS R10000 [8] microprocessor,
`an exemplar of a system employing the Unix/RISC protection
`architecture. This study elucidates the key mechanisms and
`architectural features for Unix-style two-mode protection
`and addressing-based isolation. The key feature of this
`protection architecture is process isolation via address
`isolation and mediation. Specifically,
`1. All access to hardware devices is mediated by the
`operating system,
`2. The operating system manages address translation to
`isolate processes,
`3. Application processes cannot change the address
`translation information,
`4. Application processes cannot substitute other
`translation information,
`5. All application accesses are subject to this
`translation, and
`6. The hardware ensures these guarantees.
`
`We subsequently describe the Morph/AMRM
`architecture, outlining the dimensions of configurability and
`the hazards for multiprocess protection they induce. For the
`Morph/AMRM system, we then describe the protection
`architecture, describing in detail how each of the key
`properties of the operating system / processor protection
`architecture is provided. The key elements of this
`protection architecture are:
`1. A hardwired control processor which controls
`instruction sequence and privilege mode transitions
`2. Hardwired control-processor-to-TLB control for
`address translation and TLB entry management
`3. A requirement that all other configurable elements
`(system chip sets, input/output devices, memory
`controllers) deal in virtual addresses, with their
`accesses checked by local TLBs
`4. Controlled access to key shared interconnects such as
`the system bus, enforced by hardwired arbiters
`which cannot be changed; the system reserves the highest
`priority to allow preemption for these resources
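The fourth element can be illustrated with a small software model of a fixed-priority arbiter in which the system requester always wins. The class name FixedPriorityArbiter and the requester names are hypothetical, introduced only for illustration; the text specifies only that arbitration is hardwired and that the system reserves the highest priority.

```python
class FixedPriorityArbiter:
    """Models a hardwired arbiter: the priority order is fixed at construction
    and there is deliberately no method to change it afterwards."""

    def __init__(self, priority_order):
        # Earlier entries win over later ones; stored as an immutable tuple.
        self._order = tuple(priority_order)

    def grant(self, requests):
        """Return the highest-priority requester among those requesting,
        or None if nobody is requesting the bus this cycle."""
        for requester in self._order:
            if requester in requests:
                return requester
        return None


# The system (operating system) is placed first, so it can always preempt
# application-configured logic that is competing for the shared bus.
arbiter = FixedPriorityArbiter(["system", "app_logic_a", "app_logic_b"])

print(arbiter.grant({"app_logic_a", "app_logic_b"}))
print(arbiter.grant({"app_logic_b", "system"}))
```

Because the order is fixed, no application configuration can starve the system requester, which is the property the preemption guarantee relies on.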
`
`This architecture enables configurability in the processor
`complex because it can ensure multiprocess protection (safe
`configuration). We also believe it enables much of the
`useful configurability in the processor complex, notably
`policies for improving efficient management of resources
`and even the addition of instructions, special functional
`
`units, or even processor state. The model provided to
`application programs is a private, configurable, virtual
`machine which enables rich application customization.
`These applications (and their customizations) are cleanly
`isolated.
`
`The remainder of the paper is organized as follows.
`Section 2 describes the basic problem of protected
`execution and process isolation in computer systems.
`Section 3 describes our analysis of the software and
`hardware mechanisms central to process protection in
`conventional processors and operating systems. Section 4
`discusses the implications of reconfigurability on process
`protection and identifies the key requirement for safe
`process isolation in reconfigurable processors. In Section 5,
`we describe the Morph/AMRM system and a proposed
`protection architecture that meets the requirements set
`forth in Section 3. Section 6 discusses alternate approaches
`and the limitations on configurability imposed by the
`Morph/AMRM protection architecture. Section 7
`summarizes future work and the material covered in this
`paper.
`
`2. Process Isolation: the Problem
`
`Figure 2: Multiprocess Protection based on Address Space
`Isolation
`
`To understand the challenges of multiprocess isolation,
`it is instructive to first consider the possible modalities in
`which multiprocess isolation can be compromised. In the
`simplest mode, an application corrupts the data of another,
`causing it to fail or compute incorrectly. In a more complex
`mode, the application somehow locks up the machine, so no
`other application state is damaged, but neither can the
`machine make progress. One example of this would be
`jamming the memory bus or defeating the timer interrupt
`which ensures preemption. A more serious failure mode is
`to corrupt the operating system's data, which can lead to a
`machine crash in which all applications have data
`corruption. Finally, an application could also corrupt
`input/output device state, confounding the operating
`system, the device (leading to data loss or misdirection), or
`application data itself. In all of these cases, the failure is
`the result of allowing an application action which can affect
`the machine hardware state, other application memory state,
`or operating system state.
`
`The key issue in safe multiprocess execution is to
`control access to hardware resources, ensuring that these
`accesses are non-interfering. In general, access to main
`memory, as well as other architecturally visible state
`(processor data registers, control registers), system chip
`registers, and input/output device state must be controlled.
`Traditional approaches partition memory access, virtualize
`resources such as processor data resources with
`multitasking, and use operating system calls to mediate
`operations which require access to control registers, system
`chip sets, input/output device state, etc. Note that isolation
`and virtualization must apply to any resource, at any level,
`of which a process can claim ownership. The final piece of
`the puzzle is that in order to support the virtualization and
`multitasking, transitions between the different entities must
`be carefully controlled to prevent compromise.
`
`3. Process Isolation in the MIPS R10000
`
`The key issue in maintaining a safe multiprocessing
`environment is ensuring process isolation: the processor
`and the OS must prevent independent processes from
`interfering with the data and memory of each other and of
`the operating system kernel. They must also prevent a
`malicious process from taking over the processor and
`locking up the system.
`
`Through a detailed analysis of the R10000 architecture
`and operating system, we identify the hardware
`mechanisms and OS software structures that are central to
`process isolation. We chose the MIPS R10000 processor as
`an exemplar of a modern RISC processor that supports a
`relatively simple UNIX style protection structure [9]. We
`first examine how a UNIX style operating system ensures
`process isolation and thereby derive the hardware
`requirements it imposes. We then identify the corresponding
`support in the R10000 processor. In the following
`discussion, we assume that the address translation is based on a
`simple paging system. Most of today’s systems actually
`employ multi-level paging or segmented paging, but the
`address translation mechanism is fundamentally the same as
`a simple paging system.
`
`3.1 Operating System-based Process Protection
`
`3.1.1 Application and Operating System Memory Isolation
`
`Authorized licensed use limited to: University of Texas at Austin. Downloaded on July 24,2020 at 16:51:20 UTC from IEEE Xplore. Restrictions apply.
`
`Application and operating system memory isolation is
`achieved through controlled address translation. The
`physical memory of each process is isolated by having
`each process's virtual address space pages map only to its own
`physical memory frames. To protect processes from
`modification by other processes, the memory-management
`hardware and the OS must prevent programs from changing
`their own address mappings. The UNIX kernel, for
`example, runs in a privileged mode (kernel mode or system
`mode) in which memory mapping may be controlled,
`whereas application processes run in an unprivileged mode
`(user mode). The page tables, which hold the mapping
`information for each process, reside in the memory space of
`the kernel so that they can only be modified by the OS
`running in kernel mode. This address translation control to
`ensure isolation is achieved through the following
`mechanisms in UNIX [9, 10].
`
`1. Locating correct translation information for each process.
`By using a special page table base register (ptbr) which
`is set from the process control block (PCB) on each
`process switch, the OS can correctly locate the page
`table for the executing process. Then the index portion
`of the virtual address is added to the address pointed to
`by the ptbr to locate the appropriate page table entry
`(PTE).
`2. Distinguishing valid and invalid entries in page tables.
`Notice that the page table can contain entries that are
`not used by the process. These unused entries
`correspond to pages that are not in the process’s
`logical address space, so following them would compromise
`process isolation. The OS uses valid-invalid bits to distinguish
`these entries. Alternatively, the page table can
`be implemented to contain only the entries that are
`actually used by the process. This implementation
`requires a special register containing the length of the
`process's page table, usually called the page-table length
`register (PTLR). The PTLR can be used to check that the page
`index portion of the virtual address is in range and
`therefore is not accessing illegal translation
`information.
`3. Controlling access types.
`Even when the address translation to physical memory
`frames is valid, access to those frames would otherwise
`be unlimited: the process could read,
`write, and execute them. It is safer and more
`efficient to control the type of access to them.
`The protection bit field in the PTE provides this access
`control information. At the same time that the physical
`address is being computed, the protection bits can be
`checked to verify that no access is being made that has
`not been granted. These bits usually indicate whether the
`process has read/write, read-only, or execute-only access.
`The type and number of protection bits
`provided depend on the underlying processor.
`4. Managing TLB consistency.
`The translation information, namely the PTE, is
`cached in the processor's TLB to avoid an extra memory
`access to the page table. Using special privileged
`instructions, the OS updates the TLB with consistent
`mapping information when a miss occurs. But notice
`that after a context switch, although the new page table
`is pointed to by the new process's ptbr, the TLB would
`contain entries that are left over from previous
`processes. Therefore, to ensure process isolation, we
`need to invalidate or distinguish the entries in the TLB
`that do not belong to the executing process. This can
`be done by allowing the OS to flush the TLB with a
`special privileged instruction after a context switch, or
`by tagging the TLB entries with process IDs and
`valid-invalid bits.
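The four mechanisms above can be sketched together in a small software model. This is a toy under stated assumptions: a single-level page table, 4 KB pages, a TLB tagged with process IDs, and hypothetical names (PTE, Process, MMU, ProtectionFault). It does not reproduce the R10000's actual TLB format or instructions.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size for this sketch


class ProtectionFault(Exception):
    """Raised when a translation is invalid or the access type is not allowed."""


@dataclass
class PTE:
    frame: int
    valid: bool = True
    perms: str = "rw"   # protection bits, e.g. "r", "rw", "x"


@dataclass
class Process:
    pid: int
    page_table: list    # what the ptbr points at while this process runs


class MMU:
    def __init__(self):
        self.ptbr = None  # page table of the running process
        self.pid = None
        self.tlb = {}     # (pid, virtual page) -> PTE; the pid tag separates processes

    def context_switch(self, proc):
        # Mechanism 1: the OS loads the ptbr from the process control block.
        self.ptbr, self.pid = proc.page_table, proc.pid

    def translate(self, vaddr, access):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        pte = self.tlb.get((self.pid, vpn))   # Mechanism 4: tagged TLB lookup
        if pte is None:
            # Mechanism 2: PTLR-style length check plus the valid bit.
            if vpn >= len(self.ptbr) or not self.ptbr[vpn].valid:
                raise ProtectionFault(f"invalid page {vpn}")
            pte = self.ptbr[vpn]
            self.tlb[(self.pid, vpn)] = pte   # TLB refill after a miss
        # Mechanism 3: protection bits are checked on every access.
        if access not in pte.perms:
            raise ProtectionFault(f"{access} not permitted on page {vpn}")
        return pte.frame * PAGE_SIZE + offset


proc_a = Process(1, [PTE(frame=5), PTE(frame=6, perms="r")])
proc_b = Process(2, [PTE(frame=9)])
mmu = MMU()
mmu.context_switch(proc_a)
addr_a = mmu.translate(0x10, "r")   # resolved through process A's table
mmu.context_switch(proc_b)
addr_b = mmu.translate(0x10, "r")   # same virtual address, different frame
```

Tagging TLB entries with the process ID is one of the two options named in item 4; the alternative, flushing the TLB on every context switch, would amount to clearing the tlb dictionary inside context_switch.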
`
`3.1.2 Resource Protection through Operating System Mediation
`
` Not only the memory but also all resources that can be
`shared by processes mu
