`International Conference on
`Computer Design
`VLSI in Computers and Processors
`October 12-15, 1997
`Austin, Texas
`Sponsored by
`IEEE Computer Society Technical Committee on Design Automation
`IEEE Circuits and Systems Society
`Los Alamitos, California
`Intel Exhibit 1003 - 1
`Copyright © 1997 by The Institute of Electrical and Electronics Engineers, Inc.
`All rights reserved
`Copyright and Reprint Permissions: Abstracting is permitted with credit to the source. Libraries may
`photocopy beyond the limits of US copyright law, for private use of patrons, those articles in this volume that
`carry a code at the bottom of the first page, provided that the per-copy fee indicated in the code is paid
`through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923.
`Other copying, reprint, or republication requests should be addressed to: IEEE Copyrights Manager, IEEE
`Service Center, 445 Hoes Lane, P.O. Box 1331, Piscataway, NJ 08855-1331.
`The papers in this book comprise the proceedings of the meeting mentioned on the cover and title page. They
`reflect the authors' opinions and, in the interests of timely dissemination, are published as presented and
`without change. Their inclusion in this publication does not necessarily constitute endorsement by the
`editors, the IEEE Computer Society, or the Institute of Electrical and Electronics Engineers, Inc.
`IEEE Computer Society Order Number PR08026
`ISBN 0-8186-8026-X
`ISBN 0-8186-8207-8 (case)
`ISBN 0-8186-8208-6 (microfiche)
`IEEE Order Plan Catalog Number 97CB36149
`ISSN 1063-6404
`Additional copies may be ordered from:
`IEEE Computer Society
`Customer Service Center
`10662 Los Vaqueros Circle
`P.O. Box 3014
`Los Alamitos, CA 90720-1314
`Tel: +1-714-821-8380
`Fax: +1-714-821-4641
`E-mail: cs.books@computer.org
`IEEE Service Center
`445 Hoes Lane
`P.O. Box 1331
`Piscataway, NJ 08855-1331
`Tel: +1-908-981-1393
`Fax: +1-908-981-9667
`IEEE Computer Society
`13, Avenue de l'Aquilon
`B-1200 Brussels
`Tel: +32-2-770-2198
`Fax: +32-2-770-8505
`IEEE Computer Society
`Ooshima Building
`2-19-1 Minami-Aoyama
`Minato-ku, Tokyo 107
`Tel: +81-3-3408-3118
`Fax: +81-3-3408-3553
`Editorial production by Ian Torwick
`Cover design by Joseph Daigle/Studio Productions
`Printed in the United States of America by Technical Communication Services
`Table of Contents
`ICCD '97 Conference Program
`1997 Technical Program
`Welcome to ICCD'97 ................................................................................................... xiv
`Program Committee ................................................................................................... xvi
`Additional Referees ................................................................................................. xix
`Session 1.1: Keynote Speech
`Intelligent RAM (IRAM): the Industrial Setting, Applications, and Architectures ................... 2
`D. Patterson, K. Asanovic, A. Brown, R. Fromm, J. Golbus, B. Gribstad,
`K. Keeton, C. Kozyrakis, D. Martin, S. Perissakis, R. Thomas, N. Treuhaft,
`and K. Yelick, University of California at Berkeley, California
`Session 1.2: CAD Plenary
`Chair: Andreas Kuehlmann, IBM T. J. Watson Research Center
`A Brief History of the Future of Semiconductor Electronic Design Automation ..................... 10
`Ron Rohrer, TBD Consultants
`Concurrent Sessions 1.3
`Session 1.3.1: Special Session: Industrial Applications of Formal Verification
`Organizer and Chair: Andreas Kuehlmann, IBM T. J. Watson Research Center
`Formal Implementation Verification of the Bus Interface Unit for the Alpha 21264
`Microprocessor .............................................................................................................................. 16
`G.P. Bischoff, K.S. Brace, S. Jain, and R. Razdan
`Intertwined Development and Formal Verification of a 60x bus Model ................................... 25
`M. Kaufmann and C. Pixley
`Formally Specifying and Mechanically Verifying Programs for the Motorola Complex
`Arithmetic Processor DSP ............................................................................................................ 31
`B.C. Brock and W.A. Hunt, Jr.
`BIST-Based Fault Diagnosis in the Presence of Embedded Memories ..................................... 37
`J. Savir
`Built-in Self Test for Content Addressable Memories ................................................................ 48
`Y.-S. Kang, J.-C. Lee, and S. Kang
`Pseudo-Random Pattern Testing of Bridging Faults ................................................................. 54
`N.A. Touba and E.J. McCluskey
`Session 1.3.3: Simulation and Power Estimation
`Chair: Teng-Sheng Moh, Silicon Valley Research, Inc.
`Novel Simulation of Deep-Submicron MOSFET Circuits .......................................................... 62
`S. Bruma and R.H.J.M. Otten
`Time-Stamped Transition Density for the Estimation of Delay Dependent
`Switching Activities ...................................................................................................................... 68
`H. Choi and S.H. Hwang
`Power Compiler: A Gate-Level Power Optimization and Synthesis System ............................ 74
`B. Chen and I. Nedelchev
`Session 1.3.4: Branch Prediction
`Chair: Jim Bondi, Texas Instruments
`Elastic History Buffer: A Low-Cost Method to Improve Branch
`Prediction Accuracy ..................................................................................................... 82
`M.-D. Tarlescu, K.B. Theobald, and G.R. Gao
`Design Optimization for High-Speed Per-address Two-level Branch Predictors ....................... 88
`I.-C.K. Chen, C.-C. Lee, M.A. Postiff, and T.N. Mudge
`PA-8000: A Case Study of Static and Dynamic Branch Prediction ......................................... 97
`C. Burch
`Concurrent Sessions 1.4
`Session 1.4.1: New Techniques for Gate-Sizing and Retiming
`Chair: Derek Beatty, Motorola, Inc.
`Discrete Drive Selection for Continuous Sizing ........................................................................ 110
`R. Haddad, L.P.P.P. van Ginneken, and N. Shenoy
`Continuous Retiming: Algorithms and Applications ................................................................ 116
`P. Pan
`Optimal Clock Period Clustering for Sequential Circuits with Retiming ............................... 122
`A.K. Karandikar, P. Pan, and C.L. Liu
`Session 1.4.2: Circuit Modeling
`Chair: Sandip Kundu, IBM Corp.
`Comparison between nMOS Pass-Transistor logic style vs. CMOS
`Complementary Cells ................................................................................................... 130
`R. Mehrotra, M. Pedram, and X. Wu
`Circuit-Based Description and Modeling of Electromagnetic Noise Effects
`in Packaged Low-Power Electronics ........................... .. ................................................. ............ 136
`A.C. Cangellaris, W. Pinello, and A. Ruehli
`Transistor-Level Sizing and Timing Verification of Domino Circuits in the
`Power PC Microprocessor ............................................................................................. 143
`A. Dharchoudhury, D. Blaauw, J. Norton, S. Pullela, and J. Dunning
`Session 1.4.3: Novel Architectures
`Chair: Greg Fisher, Printronix Corporation
`Architectural Adaptation for Application-Specific Locality Optimization .............................. 150
`X. Zhang, A. Dasdan, M. Schulz, R.K. Gupta, and A.A. Chien
`A New Processor Architecture for Digital Signal Transport Systems ................................... 157
`M. Inamori, K. Ishii, A. Tsutsui, K. Shirakawa, H. Nakada, and T. Miyazaki
`Short Papers
`PROPHID: A Heterogeneous Multi-Processor Architecture for Multimedia .......................... 164
`J.A. Leijten, J.L. van Meerbergen, A.A. Timmer, and J.A.G. Jess
`Enhanced Compression Techniques to Simplify Program Decompression
`and Execution ............................................................................................................................. 170
`M. Breternitz, Jr. and R. Smith
`Session 1.4.4: Low Power Architectures
`Chair: Tim Brodnax, IBM
`A Low Power Approach to Floating Point Adder Design ......................................................... 178
`R.V.K. Pillai, D. Al-Khalili, and A.J. Al-Khalili
`Design and Implementation of Low-Power Digit-Serial Multipliers ...................................... 186
`Y.-N. Chang, J.H. Satyanarayana, and K.K. Parhi
`On Complexity Reduction of FIR Digital Filters Using Constrained
`Least Squares Solution ............................................................................................................... 196
`K. Muhammad and K. Roy
`Concurrent Session 1.5
`Session 1.5.1: Timing Optimization for Deep Submicron Technology
`Chair: Masahiro Fujita, Fujitsu Laboratories of America
`An Integrated Placement and Synthesis Approach for Timing Closure of
`PowerPC™ Microprocessors ....................................................................................................... 206
`S. Hojat and P. Villarrubia
`Post-Layout Circuit Speed-up by Event Elimination ................................................................ 211
`H. Vaishnav, C.-K. Lee, and M. Pedram
`Clustering and Load Balancing for Buffered Clock Tree Synthesis ........................................ 217
`A.D. Mehta, Y.-P. Chen, N. Menezes, D.F. Wong, and L.T. Pileggi
`CMOS Gate Delay Models for General RLC Loading .............................................................. 224
`R. Arunachalam, F. Dartu, and L.T. Pileggi
`Session 1.5.2: Special Session: The G4 S/390 Microprocessor
`Organizer: Andreas Kuehlmann, IBM T. J. Watson Research Center
`Chair: Sumit Dasgupta, IBM
`Design Methodology for the High-Performance G4 S/390 Microprocessor ............................. 232
`K.L. Shepard, S. Carey, D.K. Beece, R. Hatch, and G. Northrop
`A High-Frequency Custom CMOS S/390 Microprocessor ......................................................... 241
`C.F. Webb and J.S. Liptay
`High Performance CMOS Circuit Techniques for the G-4 S/390 Microprocessor .................. 247
`J. Warnock, L. Sigal, B. Curran, and Y. Chan
`A 400 MHz, 144Kb CMOS ROM MACRO for an IBM S/390-Class Microprocessor ............... 253
`A. Tuminaro
`Session 1.5.3: Multiprocessor Communication
`Chair: Wai-Chi Fang, Jet Propulsion Laboratory
`A Comparative Evaluation of Hierarchical Network Architecture of the
`HP-Convex Exemplar ................................................................................................................. 258
`R. Castaneda, X. Zhang, and J.M. Hoover, Jr.
`Effect of Message Length and Processor Speed on the Performance of the Bidirectional
`Ring based Multiprocessor ............................................................................................ 267
`H. Oi and N. Ranganathan
`An Approach to Network Caching for Multimedia Objects ..................................................... 273
`M.A. Kozuch, W. Wolf, and A. Wolfe
`Development of a High Bandwidth Merged Logic/DRAM Multimedia Chip .......................... 279
`W.K. Luk, Y. Katayama, W. Hwang, M. Wordeman, T. Kirihata, A. Satoh,
`S. Munetoh, H. Wong, B. El-Kareh, P. Xio, and R. Joshi
`Session 1.5.4: Asynchronous Architectures
`Chair: Eric Chou, Hewlett-Packard ULSI Labs
`TITAC-2: An asynchronous 32bit microprocessor based on Scalable-Delay-
`Insensitive model ....................................................................................................... 288
`A. Takamura, M. Kuwako, M. Ima, T. Fujii, M. Ozawa, I. Fukasaku,
`Y. Ueno, and T. Nanya
`An Evaluation of Asynchronous and Synchronous Design for Superscalar
`Architectures ........................................................................................................... 295
`A. Davey and D. Loyd
`Synthesis of High Speed Delay-Insensitive Combinational Iterative Tree Circuits .............. 301
`F.-C. Cheng
`Asynchronous Wrapper for Heterogeneous Systems ............................................................... 307
`D.S. Bormann and P.Y.K. Cheung
`Concurrent Session 1.6
`Session 1.6.1
`Panel: The War of the Roses: Designers versus Tool Developers ........................................... 317
`Organizer: Andreas Kuehlmann, IBM T. J. Watson Research Center
`Moderator: Daniel Beece, IBM T. J. Watson Research Center
`Barbara Chappell, Intel Corp.
`Robert Damiano, Synopsys, Inc.
`Charlie Malley, Hewlett-Packard
`Yiannos Manoli, University of Saarland
`David F. Reed, Advanced Micro Devices
`Steven E. Schulz, Texas Instruments
`Session 1.6.2
`Panel: If Software is King for Systems-on-Silicon, What's New in Compilers? ..................... 322
`Organizer: Nikil Dutt, University of California at Irvine
`Moderator: Sharad Malik, Princeton University
`Lex Augusteijn, Philips Research Laboratory
`Beatrice Fu, Intel Corporation
`Alex Nicolau, University of California at Irvine
`Constantine Polychronopoulos, University of Illinois at Urbana-Champaign
`Session 2.1 Design and Test Plenary
`Chair: Magdy Abadir, Motorola, Inc.
`Design and Test: The Lost World .............................................................................................. 328
`W. Joyner, IBM T. J. Watson Research Center
`Concurrent Sessions 2.2
`Session 2.2.1: Binary Decision Diagrams
`Chair: B. Bischoff, Digital Equipment Corporation
`Equivalence Checking Using Abstract BDDs .......................................................................... 332
`S. Jha, Y. Lu, M. Minea, and E.M. Clarke
`Speeding up Variable Reordering of OBDDs ............................................................................ 338
`C. Meinel and A. Slobodova
`Dynamic Reordering in a Breadth-First Manipulation Based BDD
`package: Challenges and Solutions ........................................................................................... 344
`R.K. Ranjan, W. Gosti, R.K. Brayton, and A. Sangiovanni-Vincentelli
`Timed Binary Decision Diagrams .............................................................................................. 352
`Z. Li, Y. Zhao, Y. Min, and R.K. Brayton
`Session 2.2.2: Advanced Test Topics
`Chair: Magdy Abadir, Motorola
`Vector Restoration Based Static Compaction of Test Sequences for Synchronous
`Sequential Circuits ..................................................................................................................... 360
`I. Pomeranz and S.M. Reddy
`Nonenumerative Path Delay Fault Coverage Estimation with
`Optimal Algorithms .................................................................................................................... 366
`D. Kagaris, S. Tragoudas, and D. Karayiannis
`Properties of the Input Pattern Fault Model. ........................................................................... 372
`R.D. (Shawn) Blanton and J.P. Hayes
`A new approach for Initialization Sequences Computation for Synchronous
`Sequential Circuits ..................................................................................................................... 381
`F. Corno, P. Prinetto, M. Rebaudengo, M.S. Reorda, and G. Squillero
`Session 2.2.3: Embedded Software and Systems
`Chair: Rolf Ernst, Technische Universitaet Braunschweig, Germany
`Real Time Operating Systems for Embedded Computing ....................................................... 388
`Y. Li, M. Potkonjak, and W. Wolf
`Allocation and Data Arrival Design of Hard Real Time Systems ........................................... 393
`D.L. Rhodes and W. Wolf
`Improving Design Turnaround Time Via Two-Levels Hw/Sw Co-Simulation ....................... 400
`A. Allara, S. Filipponi, W. Fornaciari, F. Salice, and D. Sciuto
`Session 2.2.4: Low Power Issues
`Chair: Jose L. Cruz-Rivera, University of Puerto Rico-Mayaguez
`Power Constrained Design of Multiprocessor Interconnection Networks .............................. 408
`C.S. Patel, S.M. Chai, S. Yalamanchili, and D.E. Schimmel
`Memory Traffic and Data Cache Behavior of an MPEG-2 Software Decoder ........................ 417
`P. Soderquist and M. Leeser
`Asynchronous Transpose-Matrix Architectures ....................................................................... 423
`J.A. Tierno and P. Kudva
`A Low Power Smart Vision System Based on Active Pixel Sensor
`Integrated with Programmable Neural Processor ................................................................... 429
`W.-C. Fang, G. Yang, B. Pain, and B.J. Sheu
`Concurrent Sessions 2.3
`Session 2.3.1: Formal Verification Methods
`Chair: Warren Hunt, Jr., Computational Logic, Inc.
`Formal Verification of the HAL S1 System Cache Coherence Protocol.. ................................ 438
`A.J. Hu, M. Fujita, and C. Wilson
`A Survey of Techniques for Formal Verification of Combinational Circuits .......................... 445
`J. Jain, A. Narayan, M. Fujita, and A. Sangiovanni-Vincentelli
`Checking Formal Specifications under Simulation .................................................................. 455
`W. Canfield, E.A. Emerson, and A. Saha
`Session 2.3.2: Mixed Signal Design and Test
`Chair: Yervant Zorian, Logic Vision
`Built-In Temperature Sensors for On-line Thermal Monitoring of
`Microelectronic Structures ......................................................................................................... 462
`K. Arabi and B. Kaminska
`Development of Hierarchical Testability Design Methodologies for Analog/Mixed-Signal
`Integrated Circuits ..................................................................................................................... 468
`C.-P. Wang and C.-L. Wey
`A Novel Test Set Design for Parametric Testing of Analog and Mixed-Signal Circuits ........ 474
`J. Chen and A. Ramachandran
`Session 2.3.3: FPGA Design
`Chair: Paul Franzon, North Carolina State University
`On the Construction of Universal Series-Parallel Functions for Logic Module Design ......... 482
`F.Y. Young and D.F. Wong
`An Universal Pezaris Array Multiplier Generator for SRAM-Based FPGAs ......................... 489
`J. Stohmann and E. Barke
`Channel Segmentation Design for Symmetrical FPGAs ......................................................... 496
`W.-K. Mak and D.F. Wong
`Session 2.3.4: Cache Technology I
`Chair: Mauricio Breternitz, Motorola Inc.
`Multi-Column Implementations for Cache Associativity ......................................................... 504
`C. Zhang, X. Zhang, and Y. Yan
`Design and Performance Evaluation of a Cache Assist to Implement
`Selective Caching ........................................................................................................................ 510
`L.K. John and A. Subramanian
`On Effective Data Supply for Multi-Issue Processor ................................................................ 519
`J.A. Rivers, E.S. Tam, and E.S. Davidson
`Concurrent Sessions 2.4
`Session 2.4.1: Embedded Tutorial
`Practical Issue of Interconnect Analysis in Deep Submicron Integrated Circuits ................. 532
`Chair: Andreas Kuehlmann, IBM T. J. Watson Research Center
`Presenter: Kenneth L. Shepard, IBM T. J. Watson Research Center
`Session 2.4.2: Fault Diagnosis
`Chair: Raj Raina, Motorola, Inc.
`First Results of System Level Fault Tolerant Design Validation Through
`Laser Fault Injection .................................................................................................................. 544
`W.A. Moreno, F.J. Falquez, J.R. Samon Jr., and T. Smith
`Integrated Diagnostics for Embedded Memory Built-in Self Test
`on Power PC™ Devices ................................................................................................................ 549
`C. Hunter
`A TSC Evaluation Function for Combinatorial Circuits ............................................................ 555
`C. Bolchini, D. Sciuto, and F. Salice
`Session 2.4.3: Special Session: Low Power Design Issues
`Chair: Sarma Vrudhula, University of Arizona
`An Architectural Power Optimization Case Study Using High-level Synthesis .................... 562
`C.-T. Chen and K. Kucukcakar
`High-Level Design Synthesis of Low Power, VLIW Processor for the IS-54
`VSELP Speech Encoder ............................................................................................................. 571
`R. Henning and C. Chakrabarti
`Session 2.4.4: Cache Technology II
`Chair: Michael Stumm, University of Toronto
`Fast Cache Access with Full-Map Block Directory ................................................................. 578
`J.-K. Peir, W.W. Hsu, H. Young, and S. Ong
`A Data Alignment Technique for Improving Cache Performance ........................................... 587
`P.R. Panda, H. Nakamura, N.D. Dutt, and A. Nicolau
`Instruction Prefetching Using Branch Prediction Information ............................................... 593
`I.-C.K. Chen, C.-C. Lee, and T.N. Mudge
`Session 3.1: Architecture & Algorithm Plenary
`Chair: J. Robert Jump, Rice University
`Is Wireless Data Dead? ............................................................................................................. 604
`Randy Katz, University of California at Berkeley, California
`Concurrent Sessions 3.2
`Session 3.2.1: Layout Partitioning and Synthesis
`Chair: Georg Pelz, Gerhard-Mercator-University-GB, Duisburg, Germany
`An Efficient Multi-Way Algorithm for Balanced Partitioning of VLSI Circuits ................... 608
`X. Tan, J. Tong, P. Tan, N. Park, and F. Lombardi
`Partitioning Under Timing and Area Constraints ................................................................... 614
`G. Tumbush and D. Bhatia
`A Parallel Circuit-Partitioned Algorithm for Timing Driven Standard Cell Placement ....... 621
`J.A. Chandy and P. Banerjee
`Crosstalk-Constrained Maze Routing Based on Lagrangian Relaxation ............................... 628
`H. Zhou and D.F. Wong
`Session 3.2.2: Design for Testability & Test Synthesis
`Chair: Sujit Dey, NEC
`High Level Test Synthesis Across the Boundary of Behavioral and Structural
`Domains ................................................................................................................... 636
`K. Lai, C.A. Papachristou, and M. Baklashov
`Power Driven Partial Scan ............................................................................................ 642
`J.-Y. Jou and M.-C. Nien
`Synthesis of Delay Verifiable Sequential Circuits using Partial Enhanced Scan .................. 648
`R.C. Tekumalla and P.R. Menon
`Application of a Testing Framework to VHDL Descriptions at Different
`Abstraction Levels ..................................................................................................... 654
`M. Bacis, G. Buonanno, F. Ferrandi, F. Fummi, L. Gerli, and D. Sciuto
`Session 3.2.3: Embedded Tutorial
`Practical Advances in Asynchronous Design ........................................................................... 662
`Chair: Rob Roy, NEC, Inc.
`Presenters:
`Eric Brunvand of University of Utah
`Steve Nowick of Columbia University
`Kenneth Yun of University of California at San Diego
`Session 3.2.4: Arithmetics
`Chair: Joe Cavallaro, Rice University
`Benchmarking and Analysis of Architectures for CAD Applications ...................................... 670
`A. Mehrotra, S. Qadeer, R.K. Ranjan, and R.H. Katz
`Fast Low-Energy VLSI Binary Addition ................................................................................ 676
`K.K. Parhi
`A Floating-Point Divider using Redundant Binary Circuits and an Asynchronous
`Clock Scheme .............................................................................................................. 685
`H. Suzuki, H. Mankino, K. Mashiko, and H. Hamano
`Parallel-Array Implementations of A Non-Restoring Square Root Algorithm ....................... 690
`Y. Li and W. Chu
`Concurrent Session 3.3
`Session 3.3.1: Asynchronous Design
`Chair: Daniel Saab, Case Western Reserve University
`Optimizing CMOS Implementations of C-element ................................................................... 700
`M. Shams, J.C. Ebergen, and M.I. Elmasry
`A Doubly-Latched Asynchronous Pipeline ................................................................................ 706
`R. Kol and R. Ginosar
`A Pulse-To-Static Conversion Latch With a Self-Timed Control Circuit ................................ 712
`W. Hwang, R.V. Joshi, and W.H. Henkels
`Session 3.3.2: Special Session: Interconnect Modeling & Repeater Methodologies
`Chair: Byron Krauter, IBM Corp.
`Fast Generation of Statistically-based Worst-Case Modeling of On-Chip Interconnect ........ 720
`N. Chang, V. Kanevsky, O.S. Nakagawa, K. Rahmat, and S.-Y. Oh
`A Repeater Optimization Methodology for Deep Sub-Micron, High-Performance
`Processor .................................................................................................................... 726
`D. Li, A. Pua, P. Srivastava, and U. Ko
`Critical Voltage Transition Logic: An Ultrafast CMOS Logic Family .................................... 732
`Z. Zhu and B.S. Carlson
`Session 3.3.3: Finite-State Machine and High-Level Synthesis
`Chair: Ramin Hojati, University of California at Berkeley
`Divide and Conquer: A Strategy for Synthesis of Low Power Finite State Machines ........... 740
`A. Dasgupta and S. Ganguly
`Estimation of Maximum Power for Sequential Circuits Considering
`Spurious Transitions .................................................................................................................. 746
`C.-Y. Wang and K. Roy
`Dynamic Bounding of Successor Force Computations in the Force Directed
`List Scheduling Algorithm ............................................................................................ 752
`S. Govindarajan and R. Vemuri
`Author Index ............................................................................................................................ 758
`Architectural Adaptation for Application-Specific Locality Optimizations
`Xingbin Zhang*
`Ali Dasdan* Martin Schulz†
`Rajesh K. Gupta‡ Andrew A. Chien*
`*Department of Computer Science
`University of Illinois at Urbana-Champaign
`{zhang, dasdan, achien}@cs.uiuc.edu
`†Institut für Informatik
`Technische Universität München
`schulzm@informatik.tu-muenchen.de
`‡Information and Computer Science, University of California at Irvine
`rgupta@ics.uci.edu
`We propose a machine architecture that integrates
`programmable logic into key components of the system with
`the goal of customizing architectural mechanisms and
`policies to match an application. This approach presents
`an improvement over the traditional approach of exploiting
`programmable logic as a separate co-processor by
`preserving machine usability through software, and over
`traditional computer architecture by providing
`application-specific hardware assists. We present two case studies of
`architectural customization to enhance latency tolerance
`and efficiently utilize network bisection on
`multiprocessors for sparse matrix computations. We demonstrate that
`application-specific hardware assists and policies can
`provide substantial improvements in performance on a
`per-application basis. Based on these preliminary results, we
`propose that application-driven machine customization
`provides a promising approach to achieve high performance
`and combat performance fragility.
`1 Introduction
`Technology projections for the coming decade [1] point
`out that system performance is going to be increasingly
`dominated by intra-chip interconnect delay. This presents
`a unique opportunity for programmable logic, as the
`interconnect dominance reduces the contribution of per-stage
`logic complexity to performance and the marginal costs
`of adding switching logic in the interconnect. However,
`the traditional co-processing architecture of exploiting
`programmable logic as a specialized functional unit to
`deliver a specific application suffers from the problem of
`machine retargetability. A system generated using this
`approach typically cannot be retargeted to another application
`without repartitioning hardware and software functionality
`and reimplementing the co-processing hardware. This
`retargetability problem is an obstacle toward exploiting
`programmable logic for general purpose computing.
`We propose a machine architecture that integrates
`programmable logic into key components of the system with
`the goal of customizing architectural mechanisms and
`policies to match an application. We base our design on the
`premise that communication is already critical and getting
`increasingly so [17], and flexible interconnects can be used
`to replace static wires at competitive performance [6, 9, 20].
`Our approach presents an improvement over co-processing
`by preserving machine usability through software, and over
`traditional computer architecture by providing
`application-specific hardware assists. The goal of application-specific
`hardware assists is to overcome the rigid architectural
`choices in modern computer systems that do not work well
`across different applications and often cause substantial
`performance fragility. Because performance fragility is
`especially apparent in memory performance on systems with
`deep memory hierarchies, we present two case studies of
`architectural customization to enhance latenc