throbber
High Performance Reconfigurable
`
`Computing for Science and
`
`Engineering Applications
`
`by
`
`Peter Leonard McMahon
`
`Submitted to the Department of Electrical Engineering
`
`in partial fulfillment of the requirements for the degree of
`
`Bachelor of Science in Electrical and Computer Engineering
`
`at the
`
`University of Cape Town
`
`October 2006
`
`Advisor: Professor Michael R. Inggs
`
`

`

`Abstract
`
`This thesis investigates the feasibility of using reconfigurable computers for
`
`scientific applications. We review recent developments in reconfigurable high
`
`performance computing. We then present designs and implementation details
`
`of various scientific applications that we developed for the SRC-6 reconfig-
`
`urable computer. We present performance measurements and analysis of the
`
`results obtained.
`
`We chose a selection of applications from bioinformatics, physics and
`
`financial mathematics, including automatic docking of molecular models into
`
`electron density maps, lattice gas fluid dynamics simulations, edge detection
`
`in images and Monte Carlo options pricing simulations.
`
`We conclude that reconfigurable computing is a maturing field that may
`
`provide considerable benefit to scientific applications in the future. At present
`
`the performance gains offered by reconfigurable computers are not sufficient
`
`to justify the expense of the systems, and the software development environ-
`
`ment lacks the language features and library support that application devel-
`
`opers require so that they can focus on developing correct software rather
`
`than on software infrastructure.
`
`

`

`Contents
`
`1 Introduction
`
`1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . .
`
`1.2 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
`
`1.2.1
`
`Investigate the State-of-the-art in Reconfigurable Com-
`
`puting . . . . . . . . . . . . . . . . . . . . . . . . . . .
`
`1.2.2
`
`Implement Scientific Computing Algorithms on Recon-
`
`figurable Computers
`
`. . . . . . . . . . . . . . . . . . .
`
`1.2.3 Analyze the Performance of Scientific Applications on
`
`Reconfigurable Computers . . . . . . . . . . . . . . . .
`
`1.2.4 Provide Guidance on the Methodology for Developing
`
`Software for Reconfigurable Computers . . . . . . . . .
`
`1.3 Motivation for Problems Studied . . . . . . . . . . . . . . . .
`
`1.4 Thesis Outline and Summary . . . . . . . . . . . . . . . . . .
`
`1
`
`1
`
`3
`
`3
`
`4
`
`4
`
`4
`
`5
`
`6
`
`2 An Introduction to Reconfigurable Computing
`
`10
`
`2.1 Reconfigurable Computing Hardware . . . . . . . . . . . . . . 11
`
`2.1.1 Where do reconfigurable computers get their speed from? 12
`
`2.2 Reconfigurable Computing Software . . . . . . . . . . . . . . . 13
`
`2.3 Measuring Performance in Reconfigurable Computing Systems
`
`14
`
`2.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
`
`3 Monte Carlo Methods on Reconfigurable Computers
`
`16
`
`3.1 Monte Carlo Methods
`
`. . . . . . . . . . . . . . . . . . . . . . 16
`
`3.2 Monte Carlo Estimation of π . . . . . . . . . . . . . . . . . . . 17
`
`i
`
`

`

`3.2.1
`
`Implementation of a Parallel Pseudorandom Number
`
`Generator . . . . . . . . . . . . . . . . . . . . . . . . . 19
`
`3.2.2 Design and Implementation of the Monte Carlo π Es-
`
`timator
`
`. . . . . . . . . . . . . . . . . . . . . . . . . . 25
`
`3.2.3 Performance Results
`
`. . . . . . . . . . . . . . . . . . . 29
`
`3.3 Monte Carlo Options Pricing . . . . . . . . . . . . . . . . . . . 32
`
`3.3.1 Pricing Asian Options with Monte Carlo . . . . . . . . 33
`
`3.3.2 Generating Normal Random Variables
`
`. . . . . . . . . 36
`
`3.3.3 Design and Implementation . . . . . . . . . . . . . . . 37
`
`3.3.4 Performance Results
`
`. . . . . . . . . . . . . . . . . . . 43
`
`3.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
`
`4 Cellular Automata Simulations on Reconfigurable Comput-
`
`ers
`
`48
`
`4.1 An Introduction to Cellular Automata . . . . . . . . . . . . . 49
`
`4.1.1 One-dimensional Cellular Automata . . . . . . . . . . . 49
`
`4.1.2 Two-dimensional Cellular Automata . . . . . . . . . . 51
`
`4.2 Conway’s Game of Life . . . . . . . . . . . . . . . . . . . . . . 52
`
`4.2.1 Design and Implementation . . . . . . . . . . . . . . . 53
`
`4.2.2 Performance Results
`
`. . . . . . . . . . . . . . . . . . . 71
`
`4.3 Fluid Dynamics Simulations using the Lattice Gas Method . . 73
`
`4.3.1 Design and Implementation . . . . . . . . . . . . . . . 75
`
`4.3.2 Performance Results
`
`. . . . . . . . . . . . . . . . . . . 79
`
`4.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
`
`5 Image Processing – Edge Detection on Reconfigurable Com-
`
`puters
`
`81
`
`5.1 An Introduction to the Sobel Edge Detection Algorithm . . . 81
`
`5.2 Edge Detection on a Reconfigurable Computer . . . . . . . . . 84
`
`5.2.1 Design and Implementation . . . . . . . . . . . . . . . 84
`
`5.2.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
`
`5.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
`
`ii
`
`

`

`6 Automatic Macromolecular Docking on Reconfigurable Com-
`
`puters
`
`91
`
`6.1 Macromolecular Docking using Global Correlation . . . . . . . 93
`
`6.1.1 Design and Implementation . . . . . . . . . . . . . . . 94
`
`6.1.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
`
`6.2 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
`
`7 Conclusion
`
`99
`
`7.1 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
`
`7.2 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
`
`A Monte Carlo Methods
`
`101
`
`A.1 A Review of Monte Carlo Methods
`
`. . . . . . . . . . . . . . . 101
`
`A.1.1 An Early Monte Carlo Algorithm . . . . . . . . . . . . 101
`
`A.1.2 Numerical Integration using Monte Carlo Methods
`
`. . 102
`
`A.1.3 Monte Carlo, Beyond Simple Integration . . . . . . . . 104
`
`A.2 A Review of Parallel Pseudorandom Number Generation . . . 105
`
`A.2.1 Generating Random Numbers on Deterministic Com-
`
`puters . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
`
`A.2.2 Parallel Pseudorandom Number Generation . . . . . . 108
`
`A.2.3 Assessing the Quality of Pseudorandom Number Se-
`
`quences
`
`. . . . . . . . . . . . . . . . . . . . . . . . . . 110
`
`B The SRC-6 Reconfigurable Computer
`
`114
`
`iii
`
`

`

`List of Figures
`
`1.1 A generic reconfigurable computer architecture.
`
`. . . . . . . .
`
`2
`
`2.1 Computational density of FPGAs and Intel microprocessors.
`
`Image from ref. [6]. . . . . . . . . . . . . . . . . . . . . . . . . 12
`
`2.2 Measurement of processing time for an application running on
`
`a reconfigurable computer. Image from ref. [9]. . . . . . . . . . 14
`
`3.1 Scatter plot of (x, y) pairs randomly sampled from [0, 1]2. . . . 18
`3.2 Processing engines in the FPGA, each independently generat-
`
`ing pseudorandom numbers.
`
`. . . . . . . . . . . . . . . . . . . 20
`
`3.3 Simulation of a set of 3 parallel pseudorandom number gener-
`
`ators. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
`
`3.4 The operation of a single processing engine.
`
`. . . . . . . . . . 22
`
`3.5 Top-level design for random number generators on two FPGAs. 23
`
`3.6 Processing engines each performing N simulations to estimate
`
`π. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
`
`3.7 Percentage of FPGA resources used on a single FPGA, as a
`
`function of the number of processing engines included.
`
`. . . . 28
`
`3.8 Performance of Monte Carlo estimation of π on an SRC-6
`
`MAPe with 15 processing engines, using one FPGA, compared
`
`with that of an x86 processor.
`
`. . . . . . . . . . . . . . . . . . 30
`
`3.9 Performance of Monte Carlo estimation of π on an SRC-6
`
`MAPe with 30 processing engines, using two FPGAs, com-
`
`pared with that of an x86 processor.
`
`. . . . . . . . . . . . . . 31
`
`iv
`
`

`

`3.10 Performance of Monte Carlo estimation of π on an SRC-6
`
`MAPe, using one FPGA, as a function of the number of process-
`
`ing engines used.
`
`. . . . . . . . . . . . . . . . . . . . . . . . . 32
`
`3.11 Histogram, with 500 bins, of 200000 random numbers gen-
`
`erated on the SRC-6 (blue), and 2000000 random numbers
`
`generated using MATLAB’s randn function, scaled (red). . . . 37
`
`3.12 Processing engines each generating N independent paths of
`
`stock price movements to simulate possible payoff scenarios.
`
`. 39
`
`3.13 Performance of a Monte Carlo options pricing simulation on an
`
`SRC-6 MAPe, using one FPGA, showing the speed difference
`
`between unflattened loops and flattened loops. . . . . . . . . . 44
`
`3.14 Performance of a Monte Carlo options pricing simulation on
`
`an SRC-6 MAPe, compared with that of an x86 processor.
`
`. . 46
`
`4.1 Examples of Birth, Survival and Death in the Game of Life.
`
`. 52
`
`4.2 A grid representing a configuration in the Game of Life cellular
`
`automaton.
`
`. . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
`
`4.3 A grid decomposed into four subgrids, by dividing the grid
`
`vertically and horizontally.
`
`. . . . . . . . . . . . . . . . . . . . 55
`
`4.4 A grid decomposed into five subgrids, by dividing the grid into
`
`horizontal strips.
`
`. . . . . . . . . . . . . . . . . . . . . . . . . 55
`
`4.5 A grid decomposed into five subgrids, showing the ‘ghost re-
`
`gions’ for each slice. . . . . . . . . . . . . . . . . . . . . . . . . 57
`
`4.6 Sequence of processor communication to synchronize ghost re-
`
`gions, while avoiding deadlock. . . . . . . . . . . . . . . . . . . 59
`
`4.7 Processing engine performing an N timestep simulation, us-
`
`ing the design that all communication is done between the
`
`processing engines.
`
`. . . . . . . . . . . . . . . . . . . . . . . . 61
`
`4.8 Processing engines performing an N timestep simulation. The
`
`program forks processing engines, then joins them once every
`
`iteration. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
`
`v
`
`

`

`4.9 Layout and flow of the data in the system during a cellular
`
`automata simulation. . . . . . . . . . . . . . . . . . . . . . . . 66
`
`4.10 Performance of a Game of Life simulation on an SRC-6 MAPe,
`
`compared with that of an x86 processor.
`
`. . . . . . . . . . . . 72
`
`4.11 Transition of an FHP lattice gas automata. Solid arrows show
`
`the configuration at the current timestep, and hollow arrows
`
`show the configuration at the next timestep. Image from Luo,
`
`ref. [37].
`
`. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
`
`4.12 Example transition rules for FHP lattice gas automata. Image
`
`from Luo, ref. [37].
`
`. . . . . . . . . . . . . . . . . . . . . . . . 75
`
`4.13 Layout and flow of the data in the system during a lattice gas
`
`automata simulation. . . . . . . . . . . . . . . . . . . . . . . . 77
`
`4.14 Performance of a Lattice Gas Automata simulation on an
`
`SRC-6 MAPe, with 5 processing engines, using one FPGA,
`
`compared with that of an x86 processor.
`
`. . . . . . . . . . . . 80
`
`5.1 Processing engines each performing Sobel edge detection on a
`
`slice of the input image.
`
`. . . . . . . . . . . . . . . . . . . . . 85
`
`5.2 Layout and flow of the data in the system during edge detection. 87
`
`5.3 Original image (left). Image with detected edges shown (right). 88
`
`5.4 Caching pixels to reduce the number of reads from onboard
`
`memory.
`
`. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
`
`6.1 A macromolecular model (yellow) docked in an electron mi-
`
`croscope density map (blue mesh). Image from ref. [45].
`
`. . . 92
`
`6.2 Fitting a macromolecular model by rotation and translation.
`
`Image from ref. [46].
`
`. . . . . . . . . . . . . . . . . . . . . . . 93
`
`6.3 An overview of the global correlation docking algorithm. Im-
`
`age from ref. [46]. . . . . . . . . . . . . . . . . . . . . . . . . . 95
`
`6.4 Processing engines each search to find a maximum correlation
`
`for the Euler angles they are assigned.
`
`. . . . . . . . . . . . . 97
`
`vi
`
`

`

`B.1 The SRC-6 in a rack at the National Center for Supercomput-
`
`ing Applications.
`
`. . . . . . . . . . . . . . . . . . . . . . . . . 115
`
`B.2 The architecture of the SRC MAP module. . . . . . . . . . . . 115
`
`vii
`
`

`

`List of Tables
`
`3.1 Results of statistical tests for quality of a pseudorandom se-
`
`quence generated by 15 processing engines. . . . . . . . . . . . 24
`
`3.2 Place and Route results for a two-FPGA, 30 processing engine
`
`Monte Carlo simulation to estimate π.
`
`. . . . . . . . . . . . . 27
`
`3.3 Place and Route results for a one-FPGA, one processing en-
`
`gine Monte Carlo options pricing simulation. . . . . . . . . . . 42
`
`3.4 Place and Route results for a one-FPGA, two processing en-
`
`gine Monte Carlo options pricing simulation. . . . . . . . . . . 43
`
`4.1 Place and Route results for a one-FPGA, four-processing en-
`
`gine Game of Life cellular automata simulation.
`
`. . . . . . . . 69
`
`4.2 Place and Route results for a one-FPGA, eight-processing en-
`
`gine Game of Life cellular automata simulation.
`
`. . . . . . . . 70
`
`4.3 Place and Route results for a one-FPGA, five-processing en-
`
`gine Lattice Gas Automata simulation.
`
`. . . . . . . . . . . . . 78
`
`5.1 Place and Route results for a one-FPGA, three processing en-
`
`gine edge detection program. . . . . . . . . . . . . . . . . . . . 87
`
`5.2 Performance of Sobel Edge Detection on an SRC-6 MAPe,
`
`with three processing engines, using one FPGA, compared
`
`with that of an x86 processor.
`
`. . . . . . . . . . . . . . . . . . 89
`
`viii
`
`

`

`Acknowledgements
`
`This thesis marks the culmination of my undergraduate career, and so I
`
`find this to be the perfect opportunity to acknowledge those individuals who
`
`have enriched my undergraduate experience, making it immeasurably more
`
`beneficial and exciting than it otherwise might have been. Of course by
`
`naming people, I run the risk of making serious omissions. If I do, I apologise
`
`and ask that you know that I am not any less grateful for your help over the
`
`years.
`
`First and foremost, I am exceedingly grateful to my advisor, Professor
`
`Michael Inggs. He granted me an exceptional amount of independence so
`
`that I could pursue what I thought to be the most interesting and impor-
`
`tant problems, and in so doing forced me to see first-hand what the world
`
`of research is about. This academic freedom has allowed me to enjoy an
`
`excellent six months following my every intellectual whim, and being able to
`
`count most of it as work! I am thankful for his unreasonable confidence in a
`
`student he had barely met, and for his willingness to put his reputation (and
`
`funds!) on the line by recommending me to his collaborators and contacts,
`
`and then leaving me to ‘go forth and do good things’. I can only hope that
`
`the final result, this thesis, is an apt repayment of his faith in me.
`
`This thesis would not exist as it stands were it not for the help I have
`
`received from a large number of people. I would like to extend my gratitude
`
`to all the folk at the National Center for Supercomputing Applications at
`
`the University of Illinois for their hospitality and support. Dr Radha Nand-
`
`kumar was instrumental in bringing me over to the United States for two
`
`months so that I could work with the NCSA’s Innovative Systems Labora-
`
`tory Reconfigurable Computing group, and so I am especially indebted to
`
`her. Thank you, Radha! I was apparently the first visitor in an experimental
`
`‘International Affiliates’ program, and although I certainly gained more from
`
`the experience than I was able to give back to the NCSA, I hope they found
`
`my visit to be a beneficial exercise.
`
`Dr Craig Steffen was my official host at the NCSA, and I would like to
`
`ix
`
`

`

`thank him for the logistical, technical and intellectual support that he pro-
`
`vided during my stay. Craig did far more than I ever could have reasonably
`
`imagined he would - from driving for 6 hours to Chicago to pick me up ‘so
`
`that I wouldn’t have to worry about catching the train’, to making sure
`
`I had literally everything I needed to start working straight away. I spent
`
`many hours with Craig pestering him about technical matters, discussing ap-
`
`proaches to problems and otherwise learning from him. He was an excellent
`
`mentor. Craig also had to put up with an enormous number of questions to
`
`satisfy my child-like curiosity about why things are the way they are in the
`
`U.S. - such as what the holes in electrical plug pins are for, why there is a
`
`big, slit rubber membrane in kitchen sinks, and why so many houses are built
`
`out of wood. Craig, thanks again for the lunches out and for periodic invites
`
`to your home for dinner. Thanks too, to Craig’s wife Becky for sharing her
`
`home with me and for the cooking. I hope to repay the debt to both of you
`
`one day.
`
`The ISL was an exceptionally friendly and productive group to work in. I
`
`had a lot of fun on the third floor - thanks, guys. Besides the joking around,
`
`I also learned a great deal from discussions I had with you all. Dr Volodymyr
`
`‘Vlad’ Kindratenko and David Pointer certainly deserve special mention for
`
`their technical guidance and general willingness to help me whenever I was
`
`stuck or in uncharted waters. Both Vlad and David went out of their way
`
`to help with questions I had that they didn’t have the answers to. This
`
`wasn’t very often though - their technical knowledge was an inspiration, and
`
`a resource from which I benefited regularly. Thank you.
`
`At the Reconfigurable Systems Summer Institute in Urbana in July, I
`
`met Dan Poznanovic and Dr Jeff Hammes, and would like to acknowledge
`
`helpful discussions with them. In particular, Jeff, who is a lead developer of
`
`the MAPC compiler at SRC, provided numerous useful pieces of advice and
`
`workarounds to compiler limitations.
`
`I was able to investigate the diverse mix of academic domains in this thesis
`
`that I did thanks to the generosity of several researchers who individually
`
`x
`
`

`

`agreed to meet with me and give me ‘crash courses’ in their disciplines and
`
`work, with virtually no hope of any payoff. Dr Dan Jacobsen from the
`
`National Bioinformatics Network drove out to UCT to chat with me about his
`
`work in integrating biological databases. The staff at the Square Kilometer
`
`Array / Karoo Array Telescope offices in Pinelands kindly showed me round
`
`one afternoon, and gave me a useful overview of the projects and their needs.
`
`Dr Athol Kemball, an astronomer at the NCSA, offered helpful advice. Artur
`
`Szostak, Gareth de Vaux and Bruce Becker from the UCT-ALICE group
`
`spent hours explaining the functioning of the dimuon High Level Trigger in
`
`the ALICE project to me, and giving me advice on where I would be best
`
`advised to focus my efforts.
`
`I am particularly grateful to Dr Zeblon Vilakazi from UCT-ALICE and
`
`iThemba Labs. Dr Vilakazi has spent a great deal of time discussing particle
`
`physics and the engineering behind detectors with me since I first approached
`
`him about his work on the ALICE project in 2004.
`
`I have learned a lot
`
`from him about what modern experimental physics is all about, even if his
`
`confidence in my ability as a dabbling physicist is far greater than my own!
`
`I also thoroughly enjoyed a short visit I had to CERN in June 2006 at Dr
`
`Vilakazi’s invitation - it was certainly a memorable experience, and a much
`
`appreciated detour on my way to Illinois.
`
`Dr Michelle Kuttel, besides her direct impact on this thesis, in the form
`
`of the work on atomic model docking in electron density maps and on cel-
`
`lular automata, has undoubtedly had a far more general influence as well.
`
`Dr Kuttel has been an exceptional mentor since I did a class project under
`
`her guidance in 2004. I have spent innumerable hours in her office discussing
`
`research in computer science and computational science, and academia and
`
`research in general. Besides stimulating intellectual conversations, Dr Kut-
`
`tel has also given me numerous opportunities to develop - by allowing me
`
`to guest lecture in her Distributed Systems course, involving me in the Sci-
`
`entific Clustering Applications Workshops and allowing me to speak at the
`
`second SCAW. She has also patiently explained computational chemistry to
`
`xi
`
`

`

`a neophyte and walked me through her atomic model docking work. In the
`
`CS2 project I took with her, I learned more about cellular automata that
`
`I otherwise might have, and had an exceptionally fun time building a CA
`
`lattice gas simulator with Shen Tian. This project, although not strictly
`
`research, was probably the closest I had come at the time.
`
`Shen, besides being a terrific friend, was the best project partner I could
`
`ever ask or hope for. His technical brilliance is nearly unmatched, and he is
`
`a hard and dedicated worker (when he isn’t distracted by Google or anime!).
`
`Our project in second year was undoubtedly the most exciting group project
`
`I’ve been involved in, and I benefited enormously from it, largely due to
`
`Shen’s involvement.
`
`I have continued our work in this thesis, and it goes
`
`almost without saying that had I not undertaken the original lattice gas
`
`project with Shen, this would not have happened.
`
`I would like to thank Justin Kelleher (now back in Ireland), Professor
`
`Andrew Hutchison and Professor Pieter Kritzinger from the Data Network
`
`Architectures group, as well as the various group students I’ve had interac-
`
`tions with, for involving me in the DNA activities, which gave me another
`
`perspective on what a research group does and how it goes about its busi-
`
`ness. Justin during his time at UCT was a great champion for me, and I had
`
`an excellent time discussing the state-of-the-art in software engineering with
`
`him. I would also like to especially thank Prof. Hutchison for giving me a
`
`glimpse into the security research field, and for presenting our paper at ISSA
`
`2006 while I was away.
`
`Professor Vasco Brattka, UCT’s lone quantum computing theory expert,
`
`has taught me a tremendous amount about what theory research is like.
`
`His quantum computing course was truly superb - flawlessly prepared, with
`
`excellent notes. Prof. Brattka’s deep insights into the mathematical founda-
`
`tions of quantum computing have been enlightening and inspiring. He has
`
`the enviable quality of never jumping to conclusions, and always insists on
`
`a well-thought-out argument before settling on any statement. I am grateful
`
`for his patiently sitting through descriptions of my hair-brained algorithm
`
`xii
`
`

`

`ideas. His gentle encouragement to continue working and thinking despite
`
`my minimal advances were much appreciated.
`
`Besides the abovementioned faculty, I have encountered several other lec-
`
`turers who have made my time at UCT more enjoyable. Professor Martin
`
`Braae, who headed the Department of Electrical Engineering for my first
`
`three years here, and Deputy Dean Professor Barry Downing have both been
`
`very supportive of my efforts, never quibbling when I broke rules on curricu-
`
`lum changes or otherwise tried to jump over bureaucratic barriers. Professor
`
`Anthony Chan’s seemingly unbreakable spirit and work ethic have been an
`
`inspiration to watch. In no particular order, Mr Alan Rynhoud, Mr Stephen
`
`Schrire, Professor Jonathan Tapson, Dr Fred Nicolls, Dr Andrew Wilkinson,
`
`Mr Samuel Ginsberg, Mr Simon Winberg, Professor Cathal Seoighe, Pro-
`
`fessor Raoul Viollier, Professor Sandro Perez, Dr Roger Fearick, Dr Gary
`
`Tupper and Professor Andy Buffler have all taught me at some stage, and
`
`were memorable for their depth of knowledge in their respective fields and
`
`the quality of their expositions. Mme. Ewa Swida-Reid provided an oft en-
`
`tertaining outlet for ‘les ing´enieurs’, in our solitary foray into the humanities.
`
`Not entirely outside the academic realm, I would like to thank the editors
`
`at Varsity when I worked at the paper, Steven Kenyon, David Wilson and
`
`Cathryn Reece, and the rest of the staff, for letting this engineer pretend to
`
`be a journalist for a few hours each week.
`
`Last but not least, my friends and family deserve my thanks for their
`
`support.
`
`In addition, the Smuts community has been excellent, and I’m
`
`especially grateful to my peers in the Electrical Engineering department for
`
`creating a friendly environment in which to work and learn, and for teaching
`
`me all that they have. Thanks finally to the Copeland tribe — Dean, Devin,
`
`Jason and Mike — for an excellent year, on top of those that went before.
`
`xiii
`
`

`

`Chapter 1
`
`Introduction
`
`This thesis investigates the feasibility of using reconfigurable computing tech-
`
`nology for performing scientific computations. In this introduction, we pro-
`
`vide a brief background and motivation for this investigation, provide details
`
`of the objectives of the thesis, and outline the contents of the thesis.
`
`1.1 Background
`
`The past four decades has seen an exponential rise in the speed of processors.
`
`Moore’s Law, a prediction made by Gordon Moore of Intel in 1965, has
`
`proven surprisingly accurate — the number of transistors on processors has
`
`doubled nearly every 24 months since Moore’s prediction that this would be
`
`the case. The ability to fabricate chips with more transistors has resulted in
`
`proportionate speed increases.
`
`However, during the past 3 years, microprocessor manufacturers have
`
`been experiencing difficulty in dramatically increasing the performance of
`
`their processors. Clock speed increases have all but come to a halt due to
`
`limitations with present fabrication technology, which has resulted in manu-
`
`facturers seeking alternative means to providing consumers with greater per-
`
`formance. Unfortunately no revolutionary architectural changes have been
`
`forthcoming, so the manufacturers have instead simply opted to build mul-
`
`1
`
`

`

`ticore chips.
`
`Power consumption on present microprocessors has also become a con-
`
`siderable issue, besides the problems that power dissipation causes with at-
`
`tempts to increase performance. Large high performance computing centres
`
`often consume megawatts of power, leading to electricity being a considerable
`operating expense1. Thus there is also an economic motivation to investigate
`
`lower power technologies.
`
`Field Programmable Gate Arrays (FPGAs) have long been used in digi-
`
`tal signal processing (DSP) applications, where relatively simple algorithms,
`
`involving primarily integer operations, are performed on large quantities of
`
`data. Over the past decade, several projects have been initiated to investi-
`
`gate the use of FPGAs for general-purpose scientific computation [4, 5, 6].
`
`This ideal led to its natural extension — the development of hybrid com-
`
`putational machines that use both traditional microprocessors and FPGAs.
`
`Such hybrid architectures, known as reconfigurable computers have recently
`
`been introduced commercially by vendors such as Cray, SRC and SGI.
`
`Figure 1.1: A generic reconfigurable computer architecture.
`
`Successes with current reconfigurable computers have largely been lim-
`
`1The National Center for Supercomputing Applications at the University of Illinois
`currently uses more than 3MW of power. The planned National Science Foundation
`petaflop supercomputer may require upwards of 20MW.
`
`2
`
`

`

`ited to specific signal processing applications or other specialized algorithms.
`
`However, it is certainly possible [6] that reconfigurable computers may be-
`
`come important technology for more general high performance computations,
`
`and in particular scientific simulation and computation.
`
`1.2 Objectives
`
`In this thesis we aimed to investigate the state-of-the-art in reconfigurable
`
`computing, and analyze the ability of current reconfigurable computing tech-
`
`nology to perform scientific computations. Methodology for developing soft-
`
`ware for reconfigurable computers was also to be investigated.
`
`These overall objectives are now described in more detail.
`
`1.2.1
`
`Investigate the State-of-the-art in Reconfigurable
`
`Computing
`
`A sizeable number of academic and commercial projects have been attempted
`
`over the past decade, with the intention of furthering the use of FPGAs, and
`
`more generally reconfigurable computers, in computationally intensive envi-
`
`ronments outside of signal processing. Academic projects include SPLASH
`
`from the early 1990s [4], the MIT FPGA-based Cellular Automata Machine
`
`from the mid-1990s, the Berkeley Emulation Engine (BEE) and its succes-
`
`sor, BEE2 [6], and Brigham Young University and Boston University’s ef-
`forts, amongst others. Commercial offerings now include those from Cray2,
`
`SGI, SRC, Nallatech and Linux Networx. Xilinx, the world’s leading FPGA
`
`manufacturer, has also recently demonstrated its own compiler efforts for a
`
`high-level language for programming its FPGAs.
`
`One of the broader aims of this thesis was to survey the current state-of-
`
`the-art in reconfigurable computing, to look for key differences between the
`
`2The current understanding in the industry is that Cray is discontinuing its XD1 line
`of reconfigurable computers, but may licence to third parties the technology it acquired
`from Octigabay that led to the development of the Cray XD1.
`
`3
`
`

`

`available technologies, and provide a review of what a researcher new to the
`
`area needs to know.
`
`1.2.2
`
`Implement Scientific Computing Algorithms on
`
`Reconfigurable Computers
`
`A core aim of the thesis is the implementation of several scientific computing
`
`codes on a reconfigurable computer. We aimed to identify several scientific
`
`computing problems that are computationally intensive, and of interest to re-
`
`searchers in South Africa, and rewrite them for execution on a reconfigurable
`
`computer. This, of course, implies the need to investigate how to develop
`
`software for reconfigurable computers.
`
`The specific problems we chose to implement are: Monte Carlo Simula-
`
`tions; Cellular Automata Simulations; Edge Detection (Image Processing)
`
`and Macromolecular Docking.
`
`1.2.3 Analyze the Performance of Scientific Applica-
`
`tions on Reconfigurable Computers
`
`For each problem implemented on a reconfigurable computer, we aimed to
`
`analyze the performance of the implementation on the reconfigurable ver-
`
`sus its performance on a classic x86 microprocessor architecture. We also
`
`aimed to investigate how performance on reconfigurable computers scales.
`
`Finally, we aimed to analyze the performance and available resources to de-
`
`termine which aspects of current reconfigurable computers are performance
`
`bottlenecks.
`
`1.2.4 Provide Guidance on the Methodology for De-
`
`veloping Software for Reconfigurable Computers
`
`During the implementation of algorithms on a reconfigurable computer, we
`
`expected to find some effective method for parallelizing and porting code.
`
`4
`
`

`

`We also expected to encounter a series of technical difficulties, given the
`
`immature nature of the technology, and aimed to provide a brief reference in
`
`this thesis that gives our solutions to common problems one might encounter
`
`when porting code.
`
`1.3 Motivation for Problems Studied
`
`As we have mentioned, we implemented four different types of algorithm on
`
`reconfigurable computers during our study. These were Monte Carlo Simu-
`
`lations; Cellular Automata Simulations; Edge Detection (Image Processing)
`
`and Macromolecular Docking.
`
`In selecting algorithms to implement, one of our broad objectives was to
`
`pick those that would be of interest to researchers in South Africa. The first
`
`three problems that we selected are very broad, and naturally satisfy this
`
`criterion: Monte Carlo simulations are used in a wide variety of scientific
`
`computing areas, including computational chemistry and physics, as well as
`
`financial mathematics. Cellular automata are less widely used than Monte
`
`Carlo simulations, but have equally broad application, as Wolfram [10] has
`
`demonstrated.
`
`Image processing algorithms, including edge detection, are
`
`also widely used in industry in South Africa, and there are several research
`
`groups in academia and industry that work extensively with such algorithms.
`
`Macromolecular docking is a far more specific application than the pre-
`
`ceding three. It is under active study as part of the National Bioinformatics
`
`Network research programme, and is an important research topic in experi-
`
`mental techniques for structural biology.
`
`In selecting the algorithms to implement, we also aimed to pick a set
`
`of algorithms that were dissimilar, so that we could observe how different
`
`types of algorithms performed on reconfigurable computers.
`
`In our study
`
`of Monte Carlo simulations, we implemented a simple estimator of π that
`
`wasn’t very floating-point calculation intensive, and a financial options pric-
`
`ing simulation, which was quite floating-point calculation intensive. Neither
`
`5
`
`

`

`of these applications were data intensive. The cellular automata models we
`
`implemented were both discrete, and did not involve any floating-point cal-
`
`culations. Relatively large amounts of data did, however, need to be handled.
`
`The edge detection algorithm was also data-intensive, and involved only in-
`
`teger arithmetic. Its advantage was that is performance on FPGAs and on
`
`microprocessors is fairly well known, so it provides a good basis for compar-
`
`ison. Finally, the docking problem involves a large amount of floating-

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket