G. Chillemi et al. / Computer Physics Communications 139 (2001) 1-19
In fact, the interaction field of a many-body system is usually expressed as an empirical potential among the atoms
(particles) of the many-body system under study, and it should include energy terms that represent bonded and
non-bonded (van der Waals and Coulombic forces) interactions. Limiting our attention to the simulation of
a biological system, a typical force field used in classical MD simulations looks as follows:
`
    V = \sum_{bonds} K_b (b - b_0)^2 + \sum_{angles} K_\theta (\theta - \theta_0)^2
      + \sum_{dihedrals} K_\phi [1 + \cos(n\phi - \delta)]
      + \sum_{impropers} K_\xi (\xi - \xi_0)^2                                          (10a)

      + \sum_{i<j} [ 4\epsilon_{ij} ( (\sigma_{ij}/r_{ij})^{12} - (\sigma_{ij}/r_{ij})^{6} )
      + q_i q_j / (4\pi\epsilon_0 r_{ij}) ]                                             (10b)
The first four terms (Eq. (10a)) represent the bonded potential. The last one (Eq. (10b)) gives the non-bonded
energy contribution to the total potential, and it is the most time-consuming task (85-90% of the total CPU time).
Each pair interaction ij in Eq. (10b) has to be calculated to obtain the energy due to the non-bonded forces. The
number of pairs to be calculated is halved using Newton's third law F_ij = -F_ji.
The number of pair interactions can be further reduced by including only atom pairs within a cutoff distance.
Therefore, at each step a generic atom i has a set of j different atoms to interact with (the non-bonded pair list).
This list can be updated every n steps, with n >= 10.
Parallelization can be achieved with a particle decomposition scheme, i.e. a number of i particles is assigned to
each processor (home particles) [7]. Then, each processor will independently calculate the pair interactions of the
i home particles with the j particles.
It is worth noting that on a shared memory machine the replicated data model is automatically obtained, i.e. data
about every atom in the system is available to each processor, while on a distributed memory architecture the position
of each particle, and the force acting on it, have to be communicated between the machine processors.
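On a shared-memory machine the particle decomposition above maps naturally onto threads. The following POSIX-threads sketch is purely illustrative (the names and the trivial pairwise "interaction" are our own; a real MD code would accumulate forces here): each thread owns a contiguous block of home particles and writes only to its own slot, so no locking is needed.

```c
#include <pthread.h>

/* Particle-decomposition sketch: thread t handles home particles
   [t*NPART/NTHREADS, (t+1)*NPART/NTHREADS) against all particles,
   reading the replicated position array and writing its own result. */
#define NPART 1000
#define NTHREADS 4

static double pos[NPART];
static double result[NTHREADS];

static void *work(void *arg)
{
    int t = (int)(long)arg;
    int lo = t * NPART / NTHREADS, hi = (t + 1) * NPART / NTHREADS;
    double acc = 0.0;
    for (int i = lo; i < hi; i++)        /* home particles of thread t */
        for (int j = 0; j < NPART; j++)  /* against all particles      */
            if (i != j) acc += pos[i] * pos[j];
    result[t] = acc;                     /* private slot: no lock      */
    return NULL;
}

double total_interaction(void)
{
    pthread_t th[NTHREADS];
    double sum = 0.0;
    for (long t = 0; t < NTHREADS; t++)
        pthread_create(&th[t], NULL, work, (void *)t);
    for (int t = 0; t < NTHREADS; t++) {
        pthread_join(th[t], NULL);
        sum += result[t];
    }
    return sum;
}
```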
Once the interaction forces on each atom have been obtained, Newton's equations of motion can be numerically
integrated using, for example, the leap-frog algorithm [7].
The integration is not a computationally demanding task, but it can nevertheless be performed in parallel on each
home particle. A longer integration time step makes it possible to simulate longer times with the same number of steps.
If the bond-stretching vibrations have to be taken into account, the time step cannot be longer than 0.25 fs. The
application of distance constraints between bonded atoms, with algorithms like SHAKE [32], makes it possible to increase
the time step to 2 fs.
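The leap-frog scheme itself is only a few lines. The sketch below is a generic illustration of the algorithm (not GROMACS code): velocities live at half steps and positions at full steps.

```c
/* One leap-frog step for a single degree of freedom:
     v(t+dt/2) = v(t-dt/2) + (F(t)/m) dt
     x(t+dt)   = x(t) + v(t+dt/2) dt
   Velocities are staggered by half a step with respect to positions. */
void leapfrog_step(double *x, double *v_half, double force, double mass,
                   double dt)
{
    *v_half += (force / mass) * dt;  /* advance the half-step velocity */
    *x      += *v_half * dt;         /* advance the position           */
}
```

Driving it with the force F = -k x of a harmonic oscillator reproduces the oscillation with the well-known long-time stability of this symplectic integrator.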
`
After the non-bonded interaction calculations, SHAKE is the most computationally intensive part of the classical
MD calculation. Modern MD simulations of biological systems include a great number of solvent molecules,
to which independent distance constraints can be applied by different processors. Constraints
application to protein or DNA, on the contrary, is better performed by one processor. A good load balancing can be
obtained with a functional parallelism, i.e. assigning different tasks (solvent distance constraints, solute distance
constraints, bonded calculations, etc.) to different processors.
`
3.3.1. Test case
`
GROMACS version 1.6 [4], from the Groningen BIOSON Research Institute, is an efficient parallel MD code,
particularly suited to simulate biological systems. GROMACS has been parallelized using a message-passing
model. Both PVM [33] (Parallel Virtual Machine) and MPI [34] (Message Passing Interface) have been utilized
within the GROMACS software.
`
As biological test case we chose to simulate the Major Histocompatibility Complex macromolecule class II
(MHC II) complexed with an antigenic peptide and water molecules.
`
`Petitioner Microsoft Corporation - Ex. 1066, p. 284
`
`
`
Table 6
MHC II simulated on a Compaq ES40 cluster

SMP mach.   Nodes   E.T. [s]   Speedup   % parallel code
1           1       931        1.0       N.A.
1           2       467        2.0       99.8
1           3       346        2.7       94.3
1           4       281        3.3       93.1
2           2       470        2.0       99.1
2           4       281        3.3       93.1
2           6       215        4.3       92.3
2           8       168        5.5       93.7
3           3       349        2.7       94.1
3           6       216        4.3       92.2
3           9       146        6.4       94.9
3           12      114        8.2       95.7
4           4       284        3.3       92.7
4           8       168        5.5       93.7
4           12      113        8.2       95.8
4           16      101        9.2       95.1
`
`
Table 7
MHC II simulated on an IBM SP3 cluster

SMP mach.   Nodes   E.T. [s]   Speedup   % parallel code
1           1       1594       1.0       N.A.
1           8       313        5.1       91.9
2           16      191        8.3       93.9
`
`
`
`
`
MHC II are immunoglobulin-like proteins present in a variety of cells, which bind antigens from endosomal,
plasma membrane and extracellular proteins. Antigen-MHC II complexes are recognized at the surface of the cell
by helper T-cells, promoting the activation of an immune response.
MHC II is composed of 5716 atoms, the antigenic peptide of 280 atoms, and there are 17 296 water molecules,
giving a total of 57 884 atoms. The system was simulated for 100 steps with a time step of 2 fs; a non-bonded cutoff
radius of 1.3 nm was used, updating the non-bonded pair list every 10 steps.
Table 6 (cluster of Compaq ES40, 4x EV6@500 MHz) and Table 7 (cluster of IBM SP3, 8x Power3@222 MHz)
show the test case results. In column one the number of SMP machines involved in the test run is shown; in column
two the total number of processors; and in the remaining columns the elapsed time, the speedup of the MPI runs and the
extrapolated portion of parallelized code are reported. The last data were calculated using
the following equation:

    f = (1/S - 1) / (1/p - 1),

where S is the speedup and p the number of processors.
It is worth noting the variability of the extrapolated parallel portion of the code, which varies from a maximum of
99.8% (one Compaq ES40 machine, 2 processors) to a minimum of 91.9% (one IBM SP3 machine, 8 processors).
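The extrapolation above is simply Amdahl's law S = 1/((1-f) + f/p) solved for the parallel fraction f. A one-line helper makes the last column of Tables 6 and 7 easy to reproduce (an illustrative sketch, not code from the benchmark harness):

```c
/* Parallel fraction extrapolated from a measured speedup S on p
   processors, from Amdahl's law S = 1/((1-f) + f/p):
       f = (1/S - 1) / (1/p - 1)                                  */
double parallel_fraction(double speedup, int p)
{
    return (1.0 / speedup - 1.0) / (1.0 / (double)p - 1.0);
}
```

For example, the IBM SP3 run with S = 5.1 on p = 8 gives f = 0.919, i.e. the 91.9% minimum quoted above, while an ideal S = 2.0 on p = 2 gives f = 1.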
`
`
`
`
This behaviour is due to the simultaneous use of different parallelization schemes: particle decomposition for the
non-bonded force calculation, which represents 90.2% of the serial calculation time, and functional parallelism for the
remaining portion of the code.
For this reason the parallel efficiency of the code strongly depends on the simulated system and on the conditions
of the simulation. Nevertheless, the weight of the computational core (the non-bonded force calculation) can be
considered as the minimum portion of parallel code obtainable in a real simulation.
`
3.4. Atomic and molecular physics: the electron-molecule scattering treated with the Single Center Expansion
(SCE) method
`
The application of the central field model in quantum mechanics yields the factorization of the wavefunction
into its radial and angular parts, and it has been, and is nowadays, applied to the quantum description of bound and
continuum electronic states of atoms [35], while several attempts have been made in the past to extend it to bound
molecular systems [36].
One of the most successful applications of this model has been the treatment of the scattering processes which
occur in the collision of electrons with polyatomic non-linear targets, known as the Single Center Expansion (SCE)
method. A general discussion on the computational aspects of this model has been given before [37] and specific
results have also been analyzed elsewhere [38,39]. We will therefore only outline here the basics of the generation
of the SCE numerical wavefunction and its related molecular properties. The full details on the mathematical and
numerical aspects of the SCE procedure are given elsewhere. We refer the interested reader to a recent paper which
describes the SCELib code, a computational procedure for the SCE pre-processing phase, combined in a single set
of programs [11].
`
3.4.1. The SCE wavefunction and related molecular properties
In the present computational approach one needs a general-purpose quantum chemistry code that is employed
to generate the Single Determinant description (near to the Hartree-Fock (HF) limit) of the target electronic
wavefunctions, and an interface with a numerical procedure that can give us all the necessary quantities
referred to the molecular Centre of Mass (COM) as expansion centre [37].
In most of the numerical methods employed to solve the scattering equations one then converts the CC equations
into a set of coupled radial equations, by making first a single centre expansion of the bound and continuum
functions and then by integrating over all the angular coordinates. Limiting our attention to the ground state
wavefunction, one therefore writes down the bound orbitals as expansions around the centre of mass, from which
the body-fixed (BF) frame of reference originates:
`
    \phi_i^{p\mu}(r) = r^{-1} \sum_{hl} u_{hl}^{i,p\mu}(r) X_{hl}^{p\mu}(\theta, \phi).    (11)
`
Here i labels a specific multicenter Molecular Orbital (MO), which contributes to the density of the bound
electrons of the nonlinear target, while the indices (p, \mu) for the continuum functions label one of the relevant
Irreducible Representations (IR) and one of its components, respectively. The index h labels a specific basis, at
a given angular momentum l, for the pth IR that one is considering [40].
In order to perform the expansions (11), one needs, therefore, to start from the multicenter wavefunction which
describes the target molecule and then generate by quadrature each of the u_{hl}^{i,p\mu}(r) coefficients: they were obtained
for the first time by numerical quadrature of the multicenter GTOs given as Cartesian Gaussian functions [41].
The X angular functions used in (11) are symmetry-adapted, generalized harmonics built as linear combinations of
spherical harmonics Y_{lm}(\theta, \phi) which, for a given l, form a basis of the (2l + 1)-dimensional IR of the full rotation
group [40,42].
`
`
`
`
`
After one obtains the radial coefficients u_{hl}(r), each bound one-electron wavefunction is also expanded about
the COM in terms of the X functions

    \phi_i(r) = r^{-1} \sum_{hl} u_{hl}^{i}(r) X_{hl}(\theta, \phi);    (12)

then, the one-electron density function can be written as usual

    \rho(r) = \sum_i n_i \phi_i^*(r) \phi_i(r) = 2 \sum_i |\phi_i(r)|^2,    (13)
`
where the factor 2 is due to the sum over spin, and the i sum is over each doubly occupied orbital. Once the
quantity \rho(r) is obtained from the bound-state wavefunctions \phi_i(r) of Eq. (13), it can be expanded in terms of
symmetry-adapted functions belonging to the A_1 irreducible representation as

    \rho(r) = r^{-1} \sum_{hl} b_{hl}(r) X_{hl}^{A_1}(\theta, \phi),    (14)
`
where

    b_{hl}(r) = 2 \sum_i \int_0^{2\pi} d\phi \int_0^{\pi} \sin\theta \, d\theta \;
                \phi_i^*(r) \phi_i(r) \, X_{hl}^{A_1}(\theta, \phi).    (15)
`
Once the SCE one-electron density has been computed from Eqs. (14) and (15), one is able to derive from it all the
molecular quantities which depend on the electronic distribution, either by integration, like the electronic Static
Potential \int \rho(r) dr, or by differentiation, like the density gradient (\nabla\rho) and Laplacian (\nabla^2\rho).
By using the SCELib [43] API (Application Program Interface) one has access to a large set of these molecular
properties calculated with respect to the molecular COM, and for the first time we have coded in it the gradient
and Laplacian of the electronic density and static potential, as explained in detail in the next section.
3.4.2. Numerical implementation details
A feature common to all the SCE molecular properties is that they can be expressed as

    f(r, \theta, \phi) = \sum_{lm} F_{lm}(r) X_{lm}(\theta, \phi),    (16)

where f represents a given molecular property, a function of the spherical coordinates given as the dot product of a radial
component F times an angular analytic function X.
In a similar manner are expressed the gradient and Laplacian of f:

    \nabla f(r, \theta, \phi) = \sum_{lm} \nabla [ F_{lm}(r) X_{lm}(\theta, \phi) ]
        = \sum_{lm} [ (dF_{lm}/dr) X_{lm} \hat{e}_r
          + (F_{lm}/r) (\partial X_{lm}/\partial\theta) \hat{e}_\theta
          + (F_{lm}/(r \sin\theta)) (\partial X_{lm}/\partial\phi) \hat{e}_\phi ]    (17)
and

    \nabla^2 f(r, \theta, \phi) = \sum_{lm} \nabla^2 [ F_{lm}(r) X_{lm}(\theta, \phi) ]
        = \sum_{lm} [ d^2 F_{lm}/dr^2 + (2/r) (dF_{lm}/dr) - l(l+1) F_{lm}/r^2 ] X_{lm},    (18)
but we refer the interested reader to the specific literature [44] for the necessary mathematical background and
numerical implementation details.
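The radial operator of Eq. (18) is easy to check in isolation. The helper below is an illustrative sketch (our own naming), taking F, F' and F'' as inputs, as they would come from the fitted radial functions:

```c
/* Radial part of Eq. (18): for f = F(r) X_lm the Laplacian acts as
       F'' + (2/r) F' - l(l+1) F / r^2
   F, dF, d2F are the radial function and its first two derivatives. */
double radial_laplacian(double F, double dF, double d2F, double r, int l)
{
    return d2F + 2.0 * dF / r - (double)(l * (l + 1)) * F / (r * r);
}
```

Two classic sanity checks: F = r^2 with l = 0 gives 6, the Laplacian of x^2 + y^2 + z^2; and F = r with l = 1 gives 0, since r Y_{1m} is proportional to a Cartesian coordinate, which is harmonic.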
`
`
`
`
`
In Eqs. (16)-(18) above, the angular part is analytic and the radial part is numerical, as derived from the SCE
procedure. This last fact suggests that when one wants to calculate a given property over a given (r, \theta, \phi) grid, the
most time consuming section will be the calculation of the F_{lm}(r), F'_{lm}(r) and F''_{lm}(r). In order to improve the
calculation of the F_{lm}(r) and of its r-derivatives, a natural way to proceed is to post-fit it, after the SCE procedure,
with a suitable fitting function.
`
In a first approach, we used a cubic spline fitting [45] of the F_{lm}(r) function. The spline fitting has the property
that the fitted function can be very efficiently evaluated at each r point but, more important, that one can obtain its
first and second derivatives with respect to r from the same fitting parameters of the F_{lm}(r). Thus, when one has
fitted the F_{lm}(r) function with a cubic spline, by calculating the four parameters needed at each r point, a single
call to an evaluation routine can efficiently return F_{lm}(r), F'_{lm}(r) and F''_{lm}(r).
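The single-call evaluation can be sketched as follows. This is a generic illustration of the idea, not SCELib code: given the four spline parameters (a, b, c, d) of one segment, one call returns the value and both derivatives.

```c
/* One cubic-spline segment F(r) = a + b t + c t^2 + d t^3, t = r - r0.
   The same four fitted parameters give F, F' and F'' in a single call. */
typedef struct { double a, b, c, d, r0; } SplineSeg;

void spline_eval(const SplineSeg *s, double r,
                 double *F, double *dF, double *d2F)
{
    double t = r - s->r0;
    *F   = s->a + t * (s->b + t * (s->c + t * s->d)); /* Horner form */
    *dF  = s->b + t * (2.0 * s->c + 3.0 * s->d * t);
    *d2F = 2.0 * s->c + 6.0 * s->d * t;
}
```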
As far as the angular part of our SCE functions is concerned, a natural limit arises in the floating point representation
of the Y_{lm} (i.e. the X_{lm}): in fact, it is well known that over a certain value of l one reaches the limit of double
precision floating point arithmetic. To overcome this limit, one can either use quadruple precision floating point
arithmetic (currently available on 64-bit computers) or use specifically designed routines for multiple precision
evaluation of the Y_{lm}, even available on the Internet [46].
`
3.4.3. Numerical results

In this section we will report some calculation examples, limiting the resulting data to the pre-processing
part of the whole electron-molecule SCE scattering procedure. These results refer to those recently reported in the
literature [11], but are enriched with more information on the inner profiling of the SCELib code used for the test.
This could offer the reader a way to better analyze the relative importance of the various code sections, with
the aim of extracting the numerically intensive kernels and evaluating them in detail. However, we should warn again the
reader that we are referring to the pre-processing part of the whole electron-molecule computational procedure,
and that the successive scattering calculation is likewise CPU and I/O demanding, as stated in a recently published
paper [47], where a new method of integration of the scattering equations has been presented.
An even more interesting aspect of the pre-processing part to discuss would be the fact that, through an interface
code (the SCELib API), one is able to use the pre-processing data to map a given molecular property (static
potential, electron density, etc.) over a generic Cartesian grid. This, of course, could be of more general use in
many related scientific areas, but we refer the interested reader to some preliminary results obtained using
a prototype version of the SCELib API set of programs [43].
As a test case to discuss, we chose a C2v molecule, the SO2 system, with a number of electrons able to produce
a timing long enough to avoid any random measurement error in the parallel runs. For the SO2 molecule, we started
from its HF/D95* optimized geometry (R(SO) = 1.423 A, angle(OSO) = 118.1 degrees) and used this electronic solution to
derive the SCE wavefunction of Eq. (12) with the SCELib package. Furthermore, we used a grid of 300 x 96 x 84,
for a total of 2 419 200 points; L_max was set to 10 and the Centre of Mass (COM) was chosen as expansion centre.
This required a scratch memory usage of about 300 MB, which is small enough to be easily managed by the server
class machines under test. In the following, we decided to show only the results coming from the most time-
consuming part of the SCELib package, the sphint function, which performs the Single Centre Expansion of the
GTO wavefunction of a SCF run. In this program, the grid evaluation of the GTO wavefunction and the subsequent
angular integration are carried out; to show its performance, Table 8 reports the overall elapsed time of
the sphint code section, together with the corresponding speed-ups, on the IBM, Compaq and SUN SMP parallel
machines for the p-threaded SCELib version using from 1 to 14 nodes.
It is evident from the reported data that all the machines show a fairly linear speed-up and that, in the case of the
E4500, this trend is maintained up to n = 14. We should note, however, that in this latter case we observe a super-
linear speed-up between 4 and 12 nodes, which is quite unusual for this kind of parallel architecture, where for this
type of floating-point intensive code one usually finds a performance bottleneck beyond 8 CPUs. This particular
behaviour has been described and explained in detail in Ref. [11] and it will not be repeated here.
`
`
`
`
`
Table 8
SO2 SCELib parallel runs on IBM, SUN and Compaq SMP architectures. Elapsed Time
(E.T.) in seconds and speedup vs. the number of nodes

        IBM 43P-260          Compaq ES40          SUN E4500
Nodes   E.T. [s]   Speedup   E.T. [s]   Speedup   E.T. [s]   Speedup
1       208.9      1.0       104.1      1.0       393.3      1.0
2       101.1      2.0       51.6       2.0       183.3      2.1
4       -          -         25.9       4.0       99.6       3.9
8       -          -         -          -         45.1       8.7
12      -          -         -          -         32.0       12.3
14      -          -         -          -         28.5       13.8
`
`
More interesting to discuss here is the analysis of the relative performances of the leading computational parts
of the SCELib code. To this end, we report in the following the profiling information of a test performed on
the same molecular system cited above, but over a smaller grid (10019 points) and run serially on a single
Alpha ev67@667 MHz CPU, with the standard prof Unix profiler under the Tru64 V5.0A Operating System.
`
%time   seconds    cum %   cum sec   procedure (file)
51.6    17.4873    51.6    17.49     CalcMO (<scelib>)
47.0    15.9287    98.6    33.42     sphint (<scelib>)
 0.7     0.2529    99.4    33.67     pow_di (<scelib>)
 0.3     0.0967    99.7    33.77     SubPointG (<scelib>)
 0.1     0.0458    99.8    33.81     gauss (<scelib>)
 0.1     0.0283    99.9    33.84     MulPointG (<scelib>)
 0.1     0.0215    99.9    33.86     Plm (<scelib>)
 0.0     0.0083   100.0    33.87     setinp (<scelib>)
 0.0     0.0029   100.0    33.87     norm (<scelib>)
 0.0     0.0029   100.0    33.88     Ylm (<scelib>)
 0.0     0.0020   100.0    33.88     monorm (<scelib>)
 0.0     0.0010   100.0    33.88     mkgrid (<scelib>)
`
It is evident from the data reported that 99.9% of the computing time refers to the part of the code running in
parallel, that is, the sphint function. In fact, apart from the preparation routines (from setinp to mkgrid), the rest of
the functions are called from sphint and, among these, the most time consuming one is CalcMO. This behaviour
confirms the linear speedup found over the three parallel SMP architectures used for the tests, where we used a p-
threaded version of the sphint function, so that the whole computational kernel was eligible to be run in parallel.
It is interesting to point out that two routines, CalcMO and sphint, account for the majority of the CPU
time spent for the calculation. In the former function the Gaussian wavefunction is calculated over a convenient
(\theta, \phi) grid and then used by the latter (the "caller") to perform a Gauss-Legendre (Gauss-Chebyshev) angular
integration. These two steps are necessary to evaluate the u_{hl}(r) terms of Eq. (11) and, once carried out, the one-
electron density can be calculated and, from it, all the necessary molecular properties referred to a single centre of
reference. The computing time of the evaluation/integration steps is strictly bound to the low-level routines used by
the two functions cited above. While the evaluation step depends on the exp function calculation of the Gaussian
`
`
`
`
`
basis set (see Eq. (3)), the integration phase is totally bound to the evaluation of the angular spherical harmonics
Y_{lm} functions (see Eq. (11)). This behaviour is quite interesting because it shows explicitly the dependence of the
two major stages of the pre-processing calculation phase performed in sphint on the basic low-level routines exp
and Y_{lm}.
`
`4. The commonly shared numerical behaviour
`
The first element of similarity among the codes we have analyzed so far is that we have a code implementation
dependency. By this we mean that the user is not guaranteed that the same binary code is executed on different
architectures, due to the fact, for example, that different algorithms can be used by low-level vendor routines at
run-time.
A second element of similarity can be found by looking at the pre-processing behaviour of these codes
(a situation that we refer to as job type dependency). By this we mean that the user is not guaranteed that the same
code sections will perform exactly at the same computational rates when applied to different molecular systems.
For example, if you change your molecule geometry or even if you change some run-time parameters in the input,
you are not assured that the same code section will perform at a constant computational rate.
This is a common behaviour of many codes in this and other scientific areas, which have years of development
behind them and contain several thousands of FORTRAN/C source lines: it is well known to the users (end-users or
developers) of the codes mentioned above that even the traditional (serial) profiling stage can be a cumbersome
activity, due to the hidden complexity of the source files.
Nonetheless, we have seen that they share specific features which make it possible to derive some common
conclusions on the best numerical environment for these applications. We would point out, in fact, as stated in
Eq. (3), the central role played by the exp function in quantum chemistry and in molecular physics. At the
same time, the BLAS low-level routines are of paramount importance in materials science as well as in quantum
chemistry, as reported in Eqs. (1) and (9). Furthermore, the evaluation of the most time-consuming part of the
potential used in classical MD of biomolecules, reported in Eq. (10b), closely relates to the expression of any
SCE molecular property and its gradient/Laplacian of Eqs. (11)-(18), where the dependency on \psi at various ranks
is clearly evident.
These last comments suggested that we attempt a sort of summary of the numeric needs of these closely related
computational areas. We sketch those requirements in a short report (Table 9), where we focus
the attention on those low-level routines which are the computational cores of the above cited codes. These
computational kernels are generally small, clean sections of code that we would expect to be ported to highly
optimized silicon chips dedicated to the high performance scientific computing community.
`
Table 9
A tentative attempt to summarize the most time consuming low-level
routines needed in (bio)chemical-physics simulations

Function class                    Function type
Basic mathematical functions      exp, Y_lm, ...
Linear algebra basic functions    BLAS 1, 2, 3, ...
Polynomial functions              Legendre, Hermite, Chebyshev, ...
Simple fitting functions          Splines, N-rank polynomials, ...
Series expansions                 Fast Fourier Transform, ...
`
`
`
`
`5. Conclusions
`
`
We have seen in the previous sections how many aspects join apparently different scientific fields in the
computational area of (bio)chemical-physics. From the computational side those similarities become even more
evident, and a common set of functions and routines is frequently used in the majority of the codes adopted in this
area of computing.
This suggests a reasonable request to electronic engineers: why not exploit the porting of these computational
kernels to optimized silicon chips? We are really confident that this could improve the performance of the codes
used in these areas by several orders of magnitude. By the way, we are not underestimating the inner complexity in
the design and manufacture of these Special Purpose (SP) devices, but we would conversely focus the attention of the
designers on the possible benefits of these SP chips to this (not so) specific field of application.
First of all, let us imagine this SP chip integrated as a co-processor in any serial or parallel architecture by means
of the PCI (Peripheral Component Interconnect) connector of its board. Apart from the key technical issues
regarding the data bandwidth toward the General Purpose (GP) CPU, this component could co-operate with it to
solve specific numeric problems, leaving to the main GP processor the remainder of the workload (I/O, devices
management, etc.). This hypothetical system with SP and GP processors coupled together could be a viable High
Performance Computing (HPC) solution for this and many other computational areas, provided one is able to
easily administer the SP processors within the GP system.
In fact the SP boards could be designed to solve many different specific numerical problems (not necessarily
only in the area of (bio)chemical-physics) but, in our opinion, the whole system should conform to the following
requirements:
`
- The SP chip must have an Application Program Interface (API), so that it can be easily used as the
  corresponding software functions it refers to.
- The SP API library should be as conformant as possible to the corresponding software counterparts and, in
  a second step, the SP API should be able to conform to the underlying hardware present (see, e.g., the role
  played by the GL/OpenGL library in the graphics area of computing).
- The best hardware solution for this computational area would be a combination of GP/SP processors with the
  maximum configuration flexibility (e.g., you change your GP system and maintain the SP boards on the new
  one).
`
This combination of GP/SP chips could operate at impressive computational rates with a minimal increase of
the hardware complexity, as the PC market clearly shows. In fact the idea of merging GP and SP devices is not
new: Intel Co. introduced three years ago the MMX SP (SIMD) unit into its series of Pentium processors, with
significant performance gains in executing multimedia x86 instructions.
This last point opens the discussion on the possible benefits of a mixed GP/SP architecture for this area of
computing, a topic which is outside the scope of this paper. We would in any case point out that, differently from
other computational areas like High Energy Physics or Astrophysics,5 the codes mentioned above require CPU as
well as memory and storage I/O top performances. Hence, the mixed GP/SP solution to the computing environment
could surely represent the best viable way to High Performance Computing in the short-medium term.
Furthermore, the expected performances of the SP chips dedicated to this High Performance area of Computing
are at least one order of magnitude better than the corresponding GP RISC processors (the SP chip will eventually
perform with greater performance over the GP counterpart, as stated by the published vendor benchmarks). This,
together with the expected feasibility of the SP board over a standard PCI bus, can improve the large diffusion of
this kind of HPC solution with immediate economic benefits for the producer.
`
`
`computational details.
`r“The intervened reader can refers to the articles on the parallel SP machines. APE and GRAPE. presented in this. issue for any specific
`
`
`
`
`Acknowledgements
`
`
We wish to thank Prof. F.A. Gianturco and Prof. L. Colombo for their kind help and many helpful discussions on
the topics covered by this paper, Dr. A. Pimeiti for the Gaussian98 results, and the CASPUR computational centre
for providing the computer architectures used in this work.
`
`References
`
[1] See, e.g., E. Clementi, G. Corongiu (Eds.), METECC-95, STEF, Cagliari, 1995.
[2] M.J. Frisch, G.W. Trucks, H.B. Schlegel, G.E. Scuseria, M.A. Robb, J.R. Cheeseman, V.G. Zakrzewski, J.A. Montgomery, R.E. Stratmann,
    J.C. Burant, S. Dapprich, J.M. Millam, A.D. Daniels, K.N. Kudin, M.C. Strain, O. Farkas, J. Tomasi, V. Barone, M. Cossi, R. Cammi, B.
    Mennucci, C. Pomelli, C. Adamo, S. Clifford, J. Ochterski, G.A. Petersson, P.Y. Ayala, Q. Cui, K. Morokuma, D.K. Malick, A.D. Rabuck,
    K. Raghavachari, J.B. Foresman, J. Cioslowski, J.V. Ortiz, B.B. Stefanov, G. Liu, A. Liashenko, P. Piskorz, I. Komaromi, R. Gomperts,
    R.L. Martin, D.J. Fox, T. Keith, M.A. Al-Laham, C.Y. Peng, A. Nanayakkara, C. Gonzalez, M. Challacombe, P.M.W. Gill, B.G. Johnson,
    W. Chen, M.W. Wong, J.L. Andres, M. Head-Gordon, E.S. Replogle, J.A. Pople, Gaussian98 (Revision A.1), Gaussian Inc., Pittsburgh, PA,
    1998.
[3] See, e.g., W.F. van Gunsteren, X. Daura, A.E. Mark, GROMOS force field, in: N.L. Allinger, T. Clark, J. Gasteiger, P.A. Kollman, H.F.
    Schaefer III, P. Schreiner (Eds.), Encyclopedia of Computational Chemistry, Vol. 2, Wiley and Sons, London, 1998, pp. 1211-1216.
[4] S. Achterop, R. van Drunen, D. van der Spoel, A. Sijbers, H. Keegstra, B. Reitsma, M.K.R. Renardus, Gromacs: A Parallel Computer for
    Molecular Dynamics Simulations, in: Proceedings of Physics Computing '92, Vol. 1, World Scientific, Singapore, 1993.
[5] R. Car, M. Parrinello, Phys. Rev. Lett. 55 (1985) 2471.
[6] C.Z. Wang, K.M. Ho, in: I. Prigogine, S.A. Rice (Eds.), Advances in Chemical Physics, Vol. LXXXIX, p. 651.
[7] H.J.C. Berendsen et al., Selected parallel applications and numerical elements, in: Aspects of Computational Science, NCF, 1995.
[8] H.G. Ewen et al., Parallelization of Molecular Dynamics Code GROMOS: parallelization for distributed memory architectures, in:
    Methods and Techniques in Computational Chemistry: METECC-95, STEF, 1995.
[9] D. Roccatano, R. Bizzarri, G. Chillemi, N. Sanna, A. Di Nola, J. Comput. Chem. 19 (1998) 685.
[10] See, e.g., http://www.caspur.it.
[11] N. Sanna, F.A. Gianturco, SCELib: A parallel computational library of molecular properties in the single center approach, Comput. Phys.
    Comm. 128 (2000) 139.
[12] Digital Extended Mathematical Library (DXML).
[13] Engineering and Scientific Subroutine Library (ESSL).
[14] See, e.g., the recent unpublished work by K. Goto on some BLAS subroutines implemented on the Alpha ev56/ev6 microprocessors,
    available by directly contacting the author.
[15] R.G. Parr, W. Yang, Density-Functional Theory of Atoms and Molecules, Oxford University Press, New York, 1989.
[16] L. Colombo, M. Rosati, Parallel tight-binding molecular dynamics simulations on symmetric multiprocessor platforms, Comput. Phys.
    Comm. 128 (2000).
[17] G. Chillemi, A. Fianna, M. Rosati, N. Sanna, The performance of (bio)chemical-physics codes on CASPUR parallel architectures, CASPUR
    Tech. Rep. #10, in preparation.
[18] A.J. van der Steen (Ed.), Aspects of Computational Science, NCF Publ., Den Haag, The Netherlands, 1995.
[19] C.C.J. Roothaan, Rev. Mod. Phys. 23 (1951) 69;
    G.G. Hall, Proc. Roy. Soc. (London) A 205 (1951) 541.
[20] E. Clementi (Ed.), Appendices 7C-7E, in: MOTECC-90, Modern Techniques in Computational Chemistry, ESCOM, Leiden, The
    Netherlands, 1990.
[21] R.C. Raffenetti, Chem. Phys. Lett. 20 (1973) 335.
[22] For a discussion of SCF convergence and stability see: H.B. Schlegel, J. McDouall, in: C. Ogretir, I.G. Csizmadia (Eds.), Computational
    Advances in Organic Chemistry, Kluwer, The Netherlands, 1991.
[23] W.J. Hehre, L. Radom, P.v.R. Schleyer, J.A. Pople, Ab Initio Molecular Orbital Theory, Wiley and Sons, New York, 1986.
[24] M.P. Allen, D.J. Tildesley, Computer Simulation of Liquids, Oxford University Press, Oxford, 1991.
[25] L. Colombo, in: D. Stauffer (Ed.), Annual Review of Computational Physics IV, World Scientific, Singapore, 1996, p. 147.
[26] I. Kwon, R. Biswas, C.Z. Wang, K.M. Ho, C.M. Soukoulis, Phys. Rev. B 49 (1994) 7242.
[27] http://www.netlib.org/lapack/.
[28] G.H. Golub, C.F. Van Loan, Matrix Computations, 3rd edn., Johns Ho