G. Chillemi et al. / Computer Physics Communications 139 (2001) 1–19
In fact, the interaction field of a many-body system is usually expressed as an empirical potential among the atoms (particles) of the many-body system under study, and it should include energy terms that represent bonded and non-bonded (van der Waals and Coulombic forces) interactions. Limiting our attention to the simulation of a biological system, a typical force field used in classical MD simulations appears as follows:
$$V = \sum_{\text{bonds}} K_b (b - b_0)^2 + \sum_{\text{angles}} K_\theta (\theta - \theta_0)^2 + \sum_{\text{dihedrals}} K_\phi \left[ 1 + \cos(n\phi - \delta) \right] + \sum_{\text{impropers}} K_\xi (\xi - \xi_0)^2 \qquad (10a)$$

$$\qquad + \sum_i \sum_{j>i} \left[ 4\epsilon_{ij} \left( \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6} \right) + \frac{q_i q_j}{4\pi\varepsilon_0 r_{ij}} \right] \qquad (10b)$$
The first four terms (Eq. (10a)) represent the bonded potential. The last one (Eq. (10b)) gives the non-bonded energy contribution to the total potential, and it is the most time-consuming task (85–90% of the total CPU time). Each pair interaction ij in Eq. (10b) has to be calculated to obtain the energy due to the non-bonded forces. The number of pairs to be calculated is halved using Newton's third law, F_ij = −F_ji.
The number of pair interactions can be further reduced by including only atom pairs within a cutoff distance. Therefore, at each step a generic atom i has a set of j different atoms to interact with (the non-bonded pair list). This list can be updated every n steps, with n ≥ 10.
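The cutoff-based pair loop described above can be sketched as follows. This is an illustrative Python fragment, not GROMACS code; the Lorentz-Berthelot combination rules and the Gaussian-style Coulomb term (no 4πε₀ factor) are assumptions of the sketch.

```python
import numpy as np

def nonbonded_energy(pos, q, sigma, eps, rc):
    """Non-bonded energy of Eq. (10b) over a cutoff pair list.

    pos: (N, 3) coordinates; q: charges; sigma, eps: per-atom
    Lennard-Jones parameters (combined here with Lorentz-Berthelot
    rules, an assumption of this sketch); rc: cutoff radius.
    Units are reduced/illustrative.
    """
    n = len(pos)
    e = 0.0
    # j > i only: Newton's third law halves the number of pairs.
    for i in range(n - 1):
        d = pos[i + 1:] - pos[i]
        r = np.sqrt((d * d).sum(axis=1))
        mask = r < rc                    # the non-bonded pair list of atom i
        r = r[mask]
        s = 0.5 * (sigma[i] + sigma[i + 1:][mask])       # Lorentz rule
        e4 = 4.0 * np.sqrt(eps[i] * eps[i + 1:][mask])   # Berthelot rule
        sr6 = (s / r) ** 6
        e += (e4 * (sr6 * sr6 - sr6)).sum()              # Lennard-Jones term
        e += (q[i] * q[i + 1:][mask] / r).sum()          # Coulomb term
    return e
```

In a real MD code the pair list would be built once and reused for n steps, rather than recomputed from all distances at every call as done here for brevity.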
Parallelization can be achieved with a particle decomposition scheme, i.e. a number of i particles is assigned to each processor (home particles) [7]. Then, each processor will calculate independently the pair interactions of the i home particles with the j particles.
It is worth noting that in a shared memory machine the replicated data model is automatically obtained, i.e. data about every atom in the system is available to each processor, while in a distributed memory architecture the position of each particle, and the force acting on it, have to be communicated between the machine processors.
Once the interaction forces on each atom have been obtained, Newton's equations of motion can be numerically integrated using, for example, the leap-frog algorithm [7].
The integration is not a computationally demanding task, but it can nevertheless be performed in parallel on each home particle. A longer integration time step permits simulating longer times with the same number of steps. If the bond-stretching vibrations have to be taken into account, the time step cannot be longer than 0.25 fs. The application of distance constraints between bonded atoms, with algorithms like SHAKE [32], permits increasing the time step to 2 fs.
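The leap-frog scheme mentioned above can be sketched as follows (a minimal Python illustration; the force routine is supplied by the caller, since the force-field evaluation itself is outside the scope of the sketch):

```python
import numpy as np

def leapfrog(pos, vel_half, force, mass, dt, nsteps):
    """Leap-frog integration of Newton's equations of motion.

    vel_half holds the velocities at t - dt/2 (the half-step offset
    that gives the algorithm its name); force(pos) returns the forces.
    """
    for _ in range(nsteps):
        a = force(pos) / mass            # accelerations a(t) = F(t)/m
        vel_half = vel_half + a * dt     # v(t + dt/2) = v(t - dt/2) + a(t) dt
        pos = pos + vel_half * dt        # r(t + dt) = r(t) + v(t + dt/2) dt
    return pos, vel_half
```

Because each home particle is updated independently from its own force, this loop parallelizes trivially over the particle decomposition described above.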
After the non-bonded interaction calculations, SHAKE is the most computationally intensive part of the classical MD calculation. Modern MD simulations of biological systems include a great number of solvent molecules, to which independent distance constraints can be applied by different processors. Constraint application to the protein or DNA, on the contrary, is better performed by one processor. A good load balancing can be obtained with functional parallelism, i.e. assigning different tasks (solvent distance constraints, solute distance constraints, bonded calculations, etc.) to different processors.
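A minimal, single-constraint illustration of the SHAKE-style iteration follows; it assumes two atoms of equal mass, whereas a production implementation loops over all constraints of a molecule at every time step until all are satisfied.

```python
import numpy as np

def shake_pair(r1, r2, d0, tol=1e-10, max_iter=100):
    """Iteratively enforce the distance constraint |r1 - r2| = d0
    between two bonded atoms of equal mass (a SHAKE-style sketch)."""
    r1, r2 = np.asarray(r1, float), np.asarray(r2, float)
    for _ in range(max_iter):
        d = r1 - r2
        diff = d.dot(d) - d0 * d0        # constraint violation on |d|^2
        if abs(diff) < tol:
            break
        # distribute the correction symmetrically along the bond direction
        g = diff / (4.0 * d.dot(d))
        r1 -= g * d
        r2 += g * d
    return r1, r2
```

Since constraints within one water molecule never couple to those of another, the solvent constraints can be distributed over processors exactly as the text describes.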
3.3.1. Test case
GROMACS version 1.6 [4], from the Groningen BIOSON Research Institute, is an efficient parallel MD code, particularly suited to simulating biological systems. GROMACS has been parallelized using a message-passing model. Both PVM [33] (Parallel Virtual Machine) and MPI [34] (Message Passing Interface) have been utilized within the GROMACS software.

As the biological test case we chose to simulate the Major Histocompatibility Complex macromolecule class II (MHC II) complexed with an antigenic peptide and water molecules.
Petitioner Microsoft Corporation - Ex. 1066, p. 284
Table 6
MHC II simulated on a Compaq ES40 cluster (rows not legible in the source have been omitted)

SMP mach.   Nodes   E.T. (s)   Speedup   % parallel code
1           1       931        1.0       N.A.
1           2       467        2.0       99.8
1           3       346        2.7       94.3
1           4       281        3.3       93.1
2           2       470        2.0       99.1
2           8       168        5.5       93.7
3           9       146        6.4       94.9
3           12      114        8.2       95.8
4           16      101        9.2       95.1
Table 7
MHC II simulated on an IBM 8× SP3 cluster

SMP mach.   Nodes   E.T. (s)   Speedup   % parallel code
1           1       1594       1.0       N.A.
1           8       313        5.1       91.8
2           16      191        8.32      93.9
MHC II are immunoglobulin-like proteins, present in a variety of cells, which bind antigens from endocytosed plasma-membrane and extracellular proteins. Antigen-MHC II complexes are recognized at the surface of the cell by helper T-cells, promoting the activation of an immune response.
MHC II is composed of 5716 atoms, the antigenic peptide of 280 atoms, and there are 17 296 water molecules, giving a total of 57 884 atoms. The system was simulated for 100 steps with a time step of 2 fs; a non-bonded cutoff radius of 1.3 nm was used, updating the non-bonded pair list every 10 steps.
Table 6 (cluster of Compaq ES40, 4× EV67@500 MHz) and Table 7 (cluster of IBM SP3, 8× Power3-II@222 MHz) show the test case results. In column one the number of SMP machines involved in the test run is shown; in column two the total number of processors; and in the remaining columns the elapsed time, the speedup of the MPI runs and the portion of parallelized code as extrapolated. The last data were calculated using the following equation:

$$f = \frac{1 - 1/S}{1 - 1/p},$$

where S is the speedup, p the number of processors and f the extrapolated parallel fraction of the code.
It is worth noting the variability of the extrapolated parallel portion of the code, which varies from a maximum of 99.8% (one Compaq ES40 machine, 2 processors) to a minimum of 91.8% (one IBM SP3 machine, 8 processors).
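The extrapolation above amounts to solving Amdahl's law, S = 1/((1 − f) + f/p), for the parallel fraction f; a one-function sketch:

```python
def parallel_fraction(speedup, p):
    """Extrapolated parallel fraction f from a measured speedup S
    on p processors (Amdahl's law solved for f)."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / p)
```

Applied to the IBM SP3 run of Table 7 (S = 5.1 on 8 processors) this reproduces the quoted minimum of about 91.8%.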
This behaviour is due to the simultaneous use of different parallelization schemes: particle decomposition for the non-bonded force calculation, which represents 90.2% of the serial calculation time, and functional parallelism for the remaining portion of the code.
For this reason the parallel efficiency of the code strongly depends on the simulated system and on the conditions of the simulation. Nevertheless, the weight of the computational core (the non-bonded force calculation) can be considered as the minimum portion of parallel code obtainable in a real simulation.
3.4. Atomic and Molecular Physics: the electron-molecule scattering treated with the Single Centre Expansion (SCE) method
The application of the central field model in quantum mechanics leads to the factorization of the wavefunction into its radial and angular parts; it has been, and still is, applied to the quantum description of bound and continuum electronic states of atoms [35], while several attempts have been made in the past to extend it to bound molecular systems [36].
One of the most successful applications of this model has been the treatment of the scattering processes which occur in the collision of electrons with polyatomic non-linear targets, known as the Single Centre Expansion (SCE) method. A general discussion of the computational aspects of this model has been given before [37], and specific results have also been analyzed elsewhere [38,39]. We will therefore only outline here the basics of the generation of the SCE numerical wavefunction and its related molecular properties. The full details of the mathematical and numerical aspects of the SCE procedure are given elsewhere. We refer the interested reader to a recent paper which describes the SCELib code, a computational procedure for the SCE pre-processing phase, combined in a single set of programs [11].
3.4.1. The SCE wavefunction and related molecular properties
In the present computational approach one needs a general-purpose quantum chemistry code, employed to generate the single-determinant description (near to the Hartree-Fock, HF, limit) of the target electronic wavefunctions, and an interface with a numerical procedure that can give us all the necessary quantities referred to the molecular Centre Of Mass (COM) as expansion centre [37].
In most of the numerical methods employed to solve the scattering equations one then converts the CC equations into a set of coupled radial equations, by first making a single centre expansion of the bound and continuum functions and then integrating over all the angular coordinates. Limiting our attention to the ground state wavefunction, one therefore writes down the bound orbitals as expansions around the centre of mass, from which the body-fixed (BF) frame of reference originates:
$$\phi_i^{p\mu}(\mathbf{r}) = r^{-1} \sum_{lh} u_{lh,i}^{p\mu}(r)\, X_{lh}^{p\mu}(\theta, \varphi). \qquad (11)$$
Here i labels a specific multicentre Molecular Orbital (MO), which contributes to the density of the bound electrons of the nonlinear target, while the indices (p, μ) for the continuum functions label one of the relevant Irreducible Representations (IR) and one of its components, respectively. The index h labels a specific basis, at a given angular momentum l, for the pth IR that one is considering [40].
In order to perform the expansions (11), one needs, therefore, to start from the multicentre wavefunction which describes the target molecule and then generate by quadrature each of the u_{lh,i}^{pμ}(r) coefficients: they were obtained for the first time by numerical quadrature of the multicentre GTOs given as Cartesian Gaussian functions [41]. The X angular functions used in (11) are symmetry-adapted generalized harmonics, built as linear combinations of spherical harmonics Y_{lm}(θ, φ) which, for a given l, form a basis of the (2l + 1)-dimensional IR of the full rotation group [40,42].
After one obtains the radial coefficients u_{lh,i}(r), each bound one-electron wavefunction is also expanded about the COM in terms of the X functions:
$$\psi_i(\mathbf{r}) = r^{-1} \sum_{lh} u_{lh,i}(r)\, X_{lh}(\theta, \varphi); \qquad (12)$$

then, the one-electron density function can be written as usual:

$$\rho(\mathbf{r}) = 2 \sum_i |\psi_i(\mathbf{r})|^2, \qquad (13)$$
where the factor 2 is due to the sum over spin, and the i sum runs over each doubly occupied orbital. Once the quantity ρ(r) is obtained from the bound-state wavefunctions ψ_i(r) of Eq. (12), it can be expanded in terms of symmetry-adapted functions belonging to the A₁ irreducible representation as

$$\rho(\mathbf{r}) = \sum_{lh} b_{lh}(r)\, X_{lh}^{A_1}(\theta, \varphi), \qquad (14)$$
where

$$b_{lh}(r) = 2 \sum_i \int_0^{\pi} \sin\theta \, d\theta \int_0^{2\pi} \psi_i^{*}(\mathbf{r})\, \psi_i(\mathbf{r})\, X_{lh}^{A_1}(\theta, \varphi)\, d\varphi. \qquad (15)$$
Once the SCE one-electron density has been computed from Eqs. (14) and (15), one is able to derive from it all the molecular quantities which depend on the electronic distribution, either by integration, like the electronic static potential ∫ ρ(r) dr, or by differentiation, like the density gradient (∇ρ) and Laplacian (∇²ρ).
By using the SCELib [43] API (Application Program Interface) one has access to a large set of these molecular properties, calculated with respect to the molecular COM; for the first time we coded in it the gradient and Laplacian of the electronic density and static potential, as explained in detail in the next section.
3.4.2. Numerical implementation details
A feature common to all SCE molecular properties is that they can be expressed as

$$f(r, \theta, \varphi) = \sum_{lh} F_{lh}(r)\, X_{lh}(\theta, \varphi), \qquad (16)$$

where f represents a given molecular property, a function of the spherical coordinates given as the product of a radial component F times an analytic angular function X.
The gradient and Laplacian of f are expressed in a similar manner:

$$\nabla f(r, \theta, \varphi) = \sum_{lh} \nabla \left[ F_{lh}(r)\, X_{lh}(\theta, \varphi) \right] = \sum_{lh} \left[ \frac{dF_{lh}}{dr} X_{lh}\, \hat{e}_r + \frac{F_{lh}}{r} \frac{\partial X_{lh}}{\partial \theta}\, \hat{e}_\theta + \frac{F_{lh}}{r \sin\theta} \frac{\partial X_{lh}}{\partial \varphi}\, \hat{e}_\varphi \right] \qquad (17)$$

and

$$\nabla^2 f(r, \theta, \varphi) = \sum_{lh} \nabla^2 \left[ F_{lh}(r)\, X_{lh}(\theta, \varphi) \right] = \sum_{lh} \left[ \frac{d^2 F_{lh}}{dr^2} + \frac{2}{r} \frac{dF_{lh}}{dr} - \frac{l(l+1)}{r^2} F_{lh} \right] X_{lh}, \qquad (18)$$

but we refer the interested reader to the specific literature [44] for the necessary mathematical background and numerical implementation details.
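The evaluation of an expansion of the form of Eq. (16) can be sketched in Python as follows, under two simplifying assumptions that are not in the original: only the m = 0 (axially symmetric) harmonics are kept, so the X functions reduce to Y_l0, and the radial coefficients are supplied as callables.

```python
import numpy as np

def y_l0(l, theta):
    """Real spherical harmonic Y_l0(theta) = sqrt((2l+1)/4pi) P_l(cos theta),
    built from the Legendre polynomial P_l."""
    c = np.zeros(l + 1)
    c[l] = 1.0
    return np.sqrt((2 * l + 1) / (4 * np.pi)) * \
        np.polynomial.legendre.legval(np.cos(theta), c)

def eval_property(F_of_r, r, theta):
    """f(r, theta) = sum_l F_l(r) Y_l0(theta): Eq. (16) restricted to the
    m = 0 terms.  F_of_r is a list of radial callables, indexed by l."""
    return sum(F_l(r) * y_l0(l, theta) for l, F_l in enumerate(F_of_r))
```

The general case only adds the remaining m ≠ 0 symmetry-adapted harmonics to the inner sum; the radial/angular separation, which is the point of Eqs. (16)–(18), is unchanged.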
In Eqs. (16)–(18) above, the angular part is analytic and the radial part is numerical, as derived from the SCE procedure. This last fact suggests that, when one wants to calculate a given property over a given (r, θ, φ) grid, the most time-consuming section will be the calculation of F_{lh}(r), F′_{lh}(r) and F″_{lh}(r). In order to improve the calculation of F_{lh}(r) and of its r-derivatives, a natural way to proceed is to post-fit it, after the SCE procedure, with a suitable fitting function.
In a first approach, we used a cubic spline fitting [45] of the F_{lh}(r) function. The spline fitting has the property that the fitted function can be very efficiently evaluated at each r point but, more importantly, that one can obtain its first and second derivatives with respect to r from the same fitting parameters of the F_{lh}(r). Thus, once one has fitted the F_{lh}(r) function with a cubic spline, by calculating the four parameters needed at each r point, a single call to an evaluation routine can efficiently return F_{lh}(r), F′_{lh}(r) and F″_{lh}(r).
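A sketch of this post-fitting step follows, assuming SciPy's `CubicSpline` as a stand-in for the fitting routine actually used in SCELib (the function and parameter names here are illustrative, not SCELib's):

```python
import numpy as np
from scipy.interpolate import CubicSpline

def fit_radial(r, F):
    """Post-fit a radial coefficient F_lh(r) with a cubic spline, so
    that one evaluation yields F, F' and F'' from the same fit."""
    s = CubicSpline(r, F)
    d1, d2 = s.derivative(1), s.derivative(2)

    def evaluate(x):
        # the spline and both derivatives share the same piecewise
        # polynomial coefficients, as the text points out
        return s(x), d1(x), d2(x)

    return evaluate
```

Because a cubic spline is piecewise cubic, the first and second derivatives come for free from the four coefficients of the local polynomial, which is exactly the property exploited above.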
As far as the angular part of our SCE functions is concerned, a natural limit arises from the floating point representation of the Y_{lm} (i.e. the X): in fact, it is well known that over a certain value of l one reaches the limit of double precision floating point arithmetic. To overcome this limit, one can either use quadruple precision floating point arithmetic (currently available on 64-bit computers) or use specifically designed routines for the multiple precision evaluation of the Y_{lm}, also available on the Internet [46].
3.4.3. Numerical results
In this section we will report some calculation examples, limiting the resulting data to the pre-processing part of the whole electron-molecule SCE scattering procedure. These results refer to those recently reported in the literature [11], but are enriched with more information on the inner profiling of the SCELib code used for the tests. This could offer the reader a way to better analyze the relative importance of the various code sections, with the aim of extracting the numerically intensive kernels and evaluating them in detail. However, we should warn again the reader that we are referring to the pre-processing part of the whole electron-molecule computational procedure, and that the successive scattering calculation is likewise CPU and I/O demanding, as stated in a recently published paper [47], where a new method of integration of the scattering equations has been presented.
An even more interesting aspect of the pre-processing part to discuss would be the fact that, through an interface code (the SCELib API), one is able to use the pre-processing data to map a given molecular property (static potential, electron density, etc.) over a generic Cartesian grid. This, of course, could be of more general use in many related scientific areas, but we refer the interested reader to some preliminary results obtained using a prototype version of the SCELib API set of programs [43].
As a test case to discuss, we chose a C2v molecule, the SO2 system, with a number of electrons able to produce a timing long enough to avoid any random measurement error in the parallel runs. For the SO2 molecule, we started from its HF/D95* optimized geometry (R(SO) = 1.423 Å, θ(OSO) = 118.1°), using this electronic solution to derive the SCE wavefunction of Eq. (12) with the SCELib package. Furthermore, we used a grid of 300 × 96 × 84 points, for a total of 2 419 200 points; L_max was set to 10 and the Centre Of Mass (COM) was chosen as expansion centre. This required a scratch memory usage of about 300 MB, which is small enough to be easily managed by the server class machines under test. In the following, we decided to show only the results coming from the most time-consuming part of the SCELib package, the sphint function, which performs the Single Centre Expansion of the GTO wavefunction of an SCF run. In this program, the grid evaluation of the GTO wavefunction and the subsequent angular integration are carried out; to show its performances, Table 8 reports the overall elapsed time of the sphint code section together with the corresponding speed-ups on the IBM, Compaq and SUN SMP parallel machines of the p-threaded SCELib version, using from 1 to 14 nodes.
It is evident from the reported data that all the machines show a fairly linear speed-up and that, in the case of the E4500, this trend is maintained up to n = 14. We should note, however, that in this latter case we observe a super-linear speed-up between 4 and 12 nodes, which is quite unusual for this kind of parallel architecture, where for this type of floating-point intensive code one usually finds a performance bottleneck beyond 8 CPUs. This particular behaviour has been described and explained in detail in Ref. [11] and will not be repeated here.
Table 8
SO2 SCELib parallel runs on IBM, SUN and Compaq SMP architectures. Elapsed Time (E.T.) in seconds and speedup vs. the number of nodes

        IBM 43P-260          Compaq ES40          SUN E4500
Nodes   E.T. (s)  Speedup    E.T. (s)  Speedup    E.T. (s)  Speedup
1       208.9     1.0        104.1     1.0        393.3     1.0
2       101.1     2.0        51.6      2.0        183.3     2.1
4       -         -          25.9      4.0        93.6      4.2
8       -         -          -         -          45.1      8.7
12      -         -          -         -          32.0      12.3
14      -         -          -         -          28.5      13.8
More interesting to discuss here is the analysis of the relative performances of the leading computational parts of the SCELib code. To this end, we report in the following the profiling information of the test performed on the same molecular system cited above, but over a smaller grid (10 019 points) and serially run on a single Alpha ev67@667 MHz CPU, with the standard prof Unix profiler under the Tru64 V5.0A Operating System.
%time   seconds    cum %   cum sec   procedure (file)
51.6    17.4873    51.6    17.49     CalcMO (<scelib>)
47.0    15.9287    98.6    33.42     sphint (<scelib>)
 0.7     0.2529    99.3    33.67     pow_di (<scelib>)
 0.3     0.0967    99.6    33.77     SubPointC (<scelib>)
 0.1     0.0458    99.7    33.81     gauss (<scelib>)
 0.1     0.0283    99.8    33.84     MulPointC (<scelib>)
 0.1     0.0215    99.9    33.86     Plm (<scelib>)
 0.0     0.0083    99.9    33.87     setinp (<scelib>)
 0.0     0.0029   100.0    33.87     norm (<scelib>)
 0.0     0.0029   100.0    33.88     rotate (<scelib>)
 0.0     0.0020   100.0    33.88     monorm (<scelib>)
 0.0     0.0010   100.0    33.88     mkgrid (<scelib>)
It is evident from the data reported that 99.9% of the computing time refers to the part of the code running in parallel, that is, the sphint function. In fact, apart from the preparation routines (from setinp to mkgrid), the rest of the functions are called from sphint and, among these, the most time-consuming one is CalcMO. This behaviour confirms the linear speedup found over the three parallel SMP architectures used for the tests, where we used a p-threaded version of the sphint function, so that the whole computational kernel was eligible to be run in parallel.
It is interesting to point out that two routines, CalcMO and sphint, account for the majority of the CPU time spent for the calculation. In the former function the Gaussian wavefunction is calculated over a convenient (θ, φ) grid and then used by the latter (the "caller") to perform a Gauss-Legendre (Gauss-Chebyshev) angular integration. These two steps are necessary to evaluate the u_{lh,i}(r) terms of Eq. (11) and, once they are carried out, the one-electron density can be calculated and, from it, all the necessary molecular properties referred to a single centre of reference. The computing time of the evaluation/integration steps is strictly bound to the low-level routines used by the two functions cited above. While the evaluation step depends on the exp function calculation of the Gaussian basis set (see Eq. (3)), the integration phase is totally bound to the evaluation of the angular spherical harmonics Y_{lm} functions (see Eq. (11)). This behaviour is quite interesting because it shows explicitly the dependence of the two major stages of the pre-processing calculation phase performed in sphint on the basic low-level routines exp and Y_{lm}.
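The angular integration step performed in sphint can be illustrated with a Gauss-Legendre quadrature over θ; this is a simplified, single-function, m = 0 version of the real projection, which also integrates over φ and against the symmetry-adapted harmonics.

```python
import numpy as np

def angular_average(g, n):
    """Gauss-Legendre quadrature of \\int_0^pi g(theta) sin(theta) dtheta,
    done via the substitution x = cos(theta), which absorbs the
    sin(theta) weight into the measure."""
    x, w = np.polynomial.legendre.leggauss(n)   # nodes/weights on [-1, 1]
    return (w * g(np.arccos(x))).sum()
```

The quadrature is exact for integrands that are polynomials in cos θ up to degree 2n − 1, which is why a modest number of nodes suffices for band-limited expansions like Eq. (11) with L_max = 10.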
4. The commonly shared numerical behaviour

The first element of similarity among the codes we have analyzed so far is that we have a code implementation dependency. By this we mean that the user is not guaranteed that the same binary code is executed on different architectures, due to the fact, for example, that different algorithms can be used by low-level vendor routines at run-time.
A second element of similarity can be found by looking at the pre-processing behaviour of these codes (a situation that we refer to as job type dependency). By this we mean that the user is not guaranteed that the same code sections will perform at exactly the same computational rates when applied to different molecular systems. For example, if you change your molecule geometry, or even if you change some run-time parameters in the input, you are not assured that the same code section will perform at a constant computational rate.
This is a common behaviour of many codes in this and other scientific areas, which have years of development behind them and contain several thousands of FORTRAN/C source lines: it is well known to the users (end-users or developers) of the codes mentioned above that even the traditional (serial) profiling stage can be a cumbersome activity, due to the hidden complexity of the source files.
Nonetheless, we have seen that they share specific features which make it possible to derive some common conclusions on the best numerical environment for these applications. We would point out, in fact, the central role played by the exp function in quantum chemistry and in molecular physics, as stated in Eq. (3). At the same time, the BLAS low-level routines are of paramount importance in materials science as well as in quantum chemistry, as reported in Eqs. (1) and (9). Furthermore, the evaluation of the most time-consuming part of the potential used in classical MD of biomolecules, reported in Eq. (10b), closely relates to the expression of any SCE molecular property and its gradient/Laplacian of Eqs. (11)–(18), where the dependency on ψ at various ranks is clearly evident.
These last comments suggested that we attempt a summary of the numeric needs of these closely related computational areas. We sketch those requirements in a short report (Table 9), where we focus attention on those low-level routines which are the computational cores of the above-cited codes. These computational kernels are generally small, clean sections of code that we would expect to be ported onto highly optimized silicon chips dedicated to the high performance scientific computing community.
Table 9
A tentative attempt to summarize the most time-consuming low-level routines needed in (bio)chemical-physics simulations

Function class                    Function type
Basic mathematical functions      exp, Y_lm, ...
Linear algebra basic functions    BLAS 1-3, ...
Polynomial functions              Legendre, Hermite, Chebyshev, ...
Simple fitting functions          Splines, N-rank polynomials, ...
Series expansions                 Fast Fourier Transform, ...
5. Conclusions
We have seen in the previous sections how many aspects join apparently different scientific fields in the computational area of (bio)chemical-physics. From the computational side those similarities become even more evident, and a common set of functions and routines is frequently used in the majority of the codes adopted in this area of computing.
This suggests a reasonable request to electronic engineers: why not exploit the porting of these computational kernels onto optimized silicon chips? We are really confident that this could improve the performances of the codes used in these areas by several orders of magnitude. By this we are not underestimating the inner complexity of the design and manufacture of these Special Purpose (SP) devices, but we would conversely focus the attention of the designers on the possible benefits of these SP chips for this (not so) specific field of application.
First of all, let us imagine this SP chip integrated as a co-processor in any serial or parallel architecture by means of the PCI (Peripheral Component Interconnect) connector of its board. Apart from the key technical issues regarding the data bandwidth toward the General Purpose (GP) CPU, this component could co-operate with it to solve specific numeric problems, leaving to the main GP processor the remainder of the workload (I/O, device management, etc.). This hypothetical system, with SP and GP processors coupled together, could be a viable High Performance Computing (HPC) solution for this and many other computational areas, provided one is able to easily administer the SP processors within the GP system.
In fact, the SP boards could be designed to solve many different specific numerical problems (not necessarily only in the area of (bio)chemical-physics) but, in our opinion, the whole system should conform to the following requirements:
• The SP chip must have an Application Program Interface (API) so that it can be used as easily as the corresponding software functions it refers to.
• The SP API library should be as conformant as possible to the corresponding software counterparts and, in a second step, the SP API should be able to conform to the underlying hardware present (see, e.g., the role played by the GL/OpenGL library in the graphics area of computing).
• The best hardware solution for this computational area would be a combination of GP/SP processors with the maximum configuration flexibility (e.g., you change your GP system and maintain the SP boards on the new one).
This combination of GP/SP chips could operate at impressive computational rates with a minimal increase in hardware complexity, as the PC market clearly shows. In fact, the idea of merging GP and SP devices is not new: Intel Co. introduced three years ago the MMX SP (SIMD) unit into its series of Pentium processors, with significant performance gains in executing multimedia x86 instructions.
This last point opens the discussion on the possible benefits of a mixed GP/SP architecture for this area of computing, a topic which is outside the scope of this paper. We would in any case point out that, differently from other computational areas like High Energy Physics or Astrophysics,6 the codes mentioned above require top CPU as well as memory and storage I/O performances. Hence, the mixed GP/SP solution to the computing environment could surely represent the best viable way to High Performance Computing in the short-to-medium term.
Furthermore, the expected performances of the SP chips dedicated to this High Performance area of Computing are at least one order of magnitude better than those of the corresponding GP RISC processors (the SP chip will eventually perform better than the GP counterpart, as stated by the published vendor benchmarks). This, together with the expected feasibility of the SP board over a standard PCI bus, can improve the large diffusion of this kind of HPC solution, with immediate economic benefits for the producer.
6 The interested reader can refer to the articles on the parallel SP machines, APE and GRAPE, presented in this issue for any specific computational details.
Acknowledgements

We wish to thank Prof. F.A. Gianturco and Prof. L. Colombo for their kind help and many helpful discussions on the topics covered by this paper, Dr. A. Pimeiti for the Gaussian98 results, and the CASPUR computational centre for providing the computer architectures used in this work.
References

[1] See, e.g., E. Clementi, G. Corongiu (Eds.), METECC-95, STEF, Cagliari, 1995.
[2] M.J. Frisch, G.W. Trucks, H.B. Schlegel, G.E. Scuseria, M.A. Robb, J.R. Cheeseman, V.G. Zakrzewski, J.A. Montgomery, R.E. Stratmann, J.C. Burant, S. Dapprich, J.M. Millam, A.D. Daniels, K.N. Kudin, M.C. Strain, O. Farkas, J. Tomasi, V. Barone, M. Cossi, R. Cammi, B. Mennucci, C. Pomelli, C. Adamo, S. Clifford, J. Ochterski, G.A. Petersson, P.Y. Ayala, Q. Cui, K. Morokuma, D.K. Malick, A.D. Rabuck, K. Raghavachari, J.B. Foresman, J. Cioslowski, J.V. Ortiz, B.B. Stefanov, G. Liu, A. Liashenko, P. Piskorz, I. Komaromi, R. Gomperts, R.L. Martin, D.J. Fox, T. Keith, M.A. Al-Laham, C.Y. Peng, A. Nanayakkara, C. Gonzalez, M. Challacombe, P.M.W. Gill, B.G. Johnson, W. Chen, M.W. Wong, J.L. Andres, M. Head-Gordon, E.S. Replogle, J.A. Pople, Gaussian 98 (Revision A.1), Gaussian Inc., Pittsburgh, PA, 1998.
[3] See, e.g., W.F. van Gunsteren, X. Daura, A.E. Mark, GROMOS force field, in: N.L. Allinger, T. Clark, J. Gasteiger, P.A. Kollman, H.F. Schaefer III, P. Schreiner (Eds.), Encyclopedia of Computational Chemistry, Vol. 2, Wiley and Sons, London, 1998, pp. 1211–1216.
[4] S. Achterop, R. van Drunen, D. van der Spoel, A. Sijbers, H. Keegstra, B. Reitsma, M.K.R. Renardus, Gromacs: A Parallel Computer for Molecular Dynamics Simulations, in: Proceedings of Physics Computing '92, Vol. 1, World Scientific, Singapore, 1993.
[5] R. Car, M. Parrinello, Phys. Rev. Lett. 55 (1985) 2471.
[6] C.Z. Wang, K.M. Ho, in: I. Prigogine, S.A. Rice (Eds.), Advances in Chemical Physics, Vol. LXXXIX, p. 651.
[7] H.J.C. Berendsen et al., Selected parallel applications and numerical elements, in: Aspects of Computational Science, NCF, 1995.
[8] H.G. Ewen et al., Parallelization of Molecular Dynamics code: GROMOS87 parallelization for distributed memory architectures, in: Methods and Techniques in Computational Chemistry: METECC-95, STEF, 1995.
[9] D. Roccatano, R. Bizzarri, G. Chillemi, N. Sanna, A. Di Nola, J. Comput. Chem. 19 (1998) 685.
[10] See, e.g., http://www.caspur.it.
[11] N. Sanna, F.A. Gianturco, SCELib: a parallel computational library of molecular properties in the single centre approach, Comput. Phys. Comm. 128 (2000) 139.
[12] Digital Extended Mathematical Library (DXML), Compaq Computer Corp.
[13] Engineering and Scientific Subroutine Library (ESSL), IBM Corp.
[14] See, e.g., the recent unpublished work by K. Goto on some BLAS subroutines implemented on the Alpha ev56/ev6 microprocessors, available by directly contacting the author by e-mail.
[15] R.G. Parr, W. Yang, Density Functional Theory of Atoms and Molecules, Oxford University Press, New York, 1989.
[16] L. Colombo, M. Rosati, Parallel tight-binding molecular dynamics simulations on symmetric multiprocessing platforms, Comput. Phys. Comm. 128 (2000) 108.
[17] G. Chillemi, A. Fianna, M. Rosati, N. Sanna, The performance of chemical-physics codes on CASPUR parallel architectures, CASPUR Tech. Rep. #10, in preparation.
[18] A.J. van der Steen (Ed.), Aspects of Computational Science, NCF Publ., Den Haag, The Netherlands, 1995.
[19] C.C.J. Roothaan, Rev. Mod. Phys. 23 (1951) 69; G.G. Hall, Proc. Roy. Soc. (London) A 205 (1951) 541.
[20] E. Clementi (Ed.), Appendices 7C–7E in MOTECC-90, Modern Techniques in Computational Chemistry, ESCOM, Leiden, The Netherlands, 1990.
[21] R.C. Raffenetti, Chem. Phys. Lett. 20 (1973) 335.
[22] For a discussion of SCF convergence and stability see: H.B. Schlegel, J. McDouall, in: C. Ögretir, I.G. Csizmadia (Eds.), Computational Advances in Organic Chemistry, Kluwer, The Netherlands, 1991.
[23] W.J. Hehre, L. Radom, P.v.R. Schleyer, J.A. Pople, Ab Initio Molecular Orbital Theory, Wiley and Sons, New York, 1986.
[24] M.P. Allen, D.J. Tildesley, Computer Simulations of Liquids, Oxford University Press, Oxford, 1991.
[25] L. Colombo, in: D. Stauffer (Ed.), Annual Review of Computational Physics IV, World Scientific, Singapore, 1996, p. 147.
[26] I. Kwon, R. Biswas, C.Z. Wang, K.M. Ho, C.M. Soukoulis, Phys. Rev. B 49 (1994) 7242.
[27] http://www.netlib.org/lapack/.
[28] G.H. Golub, C.F. Van Loan, Matrix Computations, 3rd edn., Johns Hopkins University Press, Baltimore, MD, 1996.
