`
`
`ISSN OT92-8651
`
`Visit Witay Journals Oniine
`=‘. wcintersclence.witey.com
`WILEY-INTERSCIENCE
`
`;
`
`Petitioner Microsoft Corporation - Ex. 1012, Cover 1
`Petitioner Microsoft Corporation - Ex. 1012, Cover 1
`
`
`
`Journal of
`
`Computational —
`emist
`
`ORGANIC / INORGANIC/ PHYSICAL / BIOLOGICAL
`
`Warren J. Hehre
`Wavefunction, 18401 Von
`Karman, Suite 370
`Irvine, CA 92715
`
`Martin Karplus
`Department of Chemistry
`12 Oxford Street
`Harvard University
`Cambridge, MA 02138
`
`Peter A. Kollman
`Department of Pharmaceutical
`Chemistry
`School of Pharmacy
`University of California
`San Francisco, CA 94143
`
`Keiji Morokuma
`Department of Chemistry
`EmoryUniversity
`Atlanta, GA 30322
`
`Eiji Osawa
`Department of Knowledge-
`Based Information
`Engineering
`Toyohashi University of
`Technology
`Tempaku-cho, Toyohashi
`441, Japan
`
`John A. Pople
`Department of Chemistry
`Northwestern University
`2145 Sheridan Road
`Wilmette, IL 60208
`
`Peter Pulay
`Department of Chemistry
`University of Arkansas
`Fayetteville, AR 7270]
`
`Harold A. Scheraga
`Baker Laboratoryof Chemistry
`Cornell University
`Ithaca, NY 14853-1301
`
`Michael C. Zerner
`Departmentof Chemistry
`University of Florida
`Gainesville, FL 3261]
`
`Andrew Streitwieser, Jr.
`Departmentof Chemistry
`University of California
`Berkeley
`Berkeley, CA 94720
`Kenneth B. Wiberg
`Department of Chemistry
`Yale University
`225 ProspectStreet
`New Haven, CT 06520
`
`iditors:
`[ditorial Advisory Board:
` Enrico Clementi
`Norman L. Allinger
`Paul G. Mezey
`Department of Chemistry
`Leo Radom
`Departinent of Chemistry
`Department of Chemist ry
`Computational Centerfor
`Research School of Chemistry
`University L. Pasteur, 3
`University of Saskatchewan
`MolecularStructure and
`The Australian National
`rue de l'Université
`Saskatoon, Canada S7N OWO
`Design
`University
`67084 Strasbourg, France
`University of Georgia
`Canberra, A.C.T. 0200
`Athens, GA 30602-2526
`Australia
`Paul von R. Schleyer
`Computer Chemistry Center
`Institut ftir Organische Chemie
`Universitat Erlangen-Niirnberg
`William L. Jorgensen
`Henkestrasse 42, D-91054
`Departinentof Chemistry
`Erlangen, Germany
`Yale University
`Associate Editor:
`PO Box 208107
`NewHaven, CT 06520-8107
`Gernot Frenking
`Fachbereich Chemie
`Philipps-Universitat Marbu re
`Hans-Meerwein-Strasse
`D-35032 Marburg, Germany
`Assistant Editors:
`
`Martin Feigel
`Institut fiir Organische Chemie
`Universitat Bochum
`Universitatsstr. 150
`D-44780 Bochum, Germany
`RogerS. Grev
`Departmentof Chemistry
`University of Kentucky
`Lexington, KY 40506
`Editorial Production, John Wiley: Merilee Croft Olson
`
`The Journal of Computational Chemistry (ISSN: 0192-8651)is published 16 timesa year, monthly and
`cessing a change of address, For Subscription inquiries, please call 212-850-6645; E-mail:
`semi-monthly in January, April, July, and November, one volume per year by Jolin Wiley & Sons, Inc.:
`SUBINFO@jwiley.com
`605 Third Avenue, New York, NY, 10158.
`Postmaster: Send address changes to Journal of Computational Chemistry, Caroline Rothayth.
`Copyright @ 1998 John Wiley & Sons,
`Inc. All
`rights reserved. No part of
`this publication
`Director, Subscription Fullillment and Distribution, Jolin Wiley & Sons.
`inc. 605 Third Avenue,
`may be reproduced in any form or by any means, except as permitted under section 107 or
`New York, NY 10158. For subscription inquities, please call cuslomer
`service at 212-850-
`108 of
`the 1976 United States Copyright Act, without either
`the prior written permission of
`6645 or write to the above address.
`Advertising:
`Inquiries
`conceming
`advertising
`should
`be
`forwarded
`to
`Susan
`Levey,
`Advertising Sales, John Wiley & Sons,
`ine. 605 Third Avenue, New York, NY 10158. 212-
`York, NY, and al additional mailing offices.
`850-8832. Advertising Sales, European Contacts: Bob Kem or Nicky Douglas, Jotin Wiley & Sons, Ltd.
`indicale the copyright holders consent
`The code and the copyright notice in the journal
`Baffins Lane, Chichester. West Sussex PO19 1UD, England. Tel: 44 1243 770 350/367: Fax: 44 1243
`that copies may be made for personal or internal use, or for
`the personal or
`internal use of
`Specific
`clients, on the condition that
`the copier pay for copying beyond that permitted by
`770 432: e-mail: adsales@wiley.co.uk,
`Reprints: Reprint sales and inquities should be directed to the customer service department.
`Sections 107 and 108 of
`the United States Copyright Law. The per-copy fee for each item is
`John Wiley & Sons, Inc., 605 Third Ave., New York, NY 10158: tel: 212-850-8776.
`to be paid through the Copyright Clearance Center, Inc.
`Manuscripts should be submitted to one of
`lhe editors: Or. Norman L. Allinger, Journal at
`This consent does not
`extend to olher kinds
`of
`copying,
`such as copying for general
`Computational Chemisty, Department of Chemistry, The University of Georgia. Athens, GA
`distribution,
`for advertising or promotional purposes,
`for creating new collective works, or
`30602;
`Prof. Dr.
`Paul
`von R Schleyer,
`Journal of Computational Chemistry, Computer
`resale, Such permission requests and other permission inquiries
`should be addressed to
`for
`Chemistry Center,
`Institut
`fir Organische Chemie, Universitat Erlangen-Nomberg, Henkestr. 42,
`the Permissions Dept.
`0-91054, Erlangen, Germany: or Prof. Gernot Frenking, Journal of Computational Chemistry,
`Subscription price (Volume 19, 1998): $1,198.00 in US, $1,358.00 in Canada and Mexico,
`Fachbereich Chemie, Philipps-Universitat Marburg, Hans-Meenwein-Strasse, D-35032 Marburg, Germany.
`31,475.00 outside North America. $150.00 US ACS member, $246.00 outside US ACS member. Per-
`{information for contributors appears in thefirst andlast issue of each volume. All other correspondence
`sonal rate: $150.00 in US, $246.00 outside North America. Subscriptions at the personalrate are avail-
`should be addressed to Journal af Computational Chemisiry, Publisher, Interscience Division, Profes-
`able only to individuals, All subscriptions outside US will be sent by air. Payment must be made in US
`sional, Reference, and Trade Group, John Wiley & Sons Inc.. 605 Third Avenue, New York, NY 10158.
`dollars drawn ona US bank. Claims for undelivered copies will be accepted only after the following issue
`The contents of this journal are indexed in the following: Chemical Abstracts, Chemical Tilles,
`has been received. Please enclose a copy of the mailing label. Missing copies -will be Supplied when
`Current Contents/Physical, Chemical, & Earth Sciences, Mathematical Reviews, Reference Update,
`losses have been sustainedin transit and where reserve stock permits. Please allow four weeks for pro-
`esearch Alert (ISI), Science Citation Index (ISI), and SCISEARCH Database SH).
`== This paper meetsthe requirements of ANSUNISO 239.45-1992 (Permanenceof paper),
`
`Petitioner Microsoft Corporation - Ex. 1012, Cover 2
`Petitioner Microsoft Corporation - Ex. 1012, Cover 2
`
`
`
`Journal of
`
`Computational
`Chemistry
`
`Volume 19/Number 7/May 1998
`
`CONTENTS
`
`Development of a Parallel Molecular Dynamics Code on SIMD Computers:
`Algorithmfor Use of Pair List Criterion
`D, Roccatano, R. Bizzarri, G. Chillemi, N. Sanna, and A. Di Nola
`
`Multivariate Analysis of a Data Matrix Containing A-DNA and B-DNA
`Dinucleoside Monophosphate Steps: Multidimensional Ramachandran
`Plots for Nucleic Acids
`
`M. L, M. Beckers and L. M. C. Buydenis
`
`AbInitio Prediction of "N-NMR Chemical Shift in «-Boron Nitride
`Based on an Analysis of Connectivities
`Marcus Gastreich and Christel M. Marian
`
`The Flying Ice Cube: Velocity Rescaling in Molecular Dynamics
`Leads to Violation of Energy Equipartition
`Stephen C. Harvey, Robert K.-Z. Tan, and Thomas E. Cheatham UI
`
`Systematic Prediction of the Products and Intermediates of Isotopic
`Labeling in Reaction Pathway Studies
`AndrewV. Zeigarnik and Rail E. Valdés-Pérez
`
`Electrostatic Model for Infrared Intensities in a Spectroscopically
`Determined Molecular Mechanics Force Field
`
`Kin Palio and Samuel Krinim
`
`Solvation Free Energies Calculated Using the GB/SA Model: Sensitivity
`of Results on Charge Sets, Protocols, and Force Fields
`M. Rami Reddy, Mark D. Erion, Atul Agarwal, Vellarkad N, Viswanadhan
`D, Quentin McDonald, and W. Clark Still
`
`685
`
`695
`
`716
`
`726
`
`741
`
`754
`
`769
`
`Volume 19, Number 7 was mailed the week of April 13. 1998
`
`‘Continued on next page)
`
`Visit Wiley Journals Online
`“WILEY
`InterS@ience’
`www.interscience.wiley.com
`
`Petitioner Microsoft Corporation - Ex. 1012, ToC i
`Petitioner Microsoft Corporation - Ex. 1012, ToC i
`
`
`
`(Continued)
`
`Method of Calculating Band Shape for Molecular Electronic Spectra
`Greg M.Pearl, M. C. Zerner, Anders Broo, and John McKelvey
`Neighbor-List Reduction: Optimization for Computation of Molecular
`van der Waals and Solvent-Accessible Surface Areas
`Jorg Weiser, Armin A. Weiser, Peter S. Shenkin, and W. ClarkStill
`
`|
`
`Erratum
`
`781
`
`797
`
`3809
`
`Petitioner Microsoft Corporation - Ex. 1012, ToC ii
`Petitioner Microsoft Corporation - Ex. 1012, ToC 11
`
`
`
`———————
`
`Developmentof a Parallel Molecular
`Dynamics Code on SIMD Computers:
`Algorithm for Use of Pair List Criterion
`
`eee
`
`D. ROCCATANO;! R. BIZZARRI,”? G. CHILLEMI,? N. SANNA,”
`A. DI NOLA!
`‘Dipartimentodi Chintica, Universita ‘La Sapienza,” Ple Aldo Moro 5, 00185 Rome, Italy
`“CASPURConsorzio perle Applicazioni di Supercalcolo per Universita e Ricerca, c/o Universita “La
`Sapienza,” Rome, Italy
`‘Dipartimento di Fisica, Universita “La Sapienza,” Rome, Italy
`Received 24 April 1996; accepted 24 October 1997
`
`
`
`ABSTRACT:In recent years several implementations of molecular d)yamics &
`(MD)codes have beenreported on multiple instruction multiple data (MIMD) i AK
`machines. However, very few implementations of MD codes on single instructi6n==
`multiple data (SIMD) machines have been reported. The difficulty in using pair
`lists of nonbonded interactions is the major problem with MD codes for SIMD
`machines, suchthat, generally, the full connectivity computation has beenused.
`Wepresent an algorithm, the global cut-off algorithm (GCA), which permits the
`use of pairlists on SIMD machines. GCAis based on a probabilistic approach
`and requires the cut-off condition to be simultaneously verified onall nodes of
`the machine. The MD code used was taken from the GROMOSpackage; only
`the routines involved in the pairlists and in the computation of nonbonded
`interactions were rewrittenfora parallel architecture, The remaining calculations
`were performed on the host computer. The algorithm has been tested on
`Quadrics computers for configurations of 32, 128, and 512 processors ancl for
`systems of 4000, 8000, 15,000, and 30,000 particles. Quadrics was developed by
`Istituto Nazionale di Fisica Nucleare (INFN) and marketed by Alenia Spazio.
`© 1998 John Wiley & Sons, Inc.
`J Comput Chem 19: 685-694, 1998
`Keywords: molecular dynamics; domain decomposition algorithm; parallel
`computers; pair list for molecular dynamics code on SIMD machines; array
`processorelaborator (APE)
`
`Correspondence fo: A, Di Nola
`Contract/grant sponsors: Ente per le Nuove Tecnologie,
`"Energia e l’Ambiente; Alenia Spazio
`:
`
`Journal of Computational Chemistry. Vol. 19, No. 7, 685-694 (1998)
`©1998 John Wiley & Sons,Inc.
`.
`;
`
`CCC 0192-8651 / 98 / 070685-10
`
`Petitioner Microsoft Corporation - Ex. 1012, p. 685
`Petitioner Microsoft Corporation - Ex. 1012, p. 685
`
`
`
`ROCCATANO ETAL.
`
`Introduction
`
`ge lassical molecular dynamics (MD)is used to
`
`study the properties of liquids, solids, and
`molecules.'* The Newton equation of motion for
`eachparticle of the system is solved by numerical
`integration andits trajectory is obtained. From this
`microscopic point of view, many microscopic and
`macroscopic properties can be obtained. The need
`for numerical
`integration limits the time step to
`the femtosecond scale and makes MD simulation a
`very time consuming task. Therefore, considerable
`efforts have been concentrated on optimizing MD
`codes on parallel computers of different architec-
`tures,
`Parallel computers are frequently described as
`belonging to one of two types: single instruction
`stream multiple data stream (SIMD), or multiple
`instruction stream multiple data stream (MIMD).
`In general, SIMD machines have a simpler archi-
`tecture, but
`they have hardware limitations be-
`cause the same instruction is executed in parallel
`on every SIMD processor and, furthermore, some
`SIMD machines do not have local addressing; that
`is, the processors are not able to access their own
`memory using different local addresses. In recent
`years, several MD codes have been implemented
`on MIMD architectures with a few dozen of
`processors*~* and, more recently, also on 100- to
`1000-processor MIMD machines."~* Parallel inyple-
`mentations of biological MD programs such as
`CHARMM?”and GROMOS" on MIMD machines
`have been discussedin the literature.*!!~'3
`Less work has been done using SIMD
`systems.'"~'” In general, they make use of the full
`connectivity computation;
`that
`is, all atom pair
`interactions are calculated, and are useful
`for
`long-range force systems. This is due to the diffi-
`culty of using pair lists of nonbonded atoms on
`SIMD machines with nolocal addressing.
`In the present study we propose an algorithm
`that permits the use ofpairlists in a MD code fora
`SIMD machine with nolocal addressing. The algo-
`rithm requires simultaneous use of multiple time
`step’ and geometric decomposition” methods. In
`addition,
`the systolic loop’ method is used to
`further reduce computation time.
`The method was tested on Quadrics com-
`puters,""~?! a class of SIMD machines developed
`by INFN and Alenia Spazio, for configurations of
`32, 128, and 512 processors. Quadlrics is the only
`
`massive parallel computer developed with fully
`European technology. As
`the Europort2/PACC
`project!’ has shown,
`the scalability for a MD
`code on MIMDarchitecture, for complex systems
`such as a protein in solution, is generallysatisfac-
`tory only up to 12-16 nodes.
`Moreover,
`there are interesting projects being
`undertaken on mixed architecture MIMD/SIMD
`machines that could supply the computational
`powerof a SIMD machine, together with the flexi-
`bility of a MIMD.It is therefore worthwhile to
`determine whetherthese machinesare able to per-
`form such calculations.
`The following molecular systems have been
`used as tests:
`
`«
`
`*
`
`«
`
`«
`
`System 1: Box of 1536 water molecules (4608
`atoms).
` Systent 2: Box with a BPTI (bovine pancreatic
`trypsin inhibitor) molecule and 2712 water
`molecules (8704 atoms).
`Systent 3: Box with a SOD (superoxide dis-
`mutase) dimer and 4226 water molecules
`(15,360 atoms).
` Systent 4: Box with a SOD (superoxide dis-
`mutase) dimer and 9346 water molecules
`(30,720 atoms).
`
`It should be noted that system 4 is nearly the same
`as test case [3, used as the industrial benchmark in
`the framework of the Eurosport2/PACC project!
`(system 4 has 9346 water molecules whereas test
`case 13 has 9436 water molecules). The results
`showthatthe speed-up of the algorithmis compa-
`rable to those obtained with MIMD machines.
`
`
`
`Hardware
`
`Wetested the method on a Quadrics machine,
`Alenia Spazio’s supercomputer derived from the
`APE100 (Array Processor Elaborator) project, de-
`veloped by INFN.'’~*! These computers are a fam-
`ily of massively parallel SIMD machines with im-
`plementations from 8
`to 2048 processors. The
`biggest implementation allows a peak computing
`power of 100 GFlops in single precision (32-bit
`processors).
`The processors are arranged in a three-dimen-
`sional
`(3D) cubic mesh and can exchange data
`with the six neighboring nodes, with periodic
`boundaries. Each processor board contains eight
`processors (floating point units) with their own
`
`686
`
`
`
`VOL. 19, NO. 7
`
`Petitioner Microsoft Corporation - Ex. 1012, p. 686
`Petitioner Microsoft Corporation - Ex. 1012, p. 686
`
`
`
`ne ENeeeAa
`
`the bonded poten-
`Thefirst four terms represent
`tial. b, by, and K, are the actual’ bond length, its
`reference value, and the bond stretching force con-
`stant, respectively. 6, ,, and K, are the actual
`bond angle,
`its
`reference value, and the angle
`bending force constant, respectively, &, &, and K,
`are the actual
`improper dihedral angle,
`its refer-
`ence value, and the improperdihedral angle bend-
`ing force constant, respectively. ¢, K., a, and 6
`are the actual dihedral angle, its force constant, the
`multiplicity, and the phase, respectively. The last
`term in the equation includes the nonbonded, van
`der Waals, and Coulombic terms.
`€,,
`and o;; are
`the dispersion well depth and the Lennard-Jones
`distance, q; and q; are the electrostatic atomic
`charges, r;;
`is the distance between them, and «is
`the dielectric constant.
`The time step used for the numerical integration
`is in the femtosecondscale. The highest frequency
`of bond vibrations would require a time step < 0,5
`fs; however,
`if the simulation is performed with
`constant bondlengths, the time step canbe < 2fs,
`For this reason, many MD codes perform simula-
`tions with constant bond lengths.
`The most frequently used algorithm to perform
`MD simulation at constant bond lengths is
`the
`SHAKE algorithm based
`on
`an_
`iterative
`procedure.
`
`ee
`Compulational Algorithmfor
`NonbondedInteractions
`
`In a MD program,the calculation of the non-
`bondedforces is the most time-consuming task—in
`fact, it takes about 90% of the computational time,
`depending on the protocol used.
`One of the most frequently used techniques to
`reduce the number of nonbonded forces is the
`cut-off criterion. With this methodthe interactions
`between atoms beyond a cut-off distance are ne-
`glected. If the cut-off radius is appropriate the lost
`energy contributionto the global potential is small.
`During a small numberof steps the pairs of inter-
`acting atoms are considered to remain the same so
`that it
`is possible to create a list of these interac-
`tions, the nonbondedpairlist, which will be up-
`dated every n steps (1 is generally equal to 10).
`The number of nonbondedinteractions is N(N —
`1)/2 (N is the number of atoms), so that
`it
`is
`proportional to N*. The use of the cut-off criterion
`reduces this numberto kN (k is a constant).
`
`memory(4 megabytes). Up to 16 boards can be put
`into a crate. Configurations with more than 128
`processors are made up connecting crates of 8
`4+ x 4 (128) nodes.
`The Quadrics controller board contains oneinte-
`ger CPU (Z-CPU), which controls the flux of the
`program and the integer arithmetic. The language
`used is called Tao, a Fortran-like high-level paral-
`lel language, which can be modified and extended
`through a dynamic ZZ parser, The Quadrics ma-
`chine is connected to a host computer (a Sun
`Sparc-10 or -20), A host interface based on a HIPPI
`standard, which allows an 1/O speed betweenthe
`host and Quadrics of 20 MB/s, has recently been
`developed. The tests on the sequential machine
`have been run on a DEC-alpha 3000/500 machine.
`Barone et al.” compared the accuracy of Quadrics
`in the field of molecular dynamics with that ofa
`conventional computer to assess the limits of the
`single precision.
`
`eee
`Molecular Dynamics
`
`In a molecular dynamics simulation, the classi-
`cal equations of motion for the system of interest
`are integrated numerically by solving Newton's
`equations of motion:
`,
`d-rI
`ay >> = —VV (1, 75,....8,)
`Pte?
`i
`1
`2
`nN
`
`The solutiongives the atomic positions and veloci-
`ties as a function of time. The knowledge of the
`trajectory of each atom permits study of the dy-
`namic orstatistical properties of the system, The
`formof the interaction potential is complex andit
`incluces energy terms that represent bended and
`nonbonded (van der Waals and Coulombic) inter-
`actions:
`
`Vr Typ... ty)
`-yi
`bonds 2
`
`K,lb = bP +
`angles
`
`I
`‘
`
`K,[@— 0.]*
`[
`]
`
`1 L
`
`+ » 5 Kel a Eq]
`
`i
`
`%
`
`improper
`dihedrals
`fe K.[1 + cose — 8)]
`dihedrals
`
`$
`
`+ te; AZ 16
`
`12
`OF;
`
`i
`
`6
`O;;
`
`ij
`
`pairs ty, f)
`
`qj yj
`
`darat
`
`ij
`
` JOURNAL OF COMPUTATIONAL CHEMISTRY
`687
`
`Petitioner Microsoft Corporation - Ex. 1012, p. 687
`Petitioner Microsoft Corporation - Ex. 1012, p. 687
`
`
`
`ROCCATANO ET AL.
`
`the cut-off criterion is not di-
`Unfortunately,
`rectly applicable to SIMD architecture, as the same
`instruction is executed at each time on each pro-
`cessor and, consequently,it is not possible to have
`a local branch (the cut-off condition) in the pro-
`gram flow. Moreover, on Quadrics it is not possi-
`ble to have a local pair list on each node because
`local addressing of arrays is not possible. This
`explains the lowerlevel of efficiency of a MD code
`on a SIMD machine with respect to a MIMD one.
`We have recently developed an algorithm,
`the
`global cut-off algorithm (CGA), based ona proba-
`bilistic approach, which allows the use of the cut
`off condition on a SIMD machine with no local
`addressing.
`Because the calculation of bonded interactions
`and the integration of the trajectory take a small
`amount of the total calculation time, in the present
`version we have chosen tocarry them out on the
`front-end computer and to performonlythe calcu-
`lationsof the pairlist and of the nonbondedforces
`using the SIMD machine.It is, of course, possible
`to performall force calculations and integration in
`the parallel machine using, for instance, the repli-
`cated data method.®
`
`GEOMETRIC DECOMPOSITION
`
`The assignment of the atoms to the nodes is
`obtained by a dynamically geometric decomposi-
`tion’ in such a way that
`the same number of
`atoms is assigned to each node. In what follows,
`we discuss a decomposition for a bidimensional
`case; the extensionto a third dimensionis straight-
`forward: given the bidimensional box of Figure la
`and a 2D parallel topologyof 1 =n » X il, proces-
`sors, with 1, =, = 2,
`the box is first divided
`into it, boxes along the x-axis, as shownin Figure
`Ib, each containing the same number of atoms.
`Each box is successively divided into i, boxes
`along the y-axis in such a waythat each one ofthe
`i", Xm, boxes contains the same numberof atoms
`(Fig. 1c), When,as ina real case, a third dimension
`exists, a successive division along the z-axis has to
`be performed.
`It is obvious that, before performing anydivi-
`sion along a given axis,
`it is necessary to sort the
`atoms of each box along that axis.
`The density of a molecular system, such as a
`protein,
`is not uniform;
`thus,
`the boxes do not
`have the sameaxis lengths. However, these differ-
`ences do not significantly reduce the efficiency of
`the GCA described in what follows.
`
`
`
`FIGURE 1. Domain decomposition of the molecular
`system in boxes with the same numberof atoms, for a
`bidimensional case.
`
`SYSTOLIC LOOP METHOD
`
`Quadrics topology makes it possible to use
`a systolic loop to calculate the nonbondedinter-
`actions between the atoms assigned to the dif-
`ferent nodes. The systolic loop method is one of
`the most efficient algorithms for calculation of
`two-body interactions on MIMD and SIMD
`machines."'7!25" The systolic loop algorithm
`passes the coordinates of all atoms around a ring
`of P processors in P/2 steps, such that half of the
`coordinates passes every processor exactly once
`(transient atoms). Each nodealsostores the coordi-
`nates of a group of atoms of the overall system
`(resident atoms). During the systolic cycle each
`processor evaluates and accumulates the interac-
`tions of the resident atoms with thetransient ones,
`Onlyhalf of the atoms have to pass in each com-
`putational node as a consequence ofthe reciprocity
`ofthe interactions.
`The systolic loop path for a 32-node Quadrics
`machine is shown in Figure 2. This machine has
`two nodes along the y and = directions and eight
`along the x direction.
`The geometric decomposition of the system per-
`mits limitation of the search for nonbondedinter-
`actions only to the neighboring processors nearer
`than the cut-off radius, so that, depending on the
`number of nodes and on the system size,
`it
`is
`generally not necessary to perform the complete
`systolic loop. The computedforces are passed back
`to the owning processor to accumulate the full
`force,
`
`GLOBAL CUT-OFF ALGORITHM
`On a SIMD machine, all nodes simultaneously
`evaluate an interaction, but the atom pairs in each
`nocle are different, Figure 3 shows, as an example,
`
` 688
`VOL. 19, NO. 7
`
`Petitioner Microsoft Corporation - Ex. 1012, p. 688
`Petitioner Microsoft Corporation - Ex
`. 1012, p. 688
`eeeeee
`
`
`
`(0.0.15
`
`Ly
`
`\ RIGHT 09
`Ss
`FRONT (y)
`
`FIGURE 2. Systolic loop path for node (0,0,0) of a
`32-node Quadrics machine. The transient groups of
`atomsvisit only four neighboring y-z planes, based on
`Newton's third law.
`
`the case with four nodes: suppose that each node
`is evaluating the interaction a;;
`in this case, all a,
`interactions fall within the cut-off radius. When
`the interactions are of the b, type all the distances
`fall outside the cut-off radius and the interactions
`b; are skipped.In the caseof interactions of typec;
`the interactionis outside the cut-off radius in nodes
`1, 2, and 3, but it is inside the cut-off radius in
`
`
`
`FIGURE 3. Different types of interactions in a case of
`four nodes. a;, all the interactionsfall within the cut-off
`distance; b;, all the interactionsfall outside the cut-off
`distance; ¢,, one interactionfalls within the cut-off,
`whereasthree fall outside the cut-off. P.U. = processor
`unit.
`
`node 4, so that all nodes have to calculate this
`interaction and only will be saved in the forces
`calculation. If the atoms in each nodeare ordered
`randomly,
`the interactions of type c;
`result
`in
`being the most frequent.
`The basis idea of the global cut-off algorithm
`(GCA) is to maximize the occurrence of interac-
`tions of type a; and b; and, conversely, reduce the
`occurrence of interactions of type c,. To this end,it
`is necessary that the atoms inall nodes are sorted
`with the samecriterion. Different types of sorting
`give comparable results. We have chosen the one
`shownin Figure 4. After this sorting procedure, a
`list of the interactions of type a; and c,
`is created
`in the integer CPU (Z-CPU) of the SIMD machine.
`This list
`is equivalent, but not
`identical,
`to the
`nonbonded pair list used in most MD programs
`and will be referred as the nonbonded pairlist.
`The ordering procedures for the domain decom-
`position and the sorting procedure previously de-
`scribed are time consuming and have to be per-
`formed on the host serial machine; however, as
`will be shown, they have to be performed every
`100 to 200 steps so that they do not significantly
`affect the global computationtime,
`The global cut-off conditionis based ona proba-
`bilistic approach, so that the numberof pair inter-
`actions to be calculated is larger than the actual
`number of pairs included within the iy, Cistance.
`Depending on the molecular system and on the
`numberof nodes, the ratio between the numberof
`the calculated interactions and the numberof in-
`teractions actually included within the cut-off dis-
`
`
`
`
`
`
`
`
`
`
`
`FIGURE 4.Sorting of atomsin each nodefor a
`bidimensional case. The atoms are represented asfull
`circles.
`
` JOURNAL OF COMPUTATIONAL CHEMISTRY
`
`689
`
`Petitioner Microsoft Corporation - Ex. 1012, p. 689
`Petitioner Microsoft Corporation - Ex. 1012, p. 689
`
`
`
`ROCCATANO ET AL.
`
`tance varies from the three tofive. In this sense,
`the GCAis not veryefficient. However, it must be
`noted that almost all pair interactions within a
`distance ra. <r < ra, + Ar, with Ar =02 nm,
`are calculated. As an example, given r.,, = 0.6 nm,
`94% of the interactions in the range 0.6 <r <0.8
`nm are calculated and only 6% of pairs in this
`range are lost. This suggests adoption of a cut-
`off value of rec,, to be used in the global cut-off
`condition, somewhat shorterthan the actual cut-off,
`ate Cesired for the simulation.
`In the previous
`example, with roc, = 0.6 and 7. = 0.8 nm,
`the
`numberof pairs to be calculated is roughly two-
`to-threefold larger than the actual number of pairs
`within the r.,, distance. The remaining 6% of
`interactions in the range 0.6 < r < 0.8 nm have to
`be calculated separately.
`It
`is well known that nonbonded forces vary
`more slowly than the bonded ones. Moreover, non-
`bonded forces at large distances vary slower than
`nonbonded forces at short distances. This suggests
`updating of the forces at different steps, according
`to their nature (bonded and nonbonded) and to the
`distance of the interaction. The short-range interac-
`tions can be evaluated every step, and long-range
`interactions every imsteps. Accordingly,
`the few
`interactions in the range 0.6 <r <0.8 nm that
`were lost using roc, = 0.6 nm can be updated
`everyi steps. As these interactions are calculated
`while evaluating the nonbonded pair list (i.e., up-
`dated every1 steps), we have chosen m = 1= 10,
`It must be noted that there are nowtwoshells:
`an inner shell
`(7 < 0.6 nm) and an outer shell
`(0.6 < r < 0.8 nm). All interactions are evaluated
`every Hi steps, whereas only those interactions
`corresponding to the inner shell are evaluated ev-
`ery step. It is therefore not necessary to have a skin
`distance and to storea list of atom pairs outside
`the outer shell.
`
`In the present studyit is seen that most interac
`tions in the range0.6 to 0.8 nmare evaluated every
`step and onlya fewof themare evaluated every i
`steps. According to all of the MTSalgorithms,this
`choice does not affect the numerical accuracy; in
`fact, the same accuracyis obtained when an inter-
`action, within the outer shell,
`is evaluated every
`short time step or every long time step. However,
`as every long time stepall interactions within the
`outershell are evaluated, it is possible to perform
`the MTS according to the classical procedure; that
`is, by collection all
`the interactions within the
`outer shell at every long time step and collecting
`only the interactions within theinnershell at every
`short
`time step. Among several algorithms pro-
`posed for the multiple time step (MTS)!!*76 we
`have chosen the one developed by Scully and
`Hermans."*
`It must be noted thatall nonbondedinteractions
`(van der Waals and Coulombic) between bonded
`anc nonbonded atoms are calculated in this step,
`but theinteractions between bonded atomsare not
`saved. This is obtained by attaching to each atom a
`list of the atoms bondedtoitself, This procedure is
`certainly not efficient, but
`the time required to
`performit is negligible. In the following tests, the
`values of ige, and r.,, are fixed at 0.6 and0.8 nm,
`respectively,
`
`Resulis
`
`Table I shows the numberof interactions within
`the cut-off radius r.,, compared with the number
`of interactions to be evaluated with the GCA. It
`can be noted that the numberofinteractions for
`calculation is twoto three times the actual number
`of interactions within the cut-off radius. The time
`
`TABLE 1.
` Number of ActualInteractions, N, within a Cut-Off Radius (r.4,) = 0.8 nm Compared with the Numberof
`Interactions Calculated Using GCA on 32-, 128- and 512-Node Quadrics Machines.
`
`N
`32 nodes
`128 nodes
`512 nodes
`
`System 1
`(4608 atoms)
`488,847
`706,944
`840,320
`873,984
`System 2
`(8704 atoms)
`891,252
`1,612,864
`1,810,944
`2,070,016
`(15,360 atoms)
`1,487,054
`3,153,344
`3,768,576
`4,452,864
`(30,720 atoms)
`3,129,972
`6,754,240
`8,249,216
`9,938,944
`
`
`System 3
`
`System 4
`
`
`690
`VOL. 19, NO. 7
`
`Petitioner Microsoft Corporation - Ex. 1012, p. 690
`Petitioner Microsoft Corporation - Ex. 1012, p. 690
`
`
`
`required for performing the MD simulation with
`the present code is the sumof the following steps:
`
`1. Ordering procedure (performed every k
`steps by the host computer),
`2. Calculation of the nonbonded pairlist (per-
`formed every 1 steps by Quadrics),
`Y Calculation of the nonbonded forces (per-
`formed every step by Quadrics).
`3a, Calculation of the bondedforces by the host
`computer while performing step 3.
`4, I/O host @ Quadrics.
`wr
`SHAKE and integration (performed by the
`host),
`
`time-consuming
`The ordering procedure is a
`task and, dueto the diffusion of the system, it has
`to be periodically repeated every k steps. If no
`reordering is performed the nonbonded pairlist will
`incluce an increasing numberof interactions, thus
`increasing the computationaltime. Figure 5 shows
`
`RNumberofinteractionsx10° 6.7:
`
`o
`
`20
`
`120
`
`140
`
`—
`460
`
`—
`——+
`100
`80
`40
`60
`numberof steps
`FIGURE 5. Numberofinteractions to be calculated
`versus the numberof steps for system 4 on a 32-node
`machine. The atoms are not reordered during the
`simulation.
`
`the numberofinteractions to be evaluated versus
`the numberofsteps when the ordering procedure
`is performedat the beginning and not updated, for
`system + on a 32-node machine,
`The loss of performanceis nearly linear, being
`~ 0.08% per step. The optimum k value depends
`on the time required for the ordering procedure
`and on the time required for items 2 and 3.
`It
`showsthat, forall the systems andall the different
`numbers of nodes, the optimum k valueis in the
`range of 100 to 200 steps. The ordering procedure
`for system 4 on a Sun Spare-20 (the host computer)
`required 20 seconds, so thatits cputime per stepis
`in the range of 100 to 200 ms.
`The nonbonded pair list
`is evaluated every H
`steps (1 = 10 in the present case) and the non-
`bondedinteractions are evaluated every step. The
`average cpu time required for these tasks is re-
`ported in Table Il for different system