`
`Nucleic Acids
`Research
`
`JUL 0 8 2003
`Science Library
`University of California,
`Riverside
`
`OXFORD UNIVERSITY PRESS
`
`ISSN 0305 1048 Coden NARHAO
`
`Miltenyi Ex. 1023 Page 1
`
`
`
`Subscriptions
`
`Nucleic Acids Research is published twice monthly (24
`issues per year).
`Subscriptions are entered on a calendar year basis only and are available as a
`printed version (including online access) or as online access only at a discount of
`10% (+VAT in UK). Prices include postage by surface mail or, for subscribers in
`the USA and Canada, by air freight, or in India, Japan, Australia and New Zealand,
`by Air Speeded Post. Airmail rates are available on request.
`Annual subscription rate (Volume 31, 2003):
`Institutional
`Print and Online site licence: UK and Europe £1365, Rest of World $2360.
`Personal
`Print and Online: UK and Europe £365, Rest of World $621.
`Back volume prices are available on request. Please add sales tax to the prices
`quoted.
`Orders. Orders and payments from, or on behalf of, subscribers in the various
`geographical areas shown below should be sent to the Press office indicated.
`The Americas: Oxford University Press Inc., 2001 Evans Road, Cary, NC 27513,
`USA.
`Rest of the World: Journals Subscriptions Department, Oxford University Press,
`Great Clarendon Street, Oxford OX2 6DP, UK.
`Tel: (+44 1865 or 01865) 353907; Fax: (+44 1865) 353485;
`Email: jnl.orders@oup.co.uk
`Advertising. To advertise in Nucleic Acids Research contact Oxford University
`Press (US office) in the Americas or Oxford University Press (UK office) in the
`Rest of the World (see addresses above).
`
`© Oxford University Press 2003. All rights reserved: no part of this publication
`may be reproduced, stored in a retrieval system, or transmitted in any form or by
`any means, electronic, mechanical, photocopying, recording, or otherwise without
`either prior written permission of the Publishers, or a licence permitting restricted
`copying issued in the UK by the Copyright Licensing Agency Ltd, 90 Tottenham
`Court Road, London W1P 9HE, or in the USA by the Copyright Clearance Center,
`222 Rosewood Drive, Danvers, MA 01923, USA.
`
`Back volumes of this journal are available in 16 mm microfilm, 35 mm microfilm
`and 105 mm microfiche from University Microfilms International, 300 North Zeeb
`Road, Ann Arbor, MI 48106-1346, USA. Copies of articles published are also
`available from UMI.
`
`Nucleic Acids Research (ISSN 0305-1048) is published twice monthly by Oxford
`University Press, Oxford, UK. Annual subscription price is US$2360. Nucleic
`Acids Research is distributed by Mercury International, 365 Blair Road, Avenel,
`NJ 07001, USA. Periodicals Postage Paid at Rahway, NJ, USA and additional
`entry points.
`
`US POSTMASTER: send address corrections to Nucleic Acids Research, c/o
`Mercury Airfreight International Ltd, 365 Blair Road, Avenel, NJ 07001, USA.
`
`Typeset and printed by Information Press Ltd, Oxford, UK on acid-free paper.
`
`Cover: 3D-Jury model of the C-terminal RNA methyltransferase module of the large ORF 1ab protein of the coronavirus isolated
`from patients suffering from SARS (Severe Acute Respiratory Syndrome). This kind of structure assignment is currently not
`feasible using standard sequence comparison methods.
`
`Nucleic Acids Research
`Contents
`
`Volume 31 number 13, July 1, 2003
`
`EDITORIAL
`
`Detection of reliable and unexpected protein fold
`predictions using 3D-Jury
`
`K.Ginalski and L.Rychlewski
`
`DSSPcont: continuous secondary structure
`assignments for proteins
`
`P.Carter, C.A.F.Andersen and B.Rost
`
`PROTINFO: secondary and tertiary protein
`structure prediction
`
`L.-H.Hung and R.Samudrala
`
`The PredictProtein server
`
`B.Rost and J.Liu
`
`GeneSilico protein structure prediction meta-server
`
`M.A.Kurowski and J.M.Bujnicki
`
`META-PP: single interface to crucial prediction
`servers
`
`V.A.Eyrich and B.Rost
`
`EVA: evaluation of protein structure prediction
`servers
`
`I.Y.Y.Koh, V.A.Eyrich, M.A.Marti-Renom, D.Przybylski,
`M.S.Madhusudhan, N.Eswar, O.Graña, F.Pazos, A.Valencia,
`A.Sali and B.Rost
`
`VADAR: a web server for quantitative evaluation of
`protein structure quality
`
`L.Willard, A.Ranjan, H.Zhang, H.Monzavi, R.F.Boyko,
`B.D.Sykes and D.S.Wishart
`
`ESPript/ENDscript: extracting and rendering
`sequence and 3D information from atomic
`structures of proteins
`
`WebFEATURE: an interactive web tool for
`identifying and visualizing functional sites on
`macromolecular structures
`
`P.Gouet, X.Robert and E.Courcelle
`
`M.P.Liang, D.R.Banatao, T.E.Klein, D.L.Brutlag and
`R.B.Altman
`
`3MATRIX and 3MOTIF: a protein structure
`visualization system for conserved sequence motifs
`
`S.P.Bennett, L.Lu and D.L.Brutlag
`
`Motif3D: relating protein sequence motifs to 3D
`structure
`
`A.Gaulton and T.K.Attwood
`
`LOC3D: annotate sub-cellular localization for
`protein structures
`
`R.Nair and B.Rost
`
`Annotation in three dimensions. PINTS: Patterns in
`Non-homologous Tertiary Structures
`
`A.Stark and R.B.Russell
`
`NCI: a server to identify non-canonical interactions
`in protein structures
`
`M.M.Babu
`
`MolSurfer: a macromolecular interface navigator
`
`R.R.Gabdoulline, R.C.Wade and D.Walther
`
`CASTp: Computed Atlas of Surface Topography of
`proteins
`
`T.A.Binkowski, S.Naghibzadeh and J.Liang
`
`3289
`
`3291-3292
`
`3293-3295
`
`3296-3299
`
`3300-3304
`
`3305-3307
`
`3308-3310
`
`3311-3315
`
`3316-3319
`
`3320-3323
`
`3324-3327
`
`3328-3332
`
`3333-3336
`
`3337-3340
`
`3341-3344
`
`3345-3348
`
`3349-3351
`
`SEM (Symmetry Equivalent Molecules): a web-based
`GUI to generate and visualize the
`macromolecules
`
`Servers for sequence-structure relationship analysis
`and prediction
`
`Z.Dosztanyi, C.Magyar, G.E.Tusnady, M.Cserzo, A.Fiser
`and I.Simon
`
`POPS: a fast algorithm for solvent accessible surface
`areas at atomic and residue level
`
`L.Cavallo, J.Kleinjung and F.Fraternali
`
`MATRAS: a program for protein 3D structure
`comparison
`
`LGA: a method for finding 3D similarities in
`protein structures
`
`Tools for comparative protein structure modeling
`and analysis
`
`SWISS-MODEL: an automated protein homology-modeling
`server
`
`STING Millennium: a web-based suite of programs
`for comprehensive and simultaneous analysis of
`protein structure and sequence
`
`Integrated databanks access and sequence/structure
`analysis services at the PBIL
`
`NRSAS: Nuclear Receptor Structure Analysis
`Servers
`
`SSEP: secondary structural elements of proteins
`
`T.Kawabata
`
`A.Zemla
`
`N.Eswar, B.John, N.Mirkovic, A.Fiser, V.A.Ilyin, U.Pieper,
`A.C.Stuart, M.A.Marti-Renom, M.S.Madhusudhan,
`B.Yerkovich and A.Sali
`
`T.Schwede, J.Kopp, N.Guex and M.C.Peitsch
`
`G.Neshich, R.C.Togawa, A.L.Mancini, P.R.Kuser,
`M.E.B.Yamagishi, G.Pappas Jr, W.V.Torres, T.F.e Campos,
`L.L.Ferreira, F.M.Luna, A.G.Oliveira, R.T.Miura,
`M.K.Inoue, L.G.Horita, D.F.de Souza, F.Dominiquini,
`A.Alvaro, C.S.Lima, F.O.Ogawa, G.B.Gomes, J.F.Palandrani,
`G.F.dos Santos, E.M.de Freitas, A.R.Mattiuz, L.C.Costa,
`C.L.de Almeida, S.Souza, C.Baudet and R.H.Higa
`
`G.Perriere, C.Combet, S.Penel, C.Blanchet, J.Thioulouse,
`C.Geourjon, J.Grassot, C.Charavay, M.Gouy, L.Duret and
`G.Deleage
`
`E.Bettler, R.Krause, F.Horn and G.Vriend
`
`V.Shanthi, P.Selvarani, Ch.K.Kumar, C.S.Mohire and
`K.Sekar
`
`S Mfold web server for nucleic acid folding and
`hybridization prediction
`
`M.Zuker
`
`RNAsoft: a suite of RNA secondary structure
`prediction and design software tools
`
`M.Andronescu, R.Aguirre-Hernandez, A.Condon and
`H.H.Hoos
`
`S
`
`Pfold: RNA secondary structure prediction using
`stochastic context-free grammars
`
`B.Knudsen and J.Hein
`
`Vienna RNA secondary structure server
`
`PseudoViewer2: visualization of RNA pseudoknots
`of any type
`
`I.L.Hofacker
`
`K.Han and Y.Byun
`
`A software tool-box for analysis of regulatory RNA
`elements
`
`P.Bengert and T.Dandekar
`
`GPRM: a genetic programming approach to finding
`common RNA secondary structure elements
`
`Y.-J.Hu
`
`A.S.Z.Hussain, Ch.K.Kumar, C.K.Rajesh, S.S.Sheik and
`K.Sekar
`
`3356-3358
`
`3359-3363
`
`3364-3366
`
`3367-3369
`
`3370-3374
`
`3375-3380
`
`3381-3385
`
`3386-3392
`
`3393-3399
`
`3400-3403
`
`3404-3405
`
`3406-3415
`
`3416-3422
`
`3423-3428
`
`3429-3431
`
`3432-3440
`
`3441-3445
`
`3446-3449
`
`Tools for the automatic identification and
`classification of RNA base pairs
`
`H.Yang, F.Jossinet, N.Leontis, L.Chen, J.Westbrook,
`H.Berman and E.Westhof
`
`GEPAS: a web-based resource for microarray gene
`expression data analysis
`
`J.Herrero, F.Al-Shahrour, R.Diaz-Uriarte, A.Mateos,
`J.M.Vaquerizas, J.Santoyo and J.Dopazo
`
`INCLUSive: a web portal and service registry for
`microarray and regulatory sequence analysis
`
`B.Coessens, G.Thijs, S.Aerts, K.Marchal, F.De Smet,
`K.Engelen, P.Glenisson, Y.Moreau, J.Mathys and B.De Moor
`
`3450-3460
`
`3461-3467
`
`3468-3470
`
`GenePublisher: automated analysis of DNA
`microarray data
`
`S.Knudsen, C.Workman, T.Sicheritz-Ponten and C.Friis
`
`3471-3476
`
`Express Yourself: a modular platform for processing
`and visualizing microarray data
`
`N.M.Luscombe, T.E.Royce, P.Bertone, N.Echols, C.E.Horak,
`J.T.Chang, M.Snyder and M.Gerstein
`
`ChipInfo: software for extracting gene annotation
`and gene ontology information for microarray
`analysis
`
`REDUCE: an online tool for inferring cis-regulatory
`elements and transcriptional module activities from
`microarray data
`
`Design of oligonucleotides for microarrays and
`perspectives for design of multi-transcriptome
`arrays
`
`S.Zhong, C.Li and W.H.Wong
`
`C.Roven and H.J.Bussemaker
`
`3477-3482
`
`3483-3486
`
`3487-3490
`
`H.B.Nielsen, R.Wernersson and S.Knudsen
`
`3491-3496
`
`Multiple sequence alignment with the Clustal series
`of programs
`
`R.Chenna, H.Sugawara, T.Koike, R.Lopez, T.J.Gibson,
`D.G.Higgins and J.D.Thompson
`
`CLOURE: Clustal Output Reformatter, a program
`for reformatting ClustalX/ClustalW outputs for SNP
`analysis and molecular systematics
`
`D.K.Kohli and A.K.Bachhawat
`
`Tcoffee@igs: a web server for computing, evaluating
`and combining multiple sequence alignments
`
`O.Poirot, E.O'Toole and C.Notredame
`
`SLAM web server for comparative gene finding and
`alignment
`
`S.Cawley, L.Pachter and M.Alexandersson
`
`Theatre: a software tool for detailed comparative
`analysis and visualization of genomic sequence
`
`Y.J.K.Edwards, T.J.Carver, T.Vavouri, M.Frith, M.J.Bishop
`and G.Elgar
`
`MultiPipMaker and supporting tools: alignments
`and analysis of multiple genomic DNA sequences
`
`S.Schwartz, L.Elnitski, M.Li, M.Weirauch, C.Riemer,
`A.Smit, NISC Comparative Sequencing Program,
`E.D.Green, R.C.Hardison and W.Miller
`
`MAVID multiple alignment server
`
`N.Bray and L.Pachter
`
`3497-3500
`
`3501-3502
`
`3503-3506
`
`3507-3509
`
`3510-3517
`
`3518-3524
`
`3525-3526
`
`EnteriX 2003: visualization tools for genome
`alignments of Enterobacteriaceae
`
`L.Florea, M.McClelland, C.Riemer, S.Schwartz and W.Miller
`
`3527-3532
`
`MGAlignIt: a web service for the alignment of
`mRNA/EST and genomic sequences
`
`B.T.K.Lee, T.W.Tan and S.Ranganathan
`
`RevTrans: multiple alignment of coding DNA from
`aligned amino acid sequences
`
`R.Wernersson and A.G.Pedersen
`
`S
`
`PromH: promoters identification using orthologous
`genomic sequences
`
`V.V.Solovyev and I.A.Shahmuradov
`
`3533-3536
`
`3537-3539
`
`3540-3545
`
`S
`
`FIE2: a program for the extraction of genomic DNA
`sequences around the start and translation initiation
`site of human genes
`
`A.Chong, G.Zhang and V.B.Bajic
`
`PromoSer: a large-scale mammalian promoter and
`transcription start site identification service
`
`A.S.Halees, D.Leyfer and Z.Weng
`
`S Dragon Gene Start Finder identifies approximate
`locations of the 5' ends of genes
`
`V.B.Bajic and S.H.Seah
`
`ETOPE: evolutionary test of predicted exons
`
`A.Nekrutenko, W.-Y.Chung and W.-H.Li
`
`ESEfinder: a web resource to identify exonic
`splicing enhancers
`
`SiteSeer: visualisation and analysis of transcription
`factor binding sites in nucleotide sequences
`
`L.Cartegni, J.Wang, Z.Zhu, M.Q.Zhang and A.R.Krainer
`
`P.E.Boardman, S.G.Oliver and S.J.Hubbard
`
`MATCH™: a tool for searching transcription factor
`binding sites in DNA sequences
`
`A.Kel, E.Gößling, I.Reuter, E.Cheremushkin,
`O.Kel-Margoulis and E.Wingender
`
`Gibbs Recursive Sampler: finding transcription
`factor binding sites
`
`YMF: a program for discovery of novel
`transcription factor binding sites by statistical
`overrepresentation
`
`Target Explorer: an automated tool for the
`identification of new target genes for a specified set
`of transcription factors
`
`W.Thompson, E.C.Rouchka and C.E.Lawrence
`
`S.Sinha and M.Tompa
`
`A.Sosinsky, C.P.Bonin, R.S.Mann and B.Honig
`
`Regulatory Sequence Analysis Tools
`
`J. van Helden
`
`GeneSeqer@PlantGDB: gene structure prediction in
`plant genomes
`
`GlimmerM, Exonomy and Unveil: three ab initio
`eukaryotic genefinders
`
`S Dragon ERE Finder version 2: a tool for accurate
`detection and analysis of estrogen response elements
`in vertebrate genomes
`
`PatSearch: a program for the detection of patterns
`and structural motifs in nucleotide sequences
`
`PSORT-B: improving protein subcellular localization
`prediction for Gram-negative bacteria
`
`Signal search analysis server
`
`MHCPred: a server for quantitative prediction of
`peptide-MHC binding
`
`S.D.Schlueter, Q.Dong and V.Brendel
`
`W.H.Majoros, M.Pertea, C.Antonescu and S.L.Salzberg
`
`V.B.Bajic, S.L.Tan, A.Chong, S.Tang, A.Strom,
`J.-A.Gustafsson, C.-Y.Lin and E.T.Liu
`
`G.Grillo, F.Licciulli, S.Liuni, E.Sbisa and G.Pesole
`
`J.L.Gardy, C.Spencer, K.Wang, M.Ester, G.E.Tusnady,
`I.Simon, S.Hua, K.deFays, C.Lambert, K.Nakai and
`F.S.L.Brinkman
`
`G.Ambrosini, V.Praz, V.Jagannathan and P.Bucher
`
`P.Guan, I.A.Doytchinova, C.Zygouri and D.R.Flower
`
`3546-3553
`
`3554-3559
`
`3560-3563
`
`3564-3567
`
`3568-3571
`
`3572-3575
`
`3576-3579
`
`3580-3585
`
`3586-3588
`
`3589-3592
`
`3593-3596
`
`3597-3600
`
`3601-3604
`
`3605-3607
`
`3608-3612
`
`3613-3617
`
`3618-3620
`
`3621-3624
`
`ELM server: a new resource for investigating short
`functional sites in modular eukaryotic proteins
`
`P.Puntervoll, R.Linding, C.Gemünd, S.Chabanis-Davidson,
`M.Mattingsdal, S.Cameron, D.M.A.Martin, G.Ausiello,
`B.Brannetti, A.Costantini, F.Ferre, V.Maselli, A.Via,
`G.Cesareni, F.Diella, G.Superti-Furga, L.Wyrwicz, C.Ramu,
`C.McGuigan, R.Gudavalli, I.Letunic, P.Bork, L.Rychlewski,
`B.Küster, M.Helmer-Citterich, W.N.Hunter, R.Aasland and
`T.J.Gibson
`
`3625-3630
`
`Prediction of lipid posttranslational modifications
`and localization signals from protein sequences:
`big-Π, NMT and PTS1
`
`F.Eisenhaber, B.Eisenhaber, W.Kubina, S.Maurer-Stroh,
`G.Neuberger, G.Schneider and M.Wildpaner
`
`3631-3634
`
`Scansite 2.0: proteome-wide prediction of cell
`signaling interactions using short sequence motifs
`
`J.C.Obenauer, L.C.Cantley and M.B.Yaffe
`
`Static benchmarking of membrane helix predictions
`
`A.Kernytsky and B.Rost
`
`The web server of IBM's Bioinformatics and Pattern
`Discovery group
`
`T.Huynh, I.Rigoutsos, L.Parida, D.Platt and T.Shibuya
`
`Identification of patterns in biological sequences at
`the ALGGEN server: PROMO and MALGEN
`
`D.Farre, R.Roset, M.Huerta, J.E.Adsuara, L.Roselló,
`M.M.Alba and X.Messeguer
`
`SEARCHPKS: a program for detection and analysis
`of polyketide synthase domains
`
`G.Yadav, R.S.Gokhale and D.Mohanty
`
`S MAK, a computational tool kit for automated
`MITE analysis
`
`G.Yang and T.C.Hall
`
`Cluster-Buster: finding dense clusters of motifs in
`DNA sequences
`
`M.C.Frith, M.C.Li and Z.Weng
`
`SIC: a tool to detect short inverted segments in a
`biological sequence
`
`D.Robelin, H.Richard and B.Prum
`
`mreps: efficient and flexible detection of tandem
`repeats in DNA
`
`R.Kolpakov, G.Bana and G.Kucherov
`
`SPA: simple web tool to assess statistical significance
`of DNA patterns
`
`H.Richard and G.Nuel
`
`TRACTS: a program to map
`oligopurine.oligopyrimidine and other binary DNA
`tracts
`
`M.Gal, T.Katz, A.Ovadia and G.Yagil
`
`DNA analysis servers: plot.it, bend.it, model.it and
`IS
`
`K.Vlahovicek, L.Kajan and S.Pongor
`
`NEBcutter: a program to cleave DNA with
`restriction enzymes
`
`T. Vincze, J.Posfai and R.J.Roberts
`
`3635-3641
`
`3642-3644
`
`3645-3650
`
`3651-3653
`
`3654-3658
`
`3659-3665
`
`3666-3668
`
`3669-3671
`
`3672-3678
`
`3679-3681
`
`3682-3685
`
`3686-3687
`
`3688-3691
`
`SVM-Prot: web-based support vector machine
`software for functional classification of a protein
`from its primary sequence
`
`BPROMPT: a consensus server for membrane
`protein prediction
`
`GlobPlot: exploring protein sequences for
`globularity and disorder
`
`C.Z.Cai, L.Y.Han, Z.L.Ji, X.Chen and Y.Z.Chen
`
`3692-3697
`
`P.D.Taylor, T.K.Attwood and D.R.Flower
`
`3698-3700
`
`R.Linding, R.B.Russell, V.Neduva and T.J.Gibson
`
`3701-3708
`
`iSPOT: a web tool to infer the interaction specificity
`of families of protein modules
`
`B.Brannetti and M.Helmer-Citterich
`
`Automated Gene Ontology annotation for
`anonymous sequence data
`
`S.Hennig, D.Groth and H.Lehrach
`
`ESTAnnotator: a tool for high throughput EST
`annotation
`
`A.Hotz-Wagenblatt, T.Hankeln, P.Ernst, K.-H.Glatting,
`E.R.Schmidt and S.Suhai
`
`3709-3711
`
`3712-3715
`
`3716-3719
`
`Phydbac (phylogenomic display of bacterial genes):
`an interactive resource for the annotation of
`bacterial genomes
`
`F.Enault, K.Suhre, O.Poirot, C.Abergel and J.-M.Claverie
`
`3720-3722
`
`AMIGene: Annotation of Microbial Genes
`
`S.Bocs, S.Cruveiller, D.Vallenet, G.Nuel and C.Medigue
`
`AHMII: Agent to Help Microbial Information
`Integration
`
`H.Sugawara and S.Miyazaki
`
`S DNannotator: annotation software tool kit for
`regional genomic sequences
`
`C.Liu, T.I.Bonner, T.Nguyen, J.L.Lyons, S.L.Christian and
`E.S.Gershon
`
`Bioverse: functional, structural and contextual
`annotation of proteins and proteomes
`
`J.McDermott and R.Samudrala
`
`S
`
`S
`
`FrameD: a flexible program for quality check and
`gene prediction in prokaryotic genomes and noisy
`matured eukaryotic sequences
`
`EuGène'Hom: a generic similarity-based gene finder
`using multiple homologous sequences
`
`PROBEmer: a web-based software tool for selecting
`optimal DNA oligos
`
`Primer Design Assistant (PDA): a web-based primer
`design tool
`
`DePIE: Designing Primers for Protein Interaction
`Experiments
`
`OligoDesign: optimal design of LNA (locked nucleic
`acid) oligonucleotide capture probes for gene
`expression profiling
`
`CODEHOP (COnsensus-DEgenerate Hybrid
`Oligonucleotide Primer) PCR primer design
`
`T.Schiex, J.Gouzy, A.Moisan and Y.de Oliveira
`
`S.Foissac, P.Bardou, A.Moisan, M.-J.Cros and T.Schiex
`
`S.J.Emrich, M.Lowe and A.L.Delcher
`
`S.H.Chen, C.Y.Lin, C.S.Cho, C.Z.Lo and C.A.Hsiung
`
`G.Lu, M.Hallett, S.Pollock and D.Thomas
`
`N.Tolstrup, P.S.Nielsen, J.G.Kolberg, A.M.Frankel,
`H.Vissing and S.Kauppinen
`
`T.M.Rose, J.G.Henikoff and S.Henikoff
`
`S RNA-related tools on the Bielefeld Bioinformatics
`Server
`
`A.Sczyrba, J.Krüger, H.Mersch, S.Kurtz and R.Giegerich
`
`SIRW: a web server for the Simple Indexing and
`Retrieval System that combines sequence motif
`searches with keyword searches
`
`C.Ramu
`
`Onto-Tools, the toolkit of the modern biologist:
`Onto-Express, Onto-Compare, Onto-Design and
`Onto-Translate
`
`Swiss EMBnet node web server
`
`S.Draghici, P.Khatri, P.Bhavsar, A.Shah, S.A.Krawetz and
`M.A.Tainsky
`
`L.Falquet, L.Bordoli, V.Ioannidis, M.Pagni and
`C.V.Jongeneel
`
`3723-3726
`
`3727-3728
`
`3729-3735
`
`3736-3737
`
`3738-3741
`
`3742-3745
`
`3746-3750
`
`3751-3754
`
`3755-3757
`
`3758-3762
`
`3763-3766
`
`3767-3770
`
`3771-3774
`
`3775-3781
`
`3782-3783
`
`ExPASy: the proteomics server for in-depth protein
`knowledge and analysis
`
`E.Gasteiger, A.Gattiker, C.Hoogland, I.Ivanyi, R.D.Appel
`and A.Bairoch
`
`UniqueProt: creating representative protein
`sequence sets
`
`S.Mika and B.Rost
`
`3784-3788
`
`3789-3791
`
`BLAST2SRS, a web server for flexible retrieval of
`related protein sequences in the SWISS-PROT and
`SPTrEMBL databases
`
`WU-Blast2 server at the European Bioinformatics
`Institute
`
`K.Bimpikis, A.Budd, R.Linding and T.J.Gibson
`
`3792-3794
`
`R.Lopez, V.Silventoinen, S.Robinson, A.Kibria and W.Gish
`
`3795-3798
`
`OntoBlast function: from sequence similarities
`directly to potential functional annotations by
`ontology terms
`
`G.Zehetner
`
`ORFeus: detection of distant homology using
`sequence profiles and predicted secondary structure
`
`K.Ginalski, J.Pas, L.S.Wyrwicz, M.von Grotthuss,
`J.M.Bujnicki and L.Rychlewski
`
`PARSESNP: a tool for the analysis of nucleotide
`polymorphisms
`
`N.E.Taylor and E.A.Greene
`
`SIFT: predicting amino acid changes that affect
`protein function
`
`P.C.Ng and S.Henikoff
`
`S ActionMap: a web-based software that automates
`loci assignments to framework maps
`
`G.Albini, M.Falque and J.Joets
`
`WEB-THERMODYN: sequence analysis software
`for profiling DNA helical stability
`
`Y.Huang and D.Kowalski
`
`NEWT, a new taxonomy portal
`
`I.Q.H.Phan, S.F.Pilbout, W.Fleischmann and A.Bairoch
`
`Comprehensive quantitative analyses of the effects
`of promoter sequence elements on mRNA
`transcription
`
`PipeAlign: a new toolkit for protein family analysis
`
`M.Lapidot and Y.Pilpel
`
`F.Plewniak, L.Bianchetti, Y.Brelivet, A.Carles, F.Chalmel,
`O.Lecompte, T.Mochel, L.Moulinier, A.Muller, J.Muller,
`V.Prigent, R.Ripp, J.-C.Thierry, J.D.Thompson, N.Wicker
`and O.Poch
`
`NORSp: predictions of long regions without regular
`secondary structure
`
`J.Liu and B.Rost
`
`Biological SOAP servers and web services provided
`by the public sequence data bank
`
`H.Sugawara and S.Miyazaki
`
`FootPrinter: a program designed for phylogenetic
`footprinting
`
`M.Blanchette and M.Tompa
`
`GeneFizz: a web tool to compare genetic (coding/
`non-coding) and physical (helix/coil) segmentations
`of DNA sequences. Gene discovery and evolutionary
`perspectives
`
`E.Yeramian and L.Jones
`
`3799-3803
`
`3804-3807
`
`3808-3811
`
`3812-3814
`
`3815-3818
`
`3819-3821
`
`3822-3823
`
`3824-3828
`
`3829-3832
`
`3833-3835
`
`3836-3839
`
`3840-3842
`
`3843-3849
`
`S Geno2pheno: estimating phenotypic drug resistance
`from HIV-1 genotypes
`
`N.Beerenwinkel, M.Däumer, M.Oette, K.Korn, D.Hoffmann,
`R.Kaiser, T.Lengauer, J.Selbig and H.Walter
`
`3850-3855
`
`Building protein diagrams on the web with the
`residue-based diagram editor RbDe
`
`L.Skrabanek, F.Campagne and H.Weinstein
`
`CRP: Cleavage of Radiolabeled Phosphoproteins
`
`A.J.Mackey, T.A.J.Haystead and W.R.Pearson
`
`S JVirGel: calculation of virtual two-dimensional
`protein gels
`
`Update on XplorMed: a web server for exploring
`scientific literature
`
`AUTHOR INDEX
`
`S, Supplementary Material available at NAR Online
`
`K.Hiller, M.Schobert, C.Hundertmark, D.Jahn and R.Münch
`
`C.Perez-Iratxeta, A.J.Perez, P.Bork and M.A.Andrade
`
`3856-3858
`
`3859-3861
`
`3862-3865
`
`3866-3868
`
`Nucleic Acids Research, 2003, Vol. 31, No. 13
`3812-3814
`DOI: 10.1093/nar/gkg509
`
`SIFT: predicting amino acid changes that affect
`protein function
`Pauline C. Ng and Steven Henikoff*
`
`Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue N A1-162, Seattle, WA 98109, USA
`
`Received January 4, 2003; Revised and Accepted February 28, 2003
`
`ABSTRACT
`Single nucleotide polymorphism (SNP) studies and
`random mutagenesis projects identify amino acid
`substitutions in protein-coding regions. Each substitution
`has the potential to affect protein function.
`SIFT (Sorting Intolerant From Tolerant) is a program
`that predicts whether an amino acid substitution
`affects protein function so that users can prioritize
`substitutions for further study. We have shown that
`SIFT can distinguish between functionally neutral
`and deleterious amino acid changes in mutagenesis
`studies and on human polymorphisms. SIFT is
`available at http://blocks.fhcrc.org/sift/SIFT.html.
`
`INTRODUCTION
`Single nucleotide polymorphisms (SNPs) are used as markers
`in linkage and association studies to detect which regions in
`the human genome may be involved in disease. SNPs in
`coding and regulatory regions may be implicated in disease
`themselves. Non-synonymous SNPs that lead to an amino acid
`change in the protein product are of major interest, because
`amino acid substitutions currently account for approximately
`half of the known gene lesions responsible for human inherited
`disease (1). SIFT (Sorting Intolerant From Tolerant) uses
`sequence homology to predict whether an amino acid
`substitution will affect protein function and hence, potentially
`alter phenotype (2,3).
`SIFT has been applied to human variant databases and was
`able to distinguish mutations involved in disease from neutral
`polymorphisms (3). Assuming that disease-causing amino acid
`substitutions are damaging to protein function, we applied SIFT
`to a database of missense substitutions associated with or
`involved in disease (4). SIFT predicted 69% to be damaging.
`When SIFT was applied to the non-synonymous SNPs in
`dbSNP (5), a database of putative SNPs, 25% of the variants
`were predicted to be deleterious. This was similar to SIFT's
`20% false positive error, which suggested that most non-synonymous
`SNPs are functionally neutral. Furthermore, a
`subset of the variants from dbSNP predicted to affect function
`were involved in disease, which confirmed SIFT's sensitivity.
`The SIFT algorithm relies solely on sequence for prediction,
`yet performs similarly to tools that use structure (3,6-8). An
`advantage of not requiring structure is that predictions can be
`made for a larger number of substitutions. Of the non-synonymous
`SNPs identified by the SNP Consortium, 74% were sufficiently
`similar to homologs in protein sequence databases for SIFT
`prediction. The number of substitutions that SIFT can predict
`on is expected to increase as more genomes are sequenced and
`more protein sequences become available.
`
`SIFT PREDICTION METHOD
`SIFT presumes that important amino acids will be conserved
`in the protein family, and so changes at well-conserved
`positions tend to be predicted as deleterious. For example, if a
`position in an alignment of a protein family only contains the
`amino acid isoleucine, it is presumed that substitution to any
`other amino acid is selected against and that isoleucine is
`necessary for protein function. Therefore, a change to any
`other amino acid will be predicted to be deleterious to protein
`function. If a position in an alignment contains the
`hydrophobic amino acids isoleucine, valine and leucine, then
`SIFT assumes, in effect, that this position can only contain
`amino acids with hydrophobic character. At this position,
`changes to other hydrophobic amino acids are usually
`predicted to be tolerated but changes to other residues (such
`as charged or polar) will be predicted to affect protein
`function.
`To predict whether an amino acid substitution in a protein
`will affect protein function, SIFT considers the position at
`which the change occurred and the type of amino acid
`change. Given a protein sequence, SIFT chooses related
`proteins and obtains an alignment of these proteins with the
`query. Based on the amino acids appearing at each position
`in the alignment, SIFT calculates the probability that an
`amino acid at a position is tolerated conditional on the most
`frequent amino acid being tolerated. If this normalized value
`is less than a cutoff, the substitution is predicted to be
`deleterious (2). The SIFT algorithm and software have been
`described previously (2,3).
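The scoring idea in this section can be illustrated with a minimal Python sketch. It is not the published SIFT algorithm, which is described in references 2 and 3 and is more elaborate. As a stand-in, the sketch uses raw residue frequencies from a single alignment column, smoothed with an arbitrary pseudocount, and divides each by the value for the most frequent residue. The function names, the pseudocount and the example columns are all assumptions made for illustration. The 0.05 cutoff is the one quoted in the Output section.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def tolerance_scores(column, pseudocount=0.01):
    """Score every amino acid at one alignment column.

    Each smoothed count is divided by the smoothed count of the most
    frequent residue, so that residue always scores 1.0.
    """
    counts = Counter(aa for aa in column.upper() if aa in AMINO_ACIDS)
    smoothed = {aa: counts[aa] + pseudocount for aa in AMINO_ACIDS}
    top = max(smoothed.values())
    return {aa: value / top for aa, value in smoothed.items()}


def predict(column, new_aa, cutoff=0.05):
    """Return (score, label) for substituting new_aa at this column."""
    score = tolerance_scores(column)[new_aa]
    label = "AFFECT PROTEIN FUNCTION" if score < cutoff else "TOLERATED"
    return score, label


# Hypothetical columns mirroring the two examples in the text.
print(predict("IIIIIIIIII", "D"))  # fully conserved isoleucine
print(predict("IIIIVVVLLL", "V"))  # mixed hydrophobic column
```

Because this sketch only counts observed residues, a hydrophobic residue that never appears in the column (methionine, for example) would also fall below the cutoff. The text says such changes are usually predicted to be tolerated, so the real method must account for amino acid similarity in a way this simplification does not.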
`
`SIFT WEBSITE
`
`Input
`Users can obtain predictions for amino acid changes of interest
`at http://www.blocks.fhcrc.org/sift/SIFT.html. From this page,
`
`*To whom correspondence should be addressed. Tel: +1 206 667 4515; Fax: +1 206 667 5889; Email: steveh@fhcrc.org
`
`Nucleic Acids Research, Vol. 31, No. 13 © Oxford University Press 2003; all rights reserved
`
`Substitution at pos 1426 from S to P is predicted to AFFECT PROTEIN FUNCTION with a score of 0.02.
`Median sequence conservation: 2.90
`Sequences represented at this position:26
`
`Substitution at pos 1432 from H to K is predicted to be TOLERATED with a score of 0.08.
`Median sequence conservation: 2.90
`Sequences represented at this position:26
`
`Substitution at pos 1445 from D to N is predicted to AFFECT PROTEIN FUNCTION with a score of 0.01.
`Median sequence conservation: 3.66
`Sequences represented at this position:21
`WARNING!! This substitution may have been predicted to affect function just because
`the sequences used were not diverse enough. There is LOW CONFIDENCE in this prediction.
`
`Figure 1. An example of SIFT prediction on amino acid changes in a protein. Substitutions with score less than 0.05 are predicted to affect protein function. In the
`last prediction, the median conservation of the sequences does not meet the threshold so a warning is issued.
`
`there are links to three submission pages which allow users
`different levels of involvement in order to control the quality of
`their predictions.
`For minimal involvement, users can simply submit their
`protein sequences and amino acid substitutions. In its fully
`automated mode, SIFT will search for protein sequences
`homologous to the query protein and, based on these
`sequences, calculate probabilities for each possible amino acid
`change. Users can select from among SWISS-PROT, SWISS-PROT/TrEMBL,
`or NCBI's non-redundant protein databases
`for SIFT to search (4,9).
`Although SIFT can choose sequences automatically, better
`prediction results may be obtained when all of the sequences
`that are provided are orthologous to the query protein. This is
`because inclusion of paralogous sequences confounds prediction
`at residues conserved only among the orthologues. If a
`user already has sequences that are thought to be functionally
`similar to the protein of interest, these sequences can be
`directly submitted and SIFT's step for choosing sequences
`skipped. Given the query protein and homologous sequences,
`SIFT obtains the alignment.
`If regions are misaligned, SIFT will not recognize conserved
`positions and will therefore miss potentially damaging substitutions.
`For best prediction quality, a third mode of operation
`allows users to submit their own alignments.
`
`Output
`Predictions are given for all 20 possible amino acid changes at
`each position in the protein. The alignment is also returned so
`that users can examine the sequences used for prediction and
`modify them for resubmission. This option is also useful for
`removing uncertain, erroneous and misaligned sequences from
`alignment output generated by SIFT in its automatic mode.
`For amino acid substitutions submitted by the user, a more
`detailed synopsis is provided (Fig. 1). The score is the
`normalized probability that the amino acid change is tolerated.
`SIFT predicts substitutions with scores less than 0.05 as
`deleterious. Some SIFT users have found that treating substitutions
`with scores less than 0.1 as deleterious provides better sensitivity for detecting
`deleterious SNPs (Cornelia Ulrich, personal communication
`and 10). The quantitative score allows users to prioritize their
`amino acid changes by ranking them from the lowest scores to
`the highest.
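As a minimal sketch of this prioritization step, the Python below sorts substitutions by ascending score and flags those below a chosen cutoff. The positions, residues and scores are the three predictions shown in Figure 1. The `Prediction` type and the `rank_substitutions` function are hypothetical names introduced for illustration and are not part of SIFT.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    position: int
    original: str
    substituted: str
    score: float


def rank_substitutions(predictions, cutoff=0.05):
    """Sort by ascending score and flag scores below the cutoff."""
    ranked = sorted(predictions, key=lambda p: p.score)
    return [(p, p.score < cutoff) for p in ranked]


# The three substitutions shown in Figure 1.
figure1 = [
    Prediction(1426, "S", "P", 0.02),
    Prediction(1432, "H", "K", 0.08),
    Prediction(1445, "D", "N", 0.01),
]

for pred, deleterious in rank_substitutions(figure1):
    call = "deleterious" if deleterious else "tolerated"
    print(f"{pred.original}{pred.position}{pred.substituted}\t{pred.score:.2f}\t{call}")
```

Calling `rank_substitutions(figure1, cutoff=0.1)` applies the more sensitive threshold mentioned above, under which the H to K change at position 1432 (score 0.08) is also flagged.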
`Confidence in a substitution predicted to be deleterious
`depends on the diversity of the sequences in the alignment. If
`the sequences used for prediction are closely related, then
`many positions will appear conserved and SIFT will predict
`most substitutions to affect protein function. This leads to a
`high false positive error where functionally neutral substitutions
`are predicted to be deleterious.
`To alert the user to these situations, SIFT calculates the
`median conservation value which measures the diversity of the
`sequences in the alignment. Conservation, as measured by
`information content (11), is calculated for each position in the
`alignment and the median of these values is obtained.
`Conservation ranges from log₂20 (= 4.32), when a position
`is completely conserved and only one amino acid is observed,
`to zero, when all 20 amino acids are observed at a position with equal frequency. By
`default, SIFT builds alignments with a median conservation
`value of 3.0. Predictions based on sequence alignments