Co-Editors

Peter J. Diggle
Scott L. Zeger

Advisory Board
Norman Breslow
David Cox
Sarah Darby
Peter Diggle
Mitchell Gail
Niels Keiding
Scott Zeger

Associate Editors

Rebecca Betensky
Adrian Bowman
Maria De Iorio
Paddy Farrington
Jason Fine
Joseph Hogan
Philip House
Hans van Houwelingen
Katja Ickstadt
Rafael Irizarry
Bank Mattick
Lom Nye
Roger Peng
Margaret Pepe
Louise M. Ryan
Naisyin Wang

OXFORD JOURNALS
OXFORD UNIVERSITY PRESS
`
`
`
VOLUME 10

NUMBER 3  JULY 2009

Biostatistics

R. D. Peng
Editorial: Reproducible research and Biostatistics    405

D. Lee, C. Ferguson, R. Mitchell
Air pollution and health in Scotland: a multicity study    409

P. de Valpine, H.-M. Bitter, M. P. S. Brown, J. Heller
A simulation-approximation approach to sample size planning for high-dimensional
classification studies    424

D. H. Y. Leung, Y.-G. Wang, M. Zhu
Efficient parameter estimation in longitudinal data analysis using a hybrid GEE method    436

J. Hardin, J. Wilson
A note on oligonucleotide expression values not being normally distributed    446

D. Y. Clement, R. L. Strawderman
Conditional GEE for recurrent event gap times    451

J. Hu, F. Hu
Estimating equation-based causality analysis with application to microarray time series data    468

J. E. Eckel-Passow, A. L. Oberg, T. M. Therneau, H. R. Bergen, III
An insight into high-resolution mass-spectrometry data    481

M. Haugen, F. Bray, T. Grotmol, S. Tretli, O. O. Aalen, T. A. Moger
Frailty modeling of bimodal age-incidence curves of nasopharyngeal carcinoma
in low-risk populations    501

D. M. Witten, R. Tibshirani, T. Hastie
A penalized matrix decomposition, with applications to sparse principal components
and canonical correlation analysis    515

C. Proust-Lima, J. M. G. Taylor
Development and validation of a dynamic prognostic tool for prostate cancer recurrence
using repeated measures of posttreatment PSA: a joint modeling approach    535

M. A. van de Wiel, J. Berkhof, W. N. van Wieringen
Testing the prediction error difference between 2 predictors    550

P. S. Sanchez, G. F. V. Glonek
Optimal designs for 2-color microarray experiments    561

O. Saarela, S. Kulathinal, J. Karvanen
Joint analysis of prevalence and incidence data using conditional likelihood    575

Biostatistics - Referees of Manuscripts Submitted Mid-2007 to Mid-2008    588
`
`
`
Biostatistics (ISSN 1465-4644) is published quarterly in
January, April, July, and October by Oxford University
Press, Cary, NC. Biostatistics is distributed by DHL,
Mills Road, Quarry Wood Industrial Estate, Aylesford,
Kent ME20 7WZ, UK. Periodicals postage paid at Rahway,
NJ, and at additional entry points.
US Postmaster: send address changes to Biostatistics,
c/o Mercury International, 365 Blair Road, Avenel,
NJ 07001.
`
Oxford Journals Environmental and Ethical Policies
Oxford Journals is committed to working with the global
community to bring the highest quality research to the
widest possible audience. Oxford Journals will protect the
environment by implementing environmentally friendly
policies and practices wherever possible. Please see
http://www.oxfordjournals.org/ethicalpolicies.html for
further information on Oxford Journals' environmental
and ethical policies.

Digital Object Identifiers
For information on dois and to resolve them, please visit
www.doi.org.
Permissions
For information on how to request permissions to
reproduce articles/information from this journal, please
visit www.oxfordjournals.org/jnls/permissions.
Advertising
Advertising, inserts, and artwork enquiries should
be addressed to Advertising and Special Sales, Oxford
Journals, Oxford University Press, Great Clarendon Street,
Oxford, OX2 6DP, UK. Tel: +44 (0)1865 354767; Fax: +44
(0)1865 353774; E-mail: jnlsadvertising@oxfordjournals.org.
Submissions
For instructions on submitting a manuscript to Biostatistics,
please see the Instructions to Authors at
www.biostatistics.oxfordjournals.org.
Disclaimer
Statements of fact and opinion in the articles in
Biostatistics are those of the respective authors and
contributors and not of Biostatistics or Oxford University
Press. Neither Oxford University Press nor Biostatistics
make any representation, express or implied, in
respect of the accuracy of the material in this journal
and cannot accept any legal responsibility or liability for
any errors or omissions that may be made. The reader
should make his/her own evaluation as to the
appropriateness or otherwise of any experimental
technique described.
© 2009 Oxford University Press
All rights reserved; no part of this publication may be
reproduced, stored in a retrieval system, or transmitted in
any form or by any means, electronic, mechanical,
photocopying, recording, or otherwise without prior
written permission of the Publishers, or a licence
permitting restricted copying issued in the UK by the
Copyright Licensing Agency Ltd, 90 Tottenham Court Road,
London W1P 9HE, or in the USA by the Copyright Clearance
Center, 222 Rosewood Drive, Danvers, MA 01923.
`
`
`
Biostatistics (2009), 10, 3, pp. 405-408
doi:10.1093/biostatistics/kxp014
`
`
`
`
`
Editorial

As coeditors of Biostatistics, we wish to encourage the practice of making research published in the
journal reproducible by others. The following invited piece by Roger Peng sets out our policy on this.
Roger will be assuming the role of Associate Editor for reproducibility as set out in his piece.
While we consider reproducibility to be a desirable goal, we wish to emphasise that our policy is to
encourage our authors to consider this as an opportunity that they may wish to take, rather than as a
requirement that we impose upon them. All submissions to the journal will continue to be reviewed using
our established system; the issue of reproducibility will be considered only when a paper has been
accepted for publication on the basis of its scientific merit as judged by our peer-review process.
PETER J. DIGGLE, SCOTT L. ZEGER

Reproducible research and Biostatistics

ROGER D. PENG

Johns Hopkins University

1. INTRODUCTION AND MOTIVATION
`
The replication of scientific findings using independent investigators, methods, data, equipment, and
protocols has long been, and will continue to be, the standard by which scientific claims are evaluated.
However, in many fields of study there are examples of scientific investigations that cannot be fully
replicated because of a lack of time or resources. In such a situation, there is a need for a minimum
standard that can fill the void between full replication and nothing. One candidate for this minimum
standard is “reproducible research”, which requires that data sets and computer code be made available
to others for verifying published results and conducting alternative analyses.
The need for publishing reproducible research is increasing for a number of reasons. Investigators are
more frequently examining weak associations and complex interactions for which the data contain a low
signal-to-noise ratio. New technologies allow scientists in all areas to compile complex high-dimensional
databases. The ubiquity of powerful statistical and computing capabilities allows investigators to explore
those databases and identify associations of potential interest. However, with the increase in data and
computing power comes a greater potential for identifying spurious associations. In addition to these
developments, recent reports of fraudulent research being published in the biomedical literature have
highlighted the need for reproducibility in biomedical studies and have invited the attention of the major
medical journals (Laine and others, 2007). Even without the presence of deliberate fraud, it should be
noted that as analyses become more complicated, the possibility of inadvertent errors resulting in
misleading findings looms large. In the examples of Baggerly and others (2005) and Coombes and others
(2007), the errors discovered were not necessarily simple or obvious and the examination of the problem
itself required a sophisticated analysis. Misunderstandings about commonly used software can also lead
to problems, particularly when such software is applied to situations not originally imagined (Dominici
and others, 2002).

© The Author 2009. Published by Oxford University Press. All rights reserved. For permissions, please
e-mail: journals.permissions@oxfordjournals.org.

While many might agree with the benefits of disseminating reproducible research, there is unfortunately
a general lack of infrastructure for supporting such endeavors. Investigators who are willing to
make their research reproducible are confronted with a number of barriers, one of which is the need to
distribute, and make available for an indefinite amount of time, the supplementary materials required for
reproducing the results. Another is the lack of an “instruction manual” that indicates which materials are
needed and what might be the most suitable formats for making data and computer code available. In this
editorial, we describe the efforts that Biostatistics is making to promote reproducibility in biostatistical
research.

2. REPRODUCIBILITY POLICY FOR BIOSTATISTICS

From this issue forward, Biostatistics is willing to work with authors to publish articles that meet a
standard of reproducibility. The standard involves three different dimensions that we describe in greater
detail below. The purpose of defining different dimensions of reproducibility is to provide some level of
continuity between “not reproducible” and “reproducible.” The journal has for some time now allowed and
encouraged authors to place supplementary materials online via the journal's Web site and the
reproducible research policy builds upon that framework. It should be noted that this policy is still in
the early stages and it is likely that the details will evolve as we gain experience working with authors.
`
2.1 Dimensions of reproducibility
The Associate Editor for reproducibility (AER) will handle submissions of reproducible articles.
Currently, the AER's involvement with a submission begins only when an article has been accepted for
publication. The AER will consider three different criteria when evaluating the reproducibility of an article.
1. Data: The analytic data from which the principal results were derived are made available on the
journal's Web site. The authors are responsible for ensuring that necessary permissions are obtained
before the data are distributed.
2. Code: Any computer code, software, or other computer instructions that were used to compute
published results are provided. For software that is widely available from central repositories
(e.g. CRAN, Statlib), a reference to where they can be obtained will suffice.
3. Reproducible: An article is designated as reproducible if the AER succeeds in executing the code
on the data provided and produces results matching those that the authors claim are reproducible. In
reproducing these results, reasonable bounds for numerical tolerance will be considered.
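
The policy does not say what counts as “reasonable bounds for numerical tolerance.” As one possible reading, a comparison of recomputed values against published values might look like the sketch below. It is written in Python purely for illustration (the journal's own checks are run in R); the function name results_match, the tolerance defaults, and the example numbers are all assumptions and not part of the policy.

```python
import math

def results_match(computed, published, rel_tol=1e-6, abs_tol=1e-8):
    """Return True if every recomputed value agrees with the published one.

    rel_tol and abs_tol are illustrative defaults (assumptions); the policy
    only states that "reasonable bounds" will be considered.
    """
    if len(computed) != len(published):
        return False
    return all(
        math.isclose(c, p, rel_tol=rel_tol, abs_tol=abs_tol)
        for c, p in zip(computed, published)
    )

# Hypothetical published estimates and a rerun that differs only by rounding.
published = [0.0123, 1.5000, -2.7183]
rerun = [0.012300000001, 1.5, -2.7183000001]
altered = [0.0124, 1.5, -2.7183]

print(results_match(rerun, published))    # rounding-level differences
print(results_match(altered, published))  # a materially different estimate
```

A relative tolerance is used here so that large and small estimates are judged on the same scale, with a small absolute tolerance to handle values close to zero.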
`
Authors can choose to meet a subset of these criteria if they wish. For example, an author may choose
to release code showing how a particular method is implemented but may not have permission to publish
the data. In such a case, the “code” criterion is satisfied, but the “data” and “reproducible” criteria are
not. For authors interested in submitting materials satisfying the “reproducible” criterion, the journal is
currently limiting submissions to those whose analyses are conducted using the R software environment.
This limitation may change in the future and will generally be dependent on the resources of the journal
and the AER. Papers that meet any or all of the above three numbered criteria will be kite-marked D, C,
and/or R on their title page in the journal.
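
The mapping from criteria to kite-marks can be written down compactly. The following Python sketch is only an illustration of the rule stated above; the function name kite_marks and its arguments are hypothetical, and it does not enforce any dependency between the criteria.

```python
def kite_marks(data=False, code=False, reproducible=False):
    """Return the kite-mark letters for the criteria an article meets."""
    marks = []
    if data:
        marks.append("D")   # analytic data made available
    if code:
        marks.append("C")   # computer code provided
    if reproducible:
        marks.append("R")   # AER reproduced the stated results
    return "".join(marks)

# The example from the text: code released, data not available.
code_only = kite_marks(code=True)
all_three = kite_marks(data=True, code=True, reproducible=True)
print(code_only, all_three)
```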
`
`
`
[...] specifically for the purposes of allowing others to partially or fully reproduce their work. Authors
can submit analytic data sets, computer code, or both in support of their papers. Authors may additionally
indicate which results of the paper can be reproduced using the submitted materials, although such an
indication is not required. Unless an author indicates that the supplementary materials satisfy the
“reproducible” criterion (see below), the submitted materials will not be checked for reproducibility but,
at the discretion of the Editors, may still be posted to the journal's Web site with an indication that the
author has contributed the materials for the purpose of reproducing the results.
When submitting data sets and code, authors should use open and documented formats rather than
proprietary formats. Files containing computer code should be submitted in ASCII text format. While
proprietary data formats may be standard in some subspecialties, we would prefer that an open alternative
be submitted for the purposes of posting on the journal's Web site. Increasing the longevity and usefulness
of the data and code is one important goal which is best supported by the use of open formats. The AER
will work with authors to find appropriate formats for data and code submissions.
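
Authors who want to confirm that a code file is plain ASCII before submitting it could run a check along the following lines. This is a minimal Python sketch, not a journal tool; the function name is_ascii_text and the file names are hypothetical.

```python
import tempfile
from pathlib import Path

def is_ascii_text(path):
    """Return True if the file at `path` contains only ASCII bytes."""
    data = Path(path).read_bytes()
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True

# Demonstration on two temporary files with hypothetical names.
with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "analysis.R"
    script.write_bytes(b"fit <- lm(y ~ x)\n")
    notes = Path(tmp) / "notes.txt"
    notes.write_bytes("r\u00e9sum\u00e9\n".encode("utf-8"))
    ascii_ok = is_ascii_text(script)
    non_ascii_ok = is_ascii_text(notes)

print(ascii_ok, non_ascii_ok)
```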
`
3.1 Materials for satisfying the “reproducible” criterion
To satisfy the “reproducible” criterion, authors should submit all the necessary materials so that the AER
can execute the code on the analytic data sets and produce output similar to that obtained by the author.
Currently, only submissions written using the R software environment will be accepted for satisfying this
criterion. Authors should submit the following:
1. A “main” script which directs the overall analysis. This script may load data, other software, and
call the necessary functions for conducting the analysis described in the article.
2. Other required code files, presumably called from the “main” script file.
3. External data or auxiliary files containing the analytic data sets or other required information.
4. A “target” file (or files) containing the results which are to be reproduced. Such a file could consist
of an ASCII text file containing numerical results or a PDF file containing a figure. This will aid in
the comparison of computed results with published results.
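
Before sending materials to the AER, an author might check that the bundle actually contains the “main” script and the “target” files. The Python sketch below shows one way to do this under assumed conventions; the function name missing_materials and every file name in it are hypothetical, and the policy does not prescribe any particular layout.

```python
import tempfile
from pathlib import Path

def missing_materials(bundle_dir, main_script, target_files):
    """List the required files that are absent from a submission directory."""
    root = Path(bundle_dir)
    required = [main_script, *target_files]
    return [name for name in required if not (root / name).is_file()]

# Demonstration: a bundle with code and data but no target files yet.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "main.R").write_text("source('fit_model.R')\n")
    (root / "fit_model.R").write_text("# model-fitting functions\n")
    (root / "data.csv").write_text("y,x\n1,2\n")
    missing = missing_materials(
        root, "main.R", ["target_table1.txt", "target_figure1.pdf"]
    )

print(missing)
```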
Although not required, authors are encouraged to use literate programming tools such as a combination
of LaTeX and Sweave. Specifically for those using Sweave, submissions should include the following:
1. The original Noweb source for the article, typically a file with a .Rnw or .Snw extension.
2. The TeX file generated by the Sweave function, typically with file extension .tex.
3. Any data files or auxiliary code needed to execute the Sweave function successfully on the Noweb
source file.
4. Any bibliographic database files (i.e. BibTeX files).
If the AER is able to reproduce the stated results, the submitted materials will be posted to the journal's
Web site with an indication that the results in the corresponding paper are reproducible.
`
4. EXAMPLE

In this issue, Duncan Lee and coauthors have published a paper (“Air pollution and health in Scotland:
a multicity study”) along with all the necessary data and code for reproducing the principal findings
presented in their paper. In the paper, the authors relate hospital admission counts to air pollution levels
with spatial Poisson regression models. These models are fitted to the data within a Bayesian framework
using Markov chain Monte Carlo (MCMC) methods. The authors have provided the code implementing
their MCMC sampler and have provided an “Overall script” file that directs the analysis.
`
REFERENCES

BAGGERLY, K., MORRIS, J., EDMONSON, S. AND COOMBES, K. (2005). Signal in noise: evaluating reported
reproducibility of serum proteomic tests for ovarian cancer. Journal of the National Cancer Institute 97, 307-309.

COOMBES, K., WANG, J. AND BAGGERLY, K. (2007). Microarrays: retracing steps. Nature Medicine 13, 1276-1277.

DOMINICI, F., MCDERMOTT, A., ZEGER, S. L. AND SAMET, J. M. (2002). On the use of generalized additive
models in time series studies of air pollution and health. American Journal of Epidemiology 156, 1-11.

LAINE, C., GOODMAN, S. N., GRISWOLD, M. E. AND SOX, H. C. (2007). Reproducible research: moving toward
research the public can really trust. Annals of Internal Medicine 146, 450-454.
`
`