Resource
`
`Personal Omics Profiling
`Reveals Dynamic Molecular
`and Medical Phenotypes
`
`Rui Chen,1,11 George I. Mias,1,11 Jennifer Li-Pook-Than,1,11 Lihua Jiang,1,11 Hugo Y.K. Lam,1,12 Rong Chen,2,12
`Elana Miriami,1 Konrad J. Karczewski,1 Manoj Hariharan,1 Frederick E. Dewey,3 Yong Cheng,1 Michael J. Clark,1
`Hogune Im,1 Lukas Habegger,6,7 Suganthi Balasubramanian,6,7 Maeve O’Huallachain,1 Joel T. Dudley,2
`Sara Hillenmeyer,1 Rajini Haraksingh,1 Donald Sharon,1 Ghia Euskirchen,1 Phil Lacroute,1 Keith Bettinger,1 Alan P. Boyle,1
`Maya Kasowski,1 Fabian Grubert,1 Scott Seki,2 Marco Garcia,2 Michelle Whirl-Carrillo,1 Mercedes Gallardo,9,10
`Maria A. Blasco,9 Peter L. Greenberg,4 Phyllis Snyder,1 Teri E. Klein,1 Russ B. Altman,1,5 Atul J. Butte,2 Euan A. Ashley,3
`Mark Gerstein,6,7,8 Kari C. Nadeau,2 Hua Tang,1 and Michael Snyder1,*
`1Department of Genetics, Stanford University School of Medicine
`2Division of Systems Medicine and Division of Immunology and Allergy, Department of Pediatrics
`3Center for Inherited Cardiovascular Disease, Division of Cardiovascular Medicine
`4Division of Hematology, Department of Medicine
`5Department of Bioengineering
`Stanford University, Stanford, CA 94305, USA
`6Program in Computational Biology and Bioinformatics
`7Department of Molecular Biophysics and Biochemistry
`8Department of Computer Science
`Yale University, New Haven, CT 06520, USA
`9Telomeres and Telomerase Group, Molecular Oncology Program, Spanish National Cancer Centre (CNIO), Madrid E-28029, Spain
`10Life Length, Madrid E-28003, Spain
`11These authors contributed equally to this work
`12Present address: Personalis, Palo Alto, CA 94301, USA
`*Correspondence: mpsnyder@stanford.edu
`DOI 10.1016/j.cell.2012.02.009
`
SUMMARY

Personalized medicine is expected to benefit from combining genomic
information with regular monitoring of physiological states by multiple
high-throughput methods. Here, we present an integrative personal omics
profile (iPOP), an analysis that combines genomic, transcriptomic,
proteomic, metabolomic, and autoantibody profiles from a single
individual over a 14 month period. Our iPOP analysis revealed various
medical risks, including type 2 diabetes. It also uncovered extensive,
dynamic changes in diverse molecular components and biological pathways
across healthy and diseased conditions. Extremely high-coverage genomic
and transcriptomic data, which provide the basis of our iPOP, revealed
extensive heteroallelic changes during healthy and diseased states and
an unexpected RNA editing mechanism. This study demonstrates that
longitudinal iPOP can be used to interpret healthy and diseased states
by connecting genomic information with additional dynamic omics
activity.

INTRODUCTION

Personalized medicine aims to assess medical risks and to monitor,
diagnose and treat patients according to their specific genetic
composition and molecular phenotype. The advent of genome
sequencing and the analysis of physiological states have proven
to be powerful (Cancer Genome Atlas Research Network,
2011). However, their implementation for the analysis of otherwise
`healthy individuals for estimation of disease risk and medical
`interpretation is less clear. Much of the genome is difficult to
`interpret and many complex diseases, such as diabetes, neuro-
`logical disorders and cancer, likely involve a large number of
`different genes and biological pathways (Ashley et al., 2010;
`Grayson et al., 2011; Li et al., 2011), as well as environmental
`contributors that can be difficult to assess. As such, the com-
`bination of genomic information along with a detailed molecular
`analysis of samples will be important for predicting, diagnosing
`and treating diseases as well as for understanding the onset, pro-
`gression, and prevalence of disease states (Snyder et al., 2009).
`Presently, healthy and diseased states are typically followed
`using a limited number of assays that analyze a small number
`of markers of distinct types. With the advancement of many
new technologies, it is now possible to analyze upward of 10^5
`molecular constituents. For example, DNA microarrays have
`allowed the subcategorization of
`lymphomas and gliomas
`
Cell 148, 1293–1307, March 16, 2012 ©2012 Elsevier Inc.
`
`Personalis EX2030.1293
`
`

`

`(Mischel et al., 2003), and RNA sequencing (RNA-Seq) has
`identified breast cancer transcript isoforms (Li et al., 2011; van
`der Werf et al., 2007; Wu et al., 2010; Lapuk et al., 2010).
`Although transcriptome and RNA splicing profiling are powerful
`and convenient, they provide a partial portrait of an organism’s
physiological state. Transcriptomic data, when combined with
genomic, proteomic, and metabolomic data, are expected to
`provide a much deeper understanding of normal and diseased
`states (Snyder et al., 2010). To date, comprehensive integrative
`omics profiles have been limited and have not been applied to
`the analysis of generally healthy individuals.
`To obtain a better understanding of: (1) how to generate an
`integrative personal omics profile (iPOP) and examine as many
`biological components as possible, (2) how these components
`change during healthy and diseased states, and (3) how this
`information can be combined with genomic information to
`estimate disease risk and gain new insights into diseased states,
`we performed extensive omics profiling of blood components
`from a generally healthy individual over a 14 month period
`(24 months total when including time points with other molecular
`analyses). We determined the whole-genome sequence (WGS)
`of the subject, and together with transcriptomic, proteomic, me-
`tabolomic, and autoantibody profiles, used this information to
`generate an iPOP. We analyzed the iPOP of the individual over
`the course of healthy states and two viral infections (Figure 1A).
Our results indicate that disease risk can be estimated from
a whole-genome sequence and that, by regularly monitoring health
states with iPOP, disease onset may also be observed. The
`wealth of information provided by detailed longitudinal iPOP re-
`vealed unexpected molecular complexity, which exhibited
`dynamic changes during healthy and diseased states, and
`provided insight into multiple biological processes. Detailed
`omics profiling coupled with genome sequencing can provide
`molecular and physiological information of medical significance.
`This approach can be generalized for personalized health moni-
`toring and medicine.
`
`RESULTS
`
`Overview of Personal Omics Profiling
Our overall iPOP strategy was to: (1) determine the genome
sequence at high accuracy and evaluate disease risks, (2)
monitor omics components over time and integrate the relevant
omics information to assess the variation of physiological states,
and (3) examine in detail the expression of personal variants
at the level of RNA and protein to study molecular complexity
and dynamic changes in diseased states.
`We performed iPOP on blood components (peripheral blood
`mononuclear cells [PBMCs], plasma and sera that are highly
`accessible) from a 54-year-old male volunteer over the course
`of 14 months (IRB-8629). The samples used for iPOP were taken
`over an interval of 401 days (days 0–400). In addition, a complete
medical exam plus laboratory and additional tests were performed
before the study officially launched (day −123) and blood
`glucose was sampled multiple times after the comprehensive
`omics profiling (days 401–602) (Figure 1A). Extensive sampling
`was performed during two viral infections that occurred during
`this period: a human rhinovirus (HRV) infection beginning on
`day 0 and a respiratory syncytial virus (RSV) infection starting
`on day 289. A total of 20 time points were extensively analyzed
`and a summary of the time course is indicated in Figure 1A.
`The different types of analyses performed are summarized in
`Figures 1B and 1C. These analyses, performed on PBMCs
`and/or serum components, included WGS, complete transcrip-
`tome analysis (providing information about the abundance of
`alternative spliced isoforms, heteroallelic expression, and RNA
`edits, as well as expression of miRNAs at selected time points),
`proteomic and metabolomic analyses, and autoantibody
`profiles. An integrative analysis of these data highlights dynamic
`omics changes and provides rich information about healthy and
`diseased phenotypes.
`
`Whole-Genome Sequencing
`We first generated a high quality genome sequence of this
`individual using a variety of different technologies. Genomic
DNA was subjected to deep WGS using technologies from
Complete Genomics (CG, 35 nt paired end) and Illumina
(100 nt paired end) at 150- and 120-fold total coverage, respectively;
to exome sequencing using three different technologies to
80- to 100-fold average coverage (see Extended Experimental
Procedures available online); and to analysis using genotyping
arrays and RNA sequencing.
`The vast majority of genomic sequences (91%) mapped to the
`hg19 (GRCh37) reference genome. However, because of the
depth of our sequencing, we were able to identify sequences
not present in the reference sequence. Assembly of the
unmapped Illumina sequencing reads (60,434,531, 9% of the
total) resulted in 1,425 (of 29,751) contigs (spanning 26 Mb)
overlapping with RefSeq gene sequences that were not annotated in
the hg19 reference genome. The remaining sequences appeared
unique, including 2,919 exons expressed in the RNA-Seq data
(e.g., Figure S1A). These results confirm that a large number of
undocumented genetic regions exist in individual human
genome sequences and can be identified by very deep
sequencing and de novo assembly (Li et al., 2010).
Our analysis detected many single nucleotide variants (SNVs),
small insertions and deletions (indels) and structural variants
(SVs; large insertions, deletions, and inversions relative to
hg19) (summarized in Table 1 and Experimental Procedures).
A total of 134,341 (4.1%) high-confidence SNVs are not present in
dbSNP, indicating that they are very rare or private to the
subject. Only 302 high-confidence indels reside within RefSeq
protein coding exons, and these are enriched for lengths that are
multiples of three nucleotides (p < 0.0001). In addition to indels, 2,566
high-confidence SVs were identified (Experimental Procedures
and Table S1) and 8,646 mobile element insertions were identified
(Stewart et al., 2011).
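
An enrichment of coding indels whose length is a multiple of three can be checked with a one-sided binomial test. The sketch below is illustrative only: it assumes a null model in which one third of indel lengths are multiples of three, and the function name and example lengths are hypothetical rather than the authors' procedure or data.

```python
from math import comb

def in_frame_enrichment_p(indel_lengths, null_fraction=1 / 3):
    """One-sided binomial p-value for an excess of in-frame indels.

    indel_lengths: signed or unsigned indel lengths in nucleotides.
    null_fraction: assumed chance that a length is a multiple of three.
    """
    n = len(indel_lengths)
    # An indel preserves the reading frame when its length is divisible by 3.
    k = sum(1 for length in indel_lengths if abs(length) % 3 == 0)
    # P(X >= k) for X ~ Binomial(n, null_fraction).
    p_value = sum(
        comb(n, i) * null_fraction**i * (1 - null_fraction) ** (n - i)
        for i in range(k, n + 1)
    )
    return k, n, p_value

# Hypothetical lengths (negative = deletion, positive = insertion).
lengths = [3, -3, 6, 1, -9, 3, 2, -6, 12, -3]
k, n, p = in_frame_enrichment_p(lengths)
print(f"{k} of {n} indels are in-frame, one-sided p = {p:.4f}")
```

With real data the input would be the lengths of the coding indels, and the null fraction could be replaced by the fraction observed among noncoding indels.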
Analysis of the subject’s mother’s genome by comprehensive
genome sequencing (as above) and imputation allowed a
maternal/paternal chromosomal phasing of 92.5% of the
subject’s SNVs and indels (see Extended Experimental Procedures
for details). Of 1,162 compound heterozygous mutations
in genes, 139 contain predicted compound heterozygous
deleterious and/or nonsense mutations. Phasing enabled the
assembly of a personal genome sequence of very high confidence
(cf. Rozowsky et al., 2011).
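
Phasing matters for compound heterozygosity because two heterozygous variants in one gene disrupt both copies only when they sit on different parental haplotypes. A minimal sketch of that check follows, assuming phased genotypes written as `maternal|paternal` strings (for example `1|0`); the gene names and variant records are hypothetical.

```python
from collections import defaultdict

def compound_het_genes(variants):
    """Return genes hit on both the maternal and the paternal haplotype.

    variants: iterable of (gene, phased_genotype) pairs, where the genotype
    is a 'maternal|paternal' string and '0' denotes the reference allele.
    """
    haplotypes_hit = defaultdict(set)
    for gene, genotype in variants:
        maternal, paternal = genotype.split("|")
        if maternal != "0" and paternal == "0":
            haplotypes_hit[gene].add("maternal")
        elif paternal != "0" and maternal == "0":
            haplotypes_hit[gene].add("paternal")
        # Homozygous alternative calls are not compound heterozygous.
    return sorted(
        gene for gene, hit in haplotypes_hit.items()
        if hit == {"maternal", "paternal"}
    )

# Hypothetical phased calls for three genes.
calls = [
    ("GENE_A", "1|0"), ("GENE_A", "0|1"),  # one hit per haplotype
    ("GENE_B", "1|0"), ("GENE_B", "1|0"),  # both hits on the maternal copy
    ("GENE_C", "1|1"),                     # homozygous, ignored here
]
result = compound_het_genes(calls)
print(result)
```

Only `GENE_A` qualifies in this toy input, since `GENE_B` still has an intact paternal copy.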
`
`Figure 1. Summary of Study
`(A) Time course summary. The subject was monitored for a total of 726 days, during which there were two infections (red bar, HRV; green bar, RSV). The black bar
`indicates the period when the subject: (1) increased exercise, (2) ingested 81 mg of acetylsalicylic acid and ibuprofen tablets each day (the latter only during the
`first 6 weeks of this period), and (3) substantially reduced sugar intake. Blue numbers indicate fasted time points.
`(B) iPOP experimental design indicating the tissues and analyses involved in this study.
`(C) Circos (Krzywinski et al., 2009) plot summarizing iPOP. From outer to inner rings: chromosome ideogram; genomic data (pale blue ring), structural variants >
`50 bp (deletions [blue tiles], duplications [red tiles]), indels (green triangles); transcriptomic data (yellow ring), expression ratio of HRV infection to healthy states;
`proteomic data (light purple ring), ratio of protein levels during HRV infection to healthy states; transcriptomic data (yellow ring), differential heteroallelic
`expression ratio of alternative allele to reference allele for missense and synonymous variants (purple dots) and candidate RNA missense and synonymous edits
`(red triangles, purple dots, orange triangles and green dots, respectively).
`See also Figure S1.
`
`WGS-Based Disease Risk Evaluation
`We identified variants likely to be associated with increased
`susceptibility to disease (Dewey et al., 2011). The list of high
confidence SNVs and indels was analyzed for rare alleles (<5%
of the major allele frequency in Europeans) and for changes in
genes with known Mendelian disease phenotypes (data summarized
in Table 2), revealing that 51 and 4 of the rare coding SNVs
and indels, respectively, in genes present in OMIM are predicted
to lead to loss-of-function (Table S2A). This list of genes was
`further examined for medical relevance (Table S2A; example
`alleles are summarized in Figure 2A), and 11 were validated by
`Sanger sequencing. High interest genes include: (1) a mutation
`(E366K) in the SERPINA1 gene previously known in the subject,
`(2) a damaging mutation in TERT, associated with acquired
`aplastic anemia (Yamaguchi et al., 2005), and (3) variants asso-
`ciated with hypertriglyceridemia and diabetes, such as GCKR
`
Table 1. Summary and Breakdown of DNA Variants

Type                          Total Variants  Total High Confidence  Heterozygous High Confidence  Homozygous High Confidence
Total SNVs                    3,739,701       3,301,521              1,971,629                     1,329,892
Total gene-associated SNVs    1,312,780       1,183,847              717,485                       466,362
Total coding/UTR              49,017          44,542                 27,383                        17,159
Missense                      10,592          9,683                  5,944                         3,739
Nonsense                      83              73                     49                            24
Synonymous                    11,459          10,864                 6,747                         4,117
5′ UTR                        4,085           2,978                  1,802                         1,176
3′ UTR                        22,798          20,944                 12,841                        8,103
Intron                        1,263,763       1,139,305              690,102                       449,203
Ts/Tv                         -               2.14                   -                             -
dbSNP                         3,493,748       3,167,180              -                             -
Candidate private SNV         245,953         134,341                -                             -
Indels (−107 to +36 bp)       1,022,901       216,776                -                             -
Coding                        3,263           302                    -                             -
Structural variants (>50 bp)  44,781          2,566                  -                             -
In 1000G project (a)          4,434           1,967                  -                             -

High confidence values are from variants identified across multiple platforms (Illumina and CG) and/or Exome and RNA-Seq data. Annotations were
based on variant call formatted (vcf) files for heterozygous calls: 0/1, reference (ref)/alternative (alt); 1/2, alt/alt and homozygous calls; 1/1, alt/alt; 1/,
(alt/alt-incomplete call). Polyphen-2 was used to identify the location of the SNVs.
(a) 1000G (1000 Genomes Project Consortium, 2010).
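
Table 1 reports a transition/transversion ratio (Ts/Tv) of 2.14 for the high-confidence SNVs. As a brief illustration of how such a ratio is computed, the sketch below counts purine-to-purine and pyrimidine-to-pyrimidine substitutions against all others; the function name and the example SNV list are hypothetical, not the study's data.

```python
# Transitions swap a purine for a purine (A<->G) or a pyrimidine
# for a pyrimidine (C<->T); every other substitution is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

def ts_tv_ratio(snvs):
    """Compute Ts/Tv from an iterable of (ref, alt) single-base pairs."""
    snvs = list(snvs)
    ts = sum(1 for ref, alt in snvs if (ref, alt) in TRANSITIONS)
    tv = len(snvs) - ts
    if tv == 0:
        raise ValueError("no transversions: Ts/Tv is undefined")
    return ts / tv

# Hypothetical calls: four transitions and two transversions.
example = [("A", "G"), ("C", "T"), ("G", "A"), ("T", "C"), ("A", "C"), ("G", "T")]
ratio = ts_tv_ratio(example)
print(ratio)
```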
`
(homozygous) (Vaxillaire et al., 2008), KCNJ11 (homozygous)
(Hani et al., 1998), and TCF7 (heterozygous) (Erlich et al.,
2009).
`Genetic disease risks were also assessed by the RiskOGram
`algorithm, which integrates information from multiple alleles
`associated with disease risk (Ashley et al., 2010) (Figure 2B).
This analysis revealed a modestly elevated risk for coronary artery
disease and significantly elevated risk levels of basal cell carcinoma
(Figure 2B), hypertriglyceridemia, and type 2 diabetes
(T2D) (Figures 2B and 2C).
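
The RiskOGram combines a pretest probability with per-variant likelihood ratios (LRs) to give a posttest probability. A minimal sketch of that kind of update is shown here, assuming the standard odds form (posttest odds = pretest odds × product of LRs) and independent variants; the function name and numbers are hypothetical, and the published algorithm may differ in detail.

```python
from math import prod

def posttest_probability(pretest_probability, likelihood_ratios):
    """Update a pretest probability with independent likelihood ratios."""
    if not 0 < pretest_probability < 1:
        raise ValueError("pretest probability must be strictly between 0 and 1")
    pretest_odds = pretest_probability / (1 - pretest_probability)
    # Independence assumption: the likelihood ratios simply multiply.
    posttest_odds = pretest_odds * prod(likelihood_ratios)
    return posttest_odds / (1 + posttest_odds)

# Hypothetical example: 25% pretest risk and two risk-increasing genotypes.
risk = posttest_probability(0.25, [1.5, 2.0])
print(f"posttest probability = {risk:.2f}")
```

Protective genotypes enter the same way with LRs below 1, which lower the posttest probability.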
In addition to coding region variants, we also analyzed genomic
variants that may affect regulatory elements (transcription
factors [TF]), which had not been attempted previously (Data
S1). A total of 14,922 (of 234,980) SNVs lie in the motifs of 36
TFs known to be associated with the binding data (see Experimental
Procedures), indicating that these likely have a
direct effect on TF binding. Comparison of SNPs that alter
binding patterns of NFκB and Pol II sites (Kasowski et al.,
2010) also revealed a number of other interesting regulatory
variants, some of which are associated with human disease
(e.g., EDIL) (Sun et al., 2010) (Figure S1B).
`
`Medical Phenotypes Monitoring
`Based on the above analysis of medically relevant variants and
`the RiskOGram, we monitored markers associated with high-
`risk disease phenotypes and performed additional medically
`relevant assays.
`Monitoring of glucose levels and HbA1c revealed the onset of
`T2D as diagnosed by the subject’s physician (day 369, Figures
`2A and 2C). The subject lacked many known factors associated
`with diabetes (nonsmoker; BMI = 23.9 and 21.7 on day 0 and day
`511, respectively) and glucose levels were normal for the first
`part of the study. However, glucose levels elevated shortly after
`the RSV infection (after day 301) extending for several months
`(Figure 2D). High levels of glucose were further confirmed using
`glycated HbA1c measurements at two time points (days 329,
`369) during this period (6.4% and 6.7%, respectively). After
a dramatic change in diet, exercise and ingestion of low doses
of acetylsalicylic acid, a gradual decrease in glucose (to
93 mg/dl at day 602) and in HbA1c levels (to 4.7%) was observed.
`Insulin resistance was not evident at day 322. The patient was
`negative for anti-GAD and anti-islet antibodies, and insulin levels
`correlated well with the fasted and nonfasted states (Figure S2C),
`consistent with T2D. These results indicate that a genome
`sequence can be used to estimate disease risk in a healthy indi-
`vidual, and by monitoring traits associated with that disease,
`disease markers can be detected and the phenotype treated.
The subject carried a TERT mutation previously associated
with aplastic anemia (Yamaguchi et al., 2005). However, measurements
of telomere length suggested little or no decrease in
telomere length and a modest increase in numbers of cells with
short telomeres relative to age-matched controls (Figures S2A
and S2B). Importantly, the patient and his 83-year-old mother
share the same mutation but neither exhibits symptoms of aplastic
anemia, indicating that this mutation does not always result in
disease and is likely context specific in its effects.
`Consistent with the elevated hypertriglyceridemia risk, triglyc-
`erides were found to be high (321 mg/dl) at the beginning of the
`study. These levels were reduced (81–116 mg/dl) after regularly
`taking simvastatin (20 mg/day).
`We also examined the variants for their potential effects on
drug response (see Extended Experimental Procedures). Among
the alleles of interest (Figure 2A and Table S2B), two genotypes
affecting the LPIN1 and SLC22A1 genes were associated with

Table 2. Summary of Disease-Related Rare Variants

Category                              Count
Total high confidence rare SNVs       289,989
Coding                                2,546
Missense                              1,320
Synonymous                            1,214
Nonsense                              11
Nonstop                               1
Damaging or possibly damaging         233
Putative loss-of-function SNVs (a)    51
Total high confidence rare indels     51,248
Coding indels                         61
Frameshift indels                     27
miRNA indels                          3
miRNA target sequence indels          5
Putative loss-of-function indels (a)  4

(a) In curated Mendelian disease genes.

`
favorable (glucose lowering) responses to two diabetes drugs,
rosiglitazone and metformin, respectively.
`We followed the levels of 51 cytokines along with the C-reactive
`protein (CRP) using ELISA assays, which revealed strong induc-
`tion of proinflammatory cytokines and CRP during each infection
`(Figures 2E and 2F). We also observed a spike of many cytokines
`at day 12 after the RSV infection (day 301 overall). These data
`define the physiological states and serve as a valuable reference
`for the omic profiles integrated into a longitudinal map of healthy
`and diseased states described in the next sections.
`We also profiled autoantibodies during the HRV infection.
Plasma and serum samples from the first four time points
(days −123, 0, 4 and 21), along with plasma samples from 34
`healthy controls were used to probe a protein microarray con-
`taining 9,483 unique human proteins spotted in duplicate. A total
`of 884 antigens with increased reactivity (Data S2) in the candi-
`date plasma relative to healthy controls were found (p < 0.01,
`Benjamini-Hochberg p < 0.01). Among the potentially interesting
`results was high reactivity with DOK6, an insulin receptor binding
`protein (NCBI gene database). These results demonstrate that
`autoantibodies can be monitored and that information relevant
`to disease conditions can be found.
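
The Benjamini-Hochberg adjustment cited for the autoantibody results controls the false discovery rate across many simultaneous tests. A compact sketch of the standard step-up adjustment is given here for illustration; the p-values are hypothetical, and this is not the authors' analysis code.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical raw p-values for four antigens.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
print([round(p, 4) for p in adj])
```

An antigen would then be called reactive only when both its raw and its adjusted p-value fall below the chosen cutoff (0.01 in the text above).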
`
`Dynamic Omics Analysis: Integrative Omics Profiling
`of Molecular Responses
`We profiled the levels of transcripts, proteins, and metabolites
`across the HRV and RSV infections and healthy states using
`a variety of approaches. RNA-Seq of 20 time points generated
`over 2.67 billion uniquely mapped 101b paired-end reads
`(123 million reads average per time point) and allowed for an
`analysis of the molecular complexity of the transcriptome in
`normal cells (PBMCs) at an unprecedented level. The relative
`levels of 6,280 proteins were also measured at 14 time points
`through differential labeling of samples using isobaric tandem
`mass tags (TMT), followed by liquid chromatography and mass
spectrometry (LC-MS/MS) (Cox and Mann, 2010; Theodoridis
et al., 2011). A total of 3,731 PBMC proteins could be consistently
monitored across most of the 14 time points (see Figure S3A
and Data S3). In addition, 6,862 and 4,228 metabolite
peaks were identified for the HRV and RSV infections, respectively, and a total
of 1,020 metabolites were tracked for both infections (see Figure S4
and Data S4, [3]). Finally, as described below, we also
`analyzed miRNAs during the HRV infection.
`This wealth of omics information allowed us to examine
`detailed dynamic trends related directly to the physiological
`states of the individual and revealed enormous changes in
`biological processes that occurred during healthy and diseased
states. For each profile (transcriptome, proteome, metabolome),
we systematically searched for two types of nonrandom
patterns: (1) correlated patterns over time and (2) single unusual
events (i.e., spikes that may occur at any given time point, defined
as statistically significantly high or low signal instances compared
to what would be expected by chance). To perform this
analysis, we developed a general scheme for integrated analysis
of data (see Figure S5 and Extended Experimental Procedures
for further details). We used a Fourier spectral analysis approach
that both normalizes the various omics data on an equal basis for
identifying the common trends and features and accounts
for data set variability, uneven sampling, and data gaps, in order
to detect real-time changes in any kind of omics activity at
the differential time points (see Supplemental Information).
`Autocorrelations were calculated to assess nonrandomness
`of the time-series (p < 0.05 one-tailed based on simulated
`bootstrap nonparametric distribution by sampling with replace-
`ment of the original data, n > 100,000), with significant signals
`classified as autocorrelated (I). The remaining data was searched
`for spike events, which were classified as spike maxima (II)
`or spike minima (III) (p < 0.05 one-tailed based on differences
`from simulated, n > 100,000 random distribution of the time-
`series). After classification, the data were agglomerated into hier-
`archical clusters (using correlation distance and average linkage)
`of common patterns and biological relevance was assessed
`through GO (Ashburner et al., 2000) analysis (Cytoscape [Smoot
`et al., 2011], BiNGO [Maere et al., 2005] p < 0.05, Benjamini-
`Hochberg [Benjamini and Hochberg, 1995] adjusted p < 0.05)
`and pathway analysis (Reactome [Croft et al., 2011] functional
`interaction [FI], networks including KEGG [Kanehisa and Goto,
`2000; Smoot et al., 2011], p < 0.05, FDR < 0.05). The unified
`framework approach was implemented on all the different data
`sets both individually and in combination, and our results
`revealed a number of differential changes that occurred both
`during infectious states and the varying glucose states.
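
The nonrandomness test described above (a one-tailed p-value from a bootstrap distribution built by sampling the original data with replacement) can be sketched as follows. This is a simplified illustration under stated assumptions: it uses lag-1 autocorrelation on evenly spaced points, whereas the study's spectral approach accounts for uneven sampling, and it defaults to far fewer resamples than the n > 100,000 used in the paper so that it runs quickly. All function names are hypothetical.

```python
import random

def lag1_autocorrelation(series):
    """Lag-1 autocorrelation of a sequence of numbers."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        return 0.0  # a constant series carries no autocorrelation signal
    num = sum((series[i] - mean) * (series[i + 1] - mean) for i in range(n - 1))
    return num / denom

def bootstrap_autocorr_p(series, n_resamples=2000, seed=0):
    """One-tailed bootstrap p-value for the observed lag-1 autocorrelation."""
    rng = random.Random(seed)
    observed = lag1_autocorrelation(series)
    hits = 0
    for _ in range(n_resamples):
        # Sampling with replacement destroys the time ordering (null model).
        resample = rng.choices(series, k=len(series))
        if lag1_autocorrelation(resample) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return observed, (hits + 1) / (n_resamples + 1)

# Hypothetical steadily rising expression profile over 20 time points.
trend = [float(t) for t in range(20)]
r1, p = bootstrap_autocorr_p(trend)
print(f"lag-1 autocorrelation = {r1:.2f}, bootstrap p = {p:.4f}")
```

Series that pass this test would fall into class (I); the remainder would then be screened for spike maxima (II) or minima (III) against a similarly simulated null.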
`We first analyzed the different individual transcriptome, pro-
`teome (serum and PBMC) and metabolome data sets; the
`proteome and metabolome results are presented in the Supple-
`mental Information (Figures S3, S4, S6 and Data S3–S6). A total
`of 19,714 distinct transcript isoforms (Wang et al., 2008) corre-
`sponding to 12,659 genes (Figure S1C) were tracked for the
entire time course, and their dynamic expression response
was classified into either autocorrelated (I) or spike sets, the latter further
subdivided as displaying maxima (II) or minima (III) (Figure 3). The
`clustering and enrichment analysis displayed a number of
`interesting pathways in each class. In the autocorrelated group
`(Figure 3B, [I]; see also Figure S6A and Data S6, [1 and 2]), we
`
`Figure 2. Medical Findings
`(A) High interest disease- and drug-related variants in the subject’s genome.
`(B) RiskGraph of the top 20 diseases with the highest posttest probabilities. For each disease, the arrow represents the pretest probability according to the
`subject’s age, gender, and ethnicity. The line represents the posttest probability after incorporating the subject’s genome sequence. Listed to the right are the
`numbers of independent disease-associated SNVs used to calculate the subject’s posttest probability.
`(C) RiskOGram of type 2 diabetes. The RiskOGram illustrates how the subject’s posttest probability of T2D was calculated using 28 independent SNVs. The
`middle graph displays the posttest probability. The left side shows the associated genes, SNVs, and the subject’s genotypes. The right side shows the likelihood
`ratio (LR), number of studies, cohort sizes, and the posttest probability.
`(D) Blood glucose trend. Measurements were taken from samples analyzed at either nonfasted or fasted states; the nonfasted states (all but days 186,
`322, 329, and 369 and after day 400) were at a fixed time after a constant meal. Data was presented as moving average with a window of 15 days. Red
`
`found two main trends: an upward trend (2,023 genes), following
`the onset of the RSV infection, and a similar coincidental
`downward trend (2,207 genes). The upward autocorrelated
`trend revealed a number of pathways as enriched (p < 0.002,
`FDR < 0.05), including protein metabolism and influenza life
cycle. Additionally, the downward autocorrelation cluster
`showed a multitude of enriched pathways (p < 0.008, FDR <
`0.05), such as TCR signaling in naive CD4+ T cells, lysosome,
`B cell signaling, androgen regulation, and of particular interest,
`insulin signaling/response pathways. These different pathways,
which are activated as a response to an immune infection, often
share common genes; additionally, we observe many genes
hitherto unknown to be involved in these pathways but displaying
the same trend. Furthermore, we observed that the downward
trend, which began with the onset of the RSV infection and
appeared to accelerate after day 307, coincided with the beginning
of the observed elevated glucose levels in the subject.
`In the dynamic spike class we again saw patterns that were
`concordant with phenotypes (Figure 3B, [II] and [III]; see also
`Figure S6A and Data S6, [3–14]). A set of expression spikes
displaying maxima (547 genes) that are common to the onset
of both the RSV and HRV infections are associated with phagosome,
immune processes and phagocytosis (p < 1 × 10^-4,
FDR < 6 × 10^-3). Furthermore, a cluster that exhibits an elevated
spike at the onset of the RSV infection involves the major
histocompatibility genes (p < 7 × 10^-4, Benjamini-Hochberg adjusted
p < 0.03). A large number of genes with a coexpression pattern
`common to both infections in the time course have yet to be
`implicated in known pathways and provide possible connections
`related to immune response. Finally, our spike class displaying
minima showed a distinct cluster (1,535 genes) singular to day
307 (day 18 of the RSV infection), associated with TCR signaling
again, TGF receptors, and T cell and insulin signaling pathways
(p < 0.02, FDR < 0.03). Overall, the transcriptome analysis
captures the body’s dynamic response to infection,
as also evidenced by our cytokine measurements, and
can also monitor health changes over long periods of time.
`To further leverage the transcriptome and genome data, we
`performed an integrated analysis of transcriptome, proteomic
`and metabolomics data for each time point, observing how this
`corresponded to the varying physiological states monitored as
`described in the above sections. Because of the availability of
`many time points through the course of infection, we examined
`in detail the onset of the RSV infection, as well as extended our
`complete dynamics omics profile during the times that our
`subject began exhibiting high glucose levels. Figure 4 shows
`an integrated interpretation of omics data (see also Figure S6B
`and Data S7), where all trends are combined for each omics
`data set and the common patterns emerge providing comple-
`mentary information.
In addition to the common patterns
observed in our transcriptome analysis, new patterns emerged,
`some unique to protein data, some to metabolite, and some
`common to all. In particular we found the following interesting
`results: for autocorrelated clusters we found the same trends
`as observed in the transcriptome, additionally augmented
`with concordant protein expressions. Pathways such as the
`phagosome, lysosome, protein processing in endoplasmic retic-
`ulum, and insulin pathways emerged as significantly enriched
`(p < 0.002, FDR < 0.0075), and showed a downward trend post-
`infection, and further accelerated after 3 weeks following the
initial onset of the RSV infection (this cluster comprised
1,452 transcriptomic and 69 proteomic components, corresponding
to 1,444 genes). The elevated spike class showed a
maxima cluster on day 18 post RSV infection (one time point
after the cytokine maximum), with enrichment in pathways
such as the spliceosome, glucose regulation of insulin secretion,
and various pathways related to a stress response (p < 1 × 10^-4,
FDR < 0.02); this cluster included 1,956 transcriptomic, 571
`2,344 genes. Even though current proteomic information is
`more limited than the full transcriptome because it follows fewer
`components, as evidenced in Figure 4 (II), several pathways,
`including the glucose regulation of insulin secretion pathway,
`clearly emerge from the proteomic information and would not
`have been observed by only monitoring the transcriptome.
Additionally, in this cluster we find significant GO enrichment in
splicing and metabolic processes (p < 6 × 10^-47, Benjamini-Hochberg
adjusted p < 10^-45). Furthermore, inspection of
metabolites reveals 23 that show the same exact trend (i.e.,
spikes at day 18 post RSV infection); at least one, lauric acid,
has been implicated in fatty acid metabolism and insulin regulatory
pathways (Kusunoki et al., 2007). Finally, we observe minima
`spikes as well, with yet another interesting group on day 18,
`which showed downregulation in several pathways (p < 0.003,
`FDR < 0.05), such as the formation of platelet plug. This cluster
displayed a high degree of synergy between the various omics
data, comprising 3,237 transcriptomic and 761 proteomic
components corresponding to 3,400 genes and 83 metabolomic
components.
In summary, our integrated approach revealed a clear
systemic response to the RSV infection following its onset and
postinfection response, including a pronounced response
evident at day 18 post RSV infection. A variety of infection/stress
response related pathways were affected along with those
associated with the high glucose levels in the later time points,
including insulin response pathways.
`
`Dynamic Omics Analysis: Extensive Heteroallelic
`Variation and RNA Editing
`The considerable amount of transcriptome and proteome data
`allowed us to analyze and follow changes in allele-specific
`
`and green arrows and bars indicate the times of the HRV and RSV infections, respectively. Black arrows and bars indicate the period with life style
`changes.
`(E) C-reactive protein trend line. Error bars represent standard deviation of three assays.
`(F) Serum cytokine profiles. Red box and day number, HRV infection; green box and day number, RSV infection; question mark, elevated cytokine levels indi-
`cating an unknown event at day 301. Red is increased cytokine levels.
`See also Figure S2.
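
Panel (D) presents glucose as a moving average with a 15 day window over unevenly spaced sampling days. One way to compute such a curve is sketched below, assuming a trailing window that averages every measurement taken in the 15 days up to and including each sampling day; the caption does not state whether the window was trailing or centered, and the example days and values are hypothetical.

```python
def moving_average_by_day(days, values, window=15):
    """Trailing moving average over a window measured in days.

    For each sampling day d, average all values whose day t satisfies
    d - window < t <= d, so uneven spacing is handled naturally.
    """
    smoothed = []
    for d in days:
        in_window = [v for t, v in zip(days, values) if d - window < t <= d]
        smoothed.append(sum(in_window) / len(in_window))
    return smoothed

# Hypothetical sampling days and glucose readings (mg/dl).
sample_days = [0, 4, 21, 30]
glucose = [100, 110, 120, 140]
trend = moving_average_by_day(sample_days, glucose)
print(trend)
```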
`
`Figure 3. Transcriptome Time Course Analysis
`(A) Summary of approach for identification of differentially expressed components. The various omics sets were processed through a common framework
`involving spectral analysis, clustering, and pathway enrichment analysis.
`(B) Pattern classification. The different emergent patterns from the analysis of the transcriptome
