`
`Personal Omics Profiling
`Reveals Dynamic Molecular
`and Medical Phenotypes
`
`Rui Chen,1,11 George I. Mias,1,11 Jennifer Li-Pook-Than,1,11 Lihua Jiang,1,11 Hugo Y.K. Lam,1,12 Rong Chen,2,12
`Elana Miriami,1 Konrad J. Karczewski,1 Manoj Hariharan,1 Frederick E. Dewey,3 Yong Cheng,1 Michael J. Clark,1
`Hogune Im,1 Lukas Habegger,6,7 Suganthi Balasubramanian,6,7 Maeve O’Huallachain,1 Joel T. Dudley,2
`Sara Hillenmeyer,1 Rajini Haraksingh,1 Donald Sharon,1 Ghia Euskirchen,1 Phil Lacroute,1 Keith Bettinger,1 Alan P. Boyle,1
`Maya Kasowski,1 Fabian Grubert,1 Scott Seki,2 Marco Garcia,2 Michelle Whirl-Carrillo,1 Mercedes Gallardo,9,10
`Maria A. Blasco,9 Peter L. Greenberg,4 Phyllis Snyder,1 Teri E. Klein,1 Russ B. Altman,1,5 Atul J. Butte,2 Euan A. Ashley,3
`Mark Gerstein,6,7,8 Kari C. Nadeau,2 Hua Tang,1 and Michael Snyder1,*
`1Department of Genetics, Stanford University School of Medicine
`2Division of Systems Medicine and Division of Immunology and Allergy, Department of Pediatrics
`3Center for Inherited Cardiovascular Disease, Division of Cardiovascular Medicine
`4Division of Hematology, Department of Medicine
`5Department of Bioengineering
`Stanford University, Stanford, CA 94305, USA
`6Program in Computational Biology and Bioinformatics
`7Department of Molecular Biophysics and Biochemistry
`8Department of Computer Science
`Yale University, New Haven, CT 06520, USA
`9Telomeres and Telomerase Group, Molecular Oncology Program, Spanish National Cancer Centre (CNIO), Madrid E-28029, Spain
`10Life Length, Madrid E-28003, Spain
`11These authors contributed equally to this work
`12Present address: Personalis, Palo Alto, CA 94301, USA
`*Correspondence: mpsnyder@stanford.edu
`DOI 10.1016/j.cell.2012.02.009
`
`SUMMARY
`
`Personalized medicine is expected to benefit from
`combining genomic information with regular monitoring
`of physiological states by multiple high-throughput
`methods. Here, we present an integrative
`personal omics profile (iPOP), an analysis that
`combines genomic, transcriptomic, proteomic,
`metabolomic, and autoantibody profiles from a single
`individual over a 14 month period. Our iPOP analysis
`revealed various medical risks, including type 2
`diabetes. It also uncovered extensive, dynamic
`changes in diverse molecular components and
`biological pathways across healthy and diseased
`conditions. Extremely high-coverage genomic
`and transcriptomic data, which provide the basis
`of our iPOP, revealed extensive heteroallelic
`changes during healthy and diseased states and an
`unexpected RNA editing mechanism. This study
`demonstrates that longitudinal iPOP can be used
`to interpret healthy and diseased states by connecting
`genomic information with additional dynamic
`omics activity.
`
`INTRODUCTION
`
`Personalized medicine aims to assess medical risks, monitor,
`diagnose and treat patients according to their specific genetic
`composition and molecular phenotype. The advent of genome
`sequencing and the analysis of physiological states has proven
`to be powerful (Cancer Genome Atlas Research Network,
`2011). However, its implementation for the analysis of otherwise
`healthy individuals for estimation of disease risk and medical
`interpretation is less clear. Much of the genome is difficult to
`interpret and many complex diseases, such as diabetes, neuro-
`logical disorders and cancer, likely involve a large number of
`different genes and biological pathways (Ashley et al., 2010;
`Grayson et al., 2011; Li et al., 2011), as well as environmental
`contributors that can be difficult to assess. As such, the com-
`bination of genomic information along with a detailed molecular
`analysis of samples will be important for predicting, diagnosing
`and treating diseases as well as for understanding the onset, pro-
`gression, and prevalence of disease states (Snyder et al., 2009).
`Presently, healthy and diseased states are typically followed
`using a limited number of assays that analyze a small number
`of markers of distinct types. With the advancement of many
`new technologies, it is now possible to analyze upward of 10^5
`molecular constituents. For example, DNA microarrays have
`allowed the subcategorization of lymphomas and gliomas
`
`Cell 148, 1293–1307, March 16, 2012 ©2012 Elsevier Inc. 1293
`
`Personalis EX2030.1293
`
`
`
`(Mischel et al., 2003), and RNA sequencing (RNA-Seq) has
`identified breast cancer transcript isoforms (Li et al., 2011; van
`der Werf et al., 2007; Wu et al., 2010; Lapuk et al., 2010).
`Although transcriptome and RNA splicing profiling are powerful
`and convenient, they provide a partial portrait of an organism’s
`physiological state. Transcriptomic data, when combined with
`genomic, proteomic, and metabolomic data, are expected to
`provide a much deeper understanding of normal and diseased
`states (Snyder et al., 2010). To date, comprehensive integrative
`omics profiles have been limited and have not been applied to
`the analysis of generally healthy individuals.
`To obtain a better understanding of: (1) how to generate an
`integrative personal omics profile (iPOP) and examine as many
`biological components as possible, (2) how these components
`change during healthy and diseased states, and (3) how this
`information can be combined with genomic information to
`estimate disease risk and gain new insights into diseased states,
`we performed extensive omics profiling of blood components
`from a generally healthy individual over a 14 month period
`(24 months total when including time points with other molecular
`analyses). We determined the whole-genome sequence (WGS)
`of the subject, and together with transcriptomic, proteomic, me-
`tabolomic, and autoantibody profiles, used this information to
`generate an iPOP. We analyzed the iPOP of the individual over
`the course of healthy states and two viral infections (Figure 1A).
`Our results indicate that disease risk can be estimated from
`a whole-genome sequence and that, by regularly monitoring health
`states with iPOP, disease onset may also be observed. The
`wealth of information provided by detailed longitudinal iPOP
`revealed unexpected molecular complexity, which exhibited
`dynamic changes during healthy and diseased states, and
`provided insight into multiple biological processes. Detailed
`omics profiling coupled with genome sequencing can provide
`molecular and physiological information of medical significance.
`This approach can be generalized for personalized health moni-
`toring and medicine.
`
`RESULTS
`
`Overview of Personal Omics Profiling
`Our overall iPOP strategy was to: (1) determine the genome
`sequence at high accuracy and evaluate disease risks, (2)
`monitor omics components over time and integrate the relevant
`omics information to assess the variation of physiological states,
`and (3) examine in detail the expression of personal variants
`at the level of RNA and protein to study molecular complexity
`and dynamic changes in diseased states.
`We performed iPOP on blood components (peripheral blood
`mononuclear cells [PBMCs], plasma and sera that are highly
`accessible) from a 54-year-old male volunteer over the course
`of 14 months (IRB-8629). The samples used for iPOP were taken
`over an interval of 401 days (days 0–400). In addition, a complete
`medical exam plus laboratory and additional tests were performed
`before the study officially launched (day −123) and blood
`glucose was sampled multiple times after the comprehensive
`omics profiling (days 401–602) (Figure 1A). Extensive sampling
`was performed during two viral infections that occurred during
`this period: a human rhinovirus (HRV) infection beginning on
`day 0 and a respiratory syncytial virus (RSV) infection starting
`on day 289. A total of 20 time points were extensively analyzed
`and a summary of the time course is indicated in Figure 1A.
`The different types of analyses performed are summarized in
`Figures 1B and 1C. These analyses, performed on PBMCs
`and/or serum components, included WGS, complete transcrip-
`tome analysis (providing information about the abundance of
`alternative spliced isoforms, heteroallelic expression, and RNA
`edits, as well as expression of miRNAs at selected time points),
`proteomic and metabolomic analyses, and autoantibody
`profiles. An integrative analysis of these data highlights dynamic
`omics changes and provides rich information about healthy and
`diseased phenotypes.
`
`Whole-Genome Sequencing
`We first generated a high quality genome sequence of this
`individual using a variety of different technologies. Genomic
`DNA was subjected to deep WGS using technologies from
`Complete Genomics (CG, 35 nt paired end) and Illumina
`(100 nt paired end) at 150- and 120-fold total coverage,
`respectively; to exome sequencing using three different technologies at
`80- to 100-fold average coverage (see Extended Experimental
`Procedures available online); and to analysis using genotyping
`arrays and RNA sequencing.
`The vast majority of genomic sequences (91%) mapped to the
`hg19 (GRCh37) reference genome. However, because of the
`depth of our sequencing, we were able to identify sequences
`not present in the reference sequence. Assembly of the
`unmapped Illumina sequencing reads (60,434,531, 9% of the
`total) resulted in 1,425 (of 29,751) contigs (spanning 26 Mb)
`overlapping with RefSeq gene sequences that were not annotated in
`the hg19 reference genome. The remaining sequences appeared
`unique, including 2,919 exons expressed in the RNA-Seq data
`(e.g., Figure S1A). These results confirm that a large number of
`undocumented genetic regions exist in individual human
`genome sequences and can be identified by very deep
`sequencing and de novo assembly (Li et al., 2010).
`Our analysis detected many single nucleotide variants (SNVs),
`small insertions and deletions (indels), and structural variants
`(SVs; large insertions, deletions, and inversions relative to
`hg19) (summarized in Table 1 and Experimental Procedures).
`Of the high-confidence SNVs, 134,341 (4.1%) are not present in
`dbSNP, indicating that they are very rare or private to the
`subject. Only 302 high-confidence indels reside within RefSeq
`protein coding exons, and these are enriched for lengths that are
`multiples of three nucleotides (p < 0.0001). In addition to indels,
`2,566 high-confidence SVs (Experimental Procedures
`and Table S1) and 8,646 mobile element insertions were
`identified (Stewart et al., 2011).
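The enrichment of coding indels whose length is a multiple of three can be illustrated with a short sketch. The indel lengths below are hypothetical, and the null proportion of 1/3 together with the one-sided binomial tail are assumptions chosen for illustration; the text does not state which test produced the reported p value.

```python
from math import comb


def is_in_frame(indel_length: int) -> bool:
    """True when the indel adds or removes whole codons (length divisible by 3)."""
    return abs(indel_length) % 3 == 0


def binomial_upper_tail(successes: int, trials: int, p_null: float) -> float:
    """P(X >= successes) for X ~ Binomial(trials, p_null)."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical signed indel lengths (negative = deletion); not data from the study.
example_lengths = [-3, 3, -6, 1, -3, 9, -2, 3, -12, 6]

in_frame = sum(is_in_frame(n) for n in example_lengths)

# Assumed null: a random indel length is a multiple of three one time in three.
p_value = binomial_upper_tail(in_frame, len(example_lengths), 1 / 3)
```

An excess of in-frame lengths is what one would expect if frameshifting indels in coding exons are selected against, which is the usual reading of this kind of enrichment.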
`Analysis of the subject’s mother’s genome by comprehensive
`genome sequencing (as above) and imputation allowed
`maternal/paternal chromosomal phasing of 92.5% of the
`subject’s SNVs and indels (see Extended Experimental
`Procedures for details). Of 1,162 compound heterozygous mutations
`in genes, 139 contain predicted compound heterozygous
`deleterious and/or nonsense mutations. Phasing enabled the
`assembly of a personal genome sequence of very high
`confidence (cf. Rozowsky et al., 2011).
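The logic behind parent-of-origin phasing and compound heterozygote detection can be sketched as follows. This is a simplified illustration that uses only the maternal genotype at each site; it is not the pipeline used in the study, which also relied on imputation. The gene names and genotypes are hypothetical.

```python
from collections import defaultdict


def parent_of_origin(child_gt, mother_gt):
    """Assign the alt allele of a heterozygous child site to a parent.

    Genotypes are pairs of 0 (reference) and 1 (alternative).
    Returns 'maternal', 'paternal', or None when the site is uninformative.
    """
    if sorted(child_gt) != [0, 1]:
        return None  # only heterozygous child sites need phasing
    mother_alt_count = sum(mother_gt)
    if mother_alt_count == 0:
        return "paternal"  # mother has no alt allele to transmit
    if mother_alt_count == 2:
        return "maternal"  # mother can only transmit alt
    return None  # mother heterozygous: ambiguous without further evidence


def compound_het_genes(variants):
    """Genes carrying phased variants on both the maternal and paternal copies."""
    origins = defaultdict(set)
    for gene, child_gt, mother_gt in variants:
        origin = parent_of_origin(child_gt, mother_gt)
        if origin is not None:
            origins[gene].add(origin)
    return sorted(g for g, seen in origins.items() if seen == {"maternal", "paternal"})


# Hypothetical variants: (gene, child genotype, mother genotype).
example_variants = [
    ("GENE_A", (0, 1), (0, 0)),  # alt inherited from father
    ("GENE_A", (0, 1), (1, 1)),  # alt inherited from mother
    ("GENE_B", (0, 1), (0, 0)),  # alt inherited from father
    ("GENE_B", (1, 0), (0, 1)),  # ambiguous
    ("GENE_C", (1, 1), (0, 1)),  # homozygous in child, not phased here
]
compound_het = compound_het_genes(example_variants)
```

Sites where the mother is heterozygous remain unphased in this sketch, which is one reason a real analysis would draw on additional information such as imputation.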
`
`Figure 1. Summary of Study
`(A) Time course summary. The subject was monitored for a total of 726 days, during which there were two infections (red bar, HRV; green bar, RSV). The black bar
`indicates the period when the subject: (1) increased exercise, (2) ingested 81 mg of acetylsalicylic acid and ibuprofen tablets each day (the latter only during the
`first 6 weeks of this period), and (3) substantially reduced sugar intake. Blue numbers indicate fasted time points.
`(B) iPOP experimental design indicating the tissues and analyses involved in this study.
`(C) Circos (Krzywinski et al., 2009) plot summarizing iPOP. From outer to inner rings: chromosome ideogram; genomic data (pale blue ring), structural variants >
`50 bp (deletions [blue tiles], duplications [red tiles]), indels (green triangles); transcriptomic data (yellow ring), expression ratio of HRV infection to healthy states;
`proteomic data (light purple ring), ratio of protein levels during HRV infection to healthy states; transcriptomic data (yellow ring), differential heteroallelic
`expression ratio of alternative allele to reference allele for missense and synonymous variants (purple dots) and candidate RNA missense and synonymous edits
`(red triangles, purple dots, orange triangles and green dots, respectively).
`See also Figure S1.
`
`WGS-Based Disease Risk Evaluation
`We identified variants likely to be associated with increased
`susceptibility to disease (Dewey et al., 2011). The list of high
`confidence SNVs and indels was analyzed for rare alleles (<5%
`of the major allele frequency in Europeans) and for changes in
`genes with known Mendelian disease phenotypes (data summarized
`in Table 2), revealing that 51 and 4 of the rare coding SNVs
`and indels, respectively, in genes present in OMIM are predicted
`to lead to loss-of-function (Table S2A). This list of genes was
`further examined for medical relevance (Table S2A; example
`alleles are summarized in Figure 2A), and 11 were validated by
`Sanger sequencing. High interest genes include: (1) a mutation
`(E366K) in the SERPINA1 gene previously known in the subject,
`(2) a damaging mutation in TERT, associated with acquired
`aplastic anemia (Yamaguchi et al., 2005), and (3) variants asso-
`ciated with hypertriglyceridemia and diabetes, such as GCKR
`
`Table 1. Summary and Breakdown of DNA Variants
`
`Type                         | Total Variants | Total High Confidence | Heterozygous High Confidence | Homozygous High Confidence
`Total SNVs                   | 3,739,701      | 3,301,521             | 1,971,629                    | 1,329,892
`Total gene-associated SNVs   | 1,312,780      | 1,183,847             | 717,485                      | 466,362
`Total coding/UTR             | 49,017         | 44,542                | 27,383                       | 17,159
`Missense                     | 10,592         | 9,683                 | 5,944                        | 3,739
`Nonsense                     | 83             | 73                    | 49                           | 24
`Synonymous                   | 11,459         | 10,864                | 6,747                        | 4,117
`5′ UTR                       | 4,085          | 2,978                 | 1,802                        | 1,176
`3′ UTR                       | 22,798         | 20,944                | 12,841                       | 8,103
`Intron                       | 1,263,763      | 1,139,305             | 690,102                      | 449,203
`Ts/Tv                        | -              | 2.14                  | -                            | -
`dbSNP                        | 3,493,748      | 3,167,180             | -                            | -
`Candidate private SNV        | 245,953        | 134,341               | -                            | -
`Indels (−107 to +36 bp)      | 1,022,901      | 216,776               | -                            | -
`Coding                       | 3,263          | 302                   | -                            | -
`Structural variants (>50 bp) | 44,781         | 2,566                 | -                            | -
`In 1000G project^a           | 4,434          | 1,967                 | -                            | -
`
`High confidence values are from variants identified across multiple platforms (Illumina and CG) and/or Exome and RNA-Seq data. Annotations were
`based from variant call formatted (vcf) files for heterozygous calls: 0/1, reference (ref)/alternative (alt); 1/2, alt/alt and homozygous calls; 1/1, alt/alt; 1/,
`(alt/alt-incomplete call). Polyphen-2 was used to identify the location of the SNVs.
`a1000G (1000 Genomes Project Consortium, 2010).
`
`(homozygous) (Vaxillaire et al., 2008), KCNJ11 (homozygous)
`(Hani et al., 1998), and TCF7 (heterozygous) (Erlich et al.,
`2009).
`Genetic disease risks were also assessed by the RiskOGram
`algorithm, which integrates information from multiple alleles
`associated with disease risk (Ashley et al., 2010) (Figure 2B).
`This analysis revealed a modest elevated risk for coronary artery
`disease and significantly elevated risk levels of basal cell carci-
`noma (Figure 2B), hypertriglyceridemia, and type 2 diabetes
`(T2D) (Figures 2B and 2C).
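The way a pretest probability is updated with likelihood ratios from independent risk alleles can be sketched in a few lines. This is the standard odds-form Bayesian update, given here as an illustration of the idea rather than as the RiskOGram implementation itself; the pretest probability and likelihood ratios below are hypothetical values.

```python
from math import prod


def posttest_probability(pretest_probability, likelihood_ratios):
    """Update a pretest probability with independent likelihood ratios.

    Converts the probability to odds, multiplies by each likelihood ratio
    (valid only if the variants are treated as independent), and converts back.
    """
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * prod(likelihood_ratios)
    return posttest_odds / (1 + posttest_odds)


# Hypothetical inputs: a 20% pretest probability and three per-variant
# likelihood ratios (values above 1 raise risk, values below 1 lower it).
example_posttest = posttest_probability(0.20, [1.5, 0.8, 2.0])
```

With these inputs the pretest odds of 0.25 are multiplied by a combined likelihood ratio of 2.4, giving posttest odds of 0.6 and a posttest probability of 0.375.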
`In addition to coding region variants, we also analyzed genomic
`variants that may affect regulatory elements (transcription
`factors [TF]), which had not been attempted previously (Data
`S1). A total of 14,922 (of 234,980) SNVs lie in the motifs of 36
`TFs known to be associated with the binding data (see Experimental
`Procedures), suggesting that they likely have a
`direct effect on TF binding. Comparison of SNPs that alter
`binding patterns of NFκB and Pol II sites (Kasowski et al.,
`2010) also revealed a number of other interesting regulatory
`variants, some of which are associated with human disease
`(e.g., EDIL) (Sun et al., 2010) (Figure S1B).
`
`Medical Phenotypes Monitoring
`Based on the above analysis of medically relevant variants and
`the RiskOGram, we monitored markers associated with high-
`risk disease phenotypes and performed additional medically
`relevant assays.
`Monitoring of glucose levels and HbA1c revealed the onset of
`T2D as diagnosed by the subject’s physician (day 369, Figures
`2A and 2C). The subject lacked many known factors associated
`with diabetes (nonsmoker; BMI = 23.9 and 21.7 on day 0 and day
`511, respectively) and glucose levels were normal for the first
`part of the study. However, glucose levels rose shortly after
`the RSV infection (after day 301) and remained elevated for several months
`(Figure 2D). High glucose levels were further confirmed by
`HbA1c measurements at two time points (days 329 and
`369) during this period (6.4% and 6.7%, respectively). After
`a dramatic change in diet and exercise and the ingestion of low doses
`of acetylsalicylic acid, a gradual decrease in glucose (to
`93 mg/dl at day 602) and in HbA1c (to 4.7%) was observed.
`Insulin resistance was not evident at day 322. The patient was
`negative for anti-GAD and anti-islet antibodies, and insulin levels
`correlated well with the fasted and nonfasted states (Figure S2C),
`consistent with T2D. These results indicate that a genome
`sequence can be used to estimate disease risk in a healthy indi-
`vidual, and by monitoring traits associated with that disease,
`disease markers can be detected and the phenotype treated.
`The subject carried a TERT mutation previously associated
`with aplastic anemia (Yamaguchi et al., 2005). However,
`measurements of telomere length suggested little or no decrease in
`telomere length and a modest increase in the number of cells with
`short telomeres relative to age-matched controls (Figures S2A
`and S2B). Importantly, the patient and his 83-year-old mother
`share the same mutation but neither exhibits symptoms of aplastic
`anemia, indicating that this mutation does not always result in
`disease and is likely context specific in its effects.
`Consistent with the elevated hypertriglyceridemia risk, triglyc-
`erides were found to be high (321 mg/dl) at the beginning of the
`study. These levels were reduced (81–116 mg/dl) after regularly
`taking simvastatin (20 mg/day).
`We also examined the variants for their potential effects on
`drug response (see Extended Experimental Procedures). Among
`the alleles of interest (Figure 2A and Table S2B), two genotypes
`affecting the LPIN1 and SLC22A1 genes were associated with
`
`
`Table 2. Summary of Disease-Related Rare Variants
`
`Category                           | Count
`Total high confidence rare SNVs    | 289,989
`Coding                             | 2,546
`Missense                           | 1,320
`Synonymous                         | 1,214
`Nonsense                           | 11
`Nonstop                            | 1
`Damaging or possibly damaging      | 233
`Putative loss-of-function SNVs^a   | 51
`Total high confidence rare indels  | 51,248
`Coding indels                      | 61
`Frameshift indels                  | 27
`miRNA indels                       | 3
`miRNA target sequence indels       | 5
`Putative loss-of-function indels^a | 4
`^a In curated Mendelian disease genes.
`
`favorable (glucose lowering) responses to two diabetes drugs,
`rosiglitazone and metformin, respectively.
`We followed the levels of 51 cytokines along with the C-reactive
`protein (CRP) using ELISA assays, which revealed strong induc-
`tion of proinflammatory cytokines and CRP during each infection
`(Figures 2E and 2F). We also observed a spike of many cytokines
`at day 12 after the RSV infection (day 301 overall). These data
`define the physiological states and serve as a valuable reference
`for the omic profiles integrated into a longitudinal map of healthy
`and diseased states described in the next sections.
`We also profiled autoantibodies during the HRV infection.
`Plasma and serum samples from the first four time points
`(days −123, 0, 4 and 21), along with plasma samples from 34
`healthy controls were used to probe a protein microarray con-
`taining 9,483 unique human proteins spotted in duplicate. A total
`of 884 antigens with increased reactivity (Data S2) in the candi-
`date plasma relative to healthy controls were found (p < 0.01,
`Benjamini-Hochberg p < 0.01). Among the potentially interesting
`results was high reactivity with DOK6, an insulin receptor binding
`protein (NCBI gene database). These results demonstrate that
`autoantibodies can be monitored and that information relevant
`to disease conditions can be found.
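The Benjamini-Hochberg adjustment applied to the antigen reactivity p values can be sketched as below. This is a generic implementation of the published step-up procedure, not code from the study, and the p values in the example are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order.

    Each p value is scaled by m / rank (rank 1 = smallest), and a running
    minimum taken from the largest rank downward keeps the result monotone.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw p values for four antigens.
example_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])

# Antigens passing an adjusted threshold of 0.03 (an arbitrary example cutoff).
passing = [i for i, q in enumerate(example_adjusted) if q < 0.03]
```

Here the adjusted values are approximately 0.02, 0.04, 0.04, and 0.02, so only the first and last antigens pass the example cutoff.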
`
`Dynamic Omics Analysis: Integrative Omics Profiling
`of Molecular Responses
`We profiled the levels of transcripts, proteins, and metabolites
`across the HRV and RSV infections and healthy states using
`a variety of approaches. RNA-Seq of 20 time points generated
`over 2.67 billion uniquely mapped 101b paired-end reads
`(123 million reads average per time point) and allowed for an
`analysis of the molecular complexity of the transcriptome in
`normal cells (PBMCs) at an unprecedented level. The relative
`levels of 6,280 proteins were also measured at 14 time points
`through differential labeling of samples using isobaric tandem
`mass tags (TMT), followed by liquid chromatography and mass
`spectrometry (LC-MS/MS) (Cox and Mann, 2010; Theodoridis
`et al., 2011). A total of 3,731 PBMC proteins could be consistently
`monitored across most of the 14 time points (see Figure S3A
`and Data S3). In addition, 6,862 and 4,228 metabolite
`peaks were identified for the HRV and RSV infections, respectively, and a total
`of 1,020 metabolites were tracked for both infections (see Figure S4
`and Data S4, [3]). Finally, as described below, we also
`analyzed miRNAs during the HRV infection.
`This wealth of omics information allowed us to examine
`detailed dynamic trends related directly to the physiological
`states of the individual and revealed enormous changes in
`biological processes that occurred during healthy and diseased
`states. For each profile (transcriptome, proteome, metabolome),
`we systematically searched for
`two types of nonrandom
`patterns: (1) correlated patterns over time and (2) single unusual
`events (i.e., spikes that may occur at any given time point defined
`as statistically significantly high or low signal instances com-
`pared to what would be expected by chance). To perform this
`analysis, we developed a general scheme for integrated analysis
`of data (see Figure S5 and Extended Experimental Procedures
`for further details). We used a Fourier spectral analysis approach
`that both normalizes the various omics data on an equal basis for
`identifying common trends and features and accounts
`for data set variability, uneven sampling, and data gaps, in order
`to detect real-time changes in any kind of omics activity at
`the differential time points (see Supplemental Information).
`Autocorrelations were calculated to assess nonrandomness
`of the time-series (p < 0.05 one-tailed based on simulated
`bootstrap nonparametric distribution by sampling with replace-
`ment of the original data, n > 100,000), with significant signals
`classified as autocorrelated (I). The remaining data were searched
`for spike events, which were classified as spike maxima (II)
`or spike minima (III) (p < 0.05 one-tailed based on differences
`from simulated, n > 100,000 random distribution of the time-
`series). After classification, the data were agglomerated into hier-
`archical clusters (using correlation distance and average linkage)
`of common patterns and biological relevance was assessed
`through GO (Ashburner et al., 2000) analysis (Cytoscape [Smoot
`et al., 2011], BiNGO [Maere et al., 2005] p < 0.05, Benjamini-
`Hochberg [Benjamini and Hochberg, 1995] adjusted p < 0.05)
`and pathway analysis (Reactome [Croft et al., 2011] functional
`interaction [FI], networks including KEGG [Kanehisa and Goto,
`2000; Smoot et al., 2011], p < 0.05, FDR < 0.05). The unified
`framework approach was implemented on all the different data
`sets both individually and in combination, and our results
`revealed a number of differential changes that occurred both
`during infectious states and the varying glucose states.
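The test for nonrandomness of a time series described above can be illustrated with a minimal sketch. It assumes a lag-1 autocorrelation on evenly indexed samples and a one-tailed comparison against resamples drawn with replacement; the study worked from spectrally processed, unevenly sampled data, so this shows the principle only. The function names and example series are hypothetical.

```python
import random


def lag1_autocorrelation(series):
    """Lag-1 autocorrelation of a sequence (0.0 if the sequence is constant)."""
    n = len(series)
    mean = sum(series) / n
    denominator = sum((x - mean) ** 2 for x in series)
    if denominator == 0:
        return 0.0
    numerator = sum(
        (series[i] - mean) * (series[i + 1] - mean) for i in range(n - 1)
    )
    return numerator / denominator


def bootstrap_p_value(series, n_resamples=100_000, seed=0):
    """One-tailed p value for the observed autocorrelation.

    Resampling with replacement destroys temporal order, so the resampled
    autocorrelations approximate the distribution expected by chance.
    """
    rng = random.Random(seed)
    observed = lag1_autocorrelation(series)
    hits = 0
    for _ in range(n_resamples):
        resample = [rng.choice(series) for _ in series]
        if lag1_autocorrelation(resample) >= observed:
            hits += 1
    return (hits + 1) / (n_resamples + 1)


# Hypothetical 20-point series with a steady trend; a smaller resample count
# than the n > 100,000 quoted in the text keeps the example fast.
trend_series = list(range(20))
trend_p = bootstrap_p_value(trend_series, n_resamples=2000)
is_autocorrelated = trend_p < 0.05
```

Series that fail this test would then be examined for isolated spikes, following the classification order given in the text.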
`We first analyzed the different individual transcriptome, pro-
`teome (serum and PBMC) and metabolome data sets; the
`proteome and metabolome results are presented in the Supple-
`mental Information (Figures S3, S4, S6 and Data S3–S6). A total
`of 19,714 distinct transcript isoforms (Wang et al., 2008) corre-
`sponding to 12,659 genes (Figure S1C) were tracked for the
`entire time course, and their dynamic expression response
`was classified into either the autocorrelated set (I) or spike sets, the latter further
`subdivided as displaying maxima (II) or minima (III) (Figure 3). The
`clustering and enrichment analysis displayed a number of
`interesting pathways in each class. In the autocorrelated group
`(Figure 3B, [I]; see also Figure S6A and Data S6, [1 and 2]), we
`
`Figure 2. Medical Findings
`(A) High interest disease- and drug-related variants in the subject’s genome.
`(B) RiskGraph of the top 20 diseases with the highest posttest probabilities. For each disease, the arrow represents the pretest probability according to the
`subject’s age, gender, and ethnicity. The line represents the posttest probability after incorporating the subject’s genome sequence. Listed to the right are the
`numbers of independent disease-associated SNVs used to calculate the subject’s posttest probability.
`(C) RiskOGram of type 2 diabetes. The RiskOGram illustrates how the subject’s posttest probability of T2D was calculated using 28 independent SNVs. The
`middle graph displays the posttest probability. The left side shows the associated genes, SNVs, and the subject’s genotypes. The right side shows the likelihood
`ratio (LR), number of studies, cohort sizes, and the posttest probability.
`(D) Blood glucose trend. Measurements were taken from samples analyzed at either nonfasted or fasted states; the nonfasted states (all but days 186,
`322, 329, and 369 and after day 400) were at a fixed time after a constant meal. Data are presented as a moving average with a window of 15 days. Red
`
`found two main trends: an upward trend (2,023 genes) following
`the onset of the RSV infection, and a coincident
`downward trend (2,207 genes). The upward autocorrelated
`trend revealed a number of pathways as enriched (p < 0.002,
`FDR < 0.05), including protein metabolism and influenza life
`cycle. The downward autocorrelation cluster
`showed a multitude of enriched pathways (p < 0.008, FDR <
`0.05), such as TCR signaling in naive CD4+ T cells, lysosome,
`B cell signaling, androgen regulation, and, of particular interest,
`insulin signaling/response pathways. These different pathways,
`which are activated as part of the response to infection, often
`share common genes; in addition, we observed many genes
`not previously known to be involved in these pathways that
`display the same trend. Furthermore, we observed that the downward
`trend, which began with the onset of the RSV infection and
`appeared to accelerate after day 307, coincided with the beginning
`of the observed elevated glucose levels in the subject.
`In the dynamic spike class we again saw patterns that were
`concordant with phenotypes (Figure 3B, [II] and [III]; see also
`Figure S6A and Data S6, [3–14]). A set of expression spikes
`displaying maxima (547 genes) that are common to the onset
`of both the RSV and HRV infections are associated with
`phagosome, immune processes, and phagocytosis (p < 1 × 10^-4,
`FDR < 6 × 10^-3). Furthermore, a cluster that exhibits an elevated
`spike at the onset of the RSV infection involves the major
`histocompatibility genes (p < 7 × 10^-4, Benjamini-Hochberg adjusted
`p < 0.03). A large number of genes with a coexpression pattern
`common to both infections in the time course have yet to be
`implicated in known pathways and provide possible connections
`related to immune response. Finally, our spike class displaying
`minima showed a distinct cluster (1,535 genes) singular to day
`307 (day 14 of the RSV infection), associated with TCR signaling
`again, TGF receptors, and T cell and insulin signaling pathways
`(p < 0.02, FDR < 0.03). Overall, the transcriptome analysis
`captures the dynamic response of the body to infection,
`as also evidenced by our cytokine measurements, and
`can also monitor health changes over long periods of time, with
`various trends.
`To further leverage the transcriptome and genome data, we
`performed an integrated analysis of transcriptomic, proteomic,
`and metabolomic data for each time point, observing how this
`corresponded to the varying physiological states monitored as
`described in the above sections. Because of the availability of
`many time points through the course of infection, we examined
`in detail the onset of the RSV infection, as well as extended our
`complete dynamics omics profile during the times that our
`subject began exhibiting high glucose levels. Figure 4 shows
`an integrated interpretation of omics data (see also Figure S6B
`and Data S7), where all trends are combined for each omics
`data set and the common patterns emerge, providing complementary
`information. In addition to the common patterns
`observed in our transcriptome analysis, new patterns emerged,
`some unique to protein data, some to metabolite, and some
`common to all. In particular we found the following interesting
`results: for autocorrelated clusters we found the same trends
`as observed in the transcriptome, additionally augmented
`with concordant protein expressions. Pathways such as the
`phagosome, lysosome, protein processing in endoplasmic retic-
`ulum, and insulin pathways emerged as significantly enriched
`(p < 0.002, FDR < 0.0075), and showed a downward trend postinfection
`that accelerated further 3 weeks after the
`initial onset of the RSV infection (this cluster comprised
`1,452 transcriptomic and 69 proteomic components,
`corresponding to 1,444 genes). The elevated spike class showed a
`maxima cluster on day 18 post RSV infection (one time point
`after the cytokine maximum), with enrichment in pathways
`such as the spliceosome, glucose regulation of insulin secretion,
`and various pathways related to a stress response (p < 1 × 10^-4,
`FDR < 0.02); this cluster included 1,956 transcriptomic, 571
`proteomic and 23 metabolomic components, corresponding to
`2,344 genes. Even though current proteomic information is
`more limited than the full transcriptome because it follows fewer
`components, as evidenced in Figure 4 (II), several pathways,
`including the glucose regulation of insulin secretion pathway,
`clearly emerge from the proteomic information and would not
`have been observed by only monitoring the transcriptome.
`Additionally, in this cluster we find significant GO enrichment in
`splicing and metabolic processes (p < 6 × 10^-47, Benjamini-Hochberg
`adjusted p < 10^-45). Furthermore, inspection of
`metabolites reveals 23 that show exactly the same trend (i.e.,
`spikes at day 18 post RSV infection); at least one, lauric acid,
`has been implicated in fatty acid metabolism and insulin regulatory
`pathways (Kusunoki et al., 2007). Finally, we observe minima
`spikes as well, with yet another interesting group on day 18,
`which showed downregulation in several pathways (p < 0.003,
`FDR < 0.05), such as the formation of platelet plug. This cluster
`displayed a high degree of synergy between the various omics
`data, comprising 3,237 transcriptomic and 761 proteomic
`components corresponding to 3,400 genes, and 83 metabolomic
`components.
`In summary, our integrated approach revealed a clear
`systemic response to the RSV infection following its onset and
`postinfection response, including a pronounced response
`evident at day 18 post RSV infection. A variety of infection/stress
`response related pathways were affected along with those
`associated with the high glucose levels in the later time points,
`including insulin response pathways.
`
`Dynamic Omics Analysis: Extensive Heteroallelic
`Variation and RNA Editing
`The considerable amount of transcriptome and proteome data
`allowed us to analyze and follow changes in allele-specific
`
`and green arrows and bars indicate the times of the HRV and RSV infections, respectively. Black arrows and bars indicate the period with lifestyle
`changes.
`(E) C-reactive protein trend line. Error bars represent standard deviation of three assays.
`(F) Serum cytokine profiles. Red box and day number, HRV infection; green box and day number, RSV infection; question mark, elevated cytokine levels indi-
`cating an unknown event at day 301. Red is increased cytokine levels.
`See also Figure S2.
`
`Figure 3. Transcriptome Time Course Analysis
`(A) Summary of approach for identification of differentially expressed components. The various omics sets were processed through a common framework
`involving spectral analysis, clustering, and pathway enrichment analysis.
`(B) Pattern classification. The different emergent patterns from the analysis of the transcriptome