Phased Whole-Genome Genetic Risk in a Family Quartet
`Using a Major Allele Reference Sequence
`
`Frederick E. Dewey1, Rong Chen2, Sergio P. Cordero3, Kelly E. Ormond4,5, Colleen Caleshu1, Konrad J.
`Karczewski3,4, Michelle Whirl-Carrillo4, Matthew T. Wheeler1, Joel T. Dudley2,3, Jake K. Byrnes4, Omar E.
`Cornejo4, Joshua W. Knowles1, Mark Woon4, Katrin Sangkuhl4, Li Gong4, Caroline F. Thorn4, Joan M.
`Hebert4, Emidio Capriotti4, Sean P. David4, Aleksandra Pavlovic1, Anne West6, Joseph V. Thakuria7,
`Madeleine P. Ball8, Alexander W. Zaranek8, Heidi L. Rehm9, George M. Church8, John S. West10, Carlos D.
`Bustamante4, Michael Snyder4, Russ B. Altman4,11, Teri E. Klein4, Atul J. Butte2, Euan A. Ashley1*
`
`1 Center for Inherited Cardiovascular Disease, Division of Cardiovascular Medicine, Stanford University, Stanford, California, United States of America, 2 Division of Systems
`Medicine, Department of Pediatrics, Stanford University School of Medicine, Stanford, California, United States of America, 3 Biomedical Informatics Graduate Training
`Program, Stanford University School of Medicine, Stanford, California, United States of America, 4 Department of Genetics, Stanford University School of Medicine,
`Stanford, California, United States of America, 5 Center for Biomedical Ethics, Stanford University, Stanford, California, United States of America, 6 Wellesley College,
`Wellesley, Massachusetts, United States of America, 7 Division of Genetics, Massachusetts General Hospital, Boston, Massachusetts, United States of America,
`8 Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America, 9 Department of Pathology, Harvard Medical School, Boston,
`Massachusetts, United States of America, 10 Personalis, Palo Alto, California, United States of America, 11 Department of Bioengineering, Stanford University, Stanford,
`California, United States of America
`
`Abstract
`
`Whole-genome sequencing harbors unprecedented potential for characterization of individual and family genetic variation.
`Here, we develop a novel synthetic human reference sequence that is ethnically concordant and use it for the analysis of
`genomes from a nuclear family with history of familial thrombophilia. We demonstrate that the use of the major allele
`reference sequence results in improved genotype accuracy for disease-associated variant loci. We infer recombination sites
to the lowest median resolution demonstrated to date (<1,000 base pairs). We use family inheritance state analysis to
`control sequencing error and inform family-wide haplotype phasing, allowing quantification of genome-wide compound
`heterozygosity. We develop a sequence-based methodology for Human Leukocyte Antigen typing that contributes to
`disease risk prediction. Finally, we advance methods for analysis of disease and pharmacogenomic risk across the coding
`and non-coding genome that incorporate phased variant data. We show these methods are capable of identifying
`multigenic risk for inherited thrombophilia and informing the appropriate pharmacological therapy. These ethnicity-specific,
`family-based approaches to interpretation of genetic variation are emblematic of the next generation of genetic risk
`assessment using whole-genome sequencing.
`
`Citation: Dewey FE, Chen R, Cordero SP, Ormond KE, Caleshu C, et al. (2011) Phased Whole-Genome Genetic Risk in a Family Quartet Using a Major Allele
`Reference Sequence. PLoS Genet 7(9): e1002280. doi:10.1371/journal.pgen.1002280
`
`Editor: Gregory P. Copenhaver, The University of North Carolina at Chapel Hill, United States of America
`
`Received April 21, 2011; Accepted July 26, 2011; Published September 15, 2011
Copyright: © 2011 Dewey et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits
`unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
`
`Funding: FED was supported by NIH/NHLBI training grant T32 HL094274-01A2 and the Stanford University School of Medicine Dean’s Postdoctoral Fellowship.
`MTW was supported by NIH National Research Service Award fellowship F32 HL097462. JKB, OEC, and CDB were supported by NHGRI grant U01HG005715. CFT,
`JMH, KS, LG, MW-C, MW, and RBA were supported by grants from the NIH/NIGMS U01 GM61374. KEO was supported by NIH/NHGRI 5 P50 HG003389-05. AJB was
`supported by the Lucile Packard Foundation for Children’s Health, Hewlett Packard Foundation, and NIH/NIGMS R01 GM079719. JTD and KJK were supported by
`NIH/NLM T15 LM007033. EAA was supported by NIH/NHLBI KO8 HL083914, NIH New Investigator DP2 Award OD004613, and a grant from the Breetwor Family
`Foundation. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
`
`Competing Interests: JVT and AWZ are founders, consultants, and equity holders in Clinical Future; GMC has advisory roles in and research sponsorships from
`several companies involved in genome sequencing technology and personal genomics (see http://arep.med.harvard.edu/gmc/tech.html); MS is on the scientific
`advisory board of DNA Nexus and holds stock in Personalis; RBA has received consultancy fees from Novartis and 23andMe and holds stock in Personalis; AJB is a
`scientific advisory board member and founder for NuMedii and Genstruct, a scientific advisory board member for Johnson and Johnson, has received consultancy
`fees from Lilly, NuMedii, Johnson and Johnson, Genstruct, Tercica, and Prevendia and honoraria from Lilly and Siemens, and holds stock in NuMedii, Genstruct,
`and Personalis. EAA holds stock in Personalis.
`
`* E-mail: euan@stanford.edu
`
`Introduction
`
`Whole genome sequencing of related individuals provides a
`window into human recombination as well as superior error
`control and the ability to phase genomes assembled from high
`throughput short read sequencing technologies. The interro-
`gation of the entire euchromatic genome, as opposed to the
`coding exome, provides superior sensitivity to recombination
`
`events, allows for full interrogation of regulatory regions, and
`comprehensive exploration of disease associated variant loci, of
`which nearly 90% fall into non-protein-coding regions [1]. The
recent publication of low-coverage sequencing data from large
numbers of unrelated individuals offers a broad catalog of
genetic variation in three major population groups that is
complementary to deep sequencing of related individuals [2].
`Recently, investigators used a family-sequencing approach to
`
`PLoS Genetics | www.plosgenetics.org
`
`1
`
`September 2011 | Volume 7 |
`
`Issue 9 | e1002280
`
`Personalis EX2122
`
`

`

`Author Summary
`
`An individual’s genetic profile plays an important role in
`determining risk for disease and response to medical
`therapy. The development of technologies that facilitate
`rapid whole-genome sequencing will provide unprece-
`dented power in the estimation of disease risk. Here we
`develop methods to characterize genetic determinants of
`disease risk and response to medical therapy in a nuclear
`family of four, leveraging population genetic profiles from
`recent large scale sequencing projects. We identify the
`way in which genetic information flows through the family
`to identify sequencing errors and inheritance patterns of
`genes contributing to disease risk. In doing so we identify
`genetic risk factors associated with an inherited predispo-
`sition to blood clot formation and response to blood
`thinning medications. We find that this aligns precisely
`with the most significant disease to occur to date in the
`family, namely pulmonary embolism, a blood clot in the
`lung. These ethnicity-specific, family-based approaches to
`interpretation of individual genetic profiles are emblematic
`of the next generation of genetic risk assessment using
`whole-genome sequencing.
`
`fine map recombination sites, and combined broad population
`genetic variation data with phased family variant data to
`identify putative compound heterozygous loci associated with
`the autosomal recessive Miller syndrome [3]. We previously
`developed and applied a methodology for interpretation of
`genetic and environmental risk in a single subject using a
`combination of traditional clinical assessment, whole genome
`sequencing, and integration of genetic and environmental risk
`factors [4]. The combination of these methods and resources
`and their application to phased genetic variant data from family
`based sequencing has the potential to provide unique insight
`into topology of genetic variation, haplotype information, and
`genetic risk.
`One of the challenges to interpretation of massively parallel
`whole genome sequence data is the assembly and variant calling of
`sequence reads against the human reference genome. Although de
`novo assembly of genome sequences from raw sequence reads
`represents an alternative approach, computational limitations and
`the large amount of mapping information encoded in relatively
`invariant genomic regions make this an unattractive option
`presently. The National Center for Biotechnology Information
`(NCBI) human reference genome in current use [5] is derived
`from DNA samples from a small number of anonymous donors
`and therefore represents a small sampling of the broad array of
`human genetic variation. Additionally, this reference sequence
`contains both common and rare disease risk variants, including
`rare susceptibility variants for acute lymphoblastic leukemia and
`the Factor V Leiden allele associated with hereditary thrombo-
`philia [6]. Thus, the use of the haploid NCBI reference for variant
`identification using high throughput sequencing may complicate
`detection of rare disease risk alleles. Furthermore, the detection of
`alternate alleles in high-throughput sequence data may be affected
`by preferential mapping of short reads containing the reference
`base over those containing an alternate base [7]. The effects of
`such biases on genotype accuracy at common variant loci remain
`unclear.
Here we report the development of a novel, ethnically
`concordant major allele reference sequence and the evaluation
`of its use in variant detection and genotyping at disease risk loci.
`Using this major allele reference sequence, we provide an
`
`Genetic Risk in a Family Quartet
`
`assessment of haplotype structure and phased genetic risk in a
`family quartet with familial thrombophilia.
`
`Results
`
`Study subjects and genome sequence generation
`Clinical characteristics of the study subjects and the heuristic
`for the genome sequence generation and analysis are described
`in Figure 1. Two first-degree family members, including the
`father in the sequenced quartet, have a history of venous
`thrombosis; notably,
`the sequenced father has a history of
`recurrent venous thromboembolism despite systemic anticoag-
`ulation. Both parents self-reported northern European ancestry.
`We used the Illumina GAII sequencing platform to sequence
`genomic DNA from peripheral blood monocytes from four
`individuals in this nuclear family, providing 39.3x average
`coverage of 92% of known chromosomal positions in all four
`family members (Figure S1).
`
`Development of ethnicity-specific major allele references
We developed three ethnicity-specific major allele references
for European (European ancestry in Utah (CEU)), African
(Yoruba from Ibadan, Nigeria (YRI)), and East Asian (Han
Chinese from Beijing and Japanese from Tokyo (CHB/JPT))
HapMap population groups using estimated allele frequency data
at 7,917,426, 10,903,690, and 6,253,467 positions cataloged in
the 1000 Genomes Project. Though relatively insensitive for very
rare genetic variation, the low coverage pilot sequencing data
(which comprise the majority of population-specific variation
data) have a sensitivity for an alternative allele of >99% at allele
frequencies >10% and thus have high sensitivity for detecting the
major allele [2]. Substitution of the ethnicity-specific major allele
`for the reference base resulted in single base substitutions at
`1,543,755, 1,658,360, and 1,676,213 positions in the CEU, YRI,
`and CHB/JPT populations, respectively (Figure 2A). There were
796,548 positions common to all three HapMap population
groups at which the major allele differed from the NCBI
reference base. Variation from the NCBI reference genomes was
relatively uniform across chromosomal locations with the
exception of loci in and near the Human Leukocyte Antigen
(HLA) loci on chromosome 6p21 (Figure 2C). Of variant
`positions associated with disease in our manually curated
`database of 16,400 genotype-disease phenotype associations,
`4,339, 4,451, and 4,769 are represented in the NCBI reference
sequence by the minor allele in the CEU, YRI, and CHB/JPT
populations, respectively (Figure 2B). There are 1,971 disease-associated
variant positions represented on the NCBI reference
sequence by the minor allele in all three population groups
(Figure 2B). Of these manually-curated disease-associated variants,
23 are represented on the NCBI reference sequence by
minor alleles with frequencies of less than 5% in all three
`population groups, and 18 are represented by minor alleles with
`frequencies of less than 1% in at least one population group
`(Table S1).
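The substitution step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function name `build_major_allele_reference`, the 0-based position keys, and the rule of keeping the reference base when frequencies tie are all hypothetical choices made for the example.

```python
def build_major_allele_reference(reference, allele_freqs):
    """Substitute the population major allele into a reference contig.

    reference    -- string of reference bases for one contig
    allele_freqs -- dict mapping 0-based position to {base: frequency}
                    for one population (hypothetical input layout)
    Returns the new sequence and the number of single base substitutions.
    """
    bases = list(reference)
    substitutions = 0
    for pos, freqs in allele_freqs.items():
        ref_base = bases[pos]
        # Most frequent allele; on a tie, prefer the existing reference base.
        major = max(freqs, key=lambda b: (freqs[b], b == ref_base))
        if major != ref_base:
            bases[pos] = major
            substitutions += 1
    return "".join(bases), substitutions


example_freqs = {
    1: {"C": 0.2, "T": 0.8},  # reference C is the minor allele: substituted
    3: {"T": 0.6, "G": 0.4},  # reference T is already the major allele
    4: {"A": 0.5, "G": 0.5},  # tie: reference base is kept
}
new_reference, n_substituted = build_major_allele_reference("ACGTA", example_freqs)
```

Running the same function once per population on population-specific frequencies would give one reference per population, which is the sense in which the references are ethnicity-specific.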
`To test the alignment performance of the major allele reference
`sequences, we performed alignments of one lane of sequence data
`to chromosome 6, which demonstrated the greatest population-
`specific divergence between the major allele in each HapMap
population and the NCBI reference, and chromosome 22 in the
NCBI and CEU major allele references (Table S2). These
analyses demonstrated that ~0.01% more reads mapped
uniquely to the major allele reference sequence than to the
NCBI reference sequence. We identified sequence variants in the
family quartet by comparison with the HG19 reference as well as
`
`
`Figure 1. Pedigree and genetic risk prediction workflow. A, Family pedigree with known medical history. The displayed ages represent the
`age of death for deceased subjects or the age at the time of medical history collection (9/2010) for living family members. Arrows denote sequenced
`family members. Abbreviations: AD, Alzheimer’s disease; CABG, coronary artery bypass graft surgery; CHF, congestive heart failure; CVA,
`cerebrovascular accident; DM, diabetes mellitus; DVT, deep venous thrombosis; GERD, gastroesophageal reflux disease; HTN, hypertension; IDDM,
`insulin-dependent diabetes mellitus; MI, myocardial infarction; SAB, spontaneous abortion; SCD, sudden cardiac death. B, Workflow for phased
`genetic risk evaluation using whole genome sequencing.
`doi:10.1371/journal.pgen.1002280.g001
`
`the CEU major allele reference we developed, resulting in single
`nucleotide substitutions at an average distance of 699 base pairs
`when compared with the NCBI reference and 809 base pairs
`
`when compared with the CEU major allele reference. We also
`identified 859,870 indels at an average inter-marker distance of
`3.6 kbp.
`
`
`Figure 2. Development of major allele reference sequences. Allele frequencies from the low coverage whole genome sequencing pilot of the
`1000 genomes data were used to estimate the major allele for each of the three main HapMap populations. The major allele was substituted for the
`NCBI reference sequence 37.1 reference base at every position at which the reference base differed from the major allele, resulting in approximately
`1.6 million single nucleotide substitutions in the reference sequence. A, Approximately half of these positions were shared between all three HapMap
`population groups, with the YRI population containing the greatest number of major alleles differing from the NCBI reference sequence. B, Number
`of disease-associated variants represented in the NCBI reference genome by the minor allele in each of the three HapMap populations. C, Number of
`positions per Mbp at which the major allele differed from the reference base by chromosome and HapMap population.
`doi:10.1371/journal.pgen.1002280.g002
`

`A major allele reference sequence reduces genotyping
`error at variant loci associated with disease traits
`Specific to the family quartet, of 16,400 manually-curated
`single nucleotide polymorphisms associated with disease traits,
`10,396 were variant in the family when called against the NCBI
`reference genome, and 9,389 were variant in the family when
`called against the major allele reference genome. The genotyping
`error rate for these disease-associated variants, estimated by the
`Mendelian inheritance error (MIE) rate per variant, was 38%
`higher in the variants called by comparison with the NCBI
`reference genome (5.8 per 10,000 variants) than in variants called
`by comparison with the major allele reference genome (4.2 per
10,000 variants). There were 233 genotype calls at 130 disease-associated
variant positions that differed across the quartet
between the NCBI reference genome and the major allele
reference genome (summary for each genome is provided in
Table S3). Among these variants, 161/188 genotypes (85.6%) in
the major allele call set were concordant with genotypes from
orthogonal genotyping technology, whereas only 68/188 (36.2%)
in the NCBI call set were concordant with independent
genotyping.
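To make the error metric concrete, the sketch below shows one way a Mendelian inheritance error could be detected for a parent-parent-child trio, followed by the arithmetic behind the percentages quoted in this section. The function names and the tuple representation of genotypes are assumptions for illustration; the study's own implementation is not described here.

```python
from itertools import product


def is_mendelian_consistent(father, mother, child):
    """True if the child's unphased genotype can be formed from one
    paternal and one maternal allele. Genotypes are 2-tuples of alleles."""
    return any(sorted((p, m)) == sorted(child) for p, m in product(father, mother))


def mie_rate_per_10k(trios):
    """Mendelian inheritance errors per 10,000 (father, mother, child) sites."""
    errors = sum(not is_mendelian_consistent(f, m, c) for f, m, c in trios)
    return 1e4 * errors / len(trios)


# Arithmetic check of the figures reported in the text.
ncbi_rate, major_rate = 5.8, 4.2                             # MIEs per 10,000 variants
relative_increase = (ncbi_rate - major_rate) / major_rate    # about 0.38, i.e. 38% higher
concordance_major = 161 / 188                                # about 0.856
concordance_ncbi = 68 / 188                                  # about 0.362
```

In a quartet the same check would be applied to each child in turn against the two parents.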
`
Inheritance state analysis identifies >90% of sequencing
errors
`Sequencing family quartets allows for precise identification of
`meiotic crossover sites from boundaries between inheritance
`states and superior error control [3]. We resolved contiguous
`blocks of single nucleotide variants into one of four Mendelian
`inheritance states or two error states. Using this methodology,
we identified 3.1% of variant positions as associated with error
prone regions (Figure 3A). Using a combination of these
methods and quality score calibration with orthogonal genotyping
technology, we identified 94% of genotyping errors, with
the greatest reduction in error rate resulting from filtering of
variants in error prone regions (Figure 3B). We estimated a
final genotype error rate by three methods of between
5.26×10⁻⁷, estimated by the state consistency error rate in
identical-by-descent regions, and 2.1×10⁻⁶, estimated by the
MIE rate per bp sequenced.
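The four Mendelian inheritance states in a quartet correspond to whether the two children share the same paternal haplotype and the same maternal haplotype. The sketch below enumerates, for a single variant position, which of those states are compatible with the observed genotypes. It covers only this per-site compatibility check; the Hidden Markov Model that links sites into contiguous blocks (Figure 3A) is not reproduced, and the state names and data layout are assumptions for illustration.

```python
from itertools import product

# (children share paternal haplotype?, children share maternal haplotype?)
STATES = {
    "identical": (True, True),
    "haploidentical_paternal": (True, False),
    "haploidentical_maternal": (False, True),
    "nonidentical": (False, False),
}


def consistent_states(father, mother, child1, child2):
    """Return the inheritance states compatible with one site's genotypes.

    Each genotype is a 2-tuple of alleles; parental tuples are treated as
    (haplotype 0 allele, haplotype 1 allele) with unknown true phase.
    """
    compatible = set()
    for name, (same_pat, same_mat) in STATES.items():
        for p1, m1 in product(range(2), repeat=2):
            p2 = p1 if same_pat else 1 - p1
            m2 = m1 if same_mat else 1 - m1
            g1 = sorted((father[p1], mother[m1]))
            g2 = sorted((father[p2], mother[m2]))
            if g1 == sorted(child1) and g2 == sorted(child2):
                compatible.add(name)
    return compatible


# Father heterozygous, mother homozygous, children differ:
# the children must have received different paternal haplotypes.
informative = consistent_states(("a", "b"), ("a", "a"), ("a", "b"), ("a", "a"))
# No state explains this site, so it would count as an inheritance error.
error_site = consistent_states(("a", "a"), ("a", "a"), ("a", "b"), ("a", "a"))
```

A single site rarely pins down one state, which is why runs of neighboring variants are needed to resolve blocks and their boundaries.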
`
`Prior population mutation rate estimates are biased
`upwards by the reference sequence
`After excluding variants in sequencing-error prone regions, we
`identified 4,302,405 positions at which at least one family member
`differed from the NCBI reference sequence and 3,733,299
`positions at which at least one family member differed from the
`CEU major allele reference sequence (Figure S2). With respect to
`the NCBI reference sequence, this corresponds to an estimated
population mutation rate (Watterson’s θ [8]) of 9.2×10⁻⁴,
matching previous estimates [3]. However, in comparison with
the CEU major allele reference, we estimated a lower population
mutation rate of 7.8×10⁻⁴, suggesting that previous estimates may
`have been biased upwards by comparison with the NCBI
`reference sequence.
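For reference, the standard per-site form of Watterson’s estimator is given below, where S is the number of segregating (variant) positions, n is the number of sampled chromosomes, and L is the number of sites surveyed. The values of n and L used in the study are not stated in this section, so they are left symbolic rather than guessed.

```latex
\hat{\theta}_W = \frac{S}{a_n \, L},
\qquad
a_n = \sum_{i=1}^{n-1} \frac{1}{i}
```

For fixed n and L the estimate is proportional to S, so counting fewer variant positions against the major allele reference lowers the estimate directly.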
`
`Male and female recombinations occur with nearly equal
`frequency in this family and approximately half occur in
`hotspots
`Boundaries between contiguous inheritance state blocks defined
`55 maternal and 51 paternal recombination events across the
`quartet at a median resolution of 963 base pairs. A parallel
`heuristic analysis of recombination sites confirmed our observation
`of nearly equal paternal and maternal recombination frequency
`
`(Figure 3C). Fine scale recombination mapping and long range
`phasing revealed that the mother has two haplotypes ([C, T] and
`[T, T]) at SNPs rs3796619 and rs1670533 that are associated with
`low recombination rates in females, while the father has one
`haplotype associated with low recombination rate in males [T, C]
`[9]. The father also has the common [C,T] haplotype which is
`associated with high recombination rates in males when compared
`with the other two observed haplotypes. We found that 25 of 51
`paternal recombination windows (49%) and 27 of 55 maternal
recombination windows (49%, Figure 3) were in hotspots (defined
by maximum recombination rate of >10 cM/Mbp), while only
~4 (4.1%) would be expected by chance alone (p = 2.0×10⁻⁷³ for
hotspot enrichment according to Monte Carlo permutation). Both
parents carry 13 zinc finger repeats in the PRDM9 gene (Entrez
Gene ID 56979) and are homozygous for the rs2914276-A allele;
both of these loci have been previously associated with
recombination hotspot usage [10–13]. We used a combination
of per-trio phasing, inheritance state of adjacent variants, and
`population linkage disequilibrium data to provide long range
`phased haplotypes (Figure 3D).
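A Monte Carlo permutation test for hotspot enrichment can be sketched as follows: recombination windows are repositioned at random and the number that overlap a hotspot is compared with the observed count. This is a simplified illustration on a single linear coordinate system; the function names, the half-open interval convention, and the uniform placement of windows are assumptions, and the study's actual procedure may differ. Note that an empirical p-value from N permutations cannot be smaller than 1/(N + 1).

```python
import random
from bisect import bisect_right


def overlaps_hotspot(start, end, hotspots, hotspot_starts):
    """True if half-open window [start, end) overlaps any hotspot.

    hotspots must be sorted, non-overlapping (start, end) intervals and
    hotspot_starts the list of their start coordinates.
    """
    i = bisect_right(hotspot_starts, end - 1) - 1  # last hotspot starting before window end
    return i >= 0 and hotspots[i][1] > start


def hotspot_enrichment(windows, hotspots, genome_length, n_perm=1000, seed=0):
    """Return (observed overlap count, empirical one-sided p-value)."""
    rng = random.Random(seed)
    hotspots = sorted(hotspots)
    starts = [s for s, _ in hotspots]
    observed = sum(overlaps_hotspot(s, e, hotspots, starts) for s, e in windows)
    lengths = [e - s for s, e in windows]
    at_least_as_extreme = 0
    for _ in range(n_perm):
        count = 0
        for length in lengths:
            s = rng.randrange(genome_length - length + 1)  # uniform random placement
            count += overlaps_hotspot(s, s + length, hotspots, starts)
        at_least_as_extreme += count >= observed
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)


# Toy data: two windows, both inside hotspots, on a 1 Mbp coordinate system.
observed, p_value = hotspot_enrichment(
    windows=[(100, 110), (500, 510)],
    hotspots=[(105, 120), (505, 520)],
    genome_length=1_000_000,
    n_perm=200,
)
```

Preserving the window lengths in each permutation matters because longer windows overlap hotspots more often by chance.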
`
`Rare variant analysis identifies multi-genic risk for familial
`thrombophilia
`It has been estimated from population sequencing data that
apparently healthy individuals harbor between 50 and 100
putative loss of function variants in genes associated with
Mendelian diseases and traits [2]. Many of these variants are
of limited import, either because they result in subtle phenotypes
or have no biological effect. Thus, prioritization of putative loss
of function variants remains a significant challenge. We used a
combination of manually-curated rare-variant disease risk
association data, an internally-developed method for scoring
risk variants according to potential clinical impact, and existing
prediction algorithms [14,15] (Figure S3 and Table S4) to
provide genetic risk predictions for phased putative loss-of-function
variants for the family quartet (Table 1). To further
characterize the potential adverse effects of non-synonymous
single nucleotide variants (nsSNVs), we implemented a multiple
sequence alignment (MSA) of 46 mammalian genomes, described
further in Text S1, that is similar to that implemented in
the Genomic Evolutionary Rate Profiling score [16,17]. For
coding variants of unknown significance, the mammalian
`evolutionary rate is proportional to the fraction of selectively
`neutral alleles [18] and can therefore serve as a prior expectation
`in determining the likelihood that an observed nsSNV is
`deleterious.
Of 354,074 rare or novel variants compared with the CEU
major allele reference sequence, we identified 200 non-synonymous
variants in coding regions, 1 nonsense variant, 1 single
nucleotide variant (SNV) in the conserved 3′ splice acceptor
dinucleotides, and 27 novel frame-shifting indels in genes
associated with Mendelian diseases or traits. Consistent with our
prior observations and a conserved regulatory role for microRNAs
(miRNAs), we found no rare or novel SNVs in mature miRNA
sequence regions or miRNA target regions in 3′ UTRs. There
`were four compound heterozygous variants in disease-related
`genes and three homozygous variants in disease-related genes
`(Table S6). Five variants across the family quartet are associated
`with Mendelian traits (Table 2). One variant in the gene F5
`(Entrez Gene ID 2153), encoding the coagulation factor V, confers
`activated protein C resistance and increased risk for thrombophilia
`[19,20]. Another variant (the thermolabile C677T variant) in the
`gene MTHFR (Entrez Gene ID 4524), encoding methylenetetra-
`hydrofolate reductase, predisposes heterozygous carriers to hyper-
`
`
`Figure 3. Inheritance state analysis, error estimation, and phasing. A, A Hidden Markov Model (HMM) was used to infer one of four
Mendelian and two non-Mendelian inheritance states for each allele assortment at variant positions across the quartet. “MIE-rich” refers to
Mendelian-inheritance error (MIE) rich regions. “Compression” refers to genotype errors from heterozygous structural variation in the reference or
study subjects, manifest as a high proportion of uniformly heterozygous positions across the quartet. B, A combination of quality score calibration
using orthogonal genotyping technology and filtering SNVs in error prone regions (MIE-rich and compression regions) identified by the HMM
resulted in >90% reduction in the genotype error rate estimated by the MIE rate. C, Consistent with PRDM9 allelic status, approximately half of all
`recombinations in each parent occurred in hotspots. The mother has two haplotypes in the gene RNF212 associated with low recombination rates,
`while the father has one haplotype each associated with high and low recombination rates. Notation denotes base at [rs3796619, rs1670533]. D,
`Variant phasing using pedigree, inheritance state, and population linkage disequilibrium data. Pedigree data were first used to phase informative
`allele assortments in trios (top). The inheritance state of neighboring regions was used to phase positions in which all members of a mother-father-
`child trio were heterozygous and the sibling was homozygous for the reference or non-reference allele (middle). For uniformly heterozygous
`positions, we phased the non-reference allele using a maximum likelihood model to assign the non-reference allele to paternal or maternal
`chromosomes based on population linkage disequilibrium with phased SNVs within 250 kbp (bottom). In all panels a corresponds to the reference
`allele and b to the non-reference allele.
`doi:10.1371/journal.pgen.1002280.g003
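The first phasing step in panel D, assigning a child's alleles to paternal and maternal chromosomes from trio genotypes alone, can be illustrated with a short sketch. The function name and return convention are assumptions for this example; positions it cannot resolve are the ones that the caption describes as being handled by inheritance state or population linkage disequilibrium.

```python
def phase_child(father, mother, child):
    """Phase a child's genotype using parental genotypes at one position.

    Genotypes are 2-tuples of alleles, with "a" the reference allele and
    "b" the non-reference allele as in the figure. Returns
    (paternal_allele, maternal_allele), or None when the trio is
    uninformative (for example all heterozygous) or inconsistent.
    """
    first, second = child
    options = set()
    for paternal, maternal in ((first, second), (second, first)):
        if paternal in father and maternal in mother:
            options.add((paternal, maternal))
    return options.pop() if len(options) == 1 else None


# Father homozygous reference: the child's "b" allele must be maternal.
resolved = phase_child(("a", "a"), ("a", "b"), ("a", "b"))
# All three heterozygous: trio data alone cannot assign the alleles.
unresolved = phase_child(("a", "b"), ("a", "b"), ("a", "b"))
```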
`
`homocysteinemia and may have a synergistic effect on risk for
`recurrent venous thromboembolism [21,22]. Follow-up serological
analysis demonstrated the father’s serum homocysteine concentration
was 11.5 μmol/L (Table S11). We were able to exclude a
`homozygous variant in F2 (Entrez Gene ID 2147), a gene known
`to be associated with hereditary thrombophilia, based on its high
`evolutionary rate in multiple sequence alignment (Table S5). It is
`likely that these variants in F5 and MTHFR contribute digenic risk
`for thrombophilia passed to the daughter but not son from the
`father. This is consistent with the father’s clinical history of two
`venous thromboemboli, the second of which occurred on systemic
anticoagulation. The daughter has a third variant inherited from
her mother, the Marburg I polymorphism, in the hyaluronan
binding protein 2 (HABP2, Entrez Gene ID 3026) gene known to
`
`be associated with inherited thrombophilia, thus contributing to
`multigenic risk for this trait [23–25]. Thus, our prediction pipeline
`recapitulated multigenic risk for the only manifest phenotype,
`recurrent thromboembolism, in the family quartet and provided a
`basis for a rational prescription for preventive care for the
`daughter.
`Association between synonymous SNVs (sSNVs) and disease has
recently been described [26]. sSNVs may affect gene product
function in several ways, including codon usage bias, mRNA
decay rates, and splice site creation and/or disruption (Figure S4).
We developed and applied an algorithm (Text S1) for predicting
loss of function effects of 186 rare and novel sSNVs in Mendelian
`disease associated genes based on change in mRNA stability, splice
`site creation and loss, and codon usage bias. We found that one
`
`
Table 1. Putative loss of function variants across the family quartet.

Variant type | All variants, HG19 reference (n = 4302405) | All variants, CEU reference (n = 3733299) | All rare/novel, HG19 reference (n = 351555) | All rare/novel, CEU reference (n = 354074) | Rare/novel and OMIM-disease associated gene, HG19 reference | Rare/novel and OMIM-disease associated gene, CEU reference
Coding-missense | 9468 | 7982 | 1276 | 1276 | 203 | 200
Coding-nonsense | 52 | 50 | 13 | 13 | 1 | 1
Coding-synonymous | 11663 | 9928 | 1061 | 1059 | 186 | 186
Intronic | 1303341 | 1128283 | 116276 | 115397 | 19544 | 19766
Splice-5′ | 156 | 147 | 16 | 16 | 0 | 0
Splice-3′ | 98 | 96 | 9 | 9 | 1 | 1
UTR-5′ | 40142 | 37794 | 3637 | 3619 | 510 | 516
UTR-3′ | 61826 | 59396 | 5989 | 5953 | 848 | 857
miRNA target | 0 | 0 | 0 | 0 | 0 | 0
Pri-miRNA | 2 | 2 | 1 | 1 | 0 | 0
Mature miRNA | 0 | 0 | 0 | 0 | 0 | 0
Coding indels | 1519 | 1476 | 432 | 412 | 73 | 71
Coding frameshift indels | 440 | 418 | 273 | 253 | 29 | 27

Abbreviations: CEU reference, variant calls against CEU major allele reference; HG19 reference, variant calls against NCBI reference sequence 37.1; miRNA, micro RNA; Pri-miRNA, primary microRNA transcript; OMIM, Online Mendelian Inheritance In Man database; UTR, un-translated region.
doi:10.1371/journal.pgen.1002280.t001
`
`sSNV in the gene ATP6V0A4 (Entrez Gene ID 50617) was
`predicted to significantly reduce mRNA stability, quantified by the
`change in free energy in comparison with the reference base at
`that position (Figure S5). Further secondary structure prediction
`demonstrated that this SNV likely disrupts a short region of self-
`complementarity that forms a stable tetraloop (Figure S5) in the
resultant mRNA. Homozygosity for loss of function (largely
protein truncating) variants in this gene is associated with distal
renal tubular acidosis, characterized by metabolic acidosis,
potassium imbalance, urinary calcium insolubility, and disturbances
in bone calcium physiology [27].
`
`Common variant risk prediction identifies risk for obesity
`and psoriasis
Results from Genome Wide Association Studies (GWAS)
provide a rich data source for assessment of common disease risk
`
`in individuals. To provide a population risk framework for genetic
`risk predictions for this family quartet, we first localized ancestral
`origins based on principal components analysis of common single
`nucleotide polymorphism (SNP) data in each parent and the
`Population Reference Sample (POPRES) dataset [28] (Figure 4A).
This analysis demonstrated North/Northeastern and Western
European ancestral origins for maternal and paternal lineages,
respectively.
`HLA groups are associated with several disease traits and are
known to modify other genotype-disease trait associations [29–31].
We used long-range phased haplotypes and an iterative search
(described in full in Text S1) for the nearest HLA tag haplotype [32]
to provide HLA types for each individual prior to
downstream risk prediction (Figure 4B and 4C). We then
calculated composite likelihood ratios (LR) for 28 common
diseases for 174 ethnically-concordant HapMap CEU individuals,
`
`Table 2. Rare variants with known clinical associations.
`
`Chromosome Gene
`
`rsid
`
`Affected
`family
`members Disease
`
`Inheritance
`
`Onset-
`earliest
`
`Onset-
`median Severity Actionability
`
`Lifetime
`risk
`
`Variant
`pathogenicity
`
`12
`
`10
`
`19
`
`1
`
`1
`
`VWF
`
`rs61750615 M, S, D
`
`Von Willebrand
`disease
`
`Incomplete
`dominant
`
`HABP2
`
`rs7080536 M, S, D
`
`SLC7A9
`
`rs79389353 M, D
`
`Carotid stenosis,
`thrombophilia
`
Cystinuria –
kidney stones
`
`AD
`
`AR
`
`F5
`
`rs6025
`
`F, D
`
`Thrombophilia
`
`Incomplete
`dominant
`
`MTHFR
`
`rs1801133
`
`F, D
`
`Hyperhomocystein-
`emia
`
`AR
`
`1
`
`4
`
`1
`
`4
`
`1
`
`1
`
`4
`
`1
`
`4
`
`1
`
`5
`
`1
`
`3
`
`4
`
`1
`
`5
`
`5
`
`5
`
`5
`
`6
`
`variable
`
`variable
`
`7
`
`2
`
`2
`
`7
`
`7
`
`7
`
`7
`
`7
`
`Key: Father, mother, son, daughter = F, M, S, D. Abbreviations: AD, autosomal dominant; AR, autosomal recessive. Variants were scored according to disease phenotype
`features and variant pathogenicity as outlined in Table S4.
`doi:10.1371/journal.pgen.1002280.t002
`
`
`Figure 4. Ancestry and immunogenotyping using phased variant data. A, Ancestry analysis of maternal and paternal origins based on
`principal components analysis of SNP genotypes intersected with the Population Reference Sample dataset. B, The HMM identified a recombination
`spanning the HLA–B locus and facilitated resolution of haplotype phase at HLA loci. Contig colors in the lower panel correspond to the inheritance
`state as depicted in Figure 3A. C, Common HLA types for family quartet based on phased sequence data.
`doi:10.1371/journal.pgen.1002280.g004
`
`and provided percentile scores for each study subject’s composite
`LR for each disease studied (Figure 5A). All four family members
`were at high risk for psoriasis, with the mother and daughter at
`highest risk (98th and 79th percentiles, respectively). We also found
`that both parents were predisposed to obesity, while both children
`had low genetic risk for obesity. Discordant risks for common
`disease between parents and at least one child were also seen for
`esophagitis and Alzheimer’s disease. Phased variant data were
`further used to provide estimates of parental contribution to
`disease risk in each child according to parental risk haploty
