`From: John West
`To: Scott Kirk
`Cc: Richard Chen; Mark Pratt
`Subject: Accuracy documents to convert into project plan format
`Date: Tuesday, May 29, 2012 12:58:39 PM
`Attachments:
`Accuracy framework JW 5Jan2012.ppt
`Accuracy JW 5Mar2012.ppt
`Accuracy Gantt JW 15Mar2012.ppt
`Accuracy slides for mtg 13Apr2012.ppt
`Accuracy slides 2May2012 JW.pptx
`Methods to determine SNP error rate 3May2012 JW.pptx
`Accuracy JW 16May2012.ppt
`Accuracy differentiation JW 22May2012.xls
`Accuracy Gantt Spreadsheet 21April2012 JW.xls
`
`Hi Scott,
`
`I am attaching a sequence of documents related to accuracy planning which have been
`developed starting in early January. Since they are sequential in time, there is overlap of
`content, but perhaps walking through them will give you an idea of how the ideas developed
`and where we are now. These should complement the material you have from Mark. I would
`be happy to discuss these with you, Mark & Rich as you see fit, so this can become more
`integrated with the other project plans. Please let me know.
`
`Thanks,
`
`John
`
`Personalis EX2056
`
`

`

`ACCURACY DIFFERENTIATION
`Draft for discussion, JW, 22 May, 2012
`
`ORGANIZATION OF THIS FILE
`Company differentiation around accuracy
`Better sequencing in the laboratory
`Better variant detection & reporting
`More accurate databases, for interpretation
`
`COMPANY DIFFERENTIATION AROUND ACCURACY
`Focus on accuracy relevant to medical interpretation
`Better understanding of the issues than anyone
`Best track record of publications on the issue
`Unbiased from a platform standpoint & able to combine platforms
`More comprehensive data on accuracy than anyone
`World's only collection of genomes sequenced on both ILMN and CG platforms, plus arrays and karyotyping
`Largest family pedigree sequenced to high coverage
`Only genome sequenced on Sanger, ILMN and CGI
`Databases more accurate than those publicly available
`Able to provide a detailed quantitative view of mechanisms underlying errors
`Deep understanding of accuracy issues used to create better results
`Not just flagging errors, or filtering out those loci, but fixing the problems
`More insightful approaches deliver accuracy affordably
`
`BETTER SEQUENCING IN THE LABORATORY (WHEN MANAGED BY PERSONALIS)
`
`Focus on getting the whole medically-interpretable genome, accurately, even if more expensive
`Use insight into error types & medical content to keep this affordable
`
`Combine data from multiple different runs of a single platform
`Combine paired-end libraries made with multiple insert lengths
`Use longer read lengths (e.g. 2 x 250 bases, when available later in 2012)
`More expensive because only MiSeq, but clearly better
`Combine with bulk shorter-read data from HiSeq
`Substantially more efficient at split-read & junction-sequence SV detection
`Key to single-base breakpoint determination
`
`Combine data from multiple platforms
`More experience with Illumina / Complete Genomics than anyone
`May add Ion Torrent, Oxford Nanopore
`Guided by deep understanding of differential error mechanisms in each platform
`Not tied to any one platform
`Use whatever it takes to get the best possible combination
`
`Combine data from outside next-gen sequencing
`
`Several major areas of medical genetics are not well assayed by next gen sequencing
`Example 1 : Diseases caused by STR-expansion (e.g. Huntington's)
`Example 2 : Robertsonian translocations
`
`Personalis will combine NGS data with Non-NGS technologies to create a complete assessment
`Add karyotyping
`Add electrophoresis where appropriate (TBD)
`Others
`
`Orthogonal technologies also provide validation of NGS results
`Integrate NGS with array (fluorescence for SV's, in addition to genotypes)
`Sanger and/or (PCR + electrophoresis) as follow-up to SNP's / SV's of specific genomes (option TBD)
`
`Ability to create semi-custom products focused on medically interpretable parts of the genome
`Leverages Personalis advantages in content
`Custom hybridization array
`Custom pullout set
`Other assays to fix specific error types
`
`Question : Should there be a "Personalis exome" option ?
`More comprehensive / accurate at exome price level ?
`
`What is proprietary about this approach ?
`
`Personalis can focus its efforts based on the world's best content (re medical interpretation)
`Personalis will develop proprietary understanding about how best to combine multiple technologies
`Personalis does not face the competitive & anti-trust barriers that platform companies do
`Personalis' people combine deep experience with both platforms and interpretation & can leverage the two against each other
`Personalis can combine work in the lab and in bioinformatics, in a way that pure informatics companies can't
`
`BETTER VARIANT DETECTION & REPORTING
`
`Fewer false positive SNP's due to method of generating laboratory data
`Combination of paired-end insert lengths covers more of the genome uniquely
`
`Fewer false negative SNP's due to method of generating laboratory data
`More uniform coverage by combining library prep methods & platforms
`
`Orthogonal validation of millions of SNP genotypes by array
`Integrated with next gen sequencing data, not just another separate report
`
`Better alignment, due to better reference sequence
`SNP major allele ref by ethnicity (in our first product)
`InDel major allele ref by ethnicity (later)
`Other advances as R&D develops them:
`Iterative alignment
`
`# TBD changes in SNP alleles called with Personalis reference vs public standard
`Likely more improvement in non-European ethnicities
`Include (eventually) changes in InDels called as well
`
`We provide the only support available specifically for admixed genomes
`Major-allele non-ethnic reference, or even more advanced options
`
`Focused effort to align in the presence of SNP / InDel clusters, MNP's
`May leverage Hugo's BreakSeq approach
`May need time in the development plan
`
`Better SNP reporting due to better reference
`
`We report variants when sequence is a homozygous match to the public ref but that's the minor allele
`Entirely missed by systems which use the public reference
`Example : Factor V Leiden
`Rong had a whole paper on all the disease variants in the public ref
`> 1M loci where we can be different
`We should calculate the average # actual loci / genome, by ethnicity
`
`At het loci, we report both alleles, but we report the minor allele as the variant
`Not the allele which is different from the public reference
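The reporting rule above can be illustrated with a minimal sketch. It assumes population allele frequencies are available per locus; the function names and the frequencies used here are hypothetical and only serve to contrast "differs from the public reference" with "is the minor allele".

```python
def differs_from_public_reference(genotype, public_ref_allele):
    """Alleles a public-reference pipeline would report: those unlike the reference."""
    return sorted({a for a in genotype if a != public_ref_allele})


def minor_allele_variants(genotype, allele_freqs):
    """Alleles reported under a major-allele reference: those that are not the
    most frequent allele in the matching population (frequencies are assumed inputs)."""
    major = max(allele_freqs, key=allele_freqs.get)
    return sorted({a for a in genotype if a != major})


# Hypothetical locus where the public reference happens to carry the minor allele "A".
freqs = {"A": 0.03, "G": 0.97}
hom_minor = ("A", "A")   # homozygous match to the public reference
het = ("A", "G")

missed = differs_from_public_reference(hom_minor, "A")   # nothing reported
caught = minor_allele_variants(hom_minor, freqs)         # minor allele reported
het_report = minor_allele_variants(het, freqs)           # minor allele reported at the het locus
```

In this toy case the homozygous minor-allele genotype produces no output under the public-reference comparison, while the major-allele comparison reports it.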
`
`Better detection of SV's
`
`Better lab data for SV detection:
`Longer reads (better for approaches based on split-read & junction-sequences)
`MiSeq 2x250 or other platform
`Multiple insert lengths
`Electrophoretic assay of STR-expansions
`Karyotyping for Robertsonian translocations
`
`Orthogonal technologies for validation of SV's:
`Fluorescence intensity data from hybridization arrays
`
`We combine the results from five different algorithmic approaches
`
`We test our SV algorithms by Mendelian Inheritance in high coverage whole genome family data sets
`one which was sequenced with ten different paired-end libraries spanning 200 - 40,000 bases
`and validate them using fluorescent intensities from high density hybridization arrays
`
`We don't treat all SV's as novel - we have the world's best database of known SV's and their junction sequences
`Detection is better when you know exactly what you are looking for
`We should have a meeting to discuss how we can (easily ?) build this
`Start with 1,000 Genomes result Hugo has helped create
`Large data set but low coverage may make detection less certain in low MAF SV's
`Others will be able to access this eventually, potentially catching up, or claiming to
`Augment this with (more confident ?) SV's from:
`
`Full coverage (30-40x) genomes (West, Altman, 40 Koreans, others we can download)
`High coverage (>60x) genomes (Snyder, CEPH1463, Venter, others ?)
`
`Better reporting of SV's
`We determine the zygosity of deletions and report it
`Deletions integrated with SNP report, e.g. "A-" vs "AA" inside a het deletion
`We report SV's with their allele frequencies in the ethnicity matching the sample
`
`Flagging of potential errors
`Many subtle error types not recognized by others
`Error mechanisms underlying differences when the same person is sequenced twice (it's not just Poisson !)
`Error loci determined from deep & multi-platform sequencing of large families
`Error loci determined by extensive platform comparison, both NGS/NGS and NGS/Non-NGS
`Detailed understanding of compressions, and large unpublished catalog of them
`
`MORE ACCURATE DATABASES, FOR INTERPRETATION
`
`Cleaner databases:
`Well financed, systematic manual curation to industrial QC standard
`Standardized medical language hierarchy
`Extensive cross checking of databases developed independently
`VariMed vs HGMD
`MendelDB vs OMIM
`Personalis PharmGKB vs public PGKB (need to be careful in this positioning)
`
`Databases others will not have:
`Regulome
`BreakSeq (esp if augmented with private Personalis data)
`Compression list (described in publications but not released)
`Variant data derived from a broad collection of genomes
`Multiple public data sets, some processed in proprietary ways by Personalis
`Access to private data sets, sequenced by others
`Access to private data sets, sequenced by Personalis
`
`
`

`

`Framework for assessment &
`improvement of accuracy in whole
`genome medical interpretation
`
`Draft Jan 5, 2012
`John West
`
`
`

`

`Research Market Products
`
`• Comprehensive medically-focused interpretation of whole human genome
`sequences
`– Huge # of potential results
`– Even a low % error can swamp customer & waste time trying to track
`down invalid hypotheses
`– If research is in a clinical setting, particularly with return of results to
`patients, IRB’s may want “FDA-like” quality systems in place
`• Standard quality approaches
`• Standard quality terminology & reporting
`– Accuracy / quality to “near-FDA” standards can be a sales advantage in
`the research market
`• Maximal-accuracy analysis of a genome. Personalis :
`– Takes responsibility for the sample
`– Specifies the combination of next-gen sequencing performed
`– Augments NGS with other technologies to determine key variants
`– Conducts genotyping & assesses concordance
`– Conducts follow-up genetic & non-genetic testing to validate results
`
`
`

`

`Eventual clinical product
`
`• Also focused on comprehensive medically-focused interpretation of whole
`human genomes
`– FDA anticipates “Class 3” (highest) level of regulatory oversight
`– Need to undo bad reputation created by DTC genotyping companies
`– Pro-active early leadership in quality systems for whole genome may
`create a positive reputation for Personalis at FDA
`
`
`

`

`CDC Process – Established process & terminology
`(Augment with FDA processes, likely to be similar)
`
`[Figure: CDC ACCE model diagram. Labels recoverable from the extraction: Setting; Analytic (Sensitivity, Specificity, Robustness); Clinical (Sensitivity, Specificity, Penetrance); Evaluation; Monitoring; Ethical, Legal & Social Implications (safeguards & impediments).]
`
`

`

`Supplementary
`info:
` Medical record
` Family history
` Family tree
`
`Study participant(s)
`Samples
`Laboratory testing
`Raw data, QC report
`Alignment & variant
`detection
`Alleles at known loci
`& novel variants
`Variant interpretation
`
`Types of testing
`to run:
`Mix of NGS &
`non-NGS
`
`List of medically
`relevant allele
`types &
`coordinates
`
`Focus on
`Analytic Validity
`(Primary topic of
`this slide set)
`
`Draft genetic report
`Technical validation
`(e.g. Sanger sequencing)
`
`Validated genetic report
`Follow-on testing (non-genetic)
`Draft medical report
`Physician / researcher / counselor
`
`Optional return of final medical results
`Study participant
`
`Focus on Clinical
`Validity & Utility
`(Discuss in a
`future slide set)
`
`
`

`

`Elements of Analytic Validity in CDC’s ACCE Model
`See Appendix for additional detail
`
`• Analytic Sensitivity : How often positive when mutation is present ?
`– One minus the false negative rate ?
`• Analytic Specificity : How often negative when mutation is not present ?
`– One minus the false positive rate ?
`• Assay Robustness : How often does the test fail to give a usable result ?
`• Repeatability :
`– On the same sample
`– Within & between labs
`– Between sample types & other process variables
`• Confirmatory testing to resolve false positives ?
`– Genetic validation tests
`– Non-genetic tests (molecular, imaging, physiological)
`• Quality Control: Internal QC program defined & externally monitored ?
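The sensitivity, specificity and robustness questions above can be written out as simple ratios. The following is a minimal sketch assuming counts from a comparison against a trusted truth set; the function name and the counts are hypothetical.

```python
def analytic_validity(tp, fp, tn, fn, failed=0):
    """Summarize analytic validity from call counts.

    tp/fn: mutation present and test positive/negative
    tn/fp: mutation absent and test negative/positive
    failed: assays that gave no usable result (robustness)
    """
    attempted = tp + fp + tn + fn + failed
    return {
        "sensitivity": tp / (tp + fn),        # equals 1 - false negative rate
        "specificity": tn / (tn + fp),        # equals 1 - false positive rate
        "failure_rate": failed / attempted,   # how often no usable result is obtained
    }


# Hypothetical counts, for illustration only.
metrics = analytic_validity(tp=90, fp=5, tn=95, fn=10, failed=0)
```

With these definitions the two "one minus" questions on the slide hold by construction, provided the false negative rate is taken over loci where the mutation is present and the false positive rate over loci where it is absent.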
`
`
`

`

`Crucial to develop a model which links analytic
`validity results to monitor-able process variables
`
`• Allele determination performance is known to vary widely between loci
`• With millions of known loci, many quite rare, it is impractical to determine the
`assay performance for each as an independent system.
`• Novel loci, inherently, cannot be validated in advance
`• Proposed approach:
`– Create a quantitative process model of allele determination
`– Incorporate known process variables (sample type, storage & preparation,
`coverage, read length, repeat structure, algorithm parameters, etc)
`– Augment with models of process variability & failure mechanisms
`– Instrument the model with QC observables, related to normal & failure modes
`– Compare model predictions with real data & characterize differences
`– Iterate until model can confidently predict analytic validity metrics for both
`known & novel alleles, including in combination
`– System performance metric should prioritize medically interpretable alleles
`– Use insights from experimentally assessed error model to :
`• Evaluate / prioritize process improvements
`• Select disease-associated alleles which can be assayed with NGS+
`
`
`

`

`Genetic variant types to include in model
`
`• SNP’s
`• InDel’s
`• Larger deletions
`• STR / VNTR’s
`• Structural variants
`• Copy Number Variation
`• Trisomies
`
`• Combinations:
`– Any of these can be heterozygous
`– Compound heterozygosity
`– Small variants within larger het deletions or other CNV’s
`– Clusters of closely spaced variants, which might interfere with each others’ detection
`
`
`

`

`Examples of known error mechanisms
`Many more likely to follow
`
`• Locally low / no coverage due to:
`– GC bias of biochemistry
`– Alignment degeneracy (repeats)
`– Clusters of variant alleles exceed alignment mismatch allowance
`– Alignment poor due to problems with reference sequence
`• Reads align, but incorrectly
`– InDels missed, or inconsistently placed in homopolymers
`– Misplacements lead to :
`• Phantom het SNP’s
`• Apparent tri-allelic loci
`• Apparent triple haplotype structure
`• Raw read errors, random or systematic
`• Allelic imbalance of reads due to:
`– Random extremes of expected binomial distribution, esp where low
`coverage
`– Allelic biases - biochemical or bioinformatic (e.g. clusters of SNP’s in LD)
`• Read length too short to quantify length of VNTR, or VNTR embedded in other
`repeat structures
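The "random extremes of expected binomial distribution" item above can be quantified with a short calculation. This is a minimal sketch assuming each read at a heterozygous locus samples either allele independently with probability 0.5; the minimum-read threshold is a hypothetical caller setting, not one taken from the slides.

```python
from math import comb


def prob_alt_reads_below(coverage, min_alt_reads, p_alt=0.5):
    """Probability that fewer than min_alt_reads of `coverage` reads carry the
    alternate allele at a true het locus, under a binomial sampling model."""
    return sum(
        comb(coverage, k) * p_alt**k * (1 - p_alt) ** (coverage - k)
        for k in range(min_alt_reads)
    )


# Chance a true het is under-sampled if a caller needs at least 2 supporting reads.
p_low = prob_alt_reads_below(coverage=10, min_alt_reads=2)    # 11/1024
p_high = prob_alt_reads_below(coverage=40, min_alt_reads=2)
```

The probability falls quickly as coverage rises, which is why the effect matters most at locally low coverage.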
`
`
`

`

`Examples of process variability
` Many more likely to follow
`
`• Sample source (blood, saliva, tissue, cell culture)
`• Sample preservation, culturing & prep for sequencing
`• DNA sequencing platform, lab & operator(s)
`• Coverage
`• Read length
`• Paired-end insert length (a distribution)
`• Cluster creation & Sequencing chemistry versions
`• Cluster density
`• Variation in data processing algorithms, input parameters, reference sequence
`• DNA contamination or incomplete filtering of primer-related artifacts
`• Availability of supplementary data from the same sample (e.g. genotyping)
`• Inconsistent supplementary information from / about the individual sequenced
`
`
`

`

`QC Readouts, to monitor the process
`(Particularly given that it may be impractical to standardize all
`process variables completely, over multiple years)
`
`• Paired-end insert length distribution vs expected
`• Coverage non-uniformity vs % GC content; relative to a standard
`• Coverage histogram in regions selected to avoid repeats & %GC variation
`• Total coverage ratios of chromosomes
`• Raw read Q-score %-iles vs position along a read; relative to a standard
`• Allelic distribution of reads vs expected binomial, at a “diagnostic” set of het
`SNP loci
`• Sex, ethnicity, admixture and family relationships between samples vs
`expected from supplementary data provided
`• Raw read error rates (monitored at homozygous loci), systematic / random
`• Mendelian inheritance errors (family genome data sets)
`• Clusters of variants which are close / dense enough to predict higher
`probability of errors
`• Two alleles / haplotypes in single sex chromosomes, vs a standard
`• Three alleles / haplotypes in diploid chromosomes, vs a standard
`• Amount and location of autozygosity, vs a standard
`• Screening for non-human DNA
`• Concordance with genotyping data, vs a standard
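One of the readouts above, total coverage ratios of chromosomes, can be sketched as follows. This is an illustrative example only: the per-chromosome mean depths are made up, and the tolerance and expected sex-chromosome ratios are assumptions rather than values from the slides.

```python
AUTOSOMES = [str(i) for i in range(1, 23)]


def chromosome_coverage_ratios(mean_depth):
    """Ratio of each chromosome's mean depth to the autosomal (Chr 1-22) mean."""
    baseline = sum(mean_depth[c] for c in AUTOSOMES) / len(AUTOSOMES)
    return {chrom: depth / baseline for chrom, depth in mean_depth.items()}


def flag_outliers(ratios, expected, tolerance=0.2):
    """Chromosomes whose ratio departs from its expected value (default 1.0)
    by more than the tolerance."""
    return sorted(
        c for c, r in ratios.items() if abs(r - expected.get(c, 1.0)) > tolerance
    )


# Hypothetical sample: 30x autosomes with an extra copy of Chr 21, one X and one Y.
depth = {c: 30.0 for c in AUTOSOMES}
depth.update({"21": 45.0, "X": 15.0, "Y": 15.0})
ratios = chromosome_coverage_ratios(depth)
flagged = flag_outliers(ratios, expected={"X": 0.5, "Y": 0.5})
```

The same ratios also give a check of reported sex against the sequence data, since the expected X and Y ratios differ between XX and XY samples.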
`
`
`

`

`Categories of experimental data to assess
`error types, mechanisms, frequencies
`
`• Repeatability :
`– Same person twice (or more ?), trying to keep everything else constant
`– Alternatives: Genomes of identical twins, or children where identical
`• Controlled process experiments :
`– Same person sequenced multiple times, each time varying just one
`parameter of the process (many of these can be done computationally)
`• Concordance of other technology platforms on the same sample
`• Compare process QC metrics from many sources of raw genome data, even
`low coverage (e.g. 1,000 Genomes)
`• Mendelian inheritance errors (MIE’s) determined from family genome sets
`• Absolute accuracy :
`– Genomes of individuals known to have clear Mendelian diseases
`– Sequencing of complex synthesized oligo sets (synthetic genome)
`– Targeted modifications of DNA from a naturally occurring genome
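The Mendelian inheritance error (MIE) category above lends itself to a simple check. The sketch below assumes unphased diploid genotypes at autosomal loci for a mother/father/child trio; the genotypes are invented for illustration, and real data would also need handling of non-calls, sex chromosomes and de novo variants.

```python
from itertools import product


def is_mendelian_consistent(child, mother, father):
    """True if the child's genotype can be formed from one maternal and one
    paternal allele (autosomal, unphased)."""
    return any(sorted(pair) == sorted(child) for pair in product(mother, father))


def mie_rate(trio_genotypes):
    """Fraction of loci whose (child, mother, father) genotypes violate Mendelian inheritance."""
    errors = sum(not is_mendelian_consistent(c, m, f) for c, m, f in trio_genotypes)
    return errors / len(trio_genotypes)


# Hypothetical loci as (child, mother, father).
loci = [
    (("A", "G"), ("A", "A"), ("G", "G")),  # consistent
    (("G", "G"), ("A", "A"), ("A", "G")),  # MIE: mother cannot transmit G
    (("A", "A"), ("A", "G"), ("A", "G")),  # consistent
    (("A", "G"), ("A", "G"), ("G", "G")),  # consistent
]
rate = mie_rate(loci)
```

An MIE indicates an error somewhere in the trio (or a rare de novo event) but does not by itself say which genome is wrong.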
`
`
`

`

`Lists of problematic regions in the genome
`Capture all in one place, unify, & link to underlying error mechanisms of the model
`
`• Mapability / alignability assessments (some on Santa Cruz browser site)
`• Genes with known CNV’s
`• Genes with known pseudogenes
`• List of problematic genes for NGS (e.g. VWF)
`• Evan Eichler’s list of problematic regions
`• 1,000 Genomes list of regions with top 0.1% of coverage
`• List of compressions
`• MIE’s & MIE-cluster regions from family data
`• Difficult regions from phasing of a family
`• Regions of high raw read error rates (monitor at homozygous loci)
`• Loci apparently tri-allelic in the genome of a healthy individual, or 3 haplotypes
`• Regions of allelic bias
`• Regions of no or low aligned coverage, determined experimentally
`• Segmental duplications & other repeat structures
`
`
`

`

`List of Variants to Optimize Against
`
`• Medically interpretable variants (not just SNP’s !):
`– Mendelian “Top 200”
`• Specific loci / variant types, where known
`• Genes in which to identify novel variants
`– PharmGKB
`• Curated & novel variants
`– Alleles in top Risk-O-Grams
`• Diseases
`• Molecular or physiologic medically relevant phenotypes
`– Blood & tissue typing
`• Functional elements (broad interest for discovery research)
`– RefSeq genes
`– Linc-RNA
`– Regulome elements
`– eQTL loci
`
`
`

`

`Proposed Initial Priorities
`
`• Literature review related to genome sequencing accuracy, followed by discussion with SAB
`• Develop an initial list of top priority medically interpretable variants to optimize against
`• Create / collect lists of problematic regions of the genome
`• Screen target medical variant list against problematic regions to get 1st cut scale & composition of the accuracy problem
`• Develop initial quantitative model of allele determination, including known error mechanisms. Predict & characterize accuracy & repeatability levels
`• Obtain existing experimental data, reprocess with current version algorithms, and conduct 1st cut repeatability & accuracy assessment. Compare with model on a genome-wide basis and at the medically interpretable loci.
`• Launch program to obtain long-lead-time experimental data, including prototype of “maximally accurate genome” product (based on NGS augmented by other technologies).
`
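The screening step in the priorities above (target variant list against problematic regions) amounts to an interval lookup. A minimal sketch follows; the variant names, region labels and coordinates are all hypothetical placeholders, and a production version would use an indexed interval structure rather than a linear scan.

```python
def screen_variants(variants, problem_regions):
    """Map each variant name to the labels of the problematic regions containing it.

    variants: iterable of (name, chrom, position)
    problem_regions: iterable of (label, chrom, start, end), inclusive coordinates
    """
    hits = {}
    for name, chrom, pos in variants:
        hits[name] = [
            label
            for label, r_chrom, start, end in problem_regions
            if r_chrom == chrom and start <= pos <= end
        ]
    return hits


# Placeholder data, not real genome coordinates.
regions = [
    ("segmental_duplication", "1", 1000, 2000),
    ("low_coverage", "1", 1500, 1600),
    ("compression", "2", 500, 900),
]
targets = [("variant_1", "1", 1550), ("variant_2", "1", 5000), ("variant_3", "2", 700)]
screen = screen_variants(targets, regions)
at_risk = sorted(name for name, labels in screen.items() if labels)
```

Counting the at-risk variants by region label gives a first-cut view of the scale and composition of the accuracy problem.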
`
`

`

`Appendix
`US Centers for Disease Control (CDC)
`ACCE Model Process for Evaluating Genetics Tests
`
`Potentially useful framework & terminology for moving towards FDA
`approval of genome-based diagnostics
`
`
`

`

`Centers for Disease Control and Prevention
`CDC 24/7: Saving Lives. Protecting People. Saving Money through Prevention.
`
`http://www.cdc.gov/genomics/gtesting/ACCE/
`
`Genomic Testing
`ACCE Model Process for Evaluating Genetic Tests
`
`From 2000 - 2004, CDC's Office of Public Health Genomics (OPHG)
`established and supported the ACCE Model Project, which developed
`the first publicly-available analytical process for evaluating scientific
`data on emerging genetic tests. The ACCE framework has guided or
`been adopted by various entities in the United States and worldwide
`for evaluating genetic tests; the CDC-supported EGAPP™ initiative
`builds on the ACCE model structure and experience.
`
`Introduction to ACCE
`ACCE, which takes its name from the four main criteria for
`evaluating a genetic test (analytic validity, clinical validity, clinical
`utility and associated ethical, legal and social implications), is a
`model process that includes collecting, evaluating, interpreting, and
`reporting data about DNA (and related) testing for disorders with a
`genetic component in a format that allows policy makers to have access to up-to-date and reliable
`information for decision making. The ACCE model process is composed of a standard set of 44 targeted
`questions (3) that address disorder, testing, and clinical scenarios, as well as analytic and clinical validity,
`clinical utility, and associated ethical, legal, and social issues.
`
`An important by-product of the ACCE model process is the identification of gaps in knowledge that will
`help to define future research agendas. The ACCE approach builds on a methodology originally described
`by Wald and Cuckle (1) and on terminology introduced by the Secretary's Advisory Committee on
`Genetic Testing (2).
`
`
`

`

`Genomic Testing
`ACCE Model List of 44 Targeted Questions Aimed at a Comprehensive Review of Genetic Testing
`
`Element: Disorder/Setting
`1. What is the specific clinical disorder to be studied?
`2. What are the clinical findings defining this disorder?
`3. What is the clinical setting in which the test is to be performed?
`4. What DNA test(s) are associated with this disorder?
`5. Are preliminary screening questions employed?
`6. Is it a stand-alone test or is it one of a series of tests?
`7. If it is part of a series of screening tests, are all tests performed in all instances (parallel) or are only some tests performed on the basis of other results (series)?
`
`

`

`Element: Analytic Validity
`8. Is the test qualitative or quantitative?
`9. Sensitivity: How often is the test positive when a mutation is present?
`10. Specificity: How often is the test negative when a mutation is not present?
`11. Is an internal QC program defined and externally monitored?
`12. Have repeated measurements been made on specimens?
`13. What is the within- and between-laboratory precision?
`14. If appropriate, how is confirmatory testing performed to resolve false positive results in a timely manner?
`15. What range of patient specimens have been tested?
`16. How often does the test fail to give a useable result?
`17. How similar are results obtained in multiple laboratories using the same, or different technology?
`
`

`

`Element: Clinical Validity
`18. Sensitivity: How often is the test positive when the disorder is present?
`19. Specificity: How often is the test negative when a disorder is not present?
`20. Are there methods to resolve clinical false positive results in a timely manner?
`21. Prevalence: What is the prevalence of the disorder in this setting?
`22. Has the test been adequately validated on all populations to which it may be offered?
`23. What are the positive and negative predictive values?
`24. What are the genotype/phenotype relationships?
`25. What are the genetic, environmental or other modifiers?
`
`

`

`Element: Clinical Utility
`26. Intervention: What is the natural history of the disorder?
`27. Intervention: What is the impact of a positive (or negative) test on patient care?
`28. Intervention: If applicable, are diagnostic tests available?
`29. Intervention: Is there an effective remedy, acceptable action, or other measurable benefit?
`30. Intervention: Is there general access to that remedy or action?
`31. Is the test being offered to a socially vulnerable population?
`32. Quality Assurance: What quality assurance measures are in place?
`33. Pilot Trials: What are the results of pilot trials?
`
`

`

`34. Health Risks: What health risks can be identified for follow-up testing and/or intervention?
`35. What are the financial costs associated with testing?
`36. Economic: What are the economic benefits associated with actions resulting from testing?
`37. Facilities: What facilities/personnel are available or easily put in place?
`38. Education: What educational materials have been developed and validated and which of these are available?
`39. Are there informed consent requirements?
`40. Monitoring: What methods exist for long term monitoring?
`41. What guidelines have been developed for evaluating program performance?
`
`

`

`Element: ELSI
`42. Impediments: What is known about stigmatization, discrimination, privacy/confidentiality and personal/family social issues?
`43. Are there legal issues regarding consent, ownership of data and/or samples, patents, licensing, proprietary testing, obligation to disclose, or reporting requirements?
`44. Safeguards: What safeguards have been described and are these safeguards in place and effective?
`
`

`

`Conceptual Gantt chart for analytical accuracy development. In practice, there will be overlap of the elements.
`
`Chart elements:
`• Develop list of medically interpretable loci / regions for testing
`• Review literature
`• Develop error model based on hypothesized mechanisms of action
`• Create QC tools
`• Iterate error model & refine experimental analysis
`• Analysis of experimental data to quantify error rates for all variant types & characterize by mechanism of action
`• Develop strategy
`• Design, implement & test technology improvements
`• Release version x.x
`
`
`

`

`Develop a List of Medically Interpretable Loci /
`Regions for Testing
`• Develop a list, some of each variant-type, which we want to be able to interpret
`medically. An initial list can be anecdotal / representative. Eventually, it
`should be more comprehensive.
`• PharmGKB SNP’s
`• VariMed SNP’s which contribute to Risk-O-Grams
`– 1st exon, other exons, non-exonic
`• Create an initial list (maybe 20 each ?) of medically interpretable:
`– Small InDels, CNV’s, VNTR’s, Deletions spanning 10 – 100k bases,
`Insertions, Duplications, Balanced translocations
`• Create a list of genes which we want to be able to assay confidently in their
`entirety:
`– PharmGKB VIP genes, genes for 20 major Mendelian Diseases (e.g. CFTR),
`20 major cancer genes (e.g. p53, BRCA 1&2, …)
`• Variants required for blood typing
`• Variants for tissue typing
`• Linc-RNA regions, Regulome elements (TF binding site ranges), eQTL loci
`• Loci / genes for which phasing is most likely to be necessary for interpretation
`
`
`

`

`Literature Review
`• Avoid re-inventing the wheel
`• Literature already reviewed:
`– Marguillies (NHGRI, coverage analysis & error filtering)
`– Lam (Platform comparison)
`– Kahn (Accuracy advances, Illumina technology)
`– Complete Genomics technology whitepaper
`– Finishing of the Human Genome (2004)
`– Nature Biotech Dec 2011, Dutch group. (Filtering out errors)
`– Some literature on Huntington’s disease assays
`• To be reviewed:
`– Complete Genomics data pipeline paper (recent)
`• To look for:
`– Papers systematically addressing the mechanisms underlying errors by type
`– Assay methods & problems for specific difficult cases
`– UCLA (Stan Nelson’s group) error model (published ?)
`• Mark to review algorithms and parameter settings of HugeSeq elements (Hugo
`may already be familiar):
`– Alignments : BWA
`– SNP & InDel detection : GATK, SAMtools
`– SV & CNV detection :
`• Breakdancer (paired-end mapping)
`• CNVnator (read-depth analysis)
`• Pindel (split-read analysis)
`• BreakSeq (junction mapping)
`
`
`

`

`Develop an error model based on
`hypothesized mechanisms of action
`
`• Coverage model incorporating: GC-bias, alignment degeneracy, Poisson
`sampling
`• SNP detection error model incorporating: Binomial distribution of het alleles,
`Raw read error rate, proximity to InDels (detected or not), clusters of SNPs,
`compressions, alignment degeneracy, coverage, SNP loci within SV’s (esp
`large deletions), aligner & variant caller parameters, allelic biases. False
`positive, false negative & non-call rates.
`• InDel error model, including zygosity
`• SV error model, including zygosity (separate models for the multiple SV &
`CNV algorithms used, and then their combination)
`• Can the model correctly reflect the differences between Illumina & CGI data
`sets on the same sample ?
`• Can the model reflect errors in the reference sequence & guide its
`improvement ?
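The Poisson sampling component of the coverage model above can be sketched in a few lines. This is a baseline under idealized assumptions (uniform, independent sampling with no GC bias or alignment degeneracy); the mean coverage and minimum depth used below are illustrative values, not figures from the slides.

```python
from math import exp, factorial


def prob_coverage_below(mean_coverage, min_depth):
    """P(depth < min_depth) at a locus when depth is Poisson with the given mean."""
    return sum(
        exp(-mean_coverage) * mean_coverage**k / factorial(k)
        for k in range(min_depth)
    )


# Fraction of loci expected below a hypothetical 10-read calling threshold.
frac_30x = prob_coverage_below(mean_coverage=30.0, min_depth=10)
frac_15x = prob_coverage_below(mean_coverage=15.0, min_depth=10)
```

Observed low-coverage fractions well above this baseline would point to the non-random mechanisms listed in the model, such as GC bias and alignment degeneracy.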
`
`
`

`

`Create QC Tools to Measure Error-Model
`Parameters From Experimental Data Sets
`
`• Determinants of coverage:
`– Coverage vs % GC content
`– Aligned coverage vs reference degeneracy (multiple parameters?)
`– Coverage histogram in regions selected to avoid repeats & %GC variation
`• Should match Poisson ?
`• Look for deviations from Poisson (e.g. palindromes)
`• Total coverage ratios of chromosomes (e.g. each chromosome vs total of Chr 1-22)
`• Paired-end insert length distribution vs expected
`– Tails of the distribution may impact BreakDancer
`• Raw read Q-score %-iles vs position along a read; vs expected
`• Raw read error rates (monitored at homozygous loci)
`– In regions devoid of known problems (i.e. baseline)
`– Profiled across the genome, to look for problematic regions
`• Allelic distribution of reads vs expected binomial (alignment, biochemistry problems)
`– In regions devoid of known problems (i.e. baseline)
`– Profiled across the genome, to look for problematic regions (allelic bias in the genomic sequence reads)
`• May be best with a very high coverage genome
`• Sex, ethnicity, admixture and family relationships between samples vs expected from
`supplementary data provided
`• Clusters of variants (not just SNPs) which are close / dense enough to predict higher
`probability of errors
`• Amount and location of autozygosity, vs expectation
`• Screening for non-human DNA (esp in data sets from saliva ?)
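
One way the "should match Poisson ?" check on the coverage histogram could be sketched. It assumes per-position depths have already been extracted for regions chosen to avoid repeats and %GC variation. The function names are hypothetical, and the variance-to-mean ratio is only one of several possible measures of deviation from Poisson.

```python
import math
from collections import Counter

def dispersion_index(depths):
    """Sample variance divided by mean; close to 1.0 for Poisson-like coverage."""
    n = len(depths)
    mean = sum(depths) / n
    variance = sum((d - mean) ** 2 for d in depths) / (n - 1)
    return variance / mean

def poisson_expected_counts(depths):
    """Rows of (depth, observed count, expected count under a Poisson fit).

    The Poisson mean is taken to be the sample mean (an assumption of this
    sketch), and expected counts are computed in log space for stability.
    """
    n = len(depths)
    mean = sum(depths) / n
    observed = Counter(depths)
    rows = []
    for depth in range(max(depths) + 1):
        log_pmf = -mean + depth * math.log(mean) - math.lgamma(depth + 1)
        rows.append((depth, observed.get(depth, 0), n * math.exp(log_pmf)))
    return rows
```

A dispersion index well above 1 would point to over-dispersion (for example residual GC-bias or alignment problems). The per-depth rows can be plotted or passed to a goodness-of-fit test of the reader's choosing.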
`
`Analysis of experimental data to quantify error
`rates for all variant types & characterize by
`mechanism of action
`
`• Analysis of a single genome:
`– Coverage assessment & errors related to that, including non-calls
`– Two alleles / haplotypes in single sex chromosomes, vs a standard
`– Three alleles / haplotypes in diploid chromosomes, vs a standard
`– Comparison of variant detection between multiple algorithms used in HugeSeq for the same thing:
`• SNPs & InDels : GATK & SAMtools
`• SVs & CNVs : BreakDancer, CNVnator, Pindel, BreakSeq
`• Aligned with the standard reference sequence vs ethnically specific major allele
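
A minimal sketch of the caller-vs-caller comparison, assuming each call set has already been parsed into `(chrom, pos, ref, alt)` tuples. This key and the function name are illustrative choices. Real comparisons would also need variant normalization (e.g. left-alignment of InDels), which is not shown.

```python
def compare_call_sets(calls_a, calls_b):
    """Summarize agreement between two variant call sets.

    Each call is a hashable key such as (chrom, pos, ref, alt). The keys are
    assumed to be normalized the same way in both sets.
    """
    a, b = set(calls_a), set(calls_b)
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        "only_a": len(a - b),
        "only_b": len(b - a),
        # Jaccard index: fraction of all distinct calls made by both callers.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }
```

Calls found by only one algorithm are natural candidates for follow-up when characterizing errors by mechanism.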
`
`• Comparison of genomes:
`– Independent subsamples from a single deep genome data set
`– Same genome sequenced twice (or more) with all the same platform settings
`– Same genome vs paired-end insert length
`– Same genome vs read-length
`– Mendelian Inheritance Errors (MIE) & Inheritance State Consistency Errors (ISCE)
`– Same genome
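
The Mendelian Inheritance Error (MIE) item above could be sketched for a single trio as follows. The sketch assumes autosomal diploid sites, genotypes given as pairs of allele codes, and `None` for a non-call. The function names are hypothetical, and the pedigree-wide ISCE analysis is not attempted here.

```python
def is_mendelian_consistent(child, mother, father):
    """True if the child's two alleles can be split one per parent."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)

def mendelian_error_rate(trios):
    """Fraction of fully called sites where the trio violates Mendelian rules.

    trios is an iterable of (child, mother, father) genotypes. Sites with any
    non-call (None) are skipped rather than counted as errors.
    """
    called = 0
    errors = 0
    for child, mother, father in trios:
        if child is None or mother is None or father is None:
            continue
        called += 1
        if not is_mendelian_consistent(child, mother, father):
            errors += 1
    return errors / called if called else 0.0
```

An MIE flags that at least one of the three genotypes is wrong (or that a de novo mutation occurred) but does not by itself say which sample carries the error.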
