From: Mark Pratt
To: John West
Subject: accuracy strategy notes
Date: Tuesday, May 15, 2012 9:29:07 AM
Attachments: AccuracyStrategyNotes110514.pptx

The first 5-6 slides are current. The later slides are from a deck I had been building in Feb/March.

Sorry about the state, but I couldn't edit them down last night.

Personalis EX2052

Personalis Accuracy Program Facets

1. Ground Truth / Golden Genomes program
– Concordance and pedigree analysis over multiple platforms to independently assess errors
– Develop “gold standard” genomes using any accessible means to resolve conflicts and extend coverage
  • Alternative and complementary assays
  • Targeted laboratory and bioinformatics analysis to resolve high-impact errors
– Prepare standard output for test pipeline comparison

2. Genome comparison tool
– Basic tool for binary, low-level, genome-wide comparison of nearly identical genomic analyses
– Tabulate and categorize differences and problems
– Used for pipeline development, “golden” genome comparison and other accuracy development activities

3. Problem Flagging and Error model
– Identify errors using golden genomes, array discordance, etc.
– Annotate comparison of sample and historical high-density QC metrics
– Annotate known and suspected “problem” areas
– Train algorithms to classify and flag errors (and later to resolve discordant data)

4. Complementary assays and analytical methods for production use
– Develop customized libraries and protocols to improve sensitivity and accuracy on content:
  • Custom content GT/CNV/breakpoint array
  • Targeted pullout and tailored-insert sequencing libraries
  • Exome, RNAseq

5. Develop Processes for Targeted Solutions
– Research pipeline for resolution of specific problems, e.g. a known SD in a high-value content area
  • Long-range sequencing techniques (FISH, single-molecule methods)
  • Customized realignment or reassembly
  • Customized local reference

Error Analysis Using Golden Genomes

• Start with validated modern (current) genome analysis tools
– Harness the worldwide genomics community (e.g. GATK variant discovery framework)
– Carefully engineered, particularly for SNP calling and Q scores
– Rapidly evolving …
• Use multiple genomes and assays to produce a set of high-quality “Golden” genomes
– Use replicates and pedigrees to develop discordance and error data sets
– Store errors and high-error-rate regions in a database
• Compute detailed QC metrics
– Store high-density QC statistics (coverage, Q, FWD/REV asymmetry, etc.) in a database
• Develop a genome-scale error model and annotation by segmenting with, and training on:
– Known errors and error ranges from golden genome set analysis
– Local and aggregate QC metrics
– Database of known or suspected regions of difficulty
  • SDs, STRs, pseudogenes, palindromes
  • Alignment degeneracy metrics
  • GC content, problem motifs
• Use segmentation and the error model to annotate and correct confidence
– Machine learning approach
• Prioritize analysis and other development activities by value of content

Personalis Advantages
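Using pedigrees to build error data sets rests on Mendelian inheritance errors (MIEs): in a parent-child trio, the child's genotype must be explainable by one allele from each parent, so a violation flags at least one miscall (or a rare de novo event). A minimal Python sketch follows, assuming autosomal sites, unphased integer-allele genotypes and no copy-number changes; the function names are illustrative, not part of any Personalis tool. MIE checks catch only a subset of errors, since many miscalls remain Mendelian-consistent.

```python
from typing import Iterable, Tuple

Genotype = Tuple[int, int]  # unphased allele indices, e.g. (0, 1) for a heterozygote


def is_mendelian_consistent(child: Genotype, mother: Genotype, father: Genotype) -> bool:
    """True if the child could inherit one allele from each parent."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


def mie_rate(trios: Iterable[Tuple[Genotype, Genotype, Genotype]]) -> float:
    """Fraction of (child, mother, father) sites that violate Mendelian inheritance."""
    sites = list(trios)
    if not sites:
        return float("nan")
    violations = sum(not is_mendelian_consistent(c, m, f) for c, m, f in sites)
    return violations / len(sites)


# Second site is a violation: a homozygous-alt child cannot have a homozygous-ref mother.
print(mie_rate([((0, 1), (0, 0), (1, 1)), ((1, 1), (0, 0), (0, 1))]))
```

Sites flagged this way, aggregated across replicates and families, are one plausible input to the discordance and error data sets described above.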

Process for 2-Genome Comparison

Diagram: Development Test Pipeline & Golden Genome Comparison
– Test Pipeline: HugeSeq (replicates or Rep-Std); union; concordance; A/B report; error ID, classification and report
– Gold Standard Genome Process: pedigree genome sets (ILMN) and replicate genome sets (ILMN); samples & sequencing; HugeSeq pipeline; QC tool; SNP/CNV array data; additional inputs (Exome, new algorithms, GNOM); master union of variants for the set; Mendelian inheritance state & phasing analysis; concordance analysis; error identification; error classification & model training; conflict resolution / reference output

A mechanism to comprehensively compare pipeline outputs is essential to accuracy improvements, as well as to the maintenance and rapid development of pipeline components. Top-level requirements:
• Exhaustive comparison of locus-level variant and reference calls between two genome analyses
• Segment discordance by Personalis content and known problem areas
• Compare quality indicators such as coverage and variant qualities
• Produce a high-level concordance scoring summary
• Compare pipeline replicates, or pipeline vs. golden genome
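The first requirement, an exhaustive locus-level comparison of two analyses, can be sketched in a few lines. This is a minimal Python illustration, not the Personalis tool. It assumes each analysis has already been reduced to a hypothetical mapping from (chromosome, position) to an unphased genotype string such as "0/1", with reference calls stored explicitly as "0/0". A locus absent from one set is counted as a no-call rather than a reference call. Segmenting by content or problem region would amount to grouping the same tally by a region label.

```python
from collections import Counter
from typing import Dict, Tuple

Locus = Tuple[str, int]   # (chromosome, position)
Calls = Dict[Locus, str]  # locus -> genotype string, e.g. "0/1"; absent locus = no call


def normalize(gt: str) -> str:
    """Canonical unphased genotype, so "1/0", "0|1" and "0/1" compare equal."""
    return "/".join(sorted(gt.replace("|", "/").split("/")))


def compare_calls(a: Calls, b: Calls) -> Counter:
    """Tabulate locus-level agreement between call sets A and B."""
    tally: Counter = Counter()
    for locus in a.keys() | b.keys():
        ga, gb = a.get(locus), b.get(locus)
        if gb is None:
            tally["only_A"] += 1
        elif ga is None:
            tally["only_B"] += 1
        elif normalize(ga) == normalize(gb):
            tally["concordant"] += 1
        else:
            tally["discordant"] += 1
    return tally


def concordance(tally: Counter) -> float:
    """Fraction of loci called in both sets on which A and B agree."""
    both = tally["concordant"] + tally["discordant"]
    return tally["concordant"] / both if both else float("nan")


# Hypothetical example call sets.
a = {("chr1", 100): "0/1", ("chr1", 200): "1/1", ("chr2", 50): "0/0"}
b = {("chr1", 100): "1|0", ("chr1", 200): "0/1", ("chr3", 10): "0/1"}
t = compare_calls(a, b)
print(dict(t), concordance(t))
```

Keeping "only in A" and "only in B" separate from genotype discordance matters: the former usually points to coverage or filtering differences, the latter to calling differences.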

A/B Genome Comparison Output
(Comparison of ostensibly identical analyses)

High level
• ∆% genome coverage
• ∆# and ∆% reads mapped
• ∆# and ∆% of variants by type
• Comparison of coverage distributions
• Comparison of Q distributions
– ∆s at specified thresholds, e.g. {10, 20, 30}
• Comparison of Q by read position
• Comparison of InDel/SV sizes

Low level
• Discordant SNPs & InDels
– Segment by content regions
– Segment by genomic regions
– Segment by sequence context
– Segment by QC parameters
• Discordant CNVs/SVs
– ID common variants
– Compare parameters
– Segment by sequence context
– Segment by genomic regions
– Segment by QC parameters
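The threshold deltas in the Q-distribution comparison are simple to compute once each analysis's variant qualities are available. A minimal Python sketch, assuming hypothetical plain lists of per-variant Q values for analyses A and B; the function names are illustrative, and the default thresholds are the {10, 20, 30} example from the slide.

```python
from typing import Dict, Iterable, Sequence, Tuple


def counts_at_thresholds(quals: Iterable[float], thresholds: Sequence[int]) -> Dict[int, int]:
    """Number of variant calls whose quality is at least each threshold."""
    qs = list(quals)
    return {t: sum(q >= t for q in qs) for t in thresholds}


def threshold_deltas(
    quals_a: Iterable[float],
    quals_b: Iterable[float],
    thresholds: Sequence[int] = (10, 20, 30),
) -> Dict[int, Tuple[int, float]]:
    """Per threshold: (delta count = count_B - count_A, delta as % of count_A)."""
    ca = counts_at_thresholds(quals_a, thresholds)
    cb = counts_at_thresholds(quals_b, thresholds)
    result = {}
    for t in thresholds:
        delta = cb[t] - ca[t]
        pct = 100.0 * delta / ca[t] if ca[t] else float("nan")
        result[t] = (delta, pct)
    return result


# Hypothetical per-variant qualities for two analyses of the same sample.
print(threshold_deltas([5, 12, 25, 40], [15, 22, 35, 45]))
```

Reporting both the absolute and the relative change at several thresholds shows whether a pipeline change shifts the whole Q distribution or only the low-quality tail.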

Personalis Analytical Pipeline
Accuracy Focal Points

Issues
• Lost or swapped sample
• Enzymatic errors in sample preparation
• Incomplete or low-coverage libraries
• Poor confirmation
Mitigations
• Multiple tissue
• Retain sample
• Internal library QC
• Confirmation assay
• Multiple custom libraries

Issues
• Assay-specific systematic error
• Uneven coverage and quality
• Sample mix-up
• Lab consistency
Mitigations
• CLIA lab sequencing
• CLIA lab genotype validation assay
• Independently indexed libraries with tailored inserts

Issues
• Poor alignment
• Incomplete variant analysis
• Reference bias
• Errors in reference
• Overestimated confidence
Mitigations
• Integrate all leading tools (HugeSeq)
• Ethnic and custom references
• Calibrate errors with genotypes
• Reassembly of problem areas

Issues
• Incomplete and variable-quality public databases
• Errors in reference
• No clinical annotation for variants
• Lack of centralized, high-quality data
Mitigations
• Manual curation of literature under controlled protocol
• Human cross-check of curation
+ Varimed
+ PharmGKB
+ MendelDB
+ Regulome

Issues
• High rate of significant false positives
• Missing data on critical variants
• Inaccurate confidence estimation and risk combination
Mitigations
• Risk-o-gram properly combines risk alleles
• Detailed analysis of high-significance loci
• Physician/GC review of final report
• Comprehensive error propagation framework

Personalis Pipeline
Accuracy Program Activities

Diagram stages: Sample; WGS Prep, Exome Prep, RNAseq? Prep, Array Prep; NGS, Array Assay; HugeSeq, Scripture?, Genotyping; Varimed, PharmGKB, MendelDB, Public DBs; Interpret

• Enzymology (Completeness, Error floor, Platform combination)
• Analysis & Tools (Accuracy/sensitivity assessment & model, QC, Annotation)
• Algorithms (Error framework, Alignment and Assembly)
• Bioinformatics (Error modes, Error propagation, Error model, Platform combination)

GATK Variant Calling/Recalibration Pipeline
First round for golden genomes

Accuracy Program Facets
Test, Fix, Ground truth

Diagram: Golden Genomes; Test Pipeline; Complementary Assays and Bioinformatic Methods; Error Model

Diagram: Gold Standard Genome Process: pedigree genome sets (ILMN) and replicate genome sets (ILMN); samples & sequencing; HugeSeq pipeline; QC tool; SNP/CNV array data; additional sequence inputs (Exome, GNOM, …); master union of variants for the set; Mendelian inheritance state & phasing analysis; concordance analysis; error identification; error classification & model training; conflict resolution / reference output

Genome Scale Accuracy
(Miscalls, Mischaracterization and Mitigation)

Cause / Effect
- No single platform has satisfactory accuracy for individual genome-wide interpretation
- A high false-positive rate for SNPs makes NGS challenging for clinical interpretation and inefficient for discovery
- Inaccurate characterization of structural variants leads to incorrect interpretation
+ By combining complementary assays and improving sequence analyses, Personalis improves both accuracy and quality assessment to standards unavailable elsewhere
+ Accurate and empirically tested assessment of analytical confidence is a cornerstone of Personalis clinical interpretation.

Genome Scale Incompleteness
Causes, Consequences, and Corrections

Cause / Effect
- No single method has satisfactory completeness for individual genome-wide interpretation
- Even in large sample sets, systematic completeness issues degrade discovery sensitivity
- You can’t assess what you can’t measure
+ By combining complementary assays and improving sequence analyses, Personalis extends genomic coverage beyond anything available today, providing the most complete genome-scale analysis available at any price
+ When information is absent, Personalis properly accounts for this implicit uncertainty in clinical interpretation and risk assessment.

“Real knowledge is to know the extent of one’s ignorance.” – Confucius

Complementary Platforms Improve
Accuracy and Completeness

• State-of-the-art sequencing platforms have clinically challenging accuracy
– Poor raw variant-calling accuracy and consistency
  ~3% error from MIE analysis (Dewey et al., 2011)
  ~1% variation between tissues (Lam et al., 2011)
  ~1.5% validation failure (Abecasis, 1000 Genomes)
– Poor concordance between platforms
  10% discordant SNVs (2% contradictory) and 73% discordant SVs between ILMN & GNOM (Lam et al., 2011)
  30% validation failure of indels (Abecasis, 1000 Genomes)
– Poor coverage or reproducibility within a platform
  Sacrifice 5% of genome to achieve Q50 run-to-run concordance (Ajay et al., 2011)
  80-85% accessible (Abecasis, 1000 Genomes)
• These error and completeness shortfalls are problematic for individual analysis and can lead to inappropriate clinical interpretations.
• Personalis’s multi-platform approach extends coverage and resolves discordant measurements by mapping platform limits of performance.

Diagram quality labels: Q60, Q80, Q50, Q40, Q50, Q30

• High-coverage PE DNAseq
– Distributed over insert sizes – pullout combinations
• Ultra-high-coverage PE ExomeSeq
• 1M custom SNP array for confirmation and calibration
• RNAseq to corroborate or resolve complex issues?
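The Q values on this slide are Phred-scaled: Q = -10·log10(p), where p is an error probability, so Q50 means roughly 1 error in 100,000. The short Python sketch below converts between the two scales. The function names are illustrative. Re-expressing the ~3% MIE figure as about Q15 is a rough, assumption-laden comparison, since the slide does not say how that rate was computed.

```python
import math


def phred_to_error(q: float) -> float:
    """Error probability for a Phred-scaled quality: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred-scaled quality for an error probability: Q = -10 * log10(p)."""
    return -10 * math.log10(p)


for q in (30, 40, 50, 60, 80):
    p = phred_to_error(q)
    print(f"Q{q}: error probability {p:.0e} (about 1 in {round(1 / p):,})")

print(f"3% error ~ Q{error_to_phred(0.03):.1f}")
```

Every 10 Q units is a tenfold change in error rate, which is why the gap between raw-call accuracy (around Q15) and Q50 targets is so large.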

Pipeline Modules

Diagram labels:
• Modules: Sample Prep, Assay, Read QC, Alignment QC, Variant QC, Variants, Annotation, Biological Insight, Clinical Insight
• Content: Base Content, Proprietary Content
• Base Sample Pipeline: Analytical Pipeline (HugeSeq), Second Source, Secondary Pipeline, Concordance
• Multiple Genomes: Set QC, Union Report, Case-Control Analysis, Trio Analysis, Large Pedigree Analysis
• Other labels: Science, Base, Available

Personalis sets the Quality Standard
in Genome Scale Analysis

Mark Pratt
16 February, 2011

Industry-Leading Quality at Every Step of
Personalis Genome Interpretation
• Curated Content
Manually curated content using controlled processes and cross-checking
• Complementary Platforms
Multiple assays combined to extend coverage and to corroborate and calibrate results
• Customized and Extended References
Ethnic or custom reference alignment improves accuracy & coverage
• Full Spectrum Variant Identification
Industry-leading combination of SNVs, structural and copy number variants
• Validated Analytical Uncertainty Model
Confidence model based on empirical calibration of errors
• Accurate Confidence Assessment of Clinical Interpretation
Combination of biological and analytical uncertainties used in a complete model

Sequence Alignment Ambiguity

Issues:
• Degenerate read alignment is suspected to be the primary driver of high sequencing error rates
Approximately 100,000 SNPs per genome are miscalled in standard service sequencing
• Much (10-15%) of the genome is poorly accessed by sequencing because of repeat structures
Sequence and hybridization array data are substantially incomplete for both discovery and risk analysis

Approach:
1. Improve the reference sequence on SNPs (a), then InDels and other SVs (b), to reduce the number of alignment mismatches and improve specificity
2. Employ multiple insert-length libraries to resolve (some) degeneracies and enable assembly
3. Improve alignment algorithms and attempt local assemblies
4. Incorporate very-long-read technology (PacBio/ONT) for resolution of long degenerate regions

Concerns:
1. Only modest improvement noted in correcting just SNVs in the major-allele reference (a), and substantial further development is needed to incorporate InDels and other SVs into custom references (b)
2. Significant facilities, development and product costs involved in multi-library preparation
3. Development investment and outcome uncertainty for algorithm development
4. Long-read technologies are currently expensive, error-prone and poorly commercialized
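Degenerate read alignment occurs wherever a read-length stretch of reference sequence appears more than once. A toy alignment degeneracy metric can be sketched by counting repeated k-mers. This is an illustrative assumption, not the metric Personalis used. It considers exact matches on the forward strand only, and ignores reverse complements, mismatches and read pairing, all of which a realistic metric would need to handle.

```python
from collections import Counter
from typing import List


def kmer_multiplicity(reference: str, k: int) -> List[int]:
    """Occurrence count in `reference` of the k-mer starting at each position."""
    kmers = [reference[i:i + k] for i in range(len(reference) - k + 1)]
    counts = Counter(kmers)
    return [counts[km] for km in kmers]


def degenerate_fraction(reference: str, k: int) -> float:
    """Fraction of k-mer start positions whose k-mer occurs more than once,
    i.e. where an exact k-length read could not be placed uniquely."""
    mult = kmer_multiplicity(reference, k)
    return sum(m > 1 for m in mult) / len(mult) if mult else float("nan")


# "ACGT" occurs twice, so an exact 4-base read of it has two equally good placements.
print(kmer_multiplicity("ACGTACGTTT", 4), degenerate_fraction("ACGTACGTTT", 4))
```

Longer effective reads (larger k, for example from long inserts or long-read technology) shrink the degenerate fraction, which is the intuition behind approaches 2 and 4 above.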

Curated Content (Placeholder, Gemma WIP)

Issues:
• Unavailable clinical information in public databases
• Poor quality control in publicly available databases

Approach:
• Manually curated proprietary databases
• Human QC cross-check

Concerns:
• Labor-intensive and slow

Customized Personalis Genotype/CNV Array

Issues:
• Poor concordance between SNP arrays and sequencing data
Few percent discordance; concordance predicts accuracy and is an element of the error model
• Known set of high-value loci with poor sensitivity in commercial sequencing
Missing data on variants in Varimed and PharmGKB will materially impact risk assessment

Approach:
1. Develop a custom genotype & CNV array to baseline the error model, corroborate and backfill sequence data
2. Include all accessible SNPs in Varimed, PharmGKB, MendelDB, Regulome
3. Include genome-wide QC and CNV loci to aid in sequence interpretation and error calibration

Concerns:
1. Adds $250? baseline cost to product
2. Exposes Personalis content to array manufacturer (Illumina?)
3. Unknown effort and utility of comprehensive error model

Improved Reference Genome

Propagation of Uncertainty

Multi-library Sequencing

Content?

Poor content drives poor interpretation regardless of data type or quality.

Personalis sets standards in completeness and quality with:
• VariMed
• PharmGKB
• MendelDB

• Qualified manual curation
• Certificated checklist (?)
• Certified cross-check and QC (?)
• Administration (?)

Customized and Extended References

• Existing reference contains numerous rare alleles and structural variants
– Requires mismatches in alignments, increasing ambiguity
– Increases error through reference bias
• Reference is part of the problem

Personalis’s team has pioneered the use of ethnic references to improve alignment performance.
• Customized references
• Diploid reference

Full Spectrum Variant Identification at
Industry-Leading Accuracy
• HugeSeq
– Best-in-class combination of variant detection tools

Validated Analytical Uncertainty Model

Poorly calibrated confidence values from variant-calling algorithms can result in inappropriate confidence in clinical interpretation.

Existing platforms primarily calibrate on self-consistency and small-genome sequencing. These approaches neglect or underestimate the large systematic effects that occur in typical human samples, arising from a range of issues from complex repeat structures in human genomes to variability in sample quality.

Filtering is a typical approach to improving the error rates of sequencing data, but it often comes at an unacceptable cost of reduced sensitivity and negative bias in clinical interpretation.

Personalis has developed a variant confidence model derived from data quantifying both precision and accuracy across many samples and technology platforms:
– Sample-to-sample, run-to-run, lab-to-lab and coverage-based precision model for platforms
– Cross-platform discordance prediction and conflict resolution (??)
– Characterizing and predicting excess “Mendelian inheritance anomalies” in families
– Calibration of systematic error resulting from degenerate alignment
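Calibrating confidence empirically means checking reported quality against observed errors, for example against golden-genome truth. The Python sketch below is an assumption-laden illustration, loosely in the spirit of the reported-versus-empirical quality comparison performed by GATK base quality score recalibration. It bins variant calls by reported Q, counts how many disagree with a hypothetical truth set, and re-expresses each bin's error rate as an empirical Q. The bin width, the Laplace (add-one) smoothing and the function name are illustrative choices, not the Personalis model.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def empirical_q_by_bin(
    calls: Iterable[Tuple[float, bool]], bin_width: int = 10
) -> Dict[int, Tuple[int, float]]:
    """Map each reported-Q bin to (number of calls, empirical Phred quality).

    calls: (reported_q, is_error) pairs, with is_error judged against a truth set.
    Laplace smoothing, (errors + 1) / (n + 2), keeps error-free bins finite.
    """
    n: Dict[int, int] = defaultdict(int)
    errors: Dict[int, int] = defaultdict(int)
    for q, is_error in calls:
        b = int(q // bin_width) * bin_width
        n[b] += 1
        errors[b] += int(is_error)
    result = {}
    for b in sorted(n):
        p = (errors[b] + 1) / (n[b] + 2)
        result[b] = (n[b], -10 * math.log10(p))
    return result


# Hypothetical calls: a Q30-39 bin with no observed errors, and an overconfident
# Q10-19 bin in which 2 of 8 calls disagree with the truth set.
calls = [(35.0, False)] * 998 + [(12.0, True)] * 2 + [(12.0, False)] * 6
for bin_start, (count, emp_q) in empirical_q_by_bin(calls).items():
    print(f"reported Q{bin_start}-{bin_start + 9}: n={count}, empirical Q={emp_q:.1f}")
```

A well-calibrated caller shows empirical Q close to each bin's reported Q. A consistent shortfall is the overestimated confidence described above, and correcting it by recalibration avoids the sensitivity cost of simply filtering.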

Accurate Confidence Assessment of
Clinical Interpretation

Personalis has developed a statistical framework to carry forward all analytical information to interpretation:

– This framework maintains the maximum statistical power of the data, and
– correctly distinguishes between modest-confidence variant detection and missing data, reducing false-negative bias in interpretation…
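The distinction between a low-confidence call and missing data can be made concrete by marginalizing risk over genotype uncertainty instead of hard-calling genotypes. The source does not describe the framework itself. The Python sketch below is a hypothetical illustration with several stated assumptions: loci are independent, each genotype has a likelihood ratio applied multiplicatively to the pretest odds, and each locus has a genotype posterior. Missing data is represented by a prior genotype distribution (e.g. population frequencies) rather than an assumed reference genotype. An uncalled risk locus therefore contributes its expected effect instead of silently pulling the estimate toward the no-risk genotype. All likelihood ratios and frequencies shown are made up.

```python
from itertools import product
from typing import Dict, List

Posterior = Dict[str, float]  # genotype -> P(genotype | data); sums to 1
LRs = Dict[str, float]        # genotype -> hypothetical likelihood ratio for disease


def disease_probability(pretest_prob: float, loci: List[Posterior], lrs: List[LRs]) -> float:
    """P(disease | data), marginalizing over genotype uncertainty at every locus.

    Assumes independent loci and that each genotype's likelihood ratio
    multiplies the pretest odds.
    """
    pretest_odds = pretest_prob / (1 - pretest_prob)
    total = 0.0
    # Enumerate every joint genotype assignment across loci (fine for a few loci).
    for combo in product(*(post.items() for post in loci)):
        weight, odds = 1.0, pretest_odds
        for (gt, p), lr in zip(combo, lrs):
            weight *= p
            odds *= lr[gt]
        total += weight * odds / (1 + odds)
    return total


lr = {"0/0": 1.0, "0/1": 2.0, "1/1": 4.0}
print(disease_probability(0.1, [{"0/0": 1.0}], [lr]))  # confident reference call
print(disease_probability(0.1, [{"0/0": 0.81, "0/1": 0.18, "1/1": 0.01}], [lr]))  # missing data
```

Here a confident reference call returns the pretest probability, while missing data returns a somewhat higher value reflecting the chance of an undetected risk allele. A modest-confidence call would fall between the extremes according to its posterior.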