`From:
`To: Mark Pratt
`Subject: My slides for meeting with Mike Snyder yesterday
`Date: Tuesday, May 22, 2012 6:38:43 AM
`Attachments: Overview for Mike S 21May2012 JW.ppt
`
`Hi Mark,
`
`I am attaching my slides from yesterday's meeting with Mike Snyder; some may be useful for
`bringing Jason up to speed?
`
`I didn't realize Gabor had initial MIE data from the quartet genomes, and I am sorry we ran out
`of time for that, though it seemed so new that you all are still sifting through it.
`When you have, could we walk through it?
`
`Thanks,
`
`John
`
`Personalis EX2055
`
`
`
`Personalis, Inc.
`
`Accuracy Overview
`John West
`May 21, 2012
`
`Personalis Confidential May 21, 2012
`
`
`
`
`Overview of Personalis Accuracy Program
`
`Characterize errors
`• Analysis
`– Rates by variant type & Q-score
`– Underlying mechanisms
`– Medical interpretation impact
`– Regulatory impact
`• Methods
`– Comparison with literature
`– Studies on limited regions
`– Whole genome analysis:
`• Single genome metrics
`• Same person twice
`• Family inheritance (MIE)
`– Platform comparison
`• Arrays
`• ILMN / CGI / Ion T…
`• Sanger capillary
`• Prioritize solutions
`– Bio-informatic approaches
`– Laboratory approaches
`– Content approaches
`
`Informatic Solutions
`• Advances in 3rd party alg’s
`– Aligner for > 200bp reads
`– New versions, current alg’s
`– Other alg’s (e.g. lobSTR)
`• Advances in our alg’s:
`– SV’s : Zygosity, accuracy
`– Combine SV’s & SNP’s
`– Broader use of Breakseq
`• New Personalis alg’s
`– Alignment degeneracy
`– Genome-aware alignment
`– VNTR-specific
`• Advances in reference
`– Ethnic major-allele ref’s
`– Iterative
`– Admixed
`– InDel’s / SV’s
`• Flag or Filter Out
`– For problems not yet solved
`
`Lab, Content Solutions
`• Combine technologies
`– WGS + exome
`– Longer reads and/or inserts
`– Multi-vendor
`– Arrays (genotypes & SV’s)
`– Electrophoretic sizing
`– Sanger validation
`• Content-based semi-custom
`– Array (genotypes & SV’s)
`– Pullout
`• Custom sample preps
`– Problematic motifs
`– Degenerate regions
`– Phasing
`• Content
`– Gold standard genomes
`– Known healthy genomes
`
`
`
`
`Overview
`
`• (Mis)Alignment in degenerate regions
`• Raw read errors (systematic)
`• Problems in the reference
`• Tools for identifying error sources & rates
`
`
`
`
`Example of process variation in genome sequencing
`Four individuals had blood drawn together at the same hospital, sent to the same
`CLIA / CAP accredited lab, and sequenced on the same set of instruments.
`
`
`
`
`Occurrence of bases with very low raw quality scores
`deviates markedly from a random (binomial) model
`
`# Reads below quality threshold (Loci with 30-40x coverage)
`
`Profile of systematic errors is remarkably similar between these two genomes
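The comparison with a random (binomial) model described on this slide could be sketched as follows. This is only an illustration under stated assumptions, not the analysis behind the slide: it assumes a single constant coverage per locus (the slide pools loci with 30-40x coverage), estimates the per-read probability of falling below the quality threshold from the pooled data, and uses hypothetical function names.

```python
from collections import Counter
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k low-quality reads out of n under a binomial model."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def compare_to_binomial(low_q_counts, coverage):
    """Compare observed per-locus low-quality read counts with a binomial model.

    low_q_counts: number of reads below the quality threshold at each locus.
    coverage: assumed constant read depth at every locus (a simplification).
    Returns the pooled rate and a list of (k, observed_loci, expected_loci).
    """
    n_loci = len(low_q_counts)
    p_hat = sum(low_q_counts) / (n_loci * coverage)  # pooled low-quality rate
    observed = Counter(low_q_counts)
    rows = []
    for k in range(coverage + 1):
        expected = n_loci * binomial_pmf(k, coverage, p_hat)
        rows.append((k, observed.get(k, 0), expected))
    return p_hat, rows
```

If low-quality bases were randomly scattered, the observed and expected columns would track each other. An excess of loci with many low-quality reads is the kind of deviation the slide attributes to systematic errors.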
`
`
`
`
`There are over 50,000 homopolymers with length > 24 in the human genome
`Homopolymers of G’s or C’s are expected to be more severe
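A count like the one quoted above can be reproduced in principle by scanning a reference sequence for long single-base runs. The sketch below is a minimal illustration, assuming the sequence is available as a plain string of A/C/G/T; the function name and the default threshold (25, matching "length > 24") are choices made here, not taken from the slides.

```python
import re


def find_homopolymers(seq, min_len=25):
    """Return (start, length, base) for each run of one base with length >= min_len."""
    # One base followed by at least (min_len - 1) repeats of that same base.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [
        (m.start(), m.end() - m.start(), m.group(1))
        for m in pattern.finditer(seq.upper())
    ]
```

Grouping the results by base would show how many of the runs are G or C homopolymers, the class the slide expects to be more severe.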
`
`
`
`
`Region with “normal” sequence
`
`Region with “problematic” sequence motif
`
`Tools to search for systematic errors across the genome, with their underlying
`mechanisms of action, will be followed by bio-informatic flags and (more
`importantly) bio-informatic and laboratory solutions.
`
`
`
`
`Directional raw Q-score distribution around a 25 A homopolymer
`The impact is downstream, not in the feature
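One way to look for a directional effect of this kind is to average raw quality scores per reference position, separately for forward and reverse reads. The sketch below is an assumption-laden illustration, not the tool used for the slide: it assumes ungapped alignments, reads supplied as hypothetical `(ref_start, is_reverse, quals)` tuples, and quality values already ordered along the reference.

```python
from collections import defaultdict


def mean_q_by_position(reads):
    """Mean raw Q-score per reference position, split by read direction.

    reads: iterable of (ref_start, is_reverse, quals), where quals is a list of
    integer quality scores in reference order (ungapped alignment assumed).
    Returns {False: {pos: mean_q}, True: {pos: mean_q}} for forward and reverse reads.
    """
    sums = {False: defaultdict(int), True: defaultdict(int)}
    counts = {False: defaultdict(int), True: defaultdict(int)}
    for ref_start, is_reverse, quals in reads:
        for offset, q in enumerate(quals):
            pos = ref_start + offset
            sums[is_reverse][pos] += q
            counts[is_reverse][pos] += 1
    return {
        strand: {pos: sums[strand][pos] / counts[strand][pos] for pos in sums[strand]}
        for strand in (False, True)
    }
```

If the impact is downstream of the feature in the direction of sequencing, the forward profile would be expected to drop on one side of the homopolymer and the reverse profile on the other.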
`
`
`
`
`[Figure panels: forward reads, reverse reads, forward raw Q scores, reverse raw Q scores; feature marked: 25 A’s]
`
`
`
`
`Chr 1 : 103,508,600.
`Coverage almost goes to zero, in each direction separately and somewhat offset, at an 18 x 18-mer repeat.
`PG20. Major cluster of palindromes (74 of them), most with 16-19 base matching lengths.
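A simple scan for palindromic motifs like those mentioned above might look like the following. It is a sketch under assumptions that the slide does not state: "palindrome" is taken to mean a perfect reverse-complement palindrome with no spacer between the arms, "matching length" is taken to mean the arm length, and the function name and threshold are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def find_palindromes(seq, min_arm=16):
    """Return (start, end, arm_length) for reverse-complement palindromes.

    For each position between two bases, the match is extended outward while the
    left base pairs with the right base. Only perfect, spacer-free palindromes
    with arm_length >= min_arm are reported; end is exclusive.
    """
    seq = seq.upper()
    hits = []
    for center in range(1, len(seq)):
        arm = 0
        while (
            center - arm - 1 >= 0
            and center + arm < len(seq)
            and COMPLEMENT.get(seq[center - arm - 1]) == seq[center + arm]
        ):
            arm += 1
        if arm >= min_arm:
            hits.append((center - arm, center + arm, arm))
    return hits
```

Real inverted repeats often have a spacer or mismatches, so a production search would need a more tolerant matcher than this exact scan.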
`
`
`
`
`Sequence Motifs Which Underlie Some Errors
`Discussion
`
`• Studies Mike is aware of which identify these
`• Bioinformatic methods to read through these
`• Laboratory methods to read through these
`• Contribution of this class of error mechanism to the overall next-gen
`sequencing error rate
`
`
`
`
`Progression of Genome Data Sets for Accuracy Studies
`
`Founder Genomes
`
`Downloading
`
`Custom Sequenced
`
`• West quartet (2009)
`– MIE analysis:
`• SNP (Dewey)
`• SV’s (Manual, WG)
`• InDel (TBD)
`– Anecdotal error studies on portions of the genome:
`• Phenomena
`• Error rates
`• Underlying mechanisms
`– GA IIx vs HiSeq (early 2011)
`
`• Snyder son/mother
`– Data behind publications:
`• Illumina vs CGI
`• Snyderome paper
`– Some inheritance validation
`
`• HapMap NA12878
`– Broad data set (late 2010):
`• Hi-Seq 2x100
`• 10 PE insert lengths
`• Total coverage > 500
`– Same person “twice”:
`• Same library
`• Different library
`• High coverage
`– SV validation (Snyder)
`– Fluor. Array data
`– HapMap genotypes
`– Some relatives’ data available
`
`• Korean PGP
`– 40 HiSeq 2x100 genomes
`
`• CEPH1463
`– 3 generations, 17 people
`– Mother is HapMap NA12878
`– To be “gold std”
`– All 17 sequenced by CGI
`– Test-bed for Personalis max accuracy combination of technologies
`– Use to assess other tech’s:
`• Ion Proton
`• Oxford Nanopore
`• MiSeq 2x250
`– Anonymous samples (ethics)
`– No Personalis relationship
`
`• Venter Genome
`– Only individual whole genome on Sanger technology
`
`
`
`
`Segmentation of the 17-member pedigree by priority for sequencing
`
`
`
`
`All 11 children
`
`Several optimal subsets with 5 of the 11 children
`
`Mendelian Inheritance Error (MIE) analysis can be confounded by genomic variation
`induced by the cell culture process. Being able to see each parental allele in two
`separate children should let us separate most of these culture-induced changes from
`sequencing artifacts, and so identify both more reliably.
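The basic check behind an MIE analysis can be sketched for a single site as follows. This is a minimal illustration, assuming unphased diploid genotypes given as 2-tuples of allele strings and ignoring de novo mutation, sex chromosomes, and missing calls; the function names are hypothetical and it is not the pipeline referred to in the slides.

```python
from itertools import product


def is_mendelian_consistent(child, mother, father):
    """True if the child's genotype can be formed from one maternal and one paternal allele."""
    return any(
        sorted((m, f)) == sorted(child) for m, f in product(mother, father)
    )


def mie_children(mother, father, children):
    """Return the names of children whose genotype is a Mendelian inheritance error.

    children: dict mapping child name to genotype at this site.
    """
    return [
        name
        for name, genotype in children.items()
        if not is_mendelian_consistent(genotype, mother, father)
    ]
```

With many children per couple, as in the 17-member pedigree, a site flagged in a single child can be weighed against the siblings who inherited the same parental alleles.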
`
`
`
`
`Personalis Laboratory Strategy
`• Laboratory goals:
`– Direct relationship with the customer
`– Quality control, sample to results (important for research and later clinical)
`– Position as “Value-Added Genome”, not “a bio-informatics company”
`– Offer a superior genome, by combining technologies from multiple companies
`– Develop superior sample preps, especially ones based on Personalis content
`• Outsourcing:
`– Whole genome sequencing service pricing now < reagents to do it yourself
`– Avoid infrastructure costs (instruments, initial computation, staffing)
`– Turn-around time is a disadvantage (quotes at 12-16 weeks)
`– Other assays we can outsource:
`• Arrays (standard & semi-custom)
`• Karyotyping
`• Exomes, RNAseq
`• Personalis lab
`– Receive, de-identify samples
`– QC, sample tracking
`– Ship aliquots to multiple vendors
`– Validations following primary analysis
`– Develop custom sample preps
`
`
`