From: Scott Kirk
To: John West; Christian Haudenschild; Richard Chen; Hugo Lam; Rong Chen; Mark Pratt
Subject: PGS v1 Requirements draft (rev 1.10)
Date: Wednesday, October 10, 2012 12:51:06 PM
Attachments: PGS_v1.10_RemainingRequirements.docx

Hi,

V1.10 is attached. It includes a few edits and additional items from yesterday's accuracy discussion, as well as a number of updates from this morning's discussion, which covered sections 8, 9, 10, and 11 and the part of section 6 on SVs.

Thanks,
Scott

Personalis EX2095

Personalis Genome Service

Remaining Requirements for v1.0

Draft v1.9

Version  Author                       Date
V1       RC, HL, MP, SK, ROC, GC, MC  9/30/12
V1.1     ROC, SK, HL                  10/1/12
V1.2     RC, SK, ROC, MP              10/2/12
V1.3     SK                           10/3/12
V1.4     HL                           10/4/12
V1.5     SK, MP, ROC, HL, RC          10/5/12
V1.6     SK, MP                       10/7/12
V1.7     SK, RC, MP, HL, ROC          10/7/12
V1.8     SK                           10/8/12
V1.9     SK                           10/8/12

Description
First consolidated version
Updated SV and pipeline requirements post discussion with Hugo.
Merged in first round of changes from Rong
Second round of changes from Rong
First round of changes from Mark. Reorganization and miscellaneous edits and some priority tagging
Pipeline performance testing
Priority tagging and miscellaneous edits. Additional refinement of exome plus requirements.
Further refinement of exome + requirements
Revisions post VA gap analysis - VA gaps are now traced to the numbering convention. Try not to change numbers.
Included mockup graphics in various sections and updated accuracy section some.
Additions and edits from accuracy discussion

Table of Contents

1 Key Differentiators
2 Scope
3 Key Dependencies (if any)
4 Format and Priority Definitions
5 Exome Plus Requirements
6 Structural Variants Requirements
7 Additional Pipeline Requirements:
8 Build a repository of public genomes/exomes to calculate population frequencies, serve as controls, test our pipeline, and identify disease mechanisms.
9 Curate proprietary databases with quality control
10 Additional Annotation requirements
11 Case/control discovery module
12 Database accuracy study
13 Reports Requirements
14 Performance:
15 Additional testing requirements
16 Additional Accuracy Related Features
17 Commercial

1 Key Differentiators

• Exome Plus support:
o Pulldowns for long and short read sequencing that include custom content. Provides a differentiated laboratory component in our offering, covering our proprietary content outside of standard exome assays as well as regions with known accuracy issues.

• End to End SV support:
o SV is an area where most competitors do very little. Offering more here significantly differentiates us. We will also be producing our own frequency data as well as deriving it from public data sets.

• Database accuracy study:
o Provides a check on the quality of our proprietary databases
o Provides marketing content around our proprietary databases

2 Scope

• Base Pipeline and Annotation engine.
• V1 of PGS
• Document also highlights features that are critical for VA proposal
• VA proposal

3 Key Dependencies (if any)

• Exome Plus:
o Custom pulldown orders from external manufacturers (Agilent, ILMN).
• End to End SV support:
o external tools
o test data
o databases
• Database accuracy study:
o tools to map the variants and phenotypes across proprietary and public databases

4 Format and Priority Definitions

Each requirement should include:
• Requirement statement: Statement of what the requirement is. Can be simple descriptive sentences or in user-based story format (as a [persona], I want [feature], so that I can [do something]). Independent of priority, “shall” generally means required and “may” means optional.
• Additional Description: Provide additional details and technical requirements as needed.
• Definition of Done: What criteria need to be satisfied for the requirement to be considered “done”. Helps to define how the requirement is tested.
• Priority: See definitions.
• Unique identifier: Will be added when we enter the requirements into Jira.

Priority definitions:
• P1 - Must have. Critical for the release.
• VA - Required for VA proposal.
• P2 - Nice to have. Implement if time allows.
• P3 - Future: not required, but is perhaps desirable in the future. May require some work to make them easier to implement later on.
• P4 - Deferred: the feature is deferred until a future/next release.

5 Exome Plus Requirements

5.1 Exome plus extends the standard exome to areas of medical significance outside the exome including regulatory regions, pharmGKB, Varimed, HGMD, and mendelian variants.
Priority: P1

Area                  Priority  Comments
Varimed               P1
PharmGKB              P1
Regulome tier 1       P1
Mendelian gene exons  P1
HGMD                  P1
Regulome tier 2       P2
Gene panels           P2        Could be a P2, it’s next up if time allows.

5.2 Exome plus fixes the following types of problematic regions in areas of medical significance in both exomes and whole genomes:
Priority: P3
Additional Description: For the first release of this, work is to be scoped through the lens of the clinical exome panels. For example, focus on fixing problematic regions in the cardiomyopathy, FH, and Long QT syndrome panels.

Area                                          Priority  Comments
Degenerate Alignment                          P2        use long reads to improve these areas
GC rich regions                               P2        can we improve coverage in GC rich regions (more iterative than the other three and implies more research)
Unstable expanding repeat regions/VNTRs/STRs  P2
Content intersecting SVs                      P2
Content intersecting compressions             P2
Phased exons(?)                               P2
                                              P2
                                              P2
HLA typing                                    P2
Allelotyping, unstable expanding repeats      P2

5.3 Design exome plus to augment established low coverage regions across the interpretable exome to achieve more uniform coverage and thus higher quality.
Priority: P2
Additional Description:
• High GC and poorly mapped (degenerately mapped) regions
• Solution is longer reads and a GC specific process.

5.4 Implement first pass Exome Plus to achieve 5.1, 5.2, 5.3

5.4.1 An exome supplement pulldown (ESP) to cover missed and low coverage areas in standard NGS exome sequencing shall be developed.
Priority: P1
Read length is 2x150 (?) and content shall include coverage of:

Area                                Priority  What it covers/details
VariMed                             P1
pharmGKB                            P1        Sarah’s set of 7800 genes and any SNP within vicinity of these genes
HGMD genes                          P1        3502 HGMD genes
Low coverage                        ?
High GC content                     ?
Missed splice sites                 ?
Homopolymer flanks                  P2
Important Regulome 1 - overlapped   P1
  with medical variants
Important Regulome 2                P1
Highly conserved regions            P1
Hgmd, omim, varimed, mendelDB       P1        cover key disease related variants in our proprietary databases as part of the pullout.
  overlap
Intronic variants (possible         P1
  differentiator)
HGMD 3400 disease causing genes     P1
ILMN Truseq exome expanded by 50    P1
  bases(?), trimmed down by low
  and zero coverage areas.

5.4.2 An exome long read pulldown (LRP) set to cover areas that are interpretable, but challenging with standard NGS exome sequencing, shall be developed.
Priority: P3
Read length is 2x250. It is therefore dependent upon selection, integration, and testing of a long read aligner.

Area                                Priority  What it covers/details
zero coverage regions               P3        areas in OMIM gene, pharmGKB, HGMD gene sets with mapping problems.
mapping problems                    P3        where assembly and compression is an issue hitting anything interpretable.
HLA regions                         P2
Unstable repeat dz                  P3
Exomic seg dups                     P3
Exomic compressions                 P3
Content sv breakpoints              P3
Exomic long strs                    P3
Pharmgkb problems                   P3
Common exomic indels                P3
Suspect medical snps                P3
Small, high value sets              P3
3 whole compressions on OMIM genes  P3
amylase family                      P3
SMN1                                P3
SMN2                                P3

5.4.3 An exome High GC Supplement (HGCS) to cover regions of high GC content known to be problematic in NGS exome sequencing shall be developed.
Priority: P3
Content shall include coverage of:

Area                                Priority  What it covers/details
Exomic high GC regions              P3        Including high GC SNPs
First exons                         P3        Very high GC. Poor coverage now in ILMN v3 chemistry is an improvement over the past, so not necessarily a knock against the method.

5.5 Exome+ pulldowns shall be evaluated to determine how well various experimental strategies work and to allow marketing communication about product performance and quality.

5.5.1 The overall quality of the Exome+ pulldown assay will be evaluated and summarized by measuring the following characteristics:
Priority: P1
We may want to add these to 5.11 for measuring within each type of content region as they could be different? They could also vary by vendor selected.
<<Need to determine spec numbers to test against>>

Metric                                 Target                                               Min. Acceptable
On/Off target quality /                Base coverage of XX% at >= 10x coverage depth. <XX>  XX% at >= 10x coverage depth. <XX>
  enrichment efficiency (specificity)
Sensitivity                            SNPs: <XX>, Indels: <XX>, SVs: <XX>                  SNPs: <XX>, Indels: <XX>, SVs: <XX>
Uniformity of coverage                 10, 20, 30x?? <XX>                                   <XX>
Reproducibility                        95% <XX>                                             <XX>
Other measures?

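The coverage and sensitivity metrics listed in 5.5.1 could be computed along the following lines. This is a minimal sketch, assuming per-base depths over the pulldown targets and read/variant counts are already available; the function names and the sample depths are hypothetical, and the XX thresholds from the table are deliberately left unspecified.

```python
from typing import Sequence


def fraction_at_depth(depths: Sequence[int], min_depth: int = 10) -> float:
    """Fraction of target bases covered at or above min_depth."""
    if not depths:
        raise ValueError("no target bases supplied")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


def on_target_rate(on_target_reads: int, total_reads: int) -> float:
    """Fraction of mapped reads that fall inside the pulldown targets."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return on_target_reads / total_reads


def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of known variants that were detected."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("no known variants supplied")
    return true_positives / total


if __name__ == "__main__":
    depths = [0, 4, 12, 25, 31, 9, 18, 40]  # hypothetical per-base depths
    for threshold in (10, 20, 30):
        print(f">= {threshold}x: {fraction_at_depth(depths, threshold):.2%}")
```

Reporting the same fraction at 10x, 20x, and 30x gives one simple view of uniformity; the acceptance values would still need to be set as noted above.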
5.5.2 The performance of Exome+ will be evaluated for each of the defined areas of content by measuring the following characteristics:
Priority: P1
<<Need to determine spec numbers to test against>>

Metric                                 Target                                               Min. Acceptable
Comparison to standard exome results   <XX>                                                 <XX>
On/Off target quality /                Base coverage of XX% at >= 10x coverage depth. <XX>  XX% at >= 10x coverage depth. <XX>
  enrichment efficiency (specificity)
Sensitivity                            SNPs: <XX>, Indels: <XX>, SVs: <XX>                  SNPs: <XX>, Indels: <XX>, SVs: <XX>
Uniformity of coverage                 10, 20, 30x?? <XX>                                   <XX>
Coverage of content region             10, 20, 30x? <XX>                                    <XX>
Sensitivity of detecting specific      SNPs: <XX>, Indels: <XX>, SVs: <XX>                  SNPs: <XX>, Indels: <XX>, SVs: <XX>
  medically relevant loci (per panel
  basis?)
Specificity of detecting specific      SNPs: <XX>, Indels: <XX>, SVs: <XX>                  SNPs: <XX>, Indels: <XX>, SVs: <XX>
  medically relevant loci (per panel
  basis?)
Reproducibility                        95% <XX>                                             <XX>

5.5.3 To improve our ability to measure quality and performance of the Exome+ pulldowns, produce specialized reports and analytics that summarize performance characteristics for the following areas:
For example: a report on “here’s our performance on HGMD variants: coverage, variant detections, mean confidence”
Possibly include ref calls at critical variant loci. Nice to know you have an affirmed null detection of some nasty variant.

Area                                      Priority  Comments
Varimed                                   P1
pharmGKB                                  P1
Regulome tier 1                           P1
Regulome tier 2                           P1
Mendelian gene exons                      P1
Blood typing                              P1        Double check VA proposal
HLA typing                                P2
Allelotyping, unstable expanding repeats  P3
Content intersecting SVs                  P3
Content intersecting compressions         P3
Phased exons(?)                           P3
Gene panels                               P3        Could be a P2, it’s next up if time allows.

5.6 Overall cost of exome plus needs to be significantly less than a whole genome. (What is the target number?)
Priority: P1

5.6.1 Produce a detailed cost model of the entire exome plus workflow and use it to determine what parameters we can work with to impact cost/sample.
Priority: P1
Additional Description: Determine what parameters can be worked with. It’s not necessarily content that reduces cost. Multiplexing, for example, is a possibility to reduce cost/sample.

5.6.2 Develop an itemized value assessment of content to show the marginal value of the data in various content sets.
Priority: P1

5.7 The clinical exome panel must “finish” the well established gene panels for cardiomyopathy, FH, and Long QT syndrome.
Priority: P3
Additional Description: First release is research (exome plus) and doesn’t include panels. Second release: clinical (exome panel)

5.7.1 For the cardiomyopathy, FH, and Long QT syndrome clinical panels, scope what the current accuracy is.
Priority: P3

5.7.2 Determine the data combination and weighting strategy(??)
?Provide the ability to combine and weight data to ???? Not sure what this is.
Priority:

5.7.3 The pipeline analysis shall be enabled to make calls on combined data sets for single genome/exome/supplement (how is this different than multisample support?)
Priority: P1

5.7.4 Provide the ability to allelotype unstable expanding repeat diseases.
Priority: P3

5.7.5 Provide the ability to perform reassembly of SDs, compressions, and SVs to ???
Priority: P1

5.7.6 Provide the ability to detect a sample’s HLA type from exome+ data.
Priority: P1

5.8 Integrate support for exome+ workflow(s) into pipeline to enable analysis, variant calling, and annotation of the data in an automated fashion for multiple samples.
Priority: P1

5.8.1 A long read aligner shall be enabled to support 2x250 reads from the LRP analysis.
Priority: P3

5.8.2 GATK-lite version currently implemented in the pipeline shall be examined to determine its compatibility with Exome+ data.

6 Structural Variants Requirements

6.1 Pipeline shall detect the following SV types:
Priority: P1
Definition of Done:
We do most of these already. It’s a matter of making improvements, doing really well with long deletions, and staying with what we have for the others for V1.

6.1.1 CNV
Priority: P2, VA

6.1.2 Insertion >50 bp (long insertion)
Priority: P2, VA

6.1.3 Deletion >50 bp (long deletion)
Priority: P1, VA
Feel we can get the most gain by doing well with long deletions.
Prioritize those that are medically interpretable.
Exome data deletion detection.

6.1.4 Inversion
Priority: P2, VA

6.1.5 Interchromosomal translocation (potential place we can stand out)
Priority: P2, VA

6.1.6 Intrachromosomal translocation
Priority: P2, VA

6.1.7 VNTR (detect STR, get some VNTR with Breakseq)
Priority: P2, VA

6.1.7.1 Integrate LobSTR for STR detection
Priority: P2, VA
Contributors: Hugo, Dan, Nan
Additional Description: Detecting low complexity SVs is one of the most difficult SV detections. No known products specifically target comprehensive STR detection. By using LobSTR, particularly in conjunction with our long reads capture, we shall detect the most STRs.
1. Integrate LobSTR as an SV detection tool in the SV module of the pipeline
a) Installing dependent modules on the server and in the pipeline
b) Adding a new SV algorithm to the pipeline
c) Integrate the STR calls with existing SV call set
Definition of Done:

6.1.8 MEI

6.1.8.1 Implement MEI (Mobile Element Insertion) detection. MEIs can be implicated in human disease such as schizophrenia (?)
Priority: P1, VA
Contributors: Hugo, Jing, Dan
Additional Description: Detecting repetitive elements has always been a big challenge in the field, particularly with relatively short reads; however, MEIs have been shown in various studies to cause different diseases and somatic differences. There is currently no known (publicly) available tool for MEI detection. Complete has MEI detection, but the quality is unknown. We shall develop a new detection tool with the best algorithms for detecting mobile element insertions such as L1 and ALUs.
Looks for gene disruption.
ILMN data on BWA needs its own algorithm written. Start from algorithms which have been published and rewrite for ILMN/BWA data.
There may be a way to take what we’re doing already as a first pass implementation for identifying “existing” MEI. A more involved implementation would consider “denovo” MEI.
LOE is 1.5 Mo (1 FTE) + validation time for the more involved implementation.
Definition of Done:

6.1.9 Polyploidy/aneuploidy
Priority: P3
Session with Sarah and Gemma to determine what we should be scanning for.
Definition of Done:

6.1.10 SVs shall be flagged in the GFF file output by the pipeline.
Priority: P1, VA

6.2 Pipeline shall improve the accuracy for detecting the above SVs at increased sensitivity and specificity compared to Complete or other competitive solutions (including validation).
Improve to add a “p” value to the detection, replacing the high/low classes that currently exist.
Testing:
Definition of Done:

6.2.1 Provide the ability to trade off between sensitivity and specificity on a per run/study basis.
Priority: P1, VA
Additional Description:
For example, in discovery research we may want to emphasize sensitivity over specificity, while for clinical applications specificity may be more important. ROC curves shall be constructed to measure this trade-off.
Definition of Done:

6.2.2 Specifically assess accuracy and performance for medically important genes that are the focal point of our disease panels. For example: cardiomyopathy/HCM, long QT syndrome, familial hypercholesterolemia, and the associated pharmacogenomic regions.
Priority: P1, VA
Definition of Done:

6.2.3 Generate latest breakpoint library from 1KG to improve accuracy of all types of SVs.
Priority: P1, VA
Contributors: Hugo, Jing
Additional Description: In collaboration with 1KG, particularly the SV group (Lam et al., >4 yrs experience), Yale (Gerstein’s lab, SV expert), and EBI (Jan Korbel, the SV leader in 1KG), a latest SV library from the 1000 genomes shall be generated with breakpoint and validation information. We shall also run breakseq and possibly other SV tools to generate stringent breakpoints. Improves both sensitivity and specificity.
Definition of Done:

6.2.4 Local reassembly to be more precise in resolving breakpoints and to validate the SVs.
Priority: P1, VA
Contributors: Hugo, Jing
Additional Description: None of the existing SV callers perform local reassembly. SVs detected by certain algorithms are also of low resolution where breakpoints are unclear. Perform local reassembly in breakpoint regions to determine more breakpoints and to validate the SVs.
Definition of Done:
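
The sensitivity/specificity trade-off described in 6.2.1 can be illustrated with a small ROC calculation. This is only a sketch under stated assumptions: each SV call is assumed to carry a numeric confidence score and a true/false label from some validation set, and `roc_points` is a hypothetical helper, not part of the pipeline.

```python
from typing import List, Tuple


def roc_points(scores: List[float], labels: List[bool]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs, one per score threshold.

    A call is accepted when its score is >= the threshold. Distinct thresholds
    are swept from highest to lowest, so the curve runs from (0, 0) to (1, 1).
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one true and one false call")
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
        points.append((fp / negatives, tp / positives))
    return points


if __name__ == "__main__":
    # Hypothetical scored calls: three validated SVs and one false call.
    for fpr, tpr in roc_points([0.9, 0.8, 0.7, 0.4], [True, True, False, True]):
        print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Picking a threshold near the low-FPR end of the curve would suit a clinical run, while a discovery run could accept a lower threshold to gain sensitivity.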

6.2.5 Report breakpoint resolution when possible to better assess the downstream effect on gene function
Priority: P1, VA
Contributors: Hugo, Jing
Dependency: 6.2.3, 6.2.4
Additional Description: Many of the SVs detected nowadays are not of breakpoint resolution. Algorithms shall be developed to report the breakpoint information for as many SVs as possible, e.g. by doing local reassembly in breakpoint regions.
Definition of Done:

6.2.6 Report zygosity for SVs when possible to better assess impact on gene function and disease, P1
Priority: P1, VA
Contributors: Hugo, Dan
Dependency: 6.2.5
Additional Description: Thus far, none of the SV algorithms report the SV zygosity. For the SVs detected by the pipeline, algorithms shall be developed to report the SV genotype with zygosity, such as homozygous/heterozygous deletions/insertions.
Definition of Done:

6.2.7 Refine SV merging algorithms to better annotate SV and increase precision of merging overlapping SVs (for example better detection of compound heterozygosity or overestimate on size of SVs)
Priority: P1, VA
Contributors: Hugo, Dan
Dependency: 6.2.6
Additional Description: SV merging is critical, as SVs detected by the algorithms are sometimes broken into fragments or overestimated, and multiple algorithms also report SVs that largely overlap; however, no known algorithm or software can accurately consolidate, merge, and then categorize these SVs. Without phasing information, it is particularly hard to do it accurately. Algorithms shall be developed to refine the current SV merging in the pipeline with the aid of multiple orthogonal SV algorithms, the zygosity information, and the SV breakpoint/frequency data sets we are generating.
Definition of Done:
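
One common starting point for the merging problem in 6.2.7 is clustering calls by reciprocal overlap. The sketch below is an illustration only, not the pipeline's merging algorithm: the `SV` record, the 50% threshold, and the greedy comparison against the first call in each cluster are all assumptions, and zygosity and breakpoint evidence are ignored.

```python
from typing import List, NamedTuple


class SV(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    sv_type: str
    caller: str  # hypothetical label for the algorithm that made the call


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Smaller of the two overlap fractions; 0.0 when the calls do not overlap."""
    if a.chrom != b.chrom or a.sv_type != b.sv_type:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))


def merge_calls(calls: List[SV], min_overlap: float = 0.5) -> List[List[SV]]:
    """Greedily cluster calls that reciprocally overlap the first call of the current cluster."""
    clusters: List[List[SV]] = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.sv_type, c.start, c.end)):
        if clusters and reciprocal_overlap(clusters[-1][0], call) >= min_overlap:
            clusters[-1].append(call)
        else:
            clusters.append([call])
    return clusters


if __name__ == "__main__":
    calls = [
        SV("chr1", 100, 200, "DEL", "caller_a"),
        SV("chr1", 110, 210, "DEL", "caller_b"),
        SV("chr1", 500, 600, "DEL", "caller_a"),
    ]
    for cluster in merge_calls(calls):
        print([f"{c.caller}:{c.start}-{c.end}" for c in cluster])
```

A cluster supported by several callers is a candidate for a single merged SV; how to choose its final coordinates (and avoid overestimating its size) is left open here.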
6.2.8 Provide an SV characterization tool to understand the formation mechanism of SVs (targeted to discovery researcher)
Priority: P2
Additional Description: Inference of the ancestral state of an SV locus is crucial for relating SV surveys to primate genome evolution and population genetics. Inference of its formation mechanism is crucial for understanding how the SV was generated, which may then intersect with exons of genes or lead to gene fusion events causing diseases. Characterization remained a challenging problem until our first publication (Lam et al. 2010). Based on our previous experience, we shall develop an in-house tool to infer SV formation mechanism and ancestral state.
Definition of Done:

6.3 Sensitivity and specificity for each type of SV shall be tested via the following methods
Main focus on long deletions for V1 (P1)

6.3.1 Create simulation data sets to model different types of SVs
Priority: P2
Contributors: Hugo, Ming, Mark, Jason
Additional Description: There is currently no known robust SV accuracy/error model. Use an existing simulation box, or create our own, to generate simulation data (e.g. sequence reads) to model different types of SV and to test their detection. If possible, a p-value should be generated and assigned to the SVs being called.
Definition of Done:

6.3.2 Comparison with SV results generated with orthogonal technologies (for example Complete Genomics data we’ve had sequenced).
Priority: P2
Manual validation of SV detected across multiple algorithms.
Definition of Done:

6.3.3 Describe relative performance statistics (sensitivity and specificity) for our SV detection algorithm versus:
6.3.3.1 Illumina seq + each individual SV detection algorithm (Pindel, etc)
6.3.3.2 Complete Genomics pipeline

6.3.4 Comparison to 1000 genomes and available results in the literature.
Priority: P2
As a golden data set to start working with.
Definition of Done:

6.3.5 Perform validations of detected SVs with PCR methods to validate accuracy and resolve discordance
Priority: P2
Contributors: Hugo, Jing, Shujun
Additional Description: For regions that are not validated, or are poorly validated, by public data sets such as 1KG and that are in content regions of interest/importance, PCR shall be performed to validate the existence and zygosity of the SVs.
Definition of Done:

6.3.6 Assess the impact of PCR or gel free sequencing on SV detection
Priority: P3
Contributors: Hugo, Jing, Shujun
Additional Description: PCR free might change the GC of sequences and gel free might change the fragment size, which may have an impact on various detection algorithms.
Definition of Done:

6.3.7 Finalize the new breakseq by testing full-scale using 1KG data to fine tune the sensitivity and performance of the algorithm(s)
Priority: P1, VA
Contributors: Hugo, Jing
Dependency: 6.2.3
Additional Description: We are the author of BreakSeq (Lam et al.) and a new version with zygosity information has just been released at Personalis. We will be testing the new version of breakseq on 1KG data (~1000 samples) and fine tuning the sensitivity and performance.
Definition of Done:

6.4 Build a unified and comprehensive SV/CNV database for downstream annotation
Priority: P1, VA (ask them)
Contributors: Hugo, Dan, Jing, Ming
Dependency: 6.2.5-6.2.8
Potentially focus on deletions at first.
Additional Description: Personalis will be the first one with a cleaned, unified, comprehensive, and largest SV database with different levels of resolution, different studies, validation data, the highest number of breakpoints, breakpoint formation mechanisms, ancestral states, individual zygosity information where possible, and comprehensive frequency data as well as possible phenotypic data.
1. Public call set (Hugo, Jing, Dan)
a. Gather SVs with different levels of resolution (e.g. breakseq library, 1KG CNVs, 1KG SVs, etc)
b. Develop algorithms to resolve the differences and enhance the quality of heterogeneous SVs and CNVs
2. Internal call set (Hugo, Dan, Ming)
a. Run SV module of the pipeline on our data sets to produce our own call set with possible zygosity and frequency data
i. Using uk10k as well (6000 exomes) - get allele frequency data for SVs.
ii. Look at the 25 genomes that we have.
iii. Grow these over time (aggregated stats from everything we run)
iv. Systematic approach to pulling in data to generate information such as frequency
3. Consolidate the call sets to generate the highest resolution data possible (Hugo, Dan)
4. Validate with public datasets (e.g. 1KG), local re-assembly, and internal experiments where necessary (e.g. important content regions) (Hugo, Jing)
5. Generate frequency data and do statistical validation such as Hardy-Weinberg. (Hugo, Dan)
6. Correlate the final, cleaned, unified database with any phenotypic data available. (Hugo, Dan, Jing)
Definition of Done:
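
Step 5 above mentions Hardy-Weinberg as a statistical check on SV frequency data. A minimal sketch of the standard chi-square test for a biallelic site follows, assuming genotype counts (reference-homozygous, heterozygous, alternate-homozygous) are available per SV from the zygosity calls; the function name is hypothetical. With one degree of freedom the p-value can be computed from `math.erfc`, so no external statistics package is needed.

```python
import math
from typing import Tuple


def hardy_weinberg_chi2(n_ref_hom: int, n_het: int, n_alt_hom: int) -> Tuple[float, float]:
    """Return (chi-square statistic, p-value) for a biallelic site, 1 degree of freedom."""
    n = n_ref_hom + n_het + n_alt_hom
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_ref_hom + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_ref_hom, n_het, n_alt_hom)
    # Skip classes with zero expectation (monomorphic sites).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    chi2, p_value = hardy_weinberg_chi2(25, 50, 25)  # hypothetical counts
    print(f"chi2={chi2:.3f}  p={p_value:.3g}")
```

A very small p-value for an SV across many samples may point to genotyping or merging errors rather than real population structure, which is why the check is useful as a QC step.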

6.5 Detect SV in exome regions and assess accuracy to strengthen our ability to “finish” exomes (esp. for clinical application of exome+). This is also a key differentiator because nobody else is doing this commercially.
Priority: P1
Additional Description: Detecting SV in exome regions is known to be hard and existing SV detection tools usually don’t work with exome data; however, SVs in the exome regions are critical, e.g. indels in BRCA genes. Also, no known exome product is known to generate SV calls. We shall detect CNV and SV from exome data using the latest exome-specific algorithms as well as algorithms developed in-house.
Definition of Done:

6.6 Choose an SV file format and multi-sample representation that works well for multi-algorithms, multi-sample, and downstream analyses.
Priority: P1
Contributors: Hugo, Dan, Steve, Rong
Additional Description: There is currently no known good file format for representing SVs, particularly for multiple samples. 1KG is using VCF, but it is known to be not optimal.
1. Currently GFF for every sample. (Dan)
2. Come up with a format or a way that works well for multi-algorithms and multi-sample, as well as the downstream analyses (Hugo, Dan)
3. Change the pipeline accordingly (Dan)
4. Think about what a multisample file should have in order to work for downstream analyses. GVF is a possible option for representing multisamples. (Rong, Steve)
Definition of Done:

6.7 Create stats, QC, and annotation report for SV to deliver results.
Priority: P1, VA
Contributors: Hugo, Dan, Rong, Michael, Steve
Additional Description:
1. For VA we’ll be annotating the GFF file for each individual sample.
2. In addition to annotated GFF, for v1 consider creating a multisample representation/merged SVs for annotation and comparison across samples. (Risk in this lies in the “fuzzy” criteria for merging across samples and annotating based on overlap with gene). Need to explain methodology and validate.
3. Variant (stat) report: We are currently reporting SV association with different genomic elements (e.g. repeat elements, genes) (Hugo, Dan)
a. elements (e.g. with other public datasets like DGV)
4. Gene report: Annotate disease for genes overlapped with SV (Rong, Michael, Steve) (see 6.5)
Definition of Done:

6.8 Create report for SV associations with diseases (for annotation) to deliver results
Priority: P1, VA?
Contributors: Rong, Steve, Michael, Hugo
Additional Description:
1. dbVar (Rong, Steve)
a) Investigate and download dbVar
b) Parse dbVar into MySQL
c) Separate SV in clinical and normal samples
d) Connect ID, genomic coordinate, and disease
e) Design and implement dbVar report
2. Generate report for diseases associated with SVs
Definition of Done:

6.9 Create report for statistically significant SV associations with genes, pathways, disease in case/control.
Priority: P2
Dependency: 6.7
Research project, but would be a large differentiator.

7 Additional Pipeline Requirements:

7.1 Provide the ability to determine relatedness, ethnicity, and sex to improve downstream annotation.
Priority: P1, VA for sex determination. P2 for relatedness and ethnicity.
Contributors: Hugo, Dan, Jing
Additional Description:
Consult with Carlos; he may have suggestions or some things we could use.
Definition of Done:

7.2 Provide the ability to determine blood type from sequence data to improve the ability to match sequence identity to the sample record for the VA proposal.
Priority: P1, VA
Definition of Done:

7.3 Provide the ability to determine HLA type to improve downstream annotation (especially for Exome+)
Priority: P2
Definition of Done:

7.4 Determine how the raw data should be delivered (e.g. file structures, formats, and transfer/devices)
Priority: P1, VA
Additional Description: Look at Complete and ILMN outputs we’ve received.
Definition of Done:

7.5 Test browser compatibility for output
Priority: P1
Contributors: Hugo, Ming
Additional Description: We will be testing IE8+, Firefox, Chrome, and maybe Safari for all HTML outputs. Prioritize Firefox and Chrome.
Definition of Done:

7.6 Test compatibility with public browsing tools and document how to use them. Tools include Broad-IGV, ??other browsing favorites?? Part of a larger interaction design.
Priority: P1
Definition of Done:

7.7 Get GATK2.0 license when available and consider performing update to the full GATK2.0 version.
Priority: P1
Contributors: Hugo, Dan, Scott
Additional Description:
Full GATK update in the fall when Broad releases the version from beta. Mostly consists of integration work.
Definition of Done:

7.8 Unique Personalis IDs shall be assigned to all variants generated.
Priority: P1
Contributors: Hugo, Dan, Michael
Additional Description: Generate unique Personalis variant IDs for all variants generated. Particularly important for the variants that don’t have RSIDs.
Definition of Done:

7.9 Integrate analysis and annotation pipeline with our sample tracking/LIMS system from Genologics to pull necessary metadata to automate analysis and report generation.
Priority: P1
Contributors: Hugo, Ming, Nan
Additional Description: At a minimum includes sample names, case/control, sex, ethnicity, and age, as well as customer information for creating business elements reports. Additional information that is provided is contained in Christian’s DB layout and Genologics documentation.
Definition of Done:

7.10 Pipeline shall provide the ability to cross-validate sequencing with array results - multiplatform support.
Priority: P1, VA
Contributors: Hugo, Dan, Jing
Additional Description:
1. On samples run at ILMN the array is run alongside the sequencing and we get the data back from them. Need specifics of the results returned.
2. Use for validating SNPs and genotypes
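
For requirement 7.8, one possible way to obtain stable IDs for variants without RSIDs is to derive them deterministically from the variant's coordinates and alleles. The sketch below is an assumption-laden illustration: the `PV` prefix, the key layout, and the 16-character SHA-1 truncation are hypothetical choices, and a production scheme would also need allele normalization and a registry to guard against collisions.

```python
import hashlib


def variant_id(chrom: str, pos: int, ref: str, alt: str, prefix: str = "PV") -> str:
    """Build a deterministic ID from the variant's chromosome, position, and alleles."""
    # Treat "chr1" and "1" as the same chromosome name.
    chrom = chrom[3:] if chrom.lower().startswith("chr") else chrom
    key = f"{chrom.upper()}:{pos}:{ref.upper()}:{alt.upper()}"
    digest = hashlib.sha1(key.encode("ascii")).hexdigest()[:16]
    return f"{prefix}_{digest}"


if __name__ == "__main__":
    print(variant_id("chr1", 12345, "A", "T"))
```

Because the ID depends only on the variant itself, the same variant seen in two samples or two pipeline runs receives the same identifier.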
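
The cross-validation in 7.10 amounts to a genotype concordance check between the two platforms. A minimal sketch follows, assuming both call sets have already been reduced to a mapping from a site key (for example an rsID) to an unordered pair of alleles; the data layout and the function name are hypothetical, since the specifics of the array results returned by ILMN are still to be determined.

```python
from typing import Dict, Tuple

Genotype = Tuple[str, str]


def genotype_concordance(
    seq_calls: Dict[str, Genotype], array_calls: Dict[str, Genotype]
) -> Tuple[int, int, float]:
    """Compare unordered genotypes at sites present on both platforms.

    Returns (sites compared, sites in agreement, concordance rate).
    """
    shared = seq_calls.keys() & array_calls.keys()
    if not shared:
        raise ValueError("no sites in common between sequencing and array")
    matches = sum(
        1 for site in shared if sorted(seq_calls[site]) == sorted(array_calls[site])
    )
    return len(shared), matches, matches / len(shared)


if __name__ == "__main__":
    # Hypothetical calls keyed by rsID.
    seq = {"rs1": ("A", "G"), "rs2": ("C", "C"), "rs3": ("T", "T")}
    array = {"rs1": ("G", "A"), "rs2": ("C", "T"), "rs4": ("A", "A")}
    compared, agreed, rate = genotype_concordance(seq, array)
    print(f"{agreed}/{compared} concordant ({rate:.0%})")
```

Sites that disagree are the natural candidates for follow-up when validating SNPs and genotypes.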