US011408033B2

(12) United States Patent
Bartha et al.

(10) Patent No.: US 11,408,033 B2
(45) Date of Patent: Aug. 9, 2022
`
`(54) METHODS AND SYSTEMS FOR GENETIC
`ANALYSIS
`
`(71) Applicant: Personalis, Inc., Menlo Park, CA (US)
`
(72) Inventors: Gabor T. Bartha, Los Altos, CA (US);
Gemma Chandratillake, Cambridge
(GB); Richard Chen, Burlingame, CA
(US); Sarah Garcia, Palo Alto, CA
(US); Hugo Yu Kor Lam, Sunnyvale,
CA (US); Shujun Luo, Castro Valley,
CA (US); Mark R. Pratt, Roseburg,
OR (US); John West, Cupertino, CA
(US)
`
`(73) Assignee: Personalis, Inc., Menlo Park, CA (US)
`
(*) Notice: Subject to any disclaimer, the term of this
patent is extended or adjusted under 35
U.S.C. 154(b) by 0 days.
`
`(21) Appl. No.: 17/078,857
`
(22) Filed: Oct. 23, 2020

(65) Prior Publication Data

US 2021/0062258 A1 Mar. 4, 2021
`
`(58) Field of Classification Search
`None
`See application file for complete search history.
`
(56) References Cited

U.S. PATENT DOCUMENTS

9,128,861 B2       9/2015   Bartha et al.
9,745,626 B2       8/2017   Bartha et al.
10,266,890 B2      4/2019   Bartha et al.
10,415,091 B2      9/2019   Bartha et al.
2002/0006615 A1    1/2002   Goldsborough et al.
2005/0042668 A1    2/2005   Perlin
2006/0184489 A1    8/2006   Weiner et al.
2006/0278241 A1    12/2006  Ruano
2010/0042438 A1    2/2010   Moore et al.
2011/0009296 A1    1/2011   Kain et al.
2012/0077682 A1*   3/2012   Bowcock et al. ..... G01N 33/57496; 506/2
2012/0270206 A1    10/2012  Ginns et al.
2018/0051338 A1*   2/2018   West et al. ..... G16B 30/00
2021/0047687 A1    2/2021   Bartha et al.

FOREIGN PATENT DOCUMENTS

EP   0281927 A2           9/1988
WO   WO 2011/160063 A2    12/2011
WO   WO-2011160206 A1     12/2011
WO   WO-2014113204 A1     7/2014
`
Related U.S. Application Data

(60) Continuation of application No. 16/816,135, filed on
Mar. 11, 2020, now abandoned, which is a
continuation of application No. 16/526,928, filed on
Jul. 30, 2019, now abandoned, which is a
continuation of application No. 15/996,215, filed on
Jun. 1, 2018, now Pat. No. 10,415,091, which is a
continuation of application No. 14/810,337, filed on
Jul. 27, 2015, now Pat. No. 10,266,890, which is a
division of application No. 14/141,990, filed on Dec.
27, 2013, now Pat. No. 9,128,861.

(60) Provisional application No. 61/753,828, filed on Jan.
17, 2013.

(51) Int. Cl.
C12Q 1/6874   (2018.01)
G16B 20/00    (2019.01)
G16B 30/00    (2019.01)
G16B 99/00    (2019.01)
G16B 20/10    (2019.01)
G16B 20/20    (2019.01)
G16B 35/10    (2019.01)
C12Q 1/6806   (2018.01)
C12Q 1/6869   (2018.01)
G16B 35/00    (2019.01)
G16C 20/60    (2019.01)

(52) U.S. Cl.
CPC ......... C12Q 1/6874 (2013.01); C12Q 1/6806
(2013.01); G16B 20/00 (2019.02); G16B
20/10 (2019.02); G16B 20/20 (2019.02);
G16B 30/00 (2019.02); G16B 35/10
(2019.02); G16B 99/00 (2019.02); C12Q
1/6869 (2013.01); G16B 35/00 (2019.02);
G16C 20/60 (2019.02)

OTHER PUBLICATIONS

Pritchard et al., "ColoSeq Provides Comprehensive Lynch and
Polyposis Syndrome Mutational Analysis Using Massively Parallel
Sequencing," J. Mol. Diagn. 2012, 14:357-366, published online
May 30, 2012. (Year: 2012).*
(Continued)

Primary Examiner - Kaijiang Zhang
(74) Attorney, Agent, or Firm - Orrick, Herrington &
Sutcliffe, LLP

(57) ABSTRACT
`
`This disclosure provides systems and methods for sample
`processing and data analysis. Sample processing may
`include nucleic acid sample processing and subsequent
`sequencing. Some or all of a nucleic acid sample may be
`sequenced to provide sequence information, which may be
`stored or otherwise maintained in an electronic storage
`location. The sequence information may be analyzed with
`the aid of a computer processor, and the analyzed sequence
`information may be stored in an electronic storage location
that may include a pool or collection of sequence information
and analyzed sequence information generated from the
`nucleic acid sample. Methods and systems of the present
`disclosure can be used, for example, for the analysis of a
`nucleic acid sample, for producing one or more libraries, and
`for producing biomedical reports. Methods and systems of
the disclosure can aid in the diagnosis, monitoring, treatment,
and prevention of one or more diseases and conditions.
`
`23 Claims, 14 Drawing Sheets
`
`Foresight EX1001
`Foresight v Personalis
`
`
`
`
(56) References Cited

OTHER PUBLICATIONS

Boers et al., "High-Throughput Multilocus Sequence Typing: Bringing
Molecular Typing to the Next Level," PLoS ONE 2012;
7(7):e39630.
Clark, et al. Performance comparison of exome DNA sequencing
technologies. Nat Biotechnol. Sep. 25, 2011;29(10):908-14. doi:
10.1038/nbt.1975.
Co-pending U.S. Appl. No. 16/526,928, inventors Bartha; Gabor T. et
al., filed Jul. 30, 2019.
Co-pending U.S. Appl. No. 16/816,135, inventors Bartha; Gabor T. et
al., filed Mar. 11, 2020.
Co-pending U.S. Appl. No. 17/235,776, inventors West; John et al.,
filed Apr. 20, 2021.
Craig, et al. Identification of genetic variants using bar-coded
multiplexed sequencing. Nat Methods. Oct. 2008;5(10):887-93.
Epub Sep. 14, 2008.
European search report and opinion dated Aug. 4, 2016 for EP
Application No. 13871784.
Gottlieb, et al. The DiGeorge syndrome minimal critical region
contains a goosecoid-like (GSCL) homeobox gene that is expressed
early in human development. Am J Hum Genet. May 1997;60(5):1194-201.
Human Genome Overview GRCh37. Genome Reference Consortium,
Feb. 27, 2009. 1 Page.
Human Genome Overview GRCh37.p13. Genome Reference Consortium,
Jun. 28, 2013. 2 Pages.
Human Genome Overview GRCh38.p12. Genome Reference Consortium,
Dec. 21, 2017. 2 Pages.
International search report and written opinion dated Apr. 23, 2014
for PCT/US2013/078123.
Li, et al. Novel computational methods for increasing PCR primer
design effectiveness in directed sequencing. BMC Bioinformatics.
Apr. 11, 2008;9:191. doi: 10.1186/1471-2105-9-191.
Market, et al., V(D)J Recombination and the Evolution of the
Adaptive Immune System. PLOS Biol. 2003; 1(1):e16.
https://doi.org/10.1371/journal.pbio.0000016.
Notice of allowance dated Jun. 3, 2015 for U.S. Appl. No. 14/141,990.
Notice of Allowance dated Jun. 9, 2017 for U.S. Appl. No. 15/222,875.
Office action dated Feb. 6, 2015 for U.S. Appl. No. 14/141,990.
Office Action dated Feb. 27, 2017 for U.S. Appl. No. 15/222,875.
Office action dated Jun. 5, 2014 for U.S. Appl. No. 14/141,990.
Ralph, et al., Consistency of VDJ Rearrangement and Substitution
Parameters Enables Accurate B Cell Receptor Sequence Annotation.
PLOS computational biology, 2016; 12(1): e1004409.
https://doi.org/10.1371/journal.pcbi.1004409.
U.S. Appl. No. 14/810,337 Notice of Allowance dated Feb. 28,
2019.
U.S. Appl. No. 14/810,337 Notice of Allowance dated Jan. 18, 2019.
U.S. Appl. No. 14/810,337 Office Action dated Apr. 9, 2018.
U.S. Appl. No. 15/996,215 Notice of Allowance dated May 15,
2019.
U.S. Appl. No. 15/996,215 Office Action dated Dec. 31, 2018.
U.S. Appl. No. 17/080,474 Notice of Allowance dated Jul. 19, 2021.
U.S. Appl. No. 17/080,474 Office Action dated Mar. 26, 2021.
Co-pending U.S. Appl. No. 17/507,578, inventors Bartha; Gabor T. et
al., filed Oct. 21, 2021.
Co-pending U.S. Appl. No. 17/548,379, inventors Bartha; Gabor T. et
al., filed Dec. 10, 2021.
Hong, et al., Tracking the origins and drivers of subclonal metastatic
expansion in prostate cancer. Nature Communications, Apr. 1, 2015;
vol. 6, No. 1: pp. 1-12. XP055501144.
U.S. Appl. No. 17/235,776 Office Action dated Aug. 17, 2021.
`
`* cited by examiner
`
`
`
`
U.S. Patent    Aug. 9, 2022    Sheet 1 of 14    US 11,408,033 B2

[FIG. 1: drawing not reproduced. Reference numerals shown: 101, 110, 120, 130, 135.]
`
`
`
[FIG. 2A, FIG. 2B, FIG. 2C, FIG. 2D: drawings not reproduced. Legible labels include "Analysis 1" and "Output".]

[FIG. 3: drawing not reproduced.]

[FIG. 4: drawing not reproduced.]

[FIG. 5: drawing not reproduced.]

[FIG. 6: drawing not reproduced.]

[FIG. 7: drawing not reproduced.]

[FIG. 8: drawing not reproduced.]

[FIG. 9: drawing not reproduced. Legible labels appear to include "Analysis 1" and "Output 1".]
`
`
`
[FIG. 10: flowchart not reproduced. Legible box labels: Variant & Mutation Databases; Merge Variants Regions; Personalis Variant Set; Target Gene Sets, Clinical Panels, Lists; Regulatory Databases (Regulome); Alternate Sequences; Compile Content; Compile Exon, UTR and Splice Site Lists; Gene Definitions; Compile Gene Regulatory Regions; Compile Target Gene Regions; Gene Phasing or Reassembly Targets; Problematic Region Definitions; Phasing Required; Standard Exome; Personalis Net Content; Classify & Prioritize (OK / Not OK); Standard Exome Targets; Easy & Not Targeted; High GC Content; Low Complexity, STR, Expanding Repeats; Mapping Problems; Alternate Sequences; Supplementary Probes & Protocol; High GC Probes & Protocol; Long Read Probes & Protocol.]
`
`
`
[FIG. 11: flowchart not reproduced. Legible box labels: Sample; DNA Extraction; Fragmentation; Aliquot; Standard Protocol & Exome + Supplement Pullout; High GC Protocol & Pullout; HiSeq 2x100bp Sequencing; Align, Analyze and/or Assemble, Call Variants; Biomedical Databases; Data Pool; Specified Annotation & Interpretation; Biomedical Reports.]
`
`
`
[FIG. 12: flowchart not reproduced. Legible box labels, first branch: Sample; Std Fragmentation & End Repair; Std. Protocol & Exome and Supplement Pullout; HiSeq 2x100bp Sequencing; Align, Analyze, and/or Assemble, Call Variants. Second branch: Hi GC Fragmentation & End Repair; High GC Protocol & Pullout; HiSeq 2x100bp Sequencing; Align, Analyze, and/or Assemble, Call Variants. Shared: Biomedical Databases; Data Pool; Specified Annotation & Interpretation; Biomedical Reports.]
`
`
`
[FIG. 13: flowchart not reproduced. Legible box labels: Sample; High GC Protocol & Pullout; Standard Protocol & Exome + Supplement Pullout; HiSeq 2x100bp Sequencing (two boxes); Biomedical Databases; Align, Analyze and/or Assemble, Call Variants; Specified Annotation & Interpretation; Biomedical Reports.]
`
`
`
[FIG. 14: flowchart not reproduced. Legible box labels: Sample; DNA Extraction; Fragmentation & End Repair; Long; Short; Long Read Pullout & Read Prep; Standard Protocol & Exome + Supplement Pullout; High GC Protocol & Pullout; HiSeq 2x100bp Sequencing; Biomedical Databases; Align, Analyze, and/or Assemble, Call Variants; Specified Annotation & Interpretation; Biomedical Reports.]
`
`
`
[FIG. 15: flowchart not reproduced. Legible box labels: DNA Extraction; Fragmentation and End Repair; Size (Long / Short); Aliquot; Long Read Pullout and Long Read Prep; PacBio RS 5 kbp or MiSeq 2x250bp; Align, Analyze and/or Assemble, Call Variants (two boxes); Standard Protocol & Exome + Supplement Pullout; HiSeq 2x100 bp Sequencing; Specified Annotation & Interpretation. Two further boxes beginning "Biomedical" are truncated in the source.]
`
`
`
`METHODS AND SYSTEMS FOR GENETIC
`ANALYSIS
`
`CROSS-REFERENCE
`
`This application is a continuation application of U.S.
`patent application Ser. No. 16/816,135, filed Mar. 11, 2020,
`which application is a continuation application of U.S.
`patent application Ser. No. 16/526,928, filed Jul. 30, 2019,
`which application is a continuation application of U.S.
`patent application Ser. No. 15/996,215, filed Jun. 1, 2018,
`now U.S. Pat. No. 10,415,091, which application is a
`continuation application of U.S. patent application Ser. No.
14/810,337, filed Jul. 27, 2015, now U.S. Pat. No. 10,266,890,
which application is a divisional application of U.S.
`patent application Ser. No. 14/141,990, filed Dec. 27, 2013,
`now U.S. Pat. No. 9,128,861, which claims priority to U.S.
`Provisional Application No. 61/753,828, filed Jan. 17, 2013,
`each of which is incorporated herein by reference in its
`entirety.
`
BACKGROUND

Current methods for whole genome and/or exome
sequencing may be costly and fail to capture many biomedically
important variants. For example, commercially available
exome enrichment kits (e.g., Illumina's TruSeq exome
enrichment and Agilent's SureSelect exome enrichment)
may fail to target biomedically interesting non-exomic and
exomic regions. Often, whole genome and/or exome
sequencing using standard sequencing methods performs
poorly in content regions having very high GC content
(>70%). Furthermore, whole genome and/or exome
sequencing also fails to provide adequate and/or cost-effective
sequencing of repetitive elements in the genome.
The methods disclosed herein provide specialized
sequencing protocols or technologies to address these issues.

SUMMARY

Provided herein is a method for analyzing a nucleic acid
sample, comprising (a) producing two or more subsets of
nucleic acid molecules from a nucleic acid sample, wherein
(i) the two or more subsets comprise a first subset of nucleic
acid molecules and a second subset of nucleic acid molecules,
and (ii) the first subset of nucleic acid molecules
differs from the second subset of nucleic acid molecules by
one or more features selected from genomic regions, mean
GC content, mean molecular size, subset preparation
method, or combination thereof; (b) conducting one or more
assays on at least two of the two or more subsets of nucleic
acid molecules, wherein (i) a first assay, comprising a first
sequencing reaction, is conducted on the first subset of the
two or more subsets to produce a first result, and (ii) a
second assay is conducted on the second subset of the two
or more subsets to produce a second result; and (c) combining,
with the aid of a computer processor, the first result
and second result, thereby analyzing the nucleic acid
sample.
Also provided herein is a method for analyzing a nucleic
acid sample, comprising (a) producing two or more subsets
of nucleic acid molecules from a nucleic acid sample,
wherein the two or more subsets differ by one or more
features selected from genomic regions, mean GC content,
mean molecular size, subset preparation method, or combination
thereof; (b) combining at least two of the two or more
subsets of nucleic acid molecules to produce a first combined
pool of nucleic acid molecules; and (c) conducting one
or more assays on the first combined pool of nucleic acid
molecules, wherein at least one of the one or more assays
comprises a sequencing reaction.
Disclosed herein is a method for analyzing a nucleic acid
sample, comprising (a) producing two or more nucleic acid
molecules subsets from a nucleic acid sample, wherein
producing the two or more nucleic acid molecules comprises
enriching the two or more subsets of nucleic acid molecules
for two or more different genomic regions; (b) conducting a
first assay on a first subset of nucleic acid molecules among
the two or more subsets of nucleic acid molecules to produce
a first result, wherein the first assay comprises a first
sequencing reaction; (c) conducting a second assay on at
least a second subset of nucleic acid molecules among the
two or more subsets of nucleic acid molecules to produce a
second result; and (d) combining, with the aid of a computer
processor, the first result with the second result, thereby
analyzing the nucleic acid sample.
Further provided herein is a method for analyzing a
nucleic acid sample, comprising (a) preparing at least a first
subset of nucleic acid molecules and a second subset of
nucleic acid molecules from a nucleic acid sample, wherein
the first subset of nucleic acid molecules differs from the
second subset of nucleic acid molecules; (b) conducting a
first assay on the first subset of nucleic acid molecules and
a second assay on the second subset of nucleic acid molecules,
wherein the first assay comprises a nucleic acid
sequencing reaction that produces a first result, comprising
nucleic acid sequence information for the first subset, and
wherein the second assay produces a second result; (c)
analyzing, with the aid of a computer processor, the first
result to provide a first analyzed result and analyzing the
second result to provide a second analyzed result; and (d)
combining, with the aid of a computer processor, the first
and second analyzed results, thereby analyzing the nucleic
acid sample.
Provided herein is a method for analyzing a nucleic acid,
comprising (a) producing one or more subsets of nucleic
acid molecules from a nucleic acid sample, wherein producing
the one or more subsets of nucleic acid molecules
comprises conducting a first assay in the presence of one or
more antioxidants to produce a first subset of nucleic acid
molecules; and (b) conducting a sequencing reaction on the
one or more subsets of nucleic acid molecules, thereby
analyzing the nucleic acid sample.
Also disclosed herein is a method for analyzing a nucleic
acid sample, comprising (a) producing, with the aid of a
computer processor, one or more capture probes, wherein
the one or more capture probes hybridize to one or more
polymorphisms, wherein the one or more polymorphisms
are based on or extracted from one or more databases of
polymorphisms, observed in a population of one or more
samples, or a combination thereof; (b) contacting a nucleic
acid sample with the one or more capture probes to produce
one or more capture probe hybridized nucleic acid molecules;
and (c) conducting a first assay on the one or more
capture probe hybridized nucleic acid molecules, thereby
analyzing the nucleic acid sample, wherein the first assay
comprises a sequencing reaction.
Further disclosed herein is a method for developing
complementary nucleic acid libraries, comprising (a) producing
two or more subsets of nucleic acid molecules from
a sample, wherein (i) the two or more subsets of nucleic acid
molecules comprise a first subset of nucleic acid molecules
and a second subset of nucleic acid molecules, (ii) the first
subset of nucleic acid molecules comprises nucleic acid
`molecules of a first mean size, (iii) the second subset of
`nucleic acid molecules comprises nucleic acid molecules of
`a second mean size, and (iv) the first mean size of the first
`subset of nucleic acid molecules is greater than the second
`mean size of the second subset of nucleic acid molecules by
`about 200 or more residues; (b) producing two or more
`nucleic acid libraries, wherein (i) the two or more libraries
`comprise a first nucleic acid molecules library and second
`nucleic acid molecules library, (ii) the first nucleic acid
`molecules library comprises the one or more nucleic acid
`molecules from the first subset of nucleic acid molecules,
`(iii) the second nucleic acid molecules library comprises the
`one or more nucleic acid molecules from the second subset
`of nucleic acid molecules, and (iv) the content of the first
nucleic acid molecules library is at least partially complementary
to the content of the second nucleic acid molecules
library.
Provided herein is a method for developing complementary
nucleic acid libraries, comprising (a) producing two or
more subsets of nucleic acid molecules from a sample of
nucleic acid molecules, wherein the two or more subsets of
nucleic acid molecules comprise a first subset of nucleic acid
molecules and a second subset of nucleic acid molecules; (b)
conducting two or more assays on the two or more subsets
of nucleic acid molecules, wherein (i) the two or more
assays comprise a first assay and a second assay, (ii) the first
assay comprises conducting a first amplification reaction on
the first subset of nucleic acid molecules to produce one or
more first amplified nucleic acid molecules with a first mean
GC content, (iii) the second assay comprises conducting a
second amplification reaction on the second subset of
nucleic acid molecules to produce one or more second
amplified nucleic acid molecules with a second mean GC
content, and (iv) the first mean GC content of the first subset
of nucleic acid molecules differs from the second mean GC
content of the second subset of nucleic acid molecules; and
(c) producing two or more nucleic acid libraries, wherein (i)
the two or more libraries comprise a first nucleic acid
molecules library and second nucleic acid molecules library,
(ii) the first nucleic acid molecules library comprises the one
or more first amplified nucleic acid molecules, (iii) the
second nucleic acid molecules library comprises the one or
more second amplified nucleic acid molecules, and (iv) the
content of the first nucleic acid molecules library is at least
partially complementary to the content of the second nucleic
acid molecules library.
Also provided herein is a method for developing complementary
nucleic acid libraries, comprising (a) producing two
or more subsets of nucleic acid molecules from a sample of
nucleic acid molecules, wherein (i) the two or more subsets
of nucleic acid molecules comprise a first subset of nucleic
acid molecules and a second subset of nucleic acid molecules,
and (ii) the two or more subsets of nucleic acid
molecules differ by one or more features selected from
genomic regions, mean GC content, mean molecular size,
subset preparation method, or combination thereof; and (b)
producing two or more nucleic acid libraries, wherein (i) the
two or more libraries comprise a first nucleic acid molecules
library and second nucleic acid molecules library, (ii) the
first nucleic acid molecules library comprises the one or
more nucleic acid molecules from the first subset of nucleic
acid molecules, (iii) the second nucleic acid molecules
library comprises the one or more nucleic acid molecules
from the second subset of nucleic acid molecules, and (iv)
the content of the first nucleic acid molecules library is at
least partially complementary to the content of the second
nucleic acid molecules library.
`
Disclosed herein is a method for sequencing, comprising
(a) contacting a nucleic acid sample with one or more
capture probe libraries to produce one or more capture probe
hybridized nucleic acid molecules; and (b) conducting one
or more sequencing reactions on the one or more capture
probe hybridized nucleic acid molecules to produce one or
more sequence reads, wherein (i) the sensitivity of the
sequencing reaction is improved by at least about 4% as
compared to current sequencing methods; (ii) the sensitivity of
the sequencing reaction for a genomic region comprising a
RefSeq is at least about 85%, (iii) the sensitivity of the
sequencing reaction for a genomic region comprising an
interpretable genome is at least about 88%, (iv) the sensitivity
of the sequencing reaction for an interpretable variant
is at least about 90%, or (v) a combination of (i)-(ii).
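The sensitivity thresholds above can be illustrated with a small calculation. The sketch below assumes the conventional definition of sensitivity (true positives divided by true positives plus false negatives, measured against a set of known variants); the passage itself does not define the term, and the function name, the variant tuples, and the toy data are all hypothetical.

```python
def sensitivity(called_variants, truth_variants):
    """Fraction of truth-set variants that were also called: TP / (TP + FN).

    Assumption: variants are hashable tuples such as (chrom, pos, ref, alt).
    """
    truth = set(truth_variants)
    if not truth:
        raise ValueError("truth set is empty")
    true_positives = len(truth & set(called_variants))
    return true_positives / len(truth)


# Hypothetical toy data: four known variants, three of them detected.
truth = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 40, "G", "A"),
    ("chr2", 90, "T", "C"),
}
called = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 40, "G", "A"),
    ("chr3", 5, "A", "T"),  # a call outside the truth set does not affect sensitivity
}

observed = sensitivity(called, truth)
meets_refseq_threshold = observed >= 0.85  # "at least about 85%" from the text
```

In this toy case three of four known variants are recovered, so the 85% figure quoted for RefSeq regions would not be met.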
`At least one of the one or more capture probe libraries
`may comprise one or more capture probes to one or more
`genomic regions.
`The methods and systems disclosed herein may further
`comprise conducting one or more sequencing reactions on
`one or more capture probe free nucleic acid molecules.
The percent error of the one or more sequencing reactions
may be similar to current sequencing methods. The percent
error rate of the one or more sequencing reactions may be
within about 0.001%, 0.002%, 0.003%, 0.004%, 0.005%,
0.006%, 0.007%, 0.008%, 0.009%, 0.01%, 0.02%, 0.03%,
0.04%, 0.05%, 0.06%, 0.07%, 0.08%, 0.09%, 1%, 1.1%,
1.2%, 1.3%, 1.4%, 1.5%, 1.6%, 1.7%, 1.8%, 1.9%, or 2% of
the current sequencing methods. The percent error of the one
or more sequencing reactions is less than the error rate of
current sequencing methods. The percent error of the
sequencing reaction may be less than about 1.5%, 1%,
0.75%, 0.50%, 0.25%, 0.10%, 0.075%, 0.050%, 0.025%, or
0.001%.
The accuracy of the one or more sequencing reactions
may be similar to current sequencing methods. The accuracy of
the one or more sequencing reactions is better than current
sequencing methods.
The nucleic acid molecules may be DNA. The nucleic
acid molecules may be RNA.
The methods and systems may comprise a second subset
of nucleic acid molecules. The first subset and the second
subset of nucleic acid molecules may differ by one or more
features selected from genomic regions, mean GC content,
mean molecular size, subset preparation method, or combination
thereof.
The one or more genomic regions may be selected from
the group comprising high GC content, low GC content, low
complexity, low mappability, known single nucleotide variations
(SNVs), known inDels, known alternative sequences,
entire genome, entire exome, set of genes, set of regulatory
elements, and methylation state.
The set of genes may be selected from a group comprising
set of genes with known Mendelian traits, set of genes with
known disease traits, set of genes with known drug traits,
and set of genes with known biomedically interpretable
variants.
The known alternative sequences may be selected from
the group comprising one or more small insertions, small
deletions, structural variant junctions, variable length tandem
repeats, and flanking sequences.
The subsets of nucleic acid molecules may differ by mean
molecular size. The difference in mean molecular size
between at least two of the subsets of nucleic acid molecules
is at least 100 nucleotides. The difference in mean molecular
size between at least two of the subsets of nucleic acid
molecules is at least 200 nucleotides. The difference in mean
molecular size between at least two of the subsets of nucleic
acid molecules is at least 300 nucleotides.
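As an illustration of the size criterion, the following minimal sketch compares the mean fragment length of two subsets against the 100, 200, and 300 nucleotide thresholds named above. The fragment lengths and all function and variable names are hypothetical; the text does not prescribe how mean molecular size is measured.

```python
from statistics import mean


def mean_size_difference(subset_a_lengths, subset_b_lengths):
    """Absolute difference between the mean fragment lengths (in nucleotides)."""
    return abs(mean(subset_a_lengths) - mean(subset_b_lengths))


# Hypothetical fragment lengths for a long-fragment and a short-fragment subset.
long_subset = [5000, 5200, 4800]   # mean 5000
short_subset = [300, 350, 250]     # mean 300

difference = mean_size_difference(long_subset, short_subset)

# Check the difference against each threshold mentioned in the text.
thresholds_met = {t: difference >= t for t in (100, 200, 300)}
```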
The subsets of nucleic acid molecules may differ by mean
GC content. The mean GC content of one or more subsets
may be greater than or equal to 70%. Alternatively, the mean
GC content of one or more subsets may be less than 70%.
The difference between the mean GC content of two or more
subsets may be at least about 5%, 10%, 15% or more.
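The GC criterion can be sketched in the same way. The example below computes GC content as the fraction of G and C bases in each sequence, averages it per subset, and compares the result with the 70% cutoff from the text. The sequences, names, and the simple per-sequence averaging are assumptions made for illustration only.

```python
def gc_fraction(sequence):
    """Fraction of bases in the sequence that are G or C."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in sequence) / len(sequence)


def mean_gc(subset):
    """Unweighted mean GC fraction over the sequences in a subset."""
    return sum(gc_fraction(seq) for seq in subset) / len(subset)


# Hypothetical subsets of short sequences.
high_gc_subset = ["GGCCGGCCGA", "GCGCGCGCGC"]  # GC fractions 0.9 and 1.0
low_gc_subset = ["ATATGCATAT", "AATTGGCCAA"]   # GC fractions 0.2 and 0.4

high_mean = mean_gc(high_gc_subset)
low_mean = mean_gc(low_gc_subset)

is_high_gc = high_mean >= 0.70                      # "greater than or equal to 70%"
differs_by_15_points = (high_mean - low_mean) >= 0.15
```

A length-weighted mean would be an equally reasonable choice when fragments vary widely in size.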
One or more additional assays may be conducted. A
second assay may be conducted. A third assay may be
conducted. A fourth assay may be conducted. A fifth, sixth,
seventh, eighth, ninth, or tenth assay may be conducted. The
one or more assays may comprise one or more sequencing
reactions, amplification reactions, hybridization reactions,
detection reactions, enrichment reactions, or a combination
thereof.
The one or more assays may produce one or more results.
The second assay may comprise a nucleic acid sequencing
reaction that produces the second result, wherein the
second result may comprise nucleic acid sequence information
for the second subset.
The first and second assays may be conducted separately.
The first and second assays may be conducted sequentially.
The first and second assays may be conducted simultaneously.
At least two of the subsets of nucleic acid molecules may
be combined to produce a combined subset of nucleic acid
molecules. The first and second assays may be conducted on
the combined subset of nucleic acid molecules.
The first assay and the second assay may be the same. The
first assay and the second assay may be different.
Analyzing the nucleic acid sample may comprise producing
a unified assessment of the sample genetic state at each
locus addressed by the assays.
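One way to picture a unified per-locus assessment is to merge the calls from two assays into a single table keyed by locus. The sketch below is only an illustration: the rule of keeping the higher-quality call when both assays cover a locus is an assumption, not something stated in the text, and the data layout and names are hypothetical.

```python
def combine_results(first_result, second_result):
    """Merge two per-locus call tables into one.

    Each table maps a locus (chrom, pos) to a (genotype, quality) pair.
    Assumption: when both assays report a locus, the higher-quality call wins.
    """
    combined = dict(first_result)
    for locus, (genotype, quality) in second_result.items():
        if locus not in combined or quality > combined[locus][1]:
            combined[locus] = (genotype, quality)
    return combined


# Hypothetical results, e.g. from a standard assay and a high-GC assay.
first_result = {
    ("chr1", 100): ("A/G", 50),
    ("chr1", 200): ("C/C", 10),
}
second_result = {
    ("chr1", 200): ("C/T", 40),   # overlaps the first assay, higher quality
    ("chr7", 300): ("G/G", 60),   # locus seen only by the second assay
}

unified = combine_results(first_result, second_result)
```

Other merge rules, such as flagging discordant loci for review, would fit the same structure.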
Conducting one or more amplification reactions may
comprise one or more PCR-based amplifications, non-PCR
based amplifications, or a combination thereof. The one or
more PCR-based amplifications may comprise PCR, qPCR,
nested PCR, linear amplification, or a combination thereof.
The one or more non-PCR based amplifications may comprise
multiple displacement amplification (MDA), transcription-mediated
amplification (TMA), nucleic acid sequence-based
amplification (NASBA), strand displacement
amplification (SDA), real-time SDA, rolling circle amplification,
circle-to-circle amplification or a combination
thereof.
The sequencing reactions may comprise capillary
sequencing, next generation sequencing, Sanger sequencing,
sequencing by synthesis, single molecule nanopore sequencing,
sequencing by ligation, sequencing by hybridization,
sequencing by nanopore current restriction, or a combination
thereof. Sequencing by synthesis may comprise reversible
terminator sequencing, processive single molecule
sequencing, sequential nucleotide flow sequencing, or a
combination thereof. Sequential nucleotide flow sequencing
may comprise pyrosequencing, pH-mediated sequencing,
semiconductor sequencing or a combination thereof. Conducting
one or more sequencing reactions comprises whole
genome sequencing or exome sequencing.
`The sequencing reactions may comprise one or more
`capture probes or libraries of capture probes. At least one of
`the one or more capture probe