(10) International Publication Number: WO 2018/119452 A2

(19) World Intellectual Property Organization, International Bureau

(43) International Publication Date: 28 June 2018 (28.06.2018)

(51) International Patent Classification: C12Q 1/6806 (2018.01); C12Q 1/6827 (2018.01)

(21) International Application Number: PCT/US2017/068329

(22) International Filing Date: 22 December 2017 (22.12.2017)

(25) Filing Language: English

(26) Publication Language: English

(30) Priority Data: 62/438,240, 22 December 2016 (22.12.2016), US; 62/512,936, 31 May 2017 (31.05.2017), US; 62/550,540, 25 August 2017 (25.08.2017), US

(71) Applicant: GUARDANT HEALTH, INC. [US/US]; 505 Penobscot Drive, Redwood City, California 94063 (US).

(72) Inventors: KENNEDY, Andrew; 3705 Terstena Place, Apt. 206, Santa Clara, California 95051 (US). MORTIMER, Stefanie Ann Ward; 2000 Willow Springs Road, Morgan Hill, California 95037 (US). ELTOUKHY, Helmy; 2 Barry Lane, Atherton, California 94027 (US). TALASAZ, AmirAli; 2181 Camino a Los Cerros, Menlo Park, California 94025 (US). ABDUEVA, Diana; 227 Orchard Road, Orinda, California 94563 (US). SCHULTZ, Matthew; 989 Carolina Street, San Francisco, California 94107 (US).

(74) Agent: STRONCEK, Jacqueline et al.; WILSON SONSINI GOODRICH & ROSATI, 650 Page Mill Road, Palo Alto, California 94304-1050 (US).

(81) Designated States (unless otherwise indicated, for every kind of national protection available): ..., CA, CH, CL, CN, CO, CR, CU, CZ, DE, DJ, DK, DM, DO, DZ, EC, EE, EG, ES, FI, GB, GD, GE, GH, GM, GT, HN, HR, HU, ID, IL, IN, IR, IS, JO, JP, KE, KG, KH, KN, KP, KR, KW, KZ, LA, LC, LK, LR, LS, LU, LY, MA, MD, ME, MG, MK, MN, MW, MX, MY, MZ, NA, NG, NI, NO, NZ, OM, PA, PE, PG, PH, PL, PT, QA, RO, RS, RU, RW, SA, SC, SD, SE, SG, SK, SL, SM, ST, SV, SY, TH, TJ, TM, TN, TR, TT, TZ, UA, UG, US, UZ, VC, VN, ZA, ZM, ZW.
`(54) Title: METHODS AND SYSTEMS FOR ANALYZING NUCLEIC ACID MOLECULES
`
FIG. 1 [figure: workflow for tagging and partitioning dsDNA, ssDNA and RNA templates. Recoverable labels: reverse transcription of RNA using a gene-specific, random hexamer or polyT DNA primer and a DNA molecular tag to identify the originating molecule as RNA; RNase H degradation of RNA in the RNA:DNA hybrid; partitioning of templates into ssDNA and dsDNA by either 1) ssDNA hybridization to sequence-specific or sequence-agnostic probes (without a denaturation step), 2) blunt-ending/tailing dsDNA and ligating hairpin/bubble adapters, and ligating ssDNA (...) with CircLigase II or T4 RNA ligase, or 3) other ssDNA library prep methods known in the art; library of dsDNA; ssDNA library preps to capture tagged ssDNA and cDNA ('ancient DNA' ssDNA ligation method); enrichment before library prep/PCR as specified by protocols such as the NEBDirect Cancer HotSpot Panel; amplicon generation (i.e., gene fusions at exon junctions).]
`
(57) Abstract: The disclosure provides methods for processing nucleic acid populations containing different forms (e.g., RNA and DNA, single-stranded or double-stranded) and/or extents of modification (e.g., cytosine methylation, association with proteins). These methods accommodate multiple forms and/or modifications of nucleic acid in a sample, such that sequence information can be obtained for multiple forms. The methods also preserve the identity of multiple forms or modified states through processing and analysis, such that sequence analysis can be combined with epigenetic analysis.
`
`
(84) Designated States (unless otherwise indicated, for every kind of regional protection available): ARIPO (BW, GH, GM, KE, LR, LS, MW, MZ, NA, RW, SD, SL, ST, SZ, TZ, UG, ZM, ZW), Eurasian (AM, AZ, BY, KG, KZ, RU, TJ, TM), European (AL, AT, BE, BG, CH, CY, CZ, DE, DK, EE, ES, FI, FR, GB, GR, HR, HU, IE, IS, IT, LT, LU, LV, MC, MK, MT, NL, NO, PL, PT, RO, RS, SE, SI, SK, SM, TR), OAPI (BF, BJ, CF, CG, CI, CM, GA, GN, GQ, GW, KM, ML, MR, NE, SN, TD, TG).
`
Published:

- without international search report and to be republished upon receipt of that report (Rule 48.2(g))
`
`
`
`WO 2018/119452
`
PCT/US2017/068329
`
`INTERNATIONAL PATENT APPLICATION
`
`METHODS AND SYSTEMS FOR ANALYZING NUCLEIC ACID MOLECULES
`
Inventor(s): Andrew KENNEDY,
`
`Citizen of Canada, Residing at
`3705 Terstena Place, Apt. 206
`Santa Clara, CA 95051
`
`Stefanie Ann Ward MORTIMER,
`
`Citizen of Canada, Residing at
`2000 Willow Springs Road
`Morgan Hill, CA 95037
`
`Helmy ELTOUKHY,
`Citizen of the United States of America, Residing at
`2 Barry Lane
`Atherton, CA 94027
`
AmirAli TALASAZ,
`
`Citizen of the United States, Residing at
`2181 Camino a Los Cerros
`
`Menlo Park, CA 94025
`
`Diana ABDUEVA,
`
`Citizen of Russia, Residing at
`227 Orchard Road
`
`Orinda, CA 94563
`
`Matthew SCHULTZ,
`
`Citizen of the United States of America, Residing at
`989 Carolina Street
`
`San Francisco, CA 94107
`
`Assignee:
`
`Guardant Health, Inc.
`505 Penobscot Drive
`
`Redwood City, CA 94063 USA.
`
`Entity:
`
a large business concern
`
`Filed Electronically on: December 22, 2017
`
`
`
`
`METHODS AND SYSTEMS FOR ANALYZING NUCLEIC ACID MOLECULES
`
`REFERENCE TO RELATED PATENT APPLICATIONS
`
[0001] This application claims the benefit of the priority dates of United States Provisional Patent Applications 62/438,240, filed December 22, 2016; 62/512,936, filed May 31, 2017; and 62/550,540, filed August 25, 2017, all of which are incorporated by reference herein in their entirety.
`
`BACKGROUND
`
[0002] Cancer is a major cause of disease worldwide. Each year, tens of millions of people are diagnosed with cancer around the world, and more than half eventually die from it. In many countries, cancer ranks as the second most common cause of death, following cardiovascular diseases. Early detection is associated with improved outcomes for many cancers.
`
[0003] Cancer can be caused by the accumulation of genetic variations within an individual's normal cells, at least some of which result in improperly regulated cell division. Such variations commonly include copy number variations (CNVs), single nucleotide variations (SNVs), gene fusions, and insertions and/or deletions (indels). Epigenetic variations include 5-methylation of cytosine (5-methylcytosine) and association of DNA with chromatin and transcription factors.
`
[0004] Cancers are often detected by biopsies of tumors followed by analysis of cells, markers or DNA extracted from cells. More recently, it has been proposed that cancers can also be detected from cell-free nucleic acids in body fluids, such as blood or urine. Such tests have the advantage that they are noninvasive and can be performed without identifying suspected cancer cells in a biopsy. However, such tests are complicated by the fact that the amount of nucleic acid in body fluids is very low, and the nucleic acids that are present are heterogeneous in form (e.g., RNA and DNA, single-stranded and double-stranded, and various states of post-replication modification and association with proteins, such as histones).
`
[0005] It is desirable to increase the sensitivity of liquid biopsy assays while reducing the loss of circulating nucleic acid (original material) or of data in the process.
`
SUMMARY

[0006] The disclosure provides methods, compositions and systems for analyzing a nucleic acid population comprising at least two forms of nucleic acid selected from double-stranded DNA, single-stranded DNA and single-stranded RNA. In some embodiments, the method comprises (a) linking at least one of the forms of nucleic acid with at least one tag nucleic acid to distinguish the forms from one another; (b) amplifying the forms of nucleic acid, at least one of which is linked to at least one nucleic acid tag, wherein the nucleic acids and linked nucleic acid tags, if present, are amplified, to produce amplified nucleic acids, of which those amplified from the at least one form are tagged; (c) assaying sequence data of the amplified nucleic acids, at least some of which are tagged; and (d) decoding tag nucleic acid molecules of the amplified nucleic acids to reveal the forms of nucleic acids in the population that provided an original template for the amplified nucleic acids linked to the tag nucleic acid molecules for which sequence data has been assayed.
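The decoding step (d) can be illustrated with a minimal Python sketch. It is a sketch under stated assumptions, not the disclosed implementation: the three tag sequences in the hypothetical `FORM_TAGS` table, the 6-nt tag length, and the premise that the tag occupies the first bases of each read are all illustrative choices. Reads whose leading bases match no tag are reported as untagged, corresponding to embodiments in which only some forms are linked to a tag.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical form-identifying tags; the disclosure does not specify sequences.
FORM_TAGS = {
    "ACGTAC": "dsDNA",
    "TGCATG": "ssDNA",
    "GATCGA": "RNA",  # e.g., carried on a tagged reverse-transcription primer
}
TAG_LEN = 6  # assumed tag length


def decode_read(read: str) -> tuple[str, str]:
    """Return (form, insert) for a read assumed to start with a form tag.

    Reads without a recognized tag are returned whole and labelled 'untagged'.
    """
    tag, insert = read[:TAG_LEN], read[TAG_LEN:]
    form = FORM_TAGS.get(tag)
    if form is None:
        return "untagged", read
    return form, insert


def count_forms(reads: list[str]) -> Counter:
    """Tally the decoded original-template form for every read."""
    return Counter(decode_read(r)[0] for r in reads)
```

With these hypothetical tags, `count_forms(["ACGTACTTGACC", "GATCGAGGCATT", "GATCGACCATGA", "NNNNNNAAAA"])` reports two RNA-derived reads, one dsDNA-derived read and one untagged read.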
`
[0007] In some embodiments, the method further comprises enriching for at least one of the forms relative to one or more of the other forms. In some embodiments, at least 70% of the molecules of each form of nucleic acid in the population are amplified in step (b). In some embodiments, at least three forms of nucleic acid are present in the population, and at least two of the forms are linked to different tag nucleic acids distinguishing each of the three forms from one another. In some embodiments, each of the at least three forms of nucleic acid in the population is linked to a different tag. In some embodiments, each molecule of the same form is linked to a tag comprising the same identifying information (e.g., a tag having or comprising the same sequence). In some embodiments, molecules of the same form are linked to different types of tags. In some embodiments, step (a) comprises subjecting the population to reverse transcription with a tagged primer, wherein the tagged primer is incorporated into cDNA generated from RNA in the population. In some embodiments, the reverse transcription is sequence-specific. In some embodiments, the reverse transcription is random. In some embodiments, the method further comprises degrading RNA duplexed to the cDNA. In some embodiments, the method further comprises separating single-stranded DNA from double-stranded DNA and ligating nucleic acid tags to the double-stranded DNA. In some embodiments, the single-stranded DNA is separated by hybridization to one or more capture probes. In some embodiments, the method further comprises differentially tagging single-stranded DNA with a single-stranded tag, using a ligase that functions on single-stranded nucleic acids, and double-stranded DNA with double-stranded adapters, using a ligase that functions on double-stranded nucleic acids. In some embodiments, the method further comprises, before assaying, pooling tagged nucleic acids comprising different forms of nucleic acid. In some embodiments, the method further comprises analyzing the pools of partitioned DNA separately in individual assays. The assays can be the same, substantially similar, equivalent, or different.
`
[0008] In any of the above methods, the sequence data can indicate the presence of a somatic or germline variant, a copy number variation, a single nucleotide variation, an indel, or a gene fusion.
`
[0009] The disclosure further provides a method of analyzing a nucleic acid population comprising nucleic acids with different extents of modification. In some instances, the disclosure provides methods for screening for characteristics (e.g., 5-methylcytosine) associated with a disease. The method comprises contacting the nucleic acid population with an agent (such as a methyl-binding domain or protein) that preferentially binds to nucleic acids bearing the modification; separating a first pool of nucleic acids bound to the agent from a second pool of nucleic acids unbound to the agent, wherein the first pool of nucleic acids is overrepresented for the modification and the nucleic acids in the second pool are underrepresented for the modification; linking the nucleic acids in the first pool and/or second pool to one or more nucleic acid tags that distinguish the nucleic acids in the first pool and the second pool to produce a population of tagged nucleic acids; amplifying the tagged nucleic acids, wherein the nucleic acids and the linked tags are amplified; assaying sequence data of the amplified nucleic acids and linked tags; and decoding the tags to reveal whether the nucleic acids for which sequence data has been assayed were amplified from templates in the first or second pool.
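As a hypothetical illustration of the final decoding step, the following Python sketch tallies, per genomic locus, the fraction of molecules whose partition tag decodes to the agent-bound (first) pool. The tag sequences `AAGG` and `CCTT`, the locus labels, and the premise that reads have already been aligned and collapsed to one entry per original molecule are assumptions, not part of the disclosure. The resulting fraction is a relative enrichment readout, not a calibrated methylation level.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical partition tags ligated to each pool after affinity separation.
POOL_TAGS = {"AAGG": "bound", "CCTT": "unbound"}


def bound_fraction(decoded: list[tuple[str, str]]) -> dict[str, float]:
    """For each locus, the fraction of molecules decoded to the bound pool.

    `decoded` holds (locus, partition_tag) pairs, one per unique molecule.
    Molecules whose tag cannot be decoded are skipped.
    """
    counts = defaultdict(lambda: {"bound": 0, "unbound": 0})
    for locus, tag in decoded:
        pool = POOL_TAGS.get(tag)
        if pool is not None:
            counts[locus][pool] += 1
    # Every locus in `counts` has at least one decoded molecule, so no zero division.
    return {
        locus: c["bound"] / (c["bound"] + c["unbound"])
        for locus, c in counts.items()
    }
```

For example, a locus with two molecules decoded to the bound pool and one to the unbound pool yields a bound fraction of 2/3.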
`
[00010] In some embodiments, the modification is binding of nucleic acids to a protein. In some embodiments, the protein is a histone or transcription factor. In some embodiments, the nucleic acid modification is a post-replication modification to a nucleotide. In some embodiments, the post-replication modification is 5-methylcytosine, and the extent of binding of the capture agent to nucleic acids increases with the extent of 5-methylcytosine in the nucleic acid. In some embodiments, the post-replication modification is 5-hydroxymethylcytosine, and the extent of binding of the agent to nucleic acid increases with the extent of 5-hydroxymethylcytosine in the nucleic acid. In some embodiments, the post-replication modification is 5-formylcytosine or 5-carboxylcytosine, and the extent of binding of the agent increases with the extent of 5-formylcytosine or 5-carboxylcytosine in the nucleic acid. In some embodiments, the post-replication modification is N6-methyladenine. In some embodiments, the method further comprises washing nucleic acids bound to the agent and collecting the wash as a third pool including nucleic acids with the post-replication modification at an intermediate extent relative to the first and second pools. Some methods further comprise, before assaying, pooling tagged nucleic acids from the first and second pools. In some embodiments, the agent comprises a methyl-binding domain or methyl-CpG-binding domain (MBD). The MBD can be a protein, an antibody or any other agent capable of specifically binding to the modification of interest. Preferably, the MBD further comprises magnetic beads, streptavidin, or other binding domains for performing an affinity separation step.
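The relationship between binding and extent of modification, and the intermediate third pool collected from the wash, can be pictured with a toy Python model. It assumes, purely for illustration, that a molecule's fate depends only on its number of methylated CpGs and applies hypothetical cut-offs (`wash_min`, `bound_min`). Real affinity partitioning is not a hard threshold and depends on wash and elution conditions, so this is a conceptual sketch rather than a model of the disclosed method.

```python
from __future__ import annotations

from collections import Counter


def assign_pool(methylated_cpgs: int, wash_min: int = 2, bound_min: int = 5) -> str:
    """Toy rule for where a molecule ends up during affinity partitioning.

    Assumes binding grows monotonically with the number of methylated CpGs;
    wash_min and bound_min are hypothetical cut-offs.
    """
    if not 0 <= wash_min <= bound_min:
        raise ValueError("expected 0 <= wash_min <= bound_min")
    if methylated_cpgs >= bound_min:
        return "bound"    # first pool: still bound after washing
    if methylated_cpgs >= wash_min:
        return "wash"     # third pool: bound weakly, released by the wash
    return "unbound"      # second pool: not captured by the agent


def partition(methylation_counts: list[int]) -> Counter:
    """Count how many molecules fall into each of the three pools."""
    return Counter(assign_pool(n) for n in methylation_counts)
```

Under these assumptions, `partition([0, 1, 2, 4, 5, 9])` places two molecules in each of the unbound, wash and bound pools.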
`
[00011] The disclosure further provides a method for analyzing a nucleic acid population in which at least some of the nucleic acids include one or more modified cytosine residues. The method comprises linking capture moieties, e.g., biotin, to nucleic acids in the population to serve as templates for amplification; performing an amplification reaction to produce amplification products from the templates; separating the templates linked to capture moieties from amplification products; assaying sequence data of the templates linked to capture moieties by bisulfite sequencing; and assaying sequence data of the amplification products.
`
[00012] In some embodiments, the capture moieties comprise biotin. In some embodiments, the separating is performed by contacting the templates with streptavidin beads. In some embodiments, the modified cytosine residues are 5-methylcytosine, 5-hydroxymethylcytosine, 5-formylcytosine or 5-carboxylcytosine. In some embodiments, the capture moieties comprise biotin linked to nucleic acid tags including one or more modified residues. In some embodiments, the capture moieties are linked to nucleic acid in the population via a cleavable linkage. In some embodiments, the cleavable linkage is a photocleavable linkage. In some embodiments, the cleavable linkage comprises a uracil nucleotide.
`
[00013] The disclosure further provides a method of analyzing a nucleic acid population comprising nucleic acids with different extents of 5-methylcytosine. The method comprises (a) contacting the nucleic acid population with an agent that preferentially binds to 5-methylated nucleic acids; (b) separating a first pool of nucleic acids bound to the agent from a second pool of nucleic acids unbound to the agent, wherein the first pool of nucleic acids is overrepresented for 5-methylcytosine and the nucleic acids in the second pool are underrepresented for 5-methylation; (c) linking the nucleic acids in the first pool and/or second pool to one or more nucleic acid tags that distinguish the nucleic acids in the first pool and the second pool, wherein the nucleic acid tags linked to nucleic acids in the first pool comprise a capture moiety (e.g., biotin); (d) amplifying the tagged nucleic acids, wherein the nucleic acids and the linked tags are amplified; (e) separating amplified nucleic acids bearing the capture moiety from amplified nucleic acids that do not bear the capture moiety; and (f) assaying sequence data of the separated, amplified nucleic acids.
`
[00014] The disclosure further provides a method of analyzing a nucleic acid population comprising nucleic acids with different extents of modification, comprising: contacting the nucleic acids in the population with adapters to produce a population of nucleic acids flanked by adapters comprising primer binding sites; amplifying the nucleic acids flanked by adapters, primed from the primer binding sites; contacting the amplified nucleic acids with an agent that preferentially binds to nucleic acids bearing the modification; separating a first pool of nucleic acids bound to the agent from a second pool of nucleic acids unbound to the agent, wherein the first pool of nucleic acids is overrepresented for the modification and the nucleic acids in the second pool are underrepresented for the modification; performing a second amplification step on the nucleic acids in the first and second pools; and assaying sequence data of the amplified nucleic acids in the first and second pools. Each pool can be amplified separately in a different reaction vessel. Using pool-specific tags allows the amplicons to be pooled before sequencing.
`
[00015] The disclosure further provides a method of analyzing a nucleic acid population in which at least some of the nucleic acids include one or more modified cytosine residues, comprising: contacting the nucleic acid population with adapters comprising a primer binding site comprising at least one modified cytosine to form nucleic acids flanked by adapters; amplifying the nucleic acids flanked by adapters, primed from the primer binding sites in the adapters flanking a nucleic acid; splitting the amplified nucleic acids into first and second aliquots; assaying sequence data on the nucleic acids of the first aliquot; contacting the nucleic acids of the second aliquot with bisulfite, which converts unmodified cytosines (C's) to uracils (U's); amplifying the nucleic acids resulting from bisulfite treatment, primed from the primer binding sites flanking the nucleic acids, wherein U's introduced by bisulfite treatment are converted to T's; assaying sequence data on the amplified nucleic acids from the second aliquot; and comparing the sequence data of the nucleic acids in the first and second aliquots to identify which nucleotides in the nucleic acid population were modified cytosines.
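The comparison in the final step can be sketched in Python. The sketch assumes, hypothetically, that each untreated read from the first aliquot has already been aligned base-for-base with a bisulfite-treated read from the same strand in the second aliquot; `call_modified_cytosines` is an illustrative name, not part of the disclosure. A C in the untreated read that still reads as C after bisulfite treatment is called modified, and one that reads as T is called unmodified. Under standard bisulfite chemistry, 5-methylcytosine and 5-hydroxymethylcytosine both resist conversion, so this comparison alone does not distinguish them.

```python
from __future__ import annotations


def call_modified_cytosines(untreated: str, bisulfite: str) -> list[tuple[int, str]]:
    """Classify each C in an untreated read against its bisulfite-treated counterpart.

    Assumes both sequences are pre-aligned to equal length and from the same strand.
    Returns (position, call) pairs, where call is:
      'modified'   - C retained after bisulfite (protected from conversion)
      'unmodified' - C read as T (converted to U, copied as T on amplification)
      'ambiguous'  - any other base (e.g., sequencing error or variant)
    """
    if len(untreated) != len(bisulfite):
        raise ValueError("sequences must be aligned to equal length")
    calls = []
    for i, (ref_base, bs_base) in enumerate(zip(untreated.upper(), bisulfite.upper())):
        if ref_base != "C":
            continue  # only cytosines carry information about modification
        if bs_base == "C":
            calls.append((i, "modified"))
        elif bs_base == "T":
            calls.append((i, "unmodified"))
        else:
            calls.append((i, "ambiguous"))
    return calls
```

For example, `call_modified_cytosines("ACGTCCGA", "ATGTCTGA")` returns `[(1, 'unmodified'), (4, 'modified'), (5, 'unmodified')]`.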
`
`
`
`
[00016] In any of the above methods, the nucleic acid population can be from a bodily fluid sample, such as blood, serum, or plasma. In some embodiments, the nucleic acid population is a cell-free nucleic acid population. In some embodiments, the bodily fluid sample is from a subject suspected of having a cancer.
`
[00017] In one aspect, provided herein is a method of analyzing a nucleic acid population comprising at least two forms of nucleic acid selected from double-stranded DNA, single-stranded DNA and single-stranded RNA, wherein each of the at least two forms comprises a plurality of molecules, the method comprising: linking at least one of the forms of nucleic acid with at least one tag nucleic acid to distinguish the forms from one another; amplifying the forms of nucleic acid, at least one of which is linked to at least one nucleic acid tag, wherein the nucleic acids and linked nucleic acid tags are amplified, to produce amplified nucleic acids, of which those amplified from the at least one form are tagged; and assaying sequence data of the amplified nucleic acids, at least some of which are tagged; wherein the assaying obtains sequence information sufficient to decode the tag nucleic acid molecules of the amplified nucleic acids to reveal the forms of nucleic acids in the population that provided an original template for the amplified nucleic acids linked to the tag nucleic acid molecules for which sequence data has been assayed. In one embodiment, the method further comprises the step of decoding the tag nucleic acid molecules of the amplified nucleic acids to reveal the forms of nucleic acids in the population that provided an original template for the amplified nucleic acids linked to the tag nucleic acid molecules for which sequence data has been assayed. In another embodiment, the method further comprises enriching for at least one of the forms relative to one or more of the other forms. In another embodiment, at least 70% of the molecules of each form of nucleic acid in the population are amplified. In another embodiment, at least three forms of nucleic acid are present in the population, and at least two of the forms are linked to different tag nucleic acids distinguishing each of the three forms from one another. In another embodiment, each of the at least three forms of nucleic acid in the population is linked to a different tag. In another embodiment, each molecule of the same form is linked to a tag comprising the same tag information. In another embodiment, molecules of the same form are linked to different types of tags. In another embodiment, the method further comprises subjecting the population to reverse transcription with a tagged primer, wherein the tagged primer is incorporated into cDNA generated from RNA in the population. In another embodiment, the reverse transcription is sequence-specific. In another embodiment, the reverse transcription is random. In another embodiment, the method further comprises degrading RNA duplexed to the cDNA. In another embodiment, the method further comprises separating single-stranded DNA from double-stranded DNA and ligating nucleic acid tags to the double-stranded DNA. In another embodiment, the single-stranded DNA is separated by hybridization to one or more capture probes. In another embodiment, the method further comprises circularizing single-stranded DNA with CircLigase and ligating nucleic acid tags to the double-stranded DNA. In another embodiment, the method comprises, before assaying, pooling tagged nucleic acids comprising different forms of nucleic acid. In another embodiment, the nucleic acid population is from a bodily fluid sample. In another embodiment, the bodily fluid sample is blood, serum, or plasma. In another embodiment, the nucleic acid population is a cell-free nucleic acid population. In another embodiment, the bodily fluid sample is from a subject suspected of having a cancer. In another embodiment, the sequence data indicates the presence of a somatic or germline variant. In another embodiment, the sequence data indicates the presence of a copy number variation. In another embodiment, the sequence data indicates the presence of a single nucleotide variation (SNV), indel or gene fusion.
`
[00018] In another aspect, provided herein is a method of analyzing a nucleic acid population comprising nucleic acids with different extents of modification, comprising: contacting the nucleic acid population with an agent that preferentially binds to nucleic acids bearing the modification; separating a first pool of nucleic acids bound to the agent from a second pool of nucleic acids unbound to the agent, wherein the first pool of nucleic acids is overrepresented for the modification and the nucleic acids in the second pool are underrepresented for the modification; linking the nucleic acids in the first pool and/or second pool to one or more nucleic acid tags that distinguish the nucleic acids in the first pool and the second pool to produce a population of tagged nucleic acids; amplifying the tagged nucleic acids, wherein the nucleic acids and the linked tags are amplified; and assaying sequence data of the amplified nucleic acids and linked tags; wherein the assaying obtains sequence data for decoding the tags to reveal whether the nucleic acids for which sequence data has been assayed were amplified from templates in the first or the second pool. In one embodiment, the method comprises the step of decoding the tags to reveal whether the nucleic acids for which sequence data has been assayed were amplified from templates in the first or the second pool. In another embodiment, the modification is binding of nucleic acids to a protein. In another embodiment, the protein is a histone or transcription factor. In another embodiment, the modification is a post-replication modification to a nucleotide. In another embodiment, the post-replication modification is 5-methylcytosine, and the extent of binding of the agent to nucleic acids increases with the extent of 5-methylcytosine in the nucleic acid. In another embodiment, the post-replication modification is 5-hydroxymethylcytosine, and the extent of binding of the agent to nucleic acid increases with the extent of 5-hydroxymethylcytosine in the nucleic acid. In another embodiment, the post-replication modification is 5-formylcytosine or 5-carboxylcytosine, and the extent of binding of the agent increases with the extent of 5-formylcytosine or 5-carboxylcytosine in the nucleic acid. In another embodiment, the method further comprises washing nucleic acids bound to the agent and collecting the wash as a third pool including nucleic acids with the post-replication modification at an intermediate extent relative to the first and second pools. In another embodiment, the method comprises, before assaying, pooling tagged nucleic acids from the first and second pools. In another embodiment, the agent is methyl-binding domain (MBD) magnetic beads. In another embodiment, the nucleic acid population is from a bodily fluid sample. In another embodiment, the bodily fluid sample is blood, serum, or plasma. In another embodiment, the nucleic acid population is a cell-free nucleic acid population. In another embodiment, the bodily fluid sample is from a subject suspected of having a cancer. In another embodiment, the sequence data indicates the presence of a somatic or germline variant. In another embodiment, the sequence data indicates the presence of a copy number variation. In another embodiment, the sequence data indicates the presence of a single nucleotide variation (SNV), indel or gene fusion.
`
[00019] In another aspect, provided herein is a method of analyzing a nucleic acid population in which at least some of the nucleic acids include one or more modified cytosine residues, comprising: linking capture moieties to nucleic acids in the population, which nucleic acids serve as templates for amplification; performing an amplification reaction to produce amplification products from the templates; separating the templates linked to capture moieties from amplification products; assaying sequence data of the templates linked to capture moieties by bisulfite sequencing; and assaying sequence data of the amplification products. In one embodiment, the capture moieties comprise biotin. In another embodiment, the separating is performed by contacting the templates with streptavidin beads. In another embodiment, the modified cytosine residues are 5-methylcytosine, 5-hydroxymethylcytosine, 5-formylcytosine or 5-carboxylcytosine. In another embodiment, the capture moieties comprise biotin linked to nucleic acid tags including one or more modified residues. In another embodiment, the capture moieties are linked to nucleic acid in the population via a cleavable linkage. In another embodiment, the cleavable linkage is a photocleavable linkage. In another embodiment, the cleavable linkage comprises a uracil nucleotide. In another embodiment, the nucleic acid population is from a bodily fluid sample. In another embodiment, the bodily fluid sample is blood, serum, or plasma. In another embodiment, the nucleic acid population is a cell-free nucleic acid population. In another embodiment, the bodily fluid sample is from a subject suspected of having a cancer. In another embodiment, the sequence data indicates the presence of a somatic or germline variant. In another embodiment, the sequence data indicates the presence of a copy number variation. In another embodiment, the sequence data indicates the presence of a single nucleotide variation (SNV), indel or gene fusion.
`
[00020] In another aspect, provided herein is a method of analyzing a nucleic acid population comprising nucleic acids with different extents of 5-methylation, comprising: contacting the nucleic acid population with an agent that preferentially binds to 5-methylated nucleic acids; separating a first pool of nucleic acids bound to the agent from a second pool of nucleic acids unbound to the agent, wherein the first pool of nucleic acids is overrepresented for 5-methylation and the nucleic acids in the second pool are underrepresented for 5-methylation; linking the nucleic acids in the first pool and/or second pool to one or more nucleic acid tags that distinguish the nucleic acids in the first pool and the second pool, wherein the nucleic acid tags linked to nucleic acids in the first pool comprise a capture moiety (e.g., biotin); amplifying the tagged nucleic acids, wherein the nucleic acids and the linked tags are amplified; separating amplified nucleic acids bearing the capture moiety from amplified nucleic acids that do not bear the capture moiety; and assaying sequence data of the separated, amplified nucleic acids.
`
[00021] In another aspect, provided herein is a method of analyzing a nucleic acid population comprising nucleic acids with different extents of modification, comprising: contacting the nucleic acids in the population with adapters to produce a population of nucleic acids flanked by adapters comprising primer binding sites; amplifying the nucleic acids flanked by adapters, primed from the primer binding sites; contacting the amplified nucleic acids with an agent that preferentially binds to nucleic acids bearing the modification; separating a first pool of nucleic acids bound to the agent from a second pool of nucleic acids unbound to the agent, wherein the first pool of nucleic acids is overrepresented for the modification and the nucleic acids in the second pool are underrepresented for the modification; performing parallel amplifications of tagged nucleic acids in the first and second pools; and assaying sequence data of the amplified nucleic acids in the first and second pools. In one embodiment, the adapters are hairpin adapters.
`
[00022] In another aspect, provided herein is a method of analyzing a nucleic acid population in which at least some of the nucleic acids include one or more modified cytosine residues, comprising: contacting the nucleic acid population with adapters comprising a primer binding site comprising a modified cytosine to form nucleic acids flanked by adapters; amplifying the nucleic acids flanked by adapters, primed from the primer binding sites in the adapters flanking a nucleic acid; splitting the amplified nucleic acids into first and second aliquots; assaying sequence data on the nucleic acids of the first aliquot; contacting the nucleic acids of the second aliquot with bisulfite, which converts unmodified C's to U's; amplifying the nucleic acids resulting from bisulfite treatment, primed from the primer binding sites flanking the nucleic acids, wherein U's introduced by bisulfite treatment are converted to T's; and assaying sequence data of the amplified nucleic acids from the second aliquot; wherein the assaying produces sequence data that can be used to compare the sequence data of the nucleic acids in the first and second aliquots to identify which nucleotides in the nucleic acid population were modified cytosines. In one embodiment, the method comprises comparing the sequence data of the nucleic acids in the first and second aliquots to identify which nucleotides in the nucleic acid population were modified cytosines. In another embodiment, the adapters are hairpin adapters.
`
`[00023]
`
`In another aspect provided herein is a method, comprising: physically
`
`fractionating DN