`
`CLAIMS
`
`WHAT IS CLAIMED IS:
`
`1.
`
`A method comprising:
`
`(a) providing a sample comprising a set of double-stranded polynucleotide
`
`molecules, each double-stranded polynucleotide molecule including first and second
`
`complementary strands;
`
`(b) tagging the double-stranded polynucleotide molecules with a set of
`
`duplex tags, wherein each duplex tag differently tags the first and second
`
`complementary strands of a double-stranded polynucleotide molecule in the set;
`
`(c) sequencing at least some of the tagged strands to produce a set of
`
`sequence reads;
`
`(d) reducing or tracking redundancy in the set of sequence reads;
`
`(e) sorting sequence reads into paired reads and unpaired reads, wherein
`
`(1) each paired read is formed from sequence reads generated from a first
`
`tagged strand and a second differently tagged complementary strand derived from a
`
`double-stranded polynucleotide molecule in the set; and
`
`(2) each unpaired read represents a first tagged strand having no second
`
`differently tagged complementary strand derived from a double-stranded polynucleotide
`
`molecule represented among the sequence reads in the set of sequence reads;
`
`(f) determining quantitative measures of (1) the paired reads and (2) the
`
`unpaired reads that map to each of one or more genetic loci; and
`
`(g) estimating a quantitative measure of total double-stranded
`
`polynucleotide molecules in the set that map to each of the one or more genetic loci
`
`based on the quantitative measure of paired reads and unpaired reads mapping to each
`
`locus.
`
`2.
`
`The method of claim 1 further comprising:
`
`(h) detecting copy number variation in the sample by determining a
`
`normalized total quantitative measure determined in step (g) at each of the one or more
`
`genetic loci and determining copy number variation based on the normalized measure.
`
`3.
`
`The method of claim 1 wherein the double-stranded polynucleotide
`
`molecules are DNA.
`
`Attorney Docket No: 42534-708-101
`
`4.
`
`The method of claim 1 wherein the sample comprises double-stranded
`
`polynucleotide molecules sourced substantially from cell-free nucleic acids, e.g., cfDNA.
`
`5.
`
`The method of claim 1 wherein the sample comprises no more than 100
`
`ng double-stranded polynucleotide molecules.
`
`6.
`
`The method of claim 1 wherein the sample is selected from the group
`
`consisting of blood, plasma, serum, urine, saliva, mucosal excretions, sputum, stool and
`
`tears.
`
`7.
`
`The method of claim 1 wherein the sample comprises double-stranded
`
`polynucleotide molecules from healthy cells and from malignant cells.
`
`8.
`
`The method of claim 1 wherein the sample comprises maternal double-
`
`stranded polynucleotide molecules and fetal double-stranded polynucleotide molecules.
`
`9.
`
`The method of claim 1 wherein any of at least 10%, 25%, 50%, 75%, 90%
`
`or 99% of the double-stranded polynucleotide molecules in the set bear an identifying
`
`tag shared with at least one other double-stranded polynucleotide molecule in the set
`
`(e.g., the set of polynucleotide molecules is non-uniquely tagged).
`
`10.
`
`The method of claim 1 wherein any of at most 25%, 10%, 2%, 1% or 0.1%
`
`of the double-stranded polynucleotide molecules in the set bear an identifying tag
`
`shared with at least one other polynucleotide molecule in the set.
`
`11.
`
`The method of claim 1 wherein the double-stranded polynucleotide
`
`molecules in the set are tagged with between 2 and 1000 different identifying tags or
`
`between 2 and 100 different identifying tags.
`
`12.
`
`The method of claim 1 wherein each duplex tag comprises a
`
`polynucleotide identifier.
`
`13.
`
`The method of claim 12 wherein each polynucleotide identifier comprises
`
`a non-complementary region.
`
`14.
`
`The method of claim 1 wherein each duplex tag is Y-shaped, bubble
`
`shaped or hairpin shaped.
`
`15.
`
`The method of claim 1 wherein the double-stranded polynucleotides are
`
`converted into tagged polynucleotides with a conversion efficiency of at least 10%, at
`
`least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 80% or at
`
`least 90%.
`
`16.
`
`The method of claim 1 wherein tagging comprises any of blunt-end
`
`ligation, sticky end ligation, molecular inversion probes, PCR, ligation-based PCR,
`
`multiplex PCR, single strand ligation and single strand circularization.
`
`17.
`
`The method of claim 1 wherein sequencing comprises amplification of the
`
`tagged strands, e.g., by PCR.
`
`18.
`
`The method of claim 1 comprising filtering out reads that are introduced
`
`into the sample through contamination.
`
`19.
`
`The method of claim 1 comprising filtering out sequence reads that fail to
`
`meet a set threshold.
`
`20.
`
`The method of claim 1 wherein reducing redundancy in the set of
`
`sequence reads comprises collapsing sequence reads produced from amplified
`
`products of an original polynucleotide molecule in the sample back to the original
`
`polynucleotide molecule.
`
`21.
`
`The method of claim 20 further comprising determining a consensus
`
`sequence for the original polynucleotide molecule.
`
`22.
`
`The method of claim 21 further comprising identifying polynucleotide
`
`molecules at one or more genetic loci comprising a sequence variant.
`
`23.
`
`The method of claim 21 further comprising determining a quantitative
`
`measure of paired reads that map to a locus, wherein both strands of the pair comprise
`
`a sequence variant.
`
`24.
`
`The method of claim 23 further comprising determining a quantitative
`
`measure of paired molecules in which only one member of the pair bears a sequence
`
`variant and/or determining a quantitative measure of unpaired molecules bearing a
`
`sequence variant.
`
`25.
`
`The method of claim 22 wherein the sequence variant is selected from a
`
`single nucleotide variant, an indel, a transversion, a translocation, an inversion, a
`
`deletion, a chromosomal structure alteration, a gene fusion, a chromosome fusion, a
`
`gene truncation, a gene amplification, a gene duplication and a chromosomal lesion.
`
`26.
`
`The method of claim 1 wherein the quantitative measures are numbers of
`
`molecules.
`
`27.
`
`The method of claim 1 wherein the one or more genetic loci are a plurality
`
`of genetic loci.
`
`28.
`
`The method of claim 1 wherein the one or more genetic loci correspond to
`
`one or more oncogenes (e.g., a panel of oncogenes).
`
`29.
`
`The method of claim 26 wherein the plurality of the genetic loci map to a
`
`single nucleotide, a gene, a fragment of a chromosome, a full chromosome or a
`
`genome.
`
`30.
`
`The method of claim 1 wherein estimating the quantitative measure
`
`comprises estimating a quantitative measure of polynucleotide molecules in the sample
`
`for which no sequence reads are detected.
`
`31.
`
`The method of claim 1 wherein estimating the quantitative measure
`
`comprises use of a binomial distribution, exponential distribution, beta distribution or
`
`empirical distribution based on the redundancy of sequence reads.
`
`32.
`
`The method of claim 2 wherein copy number variation is selected from
`
`aneuploidy, partial aneuploidy and polyploidy.
`
`33.
`
`The method of claim 2 wherein nucleotide sequences from sequence
`
`reads are assembled into combined sequences and wherein combined sequences are
`
`partitioned into non-overlapping windows.
`
`34.
`
`The method of claim 33 wherein CNV is determined between loci.
`
`35.
`
`A system comprising a computer readable medium comprising machine-executable
`
`code that, upon execution by a computer processor, implements a method
`
`comprising:
`
`(a) receiving into memory sequence reads of polynucleotides tagged with
`
`duplex tags;
`
`(b) reducing and/or tracking redundancy in the set of sequence reads;
`
`(c) sorting sequence reads into paired reads and unpaired reads, wherein
`
`(1) the unpaired read represents a first tagged strand having no
`
`second differently tagged complementary strand derived from a double-stranded
`
`polynucleotide molecule represented among the sequence reads in the set of
`
`sequence reads;
`
`(d) determining quantitative measures of (1) the paired reads and (2) the
`
`unpaired reads that map to each of one or more genetic loci; and
`
`(e) estimating a quantitative measure of total double-stranded
`
`polynucleotide molecules in the set that map to each of the one or more genetic loci
`
`based on the quantitative measure of paired reads and unpaired reads mapping to each
`
`locus.
`
`36.
`
`The method of claim 35 further comprising:
`
`(f) detecting copy number variation in the sample by determining a
`
`normalized total quantitative measure determined in step (e) at each of the one or more
`
`genetic loci and determining copy number variation based on the normalized measure.
`
`37.
`
`A composition comprising between 300 and 300,000 haploid genome
`
`equivalents of fragmented DNA, wherein DNA fragments are tagged with duplex tags
`
`and bear between 2 and 10,000 different identifiers.
`
`38.
`
`The composition of claim 37 comprising between 1000 and 100,000
`
`haploid genome equivalents of fragmented DNA.
`
`39.
`
`The composition of claim 37 wherein DNA fragments bear between 10
`
`and 1,000 different identifiers.
`
`40.
`
`The composition of claim 37 wherein the fragmented DNA is human-
`
`derived DNA.
`
`41.
`
`The composition of claim 37 wherein the fragmented DNA is cfDNA.
`
`42.
`
`The composition of claim 37 wherein the fragmented DNA is tumor DNA.
`
`43.
`
`The composition of claim 41 wherein the tumor DNA is formalin-fixed,
`
`paraffin-embedded.
`
`44.
`
`A method comprising:
`
`(a) providing a sample comprising a set of double-stranded polynucleotide
`
`molecules, each double-stranded polynucleotide molecule including first and second
`
`complementary strands;
`
`(b) tagging the double-stranded polynucleotide molecules with a set of
`
`duplex tags, wherein each duplex tag differently tags the first and second
`
`complementary strands of a double-stranded polynucleotide molecule in the set;
`
`(c) sequencing at least some of the tagged strands to produce a set of
`
`sequence reads;
`
`(d) reducing or tracking redundancy in the set of sequence reads;
`
`(e) sorting sequence reads into paired reads and unpaired reads, wherein
`
`(1) each paired read is formed from sequence reads generated from a first
`
`tagged strand and a second differently tagged complementary strand derived from a
`
`double-stranded polynucleotide molecule in the set; and
`
`(2) each unpaired read represents a first tagged strand having no second
`
`differently tagged complementary strand derived from a double-stranded polynucleotide
`
`molecule represented among the sequence reads in the set of sequence reads;
`
`(f) determining quantitative measures of (1) the paired reads and (2) the
`
`unpaired reads that map to each of one or more genetic loci and (3) read depth of the
`
`paired reads and (4) read depth of unpaired reads; and
`
`(g) estimating a quantitative measure of total double-stranded
`
`polynucleotide molecules in the set that map to each of the one or more genetic loci
`
`based on the quantitative measure of paired reads and unpaired reads and their read
`
`depths mapping to each locus.
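
The sorting and estimating steps recited in claims 1, 30 and 31 can be sketched as follows. This is a minimal illustration, not the claimed implementation. It assumes that each strand of a molecule is detected independently with the same probability p (a simple binomial model), so that for N original molecules the expected counts are N·p² paired, 2·N·p·(1−p) unpaired and N·(1−p)² unseen. Solving for the unseen count gives unpaired² / (4 · paired). The function names and the `(locus, molecule_id, strand)` record layout are hypothetical. In practice, molecule identity would be inferred from the duplex tag together with mapping information.

```python
from collections import defaultdict


def sort_reads(reads):
    """Count paired and unpaired molecules per locus.

    `reads` is an iterable of (locus, molecule_id, strand) tuples, where
    strand is "+" or "-". This record layout is a hypothetical
    simplification chosen for the sketch.
    Returns {locus: (paired_count, unpaired_count)}.
    """
    strands_seen = defaultdict(set)
    for locus, molecule_id, strand in reads:
        # Using a set collapses redundant reads of the same strand.
        strands_seen[(locus, molecule_id)].add(strand)

    counts = defaultdict(lambda: [0, 0])
    for (locus, _molecule_id), strands in strands_seen.items():
        if len(strands) == 2:
            counts[locus][0] += 1  # both strands observed: paired
        else:
            counts[locus][1] += 1  # only one strand observed: unpaired
    return {locus: tuple(c) for locus, c in counts.items()}


def estimate_total(paired, unpaired):
    """Estimate total original molecules at a locus.

    Assumes independent per-strand detection (binomial model), which
    implies unseen = unpaired**2 / (4 * paired).
    """
    if paired <= 0:
        raise ValueError("the estimate needs at least one paired molecule")
    unseen = unpaired ** 2 / (4 * paired)
    return paired + unpaired + unseen
```

Under this model, a locus with 4 paired and 4 unpaired molecules implies 1 unseen molecule, for an estimated total of 9.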
`