`
`(19) World Intellectual Property Organization
`International Bureau
`
`(43) International Publication Date
`2 August 2007 (02.08.2007)
`
` (10) International Publication Number
`
`WO 2007/087312 A2
`
`(51) International Patent Classification:
C12Q 1/68 (2006.01)
`C12P 19/34 (2006.01)
`
`(21) International Application Number:
`PCT/US2007/001796
`
`(22) International Filing Date: 22 January 2007 (22.01.2007)
`
(25) Filing Language: English

(26) Publication Language: English
`
(30) Priority Data:
60/761,578    23 January 2006 (23.01.2006)    US
60/775,098    21 February 2006 (21.02.2006)    US
60/777,661    27 February 2006 (27.02.2006)    US
60/779,540    6 March 2006 (06.03.2006)    US
60/791,561    12 April 2006 (12.04.2006)    US
60/824,456    4 September 2006 (04.09.2006)    US
`
(71) Applicants (for all designated States except US): COMPASS GENETICS, LLC [US/US]; 2330 West Joppa
Road, Suite 330, Lutherville, MD 21093 (US). MACEVICZ, Stephen, C. [US/US]; 21890 Rucker Drive,
Cupertino, CA 95014 (US).
`
(74) Agent: BABA, Edward, J.; Bozicevic, Field & Francis
`LLP, 1900 University Avenue, Suite 200, East Palo Alto,
`CA 94303 (US).
`
(81) Designated States (unless otherwise indicated, for every
kind of national protection available): AE, AG, AL, AM,
`AT, AU, AZ, BA, BB, BG, BR, BW, BY, BZ, CA, CH, CN,
`CO, CR, CU, CZ, DE, DK, DM, DZ, EC, EE, EG, ES, FI,
`GB, GD, GE, GH, GM, GT, HN, HR, HU, ID, IL, IN, IS,
`JP, KE, KG, KM, KN, KP, KR, KZ, LA, LC, LK, LR, LS,
`LT, LU, LV, LY, MA, MD, MG, MK, MN, MW, MX, MY,
`MZ, NA, NG, NI, NO, NZ, OM, PG, PH, PL, PT, RO, RS,
`RU, SC, SD, SE, SG, SK, SL, SM, SV, SY, TJ, TM, TN,
`TR, TT, TZ, UA, UG, US, UZ, VC, VN, ZA, ZM, ZW.
`
(84) Designated States (unless otherwise indicated, for every
kind of regional protection available): ARIPO (BW, GH,
`GM, KE, LS, MW, MZ, NA, SD, SL, SZ, TZ, UG, ZM,
`ZW), Eurasian (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM),
`European (AT, BE, BG, CH, CY, CZ, DE, DK, EE, ES, FI,
`FR, GB, GR, HU, IE, IS, IT, LT, LU, LV, MC, NL, PL, PT,
`RO, SE, SI, SK, TR), OAPI (BF, BJ, CF, CG, CI, CM, GA,
`GN, GQ, GW, ML, MR, NE, SN, TD, TG).
`Published:
`
`without international search report and to be republished
`upon receipt of that report
`
`(72) Inventor; and
`(75) Inventor/Applicant (for US only): BRENNER, Sydney
`[GB/GB]; 3 Barton Square, Ely CB7 4P] (GB).
`
For two-letter codes and other abbreviations, refer to the "Guidance Notes on Codes and Abbreviations" appearing at the beginning of each regular issue of the PCT Gazette.
`
`(54) Title: MOLECULAR COUNTING
`
`(57) Abstract: The invention provides methods and compositions for counting molecules in a sample, wherein each molecule is
labeled with a unique oligonucleotide tag. Such tags are amplified and identified rather than the molecules themselves; that is, the
problem of counting molecules is converted into the problem of counting tags. In one aspect of the invention, molecules to be counted
are labeled by sampling. That is, conjugates are formed between the molecules to be counted and oligonucleotide tags of a very large
set, or repertoire. After conjugation, a sample of conjugates is taken that is sufficiently small so that substantially every molecule
has a unique oligonucleotide tag. Counting of different tags may be accomplished in a variety of ways. In one aspect, different tags
may be counted by carrying out a series of sorting steps to generate successively less complex mixtures in which tags are enumerated
using length-encoded "metric" tags. In another aspect, different tags may be counted by directly sequencing a sample of tags using
`any one of several different sequencing methodologies.
`
`
`
`
`
`
`WO 2007/087312
`
`PCT/US2007/001796
`
`MOLECULAR COUNTING
`
`Cross Reference to Related Applications
`
`This application claims priority from prior United States applications having the following
`
`
`serial numbers and filing dates: Ser. No. 60/761,578 filed 23 January 2006; Ser. No. 60/775,098
`
`filed 21 February 2006; Ser. No. 60/777,661 filed 27 February 2006; Ser. No. 60/779,540 filed 06
`
`March 2006; Ser. No. 60/791,561 filed 12 April 2006; and Ser. No. 60/824,456 filed 04 September
`
`2006, which applications are each incorporated herein in their entireties by reference.
`
`
`Field of the Invention
`
`The present invention relates to methods and compositions for analyzing populations of
`
`polynucleotides, and more particularly, to methods and compositions for counting molecules in a
`
`sample.
`
`BACKGROUND
`
The difference between health and disease frequently depends on whether or not certain biomolecules of an organism are within tightly controlled tolerances. This has led to an active search for quantitative molecular biomarkers to assess states of health and disease, e.g. Slamon et al, Science, 240: 1795-1798 (1988); Sidransky, Nature Reviews Cancer, 2: 210-219 (2002); Pinkel and Albertson, Ann. Rev. Genomics Hum. Genet., 6: 331-354 (2005); Stankiewicz and Lupski, Trends in Genetics, 18: 74-82 (2002); Hanna, Oncology, 61 (suppl 2): 22-30 (2001); Cronin et al, Am. J. Pathol., 164: 35-42 (2004); and the like. Although many techniques are available to measure amounts of biomolecules, they each have trade-offs with respect to sensitivity, selectivity, dynamic range, convenience, robustness, cost, and so on. For nucleic acid measurements, most techniques provide analog readouts, in that measured amounts are correlated with signal intensities, e.g. Pinkel and Albertson, Nature Genetics Supplement, 37: S11-S17 (2005); Lockhart et al, Nature Biotechnology, 14: 1675-1680 (1996). Digital measurements of polynucleotides have been made, wherein measured amounts are correlated with integral numbers of countable events, e.g. numbers of sequence tags; however, even though such measurements have significant statistical advantages, they are usually more difficult and expensive to implement, e.g. Brenner et al, Nature Biotechnology, 18: 630-634 (2000); Velculescu et al, Science, 270: 484-487 (1995); Dressman et al, Proc. Natl. Acad. Sci., 100: 8817-8822 (2003); Audic and Claverie, Genome Research, 7: 986-995 (1997).
`
It would be advantageous to many pure and applied fields in the biosciences if a method were available for conveniently and accurately providing digital measurements of quantities of biomolecules in a cost-effective manner. Such a method would be particularly useful in the medical and research fields for determining a wide variety of quantities, including genetic copy number variation, aneuploidies, such as chromosome 21 trisomy, gene expression variation, methylation variation, and the like.
`
`SUMMARY OF THE INVENTION
`
The invention provides a method of counting molecules in a sample by converting the problem of counting molecules into one of counting sequences of oligonucleotide tags. That is, in accordance with the invention, molecules to be counted in a sample are each labeled with a unique oligonucleotide tag. Such tags are then amplified and identified. The number of different oligonucleotide tags detected, or counted, is equal to the number of molecules in the sample. In one aspect, molecules to be counted are each associated with or linked to an oligonucleotide tag randomly selected from a set that is much larger than the number of target molecules. This ensures with high probability that substantially every target molecule is associated with a unique oligonucleotide tag. In the process of linking or associating such target molecule with an oligonucleotide tag, a selected probe containing the tag is formed that can be selectively amplified and/or otherwise manipulated. That is, in one aspect, oligonucleotide tags of selected probes are isolated from other oligonucleotide tags by physical separation or by the resistance of the selected probe to degradation by at least one nuclease activity. In one aspect, the number of different oligonucleotide tags of the selected probes, and hence the number of target molecules, is determined by sequencing a sample of the oligonucleotide tags amplified from the selected probes.
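Why a much larger tag set gives mostly unique labels can be illustrated numerically. The following is a minimal sketch rather than part of the disclosed method: it assumes each molecule draws its tag independently and uniformly at random from the set, and the function name and example sizes are hypothetical.

```python
def expected_unique_fraction(num_molecules: int, repertoire_size: int) -> float:
    """Expected fraction of molecules whose tag is shared with no other molecule.

    Assumes (for illustration only) that tags are drawn independently and
    uniformly at random from the repertoire.
    """
    if num_molecules < 1 or repertoire_size < 1:
        raise ValueError("both arguments must be positive")
    # A given molecule's tag is unique if none of the other
    # (num_molecules - 1) molecules drew the same tag.
    return (1.0 - 1.0 / repertoire_size) ** (num_molecules - 1)


if __name__ == "__main__":
    n = 200  # hypothetical number of target molecules
    for fold in (10, 100, 1000):
        frac = expected_unique_fraction(n, fold * n)
        print(f"tag set {fold}x larger: {frac:.3f} of molecules uniquely tagged")
```

Under this simplified model, the uniquely tagged fraction approaches 1 as the tag set grows relative to the number of molecules, which is the sense in which "substantially every" molecule receives a unique tag.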
`
In another aspect of the invention, oligonucleotide tags are provided that comprise a collection of subunits, or “words,” that are selected from a defined set of subunits. In one embodiment, such collections of subunits are arranged into a concatenate to form an oligonucleotide tag. In one aspect, such concatenates may be formed by combinatorial synthesis. Thus, if oligonucleotide tags comprised K subunits and if the defined set of subunits has three members, then at each position, 1 through K, one of the three subunits is present. In another aspect, no two tags of such a collection of subunits is the same; thus, an oligonucleotide tag comprising a concatenate of such subunits has a different subunit at each position.
In one aspect, the number of subunits in a set may vary between 2 and 4, inclusive; however, preferably, the number of subunits in a set is two. An oligonucleotide tag made up of subunits from a set of size two is referred to herein as a “binary tag.” Subunits of binary tags can have lengths that vary widely. In one aspect, subunits of binary tags have lengths in the range of from 1 to 6 nucleotides, and more preferably, in the range of from 2 to 4 nucleotides. In one preferred embodiment, subunits of binary tags are dinucleotides, such as those described more fully below.
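The size of a tag repertoire built from words can be checked by enumeration. The sketch below is illustrative only: the two dinucleotide words are hypothetical placeholders, not the words used in the specification, and the helper name is an assumption.

```python
from itertools import product

# Hypothetical two-member set of dinucleotide "words"; the actual words
# are described elsewhere in the specification.
WORDS = ("AC", "TG")


def all_tags(words, num_positions):
    """Return every concatenate with one word from `words` at each position."""
    return ["".join(combo) for combo in product(words, repeat=num_positions)]


tags = all_tags(WORDS, 4)
print(len(tags))  # 2**4 = 16 distinct binary tags
print(tags[:3])
```

A set of s words at K positions gives s**K possible tags, so a binary tag of K words offers 2**K distinct labels.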
`
In one form of the invention, oligonucleotide tags are counted by successively sorting them into separate subsets based on the identity of the subunits at different positions within the tags, preferably using a sorting by sequence process as disclosed by Brenner, PCT publication WO 2005/080604, which is incorporated by reference. After each sorting step, each subset is tested for the presence or absence of oligonucleotide tags. Sorting takes place only once at a position and continues position by position until no oligonucleotide tag is detected in one of the sorted subsets. When this condition is reached, the number of molecules (and number of different oligonucleotide tags) can be determined. For binary tags, the number of molecules is proportional to 2^r, where r is the number of sorting steps required to reach a subset empty of binary tags.
In one aspect, the invention provides a method for determining a number of target molecules in a sample carried out by the following steps: (a) providing molecule-tag conjugates each comprising an oligonucleotide tag such that substantially every different molecule of the sample is attached to a different oligonucleotide tag, each oligonucleotide tag comprising a concatenation of subunits selected from a set of subunits, each subunit being a different nucleotide or oligonucleotide and having a position, and the set of subunits having a size of from 2 to 6 members; (b) dividing the oligonucleotide tags of the molecule-tag conjugates into aliquots by sorting the oligonucleotide tags according to the identity of a subunit within a first or a successive position; and (c) repeating step (b) for at least one aliquot in each successive application of step (b) until at least one aliquot has no oligonucleotide tags that can be separated into aliquots, thereby determining the number of molecules in the sample to be in the range determined by a first number equal to the size of the subset taken to a power equal to the lowest number of times step (b) has been applied to produce an aliquot having no oligonucleotide tags less one and a second number equal to the size of the subset taken to a power equal to the greatest number of times step (b) has been applied to produce an aliquot having no oligonucleotide tags less one.
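The range stated in step (c) can be written as a short calculation. This is a sketch under one reading of the text, namely that "less one" applies to the exponent; that reading agrees with the worked example later in the description, where empty aliquots first appearing at sorting steps 8 and 9 give a range of 128 to 256. The function name is hypothetical.

```python
from typing import Tuple


def count_range(set_size: int, min_steps: int, max_steps: int) -> Tuple[int, int]:
    """Range of molecule counts implied by the sorting steps.

    set_size  -- number of subunits in the set (2 for binary tags)
    min_steps -- lowest number of sorting steps that produced an empty aliquot
    max_steps -- greatest number of sorting steps that produced an empty aliquot
    """
    if set_size < 2:
        raise ValueError("set_size must be at least 2")
    if not 1 <= min_steps <= max_steps:
        raise ValueError("require 1 <= min_steps <= max_steps")
    # Assumed reading: the exponent is the number of steps less one.
    return set_size ** (min_steps - 1), set_size ** (max_steps - 1)


print(count_range(2, 8, 9))  # (128, 256)
```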
`
In another aspect, a method of the invention for estimating a number of target polynucleotides in a mixture is carried out with the following steps: (a) labeling by sampling each target polynucleotide in the mixture so that substantially every target polynucleotide has a unique oligonucleotide tag; (b) amplifying the oligonucleotide tags of the labeled target polynucleotides; and (c) determining the number of different oligonucleotide tags in a sample of amplified oligonucleotide tags, thereby estimating the number of target polynucleotides in the mixture. In one embodiment of this aspect, whenever size-based tags (i.e. “metric tags”) are employed, the number of different oligonucleotide tags in a sample is determined by counting the number of oligonucleotide tags of different sizes, e.g. by electrophoretic separation, chromatographic separation, mass spectrometry analysis, or the like. In another embodiment of this aspect, the number of different oligonucleotide tags in a sample is determined by determining the nucleotide sequences thereof and then counting the number of oligonucleotide tags with different sequences.
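Counting tags with different sequences amounts to counting distinct strings among the sequence reads. The sketch below is a simplified illustration: the reads are made up, the function name is hypothetical, and it assumes error-free reads, which real sequencing data would not guarantee.

```python
from collections import Counter


def count_distinct_tags(tag_reads):
    """Return (number of distinct tag sequences, reads observed per tag)."""
    reads_per_tag = Counter(tag_reads)
    return len(reads_per_tag), reads_per_tag


# Hypothetical reads: amplification yields many copies of each tag,
# but each distinct sequence is counted only once.
reads = ["ACTGAC", "ACTGAC", "TGACAC", "ACACTG", "TGACAC", "ACTGAC"]
n_tags, per_tag = count_distinct_tags(reads)
print(n_tags)  # 3 distinct tags, i.e. an estimate of 3 target molecules
```

Because amplification changes only how often each tag is read, not how many different tags exist, the estimate does not depend on amplification being uniform, provided every tag is sampled at least once.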
In another aspect, a method of determining a number of target polynucleotides is implemented by the following steps: (a) providing for each target polynucleotide a plurality of nucleic acid probes specific for the target polynucleotide, each nucleic acid probe having a different oligonucleotide tag; (b) combining in a reaction mixture the plurality of nucleic acid probes with the target polynucleotides so that substantially every target polynucleotide associates with a nucleic acid probe to form a selected nucleic acid probe that is resistant to at least one nuclease activity, the plurality of nucleic acid probes having a size sufficiently greater than the number of target polynucleotides so that substantially every selected nucleic acid probe has a unique oligonucleotide tag; (c) isolating the selected nucleic acid probes by treating the reaction mixture with one or more nuclease activities; and (d) determining nucleotide sequences of oligonucleotide tags in a sample of isolated selected nucleic acid probes to determine the number of different oligonucleotide tags therein, thereby determining the number of target polynucleotides in the mixture.
`
In still another aspect, the invention provides methods and compositions for detecting nucleic acid probes by sequencing probe-specific oligonucleotide tags. In this aspect, probes from a collection of probes, e.g. circularizable probes specific for different single nucleotide polymorphisms, are each labeled with a unique oligonucleotide tag. After combining with target polynucleotides, selected nucleic acid probes are generated from the probes whenever their respective target polynucleotide is present in a sample, e.g. by way of a template-driven extension and/or ligation reaction, or the like. The nucleotide sequences of the selected nucleic acid probes are then determined in order to determine which target polynucleotides are present. In one embodiment, the sequences of oligonucleotide tags of selected nucleic acid probes are determined after amplification by a sequencing by synthesis process.
`
`The present invention provides compositions and methods for making digital measurements
`
`of biomolecules, and has applications in the measurement of genetic copy number variation,
`
`aneuploidy, methylation states, gene expression changes, and the like, particularly under conditions
`
`of limiting sample availability.
`
Brief Description of the Drawings

Figs. 1A-1H illustrate embodiments of the invention for counting polynucleotides, such as restriction fragments.
`
Figs. 2A-2B illustrate a general procedure for attaching an oligonucleotide tag to one end of a polynucleotide.

Fig. 3 contains a table (Table I) of sequences of exemplary reagents for converting binary tags into metric tags.
`
`
Figs. 4A-4C illustrate exemplary embodiments of the invention that employ indexing
`
`adaptors and padlock probes for generating and enumerating selected probes.
`
Figs. 5A-5B illustrate further exemplary embodiments of the invention that employ adaptors
`
`having nuclease resistant ends for generating and enumerating selected probes.
`
Figs. 6A-6B illustrate still further exemplary embodiments of the invention that employ
`
`ligation probes for generating and enumerating selected probes.
`
`
`
Figs. 7A-7B illustrate still further exemplary embodiments of the invention that employ
emulsion-based amplification and sequencing by synthesis to identify the oligonucleotide tags of
selected probes.

Figs. 7C-7D illustrate an embodiment of the invention wherein metric tags are directly
counted after separation to give an estimate of the number of target molecules in a sample.
`
Fig. 8A contains a table (Table II) of lengths of single stranded metric tags released from
`
`composite tags produced in Example I.
`
`Fig. 8B illustrates diagrammatically the construction of a set of probes for use with the
`
`invention to count target nucleic acid molecules.
`
`Fig. 8C is an image of several mixtures of metric tags that have been electrophoretically
`
`separated.
`
Figs. 9A-9E illustrate a scheme for generating sets of binary tags of a predetermined size.
`
`Fig. 10A shows data demonstrating the use of the sorting by sequence technique for
`
`generating successively less complex mixtures of nucleic acids.
`
`Fig. 10B shows data from a dilution series of test sequences that demonstrates the sensitivity
`
`of the sorting by sequence technique for isolating target sequences from mixtures.
`
Figs. 11A-11E illustrate a method of selecting particular fragments by common sequence
elements.
`
`DETAILED DESCRIPTION OF THE INVENTION
`
The practice of the present invention may employ, unless otherwise indicated, conventional techniques and descriptions of organic chemistry, polymer technology, molecular biology (including recombinant techniques), cell biology, biochemistry, and immunology, which are within the skill of the art. Such conventional techniques include polymer array synthesis, hybridization, ligation, and detection of hybridization using a label. Specific illustrations of suitable techniques can be had by reference to the example herein below. However, other equivalent conventional procedures can, of course, also be used. Such conventional techniques and descriptions can be found in standard laboratory manuals such as Genome Analysis: A Laboratory Manual Series (Vols. I-IV), Using Antibodies: A Laboratory Manual, Cells: A Laboratory Manual, PCR Primer: A Laboratory Manual, and Molecular Cloning: A Laboratory Manual (all from Cold Spring Harbor Laboratory Press), Stryer, L. (1995) Biochemistry (4th Ed.) Freeman, New York, Gait, “Oligonucleotide Synthesis: A Practical Approach” 1984, IRL Press, London, Nelson and Cox (2000), Lehninger, Principles of Biochemistry 3rd Ed., W. H. Freeman Pub., New York, N.Y. and Berg et al. (2002) Biochemistry, 5th Ed., W. H. Freeman Pub., New York, N.Y., all of which are herein incorporated in their entirety by reference for all purposes.
`
`The invention provides a method of counting molecules that are uniquely labeled with tags.
`
That is, substantially every molecule to be counted in a sample, e.g. the number of single stranded
DNA molecules of a particular genetic locus in a sample of genomic DNA, is associated with a probe
`
`having a different tag, so that the process of counting multiple copies of the same molecule is
`
`transformed into a process of counting the number of different kinds of associated tags. Both the
`
`process of associating a unique tag with a selected target molecule and the process of counting
`
`associated tags can be carried out in a variety of ways. In one aspect, such associations are made by
`
`providing a set of probes that are capable of specifically binding or reacting with the target molecules
`
`and that are labeled with tags selected from a repertoire that is substantially larger than the number of
`
`target molecules to be counted in a sample. Thus, the type of target molecule capable of being
`
counted in accordance with the invention includes any type of molecule for which such probes can be
`
`constructed, including, but not limited to, nucleic acids, proteins, peptides, drugs, chromosomes, and
`
`other structures, organelles, and compounds for which specific binding compounds, such as
`
`antibodies, can be produced. In one aspect, tags for use with the invention are oligonucleotide tags,
`
`because they are conveniently synthesized with a diversity of sequences, they are readily
`
`incorporated in probes having specific binding capability, and they may be amplified from very small
`
`quantities for convenient detection. However, other types of labels may be employed with the
`
`invention, which are capable of generating a large diversity of signals, including, but not limited to,
`
quantum dots, nanoparticles, nanobarcodes, and the like, e.g. as disclosed in Freeman et al,
Proceedings SPIE, 5705: 114-121 (2005); Galitonov et al, Opt. Express, 14: 1382 (2006); Reiss et al,
J. Electroanal. Chem., 522: 95-103 (2002); Freeman et al, Methods Mol. Biol., 303: 73-83 (2005);
Nicewarner-Pena et al, Science, 294: 137-141 (2001); or the like.
`
`When antibodies are available to specifically bind to target molecules to create an
`
`association, oligonucleotide tags may be used as labels by forming antibody-oligonucleotide
`
conjugates, e.g. as disclosed in Ullman et al, Proc. Natl. Acad. Sci., 91: 5426-5430 (1994); Gullberg
et al, Proc. Natl. Acad. Sci., 101: 8420-8424 (2004); Sano et al, U.S. patent 5,665,539; Eberwine et
`
`al, U.S. patent 5,922,553; which are incorporated by reference. In one embodiment, oligonucleotide
`
`tags of specifically bound antibodies may be amplified and detected after washing away unbound
`
`conjugates. In another embodiment, a homogeneous format may be employed by using conjugates
`
having a photosensitizer-cleavable linkage, as taught in U.S. patent publication 2006/0204999, which
`
`is incorporated by reference. After capture of all antibodies, e.g. with protein A or G, the
`
`oligonucleotide tags of those specifically bound to target molecules may be released by a
`
`photosensitizer attached to a second antibody specific for a second epitope of the target molecule.
`
`When target molecules are nucleic acids, both specific binding compounds and labels may
`
`likewise be nucleic acids. Nucleic acid probes incorporating oligonucleotide tags and components
`
`for specifically binding to target nucleic acids may be produced in a variety of forms to permit
`
`association with target molecules. In particular, in one aspect of the invention, nucleic acid probes
`
of the invention associate with target nucleic acid by specific hybridization. Such specifically
hybridized probes are then altered so that they may be isolated or distinguished from non-specifically
`
hybridized probes. Such alteration and isolation may be carried out in many ways. For example, in
one aspect, such alteration is circularization of hybridized probes, e.g. by template-driven ligation,
`
`which renders associated probes resistant to exonuclease digestion, as illustrated in Fig. 4C. In
`
`another aspect, such alteration is the template-driven ligation of two or more probe components to
`
form a single nuclease-resistant product, as illustrated in Fig. 6A. In another aspect, such alteration
is extension by one or more nucleotides to add a capture moiety for physical separation from
non-extended probes. In another aspect, after combining 5'-exonuclease-resistant probes with a sample,
non-bound probes may be eliminated by digestion with a 3' exonuclease, such as exonuclease III,
after which the 3' ends of the bound probes are extended, e.g. with a DNA polymerase, and the
resulting complexes are treated with a 5' exonuclease, such as T7 exonuclease, to leave a population
`
`of extended probes that may be amplified and detected for enumerating the target molecules.
`
`As mentioned above, once an association between uniquely labeled probes and target
`
`molecules has been made, the number of different unique labels can be determined in a number of
`
`ways depending on the nature of the label. In the case of labels that comprise oligonucleotide tags, in
`
`one aspect, such determinations may be made by sorting to form successively less complex
`
`populations or by direct sequencing, as described more fully below.
`
`Counting By Sorting Oligonucleotide Tags
`
`In one aspect, binary tags are used to label molecules and the number of different binary tags
`
present is determined by sequence-specific sorting of the tags. Preferably, unique tags are attached to
`
`the molecules to be counted by a process of labeling by sampling, as described by Brenner et al, U.S.
`
`patent 5,846,719. Essentially, any type of molecule, or other structures such as nanoparticles, or the
`
`like, that can be labeled with an oligonucleotide tag, can be counted in accordance with the invention.
`
`Thus, molecules that can be counted include biomolecules, such as polynucleotides, proteins,
`
`antibodies, and so on. In one aspect, polynucleotides are the preferred molecules for counting
`
`because of the many ways available to attach oligonucleotide tags, e.g. ligation either as a whole or
stepwise in subunits, and to analyze and manipulate tag-polynucleotide conjugates, e.g. amplifying by
PCR or other nucleic acid amplification technology. In one aspect, the method of the invention is
`
`implemented by providing separate sets of tags for sorting (i.e. “sorting tags”) and for identifying
`
`different sorting tags. That is, a set of sorting tags are designed to facilitate the labeling and sorting
`
processes, whereas identification tags are designed for a specific readout device, such as a microarray
`
`or electrophoresis instrument. Binary tags are an example of sorting tags, whereas metric tags are an
`
`example of identification tags.
`
One embodiment of the invention for counting polynucleotides is illustrated in Figs. 1A-1F.
One counting approach is illustrated in Figs. 1A-1B, where the objective is to count how many
restriction fragments of a particular kind are present in a sample, e.g. a sample of genomic DNA from
50-100 cells. DNA (100) extracted from the sample is digested (105) with a restriction endonuclease
`
`
`having recognition sites (102) so that fragments (103) are produced. Preferably, a restriction
`
`endonuclease, or a combination of restriction endonucleases, is selected that produces fragments
`
having an expected size in the range of from 100-5000 nucleotides, and more preferably, in the range
of from 200-2000 nucleotides. Other fragment size ranges are possible; however, currently available
`
`replication and amplification steps work well within the preferred ranges. The object of the method
`
`is to count the number of f4 restriction fragments present in DNA (100) (and therefore, the sample of
`
50-100 cells). After digestion (105), adaptors (107) having complementary ends and containing
oligonucleotide tags, i.e. "tag adaptors," are ligated (106) to the fragments. In this example, there are
`
`100-200 fragments of each type, assuming a diploid organism. Each collection of ends of each type
`
`of fragment requires 100—200 tag adaptors in the ligation reaction; in effect, each collection of ends
`
`samples the population of tag adaptors. In accordance with the invention, the tag adaptors
`
`collectively include a population of tags sufficiently large so that such a sample contains substantially
`
all unique tags. In one aspect, the size of the set of tags is at least ten times the number of fragments
to be counted; in another aspect, the size of the set of tags is at least 100 times the number of
fragments to be counted. After tag adaptors (107) are ligated, one of the tag adaptors on each
fragment is exchanged for a selection adaptor (109) (which is the same for all fragments) so that each
`
`fragment has only a single tag and so that the molecular machinery necessary for carrying out
`
sequence-specific selection is put in place. (Fig. 1C provides a more detailed illustration of the
structure of the fragments at this point). One way to exchange a tag adaptor for a selection adaptor is
described below and in Figs. 2A-2B. After fragments of interest (110) have both adaptors attached,
`
`they are sorted from the rest of the fragments by the sequence-specific sorting process described in
`
`Appendix I. Briefly, such sorting is accomplished by repeated cycles of primer annealing to the
`
`selection adaptor, primer extension to add a biotinylated base only if fragments have a complement
`
`identical to that of the desired fragments, removing the biotinylated complexes, and replicating the
`
`captured fragments. That is, the selection is based on the sequence of the fragments adjacent to
`
`selection adaptor (109). One controls the fragments selected by controlling which incorporated
`
`nucleotide has a capture moiety in each cycle. After such sorting, the number of different tags in the
`
population of fragments (110) is determined by successively sorting (116) the binary tags into two
separate aliquots. The same sorting procedure of Appendix I is used. In this case, the selection is
based on the words, or subunits, of the binary tags in fragments (110). After each sorting step, the
`
`resulting aliquots are tested for the presence or absence of fragments. A variety of testing procedures
`
`can be used and such selection is a matter of design choice and routine practice. In one aspect,
`
`aliquots are assayed using a PCR, which can be implemented with one or more controls or internal
`
`standards for confirming the absence of fragments. The sorting process continues until there is an
`
`aliquot with no fragments detected. Such a process is outlined in Fig. 1B for an initial number of 225
`
(118). In each sorting step (120), the number of fragments sorted into each aliquot will usually be
`
`about the same, because about the same number of tags will have a word of each type at each
`
`
`position. Of course, statistical flukes are possible, in which case, the counting process may be
`
`repeated. In accordance with the invention, not all of the possible branches of a sorting process need
`
`be carried out. Selection of a particular pathway is a matter of design choice. For example, in the
`
`first sorting step, the 225 fragments are shown to be divided into subsets of 111 (122) and 114 (124).
`During the sorting process, of course, these quantities are not known. Only the presence or absence
`
of fragments is determined. The numbers in Fig. 1B are presented only for illustration to show how
repeated sorting eventually results in an aliquot with no fragments. As also illustrated, the selection
of pathway can affect the determination of the number of molecules in the original mixture.
`
`However, statistically any preselected pathway should be equivalent. The confidence in a result can
`
`be increased by repeating the sorting process or by carrying out sorting along several pathways in
`
`parallel. The greatest variability occurs when the number of fragments becomes small, as indicated
`
`by examining pathways between sorting step 7 and 9, where one pathway results in no fragments
`
`detected (126) at step 8 and another pathway results in no fragments detected (128) at step 9. In this
`
`example, the number of molecules in the original mixture can be determined to be in the range
`
between 2^(8-1) (=128) and 2^(9-1) (=256). Alternative algorithms may be used within the scope of
the inventive concept to determine or estimate the number of molecules in the original mixture.
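The sorting procedure of Fig. 1B can be mimicked in a small simulation. The following is a rough sketch, not the disclosed laboratory procedure: tags are modeled as strings of "0"/"1" words, the tag length of 32 words and all names are hypothetical, and only the presence or absence of tags in each aliquot is inspected, as in the text.

```python
import random


def random_unique_tags(count, num_positions, seed=0):
    """Draw `count` distinct binary tags, each written as a string of 0/1 words."""
    rng = random.Random(seed)
    # Sampling without replacement models "substantially all unique" tags.
    values = rng.sample(range(2 ** num_positions), count)
    return [format(v, f"0{num_positions}b") for v in values]


def steps_until_empty(tags, pathway):
    """Sort tags position by position; return the step at which an aliquot is empty.

    `pathway` gives, for each step, which word ("0" or "1") to keep sorting.
    It must not be longer than the tags themselves.
    """
    pool = list(tags)
    for step, keep in enumerate(pathway, start=1):
        position = step - 1
        aliquots = {"0": [], "1": []}
        for tag in pool:
            aliquots[tag[position]].append(tag)
        # Only presence or absence is tested, never the actual counts.
        if not aliquots["0"] or not aliquots["1"]:
            return step
        pool = aliquots[keep]
    raise ValueError("pathway ended before an empty aliquot was found")


if __name__ == "__main__":
    tags = random_unique_tags(225, 32)  # 225 fragments, as in Fig. 1B
    r = steps_until_empty(tags, "0" * 32)
    print(f"empty aliquot at step {r}; estimate 2**({r}-1) = {2 ** (r - 1)}")
```

Running several pathways (for example "0" * 32 and "1" * 32) and taking the lowest and greatest step numbers gives the kind of range described above; different pathways may end at different steps, which is the variability the text attributes to small fragment numbers.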
`
`As mentioned above, Fig. 1C provides a structure of fragments having different adaptors at
`
`different ends, sometimes referred to herein as “asymmetric” fragments. Exemplary fragments (110)
`
are redrawn to show more structure. The fragments each comprise selection adaptor (129),
restriction fragment (133), and tag adaptor (135). Tag adaptor (135) comprises primer binding sites
(134) and (130), and sandwiched between such sites are binary tags (132). Primer binding site (134)
allows amplification of binary tag (132) and selection of binary tag (132) during a sorting procedure.
The binary nature of the binary tags is shown by indicating words as open and darkened boxes; that
`
`is, there are two choices of word at each position. For