(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)

(19) World Intellectual Property Organization
International Bureau

(43) International Publication Date: 28 June 2007 (28.06.2007)
(10) International Publication Number: WO 2007/073171 A2

(51) International Patent Classification: C12Q 1/68 (2006.01)
(21) International Application Number: PCT/NL2006/000654
(22) International Filing Date: 21 December 2006 (21.12.2006)
(25) Filing Language: English
(26) Publication Language: English
(30) Priority Data: 60/752,591, 22 December 2005 (22.12.2005), US

(71) Applicant (for all designated States except US): KEYGENE N.V.
[NL/NL]; 90, Agro Business Park, NL-6708 PW Wageningen (NL).
(72) Inventor; and
(75) Inventor/Applicant (for US only): VAN EIJK, Michael, Josephus,
Theresia [NL/NL]; 12, Pastoor Strijboschstraat, NL-5373 EJ Herpen (NL).
(74) Agent: DE LANG, R.-J.; Exter Polak & Charlouis B.V., P.O. Box
3241, NL-2280 GE Rijswijk (NL).

(81) Designated States (unless otherwise indicated, for every kind of
national protection available): AE, AG, AL, AM, AT, AU, AZ, BA, BB,
BG, BR, BW, BY, BZ, CA, CH, CN, CO, CR, CU, CZ, DE, DK, DM, DZ, EC,
EE, EG, ES, FI, GB, GD, GE, GH, GM, GT, HN, HR, HU, ID, IL, IN, IS,
JP, KE, KG, KM, KN, KP, KR, KZ, LA, LC, LK, LR, LS, LT, LU, LV, LY,
MA, MD, MG, MK, MN, MW, MX, MY, MZ, NA, NG, NI, NO, NZ, OM, PG, PH,
PL, PT, RO, RS, RU, SC, SD, SE, SG, SK, SL, SM, SV, SY, TJ, TM, TN,
TR, TT, TZ, UA, UG, US, UZ, VC, VN, ZA, ZM, ZW.

(84) Designated States (unless otherwise indicated, for every kind of
regional protection available): ARIPO (BW, GH, GM, KE, LS, MW, MZ,
NA, SD, SL, SZ, TZ, UG, ZM, ZW), Eurasian (AM, AZ, BY, KG, KZ, MD,
RU, TJ, TM), European (AT, BE, BG, CH, CY, CZ, DE, DK, EE, ES, FI,
FR, GB, GR, HU, IE, IS, IT, LT, LU, LV, MC, NL, PL, PT, RO, SE, SI,
SK, TR), OAPI (BF, BJ, CF, CG, CI, CM, GA, GN, GQ, GW, ML, MR, NE,
SN, TD, TG).

Published:
- without international search report and to be republished upon
receipt of that report

For two-letter codes and other abbreviations, refer to the "Guidance
Notes on Codes and Abbreviations" appearing at the beginning of each
regular issue of the PCT Gazette.

(54) Title: IMPROVED STRATEGIES FOR TRANSCRIPT PROFILING USING HIGH
THROUGHPUT SEQUENCING TECHNOLOGIES

(57) Abstract: Described is a method for determining a nucleotide
sequence within cDNA, the frequency of a nucleotide sequence in a
cDNA sample, as well as a method for (unbiased) determination of
relative transcript levels of genes without sequence information of
these genes being required, said methods using complexity reduction
and (high throughput) sequencing.
Title: Improved strategies for transcript profiling using high
throughput sequencing technologies
`
Technical Field
The present invention relates to the fields of molecular biology and
genetics. The invention relates to improved strategies for
determining the sequence of transcripts based on the use of high
throughput sequencing technologies. The invention further relates to
improved strategies for unbiased transcript profiling.
`
Background of the invention
Transcript profiling is one of the cornerstone technologies used in
modern day biotechnology research. The main application domain of
transcript profiling is discovery of genes involved in complex
traits. This includes a wide range of biological phenomena such as
discovery of genes involved in (human) disease in order to identify
targets for development of medication (target discovery), unraveling
biochemical pathways controlling synthesis of biomolecules
(fermentation industry), dissection of complex traits for plant and
animal breeding (gene discovery) and many others.
A second application domain follows the reverse route, i.e. to use
transcript profiling for routine diagnostic determination of
transcript profiles of (a selected subset of) genes in order to
predict a complex phenotype. Examples in this category are molecular
classification, diagnosis and prediction of clinical prognosis of
human breast cancer (Van de Vijver et al., 2002, N. Engl. J. Med.,
vol. 347(25):1999-2009; van 't Veer et al., 2002, Breast Cancer
Res., vol. 5(1):57-87; www.agendia.com) and papillary renal cell
carcinoma (Yang et al., 2005). Approaches for the identification of
relevant genes based on transcript profiling data collected in
segregating
populations are described by Schadt and co-workers (2005, Sci. STKE,
vol. 296:pe40). In brief, transcript profiling is of paramount
importance in life sciences research.

Technologies for transcript profiling have evolved rapidly over the
past 10 years. Until the early nineties (shortly after the widespread
availability of PCR), transcript profiling was performed by Northern
blot analysis or RNAse protection assays. While these techniques are
fairly specific and sensitive (especially RNAse protection assays),
limitations of these technologies are that only one or a few genes
can be analyzed at a time (low throughput), while the procedures are
tedious and time-consuming. In addition, both methods require the use
of radioactive labeling techniques, which poses health hazards.
With the advent of the differential display (DD) technique in 1992
(Liang & Pardee, 1992, Science, vol. 257(5072):967-71), and many
modifications and improvements of DD (e.g. Ordered Differential
Display, Matz et al., 1997, Nucl. Acids Res., vol. 25(12):2541-2), a
first step was taken towards multiplexed transcript profiling.
Characteristics of DD are that random subsets of genes are targeted
by low-stringency annealing of a randomly designed PCR primer to the
cDNA sample to be analyzed, resulting in preferential amplification
of expressed transcripts containing sequences with high homology to
the PCR primer used. Next, the amplification products are resolved on
sequence gels, resulting in a fingerprint pattern representing
subsets of transcribed genes. While DD methods have higher throughput
compared to Northern blots and RNAse protection assays, their
limitation is the fairly low reproducibility / robustness of these
techniques. This is in part due to non-specific annealing of the
random PCR primer used. Consequently, fingerprint patterns generated
using different random primers do not systematically target different
(complementary) subsets of transcripts. A further disadvantage is
that DD methods require preparation of slab-gels or detection by
capillary gel-electrophoresis. Yet another limitation is that the
gene origin of observed bands in the fingerprints is not known, which
requires band excision, elution, re-amplification and DNA
sequencing to reveal; the latter limitation is shared with other
fingerprint-based transcript profiling methods. Finally, with
detection of 50-100 fragments per lane on a gel / capillary trace,
the technology is moderately multiplexed.
The cDNA-AFLP method (Bachem et al., 1996, Plant J., vol.
9(5):745-53) addresses two of the main limitations of DD technology,
namely reproducibility/robustness and complementarity of information
obtained in fingerprints generated with different PCR primers. The
robustness and reproducibility of the cDNA-AFLP method is very high
because amplification of adaptor-ligated restriction fragments using
selective AFLP® (Keygene N.V., the Netherlands; see e.g. EP 0 534 858
and Vos P., et al. (1995). AFLP: a new technique for DNA
fingerprinting. Nucleic Acids Research, vol. 23, No. 21, p.
4407-4414) primers takes place under high-stringency conditions,
resulting in highly reproducible fingerprint patterns. In addition,
the use of selective AFLP primers with different selective
nucleotides ensures that fingerprints containing complementary
information are obtained. Hence cDNA-AFLP technology enables
reproducible sampling of subsets of the transcriptome. Another
advantage of (cDNA-)AFLP (and DD) is that no prior sequence
information is needed and the technology can therefore be applied to
a wide range of organisms. Limitations of cDNA-AFLP are its moderate
multiplexing levels per lane/trace and the fact that the gene origin
of bands is not known directly (see also DD).

The limitations in multiplexing levels of the above described
transcript profiling methods have been addressed by both SAGE (Serial
Analysis of Gene Expression; Velculescu et al., 1995, Science, vol.
270(5235):484-7) and Massively Parallel Signature Sequencing (MPSS;
Brenner et al., 2000, Nature Biotechnology, vol. 18(6):630-4; Meyers
et al., 2004, Nature Biotechnology, vol. 22(8):1006-11). Like
cDNA-AFLP, both methods use type IIs restriction enzymes to cut
sample cDNA, followed by adaptor ligation.
In SAGE, adaptor-ligated fragments are subsequently concatenated and
sequenced by Sanger sequencing. Short 14-20 bp
sequence tags are extracted from the Sanger sequence trace, providing
quantitative information about the transcribed genes (“digital
Northern”). By comparing the frequency of tags between samples,
information is obtained about relative expression levels between
investigated samples, without the need for prior sequence
information. Although this results in (accurate) determination of
relative transcript abundance in different samples, given the short
sequence tags obtained it is difficult to assess from which genes the
tags are derived, unless large EST collections or the whole genome
sequence of the investigated organism are available and tag sequences
can be subjected to homology searches such as BLAST (Basic Local
Alignment Search Tool) analysis. Hence, although SAGE is highly
multiplexed, reproducible and robust, its value is limited to
organisms with sequenced genomes. Another limitation is that the
method is not very amenable to processing large samples (low
throughput) due to the costs of large-scale Sanger sequencing.
Contrary to SAGE, MPSS is based on solid phase sequencing reactions.
However, MPSS essentially suffers from the same limitations as SAGE,
i.e. that very short sequence tags (approximately 20 bp) are
obtained, which strongly limits further follow-up (gene
identification / assay conversion) of interesting sequence tags in
organisms for which limited (genome) sequence is available. In
summary, although SAGE and MPSS are robust and highly multiplexed
transcript profiling technologies which do not require prior sequence
information to apply, their value is in practice limited to organisms
for which the whole genome sequences have been determined or large
EST collections are available in order to connect sequence tags to
genes. Both methods are low-throughput and technically complex.
Conceptual strong points are that both methods rely on statistical
sampling of transcript libraries (resulting in “digital Northerns”)
in combination with accurate sequence determination, which provides
for unbiased estimates of (relative) transcription levels of many
genes simultaneously
and the fact that transcript profiling does not suffer from
cross-hybridization to probes on solid supports.
In 1995, gene expression microarrays were introduced (Schena et al.,
1995, Science, vol. 270(5235):467-70), which presented a paradigm
shift in the transcript profiling field. While initially so-called
“spotted” microarrays containing EST-derived PCR products as probes
were used, in subsequent years the focus has shifted towards
oligonucleotide DNA chips (Pease et al., 1994, Proc. Nat. Ac. Sci.
USA, vol. 91(11):5022-6), because of their higher robustness and
scaling flexibility. Currently, the transcript profiling market is
dominated by oligonucleotide DNA chips from various suppliers (e.g.
Affymetrix, Nimblegen, Agilent etc). The power of DNA chips lies in
the large number of DNA sequences that can be attached / synthesized
on their surface, which enables massively parallel transcript
profiling, allowing e.g. transcript profiling for all known human
genes (= high multiplexing level of genes). In addition, the process
of chip fabrication and hybridization can be automated and
controlled, allowing for high throughput and robustness,
respectively. Consequently, DNA chips are the state-of-the-art for
transcript profiling anno 2005. However, while multiplexing capacity,
throughput and robustness are very important strong points of DNA
chips, two important limitations of chip-based transcript profiling
are that sequence information is needed in order to be able to build
the chip and that cross-hybridization between highly homologous
sequences such as those derived from members of duplicated gene
families may affect the accuracy of the results. The latter is very
difficult to monitor/exclude, because it is an intrinsic
characteristic of hybridization-based detection. Due to these facts,
comparison of results obtained using DNA chips from different
suppliers (reflecting different underlying production technologies
and application protocols) is difficult to perform (Yauk et al.,
2005, Nucleic Acids Research, vol. 32(15):e124). Within one platform,
validation of results by an independent method such as real-time PCR
assays (e.g. TaqMan, Invader) is needed. Thus, DNA
chips do not provide data fitting the concept of a digital Northern
but are useful for determination of relative expression levels if the
same platform is used for all samples.
Ideally, a transcript profiling technology is highly multiplexed,
i.e. many genes can be investigated simultaneously, high throughput,
very robust and reproducible, highly accurate (not suffering from
cross-hybridization) and applicable without the need for prior
sequence information. The invention described below provides for
methods fitting such criteria.
`
Summary of the invention
The present inventors have now found that with a different strategy
this problem can be solved and the high throughput sequencing
technologies can be efficiently used in transcript profiling.
The invention comprises employing a technology that preferably
divides the transcriptome in reproducible subsets. The subsets are
sequenced and assembled into contigs corresponding to individual
transcripts. By repeating this step in such a way that a different
reproducible subset is provided, different sets of contigs are
obtained. These different contigs are used to assemble the draft
sequences of the transcripts. The invention does not require any
knowledge of the sequence and can be applied to transcripts of any
complexity. The invention is also applicable to a combination of
transcripts e.g. derived from different tissues of the same organism
or different organisms. The present invention provides quicker and
more reliable access to any transcript of interest and thereby
provides for accelerated analysis of the transcript.
The invention is also directed to (unbiased) determination of
relative transcript levels of genes without sequence information of
these genes being required. To this end, the frequency of a sequence
within a cDNA sample is determined by sequencing of
complexity-reduced libraries of said cDNA sample and alignment of the
sequences to determine the number of times the sequence is identified
in the libraries. This may be
repeated for a second cDNA sample, and the frequencies of the two
cDNA samples may be normalized, if required, and compared to
determine relative transcription levels.
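The frequency comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the claimed method: reads are treated as the same sequence only on an exact string match (the text calls for alignment), normalization is a simple per-million scaling, and the function names and toy reads are hypothetical.

```python
from collections import Counter


def sequence_frequencies(reads):
    """Count how often each identical sequence occurs in one library."""
    return Counter(reads)


def normalize(counts, scale=1_000_000):
    """Express counts per `scale` reads so libraries of different size compare."""
    total = sum(counts.values())
    return {seq: n * scale / total for seq, n in counts.items()}


def relative_levels(norm_a, norm_b):
    """Ratio sample B / sample A for every sequence seen in both samples."""
    return {seq: norm_b[seq] / norm_a[seq] for seq in norm_a if seq in norm_b}


# Toy complexity-reduced libraries from two cDNA samples (hypothetical reads).
sample_a = ["ACGTAC", "ACGTAC", "TTGCAA", "GGATCC"]
sample_b = ["ACGTAC", "TTGCAA", "TTGCAA", "TTGCAA", "TTGCAA",
            "GGATCC", "GGATCC", "GGATCC"]

norm_a = normalize(sequence_frequencies(sample_a))
norm_b = normalize(sequence_frequencies(sample_b))
ratios = relative_levels(norm_a, norm_b)
```

With these toy libraries the sequence ACGTAC is four-fold less frequent in the second sample and TTGCAA is two-fold more frequent, even though the second library holds twice as many reads; the normalization step is what makes the two counts comparable.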
`
`
`
Definitions
In the following description and examples a number of terms are used.
In order to provide a clear and consistent understanding of the
specification and claims, including the scope to be given such terms,
the following definitions are provided. Unless otherwise defined
herein, all technical and scientific terms used have the same meaning
as commonly understood by one of ordinary skill in the art to which
this invention belongs. The disclosures of all publications, patent
applications, patents and other references are incorporated herein in
their entirety by reference.
Nucleic acid: a nucleic acid according to the present invention may
include any polymer or oligomer of pyrimidine and purine bases,
preferably cytosine, thymine, and uracil, and adenine and guanine,
respectively (See Albert L. Lehninger, Principles of Biochemistry, at
793-800 (Worth Pub. 1982) which is herein incorporated by reference
in its entirety for all purposes). The present invention contemplates
any deoxyribonucleotide, ribonucleotide or peptide nucleic acid
component, and any chemical variants thereof, such as methylated,
hydroxymethylated or glycosylated forms of these bases, and the like.
The polymers or oligomers may be heterogenous or homogenous in
composition, and may be isolated from naturally occurring sources or
may be artificially or synthetically produced. In addition, the
nucleic acids may be DNA or RNA, or a mixture thereof, and may exist
permanently or transitionally in single-stranded or double-stranded
form, including homoduplex, heteroduplex, and hybrid states.
Complexity reduction: the term complexity reduction is used to denote
a method wherein the complexity of a nucleic acid sample, such as
genomic DNA, is reduced by the generation of a subset of the sample.
This subset can be representative for the whole (i.e. complex) sample
and is preferably a
reproducible subset. Reproducible means in this context that when the
same sample is reduced in complexity using the same method, the same,
or at least comparable, subset is obtained. The method used for
complexity reduction may be any method for complexity reduction known
in the art. Non-limiting examples of methods for complexity reduction
include AFLP® (Keygene N.V., the Netherlands; see e.g. EP 0 534 858),
the methods described by Dong (see e.g. WO 03/012118, WO 00/24939),
indexed linking (Unrau et al., 1994, Gene, 145:163-169), those
disclosed in US 2005/260628, WO 03/010328, US 2004/10153, genome
portioning (see e.g. WO 2004/022758), Serial Analysis of Gene
Expression (SAGE; see e.g. Velculescu et al., 1995, see above, and
Matsumura et al., 1999, The Plant Journal, vol. 20(6):719-726) and
modifications of SAGE (see e.g. Powell, 1998, Nucleic Acids Research,
vol. 26(14):3445-3446; and Kenzelmann and Mühlemann, 1999, Nucleic
Acids Research, vol. 27(3):917-918), MicroSAGE (see e.g. Datson et
al., 1999, Nucleic Acids Research, vol. 27(5):1300-1307), Massively
Parallel Signature Sequencing (MPSS; see e.g. Brenner et al., 2000,
Nature Biotechnology, vol. 18:630-634 and Brenner et al., 2000, PNAS,
vol. 97(4):1665-1670), self-subtracted cDNA libraries (Laveder et
al., 2002, Nucleic Acids Research, vol. 30(9):e38), Real-Time
Multiplex Ligation-dependent Probe Amplification (RT-MLPA; see e.g.
Eldering et al., 2003, vol. 31(23):e153), High Coverage Expression
Profiling (HiCEP; see e.g. Fukumura et al., 2003, Nucleic Acids
Research, vol. 31(16):e94), a universal micro-array system as
disclosed in Roth et al., 2004, Nature Biotechnology, vol.
22(4):418-426, a transcriptome subtraction method (see e.g. Li et
al., Nucleic Acids Research, vol. 33(16):e136), and fragment display
(see e.g. Metsis et al., 2004, Nucleic Acids Research, vol.
32(16):e127). The complexity reduction methods used in the present
invention have in common that they are reproducible. Reproducible in
the sense that when the same sample is reduced in complexity in the
same manner, the same subset of the sample is obtained, as opposed to
more random complexity reduction such as microdissection or the use
of mRNA (cDNA) which represents a portion of the genome
transcribed in a selected tissue and whose reproducibility depends on
the selection of tissue, time of isolation, and the like.
Tagging: the term tagging refers to the addition of a tag to a
nucleic acid sample in order to be able to distinguish it from a
second or further nucleic acid sample. Tagging can e.g. be performed
by the addition of a sequence identifier during complexity reduction
or by any other means known in the art. Such sequence identifier can
e.g. be a unique base sequence of varying but defined length uniquely
used for identifying a specific nucleic acid sample. Typical examples
thereof are for instance ZIP sequences. Using such a tag, the origin
of a sample can be determined upon further processing. In case of
combining processed products originating from different nucleic acid
samples, the different nucleic acid samples should be identified
using different tags.
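As an illustration of how a sequence identifier lets pooled reads be traced back to their sample, the following Python sketch sorts reads by a leading tag. It is a hypothetical example: the tag sequences, the sample names and the assumption that the tag sits at the 5' end of each read are not taken from the text, and no tag is allowed to be a prefix of another.

```python
def demultiplex(reads, tags):
    """Assign each read to a sample by its leading tag and strip the tag."""
    by_sample = {sample: [] for sample in tags.values()}
    unassigned = []
    for read in reads:
        for tag, sample in tags.items():
            if read.startswith(tag):
                by_sample[sample].append(read[len(tag):])
                break
        else:
            # No known tag found: the origin of this read cannot be determined.
            unassigned.append(read)
    return by_sample, unassigned


# Hypothetical 4-base identifiers, one per nucleic acid sample.
tags = {"ACAC": "leaf", "TGTG": "root"}
reads = ["ACACGGATCCA", "TGTGGGATCCA", "ACACTTAAGGC", "CCCCGGATCCA"]

by_sample, unassigned = demultiplex(reads, tags)
```

The same fragment (GGATCCA) is recovered from both samples here, and only the tag tells the two copies apart, which is why each pooled sample needs its own identifier.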
Tagged library: the term tagged library refers to a library of tagged
nucleic acid.
Sequencing: the term sequencing refers to determining the order of
nucleotides (base sequences) in a nucleic acid sample, e.g. DNA or
RNA.
Aligning and alignment: with the terms “aligning” and “alignment” is
meant the comparison of two or more nucleotide sequences based on the
presence of short or long stretches of identical or similar
nucleotides. Several methods for alignment of nucleotide sequences
are known in the art, as will be further explained below. Sometimes
the terms ‘assembling’ or ‘clustering’ are used as a synonym,
although these terms are technically not identical. Alignment takes
place based on comparing maximum homology, whereas assembling means
preparing a contig based on an overlap.
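To make the notion of alignment as a comparison of identical stretches concrete, the sketch below slides one sequence along another and reports the position with the most identical nucleotides. It is a deliberately simple, ungapped comparison written for illustration; the function name is hypothetical, and alignment tools used in practice also handle gaps and scoring schemes that are not modelled here.

```python
def best_ungapped_alignment(a, b):
    """Slide b along a; return (offset, matches) with the most identical positions.

    `offset` is the index in `a` at which b[0] is placed (negative if b
    starts before a). No gaps are allowed in this simplified comparison.
    """
    best_offset, best_matches = 0, -1
    for offset in range(-(len(b) - 1), len(a)):
        matches = sum(
            1
            for i, base in enumerate(b)
            if 0 <= offset + i < len(a) and a[offset + i] == base
        )
        if matches > best_matches:
            best_offset, best_matches = offset, matches
    return best_offset, best_matches


offset, matches = best_ungapped_alignment("ACGTTGCAGG", "TTGCTGG")
```

For this pair the best placement is at offset 3, where six of the seven nucleotides are identical; the single mismatch shows that alignment tolerates similar rather than strictly identical stretches.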
High-throughput screening: High-throughput screening, often
abbreviated as HTS, is a method for scientific experimentation
especially relevant to the fields of biology and chemistry. Through a
combination of modern robotics and other specialized laboratory
hardware, it allows a researcher to effectively screen large amounts
of samples simultaneously.
High-throughput sequencing: determining the sequence of a nucleotide
sequence using high-throughput techniques.
Restriction endonuclease: a restriction endonuclease or restriction
enzyme is an enzyme that recognizes a specific nucleotide sequence
(target site) in a double-stranded DNA molecule, and will cleave both
strands of the DNA molecule at every target site.
Restriction fragments: the DNA molecules produced by digestion with a
restriction endonuclease are referred to as restriction fragments.
Any given genome (or nucleic acid, regardless of its origin) will be
digested by a particular restriction endonuclease into a discrete set
of restriction fragments. The DNA fragments that result from
restriction endonuclease cleavage can be further used in a variety of
techniques and can for instance be detected by gel electrophoresis.
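The statement that a given nucleic acid is digested into a discrete set of fragments can be illustrated with an in-silico digest. The Python sketch below is a simplification under stated assumptions: only the top strand is modelled, the overhangs left by the enzyme are ignored, and the `digest` helper and the input sequence are hypothetical. EcoRI, which recognizes GAATTC and cuts after the first G, is used as the example enzyme.

```python
def digest(sequence, site, cut_offset):
    """Cut `sequence` at every occurrence of `site`.

    `cut_offset` is the position within the site (top strand) after which
    the enzyme cuts, e.g. 1 for EcoRI (G^AATTC).
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        fragments.append(sequence[start:pos + cut_offset])
        start = pos + cut_offset
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])
    return fragments


# Hypothetical sequence containing two EcoRI target sites.
dna = "TTGAATTCAAACCCGAATTCGG"
fragments = digest(dna, "GAATTC", 1)
```

Running the digest twice on the same input returns the same three fragments, which is the property that makes restriction-based complexity reduction reproducible.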
Gel electrophoresis: in order to detect restriction fragments, an
analytical method for fractionating double-stranded DNA molecules on
the basis of size can be required. The most commonly used technique
for achieving such fractionation is (capillary) gel electrophoresis.
The rate at which DNA fragments move in such gels depends on their
molecular weight; thus, the distances traveled decrease as the
fragment lengths increase. The DNA fragments fractionated by gel
electrophoresis can be visualized directly by a staining procedure
e.g. silver staining or staining using ethidium bromide, if the
number of fragments included in the pattern is sufficiently small.
Alternatively further treatment of the DNA fragments may incorporate
detectable labels in the fragments, such as fluorophores or
radioactive labels.
`
Ligation: the enzymatic reaction catalyzed by a ligase enzyme in
which two double-stranded DNA molecules are covalently joined
together is referred to as ligation. In general, both DNA strands are
covalently joined together, but it is also possible to prevent the
ligation of one of the two strands through chemical or enzymatic
modification of one of
the ends of the strands. In that case the covalent joining will occur
in only one of the two DNA strands.
Synthetic oligonucleotide: single-stranded DNA molecules having
preferably from about 10 to about 50 bases, which can be synthesized
chemically are referred to as synthetic oligonucleotides. In general,
these synthetic DNA molecules are designed to have a unique or
desired nucleotide sequence, although it is possible to synthesize
families of molecules having related sequences and which have
different nucleotide compositions at specific positions within the
nucleotide sequence. The term synthetic oligonucleotide will be used
to refer to DNA molecules having a designed or desired nucleotide
sequence.
Adaptors: short double-stranded DNA molecules with a limited number
of base pairs, e.g. about 10 to about 30 base pairs in length, which
are designed such that they can be ligated to the ends of restriction
fragments. Adaptors are generally composed of two synthetic
oligonucleotides, which have nucleotide sequences that are partially
complementary to each other. When mixing the two synthetic
oligonucleotides in solution under appropriate conditions, they will
anneal to each other forming a double-stranded structure. After
annealing, one end of the adaptor molecule is designed such that it
is compatible with the end of a restriction fragment and can be
ligated thereto; the other end of the adaptor can be designed so that
it cannot be ligated, but this need not be the case (double ligated
adaptors).
Adaptor-ligated restriction fragments: restriction fragments that
have been capped by adaptors as a result of ligation.
Primers: in general, the term primers refers to a DNA strand which
can prime the synthesis of DNA. DNA polymerase cannot synthesize DNA
de novo without primers: it can only extend an existing DNA strand in
a reaction in which the complementary strand is used as a template to
direct the order of nucleotides to be assembled. We will refer to the
synthetic
oligonucleotide molecules that are used in a polymerase chain
reaction (PCR) as primers.
DNA amplification: the term DNA amplification will be typically used
to denote the in vitro synthesis of double-stranded DNA molecules
using PCR. It is noted that other amplification methods exist and
they may be used in the present invention without departing from the
gist.
`
Detailed description of the invention
The present invention provides for a method for determining a
nucleotide sequence of cDNA comprising the steps of:
(a) Providing cDNA;
(b) Performing a complexity reduction on at least a portion of the
cDNA to obtain a first library of the cDNA comprising cDNA fragments;
(c) Determining at least part of the nucleotide sequences of the cDNA
fragments of the first library by high-throughput sequencing;
(d) Aligning the nucleotide sequences of the cDNA fragments of the
first library of step (c) to generate contigs of the first library;
and
(e) Determining the nucleotide sequence of the cDNA.
Hitherto in the art of sequencing technology, the use of this
complexity reduction in combination with high-throughput sequence
determination of cDNA to represent transcripts has not been disclosed
or suggested.
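The contig-generation step of the method can be pictured with a small greedy overlap assembler. The Python sketch below is an illustration only and is not presented as the procedure used by the inventors: it assumes error-free reads from a single strand, merges on exact suffix-prefix overlaps of at least `min_len` bases (a hypothetical threshold), and uses made-up reads.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b, or 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def assemble(reads, min_len=4):
    """Greedily merge the best-overlapping pair until no pair overlaps."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs)
                   if n not in (best_i, best_j)] + [merged]


# Hypothetical reads sequenced from one cDNA fragment of the first library.
reads = ["ATGGCGTACG", "GTACGTTAGC", "TTAGCCTAGG", "CTAGGATCC", "GTACGTTAGC"]
contigs = assemble(reads)
```

The five overlapping reads collapse into a single 24-base contig. Reads from unrelated fragments would remain as separate contigs, one per fragment, because they share no sufficiently long overlap.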
In step (a) of the method, cDNA is provided. It is well known in the
art how to prepare cDNA. A method for the preparation is set forth
below. However, any method for the preparation of cDNA may be used.
cDNA (complementary DNA) is usually prepared from mRNA using reverse
transcriptase. In that case, reverse transcriptase synthesizes a DNA
strand complementary to an RNA template if it is provided with a
primer that is base-paired to the RNA and contains a free 3’-OH
group. Such primer can e.g. be an oligo-dT primer that pairs with the
poly-A sequence at
the 3’ end of most eucaryotic mRNA molecules. The rest of the cDNA
strand can then be synthesized in the presence of the four
deoxyribonucleoside triphosphates. The RNA strand of the resulting
RNA-DNA hybrid is subsequently hydrolyzed, e.g. by raising the pH.
Unlike RNA, DNA is resistant to alkaline hydrolysis, such that the
DNA strand remains intact. An alternative primer can be a random
primer. The random priming of cDNA may be beneficial when the reverse
transcriptase fails to fully transcribe an mRNA template or if
secondary structures exist. Yet an alternative primer can be a
sequence-specific primer.
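The relationship between an mRNA and its first-strand cDNA can be shown with a short Python sketch. It rests on simplifying assumptions that are not part of the text: reverse transcription is taken to run over the full template, the oligo-dT primer is taken to need a poly-A stretch of at least `min_poly_a` bases at the 3' end (a hypothetical threshold), and the mRNA sequence is made up.

```python
# Base pairing between an RNA template and the DNA strand copied from it.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A"}


def first_strand_cdna(mrna, min_poly_a=10):
    """Return the first-strand cDNA (written 5'->3') primed by oligo-dT.

    Returns None when the mRNA has no poly-A tail for the primer to pair
    with; a random or sequence-specific primer would be needed instead.
    """
    if not mrna.endswith("A" * min_poly_a):
        return None
    return "".join(COMPLEMENT[base] for base in reversed(mrna))


mrna = "AUGGCUUCGUAA" + "A" * 12   # hypothetical transcript with a poly-A tail
cdna = first_strand_cdna(mrna)
```

The cDNA starts with the run of T residues contributed by the oligo-dT primer and ends with the complement of the 5' end of the transcript, which is why incomplete reverse transcription tends to lose the 5' portion of a transcript.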
`
Methods for isolation of RNA from cells of a tissue of an organism or
an organism itself are well known in the art of molecular biology.
Moreover, many commercially available kits for cDNA synthesis can be
purchased, such as e.g. from ABgene, Ambion, Applied Biosystems,
BioChain, Bio-Rad, Clontech, GE Healthcare, GeneChoice, Invitrogen,
Novagen, Qiagen, Roche Applied Science, Stratagene, and the like.
Such methods are e.g. described in Sambrook et al. (Sambrook, J.,
Fritsch, E.F., and Maniatis, T., in Molecular Cloning: A Laboratory
Manual. Cold Spring Harbor Laboratory Press, NY, Vol. 1, 2, 3
(1989)). RNA may be isolated from several sources such as a cell
culture, a tissue, etc.
In step (b) of the method according to the present invention, a
complexity reduction is performed on at least a portion of the cDNA
to obtain a first library of the cDNA comprising cDNA fragments. Many
methods for complexity reduction are known in the art, as indicated
in the definition section.
In one embodiment of the invention, the step of complexity reduction
of the nucleic acid sample comprises enzymatically cutting the
nucleic acid sample in restriction fragments, separating the
restriction fragments and selecting a particular pool of restriction
fragments. Optionally, the selected fragments are then ligated to
adaptor sequences containing PCR primer templates/binding sequences.
`
In one embodiment of complexity reduction, a type IIs endonuclease is
used to digest the nucleic acid sample and the restriction fragments
are selectively ligated to adaptor sequences. The adaptor sequences
can contain various nucleotides in the overhang that is to be ligated
and only the adaptor with the matching set of nucleotides in the
overhang is ligated to the fragment and subsequently amplified. This
technology is depicted in the art as ‘indexing linkers’. Examples of
this principle can be found inter alia in Unrau and Deugau (1994)
Gene 145:163-169.
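The selection principle behind indexing linkers can be sketched as a filter on overhang sequences. In the Python example below, each fragment is represented as a hypothetical (overhang, body) pair with the single-stranded overhang written 5' to 3', and an adaptor is taken to ligate only when its own overhang is the reverse complement of the fragment overhang. The 4-base overhang length and all sequences are assumptions made for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def select_by_overhang(fragments, adaptor_overhang):
    """Keep the fragments whose overhang can base-pair with the adaptor overhang."""
    wanted = reverse_complement(adaptor_overhang)
    return [body for overhang, body in fragments if overhang == wanted]


# Hypothetical digest products: (overhang, double-stranded body).
fragments = [
    ("ACGT", "GGATTTC"),
    ("TTGA", "CCCAGTA"),
    ("GGCC", "AATTCCG"),
    ("TTGA", "TTTGGCA"),
]
subset = select_by_overhang(fragments, "TCAA")
```

Only the two fragments carrying the TTGA overhang are retained. With a 4-base overhang there are 4**4 = 256 possible adaptors, so each adaptor defines one of 256 reproducible subsets of the digest.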
In one embodiment, the method of complexity reduction utilizes two
restriction endonucleases having different target sites and
frequencies and two different adaptor sequences to provide
adaptor-ligated restriction fragments, such as in AFLP.
In one embodiment of the invention, the step of complexity reduction
comprises performing an Arbitrarily Primed PCR upon the sample.
In one embodiment of the invention, the step of complexity reduction
comprises removing repeated sequences by denaturing and re-annealing
the DNA and then removing double-stranded duplexes.
In certain embodiments of the invention, the step of complexity
reduction comprises hybridising the nucleic acid sample to a magnetic
bead that is bound to an oligonucleotide probe containing a desired
sequence. This embodiment may further comprise exposing the
hybridised sample to a single strand DNA nuclease to remove the
single-stranded DNA, ligating an adaptor sequence containing a Class
IIs restriction enzyme to release the magnetic bead. This embodiment
may or may not comprise amplification of the isolated DNA sequence.
Furthermore, the adaptor sequence may or may not be used as a
template for the PCR oligonucleotide primer. In this embodiment, the
adaptor sequence may or may not contain a sequence identifier or tag.
In certain embodiments of the invention, the complexity reduction
utilises differential display technology or READS (Gene Logic)
technology.
In certain embodiments of the invention, the method of complexity
reduction comprises exposing the DNA sample to a mismatch binding
protein and digesting the sample with a 3’ to 5’ exonuclease and then
a single strand nuclease. This embodiment may or may not include the
use of a magnetic bead attached to the mismatch binding prot
