`
(19) World Intellectual Property Organization
     International Bureau
(43) International Publication Date: 7 September 2007 (07.09.2007)
(10) International Publication Number: WO 2007/100243 A1
(51) International Patent Classification: C12Q 1/68 (2006.01)
(21) International Application Number: PCT/NL2007/000055
(22) International Filing Date: 1 March 2007 (01.03.2007)
(25) Filing Language: English
(26) Publication Language: English
(30) Priority Data: 60/777,514, 1 March 2006 (01.03.2006), US
(71) Applicant (for all designated States except US): KEYGENE N.V.
     [NL/NL]; 90, Agro Business Park, NL-6708 PW Wageningen (NL).
(72) Inventors; and
(75) Inventors/Applicants (for US only): VAN EIJK, Michael, Josephus,
     Theresia [NL/NL]; 12, Pastorw Stt‘ij-boschstraat, NL-5373 EJ
     Herpen (NL). HOGERS, René, Cornelis, Josephus [NL/NL]; 38,
     Aalbeek, NL-6715 GR EDE (NL).
(74) Agent: DE LANG, R.-J.; Exter Polak & Charlouis B.V.,
     P.O. Box 3241, NL-2280 GE Rijswijk (NL).
`
(81) Designated States (unless otherwise indicated, for every
`kind of national protection available): AE, AG, AL, AM,
`AT, AU, AZ, BA, BB, BG, BR, BW, BY, BZ, CA, CH, CN,
`CO, CR, CU, CZ, DE, DK, DM, DZ, EC, EE, EG, ES, FI,
`GB, GD, GE, GH, GM, GT, HN, HR, HU, ID, IL, IN, IS,
`JP, KE, KG, KM, KN, KP, KR, KZ, LA, LC, LK, LR, LS,
`LT, LU, LY, MA, MD, MG, MK, MN, MW, MX, MY, MZ,
`NA, NG, NI, NO, NZ, OM, PG, PH, PL, PT, RO, RS, RU,
`SC, SD, SE, SG, SK, SL, SM, SV, SY, TJ, TM, TN, TR,
`TT, TZ, UA, UG, US, UZ, VC, VN, ZA, ZM, ZW.
`
`(84) Designated States (unless otherwise indicated, for every
`kind of regional protection available): ARIPO (BW, GH,
`GM, KE, LS, MW, MZ, NA, SD, SL, SZ, TZ, UG, ZM,
`ZW), Eurasian (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM),
European (AT, BE, BG, CH, CY, CZ, DE, DK, EE, ES, FI,
`FR, GB, GR, HU, IE, IS, IT, LT, LU, LV, MC, MT, NL, PL,
`PT, RO, SE, SI, SK, TR), OAPI (BF, BJ, CF, CG, CI, CM,
`GA, GN, GQ, GW, ML, MR, NE, SN, TD, TG).
`
`
(54) Title: HIGH THROUGHPUT SEQUENCE-BASED DETECTION OF SNPS USING LIGATION ASSAYS
`
(57) Abstract: Method for the detection of the presence or absence
`of one or more target sequences in one or more samples based
`on oligonucleotide ligation assays with a variety of ligatable
`probes containing identifiers and the subsequent identification
`of the identifiers in the amplicons or ligated probes using high
`throughput sequencing technologies.
`
[Figure: schematic of probe types A to F hybridised to a target
sequence carrying an A/C polymorphism. Labelled components include
probes (1) and (2), the compound probe (C), target specific sections
(TS1, TS2), identifiers (ID1, ID2), primer binding sequences (PBS1,
PBS2) and primers (P1, P2).]
`
`
`
`Published:
`— with international search report
`— before the expiration of the time limit for amending the
`claims and to be republished in the event of receipt of
`amendments
`
For two-letter codes and other abbreviations, refer to the "Guidance
Notes on Codes and Abbreviations" appearing at the beginning of each
regular issue of the PCT Gazette.
`
`
`
`
Title: High throughput sequence-based detection of SNPs using ligation
assays.
`
`Field of the Invention
`
The present invention relates to the field of molecular biology and
biotechnology. In particular the invention relates to the field of
nucleic acid detection, more in particular to the design and
composition of (collections of) probes that can be used for the high
throughput detection of nucleic acids. The invention also relates to
methods for the detection of nucleic acids using the probes and
compositions. The invention further provides for probes that are
capable of hybridising to a target sequence of interest, primers for
the amplification of ligated probes, use of these probes and primers
in the identification and/or detection of nucleotide sequences that
are related to a wide variety of genetic traits and genes, and kits of
primers and/or probes suitable for use in the method according to the
invention. The invention finds applicability in the field of the high
throughput detection of target nucleotide sequences, whether from
artificial, plant, animal or human origin or combinations thereof.
The invention finds particular application in the field of high
throughput genotyping.
`
`Background of the Invention
`
There is a rapidly growing interest in the detection of specific
nucleic acid sequences. This interest has arisen not only from the
recently disclosed draft nucleotide sequence of the human genome and
many other genomes and the presence therein, as well as in the genomes
of many other organisms, of an abundant amount of single nucleotide
polymorphisms (SNPs) and small insertion/deletion (indel)
polymorphisms, but also from marker technologies (such as AFLP and
SNPWave) and the general recognition of the relevance of the detection
of specific nucleic acid sequences as an indication of, for instance,
genetically inheritable diseases. The detection of the various alleles
of the breast cancer gene BRCA1 to screen for susceptibility to breast
cancer is just one of numerous examples. The recognition that the
presence of single nucleotide substitutions and indels in genes
provides a wide variety of information has also contributed to this
increased interest. It is now generally recognised that these single
nucleotide substitutions are one of the main causes of a significant
number of monogenically and multigenically inherited diseases, for
instance in humans, or are otherwise involved in the development of
complex phenotypes such as performance traits in plants and livestock
species. Thus, single nucleotide substitutions are in many cases also
related to or at least indicative of important traits in humans,
plants and animal species.
`
Analysis of these single nucleotide substitutions and indels will
result in a wealth of valuable information, which will have widespread
implications for medicine and agriculture in the widest possible
terms. It is for instance generally envisaged that these developments
will result in patient-specific medication. To analyse these genetic
polymorphisms, there is a growing need for adequate, reliable and fast
methods that enable the handling of large numbers of samples and large
numbers of (predominantly) SNPs in a high throughput fashion, without
significantly compromising the quality of the data obtained. One of
the principal methods used for the analysis of nucleic acids of a
known sequence is based on annealing two probes to a target sequence
and, when the probes are hybridised adjacently to the target sequence,
ligating the probes.
`
The OLA principle (Oligonucleotide Ligation Assay) has been described,
amongst others, in US 4,988,617 (Landegren et al.). This publication
discloses a method for determining the nucleic acid sequence in a
region of a known nucleic acid sequence having a known possible
mutation. To detect the mutation, oligonucleotides are selected to
anneal to immediately adjacent segments of the sequence to be
determined. One of the selected oligonucleotide probes has an end
region wherein one of the end region nucleotides is complementary to
either the normal or to the mutated nucleotide at the corresponding
position in the known nucleic acid sequence. A ligase is provided
which covalently connects the two probes when they are correctly base
paired and are located immediately adjacent to each other. The
presence, absence or amount of the linked probes is an indication of
the presence of the known sequence and/or mutation.
`
Other variants of OLA-based techniques have been disclosed inter alia
in Nilsson et al. Human mutation, 2002, 19, 410-415; Science 1994,
265: 2085-2088; US 5,876,924; WO 98/04745; WO 98/04746; US 6,221,603;
US 5,521,065; US 5,962,223; EP 185494 B1; US 6,027,889; US 4,988,617;
EP 246864 B1; US 6,156,178; EP 745140 B1; EP 964704 B1; WO 03/054511;
US 2003/0119004; US 2003/190646; EP 1313880; US 2003/0032016;
EP 912761; EP 956359; US 2003/108913; EP 1255871; EP 1194770;
EP 1252334; WO 96/15271; WO 97/45559; US 2003/0119004 A1;
US 5,470,705.
`
Particular advancements in the OLA techniques have been reported by
Keygene, Wageningen, the Netherlands. In WO 2004/111271,
WO 2005/021794, WO 2005/118847 and WO 03/052142, they have described
several methods and probe designs that improved the reliability of
oligonucleotide ligation assays. These applications further disclose
the significant improvement in multiplex levels that can be achieved.
However, all the above publications have as a disadvantage that they
are based on electrophoretic or array-based detection methods. A
further disadvantage is the wide variation in length of the probes
used, which may lead to less consistent amplification.
`
It is clear that there is a continuing need for oligonucleotide probes
that combine the advantages and avoid the specific disadvantages of
the various ligation probe types and detection methods known in the
art. There is also a need for further improvement of the technology by
providing probes that have additional advantages. It is one of the
goals of the present invention to provide such probes. It is another
goal of the present invention to avoid the disadvantages of the
commonly known probes as mentioned hereinbefore. It is a further goal
of the invention to provide for probes that are suitable for high
throughput detection methods. It is also a goal of the present
invention to provide for an efficient, reliable and/or high throughput
method for the detection of target nucleotide sequences, preferably by
performing oligonucleotide ligation assays.
`
The present inventors have set out to eliminate or at least diminish
the existing problems in the art while at the same time attempting to
maintain the many advantageous aspects thereof, and to further improve
the technology. Other problems in the art and solutions provided
thereto by the present invention will become clear throughout the
description, the figures and the various embodiments described herein.
`
`Summary of the Invention
`
The present inventors have been able to combine novel high throughput
sequencing technologies with the versatility of oligonucleotide
ligation based assays. In particular, the invention relates to a
method for high throughput detection of target nucleotide sequences
based on oligonucleotide ligation assays, wherein the probes used in
the ligation assays are modified such that a high throughput
sequencing method can be used to unequivocally reveal the presence,
absence or amount of the one or more target nucleotide sequences.
`
Thus, the present inventors have found that the incorporation of a
unique oligonucleotide identifier in at least one of the probes that
are used in the OLA assay for the detection of each target sequence in
the sample, and the subsequent detection of that identifier after the
ligation and amplification steps by high throughput sequencing
methods, provide a very efficient and reliable improvement of the
existing technology. Contrary to the known probes in the art, whether
linear or circularizable, the probes used in the method of the present
invention can all have the same or very similar length. This uniform
length is advantageous when the ligated probes are amplified, as the
amplification efficiencies for all ligated probes are similar, whereas
with ligated probes of different lengths it has been observed that
amplification efficiency may differ widely, thus compromising the
reliability of the assay as a whole. The uniform length also
facilitates the detection, as the identifier is located at the same
position for all ligated probes. By improving the OLA assays in this
manner, a significant step is made in providing increasingly uniform
assays that are easy to design for a specific target sequence, are
able to reliably discriminate between target sequences or samples and
can be performed in a high throughput, highly multiplexed fashion.
In certain embodiments, methods for the high throughput detection of
one or more target nucleotide sequences are provided. In certain
embodiments, the method provides the high throughput detection of one
or more target nucleotide sequences that may be derived from one or
more samples. In certain embodiments, the method comprises providing
for each target nucleotide sequence a first probe and a second probe.

In certain embodiments the first probe comprises a target specific
section at its 3'-end. In certain embodiments, the first probe further
comprises a first tag section. In certain embodiments, the first tag
section is non-complementary to the target nucleotide sequence. In
certain embodiments, the first tag section further comprises a first
primer binding sequence.
`
In certain embodiments, the second probe comprises a second target
specific section at its 5'-end. In certain embodiments, the second
probe comprises a second tag section. In certain embodiments, the
second tag section is non-complementary to the target nucleotide
sequence. In certain embodiments, the second tag section further
comprises a second primer binding sequence. In certain embodiments,
the first or the second tag section contains an identifier sequence.
In certain embodiments, both the first and second tag section contain
an identifier sequence. In certain embodiments, the identifier
sequence is located between the first primer binding sequence and the
first target specific sequence. In certain embodiments, the identifier
is located between the second primer binding sequence and the second
target specific sequence.
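
The probe layout described above can be sketched in a few lines of Python. This is only an illustrative sketch: the sequences and function names are hypothetical placeholders and are not taken from the application. It assumes the first probe reads 5'-PBS1-ID1-TS1-3' and the second probe reads 5'-TS2-ID2-PBS2-3', so that ligation joins TS1 to TS2 and each identifier sits between its primer binding sequence and its target specific section.

```python
# Hypothetical placeholder sequences; real probes would be designed per target.
def build_first_probe(pbs1: str, id1: str, ts1: str) -> str:
    """5'-PBS1-ID1-TS1-3': target specific section at the 3'-end."""
    return pbs1 + id1 + ts1


def build_second_probe(ts2: str, id2: str, pbs2: str) -> str:
    """5'-TS2-ID2-PBS2-3': target specific section at the 5'-end."""
    return ts2 + id2 + pbs2


def ligate(first_probe: str, second_probe: str) -> str:
    """Join the 3'-end of the first probe to the 5'-end of the second probe."""
    return first_probe + second_probe


PBS1, PBS2 = "GACTGCGTACC", "GATGAGTCCTG"  # assumed primer binding sequences
first = build_first_probe(PBS1, "ACGT", "TTGCAAGGCTA")
second = build_second_probe("CCGATTAGCAA", "TGCA", PBS2)
ligated = ligate(first, second)
```

Because every component has a fixed length in this sketch, the first identifier always starts at the same offset (directly after PBS1) in every ligated probe, which is the property the text relies on for uniform amplification and read-out.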
`
In certain embodiments the first and second probes are allowed to
hybridise to the target sequence in the sample. The respective first
and second target specific sections of the probes are hybridised to
preferably essentially adjacent sections on the target sequence,
although in some embodiments a gap may be present between the two
sections. In certain embodiments, the first and second probes are
ligated, i.e. connected to each other. The ligation of the first and
second probe provides for ligated probes.
`
In certain embodiments the ligated probes are amplified to provide
amplicons. In certain embodiments one or more primers are used in the
amplification. In certain embodiments a first and, in certain
embodiments, a second primer are used for the amplification. In
certain embodiments a first and, in certain embodiments, a second
primer can be used which, independently, may contain a restriction
enzyme recognition site to provide amplicons. The amplicons are
digested with the respective restriction enzymes for which recognition
sites are present in the first and, in certain embodiments, the second
primer. Sequences for a third and, in certain embodiments, a fourth
primer can subsequently be ligated to the digested amplicons to
provide a template for amplification. In certain embodiments the third
and second and, in certain embodiments, the third and fourth primer
are used in the amplification.
`
In certain embodiments the ligated probes or, in certain embodiments,
the amplicons derived from amplification of the ligated probes, are
subjected to high throughput sequencing technologies to determine at
least part of the nucleotide sequence of the ligated probes or
amplicons. In certain embodiments the part(s) of the nucleotide
sequence that is (are) determined by subjecting the ligated probes or
amplicons to high throughput sequencing technologies at least
include(s) the identifier sequence(s).
`
In certain embodiments the presence of the target nucleotide sequence
in the sample is identified by determination of the presence or
absence of the associated identifier sequence in the nucleotide
sequence.
`
`Brief description of the Drawings
`
`The present invention is illustrated by the following figures:
`
Figure 1: In Figure 1, different probe types (A, B, C, D, E, F) are
schematically illustrated vis-a-vis a target nucleotide sequence (T)
of interest, carrying an A/C polymorphism. Various components of the
probes have been depicted, using identical depictions throughout the
figures.
`
Figure 1A illustrates a linear probe type, wherein a first probe (1)
comprises a first target specific section (TS1) and a first tag
section comprising a first identifier (ID1) and a first primer binding
sequence (PBS1), capable of annealing to a first primer (P1). A second
probe (2) comprises a second target specific section (TS2) and a
second tag section comprising a second identifier (ID2) and a second
primer binding sequence (PBS2), capable of annealing to a second
primer (P2). In embodiments for allele specific detection, TS2 may
contain, preferably at its 3' end, an allele specific nucleotide (G or
T), together with a different identifier (ID2 or ID2'). The locus
allele combination may then be determined (genotyped) by detection of
the presence or absence of the combination of ID1 with ID2 or ID2'. In
similar manner, all allelic variants of a polymorphism can be
genotyped. Allele-specific probes can be designed for the other
disclosed probe types in a similar fashion.
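
The allele call described for Figure 1A, where the locus identifier is combined with one of two allele identifiers, can be illustrated with a minimal Python sketch. The identifier sequences, the allele labels and the `min_reads` threshold are assumptions made for illustration only; the application does not specify them.

```python
from collections import Counter

# Hypothetical identifier assignments for one locus with an A/C polymorphism.
LOCUS_ID = "ACGT"                        # ID1, shared by both alleles of the locus
ALLELE_IDS = {"TGCA": "A", "CATG": "C"}  # ID2 and ID2', one per allele


def call_genotype(id_pairs, min_reads=1):
    """Return the sorted alleles whose (ID1, ID2) combination was observed.

    id_pairs: iterable of (id1, id2) tuples extracted from sequenced reads.
    min_reads: assumed read-count threshold, not specified in the text.
    """
    counts = Counter(
        ALLELE_IDS[id2]
        for id1, id2 in id_pairs
        if id1 == LOCUS_ID and id2 in ALLELE_IDS
    )
    return sorted(allele for allele, n in counts.items() if n >= min_reads)
```

In this sketch a result with two alleles would correspond to a heterozygous sample, a single allele to a homozygous one, and an empty list to no call for the locus.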
`
Figure 1B illustrates a circularizable probe type, wherein the
circularizable probe comprises a first target specific section (TS1)
and a second target specific section (TS2), each located at an end
(the respective 3' and 5' ends) of the circularizable probe, and a tag
section comprising a first identifier (ID1), a first primer binding
sequence (PBS1) that is capable of annealing to a first primer (P1), a
second primer binding sequence (PBS2) that is capable of annealing to
a second primer (P2) and an optional second identifier (ID2). ID1 and
ID2 may be located adjacent or not. In certain embodiments, the use of
this probe type is coupled with an exonuclease treatment to remove
unligated probes that may give rise to false-positive genotyping. In
certain designs, the combination of ID1 and ID2 may represent locus
allele combinations. Amplification of circularised probes will result
in short amplicons that can be sequenced, and determination of the
presence of ID1 and/or ID2 provides positive genotyping of the desired
polymorphism.
`
Figure 1C illustrates an alternative circularizable probe type. The
probe contains the same components, but the relative positioning and
orientation of ID1, ID2, PBS1 and PBS2 is such that amplification will
only occur when the probe is circularised. This avoids the need for
any removal of unligated probes.
`
Figure 1D illustrates a linear probe type that is similar to the
linear type of Figure 1A. In addition thereto, clamp sections C1 and
C2 have been incorporated in the tag section, preferably at the end,
but they may also be located between ID and TS or between PBS and ID.
C1 and C2 will anneal/hybridize to each other, thereby mimicking the
padlock behaviour and in particular the improved hybridisation
kinetics compared to conventional linear probe types (Fig 1A), while
at the same time concatenation of the amplicons of the probe type of
Figure 1B or 1C can be avoided.
`
Figure 1E illustrates the conformation of the compound probe and the
elongation thereof. The first probe (1) preferably consists of a first
target specific section (TS1). The second probe (2) comprises a second
target specific section (TS2) and a second tag section comprising a
second identifier (ID2) and a second primer binding sequence (PBS2).
After or simultaneously with the hybridisation/ligation step, a
compound probe (C) is annealed to part of TS1 with a section that is
capable of hybridizing to (part of) the first target specific section
(TSP1). (C) further contains a section that contains a primer binding
section (PBS1) capable of binding to a primer (P1). Elongation of the
compound probe yields a section complementary to the second target
specific section (TSP2), complementary to the second identifier (cID2)
and complementary to the second primer binding sequence (cPBS2)
capable of annealing to primer cP2. Amplification using primers P1 and
cP2 yields amplicons that can be sequenced.
`
Figure 1F illustrates the conformation of a set of asymmetric probes.
The first probe (1) contains a target specific section (TS1) and is
exonuclease resistant (star). The second probe (2) contains a second
target specific section (TS2) and a tag section that contains two
primer binding sites (PBS1 and PBS2, respectively) and a second
identifier ID2 located between PBS1 and PBS2. Successful ligation and
removal of unligated probes followed by amplification provides
amplicons that can be sequenced.
`
`
Figure 2 illustrates the principal structure of the ligated
probes/amplicons after ligation/amplification for each of the probe
types of Figure 1A through 1F.
`
Figure 3 illustrates schematically the high throughput sequencing step
of the present invention whereby the ligated probes/amplicons are
bound to a surface (a bead in the case of the emulsion PCR followed by
pyrosequencing in the case of the 454 technology or the surface of the
flow cell). The surface is provided with a sequence (cPBS) that is
capable of annealing to one or both primer binding sequences PBS.
After hybridisation of the ligated probe/amplicon to the surface, the
hybridised sequence can be amplified to either load the bead with
amplified sequences or to generate clusters of amplified sequences on
the surface of the flow cell. Subsequently, these sequences can be
determined using the described high throughput sequencing
technologies. The sequence can be determined uni- or bidirectionally,
by adding sequencing primers, nucleotides and enzymes.
`
Figure 4 to 15 illustrates a set of degenerate ligation probes (i.e.
for an allele 'A' and 'G') that comprise, in addition to, or as
replacement of the elements depicted in figure 1 such as the target
sections (TS1, TS2 etc.), a separate primer binding site for the
sequencing step (sPBS) and a (reverse) PCR primer binding site (PBS2).
The sample specific (reverse) primer binding site (OPBS) is
degenerated by the introduction of a variable part that functions as a
sample identifier (SIS). Each sample can be selectively amplified
using sample specific primers (SSP), shown here for sample 1, 2 until
n. The use of multiple, degenerated reverse primers is more economical
and the primers can be used for multiple assays. Pooled samples can be
amplified using a correspondingly degenerated set of primers. Equal
primer binding and amplification efficiency of all alleles or samples
can be provided by the use of GC anchor sequences. The probes may
further contain an optional C located at the 5' end of the primer
binding site, with a corresponding C located at the 3' end of the
(reverse) primer (SSP). The probes may further contain an allele
identifier (AIS) for each allele to be investigated. The allele and
sample identifiers are preferably from 3 to 5 bp, with preferably no
two identical consecutive bases (homopolymers). Preferably the
identifiers differ by at least two bases.
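
The stated preferences for allele and sample identifiers (3 to 5 bp, no two identical consecutive bases, at least two differing bases between any two identifiers) can be checked mechanically. The Python sketch below is one possible way to enumerate such a set; the greedy selection strategy and all function names are assumptions made for illustration and are not described in the application.

```python
from itertools import product


def has_homopolymer(seq):
    """True if any two consecutive bases are identical."""
    return any(a == b for a, b in zip(seq, seq[1:]))


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def design_identifiers(length=4, min_distance=2):
    """Greedily collect identifiers that satisfy the stated preferences."""
    chosen = []
    for bases in product("ACGT", repeat=length):
        candidate = "".join(bases)
        if has_homopolymer(candidate):
            continue
        # Keep the candidate only if it differs enough from all kept identifiers.
        if all(hamming(candidate, kept) >= min_distance for kept in chosen):
            chosen.append(candidate)
    return chosen
```

A call such as `design_identifiers(4)` yields a list of 4 bp identifiers in which a single sequencing error cannot turn one identifier into another, which is the practical reason for requiring at least two differing bases.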
`
`
`Detailed description of the invention
`
The present invention in a first aspect pertains to a method for the
high throughput detection of one or more target nucleotide sequences
in one or more samples, the method comprising the steps of:

(a) providing for each target nucleotide sequence a first probe and a
    second probe, wherein
    the first probe comprises a target specific section at its 3' end
    and a first tag section that is non-complementary to the target
    nucleotide sequence and that comprises a first primer binding
    sequence,
    the second probe comprises a second target specific section at its
    5' end and a second tag section that is non-complementary to the
    target nucleotide sequence and that comprises a second primer
    binding sequence,
    wherein the first or second tag section, or both, contain(s) an
    identifier sequence that is located between the respective first
    or second primer binding sequence and the respective first or
    second target specific section,
(b) allowing the first and second probe to hybridise to the target
    sequence,
(c) ligating the first and second probe when the respective target
    specific sections of the probes are hybridised to essentially
    adjacent sections on the target sequence to provide ligated
    probes,
(d) optionally, amplifying the ligated probes with a first and a
    second primer to provide amplicons,
(e) subjecting the ligated probes or amplicons to high throughput
    sequencing technology to determine at least part of the nucleotide
    sequence, at least including the identifier sequence, of the
    ligated probes or the amplicons,
(f) identifying the presence of the target nucleotide sequence in the
    sample by determination of the presence or absence of the
    identifier sequence in the nucleotide sequence of step (e).
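
The read-out in the last two steps of the method can be sketched as follows. This is a simplified Python illustration, assuming that each read starts with the primer binding sequence, that the identifier follows it directly and has a fixed length, and that sequencing is error free; the function names, the identifier-to-target table and the example sequences are hypothetical.

```python
def extract_identifier(read, pbs, id_length):
    """Return the identifier directly following the primer binding sequence.

    Returns None when the read does not start with the primer binding
    sequence or is too short to contain a full identifier.
    """
    if not read.startswith(pbs) or len(read) < len(pbs) + id_length:
        return None
    return read[len(pbs):len(pbs) + id_length]


def detect_targets(reads, pbs, id_to_target, id_length=4):
    """Map each target name to True or False, depending on whether its
    identifier was found in at least one read."""
    seen = {extract_identifier(read, pbs, id_length) for read in reads}
    return {target: ident in seen for ident, target in id_to_target.items()}
```

For example, with `id_to_target = {"ACGT": "locus1", "TGCA": "locus2"}` and reads that only carry `ACGT` after the primer binding sequence, `detect_targets` reports `locus1` as present and `locus2` as absent.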
`
The method starts with the provision of one or more samples (that may
be combined or pooled) that may contain the sequence of interest. To
this sample, the set of probes is added (for each target sequence
different sets of probes may be provided) and the target specific
sections of the probes are allowed to hybridise to the target sequence
under suitable conditions. After hybridisation, any probes hybridised
adjacent on the target sequence are ligated to result in ligated
probes. The ligated probes may be amplified or, alternatively,
directly subjected to sequencing using high throughput sequencing
methods based on sequencing by synthesis. With the sequencing step and
the subsequent identification of the identifier, the presence of the
target sequence in the sample is determined and the genotyping is
completed.
`
One aspect of the present invention pertains to the advantageous
design of the probes used in the present invention. These probes will
be discussed in more detail herein below. Another advantageous aspect
of the invention resides in the connection between the state of the
art high throughput sequencing technologies as a detecting platform
for oligonucleotide ligation assays and the discriminatory power of
the OLA-based assays. Currently known OLA assays have been devised
only for detection platforms that are based on length/mobility
separation (i.e. electrophoretic analysis), hybridization
(array-based) or mass determination (mass-spectrometry/MALDI-TOF),
whereas no suitable probes have been developed that can be used in
high throughput sequencing detection platforms. Applicants have
observed that, apart from innovations in probe design, the methods of
performing OLA assays in combination with high throughput sequencing
also require serious amendments to both probes and procedures.
`
`Target nucleotide sequence
`
In its widest definition, the target sequence may be any nucleotide
sequence of interest. The target sequence can be any sequence of which
the determination/detection is desired, for instance because it is
indicative of, associated with or representative of a certain ailment
or genetic make up or disorder. The target sequence preferably is a
nucleotide sequence that contains, represents or is associated with a
polymorphism. The term polymorphism herein refers to the occurrence of
two or more genetically determined alternative sequences or alleles in
a population. A polymorphic marker or site is the locus at which
sequence divergence occurs. Preferred markers have at least two
alleles, each occurring at a frequency of greater than 1%, and more
preferably greater than 10% or 20% of a selected population. A
polymorphic locus may be as small as one base pair.
`
Polymorphic markers include restriction fragment length polymorphisms,
variable number of tandem repeats (VNTRs), hypervariable regions,
minisatellites, dinucleotide repeats, trinucleotide repeats,
tetranucleotide repeats, simple sequence repeats, and insertion
elements such as Alu. The first identified allelic form is arbitrarily
designated as the reference form and other allelic forms are
designated as alternative or variant alleles. The allelic form
occurring most frequently in a selected population is sometimes
referred to as the wild type form. Diploid (and tetraploid/hexaploid)
organisms may be homozygous or heterozygous for allelic forms. A
diallelic polymorphism has two forms. A triallelic polymorphism has
three forms. A single nucleotide polymorphism occurs at a polymorphic
site occupied by a single nucleotide, which is the site of variation
between allelic sequences. The site is usually preceded by and
followed by highly conserved sequences of the allele (e.g., sequences
that vary in less than 1/100 or 1/1000 members of the populations). A
single nucleotide polymorphism usually arises due to substitution of
one nucleotide for another at the polymorphic site. Single nucleotide
polymorphisms can also arise from a deletion of a nucleotide or an
insertion of a nucleotide relative to a reference allele. Other
polymorphisms include (small) deletions or insertions of several
nucleotides, referred to as indels. The process of analysing the
particular genetic variations (polymorphisms) existing in an
individual DNA sample using the presently described methods is
sometimes referred to in this application as genotyping, or SNP
genotyping in the case of single nucleotide polymorphisms.
`
In the nucleic acid sample, the nucleic acids comprising the target
may be any nucleic acid of interest. Even though the nucleic acids in
the sample will usually be in the form of DNA, the nucleotide sequence
information contained in the sample may be from any source of nucleic
acids, including e.g. RNA, polyA+ RNA, cDNA, genomic DNA, organellar
DNA such as mitochondrial or chloroplast DNA, synthetic nucleic acids,
DNA libraries (such as BAC libraries/pooled BAC clones), clone banks
or any selection or combinations thereof. The DNA in the nucleic acid
sample may be double stranded, single stranded, or double stranded DNA
denatured into single stranded DNA. Denaturation of double stranded
sequences yields two single stranded fragments, one or both of which
can be analysed by probes specific for the respective strands.
Preferred nucleic acid samples comprise target sequences on cDNA,
genomic DNA, restriction fragments, adapter-ligated restriction
fragments, amplified adapter-ligated restriction fragments, AFLP®
fragments or fragments obtained in an AFLP-template preamplification.
`
`
`Samples
`
It is preferred that a sample contains two or more different target
sequences, i.e. two or more refers to the identity rather than the
quantity of the target sequences in the sample. In particular, the
sample comprises at least two different target sequences, in
particular at least 100, preferably at least 250, more preferably at
least 500, more in particular at least 1000, preferably at least 2500,
more preferably at least 5000 and most preferably at least 10000
additional target sequences. In practice, the number of target
sequences in a sample that can be analysed is limited, among others,
by the number of amplicons that can be detected. The presently
employed detection methods allow for relatively large numbers of
target sequences.
`
`Probe
`
`The sections of the oligonucleotide probes that are
`
`complementary to the targ