(19) World Intellectual Property Organization
International Bureau

(43) International Publication Date
7 May 2015 (07.05.2015)

(10) International Publication Number
WO 2015/066174 A1
`
(51) International Patent Classification:
C12Q 1/68 (2006.01)

(21) International Application Number:
PCT/US2014/062889

(22) International Filing Date:
29 October 2014 (29.10.2014)

(25) Filing Language: English

(26) Publication Language: English

(30) Priority Data:
61/897,015, 29 October 2013 (29.10.2013), US

(71) Applicant: LONGHORN VACCINES AND DIAGNOSTICS, LLC [US/US]; 2 Bethesda Metro Center, Suite 910, Bethesda, Maryland 20814 (US).

(72) Inventors: DAUM, Luke T.; 318 Larkwood Drive, San Antonio, Texas 78209 (US). FISCHER, Gerald W.; 6417 Lybrook Drive, Bethesda, Maryland 20817 (US).

(74) Agents: REMENICK, James et al.; Remenick PLLC, 1025 Thomas Jefferson St., NW, Suite 175, Washington, District of Columbia 20007 (US).

(81) Designated States (unless otherwise indicated, for every kind of national protection available): AE, AG, AL, AM, AO, AT, AU, AZ, BA, BB, BG, BH, BN, BR, BW, BY, BZ, CA, CH, CL, CN, CO, CR, CU, CZ, DE, DK, DM, DO, DZ, EC, EE, EG, ES, FI, GB, GD, GE, GH, GM, GT, HN, HR, HU, ID, IL, IN, IR, IS, JP, KE, KG, KN, KP, KR, KZ, LA, LC, LK, LR, LS, LU, LY, MA, MD, ME, MG, MK, MN, MW, MX, MY, MZ, NA, NG, NI, NO, NZ, OM, PA, PE, PG, PH, PL, PT, QA, RO, RS, RU, RW, SA, SC, SD, SE, SG, SK, SL, SM, ST, SV, SY, TH, TJ, TM, TN, TR, TT, TZ, UA, UG, US, UZ, VC, VN, ZA, ZM, ZW.

(84) Designated States (unless otherwise indicated, for every kind of regional protection available): ARIPO (BW, GH, GM, KE, LR, LS, MW, MZ, NA, RW, SD, SL, ST, SZ, TZ, UG, ZM, ZW), Eurasian (AM, AZ, BY, KG, KZ, RU, TJ, TM), European (AL, AT, BE, BG, CH, CY, CZ, DE, DK, EE, ES, FI, FR, GB, GR, HR, HU, IE, IS, IT, LT, LU, LV, MC, MK, MT, NL, NO, PL, PT, RO, RS, SE, SI, SK, SM, TR), OAPI (BF, BJ, CF, CG, CI, CM, GA, GN, GQ, GW, KM, ML, MR, NE, SN, TD, TG).
Published:
with international search report (Art. 21(3))
before the expiration of the time limit for amending the claims and to be republished in the event of receipt of amendments (Rule 48.2(h))
with sequence listing part of description (Rule 5.2(a))
`
`(54) Title: NEXT GENERATION GENOMIC SEQUENCING METHODS
`
[Front-page drawing: FIG. 1]
`
(57) Abstract: Disclosed is an enhanced method for rapid and cost-effective analysis of sequences of a microorganism by semiconductor sequencing, preferably Ion Torrent sequencing. This method provides for full-length analysis of multiple areas (e.g., genes) of multiple genomes. These methods identify genetic mutations of a particular gene that are responsible for conferring resistance or sensitivity to an antibiotic or other chemical compound. Multiple different species, strains and/or serotypes of a particular organism are rapidly and efficiently screened, and mutations are identified along with the complete genome of an organism. By selecting primer pairs of similar size and GC content that produce amplicons with sequences spanning the entire genome, a single PCR reaction analyzed by Ion Torrent methodology can determine the sequence of a complete genome. The methods are useful to sequence the genomes of viral agents, such as influenza virus, and bacterial agents, such as tuberculosis bacteria.
`
`
`
`
`
`
`
`
`WO 2015/066174
`
`PCT/US2014/062889
`
`NEXT GENERATION GENOMIC SEQUENCING METHODS
`
Sequence Listing

The instant application contains a Sequence Listing which has been submitted in ASCII format via EFS-Web and is hereby incorporated by reference in its entirety. Said ASCII copy, created on May 7, 2013, is named 3022.019.PCT_SL.txt and is 37,821 bytes in size.
`
`Background
`
1. Field of the Invention

This invention is directed to tools, compositions and methods for identifying genetic mutations and mega-bases of nucleic acid information by sequencing and, in particular, to electronic media and programs for analyzing sequences, genes and complete genomes by sequencing, to the mutations identified, and to kits comprising reagents for identifying mutations in biological samples.
`
2. Description of the Background

Mycobacterium tuberculosis (MTB), the causative agent of tuberculosis, is a highly transmissible bacterial pathogen with significant morbidity and mortality, particularly in HIV-infected patients. Since 1997, tuberculosis has remained the leading cause of death in South Africa, a statistic linked to that country's growing HIV epidemic. Moreover, effective treatment of patients with active MTB has been hampered by increasing cases of multidrug-resistant (MDR) and extensively drug-resistant (XDR) clinical isolates.
`
Microscopy remains the cornerstone for diagnosing MTB in many low-resource areas of the world where both MTB and HIV are prevalent. However, many HIV-infected patients with MTB are smear-negative, and microscopy provides no information about antibiotic resistance. The emergence of MDR and XDR strains has rendered standard MTB treatment regimens ineffective. According to one study, approximately 20% of TB patients in South Africa with HIV have MDR MTB. Rapid detection of MTB and prompt initiation of effective therapy are critical to decrease transmission and improve treatment outcomes.
`
The roll-out of Cepheid's GeneXpert (Xpert) has improved MTB diagnosis and provides evidence of rifampin resistance, but information about other drugs is not provided. Furthermore, it may not be feasible to place Xpert testing in many microscopy labs in low-resource settings. The ability to efficiently ship sputum samples to a central site for next-generation sequencing (NGS) offers an opportunity to utilize highly trained staff and available infrastructure at central or regional laboratories.
`
MDR tuberculosis strains are resistant to the first-line antibiotics rifampin (RIF) and isoniazid (INH), while XDR MTB strains are resistant to both RIF and INH as well as to any fluoroquinolone and to second-line injectable antibiotic drugs (e.g., amikacin, kanamycin or capreomycin). About 6% of all MTB cases are MDR strains, and South Africa continues to report higher percentages of XDR cases each year. While 7% of patients infected with standard MTB strains succumb to infection, the death rate rises to almost 50% with MDR tuberculosis. The emergence of antibiotic-resistant MTB strains underscores an immediate need for rapid and highly accurate diagnosis, particularly in the developing countries of Africa. In addition, migratory populations make geographical surveillance and tracking of drug-resistant strains more urgent.
`
Culture-based drug susceptibility testing (DST) for MDR strains is considered the gold standard, but it is time consuming (weeks to months), technically challenging and cost prohibitive, especially in resource-limited countries. For example, the BACTEC MGIT 960 (Becton Dickinson Microbiology System, Silver Sparks NV, USA) is an automated, continuous culture-based monitoring system that measures bacterial oxygen consumption and can perform DST using prepared kits available for testing the susceptibility of strains to a number of antibiotics. DST results obtained with the BACTEC MGIT 960 are reliable and reproducible, but they require handling of viable and potentially infectious cultures, a delay of days to weeks until results are available, specialized laboratory accommodations, and high costs associated with the instrument and consumables.
`
In recent years, several nucleic acid-based assays for determining MTB drug resistance have been developed. One of the most popular commercially available diagnostic assays is the GenoType MTBDRplus Line Probe Assay (LPA) by Hain LifeScience. This test employs nucleic acid extraction, PCR amplification, probe hybridization and colorimetric visualization on lateral strips via an alkaline phosphatase reaction. The LPA has been shown to be sensitive and specific, but there are several drawbacks. Sensitivity of the LPA for all resistance-associated mutations will most likely never reach 100%, since many mutations that confer resistance have yet to be discovered. Another inherent limitation of the LPA is an inability to detect sample populations that contain a mixture of resistant and susceptible strains. Strains harboring substitution mutations that change an amino acid to a previously uncharacterized or unknown variant not represented on the LPA are not detected. Furthermore, the LPA only allows detection of the most frequent mutations that cause resistance. If a strain contains mutations outside of the targeted mutations, the wild-type banding pattern will appear, leading to a false-negative (susceptible) result.
`
Thus, there is a need for a rapid, standardized, cost-effective protocol for full-length analysis of critical genes such as, for example, genes associated with first- and second-line drug resistance.
`
`Summary
`
The present invention overcomes disadvantages associated with current strategies and designs, and provides tools, compositions and methods that facilitate and simplify sequencing, as well as methods for analyzing sequence information of nucleic acids, including full-length genes and complete genomes.
`
One embodiment of the invention is directed to analyzing drug resistance mutations by semi-conductor sequencing and, preferably, Ion Torrent sequencing. Nucleic acid segments containing a gene of interest are amplified by PCR, and the amplified products are processed and subsequently analyzed by sequencing. For nucleic acid segments that comprise RNA, the RNA is reverse transcribed to DNA. Sequencing is preferably performed by Ion Torrent or other next-generation sequencers, including the Ion Torrent Personal Genome Machine (PGM™; Life Technologies). Preferably, the amplification products represent a full-length gene, or multiple overlapping pieces of a gene, common to a number of species, strains and/or serotypes of organisms. The amplified products are sequenced, and mutations are identified and mapped. Mapping identifies both known and previously unknown mutations and is useful to track the progress and movement of drug resistance across a population. Preferably, the invention analyzes nucleic acids of pathogens such as, for example, viruses, bacteria or parasites. Preferably, the viral pathogens are the causative agents of influenza or HIV, and the bacterial pathogens are the causative agents of tuberculosis. Ion Torrent sequencing of the nucleic acid segments provides a rapid, efficient and cost-effective protocol for full-length gene analysis, from which drug resistance and other mutations are determined immediately.
`
Another embodiment of the invention is directed to tools, compositions and methods for performing NGS, preferably Ion Torrent or MiSeq™ sequencing, of genes or complete genomes. The invention comprises obtaining a DNA sequence of an organism of interest and performing polymerase chain reaction analysis using multiple pairs of nucleic acid primers. Each pair of primers is designed to simultaneously amplify overlapping segments of the genome under similar PCR conditions, and the reactions may be performed individually or multiplexed for multiple genes or the entire genome. Preferred primers possess similar GC content and overall size. A single PCR amplification of the genome produces hundreds of amplification products whose sequences include full-length genes, large gene and noncoding segments, or the entire genome of the organism. These products are analyzed, preferably by NGS, and the sequences are matched to create a sequence map of the entire gene or genome.
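The step of matching amplicon sequences into a map of the gene or genome can be illustrated with a deliberately simplified Python sketch. It assumes, hypothetically, that the amplicon consensus sequences are already ordered by position along the target and are error-free, so that each one begins with an exact copy of the end of the previous one; real pipelines instead align reads to a reference or assemble them with error-tolerant tools. The function name `merge_ordered_amplicons` and its `min_overlap` parameter are illustrative and are not part of the disclosed method.

```python
def merge_ordered_amplicons(amplicons: list[str], min_overlap: int = 20) -> str:
    """Join amplicon sequences that are already ordered along the target.

    Hypothetical helper: assumes error-free sequences in which each amplicon
    starts with an exact copy of the end of the previous one.
    """
    if not amplicons:
        return ""
    contig = amplicons[0]
    for nxt in amplicons[1:]:
        overlap = 0
        # Try the longest possible overlap first, down to min_overlap.
        for k in range(min(len(contig), len(nxt)), min_overlap - 1, -1):
            if k > 0 and contig.endswith(nxt[:k]):
                overlap = k
                break
        if overlap == 0:
            raise ValueError(f"adjacent amplicons share fewer than {min_overlap} bases")
        # Append only the part of the next amplicon not already in the contig.
        contig += nxt[overlap:]
    return contig


# Three overlapping amplicons tiled across a short example target.
print(merge_ordered_amplicons(
    ["ATGACCGTTAGC", "TAGCATCGGATCCA", "TCCATTGACCA"], min_overlap=4))
# prints ATGACCGTTAGCATCGGATCCATTGACCA
```

Trying the longest overlap first avoids accepting a short, coincidental match when a longer true overlap exists.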
`
Another embodiment of the invention is directed to methods of identifying a sequence motif in the genome of a microorganism that confers resistance to an antimicrobial compound, comprising: providing multiple nucleic acid samples obtained from multiple different strains or serotypes of the microorganism; amplifying the sequences of the multiple nucleic acid samples by a polymerase chain reaction; obtaining sequence information of the amplified sequences by Ion Torrent sequencing; identifying a polymorphism in the genome of at least one microorganism strain or serotype from the sequence information obtained; and correlating the polymorphism identified with a phenotype or genome location of the at least one microorganism strain or serotype to identify the sequence motif that confers resistance to the antimicrobial compound. Preferably, the microorganism is a virus, a bacterium, a fungus or a parasite; preferably, the virus is influenza virus and the bacterium is Mycobacterium tuberculosis.
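Correlating a polymorphism with a phenotype can be sketched as a simple set comparison across strains. This Python example assumes each strain has already been reduced to a set of variant labels plus a culture-based phenotype (`"R"` for resistant, `"S"` for susceptible); candidate resistance motifs are then variants seen in at least `min_resistant` resistant strains and in no susceptible strain. The data layout, the variant labels and the threshold are hypothetical, and a real analysis would also need to account for lineage markers and statistical confidence.

```python
from collections import Counter


def candidate_resistance_variants(strains: dict, min_resistant: int = 2) -> list[str]:
    """Return variants found in >= min_resistant resistant strains and no susceptible strain.

    `strains` maps a strain ID to (set_of_variant_labels, phenotype), where
    phenotype is "R" (resistant) or "S" (susceptible). Layout is hypothetical.
    """
    in_resistant: Counter = Counter()
    in_susceptible: Counter = Counter()
    for variants, phenotype in strains.values():
        if phenotype == "R":
            in_resistant.update(set(variants))
        elif phenotype == "S":
            in_susceptible.update(set(variants))
        else:
            raise ValueError(f"unknown phenotype {phenotype!r}")
    # A Counter returns 0 for variants never seen in a susceptible strain.
    return sorted(
        v for v, n in in_resistant.items()
        if n >= min_resistant and in_susceptible[v] == 0
    )


# Hypothetical toy data: variant labels are placeholders, not real mutations.
toy = {
    "strain1": ({"mutA", "mutB"}, "R"),
    "strain2": ({"mutA"}, "R"),
    "strain3": ({"mutB"}, "S"),
    "strain4": (set(), "S"),
}
print(candidate_resistance_variants(toy))  # ['mutA']
```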
`
Preferably, the nucleic acid samples are provided in an aqueous molecular transport medium that contains a chaotrope, a detergent, a reducing agent, a chelator, a buffer, and an alcohol, together present in an amount sufficient to lyse cells, denature proteins, inactivate nucleases and kill pathogens without degrading nucleic acid. Preferably, amplifying is performed in a one-step polymerase chain reaction utilizing a primer pair that amplifies a gene or nucleic acid segment associated with resistance to an antimicrobial compound, and the polymerase chain reaction is carried out in an aqueous mix comprising: a heat-stable polymerase; a mix of deoxynucleotide triphosphates comprising about equivalent amounts of dATP, dCTP, dGTP and dTTP; a chelating agent; an osmolarity agent; an albumin; a magnesium salt; and a buffer. Preferably, the antimicrobial compound is an antibiotic.
`
Another embodiment of the invention is directed to methods of treating a disease or disorder caused by the at least one microorganism strain or serotype with the antimicrobial compound identified by the methods of the invention. Preferably, treatment comprises the targeted killing of the specific pathogen that is the causative agent of the disease or disorder. Also preferably, the effective dose is determined by the methods of the invention by assessing the phenotypic characteristics associated with the target sequence or sequences identified.
`
Another embodiment of the invention is directed to methods for determining a complete sequence of a genome of a microorganism comprising: producing a series of amplicons by performing a single polymerase chain reaction (PCR) of the genome in an aqueous mixture containing a heat-stable polymerase; a mix of deoxynucleotide triphosphates comprising about equivalent amounts of dATP, dCTP, dGTP and dTTP; a chelating agent; a salt; a buffer; a stabilizing agent; and a plurality of primer pairs, wherein the primers of the plurality of primer pairs have similar annealing temperatures; sequencing each of the series of amplicons produced by semi-conductor sequencing; and correlating the sequences of the amplicons and constructing the complete sequence of the genome. Preferably, each of the primers of the multiple primer pairs is from 15 to 25 nucleotides in length and has a GC content of about 25-50%. Preferably, each primer pair is designed to PCR amplify an amplicon, and the collection of amplicons that are PCR amplified encompasses overlapping segments of the complete genome sequence. Preferably, the plurality of primer pairs hybridizes to the genome at positions spaced about every 500 to 2,000 nucleotides along the genome. Preferably, the microorganism is a virus, a bacterium, a fungus, a parasite or a cell; preferably, the virus is influenza virus and the bacterium is Mycobacterium tuberculosis.
`
Another embodiment of the invention is directed to methods for determining the sequence of a nucleic acid segment in one step comprising: performing a polymerase chain reaction on the nucleic acid segment to produce a series of amplicons, wherein the PCR comprises a heat-stable composition comprising: a polymerase; a mix of deoxynucleotide triphosphates comprising about equivalent amounts of dATP, dCTP, dGTP and dTTP; a chelating agent; a salt; a buffer; a stabilizing agent; and a plurality of primer pairs, wherein each primer of the plurality of primer pairs has an annealing temperature within 5°C of the others; sequencing each of the series of amplicons produced by semi-conductor sequencing; and correlating the sequences of the amplicons and constructing the sequence of the nucleic acid segment. Preferably, the nucleic acid segment is 1 Mb or greater in length, more preferably 2 Mb or greater, more preferably 5 Mb or greater, and more preferably 10 Mb or greater. Preferably, each of the primers of the multiple primer pairs is from 16 to 24 nucleotides in length, has a GC content of about 28-35%, and has an annealing temperature within 3°C of each other primer. Preferably, each primer pair is designed to PCR amplify an amplicon representing a portion of the sequence of the nucleic acid segment, and the collection of amplicons that are PCR amplified represents overlapping portions of the complete sequence of the segment. Preferably, the primer pairs of the plurality hybridize to the segment at a spacing of about 800 to 1,200 nucleotides.
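The relationship between segment length, amplicon size and the number of primer pairs can be worked through with a short Python sketch. It assumes, for illustration only, fixed-size amplicons of 1,200 nucleotides whose start positions are spaced 1,000 nucleotides apart, so that neighboring amplicons overlap by 200 nucleotides; the last window is shifted back so that every amplicon keeps the same size. Under these assumptions a 1 Mb segment needs 1,000 primer pairs. The function `tile_amplicons` is hypothetical and ignores where suitable primer sites actually fall in the sequence.

```python
def tile_amplicons(segment_len: int, amplicon_len: int = 1200,
                   overlap: int = 200) -> list[tuple[int, int]]:
    """Plan fixed-size, overlapping amplicon windows covering a segment.

    Windows are zero-based, half-open (start, end) intervals. The final window
    is shifted back so that every amplicon has the same length (unless the
    segment itself is shorter than one amplicon).
    """
    if not 0 <= overlap < amplicon_len:
        raise ValueError("overlap must be non-negative and smaller than amplicon_len")
    step = amplicon_len - overlap  # spacing between consecutive primer pairs
    windows = []
    start = 0
    while start + amplicon_len < segment_len:
        windows.append((start, start + amplicon_len))
        start += step
    # Last window ends exactly at the segment end.
    windows.append((max(0, segment_len - amplicon_len), segment_len))
    return windows


print(tile_amplicons(3000))            # [(0, 1200), (1000, 2200), (1800, 3000)]
print(len(tile_amplicons(1_000_000)))  # 1000 primer pairs for a 1 Mb segment
```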
`
Another embodiment of the invention is directed to mixtures comprising multiple pairs of nucleic acid primers wherein, upon subjecting the mixture to a polymerase chain reaction in association with a nucleic acid segment, the collection of primer pairs generates a collection of amplicons, wherein each amplicon is about 500 to 2,000 nucleotides in length, such that the entire sequence of the segment is represented in the resulting collection of amplicons. Preferably, each primer of the collection of primer pairs is about 15 to 25 nucleotides in length, has a GC content of about 25-45%, and has an annealing temperature within 3°C of each other primer, and each primer of the collection of primer pairs contains a sequence that hybridizes to the genome of the same microorganism. Preferably, the microorganism is a virus, a bacterium, a parasite, or a fungus. Preferably, the mixture contains a heat-stable polymerase; a mix of deoxynucleotide triphosphates comprising about equivalent amounts of dATP, dCTP, dGTP and dTTP; a chelating agent; a salt; a buffer; a stabilizing agent; and nuclease-free water.
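Whether a given collection of amplicons represents the entire segment can be checked mechanically. The Python sketch below assumes that amplicon positions are known as zero-based, half-open `(start, end)` intervals on the segment (a hypothetical representation) and tests the two conditions stated above: every amplicon is about 500 to 2,000 nucleotides long, and together the amplicons leave no uncovered gap.

```python
def covers_segment(amplicons: list[tuple[int, int]], segment_len: int,
                   min_len: int = 500, max_len: int = 2000) -> bool:
    """True if every amplicon is within [min_len, max_len] nt and together
    the amplicons cover positions 0 .. segment_len - 1 without a gap."""
    if any(not min_len <= end - start <= max_len for start, end in amplicons):
        return False
    covered_to = 0  # first position not yet covered
    for start, end in sorted(amplicons):
        if start > covered_to:
            return False  # uncovered gap before this amplicon
        covered_to = max(covered_to, end)
    return covered_to >= segment_len


print(covers_segment([(0, 1200), (1000, 2200), (1800, 3000)], 3000))  # True
print(covers_segment([(0, 1200), (1300, 2500)], 2500))                # False: gap at 1200-1299
```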
`
Another embodiment of the invention comprises kits containing reagent vessels, preferably including one or more of chemical reagents, primers and polymerases for sequencing. The sample to be analyzed is mixed in a reagent vessel that preferably contains chemical components sufficient to kill all pathogens present in the sample, inactivate nucleases in the sample, and maintain the integrity of the nucleic acids, rendering the sample safe for transportation and subsequent manipulation, such as, for example, aqueous lysis buffer, aqueous or anhydrous transport medium, or aqueous PrimeStore Molecular Transport Medium®. The mixture may be applied to a column, such as a micro-centrifuge column, which may be included in the kit, to aid in the extraction of nucleic acid from the sample. Extracted nucleic acid is preferably combined with another chemical reagent composition such as, for example, PrimeMix® that facilitates nucleic acid testing such as, for example, PCR sequencing. Such a reagent composition may contain positive control sequences, negative control sequences and/or sequences that specifically hybridize (under the desired high- or low-stringency hybridization conditions) to a particular target sequence that is characteristic of the presence of a pathogen.
`
Another embodiment of the invention is directed to computer-readable media that implement the analytical methods of the invention. Preferably, the computer-readable media analyze the sequence information obtained and centralize the collection of information. Also preferably, the sequence information is compared with sequence information obtained from one or more known databases of sequence information for the same or similar sequences, and mutations that provide antibiotic resistance and other phenotypic characteristics to the microorganism are identified.

Other embodiments and advantages of the invention are set forth in part in the description which follows and, in part, may be obvious from this description or may be learned from the practice of the invention.
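The comparison of sample sequence information with reference sequence information can be sketched in Python as follows. The sketch assumes that the sample consensus has already been aligned to the reference without gaps (so only substitutions are reported) and treats `N` as an uncalled base. The small `catalog` dictionary is a hypothetical stand-in for a real resistance-mutation database; variants absent from it are flagged as uncharacterized candidates rather than discarded.

```python
def call_substitutions(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """List (1-based position, ref base, sample base) for each mismatch.

    Assumes a gap-free alignment of equal length; 'N' in the sample is treated
    as an uncalled base and skipped.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, ref, alt)
        for i, (ref, alt) in enumerate(zip(reference.upper(), sample.upper()))
        if ref != alt and alt != "N"
    ]


def annotate(substitutions, catalog: dict) -> list[tuple[str, str]]:
    """Label each substitution with its catalog entry, or 'uncharacterized'."""
    report = []
    for pos, ref, alt in substitutions:
        label = f"{ref}{pos}{alt}"
        report.append((label, catalog.get((pos, alt), "uncharacterized")))
    return report


# Hypothetical catalog entry: position 4 changed to A is linked to "drug X".
catalog = {(4, "A"): "drug X"}
subs = call_substitutions("ACGTACGTAC", "ACGAACGNAG")
print(annotate(subs, catalog))
# [('T4A', 'drug X'), ('C10G', 'uncharacterized')]
```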
`
`
`
`
`Description of the Figures
`
Figure 1

Illustrates the pncA gene sequence plus 100 flanking base pairs, as well as the reverse complement sequence, the protein sequence, and the primer sequences.

Figure 2

Illustrates the nucleotide sequence of the H37Rv strain as well as the sequences of the TB 16S ribosomal RNA gene sequencing primers.

Figure 3

Illustrates the rpoB gene conferring sensitivity/resistance to rifampin, as well as the forward and reverse primer sequences for rpoB.

Figure 4

Illustrates the gyrA gene of the Mycobacterium tuberculosis H37Ra complete genome (GenBank: CP000611.1) and three sets of forward and reverse primers.

Figure 5

Illustrates the catalase-peroxidase-peroxynitritase T (katG) gene of the Mycobacterium tuberculosis H37Ra complete genome (GenBank: CP000611.1) and three sets of forward and reverse primers.

Figure 6

Illustrates the cycle threshold of the Gyrase A and IS6110 assays.

Figure 7

Illustrates the Gyrase A assay and the IS6110 assay using sequenced isolates, plotted as cycle number vs. Ct value.

Figure 8

Illustrates the Gyrase A assay and the IS6110 assay using sequenced isolates vs. cycle threshold (Ct).

Figure 9

Summary of results achieved in sequencing the influenza A genome using various primer pair collections with Ion Torrent sequencing methodology.

Figure 10

Characterization of primer pairs for whole-genome Ion Torrent sequencing of influenza A (H3N2).

Figure 11

(A) Gene sequence of pncA with coding regions shaded, and (B) pncA forward and reverse primers utilized in PCR tiling and pncA regions P1-P4.

Figure 12

Architecture of an electronic system of the methods of the invention.
`
`Description of the Invention
`
Rapid analysis of genes associated with drug-resistant strains is a major challenge for successful treatment of many diseases and disorders. Real-time geographical surveillance of emerging MTB drug resistance would facilitate more appropriate treatment strategies (e.g., drug, antibiotic, chemical). Currently available molecular methods such as the GenoType® MTBDRplus LPA offer limited detection capabilities, particularly when novel or uncommon amino acid substitutions occur within known drug resistance regions or when undiscovered amino acid mutations impact drug resistance. Also, current methodologies, including the standard Ion Torrent protocol, require multiple steps, ancillary equipment and increased expense, and are labor intensive.
`
A simplified semiconductor sequencing protocol for rapid characterization of full-length genes and genomes has been surprisingly discovered. The invention comprises a standardized protocol for gene sequencing, preferably utilizing semiconductor sequencing and preferably Ion Torrent sequencing of full-length genes. The protocol enables sequencing of entire coding regions, allowing characterization of known mutations and discovery of new polymorphisms. This protocol also enables the sequencing of mega-bases of nucleotide information such that complete genomes of cells and organisms can be determined and genetic polymorphisms readily mapped and identified. Preferably, the cells or organisms are disease-causing prokaryotic or eukaryotic cells, or yeast or fungal cells. Preferred disease-causing organisms include strains of bacteria, viruses, fungi and parasites.
`
Exemplary organisms include, but are not limited to, a DNA virus, an RNA virus, a positive or negative single-strand virus, a double-strand virus, orthomyxovirus, paramyxovirus, Morbillivirus (e.g., Rubeola), retrovirus, flavivirus, filovirus, lentivirus, hantavirus, herpes virus (e.g., VZV, HSV I, HSV II, EBV), hepatitis virus (e.g., A, B, C, non-A, non-B), influenza virus (e.g., H5N1, H1N1, H7N9), Respiratory Syncytial Virus, HIV, or Ebola virus. Exemplary organisms also include, but are not limited to, Mycobacteria (e.g., M. tuberculosis), Bacillus anthracis, Plasmodium (e.g., Plasmodium falciparum), Schistosomiasis (e.g., Schistosoma mansoni), Francisella tularensis, Clostridium difficile, Meningococcal infections, Pseudomonas infections, Yersinia pestis, and Vibrio cholerae. The invention is also directed to the detection and characterization of organisms that are related to the pathogenic organisms but are non-pathogenic. Detection of one or more of the non-pathogenic but related organisms can be a definitive diagnosis of the absence of disease. In addition, the tools and methods of the invention allow for the identification and characterization of abnormalities in the existing genome of an individual, such as a condition that may be present from birth (congenital) and may be heritable. These genetic disorders are equally detectable and characterizable by the tools and methods of the invention and can be diagnosed by comparison with an otherwise normal or control genome of a non-afflicted individual.
`
This relatively rapid (e.g., 1, 2, or 3 days, or less), standardized, cost-effective protocol allows for full-length analysis of genes, for example, to identify mutations that possess one or more alterations of a DNA, RNA, protein and/or peptide sequence. For sample sequences that are RNA, the RNA sequence of interest in the sample is typically reverse transcribed to DNA for PCR analysis. Preferably, one or more gene mutations that provide a microorganism with resistance to an antibiotic are identified and characterized. Preferred mutations that are identified with the methods of the invention are located in one or more sites within an amino acid coding region, a transcription promoter or termination site, a stop or start codon, a site within a non-coding region, a splice junction site, a modification site, a transcription or translation factor binding or recognition site, one or more sites that contribute to a three-dimensional structure, or a combination thereof. Preferred genes that are analyzed include MTB genes associated with first- and second-line MTB drug resistance (see Figures 1-5). Preferred examples of MTB-associated genes include, for example, rpoB (rifampin), katG and inhA (isoniazid), gyrA and gyrB (fluoroquinolones), pncA and panD (PZA or pyrazinamide), rrs (16S) (aminoglycosides, amikacin, kanamycin, capreomycin, streptomycin) and rpsL (streptomycin).
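For mutations within an amino acid coding region, a nucleotide substitution matters for the protein only if it changes the encoded amino acid. The Python sketch below translates reference and sample coding sequences with the standard genetic code and reports non-synonymous changes in the conventional reference-amino-acid, codon-number, new-amino-acid form (e.g., `S2L`). It assumes in-frame, gap-free, equal-length sequences containing only A, C, G and T, with codons numbered from the first codon supplied; published codon numbering for a given gene (for example, rpoB) may follow a different convention. The function name is illustrative.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def amino_acid_changes(ref_cds: str, sample_cds: str) -> list[str]:
    """Report non-synonymous codon changes as e.g. 'S2L' (codon 2, Ser -> Leu)."""
    ref_cds, sample_cds = ref_cds.upper(), sample_cds.upper()
    if len(ref_cds) != len(sample_cds) or len(ref_cds) % 3:
        raise ValueError("expects equal-length, in-frame coding sequences")
    changes = []
    for i in range(0, len(ref_cds), 3):
        ref_aa = CODON_TABLE[ref_cds[i:i + 3]]
        alt_aa = CODON_TABLE[sample_cds[i:i + 3]]
        if ref_aa != alt_aa:  # synonymous substitutions are skipped
            changes.append(f"{ref_aa}{i // 3 + 1}{alt_aa}")
    return changes


print(amino_acid_changes("ATGTCGAAA", "ATGTTGAAA"))  # ['S2L']
print(amino_acid_changes("ATGTCGAAA", "ATGTCAAAA"))  # [] (TCG and TCA both encode Ser)
```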
`
The methods of the invention were used to evaluate 26 geographically diverse clinical isolates collected in South Africa, including MDR and XDR strains, with the next-generation Ion Torrent Personal Genome Machine (PGM). Of particular interest were INDELs, which are insertions or deletions of a single nucleotide (A, T, G, C) causing changes in the protein structure. The sequencing data obtained from this methodology were compared to the Hain LPA results and to culture-based DST data. This methodology for the first time enables sequencing of entire coding as well as non-coding regions of genes implicated in resistance, allowing characterization of known mutations and discovery of new polymorphisms. Previously uncharacterized substitution mutations were identified in the rrs, rpoB, katG, pncA, gyrA, gyrB, inhA and panD genes.
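Whether an INDEL in a coding region shifts the reading frame follows directly from its length: a net change that is not a multiple of three (including any single-nucleotide insertion or deletion) is a frameshift, whereas a multiple of three adds or removes whole codons. A minimal Python sketch, assuming variants are given as VCF-style reference and alternate alleles that share an anchor base; the function name is illustrative:

```python
def classify_indel(ref_allele: str, alt_allele: str) -> str:
    """Classify a coding-region indel from VCF-style alleles (shared anchor base assumed)."""
    delta = len(alt_allele) - len(ref_allele)
    if delta == 0:
        raise ValueError("alleles have equal length: not an insertion or deletion")
    kind = "insertion" if delta > 0 else "deletion"
    # Python's % returns a non-negative result, so this also works for deletions.
    effect = "in-frame" if delta % 3 == 0 else "frameshift"
    return f"{abs(delta)}-nt {effect} {kind}"


print(classify_indel("A", "AT"))    # 1-nt frameshift insertion
print(classify_indel("ACTG", "A"))  # 3-nt in-frame deletion
```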
`
`
The present invention offers significant potential for new sequencing platforms such as, for example, next-generation instruments to be more widely utilized in resource-deprived environments such as Africa, Asia and India. Specifically, the current invention improves and streamlines the up-front library preparation process. The methodology of the invention does not require the expensive ancillary equipment typically utilized or required by the manufacturer. Specifically, the standardized procedure of the invention does not require an Agilent Bioanalyzer for DNA quantification, the OneTouch ePCR system for the emulsion PCR step, or the Pippin Prep for gel excision. Additionally, since the protocol of the invention involves re-sequencing full-coding genes (not necessarily full genomes), the Bioruptor is not required for shearing DNA into smaller pieces. Additionally, it is not necessary to sequence the entire genome and then identify genes. The method and tools of the invention allow for pre-selection of the genes and/or regions of interest that are to be sequenced. As the Agilent 2100 Bioanalyzer, OneTouch, Pippin Prep and Bioruptor all require additional training for use, consume valuable laboratory bench space, and are extremely expensive, the invention represents a significant advance and improvement over conventional methodologies.
`
The sequencing protocol of the invention is exemplified herein using Ion Torrent sequencing, as this sequencing method has been applied to M. tuberculosis. As will be clear to those skilled in the art, the protocol involves semiconductor sequencing, which is exemplified by Ion Torrent sequencing and, as such, involves the sequencing of large numbers of different regions simultaneously. The sequencing and nucleic acid methodologies are applicable to any series of genes, genomes or nucleic acid sequences.
`
The invention also includes a methodology for selecting primer pairs for sequencing a target of interest. Primer pairs are preferably selected with matched annealing and melting temperatures with respect to the target. Preferably, melting and annealing temperatures are based on sequence characteristics such as the GC content of the sequence, the possibility of self-hybridization of the primer (e.g., forming hairpin loops within the primer), and possible structures near the binding site. Preferably, the primers do not self-hybridize under the conditions of sequencing. Preferably, the GC content of primers is between about 25% and 50%, more preferably between about 30% and 40%, more preferably between about 25% and 35%, and also more preferably between about 40% and 50%. Thus, primer sequences of the target are selected for hybridization based on sequence characteristics such that all of the primer pairs utilized for the target will have similar melting and/or annealing temperatures to the target. Preferably, primer sequences contain no regions of reasonably possible self-hybridization of the primer sequence. Preferably, primer pairs are matched for annealing and/or dissociation temperatures, which may be within 5°C, within 4°C, within 3°C, within 2°C, or within 1°C, and more preferably have the same annealing temperature, the same melting temperature, or both. Primer pairs preferably generate amplicons of between about 500 and about 2,000 nucleotides (NT) in length that represent overlapping segments of the target, more preferably between about 600 and 1,500 NT, more preferably between about 700 and 1,300 NT, more preferably between about 800 and 1,200 NT, more preferably between about 900 and 1,100 NT, and more preferably about 1,000 NT. Primers are generally between 12 NT and 45 NT in length, more preferably between 15 and 35 NT, and more preferably between about 18 and 25 NT. Although not a rule, longer primers generally have a lower GC content. Exemplary primer pairs are identified for the pncA gene (see Figure 1), the H37Rv strain (see Figure 2), the rpoB gene (see Figure 3), the gyrA gene (see Figure 4) and the katG gene (see Figure 5). These primer pairs are useful to combine in ready-to-use kits to simplify the sequencing of full-length genes.
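The primer selection criteria above can be expressed as a simple screening function. This Python sketch is an illustration under stated assumptions, not the disclosed design procedure. It estimates Tm with the basic formula Tm = 64.9 + 41(G + C - 16.4)/N, which ignores salt concentration and nearest-neighbor stacking and is only a rough guide. It approximates the self-hybridization check by looking for any 5-mer whose reverse complement also occurs in the same primer. Its defaults are taken from ranges recited herein (15-25 nt, 25-50% GC, Tm within 5°C). All function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_fraction(primer: str) -> float:
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def basic_tm(primer: str) -> float:
    """Rough Tm in deg C: 64.9 + 41 * (G + C - 16.4) / N (no salt or nearest-neighbor terms)."""
    p = primer.upper()
    return 64.9 + 41 * (p.count("G") + p.count("C") - 16.4) / len(p)


def has_self_complementary_kmer(primer: str, k: int = 5) -> bool:
    """Crude hairpin/self-dimer flag: some k-mer's reverse complement also occurs in the primer."""
    p = primer.upper()
    revcomp = p.translate(COMPLEMENT)[::-1]
    kmers = {p[i:i + k] for i in range(len(p) - k + 1)}
    return any(revcomp[i:i + k] in kmers for i in range(len(revcomp) - k + 1))


def screen_pair(fwd: str, rev: str, gc_range=(0.25, 0.50),
                len_range=(15, 25), max_tm_diff=5.0) -> list[str]:
    """Return a list of problems; an empty list means the pair passes every check."""
    problems = []
    for name, p in (("forward", fwd), ("reverse", rev)):
        if not len_range[0] <= len(p) <= len_range[1]:
            problems.append(f"{name}: length {len(p)} nt outside {len_range}")
        gc = gc_fraction(p)
        if not gc_range[0] <= gc <= gc_range[1]:
            problems.append(f"{name}: GC {gc:.0%} outside {gc_range}")
        if has_self_complementary_kmer(p):
            problems.append(f"{name}: possible self-complementarity")
    tm_diff = abs(basic_tm(fwd) - basic_tm(rev))
    if tm_diff > max_tm_diff:
        problems.append(f"Tm difference {tm_diff:.1f} C exceeds {max_tm_diff} C")
    return problems


fwd, rev = "ACAACAAACCAAACAACAAC", "CAACAAACAACCAAACAACA"  # toy sequences, 35% GC
print(screen_pair(fwd, rev))                     # []
print(screen_pair("GCGCGCGCGCGCGCGCGCGC", rev))  # GC, self-complementarity and Tm problems
```

Returning a list of named problems, rather than a single pass/fail flag, makes it easier to see which criterion a rejected primer pair fails.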
`
`In one embodiment of the invention, a semiconductor sequencing protocol was
`
`determined for five genes of M. tuberc