`
`(19) World Intellectual Property Organization
`International Bureau
`
`(43) International Publication Date: 9 December 2010 (09.12.2010)
`
`(10) International Publication Number: WO 2010/141433 A2
`
`(51) International Patent Classification:
`C12Q 1/68 (2006.01)
`C40B 40/06 (2006.01)
`C12Q 1/70 (2006.01)
`
`(21) International Application Number: PCT/US2010/036849
`
`(22) International Filing Date: 1 June 2010 (01.06.2010)
`
`(25) Filing Language: English
`
`(26) Publication Language: English
`
`(30) Priority Data:
`61/183,377    2 June 2009 (02.06.2009)    US
`61/286,742    15 December 2009 (15.12.2009)    US
`
`(71) Applicant (for all designated States except US): THE
`REGENTS OF THE UNIVERSITY OF CALIFORNIA
`[US/US]; 1111 Franklin Street, 5th Floor, Oakland,
`CA 94607-5200 (US).
`
`(72) Inventors; and
`(75) Inventors/Applicants (for US only): DING, Shou-Wei
`[US/US]; 8797 Barnwood Lane, Riverside, CA 92508
`(US). WU, Qingfa [CN/US]; 1177 Linden Street, Apt.
`19, Riverside, CA 92507 (US).
`
`(74) Agents: EINHORN, Gregory, P. et al.; Gavrilovich,
`Dodd & Lindsey LLP, 4660 La Jolla Village Drive, Suite
`750, San Diego, CA 92122 (US).
`
`(81) Designated States (unless otherwise indicated, for every
`kind of national protection available): AE, AG, AL, AM,
`AO, AT, AU, AZ, BA, BB, BG, BH, BR, BW, BY, BZ,
`CA, CH, CL, CN, CO, CR, CU, CZ, DE, DK, DM, DO,
`DZ, EC, EE, EG, ES, FI, GB, GD, GE, GH, GM, GT,
`HN, HR, HU, ID, IL, IN, IS, JP, KE, KG, KM, KN, KP,
`KR, KZ, LA, LC, LK, LR, LS, LT, LU, LY, MA, MD,
`ME, MG, MK, MN, MW, MX, MY, MZ, NA, NG, NI,
`NO, NZ, OM, PE, PG, PH, PL, PT, RO, RS, RU, SC, SD,
`SE, SG, SK, SL, SM, ST, SV, SY, TH, TJ, TM, TN, TR,
`TT, TZ, UA, UG, US, UZ, VC, VN, ZA, ZM, ZW.
`
`(84) Designated States (unless otherwise indicated, for every
`kind of regional protection available): ARIPO (BW, GH,
`GM, KE, LR, LS, MW, MZ, NA, SD, SL, SZ, TZ, UG,
`ZM, ZW), Eurasian (AM, AZ, BY, KG, KZ, MD, RU, TJ,
`TM), European (AL, AT, BE, BG, CH, CY, CZ, DE, DK,
`EE, ES, FI, FR, GB, GR, HR, HU, IE, IS, IT, LT, LU,
`LV, MC, MK, MT, NL, NO, PL, PT, RO, SE, SI, SK,
`SM, TR), OAPI (BF, BJ, CF, CG, CI, CM, GA, GN, GQ,
`GW, ML, MR, NE, SN, TD, TG).
`
`Published:
`without international search report and to be republished upon receipt of that report (Rule 48.2(g))
`with sequence listing part of description (Rule 5.2(a))
`
`(54) Title: VIRUS DISCOVERY BY SEQUENCING AND ASSEMBLY OF VIRUS-DERIVED SIRNAS, MIRNAS, PIRNAS
`
`(57) Abstract: In one embodiment, the disclosure provides methods and systems for identifying viral nucleic acids in a sample. In
`another embodiment the invention provides methods for viral genome assembly and viral discovery using small inhibitory RNAs,
`or "small silencing" RNAs (siRNAs), micro-RNAs (miRNAs) and/or PIWI-interacting RNAs (piRNAs), including siRNAs, miRNAs
`and/or piRNAs isolated or sequenced from invertebrate organisms such as insects (Arthropoda), nematodes (Nematoda),
`Mollusca, Porifera, and other invertebrates, and/or plants, fungi or algae, Cyanobacteria and the like.
`
`
`
`
`WO 2010/141433
`
`PCT/US2010/036849
`
`VIRUS DISCOVERY BY SEQUENCING AND ASSEMBLY
`OF VIRUS-DERIVED siRNAs, miRNAs, piRNAs
`
`REFERENCE TO SEQUENCE LISTING
`
`This application contains a .txt file containing the sequence listing, which is
`
`incorporated by reference herein.
`
`STATEMENT AS TO FEDERALLY SPONSORED RESEARCH
`
`This invention was made with government support under Grant No. AI052447
`
`awarded by the National Institutes of Health (NIH) and Grant No. 2007-35319-18325
`
`awarded by the USDA. The government has certain rights in the invention.
`
`TECHNICAL FIELD
`
`In one embodiment, the disclosure provides methods and systems for
`
`identifying viral nucleic acids in a sample. In another embodiment the invention
`
`provides methods for viral genome assembly and viral discovery using small
`
`inhibitory RNAs, or “small silencing” RNAs (siRNAs), micro-RNAs (miRNAs)
`
`and/or PIWI-interacting RNAs (piRNAs), including siRNAs, miRNAs and/or
`
`piRNAs isolated or sequenced from invertebrate organisms such as insects
`
`(Arthropoda), nematodes (Nematoda), Mollusca, Porifera, and other invertebrates,
`
`and/or plants, fungi or algae, Cyanobacteria and the like.
`
`BACKGROUND
`
`Discovery of new viruses is often hindered by difficulties in their
`
`amplification in cell culture and/or lack of their cross-reactivity in serological and
`
`nucleic acid hybridization assays to known viruses. Many new viruses have been
`
`recently identified in environmental and clinical samples using metagenomic
`
`approaches, in which viral particles are first purified and viral nucleic acid sequences
`
`are then randomly amplified prior to subcloning and sequencing (Delwart, 2007).
`
`The Dicer family of host immune receptors mediates antiviral immunity in
`
`fungi, plants and invertebrate animals by RNA interference (RNAi) or RNA silencing
`
`(1-3). In this immunity, a viral double-stranded RNA (dsRNA) is recognized by Dicer
`
`and diced into small interfering RNAs (siRNAs). These virus-derived siRNAs are
`
`then loaded into an RNA silencing complex to act as specificity determinants and to
`
`guide slicing of the target viral RNAs by an Argonaute protein (AGO) present in the
`
`complex. Dicer proteins typically contain an RNA helicase domain, a PAZ domain
`
`shared with AGOs, and two tandem type III endoribonuclease (RNase III) domains.
`
`Dicer cleaves dsRNA with a simple preference toward a terminus of dsRNA,
`
`producing duplex small RNA fragments of discrete sizes progressively from the
`
`terminus (4).
`
`In addition to siRNAs, microRNAs (miRNAs) and PIWI-interacting RNAs
`
`(piRNAs) also guide RNA silencing in similar complexes but with distinct AGOs
`
`(4-6). In Drosophila melanogaster, miRNAs and siRNAs are predominantly 22 and 21
`
`nucleotides in length, dependent on Dicer-1 (DCR1) and DCR2 for their biogenesis,
`
`and act in silencing complexes containing AGO1 and AGO2 in the AGO subfamily,
`
`respectively (4-6). In contrast, ~24-30-nt piRNAs are Dicer-independent and require
`
`AGO3, Aubergine (AUB) and PIWI in the PIWI subfamily for their biogenesis (4-6).
`
`Genetic analyses (7-10) have clearly demonstrated a role for D. melanogaster DCR2
`
`in the immunity and biogenesis of viral siRNAs targeting diverse positive-strand (+)
`
`RNA viruses, including Flock house virus (FHV), cricket paralysis virus, Drosophila
`
`C virus (DCV), and Sindbis virus (SINV). Cloning and sequencing of small RNAs
`
`from FHV-infected Drosophila cells further indicate that the viral dsRNA replicative
`
`intermediates (vRI-dsRNA) are the substrate of DCR2 and the precursor of viral
`
`siRNAs (11-12). Drosophila susceptibility to Drosophila X virus (DXV), which
`
`contains a dsRNA genome, is influenced by components from both the siRNA (e.g.,
`
`AGO2 & R2D2) and piRNA (e.g., AUB & PIWI) pathways (13). However, detection
`
`of small RNAs derived from any dsRNA virus has not been reported yet (1, 13).
`
`Virus-derived small RNAs were first detected in plants infected with a +RNA
`
`virus (14). The Dicer proteins involved in the production of siRNAs targeting both
`
`+RNA viruses and DNA viruses have been identified in Arabidopsis thaliana (2-3),
`
`which encodes AGOs in the AGO subfamily but not in the PIWI subfamily (15).
`
`Cloning and sequencing of plant viral siRNAs suggest that they may be processed
`
`either from vRI-dsRNA or hairpin regions of single-stranded RNA precursors (16-
`
`20). Production of viral siRNAs has also been demonstrated in fungi, silkworms,
`
`mosquitoes, and nematodes in response to infection with +RNA viruses, and viral
`
`small silencing RNAs produced in fungi and mosquitoes have recently been cloned
`
`and sequenced (21-25).
`
`The available data thus illustrate that accumulation of virus-derived small
`
`silencing RNAs is a common feature of an active immune response to viral infection
`
`in diverse eukaryotic host species.
`
`SUMMARY
`
`The disclosure provides a method for viral discovery that is independent of
`
`either amplification or purification of viral particles. Many human diseases, including
`
`approximately half of all analyzed cases of human encephalitis and gastroenteritis,
`
`have no identified etiology. Thus, discovery of new viruses should facilitate
`
`identification of human pathogenic viruses, improve the understanding of their
`
`transmission and provide diagnostic tools and targets for the development of anti-
`
`virals.
`
`The disclosure provides methods of identifying viral nucleic acid, assembling
`
`viral genomes and discovering viruses based upon the mechanism of invertebrate,
`
`plant, algal, fungal, etc. processing of viral small inhibitory RNAs, or “small
`
`silencing” RNAs (siRNAs), micro-RNAs (miRNAs) and/or PIWI-interacting RNAs
`
`(piRNAs), including miRNA-, piRNA-, siRNA- and/or RNAi-mediated viral
`
`immunity in plants and invertebrates, including insects (Drosophila melanogaster and
`
`mosquitoes) and nematodes (Caenorhabditis elegans), and algae, fungi,
`
`Cyanobacteria and the like.
`
`In alternative embodiments, the invention provides methods comprising:
`
`(a) (i) obtaining a plurality of naturally occurring 18-28 nucleotide RNA
`
`fragments, or siRNAs, or miRNAs and/or piRNAs, to generate an RNA library, or,
`
`obtaining a plurality of 18-28 nucleotide RNA fragments, or siRNAs, or miRNAs
`
`and/or piRNAs from an organism or organisms, or a plant or plants; and
`
`(ii) determining the sequence of the RNA fragments, or siRNAs, or
`
`miRNAs and/or piRNAs, and using those sequences to assemble the RNA fragments,
`
`or siRNAs, or miRNAs and/or piRNAs into at least one contiguous unit (“a contig”)
`
`comprising a plurality of the nucleotide RNA fragments, siRNAs, miRNAs and/or
`
`piRNAs; or
`
`(b) the method of (a), wherein the contigs are assembled using the help of a
`
`computer program, wherein optionally the computer program is VELVET.
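By way of illustration only, the selection of 18-28 nucleotide reads in step (a)(i) can be sketched as follows. This is a minimal sketch, not the procedure used in the Examples: the function name `select_small_rnas`, the conversion of U to T, and the rejection of reads containing ambiguous bases are all assumptions made for the example.

```python
from collections import Counter


def select_small_rnas(reads, min_len=18, max_len=28):
    """Count distinct reads whose length lies in [min_len, max_len] nt.

    Hypothetical helper: reads are upper-cased, U is written as T, and
    reads containing anything other than A, C, G, T are discarded.
    """
    counts = Counter()
    for read in reads:
        seq = read.strip().upper().replace("U", "T")
        if min_len <= len(seq) <= max_len and set(seq) <= set("ACGT"):
            counts[seq] += 1
    return counts


example_reads = [
    "ACGTACGTACGTACGTACGTA",             # 21 nt, kept
    "acguacguacguacguacgua",             # same molecule written as RNA
    "ACGTACGTACGT",                      # 12 nt, too short
    "ACGTACGTACGTACGTACGTACGTACGTACGT",  # 32 nt, too long
    "ACGTNCGTACGTACGTACGTAC",            # contains an ambiguous base
]
library = select_small_rnas(example_reads)
for seq, n in library.items():
    print(seq, len(seq), n)
```

Collapsing identical reads into counts keeps the library small while preserving the abundance of each small RNA for later analysis.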
`
`In alternative embodiments, the methods further comprise determining the
`
`sequence of the assembled contiguous unit, or the contig; or further comprise:
`
`(a) searching a database of viral or microorganism sequences using the at least
`
`one contiguous sequence to identify a viral or microorganism genome, nucleic acid or
`
`protein-encoding sequence, or subsequence thereof, having significant homology to
`
`the assembled contiguous unit; or
`
`(b) the method of (a), wherein the database comprises non-redundant
`
`nucleotide sequences; or
`
`(c) the method of (a), wherein the database comprises in silico translation
`
`sequences.
`
`wherein optionally the assembled contig sequence has significant homology to
`
`a known viral genus or genome.
`
`In alternative embodiments, the methods further comprise searching a
`
`database of viral or microorganism sequences using the at least one contiguous
`
`sequence to identify a viral or microorganism genome, nucleic acid or
`
`protein-encoding sequence, or subsequence thereof, having at least about 50 to about 100
`
`percent homology to all or part of the assembled contiguous sequence.
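As a hedged illustration of how a percentage threshold of this kind might be evaluated, the sketch below computes percent identity between two sequences that have already been aligned. Treating "homology" as percent identity over gap-free alignment columns is an assumption of this example, as are the names `percent_identity` and `meets_threshold`; the alignment itself would come from a separate search tool.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over alignment columns that contain no gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns are skipped (an assumed convention)
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def meets_threshold(aligned_a, aligned_b, minimum=50.0):
    """True when the identity is at least `minimum` percent."""
    return percent_identity(aligned_a, aligned_b) >= minimum


# Hypothetical aligned protein fragments: 5 of 6 compared columns match.
identity = percent_identity("MKT-LLV", "MRTALLV")
print(round(identity, 1), meets_threshold("MKT-LLV", "MRTALLV"))
```

Other conventions (for example, counting gap columns as mismatches) would give lower values for the same alignment, so the convention should be fixed before a threshold is applied.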
`
`In alternative embodiments, the methods further comprise making a
`
`phylogenetic analysis of the identified viral or microorganism genome, nucleic acid or
`
`protein-encoding sequence with the contiguous sequence.
`
`In alternative embodiments, the methods further comprise identifying and
`
`annotating the phylogenetic analysis of the identified viral sequence with the
`
`contiguous sequence.
`
`In alternative embodiments, the obtained RNA or nucleotide sequences are
`
`substantially purified or isolated from an organism of interest.
`
`In alternative embodiments, the methods further comprise substantially
`
`purifying small RNA fragments, or siRNAs, or miRNAs and/or piRNAs, from an
`
`organism of interest and sequencing the RNA fragments to obtain an RNA library.
`
`In alternative embodiments, the methods further comprise removing
`
`sequenced segments from the library that overlap with the genomic sequence of the
`
`organism of interest from which the RNA was derived.
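The removal of host-derived sequences can be sketched, under simplifying assumptions, as an exact-match filter against the host genome on both strands. The function names and the toy host sequence below are hypothetical; a practical implementation would more likely use a dedicated short-read aligner and tolerate mismatches.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence written with A, C, G, T."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def remove_host_sequences(sequences, host_genome):
    """Drop sequences that occur exactly in the host genome on either strand."""
    host = host_genome.upper()
    kept = []
    for seq in sequences:
        s = seq.upper()
        if s in host or reverse_complement(s) in host:
            continue  # host-derived, so not a candidate viral sequence
        kept.append(seq)
    return kept


# Toy data for illustration only.
host = "TTGACCGATAGGCTTAACGGATCCA"
candidates = [
    "ACCGATAGG",     # sense match to the host
    "ATCCGTTAA",     # antisense match to the host
    "GGGGCCCCAAAA",  # absent from the host, retained
]
enriched = remove_host_sequences(candidates, host)
print(enriched)
```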
`
`In alternative embodiments, the methods further comprise filling in gaps
`
`between the contiguous sequences. In one embodiment, filling in the gaps between
`
`the contiguous sequences comprises use of RT-PCR and/or sequencing to fill in gaps
`
`between the contiguous sequences.
`
`In alternative embodiments, the methods further comprise completing a
`
`genomic sequence of a virus or a microorganism comprising the contiguous sequence
`
`using 5’-RACE and 3’-RACE.
`
`In one embodiment, the organism or organisms is/are an invertebrate, an insect
`
`(Arthropoda), a nematode (Nematoda), a Mollusca, a Porifera, a plant, a fungus, an
`
`alga, a Cyanobacteria; or the organism or organisms are identified or unidentified
`
`and are derived from an environmental sample. In one embodiment, the
`
`environmental sample is a soil sample, a water sample or an air sample.
`
`In one embodiment, the invention provides methods for identifying a virus,
`
`comprising:
`
`constructing a small RNA library from an organism or organisms;
`
`deep sequencing the small RNA library;
`
`assembling the sequenced small RNAs using (a) all of the sequenced small
`
`RNAs of 18-28 nucleotides in length, or siRNAs, or miRNAs and/or piRNAs; or (b)
`
`small RNAs, or siRNAs, or miRNAs and/or piRNAs, of a defined length into a
`
`plurality of contigs;
`
`identifying and removing those assembled sequences mapped onto the genome
`
`of the organism to provide an enriched set of contigs;
`
`performing a homology search of contigs against known viruses at both the
`
`nucleotide and protein levels;
`
`optionally using RT-PCR and sequencing to fill the gaps between the contigs
`
`that show limited similarities with a known virus;
`
`completing the full-length genomic sequence of the identified virus with
`
`5’-RACE and 3’-RACE; and
`
`annotating the identified virus.
`
`In one embodiment, the organism or organisms is/are an invertebrate, an insect
`
`(Arthropoda), a nematode (Nematoda), a Mollusca, a Porifera, a plant, a fungus, an
`
`alga, a Cyanobacteria; or the organism or organisms are identified or unidentified
`
`and are derived from an environmental sample. In one embodiment, the
`
`environmental sample is a soil sample, a water sample or an air sample.
`
`The details of one or more embodiments of the disclosure are set forth in the
`
`accompanying drawings and the description below. Other features, objects, and
`
`advantages of the disclosure will be apparent from the description and drawings, and
`
`from the claims.
`
`All publications mentioned herein are incorporated herein by reference in full
`
`for the purpose of describing and disclosing the methodologies, which are described
`
`in the publications, which might be used in connection with the description herein.
`
`The publications discussed above and throughout the text are provided solely for their
`
`disclosure prior to the filing date of the present application. Nothing herein is to be
`
`construed as an admission that the inventors are not entitled to antedate such
`
`disclosure by virtue of prior disclosure.
`
`DESCRIPTION OF DRAWINGS
`
`Figure 1A and Figure 1B show the distribution of assembled viral siRNA contigs
`
`on the tripartite and bipartite RNA genome of (a) CMV and (b) FHV; as discussed in
`
`detail in Example 1, below.
`
`Figure 2 shows the discovery of three new viruses by assembly of viral
`
`siRNAs. A total of 54, 34 and 19 contigs assembled from sequenced siRNAs were
`
`mapped to DmTV, DmBV and DmTRV, respectively. The genome organization of
`
`EEV was shown as a reference for DmTRV. Percent protein sequence identities of
`
`assembled contigs (red bars) of the three viruses to related viruses were shown on the
`
`top or below; as discussed in detail in Example 1, below.
`
`Figure 3A, Figure 3B, Figure 3C illustrate a phylogenetic analysis of newly
`
`identified viruses (indicated by a red arrow) according to the similarities of viral
`
`RdRPs using the Clustal W method; as discussed in detail in Example 1, below.
`
`Figure 4A and Figure 4B illustrate the distribution of assembled viral siRNA
`
`contigs on the monopartite genome and bipartite RNA genome of SINV and a new
`
`nodavirus respectively; as discussed in detail in Example 1, below.
`
`Figure 5 illustrates position and distribution of FHV and SINV siRNA contigs
`
`assembled from small RNAs sequenced from (Figure 5A) Drosophila S2 cells
`
`infected with the B2-deletion mutant of FHV (11), (Figure 5B) a transgenic C.
`
`elegans strain in the RNAi-defective 1 (rde-1) mutant background carrying an FHV
`
`RNA1 replicon in which the coding sequence of B2 was replaced by that of GFP (29),
`
`and (Figure 5C) adult mosquitoes infected with SINV (22); as discussed in detail in
`
`Example 2, below.
`
`Figure 6 illustrates discovery of dsRNA viruses DTV (Figure 6A) and DBV
`
`(Figure 6B), and +RNA viruses DTrV (Figure 6C) and MNV (Figure 6D) from
`
`S2-GMR cells by vdSAR; as discussed in detail in Example 2, below.
`
`Figure 7 illustrates that the S2-GMR cells contained four infectious RNA viruses:
`
`Figure 7A illustrates that DTV, DBV, DXV and ANV were all detected by RT-PCR in
`
`non-contaminated S2 cells 4 days after inoculation with the supernatant of the
`
`S2-GMR cells; Figure 7B illustrates detection of a DI-RNA derived from ANV RNA2 in
`
`S2-GMR cells (right lane) and in S2 cells after inoculation with the supernatant of
`
`S2-GMR cells (left lane) by Northern blot hybridizations using a probe recognizing the
`
`3’-terminal 120 nt of RNA2; Figure 7C illustrates the structure of the cloned DI-RNA of
`
`ANV (top) and mapping of the perfect-matched 21-nt siRNAs sequenced from
`
`S2-GMR cells to the positive (blue) and negative (red) strands of ANV RNA2 (20-nt
`
`windows) (bottom); as discussed in detail in Example 2, below.
`
`Figure 8 illustrates size distribution (Figure 8A) and aggregate nucleotide
`
`composition (Figure 8B) of virus-derived small RNAs in Drosophila OSS cells; as
`
`discussed in detail in Example 2, below.
`
`Like reference symbols in the various drawings indicate like elements.
`
`DETAILED DESCRIPTION
`
`In alternative embodiments the invention provides methods for viral genome
`
`assembly and viral discovery using small inhibitory RNAs, or “small silencing”
`
`RNAs (siRNAs), micro-RNAs (miRNAs) and/or PIWI-interacting RNAs (piRNAs),
`
`including siRNAs, miRNAs and/or piRNAs isolated or sequenced from invertebrate
`
`organisms such as insects (Arthropoda), nematodes (Nematoda), Mollusca, Porifera,
`
`and other invertebrates, and/or plants, fungi or algae, Cyanobacteria and the like.
`
`As described in Example 2, we found that viral small silencing RNAs
`
`produced by invertebrate animals are overlapping in sequence and can assemble into
`
`long contiguous fragments of the invading viral genome from small RNA libraries
`
`sequenced by next generation platforms. Based on this finding, we developed an
`
`approach of virus discovery in invertebrates by deep sequencing and assembly of total
`
`small RNAs (vdSAR) isolated from a host organism of interest.
`
`As described in Example 2, alternative embodiments of the invention revealed
`
`mixed infection of Drosophila cell lines and adult mosquitoes by multiple RNA viruses,
`
`five of which were new. Analysis of small RNAs from the mixed-infected Drosophila cells
`
`showed that infection by all three distinct dsRNA viruses triggered production of viral
`
`siRNAs with features similar to siRNAs derived from +RNA viruses. Our study also
`
`revealed production and assembly of virus-derived piRNAs in Drosophila cells,
`
`suggesting a novel function of piRNAs in viral immunity. Thus, the unique features of
`
`the invention’s vdSAR allow discovery of new invertebrate and arthropod-borne animal and
`
`human viral pathogens.
`
`As used herein and in the appended claims, the singular forms “a,” “an,” and
`
`“the” include plural referents unless the context clearly dictates otherwise. Thus, for
`
`example, reference to “an siRNA” includes a plurality of such siRNAs and reference
`
`to “the virus” includes reference to one or more viruses, and so forth.
`
`Unless defined otherwise, all technical and scientific terms used herein have
`
`the same meaning as commonly understood to one of ordinary skill in the art to which
`
`this disclosure belongs. Although any methods and reagents similar or equivalent to
`
`those described herein can be used in the practice of the disclosed methods and
`
`compositions, the exemplary methods and materials are now described.
`
`Also, the use of “or” means “and/or” unless stated otherwise. Similarly,
`
`“comprise,” “comprises,” “comprising,” “include,” “includes,” and “including” are
`
`interchangeable and not intended to be limiting.
`
`It is to be further understood that where descriptions of various embodiments
`
`use the term “comprising,” those skilled in the art would understand that in some
`
`specific instances, an embodiment can be alternatively described using language
`
`“consisting essentially of” or “consisting of.”
`
`The disclosure of U.S. Patent No. 7,211,390, describing techniques associated
`
`with “deep sequencing” is incorporated herein by reference.
`
`In immunity, viral infection induces production of virus-derived small
`
`interfering RNAs (siRNAs), pi-RNAs and miRNAs that subsequently guide specific
`
`viral RNA clearance by the RNA interference (RNAi), pi-RNA and miRNA
`
`mechanism. In D. melanogaster, for example, siRNAs 21 nucleotides long
`
`targeting several positive-strand (+) RNA viruses are produced by Dicer-2 from
`
`processing dsRNA replicative intermediates synthesized during viral RNA
`
`replication. Assisted by the dsRNA-binding protein R2D2, these viral siRNAs are
`
`then loaded in Argonaute-2 to direct viral RNA clearance (Galiana-Arnoux et al.,
`
`2006, Wang et al., 2006; Zambon et al., 2006). As a counter defense, viruses encode
`
`essential pathogenesis proteins that are viral suppressors of RNAi (VSRs) (Li and
`
`Ding, 2006; Mlotshwa et al., 2008). VSRs may inhibit either the production or
`
`activity of viral siRNAs by targeting the dsRNA precursors, siRNAs or Argonaute
`
`proteins. Several nuclear-replicating DNA viruses produce virus-derived microRNAs
`
`following infection of their mammalian host cells and many proteins encoded by
`
`mammalian RNA and DNA viruses exhibit VSR activity. However, the current
`
`consensus is that in vertebrates viral dsRNA triggers PKR and interferon responses
`
`instead of the RNAi response.
`
`The disclosure provides a method for virus discovery that is independent of
`
`either amplification or purification of viral particles. Many human diseases,
`
`including approximately half of all analyzed cases of human encephalitis and
`
`gastroenteritis, have no identified etiology. Thus, discovery of new viruses or the
`
`identification of the presence of viral infection can facilitate identification of human
`
`pathogenic viruses, improve our understanding of their transmission and provide
`
`diagnostic tools and targets for the development of anti-virals.
`
`The disclosure is based, in part, on the understanding of the mechanism of
`
`RNAi-mediated, including pi-RNA-, miRNA- and siRNA-based, viral immunity. In
`
`this immunity, viral infection induces production of virus-derived small interfering
`
`RNAs (siRNAs), pi-RNAs and miRNAs, that subsequently guide specific viral RNA
`
`clearance by the RNA interference (RNAi) (pi-RNA-, miRNA- and siRNA-based)
`
`mechanism. In D. melanogaster, for example, siRNAs 21 nucleotides long
`
`targeting several positive-strand (+) RNA viruses are produced by Dicer-2 from
`
`processing dsRNA replicative intermediates synthesized during viral RNA
`
`replication. Assisted by the dsRNA-binding protein R2D2, these viral siRNAs are
`
`then loaded in Argonaute-2 to direct viral RNA clearance (Galiana-Arnoux et al.,
`
`2006; Wang et al., 2006; Zambon et al., 2006). As a counter defense, viruses encode
`
`essential pathogenesis proteins that are viral suppressors of RNAi (VSRs) (Li and
`
`Ding, 2006; Mlotshwa et al., 2008). VSRs may inhibit either the production or
`
`activity of viral siRNAs, pi-RNAs and miRNAs by targeting the dsRNA precursors,
`
`siRNAs, pi-RNAs and miRNAs or Argonaute proteins. Several nuclear-replicating
`
`DNA viruses produce virus-derived microRNAs following infection of their
`
`mammalian host cells and many proteins encoded by mammalian RNA and DNA
`
`viruses exhibit VSR activity (Ding and Voinnet, 2007). However, the current
`
`consensus is that in vertebrates viral dsRNA triggers PKR and interferon responses
`
`instead of the RNAi (siRNAs, pi-RNAs and miRNAs) response.
`
`Using next-generation sequencing technologies, the disclosure demonstrates
`
`viral siRNAs, pi-RNAs and miRNAs produced in plants and fruit flies infected with
`
`positive-strand RNA viruses, which are closely related to human pathogenic viruses
`
`such as poliovirus and West Nile virus. The results show that viral siRNAs produced
`
`by the host immune system in response to viral infection are overlapping in sequence
`
`and can be assembled back into long continuous fragments (contigs) of the infecting
`
`viral RNA genome using assembly programs developed for short read genome
`
`sequencing. Unlike individual siRNAs, pi-RNAs and miRNAs, the contigs assembled
`
`from viral siRNAs, pi-RNAs and miRNAs can be translated into protein sequences in
`
`silico for homology searches to identify new viruses that may be only distantly related
`
`to known viruses.
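The in silico translation referred to above can be illustrated with a six-frame translation of an assembled contig. The sketch below assumes the standard genetic code and invents the names `translate` and `six_frame_translation` for the purpose of the example; the resulting protein strings would then be passed to a separate protein homology search, which is not shown.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written with A, C, G, T."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unknown codons become 'X', stops become '*'."""
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(contig):
    """Return the translations of frames +1, +2, +3, -1, -2 and -3."""
    sense = contig.upper().replace("U", "T")
    frames = {}
    for sign, strand in (("+", sense), ("-", reverse_complement(sense))):
        for offset in range(3):
            frames[f"{sign}{offset + 1}"] = translate(strand[offset:])
    return frames


frames = six_frame_translation("ATGGCCAAATGA")
for name, protein in frames.items():
    print(name, protein)
```

Translating all six frames is useful here because the strand and reading frame of a newly assembled contig are not known in advance.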
`
`The disclosure demonstrates that deep sequencing by the next generation
`
`technologies and assembly of virus-derived siRNAs, pi-RNAs and miRNAs can be
`
`employed as a new approach for virus discovery and identification. Indeed, the
`
`examination of a recently sequenced small RNA library (Flynt et al., 2009) made
`
`from a Drosophila cell line found that the cell line is infected with at least five distinct
`
`RNA viruses. These include two known viruses and three new viruses belonging to
`
`different genera not previously described. Since virus infection of plants and
`
`invertebrates inevitably results in the production of virus-derived siRNAs, pi-RNAs
`
`and miRNAs, this invention does not depend on the ability to either amplify the virus
`
`or purify the viral particle to enrich viral nucleic acids, which is essential for the
`
`current technologies. Importantly, any viruses detected by the method of the
`
`disclosure are live and replication-competent because viral siRNAs, pi-RNAs and
`
`miRNAs are products of an active host immune response to viral infection.
`
`The observation that individual viral siRNAs, pi-RNAs and miRNAs can be
`
`assembled back to longer genome fragments of the invading virus provides an
`
`exciting new method for virus discovery by deep sequencing and assembly of viral
`
`siRNAs, pi-RNAs and miRNAs. Unlike individual siRNAs, pi-RNAs and miRNAs,
`
`the contigs assembled from viral siRNAs, pi-RNAs and miRNAs can be translated
`
`into protein sequences in silico for homology searches to identify new viruses that
`
`may be distantly related to known viruses.
`
`The disclosure provides a framework for VDsiR comprising bioinformatics
`
`analysis and experimental verification. Small RNA assembly is a useful component
`
`of the system; the number of input sequences and the choice of program both affect
`
`the output. In a pilot study (described herein), Velvet was found to be a useful
`
`program for the project; it employs the principle of de Bruijn graphs to build up
`
`continuous sequence from short reads in short run time (Zerbino et al, 2008).
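The de Bruijn graph principle can be illustrated with a toy assembler. The sketch below is not Velvet and omits what a production assembler provides (error correction, coverage cutoffs, reverse-complement handling, repeat resolution); it only shows how overlapping short reads are decomposed into k-mers and how unbranched paths through the graph are read out as contigs. The function name, the value of k and the toy sequence are assumptions for illustration.

```python
from collections import defaultdict


def assemble_unitigs(reads, k):
    """Assemble reads into unbranched de Bruijn paths (single strand only).

    Nodes are (k-1)-mers and each distinct k-mer is an edge. Isolated
    cycles, in which every node has one predecessor and one successor,
    are not reported by this simplified traversal.
    """
    kmers = {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}
    successors = defaultdict(set)
    predecessors = defaultdict(set)
    for kmer in kmers:
        successors[kmer[:-1]].add(kmer[1:])
        predecessors[kmer[1:]].add(kmer[:-1])
    nodes = set(successors) | set(predecessors)

    def is_simple(node):
        # Exactly one way in and one way out: the path cannot branch here.
        return (len(predecessors.get(node, ())) == 1
                and len(successors.get(node, ())) == 1)

    contigs = []
    for node in nodes:
        if is_simple(node):
            continue  # contigs start only at path ends or branch points
        for nxt in successors.get(node, ()):
            path = node + nxt[-1]
            current = nxt
            while is_simple(current):
                current = next(iter(successors[current]))
                path += current[-1]
            contigs.append(path)
    return sorted(contigs, key=len, reverse=True)


# Toy "viral" fragment and overlapping 8-nt reads tiled every 2 nt.
fragment = "ATGGCGTGCAATCC"
reads = [fragment[i:i + 8] for i in range(0, len(fragment) - 7, 2)]
contigs = assemble_unitigs(reads, k=5)
print(contigs)
```

Because the reads overlap and the toy fragment contains no repeated 4-mer, the graph is a single unbranched path and the fragment is recovered as one contig.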
`
`The disclosure thus provides in one embodiment, a method comprising (i)
`
`obtaining nucleotide sequences from a small RNA library comprising a plurality of
`
`naturally occurring 18-28 nucleotide RNA fragments to obtain a sequenced small
`
`RNA library; (ii) assembling the sequences in the sequenced small RNA library into
`
`at least one contiguous sequence comprising a plurality of nucleotide RNA fragment
`
`sequences; optionally filling in gaps in a sequence by RT-PCR techniques; (iii)
`
`searching a database of viral sequences using the at least one contiguous sequence to
`
`identify a viral sequence having at least 50 to 100 percent homology to the contiguous
`
`sequence; (iv) identifying and annotating the phylogenetic analysis of the identified
`
`viral sequence with the contiguous sequence.
`
`It will be understood that a sequence library may be provided by a third party
`
`or made available to a user in any number of ways (e.g., internet, computer readable
`
`medium and the like) and thus the process described above can be adapted to carry out the
`
`process and identify or annotate a virus accordingly. In some embodiments, however,
`
`the library may be a sample library comprising substantially purified RNA from an
`
`organism of interest. In such instances, deep sequencing techniques are carried out
`
`and a sequence library created. In yet another embodiment, a sample comprising
`
`substantially purified small RNA fragments from an organism of interest is
`
`provided, in which case sequencing the RNA fragments to obtain the small RNA
`
`library is performed.
`
`In yet another embodiment, if a gross RNA sample from an organism is
`
`provided or where increased homology searching is desired, the method may
`
`optionally include removing sequenced segments from the library that overlap with
`
`the genomic sequence of the organism of interest from which the RNA was derived.
`
`In yet another embodiment, the method further comprises completing a
`
`genomic sequence of a virus comprising the contiguous sequence using 5’-RACE and
`
`3’-RACE.
`
`For example, the disclosure demonstrates, in the specific embodiments and as
`
`proof of principle, that a method including the following steps resulted in the
`
`identification of 2 known viruses and 3 novel viruses from a D. melanogaster sample:
`
`construction of a small RNA library from cell culture or adult insects such as
`
`mosquitoes or fruit flies; deep sequencing of the small RNA libraries with an
`
`Illumina 2G Analyser; assembly of the sequenced small RNAs by Velvet using either
`
`all of the sequenced small RNAs of 18-28 nucleotides in length or small RNAs of
`
`specific lengths such as 21-nt and 22-nt, which most likely represent the products of
`
`Drosophila Dicer-2 and Dicer-1, respectively, to generate a contig(s) (contigs of
`
`virus-derived siRNAs may include features such as specific enrichment of 21-nt to
`
`22-nt small RNAs, the presence of small RNAs of both polarities and the high density
`
`of siRNAs (number of siRNAs/length of contigs)); identification and removal of those
`
`assembled sequences mapped onto the host genome when the genome sequence is
`
`known, which reduces the number of candidates for the next steps; homology search
`
`of contigs with known viruses at both the nucleotide and protein levels; in an optional
`
`embodiment, RT-PCR and sequencing to fill the gaps between the contigs that show
`
`limited similarities with a known virus; optionally completing the full-length genomic
`
`sequence of the identified virus with 5’-RACE and 3’-RACE; and annotation and
`
`phylogenetic analysis of the identified virus.
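The contig features named above (enrichment of 21-nt to 22-nt small RNAs, small RNAs of both polarities, and siRNA density expressed as number of siRNAs per contig length) can be computed as in the following sketch. Exact-match mapping of reads to the contig, the function name `contig_features` and the toy sequences are assumptions made for illustration; no particular cutoff values are implied by the disclosure.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence written with A, C, G, T."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def contig_features(contig, reads):
    """Summarize small RNAs that match a contig exactly on either strand."""
    sense = contig.upper()
    antisense = reverse_complement(sense)
    plus = minus = in_21_22 = 0
    for read in reads:
        r = read.upper()
        if r in sense:
            plus += 1
        elif r in antisense:
            minus += 1
        else:
            continue  # read does not map to this contig
        if len(r) in (21, 22):
            in_21_22 += 1
    mapped = plus + minus
    return {
        "mapped": mapped,
        "plus": plus,
        "minus": minus,
        "density": mapped / len(sense),  # siRNAs per nucleotide of contig
        "fraction_21_22": in_21_22 / mapped if mapped else 0.0,
        "both_polarities": plus > 0 and minus > 0,
    }


# Toy contig (34 nt) and reads built from it for illustration.
contig = "ATGGCGTGCAATCCGTTAGACCTGAAGTCTGGAC"
reads = [
    contig[0:21],                       # 21 nt, positive strand
    contig[5:27],                       # 22 nt, positive strand
    reverse_complement(contig[10:31]),  # 21 nt, negative strand
    contig[0:25],                       # 25 nt, positive strand
    "T" * 21,                           # does not map
]
features = contig_features(contig, reads)
print(features)
```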
`
`As used herein a sample is any sample that can contain a virus. Thus, the
`
`sample can be obtained from the environment or from a specific organism (including
`
`plants, insects and mammals). An environmental sample can be obtained from any
`
`number of sources (as described above), including, for example, insect feces, hot
`
`springs, soil and the like. Any source of nucleic acids in purified or non-purified form
`
`can be utilized as starting material. Thus, the nucleic acids may be obtained from any
`
`source which is contaminated by an infectious organism (e.g. a virus). The sample can
`
`be an extract from any bodily sample such as blood, urine, spinal fluid, tissue, vaginal
`
`swab, stool, amniotic fluid or buccal mouthwash from any mammalian organism. For
`
`non-mammalian (e.g., invertebrate) organisms the sample can be a tissue sample,
`
`salivary sample, fecal material or material in the digestive tract of the organism. For
`
`example, in horticulture and agricultural testing the sample can be a plant, soil, liquid
`
`or other horticultural or agricultural product; in food testing the sample can be fresh
`
`food or processed food (for example infant formula, seafood, fresh produce and
`
`packaged food)