(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)
`
(19) World Intellectual Property Organization
International Bureau

(43) International Publication Date: 9 December 2010 (09.12.2010)

(10) International Publication Number: WO 2010/141433 A2

(51) International Patent Classification:
C12Q 1/68 (2006.01)
C40B 40/06 (2006.01)
C12Q 1/70 (2006.01)

(21) International Application Number: PCT/US2010/036849

(22) International Filing Date: 1 June 2010 (01.06.2010)

(25) Filing Language: English

(26) Publication Language: English

(30) Priority Data:
61/183,377, 2 June 2009 (02.06.2009), US
61/286,742, 15 December 2009 (15.12.2009), US

(71) Applicant (for all designated States except US): THE REGENTS OF THE UNIVERSITY OF CALIFORNIA [US/US]; 1111 Franklin Street, 5th Floor, Oakland, CA 94607-5200 (US).

(72) Inventors; and
(75) Inventors/Applicants (for US only): DING, Shou-Wei [US/US]; 8797 Barnwood Lane, Riverside, CA 92508 (US). WU, Qingfa [CN/US]; 1177 Linden Street, Apt. 19, Riverside, CA 92507 (US).

(74) Agents: EINHORN, Gregory, P. et al.; Gavrilovich, Dodd & Lindsey LLP, 4660 La Jolla Village Drive, Suite 750, San Diego, CA 92122 (US).

(81) Designated States (unless otherwise indicated, for every kind of national protection available): AE, AG, AL, AM, AO, AT, AU, AZ, BA, BB, BG, BH, BR, BW, BY, BZ, CA, CH, CL, CN, CO, CR, CU, CZ, DE, DK, DM, DO, DZ, EC, EE, EG, ES, FI, GB, GD, GE, GH, GM, GT, HN, HR, HU, ID, IL, IN, IS, JP, KE, KG, KM, KN, KP, KR, KZ, LA, LC, LK, LR, LS, LT, LU, LY, MA, MD, ME, MG, MK, MN, MW, MX, MY, MZ, NA, NG, NI, NO, NZ, OM, PE, PG, PH, PL, PT, RO, RS, RU, SC, SD, SE, SG, SK, SL, SM, ST, SV, SY, TH, TJ, TM, TN, TR, TT, TZ, UA, UG, US, UZ, VC, VN, ZA, ZM, ZW.

(84) Designated States (unless otherwise indicated, for every kind of regional protection available): ARIPO (BW, GH, GM, KE, LR, LS, MW, MZ, NA, SD, SL, SZ, TZ, UG, ZM, ZW), Eurasian (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), European (AL, AT, BE, BG, CH, CY, CZ, DE, DK, EE, ES, FI, FR, GB, GR, HR, HU, IE, IS, IT, LT, LU, LV, MC, MK, MT, NL, NO, PL, PT, RO, SE, SI, SK, SM, TR), OAPI (BF, BJ, CF, CG, CI, CM, GA, GN, GQ, GW, ML, MR, NE, SN, TD, TG).

Published:
without international search report and to be republished upon receipt of that report (Rule 48.2(g))
with sequence listing part of description (Rule 5.2(a))
(54) Title: VIRUS DISCOVERY BY SEQUENCING AND ASSEMBLY OF VIRUS-DERIVED siRNAs, miRNAs, piRNAs

(57) Abstract: In one embodiment, the disclosure provides methods and systems for identifying viral nucleic acids in a sample. In another embodiment the invention provides methods for viral genome assembly and viral discovery using small inhibitory RNAs, or "small silencing," RNAs (siRNAs), micro-RNAs (miRNAs) and/or PIWI-interacting RNAs (piRNAs), including siRNAs, miRNAs and/or piRNAs isolated or sequenced from invertebrate organisms such as insects (Arthropoda), nematodes (Nematoda), Mollusca, Porifera, and other invertebrates, and/or plants, fungi or algae, Cyanobacteria and the like.
`
`WO 2010/141433
`
PCT/US2010/036849
`
`VIRUS DISCOVERY BY SEQUENCING AND ASSEMBLY
`OF VIRUS-DERIVED siRNAs, miRNAs, piRNAs
`
`REFERENCE TO SEQUENCE LISTING
`
This application contains a .txt file containing the sequence listing, which is incorporated by reference herein.
`
`STATEMENT AS TO FEDERALLY SPONSORED RESEARCH
`
This invention was made with government support under Grant No. AI052447 awarded by the National Institutes of Health (NIH) and Grant No. 2007-35319-18325 awarded by the USDA. The government has certain rights in the invention.
`
`TECHNICAL FIELD
`
In one embodiment, the disclosure provides methods and systems for identifying viral nucleic acids in a sample. In another embodiment the invention provides methods for viral genome assembly and viral discovery using small inhibitory RNAs, or “small silencing,” RNAs (siRNAs), micro-RNAs (miRNAs) and/or PIWI-interacting RNAs (piRNAs), including siRNAs, miRNAs and/or piRNAs isolated or sequenced from invertebrate organisms such as insects (Arthropoda), nematodes (Nematoda), Mollusca, Porifera, and other invertebrates, and/or plants, fungi or algae, Cyanobacteria and the like.
`
`BACKGROUND
`
`Discovery of new viruses is often hindered by difficulties in their
`
`amplification in cell culture and/or lack of their cross-reactivity in serological and
`
`nucleic acid hybridization assays to known viruses. Many new viruses have been
`
`recently identified in environmental and clinical samples using metagenomic
`
`approaches, in which viral particles are first purified and viral nucleic acid sequences
`
`are then randomly amplified prior to subcloning and sequencing (Delwart, 2007).
`
The Dicer family of host immune receptors mediates antiviral immunity in fungi, plants and invertebrate animals by RNA interference (RNAi) or RNA silencing (1-3). In this immunity, a viral double-stranded RNA (dsRNA) is recognized by Dicer and diced into small interfering RNAs (siRNAs). These virus-derived siRNAs are then loaded into an RNA silencing complex to act as specificity determinants and to guide slicing of the target viral RNAs by an Argonaute protein (AGO) present in the complex. Dicer proteins typically contain an RNA helicase domain, a PAZ domain shared with AGOs, and two tandem type III endoribonuclease (RNase III) domains. Dicer cleaves dsRNA with a simple preference toward a terminus of dsRNA, producing duplex small RNA fragments of discrete sizes progressively from the terminus (4).
`
In addition to siRNAs, microRNAs (miRNAs) and PIWI-interacting RNAs (piRNAs) also guide RNA silencing in similar complexes but with distinct AGOs (4-6). In Drosophila melanogaster, miRNAs and siRNAs are predominantly 22 and 21 nucleotides in length, dependent on Dicer-1 (DCR1) and DCR2 for their biogenesis, and act in silencing complexes containing AGO1 and AGO2 in the AGO subfamily, respectively (4-6). In contrast, ~24-30-nt piRNAs are Dicer-independent and require AGO3, Aubergine (AUB) and PIWI in the PIWI subfamily for their biogenesis (4-6). Genetic analyses (7-10) have clearly demonstrated a role for D. melanogaster DCR2 in the immunity and biogenesis of viral siRNAs targeting diverse positive-strand (+) RNA viruses, including Flock house virus (FHV), cricket paralysis virus, Drosophila C virus (DCV), and Sindbis virus (SINV). Cloning and sequencing of small RNAs from FHV-infected Drosophila cells further indicate that the viral dsRNA replicative intermediates (vRI-dsRNA) are the substrate of DCR2 and the precursor of viral siRNAs (11-12). Drosophila susceptibility to Drosophila X virus (DXV), which contains a dsRNA genome, is influenced by components from both the siRNA (e.g., AGO2 & R2D2) and piRNA (e.g., AUB & PIWI) pathways (13). However, detection of small RNAs derived from any dsRNA virus has not been reported yet (1, 13).
`
Virus-derived small RNAs were first detected in plants infected with a +RNA virus (14). The Dicer proteins involved in the production of siRNAs targeting both +RNA viruses and DNA viruses have been identified in Arabidopsis thaliana (2-3), which encodes AGOs in the AGO subfamily but not in the PIWI subfamily (15). Cloning and sequencing of plant viral siRNAs suggest that they may be processed either from vRI-dsRNA or hairpin regions of single-stranded RNA precursors (16-20). Production of viral siRNAs has also been demonstrated in fungi, silkworms, mosquitoes, and nematodes in response to infection with +RNA viruses, and viral small silencing RNAs produced in fungi and mosquitoes have recently been cloned and sequenced (21-25).
`
`
`The available data thus illustrate that accumulation of virus-derived small
`
`silencing RNAs is a common feature of an active immune response to viral infection
`
`in diverse eukaryotic host species.
`
`SUMMARY
`
The disclosure provides a method for viral discovery that is independent of either amplification or purification of viral particles. Many human diseases, such as approximately half of all analyzed cases of human encephalitis and gastroenteritis, have no identified etiology. Thus, discovery of new viruses should facilitate identification of human pathogenic viruses, improve the understanding of their transmission and provide diagnostic tools and targets for the development of anti-virals.
`
The disclosure provides methods of identifying viral nucleic acid, assembling viral genomes and discovering viruses based upon the mechanism of invertebrate, plant, algae, fungal etc. processing of viral small inhibitory RNAs, or “small silencing,” RNAs (siRNAs), micro-RNAs (miRNAs) and/or PIWI-interacting RNAs (piRNAs), including miRNA-, piRNA-, siRNA- and/or RNAi-mediated viral immunity in plants and invertebrates, including insects (Drosophila melanogaster and mosquitoes) and nematodes (Caenorhabditis elegans), and algae, fungus, Cyanobacteria and the like.
`
`In alternative embodiments, the invention provides methods comprising:
`
(a) (i) obtaining a plurality of naturally occurring 18-28 nucleotide RNA fragments, or siRNAs, or miRNAs and/or piRNAs, to generate an RNA library, or, obtaining a plurality of 18-28 nucleotide RNA fragments, or siRNAs, or miRNAs and/or piRNAs from an organism or organisms, or a plant or plants; and

(ii) determining the sequence of the RNA fragments, or siRNAs, or miRNAs and/or piRNAs, and using those sequences to assemble the RNA fragments, or siRNAs, or miRNAs and/or piRNAs into at least one contiguous unit (“a contig”) comprising a plurality of the nucleotide RNA fragments, siRNAs, miRNAs and/or piRNAs; or

(b) the method of (a), wherein the contigs are assembled using the help of a computer program, wherein optionally the computer program is VELVET.
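As an illustration of step (a)(i), the following minimal Python sketch selects reads in the 18-28 nucleotide range from a sequenced library and collapses identical reads into counts. The function name `select_small_rnas`, the conversion of U to T, and the rejection of reads containing ambiguous bases are assumptions made for this example; they are not taken from the disclosure.

```python
from collections import Counter


def select_small_rnas(reads, min_len=18, max_len=28):
    """Keep reads of min_len..max_len nucleotides and count identical sequences.

    Hypothetical helper: reads are normalized to upper-case DNA letters
    (U -> T) and any read with a non-ACGT character is discarded.
    """
    counts = Counter()
    for read in reads:
        seq = read.strip().upper().replace("U", "T")
        if min_len <= len(seq) <= max_len and set(seq) <= set("ACGT"):
            counts[seq] += 1
    return counts


# Toy input: two copies of one 21-nt read (one written as RNA),
# one read that is too short, and one read with an ambiguous base.
reads = [
    "ACGUACGUACGUACGUACGUA",
    "ACGT",
    "ACGTACGTACGTACGTACGTA",
    "ACGTNACGTACGTACGTACGTA",
]
library = select_small_rnas(reads)
```

Collapsing identical reads keeps the read counts available for later steps, such as estimating how densely a contig is covered.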
`
In alternative embodiments, the methods further comprise determining the sequence of the assembled contiguous unit, or the contig; or further comprise:

(a) searching a database of viral or microorganism sequences using the at least one contiguous sequence to identify a viral or microorganism genome, nucleic acid or protein-encoding sequence, or subsequence thereof, having significant homology to the assembled contiguous unit; or

(b) the method of (a), wherein the database comprises non-redundant nucleotide sequences; or

(c) the method of (a), wherein the database comprises in silico translation sequences,

wherein optionally the assembled contig sequence has significant homology to a known viral genus or genome.
`
In alternative embodiments, the methods further comprise searching a database of viral or microorganism sequences using the at least one contiguous sequence to identify a viral or microorganism genome, nucleic acid or protein-encoding sequence, or subsequence thereof, having at least about 50% to about 100% homology to all or part of the assembled contiguous sequence.

In alternative embodiments, the methods further comprise making a phylogenetic analysis of the identified viral or microorganism genome, nucleic acid or protein-encoding sequence with the contiguous sequence.

In alternative embodiments, the methods further comprise identifying and annotating the phylogenetic analysis of the identified viral sequence with the contiguous sequence.

In alternative embodiments, the obtained RNA or nucleotide sequences are substantially purified or isolated from an organism of interest.

In alternative embodiments, the methods further comprise substantially purifying small RNA fragments, or siRNAs, or miRNAs and/or piRNAs, from an organism of interest and sequencing the RNA fragments to obtain an RNA library.

In alternative embodiments, the methods further comprise removing sequenced segments from the library that overlap with the genomic sequence of the organism of interest from which the RNA was derived.

In alternative embodiments, the methods further comprise filling in gaps between the contiguous sequences. In one embodiment, filling in the gaps between the contiguous sequences comprises use of RT-PCR and/or sequencing to fill in gaps between the contiguous sequences.

In alternative embodiments, the methods further comprise completing a genomic sequence of a virus or a microorganism comprising the contiguous sequence using 5’-RACE and 3’-RACE.
`
In one embodiment, the organism or organisms is/are an invertebrate, an insect (Arthropoda), a nematode (Nematoda), a Mollusca, a Porifera, a plant, a fungus, an alga, a Cyanobacteria; or the organism or organisms are identified or unidentified and are derived from an environmental sample. In one embodiment, the environmental sample is a soil sample, a water sample or an air sample.

In one embodiment, the invention provides methods for identifying a virus, comprising:

constructing a small RNA library from an organism or organisms;

deep sequencing the small RNA library;

assembling the sequenced small RNAs using (a) all of the sequenced small RNAs of 18-28 nucleotides in length, or siRNAs, or miRNAs and/or piRNAs; or (b) small RNAs, or siRNAs, or miRNAs and/or piRNAs, of a defined length into a plurality of contigs;

identifying and removing those assembled sequences mapped onto the genome of the organism to provide an enriched set of contigs;

performing a homology search of contigs against known viruses at both the nucleotide and protein levels;

optionally using RT-PCR and sequencing to fill the gaps between the contigs that show limited similarities with a known virus;

completing the full-length genomic sequence of the identified virus with 5’-RACE and 3’-RACE; and

annotating the identified virus.

In one embodiment, the organism or organisms is/are an invertebrate, an insect (Arthropoda), a nematode (Nematoda), a Mollusca, a Porifera, a plant, a fungus, an alga, a Cyanobacteria; or the organism or organisms are identified or unidentified and are derived from an environmental sample. In one embodiment, the environmental sample is a soil sample, a water sample or an air sample.
`
The details of one or more embodiments of the disclosure are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the disclosure will be apparent from the description and drawings, and from the claims.

All publications mentioned herein are incorporated herein by reference in full for the purpose of describing and disclosing the methodologies, which are described in the publications, which might be used in connection with the description herein. The publications discussed above and throughout the text are provided solely for their disclosure prior to the filing date of the present application. Nothing herein is to be construed as an admission that the inventors are not entitled to antedate such disclosure by virtue of prior disclosure.
`
`DESCRIPTION OF DRAWINGS
`
Figure 1A and Figure 1B show the distribution of assembled viral siRNA contigs on the tripartite and bipartite RNA genomes of (a) CMV and (b) FHV; as discussed in detail in Example 1, below.

Figure 2 shows the discovery of three new viruses by assembly of viral siRNAs. A total of 54, 34 and 19 contigs assembled from sequenced siRNAs were mapped to DmTV, DmBV and DmTRV, respectively. The genome organization of EEV is shown as a reference for DmTRV. The % protein sequence identities of assembled contigs (red bars) of the three viruses to related viruses are shown on the top or below; as discussed in detail in Example 1, below.

Figure 3A, Figure 3B, Figure 3C illustrate a phylogenetic analysis of newly identified viruses (indicated by a red arrow) according to the similarities of viral RdRPs with the Clustal W method; as discussed in detail in Example 1, below.

Figure 4A and Figure 4B illustrate the distribution of assembled viral siRNA contigs on the monopartite genome and bipartite RNA genome of SINV and a new nodavirus, respectively; as discussed in detail in Example 1, below.

Figure 5 illustrates position and distribution of FHV and SINV siRNA contigs assembled from small RNAs sequenced from (Figure 5A) Drosophila S2 cells infected with the B2-deletion mutant of FHV (11), (Figure 5B) a transgenic C. elegans strain in the RNAi-defective 1 (rde-1) mutant background carrying an FHV RNA1 replicon in which the coding sequence of B2 was replaced by that of GFP (29), and (Figure 5C) adult mosquitoes infected with SINV (22); as discussed in detail in Example 2, below.

Figure 6 illustrates discovery of dsRNA viruses DTV (Figure 6A) and DBV (Figure 6B), and +RNA viruses DTrV (Figure 6C) and MNV (Figure 6D) from S2-GMR cells by vdSAR; as discussed in detail in Example 2, below.

Figure 7 illustrates that the S2-GMR cells contained four infectious RNA viruses: Figure 7A illustrates that DTV, DBV, DXV and ANV were all detected by RT-PCR in non-contaminated S2 cells 4 days after inoculation with the supernatant of the S2-GMR cells; Figure 7B illustrates detection of a DI-RNA derived from ANV RNA2 in S2-GMR cells (right lane) and in S2 cells after inoculation with the supernatant of S2-GMR cells (left lane) by Northern blot hybridizations using a probe recognizing the 3’-terminal 120 nt of RNA2; Figure 7C illustrates the structure of the cloned DI-RNA of ANV (top) and mapping of the perfect-matched 21-nt siRNAs sequenced from S2-GMR cells to the positive (blue) and negative (red) strands of ANV RNA2 (20-nt windows) (bottom); as discussed in detail in Example 2, below.

Figure 8 illustrates size distribution (Figure 8A) and aggregate nucleotide composition (Figure 8B) of virus-derived small RNAs in Drosophila OSS cells; as discussed in detail in Example 2, below.
`
`Like reference symbols in the various drawings indicate like elements.
`
`DETAILED DESCRIPTION
`
In alternative embodiments the invention provides methods for viral genome assembly and viral discovery using small inhibitory RNAs, or “small silencing,” RNAs (siRNAs), micro-RNAs (miRNAs) and/or PIWI-interacting RNAs (piRNAs), including siRNAs, miRNAs and/or piRNAs isolated or sequenced from invertebrate organisms such as insects (Arthropoda), nematodes (Nematoda), Mollusca, Porifera, and other invertebrates, and/or plants, fungi or algae, Cyanobacteria and the like.

As described in Example 2, we found that viral small silencing RNAs produced by invertebrate animals are overlapping in sequence and can assemble into long contiguous fragments of the invading viral genome from small RNA libraries sequenced by next generation platforms. Based on this finding, we developed an approach of virus discovery in invertebrates by deep sequencing and assembly of total small RNAs (vdSAR) isolated from a host organism of interest.
`
`
As described in Example 2, alternative embodiments of the invention revealed mixed infection of Drosophila cell lines and adult mosquitoes by multiple RNA viruses, five of which were new. Analysis of small RNAs from mixed-infected Drosophila cells showed that infection by each of three distinct dsRNA viruses triggered production of viral siRNAs with features similar to siRNAs derived from +RNA viruses. Our study also revealed production and assembly of virus-derived piRNAs in Drosophila cells, suggesting a novel function of piRNAs in viral immunity. Thus, unique features of the invention’s vdSAR can discover new invertebrate and arthropod-borne animal and human viral pathogens.
`
As used herein and in the appended claims, the singular forms “a,” “and,” and “the” include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to “an siRNA” includes a plurality of such siRNAs and reference to “the virus” includes reference to one or more viruses, and so forth.
`
`Unless defined otherwise, all technical and scientific terms used herein have
`
`the same meaning as commonly understood to one of ordinary skill in the art to which
`
`this disclosure belongs. Although any methods and reagents similar or equivalent to
`
`those described herein can be used in the practice of the disclosed methods and
`
`compositions, the exemplary methods and materials are now described.
`
Also, the use of “or” means “and/or” unless stated otherwise. Similarly, “comprise,” “comprises,” “comprising,” “include,” “includes,” and “including” are interchangeable and not intended to be limiting.
`
`It is to be further understood that where descriptions of various embodiments
`
`use the term “comprising,” those skilled in the art would understand that in some
`
`specific instances, an embodiment can be alternatively described using language
`
`“consisting essentially of” or “consisting of.”
`
The disclosure of U.S. Patent No. 7,211,390, describing techniques associated with “deep sequencing,” is incorporated herein by reference.
`
In immunity, viral infection induces production of virus-derived small interfering RNAs (siRNAs), pi-RNAs and miRNAs that subsequently guide specific viral RNA clearance by the RNA interference (RNAi), pi-RNA and miRNA mechanism. In D. melanogaster, for example, siRNAs of 21 nucleotides long targeting several positive-strand (+) RNA viruses are produced by Dicer-2 from processing dsRNA replicative intermediates synthesized during viral RNA replication. Assisted by the dsRNA-binding protein R2D2, these viral siRNAs are then loaded in Argonaute-2 to direct viral RNA clearance (Galiana-Arnoux et al., 2006; Wang et al., 2006; Zambon et al., 2006). As a counter defense, viruses encode essential pathogenesis proteins that are viral suppressors of RNAi (VSRs) (Li and Ding, 2006; Mlotshwa et al., 2008). VSRs may inhibit either the production or activity of viral siRNAs by targeting the dsRNA precursors, siRNAs or Argonaute proteins. Several nuclear-replicating DNA viruses produce virus-derived microRNAs following infection of their mammalian host cells and many proteins encoded by mammalian RNA and DNA viruses exhibit VSR activity. However, the current consensus is that in vertebrates viral dsRNA triggers PKR and interferon responses instead of the RNAi response.
`
`The disclosure provides a method for virus discovery that is independent of
`
`either amplification or purification of viral particles. Many of the human diseases
`
`such as approximately half of all analyzed cases of human encephalitis and
`
`gastroenteritis, have no identified etiology. Thus, discovery of new viruses or the
`
`identification of the presence of viral infection can facilitate identification of human
`
`pathogenic viruses, improve our understanding of their transmission and provide
`
`diagnostic tools and targets for the development of anti-virals.
`
The disclosure is based, in part, on the understanding of the mechanism of RNAi-mediated, including pi-RNA-, miRNA- and siRNA-based, viral immunity. In this immunity, viral infection induces production of virus-derived small interfering RNAs (siRNAs), pi-RNAs and miRNAs, that subsequently guide specific viral RNA clearance by the RNA interference (RNAi) (pi-RNA-, miRNA- and siRNA-based) mechanism. In D. melanogaster, for example, siRNAs of 21 nucleotides long targeting several positive-strand (+) RNA viruses are produced by Dicer-2 from processing dsRNA replicative intermediates synthesized during viral RNA replication. Assisted by the dsRNA-binding protein R2D2, these viral siRNAs are then loaded in Argonaute-2 to direct viral RNA clearance (Galiana-Arnoux et al., 2006; Wang et al., 2006; Zambon et al., 2006). As a counter defense, viruses encode essential pathogenesis proteins that are viral suppressors of RNAi (VSRs) (Li and Ding, 2006; Mlotshwa et al., 2008). VSRs may inhibit either the production or activity of viral siRNAs, pi-RNAs and miRNAs by targeting the dsRNA precursors, siRNAs, pi-RNAs and miRNAs or Argonaute proteins. Several nuclear-replicating DNA viruses produce virus-derived microRNAs following infection of their mammalian host cells and many proteins encoded by mammalian RNA and DNA viruses exhibit VSR activity (Ding and Voinnet, 2007). However, the current consensus is that in vertebrates viral dsRNA triggers PKR and interferon responses instead of the RNAi (siRNAs, pi-RNAs and miRNAs) response.
`
Using next-generation sequencing technologies, the disclosure examines viral siRNAs, pi-RNAs and miRNAs produced in plants and fruit flies infected with positive-strand RNA viruses, which are closely related to human pathogenic viruses such as poliovirus and West Nile virus. The results show that viral siRNAs produced by the host immune system in response to viral infection are overlapping in sequence and can be assembled back into long continuous fragments (contigs) of the infecting viral RNA genome using assembly programs developed for short read genome sequencing. Unlike individual siRNAs, pi-RNAs and miRNAs, the contigs assembled from viral siRNAs, pi-RNAs and miRNAs can be translated into protein sequences in silico for homology searches to identify new viruses that may be only distantly related to known viruses.
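The in silico translation mentioned above can be sketched as a six-frame translation of an assembled contig, whose protein sequences could then be used as queries in a protein-level homology search. The sketch below is illustrative only: the function names are hypothetical, the standard genetic code is assumed, and incomplete or ambiguous codons are rendered as "X".

```python
from itertools import product

# Standard genetic code, built in TCAG codon order (an assumption of this sketch).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons from the start of seq; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(contig):
    """Translate a contig in three forward (+1..+3) and three reverse (-1..-3) frames."""
    contig = contig.upper().replace("U", "T")
    frames = {}
    for strand, seq in (("+", contig), ("-", reverse_complement(contig))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


frames = six_frame_translation("ATGGCCAAATAA")
```

Translating all six frames matters here because a contig assembled from small RNAs of both polarities carries no strand or reading-frame annotation.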
`
The disclosure demonstrates that deep sequencing by the next generation technologies and assembly of virus-derived siRNAs, pi-RNAs and miRNAs can be employed as a new approach for virus discovery and identification. Indeed, the examination of a recently sequenced small RNA library (Flynt et al., 2009) made from a Drosophila cell line found that the cell line is infected with at least five distinct RNA viruses. These include two known viruses and three new viruses belonging to different genera not previously described. Since virus infection of plants and invertebrates inevitably results in the production of virus-derived siRNAs, pi-RNAs and miRNAs, this invention does not depend on the ability to either amplify the virus or purify the viral particle to enrich viral nucleic acids, which is essential for the current technologies. Importantly, any viruses detected by the method of the disclosure are live and replication-competent because viral siRNAs, pi-RNAs and miRNAs are products of an active host immune response to viral infection.

The observation that individual viral siRNAs, pi-RNAs and miRNAs can be assembled back to longer genome fragments of the invading virus provides an exciting new method for virus discovery by deep sequencing and assembly of viral siRNAs, pi-RNAs and miRNAs. Unlike individual siRNAs, pi-RNAs and miRNAs, the contigs assembled from viral siRNAs, pi-RNAs and miRNAs can be translated into protein sequences in silico for homology searches to identify new viruses that may be distantly related to known viruses.
`
The disclosure provides a framework of the VDsiR comprising bioinformatics analysis and experimental verification. Small RNA assembly is a useful component of the system; the number of input sequences and the choice of assembly program both affect the output. In a pilot study (described herein), Velvet, which employs the principle of de Bruijn graphs to build up continuous sequence from short reads in a short run time (Zerbino et al., 2008), was found to be a useful program for the project.
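To make the de Bruijn graph principle concrete, here is a toy Python sketch that links overlapping (k-1)-mers and walks every unbranched path to emit a contig. It is not Velvet and omits what a real assembler adds (error correction, coverage cutoffs, handling of both strands, isolated cycles); the function names and the choice k = 5 are assumptions made for the example.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build successor and predecessor maps between (k-1)-mers, one edge per k-mer."""
    succ = defaultdict(set)
    pred = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            left, right = kmer[:-1], kmer[1:]
            succ[left].add(right)
            pred[right].add(left)
    return succ, pred


def assemble_unitigs(reads, k):
    """Return contigs spelled by maximal unbranched paths of the graph."""
    succ, pred = build_de_bruijn(reads, k)
    nodes = set(succ) | set(pred)

    def simple(node):
        # Exactly one edge in and one edge out: safe to walk straight through.
        return len(pred.get(node, ())) == 1 and len(succ.get(node, ())) == 1

    contigs = []
    for node in sorted(nodes):
        if simple(node):
            continue  # paths start only at path ends or branch points
        for nxt in sorted(succ.get(node, ())):
            path = node + nxt[-1]
            while simple(nxt):
                nxt = next(iter(succ[nxt]))
                path += nxt[-1]
            contigs.append(path)
    return contigs


# Three overlapping toy reads covering a 13-nt sequence.
reads = ["ATGGCGTA", "GCGTACCT", "TACCTGA"]
contigs = assemble_unitigs(reads, k=5)
```

With reads as short as 18-28 nt, k has to stay below the shortest read length, which is one reason the number of input sequences and the assembly settings influence the output.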
`
The disclosure thus provides, in one embodiment, a method comprising (i) obtaining nucleotide sequences from a small RNA library comprising a plurality of naturally occurring 18-28 nucleotide RNA fragments to obtain a sequenced small RNA library; (ii) assembling the sequences in the sequenced small RNA library into at least one contiguous sequence comprising a plurality of nucleotide RNA fragment sequences; optionally filling in gaps in a sequence by RT-PCR techniques; (iii) searching a database of viral sequences using the at least one contiguous sequence to identify a viral sequence having at least 50% to 100% homology to the contiguous sequence; (iv) identifying and annotating the phylogenetic analysis of the identified viral sequence with the contiguous sequence.

It will be understood that a sequence library may be provided by a third party or made available to a user in any number of ways (e.g., internet, computer readable medium and the like) and thus the process described above can be adapted to carry out the process and identify or annotate a virus accordingly. In some embodiments, however, the library may be a sample library comprising substantially purified RNA from an organism of interest. In such instances, deep sequencing techniques are carried out and a sequence library created. In yet another embodiment, a sample comprising substantially purified small RNA fragments from an organism of interest is provided, in which case sequencing the RNA fragments to obtain the small RNA library is performed.
`
`In yet another embodiment, if a gross RNA sample from an organism is
`
`provided or where increased homology searching is desired, the method may
`
`optionally include removing sequenced segments from the library that overlap with
`
`the genomic sequence of the organism of interest from which the RNA was derived.
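A minimal sketch of this host-subtraction step follows, assuming that "overlap" is approximated by an exact match to either strand of a host genome held in memory as a string. A real pipeline would more likely use a short-read aligner that tolerates mismatches; the function names here are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def remove_host_sequences(sequences, host_genome):
    """Drop sequences that occur exactly on either strand of the host genome."""
    host_plus = host_genome.upper()
    host_minus = reverse_complement(host_plus)
    kept = []
    for seq in sequences:
        if seq in host_plus or seq in host_minus:
            continue  # host-derived, so not a candidate viral sequence
        kept.append(seq)
    return kept


# Toy host genome and three sequences: one matching the plus strand,
# one matching only the minus strand, and one absent from the host.
host = "AAACCCGGGTTTACGTACGT"
candidates = ["CCCGGGTTT", "CGTAAACCC", "TTTTTTTTT"]
non_host = remove_host_sequences(candidates, host)
```

The same function can be applied either to individual reads or to assembled contigs, depending on where in the workflow the subtraction is performed.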
`
In yet another embodiment, the method further comprises completing a genomic sequence of a virus comprising the contiguous sequence using 5’-RACE and 3’-RACE.

For example, the disclosure demonstrates, in the specific embodiments and as proof of principle, that a method including the following steps resulted in the identification of 2 known viruses and 3 novel viruses from a D. melanogaster sample:

(i) construction of a small RNA library from cell culture or adult insects such as mosquitoes or fruit flies;

(ii) deep sequencing of the small RNA libraries with an Illumina 2G Analyser;

(iii) assembly of the sequenced small RNAs by Velvet, using either all of the sequenced small RNAs of 18-28 nucleotides in length or small RNAs of specific lengths such as 21-nt and 22-nt, which most likely represent the products of Drosophila Dicer-2 and Dicer-1, respectively, to generate a contig or contigs; contigs of virus-derived siRNAs may include features such as specific enrichment of 21- to 22-nt small RNAs, the presence of small RNAs of both polarities and a high density of siRNAs (number of siRNAs/length of contigs);

(iv) identification and removal of those assembled sequences mapped onto the host genome when the genome sequence is known, which reduces the number of candidates for the next steps;

(v) homology search of contigs against known viruses at both the nucleotide and protein levels;

(vi) optionally, RT-PCR and sequencing to fill the gaps between the contigs that show limited similarities with a known virus;

(vii) optionally, completing the full-length genomic sequence of the identified virus with 5’-RACE and 3’-RACE; and

(viii) annotation and phylogenetic analysis of the identified virus.
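The contig features named above (enrichment of 21- to 22-nt small RNAs, small RNAs of both polarities, and siRNA density defined as number of siRNAs divided by contig length) can be computed with a short sketch such as the one below. It assumes exact-match mapping of reads to a contig and uses hypothetical function and key names; no thresholds are implied, since the text does not state any.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def contig_features(contig, reads):
    """Summarize how small RNA reads map (by exact match) onto one contig."""
    antisense = reverse_complement(contig)
    plus = minus = in_21_22 = 0
    for read in reads:
        if read in contig:
            plus += 1
        elif read in antisense:
            minus += 1
        else:
            continue  # read does not map to this contig
        if len(read) in (21, 22):
            in_21_22 += 1
    mapped = plus + minus
    return {
        "mapped": mapped,
        "density": mapped / len(contig),  # number of siRNAs / length of contig
        "both_polarities": plus > 0 and minus > 0,
        "fraction_21_22": in_21_22 / mapped if mapped else 0.0,
    }


# Toy 40-nt contig with three mapping reads and one unrelated read.
contig = "ATGGCGTACCTGATTCAGGCTAACGTTGCATCGGATCCAA"
reads = [
    contig[0:21],                       # 21 nt, sense strand
    reverse_complement(contig[10:32]),  # 22 nt, antisense strand
    contig[5:23],                       # 18 nt, sense strand
    "T" * 21,                           # does not map
]
features = contig_features(contig, reads)
```

Ranking candidate contigs by these values is one possible way to prioritize them before the homology search; that ranking is a design choice of this example rather than a step stated in the text.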
`
As used herein, a sample is any sample that can contain a virus. Thus, the sample can be obtained from the environment or from a specific organism (including plants, insects and mammals). An environmental sample can be obtained from any number of sources (as described above), including, for example, insect feces, hot springs, soil and the like. Any source of nucleic acids in purified or non-purified form can be utilized as starting material. Thus, the nucleic acids may be obtained from any source which is contaminated by an infectious organism (e.g., a virus). The sample can be an extract from any bodily sample such as blood, urine, spinal fluid, tissue, vaginal swab, stool, amniotic fluid or buccal mouthwash from any mammalian organism. For non-mammalian (e.g., invertebrate) organisms the sample can be a tissue sample, salivary sample, fecal material or material in the digestive tract of the organism. For example, in horticulture and agricultural testing the sample can be a plant, soil, liquid or other horticultural or agricultural product; in food testing the sample can be fresh food or processed food (for example infant formula, seafood, fresh produce and packaged food)
