PROVISIONAL PATENT APPLICATION

METHODS AND COMPOSITIONS FOR NUCLEIC ACID ANALYSIS

WSGR Docket No. 38938-719.101

Inventor(s):

Serge SAXONOV,
Citizen of USA, Residing at
10 De Anza Court
San Mateo, CA 94402

Wilson Sonsini Goodrich & Rosati
PROFESSIONAL CORPORATION

650 Page Mill Road
Palo Alto, CA 94304
(650) 493-9300 (Main)
(650) 493-6811 (Facsimile)

Filed Electronically on: April 25, 2011

4338427_1.DOCX

METHODS AND COMPOSITIONS FOR NUCLEIC ACID ANALYSIS

BACKGROUND OF THE INVENTION

[0001] There is a need for means of measuring collocated species in plasma; of measuring fetal load and multiplexing on the same channel; of multiplexing to align the dynamic range of targets whose concentrations are very different and to smooth out biological variations of reference genes; and of sample partitioning and barcode tagging for sequencing.

SUMMARY OF THE INVENTION

[0002] In general, in one aspect, a method is provided comprising partitioning one or more nucleic acids into isolated partitions, adding a unique barcode to the nucleic acids in each partition, pooling the nucleic acids from the partitions, analyzing the nucleic acids, and determining which nucleic acids were in the same partition.

INCORPORATION BY REFERENCE

[0003] All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference.

BRIEF DESCRIPTION OF THE DRAWINGS

[0004] The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings of which:

DETAILED DESCRIPTION OF THE INVENTION

[0005] In general, described herein are methods, compositions, and kits for library preparation for sequencing comprising partitioning a given sample and furnishing those partitions with their own sets of barcode adaptors. Library preparation can be performed in separate partitions. The contents of the partitions can subsequently be sequenced, and the barcodes can be used to identify which reads came from the same partition.

[0006] A partition can be any mode of separation that can be used for digital PCR, e.g., droplets, microfluidic channels, or wells. Barcode adaptors can be bundled within droplets.

[0007] Currently, barcoding is used to pool samples in order to reduce the cost of sequencing per sample. Thus separate library preps are done for each sample, each with its own barcodes. The libraries are then pooled and run through a sequencer. Every read of the resulting dataset can then be traced back to the original sample via the barcode. Our approach is analogous, but instead of tagging for the purpose of resolving which sample produced a given read, we propose to use barcoding to group reads according to their partition. Given a large set of barcodes, this enables a number of breakthrough applications. Many of these applications will become increasingly relevant as the throughput of nextgen sequencers increases.

[0008] Barcode tagging can be accomplished by merging adapter-filled droplets (AFDs) with sample-containing partitions (SCPs), which themselves can be droplets. Adapter-filled droplets can be made smaller than sample-partitioning droplets, and SCPs can be formed so that they contain AFDs. In one implementation, we pre-make a large batch of AFDs and emulsify the sample so that the sample-partitioning droplets end up containing AFDs. Through a temperature adjustment, the AFDs can then be burst to release the reaction components necessary for library prep.

[0009] Alternatively, one could form larger adapter-filled droplets and have them encompass sample-containing droplets.

[0010] One can also construct a microfluidic device that merges a large set of pre-made adapter reagents with sample partitions such that every sample partition ends up with its own reagents. For example, if we have a square-shaped device with 1,000 × 1,000 = 1 million partitions, and our chemistry allows tagging each read with two barcodes, we can construct one million unique identifiers with a modest number (2,000) of different barcodes. We load reagents with 1,000 different barcodes in the horizontal channels and reagents for another set of 1,000 different barcodes in the vertical channels. Every one of the million partitions ends up with its own unique combination of barcodes.

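The row-and-column scheme of [0010] can be sketched in a few lines. This is a minimal illustration, assuming the partition at grid position (row, col) receives horizontal barcode `row` and vertical barcode `n_rows + col`; the integer indices and the helper names `barcode_pair` and `decode_partition` are hypothetical stand-ins for real adapter sequences and software.

```python
def barcode_pair(row: int, col: int, n_rows: int) -> tuple[int, int]:
    """Barcode pair delivered to the partition at grid position (row, col).

    Horizontal channel `row` carries barcode index `row`; vertical channel `col`
    carries barcode index `n_rows + col`, so the two barcode sets never overlap.
    """
    return row, n_rows + col


def decode_partition(pair: tuple[int, int], n_rows: int) -> tuple[int, int]:
    """Recover the grid position of the partition a read came from."""
    horizontal, vertical = pair
    return horizontal, vertical - n_rows


n_rows = n_cols = 1000
print(n_rows + n_cols)  # 2000 barcode reagents in total
print(n_rows * n_cols)  # 1000000 partitions, each with its own barcode pair
print(decode_partition(barcode_pair(12, 345, n_rows), n_rows))  # (12, 345)
```

A read is assigned to a partition by the pair of barcodes it carries, not by either barcode alone, so the number of distinct labels grows as the product of the two barcode sets while the number of reagents grows only as their sum.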
[0011] One can also merge the two types of droplets (SCPs and AFDs) in a controlled manner: one droplet of sample with one droplet of adapters.

[0012] If we use droplets for tagging, we can pre-make a large set of droplets of N types, each type loaded with its own barcode. N is partially determined by the length of the barcode (L). In principle, N can be as large as 4^L, so for L = 10 we can generate up to about 1 million (4^10 = 1,048,576) different droplet types.

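The bound N ≤ 4^L simply counts every DNA string of length L, since each position can be A, C, G, or T. A minimal sketch, assuming no design constraints on the barcodes; practical barcode sets are usually filtered further (for example, for a minimum Hamming distance between codes), which would bring the usable N below this bound. The function names are hypothetical.

```python
from itertools import islice, product


def max_barcode_types(length: int) -> int:
    """Upper bound on distinct barcodes of the given length (4 bases per position)."""
    return 4 ** length


def enumerate_barcodes(length: int):
    """Lazily yield every DNA string of the given length, in lexicographic order."""
    for bases in product("ACGT", repeat=length):
        yield "".join(bases)


print(max_barcode_types(10))                    # 1048576, i.e. about 1 million
print(list(islice(enumerate_barcodes(10), 3)))  # ['AAAAAAAAAA', 'AAAAAAAAAC', 'AAAAAAAAAG']
```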
[0013] These adapter-filled droplets (AFDs) are randomly merged with sample-containing partitions (SCPs). One can then perform standard sequencing library preparation within each partition. Once the libraries are prepped, the contents of all the partitions are merged (e.g., by breaking the droplets) and loaded onto a sequencer. The sequencer generates reads for many of the library molecules. Molecules that were prepared within the same droplet would contain the same barcode. If the number of barcodes is sufficiently large, one can surmise that molecules containing the same barcode came from the same partition.

[0014] That is, if N is sufficiently large (i.e., much larger than the number of AFDs actually used in the experiment), we would expect that any two SCPs will be tagged by different AFD types. If N is not very large, we may find that distinct SCPs are tagged with the same adaptors. In that case we can probabilistically estimate the likelihood that any two reads came from the same or different SCPs. For many applications, a probabilistic assessment would be sufficient.

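One simple way to make the probabilistic assessment concrete, assumed here rather than taken from the text, is to treat each tagged SCP as receiving a barcode type drawn uniformly at random from N types. Two given SCPs then share a type with probability 1/N, the expected number of sharing pairs among M tagged SCPs is M(M - 1)/(2N) (the birthday problem), and a given SCP's type is unique with probability (1 - 1/N)^(M - 1). The function names are hypothetical.

```python
import math


def expected_shared_pairs(n_tagged: int, n_types: int) -> float:
    """Expected number of SCP pairs that drew the same barcode type."""
    return math.comb(n_tagged, 2) / n_types


def prob_uniquely_tagged(n_tagged: int, n_types: int) -> float:
    """Probability that a given SCP's barcode type is used by no other SCP."""
    return (1.0 - 1.0 / n_types) ** (n_tagged - 1)


# Example: 100,000 tagged SCPs and 10-base barcodes (4**10 types).
n_types = 4 ** 10
print(expected_shared_pairs(100_000, n_types))
print(prob_uniquely_tagged(100_000, n_types))
```

The expected number of sharing pairs grows with the square of M, so avoiding any sharing at all requires N far larger than M, whereas keeping most SCPs uniquely tagged only needs N comfortably above M.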
[0015] Applications
[0016] Long reads and Phasing

[0017] Short-read sequencers, such as those made by Illumina and ABI, suffer from being unable to provide phasing information. These sequencers can produce reads of 100-200 bp and as short as 30 bp. 454 can do a slightly better job because its reads can get up to 400 bp, but even that is generally far from sufficient to yield phasing information. PacBio and some other technologies promise much longer reads, but even with 1,000 bp reads, much of the phasing information will be lacking.

[0018] All of the existing nextgen sequencing platforms entail a library preparation step, where genomic DNA is appropriately fragmented and potentially sized, then ligated with a common set of primers. This common set of primers is used in the sequencing step for massive clonal amplification, either in solution or on a solid support. These clones can then be sequenced because the presence of a massive amount of identical sequence in a tightly confined space allows for the amplification of the fluorescent (or other) signal emitted by the sequencing reaction.

[0019] It is now becoming common practice to use tag sequences appended to the primers, so that a common barcode is ligated to every sequence from a particular sample. Libraries from different samples can then be mixed and sequenced in a single run. Since every read would contain a barcode, it should then be straightforward to infer which sample produced any given read. This is known as sample multiplexing and allows for much more cost-effective pricing per sample for many sequencing applications. Note that in this case a part of every read is consumed by the barcode. This has become less and less of an issue as reads get longer (100-200 bp), since in principle one can tag one million samples with 10 bp tags.

[0020] Short reads make it challenging to sequence large genomes de novo. Short reads are also incapable of delivering phasing information for all but a very small fraction of polymorphisms. Our partition-barcoding scheme can be used to effectively reconstruct much longer reads, help with long-range assembly, and supply phasing information while making use of existing sequencing approaches.

[0021] The key is to start with high-molecular-weight DNA and partition the sample so that a given partition is very unlikely to contain two fragments from the same locus but different chromosomes. Library prep is then performed within droplets as described above.

[0022] The core concept is that reads that map close to each other in the genome and are found in the same droplet are very likely linked to each other and thus reside on the same chromosome. In this fashion, individual short reads can be strung together into longer sequence fragments.

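A minimal sketch of this idea groups aligned reads by partition barcode and merges reads with the same barcode that lie within a maximum gap of one another into a single linked block, treated as one long input molecule. It assumes reads are already aligned and reduced to hypothetical `Read(barcode, chrom, pos)` records; the 10 kb `max_gap` is an illustrative value, not one given in the text.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    barcode: str
    chrom: str
    pos: int


def linked_blocks(reads: list[Read], max_gap: int = 10_000) -> list[tuple[str, str, int, int, int]]:
    """Return (barcode, chrom, start, end, n_reads) for each cluster of nearby reads.

    Reads that share a barcode and lie within max_gap of the previous read on the
    same chromosome are assumed to come from one long input molecule.
    """
    by_key: dict[tuple[str, str], list[int]] = defaultdict(list)
    for read in reads:
        by_key[(read.barcode, read.chrom)].append(read.pos)

    blocks = []
    for (barcode, chrom), positions in sorted(by_key.items()):
        positions.sort()
        start = prev = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - prev > max_gap:  # gap too large: close the current block
                blocks.append((barcode, chrom, start, prev, count))
                start, count = pos, 0
            prev = pos
            count += 1
        blocks.append((barcode, chrom, start, prev, count))
    return blocks


reads = [
    Read("BC1", "chr1", 1_000), Read("BC1", "chr1", 4_500), Read("BC1", "chr1", 9_800),
    Read("BC1", "chr1", 250_000),
    Read("BC2", "chr1", 2_000),
]
for block in linked_blocks(reads):
    print(block)
```

Here the BC1 read at position 250,000 is far from the other BC1 reads, so it starts a new block; heterozygous SNPs covered by reads within one block would be the ones assigned to the same chromosome.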
[0023] Library prep within droplets would entail fragmentation and ligation of adaptors. Fragmentation can be accomplished enzymatically using an endonuclease, followed by a ligation step. Alternatively, a transposon-based approach such as Nextera's can be used; in that approach, DNA is fragmented and adapted in a single-step reaction. One can follow adapter ligation with PCR amplification of the ligated products to increase their concentrations.

[0024] One can also perform an MDA (multiple displacement amplification) step within the droplet prior to fragmentation and adapter ligation, to amplify the amount of DNA in each droplet in order to cover more of the captured fragments.

[0025] Example:

[0026] Let's say we load 1,000 genome equivalents (GE), about 6 ng of DNA, to be sequenced at 100x depth. This implies that there are 2,000 copies of every (normal copy number) target. For every locus, we want to make sure that a large majority of fragments end up in their own partitions and that most of the 2,000 fragments are tagged with a unique barcode.

[0027] The first requirement is accomplished by increasing the number of partitions. With 100,000 partitions, we expect that only about 0.5% of fragments at a particular locus from different chromosomes would end up in the same partition. Note that many such cases will be readily identified by the appearance of distinct alleles from heterozygous SNPs with the same barcode, as well as by increased coverage of the locus by a barcode.

[0028] To ensure that most fragments are tagged with distinct barcodes, we need a large number of different barcodes and an approach that distributes barcodes so that any given partition is furnished with a small number (preferably one) of barcode-containing droplets. The distribution can be random, so that some partitions receive zero droplets, some one, and some multiple. Thus, for 100,000 partitions we can supply 100,000 barcoding droplets (an average of one per partition). In that case, about 37% of the partitions will receive no adapters and will thus be unavailable for sequencing; the number of barcoding droplets can be increased if sample preservation is of paramount importance. About 37% of the partitions will be barcoded with a single barcode, and about 26% will be coded with multiple, potentially different barcodes. In the case above, about 740 fragments will be unavailable for sequencing, about 740 will be sequestered with their own barcodes, and about 530 will be sequestered with multiple barcodes. Ideally, all of the roughly 740 × 1 + 370 × 2 + ... ≈ 2,000 barcodes in the partitions associated with a particular locus would be unique. If we have 10,000 different barcode types, then more than 80% of the barcoded fragments would be uniquely tagged.

[0029] If the number of genome equivalents were lower, we would need fewer partitions and barcodes.

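The percentages in [0028] follow from Poisson loading. A minimal sketch, assuming barcode droplets land in partitions independently (so the count per partition is Poisson with mean λ = droplets/partitions), that each of the 2,000 fragments at a locus occupies its own partition, and that barcode types are drawn uniformly from the available set; `barcode_loading` is a hypothetical helper.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def barcode_loading(n_partitions: int, n_barcode_droplets: int,
                    n_fragments: int, n_barcode_types: int) -> dict[str, float]:
    lam = n_barcode_droplets / n_partitions      # mean barcode droplets per partition
    p0, p1 = poisson_pmf(0, lam), poisson_pmf(1, lam)
    p_multi = 1.0 - p0 - p1
    barcodes_at_locus = n_fragments * lam         # barcodes in this locus's partitions
    # Chance that one of those barcodes shares its type with none of the others.
    p_unique = (1.0 - 1.0 / n_barcode_types) ** (barcodes_at_locus - 1)
    return {
        "frac_partitions_no_barcode": p0,
        "frac_partitions_one_barcode": p1,
        "frac_partitions_multi_barcode": p_multi,
        "fragments_untagged": n_fragments * p0,
        "fragments_single": n_fragments * p1,
        "fragments_multi": n_fragments * p_multi,
        "p_barcode_type_unique": p_unique,
    }


stats = barcode_loading(100_000, 100_000, 2_000, 10_000)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

With λ = 1 this gives about 36.8%, 36.8%, and 26.4% of partitions with zero, one, and several barcodes (roughly 736, 736, and 528 of the 2,000 fragments), and about an 82% chance that a given barcode at the locus has a type that no other barcode there shares.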
[0030] Note that perfection is not necessary for this application, because we only need to capture a small subset of SNPs from any given genomic location to yield phasing information. It is acceptable if a substantial fraction of fragments is not informative.

[0031] One can attain greater efficiency of sample processing if each partition is supplied with a barcode in a controlled manner, as could be done with RainDance-like merging of droplets or via a microfluidic circuit similar to Fluidigm's array designs. That is, if we can guarantee that a given partition receives precisely one AFD, we can make do with fewer AFDs and fewer AFD types.

[0032] A microfluidic chip can be used in an analogous manner for partitioning. Sample partitions can be supplied with their own barcodes via a two-dimensional arrangement of channels as described above. A very large number of unique barcode combinations can be readily supplied by combining vertical and horizontal barcodes.

[0033] Single cell transcriptome sequencing

[0034] Single cells can be captured within separate droplets. These can be lysed and reverse transcribed with partition-specific barcoded primers (the appropriate reagents can be sequestered in their own inner droplets, to be burst by heating when appropriate). Alternatively, a generic RT reaction can be followed by library prep, which would incorporate unique barcodes.

[0035] Calculations for the number of droplets and barcodes are similar to those covered above for phasing. For 2,000 cells, we need sufficient partitioning to capture them in separate droplets; 20,000 droplets would be plenty. We then need to ensure that every one of the partitions with cells receives a unique barcode, which can be accomplished reasonably well with 10,000 barcoding types.

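The single-cell numbers can be checked the same way. A minimal sketch, assuming cells fall into droplets uniformly at random and that each cell-containing droplet receives exactly one barcode whose type is drawn uniformly from the available types; `single_cell_loading` is a hypothetical helper.

```python
import math


def single_cell_loading(n_cells: int, n_droplets: int, n_barcode_types: int) -> dict[str, float]:
    lam = n_cells / n_droplets                          # mean cells per droplet
    occupied = n_droplets * (1.0 - math.exp(-lam))      # expected droplets holding >= 1 cell
    # Chance that a given cell shares its droplet with at least one other cell.
    p_shares_droplet = 1.0 - (1.0 - 1.0 / n_droplets) ** (n_cells - 1)
    # Chance that an occupied droplet's barcode type is used by no other occupied droplet.
    p_unique_barcode = (1.0 - 1.0 / n_barcode_types) ** (occupied - 1)
    return {
        "occupied_droplets": occupied,
        "p_cell_shares_droplet": p_shares_droplet,
        "p_unique_barcode": p_unique_barcode,
    }


print(single_cell_loading(n_cells=2_000, n_droplets=20_000, n_barcode_types=10_000))
```

Under these assumptions about 9.5% of cells share a droplet with another cell, and about 83% of cell-containing droplets carry a barcode type that no other such droplet has; more droplets or more barcode types move both numbers in the favorable direction.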
[0036] After partitioning, lysing, barcoding, and sequencing, the read data can be analyzed to determine which transcripts came from the same cell. In this way, the massive capacity of nextgen sequencing can be applied to large collections of cells while preserving single-cell resolution.

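Once reads carry partition barcodes, determining which transcripts came from the same cell reduces to grouping reads by barcode. A minimal sketch, assuming each read has already been aligned and reduced to a hypothetical (cell_barcode, gene) pair; barcode error correction and the handling of empty or multi-cell droplets are not shown.

```python
from collections import Counter, defaultdict


def cell_by_gene_counts(tagged_reads):
    """Group (cell_barcode, gene) pairs into {cell_barcode: Counter({gene: reads})}."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for barcode, gene in tagged_reads:
        counts[barcode][gene] += 1
    return dict(counts)


reads = [("AAC", "GAPDH"), ("AAC", "ACTB"), ("AAC", "GAPDH"), ("GTT", "CD3E"), ("TTG", "ACTB")]
matrix = cell_by_gene_counts(reads)
print(matrix["AAC"])   # Counter({'GAPDH': 2, 'ACTB': 1})
print(sorted(matrix))  # ['AAC', 'GTT', 'TTG']
```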
[0037] Single cell genomic sequencing

[0038] Similar to the idea of single cell transcriptome sequencing, one can capture individual cells in separate partitions and sequence their genomes while preserving single-cell resolution.

[0039] Depending on the library prep chemistry used, the sequence coverage per cell is likely to be shallow (very few reads per locus), but can nevertheless be useful for discerning large CNVs at single-cell resolution.

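Pooling shallow reads into large genomic bins is one common way to see large CNVs despite very few reads per locus. A minimal sketch, assuming the reads of one cell barcode have already been counted into equal-sized bins, that most of the cell's genome is diploid, and that GC and mappability biases are ignored; the median normalization and the name `copy_number_profile` are illustrative, not taken from the text.

```python
from statistics import median


def copy_number_profile(bin_counts: list[int], baseline_ploidy: int = 2) -> list[int]:
    """Estimate an integer copy number per bin from shallow read counts.

    Counts are scaled so that the median bin equals baseline_ploidy, then rounded.
    """
    med = median(bin_counts)
    if med == 0:
        raise ValueError("median bin count is zero; use larger bins")
    return [round(baseline_ploidy * c / med) for c in bin_counts]


counts = [10, 11, 9, 10, 10, 16, 15, 14, 10, 5, 4]
print(copy_number_profile(counts))  # [2, 2, 2, 2, 2, 3, 3, 3, 2, 1, 1]
```

Larger bins trade resolution for less counting noise, which is why this approach suits large CNVs rather than small ones.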
[0040] One can also perform MDA within the droplet on the cell's genome prior to fragmentation and adapter ligation. This would provide more genetic material from the cell to sequence, at the cost of introducing some bias and potentially losing CNV information.

[0041] Single cell methylome sequencing

[0042] Idea: partition cells, expose them to methyl-sensitive enzymes to digest away methylated sites, and sequence what is left.

[0043] Exosome sequencing

[0044] This is the same basic concept as single cell transcriptome sequencing. Exosomes are small extracellular vesicles that contain RNA. Sequencing that RNA while preserving the information about which transcripts derive from the same exosome is currently impossible, but is likely to be interesting and valuable.

[0045] Metagenomics sequencing

[0046] Analogous to what is described above for cells, we can capture viruses and bacteria and sequence their genomes and transcriptomes. Due to the large number and variety of microorganisms, it is often very challenging to sequence them de novo with short reads. Stringing the reads together via partitioning would allow for high-throughput sequencing of many species whose genomes are currently effectively inaccessible.

[0047] Additional variation on the idea

[0048] One can construct a microfluidic device that partitions the sample so that every cell ends up with its own set of barcodes. The contents of each chamber are then processed separately, diluted and further partitioned (perhaps now through an emulsion), in order to enable whole genome or transcriptome amplification separately for each cell. The idea is that WGA and other amplification schemes benefit from partitioning, which reduces competition between different parts of the genome or transcriptome.

[0049] We can make large slugs to capture individual cells and supply them with their own barcodes. We can then break those slugs into many (thousands or more) much smaller droplets in order to perform unbiased whole genome/transcriptome amplification in droplets. (WGA can work much better in droplets than in bulk.) Note that the droplets from all the slugs can be mixed together because they are already furnished with the appropriate adaptors, so we can determine from the sequencing information which reads came from which slug.

[0050] Another application:

[0051] We pre-make batches of antibodies linked to beads coated with short DNA fragments. Each antibody would be associated with its own unique sequence. (The antibodies could also be linked to droplets containing DNA fragments, which can be burst as appropriate. This is actually what I thought of first, but Ben suggested beads, which made more sense.) Cells can be pre-coated with these antibodies, then captured in larger droplets along with droplet/cell-specific barcode adaptors. We can then go through library prep as described above, sequence the contents of all the droplets, and infer which reads came from which cell by the barcodes. Now, in addition to sequencing the cells' genomes or transcriptomes, we can also get information about their proteins. Some of the same information can be captured via FACS, but here we can have an essentially arbitrary amount of multiplexing. Also, this could be a relatively straightforward way of capturing cell-specific nucleic acid information along with the protein data for many cells in a single shot.

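Reading out the protein data then reduces to recognizing antibody tag sequences among the reads of each cell barcode. A minimal sketch, assuming a hypothetical lookup table from tag sequence to antibody target and reads already split into (cell_barcode, insert_sequence) pairs; exact matching is used for simplicity, and the tag sequences and protein names are made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical antibody tag table: DNA tag sequence -> protein the antibody binds.
ANTIBODY_TAGS = {
    "ACGTACGT": "CD4",
    "TTGCAAGC": "CD8",
    "GGATCCTA": "CD19",
}


def protein_counts(reads):
    """Return {cell_barcode: Counter({protein: tag_count})}; non-tag reads are skipped."""
    per_cell: dict[str, Counter] = defaultdict(Counter)
    for cell_barcode, sequence in reads:
        protein = ANTIBODY_TAGS.get(sequence)
        if protein is not None:  # genomic or transcript reads fall through here
            per_cell[cell_barcode][protein] += 1
    return dict(per_cell)


reads = [
    ("CELL1", "ACGTACGT"), ("CELL1", "ACGTACGT"), ("CELL1", "GATTACAGATTACA"),
    ("CELL2", "TTGCAAGC"), ("CELL2", "GGATCCTA"),
]
print(protein_counts(reads))
```

The non-tag read from CELL1 would be routed to the genome or transcriptome analysis instead, so both kinds of information stay linked through the same cell barcode.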
[0052] Multiplexing to align the dynamic range of targets whose concentrations are very different and to smooth out biological variation of reference genes

[0053] We estimate CNVs by measuring the concentrations of a target gene and a reference gene in a single reaction, using one fluorescent dye for the target and another for the reference.

[0054] For high copy number, the concentration of the target will be higher than the concentration of the reference. In that case it may be challenging to measure both in a single digital PCR reaction due to the limited dynamic range of digital PCR.

[0055] We can address this problem by choosing several different targets for the reference, multiplexed together and all using the same fluorescent dye. This would boost the counts of the reference and bring them closer to those of the target.

[0056] Depending on the number of targets to be multiplexed, we may need to use universal probes, LNA probes, or ligation approaches.

[0057] The same idea applies if one is trying to measure several gene expression targets in a single reaction. We can design several assays to target the lowest-expressed gene and bring the measured counts closer to those of the higher-expressed gene(s).

[0058] For expression, we may need to do a restriction digest on the cDNA in order to make sure that the different targets measured in a given gene end up in different droplets.

[0059] This scheme may also apply to measuring viral load levels in a single reaction.

[0060] This kind of multiplexing for CNVs may also be useful for evening out biological variation where the reference may vary in copy number from individual to individual. By averaging across multiple targets, we reduce the impact of the variation. This becomes of particular interest for diagnostic tests, in particular those measuring copy number alterations.

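The effect of pooling reference assays on one dye can be sketched with the usual digital PCR Poisson correction, where a channel's mean copies per droplet is -ln(1 - positive fraction). This sketch assumes the k reference targets sit on separate fragments (so their concentrations add within the channel), that each reference is present at two copies per genome, and that target and reference are read in their own channels; the droplet counts are made-up illustrative numbers and the helper names are hypothetical.

```python
import math


def copies_per_droplet(positive: int, total: int) -> float:
    """Poisson-corrected mean copies per droplet for one fluorescence channel."""
    if positive >= total:
        raise ValueError("channel is saturated; dilute the sample")
    return -math.log(1.0 - positive / total)


def copy_number(target_pos: int, ref_pos: int, total: int, n_ref_assays: int,
                ref_copies_per_genome: int = 2) -> float:
    """Target copy number relative to k reference assays pooled on one dye."""
    lam_target = copies_per_droplet(target_pos, total)
    # The pooled reference signal is the sum of k assays; divide to get one reference's level.
    lam_ref_single = copies_per_droplet(ref_pos, total) / n_ref_assays
    return ref_copies_per_genome * lam_target / lam_ref_single


total = 20_000
print(round(copy_number(target_pos=9_000, ref_pos=7_500, total=total, n_ref_assays=4), 2))
```

In this made-up example the target channel has 45% positive droplets and the pooled four-assay reference 37.5%, giving an estimate of about 10 copies; a single reference assay at the same input would light up only about 11% of droplets, so pooling keeps the two channels in a more similar part of the dynamic range. Because the pooled reference behaves like an average over k loci, a copy-number change at one reference locus shifts the estimate roughly 1/k as much as it would with a single reference.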
[0061] Measuring fetal load and multiplexing on the same channel

[0062] When measuring fetal load, we measure markers of fetal-specific DNA (such as Y chromosome markers, paternal SNPs, or methyl-digested sequence) as well as markers of total DNA from cell-free plasma.

[0063] Plasma contains very little DNA, and we are limited in the amount of blood we can draw. Therefore it is desirable to use as little DNA as possible to achieve a satisfactory measurement of fetal load.

[0064] One can reduce the amount of plasma used (or, equivalently, increase precision while consuming the same amount of sample) by multiplexing several markers of the same type on the same fluorescence channel.

[0065] For example, we can simultaneously measure N markers of total DNA on genes of known stable copy number on the VIC channel, and several markers on the Y chromosome on the FAM channel. We do not need to know the concentration of each individual marker, just their combined total, which will come directly from the appropriate channel.

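Because only the combined total per channel is needed, the readout can be turned into a fetal fraction with the standard digital PCR Poisson correction. A minimal sketch under assumptions that go beyond the text: a male fetus, Y-chromosome markers present once per fetal genome and absent from maternal DNA, total-DNA markers present twice per genome, and every marker on a separate fragment so that concentrations add within a channel. The helper names and droplet counts are hypothetical.

```python
import math


def copies_per_droplet(positive: int, total: int) -> float:
    """Poisson-corrected mean copies per droplet for one fluorescence channel."""
    return -math.log(1.0 - positive / total)


def fetal_fraction(fam_pos: int, vic_pos: int, total: int,
                   n_y_markers: int, n_total_markers: int) -> float:
    """Fetal fraction from pooled Y-chromosome (FAM) and total-DNA (VIC) markers."""
    # Male fetus assumed: each Y marker is present once per fetal genome.
    fetal_genomes = copies_per_droplet(fam_pos, total) / n_y_markers
    # Total-DNA markers assumed present twice per (maternal or fetal) genome.
    total_genomes = copies_per_droplet(vic_pos, total) / (2 * n_total_markers)
    return fetal_genomes / total_genomes


# Made-up counts: 20,000 droplets, 3 Y markers on FAM, 4 total-DNA markers on VIC.
ff = fetal_fraction(fam_pos=298, vic_pos=6_594, total=20_000, n_y_markers=3, n_total_markers=4)
print(f"{ff:.1%}")
```

Adding markers to a channel raises its positive-droplet count for the same plasma input, which is where the precision gain described above comes from.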
[0066] Measuring collocated species in plasma

[0067] Our technology allows us to measure when two species of nucleic acids are in physical proximity to each other. We construct assays with different fluorophores corresponding to different targets. When there is an excess of droplets with two (or more) fluorophores, relative to what random co-partitioning of independent molecules would produce, we infer that the two species are spatially linked.

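One way to quantify such an excess is to compare the observed double-positive count with what independent, randomly co-partitioned species would give, namely n·p_A·p_B for n droplets with single-channel positive fractions p_A and p_B. A minimal sketch with made-up counts; `collocation_excess` is a hypothetical helper, and the observed/expected ratio is an illustrative readout rather than a formal statistical test.

```python
def collocation_excess(n_droplets: int, a_pos: int, b_pos: int, both_pos: int) -> tuple[float, float]:
    """Return (expected double positives if A and B are independent, observed/expected).

    a_pos and b_pos count all droplets positive in each channel, including
    the both_pos droplets that are positive in both.
    """
    p_a = a_pos / n_droplets
    p_b = b_pos / n_droplets
    expected_both = n_droplets * p_a * p_b
    return expected_both, both_pos / expected_both


expected, ratio = collocation_excess(n_droplets=20_000, a_pos=2_000, b_pos=1_000, both_pos=400)
print(expected, ratio)
```

A ratio near 1 is what unlinked molecules would give; here 400 observed double positives against about 100 expected would point to linkage. With three fluorophores, the same comparison extends to triple-positive droplets against n·p_A·p_B·p_C.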
[0068] microRNAs and other RNAs are believed to be packaged in exosomes in blood. It is also possible that some of them are packaged together in protein complexes.

[0069] It is of interest to be able to tell which transcripts are co-localized, whether in exosomes or in protein complexes.

[0070] The idea is that we would partition plasma, or some derivative of plasma processed in a way that preserves the exosomes or protein complexes of interest.

[0071] We then break the exosomes, digest proteins, and run the appropriate PCR reactions within the partitions in a standard ddPCR fashion. Bursting of the exosomes can be accomplished through a temperature adjustment or by releasing an inner emulsion that carries an exosome- or protein-complex-breaking agent.

[0072] Cell-free DNA molecules may also travel similarly aggregated, and it is of interest to determine when the molecules are collocated.

[0073] The same logic applies to proteins. In principle, we can detect proteins through proximity ligation approaches.

[0074] We can therefore detect when particular RNA, DNA, and protein targets travel together in plasma. Additional fluorescence channels allow collocation measurement of more targets simultaneously. For example, with three fluorophores we can measure collocation frequency across three targets.

[0075] While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.