WSGR Docket No. 38938-713.102
`
`PROVISIONAL PATENT APPLICATION
`
`COPY NUMBER ESTIMATION
`
`Inventor(s):
`
`Mike LUCERO,
`Citizen of USA, Residing at
`634 Pine Terrace
`South San Francisco, CA 94080
`
`Serge SAXONOV,
`Citizen of USA, Residing at
`10 De Anza Ct,
`San Mateo, CA 94402
`
`Ben HINDSON,
`Citizen of Australia, Residing at
`1039 Bannock Street
`Livermore, CA 94551
`
`Kevin NESS,
`Citizen of Canada, Residing at
`24 Baytree Way Apt #10
`San Mateo, CA 94402
`
`Phil BELGRADER,
`Citizen of USA, Residing at
`89 Robinson Landing Rd.
`Severna Park, MD 21146
`
Wilson Sonsini Goodrich & Rosati
`PROFESSIONAL CORPORATION
`
`650 Page Mill Road
`Palo Alto, CA 94304
`(650) 493-9300 (Main)
`(650) 493-6811 (Facsimile)
`
`Filed Electronically on: February 18, 2011
`
`COPY NUMBER ESTIMATION
`
`CROSS-REFERENCE
`
[0001] This application is related to co-pending U.S. Provisional Patent Application No. 61/443,156, filed February 15, 2011, which is incorporated herein by reference in its entirety.
`
BACKGROUND OF THE INVENTION

[0002] There is a need for improved methods for copy number estimation of nucleic acids.

SUMMARY OF THE INVENTION

[0003] Described herein are methods for estimating the copy number of nucleic acids.
`
`INCORPORATION BY REFERENCE
`
[0004] All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
[0005] The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings of which:
`
`[0006] Figure 1 illustrates FAM and VIC separated by 1K, 10K, or 100K bases.
`
`DETAILED DESCRIPTION OF THE INVENTION
`
[0007] A practical method for high-resolution CNV estimation and validation using ddPCR™

[0008] Droplet digital PCR (ddPCR™) is a practical solution for validating copy number variations (CNVs) identified by next-generation sequencers and microarrays. The ddPCR™ method can empower one person to screen hundreds of samples for CNV analysis in a single work shift. The ddPCR™ workflow involves using restriction enzymes to separate tandem copies of a target gene prior to assembling a duplex TaqMan® assay that includes reagents to detect both the target gene and a single-copy reference gene. The reaction mixture can then be partitioned into 20,000 nanoliter-sized droplets that are thermo-cycled to end-point before being analyzed in a two-color reader. The fraction of positive-counted droplets enables the absolute concentrations of the target and reference genes to be measured, from which a relative copy number is determined. The 20,000 PCR replicates per well provide the statistical power to resolve higher-order copy number differences. This low-cost method reliably generates copy number measurements with 95% confidence intervals that span integer values without overlapping adjacent copy number states. This technology is capable of phasing copy number variants, and it can determine whether gene copies are on the same or different chromosomes. Applications of this technology include, e.g., high-resolution CNV measurements, follow-up to genome-wide association studies, cytogenetic analysis, CNV alterations in cancerous tissue, and CNV phasing.
`
[0009] In general, described herein is the use of restriction enzymes for copy number estimation in a digital format.

[0010] The disclosure includes: 1) the use of restriction enzymes to separate target copies so that the copies can be assorted independently into partitions for the digital readout, and 2) the use of the readout from undigested DNA together with the readout from digested DNA to estimate how targets are phased, i.e., whether they are present close to each other on the same chromosome or are on different chromosomes.
`
[0011] Separating target copies for copy number analysis

[0012] Digital PCR counts targets by partitioning the sample and identifying partitions containing the target. The digital readout is an all-or-nothing process in that it specifies whether a given partition contains the target of interest and not necessarily how many copies of the target are in the partition.
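Because a positive partition may hold more than one copy, the fraction of positive partitions is commonly converted to a mean number of copies per partition with a Poisson correction, the same relation (p = 1 - exp(-c)) used in the Example 3 listing of this document. The following is a minimal Python sketch, assuming targets distribute independently across equal-volume partitions; the function name `copies_per_partition` is illustrative and does not come from the original text.

```python
import math

def copies_per_partition(positive, total):
    """Mean number of target copies per partition from a digital readout.

    Assumes copies are Poisson-distributed over partitions, so the
    fraction of positive partitions is p = 1 - exp(-c).
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total")
    return -math.log(1.0 - positive / total)

# Half of 20,000 partitions positive corresponds to ln(2) copies per partition.
print(round(copies_per_partition(10000, 20000), 4))  # prints 0.6931
```

The correction matters most at high occupancy, where many positive partitions contain several copies and a raw count of positives would undercount the target.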
`
[0013] In copy number applications, one is interested in counting the number of times a particular sequence (i.e., the target) is found in a given genome. This can be done by assessing the concentration of the target and of a reference sequence that is known to be present at some fixed number of copies in every genome. Typically, the reference is a housekeeping gene that is present at two copies per diploid genome. Dividing the concentration of the target by the concentration of the reference, and scaling by the known copy number of the reference, yields an estimate of the number of target copies per genome.
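The ratio calculation can be sketched as follows. This is an illustrative Python sketch only: it assumes Poisson-corrected concentrations, target and reference measured in the same set of partitions, and a reference present at two copies per diploid genome. The function names and the droplet counts are hypothetical.

```python
import math

def concentration(positive, total):
    """Poisson-corrected copies per partition (assumes random partitioning)."""
    return -math.log(1.0 - positive / total)

def copy_number(target_positive, reference_positive, total, reference_copies=2):
    """Target copies per genome, scaled by the known reference copy number."""
    ratio = concentration(target_positive, total) / concentration(reference_positive, total)
    return reference_copies * ratio

# Hypothetical counts: equal positives for target and reference give 2 copies.
print(copy_number(6000, 6000, 20000))  # prints 2.0
```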
`
[0014] For a particular genome, two or more targets can be linked closely together on the same chromosome. In that case, if the DNA is not sufficiently fragmented, a digital PCR partition containing one target will also contain all the targets that are linked to it. Because of its digital nature, the readout would count multiple copies as one. In order for the linked targets to be counted separately, they need to be separated before digital analysis.
`
[0015] One technique that has been used for target separation is STA (Specific Target Amplification) [Qin et al. 2008], which entails performing a short pre-amplification step to generate separate unlinked amplicons for the target and the reference. The main shortcoming of STA is that it requires the amplification efficiencies of the target and the reference to be matched. A slight difference in amplification efficiencies would result in a bias in CNV estimates. For example, if the target has an efficiency of 95% and the reference has an efficiency of 100%, a five-cycle STA would result in a 23% underestimate of the CNV. Similarly, any fluctuations in the relative efficiencies of the two reactions (due to sample composition, instrument performance, or operator variation) can result in a significant loss of precision. Also, pre-amplification imposes additional burdens on the workflow. For example, it requires setting up a separate PCR reaction, performing the PCR, and diluting the PCR amplicons.
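The arithmetic behind the bias can be sketched in Python. The text does not state which efficiency convention it uses, so the sketch below shows two and both are assumptions: treating the per-cycle yield ratio as 0.95/1.00 gives roughly the 23% figure quoted above, whereas the other common convention, in which each cycle multiplies the template by (1 + efficiency), gives about 12% for the same inputs. The function names are illustrative.

```python
def sta_bias_yield_ratio(eff_target, eff_reference, cycles):
    """Measured/true CNV ratio if per-cycle yield scales directly with efficiency."""
    return (eff_target / eff_reference) ** cycles

def sta_bias_doubling(eff_target, eff_reference, cycles):
    """Measured/true CNV ratio if each cycle multiplies template by (1 + efficiency)."""
    return ((1.0 + eff_target) / (1.0 + eff_reference)) ** cycles

# Five pre-amplification cycles, 95% target efficiency vs 100% reference efficiency.
for model in (sta_bias_yield_ratio, sta_bias_doubling):
    bias = model(0.95, 1.00, 5)
    print(f"{model.__name__}: {100 * (1 - bias):.1f}% underestimate")
```

Under either convention the bias compounds with every cycle, which is why even a small efficiency mismatch becomes visible after a short pre-amplification.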
`
[0016] Here we present an approach of using restriction enzymes to separate target copies in order to estimate copy number states accurately. The basic outline is that one digests the sample using a restriction enzyme or a restriction enzyme cocktail. The enzymes are chosen so that the DNA between the targets is restricted, but the regions to be amplified are not. The digested sample is then used in the digital PCR reaction for copy number estimation.
`
`[0017] Choosing appropriate restriction enzymes:
`
[0018] The enzyme should not cut the target or the reference amplicon. One can use a reference genome sequence to predict the cutting.

[0019] The enzyme should not cut the target or the reference amplicon even if the amplicons contain SNPs. SNP information can be obtained from several databases, most readily from dbSNP. Methylation-sensitive restriction enzymes can also be used.

[0020] The enzyme needs to cut between the amplicons. It is best to ensure this by choosing enzymes whose recognition sequences occur, preferably multiple times, near the amplicons. One has to make sure that these recognition sequences are not affected by the presence of SNPs.

[0021] The enzyme needs to be an efficient but specific (no star activity) cutter. This, along with digestion time and enzyme concentration, can be determined in advance by performing appropriate enzyme titration experiments.

[0022] Multiple digests can be employed if some of the enzymes are not efficient at cutting, or if they do not all work universally well across all samples (e.g., because of SNPs).
`
[0023] There is some evidence that the PCR reaction works better when the size of the fragment containing the amplicon is smaller. Therefore, picking restriction enzymes with cutting sites near the amplicons is desirable.
`
`[0024] One often needs to analyze the same sample for multiple CNVs. In this case, it is desirable to select
`
`the smallest number of digests that would work well for the entire set of CNVs. Ideally a single restriction
`
`enzyme cocktail could be found that does not cut within any of the amplicons but has recognition sites near
`
`each one of them.
`
[0025] Appropriate software can be written to automate the process of restriction enzyme choice and present an interface for experimental biologists to choose the most appropriate enzymes given the criteria above. Additional considerations can be employed by the software, such as enzyme cost or availability.
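As a rough illustration of what such software might check, the Python sketch below screens candidate enzymes against two of the criteria above: no recognition site overlapping an amplicon, and at least one site between each pair of neighboring amplicons. It is a simplified sketch, not a complete tool. It assumes palindromic recognition sequences (so one strand is searched), ignores SNPs, methylation, cutting efficiency, and cost, and uses a toy sequence and hypothetical function names.

```python
def find_sites(sequence, site):
    """0-based start positions of every (possibly overlapping) occurrence of site."""
    positions, start = [], sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions

def select_enzymes(sequence, amplicons, enzymes):
    """Return enzyme names that spare every amplicon and cut between neighbors.

    amplicons: list of (start, end) half-open intervals on `sequence`.
    enzymes: dict mapping enzyme name to its recognition sequence.
    """
    ordered = sorted(amplicons)
    gaps = [(a_end, b_start) for (_, a_end), (b_start, _) in zip(ordered, ordered[1:])]
    chosen = []
    for name, site in enzymes.items():
        spans = [(pos, pos + len(site)) for pos in find_sites(sequence, site)]
        hits_amplicon = any(s < end and e > start
                            for s, e in spans for start, end in ordered)
        cuts_every_gap = all(any(s >= lo and e <= hi for s, e in spans)
                             for lo, hi in gaps)
        if not hits_amplicon and cuts_every_gap:
            chosen.append(name)
    return chosen

# Toy input: two 10-base amplicons separated by a spacer containing an EcoRI site.
sequence = "ACGTACGTAC" + "TTGAATTCTT" + "GGATCCAAAA"
amplicons = [(0, 10), (20, 30)]
enzymes = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}
print(select_enzymes(sequence, amplicons, enzymes))  # prints ['EcoRI']
```

In this toy case BamHI is rejected because its site lies inside the second amplicon, and HindIII is rejected because it has no site between the amplicons.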
`
[0026] Digestion with more than one enzyme, performed serially or together in a single tube, may help to ensure complete cutting of difficult targets.
`
`[0027] Most restriction enzymes can be heat-inactivated after restriction by raising the temperature of the
`
`restriction reaction. The temperature of heat-inactivation can be below the melt point of the restricted target
`
`fragments, so as to maintain double-stranded template copies.
`
[0028] A control assay and template can be used to measure the efficiency of the RE digestion step.
`
`[0029] Extracting haplotype information for CNV analysis
`
[0030] If multiple copies of a gene are present in a particular sample, it is often important to determine whether both maternal and paternal chromosomes carry some of the copies or if one of the chromosomes lacks that gene. For example, if a sample contains two copies of a gene, it may carry one chromosome with two copies and one with zero, or it may carry two chromosomes with one copy each. Similarly, a sample with three copies may carry one chromosome with three copies and the other with zero, or one chromosome with two and the other with one. Distinguishing between these possibilities (establishing whether sequences of interest are linked) is called phasing or haplotyping.
`
[0031] Currently, no method can resolve phasing in a practical and general manner. It is especially difficult to resolve phase for copy number variants. In some applications, long-range PCR can be used. In some cases, genotypes of parents or other relatives can be used to infer the copy number state of the target individual.
`
[0032] Here we present a method of extracting phasing information for CNVs by assaying the same sample twice on a digital PCR platform: once after applying a treatment that separates copies of the target and once without applying such a treatment. This approach requires high precision for final copy number estimation and is thus best suited for use with a digital PCR platform that can produce a large number of partitions.
`
`[0033] Steps:
`
[0034] 1. Split the sample into two aliquots. Aliquot (A) should contain the sample processed so that linked copies are separated. For this portion one can use STA or the RE digestion method outlined above. Aliquot (B) should contain the sample not treated for target copy separation (meaning no STA or digestion).

[0035] 2. Assess copy number in aliquot (A).

[0036] 3. Assess copy number in aliquot (B). The sample needs to be of sufficiently high molecular weight so that if a pair of targets is on the same chromosome, they are mostly linked in solution as well. If the DNA is completely unfragmented, the readout should be 0, 1, or 2 copies. However, because DNA will usually be at least partially degraded, we expect copy numbers to span non-integer values as well as numbers greater than 2. We can add another step to assess fragmentation in the sample using gels or a digital PCR co-location method (milepost assay). If the sample is deemed overly fragmented, no information can be gleaned about its haplotypes.

[0037] 4. The greater the difference between the readings in (2) and (3), the more likely it is that one of the chromosomes does not carry a copy of the target. See examples below.
`
[0038] This approach is particularly valuable for smaller copy number states: 2, 3, 4. It yields less information about phasing (and is more difficult technically) at higher copy numbers. Conveniently, most CNV work has focused on lower copy number states, and there is reason to believe that resolving phase is more relevant for such states.
`
EXAMPLES

[0039] Example 1:
`
[0040] If we have a CN of 2.0 for a particular assay post-digestion, we don't necessarily know if the composition is (A) 2 copies on one chromosome and 0 on the other; or (B) 1 on each. If we run the same assay on undigested DNA, then we should be able to resolve between the two possibilities. In principle, we should get a CN of 1 if the arrangement is (A) and a CN of 2 if the arrangement is (B). Because DNA is fragmented, the readings aren't going to be as clean: (A) should yield a reading higher than 1.0, but presumably significantly less than 2.0; (B) should yield exactly 2.0.
`
`[0041] Higher fragmentation of the starting material would bring the CN reading in (A) closer to 2.0. As an
`
`example, for a given assay we anticipate that the linked copies are separated by about 10kb and based on our
`
`fragmentation analysis 30% of 10kb segments are fragmented. In that case, scenario (A) should yield a
`
`reading of 1.3 and scenario (B) a reading of 2.0.
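The readings quoted in this example follow from a simple expectation, sketched below in Python. The sketch assumes that every link between adjacent copies on a chromosome breaks independently with the same probability, and that each resulting fragment is counted as one copy by the digital readout. The function name `expected_undigested_cn` is illustrative.

```python
def expected_undigested_cn(copies_per_chromosome, p_break):
    """Expected copy number read from undigested DNA.

    Each chromosome carrying n >= 1 linked copies contributes one fragment
    plus one extra fragment for each of its n - 1 links that is broken.
    """
    return sum(1 + (n - 1) * p_break for n in copies_per_chromosome if n > 0)

# 10kb separation with 30% of such segments fragmented, as in the example.
print(round(expected_undigested_cn((2, 0), 0.3), 2))  # prints 1.3
print(round(expected_undigested_cn((1, 1), 0.3), 2))  # prints 2.0
```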
`
`[0042] Example 2:
`
[0043] Alternatively, if the CN reading is 3.0 post-digestion, we don't know if the composition is (A) 3 copies on one chromosome and 0 on the other; or (B) 2 on one and 1 on the other. If we run the same assay on undigested DNA, and assume the same parameters of a 10kb separation and 30% fragmentation as above, scenario (A) would yield a reading of 1.6 copies (= 0.7 * 0.7 * 1 + 0.7 * 0.3 * 2 + 0.3 * 0.7 * 2 + 0.3 * 0.3 * 3), whereas (B) would yield a reading of 2.3 copies.
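The expectation in this example can be checked by enumerating every intact/broken pattern of the links between adjacent copies. The Python sketch below does so under the same assumptions as the text (independent breaks, 30% break probability per 10kb link) plus the assumption that each resulting fragment is read as one copy; the function names are illustrative.

```python
from itertools import product

def expected_fragments(n_copies, p_break):
    """Expected number of separately counted fragments for n linked copies,
    found by enumerating every intact/broken pattern of the n - 1 links."""
    if n_copies == 0:
        return 0.0
    total = 0.0
    for pattern in product((False, True), repeat=n_copies - 1):
        prob = 1.0
        for broken in pattern:
            prob *= p_break if broken else 1.0 - p_break
        # One fragment to begin with, plus one more per broken link.
        total += prob * (1 + sum(pattern))
    return total

def expected_reading(arrangement, p_break):
    """Sum the expected fragment counts over both chromosomes."""
    return sum(expected_fragments(n, p_break) for n in arrangement)

print(round(expected_reading((3, 0), 0.3), 2))  # prints 1.6
print(round(expected_reading((2, 1), 0.3), 2))  # prints 2.3
```

The gap between the two expected readings (1.6 versus 2.3) is what allows the two arrangements to be told apart, provided the measurement is precise enough.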
`
[0044] For digital PCR/real-time hybrids, such as Life's Biotrove arrays, one might be able to extract additional phasing information from undigested DNA. One can estimate how many of the partitions contain two, three, etc. copies of the target by analyzing the real-time curves within each partition.
`
`[0045] We can attempt to do the same by looking at endpoint fluorescence, if segments with two copies of
`
`the target yield higher amplitudes than segments with one copy. Then under scenario (A) you should have
`
`fewer positives, but those positives should on average have higher fluorescence.
`
[0046] One could tweak the number of cycles to gain the best separation (making it so that segments carrying one copy do not reach the endpoint).

[0047] Example 3:
`
[0048] % Consider three types of DNA fragments: FAM-VIC together (not chopped),
[0049] % FAM fragment, VIC fragment. We observe some probabilities (counts in
[0050] % FAM-VIC cross plot), and the goal is to infer the concentrations.
[0051] % First let us do forward. Given concentrations, compute counts. Then to do
[0052] % inverse, we simply try out different values of concentrations and select
[0053] % the one which gives the actual counts.
[0054] N = 20000; % Number of partitions
[0055] A = 10000; % Molecules carrying A only
[0056] B = 20000; % Molecules carrying B only
[0057] AB = 10000; % Joined together
[0058] cA = A/N;
[0059] cB = B/N;
[0060] cAB = AB/N;
[0061] fprintf(1, '%f %f %f\n', cAB, cA, cB);
[0062] pA = 1 - exp(-cA);
`

[0063] pB = 1 - exp(-cB);
[0064] pAB = 1 - exp(-cAB);
[0065] % A is X and B is Y in cross plot
[0066] p(2,1) = (1 - pA) * (1 - pB) * (1 - pAB); % Bottom left
[0067] p(2,2) = pA * (1 - pB) * (1 - pAB); % Bottom right
[0068] p(1,1) = (1 - pA) * pB * (1 - pAB); % Top left
[0069] p(1,2) = 1 - p(2,1) - p(2,2) - p(1,1); % Top right
[0070] disp(round(p * N));
[0071] % Also compute marginals directly
[0072] cAorAB = (A + AB)/N; % = c_A + c_AB
[0073] cBorAB = (B + AB)/N; % = c_B + c_AB
[0074] pAorAB = 1 - exp(-cAorAB); % Can be computed from p too
[0075] pBorAB = 1 - exp(-cBorAB);
[0076] % Inverse
[0077] H = p * N; % We are given some hits
[0078] % H = [0 8000; 2000 0];
[0079] % Compute prob
[0080] estN = sum(H(:));
[0081] i_p = H/estN;
[0082] i_pAorAB = i_p(1,2) + i_p(2,2); % Right column: A positive
[0083] i_pBorAB = i_p(1,1) + i_p(1,2); % Top row: B positive
[0084] i_cAorAB = -log(1 - i_pAorAB);
[0085] i_cBorAB = -log(1 - i_pBorAB);
[0086] maxVal = min(i_cAorAB, i_cBorAB);
[0087] delta = maxVal/1000;
[0088] errArr = [];
[0089] gcABArr = [];
[0090] for gcAB = 0:delta:maxVal
[0091]     gcA = i_cAorAB - gcAB;
[0092]     gcB = i_cBorAB - gcAB;
[0093]     gpA = 1 - exp(-gcA);
[0094]     gpB = 1 - exp(-gcB);
[0095]     gpAB = 1 - exp(-gcAB);
[0096]     gp(2,1) = (1 - gpA) * (1 - gpB) * (1 - gpAB); % Bottom left
[0097]     gp(2,2) = gpA * (1 - gpB) * (1 - gpAB); % Bottom right
[0098]     gp(1,1) = (1 - gpA) * gpB * (1 - gpAB); % Top left
[0099]     gp(1,2) = 1 - gp(2,1) - gp(2,2) - gp(1,1); % Top right
`

[00100]     gH = gp * estN;
[00101]     err = sqrt(sum((H(:) - gH(:)).^2));
[00102]     errArr = [errArr; err];
[00103]     gcABArr = [gcABArr; gcAB];
[00104] end
[00105] figure, plot(gcABArr, errArr);
[00106] minidx = find(errArr == min(errArr(:)));
[00107] minidx = minidx(1);
[00108] estAB = gcABArr(minidx);
[00109] estA = i_cAorAB - estAB;
[00110] estB = i_cBorAB - estAB;
[00111] fprintf(1, '%f %f %f\n', estAB, estA, estB);
[00112] gpA = 1 - exp(-estA);
[00113] gpB = 1 - exp(-estB);
[00114] gpAB = 1 - exp(-estAB);
[00115] gp(2,1) = (1 - gpA) * (1 - gpB) * (1 - gpAB); % Bottom left
[00116] gp(2,2) = gpA * (1 - gpB) * (1 - gpAB); % Bottom right
[00117] gp(1,1) = (1 - gpA) * gpB * (1 - gpAB); % Top left
[00118] gp(1,2) = 1 - gp(2,1) - gp(2,2) - gp(1,1); % Top right
[00119] gH = gp * estN;
[00120] disp(round(gH));
[00121] % Confirm the results using simulation
[00122] numMolA = round(estA * estN);
[00123] numMolB = round(estB * estN);
[00124] numMolAB = round(estAB * estN);
[00125] A = unique(randsample(estN, numMolA, 1));
[00126] B = unique(randsample(estN, numMolB, 1));
[00127] AB = unique(randsample(estN, numMolAB, 1));
[00128] U = 1:estN;
[00129] notA = setdiff(U, A);
[00130] notB = setdiff(U, B);
[00131] notAB = setdiff(U, AB);
[00132] AorBorAB = union(A, union(B, AB));
[00133] none = setdiff(U, AorBorAB);
[00134] simcount(2,1) = length(none);
[00135] simcount(2,2) = length(intersect(A, intersect(notB, notAB)));
[00136] simcount(1,1) = length(intersect(B, intersect(notA, notAB)));
`

[00137] simcount(1,2) = length(AorBorAB) - simcount(2,2) - simcount(1,1);
[00138] disp(simcount);
`
`[00139] Example 4:
`
`[00140] Milepost Assay Analysis—Probability of Fragmentation
`
`[00141] Problem statement
`
[00142] Normally, there are two species of molecules (corresponding to the FAM and VIC probes). Here, there are three species: fragmented FAM, fragmented VIC, and linked FAM-VIC.

[00143] There are two dyes, so there can be ambiguity. There is a need to compute the concentrations of all three species. (See Figure 1.)
`
[00144] Algorithm: Get the 2x2 table of FAM versus VIC counts. Compute the combined concentration of fragmented FAM and linked FAM-VIC as if there were one species. Compute the combined concentration of fragmented VIC and linked FAM-VIC as if there were one species. Try out different concentrations of linked FAM-VIC (from which the concentrations of fragmented FAM and VIC can be found), and find the best fit of the probability table with the observed counts:
`
`
`
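        FAM-                 FAM+
VIC-    (1-f)(1-v)(1-c)      f(1-v)(1-c)
VIC+    (1-f)v(1-c)          1 - (sum of the other three cells)

Here f, v, and c are the probabilities that a partition contains at least one fragmented FAM, fragmented VIC, or linked FAM-VIC molecule, respectively.

Probability of fragmentation (in %):

1K uncut: 6
10K uncut: 29.4, 29.8, 29.5
100K uncut: 98.7, 97.7, 99.9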
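The fitting procedure described in the algorithm paragraph can be sketched in Python along the lines of the Example 3 listing. This is an illustrative sketch rather than the original implementation: it assumes Poisson occupancy for each species, a simple grid search with a least-squares fit, and counts in which neither dye is positive in every partition. The function names and dictionary keys are hypothetical.

```python
import math

def occupancy(c):
    """Probability that a partition holds at least one molecule at concentration c."""
    return 1.0 - math.exp(-c)

def forward_table(c_fam, c_vic, c_linked):
    """Cell probabilities of the 2x2 FAM/VIC table for three independent species."""
    f, v, c = occupancy(c_fam), occupancy(c_vic), occupancy(c_linked)
    neg = (1 - f) * (1 - v) * (1 - c)
    fam_only = f * (1 - v) * (1 - c)
    vic_only = (1 - f) * v * (1 - c)
    return {"neg": neg, "fam_only": fam_only, "vic_only": vic_only,
            "both": 1 - neg - fam_only - vic_only}

def estimate_species(counts, steps=1000):
    """Grid-search the linked concentration that best fits the observed counts.

    Returns (fragmented FAM, fragmented VIC, linked) copies per partition.
    """
    n = sum(counts.values())
    # Marginals: each dye treated as a single species (fragmented + linked).
    c_fam_total = -math.log(1 - (counts["fam_only"] + counts["both"]) / n)
    c_vic_total = -math.log(1 - (counts["vic_only"] + counts["both"]) / n)
    max_linked = min(c_fam_total, c_vic_total)
    best_err, best_linked = None, 0.0
    for i in range(steps + 1):
        c_linked = max_linked * i / steps
        model = forward_table(c_fam_total - c_linked, c_vic_total - c_linked, c_linked)
        err = sum((counts[k] - model[k] * n) ** 2 for k in counts)
        if best_err is None or err < best_err:
            best_err, best_linked = err, c_linked
    return c_fam_total - best_linked, c_vic_total - best_linked, best_linked

# Round trip with the concentrations used in Example 3 (0.5, 1.0, 0.5 per partition).
truth = forward_table(0.5, 1.0, 0.5)
counts = {k: p * 20000 for k, p in truth.items()}
print([round(x, 2) for x in estimate_species(counts)])  # prints [0.5, 1.0, 0.5]
```

Because the two marginal concentrations are fixed by the data, only the linked concentration is searched, which keeps the fit one-dimensional.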
`
[00145] Next steps can include determining whether a closed-form formula can be easily derived and/or integrating the analysis with QTools.
`

[00146] While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.
`