Estimating Sequencing Coverage
Before starting a sequencing experiment, you should know the depth of sequencing you want to achieve. This Technical Note helps you estimate that coverage.

Next-generation shotgun sequencing approaches require sequencing every base in a sample several times, for two reasons:

• You need multiple observations per base to make a reliable base call.

• Reads are not distributed evenly over an entire genome, because they sample the genome in a random and independent manner¹,². As a result, many bases will be covered by fewer reads than the average coverage, while other bases will be covered by more reads than average. You need to account for this in your planning.

How many times the bases are sequenced is expressed by the coverage metric, which is the number of times a genome has been sequenced (the depth of sequencing). For applications that aim to sequence only a defined subset of a genome, such as targeted resequencing or RNA sequencing, coverage means the number of times that subset is sequenced. For targeted resequencing, for example, it is the number of times the targeted regions of the genome are sequenced.

This Technical Note explains how to calculate the coverage required for an experiment, and how to estimate the number of flow cells or lanes you need.

Coverage Requirements Depend on Application
Illumina does not have an official recommendation for sequencing coverage level.

Most users determine the necessary coverage level based on the type of study, gene expression level, size of the reference genome, published literature, and best practices defined by the scientific community. For example, most publications on detecting mutations, SNPs, and rearrangements in the human genome require 10× to 30× depth of coverage, depending on the application and statistical model. For ChIP-Seq studies, where reads map to only a subset of the genome, researchers and publications often require coverage of around 100×.

For RNA sequencing, determining coverage is complicated by the fact that different transcripts are expressed at different levels. More reads will be captured from highly expressed genes, and fewer reads from genes expressed at low levels. When planning RNA sequencing experiments, researchers usually think in terms of the number of reads, in millions, to be sampled. The number of reads required depends on how sensitive the experiment needs to be for genes expressed at low levels: detecting rarely expressed genes may require greater depth of coverage.

Standards are Set by Field and Journals
The standards are ultimately set by journals and the scientific field you are in. The Publications section on Illumina’s website (http://science.illumina.com/science/publications/publications-list.html) provides a resource for users to search publications for Whole Genome Resequencing, De Novo Sequencing, Targeted Resequencing, Transcriptomics, and many other fields. This is recommended as a starting point for determining the target depth of coverage for a particular study. Another good resource for RNA Sequencing is provided by the ENCODE project:

http://genome.ucsc.edu/ENCODE/protocols/dataStandards/ENCODE_RNAseq_Standards_V1.0.pdf

Estimating Sequencing Runs
Coverage Equation

The Lander/Waterman equation is a method for computing coverage¹. The general equation is:

$$C = \frac{L N}{G}$$

• C stands for coverage

• G is the haploid genome length

• L is the read length

• N is the number of reads

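So, if we take one lane of single-read human sequence with v3 chemistry, which yields 189 million reads of 100 bp, we get

$$C = \frac{(100\ \text{bp}) \times (189 \times 10^{6})}{3 \times 10^{9}\ \text{bp}} = 6.3$$

This tells us that each base in the genome will be sequenced about 6.3 times on average. Because reads land randomly, individual bases will be covered more or fewer times than this average.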
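The arithmetic above can be checked with a short Python sketch. The function names are hypothetical; the inputs (100 bp reads, 189 million reads per lane, a 3 Gb haploid genome) are the figures from the example. Rearranging the equation as N = CG/L also gives the number of reads needed to reach a target coverage.

```python
import math


def lander_waterman_coverage(read_length_bp: float, num_reads: float, genome_size_bp: float) -> float:
    """Average coverage C = L * N / G (Lander/Waterman)."""
    return read_length_bp * num_reads / genome_size_bp


def reads_needed(target_coverage: float, read_length_bp: float, genome_size_bp: float) -> int:
    """Rearranged equation N = C * G / L, rounded up to a whole read."""
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)


# One lane of 100 bp single reads (189 million reads) on a 3 Gb haploid human genome.
c = lander_waterman_coverage(100, 189e6, 3e9)
print(f"coverage: {c:.1f}x")  # coverage: 6.3x

# Number of 100 bp reads needed for an average of 30x on the same genome.
print(reads_needed(30, 100, 3e9))  # 900000000
```

For paired-end runs, L × N should count the bases from both reads of each pair (for example, 200 bp per pair for a 2×100 run), since coverage depends on the total number of bases sequenced.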

Coverage Calculator

Illumina provides an online coverage calculator that calculates the reagents and sequencing runs needed to arrive at the desired coverage for your experiment, based on the Lander/Waterman equation. The calculator can be found here:

http://www.illumina.com/CoverageCalculator
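The calculator's internals are not described in this note, so the following is only a simplified sketch of the underlying arithmetic under stated assumptions. From C = LN/G, the bases sequenced (L × N) must equal C × G, so the total output required is the desired coverage multiplied by the genome or region size; the number of lanes is that total divided by the output per lane, rounded up. The per-lane output is a hypothetical input here: 18.9 Gb is just 189 million reads × 100 bp from the worked example, and real output depends on the instrument, run configuration, and reads passing filter.

```python
import math
from typing import Tuple


def lanes_needed(genome_size_bp: float, target_coverage: float, output_per_lane_bp: float) -> Tuple[float, int]:
    """Return (total output required in bp, whole lanes needed).

    Total output follows from C = L*N/G: the bases sequenced must equal C * G.
    output_per_lane_bp is a hypothetical, user-supplied yield figure.
    """
    total_output_bp = target_coverage * genome_size_bp
    return total_output_bp, math.ceil(total_output_bp / output_per_lane_bp)


# Assumed yield: 189 million reads x 100 bp = 18.9 Gb per lane (figures from the worked example).
total, lanes = lanes_needed(genome_size_bp=3e9, target_coverage=30, output_per_lane_bp=189e6 * 100)
print(f"{total / 1e9:.0f} Gb total, {lanes} lanes")  # 90 Gb total, 5 lanes
```

Rounding up reflects that lanes come in whole units; any surplus simply raises the achieved coverage above the target.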

Perform the following steps to run the calculator:

1. Enter the input parameters:

• The target genome or region size; for example, input 3000 Mb (3 Gb) for the human genome.

• The coverage you want.

• The total number of cycles. For example, if you want to perform 100 bp paired-end runs (2×100), enter 200.

2. Select the instruments you want to perform the calculation for.

3. Click Submit.

The calculator then writes tables containing the total output required, the output per lane or flow cell, and the number of lanes or flow cells you need for the desired coverage. You can also download the results as a comma-separated values (CSV) file, so you can share the data or use the tables in Excel.

Note that the calculator uses an estimate of reads passing filter commonly found for balanced genomes (such as PhiX or the human genome). If you plan to sequence an unbalanced genome, you may have a lower number of reads passing filter, and consequently a lower output per lane. If you plan a targeted resequencing or enrichment experiment, make sure to read the technical note Optimizing Coverage for Targeted Resequencing.

When to Sequence More
In Illumina sequencing experiments, it is easy to increase the coverage, or sequencing depth, if you later decide you need more data. Provided you still have your original sample, you can sequence more and combine the sequencing output from different flow cells. Reasons to sequence beyond the originally estimated coverage include:

• The effects you see are not statistically significant. Sequencing more reads will generally increase the power of your assay.

• You are investigating very rare events. For example, you may want to look at transcripts expressed at very low levels in RNA Sequencing, or at very weak binding activity in ChIP Sequencing.

• Certain journals or fields may require a higher level of coverage for your particular application.

• Certain genomes may need more sequencing. For example, some regions may be hard to sequence and require more coverage, or the genome may be polyploid.

References
1. Lander ES, Waterman MS. (1988) Genomic mapping by fingerprinting random clones: a mathematical analysis. Genomics 2(3): 231-239.
2. Estimating the number of times a base is expected to be sequenced.
Lander and Waterman made two assumptions about the sequencing:
• Reads will be distributed randomly across the genome.
• Overlap detection doesn’t vary between reads.
Based upon these two assumptions, they concluded that the number of times a base is sequenced follows a Poisson distribution. The Poisson distribution models the number of times a random, independent event occurs, given the average number of occurrences. Its probability function is:

$$P(Y = y) = \frac{C^{y} e^{-C}}{y!}$$

• y is the number of times a base is read
• C stands for coverage

We can use the Poisson distribution to compute the probability of a base being sequenced a certain number of times, using the coverage as the average number of occurrences and y as the exact number of times a base is sequenced. For the 6.3× example in the Coverage Equation section, the probability that a base is sequenced exactly three times is:

$$P(Y = 3) = \frac{6.3^{3} e^{-6.3}}{3!} = 0.077$$

Of course, this is the value for exactly 3. It is probably more interesting to know the probability that a base is sequenced 3 times or fewer, as most SNP callers require at least four calls at a base position to call SNPs. We can determine this probability by adding the probabilities for Y=2, Y=1, and Y=0 to P(Y=3):

$$P(Y \le 3) = P(Y=3) + P(Y=2) + P(Y=1) + P(Y=0) = 0.077 + 0.036 + 0.012 + 0.002 = 0.127$$

So about 12.7% of the bases in the genome will be covered by three or fewer reads, and we will probably want to increase the coverage for this experiment.

The same formula can be used in a couple of other ways. For example, by computing only the Y=0 probability, we can estimate the percentage of the genome not yet sequenced: in the example above, 0.2% of the genome was not sequenced at all. Multiplying 0.2% by the genome size (3×10⁹ bp) gives a total gap length of about 6,000,000 bp. We can also estimate the number of gaps by multiplying the number of reads used by the fraction of the genome not covered: 0.2% × 189,000,000 gives 378,000 gaps in the sequence.

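The Poisson estimates above can be reproduced in Python with only the standard library. This is a sketch under the same Lander/Waterman assumptions, and the function names are illustrative. Because it keeps full precision, it gives P(Y ≤ 3) ≈ 0.126 (the 0.127 in the text comes from adding rounded terms) and an uncovered fraction of about 0.18%, so its gap estimates come out somewhat below the figures based on the rounded 0.2%.

```python
import math


def poisson_pmf(y: int, coverage: float) -> float:
    """P(Y = y) = C^y * e^(-C) / y!: probability a base is read exactly y times."""
    return coverage ** y * math.exp(-coverage) / math.factorial(y)


def poisson_cdf(y: int, coverage: float) -> float:
    """P(Y <= y): sum of the probabilities for 0..y reads."""
    return sum(poisson_pmf(k, coverage) for k in range(y + 1))


C = 6.3    # average coverage from the worked example
G = 3e9    # haploid human genome length (bp)
N = 189e6  # number of reads

p3 = poisson_pmf(3, C)     # exactly three reads
p_le3 = poisson_cdf(3, C)  # three or fewer reads
p0 = poisson_pmf(0, C)     # never sequenced

print(f"P(Y=3)  = {p3:.3f}")                # P(Y=3)  = 0.077
print(f"P(Y<=3) = {p_le3:.3f}")             # P(Y<=3) = 0.126
print(f"uncovered fraction = {p0:.4f}")     # uncovered fraction = 0.0018
print(f"total gap length ~ {p0 * G:,.0f} bp")  # roughly 5.5 Mb with the unrounded P(Y=0)
print(f"number of gaps   ~ {p0 * N:,.0f}")     # roughly 347,000
```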
Illumina • 1.800.809.4566 toll-free (U.S.) • +1.858.202.4566 tel • techsupport@illumina.com • www.illumina.com

FOR RESEARCH USE ONLY

© 2014 Illumina, Inc. All rights reserved.
Illumina, other trademarks separated by commas, and the pumpkin orange color are trademarks of Illumina, Inc. and/or its affiliate(s) in the U.S. and/or other countries. Pub. No. 770-2011-022 Current as of 12/01/14

Technical Note: Sequencing

Foresight EX1028
Foresight v Personalis