BRIEFINGS IN FUNCTIONAL GENOMICS. VOL 10. NO 6. 374–386
`
`doi:10.1093/bfgp/elr033
`
`Targeted enrichment of genomic DNA
`regions for next-generation sequencing
`
`Florian Mertes, Abdou ElSharawy, Sascha Sauer, Joop M.L.M. van Helvoort, P.J. van der Zaag, Andre Franke,
`Mats Nilsson, Hans Lehrach and Anthony J. Brookes
`Advance Access publication date 26 November 2011
`
`Abstract
`In this review, we discuss the latest targeted enrichment methods and aspects of their utilization along with
`second-generation sequencing for complex genome analysis. In doing so, we provide an overview of issues involved
`in detecting genetic variation, for which targeted enrichment has become a powerful tool. We explain how targeted
`enrichment for next-generation sequencing has made great progress in terms of methodology, ease of use and ap-
`plicability, but emphasize the remaining challenges such as the lack of even coverage across targeted regions. Costs
`are also considered versus the alternative of whole-genome sequencing which is becoming ever more affordable.
`We conclude that targeted enrichment is likely to be the most economical option for many years to come in a
`range of settings.
`
`Keywords: targeted enrichment; next-generation sequencing; genome partitioning; exome; genetic variation
`
`INTRODUCTION
Next-generation sequencing (NGS) [1, 2] is now a major driver in genetics research, providing a
powerful way to study DNA or RNA samples. New and improved methods and protocols have been
developed to support a diverse range of applications, including the analysis of genetic variation.
As part of this, methods have been developed that aim to achieve ‘targeted enrichment’ of genome
subregions [3, 4], also sometimes referred to as ‘genome partitioning’. Strategies for direct
selection of genomic regions were already developed in anticipation of the introduction of NGS
[5, 6]. By selective recovery and subsequent sequencing of genomic loci of interest, costs and
efforts can be reduced significantly compared with whole-genome sequencing.
Targeted enrichment can be useful in a number of situations where particular portions of a
Corresponding author. Florian Mertes, Max Planck Institute for Molecular Genetics, Ihnestrasse 73, 14195 Berlin, Germany.
Tel: +49 30 8413 1289; fax: +49 30 8413 1128; E-mail: mertes@molgen.mpg.de
`Florian Mertes studied biotechnology and earned a Doctorate from the Technical University Berlin. Currently, he is a postdoctoral
`researcher focusing on applied research to develop test/screening assays based on high-throughput technologies, using both PCR and
`next-generation sequencing.
`Abdou ElSharawy is a postdoctoral researcher (University of Kiel, CAU, Germany), and lecturer of Biochemistry and Cell Molecular
`Biology (Manusoura University, Egypt). He focuses on disease-associated mutations and miRNAs, allele-dependent RNA splicing, and
`high-throughput targeted, whole exome, and genome sequencing.
`Sascha Sauer is a research group leader at the Max Planck Institute for Molecular Genetics, and coordinates the European Sequencing
`and Genotyping Infrastructure.
`Joop M.L.M. van Helvoort is CSO at FlexGen. He received his PhD at the University of Amsterdam. His expertise is in microarray
`applications currently focusing on target enrichment.
`P.J. van der Zaag is with Philips Research, Eindhoven, The Netherlands. He holds a doctorate in physics from Leiden University. At
`Philips, he has worked on a number of topics related to microsystems and nanotechnology, lately in the field of nanobiotechnology.
`Andre Franke is a biologist by training and currently holds an endowment professorship for Molecular Medicine at the Christian-
`Albrechts-University of Kiel in Germany and is guest professor in Oslo (Norway).
`Mats Nilsson is Professor of Molecular Diagnostics at the Department of Immunology, Genetics, and Pathology, Uppsala University,
`Sweden. He has pioneered a number of molecular analysis technologies for multiplexed targeted analyses of genes.
`Hans Lehrach is Director at the Max Planck Institute for Molecular Genetics. His expertise lies in genetics, genomics, systems biology,
`and personalized medicine. Highlights include key involvement in several large-scale genome sequencing projects.
`Anthony J Brookes is a Professor of Bioinformatics and Genomics at the University of Leicester (UK) where he runs a research team
`and several international projects in method development and informatics for DNA analysis through to healthcare.
`
© The Author(s) 2011. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution
Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted
non-commercial use, distribution, and reproduction in any medium, provided the original work is
properly cited.
`
`
whole genome need to be analyzed [7]. Efficient sequencing of the complete ‘exome’ (all
transcribed sequences) represents a major current application, but researchers are also focusing
their experiments on far smaller sets of genes or genomic regions potentially implicated in
complex diseases [e.g. derived from genome-wide association studies (GWAS)], pharmacogenetics,
pathway analysis and so on [1, 8, 9]. For identifying monogenic diseases, exome sequencing can be
a powerful tool [10]. Across all these areas of study, a typical objective is the analysis of
genetic variation within defined cohorts and populations.
`Targeted enrichment techniques can be charac-
`terized via a range of technical considerations related
`to their performance and ease of use, but the prac-
`tical importance of any one parameter may vary de-
`pending on the methodological approach applied
`and the scientific question being asked. Arguably,
`the most important features of a method, which in
`turn reflect the biggest challenges in targeted enrich-
`ment, include: enrichment factor, ratio of sequence
`reads on/off target region (specificity), coverage (read
`depth), evenness of coverage across the target region,
`method reproducibility, required amount of input
`DNA and overall cost per target base of useful
`sequence data.
Within this review, we compare and contrast the most commonly used techniques for targeted
enrichment of nucleic acids for NGS analysis. Additionally, we consider issues around the use of
such methods for the detection of genetic variation, and some general points regarding the design
of the target region, input DNA sample preparation and the output analysis.
`
`ENRICHMENT TECHNIQUES
`Current techniques for targeted enrichment can be
`categorized according to the nature of their core re-
`action principle (Figure 1):
`
(i) ‘Hybrid capture’: wherein nucleic acid strands derived from the input sample are hybridized
specifically to preprepared DNA fragments complementary to the targeted regions of interest,
either in solution or on a solid support, so that one can physically capture and isolate the
sequences of interest;
(ii) ‘Selective circularization’: also called molecular inversion probes (MIPs), gap-fill padlock
probes and selector probes, wherein single-stranded DNA circles that include target region
sequences are formed (by gap-filling and ligation chemistries) in a highly specific manner,
creating structures with common DNA elements that are then used for selective amplification of
the targeted regions of interest;
(iii) PCR amplification: wherein polymerase chain reaction (PCR) is directed toward the targeted
regions of interest by conducting multiple long-range PCRs in parallel, a limited number of
standard multiplex PCRs or highly multiplexed PCR methods that amplify very large numbers of
short fragments.
`
Given the operational characteristics of these different targeted enrichment methods, they
naturally vary in their suitability for different fields of application. For example, where many
megabases need to be analyzed (e.g. the exome), hybrid capture approaches are attractive as they
can handle large target regions, even though they achieve suboptimal enrichment over the complete
region of interest. In contrast, when small target regions need to be examined, especially in
many samples, PCR-based approaches may be preferred as they enable a deep and even coverage over
the region of interest, suitable for genetic variation analysis.
An overview of these different approaches is presented in Figure 1, and Table 1 lists the most
common methods along with additional information.
`
`Basic considerations for targeted
`enrichment experiments
The design of a targeted enrichment experiment begins with a general consideration of the target
region of interest. In particular, a major obstacle for targeted enrichment is posed by repeating
elements, including interspersed and tandem repeats as well as elements such as pseudogenes
located within and outside the region of interest. Exclusion of repeat-masked elements [11] from
the targeted region is a straightforward and efficient way to reduce the recovery of undesirable
products due to repeats. Furthermore, at extreme values (<25% or >65%), the guanine-cytosine (GC)
content of the target region has a considerable impact on the evenness and efficiency of the
enrichment [12]. This can adversely affect the enrichment of the 5′-UTR/promoter region and the
first exon of genes, which
`
`Figure 1: Commonly used targeted enrichment techniques. (1) Hybrid capture targeted enrichment either on solid
`support-like microarrays (a) or in solution (b). A shot-gun fragment library is prepared and hybridized against a li-
`brary containing the target sequence. After hybridization (and bead coupling) nontarget sequences are washed
`away, the enriched sample can be eluted and further processed for sequencing. (2) Enrichment by MIPs which are
`composed of a universal sequence (blue) flanked by target-specific sequences. MIPs are hybridized to the region of
`interest, followed by a gap filling reaction and ligation to produce closed circles. The classical MIPs are hybridized
`to mechanically sheared DNA (a), the Selector Probe technique uses a restriction enzyme cocktail to fragment
`the DNA and the probes are adapted to the restriction pattern (b). (3) Targeted enrichment by differing PCR
`approaches. Typical PCR with single-tube per fragment assay (a), multiplex PCR assay with up to 50 fragments (b)
`and RainDance micro droplet PCR with up to 20 000 unique primer pairs (c) utilized for targeted enrichment.
`
are often GC rich [13]. Therefore, expectations regarding the outcome of the experiment require
careful evaluation in terms of the precise target region in conjunction with the appropriate
enrichment method.
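As a minimal sketch of how the GC thresholds mentioned above (<25% or >65%) might be applied when
screening a candidate target region, the following Python fragment scans a sequence in sliding
windows and reports those with extreme GC content. The window size, step size and function names
are illustrative assumptions and are not taken from any of the cited methods.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G/C among unambiguous A/C/G/T bases (0.0 if there are none)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0


def flag_extreme_gc(seq: str, window: int = 100, step: int = 50,
                    low: float = 0.25, high: float = 0.65):
    """Return (start, end, gc) for every window whose GC content is below
    `low` or above `high`. Window and step sizes are arbitrary choices."""
    flagged = []
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = seq[start:start + window]
        gc = gc_fraction(chunk)
        if gc < low or gc > high:
            flagged.append((start, start + len(chunk), gc))
    return flagged


# Toy sequence: an AT-rich stretch, a balanced stretch and a GC-rich stretch.
toy_region = "AT" * 60 + "ACGT" * 30 + "GC" * 60
for start, end, gc in flag_extreme_gc(toy_region):
    print(f"{start}-{end}: GC = {gc:.0%}")
```

Windows flagged in this way could be inspected before committing to a probe or primer design,
since they are the parts of the region most likely to show uneven or inefficient enrichment.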
The performance of a targeted enrichment experiment will also depend upon the mode and quality of
processing of the input DNA sample. Having sufficient high-quality DNA is key for any further
downstream handling. When limited genomic DNA is available, whole-genome amplification (WGA) is
usually applied. Since WGA produces only a representation and not a replica of the genome, a bias
is assumed to be introduced, though the impact of this on the final results can be compensated
for, to a degree, by identically manipulating control samples [14].
`
All three major targeted enrichment techniques (hybrid capture, circularization and PCR) differ
in terms of the sample library preparation workflow that enables sequencing on any of the current
NGS instruments (e.g. Illumina, Roche 454 and SOLiD). Enrichment by hybrid selection relies on
short fragment library preparations (typically ranging from 100 to 250 bp) which are generated
before hybridization to the synthetic library comprising the target region. In contrast,
enrichment by PCR is performed directly on genomic DNA, and the library primers for sequencing
are added thereafter. Enrichment by circularization offers the easiest library preparation for
NGS because the sequencing primers can be added to the circularization probe, thus eliminating
the need for any further library preparation steps.
`
`
Table 1: Currently employed targeted enrichment techniques

| Enrichment technique | Vendor | Features | Pros | Cons | Number of loci (target size) | Library prep for NGS |
| --- | --- | --- | --- | --- | --- | --- |
| Hybrid capture, solid support | Agilent SureSelect, Roche NimbleGen SeqCap EZ | Medium to large target regions, custom and preconfigured target regions (i.e. whole exome), multiplexing possible, ready to use kits | Ease of production, large target sets | Large amount of input DNA, high-tech equipment (3–10 µg) | 10^4–10^6 (1–50 Mb) | Before enrichment |
| Hybrid capture, in solution | Agilent SureSelect, FlexGen FleXelect, MYcroarray MYselect, Roche NimbleGen SeqCap EZ | Medium to large target regions, custom and preconfigured target regions (i.e. whole exome), multiplexing possible, ready to use kits | Ease of use, small amount of input DNA (<1–3 µg) | | 10^4–10^6 (1–50 Mb) | Before enrichment |
| Circularization: molecular inversion probes, selector probes | HaloGenomics | Custom kits and clinically relevant panel kits | No dedicated instruments, high specificity, input DNA (<1 µg) | Exome kit not available yet | 10^2–10^4 (0.1–5 Mb) | During enrichment (incorporated into hybridization probes) |
| PCR, long range | Invitrogen SequalPrep, Qiagen SeqTarget system | Smaller target regions, coverage by tiling | Relatively easy to set up and automatable, even coverage | PCR conditions largely influence effectiveness, >10 µg DNA for large sets | 10–200 (0.1–1.5 Mb) | After enrichment |
| PCR, multiplex | Multiplicom, Fluidigm | Smaller target regions, coverage by tiling, multiplex PCR of 150–200 amplicons (150–450 bp) | Easy to perform, reasonably economical, even coverage | PCR conditions largely influence effectiveness, >10 µg DNA for large sets | 10^2–10^4 (0.1–5 Mb) | After enrichment |
| PCR, micro droplet | RainDance | Smaller target regions, coverage by tiling, micro droplet PCR of up to 20 000 amplicons (150–1500 bp) | Even coverage, low input DNA | Relatively expensive, specialist equipment | 10^3–10^4 (up to 10 Mb) | After enrichment |

All major targeted enrichment techniques show relative pros and cons.
`
`
`Sequencing can be performed either as single read or
`paired-end reads of the fragment library. In general,
`mate–pair libraries are not used for hybridization-
`based targeted enrichments due to the extra compli-
`cations this implies in terms of target region design.
In general, a single NGS run produces enough reads to sequence several samples enriched by one of
the mentioned methods. Therefore, pooling strategies and indexing approaches are a practical way
to reduce the per-sample cost. Depending on the method used for targeted enrichment, different
multiplexing strategies can be envisaged that enable multiplexing at different stages of the
enrichment process: before, during and after the enrichment. For targeted enrichment by hybrid
capture, indexing of the sample is usually performed after the enrichment, but to reduce the
number of enrichment reactions, the sample libraries can alternatively be indexed during the
library preparations and then pooled for enrichment [15]. Enrichment by PCR and circularization
offers indexing during the enrichment by using bar-coded primers in the product amplification
steps [16]. Furthermore, two multiplexing strategies can be combined in a single experiment.
First, multiple samples can be enriched as a pool, with each harboring a unique pre-added
bar-code. Second, another bar-coding procedure can be applied postenrichment to each of these
pools, giving rise to a highly multiplexed final pool. If such extensive multiplexing is used,
great care must be taken to normalize the amount of each sample within the pool to achieve
sufficiently even representation over all samples in the final set of sequence reads. In
addition, highly complex pooling strategies also imply far greater challenges when it comes to
deconvoluting the final sequence data back into the original samples.
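To illustrate the deconvolution step for the combined strategy described above (one bar-code
added before enrichment, a second added to each pool after enrichment), a simple sketch in Python
is given below. The sample sheet, the bar-code sequences and the assumption that both bar-codes
have already been extracted from each read are hypothetical; exact matching is used, whereas real
pipelines usually tolerate sequencing errors in the bar-codes.

```python
# Hypothetical sample sheet:
# (post-enrichment pool bar-code, pre-enrichment sample bar-code) -> sample name
SAMPLE_SHEET = {
    ("ACGT", "TTAG"): "pool1_sampleA",
    ("ACGT", "GGCA"): "pool1_sampleB",
    ("TGCA", "TTAG"): "pool2_sampleC",
    ("TGCA", "GGCA"): "pool2_sampleD",
}


def demultiplex(reads, sample_sheet):
    """Group read sequences by sample using both bar-codes.

    `reads` is an iterable of (pool_barcode, sample_barcode, sequence).
    Reads with an unknown bar-code pair are collected under 'undetermined'.
    """
    bins = {}
    for pool_bc, sample_bc, seq in reads:
        sample = sample_sheet.get((pool_bc, sample_bc), "undetermined")
        bins.setdefault(sample, []).append(seq)
    return bins


def representation(bins):
    """Fraction of assigned reads per sample, as a check on how evenly the
    samples were normalized within the final pool."""
    counts = {s: len(r) for s, r in bins.items() if s != "undetermined"}
    total = sum(counts.values())
    return {s: n / total for s, n in counts.items()} if total else {}


reads = [
    ("ACGT", "TTAG", "AAA"),
    ("ACGT", "GGCA", "CCC"),
    ("TGCA", "TTAG", "GGG"),
    ("NNNN", "TTAG", "TTT"),  # unreadable pool bar-code
    ("ACGT", "TTAG", "ACA"),
]
bins = demultiplex(reads, SAMPLE_SHEET)
print(representation(bins))
```

A strongly skewed `representation` result would indicate that the samples were not normalized
well before pooling.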
The task of designing the target region is relatively straightforward, and this can be managed
with web-based tools offered by UCSC, Ensembl/BioMart, etc. and spreadsheet calculations (e.g.
Excel) on a personal computer. Web-based tools like MOPeD offer a more user-friendly approach for
oligonucleotide probe design [17]. Far more difficult, however, is the final sequence output
analysis, which needs dedicated computer hardware and software. Fortunately, great progress has
recently been made in read mapping and parameter selection for this process, leading to more
consistent and higher quality final results [18]. Reads generated by hybrid selection will always
tend to extend into sequences beyond the target region, and the longer the fragment library is,
the more of these ‘near target’ sequences will be recovered. Therefore, read mapping must start
with a basic decision regarding the precise definition of the on/off target boundaries, as this
parameter is used for counting on/off target reads and so influences the number of sequence reads
considered as on target. This problem is not so critical for enrichments based on PCR and
circularization as these methods do not suffer from ‘near target’ products. Another major
consideration in data analysis is the coverage needed to reliably identify sequence variants,
e.g. single nucleotide polymorphisms (SNPs). This depends on multiple factors such as the nature
of the region of interest in question and the method used for targeted enrichment. In different
reports, it has ranged from 8× coverage [19], which was the minimum coverage for reliable SNP
calling, up to 200× coverage [20], in this case the total average coverage for the targeted
region.
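The influence of the on/off target boundary definition can be made concrete with a small Python
sketch. It counts a read as on target if it overlaps a target interval extended by a chosen
flank; the interval convention (0-based, half-open), the `flank` parameter and the toy
coordinates are assumptions made for illustration, and a linear scan is used where real data
would call for an interval index.

```python
def on_target_fraction(read_intervals, target_intervals, flank=0):
    """Fraction of reads overlapping any target interval padded by `flank` bases.

    All intervals are (chrom, start, end) tuples, 0-based and half-open.
    """
    padded = [(c, max(s - flank, 0), e + flank) for c, s, e in target_intervals]
    if not read_intervals:
        return 0.0
    on_target = 0
    for chrom, start, end in read_intervals:
        # Two half-open intervals overlap if each starts before the other ends.
        if any(chrom == c and start < e and end > s for c, s, e in padded):
            on_target += 1
    return on_target / len(read_intervals)


targets = [("chr1", 1000, 1200)]
reads = [
    ("chr1", 1050, 1150),  # inside the target
    ("chr1", 1190, 1290),  # overlaps the target edge
    ("chr1", 1250, 1350),  # 'near target' only
    ("chr2", 1050, 1150),  # off target
]
print(on_target_fraction(reads, targets, flank=0))    # strict boundaries
print(on_target_fraction(reads, targets, flank=100))  # 100-base flank counted as on target
```

With these toy reads, widening the boundary by 100 bases moves the ‘near target’ read into the
on-target count, which is why the boundary definition should be reported alongside any
specificity figure.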
`
`Enrichment by hybrid capture
`Enrichment by hybrid capture (Figure 1.1a and b)
`builds on know-how developed over the decade or
`more of microarray research that preceded the NGS
`age [21, 22]. The hybrid capture principle is based
`upon the hybridization of a selection ‘library’ of very
`many fragments of DNA or RNA representing the
`target region against a shotgun library of DNA frag-
`ments from the genome sample to be enriched. Two
`alternative strategies are used to perform the hybrid
`capture: (i) reactions in solution [4] and (ii) reactions
`on a solid support [3]. Each of these two approaches
`brings different advantages, as listed in Table 1.
Selection libraries for hybrid capture are typically produced by oligonucleotide synthesis upon
microarrays, with lengths ranging from 60 to 180 bases. These microarrays can be used directly to
perform the hybrid capture reaction (i.e. surface phase methods), or the oligonucleotide pool can
be harvested from the array and used for an in-solution targeted enrichment (i.e. solution phase
methods). The detached oligonucleotide pool enables versatile downstream processing: if universal
5′- and 3′-end sequences are included in the design of the oligonucleotides, the pool can be
reamplified by PCR and used to process many genomic samples. Furthermore, it is possible to
introduce T7/SP6 transcription start sites via these PCRs [23], so that the pool can be
transcribed into RNA before being used in an enrichment experiment.
`
`
Recently, an increasing number of protocols and vendors have begun offering out-of-the-box
solutions for hybrid capture, meaning that the researcher need not do development work but merely
choose between preset targeted enrichment regions (e.g. whole exome) or specify their own custom
enrichment region. Example vendors include: Agilent (SureSelect product), NimbleGen (SeqCap EZ
product), Flexgen and MYcroArray. Alternatively, a more cost-efficient option compared with
buying a complete kit involves ordering a synthetic bait library, reamplifying this by PCR [24],
optionally transcribing this into RNA and undertaking a do-it-yourself enrichment experiment
based upon published protocols.
`
`Enrichment by circularization
`Enrichment by DNA fragment circularization is
`based upon the principle of selector probes [6, 25]
`and gap-fill padlock or MIPs [26]. This approach
`differs significantly compared with the aforemen-
`tioned hybrid capture method. Most notably, it is
`greatly superior in terms of specificity, but far less
`amenable to multiple sample co-processing in a
`single reaction. Each probe used for enrichment by
`circularization comprises a single-stranded DNA
`oligonucleotide that at its ends contains two se-
`quences that are complementary to noncontiguous
`stretches of a target genomic fragment, but in re-
`versed linear order. Specific hybridization between
`such probes and their cognate target genomic frag-
`ments generates bipartite circular DNA structures.
`These are then converted to closed single-stranded cir-
`cles by gap filling and ligation reactions (Figure 1.2). A
`rolling circle amplification step or a PCR directed
`toward sequences present in the common region of
`all the circles is then finally applied to amplify the
`target regions (circularized sequences) to generate an
`NGS library.
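The ‘reversed linear order’ of the two probe ends can be made explicit with a short Python
sketch. It builds a gap-fill probe for a target written 5′ to 3′ and then forms the closed
circle. It assumes the usual arrangement in which the 3′ end of the probe is extended across the
gap and ligated to the probe's own 5′ end; the linker sequence, the arm length and all function
names are hypothetical placeholders, not taken from a published design.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_probe(target: str, gap_start: int, gap_end: int,
                 arm_len: int, linker: str) -> str:
    """Return a probe (written 5'->3') whose arms flank target[gap_start:gap_end]."""
    if gap_start - arm_len < 0 or gap_end + arm_len > len(target):
        raise ValueError("target too short for the requested arms")
    upstream = target[gap_start - arm_len:gap_start]
    downstream = target[gap_end:gap_end + arm_len]
    # The 5' arm pairs with the upstream flank and the 3' arm with the downstream
    # flank, i.e. the arms are swapped relative to revcomp(target).
    return revcomp(upstream) + linker + revcomp(downstream)


def closed_circle(probe: str, target: str, gap_start: int, gap_end: int) -> str:
    """Circle after gap filling and ligation, linearized at the probe's 5' end."""
    return probe + revcomp(target[gap_start:gap_end])


target = "AACCGGTTAC"
probe = design_probe(target, gap_start=4, gap_end=6, arm_len=3, linker="CTCTCT")
circle = closed_circle(probe, target, 4, 6)
# Reading around the circle recovers the contiguous complement of flank + gap + flank.
assert revcomp(target[1:9]) in circle + circle
print(probe, circle)
```

Because the linker is common to all circles, an amplification primed from it would copy every
captured gap sequence, which is the property the text relies on for generating the NGS library.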
Variations on this basic method concept exist, in particular with regard to the differences in
sample material preparation and downstream processing for NGS library preparation. In the
gap-fill padlock or MIPs implementation (Figure 1.2a), the sample DNA is fragmented by shearing
and used in the bipartite circular structure to provide a template for the probe DNA to be
extended by gap filling and converted to a closed circle. In this incarnation, the design of the
MIPs merely has to consider the uniqueness of each target region fragment and the most suitable
hybridization conditions. In contrast, a more elaborate design is offered by the ‘Selector Probe’
technique [6, 27]. Here the genomic DNA is fragmented in a controlled manner by means of a
cocktail of restriction enzymes, and the selector probes are designed to accommodate the
restriction pattern of the target region. The ends of each genomic DNA fragment thus become
adjacently positioned in the bipartite circles, enabling them to be gap filled and ligated into
closed single-stranded circles (Figure 1.2b).
A particularly appealing feature of enrichment by circularization with MIPs and selectors is
their ‘library free’ nature [28]. Since MIPs and selectors comprise a target-specific 5′- and
3′-end with a common central linker, the sequencing primer information for NGS applications can
be directly included into this common linker. Burdensome NGS library preparations are therefore
not required, reducing processing time markedly.
`
`Enrichment by PCR
Enrichment by PCR (Figures 1.3a–c) is, in terms of methodology, a more straightforward method
compared with the other genome partitioning techniques. It takes advantage of the great power of
PCR to enrich genome regions from small amounts of target material. Just as for circularization
methods, if the PCR product sizes fall within the sequencing length of the applied NGS platform
(maximum read length for SOLiD: 110 bp, Illumina: 240 bp and 454: 1000 bp), PCR-based enrichment
can allow one to bypass the need for shot-gun library preparation by using suitably 5′-tailed
primers in the final amplification steps.
The main downside of the method is that it does not scale easily, in any format, to enable the
targeting of very large genome subregions or many DNA samples. To use this method effectively,
any significant extent of parallelized singleplex or multiplex PCR would need to be supported by
the use of automated robotics; individual PCR amplicons (or multiplex products) need to be
carefully normalized to equivalent molarities when pooling in advance of NGS (so that the final
coverage of the total region of interest is as even as possible); and the amount of DNA material
a study requires can be substantial, as this requirement grows linearly with the number of
utilized PCR reactions. But if the target region is small, PCR can be the method of choice. For
example, a target region of 50–100 kb or so could be spanned by a handful of long-range PCRs each
of 5–10 kb
`
`[29], or by tiling a few hundred shorter PCRs and
`using microtiter plates and robotics, or by one or
`other approaches toward PCR multiplexing [30, 31].
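The normalization of amplicons to equivalent molarities before pooling is simple arithmetic,
sketched here in Python. The sketch assumes double-stranded DNA with an average mass of about
660 g/mol per base pair (a commonly used approximation), concentrations measured in ng/µl, and a
hypothetical target of a fixed number of femtomoles per amplicon; the amplicon names and values
are invented for illustration.

```python
AVG_BP_MASS = 660.0  # approximate g/mol per base pair of double-stranded DNA


def nanomolar(ng_per_ul: float, length_bp: int) -> float:
    """Convert a mass concentration (ng/µl) to a molar concentration (nM)."""
    return ng_per_ul * 1e6 / (AVG_BP_MASS * length_bp)


def pooling_volumes(amplicons, fmol_each: float):
    """Volume (µl) of each amplicon needed to contribute `fmol_each` femtomoles.

    `amplicons` maps a name to (concentration in ng/µl, length in bp).
    Since 1 nM equals 1 fmol/µl, the volume is simply fmol / nM.
    """
    return {name: fmol_each / nanomolar(conc, length)
            for name, (conc, length) in amplicons.items()}


amplicons = {
    "amplicon_1": (66.0, 1000),  # shorter product
    "amplicon_2": (66.0, 5000),  # same mass concentration, five times longer
}
for name, volume in pooling_volumes(amplicons, fmol_each=100.0).items():
    print(f"{name}: {volume:.2f} µl")
```

At equal mass concentration the longer amplicon contains fewer molecules per microlitre, so a
proportionally larger volume is needed to reach the same molar amount.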
Long-range PCR is the most commonly applied approach and it is reasonably straightforward to
accomplish. Many vendors now offer specially formulated kits (e.g. Invitrogen SequalPrep, Qiagen
SeqTarget) that can amplify fragments of up to 20 kb in length. This approach is also fully
compatible with automation. Long-range PCR products do, however, have to be cleaned, pooled and
processed for shot-gun library preparation so that they are ready for analysis by NGS.
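As a rough planning aid, the number and placement of long-range amplicons needed to tile a region
can be estimated as in the Python sketch below. It only computes overlapping windows of a chosen
length; the overlap value and coordinates are assumptions for illustration, and actual primer
placement would additionally depend on sequence context such as repeats and GC content.

```python
def tile_region(start: int, end: int, amplicon_len: int, overlap: int):
    """Return (start, end) windows covering [start, end) with the given overlap.

    The final window is truncated at `end`, so it may be shorter than the rest.
    """
    if end <= start:
        raise ValueError("region must have positive length")
    if not 0 <= overlap < amplicon_len:
        raise ValueError("overlap must be non-negative and shorter than the amplicon")
    tiles = []
    pos = start
    while True:
        tile_end = min(pos + amplicon_len, end)
        tiles.append((pos, tile_end))
        if tile_end >= end:
            break
        pos = tile_end - overlap  # step back so neighbouring amplicons overlap
    return tiles


# A 60 kb region tiled with 8 kb amplicons that overlap by 500 bp.
tiles = tile_region(0, 60_000, amplicon_len=8_000, overlap=500)
print(len(tiles), "amplicons")
for tile in tiles:
    print(tile)
```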
To increase the throughput of PCR while keeping the number of PCR reactions as low as possible,
there is the alternative of multiplex PCR (Figure 1.3b). Given careful primer design and reaction
optimization, several dozen primer pairs can be used together effectively in a multiplexing
reaction [32]. Indeed, software specifically created to help with multiplex PCR assay design is
available [33]. Then, by running many such reactions in parallel, many hundred different DNA
fragments can be amplified. An alternative method that is commercially available from Fluidigm
(Table 1) uses a microfluidics PCR chip to conduct several thousand singleplex PCRs in parallel.
Yet another strikingly elegant method is the micro-droplet PCR technology developed by RainDance
[34, 35]. Here, two libraries of lipid-encapsulated water droplets are prepared: one in which
each droplet contains a small amount of the test sample DNA and the other comprising droplets
that harbor distinct pairs of primers. These two libraries are then merged (respective droplet
pairs are fused together) to generate a highly multiplexed total emulsion PCR wherein each
reaction is actually isolated from all others in its own fused droplet (Figure 1.3c). Using this
technology, up to 20 000 primer pairs can be used effectively in parallel in a single tube.
Overall, one can draw the following conclusions from a comparison of the currently used
enrichment techniques shown in Table 1: (i) Hybrid capture has its main advantages for medium to
large target regions (10–50 Mb), in contrast to the other two approaches, which typically only
target small regions within the kilo base pair and low mega base pair range. The ability to
enrich for mega base pair-sized targets is particularly advantageous in research studies where
typically whole exomes or many genes are involved. In clinical settings, this may be especially
relevant for oncological applications where one would expect to sequence hundreds to thousands of
genes. (ii) The advantage of PCR and circularization-based methods is that they achieve very high
enrichment factors and few off-target reads, but only for small target regions. This is more
suited to clinical genetics where typically only a few critical loci need to be assessed.
`
`Descriptive metrics for targeted DNA
`enrichment experiments
To allow meaningful comparison of enrichment methods and experiments that employ them, and to
rationally decide which technologies are most suitable when designing a research project, it is
important that an objective set of descriptive metrics is defined and then widely used when
reporting enrichment datasets. A series of metrics needs to be considered, and the importance of
each can be weighted according to the specific needs and objectives of any experiment. A proposal
for such a set of metrics is soon to be published, and it contains the following (Nilsson et al.,
manuscript in preparation):
`
`(i) Region of interest (size): ROI;
`(ii) Average read depth (in ROI): D;
`(iii) Fraction of ROI sufficiently covered (at a spe-
`cified D): F;
`(iv) Specificity (fraction of reads in ROI): S;
`(v) Enrichment Factor (D for ROI versus D for rest
`of genome): EF;
`(vi) Evenness (lack of bias): E and
`(vii) Weight (input DNA requirement): W.
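One possible way to compute several of these metrics from per-base read depths is sketched below
in Python. The formulas are an interpretation of the short definitions in the list above rather
than the forthcoming proposal itself; in particular, the off-target depth is approximated from
the number of off-target reads and an assumed uniform read length. Evenness (E) and Weight (W)
are omitted because the text gives no formula for them.

```python
from statistics import mean


def enrichment_metrics(roi_depths, on_target_reads, total_reads,
                       genome_size, read_len, min_depth):
    """Compute ROI size, D, F, S and EF for one enrichment experiment.

    roi_depths      -- read depth at every base of the region of interest
    on_target_reads -- number of reads mapped inside the ROI
    total_reads     -- number of mapped reads in total
    genome_size     -- genome size in bases
    read_len        -- assumed uniform read length in bases
    min_depth       -- depth at which a base counts as sufficiently covered
    """
    roi_size = len(roi_depths)
    depth = mean(roi_depths)                                             # D
    fraction_covered = sum(d >= min_depth for d in roi_depths) / roi_size  # F
    specificity = on_target_reads / total_reads                          # S
    # Approximate mean depth over the rest of the genome from off-target reads.
    off_bases = (total_reads - on_target_reads) * read_len
    off_depth = off_bases / (genome_size - roi_size)
    enrichment = depth / off_depth if off_depth else float("inf")        # EF
    return {"ROI": roi_size, "D": depth, "F": fraction_covered,
            "S": specificity, "EF": enrichment}


# Toy numbers chosen only to keep the arithmetic easy to follow.
print(enrichment_metrics(roi_depths=[30, 40, 50, 10, 20],
                         on_target_reads=60, total_reads=100,
                         genome_size=1005, read_len=10, min_depth=20))
```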
`
`A theoretical examination of how a method’s
`innate enrichment capability and the size of the tar-
`geted region work together to determine other par-
`ameters (such as specificity and read depth) can be
`very instructive when choosing an enrichment
`method for a particular application. This is illustrated
`in Figure 2. For example, given a method’s specific
`enrichment factor and knowledge about the size of
`the region of interest, the corresponding sequencing
`effort can be estimated for a given desired specificity
`(percent of sequences on target). Similarly,
`for a
`given region of interest and a minimum desired spe-
`cificity, the necessary enrichment factor capabilities
`can be calculated.
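These estimates can be sketched in a few lines of Python. The sketch assumes that the two
relations in the Figure 2 caption read as pot = 100 × EF × ROI / (EF × ROI + genome) and
sequencing depth = pot × seq per run / (100 × ROI), with all sizes in kb; the third function is
simply the first relation solved for EF. The genome size of about 3 000 000 kb and the run yield
used in the example are illustrative assumptions.

```python
def percent_on_target(ef: float, roi_kb: float, genome_kb: float) -> float:
    """Percent of sequences on target (pot) for a given enrichment factor."""
    return 100.0 * ef * roi_kb / (ef * roi_kb + genome_kb)


def sequencing_depth(pot: float, seq_per_run_kb: float, roi_kb: float) -> float:
    """Average depth over the ROI for a given percent on target and run yield."""
    return pot * seq_per_run_kb / (100.0 * roi_kb)


def required_enrichment_factor(pot: float, roi_kb: float, genome_kb: float) -> float:
    """Enrichment factor needed to reach a desired percent on target.

    Obtained by solving pot = 100 * EF * ROI / (EF * ROI + genome) for EF.
    """
    if not 0 <= pot < 100:
        raise ValueError("pot must be at least 0 and below 100")
    return pot * genome_kb / (roi_kb * (100.0 - pot))


GENOME_KB = 3_000_000  # assumed genome size of roughly 3 Gb
ROI_KB = 3_000         # a 3 Mb region of interest
pot = percent_on_target(ef=1000, roi_kb=ROI_KB, genome_kb=GENOME_KB)
depth = sequencing_depth(pot, seq_per_run_kb=3_000_000, roi_kb=ROI_KB)
print(f"percent on target: {pot:.1f}, depth: {depth:.0f}x")
print("EF needed for 50% on target:",
      required_enrichment_factor(50, ROI_KB, GENOME_KB))
```

In this toy case an enrichment factor of 1000 on a 3 Mb region yields half of all sequences on
target, which shows how quickly specificity falls as the region of interest shrinks unless the
enrichment factor rises accordingly.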
Finally, the specific per-sample costs of a targeted enrichment are useful to consider. To make
costs
`
`
Figure 2: Comparison of enrichment factor calculations on sequencing depth and percent on target
sequences for different target region sizes employed for targeted enrichment. Calculations were
performed as follows:

$$\text{percent on target sequences} = 100 \times \frac{EF \times ROI}{EF \times ROI + \text{genome}} \quad [52]$$

$$\text{sequencing depth} = \frac{pot \times \text{seq per run}}{100 \times ROI}$$

EF, enrichment factor; ROI, region of interest in kb; genome, genome size in kb; pot, percent on
target sequences; seq per run, assumed sequences per run in kb.
`
`comparable, either for different target region sizes or
`across different methods, the costs can be normalized
`as costs per base pair. Costs also change with time
`and as technologies improve, and so at some stage
`the overall price of any particular experiment (i.e.
`targeted enrichment plus sequencing costs) will not
`be cheaper than the alternative of whole-genome
`sequencing combined with in silico-based isolation
`of the region of interest.
`
`DISCOVERY OF GENETIC
`VARIATION
`To investigate genetic variation by NGS, many
`DNA samples need to be tested. To reduce the
`cost of such studies, researchers typically focus their
`attention on genome subregions of particular inter-
`est, and this implies a major role for targeted enrich-
`ment in such undertakings. A set of concerns then
`arises regarding the accuracy of variation discovery
`
`within NGS data obtained from DNA that has been
`subjected to one or other enrichment methods.
`Other questions, such as whether the input genomic
`DNA was also preamplified by WGA, whether sam-
`ple pooling or multiplexing was applied and whether
`proper experimental controls were employed, also
`come into play. Currently, however, the field is lack-
`ing a complete understanding of all the issues and
`influences relevant
`to these important questions.
`For these reasons, it is critical that thorough down-
`stream validation experiments are performed, using
`independent experimental approaches.
`Another dimension to the problem of reliably dis-
`covering sequence variation, and one where there is
`perhaps a little more clarity, is the impact of different
`software and algorithm choices used for primary se-
`quence data analysis (e.g. the choice of suitable gen-
`ome alignment tool, filter parameters for the analysis,
`coverage thresholds at intended bases). It has been
`shown that the detection of variants depends strongly
`
`
on the particular software tools employed [36]. Indeed, because current alignment and analytical
tools perform so heterogeneously, the 1000 Genomes Project Consortium [37] decided to avoid
calling novel SNPs unless they were discovered by at least two independent analytical pipelines.
In general, unified analysis workflows can and must be developed [38] to enable the combination
and processing of data produced from different machines/approaches, to at least minimize
instrument-specific biases and errors that otherwise detract from making high-confidence variant
base calls.
Whatever mapping and analysis approach is applied, sufficient coverage at single-base resolution,
ranging from 20× to 50×, is usually deemed necessary for reliable detection of sequence variation
[39–42]. In one simulation study, the SNP discovery performances of two NGS platforms in a
specific disease gene were shown to fall rapidly when the coverage depth was below 40× [43]. In
addition, all called variants should ideally be supported by data from both read orientations
(forward and reverse). Some researchers further insist on obtaining at least three reads from
both the forward and the reverse DNA strands (double-stranded coverage) for any nonreference base
before it is called [20]. Such stringent quality control practices are surely needed to minimize
error rates and the impact of random sampling variance, so that true variations and sequencing
artifacts can be resolved and homozygous and heterozygous genotypes at sites of variation
reliably scored.
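The coverage and strand-support criteria described above translate directly into a simple filter,
sketched here in Python. The record layout and the default thresholds (a minimum depth of 20 and
at least three nonreference reads per strand) are illustrative choices drawn from the ranges
quoted in the text, not a recommended standard.

```python
from dataclasses import dataclass


@dataclass
class CandidateVariant:
    position: int
    depth: int        # total reads covering the site
    alt_forward: int  # nonreference reads on the forward strand
    alt_reverse: int  # nonreference reads on the reverse strand


def passes_filters(v: CandidateVariant, min_depth: int = 20,
                   min_alt_per_strand: int = 3) -> bool:
    """Keep a candidate only if the site is covered deeply enough and the
    nonreference base is seen often enough on both strands."""
    return (v.depth >= min_depth
            and v.alt_forward >= min_alt_per_strand
            and v.alt_reverse >= min_alt_per_strand)


candidates = [
    CandidateVariant(position=101, depth=45, alt_forward=12, alt_reverse=9),
    CandidateVariant(position=250, depth=60, alt_forward=20, alt_reverse=0),  # one strand only
    CandidateVariant(position=377, depth=8, alt_forward=4, alt_reverse=4),    # too shallow
]
kept = [v.position for v in candidates if passes_filters(v)]
print(kept)
```

Only the first candidate survives here: the second lacks support from the reverse strand and the
third falls below the minimum depth.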
Deep coverage alone seems not, by itself, to always be sufficient for accurate variation
discovery. For example, a naïve Bayesian model for SNP calling, even with deep coverage, can le
