`DOI: 10.1017/S0033583502003797
`Printed in the United Kingdom
`
`A review of DNA sequencing techniques
`
Lilian T. C. França¹, Emanuel Carrilho² and Tarso B. L. Kist³*
¹ Centro de Biotecnologia, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil and
Instituto de Biofísica Carlos Chagas Filho, CCS, Universidade Federal do Rio de Janeiro, Rio de Janeiro,
RJ, Brazil (E-mail: lila@biof.ufrj.br)
² Instituto de Química de São Carlos, Universidade de São Paulo, São Carlos, SP, Brazil
(E-mail: emanuel@iqsc.sc.usp.br)
³ Departamento de Biofísica, Instituto de Biociências, Universidade Federal do Rio Grande do Sul,
91501-970, Porto Alegre, RS, Brazil (E-mail: tarso@orion.ufrgs.br)
`
`
1. Summary
2. Introduction
3. Sanger’s method and other enzymic methods
3.1 Random approach
3.2 Direct approach
3.3 Enzyme technology
3.4 Sample preparation
3.5 Labels and DNA labelling
3.5.1 Radioisotopes
3.5.2 Chemiluminescent detection
3.5.3 Fluorescent dyes
3.6 Fragment separation and analysis
3.6.1 Electrophoresis
3.6.2 Mass spectrometry - an alternative
4. Maxam & Gilbert and other chemical methods
5. Pyrosequencing - DNA sequencing in real time by the detection of released PPi
6. Single molecule sequencing with exonuclease
7. Conclusion
8. Acknowledgements
9. References
`
1. Summary

The four best known DNA sequencing techniques are reviewed. Important practical issues
covered are read-length, speed, accuracy, throughput, cost, as well as the automation of
sample handling and preparation. The methods reviewed are: (i) the Sanger method and its
most important variants (enzymic methods); (ii) the Maxam & Gilbert method and other
chemical methods; (iii) the Pyrosequencing™ method, i.e. DNA sequencing in real time by the
detection of released pyrophosphate (PPi); and (iv) single molecule sequencing with
exonuclease (exonuclease digestion of a single molecule composed of a single strand of
fluorescently labelled deoxynucleotides). Each method is briefly described, the current
literature is covered, and the advantages, disadvantages, and most suitable applications of
each method are discussed.

* Author to whom correspondence should be addressed.
Tel.: 55 51 3316 7618; Fax: 55 51 3316 7003; E-mail: tarso@orion.ufrgs.br
`
`2. Introduction
DNA sequencing techniques are key tools in many fields. A large number of different sciences
are receiving the benefits of these techniques, including archaeology, anthropology,
genetics, biotechnology, molecular biology, and forensic sciences, among others. A silent and
remarkable revolution is under way in many disciplines; DNA sequencing is promoting new
discoveries that are revolutionizing the conceptual foundations of many fields. At the same
time new and very important issues are emerging with these developments, such as bioethical
questions and questions related to public health and safety.

In this review we will follow the chronological development of the methods. We will start
in Section 3 with the methods developed by Sanger and his collaborators in the 1970s. The
Maxam & Gilbert method and other chemical methods are reviewed in Section 4. The PPi
method, based on detection of PPi released on nucleotide incorporation during chain
extension by polymerase, is reviewed in Section 5. The methods based on single molecule
detection are reviewed in Section 6. Finally, the concluding remarks are given in Section 7.
`
`3. Sanger’s method and other enzymic methods
The first method described by Sanger and Coulson for DNA sequencing was called ‘plus and
minus’ (Sanger & Coulson, 1975). This method used Escherichia coli DNA polymerase I and
DNA polymerase from bacteriophage T4 (Englund, 1971, 1972) with different limiting
nucleoside triphosphates. The products generated by the polymerases were resolved by
ionophoresis on acrylamide gels. Due to the inefficacy of the ‘plus and minus’ method, 2 years
later Sanger and his co-workers described a new breakthrough method for sequencing
oligonucleotides via enzymic polymerization (Sanger et al. 1977). This method, which would
revolutionize the field of genomics in the years to come, was initially known as the
chain-termination method or the dideoxynucleotide method. It consisted of a catalysed enzymic
reaction that polymerizes the DNA fragments complementary to the template DNA of
interest (unknown DNA). Briefly, a ³²P-labelled primer (short oligonucleotide with a
sequence complementary to the template DNA) was annealed to a specific known region on
the template DNA, which provided a starting point for DNA synthesis. In the presence of
DNA polymerases, catalytic polymerization of deoxynucleoside triphosphates (dNTP) onto
the DNA occurred. The polymerization was extended until the enzyme incorporated a
modified nucleoside [called a terminator or dideoxynucleoside triphosphate (ddNTP)] into
the growing chain.

This method was performed in four different tubes, each containing the appropriate
amount of one of the four terminators. All the generated fragments had the same 5’-end,
whereas the residue at the 3’-end was determined by the dideoxynucleotide used in the
reaction. After all four reactions were completed, the mixture of different-sized DNA
fragments was resolved by electrophoresis on a denaturing polyacrylamide gel, in four
parallel lanes. The pattern of bands showed the distribution of the termination in the
synthesized strand of DNA and the unknown sequence could be read by autoradiography.
For a better understanding of the Sanger reaction, see Fig. 1. The enzymic method for DNA
sequencing has been used for genomic research as the main tool to generate the fragments
necessary for sequencing, regardless of the sequencing strategy. Two different approaches,
shotgun and primer walking sequencing, are the most used (Griffin & Griffin, 1993). The
main aspects of each strategy are described below in more detail.
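
The four-tube logic of the chain-termination method can be illustrated with a small simulation. The sketch below is only a toy model under stated assumptions: the template string lists bases in the order the polymerase meets them after the primer, every ddNTP has the same hypothetical stop probability `p_stop`, and the function names are invented for this illustration. It shows how sorting the bands of the four lanes by fragment length gives the sequence of the newly synthesized strand.

```python
import random

# Base added by the polymerase opposite each template base.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sanger_fragments(template, terminator, n_molecules=2000, p_stop=0.2, seed=0):
    """Simulate one terminator tube; return the set of fragment lengths produced.

    `template` lists the template bases in the order the polymerase meets them,
    starting right after the primer. Lengths exclude the primer itself.
    `p_stop` is a hypothetical per-site termination probability.
    """
    rng = random.Random(seed)
    lengths = set()
    for _ in range(n_molecules):
        for position, template_base in enumerate(template, start=1):
            # A ddNTP can only compete where its base is the one being added.
            if COMPLEMENT[template_base] == terminator and rng.random() < p_stop:
                lengths.add(position)  # chain terminated at this length
                break
    return lengths


def read_gel(lanes):
    """Read the bands of the four lanes from the shortest to the longest fragment."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)


if __name__ == "__main__":
    template = "TACGGATC"
    lanes = {base: sanger_fragments(template, base) for base in "ACGT"}
    print(read_gel(lanes))  # sequence of the newly synthesized strand
```

Each lane holds only the lengths at which its own terminator was incorporated, so no single lane reveals the sequence; the read-out comes from interleaving the four lanes by size, as on the autoradiograph.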
`
`3.1 Random approach
`
Also known as shotgun sequencing, this is a random process because there is no control of
the region that is going to be sequenced, at least in the usual procedures (there are exceptions;
for instance, see the procedure described by Lander et al. 2001). Genomic DNA is randomly
fragmented (by sonication, nebulization, or other scission methods) into smaller pieces,
normally ranging from 2 to 3 kb. The fragments, inserted into a vector, are replicated in a
bacterial culture. Several positive amplifications are selected, and the DNA is extensively
sequenced. Due to the random nature of this process, the sequences generated overlap in
many regions (Adams et al. 1996). The process of overlaying or alignment of the sequences
is called sequence assembly. Shotgun sequencing normally produces a high level of
redundancy (the same base is sequenced 6-10 times, in different reactions), which affects the
total cost. A new variation of the method introduced by Venter et al. (1996) involved
shotgunning a whole genome at once. This strategy depended enormously on computational
resources to align all generated sequences. However, the efforts were rewarded with the
sequencing of the Haemophilus influenzae genome in only 18 months (Fleischmann et al. 1995)
and, more recently, the human genome (Venter et al. 2001).

Shotgun sequencing is well established, with ready availability of optimized cloning
vectors, fluorescently labelled universal primers, and software for base calling and sequence
assembly. The whole process has a high level of automation, from the cloning of the vectors
and colony selection to the bases called. A simplified diagram of the shotgun process is
shown in Fig. 2. Although the random approach is fully compatible with automation,
it can produce gaps in the sequence that can only be completed by direct sequencing of the
region.
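
The idea of sequence assembly from overlapping reads, and the way gaps arise, can be sketched with a toy greedy assembler. This is not the algorithm of any particular assembly program. It assumes error-free reads from a single strand, and the function names and the `min_len` threshold are hypothetical. It repeatedly merges the pair of reads with the longest suffix-prefix overlap and stops when no overlaps remain.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge reads greedily by longest overlap; return the resulting contigs."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads fully contained in another read.
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: the remaining contigs are separated by gaps
        merged = reads[best_i] + reads[best_j][best_k:]
        rest = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        # Keep the merged contig and any read it does not already contain.
        reads = list(dict.fromkeys([r for r in rest if r not in merged] + [merged]))
    return reads


if __name__ == "__main__":
    print(greedy_assemble(["ATGCGTAC", "GTACGTTA", "CGTTAGC"]))
    print(greedy_assemble(["AAACCC", "CCCGGG", "TTTTAT"]))
```

In the second call one read shares no overlap with the others, so two contigs are returned. This is the situation in which a gap has to be closed by direct sequencing of the region.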
`
`3.2 Direct approach
`
The other approach for genomic sequencing is the direct sequencing of unknown DNA
flanked by sites in which the sequence is known. For example, an unknown sequence of DNA
is inserted into a vector and amplified. The first sequencing reaction is performed using
primers that hybridize to the vector sequence and polymerize the strand complementary to
the template. A second priming site is then chosen inside the newly generated sequence,
following the same direction as the first one. This approach is known as primer walking
(Studier, 1989; Martin-Gallardo et al. 1992), and its major advantage is the reduced
redundancy (Voss et al. 1993) because of the direct nature of the approach (as opposed to
random), as seen in Fig. 3. However, it requires the synthesis of each new primer, which, in
the past, was time consuming and expensive, especially when dye-labelled primers were used.

[Figure 1: (a) primer annealed to the template and extended by polymerization into the
polymerized strand; (b) four-colour electropherogram with the called bases above the peaks.]

Fig. 1. Schematic representation of a sequencing process (‘four-colour Sanger’): starting from many
copies of the ssDNA to be sequenced, bearing a known ‘marker’ at the beginning of the unknown
sequence, a short oligonucleotide ‘primer’ complementary to this marker is hybridized (i.e. paired) to
the marker, in the presence of DNA polymerase and free nucleotides. This hybridization initiates
reconstruction by the polymerase of a single strand complementary to the unknown sequence (a).
Including in the nucleotide bath in which the polymerization takes place a small fraction of fluorescently
labelled dideoxynucleotides (one different dye for each nucleotide type), which lack the OH group
necessary for further extension of the strand, one is able to synthesize at random complementary strands
with all possible stop points (i.e. all possible lengths with an integer number of nucleotides). These
newly synthesized ssDNAs are then separated by size electrophoretically [see electropherogram in (b)]:
consecutive peaks correspond to DNA fragments differing by one base, and each line corresponds to
one given nucleotide. Automated analysis of the data allows the determination of the sequence (symbols
above the peaks). The symbol N indicates ambiguous determination. In the present case, the sequence
was faultless up to 435 bases. (Reproduced from Viovy, 2000.)

[Figure 2: flow diagram. Genomic DNA; sonication or enzymic cut; sizing (2-3 kb); cut and
ligation; infection of host cells; cell growth; colony picking; purified DNA; massive
sequencing (6-8× redundancy); sequence assembly programs.]

Fig. 2. Random sequencing approach or shotgun. The distinct processes involve first fragmentation of
the DNA into the 2-3 kbp range; fragments are then cloned into vectors and introduced into host cells for
amplification. After purification, the DNA from individual colonies is sequenced, and the results are
lined up with sequence-assembly programs.

[Figure 3: flow diagram. Genomic DNA; enzymic cut with specific enzyme; ligation to a cosmid;
cell culture and growth or long-range PCR; first, second and third walks along the unknown
DNA; generated sequence (~2-4× redundancy).]

Fig. 3. DNA sequencing by the primer-walking strategy. In primer walking, the genomic DNA is cut
into a large piece (~40 kbp) and inserted into a cosmid for growth. The sequencing is performed by
walks, starting first from the known region of the cosmid. After the results from the first round are
edited, a new priming site is located within the newly generated sequence. This procedure is repeated
until the walks reach the opposite starting points.

Some alternatives were introduced to overcome the problems of time and cost
(Ruiz-Martinez et al. 1996). Although slightly different, these approaches shared the same idea of
using a short oligonucleotide library as a means to create a longer primer. The number of all
sequences possible for an oligonucleotide with n bases is equal to 4^n (for n = 6, 4^6 = 4096).
It was proposed by Kieleczawa et al. (1992) that a hexamer library containing 4096
oligonucleotides could be cost effective. While each new 18-mer primer is used only once for
each new reaction site (uniqueness is a requirement to avoid false priming), a 6-mer can be
employed in many priming sites at different positions.

Using such short oligonucleotides leads to the possibility of mispriming since uniqueness
is reduced with the reduced size of the oligonucleotide. For example, the use of three small
oligonucleotides could result in several sites where one or two of them could hybridize to the
template and initiate mispriming. To avoid this situation, a single-stranded DNA-binding
protein (SSB) (Kieleczawa et al. 1992), or the stacking effects of selected modular primers
(Kotler et al. 1993) were used.
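
The trade-off between library size and priming uniqueness can be made concrete with a short calculation. The sketch below assumes a random template with equiprobable bases and counts one strand only; this idealization and the function names are illustrative assumptions. The 40 kbp template length is taken from the cosmid insert size quoted in Fig. 3.

```python
def library_size(n):
    """Number of distinct oligonucleotides of length n (four bases per position)."""
    return 4 ** n


def expected_sites(n, template_length):
    """Expected number of chance perfect matches of one given n-mer on one strand.

    Assumes a random template with equiprobable, independent bases.
    """
    windows = max(template_length - n + 1, 0)  # possible start positions
    return windows / 4 ** n


if __name__ == "__main__":
    for n in (6, 18):
        print(n, library_size(n), expected_sites(n, 40_000))
```

Under these assumptions a given hexamer is expected to match a 40 kbp template at roughly ten positions by chance, whereas a given 18-mer is expected to match essentially nowhere except its intended site. This is why short library primers need additional measures against mispriming.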
Nowadays, the appeal of a cost-effective and time-saving method that uses small
oligonucleotide libraries has disappeared with improvements in primer synthesis technology
(Lashkari et al. 1995). However, the demand for a sequencing method able to provide
long read-length (number of bases read per run), short analysis time, low cost, and high
accuracy has led to several modifications of the original Sanger method. In addition to several
improvements in the procedures and in the reagents used in the sequencing reaction, further
development in DNA separation technology was of paramount importance for the
completion of the Human Genome Project. Several of the improvements that have been made
in each step of enzymic DNA sequencing are described below.
`
`3.3 Enzyme technology
`
Improvements in DNA polymerase enzymes have greatly contributed to the quality of the
sequencing reactions and sequencing data. Initially, isothermal DNA polymerases were used
in manual and automated DNA sequencing (Tabor & Richardson, 1987; Tabor et al. 1987).
The reactions were performed at physiological temperatures (~37 °C) for a few minutes
(~20 min). These enzymes (T4 or T7 DNA polymerases) evenly incorporated all four
terminators, even the dye-labelled ones. The problem with these polymerases was that they
were very sensitive to temperature and easily deactivated.
With the discovery of the polymerase chain reaction (PCR) and the use of a heat-stable
DNA polymerase from Thermus aquaticus (Taq polymerase), it became possible to perform
sequencing reactions (cycle-sequencing) with smaller amounts of DNA template than isothermal
enzymes required (Mullis et al. 1986; Mullis & Faloona, 1987). The major drawback
of cycle-sequencing using Taq polymerase was that the enzyme discriminated between the two
substrates, incorporating dNTPs much more readily than ddNTPs. A single substitution of one
amino acid in the primary sequence of the enzyme removed this effect, and the rate of ddNTP
incorporation was substantially equalized to that of dNTPs (Tabor & Richardson, 1995).
Many other enzymes are available for PCR and cycle sequencing. PCR enzymes require an
extra feature, namely 3’- to 5’-exonuclease activity. This feature is called the proof-reading
ability of the enzyme, i.e. its ability to correct mistakes made during incorporation of the
nucleotides. For cycle-sequencing, this activity must be suppressed to avoid uninterpretable
data.

Although largely improved, there was still significant variation in peak intensity for
fluorescently labelled dye-terminators. The pattern of the termination was reproducible and
predictable (Parker et al. 1996), but this variation made automatic base calling difficult. A few
years later, one of the major suppliers of fluorescent sequencing kits introduced a modified
set of fluorescent labels for ddNTPs. With this new dye-terminator kit, the signal was more
even, and automated base calling improved significantly (see Section 3.5.3).
`
`3.4 Sample preparation
`
The methodology for sample preparation often included the following steps: (i) DNA
scission and cloning into a vector (e.g. M13 or M13mp18); (ii) vector amplification to
produce a phage-infected culture; and (iii) purification from the cell culture to yield pure
single-stranded (ss)DNA template (Martin & Davies, 1986), as illustrated in Figs 2 and 3.
Among the strategies used to generate random fragments are: deletions
generated by transposons (Ahmed, 1984), production of subclones by sonication of the DNA
(Deininger, 1983), and enzymic digestion, including restriction enzymes (Messing, 1983),
DNase (Anderson, 1981), exonuclease III (Henikoff, 1984) and T4 DNA polymerase resection
clones (Dale et al. 1985).
An alternative strategy for sequencing projects on a large scale that involved procedures
for amplification, purification, and selection of the M13 template was described by Beck &
Alderton (1993). The main innovation in the amplification step was the use of the PCR. For
the purification step, a large number of systems that used agarose were commercially
available. However, these systems were both expensive and time consuming, and used
considerable quantities of PCR products. Several methodologies for purification of PCR
products have been described; among them is a technique that uses exonuclease I and shrimp
alkaline phosphatase to degrade the excess primers and non-incorporated nucleotides, the
main factors interfering in the sequencing reactions (Werle et al. 1994). Another method for
purification of the fragments generated in the PCR was based on precipitation by isopropyl
alcohol (Hogdall et al. 1999). This method is inexpensive, fast, and efficient for PCR
fragments of any length.
In another methodology for sequencing PCR products, a template generated by PCR using
a biotinylated forward primer and a non-biotinylated reverse primer has been used (Van den
Boom et al. 1997). The non-purified product was submitted to dye-terminator
cycle-sequencing using the same primers as used for the PCR. The authors enhanced the
probability of the extension reaction by employing a second DNA polymerase, which is
insensitive to the ddNTP concentration needed for sequencing. This results in a combined
amplification and sequencing reaction in a single tube, owing to the two DNA polymerases
with differential incorporation rates for dideoxynucleotides (Van den Boom et al. 1998).
Another method for directly sequencing from PCR products was suggested and is based
on the substitution of the chain-terminator by chain-delimiters (Porter et al. 1997). In this case
it was demonstrated that boranophosphates (dNTᵇP: 2’-deoxynucleoside-5’-α-[P-borano]-triphosphate)
were convenient for use as delimiters for direct PCR sequencing (Fig. 4). The
boranophosphates were heat stable, therefore they could be incorporated into DNA by PCR
and, once incorporated, they blocked the action of the exonucleases. After incorporation, the
boranophosphate positions can be revealed by digestion with an exonuclease, thus generating
a series of fragments with borane at the end. The resulting fragments were separated by gel
electrophoresis as in a standard sequencing reaction.
Finally, the widely used method of plasmid-based amplification in E. coli followed by
alkaline lysis was originally described by Birnboim & Doly (1979). In fact, most of the
column preparations currently being sold for DNA isolation involve a technique based
on this work.
`
3.5 Labels and DNA labelling
`
`3.5.1 Radioisotopes
The enzymic method, when it was first described, used ³²P as a label. Biggin et al. (1983)
proposed the use of deoxyadenosine 5’-(α-[³⁵S]thio)triphosphate as the label incorporated
into the DNA fragments. This strategy resulted in an increase in band sharpness on
autoradiography as well as in the resolution of band separation.
`
`3.5.2 Chemiluminescent detection
As an alternative to radioisotopes, a method based on chemiluminescence detection with the
biotin-streptavidin system has been used (Beck et al. 1989; Gillevet, 1990; Olesen et al. 1993;
Cherry et al. 1994). In this system, the 5’-end of an oligonucleotide linked to biotin was used
as the primer in the sequencing reaction. The enzyme alkaline phosphatase was bound to the
5’-end of the oligonucleotide by a streptavidin conjugate. The enzyme catalysed a luminescent
reaction (Fig. 5) and the emitted photons could be detected by a photographic film. There
are at least three advantages to this method: first, the sequencing reactions were obtained
directly from the PCR products; secondly, this method did not require cloning of the DNA
before sequencing (Douglas et al. 1993; Debuire et al. 1993); and thirdly, it was possible to
multiplex several reactions on the same gel and detect one at a time with appropriate
enzyme-linked primers (Gillevet, 1990).

[Figure 4: chemical structure of a nucleoside triphosphate bearing a borano group on the
α-phosphate; N marks the base.]

Fig. 4. Structure of 2’-deoxynucleoside-5’-α-[P-borano]-triphosphate (dNTᵇP). N = adenine, cytosine,
guanine or thymine. (Reproduced from Porter et al. 1997.)

[Figure 5: diagram. DNA chain with 3’-OH end and biotinylated primer bound to a solid support;
streptavidin; biotin; biotinylated alkaline phosphatase; substrate (a), (b); (a) colour; (b) light.]

Fig. 5. Schematic diagram for the colorimetric (a) or chemiluminescent (b) detection of immobilized
DNA using an enzyme-catalysed reaction. (Reproduced from Beck et al. 1989.)
`
`3.5.3 Fluorescent dyes
`
Although the Sanger method was fast and convenient, it still suffered from the use of
radioisotopic detection, which was slow and potentially risky. Additionally, it required four
lanes to run one sample because the label was the same for all reactions. To overcome such
problems, Smith et al. (1986) developed a set of four different fluorescent dyes that allowed
all four reactions to be separated in a single lane. The authors used the following fluorophore
groups: fluorescein, 4-chloro-7-nitrobenzo-2-oxa-1-diazole (NBD), tetramethyl-rhodamine, and
Texas Red (Smith et al. 1985, 1986), whose spectral properties are shown in Table 1.

Each of the four dyes was attached to the 5’-end of the primer and each labelled primer
was associated with a particular ddNTP. For example, the fluorescein-labelled primer reaction
was terminated with ddCTP (dideoxycytidine triphosphate), the tetramethyl-rhodamine-labelled
primer reaction with ddATP (dideoxyadenosine triphosphate) and so on. All four
reactions were then combined and introduced onto a slab gel in a single lane. The bands were
detected upon excitation of the fluorescent moiety attached to the DNA with a laser beam at
the end of the gel. The fluorescent light was separated by means of four different coloured
filters. After the 4-colour data was generated, the sequence read-out was straightforward,
with the association of each colour to one base only.

Table 1. Spectral properties of some fluorophores used in automated DNA sequencing

Dye                                            Absorption      Emission        Emission
                                               maximum (nm)    maximum (nm)    FWHM* (nm)
Fluorescein                                    493             516             60
4-Chloro-7-nitrobenzo-2-oxa-1-diazole (NBD)    475             540             79
Tetramethyl-rhodamine                          556             582             52
Texas Red                                      599             612             42

* FWHM, full width at half maximum.
DNA sequencing in slab gels with fixed-point fluorescence detection then became
‘automated DNA sequencing’ rather than ‘manual DNA sequencing’, which required
exposure of the whole slab gel to a photographic plate for a fixed time and post-analysis
detection (Griffin & Griffin, 1993; Adams et al. 1996). Automated DNA sequencing has been
performed via two different labelling protocols. The first used a set of four fluorescent labels
attached at the 5’-end of the primer, as described earlier. In the second method, the
fluorescent moiety was linked to the ddNTP terminators, allowing the synthesis of all four
ladders in a single vial. In the latter case, when the labelled ddNTP was incorporated, the
enzyme terminated the extension at the same time as the ladder was labelled. Thus the
C-terminated ladder contained one fluorescent dye, and the G-, A-, and T-terminated ladders
had their own respective labels. The protocols are known as dye-labelled primer chemistry
and dye-labelled terminator chemistry, respectively, and both labelling arrangements are
shown in Fig. 6.
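
Because each colour is associated with one base only, the read-out from four-colour data can be sketched as a per-peak decision. The example below is a deliberately simplified illustration, not a description of any commercial base caller: it uses the dye-to-base assignment given in Fig. 6 (FAM-A, JOE-C, TAMRA-G, ROX-T), assumes peaks have already been detected and reduced to one intensity per channel, and uses a hypothetical `min_ratio` threshold to emit N for ambiguous peaks. Practical base callers would also need to handle spectral overlap between dyes and dye-dependent mobility shifts, which are ignored here.

```python
# Dye-to-base assignment as listed in Fig. 6 (F-ddATP, J-ddCTP, T-ddGTP, R-ddTTP).
DYE_TO_BASE = {"FAM": "A", "JOE": "C", "TAMRA": "G", "ROX": "T"}


def call_base(intensities, min_ratio=2.0):
    """Call one peak from its four channel intensities.

    `intensities` maps dye name to signal. `min_ratio` is a hypothetical
    threshold: the strongest channel must exceed the runner-up by this factor,
    otherwise the peak is reported as N (ambiguous).
    """
    ranked = sorted(intensities.items(), key=lambda item: item[1], reverse=True)
    (top_dye, top_signal), (_, second_signal) = ranked[0], ranked[1]
    if top_signal <= 0 or top_signal < min_ratio * second_signal:
        return "N"
    return DYE_TO_BASE[top_dye]


def call_sequence(peaks, min_ratio=2.0):
    """Call every peak in migration order (shortest fragment first)."""
    return "".join(call_base(peak, min_ratio) for peak in peaks)


if __name__ == "__main__":
    peaks = [
        {"FAM": 90, "JOE": 5, "TAMRA": 3, "ROX": 4},
        {"FAM": 4, "JOE": 6, "TAMRA": 80, "ROX": 2},
        {"FAM": 40, "JOE": 35, "TAMRA": 2, "ROX": 1},  # two channels compete
        {"FAM": 1, "JOE": 2, "TAMRA": 3, "ROX": 70},
    ]
    print(call_sequence(peaks))
```

The third peak has two channels of similar strength, so it is reported as N rather than forced to a base.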
Alternative dyes were synthesized and linked to an M13 sequencing primer via a
sulphydryl group and conjugated with tetramethyl-rhodamine iodoacetamide (Ansorge et al.
1986). This alternative used tetramethyl-rhodamine as the only fluorophore because of its
high extinction coefficient, high quantum yield, and long wavelength of absorption (λ_abs =
560 nm, λ_em = 575 nm, FWHM = 52 nm). One year later, the same group proposed a
sulphydryl-containing M13 sequencing primer end-labelled with fluorescein iodoacetamide
(Ansorge et al. 1987). Other dyes commonly linked to the primers include carboxyfluorescein
(FAM), carboxy-4’,5’-dichloro-2’,7’-dimethoxyfluorescein (JOE), carboxytetramethyl-rhodamine
(TAMRA) and carboxy-X-rhodamine (ROX) (Swerdlow & Gesteland, 1990; Karger et al.
1991; Carson et al. 1993). These dyes have emission spectra with their maxima relatively well
spaced, which facilitates colour/base discrimination. One drawback of this group of dyes was
the need for two wavelengths for excitation: one at 488 nm for FAM and JOE dyes, and
another at 543 nm for TAMRA and ROX dyes.

A different set of four base-specific succinylfluorescein dyes linked to chain-terminating
dideoxynucleotides was described (Prober et al. 1987). These dyes were
9-(carboxyethyl)-3-hydroxy-6-oxo-6H-xanthenes or succinylfluoresceins (SF-xxx, where xxx
represents the emission maximum in nanometres).
`
[Figure 6: reaction schemes. (a) Four separate reactions, each with dNTPs and one terminator:
A-reaction (FAM-labelled primer, ddATP), C-reaction (JOE-labelled primer, ddCTP), G-reaction
(TAMRA-labelled primer, ddGTP) and T-reaction (ROX-labelled primer, ddTTP); the reactions are
combined and run by electrophoresis in one lane. (b) Single vial reaction with dNTPs, F-ddATP,
J-ddCTP, T-ddGTP and R-ddTTP, followed by electrophoresis in one lane.]

Fig. 6. Comparison of reactions for dye-labelled primer (a) and dye-labelled terminator (b) chemistries.
Labelled primers require four separate reactions while labelled terminators require only one. F, FAM;
J, JOE; T, TAMRA; R, ROX.
`
Another modification in the original sequencing protocol used T7 DNA polymerase (or
Sequenase™) with unlabelled primers but with a strategy of internal labelling. This helped
to overcome ambiguous sequences that were occasionally observed (Wiemann et al. 1996). A
new set of dyes, dipyrrometheneboron difluoride fluorophores (BODIPY), were shown to
have better spectral characteristics than conventional rhodamine and fluorescein dyes. These
dyes also showed uniform electrophoretic mobility and high fluorescence intensity, and
consumed 30% less reagents per reaction than the conventional dyes (Metzker et al. 1996).
A new dye set for one-lane four-dye DNA sequencing, made up of fluorescent dyes with
similar absorption and emission spectra but different fluorescence lifetimes, has been described
(Müller et al. 1997). A different strategy, based on a series of near-IR fluorescent dyes with
an intramolecular heavy atom to alter the fluorescence lifetimes, was also suggested to
produce a set of dyes for one-lane DNA sequencing (Flanagan et al. 1998).
A significant advance in dye-primer chemistry was the introduction of energy transfer (ET)
dyes (Ju et al. 1995a, b). They consisted of two dyes per primer, one being a common donor
and the other an acceptor dye. The common donor can be either a fluorescein (FAM) or a
cyanine (Cy5) derivative (Hung et al. 1996) at the 5’-end. The second dye, the discriminating
one, is located about 10 bases along, with the separation between the dyes optimized for
energy-transfer efficiency and minimum electrophoretic mobility shifts. The four acceptors
are the commonly used ones in dye-primer chemistry: FAM, JOE, TAMRA and ROX (Ju
et al. 1995a). The major advantages of ET dyes are that they can be almost evenly excited by
a single wavelength (488 nm) and that the electrophoretic mobility shifts are minimal.¹
BODIPY dyes were used to produce similar ET primers offering narrower spectral
bandwidth and better quantum efficiency (Metzker et al. 1996). Since their introduction, ET
dyes have been widely used (Wang et al. 1995; Kheterpal et al. 1996, 1998). A new method
of constructing ET primers using a universal cassette of ET was also developed. This cassette
could be incorporated via conventional synthesis at the 5’-end of any primer sequence (Ju
et al. 1996), allowing this technology to be used in primer-walking projects.
No genome-sequencing project can be accomplished solely by the shotgun approach
and, eventually, some part of the sequence has to be generated by primer walking. Because
the synthesis of labelled primers is very expensive, dye-labelled terminator chemistry is the
system of choice in such cases. Impressive advances have also been made in this field. As
mentioned earlier, the first enzymes used in cycle-sequencing had severe problems in evenly
incorporating the labelled terminators. To improve the sequencing performance, besides all
modifications in the synthesis of the enzyme, significant changes in the dye structure were also
made. Conventional dye-terminator chemistry used rhodamine and fluorescein derivatives.
Depending on the enzyme used and on the sequence, these dyes showed a large variation in
peak height. In addition, they required two different excitation wavelengths
because the dyes that emitted fluorescence at longer wavelengths were poorly excited by the
argon ion laser (488 nm); therefore, an additional laser had to be used. In order to improve
the spectral features of such dye-terminators, dichlororhodamine derivatives were proposed
and tested for peak pattern and enzyme discrimination. A further improvement was achieved
with the concept of ET dyes, which was also successfully translated to the dye-terminator
protocol (Rosenblum et al. 1997; Lee et al. 1997). With this latest improvement, performing
cycle-sequencing with energy-transfer terminators became routine and results were of high
quality (Zakeri et al. 1998).
`
¹ Due to differences in charge and size, fluorescent dyes impart a differential migration pattern to the
DNA. The effect is most pronounced for small fragments (< 200 bases).

3.6 Fragment separation and analysis
Separation and analysis of DNA fragments generated by the Sanger method is a broad topic
and would be worthy of a review on its own. However, it is impossible to discuss the Sanger
method and DNA analysis without covering the important issues of electrophoresis and
electrophoretic separation of DNA-sequencing samples.

3.6.1 Electrophoresis
The separation of labelled DNA fragments by polyacrylamide gel electrophoresis has been
one of the greatest obstacles to complete automation of the enzymic DNA sequencing
method. Among the main problems are gel preparation, sample loading, and
post-electrophoresis gel treatment. However, a number of improvements in gel technology and
electrophoresis have occurred, including the use of thinner gels (Garoff & Ansorge, 1981;
Kostichka et al. 1992), gel gradient systems (Biggin et al. 1983), gel-to-plate binders, and the
employment of devices to avoid temperature-induced band distortions (Garoff & Ansorge,
1981). Although significant progress in enzymic DNA sequencing was made, relying solely
on slab gel technology was not enough to meet the challenges set by the Human
Genome Project. In fact, in 1998 less than 6% of the genome had been published in the
databases. The completion of the human genome was only possible due to several
technological advances offered by capillary electrophoresis (C