`For DNA Sequencing By Synthesis
`
`Lin Yu
`
`Submitted in partial fulfillment of the
`requirements for the degree
`of Doctor of Philosophy
`in the Graduate School of Arts and Sciences
`
`COLUMBIA UNIVERSITY
`
`2010
`
`Page a
`
`Illumina Ex. 1093
`IPR Petition - USP 10,435,742
`
`UMI Number: 3428688
`
`All rights reserved
`
`UMI 3428688
`Copyright 2010 by ProQuest LLC.
`All rights reserved. This edition of the work is protected against
`unauthorized copying under Title 17, United States Code.
`
`ProQuest LLC
`789 East Eisenhower Parkway
`P.O. Box 1346
`Ann Arbor, MI 48106-1346
`
`© 2010
`
`Lin Yu
`All Rights Reserved
`
`ABSTRACT
`
`Novel Strategies To Increase Read Length And Accuracy
`For DNA Sequencing By Synthesis
`
`Lin Yu
`
`
The completion of the Human Genome Project has increased the need for high-throughput DNA
sequencing technologies aimed at uncovering the genomic contributions to diseases. The DNA
sequencing by synthesis (SBS) approach has shown great promise as a new platform for
decoding the genome. This thesis focuses on the development and improvement of a
chip-based four-color DNA SBS platform using molecular engineering approaches. In this
approach, the four nucleotides (A, C, G, T) are modified as cleavable fluorescent nucleotide
reversible terminators (CF-NRTs) by tethering a cleavable fluorophore to the base and
capping the 3’-OH with a small, chemically reversible moiety, so that the nucleotide
analogues are still recognized as substrates by DNA polymerase. First, we explored the
potential of an azido-modified group for nucleotide modification. Based on our established
rationale for nucleotide reversible terminator (NRT) design, we synthesized a complete set
of NRTs capped at the 3’ position with an azidomethyl group (3’-O-N3-dATP, 3’-O-N3-dCTP,
3’-O-N3-dGTP, 3’-O-N3-dTTP). Testing and optimization showed that these NRTs were good
substrates for DNA polymerase. We then established optimal, DNA-compatible chemical
cleavage conditions for removing the azidomethyl group capping the 3’-OH of the nucleotide
analogues, allowing the next NRT to be incorporated in the subsequent polymerase reaction.
Next, we designed and synthesized two sets of azido-modified CF-NRTs for applications in
SBS. The four CF-NRTs of the first set (3’-O-N3-dNTP-azidomethylbenzoyl-fluorophores) were
capped at the 3’-OH with an azidomethyl group identical to that of the NRTs and contained a
substituted 2-azidomethylbenzoyl linker to tether the fluorophore. These CF-NRTs were used
to produce four-color de novo DNA sequencing data on a chip using our SBS approach. After
each sequencing cycle, both the fluorophores linked to the CF-NRTs and the 3’-azidomethyl
group on the DNA extension products generated by incorporating
3’-O-N3-dNTP-azidomethylbenzoyl-fluorophores were removed with a TCEP
[Tris(2-carboxyethyl)phosphine] cleavage solution. This one-step dual-cleavage process for
reinitiating the polymerase reaction increased the overall SBS efficiency. After confirming
the feasibility of using azido-modified CF-NRTs in SBS, we synthesized a second set of
CF-NRTs (3’-O-N3-dNTP-N3-fluorophores) to further improve and optimize the sequencing
process. During the incorporation stage of SBS, a mixture of CF-NRTs and NRTs was used to
simultaneously extend the primer strands of various linear target DNA templates. This led
to a more efficient DNA polymerase reaction, since the smaller 3’-O-N3-dNTPs were much
easier to incorporate. Moreover, primers extended with NRTs resembled nascent DNA strands,
bearing no trace of modification after cleavage of the 3’-azidomethyl capping group. After
the incorporation reaction, two separate capping steps, first with 3’-O-N3-dNTPs and then
with ddNTPs, were performed to synchronize all the templates on the surface. Without these
precautionary synchronization steps, mixed fluorescent signals would prevent identification
of the correctly incorporated nucleotide. Hence, we have addressed one of the key drawbacks
of SBS: base miscalling due to lagging signals. In addition, because both
3’-O-N3-dNTP-N3-fluorophores and 3’-O-N3-dNTPs are reversible terminators that allow each
base to be sequenced serially, they enabled accurate determination of homopolymeric regions
of DNA. Finally, we developed a novel template walking strategy to increase the read length
of DNA SBS. Template walking resets the sequencing start site by extending the sequencing
primer with three natural nucleotides and one NRT, so that the polymerase reaction pauses
temporarily when the NRT is incorporated. Once cleavage restores the 3’-OH group of the NRT
incorporated into the primer, the next walking cycle can be carried out, and walking
continues until the entire previously sequenced portion of the template has been skipped. We
demonstrated the integration of this template walking strategy into our four-color DNA SBS
platform by performing one round of SBS, four cycles of template walking, and then a second
round of SBS. Through this effort, we were able to sequence a linear DNA template in its
entirety, nearly doubling the read length of our previous sequencing results. We are also
taking advantage of the massive throughput of a next-generation sequencer based on our SBS
technology to conduct a digital gene expression study of the Aplysia central nervous system,
in an ongoing project that explores the molecular mechanism of long-term memory formation.
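The SBS cycle and the template walking scheme summarized above can be illustrated with a
toy model. The sketch below is a hypothetical simplification, not the protocol used in this
work: it assumes perfectly efficient incorporation and cleavage, ignores dyes and strand
orientation, and treats the template as a string read in the order the polymerase copies
it. The names `sbs_round` and `walk_cycle` and the example template are illustrative only.
Because each reversible terminator stops extension after one base, a homopolymer such as
AAAA is called as four separate bases.

```python
# Hypothetical toy model of four-color SBS with reversible terminators and
# template walking. It tracks only which base is added at each template
# position; chemistry, fluorescence, and strand orientation are not modeled.

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def sbs_round(template: str, start: int, cycles: int) -> tuple[str, int]:
    """Run up to `cycles` SBS cycles from primer position `start`.

    Each cycle adds exactly one terminator (its 3'-OH is capped), images it,
    then cleaves the dye and cap so the next cycle can proceed.
    Returns the base calls and the new primer end position.
    """
    calls = []
    pos = start
    for _ in range(cycles):
        if pos >= len(template):
            break  # reached the end of the template
        calls.append(COMPLEMENT[template[pos]])  # one base per cycle
        pos += 1  # cleavage restores the 3'-OH for the next cycle
    return "".join(calls), pos


def walk_cycle(template: str, pos: int, nrt_base: str) -> int:
    """One walking cycle with three natural dNTPs plus one NRT (`nrt_base`).

    The primer extends through every position whose complement is a natural
    nucleotide and pauses right after incorporating the NRT.
    Returns the new primer end position.
    """
    while pos < len(template):
        added = COMPLEMENT[template[pos]]
        pos += 1
        if added == nrt_base:
            break  # NRT incorporated: extension pauses until cleavage
    return pos


if __name__ == "__main__":
    template = "GATTTACCGTAGCT"  # hypothetical template, in synthesis order
    first, end = sbs_round(template, 0, 5)
    print(first, end)  # CTAAA 5

    # Hypothetical reset: a fresh primer starts again at position 0 and is
    # walked past the region already read, using the A-terminator each cycle.
    pos = 0
    while pos < end:
        pos = walk_cycle(template, pos, "A")
    second, _ = sbs_round(template, pos, 9)
    print(first + second)  # CTAAATGGCATCGA
```

In this toy model, the NRT chosen for each walking cycle determines how far the primer
advances. For example, `walk_cycle(template, 0, "G")` jumps straight to position 7 and skips
two bases that were never read, so the walking order has to be chosen with the previously
read sequence in mind.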
`
`
`
`
`Table of Contents
`
`
`
`List of Figures.............................................................................................................................................ix
`Acknowledgements..............................................................................................................................xvii
`Abbreviations and Symbols...............................................................................................................xix
Chapter 1: Introduction to DNA Sequencing Technologies..............................................1
`1.1 Introduction.............................................................................................................................1
`1.2 Background and Significance........................................................................................2
`1.2.1. Sanger dideoxynucleotide sequencing............................................................6
1.2.2. MALDI-TOF MS based DNA sequencing..........................................................9
`1.2.3. Pyrosequencing......................................................................................................13
`1.2.4. DNA sequencing by ligation..............................................................................15
`1.2.5. DNA sequencing by engineered nanopores...............................................17
`1.3 Conclusion..........................................................................................................................19
`1.4 References..............................................................................................................................20
`Chapter 2: Overview of DNA Sequencing by Synthesis Using Cleavable
`Fluorescent Nucleotide Reversible Terminators...............................................................25
`2.1 Introduction..........................................................................................................................25
`2.2 General Methodology for DNA Sequencing by Synthesis Using Cleavable
`Fluorescent Nucleotide Reversible Terminators..........................................................26
`2.3 Four color DNA Sequencing by Synthesis Using Cleavable Fluorescent
`Nucleotide Reversible Terminators...................................................................................29
`2.3.1. Overview..................................................................................................................29
`2.3.2. Design, synthesis, and characterization of cleavable fluorescent
`nucleotide reversible terminators.................................................................................32
`2.3.3. DNA chip construction........................................................................................35
`2.3.4. Four color sequencing by synthesis using cleavable fluorescent
`nucleotide reversible terminators.................................................................................37
`2.4 References..............................................................................................................................40
`Chapter 3: Exploration of a New Chemical Moiety for Nucleotide Reversible
`Terminator Modification in DNA Sequencing by Synthesis..........................................43
`3.1 Introduction..........................................................................................................................43
`3.2 Experimental Rationale and Overview......................................................................44
`3.3 Results and Discussion.....................................................................................................46
3.3.1. Design and synthesis of 3’-O-azidomethyl-dUTP-NH2, a model NRT
compound.................................................................................................................................46
3.3.2. Polymerase reaction using 3’-O-azidomethyl-dUTP-NH2 and
characterization by MALDI-TOF MS..............................................................................47
3.3.3. Cleavage reaction to restore 3’-OH of DNA extension product and its
optimization.............................................................................................................................50
3.3.4. Design, synthesis, and evaluation of a complete nucleotide reversible
terminator set: 3’-O-N3-dNTPs........................................................................................52
3.3.5. Continuous polymerase extension using 3’-O-modified NRTs and
characterization by MALDI-TOF mass spectrometry............................................54
3.4 Materials and Methods.....................................................................................................56
3.4.1. Synthesis of 3’-O-azidomethyl-dUTP-NH2, a model NRT compound..57
3.4.2. Polymerase reaction using 3’-O-azidomethyl-dUTP and
characterization by MALDI-TOF MS..............................................................................57
3.4.3. Cleavage reaction to restore 3’-OH of DNA extension product and its
optimization.............................................................................................................................58
3.4.4. Design, synthesis, and evaluation of a complete nucleotide reversible
terminator set: 3’-O-N3-dNTPs........................................................................................58
3.4.5. Continuous polymerase extension using 3’-O-modified NRTs and
characterization by MALDI-TOF mass spectrometry............................................60
3.5 Conclusion..............................................................................................................................61
3.6 References..............................................................................................................................61
Chapter 4: Design, Synthesis, and Evaluation of a Novel Class of Cleavable
Fluorescent Nucleotide Reversible Terminators Containing Substituted
2-Azidomethyl Benzoic Acid Linker for DNA Sequencing by Synthesis......................64
`4.1 Introduction..........................................................................................................................64
`4.2 Experimental Rationale and Overview......................................................................65
`4.3 Results and Discussion.....................................................................................................69
4.3.1. Synthesis of 3’-O-N3-dNTP-azidomethylbenzoyl-fluorophores............69
4.3.2. Polymerase single base extension and subsequent cleavage reactions
of 3’-O-N3-dUTP-azidomethylbenzoyl-NH2 in solution and characterization
by MALDI-TOF MS.................................................................................................................69
4.3.3. Polymerase extension of the complete set of
3’-O-N3-dNTP-azidomethylbenzoyl-fluorophores as reversible fluorescent nucleotide
terminators in solution and characterization by MALDI-TOF MS....................73
4.3.4. Four-color DNA sequencing on a chip using
3’-O-N3-dNTP-azidomethylbenzoyl-fluorophores (CF-NRTs) and unlabeled 3’-O-N3-dNTPs
(NRTs)........................................................................................................................................75
4.4 Materials and Methods.....................................................................................................80
4.4.1. Synthesis of 3’-O-N3-dNTP-azidomethylbenzoyl-fluorophores............80
4.4.2. Polymerase single base extension and subsequent cleavage reactions
of 3’-O-N3-dUTP-azidomethylbenzoyl-NH2 in solution and characterization
by MALDI-TOF MS.................................................................................................................81
4.4.3. Polymerase extension of the complete set of
3’-O-N3-dNTP-azidomethylbenzoyl-fluorophores as reversible fluorescent nucleotide
terminators in solution and characterization by MALDI-TOF MS....................82
4.4.4. Four-color DNA sequencing on a chip using
3’-O-N3-dNTP-azidomethylbenzoyl-fluorophores (CF-NRTs) and unlabeled 3’-O-N3-dNTPs
(NRTs)........................................................................................................................................83
4.5 Conclusion..............................................................................................................................84
4.6 References..............................................................................................................................86
Chapter 5: Four-color DNA Sequencing by Synthesis (SBS) Improvements using
Cleavable Fluorescent Nucleotide Reversible Terminators..........................................88
5.1 Introduction..........................................................................................................................88
5.2 Experimental Rationale and Overview......................................................................91
5.3 Results and Discussion.....................................................................................................93
5.3.1. Design and synthesis of cleavable fluorescent nucleotide reversible
terminators and 3’-O-modified NRTs for SBS...........................................................93
5.3.2. Polymerase extension using cleavable fluorescent nucleotide
reversible terminators in solution and their characterization by MALDI-TOF MS.........98
5.3.3. Four-color DNA sequencing by synthesis on a chip using cleavable
fluorescent nucleotide reversible terminators and 3’-O-modified NRTs......99
5.4 Materials and Methods...................................................................................................104
5.4.1. Design and synthesis of cleavable fluorescent nucleotide reversible
terminators and 3’-O-modified NRTs for SBS.........................................................104
5.4.2. Polymerase extension using cleavable fluorescent nucleotide
reversible terminators in solution and their characterization by MALDI-TOF MS.......105
5.4.3. Construction of a DNA Immobilized Chip of Multiple Linear Templates..........106
5.4.4. Four-color DNA sequencing by synthesis on a chip using cleavable
fluorescent nucleotide reversible terminators and 3’-O-modified NRTs....107
5.5 Conclusion............................................................................................................................108
5.6 References............................................................................................................................113
Chapter 6: Exploration of Novel Primer Resetting Strategies to Extend
Read-length for DNA Sequencing by Synthesis............................................................................117
6.1 Introduction........................................................................................................................117
6.2 Experimental Rationale and Overview....................................................................118
6.2.1. Strategy 1: template “walking” by unlabeled nucleotides.....................119
6.2.2. Strategy 2: template “walking” with universal bases..............................126
6.2.3. Strategy 3: multiple primer hybridization...................................................127
6.3 Results and Discussion...................................................................................................130
6.3.1. Template walking using three natural nucleotides and MALDI-TOF MS
characterization of walking products.........................................................................131
6.3.2. Template walking using three natural nucleotides and one NRT for
four-color DNA Sequencing by Synthesis..................................................................136
`6.4 Materials and Methods...................................................................................................142
6.4.1. Template walking using three natural nucleotides and MALDI-TOF MS
characterization of walking products.........................................................................143
6.4.2. Template walking using three natural nucleotides and one NRT for
four-color DNA Sequencing by Synthesis..................................................................145
6.5 Conclusion............................................................................................................................147
6.6 References............................................................................................................................149
Chapter 7: Massively Parallel Monitoring of Gene Expression in Aplysia Central
Nervous System (CNS) using Four-color DNA Sequencing by Synthesis..............150
`7.1 Introduction........................................................................................................................150
`7.2 Experimental Rationale and Overview....................................................................153
`7.3 Results and Discussion...................................................................................................155
`7.4 Materials and Methods...................................................................................................159
`7.4.1. Gene transcript analysis in Aplysia neuronal cells using Illumina
`Genome Analyzer.................................................................................................................160
`7.5 Conclusion............................................................................................................................160
`7.6 References............................................................................................................................161
`Chapter 8: Summary and Future Outlook...........................................................................163
`8.1 Exploration of a New Chemical Moiety for Nucleotide Reversible
`Terminator Modification in DNA Sequencing by Synthesis 1...........................163
8.2 Design, Synthesis, and Evaluation of a Novel Class of Cleavable
Fluorescent Nucleotide Reversible Terminators Containing Substituted
2-Azidomethyl Benzoic Acid Linker for DNA Sequencing by Synthesis 2........164
8.3 Four-color DNA Sequencing by Synthesis (SBS) Improvements Using
Cleavable Fluorescent Nucleotide Reversible Terminators 3...........................165
8.4 Exploration of Novel Primer Resetting Strategies to Extend Read Length
for DNA Sequencing by Synthesis 4..............................................................................166
8.5 Future Outlook for 4-color DNA Sequencing by Synthesis using CF-NRTs 5, 6.......166
8.6 References.......................................................................................................................168
`
`List of Figures
`
Fig. 1.1. Chemical structures of 2’-deoxyribonucleotide triphosphates. Each
nucleotide is composed of a base (adenine, guanine, cytosine, or thymine), a
sugar, and a phosphate group.............................................................................................3
Fig. 1.2. (A) A cartoon illustrating the double helical structure of DNA. Each
strand is supported by a sugar-phosphate backbone, and the two strands are held
together by hydrogen bonds in an antiparallel fashion (the 5’ end of one
strand aligns with the 3’ end of the other). (B) A figure depicting two
DNA strands held together by hydrogen bonds between paired bases. (C)
A more detailed chemical structure showing the hydrogen bonding between
bases...............................................................................................................................................4
Fig. 1.3. Scheme of DNA polymerase reaction. DNA synthesis takes place via the
addition of a nucleotide to the 3’-OH end of a DNA primer strand. The
base-pairing between the incoming nucleotide and the DNA template strand
dictates which nucleotide is added. DNA polymerase facilitates the addition
of the incoming nucleotide by catalyzing the formation of a phosphodiester
bond between the terminal 3’-OH group of the primer strand and the alpha
phosphorus atom of the nucleotide. A pyrophosphate (PPi) group is
released as a by-product.......................................................................................................6
Fig. 1.4. Chemical structures of 2’-deoxyribonucleotide (dNTP) and
2’,3’-dideoxyribonucleotide (ddNTP). Since ddNTPs lack the 3’-OH group,
which is necessary for DNA synthesis, they terminate further extension of
the DNA strand once incorporated...................................................................................7
Fig. 1.5. Sanger dideoxy chain-termination sequencing method. DNA fragments
are generated by extending the primer with a mixture of dNTPs and
ddNTPs. Upon incorporation of a ddNTP, the DNA strand ceases to
participate in the polymerase reaction due to the lack of the 3’-OH group. Thus,
a mixture of DNA strands of different lengths complementary to the
template DNA is produced. To determine the sequence of the template,
these DNA fragments are separated by size using electrophoresis, and
the resulting bands of DNA are detected by their fluorescent signals..............8
Fig. 1.6. Matrix-assisted laser desorption/ionization time-of-flight mass
spectrometry (MALDI-TOF MS). A mixture of analyte (e.g., DNA sequencing
fragments) and matrix molecules (blue) is spotted on the sample plate and
allowed to co-crystallize prior to loading into the vacuum chamber. After UV
laser irradiation, the desorbed and ionized analyte and matrix molecules
are accelerated by a constant electric potential, causing them to fly
towards the detector. The charged molecules arrive at the detector at
different times depending on their mass-to-charge ratios. Therefore, the masses
of the charged particles can be determined from their time-of-flight..........................................11
Fig. 1.7. DNA sequencing using MALDI-TOF MS. (A) Sanger sequencing fragments
generated using biotin-labeled ddNTPs. (B) Example of mass sequencing
spectrum using biotin-labeled ddNTPs........................................................................13
Fig. 1.8. General scheme of pyrosequencing. As the polymerase catalyzes the
incorporation of nucleotide(s) into a growing strand of DNA, PPi molecules
are released and then converted to ATP by ATP sulfurylase. The ATP
participates in the luciferase reaction, in which a luciferin molecule is
oxidized to produce oxyluciferin and light. The resulting luminescence can
be registered using a photon detector..........................................................................15
`Fig. 1.9. Scheme of DNA sequencing by ligation using degenerate nonamers......16
Fig. 1.10. 3-D rendering of α-hemolysin structure...........................................................17
`Fig. 1.11. Scheme for DNA sequencing using nanopore and theoretical plot of
`translocation time versus blockade current elicited by DNA strand..............18
Fig. 2.1. In the SBS approach, a chip is constructed with immobilized DNA
templates that are able to self-prime for initiating the polymerase reaction.
Four nucleotide analogues are designed such that each is labeled with a
unique fluorescent dye at a specific location on the base and carries a small
chemical group (R) that caps the 3’-OH group. Upon adding the four nucleotide
analogues and DNA polymerase, only the nucleotide analogue
complementary to the next nucleotide on the template is incorporated by
polymerase on each spot of the chip (step 1). After removing the excess
reagents and washing away any unincorporated nucleotide analogues, a
4-color fluorescence scanner is used to image the surface of the chip, and the
unique fluorescence emission from the specific dye on the nucleotide
analogue on each spot of the chip will yield the identity of the nucleotide
(step 2). After imaging, the small fraction of unreacted 3’-OH groups on the
self-primed template moiety will be capped by excess 3’-O-modified
nucleotide reversible terminators and DNA polymerase for synchronization, to
avoid interference with the next round of synthesis (step 3).
The dye moiety and the R protecting group will be removed to generate a
free 3’-OH group with high yield (step 4). The self-primed DNA moiety on
`the chip at this s