articles

Initial sequencing and analysis of the
human genome

International Human Genome Sequencing Consortium*

* A partial list of authors appears on the opposite page. Affiliations are listed at the end of the paper.
`The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution.
`Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human
`genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.
`
The rediscovery of Mendel's laws of heredity in the opening weeks of
the 20th century1-3 sparked a scientific quest to understand the
nature and content of genetic information that has propelled
biology for the last hundred years. The scientific progress made
falls naturally into four main phases, corresponding roughly to the
four quarters of the century. The first established the cellular basis of
heredity: the chromosomes. The second defined the molecular basis
of heredity: the DNA double helix. The third unlocked the informational
basis of heredity, with the discovery of the biological mechanism
by which cells read the information contained in genes and with
the invention of the recombinant DNA technologies of cloning and
sequencing by which scientists can do the same.
`The last quarter of a century has been marked by a relentless drive
`to decipher first genes and then entire genomes, spawning the field
`of genomics. The fruits of this work already include the genome
`sequences of 599 viruses and viroids, 205 naturally occurring
`plasmids, 185 organelles, 31 eubacteria, seven archaea, one
`fungus, two animals and one plant.
Here we report the results of a collaboration involving 20 groups
from the United States, the United Kingdom, Japan, France,
Germany and China to produce a draft sequence of the human
genome. The draft genome sequence was generated from a physical
map covering more than 96% of the euchromatic part of the human
genome and, together with additional sequence in public databases,
it covers about 94% of the human genome. The sequence was
produced over a relatively short period, with coverage rising from
about 10% to more than 90% over roughly fifteen months. The
sequence data have been made available without restriction and
updated daily throughout the project. The task ahead is to produce a
finished sequence, by closing all gaps and resolving all ambiguities.
Already about one billion bases are in final form and the task of
bringing the vast majority of the sequence to this standard is now
straightforward and should proceed rapidly.
`The sequence of the human genome is of interest in several
`respects. It is the largest genome to be extensively sequenced so far,
`being 25 times as large as any previously sequenced genome and
`eight times as large as the sum of all such genomes. It is the first
`vertebrate genome to be extensively sequenced. And, uniquely, it is
`the genome of our own species.
`Much work remains to be done to produce a complete finished
`sequence, but the vast trove of information that has become
`available through this collaborative effort allows a global perspective
`on the human genome. Although the details will change as the
`sequence is finished, many points are already clear.
• The genomic landscape shows marked variation in the distribution
of a number of features, including genes, transposable
elements, GC content, CpG islands and recombination rate. This
gives us important clues about function. For example, the developmentally
important HOX gene clusters are the most repeat-poor
regions of the human genome, probably reflecting the very complex
coordinate regulation of the genes in the clusters.
• There appear to be about 30,000-40,000 protein-coding genes in
the human genome, only about twice as many as in worm or fly.
However, the genes are more complex, with more alternative
splicing generating a larger number of protein products.
• The full set of proteins (the 'proteome') encoded by the human
genome is more complex than those of invertebrates. This is due in
part to the presence of vertebrate-specific protein domains and
motifs (an estimated 7% of the total), but more to the fact that
vertebrates appear to have arranged pre-existing components into a
richer collection of domain architectures.
• Hundreds of human genes appear likely to have resulted from
horizontal transfer from bacteria at some point in the vertebrate
lineage. Dozens of genes appear to have been derived from transposable
elements.
• Although about half of the human genome derives from transposable
elements, there has been a marked decline in the overall
activity of such elements in the hominid lineage. DNA transposons
appear to have become completely inactive and long-terminal
repeat (LTR) retroposons may also have done so.
• The pericentromeric and subtelomeric regions of chromosomes
are filled with large recent segmental duplications of sequence from
elsewhere in the genome. Segmental duplication is much more
frequent in humans than in yeast, fly or worm.
• Analysis of the organization of Alu elements explains the longstanding
mystery of their surprising genomic distribution, and
suggests that there may be strong selection in favour of preferential
retention of Alu elements in GC-rich regions and that these 'selfish'
elements may benefit their human hosts.
• The mutation rate is about twice as high in male as in female
meiosis, showing that most mutation occurs in males.
• Cytogenetic analysis of the sequenced clones confirms suggestions
that large GC-poor regions are strongly correlated with 'dark
G-bands' in karyotypes.
• Recombination rates tend to be much higher in distal regions
(around 20 megabases (Mb)) of chromosomes and on shorter
chromosome arms in general, in a pattern that promotes the
occurrence of at least one crossover per chromosome arm in each
meiosis.
• More than 1.4 million single nucleotide polymorphisms (SNPs)
in the human genome have been identified. This collection should
allow the initiation of genome-wide linkage disequilibrium
mapping of the genes in the human population.
`In this paper, we start by presenting background information on
`the project and describing the generation, assembly and evaluation
`of the draft genome sequence. We then focus on an initial analysis of
`the sequence itself: the broad chromosomal landscape; the repeat
`elements and the rich palaeontological record of evolutionary and
`biological processes that they provide; the human genes and
`proteins and their differences and similarities with those of other
`
© 2001 Macmillan Magazines Ltd

NATURE | VOL 409 | 15 FEBRUARY 2001 | www.nature.com
`
`SEQUENOM EXHIBIT 1101
`
`

`

Genome Sequencing Centres (Listed in order of total genomic
sequence contributed, with a partial list of personnel. A full list of
contributors at each centre is available as Supplementary
Information.)
`
Whitehead Institute for Biomedical Research, Center for Genome
Research: Eric S. Lander1*, Lauren M. Linton1, Bruce Birren1*,
Chad Nusbaum1*, Michael C. Zody1*, Jennifer Baldwin1,
Keri Devon1, Ken Dewar1, Michael Doyle1, William FitzHugh1*,
Roel Funke1, Diane Gage1, Katrina Harris1, Andrew Heaford1,
John Howland1, Lisa Kann1, Jessica Lehoczky1, Rosie LeVine1,
Paul McEwan1, Kevin McKernan1, James Meldrim1, Jill P. Mesirov1*,
Cher Miranda1, William Morris1, Jerome Naylor1,
Christina Raymond1, Mark Rosetti1, Ralph Santos1,
Andrew Sheridan1, Carrie Sougnez1, Nicole Stange-Thomann1,
Nikola Stojanovic1, Aravind Subramanian1
& Dudley Wyman1
`
The Sanger Centre: Jane Rogers2, John Sulston2*,
Rachael Ainscough2, Stephan Beck2, David Bentley2, John Burton2,
Christopher Clee2, Nigel Carter2, Alan Coulson2,
Rebecca Deadman2, Panos Deloukas2, Andrew Dunham2,
Ian Dunham2, Richard Durbin2*, Lisa French2, Darren Grafham2,
Simon Gregory2, Tim Hubbard2*, Sean Humphray2, Adrienne Hunt2,
Matthew Jones2, Christine Lloyd2, Amanda McMurray2,
Lucy Matthews2, Simon Mercer2, Sarah Milne2, James C. Mullikin2*,
Andrew Mungall2, Robert Plumb2, Mark Ross2, Ratna Shownkeen2
& Sarah Sims2
`
Washington University Genome Sequencing Center:
Robert H. Waterston3*, Richard K. Wilson3, LaDeana W. Hillier3*,
John D. McPherson3, Marco A. Marra3, Elaine R. Mardis3,
Lucinda A. Fulton3, Asif T. Chinwalla3*, Kymberlie H. Pepin3,
Warren R. Gish3, Stephanie L. Chissoe3, Michael C. Wendl3,
Kim D. Delehaunty3, Tracie L. Miner3, Andrew Delehaunty3,
Jason B. Kramer3, Lisa L. Cook3, Robert S. Fulton3,
Douglas L. Johnson3, Patrick J. Minx3 & Sandra W. Clifton3
`
US DOE Joint Genome Institute: Trevor Hawkins4,
Elbert Branscomb4, Paul Predki4, Paul Richardson4,
Sarah Wenning4, Tom Slezak4, Norman Doggett4, Jan-Fang Cheng4,
Anne Olsen4, Susan Lucas4, Christopher Elkin4,
Edward Uberbacher4 & Marvin Frazier4

Baylor College of Medicine Human Genome Sequencing Center:
Richard A. Gibbs5*, Donna M. Muzny5, Steven E. Scherer5,
John B. Bouck5*, Erica J. Sodergren5, Kim C. Worley5*,
Catherine M. Rives5, James H. Gorrell5, Michael L. Metzker5,
Susan L. Naylor6, Raju S. Kucherlapati7, David L. Nelson5
& George M. Weinstock8
`
RIKEN Genomic Sciences Center: Yoshiyuki Sakaki9,
Asao Fujiyama9, Masahira Hattori9, Tetsushi Yada9,
Atsushi Toyoda9, Takehiko Itoh9, Chiharu Kawagoe9,
Hidemi Watanabe9, Yasushi Totoki9 & Todd Taylor9

Genoscope and CNRS UMR-8030: Jean Weissenbach10,
Roland Heilig10, William Saurin10, Francois Artiguenave10,
Philippe Brottier10, Thomas Bruls10, Eric Pelletier10,
Catherine Robert10 & Patrick Wincker10
`
GTC Sequencing Center: Douglas R. Smith11,
Lynn Doucette-Stamm11, Marc Rubenfield11, Keith Weinstock11,
Hong Mei Lee11 & JoAnn Dubois11

Department of Genome Analysis, Institute of Molecular
Biotechnology: Andre Rosenthal12, Matthias Platzer12,
Gerald Nyakatura12, Stefan Taudien12 & Andreas Rump12

Beijing Genomics Institute/Human Genome Center:
Huanming Yang13, Jun Yu13, Jian Wang13, Guyang Huang14
& Jun Gu15

Multimegabase Sequencing Center, The Institute for Systems
Biology: Leroy Hood16, Lee Rowen16, Anup Madan16 & Shizen Qin16
`
Stanford Genome Technology Center: Ronald W. Davis17,
Nancy A. Federspiel17, A. Pia Abola17 & Michael J. Proctor17

Stanford Human Genome Center: Richard M. Myers18,
Jeremy Schmutz18, Mark Dickson18, Jane Grimwood18
& David R. Cox18

University of Washington Genome Center: Maynard V. Olson19,
Rajinder Kaul19 & Christopher Raymond19

Department of Molecular Biology, Keio University School of
Medicine: Nobuyoshi Shimizu20, Kazuhiko Kawasaki20
& Shinsei Minoshima20

University of Texas Southwestern Medical Center at Dallas:
Glen A. Evans21†, Maria Athanasiou21 & Roger Schultz21

University of Oklahoma's Advanced Center for Genome
Technology: Bruce A. Roe22, Feng Chen22 & Huaqin Pan22

Max Planck Institute for Molecular Genetics: Juliana Ramsey23,
Hans Lehrach23 & Richard Reinhardt23

Cold Spring Harbor Laboratory, Lita Annenberg Hazen Genome
Center: W. Richard McCombie24, Melissa de la Bastide24
& Neilay Dedhia24

GBF-German Research Centre for Biotechnology:
Helmut Blöcker25, Klaus Hornischer25 & Gabriele Nordsiek25
`
* Genome Analysis Group (listed in alphabetical order, also
includes individuals listed under other headings):
Richa Agarwala26, L. Aravind26, Jeffrey A. Bailey27, Alex Bateman2,
Serafim Batzoglou1, Ewan Birney28, Peer Bork29,30, Daniel G. Brown1,
Christopher B. Burge31, Lorenzo Cerutti28, Hsiu-Chuan Chen26,
Deanna Church26, Michele Clamp2, Richard R. Copley30,
Tobias Doerks29,30, Sean R. Eddy32, Evan E. Eichler27,
Terrence S. Furey33, James Galagan1, James G. R. Gilbert2,
Cyrus Harmon34, Yoshihide Hayashizaki35, David Haussler36,
Henning Hermjakob28, Karsten Hokamp37, Wonhee Jang26,
L. Steven Johnson32, Thomas A. Jones32, Simon Kasif38,
Arek Kaspryzk28, Scot Kennedy39, W. James Kent40, Paul Kitts26,
Eugene V. Koonin26, Ian Korf3, David Kulp34, Doron Lancet41,
Todd M. Lowe42, Aoife McLysaght37, Tarjei Mikkelsen38,
John V. Moran43, Nicola Mulder28, Victor J. Pollara1,
Chris P. Ponting44, Greg Schuler26, Jörg Schultz30, Guy Slater28,
Arian F. A. Smit45, Elia Stupka28, Joseph Szustakowki38,
Danielle Thierry-Mieg26, Jean Thierry-Mieg26, Lukas Wagner26,
John Wallis3, Raymond Wheeler34, Alan Williams34, Yuri I. Wolf26,
Kenneth H. Wolfe37, Shiaw-Pyng Yang3 & Ru-Fang Yeh31
`
Scientific management: National Human Genome Research
Institute, US National Institutes of Health: Francis Collins46*,
Mark S. Guyer46, Jane Peterson46, Adam Felsenfeld46*
& Kris A. Wetterstrand46; Office of Science, US Department of
Energy: Aristides Patrinos47; The Wellcome Trust: Michael J.
Morgan48
`
`
`

`

organisms; and the history of genomic segments. (Comparisons
are drawn throughout with the genomes of the budding yeast
Saccharomyces cerevisiae, the nematode worm Caenorhabditis
elegans, the fruitfly Drosophila melanogaster and the mustard weed
Arabidopsis thaliana; we refer to these for convenience simply as
yeast, worm, fly and mustard weed.) Finally, we discuss applications
of the sequence to biology and medicine and describe next steps in
the project. A full description of the methods is provided as
Supplementary Information on Nature's web site (http://www.nature.com).
`We recognize that it is impossible to provide a comprehensive
`analysis of this vast dataset, and thus our goal is to illustrate the
`range of insights that can be gleaned from the human genome and
`thereby to sketch a research agenda for the future.
`
`Background to the Human Genome Project
`
The Human Genome Project arose from two key insights that
emerged in the early 1980s: that the ability to take global views of
genomes could greatly accelerate biomedical research, by allowing
researchers to attack problems in a comprehensive and unbiased
fashion; and that the creation of such global views would require a
communal effort in infrastructure building, unlike anything previously
attempted in biomedical research. Several key projects
helped to crystallize these insights, including:
(1) The sequencing of the bacterial viruses ΦX1744,5 and lambda6, the
animal virus SV407 and the human mitochondrion8 between 1977
and 1982. These projects proved the feasibility of assembling small
sequence fragments into complete genomes, and showed the value
of complete catalogues of genes and other functional elements.
(2) The programme to create a human genetic map to make it
possible to locate disease genes of unknown function based solely on
their inheritance patterns, launched by Botstein and colleagues in
1980 (ref. 9).
(3) The programmes to create physical maps of clones covering the
yeast10 and worm11 genomes to allow isolation of genes and regions
based solely on their chromosomal position, launched by Olson and
Sulston in the mid-1980s.
(4) The development of random shotgun sequencing of complementary
DNA fragments for high-throughput gene discovery by
Schimmel12 and Schimmel and Sutcliffe13, later dubbed expressed
sequence tags (ESTs) and pursued with automated sequencing by
Venter and others14-20.
The idea of sequencing the entire human genome was first
proposed in discussions at scientific meetings organized by the
US Department of Energy and others from 1984 to 1986 (refs 21,
22). A committee appointed by the US National Research Council
endorsed the concept in its 1988 report23, but recommended a
broader programme, to include: the creation of genetic, physical
and sequence maps of the human genome; parallel efforts in key
model organisms such as bacteria, yeast, worms, flies and mice; the
development of technology in support of these objectives; and
research into the ethical, legal and social issues raised by human
genome research. The programme was launched in the US as a joint
effort of the Department of Energy and the National Institutes of
Health. In other countries, the UK Medical Research Council and
the Wellcome Trust supported genomic research in Britain; the
Centre d'Etude du Polymorphisme Humain and the French Muscular
Dystrophy Association launched mapping efforts in France;
government agencies, including the Science and Technology Agency
and the Ministry of Education, Science, Sports and Culture, supported
genomic research efforts in Japan; and the European Community
helped to launch several international efforts, notably the
programme to sequence the yeast genome. By late 1990, the Human
Genome Project had been launched, with the creation of genome
centres in these countries. Additional participants subsequently
joined the effort, notably in Germany and China. In addition, the
Human Genome Organization (HUGO) was founded to provide a
forum for international coordination of genomic research. Several
books24-26 provide a more comprehensive discussion of the genesis
of the Human Genome Project.
Through 1995, work progressed rapidly on two fronts (Fig. 1).
The first was construction of genetic and physical maps of the
human and mouse genomes27-31, providing key tools for identification
of disease genes and anchoring points for genomic sequence.
The second was sequencing of the yeast32 and worm33 genomes, as
`
[Figure 1 graphic: timeline from 1984 and 1990 to 2001. Recoverable labels: discussion and debate in scientific community; NRC report; bacterial genome sequencing (39 species); E. coli; S. cerevisiae, C. elegans, D. melanogaster and A. thaliana sequencing; for mouse and human: genetic maps (microsatellites, SNPs), physical maps, cDNA sequencing (ESTs, full length) and genomic sequencing (pilot sequencing; pilot project, 15%; draft, 90%; finishing, ~100%; chromosome 22, chromosome 21).]
`
Figure 1 Timeline of large-scale genomic analyses. Shown are selected components of
work on several non-vertebrate model organisms (red), the mouse (blue) and the human
(green) from 1990; earlier projects are described in the text. SNPs, single nucleotide
polymorphisms; ESTs, expressed sequence tags.
`
`
`

`

well as targeted regions of mammalian genomes34-37. These projects
showed that large-scale sequencing was feasible and developed the
two-phase paradigm for genome sequencing. In the first, 'shotgun',
phase, the genome is divided into appropriately sized segments and
each segment is covered to a high degree of redundancy (typically,
eight- to tenfold) through the sequencing of randomly selected
subfragments. The second is a 'finishing' phase, in which sequence
gaps are closed and remaining ambiguities are resolved through
directed analysis. The results also showed that complete genomic
sequence provided information about genes, regulatory regions and
chromosome structure that was not readily obtainable from cDNA
studies alone.
`In 1995, genome scientists considered a proposal38 that would
`have involved producing a draft genome sequence of the human
`genome in a first phase and then returning to finish the sequence in
`a second phase. After vigorous debate, it was decided that such a
`plan was premature for several reasons. These included the need first
`to prove that high-quality, long-range finished sequence could be
`produced from most parts of the complex, repeat-rich human
`genome; the sense that many aspects of the sequencing process
`were still rapidly evolving; and the desirability of further decreasing
`costs.
Instead, pilot projects were launched to demonstrate the feasibility
of cost-effective, large-scale sequencing, with a target completion
date of March 1999. The projects successfully produced
finished sequence with 99.99% accuracy and no gaps39. They also
introduced bacterial artificial chromosomes (BACs)40, a new large-insert
cloning system that proved to be more stable than the cosmids
and yeast artificial chromosomes (YACs)41 that had been used
previously. The pilot projects drove the maturation and convergence
of sequencing strategies, while producing 15% of the human
genome sequence. With successful completion of this phase, the
human genome sequencing effort moved into full-scale production
in March 1999.
The idea of first producing a draft genome sequence was revived
at this time, both because the ability to finish such a sequence was no
longer in doubt and because there was great hunger in the scientific
community for human sequence data. In addition, some scientists
favoured prioritizing the production of a draft genome sequence
over regional finished sequence because of concerns about commercial
plans to generate proprietary databases of human sequence
that might be subject to undesirable restrictions on use42-44.
The consortium focused on an initial goal of producing, in a first
production phase lasting until June 2000, a draft genome sequence
covering most of the genome. Such a draft genome sequence,
although not completely finished, would rapidly allow investigators
to begin to extract most of the information in the human sequence.
Experiments showed that sequencing clones covering about 90% of
the human genome to a redundancy of about four- to fivefold ('half-shotgun'
coverage; see Box 1) would accomplish this45,46. The draft
genome sequence goal has been achieved, as described below.
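To give a feel for what four- to fivefold versus eight- to tenfold redundancy means, the following is a minimal sketch under an idealized assumption that is not stated in the text: reads land uniformly at random, so the number of reads covering any base is approximately Poisson distributed with mean equal to the coverage. The function names are hypothetical, and the gap estimate ignores the minimum overlap needed to join two reads, so it should be read as a rough illustration rather than the consortium's own calculation.

```python
import math


def uncovered_fraction(coverage: float) -> float:
    """Expected fraction of bases hit by no read, assuming uniform random
    sampling (Poisson model with mean `coverage`)."""
    return math.exp(-coverage)


def expected_gaps(num_reads: int, coverage: float) -> float:
    """Rough expected number of gaps: each read is the last one before a gap
    with probability exp(-coverage). Ignores the minimum detectable overlap."""
    return num_reads * math.exp(-coverage)


if __name__ == "__main__":
    # Coverage levels mentioned in the text: half-shotgun (4-5x) and
    # full-shotgun (8-10x).
    for c in (4, 5, 8, 10):
        covered = 100 * (1 - uncovered_fraction(c))
        print(f"{c:>2}-fold coverage: {covered:.3f}% of bases expected to be covered")
```

Under this simplified model, fourfold coverage already leaves only about 2% of bases unsampled, which is consistent with the idea that a half-shotgun draft can expose most of the information in a clone, while the remaining gaps are left for the finishing phase.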
`The second sequence production phase is now under way. Its
`aims are to achieve full-shotgun coverage of the existing clones
`during 2001, to obtain clones to fill the remaining gaps in the
`physical map, and to produce a finished sequence (apart from
`regions that cannot be cloned or sequenced with currently available
`techniques) no later than 2003.
`
`Strategic issues
`
libraries with more uniform representation. The practice of sequencing
from both ends of double-stranded clones ('double-barrelled'
shotgun sequencing) was introduced by Ansorge and others37 in
1990, allowing the use of 'linking information' between sequence
fragments.
The application of shotgun sequencing was also extended by
applying it to larger and larger DNA molecules, from plasmids
(~4 kilobases (kb)) to cosmid clones37 (40 kb), to artificial chromosomes
cloned in bacteria and yeast55 (100-500 kb) and bacterial
genomes56 (1-2 megabases (Mb)). In principle, a genome of arbitrary
size may be directly sequenced by the shotgun method,
provided that it contains no repeated sequence and can be uniformly
sampled at random. The genome can then be assembled
using the simple computer science technique of 'hashing' (in which
one detects overlaps by consulting an alphabetized look-up table of
all k-letter words in the data). Mathematical analysis of the
expected number of gaps as a function of coverage is similarly
straightforward57.
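The 'hashing' idea can be sketched in a few lines. This is an illustrative toy, not the assembler used by the consortium: a Python dictionary stands in for the alphabetized look-up table of k-letter words, the function names are hypothetical, and the two reads are the short example fragments printed in the paper's hierarchical shotgun figure. It assumes error-free reads on the same strand and no repeated sequence.

```python
from collections import defaultdict

# Example reads (the two fragments shown in the hierarchical shotgun figure).
READ_A = "ACCGTAAATGGGCTGATCATGCTTAAA"
READ_B = "TGATCATGCTTAAACCCTGTGCATCCTACTG"


def build_kmer_index(reads, k):
    """Map every k-letter word to the (read index, offset) pairs where it occurs."""
    index = defaultdict(list)
    for read_id, read in enumerate(reads):
        for offset in range(len(read) - k + 1):
            index[read[offset:offset + k]].append((read_id, offset))
    return index


def suffix_prefix_overlap(left, right, k):
    """Length of the longest suffix of `left` that equals a prefix of `right`,
    found by looking up the first k letters of `right`; 0 if there is none."""
    index = build_kmer_index([left], k)
    best = 0
    for _, offset in index.get(right[:k], []):
        length = len(left) - offset
        if length <= len(right) and left[offset:] == right[:length]:
            best = max(best, length)
    return best


def merge(left, right, k):
    """Join two reads across their overlap, or return None if they do not overlap."""
    n = suffix_prefix_overlap(left, right, k)
    return left + right[n:] if n else None


if __name__ == "__main__":
    print(suffix_prefix_overlap(READ_A, READ_B, k=8))
    print(merge(READ_A, READ_B, k=8))
```

The look-up makes overlap detection cheap because only reads sharing a k-letter word are compared. The same property is what breaks down in repeat-rich genomes: a word that occurs in many places yields many candidate overlaps, most of them false.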
Practical difficulties arise because of repeated sequences and
cloning bias. Small amounts of repeated sequence pose little
problem for shotgun sequencing. For example, one can readily
assemble typical bacterial genomes (about 1.5% repeat) or the
euchromatic portion of the fly genome (about 3% repeat). By
contrast, the human genome is filled (>50%) with repeated
sequences, including interspersed repeats derived from transposable
elements, and long genomic regions that have been duplicated in
tandem, palindromic or dispersed fashion (see below). These
include large duplicated segments (50-500 kb) with high sequence
identity (98-99.9%), at which mispairing during recombination
creates deletions responsible for genetic syndromes. Such features
complicate the assembly of a correct and finished genome sequence.
There are two approaches for sequencing large repeat-rich
genomes. The first is a whole-genome shotgun sequencing
approach, as has been used for the repeat-poor genomes of viruses,
bacteria and flies, using linking information and computational
`
[Figure graphic: Hierarchical shotgun sequencing. Recoverable labels, in order: genomic DNA; BAC library; organized mapped large clone contigs; BAC to be sequenced; shotgun clones; shotgun sequence (...ACCGTAAATGGGCTGATCATGCTTAAA and TGATCATGCTTAAACCCTGTGCATCCTACTG...).]
`
Hierarchical shotgun sequencing
Soon after the invention of DNA sequencing methods47,48, the
shotgun sequencing strategy was introduced49-51; it has remained
the fundamental method for large-scale genome sequencing52-54 for
the past
