Useful Proteins from Recombinant Bacteria

Bacteria into which nonbacterial genes have been introduced are able to manufacture nonbacterial proteins. Among the proteins made by recombinant-DNA methods are insulin and interferon.

by Walter Gilbert and Lydia Villa-Komaroff

A living cell is a protein factory. It synthesizes the enzymes and other proteins that maintain its own integrity and physiological processes, and (in multicelled organisms) it often synthesizes and secretes other proteins that perform some specialized function contributing to the life of the organism as a whole. Different kinds of cells make different proteins, following instructions encoded in the DNA of their genes. Recent advances in molecular biology make it possible to alter those instructions in bacterial cells, thereby designing bacteria that can synthesize nonbacterial proteins. The bacteria are “recombinants.” They contain, along with their own genes, part or all of a gene from a human cell or other animal cell. If the inserted gene is one for a protein with an important biomedical application, a culture of the recombinant bacteria, which can be grown easily and at low cost, will serve as an efficient factory for producing that protein.

Many laboratories in universities and in an emerging “applied genetics” industry are working to design bacteria able to synthesize such nonbacterial proteins. A growing tool kit of “genetic engineering” techniques makes it possible to isolate one of the million-odd genes of an animal cell, to fuse that gene with part of a bacterial gene and to insert the combination into bacteria. As those bacteria multiply they make millions of copies of their own genes and of the animal gene inserted among them. If the animal gene is fused to a bacterial gene in such a way that a bacterium can treat the gene as one of its own, the bacteria will produce the protein specified by the animal gene. New ways of rapidly and easily determining the exact sequence of the chemical groups that constitute a molecule of DNA make it possible to learn the detailed structure of such “cloned” genes. After the structure is known it can be manipulated to produce DNA structures that function more efficiently in the bacterial cell.

In this article we shall first describe some of these techniques in a general way and then tell how we and our colleagues Argiris Efstratiadis, Stephanie Broome, Peter Lomedico and Richard Tizard applied them in our laboratory at Harvard University to copy a rat gene that specifies the hormone insulin, to insert the gene into bacteria and to get the bacteria to manufacture a precursor of insulin. In an exciting application of this technology Charles Weissmann and his colleagues at the University of Zurich recently constructed bacteria that produce human interferon, a potentially useful antiviral protein.

DNA, RNA and Proteins

Cells make proteins by translating a set of commands arrayed along a strand of DNA. This hereditary information is held in the order of four chemical groups along the DNA: the bases adenine, thymine, guanine and cytosine. In sets of three along the DNA these bases specify which amino acids, the fundamental building blocks of proteins, are to be used in putting the protein together; the correspondence between specific base triplets and particular amino acids is called the genetic code. The part of a DNA molecule that incorporates the information to specify the structure of a protein is called a structural gene.
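
The following minimal Python sketch makes the triplet reading concrete by reading a coding sequence three bases at a time. It assumes the sequence starts in frame and uses only a small, hand-picked excerpt of the standard genetic code. The names `CODON_TABLE` and `translate` are illustrative and do not come from the article.

```python
# A small excerpt of the standard genetic code, written as coding-strand DNA codons.
# Only a few codons are listed; the full table has 64 entries.
CODON_TABLE = {
    "ATG": "Met", "TTT": "Phe", "TTC": "Phe", "GGC": "Gly",
    "AAA": "Lys", "GAA": "Glu", "TGG": "Trp",
    "TAA": "STOP", "TAG": "STOP", "TGA": "STOP",
}

def translate(coding_dna: str) -> list[str]:
    """Read bases three at a time and map each triplet (codon) to an amino acid."""
    protein = []
    for i in range(0, len(coding_dna) - 2, 3):
        codon = coding_dna[i:i + 3]
        amino_acid = CODON_TABLE[codon]  # KeyError if the codon is not in this excerpt
        if amino_acid == "STOP":         # a stop codon ends the protein chain
            break
        protein.append(amino_acid)
    return protein

print(translate("ATGTTTGGCAAATAA"))  # ['Met', 'Phe', 'Gly', 'Lys']
```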

To act on this information the cell copies the sequence of bases from its genetic storehouse in DNA into another molecule: messenger RNA. A strand of DNA serves as a template for the assembly of a complementary strand of RNA according to base-pairing rules: adenine always pairs with uracil (which in RNA replaces DNA's thymine) and guanine pairs with cytosine. This copying step is called transcription; in animal cells it takes place in the nucleus of the cell. The messenger-RNA molecules carry the information out of the nucleus into the cytoplasm, where a complex molecular machine translates it into protein by linking together the appropriate amino acids. In bacteria, which have no nucleus, transcription and translation take place concurrently. The messenger RNA serves as a temporary set of instructions. Which proteins the cell makes depends on which messengers it contains at any given time; to make a different protein the cell makes a new messenger from the appropriate structural gene. The DNA in each cell contains all the information required at any time by any cell of the organism, but each cell “expresses,” or translates into protein, only a specific small portion of that information. How does the cell know which structural genes to express?
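
The base-pairing rule for transcription can be sketched in a few lines of Python. The sketch ignores strand direction (5' and 3' ends), so the RNA is written in the same left-to-right order as the template it pairs with. The name `transcribe` is hypothetical.

```python
# Base-pairing rules for transcription: each DNA base on the template strand
# calls for its complementary RNA base; uracil (U) takes the place of thymine.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_strand: str) -> str:
    """Return the RNA strand complementary to a DNA template strand.

    Simplification: strand direction is ignored, so the RNA is written
    in the same left-to-right order as the template it pairs with.
    """
    return "".join(DNA_TO_RNA[base] for base in template_strand)

print(transcribe("TACAAACCG"))  # AUGUUUGGC
```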

Along with the structural information, a DNA molecule carries a series of regulatory commands, also written out as a sequence of bases. The simplest of these commands say in effect “Start here” or “Stop here,” both for the transcription and for the translation steps. More complicated commands say when and in which type of cell a specific gene should be used. The genetic code is the same in all cell nuclei, a given structural sequence specifying the same protein in every organism, but the special commands are not the same in bacteria and in animal cells. One of the most surprising differences was discovered only in the past two years. The information for a bacterial protein is carried on a contiguous stretch of DNA, but in more complicated organisms, such as pigs and …, the structural information is broken up into segments, which are separated along the gene by long stretches of other DNA called intervening DNA or “introns.” In such a cell a long region (often 10 times more than might be needed) is transcribed into RNA. The cell then processes this long RNA molecule, removing the sequences of bases that do not code for the protein and splicing together the rest to make a messenger-RNA molecule that carries essentially just the “start,” the structural sequence and the “stop” needed for translation.
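
Splicing amounts to keeping the exon stretches of the long transcript and joining them end to end. The Python sketch below assumes the exon boundaries are already known. The transcript is made up, with exons shown in upper case and the intron in lower case, and the function name `splice` is hypothetical.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon segments of a transcript end to end, dropping the introns.

    `exons` lists (start, end) slice positions, end exclusive, in transcript order.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)

# Hypothetical transcript: two exons (upper case) around one intron (lower case).
exon1, intron, exon2 = "AUGUUU", "guaagucuaacag", "GGCUAA"
pre_mrna = exon1 + intron + exon2
exon_coords = [(0, len(exon1)), (len(exon1) + len(intron), len(pre_mrna))]

print(splice(pre_mrna, exon_coords))  # AUGUUUGGCUAA
```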

To persuade a bacterium to make a nonbacterial protein one must put into bacteria a DNA molecule that has a sequence of bases specifying the protein's amino acids as well as the bacterial commands for transcription and translation. Moreover, the inserted DNA must be treated by the bacterium as its own so that it will be duplicated as the bacterium divides. The problem thus breaks down into three parts: to find the right structural sequence (insulin's, for example), to place it in bacteria in such a way that it will be maintained as the bacteria grow and then to manipulate the surrounding information, modifying the regulatory commands so that the structural sequence is expressed as protein. Once the protein is made, still further changes in its gene or modifications of the bacterium may be needed to obtain the protein in large enough amounts to be useful.

HUMAN INTERFERON synthesized in bacteria demonstrates its ability to block a viral infection in this biological assay. The structural information for making the protein interferon was obtained from human white blood cells in the form of messenger-RNA molecules. The RNA then served as a template for the synthesis of double-strand molecules of copy DNA, and the DNA in turn was inserted by recombinant-DNA techniques into a laboratory strain of the bacterium Escherichia coli, which synthesized the protein. For the assay, dilutions of an extract of the bacteria were placed in some of the wells of a clear plastic tray; the other wells served as controls. (The wells are seen through the bottom of the tray in this photograph.) Human cells were added to the wells and were grown to form a layer of cells covering the bottom of each well. A virus preparation was then added to the cells. Twenty-four hours later the cell layer was stained. Where interferon in the extracts protected the cells against the virus the cells survived and were stained. Where there was no interferon the virus killed the cells and the dead cells did not pick up the stain. The control wells in the first column at the left contain a layer of cells that were never exposed to the virus; they accordingly appear stained. The control wells in the second column contain cells that have been killed by the virus; they look gray or clear. The control wells in the third column contain dilutions of a standard laboratory sample of interferon obtained directly from human cells. The top well has the most interferon, and each succeeding well has one-third as much interferon as the well above it. The wells in the next six columns hold dilutions of bacterial extracts from six different colonies of E. coli in which interferon DNA was present. Five of the six columns containing the bacterial extracts show evidence of interferon activity. The third extract tested (column 6) had no detectable interferon; it apparently did not have a complete interferon gene. The synthesis of human interferon by the recombinant-DNA method was carried out by Charles Weissmann and his colleagues at the University of Zurich in collaboration with Kari Cantell of the Finnish Red Cross. … by Biogen, S.A. Interferon is synthesized by many animal cells, but it is species-specific: only human interferon works for humans, and it has been too scarce even for satisfactory experimentation.

The constellation of DNA techniques for placing and maintaining a new gene in bacteria is called cloning. In this sense, cloning means the isolation of a specific new DNA sequence in a single organism that proliferates to form a population of identical descendants: a clone. There are two convenient ways of doing this. In one method a small circular piece of DNA called a plasmid is the vehicle for introducing the new DNA into the bacterium. Plasmids carry only a few genes of their own and are maintained in several copies inside the bacterium by the bacterium's own gene functions. They remain separate from the main set of bacterial genes, which is carried on a circle of DNA about 1,000 times larger. Alternatively the vehicle can be a virus that grows in bacteria. Such viruses normally have some 10 to 50 genes of their own (a bacterium has several thousand genes) and can often carry other new DNA segments in place of some of their own. All the techniques we shall describe apply to both plasmids and viruses.

PROTEINS ARE MADE in a living cell according to instructions encoded in the cell's genes, which consist of specific sequences of chemical groups (bases) strung out along a double-strand molecule of DNA in the cell's nucleus. The genetic code is “written” in the four letters A, T, G and C, which stand respectively for the four bases adenine, thymine, guanine and cytosine. The code is “read” in the three-letter sets called codons, which specify the amino acids linked together in the protein chain. The order of the bases can also convey regulatory commands. In multicelled organisms the structural sequence, or gene, encoding a particular protein is usually broken into fragments separated by long stretches of other DNA. In this diagram the gene fragments, called exons, are represented by the black letters and the intervening sequences, known as introns, by the white letters. The genetic information is translated into protein indirectly. First the entire sequence of bases is transcribed inside the nucleus from the DNA to a single-strand molecule of RNA. According to the base-pairing rules governing transcription, adenine always pairs with uracil (U) and guanine always pairs with cytosine. Next the RNA copies of the introns are excised from the message and the remaining RNA copies of the exons are joined together end to end. The reassembled strand of messenger RNA then moves from the nucleus to the cytoplasm, where the actual protein-manufacturing process takes place.

DNA CAN BE CUT into comparatively short lengths with the aid of restriction endonucleases, special enzymes that recognize specific base sequences at which they cause the molecule to come apart. For example, Eco RI, the first such enzyme discovered, recognizes a certain six-base sequence and cuts the molecule wherever this sequence appears. Hae III, another restriction enzyme, operates at a certain four-base sequence. Since the probability of finding a particular four-base sequence is greater than that of finding a particular six-base sequence, one would expect Hae III to cut DNA more often than Eco RI. Accordingly one Eco RI site and two Hae III sites are represented in the DNA segment at the top, which corresponds to part of the gene coding for insulin in rat cells. The same DNA contains recognition sites for a number of other restriction enzymes, as is shown in the line diagram of a larger gene fragment at the bottom.

A molecule of DNA resembles a very long, twisted thread. A bacterium has one millimeter of DNA in a continuous string of some three million bases folded back and forth several thousand times into a space less than a micron (a thousandth of a millimeter) across. In human cells the DNA is packed into 46 chromosomes, each one containing about four centimeters of DNA in a single piece, the total amount corresponding to about three billion bases. How can one find and work with a single gene only a few thousand bases long? Fortunately nature has devised certain enzymes (proteins that carry out chemical reactions) that solve part of the problem. These special enzymes, called restriction endonucleases, have the ability to scan the long thread of DNA and to recognize particular short sequences as landmarks at which to cut the molecule apart.

Some 40 or 50 of these enzymes are known, each of which recognizes different landmarks. Each restriction enzyme therefore breaks up any given DNA reproducibly into a characteristic set of short pieces, from a few hundred to a few thousand bases long, which one can isolate by length.
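
The following minimal Python sketch shows how a restriction enzyme's landmark can be located and the DNA cut there. It assumes the standard recognition sequences GAATTC (Eco RI) and GGCC (Hae III) and their top-strand cut points. The sketch ignores the second strand, so it does not model Eco RI's staggered cut. The DNA string and all function names are made up for illustration. The `expected_spacing` helper applies the rule of thumb that in random DNA with equal base frequencies a particular n-base sequence turns up about once every 4^n bases.

```python
import re

# Recognition sequences (top strand, 5'->3') and the cut point within each.
SITES = {"EcoRI": "GAATTC", "HaeIII": "GGCC"}
CUT_OFFSET = {"EcoRI": 1, "HaeIII": 2}  # G^AATTC and GG^CC

def find_sites(dna: str, site: str) -> list[int]:
    """0-based start positions of every occurrence of `site` in `dna`."""
    # The zero-width lookahead also reports overlapping occurrences.
    return [m.start() for m in re.finditer(f"(?={site})", dna)]

def digest(dna: str, enzyme: str) -> list[str]:
    """Cut the top strand at every recognition site; return the pieces in order."""
    cuts = [pos + CUT_OFFSET[enzyme] for pos in find_sites(dna, SITES[enzyme])]
    bounds = [0] + cuts + [len(dna)]
    return [dna[a:b] for a, b in zip(bounds, bounds[1:])]

def expected_spacing(site: str) -> int:
    """Average bases between sites in random DNA with equal base frequencies."""
    return 4 ** len(site)

dna = "TTGGCCAAGAATTCGGCCTA"  # made-up sequence, not the rat insulin gene
print(digest(dna, "HaeIII"))  # ['TTGG', 'CCAAGAATTCGG', 'CCTA']
print(digest(dna, "EcoRI"))   # ['TTGGCCAAG', 'AATTCGGCCTA']
print(expected_spacing("GGCC"), expected_spacing("GAATTC"))  # 256 4096
```

In this short made-up string the four-base enzyme cuts twice and the six-base enzyme once. This matches the expectation that a particular four-base site appears on average every 256 bases, versus every 4,096 bases for a six-base site.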

One can clone such DNA pieces in bacteria. As a first step one purifies the circle of plasmid DNA. The sequences of the plasmids are such that one of the restriction enzymes will recognize a unique site on the plasmid and cut the circle open there. One can insert a chosen DNA fragment into the opening by using a variety of enzymatic techniques that connect its ends to those of the circle. Ordinarily this recombinant-DNA molecule could not pass through the bacterial cell wall. A dilute solution of calcium chloride renders the bacteria permeable, however; in a mixture of treated cells and DNA a few bacteria will take up the hybrid plasmid. These cells can be picked out from among all those that did not take up the DNA if a gene on the plasmid provides a property the bacterium must have to survive, such as antibiotic resistance. Then any bacterium carrying the plasmid will be resistant to the antibiotic, whereas all the others will be killed by it. When one spreads the mixture of bacteria out on an agar plate containing nutrients and the antibiotic, each single bacterium with a plasmid will grow into a separate colony of about 100 million cells. A single colony can then be chosen and grown further to yield billions of cells, each of which contains identical copies of the new DNA sequence in a recombinant plasmid.

RECOMBINANT-DNA TECHNIQUE for making a protein in bacteria calls for the insertion of a fragment of animal DNA that encodes the protein into a plasmid, a small circular piece of bacterial DNA, which in turn serves as the vehicle for introducing the DNA into the bacterium. The plasmid DNA is cleaved with the appropriate restriction enzyme and the new DNA sequence is inserted into the opening. Enzymatic techniques are then used to connect the new DNA's ends to those of the broken plasmid circle. In the procedure illustrated here, for example, a special enzyme, reverse transcriptase, is first used to copy the genetic information from a single-strand molecule of messenger RNA into a single strand of copy DNA. The RNA template is then destroyed, and a second strand of DNA is made with another enzyme, DNA polymerase. Still another enzyme, S1 nuclease, serves to break the covalent linkage between the two DNA strands. In the next step the double-strand DNA is joined to the plasmid. First the enzyme terminal transferase is used to extend the ends of the DNA with a short sequence of identical bases (in this case four cytosines). The DNA is then annealed to the plasmid DNA, to which a complementary sequence of bases (four guanines) has been added. Bacterial enzymes eventually fill the gaps in the regenerated circular DNA molecule and seal the connection between the inserted DNA and the plasmid DNA. The plasmid used by the authors to make rat proinsulin in bacteria, designated pBR322, incorporates two genes that confer resistance to two antibiotics: penicillin and tetracycline. The plasmid is cleaved by the restriction enzyme Pst at a recognition site that lies in the midst of the gene encoding penicillinase (the enzyme that breaks down penicillin). The added DNA destroys this enzymatic activity, but the tetracycline resistance remains and is used to identify bacteria containing the plasmid.

The Sequencing of DNA

The procedures we have outlined so far are followed in “shotgun” cloning experiments. One breaks up the DNA of an animal cell into millions of pieces and inserts each piece into a different bacterium. In this way a number of collections of all the fragments of human, mouse, rat and fly DNA have been made. One can determine the structure of any one of these cloned DNA's by breaking up the hybrid plasmid with a restriction enzyme, separating the resulting DNA fragments, determining the base sequence of each of the fragments and then putting the sequences together to deduce the entire structure of the cloned DNA.
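
The text does not spell out how the fragment sequences are fitted together. One common idea is to sequence overlapping fragments (for instance, pieces from digests with two different enzymes) and join them where their ends match. The Python sketch below is a naive greedy overlap merge under that assumption. It is a swapped-in illustration, not the authors' procedure. The fragments and the names `overlap` and `assemble` are hypothetical. Real DNA with repeated stretches can fool a greedy merge.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that matches a prefix of `b`.

    Overlaps shorter than `min_len` are ignored (returns 0), since very
    short matches are likely to be coincidental.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def assemble(fragments: list[str], min_len: int = 3) -> str:
    """Greedy overlap merge: repeatedly join the pair of fragments with the
    longest overlap until a single sequence remains."""
    frags = list(fragments)
    while len(frags) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            raise ValueError("no overlaps left; the fragments cannot be ordered")
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for n, f in enumerate(frags) if n not in (best_i, best_j)]
        frags.append(merged)
    return frags[0]

# Made-up overlapping fragments of one short stretch of DNA.
pieces = ["ATGCGTAC", "GTACGTTA", "CGTTAGC"]
print(assemble(pieces))  # ATGCGTACGTTAGC
```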

There are two methods for sequencing DNA. Both exploit reference points created by restriction-enzyme cleavage of the DNA at a specific short sequence and then work out the rest of the sequence by measuring the distance of each base from that cut. They do this by creating a set of radioactively labeled molecules, each of which extends from the common point to one of the occurrences of a specific base. When these molecules are separated by size and detected by their radioactivity, the length of the smallest one shows the position of the first occurrence of that base; longer molecules correspond to later occurrences. The pattern created by the analysis of these molecules looks like a ladder. From the positions of the rungs one reads off the lengths. By comparing four such patterns one reads off a sequence.
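
Reading a ladder can be sketched for an idealized case in which each of the four patterns marks exactly one base and every length from 1 upward appears once. Real gels, and the chemical method that pairs some bases in one reaction, are messier. Each rung's length is treated as the base's position counted from the labeled cut. The names `make_ladder` and `read_ladder` are hypothetical.

```python
def make_ladder(seq: str) -> dict[str, list[int]]:
    """Idealized patterns: for each base, the lengths of the labeled molecules
    running from the cut (position 1) to each occurrence of that base."""
    lanes = {base: [] for base in "ACGT"}
    for pos, base in enumerate(seq, start=1):
        lanes[base].append(pos)
    return lanes

def read_ladder(lanes: dict[str, list[int]]) -> str:
    """Sort all rungs by length and read the base of each rung in turn."""
    rungs = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    positions = [length for length, _ in rungs]
    if positions != list(range(1, len(positions) + 1)):
        raise ValueError("ladder has gaps or double bands; cannot read unambiguously")
    return "".join(base for _, base in rungs)

ladder = make_ladder("GATTC")
print(ladder)               # {'A': [2], 'C': [5], 'G': [1], 'T': [3, 4]}
print(read_ladder(ladder))  # GATTC
```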

One technique, devised by Allan M. Maxam and one of us (Gilbert), makes use of chemical reagents that detect the different chemical properties of the bases and break the DNA there. To generate the set of fragments the reactions are done for a short time, so that the molecule is broken only occasionally instead of everywhere the base occurs; different molecules will be broken at different places. Four different sets of reagents are used to generate the four patterns. The radioactive label is attached directly to the end of the particular restriction fragment one wants to sequence, so that only the molecules stretching from the labeled end to the break are detected by their radioactivity.
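
In the chemical method the four reactions break the DNA at the G's alone, at the G's and the A's, at the C's and the T's, and at the C's alone. A base is called by noting which lanes show a band at a given position. The Python sketch below simplifies by recording each band at the position of the broken base, ignoring the off-by-one that arises because that base is destroyed. The lane names and functions are hypothetical.

```python
def simulate_lanes(seq: str) -> dict[str, set[int]]:
    """Band positions produced by the four idealized reactions on `seq`."""
    lanes = {"G": set(), "G+A": set(), "C+T": set(), "C": set()}
    for pos, base in enumerate(seq, start=1):
        if base == "G":
            lanes["G"].add(pos)
        if base in "GA":
            lanes["G+A"].add(pos)
        if base in "CT":
            lanes["C+T"].add(pos)
        if base == "C":
            lanes["C"].add(pos)
    return lanes

def read_maxam_gilbert(lanes: dict[str, set[int]], length: int) -> str:
    """Call one base per position from the pattern of bands across the lanes."""
    calls = []
    for pos in range(1, length + 1):
        if pos in lanes["G"]:
            calls.append("G")  # band in the G lane (and in G+A)
        elif pos in lanes["G+A"]:
            calls.append("A")  # band in G+A but not in G
        elif pos in lanes["C"]:
            calls.append("C")  # band in the C lane (and in C+T)
        elif pos in lanes["C+T"]:
            calls.append("T")  # band in C+T but not in C
        else:
            calls.append("N")  # no band: base cannot be read
    return "".join(calls)

seq = "GATCCTAG"
print(read_maxam_gilbert(simulate_lanes(seq), len(seq)))  # GATCCTAG
```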

The other sequencing method, devised by Frederick Sanger of the British Medical Research Council Laboratory of Molecular Biology in Cambridge, makes a DNA copy with an enzyme. It stops the sequential synthesis, and hence the elongation of the copy, by blocking the movement of the enzyme at a specific base. Here the radioactive label is incorporated into the newly synthesized molecule in four different … Both methods can provide the sequence of from 200 to 300 bases in a single experiment. One of the small plasmids involved in our cloning experiments was sequenced in a year by Gregory Sutcliffe, who worked out the order of the 4,357 bases on one strand and checked them by working out the complementary strand.

Any DNA region carried on a plasmid can be isolated and sequenced. The difficulty is not in determining the sequence but in obtaining the specific

SEQUENCING OF DNA, in the method devised by one of the authors (Gilbert) and Allan M. Maxam, begins with the attachment of a radioactive label to one end of each strand of double-strand DNA (1). The strands of trillions of molecules are separated (2), and a preparation of one of the two kinds of strands is divided among four test tubes (3). Each tube contains a chemical agent that selectively destroys one or two of the four bases A, T, G and C, thereby cleaving the strand at the site of those bases. The reaction is controlled so that only some of the strands are cleaved at each of the sites where a given base appears, generating a set of fragments of different sizes. A strand containing three G's (4), for example, would produce a mixture of three radioactively labeled molecules (5). The reactions break DNA at the G's alone, at the G's and the A's, at the T's and the C's, and at the C's alone. The molecules are separated according to size by electrophoresis on a gel; the shorter the molecule, the farther it migrates down the gel (6). The radioactive label produces an image of each group of molecules on an X-ray film (7). When four films are placed side by side (8), the ladder-like array of bands represents all the successively shorter fragments of the original strand of DNA (9). Knowing what base or bases were destroyed to produce each of the fragments, one can start at the bottom and read off a left-to-right sequence of bases (10), which in turn yields the sequence of the second strand.