`published: 17 April 2014
`doi: 10.3389/fmicb.2014.00172
`
`Recombinant protein expression in Escherichia coli:
`advances and challenges
`
`Germán L. Rosano1,2* and Eduardo A. Ceccarelli 1,2
`1 Instituto de Biología Molecular y Celular de Rosario, Consejo Nacional de Investigaciones Científicas y Técnicas, Rosario, Argentina
`2 Facultad de Ciencias Bioquímicas y Farmacéuticas, Universidad Nacional de Rosario, Rosario, Argentina
`
Escherichia coli is one of the organisms of choice for the production of recombinant
`proteins. Its use as a cell factory is well-established and it has become the most popular
`expression platform. For this reason, there are many molecular tools and protocols at hand
`for the high-level production of heterologous proteins, such as a vast catalog of expression
`plasmids, a great number of engineered strains and many cultivation strategies. We review
`the different approaches for the synthesis of recombinant proteins in E. coli and discuss
`recent progress in this ever-growing field.
`
`Keywords: recombinant protein expression, Escherichia coli, expression plasmid, inclusion bodies, affinity tags,
`E. coli expression strains
`
`Edited by:
`Peter Neubauer, Technische
`Universität Berlin, Germany
`Reviewed by:
`Jose M. Bruno-Barcena, North
`Carolina State University, USA
`Thomas Schweder,
`Ernst-Moritz-Arndt-Universität
`Greifswald, Germany
`*Correspondence:
`Germán L. Rosano, Instituto de
`Biología Molecular y Celular de
`Rosario, Consejo Nacional de
`Investigaciones Científicas y Técnicas,
`Esmeralda y Ocampo, Rosario 2000,
`Argentina
`e-mail: rosano@ibr-conicet.gov.ar
`
`INTRODUCTION
`There is no doubt that the production of recombinant proteins
`in microbial systems has revolutionized biochemistry. The days
when kilograms of animal and plant tissues or large volumes
of biological fluids were needed for the purification of small
amounts of a given protein are almost gone. Every researcher
who embarks on a new project that will need a purified protein
`immediately thinks of how to obtain it in a recombinant form.
`The ability to express and purify the desired recombinant protein
`in a large quantity allows for its biochemical characterization, its
`use in industrial processes and the development of commercial
`goods.
`At the theoretical level, the steps needed for obtaining a recom-
`binant protein are pretty straightforward. You take your gene of
`interest, clone it in whatever expression vector you have at your
`disposal, transform it into the host of choice, induce and then, the
`protein is ready for purification and characterization. In practice,
`however, dozens of things can go wrong. Poor growth of the host,
`inclusion body (IB) formation, protein inactivity, and even not
`obtaining any protein at all are some of the problems often found
`down the pipeline.
`In the past, many reviews have covered this topic with great
`detail (Makrides, 1996; Baneyx, 1999; Stevens, 2000; Jana and
`Deb, 2005; Sorensen and Mortensen, 2005). Collectively, these
`papers gather more than 2000 citations. Yet, in the field of recombi-
`nant protein expression and purification, progress is continuously
`being made. For this reason, in this review, we comment on
`the most recent advances in the topic. But also, for those with
`modest experience in the production of heterologous proteins,
`we describe the many options and approaches that have been
`successful for expressing a great number of proteins over the
last couple of decades, by answering the questions that need to be
addressed at the beginning of the project. Finally, we provide a
`troubleshooting guide that will come in handy when dealing with
`difficult-to-express proteins.
`
`FIRST QUESTION: WHICH ORGANISM TO USE?
`The choice of the host cell whose protein synthesis machinery
will produce the precious protein sets the outline of the
`whole process. It defines the technology needed for the project,
`be it a variety of molecular tools, equipment, or reagents. Among
`microorganisms, host systems that are available include bacteria,
`yeast, filamentous fungi, and unicellular algae. All have strengths
`and weaknesses and their choice may be subject to the protein of
`interest (Demain and Vaishnav, 2009; Adrio and Demain, 2010).
`For example, if eukaryotic post-translational modifications (like
`protein glycosylation) are needed, a prokaryotic expression sys-
`tem may not be suitable (Sahdev et al., 2008). In this review,
`we will focus specifically on Escherichia coli. Other systems are
`described in excellent detail in accompanying articles of this
`series.
`The advantages of using E. coli as the host organism are well
`known. (i) It has unparalleled fast growth kinetics. In glucose-salts
`media and given the optimal environmental conditions, its dou-
`bling time is about 20 min (Sezonov et al., 2007). This means that
`a culture inoculated with a 1/100 dilution of a saturated starter
`culture may reach stationary phase in a few hours. However, it
`should be noted that the expression of a recombinant protein
`may impart a metabolic burden on the microorganism, causing
a considerable increase in generation time (Bentley et al., 1990).
`(ii) High cell density cultures are easily achieved. The theoretical
`density limit of an E. coli liquid culture is estimated to be about
200 g dry cell weight/l or roughly 1 × 10^13 viable bacteria/ml (Lee,
`1996; Shiloach and Fass, 2005). However, exponential growth in
`
`www.frontiersin.org
`
`April 2014 | Volume 5 | Article 172 | 1
`
`complex media leads to densities nowhere near that number. In
the simplest laboratory setup (i.e., batch cultivation of E. coli at
37°C, using LB media), <1 × 10^10 cells/ml may be the upper
limit (Sezonov et al., 2007), which is less than 0.1% of the theoretical
limit. For this reason, high cell-density culture methods were
designed to boost E. coli growth, even when producing a recombinant
protein (Choi et al., 2006). Because E. coli is a workhorse organism,
these strategies arose thanks to the wealth of knowledge about
its physiology. (iii) Rich complex media can be made from readily
available and inexpensive components. (iv) Transformation
with exogenous DNA is fast and easy. Plasmid transformation of
E. coli can be performed in as little as 5 min (Pope and Kent,
1996).
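The growth figures quoted above can be checked with a few lines of arithmetic. The following is only a back-of-the-envelope sketch: it assumes ideal, uninterrupted exponential growth with no lag phase (an idealization of ours, not a claim from the cited papers), and the function names are illustrative.

```python
import math

def time_to_recover_dilution(dilution_factor: float, doubling_time_min: float) -> float:
    """Minutes of ideal exponential growth needed to regain the starter density."""
    doublings = math.log2(dilution_factor)  # number of doublings to undo the dilution
    return doublings * doubling_time_min

def fraction_of_limit(density: float, theoretical_limit: float) -> float:
    """Ratio between an observed cell density and the theoretical density limit."""
    return density / theoretical_limit

# A 1/100 dilution of a saturated starter, doubling every 20 min
minutes = time_to_recover_dilution(100, 20)
print(f"{minutes:.0f} min (~{minutes / 60:.1f} h)")

# LB batch culture upper limit (1e10 cells/ml) versus theoretical limit (1e13 cells/ml)
print(f"{fraction_of_limit(1e10, 1e13):.1%}")
```

Under these assumptions, about 6.6 doublings (a little over 2 h) are enough to undo a 1/100 dilution, which is consistent with reaching stationary phase "in a few hours", and 10^10 cells/ml corresponds to 0.1% of the theoretical limit.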
`
`SECOND QUESTION: WHICH PLASMID SHOULD BE CHOSEN?
`The most common expression plasmids in use today are the
`result of multiple combinations of replicons, promoters, selec-
`tion markers, multiple cloning sites, and fusion protein/fusion
`protein removal strategies (Figure 1). For this reason, the catalog
`of available expression vectors is huge and it is easy to get lost when
`choosing a suitable one. To make an informed decision, these fea-
`tures have to be carefully evaluated according to the individual
`needs.
`
`REPLICON
`Genetic elements that undergo replication as autonomous units,
`such as plasmids, contain a replicon. It consists of one origin of
`replication together with its associated cis-acting control elements.
`An important parameter to have in mind when choosing a suitable
`vector is copy number. The control of copy number resides in the
`replicon (del Solar and Espinosa, 2000). It is logical to think that
`high plasmid dosage equals more recombinant protein yield as
many expression units reside in the cell. However, a high plasmid
number may impose a metabolic burden that decreases the bacterial
growth rate and may produce plasmid instability, and so the
`number of healthy organisms for protein synthesis falls (Bentley
`et al., 1990; Birnbaum and Bailey, 1991). For this reason, the use
`of high copy number plasmids for protein expression by no means
`implies an increase in production yields.
`Commonly used vectors, such as the pET series, possess the
`pMB1 origin (ColE1-derivative, 15–60 copies per cell; Bolivar
`et al., 1977) while a mutated version of the pMB1 origin is present
`in the pUC series (500–700 copies per cell; Minton, 1984). The
`wild-type ColE1 origin (15–20 copies per cell; Lin-Chao and
`Bremer, 1986; Lee et al., 2006) can be found in the pQE vec-
`tors (Qiagen). They all belong to the same incompatibility group
`meaning that they cannot be propagated together in the same cell
`as they compete with each other for the replication machinery
`(del Solar et al., 1998; Camps, 2010). For the dual expression of
`recombinant proteins using two plasmids, systems with the p15A
`ori are available (pACYC and pBAD series of plasmids, 10–12
`copies per cell; Chang and Cohen, 1978; Guzman et al., 1995).
`Though rare, triple expression can be achieved by the use of the
`pSC101 plasmid. This plasmid is under a stringent control of repli-
`cation, thus it is present in a low copy number (<5 copies per
`cell; Nordstrom, 2006). The use of plasmids bearing this repli-
`con can be an advantage in cases where the presence of a high
`dose of a cloned gene or its product produces a deleterious effect
`to the cell (Stoker et al., 1982; Wang and Kushner, 1991). Alter-
`natively, the use of the Duet vectors (Novagen) simplifies dual
`expression by allowing cloning of two genes in the same plas-
`mid. The Duet plasmids possess two multiple cloning sites, each
`preceded by a T7 promoter, a lac operator and a ribosome bind-
`ing site. By combining different compatible Duet vectors, up to
`eight recombinant proteins can be produced from four expression
`plasmids.
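The compatibility rules described in this section can be captured in a small lookup table. The sketch below is illustrative only: the family labels ("ColE1-like", etc.) and the function name are our own shorthand, and the copy numbers are simply the ranges quoted in the text.

```python
# Replicon -> (incompatibility family, approximate copies per cell as quoted in the text).
# The family labels are hypothetical shorthand used only for this example.
REPLICONS = {
    "pMB1": ("ColE1-like", "15-60"),               # e.g., pET series
    "pMB1 (mutated)": ("ColE1-like", "500-700"),   # e.g., pUC series
    "ColE1": ("ColE1-like", "15-20"),              # e.g., pQE vectors
    "p15A": ("p15A", "10-12"),                     # e.g., pACYC series
    "pSC101": ("pSC101", "<5"),
}

def are_compatible(replicons):
    """True if no two replicons in the list share an incompatibility family."""
    families = [REPLICONS[name][0] for name in replicons]
    return len(families) == len(set(families))

print(are_compatible(["pMB1", "ColE1"]))           # same group
print(are_compatible(["pMB1", "p15A"]))            # dual expression
print(are_compatible(["pMB1", "p15A", "pSC101"]))  # triple expression
```

A check of this kind is only a first filter when planning co-expression; selection markers must also differ between the plasmids, which the sketch does not model.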
`
`FIGURE 1 | Anatomy of an expression vector. The figure depicts the major features present in common expression vectors. All of them are described in
`the text. The affinity tags and coding sequences for their removal were positioned arbitrarily at the N-terminus for simplicity. MCS, multiple cloning site. Striped
`patterned box: coding sequence for the desired protein.
`
`Frontiers in Microbiology | Microbiotechnology, Ecotoxicology and Bioremediation
`
`PROMOTER
The staple in prokaryotic promoter research is undoubtedly the
lac promoter, a key component of the lac operon (Müller-Hill,
`1996). The accumulated knowledge in the functioning of the
`system allowed for its extended use in expression vectors. Lac-
`tose causes induction of the system and this sugar can be used
`for protein production. However, induction is difficult in the
`presence of readily metabolizable carbon sources (such as glu-
`cose present in rich media). If lactose and glucose are present,
`expression from the lac promoter is not fully induced until all
`the glucose has been utilized. At this point (low glucose), cyclic
`adenosine monophosphate (cAMP) is produced, which is neces-
`sary for complete activation of the lac operon (Wanner et al., 1978;
`Postma and Lengeler, 1985). This positive control of expression is
`known as catabolite repression. In accordance, cAMP levels are
`low in cells growing in lac operon-repressing sugars, and this cor-
`relates with lower rates of expression of the lac operon (Epstein
`et al., 1975). Also, glucose abolishes lactose uptake because lac-
`tose permease is inactive in the presence of glucose (Winkler
`and Wilson, 1967). To achieve expression in the presence of glu-
`cose, a mutant that reduces (but does not eliminate) sensitivity to
`catabolite regulation was introduced, the lacUV5 promoter (Sil-
`verstone et al., 1970; Lanzer and Bujard, 1988). However, when
`present in multicopy plasmids, both promoters suffer from the
`disadvantage of sometimes having unacceptably high levels of
`expression in the absence of inducer (a.k.a. “leakiness”) due to
`titration of the low levels of the lac promoter repressor pro-
`tein LacI from the single chromosomal copy of its gene (about
`10 molecules per cell; Müller-Hill et al., 1968). Basal expression
control can be achieved by the introduction of a mutated promoter
of the lacI gene, called lacIQ, that leads to higher levels of
`expression (almost 10-fold) of LacI (Calos, 1978). The lac pro-
`moter and its derivative lacUV5 are rather weak and thus not
`very useful for recombinant protein production (Deuschle et al.,
`1986; Makoff and Oxer, 1991). Synthetic hybrids that combine the
`strength of other promoters and the advantages of the lac pro-
`moter are available. For example, the tac promoter consists of
`the −35 region of the trp (tryptophan) promoter and the −10
`region of the lac promoter. This promoter is approximately 10
`times stronger than lacUV5 (de Boer et al., 1983). Notable exam-
`ples of commercial plasmids that use the lac or tac promoters to
`drive protein expression are the pUC series (lacUV5 promoter,
`Thermo Scientific) and the pMAL series of vectors (tac promoter,
`NEB).
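The interplay between lactose and glucose described above can be summarized as a small truth table. This is a deliberately qualitative sketch that ignores leakiness, promoter strength, and the lacUV5 mutation; the function name and the three state labels are our own.

```python
def lac_promoter_state(lactose_present: bool, glucose_present: bool) -> str:
    """Qualitative state of the wild-type lac promoter, following the text."""
    if not lactose_present:
        # No inducer: LacI stays bound to the operator
        return "repressed"
    if glucose_present:
        # Low cAMP and impaired lactose uptake limit activation
        return "not fully induced"
    # Inducer present, glucose exhausted, cAMP available
    return "fully induced"

for lactose in (False, True):
    for glucose in (False, True):
        print(f"lactose={lactose!s:5} glucose={glucose!s:5} -> "
              f"{lac_promoter_state(lactose, glucose)}")
```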
`The T7 promoter system present in the pET vectors (pMB1
`ori, medium copy number, Novagen) is extremely popular for
`recombinant protein expression. This is not surprising as the
`target protein can represent 50% of the total cell protein in suc-
`cessful cases (Baneyx, 1999; Graumann and Premstaller, 2006).
`In this system, the gene of interest is cloned behind a promoter
`recognized by the phage T7 RNA polymerase (T7 RNAP). This
`highly active polymerase should be provided in another plas-
`mid or, most commonly, it is placed in the bacterial genome
`in a prophage (λDE3) encoding for the T7 RNAP under the
`transcriptional control of a lacUV5 promoter (Studier and Mof-
`fatt, 1986). Thus, the system can be induced by lactose or its
non-hydrolyzable analog isopropyl β-D-1-thiogalactopyranoside
(IPTG). Basal expression can be controlled by lacIQ but also by T7
`lysozyme co-expression (Moffatt and Studier, 1987). T7 lysozyme
`binds to T7 RNAP and inhibits transcription initiation from the
`T7 promoter (Stano and Patel, 2004). In this way, if small amounts
`of T7 RNAP are produced because of leaky expression of its gene,
`T7 lysozyme will effectively control unintended expression of het-
`erologous genes placed under the T7 promoter. T7 lysozyme is
`provided by a compatible plasmid (pLysS or pLysE). After induc-
`tion, the amount of T7 RNAP produced surpasses the level of
`polymerase that T7 lysozyme can inhibit. The “free” T7 RNAP
`can thus engage in transcription of the recombinant gene. Yet
`another level of control lies in the insertion of a lacO operator
`downstream of the T7 promoter, making a hybrid T7/lac pro-
`moter (Dubendorff and Studier, 1991). All three mechanisms
`(tight repression of the lac-inducible T7 RNAP gene by lacIQ, T7
`RNAP inhibition by T7 lysozyme and presence of a lacO operator
`after the T7 promoter) make the system ideal for avoiding basal
`expression.
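The buffering effect of T7 lysozyme can be pictured as a simple titration. The toy model below assumes that each lysozyme molecule inactivates one polymerase molecule and that binding is complete; both are simplifying assumptions of ours, and the amounts are arbitrary illustrative units rather than measured values.

```python
def free_t7_rnap(t7_rnap: float, t7_lysozyme: float) -> float:
    """Polymerase left unbound, assuming one lysozyme inactivates one T7 RNAP."""
    return max(0.0, t7_rnap - t7_lysozyme)

LYSOZYME = 50.0  # hypothetical constant supply from pLysS or pLysE (arbitrary units)

for label, rnap in (("uninduced (leaky)", 5.0), ("induced", 500.0)):
    print(f"{label}: free T7 RNAP = {free_t7_rnap(rnap, LYSOZYME)}")
```

In this picture, the small amount of polymerase made by leaky expression is fully absorbed, while after induction the excess polymerase is free to transcribe the target gene.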
`The problem of leaky expression is a reflection of the nega-
`tive control of the lac promoter. Promoters that rely on positive
`control should have lower background expression levels (Siegele
`and Hu, 1997). This is the case of the araPBAD promoter present
`in the pBAD vectors (Guzman et al., 1995). The AraC protein
has the dual role of repressor/activator. In the absence of the
arabinose inducer, AraC represses transcription by binding to two
sites in the bacterial DNA. The protein–DNA complex forms a
`loop, effectively preventing RNA polymerase from binding to the
`promoter. Upon addition of the inducer, AraC switches into “acti-
`vation mode” and promotes transcription from the ara promoter
`(Schleif, 2000, 2010). In this way, arabinose is absolutely needed for
`induction.
`Another widely used approach is to place a gene under the
`control of a regulated phage promoter. The strong leftward pro-
`moter (pL) of phage lambda directs expression of early lytic
`genes (Dodd et al., 2005). The promoter is tightly repressed by
`the λcI repressor protein, which sits on the operator sequences
`during lysogenic growth. When the host SOS response is trig-
`gered by DNA damage, the expression of the protein RecA is
`stimulated, which in turn catalyzes the self-cleavage of λcI, allow-
`ing transcription of pL-controlled genes (Johnson et al., 1981;
`Galkin et al., 2009). This mechanism is used in expression vectors
`containing the pL promoter. The SOS response (and recom-
`binant protein expression) can be elicited by adding nalidixic
`acid, a DNA gyrase inhibitor (Lewin et al., 1989; Shatzman
`et al., 2001). Another way of activating the promoter is to con-
`trol λcI production by placing its gene under the influence of
`another promoter. This two-stage control system has already
`been described for T7 promoter/T7 RNAP-based vectors. In the
`pLEX series of vectors (Life Technologies), the λcI repressor gene
`was integrated into the bacterial chromosome under the control
`of the trp promoter. In the absence of tryptophan, this pro-
`moter is always “on” and λcI is continuously produced. Upon
`addition of tryptophan, a tryptophan-TrpR repressor complex
`is formed that tightly binds to the trp operator, thereby block-
`ing λcI repressor synthesis. Subsequently, the expression of the
`desired gene under the pL promoter ensues (Mieschendahl et al.,
`1986).
`
`Transcription from all promoters discussed so far is initiated
`by chemical cues. Systems that respond to physical signals (e.g.,
`temperature or pH) are also available (Goldstein and Doi, 1995).
The pL promoter is one example. A mutant λcI repressor protein
(λcI857) is temperature-sensitive and is unstable at temperatures
higher than 37°C. E. coli host strains containing the λcI857 protein
(either integrated in the chromosome or into a vector) are
first grown at 28–30°C to the desired density, and then protein
expression is induced by a temperature shift to 40–42°C (Menart
et al., 2003; Valdez-Cruz et al., 2010). The industrial advantage
of this system lies in part in the fact that during fermentation,
heat is usually produced and increasing the temperature in
high density cultures is easy. On the other hand, genes under
the control of the cold-inducible promoter cspA are induced by
a downshift in temperature to 15°C (Vasina et al., 1998). This
`temperature is ideal for expressing difficult proteins as will be
`explained in another section. The pCold series of plasmids have
`a pUC118 backbone (a pUC18 derivative; Vieira and Messing,
`1987) with the cspA promoter (Qing et al., 2004; Hayashi and
`Kojima, 2008). In the original paper, successful expression was
`achieved for more than 30 recombinant proteins from different
`sources, reaching levels as high as 20–40% of the total expressed
`proteins (Qing et al., 2004). However, it should be noted that in
`various cases the target proteins were obtained in an insoluble
`form.
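The two temperature-responsive systems described above work in opposite directions, which can be summarized as follows. This is a minimal sketch: the thresholds are taken from the induction temperatures quoted in the text (the lower end of the 40–42°C shift and the 15°C downshift), and treating them as sharp cut-offs is a simplification of ours.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ThermalPromoter:
    description: str
    direction: str      # "up" = heat-induced, "down" = cold-induced
    threshold_c: float  # induction temperature quoted in the text

PROMOTERS = {
    "pL/cI857": ThermalPromoter("pL with temperature-sensitive cI857", "up", 40.0),
    "cspA": ThermalPromoter("cold-shock promoter (pCold vectors)", "down", 15.0),
}

def is_induced(promoter: str, temperature_c: float) -> bool:
    """True if the culture temperature is on the inducing side of the threshold."""
    p = PROMOTERS[promoter]
    if p.direction == "up":
        return temperature_c >= p.threshold_c
    return temperature_c <= p.threshold_c

print(is_induced("pL/cI857", 30))  # growth phase at 28-30 degrees C
print(is_induced("pL/cI857", 42))  # after the upshift
print(is_induced("cspA", 15))      # after the downshift
```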
`
`SELECTION MARKER
`To deter the growth of plasmid-free cells, a resistance marker is
`added to the plasmid backbone. In the E. coli system, antibiotic
`resistance genes are habitually used for this purpose. Resistance
`to ampicillin is conferred by the bla gene whose product is a
`periplasmic enzyme that inactivates the β-lactam ring of β-lactam
`antibiotics. However, as the β-lactamase is continuously secreted,
`degradation of the antibiotic ensues and in a couple of hours,
`ampicillin is almost depleted (Korpimaki et al., 2003). Under this
`situation, cells not carrying the plasmid are allowed to increase in
`number during cultivation. Although not experimentally verified,
`selective agents in which resistance is based on degradation, like
`chloramphenicol (Shaw, 1983) and kanamycin (Umezawa, 1979),
could also have this problem. In contrast, tetracycline has been
`shown to be highly stable during cultivation (Korpimaki et al.,
`2003), because resistance is based on active efflux of the antibiotic
`from resistant cells (Roberts, 1996).
`The cost of antibiotics and the dissemination of antibiotic
`resistance are major concerns in projects dealing with large-
`scale cultures. Much effort has been put in the development of
`antibiotics-free plasmid systems. These systems are based on the
`concept of plasmid addiction, a phenomenon that occurs when
`plasmid-free cells are not able to grow or live (Zielenkiewicz and
`Ceglowski, 2001; Peubez et al., 2010). For example, an essen-
`tial gene can be deleted from the bacterial genome and then
`placed on a plasmid. Thus, after cell division, plasmid-free bac-
`teria die. Different subtypes of plasmid-addiction systems exist
`according to their principle of function: (i) toxin/antitoxin-
`based systems, (ii) metabolism-based systems, and (iii) oper-
`ator repressor titration systems (Kroll et al., 2010). While this
promising technology has proved successful in large-scale
fermentors (Voss and Steinbuchel, 2006; Peubez et al., 2010),
`expression systems based on plasmid addiction are still not widely
`distributed.
`
`AFFINITY TAGS
`When devising a project where a purified soluble active recom-
`binant protein is needed (as is often the case), it is invaluable
`to have means to (i) detect it along the expression and purifica-
`tion scheme, (ii) attain maximal solubility, and (iii) easily purify
`it from the E. coli cellular milieu. The expression of a stretch of
`amino acids (peptide tag) or a large polypeptide (fusion partner)
`in tandem with the desired protein to form a chimeric protein may
`allow these three goals to be straightforwardly reached (Nilsson
`et al., 1997).
`Being small, peptide tags are less likely to interfere when fused
`to the protein. However, in some cases they may provoke nega-
`tive effects on the tertiary structure or biological activity of the
`fused chimeric protein (Bucher et al., 2002; Klose et al., 2004;
`Chant et al., 2005; Khan et al., 2012). Vectors are available that
`allow positioning of the tag on either the N-terminal or the
`C-terminal end (the latter option being advantageous when a
`signal peptide is positioned at the N-terminal end for secretion
`of the recombinant protein, see below). If the three-dimensional
`structure of the desired protein is available, it is wise to check
`which end is buried inside the fold and place the tag in the
`solvent-accessible end. Common examples of small peptide tags
`are the poly-Arg-, FLAG-, poly-His-, c-Myc-, S-, and Strep II-
`tags (Terpe, 2003). Since commercial antibodies are available for
`all of them, the tagged recombinant protein can be detected by
`Western blot along expression trials, which is extremely helpful
`when the levels of the desired proteins are not high enough to
`be detected by SDS-PAGE. Also, tags allow for one-step affinity
`purification, as resins that tightly and specifically bind the tags
`are available. For example, His-tagged proteins can be recovered
by immobilized metal ion affinity chromatography using Ni2+- or
Co2+-loaded nitrilotriacetic acid-agarose resins (Porath and
`Olin, 1983; Bornhorst and Falke, 2000), while anti-FLAG affinity
`gels (Sigma-Aldrich) are used for capturing FLAG fusion proteins
`(Hopp et al., 1988).
On the other hand, a non-peptide fusion partner
has the extra advantage of working as a solubility enhancer
`(Hammarstrom et al., 2002). The most popular fusion tags are
`the maltose-binding protein (MBP; Kapust and Waugh, 1999),
`N-utilization substance protein A (NusA; Davis et al., 1999),
`thioredoxin (Trx; LaVallie et al., 1993), glutathione S-transferase
`(GST; Smith and Johnson, 1988), ubiquitin (Baker, 1996) and
`SUMO (Butt et al., 2005). The reasons why these fusion partners
act as solubility enhancers remain unclear and several hypotheses
have been proposed (reviewed in Raran-Kurussi and Waugh,
`2012). In the case of MBP, it was shown that it possesses an
`intrinsic chaperone activity (Kapust and Waugh, 1999; Raran-
`Kurussi and Waugh, 2012). In comparison studies, GST showed
`the poorest solubility enhancement capabilities (Hammarstrom
`et al., 2006; Bird, 2011). NusA, MBP, and Trx display the best
`solubility enhancing properties but their large size may lead to
`the erroneous assessment of protein solubility (Costa et al., 2013).
`Indeed, when these tags are removed, the final solubility of the
`
`desired product is unpredictable (Esposito and Chatterjee, 2006).
`For these reasons, smaller tags with strong solubility enhancing
`effects are desirable. Recently, the 8-kDa calcium binding protein
`Fh8 from the parasite Fasciola hepatica was shown to be as good
`as or better than the large tags in terms of solubility enhancement.
`Moreover, the recombinant proteins maintained their solubility
`after tag removal (Costa et al., 2013). MBP and GST can be used
`to purify the fused protein by affinity chromatography, as MBP
`binds to amylose–agarose and GST to glutathione–agarose. MBP
`is present in the pMAL series of vectors from NEB and GST
in the pGEX series (GE). For fusion partners that lack a dedicated
resin, a peptide tag must be added to the fusion protein if an affinity
chromatography step is needed in the purification scheme. MBP and GST bind to
`their substrates non-covalently. On the contrary, the HaloTag7
`(Promega) is based on the covalent capture of the tag to the
`resin, making the system fast and highly specific (Ohana et al.,
`2009).
`A different group of fusion tags are stimulus-responsive tags,
`which reversibly precipitate out of solution when subjected to
`the proper stimulus. The addition of β roll tags to a recombi-
`nant protein allows for its selective precipitation in the presence
of calcium. The final products presented a high purity and
the precipitation protocol only takes a couple of minutes (Shur
et al., 2013). Another group of protein-based stimulus-responsive
purification tags is the elastin-like polypeptides (ELPs), which consist
of tandem repeats of the sequence VPGXG, where X is Val,
Ala, or Gly in a 5:2:3 ratio (Meyer and Chilkoti, 1999). These
tags undergo an inverse phase transition at a given transition
temperature (Tt). When the Tt is reached, the ELP–protein
`fusion selectively and reversibly precipitates, allowing for quick
`enrichment of the recombinant protein by centrifugation (Banki
`et al., 2005). Precipitation can also be triggered by adjusting the
`ionic strength of the solution (Ge et al., 2005). These techniques
`represent an alternative to conventional chromatography-based
`purification methods and can save production costs, especially
in large-scale settings (Fong and Wood, 2010). The main characteristics
of the tags mentioned in this section are outlined in
Table 1.
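The repeat structure of an ELP tag is easy to write out explicitly. The sketch below builds a VPGXG polymer with guest residues in the 5:2:3 Val:Ala:Gly ratio given above; the order of the guest residues within each block of ten repeats is an arbitrary assumption of ours (the text specifies only the ratio), and the function name is illustrative.

```python
from collections import Counter

# 5 Val : 2 Ala : 3 Gly per block of ten repeats.
# The ORDER within the block is an arbitrary assumption for illustration.
GUEST_BLOCK = "VVVVVAAGGG"

def elp_sequence(n_repeats: int) -> str:
    """Amino acid sequence of an ELP with n_repeats pentapeptide units."""
    if n_repeats % len(GUEST_BLOCK) != 0:
        raise ValueError("n_repeats must be a multiple of 10 to keep the 5:2:3 ratio")
    guests = GUEST_BLOCK * (n_repeats // len(GUEST_BLOCK))
    return "".join(f"VPG{x}G" for x in guests)

seq = elp_sequence(110)
print(len(seq))               # total residues for 110 repeats
print(Counter(seq[3::5]))     # guest residue composition
```

For 110 repeats this gives 550 residues, the size listed for ELPs in Table 1.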
`
`TAG REMOVAL
`If structural or biochemical studies on the recombinant protein
`are needed, then the fusion partner must be eliminated from the
recombinant protein. Peptide tags may need to be removed too because
they can interfere with protein activity and structure (Wu and
Filutowicz, 1999; Perron-Savard et al., 2005), although in other cases
they can be left in place even for crystallographic studies (Bucher
et al., 2002; Carson et al., 2007). Tags can be eliminated by either enzymatic cleavage
`or chemical cleavage.
`In the case of tag removal by enzyme digestion, expression
`vectors possess sequences that encode for protease cleavage sites
`downstream of the gene coding for the tag. Enterokinase, throm-
`bin, factor Xa and the tobacco etch virus (TEV) protease have all
`been successfully used for the removal of peptide tags and fusion
`partners (Jenny et al., 2003; Blommel and Fox, 2007). Choosing
`among the different proteases is based on specificity, cost, number
`of amino acids left in the protein after cleavage and ease of removal
after digestion (Waugh, 2011). Enterokinase and thrombin were
popular in the past, but the use of His-tagged TEV has become an
everyday choice because it is highly specific (Parks et al., 1994), is
easy to produce in large quantities (Tropea et al., 2009) and leaves
only a serine or glycine residue (or even the natural N-terminus)
after digestion (Kapust et al., 2002).
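To make the outcome of a TEV digestion concrete, the sketch below locates the protease site in a fusion protein and returns the two products. It assumes the commonly used recognition sequence ENLYFQ followed by Gly or Ser, with cleavage between Gln and the Gly/Ser; this sequence is not given in the text above, and the example construct and function name are hypothetical.

```python
import re

# Assumed canonical TEV site: ENLYFQ, cut before the following G or S
TEV_SITE = re.compile(r"ENLYFQ(?=[GS])")

def tev_cleave(sequence: str):
    """Split a fusion protein at the first TEV site; return it whole if none is found."""
    match = TEV_SITE.search(sequence)
    if match is None:
        return [sequence]
    cut = match.end()  # position right after the Gln of the site
    return [sequence[:cut], sequence[cut:]]

# Hypothetical construct: His6 tag + TEV site + start of the target protein
fusion = "MHHHHHH" + "ENLYFQG" + "MKTAYIAK"
print(tev_cleave(fusion))
```

The released target keeps only the Gly (or Ser) of the site at its N-terminus, which is the single extra residue mentioned above.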
`As the name implies, in chemical cleavage the tag is removed
`by treatment of the fusion protein with a chemical reagent. The
`advantages of using chemicals for this purpose are that they are
`easy to eliminate from the reaction mixture and are cheap in com-
`parison with proteolytic enzymes, which makes them an attractive
`choice in the large-scale production of recombinant proteins
`(Rais-Beghdadi et al., 1998). However, the reaction conditions
`are harsh, so their use is largely restricted to purified recombi-
`nant proteins obtained from IBs. They also often cause unwanted
`protein modifications (Hwang et al., 2014). The most common
`chemical cleavage reagent is cyanogen bromide (CNBr). CNBr
`cleaves the peptide bond C-terminal to methionine residues, so
`this amino acid should be present between the tag and the protein
`of interest (Rais-Beghdadi et al., 1998). Also, the target protein
`should not contain internal methionines. CNBr cleavage can be
`performed in common denaturing conditions (6 M guanidinium
`chloride) or 70% formic acid or trifluoroacetic acid (Andreev et al.,
`2010). Other chemical methods for protein cleavage can be found
`in Hwang et al. (2014).
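The design constraints for CNBr cleavage (a methionine at the junction, no methionines inside the target) can be checked in silico before cloning. The following is a minimal sketch with hypothetical sequences and function names; it reports fragments as plain sequences and does not model the chemical conversion of the cleaved methionine.

```python
from typing import List

def cnbr_fragments(sequence: str) -> List[str]:
    """Fragments expected when every peptide bond C-terminal to Met is cleaved."""
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        if residue == "M":
            fragments.append(sequence[start:i + 1])  # fragment ends with the Met
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])           # C-terminal remainder
    return fragments

def has_internal_met(target: str) -> bool:
    """True if the target protein itself would be cut by CNBr."""
    return "M" in target

tag, target = "HHHHHH", "GSKLAVW"       # hypothetical tag and Met-free target
fusion = tag + "M" + target             # Met placed at the junction
print(has_internal_met(target))
print(cnbr_fragments(fusion))
```

If `has_internal_met` returns True for the protein of interest, CNBr would fragment it, and an enzymatic strategy is the safer choice.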
`
`THIRD QUESTION: WHICH IS THE APPROPRIATE HOST?
`A quick search in the literature for a suitable E. coli strain to use
`as a host will yield dozens of possible candidates. All of them
`have advantages and disadvantages. However, something to keep
`in mind is that many are specialty strains that are used in specific
`situations. For a first expression screen, only a couple of E. coli
`strains are necessary: BL21(DE3) and some derivatives of the K-12
`lineage.
`The history of the BL21 and BL21(DE3) strains was beautifully
`documented in Daegelen et al. (2009) and we recommend this
`article to the curious. BL21 was described by Studier in 1986 after
`various modifications of the B line (Studier and Moffatt, 1986),
`which in turn Daegelen et al. (2009) traced back to d’Herelle.
`A couple of genetic characteristics of BL21 are worthy of men-
`tion. Like other parental B strains, BL21 cells are deficient in the
`Lon protease, which degrades many foreign proteins (Gottesman,
`1996). Another gene missing from the genome of the ancestors of
`BL21 is the one coding for the outer membrane protease OmpT,
`whose function is to degrade extracellular proteins. The liberated
`amino acids are then taken up by the cell. This is problematic
`in the expression of a recombinant protein as, after cell lysis,
`OmpT may digest it (Grodberg and Dunn, 1988). In addition,
`plasmid loss is prevented thanks to the hsdSB mutation already
`present in the parental strain (B834) that gave rise to BL21. As a
result, DNA methylation and degradation are disrupted. When the
`gene of interest is placed under a T7 promoter, then T7 RNAP
`should be provided. In the popular BL21(DE3) strain, the λDE3
`prophage was inserted in the chromosome of BL21 and contains
`the T7 RNAP gene under the lacUV5 promoter, as was explained
`earlier.
BL21(DE3) and its derivatives are by far the most widely used
`strains for protein expression. Still, there are reports where the
`
Table 1 | Main characteristics of protein fusion tags.

Tag | Residues/Size (kDa) | Ligand/Matrix | Purification conditions | Solubility enhancement(b)

Peptide tags
Poly-Arg | Usually 5/0.80 | Cation-exchange resin | NaCl linear gradient (0–400 mM) |
Poly-His | Usually 6/0.84 | Ni2+-nitrilotriacetic acid-agarose | 20–250 mM imidazole/low pH |
FLAG | 8/1.01 | Anti-FLAG antibody immunodecorated agarose | 2–5 mM EDTA |
Strep-tag II | 8/1.06 | Specially engineered streptavidin (Strep-Tactin) | 2–25 mM desthiobiotin |
c-myc | 11/1.20 | Anti-myc antibody immunodecorated agarose | Low pH |
S-tag | 15/1.75 | S-protein (RNase A, residues 21–124) agarose | 3 M guanidinium thiocyanate; 0.2 M potassium citrate buffer, pH 2 or 3 M MgCl2 |

Fusion partners(a)
Fh8 | 69/8.0 | Ca2+-dependent binding to phenyl-Sepharose | 10 mM EDTA | ND
Trx | 109/11.7 | 4-amino phenylarsine oxide agarose (alternatively an affinity tag can be added) | 5–1000 mM β-mercaptoethanol | +++
SUMO | ca. 100/12.0 | An affinity tag must be added (usually His-tag) | | ++++
BRT17 (β roll tag) | 153/14.7 | | Precipitation in the presence of 25–75 mM Ca2+ | ND
GST | 211/26.0 | Glutathione–agarose | 10–20 mM reduced glutathione | +
HaloTag7 | ca. 300/34.0 | Chloroalkane ligand attached to agarose | A protease cleavage site is added between the tag and the protein for in-column cleavage | ND
MBP | 396/ca. 42.5 | Cross-linked amylose | 10 mM maltose | +++
ELPs | 550 (for 110 repeats)/ca. 47.0 | | Precipitation by temperature shifts and/or high concentrations of NaCl (>1.5 M) | ND
NusA | 495/54.8 | An affinity tag must be added (usually His-tag) | | ++

(a) Number of residues and size of fusion partners are approximate in some cases, as many variants exist.
(b) The grading in the solubility enhancement column is based on the results of Bird (2011); ND, not determined.