Market value and patent citations
`Bronwyn H. Hall*
`Adam Jaffe**
`Manuel Trajtenberg***
`This paper explores the usefulness of patent citations as a measure of the “importance” of a firm’s
`patents, as indicated by the stock market valuation of the firm’s intangible stock of knowledge. Using
patents and citations for 1963-1999, we estimate Tobin’s q equations on the ratios of R&D stocks to assets,
patents to R&D, and citations to patents. We find that each ratio significantly impacts market
`value, with an extra citation per patent boosting market value by 3%. Further findings indicate that
`“unpredictable” citations have a stronger effect than the predictable portion, and that self-citations are
`more valuable than external citations.
`JEL Classification: O31, O38
* UC Berkeley, NBER, and IFS; Department of Economics, 549 Evans Hall, UC Berkeley, Berkeley, CA 94720-3880,
USA; Tel. 1 510 642 3878.
** Brandeis University and NBER; Brandeis University, Waltham MA 02554-9110, USA; Tel. 781-736-
*** Tel Aviv University, NBER, and CEPR; Eitan Berglas School of Economics, Tel Aviv University,
Tel Aviv 69978, Israel; Tel. 972-3-640-9911.
`This paper follows closely in the steps of the late Zvi Griliches and owes to him the underlying vision,
`method, and pursuit of data. The data construction was partially supported by the National Science
`Foundation, via grants SBR-9413099 and SBR-9320973. We are extremely grateful to Meg Fernando of
`REI, Case Western Reserve University, for excellent assistance in matching the patent data to Compustat.
`We also acknowledge with gratitude the comments received at numerous seminars, and from two
`1. Introduction
`It is widely understood that the R&D conducted by private firms is an investment activity, the output of
`which is an intangible asset that can be labeled as the firm’s “knowledge stock.” If this asset is known to
`contribute positively to the firm’s future net cash flows, then the size of a firm’s knowledge stock should
`be reflected in the observed market value of the firm. This implies that a firm’s R&D investments should
`be capitalized in the firm’s market value. Further, since the output of the R&D investment process is
`stochastic, some of the R&D will result in the creation of more valuable knowledge capital; if this success
`is observable, then it should be reflected in greater market value bang for the R&D buck.
`Empirical testing of this formulation requires an observable proxy for R&D “success.” There is a
`considerable literature using counts of firms’ successful patent applications for this purpose. But the value
`of patent counts as a proxy for R&D success is severely limited by the very large variance in the
`significance or value of individual patents, rendering patent counts an extremely noisy indicator of R&D
`success. In this paper we utilize information on the number of subsequent citations received by a firm’s
`patents to get a better measure of R&D success. Further, because citations arrive over time, and can be
`distinguished by the identity of the citing organization, we can distinguish the impact of a firm’s patents
`on its market value according to the time path and source of subsequent citations.
`This project was made possible by the recently completed creation of a comprehensive data file on patents
`and citations, comprising all US patents granted during the period 1963-1999 (three million patents), and
`all patent citations made during 1975-1999 (about 16 million citations), as described in Hall, Jaffe, and
Trajtenberg (2001).1 On the basis of these data we construct three measures of “knowledge stocks”: the
traditional R&D and patent count stocks, and a citations stock. The last poses serious truncation problems,
since citations to a given patent typically keep coming over long periods of time, but we only observe
them until the last date of the available data; we apply correction methods developed elsewhere to deal
with this and related problems. It is important to note that in this paper we look only at a simple
“hedonic” (and hence snapshot-like) market value equation, and do not address the deeper dynamic forces
at work, as discussed by Pakes (1985); these will have to wait for future research.
1 The complete data are available on the NBER website, and also on a CD
included with Jaffe and Trajtenberg (2002). For purposes of this paper, we actually used a previous
version of the data that extends only until 1996.
`We estimate Tobin’s q “hedonic” equations on three complementary aspects of knowledge stocks: R&D
`“intensity” (the ratio of R&D stocks to the book value of assets), the patent yield of R&D (i.e., the ratio of
`patent count stocks to R&D stocks), and the average citations received by these patents (i.e., the ratio of
`citations to patent stocks). We find that each of these ratios has a statistically and economically significant
`impact on Tobin’s q. This confirms that the market values R&D inputs, values R&D output as measured
`by patents, and further values “high-quality” R&D output as measured by citation intensity.
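The three ratios can be illustrated with a short Python sketch. It is not the authors’ code: the record fields (`assets`, `rd_stock`, `patent_stock`, `citation_stock`) are hypothetical names, and setting a ratio to zero when its denominator is zero is an assumption made here purely for illustration, since firms without R&D or without patents would in practice need separate treatment.

```python
from dataclasses import dataclass


@dataclass
class FirmYear:
    """Hypothetical firm-year record; field names are illustrative only."""
    assets: float          # book value of physical assets
    rd_stock: float        # R&D capital stock
    patent_stock: float    # depreciated patent count stock
    citation_stock: float  # truncation-corrected citation stock


def knowledge_ratios(fy: FirmYear) -> dict:
    """Return R&D intensity, patent yield of R&D, and citations per patent.

    A ratio whose denominator is zero is set to 0.0 (an assumption of
    this sketch, not a rule taken from the paper).
    """
    def safe_div(num: float, den: float) -> float:
        return num / den if den > 0 else 0.0

    return {
        "rd_to_assets": safe_div(fy.rd_stock, fy.assets),
        "patents_to_rd": safe_div(fy.patent_stock, fy.rd_stock),
        "citations_to_patents": safe_div(fy.citation_stock, fy.patent_stock),
    }


example = FirmYear(assets=500.0, rd_stock=100.0, patent_stock=20.0, citation_stock=80.0)
print(knowledge_ratios(example))
# {'rd_to_assets': 0.2, 'patents_to_rd': 0.2, 'citations_to_patents': 4.0}
```

Expressing the knowledge stocks as ratios keeps each regressor scale-free, so R&D intensity, patent yield, and citation intensity can vary independently across firms of very different sizes.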
`When we look in more detail at the aspects of citation patterns that are associated with higher market
`value, we find: (i) The value of high citation intensity is disproportionately concentrated in highly cited
`patents: firms having two to three times the median number of citations per patent display a 35% value
`premium, and those with 20 citations and more command a staggering 54% market value premium. (ii)
`There are wide differences across sectors in the impact of each knowledge stock ratio on market value.
`(iii) Market value premia associated with patent citations confirm the forward-looking nature of equity
`markets: at a given point in time, market value premia are associated with future citations rather than
`those that have been received in the past, and the portion of total lifetime citations that is unpredictable
`based on the citation history at a given moment has the largest impact. (iv) Self-citations (i.e., those
`coming from down-the-line patents owned by the same firm) are more valuable than citations coming
from external patents, but this effect decreases with the size of the patent portfolio held by the firm, as might
`be expected.
The paper is organized as follows: Section 2 discusses the rationale for the use of patent and citations data
in this sort of research, and reviews previous literature. The data are described in Section 3, along with a
`discussion of truncation and its remedies. Section 4 deals with the specification of the market value
`equation, and the construction of citation stocks, including the partition into past-future and predictable-
`residual citation stocks. The empirical findings are presented in section 5: starting with a “horse race”
`between R&D, patents, and citations, we proceed to estimate the preferred specification that includes the
`three ratios, add industry effects, experiment with the various partitions of the citations stock, and finally
`look at the differential impact of self-citations. Section 6 concludes with ideas for further research.
`2. Patents, citations, and market value: where do we stand?
`Patents have long been recognized as a very rich data source for the study of innovation and technical
`change. Indeed, there are numerous advantages to the use of patent data: each patent contains highly
`detailed information on the innovation; patents display extremely wide coverage in terms of technologies,
assignees, and geography; there are already millions of them (with a flow of over 150,000 US Patent
and Trademark Office [USPTO] patent grants per year); the data contained in patents are supplied entirely
on a voluntary basis, etc. There are serious limitations as well, the most glaring being that not all
innovations are patented, simply because not all inventions meet the patentability criteria, and because the
inventor has to make a strategic decision to patent, as opposed to relying on secrecy or other means of
appropriability.2
`2 Unfortunately, we have very little idea of the extent to which patents are representative of the wider
`universe of inventions, since there is no systematic data about inventions that are not patented (see,
`however, Crepon, Duguet, and Mairesse, 1998). This is an important, wide-open area for future research.
`The large-scale use of patent data in economic research goes back to Scherer (1965), Schmookler (1966),
and Griliches (1984).3 One of the major limitations of these research programs, extremely valuable as
they have been, was that they relied exclusively on patent counts as indicators of innovative output.4
`However, it has long been recognized that innovations vary enormously in their technological and
`economic “importance” or “value,” and that the distribution of such “values” is extremely skewed. Thus
`simple patent counts are inherently limited in the extent to which they can capture such heterogeneity (see
`Griliches, Pakes, and Hall, 1987). The line of research initiated by Pakes and Schankerman (1984) using
`patent renewal data clearly revealed these features of the patent data. Patent citations suggested
`themselves as a means to tackle such heterogeneity (Trajtenberg, 1990; Albert et al., 1991), as well as a
`way to trace spillovers (Jaffe, Trajtenberg, and Henderson, 1993). In order to understand the role that
patent citations have come to play in this context, we need to look in more detail at the patent document
as a legal entity and as an information source.
A patent awards to inventors the right to exclude others from the unauthorized use of the disclosed
invention, for a predetermined period of time.5 For a patent to be granted, the innovation must fulfill the
following criteria: (i) it has to be novel in a legally defined sense6; (ii) it has to be non-obvious, in that a
skilled practitioner of the technology would not have known how to do it; and (iii) it must be useful,
meaning that it has potential commercial value. If a patent is granted, an extensive public document is
created. The front page of a patent contains detailed information about the invention, the inventor, the
assignee, and the technological antecedents of the invention, including citations to previous patents. These
citations serve an important legal function, since they delimit the scope of the property rights awarded by
the patent. Thus, if patent B cites patent A, it implies that patent A represents a piece of previously
existing knowledge upon which patent B builds, and over which B cannot have a claim. The applicant has
a legal duty to disclose any knowledge of the prior art (and thus the inventor’s attorney typically plays an
important role in deciding which patents to cite), but the decision regarding which citations to include
ultimately rests with the patent examiner, who is supposed to be an expert in the area and hence able to
identify relevant prior art that the applicant misses or conceals.7
3 The work of Schmookler involved assigning patent counts to industries, whereas Griliches’ project
entailed matching patents to a sample of Compustat firms. In both cases the resulting data used were
yearly patent counts by industries or firms. Scherer’s project involved the creation of a “technology flow
matrix” by industry of origin and industries of use.
4 Of course, that is the best they could do at the time, given the computer and data resources available.
5 Whether or not this right translates into market power depends upon a host of other factors, including
the legal strength of these rights, the speed of technical advance, the ease of imitation, etc.
6 In the US that means “first to invent,” whereas in Europe and Japan it means “first to file.”
`Thus, patent citations presumably convey information on two major aspects of innovations.8 The first is
linkages between inventions, inventors, and assignees across time and space. In particular, patent citations
`enable the quantitative, detailed study of spillovers, along geographical, institutional, and related
`dimensions. The second is that citations may be used as indicators of the “importance” of individual
`patents, thus introducing a way of gauging the enormous heterogeneity in the “value” of patents.9 In this
`paper we concentrate on the latter aspect, with only a passing reference to citations as indicators of
`spillovers when dealing with self-citations.
`7 “During the examination process, the examiner searches the pertinent portion of the ‘classified’ patent
`file. His purpose is to identify any prior disclosures of technology…which anticipate the claimed
`invention and preclude the issuance of a patent; which might be similar to the claimed invention and limit
`the scope of patent protection…; or which, generally, reveal the state of the technology to which the
`invention is directed….If such documents are found they are made known to the inventor, and are ‘cited’
`in any patent which matures from the application…Thus, the number of times a patent document is cited
`may be a measure of its technological significance.” (Office of Technology Assessment and Forecast,
`1976, p. 167).
`8 Citations allow one also to probe into other aspects of innovations, such as their “originality,”
`“generality,” links to science, etc. – see Trajtenberg, Henderson, and Jaffe (1997).
`9 The two are of course related: one may deem more “important” those patents that generate more
`spillovers, and vice versa. Most research so far has treated these two aspects separately, but clearly there
`is room to aim for an integrative approach.
`There are reasons to believe that citations convey not just technological but also economically significant
`information: Patented innovations are for the most part the result of costly R&D conducted by profit-
`seeking organizations; if firms invest in further developing an innovation disclosed in a previous patent,
`then the resulting (citing) patents presumably signify that the cited innovation is economically valuable.
`Moreover, citations typically keep coming over the long run,10 giving plenty of time to dissipate the
`original uncertainty regarding both the technological viability and the commercial worth of the cited
`innovation. Thus, if we still observe citations years after the grant of the cited patent, it must be that the
`latter had indeed proven to be valuable.
A detailed survey of inventors provides some direct evidence on citations as indicative of the presumed
links across innovations (Jaffe, Trajtenberg, and Fogarty, 2000). A set of “citing inventors” answered
`questions about their patented inventions, about the relationship of these to previous patents cited in theirs
`as well as to technologically similar “placebo” patents that were not actually cited. A second set of
`(matched) “cited inventors” answered similar questions regarding the citing patents. The results confirm
`that citations do contain significant information on knowledge flows, but with a substantial amount of
`noise. The answers revealed significant differences between the cited patents and the placebos as to
`whether the citing inventor had learned anything from the cited patent, and precisely how and what she
`learned from it. However, as many as half of all citations did not seem to correspond to any kind of
`knowledge flow, whereas one-quarter of them indicate a strong connection between citing and cited
`10 The mean backward citation lag hovers around 15 years (depending on the cohort), the median at about
`10, and 5% of citations go back 50 years and more. The forward lag is more difficult to characterize
`because of the inherent truncation, but looking at citations to the oldest cohort in the data, that of 1975,
`we see that even after 25 years citations keep coming at a non-declining rate (see Jaffe and Trajtenberg,
`2002, Ch. 13).
`There have been a small number of studies that attempted to validate the use of patent citations as
`indicators of economic impact or value. Trajtenberg (1990) related the flow of patents in computed
`tomography (CT) scanners, a major innovation in medical technology, to the estimated social surplus due
`to improvements in this technology.11 Whereas simple patent counts showed no correlation with the
`estimated surplus, citation-weighted patent counts turned out to be highly correlated with it, thus
`providing first-time evidence to the effect that citations carry information on the value of patented
`innovations. Recent work by Lanjouw and Schankerman (2003) also uses citations, along with other
`measures such as number of claims and number of countries in which an invention is patented, as a proxy
`for patent “quality.” They find that a composite measure has significant power in predicting which patents
will be renewed and which will be litigated, and hence infer that these indicators are indeed associated
`with the private value of patents. Harhoff et al. (1999) survey German patent holders of US patents that
`were also filed in Germany, asking them to estimate the price at which they would have been willing to
`sell the patent right three years after filing. They find that the estimated value is correlated with
`subsequent citations, and that the most highly cited patents are very valuable, with a single citation
`implying an average value of about $1 million. Giummo (2003) examines the royalties received by the
`inventor/patent holders at nine major German corporations under the German Employee Compensation
`Act and reaches similar conclusions.
`There is a substantial literature relating the stock market value of firms to various measures of
`“knowledge capital,” and in particular to R&D and patents, going back to the landmark research program
initiated by Griliches and coworkers at the NBER.12 Hall (2000) offers a recent survey of this line of
work: the typical finding is that patent counts do not have as much explanatory power as R&D in a
market value equation, but they do appear to add some information above and beyond R&D. A few
papers have tried to incorporate patent citations as well, albeit in the context of small-scale studies: Shane
(1993) finds that, for a small sample of semiconductor firms in 1977-1990, patents weighted by citations
have more predictive power in a Tobin’s q equation than simple patent counts, entering significantly even
when R&D stock is included. Citation-weighted patents also turned out to be more highly correlated with
R&D than simple patent counts, implying that firms invest more effort in patented innovations that
ultimately yield more citations. Finally, Austin (1993) finds that citation-weighted counts enter positively
but not significantly in an event study of patent grants in the biotechnology industry.
11 Consumer surplus was derived from an estimated discrete choice model of demand for CT scanners,
based on purchases of scanners by US hospitals. Innovation manifested itself in the sale of improved
scanners over time, i.e., scanners having better characteristics (e.g., speed and resolution).
12 See, among others, Griliches (1981), Pakes (1985), Jaffe (1986), Griliches, Pakes, and Hall (1987),
Connolly and Hirschey (1988), Griliches, Hall, and Pakes (1991), Hall (1993a), Hall (1993b), and
Blundell, Griffith, and van Reenen (1999).
`3. Data
`For the purposes of this project we have brought together two large datasets and linked them via an
`elaborate matching process: the first is all patents granted by the USPTO between 1965 and 1996,
`including their patent citations; the second is firm data drawn from Compustat, including market value,
`assets, and R&D expenditures. The matching of the two sets (by firm name) proved to be a formidable,
`large-scale task, that tied up a great deal of our research efforts for a long time: Assignees obtain patents
`under a variety of names (their own and those of their subsidiaries), and the USPTO does not keep a
`unique identifier for each patenting organization from year to year. In fact, the initial list of corporate
`assignees of the 1965-1995 patents included over 100,000 entries, which we sought to match to the names
`of the approximately 6,000 manufacturing firms on the Compustat files, and to about 30,000 of their
`subsidiaries (obtained from the Who Owns Whom directory), as of 1989.13 In addition to firms patenting
under a variety of names (in some cases for strategic purposes), the difficulties in matching are
compounded by the fact that there are numerous spelling mistakes in the names, and a bewildering array
of abbreviations. As shown in Hall, Jaffe, and Trajtenberg (2001), we nevertheless succeeded in matching
over half a million patents, which represent 50-65% (depending on the year) of all patents of US origin
that were assigned to corporations during the years 1965 to 1995.14,15 Still, the results presented here
should be viewed with some caution, since they might be affected by remaining matching errors and
13 Since ownership patterns change over time, ideally one would like to match patents to firms at more
than one point in time; however, the difficulties of the matching process made it impossible to aim for
more than one match.
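The kind of name cleaning that such matching requires can be sketched in Python. This is a hypothetical illustration, not the procedure actually used to build the data: the suffix and abbreviation lists, the exact-then-fuzzy strategy, and the 0.9 similarity cutoff (via the standard library’s `difflib.get_close_matches`) are all assumptions. The `firms` dictionary maps both parent and subsidiary names to a parent identifier, loosely mirroring the use of a subsidiary directory.

```python
import difflib
import re

# Hypothetical cleaning rules (illustration only).
LEGAL_SUFFIXES = {"INC", "INCORPORATED", "CORP", "CORPORATION", "CO",
                  "COMPANY", "LTD", "LIMITED", "PLC"}
ABBREVIATIONS = {"INTL": "INTERNATIONAL", "MFG": "MANUFACTURING",
                 "TECH": "TECHNOLOGY"}


def normalize_name(name: str) -> str:
    """Uppercase, turn '&' into AND, strip punctuation, expand
    abbreviations, and drop trailing legal-form suffixes."""
    name = name.upper().replace("&", " AND ")
    name = re.sub(r"[^A-Z0-9 ]", " ", name)  # punctuation becomes a space
    tokens = [ABBREVIATIONS.get(tok, tok) for tok in name.split()]
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def match_assignees(assignees, firms, cutoff=0.9):
    """Map assignee names to firm identifiers.

    firms maps parent and subsidiary names to the parent's identifier.
    Exact matches on normalized names are tried first; otherwise the
    closest normalized name is accepted if its similarity ratio is at
    least `cutoff` (this tolerates small spelling mistakes).
    """
    index = {normalize_name(name): firm_id for name, firm_id in firms.items()}
    keys = list(index)
    matches = {}
    for assignee in assignees:
        key = normalize_name(assignee)
        if key in index:
            matches[assignee] = index[key]
            continue
        close = difflib.get_close_matches(key, keys, n=1, cutoff=cutoff)
        if close:
            matches[assignee] = index[close[0]]
    return matches


firms = {"ACME International Corporation": "F1",
         "Widget Manufacturing and Supply Company": "F1",  # subsidiary of F1
         "Gamma Optics Inc": "F2"}
print(match_assignees(["Acme Intl. Corp", "Widget Mfg. & Supply Co.",
                       "Gamma Optcs, Inc.", "Zeta Labs"], firms))
```

Fuzzy matches of this kind can also produce false positives, so candidate matches for large patentees would still call for manual review.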
`The Compustat data comprise all publicly traded firms in the manufacturing sector (SIC 2000-3999)
`between 1976 and 1995. After dropping duplicate observations and partially owned subsidiaries, and
cleaning our key variables, we ended up with an unbalanced panel of 4,864 firms (approximately
`1,700 per year). The firms are all publicly traded on the New York, American, and regional stock
`exchanges, or over-the-counter on NASDAQ. The main Compustat variables used here are the market
`value of the firm at the close of the year, the book value of the physical assets, and the book value of the
`R&D investment. The market value is defined as the sum of the common stock, the preferred stock,16 the
`long-term debt adjusted for inflation, and the short-term debt net of assets. The book value is the sum of
net plant and equipment, inventories, and investments in unconsolidated subsidiaries, intangibles, and
others (all adjusted for inflation).17 The R&D capital stock is constructed using a declining balance
formula and the past history of R&D spending with a 15% depreciation rate (for details see Hall, 1990).
Using the patents and citation data matched to the Compustat firms, we constructed patent stocks and
citation-weighted patent stocks, applying the same declining balance formula used for R&D (also with a
depreciation rate of 15%). Our patent data go back to 1964, and the first year for which we used a patent
stock variable in the pooled regressions was 1975, so the effect of the missing initial condition (i.e.,
patents prior to 1964) should be small for the patent variable. The fraction of firms in our sample
reporting R&D expenditures each year hovers around 60-70%, and the fraction of firms with a positive
patent stock lies in the same range.18 The yearly fraction of firms with current patent applications is about
35-40%, the percentage dropping steeply toward the end of the period because of the application-grant lag.
14 That is, the 573,000 matched patents constitute 50-65% of all assigned patents (about one-quarter don’t
have an assignee) granted to US corporate inventors. Since Compustat includes firms that are traded in
the US stock market only, most US patents of foreign origin are obviously not matched. The percentage
matched is rather high, considering that the matching was done only to manufacturing firms, and only to
those listed in Compustat.
15 In order to ensure that we picked up all the important subsidiaries, we examined and sought to assign
all unmatched patenting organizations that had more than 50 patents during the period. A spot check of
firms in the semiconductor industry suggests that our total patent numbers are fairly accurate, except for
some firms for which we found a 5-15% undercount, due primarily to changing ownership patterns after
1989; see Hall and Ziedonis (2001).
16 That is, the preferred dividends capitalized at the preferred dividend rate for medium risk companies
given by Moody’s.
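A minimal sketch of the declining balance (perpetual inventory) recursion, assuming the timing convention K_t = (1 − δ)K_{t−1} + F_t, where F_t is the year-t flow of R&D spending, patents, or citation-weighted patents, and δ = 0.15 as in the text. The timing convention and the zero default for the initial stock are assumptions of this sketch; the construction actually used is described in Hall (1990).

```python
def declining_balance_stock(flows, delta=0.15, initial_stock=0.0):
    """Return the stocks K_t = (1 - delta) * K_{t-1} + F_t for each year t.

    flows: chronological yearly flows (R&D spending, patent counts, or
    citation-weighted patent counts). initial_stock is the stock at the
    end of the year before the first flow; the zero default stands in for
    the unobserved pre-sample history (an assumption of this sketch).
    """
    stocks = []
    k = initial_stock
    for flow in flows:
        k = (1.0 - delta) * k + flow  # depreciate last year's stock, add this year's flow
        stocks.append(k)
    return stocks


print(declining_balance_stock([100.0, 100.0, 100.0]))  # [100.0, 185.0, 257.25]

# Weight left on an unobserved initial stock after eleven years of depreciation:
print(declining_balance_stock([0.0] * 11, initial_stock=1.0)[-1])
```

Because any unobserved initial stock is multiplied by (1 − δ) every year, eleven years of 15% depreciation leave only 0.85^11 ≈ 0.17 of it, which is why a missing initial condition matters less the longer the observed history before the estimation sample begins.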
`Dealing with truncation
`Patent data pose two types of truncation problems, one regarding patent counts, the other citation counts.
`The first stems from the fact that there is a significant lag between patent applications and patent grants
`(averaging lately about two years). Thus, as we approach the last year for which there are data available
`(e.g., 1995 in the data used here), we observe only a small fraction of the patents applied for that
eventually will be granted.19 As shown in Appendix A, correcting for this sort of truncation bias is
relatively straightforward, and essentially involves using the application-grant empirical distribution to
compute “weight factors.” Thus, using the results reported there, a patent count for, say, 1994 would
be adjusted upwards by a factor of 1.166, implying that about 14% of the patents applied for in 1994
(equivalently, an extra 17% relative to the count observed by 1995) are expected to be granted after 1995,
the last year of the data.
17 These intangibles are normally the goodwill and excess of market over book value from acquisitions,
and do not include the R&D investment of the current firm, although they may include some value for the
results of R&D by firms that have been acquired by the current firm.
18 Even though there is substantial overlap between firms reporting R&D and those with patent stocks, the
two sets are not nested: 19% of the firms with R&D stocks have no patents, while 13% of the firms with
patent stocks report no R&D.
19 Of course, the difficulty stems from the fact that we do not observe patent applications (and even if we
did, we would not know which of them would eventually be granted), and that we date patents by their
application rather than by their grant year.
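The truncation adjustment for patent counts can be sketched as follows, assuming a known distribution of the application-grant lag. The lag shares below are hypothetical and are not the empirical distribution behind Appendix A. The weight factor is the inverse of the share of a cohort’s eventual grants already observed, so a factor w implies that a share 1 − 1/w of eventual grants is still pending.

```python
def weight_factor(lag_shares, max_lag_observed):
    """Inverse of the share of a cohort's eventual grants already observed.

    lag_shares[k]: assumed probability that a patent applied for in year t
    is granted in year t + k (the shares should sum to one).
    max_lag_observed: last data year minus the application year.
    """
    observed = sum(lag_shares[: max_lag_observed + 1])
    if observed <= 0:
        raise ValueError("no part of the lag distribution is observed")
    return 1.0 / observed


def adjust_count(observed_count, weight):
    """Scale a truncated patent count up to its expected eventual total."""
    return observed_count * weight


def share_granted_later(weight):
    """Share of a cohort's eventual grants that arrive after the data end."""
    return 1.0 - 1.0 / weight


# Hypothetical lag distribution: 10% granted in the application year,
# 60% one year later, 20% two years later, 10% three years later.
HYPOTHETICAL_GRANT_LAG_SHARES = [0.1, 0.6, 0.2, 0.1]

# Application year one year before the end of the data (e.g. 1994 vs. 1995):
w = weight_factor(HYPOTHETICAL_GRANT_LAG_SHARES, max_lag_observed=1)
print(round(w, 3), round(adjust_count(100, w), 1))  # 1.429 142.9
```

Applied to the factor quoted in the text, `share_granted_later(1.166)` is about 0.142: roughly 14% of the eventual 1994 grants, or about 17% of the count observed by 1995.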
`Citation counts are inherently truncated, since patents keep receiving citations over long periods of time
`(in some cases even after 50 years), but we observe at best only the citations given up to the present, and
`more realistically only up to the last year of the available data. Moreover, patents applied for in different
`years suffer to different extents from this truncation bias in citations received, and hence their citation
`intensity is not comparable and cannot be aggregated. For recent patents the problem is obviously more
`acute, since we only observe the first few years of citations. Thus, a 1993 patent that received ten citations
`by 1996 (the end of our data) is likely to be a higher citation-intensity patent than a 1985 patent that
`received 11 citations within our data period. Furthermore, although our basic patent information begins in
`1964, we only have data on the citations made by patents beginning in 1976. Hence patents granted
`before 1976 experience truncation at the beginning of their citation cycle.20
`We address the problem of truncated citations by estimating the shape of the citation-lag distribution, i.e.,
`the fraction of lifetime citations (defined as the 30 years after the grant date) that are received in each year
`after patent grant. We assume that this distribution is stationary and independent of overall citation
`intensity. Given this distribution, we can estimate the total citations of any patent for which we observe a
`portion of its citation life simply by dividing the observed citations by the fraction of the population
distribution that lies in the time interval for which citations are observed.21 In the case of patents for
which we observe the prime citation years (roughly years 3-10 after grant), this should give relatively
accurate estimates of lifetime citations. On the other hand, when we observe only the first few years after
grant (which is the case for more recent patents), the estimates will be much noisier. In particular, the
estimate of lifetime citations for patents with no citations in their first few years will be exactly zero,
despite the fact that some of those patents will eventually be cited. Because of the increasing imprecision
in measuring cites per patent as we approach the end of our sample period, our pooled regressions focus
first on the 1976-1992 period, and then on the subset of years between 1979 and 1988.22
20 Thus, a 1964 patent that received ten citations between 1976 and 1996 is probably more citation-
intensive than a 1976 patent that received 11 citations over that same period.
21 The details of the estimation of the citation lag distribution and the derived adjustment to citation
intensity are described in Hall, Jaffe, and Trajtenberg (2000), Appendix D, and further adjustment
procedures are developed in Hall, Jaffe, and Trajtenberg (2001).
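A minimal sketch of the correction just described: the observed citation count is divided by the share of the lifetime (30-year) citation-lag distribution that falls inside the observed window. The lag shares here come from an arbitrary hump-shaped curve chosen only for illustration; they are not the estimated distribution of Hall, Jaffe, and Trajtenberg. A window that starts after the grant year represents a patent granted before the citation data begin, as in footnote 20.

```python
import math
from typing import List, Optional

LIFETIME_YEARS = 30  # lifetime citations = citations in the 30 years after grant

# Hypothetical hump-shaped lag distribution (illustration only):
# share of lifetime citations received k years after grant, k = 0..29.
_raw = [(k + 1) * math.exp(-k / 4.0) for k in range(LIFETIME_YEARS)]
_total = sum(_raw)
HYPOTHETICAL_CITE_LAG_SHARES = [r / _total for r in _raw]


def lifetime_citations(observed: float, first_lag: int, last_lag: int,
                       lag_shares: Optional[List[float]] = None) -> Optional[float]:
    """Estimate lifetime citations from citations observed at lags first_lag..last_lag.

    Assumes the lag distribution is stationary and independent of a patent's
    overall citation intensity; lags beyond the lifetime window are ignored.
    Returns None if no part of the distribution is observed.
    """
    shares = HYPOTHETICAL_CITE_LAG_SHARES if lag_shares is None else lag_shares
    last_lag = min(last_lag, len(shares) - 1)
    observed_share = sum(shares[first_lag:last_lag + 1])
    if observed_share <= 0:
        return None
    return observed / observed_share


# A recent patent seen only for lags 0-3 versus an older one seen for lags 0-11:
recent = lifetime_citations(10, 0, 3)
older = lifetime_citations(11, 0, 11)
print(recent > older)  # True
```

As the text notes, a patent with no citations in its observed window gets an estimate of exactly zero, and the fewer years a window covers, the more the estimate depends on the assumed lag shares.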
A first look at the data
`Table 1 shows the sample statistics for the main variables used in the analysis, for the sample of
`observations analyzed in Tables 3 through 6: as expected, both market and book value, and the various
knowledge stocks (R&D, patents, and citations), are extremely skewed, with the means exceeding the
medians by over an order of magnitude. The ratios R&D/Assets and Citations/Patents are distributed much
more symmetrically, because taking ratios nets out the systematic size effects that drive the skewness of the
levels; however, the patent yield (Patents/R&D) retains a
high degree of skewness and displays a large variance, indicating a rather weak correlation between the
two stocks. Both the dependent variable (market to book value) and the candidate regressors in the
models to be estimated exhibit a non-negligible amount of within-firm (over-time) variation, suggesting that
there is interesting “action” in both the cross-sectional and the temporal dimensions.
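The distinction between cross-sectional (between-firm) and temporal (within-firm) variation can be made concrete with a small Python sketch of the usual panel decomposition. The function and its input format, a list of (firm, value) observations, are hypothetical, and the sketch uses sample standard deviations; it is not the computation behind Table 1.

```python
import statistics
from collections import defaultdict


def within_between_sd(panel):
    """Overall, between-firm, and within-firm standard deviations of a panel variable.

    panel: iterable of (firm_id, value) pairs, one per firm-year
    (at least two firms and two observations are required).
    between: standard deviation of the firm means.
    within: standard deviation of each observation's deviation from its firm mean.
    """
    obs = list(panel)
    by_firm = defaultdict(list)
    for firm, value in obs:
        by_firm[firm].append(value)
    firm_means = {firm: statistics.fmean(vals) for firm, vals in by_firm.items()}

    overall = statistics.stdev([value for _, value in obs])
    between = statistics.stdev(list(firm_means.values()))
    within = statistics.stdev([value - firm_means[firm] for firm, value in obs])
    return overall, between, within


# Hypothetical market-to-book ratios for two firms over three years:
data = [("A", 1.2), ("A", 1.5), ("A", 1.1), ("B", 2.4), ("B", 2.0), ("B", 2.6)]
print(within_between_sd(data))
```

A within component that is small relative to the between component would mean that most identification comes from comparing firms; a non-negligible within component, as reported for these data, leaves room to exploit changes over time as well.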
`Figure 1 shows the total citation and patenting rates per real R&D spending in our sample. Patent counts
are adjusted for the application-grant lag, and citation counts are shown both corrected and uncorrected:
clearly, correcting for truncation has a dramatic impact on the series, particularly for recent years.
22 Another issue that arises in this context is that the number of citations made by each patent has been
rising over time, suggesting a kind of “citation inflation” that renders each citation less significant in later
years. In this paper we choose not to make any correction for the secular changes in citation rates, at
the cost that our extrapolation attempts become somewhat inaccurate later in the sample. For a detailed
discussion of this issue, and of econometric techniques to deal with it, see Hall, Jaffe, and Trajtenberg
`Although the earlier years (1975-1985) show a steady decline in paten

