`
`A Window into Expectations of the World Technological Frontier
`
`Honors Thesis
`Alexander M. Bell∗
`
`Departments of Economics and Computer Science
`
`Brown University
`
`Abstract
`
`A “submarine patent” is one whose issuance and publication has been intentionally delayed
`
`by its inventor with the hope that firms will independently discover and come to rely on
`
`a similar innovation at some later time, at which point the inventor causes his patent to
`
`issue and claims infringement. Although submarine patents are harmful to an innovation-led
`
`economy, previous research has struggled with how to discriminate between these subversive
`
`patents and patents whose issuance was delayed for legitimate reasons. I propose a novel
`
`identification strategy that exploits self-sorting by inventors around a 1995 policy change
`
`that was unfavorable to submarine patents. Using a regression-discontinuity design, I find
`
`that submarine patents are on average much more likely to be asserted for infringement in
`
`court cases. In addition, I conclude that submarines were more common in certain industries
`
`than others as evidenced by differential responses of industries to the policy change. Finally,
`
`I show that the failure of submarine patents within specific industries to ultimately assert
`
`infringement seems to be an indicator of which industries in the world economy experienced
`
shifts in technological paradigms during this time, providing an additional method to assess
`
`the determinants of differences in income per capita across countries. I also describe how this
`
result can be generalized across time periods.
`
`April 17, 2013
`
`∗The author acknowledges the guidance of his thesis advisor, Oded Galor, and comments from his reader in the computer
`science department, Tom Doeppner. The author is also grateful for helpful comments from Erez Yoeli and a number of friends,
`including those in other disciplines who helped relate these findings to the nature and history of technological progress in other
`fields. Finally, the author acknowledges previous work that laid the framework for this paper while studying the patent backlog
`at the USPTO in combination with Alan Marco, Stu Graham, and Conny Chen.
`
`
`
`Alex Bell
`
`1 Introduction
`
`An Autopsy on Submarine Patents
`
A little-known Texas-based company, Personalized Media Communications, filed a patent infringement
complaint against social gaming giant Zynga in early 2012, alleging infringement of four patents. The
four patents issued between June of 2010 and March of 2011, but all were filed in 1995: one on May 23,
one on June 6, and two on June 7.
`
`These four patents, and others like them, are sometimes termed “submarine patents.” This term refers
`
`to a patent whose inventor does not wish to market his invention. Instead, after filing his non-public patent
`
`application, the inventor hopes for another inventor to discover the same invention and develop it into a
`
`successful product. When this happens, like a submarine emerging from the depths, the inventor finishes
`
`the paperwork for his patent to issue and sues the now-infringing producer for a share of his profits.
`
`Submarine patenting can be thought of in the context of the free-rider problem. If the costs of developing
`
`an invention into a marketable good are high, an inventor might choose to wait for another firm to invest in
`
`development, then extract royalties from that firm’s success. Submarine patenting may also be considered
`
`to cause a deadweight loss to society in the sense that some fraction of innovators who wish to sell their
`
`products must pay an additional “tax” on their products to submarine patenters.
`
`The economic rationale for patents is to encourage innovation. The government grants a fixed-term
`
`monopoly to an inventor in exchange for fully disclosing his invention early so that society might learn from
`
`it. At the same time, there are standards for patentability. Patents must be useful, novel, and non-obvious.
`
`The applicant carries the burden of proof of patentability, but the US Patent and Trademark Office would
`
`not be doing its job of aiding innovation if it risked disenfranchising inventors of their patents. Thus, if an
`
`inventor botches his application, he is given leniency to amend it without fear of losing his patent rights to
`
`others who would file after his original application. Section 2 discusses other legitimate reasons why a patent
`
`application may be delayed in more depth.
`
`This paper proposes a novel strategy for identifying a large group of submarine patents. I first show that
`
`patent applicants intending to take advantage of this loophole self-sorted to file before a policy change that
`
`made submarining future patents infeasible. Having shown that patents on either side of the discontinuity
`
`are similar in all ways other than their likelihood of being submarine patents, I examine characteristics
`
`of submarine patents and find that they are on average much more likely to be involved in infringement
`
`litigation. However, I find starkly different litigation outcomes for different industry classes, and conclude
`
`that this is due to shifts in technological paradigms within certain industries.
`
I discuss the relevance of submarine patents to measure the expectations of technological change and
provide a strategy to generalize the results of this paper to construct a measure of technological
expectations and identify shifts in technological paradigms over time and within industries. Such
information can be useful for assessing differences
`
`in income across countries if economies that are engaged in the use of technologies that undergo paradigm
`
`shifts benefit from such technological revolutions.
`
The remainder of this paper is organized as follows. Section 2 provides background on the policy change and a
`
`review of literature within the economics of innovation. Section 3 describes the dataset of patent grants I have
`
`compiled including outward linkages to outcomes such as litigation. Section 4 puts forward a theoretical
`
`model for submarining and testable hypotheses. Section 5 conveys my findings, while Section 6 offers
`
`discussion of the wider implications of these findings to studying the economics of innovation and Section 7
`
`finally concludes.
`
`2 Background and Motivation
`
`This section describes the history of the submarine loophole before proceeding to survey relevant literature.
`
`2.1 A 21st Century Vantage Point
`
`In order for a patent system to be susceptible to submarine patents, I argue that it must have two traits:
`
`(1) While the patent application is pending, other firms must have no knowledge of it. If they do, they will
`
`not use the technology.
`
`(2) Regardless of how long the patent pends for, once granted it must have a long enough term of force to
`
`be worth enforcing against profitable firms that rely on the invention.
`
`Issue (1) was resolved by the American Inventors Protection Act of 1999. Since 2000, most US applications
`are published 18 months after filing, regardless of their status as denied, issued, or still pending.∗ This policy
`
`change seems to have been aimed at bringing the US in line with what other countries were doing, speeding
`
`the diffusion of knowledge, and reducing the feasibility of submarine patents.
`
`Issue (2), however, was resolved earlier, when an agreement was signed by member nations of what was to
`
`become the World Trade Organization in 1994. With the goal of a more globally homogeneous system of IP
`
`enforcement to foster international trade, the Agreement on Trade-Related Aspects of Intellectual Property
`
`Rights contained a number of standards for laws pertaining to copyright, patenting, and other intellectual
`
`property.
`
One of the many standards introduced was a harmonization of patent term. TRIPS, agreed on near the
end of the 1994 Uruguay Round of the General Agreement on Tariffs and Trade (GATT), mandated that
WTO members grant patent protection of at least 20 years, starting the clock at the filing date of a patent.
Prior to TRIPS, applicants in the US were granted a patent term of 17 years from issue date. President
Clinton signed the GATT on December 8, 1994, with patent term reforms set to take effect six months later,
on June 8, 1995.

∗There are a few exceptions to mandatory application publication, the most notable of which occurs when applicants certify
that they do not intend to file for the same invention in other countries.
`
The effect of the policy change was a tremendous flood of patent applications just prior to the shift. On
June 7, ten times as many applications were filed as on any other day outside the month leading up to
the change. The
`
`Appendix contains a press release from the USPTO from June 28 explaining that it received a quarter of
`
`the year’s projected filings in just nine days. From the vantage point of 1995, this is a curious anomaly. But
`
`now that most of these applications have either issued or been abandoned, we see that this cohort of patent
`
`applications differs in important ways from other cohorts, and I will argue in Section 4 that it offers a unique
`
`window into the behavior of submarine patents.
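
The incentive behind this filing rush can be illustrated with a small calculation. The sketch below is a simplification that applies only the two rules described above (17 years from issue for applications filed before June 8, 1995, and 20 years from filing afterward) and ignores any transitional provisions or term adjustments. The function names and the issue date are hypothetical and chosen only for illustration.

```python
from datetime import date

# Date on which the patent term reform took effect, as described in the text.
POLICY_DATE = date(1995, 6, 8)


def add_years(d, years):
    """Shift a date by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def expiration_date(filing, issue):
    """Simplified expiry rule: 17 years from issue before the change, 20 from filing after."""
    if filing < POLICY_DATE:
        return add_years(issue, 17)
    return add_years(filing, 20)


def years_in_force(filing, issue):
    """Approximate number of years the patent is enforceable after it issues."""
    return (expiration_date(filing, issue) - issue).days / 365.25


# Hypothetical long-pending application that issues 15 years after filing.
issue = date(2010, 6, 15)
before = years_in_force(date(1995, 6, 7), issue)  # filed one day before the change
after = years_in_force(date(1995, 6, 8), issue)   # filed on the day of the change

print(f"Filed June 7, 1995: about {before:.1f} years of enforceable term")
print(f"Filed June 8, 1995: about {after:.1f} years of enforceable term")
```

Under these assumptions, a one-day difference in filing date changes the enforceable term of a long-pending patent by more than a decade, which is why delaying issuance stops being attractive after the change.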
`
`2.2 Related Literature
`
`This section begins with a survey of metrics that other studies have used to measure patent value, then
`
discusses ways that economists believe inventors appropriate revenue from their inventions in different
industries. After a brief summary of the sparse literature on submarine patents, I overview the vast literature
`
`concerned with the effects on the economy of differing rates of technological progress.
`
`2.2.1 Measures of Patent Value
`
`The patent literature is rich with a variety of metrics for patent value and quality. Much of the earliest
`
work in defining the role of patent statistics within economics was done by Griliches (1991) and others at
`
`the NBER’s research program in productivity. The underlying motive was to better understand economic
`
`processes that lead to productivity gains – pursuing “the dream of getting hold of an output indicator of
`
`inventive activity,” in Griliches’ words. Following after Scherer (1984), who linked 15,000 patents to the
`
`443 largest US manufacturing firms in the FTC’s Line of Business Survey, Griliches and others explored
`
`outward linkages to R&D figures and stock market data for publicly traded corporations. 16 Griliches (1991)
`
`summarizes that a strong relationship can be identified at the cross-sectional level between R&D expenditure
`
`and the number of patents a firm has received. He further concludes that there may be evidence of diminishing
`
`returns to R&D expenditures. 8
`
`Hall, Jaffe, and Trajtenberg (2001) provide a more modern approach to patent data characteristic of
`
`the growing availability of digital information, particularly for patents. Of the 400 three-digit classes the
`
`USPTO groups patents into, the authors condensed the data into 36 two-digit technological sub-categories,
`
and ultimately into six higher-level categories: Chemical (excluding drugs), Computers and Communications,
Drugs and Medical, Electrical and Electronic, Mechanical, and Others. However, their study reflects
the difficulties of others who have attempted similar groupings, and they suggest that “while convenient,
the present classification should be used with great care, and reexamined critically for specific applications.”
They also discuss the usefulness of backward citations (citations a patent makes) as constituting a “paper
trail” to measure knowledge spillovers and forward citations (citations received) as indicative of the
“importance” of a patent. They put forward new measures based on Herfindahl concentration indices:
Generality, which captures how widely the citations a patent receives are spread across technology classes,
and Originality, which captures how widely the citations a patent makes are spread across classes. They
briefly discuss some validation strategies for these metrics. For example, Computers and Communications
scores high on Generality, consistent with the view that it is a general purpose technology, and high on
Originality, in accordance with a view that it tends to break traditional models in terms of innovation. 11
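
To make the two Herfindahl-based measures concrete, the following minimal sketch computes one minus the sum of squared class shares for a list of citations. Applied to the classes of citing patents it gives Generality; applied to the classes of cited patents it gives Originality. The function name and the class labels are hypothetical, and the sketch omits any bias correction for patents with few citations.

```python
from collections import Counter


def dispersion_index(citation_classes):
    """Return 1 minus the Herfindahl concentration of a list of class labels."""
    if not citation_classes:
        return None  # undefined for a patent with no citations
    counts = Counter(citation_classes)
    total = sum(counts.values())
    herfindahl = sum((n / total) ** 2 for n in counts.values())
    return 1.0 - herfindahl


# Hypothetical technology classes, for illustration only.
forward_classes = ["365", "365", "711", "438"]  # classes of patents citing this patent
backward_classes = ["365", "365", "365"]        # classes of patents this patent cites

generality = dispersion_index(forward_classes)    # shares 0.5, 0.25, 0.25
originality = dispersion_index(backward_classes)  # all citations in one class

print(generality, originality)
```

A patent whose citations all fall in a single class scores zero, while citations spread evenly over many classes push the index toward one.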
`
`Hall, Jaffe, and Trajtenberg (2000) provide further insight into outside linkages of patent data. The study
`
`found that, in predicting firms’ market value from patent counts, weighting patents by their citation counts
`
`could better predict firms’ market value, indicating that forward citations are in some way tied to a notion
`
`of a patent’s “value.” 10
`
`In a different vein, a comprehensive survey by Scherer and others of US and German firms found payment
`
`of renewal fees to be a reliable proxy for patent value. They also confirm that the distribution of patent
`
`values is highly skewed, with a few patents being extremely profitable. 17
`
`Table 1 summarizes and expands upon a dichotomy proposed in van Zeebroeck et al (2008), which
`
`classifies the strategies used by economists to view patent data as either patent-based or market-based. 18
`
`Many of these techniques are revisited in Section 4, in which the feasibility and applicability of their use for
`
`this project are discussed.
`
`2.2.2 Appropriability of Inventions
`
`Within the field of economics, two major investigations have been carried out into how firms appropriate
`
`rents from their innovations. The first, published in 1987, was a survey of 650 R&D executives in 130 different
`
`lines of business (as defined by the FTC). It is sometimes referred to as the Yale survey. 13 The second was
`
`administered in 1994 to 1478 R&D labs, and is sometimes referred to as the Carnegie Mellon survey. 5
`
The Yale survey divided its questions into product and process patents. In general, firms reported
capturing profits from product innovations with patents more often than with process innovations, perhaps
because it is more difficult (and less desirable) to keep product innovations secret. For processes, lead time
and learning advantages were rated much more useful than patents. In general, patents were rated more
important to businesses for preventing duplication by competitors than for amassing royalties. The industries
that reported relying on patents the most to capture revenues on their innovations were chemicals and drugs,
perhaps because in those industries, infringement is more clear-cut. The authors found that responses to
their appropriability survey were consistently significant predictors of industries’ R&D intensities.

Table 1: Established Metrics of Patent Value and Related Dimensions

Metric                      Description

Patent-Based
Backward Citations          citations the patent makes; commonly used to track knowledge spillovers
Forward Citations           citations the patent receives; generally accepted to be measure of “importance”
Generality                  diverse classes of forward citations
Originality                 diverse classes of backwards citations
Maintenance fee payments    indicator of how much owner values patent
Legal disputes              incidences and outcomes of infringement suits; usually believed to be
                            indicative of valuable patents
Parents                     count of parents or earliest parent’s filing date to show “entrance” into system

Market-Based
Firm value                  Stock market performance, R&D statistics, Tobin’s q
Estimated patent value      Royalties, valuation by inventors or managers, buy-outs
`
`The Carnegie Mellon survey some years later asked similar questions to the Yale survey, but extended the
`
`investigation into why firms use patents, even if they are not a valuable means of protecting their inventions
`
`(as most industries indicated). The most common reasons for not patenting were ease of inventing around
`
`products and concern over disclosing the invention. For small firms, the cost of defending their patents was
`
`a commonly raised issue. In examining individuals’ responses as to why their firms patent, the authors saw
`
`two groups naturally emerging. They define “complex” industries as those in which new products typically
`
`contain many patented inventions (eg, electronics) compared to “discrete industries” such as drugs and
`
`chemicals. Respondents in discrete industries tended to report use of patents not only for maintaining a
`
`monopoly on their innovations, but also to block rivals’ entry via similar inventions (“patent fences”). In
`
`contrast, the bulk of respondents who reported using patents to enter into licensing negotiations were in
`
`complex industries.
`
`2.2.3 Submarine Patents
`
`A limited literature exists within economics on issues relating to submarine patents. It seems the reason for
`
`the dearth of research on this topic is that submarine patents are generally difficult to identify.
`
`Graham and Mowery (2002) was one of the first papers within economics to analyze the effect of the
`
`patent application continuation process. A continuation is a legal term within patenting; it allows a rejected
`
`application to be restarted while claiming its original priority (filing) date. Of note, they found an increase in
`
`the use of continuations prior to 1995, with a sharp drop-off for applications filed after. They reported that
`
`software companies seemed to be using the continuations process the most. They also found continuations to
`
`be positively correlated with the number of forward citations, patent originality, and incidence of post-grant
`
`litigation (as measured by linkages to the Derwent patent litigation database). 7
`
`Hegde, Mowery, and Graham (2007) use more recent data, current as of 2004 (the NBER patent database
`
`described in Hall, Jaffe, and Trajtenberg (2001)). They describe the uniqueness of continuations to the US
`
`patent system and their suspected involvement with submarine patents. At the same time, they explain the
`
`stance of some patent attorneys and industry groups: these long-pending continued applications may also
`
`be the result of “high-risk investments of ‘pioneering inventors’ in ‘young’ fields of invention that are subject
`
`to uncertainty.” They admit that little empirical evidence has been brought to bear on the characteristics of
`
`those applicants who exploit the US continuation process. The paper empirically examines which types of
`
`industries have used different types of continuations: Continuations, Continuations in Part, and Divisionals.
`
`All three types introduce a delay in the application process while allowing the final patent to retain the
`
`priority date of the initial application. 12
`
`2.2.4 “Skill-Biased” Technical Change
`
A number of studies, particularly in the 1990s and early 2000s, have attempted to account for the growing wage
`
`gap observed in developed countries between skilled and unskilled workers. The gap can be characterized by
`
`an increase in wages of skilled (educated) workers above those of the unskilled, accompanied by a growing
`
`abundance of skilled workers in the labor force relative to unskilled. 2
`
`A concept of “skill-biased technical change” became a prominent explanation. The theory was that
`
certain types of workers may fare better than others during periods of rapid technological growth. Namely,
`
`economists hypothesized that more educated or otherwise more able workers could better adjust to changing
`
`workplaces. This increase in demand for skilled workers could account for the relative rise in returns to skill
`
`in the midst of a relative increase in supply of skilled workers.
`
`Caselli divides technological revolutions into two categories: skill-biased (eg, the information technology
`
`revolution) and de-skilling (eg, the assembly line, which replaced skilled artisans in the production of cars).
`
`With the focus of modeling skill-biased revolutions, he develops a model in which productivity-augmenting
`
`technology spurs increases in wages, but particularly so for quick-learning workers, while the slow-learning
`
`workers continue to use the old technology for a certain period of time (the old capital is not immediately
`
`valueless). He confirms that recent increases in wage inequality within industries are in fact associated with
`
`increased inequality of capital-to-labor ratios. Furthermore, he theorizes that education may make workers
`
`better able to adjust to new technology; if this is the case, then technological revolutions within an industry
`
`would increase the returns to education for workers in those industries. 3
`
In another model of technological change and wages, Galor and Moav theorize that regardless of whether
a technological revolution ultimately brings about a new paradigm that is biased toward or away from skill
(or ability), the transition to this new technological state will be skill (ability)-biased in the short run.
Their model predicts several effects that are confirmed by data. 6
`
`Machin and van Reenen used R&D intensities in the manufacturing sectors of seven OECD countries
`
`as a proxy for technological change. They found strong results across developed nations that industries
`
`experiencing technical change also experience a shift in composition of their labor forces favoring skilled
`
`workers, and conclude that this evidence is in line with the skill-biased technical change hypothesis. 14
`
In a similar vein, Berman, Bound, and Machin address the spread of skill-biased technical change throughout
the world economy. Using similar industry-level manufacturing data, they examine within-industry
changes among developed countries in the skilled-to-unskilled labor ratio. They find that industry changes
are highly correlated among countries. This is strong evidence that skill-biased technical change within
industries is not localized to the country from which the technology originates, but rather the change
permeates throughout the industry and affects workers in the broader world economy. The authors also offer
`
`some evidence that skill-biased technical change may account for a trend of skill upgrading witnessed in the
`
`manufacturing sectors of less-developed countries. 2
`
`Bartel and Sicherman use a wide array of measures for technical change in manufacturing industries,
`
`including use of patents, investment in R&D, and various measures of TFP (ie, residuals from estimating
`
`sales after controlling for capital). Tracking workers through the National Longitudinal Survey of Youth,
`
the authors find that after controlling for individual-level fixed effects, the relationship between
`
`wages and technological change is weakened. They conclude that a significant part of industry-level changes
`
`in wage inequality and skill composition is due to sorting by workers who move between industries. 1
`
`A major limitation of studies examining technological change on industry wages and skill compositions
`
`has been a tendency to rely exclusively on trends in the manufacturing sector, where statistics such as
`
`industries’ R&D expenditures to sales ratios, total factor productivity, and wages are easiest to measure
`
`and most freely available from the government due to the manufacturing sector’s relevance to economic
`
`indicators. 9 With the method proposed in my paper to identify technological revolutions, combined with the
`
`increased use of individual-level tax data on earnings within economics, 4 it may soon be possible to extend
`
`the literature beyond the manufacturing industry.
`
3 Data
`
`We ourselves do not put enough emphasis on the value of data and data collection in
`our training of graduate students and in the reward structure of our profession. It is
`the preparation skill of the econometric chef that catches the professional eye, not the
`quality of the raw materials in the meal, or the effort that went into procuring them.
`(Zvi Griliches, address to the American Economic Association, January 4, 1994 9)
`
In the following section, I summarize my ingredients.
`
`3.1 My data
`
`The USPTO has made remarkable strides in the past few years toward making raw patent data available to
`
`researchers. With this much data, the next question economists are confronting is what can be done with
`
`it. I first discuss what I have done with it, then what others are doing.
`
`Through a partnership with Google, the USPTO has made available the full text of every patent granted
`
`from January 1976 to present (updated weekly). To download the data, one must download and unzip nearly
`
`2,000 weekly files. None of these files are in formats that are useful to any type of research that economists
`
`would be interested in.
`
There are four different file formats, spanning different time periods of operation:

1. pftaps .txt (1365 files from 1976-2001). These are text files that are organized in some vaguely
hierarchical manner. Certain text strings recursively separate different data fields.

2. pg .sgm (54 files mid-2001 and a few in 2002). These files are in Standard Generalized Markup
Language, an ISO standard that is similar in some ways to XML.

3. pg .xml (156 files 2002-2004). Similar to above, but somewhat more compatible with modern XML
readers.

4. ipg .xml (currently about 400 files, from 2005-present). The most recent XML generation. Names
of tags were redone to be more descriptive.
`
`Weekly patent grant files were parsed in Python with a script run over the course of several days.
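
As an illustration of what parsing the most recent XML generation might involve, the sketch below is not the script used for this paper. It assumes that a weekly file is a concatenation of many standalone XML documents, one per grant, and that filing and issue dates appear under the element names shown. Those element names, the sample numbers, and the helper functions are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for a weekly file: two XML documents concatenated together.
# Element names and patent numbers are assumptions for illustration only.
SAMPLE_WEEKLY_FILE = """<?xml version="1.0" encoding="UTF-8"?>
<us-patent-grant>
  <us-bibliographic-data-grant>
    <publication-reference><document-id>
      <doc-number>07000001</doc-number><date>20060221</date>
    </document-id></publication-reference>
    <application-reference><document-id>
      <doc-number>08475000</doc-number><date>19950607</date>
    </document-id></application-reference>
  </us-bibliographic-data-grant>
</us-patent-grant>
<?xml version="1.0" encoding="UTF-8"?>
<us-patent-grant>
  <us-bibliographic-data-grant>
    <publication-reference><document-id>
      <doc-number>07000002</doc-number><date>20060221</date>
    </document-id></publication-reference>
    <application-reference><document-id>
      <doc-number>10123456</doc-number><date>20020416</date>
    </document-id></application-reference>
  </us-bibliographic-data-grant>
</us-patent-grant>
"""


def split_documents(text, marker="<?xml"):
    """Split concatenated XML documents at each XML declaration."""
    starts = []
    pos = text.find(marker)
    while pos != -1:
        starts.append(pos)
        pos = text.find(marker, pos + 1)
    ends = starts[1:] + [len(text)]
    return [text[s:e] for s, e in zip(starts, ends)]


def to_iso(yyyymmdd):
    """Convert 'YYYYMMDD' to 'YYYY-MM-DD'; pass through missing values."""
    if not yyyymmdd:
        return None
    return f"{yyyymmdd[:4]}-{yyyymmdd[4:6]}-{yyyymmdd[6:8]}"


def parse_grant(doc):
    """Extract one (patent, filing_date, issue_date) record from one grant document."""
    root = ET.fromstring(doc.encode("utf-8"))
    bib = "us-bibliographic-data-grant/"
    return {
        "patent": root.findtext(bib + "publication-reference/document-id/doc-number"),
        "issue_date": to_iso(root.findtext(bib + "publication-reference/document-id/date")),
        "filing_date": to_iso(root.findtext(bib + "application-reference/document-id/date")),
    }


records = [parse_grant(doc) for doc in split_documents(SAMPLE_WEEKLY_FILE)]
for record in records:
    print(record)
```

Splitting on the XML declaration before parsing keeps memory use bounded by the size of a single grant, which matters when a weekly file holds thousands of them. The older text and SGML formats would each need a separate reader.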
`
Figure 1: Entity-Relationship Diagram of Available Data
(Compustat linkages discussed in next section)

Figure 1 presents a stylized Entity-Relationship model of how I conceptualize the data contained within
a patent grant. In translating the data from a document-oriented model to the relational model, my
ultimate goal was to produce a relational database in Boyce-Codd Normal Form – that is, one without
unnecessary duplication of data. However, determining the data’s functional dependencies has proven more
complex than expected. For example, we know the filing date for all five million patents in the database –
let this be contained in the relation BasicInfo(patent, filing_date, issue_date), in which patent is the
primary key. Yet when a citation is recorded within a patent grant, the USPTO lists not only the cited
patent’s number, but also the cited patent’s application and issue dates. If we have a relation linking a
patent to its citations, Cites(patent, cited_patent), storing the application and issue dates of cited_patent
in that relation seems redundant, because this information can be assembled by joining the relations on
BasicInfo.patent = Cites.cited_patent. Given that the dataset contains five million patents but 55 million
citations, this kind of duplication would be enormously costly. However, this foreign key relationship does
not always hold true: a cited patent may have been issued prior to 1976, or it may be a patent in another
country. If either of these is the case, its filing and issue dates would not be contained in the BasicInfo
relation. Similar problems exist in recording details of patents associated with parent relationships and
elsewhere in the construction of the relational database. When confronted with these dilemmas, I generally
erred on the side of preserving information, at the risk of redundancy.
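
The trade-off described above can be sketched with an in-memory SQLite database. This is an illustrative schema, not the one built for this paper: the extra date columns on the citation relation, the patent identifiers, and the dates are all hypothetical. It shows why a join alone cannot recover dates for cited patents that fall outside the 1976-present grant data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE BasicInfo (
    patent      TEXT PRIMARY KEY,
    filing_date TEXT,
    issue_date  TEXT
);
-- No foreign key on cited_patent: pre-1976 and foreign patents are absent from BasicInfo.
CREATE TABLE Cites (
    patent            TEXT,
    cited_patent      TEXT,
    cited_filing_date TEXT,  -- redundant whenever cited_patent is in BasicInfo
    cited_issue_date  TEXT,  -- redundant whenever cited_patent is in BasicInfo
    PRIMARY KEY (patent, cited_patent)
);
""")

# Hypothetical rows: patent A cites B (in the database) and OLD1970 (issued before 1976).
conn.executemany(
    "INSERT INTO BasicInfo VALUES (?, ?, ?)",
    [("A", "1995-06-07", "2010-06-15"), ("B", "1990-01-02", "1992-03-03")],
)
conn.executemany(
    "INSERT INTO Cites VALUES (?, ?, ?, ?)",
    [("A", "B", "1990-01-02", "1992-03-03"), ("A", "OLD1970", None, "1971-05-04")],
)

# Compare the issue date recovered by the join with the one stored on the citation.
rows = conn.execute("""
    SELECT c.cited_patent,
           b.issue_date                               AS from_join,
           COALESCE(b.issue_date, c.cited_issue_date) AS best_available
    FROM Cites AS c
    LEFT JOIN BasicInfo AS b ON b.patent = c.cited_patent
    ORDER BY c.cited_patent
""").fetchall()

for row in rows:
    print(row)
```

For the cited patent that is present in BasicInfo, the stored dates duplicate what the join already provides. For the pre-1976 patent, the join returns nothing and only the redundant column preserves the issue date.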
`
`Another takeaway of Figure 1 is the many-to-many cardinalities that tend to characterize patent data.
`
`One patent may have many inventors, and one inventor may have many patents. The diagram shows the
`
`same for other relationships, such as assignment at grant and citations. Fields that a patent may strictly
`
`have one of are listed in the Patent entity (with the addition of primary class, shown by arrows pointing
`
`away from the Patent entity).
`
`3.1.1 Linkages External to Patent Grant from the USPTO
`
`I incorporate linkages to two relations provided by the USPTO through Google. The first is maintenance
`
`fee events. These are provided as a table indicating patent number, date of event, and type of event (eg,
`
`payment of a certain type of fee or refund for various reasons). Also relevant to the analysis, each event
`
`contains an indicator for whether the assignee claims small entity status, which affords the inventor a reduced
`
`financial burden. In the US, the patent fee schedule is highly back-loaded, as shown in Table 2. Because
`
`a relatively small number of patents are renewed to full maturity, this data offers an indication for patents
`
`that have issued several years in the past as to how much their owner values them.
`
Table 2: Summary of Current Major Fee Schedule (USD)

Type of fee          Standard Amount    Amount for Small Entity
Utility issue fee    1,770              885
Due at 3.5 years     1,150              575
Due at 7.5 years     2,900              1,450
Due at 11.5 years    4,810              2,405
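
The back-loading of the schedule can be seen by accumulating the four fees listed in Table 2. The short calculation below is only a sketch: it covers just those four fees and ignores any filing, search, or examination fees, and the variable names are chosen for illustration.

```python
from itertools import accumulate

# (fee, standard amount, small-entity amount) in USD, copied from Table 2.
FEES = [
    ("Utility issue fee", 1770, 885),
    ("Due at 3.5 years", 1150, 575),
    ("Due at 7.5 years", 2900, 1450),
    ("Due at 11.5 years", 4810, 2405),
]

standard_cumulative = list(accumulate(std for _, std, _ in FEES))
small_cumulative = list(accumulate(small for _, _, small in FEES))

standard_total = standard_cumulative[-1]
small_total = small_cumulative[-1]

# Share of the total that falls due only at the final renewal.
final_share = FEES[-1][1] / standard_total

for (name, _, _), std, small in zip(FEES, standard_cumulative, small_cumulative):
    print(f"{name:<20} cumulative: {std:>6,} standard, {small:>6,} small entity")
print(f"Final renewal is {final_share:.0%} of the standard total")
```

On these figures, nearly half of the total cost of keeping a patent in force to full maturity is not due until 11.5 years after issue, which is what makes renewal decisions informative about how the owner values an older patent.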
`
At the time of grant, the patent examiner places a patent into one of more than 400 technology
classifications, primarily for the purpose of facilitating future searches for relevant prior art. These classes are
`
`further broken down into thousands of sub-classes. The class system frequently changes as new classes are
`
`created, merged, or obsoleted. A very recent addition to the USPTO’s bulk data downloads is a file linking
`
`each patent to its current classification. These are the technology classes analyzed in this paper. Still, an
`
`overwhelming challenge to all researchers working with patent data has been finding meaningful groupings
`
`for patents by economically relevant industries, as opposed to by the USPTO’s large number of technology
`
`classes.
`
`3.2 Litigation Data
`
`No branch of the government, including the USPTO, keeps track of patent infringement suits for individual
`
`patents. However, the US court system makes documents pertinent to legal cases available online through
`
`services such as Public Access to Court Electronic Records (PACER) for a fee of a few cents per page
`
`downloaded.
`
Lex Machina is a private firm spun out of a joint research project between Stanford’s law school and
computer science department called the IP Litigation Clearinghouse, which was designed to bring transparency
`
`to IP law. The business of Lex Machina is providing IP litigation data and analytics to businesses. The
`
`company does this by crawling court records including those from PACER and district court databases daily.
`
`Using natural language processing algorithms, they determine which cases are instances of IP litigation, and
`
`in the cases of patent infringement, link the case with the patent being infringed upon (also known as the
`
`patent being “asserted”).
`
`Although Lex Machina sells this information to businesses, they agreed to contribute to this project a
`
dataset of linkages from patent numbers to legal assertions for all patents filed in 1995. In Section 5, I
examine how the likelihood of assertion changes for patents filed just before and just after the policy change.
`
`3.3 NBER Patent Data Project and Name Matching
`
`This paper does not use the NBER database, but it is useful to discuss in terms of work that has been done
`
`with similar raw data to what I am using. The work done by the PDP and made public to researchers has
`
`been instrumental in speeding the diffusion of this data through the empirical research community. The data
`
`provided by the Patent Data Project is outlined in Hall, Jaffe, and Trajtenberg (2001). It contains data
`
`primarily on patent citations and assignees.
`
`The most significant drawback of this dataset is that it is only current as of 2006. In examining long-
`
`pending patents filed around the 1995 discontinuity, the picture would look different if we examined only
`
`patents issued by 2006.
`
`A tremendous contribution of this dataset is the authors’ attempts to conduct meaningful assignee name
`
`disambiguation and matching to Compustat firms. This is not a trivial task for several reasons. Consider a
`
`company identified in Compustat as IBM. In addition to filing patents as IBM, patents they file might list
`
`as assignee:
`
`1. An unabbreviated name, such as International Business Machines
`
`2. Some formal legal name, such as IBM, Inc. (or other languages’ variants). This is extremely common.
`
`3. A division of the company (eg “IBM R&D”, “IBM Circuits Division”, or “IBM India”). This is also
`extremely common.
`
`4. A misspelling of any of the above, perhaps due to data entry error.
`
`The architects of the dataset share their name standardization routines on their site. There isn’t much
`
`to do about (1). By either removing or standardizing most common suffixes and other elements of firm
`
`names, the authors seem to do a good job of mitigating lost matches due to (2). To deal with