Case 12-4547, Document 159, 06/04/2013, 955881, Page1 of 53
`
`12-4547-cv
`
`United States Court of Appeals
`for the
`Second Circuit
`
`
`
`
`
`AUTHORS GUILD, INC., AUSTRALIAN SOCIETY OF AUTHORS
`LIMITED, UNION DES ECRIVAINES ET DES ECRIVAINS QUEBECOIS,
`ANGELO LOUKAKIS, ROXANA ROBINSON, ANDRE ROY, JAMES
`SHAPIRO, DANIELE SIMPSON, T.J. STILES, FAY WELDON,
`
`(For Continuation of Caption See Inside Cover)
`
`_______________________________
`ON APPEAL FROM THE UNITED STATES DISTRICT COURT
`FOR THE SOUTHERN DISTRICT OF NEW YORK
`
`BRIEF OF DIGITAL HUMANITIES AND LAW SCHOLARS
`AS AMICI CURIAE IN SUPPORT OF DEFENDANTS-
`APPELLEES AND AFFIRMANCE
`
`
`
`
`
`
`
`
`
`
`
`
`
`On the Brief:
`MATTHEW SAG*
`ASSOCIATE PROFESSOR
`LOYOLA UNIVERSITY OF
`CHICAGO SCHOOL OF LAW
`
`JASON SCHULTZ*
`ASSISTANT CLINICAL PROFESSOR OF LAW
`UC BERKELEY SCHOOL OF LAW
`396 Simon Hall
`Berkeley, California 94720
`(510) 642-1957
`jschultz@law.berkeley.edu
`
`Attorneys for Amici Curiae
`
`* Filed in their individual capacity and not on behalf of their institutions.
`
`
`
`
`
`
`
`

`

`
`
`AUTHORS LEAGUE FUND, INC., AUTHORS’ LICENSING AND
`COLLECTING SOCIETY, SVERIGES FORFATTARFORBUND, NORSK
`FAGLITTERAER FORFATTERO OG OVERSETTERFORENING,
`WRITERS’ UNION OF CANADA, PAT CUMMINGS, ERIK GRUNDSTROM,
`HELGE RONNING, JACK R. SALAMANCA,
`Plaintiffs-Appellants,
`
`v.
`
`HATHITRUST, CORNELL UNIVERSITY, MARY SUE COLEMAN, President,
`University of Michigan, MARK G. YUDOF, President, University of California,
`KEVIN REILLY, President, University of Wisconsin System,
`MICHAEL MCROBBIE, President, Indiana University,
`
`Defendants-Appellees,
`
`NATIONAL FEDERATION OF THE BLIND, GEORGINA KLEEGE,
`BLAIR SEIDLITZ, COURTNEY WHEELER,
`
`Intervenor Defendants-Appellees.
`
`
`
`
`
`
`
`

`

`
`TABLE OF CONTENTS
`
`
`
`TABLE OF AUTHORITIES ......................................................................................................... iv
`STATEMENT OF INTEREST OF AMICI ..................................................................................... 1
`SUMMARY OF ARGUMENT ...................................................................................................... 2
`ARGUMENT .................................................................................................................................. 4
`I. The Freedom to Make Non-expressive Use of Copyrighted Works is Vital to the “Progress of
`Science” in the Digital Humanities ................................................................................................. 4
`II. Copyright Law Does Not Protect Non-expressive Aspects of Works .................................. 14
`A. The Idea/Expression Distinction ....................................................................................... 14
`B. Section 102(b) ................................................................................................................... 15
`C. Merger and Scènes à Faire ................................................................................................ 16
`D. Fact/Expression Distinction .............................................................................................. 17
`E. Non-expressive Metadata Does Not Implicate the Statutory Rights of the Copyright
`Holder ....................................................................................................................................... 18
`F. Non-expressive Metadata Is Also Noninfringing Because It Does Not Allow the Public to
`Perceive the Expressive Content of a Work ............................................................................. 22
`III. Text Mining Creates Value by Facilitating the Advancement of Our Collective Knowledge;
`To Protect That Value, Mass Digitization and Similar Intermediate Copying for Data Mining
`and Other Non-expressive Purposes Should Be Considered "Fair Use" ...................................... 24
`A. Non-expressive Copying to Expand Our Knowledge in the Digital Humanities Is An
`Activity of the Sort that Copyright Law Should Favor, Through Fair Use .............................. 24
`B. The Nature of the Works in Question Is Favorable to the Fair Use Analysis of Mass
`Digitization for the Advancement of Digital Humanities Research and Scholarship .............. 27
`C. To the Extent Relevant, Mass Digitization Uses a Reasonable “Amount and
`Substantiality” of the Works in Question, in Light of the Socially Beneficial Purpose of
`Facilitating Data Mining for the Advancement of the Digital Humanities .............................. 28
`D. Allowing Intermediate Copying in Order to Enable Non-expressive Uses Does Not Harm
`the Market for the Original Works in a Legally Cognizable Manner, As The Practice Does Not
`Implicate the Works' Expressive Aspects in Any Way ............................................................ 29
`
`CERTIFICATE OF COMPLIANCE WITH FRAP 32(a) ............................................................ 32
`
`
`
`
`
`
`
`
`
`TABLE OF AUTHORITIES
`
`Cases
`
`A.V. ex rel. Vanderhye v. iParadigms, LLC,
`562 F.3d 630 (4th Cir. 2009) ......................................................... 4, 29, 30
`
`
`Basic Books, Inc. v. Kinko's Graphics Corp.,
`758 F. Supp. 1522 (S.D.N.Y. 1991) ........................................................ 27
`
`
`Bill Graham Archives v. Dorling Kindersley Ltd.,
`448 F.3d 605 (2d Cir. 2006) .............................................................. 25, 27
`
`
`Bond v. Blum,
`317 F.3d 385 (4th Cir. 2003) ....................................................... 26, 27, 29
`
`
`Campbell v. Acuff-Rose Music, Inc.,
`510 U.S. 569 (1994) ..................................................................... 25, 28, 30
`
`
`Cariou v. Prince, No. 11-1197-cv,___ F.3d __, slip op. at 13
`(2d Cir., April 25, 2013) ......................................................................... 25, 29
`
`Castle Rock Entm’t v. Carol Publishing Grp.,
`150 F.3d 132 (2d Cir. 1998) ........................................................ 20, 21, 22
`
`
`Davis v. United Artists, Inc.,
`547 F. Supp. 722 (S.D.N.Y. 1982) .......................................................... 23
`
`
`Eldred v. Ashcroft, 537 U.S. 186 (2003) ...................................................... 13
`
`Feist Publ’ns, Inc. v. Rural Tel. Serv. Co., Inc.,
`499 U.S. 340 (1991) ........................................................................... 17, 20
`
`
`Fisher v. Dees,
`794 F.2d 432 (9th Cir. 1986) ................................................................... 30
`
`
`Fuld v. Nat’l Broad. Co., Inc.,
`390 F. Supp. 877 (S.D.N.Y. 1975) .......................................................... 23
`
`
`
`
`Golan v. Holder,
`132 S. Ct. 873 (2012) ............................................................................... 15
`
`
`Harper & Row Publishers, Inc. v. Nation Enters.,
`471 U.S. 539 (1985) ........................................................................... 14–15
`
`
`Hasbro Bradley, Inc. v. Sparkle Toys, Inc.,
`780 F.2d 189 (2d Cir. 1985) .................................................................... 21
`
`
`Hoehling v. Universal City Studios, Inc.,
`618 F.2d 972 (2d Cir. 1980) .............................................................. 16, 18
`
`
`Kelly v. Arriba Soft Corp.,
`336 F.3d 811 (9th Cir. 2003) ............................................................. 25, 29
`
`
`Kregos v. Associated Press,
`937 F.2d 700 (2d Cir. 1991) .................................................................... 16
`
`
`Madrid v. Chronicle Books,
`209 F. Supp. 2d 1227 (D. Wyo. 2002) ..................................................... 23
`
`
`MyWebGrocer, LLC v. Hometown Info, Inc.,
`375 F.3d 190 (2d Cir. 2004) .................................................................... 17
`
`
`Nat’l Basketball Ass’n v. Motorola, Inc.,
`105 F.3d 841 (2d Cir. 1997) .............................................................. 17, 18
`
`
`New Era Publ’ns Int’l, ApS v. Carol Pub. Grp.,
`904 F.2d 152 (2d Cir. 1990) ............................................................... 27-28
`
`
`N.Y. Mercantile Exch., Inc. v. IntercontinentalExchange, Inc.,
`497 F.3d 109 (2d Cir. 2007) .................................................................... 16
`
`
`N.Y. Times Co. v. Tasini,
`533 U.S. 483 (2001) ........................................................................... 22, 23
`
`
`NXIVM Corp. v. Ross Inst.,
`364 F.3d 471 (2d Cir. 2004) .................................................................... 26
`
`
`
`
`
`Perfect 10, Inc. v. Amazon.com, Inc.,
`508 F.3d 1146 (9th Cir. 2007) ....................................................... 4, 25, 29
`
`
`Peter F. Gaito Architecture, LLC v. Simone Dev. Corp.,
`602 F.3d 57 (2d Cir. 2010) ...................................................................... 15
`
`
`Religious Tech. Ctr. v. Lerma,
`908 F. Supp. 1362 (E.D. Va. 1995) ................................................... 26, 27
`
`
`Reyher v. Children’s Television Workshop,
`533 F.2d 87 (2d Cir. 1976) ...................................................................... 15
`
`
`Sega Enters. Ltd. v. Accolade, Inc.,
`977 F.2d 1510 (9th Cir. 1992) ....................................................... 4, 28, 30
`
`
`Sony Computer Entm’t, Inc. v. Connectix Corp.,
`203 F.3d 596 (9th Cir. 2000) ............................................................... 4, 28
`
`
`Sony Corp. of Am. v. Universal City Studios, Inc.,
`464 U.S. 417 (1984) ............................................................................ 14-15
`
`
`Stromback v. New Line Cinema,
`384 F.3d 283 (6th Cir. 2004) ................................................................... 23
`
`
`Tufenkian Imp./Exp. Ventures, Inc. v. Einstein Moomjy, Inc.,
`338 F.3d 127 (2d Cir. 2003) .................................................................... 15
`
`
`Ty, Inc. v. Publ’ns Int’l Ltd.,
`292 F.3d 512 (7th Cir. 2002) ................................................................... 21
`
`
`Warner Bros. Entm’t Inc. v. RDR Books,
`575 F. Supp. 2d 513 (S.D.N.Y. 2008) ......................................... 19, 20, 21
`
`
`Walker v. Time Life Films, Inc.,
`615 F. Supp. 430 (S.D.N.Y. 1985) .......................................................... 23
`
`Statutes
`17 U.S.C. § 102(a) (2006) ............................................................................. 20
`
`17 U.S.C. § 102(b) (2006) ............................................................ 4, 14, 15, 16
`
`17 U.S.C. § 106(2) (2006) ............................................................................ 21
`
`17 U.S.C. § 107 (2006) ................................................................................. 25
`
`17 U.S.C. § 201(c) (2006) ............................................................................. 22
`
`Constitutional Provisions
`U.S. Const. Art. I, Sec. 8 ............................................................................... 13
`
`Other Authorities
`Sophia Ananiadou et al., Text Mining and its Potential
`Applications in Systems Biology,
`24 TRENDS IN BIOTECHNOLOGY 571 (2006) .................................................... 5
`
`Leif Isaksen, Elton Barker, Eric C. Kansa, Kate Byrne, GAP: A NeoGeo Approach
`to Classical Resources, 45 LEONARDO 82-83 (2012) ..................................... 7
`
`Christian Blaschke et al., Information Extraction in Molecular Biology, 3
`BRIEFINGS IN BIOINFORMATICS 154 (2002) ..................................................... 5
`
`Patricia Cohen, Digital Keys for Unlocking the Humanities’ Riches,
` N.Y. TIMES, Nov. 17, 2010, at C1 .................................................................. 7
`
`James M. Hughes, et al., Quantitative Patterns of Stylistic Influence in the
`Evolution of Literature, 109 PROC. OF THE NAT’L ACAD. OF SCI. OF
`THE U.S. 7682 (2012) ............................................................................... 10-11
`
`Matthew Jockers, MACROANALYSIS: DIGITAL METHODS AND LITERARY
`HISTORY (2013) ..................................................................................... 6, 7, 10
`
`Brian Lavoie & Lorcan Dempsey, Beyond 1923: Characteristics of
`Potentially In Copyright Print Books in Library Collections, 15 D-Lib Mag.,
`http://www.dlib.org/dlib/november09/lavoie/11lavoie.html ........................ 28
`
`Pierre N. Leval, Toward A Fair Use Standard,
`103 HARV. L. REV. 1105 (1990) ................................................................... 26
`
`Steve Lohr, Dickens, Austen and Twain, Through a Digital Lens,
`N.Y. TIMES, Jan. 26, 2013, at BU3, available at
`http://www.nytimes.com/2013/01/27/technology/literary-history-seen-through
`-big-datas-lens.html?pagewanted=all&_r=2&. ............................................ 11
`
`
`MALLET: MAchine Learning for LanguagE Toolkit, http://mallet.cs.umass.edu/
`(last visited May 31, 2013) ........................................................................... 10
`
`Jean-Baptiste Michel, Yuan Kui Shen, Aviva Presser Aiden, Adrian Veres,
`Matthew K. Gray, The Google Books Team, Joseph P. Pickett, Dale Hoiberg,
`Dan Clancy, Peter Norvig, Jon Orwant, Steven Pinker, Martin A. Nowak, and
`Erez Lieberman Aiden, Quantitative Analysis of Culture Using Millions of
`Digitized Books, 331 SCIENCE 176 (2011) .................................................... 10
`
`MONK: Metadata Offer New Knowledge, http://www.monkproject.org/
`(last visited May 31, 2013) ........................................................................... 10
`
`Franco Moretti, Graphs, Maps, Trees: Abstract Models for
`Literary History (2005) ................................................................................... 6
`
`Toshihide Ono et al., Automated Extraction of Information on Protein–Protein
`Interactions from the Biological Literature,
`17 BIOINFORMATICS 155 (2001) ...................................................................... 5
`
`Matthew Sag, Copyright and Copy-Reliant Technology,
`103 NW. U.L. REV. 1607 (2009) ..................................................................... 3
`
`Matthew Sag, Orphan Works as Grist for the Data Mill,
`27 BERKELEY TECH. L. J. 1503 (2012) ........................................................... 3
`
`Software Environment for the Advancement of Scholarly Research (“SEASR”)
`http://seasr.org (last visited May 31, 2013) .................................................. 10
`
`Text Analysis Portal for Research (“TAPoR”), http://www.tapor.ca/portal/
`portal (last visited July 2, 2012) .................................................................... 10
`
`Tracking 18th-century “social network” through letters, STANFORD
`UNIVERSITY (Dec. 14, 2009) (video), http://www.youtube.com/watch?v=
`nw0oS-AOIPE ................................................................................................ 7
`
`
`
`
`
`
`STATEMENT OF INTEREST OF AMICI1
`Amici are over 100 professors and scholars who teach, write, and research in
`
`computer science, the digital humanities, linguistics or law, and two associations
`
`that represent Digital Humanities scholars generally.2 Amici have an interest in this
`
`case because of its potential impact on their ability to discover and understand,
`
`through automated means, the data in and relationships among textual works.
`
`Legal Scholar Amici also have an interest in the sound development of intellectual
`
`property law. Resolution of the legal issue of copying for non-expressive uses has
`
`far-reaching implications for the scope of copyright protection, a subject germane
`
`to Amici’s professional interests and one about which they have great expertise.
`
`Amici speak only to the issue of copying for non-expressive uses. A complete list
`
`of individual amici is attached as Appendix A.
`
`
`
`
`1 Pursuant to Fed. R. App. P. 29(a), (c)(4), (c)(5) and Rule 29.1 of the Local Rules
`of the United States Court of Appeals for the Second Circuit, Amici hereby state
`that none of the parties to this case nor their counsel authored this brief in whole or
`in part; no party or any party’s counsel contributed money intended to fund
`preparing or submitting the brief; and no one else other than Amici and their
`counsel contributed money that was intended to fund preparing or submitting this
`brief. Amici also hereby state that all parties have consented to the filing of this
`brief, and we rely on that consent as our source of authority to file.
`2 See Association for Computers and the Humanities, http://www.ach.org/;
`Canadian Society for Digital Humanities, http://csdh-schn.org.
`
`
`
`
`
`
`
`SUMMARY OF ARGUMENT
`
`Mass digitization, especially by libraries, is a key enabler of socially
`
`valuable computational and statistical research (often called “data mining” or “text
`
`mining”). While the practice of data mining has been used for several decades in
`
`traditional scientific disciplines such as astrophysics and in social sciences like
`
`economics, it has only recently become technologically and economically feasible
`
`within the humanities. This has led to a revolution, dubbed “Digital Humanities,”
`
`ranging across subjects from literature and linguistics to history and philosophy.
`
`New scholarly endeavors enabled by Digital Humanities advancements are still in
`
`their infancy but have enormous potential to contribute to our collective
`
`understanding of the cultural, political, and economic relationships among various
`
`collections (or corpora) of works—including copyrighted works—and with society.
`
`The Court’s ruling in this case on the legality of mass digitization could
`
`dramatically affect the future of work in the Digital Humanities.
`
`This Court should affirm the decision of the district court below that library
`
`digitization for the purpose of text mining and similar non-expressive uses presents
`
`no legally cognizable conflict with the statutory rights or interests of the copyright
`
`holders. Where, as here, the output of a database—i.e., the data it produces and
`
`displays—is noninfringing, this Court should find that the creation and operation
`
`of the database itself is likewise noninfringing. The copying required to convert
`
`paper library books into a searchable digital database is properly considered a
`
`“non-expressive use” because the works are copied for reasons unrelated to their
`
`protectable expressive qualities — the copies are intermediate and, as far as is
`
`relevant here, unread.
`
`The type of non-expressive use at issue here – statistical analysis of text – is
`
`common among copy-reliant technologies: for example, Internet search engines
`
`and plagiarism detection software do not read, understand, or enjoy copyrighted
`
`works, nor do they deliver these works directly to the public. Such platforms copy
`
`the works only incidentally, in order to process them as “grist for the mill”—raw
`
`materials that feed various algorithms and indices. See Matthew Sag, Copyright
`
`and Copy-Reliant Technology, 103 NW. U.L. REV. 1607 (2009); Matthew Sag,
`
`Orphan Works as Grist for the Data Mill, 27 BERKELEY TECH. L. J. 1503 (2012).
`
`Further, generating data about a copyrighted work (often called “metadata”)
`
`does not infringe the original work because, as has been recognized for over a
`
`century, copyright law protects only an author’s original expression, not facts. That
`
`a “fact” might pertain to or describe an expressive work does not change its factual
`
`character—or render it an author’s exclusive intellectual property under the law.
`
`Indeed, making such factual information freely available to all is crucial to the
`
`harmony between copyright law and the First Amendment—hence the existence of
`
`rules like the “idea/expression” distinction (see 17 U.S.C. § 102(b)), the doctrine of
`
`scènes à faire, and the “merger” principle.
`
`The act of copying works into a database in order to enable the generation of
`
`metadata about those works should thus be deemed noninfringing. As numerous
`
`courts have found, making intermediate copies that enable socially beneficial
`
`noninfringing uses and/or outputs constitutes a protected “fair use” under Section
`
`107 of the Copyright Act. See, e.g., A.V. ex rel. Vanderhye v. iParadigms, LLC,
`
`562 F.3d 630, 645 (4th Cir. 2009); Perfect 10, Inc. v. Amazon.com, Inc., 508 F.3d
`
`1146, 1168 (9th Cir. 2007); Sony Computer Entm’t, Inc. v. Connectix Corp., 203
`
`F.3d 596, 609 (9th Cir. 2000); Sega Enters. Ltd. v. Accolade, Inc., 977 F.2d 1510,
`
`1527-28 (9th Cir. 1992). Similarly, the mass digitization of books for text-mining
`
`purposes is a form of incidental or “intermediate” copying that enables ultimately
`
`non-expressive, non-infringing, and socially beneficial uses without unduly
`
`treading on any expressive—i.e., legally cognizable—uses of the works. The Court
`
`should find such copying to be fair use.
`
`ARGUMENT
`
`I. The Freedom to Make Non-expressive Use of Copyrighted Works is
`Vital to the “Progress of Science” in the Digital Humanities
`
`
`
`Where large-scale electronic text collections are available, advances in
`
`computational power and a proliferation of new text-mining and visualization tools
`
`offer scholars of the humanities the chance to do what biologists, physicists, and
`
`economists have been doing for decades—analyze massive amounts of data.
`
`
`
` “Digital Humanities” scholars fervently believe that text mining and the
`
`computational analysis of text are vital to the progress of human knowledge in the
`
`current Information Age. The potential of these non-expressive uses of text has
`
`already been revealed in the life sciences, where researchers routinely use a variety
`
`of text-mining tools to facilitate the search for relevant research across disparate
`
`fields and to uncover previously unnoticed “correlations or associations such as
`
`protein-protein interactions and gene-disease associations.” See Sophia Ananiadou
`
`et al., Text Mining and its Potential Applications in Systems Biology, 24 TRENDS IN
`
`BIOTECHNOLOGY 571, 571 (2006) (citing Toshihide Ono et al., Automated
`
`Extraction of Information on Protein–Protein Interactions from the Biological
`
`Literature, 17 BIOINFORMATICS 155 (2001) and Christian Blaschke et al.,
`
`Information Extraction in Molecular Biology, 3 BRIEFINGS IN BIOINFORMATICS 154
`
`(2002)).
`
`Similar breakthroughs are on the horizon in the humanities. Traditionally,
`
`literary scholars have relied upon the close and often anecdotal study of select
`
`works. Modern computing power, advances in computational linguistics and
`
`natural language processing, and the mass digitization of texts now permit
`
`investigation of the larger literary record.
`
`Digitization enhances our ability to process, mine, and ultimately better
`
`understand individual texts, the connections between texts, and the evolution of
`
`literature and language. As University of Nebraska Professor Matthew Jockers
`
`explains, by exploring the literary record writ large, researchers can better
`
`understand the context in which individual texts exist, and thereby better
`
`understand the texts themselves. See Matthew Jockers, MACROANALYSIS: DIGITAL
`
`METHODS AND LITERARY HISTORY (2013). Along similar lines, Stanford University
`
`Professor Franco Moretti has noted that “a field this large cannot be understood by
`
`stitching together separate bits of knowledge about individual cases, because it
`
`isn’t a sum of individual cases: it’s a collective system, that should be grasped as
`
`such, as a whole . . . .” Franco Moretti, GRAPHS, MAPS, TREES: ABSTRACT MODELS
`
`FOR LITERARY HISTORY 4 (2005) (emphasis in original).
`
`Researchers working in the field of information retrieval frequently use text
`
`mining and computer-aided classification to identify and retrieve relevant
`
`documents. Using similar techniques, researchers in the Digital Humanities are
`
`able to identify and retrieve relevant texts, often from unlikely places. Humanities
`
`researchers can thereby expand their traditional study of a few canonical works to a
`
`study of several million in the larger archive of literary history—an archive that
`
`has hitherto remained hidden because of the limitations of humans’ reading
`
`capacity. As part of this process, such non-expressive uses often lead to additional
`
`expressive uses, expanding the audience (and the potential market) for enjoyment
`
`of individual works.3
`
`Mass digitization also results in the creation of data that enables scholars to
`
`reimagine relationships between texts—for example, by linking texts with maps.
`
`Thus, Google’s “Ancient Places Project” links the text of public domain books like
`
`Gibbon’s Decline and Fall of the Roman Empire to a map of the ancient world.4
`
`The interface allows the user to browse the books, including the full text, at the
`
`same time as she browses a map. The places mentioned are marked on the map and
`
`hyperlinked.5 Similar maps could be made with reference to works still under
`
`copyright—importantly, without ever making the text of the book available for free
`
`viewing. Extracting such data from texts to create these maps is a quintessential
`
`non-expressive use of the underlying texts that does not implicate any copyright-
`
`protected use—let alone infringe the copyrights of—the works in question.
`
`Google’s “Ngram” tool provides another example of a non-expressive use
`
`enabled by mass digitization—this time easily visualized. Figure 1, below, is an
`
`Ngram-generated chart that compares the frequency with which authors of texts in
`
`the Google Book Search database refer to the United States as a single entity (“is”)
`
`as opposed to a collection of individual states (“are”). As the chart illustrates, it
`
`was only in the latter half of the Nineteenth Century that the conception of the
`
`United States as a single, indivisible entity was reflected in the way a majority of
`
`writers referred to the nation. This is a trend with obvious political and historical
`
`significance, of interest to a wide range of scholars and even to the public at large.
`
`But this type of comparison is meaningful only to the extent that it uses as raw data
`
`a digitized archive of significant size and scope.6
`
`
`3 For example, Matthew Jockers used text mining and computer-aided
`classification to identify an overlooked tradition of whaling fiction predating (and
`arguably informing) Melville’s writing of Moby Dick. See Jockers, supra.
`4 See Leif Isaksen, Elton Barker, Eric C. Kansa, Kate Byrne, GAP: A NeoGeo
`Approach to Classical Resources, 45 LEONARDO 82-83 (2012).
`5 In a similar vein, researchers at Stanford University have mapped thousands of
`letters exchanged during the Enlightenment and thereby devised a theory of how
`these individual networks fit into a coherent whole, which the scholars refer to as
`the “Republic of Letters.” See Tracking 18th-century “social network” through
`letters, STANFORD UNIVERSITY (Dec. 14, 2009) (video),
`http://www.youtube.com/watch?v=nw0oS-AOIPE. Such aggregation yields
`surprising insights: for example, “the common narrative is that the Enlightenment
`started in England and spread to the rest of Europe,” but the relatively low volume
`of correspondence between London and Paris suggests otherwise. See Patricia
`Cohen, Digital Keys for Unlocking the Humanities’ Riches, N.Y. TIMES, Nov. 17,
`2010, at C1.
`6 Google Ngram is available at http://books.google.com/ngrams.
`
`Figure 1: Google Ngram Visualization Comparing Frequency of
`“The United States is” to “The United States are”7
`
`
`
`To be absolutely clear, 1) the data used to produce this visualization can only
`
`be collected by digitizing the entire contents of the relevant books, and 2) not a
`
`single sentence of the underlying books has been reproduced in the finished
`
`product. In other words, this type of non-expressive use only adds to our collective
`
`knowledge and understanding, without in any way replacing, damaging the value
`
`of, or interfering with the market for, the original works.8
`
`
`7 Figure 1 is a reconstruction of data generated using Google Ngram, sampled at
`five-year intervals. The y-axis is scaled to 1/100,000 of a percent, such that 1 =
`0.00001%.
`
`Google Ngram is just the tip of the iceberg.9 In Macroanalysis: Digital
`
`Methods and Literary History, Professor Jockers draws on a corpus of Nineteenth
`
`Century novels to demonstrate how literary style changes over time. See generally
`
`Jockers, supra. Examining word frequencies, syntactic patterns, and thematic
`
`markers in the metadata-enriched context of author nationality, author gender, and
`
`time period opens up literary study to an entirely new perspective.10 Trendsetters
`
`and outliers are revealed, as when Jockers’ text mining and computational analysis
`
`demonstrated that Harriet Beecher Stowe’s fiction is far more similar to the work
`
`of male authors of her generation than to the female-authored works of
`
`“sentimental fiction” among which her work has traditionally been categorized.11
`
`
`8 For additional examples of Ngram’s uses, see, e.g., Jean-Baptiste Michel, Yuan
`Kui Shen, Aviva Presser Aiden, Adrian Veres, Matthew K. Gray, The Google
`Books Team, Joseph P. Pickett, Dale Hoiberg, Dan Clancy, Peter Norvig, Jon
`Orwant, Steven Pinker, Martin A. Nowak, and Erez Lieberman Aiden,
`Quantitative Analysis of Culture Using Millions of Digitized Books, 331 SCIENCE
`176 (2011) (a study of linguistic and cultural changes in over five million digitized
`books).
`9 The toolkit available to Digital Humanities researchers is becoming increasingly
`sophisticated. See, e.g., Text Analysis Portal for Research (“TAPoR”),
`http://portal.tapor.ca/portal/portal (last visited May 21, 2013) (tools to map word
`usage over time, including peaks, density, collocations, and types); MALLET:
`MAchine Learning for LanguagE Toolkit, http://mallet.cs.umass.edu/ (last visited
`May 31, 2013) (a Java-based package for statistical natural language processing,
`document classification, clustering, topic modeling, information extraction, and
`other machine learning applications to text); MONK: Metadata Offer New
`Knowledge, http://www.monkproject.org/ (last visited May 31, 2013) (a digital
`environment designed to help humanities scholars discover and analyze patterns in
`the texts); Software Environment for the Advancement of Scholarly Research
`(“SEASR”), http://seasr.org (last visited May 31, 2013).
`10 A recently published study, led by mathematicians at Dartmouth, makes a
`similar point. See James M. Hughes et al., Quantitative Patterns of Stylistic
`
`Figure 2 provides another fascinating example of Professor Jockers’ research.
`
`The chart shows the extent to which British, American, and Irish authors focused
`
`on the theme of American slavery during the Nineteenth Century, based on a
`
`corpus of 3,450 novels from that time period. Although it comes as no surprise that
`
`slavery was most often addressed by American authors, the strong Irish reaction to
`
`the American Civil War (note the spike in the light gray line beginning in 1860)
`
`compared with the decidedly muted response by British aut
