`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`UNITED STATES PATENT AND TRADEMARK OFFICE
`BEFORE THE PATENT TRIAL AND APPEAL BOARD
`__________________
`
`GOOGLE INC.
`Petitioner
`
`v.
`
`IMPROVED SEARCH LLC
`Patent Owner
`__________________
`
`Case No. IPR2016-00797
`U.S. Patent No. 6,604,101
`__________________
`
`DECLARATION OF DOUGLAS W. OARD, Ph.D.
`__________________
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
TABLE OF CONTENTS

I.    INTRODUCTION ..................................................................................... 1
II.   BACKGROUND AND QUALIFICATIONS ........................................... 3
III.  LEGAL STANDARDS USED IN MY ANALYSIS ................................ 8
      A.  Obviousness ...................................................................................... 8
      B.  Priority Date .................................................................................... 10
      C.  Person of Ordinary Skill in the Art ................................................. 10
      D.  Broadest Reasonable Interpretation ................................................ 11
IV.   BACKGROUND ON THE STATE OF THE ART IN 2000 ................. 11
      A.  General Background ....................................................................... 11
          1.  Development of the Field ......................................................... 12
          2.  Query Translation in the 1990s ................................................ 14
      B.  Prior Art Relied Upon ..................................................................... 18
          1.  Fluhr ’97 (Ex. 1003) and Fluhr ’98 (Ex. 1004) ........................ 18
          2.  Yamabana (Ex. 1005) ............................................................... 26
          3.  Bian (Ex. 1006) ......................................................................... 29
V.    OVERVIEW OF THE ’101 PATENT .................................................... 33
VI.   CLAIM CONSTRUCTION ..................................................................... 38
      A.  “content word,” “keyword,” “key word”: “any word to be
          processed for subsequent searching” .............................................. 38
      B.  “to extract”: “to identify as a word” ............................................... 39
      C.  “dialectal standardization”: “replacing one term with another
          term in the same language that has the same or similar
          meaning” .......................................................................................... 40
      D.  “contextual search”: “identification of one or more documents
          from a larger collection based on the presence of specific words
          contained in the text of those documents” ...................................... 41
      E.  “database”: “a collection of data organized for convenient
          access on a computer” ..................................................................... 43
      F.  “query input device for inputting a query in a first language”:
          “keyboard or equivalents” ............................................................... 44
      G.  “dialectal controller for dialectally standardizing the content
          word/key word extracted from the query input”: “any hardware
          and/or software that dialectally standardizes the content
          word/key word extracted from the query input” ............................. 44
      H.  “first translator for translating the dialectally standardized word
          into a second language”: “any hardware and/or software that
          translates the dialectally standardized word into a second
          language” ......................................................................................... 46
      I.  “search engine for searching in the second language, the site
          names (URLs), pages and descriptions satisfying search
          criteria”: “Infoseek, Webcrawler, Lycos, HotBot, AltaVista,
          Dogpile, Savvy Search, Deja News, Infospace, China.com, or
          equivalents” ..................................................................................... 48
      J.  “second translator for translating the search results into the first
          language”: “any hardware and/or software that translates the
          search results into the first language” ............................................. 48
VII.  INVALIDITY ANALYSIS ...................................................................... 50
      A.  Overview ......................................................................................... 50
      B.  Ground 1: Claims 1, 2, 4, 5, 22, 24, 27, and 28 Would Have
          Been Obvious in Light of Fluhr ’97 and Fluhr ’98 .......................... 50
          1.  Claim 1 ...................................................................................... 50
              a.  Preamble – “method for performing a contextual search
                  and retrieval of documents in a computer network” .......... 51
              b.  Element [a] – “receiving through an input device, a
                  query in a first language” ................................................... 52
              c.  Element [b] – “processing said query to extract at least
                  one content word from the query” ...................................... 53
              d.  Element [c] – “performing dialectal standardization of
                  the at least one content word extracted from the query” .... 54
              e.  Element [d] – “translating the at least one dialectally
                  standardized content word into a second language
                  through a translator” ........................................................... 56
              f.  Element [e] – “performing a contextual search in the
                  second language based on the at least one translated
                  content word, using a search engine in the second
                  language” ............................................................................ 57
              g.  Element [f] – “obtaining the search results in the second
                  language in the form of at least one of site names (URLs)
                  and documents, satisfying a search criteria” ....................... 59
          2.  Claim 2 ...................................................................................... 61
          3.  Claim 4 ...................................................................................... 63
          4.  Claim 5 ...................................................................................... 63
          5.  Claim 24 .................................................................................... 64
          6.  Claim 27 .................................................................................... 66
          7.  Claim 28 .................................................................................... 66
          8.  Claim 22 .................................................................................... 68
              a.  Preamble – “A method for translating a query input by a
                  user in a first language into a second language and
                  searching and retrieving documents in the second
                  language” ............................................................................ 69
              b.  Element [a] – “processing a query input in a first
                  language to extract content or keyword and dialectally
                  standardizing the extracted content word” ......................... 69
              c.  Element [b] – “translating said standardized content
                  word into a second language” ............................................. 70
              d.  Element [c] – “performing a contextual search in the
                  second language, based on the content word, obtaining
                  search results in said second language” ............................. 70
      C.  Ground 2: Claims 6, 7, 23, and 25 Would Have Been Obvious
          in Light of Fluhr ’97, Fluhr ’98, and Yamabana ............................. 71
          1.  Claim 6 ...................................................................................... 71
          2.  Claim 7 ...................................................................................... 74
              a.  Element [a] – “inputting said search results obtained in
                  the second language into a translator” ................................ 75
              b.  Element [b] – “translating the search results into the
                  first language” ..................................................................... 75
              c.  Element [c] – “displaying said search results in the first
                  language” ............................................................................ 75
          3.  Claim 23 .................................................................................... 76
              a.  Preamble – “A method for translating a query input by a
                  user in a first language into a second language and
                  searching and retrieving documents in the second
                  language, and translating said web documents into the
                  first language” ..................................................................... 76
              b.  Element [a] – “processing a query input in a first
                  language to extract content word and dialectally
                  standardizing the content word” ......................................... 78
              c.  Element [b] – “translating said standardized keyword
                  into a second language” ...................................................... 78
              d.  Element [c] – “contextually searching and obtaining
                  search results in said second language” ............................. 78
              e.  Element [d] – “translating said search results into said
                  first language” ..................................................................... 78
          4.  Claim 25 .................................................................................... 79
              a.  Element [a] – “translating the contextual search results
                  into the first language” ....................................................... 79
              b.  Element [b] – “displaying the search results in the first
                  language” ............................................................................ 79
      D.  Ground 3: Claim 28 Would Have Been Obvious in Light of
          Fluhr ’97, Fluhr ’98, and Bian .......................................................... 80
      E.  Ground 4: Claims 12, 13, 15, 16, and 26 Would Have Been
          Obvious in Light of Bian, Fluhr ’97, and Fluhr ’98 ......................... 82
          1.  Claim 12 .................................................................................... 82
              a.  Preamble – “A system for performing contextual search
                  and retrieval of documents comprising” ............................. 83
              b.  Element [a] – “a query input device for inputting a
                  query in a first language” ................................................... 83
              c.  Element [b] – “a dialectal controller for dialectally
                  standardizing the content word/key word extracted from
                  the query input” .................................................................. 84
              d.  Element [c] – “a first translator for translating the
                  dialectally standardized word into a second language” ...... 84
              e.  Element [d] – “a search engine for searching in the
                  second language, the site names (URLs), pages and
                  descriptions satisfying search criteria” ............................... 85
              f.  Element [e] – “a display screen unit for displaying the
                  search results in the second language” ............................... 85
              g.  Element [f] – “at least a second translator for translating
                  the search results into the first language” ........................... 86
          2.  Claim 13 .................................................................................... 88
          3.  Claim 15 .................................................................................... 89
          4.  Claim 16 .................................................................................... 90
          5.  Claim 26 .................................................................................... 91
      F.  Ground 5: Claim 17 Would Have Been Obvious in Light of
          Bian, Fluhr ’97, Fluhr ’98, and Yamabana ....................................... 94
VIII. CONCLUSION ......................................................................................... 96

`I, Douglas W. Oard, hereby declare as follows:
`
`
`
I. INTRODUCTION
`1.
`
`I have been retained by Google Inc. (“Google”) to provide my expert
`
`opinions regarding U.S. Patent No. 6,604,101 (“the ’101 Patent”). In particular, I
`
`have been asked to opine on whether the subject matter of claims 1-2, 4-7, 12-13,
`
`15-17, and 22-28 of the ’101 Patent would have been obvious to a Person of
`
`Ordinary Skill in the Art (POSA) at the time the application that led to the ’101
`
`Patent was filed. Based on my review of the prior art then available, my
`
`understanding of the relevant requirements of patent law, and my decades of
`
`experience in the field of cross-language information retrieval, it is my opinion that
`
`each of the challenged claims would have been obvious in light of the following
`
`prior art references:
`
`
`
`Christian Fluhr, et al., “Multilingual Database and Crosslingual
`
`Interrogation in a Real Internet Application: Architecture and Problems of
`
`Implementation,” in Cross-Language Text & Speech Retrieval: Papers from
`
`the 1997 AAAI Spring Symposium, Technical Report SS-97-05 (AAAI Press
`
`1997) (Ex. 1003, “Fluhr ’97”)
`
`
`
`Christian Fluhr, et al., “Distributed Cross-Lingual Information
`
`Retrieval,” in Cross-Language Information Retrieval (Gregory Grefenstette,
`
`ed., Kluwer Academic Publishers, 1998) (Ex. 1004, “Fluhr ’98”)
`
`1
`
`
`AOL Ex. 1008
`Page 7 of 149
`
`
`
`
`
`
`
`
`
`
`
`Kiyoshi Yamabana, et al., “A Language Conversion Front-End for
`
`Cross-Language Information Retrieval,” in Cross-Language Information
`
`Retrieval (Gregory Grefenstette ed., Kluwer Academic Publishers, 1998)
`
`(Ex. 1005, “Yamabana”)
`
`
`
`Guo-Wei Bian, et al., “Integrating Query Translation and Document
`
`Translation in a Cross-Language Information Retrieval System,” in Machine
`
`Translation and the Information Soup: Third Conference of the Association
`
`for Machine Translation in the Americas, AMTA’98, Langhorne, PA, USA,
`
`October 28-31, 1998 Proceedings (David Farwell, et al. eds., Springer-
`
`Verlag Berlin Heidelberg, 1998) (Ex. 1006, “Bian”)
`
`2.
`
`Specifically, it is my opinion that the challenged claims would have
`
`been obvious over the following combinations of prior art, which I will refer to in
`
`this Declaration as Grounds 1-5:
`
`
`
`
`
`
`
`
`
Ground 1 – Fluhr ’97 and Fluhr ’98: Claims 1, 2, 4, 5, 22, 24, 27, and 28

Ground 2 – Fluhr ’97, Fluhr ’98, and Yamabana: Claims 6, 7, 23, and 25

Ground 3 – Fluhr ’97, Fluhr ’98, and Bian: Claim 28

Ground 4 – Bian, Fluhr ’97, and Fluhr ’98: Claims 12, 13, 15, 16, and 26

Ground 5 – Bian, Fluhr ’97, Fluhr ’98, and Yamabana: Claim 17
`
`A more detailed explanation of my opinions follows in this Declaration.
`
`3.
`
`I am being compensated for my time at my standard consulting rate of
`
`$420 per hour. I am also being reimbursed for expenses that I incur during the
`
`course of this work. Apart from that, I have no financial interest in either Google
`
`Inc. or Improved Search LLC. My compensation is not contingent upon the results
`
`of my study or the substance of my opinions.
`
`II. BACKGROUND AND QUALIFICATIONS
`4.
`I am a professor and researcher in the fields of computer science and
`
`information science. Information Retrieval (IR), which is the preferred technical
`
`term for search, has been the primary subject of my research for over twenty years,
`
`and from 1994 to 2002 Cross-Language IR (CLIR) was a particular area of focus.
`
`5.
`
`I received two degrees from Rice University: a Master of Electrical
`
`Engineering degree in 1979, and a Bachelor of Arts degree with a double major in
`
`Electrical Engineering and Mathematical Sciences, also in 1979. I received a
`
`Ph.D. in Electrical Engineering from the University of Maryland, College Park in
`
`1996, with a dissertation on Adaptive Vector Space Text Filtering for Monolingual
`
`and Cross-Language Applications.
`
`6.
`
`After completing my Ph.D. in 1996, I was appointed in that same year
`
`as an Assistant Professor in the College of Library and Information Services at the
`
`3
`
`
`AOL Ex. 1008
`Page 9 of 149
`
`
`
`
`
`
`
`
`
`University of Maryland, College Park. The name of the College of Library and
`
`Information Services has subsequently been changed to the College of Information
`
`Studies, reflecting a broader scope of both teaching and research. I was promoted
`
`to Associate Professor (with tenure) in 2002, and to Professor (with tenure) in
`
`2010. From 2006 to 2009 I served as Associate Dean for Research in the College
`
`of Information Studies. In 2000, I was appointed to a joint faculty position in the
`
`University of Maryland Institute for Advanced Computer Studies (UMIACS).
`
UMIACS appointments are renewable term appointments of three to

five years, and my UMIACS appointment has been renewed continuously.
`
`currently serve as an Affiliate Professor in the Computer Science Department at
`
`the University of Maryland, College Park, and as an Affiliate Professor in the
`
`Applied Mathematics, Statistics and Scientific Computation (AMSC) program at
`
`the University of Maryland, College Park.
`
`7.
`
`I have held visiting faculty or visiting scholar positions while
`
`conducting research on IR during sabbatical visits (of 5-14 months duration) at the
`
`University of California Berkeley, the University of Southern California
`
`Information Sciences Institute (USC-ISI), the University of Melbourne (Australia),
`
`and (concurrently) RMIT University (Australia). I also am affiliated with the
`
`Johns Hopkins University Human Language Technology Center of Excellence and
`
`4
`
`
`AOL Ex. 1008
`Page 10 of 149
`
`
`
`
`
`
`
`
`
`I hold a Visiting Professor appointment at the National Institute of Informatics
`
`(NII) in Japan.
`
`8.
`
`From 2010 to 2012, I served as Director of the UMIACS
`
`Computational Linguistics and Information Processing (CLIP) lab. The CLIP lab’s
`
`research record is particularly strong in both computational linguistics and IR.
`
`9.
`
`As I mentioned above, I perform research in the general area of IR,
`
`with particular emphasis on the design of search systems that leverage specific
`
`technologies for the computational manipulation of human language. Examples of
`
`these technologies include translation (for CLIR), speech recognition (for speech
`
`retrieval), and optical character recognition (for document image retrieval). I have
`
`also conducted research on retrieval from informal sources of text such as email
`
`(particularly in the context of e-discovery), text chat, and microblog posts, and on
`
`recommender systems, knowledge base population, and computational social
`
`science.
`
`10.
`
`I have published more than 240 academic papers. About 100 of those
`
`papers are on CLIR, and I continue to conduct, publish, and review research on
`
`that topic. I have published peer reviewed papers on CLIR in venues such as the
`
`Journal of the Association for Information Science and Technology (in 2015),
`
`Information Processing & Management (in 2005, 2008, and 2012), Information
`
`Retrieval (in 2004), ACM Transactions on Asian Language Information
`
`5
`
`
`AOL Ex. 1008
`Page 11 of 149
`
`
`
`
`
`
`
`
`
`Processing (in 2003), Computer Speech and Language (in 2004), the Annual
`
`Review of Information Science and Technology (in 1998), and the ACM Special
`
`Interest Group on Information Retrieval (SIGIR) conference (four full papers
`
`between 2000 and 2008).
`
`11. At the University of Maryland, I teach courses on IR and on other
`
`aspects of information technology. Examples include graduate courses on
`
`Information Retrieval Systems, Creating Information Infrastructures, and
`
`Transformational Information Technologies, and an undergraduate course on
`
`Information and Knowledge Management.
`
`12.
`
`I am co-editor of the peer reviewed journal Foundations and Trends
`
`in Information Retrieval and a Senior Associate Editor for the peer reviewed
`
`journal ACM Transactions on Information Systems. I have previously also served
`
`on the editorial boards of the peer-reviewed journals Information Processing &
`
`Management (IP&M), Journal of the Association for Information Science and
`
`Technology and Information Retrieval. In 2008, I served as Program Committee
`
`Co-Chair for the leading IR research conference, the ACM SIGIR conference. I
`
`have also helped to organize more than thirty other international research meetings;
`
`examples include the 1997 American Association for Artificial Intelligence
`
`6
`
`
`AOL Ex. 1008
`Page 12 of 149
`
`
`
`
`
`
`
`
`
`(AAAI)1 Spring Symposium on Cross-Language Text and Speech Retrieval, the
`
`2009 annual conference of the North American chapter of the Association for
`
`Computational Linguistics (NAACL), and six workshops on the Discovery of
`
`Electronically Stored Information (DESI).
`
`13.
`
`I have served in leadership roles for all four of the major global IR
`
`evaluation venues, including as General Co-Chair for the NII Testbeds and
`
`Community for Information Access Research (NTCIR) evaluation in Japan; as
`
`Program Committee member and as a coordinator for tracks on CLIR and e-
`
`discovery at the Text Retrieval Conference (TREC) evaluation in the United
`
`States; as a coordinator for evaluations of interactive CLIR and cross-language
`
`speech retrieval at the Cross-Language Evaluation Forum (CLEF) in Europe; and
`
`as a coordinator for tracks on speech retrieval and on retrieval of scanned
`
`documents at the Forum for Information Retrieval Evaluation (FIRE) in India.
`
`14. My research on CLIR has been supported by the National Science
`
`Foundation (NSF) and the Defense Advanced Research Projects Agency
`
`(DARPA). My research on other topics has been supported by the National
`
`Endowment for the Humanities, IBM, and the Qatar National Research Fund,
`
`
`1 The name of AAAI has subsequently been changed to the Association for the
`
`Advancement of Artificial Intelligence.
`
`7
`
`
`AOL Ex. 1008
`Page 13 of 149
`
`
`
`
`
`
`
`
`
`among others. I regularly review research proposals for NSF, and occasionally for
`
`similar bodies in other countries (including Canada, Hong Kong, Luxembourg, and
`
`Switzerland).
`
`15.
`
`I have given presentations on my research in more than 30 countries,
`
`including, for example, Brazil, China, Egypt, Germany, India, New Zealand,
`
`Russia, Singapore, Spain, and the United Kingdom.
`
`16. A more detailed description of my professional qualifications,
`
`including a list of publications, teaching, and professional activities, is contained in
`
`my curriculum vitae, a copy of which is attached as Appendix A.
`
`III. LEGAL STANDARDS USED IN MY ANALYSIS
`17. Although I am not an attorney and I do not offer any legal opinions in
`
`this proceeding, I have been informed of and relied on certain legal principles in
`
`reaching the opinions set forth in this Declaration.
`
`A. Obviousness
`18.
I understand that a patent claim is invalid if the differences between
the claimed subject matter and the prior art are such that the claimed subject
matter as a whole would have been obvious to a POSA at the time of the alleged
invention. I further
`
`understand that an obviousness analysis involves a review of the scope and content
`
`of the asserted prior art, the differences between the prior art and the claims at
`
`issue, the level of ordinary skill in the pertinent art, and objective indicia of non-
`
`8
`
`
`AOL Ex. 1008
`Page 14 of 149
`
`
`
`
`
`
`
`
`
`obviousness such as long-felt need, industry praise for the invention, and
`
`skepticism of others in the field.
`
`19.
`
`I have been informed that the following rationales, among others, may
`
`support a conclusion of obviousness:
`
`
`
`
`
`(a)
`
`the combination of familiar elements according to known
`
`methods to yield predictable results;
`
`
`
`
`
`(b)
`
`the simple substitution of one known element for another to
`
`obtain predictable results;
`
`
`
`
`
`(c)
`
`the use of known techniques to improve similar methods or
`
`apparatuses in the same way;
`
`
`
`
`
`(d)
`
`the application of a known technique to a known method or
`
`apparatus ready for improvement to yield predictable results;
`
`
`
`
`
`(e)
`
`the choice of a particular solution from a finite number of
`
`identified, predictable solutions with a reasonable expectation of success;
`
`
`
`
`
`(f)
`
`the use of known work in one field of endeavor in either the
`
`same field or a different one based on design incentives or other market forces, if
`
`the variations are predictable to one of ordinary skill in the art; and
`
`
`
`
`
`(g)
`
`the following of some teaching, suggestion, or motivation in the
`
`prior art that would have led one of ordinary skill to modify the prior art reference
`
`or to combine prior art reference teachings to arrive at the claimed invention.
`
`9
`
`
`AOL Ex. 1008
`Page 15 of 149
`
`
`
`
`
B. Priority Date
20.
I have been asked by counsel to assume that the priority date of the
`
`’101 Patent is June 28, 2000, which I understand to be the date on which the
`
`application leading to the ’101 Patent was filed. See Ex. 1001 at 1. I have used
`
`this priority date in my analysis of what a person of ordinary skill in the art would
`
`have known and understood prior to the ’101 Patent, how a POSA would interpret
`
`the claims of the ’101 Patent, and whether the claims of the ’101 Patent are valid.
`
`21. Although I have assumed here that the priority date is June 28, 2000,
`
`my opinion would not differ if I were to use any other date after 1998, the year in
`
`which Fluhr ’98, Bian, and Yamabana were published.
`
C. Person of Ordinary Skill in the Art
22.
` I understand that a POSA is a hypothetical person who is presumed
`
`to have known the relevant art as of the priority date. I understand that factors that
`
`may be considered in determining the level of ordinary skill in the art may include:
`
`(a) the type of problems encountered in the art; (b) prior art solutions to those
`
`problems; (c) the rapidity with which innovations are made; (d) the sophistication
`
`of the technology; and (e) the educational level of active workers in the field.
`
`23.
`
`I have been asked to provide my opinion as to the qualifications of the
`
`person of ordinary skill in the art to which the ’101 Patent pertains as of June 28,
`
`2000. In my opinion, a POSA would have at least an undergraduate degree in
`
`10
`
`
`AOL Ex. 1008
`Page 16 of 149
`
`
`
`
`
`
`
`
`
`computer science, information science, or a similar field and at least two years of
`
`experience in the field of CLIR, which could include academic experience (e.g., a
`
`Masters degree with a CLIR focus).
`
`D. Broadest Reasonable Interpretation
`24.
`I have been informed that, for purposes of this Inter Partes Review
`
`(IPR), the terms in the claims of the ’101 Patent are to be given their Broadest
`
`Reasonable Interpretation (BRI) in light of the specification of the ’101 Patent as
`
`understood by a POSA on the priority date. I have used this standard throughout
`
`my analysis.
`
`25.
`
`I have also been informed that some of the claim limitations of a
`
`patent may constitute “means-plus-function” limitations, and to construe those
`
`limitations, one must first identify the function that is covered by the claim and
`
`then identify the corresponding structure in the specification that performs that
`
`function.
`
`IV. BACKGROUND ON THE STATE OF THE ART IN 2000
`A. General Background
`26. The ’101 Patent relates to CLIR. In the typical formulation of the
`
`CLIR problem, a user presents a query to a search engine in one human language
`
`(e.g., English) and the search engine retrieves documents written in some other
`
`human language (e.g., Chinese) that the search engine determines are most likely
`
`to be relevant to the user’s request.
`
`11
`
`
`AOL Ex. 1008
`Page 17 of 149
`
`
`
`
`
`
`
`
`
`1.
`Development of the Field
`27. The problem of finding documents in a language different from the
`
`language used to pose the request is not new; librarians who sought to serve
`
`diverse user populations have faced the same problem for centuries. But, by the
`
`1990s, both the academic and commercial world recognized the wealth of digital
`
`information available in a variety of languages, and “the need to find ways of
`
`retrieving information across language boundaries, and to understand this
`
`information, once retrieved.” Ex. 1007 at 1 (Grefenstette). Throughout the 1990s,
`
`corporations, government institutions, academic research centers, and others
`
`sponsored or conducted research to develop methods and systems for accessing
`
`and understanding digital information written in a language other than that of the
`
`user’s query.
`
`28. CLIR involves the same challenges as monolingual IR—such as
`
`determining which results are most relevant to the user’s query (known as “ranked
`
`retrieval”)—plus the additional challenges posed by translation. Research in CLIR
`
`has often built upon techniques and approaches developed in monolingual IR. For
`
`example, it was well known by the late 1990s that one method of performing CLIR
`
`was to combine automated query translation with a monolingual search engine.
`
`See, e.g., Ex. 1005 at 101 (Yamabana); Ex. 1006 at 254, 258-59 (Bian).
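
For illustration only, the short Python sketch below shows the general shape of
this well-known approach: each query term is mapped to one or more second-language
terms, and the translated query is then handed to an ordinary monolingual search
engine. The dictionary entries, the example query, and the stand-in search function
are my own illustrative assumptions and are not taken from any of the cited
references.

# Illustrative sketch of query-translation CLIR (not any specific prior-art system).
# The bilingual dictionary, example query, and stand-in search engine are
# hypothetical; a real system would use much richer translation resources.

BILINGUAL_DICT = {            # English -> French, made-up illustrative entries
    "nuclear": ["nucleaire"],
    "energy": ["energie"],
    "policy": ["politique"],
}

def translate_query(query, dictionary):
    """Replace each query term with its dictionary translation(s)."""
    translated = []
    for term in query.lower().split():                      # simple whitespace tokenization
        translated.extend(dictionary.get(term, [term]))     # keep untranslatable terms as-is
    return " ".join(translated)

def cross_language_search(query, monolingual_search, dictionary=BILINGUAL_DICT):
    """Accept a first-language query; search with a second-language engine."""
    return monolingual_search(translate_query(query, dictionary))

# Example usage with a stand-in for a second-language (French) search engine:
fake_french_engine = lambda q: ["document matching: " + q]
print(cross_language_search("nuclear energy policy", fake_french_engine))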
`
`12
`
`
`AOL Ex. 1008
`Page 18 of 149
`
`
`
`
`
`
`
`
`
`29.
`
`In the 1990s, much of the research on CLIR focused on query
`
`translation, the idea of translating the terms2 found in a user’s query. Alternative
`
`approaches, including what came to be called “document translation” (which more
`
`precisely refers to translating the terms found in the documents as they are
`
`indexed), or interlingual techniques (in which language-neutral representations of
`
`both the queries and the documents are created) were also explored. By 1990,
`
`Christian Fluhr at the French Commission for Atomic Energy (CEA) and his
`
`colleagues were using the query translation approach to CLIR, and CLIR research
`
`progressed substantially throughout the decade. The rapid uptake of query
`
`translation as an approach to CLIR was due, at least in part, to the fact that queries
`
`are typically short and thus can be translated quickly. This offers advantages for
`
`experimental settings (in which researchers often wish to compare results from
`
`multiple system designs quickly, sometimes over a broad range of parameter
`
settings) and in some operational settings (e.g., when a system needs to support
numerous query languages, a setting in which it might be infeasible to build an
index for each possible query language).

2 “Word” and “term” are often used interchangeably in the CLIR literature,
including in the ’101 Patent and in the references cited in this Declaration. When I
use “term” I specifically mean to include words, parts of words, or multi-word
expressions.
`
`2. Query Translation in the 1990s
`30. One area of focus in CLIR query translation research in the 1990s was
`
`on choosing translations for some or all of the terms found in a user’s query.
`
`31. The first step of this process is to identify the individual terms in a
`
`user’s query. In languages such as English, French, and Russian in which words
`
`are by convention delimited by spaces, the simplest approach is to strip
`
`punctuation and then split the words at “white space” (e.g., spaces, tab characters,
`
`or line endings) to obtain individual words. See, e.g., Ex. 1004 at 42 (Fluhr ’98).
`
`However, in languages such as Chinese (in which individual words are not
`
`conventionally delimited) or German (in which it is common to combine words
`
`into longer terms that lack internal delimiters) researchers developed more
`
`complex approaches for extracting words from longer unsegmented character
`
`strings. For example, one common technique was to start with a list of words in
`
`the language (e.g., from a dictionary) and then use some algorithm for “tiling”
`
`those words onto the longer string in a manner that provides a plausible
`
`segmentation into words. See, e.g., Ex. 1006 at 254 (Bian).
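
The following Python sketch, again purely illustrative, contrasts the two
term-identification strategies just described: simple whitespace splitting for
space-delimited languages, and a greedy longest-match "tiling" of known words onto
an unsegmented string. The lexicon and example strings are hypothetical and are not
drawn from Bian or the Fluhr references.

# Illustrative sketch of the two term-identification approaches described above.
# The lexicon and example strings are hypothetical.
import string

def split_on_whitespace(text):
    """Strip punctuation and split a space-delimited language (e.g., English)."""
    cleaned = text.translate(str.maketrans("", "", string.punctuation))
    return cleaned.lower().split()

def tile_segment(text, lexicon, max_len):
    """Greedy longest-match 'tiling' of known words onto an unsegmented string."""
    words, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in lexicon or length == 1:   # fall back to single characters
                words.append(candidate)
                i += length
                break
    return words

# Example usage:
print(split_on_whitespace("Query translation, in the 1990s."))
# -> ['query', 'translation', 'in', 'the', '1990s']
print(tile_segment("informationretrieval", {"information", "retrieval"}, max_len=11))
# -> ['information', 'retrieval']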
`
`32. Next, the identified words must be translated into another language.
`
`In the 1990s, a significant amount of research was focused on a class of translation
`
`14
`
`
`AOL Ex. 1008
`Page 20 of 149
`
`
`
`
`
`
`
`
`
`techniques which came to be known as “dictionary-based” CLIR. That was, in
`
`essence, a variant on the age-old technique of manual glossing by dictionary
`
`lookup. Several types of pre-compiled resources that identified possible
`
`translations were explored, including bilingual dictionaries, multilingual thesauri,
`
`or lexicons that had been developed for use by machine