IN THE UNITED STATES PATENT AND TRADEMARK OFFICE

BEFORE THE PATENT TRIAL AND APPEAL BOARD

AOL INC.
Petitioner
v.
IMPROVED SEARCH LLC
Patent Owner

Case No. CBM2017-00038
U.S. Patent No. 7,516,154

DECLARATION OF DOUGLAS W. OARD, Ph.D.

AOL Ex. 1002
Page 1 of 90

TABLE OF CONTENTS

I. INTRODUCTION
II. QUALIFICATIONS
III. COMPENSATION AND RELATIONSHIP TO THE PARTIES
IV. LEGAL STANDARDS USED IN MY ANALYSIS
    A. The Person of Ordinary Skill in the Art
    B. Broadest Reasonable Interpretation
    C. Means-Plus-Function Claim Elements
    D. Enablement
    E. Incorporation by Reference
    F. Obviousness
V. SUMMARY OF OPINIONS
VI. TECHNICAL BACKGROUND
    A. Cross-Language Search and Query Translation
    B. Problems Inherent in the Query Translation Approach
        1. Identifying individual query terms
        2. Identifying possible translations for each query term
        3. Determining how best to use those possible translations
    C. Sponsored Search
    D. The ’101 Patent
    E. The Divisional Application
    F. The ’154 Patent
    G. The Challenged Claims
VII. OPINIONS REGARDING CLAIM CONSTRUCTION
    A. “Dialectal Standardization” / “Dialectally Standardizing”
    B. “Content Word”
    C. “Advertising Cues”
VIII. CLAIMS 1 AND 7 ARE INVALID FOR LACK OF ENABLEMENT.
    A. The Patent Gives No Guidance on How to Perform Dialectal Standardization.
    B. Undue Experimentation
IX. CLAIM 7 IS INVALID FOR INDEFINITENESS.
    A. “A Dialectal Controller for Dialectally Standardizing a Content Word Extracted from the Query”
    B. “Means to Search the Database of the Advertising Cues Based on the Relevancy to the Translated Content Word”
X. CLAIMS 1 AND 7 ARE INVALID FOR OBVIOUSNESS.
    A. Claims 1 and 7 Are Not Entitled to the Priority Date of Either Parent Application.
    B. Claims 1 and 7 Would Have Been Obvious In Light of the ’101 Patent and Skillen.
        1. The ’101 patent and Skillen disclose each and every element of claim 1.
        2. The ’101 patent and Skillen disclose each and every element of claim 7.
        3. A POSA would have found it obvious to combine the teachings of the ’101 patent and Skillen.
        4. I am aware of no objective indicia weighing in favor of a finding of non-obviousness.
XI. CONCLUSION

I, Dr. Douglas W. Oard, hereby state the following:

I. INTRODUCTION

1. I have been retained on behalf of AOL Inc. (“AOL”) to provide technical assistance related to the filing of a Petition for Covered Business Method Review (“CBM Review”) of U.S. Patent No. 7,516,154 (“the ’154 patent”). I am working as a private consultant on this matter and the opinions presented here are my own.

2. I have been asked to provide a written declaration, including opinions related to the following issues:

• Technical background,
• The qualifications of a person of ordinary skill in the art (“POSA”),
• The proper interpretation of the claims under the broadest reasonable construction standard,
• Whether the specification of the patent describes the invention in such full, clear, concise, and exact terms as to enable a POSA to carry out the claimed invention without undue experimentation,
• Whether the specification of the patent describes sufficiently definite structure for performing functions recited in means-plus-function claims, and
• Whether claims 1 and 7 of the ’154 patent would have been obvious to a POSA at the time of the alleged invention in light of U.S. Patent No. 6,604,101 (“the ’101 patent”) (Ex. 1003) and U.S. Patent No. 6,098,065 (“Skillen”) (Ex. 1004).

In reaching my opinions on the ’154 patent, I have reviewed the documents cited herein and relied on my many years of knowledge and experience in the field of information retrieval (outlined in Section II). This Declaration sets forth the bases and reasons for my opinions, including the materials and information relied upon in forming those opinions and conclusions.

II. QUALIFICATIONS

3. I am a professor and researcher in the fields of computer science and information science. Information Retrieval (IR), which is the preferred technical term for search, has been the primary subject of my research for over twenty years. From 1994 to 2002, Cross-Language IR (CLIR) was a particular area of focus for me.

4. I received two degrees from Rice University: a Master of Electrical Engineering degree in 1979 and a Bachelor of Arts degree with a double major in Electrical Engineering and Mathematical Sciences, also in 1979. I received a Ph.D. in Electrical Engineering from the University of Maryland, College Park in 1996, with a dissertation on Adaptive Vector Space Text Filtering for Monolingual and Cross-Language Applications.

5. After completing my Ph.D. in 1996, I was appointed in that same year as an Assistant Professor in the College of Library and Information Services at the University of Maryland, College Park. The name of the College of Library and Information Services has subsequently been changed to the College of Information Studies, reflecting a broader scope of both teaching and research. I was promoted to Associate Professor (with tenure) in 2002, and to Professor (with tenure) in 2010. From 2006 to 2009 I served as Associate Dean for Research in the College of Information Studies. In 2000, I was appointed to a joint faculty position in the University of Maryland Institute for Advanced Computer Studies (UMIACS). UMIACS appointments are renewable appointments with a term of three to five years, and my UMIACS appointment has been renewed continuously. I also currently serve as an Affiliate Professor in the Computer Science Department at the University of Maryland, College Park, and as an Affiliate Professor in the Applied Mathematics, Statistics and Scientific Computation (AMSC) program at the University of Maryland, College Park.

6. I have also held visiting positions while conducting research on IR during sabbatical visits (of 5-14 months duration) at the University of California Berkeley, the University of Southern California Information Sciences Institute (USC-ISI), the University of Melbourne (Australia) and (concurrently) RMIT University (Australia), and the University of Florida and (concurrently) the University of South Florida. I also am affiliated with the Johns Hopkins University Human Language Technology Center of Excellence, and I hold a Visiting Professor appointment at the National Institute of Informatics (NII) in Japan.

7. From 2010 to 2012, I served as Director of the UMIACS Computational Linguistics and Information Processing (CLIP) lab. The CLIP lab’s research record is particularly strong in both computational linguistics and IR.

8. As I mentioned above, I perform research in the general area of IR, with particular emphasis on the design of search systems that leverage specific technologies for the computational manipulation of human language. Examples of these technologies include translation (for CLIR), speech recognition (for speech retrieval), and optical character recognition (for document image retrieval). I have also conducted research on retrieval from informal sources of text such as email (particularly in the context of e-discovery), text chat, and microblog posts, and on recommender systems, knowledge base population, and computational social science.

9. I have published more than 240 academic papers. About 100 of those papers are on CLIR, and I continue to conduct, publish, and review research on that topic. I have published peer reviewed papers on CLIR in venues such as the Journal of the Association for Information Science and Technology, Information Processing & Management, Information Retrieval, ACM Transactions on Asian Language Information Processing, Computer Speech and Language, the Annual Review of Information Science and Technology, and the ACM Special Interest Group on Information Retrieval (SIGIR) conference.

10. At the University of Maryland, I teach courses on IR and on other aspects of information technology. Examples include graduate courses on Information Retrieval Systems, Creating Information Infrastructures, and Transformational Information Technologies, and an undergraduate course on Information and Knowledge Management.

11. I recently completed a five-year term as co-editor of the peer reviewed journal Foundations and Trends in Information Retrieval, and I continue to serve as a Senior Associate Editor for the peer reviewed journal ACM Transactions on Information Systems. I have previously also served on the editorial boards of the peer-reviewed journals Information Processing & Management, Journal of the Association for Information Science and Technology, and Information Retrieval. In 2008, I served as Program Committee Co-Chair for the leading IR research conference, the ACM SIGIR conference. I have also helped to organize more than thirty other international research meetings; examples include the 1997 American Association for Artificial Intelligence (AAAI)[1] Spring Symposium on Cross-Language Text and Speech Retrieval, the 2009 annual conference of the North American chapter of the Association for Computational Linguistics (NAACL), and seven workshops on the Discovery of Electronically Stored Information (DESI).

12. I have served in leadership roles for four of the major global IR evaluation venues, including as General Co-Chair for the NII Testbeds and Community for Information Access Research (NTCIR) evaluation in Japan; as Program Committee member and as a coordinator for tracks on CLIR and e-discovery at the Text Retrieval Conference (TREC) evaluation in the United States; as a coordinator for evaluations of interactive CLIR and cross-language speech retrieval at the Cross-Language Evaluation Forum (CLEF) in Europe; and as a coordinator for tracks on speech retrieval and on retrieval of scanned documents at the Forum for Information Retrieval Evaluation (FIRE) in India.

13. My research on CLIR has been supported by the National Science Foundation (NSF) and the Defense Advanced Research Projects Agency (DARPA). My research on other topics has been supported by the National Endowment for the Humanities, IBM, and the Qatar National Research Fund, among others. I regularly review research proposals for NSF, and occasionally for similar bodies in other countries (including, for example, Canada, Hong Kong, Luxembourg, and Switzerland).

14. I have given presentations on my research in more than 30 countries, including, for example, Brazil, China, Egypt, Germany, India, New Zealand, Russia, Singapore, Spain, and the United Kingdom.

15. A more detailed description of my professional qualifications, including a list of publications, teaching, and professional activities, is contained in my curriculum vitae, a copy of which is attached as Appendix A.

[1] The name of AAAI has subsequently been changed to the Association for the Advancement of Artificial Intelligence.

III. COMPENSATION AND RELATIONSHIP TO THE PARTIES

16. I am being compensated for my time on this matter at my standard consulting rate of $420 per hour plus expenses. Apart from that, I have no financial interest in AOL Inc., Google Inc., or Improved Search LLC. My compensation is in no way dependent on the substance of my opinions or the outcome of this proceeding.

IV. LEGAL STANDARDS USED IN MY ANALYSIS

17. Although I am not an attorney and do not offer any opinions on the law, I have been informed of certain legal principles that I have relied on in reaching the opinions set forth in this Declaration.

A. The Person of Ordinary Skill in the Art

18. I have been informed that a POSA is a hypothetical person who is presumed to have known all of the relevant prior art as of the priority date. I have been informed that factors that may be considered in determining the level of ordinary skill in the art may include: (a) the educational level of the inventor; (b) the type of problems encountered in the art; (c) prior art solutions to those problems; (d) the rapidity with which innovations are made; (e) the sophistication of the technology; and (f) the educational level of active workers in the field.

19. I have been asked to provide my opinion as to the qualifications of the person of ordinary skill in the art to which the ’154 patent pertains. In my opinion, a POSA would have at least an undergraduate degree in computer science, information science, or a similar field and at least two years of experience in the field of CLIR, which could include academic experience (e.g., a Master’s degree with a CLIR focus). In addition, I believe a POSA would be familiar with commercial aspects of information retrieval, including search-related advertising.

B. Broadest Reasonable Interpretation

20. I have been informed that for purposes of this CBM proceeding the terms in the claims of the ’154 patent are to be given their broadest reasonable interpretation in light of the specification of the ’154 patent, as understood by a POSA. I have used this standard throughout my analysis.

C. Means-Plus-Function Claim Elements

21. I have been informed that an element of a patent claim may be expressed as a means or step for performing a specified function without the recital of structure, materials, or acts in support thereof. I have been informed that such elements are referred to as “means-plus-function” elements and are construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. I have been informed that if the specification fails to identify any corresponding structure, material, or acts for a means-plus-function claim element, the claim is invalid because it is indefinite.

22. I have been informed that, in determining whether a claim element is a means-plus-function element, use of the term “means” creates a presumption that a claim element is a means-plus-function element and that lack of the term “means” creates a presumption that a claim element is not a means-plus-function element. However, I have been informed that the essential inquiry is whether the words of the claim are understood by persons of ordinary skill in the art as having sufficiently definite meaning as the name for structure. I have been informed that the use of nonce words or generic terms such as module, mechanism, element, or device may invoke means-plus-function treatment.

23. I have been informed that, where the structure disclosed by the specification for performing a particular function is a generic computer programmed to carry out an algorithm, the disclosed structure is not the general purpose computer but the special purpose computer programmed to perform the disclosed algorithm.

D. Enablement

24. I have been informed that a patent claim is invalid if the specification of the patent fails to describe the claimed invention in such full, clear, concise, and exact terms as to enable a POSA to make and use the claimed invention.

25. I have been informed that a claim lacks enablement if, as of the priority date, a POSA could not practice the full scope of the claim without undue experimentation. I have been informed that courts and the Patent Office often consider eight factors in determining whether undue experimentation would be needed to practice the full scope of a patent claim:

(1) the quantity of experimentation necessary,
(2) the amount of direction or guidance presented,
(3) the presence or absence of working examples,
(4) the nature of the invention,
(5) the state of the prior art,
(6) the relative skill of those in the art,
(7) the predictability or unpredictability of the art, and
(8) the breadth of the claims.

I have been informed that these factors are referred to as the Wands factors.

E. Incorporation by Reference

26. I have been informed that a patent specification may incorporate other material by reference, and that such material is considered to be part of the specification as if it had been set forth explicitly in the specification. I have been instructed to assume that the ’154 patent properly incorporates by reference the entireties of the ’101 patent and U.S. Patent App. No. 10/449,740 (Ex. 1007), the divisional application that followed the ’101 patent and preceded the ’154 patent.

F. Obviousness

27. I have been informed that a patent claim is invalid if the differences between the subject matter and the prior art are such that the subject matter as a whole would have been obvious to a POSA at the time of the alleged invention. I have been informed that an obviousness analysis involves reviewing the scope and content of the prior art, the differences between the prior art and the claims at issue, the level of ordinary skill in the pertinent art, and objective indicia of non-obviousness such as long-felt need, industry praise for the invention, and skepticism of others in the field.

28. I have been informed that the following rationales, among others, may support a conclusion of obviousness:

(a) the combination of familiar elements according to known methods to yield predictable results;
(b) the simple substitution of one known element for another to obtain predictable results;
(c) the use of known techniques to improve similar methods or apparatuses in the same way;
(d) the application of a known technique to a known method or apparatus ready for improvement to yield predictable results;
(e) the choice of a particular solution from a finite number of identified, predictable solutions with a reasonable expectation of success;
(f) the use of known work in one field of endeavor in either the same field or a different one based on design incentives or other market forces, if the variations are predictable to one of ordinary skill in the art; and
(g) the following of some teaching, suggestion, or motivation in the prior art that would have led one of ordinary skill to modify the prior art reference or to combine prior art reference teachings to arrive at the claimed invention.

V. SUMMARY OF OPINIONS

29. It is my opinion that claims 1 and 7 of the ’154 patent are unpatentable. I conclude that claims 1 and 7 are unpatentable because the specification fails to enable a POSA to carry out the claimed invention without undue experimentation, that claim 7 is unpatentable because the specification fails to disclose any structure, material, or acts associated with the claimed function of dialectal standardization, and that both claims are unpatentable because they would have been obvious to a POSA as of the priority date in light of the ’101 patent and Skillen.

VI. TECHNICAL BACKGROUND

30. In order to explain my opinion and the bases for it, I provide in this section a background on the field of cross-language search, including some of the challenges inherent in the field and some of the work that other researchers were doing in the years leading up to the ’101 and ’154 patents. I then provide a brief discussion of search-related advertising. Finally, I provide an overview of the ’154 patent and its predecessor, the ’101 patent.

A. Cross-Language Search and Query Translation

31. The ’154 patent relates to the field of cross-language search, referred to in academia as CLIR for Cross-Language Information Retrieval. In the typical formulation of the CLIR problem, a user presents a query to a search engine in one human language (e.g., English) and the search engine retrieves documents written in some other human language (e.g., Chinese) that the search engine determines are most likely to be relevant to the user’s request.

32. The problem of finding documents in a language different from the language used to pose the request is not new; librarians who sought to serve diverse user populations have faced the same problem for centuries. But, by the 1990s, both the academic and commercial worlds recognized the wealth of digital information available in a variety of languages, and “the need to find ways of retrieving information across language boundaries, and to understand this information, once retrieved.” Ex. 1023 at 1 (Grefenstette). Since the late 1980s, corporations, government institutions, academic research centers, and others have sponsored or conducted research to develop methods and systems for accessing and understanding information written in a language other than that of the user’s query, and earlier work with smaller collections extends back to at least the 1960s.

33. CLIR involves the same challenges as monolingual IR (such as determining which results are most relevant to the user’s query) plus the additional challenges posed by translation. Research on CLIR has often built upon techniques and approaches developed in monolingual IR. For example, it was well known by the late 1990s that one workable method of performing CLIR was to translate the user’s query into the target language and then perform a search in that language with a monolingual search engine. See, e.g., Ex. 1021 at 101 (Yamabana); Ex. 1022 at 254, 258-59 (Bian). That approach to CLIR is known as query translation because of its focus on translating the terms[2] found in a user’s query.
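
For illustration only, the two-stage structure of query translation described above (translate the query terms, then run an ordinary monolingual search) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the lexicon, the documents, and the function names are hypothetical toy data invented for this example and are not drawn from any cited exhibit, and the ranking is a simple term-overlap count rather than any particular search engine's method.

```python
# Hypothetical toy bilingual lexicon (English -> French) and target-language documents.
LEXICON = {"library": ["bibliothèque"], "book": ["livre", "réserver"]}
DOCS = {1: "la bibliothèque municipale", 2: "réserver une table", 3: "un livre ancien"}


def translate_query(terms, lexicon):
    """Replace each source-language term with all of its known translations."""
    translated = []
    for term in terms:
        # Terms with no lexicon entry are passed through unchanged.
        translated.extend(lexicon.get(term, [term]))
    return translated


def monolingual_search(terms, docs):
    """Rank documents by how many distinct query terms they contain."""
    scores = {}
    for doc_id, text in docs.items():
        hits = len(set(terms) & set(text.split()))
        if hits:
            scores[doc_id] = hits
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


results = monolingual_search(translate_query(["book"], LEXICON), DOCS)
```

In this toy data, the English term "book" has two translations, so the translated query matches both the document about an old book and the document about reserving a table. That is a small instance of the translation ambiguity that any query translation system has to manage.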

34. Query translation was the subject of much of the research on CLIR in the 1990s. For example, in a 1998 paper published in the book Cross-Language Information Retrieval, Kiyoshi Yamabana and colleagues presented a method for translating queries that included an interactive user interface allowing the user to select from different translations of the query so as to ensure that the term selected for use in the translated query properly reflected the intended meaning of the query term. Ex. 1021 at 9-11.

[Figure: screenshot from Yamabana et al. (Ex. 1021) showing the translation-selection prompt; image not reproduced.]

In the figure above, Yamabana et al. show a prompt allowing the user to select from different translations of the Japanese word jouhou (which, in the interface, is written using Japanese characters). Id. at 10-11. The translated query is then used in a monolingual retrieval system. See id. at 11.

[2] “Word” and “term” are often used interchangeably in the CLIR literature. When I use “term” in this Declaration I intend to refer generically to words or multi-word expressions.

35. Another example of the use of query translation for CLIR can be found in a 1998 paper by Guo-Wei Bian and Hsin-Hsi Chen. Ex. 1022. Bian and Chen described a system that would automatically translate a user’s query from Chinese into English, send the query to English language search engines, and automatically translate the received results back into Chinese. Id. at 11-12.

36. Yet another example of query translation CLIR in the late 1990s was described in a paper by Joanne Capstick and several colleagues published in Information Processing & Management in March of 2000. Ex. 1024. Capstick et al. created a system called MULINEX that allowed a user entering a query in English, German, or French to obtain search results in any of those languages by translating their query and searching for documents in the desired language. See id. at 4.

37. Alternatives to query translation, including what came to be called “document translation” (which, more precisely, refers to translating terms found in the documents as they are indexed), or interlingual techniques (in which language-neutral representations of both the queries and the documents are created) were also explored in this time frame. The focus on query translation was due, at least in part, to the fact that queries are typically short and thus can be translated quickly. This offers advantages for experimental settings (in which researchers often wish to compare results from multiple system designs quickly, sometimes over a broad range of parameter settings) and in some operational settings (e.g., when a system needs to support numerous query languages, a setting in which it might be infeasible to build an index for each possible query language).

B. Problems Inherent in the Query Translation Approach

38. Building a complete and usable system for CLIR using query translation presents three inherent challenges that must be addressed through refined solutions in order to maximize the degree to which the search results can be expected to be relevant to the searcher’s information need: identifying individual query terms, identifying possible translations for each query term, and determining how best to use those possible translations. In this section, I discuss each of those challenges in turn.

1. Identifying individual query terms

39. The first step of the query-translation process is typically to identify the individual terms in a user’s query. Even though a person fluent in the language of the user’s query might be able to easily read a query and point out what they think of as individual words, a computer must be programmed to use some automated process to identify the terms on which the search will be based, and it must do so in some way that is well suited to the retrieval task.

40. In Western languages such as English, French, and Russian in which words are by convention delimited by spaces or other recognizable characters, the simplest approach to identifying individual terms is to strip punctuation and then split the words at “white space” (e.g., spaces, tab characters, or line endings) to obtain terms (often referred to as “tokens”) that approximate words. See, e.g., Ex. 1020 at 42 (Fluhr ’98). Simply segmenting a string at white space can, however, split terms that are better thought of as a single term for purposes of translation (e.g., “high school”). When there are known translations for multi-word expressions, it can be useful to segment the text in a way that treats a known multi-word expression as a single term, which requires a more complex approach.
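
For illustration only, the white-space approach and one possible way of keeping known multi-word expressions together can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the phrase list and the function name are hypothetical, only ASCII punctuation is stripped, and nothing here is taken from the cited references.

```python
import string

# Hypothetical list of known multi-word expressions; a real system would
# presumably draw these from its translation lexicon.
KNOWN_PHRASES = {("high", "school")}


def tokenize(query, phrases=KNOWN_PHRASES):
    """Strip punctuation, split at white space, then rejoin known phrases."""
    # Remove ASCII punctuation only (an assumption made for this sketch).
    cleaned = query.translate(str.maketrans("", "", string.punctuation))
    words = cleaned.lower().split()  # split() breaks on spaces, tabs, and line endings
    max_len = max((len(p) for p in phrases), default=1)
    terms, i = [], 0
    while i < len(words):
        # Prefer the longest known phrase that starts at position i.
        for n in range(min(max_len, len(words) - i), 1, -1):
            if tuple(words[i:i + n]) in phrases:
                terms.append(" ".join(words[i:i + n]))
                i += n
                break
        else:
            # No phrase matched here, so the single word is the term.
            terms.append(words[i])
            i += 1
    return terms


tokens = tokenize("High school sports, please!")
```

With the phrase list above, the query "High school sports, please!" yields the terms "high school", "sports", and "please"; with an empty phrase list, "high" and "school" would come out as separate terms.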

41. A more complex approach is also required for languages in which words are frequently not separated by white space or punctuation. Examples of such languages include Chinese, in which sentences are delimited but individual words are not, and some “freely compounding” languages (e.g., German), in which it is common to combine words into longer terms that lack internal delimiters.

42. One approach to segmentation in such cases is to start with some list of words in the language (e.g., from a dictionary) and then to use an algorithm to identify the “best” way of tiling those terms onto a longer string of characters. Generally these algorithms differ not in their goal or their basic approach, but rather in the computational details of how the tiling process is performed. One particularly simple approach, a type of “greedy” segmentation, is to work from left to right, repeatedly finding the longest matching dictionary entry.[3] For example, this approach will easily segment the German term “Götterdämmerung” into the strings “götter” (gods) and “dämmerung” (twilight). More sophisticated techniques are needed to deal with missing or added characters (e.g., when splitting the German word “Fahrvergnügen” into the strings “fahren” (driving) and “vergnügen” (enjoyment)). Such cases can be handled by automatically generating common transformations (such as fahren to fahr), but at the cost of adding additional opportunities to make mistakes.

[3] See generally Ex. 1025 (Kwok ’99) at 3-5 (describing a greedy segmentation approach for Chinese).
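
For illustration only, a left-to-right longest-match segmenter of the kind described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the word lists and the function name are hypothetical, unmatched characters are simply emitted one at a time, and the code is not the method of Kwok or of any other cited reference.

```python
def greedy_segment(text, dictionary):
    """Left-to-right longest-match ("greedy") segmentation of an undelimited string."""
    max_len = max(map(len, dictionary), default=1)
    pieces, i = [], 0
    while i < len(text):
        # Try the longest candidate first; a single character is the fallback
        # when no dictionary entry starts at position i.
        for n in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + n] in dictionary or n == 1:
                pieces.append(text[i:i + n])
                i += n
                break
    return pieces


compound = greedy_segment("götterdämmerung", {"gott", "götter", "dämmerung"})
altered = greedy_segment("fahrvergnügen", {"fahren", "vergnügen"})
```

The first call returns "götter" and "dämmerung". The second shows the limitation noted above: because "fahren" does not appear literally in "fahrvergnügen", the sketch falls back to the single characters "f", "a", "h", "r" before matching "vergnügen", unless a transformed form such as "fahr" is added to the word list.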

43. Queries in languages such as Chinese that completely lack word delimiters pose particular problems. For an English analogue (with spaces removed), consider the query “tentsandstakes,” which may or may not contain the word “sand.” Properly handling such cases requires moving beyond greedy methods to generate all possible segmentations and then testing each to see which are the most likely to reflect the writer’s intent.[4] To deal correctly with such complex cases, it can be useful to perform syntactic analysis (e.g., “tents and stakes” is a well formed clause), to leverage simple term counts (e.g., “and” is a very common word), or to make use of a broader range of corpus statistics (e.g., “tent” and “stakes” might rarely be written together near “sand,” perhaps because tent stakes don’t work well in sand). When the number of options becomes too large, pruning techniques such as dynamic programming can
