_______________

BEFORE THE PATENT TRIAL AND APPEAL BOARD
_____________

GOOGLE LLC

Petitioner

v.

VALTRUS INNOVATIONS LIMITED

(record) Patent Owner

Patent No. 6,728,704

DECLARATION OF PADHRAIC SMYTH

GOOGLE 1002
`
`
`
TABLE OF CONTENTS

I. ENGAGEMENT AND COMPENSATION
II. QUALIFICATIONS
III. SUMMARY OF OPINIONS
IV. MATERIALS REVIEWED
V. UNDERSTANDING OF THE RELEVANT LAW
    A. Anticipation
    B. Obviousness
VI. LEVEL OF ORDINARY SKILL IN THE ART
VII. RELEVANT TIMEFRAME FOR DETERMINING OBVIOUSNESS
VIII. TECHNICAL INTRODUCTION
    A. The ’704 Patent Disclosure
IX. CLAIM INTERPRETATION
    A. BACKGROUND ON CLAIM INTERPRETATION
I. DETAILED EXPLANATION OF THE REASONS FOR UNPATENTABILITY
    Ground 1. Claims 1-3, 5-9, 12-14, 17-20, and 23 were Obvious over Bushee in view of Voorhees
        B. Overview of the Ground
            1. Overview of Bushee
            2. Overview of Voorhees
        Rationale (Motivation) Supporting Obviousness
        Graham Factors
        Reasonable Expectation of Success
        Analogous Art
        Claim Mapping
    Ground 2. Claims 3-4, 9-10, 14-15, and 20-21 were Obvious over Bushee in view of Voorhees and Koppel
        A. Overview of the Ground
        B. Overview of Koppel
        C. Graham Factors
        D. Rationale (Motivation) for the Combination
        E. Reasonable Expectation of Success
        F. Analogous Art
        Claim Mapping
    Ground 3. Claims 1-3, 5-9, 12-14, 17-20, and 23 are Obvious over Voorhees in view of Bushee and Tso
        A. Overview of the Ground
            1. Overview of Tso
        Graham Factors
        Analogous Art
        Rationale for the Combination
        Reasonable Expectation of Success
        Claim Mapping
    Ground 4. Claims 3-4, 9-10, 14-15, and 20-21 were Obvious over Voorhees in view of Bushee, Tso, and Koppel
        Explanation of the Ground
X. OATH
`
`
`
`
I. ENGAGEMENT AND COMPENSATION

1. My name is Padhraic Smyth. I have been retained by Google LLC for the purpose of providing my opinion with respect to the unpatentability of U.S. Pat. No. 6,728,704 (“the ’704 patent”). I am being compensated for my time in preparing this declaration at my standard hourly rate, and my compensation is not dependent upon my opinions or the outcome of the proceedings. My curriculum vitae is attached as Ex. 1003.
`
II. QUALIFICATIONS

2. I am currently a Professor in the Department of Computer Science at the University of California, Irvine. I have held the title of Chancellor’s Professor since 2018. Before that, I held the title of Full Professor from July 2003 to 2018. From July 1998 to June 2003, I held the title of Associate Professor. I began at UC Irvine as an Assistant Professor, a title I held from April 1996 to June 1998. I also hold joint faculty appointments with the Departments of Statistics and Education at UC Irvine.
`
3. I was a Founding Director of the UCI Data Science Initiative at University of California, Irvine, from July 2014 to June 2018. I was also a Founding Director of the Center for Machine Learning and Intelligent Systems at the University of California, Irvine, from January 2007 to July 2014.
`
4. From October 1988 to March 1996, I was a Member of Technical Staff and a Technical Group Leader (from 1992) at the Jet Propulsion Laboratory, California Institute of Technology, Pasadena, California.
`
5. I completed a Bachelor’s Degree in Electronic Engineering in 1984 at the National University of Ireland, University College Galway. I completed a Master’s Degree in Electrical Engineering at the California Institute of Technology, Department of Electrical Engineering, in 1985. In 1988, I completed a Ph.D. in Electrical Engineering at the California Institute of Technology.
`
6. I have spent three and a half decades researching topics relevant to the ’704 patent, including data mining, machine learning, artificial intelligence, pattern recognition, and applied statistics. I am a co-author on over 200 published papers in these and related fields.
`
7. I have co-authored or edited the following books that are relevant to the subject matter of the ’704 patent: Modeling the Internet and the Web: Probabilistic Methods and Algorithms, P. Baldi, P. Frasconi, and P. Smyth, John Wiley, June 2003; Principles of Data Mining, D. Hand, H. Mannila, and P. Smyth, Cambridge, MA: MIT Press, 2001; and Advances in Knowledge Discovery and Data Mining, U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurasamy (eds.), Palo Alto, CA: AAAI/MIT Press, 1996. I was an editor of the following conference proceedings relevant to the ’704 patent: C. Apte, J. Ghosh, P. Smyth (eds.), Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ISBN 978-1-4503-0813-7, ACM Press, New York, NY, 2011.
`
8. Up to and around the time of filing of the ’704 patent, I presented or published the following papers, which are examples of work I have conducted relevant to the subject matter of the ’704 patent: D. Pavlov and P. Smyth, Probabilistic query models for transaction data, in Proceedings of the ACM Seventh International Conference on Knowledge Discovery and Data Mining, ACM: New York, NY, pp. 164-173, August 2001; I. Cadez, D. Heckerman, C. Meek, P. Smyth, and S. White, Visualization of navigation patterns on a Web site using model based clustering, in Proceedings of the ACM Sixth International Conference on Knowledge Discovery and Data Mining, New York, NY: ACM Press, pp. 280-284, August 2000; and D. Pavlov, H. Mannila, and P. Smyth, Probabilistic models for query approximation with large sparse binary data sets, in Proceedings of the 2000 Uncertainty in AI Conference, San Francisco, CA: Morgan Kaufmann, pp. 465-472, July 2000.
`
9. Further examples of work I have conducted relevant to the subject matter of the ’704 patent include H. Mannila and P. Smyth, Approximate query answering with frequent sets and maximum entropy, Proceedings of ICDE 2000, IEEE Press, 309, February 2000; AAAI Press, 54-60, 1997; E. Keogh and P. Smyth, A probabilistic approach to fast pattern matching in time series databases, Proceedings of the Third International Conference on Knowledge Discovery and Data Mining, Menlo Park, CA: AAAI Press, 24-30, 1997; P. Smyth, Clustering sequences using hidden Markov models, in Advances in Neural Information Processing 9, M. C. Mozer, M. I. Jordan and T. Petsche (eds.), Cambridge, MA: MIT Press, 648-654, 1997; U. M. Fayyad, P. Smyth, The automated analysis, cataloging, and searching of digital image libraries: a machine learning approach, Digital Libraries Workshop, Lecture Notes in Computer Science vol 916, pp. 225-249, Heidelberg: Springer Verlag, 1994.
`
III. SUMMARY OF OPINIONS

10. In my opinion:

• Claims 1-3, 5-9, 12-14, 17-20, and 23 were obvious over Bushee in view of Voorhees.

• Claims 3-4, 9-10, 14-15, and 20-21 were obvious over Bushee in view of Voorhees and Koppel.

• Claims 1-3, 5-9, 12-14, 17-20, and 23 were obvious over Voorhees in view of Bushee and Tso.

• Claims 3-4, 9-10, 14-15, and 20-21 were obvious over Voorhees in view of Bushee, Tso, and Koppel.
`
IV. MATERIALS REVIEWED

11. In forming my opinions, I have relied on my knowledge of the field and my experience, and have specifically reviewed the following exhibits:

Exhibit No.   Description
1001          U.S. Patent No. 6,728,704 (“the ’704 patent”).
1004          U.S. Pat. No. 6,711,569 (“Bushee”).
1005          U.S. Pat. No. 5,864,846 (“Voorhees”).
1006          U.S. Pat. No. 6,385,602 (“Tso”).
1007          File History of U.S. Pat. App. Ser. No. 09/940,600 (“the ’704 patent file history”).
1013          U.S. Pat. No. 7,257,766 (“Koppel”).
`
V. UNDERSTANDING OF THE RELEVANT LAW

12. I have the following understanding of the applicable law:
`
A. Anticipation

13. I understand that a claim in an issued patent can be unpatentable if it is anticipated. I understand that “anticipation” means that there is a single prior art reference that discloses every element of the claim, arranged in the way required by the claim.
`
14. I understand that an anticipating prior art reference must disclose each of the claim elements expressly or inherently. I understand that “inherent” disclosure means that the claim element, although not expressly described by the prior art reference, must necessarily be present based on the disclosure. I understand that a mere probability that the element is present is not sufficient to qualify as “inherent disclosure.”
`
B. Obviousness

15. I understand that a claim in an issued patent can be unpatentable if it is obvious. Unlike anticipation, obviousness does not require that every element of the claim be in a single prior art reference. Instead, it is possible for claim elements to be described in different prior art references, so long as there is motivation or sufficient reasoning to combine the references, and a reasonable expectation of success in achieving what is set forth in the claims.
`
16. I understand that prior art can only be used in an obviousness challenge if it is “analogous art”. I understand analogous art to be prior art that is either in the same field as the patent at issue, or prior art that would have been reasonably pertinent to the problems facing the named inventors.
`
17. I understand that a claim is unpatentable for obviousness if the differences between the claimed subject matter and the prior art are such that the subject matter as a whole would have been obvious at the time the alleged invention was made to a person having ordinary skill in the art to which said subject matter pertains.
`
18. I understand, therefore, that when evaluating obviousness, one must consider obviousness of the claim “as a whole”. This consideration must be from the perspective of the person of ordinary skill in the relevant art, and such perspective must be considered as of the “time the invention was made.”

19. The level of ordinary skill in the art is discussed in ¶¶26-28 below.

20. The relevant time frame for obviousness, the “time the invention was made”, is discussed in ¶29 below.
`
21. I understand that in considering the obviousness of a claim, courts consider the four so-called Graham factors, named for a Supreme Court decision (Graham versus John Deere). These four factors include (1) the scope and content of the prior art, (2) the level of ordinary skill in the art at the relevant time, (3) the differences between the prior art and the claim, and (4) any “secondary considerations.”
`
22. I understand that “secondary considerations” include real-world evidence that can tend to make a conclusion of obviousness either more probable or less probable. For example, the commercial success of a product embodying a claim of the patent could provide evidence tending to show that the claimed invention is not obvious. In order to understand the strength of the evidence, one would want to know whether the commercial success is traceable to a certain aspect of the claim not disclosed in a single prior art reference (i.e., whether there is a causal “nexus” to the claim language). One would also want to know how the market reacted to disclosure of the invention, and whether commercial success might be traceable to things other than innovation, for example the market power of the seller, an advertising campaign, or the existence of a complex system having many features beyond the claims that might be desirable to a consumer. One would also want to know how the product compared to similar products not embodying the claim. I understand that commercial success evidence should be reasonably commensurate with the scope of the claim, but that it is not necessary for a commercial product to embody the full scope of the claim.
`
23. Other kinds of secondary considerations are possible. For example, evidence that the relevant field had a long-established, unsolved problem or need that was later met by the claimed invention could be indicative of non-obviousness. Evidence that others had tried, but failed, to make an aspect of the claim might indicate that the art lacked the requisite skill to do so. Evidence of copying of the patent owner’s products before the patent was published might also indicate that its approach to solving a particular problem was not obvious. Evidence that the art recognized the value of products embodying a claim, for example, by praising the named inventors’ work, might tend to show that the claim was non-obvious.
`
24. I further understand that prior art references can be combined where there is an express or implied rationale to do so. Such a rationale might include an expected advantage to be obtained, or might be implied under the circumstances. For example, a claim is likely obvious if design needs or market pressures existing in the prior art make it natural for one or more known components to be combined, where each component continues to function in the expected manner when combined (i.e., when there are no unpredictable results). A claim is also likely unpatentable where it is the combination of a known base system with a known technique that can be applied to the base system without an unpredictable result. In these cases, the combination must be within the capabilities of a person of ordinary skill in the art.
`
25. I understand that when considering obviousness, one must not refer to teachings in the specification of the patent itself. One can, however, refer to portions of the specification admitted to being prior art, including the “BACKGROUND” section. Furthermore, a lack of discussion in the patent specification concerning how to implement a disclosed technique can support an inference that the ability to implement the technique was within the ordinary skill in the prior art.
`
VI. LEVEL OF ORDINARY SKILL IN THE ART

26. In my opinion, the relevant art was that of search engine technology. I note that the ’704 patent teaches that “[t]his invention relates generally to search engine technology.” (Ex. 1001, 1:7-8).
`
27. In my opinion, a person having ordinary skill in the art (hereinafter, a “POSITA”) would have had a Bachelor’s Degree in Computer Science or a related field and five years of experience in search technology, where a higher level of education may substitute for experience and vice versa.

28. I believe I would meet this definition, and would have met this definition in the relevant timeframe. My testimony is offered from this perspective, even if it does not specifically refer to the perspective of a person of ordinary skill in the art in every instance.
`
VII. RELEVANT TIMEFRAME FOR DETERMINING OBVIOUSNESS

29. I understand that obviousness must be evaluated “at the time of the invention.” From the cover pages of the ’704 patent, I can see that the first application for a patent was filed in the United States on August 27, 2001. For the purpose of this declaration, I will analyze obviousness in the time frame immediately prior to this date, although my testimony is usually applicable to a longer period of time before August 27, 2001. My testimony is directed to this timeframe, even if I do not always use a past tense.
`
VIII. TECHNICAL INTRODUCTION

A. The ’704 Patent Disclosure

30. I have reviewed the ’704 patent. The ’704 patent relates to a particular method for arranging Web search results. (Ex. 1001, Abstract). The Web search results in the ’704 patent arise from a user query that is sent to multiple search engines. (Ex. 1001, 1:37-52). Each search engine returns a list of results. (Id.). The list of results includes links or references to Web pages. (Id.). Links are typically text that has been marked with an anchor tag in HTML code, although links can take a variety of forms, including images and other page elements, with mechanisms that include the use of, e.g., scripting languages. According to the ’704 patent, each Web page in a search result list from a search engine has a relevance score, which is computed using known methods. (Ex. 1001, 5:56-65).
`
31. When the ’704 patent system receives multiple lists of results from different search engines, these results are separate. The ’704 patent seeks some way to “fuse” (i.e., to “merge” or “integrate”) the results, in order to present them as a single, integrated list. (Ex. 1001, 6:6-28). To do so, the system first takes a subset of each list, and assigns each search engine a score (a “representative value”) that is intended to represent the overall relevance of the search engine’s results, rather than the relevance of any particular entry. (Id.). One example of such a representative value, in the ’704 patent disclosure, is the average of the individual scores of the entries in the list of results for a search engine. (Id.). For example, suppose a first search engine yields 100 results, each of which has an individual relevance score. The ’704 patent teaches, e.g., to average the 100 scores, and to use the average score to represent the relevance of the first search engine with respect to the results of other search engines. (Id.).
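
For illustration only, the averaging example described above can be sketched in Python as follows. This is a minimal sketch and not code from the ’704 patent: the function name representative_value and the score values are hypothetical, chosen only to show how an average of per-entry scores can stand in for a whole result list.

```python
from statistics import mean


def representative_value(scores):
    """Average the per-entry relevance scores of one engine's selected entries.

    Hypothetical helper: the name and the error handling are assumptions.
    """
    if not scores:
        raise ValueError("at least one scored entry is required")
    return mean(scores)


# Hypothetical relevance scores for the selected entries of two engines.
engine_a_scores = [0.9, 0.7, 0.8]
engine_b_scores = [0.4, 0.6]

rep_a = representative_value(engine_a_scores)
rep_b = representative_value(engine_b_scores)

# The engine with the larger average is treated as more relevant overall.
more_relevant = "A" if rep_a > rep_b else "B"
```

In this sketch a single number per engine, rather than any individual entry score, is what is compared across engines.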
`
32. The representative value is used to sort the results from multiple search engines into a single list. (Id.). To do this, the ’704 patent teaches two methods. (Id.). As explained in the ’704 patent:
`
“Once each result list has a representative value assigned to it, it is merged with the others accordingly. Two preferred embodiments are given for accomplishing this operation. In the first embodiment, entries are merged by selecting the list with the highest representative value (e.g., the highest average scoring value). The first entry on the list that has not already been selected is then picked. That list's representative value is then decremented by a fixed amount and the process is repeated until all entries have been picked. If any representative value drops below zero after decrementing, it is reset to its initial value. In the second embodiment, entries are merged using a probabilistic approach. Each list is assigned a probability value equal to its representative value's percentage of the total representative values for all lists. Lists are then selected according to their probability value, with lists having higher probability values being more likely to be selected.”

(Ex. 1001, 6:8-24).
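
For illustration only, the two merging embodiments in the quoted passage can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not code from the ’704 patent. The function names, the integer representative values, the size of the fixed step, the tie-breaking rule (the first list in iteration order wins), and the choice to skip lists whose entries have all been picked are hypothetical; the quoted passage does not specify them. The second function also assumes that every representative value is positive.

```python
import random


def merge_by_decrement(result_lists, rep_values, step):
    """First embodiment (sketch): pick from the list with the highest
    representative value, then decrement that value by a fixed step."""
    current = dict(rep_values)                     # working copies of the values
    position = {name: 0 for name in result_lists}  # next unpicked entry per list
    merged = []
    while True:
        # Assumption: only lists that still have unpicked entries are eligible.
        open_lists = [n for n in result_lists if position[n] < len(result_lists[n])]
        if not open_lists:
            break
        best = max(open_lists, key=lambda n: current[n])
        merged.append(result_lists[best][position[best]])
        position[best] += 1
        current[best] -= step
        if current[best] < 0:
            current[best] = rep_values[best]       # reset, as the quoted passage states
    return merged


def merge_by_probability(result_lists, rep_values, rng=None):
    """Second embodiment (sketch): select a list with probability proportional
    to its share of the total representative value."""
    if rng is None:
        rng = random.Random()
    position = {name: 0 for name in result_lists}
    merged = []
    while True:
        open_lists = [n for n in result_lists if position[n] < len(result_lists[n])]
        if not open_lists:
            break
        weights = [rep_values[n] for n in open_lists]  # assumed positive
        chosen = rng.choices(open_lists, weights=weights, k=1)[0]
        merged.append(result_lists[chosen][position[chosen]])
        position[chosen] += 1
    return merged


# Hypothetical result lists and representative values.
result_lists = {"A": ["a1", "a2", "a3"], "B": ["b1", "b2"]}
rep_values = {"A": 8, "B": 5}

merged_fixed = merge_by_decrement(result_lists, rep_values, step=2)
merged_random = merge_by_probability(result_lists, rep_values, rng=random.Random(0))
```

With these hypothetical inputs, the first function starts with list A (value 8), switches to list B once A's value has dropped to 4, and continues until every entry has been picked. The second function produces an order that depends on the random draws, while still taking each list's entries in their original order.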
`
33. Claim 1 of the ’704 patent reads as follows:

“1[a]. A method of merging result lists from multiple search engines, said method comprising:
 [1b] transmitting a query to a set of search engines;
 [1c] receiving in response to said query a result list from each search engine of said set of search engines, each result list including one or more entries;
 [1d] selecting a subset of entries from each result list to form a set of selected entries;
 [1e] assigning to each selected entry of said set of selected entries a scoring value according to a scoring function;
 [1f] assigning to each subset a representative value according to the scoring values assigned to its entries; and
 [1g] producing a merged list of entries in a predetermined manner based on the representative value assigned to each result list,
 [1h] wherein the representative value varies in accordance with predetermined manner.”
`
IX. CLAIM INTERPRETATION

A. BACKGROUND ON CLAIM INTERPRETATION

34. I understand that it is sometimes necessary or useful for claim terms in a patent to be further explained or interpreted (“construed”). I understand that in the present proceeding, the Board applies the same claim construction standard used by District Courts in actions involving the validity or infringement of a patent. This involves construing claim terms in accordance with the ordinary and customary meaning of such terms, as understood by one of ordinary skill in the art, in light of the claim language, the technical disclosure of the patent (i.e., the specification) and the prosecution history or “file history” of correspondence with the United States Patent and Trademark Office (USPTO) pertaining to the patent.
`
35. I further understand that the file history of a parent patent application can be relevant to the claim construction of claim terms appearing in patents that have descended from that parent application.

36. I understand that certain “extrinsic” evidence, such as dictionaries or other prior art, can sometimes be useful to understand the meaning of a claim term. However, I understand that where there is a conflict between any such extrinsic evidence and the patent and the patent’s prosecution history, the latter control.
`
37. I understand that no claim construction orders have been issued for the ’704 patent and that claim construction proceedings have not taken place in any co-pending litigations in which the ’704 patent has been asserted.

38. In my opinion, I can apply the terms of the challenged claims of the ’704 patent to the prior art without the need for further interpretation of those terms.
`
I. DETAILED EXPLANATION OF THE REASONS FOR UNPATENTABILITY

Ground 1. Claims 1-3, 5-9, 12-14, 17-20, and 23 were Obvious over Bushee in view of Voorhees.

39. In my opinion, claims 1-3, 5-9, 12-14, 17-20, and 23 are obvious over U.S. Pat. No. 6,711,569 (“Bushee”) (Ex. 1004) in view of U.S. Pat. No. 5,864,846 (“Voorhees”) (Ex. 1005). I understand that both Bushee and Voorhees were prior art to the ’704 patent.
`
B. Overview of the Ground

40. In my opinion, Bushee describes almost all of claim 1 of the ’704 patent under a proper understanding of the claim language, including: transmitting a user query to multiple search engines, receiving results in the form of a list of web entries from each search engine, selecting a subset of results, scoring each entry in each result list, assigning to each search engine a representative value that is the average of the individual scores of each entry, and merging the various results to form a merged results list.
`
41. Under the proper understanding of the claims, Bushee does not describe merging the results “in a predetermined manner based on the representative value...wherein the representative value varies in accordance with [the] predetermined manner” (claim limitation [1h]). However, Voorhees describes just such a method of sorting pages into a merged result list that uses a representative value, wherein the representative value varies in accordance with a predetermined manner.

42. In my opinion, it would have been obvious to use Voorhees’s method of sorting pages into a merged list within Bushee’s multi-search-engine system. The resulting system would obviously have met claims 1-3, 5-9, 12-14, 17-20, and 23.
`
1. Overview of Bushee

43. Bushee describes a “method for automatic selection of databases for improving the efficiency of data capture and management systems.” (Ex. 1004, Abstract). While Bushee often speaks in terms of “databases” having “documents”, it is clear that Bushee’s databases are search engines operating on the World Wide Web. (Ex. 1004, 2:1-7). Bushee explains:

“Because of the similarity between web sites specifically and databases in general the terms document and web page are used synonymously throughout this document unless otherwise distinguished by context. Similarly, the terms search engine and database are also used synonymously throughout this document unless otherwise distinguished by context.”

(Ex. 1004, 2:1-7)(Emphasis added).
`
44. Bushee explains the method of the invention in relation to Fig. 2, which is reproduced here:

[Fig. 2 of Bushee: flowchart with the following steps, in order: Obtain Query; Compare Query to Categorization of Database in Pool; Select Databases; Pass Query to Selected; Collect Results from Database; Pull First N Results from Each Database; Score Each of N Results; Average Score of N Results for Each Database; Assign Average Score; Rank Databases by Average Score; Present Databases and Results in Ranked Order.]
`
(Ex. 1004, Fig. 2). As shown in Fig. 2, Bushee obtains a query from a user, selects databases (search engines) to send the query to, transmits the query, receives results from the search engines, and selects a subset of the results (the “first N results”). (Ex. 1004, 3:64-5:5). Then, Bushee scores each result in each subset, takes the average of the scores for each search engine, and uses the average score of each database to rank the databases for presentation. (Ex. 1004, 4:47-5:33).

45. Bushee explains that, just like in the ’704 patent, the average score represents a measure of the relevancy of each database (search engine) to the user’s query:

“Each of the documents (e.g. web pages) is then evaluated for the number of occurrences of the term or terms of the query in the document and the title of the document. The length of the document may also be determined for evaluating relevancy. This information is used to determine a numerical score for each document. The numerical scores for each document retrieved from a database are averaged together, and this averaged score is then assigned to the database as an indication of relevance of that database to the user's query.”

(Ex. 1004, 5:18-28)(Emphasis added).
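
For illustration only, the scoring and ranking steps described in the quoted passage can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not Bushee's implementation. Bushee does not give a formula in the passage quoted above, so the particular score used here (query-term occurrences in the title and body, divided by the body length in words) is an assumption, as are the function names and the sample documents.

```python
def score_document(query_terms, title, body):
    """Hypothetical score: occurrences of the query terms in the title and
    body, divided by the body length in words."""
    title_words = title.lower().split()
    body_words = body.lower().split()
    hits = sum(title_words.count(t) + body_words.count(t) for t in query_terms)
    return hits / max(len(body_words), 1)


def rank_databases(query, results_by_database, n):
    """Score the first n results of each database, average the scores per
    database, and rank the databases by that average (highest first)."""
    terms = query.lower().split()
    averages = {}
    for name, docs in results_by_database.items():
        top = docs[:n]  # the "first N results"
        scores = [score_document(terms, title, body) for title, body in top]
        averages[name] = sum(scores) / len(scores) if scores else 0.0
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Hypothetical (title, body) pairs returned by two databases.
results = {
    "db1": [
        ("merging search results", "merging ranked search results from engines"),
        ("cooking", "pasta recipes and sauces"),
    ],
    "db2": [
        ("gardening tips", "how to grow tomatoes at home"),
    ],
}

ranking = rank_databases("search results", results, n=2)
```

Here the database whose first N documents mention the query terms more often receives the higher average and is ranked first.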
`
2. Overview of Voorhees

46. Voorhees, like Bushee and the ’704 patent, teaches a system for receiving search results from multiple search engines. Voorhees describes its method as:

“A computer-implemented method for facilitating World Wide Web Searches and like database searches by combining search result documents, as provided by separate search engines in response to a query, into one single integrated list so as to produce a single document with a ranked list of pages....”

(Ex. 1005, Abstract).
`
47. There are two aspects to the Voorhees technology that are relevant to my opinion. The first is a method of assigning a relevance weight to each search engine that returns results, while the second is a method of combining (or “fusing”) the results of multiple search engines into a single search result for the user. The former method (assigning a weight) is most relevant to Grounds 3 and 4, while the latter (fusing the results into a single document) is relevant to all grounds.
`
48. Voorhees describes fusing search results from different search engines into a single search results page by estimating the relevance of each search engine’s results. To do this, Voorhees proposes two alternative methods, called (1) the average relevant document distribution method and (2) the centroid method. My opinion will focus on the centroid method. (Ex. 1005, 4:30-6:29). In the centroid method, each search engine has already performed a number of training queries with known results. (Ex. 1005, 2:43-51). Both the training queries and the results they return can differ from search engine to search engine. At each search engine, the training queries are divided into clusters, that is, groups of queries that have similar meanings. (Ex. 1005, 4:41-63, 5:35-56).
`
49. What Voorhees means by “clusters” should be briefly explained. Voorhees describes using a vector space model. (Ex. 1005, Abstract, 3:24-28). In a vector-space model, any text can be formed into a vector. Such a vector could be, for example, a set of numbers (for example {1, 4, 7, 3, 15, 0, 0, 2, …}), where each number represents how many times a particular word or word root appears in the text. In this way, any document, and any query, can be represented as a vector. The vectors can be visualized as points in n-dimensional space (vectors and points are, at least in this way, synonymous). The points (vectors) can be made equidistant from the origin of whatever coordinate system is being used, by normalizing the vector entries. A “similarity” between any two vectors can be understood by their geometric relationship; for example, one measure of similarity is the cosine of the angle between the two normalized vectors. (Ex. 1005, 3:5-7). Regarding clustering, when users enter multiple, different queries, the queries can all be represented as points in n-dimensional space. Some of these points may tend to cluster (be closer to one another than to other points). Mathematical algorithms can be used to identify such clusters, and to define which points belong to a cluster. Voorhees also defines a “cluster centroid”, which is an average of all the points (vectors) in a cluster. (Ex. 1005, 4:53-57). To average vectors, one averages each of their entries, and uses those averages as a new (centroid) vector.
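
For illustration only, the vector-space concepts described above (term-count vectors, normalization, cosine similarity, and a centroid formed by averaging entries) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not code from Voorhees: the function names, the four-word vocabulary, and the sample queries are hypothetical, and whole words are counted rather than word roots.

```python
import math
from collections import Counter


def term_vector(text, vocabulary):
    """Count how many times each vocabulary word appears in the text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def normalize(vector):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    length = math.sqrt(sum(x * x for x in vector))
    return [x / length for x in vector] if length else list(vector)


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors, computed after normalizing."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def centroid(vectors):
    """Average the vectors entry by entry to form a centroid vector."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


# Hypothetical vocabulary and queries.
vocabulary = ["search", "engine", "merge", "recipe"]
q1 = term_vector("search engine merge", vocabulary)   # [1, 1, 1, 0]
q2 = term_vector("merge search search", vocabulary)   # [2, 0, 1, 0]
q3 = term_vector("recipe", vocabulary)                # [0, 0, 0, 1]

cluster_centroid = centroid([q1, q2])   # entry-wise average of q1 and q2
similar = cosine_similarity(q1, q2)     # shared words, so greater than zero
unrelated = cosine_similarity(q1, q3)   # no shared words, so zero
```

In this sketch, queries that share words point in similar directions and have a cosine near one, while queries with no words in common have a cosine of zero.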
`
50. The clusters of Voorhees are used to process a user query. When a user query is received by a search engine, the search engine compares the query with each cluster’s centroid. (Ex. 1005, 5:65-67). A cluster centroid is, again, basically an average of all queries that make up the cluster. (Ex. 1005, 5:45-47).