`
`
`In re Patent of: Ingemar J. Cox
`U.S. Patent No.: 8,205,237
`Issue Date: Jun. 19, 2012
`Appl. No.: 11/977,202
`Filing Date: Oct. 23, 2007
`Title: IDENTIFYING WORKS, USING A SUB-LINEAR TIME SEARCH,
`SUCH AS AN APPROXIMATE NEAREST NEIGHBOR SEARCH, FOR
`INITIATING A WORK-BASED ACTION, SUCH AS AN ACTION ON THE
`INTERNET
`
`Submitted via Electronic Filing
`Mail Stop PATENT BOARD
`Patent Trial and Appeal Board
`Commissioner for Patents
`P.O. Box 1450
`Alexandria, VA 22313-1450
`
`
`PETITIONER'S EXHIBIT 1004
`
`
`
`DECLARATION OF DR. PIERRE MOULIN IN SUPPORT OF
`PETITION FOR INTER PARTES REVIEW OF U.S. PATENT NO. 8,205,237
`UNDER 35 U.S.C. §§ 311-319
`
`
`
`
`
`Google Ex. 1004
`
`
`
`
`
`I.
`
`I, Dr. Pierre Moulin, D. Sc., declare as follows:
`
`INTRODUCTION
`1. My name is Pierre Moulin, D. Sc. I am a Professor of Electrical and
`
`Computer Engineering ("ECE") and Statistics at the University of Illinois. My
`
`business address is University of Illinois, Beckman Institute, 405 N. Matthews
`
`Ave., Urbana, IL 61801. I have been retained by counsel for Google Inc.
`
`("Petitioner"), as a technical expert witness with respect to the inter partes review
`
`petition filed by Petitioner concerning U.S. Patent No. 8,205,237 (the "'237
`
`patent").
`
`2.
`
`For purposes of this Declaration, I have been asked to provide an
`
`expert technical analysis and opinion, including whether a single prior art reference
`
`or combination of references discloses the elements of the claims of the '237
`
`patent. In addition, I have been asked to provide opinions on the knowledge of skill
`
`in the art, the scope and content of the prior art, the motivations to combine certain
`
`prior art references, and other matters as set forth in this Declaration.
`
`3. My investigation into the matters addressed herein is ongoing and I
`
`reserve the right to supplement this Declaration as my investigation continues.
`
`II. EDUCATIONAL AND PROFESSIONAL BACKGROUND
`4. My qualifications, including my publications, are generally
`
`summarized in my Curriculum Vitae, attached hereto as Exhibit 1005.
`
`
`
`Google Ex. 1004
`
`
`
`
`
`III. MATERIALS REVIEWED
`5.
`I have considered a number of references in connection with the
`
`preparation of this Declaration, including the '237 patent and file history. In
`
`addition, I have reviewed all references cited in this Declaration and the references
`
`identified in the Exhibit List included in Petitioner's petition. Further, I have
`
`conducted an independent review of available prior art. Finally, I have relied on
`
`my own expertise when formulating the opinions contained in this Declaration.
`
`IV. PERSON OF ORDINARY SKILL IN THE ART
`6.
`I understand that the present owner of the '237 patent, Network-1
`
`Technologies, Inc., has represented that the inventor conceived of the purported
`
`inventions embodied in the '237 patent no earlier than July 1, 2000 (the "critical
`
`date"). Ex. 1019 at 30.
`
`7.
`
`In my opinion, the relevant field of art for the '237 patent is that of
`
`automatic content recognition algorithms. It is my opinion that a person of ordinary
`
`skill in the art as of the critical date of July 1, 2000 would have been highly skilled,
`
`and typically would have possessed at least an M.S. in computer science, electrical
`
`engineering, or mathematics; knowledge of video and audio processing techniques;
`
`and 1-2 years of experience in audio, video, or image processing. I personally
`
`possessed this level of experience in 2000, and I worked closely with numerous
`
`other individuals who also possessed this level of experience.
`
`
`
`2
`
`Google Ex. 1004
`
`
`
`
`
`V. BACKGROUND IN THE FIELD OF AUTOMATED CONTENT
`RECOGNITION
`8.
`
`I provide the following background that relates to any opinions on the
`
`claims of the '237 patent.
`
`9.
`
`In the 1990s, the emergence of affordable computing power sufficient
`
`to process electronic media spurred researchers and entrepreneurs to automate
`
`various tasks within the field of content publication and consumption. Ex. 1001 at
`
`1:41-44. Three exemplary problems were (1) television viewers often disliked
`
`watching commercials; (2) television advertisers had no means to verify that
`
`commercials they paid to air were in fact aired, and aired in the correct time slot;
`
`and (3) radio listeners often failed to purchase music albums because they did not
`
`know the titles of songs that they enjoyed. Before 2000, numerous individuals
`
`concurrently developed the same two solutions to all three problems: (1)
`
`embedding hidden "watermarks" in electronic works, which could later be
`
`interpreted to identify the work; and (2) using computer-automated systems to
`
`recognize audio, video, and/or image content by analyzing the intrinsic features of
`
`a video work. The second technology is at issue in the present petition.
`
`10. Content recognition schemes universally relied on two widely known
`
`technologies: feature extraction and neighbor searching in a database.
`
`11. Feature extraction refers to quantifying a media work in a form that—
`
`unlike a raw video feed—is easily parsed by a computer. Typically, extracted
`
`
`
`3
`
`Google Ex. 1004
`
`
`
`
`
`features were compact, meaning they occupied less memory on a computer than
`
`the corresponding video file did. Furthermore, extracted features were typically
`
`structured in a format that facilitated efficient search. In the context of a content
`
`identification system, each set of extracted features corresponding to a given media
`
`work was stored as an entry in a database. Within such a database, entries were
`
`typically organized to facilitate efficient search. Such organization was known as
`
`"preprocessing."
`
`12. Neighbor searching refers to algorithms for comparing a first set of
`
`extracted features with one or more additional sets of extracted features to locate a
`
`close, but not necessarily exact, match. Because neighbor searching is
`
`computationally intensive for large feature sets, content recognition schemes
`
`typically employed search algorithms that increased efficiency by intelligently
`
`searching only a subset of potential matches (i.e., "non-exhaustive" algorithms).
`
`VI. OVERVIEW OF THE '237 PATENT
`13.
`It is my understanding that Petitioner is requesting inter partes review
`
`on the '237 patent, entitled "Identifying Works, Using A Sub-Linear Time Search,
`
`Such As An Approximate Nearest Neighbor Search, For Initiating A Work-Based
`
`Action, Such As An Action On The Internet" The '237 patent lists Ingemar J. Cox
`
`as inventor. It issued on Jun. 19, 2012, with 40 claims, of which claims 1, 5, 9, 13,
`
`25, and 33 are independent. The patent application that led to the '237 patent
`
`
`
`4
`
`Google Ex. 1004
`
`
`
`
`
`(Application No. 11/977,202) was filed on October 23, 2007, and was a
`
`continuation-in-part of Application No. 09/950,972, filed September 13, 2001,
`
`which issued as U.S. Patent No. 7,058,223. That application, in turn, claims
`
`priority to Provisional Application No. 60/232,618, filed September 14, 2000.
`
`14. The claims of the '237 patent relate to identifying an unknown media
`
`work and, based on its determined identity, performing an action.
`
`15. The independent claims of the '237 patent recite the following three
`
`elements: (1) receiving or obtaining "features . . . extracted from a media work";
`
`(2) "determining . . . an identification of the media work using the . . . features . . .
`
`to perform" a "search of . . . media works"; (3) either "transmitting . . . information
`
`about the identified media work to the client device" or "determining . . . an action
`
`based on the determined identification of the media work." Ex. 1001 at Claims 1,
`
`5, 9, 13, 25, 33.
`
`16. The specification of the '237 patent concedes that these three elements
`
`were well known prior to the filing of the '237 patent.
`
`17. First, the specification of the '237 patent explains that "[t]he
`
`recognition literature contains many different representations" of extraction
`
`algorithms, including "a pseudo-random sample of pixels from the frame or a low-
`
`resolution copy of the frame or the average intensities of nxn blocks of pixels," "a
`
`frequency-based decomposition of the signal, such as produced by the Fourier,
`
`
`
`5
`
`Google Ex. 1004
`
`
`
`
`
`wavelet and or discrete cosine transforms," "principal component analysis," "a
`
`temporal sequence of feature vectors," or "Fourier frequency decompositions." Ex.
`
`1001 at 6:57-7:11. The '237 specification does not purport to present any novel
`
`means of receiving or obtaining "features . . . extracted from a media work."
`
`18. Second, the '237 specification explains that the purported invention
`
`employs "common statistical measures" and "[o]ther forms of matching" already
`
`known in the art, such as "correlation coefficient," "Euclidean distance," "Lp-
`
`norms," "kd-trees, vantage point trees and excluded middle vantage point forests"
`
`(Ex. 1001 at 8:63-9:38) for "electronically determining an identification of the
`
`electronic work based on the extracted features" by performing "a non-exhaustive
`
`search identifying a neighbor." . The '237 specification explicitly credits these
`
`algorithms to various third party sources. Id. The '237 specification does not
`
`purport to present any novel means of "determining . . . an identification of the
`
`media work using the . . . features . . . to perform" a "search of . . . media works."
`
`See also, Ex. 1018 at 1-3, 10-11 (explaining, in 1999, that "there are many
`
`applications of nearest neighbor search where an approximate answer is good
`
`enough" and that "approximate similarity search can be performed much faster
`
`than the exact one").
`
`19. Third and fourth, the '237 specification explains that there already
`
`existed "[k]nown techniques of linking works delivered via traditional media
`
`
`
`6
`
`Google Ex. 1004
`
`
`
`
`
`channels to a more interactive system" for Ex. 1001 at 2:18-40 "electronically
`
`determining an action based on the identification of the electronic work." For
`
`instance, conceded prior art systems used an identification of a work to "access[] a
`
`database . . . and match[] the ID with the associated web address" and subsequently
`
`navigate to the web address. Id at 2:27-40. The '237 specification does not purport
`
`to present any novel means of "transmitting . . . information about the identified
`
`media work to the client device" or "determining . . . an action based on the
`
`determined identification of the media work."
`
`20. Claims 1, 5, 9, and 13 of the '237 patent additionally require that the
`
`extraction of features must occur "by a client device." Ex. 1001 at Claim 1, 5, 9,
`
`13. It is my opinion that this element was also well known in the art prior to the
`
`filing of the '237 patent. E.g., Ex. 1010 at 3:8-22; 2:42-49; 4:31-5:59; Claims 4-7,
`
`12-15; Ex. 1015 at 5:50-55, 2:34-46; 7:34-8:13; Ex. 1013 at 2:46-47; 9:42-61; Ex.
`
`1011 at 4:51-53; 9:55-67; 18:6-65; Abstract; Claims 1-3.
`
`21. As expressed herein, it is further my opinion that persons skilled in
`
`the art had already combined the foregoing known elements as of September 14,
`
`2000. Even if they had not, it is my opinion that a person skilled in the art would
`
`have recognized a reason to combine these elements (to create a content
`
`recognition system), and the combination would have yielded predictable results.
`
`
`
`7
`
`Google Ex. 1004
`
`
`
`
`
`22. The dependent claims modify these elements by further specifying the
`
`characteristics of (a) the electronic work (Ex. 1001 at Claims 2, 6, 10, 14, 17, 18,
`
`19, 20); (b) the means of transmitting or receiving information about the identified
`
`media work (id. at 21, 22, 23, 24); (c) the process of identifying the electronic
`
`work (id. at Claims 2-6, 16-20); (d) the determined identification of the electronic
`
`work (id. at Claims 3, 7, 11, 15); (e) the action (id. at Claim 13); and/or (f) the
`
`performed action (id. at Claims 4, 8, 12, 16, 26-32, 34-40).
`
`23. Although the specification highlights the use of the claimed features
`
`to identify a television commercial (e.g., Ex. 1001 at 9:53-10:20), the claims are
`
`not limited to the identification of television commercials and the specification
`
`explicitly envisions the identification of "content" generally (e.g., id. at 1:31-35
`
`("e.g., content or an advertisement")). Further, while the specification highlights
`
`the use of certain exemplary algorithms for extracting features from an electronic
`
`work (e.g., id. at 19:63-20:30, 6:51-7:37) and determining an identification of an
`
`electronic work by performing a non-exhaustive search identifying a neighbor
`
`(e.g., id. at 8:64-9:7, 10:63-11:6, 21:7-23; 21:52-22:21), a person skilled in the art
`
`would have understood that extraction can be performed using a wide variety of
`
`algorithms.
`
`24. The claimed steps are illustrated in several figures of the '237 patent,
`
`including Figure 1, which demonstrates (1) "feature (vector) extraction
`
`
`
`8
`
`Google Ex. 1004
`
`
`
`
`
`operation(s)" (140), which "extract features from [an electronic] work" (Ex. 1001
`
`at 6:31-35); (2) "feature (vector) lookup operation(s)" (150), which identify a work
`
`by "search[ing] for a matching feature vector" (id. at 6:36-40); (3) "work-
`
`associated information lookup operation(s)" (160), which "retrieve associated
`
`information, such as an action" (id. at 6:41-44); and (4) "action initiation
`
`operation(s)" (170), which "perform some action based on the associated
`
`information. Figures 2-4 clarify that, in the absence of claim language to the
`
`contrary, the foregoing claimed steps can be performed on any combination of a
`
`local device and a remote server. E.g., Fig. 2 (depicting all steps (140a, 150a, 160a,
`
`and 170a) performed on "user computer, set-top-box or equivalent" (210)); Fig. 3
`
`(depicting extraction, identification, and action (140b, 150b, and 170b) performed
`
`on "user computer, set-top-box or equivalent" (310) and information lookup (160b)
`
`performed on "monitoring center" (340)); Fig. 4 (depicting extraction and action
`
`(140c and 170c) performed on "user computer, set-top-box or equivalent" (410),
`
`and identification and information lookup (150a and 160c) performed on
`
`"monitoring center" (440)).
`
`VII. UNDERSTANDING OF THE LEGAL PRINCIPLES RELEVANT TO
`MY DECLARATION
`25.
`
`In performing my analysis, I was given a basic understanding of
`
`certain applicable patent law concepts, which I have applied herein.
`
`
`
`9
`
`Google Ex. 1004
`
`
`
`
`
`26.
`
`I understand that the subject matter sought to be patented is
`
`anticipated if each and every limitation of a claim is found, expressly or inherently,
`
`in a single prior art reference. To the extent that I refer to a particular reference as
`
`"anticipating" a particular claim, I am using this phrasing as a shorthand to refer to
`
`the facts that underlie a determination of anticipation, including the element-by-
`
`element comparison of the claims to the prior art.
`
`27.
`
`I understand that a patent claim may also be invalid if it would have
`
`been obvious to a person of ordinary skill in the art, at the time the invention was
`
`made. In evaluating whether a prior art reference or a combination of prior art
`
`references renders a patent claim obvious, it is necessary to determine the scope
`
`and content of the prior art references, ascertain the differences between the prior
`
`art references and the claim in issue, resolve the level of ordinary skill in the
`
`pertinent art, and evaluate evidence of secondary considerations. I further
`
`understand that a claim is not proved obvious merely by demonstrating that each of
`
`the claimed elements was independently known in the prior art. Rather, there must
`
`be some reason why a person of ordinary skill in the art would have been prompted
`
`to combine the elements in the manner claimed. A reason to combine known
`
`elements could come from a variety of sources, including but not limited to the
`
`prior art itself, the background knowledge of one of ordinary skill in the art, the
`
`nature of the problem to be solved, market demand, or common sense. To the
`
`
`
`10
`
`Google Ex. 1004
`
`
`
`
`
`extent that I refer to a particular reference or combination of references as
`
`rendering a particular term "obvious," I am using this phrasing as a shorthand to
`
`refer to the facts that underlie a legal determination of obviousness.
`
`28.
`
`I understand that secondary considerations are objective factors which
`
`may be considered when determining whether a claim was obvious at the time of
`
`invention. These include indicia of commercial success of the invention, lack of
`
`commercial success of the prior art, long-felt but unsolved need to solve the
`
`problem addressed by the invention, unexpected results, failure of others to
`
`develop a similar invention, and copying of the invention by others. In addition, I
`
`understand that if the claimed invention was invented independently by other
`
`persons, either before it was invented by the patentee or at about the same time,
`
`then this fact indicates that a claimed invention is obvious.
`
`VIII. SUMMARY OF OPINIONS
`29.
`It is my opinion that U.S. Patent No. 6,188,010 ("Iwamura", Ex. 1012)
`
`discloses each and every element of claims 1, 3-5, 7-9, 11-13, 15-16, 21-25, 29-30,
`
`33, 37, and 38 of the '237 patent. Therefore, these claims anticipated by Iwamura.
`
`30.
`
`It is my opinion that the combination of U.S. Patent No. 6,505,160
`
`("Levy", Ex. 1013) and Sunil Arya, et al., "An Optimal Algorithm for Approximate
`
`Nearest Neighbor Searching in Fixed Dimensions" ("Arya", Ex. 1006) discloses
`
`each and every element of claims 1-25, 29-30, 32-33, 37-38, and 40 of the '237
`
`
`
`11
`
`Google Ex. 1004
`
`
`
`
`
`patent. Therefore, these claims are obvious to a person of ordinary skill in the art
`
`over Levy in view of Arya.
`
`31.
`
`It is my opinion that the combination of U.S. Patent No. 6,597,405
`
`("Iggulden", Ex. 1011) and Christian Böhm, et al., "Efficient Similarity Search in
`
`Digital Libraries" ("Böhm") ("Böhm", Ex. 1007) discloses each and every element
`
`of claims 25, 32-33, and 40 of the '237 patent. Therefore, these claims are obvious
`
`to a person of ordinary skill in the art over Iggulden in view of Böhm.
`
`32.
`
`It is my opinion that U.S. Patent No. 5,874,686 ("Ghias", Ex. 1010)
`
`discloses each and every element of claims 1-3, 5-7, 9-11, 13-15 and 21-24 of the
`
`'237 patent. Therefore, these claims are anticipated by Ghias.
`
`33.
`
`It is my opinion that U.S. Patent No. 7,743,092 ("Wood", Ex. 1015)
`
`discloses each and every element of claims 9-16 and 23-24 of the '237 patent.
`
`Therefore, these claims are anticipated by Wood.
`
`34.
`
`It is my opinion that the combination of Iwamura and Chen discloses
`
`each and every element of claims 26-27 and 34-35. Therefore, these claims are
`
`obvious to a person of ordinary skill in the art over Iwamura in view of Chen.
`
`35.
`
`It is my opinion that the combination of Levy, Arya, and Chen
`
`discloses each and every element of claims 26-27 and 34-35. Therefore, these
`
`claims are obvious to a person of ordinary skill in the art over Levy in view of
`
`Arya and Chen.
`
`
`
`12
`
`Google Ex. 1004
`
`
`
`
`
`IX. CONSTRUCTION OF CLAIM TERMS
`36.
`I understand that, for purposes of this proceeding, the claim terms of
`
`the '237 patent should be given their broadest reasonable construction in light of
`
`the specification. I further understand that this standard of claim construction can
`
`be broader than that generally employed by a federal district court, which assigns
`
`claim terms their ordinary and customary meaning to a person of ordinary skill in
`
`the art at the time of filing, considering: (1) in the context of the entire patent,
`
`intrinsic evidence, such as the claims, specification, and file history; and (2) in the
`
`context of the intrinsic evidence, and to a lesser degree, extrinsic evidence, such as
`
`inventor testimony, dictionaries, and learned treatises.
`
`37. For the purposes of this Declaration, I will base my construction of
`
`the claim terms on the Petitioner's position in this proceeding. I understand and
`
`agree with Petitioner that for the majority of terms, the broadest reasonable
`
`construction is the plain and ordinary meaning to a person of ordinary skill in the
`
`art.
`
`38.
`
`It is my opinion that, regardless of the claim constructions ultimately
`
`adopted by the Board, the prior art references discussed herein anticipate and/or
`
`render obvious the enumerated claims of the asserted patents under any possible
`
`constructions.
`
`
`
`13
`
`Google Ex. 1004
`
`
`
`
`
`39.
`
`I understand and agree with Petitioner's position that the term
`
`"extracting features" means "generating a quantitative representation of the
`
`features of a media work, or a subset of the features of a media work."
`
`40. The patent specification supports this construction of "extracting
`
`features." In particular, the specification explains that "[t]he recognition literature
`
`contains many different" extraction algorithms, and enumerates a broad range of
`
`algorithms capable of "feature extraction," including "pseudo-random sample[s] of
`
`pixels," "the average intensities of n×n blocks of pixels," "frequency-based
`
`decomposition of the signal, such as produced by the Fourier, wavelet and or
`
`discrete cosine transforms" "principal component analysis," "a temporal sequence
`
`of feature vectors," or "a combination of these." E.g., Ex. 1001 at 6:51-7:11. The
`
`specification further notes that "other representations are possible." Id. at 7:6. The
`
`only unifying characteristic of these methods is that they generate a quantitative
`
`representation of the features of a media work.
`
`41. Moreover, in 2000, persons skilled in the art consistently employed
`
`this definition of "extracting features." In particular, persons skilled in the art
`
`understood that it was advantageous for content identification systems to generate
`
`quantitative representations of the features of a media work, because such
`
`representations were typically compact and logically organized, and therefore were
`
`more easily processed and searched by computers than raw media files. See Ex.
`
`
`
`14
`
`Google Ex. 1004
`
`
`
`
`
`1001 at 6:57-59. Such representations were often, but not always, formatted to
`
`maximize the efficiency of subsequent search.
`
`42.
`
`I understand and agree with Petitioner's position that the term
`
`"features extracted from the media work" means "a quantitative representation of
`
`the features of a media work, or a subset of the features of a media work,"
`
`consistent with the above discussion of "extracting features."
`
`43.
`
`I understand and agree with Petitioner's position that the broadest
`
`reasonable construction of the term "non-exhaustive search" is "a search that
`
`locates a match without conducting a brute force comparison of all possible
`
`matches, and all data within all possible matches." The specification never uses the
`
`terms "non-exhaustive" or "exhaustive." Ex. 1001. Moreover, in 2000, persons
`
`skilled in the art did not have a precise technical definition for "non-exhaustive
`
`search," and only used the term colloquially. As such, "non-exhaustive search" was
`
`understood to have multiple possible definitions, the broadest of which was
`
`consistent with the above definition. In particular, the broadest definition of "non-
`
`exhaustive search" employed by persons skilled in the art encompassed anything
`
`other than a "brute force" search. (As a corollary, the narrowest definition of
`
`"exhaustive search" was "brute force search.") A "brute force" search, in turn, is a
`
`search wherein a query is compared to every single portion of every single item in
`
`a database. Persons skilled in the art recognized that, while a properly constructed
`
`
`
`15
`
`Google Ex. 1004
`
`
`
`
`
`brute force search was guaranteed to find the best match in a database, brute force
`
`searches were inherently inefficient and required on average N/2 comparison
`
`operations (where N is the number of entries in the search database). See id. at
`
`8:54-63. Therefore, to maximize search efficiency, persons skilled in the art
`
`routinely employed more efficient searches that did not conduct a comparison of
`
`every single item in a database, sometimes referred to as non-exhaustive searches.
`
`44. Because a brute force search conducts a comparison of every item in a
`
`search database, it is guaranteed to find the best match. Thus, any search that is not
`
`guaranteed to find the best match is necessarily not brute force, and therefore is
`
`necessarily non-exhaustive.
`
`45. By contrast, certain non-exhaustive search algorithms are guaranteed
`
`to find the best match. Thus, the fact that a search is guaranteed to find the best
`
`match does not necessarily indicate that it is a brute force search, and therefore
`
`does not preclude it from being considered non-exhaustive.
`
`46. For the avoidance of doubt, it is my opinion that a search that does
`
`compare a query item to all items in a database, but does not consider all data
`
`within each database item, is non-exhaustive. For instance, suppose a content
`
`identification system in which each item of extracted data contains a first part and
`
`a second part. If that system compares an unknown work to every known work in
`
`the database, but does so only on the basis of the first part of each item of extracted
`
`
`
`16
`
`Google Ex. 1004
`
`
`
`
`
`data (ignoring the second part of each item of extracted data), the search is non-
`
`exhaustive.
`
`47.
`
`It is further my opinion that, when evaluating whether a search is non-
`
`exhaustive, only the search algorithm should be considered, and not the
`
`preprocessing algorithm, if any. When database entries are ingested into a content
`
`recognition database, the system will typically preprocess the data to organize it
`
`into a form that facilitates efficient search using a dedicated preprocessing
`
`algorithm. Subsequently and separately, when a search operation is initiated, the
`
`system will locate the best match in the database using a dedicated search
`
`algorithm. The search algorithm and the preprocessing algorithm are distinct, and
`
`operate at separate points in time. Thus, the fact that a preprocessing algorithm
`
`may analyze all data entries in a database at the time of ingestion does not render a
`
`search algorithm brute force.
`
`48.
`
`I understand and agree with Petitioner's position that the terms
`
`"identify a neighbor," "identify a near neighbor," and "nearest neighbor search."
`
`means "identify a close, but not necessarily exact, match." In 2000, persons skilled
`
`in the art consistently employed this definition. In particular, persons skilled in the
`
`art typically designed content identification systems that compared extracted
`
`features from a query work to extracted features from a known work. However,
`
`due to discrepancies in the source material, such as noise, distortion, and source,
`
`
`
`17
`
`Google Ex. 1004
`
`
`
`
`
`two different copies of the same work might produce somewhat different sets of
`
`extracted features. Ex. 1001 at 8:31-53. Thus, rather than design systems that
`
`searched only for exact matches, persons skilled in the art typically designed
`
`systems that identified close, but not necessarily exact, matches. Id. Persons skilled
`
`in the art understood that the term "identifying a neighbor" referred to such
`
`searches.
`
`49.
`
`It is my opinion that different algorithms for identifying a neighbor
`
`can employ different measures of distance. For instance, "Euclidean" distance
`
`refers to the literal distance between data points (i.e., "as the crow flies");
`
`"Manhattan" distance refers to the sum of the absolute differences of the
`
`coordinates of data points (i.e., as car drives in a city); and Hamming distance
`
`refers, in the context of binary data points, to a count of the number of bits that are
`
`not equal.
`
`50. Consistent with the above, I understand and agree with Petitioner's
`
`position that a "neighbor" means "a close, but not necessarily exact, match."
`
`51.
`
`I understand and agree with Petitioner's position that the term
`
`"portable client device" means "any portable electronic device" including, for
`
`example, a laptop. The specification never uses the terms "portable" or "client."
`
`However, in 2000, persons skilled in the art consistently employed the above
`
`definition. Further, persons skilled in the art understood that computers could
`
`
`
`18
`
`Google Ex. 1004
`
`
`
`
`
`embody essentially any form—including portable forms, such as laptops—while
`
`still being considered computers. Thus, it is further my opinion that the term
`
`"computer" broadly refers to any electronic device. As such, it is my opinion that
`
`any disclosure of a "computer" amounts to disclosure of a "portable client device."
`
`52.
`
`I understand and agree with Petitioner's position that the term "fixed
`
`radius search" means "a neighbor search that identifies matches within a threshold
`
`radius of the query item." The specification confirms that one type of neighbor
`
`search is a "fixed radius search" in which "if the database contains a vector that is
`
`within X of the query, then there is a match." Ex. 1001 at 21:52-22:7. As with a
`
`neighbor search, different algorithms for performing a fixed radius search can
`
`employ different measures of distance, including Euclidean distance, Manhattan
`
`distance, and Hamming distance. Moreover, in 2000, persons skilled in the art
`
`consistently employed this definition.
`
`53.
`
`I understand and agree with Petitioner's position that the term
`
`"sublinear search" means "a search whose execution time has a sublinear
`
`relationship to database size." For instance, a linear search of a 200-item database
`
`would take twice as long as a linear search of a 100-item database. By contrast, a
`
`sublinear search of a 200-item database would take less than twice as long as a
`
`sublinear search of a 100-item database, perhaps, for instance, 1.5 times as long.
`
`Thus, if search execution time were plotted as a function of database size, a linear
`
`
`
`19
`
`Google Ex. 1004
`
`
`
`
`
`search would yield a diagonal line of constant slope, while a sublinear search
`
`would yield a line with a slope approaching zero.
`
`54. To illustrate, the following chart (which plots search time as a
`
`function of the number of entries in the search database), depicts an exemplary
`
`linear search and an exemplary sublinear search. The linear search exhibits a
`
`constant slope, while the sublinear search exhibits a slope approaching zero. It is
`
`my opinion that a search exhibiting such a slope approaching zero would be
`
`considered sublinear.
`
`Linear Search vs Sublinear Search
`10
`
`0
`
`9
`8
`7
`6
`5
`4
`3
`2
`1
`Number of Entries in Search Database
`
`10
`
`Linear
`Sublinear
`
`
`
`0123456789
`
`Search Time
`
`55.
`
`In 2000, persons skilled in the art consistently employed the foregoing
`
`definition of sublinear. In particular, as the sizes of automatic content identification
`
`databases grew, persons skilled in the art consistently employed sublinear searches
`
`to avoid having to proportionally increase the computing power available to the
`
`identification system.
`
`
`
`20
`
`Google Ex. 1004
`
`
`
`
`
`56. Consistent with the above, it is my opinion that a search whose
`
`execution time is proportional to the logarithm of the size of the data set (e.g., a
`
`search with execution time proportional to A log (BN), where A and B are
`
`constants, and N is the number of entries in the database) is sublinear. The
`
`specification of the '237 patent confirms this. Ex. 1001 at 8:54-63 (contrasting
`
`binary search with linear search and noting that, "[i]f binary search was possible,
`
`then a database containing N vectors would require at most log(N) comparisons.").
`
`57. Consistent with the above, it is my opinion that the typical
`
`performance of search algorithms associated with "kd-trees and vantage point
`
`trees" is sublinear. The specification of the '237 patent confirms this. Ex. 1001 at
`
`21:56-60 ("A number of possible data structures are applicable including kd-trees
`
`and vantage point trees. These data structures and associated search algorithms
`
`organize a N-point dataset (N=90,000,000 in out previous example) so that sub-
`
`linear time searches can be performed on average.").
`
`58. Consistent with the above, it is my opinion that the typical
`
`performance of "excluded middle vantage point forest" searches is sublinear. The
`
`specification of the '237 patent confirms this. Ex. 1001 at 21:61-22:7 ("Recently,
`
`Yianilos proposed an excluded middle vantage point forest for nearest neighbor
`
`search. See, e.g., the Yianilos reference. This data structur