`Exhibit 3
`Case 4:23-cv-01147-ALM Document 53-3 Filed 10/29/24 Page 2 of 49 PageID #: 2283
`
`UNITED STATES DISTRICT COURT
`FOR THE EASTERN DISTRICT OF TEXAS
`SHERMAN DIVISION
`
R2 SOLUTIONS LLC,

          Plaintiff,

v.

DATABRICKS, INC.,

          Defendant.

Civil Action No. 4:23-cv-01147-ALM
`
`DECLARATION OF DR. JON B. WEISSMAN
`
`I, Jon B. Weissman, Ph.D., hereby declare as follows:
`
1. My name is Jon Weissman. I am at least eighteen years of age. I have personal knowledge of and am competent to testify as to the facts and opinions herein.
`
2. I have been asked by Defendant Databricks, Inc. (“Databricks”) to provide my expert opinions relating to certain terms and phrases used in the claims of U.S. Patent No. 8,190,610 (the “’610 patent”).
`
3. This declaration sets forth my opinions on the disputed claim terms of the ’610 patent.
`
I. BACKGROUND AND QUALIFICATIONS
`
4. In the paragraphs below, I summarize my qualifications. My qualifications are stated more fully in my curriculum vitae, which is attached to this declaration as Exhibit A.
`
5. I am a Full Professor in the Department of Computer Science & Engineering at the University of Minnesota, where I have been a professor since 1999. In addition to my professorship, I lead the University of Minnesota’s Distributed Computing Systems Group. The Distributed Computing Systems Group is focused on research into distributed and mobile systems, cloud computing, and high-performance computing.
`
6. In 1995, I received my Doctor of Philosophy degree (Ph.D.) in Computer Science from the University of Virginia. For my Ph.D. thesis, I developed the first automated scheduling system for parallel and distributed applications across heterogeneous local and wide-area networks. In 1989, I received my Master of Science degree (M.Sc.) in Computer Science from the University of Virginia. In 1984, I received my Bachelor of Science degree (B.Sc.) in Applied Mathematics and Computer Science from Carnegie Mellon University.
`
7. Before obtaining my Ph.D. degree, I worked as a software engineer for five years in the early 1990s in the area of distributed systems. During this time, I was responsible for designing, implementing, and maintaining distributed computing systems where multiple nodes or computers worked together to achieve a common goal. In this capacity, I helped design and implement a distributed AI framework for knowledge-based business applications. In another project, I designed and implemented a parallel and distributed simulation framework for military and air traffic control applications.
`
8. In 1995, after earning my Ph.D., I returned to academia and began my career as a professor. My research has been funded by NASA, the National Science Foundation, the Department of Energy, and the Air Force, and has included the following projects related to distributed computing and distributed data processing:

• Department of Energy (“DOE”), “Making Parallel Computing Easy”;

• National Science Foundation, “Collaborative Data Analysis and Visualization”;

• Department of Energy (“DOE”), “An Integrated Middleware Framework to Enable Extreme Collaborative Science”;

• Army High Performance Computing and Research Center (“AHPCRC”), “Metacomputing: Enabling Technologies and the Virtual Data Grid”;

• National Science Foundation (“NSF”), “Resource Management for Parallel and Distributed Systems”;

• National Science Foundation (“NSF”), “A Framework for Adaptive Grid Services”; and

• Air Force Office of Scientific Research (“AFOSR”), “Telecommunication Networks for Mobile and Distributed Computing and Communications.”
`
9. I have published over 100 peer-reviewed technical articles, including some awarded or nominated for Best Paper at highly competitive international conferences. Many of my published papers relate to distributed computing and distributed data processing, including, for example, the following:

• “Nebula: Distributed Edge Cloud for Data Intensive Computing,” Albert Jonathan, Mathew Ryden, Kwangsung Oh, Abhishek Chandra, and Jon Weissman, IEEE Transactions on Parallel and Distributed Systems;

• “TripS: Automated Multi-tiered Data Placement in a Geo-distributed Cloud Environment,” Kwangsung Oh, Abhishek Chandra, and Jon Weissman, 10th ACM International Systems and Storage Conference;

• “Passive Network Performance Estimation for Large-scale, Data-Intensive Computing,” Jinoh Kim, Abhishek Chandra, and Jon B. Weissman, IEEE Transactions on Parallel and Distributed Systems;

• “DDDAS/ITR: A Data Mining and Exploration Middleware for Grid and Distributed Computing,” Jon B. Weissman, Vipin Kumar, Varun Chandola, Eric Eilertson, Levent Ertoz, Gyorgy Simon, Seonho Kim, and Jinoh Kim, Workshop on Dynamic Data Driven Application Systems – DDDAS;

• “Scheduling Parallel Applications in Distributed Networks,” Jon B. Weissman and Xin Zhao, Journal of Cluster Computing;

• “Adaptive Resource Scheduling for Network Services,” Byoung-Dai Lee and Jon B. Weissman, IEEE 3rd International Workshop on Grid Computing;

• “Adaptive Reputation-Based Scheduling on Unreliable Distributed Infrastructures,” Jason D. Sonnek, Abhishek Chandra, and Jon B. Weissman, IEEE Transactions on Parallel and Distributed Systems;

• “Integrated Scheduling: The Best of Both Worlds,” Jon B. Weissman, Darin England, and Lakshman Abburi Rao, Journal of Parallel and Distributed Computing;

• “Predicting the Cost and Benefit of Adapting Parallel Applications in Clusters,” Jon B. Weissman, Journal of Parallel and Distributed Computing; and

• “Optimizing Remote File Access for Parallel and Distributed Network Applications,” Jon B. Weissman, Mike Gingras, and Mahesh Marina, Journal of Parallel and Distributed Computing.
`
10. I am also the coauthor of the “Distributed and Multiprocessing Scheduling” chapter of “The Computer Science and Engineering Handbook” (2nd edition 2004). The chapter discusses CPU scheduling in parallel and distributed systems. In the chapter, I examine techniques for spreading tasks across the several processors of a distributed system. Examples of scheduling criteria in a distributed computing system discussed in the chapter include minimizing cost, minimizing communication delay, giving priority to certain users’ processes, and accommodating the need for specialized hardware devices.
`
`
11. I have also coauthored publications related to MapReduce, including “Cross-Phase Optimization in MapReduce,” in “Cloud Computing for Data-Intensive Applications,” Springer, 2014. This book chapter explores the application of MapReduce across widely distributed data and distributed computation resources. The chapter “propose[s] new cross-phase optimization techniques that enable independent MapReduce phases to influence one another” to address the “problem” that the “interaction of MapReduce phases becomes pronounced in the presence of heterogeneous network behavior.” (Cross-Phase Optimization at Abstract.)
`
12. I also coauthored the MapReduce paper, “Exploring MapReduce Efficiency with Highly-Distributed Data,” published in June 2011 in the Proceedings of the Second International Workshop on MapReduce and Its Applications. The paper has been downloaded over 1,200 times from the ACM Digital Library and cited in over 100 other research papers, according to the ACM Digital Library and Google Scholar. The paper explains that conventional single-cluster MapReduce architectures were not suitable when data and computing resources are widely distributed. The paper examines three different architectural approaches to performing MapReduce jobs on two platforms—PlanetLab and Amazon EC2—and explains that a local architecture performs better in zero-aggregation conditions, while distributed architectures are preferred in high-aggregation and equal-partition conditions.
`
13. During my academic appointments at the University of Minnesota, the University of Edinburgh National e-Science Center, and the University of Texas at San Antonio, I taught and continue to routinely teach classes in operating systems and parallel and distributed systems. These classes focus on parallel computing platforms, parallel applications, and parallel program scheduling. These parallel computing platforms are used to implement various distributed processing techniques, including, for example, MapReduce.
`
`
14. I have also served on the boards of several major journals, including IEEE Transactions on Parallel and Distributed Systems and IEEE Transactions on Computers. I am the steering committee chair of the ACM International Symposium on High-Performance Parallel and Distributed Computing, the premier annual conference for presenting the latest research on the design, implementation, evaluation, and use of parallel and distributed systems for high-end computing. I have also served on the program committees of many conferences in the area of distributed computing and distributed data processing, including many Institute of Electrical and Electronics Engineers (“IEEE”) international conferences and workshops. Examples of these program committees include the Supercomputing Conference (SC), the IEEE International Parallel & Distributed Processing Symposium (IPDPS), and the International Conference on Parallel Processing (ICPP). I have also chaired or co-chaired conferences in the field, including the International Symposium on High-Performance Parallel and Distributed Computing (HPDC). As part of my service to the scientific community, I have served on the editorial boards of numerous professional journals, including, among others, the HPDC 1992-2012 special issue, the journal Frontiers in High Performance Computing, and IEEE Transactions on Parallel and Distributed Systems.
`
15. In addition, throughout my professional career I have received many awards for my technical contributions in the areas of parallel and distributed systems. As some examples, I was a “Best Paper” nominee for the 2015 IEEE International Conference on Cloud Engineering (IC2E) and a “Best Paper” winner for the 2009 IEEE Grid conference. I received the 1996 CAREER Award from the National Science Foundation and the 1995 Supercomputing Award for “High-Performance Computing with Legion,” Supercomputing Conference (SC).
`
16. Over the past two decades, I have served as a technical consultant to many companies and organizations in the fields of edge computing, cloud computing, grid computing, and scheduling automation software. My work has spanned a wide range of industries. Representative clients include prominent technology companies such as Cisco, Instrumental Inc., Beckman Coulter, Inc., Thomson Reuters, and Avaki Inc., which was later acquired by Oracle. At Thomson Reuters I presented a tutorial on MapReduce.
`
II. MATERIALS CONSIDERED

17. In forming the opinions set forth in this declaration, I have reviewed the ’610 patent, its file history, statements R2 made in IPR2024-00659, and the extrinsic evidence identified by the parties. Additionally, I have drawn on my many years of experience in the field of distributed computing and distributed data processing.
`
III. COMPENSATION AND LACK OF FINANCIAL INTEREST IN THIS LITIGATION

18. I am being compensated for my time at my usual consulting rate of $800 per hour. This compensation is not contingent upon my performance, the conclusions I reach in my analysis, the outcome of this matter, or any issues involved in or related to this matter. I have no financial interest in Databricks or this litigation.
`
IV. LEVEL OF ORDINARY SKILL IN THE ART

19. It is my understanding that a patent is to be interpreted based on how it would have been read by a person of “ordinary skill in the art” (“POSITA”) at the time of the effective filing date of the relevant application. It is my understanding that there are various factors which may help establish the level of ordinary skill in the art, including: (1) the education level of those working in the field; (2) the sophistication of the technology; (3) the types of problems encountered in the art; (4) the prior art solutions to those problems; and (5) the speed with which innovations are made in the field.
`
`
20. It is my understanding that here, the earliest effective filing date for the ’610 patent is October 5, 2006. I am familiar with the technological field at issue at that time.
`
21. The ’610 patent generally relates to distributed data processing. In my opinion, a person of ordinary skill in the relevant art, or POSITA, at the time of the ’610 patent’s effective filing date would have had at least a bachelor’s degree in computer science or a similar field, and at least two years of industry or academic experience in a field related to performing data analytics and/or related data processing tasks, including but not limited to distributed computing systems and distributed data processing. However, with more experience, less education may be needed, and vice versa.
`
22. I am familiar with the knowledge possessed by, and perceptions held by, one of ordinary skill in the art in that time period for the ’610 patent, based on my experience, which includes teaching undergraduate and graduate students at the University of Minnesota, the University of Edinburgh, and the University of Texas at San Antonio, and through my participation in scientific community events in this time frame.
`
V. LEGAL STANDARD

23. Although I am an expert in the relevant technical field, I am not an attorney and I do not intend to offer opinions on legal issues. The laws and principles of claim construction and the material in this section have been provided to me by counsel, and my understanding is as follows.
`
A. General Claim Construction Legal Standards

24. It is my understanding that the claims of a patent define the limits of the patentee’s exclusive rights. To determine the scope of the claimed invention, courts typically construe (or define) claim terms, the meanings of which are disputed by the parties. It is my understanding that claim terms should generally be given their ordinary and customary meaning as understood by one of ordinary skill in the art at the time of the invention and after reading the patent and its prosecution history.
`
25. Claims must be construed, however, in light of, and consistent with, the patent’s intrinsic evidence. Intrinsic evidence includes the claims themselves, the written disclosure in the patent’s specification, and the patent’s prosecution history, including the prior art that was considered by the United States Patent and Trademark Office (“PTO”) as part of the patent’s prosecution.
`
26. The language of the claims helps guide the construction of claim terms. The context in which a term is used in the claims can be highly instructive.
`
27. The patent specification is the best guide to the meaning of a disputed claim term, beyond the words of the claims themselves. Embodiments disclosed in the specification help teach and enable those of skill in the art to make and use the invention and are helpful to understanding the meaning of claim terms. Nevertheless, in most cases, the limitations of preferred embodiments and examples appearing in the specification should not be read into the claims.
`
28. In the specification, a patentee may also define his own terms, give a claim term a different meaning than it would otherwise possess, or disclaim or disavow claim scope. A court may generally presume that a claim term possesses its ordinary meaning. This presumption, however, does not arise when the patentee acts as his own lexicographer by explicitly defining or re-defining a claim term. This presumption of ordinary meaning can also be overcome by statements, in the specification or prosecution history of the patent, of clear disclaimer or disavowal of a particular claim scope.
`
29. It is my understanding that the specification may also resolve any ambiguity where the ordinary and customary meaning of a claim term lacks sufficient clarity to permit the scope of the claim to be ascertained from the words of the claim alone.
`
`
30. It is my understanding that the prosecution history is another important source of evidence in the claim construction analysis. The prosecution history is the record of the proceedings before the PTO, including communications between the patentee and the PTO. The prosecution history can inform the meaning of the claim language by demonstrating how the patentee and the PTO understood the invention and whether the patentee limited the invention in the course of prosecution, making the claim scope narrower than it would otherwise be. It is my understanding that a patentee may also define a term during the prosecution of the patent. The patentee is precluded from recapturing through claim construction specific meanings or claim scope clearly and unambiguously disclaimed or disavowed during the patent’s prosecution.
`
31. It is my understanding that courts can also consider extrinsic evidence when construing claims. Extrinsic evidence is any evidence that is extrinsic to the patent itself and its prosecution history. Examples of extrinsic evidence include technical dictionaries, treatises, and expert testimony. It is my understanding that extrinsic evidence is less significant than the intrinsic record in determining the meaning of claim language.
`
B. Legal Standards Governing Means-Plus-Function Terms

32. It is my understanding that some claim terms can be written in a means-plus-function format. Construing such claim terms involves two steps. First, the Court must identify the claimed function. Second, the Court must identify the structure, if any, disclosed in the specification for performing that function. It is my understanding that in order to meet the definiteness requirement of 35 U.S.C. § 112,[1] the specification must include a disclosure sufficient for one skilled in the art to understand what structure disclosed in the specification performs the recited function. To determine the structure that corresponds with the recited function, the specification must clearly link or associate a structure with the particular function recited in the claim.

[1] It is my understanding that while means-plus-function limitations are now governed by § 112(f) rather than § 112, ¶ 6, the substantive requirements of that paragraph have not changed. It is my understanding that the amended statute applies only to patents and applications filed on or after September 16, 2012. Because the ’610 patent was filed before September 16, 2012, I will refer to the pre-amendment paragraph numbers.
`
33. Generally, for claims directed towards computer-implemented inventions, the structure disclosed in the specification must be more than a general-purpose computer or microprocessor. This is because general-purpose computers can be programmed to perform different tasks in different ways, and such a disclosure would effectively provide no limit on the scope of the claims. Thus, the corresponding structure for a computer-implemented function is not a computer but is a specific algorithm that allows a general-purpose computer or microprocessor to perform the claimed function. An “algorithm” is a fixed step-by-step procedure for accomplishing a given result. A patentee may express the procedural algorithm in any understandable terms, including as a mathematical formula, in prose, or as a flow chart. A patentee is not required to produce a listing of source code or a highly detailed description of the algorithm to be used to achieve the claimed function in order to satisfy 35 U.S.C. § 112, ¶ 6. The patentee is required, however, to disclose in the patent specification the algorithm that transforms the general-purpose microprocessor into a special-purpose computer which is programmed to perform the algorithm. I am informed that a patent claim is invalid as indefinite if the specification fails to disclose in sufficient detail an algorithm for programming the computer or microprocessor. There is one limited exception to this general rule—a patent can meet the requirements of § 112, ¶ 6 by reciting only a general-purpose computer or microprocessor (with no corresponding algorithm) if the claimed function can be achieved without any special programming.
`
34. It is my understanding that although a claim element that does not contain the term “means” is presumed not to be subject to 35 U.S.C. § 112, ¶ 6, this presumption is overcome where the term does not connote a sufficiently definite structure to a person of ordinary skill in the art or recites a function without reciting sufficient structure for performing that function. It is my further understanding that certain terms have been explicitly recognized as “nonce” words or verbal constructs that are not recognized as the name of a structure and are simply a substitute for the term “means for.” While claim language that includes adjectives further defining a generic term can sometimes add sufficient structure to render the claim not means-plus-function under 35 U.S.C. § 112, ¶ 6, not just any description or qualification of functional language will suffice. The proper inquiry is whether or not the claim limitation itself, when read in light of the specification, connotes to a person of ordinary skill in the art definite structure for performing the claimed functions.
`
C. Legal Standards Governing Claim Indefiniteness

35. It is my understanding that there is a “definiteness requirement” that a patent claim must distinctly claim the subject matter the inventor regards as his invention. It is my understanding that the purpose of this requirement is to make sure that the scope of the claims is clear enough so that the public knows what it can and cannot do without infringing the patent’s claims. A claim limitation is indefinite if the claim, when read in light of the specification and the prosecution history, fails to inform with reasonable certainty persons of ordinary skill in the art about the scope of the invention. In other words, the claims, when read in light of the specification and the prosecution history, must provide objective boundaries for those of skill in the art.
`
VI. GENERAL STATE OF THE ART

36. The ’610 patent relates generally to distributed computing, and more specifically to performing MapReduce operations in a distributed system. To provide context for my opinions on the construction of certain terms in the ’610 patent, in this section I provide some background information on distributed data processing, distributed computing systems, and MapReduce. All of the concepts discussed in this section were well known before the earliest priority date for the ’610 patent, which is October 5, 2006. For example, distributed data processing and distributed system textbooks used in undergraduate courses taught the concepts below, and I had taught these concepts years before the earliest priority date.
`
A. State of the Art of Relational Processing

37. Relational processing was not new in 2006. Relational Database Management Systems (RDBMS) employing the Structured Query Language (SQL) performed relational processing for years before the ’610 patent. (See Oracle Database SQL Reference (Ex. B-1) at 1-1 to 1-3.) Relational processing involves processing tables with different schemas, for example by executing different functions that process the different data in each of those tables, producing new tables as a result of that processing, and merging or joining the tables, including over a distributed network. These techniques have been described in numerous textbooks, publications, and journals in the field of relational processing.
`
38. Well before the ’610 patent, documents describing Microsoft’s SQL Server 2005, for example, demonstrated that multiple-CPU machines used SQL commands to partition data, process it in parallel, and then combine the results. (See Tripp Whitepaper (Ex. B-2) at Databricks_R2_PA00005141 (“Why do you need Partitioning?”) (“Large-scale operations across extremely large data sets—typically many million rows—can benefit by performing multiple operations against individual subsets in parallel. . . . This allows SQL Server 2005 to more effectively use multiple-CPU machines.”).) Partitioning is the database process that divides large tables into multiple smaller parts. By partitioning a large table into smaller, individual tables, queries that access only a fraction of the data can run faster because they have less data to scan. The primary goal of partitioning is to aid in maintenance of large tables and to reduce the overall response time by reading and loading less data for particular SQL operations. (Id. at Databricks_R2_PA00005141, -5145.)
`
39. SQL Server 2005 could also process data sets with different schemas, like the “Orders” and “OrderDetails” tables reproduced below:

[Figure reproduced from the Tripp Whitepaper.]

(Id. at Databricks_R2_PA00005145, Fig. 3.) In this example, the Orders table could identify the customer order numbers, whereas OrderDetails may identify the items each customer purchased. SQL Server 2005 was capable of processing and joining the data from these different tables, including partitioning the data as shown in the image above.
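The join operation discussed in this section can be illustrated with a short sketch. This is a simplified illustration of my own in Python, not code from SQL Server or the Tripp Whitepaper; the rows and column names below are hypothetical:

```python
# Illustrative sketch of joining two tables with different schemas on a
# shared key column. The rows and column names are hypothetical examples.

def inner_join(left_rows, right_rows, key):
    """Join two lists of row-dicts on a shared key column, producing a new
    'table' that combines the columns of both inputs."""
    index = {}
    for row in left_rows:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in right_rows:
        for match in index.get(row[key], []):
            joined.append({**match, **row})
    return joined

orders = [
    {"OrderID": 1, "Customer": "A"},
    {"OrderID": 2, "Customer": "B"},
]
order_details = [
    {"OrderID": 1, "Item": "widget", "Qty": 2},
    {"OrderID": 2, "Item": "gadget", "Qty": 1},
    {"OrderID": 2, "Item": "widget", "Qty": 5},
]

joined = inner_join(orders, order_details, "OrderID")
```

A relational engine performs the same logical operation with query planning, indexing, and parallelism; the sketch shows only the schema-combining behavior of an inner join.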
`
40. The Tripp Whitepaper also provides an example of how data may be partitioned. “The first step in partitioning tables . . . is to define the data on which the partition is ‘keyed.’” (Id. at Databricks_R2_PA00005146.) The partitioning key is a set of one or more columns in the table. (Id.) A partition function defines how the rows of a table are mapped to a set of partitions based on the values of a certain column. The figure below shows one possible example of the steps for creating a partitioned table:

[Figure reproduced from the Tripp Whitepaper.]

(Id. at Databricks_R2_PA00005148, Fig. 11.)
`
41. The first step may be to determine whether an object, like a table, should be partitioned. Generally, Tripp recommends partitioning large tables, because partitioning adds administrative overhead that outweighs its benefits for small tables. The next step may be to determine a partitioning key and the number of partitions for the data. After that, one or more filegroups can be created to store and separate the data. Filegroups “place a partitioned table on multiple files for better IO balancing.” (Id. at Databricks_R2_PA00005149.) Then, a partition function and a partition scheme may be created. As Tripp explains, “[o]nce you have created a partition function you must associate it with a partition scheme to direct the partitions to specific filegroups.” (Id. at Databricks_R2_PA00005151.) After both the partition function and partition scheme are defined, a partitioned table may be defined to take advantage of them. “The table defines which ‘scheme’ should be used and the scheme defines the function.” (Id.)
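The relationship Tripp describes between the partition function and the partition scheme can be sketched as follows. This is my own illustration of the concept; the boundary values, filegroup names, and key column are hypothetical and are not taken from the whitepaper:

```python
# Illustrative sketch: a range "partition function" maps a row's key value to
# a partition number, and a "partition scheme" maps each partition number to
# a filegroup. Boundaries, filegroup names, and the key column are hypothetical.

BOUNDARIES = [1000, 2000, 3000]          # right-open ranges on the key values
SCHEME = ["FG1", "FG2", "FG3", "FG4"]    # one filegroup per partition

def partition_function(key_value):
    """Return the partition number for a key value (range partitioning)."""
    for i, bound in enumerate(BOUNDARIES):
        if key_value < bound:
            return i
    return len(BOUNDARIES)

def place_row(row, key_column="OrderID"):
    """Apply the function, then the scheme: row -> partition -> filegroup."""
    return SCHEME[partition_function(row[key_column])]
```

For example, under these hypothetical boundaries a row with OrderID 2500 falls in partition 2 and is placed on filegroup “FG3.”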
`
B. History of MapReduce

42. As the ’610 patent explains, MapReduce refers to a well-known and preexisting programming methodology for processing “parallel computations over distributed (typically, very large) data sets.” (’610 patent, 1:6-27.) In 2004, Jeffrey Dean and Sanjay Ghemawat published a paper while working at Google that described MapReduce as a programming methodology for processing big data sets in a parallel, distributed computing environment. (Dean & Ghemawat (Ex. B-3) (“Dean”).) The paper was presented at the 6th Symposium on Operating Systems Design and Implementation (OSDI). (Id.)
`
43. As the name suggests, the MapReduce methodology includes two steps: a map function and a reduce function. (Ex. B-3 at 2.) The map function, “written by the user, takes an input pair and produces a set of intermediate key/value pairs” that “group[] together all intermediate values associated with the same intermediate key.” (Id.) Dean provides examples of key/value pairs, including a word-count example where the word is the key and each instance of the word in a document is the value. (Id.)
`
44. The reduce function, “also written by the user, accepts an intermediate key I and a set of values for that key” and “merges together these values.” (Id.) The intermediate values are supplied to the user’s reduce function via an iterator. (Id.) An iterator simply reads in the data and calls the reduce function for each item. Dean’s “implementation of MapReduce runs on a large cluster of commodity machines,” and “often on several thousand machines.” (Id. at 1, 7.)
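The division of labor Dean describes, with user-written map and reduce functions and the framework grouping intermediate key/value pairs and supplying each key’s values to reduce via an iterator, can be sketched using the word-count example. This is a simplified, single-machine illustration of my own, not Google’s implementation:

```python
# Simplified single-machine sketch of the MapReduce word-count example:
# map emits (word, 1) pairs; the framework groups pairs by key; reduce
# receives each key's values through an iterator and merges them.

from collections import defaultdict

def map_fn(name, contents):
    for word in contents.split():
        yield word, 1                      # intermediate key/value pair

def reduce_fn(key, values):
    return key, sum(values)                # merge all values for the key

def map_reduce(inputs, map_fn, reduce_fn):
    groups = defaultdict(list)
    for name, contents in inputs:
        for k, v in map_fn(name, contents):
            groups[k].append(v)            # group by intermediate key
    # supply each key's values to the user's reduce function via an iterator
    return dict(reduce_fn(k, iter(vals)) for k, vals in groups.items())

counts = map_reduce([("doc1", "to be or not to be")], map_fn, reduce_fn)
```

In Dean’s actual system the map and reduce tasks run in parallel across many machines, with a grouping (“shuffle”) step moving intermediate pairs between them; the sketch preserves only the programming model.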
`
45. According to Dean, one of Google’s “most significant uses of MapReduce” by 2004 was using it to index webpages for Google’s web search service. (Id. at 10-11 (Section 6.1).) The indexing process ran “as a sequence of five to ten MapReduce operations,” and considered “as input a large set of documents that have been retrieved by [Google’s] crawling system,” where “[t]he raw contents for these documents are more than 20 terabytes of data.” (Id. at 11.) Dean places no limitation on the type of input data and contemplates processing “large collection[s] of documents,” including websites and logs from the Internet. (See id. at 2, 10-11.) Dean also discloses a MapReduce library that provides “support for reading input data in several different formats. For example, ‘text’ mode input treats each line as a key/value pair . . . [e]ach input type implementation knows how to split itself into meaningful ranges for processing as separate map tasks (e.g., text mode’s range splitting ensures that range splits occur only at line boundaries).” (Id. at 6-7 (Section 4.4).)
`
VII. THE ’610 PATENT

46. I provide below a brief summary of the ’610 patent. The ’610 patent is directed to an “enhanced” form of MapReduce. (’610 patent at Abstract, 1:31-44.) But the patent concedes that the actual change is not an enhancement of MapReduce, but a different treatment of inputs: “[t]he inventors have realized that, by treating an input data set as a plurality of grouped sets of key/value pairs, the utility of the MapReduce programming methodology may be enhanced.” (Id. at 1:66-2:8, 1:31-41, Abstract.) In other words, the patent specifies a plurality of “data groups” with different schemas that are processed by MapReduce.
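The notion of a plurality of “data groups” with different schemas can be illustrated with a short sketch. The group names, record fields, and map functions below are hypothetical examples of my own and are not taken from the ’610 patent:

```python
# Hypothetical illustration: two "data groups" with different schemas, each
# processed by its own map function, with all intermediate key/value pairs
# then grouped by a shared key. All names and fields are invented examples.

from collections import defaultdict

def map_clicks(record):                    # group 1 schema: {"user", "url"}
    yield record["user"], ("click", record["url"])

def map_purchases(record):                 # group 2 schema: {"user", "amount"}
    yield record["user"], ("purchase", record["amount"])

MAPPERS = {"clicks": map_clicks, "purchases": map_purchases}

def map_groups(data_groups):
    intermediate = defaultdict(list)
    for group_name, records in data_groups.items():
        mapper = MAPPERS[group_name]       # group-specific map function
        for record in records:
            for key, value in mapper(record):
                intermediate[key].append(value)
    return dict(intermediate)

result = map_groups({
    "clicks": [{"user": "u1", "url": "/a"}],
    "purchases": [{"user": "u1", "amount": 5}, {"user": "u2", "amount": 7}],
})
```

The point of the sketch is only that records with different schemas can each be mapped by a function appropriate to their group, while the subsequent grouping step remains shared across groups.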
`
47. The claimed idea of the ’610 patent can be understood with reference to Fig. 5, below.
`