EXHIBIT 1-1

Case 4:23-cv-01147-ALM Document 62-2 Filed 11/12/24 Page 2 of 49 PageID #: 2643

UNITED STATES DISTRICT COURT
FOR THE EASTERN DISTRICT OF TEXAS
SHERMAN DIVISION

R2 SOLUTIONS LLC,
          Plaintiff,
v.
DATABRICKS, INC.,
          Defendant.

Civil Action No. 4:23-cv-01147-ALM

DECLARATION OF DR. JON B. WEISSMAN

I, Jon B. Weissman, Ph.D., hereby declare as follows:

1. My name is Jon Weissman. I am at least eighteen years of age. I have personal knowledge of and am competent to testify as to the facts and opinions herein.

2. I have been asked by Defendant Databricks, Inc. (“Databricks”) to provide my expert opinions relating to certain terms and phrases used in the claims of U.S. Patent No. 8,190,610 (the “’610 patent”).

3. This declaration sets forth my opinions on the disputed claim terms of the ’610 patent.

I. BACKGROUND AND QUALIFICATIONS

4. In the paragraphs below, I summarize my qualifications. My qualifications are stated more fully in my curriculum vitae, which is attached to this declaration as Exhibit A.

5. I am a Full Professor in the Department of Computer Science & Engineering at the University of Minnesota, where I have been a professor since 1999. In addition to my professorship, I lead the University of Minnesota’s Distributed Computing Systems Group. The Distributed Computing Systems Group is focused on research into distributed and mobile systems, cloud computing, and high-performance computing.

6. In 1995, I received my Doctor of Philosophy degree (Ph.D.) in Computer Science from the University of Virginia. For my Ph.D. thesis, I developed the first automated scheduling system for parallel and distributed applications across heterogeneous local and wide-area networks. In 1989, I received my Master of Science degree (M.Sc.) in Computer Science from the University of Virginia. In 1984, I received my Bachelor of Science degree (B.Sc.) in Applied Mathematics and Computer Science from Carnegie Mellon University.

7. Before obtaining my Ph.D. degree, I worked as a software engineer for five years in the early 1990s in the area of distributed systems. During this time, I was responsible for designing, implementing, and maintaining distributed computing systems where multiple nodes or computers worked together to achieve a common goal. In this capacity, I helped design and implement a distributed AI framework for knowledge-based business applications. In another project, I designed and implemented a parallel and distributed simulation framework for military and air traffic control applications.

8. In 1995, after earning my Ph.D., I returned to academia and began my career as a professor. My research has been funded by NASA, the National Science Foundation, the Department of Energy, and the Air Force, and has included the following projects related to distributed computing and distributed data processing:

• Department of Energy (“DOE”), “Making Parallel Computing Easy”;
• National Science Foundation, “Collaborative Data Analysis and Visualization”;
• Department of Energy (“DOE”), “An Integrated Middleware Framework to Enable Extreme Collaborative Science”;
• Army High Performance Computing and Research Center (“AHPCRC”), “Metacomputing: Enabling Technologies and the Virtual Data Grid”;
• National Science Foundation (“NSF”), “Resource Management for Parallel and Distributed Systems”;
• National Science Foundation (“NSF”), “A Framework for Adaptive Grid Services”; and
• Air Force Office of Scientific Research (“AFOSR”), “Telecommunication Networks for Mobile and Distributed Computing and Communications.”

9. I have published over 100 peer-reviewed technical articles, including some awarded or nominated for Best Paper at highly competitive international conferences. Many of my published papers relate to distributed computing and distributed data processing, including, for example, the following:

• “Nebula: Distributed Edge Cloud for Data Intensive Computing,” Albert Jonathan, Mathew Ryden, Kwangsung Oh, Abhishek Chandra, and Jon Weissman, IEEE Transactions on Parallel and Distributed Systems;
• “TripS: Automated Multi-tiered Data Placement in a Geo-distributed Cloud Environment,” Kwangsung Oh, Abhishek Chandra, and Jon Weissman, 10th ACM International Systems and Storage Conference;
• “Passive Network Performance Estimation for Large-scale, Data-Intensive Computing,” Jinoh Kim, Abhishek Chandra, and Jon B. Weissman, IEEE Transactions on Parallel and Distributed Systems;
• “DDDAS/ITR: A Data Mining and Exploration Middleware for Grid and Distributed Computing,” Jon B. Weissman, Vipin Kumar, Varun Chandola, Eric Eilertson, Levent Ertoz, Gyorgy Simon, Seonho Kim, and Jinoh Kim, Workshop on Dynamic Data Driven Application Systems – DDDAS;
• “Scheduling Parallel Applications in Distributed Networks,” Jon B. Weissman and Xin Zhao, Journal of Cluster Computing;
• “Adaptive Resource Scheduling for Network Services,” Byoung-Dai Lee and Jon B. Weissman, IEEE 3rd International Workshop on Grid Computing;
• “Adaptive Reputation-Based Scheduling on Unreliable Distributed Infrastructures,” Jason D. Sonnek, Abhishek Chandra, and Jon B. Weissman, IEEE Transactions on Parallel and Distributed Systems;
• “Integrated Scheduling: The Best of Both Worlds,” Jon B. Weissman, Darin England, and Lakshman Abburi Rao, Journal of Parallel and Distributed Computing;
• “Predicting the Cost and Benefit of Adapting Parallel Applications in Clusters,” Jon B. Weissman, Journal of Parallel and Distributed Computing; and
• “Optimizing Remote File Access for Parallel and Distributed Network Applications,” Jon B. Weissman, Mike Gingras, and Mahesh Marina, Journal of Parallel and Distributed Computing.

10. I am also the coauthor of the “Distributed and Multiprocessing Scheduling” chapter of “The Computer Science and Engineering Handbook” (2nd edition 2004). The chapter discusses CPU scheduling in parallel and distributed systems. In the chapter, I examine techniques for spreading tasks across several processors of a distributed system. Examples of scheduling criteria in a distributed computing system discussed in the chapter include minimizing cost, minimizing communication delay, giving priority to certain users’ processes, and accommodating the need for specialized hardware devices.

11. I have also coauthored publications related to MapReduce, including “Cross-Phase Optimization in MapReduce,” in “Cloud Computing for Data-Intensive Applications,” Springer, 2014. This book chapter explores the application of MapReduce across widely distributed data and distributed computation resources. The chapter “propose[s] new cross-phase optimization techniques that enable independent MapReduce phases to influence one another” to address the “problem” that “interaction of MapReduce phases becomes pronounced in the presence of heterogeneous network behavior.” (Cross-Phase Optimization at Abstract.)

12. I also coauthored the MapReduce paper, “Exploring MapReduce Efficiency with Highly-Distributed Data,” published in June 2011 in the Proceedings of the Second International Workshop on MapReduce and Its Applications. The paper has been downloaded over 1,200 times from the ACM Digital Library and cited in over 100 other research papers, according to the ACM Digital Library and Google Scholar. The paper explains that conventional single-cluster MapReduce architectures are not suitable when data and computing resources are widely distributed. The paper examines three different architectural approaches to performing MapReduce jobs on two platforms—PlanetLab and Amazon EC2—and explains that a local architecture performs better in zero-aggregation conditions, while distributed architectures are preferred in high-aggregation and equal-partition conditions.

13. During my academic appointments at the University of Minnesota, the University of Edinburgh National e-Science Center, and the University of Texas San Antonio, I taught and continue to routinely teach classes in operating systems and parallel and distributed systems. These classes focus on parallel computing platforms, parallel applications, and parallel program scheduling. These parallel computing platforms are used to implement various distributed processing techniques, including, for example, MapReduce.

14. I have also served on the boards of several major journals, including IEEE Transactions on Parallel and Distributed Systems and IEEE Transactions on Computers. I am the steering committee chair of the ACM International Symposium on High Performance Parallel and Distributed Systems, the premier annual conference for presenting the latest research on the design, implementation, evaluation, and use of parallel and distributed systems for high-end computing. I have also served on the program committees of many conferences in the area of distributed computing and distributed data processing, including many Institute of Electrical and Electronics Engineers (“IEEE”) international conferences and workshops. Examples of the program committees include the Supercomputing Conference (SC), the IEEE International Parallel & Distributed Processing Symposium (IPDPS), and the International Conference on Parallel Processing (ICPP). I have also chaired or co-chaired conferences in the field, including the International Symposium on High-Performance Parallel and Distributed Computing (HPDC). As part of my service to the scientific community, I have served on the editorial boards of numerous professional journals including, among others, the HPDC 1992-2012 special issue, the journal Frontiers in High Performance Computing, and IEEE Transactions on Parallel and Distributed Systems.

15. In addition, throughout my professional career I have received many awards for my technical contributions in the areas of parallel and distributed systems. As some examples, I was a “Best Paper” nominee at the 2015 IEEE International Conference on Cloud Engineering (IC2E) and a “Best Paper” winner at the 2009 IEEE Grid conference. I received the 1996 CAREER Award from the National Science Foundation and the 1995 Supercomputing Award for “High-Performance Computing with Legion” at the Supercomputing Conference (SC).

16. Over the past two decades, I have served as a technical consultant to many companies and organizations in the fields of edge computing, cloud computing, grid computing, and scheduling automation software. My work has spanned a wide range of industries. Representative clients include prominent technology companies such as Cisco, Instrumental Inc., Beckman Coulter, Inc., Thomson Reuters, and Avaki Inc., which was later acquired by Oracle. At Thomson Reuters I presented a tutorial on MapReduce.

II. MATERIALS CONSIDERED

17. In forming the opinions set forth in this declaration, I have reviewed the ’610 patent, its file history, statements R2 made in IPR2024-00659, and the extrinsic evidence identified by the parties. Additionally, I have drawn on my many years of experience in the field of distributed computing and distributed data processing.

III. COMPENSATION AND LACK OF FINANCIAL INTEREST IN THIS LITIGATION

18. I am being compensated for my time at my usual consulting rate of $800 per hour. This compensation is not contingent upon my performance, the conclusions I reach in my analysis, the outcome of this matter, or any issues involved in or related to this matter. I have no financial interest in Databricks or this litigation.

IV. LEVEL OF ORDINARY SKILL IN THE ART

19. It is my understanding that a patent is to be interpreted based on how it would have been read by a person of “ordinary skill in the art” (“POSITA”) at the time of the effective filing date of the relevant application. It is my understanding that there are various factors which may help establish the level of ordinary skill in the art, including: (1) the education level of those working in the field; (2) the sophistication of the technology; (3) the types of problems encountered in the art; (4) the prior art solutions to those problems; and (5) the speed with which innovations are made in the field.

20. It is my understanding that here, the earliest effective filing date for the ’610 patent is October 5, 2006. I am familiar with the technological field at issue at that time.

21. The ’610 patent generally relates to distributed data processing. In my opinion, a person of ordinary skill in the relevant art, or POSITA, at the time of the ’610 patent’s effective filing date would have had at least a bachelor’s degree in computer science or a similar field, and at least two years of industry or academic experience in a field related to performing data analytics and/or related data processing tasks, including but not limited to distributed computing systems and distributed data processing. However, with more experience, less education may be needed, and vice versa.

22. I am familiar with the knowledge possessed by, and perceptions held by, one of ordinary skill in the art in that time period for the ’610 patent, based on my experience, which includes teaching undergraduate and graduate students at the University of Minnesota, the University of Edinburgh, and the University of Texas San Antonio, and through my participation in scientific community events in this time frame.

V. LEGAL STANDARD

23. Although I am an expert in the relevant technical field, I am not an attorney and I do not intend to offer opinions on legal issues. The laws and principles of claim construction and the material in this section have been provided to me by counsel, and my understanding is as follows.

A. General Claim Construction Legal Standards

24. It is my understanding that the claims of a patent define the limits of the patentee’s exclusive rights. To determine the scope of the claimed invention, courts typically construe (or define) claim terms, the meanings of which are disputed by the parties. It is my understanding that claim terms should generally be given their ordinary and customary meaning as understood by one of ordinary skill in the art at the time of the invention and after reading the patent and its prosecution history.

25. Claims must be construed, however, in light of, and consistent with, the patent’s intrinsic evidence. Intrinsic evidence includes the claims themselves, the written disclosure in the patent’s specification, and the patent’s prosecution history, including the prior art that was considered by the United States Patent and Trademark Office (“PTO”) as part of the patent’s prosecution.

26. The language of the claims helps guide the construction of claim terms. The context in which a term is used in the claims can be highly instructive.

27. The patent specification is the best guide to the meaning of a disputed claim term, beyond the words of the claims themselves. Embodiments disclosed in the specification help teach and enable those of skill in the art to make and use the invention and are helpful to understanding the meaning of claim terms. Nevertheless, in most cases, the limitations of preferred embodiments and examples appearing in the specification should not be read into the claims.

28. In the specification, a patentee may also define his own terms, give a claim term a different meaning than it would otherwise possess, or disclaim or disavow claim scope. A court may generally presume that a claim term possesses its ordinary meaning. This presumption, however, does not arise when the patentee acts as his own lexicographer by explicitly defining or re-defining a claim term. This presumption of ordinary meaning can also be overcome by statements, in the specification or prosecution history of the patent, of clear disclaimer or disavowal of a particular claim scope.

29. It is my understanding that the specification may also resolve any ambiguity where the ordinary and customary meaning of a claim term lacks sufficient clarity to permit the scope of the claim to be ascertained from the words of the claim alone.

30. It is my understanding that the prosecution history is another important source of evidence in the claim construction analysis. The prosecution history is the record of the proceedings before the PTO, including communications between the patentee and the PTO. The prosecution history can inform the meaning of the claim language by demonstrating how the patentee and the PTO understood the invention and whether the patentee limited the invention in the course of prosecution, making the claim scope narrower than it would otherwise be. It is my understanding that a patentee may also define a term during the prosecution of the patent. The patentee is precluded from recapturing through claim construction specific meanings or claim scope clearly and unambiguously disclaimed or disavowed during the patent’s prosecution.

31. It is my understanding that courts can also consider extrinsic evidence when construing claims. Extrinsic evidence is any evidence that is extrinsic to the patent itself and its prosecution history. Examples of extrinsic evidence include technical dictionaries, treatises, and expert testimony. It is my understanding that extrinsic evidence is less significant than the intrinsic record in determining the meaning of claim language.

B. Legal Standards Governing Means-Plus-Function Terms

32. It is my understanding that some claim terms can be written in a means-plus-function format. Construing such claim terms involves two steps. First, the Court must identify the claimed function. Second, the Court must identify the structure, if any, disclosed in the specification for performing that function. It is my understanding that in order to meet the definiteness requirement of 35 U.S.C. § 112,[1] the specification must include a disclosure sufficient for one skilled in the art to understand what structure disclosed in the specification performs the recited function. To determine the structure that corresponds with the recited function, the specification must clearly link or associate a structure with the particular function recited in the claim.

[1] It is my understanding that while means-plus-function limitations are now governed by § 112(f) rather than § 112, ¶ 6, the substantive requirements of that paragraph have not changed. It is my understanding that the amended statute applies only to patents and applications filed on or after September 16, 2012. Because the ’610 patent was filed before September 16, 2012, I will refer to the pre-amendment paragraph numbers.

33. Generally, for claims directed towards computer-implemented inventions, the structure disclosed in the specification must be more than a general-purpose computer or microprocessor. This is because general-purpose computers can be programmed to perform different tasks in different ways, and such a disclosure would effectively provide no limit on the scope of the claims. Thus, the corresponding structure for a computer-implemented function is not a computer but is a specific algorithm that allows a general-purpose computer or microprocessor to perform the claimed function. An “algorithm” is a fixed step-by-step procedure for accomplishing a given result. A patentee may express the procedural algorithm in any understandable terms, including as a mathematical formula, in prose, or as a flow chart. A patentee is not required to produce a listing of source code or a highly detailed description of the algorithm to be used to achieve the claimed function in order to satisfy 35 U.S.C. § 112, ¶ 6. The patentee is required, however, to disclose in the patent specification the algorithm that transforms the general-purpose microprocessor into a special-purpose computer which is programmed to perform the algorithm. I am informed that a patent claim is invalid as being indefinite if the specification fails to disclose in sufficient detail an algorithm for programming the computer or microprocessor. There is one limited exception to this general rule—a patent can meet the requirements of § 112, ¶ 6 by reciting only a general-purpose computer or microprocessor (with no corresponding algorithm) if the claimed function can be achieved without any special programming.

34. It is my understanding that although a claim element that does not contain the term “means” is presumed not to be subject to 35 U.S.C. § 112, ¶ 6, this presumption is overcome where the term does not connote a sufficiently definite structure to a person of ordinary skill in the art or recites a function without reciting sufficient structure for performing that function. It is my further understanding that certain terms have been explicitly recognized as “nonce” words or verbal constructs that are not recognized as the name of structure and are simply a substitute for the term “means for.” While claim language that includes adjectives further defining a generic term can sometimes add sufficient structure to render the claim not means-plus-function under 35 U.S.C. § 112, ¶ 6, not just any description or qualification of functional language will suffice. The proper inquiry is whether or not the claim limitation itself, when read in light of the specification, connotes to a person of ordinary skill in the art definite structure for performing the claimed functions.

C. Legal Standards Governing Claim Indefiniteness

35. It is my understanding that there is a “definiteness requirement” that a patent claim must distinctly claim the subject matter the inventor regards as his invention. It is my understanding that the purpose of this requirement is to make sure that the scope of the claims is clear enough so that the public knows what it can and cannot do without infringing the patent’s claims. A claim limitation is indefinite if the claim, when read in light of the specification and the prosecution history, fails to inform with reasonable certainty persons of ordinary skill in the art about the scope of the invention. In other words, the claims, when read in light of the specification and the prosecution history, must provide objective boundaries for those of skill in the art.

VI. GENERAL STATE OF THE ART

36. The ’610 patent relates generally to distributed computing, and more specifically to performing MapReduce operations in a distributed system. To provide context for my opinions on the construction of certain terms in the ’610 patent, in this section I provide some background information on distributed data processing, distributed computing systems, and MapReduce. All of the concepts discussed in this section were well known before the earliest priority date for the ’610 patent, which is October 5, 2006. For example, distributed data processing and distributed system textbooks used in undergraduate courses taught the concepts below, and I had taught these concepts years before the earliest priority date.

A. State of the Art of Relational Processing

37. Relational processing was not new in 2006. Relational Database Management Systems (RDBMS) employing Structured Query Language (SQL) performed relational processing for years before the ’610 patent. (See Oracle Database SQL Reference (Ex. B-1) at 1-1 to 1-3.) Relational processing involves processing tables of different schema, for example by executing different functions that process the different data in each of those tables, producing new tables as a result of that processing, and merging or joining the tables, including over a distributed network. These techniques have been described in numerous textbooks, publications, and journals in the field of relational processing.

38. Well before the ’610 patent, documents describing Microsoft’s SQL Server 2005, for example, demonstrated that multiple-CPU machines used SQL commands to partition data, process it in parallel, and then combine the results. (See Tripp Whitepaper (Ex. B-2) at Databricks_R2_PA00005141 (“Why do you need Partitioning?”) (“Large-scale operations across extremely large data sets—typically many million rows—can benefit by performing multiple operations against individual subsets in parallel. . . . This allows SQL Server 2005 to more effectively use multiple-CPU machines.”).) Partitioning is the database process that divides large tables into multiple smaller parts. By partitioning a large table into smaller, individual tables, queries that access only a fraction of the data can run faster because they have less data to scan. The primary goal of partitioning is to aid in maintenance of large tables and to reduce the overall response time by reading and loading less data for particular SQL operations. (Id. at Databricks_R2_PA00005141, -5145.)

39. SQL Server 2005 could also process data sets with different schema, like the “Orders” and “OrderDetails” tables reproduced below:

[Fig. 3 image: partitioned “Orders” and “OrderDetails” tables]

(Id. at Databricks_R2_PA00005145, Fig. 3.) In this example, the “Orders” table could identify the customer order numbers, whereas “OrderDetails” may identify the items each customer purchased. SQL Server 2005 was capable of processing and joining the data from these different tables, including partitioning the data as shown in the image above.

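The kind of relational processing described above (applying functions to tables with different schemas, then joining the results into a new table) can be sketched in a few lines. The rows and column names below are hypothetical stand-ins loosely modeled on the whitepaper’s “Orders” and “OrderDetails” example; this is an illustrative sketch, not code from any cited source:

```python
# Two tables with different schemas (hypothetical rows).
orders = [
    {"OrderID": 1, "CustomerID": "A"},
    {"OrderID": 2, "CustomerID": "B"},
]
order_details = [
    {"OrderID": 1, "Item": "widget", "Qty": 3},
    {"OrderID": 1, "Item": "gear", "Qty": 1},
    {"OrderID": 2, "Item": "widget", "Qty": 2},
]

def inner_join(left, right, key):
    """Join two lists of row dicts on a shared key column,
    producing a new table that merges the two schemas."""
    index = {}
    for row in left:
        index.setdefault(row[key], []).append(row)
    return [{**lrow, **rrow}
            for rrow in right
            for lrow in index.get(rrow[key], [])]

joined = inner_join(orders, order_details, "OrderID")
```

Each joined row carries columns from both schemas, so the result is a new table produced from two tables with different schemas, the operation described in the paragraph above.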
40. The Tripp Whitepaper also provides an example of how data may be partitioned. “The first step in partitioning tables . . . is to define the data on which the partition is ‘keyed.’” (Id. at Databricks_R2_PA00005146.) The partitioning key is a set of one or more columns in the table. (Id.) A partition function defines how the rows of a table are mapped to a set of partitions based on the values of a certain column. The figure below shows one possible example of the steps for creating a partitioned table:

[Fig. 11 image: example steps for creating a partitioned table]

(Id. at Databricks_R2_PA00005148, Fig. 11.)

41. The first step may be to determine whether an object, like a table, should be partitioned. Generally, Tripp recommends partitioning large tables, because partitioning adds administrative overhead that outweighs its benefits for small tables. The next step may be to determine a partitioning key and the number of partitions for the data. After that, one or more filegroups can be created to store and separate the data. Filegroups “place a partitioned table on multiple files for better IO balancing.” (Id. at Databricks_R2_PA00005149.) Then, a partition function and a partition scheme may be created. As Tripp explains, “[o]nce you have created a partition function you must associate it with a partition scheme to direct the partitions to specific filegroups.” (Id. at Databricks_R2_PA00005151.) After both the partition function and partition scheme are defined, a partition table may be defined to take advantage of them. “The table defines which ‘scheme’ should be used and the scheme defines the function.” (Id.)

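The sequence above (partition function, then partition scheme, then partitioned table) can be sketched in Python. The boundary values and filegroup names are hypothetical, chosen only to illustrate how a range-style partition function routes rows through a scheme to filegroups; they are not taken from the whitepaper:

```python
import bisect

# Partition function: maps a partitioning-key value to a partition
# number, analogous to a RANGE partition function on an integer column.
boundaries = [1000, 2000, 3000]  # hypothetical boundary values

def partition_function(key_value):
    return bisect.bisect_right(boundaries, key_value)

# Partition scheme: directs each partition to a filegroup.
filegroups = ["FG1", "FG2", "FG3", "FG4"]  # hypothetical filegroup names

def partition_scheme(key_value):
    return filegroups[partition_function(key_value)]

# Partitioned table: rows are routed to filegroups via the scheme,
# which in turn applies the partition function.
def partition_table(rows, key_column):
    placement = {fg: [] for fg in filegroups}
    for row in rows:
        placement[partition_scheme(row[key_column])].append(row)
    return placement
```

The layering mirrors Tripp’s description: the table names a scheme, and the scheme applies the function that assigns each row to a partition.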
B. History of MapReduce

42. As the ’610 patent explains, MapReduce refers to a well-known and preexisting programming methodology for processing “parallel computations over distributed (typically, very large) data sets.” (’610 patent, 1:6-27.) In 2004, Jeffrey Dean and Sanjay Ghemawat published a paper while working at Google that described MapReduce as a programming methodology for processing big data sets in a parallel, distributed computing environment. (Dean & Ghemawat (Ex. B-3) (“Dean”).) The paper was presented at the 6th Symposium on Operating Systems Design and Implementation (OSDI). (Id.)

43. As the name suggests, the MapReduce methodology includes two steps: a map function and a reduce function. (Ex. B-3 at 2.) The map function, “written by the user, takes an input pair and produces a set of intermediate key/value pairs” that “group[] together all intermediate values associated with the same intermediate key.” (Id.) Dean provides examples of key/value pairs, including a word count example where the word is the key and each instance of the word in a document is the value. (Id.)

44. The reduce function, “also written by the user, accepts an intermediate key I and a set of values for that key” and “merges together these values.” (Id.) The intermediate values are supplied to the user’s reduce function via an iterator. (Id.) An iterator simply reads in the data and calls the reduce function for each item. Dean’s “implementation of MapReduce runs on a large cluster of commodity machines,” and “often on several thousand machines.” (Id. at 1, 7.)

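Dean’s word-count example can be sketched as a single-process illustration of the map, group, and reduce phases. This is an illustrative toy capturing the data flow only, not the distributed, multi-machine implementation the paper describes:

```python
from itertools import groupby
from operator import itemgetter

def map_fn(doc_name, contents):
    # Map: emit an intermediate key/value pair for each word occurrence.
    for word in contents.split():
        yield (word, 1)

def reduce_fn(key, values):
    # Reduce: merge all intermediate values associated with one key.
    return (key, sum(values))

def word_count(documents):
    intermediate = []
    for doc_name, contents in documents:
        intermediate.extend(map_fn(doc_name, contents))
    # Group together all intermediate values with the same intermediate key;
    # in the real system this grouping happens during the shuffle phase.
    intermediate.sort(key=itemgetter(0))
    return dict(reduce_fn(key, (v for _, v in group))
                for key, group in groupby(intermediate, key=itemgetter(0)))
```

Here the word is the intermediate key and each occurrence contributes the value 1, matching the word-count example described in the preceding paragraphs.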
45. According to Dean, one of Google’s “most significant uses of MapReduce” by 2004 was using it to index webpages for Google’s web search service. (Id. at 10-11 (Section 6.1).) The indexing process ran “as a sequence of five to ten MapReduce operations,” and considered “as input a large set of documents that have been retrieved by [Google’s] crawling system,” where “[t]he raw contents for these documents are more than 20 terabytes of data.” (Id. at 11.) Dean places no limitation on the type of input data and contemplates processing “large collection[s] of documents,” including websites and logs from the Internet. (See id. at 2, 10-11.) Dean also discloses a MapReduce library that provides “support for reading input data in several different formats. For example, ‘text’ mode input treats each line as a key/value pair . . . [e]ach input type implementation knows how to split itself into meaningful ranges for processing as separate map tasks (e.g., text mode’s range splitting ensures that range splits occur only at line boundaries).” (Id. at 6-7 (Section 4.4).)

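The line-boundary range splitting Dean describes for “text” mode input can be illustrated with a short sketch. The target split size here is arbitrary, and a real implementation would split byte ranges of large files rather than an in-memory string; this is only meant to show why splits land on line boundaries:

```python
def split_at_line_boundaries(text, target_size):
    """Split text into ranges of roughly target_size characters,
    advancing each split point to the next newline so that range
    splits occur only at line boundaries."""
    splits, start = [], 0
    while start < len(text):
        end = start + target_size
        if end < len(text):
            # Extend the range to the end of the current line.
            newline = text.find("\n", end)
            end = len(text) if newline == -1 else newline + 1
        else:
            end = len(text)
        splits.append(text[start:end])
        start = end
    return splits
```

Because every range ends at a newline, each resulting range can be handed to a separate map task with no key/value pair straddling two tasks.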
VII. THE ’610 PATENT

46. I provide below a brief summary of the ’610 patent. The ’610 patent is directed to an “enhanced” form of MapReduce. (’610 patent at Abstract, 1:31-44.) But the patent concedes that the actual change is not an enhancement of MapReduce, but a different treatment of inputs: “[t]he inventors have realized that, by treating an input data set as a plurality of grouped sets of key/value pairs, the utility of the MapReduce programming methodology may be enhanced.” (Id. at 1:66-2:8, 1:31-41, Abstract.) In other words, the patent specifies a plurality of “data groups” with different schema that are processed by MapReduce.

47. The claimed idea of the ’610 patent can be understood with referenc
