EXHIBIT A

Second Supplemental Infringement Contentions

R2 Solutions LLC v. Databricks, Inc.
Civil Action No. 4:23-cv-01147 (E.D. Tex.)

U.S. Pat. No. 8,190,610
Claims 1-5 and 17-21

R2 Solutions LLC (“R2” or “Plaintiff”) contends that Databricks, Inc. (“Databricks” or “Defendant”), its partners, and its customers have infringed and continue to infringe U.S. Pat. No. 8,190,610 (the “’610 Patent”), at least by, among other things, making, having made, using, offering for sale, and/or selling within this judicial district and elsewhere in the United States and/or importing into this judicial district and into the United States – without license or authority – the “Accused Instrumentalities” in a manner that infringes the claims of the ’610 Patent as set forth in this chart. The “Accused Instrumentalities” include the Databricks Data Intelligence Platform/Databricks Lakehouse Platform and any other platforms provided by Databricks that utilize Apache Spark or any other similar functionality. The Accused Instrumentalities practice the steps of the asserted claims, are capable of performing the claim elements of the asserted claims, or otherwise satisfy all elements described or set forth in one or more claims of the ’610 Patent.

R2 contends that each and every limitation of the asserted claims of the ’610 Patent is literally present or performed and infringed by each and every Accused Instrumentality. In the alternative, R2 contends that all limitations of the asserted claims are present under the doctrine of equivalents because (1) to the extent there are any differences between the Accused Instrumentalities and those claim limitations (which R2 denies), such differences are insubstantial, and (2) equivalency may also be shown by the fact that the Accused Instrumentalities perform substantially the same function, in substantially the same way, to achieve substantially the same result as the functions recited in the limitations of the asserted claims.

R2 reserves the right to supplement or amend its claim charts, operative infringement contentions, and expert reports as may be appropriate or permissible under the applicable rules and Scheduling Order, including the Court’s modifications to P.R. 3-1 available in the Scheduling Order. Any reference or citation to the specification is for illustrative purposes only and is not intended to suggest that the scope of the claimed invention is limited to any particular preferred embodiment. R2 does not hereby waive any applicable work product protections.

For the avoidance of doubt, the claim elements charted herein are software limitations and/or associated with software limitations for the purposes of P.R. 3-1(g), and to the extent that such elements are performed by hardware, firmware, software, and/or any combination thereof, the particular details are in Databricks’ possession, custody, and control.

References Cited

The following chart presents R2 Solutions’ analysis of the Accused Instrumentalities based on publicly available information and the open-source code produced by Databricks. The citations in the chart refer to the following documents, which are incorporated by reference as if fully set forth herein.

This infringement analysis is preliminary, and R2’s investigation into Databricks’ infringement is ongoing. R2 reserves the right to provide additional theories under which Databricks products/systems infringe this patent.

• “Platform features,” https://www.databricks.com/product/pricing/platform-addons.
• “Apache Spark on Databricks,” https://docs.databricks.com/en/spark/index.html.
• “Apache Spark FAQ,” https://spark.apache.org/faq.html.
• “RDD Programming Guide,” http://spark.apache.org/docs/latest/rdd-programming-guide.html.
• “Spark in Action,” https://dokumen.pub/qdownload/spark-in-action-2nbsped-9781617295522.html.
• “Learning Spark,” https://cs.famaf.unc.edu.ar/~damian/tmp/bib/Learning_Spark_Lightning-Fast_Big_Data_Analysis.pdf.
• “Spark RDD Transformations with examples,” https://sparkbyexamples.com/apache-spark-rdd/spark-rdd-transformations/.
• “Apache Spark Quick Start Guide,” https://learning.oreilly.com/library/view/apache-spark-quick/9781789349108/0ee1a5e2-09d0-49f0-99f5-9dee8336258d.xhtml.
• “Spark SQL Shuffle Partitions,” https://sparkbyexamples.com/spark/spark-shuffle-partitions/.
• “Apache Spark Map vs. FlatMap Operation,” https://data-flair.training/blogs/apache-spark-map-vs-flatmap/.
• “Hints,” https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-hints.html.
• “Spark SQL,” https://kks32-courses.gitbook.io/data-analytics/spark/spark-sql.
• “Spark SQL, DataFrames and Datasets Guide,” https://spark.apache.org/docs/2.2.0/sql-programming-guide.html.
• “Delta Lake on Databricks,” https://www.databricks.com/product/delta-lake-on-databricks.
• “ETL,” https://www.databricks.com/glossary/extract-transform-load.
• “Transform Data with Delta Live Tables,” https://docs.databricks.com/en/delta-live-tables/transform.html.
• “Chapter 4. Reductions in Spark,” https://www.oreilly.com/library/view/data-algorithms-with/9781492082378/ch04.html.
• Open Source Spark Code**

**NOTE: citations to file paths (e.g., “org/apache/spark/Partitioner.scala”, etc.) throughout have a root address of https://github.com/apache/spark/blob/master/core/src/main/scala. Thus, as a few examples:
• org/apache/spark/rdd/PairRDDFunctions.scala refers to https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala
• org/apache/spark/Partitioner.scala refers to https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/Partitioner.scala

U.S. Patent No. 8,190,610

Title:           MAPREDUCE FOR DISTRIBUTED DATABASE PROCESSING
Priority Date:   October 5, 2006
Filing Date:     October 5, 2006
Issue Date:      May 29, 2012
Expiration Date: October 14, 2029

Claim 1

1[pre] A method of processing data of a data set over a distributed system, wherein the data set comprises a plurality of data groups, the method comprising:

1[a] partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function’s corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group,

1[b] wherein the data of a first data group has a different schema than the data of a second data group and the data of the first data group is mapped differently than the data of the second data group so that different lists of values are output for the corresponding different intermediate data, wherein the different schema and corresponding different intermediate data have a key in common; and

1[c] reducing the intermediate data for the data groups to at least one output data group, including processing the intermediate data for each data group in a manner that is defined to correspond to that data group, so as to result in a merging of the corresponding different intermediate data based on the key in common,

1[d] wherein the mapping and reducing operations are performed by a distributed system.

Claim Chart – Claim 1

Claim Feature: 1 [pre] A method of processing data of a data set over a distributed system, wherein the data set comprises a plurality of data groups, the method comprising:

Evidence: Databricks infringes Claim 1 of the ’610 Patent. Each of the Accused Instrumentalities performs a method of processing data of a data set over a distributed system, wherein the data set comprises a plurality of data groups, the method comprising the steps discussed herein. The Accused Instrumentalities are based on Apache Spark, Delta Lake, and/or other similar functionality which enables processing of data of a data set over a distributed system, wherein the data set comprises a plurality of data groups. The particular details are within Databricks’ possession, custody, and control.

[Screenshots: Platform Features; Apache Spark on Databricks]

Claim Chart – Claim 1

Claim Feature: 1 [pre] A method of processing data of a data set over a distributed system, wherein the data set comprises a plurality of data groups, the method comprising:

Evidence: The Accused Instrumentalities implement Spark and/or other similar functionality which enables processing of data of a data set over a distributed system. Users can submit jobs to a Spark cluster using the spark-submit tool.

[Screenshot: bin/spark-submit]

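For illustration, a minimal sketch of a self-contained Spark application of the kind that could be packaged and submitted with spark-submit; the application name, class, and master URL below are hypothetical and not taken from the contentions:

// MinimalApp.scala -- hypothetical example application.
// It could be submitted to a cluster with, e.g.:
//   bin/spark-submit --class example.MinimalApp --master spark://host:7077 minimal-app.jar
package example

import org.apache.spark.sql.SparkSession

object MinimalApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("MinimalApp").getOrCreate()
    val sc = spark.sparkContext
    // Distribute a small collection across the cluster and run a simple job.
    val count = sc.parallelize(1 to 1000).count()
    println(s"count = $count")
    spark.stop()
  }
}
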
Claim Chart – Claim 1

Claim Feature: 1 [pre] A method of processing data of a data set over a distributed system, wherein the data set comprises a plurality of data groups, the method comprising:

Evidence: The SparkSubmit class is the main entry point for job submissions to a Spark cluster.

org/apache/spark/deploy/SparkSubmit.scala

Claim Chart – Claim 1

Claim Feature: 1 [pre] A method of processing data of a data set over a distributed system, wherein the data set comprises a plurality of data groups, the method comprising:

Evidence: See, as further examples,

• org/apache/spark/SparkContext.scala
• org/apache/spark/executor/Executor.scala
• org/apache/spark/scheduler/DAGScheduler.scala
• org/apache/spark/scheduler/TaskSchedulerImpl.scala
• org/apache/spark/scheduler/TaskSetManager.scala

The particular details are within Databricks’ possession, custody, and control.

Claim Chart – Claim 1

Claim Feature: 1 [pre] A method of processing data of a data set over a distributed system, wherein the data set comprises a plurality of data groups, the method comprising:

Evidence: As another example, end users can use the spark-shell tool for interactive analysis.

[Screenshot: bin/spark-shell]

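As a sketch of such an interactive session (the data and the printed result are illustrative; `sc` and `spark` are pre-defined by the REPL):

// Launched with bin/spark-shell.
scala> val words = sc.parallelize(Seq("a", "b", "a"))
scala> words.map(w => (w, 1)).reduceByKey(_ + _).collect()
res0: Array[(String, Int)] = Array((a,2), (b,1))   // element ordering may vary
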
Claim Chart – Claim 1

Claim Feature: 1 [pre] A method of processing data of a data set over a distributed system, wherein the data set comprises a plurality of data groups, the method comprising:

Evidence: [Screenshot: org/apache/spark/repl/Main.scala]

Claim Chart – Claim 1

Claim Feature: 1 [pre] A method of processing data of a data set over a distributed system, wherein the data set comprises a plurality of data groups, the method comprising:

Evidence: See, as further examples,

• org/apache/spark/repl/SparkILoop.scala
• org/apache/spark/sql/SparkSession.scala
• org/apache/spark/SparkContext.scala
• org/apache/spark/sql/catalyst/parser/AbstractSqlParser.scala (e.g., parsePlan)
• org/apache/spark/sql/Dataset.scala (e.g., join(right: Dataset[_], usingColumns: Seq[String], joinType: String))
• org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala (e.g., "case class Join(...)")

The particular details are within Databricks’ possession, custody, and control.

Claim Chart – Claim 1

Claim Feature: 1 [pre] A method of processing data of a data set over a distributed system, wherein the data set comprises a plurality of data groups, the method comprising:

Evidence: The Accused Instrumentalities implement Spark and/or other similar functionality which enables processing of data of a data set over a distributed system, wherein the data set comprises a plurality of data groups. For example, Spark utilizes RDDs which, in one example, are groups of data with, upon information and belief, a mechanism to identify data from that group. The particular details are within Databricks’ possession, custody, and control.

[Screenshots: Apache Spark FAQ; RDD Programming Guide]

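As an illustrative sketch of one such identification mechanism (the file path and name below are hypothetical): each RDD created in a Spark application carries a unique id and an optional human-readable name.

// Every RDD has an id unique within its SparkContext, and can be named.
val sales = sc.textFile("hdfs:///data/sales.csv")   // hypothetical input path
sales.setName("sales-group")
println(sales.id)    // e.g. 3 -- unique RDD id within this SparkContext
println(sales.name)  // "sales-group"
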
Claim Chart – Claim 1

Claim Feature: 1 [pre] A method of processing data of a data set over a distributed system, wherein the data set comprises a plurality of data groups, the method comprising:

Evidence: In another example, a mechanism to identify data from Spark RDDs is demonstrated by Spark’s “hints” (e.g., partitioning hints, join hints, etc.). The particular details are within Databricks’ possession, custody, and control.

[Screenshot: Hints]

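For illustration, partitioning and join hints can be written in Spark SQL or through the Dataset API; the table and column names below are hypothetical:

// SQL partitioning hint: repartition the scanned data by a key column.
spark.sql("SELECT /*+ REPARTITION(8, user_id) */ * FROM events")

// Join hint through the Dataset API: broadcast the smaller side.
// `events` and `users` are hypothetical DataFrames sharing a user_id column.
val joined = events.join(users.hint("broadcast"), "user_id")
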
Claim Chart – Claim 1

Claim Feature: 1 [pre] A method of processing data of a data set over a distributed system, wherein the data set comprises a plurality of data groups, the method comprising:

Evidence: The Accused Instrumentalities implement Spark and/or other similar functionality which enables processing of data of a data set over a distributed system, wherein the data set comprises a plurality of data groups. The particular details are within Databricks’ possession, custody, and control.

[Screenshots: Platform Features; Delta Lake on Databricks]

Claim Chart – Claim 1

Claim Feature: 1 [pre] A method of processing data of a data set over a distributed system, wherein the data set comprises a plurality of data groups, the method comprising:

Evidence: For example, an RDD is the fundamental data structure that represents an immutable, distributed collection of objects that can be processed in parallel across a cluster. It is broken into a set of partitions. A partition is a fundamental unit of parallelism in Spark and represents a logical chunk of an RDD (Resilient Distributed Dataset).

org/apache/spark/rdd/RDD.scala

/**
 * A Resilient Distributed Dataset (RDD), the basic abstraction in Spark. Represents an immutable,
 * partitioned collection of elements that can be operated on in parallel. This class contains the
 * basic operations available on all RDDs, such as `map`, `filter`, and `persist`. In addition,
 * [[org.apache.spark.rdd.PairRDDFunctions]] contains operations available only on RDDs of key-value
 * pairs, such as `groupByKey` and `join`;
 * [[org.apache.spark.rdd.DoubleRDDFunctions]] contains operations available only on RDDs of
 * Doubles; and
 * [[org.apache.spark.rdd.SequenceFileRDDFunctions]] contains operations available on RDDs that
 * can be saved as SequenceFiles.
 * All operations are automatically available on any RDD of the right type (e.g. RDD[(Int, Int)])
 * through implicit.
 *
 * Internally, each RDD is characterized by five main properties:
 *
 *  - A list of partitions
 *  - A function for computing each split
 *  - A list of dependencies on other RDDs
 *  - Optionally, a Partitioner for key-value RDDs (e.g. to say that the RDD is hash-partitioned)
 *  - Optionally, a list of preferred locations to compute each split on (e.g. block locations for
 *    an HDFS file)
 *
 * All of the scheduling and execution in Spark is done based on these methods, allowing each RDD
 * to implement its own way of computing itself. Indeed, users can implement custom RDDs (e.g. for
 * reading data from a new storage system) by overriding these functions. Please refer to the
 * <a href="http://people.csail.mit.edu/matei/papers/2012/nsdi_spark.pdf">Spark paper</a>
 * for more details on RDD internals.
 */
abstract class RDD[T: ClassTag](
    @transient private var _sc: SparkContext,
    @transient private var deps: Seq[Dependency[_]]
  ) extends Serializable with Logging {

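A brief sketch of how this partitioned structure is visible through the public RDD API (values illustrative):

// An RDD is split into partitions, exposed directly by the API.
val rdd = sc.parallelize(1 to 100, numSlices = 4)
println(rdd.getNumPartitions)               // 4
println(rdd.partitions.map(_.index).toSeq)  // 0, 1, 2, 3

// Key-value RDDs may additionally carry a Partitioner (cf. the class comment above).
val byKey = rdd.map(i => (i % 4, i)).partitionBy(new org.apache.spark.HashPartitioner(4))
println(byKey.partitioner)                  // Some(HashPartitioner)
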
Claim Chart – Claim 1

Claim Feature: 1 [pre] A method of processing data of a data set over a distributed system, wherein the data set comprises a plurality of data groups, the method comprising:

Evidence: See, as further examples,

• org/apache/spark/Partitioner.scala
• org/apache/spark/HashPartitioner.scala
• org/apache/spark/RangePartitioner.scala
• org/apache/spark/sql/catalyst/analysis/ResolveHints.scala
• org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala

The particular details are within Databricks’ possession, custody, and control.

Claim Chart – Claim 1

Claim Feature: 1 [a] partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function’s corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group,

Evidence: The method performed by each of the Accused Instrumentalities includes partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function’s corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group. For example, in the Accused Instrumentalities that implement Spark and/or other similar functionality, a data group is partitioned into a plurality of elements and distributed across nodes. This distribution of elements is called a resilient distributed dataset (RDD). The particular details are within Databricks’ possession, custody, and control.

[Screenshot: RDD Programming Guide]

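As an illustrative sketch of this partitioning (data values hypothetical): a local data group can be turned into an RDD with an explicit number of partitions, and each element then lives in exactly one partition.

// A data group (here a local collection; a file or table in practice)
// is partitioned and distributed across the cluster as an RDD.
val data = Seq(("k1", 1), ("k2", 2), ("k1", 3), ("k3", 4))
val rdd = sc.parallelize(data, numSlices = 2)   // two partitions

// glom() groups the elements of each partition into an array,
// making the partition boundaries visible.
rdd.glom().collect().foreach(p => println(p.mkString(", ")))
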
Claim Chart – Claim 1

Claim Feature: 1 [a] partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function’s corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group,

Evidence: [Screenshot: Spark in Action]

Claim Chart – Claim 1

Claim Feature: 1 [a] partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function’s corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group,

Evidence: In the Accused Instrumentalities that implement Spark and/or other similar functionality, the data partitions each have a plurality of key-value pairs. For example, one form of the RDDs is a pair RDD, which includes key/value pairs. The particular details are within Databricks’ possession, custody, and control.

[Screenshots: Learning Spark; Spark in Action]

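As a sketch, a pair RDD is simply an RDD of two-element tuples, typically built with a map; the input lines below are hypothetical:

// Build a pair RDD (key-value pairs) from a plain RDD of strings.
val lines = sc.parallelize(Seq("a,1", "b,2", "a,3"))
val pairs = lines.map { line =>
  val parts = line.split(",")
  (parts(0), parts(1).toInt)        // (key, value)
}
// Key-based operations from PairRDDFunctions become available implicitly:
pairs.groupByKey().collect()        // e.g. Array((a, [1, 3]), (b, [2]))
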
Claim Chart – Claim 1

Claim Feature: 1 [a] partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function’s corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group,

Evidence: For example, partitioning pair RDDs is reflected in the Spark source code.

org/apache/spark/rdd/RDD.scala

/**
 * An identifier for a partition in an RDD.
 */
trait Partition extends Serializable {
  /**
   * Get the partition's index within its parent RDD
   */
  def index: Int

  // A better default implementation of HashCode
  override def hashCode(): Int = index

  override def equals(other: Any): Boolean = super.equals(other)
}

org/apache/spark/rdd/RDD.scala

/**
 * A Resilient Distributed Dataset (RDD), the basic abstraction in Spark. Represents an immutable,
 * partitioned collection of elements that can be operated on in parallel. This class contains the
 * basic operations available on all RDDs, such as `map`, `filter`, and `persist`. In addition,
 * [[org.apache.spark.rdd.PairRDDFunctions]] contains operations available only on RDDs of key-value
 * pairs, such as `groupByKey` and `join`;
 * [[org.apache.spark.rdd.DoubleRDDFunctions]] contains operations available only on RDDs of
 * Doubles; and
 * [[org.apache.spark.rdd.SequenceFileRDDFunctions]] contains operations available on RDDs that
 * can be saved as SequenceFiles.
 * All operations are automatically available on any RDD of the right type (e.g. RDD[(Int, Int)])
 * through implicit.
 *
 * Internally, each RDD is characterized by five main properties:
 *
 *  - A list of partitions
 *  - A function for computing each split
 *  - A list of dependencies on other RDDs
 *  - Optionally, a Partitioner for key-value RDDs (e.g. to say that the RDD is hash-partitioned)
 *  - Optionally, a list of preferred locations to compute each split on (e.g. block locations for
 *    an HDFS file)
 *
 * All of the scheduling and execution in Spark is done based on these methods, allowing each RDD
 * to implement its own way of computing itself. Indeed, users can implement custom RDDs (e.g. for
 * reading data from a new storage system) by overriding these functions. Please refer to the
 * <a href="http://people.csail.mit.edu/matei/papers/2012/nsdi_spark.pdf">Spark paper</a>
 * for more details on RDD internals.
 */
abstract class RDD[T: ClassTag](
    @transient private var _sc: SparkContext,
    @transient private var deps: Seq[Dependency[_]]
  ) extends Serializable with Logging {

Claim Chart – Claim 1

Claim Feature: 1 [a] partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function’s corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group,

Evidence: See, as further examples,

• org/apache/spark/rdd/RDD.scala
• org/apache/spark/rdd/PairRDDFunctions.scala
• org/apache/spark/rdd/OrderedRDDFunctions.scala

The particular details are within Databricks’ possession, custody, and control.

Claim Chart – Claim 1

Claim Feature: 1 [a] partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function’s corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group,

Evidence: In the Accused Instrumentalities that implement Spark and/or other similar functionality, each data partition is provided to a selected one of a plurality of mapping functions. For example, the RDDs support two types of data operations: transformations and actions. Mapping functions are examples of transformation functions. The particular details are within Databricks’ possession, custody, and control.

[Screenshot: RDD Programming Guide]

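A minimal sketch of the transformation/action distinction (values illustrative): transformations such as map and filter lazily define new RDDs, while actions such as count trigger execution.

val nums = sc.parallelize(1 to 10)
val squares = nums.map(n => n * n)      // transformation: nothing runs yet
val even = squares.filter(_ % 2 == 0)   // transformation: still lazy
println(even.count())                   // action: a job runs; prints 5
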
Claim Chart – Claim 1

Claim Feature: 1 [a] partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function’s corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group,

Evidence: In the Accused Instrumentalities that implement Spark and/or other similar functionality, the transformation operations include narrow and wide transformations. Wide transformations (also called “shuffles”) operate on data from multiple partitions. Wide transformations include aggregateByKey, reduceByKey, etc. The particular details are within Databricks’ possession, custody, and control.

[Screenshot: Spark RDD Transformations]

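For illustration, a wide transformation such as reduceByKey must bring together values for the same key that start on different partitions (data and output illustrative):

// reduceByKey is a wide (shuffle) transformation.
val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)), numSlices = 2)
val summed = pairs.reduceByKey(_ + _)   // shuffles so each key's values meet
summed.collect()                        // e.g. Array((a,4), (b,2))
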
Claim Chart – Claim 1

Claim Feature: 1 [a] partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function’s corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group,

Evidence: [Screenshot: Apache Spark Quick Start Guide]

Claim Chart – Claim 1

Claim Feature: 1 [a] partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function’s corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group,

Evidence: In the Accused Instrumentalities that implement Spark and/or other similar functionality, when executing a shuffle transformation, a map task is first run on all data partitions. The particular details are within Databricks’ possession, custody, and control.

[Screenshot: Spark SQL Shuffle Partitions]

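As a sketch of the cited configuration (settings and printed value illustrative): in DataFrame/SQL execution, the map side runs one task per input partition, and spark.sql.shuffle.partitions controls the number of reduce-side partitions created by the shuffle.

import org.apache.spark.sql.functions.col

spark.conf.set("spark.sql.adaptive.enabled", "false")    // keep the count deterministic
spark.conf.set("spark.sql.shuffle.partitions", "8")

val counts = spark.range(0, 1000).groupBy((col("id") % 4).as("bucket")).count()
println(counts.rdd.getNumPartitions)                     // 8: one per shuffle partition
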
Claim Chart – Claim 1

Claim Feature: 1 [a] partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function’s corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group,

Evidence: For example, ResultTask.scala generates a result task for each partition that is being processed within the Spark architecture, reflecting that Spark partitions data so that it can, e.g., be fed into mapping functions. Specifically, the DAGScheduler creates a ResultTask for each input partition to track work for each stage of the computation. The runTask function of that ResultTask is responsible for executing the distributed task on each input partition.

org/apache/spark/scheduler/ResultTask.scala

private[spark] class ResultTask[T, U](
    stageId: Int,
    stageAttemptId: Int,
    taskBinary: Broadcast[Array[Byte]],
    partition: Partition,
    numPartitions: Int,
    locs: Seq[TaskLocation],
    val outputId: Int,
    artifacts: JobArtifactSet,
    localProperties: Properties,
    serializedTaskMetrics: Array[Byte],
    jobId: Option[Int] = None,
    appId: Option[String] = None,
    appAttemptId: Option[String] = None,
    isBarrier: Boolean = false)
  extends Task[U](stageId, stageAttemptId, partition.index, numPartitions, artifacts,
    localProperties, serializedTaskMetrics, jobId, appId, appAttemptId, isBarrier)
  with Serializable {

  @transient private[this] val preferredLocs: Seq[TaskLocation] = {
    if (locs == null) Nil else locs.distinct
  }

  override def runTask(context: TaskContext): U = {
    // Deserialize the RDD and the func using the broadcast variables.
    val threadMXBean = ManagementFactory.getThreadMXBean
    val deserializeStartTimeNs = System.nanoTime()
    val deserializeStartCpuTime = if (threadMXBean.isCurrentThreadCpuTimeSupported) {
      threadMXBean.getCurrentThreadCpuTime
    } else 0L
    val ser = SparkEnv.get.closureSerializer.newInstance()
    val (rdd, func) = ser.deserialize[(RDD[T], (TaskContext, Iterator[T]) => U)](
      ByteBuffer.wrap(taskBinary.value), Thread.currentThread.getContextClassLoader)
    _executorDeserializeTimeNs = System.nanoTime() - deserializeStartTimeNs
    _executorDeserializeCpuTime = if (threadMXBean.isCurrentThreadCpuTimeSupported) {
      threadMXBean.getCurrentThreadCpuTime - deserializeStartCpuTime
    } else 0L

    func(context, rdd.iterator(partition, context))
  }

Claim Chart – Claim 1

Claim Feature: 1 [a] partitioning the data of each one of the data groups into a plurality of data partitions that each have a plurality of key-value pairs and providing each data partition to a selected one of a plurality of mapping functions that are each user-configurable to independently output a plurality of lists of values for each of a set of keys found in such map function’s corresponding data partition to form corresponding intermediate data for that data group and identifiable to that data group,

Evidence: In another example, the EnsureRequirements class ensures that input data is partitioned according to the requirements of later mapping functions by inserting Shuffle mapping operators where necessary.

/**
 * Ensures that the [[org.apache.spark.sql.catalyst.plans.physical.Partitioning Partitioning]]
 * of input data meets the
 * [[org.apache.spark.sql.catalyst.plans.physical.Distribution Distribution]] requirements for
 * each operator by inserting [[ShuffleExchangeExec]] Operators where required. Also ensure that
 * the input partition ordering requirements are met.
 */

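For illustration, the effect of this rule is visible in a physical plan: a join whose inputs are not already partitioned by the join key gets shuffle Exchange operators inserted beneath it. A sketch, with hypothetical column names and an abbreviated, illustrative plan (the broadcast threshold is disabled so the example produces a shuffle join, and 200 is the default spark.sql.shuffle.partitions):

spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")  // force a shuffle join
val left  = spark.range(100).withColumnRenamed("id", "k")
val right = spark.range(100).withColumnRenamed("id", "k")
left.join(right, "k").explain()
// Expected shape (abbreviated):
//   SortMergeJoin [k], [k]
//   +- Exchange hashpartitioning(k, 200)   <- inserted to meet distribution requirements
//   ...
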
