`
`(12) United States Patent
`Duffy et al.
`
`(10) Patent No.:
`(45) Date of Patent:
`
`US 8,352.456 B2
`Jan. 8, 2013
`
`(*) Notice:
`
`(54) PRODUCER/CONSUMER OPTIMIZATION
`(75) Inventors: John J. Duffy, Renton, WA (US);
`Henricus Johannes Maria Meijer,
`Mercer Island, WA (US)
`(73) Assignee: Microsoft Corporation, Redmond, WA
`(US)
`Subject to any disclaimer, the term of this
`patent is extended or adjusted under 35
`U.S.C. 154(b) by 590 days.
`(21) Appl. No.: 11/747,772
`(22) Filed:
`May 11, 2007
`(65)
`Prior Publication Data
`US 2008/O281786 A1
`Nov. 13, 2008
`
`(51) Int. Cl.
`(2006.01)
`G06F 7700
`(2006.01)
`G06F 7/30
`(52) U.S. Cl. ........................................ 707/713; 707/719
`(58) Field of Classification Search .................. 707/713,
`707/719,999.002
`See application file for complete search history.
`
`(56)
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`5,590,319 A 12/1996 Cohen et al.
`5,857, 180 A
`1/1999 Hallmarket al.
`6,009,265 A 12/1999 Huang et al.
`6,625,593 B1
`9/2003 Leung et al.
`6,968.335 B2 11/2005 Bayliss et al.
`7,051,034 B1
`5, 2006 Ghosh et al.
`7,146,365 B2 12/2006 Allen et al.
`2005/O165798 A1
`7/2005 Cherkauer et al.
`
`OTHER PUBLICATIONS
`Ganguly et al., “Query Optimization for Parallel Execution'. Pro
`ceedings of the 1992 ACM SIGMOD international conference on
`Management of data, p. 9-18, Jun. 2-5, 1992, San Diego, California,
`United States.
`
`Graefe, “Volcano—An Extensible and Parallel Query Evaluation
`System'. IEEE Transactions on Knowledge and Data Engineering.
`vol. 6, No. I. pp. 120-135, Feb. 1994, IEEE.*
`Bouganim et al., “A Dynamic Query Processing Architecture for
`Data Integration Systems'. Bulletin of the IEEE Computer Society
`Technical Committee on Data Engineering, 2000, IEEE.*
`Bala et al., “Dynamo: a transparent dynamic optimization system'.
`PLDI 00 Proceedings of the ACM SIGPLAN 2000 conference on
`Programming language design and implementation, pp. 1-12, 2000,
`ACM.
`Girbal et al., “Semi-Automatic Composition of Loop Transforma
`tions for Deep Parallelism and Memory Hierarchies”. International
`Journal of Parallel Programming, vol. 34, No. 3, pp. 261-317, 2006,
`Spreinger Verlag Berlin Heidelberg.*
`Qasem et al., “Profitable loop fusion and tiling using model-driven
`empirical search”, ICS '06 Proceedings of the 20th annual interna
`tional conference on Supercomputing, pp. 249-256, 2006, ACM.*
`Pouchetaet al., “Combined Iterative and Model-driven Optimization
`in an Automatic Parallelization Framework”. SC 10 Proceedings of
`the 2010 ACM/IEEE International Conference for High Performance
`Computing, Networking, Storage and Analysis, pp. 1-11, 2010,
`ACM.
`Ganguly, et al. "Query Optimization for Parallel Execution.”
`Hewlett-Packard Laboratories, http://delivery.acm.org/10.1145/
`140000/130291/p9-ganguly.pdf?key1=130291&key2=3748847311
`&col=GUIDE&d=GUIDE&CFID=65859593
`&CFTOKEN=60779184, Jun. 1992, last accessed Feb. 14, 2007, 10
`pageS.
`
`(Continued)
`Primary Examiner — Michael Hicks
`(57)
`ABSTRACT
`Systems and methods facilitate efficient data processing in a
`computer environment. Data producers and consumers are
`considered in aggregate rather than in isolation. In one
`instance, interaction between data producers and consumers
`is improved by integrating producers and consumers. Opti
`mization can Subsequently be performed over the combina
`tion to produce Synergistic results.
`
`12 Claims, 13 Drawing Sheets
`
`
`
`
`
`PARTITION
`DATA
`
`
`
`
`
`PRODUCER
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`IDENTIFY ONE ORMORE CONSUMERS
`ASSOCATED WITHA PRODUCER
`
`1110
`
`IDENTIFY CONSUMERACTIONS
`
`MODIFY THE PRODUCER ASA
`FUNCTION OF CONSUMERACTIONS
`
`1120
`
`1130
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2174, p. 1
`
`
`
`US 8,352.456 B2
`Page 2
`
`OTHER PUBLICATIONS
`
`Lu, et al. "Optimization of Multi-Way Join Queries for Parallel
`Execution.” Proceedings of the 17th International Conference on
`Very Large Data Bases, http://www.sigmod.org/vldb/conf 1991/
`P549. PDF, Sep.1991, Barcelona, Spain, last accessed Feb. 14, 2007,
`12 pages.
`
`Ayres, et al. "An Extensible Query Execution Engine for Supporting
`New Query Execution Models.” http://icwww.epfl.ch/publications/
`documents/IC TECH REPORT 2005034.pdf, last accessed Feb.
`14, 2007, 16 pages.
`
`* cited by examiner
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2174, p. 2
`
`
`
`U.S. Patent
`
`Jan. 8, 2013
`
`Sheet 1 of 13
`
`US 8,352.456 B2
`
`100
`
`Y
`
`
`
`PRODUCER
`COMPONENT
`
`CONSUMER
`COMPONENT
`
`PROGRAM CODE
`
`INTERFACE
`COMPONENT
`
`OPTIMIZATION
`COMPONENT
`
`Fig. 1
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2174, p. 3
`
`
`
`U.S. Patent
`
`Jan. 8, 2013
`
`Sheet 2 of 13
`
`US 8,352.456 B2
`
`
`
`130
`
`INTEGRATION
`COMPONENT
`
`EXECUTION
`STRATEGY
`COMPONENT
`
`OPTIMIZATION
`COMPONENT
`
`COST
`COMPONENT
`
`Fig. 2
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2174, p. 4
`
`
`
`U.S. Patent
`
`Jan. 8, 2013
`
`Sheet 3 of 13
`
`US 8,352.456 B2
`
`
`
`130
`
`INTEGRATION
`COMPONENT
`
`QUERY PLAN
`COMPONENT
`
`OPTIMIZATION
`COMPONENT
`
`COST
`COMPONENT
`
`Fig. 3
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2174, p. 5
`
`
`
`US. Patent
`
`Jan.8,2013
`
`Sheet4of13
`
`US 8,352,456 B2
`
`
`
`MNEDmZOU
`
`v.5
`
`mmUDQOMm
`
`mammg
`
`HMSC5
`
`ZOHEHm‘fim
`
`<F<Q
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2174, p. 6
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2174, p. 6
`
`
`
`
`
`US. Patent
`
`Jan 8,2013
`
`mf05a
`
`CuU
`
`00
`
`4,253a
`
`mMMEDmZOU
`
`mmEDmZOQ
`
`MmEDmZOO
`
`MmEDmZOO
`
`
`
`2
`
`B.xm”a
`
`MMUDDOMQ
`
`vaqmm
`
`Emma
`
`PUmAMm
`
`Mimi?»
`
`Homw—m—m
`
`mam—:3
`
`HQmAm—m
`
`Emma
`
`ZOEEUm/wm
`
`<H<Q
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2174, p. 7
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2174, p. 7
`
`
`
`
`U.S. Patent
`
`US 8,352.456 B2
`
`
`
`YHTEIVNÍCI SNOO
`
`LOEITIGIS
`
`JLOHTEIS NA
`
`
`
`
`
`
`
`
`
`GIOVLS EINITIHd|Id]
`
`|
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2174, p. 8
`
`
`
`U.S. Patent
`
`Jan. 8, 2013
`
`Sheet 7 of 13
`
`US 8,352.456 B2
`
`
`
`130
`
`CONSUMER
`ANALYSIS
`COMPONENT
`
`PRODUCER
`MODIFICATION
`COMPONENT
`
`OPTIMIZATION COMPONENT
`
`Fig. 7
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2174, p. 9
`
`
`
`U.S. Patent
`
`Jan. 8, 2013
`
`Sheet 8 of 13
`
`US 8,352.456 B2
`
`-800
`
`QUERY
`
`
`
`810
`
`DATABASE
`INTERFACE
`COMPONENT |
`
`RESULT
`- RESULT
`:
`
`RESULT
`
`QUERY
`PROCESSOR
`COMPONENT
`
`DATABASE
`
`Fig. 8
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2174, p. 10
`
`
`
`U.S. Patent
`
`Jan. 8, 2013
`
`Sheet 9 of 13
`
`US 8,352.456 B2
`
`
`
`START
`
`900
`
`Y
`
`IDENTIFY A PROGRAMMATIC PRODUCER
`AND ASSOCIATED CONSUMER
`
`OPTIMIZE INTERACTION BETWEEN THE
`CONSUMER AND THE PRODUCER
`
`910
`
`920
`
`Fig. 9
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2174, p. 11
`
`
`
`U.S. Patent
`
`Jan. 8, 2013
`
`Sheet 10 of 13
`
`US 8,352.456 B2
`
`
`
`START
`
`y
`
`1000
`
`ANALYZE A CODE OF A
`PRODUCER WITH INTEGRATED
`CONSUMER
`
`1010
`
`DETERMINING AN EXECUTION
`STRATEGY BASED ON THE ANALYSIS
`
`1020
`
`EXECUTE THE CODE
`UTILIZING THE STRATEGY
`
`1030
`
`Fig. 10
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2174, p. 12
`
`
`
`U.S. Patent
`
`Jan. 8, 2013
`
`Sheet 11 of 13
`
`US 8,352.456 B2
`
`
`
`1100
`
`Y
`
`IDENTIFY ONE OR MORE CONSUMERS
`ASSOCIATED WITH A PRODUCER
`
`1110
`
`IDENTIFY CONSUMERACTIONS
`
`1120
`
`MODIFY THE PRODUCER AS A
`FUNCTION OF CONSUMERACTIONS
`
`1130
`
`Fig.11
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2174, p. 13
`
`
`
`U.S. Patent
`
`Jan. 8, 2013
`
`Sheet 12 of 13
`
`US 8,352.456 B2
`
`4. 1228
`- - - - -
`APPLICATION(S)
`I
`- - - - - - - -
`
`-
`
`-
`
`-
`
`-
`
`-
`
`
`
`^ 1210
`
`1212
`
`PROCESSING
`UNIT(S)
`
`SYSTEM
`MEMORY
`
`MASS
`STORAGE
`
`INTERFACE
`COMPONENT(S)
`
`INPUT OUTPUT
`
`Fig. 12
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2174, p. 14
`
`
`
`U.S. Patent
`
`Jan. 8, 2013
`
`Sheet 13 of 13
`
`US 8,352.456 B2
`
`
`
`1310
`
`CLIENT(S)
`
`
`
`
`
`
`
`CLIENT
`DATA
`STORE(S)
`
`
`
`1330
`
`SERVER(S)
`
`1340
`
`SERVER
`DATA
`STORE(S)
`
`
`
`
`
`COMMUNICATION
`FRAMEWORK
`
`1350
`
`Fig. 13
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2174, p. 15
`
`
`
`1.
`PRODUCERFCONSUMER OPTIMIZATION
`
`BACKGROUND
`
`Computer programs are groups of instructions that 5
`describe actions to be performed by a computer or other
`processor-based device. When a computer program is loaded
`and executed on computer hardware, the computer will
`behave in a predetermined manner by following the instruc
`tions of the computer program. Accordingly, the computer 10
`becomes a specialized machine that performs the tasks pre
`scribed by the instructions.
`A programmer using one or more programming languages
`creates the instructions comprising a computer program.
`Typically, source code is specified or edited by a programmer 15
`manually and/or with help of an integrated development envi
`ronment (IDE). Subsequently, the source code can be com
`piled or otherwise transformed by another program into com
`puter instructions executable by a computer or like device.
`In Software engineering, a plurality of design patterns are 20
`conventionally utilized in program development. A design
`pattern provides a framework for describing a particular issue
`and Solutions thereto. More specifically, a design pattern is a
`general, repeatable solution for common issues that occur in
`Software design. Among other things, use of design patterns 25
`speeds up development, helps prevent Subtle issues and
`improves program readability and comprehension by those
`familiar with the pattern.
`One basic design pattern is producer/consumer. A pro
`ducer/consumer relationship is one in which a producer gen- 30
`erates data and the consumer uses the data. This pattern is
`utilized in a myriad of different environments for a number of
`processes including, at a higher level, data warehousing for
`cleansing and transforming data and image processing for
`iterative refinement. In fact, the pattern can apply to any 35
`situation in which data is produced and consumed. One par
`ticularly prevalent use case pertains to queries.
`Query execution can be seen as a traditional client/server or
`consumer/producer model where an entity A requests a ser
`vice from another entity B, in this case the retrieval of some 40
`data that satisfies criteria and is in the shape requested. Some
`bi-directional communication mechanism is required Such
`that A can instruct B about its desire and so that B may
`respond to A with the results. The entire result set is returned
`in Some form and thereafter consumed for Some purpose. 45
`Conventional relational database management Software
`(RDBMS) employs cursors for the output interface with
`which to stream query output to the consumer. A cursor is a
`single stream of data that facilitates Supply of data in some
`definite, sequential ordering.
`
`50
`
`SUMMARY
`
`The following presents a simplified Summary in order to
`provide a basic understanding of some aspects of the claimed 55
`Subject matter. This Summary is not an extensive overview. It
`is not intended to identify key/critical elements or to delineate
`the scope of the claimed Subject matter. Its sole purpose is to
`present some concepts in a simplified form as a prelude to the
`more detailed description that is presented later.
`Briefly described, the subject disclosure pertains to opti
`mization of producer/consumer code. Instead of treating pro
`ducers and consumers as blackboxes, activities of a producer
`and one or more associated consumers can be analyzed and
`employed to facilitate efficient execution. In other words, 65
`optimization can be performed on a producer/consumer
`aggregate rather than on the producer and consumer in isola
`
`60
`
`US 8,352,456 B2
`
`2
`tion. Furthermore, the producer and/or consumer can be an
`aggregate of some other producer and/or consumer Such that
`the optimization can be recursive.
`In accordance with one aspect of the disclosure, consumer
`activity can be merged or integrated with producer activity. In
`this case, consumer activity can simply be considered an
`extension of producer activity. As a result, an execution strat
`egy can be generated over the integration or aggregate to
`implement parallelism, among other things. By way of
`example, if the producer corresponds to a query, novel and/or
`conventional query optimization approaches can be utilized
`that cross the producer/consumer boundary.
`According to another aspect of the disclosure, consumer
`activity can be analyzed and employed to streamline producer
`execution alone or in conjunction with other optimization
`techniques. More particularly, a producer can be specialized
`to produce data solely of interest to one or more associated
`COSU.S.
`To the accomplishment of the foregoing and related ends,
`certain illustrative aspects of the claimed Subject matter are
`described herein in connection with the following description
`and the annexed drawings. These aspects are indicative of
`various ways in which the Subject matter may be practiced, all
`of which are intended to be within the scope of the claimed
`Subject matter. Other advantages and novel features may
`become apparent from the following detailed description
`when considered in conjunction with the drawings.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`FIG. 1 is a block diagram of a code optimization system in
`accordance with an aspect of the claimed subject matter.
`FIG. 2 is a block diagram of a representative optimization
`component.
`FIG. 3 is a block diagram of a representative optimization
`component for query processing.
`FIG. 4 is a graphical illustration of a consumer/producer
`processing approach.
`FIG. 5 is a graphical illustration of a consumer/producer
`processing approach utilizing partition parallelism.
`FIG. 6 is a graphical illustration of a consumer/producer
`processing approach utilizing pipeline parallelism.
`FIG. 7 is a block diagram of a representative optimization
`component.
`FIG. 8 is a block diagram of a database management sys
`tem operable to return data for parallel consumption.
`FIG. 9 is a flow chart diagram of a method of optimizing
`producer/consumer interaction.
`FIG. 10 is a flow chart diagram of a method of data pro
`cessing.
`FIG. 11 is a flow chart diagram of a method of code opti
`mization.
`FIG. 12 is a schematic block diagram illustrating a suitable
`operating environment for aspects of the Subject disclosure.
`FIG. 13 is a schematic block diagram of a sample-comput
`ing environment.
`
`DETAILED DESCRIPTION
`
`Systems and methods are described hereinafter concerning
`data processing. Processing efficiency can be improved by
`optimizing interactions between producers and consumers.
`More particularly, producers and consumers can be consid
`ered en masse rather than as isolated, atomic processes. Opti
`mizations can then be performed with respect to both data
`production and consumption resulting in a synergistic result.
`Such optimization can be accomplished via direct program
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2174, p. 16
`
`
`
`15
`
`3
`specification, an application programming interface, a com
`piler and/or a query processor, amongst other mechanisms.
`Various aspects of the subject disclosure are now described
`with reference to the annexed drawings, wherein like numer
`als refer to like or corresponding elements throughout. It
`should be understood, however, that the drawings and
`detailed description relating thereto are not intended to limit
`the claimed subject matter to the particular form disclosed.
`Rather, the intention is to coverall modifications, equivalents
`and alternatives falling within the spirit and scope of the
`claimed Subject matter.
`Referring initially to FIG. 1, a code optimization system
`100 is illustrated in accordance with an aspect of the claimed
`Subject matter. The system includes programmatic code 110
`comprising one or more producer components 112 and con
`Sumer components 114. The producer component 112 pro
`duces or generates data that is consumed or utilized by one or
`more consumer components 114. As will be described further
`infra, consumer components 114 can also act as or include
`functionality of producer components 112 or vice versa, for
`example where there is a sequence of components that receive
`data, perform some action and pass the data to the next com
`ponent. Further yet, the producer component 112 and con
`Sumer component 14 can operate with respect to arbitrary
`data types including, streams, graphs and trees, among others.
`Interface component 120 is operable to receive, retrieve or
`otherwise obtain or identify related producer components 112
`and consumer components 114 within programmatic code
`110. These components or identities thereof can be provided
`to or made accessible by optimization component 130. The
`interface component 120 therefore acts as a conduit between
`the programmatic code 110 and the optimization component
`130. Accordingly, it is also to be noted that the interface
`component 120 and/or portions thereof can facilitate modifi
`cation of the code 110 by optimization component 130.
`The optimization component 130 facilitates improving
`data processing by optimizing producer components 112,
`consumer components 114 and/or interaction between the
`components. Conventionally, producer components 112 and
`consumer components 114 are treated as isolated atomic pro
`cesses. Such treatment is likely a function of traditional
`modular program design and/or presumed communication
`latency or low throughput. However, the conventional treat
`ment can result in Sub-optimal execution in many cases. For
`instance, this is quite limiting for Scenarios with Small dis
`tances or high throughput including in memory, in the same
`process and/or on the same machine. Further, with an increase
`in availability of parallel hardware and decrease in commu
`nication latency, it is important that Software evolve to do
`more things in parallel. That includes consuming items from
`a producer.
`Rather than treating consumer components 114 as black
`boxes, the optimization component 130 is operable to analyze
`consumer activity to facilitate optimization of the producer
`component 112, the consumer component 114 or both. In one
`instance, consumeractivity associated with a producer can be
`utilized to refine data produced by a producer. Additionally or
`alternatively, at least a portion of consumer activity can be
`merged within a producer or at least treated as Such for opti
`mization purposes.
`In essence, structure of a producer component 112 and at
`least one associated consumer component 114 is known or
`can be determined. Based thereon, the producer component
`112 and the consumer component 114 are matched up in a
`way to optimize performance, parallelism and/or data com
`65
`munication. Furthermore, it is to be appreciated that once
`consumeractivity is exposed optimizations can be performed
`
`40
`
`45
`
`50
`
`55
`
`60
`
`US 8,352,456 B2
`
`10
`
`25
`
`30
`
`35
`
`4
`recursively thereon providing deeper optimizations than
`would otherwise have been possible. By way of example,
`where there is a sequence of producers and consumers, the
`consumers can also be producers that can be optimized.
`The interface component 120 and/or optimization compo
`nent 130 can be embodied in any number of systems or
`mechanisms. By way of example, the interface component
`120 and optimization component 130 can be embodied in an
`application programming interface (API) associated with
`producers and consumers. Additionally or alternatively, the
`components 120 and 130 can form part of a compiler such that
`generated code including producers and consumers is opti
`mized for execution. Further yet, the components can form
`part of a query processor and/or associated components.
`However, the claimed subject matter is not limited to these
`exemplary embodiments. Other embodiments are also pos
`sible and are to be deemed with in the scope of innovation.
`Turning attention to FIG. 2, a representative optimization
`component 130 is depicted in accordance with an aspect of
`the disclosure. The optimization component 130 includes an
`integration component 210 to integrate or merge consumer
`activity with producer activity. By integrating producers and
`consumers, a bottleneck can be removed from the process
`wherein generated data is merged or packaged into a data
`structure and transmitted or otherwise identified to the con
`Sumer for processing.
`Integration component 210 can operate over a plurality of
`consumers and producers. By way of example, a sequence of
`producers and consumers can exist where data is transformed
`and passed to the next element that transforms that data and
`passes it on to the next, etc. Essentially, there can be a pipeline
`of consumers that are producers themselves. The integration
`component 210 can integrate or aggregate all these consum
`ers and producers together into one big group Such that opti
`mization can be applied across all consumers and producers.
`For instance, Suppose there is a process that takes an image
`and performs successive transformations thereon. These
`transformations can be glued together into one big process
`such that optimization can be performed with respect to the
`image as a whole instead of separate parts where intermediate
`values are sent to the next black box. More specifically, the
`integration component 210 can Support intra and/or inter
`component optimizations.
`Execution strategy component 220 is operable to optimize
`execution of an integrated producer and consumer. In particu
`lar, the execution strategy component 220 can identify an
`optimized execution strategy and/or modify the code in
`accordance therewith. For instance, the execution strategy
`component 220 can employ conventional optimization
`approaches across a producer/consumer boundary including
`without limitation common Sub-expression elimination and
`operation reordering. While these optimizations are similar to
`what compilers do everyday, they are different in that the
`execution strategy component 220 can take advantage of rich
`semantic knowledge to rewrite higher level operations in
`addition to lower level program Statements.
`Various parallelism approaches can also be employed by
`the execution strategy component 230 including partitioning
`and pipelining. In partitioning, code is replicated, and the
`replicas execute simultaneously over disjoint Subsets of data.
`In one particular instance, multistreaming can be employed
`between producers and consumers to permit parallelism to
`extend from core production operations into consumption
`operations without any Superfluous merge operation or other
`bottleneck, for example. Pipelining or pipeline parallelism
`divides processing tasks into a set of tasks connected in series.
`Tasks in each pipeline are often parallelized for example
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2174, p. 17
`
`
`
`US 8,352,456 B2
`
`10
`
`15
`
`25
`
`30
`
`35
`
`5
`utilizing partitioning. This approach can be employed
`instances where partitioning alone is not efficient because of
`varying costs associated with task execution.
`Cost component 230 identifies execution cost associated
`with producer and/or consumer operations. This information
`can be determined or inferred and supplied to the execution
`strategy component 220 to facilitate identification of an
`appropriate parallel execution strategy or degree of parallel
`ism given the costs and available resources, for instance.
`FIG. 3 depicts a representative optimization component
`130 with respect to a query-processing embodiment.
`Although not limited thereto, produces/consumer data pro
`cessing can apply to query processes, namely generation of
`query results and processing thereof. Such queries can be
`traditional database queries (e.g., relational database query)
`or language integrated queries, among others. Similar to FIG.
`2, FIG.3 includes the integration component 210 for integrat
`ing, in this case, queries and consumers of query results and
`cost component 220 for determining or inferring execution
`cost associated with query and/or consumption operations.
`Further, optimization component 130 can include a query
`plan component 320 that can generate and/or optimize a
`query plan. A query plan defines a query execution strategy
`for retrieving data. In accordance with an aspect of the
`claimed subject matter the query plan component 320 can
`generate and/or optimize plans as a function of consumer
`activity. In other words, consumption action can be incorpo
`rated into query planning optimization itself. Particular plans
`can be generated or modified as a function of execution cost
`associated with producer and/or consumer costs provided by
`the cost component 230.
`In one implementation, the query plan component 320 can
`generate an abstract syntax tree or graph that represents the
`query itself wherein nodes identify query operations and
`edges identify flow between operators. Consumeractivity can
`be incorporated therein and represented as another node in the
`graph or tree. Conventional and/or novel optimization tech
`niques can be applied to the integrated representation and
`subsequently employed during execution via a query proces
`40
`sor, engine, component or the like (not shown). By way of
`example and not limitation, such optimization techniques can
`include standard relational algebra-based rewrite rules, as
`most relational databases are employed against tree-based
`representations of queries.
`FIGS. 4-6 provide exemplary query scenarios to facilitate
`clarity and understanding with respect to aspects of the
`claimed subject matter. It is to be appreciated that the
`examples and associated description are provided solely for
`clarity and understanding and are not meant to limit the scope
`of the claimed subject matter.
`The following exemplary code pertains to a consumer pro
`cessing situation in the context of a language integrated
`query:
`
`45
`
`50
`
`void ProcessCustomer(var c) { /*...* }
`void ProcessXyzCustomers(List<Customers customers) {
`DateTime orderDate = DateTime.Now. Subtract(-30):
`var filtered = from c in customers
`where c. Active &&.
`(c. State == "AK ||c. State == “HI”) &&.
`c.Orders.Count(o => o.OrderDate > orderDate) > 0
`select c;
`foreach (varc in filtered)
`ProcessCustomer(c);
`
`55
`
`60
`
`65
`
`6
`More specifically, the query var filtered from c in customers
`retrieves customers filtered by or including particular prop
`erties (e.g., customer activation, state and orders). Subse
`quently, a loop for each (varc in filtered) performs an action
`on each customer returned by the query, namely ProcessCus
`tomer(c) (implementation omitted).
`Referring to FIG. 4, a conventional approach to executing
`the above exemplary query is depicted. More specifically,
`FIG. 4 illustrates what might occur in a traditional approach
`executing on a four central processing unit (CPU) machine
`with a partition-parallelism approach. As shown, data is
`retrieved from a store and split or partitioned across the four
`CPUs for parallel query processing (Where, Select). Results
`are captured from the parallelized query, merged and Subse
`quently provided to the consumer. Note the loss of parallelism
`as soon as the “end” of the producer is reached via the explicit
`merge. While the query processing is executing across four
`processors all produced data needs to be merged prior to
`delivery to the consumer, which is then executed without
`parallelism. The merge is a bottleneck in this process.
`FIG.5 graphically illustrates an optimized execution plan
`for the exemplary code utilizing partitioning. The limitations
`imposed by the previous approach are unnecessary and can
`hamper any query engine's ability to generate efficient plans
`and decisions during execution. This can be solved by mod
`eling consumer action as another query operation. It is to be
`noted that this can apply to scenarios in which a consumerand
`producer are in the same process, in separate processes or
`even on separate machines. In other words, the distance
`between consumer and producer can be arbitrary. Accord
`ingly, the merge action can be removed and the consumer
`ProcessCustomer() function run in parallel with other query
`operations. Further, the consumer could itself invoke other
`sub-queries or operations that could become part of an overall
`query execution plan.
`In the code, the semicolon between select c and for each
`identifies the split between producer and consumer. In that
`case, a filtered object needs to be created and passed into the
`consumer loop, as previously described. What is desired is to
`fuse the producer and the consumer so that the loop is inte
`grated into the query. This can result be accomplished be
`removing the loop and altering the query such that select c
`becomes select Produceconsumer(c). Now the consumer
`activity can simply appear as another query operation. During
`query plan generation and/or optimization, consumers can be
`partitioned across the four CPUs and operate on a subset of
`data provided by the producers.
`Turning to FIG. 6, another optimization approach is illus
`trated utilizing pipelining. Just as a query plan component
`and/or related components can peek inside of query opera
`tions such as Where and Select to determine and adaptively
`gather selectivity and cost, using this data to make better
`judgments about query execution now and in the future, it
`may also do so with a consumer. Therefore, not only can
`consumer activities be run in parallel, a query planner can
`make decisions based on this. For example, pipelining is often
`used to balance a query with imbalanced costs. This assumes
`that the planner is aware of consumer costs as traditional
`approaches are limited to producer activity. With this
`approach, a planner may choose to introduce new pipeline
`stages including running the consumer as its own pipeline,
`simply because it knows about this extension of the query.
`For example, if the aforementioned exemplary query is to
`be executed on a multiple CPU machine and the consumer
`costs four times the cost of the Where which costs two times
`the cost of the Select portion of the query, a reasonable expec
`tation since production is often an expensive operation, then
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2174, p. 18
`
`
`
`US 8,352,456 B2
`
`10
`
`15
`
`25
`
`7
`pipelining can be used to balance the query as shown in FIG.
`6. This style of optimization is simply not possible with
`conventional techniques and/or approaches.
`Referring to FIG. 7, a representative optimization compo
`nent 130 is illustrated in accordance with an aspect of the
`claimed Subject matter. As illustrated, the optimization com
`ponent 130 includes a consumer analysis component 710.
`The consumer analysis component 710 is a mechanism for
`analyzing activity across one or more consumers. Producer
`modification component 720 is communicatively coupled to
`the consumer analysis component 710 and operable to
`receive, retrieve or otherwise obtain consumer information.
`The producer modification component 720 can then modify
`producer activity as a function of consumer activity. By way
`of example, consider a situation where the producer yields
`customers and the consumer processes customers. If the con
`Sumer analysis component 710 can determine that a set of one
`or more consumers only ever utilizes a customer name, the
`producer modification component 720 can alter the query to
`produce only customer names rather than all customer infor
`mation. Accordingly, large portions of computation may be
`able to be eliminated as a function of consumer activity Such
`as where the producer produces that which the consumer is
`not interested, among other things.
`It should be appreciated that by removing a black box
`distinction between producers and consumers various opti
`mizations can occur. Here, optimization component 130
`facilitates producer modification in light of consumer activ
`ity. This functionality can be provided alone or in combina
`tion with other optimizations including parallelism. For
`example, the functionality captured by the consumer analysis
`component 710 and producer modification component 720
`can be captured and/or performed by execution strategy com
`ponent 220 of FIG. 2 or query plan component 320 of FIG. 3.
`Referring to FIG. 8, a database management system 800 is
`illustrated in accordance with an aspect of the claimed subject
`matter. The database management system 800 includes a
`database interface component 810, query processor compo
`nent 820 and database 830 for processing queries. The data
`base interface component 810 receives, retrieves or otherwise
`obtains a database query and transmits it to the query proces
`sor component 820. The query processor component 820
`process the query against the database 830 and returns results.
`In accordance with one aspect, the query processor compo
`nent 820 can produce results in accordance with a particular
`45
`query plan transmitted with the query or otherwise o