throbber
Continuous Queries over Append-Only
`Databases
`
`Douglas Terry, David Goldberg, David Nichols,
`and Brian Oki
`
`Xerox Corporation
`Palo Alto Research Center
`3333 Coyote Hill Rd.
`Palo Alto, CA 94304
`
`Abstract: In a database to which data is continually added, users
`may wish to issue a permanent query and be notified whenever
`data matches the query. If such continuous queries examine only
`single records, this can be implemented by examining each record
`as it arrives. This is very efficient because only the incoming
`record needs to be scanned. This simple approach does not work
`for queries involving joins or time. The Tapestry system allows
`users to issue such queries over a database of mail and bulletin
`board messages. The user issues a static query, such as "show me
`all messages that have been replied to by Jones," as though the
`database were fixed and unchanging. Tapestry converts the query
`into an incremental query that efficiently finds new matches to the
`original query as new messages are added to the database. This
`paper describes the techniques used in Tapestry, which do not
`depend on triggers and thus be implemented on any commercial
`database that supports SQL. Although Tapestry is designed for fil(cid:173)
`tering mail and news messages, its techniques are applicable to
`any append-only database.
`
`1.0
`
`INTRODUCTION
`
`A new class of queries, continuous queries, are similar to conven(cid:173)
`tional database queries, except that they are issued once and
`henceforth run "continually" over the database. As additions to the
`database result in new query matches, the new results are returned
`to the user or application that issued the query. This paper concen(cid:173)
`trates on the semantics and implementation of continuous queries.
`Continuous queries were developed and incorporated into the Tap(cid:173)
`estry system for filtering streams of electronic documents, such as
`mail messages or news articles. The Tapestry system maintains
`information about a document, such as its author, date, keywords,
`and title, in a database. The database is append-only, that is, new
`documents are added to the database as they arrive and are never
`removed. Continuous queries are used to identify documents of
`interest to particular users. Although the concept of continuous
`queries was developed for Tapestry, it applies to any database that
`is append-only.
`
`Tapestry users desire more elaborate filtering queries than those
`that use only the properties of the individual message, such as
`selecting all messages that were written by a given person or that
`Permission to copy without fee all or part of this material is
`granted provided that the copies are not made or distributed for
`direct commercial advantage, tha ACM copyright notice and the
`title of the publication and its date appear, and notice is given
`that copying is by permission of the Association for Computing
`Machinery. To copy otherwise, or to republish, requires a fee
`and/or specific permission.
`1992 ACM SIGMOD- 6/92/CA, USA
`e 1992 ACM 0-89791-522-4/92/0005/0321...$1.50
`
`321
`
`contain a given keyword [16]. Tapestry's filter queries can also
`select a message based on its relationship to other messages, such
`as the fact that it is a reply to a particular message or that one or
`more messages are replies to it. They can select a message based
`on its age, such as the fact that it is two weeks old and nobody has
`replied to it. They can select a message based on annotations
`attached to the message by one or more users; for example, users
`might vote for messages they like and have such votes registered
`as annotations on the messages. To support these desires, the sys(cid:173)
`tem cannot simply examine each message as it arrives, but needs
`to run arbitrary database queries continuously.
`
`Writing a continuous query should be as easy as writing a con(cid:173)
`ventional query for a relational database. In the Tapestry system,
`users write continuous queries in a special language TQL (Tapes(cid:173)
`try Query Language) that is similar to SQL. These queries are
`written as queries over a static database. This permits a user to try
`out a query by running it as an ad hoc query against the database,
`refine it, and then try it again. Once satisfied with the query, the
`user can install it in the Tapestry system as a continuous query.
`
`For its storage, the Tapestry system uses a commercial relational
`database management system that supports SQL. A straightfor(cid:173)
`ward method of implementing a continuous query over such a
`database is to periodically execute the query, say once every hour.
`Figure 1 outlines the basic algorithm.
`
`Figure 1
`Periodic Query Execution
`FOREVER DO
`Execute Query Q
`Return results to user
`Sleep for some period of time.
`ENDLOOP
`
`While simple to implement, this approach has three main defi(cid:173)
`ciencies:
`• Nondeterministic results. The records selected by a query
`depend on when that query is executed. A query that is exe(cid:173)
`cuted every hour on the hour may produce a different compos(cid:173)
`ite set of results than the same query executed once per day or
`even every hour on the half hour. This means that two users
`with the exact same continuous query could be presented with
`a different set of results.
`• Duplicates. Each time the query is executed the user will see
`all records selected by the query, old as well as new. Since the
`database is append-only, the set of records returned by a query
`will increase steadily over time. In practice, users are only
`interested in the records matching a continuous query that
`have not been previously returned.
`
`PALO ALTO NETWORKS Exhibit 1056 Page 1
`
`

`
`• Inefficiency. Executing the same query over and over again is
`overly expensive. Just as the size of the query's result set
`increases over time, so does the execution cost. Ideally, the cost
`of executing a continuous query should be a function of the
`amount of new data, and not dependent on the size of the whole
`database.
`
`The problem of duplicates can be solved by having the system
`remember the complete set of records that has been returned to
`each user. The system would then take care to only return query
`results that are not in this set. While this approach strictly avoids
`duplicates, it still has efficiency problems. Much of the computa(cid:173)
`tion cost of the query is spent selecting records that are subse(cid:173)
`quently discarded.
`
`Active databases, such as the 51lert system [12], address the ineffi(cid:173)
`ciency problem by using triggers to execute queries over new data
`as it arrives. Continuous queries are similar to the active queries of
`the 51lert system but can be implemented in standard SQL [2].
`Tapestry transforms each user-provided query into an incremental
`query that is run periodically. These incremental queries execute
`efficiently and avoid duplicates, to a large extent, by limiting the
`query to the portion of the database that might newly match the
`query. This is similar to the approach taken by active databases but
`does not require a trigger mechanism. The result of running a
`sequence of incremental queries is the same as executing the orig(cid:173)
`inal user query after every update to the database, but the compu(cid:173)
`tation cost is drastically reduced.
`
`The following section explains in more detail why periodic execu(cid:173)
`tion can yield nondeterministic results and proposes a clean, time(cid:173)
`independent semantics for continuous queries. Sections 4.0 and
`5.0 detail the translation steps needed to convert a general SQL
`query into its associated incremental query. Section 6.0 discusses
`the implementation of incremental queries and their performance
`when run on a sample database. Section 7.0 discusses other
`approaches to supporting continuous queries and related work.
`Section 8.0 suggests applications of continuous queries and future
`work.
`
`2.0 CONTINUOUS SEMANTICS
`
`The most significant problem with simply executing a query peri(cid:173)
`odically is that this can produce nondeterministic results. Consider
`the query: "select messages to which nobody has sent a reply."
`When a message is added to the database, it matches the query.
`However, once a reply message arrives, the message being replied
`to no longer matches the query. If a particular message were to
`arrive in the database at 8:15 and a reply to it arrived at 8:45, then
`the message would not be returned by a system that ran the algo(cid:173)
`rithm in Figure 1 every hour on the hour, but would be returned by
`a system that ran it every hour on the half hour (since the message
`would match at 8:30).
`
`This raises the general question: What are reasonable semantics
`for a query that executes "continuously?" In other words: What
`guarantees can be provided to users about the set of records
`returned by a continuous query?
`
`Users should not need to understand the implementation of the
`system in order to know what results to expect as the result of a
`continuous query. The semantics should be independent of how
`the system operates internally and when it chooses to perform var(cid:173)
`ious operations such as executing queries. Two users with the
`same continuous query should see the same result data. This
`implies that the semantics of continuous queries should be time(cid:173)
`independent.
`
`We suggest that the semantics of a continuous query should be
`defined as follows:
`
`Continuous semantics: the results of a continuous query is the
`set of data that would be returned if the query were executed at
`every instant in time.
`
`This says that the behavior of a continuous query is that it appears
`to be executed continuously by the system. That is, the system
`guarantees to show the user any record that would be selected by
`the query at any time. The system may implement this behavior in
`any number of ways, such as collecting results and presenting
`them to the user periodically, but the actual set of results eventu(cid:173)
`ally seen by the user is well-defined and time-independent.
`
`To be precise, let Q(t) be the set of records returned by the execu(cid:173)
`tion of query Q over the database that existed at time t. That is,
`Q(t) is the result of running Q at time t. Now let QM(t) denote
`the total set of data returned up until time t by executing query Q
`as a continuous query:
`
`Q~t) = UQ(s)
`sSt
`
`(EQ 1)
`
`When a query Q is executed with continuous semantics, it returns
`QM(t), not Q(t).
`
`Continuous queries are qualitatively different from one-time que(cid:173)
`ries. Consider the user who wants to see all the messages that do
`not receive replies. The obvious formulation: "select messages to
`which nobody has sent a reply," when executed as a continuous
`query, would return every message to the user, since every mes(cid:173)
`sage has no replies when it first arrives. This is undoubtedly not
`what the user intended. The problem does not lie with continuous
`semantics, but rather with the user's imprecise specification of his
`continuous query. Finding the messages that never receive a reply
`would require waiting forever, but a short wait will find most mes(cid:173)
`sages that never receive a reply. Thus a more precise query would
`be something like: "select messages that are more than two weeks
`old and to which nobody has sent a reply." This illustrates the
`point that not all database queries are suitable as continuous que(cid:173)
`ries. Nevertheless, continuous queries are a valuable concept.
`Throughout the remainder of this paper, continuous semantics are
`assumed to be the desired semantics for continuous queries.
`
`One very important question remains: Can continuous semantics
`be realized in a practical system? Certainly, running a query at
`every time is not possible, and if it were possible, would not be
`practical. This paper discusses techniques for providing continu(cid:173)
`ous semantics in an effective and efficient manner.
`
`3.0
`
`PROVIDING CONTINUOUS SEMANTICS
`
`The key to providing efficient continuous queries is the following
`observation: If we have a query QM that can compute QM(t) as
`defined above, then the simple technique of periodically executing
`QM and returning the new results yields continuous semantics.
`The frequency with which QM is executed simply affects the size
`of each batch of results, not the collective set of results. Figure 2
`shows a modification to the algorithm in Figure 1 that obeys con(cid:173)
`tinuous semantics. The algorithm keeps track of the last time it
`ran, 't.
`This algorithm works because QM is monotone, that is,
`Q~t1 ) ~ Q~t2) whenever t 1 < t2. Many interesting queries are
`not monotone and are converted to Q M. We call Q M the minimum
`bounding monotone query since it is the smallest monotone query
`that returns all the messages in Q .
`
`322
`
`PALO ALTO NETWORKS Exhibit 1056 Page 2
`
`

`
`Figure2
`Continuous Query Execution using QM.
`Set 1: = -oo
`FOREVER DO
`set t := current time
`Execute queries Q~t) and Q~'t)
`Return Q~t)- QM('t) to user
`set 1: := t
`Sleep for some period of time
`ENDLOOP
`
`Tapestry's approach to implementing continuous queries is two(cid:173)
`fold. First, a query, Q , is converted into the minimum bounding
`monotone query, QM. If the user's query is already monotone then
`Q M is usually the same as Q, and in any event produces the same
`results. Section 4.0 gives the details of how to generate an efficient
`QM.
`Second, tJte monotone query is converted into an incremental
`query, Q' , that can quickly compute an appr~lfimation to
`Q~t)- Q~'t). The queries Q, QM, and {! can all be in
`expressed in SQL.
`
`Incremental queries 5e introduced for performance reasons. An
`incremental query, {! , is parameterized by two tim_ep: the time
`that it was last executed t, and the current time t. {! ( 't, t) is
`intended to return the records that begin matching query Q M in
`the time interval from 't to t. The incremental query works by
`restricting the portion of the database over which it runs to those
`objects that might match the query and have not been previously
`returned. This allows incremental queries to run much more effi(cid:173)
`ciently than queries over the complete database.
`
`An incremental query should obey the following two properties:
`It returns enough: Q~t)- Q~'t) ~ rj('t, t).
`(a)
`It doesn't return too much: Q'('t, t) ~ QM(t).
`(b)
`Ideally, QI should return exactly the new results, but the current
`rewrite rules do not achieve this. Unlike QM, which is exactly the
`minimum bounding n:qnotone query, d is only an approximation
`of QM(t)- Q~'t). {! returns at least the new results, and occa(cid:173)
`sionally returns a past result. Thus, to guarantee that previously
`returned results are not returned to a user again, the system must
`keep track of these results and explicitly filter out duplicates. In
`practice, if users are not bothered by occasional duplicates, then
`the results of the incremental queries can be returned directly to
`users.
`
`As long as the minimum bounding monotone query for Q can be
`obtained and this query can be incrementalized so as to satisfy the
`two properties above, then the incremental query can be executed
`periodically and still guarantee continuous semantics. This is
`because the union of the results of all the incremental queries is
`exactly Q~t):
`QM(tn} = d(-oo, !1} U Q1(t1, t2} U ... U rj(tn-1' tn} (EQ 2)
`which is true because (using (a))
`
`Q~tn} = Q~t1 }- Q~-oo) U
`QM(t2)- QM(t1) U ... U QM(tn)- Q~tn-1)
`
`c. d(-oo, t 1) u QI(tl' t 2) u ... u dUn-1' tn) (EQ
`3
`
`)
`
`and (using (b))
`
`I
`I
`I
`QM(-oo, t1} U QM(tp t2} U ••• U QM(tn-1• tn}
`~ QM(t1} U Q~t2} U ... U Q~tn}
`= QM(tn}
`
`(EQ4)
`
`This indicates an effective strategy for executing a continuous
`query using a conventional relational database manager. The basic
`algorithm is presented in Figure 3. The system runs each incre(cid:173)
`mental query, queues up the results for delivery to users, records
`the time at which each query was run, waits some period of time,
`and then repeats this process using the recorded times as parame(cid:173)
`ters to the incremental queries
`
`Continuous Query Execution
`
`Figure 3
`Set 1: = -oo
`FOREVER DO
`set t := current tir:t)e
`Execute query Q ( 't, t)
`Return result to user
`set 1: := t
`Sleep for some period of time
`ENDLOOP
`
`As mentioned before, in the Tapestry system users write queries in
`a special language TQL (Tapestry Query Language). We have
`developed algorithms for taking a TQL query, transforming it to
`be monotone, incrementalizing that monotone query, and then
`converting it to SQL. Rather than introduce TQL, the following
`sections present versions of the algorithms that translate SQL que(cid:173)
`ries. Because they were designed to work for TQL, the algorithms
`have some restrictions in the SQL environment. The major differ(cid:173)
`ence is that Tapestry queries always want duplicate suppression
`(DISTINCT) because they are always retrieving mail messages.
`While the algorithms below do not always suppress duplicates,
`they often do, and this means they do not support the use of aggre(cid:173)
`gates (such as SUM or COUNT). Another area we have not
`addressed is outer joins. These are areas for future work.
`
`The following sections examine various constructs that can be
`used in SQL and discuss how to generate minimal bounding
`monotone and incremental queries for continuous queries that use
`these constructs. The rules for producing monotone and incremen(cid:173)
`tal queries make two principal assumptions about the database: (1)
`the database is append-only, namely, records are added to the data(cid:173)
`base but no data is deleted or modified, and (2) each table contains
`a timestamp column, called "ts", that indicates when the record
`was added to the database.
`
`4.0 MONOTONE QUERIES
`
`The Class of Non-monotone queries
`4.1
`Without the database being append-only, no query is monotone
`since it is always possible to delete a record that has been previ(cid:173)
`ously returned as the result of a query, thereby reducing the que(cid:173)
`ry's result set. For an append-only database, many common SQL
`queries are monotone. For example, SQL queries that are simply
`boolean predicates over the column values of a single table are
`monotone in nature. Such queries can include the comparison
`operators (=, <, >, ... ) and boolean operators (AND, OR, and
`NOT). The following is an example of a simple query:
`SELECT * FROM tbl
`WHERE
`tbl.fieldl = "Foo" AND NOT tbl.field2 < tbl.field3
`
`323
`
`PALO ALTO NETWORKS Exhibit 1056 Page 3
`
`

`
`This query is monotone because once a record is added to the data(cid:173)
`base, it either satisfies the query or not, and that satisfaction
`doesn't change over time (since the database is append-only).
`
`Queries involving joins are also monotone in nature. Again, this is
`because the database is append-only. Conceptually, a query with a
`join is the same as a query over a single table formed by taking the
`cross product of the joined tables. This single "join" table is
`append-only as long as the base tables are append-only.
`
`Tapestry queries may include the following constructs, which can
`lead to non-monotone queries:
`
`• functions that read the current time
`• subqueries prefaced by "NOT EXISTS"
`
`First consider time. While the original SQL standard does not
`include an explicit data type for storing dates, many versions of
`SQL, such as Sybase's Transact-SQL [15], as well as the proposed
`new ISO SQL standard [2], do support dates. These systems gen(cid:173)
`erally provide functions that read and return the current date and
`time. In Transact-SQL, for example, GetDate() is such a function,
`and the ISO SQL standard uses the variable CURRENT_TIMES(cid:173)
`TAMP. Queries involving calls on such functions are often non(cid:173)
`monotone. The simplest example of a query involving time is
`SELECT * FROM tbl WHERE tbl.field op GetDate()
`When op is <, this query can be illustrated as follows:
`
`FALSE
`
`tbl.field
`
`TRUE
`
`)
`
`The horizontal axis represents time t, (the value returned by Get(cid:173)
`Date()), and the graph represents the boolean value of the query
`tbl.field < GetDate() for a fixed message. It is false when time t is
`less than tbl.field (evaluated for that one fixed message), true when
`t is greater than tbl.field. The fact that this graph is monotone
`increasing translates directly into the query being monotone.
`When op is>, the graph is decreasing, and the query is no longer
`monotone.
`
`TRUE----
`
`FALSE
`
`)
`
`tbl.field
`
`An example of a non-monotone query is "select messages that
`have not expired", or
`SELECT * FROM m WHERE m.expires > GetDate()
`This is not monotone because any message that satisfies the query
`will eventually cease to satisfy it (after it expires).
`
`The graph argument just given suggests that a query is monotone
`if its only reference to time is in subexpressions of the form
`E < GetDate()
`
`or
`
`E~ GetDate()
`
`and likely to be non-monotone if it the comparison operator is>,
`:<::,=,or:¢.
`
`Here E is some date-valued expression, possibly involving fields
`of one or more tables and other built-in functions. The next section
`will show that queries involving the AND and OR of terms of the
`first form are indeed monotone. However, boolean combinations
`of terms of the second form are not always non-monotone. For
`example,
`SELECT * FROM tbl
`WHERE
`(tbl.field > GetDate() AND tbl.string = "base") OR
`(tbl.field ~ GetDate() AND
`tbl.string LIKE "%base")
`
`is monotone, because it can be rewritten as
`SELECT * FROM tbl
`WHERE tbl.string = "base" OR
`(tbl.field ~ GetDate() AND
`tbl.string LIKE "%base")
`
`assuming that the two calls to GetDate() in the original query
`return the same value.
`
`This example illustrates that a monotone rewriting rule is not the
`same as a test for monotonicity. Although the rewriting rules of the
`next section will rewrite the first form of the query into the second,
`it requires knowledge about the semantics of LIKE to conclude
`that the two queries are the same, and thus that the original query
`was monotone.
`
`Only the failure of monotonicity due to time has been discussed so
`far. A second cause of nonmonotonicity is the use of NOT
`EXISTS. The simple query "select messages that have no reply",
`might be written in SQL as
`SELECT * FROM msgs m
`WHERE NOT EXISTS (
`SELECT * FROM msgs m1
`WHERE ml.inreplyto = m.msgid)
`
`This is non-monotone because a message may satisfy the query for
`a while, but then fail because of the arrival of a reply. No explicit
`occurrence of time in the query is involved. Assuming that each
`append-only table has a column named "ts" that contains a times(cid:173)
`tamp of when the row was added to the table, the following figure
`illustrates the non-monotonicity.
`TRUE ___ _
`
`FALSE
`
`)
`
`ml.ts
`
`A more realistic non-monotone query is "select messages that are
`more than two weeks old and to which nobody has sent a reply."
`Although it involves time, it uses the monotone construction E <
`GetDate(), but is still non-monotone because of NOT EXISTS.
`
`The Basic Rewriting Rules
`4.2
`This section will show how to compute the minimal bounding
`monotone query for any SQL query in standard form (see below).
`Throughout the next two sections, several shorthands are used in
`expressing SQL queries. Table 1 lists these shorthands and their
`SQL equivalents. In particular, the term t used in a query refers to
`the current time. All instances of t in a query should obtain the
`same value. To ensure this, t could be a parameter to the query that
`is set by calling GetDate() exactly once. QM(t) occasionally refers
`to both the query Q M and the set of records returned by Q M when
`evaluated at time t. The meaning should be clear from context.
`
`324
`
`PALO ALTO NETWORKS Exhibit 1056 Page 4
`
`

`
`Standard form queries have the form
`SELECf ... FROM tbll, tbl2, ...
`WHERE (E11 AND E12 AND .. AND Elk ) OR
`(Ez! AND E22 AND ... AND E 2k ) OR 1
`
`2
`
`(En! AND Enz AND ... AND Enk)
`where each E;j is either of the form NOT EXISTS(q), or a boolean
`expression without subqueries. Furthermore if E;j involves time,
`then it must be of the form tope, where op is one of<,=,>,::;,~.
`-:t:., and e is an arithmetic expression that does not involve t. The
`subqueries q of NOT EXISTS(q) must also be in standard form.
`The technical report gives examples to show how most common
`SQL queries can be rewritten to this standard form [17].
`
`Table 1
`
`SOL shorthands.
`
`CURRENT_ TIMESTAMP or GetDate()
`c +INTERVAL "14" DAY or
`DateAdd(week, 2, c)
`
`c1 <t AND c2 < t AND ... AND Ck< t
`
`t<d1 AND f< d2 AND ... AND f< dk
`Some expression involving x, y, ...
`P(x1) AND P(x2) AND ... AND P(xk)
`
`C+2weeks
`
`MAX(c1, c2, ... , ck) < t
`
`P(x, y, ... )
`AND P(xi)
`1::; i::; k
`
`OR P(xi)
`1::; i::; k
`
`The rewrite rules need only consider each of the AND subexpres(cid:173)
`sions, since the minimal bounding monotone query Q M of a query
`Q in the form P ORR is PM OR RM. Here is a proof:
`
`QM(t) = UQ(s)
`sSt
`= U (P(s) OR R(s))
`sSt
`= U (P(s) u R(s))
`sSt
`(UP(s) u UR(s))
`sSt
`sSt
`
`(EQ5)
`
`This section assumes there are no NOT EXISTS terms. Then each
`AND subexpression has the form (E1 AND E2 AND ... AND Ek).
`If a term E; doesn't test the current time (the GetDate() function),
`then its truth value cannot change as time passes. Since the truth
`values of these terms are unchanging with respect to time, they
`don't affect the monotonicity of the query. Each of the remaining
`terms are of the form e op t, with op a simple relational test.
`
`A device that will simplify the algorithms is to add a term tbl.ts < t
`for each table tbl (recall that each table has a 'ts' column with the
`time the row was added). Thus the query
`SELECf * FROM tb1 WHERE tbl.field = "joe"
`becomes
`SELECf * FROM tbl
`WHERE tbl.field = "joe" AND tbl.ts < t
`
`After the manipulations are over, any remaining tbl.ts < t terms are
`redundant and can be removed since a record will not appear in the
`table before time tbl.ts. An example follows shortly.
`
`To avoid multiple cases, first assume that op is < or> (the minor
`changes needed for the other relational operators are indicated at
`the end of this section). Then each AND subexpression is of the
`forme1 <t AND ... AND en <t ANDt<d1 AND ... ANDt<dm
`AND P, where P is the conjunction of all the terms that don't
`involve t. The tbl.ts < t terms mentioned above simply add to the
`list of e;'s. The expression e1 < t AND e2 < t AND ... AND en< tis
`equivalent to MAX(e1, e2, ... , en)< t, and the expression t < d1
`AND t < dz AND ... AND t < dm is equivalent to t < MIN(d1, d2,
`... , dm). Thus, the AND subexpression can be rewritten as
`MAX(e1, e2, ... , en)< t AND t < MIN(d1, d2, ... , dm) AND P
`where Pis the conjunction of all the terms that don't involve t. If
`MAX(e1, ez, ... , en)< MIN(d" d2, ... , dm), then the AND subex(cid:173)
`pression is true between those times, as in the figure below.
`
`-I I
`
`I I I
`
`IfMAX(e1, e2, ... , en)> MIN(d1, d2, ... , dm), then the AND subex(cid:173)
`pression can never be true. Combining these cases yields
`MAX(e" e2, ... , en)< MIN(d1, d2, ... , dm) AND MAX(c1, e2, ... ,
`en)< t ANDP.
`
`Since SQL does not have MAX and MIN functions as such, this
`must be rewritten as
`AND(c; < di) AND AND(c; < t) AND P
`1:s;iSn
`l:s;i::;n
`1 =s;j=s;m
`
`Here are two examples. First, consider the query "select messages
`whose date field is in the future." This can be written as
`SELECf m.msgid FROMm WHERE t < m.date
`After adding the m.ts < t subexpression, e1 = m.ts and d1 = m.date,
`so the monotone query is
`SELECf m.msgid FROMm
`WHERE m.ts < m.date AND m.ts < t
`
`The redundant m.ts < t can be removed for a final answer of
`SELECf *FROMm WHERE m.ts < m.date
`
`Note how the introduction of the m.ts < t term is reflected in the
`final answer. For a second example, consider the query "select
`messages that are between 2 and 3 weeks old", which can be writ(cid:173)
`ten in SQL as
`SELECf *FROMm
`WHERE m.ts + 2 weeks < t AND t < m.ts + 3 weeks
`There is no need to add an m.ts < t subexpression, since it would
`be redundant. Then e1 = m.ts + 2 weeks, d1 = m.ts + 3 weeks, so
`the monotone query is
`SELECT *FROMm
`WHERE m.ts + 2 weeks < m.ts + 3 weeks AND
`m.ts + 2 weeks < t
`Or simplifying,
`
`325
`
`PALO ALTO NETWORKS Exhibit 1056 Page 5
`
`

`
`SELECT *FROMm
`WHERE m.ts + 2 weeks < t
`
`Here is an example. The query "select messages more than 2
`weeks old that no one has replied to" might be written in SQL as
`
`The operators :::; and ~ can be dealt with in the same spirit. For
`example, if there is a subexpression e':::; t, then add e' in with the
`e/s, but change e' < t to e':::; t. If there is also a subexpression t:::;
`d', then add d' in with the d/s, but in the first AND operation,
`change e' < d' toe':::; d'. Finally, convert terms of the form e1 = t
`into e1 :::; t AND t:::; e1 and e2 * t into e2 < tOR e2 > t.
`4.3 Handling NOT EXISTS terms
`The previous section required that there were no NOT EXISTS
`terms in the query. This restriction is now removed by allowing
`terms of the form
`
`NOT EXISTS(
`SELECT * FROM tbll, tb12, ... , thin WHERE R).
`
`These new terms can be thought of as "for every" terms, since
`-,3xP(x) is the same as '1::/x.P(x).
`
`Complications arise if the subexpression R involves calls to Get(cid:173)
`Date(). Since calling GetDate() returns the current value of time t,
`R is written as R(t) to indicate that it depends on the current time.
`First, assume that the function R(t) is already monotone (this
`includes the case when R does not involve t, that is, does not con(cid:173)
`tain any GetDate() calls).
`
`The NOT EXISTS subexpression shown above is true for the orig(cid:173)
`inal query Q(t) if no records exist at timet that satisfy the R(t). As
`before, the rewrite rules need to add tbl.ts < t subexpressions to
`encode the fact that messages are not in the database until time t =
`tbl.ts:
`
`NOT EXISTS (
`SELECT* FROM tbll, tbl2, ... , tblk
`WHERE R(t) AND
`tbll.ts < t AND tbl2.ts < t AND ... AND tblk.ts < t).
`
`This is equivalent to
`
`NOT EXISTS (
`SELECT* FROM tbll, tbl2, ... , tblk
`WHERE
`R(t) AND MAX(tbll.ts, tbl2.ts, ... , tblk.ts) < t).
`
`As explained in the previous section, each disjunct can be handled
`separately, and each disjunct containing a NOT EXISTS subex(cid:173)
`pression will be in the form
`
`MAX(el, e2, ... ,en)< t AND
`MIN(dl, d2, ... , dm) > t AND P AND
`NOT EXISTS (
`SELECT* FROM tbll, tbl2, ... , tblk
`WHERE
`R(t) AND MAX(tbll.ts, tbl2.ts, ... , tblk.ts) < t)
`
`Because R(t) is monotone, any row that satisfies it can never
`cease to satisfy it as time goes on. Therefore, the NOT EXISTS
`subexpression can never go from FALSE to TRUE. Thus, it suf(cid:173)
`fices to test R at the earliest time that the rest of the disjunct could
`become true, namely C = MAX(e1, e2, ... ,en). The subexpression
`becomes
`C < t AND MIN(dt> d2 .... , dm) > t AND P AND
`NOT EXISTS (
`SELECT* FROM tbll, tbl2, ... , tblk
`WHERE
`R(C) AND MAX(tbll.ts, tbl2.ts, ... , tblk.ts) <C)
`
`SELECT *FROMm
`WHERE m.ts + 2 weeks < t AND
`NOT EXISTS(
`SELECT *FROMm ml
`WHERE ml.imeplyto = m.msgid)
`
`adding the MAX subexpression gives
`
`SELECT *FROMm
`WHERE m.ts + 2 weeks < t AND
`NOT EXISTS(
`SELECT* FROMm ml
`WHERE ml.imeplyto = m.msgid AND ml.ts < t)
`C = m.ts + 2 weeks, so the monotone query is
`SELECT * FROMm
`WHERE m.ts + 2 weeks < t AND
`NOT EXISTS(
`SELECT * FROMm ml
`WHERE
`ml.imeplyto = m.msgid AND
`ml.ts < m.ts + 2 weeks)
`
`This says "select messages more than 2 weeks old that did not
`receive a reply in their first two weeks", which is exactly the
`smallest monotone query returning at least the messages of the
`original query.
`
`Next, consider the most general case, where R(t) is not necessarily
`monotone. Unfortunately, R(t) cannot simply be converted to the
`minimal bounding monotone query and use the rules from above,
`as this would change the meaning of the query. For example, con(cid:173)
`sider the query "select messages that have not received a reply for
`the last two weeks." This is
`
`SELECT m.msgid FROMm
`WHERE NOT EXISTS(
`SELECT *FROMm ml
`WHERE
`ml.imeplyto = m.msgid AND t < ml.ts + 2 weeks).
`which means "select messages for which there does not exist a
`reply less than two weeks old." Making the subquery in the NOT
`EXISTS monotone produces a query that means "select messages
`for which there does not exist a reply of any age," which is not the
`same query.
`
`Monotonizing a NOT EXISTS with arbitrary R(t), requires finding
`a way to quantify over time, turning NOT EXISTS( ... t ... ) into 3 t
`NOT EXISTS( ... t .... ). It is not possible to quantify overt in SQL,
`b

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket