`
`
`
`
`
`
`
`
`
`
`(12) United States Patent
`US 6,882,993 B1
`(10) Patent N0.:
`Lawande et al.
`(45) Date of Patent: Apr. 19, 2005
`
`
`
`
`
`
`
`
`
`USOO6882993B1
`
`(54)
`
`
`
`(75)
`
`
`
`(:73)
`
`
`
`Assignee:
`
`
`
`
`
`Notice:
`
`
`
`
`
`
`
`INCREMENTAL REFRESH OF
`
`
`MATERIALIZED VIEWS WITH JOINS AND
`
`
`
`
`AGGREGATES AFTER ARBITRARY DML
`
`
`
`
`OPERATIONS TO MULTIPLE TABLES
`
`
`
`Inventors: Shilpa Lawande, Nashua, N11 (US);
`
`
`
`
`
`Abhinav Gupta, Palo Alto, CA (US);
`
`
`
`
`
`Benoit Dageville, Foster City, (IA (US)
`
`
`
`
`Oracle International Corporation,
`
`
`
`Redwood Shores, CA (US)
`
`
`
`Subject to any disclaimer, the term of this
`
`
`
`
`
`
`patent is extended or adjusted under 35
`
`
`
`
`U.S.(I. 154(b) by 453 days.
`
`
`
`
`Appl. No.2 10/059,616
`
`
`
`
`Filed:
`Jan. 28, 2002
`
`
`
`
`
`
`
`
`
`
`Int. Cl.7 ................................................ G06F 17/30
`US. Cl.
`707/2; 707/102
`
`
`
`Field of Search .................... 707/1—10, 100—1041,
`
`
`
`
`707/200—205
`References Cited
`
`
`US. PATENT DOCUMENTS
`
`
`
`................... 707/200
`Sun et al.
`5,963,959 A * 10/1999
`
`
`
`
`
`Bello et al.
`707/3
`6,496,819 B1 * 12/2002
`
`
`
`
`
`
`
`
`
`6,546,402 B1 *
`4/2003
`
`Beyer et al.
`707/201
`
`
`
`
`
`
`
`Arora ...........
`. 707/102
`6,708,179 B1 *
`3/2004
`
`
`
`
`
`
`
`6,763 352 B1
`7/2004
`Cochranc ct a1.
`.............. 707/4
`
`
`
`
`
`
`
`OTIIER PUBLICATIONS
`
`
`Jonathan Goldstein et al., Optimizing queries using materi—
`
`
`
`
`
`
`
`alized views: a practical solution, 2001, ACM Press Specia
`
`
`
`
`
`
`
`
`Interest Group on Management of Data, pp. 331—342.*
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`Satyanarayana R. Valluri et al., View relevance driven
`
`
`
`
`
`
`
`materialized View selection in data warehousing environ—
`
`
`
`
`
`
`ment, 2002, Austrialian Computer Society, Inc. Darling-
`
`
`
`
`
`
`hurst, Australia, pp. 18741963F
`
`
`
`
`Oracle Corporation, “Create Materialized View” copyright
`
`
`
`
`
`1996—2001, Oracle 9i Reference Release 1(9.0.1) Part
`
`
`
`
`
`
`#A90125701, pp. 1722)?
`
`
`
`* cited by examiner
`
`
`
`
`
`
`
`
`
`
`
`
`Primary ExaminerHMane I). Mizrahi
`
`
`
`
`(74) Attorney, Agent, or Firm—Hickman Palermo Truong
`
`
`
`
`
`
`& Becker LLP; John D. Henkhaus
`
`
`
`
`
`(57)
`ABSTRACT
`
`
`
`A method is provided for incrementally refreshing a mate-
`
`
`
`
`
`
`rialized view after multiple operations on a row of a base
`
`
`
`
`
`
`table of the materialized View, by determining an equivalent
`
`
`
`
`operation for the multiple operations and refreshing the
`
`
`
`
`
`
`
`materialized View according to the equivalent operation. The
`
`
`
`
`
`
`method is applicable to a materialized view based on mul-
`
`
`
`
`
`tiple base tables on which multiple operations have been
`
`
`
`
`
`
`
`
`performed. The step of determining the equivalent operation
`
`
`
`
`
`
`can include identifying rows for which an earliest operation
`
`
`
`
`
`
`
`is a DELETE operation, or rows for which a latest operation
`
`
`
`
`
`
`is an INSERT operation, or a combination of the two. The
`
`
`
`
`
`
`step of refreshing the materialized view includes performing
`
`
`
`
`
`
`an inverse operation of the equivalent operation to determine
`
`
`
`
`
`a pre-update state of the row, and refreshing the materialized
`
`
`
`
`
`
`
`view based on the pre—npdate state. Additional embodiments
`
`
`
`
`
`are provided which enhance the performance of materialized
`
`
`
`
`
`
`View refresh queries.
`
`
`
`33 Claims, 12 Drawing Sheets
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`404
`
`t1
`
`PRE_
`
`
`
`
`
`BASEJABLE
`1
`
`
`(UNKNOWN)
`
`
`
`BASE TABLE
`
`
`T2
`
`
`
`
`(UNKNOWN)
`IIIIIIIIIIIII
`
`
`
`MATERIALIZED
`
`
`
`
`VIEW
`
`
`MV
`
`I'Illlllll'lll
`
`
`
`(KNOWN)
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`402
`
`Page 1 of 27
`
`
`
`
`
`
`
`DELTA_
`
`DELTA
`
`BASETABLE
`T
`
`t
`
`
`
`
`
`(UNKNOWN)
`
`
`
`t2
`POST_
`f Illllllllllll’I
`/
`BASE TABLE
`z
`I
`I
`
`
`
`
`
`
`(KNOWN)
`
`
`
`408
`
`
`
`
`
`
`
`DELTA
`
`
`BASE TABLE
`
`T2
`(UNKNOWN)
`
`
`
`
`
`
`
`
`
`
`
`
`
`' [III/IIIIIIII’
`a
`a
`;
`BASE TABLE
`I
`
`3
`g
`T2
`
`I
`g
`1
`I
`
`
`”’”””N ”
`
`
`
`412
`
`
`
`410
`
`
`
`MV1
`
`
`
`DELTA
`
`MATERIALIZED
`VIEW
`
`
`(UNKNOWN)
`
`
`
`MATERIALIZED
`VIEW
`
`
`
`
`MV
`
`
`
`(UNKNOWN)
`
`
`
`LGE Exhibit 1018
`
`LG V. Uniloc
`
`IPR2017-01427
`
`Page 1 of 27
`
`LGE Exhibit 1018
`LG v. Uniloc
`IPR2017-01427
`
`
`
`
`US. Patent
`
`
`
`
`
`Apr. 19, 2005
`
`
`
`
`
`Sheet 1 0f 12
`
`
`
`US 6,882,993 B1
`
`
`
`
`(t1) TIME TABLE 102
`-——_
`
`
`
`
`
`
`
`
`
`
`108 ll-——
`
`
`
`
`
`
`
`
`
`M"
`
`
`
`
`
`
`-—-E-—
`
`
`
`
`
`
`
`Inn-a.-
`
`
`
`
`———_
`
`
`
`
`
`
`
`
`“_—_
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`02) LOCATION TABLE 104
`
`
`
`
`'-
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`Page 2 of 27
`
`Page 2 of 27
`
`
`
`
`US. Patent
`
`
`
`
`
`Apr. 19, 2005
`
`
`
`
`Sheet 2 0f 12
`
`
`
`US 6,882,993 B1
`
`
`
`SALESREPORT TABLE 202
`
`
`
`
`
`
`
`
`JAN 1980
`
`
`
`
`JAN 1980
`
`
`
`
`Santa Clara
`
`
`
`
`Los Angeles
`
`
`
`$50,000
`
`
`
`$90,000
`
`
`
`JAN 1980
`
`Atlanta
`
`$30,000
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`FIG. 2
`
`
`
`Page 3 0f 27
`
`Page 3 of 27
`
`
`
`US. Patent
`
`Apr. 19, 2005
`
`nM3m
`
`US 6,882,993 B1
`
`mumNam
`
`h
`
`§ ,
`
`g
`
`omm_oddNaxz:_x8252mofimflz.
`
`€09xmoEmz_292023228AoEzoo
`
`
`
`(0
`
`S?
`u.
`
`Page 4 of 27
`
`mama
`
`Page 4 of 27
`
`
`
`US. Patent
`
`Apr. 19, 2005
`
`Sheet 4 0f 12
`
`US 6,882,993 B1
`
`'\\'\\\.\\\\~
`
`m3mow
`59:mm<mm\\““\\\‘~‘§‘\“
`
`\
`
`““‘\““-‘~‘\
`
`\\N._.“up“NH\\MESmw<mo:m59¢mm<mmm8EB39:mm<m
`
`
`2,\~Lw§w7ww§§m226522Azgozxza
`
`“.‘\‘\“‘.‘“‘
`
`
`2m;Es
`>2”3,>s_
`
`
`
`BN_._<EE<_2ommqfimwzs.BEER:
`
`
`
`.“-~‘~‘~“~‘
`
`
`
`
`
`226522Azaozxza230sz
`
`v.GE
`
`
`
`AixméwfiVi“Azgozxza;c“:2265.22
`
`
`
`
`
`Emma
`
`
`
`1.50mJim—moImma
`
`Page 5 0f 27
`
`
`
`m.._m<hwm<m
`
`NEEmm<m3v
`
`Page 5 of 27
`
`
`
`
`
`
`US. Patent
`
`
`
`
`
`Apr. 19, 2005
`
`
`
`
`Sheet 5 0f 12
`
`
`
`US 6,882,993 B1
`
`2.5%
`
`
`
`comm
`
`
`
`m
`
`
`
`
`
`
`
`
`
`
`
`mm.OE
`
`
`mz_>m_E
`
`mwowz<w
`
`
`IIo_loo._
`D_I>>Om
`
`
`
`Q_..m__>=._.Q_IOO._lemj<mQI>>Om
`
`
`
`
`
`moonmméwlhoaoommiwmm
`
`avom20.5.0041me3mzoEmmmo2SE$49:
`
`
`
`
`
`
`
`
`F
`
`
`
`N
`
`
`
`8223,w
`
`82mm”.N
`
`
`
`«mommgr—Hum;
`
`
`
`5.202032:.n=z>>Om
`
`
`
`Page 6 0f 27
`
`Page 6 of 27
`
`
`
`
`
`
`
`
`
`US. Patent
`
`
`
`
`
`Apr. 19, 2005
`
`
`
`Sheet 6 0f 12
`
`
`
`
`
`
`US 6,882,993 B1
`
`FIG.5b
`
`
`
`$600
`
`or)
`
`N
`
`
`
`$400
`
`N
`
`.—
`
`2'
`l—
`
`OI
`
`'—
`
`Q
`LL!I
`2
`l—
`
`QI
`8
`_|
`
`
`
`or)
`
`(\l
`
`D
`_I
`
`a4
`U)
`
`QI
`3
`
`0C
`
`C
`
`
`
`D
`8
`“7
`fl
`2%
`ml
`I—
`o
`3
`
`DO0
`
`‘:
`0.]
`
`
`
`TABLESAFTERFIRSTOPERATIONS(t1)
`
`
`
`
`
`o.
`
`
`
`
`
`
`
`
`
`PRELOCATION504b
`
`
`
`
`
`PRETIME502b
`
`E 3II
`
`
`
`
`m IRVINE
`
`
`SANJOSE
`I—
`
`ROW_IDLOC_|D -
`
`
`
`
`ROW_IDTIME_IDMONTH
`
`
`
`
`
`
`
`
`
`MARCH1980
`FEB1980
`
`anH“
`
`
`
`
`
`
`
`
`Page 7 0f 27
`
`Page 7 of 27
`
`
`
`
`US. Patent
`
`
`
`
`
`Apr. 19, 2005
`
`
`
`
`Sheet 7 0f 12
`
`
`
`US 6,882,993 B1
`
`.—
`
`Eb...
`
`
`
`Elms—z.
`
`IOO._
`Q_|mm._<w
`
`QI>>Om
`
`oom
`
`
`
`com
`
`
`
`o _
`
`
`
`>._._o
`
`9100;
`
`a
`
`
`
`
`
`._.
`
`
`Elms:
`
`
`0.25¢
`
`m_z_>m_
`
`mmowz<
`
`w
`
`
`
`
`
`om.GE
`
`m
`
`
`
`m
`
`
`
`
`
`II.
`
`ooomwwéwflOnoOmde
`
`
`
`$8zO_._.<OO._ImEm
`
`omomm2_._.lwmn_
`
`
`
`
`
`Agsz_._.<mmn_O0200mm.mm;wmij.
`
`
`
`
`
`
`
`
`
`Page 8 0f 27
`
`Page 8 of 27
`
`
`
`
`
`
`
`US. Patent
`
`
`
`
`
`Apr. 19, 2005
`
`
`
`
`Sheet 8 0f 12
`
`
`
`US 6,882,993 B1
`
`E
`
`
`
`
`
`
`
`
`
`
`.._<._.0._.n:1m=2_._.9100..913.7%o_1>>0m
`
`um.GE
`
`ow?23,
`
`290mmI550
`
`I._.zOs_o_1m=2_._.
`
`
`
`
`
`boomww._<w1._.o:oomm1._.m0n_
`
`
`
`
`
`38292004150;
`
`
`
`umommEFIHwOm
`
`
`
`3mzozémmo0%:Eh:was.
`
`Page 9 0f 27
`
`Page 9 of 27
`
`
`
`
`
`
`
`
`US. Patent
`
`
`
`
`
`Apr. 19, 2005
`
`
`
`
`Sheet 9 0f 12
`
`
`
`US 6,882,993 B1
`
`
`
`
`MLOG$_TIME
`
`
`
`M_ROW$$
`
`
`
`SEQ$$
`
`
`
`
`OLD_NEW$$
`
`
`
`DMLTYPE$$
`
`
`
`MONTH
`
`
`
`
`JAN 1980
`
`
`
`
`
`
`MAR 1980
`
`
`
`606
`
`
`
`
`
`602
`
`
`
`610
`
`
`
`
`
`
`608
`
`
`
`612
`
`
`
`
`FEB 1980
`
`
`
`
`JAN 1980
`
`
`
`
`MAR 1980
`
`
`
`
`FEB 1980
`
`
`
`
`JAN 1980
`
`
`
`
`JAN 1980
`
`
`
`
`JAN 1980
`
`
`
`
`DEC 1979
`
`
`
`
`FEB 1980
`
`
`
`
`JAN 1980
`
`
`
`
`
`
`
` I
`
`
`
`
`FIG. 6a
`
`
`
`
`
`
`604
`
`
`
`Page 10 of 27
`
`Page 10 of 27
`
`
`
`
`US. Patent
`
`
`
`
`
`Apr. 19, 2005
`
`
`
`
`Sheet 10 0f 12
`
`
`
`US 6,882,993 B1
`
`
`
`
`M_ROW$$
`
`
`
`
`
`MLOG$_PRODUCT_SALES
`
`
`SEQ$$
`
`
`
`
`OLD NEW$$
`
`
`
`DMLTYPE$$
`
`
`
`
`
`TOTAL
`
`
`
` $300
`
`
`
`
`
`1
`
`2
`
`1
`
`2
`
`2
`
`3
`
`3
`
`4
`
`-
`
`N
`
`N
`
`
`
`1 I.
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`$600
`
`$400
`
`$500
`
`$600
`
`$500
`
`$800
`
`$500
`
`$300
`
`$200
`
`$500
`
`$400
`
`
`FIG. 6b
`
`Page 11 of 27
`
`Page 11 of 27
`
`
`
`
`US. Patent
`
`
`
`
`
`Apr. 19, 2005
`
`
`
`
`
`Sheet 11 0f 12
`
`
`
`US 6,882,993 B1
`
`
`
`
`
`
`DETERMINE AN EQUIVALENT OPERATION FOR
`
`
`
`
`A PLURALITY OF OPERATIONS ON A ROW OF A
`
`
`
`
`BASE TABLE OF A MATERIALIZED VIEW
`
`ON THE EQUIVALENT OPERATION
`
`
`
`
`REFRESH THE MATERIALIZED VIEW BASED
`
`
`
`
`
`
`704
`
`
`
`
`FIG. 7
`
`Page 12 of 27
`
`Page 12 of 27
`
`
`
`
`US. Patent
`
`
`
`
`
`Apr. 19, 2005
`
`
`
`
`Sheet 12 0f 12
`
`
`
`US 6,882,993 B1
`
`
`
`
`
`IF A PLURALITY OF OPERATIONS ON A ROW OF A
`
`
`
`
`
`BASE TABLE OF A MATERIALIZED VIEW ARE DESCRIBED
`
`
`
`
`
`IN A MATERIALIZED VIEW LOG AND A DIRECT LOAD LOG,
`
`
`
`
`
`COMBINE THE LOGS INTO A COMBINED LOG
`
`
`
`802
`
`
`
`
`
`
`
`IDENTIFY ROWS FOR WHICH AN EARLIEST OPERATION IS A
`
`
`
`
`DELETE OPERATION AND/OR A LATEST OPERATION IS
`
`
`
`
`AN INSERT OPERATION
`
`
`
`
`
`804
`
`
`
`EARLI EST DELETE AND/OR LATEST INSERT OPERATION
`
`
`
`
`
`DETERMINE AN EQUIVALENT OPERATION FOR THE
`
`
`
`PLURALITY OF OPERATIONS ACCORDING TO THE
`
`
`
`
`
`
`806
`
`
`
`
`
`
`
`
`
`
`PERFORM AN INVERSE OPERATION OF THE EQUIVALENT
`
`
`
`
`OPERATION ON EACH CHANGED ROW TO DETERMINE A
`
`
`
`
`PRE-UPDATE STATE OF THE ROW
`
`
`
`THE PRE-UPDATE STATE OF THE ROW
`
`
`
`
`
`REFRESH THE MATERIALIZED VIEW BASED ON
`
`
`
`
`
`808
`
`
`
`810
`
`
`
`
`FIG. 8
`
`Page 13 of 27
`
`Page 13 of 27
`
`
`
`
`
`US 6,882,993 B1
`
`
`1
`INCREMENTAL REFRESH OF
`
`
`MATERIALIZED VIEWS WITH JOINS AND
`
`
`
`
`
`AGGREGATES AFTER ARBITRARY DML
`
`
`
`
`OPERATIONS TO MULTIPLE TABLES
`
`
`
`
`
`FIELD OF THE INVENTION
`
`
`
`
`
`
`
`
`
`The present invention relates generally to database man-
`
`
`
`
`
`
`
`agement systems and, more specifically, to techniques for
`
`
`
`
`
`
`incrementally refreshing materialized views with joins and
`
`
`
`
`
`
`aggregates after arbitrary DML operations to multiple base
`tables.
`
`
`
`
`10
`
`
`
`
`
`
`
`
`
`
`BACKGROUND OF THE INVENTION
`
`
`
`
`
`
`
`
`
`
`In a database management system (DBMS), data is stored
`in one or more data containers, each container contains
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`records, and the data within each record is organized into
`
`
`
`
`
`
`
`
`one or more fields.
`In relational database management
`
`
`
`
`
`
`
`
`
`
`systems, the data containers are referred to as tables, the
`records are referred to as rows, and the fields are referred to
`
`
`
`
`
`
`
`
`as columns. In object oriented databases, the data containers
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`are referred to as object classes, the records are referred to
`
`
`
`
`
`
`
`
`
`as objects, and the fields are referred to as attributes. Other
`
`
`
`
`
`database architectures may use other terminology.
`
`
`
`
`
`
`Database management systems retrieve information in
`
`
`
`
`
`
`
`response to receiving queries that specify the information to
`
`
`
`
`
`
`
`retrieve. In order for a database management system to
`
`
`
`
`
`
`
`
`understand the query, the query should conform to a data-
`
`
`
`
`
`
`
`base language recognized by the database management
`
`
`
`
`
`
`
`system, such as the Structured Query Language (SQL).
`Materialized Views
`
`
`
`
`
`
`
`
`
`Computer database management systems that are used for
`
`
`
`
`
`data warehousing frequently store pre-computed summary
`
`
`
`
`
`information in summary tables in order to speed up query
`
`
`
`
`
`
`
`
`processing. The summary tables are often referred to as
`materialized views. The data from which the materialized
`
`
`
`
`
`
`
`views are generated are referred to as base data. The tables
`
`
`
`
`
`
`
`
`that contain the base data are referred to as base tables.
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`Materialized views typically store aggregated
`information, such as “sum of PRODUCTiSALES, by
`
`
`
`
`
`
`
`
`
`
`
`
`
`region, by month.” As new data is periodically added to the
`
`
`
`
`
`
`
`
`base tables or existing data is periodically operated upon, the
`
`
`
`
`
`
`summary information needs to be updated (i.e., refreshed) to
`reflect the new base data.
`
`
`
`
`
`
`
`
`
`
`
`One approach to refreshing materialized views is referred
`to as the “total refresh” or “full refresh” approach. Accord-
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`ing to the total refresh approach, the values in materialized
`views are recalculated based on all of the base data. Systems
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`that employ a total refresh approach have the disadvantage
`
`
`
`
`
`
`
`that the recreation process is a relatively lengthy operation
`due to the size and number of tables from which the
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`summary information is derived. For example, when ten
`new rows are added to a particular base table that contains
`
`
`
`
`
`
`
`
`
`a million rows, a total refresh operation would have to
`
`
`
`
`
`
`
`
`process all one million and ten rows of the base table to
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`regenerate the materialized views derived using the base
`table.
`
`
`
`
`
`
`
`The process of updating summary information contained
`
`
`
`
`
`
`in materialized views may be improved by performing an
`
`
`
`
`
`
`
`incremental refresh during which, rather than generating a
`new set of summary information based on calculations that
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`use all of the base data, the summary information is updated
`
`
`
`
`
`
`
`
`
`based on previous summary values and the new base data.
`
`
`
`
`
`
`One difficulty associated with performing an incremental
`refresh is that a single materialized view may contain
`
`
`
`
`
`
`
`
`Page 14 of 27
`
`15
`
`
`
`20
`
`25
`
`
`
`30
`
`35
`
`
`
`40
`
`
`
`45
`
`
`
`50
`
`
`
`55
`
`
`
`60
`
`
`
`65
`
`
`
`
`2
`
`
`
`
`
`
`
`
`summarized data derived from multiple base tables. For
`
`
`
`
`
`
`
`
`example, assume that a database includes the three base
`
`
`
`
`
`
`
`
`
`tables illustrated in FIG. 1. Referring to FIG. 1, a
`PRODUCTiSALES table 106 contains information about
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`specific sales made by a company. A LOCATION table 104
`contains information about where the sales took place. A
`
`
`
`
`
`
`
`
`TIME table 102 contains information about the times at
`
`
`
`
`
`
`
`
`which sales were made.
`
`
`
`
`FIG. 2 illustrates a SALESREPORT table 202 that stores
`
`
`
`
`
`
`
`
`
`
`
`
`the pre-computed result of an aggregate query based on the
`PRODUCTiSALES table 106, LOCATION table 104 and
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`TIME table 102. In the specific example given, SALESRE-
`PORT table 202 stores the sum of all sales made in each city
`
`
`
`
`
`
`
`
`
`
`
`
`during each month.
`Information that defines a materialized view is referred to
`
`
`
`
`
`
`
`
`
`
`
`
`herein as a “summary definition”. A summary definition
`
`
`
`
`
`
`
`
`
`
`indicates (1) the location of the base data that is used to
`
`
`
`
`
`
`
`
`
`
`derive the materialized view, and (2) how the base data from
`
`
`
`
`
`
`
`
`the base tables should be aggregated to derive the summa-
`
`
`
`
`
`
`
`
`rized data. In many systems, summary definitions take the
`
`
`
`
`
`
`
`
`
`form of queries. For example, the query for the SALESRE-
`
`
`
`
`
`
`
`PORT table 202 may be expressed in a relational database
`
`
`
`
`
`
`
`management system using the following SQL language
`
`
`
`
`
`
`
`statement which joins the three base tables:
`
`
`
`
`
`
`SELECT t1.month, t2.city, SUM (t3.total) sumsales
`FROM TIME t1, LOCATION t2, PRODUCTiSALES t3
`
`
`
`
`
`
`
`
`
`
`
`WHERE t1.timeiid=t3.timeiid AND
`t2.lociid=t3.lociid
`
`
`
`
`
`
`GROUP BY t1.month, t2.city
`The “SELECT” line indicates the columns that are to be
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`used to generate the materialized view. The “FROM” line
`indicates the tables that have those columns. The “WHERE”
`
`
`
`
`
`
`
`
`
`line indicates the criteria for joining values from one table
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`with corresponding values from the other
`tables. For
`example, values in row 108 of TIME table 102, row 110 of
`
`
`
`
`
`
`
`
`
`LOCATION table 104, and row 112 of PRODUCTiSALES
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`table 106 would be joined together because the timeiid
`value in row 108 matches the timeiid value in row 112, and
`
`
`
`
`
`
`
`
`
`
`
`the lociid value in row 110 matches the lociid value in row
`
`
`
`
`
`
`
`
`
`
`
`112.
`
`The “GROUP BY” line indicates that the rows retrieved
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`by the select statement should be put into groups for each
`month/city combination. The “sum” function in the
`
`
`
`
`
`
`
`“SELECT” line indicates that for each of those groups, the
`
`
`
`
`
`
`
`
`
`values from the “total” column of PRODUCTiSALES
`
`
`
`
`
`
`
`
`table 106 are to be summed. The resulting materialized view
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`will thus have one row for every month/city combination,
`and that row will have a column called “sumsales” that
`
`
`
`
`
`
`
`
`
`contains the sum of the “total” column values for that
`
`
`
`
`
`
`
`
`
`
`
`month/city combination.
`The SALESREPORT table 202 illustrated in FIG. 2 is an
`
`
`
`
`
`
`
`
`
`
`
`
`example of the materialized view that would be generated in
`
`
`
`
`
`
`
`response to the materialized view definition specified above.
`The SALESREPORT table 202 has the three columns speci-
`
`
`
`
`
`
`
`
`
`fied in the “SELECT” line of the materialized view defini-
`
`
`
`
`
`
`
`
`
`tion: month, city, and sumsales.
`
`
`
`
`
`If the system that maintains the SALESREPORT table
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`202 does not support incremental refresh, the query listed
`above must be run against the base tables each time the base
`
`
`
`
`
`
`
`
`
`
`
`tables are modified. Rather than re-compute the entire con-
`
`
`
`
`
`
`
`
`
`tents of SALESREPORT when data is modified or new data
`
`
`
`
`
`
`
`to
`is added to the system,
`it would be more efficient
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`re-compute just the changes to the existing SALESREPORT
`that result from the base table data modifications.
`
`
`
`
`
`
`
`
`
`
`Join Operation
`
`
`
`
`
`including materialized view
`To respond to queries,
`
`
`
`
`
`
`queries, a database server typically has to perform numerous
`
`
`
`
`
`
`
`
`
`
`Page 14 of 27
`
`
`
`
`
`US 6,882,993 B1
`
`
`3
`
`
`
`
`
`
`
`
`table join operations because the database records that
`
`
`
`
`
`
`
`
`contain the information that is needed to respond to the
`
`
`
`
`
`
`
`
`queries are often organized into a star schema. Astar schema
`
`
`
`
`
`
`
`is distinguished by the presence of one or more relatively
`
`
`
`
`
`
`
`
`
`large tables and several relatively smaller tables. Rather than
`
`
`
`
`
`
`
`duplicating the information contained in the smaller tables,
`
`
`
`
`
`
`
`
`the large tables contain references (foreign key values) to
`rows stored in the smaller tables. The larger tables within a
`
`
`
`
`
`
`
`
`
`star schema are referred to as “fact tables”, while the smaller
`
`
`
`
`
`
`
`
`tables are referred to as “dimension tables”.
`
`
`
`
`
`
`
`
`
`
`
`
`
`When a database management system contains very large
`
`
`
`
`
`
`
`
`amounts of data, certain queries against the database can
`
`
`
`
`
`
`
`
`
`
`take an unacceptably long time to execute. The cost of
`
`
`
`
`
`
`executing a query may be particularly significant when the
`
`
`
`
`
`
`
`query requires joins among a large number of database
`tables.
`
`
`10
`
`15
`
`
`
`
`4
`
`
`
`
`
`
`contain column r, a column containing values representing
`
`
`
`
`
`
`
`
`regions. It is may be useful to generate a result set that
`
`
`
`
`
`
`
`
`summarizes sales by region, and for each region, sales date.
`
`
`
`
`
`
`
`Such a result set may be generated by referencing column r
`
`
`
`
`
`
`and d in the group-by clause, as illustrated by the following
`query.
`
`
`
`
`SELECT d, r SUM (s) from t
`GROUP BY r, d
`
`
`
`
`
`
`
`Another useful way to provide aggregate information is to
`
`
`
`
`
`
`
`
`
`generate one result set that groups data by various combi-
`
`
`
`
`
`
`
`
`nations of columns. For example, a result set may contain a
`
`
`
`
`
`
`
`
`
`set of rows grouped by region and date, and a set of rows
`
`
`
`
`
`
`
`
`grouped only by region. Such a result set may be generated
`
`
`
`
`
`
`
`by submitting a query that
`includes multiple subqueries
`
`
`
`
`
`operated upon by the union operator.
`
`20
`
`25
`
`30
`
`35
`
`
`
`Aggregate Function
`
`
`
`
`
`
`
`
`An important function for data generation and retrieval
`
`
`
`
`
`
`performed by a database management system is the genera-
`
`
`
`
`
`
`tion of aggregated information. Aggregated information is
`
`
`
`
`
`
`information derived by applying an aggregate function to the
`values in a column of a subset of rows in a “base table”.
`
`
`
`
`
`
`
`
`
`
`
`
`
`Examples of aggregate functions are functions that sum
`
`
`
`
`
`
`
`values, calculate averages, and determine minimum and
`maximum values. The column that contains the values to
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`which an aggregate function is applied is referred to as the
`measure.
`
`
`
`
`
`
`
`The subsets of rows to which an aggregate function is
`
`
`
`
`
`
`applied are determined by values in a “group-by” column.
`
`
`
`
`
`
`The aggregate information generated by a database man-
`
`
`
`
`
`
`
`
`agement system is presented as a result set having the
`
`
`
`
`
`
`
`group-by column and the measure column. In particular, the
`
`
`
`
`
`
`
`
`
`
`result set has one row for each unique value in the group-by
`
`
`
`
`
`
`
`
`
`column. Each row in the result set corresponds to the group
`of rows in the base table containing the value for the
`
`
`
`
`
`
`
`
`
`
`group-by column of the row. The measure column in the row
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`contains the output of the aggregate function applied to the
`40
`values in the measure column of the group of rows.
`
`
`
`
`
`
`
`
`
`
`
`
`Aggregate information is generated by a database man-
`
`
`
`
`
`
`agement system in response to receiving an aggregate query.
`
`
`
`
`
`
`
`
`An aggregate query specifies a group-by column, the mea-
`
`
`
`
`
`
`
`
`sure column, and the aggregate function to apply to the
`45
`
`
`
`
`
`
`
`
`measure values. The following query is provided as an
`illustration.
`
`
`
`
`
`SELECT d, SUM(s) sumis From t
`GROUP BY d
`
`
`
`
`
`
`
`
`Table t contains data representing the sales of an organi-
`
`
`
`
`
`
`
`zation. Each row represents a particular sales transaction.
`For a particular row in table t, column d contains the date of
`
`
`
`
`
`
`
`
`the sales transaction, and s contains the sale amount.
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`The SELECT clause contains “SUM(s)”, which specifies
`
`
`
`
`
`
`
`
`
`that the aggregate function “sum” is to be applied to the
`values in column s of table t. The query also includes the
`
`
`
`
`
`
`
`
`group-by clause “GROUP BY d”, which denotes column d
`
`
`
`
`
`
`
`
`
`
`as the group-by column.
`
`
`
`
`
`
`
`
`Execution of this query generates a result set with a
`
`
`
`
`
`
`
`column for d and a column for sum (s). A particular row in
`
`
`
`
`
`
`
`
`
`
`
`the result set represents the total sales (s)
`for all sale
`
`
`
`
`
`
`
`transactions in a given day (d). Specifically, for a particular
`row in the result set, d contains a unique date value from
`
`
`
`
`
`
`
`
`
`table t for column d. Column sumis contains the sum of the
`
`
`
`
`
`
`
`
`
`sales amount values in column s for the group of rows from
`
`
`
`
`
`
`
`
`
`i that have the unique date value in column d.
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`is often useful
`to generate aggregate information
`It
`
`
`
`
`
`
`
`
`grouped by multiple columns. For example, table i may also
`
`50
`
`55
`
`60
`
`65
`
`Page 15 of 27
`
`
`
`
`Refreshing Materialized Views
`
`
`
`
`
`
`High-level relational algebra expressions which are the
`
`
`
`
`
`
`
`basis of an incremental refresh algorithm are described,
`
`
`
`
`using the following notations:
`
`
`
`
`
`
`Ti: relational table (pre-update state, before an operation
`
`
`has occurred);
`
`
`
`
`
`
`Tr: relational table (post-update state, after an operation
`
`
`has occurred);
`
`
`
`
`
`AT: delta corresponding to operations (e.g., INSERT,
`
`
`
`
`DELETE, or UPDATE) performed on table T;
`
`
`
`
`
`
`
`U: multi-set addition operation (UNION ALL); and
`
`
`
`T1><T2: innerjoin between T1 and T2.
`Incremental refresh of a materialized view involves
`
`
`
`
`
`
`
`
`
`
`
`updating the materialized view to reflect
`the operations
`
`
`
`
`
`
`
`performed on, or changes made to,
`the underlying base
`
`
`
`
`
`
`
`tables. Thus, for the following materialized view involving
`a join of two tables,
`
`
`
`
`MV=T><T2,
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`if updates AT1 and AT2 are done to tables T1 and T2,
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`respectively, the post-update versions of the tables, T1+ and
`
`
`
`T; are depicted as:
`T1+=T1UAT1; and
`
`
`T2+=T2UAT2.
`
`
`
`
`
`
`
`
`
`
`If MV+ represents the new state of the materialized view
`
`
`
`
`
`including the changes to T1 and T2 and AMV represents the
`
`
`
`
`changes applied to MV, then
`MV+=MVUAMV.
`
`
`
`
`
`Alternatively,
`
`
`
`MW 2 Tf X T;
`
`= (T1 U ATI) >< (T2 U AT2);
`
`
`
`=(T1X T2)U(T1><AT2)U(AT1><(T2 U AT2));
`
`
`
`=MVU(T1><AT2)U(AT1><T2+);
`
`
`
`
`
`
`
`(Expression 1)
`
`
`Hence,
`
`
`AMV=(T1><AT2)U (AT1><T2+)
`
`(Expression 2)
`
`
`
`
`
`
`
`This generalized representation of a query can be con-
`
`
`
`
`
`
`ceptualized as refreshing the materialized view, MV, assum-
`
`
`
`
`
`
`
`
`ing one table was updated at a time. Thus, assuming T2 was
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`Page 15 of 27
`
`
`
`
`
`US 6,882,993 B1
`
`5
`
`
`
`
`
`
`
`updated first, T1 must be considered in its non-updated or
`
`
`
`
`
`
`
`pre-update state. This corresponds to the term (T1><T2).
`
`
`
`
`
`
`
`
`
`Once this change has been considered, T2 must now be used
`
`
`
`
`
`
`
`
`in its post-update state, i.e., T;. This corresponds to the
`
`
`term (AT1><T2+).
`
`
`
`
`
`
`
`Extending this to an MV with n tables,
`the following
`
`
`expression applies:
`MV=T1XT2XT3X... >< Tn
`
`
`AMV=AT1><T2><T3><...><T,,
`
`
`
`
`
`UfoAszT3x...><Tn
`
`
`
`(Expression 3)
`
`
`
`
`UfoT§><T§><...><ATn
`
`
`
`
`
`
`
`
`
`
`The above expression essentially represents one way of
`
`
`
`
`
`
`
`
`
`
`including all the changes to all of the base tables without
`
`
`
`
`
`
`
`
`counting any of them twice. This expression can be applied
`
`
`
`
`
`
`
`
`
`in general to any kind of materialized view. If only one table
`
`
`
`
`
`
`
`
`(for example, T1) is updated, then T,- and Ti+ are the same for
`
`
`
`
`
`
`
`
`
`
`i not equal to 1, and thus the pre-update state is not required
`
`
`
`
`
`
`
`
`to refresh the materialized view. In addition, if a table (for
`
`
`
`
`
`
`
`
`
`example, T1) is not updated, then T,- and Ti+ are the same (for
`
`
`example, T1=T1+).
`
`
`
`
`
`
`Prior approaches to refreshing materialized views have
`
`
`
`
`
`
`
`
`typically restricted the table operations in some way. For
`
`
`
`
`
`
`
`
`
`example, one approach requires that only one base table is
`
`
`
`
`
`
`
`
`updated when a view refresh is called. Hence, the pre-update
`
`
`
`
`
`
`
`
`
`state is not required and the problem is avoided. Such an
`
`
`
`
`
`
`
`
`approach may produce correct results for materialized views
`
`
`
`
`
`
`
`
`with aggregations within a single base table because there is
`
`
`
`
`
`
`
`
`only one table of interest, but, in practice, the assumption or
`likelihood that only one table has been updated before the
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`refresh is called makes this approach impractical.
`
`
`
`
`
`
`
`Another approach restricts the base table operations to
`
`
`
`
`
`
`
`
`only INSERTs or only DELETEs. Thus, the pre-update state
`
`
`
`
`
`
`
`
`can easily be computed by determining the new rows (for
`
`
`
`
`
`
`
`
`
`
`INSERTs) or the rows that are no longer in the table (for
`
`
`
`
`
`
`DELETEs), and essentially reversing these operations.
`However, if an arbitrary mix of INSERT, UPDATE and
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`DELETE operations has been performed on more than one
`then the determination of the pre-update state is
`table,
`
`
`
`
`
`
`
`
`
`non-trivial since the same row could have been inserted,
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`updated and deleted many times. Hereinafter,
`the terms
`“INSERT,” “DELETE,” and “UPDATE,” when referenced
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`in all capitals, each refer to an independent table operation.
`
`
`
`
`
`
`
`Still another approach is to use a memoryless algorithm.
`This means that all rows from the materialized view that
`
`
`
`
`
`
`
`
`
`
`could have been affected by any of the modified rows of base
`
`
`
`
`
`
`
`
`tables identified in a materialized view change log are
`
`
`
`
`
`
`
`
`removed. Then, the materialized view query is reissued and
`
`
`
`
`
`
`
`
`restricted to the new versions of these rows. Such an
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`approach may be sufficient for materialized views without
`
`
`
`
`
`
`
`aggregation, as each row in the base table is individually
`represented in the materialized view. However, due to the
`
`
`
`
`
`
`
`nature of aggregate functions, several rows in a base table
`
`
`
`
`
`
`
`are aggregated down to a few rows in the materialized view.
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`Therefore, multiple rows in a base table can affect any
`
`
`
`
`
`
`particular row in the materialized view. Consequently, a
`
`
`
`
`
`
`
`memoryless refresh approach is not efficient since it requires
`access to rows other than the changed rows. In addition,
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`multiple changes performed on multiple rows in the base
`
`
`
`
`
`
`
`
`
`table could change any single row in the materialized view.
`
`
`
`
`
`
`
`Hence, memoryless refresh may at times be worse than a
`
`
`
`
`
`
`
`complete refresh due to the overhead of identifying affected
`rows.
`
`
`
`Page 16 of 27
`
`
`6
`
`
`
`
`
`
`Based on the foregoing, it is clearly desirable to provide
`
`
`
`
`
`a mechanism that incrementally refreshes a materialized
`
`
`
`
`
`
`view after arbitrary multiple operations on a row of a base
`
`
`
`
`
`
`table. Furthermore it is desirable to provide a mechanism
`
`
`
`
`
`
`that incrementally refreshes a materialized view after opera-
`
`
`
`
`
`tions to rows of multiple base tables.
`SUMMARY OF THE INVENTION
`
`
`
`
`
`
`
`
`
`A method is provided for incrementally refreshing a
`
`
`
`
`
`
`materialized view after multiple operations on a row of a
`
`
`
`
`
`
`
`
`
`base table of the materialized view, by determining an
`
`
`
`
`
`
`
`
`equivalent operation for the multiple operations and refresh-
`
`
`
`
`
`
`
`
`ing the materialized view according to the equivalent opera-
`
`
`
`
`
`
`
`tion. The method is applicable to a materialized view based
`
`
`
`
`
`
`
`
`on multiple base tables on which multiple operations have
`
`
`
`
`
`
`
`been performed. Thus, a materialized view defined by a
`
`
`
`
`
`
`
`
`
`query that joins two or more base tables or that aggregates
`
`
`
`
`
`
`
`
`
`values from multiple rows of a base table or multiple base
`
`
`
`
`
`
`
`tables, can be efficiently refreshed. Furthermore, the method
`
`
`
`
`
`
`
`
`is applicable to operations to base tables which occurred
`
`
`
`
`
`
`
`independently of each other and INSERT operations which
`
`
`
`were performed in bulk.
`
`
`
`
`
`
`
`According to embodiments, the step of determining the
`
`
`
`
`
`
`
`
`equivalent operation can include identifying rows for which
`
`
`
`
`
`
`
`
`an earliest operation from a plurality of operations was a
`
`
`
`
`
`
`
`
`DELETE operation, or rows for which a latest operation
`
`
`
`
`
`
`from a plurality of operations was an INSERT operation, or
`
`
`
`
`
`
`
`
`
`a combination of the two. In addition, according to one
`
`
`
`
`
`
`
`
`embodiment, the step of refreshing the materialized view
`
`
`
`
`
`
`includes performing an inverse operation of the equivalent
`
`
`
`
`
`
`
`
`operation to determine a pre-update state of the row, and
`
`
`
`
`
`
`
`refreshing the materialized view based on the pre-update
`state.
`
`Additional embodiments are described which enhance the
`
`
`
`
`
`
`
`
`
`
`
`
`performance of materialized view queries.
`In one such
`embodiment, the base tables are ordered according to the
`
`
`
`
`
`
`
`
`
`
`
`computational complexity of determining their pre-update
`
`
`
`
`
`
`state based on the rows operated upon, and the materialized
`
`
`
`
`
`
`
`view is refreshed by applying the more complex base tables
`before the less complex base tables. Other embodiments
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`combine queries for defining rows for which