`
`
`
`
`
`
`
`
`
`
`«2) United States Patent
`US 6,882,993 B1
`(10) Patent No.:
`
`
`
`
`
`
`
`Lawande et al.
`Apr. 19, 2005
`(45) Date of Patent:
`
`US006882993B1
`
`(54)
`
`
`
`
`
`(75)
`
`
`
`Notice:
`
`
`
`INCREMENTAL REFRESH OF
`
`
`MATERIALIZED VIEWS WITH JOINS AND
`
`
`
`
`AGGREGATES AFTER ARBITRARY DML
`
`
`
`
`OPERATIONS TO MULTIPLE TABLES
`
`
`
`
`
`
`
`
`Inventors: Shilpa Lawande, Nashua, NII (US);
`
`
`
`
`
`Abhinav Gupta, Palo Alto, CA (US);
`
`
`
`
`Benoit Dageville, Foster City, CA (US)
`
`
`
`
`
`(73) Assignee: Oracle International Corporation,
`
`
`
`Redwood Shores, CA (US)
`
`
`
`
`
`
`Subject to any disclaimer, the term of this
`
`
`
`
`patent is extended or adjusted under 35
`
`
`
`
`U.S.C. 154(b) by 453 days.
`
`
`
`
`
`
`
`
`
`10/059, 616
`Appl. No.:
`Filed:
`
`
`
`
`
`Jan. 28, 2002
`
`
`
`
`
`Tint, C17 ieececcccccccsescseesestesereeseseesenees GO6F 17/30
`
`
`US. Cl.
`wees 2707/2; 707/102
`.....
`
`Field of Search .........0..0000.... 707/1-10, 100-104.1,
`
`
`
`
`707/200-205
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`References Cited
`U.S. PATEN'T DOCUMENTS
`
`
`
`
`
`
`
`
`5,963,959 A * 10/1999 Sun et al. 0... 707/200
`
`
`
`
`
`
`
`
`6,496,819 B1 * 12/2002 Bello et al.
`w 707/3
`...
`
`
`
`
`
`
`
`
`6,546,402 B1 *
`4/2003 Beyer etal.
`
`707/201
`
`
`
`
`
`
`6,708,179 Bl *
`3/2004 Arora .......6
`- 707/102
`
`
`
`
`
`
`
`
`6,763,352 BL
`7/2004 Cochranect al... 707/4
`OTHER PUBLICATIONS
`
`
`
`
`
`
`
`
`
`Jonathan Goldstein et al., Optimizing queries using materi-
`
`
`
`
`
`
`
`
`alized views: a practical solution, 2001, ACM Press Specia
`
`
`
`
`
`
`Interest Group on Managementof Data, pp. 331-342.*
`
`
`
`
`
`
`
`
`Satyanarayana R. Valluri et al, View relevance driven
`
`
`
`
`
`
`materialized view selection in data warehousing environ-
`
`
`
`
`
`
`ment, 2002, Austrialian Computer Society, Inc. Darling-
`
`
`
`
`hurst, Australia, pp. 187—-196.*
`
`
`
`
`
`Oracle Corporation, “Create Materialized View” copyright
`
`
`
`
`
`
`1996-2001, Oracle 9i Reference Release 1(9.0.1) Part
`
`
`
`#A90125-01, pp. 1-22).*
`
`
`
`
`
`
`
`
`
`
`* cited by examiner
`
`
`
`
`
`
`
`
`
`Primary kxaminer—)iane D. Mizrahi
`
`
`
`
`
`
`(74) Attorney, Agent, or Firm—Hickman Palermo Truong
`
`
`
`
`
`& Becker LLP; John D. Henkhaus
`
`
`ABSTRACT
`(57)
`
`
`
`
`
`
`A method is provided for incrementally refreshing a mate-
`
`
`
`
`
`
`rialized view after multiple operations on a row of a base
`
`
`
`
`table of the materialized view, by determining an equivalent
`
`
`
`
`
`
`
`operation for the multiple operations and refreshing the
`
`
`
`
`
`
`materialized view according to the equivalent operation. The
`
`
`
`
`
`method is applicable to a materialized vicw based on mul-
`
`
`
`
`
`
`
`
`tiple base tables on which multiple operations have been
`
`
`
`
`
`
`performed. The step of determining the equivalent operation
`
`
`
`
`
`
`
`can include identifying rows for which an earliest operation
`
`
`
`
`
`
`is a DELETEoperation, or rows for which a latest operation
`is an INSERToperation, or a combination of the two. The
`
`
`
`
`
`
`
`
`
`
`
`
`step of refreshing the materialized viewincludes performing
`
`
`
`
`
`an inverse operation of the equivalent operation to determine
`
`
`
`
`
`
`
`a pre-update state of the row, and refreshing the materialized
`
`
`
`
`
`view based on the pre-update state. Additional embodiments
`
`
`
`
`
`
`are provided which enhance the performance of materialized
`
`
`
`view refresh queries.
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`33 Claims, 12 Drawing Sheets
`
`
`
`
`ty
`
`PRE_
`BASETABLE
`
`I
`
`'
`(UNKNOWN)
`
`
`
`
`
`
`
`404
`
`
`
`
`
`
`
`
`
`
`
`402
`
`Page 1 of 27
`
`
`
`
`
`BASE TABLE
`
`
`Tg
`
`(UNKNOWN)
`(SLPLELLeALLELE
`y
`y MATERIALIZED
`
`VIEW
`4
`
`yj
`MV
`
`eens
`
`(KNOWN)
`
`
`
`
`
`
`
`
`
`408
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`DELTA_
`DELTA
`
`BASETABLE
`
`Ty
`(UNKNOWN)
`
`
`
`
`
`DELTA
`
`
`BASE TABLE
`
`Ty
`(UNKNOWN)
`
`
`
`
`
`
`
`DELTA
`
`MATERIALIZED
`VIEW
`
`
`M4
`
`(UNKNOWN)
`
`
`
`
`
`
`
`to
`POST_
`yeeheerrees)
`DBASETABLE
`y
`y
`1
`in
`y
`Y
`
`4.
`77 L4
`AETNOWN)
`
`412
`
`
`
`
`
`
`
`
`
`
`
`
`LEE
`BASETABLE
`1
`1-410
`
`y
`y
`T
`4
`y
`2
`
`4
`7
`LA
`ZZ
`vt)
`
`KNOWN) aed
`
`
`
`
`
`MATERIALIZED
`
`VIEW
`
`MV
`(UNKNOWN)
`
`
`
`LGE Exhibit 1018
`
`LG v. Uniloc
`
`IPR2017-01427
`
`Page 1 of 27
`
`LGE Exhibit 1018
`LG v. Uniloc
`IPR2017-01427
`
`
`
`
`U.S. Patent
`
`
`
`
`
`Apr.19, 2005
`
`
`
`
`
`Sheet 1 of 12
`
`
`
`US 6,882,993 B1
`
`
`
`
`tt) TIME TABLE 102
`TIMED|DAY|HOUR|MONTH
`
`
`
`
`
`eeee
`
`
`
`
`
`
`
`
`
`
`108
`3|5|etaniggo
`
`
`
`
`
`
`
`
`aeeee
`
`
`
`
`
`
`V5|6|os|reBigg0
`
`
`po
`
`
`
`
`
`
`
`Eeeeee
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`(2) LOCATION TABLE 104
`
`
`
`
`
`
`2wan[Losangeles[ca
`
`110
`
`2ain[atria[oa
`
`To
`
`cE
`
`
`
`
`Ci
`
`
`
`(3) PRODUCT_SALES 106
`
`
`
`
`
`Page 2 of 27
`
`Page 2 of 27
`
`
`
`
`U.S. Patent
`
`
`
`
`
`Apr.19, 2005
`
`
`
`
`Sheet 2 of 12
`
`
`
`US 6,882,993 B1
`
`
`
`SALESREPORT TABLE 202
`
`MONTH
`
`
`JAN 1980
`
`
`
`
`JAN 1980
`
`
`
`JAN 1980
`
`JAN 1980
`
`JAN 1980
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`CITY
`
`SUMSALES
`
`
`
`
`
`
`Santa Clara
`
`
`
`
`Los Angeles
`
`
`
`$50,000
`
`
`
`$90,000
`
`
`
`
`
`
`
`
`
`
`
`Atlanta
`
`Tustin
`
`Seattle
`
`$30,000
`
`$15,000
`
`$70,000
`
`$35,000
`
`JUL 1985
`Ogden
`
`
`
`
`FIG. 2
`
`
`
`Page 3 of 27
`
`Page 3 of 27
`
`
`
`U.S. Patent
`
`Apr. 19
`
`’
`
`2005
`
`Sheet 3 of 12
`
`US 6,882,993 B1
`
`Bde
`
`0ee
`
`ack
`Oe|DOS
`
`YNN|MYOMINooSyJOVAYSINI
`W901NOLLYOINNININOD|||9ct|20E|sna||||[orBOE|
`
`SOIAS
`
`id
`
`€“Old
`
`AV1dSI0
`
`
`
`SOIARGLAdNI
`
`ore
`
`YOSHND
`
`TOXLNOS
`
`Page 4 of 27
`
`Page 4 of 27
`
`
`
`
`
`
`U.S. Patent
`
`Apr. 19, 2005
`
`Sheet 4 of 12
`
`US 6,882,993 B1
`
`
`
` LOLVyA1aVlASVEpv2
`CLLLALALMMMMMM
`
`G}
`
`r/,CLAPLLLAhhhahha
`
`AW ASAASSASAASS
`
`(NMON))
`
`GAZNVIdaLVW
`
`MalA
`
`AW
`
`(NMONMNN)
`
`viqad
`
`
`
`TeV.asva
`
`C]
`
`(NMONYNN)
`
`Q3ZNIWidsLv
`
`VL1Sd
`
`NGIA
`
`LAW
`
`(NMONMNN)
`
`yOl
`
`FEVLsve
`
`Ch
`
`(NMONYNN)
`
`NS
`
`Yi"yAWy4
`yCLLAALMMMMMMhhh
`CLLALLMMMdokienkede
`QSZNVIdaLVA
`
`MalA
`
`(NMON™)
`
`(NMON»)
`
`(NMONYNN)
`
`(NMONXNN)
`
`y4chpyFlav1asvaYV7 OOTELILLLED
`
`y)4byyyy
`
`807
`
`visa
`
`TIGVLasva
`
`Mh
`
`~Witad
`
`“Wud
`
`hy
`
`aJIAVLsvevO?
`
`Page 5 of 27
`
`Page 5 of 27
`
`
`
`
`
`
`
`
`
`
`
`
`
`U.S. Patent
`
`
`
`
`
`Apr. 19, 2005
`
`
`
`
`Sheet 5 of 12
`
`
`
`US 6,882,993 B1
`
`Pes
`W1LOL|Graw|a’ool|alsats
`
`890GS3TWSLONGOYdAYd
`
`
`(1)SNOLLYHadOOLHOUdSATAVL
` ao]DemfaloGl907|GiMOu
`
`8705NOLLVOOTSud
`0861NVIJoda
`HLNOW|GI-SWiL|GlMou
`
`e205SWILSud
`
`
`
`
`
`
`
`
`
`
`
`
`eG‘Olds
`
`
`
`
`
`
`
`eo
`
`
`
`
`
`
`
`
`
`
`
`alMOY
`
`
`
`
`
`
`
`
`
`
`
`Page 6 of 27
`
`Page 6 of 27
`
`
`
`
`
`
`
`
`U.S. Patent
`
`
`
`
`
`Apr. 19, 2005
`
`
`
`
`Sheet 6 of 12
`
`
`
`US 6,882,993 B1
`
`$400
`
`$600
`
`
`
`2c
`
`a
`oS
`wy
`ws
`LL
`
`|xoB
`
`S>QOC
`
`c
`a.
`Lid
`cc
`aN
`
`
`
`Q
`
`J
`
`3c
`
`FIG.5b
`
`2+
`
`SB
`
`(e
`1S
`
`
`
`
`
`
`
`|
`
`1a
`
`= w
`
`c
`
`T55
`
`
`
`
`
`LOC_ID}TIMED|TOTAL pfa|
`
`SALES_IDpefepetefetes
`
`
`
`
`
`
`
`TABLESAFTERFIRSTOPERATIONS(ty)
`
`
`
`SANJOSE
`IRVINE
`pt2
`Pt
`
`|bSo
`
`MARCH1980
`
`FEB1980
`
`
`pefefsjs|
`
`
`{
`4|8
`ul|a
`oc
`a
`
`=23|2
`
`
`
`
`
`
`
`te
`cle
`a
`
`I
`
`=O
`
`o
`na
`
`WwW
`Lu
`=|2
`
`
`
`Page 7 of 27
`
`Page 7 of 27
`
`
`
`
`U.S. Patent
`
`
`
`
`
`Apr. 19, 2005
`
`
`
`
`Sheet 7 of 12
`
`
`
`US 6,882,993 B1
`
`offsfoWLOL|GrAWit|al001falsas
`
`
`9906SS1VSLONdOYdA”d
`
`
`
`
`(21)SNCUWH3dOONOOSSHALVSATAVL
`alMOY
`
`3940SNOILYOOTSud
`
`
`
`
`
`
`
`
`
`Page 8 of 27
`
`
`
`
`
`
`
`glMOU
`
`
`
`
`
`
`
`9205SWILSud
`
`
`
`
`
`
`
`
`
`
`
`
`pe
`pe2pt
`
`
`
`
`
`
`
`
`
`pe
`
`
`pe
`
`9G“Old
`
`Page 8 of 27
`
`
`
`
`
`
`
`U.S. Patent
`
`
`
`
`
`Apr.19, 2005
`
`
`
`
`Sheet 8 of 12
`
`
`
`US 6,882,993 B1
`
`
`
`
`
`
`
`
`
`
`
`
`
`rowWioL|arawit|ao07jalsatvs|alMou p90g
`
`
`SAIVSLONGOXdLSOd
`De|ewep
`Pane
`
`
`(©)SNOLLVHAdOCHIHHALSTIaVL
`[|efPOSNOLLYOOT
`6261940oTHINOW|GISILL
`
`Pz0SSWILLSOd
`
`PSOld
`
`0861NVf
`
`Page 9 of 27
`
`Page 9 of 27
`
`
`
`
`
`
`
`
`
`U.S. Patent
`
`
`
`
`
`Apr.19, 2005
`
`
`
`
`Sheet 9 of 12
`
`
`
`US 6,882,993 B1
`
`
`
`
`
`
`
`MLOG$_TIME
`
`
`ares|WOT
`
`
`
`
`
`
`C8[or
`
`
`
`
`
`DMLTYPE$$
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`610
`
`
`
`[ovis
`
`
`To[vin
`
`
`
`
`
`
`
`
`
`
`
`
`
`
` 608
`
`
`
`
`
`
`
`
`i
`
`
`
`
`
`
`FEB 1980
`
`
`
`
`
`JAN 1980
`
`612
`
`
`FIG. 6a
`
`Page 10 of 27
`
`Page 10 of 27
`
`
`
`
`U.S. Patent
`
`
`
`
`
`Apr.19, 2005
`
`
`
`
`Sheet 10 of 12
`
`
`
`US 6,882,993 B1
`
`
`
`
`
`
`
`MLOG$_PRODUCT_SALES
`
`
`
`
`
`ptfo|
`
`a2
`
`
`
`reae
`
`
`
`
`
`
`
`
`
`
`
`
`
`[|
`
`0
`
`
`?fe}eo|
`
`
`pett
`
`psttfe|
`
`
`
`$500
`
`
`
`
`
`1=
`
`U
`
`
`
`
`
`2?feto
`ptfefo
`
`
`
`
`
`
`Pe|sa
`
`
`
`
`Page 11 of 27
`
`Page 11 of 27
`
`
`
`
`U.S. Patent
`
`
`
`
`
`Apr.19, 2005
`
`
`
`
`
`Sheet 11 of 12
`
`
`
`US 6,882,993 B1
`
`
`
`
`
`
`DETERMINE AN EQUIVALENT OPERATION FOR
`
`
`
`
`A PLURALITY OF OPERATIONS ON A ROW OF A
`
`
`
`
`BASE TABLE OF A MATERIALIZED VIEW
`
`ON THE EQUIVALENT OPERATION
`
`
`
`
`REFRESH THE MATERIALIZED VIEW BASED
`
`
`
`
`
`
`
`
`
`FIG. 7
`
`Page 12 of 27
`
`Page 12 of 27
`
`
`
`
`U.S. Patent
`
`
`
`
`
`Apr.19, 2005
`
`
`
`
`Sheet 12 of 12
`
`
`
`US 6,882,993 B1
`
`
`
`
`
`IF A PLURALITY OF OPERATIONS ON A ROW OF A
`
`
`
`
`
`BASE TABLE OF A MATERIALIZED VIEW ARE DESCRIBED
`
`
`
`
`
`IN A MATERIALIZED VIEW LOG AND A DIRECT LOAD LOG,
`
`
`
`
`
`COMBINE THE LOGS INTO A COMBINED LOG
`AN INSERT OPERATION
`
`
`
`
`
`
`
`IDENTIFY ROWS FOR WHICH AN EARLIEST OPERATIONIS A
`
`
`
`
`DELETE OPERATION AND/OR A LATEST OPERATIONIS
`
`
`
`
`
`
`802
`
`
`
`804
`
`
`
`THE PRE-UPDATE STATE OF THE ROW
`
`
`
`
`
`
`
`
`
`
`
`DETERMINE AN EQUIVALENT OPERATION FOR THE
`
`
`
`PLURALITY OF OPERATIONS ACCORDING TO THE
`
`
`
`EARLIEST DELETE AND/OR LATEST INSERT OPERATION
`
`
`
`
`
`
`
`
`
`
`PERFORM AN INVERSE OPERATION OF THE EQUIVALENT
`
`
`
`
`OPERATION ON EACH CHANGED ROW TO DETERMINE A
`
`
`
`PRE-UPDATE STATE OF THE ROW
`
`
`
`
`
`
`
`REFRESH THE MATERIALIZED VIEW BASED ON
`
`
`
`
`
`810
`
`
`
`
`FIG. 8
`
`Page 13 of 27
`
`Page 13 of 27
`
`
`
`
`
`US 6,882,993 B1
`
`
`1
`
`
`INCREMENTAL REFRESH OF
`
`
`
`
`
`MATERIALIZED VIEWS WITH JOINS AND
`AGGREGATES AFTER ARBITRARY DML
`
`
`
`
`
`
`
`OPERATIONS TO MULTIPLE TABLES
`
`FIELD OF THE INVENTION
`
`
`
`
`
`
`
`
`The present invention relates generally to database man-
`
`
`
`
`
`
`
`agement systems and, more specifically, to techniques for
`
`
`
`
`
`
`incrementally refreshing materialized views with joins and
`
`
`
`
`
`
`aggregates after arbitrary DML operations to multiple base
`tables.
`
`
`
`
`
`
`10
`
`
`
`
`
`
`
`
`
`
`BACKGROUND OF THE INVENTION
`
`
`
`
`
`
`
`
`
`
`In a database management system (DBMS),datais stored
`
`
`
`
`
`
`
`
`in one or more data containers, each container contains
`
`
`
`
`
`
`
`
`
`records, and the data within each record is organized into
`
`
`
`
`
`
`
`
`one or more fields.
`In relational database management
`
`
`
`
`
`
`
`
`
`
`systems, the data containers are referred to as tables, the
`
`
`
`
`
`
`
`
`recordsare referred to as rows, and the fields are referred to
`
`
`
`
`
`
`
`
`as columns. In object oriented databases, the data containers
`
`
`
`
`
`
`
`
`are referred to as object classes, the records are referred to
`
`
`
`
`
`
`
`
`
`as objects, and the fields are referred to as attributes. Other
`
`
`
`
`
`database architectures may use other terminology.
`
`
`
`
`
`
`Database management systems retrieve information in
`
`
`
`
`
`
`
`response to receiving queries that specify the information to
`
`
`
`
`
`
`
`retrieve. In order for a database management system to
`
`
`
`
`
`
`
`
`understand the query, the query should conform to a data-
`
`
`
`
`
`
`
`base language recognized by the database management
`
`
`
`
`
`
`
`system, such as the Structured Query Language (SQL).
`Materialized Views
`
`
`
`
`
`
`
`
`
`Computer database management systemsthat are used for
`
`
`
`
`
`data warehousing frequently store pre-computed summary
`
`
`
`
`
`information in summary tables in order to speed up query
`
`
`
`
`
`
`
`
`processing. The summary tables are often referred to as
`materialized views. The data from which the materialized
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`views are generated are referred to as base data. The tables
`that contain the base data are referred to as basetables.
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`Materialized views typically store aggregated
`
`
`
`
`
`
`information, such as “sum of PRODUCT_SALES, by
`
`
`
`
`
`
`
`region, by month.” As new data is periodically added to the
`
`
`
`
`
`
`
`
`base tables or existing data is periodically operated upon,the
`
`
`
`
`
`
`summary information needs to be updated (i.e., refreshed) to
`reflect the new base data.
`
`
`
`
`
`
`
`
`
`
`
`One approachto refreshing materialized viewsis referred
`
`
`
`
`
`
`
`
`to as the “total refresh” or “full refresh” approach. Accord-
`
`
`
`
`
`
`
`
`ing to the total refresh approach, the values in materialized
`
`
`
`
`
`
`
`viewsare recalculated based onall of the base data. Systems
`
`
`
`
`
`
`
`
`that employ a total refresh approach have the disadvantage
`
`
`
`
`
`
`
`that the recreation process is a relatively lengthy operation
`due to the size and number of tables from which the
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`summary information is derived. For example, when ten
`
`
`
`
`
`
`
`
`
`new rowsare added to a particular base table that contains
`
`
`
`
`
`
`
`
`a million rows, a total refresh operation would have to
`
`
`
`
`
`
`
`
`
`
`
`process all one million and ten rows of the base table to
`
`
`
`
`
`
`
`
`regenerate the materialized views derived using the base
`table.
`
`
`
`
`
`
`
`The process of updating summary information contained
`
`
`
`
`
`
`in materialized views may be improved by performing an
`
`
`
`
`
`
`
`incremental refresh during which, rather than generating a
`
`
`
`
`
`
`new set of summary information based on calculations that
`
`
`
`
`
`
`
`
`
`use all of the base data, the summary information is updated
`
`
`
`
`
`
`
`
`
`based on previous summary values and the new base data.
`
`
`
`
`
`
`Onedifficulty associated with performing an incremental
`
`
`
`
`
`
`
`
`refresh is that a single materialized view may contain
`Page 14 of 27
`
`15
`
`
`
`20
`
`25
`
`
`
`30
`
`35
`
`
`
`40
`
`
`
`45
`
`
`
`50
`
`
`
`55
`
`
`
`60
`
`
`
`65
`
`
`
`
`
`
`
`
`
`
`2
`
`
`
`
`
`
`
`
`summarized data derived from multiple base tables. For
`
`
`
`
`
`
`
`
`example, assume that a database includes the three base
`
`
`
`
`
`
`
`
`
`tables illustrated in FIG. 1. Referring to FIG. 1, a
`
`
`
`
`
`
`
`PRODUCT_SALEStable 106 contains information about
`
`
`
`
`
`
`
`specific sales made by a company. A LOCATIONtable 104
`
`
`
`
`
`
`
`
`contains information about where the sales took place. A
`TIME table 102 contains information about the times at
`
`
`
`
`
`
`
`
`which sales were made.
`
`
`
`
`FIG. 2 illustrates a SALESREPORTtable 202 that stores
`
`
`
`
`
`
`
`
`
`
`
`
`the pre-computed result of an aggregate query based on the
`
`
`
`
`
`
`
`PRODUCT_SALEStable 106, LOCATIONtable 104 and
`
`
`
`
`
`
`
`TIMEtable 102. In the specific example given, SALESRE-
`
`
`
`
`
`
`
`
`
`PORTtable 202 stores the sum ofall sales made in each city
`
`
`
`during each month.
`Information that defines a materialized view is referred to
`
`
`
`
`
`
`
`
`
`
`
`
`herein as a “summary definition”. A summary definition
`
`
`
`
`
`
`
`
`
`
`indicates (1) the location of the base data that is used to
`
`
`
`
`
`
`
`
`
`
`derive the materialized view, and (2) how the base data from
`
`
`
`
`
`
`
`
`the base tables should be aggregated to derive the summa-
`
`
`
`
`
`
`
`
`rized data. In many systems, summary definitions take the
`
`
`
`
`
`
`
`
`
`form of queries. For example, the query for the SALESRE-
`
`
`
`
`
`
`
`PORTtable 202 may be expressed in a relational database
`
`
`
`
`
`
`
`management system using the following SQL language
`
`
`
`
`
`
`
`statement which joins the three base tables:
`
`
`
`
`
`
`SELECTt,.month,t,.city, SUM (t,.total) sumsales
`
`
`
`
`
`FROM TIMEt,, LOCATIONt,, PRODUCT_SALESt,
`
`
`
`
`
`
`WHEREt,.time_id=t,.time_id AND
`
`
`
`t,.loc_id=t,.loc_id
`
`
`
`GROUPBYt,.month,t,.city
`The “SELECT”line indicates the columnsthat are to be
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`used to generate the materialized view. The “FROM”line
`indicates the tables that have those columns. The “WHERE”
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`line indicates the criteria for joining values from one table
`
`
`
`
`
`
`
`
`with corresponding values from the other
`tables. For
`
`
`
`
`
`
`
`
`
`example, values in row 108 of TIME table 102, row 110 of
`
`
`
`
`
`
`
`
`LOCATIONtable 104, and row 112 of PRODUCT_SALES
`
`
`
`
`
`
`
`
`
`table 106 would be joined together because the time_id
`
`
`
`
`
`
`
`
`
`
`
`value in row 108 matches the time_id value in row 112, and
`
`
`
`
`
`
`
`
`
`
`
`the loc_id value in row 110 matches the loc_id value in row
`112.
`
`The “GROUP BY”line indicates that the rows retrieved
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`by the select statement should be put into groups for each
`
`
`
`
`
`
`
`month/city combination. The “sum” function in the
`
`
`
`
`
`
`
`
`
`“SELECT”line indicates that for each of those groups, the
`
`
`
`
`
`
`
`
`values from the “total” column of PRODUCT_SALES
`
`
`
`
`
`
`
`table 106 are to be summed. The resulting materialized view
`
`
`
`
`
`
`
`
`
`will thus have one row for every month/city combination,
`and that row will have a column called “sumsales” that
`
`
`
`
`
`
`
`
`
`contains the sum of the “total” column values for that
`
`
`
`
`
`
`
`
`
`
`
`month/city combination.
`The SALESREPORTtable 202 illustrated in FIG. 2 is an
`
`
`
`
`
`
`
`
`
`
`
`
`example of the materialized view that would be generated in
`
`
`
`
`
`
`
`response to the materialized view definition specified above.
`
`
`
`
`
`
`
`
`
`The SALESREPORTtable 202 has the three columnsspeci-
`fied in the “SELECT”line of the materialized view defini-
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`tion: month,city, and sumsales.
`
`
`
`
`
`
`
`
`If the system that maintains the SALESREPORTtable
`
`
`
`
`
`
`
`
`
`202 does not support incremental refresh, the query listed
`
`
`
`
`
`
`
`
`
`
`
`above must be run against the base tables each time the base
`
`
`
`
`
`
`
`
`
`tables are modified. Rather than re-compute the entire con-
`tents of SALESREPORTwhendata is modified or new data
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`to
`is added to the system,
`it would be more efficient
`
`
`
`
`
`
`
`re-compute just the changesto the existing SALESREPORT
`that result from the base table data modifications.
`
`
`
`
`
`
`
`
`
`
`Join Operation
`
`
`
`
`
`To respond to queries,
`including materialized view
`
`
`
`
`
`
`queries, a database server typically has to perform numerous
`
`
`
`
`Page 14 of 27
`
`
`
`
`
`US 6,882,993 B1
`
`
`3
`
`
`
`
`
`
`
`
`table join operations because the database records that
`
`
`
`
`
`
`
`
`contain the information that is needed to respond to the
`
`
`
`
`
`
`
`
`queries are often organized into a star schema. Astar schema
`
`
`
`
`
`
`
`is distinguished by the presence of one or morerelatively
`
`
`
`
`
`
`
`
`
`large tables and several relatively smaller tables. Rather than
`
`
`
`
`
`
`
`duplicating the information contained in the smaller tables,
`
`
`
`
`
`
`
`
`the large tables contain references (foreign key values) to
`
`
`
`
`
`
`
`
`
`rowsstored in the smaller tables. The larger tables within a
`
`
`
`
`
`
`
`
`star schemaare referred to as “fact tables”, while the smaller
`tables are referred to as “dimension tables”.
`
`
`
`
`
`
`
`
`
`
`
`
`
`When a database management system contains very large
`
`
`
`
`
`
`
`
`amounts of data, certain queries against the database can
`
`
`
`
`
`
`
`
`
`
`take an unacceptably long time to execute. The cost of
`
`
`
`
`
`
`executing a query maybeparticularly significant when the
`
`
`
`
`
`
`
`query requires joins among a large number of database
`tables.
`
`
`10
`
`15
`
`
`
`
`
`20
`
`25
`
`30
`
`35
`
`45
`
`50
`
`55
`
`60
`
`
`Aggregate Function
`
`
`
`
`
`
`
`
`An important function for data generation and retrieval
`
`
`
`
`
`
`performed by a database managementsystem is the genera-
`
`
`
`
`
`
`tion of aggregated information. Aggregated information is
`
`
`
`
`
`
`information derived by applying an aggregate function to the
`values in a column of a subset of rows in a “base table”.
`
`
`
`
`
`
`
`
`
`
`
`
`
`Examples of aggregate functions are functions that sum
`
`
`
`
`
`
`
`values, calculate averages, and determine minimum and
`maximum values. The column that contains the values to
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`which an aggregate function is applied is referred to as the
`measure.
`
`
`
`
`
`
`
`The subsets of rows to which an aggregate function is
`
`
`
`
`
`
`applied are determined by values in a “group-by” column.
`
`
`
`
`
`
`The aggregate information generated by a database man-
`
`
`
`
`
`
`
`
`agement system is presented as a result set having the
`
`
`
`
`
`
`
`group-by column and the measure column. In particular, the
`
`
`
`
`
`
`
`
`
`
`result set has one row for each unique value in the group-by
`
`
`
`
`
`
`
`
`
`column. Each row in the result set corresponds to the group
`
`
`
`
`
`
`
`
`
`
`of rows in the base table containing the value for the
`
`
`
`
`
`
`
`
`group-by column of the row. The measure columnin the row
`
`
`
`
`
`
`
`
`contains the output of the aggregate function applied to the
`
`
`
`
`
`
`
`40
`values in the measure column of the group of rows.
`
`
`
`
`
`Aggregate information is generated by a database man-
`
`
`
`
`
`
`agement system in responseto receiving an aggregate query.
`
`
`
`
`
`
`
`
`An aggregate query specifies a group-by column, the mea-
`
`
`
`
`
`
`
`
`sure column, and the aggregate function to apply to the
`
`
`
`
`
`
`
`
`measure values. The following query is provided as an
`illustration.
`
`
`
`
`
`SELECT d, SUM(s) sum_s From t
`GROUPBY d
`
`
`
`
`
`
`
`
`Table t contains data representing the sales of an organi-
`
`
`
`
`
`
`
`zation. Each row represents a particular sales transaction.
`
`
`
`
`
`
`
`
`For a particular row in table t, column d contains the date of
`
`
`
`
`
`
`
`
`the sales transaction, and s contains the sale amount.
`
`
`
`
`
`
`
`The SELECTclause contains “SUM(s)”, which specifies
`
`
`
`
`
`
`
`
`
`that the aggregate function “sum” is to be applied to the
`
`
`
`
`
`
`
`
`values in column s of table t. The query also includes the
`
`
`
`
`
`
`group-by clause “GROUP BYd”, which denotes column d
`
`
`
`
`as the group-by column.
`
`
`
`
`
`
`
`
`Execution of this query generates a result set with a
`
`
`
`
`
`
`
`column for d and a column for sum (s). A particular row in
`
`
`
`
`
`
`
`
`
`
`
`the result set represents the total sales (s)
`for all sale
`
`
`
`
`
`
`
`transactions in a given day (d). Specifically, for a particular
`
`
`
`
`
`
`
`
`
`row in the result set, d contains a unique date value from
`
`
`
`
`
`
`
`
`
`table t for column d. Column sum_scontains the sum of the
`
`
`
`
`
`
`
`
`
`sales amount values in columns for the group of rows from
`
`
`
`
`
`
`
`
`i that have the unique date value in column d.
`
`
`
`
`
`
`
`
`is often useful
`to generate aggregate information
`It
`
`
`
`
`
`
`
`
`grouped by multiple columns. For example, table 1 may also
`Page 15 of 27
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`4
`
`
`
`
`
`
`contain column r, a column containing values representing
`
`
`
`
`
`
`
`
`regions. It is may be useful to generate a result set that
`
`
`
`
`
`
`
`
`summarizes sales by region, and for each region,sales date.
`
`
`
`
`
`
`
`Such a result set may be generated by referencing column r
`
`
`
`
`
`
`and d in the group-by clause, as illustrated by the following
`
`query.
`
`
`
`SELECTd, r SUM (s) from t
`
`GROUP BYr, d
`
`
`
`
`
`
`Anotheruseful way to provide aggregate informationis to
`
`
`
`
`
`
`
`
`
`generate one result set that groups data by various combi-
`
`
`
`
`
`
`
`
`nations of columns. For example, a result set may contain a
`
`
`
`
`
`
`
`
`
`set of rows grouped by region and date, and a set of rows
`
`
`
`
`
`
`
`
`grouped only by region. Such a result set may be generated
`
`
`
`
`
`
`
`by submitting a query that
`includes multiple subqueries
`
`
`
`
`
`operated upon by the union operator.
`
`
`
`Refreshing Materialized Views
`
`
`
`
`
`
`High-level relational algebra expressions which are the
`
`
`
`
`
`
`
`basis of an incremental refresh algorithm are described,
`
`
`
`
`using the following notations:
`
`
`
`
`
`
`T;: relational table (pre-update state, before an operation
`
`
`has occurred);
`
`
`
`
`
`
`T;*: relational table (post-update state, after an operation
`
`
`has occurred);
`
`
`
`
`
`AT: delta corresponding to operations (e.g., INSERT,
`
`
`
`
`DELETE, or UPDATE) performed ontable T;
`
`
`
`
`
`
`U: multi-set addition operation (UNION ALL); and
`
`
`
`T,><T,: innerjoin between T, and T).
`Incremental refresh of a materialized view involves
`
`
`
`
`
`
`
`
`
`
`
`updating the materialized view to reflect
`the operations
`
`
`
`
`
`
`
`performed on, or changes made to,
`the underlying base
`
`
`
`
`
`
`
`tables. Thus, for the following materialized view involving
`
`
`
`
`a join of two tables,
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`MV=T><T,,
`
`
`
`
`
`
`
`
`if updates AT, and AT, are done to tables T, and T;,
`
`
`
`
`
`
`
`respectively, the post-update versions of the tables, T,* and
`
`
`
`T,” are depicted as:
`
`
`T,*=T,UAT,; and
`
`To*=TUAT).
`
`
`
`
`
`
`
`
`
`
`If MV* represents the new state of the materialized view
`
`
`
`
`
`including the changes to T, and T, and AMVrepresents the
`
`
`
`
`changes applied to MV,then
`MV*=MVUAMV.
`
`
`
`
`
`Alternatively,
`
`
`
`
`MV! =TS xTh;
`
`
`(Expression 1)
`
`
`
`
`= (T, UAT) x (Ty U AT);
`
`
`
`= (T, XT) U (1, XAT2) U (AT, x (Tz UAT);
`
`
`
`
`= MV U(T xAT2) U (AT, x TZ);
`
`
`
`Hence,
`
`
`
`AMV=(T,><AT,)U(AT|><T,")
`
`(Expression 2)
`
`
`
`
`
`
`
`This generalized representation of a query can be con-
`
`
`
`
`
`
`ceptualized as refreshing the materialized view, MV, assum-
`
`
`
`
`
`
`
`
`ing one table was updated at a time. Thus, assuming T, was
`
`
`
`
`
`Page 15 of 27
`
`
`
`
`
`US 6,882,993 B1
`
`
`5
`
`
`
`
`
`
`updated first, T, must be considered in its non-updated or
`
`
`
`
`
`
`
`pre-update state. This corresponds to the term (T,><T,).
`
`
`
`
`
`
`
`
`
`Once this change has been considered, T, must now be used
`
`
`
`
`
`
`
`
`in its post-update state, ic., T,*. This corresponds to the
`
`
`term (AT,><T,”).
`
`
`
`
`
`
`
`
`Extending this to an MV with n tables,
`the following
`
`
`expression applies:
`
`MV =T, xT. XT3X... x Th
`
`
`
`
`(Expression 3)
`
`AMV = AT, XT) XT3 X...X Ty,
`
`
`
`
`
`JTF x AT: xT3 x... Tn
`
`
`
`Jrrx ty xTZ Xx... XAT,
`
`
`
`
`
`
`
`
`
`
`The above expression essentially represents one way of
`
`
`
`
`
`
`
`
`
`
`including all the changesto all of the base tables without
`
`
`
`
`
`
`
`
`counting any of them twice. This expression can be applied
`
`
`
`
`
`
`
`
`
`in general to any kind of materialized view.If only one table
`
`
`
`
`
`
`
`
`(for example, T,) is updated, then T, and T,* are the same for
`
`
`
`
`
`
`
`
`
`
`i not equal to 1, and thus the pre-update state is not required
`
`
`
`
`
`
`
`
`to refresh the materialized view. In addition,if a table (for
`
`
`
`
`
`
`
`
`
`example, T,) is not updated, then T, and T,* are the same(for
`
`
`example, T,=T,").
`
`
`
`
`
`
`Prior approaches to refreshing materialized views have
`
`
`
`
`
`
`
`
`typically restricted the table operations in some way. For
`
`
`
`
`
`
`
`
`
`example, one approach requires that only one basetable is
`
`
`
`
`
`
`
`
`updated when a view refresh is called. Hence, the pre-update
`
`
`
`
`
`
`
`
`
`state is not required and the problem is avoided. Such an
`
`
`
`
`
`
`
`
`approach may producecorrect results for materialized views
`
`
`
`
`
`
`
`
`with aggregations within a single base table because there is
`
`
`
`
`
`
`
`
`only one table of interest, but, in practice, the assumption or
`
`
`
`
`
`
`
`
`
`
`likelihood that only one table has been updated before the
`
`
`
`
`
`
`refresh is called makes this approach impractical.
`
`
`
`
`
`
`
`Another approach restricts the base table operations to
`
`
`
`
`
`
`
`
`only INSERTs or only DELETEs. Thus,the pre-update state
`
`
`
`
`
`
`
`
`can easily be computed by determining the new rows(for
`
`
`
`
`
`
`
`
`
`
`INSERTs) or the rows that are no longer in the table (for
`
`
`
`
`
`
`DELETEs), and essentially reversing these operations.
`
`
`
`
`
`
`
`However, if an arbitrary mix of INSERT, UPDATE and
`
`
`
`
`
`
`
`
`DELETEoperations has been performed on more than one
`
`
`
`
`
`
`
`
`
`then the determination of the pre-update state is
`table,
`
`
`
`
`
`
`
`
`
`non-trivial since the same row could have been inserted,
`
`
`
`
`
`
`
`
`updated and deleted many times. Hereinafter,
`the terms
`
`
`
`
`
`
`“INSERT,” “DELETE,” and “UPDATE,” when referenced
`
`
`
`
`
`
`
`
`in all capitals, each refer to an independenttable operation.
`
`
`
`
`
`
`
`Still another approach is to use a memoryless algorithm.
`This meansthat all rows from the materialized view that
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`could have been affected by any of the modified rows of base
`
`
`
`
`
`
`
`
`tables identified in a materialized view change log are
`
`
`
`
`
`
`
`
`removed. Then, the materialized view query is reissued and
`restricted to the new versions of these rows. Such an
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`approach may be sufficient for materialized views without
`
`
`
`
`
`
`
`aggregation, as each row in the base table is individually
`
`
`
`
`
`
`
`represented in the materialized view. However, due to the
`
`
`
`
`
`
`
`nature of aggregate functions, several rows in a base table
`
`
`
`
`
`
`
`
`are aggregated downto a few rowsin the materialized view.
`
`
`
`
`
`
`
`
`Therefore, multiple rows in a base table can affect any
`
`
`
`
`
`
`particular row in the materialized view. Consequently, a
`
`
`
`
`
`
`
`memoryless refresh approachisnotefficient since it requires
`
`
`
`
`
`
`
`
`
`access to rows other than the changed rows. In addition,
`
`
`
`
`
`
`
`multiple changes performed on multiple rows in the base
`
`
`
`
`
`
`
`
`
`table could change any single row in the materialized view.
`
`
`
`
`
`
`
`Hence, memoryless refresh may at times be worse than a
`
`
`
`
`
`
`
`complete refresh due to the overhead of identifying affected
`rows.
`
`
`Page 16 of 27
`
`
`
`10
`
`15
`
`
`
`20
`
`25
`
`
`
`35
`
`40
`
`
`
`45
`
`
`
`50
`
`
`
`55
`
`
`
`60
`
`
`
`65
`
`
`
`
`6
`
`
`
`
`
`
`Based on the foregoing, it is clearly desirable to provide
`
`
`
`
`
`a mechanism that incrementally refreshes a materialized
`
`
`
`
`
`
`view after arbitrary multiple operations on a row of a base
`
`
`
`
`
`
`table. Furthermore it is desirable to provide a mechanism
`
`
`
`
`
`
`that incrementally refreshes a materialized view after opera-
`
`
`
`
`
`tions to rows of multiple base tables.
`SUMMARYOF THE INVENTION
`
`
`
`
`
`
`
`
`A method is provided for incrementally refreshing a
`
`
`
`
`
`
`materialized view after multiple operations on a row of a
`
`
`
`
`
`
`
`
`
`base table of the materialized view, by determining an
`
`
`
`
`
`
`
`
`equivalent operation for the multiple operations and refresh-
`
`
`
`
`
`
`
`
`ing the materialized view according to the equivalent opera-
`
`
`
`
`
`
`
`tion. The method is applicable to a materialized view based
`
`
`
`
`
`
`
`
`on multiple base tables on which multiple operations have
`
`
`
`
`
`
`
`been performed. Thus, a materialized view defined by a
`
`
`
`
`
`
`
`
`
`query that joins two or more base tables or that aggregates
`
`
`
`
`
`
`
`
`
`values from multiple rows of a base table or multiple base
`
`
`
`
`
`
`
`tables, can beefficiently refreshed. Furthermore, the method
`
`
`
`
`
`
`
`
`is applicable to operations to base tables which occurred
`
`
`
`
`
`
`
`independently of each other and INSERToperations which
`
`
`
`were performed in bulk.
`
`
`
`
`
`
`
`According to embodiments, the step of determining the
`
`
`
`
`
`
`
`
`equivalent operation can include identifying rows for which
`
`
`
`
`
`
`
`
`an earliest operation from a plurality of operations was a
`
`
`
`
`
`
`
`
`DELETE operation, or rows for which a latest operation
`
`
`
`
`
`
`from a plurality of operations was an INSERToperation, or
`
`
`
`
`
`
`
`
`
`a combination of the two. In addition, according to one
`
`
`
`
`
`
`
`
`embodiment, the step of refreshing the materialized view
`
`
`
`
`
`
`includes performing an inverse operation of the equivalent
`
`
`
`
`
`
`
`
`operation to determine a pre-update state of the row, and
`
`
`
`
`
`
`
`refreshing the materialized view based on the pre-update
`state.
`
`Additional embodiments are described which enhance the
`
`
`
`
`
`
`
`
`
`
`
`
`performance of materialized view queries.
`In one such
`
`
`
`
`
`
`
`embodiment, the base tables are ordered according to the
`
`
`
`
`computational complexity of determining their pre-update
`
`
`
`
`
`
`state based on the rows operated upon, and the materialized
`
`
`
`
`
`
`
`view is refreshed by applying the more complex base tables
`
`
`
`
`
`
`
`before the less complex base tables. Other embodiments
`
`
`
`
`
`
`
`
`combine queries for defining rows for which the earliest
`