Sequentiality and Prefetching in Database Systems

ALAN JAY SMITH

University of California-Berkeley
Sequentiality of access is an inherent characteristic of many database systems. We use this observation to develop an algorithm which selectively prefetches data blocks ahead of the point of reference. The number of blocks prefetched is chosen by using the empirical run length distribution and conditioning on the observed number of sequential block references immediately preceding reference to the current block. The optimal number of blocks to prefetch is estimated as a function of a number of "costs," including the cost of accessing a block not resident in the buffer (a miss), the cost of fetching additional data blocks at fault times, and the cost of fetching blocks that are never referenced. We estimate this latter cost, described as memory pollution, in two ways. We consider the treatment (in the replacement algorithm) of prefetched blocks, whether they are treated as referenced or not, and find that it makes very little difference. Trace data taken from an operational IMS database system is analyzed and the results are presented. We show how to determine optimal block sizes. We find that anticipatory fetching of data can lead to significant improvements in system operation.
`
Key Words and Phrases: prefetching, database systems, paging, buffer management, sequentiality, dynamic programming, IMS
CR Categories: 3.73, 3.74, 4.33, 4.34
`
1. INTRODUCTION
Database systems are designed to insert, delete, retrieve, update, and analyze information stored in the database. Since every query or modification to the database will access at least one target datum, and since the analysis associated with a query is usually trivial, the efficiency of most database systems is heavily dependent on the number of I/O operations actually required to query or modify and the overhead associated with each operation.
One method used in some systems to reduce the frequency of I/O operations is to maintain in a main memory buffer pool a number of the blocks of the database. Data accesses satisfied by blocks found in this buffer will take place
`
`
Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
The author was supported during the bulk of this research as a visitor at the IBM San Jose Research Laboratory. Partial support was later provided by the National Science Foundation under Grant MCS-75-06768. Some computer time was furnished by the Energy Research and Development Administration under Contract E(04-3)515.
Author's address: Department of Electrical Engineering and Computer Sciences and the Electronics Research Laboratory, University of California-Berkeley, Berkeley, CA 94720.
© 1978 ACM 0362-5915/78/0900-0223 $00.75
ACM Transactions on Database Systems, Vol. 3, No. 3, September 1978, Pages 223-247.
`
`
much more quickly and usually with much less computational overhead. IMS (Information Management System/360) [10-12] uses this strategy, as we discuss later. The INGRES database system [9] makes use of the I/O buffer pool maintained by UNIX [19] for this same purpose. There is considerable scope for optimizing the operation of the buffer pool, principally by controlling the selection
`
A. Sequentiality
A characteristic of the database system that we study in this paper, and we believe a characteristic of many other systems, is sequentiality of access. Many queries require scans of an entire database in order to compute an aggregate. An example might be: "Find the average salary of all residents of New York City," which would probably be calculated by scanning sequentially over the appropriate section of the database. In some systems a simple lookup evokes a sequential scan of the entire database, if the database is not indexed on the key, or a partial scan, if the index is not precise. For example, a query about "John Jones" might be answered by using an index to find the beginning of the records for names starting with "J," and then searching forward. "Range" queries, in which the database is searched for records with keys in a range, will also result in sequential accesses, either to the records themselves or at least to an index if the index exists
data differs significantly from the logical one, then little sequentiality can be expected.
A consistently sequential pattern of access will allow us to easily anticipate which data blocks, segments, or tuples are likely to be accessed next and to fetch them before their use is required or requested. In systems in which anticipatory data fetching is less costly than demand fetching, we can expect a decrease in the cost of I/O activity. It is obvious, and under certain conditions has been shown formally [1], that when multiple block or anticipatory fetching is no "cheaper" per block than demand fetching, demand fetching is an optimal policy. We contend, and we shall discuss this in greater detail in a later section of this paper, that fetching segments or blocks in advance of their use, and in particular fetching several segments or blocks at once, is significantly less costly per block fetched (with our cost functions) than individual demand fetching.
B. Previous Research
[2, 8, 25-28]. Sequential prefetching of lines for cache memory has been shown to work very well [24] because the extent of sequentiality is large compared to the cache page (line) size; for most programs the sequentiality is not large compared to the main memory page size. Prefetching of I/O data streams has also worked well [22, 23] because of the frequent use of purely sequential files.
C. Information Management System/360
The previous research regarding prefetching which is discussed above has all been oriented toward the analysis and use of data describing real system operation. We shall do likewise, and in later sections of this paper we present data taken from an operational and long running Information Management System/360 (IMS), a data management system marketed by the IBM Corporation. This particular system has been in operation for a number of years in an industrial environment. The results observed are believed to be illustrative of an actual application, but they cannot be taken as typical or representative for any other IMS system. Despite this disclaimer, and the fact that our research is based on data taken from a specific installation running a specific database system, we believe that sequentiality is a characteristic common to many database systems, and that therefore the methodology developed in this paper for sequential prefetching will also be applicable to other IMS installations and to other database systems.
In order to provide some background for the analysis of our data, we describe below some of the significant features of IMS. A readable and more complete discussion can be found in a book by Date [5], which also discusses other database systems and organizations. The IMS manuals [10-12] provide much more thorough coverage.
IMS is a hierarchical database system; that is, segments (tuples) in a database are arranged logically in a tree structure such as that indicated in Figure 1. An IMS implementation may consist of several databases, each of which consists of
Fig. 1. [Figure: an example IMS database tree. A "College" root segment with dependent segments including Departments, Chairman, Room, Instructor, and Students; sample student rows read "Robin F 1104 Senior," "Jayhawk L 8302 Senior," and "Finch C 7614 Junior."]
`
a large number of such "trees." The roots may or may not be sorted and may or may not be indexed. In the system measured there were several databases; each had the roots indexed. Access to units within a tree is through the root. Within the tree, data is stored in a top-down, left-to-right order. That is, the following items from Figure 1 would be stored in this order: Engineering, Mechanical Engineering, Brown, M24 Basket Weaving, 39-311, 14:05, Prof. Smith, Finch C 7614 Junior, Jayhawk L 8302 Senior, Robin F 1104 Senior, M23 Floatation, ..., and so on.
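This top-down, left-to-right storage order is exactly a preorder traversal of the segment tree. As a minimal sketch (the dictionary layout and segment names below are our own illustration, loosely following Figure 1, not an IMS data structure):

```python
# Hierarchical sequential storage order = preorder traversal of the segment tree.
# The tree below is an illustrative stand-in loosely modeled on Figure 1.

def preorder(segment):
    """Yield segment names in top-down, left-to-right (storage) order."""
    yield segment["name"]
    for child in segment.get("children", []):
        yield from preorder(child)

college = {
    "name": "Engineering",                      # root segment
    "children": [
        {"name": "Mechanical Engineering",      # department
         "children": [
             {"name": "Brown"},                 # chairman
             {"name": "M24 Basket Weaving",     # course
              "children": [
                  {"name": "39-311"},           # room
                  {"name": "Prof. Smith"},      # instructor
              ]},
         ]},
    ],
}

storage_order = list(preorder(college))
```

Here `storage_order` begins with the root and visits each subtree completely before its right siblings, matching the order listed above.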
Two physical storage organizations are possible, hierarchical sequential and hierarchical direct. In the former case segments are examined consecutively in the order in which they are stored in order to find some position in the tree. In the latter case, which describes the databases in this system, certain segment instances are related by two-way pointers and these pointers can be used to skip over intervening data. In either case the data is accessed using DL/1 (Data Language/1), which implements nine operations. These are: GET UNIQUE (find a specific item), GET NEXT (get the next item), GET NEXT WITHIN PARENT (get the next item under the same root), GET HOLD UNIQUE, GET HOLD NEXT, GET HOLD NEXT WITHIN PARENT, INSERT, REPLACE, and DELETE. In each of the first six cases, the command can or must be qualified to partially or completely identify the tuple in question. GET HOLD (items 4, 5, and 6) is usually used prior to INSERT, DELETE, and REPLACE. Eighty-one percent of all accesses measured (see below) proved to be for lookup only (GET) rather than modification (INSERT, REPLACE, DELETE). Average frequency for INSERT, REPLACE, and DELETE was 11 percent, 0.7 percent, and 8 percent, respectively, with substantial variation between databases.
A search for a uniquely qualified item, such as GET UNIQUE (College(Engineering).Department(Mechanical).Chairman), will involve a search starting at the appropriate root and either scanning in sequential storage order if the storage organization is sequential, or following pointers if the organization is direct. Each of the segments referenced in the course of finding the target segment is called a path segment. A "GET NEXT" will also involve either sequential search or search following pointers. Unless the pointers lead immediately to the desired item, a significant amount of sequentiality will be evident in both cases; in following a path to the target the direction of search is always forward in the database.
A segment can be referenced only if it is in memory. For several efficiency reasons, IMS groups segments into fixed size storage units called physical blocks (henceforth called blocks). In the particular implementation in question the block size was 1690 bytes for data blocks and 3465 bytes for blocks used to hold index entries for the root nodes. The blocks are the unit of physical storage and data transmission; a request for a segment will result in the block containing the segment being transmitted, if necessary, from secondary storage to a buffer pool of blocks kept in main memory.
The search for a target segment proceeds as follows:
i. Determine the first path segment to be examined.
ii. Search the buffer pool for the block containing the segment. If the block is missing, then fetch the block (and remove a block if necessary to make room).
iii. Find the segment within the block. If this is the target segment, then perform the desired operation. Otherwise, determine the next path segment and continue at step ii.
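The three steps can be sketched as a simple loop. The constants, buffer-pool representation, and callback names below are hypothetical placeholders for illustration, not the IMS implementation:

```python
SEGMENTS_PER_BLOCK = 20   # illustrative: ~1690-byte blocks / ~80-byte segments
POOL_CAPACITY = 4         # illustrative buffer-pool size, in blocks

def find_target(first_segment, target, buffer_pool, fetch_block, next_path_segment):
    """Sketch of the three-step search; segments are numbered, blocks group them."""
    segment = first_segment                      # step i
    while True:
        block = segment // SEGMENTS_PER_BLOCK    # block containing this segment
        if block not in buffer_pool:             # step ii: miss -> fetch
            if len(buffer_pool) >= POOL_CAPACITY:
                buffer_pool.pop(next(iter(buffer_pool)))  # make room (policy unspecified)
            buffer_pool[block] = fetch_block(block)
        if segment == target:                    # step iii: target found
            return buffer_pool[block]
        segment = next_path_segment(segment)     # otherwise, next path segment
```

With `next_path_segment = lambda s: s + 1`, a sequential scan faults only once per block, which is the behavior the prefetching analysis below exploits.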
We note that segments are commonly much smaller than blocks; for our data the average segment size was 80 bytes. A large block size will result in many segments being fetched at once; because of the sequential nature of segment
was used without change, except that the reloading ensured that the logical and physical database organizations coincided.
The database system used was IMS/360 version 2.4, running under OS/VS2, release 1.6. The total size of the entire database was about 200 megabytes.
Our data analysis, in later sections, uses three sections of the entire seven-day block reference trace. The part labeled "full tape" is the trace for the first day. This first-day tape actually was generated in seven sections of approximately equal size; the parts of the trace referred to as "part 1" and "part 2" are just the first two segments of this day-long trace.
This data was gathered by researchers at the IBM San Jose Research Laboratory; for further information on IMS, the data gathering methodology, and the data itself, see the papers by Tuel and Rodriguez-Rosell [30], Rodriguez-Rosell and Hildebrand [21], Ragaz and Rodriguez-Rosell [18], Rodriguez-Rosell [20], Gaver, Lavenberg, and Price [7], Lavenberg and Shedler [14], Lewis and Shedler [16], Lavenberg and Shedler [15], and Tuel [29].
2. OPTIMIZATION OF PREFETCHING
A. The Model
As was discussed in the beginning of this paper, the efficiency of operation of a database system can be increased by increasing the fraction of data references captured by the main memory buffer. The only general method known by the author to be effective in increasing the buffer hit ratio (for a fixed size buffer) is to take advantage of sequentiality in the series of accesses, either by prefetching or by increasing the block size. This issue is discussed further in the data analysis section of this paper; for the formulation of our optimization results we rely on a model as follows:
i. The physical layout of the database is the linear projection (as defined earlier) of the logical structure of the database. Segments in the database will be assumed to be numbered consecutively from 1 to S. Blocks in the database
`
(physical groupings of adjacent segments) are numbered consecutively from 1 to B.
ii. The reduced block reference string is the concatenation of a sequence of runs, each run having a length that is independently and identically distributed with probability mass function l(k), k = 1, 2, ....
The reduced block reference string (RBRS) is the sequence of block references from which all immediate rereferences have been deleted. Thus if the original block reference string was b_i, b_i, b_j, b_j, b_j, b_i, b_k, b_g, b_g, ..., then the RBRS would be b_i, b_j, b_i, b_k, b_g, .... It isn't known whether knowledge of immediate rereferencing provides useful information for either prefetching or replacement, but we choose not to use that information here.
A run is a sequence of reduced block references, such that the blocks are numbered consecutively. More specifically, let b_j, b_{j+1}, ..., b_{j+k}, b_{j+k+1} be an RBRS, and let b_{j+1} ≠ b_j + 1, b_{j+k+1} ≠ b_{j+k} + 1, and b_i + 1 = b_{i+1} for j < i < j + k. Then the RBRS sequence b_{j+1}, ..., b_{j+k} is defined to be a run of length k. We define the predecessor of block b_i to be block b_i − 1 and the successor to be block b_i + 1. From our definition, it may be seen that a block which is neither preceded in the reference string by its predecessor nor succeeded by its successor constitutes a run of length 1.¹
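Both definitions translate directly into code. A sketch (the function names are ours, not tooling from the paper): `reduce_string` deletes immediate rereferences to form the RBRS, and `run_lengths` splits an RBRS into maximal consecutively numbered runs:

```python
def reduce_string(refs):
    """Reduced block reference string: delete immediate rereferences."""
    out = []
    for b in refs:
        if not out or b != out[-1]:
            out.append(b)
    return out

def run_lengths(rbrs):
    """Lengths of the maximal runs of consecutively numbered blocks in an RBRS."""
    runs, k = [], 1
    for prev, cur in zip(rbrs, rbrs[1:]):
        if cur == prev + 1:          # cur is prev's successor: run continues
            k += 1
        else:                        # run broken: record its length
            runs.append(k)
            k = 1
    if rbrs:
        runs.append(k)
    return runs
```

A block preceded by neither its predecessor nor followed by its successor comes out as a run of length 1, as in the definition above.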
We denote the probability distribution for the run length as follows:
l(k) = the probability mass function for the run length; l(k) is the probability that a run is exactly k blocks long.
L(k) = the cumulative mass function for the run length = Σ_{i=1}^{k} l(i).
L^c(k) = the survivor function for the run length = 1 − L(k).
L^c(j|k) = the survivor function, given that the run is of length k or greater, equal to L^c(j)/L^c(k − 1), j ≥ k; equal to 1 otherwise.
H(k) = the hazard rate, equal to l(k)/L^c(k − 1), the probability that the run ends with the kth element given that it is at least k blocks long.
L̄ = mean run length = Σ_{i≥1} i·l(i).
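Given an observed sample of run lengths, each of these functions can be estimated empirically. A sketch, with our own indexing convention (list position k − 1 holds the value for run length k):

```python
from collections import Counter

def run_length_stats(runs):
    """Empirical l(k), L(k), survivor L^c(k), hazard H(k), and mean run length."""
    n = len(runs)
    kmax = max(runs)
    counts = Counter(runs)
    l = [counts[k] / n for k in range(1, kmax + 1)]   # l(k) at index k - 1
    L, cum = [], 0.0
    for p in l:                                       # L(k): cumulative mass
        cum += p
        L.append(cum)
    Lc = [1.0 - x for x in L]                         # survivor L^c(k) = 1 - L(k)
    # hazard H(k) = l(k) / L^c(k - 1), with L^c(0) = 1
    H = [l[k - 1] / (Lc[k - 2] if k > 1 else 1.0) for k in range(1, kmax + 1)]
    mean = sum(k * c for k, c in counts.items()) / n  # mean run length
    return l, L, Lc, H, mean
```

These empirical estimates are exactly what the optimization in the next section conditions on.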
`
B. The Costs of Sequential Prefetching
Our probabilistic model for the reference string as a concatenation of runs of independent and identically distributed length suggests that we condition on the previously observed run length to decide how many blocks ahead to prefetch. To do so, we must determine the utility of successful prefetches and the problems associated with unsuccessful ones. We therefore identify several costs in executing I/O operations: CPU time is necessary to initiate I/O operations and to update all appropriate tables. The transfer of data requires memory cycles and thus interferes with CPU operation. Transfer time, and often seek and latency time, depending on the secondary storage device, prevent data from becoming immediately available. The CPU in this case must either remain idle while the information is fetched, or it must switch to another task, which again involves overhead. Every time a block is fetched it may displace another block that is about to be rereferenced. Thus, excessive fetching of unused blocks may actually increase the miss ratio.
¹ The function of our definition of a run is to allow us to estimate the utility of prefetching by assuming that blocks are only used in runs. In fact, blocks may be referenced out of order, and a prefetched block might be used, but not as part of the run for which it was fetched. Thus we could define a generalized run such that block b_i is the kth element of a generalized run if the predecessor of block b_i is in memory at the time block b_i is referenced and if block b_i − 1 is the (k − 1)-st element of a generalized run. Such a generalization is beyond the scope of this study, both because of its complexity and because of its dependence on the size of the memory.
We shall group all of the above costs into three strategy costs and one effect cost. Possible strategies, which involve strategy costs, are the following: (a) fetch a block on demand (as needed), (b) fetch a block in advance but not simultaneously with a demand fetch, and (c) fetch additional sequentially adjacent blocks ("tag-along" blocks) when a block is fetched according to strategy (a) or (b). We identify these costs as follows:
DFC: Demand Fetch Cost—the cost of fetching one block when it is needed immediately.
PFC: Pre-Fetch Cost—the cost of fetching one block when it is not needed immediately and when it is not a "tag-along" block.
TAC: Tag-Along Cost—the cost for each additional block transferred from secondary storage when the initial block was fetched by strategy (a) or (b).
We also identify the following effect cost:
BFC: Bad Fetch Cost—the cost, in additional fetches, resulting from bringing in a block that is not actually used. This latter effect is also known as memory pollution.
In determining and evaluating our cost function, we will make the following assumptions in addition to those of our model:
iii. DFC, PFC, TAC, and BFC include all important costs, and further, the total cost can be expressed as a linear combination of these individual costs.
iv. Blocks that are prefetched and used while still in memory do not contribute to memory pollution.
v. The effect of memory pollution is independent of the block replacement algorithm and the memory capacity.
vi. There is nothing other than run length that contributes to effective prediction.
We will discuss the validity of some of these assumptions in later sections of this paper.
Let us assume some strategy for prefetching. Associated with this strategy is a set of numbers E(DF), E(PF), E(TA), and E(BF), where these are respectively the expected number of demand fetches, prefetches, tag-along blocks, and bad fetches per run. The expected cost per run C under this strategy is then

C = DFC*E(DF) + PFC*E(PF) + TAC*E(TA) + BFC*E(BF).   (1)

From our assumptions above, specifically from ii and vi, to optimize system operation it is sufficient to minimize the expected cost per run by choosing the best prefetching strategy. We find that strategy below.
`
C. Prefetch Strategy Optimization
We shall consider two classes of prefetch policies; the second will be a generalization of the first:
Class 1 prefetch policies permit any number j of additional sequential blocks to be prefetched at a time when a demand fetch is required. No fetches are permitted at other times. In many systems, PFC is comparable to DFC, so this restriction follows naturally.
Class 2 prefetch policies permit any number j of additional sequential blocks to be prefetched beyond the current reference at any time, whether or not a miss occurred on this reference. From our set of costs, it is clear that there is no advantage to prefetching if at the time of a reference to block b_k, block b_k + 1 is already in memory. Therefore it is assumed that prefetches are initiated only when the current reference causes a fault or the block beyond the current reference is not in the buffer.
1. Class 1 Prefetch Optimization. We define the remaining cost for a run C(k), and the minimum expected remaining cost for a run C_min(k), as follows: The remaining cost C(k) for a run is the expected cost of fetching all remaining blocks of the current run, given that (exactly) k − 1 blocks have already been fetched and that the run is k or more blocks in total length. The minimum expected remaining cost is taken over all possible class 1 fetch policies.
The value of C_min(k) may be simply computed: It is the cost of fetching the kth block in the run (DFC), plus the cost of j additional blocks (j*TAC, j ≥ 0), plus the cost of the remainder of the run (L^c(k + j|k)*C_min(k + j + 1)), plus the pollution cost (BFC*Σ_{i=0}^{j−1} l(k + i)*(j − i)/L^c(k − 1)). This may be seen by inspection (when the terms are collected) to be:

C_min(k) = DFC + min_{j≥0} [ j*TAC + L^c(k + j|k)*C_min(k + j + 1) + BFC*Σ_{i=0}^{j−1} l(k + i)*(j − i)/L^c(k − 1) ].   (2)

The mean minimum cost per run is then C_min(1), and the mean cost per (reduced) reference is C_min(1)/L̄. (The cost per actual reference can be obtained by dividing the cost per reduced reference by the ratio of real references to reduced references.)
If the run length is bounded, C_min may be computed straightforwardly: Let R_max be the largest possible run length. Then C_min(R_max) = DFC. C_min(k), k < R_max, is a function of C_min(j), R_max ≥ j > k, and other known parameters only, and the computation is thus simple and proceeds from k = R_max down to k = 1. Since we shall estimate the run length distribution L(k) from the observed sample l(k), the run length is always bounded. We hypothesize that for "well-behaved" distributions L(k), arbitrary truncation of L(k) produces no ill effects. Since L^c(k + j|k) is always less than one, the effect of errors in C_min(j + k + 1) can be seen to die out.
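This backward computation of eq. (2) is a short dynamic program. A sketch under our own conventions (`l` is a dict mapping run length k to its estimated probability; the cost values shown are merely illustrative defaults, not values from the paper):

```python
def class1_optimal(l, DFC=1.0, TAC=0.2, BFC=1.0):
    """Backward DP for eq. (2): returns C_min and optimal tag-along counts a(k)."""
    Rmax = max(l)
    Lc = {0: 1.0}                                   # survivor function L^c(k)
    for k in range(1, Rmax + 1):
        Lc[k] = Lc[k - 1] - l.get(k, 0.0)
    Cmin = {Rmax + 1: 0.0}                          # nothing remains past R_max
    a = {}
    for k in range(Rmax, 0, -1):                    # proceed from k = R_max down
        best_cost, best_j = None, 0
        for j in range(Rmax - k + 1):               # candidate tag-along counts
            cont = (Lc[k + j] / Lc[k - 1]) * Cmin[k + j + 1]
            waste = BFC * sum(l.get(k + i, 0.0) * (j - i)
                              for i in range(j)) / Lc[k - 1]
            cost = j * TAC + cont + waste
            if best_cost is None or cost < best_cost:
                best_cost, best_j = cost, j
        Cmin[k] = DFC + best_cost
        a[k] = best_j
    return Cmin, a
```

For instance, if every run has length exactly 2, the optimal policy fetches one tag-along block on the fault to the first block of a run (a(1) = 1), and C_min(1) = DFC + TAC.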
The computation of C_min also provides the optimal fetch policy. At the time of a fault to the kth block in the run, it is optimal to fetch a(k) additional blocks, where a(k) is the value of j that minimizes the expression above in eq. (2). We note that after this computation has been performed, many values of C_min(k) and a(k) are superfluous since there will never be a demand fetch to the kth block in the run. For example, let the optimum values of a be respectively a = {1, 1, 3, 3,
2, 4, 4} for k = 1 to 7. Then a fetch to the first block in a run will result in its successor being fetched. Therefore, there can never be a fault to the second item in a run. Similarly, the fault to the third block in a run fetches also the fourth, fifth, and sixth. Thus it is sufficient to specify the fetch policy only at the possible fault points.
2. Class 2 Fetch Policies. In the case that fetching at other than fault times is permitted, we must define a slightly different cost function. Let Cf_min(k) be the minimum remaining future cost of a run, where this is defined to be the cost of accessing all remaining blocks of the run, given that the run is k or more blocks in length and that exactly k blocks have already been fetched. (Note the difference from the preceding definition.) We are led to this different definition because at the time that we reference block k in the run, we must decide whether to bring in block k + 1 at this time, or to wait to see if it is referenced first. The cost breaks down into two parts, depending on this decision, and may be seen to be:

Cf_min(k) = min { PFC + min_{j≥0} [ j*TAC + L^c(k + j|k)*Cf_min(k + j + 1) + BFC*Σ_{i=0}^{j} l(k + i)*(j + 1 − i)/L^c(k − 1) ],
                  L^c(k|k)*( DFC + min_{j≥0} [ j*TAC + L^c(k + j|k + 1)*Cf_min(k + j + 1) + BFC*Σ_{i=0}^{j−1} l(k + 1 + i)*(j − i)/L^c(k) ] ) }.   (3)

Again, we obtain the optimal fetch policy as a consequence of our computation of Cf_min. Let β(k) be the value of j that minimizes the first term in eq. (3) above, and let γ(k) be the value of j that minimizes the second term. If the first term is less, it is optimal to fetch a number of additional blocks (β(k) + 1 blocks) even though it is not a fault time; if the second term is less, it is optimal to wait until the (k + 1)-st block is referenced (if it is), at which time γ(k) additional blocks should be fetched.
The mean minimum cost per run is DFC + Cf_min(1), and the mean minimum cost per (reduced) reference is (DFC + Cf_min(1))/L̄.
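The same backward dynamic program extends to eq. (3). A sketch under the same conventions as the class 1 code (our own indexing; default cost values are illustrative), returning the mean minimum cost per run, DFC + Cf_min(1):

```python
def class2_cost(l, DFC=1.0, PFC=0.7, TAC=0.2, BFC=1.0):
    """Backward DP for eq. (3); l maps run length k to its probability."""
    Rmax = max(l)
    Lc = {0: 1.0}                                   # survivor function L^c(k)
    for k in range(1, Rmax + 1):
        Lc[k] = Lc[k - 1] - l.get(k, 0.0)
    Cf = {Rmax: 0.0}                                # Cf_min(R_max) = 0: run is over
    for k in range(Rmax - 1, 0, -1):
        # Branch 1: prefetch block k+1 plus j tag-alongs now (PFC + j*TAC).
        b1 = min(
            PFC + j * TAC
            + (Lc[k + j] / Lc[k - 1]) * Cf[k + j + 1]
            + BFC * sum(l.get(k + i, 0.0) * (j + 1 - i)
                        for i in range(j + 1)) / Lc[k - 1]
            for j in range(Rmax - k))
        # Branch 2: wait; on the demand fetch of block k+1, bring j tag-alongs.
        if Lc[k] > 0:
            b2 = (Lc[k] / Lc[k - 1]) * min(
                DFC + j * TAC
                + (Lc[k + j] / Lc[k]) * Cf[k + j + 1]
                + BFC * sum(l.get(k + 1 + i, 0.0) * (j - i)
                            for i in range(j)) / Lc[k]
                for j in range(Rmax - k))
        else:
            b2 = 0.0                                # run surely ends at block k
        Cf[k] = min(b1, b2)
    return DFC + Cf[1] if Rmax > 1 else DFC         # mean minimum cost per run
```

When PFC < DFC and runs are reliably longer than one block, branch 1 wins and the policy prefetches ahead of the fault, exactly the case the text argues for.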
3. COST ESTIMATION
A. Estimation of DFC, PFC, and TAC
In Section 2C we described an algorithm for computing the cost to bring into the memory buffer the blocks in one run as a function of four parameters, DFC (demand fetch cost), PFC (prefetch cost), TAC (tag-along cost), and BFC (bad fetch cost), values for which have not yet been assigned. In this section we shall discuss the choice of values for these parameters.
We will arbitrarily assign DFC a value of 1.0. All other costs will be assigned relative to this fixed value. Since our cost structure uses a relative and not an absolute scale, we have lost no generality.
The values assigned to PFC and TAC (relative to DFC) are a function of the architecture of the I/O devices, the load on the system, the quality of the system implementation, and the degree of multiprogramming, among other factors. The costs for a prefetch include the CPU time involved in constructing and initiating the channel program; the CPU idle time induced by the period during which the channel, controller, and device are unavailable; the memory interference produced by the transfer; and the possible delay in completing the fetch should the prefetched block be needed quickly. The tag-along cost is almost the same as the above, but includes the marginal additional cost for the channel program per block transferred rather than the bulk of the overhead. Each of the components of these costs will vary among machines and operating systems, with time dependent factors such as the load on the system, and with the system configuration (number and type of I/O devices). We will therefore discuss some different scenarios and some cost functions which might follow. Specific examples of cost functions will be chosen and calculations presented in Section 4, where we discuss the data we have analyzed. We note that the only way to determine a set of verifiable values for these costs is to vary the prefetching algorithm on a specific system driven repeatedly by the same benchmark and to estimate the parameters statistically from the observed behavior.
The simplest case is one in which the system is completely I/O bound; that is, the system is assumed to always be waiting for an I/O operation to complete. Consider quarter track blocks and random seeks; then the mean transfer, latency, and seek times for the IBM 2314 disk are respectively 6.25 msec, 12.5 msec, and 60 msec. For the 3330 disk, the values are 4.2 msec, 8.4 msec, and 30 msec. (In fact, random seeks are unlikely (see Smith [22, 23] for a relevant discussion), so the mean seek time should probably be less than given.) The marginal wait for a tag-along block is then 6.25/(60 + 12.5 + 6.25) relative to the demand fetch wait for the 2314, or 0.079 (0.099 for the 3330). For half track data sets, the values would be 0.147 and 0.172 for the two disks. Adding in the costs of the other overheads mentioned above, a value of 0.2 for TAC seems reasonable. The value of PFC is higher, since it includes the overhead for performing the I/O and may include more nonoverlapped transfer time. PFC is almost certainly lower than DFC since some of the I/O wait time is assumed to be overlapped with the processing of the remaining blocks in the run. A value of 0.7 seems to be a plausible value for PFC.
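The tag-along ratio above is just transfer time divided by the total demand-fetch time; a quick check of the arithmetic with the device timings as given:

```python
def tag_along_ratio(transfer_ms, latency_ms, seek_ms):
    """Marginal wait for a tag-along block relative to a full demand fetch."""
    return transfer_ms / (seek_ms + latency_ms + transfer_ms)

# Quarter-track blocks, random seeks:
ibm_2314 = tag_along_ratio(6.25, 12.5, 60.0)   # ~0.079
ibm_3330 = tag_along_ratio(4.2, 8.4, 30.0)     # ~0.099
```

Larger blocks raise the ratio (more of the access time is transfer), which is why the half-track figures are roughly twice the quarter-track ones.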
An alternative assumption for system operation is that the CPU is fully utilized and that the demands on the CPU limit the throughput of the system. This would be reasonable in a system with many storage (I/O) devices and a great deal of memory, so that all I/O could easily be overlapped with computation. In this case the values assumed by DFC, PFC, and TAC depend on the exact time it takes to execute the appropriate code (I/O initiation, task switching, if any, etc.). It has been suggested to the author that values of 0.2 and 0.9 for TAC and PFC seem reasonable.
For other cases, intermediate between I/O bound and CPU bound, the values assigned to the cost variables will vary. It is to be hoped that because of the uncertainty in the cost variables, the cost function will be relatively flat for varying degrees of prefetching. Computations performed by the author indicate that this is the case.
B. Estimation of BFC
It remains to estimate a value for BFC, the additional cost for fetching a block that is never used. This additional cost arises in the case that a block is removed from the buffer pool in order to make room for the new fetch and that this old block is quickly rereferenced. We assume that every block that is expelled from the buffer and then reused is fetched at a cost of 1.0; we now estimate the probability that a block so expelled is reused.
Before we describe two methods of calculating the value of BFC, we define a specific paging algorithm. In MULTICS [4] and CP-67 [3], a paging algorithm commonly referred to as the "clock" paging algorithm is used. We define here the:
GENERALIZED CLOCK PAGE REPLACEMENT ALGORITHM. With each page frame in memory we associate a count field and we arrange these count fields in a circular list. Whenever a page is referenced, the associated count field is set to i. When a page fault occurs, a pointer that circles around this circular list of page frames is observed. If the count field pointed to is zero, then the page is removed and the new page is placed in that frame. Otherwise, the count is decremented by 1, the pointer is advanced to the next count field, and the process is repeated. When a new page is placed in the page frame, the count field is set to i if the page is to be referenced (demand fetch) and it is set to j if the page has been prepaged and is not immediately referenced. We abbreviate this algorithm by writing CLOCKP(j, i). The "P" indicates that this is a prepaging algorithm (the prepaging strategy has not been specified). When no prepaging is involved, the algorithm is abbreviated CLOCK(i). The algorithm used in MULTICS and CP-67 is CLOCK(1).
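A sketch of CLOCKP(j, i) as just defined; the concrete bookkeeping (a list of (page, count) pairs plus an index dictionary) is our own choice, not part of the definition:

```python
class GeneralizedClock:
    """CLOCKP(j, i): count fields arranged in a circular list of page frames."""

    def __init__(self, frames, i, j):
        self.capacity, self.i, self.j = frames, i, j
        self.pages = []       # (page, count) pairs; circular via self.hand
        self.index = {}       # page -> position in self.pages
        self.hand = 0

    def reference(self, page):
        """Reference a page; returns True on a page fault, False on a hit."""
        if page in self.index:
            self.pages[self.index[page]] = (page, self.i)   # referenced: count := i
            return False
        self._place(page, self.i)                           # demand fetch: count := i
        return True

    def prefetch(self, page):
        """Prepage a page; its count field is set to j, not i."""
        if page not in self.index:
            self._place(page, self.j)

    def _place(self, page, count):
        if len(self.pages) < self.capacity:                 # free frame available
            self.index[page] = len(self.pages)
            self.pages.append((page, count))
            return
        while True:                                         # advance the hand
            victim, c = self.pages[self.hand]
            if c == 0:                                      # zero count: replace
                del self.index[victim]
                self.pages[self.hand] = (page, count)
                self.index[page] = self.hand
                self.hand = (self.hand + 1) % self.capacity
                return
            self.pages[self.hand] = (victim, c - 1)         # else decrement, move on
            self.hand = (self.hand + 1) % self.capacity
```

With i = 1 and no calls to `prefetch`, this behaves as the CLOCK(1) algorithm used by MULTICS and CP-67; setting j < i makes unreferenced prepaged pages cheaper to evict.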
We have chosen to define this algorithm in such a manner because the existence of the parameters i and j will facilitate certain experiments to be described.
We will also have occasion to refer to the Working Set paging algorithm [6] and the Least Recently Used (LRU) paging algorithm [17], with which we assume the reader is familiar.
We will employ two methods to estimate BFC; both methods are rather ad hoc