`
ALGORITHMS + DATA STRUCTURES =

PROGRAMS
`
`NIKLAUS WIRTH
`
Eidgenössische Technische Hochschule
Zürich, Switzerland
`
`PRENTICE-HALL, INC.
`
`ENGLEWOOD CLIFFS, N.J.
`
`
`
`
Library of Congress Cataloging in Publication Data

Wirth, Niklaus.
    Algorithms + data structures = programs.

    Bibliography: p.
    Includes index.
    1. Electronic digital computers - Programming.
    2. Data structures (Computer science)  3. Algorithms.
    I. Title.
QA76.6.W56    001.6'42    75-11599
ISBN 0-13-022418-9
`
`© 1976
`by PRENTICE-HALL, INC.
`Englewood Cliffs, New Jersey
`
`All rights reserved. No part of this
`book may be reproduced in any form
`or by any means without permission
`in writing from the publisher.
`
`10 9 8 7 6
`
`Printed in the United States of America
`
`PRENTICE-HALL INTERNATIONAL, INC., London
`PRENTICE-HALL OF AUSTRALIA, PTY., LTD., Sydney
`PRENTICE-HALL OF CANADA, LTD., Toronto
`PRENTICE-HALL OF INDIA PRIVATE LIMITED, New Delhi
`PRENTICE-HALL OF JAPAN, INC., Tokyo
`PRENTICE-HALL OF SOUTHEAST ASIA (PTE.) LTD., Singapore
`
`To Nani
`
`
`
`
`
var s, t: course;
    trialset: selection;
begin s := 1;
  while ¬(s in remaining) do s := s+1;
  session := [s];
  trialset := remaining - conflict[s];
  for t := 1 to N do
    if t in trialset then
      begin if conflict[t] * session = [ ] then
        session := session + [t]
      end
end                                                        (1.30)
`
Evidently, this solution for selecting "suitable" sessions will not generate a
timetable which is necessarily optimal in any specific sense. In unfortunate
cases the number of sessions may be as large as that of courses, even if
simultaneous scheduling were feasible.
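
For reference, program (1.30) presupposes declarations introduced with the timetabling example earlier in this chapter. The following is only a sketch of that context; the value of N and the exact form of the declarations are assumptions made here for illustration.

    const N = 6;   {number of courses; value assumed}
    type course = 1 .. N;
         selection = set of course;
    var remaining: selection;                   {courses not yet assigned to a session}
        session: selection;                     {courses selected for the current session}
        conflict: array [course] of selection;  {conflict[c] = courses that clash with c}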
`
`1.10. REPRESENTATION OF ARRAY, RECORD,
`AND SET STRUCTURES
`
The essence of the use of abstractions in programming is that a program
may be conceived, understood, and verified on the basis of the laws governing
the abstractions, and that it is not necessary to have further insight and knowledge
about the ways in which the abstractions are implemented and represented
in a particular computer. Nevertheless, it is helpful for a successful
programmer to have an understanding of widely used techniques for representing
the basic concepts of programming abstractions, such as the fundamental
data structures. It is helpful in the sense that it might enable the
programmer to make sensible decisions about program and data design in
the light not only of the abstract properties of structures, but also of their
realizations on actual computers, taking into account a computer's particular
capabilities and limitations.
The problem of data representation is that of mapping the abstract
structure into a computer store. Computer stores are, in a first approximation,
arrays of individual storage cells called words. The indices of the
words are called addresses.

    var store: array [address] of word        (1.31)
`
`The cardinalities of the types address and word vary from one computer
`to another. A particular problem is the great variability of the cardinality
`of the word. Its logarithm is called the wordsize, because it is the number of
`bits that a storage cell consists of.
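
To make declaration (1.31) concrete, one may fix hypothetical cardinalities for the two types; the bounds below (a 16-bit word and 4096 cells) are assumptions chosen merely for illustration.

    const wordsize = 16;           {number of bits per storage cell}
          maxaddr  = 4095;         {assumed number of cells minus 1}
    type address = 0 .. maxaddr;
         word    = 0 .. 65535;     {2^wordsize distinct values per cell}
    var store: array [address] of word;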
`
`
`
`
`
`1.10.1. Representation of Arrays
`
`A representation of an array structure is a mapping of the (abstract)
`array with components of type T onto the store which is an array with
`components of type word.
The array should be mapped in such a way that the computation of
addresses of array components is as simple (and therefore efficient) as possible.
The address or store index i of the jth array component is computed by
the linear mapping function

    i = i₀ + j*s        (1.32)

where i₀ is the address of the first component, and s is the number of words
that a component "occupies." Since the word is by definition the smallest
individually accessible unit of store, it is evidently highly desirable that s
be a whole number, the simplest case being s = 1. If s is not a whole number
(and this is the normal case), then s is usually rounded up to the next larger
integer ⌈s⌉. Each array component then occupies ⌈s⌉ words, whereby ⌈s⌉ - s
words are left unused (see Figs. 1.5 and 1.6).
`
Fig. 1.5 Mapping an array onto a store.

Fig. 1.6 Padded representation of a record.
`
Rounding up of the number of words needed to the next whole number is called padding. The storage utilization
factor u is the quotient of the minimal amount of storage needed to
represent a structure and the amount actually used:

    u = s / ⌈s⌉        (1.33)
`
`Since an implementor will have to aim for a storage utilization as close to
`
`1 as possible, and since accessing parts of words is a cumbersome and
`relatively inefficient process, he will have to compromise. Following are the
`considerations to be made:
`
`1. Padding will decrease storage utilization.
`2. Omission of padding may necessitate inefficient partial word access.
`3. Partial word access may cause the code (compiled program) to expand
`and therefore to counteract the gain obtained by omission of padding.
`
`In fact, considerations 2 and 3 are usually so dominant that compilers will
`always use padding automatically. We notice that the utilization factor will
`always be u > 0.5, if s > 0.5. However, if s < 0.5, the utilization factor
`may be significantly increased by putting more than one array component
`into each word. This technique is called packing. If n components are packed
`into a word, the utilization factor is (see Fig. 1.7)
    u = n*s / ⌈n*s⌉        (1.34)
`
Fig. 1.7 Packing six components into one word.
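
As a worked illustration of (1.33) and (1.34), assume a component needs s = 0.25 words. Padding each component to a full word yields u = 0.25/⌈0.25⌉ = 0.25, whereas packing n = 4 components into one word yields u = (4*0.25)/⌈4*0.25⌉ = 1.0. The small program below merely evaluates the two formulas for these assumed figures.

    program utilization;
    {evaluates the storage utilization factors (1.33) and (1.34) for assumed values}
    const s = 0.25;  {words needed per component; assumed}
          n = 4;     {components packed into one word; assumed}

    function ceiling(x: real): integer;
    begin
      if x = trunc(x) then ceiling := trunc(x) else ceiling := trunc(x) + 1
    end;

    begin
      writeln('padded : u = ', s / ceiling(s) :6:2);
      writeln('packed : u = ', (n*s) / ceiling(n*s) :6:2)
    end.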
`
Access to the ith component of a packed array involves the computation
of the word address j in which the desired component is located, and the
computation of the component's position k within that word:

    j = i div n
    k = i mod n = i - j*n        (1.35)
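
The following sketch shows how (1.35) may be applied to extract the ith component (counting from 0) of a packed array held in the store; the choice of n, of the number of values per component, and the store declaration itself are assumptions for illustration.

    const n = 4;      {components per word; assumed}
          base = 16;  {number of distinct values per component (0..15); assumed}
    var store: array [0 .. 1023] of integer;  {assumed backing store}

    function component(i: integer): integer;
      var j, k, m, t: integer;
    begin
      j := i div n;   {word address of the word holding component i}
      k := i mod n;   {position of the component within that word}
      t := store[j];
      for m := 1 to k do t := t div base;   {discard the k lower components}
      component := t mod base               {mask off the higher components}
    end;
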
In most programming languages the programmer is given no control
over the representation of the abstract data structures. However, it should
be possible to indicate the desirability of packing, at least in those cases in
which more than one component would fit into a single word, i.e., when a
gain in storage economy by a factor of 2 or more could be achieved. We
introduce the convention of indicating the desirability of packing by prefixing
the symbol array (or record) in the declaration with the symbol packed.
`
EXAMPLE

    type alfa = packed array [1 .. n] of char

This feature is particularly valuable on computers with large words and
relatively convenient accessibility of partial fields of words. The essential
property of this prefix is that it in no way changes the meaning (or correctness)
of a program. This means that the choice of an alternative representation
can easily be indicated, with the implied guarantee that the meaning
of the program remains unaffected.
`
`
`
`
`
if p1↑.lh then
begin {RL} p2 := p1↑.left; p1↑.lh := false;
  p1↑.left := p2↑.right; p2↑.right := p1;
  p↑.right := p2↑.left; p2↑.left := p; p := p2
end
end else
begin h := h-1; if h ≠ 0 then p↑.rh := true
end
end else
begin p↑.count := p↑.count + 1; h := 0
end
end {search}
`
Note that the actions to be taken for node re-arrangement very strongly
resemble those developed in the balanced tree search algorithm (4.63). From
(4.87) it is evident that all four cases can be implemented by simple pointer
rotations: single rotations in the LL and RR cases, double rotations in the
LR and RL cases. In fact, procedure (4.87) appears slightly simpler than
(4.63). Clearly, the hedge-tree scheme emerges as an alternative to the
AVL-balance criterion. A performance comparison is therefore both possible and
desirable.
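
For comparison, the single rotation used in the LL and RR cases amounts to a few pointer assignments. The node type below is a deliberately stripped-down assumption (only the two son pointers are kept, and ^ is written for the up-arrow of the program texts); it is not the full record of procedure (4.87).

    type ref = ^node;
         node = record
           key: integer;
           left, right: ref
         end;

    procedure rotateRR(var p: ref);
      {single RR rotation: the right son p2 is lifted above p, and p becomes its left son}
      var p2: ref;
    begin
      p2 := p^.right;
      p^.right := p2^.left;
      p2^.left := p;
      p := p2
    end;
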
We refrain from involved mathematical analysis and concentrate on
some basic differences. It can be proven that the AVL-balanced trees are a
subset of the hedge-trees. Hence, the class of the latter is larger. It follows
that their path length is on the average larger than in the AVL case. Note in
this connection the "worst-case" tree (4) in Fig. 4.53. On the other hand, node
re-arrangement will be called for less frequently. The balanced tree will
therefore be preferred in those applications in which key retrievals are much
more frequent than insertions (or deletions); if this quotient is moderate, the
hedge-tree scheme may be preferred.
It is very difficult to say where the borderline lies. It strongly depends not
only on the quotient between the frequencies of retrieval and structural
change, but also on the characteristics of an implementation. This is particularly
the case if the node records have a densely packed representation, and
consequently access to fields involves part-word selection. Boolean fields
(lh, rh in the case of hedge-trees) may be handled more efficiently on many
implementations than three-valued fields (bal in the case of balanced trees).
`
`4.6. KEY TRANSFORMATIONS (HASHING)
`
The general problem addressed in the last section and used to develop
solutions demonstrating dynamic data allocation techniques is the following:
Given a set S of items characterized by a key value upon which
an ordering relation is defined, how is S to be organized so that
retrieval of an item with a given key k involves as little effort as
possible?
Clearly, in a computer store each item is ultimately accessed by specifying
a storage address a. Hence, the stated problem is essentially one of finding an
appropriate mapping H of keys (K) into addresses (A):

    H: K → A
`In Sect. 4.5 this mapping was implemented in the form of various list
`and tree search algorithms based on different underlying data organizations.
`Here we present yet another approach that is basically simple and very
efficient in many cases. The fact that it also has some disadvantages will be
`discussed subsequently.
`The data organization used in this technique is the array structure. H
`is therefore a mapping transforming keys into array indices, which is the
`reason for the term key transformation that is generally used for this technique.
`It should be noted that we shall not need to rely on any dynamic allocation
`procedures because the array is one of the fundamental, static structures.
`This paragraph is thus somewhat misplaced under the chapter heading of
`dynamic information structures, but since it is often used in problem areas
`where tree structures are comparable competitors, this seems to be an appro(cid:173)
`priate place for its presentation.
The fundamental difficulty in using a key transformation is that the set
of possible key values is very much larger than the set of available store
addresses (array indices). A typical example is the use of alphabetical words
with, say, up to 10 letters as keys for the identification of individuals in a
set of, say, up to a thousand persons. Hence, there are 26¹⁰ possible keys,
which are to be mapped onto 10³ possible indices. The function H is therefore
obviously a many-to-one function. Given a key k, the first step in a retrieval
(search) operation is to compute its associated index h = H(k), and the second,
evidently necessary, step is to verify whether or not the item with the key
k is indeed identified by h in the array (table) T, i.e., to check whether
T[H(k)].key = k. We are immediately confronted with two questions:
`
`1. What kind of function H should be used?
`2. How do we cope with the situation that H does not yield the location of
`the desired item?
`
The answer to question 2 is that some method must be used to yield an alternative
location, say index h', and, if this is still not the location of the wanted
item, yet a third index h'', and so on. The case in which a key other than the
desired one is at the identified location is called a collision; the task of generating
alternative indices is termed collision handling. In the following we
shall discuss the choice of a transformation function and methods of collision
handling.
`
`
`
`
`
`4.6.1. Choice of a Transformation Function
`
A prerequisite of a good transformation function is that it distributes
the keys as evenly as possible over the range of index values. Apart from
satisfying this requirement, the distribution is not bound to any pattern, and
it is actually desirable if it gives the impression that it is entirely at random.
This property has given this method the somewhat unscientific name hashing,
i.e., "chopping the argument up" or "making a mess," and H is called the
hash function. Clearly, it should be efficiently computable, i.e., be composed
of very few basic arithmetic operations.
Assume that a transfer function ord(k) is available and denotes the
ordinal number of the key k in the set of all possible keys. Assume, furthermore,
that the array indices i range over the integers 0 ... N-1, where N
is the size of the array. Then an obvious choice is

    H(k) = ord(k) mod N        (4.88)
It has the property that the key values are spread evenly over the index range,
and it is therefore the basis of most key transformations. It is also extremely
efficiently computable if N is a power of 2. But it is exactly this case that must
be avoided if the keys are sequences of letters. The assumption that all keys
are equally likely is in this case entirely erroneous. In fact, words which differ
by only a few characters will then most likely map onto identical indices,
thus effectively causing a most uneven distribution. In (4.88) it is therefore
particularly recommended to let N be a prime number [4-7]. This has the
consequence that a full division operation is needed that cannot be replaced by
a mere masking of binary digits, but this is no serious drawback on most
modern computers that feature a built-in division instruction.
Often, hash functions are used which consist of applying logical operations
such as the "exclusive or" to some parts of the key represented as a sequence
of binary digits. These operations may be faster than division on some
computers, but they sometimes fail spectacularly to distribute the keys
evenly over the range of indices. We therefore refrain from discussing such
methods in further detail.
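
As an illustration of (4.88) for short alphabetic keys, the sketch below folds the characters of a key of type alfa into an ordinal value and reduces it modulo a prime table size. The key length, the value of N, and the particular folding are assumptions; they merely stand in for a genuine ord(k).

    const N = 997;      {table size; a prime, as recommended above; value assumed}
          keylen = 10;
    type alfa = packed array [1 .. keylen] of char;

    function H(k: alfa): integer;
      var i, h: integer;
    begin
      h := 0;
      for i := 1 to keylen do
        h := (h*32 + ord(k[i]) mod 32) mod N;   {fold each character into the ordinal value}
      H := h
    end;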
`
`4.6.2. Collision Handling
`
If an entry in the table corresponding to a given key turns out not to be
the desired item, then a collision is present, i.e., two items have keys mapping
`onto the same index. A second probe is necessary, one based on an index
`obtained in a deterministic manner from the given key. There exist several
`methods of generating secondary indices. An obvious and effective one is
`linking all entries with identical primary index H(k) together as a linked list.
`This is called direct chaining. The elements of this list may either be in the
`primary table or not; in the latter case, storage in which they are allocated
`is usually called an overflow area. This method is quite effective, although it
`has the disadvantage that secondary lists must be maintained and that each
`entry must provide space for a pointer (or index) to its list of collided items.
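
A possible shape for the declarations behind direct chaining, with collided items kept in a separate overflow area, is sketched below; the type and field names, the key type, and the table size are assumptions chosen only to show where the list pointer mentioned above resides.

    const N = 997;  {table size; assumed}
    type ref = ^item;
         item = record
           key: integer;  {assumed key type}
           next: ref      {further items whose keys map onto the same index}
         end;
    var T: array [0 .. 996] of ref;  {primary table: each entry heads a chain; 996 = N-1}
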
`An alternative solution for resolving collisions is to dispense with links
`entirely and instead simply look at other entries in the same table until the
`item is found or an open position is encountered, in which case one may
`assume that the specified key is not present in the table. This method is called
`open addressing [4-9]. Naturally, the sequence of indices of secondary probes
`must always be the same for a given key. The algorithm for a table lookup can
`then be sketched as follows:
`
h := H(k); i := 0;
repeat
  if T[h].key = k then item found else
  if T[h].key = free then item is not in table else
    begin {collision}
      i := i+1; h := H(k) + G(i)
    end
until found or not in table (or table full)        (4.89)
`
Various functions for resolving collisions have been proposed in the
literature. A survey of the topic by Morris in 1968 [4-8] stimulated considerable
activities in this field. The simplest method is to try for the next location
(considering the table to be circular) until either the item with the specified
key is found or an empty location is encountered. Hence, G(i) = i; the indices
hᵢ used for probing in this case are

    h₀ = H(k)
    hᵢ = (h₀ + i) mod N,    i = 1 ... N-1        (4.90)
`This method is called linear probing and has the disadvantage that entries
`have a tendency to cluster around the primary keys (keys that had not collided
`upon insertion). Ideally, of course, a function G should be chosen which again
`spreads the keys uniformly over the remaining set of locations. In practice,
`however, this tends to be too costly, and methods which offer a compromise
`by being simple to compute and still superior to the linear function (4.90)
`are preferred. One of them consists of using a quadratic function such that
`the sequence of indices for probing is
`
    h₀ = H(k)
    hᵢ = (h₀ + i²) mod N    (i > 0)        (4.91)
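
The following is a sketch of the lookup (4.89) specialized to the quadratic increments of (4.91), with H(k) = k mod N as in (4.88). The table declaration, the marker free for empty slots, and the integer key type are assumptions; the result is the index of the item, or -1 if the key is absent.

    const N = 997;   {prime table size; assumed}
          free = 0;  {key value marking an empty slot; assumed}
    type entry = record
           key: integer;
           count: integer
         end;
    var T: array [0 .. 996] of entry;  {996 = N-1}

    function search(k: integer): integer;
      var h, i: integer; done: boolean;
    begin
      h := k mod N; i := 0; search := -1; done := false;
      repeat
        if T[h].key = k then
          begin search := h; done := true end   {item found}
        else if T[h].key = free then
          done := true                          {open position: k is not in the table}
        else
          begin                                 {collision: probe the next quadratic index}
            i := i + 1;
            h := (k mod N + i*i) mod N
          end
      until done or (i >= N)
    end;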
`
`
`