`
`H
`
`lanford University
`
`THE ART OF
`COMPUTER PROGRAMMING
`
`ISHINO COMPANY
`
`Reading, Massachusetts
`Menlo Park, California · London · Amsterdam · Don Mills, Ontario · Sydney
`
`ASSA ABLOY Ex. 1021 - Page 1
`ASSA ABLOY AB v. CPC Patent Technologies Pty Ltd.
`IPR2022-01094 - U.S. Patent No. 8,620,039
`
`
`
`This book is in the
`ADDISON-WESLEY SERIES IN
`COMPUTER SCIENCE AND INFORMATION PROCESSING
`
`Consulting Editors
`RICHARD s. VARGA and MICHAEL A. HARRISON
`
`Second printing, March 1975
`
`Copyright© 1973 by Addison-Wesley Publishing Company, Inc. Philippines copyright 1973
`by Addison-Wesley Publishing Company, Inc.
`All rights reserved. No part of this publication may be reproduced, stored in a retrieval system,
`or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording,
`or otherwise, without the prior written permission of the publisher. Printed in the United
`States of America. Published simultaneously in Canada. Library of Congress Catalog Card
`No. 67-26020.
`
`ISBN 0-201-03803-X
`BCDEFGH IJ-M A -79B76
`
`This book forms a nat
`Chapter 2, because it
`basic structural ideas.
`book is only for those E
`tion of general-purpos(
`But in fact the area o
`discussing a wide vari(
`
`How are good algc
`How can given alg
`How can the effici,
`How can a persor
`same application?
`In what senses car
`How does the the,
`How can external
`with large data ha
`
`Indeed, I believe that
`somewhere in the cont
`This volume comp
`is concerned with sor1
`been divided chiefly in
`also are supplementar:
`tations (Section 5. 1)
`Chapter 6 deals with
`fil es; this is subdivided
`of keys, or by digital
`problem of secondary :
`
`ASSA ABLOY Ex. 1021 - Page 2
`ASSA ABLOY AB v. CPC Patent Technologies Pty Ltd.
`IPR2022-01094 - U.S. Patent No. 8,620,039
`
`
`
`506
`
`SEARCHING
`
`6.4 HASHING
`
`6.4
`
`G.4
`
`So far we have considered search methods based on comparing the given argu(cid:173)
`ment K to the keys in the table, or using its digits to govern a branching process.
`A third possibility is to avoid all this rummaging around by doing so me arith(cid:173)
`metical calculation on K, computing a function f (K) which is the location of K
`and the associated data in the table.
`For example, let's consider again the set of 31 English words which we have
`subj ected to various search strategies in Section 6.2.2 and 6.3. Table 1 shows
`a short MIX program which transforms each of the 31 keys into a unique number
`f(K) between -10 and 30. If we compare this method to the MIX programs for
`the other methods we have considered (e.g., binary search, optimal tree search,
`trie memory, digital tree search), we find that it is superior from the standpoint
`of both space and speed, except that binary search uses slightly less space. In
`fact, the average time for a successful search, using the program of Table 1
`with the frequency data of Fig. 12, is only about 17.Su, and only 41 table loca(cid:173)
`tions are needed to store the 31 keys.
`Unfortunately it isn't very easy to discover such functions f(K). There
`are 41 31 ~ 10 50 possible functions from a 31-element set into a 41-element set,
`and only 41 • 40 • • • · • 11 = 41 !/10! ~ 104 3 of them will give distinct values
`for each argument; thus only about one of every 10 million function s will be
`suitable.
`Functions which avoid duplicate values are surprisingly rare, even with a
`fairly large table. For example, the famous "birthday paradox" asserts that if
`23 or more people are present in a room, chances are good that two of them
`
`TRANSFORMING A SET OF KEYS INTO UNIQUE ADDRESSES
`
`Table 1
`
`will have the same montl
`dom function which map
`no two keys map into t i
`Skeptics who doubt this r
`large parties they atten<
`unpublished work of H.
`Essays (1939), 45. See a
`Mecmuasi 4 (1939), 145·
`Theory (New York: Wile)
`On the other hand, tb
`niewski and W. Turski, C
`a suitable function can b
`amusing to solve a puzzl,
`Of course this methc
`must be known in advan<
`making it necessary to s1
`more versatile method if
`keys to yield the same ,
`ambiguity after f(K) has
`These considerations
`known as hashing or sea
`chop so mething up or to
`off some aspects of the k
`searching. We compute
`where the search begins.
`The birthday parad,
`Ki ~ Ki which hash to
`
`Instruction
`
`LDlN K(l: 1 )
`LD2 K(2:2)
`INCl
`-8,2
`JlP
`*+2
`INCl 16,2
`LD2 K(3:3)
`J2Z
`9F
`INCl
`-28,2
`JlP
`9F
`INCl 11,2
`LDA
`K(4:4)
`JAZ
`9F
`DECl
`-5,2
`9F
`JlN
`INCl 10
`9H LDA
`K
`CMPA TABLE,l
`JNE
`FAILURE
`
`0
`z
`<
`
`Cil
`0::
`<
`
`<
`
`Ul
`<
`
`E-<
`<
`
`Cil
`
`'°
`
`E-<
`::,
`
`'°
`
`><
`'°
`
`0::
`D
`~
`
`:,;
`D
`
`e:
`
`0
`<
`::c
`
`Cil
`>
`<
`::c
`
`Cil
`::c
`
`0::
`Cil
`::c
`
`Ul
`H
`::c
`
`z
`H
`
`H
`
`Ul
`H
`
`E-<
`H
`
`E-<
`D
`z
`
`- 1 - 2
`- 2
`- 1
`14
`- 5
`14
`- 5
`16
`16
`16
`
`14
`14
`
`- 1
`- 1
`- 9
`- 9
`7
`7
`7
`
`- 1
`- 1
`6
`6
`
`- 1
`- 1
`10
`10
`
`- 1
`- 1
`13
`13
`
`13
`13
`
`6
`10
`6
`10
`- 13
`- 18
`-18 - 13
`- 3
`3
`- 3
`3
`- 3
`3
`
`- 2
`- 2
`- 6
`- 2 - 2 - 6
`14
`18
`2
`14
`18
`2
`
`- 6
`- 6
`5
`5
`
`18
`18
`
`14
`14
`9
`9
`
`7
`7
`7
`
`- 3
`- 3
`- 3
`
`3
`3
`3
`
`13
`13
`13
`
`14
`14
`14
`
`16
`16
`16
`
`9
`9
`9
`
`18
`18
`18
`
`-8 -8
`- 8
`- 8
`- 15
`- 15
`- 15
`- 15
`2
`2
`2
`2
`2
`2
`- 22
`- 1
`- 22
`- 1
`- 7
`35
`-7
`35
`- 7
`35
`15
`15
`25
`25
`25
`25
`
`- 7
`- 7
`-7
`
`- 8
`- 8
`- 11
`- 11
`10
`10
`10
`
`- 8
`- 8
`- 11
`- 11
`10
`10
`10
`
`10
`10
`10
`
`- 15
`- 9 -9 -9 - 9
`- 8
`-9 -9 - 9 -9 - 15
`- 8
`- 7
`-7 -17 -2
`5
`6
`-7
`- 17 -2
`5
`6
`- 7
`25
`18
`-1
`29
`25
`18
`-1
`29
`25
`18
`-1
`29
`20
`12
`20
`12
`
`6
`6
`
`5
`5
`
`12
`12
`12
`
`- 1
`- 1
`- 1
`
`29
`29
`29
`
`5
`5
`5
`
`6
`6
`6
`
`20
`20
`20
`
`2
`5
`2
`5
`- 7 - 7
`- 7 - 7
`23
`20
`23
`20
`23
`20
`9
`9
`19
`19
`19
`19
`
`23
`23
`23
`
`ASSA ABLOY Ex. 1021 - Page 3
`ASSA ABLOY AB v. CPC Patent Technologies Pty Ltd.
`IPR2022-01094 - U.S. Patent No. 8,620,039
`
`
`
`6.4
`
`6.4
`
`HA SHDIG
`
`507
`
`will have the same month and day of birth! In other words, if we select a ran(cid:173)
`dom function which maps 23 keys into a table of size 365, the probability that
`no two keys map into the same location is only 0.4927 (less than one-half) .
`Skeptics who doubt this result should try to find the birthday mat es at the next
`large parties they attend.
`[The birthday paradox apparently originated in
`unpublished work of H. Davenport; cf. W.W. R. Ball, Math. R ecreations and
`Essays (1939), 45. See also R. von Mises, j stanbul Universitesi Fen Fakultesi
`Mecmuasi 4 (1939), 145- 163, and W. Feller, An Introduction to Probability
`Theory (New York: Wiley, 1950), Section 2.3.]
`On the other hand, the approach used in Table 1 is fairly flexible [cf. M . Gre(cid:173)
`niewski and W. Turski, CACM 6 (1963), 322-323], and for a medium-sized table
`a suitable function can be found after about a day's work. In fact it is rather
`amusing to solve a puzzle like this.
`Of course this method has a serious fl aw, since the contents of the table
`must be knowri in advance; adding one more key will probably ruin everything,
`making it necessary to start over almost from scratch. We can obtain a much
`more versatile method if we give up the idea of uniqueness, permitting different
`keys to yield the same value f(K), and using a special method to resolve any
`ambiguity after f(K) has been computed.
`These considerations lead to a popular class of search methods commonly
`knqwn as hashing or scatter storage techniques. The verb "to hash" means to
`chop something up or to make a mess out of it; the idea in hashing is to chop
`off so me aspects of the key and to use this partial information as the basis for
`searching. We compute a hash function h(K) and use this value as the address
`where the search begins.
`The birthday paradox tells us that there will probably be distinct keys
`K i ~ K i which hash to the same value h(K i) = h(K;). Such an occurrence is
`
`(I)
`
`...
`
`:i:
`
`...
`
`...
`z
`
`(I)
`
`...
`
`r,-,
`
`...
`
`r,-,
`0
`z
`
`...
`:i:
`u
`...
`r,-,
`w
`::,
`<
`z
`<
`iii
`:i:
`11.
`:i:
`:i:
`0::
`0
`0
`><
`r,-,
`r,-,
`r,-,
`r,-,
`0
`0
`0
`~
`~
`Contents of rll after executing the instruction, given a particular key K
`
`(I)
`
`:i:
`r,-,
`
`(I)
`
`...
`
`~omparing the given argu(cid:173)
`overn a branching process.
`mnd by doing some arith(cid:173)
`which is the location of K
`
`glish words which we have
`2 and 6.3. T able 1 shows
`mys into a unique number
`>d to the MIX programs for
`~arch, optimal tree search,
`Jerior from the standpoint
`ses slightly less space. In
`; the program of Table 1
`:u, and only 41 tabl e loca-
`
`:h functions f(K) . There
`set into a 41-element set,
`1 will give distinct values
`million function s will be
`
`risingly rare, even with a
`y paradox" asserts th at if
`·e good that two of them
`
`QUE ADDRESSES
`
`:s
`0 g;
`
`0
`<
`:i:
`
`- 6
`- 6
`2
`2
`
`2
`2
`-7
`-7
`?3
`?3
`l3
`
`l3
`l3
`l3
`
`- 6
`- 6
`5
`5
`
`5
`5
`- 7
`- 7
`20
`20
`20
`9
`9
`19
`19
`19
`19
`
`-8
`- 15
`- 15
`2
`2
`2
`- 22
`- 22
`- 7
`- 7
`- 7
`
`- 7
`- 7
`- 7
`
`-8
`- 15
`- 15
`2
`2
`2
`- 1
`- 1
`35
`35
`35
`15
`15
`25
`25
`25
`25
`
`w
`:i:
`
`0:: w
`:i:
`
`-8
`-8
`- 11
`- 11
`10
`10
`10
`
`- 8
`- 8
`- 11
`- 11
`JO
`10
`10
`
`10
`10
`10
`
`- 8
`- 8
`- 7
`- 7
`18
`18
`18
`12
`12
`
`- 9
`- 9
`- 9 -9 - 15
`- 16
`-9 - 9 -9 - 9 -15
`- 16
`- 2
`- 17
`-7 - 18
`6
`5
`- 17
`- 2
`-7 - 18
`6
`5
`- 1
`29
`25
`4
`- 1
`29
`25
`4
`- 1
`29
`25
`4
`20
`20
`
`5
`5
`
`6
`6
`
`- 16
`- 16
`- 9
`- 9
`22
`22
`22
`
`12
`12
`12
`
`- 1
`- 1
`- 1
`
`29
`29
`29
`
`5
`5
`5
`
`6
`6
`6
`
`20
`20
`20
`
`4
`4
`4
`
`22
`22
`22
`
`- 26 -26 -28
`- 23 -23 - 26
`- 23
`- 23
`- 16
`- 23 -26 -26 - 26
`- 28
`- 23 -23 - 23
`- 16
`- 20
`- 15 -33 -26 -25
`-5 -23 -23 -23
`- 15 -33 -26 - 25 -20
`-5 -23 -23 - 23
`-2
`- 16
`17
`12
`0
`30
`-2
`- 16
`12
`17
`0
`30
`- 16
`- 2
`12
`17
`0
`30
`-5
`-22 -21
`8
`-5
`- 21
`- 22
`8
`-1
`11
`29
`- 1
`29
`11
`- 1
`29
`11
`-5
`11
`-5
`11
`21
`21
`21
`21
`
`30 -1 0
`- 10
`30
`- 10
`30
`
`- 2
`- 6
`- 2
`-6
`-6 -2
`
`17
`17
`17
`
`11
`11
`11
`
`-5
`-5
`-5
`
`8
`8
`8
`
`1
`1
`- 26 -22 - 18
`-26 -22 - 18
`-14
`-6
`2
`-14
`-6
`2
`-14
`-6
`2
`- 2
`- 10
`- 2
`- 10
`
`ASSA ABLOY Ex. 1021 - Page 4
`ASSA ABLOY AB v. CPC Patent Technologies Pty Ltd.
`IPR2022-01094 - U.S. Patent No. 8,620,039
`
`
`
`508
`
`SEARCHING
`
`6.4
`
`6.-!
`
`called a collision, and several interesting approaches have been devised to
`In order to use a scatter table, a programmer
`handle the collision problem.
`must make two almost independent decisions: He must choose a hash function
`h(K), and he must select a method for collision resolution. We shall now con(cid:173)
`sider th ese two aspects of the problem in turn.
`
`(1)
`
`Hash functions . To make things more explicit, let us assume throughout this
`section that our hash function h takes on at most M different values, with
`0 _::; h(K) < M,
`for all keys K. The keys in actual files that arise in practice usually have a great
`deal of redundancy; we must be careful to find a hash function that breaks up
`clusters of almost identical keys, in order to reduce the number of collisions.
`It is theoretically impossible to define a hash function that creates random
`data from the nonrandom data in actual files. But in practice it is not difficult
`to produce a pretty good imitation of random data, by using simple arithmetic
`as we have discussed in Chapter 3. And in fact we can often do even better,
`by exploiting the nonrandom properties of actual data to construct a hash
`function that leads to fewer collisions than truly random keys would produce.
`Consider, for example, the case of 10-digit keys on a decimal computer.
`One hash function that suggests itself is to let M = 1000, say, and to let h(K)
`be three digits chosen from somewhere near the middle of the 20-digit product
`K X K. This would seem to yield a fairly good spread of values between 000
`and 999, with low probability of collisions. Experiments with actual data show,
`in fact, that this "middle square" method isn't bad, provided th at the keys
`do not have a lot of leading or trailing zeros; but it turns out that there are
`safer and saner ways to proceed, just as we found in Chapter 3 th at the middle
`flquare method is not an especially good random number generator.
`Extensive tests on typical files have shown that two major types of hash
`functions work quite well. One of these is based on division, and the other is
`based on multiplication.
`The division method is particularly easy; we simply use the remainder
`modulo M:
`
`(2)
`
`h(K ) = K mod M.
`In this case, some values of l'vl are obviously much better than others. For
`example, if M is an even number, h(K) will be even when K is even and odd
`when K is odd, and this will lead to a substanti al bias in many files. It would
`be even worse to let M be a power of the radix of the computer, since K mod M
`would then be simply the least significant digits of K (independent of the other
`digits). Similarly we can argue that M probably shouldn't be a multiple of 3
`either; for if the keys are alphabetic, two keys which differ from each other
`only by permutation of letters would then differ in numeric value by a multiple
`of 3. (This occurs because 10n mod 3 = 4n mod 3 = 1.) In general, we want
`to avoid values of M which divide r k ± a, where k and a are small numbers and
`r is the radix of the alphabetic character set (usually r = 64, 256, or 100),
`
`since a remainder modulo
`position of the key digits
`a prime number such that
`has been found to be quit
`For example, on the I
`h(K) by the sequence
`LDX
`ENTA
`DIV
`The multiplicative hf
`harder to describe becaw
`instead of with integers.
`usually 10 10 or 23 0 for M
`if we imagine the radix p
`choose some integer const
`
`h
`
`In this case we usually h
`h(K) consists of the leadi1
`In MIX code, if we let .
`hash function is
`
`K
`LDA
`A
`MUL
`ENTA 0
`SLB
`m
`
`Now h(K) appears in reg
`shift instructions, this se
`on many machines multir
`In a sense this metho
`could for example take 1
`the reciprocal of a constai
`that (5) is almost a "midi
`ence: We shall see that rr
`good properties.
`One of the nice featm
`was lost in (5); we could
`after (5) has finished. Th
`algorithm can be used to
`that K = (A'(AK modi
`tents of register X just b
`
`ASSA ABLOY Ex. 1021 - Page 5
`ASSA ABLOY AB v. CPC Patent Technologies Pty Ltd.
`IPR2022-01094 - U.S. Patent No. 8,620,039
`
`