throbber
1570
`
`IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 40, NO. 10, OCTOBER 1992
`
`A Comparison of Hashing Schemes for Address Lookup in Computer Networks
`
`Raj Jain
`
`1'“
`
`Abstract— Using a trace of address references, we compared
`the efficiency of several different hashing functions such as cyclic
`redundancy checking (CRC) polynomials, Fletcher checksum,
`folding of address octets using the exclusive— OR operation,
`and bit extraction from the address. Guidelines are provided for
`determining the size of hash masks required to achieve a specified
`level of performance.
`
`I.
`
`INTRODUCTION
`
`HE trend toward networks becoming larger and faster
`and addresses becoming larger has impelled a need to
`explore alternatives for fast address recognition. This problem
`is actually a special case of the general problem of searching
`through a large database and finding the information associated
`with a given key. For example, datalink adapters on local area
`networks (LAN) need to recognize the multicast destination
`addresses of frames on the LAN. Bridges, used to intercon-
`nect two or more LAN ’s, have to recognize the destination
`addresses of every frame and decide quickly whether to receive
`the frame for forwarding. Routers in wide-area networks have
`to look through a large forwarding database to decide the
`output
`link for a given destination address. Name servers
`have the ultimate responsibility for associating names to
`characteristics. In these applications, a hashing algorithm can
`be used to search through a very large information base in
`constant
`time. We also investigated caching as a possible
`solution but found that it could be harmful in some cases in
`
`the sense that the performance would be better without it [4].
`To compare various hashing strategies, we used a trace of
`destination addresses observed on an Ethernet LAN in use.
`The trace consisted of 2.046 million frames observed over a
`
`period of 1.09 h. A total of 495 distinct station addresses were
`observed, of which 296 were seen in the destination field.
`
`ll. HASHING: CONCEPTS
`
`h(Addr)
`
`Hash cells
`
`Subtables
`
`Fig. 1.
`
`Hashing concepts.
`
`addresses and a hash table of M cells, the goal is to minimize
`the average number of lookups required per frame.
`If we perform a regular binary search through all N ad-
`dresses, we need to perform 1 + log2(N) or log2(2N) look-
`ups per frame. Given an address that hashes to the ith cell, we
`have to search through a subtable of n, entries, which requires
`only log2(2n,) lookups. The total number of lookups saved is
`
`Z n- [log2(2N) — log2 (2n,)]
`'1
`
`is the number of frames that hash t0 the ith cell,
`where 7",-
`2 Ti 2 R. The net saving per frame is
`
`2 —% log; (7%) = Z: —qi10g2pi.
`L
`
`(1)
`
`Here, q,- : r,- /R denotes the fraction of frames that hash to the
`ith cell, and pi = n, /N is the fraction of addresses that hash
`to the ith cell. The goal of a hashing function is to maximize
`the quantity 2 —q,-log2(p,~). Notice that pi and q, are not
`related. In the special case of all addresses being equally
`likely to be referenced, q,
`is equal to p, and the expression
`2 —p,- log2(p,-) would be called the entropy of the hashing
`function. It is because of this similarity that we will call the
`quantityz —q,~ log2(p,-) the entropy or information content of
`the hashing function. It is measured in units of “bits.”
`
`III. HASHING USING ADDRESS BITS
`
`Webster’s dictionary defines the word “hash” as a Verb “to
`chop (as meat and potatoes) into small pieces ”. Strange as
`it may sound, this is correct. Basically, hashing allows us to
`chop up a big table into several small subtables so that we
`can quickly find the information once we have determined
`the subtable to search for. This determination is made using
`a mathematical function, which maps the given key to hash
`cell 'i as shown in Fig. l. The cell i could then point us to the
`subtable of size n,. Given a trace of R frames with N distinct
`
`Paper approved by the Editor for Communication Protocols of the IEEE
`Communications Society. Manuscript received June 30, 1989; revised June
`10, 1991.
`The author is with the Digital Equipment Corporation, Littleton, MA 01460.
`IEEE Log Number 9201642.
`
`The simplest hashing method is to use a certain number
`of bits, say m, from the address. For example, we could
`hash using bits i,i + 1, .
`~
`- ,i + m — 1 of the address to 2’"
`subtables. The number of lookups saved, as we saw in the last
`section, is equal to the information entropy of the bits. For
`our trace, this is plotted in Fig. 2. Eight curves corresponding
`0090—6778/92303DO © 1992 IEEE
`
`UNIFIED 1021
`UNIFIED 1021
`UNIFIED v. PLECTRUM LLC
`UNIFIED v. PLECTRUM LLC
`IPR2017-01430
`|PR2017-01430
`
`

`

`IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 40, NO. 10, OCTOBER 1992
`
`1571
`
`1
`
`l
`
`‘V
`
`Ill
`
`InformationinBits
`
`0
`
`24
`16
`8
`First Bit of the Sequence
`
`32
`
`Fig. 3.
`
`Information in CRC bits.
`
`the fifth octet may not be the
`It should be obvious that
`most informative octet for all environments. Nonetheless, the
`above recommendations are useful providing that one uses
`the appropriate, most informative octet instead.
`
`IV. HASHING USING CRC
`
`Another hashing function, already used in a few adapters,
`is to use bits from the cyclic redundancy check (CRC) of
`the address. Using the 32 bit CRC polynomial used on IEEE
`802 LAN ’5 [2], we computed the information content of m-bit
`sequences consisting of bits 2' through i + m — 1 of the CRC
`for m = 1,2,~-,8 and t' : 0,1,~--,31. Here,
`i : 0
`represents the most significant bit of the CRC. The results
`are shown in Fig. 3.
`It
`is
`interesting to compare Figs. 2 and 3. Notice the
`following.
`1) Almost all 32 bits have a high information content very
`close to one bit. Thus, it does not matter which bit of
`the CRC we use.
`
`2) All curves are (almost) horizontal straight lines. Thus,
`the information content of all bit combinations is iden—
`tical. It does not matter which m bits are chosen for
`m = 1,2, -
`3) The information content of in bits is approximately
`m. This means that CRC provides an almost optimal
`hashing function.
`We repeated the analysis with several other 8 bit and 16 bit
`CRC polynomials. In general, we found that if a polynomial
`provides a good CRC,
`it can serve as an excellent hash
`function.
`
`V. HASHING USING FLETCHER CHECKSUM
`
`This checksum is used in the 180/051 transport since it is
`so easy to compute in software. Given an n-octet message
`b[1] -
`~
`- b[n], a two-octet checksum CM and C[1] is computed
`
`8
`
`6
`
`a.
`
`E=
`.2 4
`
`a EE
`
`2
`
`o
`
`o
`
`36
`24
`12
`First Bit of the Sequence
`
`Fig.2.
`
`Information in address bits.
`
`32
`1
`
`48
`
`to in consecutive bits with m = 1,2, -
`Fig. 2, we observe that
`1) All 8 bit sequences between bits 0 and 24 have less
`than two bits of information.
`
`- ,8 are plotted. From
`
`~
`
`2) The bits 32 through 39,
`mation content.
`
`in general, have a high infor—
`
`The first observation is not surprising considering that the
`first three octets of the IEEE 802 addresses used on IEEE
`802 LAN’5 are assigned by the IEEE and, thus, most stations
`have the same first
`three octets. The first
`two bits corre—
`sponding to the global/local assignment and multicast/unicast
`addresses may be different in different addresses. Given these
`two bits, other bits can be easily predicted. In multivendor
`environments,
`the first
`three octets may have a little more
`information. However, in general, these bits are not good for
`hashing purposes.
`the fifth octet of the
`The second observation says that
`address has the highest information in our environment. This
`observation,
`if applicable,
`leads to the following types of
`conclusions.
`
`1) Use the fifth octet as the hashing function. The bits in this
`octet would provide a maximum savings in the number
`of lookups.
`2) When comparing two addresses, compare the fifth octet
`first. If the addresses are not equal, the very first compar-
`ison will fail more often than when using other octets.
`3) Use the fifth octet as the branching function at the root
`of a tree database. If the addresses are stored in a tree or
`trie structure [5] and the address bits are used to decide
`the branch to be taken, using bits from this octet would
`provide maximum discrimination.
`4) Use the fifth octet as the load balancing function for
`different paths. Given several alternative paths to a set
`of destinations, one way to balance the load on different
`paths is to decide the path based on a few bits of
`the address. This eliminates out—of-order arrivals since
`all frames going to a particular destination follow a
`single path and load balancing is achieved by different
`destinations using different paths.
`
`

`

`1572
`
`IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 40, NO. 10, OCTOBER 1992
`
` 2‘2a:
`
`.Et:
`.2
`‘5
`E“-4
`.5
`
`00
`
`l2
`8
`4
`First Bit of the Sequence
`
`16
`
` .9
`
`i5
`.5r:
`.9
`‘5
`E8
`E
`
`0
`
`0
`
`12
`8
`4
`First Bit of the Sequence
`
`16
`
`Fig, 4.
`
`Information in Fletcher checksum hits.
`
`Fig, 5.
`
`Information in mod-checksum bits.
`
`f:
`.E:
`.9.
`‘5
`E8
`
`E.
`
`3i
`
`O
`
`Fig. 6.
`
`6
`4
`2
`First Bit of the Sequence
`Information in XOR»fold bits.
`
`8
`
`as follows.
`
`Cll] : 0;
`C[0] = 0;
`Fort} = 1 step 1 until n do
`
`(:[0] = c[0] + b[i];
`
`c[1] = c[1] + (:[0];
`EndFor;
`
`The information in m consecutive bits of address checksums
`
`is shown in Fig. 4. This also is a good hashing function.
`
`VI. HASHING USING ANOTHER CHECKSUM
`
`Another popular checksum algorithm used to guard against
`memory errors in network address databases is [2]:
`
`(L:Mw@%%m+%m+nw
`
`-H%M+%M+H®£m—D
`
`VIII. MASK SIZE FOR AN ADDRESS FILTER
`
`is the ith octet of the address and C is the 16 bit
`Here, b[z']
`checksum. Since we are not aware of its name, we will call
`it the “mod-checksum.” The information content of the bits
`
`of this Checksum are shown in Fig. 5. Notice that the mod—
`checksum is not as good a hashing function as the Fletcher
`checksum even though it is more complex to compute.
`
`Vll. HASHING USING XOR FOLDING
`
`The final alternative that we investigated is that of the
`straightforward exclusive—0R operation on the six octets of
`the address to produce eight bits.
`
`C=HH$HQ$M$$HQ®HH$HQ
`
`The information content of the bits in the “XOR-fold” so
`
`this function,
`obtained is shown in Fig. 6. To our surprise,
`which is so simple to implement,
`is an excellent hashing
`function.
`
`In this section, we briefly address the problem of finding
`the size of the hash mask required to get a desired level of
`performance. We assume that the filter consists of a simple
`M x 1 bit mask; that is, each hash table cell is only one bit
`wide. A hash function is used to map the address to an index
`value 2'
`in the range 0 through M — 1, and if the ith bit in
`the mask is set, the frame is accepted for further processing;
`otherwise, the frame is rejected. This is how hashing is used
`in several commercial media access controller (MAC) chips.
`Such a hash filter is a perfect rejection filter in the sense that
`if the mask bit is zero, we can be certain that the address is
`not wanted, and there is no need to search the address table.
`The performance of the filter is measured by the probability
`of an unwanted address being rejected by the filter. We call
`this probability the unwanted-rejection rate. If we assume that
`all addresses are equally likely to be seen and that all mask
`cells are equally likely to be referred, then using the procedure
`described in [3], we can compute the unwanted-rejection rate
`as shown in Fig. 7. In the figure, the number of addresses k that
`
`

`

`IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 40. N0. 10, OCTOBER 1992
`
`1.00
`
`0.75
`
`0.50
`
`0.25
`
`P(RejectionlUnwanted)
`
`0'000
`
`['2
`8
`4
`Number of Addresses Wanted
`
`16
`
`Fig. 7.
`
`Probability of rejecting unwanted frames as a function of number
`of addresses wanted and the mask size M.
`
`a station may want to receive is plotted along the horizontal
`axis, and the probability of rejecting an unwanted frame is
`plotted along the vertical axis. Eight curves corresponding to
`mask size M : 2,4,8,-~,128,512 bits are shown. From
`Fig. 7, we observe the following.
`1) There is some filtering even with small masks. For
`example, if one wants to receive 10 addresses, an 8 bit
`mask is expected to reject 26% of the unwanted frames
`without further searching. Although this rate is low, the
`point is that it is nonzero even though the mask size is
`less than the number of addresses.
`
`2)
`
`In general, it is better to have as large a mask as possible.
`For example, if one wants to receive 10 addresses with
`a 512 bit mask, 98% of the unwanted frames will be
`rejected without further searching.
`3) If the mask size is large compared to the number of
`addresses to receive, that is if M > k, the curves are
`linear and the unwanted-rejection rate is approximately
`1 7 k/M.
`The last observation is helpful in deciding the mask size.
`Thus, if one wants to reject 80% of the unwanted frames,
`
`I
`
`m
`
`1573
`
`the mask size should by five times the number of addresses
`desired.
`
`IX. SUMMARY
`
`We showed that the number of lookups saved using hashing
`is equal to the information content of the bits of the hashed
`value and compared several hashing functions. First, simple hit
`extraction from the address itself provides a hashing function
`that is easy to implement in hardware as well as software.
`Second, bits extracted from the CRC of the address can be
`used as a hashing function that
`is easy to implement
`in
`hardware. Third, bits extracted from the Fletcher checksum
`can be used as a hashing function that is easy to implement
`in software. Finally, exclusive-OR folding of the the address
`octets provides another hashing alternative that
`is easy to
`implement both in software as well as hardware.
`We concluded that CRC polynomials are excellent hashing
`functions. Fletcher’s checksum and folding are also good hash—
`ing functions. The mod-checksum, which is more complex to
`compute than Fletcher’s checksum, is not as good as the latter.
`Although bit extraction is not as good as other alternatives, it
`is the simplest. The choice between bit extraction and other
`alternatives is basically that of computing versus storage. If
`we can use excess memory, bit extraction with a few more
`bits may provide the same information as the checksum or
`folding with a few less bits.
`We pointed out
`that for a station wanting to receive k
`addresses, the probability of rejecting unwanted frames using
`a simple M x 1 bit mask is 1— Ic/M. This allows us to decide
`the mask size required for a desired level of performance.
`
`REFERENCES
`
`[1] J. G. Fletcher, "An arithmetic checksum for serial transmissions,” IEEE
`Trans. Commun, vol. COM-30, no. 1, pp. 2477252, Jan. 1982.
`[2] The Ethernet—A Local Area Network: Data Link Layer and Physical
`Layer Specifications,
`Published jointly by Digital,
`Intel, and Xerox
`Corp., Version 2.0, Nov. 1982, pp. 95—96.
`[3] R.
`lain, “A comparison of hashing schemes for address lookup in
`
`computer networks," DEC Tech. Rep., DEC-TR-593, Feb., 1989.
`, “Characteristics of destination address locality in computer net-
`works: A comparison of caching schemes,” Compur. Nerw. and ISDN
`Syst., vol. 18, pp. 243454, 1989/1990.
`[5] R. Sedgewick,A1gnrit}1ms. Reading, MA: Addison-Wesley, 1988.
`
`[4]
`
`

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket