Proceedings of the USENIX Annual Technical Conference
New Orleans, Louisiana, June 1998

Increasing Effective Link Bandwidth
by Suppressing Replicated Data

Jonathan Santos and David Wetherall
Massachusetts Institute of Technology

For more information about the USENIX Association contact:

1. Phone: 510 528-8649
2. FAX: 510 548-5738
3. Email: office@usenix.org
4. WWW URL: http://www.usenix.org/
Increasing Effective Link Bandwidth
by Suppressing Replicated Data

Jonathan Santos†  David Wetherall‡
Software Devices and Systems Group
Laboratory for Computer Science
Massachusetts Institute of Technology
http://www.sds.lcs.mit.edu/

Abstract

In the Internet today, transfer rates are often limited by the bandwidth of a bottleneck link rather than the computing power available at the ends of the links. To address this problem, we have utilized inexpensive commodity hardware to design a novel link-layer caching and compression scheme that reduces bandwidth consumption. Our scheme is motivated by the prevalence of repeated transfers of the same information, as may occur due to HTTP, FTP, and DNS traffic. Unlike existing link compression schemes, it is able to detect and use the long-range correlation of repeated transfers. It also complements application-level systems that reduce bandwidth usage, e.g., Web caches, by providing additional protection at a lower level, as well as an alternative in situations where application-level cache deployment is not practical or economic.

We make three contributions in this paper. First, to motivate our scheme, we show by packet trace analysis that there is significant replication of data at the packet level, mainly due to Web traffic. Second, we present an innovative link compression protocol well suited to traffic with such long-range correlation. Third, we demonstrate by experimentation that the availability of inexpensive memory and general-purpose processors in PCs makes our protocol practical and useful at rates exceeding T1 speeds.

* This work was supported by DARPA, monitored by the Office of Naval Research.
† Email: jrsantos@mit.edu
‡ Email: djw@lcs.mit.edu

1  Introduction

In the Internet today, transfer rates are often limited by the bandwidth of a bottleneck link rather than the computing power available at the ends of the links. For example, access links (modem, ISDN, T1, T3) restrict bandwidth due to cost, while wireless links restrict bandwidth due to the properties of the media. A traditional solution to this problem is the use of data compression, either at the link or the application level. Existing compression schemes, however, tend to miss the redundancy of multiple instances of the same information being transferred between different clients and servers. This is problematic because such transfers have become prevalent with the growth of information services such as the Web.

Danzig's study of Internet traffic noted that half of the FTP transfers could be eliminated with a caching architecture that suppressed multiple transfers of the same information across the same link. Since that time, protocols and traffic patterns have changed with the growth of the Web; it is now HTTP, not FTP, that is dominant. However, the level of redundancy is still perceived to be high, despite the application-level caching mechanisms that have emerged to curtail it.

In this paper, we revisit the problem of improving effective link bandwidth in the context of traffic with replicated data, as may occur due to TCP retransmissions, application-level multicast, DNS queries, repeated Web and FTP transfers, and so on. We have designed an innovative link compression scheme that uses a network-based cache to detect and remove redundancy at the packet level.
Our scheme takes advantage of the availability of inexpensive memory and general-purpose processors to provide an economical means of purchasing additional bandwidth. That is, given the one-time cost of a commodity PC and the recurring monthly cost of a T1 (1.5 Mbps) circuit, it is cheaper to purchase the two PCs used by the scheme than the bandwidth they are expected to save.

Our scheme has several interesting properties:

- It is independent of the format of packet data contents, and so provides benefits even when application objects have been previously compressed, e.g., for Web images already in JPEG or GIF format.

- It utilizes a source of correlation that is not available at individual clients and servers and is not found by existing link compression schemes.

- It provides the bandwidth reduction benefits of caching in a transparent manner; e.g., there is no risk of stale information or loss of endpoint control.

- It constructs names at the link level using fingerprints, and so does not depend on higher-level protocol names or details. For example, the same information identified by different URLs will be compressed by our scheme, but not by Web caches.

Our scheme overlaps application-level caching systems (most notably Web caches) in that both reduce the impact of repeated transfers of the same information. However, our scheme is intended to complement Web caches rather than to compete with them, since it addresses a slightly different goal and works at a different level. For example, Web caches do not take advantage of replication across multiple caching systems, protocols, and application objects.

In this paper, we present: a trace-driven traffic analysis that motivates our scheme; the design of our system; and an experimental characterization of a prototype implementation. Our traffic analysis in Section 2 uses several traces of at least one million packets each that we recorded between our site (the MIT Laboratory for Computer Science, including the Web Consortium) and the rest of the Internet. In Section 3, we describe the system architecture and compression protocol, along with a prototype implementation running under Linux. In Section 4, we evaluate the performance of this prototype. We then contrast our system with related work and conclude in Sections 5 and 6, respectively.

2  Analysis of Replicated Traffic

To understand the potential of a system for suppressing replicated data transfers at the packet level, we began our design by analyzing network traffic. We define a packet to be replicated when the contents of its payload match exactly the contents of a previously observed payload. Since packet headers are expected to be constantly changing and a function of the source and destination hosts rather than the data being transported, we do not consider them in our search for replicated data.

Note that it is not clear that overlapping Web transfers will translate into replication that satisfies our definition and that may be detected and removed at the packet level. First, data sent multiple times must be parceled into packet payloads in the same manner, despite potentially different protocol headers, path maximum transmission units (MTUs), and protocol implementations. Second, the timescale of replication, which may be hours for Web documents, must be observable with a limited amount of storage. We therefore characterize the replication as defined above by answering the following questions:

- How much data is replicated?

- What kind of data is most likely to be replicated?

- What is the temporal distribution of replicated data?

2.1  Obtaining the Packet Traces

As input to our analysis, we collected a series of full packet traces of all traffic exchanged between our site and the rest of the Internet. New traces, rather than publicly available archives, were necessary because we require the entire packet contents in order to detect repeated data. The choice of our site was expedient, but it makes an interesting test case because it is a diverse environment hosting many clients and servers: it includes the Web Consortium, the MIT Laboratory for Computer Science, and the MIT AI Laboratory.
Table 1: Total volume (MB) and replicated percentage (by volume) of inbound and outbound traffic, for all traffic and for HTTP traffic only, in trace sets A through E and in total.

Figure 1: Cumulative volume of replicated data by packet length (x-axis: data length in bytes, 0 to 1500; y-axis: cumulative volume of replicated data in percent; one curve per trace set A through E).

Each trace was captured using tcpdump as a passive monitor listening to Ethernet traffic traveling on the segment between the Lab and the Internet. Five sets of at least one million packets each were gathered at different times of day, corresponding to several gigabytes of raw data in total. No packet capture loss was detected.

2.2  Analysis Procedure

We statically analyzed each trace by searching through the packets sequentially for replicated data. To expose the application data, we progressively stripped protocol headers up to the TCP/UDP level. For example, TCP payloads were identified by removing first the Ethernet, then the IP, and finally the TCP headers. Our analysis therefore slightly underestimates the amount of replicated data, due to changing headers at higher protocol layers that could not easily be taken into account; one example of traffic that falls into this category is DNS responses.
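
For concreteness, the following C fragment sketches the header stripping just described for TCP traffic: the Ethernet, IP, and TCP headers are removed in turn to expose the application payload that the analysis examines. It is an illustration written for this description (using Linux's netinet structures), not the original analysis code.

#include <arpa/inet.h>      /* ntohs() */
#include <net/ethernet.h>   /* struct ether_header, ETHERTYPE_IP */
#include <netinet/ip.h>     /* struct iphdr */
#include <netinet/tcp.h>    /* struct tcphdr */
#include <stddef.h>

/* Strip the Ethernet, IP, and TCP headers from a captured frame and return
 * a pointer to the application payload, or NULL if the frame is not TCP/IP
 * or is truncated.  *plen receives the payload length. */
static const unsigned char *
strip_to_payload(const unsigned char *frame, size_t caplen, size_t *plen)
{
    const struct ether_header *eth = (const struct ether_header *)frame;
    if (caplen < sizeof(*eth) || ntohs(eth->ether_type) != ETHERTYPE_IP)
        return NULL;

    const struct iphdr *ip = (const struct iphdr *)(frame + sizeof(*eth));
    if (caplen < sizeof(*eth) + sizeof(*ip) || ip->protocol != IPPROTO_TCP)
        return NULL;

    size_t iphl = ip->ihl * 4;
    if (caplen < sizeof(*eth) + iphl + sizeof(struct tcphdr))
        return NULL;

    const struct tcphdr *tcp =
        (const struct tcphdr *)((const unsigned char *)ip + iphl);
    size_t tcphl = tcp->doff * 4;

    size_t hdrs = sizeof(*eth) + iphl + tcphl;
    if (caplen < hdrs || ntohs(ip->tot_len) < iphl + tcphl)
        return NULL;

    *plen = ntohs(ip->tot_len) - iphl - tcphl;   /* application data only */
    return frame + hdrs;
}

UDP payloads can be exposed in the same way, with the fixed 8-byte UDP header in place of the variable-length TCP header.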

2.3  Replication by Traffic Type

Our initial analyses classified replication by traffic direction (incoming and outgoing) and type (TCP, UDP, other IP, and other Ethernet). It quickly became evident that most replication occurred in outgoing TCP data on the HTTP ports, i.e., Web traffic responding to queries from other sites. To highlight this, we separately classified such TCP traffic as HTTP traffic.

Table 1 summarizes the amount of replicated data that was found in each packet trace, for both inbound and outbound traffic. The left-hand columns show the results for all types of traffic in each trace, while the right-hand columns summarize the replication in only the HTTP traffic for each trace.

These results support our intuition that there are significant amounts of replicated data present in the traces. Further, most of the traffic, as well as a greater percentage of the replication, exists in the outbound traffic. Therefore, for the remainder of this paper, we will focus on the outbound traffic over the link.
Figure 2: Percent of outbound traffic versus window size (x-axis: window size in MB, 0 to 600; y-axis: replicated volume in percent; one curve per trace set A through E).

2.4  Replication by Packet Size

A further criterion that is important to our scheme is packet size. Replication in large packets will result in a more effective system than replication in small packets when fixed-length packet overheads and packet processing costs are taken into account.

To assess this effect, we classified the replicated data according to the length of the data payload. Figure 1 depicts the cumulative volume of replicated data according to packet length. The sharp increases correspond to the default TCP segment size (536 bytes) and the maximum Ethernet payload (about 1.5 KB). It is apparent that most of the volume of replicated data occurs in large packets. This suggests that the small per-packet space costs required for compression will not result in a significant system overhead.

2.5  Distribution of Replication

Finally, the timescale of replication events determines the size of the packet cache needed to observe and remove such redundancy. To quantify this effect, we determined the interval, in bytes of data, from each match to the previous copy of the match. These intervals were then grouped to compute the percentage of the replicated traffic that could be identified as a function of window size. Figure 2 shows this result for all outbound traffic. The positive result that we infer is that the majority of the replicated data can be observed with a cache of modest size; i.e., reasonable results can be expected if we cache the data in the amount of RAM that is presently available in PCs.
3  Design and Implementation

We now describe the design and implementation of a compression architecture that suppresses replicated data, based on the analysis from Section 2. The overall goal of our scheme is simply to transmit repeated data as a short dictionary token, using caches of recently seen data at both ends of the link to maintain the dictionary and to encode and decode these tokens.

The correct operation of this scheme as a distributed system is complicated by the fact that messages may be lost by the channel. Our design must resolve the following issues:

- How are dictionary tokens generated?

- How are dictionaries at either end of the link maintained in a nearly synchronized state?

- How are inevitable differences in dictionary state handled?

Our approach is based on the insight that the fingerprint of a data segment is an inexpensive name for the data itself, both in terms of space and time. We are aware of the use of fingerprints for identification and version control in various systems, e.g., Java RMI, but to the best of our knowledge this is the first time that fingerprints have been applied for this purpose at the network layer.

We selected the MD5 hash for our implementation because it is 128 bits long and may be calculated in one rapid traversal of the data; on a Pentium II, the rate at which payloads can be fingerprinted comfortably exceeds the link rates we consider. Further, given that the hash is large enough and collisions rare enough, it is effectively a unique name for the data. For example, though our architecture handles collisions, none were detected in our trace data analysis.
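
To make the naming idea concrete, the short sketch below (ours, written for this description rather than taken from the prototype) computes a 16-byte MD5 fingerprint with OpenSSL and compares two fingerprints; comparing 16 bytes stands in for comparing payloads that may be a kilobyte or more long.

#include <openssl/md5.h>   /* MD5(), MD5_DIGEST_LENGTH == 16 */
#include <string.h>

/* A fingerprint is a fixed 16-byte name for an arbitrary payload. */
struct fp {
    unsigned char bytes[MD5_DIGEST_LENGTH];
};

/* Compute the fingerprint of a packet payload in a single pass. */
static void fp_compute(const unsigned char *payload, size_t len, struct fp *out)
{
    MD5(payload, len, out->bytes);
}

/* Two payloads are treated as identical when their fingerprints match;
 * the architecture still verifies the payload bytes before relying on a
 * match, so a rare collision cannot corrupt data (see the caching policies
 * below). */
static int fp_equal(const struct fp *a, const struct fp *b)
{
    return memcmp(a->bytes, b->bytes, MD5_DIGEST_LENGTH) == 0;
}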
Figure 3: Components of the architecture. Input traffic passes through classify, cache, and compress stages at the compressor, crosses the channel in compressed or uncompressed form, and passes through decompress, classify, and cache stages at the decompressor before being output.

To handle message loss in a lightweight fashion, we have opted to process messages independently, such that each message is the unit of error generation and recovery. That is, our scheme is connectionless (aside from the dictionary state) and does not require that a reliable transport protocol be run across the link in order to recover from errors.

3.1  Architecture

The main components of our architecture are shown in Figure 3, which depicts a unidirectional compression system to simplify our description. The system consists of a compressor, a channel, and a decompressor. The compressor is a repeater (perhaps part of a router) that accepts input traffic, processes it to compress replicated data, and transmits the resulting packets over the channel. Conversely, the decompressor accepts traffic from the channel, processes it to remove the compression, and transmits it as output traffic. The channel itself may be any bidirectional link; we use the reverse direction to carry protocol control messages. Bidirectional compression is achieved by using two instances of the protocol, one for each direction.

Both the compressor and decompressor are composed of several modules for classifying, caching, and compressing packets. Our architecture allows different policies to be selected for the implementation of each of these stages, subject to the constraint that the compressor and decompressor implement identical processing in order to ensure that their dictionaries remain closely synchronized. In particular, the dictionary caches must be of equal size. We describe each module in turn.

3.1.1  Classifying Packets

Not all packets need be entered into the dictionary cache. Our analysis in Section 2 showed that most of the replicated data in our traces was composed of outgoing Web traffic and large packets. An implementation may take advantage of such bias by selectively considering only certain types of traffic for cache inclusion. The classification step in our architecture serves this role, and must be performed in the same manner at the compressor and decompressor.

The classifier further encodes the rules for identifying application data units (ADUs) embedded within the payload of packets, e.g., the stripping of headers up to the TCP/UDP level. By using application level framing (ALF) concepts, other extension policies could be designed to cater for specific application headers or to compensate for the different division of data across different protocols.
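
A classification policy in this spirit might look like the sketch below; the TCP-only filter and the length threshold are illustrative assumptions rather than the prototype's exact rules, and whatever rule is chosen must be applied identically at both ends of the link.

#include <netinet/in.h>   /* IPPROTO_TCP */
#include <stddef.h>

#define MIN_PAYLOAD 128   /* assumed threshold; tiny payloads are not worth caching */

/* A candidate application data unit (ADU) exposed by header stripping. */
struct adu {
    const unsigned char *data;
    size_t               len;
    int                  ip_proto;
};

/* Inclusion test applied identically at compressor and decompressor. */
static int should_cache(const struct adu *a)
{
    return a->ip_proto == IPPROTO_TCP && a->len >= MIN_PAYLOAD;
}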

3.1.2  Caching Policies

The cache module maintains the dictionary state, naming payloads by their fingerprint. Our architecture allows any fingerprint to be used, depending on the required tradeoff between speed, space, and collisions. In our implementation we use MD5, though stronger fingerprints such as SHA or weaker fingerprints such as MD4 may be used.

Two policies govern the operation of the cache: the inclusion policy decides which payloads selected by classification should be admitted to the cache, and the replacement policy decides which payloads should be evicted when more space is needed. As for classification, the compressor and decompressor must implement identical policies.
Our default policies are simple: all payloads that are output by the classifier are entered into the cache, and the cache is maintained in least-recently-used order. For inclusion, an interesting alternative policy would be to store replicated data only after its fingerprint had been encountered a certain number of times. Depending on the number of times a given payload is repeated, this may significantly reduce the storage required to suppress a given volume of replicated data. For replacement, results with Web caching suggest that taking payload length into consideration may improve performance, since larger data payloads translate to higher per-packet savings.

A further issue that affects inclusion is fingerprint collision. Collisions are expected to be extremely rare, but nevertheless it is conceivable that they may occur. If so, they must not result in a deterministic error, with the same offending data being repeatedly transferred to correct perceived transmission errors.

In our architecture, collision detection is performed as part of cache lookup and insertion at the compressor. Every time a fingerprint matches, the full payload data is compared with the existing cache contents before it is entered. If a collision is encountered, the fingerprint is marked as illegal in the dictionary and the colliding payload is transmitted without any compression. Any subsequent payloads which index to the illegal fingerprint are also transmitted uncompressed. These illegal entries must persist at the compressor until the decompressor is reset.
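
One possible shape for such a dictionary is sketched below: a hash table keyed by the fingerprint whose entries are also threaded on an LRU list, with the payload verified on every hit and an illegal flag recorded when a collision is detected. The table size, memory budget, and layout are illustrative assumptions, not the prototype's actual parameters.

#include <stdlib.h>
#include <string.h>

#define FP_LEN   16                      /* MD5 fingerprint length      */
#define NBUCKETS 65536                   /* assumed hash-table size     */

struct entry {
    unsigned char  fp[FP_LEN];
    unsigned char *payload;              /* cached copy of the data     */
    size_t         len;
    int            illegal;              /* a collision was seen on fp  */
    struct entry  *hnext;                /* hash chain                  */
    struct entry  *prev, *next;          /* LRU list, head = most recent */
};

static struct entry *bucket[NBUCKETS], *head, *tail;
static size_t used, limit = 64u << 20;   /* assumed memory budget       */

static unsigned idx(const unsigned char *fp)
{   /* fingerprints are uniform, so two of their bytes index the table  */
    return ((unsigned)fp[0] << 8 | fp[1]) % NBUCKETS;
}

static void lru_unlink(struct entry *e)
{
    if (e->prev) e->prev->next = e->next; else head = e->next;
    if (e->next) e->next->prev = e->prev; else tail = e->prev;
}

static void lru_front(struct entry *e)
{
    e->prev = NULL; e->next = head;
    if (head) head->prev = e; else tail = e;
    head = e;
}

/* Look up fp; on a hit, verify the payload.  A mismatch is a collision:
 * mark the fingerprint illegal so it is never used for compression.    */
static struct entry *cache_lookup(const unsigned char *fp,
                                  const unsigned char *data, size_t len)
{
    for (struct entry *e = bucket[idx(fp)]; e; e = e->hnext) {
        if (memcmp(e->fp, fp, FP_LEN) != 0) continue;
        if (e->illegal) return NULL;
        if (data && (e->len != len || memcmp(e->payload, data, len) != 0)) {
            e->illegal = 1;              /* persists until the peer resets */
            return NULL;
        }
        lru_unlink(e); lru_front(e);     /* mark as recently used          */
        return e;
    }
    return NULL;
}

/* Insert a payload, evicting least-recently-used entries to make room.
 * Illegal entries are skipped so that they persist until a reset.       */
static void cache_insert(const unsigned char *fp,
                         const unsigned char *data, size_t len)
{
    for (struct entry *v = tail; v && used + len > limit; ) {
        struct entry *victim = v; v = v->prev;
        if (victim->illegal) continue;
        struct entry **pp = &bucket[idx(victim->fp)];
        while (*pp != victim) pp = &(*pp)->hnext;
        *pp = victim->hnext;
        lru_unlink(victim);
        used -= victim->len;
        free(victim->payload); free(victim);
    }
    struct entry *e = calloc(1, sizeof *e);
    memcpy(e->fp, fp, FP_LEN);
    e->payload = malloc(len);
    memcpy(e->payload, data, len);
    e->len = len;
    e->hnext = bucket[idx(fp)];
    bucket[idx(fp)] = e;
    lru_front(e);
    used += len;
}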

3.1.3  Compression and Decompression

Finally, the compression and decompression modules exchange dictionary tokens to suppress the actual transfer of repeated data. Different policies may be used by the compressor to decide when to compress payloads. Our default policy is simply to send tokens whenever repeated data is available. Alternative policies may be useful when the link has a large latency or a high error rate and it is desirable to further reduce the chance that the far end of the link does not have the payload corresponding to a token. In these cases, it would be possible to send tokens only after the payload has been sent multiple times, or, in the case of TCP traffic, to send the token once the acknowledgment of the payload is detected in the reverse direction.
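
A compression-policy hook might then be as small as the following sketch (illustrative); the default variant sends a token on every repeat, while a more conservative variant waits until the payload has been seen a few times.

/* Decide whether to send a dictionary token for a payload that hit in the
 * cache.  'hits' counts how many times this payload has been seen so far. */
static int should_send_token(unsigned hits, int conservative)
{
    if (!conservative)
        return 1;          /* default policy: compress on every repeat        */
    return hits >= 3;      /* assumed threshold for lossy or high-latency links */
}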

3.2  Protocol Operation

We now describe the exchange of protocol messages between the compressor and decompressor. These fall into three cases.

- In the normal case, a payload is transferred (being entered in the dictionary as a side effect), and after some interval another payload with the same contents is transferred, this time as a dictionary token. We refer to this case as compression.

- Occasionally, however, message loss on the channel may cause the two caches to lose synchronization, and a dictionary token that is transferred must be returned to the sender to be resolved. We refer to this case as rejection.

- Further, if either the compressor or decompressor is restarted during the operation of the protocol, it is desirable to reset the other cache to a known state. Therefore, we add reset messages to the protocol.

3.2.1  Compression

The sequence of message exchange in the compression case is shown as a time sequence diagram, with time proceeding down the page, in Figure 4. These descriptions assume that the incoming packet passes the classification stage and satisfies the inclusion policy; packets that do not are simply forwarded over the link in the usual fashion.

When the compressor receives a packet {HdrA, X} to be forwarded over the link, where HdrA is the TCP/IP header and X is the data payload, it first computes H(X), the fingerprint of X. If it finds that no entry indexed by H(X) exists in its cache, it stores X in its cache, indexed by H(X). It then forwards the TCP/IP packet across the link. Upon receiving a TCP/IP packet forwarded over the channel, the decompressor also computes H(X) and stores X in its cache, indexed by H(X). The TCP/IP packet is then output from the system.

Figure 4: Compression protocol (time sequence diagram). The first packet {HdrA, X} is forwarded unchanged and its payload is stored under H(X) at both ends; a later packet {HdrB, X} carrying the same payload is transmitted as the fingerprint packet {HdrB, H(X)}, which the decompressor expands back to {HdrB, X}.

At some point later, the compressor may receive a packet {HdrB, X} for which an entry indexed by H(X) already exists in its cache. This indicates that it has already received a packet containing X, which it forwarded over the link. Therefore, assuming the compression policy is satisfied, it sends a packet to the decompressor containing the TCP/IP header HdrB and the fingerprint H(X). Fingerprint packets appear in bold type in the protocol diagrams.

The implementation must therefore provide a means for these fingerprint packets to be distinguished from ordinary IP packets. In practice, this is not a problem, because the codepoint used for demultiplexing protocols at the link level may be overloaded; e.g., we allocate additional types for the Ethernet protocol type field. Note that it is important that this identification scheme not increase the length of the packet, since this would necessitate a segmentation and reassembly protocol to accommodate maximum-length datagrams.

When the decompressor receives a fingerprint packet {HdrB, H(X)}, it determines the data payload X that is indexed by H(X) in its cache. It then forwards the corresponding TCP/IP packet {HdrB, X} to the network on that end.
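
Pulling these steps together, the compressor's forwarding decision can be sketched as follows. The helper functions and the link-level type value used to mark fingerprint packets are assumptions made for illustration (0x88B5 is an Ethertype reserved for local experimental use), not the prototype's actual definitions.

#include <stddef.h>

/* Helpers assumed from the surrounding system (see the earlier sketches). */
extern int   classify(const unsigned char *pkt, size_t len,
                      size_t *payload_off, size_t *payload_len);
extern void *cache_lookup(const unsigned char fp[16],
                          const unsigned char *data, size_t len);
extern void  cache_insert(const unsigned char fp[16],
                          const unsigned char *data, size_t len);
extern void  fingerprint(const unsigned char *data, size_t len,
                         unsigned char fp[16]);
extern void  link_send(int ethertype, const unsigned char *hdrs, size_t hlen,
                       const unsigned char *body, size_t blen);

#define ETH_P_IP          0x0800
#define ETH_P_FINGERPRINT 0x88B5   /* assumed extra type allocated for tokens */

/* Compressor forwarding path for one IP packet. */
static void compress_and_send(const unsigned char *pkt, size_t len)
{
    size_t off, plen;
    unsigned char fp[16];

    if (!classify(pkt, len, &off, &plen)) {      /* not a cache candidate   */
        link_send(ETH_P_IP, pkt, len, NULL, 0);
        return;
    }
    fingerprint(pkt + off, plen, fp);

    if (cache_lookup(fp, pkt + off, plen) != NULL) {
        /* Seen before: send the protocol headers plus the 16-byte token in
         * place of the payload.  The packet does not grow, so no extra
         * segmentation is needed. */
        link_send(ETH_P_FINGERPRINT, pkt, off, fp, 16);
    } else {
        /* First sighting: cache it (the decompressor will do the same on
         * receipt) and forward the packet unchanged. */
        cache_insert(fp, pkt + off, plen);
        link_send(ETH_P_IP, pkt, len, NULL, 0);
    }
}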

3.2.2  Rejection

Figure 5: Rejection handling (time sequence diagram). The initial packet {HdrA, X} is lost on the channel, so the decompressor never caches X; when the fingerprint packet {HdrB, H(X)} later arrives and no entry is found, it is returned to the compressor as a rejection, and the compressor responds with the full packet {HdrB, X}, which the decompressor then caches and outputs.

The protocol as described above is incomplete, for it does not handle the case where a packet containing the first instance of a data payload is lost while being sent across the link. We expect this case to be rare for most channels, since bit error rates typically contribute negligibly to the overall packet loss, and loss due to congestion may be detected at the compressor, since it results in queue overflow and the lost payloads not being entered into the dictionary.

Nevertheless, if the protocol is left as is, the lack of feedback means that the compressor does not know that the decompressor never received the original payload. This means that it will send further copies of the payload by its fingerprint when the packet is retransmitted, causing ongoing loss. To correct this error, we introduce rejection handling into the protocol to handle events in which the decompressor receives a fingerprint that is not in its cache.

Figure 5 depicts rejection handling with another time sequence diagram. After message loss, if the decompressor receives a fingerprint packet {HdrB, H(X)} for which H(X) is not a valid entry in its cache, it sends the entire fingerprint packet, including the header, back to the compressor as a rejection packet. When the compressor receives this rejection, it determines the data X that is indexed by H(X). This data is highly likely to be in the cache at the compressor, since it was sent in the recent past. The compressor then sends the complete TCP/IP packet {HdrB, X} to the decompressor, which processes the packet as if it were receiving a new TCP/IP packet. It therefore enters it into its cache for subsequent use.

If any of the packets that are sent as part of the rejection handling are lost, or in the unlikely event that the compressor no longer has the payload corresponding to the rejected fingerprint in its cache, then the transmission has failed, and no further steps are taken to recover. This residual loss will then be handled by the reliability mechanisms of the application, in the same manner that packet loss is handled today.
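
The corresponding decompressor behavior, including the rejection path, can be sketched in the same style; packet layouts, helper names, and type values are again assumptions for illustration.

#include <stddef.h>

/* Helpers assumed from the surrounding system (illustrative signatures).   */
extern const unsigned char *cache_get(const unsigned char fp[16], size_t *len);
extern void cache_insert(const unsigned char fp[16],
                         const unsigned char *data, size_t len);
extern void fingerprint(const unsigned char *data, size_t len,
                        unsigned char fp[16]);
extern int  classify(const unsigned char *pkt, size_t len,
                     size_t *off, size_t *plen);
extern void output(const unsigned char *hdrs, size_t hlen,
                   const unsigned char *body, size_t blen);    /* to the LAN  */
extern void send_reverse(int ethertype, const unsigned char *pkt, size_t len);
extern void link_send(const unsigned char *hdrs, size_t hlen,
                      const unsigned char *body, size_t blen); /* to the peer */

#define ETH_P_FINGERPRINT 0x88B5   /* assumed: marks token packets      */
#define ETH_P_REJECT      0x88B6   /* assumed: marks rejection packets  */

/* Decompressor: handle one packet arriving over the channel. */
static void on_channel_packet(int ethertype, const unsigned char *pkt, size_t len)
{
    if (ethertype == ETH_P_FINGERPRINT) {
        size_t hlen = len - 16, dlen;            /* headers + 16-byte token */
        const unsigned char *data = cache_get(pkt + hlen, &dlen);
        if (data == NULL) {
            /* Caches are out of step: return the whole fingerprint packet
             * as a rejection so the compressor can resend the payload. */
            send_reverse(ETH_P_REJECT, pkt, len);
            return;
        }
        output(pkt, hlen, data, dlen);           /* reconstruct {HdrB, X}   */
        return;
    }

    /* Uncompressed packet: mirror the compressor's caching step, then emit. */
    size_t off, plen;
    if (classify(pkt, len, &off, &plen)) {
        unsigned char fp[16];
        fingerprint(pkt + off, plen, fp);
        cache_insert(fp, pkt + off, plen);
    }
    output(pkt, len, NULL, 0);
}

/* Compressor: handle a rejection arriving on the reverse channel. */
static void on_rejection(const unsigned char *pkt, size_t len)
{
    size_t hlen = len - 16, dlen;
    const unsigned char *data = cache_get(pkt + hlen, &dlen);
    if (data != NULL)
        link_send(pkt, hlen, data, dlen);        /* resend the full packet  */
    /* Otherwise the transfer has failed; end-to-end reliability recovers.  */
}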

3.3  Reset Messages

During normal operation of the protocol, the compressor keeps track of all illegal fingerprints, i.e., those fingerprints for which a collision occurred. In the event that this state is lost (e.g., the compressor is restarted), the compressor reliably sends a cache reset message to the decompressor to ensure that the decompressor does not retain any entries indexed by a previously illegal fingerprint.

Further, restarting the decompressor during operation of the protocol may result in significant rejection traffic. Therefore, we explicitly send a cache reset message from the decompressor to the compressor. This is merely a performance optimization, and is not essential for correctness.
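
The reset exchange itself can be as simple as the following sketch: the side that restarted clears its own dictionary and repeats a small control message on the reverse channel until the peer acknowledges it. The message format and helpers are invented for illustration.

/* Assumed control-message plumbing (formats are illustrative only).       */
extern void control_send(const void *msg, unsigned len);  /* reverse channel */
extern int  control_wait_ack(int msec);                   /* 1 if acked      */
extern void cache_clear(void);                            /* drop all entries */

enum { MSG_CACHE_RESET = 1 };

struct reset_msg { unsigned char type; };                  /* one byte suffices */

/* Called by a compressor or decompressor that has just restarted (or, at
 * the compressor, that has lost its record of illegal fingerprints). */
static void send_cache_reset(void)
{
    struct reset_msg m = { MSG_CACHE_RESET };
    cache_clear();                        /* start from a known-empty state */
    do {
        control_send(&m, sizeof m);       /* repeat until the peer confirms */
    } while (!control_wait_ack(100));
}

/* Called on receipt of a reset: the peer is starting over, so we do too,
 * which also lets a compressor forget its illegal fingerprints. */
static void handle_cache_reset(void)
{
    cache_clear();
}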

3.4  Implementation

We implemented the architecture described above using a pair of Intel Pentium II-based PCs running Linux. The machines were directly connected to each other via a dedicated Ethernet, and both machines were also connected to the Ethernet network which comprises our research group's LAN. Both the compressor and decompressor modules were written in C and ran as user-level processes.

The compressor machine was configured with IP forwarding enabled in the kernel. However, we modified the kernel forwarding routine to send the packets to the user-level program instead of handling the routing itself. We also allocated additional Ethernet protocol types to distinguish the fingerprint and rejection packets from the uncompressed packets.

We implemented the dictionary caches using hash table structures with a least-recently-used replacement strategy. For fingerprints, we used the MD5 hash of the payload. We also used a classifier that only accepted data with payloads above a minimum length, since Figure 1 indicates that the remaining data comprises only a small fraction of the replicated volume. Finally, we limited the amount of memory available for each cache, excluding the overhead induced by the hash table implementations, to a fixed budget.

4  Experimental Results

To evaluate the system, we performed three sets of experiments:

- We measured the bandwidth savings that our system provides in practice when operating on real traffic.

- We measured the baseline performance of the compressor and decompressor to gauge at what rates our system may be used.

- We compared the bandwidth savings produced by our system with those of alternative compression schemes.

4.1  Bandwidth Reduction

Our main design goal is to reduce the amount of bandwidth consumed by replicated data. We measured bandwidth savings by inserting our system into the network at the point where we previously gathered traces (see Section 2.1). We kept track of the amount of data input to and output from the system, and of the amount of data transmitted across the compressed channel, while the system ran for approximately a day and processed several gigabytes of traffic.

Figure 6 shows the resulting bandwidth reduction for each minute of the run. It shows that the implementation is effective in reducing the bandwidth utilization for the entire duration of the experiment.
Table 2: Percentage of bandwidth saved on outbound traffic by Reduction (our scheme), by Compression (gzip), and by both combined, for trace sets A through E and on average.
Figure 6: Bandwidth reduction for all outbound traffic (x-axis: time in minutes, 0 to 1400; y-axis: bandwidth reduction in percent).

4.2  System Performance

Since we are interested in the potential of this scheme for use in networks of higher speed than the access links discussed so far, we measured the overall system performance to see how fast it would run.

Packet streams containing no detectable replication incur the highest amount of processing for each packet at both the compressor and decompressor. This therefore presents the worst-case load for our system, and we used such streams to test its performance.

To measure the throughput of the system, we ran it over the dedicated Ethernet channel and sent our test stream of packets over a TCP connection that flowed over the channel. We measured latency by using tcpdump as a passive monitor to capture and timestamp packets entering and leaving both the compressor and decompressor. To observe small latencies, we used a modified Linux kernel that records timestamps using the processor cycle counter at driver interrupt time.

The results of our tests were that our implementation was able to forward this worst-case stream at rates well above T1 speeds, and that the latencies of the compressor and decompressor were similar and small. These results are encouraging: our system can already run at rates exceeding T1, despite the fact that it is a user-level prototype that has not been tuned, e.g., to minimize copies. Further, preliminary comparisons with other compression schemes such as gzip (discussed below) suggest that our scheme is significantly less computationally expensive. The similar and low latencies of the compressor and decompressor result in a balanced system for given hardware and a small impact on overall latency. They are also likely to improve significantly with a kernel-space implementation, since the overhead of context switching would be removed.

4.3  Other Compression Methods

Since bandwidth savings are heavily data dependent, we compared our bandwidth reductions with those of other compression schemes to place them in context and help gauge their significance.

As an approximation to real systems, we ran our trace data through a process that applied gzip compression to packet payloads and recorded volume statistics. To simulate useful schemes under real-time and high-throughput conditions, we used the fastest library setting and processed packets individually; even so, gzip is substantially slower than our scheme and could not keep pace with the faster links we consider. Table 2 compares this compression with our scheme for removing replicated data. We infer from these results that our scheme provides similar benefits, somewhat smaller on average, but requiring less computation.

To look at the effects of combining our reduction and regular compression, we ran our trace data through a process that combined the two, first removing replicated data and then compressing the remainder. Table 2 also shows these results. It highlights the fact that reduction and compression combine rather than overlap, as each tends to tap a correlation on a different timescale.

We also considered the impact of header compression, but quickly realized that it would provide smaller savings. Given the relatively large average packet size in our traces, eliminating TCP/IP headers from all packets would save only a small percentage of the bandwidth, and even this best case is unlikely to be obtained across a link where there is significant traffic mixing.
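
For reference, this style of per-packet comparison can be approximated with zlib's one-shot interface at its fastest setting, compressing each payload in isolation; the sketch below illustrates the methodology and is not the exact harness behind Table 2.

#include <zlib.h>

/* Compress one packet payload in isolation at zlib's fastest setting and
 * return the compressed size, mimicking a per-packet link compressor.     */
static unsigned long deflated_size(const unsigned char *payload,
                                   unsigned long len)
{
    static unsigned char out[4096];        /* ample for one Ethernet payload */
    unsigned long outlen = sizeof(out);
    if (compress2(out, &outlen, payload, len, Z_BEST_SPEED) != Z_OK)
        return len;                        /* on failure, count it as sent   */
    return outlen;
}

/* A driver (not shown) would sum, over every packet in a trace,
 *   raw_bytes  += len;                          total traffic
 *   gzip_bytes += deflated_size(payload, len);  per-packet compression
 * and report 100 * (1 - gzip_bytes / raw_bytes) as the saving. */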

5  Related Work

We discuss two categories of related work: compression and caching.

5.1  Compression Techniques

When faced with a limited transfer rate, higher effective throughput can be obtained by compressing the data before tra