`ScholarlyCommons
`
`Technical Reports (CIS)
`
`Department of Computer & Information Science
`
`May 1991
`
`A Comparison of Compressed and Uncompressed
`Transmission Modes
`
`Tarek M. Sobh
`University of Pennsylvania
`
`Jaffar Rehman
`University of Pennsylvania
`
`Follow this and additional works at: http://repository.upenn.edu/cis_reports
`
`Recommended Citation
`Sobh, Tarek M. and Rehman, Jaffar, "A Comparison of Compressed and Uncompressed Transmission Modes" (1991). Technical
`Reports (CIS). Paper 411.
`http://repository.upenn.edu/cis_reports/411
`
`University of Pennsylvania Department of Computer and Information Science Technical Report No. MS-CIS-91-41.
`
`This paper is posted at ScholarlyCommons. http://repository.upenn.edu/cis_reports/411
`For more information, please contact repository@pobox.upenn.edu.
`
`
`
`
`
`
`
A Comparison of Compressed and Uncompressed
Transmission Modes
`
`MS-CIS-91-41
`GRASP LAB 266
`
`Tarek M. Sobh
Jaffar Rehman
`
`Department of Computer and Information Science
`School of Engineering and Applied Science
`University of Pennsylvania
`Philadelphia, PA 19104-6389
`
`May 1991
`
`
`
`
`A Comparison of Compressed and Uncompressed Transmission
`Modes
`
`Tarek M. Sobh and Jaffar Rehman
`
`Department of Computer and Information Science
`School of Engineering and Applied Science
`University of Pennsylvania, Philadelphia, PA 19104
`
`Abstract
`
In this paper we address the problem of host-to-host communication. In particular, we discuss
the issue of efficient and adaptive transmission mechanisms over possible physical links. We
develop a tool for making decisions regarding the flow of control sequences and data from and
to a host. The issue of compression is discussed in detail; a decision box and an optimizing
tool for finding the appropriate thresholds for a decision are developed. Physical parameters like
the data rate, bandwidth of the communication medium, distance between the hosts, baud rate,
levels of discretization, signal-to-noise ratio and propagation speed of the signal are taken into
consideration while developing our decision system. Theoretical analysis is performed to develop
mathematical models for the optimization algorithm. Simulation models are also developed for
testing both the optimization and the decision tool box.
`
`1 Introduction
`
Data which is transmitted over a communication medium between two computers contains some form
of redundancy. This redundancy can be exploited to make economical use of the storage media or
to reduce the cost of transferring the data and commands over the communication network. One of
the basic issues in the design of the presentation layer is to decide whether data is to be compressed
before transmission or not. Many factors may affect making this decision; one of the most important
is the cost factor. Compressing data before sending it will often help reduce the cost.
`
Some other factors may affect the decision to compress. The time factor may be the deciding
one; in fact, one should not forget that there is the overhead of the compression and decompression
algorithms at the sender and at the receiver hosts. This overhead is both in time and money, as the
CPU is used for running the algorithms. Thus, the designer always faces the problem of when one
should compress the data. The parameters which may affect this decision might be the parameters of the
physical communication medium, the parameters of the compression algorithm used, or both.
`
The decision that the design engineer will have to make might be a decision to compress or not given
certain fixed compression and physical medium parameters, or it might be a decision to compress
depending on the value of one or more of the parameters (i.e., to compress if a certain threshold is
met). In our work, we try to develop a tool for making such a decision. We choose time to be the
criterion for making the compression decision, where the time covers both the compression overhead and
the transmission. We develop a yes/no decision box given some fixed parameters and an optimizing
decision box for finding the threshold for one varying parameter while fixing the others. Theoretical
analysis is performed, as well as simulations for different sets of data, and a decision (or a threshold)
is output for each kind of analysis.
`
`
`
`
`2 Physical and Compression Parameters
`
The parameters of the communication medium between two hosts will affect the decision regarding
compression. Whether the medium is a coaxial cable, a fiber optic cable or air (in the case of radio
transmission), it can always be completely specified by parameterizing it. The parameters that may
help in determining the transmission time can be listed as follows:
`
• The data rate in bits per second. { D bps }

• The bandwidth of the medium in hertz. { B Hz }

• The distance between the two hosts under consideration in meters. { L m }

• The levels of discretization. { l }

• The baud rate in changes per second. { b }

• The signal-to-noise ratio in decibels. { S dB }

• The propagation speed of the signal in the medium, in meters per second. { P m/s }
`
The simultaneous use of all the above parameters to express the time for transmission is redundant.
For example, it is sufficient to use the number of bits and the data rate to express the time. If the
data rate is not available we can use the baud rate, the levels of discretization and the data size.
Alternatively, we can also use Shannon's maximum data rate bound, thus using the bandwidth, the
signal-to-noise ratio and the data size to find an expression for the minimum time for transmission.
`
The other set of parameters involved in the computation of the time for transmitting a
certain amount of data is the set of compression algorithm parameters. The CPU run time
as a function of the size of the data input to the algorithm is one of those parameters. The expected
compression ratio, which depends on the type of data to be transmitted, is the second
compression parameter of concern.
`
3 Mathematical Formulation
`
The problem can be mathematically formulated by trying to find the cost of sending a certain number
of bits from one host to another. The cost will be assumed to be the time through which the channel
is kept busy sending the data, plus the time it takes the CPU to perform the compression
and decompression on the data that are required to be transmitted. One can use a weight in the
cost expression to denote that, for example, the cost of utilizing the network cable for one second
is X times the cost of utilizing the CPU for one second. Thus, the expression for the cost function
may be written as:

    \text{Cost} = \text{Transmission time} + X \times \text{CPU computation time}

where X is the ratio between the cost of using the network for one unit of time and the cost of one unit of
CPU time.
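To make the weighting concrete, a minimal sketch of this cost function follows (Python is used here purely for illustration; the function and parameter names are ours, not taken from the report's toolbox):

    # A minimal sketch of the cost expression above.
    # 'x' is the network-time / CPU-time cost ratio X.
    def total_cost(transmission_time: float, cpu_time: float, x: float) -> float:
        # Cost = transmission time + X * CPU computation time
        return transmission_time + x * cpu_time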
`
`
`
`
`3.1 The Transmission Time
If we make the assumption that we only have two hosts connected directly, and ignore the other
overheads of the protocol to be used, the time needed to transmit N bits from one host to another
can be written as a mathematical expression in terms of the physical medium parameters. For our
implementation we are going to develop the transmission expression in four different ways, using
four different sets of physical parameters, where each set individually is sufficient to specify the
transmission time T_1 completely.
`
3.1.1 Formulation Using the Data Rate
The time required for transmitting N bits can be formulated as follows:

    T_1 = \frac{N}{D} + \frac{L}{P}

where D is the data rate in bits per second, L is the distance between the two hosts and P is the
signal propagation speed. The first term is for the emission of the N bits from the sender and the second
term is the time for the last bit to reach the receiver.
`
3.1.2 Formulation Using the Baud Rate
The time required for transmitting N bits can be formulated as follows:

    T_1 = \frac{N}{b \log_2 l} + \frac{L}{P}

where b is the baud rate in changes per second, l is the number of levels of discretization, L is the
distance between the two hosts and P is the signal propagation speed. The first term is for the
emission of the N bits from the sender and the second term is the time for the last bit to reach the
receiver.
`
3.1.3 Formulation Using the Bandwidth
The time required for transmitting N bits can be formulated as follows:

    T_1 = \frac{N}{2 B \log_2 l} + \frac{L}{P}

where B is the bandwidth in hertz, l is the number of levels of discretization, L is the distance
between the two hosts and P is the signal propagation speed. The first term is for the emission of
the N bits from the sender and the second term is the time for the last bit to reach the receiver. In this
case the channel is assumed to be completely noiseless; we are assuming the maximum possible
(Nyquist) data rate 2 B \log_2 l.
`
3.1.4 Formulation Using the Signal-to-Noise Ratio
The time required for transmitting N bits can be formulated as follows:

    T_1 = \frac{N}{B \log_2 \left( 1 + 10^{S/10} \right)} + \frac{L}{P}

where B is the bandwidth in hertz, S is the signal-to-noise ratio in decibels (so the linear
signal-to-noise ratio is 10^{S/10}), L is the distance between the two hosts and P is the signal
propagation speed. The first term is for the emission of the N bits from the sender and the second
term is the time for the last bit to reach the receiver. In this case Shannon's maximum data rate
of a noisy channel is assumed.
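The four formulations can be summarized in a short sketch (illustrative Python; variable names follow the symbols above, and the decibel-to-linear conversion for S reflects how the report states the parameter, in dB):

    from math import log2

    # Transmission time for N bits under each of the four parameter sets.
    def t_data_rate(N, D, L, P):
        return N / D + L / P                     # emission + propagation

    def t_baud_rate(N, b, l, L, P):
        return N / (b * log2(l)) + L / P         # data rate = b * log2(l)

    def t_bandwidth(N, B, l, L, P):
        return N / (2 * B * log2(l)) + L / P     # Nyquist rate, noiseless channel

    def t_snr(N, B, S_db, L, P):
        snr = 10 ** (S_db / 10)                  # decibels -> linear ratio
        return N / (B * log2(1 + snr)) + L / P   # Shannon capacity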
`
`
`
`
`3.2 The Compression and Decompression Times
The run times of the algorithms for compression and decompression can be expressed as functions
of the size of the input in terms of machine cycles. That is, the compression time can be expressed
as T_2(M), where M is the size of the data that is input to the compression algorithm.
`
`3.3 Total Cost Without Using Compression
The total time to send N bits without using compression would then simply be equal to the transmission
time; thus it equals one of the four expressions discussed previously. The total cost is considered
to be only the time during which the communication channel is utilized.
`
`3.4 Total Cost Using Compression
The total cost for transmitting a sequence of bits using compression will be assumed to be a weighted
combination of the time for transmission and the times for compression and decompression. Thus,
if we assume the compression ratio of the algorithm to be equal to R, X to be the ratio between
the cost of using the network for one unit of time and the cost of one unit of CPU time, and if we also
assume a variable page size, i.e., compression is to be performed before each transmission of a block
of data of size M, the total cost to be incurred (when we express the transmission time in terms of
the data rate) can be written as:

    C = \frac{M/R}{D} + \frac{L}{P} + X \left( f_1(M) + f_2(M/R) \right)

where f_1 and f_2 are the compression and decompression runtime functions (in terms of the input
size).
Similarly, the total cost can be written for the other sets of physical medium parameters as:

    C = \frac{M/R}{b \log_2 l} + \frac{L}{P} + X \left( f_1(M) + f_2(M/R) \right)

    C = \frac{M/R}{2 B \log_2 l} + \frac{L}{P} + X \left( f_1(M) + f_2(M/R) \right)

    C = \frac{M/R}{B \log_2 \left( 1 + 10^{S/10} \right)} + \frac{L}{P} + X \left( f_1(M) + f_2(M/R) \right)
`
`4 Compression Algorithms
`
Methods to compress data have been studied for many years. However, several problems have
prevented the widespread integration of compression methods into computer systems for automatic
data compression. These problems include poor runtime execution preventing high data rates, lack
of flexibility in the compression procedures to deal with most types of redundancy, and storage
management problems in dealing with the storage of blocks of data of unpredictable length. In most
cases a method presents some subset of these problems and is therefore restricted to applications
where its inherent weaknesses do not result in critical inefficiencies. In this section we shall review
the different forms of redundancy that can be taken advantage of for compression and then look at
some of the approaches taken towards this end. Then we shall present a method due to Welch [4]
which avoids many of the drawbacks of most of the methods.
`
`4.1 Kinds of Redundancies
There are four major types of redundancy that we shall mention here. The forms of redundancy
discussed are not independent of each other but overlap to some extent. There are some other forms
of redundancy too, but the ones we are going to discuss are the major ones.
`
In different types of data, some characters are used more frequently than others. For example,
in English text we see the space and 'e' more frequently than any other character, while special
characters are used far less frequently. Generally speaking, not all string combinations may be used
in a specific data set, raising the possibility of reducing the number of bits per combination.
This kind of redundancy is due to character distribution.
`
The repetition of string patterns is another form of redundancy found in some cases. For
example, sequences of blank spaces are very common in some kinds of data files. In such cases the
message can usually be encoded more compactly than by repeating the string pattern.
`
In a certain type of data set, certain sequences of characters might appear very frequently. Some
pairs may be used with higher frequency than the individual probabilities of the letters in these
pairs would imply. Therefore, these pairs could be encoded using fewer bits than the code formed
by joining together the individual encodings of the two characters. Likewise, unusual pairs can be
encoded using longer bit patterns to achieve better overall utilization.
`
`In some data sets, certain strings or numbers consistently appear at a predictable position. This is
`called Positional redundancy. It is a form of partial redundancy that can be exploited in encoding.
`
`4.2 Methods of Compression
In what follows, we shall describe several compression algorithms which exploit the above forms of
redundancy. We will then choose one of these methods for our simulation.
`
`4.2.1 Huffman Encoding
`
This is the most popular compression method. It translates fixed-size pieces of input data into
variable-length symbols. The method is based on the probability of occurrence of the various symbols.
Each symbol is assigned a code such that the number of bits in the assigned code approximately equals
-\log_2(\text{symbol occurrence probability}). Huffman encoding has certain problems. The first problem is
that the size of the input symbols is restricted by the size of the translation table. If a symbol is one
eight-bit byte, then a table of 256 entries is sufficient. However, such a table limits the degree of
compression that can be achieved. If the size of the input symbols is increased to two bytes each,
the compression degree would increase. However, such an encoding would require a table of 64K
entries, which may be an unacceptably high cost.
`
The complexity of the decompression process is another problem of Huffman encoding. The
translation table is essentially a binary tree, in that the interpretation of each code proceeds bit by
bit and a translation subtable is chosen depending on whether the bit is zero or one. This basically
means a logic decision on every bit, which can create a system bottleneck.
`
The third problem with Huffman encoding is the need to know the frequency distribution of the characters
in the input data, which is not well known for all kinds of data. The translation table is derived from
the frequency distribution of the characters. A common solution is to do two passes over the data, one
to find the frequency distribution ordering and the other to do the encoding. This process may
be done blockwise to adapt to changes in the data. This is not very efficient, although it might be
acceptable.
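For concreteness, a sketch of the table construction follows (illustrative Python; the report's experiments use LZW rather than Huffman, so this is only a reference implementation of the method described above):

    import heapq
    from collections import Counter

    # Build Huffman codes from a first pass over the data; a symbol with
    # probability p receives a code of roughly -log2(p) bits.
    def huffman_codes(data: bytes) -> dict:
        freq = Counter(data)
        # Heap entries: (frequency, tiebreaker, {symbol: code-so-far}).
        heap = [(f, i, {s: ''}) for i, (s, f) in enumerate(freq.items())]
        heapq.heapify(heap)
        if not heap:
            return {}
        if len(heap) == 1:                       # degenerate one-symbol input
            return {s: '0' for s in heap[0][2]}
        tie = len(heap)
        while len(heap) > 1:
            f1, _, c1 = heapq.heappop(heap)      # two least frequent subtrees
            f2, _, c2 = heapq.heappop(heap)
            merged = {s: '0' + c for s, c in c1.items()}
            merged.update({s: '1' + c for s, c in c2.items()})
            heapq.heappush(heap, (f1 + f2, tie, merged))
            tie += 1                             # tiebreaker avoids comparing dicts
        return heap[0][2]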
`
4.2.2 Run-Length Encoding
Repeated sequences of identical characters can be encoded as a count field along with the repeated
character. However, the count fields have to be distinguished from the normal character fields, which
might have the same bit pattern as the count fields. A possible solution is to use a special character
to mark the count field. This might be suitable for ASCII text, but not when there are arbitrary bit
patterns such as in the case of binary integers. Typically, three characters are required to mark a
sequence of an identical character, so this will not be useful for sequences of length three or less.
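A sketch of this marking scheme (illustrative Python; the choice of marker byte and the 255-character run cap are our assumptions, not the report's):

    # Runs of 4 or more identical bytes become (MARKER, count, byte);
    # shorter runs are left alone, since the mark itself costs 3 bytes.
    MARKER = 0xFF   # assumed not to occur in the input data

    def rle_encode(data: bytes) -> bytes:
        out = bytearray()
        i = 0
        while i < len(data):
            j = i
            while j < len(data) and data[j] == data[i] and j - i < 255:
                j += 1
            if j - i >= 4:
                out += bytes([MARKER, j - i, data[i]])
            else:
                out += data[i:j]
            i = j
        return bytes(out)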
`
`4.2.3 Programmed Compression
`
In formatted data files, several techniques are used to do compression. Unused blank or zero spaces
are eliminated by making fields variable in length and using an index structure with pointers to each
field position. Predictable fields are compactly encoded by a code table. Programmed compression
cannot effectively handle character distribution redundancy and is therefore a nice complement to
Huffman encoding. Programmed compression has several drawbacks. First, it is usually done by
the application programmers, adding to the software development cost. The type of decompression
used requires knowledge of the record structure and the code tables. The choice of field sizes and
code tables may complicate or inhibit later changes to the data structure, making the software more
expensive to maintain.
`
`4.2.4 Adaptive Compression
The Lempel-Ziv [5,6] and related methods fall into this category. These methods exploit redundancies
in the form of character frequency, character repetition, and high-usage patterns. Fixed-length codes
are used for variable-length strings such that the probability of occurrence of all selected strings
is almost equal. This implies that strings composed of more frequently occurring symbols
will contain more symbols than strings having more of the infrequent symbols. This type
of algorithm requires one pass over the data and is adaptive in nature. It starts with an empty
translation table and builds the table dynamically as the compression proceeds. For this very reason
the algorithm yields poor results at the beginning of the data set. As the algorithm gains more
experience with the symbol frequencies, and the table grows larger, the compression degree
becomes better. The input should be long enough to achieve a good overall compression degree
over the whole data set. A finite implementation of such an algorithm would lose its adaptive
characteristic after a certain length of input if the input's redundancy characteristics vary over its
length.
`
We have chosen a variation of the Lempel-Ziv procedure called LZW, due to Welch [4].
This method retains the adaptive characteristics of the Lempel-Ziv procedure but yields relatively
inexpensive implementations and a potential for very fast execution because of its simple logic.
`
`
`
`
`4.3 The LZW Algorithm
The LZW algorithm is based on a translation table, referred to here as a string table, that maps
strings of input characters into fixed-length codes. The use of 12-bit codes is common. The LZW
table has the prefix property: if wK is a string in the table, where w is a string and K is a character,
then w is also in the table. K is called the extension character of the prefix string w. The string
table may be initialized to contain all single-character strings.
`
The LZW table, at any time in the compression process, contains the strings that have been encountered
previously in the message being compressed. In other words, it contains a running sample of the strings
in the message, so the available strings reflect the statistics of the message. The strings added to the
table are determined by the parsing: each parsed input string, extended by its next input character,
forms a new string added to the string table. Each such string is assigned a unique identifier, namely
its code value. In precise terms, this is the algorithm:
`
    Initialize table to contain single-character strings.
    Read first input character ==> prefix string w.
    Step: Read next input character K.
        If no such K (input exhausted): code(w) ==> output; EXIT.
        If wK exists in string table: wK ==> w; repeat Step.
        Else (wK not in string table):
            code(w) ==> output;
            wK ==> string table;
            K ==> w;
            repeat Step.
`
The algorithm is quite simple and can have a very fast implementation. The main concern in the
implementation is storing the string table, which can be very large. However, it can be made tractable
by representing each string by its prefix identifier and extension character. This gives a table of
fixed-length entries.
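A direct transcription of this pseudocode into illustrative Python follows (fixed-width codes and table-size limits are omitted for clarity; this is a sketch, not the report's implementation):

    # LZW compression: the string table starts with all single-byte
    # strings; each parsed string plus its next character is added.
    def lzw_compress(data: bytes) -> list:
        table = {bytes([i]): i for i in range(256)}
        next_code = 256
        w = data[:1]
        out = []
        for k in data[1:]:
            wk = w + bytes([k])
            if wk in table:
                w = wk                     # keep extending the prefix string
            else:
                out.append(table[w])       # code(w) ==> output
                table[wk] = next_code      # wK ==> string table
                next_code += 1
                w = bytes([k])             # K ==> w
        if w:
            out.append(table[w])           # flush the final prefix string
        return out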
`
The basic algorithm for decompression is as follows:

    Decompression: Read first input code ==> CODE ==> OLDcode;
        with CODE = code(K): K ==> output.
    Next Code: Read next input code ==> CODE ==> INcode;
        If no new code: EXIT.
    Next Symbol: If CODE = code(wK): K ==> output;
            code(w) ==> CODE;
            repeat Next Symbol.
        Else if CODE = code(K): K ==> output;
            OLDcode,K ==> string table;
            INcode ==> OLDcode;
            repeat Next Code.
`
The decompressor logically uses the same string table as the compressor and similarly constructs it
as the message is translated. Each received code value is translated by way of the string table into
a prefix string and extension character. The extension character is pulled off and the prefix string is
decomposed into its prefix and extension. This operation is recursive until the prefix string is a single
character, which completes decompression of that code. An update to the string table is made for
each code received (except the first one). When a code has been translated, its final character is
used as the extension character, combined with the prior string, to add a new string to the string
table. This new string is assigned a unique code value, which is the same code that the compressor
assigned to that string. In this way, the decompressor incrementally reconstructs the same string
table that the compressor used.
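The corresponding decompressor, as an illustrative Python sketch (the "code not yet in table" branch handles the one-step lag between the compressor's and decompressor's tables):

    def lzw_decompress(codes: list) -> bytes:
        if not codes:
            return b''
        table = {i: bytes([i]) for i in range(256)}
        next_code = 256
        old = table[codes[0]]
        out = bytearray(old)
        for code in codes[1:]:
            # A code can arrive before its entry exists; it must then be
            # the prior string extended by its own first character.
            entry = table[code] if code in table else old + old[:1]
            out += entry
            table[next_code] = old + entry[:1]   # prior string + extension char
            next_code += 1
            old = entry
        return bytes(out)

For any input, lzw_decompress(lzw_compress(data)) == data, mirroring the fact that compressor and decompressor build identical string tables.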
`
`5 Modes of Transmission
`
The goal of our mathematical formulations and modeling is to perform one of two basic tasks. The
first is to decide whether to use compression or not, given a certain set of fixed parameters (for
compression, decompression and the physical medium); the other is to decide the threshold for a
specific varying parameter, below which we should perform compression and above which we should
not.
`
Two independent situations can arise in our formulation. In the first, we can consider the protocol
in which the communication takes place to be one of varying page length. In this case, the compression
is performed for one "chunk" of data at a time, which is sent immediately afterwards. In the other case,
the protocol may have a fixed page size, and thus the compression is performed for large files and the
compressed data is sent one page at a time. Comparing the two models for decision making
and optimizing parameters can thus be performed for each of these situations separately. It should
also be noticed that there might exist hypothetical bounds and average values for the run times and
compression ratios of the compression and decompression algorithms.
`
`5.1 Using a Varying-Length Page
`
The problem in this case is either to make a decision regarding compression or to optimize a
parameter. The four different representations of the transmission time can each be used to formulate
and express the total cost incurred in the compressed and uncompressed modes. In the decision
problem, we choose the scheme with the lower cost. In the optimization problem we find the range
for a certain parameter over which, for example, compression is the better technique, by solving the
corresponding inequality.
`
Assume that we use the LZW algorithm, that characters are 8 bits each, that the machine's cycle rate is
w cycles per second, that the data size to be compressed is M bits and that the compression ratio is R.
Since LZW makes a single pass over its input, its runtime is linear in the number of characters processed;
with c denoting the (machine-dependent) number of cycles consumed per input character, the algorithm
runtime can be formulated as:

    f_1(M) = \frac{c \, (M/8)}{w}
`
The following inequality can be formed for the model using the data rate as the physical parameter,
for the cost of the compressed mode to be less than the cost of the uncompressed mode:

    \frac{M/R}{D} + \frac{L}{P} + X \left( f_1(M) + f_2(M/R) \right) < \frac{M}{D} + \frac{L}{P}

For the model using the baud rate:

    \frac{M/R}{b \log_2 l} + \frac{L}{P} + X \left( f_1(M) + f_2(M/R) \right) < \frac{M}{b \log_2 l} + \frac{L}{P}
`
`
`
For the model using the bandwidth:

    \frac{M/R}{2 B \log_2 l} + \frac{L}{P} + X \left( f_1(M) + f_2(M/R) \right) < \frac{M}{2 B \log_2 l} + \frac{L}{P}
`
For the model using the signal-to-noise ratio:

    \frac{M/R}{B \log_2 \left( 1 + 10^{S/10} \right)} + \frac{L}{P} + X \left( f_1(M) + f_2(M/R) \right) < \frac{M}{B \log_2 \left( 1 + 10^{S/10} \right)} + \frac{L}{P}
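Cancelling the common propagation term L/P, the data-rate inequality solves to a closed-form threshold; the same cancellation applies to the other three models with D replaced by the corresponding maximum data rate. The sketch below (illustrative Python, our function names) reproduces the simulation output of one of the sample runs in Section 6, where the observed compression and decompression times stand in for f_1 and f_2:

    # From M/(R*D) + X*(t_c + t_d) < M/D:
    #     compress iff D < M * (1 - 1/R) / (X * (t_c + t_d))
    def data_rate_threshold(M, R, X, t_c, t_d):
        if R <= 1:
            return 0.0   # compression that does not shrink the data never pays
        return M * (1 - 1 / R) / (X * (t_c + t_d))

    # Values from the third sample run in Section 6:
    # data_rate_threshold(1966640, 48.468, 5, 0.4, 0.2) -> ~642021.3 bps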
`
`5.2 Using a Fixed-Length Page
In this model we assume that the protocol has a fixed-length page and that the compression and
decompression are done for a large chunk of data of size M. In this situation another parameter should
be taken into consideration, namely the page size m, and the expression for the transmission time
should now include the number of compressed pages that are sent over the communication medium.
If each of the \lceil M/(Rm) \rceil compressed pages (respectively \lceil M/m \rceil uncompressed
pages) incurs its own propagation delay, the above inequalities can now be expressed as:

    \left\lceil \frac{M}{Rm} \right\rceil \left( \frac{m}{D} + \frac{L}{P} \right) + X \left( f_1(M) + f_2(M/R) \right) < \left\lceil \frac{M}{m} \right\rceil \left( \frac{m}{D} + \frac{L}{P} \right)

For the model using the baud rate:

    \left\lceil \frac{M}{Rm} \right\rceil \left( \frac{m}{b \log_2 l} + \frac{L}{P} \right) + X \left( f_1(M) + f_2(M/R) \right) < \left\lceil \frac{M}{m} \right\rceil \left( \frac{m}{b \log_2 l} + \frac{L}{P} \right)

For the model using the bandwidth:

    \left\lceil \frac{M}{Rm} \right\rceil \left( \frac{m}{2 B \log_2 l} + \frac{L}{P} \right) + X \left( f_1(M) + f_2(M/R) \right) < \left\lceil \frac{M}{m} \right\rceil \left( \frac{m}{2 B \log_2 l} + \frac{L}{P} \right)

For the model using the signal-to-noise ratio:

    \left\lceil \frac{M}{Rm} \right\rceil \left( \frac{m}{B \log_2 \left( 1 + 10^{S/10} \right)} + \frac{L}{P} \right) + X \left( f_1(M) + f_2(M/R) \right) < \left\lceil \frac{M}{m} \right\rceil \left( \frac{m}{B \log_2 \left( 1 + 10^{S/10} \right)} + \frac{L}{P} \right)
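Under the same assumption stated above (each page incurs its own propagation delay), the fixed-page cost can be sketched as follows (illustrative Python; the page-count model is our reading of the text, not a formula taken from it):

    from math import ceil

    # Cost of sending M bits in fixed pages of m bits at data rate D.
    # With R > 1 the data is first compressed to M/R bits; t_c and t_d
    # are the CPU times, weighted by the network/CPU cost ratio X.
    def paged_cost(M, m, D, L, P, R=1.0, X=0.0, t_c=0.0, t_d=0.0):
        pages = ceil((M / R) / m)
        return pages * (m / D + L / P) + X * (t_c + t_d)

    # Compress iff paged_cost(M, m, D, L, P, R, X, t_c, t_d)
    #          < paged_cost(M, m, D, L, P)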
`
`6 The Experiment
`
In our experiment, towards the goal of establishing a reasonable tool for the design engineer, we
offer the choice of either making a decision to use a compression/decompression scheme in a
given situation, i.e., a fixed set of physical layer parameters and a certain size of a chunk of data, or
choosing to optimize (obtain the threshold of) a certain parameter, such that we can use compression
for all values of the parameter that are less than this threshold, as it gives a lower total cost than the
technique that does not use compression. The user is given the choice of any one of the
four different ways of modeling the cost function, to maximize the number of parameters that one
can deal with. Thresholds are found by solving the above inequalities for the parameters that are to
be optimized for minimizing the total communication cost.
`
`
`
`
The results of running the experiment are displayed both theoretically and realistically. In the
theoretic solution, the input file to be transmitted is assumed to be plain text, thus assuming
no prior information about the kind of data that is transferred in the network; then a more
realistic solution (either a decision or a value) is given by calculating the actual compression and
decompression runtimes, by running the LZW algorithm on the data and observing the times and the
compression ratio.
`
In the first two examples the algorithm is run on image data. The first image contains a lot of detail;
the second is mainly a few dots on a planar surface. It is not surprising, then, that
the compression ratio in the second example turned out to be equal to 48, especially when we
remember the adaptive characteristics of the LZW algorithm. The compression ratio in the first one
was equal to 5. This fact contributed to the difference in the thresholds and decisions between the
"theoretic" and the "realistic" approaches to finding the required limits and decisions. The following
are snapshots of three different runs of the system; the first two are for the complex image, the
third is for the simple one:
>> project.e
Enter various input values prompted for.

Decision or Optimization? [d/o]: o

Desired Model?
[1=using data rate, 2=baud rate, 3=bandwidth, 4=using sig-noise ratio]: 3

Data Size? 8389192
Cycle Rate? 14286000
Network/CPU time cost ratio? 5
Theoretical Compression Ratio? 1.8
Observed Compression Ratio? 5.156
Observed Compression Time? 1.3
Observed Decompression Time? 1.2
Levels of Discretization? 2

Theoretical Results: If band width < 1588559.073359 Hz then compress.
Simulation Results:  If band width < 270484.731978 Hz then compress.

>>
>> project.e
Enter various input values prompted for.

Decision or Optimization? [d/o]: d

Desired Model?
[1=using data rate, 2=baud rate, 3=bandwidth, 4=using sig-noise ratio]: 4

Data Size? 8389192
Cycle Rate? 14286000
Network/CPU time cost ratio? 5
Theoretical Compression Ratio? 5
Observed Compression Ratio? 5.156
Observed Compression Time? 1.3
Observed Decompression Time? 1.2
Signal to Noise Ratio in decibels? 5
Medium Bandwidth? 1000000

Theoretical Results: Compression would cost less.
Simulation Results: Compression would cost more.
`
`>>
>> project.e
Enter various input values prompted for.

Decision or Optimization? [d/o]: o

Desired Model?
[1=using data rate, 2=baud rate, 3=bandwidth, 4=using sig-noise ratio]: 1

Data Size? 1966640
Cycle Rate? 14286000
Network/CPU time cost ratio? 5
Theoretical Compression Ratio? 1.8
Observed Compression Ratio? 48.468
Observed Compression Time? 0.4
Observed Decompression Time? 0.2

Theoretical Results: If data rate < 3177118.146718 bps then compress.
Simulation Results: If data rate < 642021.316608 bps then compress.
`
We then proceed in the experiment to try a different kind of data; we try executable files and observe
the results of running our toolbox. The following is a sample run of the system on a file of executable
commands.
>> project.e
Enter various input values prompted for.

Decision or Optimization? [d/o]: d

Desired Model?
[1=using data rate, 2=baud rate, 3=bandwidth, 4=using sig-noise ratio]: 4

Data Size? 745472
Cycle Rate? 14286000
Network/CPU time cost ratio? 5
Theoretical Compression Ratio? 1.8
Observed Compression Ratio? 1.526
Observed Compression Time? 0.5
Observed Decompression Time? 0.3
Signal to Noise Ratio in decibels? 30
Medium Bandwidth? 3000

Theoretical Results: Compression would cost more.
Simulation Results: Compression would cost more.
`
It is noticed that for a "nice" collection of information, one that includes a lot of repetitiveness, the
compression ratio is at its maximum, while it decreases for other types of data. The fact that there is
sometimes a discrepancy between the realistic and the theoretic values is because the theoretic approach
assumes a "perfect" medium when it calculates the runtime for compression; this is not the case when
performing the actual compression in software on a down-to-earth Vax workstation. The wide
variations in the compression ratios should also be taken into consideration.
`
`7 Conclusions
`
We discussed the issue of efficient and adaptive transmission mechanisms over possible physical links
between hosts. A tool was developed for making decisions regarding the flow of control sequences and
data from and to a host. The decision of compressing data and commands was discussed in detail; a
yes/no decision box and an optimizing tool for finding the appropriate thresholds for a decision were
implemented. Physical parameters are taken into consideration while developing our decision system.
Also, the compression parameter relationships and different compression techniques suited for the
task are developed, with an emphasis on adaptive ones that accommodate various data and control
patterns. Theoretical analysis is performed to develop mathematical models for the optimization
algorithm. Simulation models are also developed for testing both the optimization and the decision
tool box.