`SECOND EDITION
`
`1
`
`DISH 1015
`
`
`
` DATA COMPRESSION
`
`Techniques and Applications
`Hardware and Software Considerations
`
`Second Edition
`
`GILBERT HELD
`4-Degree Consul/(fig,
`Macon, Georgia, USA
`
`and
`
`THOMAS R. MARSHALL
`(software author)
`
`FENWIC‘K LIBRARY
`JGEORGE MA
`-
`FAIRFAX, Wasom umvsnsm
`
`John Wiley & Sons
`Chichester - New York - Brisbane - Toronto - Singapore
`
`
`
`2
`
`
`
`
`
`Copyright © 1983, 1987 by John Wiley & Sons Ltd.
`
`All rights reserved.
`
`No part of this book may be reproduced by any means, or
`transmitted, or translated into a machine language without the
`written permission of the publisher.
`
`Library of Congress Cataloging-in-Publication Data:
`Held, Gilbert, 1943—
`Data compression.
`Bibliography: p.
`Includes index.
`
`1. Data compression (Computer science) 1. Marshall,
`Thomas (Thomas R.)
`II. Title.
`QA76.9.D33H44 1987
`005.74’6
`ISBN 0 471 91280 8
`
`86-18942
`
`British Library Cataloguing in Publication Data:
`Held, Gilbert
`Data compression: techniques and applications hardware
`and software considerations. —
`2nd ed.
`
`1. Data compression (Computer science)
`I. Title
`II. Marshall, Thomas
`005.74’6
`QA76.9.D33
`
`ISBN 0 471 91280 8
`
`Typeset by Photo-Graphics, Honiton, Devon
`Printed in Great Britain
`
`3
`
`
`
` CHAPTER ONE
`
`Rationale and Utilization
`
`In the chronology of computer development, large—scale information transfer
`by remote computing and the development of massive information storage
`and retrieval systems have witnessed a tremendous growth. Concurrent with
`this growth, several problem areas have developed which can result in major,
`but unnecessary, economic expenditures.
`One problem is the so-called ‘run-away database”. Here the size of the
`database used by an organization for its information storage and retrieval
`programs becomes larger and larger, requiring additional disc drives for
`online systems and reels of magnetic tapes for those systems that can be
`processed in a batch environment.
`Accompanying the growth in the size of databases has been a large increase
`in the number of users and duration of usage by personnel at remote
`locations. These factors result in tremendous amounts of data being trans-
`ferred between computers and remote terminals. To provide transmission
`facilities for the required data transfers, communications lines and auxiliary
`devices, such as modems and multiplexers, have been continuously upgraded
`by many organizations to permit higher data—transfer capability.
`Although the obvious solutions to these problems of data storage and
`information transfer are to install additional storage devices and expand
`existing communications facilities, to do so requires an additional increase
`in an organization’s equipment and operating costs. One method that can
`be employed to alleviate a portion of data, storage and information transfer
`problems is through the representation of data by more efficient codes. If
`one examines an organization’s database or monitors a transmission line,
`there is an excellent chance that the individual characters that both make
`
`up the database and the transmission sequence could be encoded more
`effectively. Two techniques that can result in a more effective encoded data
`representation are logical and physical data compression.
`
`1.1 LOGICAL COMPRESSION
`
`When a database is designed, one of the first steps of the analyst is to obtain
`as much data reduction as possible. This data reduction results from the
`
`1
`
`
`
`4
`
`
`
`
`
` 2 e
`
`limination of redundant fields of information while representing the data
`elements in the remaining fields with as few logical indicators as is feasible.
`Although logical c0mpression is data dependent and the method employed
`can vary based upon the analyst’s foresight,
`the following two examples
`will illustrate the ease of implementation and benefits of this compression
`technique.
`One simple example of logical data compression is the occupational field
`on a personnel database. Suppose 30 alphanumeric positions are allocated
`to this field.
`If the field is fixed, occupations such as the 10-character
`occupational description ‘DISHWASHER’ have 20 blanks inserted into the
`remainder of the field. Then, 30 million characters of storage would be
`required for the occupational field of 1 million workers. Suppose at most
`there were 32 768 distinct occupations. Instead of indicating the occupational
`title, one could encode the equivalent 5-digit data code, eliminating 25
`character positions per field. The size of the field could be reduced further
`by allocating the binary value of 1 or more characters to the occupational
`code. As an example, an 8-bit character could represent 28 — 1 or 255
`distinct values or occupational codes. Linking two 8-bit characters through
`appropriate software would provide 216—1 or 65 535 distinct codes. This
`would reduce the field size from 30 to 2 characters, saving 28 million
`characters of storage. If our counting begins at zero instead of a conventional
`starting place of 1, an 8-bit character could represent 256 codes while a 16—
`bit character could be employed to represent 65 536 distinct values.
`A second example of logical compression is a date field. This type of
`field frequently occurs in databases. Normally,
`the numeric equivalents
`of the subfields representing day, month and year are used in place of
`longhand notation. Thus, 01 04 81 would represent 1 April 1981. While
`this
`logical compression results
`in 6 numeric characters of storage,
`additional data reduction can result from storing the date as a binary value.
`Since the day will never exceed 31, 5 bits would suffice to represent the date
`field. Similarly, 4 bits could be used to represent the month value while 7
`bits could represent 127 years, permitting a relative year ranging from 1900
`to 2027.
`
`Logical compression using numerical and binary representation is illus—
`trated in Figure 1.1 for the preceding date field example. It is interesting to
`note that employing binary representation reduces the date field to 16 binary
`digits or two 8-bit concatenated characters of storage. As discussed, many
`logical compression methods can be considered by an analyst during the
`database design process. Each method may result in a distinct degree of data
`storage reduction. Correspondingly, when logically compressed databases or
`portions of such databases are transmitted between locations, transmission
`time is reduced since fewer data characters are transmitted.
`
`While logical compression can be an effective tool in minimizing the size
`of a database, it only reduces transmission time when logically compressed
`data is transmitted. Thus,
`the transmission of inquiry and response data
`
`5
`
`
`
`
`
`
`Longhand
`Example
`
`DAY
`1
`
`MONTH
`APRIL
`
`YEAR
`1981
`
`Logical compression using numerical representation
`
`Example
`
`01
`
`04
`
`81
`
`Logical compression using binary representation
`
`Example
`
`00001
`
`0100
`
`1010001
`
`Figure 1.1 Logical compression methods. Logical com-
`pression can result from alphanumeric, numeric or
`binary representation of data in a shorthand notation
`
`which are typically encoded as separate and distinct entities in the appropriate
`bit representation of the code for each character is not normally affected.
`Similarly, the occurrence of repeating patterns and groups of characters
`which are normally contained in reports transmitted from computer systems
`to terminal devices would not be affected. For such situations, a reduction
`in data transmission time depends upon the physical compression of the data
`as it is encountered.
`
`1.2 PHYSICAL COMPRESSION
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`Physical compression can be viewed as the process of reducing the quantity
`of data prior to it entering a transmission medium and the expansion of such
`data into its original format upon receipt at a distant location. Although
`both physical and logical compression can result in reduced transmission time,
`distinct application differences exist between the two techniques. Logical
`compression is normally used to represent databases more efficiently and
`does not consider the frequency of occurrence of characters or groups of
`characters. Physical compression takes advantage of the fact that when data
`is encoded as separate and distinct entities, the probabilities of occurrence
`of characters and groups of characters differ. Since frequently occurring
`characters are encoded into as many bits as those characters that only rarely
`occur, data reduction becomes possible by encoding frequently occurring
`characters into short bit codes while representing infrequently occurring
`characters by longer bit codes. Like logical compression, many physical
`compression techniques exist. Some techniques replace repeating strings of
`characters by a special compression indicator character and a quantity count
`character. Other techniques replace frequently occurring characters with a
`short binary code while infrequently encountered characters are replaced by
`longer binary codes. In Chapter 2, 10 distinct physical compression methods
`are covered in detail. For the remainder of this book we will focus our
`
`attention upon physical data compression.
`
`
`
`6
`
`
`
`1.3 COMPRESSION BENEFITS
`:-
`When data compression is used to reduce storage requirements, overall
`program execution time may be reduced. This is because the reduction in
`storage will result in a reduction of disc-access attempts, while the encoding
`and decoding required by the compression technique employed will result
`in additional program instructions being executed. Since the execution time
`of a group of program instructions is normally significantly less than the time
`required to access and transfer data to a peripheral device, overall program
`execution time may be reduced.
`With respect to the transmission of data, compression provides the network
`planner with several benefits in addition to the potential cost savings associ—
`ated with sending less data over the switched telephone network where the
`cost of the call is usually based upon its duration. First, compression can
`reduce the probability of transmission errors occurring since fewer characters
`are transmitted when data is compressed while the probability of an error
`occurring remains constant. Second, since compression increases efficiency,
`it may reduce or even eliminate extra workshifts. Finally, by converting text
`that is represented by a conventional code such as standard ASCII into a
`different code, compression algorithms may provide a level of security against
`illicit monitoring.
`For data communications, the transfer of compressed data over a medium
`results in an increase in the effective rate of information transfer, even
`though the actual data transfer rate expressed in bits per second remains the
`same. Data compression can be implemented on most existing hardware by
`software or through the use of a special hardware device that incorporates
`one or more compression techniques.
`In Figure 1.2, a basic data-compression block diagram is illustrated. Shown
`as a black box, compression and decompression may occur within the user’s
`processor to include personal computers, intelligent terminals or in a device
`foreign to the processor, such as a specialized communications component.
`Foremost among these components are data concentrators and statistical
`multiplexers.
`To examine in some detail a portion of the benefits that may result from
`the employment of one or more compression techniques requires a review
`of some fundamental compression terminology.
`
`1.4 TERMINOLOGY
`
`As illustrated in Figure 1.2, an original data stream is operated upon accord-
`ing to a particular algorithm to producea compressed data stream. This
`compression of the original data stream is sometimes referred to as an
`encoding process with the result that the compressed data stream is also
`called an encoded data stream. Reversing the process, the compressed data
`stream is decompressed to reproduce the original data stream. Since this
`
`7
`
`
`
`
`
`Doto
`
`block
`data-compression
`1.2 Basic
`Figure
`diagram. An original data stream operated upon
`according to one or more compression algorithms
`results in the generation of a compressed data
`stream
`
`
`
`Compressed
`Original
`compression
`
`data
`data
`Data
`
`
`
`decompression
`
`
`
`
`decompression process results in the decoding of the compressed data stream,
`the result is sometimes referred to as the decoded data stream. We will use
`
`the term original data stream and decoded data stream synonymously, as
`well as the terms compressed data stream and encoded data stream.
`The degree of data reduction obtained as a result of the compression
`process is known as the compression ratio. This ratio measures the quantity
`of compressed data in comparison to the quantity of original data, such that
`(Ruth and Krentzler, 1972):
`
`Length of original data string
`Com ression ratio =——.
`Length of compressed data string
`p
`
`From the above equation, it is obvious that the higher the compression
`ratio the more effective the compression technique employed. Another term
`used when talking about compression is the figure of merit, where:
`
`Length of compressed data string
`Figure Of merlt _ Length of original data string
`
`The figure of merit is the reciprocal of the compression ratio and must
`always be less than unity for the compression process to be effective. The
`fraction of data reduction is one minus the figure of merit. Thus, a com—
`pression technique that results in one character of compressed data for every
`three characters in the original data stream would have a compression ratio
`of 3, a figure of merit of 0.33 and the fraction of data reduction would be
`0.66.
`
`1.5 COMMUNICATIONS APPLICATIONS
`
`To obtain an overview of some of the communications benefits available
`
`through the incorporation of data compression, we can consider a typical
`data communications application. As illustrated in the top portion of Figure
`1.3, a remote batch terminal is connected to a central computer with trans-
`
`8
`
`
`
`
`
`N0 compress/an
`
`
`
`llHHilHiiHll'
`
`
`
`
`96 kbps transmission
`rate
`
`
`9.6 kbps information
`transfer rate
`
`Remote batch
`terminal
`
`Compress/on rat/'0 2
`
`
`
`
`
`9 6 kbps transmission
`rate
`Computer
`
`
`
`
`
` Remote batch
`terminal
`19.2 kbps information
`transfer rate
`
`Compressxbn rat/0 2
`
`
`4.8 kbps transmission
`rate
`
`
`
`Computer
`Remote batch
`
`terminal
`9.6 kbps information
`
`
`
`transfer rate
`
`
`
`Figure 1.3 Data compression affects the information transfer ratio (ITR), through the
`use of data compression, the methodology and structure of one’s data communications
`facility may be changed
`
`mission occurring at a 9.6 kbps data rate. Let us assume that the data to be
`transmitted has not been compressed. If through the programming of one
`or more compression algorithms or the installation of a hardware compression
`device a compression ratio of 2 is obtained, several alternatives may be
`available with respect to one’s data communications methodology. First, our
`data transmission time is reduced since the effective information transfer
`rate has increased to approximately 19.2 kbps as shown in the middle portion
`of Figure 1.3. Ignoring communications software overhead, the data trans-
`mission time is halved. Thus, one may now consider using the remote batch
`terminal for other remote processing applications or perhaps an expensive
`after-hours shift or portion of such a shift can be alleviated. In the lower
`portion of Figure 1.3 another user option is illustrated. Here the transmission
`rate may be reduced to 4800 bps. With a compression ratio of 2,
`this is
`equivalent to an information transfer rate of 9600 bps. By lowering the data
`transmission rate, more expensive 9600 bps modems may be replaced by
`4800 bps modems and line conditioning which is normally required when
`transmitting data at 9600 bps may be removed, resulting in an additional
`cost reduction.
`
`
`
`9
`
`
`
`No compression
`
`Compression
`
`
`
`Computer
`
`
`
`
`
` Compression performing modern
`
`Figure 1.4 Data compression on a multidrOp line reduces the flow of data on the
`line, permitting additional terminals to be serviced
`
`A second type of communications application that can benefit from the
`utilization of data compression is illustrated in Figure 1.4. A typical multidrop
`network is illustrated in the top portion of Figure 1.4, connecting terminals
`at diverse geographical locations via a common leased line to a computer
`site. Typically, the transmission activity of the terminals is the governing
`factor that limits the multidrop line to a maximum number of drops. In the
`bottom portion of Figure 1.4,
`it is assumed that compression performing
`modems were substituted for the conventional modems used in the original
`multidrop configuration. Since data compression on a multidrop line reduces
`the flow of data on the line, its utilization will normally enable additional
`drops to be added to the line prior to the occurrence of throughput delays
`that affect the response time of the terminals attached to each drop. In this
`particular example,
`it
`is assumed that the use of compression performing
`modems permitted an increase in the number of line drops from 4 to 6.
`
`1.6 DATA COMPRESSION AND INFORMATION TRANSFER
`
`When data is transmitted between terminals, a terminal and a computer or
`two computers, several delay factors may be encountered which cumulatively
`affect the information transfer rate. Data transmitted over a transmission
`
`medium must be converted into an acceptable format for that medium. When
`digital data is transmitted over analogue telephone lines, modems must be
`
`10
`
`10
`
`
`
`_£___
`
`8 e
`
`mployed to convert the digital pulses of the business machine into a modu-
`lated signal acceptable for transmission on the analogue telephone circuit.
`The time between the first bit entering the modem and the first modulated
`signal produced by the device is known as the modem’s internal delay time.
`Since two such devices are required for a point—to—point circuit, the total
`internaldelay time'encountered during a transmission sequence equals-twice
`the modem’s internal delay time. Such times can range from a few to 10 or
`more milliseconds (ms). The second delay encountered on a circuit
`is a
`function of the distance between points and is known as the circuit or
`propagation delay time. This is the time required for the signal
`to be
`propagated or transferred down the line to the distant end. Propagation
`delay time can be approximated by equating 1 millisecond for every 150
`circuit miles and adding 12 milliseconds to the total.
`Once data is received at the distant end it must be acted upon, resulting
`in a processing delay which is a function of the computer or terminal
`employed as well as the quantity of transmitted data which must be acted
`upon. Processing delay time can range from a few milliseconds where a
`simple error check is performed to determine if the transmitted data was
`received correctly to many seconds where a search of a database must occur
`in response to a transmitted query. Each time the direction of transmission
`changes in a typical half duplex protocol, control signals at the associated
`modem to computer and modem to terminal interface change. The time
`required to switch control signals to change the direction of transmission is
`known as line turnaround time and can result in delays up to 250 or more
`milliseconds, depending upon the transmission protocol employed. We can
`denote the effect of data compression by examining the transmission protocol
`commonly known as BISYNC communications and a few of its derivations.
`
`BISNYC communications
`
`One of the most commonly employed transmission protocols is the Binary
`Synchronous Communications (BISNYC) communications control structure.
`This line control structure was introduced in 1966 by International Business
`Machine Corporation and is used for'transmission by many medium-speed
`and high-speed devices to include terminal and computer systems. BISNYC
`provides a set of rules which govern the synchronous transmission of binary-
`coded data. While this protocol can be used with a variety of transmission
`codes, it is limited to the half dupiex transmission mode and requires the
`acknowledgement of the receipt of every block of transmitted data. In an
`evolutionary process, a number of synchronous protocols have been
`developed to supplement or Serve as a replacement to BISNYC, the most
`prominent being the high level data link control (HDLC) protocol defined
`by the International Standard Organization (ISO).
`The key difference between BISYNC and HDLC protocols is that BISYNC
`is a half duplex, character-oriented transmission control structure while
`
`11
`
`11
`
`
`
`
`
`HDLC is a bit-oriented, full duplex transmission control structure. We can
`investigate the efficiency of these basic transmission control structures and
`the effect of data compression upon their information transfer efficiency. To
`do so, an examination of some typical error control procedures is first
`required.
`
`Error control
`
`'
`
`I
`
`9
`
`The most commonly employed error-control procedure is known as auto-
`matic request for repeat (ARQ). In this type of control procedure, upon
`detection of an error a request is made by the receiving station to the sending
`station to retransmit the message. Two types of ARQ procedures have been
`developed: ‘stop and wait ARQ’ and ‘go back n ARQ’, which is sometimes
`called continuous ARQ.
`
`‘Stop and wait ARQ’ is a simple type of error-control procedure. Here
`the transmitting station stops at the end of each block and waits for a reply
`from the receiving terminal pertaining to the block’s accuracy (ACK) or
`error (NAK) prior to transmitting the next block. This type of error-control
`procedure is illustrated in Figure 1.5. Here the time between transmitted
`blocks is referred to as dead time which acts to reduce the effective data
`
`rate on the circuit. When the transmission mode is half duplex, the circuit
`must be turned around twice for each block transmitted, once to receive the
`reply (ACK or NAK) and once again to resume transmitting. These line
`turnarounds, as well as such factors as the propagation delay time, station
`message processing time and the modem internal delay time, all contribute
`to what is shown as the cumulative delay factors.
`When the ‘go back n ARQ’ type of error control procedure is employed,
`the dead time can be substantially reduced to the point where it may be
`
`I
`
`I
`
`: Dead time
`:
`__......—.—¢-—|-
`
`——|.-
`
`Transmit
`
`Block [+2
`
`Block [+1
`
`_ A
`
`CK
`1' + 1
`
`1 I I
`
`I
`
`IIIIIr
`
`II
`
`.—
`
`.1—
`
`ACK
`1'
`
`Curmlmiue
`deloy factors
`
`Receive
`
`Figure 1.5 Stop and wait ARQ. In this type of error control procedure, the receiver
`transmits an acknowledgement after each block. This can result in a significant
`amount of cumulative delay time between data blocks
`
`
`
`12
`
`12
`
`
`
`10
`
`Block
`[+3
`
`Block
`[+1
`
`Block
`[+2
`
`Block
`[+1
`
`Block
`[
`
`Primary channel —-
`
`AC_K
`1
`
`NAK
`(+1
`
`ACK
`[+2
`
`ACK
`[+1
`
`ACK
`[+3
`
`— Reverse channel
`
`Figure 1.6 Go back N ARQ. In a ‘go hack it ARQ’ error-
`control procedure,
`the transmitter continuously sends
`messages until the receiver detects an error. The receiver
`then transmits a negative acknowledgement on the reverse
`channel and the transmitter retransmits the block received
`in error. Some versions of this technique require blocks
`sent before the error indication was encountered to be
`retransmitted in addition to the block received in error
`
`insignificant. One way to implement this type of error control procedure is
`by the utilization of a simultaneous reverse channel for acknowledgement
`signalling as illustrated in Figure 1.6. In this type of operating mode, the
`receiving station sends back the ACK or NAK reSponse on the reverse
`channel for each transmitted block. If the primary channel operates at a
`much higher data rate than the reverse channel, many blocks may have been
`received prior to the transmitting station receiving the NAK in response to
`a block at the receiving station being found in error. The number of blocks
`one may go back to request a transmission, n, is a function of the block size
`and buffer area available in the business machines and terminals at
`the
`transmitting and receiving stations, the ratio of the data transfer rates of the
`primary and reverse channels and the processing time required to compute
`the block check character and transmit an acknowledgement. For the latter,
`this time is shown as small gaps between the ACK and NAK blocks in Figure
`1.6.
`
`Half duplex throughout mode]
`
`When a message block is transmitted in the BISYNC control structure, a
`number of control characters are contained in that block in addition to the
`message text. If the variable C is assigned to represent the number of control
`characters per block and the variable D is used to represent the number of
`data characters, then the total block length is C + D. If the data transfer
`rate expressed in bps is denoted as TR and the number of bits per character
`is denoted as BC, then the transmission time for one character is equal to
`BCITR which can be denoted as Tc. Since D + C characters are contained
`in a message block,
`the time required to transmit the block will become
`TC*(D + C). Once the block is received, it must be acknowledged. To do
`
`13
`
`13
`
`
`
`11
`
`so, the receiving station is required to first compute a block check character
`(BCC) and compare it with the transmitted BCC character appended to the
`end of the transmitted block. Although the BCC character is computed as
`the data is received, a comparison is performed after the entire block is
`received and only then can an acknowledgement be transmitted. The time
`to check the transmitted and computed BCC characters and form and trans-
`mit the acknowledgement is known as the processing and acknowledgement
`time (TPA)'
`When transmission is half duplex, the line turnaround time (TL) required
`to reverse the transmission direction of the line must be added. Normally,
`this time includes the request-to-send/clear-to-send (RTS/CTS) modem delay
`time as well as each of the modems’ internal delay time. For the acknowl-
`edgement to reach its destination, it must propagate down the circuit and
`this propagation delay time, denoted as Tp, must also be considered. If the
`acknowledgement message contains A characters then, when transmitted on
`the primary channel, A*BC/TR seconds are required to send the acknowl-
`edgement.
`Once the original transmitting station receives the acknowledgement it
`must determine if it is required to retransmit the previously sent message
`block. This time is similar to the processing and acknowledgement time
`previously discussed. To transmit either a new message block or repeat the
`previously sent mesasge block, the line must be turned around again and
`the message block will require time to propagate down the line to the
`receiving station. Thus,
`the total
`time to transmit a message block and
`receive an acknowledgement, denoted as TB, becomes:
`
`Since efficiency is the data-transfer rate divided by the theoretical data-
`transfer rate, the transmission control structure efficiency (ETCS) becomes:
`
`_ BC*D*(1—P)
`Eras —
`T33 TB
`
`(1.2)
`
`Here P is the probability that one or more bits in the block are in error,
`causing a retransmission to occur.
`Although the preceding is a measurement of the transmission control
`structure efficiency,
`it does not consider the data code efficiency which is
`the ratio of information bits to total bits per character. When the data code
`efficiency is included, we obtain a measurement of the information transfer
`efficiency. We can call this ratio the information transfer ratio (ITR) which
`Will provide us with a measurement of the protocol’s information transfer
`efficiency. This results in:
`
`s:
`.
`ITR =M
`Be
`
`14
`
`(1.3)
`
`
`
`
`
`14
`
`
`
` 12
`
`where:
`
`ITR = Information transfer ratio
`
`BIC = Information bits per character
`BC
`= Total bits per character
`D
`= Data characters per message block
`A
`= Characters in the acknowledgement message
`C
`= Control characters per message block
`TR
`2 Data transfer rate (bps)
`TC
`= Transmission time per character (BC/TR)
`TPA = Processing and acknowledgment time
`TL
`= Line turnaround time
`Tp
`= Propagation delay time
`p
`= Probability of one or more errors in block.
`
`the information transfer ratio provides us with a
`From the preceding,
`measurement of the efficiency of the transmission control structure without
`considering the effect of compression. When compression is considered we
`obtain a new term which we will denote as the effective information transfer
`ratio (EITR).
`When data is compressed, the original data stream will be reduced in
`size prior to transmission, the actual reduction being dependent upon the
`compression algorithms employed as well as the composition of the data
`acted upon. In general, we can assume that the compression ratio considers
`the number of characters in the compressed data stream to include special
`control characters required to indicate one or more compression algorithms.
`This reasonable assumption simplifies the effect of considering data com-
`pression when examining a particular protocol. As an example, consider a
`160-character data block compressed into 78 data characters plus 2 com—
`pression indicator characters. Here the compression ratio would be 160/
`(78 + 2) or 2. The effect upon the previously developed equation to compute
`the information transfer ratio would be to change D in the numerator to the
`non-compressed string length of 160 characters while D in the denominator
`would be the actual 78 compressed data characters plus the two additional
`special characters required to indicate data compression, resulting in a total
`of 80 characters. If the total number of control characters framing the data
`block is relatively small,
`the effective information transfer ratio can be
`approximated by multiplying the information transfer ratio by the com—
`pression ratio.
`
`Computation examples
`
`We will assume that our data transmission rate is 4800 bps and we will
`transmit information using a BISYNC transmission control structure employ-
`ing a ‘stop and wait ARQ’ error control procedure. Furthermore, let us
`assume the following parameters:
`
`15
`
`15
`
`
`
`
`
`13
`
`= 4 characters per acknowledgement
`A
`Brc = 8 bits per character
`.
`BC
`= 8 bits per character
`D
`= 80 data characters per block
`C
`z 10 control characters per block
`TR
`3 4800 bps
`TC
`= 8/4800 = 0.00166 seconds (5) per character
`TPA = 20 milliseconds = 0.02 s
`TL
`= 100 milliseconds = 0.10 s
`Tp
`= 30 milliseconds = 0.03 s
`P
`= 0.01
`
`Then:
`
`ITR =
`
`8 80 (1 0'01)
`4800*[0.00166(80+ 10) +2*(0.02+ 0.03+0.1}+4*8f4800)]
`
`= 0.2861.
`
`Since the transfer rate of information in bits (TRIB) is equal to the product
`of the data transfer rate and the information transfer ratio, we obtain:
`
`TRIB = ITR*TR = 0.2861‘4800 = 1373 bps.
`
`For the preceding example, approximately 28 per cent of the data transfer
`rate is effectively used.
`Let us now examine the effect of doubling the text size to 160 characters
`while the remaining parameters except P continue as before. Since the block
`size has doubled, P approximately doubles, resulting in the ITR becoming:
`
`8*160*(1-0.02)
`ITR _
`‘ 4800*[0.00166(160+10)+2*(0.02+0.03+0.1)+(4*8/4800)]
`= 0.4339.
`
`With an ITR of 0.4339 the TRIB now becomes:
`
`TRIB = ITR*TR = 0.4339*4800 = 2083 bps.
`
`Here, doubling the block size raises the percentage of the data transfer
`rate effectively used to 43.39 per cent.
`
`Compression effect
`
`Suppose one or more data-compression algorithms are employed which result
`in a compression ratio of 2. What effect would this have upon the effective
`information transfer ratio?
`The effective information transfer ratio (EITR) can be obtained by mod—
`ifying equation (1.1) as follows:
`
`16
`
`16
`
`
`
`
`
`Brciiflffl—P)
`EITR =m 1.4
`TR*[TC*(DE+C)+2*(TM+TL+T.)+(A*BCITRJ]
`
`(
`
`)
`
`where
`
`D1 = original data block size in characters prior to compression
`D2 = compressed data block size in characters to include special com-
`pression indication characters.
`
`If on the average 160 data characters are transmitted in a compressed
`format of 80 characters we obtain:
`
`
`anew (1—0.01)
`EITR _
`‘ 48m*[0.00166(80+10)+2*(0.02+0.03+0.1}+(4*8!48{}0)]
`= 0.5788.
`
`As previously discussed, the effective information transfer ratio can be
`approximated as follows:
`
`EITR = ITR*CR.
`
`(1.5)
`
`Substituting, we obtain:
`
`EITR 2 0.2861'2 2 0.5722.
`
`Since the transfer rate of information in bits (TRIB) is the product of the
`effective information transfer ratio and the operating data rate, we obtain:
`
`TRIB = 0.5788*4800 = 2778 bps.
`
`In Table 1.1, the reader will find a comparison of the variations in the
`ITR, EITR and TRIB when non-compressed and compressed data are
`transmitted for two different block sizes.
`
`is apparent that two methods can be employed to
`it
`From Table 1.1,
`increase one’s transmission efficiency. First, one may alter the protocol or