`
`I’I‘—23, No. 3, MAY 1977
`
`337
`
`A Universal Algorithm for Sequential Data Compression
`
`JACOB ZIV, FELLOW, IEEE, AND ABRAHAM LEMPEL, MEMBER, IEEE
`
`Abstract—A universal algorithm for sequential data compres-
`sion is presented. Its performance is investigated with respect to
`a nonprobabilistie model of constrained sources. The compression
`ratio achieved by the proposed universal code uniformly ap-
`proaches the lower bounds on the compression ratios attainable by
`block-to-variable codes and variable-to-block codes designed to
`match a completely specified source.
`
`I. INTRODUCTION
`
`com-
`arising in digital
`N MANY situations
`munications and data processing, the encountered
`strings of data display various structural regularities or are
`otherwise subject to certain constraints, thereby allowing
`for storage and time—saving techniques of data compres-
`sion. Given a discrete data source, the problem of data
`compression is first to identify the limitations of the source,
`and second to devise a coding scheme which, subject to
`certain performance criteria, will best compress the given
`source.
`
`Once the relevant source parameters have been identi-
`fied, the problem reduces to one of minimum—redundancy
`coding. This phase of the problem has received extensive
`treatment in the literature [1]—[7].
`When no a priori knowledge of the source characteristics
`is available, and if statistical tests are either impossible or
`unreliable, the problem of data compression becomes
`considerably more complicated. In order to overcome these
`difficulties one must resort to universal coding schemes
`whereby the coding process is interlaced with a learning
`process for the varying source characteristics [8],
`Such
`coding schemes inevitably require a larger working mem-
`ory space and generally employ performance criteria that
`are appropriate for a wide variety of sources.
`In this paper, we describe a universal coding scheme
`which can be applied to any discrete source and whose
`performance is comparable to certain optimal fixed code
`book schemes designed for completely specified sources.
`For lack of adequate criteria, we do not attempt to rank the
`proposed scheme with respect to other possible universal
`coding schemes. Instead, for the broad class of sources
`defined in Section III, we derive upper bounds on the
`compression efficiency attainable with full a priori
`knowledge of the source by fixed code book schemes, and
`
`Manuscript received June 23, 1975; revised July 6, 1976. Paper pre-
`viously presented at the IEEE International Symposium on Information
`Theory, Ronneby, Sweden, June 21-24, 1976.
`J. Ziv was with the Department of Electrical Engineering, Technionv
`Israel Institute of Technology, Haifa, Israel. He is now with the Bell
`Telephone Laboratories, Murray Hill, NJ 07974.
`A. Lempel was with the Department of Electrical Engineering, Tech-
`nion—Israel Institute of Technology, Haifa, Israel. He is now with the
`Sperry Research Center, Sudbury, MA 01776.
`
`then show that the efficiency of our universal code with no
`a priori knowledge of the source approaches those
`bounds.
`
`The proposed compression algorithm is an adaptation
`of a simple copying procedure discussed recently [10] in
`a study on the complexity of finite sequences. Basically,
`we employ the concept of encoding future segments of the
`source—output via maximum—length copying from a buffer
`containing the recent past output. The transmitted
`codeword consists of the buffer address and the length of
`the copied segment. With a predetermined initial load of
`the buffer and the information contained in the codewords,
`the source data can readily be reconstructed at the de-
`coding end of the process.
`.
`The main drawback of the proposed algorithm is its
`susceptibility to error propagation in the event of a channel
`error.
`
`11. THE COMPRESSION ALGORITHM
`
`The proposed compression algorithm consists of a rule
`for parsing strings of symbols from a finite alphabet A into
`substrings, or words, whose lengths do not exceed a pre-
`scribed integer Ls, and a coding scheme which maps these
`substrings sequentially into uniquely, decipherable code-
`words of fixed length LC over the same alphabet A.
`The word—length bounds L, and LC allow for bounded-
`delay encoding and decoding, and they are related by
`
`Le = 1+ l10g (n - Ls)l + I108 Lsl:
`
`(1)
`
`where [x] is the least integer nothsmialler than x, the log-
`arithm base is the cardinality oz of the alphabet A, and n
`is the length of a buffer, employed at the encoding end of
`the process, which stores the latest n symbols emitted by
`the source. The exact relationship between n and L3 is
`discussed in Section III. Typically, it 2 L30/‘Ls, where 0
`< h < 1. For on—line decoding, a buffer of similar length has
`to be employed at the decoding end also.
`To describe the exact mechanics of the parsing and
`coding procedures, we need some preparation by way of
`notation and definitions.
`
`Consider a finite alphabet A of a symbols, say A =
`{0,1, - - - ,a — 1}. A string, or word, S of length Z(S) = k over
`A is an ordered k—tuple S = s1s2 - - - 3;, of symbols from A.
`To indicate a substring of S which starts at position i and
`ends at position j, we write S(i,j). When i S j, S(i,j) =
`s,~s,-+1 - - - S,-, but when i > j, we take S(i,j) = A, the null
`string of length zero.
`The concatenation of strings Q and R forms a new string
`S = QR; if£’(Q) = k and €(R) = m, then €(S) = I2 + m, Q
`= S(1,k), and R = S(k + 1, k + m). For eachj, 0 Sj .<_
`
`APPLE 1016
`
`1
`
`APPLE 1016
`
`
`
`338
`
`IEEE TRANSACTIONS ON INFORMATION THEORY, MAY 1977
`
`€(S), S(1,j) is called aprefix ofS; S(1,j) is a proper prefix
`ofSifj<€(S).
`C
`, Given a proper prefix S (1,j ) of a string S and a positive
`integer i such that i _<_ j, let L(i) denote the largest non-
`negative integer € § 3 (S) — j such that
`
`S(i,i+ Z - 1) = S(j+ 1,j+ Z),
`
`and let p be a position of S (1,j ) for which
`
`L(p) = max {L(i)}.
`1SiSj
`
`The substring S(j + 1,j + L(p)) of S is called the repro-
`ducible extension of S(1,j) into S, and the integer p is
`called the pointer of the reproduction. For example, if S
`= 00101011 andj = 3,thenL(1) = 1 since $0 + 1,j + 1) =
`S(1,1) but S0 + 1,j + 2) 75 S(1,2). Similarly,L(2) = 4 and
`L(3) = 0. Hence, S(3 +»1,8 + 4) = 0101 is the reproducible
`extension of S(1,3) = 001 into S with pointer p = 2.
`Now, to describe the encoding process, let S = 8182 - - -
`denote the string of symbols emitted by the source. The
`sequential encoding of S entails parsing S into successive
`source words, S = S1S2 - - - , and assigning a codeword C,
`for each S,. For bounded-delay encoding, the length Z, of
`each,S,_ is at most equal to a predetermined parameter L,,
`while each C, is of fixed length L, as given by (1).
`To initiate the encoding process, we assume that the
`output S of the source was preceded by a string Z of n —
`LS zeros, and we store the string B1 = ZS(1,L_,) in the
`buffer. If S (1,j) is the reproducible extension of Z into
`ZS(1,L,. — 1),then S1 = S(1,j + 1) and £’1=j+ 1. To de-
`termine the next source word, we shift out the first 3,
`symbols from the buffer and feed into it the next 61 sym-2
`bols of S to obtain the string B2 = B1(€1 + 1,n)S(L_, + 1,
`L, + [1). Now we look for the reproducible extension E of
`B2(1,n - Ls) into B2(1,n - 1), and set S2 = Es, where sis
`the symbol next to E in B 2. In general, if B, denotes the
`string of n source symbols stored in the buffer when we are
`ready to determine the ith source word S,, the successive
`encoding steps can be formally described as follows.
`1) Initially, set B1 = 0”‘L~S(1,L_,), i.e., the all-zero
`string of length n — Ls followed by the first LS symbols of
`S.
`A
`.
`
`2) Having determined B,, i Z 1, set
`
`S," = B,(n ‘LS + 1,n ‘L3 + 3,‘),
`
`where the prefix of length Z, - 1 of S, is the reproducible
`extension ofB,(1,n - Ls) into B,(1,n — 1).
`3) If p, is the reproduction pointer used to determine
`S,, then the codeword C, for S, is given by
`A
`
`Ci = Cz1Cg2Ci3,
`
`where C,1 is the radix—oz representation of p, — 1 with
`t’ (C,1) = [log (n — L,)], C,g is the radix—a representation
`of Z, - 1 with £’(C,2) = [log LS], and C,3 is the last symbol
`of S,, i.e., the symbol occupying position n - L_,. +' K, of B,.
`The total length of C, is given by
`
`€(C,) = [log (n — Ls)] + [log Ls] + 1
`
`4) To update the contents of the buffer, shift out the
`symbols occupying the first 3, positions of the buffer while
`feeding in the next Z, symbols from the source to obtain
`
`Bi+1 = Bi(€i + 1,n)S(hi + 1,hi+ 4,‘),
`
`where h, is the position of S occupied by the last symbol
`Of Bi.
`1
`This completes the description of the encoding process.
`It is easy to verify that the parsing rule defined by (2)
`guarantees a bounded, positive source word length in each
`iteration; in fact, 1 S 6, 5 L3 for each i thus allowing for
`a radix-a representation of Z, — 1 with [log Ls] symbols
`from A. Also, since 1 S p, S n — L, for each i, it is possible
`to represent p, - 1 with [log (n — L_,)] symbols from A.
`Decoding can be performed simply by reversing the
`encoding process. Here we employ a buffer of length n —
`L, to store the latest decoded source symbols. Initially, the
`buffer is loaded with n — L, zeros. IflD, = d1d2 -
`- - d,,_Ls
`denotes the contents of the buffer after C,_1 has been de-
`coded into S,_1, then
`X
`
`S.--1= D.-(n — L. — e.«.; + Ln — Ls):
`
`where €,_1 = €(S,_1), and where D,+1 can be obtained
`from D, and C, as follows.
`Determine p, - 1 and 4’, — 1 from the first [log (n -— L,.)'|
`and the next [log L,.'| symbols of C,. Then, apply 4’, — 1
`shifts while feeding the contents of stage p, into stage n -
`L3. The first of these shifts will change the buffer contents
`from D, to
`
`Dg1>= d2d3 . . . dn_L_‘_dp,. = d<11>d;>...dgg1,
`
`.5’
`
`Similarly, ifj 5 Z, — 1, the jth shift will transform D9”)
`= d‘,/'“d9‘” - - - d‘,{:Bs
`into DE“
`= d§"‘1’d§,/1"“ - --
`d<,,f:1;d§,f;1> = d§1’d§” - . . d‘,,/1,5. After
`these 2, — 1
`shifts are completed, shift Once more, while feeding the last
`symbol of C, into stage n — L, of the buffer. It is easy to
`verify that the resulting load of the buffer contains S, in
`its last 5, = Z (S,) positions.
`The following example will serve to illustrate the me-
`chanics of the algorithm. Consider the ternary (oz = 3)
`input string
`
`S = 001010210210212021021200 - - -
`
`,
`
`and an encoder with parameters L3 = 9 and n = 18. (These
`parameters were chosen to simplify the illustration; they
`do not reflect the design considerations to be discussed in
`Section III.) According to (1), the corresponding codeword
`length is given by
`
`Lc=1+log3(18—9)+log39=5.
`
`Initially, the buffer is loaded with n — L, = 9 zeros, fol-
`lowed by the first L3 = 9 digits of S, namely,
`
`B1=00O000000 001010210.
`\j-\/§/\.j-\/-§_/
`
`n-L_,=9
`
`Ls=9
`
`To determine the first source word S1, we have to find the
`longest prefix B1(1Q,9 + £1 - 1) of
`
`in accordance with (1).
`
`2
`
`B1(10,1'7)=00101021
`
`2
`
`
`
`ZIV AND LEMPEL: SEQUENTIAL DATA COMPRESSION
`
`339
`
`which matches a substring of B1 that starts in position pl
`5 9 and then set S1 = B1(10,9 + 61). It is easily seen that
`the longest match in this case is B1(10,11) = 00, and hence
`S1 = 001 and £1 = 3. The pointer p1 for this step can be any
`integer between one and nine; we choose to set p1 = 9. The
`two—digit radix—3 representation of p1 — 1 is C11 = 22, and
`that of £1 - 1 is C12 = 02. Since C53 is always equal to the
`last symbol of S,~, the codeword for S1 is given by C1 =
`22021.
`
`To obtain the buffer load B2 for the second step, we shift
`out the first £1 = 3 digits of B 1 and feed in the next 3 digits
`S (10,12) = 210 of the input string S. The details of steps
`2, 3, and 4 are tabulated below, where pointer positions are
`indicated by arrows and where the source words S, are
`indicated by the italic substring of the corresponding
`buffer load B,-
`
`l
`
`B2=00OO0O00101021O210,
`
`C2=211O2
`
`l
`
`B3=000010102102102I20,
`
`C3=2O212
`
`l
`
`B4=210210212021021200,
`
`C4=0222O.
`
`III. COMPRESSION or CONSTRAINED SOURCES
`
`In this section, we investigate the performance of the
`proposed compression algorithm with respect to a non-
`probabilistic model of constrained information sources.
`After defining the source model, we derive lower bounds
`on the compression ratios attainable by block—to-variable
`and variable—to—block codes under full knowledge of the
`source, and then show that the compression ratio achieved
`by our universal code approaches these bounds.
`
`A. Definition of the Source Model
`
`Let A = {0,1, - - - ,oz — 1} be the given a—symbol alphabet,
`and let A * denote the set of all finite strings over A. Given
`a string S E A* and a positive integer m S 3 (S), let Slm}
`denote the set of all substrings of length m contained in S,
`and let S(m) denote the cardinality of S {In}. That is,
`3(3)-In
`
`S{m}=
`
`S(i+ 1,i+m)
`
`and
`
`S(m) = |S{m}|-
`
`Given a subset 0 ofA*, let
`
`aim} = {S E oI£’(S) = m},
`
`and let a(m) denote the cardinality of o{m}.
`A subset a of A * is called a source if the following three
`properties hold:
`
`1) A C a (i.e., 0 contains all the unit length strings),
`2) S E 0’ implies SS E a,
`3) S E 0 implies S{m} C a{m}.
`
`3
`
`Typically, such a source a is defined by specifying a fi-
`nite set of strings over A which are forbidden to appear as;
`substrings of elements belonging to a, and therefore a(m)
`< am for all m exceeding some mo.
`With every source 0, we associate a sequence h(1),
`h(2), - - - of parameters, called the h-parameters of a,
`' wherel
`
`h(m) = i log a(m),
`m
`
`m = 1,2, - - -.
`
`(2)
`
`It is clear that 0 S h(m) S 1 for all m and, by 2) it is also
`clear that mh(m) is a nondecreasing function of m. The _
`sequence of h—parameters, however,
`is usually nonin~
`creasing in m. To avoid any possible confusion in the se—£
`quel, we postulate this property as an additional defining
`property of a source. Namely, we require
`
`4) h(m) = 1/m log o(m) is a nonincreasing function of
`m.
`
`B. Some Lower Bounds on the Compression Ratio
`
`Consider a compression coding scheme for a source a
`which employs a block—to-variable (B V) code book of M
`pairs (X,-,Y,-) of words over A, with £’(X,) = L for i =
`1,2, -
`- - ,M. The encoding of an infinitely long string 8 E‘
`0 by such a code is carried out by first parsing S into blocks
`of length L, and then replacing each block Xi by the cor-
`responding codeword Y,-. It is assumed, of course, that the
`code book is exhaustive with respect to 0 and uniquely
`decipherable [2}. Hence, we must have
`
`{Xi iii = UlL}
`
`M =
`
`= aLh(L),
`
`01'
`
`and
`
`max l€(Y,-)} Z l0gM = Lh(L).
`1SiSM
`
`(4)
`
`The compression ratio pg associated with the ith word»
`pair of the code is given by
`
`Pi
`
`£’(Yi)
`L .
`
`The BV compression ratio, pm/(o,M), of the source a,
`is defined as the minimax value of p,-, where the maximi—
`zation is over all word—pairs of a given code, and the min-
`imization is over the set C3‘/(a,M) of all BV code books"
`consisting of M word—pairs. Thus,
`
`p3V(a,M)= min max {gum}
`
`Cm/(a,M)1sisM L
`
`logM Lh(L)
`= —-—' =
`L
`L
`
`_
`>
`
`h
`
`,
`
`L .
`()
`
`1 Throughout this paper, log x means the base—a logarithm of x.
`
`3
`
`
`
`340
`
`For later reference, we record this result in the following
`lemma.
`Lemma 1:
`
`where
`
`pBv(U,M) Z h(L),
`
`Lh(L) = log M.
`
`Now, consider a compression scheme which employs a
`Variable-to—block (VB) code book of M word—pairs (X,-,Y,-),
`with £’(Y,) = L for all i = 1,2, - - - ,M. In this case, the
`compression ratio associated with the ith word—pair is
`given by
`
`__
`
`Pi
`
`L
`5 (Xi),
`
`IEEE TRANSACTIONS ON INFORMATION THEORY, MAY 1977
`
`Remarks
`
`1) Since the value of L in the context of Lemma 1
`
`satisfies the definition of LM as given in Lemma 2, it fol-
`lows that the bounds of both lemmas are essentially the
`same, despite the basic difference between the respective
`coding schemes.
`E
`2) By 2), the second defining property of a source, the
`per—word bounds derived above apply to indefinitely long
`messages as well, since the whole message may consist of
`repeated appearances of the same worst case word.
`3) By 4), the nonincreasing property of the h—pa-
`rameters, the form of the derived bounds confirms the
`intuitive expectation that an increase in the size M of the
`employed code book causes a decrease in the lower bound
`on the attainable compression ratio.
`
`and similarly, the VB compression ratio pV3(o,M) of a is
`defined as the minimax value of p,~ over all word—pairs and
`over the set CVB(o,M) of VB code books with M word-
`pairs.
`Lemma 2:
`
`where
`
`pvB(<r,M) Z h(LM)
`
`LM = max {KIM Z a(€)}.
`
`Proof: We may assume, without loss of generality, that
`in every code under consideration
`
`£’(X1).<. £’(X2) S
`
`S £’(XM),
`
`and hence for each C E C V};(¢7,M),
`
`,- C =
`maxp()
`
`L(C)
`“X9
`
`.
`
`Since C is exhaustive with respect to a, we have
`
`M Z rr(€1),
`
`5
`()
`
`(6)
`
`where £1 = 6 (X 1); and since C is uniquely decipherable,
`we have
`
`L(C) ZlogM.
`
`(7)
`
`C. Performance of the Proposed Algorithm
`
`We proceed now to derive an upper bound on the com-
`pression ratio attainable by the algorithm of Section II. To
`this end, we consider the worst case source message of
`length n — Ls, where n is the prospective buffer length and
`L, is the maximal word—length. The bound obtained for
`this case will obviously apply to all messages of length n
`— L, or greater.
`First, we assume that only the h-parameters of the
`source under consideration are known to the designer of
`the encoder. Later, when we discuss the universal perfor-
`mance of the proposed algorithm, we will show that even
`this restricted a priori knowledge of the source is actually
`unessential.
`
`We begin by choosing the buffer length n to be an inte-
`ger of the form
`x
`
`z
`
`n = 21 mam + _Z)\+1
`
`+ (6 + 1)(Ng+1 + 1),
`
`where
`
`x
`e
`Ng+1= Z1(Z — m)a’" + Z (Z — m)a(€),
`m=
`m=>\+1
`
`(9)
`
`(10)
`
`From the definition of LM, inequality (6), and the nonde-
`creasing property of 0(3), we obtain
`
`A = [£’h(€)j,
`
`the integer part of log a(€) = £’h(€), and
`
`Z = Ls - 1.
`
`(11)
`
`£1 SLM.
`
`_ From (5), (7), and (8), we have
`
`and since
`
`max p,'(C) Z
`
`logM_
`LM ’
`
`M Z 0(LM) = aLMh(LM),
`
`it follows that for each C E C VB(a,M),
`
`(8)
`
`The specific value of the parameter L3 is left for later
`determination (see the first remark following the proof
`of Theorem 1). The reasoning that motivates the given
`form of n will become clear from subsequent deriva-
`tions.
`
`Consider a string S E rrln — Ls}, and let N(S ) denote the
`number of words into which S is parsed by the algorithm
`during the encoding process. Recalling that each of these
`words is mapped into a codeword of fixed length LC (see
`(1)), it follows that the compression ratio p(S) associated
`with the string S is given by
`
`max p,-(C) Z h(LM).
`
`Q.E.D.
`
`‘p(S') =
`
`Le
`
`n — L,
`
`N(S)-
`
`4
`
`
`
`341
`
`(16)
`
`a”
`
`ZIV AND LEMPEL: SEQUENTIAL DATA COMPRESSION
`
`Hence, the compression ratio p attainable by the algorithm
`for a given source 0 is
`
`or
`
`N,
`
`(12)
`
`NsM=n
`
`— L
`Z
`
`3:
`
`n — L
`fl
`L, — 1
`
`where
`
`N= max N(S).
`se.;:n—L,}
`
`Let Q E afn — L3} be such that N(Q) = N, and suppose
`that the algorithm parses Q into Q = Q1Q2 - - - QN. From
`step 2) of the encoding cycle it follows that if €(Q,-) = Z (Qj)
`< Ls, for some i <j < N, then Q, 75 Q]; (Note that when
`Qj is being determined at the jth cycle of the encoding
`process, all of Q, is still stored in the buffer, and since €(Q,-)
`< L, , the longest substring in the buffer that precedes Q,-
`and is a prefix of Q, must be of length €(Q,~) - 1.)
`Denoting by Km the number of Q,-, 1 S i S N — 1, of
`length m, 1 S m S Ls, we have
`Ls
`N = 1 + 2 K,,,.
`m= 1
`
`Hence, from (12) and (16), we have
`
`Le
`£2 _
`”Se_L,—r
`
`Note that despite the rater rudimentary overestimation
`of N by N’, the upper bound of (17) is quite tight, since the
`fact that no source word is longer than L, immediately
`implies p Z LC/Ls.
`We can state now the following result.
`Theorem 1: If the buffer length n for a source with
`known h—parameters is chosen according to (9), then
`
`p S h(Ls - 1) + e(Ls),
`
`where
`
`an)
`
`=a—1
`
`<3+3mgam—1)+mg%§.
`
`Proof: From (1) we have
`
`By the above argument, and by property 3) of the source,
`we have
`
`LC = 1 + [log LS] + [log (n — Ls)]
`
`K,,,Sa(m),
`
`for1SmS€=L,—-1.
`
`Since
`
`[+1
`
`n _Ls = €(Q1v)+ Z mKm,
`m=1
`
`and n and L, are both fixed, it is clear that by overesti-
`mating the values of Km for 1 S m _<_ E at the expense of
`K5+1, we can only overestimate the value of N. Therefore,
`since a(m) S a(m + 1) and a(m) = amhlml S am, we ob-
`tain
`
`/
`6
`/
`N_<_K[+1+ Z Km=N/,
`m=1
`
`From (9) and (10) we obtain
`
`S 3 + log (L, — 1) + log (n — L_,).
`
`n—L.,=e[i<e—m>am+ i (Z-m)a(€)
`
`m=>\+1
`
`771:1
`
`A
`
`6’
`
`+ ; oz’”+ 2 0(3)],
`
`m=>\+1
`
`1
`
`and since 01'” S a(Z), for 1 S m S A, we have
`
`3
`'
`1
`n —Ls 5 30(5) 2 (Z - m+ 1) =§€2(€
`m=1
`
`+ 1)a(€),
`
`or
`
`log (n — L3) 5 2 log 3 + log
`
`+ L’h(€).
`
`2
`
`£’+1
`
`where
`
`and
`
`K,m={a’”,
`
`0(€),
`
`for1SmS}\=L€h(£’)j
`
`for>\<m SE
`
`(14)
`
`Since 3 = L, — 1, we obtain
`
`,
`
`Kg+1=
`
`1
`
`5’
`
`,
`
`;1mK,,,)_l.
`
`LC 5 3 + 3log (L, — 1) +log%+ (L, — 1)h(L_, — 1),
`
`01'
`
`From (14), (15), and (9), we obtain K241 5 Ng+1, and
`
`LC 5 (Ls — 1)[h(Ls — 1) + 6(Ls)l-
`
`x
`
`(7
`
`N/=NZ+1+ Z;
`
`1
`
`01'" + Z
`m=>\+1
`
`0(3)
`
`Substituting this result into (17), we obtain the bound of
`Theorem 1.
`
`Q.E.D.
`
`which, together with (9) and (10), yields
`(2
`n—L,—€N’= i ma’"+ Z ma(£’)
`m=1
`m=}x+1
`
`x
`
`e
`
`+Ne+1‘-€[ Z oz’”+ Z a(€)]=0,
`
`m=>\+1
`
`m=1
`
`Remarks
`
`1) The value of e(L,) decreases with L_, and, conse-
`quently, the compression ratio p approaches the value of
`h(L, — 1), the h—parameter associated with the second
`largest word—length processed by the encoder. Given any
`6 > 0, one can always find the least integer Z, such that
`p - h(€, — 1) S 6. The magnitude of the acceptable de-
`
`5
`
`
`
`342
`
`IEEE TRANSACTIONS ON INFORMATION THEORY, MAY 1977
`
`viation 5 determines the operational range of LS, namely,
`LS .>_ £3.
`
`2) Since our code maps source words of variable
`length at most Ls into codewords of fixed length LC, we
`adopt as a reference for comparison the best VB code C VB
`discussed in Subsection III—B. The counterpart of our
`block-length LC is L(C) of (5), and by (7) we can write
`
`LC % logM % LMh(LM),
`
`(18)
`
`r
`
`where M is the code book size and h(LM) is the lower
`bound (see Lemma 2) on the compression ratio pm}; at-
`tainable by C VB. Since for sufficiently large Ls, we have
`also
`
`Le ~ (Ls _ 1)P Q (Ls _
`
`" 1):
`
`(19)
`
`it follows that p % pvg.
`We turn now to investigate the universal performance
`of the proposed algorithm, i.e., the performance when no
`a priori knowledge of the source to be compressed is
`available.
`
`Given 61 and hl such that 0 < 61 < hl < 1, let £1 be the
`least positive integer satisfying
`1
`<31Ze(€1+1)=2*<3+3log€1+log
`1
`
`and let K = o/1"1 and A1 = |_€1h1j. Consider the encoder
`E1 which employs a buffer of length n1 = n(h1,€1), ob-
`tained from (9) and (10) by setting 3 = Z1, A = A1, and 0(6)
`= K, and whose maximal word—length L, is equal to £1 +
`1.
`
`It follows from Theorem 1 that if this encoder is applied
`to a source 01 such that h,,l(€ 1) = h, then the resulting
`compression ratio p1(U1) satisfies
`
`In analogy with (20), it is also clear that if E0 is applied
`to a source 00 such that h,,(,(€0) = h0, then the resulting
`compression ratio p0(o'0) satisfies
`
`P0(00) S ha0(€o) + 50 = ho + 50,
`
`(24)
`
`where 60 = e(€0 + 1).
`From (23) and (24), we have p0(o0) S p0 — e(£0 + 1) + 50
`= p(), and
`
`()0Sp0—hQ<€(€0)
`
`1
`S€([1+1) S()1<Ep().
`
`Hence, h0 can be made arbitrarily close to p0, and conse—*
`quently, the encoder E0 matches as closely as desired the
`lower end p0 of the given compression range.
`Theorem 2: Let a be a source for which a matched en-
`
`coder E achieves a compression ratio p(z7) within the range
`(p0,p1). Then the compression ratio po(U), achieved for a
`by E0, satisfies
`
`100(0) 5 0(0) + A,
`
`where
`
`_
`_
`flog d1
`1
`hl
`AS
`61
`d—max{h0,1__h0
`(Typically, (h1/h0) > (1/1 - h0) and d = (hl/h0).)
`
`Proof: To prove the theorem we shall consider the
`obviously worst case of applying E0 to the source 01 whose
`matched encoder E1 realizes p1.
`Let p()(01) denote the compression ratio achievable by
`E0 when applied to 01. According to (12), we have
`
`L00
`
`P0(l71) 3
`
`no _
`
`1)N0(al))
`
`P1(01) S h;—1(€1) + 51 = hl + 51-
`
`(20)
`
`where
`
`Suppose now that h; and 61 were chosen to fit the pre-
`scribed upper value p1 = 121 + 51 of a prospective com-
`pression ratio range (p0,p1) with
`
`0<51R<p0<p1<1,
`
`R>1,
`
`where R is an adjustment parameter to be determined
`later. As shown above, the encoder E1 is then matched
`exactly to the upper value p1 of the prospective compres-
`sion range. In order to accommodate the whole given range,
`we propose to employ a slightly larger encoder E0 whose
`buffer is of length n0 = n1 - Z1 + 60, where n1 and K1 are
`the parameters of E1, and where 30 is an integer, greater
`than £1, for which the solution h0 of the equation
`
`i
`
`ni — 31 + 30 = n(h0,l’o)
`
`(22)
`
`satisfies
`
`P0 ‘ €(l7o) < ho 5 P0 " €(50 + 1)-
`
`(23)
`
`Noting that n0 — £0 = n1 - Z1 is fixed, it is clear from (9)
`and (10) that as 60 increases h0 decreases; also, (21) and the
`fact that 60 > [1 imply that p0 - e(€0 + 1) > p0 — e(€1 + 1)
`2 p0 — 61 > 0. Hence, there exist 110 > 0 and 60 > £1 that
`satisfy both (22) and (23).
`
`L0; = 1 + [log (Z,- + 1)] + [log (n,- — Zi -— 1)],
`
`i E (0,1),
`
`and N,~(aj), i,j E {0,1}, is the maximum number of words
`into which a string S E 0,-{ni — L’; - 1} is parsed by E.
`Since n0 - £0 = In - 6'1 and 50 > 31, it is easy to verify
`
`that? NQ(U1) 5 N1(O'1). AlSO
`
`N1(rr1) s N1<a1> = —~»w—”1‘ ((1 ‘L 1) = —Mé—”°' “° + 1’.
`
`
`31
`51
`
`Hence
`
`and since
`
`P0(0‘1)
`
`S
`
`L00 Lcl
`= — +
`[1
`[1
`
`L00 — Lcl
`£1
`
`+
`
`S
`
`P1
`
`Lc0 _ Lcl
`£1
`
`,
`
`Leo ‘ Lcl = [108 (30 + 1)-l ‘ (10% (31 + 1)l 5 i10g kl»
`
`2 The assertion here is analogous to that of [10, theorem 1].
`
`6
`
`6
`
`
`
`IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. IT-23, NO. 3, MAY 1977
`
`343
`
`where k = (50/61), we obtain
`
`/1o(U1) s p. +,,i1r1og ki.
`
`Moreover, for given complexity i.e., a given codeword
`length, the compression efficiency is comparable to that
`of an optimal variable-to-block code book designed to
`match a given source.
`
`(25)
`
`To obtain an upper bound on k, we first observe that
`
`REFERENCES
`
`(50 — m + 1)af°h0 S n0 — do
`
`/1
`=n1-Z1331 Z (!1—m +1)0/lhl,
`m=1
`
`which reduces to
`
`k2(1 _ ho)? _<_ 0/Ih1(1-/€(ho/h1))_
`
`(26)
`
`Now, either k(1 — ho) < 1, or else the exponent on the
`right side of (26) must be nonnegative and thus k S (h1/
`ho). In either case,
`
`h1
`
`1
`
`kSmax[h—0,1_h0]=d.
`
`Q.E.D.
`
`Theorems 1 and 2 demonstrate the efficiency and uni-
`versality of the proposed algorithm. They show that an
`encoder designed to operate over a prescribed compression
`range performs, practically, as well as one designed to
`match a specific compression ratio within the given range.
`
`[1] D. A. Huffman, “A method for the construction of minimum—re-
`dundancy codes,” Proc. IRE, vol. 40, pp. 1098-1101, 1952.
`[2] R. M. Karp, “Minimum-redundancy coding for the discrete noiseless
`channel,” IRE Trans. Inform. Theory, vol. IT—17, pp. 27-38, Jan.
`1961.
`
`[3] B. F. Varn, “Optimal variable length codes,” Inform. Contr, vol.
`19, pp. 289-301, 1971.
`[4] Y. Perl, M. R. Gary, and S. Even, “Efficient generation of optimal
`prefix code: Equiprobable words using unequal cost letters,” J.
`ACM, vol. 22, pp. 202-214, April 1975.
`[5] A. Lempel, S. Even, and M. Cohn, “An algorithm for optimal prefix
`parsing of a noiseless and memoryless channel,” IEEE Trans. In-
`form. Theory, vol. IT-19, pp. 208-214, March 1973.
`[6] F. Jelinek and K. S. Schneider, “On variable length to block coding,”
`IEEE Trans. Inform. Theory, vol. IT—l8, pp. 765-774, Nov. 1972.
`[7] R. G. Gallager, Information Theory and Reliable Communication.
`New York: Wiley, 1968.
`[8] J. Ziv, “Coding of sources with unknown statistics—Part I; Proba-
`bility of encoding error,” IEEE Trans. Inform. Theory, vol. IT-18,
`pp. 384-394, May 1972.
`[9] L. D. Davisson, “Universal noiseless coding,” IEEE Trans. Inform.
`Theory, vol. IT-19, pp. 783-795, Nov. 1973.
`[10] A. Lempel and J. Ziv, “On the complexity of finite sequences,” IEEE
`Trans. Inform. Theory, vol. IT-22, pp. 75-81, Jan. 1976.
`[11] B. M. Fitingof, “Optimal coding in the case of unknown and
`changing message statistics,” Prob. Inform. Transm., vol. 2, pp.
`3-11, 1966.
`
`On Binary Sliding Block Codes
`
`TOBY BERGER, SENIOR MEMBER, IEEE, AND JOSEPH KA-YIN LAU
`
`Abstract— Sliding block codes are an intriguing alternative to
`the block codes used in the development of classical information
`theory. The fundamental analytical problem associated with the
`use of a sliding block code (SBC) for source encoding with respect
`to a fidelity criterion is that of determining the entropy of the coder
`output. Several methods of calculating and of bounding the output
`entropy of an SBC are presented. Thelocal and global behaviors
`of a well-designed SBC also are discussed. The so-called “l01-
`coder,” which eliminates all the isolated zeros from a binary input,
`plays a central role. It not only provides a specific example for
`application of the techniques developed for calculating and
`bounding the output entropy, but also serves as a medium for ob-
`taining indirect insight into the problem of characterizing a good
`
`Manuscript received April 16, 1976. This work was supported in part
`by the National Science Foundation under Grant GK—41229 and by a
`John Simon Guggenheim Memorial Foundation Fellowship. Portions
`of this paper were first presented at the IEEE Information Theory
`Workshop, Lenox, Massachusetts, June 1975.
`The authors are with the School of Electrical Engineering, Cornell
`University, Ithaca, NY 14853.
`‘
`
`SBC. An easily implementable SBC subclass is introduced in which
`the outputs can be calculated by simple logic circuitry. The study
`of this subclass is shown to be closely linked with the theory of al-
`gebraic group codes.
`
`I. INTRODUCTION
`
`theory of
`the
`of
`HANNON’S development
`source coding subject to a fidelity criterion [1], [2]
`dealt almost exclusively with block coding, i.e., the map-
`ping of consecutive, nonoverlapping, fixed—length blocks
`of source data into a so—called “code book” containing a
`constrained number of entries. The fundamental theorems
`
`of rate—distortion theory, which relate optimal source code
`performance to an information—theoretic minimization,
`involve complicated random coding arguments [3], [4] in
`general cases. Also,
`in many situations, block coding
`structures are exceedingly difficult to implement.
`
`7