IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. IT-24, NO. 5, SEPTEMBER 1978

Compression of Individual Sequences via Variable-Rate Coding

JACOB ZIV, FELLOW, IEEE, AND ABRAHAM LEMPEL, MEMBER, IEEE
individual sequences), corresponds to the smallest coding rate for $x$ under which the decoding error-rate can be made arbitrarily small.

In this paper, compressibility of individual sequences is investigated with respect to a broader class of encoders that can operate in a variable-rate mode as well as in a fixed-rate one, and that allow for any finite-state scheme of variable-length-to-variable-length coding.¹ On the other hand, no distortion is allowed, and the original data must be fully recoverable from the compressed image. This class of encoders can be modeled by the class of finite-state information-lossless generalized automata [2], [3].
In our model, an encoder $E$ is defined by a quintuple $(S, A, B, g, f)$, where $S$ is a finite set of states, $A$ is a finite input alphabet, $B$ is a finite set of output words over a finite output alphabet, $g$ is the "next-state" function that maps $S \times A$ into $S$, and $f$ is the output function that maps $S \times A$ into $B$.

By allowing the words in $B$ to be of different finite lengths, we allow for block-to-variable coding, and by including in $B$ the null-word $\lambda$ (i.e., the "word" of length zero), we allow for any finite-state scheme of variable-to-variable coding.
When an infinite sequence $x = x_1 x_2 \cdots$, $x_i \in A$, is fed into $E = (S, A, B, g, f)$, the encoder emits an infinite sequence $y = y_1 y_2 \cdots$, $y_i \in B$, while going through an infinite sequence of states $z = z_1 z_2 \cdots$, $z_i \in S$, according to

$y_i = f(z_i, x_i)$
$z_{i+1} = g(z_i, x_i), \qquad i = 1, 2, \cdots$

where $z_i$ is the state of $E$ when it "sees" the input symbol $x_i$.
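This state-machine model is easy to simulate directly. The following minimal Python sketch (our illustration, not part of the paper) steps an encoder through an input sequence; the two-state machine used in the demonstration is a hypothetical example, with $g$ and $f$ given as plain dictionaries:

```python
def encode(x, g, f, z1):
    """Run the encoder E = (S, A, B, g, f) on input x, returning the
    output sequence (y_i = f(z_i, x_i)) and the visited states
    (z_{i+1} = g(z_i, x_i))."""
    z, y = [z1], []
    for xi in x:
        y.append(f[(z[-1], xi)])
        z.append(g[(z[-1], xi)])
    return y, z

# Hypothetical encoder: S = {0, 1}, A = {'a', 'b'}, and B contains the
# null word '' (length zero) plus the binary words '0' and '1'.
g = {(0, 'a'): 1, (0, 'b'): 1, (1, 'a'): 0, (1, 'b'): 0}
f = {(0, 'a'): '', (0, 'b'): '', (1, 'a'): '0', (1, 'b'): '1'}

y, z = encode("abba", g, f, z1=0)
print(y)  # ['', '1', '', '0'] -- null words add no output bits
```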

A finite segment $x_i x_{i+1} \cdots x_j$, $1 \le i \le j$, of $x$ will be denoted by $x_i^j$; similar notation will naturally apply to finite segments of other sequences. Following conventional practice, we shall extend the use of the encoder functions $f$ and $g$ to indicate the output sequence, as well as the final state, which results from a given initial state and a finite sequence of input letters. For instance, we shall occasionally write $f(z_1, x_1^n)$ for $y_1^n$ and $g(z_1, x_1^n)$ for $z_{n+1}$.

An encoder $E$ is said to be information lossless (IL) if for all $z_1 \in S$ and all $x_1^n \in A^n$, $n \ge 1$, the triple $\{z_1, f(z_1, x_1^n), g(z_1, x_1^n)\}$ uniquely determines $x_1^n$. $E$ is said to be information lossless of finite order (ILF) if there exists a finite positive integer $m$ such that for all $z_1 \in S$ and all $x_1^m \in A^m$, the pair $\{z_1, f(z_1, x_1^m)\}$ uniquely determines the first input letter $x_1$. It is easy to verify [3] that if $E$ is ILF then it is also IL, but there exist IL encoders that are not ILF.²

¹We regard block-to-block, block-to-variable, and variable-to-block as special cases of variable-to-variable coding.
²Even [3] presented an algorithm for determining whether a given automaton is IL, ILF, or neither. For an automaton with $s$ states, the algorithm takes a number of steps proportional to $s^3$.

In the sequel, we assume the IL or the ILF property depending on which leads to a stronger result. For example, we prove a coding theorem (Theorem 2) by means of an ILF construction, while its converse (Theorem 1) is proved under the broader IL assumption. To simplify the discussion without any real loss of generality, we also assume throughout that the output alphabet of the encoder is binary and that the initial state $z_1$ is a prescribed fixed member of the state-set $S$.
Given an encoder $E = (S, A, B, g, f)$ and an input string $x_1^n$, the compression ratio for $x_1^n$ with respect to $E$ is defined by

$\rho_E(x_1^n) \triangleq \dfrac{L(y_1^n)}{n \log \alpha}$   (1)

where $\alpha \triangleq |A|$, $y_1^n = f(z_1, x_1^n)$, $L(y_1^n) = \sum_{i=1}^{n} l(y_i)$, and $l(y_i)$ is the length in bits of $y_i \in B$. (Note that when $y_i = \lambda$, $l(y_i) = 0$.)
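A compression ratio under this definition can be evaluated directly; the snippet below (continuing the hypothetical two-state example above) computes $\rho_E(x_1^n)$ from the emitted words:

```python
from math import log2

def compression_ratio(y, alpha):
    """rho_E(x_1^n) = L(y_1^n) / (n log alpha): total output length in
    bits over the n emitted words; a null word contributes l(y_i) = 0."""
    L = sum(len(yi) for yi in y)
    return L / (len(y) * log2(alpha))

# For x = 'abba' the example encoder emits y = ['', '1', '', '0'],
# so L(y) = 2 bits, n = 4, alpha = 2, and rho = 2 / 4 = 0.5.
print(compression_ratio(['', '1', '', '0'], alpha=2))  # 0.5
```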
The minimum of $\rho_E(x_1^n)$ over the class $E(s)$ of all finite-state IL encoders with $|A| = \alpha$ and $|S| \le s$ is denoted by $\rho_{E(s)}(x_1^n)$. That is,

$\rho_{E(s)}(x_1^n) \triangleq \min_{E \in E(s)} \{\rho_E(x_1^n)\}.$   (2)

Furthermore, let

$\rho_{E(s)}(x) \triangleq \limsup_{n \to \infty} \rho_{E(s)}(x_1^n)$   (3)

and

$\rho(x) \triangleq \lim_{s \to \infty} \rho_{E(s)}(x).$   (4)
It is clear that for every individual sequence $x$, $0 \le \rho(x) \le 1$. This normalized quantity $\rho(x)$, which depends solely on $x$, will be referred to as the (finite-state) compressibility of $x$. In Theorem 1 (the converse-to-coding theorem), we derive a lower bound on $\rho_{E(s)}(x_1^n)$ and show that in the limit this bound approaches the normalized Lempel-Ziv complexity [4] and becomes a lower bound on the compressibility of $x$. In Theorem 2 (the coding theorem), we demonstrate, using a variant of the authors' universal compression algorithm [5], the existence of an asymptotically optimal universal ILF encoding scheme under which the compression ratio attained for $x$ tends in the limit to the compressibility $\rho(x)$ of $x$ for every $x$. It is important to note that apart from their asymptotic significance, the results of Theorems 1 and 2 also provide useful performance criteria for finite (and practical) data-compression tasks.

The concept of compressibility as defined here, like the quantity $H(\cdot)$ in [1], seems to play a role analogous to that of entropy in classical information theory, where one deals with probabilistic ensembles of sequences rather than with individual sequences. This analogy is reinforced by Theorems 3 and 4, where our concept of compressibility is examined from a probabilistic point of view.

The remainder of this paper contains two parts: a descriptive part (Section II), where all the results are stated
and discussed, and a formal part (Section III), where all proofs except that of Theorem 2 are given. The proof of Theorem 2, which is constructive and thus informative, is presented in the mainstream of Section II.

II. STATEMENT AND DISCUSSION OF RESULTS

Our first result establishes a lower bound on the compression ratio attainable by any encoder $E$, from the class $E(s)$ of IL encoders with no more than $s$ states, for any finite input string $x_1^n$ over an alphabet $A$ of $\alpha$ letters. In the theorem below and elsewhere in the sequel, $\log k$ means $\log_2 k$.

Theorem 1 (Converse-to-Coding Theorem): For every $x_1^n \in A^n$,

$\rho_{E(s)}(x_1^n) \ge \dfrac{c(x_1^n) + s^2}{n \log \alpha} \log \dfrac{c(x_1^n) + s^2}{4 s^2} + \dfrac{2 s^2}{n \log \alpha}$   (5)

where $c(x_1^n)$ is the largest number of distinct strings (or "phrases") whose concatenation forms $x_1^n$. (The proof of this theorem is given in Section III.)

It was shown in an earlier paper [4, Th. 2] that for all $x_1^n \in A^n$,

$c_{LZ}(x_1^n) - 1 \le c(x_1^n) \le \dfrac{n \log \alpha}{(1 - \epsilon_n) \log n}$   (6)

where $c_{LZ}(x_1^n)$ is the Lempel-Ziv complexity of $x_1^n$ and $\epsilon_n$ satisfies $\lim_{n \to \infty} \epsilon_n = 0$. From (5) and (6) it follows that

$\rho_{E(s)}(x) = \limsup_{n \to \infty} \rho_{E(s)}(x_1^n) \ge \limsup_{n \to \infty} \dfrac{c(x_1^n) \log c(x_1^n)}{n \log \alpha}$

which implies the bound (7a) of Corollary 1. The bound (7b) of this corollary follows directly from (5) and (6).

Corollary 1 (Compressibility Bounds):

$\rho(x) = \lim_{s \to \infty} \rho_{E(s)}(x) \ge \limsup_{n \to \infty} \dfrac{c(x_1^n) \log c(x_1^n)}{n \log \alpha}$   (7a)

$\rho(x) \ge \limsup_{n \to \infty} \limsup_{k \to \infty} \dfrac{1}{kn \log \alpha} \sum_{i=0}^{k-1} c(x_{in+1}^{(i+1)n}) \log c(x_{in+1}^{(i+1)n}).$   (7b)

It also follows from (6) and (7) that $\limsup_{n \to \infty} (c_{LZ}(x_1^n) \log n)/(n \log \alpha)$, the normalized Lempel-Ziv complexity of $x$, serves as a lower bound on the compressibility $\rho(x)$ of $x$.

An interesting application of the compressibility bound is the use of (7) to identify certain infinite sequences $x$ that, while being rather easy to describe, satisfy $\rho(x) = 1$ and thus are incompressible by any finite-state IL encoder. To illustrate this point, let $u(k)$ denote the binary sequence of length $k 2^k$ that lists, for example in lexicographic order, all the $2^k$ binary words of length $k$, and let

$u_1^{n(k)} = u(1) u(2) \cdots u(k)$
where

$n(k) = \sum_{i=1}^{k} i 2^i = (k-1) 2^{k+1} + 2.$

It is easy to verify that when each $u(i)$ is parsed into its $2^i$ distinct $i$-tuples, we obtain a parsing of $u_1^{n(k)}$ into a maximum number of distinct phrases, namely,

$c(u_1^{n(k)}) = \sum_{i=1}^{k} 2^i = 2^{k+1} - 2.$

For example, $u_1^{n(3)}$ is parsed as follows:

$u_1^{n(3)} = 0, 1, 00, 01, 10, 11, 000, 001, 010, 011, 100, 101, 110, 111.$   (8)

For this particular sequence $u$, inequality (7a) implies

$\rho(u) \ge \lim_{k \to \infty} \dfrac{(2^{k+1} - 2) \log (2^{k+1} - 2)}{(k-1) 2^{k+1} + 2} = 1.$
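The parsing in (8) is easy to reproduce programmatically. The sketch below (ours, not the paper's) builds $u_1^{n(k)}$ and cuts it greedily into the shortest phrases not used before; the phrase count of any such distinct parsing is a lower bound on $c(\cdot)$, and for this sequence the greedy parsing attains exactly the $2^{k+1} - 2$ phrases counted above:

```python
from itertools import product

def u_sequence(k):
    """u(1)u(2)...u(k), where u(i) lists all 2**i binary words of
    length i in lexicographic order; total length (k-1)*2**(k+1) + 2."""
    return "".join("".join(w) for i in range(1, k + 1)
                   for w in product("01", repeat=i))

def greedy_distinct_parse(x):
    """Cut x left to right, always taking the shortest prefix not yet
    used as a phrase; all phrases except possibly the last are
    distinct, so len(result) lower-bounds c(x)."""
    phrases, seen, i = [], set(), 0
    while i < len(x):
        j = i + 1
        while x[i:j] in seen and j < len(x):
            j += 1
        seen.add(x[i:j])
        phrases.append(x[i:j])
        i = j
    return phrases

u3 = u_sequence(3)                       # n(3) = 2 * 2**4 + 2 = 34 bits
print(greedy_distinct_parse(u3))         # 0, 1, 00, 01, ..., 110, 111
print(len(greedy_distinct_parse(u3)))    # 14 = 2**4 - 2, matching (8)
```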
In our next result we employ a variant of the authors' universal compression algorithm [5] to demonstrate the existence of an ILF compression scheme under which, for every $x$, the compression ratio attained for $x_1^n$ tends to $\rho(x)$ as $n$ tends to infinity.
Theorem 2 (Coding Theorem): For every $n > 0$ there exists a finite-state ILF encoder $\hat{E}$ with $s(n)$ states that implements a block-to-variable code with the following performance characteristics.

i) For any given block length $n$ and every input block $x_1^n$, the compression ratio attained by $\hat{E}$ satisfies

$\rho_{\hat{E}}(x_1^n) \le \dfrac{(c(x_1^n) + 1) \log \left( 2\alpha (c(x_1^n) + 1) \right)}{n \log \alpha};$   (9)

for successive blocks $x_{(i-1)n+1}^{in}$, $i = 1, 2, \cdots$, the compression ratio attained satisfies the same inequality with $x_{(i-1)n+1}^{in}$ replacing $x_1^n$.

ii) For every finite $s$,

$\rho_{\hat{E}}(x_1^n) \le \rho_{E(s)}(x_1^n) + \delta_s(n)$   (10)

where

$\lim_{n \to \infty} \delta_s(n) = 0.$

iii) Given an infinite input sequence $x$, let $\rho_{\hat{E}}(x, n)$ denote the compression ratio attained for $x$ by $\hat{E}$ while feeding $\hat{E}$ with successive input blocks of length $n$. Then

$\rho_{\hat{E}}(x, n) \le \rho(x) + \delta(x, n)$   (11)

where

$\lim_{n \to \infty} \delta(x, n) = 0.$

Proof: The proof is constructive, and before going into computational details, we present a short outline of the construction. For the encoder $\hat{E}$, we employ an ILF finite-state machine that realizes a concatenated coding scheme by combining a fixed block-to-variable outer code with a state-dependent variable-to-variable inner code. The inner code is used to encode sequentially, and state by state, the successive phrases of each input block.
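To convey the flavor of such a block-to-variable code, the following Python fragment (our illustration, not the paper's finite-state construction; it is a two-pass sketch, since a sequential encoder cannot know the final phrase count in advance) encodes one input block by incremental parsing, sending each phrase as a pointer to its longest proper prefix among the earlier phrases plus one letter:

```python
from math import ceil, log2

def block_to_variable_encode(block, alphabet):
    """Encode one block by incremental parsing: each phrase is sent as
    (index of its longest proper prefix among earlier phrases, final
    letter). With c phrases, each pair fits in ceil(log2(c + 1)) +
    ceil(log2(alpha)) bits, so the total output is at most about
    (c + 1) log(2 * alpha * (c + 1)) bits, the count appearing in (9)."""
    phrases, index, i = [], {"": 0}, 0
    while i < len(block):
        j = i + 1
        # Extend until the candidate phrase is new (or the input ends).
        while block[i:j] in index and j < len(block):
            j += 1
        phrases.append((index[block[i:j - 1]], block[j - 1]))
        index[block[i:j]] = len(index)
        i = j
    c = len(phrases)
    ptr_bits = ceil(log2(c + 1))
    sym_bits = ceil(log2(len(alphabet)))
    bits = "".join(format(p, f"0{ptr_bits}b") +
                   format(alphabet.index(a), f"0{sym_bits}b")
                   for p, a in phrases)
    return bits, c

bits, c = block_to_variable_encode("0100011011", "01")
print(c, len(bits))  # 6 phrases, 6 * (3 + 1) = 24 output bits
```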
By definition of $\rho_{\hat{E}}(x, n)$ in part iii) of the theorem,

$\rho_{\hat{E}}(x, n) = \limsup_{k \to \infty} \dfrac{1}{kn \log \alpha} \sum_{i=1}^{k} L_i$

where $L_i$ is the number of bits used to encode the $i$th block $x_{(i-1)n+1}^{in}$ of $x$. Since (9) and (10) hold for any input block of length $n$, we can write

$\dfrac{L_i}{n \log \alpha} = \rho_{\hat{E}}(x_{(i-1)n+1}^{in}) \le \rho_{E(s)}(x_{(i-1)n+1}^{in}) + \delta_s(n).$   (15)

Also, for every finite $s$,

$\limsup_{k \to \infty} \dfrac{1}{k} \sum_{i=1}^{k} \rho_{E(s)}(x_{(i-1)n+1}^{in}) \le \rho_{E(s)}(x).$   (16)

From (15) and (16) we obtain

$\rho_{\hat{E}}(x, n) \le \delta_s(n) + \limsup_{k \to \infty} \dfrac{1}{k} \sum_{i=1}^{k} \rho_{E(s)}(x_{(i-1)n+1}^{in})$

which reduces to

$\rho_{\hat{E}}(x, n) \le \delta_s(n) + \rho_{E(s)}(x)$   (17)

where $\lim_{n \to \infty} \delta_s(n) = 0$. By (4), we can write $\rho_{E(s)}(x) = \rho(x) + \delta_s^*(x)$, where $\lim_{s \to \infty} \delta_s^*(x) = 0$ for all $x$. Hence, given $x$ and any $\epsilon > 0$, there always exists a sufficiently large finite $s$ for which $\delta_s^*(x) < \epsilon$. Since (17) holds for every finite $s$, it follows that for any $\epsilon > 0$, we can write

$\rho_{\hat{E}}(x, n) \le \rho(x) + \epsilon + \delta_s(n)$

which proves (11) with $\delta(x, n) \triangleq \epsilon + \delta_s(n)$ and completes the proof of the theorem.   Q.E.D.

It is easy to verify that inequality (7b) and the proof of Theorem 2 also imply the following corollary.

Corollary 2: Let $p(x_1^n)$ denote the number of phrases in the incremental parsing of $x_1^n$. Then for every infinite sequence $x$,

$\rho(x) = \limsup_{n \to \infty} \limsup_{k \to \infty} \dfrac{1}{kn \log \alpha} \sum_{i=0}^{k-1} p(x_{in+1}^{(i+1)n}) \log p(x_{in+1}^{(i+1)n}).$
We proceed now to examine the concept of compressibility from a different point of view. Given $w \in A^l$, an arbitrary word of length $l$ over $A$, and an input string $x_1^n \in A^n$, let

$\delta(x_1^n, i, w) = \begin{cases} 1, & \text{if } x_{i+1}^{i+l} = w, \quad 0 \le i \le n - l \\ 0, & \text{otherwise.} \end{cases}$   (18)

Then the relative frequency of occurrence of $w$ in $x_1^n$ is

$P(x_1^n, w) = \dfrac{1}{n - l + 1} \sum_{i=0}^{n-l} \delta(x_1^n, i, w)$   (19)

which can also be interpreted as a "probability measure" of $w \in A^l$ relative to $x_1^n$. The corresponding normalized "entropy" is then given by

$\hat{H}_l(x_1^n) = -\dfrac{1}{l \log \alpha} \sum_{w \in A^l} P(x_1^n, w) \log P(x_1^n, w)$   (20)

where by definition $P \log P = 0$ whenever $P = 0$.
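These empirical quantities are straightforward to compute for a concrete string. The sketch below (our illustration, with made-up inputs) tallies the sliding-window counts behind (19) and evaluates the normalized entropy of (20):

```python
from collections import Counter
from math import log2

def empirical_entropy(x, l, alpha):
    """H-hat_l(x_1^n) of (20): relative frequencies P(x_1^n, w) of the
    n - l + 1 windows x_{i+1}^{i+l} as in (19), normalized by
    l log alpha; words with P = 0 are absent, matching P log P = 0."""
    counts = Counter(x[i:i + l] for i in range(len(x) - l + 1))
    total = len(x) - l + 1
    return -sum((c / total) * log2(c / total)
                for c in counts.values()) / (l * log2(alpha))

# A string rich in distinct l-grams keeps the value near 1, while a
# periodic string drives it toward 0 as l grows (made-up examples):
print(empirical_entropy("0100011011000001010011100101110111", l=3, alpha=2))
print(empirical_entropy("0101010101010101", l=5, alpha=2))  # 0.2
```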
Now we take

$\hat{H}_l(x) = \limsup_{n \to \infty} \hat{H}_l(x_1^n)$   (21)

and

$\hat{H}(x) = \lim_{l \to \infty} \hat{H}_l(x).$   (22)

The existence of this limit is established in Section III, where we also prove the following results.
Theorem 3: For every infinite sequence $x$,

$\rho(x) = \hat{H}(x).$   (23)

Theorem 4: If $x$ is drawn from an ergodic source with entropy $H$, then

$\Pr [\rho(x) = H] = 1.$   (24)

Corollary 3: If $x$ is drawn from a stationary source with entropy $H$, then

$E \rho(x) = H$   (25)

where $E$ denotes expectation.

In a recent paper [1], compressibility of an individual infinite sequence $x$ with respect to an IL fixed-rate encoder is measured in terms of a "finite-state complexity" $H(x)$, and results similar to those obtained here for $\rho(x)$ are established there for $H(x)$. Thus the roles played by $\rho(x)$ and $H(x)$ for individual sequences are analogous to that played by the Shannon entropy for probabilistic ensembles of sequences, and both $E\rho(x)$ and $EH(x)$ are appropriate candidates for a concept of "generalized entropy" in the nonstationary case.

III. PROOFS

Proof of Theorem 1: Given an encoder $E \in E(s)$ and an input string $x_1^n$, let

$x_1^n = x_1^{n_1} x_{n_1+1}^{n_2} \cdots x_{n_{c-1}+1}^{n_c}$

be a parsing of $x_1^n$ into $c = c(x_1^n)$ distinct phrases, and let $c_j$ denote the number of phrases $x_{n_{i-1}+1}^{n_i}$, $1 \le i \le c$ (where $n_0 \triangleq 0$ and $n_c \triangleq n$), for which $y_{n_{i-1}+1}^{n_i}$, the corresponding output phrase, is $j$ bits long. Since the input phrases are all distinct, it follows from the IL property of $E$ that $c_j \le s^2 2^j$ for all $j$. It is also clear that to obtain a lower bound on the length $L(y_1^n)$ in bits of $y_1^n$, we may overestimate $c_j$, $j = 0, 1, \cdots$, at the expense of $\sum_{i > j} c_i$, provided the sum of all $c_j$ remains equal to $c$. Thus if $q$ and $r$ are the nonnegative integers satisfying $c = q s^2 + r$, $0 \le r < s^2$, and if

$q = \sum_{j=0}^{k} 2^j + \Delta_k, \qquad 0 \le \Delta_k < 2^{k+1}$

then we may assume that $c_j = s^2 2^j$ for $0 \le j \le k$, $c_{k+1} = s^2 \Delta_k + r$, and $c_j = 0$ for $j > k + 1$. Therefore

$c = s^2 \sum_{j=0}^{k} 2^j + s^2 \Delta_k + r = s^2 (2^{k+1} + t)$   (26)

where

$t = \Delta_k - 1 + \dfrac{r}{s^2}$   (27)

and

$L(y_1^n) \ge s^2 \sum_{j=0}^{k} j 2^j + (k+1)(s^2 \Delta_k + r)$
$\qquad = s^2 \left[ (k-1) 2^{k+1} + 2 \right] + (k+1)(s^2 \Delta_k + r)$
$\qquad = s^2 (k-1)(2^{k+1} + t) + s^2 (k + 3 + 2t)$
$\qquad = (k-1)(c + s^2) + 2 s^2 (t + 2).$   (28)
From (26) we have

$k - 1 = \log \dfrac{c - s^2 t}{4 s^2} = \log \dfrac{c + s^2}{4 s^2} - \log \dfrac{c + s^2}{c - s^2 t}$

which together with (28) yields

$L(y_1^n) \ge (c + s^2) \log \dfrac{c + s^2}{4 s^2} - \Delta$   (29)

where

$\Delta = (c + s^2) \log \dfrac{c + s^2}{c - s^2 t} - 2 s^2 (t + 2).$

Let $\phi = (t + 1) s^2 / (c - s^2 t)$. Then

$\Delta = (c + s^2) \left[ \log (1 + \phi) - \dfrac{2 \phi}{1 + \phi} \right] - 2 s^2$

and by (26) and (27) we have

$\phi = \left( \Delta_k + \dfrac{r}{s^2} \right) 2^{-(k+1)}.$

From the definitions of $\Delta_k$ and $r$ it follows that $0 \le \phi \le 1$, and one can readily verify that over this interval $2 \phi \ge (1 + \phi) \log (1 + \phi)$. Hence $\Delta \le -2 s^2$, which together with (29) yields

$L(y_1^n) \ge (c + s^2) \log \dfrac{c + s^2}{4 s^2} + 2 s^2.$

Dividing both sides of this inequality by $n \log \alpha$, we obtain the bound of Theorem 1.   Q.E.D.

Lemma 1: The limit of (22) exists.

Proof: By (20) and (21), we can write

$(l + m) \hat{H}_{l+m}(x) = -\dfrac{1}{\log \alpha} \limsup_{n \to \infty} \sum_{w \in A^{l+m}} P(x_1^n, w) \log P(x_1^n, w)$

and, writing each $w \in A^{l+m}$ as $w = uv$ with $u \in A^l$ and $v \in A^m$,

$= -\dfrac{1}{\log \alpha} \limsup_{n \to \infty} \sum_{u \in A^l} \sum_{v \in A^m} P(x_1^n, uv) \left[ \log \dfrac{P(x_1^n, uv)}{P(x_1^n, u)} + \log P(x_1^n, u) \right]$

$\le l \hat{H}_l(x) - \dfrac{1}{\log \alpha} \limsup_{n \to \infty} \sum_{u \in A^l} P(x_1^n, u) \sum_{v \in A^m} \dfrac{P(x_1^n, uv)}{P(x_1^n, u)} \log \dfrac{P(x_1^n, uv)}{P(x_1^n, u)}.$

By the convexity of the logarithm function, we can further write

$(l + m) \hat{H}_{l+m}(x) \le l \hat{H}_l(x) - \dfrac{1}{\log \alpha} \limsup_{n \to \infty} \sum_{v \in A^m} P(x_1^n, v) \log P(x_1^n, v)$

which reduces to

$(l + m) \hat{H}_{l+m}(x) \le l \hat{H}_l(x) + m \hat{H}_m(x).$

Hence $l \hat{H}_l$ is subadditive in $l$, and since $0 \le \hat{H}_l(x) \le 1$, the limit of (22) exists.   Q.E.D.

Lemma 2 (Generalized Kraft Inequality): For any given IL encoder $E$ with $s = |S|$ states,

$K \triangleq \sum_{w \in A^l} 2^{-L(w)} \le s^2 \left( 1 + \log \left( 1 + \dfrac{\alpha^l}{s^2} \right) \right)$   (30)

where

$L(w) = \min_{z \in S} \{ L(f(z, w)) \}$   (31)

and $L(f(z, w))$ is the length in bits of $f(z, w)$, the output from $E$ when started in state $z$ and fed with $w$.

Proof: Let $k_j$ denote the number of $w \in A^l$ for which $L(w) = j$. Then $K = \sum_j k_j 2^{-j}$ and $\alpha^l = \sum_j k_j$. By the IL property of $E$, it is clear that $k_j \le s^2 2^j$. It is also clear that to obtain an upper bound on $K$, we may overestimate $k_j$, $j = 0, 1, \cdots$, at the expense of $\sum_{i > j} k_i$, provided the sum of all the $k_j$ remains equal to $\alpha^l$. Thus we can write

$K \le \sum_{j=0}^{m} (s^2 2^j) 2^{-j} = s^2 (m + 1)$   (32)

where $m$ is the integer satisfying

$\sum_{j=0}^{m-1} s^2 2^j \le \alpha^l \le \sum_{j=0}^{m} s^2 2^j.$

Furthermore,

$2^m = 1 + \sum_{j=0}^{m-1} 2^j \le \dfrac{\alpha^l}{s^2} + 1$

which together with (32) yields (30).   Q.E.D.

Proof of Theorem 3: From the definition of $L(w)$ in (31), it is clear that for any IL encoder with $s$ states

$\rho_E(x_1^n) = \dfrac{1}{n \log \alpha} L(f(z_1, x_1^n)) \ge \dfrac{1}{ln \log \alpha} \sum_{i=0}^{n-l} L(f(z_{i+1}, x_{i+1}^{i+l})) \ge \dfrac{1}{ln \log \alpha} \sum_{i=0}^{n-l} L(x_{i+1}^{i+l}).$   (33)

By (19), we can rewrite (33) as

$\rho_{E(s)}(x_1^n) \ge \dfrac{n - l + 1}{ln \log \alpha} \sum_{w \in A^l} P(x_1^n, w) L(w)$

and we obtain

$\rho_{E(s)}(x) \ge \dfrac{1}{l \log \alpha} \limsup_{n \to \infty} \sum_{w \in A^l} P(x_1^n, w) L(w).$

By (20) and (21), we have

$\hat{H}_l(x) - \rho_{E(s)}(x) \le \dfrac{1}{l \log \alpha} \limsup_{n \to \infty} \sum_{w \in A^l} P(x_1^n, w) \left( \log \dfrac{1}{P(x_1^n, w)} - L(w) \right)$

$= \dfrac{1}{l \log \alpha} \limsup_{n \to \infty} \sum_{w \in A^l} P(x_1^n, w) \log \dfrac{2^{-L(w)}}{P(x_1^n, w)}$
which by the convexity of the logarithm function and Lemma 2 reduces to

$\hat{H}_l(x) - \rho_{E(s)}(x) \le \dfrac{1}{l \log \alpha} \limsup_{n \to \infty} \log \sum_{w \in A^l} 2^{-L(w)} \le \dfrac{1}{l \log \alpha} \log \left[ s^2 \left( 1 + \log \left( 1 + \dfrac{\alpha^l}{s^2} \right) \right) \right].$   (34)

Taking the limit as $l$ approaches infinity, we obtain

$\hat{H}(x) - \rho_{E(s)}(x) \le 0$

and since (34) holds for every finite $s$, we have

$\hat{H}(x) \le \rho(x)$   (35)

for every infinite sequence $x$. Using Huffman's coding scheme for input blocks of length $l$, it is easy to show [6] that

$\rho(x) \le \hat{H}_l(x) + \dfrac{1}{l \log \alpha}$

which when $l$ tends to infinity becomes

$\rho(x) \le \hat{H}(x)$   (36)

for all $x$. Combining (35) with (36) completes the proof.   Q.E.D.

Proof of Theorem 4: Since $x$ is drawn from an ergodic source, it follows that for every positive integer $l$ and for every $w \in A^l$,

$\Pr [P(x, w) = \Pr (w)] = 1$

where $P(x, w) = \lim_{n \to \infty} P(x_1^n, w)$ and $\Pr (w)$ is the probability measure of $w$. Therefore, if

$H_l = -\dfrac{1}{l \log \alpha} \sum_{w \in A^l} \Pr (w) \log \Pr (w)$

we obtain

$\Pr [\hat{H}_l(x) = H_l] = 1$

which when $l$ approaches infinity becomes

$\Pr [\hat{H}(x) = H] = 1.$

From this and Theorem 3, we obtain Theorem 4.   Q.E.D.

ACKNOWLEDGMENT

The authors are thankful to M. Cohn, H. D. Shapiro, D. Slepian, and A. D. Wyner for many helpful discussions.
REFERENCES

[1] J. Ziv, "Coding theorems for individual sequences," IEEE Trans. Inform. Theory, vol. IT-24, pp. 405-412, July 1978.
[2] S. Even, "Generalized automata and their information losslessness," in Switching Circuit Theory and Logical Design, AIEE Special Publ. S-141, pp. 144-147, 1961.
[3] S. Even, "On information lossless automata of finite order," IEEE Trans. Electronic Computers, vol. EC-14, pp. 561-569, Aug. 1965.
[4] A. Lempel and J. Ziv, "On the complexity of finite sequences," IEEE Trans. Inform. Theory, vol. IT-22, pp. 75-81, Jan. 1976.
[5] J. Ziv and A. Lempel, "A universal algorithm for sequential data compression," IEEE Trans. Inform. Theory, vol. IT-23, pp. 337-343, May 1977.
[6] R. G. Gallager, Information Theory and Reliable Communication. New York: Wiley, 1968.