IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. IT-23, NO. 3, MAY 1977

A Universal Algorithm for Sequential Data Compression

JACOB ZIV, FELLOW, IEEE, AND ABRAHAM LEMPEL, MEMBER, IEEE

Abstract—A universal algorithm for sequential data compression is presented. Its performance is investigated with respect to a nonprobabilistic model of constrained sources. The compression ratio achieved by the proposed universal code uniformly approaches the lower bounds on the compression ratios attainable by block-to-variable codes and variable-to-block codes designed to match a completely specified source.
I. INTRODUCTION

IN MANY situations arising in digital communications and data processing, the encountered strings of data display various structural regularities or are otherwise subject to certain constraints, thereby allowing for storage and time-saving techniques of data compression. Given a discrete data source, the problem of data compression is first to identify the limitations of the source, and second to devise a coding scheme which, subject to certain performance criteria, will best compress the given source.
Once the relevant source parameters have been identified, the problem reduces to one of minimum-redundancy coding. This phase of the problem has received extensive treatment in the literature [1]-[7].

When no a priori knowledge of the source characteristics is available, and if statistical tests are either impossible or unreliable, the problem of data compression becomes considerably more complicated. In order to overcome these difficulties one must resort to universal coding schemes whereby the coding process is interlaced with a learning process for the varying source characteristics [8], [9]. Such coding schemes inevitably require a larger working memory space and generally employ performance criteria that are appropriate for a wide variety of sources.
In this paper, we describe a universal coding scheme which can be applied to any discrete source and whose performance is comparable to certain optimal fixed code book schemes designed for completely specified sources. For lack of adequate criteria, we do not attempt to rank the proposed scheme with respect to other possible universal coding schemes. Instead, for the broad class of sources defined in Section III, we derive upper bounds on the compression efficiency attainable with full a priori knowledge of the source by fixed code book schemes, and
Manuscript received June 23, 1975; revised July 6, 1976. Paper previously presented at the IEEE International Symposium on Information Theory, Ronneby, Sweden, June 21-24, 1976.
J. Ziv was with the Department of Electrical Engineering, Technion-Israel Institute of Technology, Haifa, Israel. He is now with the Bell Telephone Laboratories, Murray Hill, NJ 07974.
A. Lempel was with the Department of Electrical Engineering, Technion-Israel Institute of Technology, Haifa, Israel. He is now with the Sperry Research Center, Sudbury, MA 01776.
then show that the efficiency of our universal code with no a priori knowledge of the source approaches those bounds.

The proposed compression algorithm is an adaptation of a simple copying procedure discussed recently [10] in a study on the complexity of finite sequences. Basically, we employ the concept of encoding future segments of the source output via maximum-length copying from a buffer containing the recent past output. The transmitted codeword consists of the buffer address and the length of the copied segment. With a predetermined initial load of the buffer and the information contained in the codewords, the source data can readily be reconstructed at the decoding end of the process.

The main drawback of the proposed algorithm is its susceptibility to error propagation in the event of a channel error.
II. THE COMPRESSION ALGORITHM

The proposed compression algorithm consists of a rule for parsing strings of symbols from a finite alphabet A into substrings, or words, whose lengths do not exceed a prescribed integer L_s, and a coding scheme which maps these substrings sequentially into uniquely decipherable codewords of fixed length L_c over the same alphabet A.

The word-length bounds L_s and L_c allow for bounded-delay encoding and decoding, and they are related by

    L_c = 1 + ⌈log (n − L_s)⌉ + ⌈log L_s⌉,    (1)

where ⌈x⌉ is the least integer not smaller than x, the logarithm base is the cardinality α of the alphabet A, and n is the length of a buffer, employed at the encoding end of the process, which stores the latest n symbols emitted by the source. The exact relationship between n and L_s is discussed in Section III. Typically, n ≈ L_s α^{hL_s}, where 0 < h < 1. For on-line decoding, a buffer of similar length has to be employed at the decoding end also.
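As a quick check of (1), here is a short integer-arithmetic sketch (our own illustration, not part of the original text) using the parameters of the ternary example developed later in this section, α = 3, n = 18, L_s = 9; it gives L_c = 5, i.e., two pointer digits, two length digits, and one symbol per codeword:

    def ceil_log(x, base):
        # least k with base**k >= x, i.e., the ceiling in (1), without float error
        k, p = 0, 1
        while p < x:
            k, p = k + 1, p * base
        return k

    alpha, n, Ls = 3, 18, 9                  # parameters of the worked example below
    Lc = 1 + ceil_log(n - Ls, alpha) + ceil_log(Ls, alpha)
    print(Lc)                                # 5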
To describe the exact mechanics of the parsing and coding procedures, we need some preparation by way of notation and definitions.

Consider a finite alphabet A of α symbols, say A = {0,1,⋯,α−1}. A string, or word, S of length ℓ(S) = k over A is an ordered k-tuple S = s_1 s_2 ⋯ s_k of symbols from A. To indicate a substring of S which starts at position i and ends at position j, we write S(i,j). When i ≤ j, S(i,j) = s_i s_{i+1} ⋯ s_j, but when i > j, we take S(i,j) = Λ, the null string of length zero.

The concatenation of strings Q and R forms a new string S = QR; if ℓ(Q) = k and ℓ(R) = m, then ℓ(S) = k + m, Q = S(1,k), and R = S(k+1, k+m). For each j, 0 ≤ j ≤ ℓ(S), S(1,j) is called a prefix of S.
[A page of the original is missing from this copy; it contains the remaining definitions, the statement of the encoding cycle, and the beginning of the worked example over A = {0,1,2} with buffer length n = 18 and maximal word-length L_s = 9. The text resumes mid-example:]
which matches a substring of B_1 that starts in position p_1 ≤ 9, and then set S_1 = B_1(10, 9 + ℓ_1). It is easily seen that the longest match in this case is B_1(10,11) = 00, and hence S_1 = 001 and ℓ_1 = 3. The pointer p_1 for this step can be any integer between one and nine; we choose to set p_1 = 9. The two-digit radix-3 representation of p_1 − 1 is C_{11} = 22, and that of ℓ_1 − 1 is C_{12} = 02. Since C_{i3} is always equal to the last symbol of S_i, the codeword for S_1 is given by C_1 = 22021.

To obtain the buffer load B_2 for the second step, we shift out the first ℓ_1 = 3 digits of B_1 and feed in the next 3 digits S(10,12) = 210 of the input string S. The details of steps 2, 3, and 4 are tabulated below, where each row lists the buffer load B_i and the resulting codeword C_i; in each B_i the source word S_i starts at position 10, and the pointer position p_i can be read off the first two digits of C_i:
    B_2 = 000000001010210210,    C_2 = 21102
    B_3 = 000010102102102120,    C_3 = 20212
    B_4 = 210210212021021200,    C_4 = 02220.
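To make the mechanics concrete, the following Python sketch reconstructs the parsing and coding rule from this example (the formal statement of the encoding cycle falls on the page missing from this copy, so details such as the tie-breaking toward larger pointers and the end-of-message handling are inferred; all names are ours). Each cycle finds the longest match, of at most L_s − 1 symbols, that begins in the first n − L_s buffer positions and agrees with a prefix of the last L_s positions, takes that match plus one more symbol as the source word S_i, and emits the radix-α codeword C_i; the decoder inverts the process with a character-by-character copy, which correctly handles matches that overlap the word being built:

    def ceil_log(x, base):                    # least k with base**k >= x
        k, p = 0, 1
        while p < x:
            k, p = k + 1, p * base
        return k

    ALPHA, N, LS = 3, 18, 9                   # alphabet size, buffer length, max word length
    W = N - LS                                # a match may start in positions 1..W
    LP = ceil_log(N - LS, ALPHA)              # pointer digits per codeword
    LL = ceil_log(LS, ALPHA)                  # length digits per codeword

    def to_radix(x, width):                   # fixed-width radix-ALPHA digits of x
        out = ""
        for _ in range(width):
            out = str(x % ALPHA) + out
            x //= ALPHA
        return out

    def encode(source):                       # assumes len(source) >= LS
        buf = "0" * W + source[:LS]           # initial buffer load B_1
        done, codewords = 0, []               # `done` counts encoded source symbols
        while done < len(source):
            best_p = best_m = 0
            for p in range(W):                # longest match starting in the window;
                m = 0                         # it may run past position W (overlap)
                while (m < LS - 1 and W + m < len(buf)
                       and buf[p + m] == buf[W + m]):
                    m += 1
                if m >= best_m:               # ties go to the largest p, as with p_1 = 9
                    best_p, best_m = p, m
            l = best_m + 1                    # source word = longest match + one symbol
            codewords.append(to_radix(best_p, LP) + to_radix(best_m, LL)
                             + buf[W + l - 1])
            buf = buf[l:] + source[LS + done:LS + done + l]
            done += l
        return codewords

    def decode(codewords):
        hist, words = "0" * W, []             # the predetermined initial load
        for c in codewords:
            p = int(c[:LP], ALPHA)            # p_i - 1
            m = int(c[LP:LP + LL], ALPHA)     # l_i - 1
            word = ""
            for i in range(m):                # copy may overlap the word being built
                word += (hist + word)[p + i]
            word += c[-1]                     # C_i3, the last symbol of S_i
            words.append(word)
            hist = (hist + word)[-W:]
        return "".join(words)

    S = "001010210210212021021200"
    C = encode(S)
    print(C)                                  # ['22021', '21102', '20212', '02220']
    print(decode(C) == S)                     # True

The four five-digit codewords carry the 24 source digits, a ratio of 20/24; on such a short string this says little, but Section III shows that with a properly chosen buffer the ratio approaches the source parameter h(L_s − 1).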
III. COMPRESSION OF CONSTRAINED SOURCES

In this section, we investigate the performance of the proposed compression algorithm with respect to a nonprobabilistic model of constrained information sources. After defining the source model, we derive lower bounds on the compression ratios attainable by block-to-variable and variable-to-block codes under full knowledge of the source, and then show that the compression ratio achieved by our universal code approaches these bounds.

A. Definition of the Source Model

Let A = {0,1,⋯,α−1} be the given α-symbol alphabet, and let A* denote the set of all finite strings over A. Given a string S ∈ A* and a positive integer m ≤ ℓ(S), let S{m} denote the set of all substrings of length m contained in S, and let S(m) denote the cardinality of S{m}. That is,
    S{m} = ∪_{i=0}^{ℓ(S)−m} {S(i+1, i+m)}

and

    S(m) = |S{m}|.
Given a subset σ of A*, let

    σ{m} = {S ∈ σ | ℓ(S) = m},

and let σ(m) denote the cardinality of σ{m}.

A subset σ of A* is called a source if the following three properties hold:

1) A ⊂ σ (i.e., σ contains all the unit length strings),
2) S ∈ σ implies SS ∈ σ,
3) S ∈ σ implies S{m} ⊂ σ{m}.
Typically, such a source σ is defined by specifying a finite set of strings over A which are forbidden to appear as substrings of elements belonging to σ, and therefore σ(m) < α^m for all m exceeding some m_0.

With every source σ, we associate a sequence h(1), h(2), ⋯ of parameters, called the h-parameters of σ, where¹

    h(m) = (1/m) log σ(m),    m = 1, 2, ⋯.    (2)

¹ Throughout this paper, log x means the base-α logarithm of x.

It is clear that 0 ≤ h(m) ≤ 1 for all m and, by 2), it is also clear that mh(m) is a nondecreasing function of m. The sequence of h-parameters, however, is usually nonincreasing in m. To avoid any possible confusion in the sequel, we postulate this property as an additional defining property of a source. Namely, we require

4) h(m) = (1/m) log σ(m) is a nonincreasing function of m.
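As a concrete illustration of how the count σ(m) and the parameter h(m) of (2) behave for a forbidden-substring constraint (our example, not the paper's): take α = 2 and forbid the substring 11. Then σ(m) obeys a Fibonacci-like recursion, σ(m) < 2^m for m ≥ 2, and h(m) is easily tabulated:

    import math

    def sigma(m):                      # binary strings of length m with no "11"
        a, b = 1, 2                    # counts for lengths 0 and 1
        for _ in range(m):
            a, b = b, a + b
        return a

    def h(m, alpha=2):                 # h(m) = (1/m) log_alpha sigma(m), as in (2)
        return math.log(sigma(m), alpha) / m

    for m in (1, 2, 4, 8, 16, 32):
        print(m, sigma(m), round(h(m), 4))
    # h(m) decreases toward log_2 of the golden ratio, about 0.694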
B. Some Lower Bounds on the Compression Ratio

Consider a compression coding scheme for a source σ which employs a block-to-variable (BV) code book of M pairs (X_i, Y_i) of words over A, with ℓ(X_i) = L for i = 1, 2, ⋯, M. The encoding of an infinitely long string S ∈ σ by such a code is carried out by first parsing S into blocks of length L, and then replacing each block X_i by the corresponding codeword Y_i. It is assumed, of course, that the code book is exhaustive with respect to σ and uniquely decipherable [2]. Hence, we must have

    {X_i}_{i=1}^{M} ⊇ σ{L},  or  M ≥ σ(L) = α^{Lh(L)},    (3)

and

    max_{1≤i≤M} ℓ(Y_i) ≥ log M ≥ Lh(L).    (4)

The compression ratio ρ_i associated with the ith word-pair of the code is given by

    ρ_i = ℓ(Y_i) / L.
The BV compression ratio ρ_BV(σ,M) of the source σ is defined as the minimax value of ρ_i, where the maximization is over all word-pairs of a given code, and the minimization is over the set C_BV(σ,M) of all BV code books consisting of M word-pairs. Thus,

    ρ_BV(σ,M) = min_{C_BV(σ,M)} max_{1≤i≤M} {ℓ(Y_i)/L} ≥ (log M)/L = h(L),

where L is such that Lh(L) = log M.
For later reference, we record this result in the following lemma.

Lemma 1: ρ_BV(σ,M) ≥ h(L), where Lh(L) = log M.
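As a numerical instance (our illustration): for the binary no-11 constraint sketched above, a code book of M = 2^12 word-pairs forces blocks of length L with Lh(L) ≈ 12, so by Lemma 1 the compression ratio cannot fall much below h(16) ≈ 0.708. Since σ(L) = M rarely holds exactly for integer L, the sketch takes the largest L with σ(L) ≤ M:

    import math

    def sigma(m):                          # as in the earlier sketch: no "11"
        a, b = 1, 2
        for _ in range(m):
            a, b = b, a + b
        return a

    M = 2 ** 12                            # code book size
    L = 1
    while sigma(L + 1) <= M:               # largest block length with sigma(L) <= M,
        L += 1                             # so that L*h(L) = log sigma(L) ~ log M
    print(L, math.log(sigma(L), 2) / L)    # 16, h(16) ~ 0.708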
Now, consider a compression scheme which employs a variable-to-block (VB) code book of M word-pairs (X_i, Y_i), with ℓ(Y_i) = L for all i = 1, 2, ⋯, M. In this case, the compression ratio associated with the ith word-pair is given by

    ρ_i = L / ℓ(X_i),

and similarly, the VB compression ratio ρ_VB(σ,M) of σ is defined as the minimax value of ρ_i over all word-pairs and over the set C_VB(σ,M) of VB code books with M word-pairs.

Lemma 2: ρ_VB(σ,M) ≥ h(L_M), where L_M = max {ℓ | M ≥ σ(ℓ)}.

Proof: We may assume, without loss of generality, that in every code under consideration

    ℓ(X_1) ≤ ℓ(X_2) ≤ ⋯ ≤ ℓ(X_M),

and hence for each C ∈ C_VB(σ,M),

    max_i ρ_i(C) = L(C) / ℓ(X_1),    (5)

where L(C) is the common codeword length of C. Since C is exhaustive with respect to σ, we have

    M ≥ σ(ℓ_1),    (6)

where ℓ_1 = ℓ(X_1); and since C is uniquely decipherable, we have

    L(C) ≥ log M.    (7)
From the definition of L_M, inequality (6), and the nondecreasing property of σ(ℓ), we obtain

    ℓ_1 ≤ L_M.    (8)

From (5), (7), and (8), we have

    max_i ρ_i(C) ≥ (log M) / L_M,

and since

    M ≥ σ(L_M) = α^{L_M h(L_M)},

it follows that for each C ∈ C_VB(σ,M),

    max_i ρ_i(C) ≥ h(L_M).    Q.E.D.

Remarks:

1) Since the value of L in the context of Lemma 1 satisfies the definition of L_M as given in Lemma 2, it follows that the bounds of both lemmas are essentially the same, despite the basic difference between the respective coding schemes.

2) By 2), the second defining property of a source, the per-word bounds derived above apply to indefinitely long messages as well, since the whole message may consist of repeated appearances of the same worst case word.

3) By 4), the nonincreasing property of the h-parameters, the form of the derived bounds confirms the intuitive expectation that an increase in the size M of the employed code book causes a decrease in the lower bound on the attainable compression ratio.

C. Performance of the Proposed Algorithm

We proceed now to derive an upper bound on the compression ratio attainable by the algorithm of Section II. To this end, we consider the worst case source message of length n − L_s, where n is the prospective buffer length and L_s is the maximal word-length. The bound obtained for this case will obviously apply to all messages of length n − L_s or greater.

First, we assume that only the h-parameters of the source under consideration are known to the designer of the encoder. Later, when we discuss the universal performance of the proposed algorithm, we will show that even this restricted a priori knowledge of the source is actually unessential.

We begin by choosing the buffer length n to be an integer of the form

    n = Σ_{m=1}^{λ} m α^m + Σ_{m=λ+1}^{ℓ} m σ(ℓ) + (ℓ+1)(N_{ℓ+1} + 1),    (9)

where

    N_{ℓ+1} = Σ_{m=1}^{λ} (ℓ − m) α^m + Σ_{m=λ+1}^{ℓ} (ℓ − m) σ(ℓ),    (10)

λ = ⌊ℓh(ℓ)⌋ is the integer part of log σ(ℓ) = ℓh(ℓ), and

    ℓ = L_s − 1.    (11)

The specific value of the parameter L_s is left for later determination (see the first remark following the proof of Theorem 1). The reasoning that motivates the given form of n will become clear from subsequent derivations.
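For a feel of the magnitudes involved (our sketch; the paper gives no numerical example here), the choice (9)-(11) can be evaluated directly once α and h(ℓ) are given, taking σ(ℓ) = α^{ℓh(ℓ)} from (2):

    import math

    def buffer_length(alpha, Ls, h_l):
        # Evaluate (9)-(11) for a source with h(l) = h_l at l = L_s - 1.
        # sigma(l) = alpha**(l*h(l)) from (2) need not be an integer here,
        # so the result is rounded up to get an integer buffer length.
        l = Ls - 1                                                  # (11)
        lam = int(l * h_l)                                          # lambda = floor(l*h(l))
        sig = alpha ** (l * h_l)                                    # sigma(l)
        N = (sum((l - m) * alpha ** m for m in range(1, lam + 1))
             + sum((l - m) * sig for m in range(lam + 1, l + 1)))   # (10)
        n = (sum(m * alpha ** m for m in range(1, lam + 1))
             + sum(m * sig for m in range(lam + 1, l + 1))
             + (l + 1) * (N + 1))                                   # (9)
        return math.ceil(n)

    print(buffer_length(alpha=2, Ls=10, h_l=0.5))   # a binary source with h(9) = 0.5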
Consider a string S ∈ σ{n − L_s}, and let N(S) denote the number of words into which S is parsed by the algorithm during the encoding process. Recalling that each of these words is mapped into a codeword of fixed length L_c (see (1)), it follows that the compression ratio ρ(S) associated with the string S is given by

    ρ(S) = (L_c / (n − L_s)) N(S).
Hence, the compression ratio ρ attainable by the algorithm for a given source σ is

    ρ = (L_c / (n − L_s)) N,    (12)

where

    N = max_{S ∈ σ{n−L_s}} N(S).

Let Q ∈ σ{n − L_s} be such that N(Q) = N, and suppose that the algorithm parses Q into Q = Q_1 Q_2 ⋯ Q_N. From step 2) of the encoding cycle it follows that if ℓ(Q_i) = ℓ(Q_j) < L_s for some i < j < N, then Q_i ≠ Q_j. (Note that when Q_j is being determined at the jth cycle of the encoding process, all of Q_i is still stored in the buffer, and since ℓ(Q_j) < L_s, the longest substring in the buffer that precedes Q_j and is a prefix of Q_j must be of length ℓ(Q_j) − 1.)

Denoting by K_m the number of Q_i, 1 ≤ i ≤ N − 1, of length m, 1 ≤ m ≤ L_s, we have

    N = 1 + Σ_{m=1}^{L_s} K_m.    (13)

By the above argument, and by property 3) of the source, we have

    K_m ≤ σ(m),  for 1 ≤ m ≤ ℓ = L_s − 1.

Since

    n − L_s = ℓ(Q_N) + Σ_{m=1}^{ℓ+1} m K_m,

and n and L_s are both fixed, it is clear that by overestimating the values of K_m for 1 ≤ m ≤ ℓ at the expense of K_{ℓ+1}, we can only overestimate the value of N. Therefore, since σ(m) ≤ σ(m + 1) and σ(m) = α^{mh(m)} ≤ α^m, we obtain

    N ≤ K'_{ℓ+1} + 1 + Σ_{m=1}^{ℓ} K'_m = N',

where

    K'_m = α^m for 1 ≤ m ≤ λ = ⌊ℓh(ℓ)⌋,  and  K'_m = σ(ℓ) for λ < m ≤ ℓ,    (14)

and

    K'_{ℓ+1} = (1/(ℓ+1)) [n − L_s − Σ_{m=1}^{ℓ} m K'_m] − 1.    (15)

From (14), (15), and (9), we obtain K'_{ℓ+1} = N_{ℓ+1} − 1, and

    N' = N_{ℓ+1} + Σ_{m=1}^{λ} α^m + Σ_{m=λ+1}^{ℓ} σ(ℓ),

which, together with (9) and (10), yields

    n − L_s − ℓN' = Σ_{m=1}^{λ} m α^m + Σ_{m=λ+1}^{ℓ} m σ(ℓ) + N_{ℓ+1} − ℓ [Σ_{m=1}^{λ} α^m + Σ_{m=λ+1}^{ℓ} σ(ℓ)] = 0,

or

    N ≤ N' = (n − L_s) / (L_s − 1).    (16)

Hence, from (12) and (16), we have

    ρ ≤ L_c / (L_s − 1).    (17)

Note that despite the rather rudimentary overestimation of N by N', the upper bound of (17) is quite tight, since the fact that no source word is longer than L_s immediately implies ρ ≥ L_c / L_s.

We can state now the following result.

Theorem 1: If the buffer length n for a source with known h-parameters is chosen according to (9), then

    ρ ≤ h(L_s − 1) + ε(L_s),

where

    ε(L_s) = (1/(L_s − 1)) [3 + 3 log (L_s − 1) + log (L_s/2)].

Proof: From (1) we have

    L_c = 1 + ⌈log L_s⌉ + ⌈log (n − L_s)⌉ ≤ 3 + log (L_s − 1) + log (n − L_s).

From (9) and (10) we obtain

    n − L_s = Σ_{m=1}^{λ} m α^m + Σ_{m=λ+1}^{ℓ} m σ(ℓ) + (ℓ+1) N_{ℓ+1},

and since α^m ≤ σ(ℓ) for 1 ≤ m ≤ λ, we have

    n − L_s ≤ ℓ σ(ℓ) Σ_{m=1}^{ℓ} (ℓ − m + 1) = (1/2) ℓ² (ℓ + 1) σ(ℓ),

or

    log (n − L_s) ≤ 2 log ℓ + log ((ℓ + 1)/2) + ℓh(ℓ).

Since ℓ = L_s − 1, we obtain

    L_c ≤ 3 + 3 log (L_s − 1) + log (L_s/2) + (L_s − 1) h(L_s − 1),

or

    L_c ≤ (L_s − 1) [h(L_s − 1) + ε(L_s)].

Substituting this result into (17), we obtain the bound of Theorem 1.    Q.E.D.

Remarks:

1) The value of ε(L_s) decreases with L_s and, consequently, the compression ratio ρ approaches the value of h(L_s − 1), the h-parameter associated with the second largest word-length processed by the encoder. Given any δ > 0, one can always find the least integer ℓ_δ such that ρ − h(ℓ_δ − 1) ≤ δ. The magnitude of the acceptable
deviation δ determines the operational range of L_s, namely, L_s ≥ ℓ_δ.
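To see how fast ε(L_s) decays (our numerical illustration), the bound of Theorem 1 can be tabulated for a binary alphabet:

    import math

    def eps(Ls, alpha=2):
        # epsilon(L_s) = (3 + 3 log(L_s - 1) + log(L_s / 2)) / (L_s - 1)
        log = lambda x: math.log(x, alpha)
        return (3 + 3 * log(Ls - 1) + log(Ls / 2)) / (Ls - 1)

    for Ls in (8, 32, 128, 512):
        print(Ls, round(eps(Ls), 3))
    # 8 1.917 / 32 0.705 / 128 0.236 / 512 0.074: the gap to h(L_s - 1)
    # shrinks roughly like (3 log L_s) / L_s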
2) Since our code maps source words of variable length at most L_s into codewords of fixed length L_c, we adopt as a reference for comparison the best VB code C_VB discussed in Subsection III-B. The counterpart of our block-length L_c is L(C) of (5), and by (7) we can write

    L_c ≈ log M ≈ L_M h(L_M),    (18)

where M is the code book size and h(L_M) is the lower bound (see Lemma 2) on the compression ratio ρ_VB attainable by C_VB. Since for sufficiently large L_s we have also

    L_c ≈ (L_s − 1) ρ ≈ (L_s − 1) h(L_s − 1),    (19)

it follows that ρ ≈ ρ_VB.

We turn now to investigate the universal performance of the proposed algorithm, i.e., the performance when no a priori knowledge of the source to be compressed is available.

Given δ_1 and h_1 such that 0 < δ_1 < h_1 < 1, let ℓ_1 be the least positive integer satisfying

    δ_1 ≥ ε(ℓ_1 + 1) = (1/ℓ_1) [3 + 3 log ℓ_1 + log ((ℓ_1 + 1)/2)],

and let K = α^{ℓ_1 h_1} and λ_1 = ⌊ℓ_1 h_1⌋. Consider the encoder E_1 which employs a buffer of length n_1 = n(h_1, ℓ_1), obtained from (9) and (10) by setting ℓ = ℓ_1, λ = λ_1, and σ(ℓ) = K, and whose maximal word-length L_s is equal to ℓ_1 + 1.

It follows from Theorem 1 that if this encoder is applied to a source σ_1 such that h_{σ_1}(ℓ_1) = h_1, then the resulting compression ratio ρ_1(σ_1) satisfies

    ρ_1(σ_1) ≤ h_{σ_1}(ℓ_1) + δ_1 = h_1 + δ_1.    (20)
Suppose now that h_1 and δ_1 were chosen to fit the prescribed upper value ρ_1 = h_1 + δ_1 of a prospective compression ratio range (ρ_0, ρ_1) with

    0 < δ_1 R < ρ_0 < ρ_1 < 1,  R > 1,    (21)

where R is an adjustment parameter to be determined later. As shown above, the encoder E_1 is then matched exactly to the upper value ρ_1 of the prospective compression range. In order to accommodate the whole given range, we propose to employ a slightly larger encoder E_0 whose buffer is of length n_0 = n_1 − ℓ_1 + ℓ_0, where n_1 and ℓ_1 are the parameters of E_1, and where ℓ_0 is an integer, greater than ℓ_1, for which the solution h_0 of the equation

    n_1 − ℓ_1 + ℓ_0 = n(h_0, ℓ_0)    (22)

satisfies

    ρ_0 − ε(ℓ_0) < h_0 ≤ ρ_0 − ε(ℓ_0 + 1).    (23)

Noting that n_0 − ℓ_0 = n_1 − ℓ_1 is fixed, it is clear from (9) and (10) that as ℓ_0 increases h_0 decreases; also, (21) and the fact that ℓ_0 > ℓ_1 imply that ρ_0 − ε(ℓ_0 + 1) > ρ_0 − ε(ℓ_1 + 1) ≥ ρ_0 − δ_1 > 0. Hence, there exist h_0 > 0 and ℓ_0 > ℓ_1 that satisfy both (22) and (23).
In analogy with (20), it is also clear that if E_0 is applied to a source σ_0 such that h_{σ_0}(ℓ_0) = h_0, then the resulting compression ratio ρ_0(σ_0) satisfies

    ρ_0(σ_0) ≤ h_{σ_0}(ℓ_0) + δ_0 = h_0 + δ_0,    (24)

where δ_0 = ε(ℓ_0 + 1). From (23) and (24), we have ρ_0(σ_0) ≤ ρ_0 − ε(ℓ_0 + 1) + δ_0 = ρ_0, and

    0 ≤ ρ_0 − h_0 < ε(ℓ_0) ≤ ε(ℓ_1 + 1) ≤ δ_1 < (1/R) ρ_0.

Hence, h_0 can be made arbitrarily close to ρ_0, and consequently, the encoder E_0 matches as closely as desired the lower end ρ_0 of the given compression range.

Theorem 2: Let σ be a source for which a matched encoder E achieves a compression ratio ρ(σ) within the range (ρ_0, ρ_1). Then the compression ratio ρ_0(σ), achieved for σ by E_0, satisfies

    ρ_0(σ) ≤ ρ(σ) + Δ,

where

    Δ ≤ (1/ℓ_1) ⌈log d⌉,  d = max {h_1/h_0, 1/(1 − h_0)}.

(Typically, h_1/h_0 > 1/(1 − h_0) and d = h_1/h_0.)

Proof: To prove the theorem we shall consider the obviously worst case of applying E_0 to the source σ_1 whose matched encoder E_1 realizes ρ_1. Let ρ_0(σ_1) denote the compression ratio achievable by E_0 when applied to σ_1. According to (12), we have

    ρ_0(σ_1) = (L_{c0} / (n_0 − ℓ_0 − 1)) N_0(σ_1),

where

    L_{ci} = 1 + ⌈log (ℓ_i + 1)⌉ + ⌈log (n_i − ℓ_i − 1)⌉,  i ∈ {0,1},

and N_i(σ_j), i, j ∈ {0,1}, is the maximum number of words into which a string S ∈ σ_j{n_i − ℓ_i − 1} is parsed by E_i.

Since n_0 − ℓ_0 = n_1 − ℓ_1 and ℓ_0 > ℓ_1, it is easy to verify that² N_0(σ_1) ≤ N_1(σ_1). Also, by (16),

    N_1(σ_1) ≤ N'_1(σ_1) = (n_1 − (ℓ_1 + 1)) / ℓ_1 = (n_0 − (ℓ_0 + 1)) / ℓ_1.

Hence

    ρ_0(σ_1) ≤ L_{c0}/ℓ_1 = L_{c1}/ℓ_1 + (L_{c0} − L_{c1})/ℓ_1 ≤ ρ_1 + (L_{c0} − L_{c1})/ℓ_1,

and since

    L_{c0} − L_{c1} = ⌈log (ℓ_0 + 1)⌉ − ⌈log (ℓ_1 + 1)⌉ ≤ ⌈log k⌉,

² The assertion here is analogous to that of [10, Theorem 1].
where k = ℓ_0/ℓ_1, we obtain

    ρ_0(σ_1) ≤ ρ_1 + (1/ℓ_1) ⌈log k⌉.    (25)

To obtain an upper bound on k, we first observe that

    Σ_{m=λ_0+1}^{ℓ_0} (ℓ_0 − m + 1) α^{ℓ_0 h_0} ≤ n_0 − ℓ_0 = n_1 − ℓ_1 ≤ α Σ_{m=1}^{ℓ_1} (ℓ_1 − m + 1) α^{ℓ_1 h_1}

(λ_0 = ⌊ℓ_0 h_0⌋), which reduces to

    k² (1 − h_0)² ≤ α^{ℓ_1 h_1 (1 − k(h_0/h_1))}.    (26)

Now, either k(1 − h_0) < 1, or else the exponent on the right side of (26) must be nonnegative and thus k ≤ h_1/h_0. In either case,

    k ≤ max {h_1/h_0, 1/(1 − h_0)} = d.    Q.E.D.
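Numerically (our illustration), the range penalty of Theorem 2 is small; for instance, covering a binary-alphabet design range with h_0 = 0.3, h_1 = 0.6, and ℓ_1 = 50 costs at most a few hundredths in compression ratio:

    import math

    def delta_bound(h0, h1, l1, alpha=2):
        # Delta <= (1/l_1) * ceil(log_alpha d), d = max{h1/h0, 1/(1 - h0)}
        d = max(h1 / h0, 1 / (1 - h0))
        return math.ceil(math.log(d, alpha)) / l1

    print(delta_bound(h0=0.3, h1=0.6, l1=50))   # 0.02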
Theorems 1 and 2 demonstrate the efficiency and universality of the proposed algorithm. They show that an encoder designed to operate over a prescribed compression range performs, practically, as well as one designed to match a specific compression ratio within the given range. Moreover, for a given complexity, i.e., a given codeword length, the compression efficiency is comparable to that of an optimal variable-to-block code book designed to match a given source.

REFERENCES

[1] D. A. Huffman, "A method for the construction of minimum-redundancy codes," Proc. IRE, vol. 40, pp. 1098-1101, 1952.
[2] R. M. Karp, "Minimum-redundancy coding for the discrete noiseless channel," IRE Trans. Inform. Theory, vol. IT-7, pp. 27-38, Jan. 1961.
[3] B. F. Varn, "Optimal variable length codes," Inform. Contr., vol. 19, pp. 289-301, 1971.
[4] Y. Perl, M. R. Garey, and S. Even, "Efficient generation of optimal prefix code: Equiprobable words using unequal cost letters," J. ACM, vol. 22, pp. 202-214, Apr. 1975.
[5] A. Lempel, S. Even, and M. Cohn, "An algorithm for optimal prefix parsing of a noiseless and memoryless channel," IEEE Trans. Inform. Theory, vol. IT-19, pp. 208-214, Mar. 1973.
[6] F. Jelinek and K. S. Schneider, "On variable length to block coding," IEEE Trans. Inform. Theory, vol. IT-18, pp. 765-774, Nov. 1972.
[7] R. G. Gallager, Information Theory and Reliable Communication. New York: Wiley, 1968.
[8] J. Ziv, "Coding of sources with unknown statistics-Part I: Probability of encoding error," IEEE Trans. Inform. Theory, vol. IT-18, pp. 384-394, May 1972.
[9] L. D. Davisson, "Universal noiseless coding," IEEE Trans. Inform. Theory, vol. IT-19, pp. 783-795, Nov. 1973.
[10] A. Lempel and J. Ziv, "On the complexity of finite sequences," IEEE Trans. Inform. Theory, vol. IT-22, pp. 75-81, Jan. 1976.
[11] B. M. Fitingof, "Optimal coding in the case of unknown and changing message statistics," Prob. Inform. Transm., vol. 2, pp. 3-11, 1966.
On Binary Sliding Block Codes

TOBY BERGER, SENIOR MEMBER, IEEE, AND JOSEPH KA-YIN LAU
Abstract—Sliding block codes are an intriguing alternative to the block codes used in the development of classical information theory. The fundamental analytical problem associated with the use of a sliding block code (SBC) for source encoding with respect to a fidelity criterion is that of determining the entropy of the coder output. Several methods of calculating and of bounding the output entropy of an SBC are presented. The local and global behaviors of a well-designed SBC also are discussed. The so-called "101-coder," which eliminates all the isolated zeros from a binary input, plays a central role. It not only provides a specific example for application of the techniques developed for calculating and bounding the output entropy, but also serves as a medium for obtaining indirect insight into the problem of characterizing a good SBC. An easily implementable SBC subclass is introduced in which the outputs can be calculated by simple logic circuitry. The study of this subclass is shown to be closely linked with the theory of algebraic group codes.

Manuscript received April 16, 1976. This work was supported in part by the National Science Foundation under Grant GK-41229 and by a John Simon Guggenheim Memorial Foundation Fellowship. Portions of this paper were first presented at the IEEE Information Theory Workshop, Lenox, Massachusetts, June 1975.
The authors are with the School of Electrical Engineering, Cornell University, Ithaca, NY 14853.
I. INTRODUCTION

SHANNON'S development of the theory of source coding subject to a fidelity criterion [1], [2] dealt almost exclusively with block coding, i.e., the mapping of consecutive, nonoverlapping, fixed-length blocks of source data into a so-called "code book" containing a constrained number of entries. The fundamental theorems of rate-distortion theory, which relate optimal source code performance to an information-theoretic minimization, involve complicated random coding arguments [3], [4] in general cases. Also, in many situations, block coding structures are exceedingly difficult to implement.