IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. IT-23, NO. 3, MAY 1977

A Universal Algorithm for Sequential Data Compression

JACOB ZIV, FELLOW, IEEE, AND ABRAHAM LEMPEL, MEMBER, IEEE

Abstract: A universal algorithm for sequential data compression is presented. Its performance is investigated with respect to a nonprobabilistic model of constrained sources. The compression ratio achieved by the proposed universal code uniformly approaches the lower bounds on the compression ratios attainable by block-to-variable codes and variable-to-block codes designed to match a completely specified source.

I. INTRODUCTION

IN MANY situations arising in digital communications and data processing, the encountered strings of data display various structural regularities or are otherwise subject to certain constraints, thereby allowing for storage- and time-saving techniques of data compression. Given a discrete data source, the problem of data compression is first to identify the limitations of the source, and second to devise a coding scheme which, subject to certain performance criteria, will best compress the given source.

Once the relevant source parameters have been identified, the problem reduces to one of minimum-redundancy coding. This phase of the problem has received extensive treatment in the literature [1]-[7].

When no a priori knowledge of the source characteristics is available, and if statistical tests are either impossible or unreliable, the problem of data compression becomes considerably more complicated. In order to overcome these difficulties one must resort to universal coding schemes whereby the coding process is interlaced with a learning process for the varying source characteristics [8], [9]. Such coding schemes inevitably require a larger working memory space and generally employ performance criteria that are appropriate for a wide variety of sources.

In this paper, we describe a universal coding scheme which can be applied to any discrete source and whose performance is comparable to certain optimal fixed code book schemes designed for completely specified sources. For lack of adequate criteria, we do not attempt to rank the proposed scheme with respect to other possible universal coding schemes. Instead, for the broad class of sources defined in Section III, we derive upper bounds on the compression efficiency attainable with full a priori knowledge of the source by fixed code book schemes, and

Manuscript received June 23, 1975; revised July 6, 1976. Paper previously presented at the IEEE International Symposium on Information Theory, Ronneby, Sweden, June 21-24, 1976.
J. Ziv was with the Department of Electrical Engineering, Technion-Israel Institute of Technology, Haifa, Israel. He is now with Bell Telephone Laboratories, Murray Hill, NJ 07974.
A. Lempel was with the Department of Electrical Engineering, Technion-Israel Institute of Technology, Haifa, Israel. He is now with the Sperry Research Center, Sudbury, MA 01776.

then show that the efficiency of our universal code with no a priori knowledge of the source approaches those bounds.

The proposed compression algorithm is an adaptation of a simple copying procedure discussed recently [10] in a study on the complexity of finite sequences. Basically, we employ the concept of encoding future segments of the source output via maximum-length copying from a buffer containing the recent past output. The transmitted codeword consists of the buffer address and the length of the copied segment. With a predetermined initial load of the buffer and the information contained in the codewords, the source data can readily be reconstructed at the decoding end of the process.

The main drawback of the proposed algorithm is its susceptibility to error propagation in the event of a channel error.

II. THE COMPRESSION ALGORITHM

The proposed compression algorithm consists of a rule for parsing strings of symbols from a finite alphabet A into substrings, or words, whose lengths do not exceed a prescribed integer L_s, and a coding scheme which maps these substrings sequentially into uniquely decipherable codewords of fixed length L_c over the same alphabet A.

The word-length bounds L_s and L_c allow for bounded-delay encoding and decoding, and they are related by

L_c = 1 + ⌈log (n − L_s)⌉ + ⌈log L_s⌉,   (1)

where ⌈x⌉ is the least integer not smaller than x, the logarithm base is the cardinality α of the alphabet A, and n is the length of a buffer, employed at the encoding end of the process, which stores the latest n symbols emitted by the source. The exact relationship between n and L_s is discussed in Section III. Typically, n ≈ L_s α^{hL_s}, where 0 < h < 1. For on-line decoding, a buffer of similar length has to be employed at the decoding end also.
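As a small numeric check of (1), the following sketch (the code and function names are ours, not the paper's) computes L_c by integer arithmetic; with a ternary alphabet (α = 3), buffer length n = 18, and maximal word length L_s = 9, the parameters of the worked example later in this section, it gives L_c = 1 + ⌈log₃ 9⌉ + ⌈log₃ 9⌉ = 5.

```python
def codeword_length(alpha: int, n: int, Ls: int) -> int:
    """L_c = 1 + ceil(log_alpha(n - Ls)) + ceil(log_alpha(Ls)), per (1)."""
    def ceil_log(x):
        # smallest integer k with alpha**k >= x (avoids float-log pitfalls)
        k, p = 0, 1
        while p < x:
            k, p = k + 1, p * alpha
        return k
    return 1 + ceil_log(n - Ls) + ceil_log(Ls)
```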
To describe the exact mechanics of the parsing and coding procedures, we need some preparation by way of notation and definitions.

Consider a finite alphabet A of α symbols, say A = {0, 1, ···, α − 1}. A string, or word, S of length ℓ(S) = k over A is an ordered k-tuple S = s₁s₂···s_k of symbols from A. To indicate a substring of S which starts at position i and ends at position j, we write S(i,j). When i ≤ j, S(i,j) = s_i s_{i+1} ··· s_j, but when i > j, we take S(i,j) = Λ, the null string of length zero.

The concatenation of strings Q and R forms a new string S = QR; if ℓ(Q) = k and ℓ(R) = m, then ℓ(S) = k + m, Q = S(1,k), and R = S(k + 1, k + m). For each j, 0 ≤ j ≤
ℓ(S), S(1,j) is called a prefix of S; S(1,j) is a proper prefix of S if j < ℓ(S).

Given a proper prefix S(1,j) of a string S and a positive integer i such that i ≤ j, let L(i) denote the largest nonnegative integer ℓ ≤ ℓ(S) − j such that

S(i, i + ℓ − 1) = S(j + 1, j + ℓ),

and let p be a position of S(1,j) for which

L(p) = max_{1≤i≤j} {L(i)}.

The substring S(j + 1, j + L(p)) of S is called the reproducible extension of S(1,j) into S, and the integer p is called the pointer of the reproduction. For example, if S = 00101011 and j = 3, then L(1) = 1 since S(j + 1, j + 1) = S(1,1) but S(j + 1, j + 2) ≠ S(1,2). Similarly, L(2) = 4 and L(3) = 0. Hence, S(3 + 1, 3 + 4) = 0101 is the reproducible extension of S(1,3) = 001 into S with pointer p = 2.
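The definition above translates directly into a short search; this sketch (function name and tie-breaking choice ours, not the paper's) returns the pointer p and the match length L(p), resolving ties toward the largest p:

```python
def reproducible_extension(S: str, j: int):
    """Pointer p and length L(p) of the reproducible extension of S(1,j) into S.

    L(i) is the largest l <= len(S) - j with S(i, i+l-1) = S(j+1, j+l);
    note the matched copy may run past position j (overlap is allowed).
    Positions are 1-based as in the text; ties are broken toward larger p.
    """
    best_p, best_len = 1, 0
    for p in range(1, j + 1):
        l = 0
        while l < len(S) - j and S[p - 1 + l] == S[j + l]:
            l += 1
        if l >= best_len:
            best_p, best_len = p, l
    return best_p, best_len
```

For S = 00101011 and j = 3 this returns p = 2 and L(p) = 4, reproducing the example above; the extension itself is S(4,7) = 0101.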
Now, to describe the encoding process, let S = s₁s₂··· denote the string of symbols emitted by the source. The sequential encoding of S entails parsing S into successive source words, S = S₁S₂···, and assigning a codeword C_i to each S_i. For bounded-delay encoding, the length ℓ_i of each S_i is at most equal to a predetermined parameter L_s, while each C_i is of fixed length L_c as given by (1).

To initiate the encoding process, we assume that the output S of the source was preceded by a string Z of n − L_s zeros, and we store the string B₁ = ZS(1, L_s) in the buffer. If S(1,j) is the reproducible extension of Z into ZS(1, L_s − 1), then S₁ = S(1, j + 1) and ℓ₁ = j + 1. To determine the next source word, we shift out the first ℓ₁ symbols from the buffer and feed into it the next ℓ₁ symbols of S to obtain the string B₂ = B₁(ℓ₁ + 1, n)S(L_s + 1, L_s + ℓ₁). Now we look for the reproducible extension E of B₂(1, n − L_s) into B₂(1, n − 1), and set S₂ = Es, where s is the symbol next to E in B₂. In general, if B_i denotes the string of n source symbols stored in the buffer when we are ready to determine the ith source word S_i, the successive encoding steps can be formally described as follows.

1) Initially, set B₁ = 0^{n−L_s} S(1, L_s), i.e., the all-zero string of length n − L_s followed by the first L_s symbols of S.

2) Having determined B_i, i ≥ 1, set

S_i = B_i(n − L_s + 1, n − L_s + ℓ_i),   (2)

where the prefix of length ℓ_i − 1 of S_i is the reproducible extension of B_i(1, n − L_s) into B_i(1, n − 1).

3) If p_i is the reproduction pointer used to determine S_i, then the codeword C_i for S_i is given by

C_i = C_{i1} C_{i2} C_{i3},

where C_{i1} is the radix-α representation of p_i − 1 with ℓ(C_{i1}) = ⌈log (n − L_s)⌉, C_{i2} is the radix-α representation of ℓ_i − 1 with ℓ(C_{i2}) = ⌈log L_s⌉, and C_{i3} is the last symbol of S_i, i.e., the symbol occupying position n − L_s + ℓ_i of B_i. The total length of C_i is given by

ℓ(C_i) = ⌈log (n − L_s)⌉ + ⌈log L_s⌉ + 1,

in accordance with (1).

4) To update the contents of the buffer, shift out the symbols occupying the first ℓ_i positions of the buffer while feeding in the next ℓ_i symbols from the source to obtain

B_{i+1} = B_i(ℓ_i + 1, n)S(h_i + 1, h_i + ℓ_i),

where h_i is the position of S occupied by the last symbol of B_i.

This completes the description of the encoding process. It is easy to verify that the parsing rule defined by (2) guarantees a bounded, positive source-word length in each iteration; in fact, 1 ≤ ℓ_i ≤ L_s for each i, thus allowing for a radix-α representation of ℓ_i − 1 with ⌈log L_s⌉ symbols from A. Also, since 1 ≤ p_i ≤ n − L_s for each i, it is possible to represent p_i − 1 with ⌈log (n − L_s)⌉ symbols from A.
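Steps 1)-4) can be collected into a compact encoder sketch. This is our own transcription, not the paper's code: ties among equally long matches are resolved toward the largest pointer (which reproduces the choice p₁ = 9 made in the example below), and the sketch assumes ℓ(S) ≥ L_s and that the source supplies enough symbols to refill the buffer at each step.

```python
def encode(S: str, alphabet: str, n: int, Ls: int):
    """LZ77-style encoder following steps 1)-4) of Section II (sketch, ours).

    Returns the list of fixed-length codewords C_i over the given alphabet.
    """
    alpha = len(alphabet)

    def ceil_log(x):  # smallest k with alpha**k >= x
        k, p = 0, 1
        while p < x:
            k, p = k + 1, p * alpha
        return k

    def radix(value, width):  # fixed-width radix-alpha representation
        digits = []
        for _ in range(width):
            value, r = divmod(value, alpha)
            digits.append(alphabet[r])
        return ''.join(reversed(digits))

    w_p, w_l = ceil_log(n - Ls), ceil_log(Ls)
    B = alphabet[0] * (n - Ls) + S[:Ls]   # step 1: all-zero initial load
    fed = Ls                              # symbols of S fed into the buffer
    codewords = []
    while True:
        j = n - Ls
        # step 2: longest match of the lookahead B(j+1, n-1) starting at some
        # p <= j; the copy may overlap the lookahead; ties go to the largest p
        best_p, best_len = 1, 0
        for p in range(1, j + 1):
            l = 0
            while l < Ls - 1 and B[p - 1 + l] == B[j + l]:
                l += 1
            if l >= best_len:
                best_p, best_len = p, l
        li = best_len + 1                 # source word S_i = match + 1 symbol
        word = B[j:j + li]
        # step 3: codeword = radix(p_i - 1), radix(l_i - 1), last symbol of S_i
        codewords.append(radix(best_p - 1, w_p) + radix(li - 1, w_l) + word[-1])
        if fed >= len(S):
            break
        B = B[li:] + S[fed:fed + li]      # step 4: shift the buffer
        fed += li
    return codewords
```

Applied to the ternary string of the example below (S = 001010210210212021021200, n = 18, L_s = 9), this yields the four codewords 22021, 21102, 20212, 02220 derived in the text.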
Decoding can be performed simply by reversing the encoding process. Here we employ a buffer of length n − L_s to store the latest decoded source symbols. Initially, the buffer is loaded with n − L_s zeros. If D_i = d₁d₂···d_{n−L_s} denotes the contents of the buffer after C_{i−1} has been decoded into S_{i−1}, then

S_{i−1} = D_i(n − L_s − ℓ_{i−1} + 1, n − L_s),

where ℓ_{i−1} = ℓ(S_{i−1}), and where D_{i+1} can be obtained from D_i and C_i as follows.

Determine p_i − 1 and ℓ_i − 1 from the first ⌈log (n − L_s)⌉ and the next ⌈log L_s⌉ symbols of C_i. Then apply ℓ_i − 1 shifts while feeding the contents of stage p_i into stage n − L_s. The first of these shifts will change the buffer contents from D_i to

D_i^{(1)} = d₂d₃···d_{n−L_s}d_{p_i} = d₁^{(1)}d₂^{(1)}···d_{n−L_s}^{(1)}.

Similarly, if j ≤ ℓ_i − 1, the jth shift will transform D_i^{(j−1)} = d₁^{(j−1)}d₂^{(j−1)}···d_{n−L_s}^{(j−1)} into D_i^{(j)} = d₂^{(j−1)}···d_{n−L_s}^{(j−1)}d_{p_i}^{(j−1)} = d₁^{(j)}d₂^{(j)}···d_{n−L_s}^{(j)}. After these ℓ_i − 1 shifts are completed, shift once more, while feeding the last symbol of C_i into stage n − L_s of the buffer. It is easy to verify that the resulting load of the buffer contains S_i in its last ℓ_i = ℓ(S_i) positions.
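The shift-register decoding procedure just described can likewise be sketched in a few lines (ours, not the paper's); it maintains the (n − L_s)-stage buffer D and recovers each source word from p_i, ℓ_i, and the last symbol of C_i:

```python
def decode(codewords, alphabet: str, n: int, Ls: int) -> str:
    """Decoder matching the Section II description (sketch, ours)."""
    alpha = len(alphabet)

    def ceil_log(x):  # smallest k with alpha**k >= x
        k, p = 0, 1
        while p < x:
            k, p = k + 1, p * alpha
        return k

    def value(digits):  # radix-alpha string -> integer
        v = 0
        for d in digits:
            v = v * alpha + alphabet.index(d)
        return v

    w_p, w_l = ceil_log(n - Ls), ceil_log(Ls)
    D = list(alphabet[0] * (n - Ls))      # buffer D_1: all zeros
    out = []
    for C in codewords:
        p = value(C[:w_p]) + 1
        li = value(C[w_p:w_p + w_l]) + 1
        # l_i - 1 shifts, each feeding stage p_i into stage n - L_s ...
        for _ in range(li - 1):
            D = D[1:] + [D[p - 1]]
        # ... then one final shift feeding in the last symbol of C_i
        D = D[1:] + [C[-1]]
        out.extend(D[-li:])               # S_i sits in the last l_i stages
    return ''.join(out)
```

Feeding in the four codewords of the example below reconstructs the original 24 source digits exactly, confirming that the encoding is reversible.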
The following example will serve to illustrate the mechanics of the algorithm. Consider the ternary (α = 3) input string

S = 001010210210212021021200 ···,

and an encoder with parameters L_s = 9 and n = 18. (These parameters were chosen to simplify the illustration; they do not reflect the design considerations to be discussed in Section III.) According to (1), the corresponding codeword length is given by

L_c = 1 + log₃ (18 − 9) + log₃ 9 = 5.

Initially, the buffer is loaded with n − L_s = 9 zeros, followed by the first L_s = 9 digits of S, namely,

B₁ = 000000000 001010210,

where the first nine positions hold the initial all-zero load and the last nine hold S(1,9). To determine the first source word S₁, we have to find the longest prefix B₁(10, 9 + ℓ₁ − 1) of

B₁(10,17) = 00101021
which matches a substring of B₁ that starts in position p₁ ≤ 9, and then set S₁ = B₁(10, 9 + ℓ₁). It is easily seen that the longest match in this case is B₁(10,11) = 00, and hence S₁ = 001 and ℓ₁ = 3. The pointer p₁ for this step can be any integer between one and nine; we choose to set p₁ = 9. The two-digit radix-3 representation of p₁ − 1 is C₁₁ = 22, and that of ℓ₁ − 1 is C₁₂ = 02. Since C₁₃ is always equal to the last symbol of S₁, the codeword for S₁ is given by C₁ = 22021.

To obtain the buffer load B₂ for the second step, we shift out the first ℓ₁ = 3 digits of B₁ and feed in the next 3 digits S(10,12) = 210 of the input string S. The details of steps 2, 3, and 4 are tabulated below, where the source word S_i of each step is the substring of the buffer load B_i that begins at position 10:

B₂ = 000000001010210210,   C₂ = 21102
B₃ = 000010102102102120,   C₃ = 20212
B₄ = 210210212021021200,   C₄ = 02220.

III. COMPRESSION OF CONSTRAINED SOURCES

In this section, we investigate the performance of the proposed compression algorithm with respect to a nonprobabilistic model of constrained information sources. After defining the source model, we derive lower bounds on the compression ratios attainable by block-to-variable and variable-to-block codes under full knowledge of the source, and then show that the compression ratio achieved by our universal code approaches these bounds.

A. Definition of the Source Model

Let A = {0, 1, ···, α − 1} be the given α-symbol alphabet, and let A* denote the set of all finite strings over A. Given a string S ∈ A* and a positive integer m ≤ ℓ(S), let S{m} denote the set of all substrings of length m contained in S, and let S(m) denote the cardinality of S{m}. That is,

S{m} = ∪_{i=0}^{ℓ(S)−m} {S(i + 1, i + m)},   S(m) = |S{m}|.
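In code, S{m} and S(m) are one-liners (function names ours, not the paper's):

```python
def substring_set(S: str, m: int):
    """S{m}: the set of all distinct length-m substrings of S."""
    return {S[i:i + m] for i in range(len(S) - m + 1)}

def substring_count(S: str, m: int) -> int:
    """S(m): the cardinality of S{m}."""
    return len(substring_set(S, m))
```

For example, for S = 00101011 we get S{2} = {00, 01, 10, 11} and S(2) = 4.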

Given a subset σ of A*, let

σ{m} = {S ∈ σ | ℓ(S) = m},

and let σ(m) denote the cardinality of σ{m}.

A subset σ of A* is called a source if the following three properties hold:

1) A ⊂ σ (i.e., σ contains all the unit-length strings),
2) S ∈ σ implies SS ∈ σ,
3) S ∈ σ implies S{m} ⊂ σ{m}.

Typically, such a source σ is defined by specifying a finite set of strings over A which are forbidden to appear as substrings of elements belonging to σ, and therefore σ(m) < α^m for all m exceeding some m₀.

With every source σ, we associate a sequence h(1), h(2), ··· of parameters, called the h-parameters of σ, where¹

h(m) = (1/m) log σ(m),   m = 1, 2, ···.

It is clear that 0 ≤ h(m) ≤ 1 for all m and, by 2), it is also clear that mh(m) is a nondecreasing function of m. The sequence of h-parameters, however, is usually nonincreasing in m. To avoid any possible confusion in the sequel, we postulate this property as an additional defining property of a source. Namely, we require

4) h(m) = (1/m) log σ(m) is a nonincreasing function of m.
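To make the definition concrete, here is a brute-force computation of σ(m) and h(m) for a small constrained set of binary strings, those avoiding the substring 11. The constraint, the enumeration, and the function names are ours, chosen only to illustrate the computation (α = 2, so log is base 2); the resulting sequence exhibits the nonincreasing behavior of h(m) postulated in property 4).

```python
from itertools import product
from math import log2

def sigma(m: int) -> int:
    """sigma(m) for the illustrative constraint: binary strings of length m
    that do not contain the forbidden substring "11" (our example)."""
    return sum(1 for t in product("01", repeat=m) if "11" not in "".join(t))

def h(m: int) -> float:
    """h(m) = (1/m) log sigma(m), with the base-2 (= base-alpha) logarithm."""
    return log2(sigma(m)) / m
```

Here σ(m) follows the Fibonacci recursion σ(m) = σ(m − 1) + σ(m − 2): σ(1) = 2, σ(2) = 3, σ(3) = 5, σ(4) = 8, and h(1) = 1 ≥ h(2) ≈ 0.79 ≥ h(3) ≈ 0.77 ≥ h(4) = 0.75.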

B. Some Lower Bounds on the Compression Ratio

Consider a compression coding scheme for a source σ which employs a block-to-variable (BV) code book of M pairs (X_i, Y_i) of words over A, with ℓ(X_i) = L for i = 1, 2, ···, M. The encoding of an infinitely long string S ∈ σ by such a code is carried out by first parsing S into blocks of length L, and then replacing each block X_i by the corresponding codeword Y_i. It is assumed, of course, that the code book is exhaustive with respect to σ and uniquely decipherable [2]. Hence, we must have

{X_i}_{i=1}^{M} = σ{L},   M = σ(L) = α^{Lh(L)},   (3)

and

max_{1≤i≤M} ℓ(Y_i) ≥ log M = Lh(L).   (4)

The compression ratio ρ_i associated with the ith word-pair of the code is given by

ρ_i = ℓ(Y_i)/L.

The BV compression ratio ρ_BV(σ,M) of the source σ is defined as the minimax value of ρ_i, where the maximization is over all word-pairs of a given code, and the minimization is over the set C_BV(σ,M) of all BV code books consisting of M word-pairs. Thus,

ρ_BV(σ,M) = min_{C_BV(σ,M)} max_{1≤i≤M} {ℓ(Y_i)/L} ≥ (log M)/L = h(L).

¹ Throughout this paper, log x means the base-α logarithm of x.

For later reference, we record this result in the following lemma.

Lemma 1: ρ_BV(σ,M) ≥ h(L), where Lh(L) = log M.

Now, consider a compression scheme which employs a variable-to-block (VB) code book of M word-pairs (X_i, Y_i), all codewords Y_i being of one common length ℓ(C), i = 1, 2, ···, M. In this case, the compression ratio associated with the ith word-pair is given by

ρ_i = ℓ(C)/ℓ(X_i),

and similarly, the VB compression ratio ρ_VB(σ,M) of σ is defined as the minimax value of ρ_i over all word-pairs and over the set C_VB(σ,M) of VB code books with M word-pairs.

Lemma 2: ρ_VB(σ,M) ≥ h(L_M), where L_M = max {ℓ | M ≥ σ(ℓ)}.

Proof: We may assume, without loss of generality, that in every code under consideration

ℓ(X₁) ≤ ℓ(X₂) ≤ ··· ≤ ℓ(X_M),

and hence for each C ∈ C_VB(σ,M),

max_i ρ_i(C) = ℓ(C)/ℓ(X₁).   (5)

Since C is exhaustive with respect to σ, we have

M ≥ σ(ℓ₁),   (6)

where ℓ₁ = ℓ(X₁); and since C is uniquely decipherable, we have

ℓ(C) ≥ log M.   (7)

From the definition of L_M, inequality (6), and the nondecreasing property of σ(ℓ), we obtain

ℓ₁ ≤ L_M.   (8)

From (5), (7), and (8), we have

max ρ_i(C) ≥ (log M)/L_M,

and since

M ≥ σ(L_M) = α^{L_M h(L_M)},

it follows that for each C ∈ C_VB(σ,M),

max ρ_i(C) ≥ h(L_M).

Q.E.D.

Remarks

1) Since the value of L in the context of Lemma 1 satisfies the definition of L_M as given in Lemma 2, it follows that the bounds of both lemmas are essentially the same, despite the basic difference between the respective coding schemes.

2) By 2), the second defining property of a source, the per-word bounds derived above apply to indefinitely long messages as well, since the whole message may consist of repeated appearances of the same worst case word.

3) By 4), the nonincreasing property of the h-parameters, the form of the derived bounds confirms the intuitive expectation that an increase in the size M of the employed code book causes a decrease in the lower bound on the attainable compression ratio.

C. Performance of the Proposed Algorithm

We proceed now to derive an upper bound on the compression ratio attainable by the algorithm of Section II. To this end, we consider the worst case source message of length n − L_s, where n is the prospective buffer length and L_s is the maximal word-length. The bound obtained for this case will obviously apply to all messages of length n − L_s or greater.

First, we assume that only the h-parameters of the source under consideration are known to the designer of the encoder. Later, when we discuss the universal performance of the proposed algorithm, we will show that even this restricted a priori knowledge of the source is actually unessential.

We begin by choosing the buffer length n to be an integer of the form

n = Σ_{m=1}^{λ} m α^m + Σ_{m=λ+1}^{ℓ} m σ(ℓ) + (ℓ + 1)(N_{ℓ+1} + 1),   (9)

where

N_{ℓ+1} = Σ_{m=1}^{λ} (ℓ − m) α^m + Σ_{m=λ+1}^{ℓ} (ℓ − m) σ(ℓ),   (10)

λ = ⌊ℓh(ℓ)⌋,

the integer part of log σ(ℓ) = ℓh(ℓ), and

ℓ = L_s − 1.   (11)

The specific value of the parameter L_s is left for later determination (see the first remark following the proof of Theorem 1). The reasoning that motivates the given form of n will become clear from subsequent derivations.

Consider a string S ∈ σ{n − L_s}, and let N(S) denote the number of words into which S is parsed by the algorithm during the encoding process. Recalling that each of these words is mapped into a codeword of fixed length L_c (see (1)), it follows that the compression ratio ρ(S) associated with the string S is given by

ρ(S) = L_c N(S)/(n − L_s).

Hence, the compression ratio ρ attainable by the algorithm for a given source σ is

ρ = L_c N/(n − L_s),   (12)

where

N = max_{S∈σ{n−L_s}} N(S).

Let Q ∈ σ{n − L_s} be such that N(Q) = N, and suppose that the algorithm parses Q into Q = Q₁Q₂···Q_N. From step 2) of the encoding cycle it follows that if ℓ(Q_i) = ℓ(Q_j) < L_s for some i < j ≤ N, then Q_i ≠ Q_j. (Note that when Q_j is being determined at the jth cycle of the encoding process, all of Q_i is still stored in the buffer, and since ℓ(Q_i) < L_s, the longest substring in the buffer that precedes Q_j and is a prefix of Q_j must be of length ℓ(Q_j) − 1.)

Denoting by K_m the number of Q_i, 1 ≤ i ≤ N − 1, of length m, 1 ≤ m ≤ L_s, we have

N = 1 + Σ_{m=1}^{L_s} K_m.   (13)

By the above argument, and by property 3) of the source, we have

K_m ≤ σ(m),  for 1 ≤ m ≤ ℓ = L_s − 1.   (14)

Since

Σ_{m=1}^{L_s} m K_m = n − L_s,

and n and L_s are both fixed, it is clear that by overestimating the values of K_m for 1 ≤ m ≤ ℓ at the expense of K_{ℓ+1}, we can only overestimate the value of N. Therefore, since σ(m) ≤ σ(m + 1) and σ(m) = α^{mh(m)} ≤ α^m, we obtain

N ≤ K'_{ℓ+1} + Σ_{m=1}^{ℓ} K'_m = N',   (15)

where

K'_m = α^m, for 1 ≤ m ≤ λ = ⌊ℓh(ℓ)⌋;   K'_m = σ(ℓ), for λ < m ≤ ℓ.

Equating Σ_{m=1}^{L_s} m K'_m to n − L_s and using (9), we obtain K'_{ℓ+1} = N_{ℓ+1}, and hence

N' = N_{ℓ+1} + Σ_{m=1}^{λ} α^m + Σ_{m=λ+1}^{ℓ} σ(ℓ),

which, together with (9) and (10), yields ℓN' = n − L_s, i.e.,

N ≤ N' = (n − L_s)/ℓ = (n − L_s)/(L_s − 1).   (16)

Hence, from (12) and (16), we have

ρ ≤ L_c/(L_s − 1).   (17)

Note that despite the rather rudimentary overestimation of N by N', the upper bound of (17) is quite tight, since the fact that no source word is longer than L_s immediately implies ρ ≥ L_c/L_s.

We can state now the following result.

Theorem 1: If the buffer length n for a source with known h-parameters is chosen according to (9), then

ρ ≤ h(L_s − 1) + ε(L_s),

where

ε(L_s) = [3 + 3 log (L_s − 1) + log (L_s/2)]/(L_s − 1).

Proof: From (1) we have

L_c = 1 + ⌈log L_s⌉ + ⌈log (n − L_s)⌉ ≤ 3 + log (L_s − 1) + log (n − L_s).

From (9) and (10) we obtain

n − L_s = ℓ [Σ_{m=1}^{λ} (ℓ − m) α^m + Σ_{m=λ+1}^{ℓ} (ℓ − m) σ(ℓ) + Σ_{m=1}^{λ} α^m + Σ_{m=λ+1}^{ℓ} σ(ℓ)],

and since α^m ≤ σ(ℓ) for 1 ≤ m ≤ λ, we have

n − L_s ≤ ℓ σ(ℓ) Σ_{m=1}^{ℓ} (ℓ − m + 1) = (1/2) ℓ² (ℓ + 1) σ(ℓ),

so that

log (n − L_s) ≤ 2 log ℓ + log ((ℓ + 1)/2) + ℓh(ℓ).

Since ℓ = L_s − 1, we obtain

L_c ≤ 3 + 3 log (L_s − 1) + log (L_s/2) + (L_s − 1) h(L_s − 1),

or

L_c ≤ (L_s − 1)[h(L_s − 1) + ε(L_s)].

Substituting this result into (17), we obtain the bound of Theorem 1.

Q.E.D.

Remarks

1) The value of ε(L_s) decreases with L_s and, consequently, the compression ratio ρ approaches the value of h(L_s − 1), the h-parameter associated with the second largest word-length processed by the encoder. Given any δ > 0, one can always find the least integer ℓ_δ such that ρ − h(ℓ_δ − 1) ≤ δ. The magnitude of the acceptable deviation δ determines the operational range of L_s, namely, L_s ≥ ℓ_δ.
2) Since our code maps source words of variable length at most L_s into codewords of fixed length L_c, we adopt as a reference for comparison the best VB code C_VB discussed in Subsection III-B. The counterpart of our block-length L_c is ℓ(C) of (5), and by (7) we can write

L_c ≈ log M = L_M h(L_M),   (18)

where M is the code book size and h(L_M) is the lower bound (see Lemma 2) on the compression ratio ρ_VB attainable by C_VB. Since for sufficiently large L_s, we have also

L_c ≈ (L_s − 1)ρ ≈ (L_s − 1) h(L_s − 1),   (19)

it follows that ρ ≈ ρ_VB.
We turn now to investigate the universal performance of the proposed algorithm, i.e., the performance when no a priori knowledge of the source to be compressed is available.

Given δ₁ and h₁ such that 0 < δ₁ < h₁ < 1, let ℓ₁ be the least positive integer satisfying

δ₁ ≥ ε(ℓ₁ + 1) = (1/ℓ₁)(3 + 3 log ℓ₁ + log ((ℓ₁ + 1)/2)),

and let K = α^{ℓ₁h₁} and λ₁ = ⌊ℓ₁h₁⌋. Consider the encoder E₁ which employs a buffer of length n₁ = n(h₁, ℓ₁), obtained from (9) and (10) by setting ℓ = ℓ₁, λ = λ₁, and σ(ℓ) = K, and whose maximal word-length L_s is equal to ℓ₁ + 1.

It follows from Theorem 1 that if this encoder is applied to a source σ₁ such that h_{σ₁}(ℓ₁) = h₁, then the resulting compression ratio ρ₁(σ₁) satisfies

ρ₁(σ₁) ≤ h_{σ₁}(ℓ₁) + δ₁ = h₁ + δ₁.   (20)

Suppose now that h₁ and δ₁ were chosen to fit the prescribed upper value ρ₁ = h₁ + δ₁ of a prospective compression ratio range (ρ₀, ρ₁) with

0 < δ₁ ≈ (ρ₁ − ρ₀)/R,   R > 1,   (21)

where R is an adjustment parameter to be determined later. As shown above, the encoder E₁ is then matched exactly to the upper value ρ₁ of the prospective compression range. In order to accommodate the whole given range, we propose to employ a slightly larger encoder E₀ whose buffer is of length n₀ = n₁ − ℓ₁ + ℓ₀, where n₁ and ℓ₁ are the parameters of E₁, and where ℓ₀ is an integer, greater than ℓ₁, for which the solution h₀ of the equation

n₁ − ℓ₁ + ℓ₀ = n(h₀, ℓ₀)   (22)

satisfies

ρ₀ − ε(ℓ₀) < h₀ ≤ ρ₀ − ε(ℓ₀ + 1).   (23)

Noting that n₀ − ℓ₀ = n₁ − ℓ₁ is fixed, it is clear from (9) and (10) that as ℓ₀ increases h₀ decreases; also, (21) and the fact that ℓ₀ > ℓ₁ imply that ρ₀ − ε(ℓ₀ + 1) > ρ₀ − ε(ℓ₁ + 1) ≥ ρ₀ − δ₁ > 0. Hence, there exist h₀ > 0 and ℓ₀ > ℓ₁ that satisfy both (22) and (23).

In analogy with (20), it is also clear that if E₀ is applied to a source σ₀ such that h_{σ₀}(ℓ₀) = h₀, then the resulting compression ratio ρ₀(σ₀) satisfies

ρ₀(σ₀) ≤ h_{σ₀}(ℓ₀) + ε₀ = h₀ + ε₀,   (24)

where ε₀ = ε(ℓ₀ + 1).

From (23) and (24), we have ρ₀(σ₀) ≤ ρ₀ − ε(ℓ₀ + 1) + ε₀ = ρ₀, and

0 ≤ ρ₀ − h₀ < ε(ℓ₀) ≤ ε(ℓ₁ + 1) ≤ δ₁ ≈ (ρ₁ − ρ₀)/R.

Hence, h₀ can be made arbitrarily close to ρ₀, and consequently, the encoder E₀ matches as closely as desired the lower end ρ₀ of the given compression range.

Theorem 2: Let σ be a source for which a matched encoder E achieves a compression ratio ρ(σ) within the range (ρ₀, ρ₁). Then the compression ratio ρ₀(σ), achieved for σ by E₀, satisfies

ρ₀(σ) ≤ ρ(σ) + (1/ℓ₁) ⌈log d⌉,

where

d = max {h₁/h₀, 1/(1 − h₀)}.

(Typically, (h₁/h₀) > (1/(1 − h₀)) and d = (h₁/h₀).)

Proof: To prove the theorem we shall consider the obviously worst case of applying E₀ to the source σ₁ whose matched encoder E₁ realizes ρ₁.

Let ρ₀(σ₁) denote the compression ratio achievable by E₀ when applied to σ₁. According to (12), we have

ρ₀(σ₁) = [L_{c0}/(n₀ − (ℓ₀ + 1))] N₀(σ₁),

where

L_{ci} = 1 + ⌈log (ℓ_i + 1)⌉ + ⌈log (n_i − ℓ_i − 1)⌉,   i ∈ {0,1},

and N_i(σ_j), i, j ∈ {0,1}, is the maximum number of words into which a string S ∈ σ_j{n_i − ℓ_i − 1} is parsed by E_i.

Since n₀ − ℓ₀ = n₁ − ℓ₁ and ℓ₀ > ℓ₁, it is easy to verify that² N₀(σ₁) ≤ N₁(σ₁). Also, by (16),

N₁(σ₁) ≤ N₁' = (n₁ − ℓ₁ − 1)/ℓ₁.

Hence, since n₀ − (ℓ₀ + 1) = n₁ − (ℓ₁ + 1),

ρ₀(σ₁) ≤ L_{c0}/ℓ₁ = L_{c1}/ℓ₁ + (L_{c0} − L_{c1})/ℓ₁ = ρ₁ + (L_{c0} − L_{c1})/ℓ₁,

and since

L_{c0} − L_{c1} = ⌈log (ℓ₀ + 1)⌉ − ⌈log (ℓ₁ + 1)⌉ ≤ ⌈log k⌉,

where k = ℓ₀/ℓ₁, we obtain

ρ₀(σ₁) ≤ ρ₁ + (1/ℓ₁) ⌈log k⌉.   (25)

To obtain an upper bound on k, we first observe that

Σ_{m=λ₀+1}^{ℓ₀} (ℓ₀ − m + 1) α^{ℓ₀h₀} ≤ n₀ − ℓ₀ = n₁ − ℓ₁ ≤ ℓ₁ Σ_{m=1}^{ℓ₁} (ℓ₁ − m + 1) α^{ℓ₁h₁},

which reduces to

k(1 − h₀)² ≤ α^{ℓ₁h₁ − ℓ₀h₀}.   (26)

Now, either k(1 − h₀) < 1, or else the exponent on the right side of (26) must be nonnegative, in which case k = ℓ₀/ℓ₁ ≤ h₁/h₀. In either case,

k ≤ max {h₁/h₀, 1/(1 − h₀)} = d.

Q.E.D.

² The assertion here is analogous to that of [10, theorem 1].

Theorems 1 and 2 demonstrate the efficiency and universality of the proposed algorithm. They show that an encoder designed to operate over a prescribed compression range performs, practically, as well as one designed to match a specific compression ratio within the given range. Moreover, for given complexity, i.e., a given codeword length, the compression efficiency is comparable to that of an optimal variable-to-block code book designed to match a given source.

REFERENCES

[1] D. A. Huffman, "A method for the construction of minimum-redundancy codes," Proc. IRE, vol. 40, pp. 1098-1101, 1952.
[2] R. M. Karp, "Minimum-redundancy coding for the discrete noiseless channel," IRE Trans. Inform. Theory, vol. IT-7, pp. 27-38, Jan. 1961.
[3] B. F. Varn, "Optimal variable length codes," Inform. Contr., vol. 19, pp. 289-301, 1971.
[4] Y. Perl, M. R. Garey, and S. Even, "Efficient generation of optimal prefix code: Equiprobable words using unequal cost letters," J. ACM, vol. 22, pp. 202-214, April 1975.
[5] A. Lempel, S. Even, and M. Cohn, "An algorithm for optimal prefix parsing of a noiseless and memoryless channel," IEEE Trans. Inform. Theory, vol. IT-19, pp. 208-214, March 1973.
[6] F. Jelinek and K. S. Schneider, "On variable length to block coding," IEEE Trans. Inform. Theory, vol. IT-18, pp. 765-774, Nov. 1972.
[7] R. G. Gallager, Information Theory and Reliable Communication. New York: Wiley, 1968.
[8] J. Ziv, "Coding of sources with unknown statistics, Part I: Probability of encoding error," IEEE Trans. Inform. Theory, vol. IT-18, pp. 384-394, May 1972.
[9] L. D. Davisson, "Universal noiseless coding," IEEE Trans. Inform. Theory, vol. IT-19, pp. 783-795, Nov. 1973.
[10] A. Lempel and J. Ziv, "On the complexity of finite sequences," IEEE Trans. Inform. Theory, vol. IT-22, pp. 75-81, Jan. 1976.
[11] B. M. Fitingof, "Optimal coding in the case of unknown and changing message statistics," Probl. Inform. Transm., vol. 2, pp. 3-11, 1966.

On Binary Sliding Block Codes

TOBY BERGER, SENIOR MEMBER, IEEE, AND JOSEPH KA-YIN LAU

Abstract: Sliding block codes are an intriguing alternative to the block codes used in the development of classical information theory. The fundamental analytical problem associated with the use of a sliding block code (SBC) for source encoding with respect to a fidelity criterion is that of determining the entropy of the coder output. Several methods of calculating and of bounding the output entropy of an SBC are presented. The local and global behaviors of a well-designed SBC also are discussed. The so-called "101-coder," which eliminates all the isolated zeros from a binary input, plays a central role. It not only provides a specific example for application of the techniques developed for calculating and bounding the output entropy, but also serves as a medium for obtaining indirect insight into the problem of characterizing a good SBC. An easily implementable SBC subclass is introduced in which the outputs can be calculated by simple logic circuitry. The study of this subclass is shown to be closely linked with the theory of algebraic group codes.

Manuscript received April 16, 1975. This work was supported in part by the National Science Foundation under Grant GK-41229 and by a John Simon Guggenheim Memorial Foundation Fellowship. Portions of this paper were first presented at the IEEE Information Theory Workshop, Lenox, Massachusetts, June 1975.
The authors are with the School of Electrical Engineering, Cornell University, Ithaca, NY 14853.

I. INTRODUCTION

SHANNON'S development of the theory of source coding subject to a fidelity criterion [1], [2] dealt almost exclusively with block coding, i.e., the mapping of consecutive, nonoverlapping, fixed-length blocks of source data into a so-called "code book" containing a constrained number of entries. The fundamental theorems of rate-distortion theory, which relate optimal source code performance to an information-theoretic minimization, involve complicated random coding arguments [3], [4] in general cases. Also, in many situations, block coding structures are exceedingly difficult to implement.