`
PROCEEDINGS OF THE I.R.E., September 1952
`
A Method for the Construction of Minimum-Redundancy Codes*
`
DAVID A. HUFFMAN†, ASSOCIATE, IRE
`
`
`Summary—An optimum method of coding an ensemble of mes-
`sages consisting of a finite number of members is developed. A
minimum-redundancy code is one constructed in such a way that the
`average number of coding digits per message is minimized.
`
`INTRODUCTION
`
ONE IMPORTANT METHOD of transmitting
messages is to transmit in their place sequences
`of symbols. If there are more messages which
`might be sent than there are kinds of symbols available,
`then some of the messages must use more than one sym-
`bol. If it is assumed that each symbol requires the same
`time for transmission, then the time for transmission
`(length) of a message is directly proportional to the
`number of symbols associated with it. In this paper, the
`symbol or sequence of symbols associated with a given
`message will be called the “message code.” The entire
`number of messages which might be transmitted will be
`called the “message ensemble.” The mutual agreement
`between the transmitter and the receiver about the
`
`meaning of the code for each message of the ensemble
`will be called the “ensemble code.”
`
`Probably the most familiar ensemble code was stated
`in the phrase “one if by land and two if by sea.” In this
`case, the message ensemble consisted of the two indi-
`vidual messages “by land” and “by sea”, and the message
`codes were “one” and “two.”
`
`In order to formalize the requirements of an ensemble
`code, the coding symbols will be represented by num-
`bers. Thus, if there are D different types of symbols to
`be used in coding, they will be represented by the digits
`0, 1, 2,
`-
`-
`~
`, (D—— 1). For example, a ternary code will be
`constructed using the three digits 0, 1, and 2 as coding
`symbols.
`The number of messages in the ensemble will be called
N. Let P(i) be the probability of the ith message. Then

    \sum_{i=1}^{N} P(i) = 1.                                    (1)

The length of a message, L(i), is the number of coding
digits assigned to it. Therefore, the average message
length is

    L_{av} = \sum_{i=1}^{N} P(i) L(i).                          (2)
`
The term “redundancy” has been defined by Shannon¹
as a property of codes. A “minimum-redundancy code”
will be defined here as an ensemble code which, for a
message ensemble consisting of a finite number of mem-
bers, N, and for a given number of coding digits, D,
yields the lowest possible average message length. In
order to avoid the use of the lengthy term “minimum-
redundancy,” this term will be replaced here by “opti-
mum.” It will be understood then that, in this paper,
“optimum code” means “minimum-redundancy code.”
The following basic restrictions will be imposed on an
ensemble code:

(a) No two messages will consist of identical arrange-
    ments of coding digits.
(b) The message codes will be constructed in such a
    way that no additional indication is necessary to
    specify where a message code begins and ends
    once the starting point of a sequence of messages
    is known.

Restriction (b) necessitates that no message be coded
in such a way that its code appears, digit for digit, as the
first part of any message code of greater length. Thus,
01, 102, 111, and 202 are valid message codes for an en-
semble of four members. For instance, a sequence of
these messages 1111022020101111102 can be broken up
into the individual messages 111-102-202-01-01-111-102.
All the receiver need know is the ensemble code. How-
ever, if the ensemble has individual message codes in-
cluding 11, 111, 102, and 02, then when a message se-
quence starts with the digits 11, it is not immediately
certain whether the message 11 has been received or
whether it is only the first two digits of the message 111.
Moreover, even if the sequence turns out to be 11102,
it is still not certain whether 111-02 or 11-102 was trans-
mitted. In this example, a change of one of the two mes-
sage codes 111 or 11 is indicated.

C. E. Shannon¹ and R. M. Fano² have developed en-
semble coding procedures for the purpose of proving
that the average number of binary digits required per
message approaches from above the average amount of
information per message. Their coding procedures are
not optimum, but approach the optimum behavior when
N approaches infinity. Some work has been done by
Kraft³ toward deriving a coding method which gives an
average code length as close as possible to the ideal when
the ensemble contains a finite number of members.
However, up to the present time, no definite procedure
has been suggested for the construction of such a code
`to the knowledge of the author. It is the purpose of this
`paper to derive such a procedure.
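As a concrete illustration of restriction (b), the short Python sketch below (added for this reprint; it is not part of the original paper) splits the example sequence given in the introduction back into its individual messages, using only the ensemble code 01, 102, 111, 202. Because no message code is the first part of a longer one, the receiver may accept a message the moment the accumulated digits match a code.

    # Decode a digit stream with a prefix-free ensemble code (illustrative sketch).
    ENSEMBLE_CODE = {"01", "102", "111", "202"}   # the four message codes from the example

    def decode(stream, codes=ENSEMBLE_CODE):
        messages, buffer = [], ""
        for digit in stream:
            buffer += digit
            if buffer in codes:              # because no code is the first part of another,
                messages.append(buffer)      # the first match must be the transmitted message
                buffer = ""
        if buffer:
            raise ValueError("stream ended inside a message code")
        return messages

    print(decode("1111022020101111102"))
    # ['111', '102', '202', '01', '01', '111', '102'], as stated in the text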
`
`DERIVED CODING REQUIREMENTS
`
`For an optimum code, the length of a given message code
`can never be less than the length of a more probable
`message code. If this requirement were not met, then a
`reduction in average message length could be obtained
`byinterchangingthecodesforthetwomessagesinquestion
`in such a way that the shorter code becomes associated
`with the more probable message. Also, if there are sev-
`eral messages with the same probability, then it is pos-
`sible that the codes for these messages may differ in
`length. However, the codes for these messages may be
`interchanged in any way without affecting the average
`code length for the message ensemble. Therefore, it may
`be assumed that the messages in the ensemble have been
`ordered in a fashion such that
`
`1°(1) 2; 19(2) E; -- - E: 1”(A7 '* 1) E: l°(FV)
`
`(3)
`
`and that, in addition, for an optimum code, the condition
`
`L(1) _S_ L(2) § - -- .S. L(-7V ~ 1) é L(N)
`
`(4)
`
`is assumed to be satisfied
`holds. This requirement
`throughout the following discussion.
`It might be imagined that an ensemble code could
`assign q more digits to the Nth message than to the
(N-1)st message. However, the first L(N-1) digits of
the Nth message must not be used as the code for any
other message. Thus the additional q digits would serve
no useful purpose and would unnecessarily increase
L_{av}. Therefore, for an optimum code it is necessary that
L(N) be equal to L(N-1).
The kth prefix of a message code will be defined as the
`first k digits of that message code. Basic restriction (b)
`could then be restated as: No message shall be coded
`in such a way that its code is a prefix of any other mes-
`sage, or that any of its prefixes are used elsewhere as a
`message code.
`Imagine an optimum code in which no two of the mes-
`sages coded with length L(N) have identical prefixes of
`order L(N)—1. Since an optimum code has been as-
`sumed, then none of these messages of length L(N) can
`have codes or prefixes of any order which correspond to
`other codes. It would then be possible to drop the last
`digit of all of this group of messages and thereby reduce
the value of L_{av}. Therefore, in an optimum code, it is
`necessary that at least two (and no more than D) of the
`codes with length L(N) have identical prefixes of order
`L(N) — 1.
`One additional requirement can be made for an opti-
`mum code. Assume that there exists a combination of
`
`the D different types of coding digits which is less than
`L(N) digits in length and which is not used as a message
`code or which is not a prefix of a message code. Then
`this combination of digits could be used to replace the
`code for the Nth message with a consequent reduction
of L_{av}. Therefore, all possible sequences of L(N)-1
digits must be used either as message codes, or must
have one of their prefixes used as message codes.
The derived restrictions for an optimum code are
summarized in condensed form below and considered in
addition to restrictions (a) and (b) given in the first
part of this paper:

(c) L(1) \leq L(2) \leq \cdots \leq L(N-1) = L(N).              (5)

(d) At least two and not more than D of the messages
    with code length L(N) have codes which are alike
    except for their final digits.
(e) Each possible sequence of L(N)-1 digits must be
    used either as a message code or must have one of
    its prefixes used as a message code.
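These requirements can be checked mechanically. The sketch below is written for this reprint in Python and is not taken from the paper; it tests a proposed ensemble code against restrictions (a) through (e). Restriction (e) is tested in the form in which it was derived above: no sequence of fewer than L(N) digits may be left wholly unused, that is, every such sequence must be a message code, the prefix of one, or the extension of one. The binary code of Table II, which appears later in the paper, is used as the test case.

    from itertools import product

    def check_restrictions(P, codes, D):
        """Check a proposed ensemble code against restrictions (a)-(e).
        P must already be ordered so that P[0] >= P[1] >= ... >= P[-1];
        codes[i] is the message code of the (i+1)st message, over the digits 0..D-1."""
        N, L = len(codes), [len(c) for c in codes]
        assert len(P) == N and all(P[i] >= P[i + 1] for i in range(N - 1))
        digits = "0123456789"[:D]
        ok = {}
        ok["a"] = len(set(codes)) == N                                  # all codes distinct
        ok["b"] = all(not codes[j].startswith(codes[i])                 # no code is a prefix of another
                      for i in range(N) for j in range(N) if i != j)
        ok["c"] = all(L[i] <= L[i + 1] for i in range(N - 1)) and L[-2] == L[-1]
        stems = {}                                                      # longest codes grouped by their
        for c in codes:                                                 # prefix of order L(N)-1
            if len(c) == L[-1]:
                stems.setdefault(c[:-1], []).append(c)
        ok["d"] = any(2 <= len(group) <= D for group in stems.values())
        def used(seq):                                                  # seq is a code, a prefix of a
            return any(seq.startswith(c) or c.startswith(seq) for c in codes)   # code, or extends a code
        ok["e"] = all(used("".join(s))
                      for k in range(1, L[-1]) for s in product(digits, repeat=k))
        return ok

    # The binary code of Table II (later in this paper) satisfies all five restrictions:
    P = [0.20, 0.18, 0.10, 0.10, 0.10, 0.06, 0.06, 0.04, 0.04, 0.04, 0.04, 0.03, 0.01]
    code = ["10", "000", "011", "110", "111", "0101", "00100", "00101",
            "01000", "01001", "00110", "001110", "001111"]
    print(check_restrictions(P, code, D=2))   # {'a': True, 'b': True, 'c': True, 'd': True, 'e': True}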
`
OPTIMUM BINARY CODE
`
`For ease of development of the optimum coding pro-
`cedure, let us now restrict ourselves to the problem of
`binary coding. Later this procedure will be extended to
`the general case of D digits.
`Restriction (c) makes it necessary that the two least
`probable messages have codes of equal
`length. Re-
striction (d) places the requirement that, for D equal
to two, there be only two of the messages with coded
length L(N) which are identical except for their last
digits. The final digits of these two codes will be one
of the two binary digits, 0 and 1. It will be necessary to
assign these two message codes to the Nth and the
(N-1)st messages since at this point it is not known
whether or not other codes of length L(N) exist. Once
this has been done, these two messages are equivalent
to a single composite message. Its code (as yet undeter-
mined) will be the common prefix of order L(N)-1 of
`these two messages. Its probability will be the sum of
`the probabilities of the two messages from which it was
`created. The ensemble containing this composite mes-
`sage in the place of its two component messages will be
`called the first auxiliary message ensemble.
`This newly created ensemble contains one less mes-
`sage than the original. Its members should be rearranged
`if necessary so that the messages are again ordered ac-
`cording to their probabilities. It may be considered ex-
`actly as the original ensemble was. The codes for each of
`the two least probable messages in this new ensemble
`are required to be identical except in their final digits;
`0 and 1 are assigned as these digits, one for each of the
`two messages. Each new auxiliary ensemble contains
`one less message than the preceding ensemble. Each
`auxiliary ensemble represents the original ensemble with
`full use made of the accumulated necessary coding re-
`quirements.
`-
`The procedure is applied again and again until the
`number of members in the most recently formed auxili-
`ary message ensemble is reduced to two. One of each of
`the binary digits is assigned to each of these two com-
`posite messages. These messages are then combined to
`form a single composite message with probability unity,
and the coding is complete.
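The merging procedure just described is compact enough to state as a short program. The sketch below is an illustrative Python rendering written for this reprint, not code from the paper: it repeatedly combines the two least probable members of the current ensemble, gives the digit 0 to the upper member and 1 to the lower member of each bracket (the convention noted after Table II), and builds each message code from its final digit backward. Ties among equal probabilities are broken by keeping a newly formed composite below the older entries, which is how Table I appears to place them; any other admissible choice would give a different but equally good code. Applied to the ensemble of Table I, this sketch reproduces the code of Table II.

    # Illustrative sketch of the binary merging procedure described above.
    def binary_optimum_code(P):
        """P: message probabilities in nonincreasing order.  Returns one optimum (prefix) code."""
        # Each working entry is (probability, list of original message indices it contains).
        ensemble = [(p, [i]) for i, p in enumerate(P)]
        codes = [""] * len(P)
        while len(ensemble) > 1:
            ensemble.sort(key=lambda e: e[0], reverse=True)      # reorder; new composites stay below equals
            (p1, m1), (p2, m2) = ensemble.pop(), ensemble.pop()  # the two least probable members
            for i in m2:                                         # upper member of the bracket gets 0
                codes[i] = "0" + codes[i]
            for i in m1:                                         # lower member of the bracket gets 1
                codes[i] = "1" + codes[i]
            ensemble.append((p1 + p2, m1 + m2))                  # the composite message
        return codes

    P = [0.20, 0.18, 0.10, 0.10, 0.10, 0.06, 0.06, 0.04, 0.04, 0.04, 0.04, 0.03, 0.01]
    codes = binary_optimum_code(P)
    print(codes)                                        # the code listed in Table II
    print([len(c) for c in codes])                      # [2, 3, 3, 3, 3, 4, 5, 5, 5, 5, 5, 6, 6]
    print(round(sum(p * len(c) for p, c in zip(P, codes)), 2))   # 3.42 digits per message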
`
`
`
`1100
`
`PROCEEDINGS OF THE I.R.E.
`
`September
`
TABLE I
OPTIMUM BINARY CODING PROCEDURE

Message probabilities of the original ensemble and of the auxiliary message ensembles 1 through 12.
In each auxiliary ensemble the two least probable messages of the preceding column are combined in
a bracket; the arrow (->) marks the composite message so formed, and the asterisk (*) marks the
message discussed in the text.

Original     1      2      3      4      5      6      7      8      9     10     11     12
  0.20     0.20   0.20   0.20   0.20   0.20   0.20   0.20  ->0.24 ->0.36 ->0.40 ->0.60 ->1.00
  0.18     0.18   0.18   0.18   0.18   0.18   0.18  ->0.20   0.20   0.24   0.36   0.40
  0.10     0.10   0.10   0.10   0.10  ->0.14 ->0.18   0.18   0.20   0.20   0.24
  0.10     0.10   0.10   0.10   0.10   0.10   0.14    0.18   0.18   0.20
  0.10     0.10   0.10   0.10   0.10   0.10   0.10    0.14   0.18
  0.06     0.06  ->0.08  0.08  ->0.10  0.10   0.10    0.10
  0.06     0.06   0.06  ->0.08  0.08   0.10   0.10
 *0.04     0.04   0.06   0.06   0.08   0.08
  0.04     0.04   0.04   0.06   0.06
  0.04     0.04   0.04   0.04
  0.04     0.04   0.04
  0.03    ->0.04
  0.01
`
Now let us examine Table I. The left-hand column
`
`contains the ordered message probabilities of the ensem-
`ble to be coded. N is equal to 13. Since each combination
`of two messages (indicated by a bracket) is accompanied
`by the assigning of a new digit to each, then the total
`number of digits which should be assigned to each origi-
`nal message is the same as the number of combinations
`indicated for that message. For example, the message
`marked *, or a composite of which it is a part, is com-
`bined with others five times, and therefore should be
`
`assigned a code length of five digits.
`When there is no alternative in choosing the two least
`probable messages,
`then it is clear that the require-
`ments, established as necessary, are also sufficient for
`deriving an optimum code. There may arise situations
`in which a choice may be made between two or more
`groupings of least likely messages. Such a case arises, for
example, in the fourth auxiliary ensemble of Table I.
`Either of the messages of probability 0.08 could have
`been combined with that of probability 0.06. However,
`it is possible to rearrange codes in any manner among
`equally likely messages without affecting the average
`code length, and so a choice of either of the alternatives
`could have been made. Therefore, the procedure given is
`always sufficient to establish an optimum binary code.
`The lengths of all the encoded messages derived from
Table I are given in Table II.
`Having now determined proper lengths of code for
`each message,
`the problem of specifying the actual
`digits remains. Many alternatives exist. Since the com-
`bining of messages into their composites is similar to the
`successive confluences of trickles, rivulets, brooks, and
`
`creeks into a final large river, the procedure thus far de-
`scribed might be considered analogous to the placing of
`signs by a water-borne insect at each of these junctions
`as he journeys downstream. It should be remembered
`that the code which we desire is that one which the in-
`
`sect must remember in order to work his way back up-
`stream. Since the placing of the signs need not follow
`the same rule, such as “zero-right-returning,” at each
junction, it can be seen that there are at least 2^{12} (one
factor of two for each of the twelve junctions) different
ways of assigning code digits for our example.
TABLE II
RESULTS OF OPTIMUM BINARY CODING PROCEDURE

  i     P(i)    L(i)    P(i)L(i)    Code
  1     0.20     2        0.40      10
  2     0.18     3        0.54      000
  3     0.10     3        0.30      011
  4     0.10     3        0.30      110
  5     0.10     3        0.30      111
  6     0.06     4        0.24      0101
  7     0.06     5        0.30      00100
  8     0.04     5        0.20      00101
  9     0.04     5        0.20      01000
 10     0.04     5        0.20      01001
 11     0.04     5        0.20      00110
 12     0.03     6        0.18      001110
 13     0.01     6        0.06      001111

                        L_av = 3.42
`
`The code in Table II was obtained by using the digit
`0 for the upper message and the digit 1 for the lower
`message of any bracket. It is important to note in Table
`I that coding restriction (e) is automatically met as long
`as two messages (and not one) are placed in each
`bracket.
`
`3
`
`
`
`1952
`
`PROCEEDINGS OF THE I.R.E.
`
`1101
`
`GENERALIZATION OF THE METHOD
`
`Optimum coding of an ensemble of messages using
`three or more types of digits is similar to the binary
`coding procedure. A table of auxiliary message ensem-
`bles similar to Table I will be used. Brackets indicating
`messages combined to form composite messages will be
`used in the same way as was done in Table I. However,
`in order to satisfy restriction (e), it will be required that
`all these brackets, with the possible exception of one com-
`bining the least probable messages of the original ensem-
`ble, always combine a number of messages equal to D.
`It will be noted that the terminating auxiliary ensem-
`ble always has one unity probability message. Each pre-
ceding ensemble is increased in number by D-1 until
the first auxiliary ensemble is reached. Therefore, if N_1
is the number of messages in the first auxiliary ensemble,
then (N_1 - 1)/(D - 1) must be an integer. However,
N_1 = N - n_0 + 1, where n_0 is the number of the least prob-
able messages combined in a bracket in the original en-
semble. Therefore, n_0 (which, of course, is at least two
and no more than D) must be of such a value that
(N - n_0)/(D - 1) is an integer.
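Since this divisibility condition pins n_0 down, it can be computed directly. The helper below is an assumption of this reprint, not a formula stated in the paper; it returns the unique value between 2 and D for which (N - n_0) is divisible by D - 1.

    def n0(N, D):
        """Number of least probable messages combined in the first bracket of the
        original ensemble: the value in {2, ..., D} making (N - n0) divisible by D - 1."""
        return 2 + (N - 2) % (D - 1)

    print(n0(8, 4))    # 2, as in the Table III example below
    print(n0(13, 2))   # 2: in the binary case every bracket combines exactly two messages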
`In Table III an example is considered using an en-
`semble of eight messages which is to be coded with four
`
digits; n_0 is found to be 2. The code listed in the table is
`obtained by assigning the four digits 0, 1, 2, and 3, in or-
`der, to each of the brackets.
`
TABLE III
OPTIMUM CODING PROCEDURE FOR D = 4

Message probabilities (arrows as in Table I; the first bracket combines n_0 = 2 messages,
each later bracket combines D = 4).

Original        Auxiliary Ensembles
Ensemble       1        2        3        L(i)    Code
  0.22        0.22   ->0.40   ->1.00        1      1
  0.20        0.20     0.22                 1      2
  0.18        0.18     0.20                 1      3
  0.15        0.15     0.18                 2      00
  0.10        0.10                          2      01
  0.08        0.08                          2      02
  0.05      ->0.07                          3      030
  0.02                                      3      031
`
`ACKNOWLEDGMENTS
`
`The author is indebted to Dr. W. K. Linvill and Dr.
`R. M. Fano, both of the Massachusetts Institute of
`
Technology, for their helpful criticism of this paper.

* Decimal classification: R531.1. Original manuscript received by
the Institute, December 6, 1951.
† Massachusetts Institute of Technology, Cambridge, Mass.
¹ C. E. Shannon, “A mathematical theory of communication,”
Bell Sys. Tech. Jour., vol. 27, pp. 398-403; July, 1948.
² R. M. Fano, “The Transmission of Information,” Technical
Report No. 65, Research Laboratory of Electronics, M.I.T., Cam-
bridge, Mass.; 1949.
³ L. G. Kraft, “A Device for Quantizing, Grouping, and Coding
Amplitude-modulated Pulses,” Electrical Engineering Thesis, M.I.T.,
Cambridge, Mass.; 1949.
`
`Coding with Linear Systems*
JOHN P. COSTAS†, ASSOCIATE, IRE
`
Summary—Message transmission over a noisy channel is con-
sidered. Two linear networks are to be designed: one being used to
treat the message before transmission and the second to filter the
`treated message plus channel noise at the receiving end. The mean-
`square error between the actual transmission circuit output and the
`delayed message is minimized for a given allowable average signal
`power by proper network design. Numerical examples are given and
`discussed.
`
`1.
`
`INTRODUCTION
`
`ited to linear networks. In Fig. 1, network H(w) will
`be used to code the message before transmission and
`network G(w) will perform the necessary decoding. Net-
`work H(w) must be designed so that the message is
`predistorted or coded in such a way as to enable the de-
`coding or filtering network G(w) to give a better system
`output than would have been possible had the message
`itself been sent without modification.
`
`'1 (n
`
`NOISY TRANSMISSION
`CHANNEL
`
`no)III
`
`II
`
`l IuII1I
`
`Fig. 1——-Transmission system.
`
`Before going further, a criterion of performance must
`be chosen for the transmission system. That is, some
`measurable quantity must be decided upon to enable
`us to determine whether one particular network pair
`H(co) —G(w) is more satisfactory than some other pair. N0
`single performance criterion can be expected to apply in
`all situations and no such claim is made for the mean-
`square error measure of performance which is to be used
`
`4
`
`THE PROBLEM to be considered here is that of
`
`message transmission over a noisy channel. As
`shown in Fig. 1, a message function,f,,,(t), is to be
`sent down a channel into which a noise function, f,,(t),
`is introduced additively. The resultant system output
`is represented by fo(t). In most communication systems,
`the opportunity exists to code the message before its in-
`troduction into the transmission channel. Recently,
`Wiener, Shannon, and others have considered coding
`processes of a rather complex nature wherein the mes-
`sage function is sampled, quantized, and the resulting
`sample values converted into a pulse code for transmis-
`sion. Although this technique may be quite useful
`in
`many instances, its application will be restricted by the
`complexity of the terminal equipment required. In this
`discussion, the coding and decoding systems will be lim-
`
`” Decimal classification: R531.1XR143. Original manuscript re-
`ceived by the Institute, April 16, 1951; revised manuscript received
`May 3, 1952.
`1'General Electric C0,, Electronics Park, Syracuse, N. Y.
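As a purely illustrative companion to Fig. 1, added for this reprint, the Python sketch below simulates a discrete-time analogue of the system: an assumed coding filter H, additive channel noise, an assumed decoding filter G, and the mean-square error between the delayed message and the system output. The filter coefficients, noise level, and delay are invented for the illustration; the paper itself treats continuous-time networks H(ω) and G(ω) and designs the optimum pair subject to an average-signal-power constraint, which this sketch does not attempt.

    import random

    def convolve(x, h):
        """Discrete convolution of a signal x with a (short) filter h."""
        y = [0.0] * (len(x) + len(h) - 1)
        for n, xn in enumerate(x):
            for k, hk in enumerate(h):
                y[n + k] += xn * hk
        return y

    random.seed(0)
    message = [random.gauss(0.0, 1.0) for _ in range(2000)]   # f_m(t), an assumed random message
    noise   = [random.gauss(0.0, 0.3) for _ in range(2100)]   # f_n(t), added in the channel

    H = [0.7, 0.3]        # assumed coding-network impulse response, not from the paper
    G = [0.8, 0.2]        # assumed decoding-network impulse response, not from the paper
    delay = 1             # the output is compared with the message delayed by this many samples

    sent     = convolve(message, H)                           # coded message
    received = [s + n for s, n in zip(sent, noise)]           # channel adds the noise
    output   = convolve(received, G)                          # f_o(t)

    errors = [output[i + delay] - message[i] for i in range(len(message) - delay)]
    mse = sum(e * e for e in errors) / len(errors)
    print(mse)            # the quantity the paper proposes to minimize by proper choice of H and G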