McGRAW-HILL BOOK COMPANY
`New York
`St. Louis
`San Francisco
Düsseldorf
`Johannesburg
`Kuala Lumpur
`London
`Mexico
`Montreal
New Delhi
`Panama
`Paris
São Paulo
`Singapore
`Sydney
`Tokyo
`Toronto
`
A. BRUCE CARLSON
Associate Professor of Systems Engineering
Rensselaer Polytechnic Institute
`
`Communication
`Systems
`
`AN INTRODUCTION TO
`SIGNALS AND NOISE IN
`ELECTRICAL COMMUNICATION
`
SECOND EDITION
`
`This book was set in Times New Roman.
`The editors were Kenneth J. Bowman and Michael Gardner;
`the cover was designed by Pencils Portfolio, Inc.;
the production supervisor was Leroy A. Young.
The drawings were done by J & R Services, Inc.
Kingsport Press, Inc., was printer and binder.
`
Library of Congress Cataloging in Publication Data

Carlson, A. Bruce, date
Communication systems.
(McGraw-Hill electrical and electronic engineering series)
Bibliography: p.
1. Signal theory (Telecommunication)  2. Telecommunication.  I. Title.
TK5102.5.C3 1975    621.38'043    74-9841
ISBN 0-07-009957-X
`
COMMUNICATION SYSTEMS
An Introduction to
Signals and Noise in
Electrical Communication
Copyright © 1968, 1975 by McGraw-Hill, Inc. All rights reserved.
Printed in the United States of America. No part of this publication may be reproduced,
stored in a retrieval system, or transmitted, in any form or by any means,
electronic, mechanical, photocopying, recording, or otherwise,
without the prior written permission of the publisher.
`
`890KPKP 832
`
`To the memory of my father,
`ALBIN JOHN CARLSON
`
`9
`
`INFORMATION THEORY AND
`COMMUNICATION SYSTEMS
`
`The past several chapters have dealt with electrical communication primarily in
`terms of signals, both desired and undesired. We have devised signal models, examined
`the effects of networks on signals, and analyzed modulation as a means of signal
`transmission. Although many rewards and much insight have been gained by this
`approach, signal theory alone is not sufficient for a complete understanding of our
`subject matter, particularly when it comes to the design of new and improved systems.
What is needed is a more encompassing view of the communication process, a broader
perspective leading to basic principles for system design and comparison - in short,
a general theory of communication.
`Prior to the 1940s a few steps were taken toward such a theory in the telegraphy
`investigations of Nyquist and Hartley. But then, shortly after World War II, Claude
`Shannon (1948) and Norbert Wiener (1949) set forth new concepts that had and
`continue to have major impact. Taken together, the ideas of Wiener and Shannon
established the foundation of modern (statistical) communication theory. Both men
`were concerned with extracting information from a background of noise, and both
applied statistical concepts to the problem.† There were, however, differences in
`emphasis.
`
† Both men are also famous for other accomplishments. Wiener founded the subject
known as cybernetics, and Shannon first pointed out the relationship of Boolean
algebra to switching-circuit design - in his master's thesis!
`
`
`Wiener treated the case where the information-bearing signals are beyond the
`designer's control, in whole or part, all the processing being at the receiving end.
The problem then can be stated in this fashion: Given the set of possible signals, not
`of our choosing, plus the inevitable noise, how do we make the best estimate of the
`present and future values of the signal being received? Optimum solutions to this and
`similar problems are sought in the discipline known as detection theory.
`Shannon's work is more nearly akin to what we think of as communication,
`where signal processing can take place at both transmitter and receiver. Shannon
`posed this problem: given the set of possible messages a source may produce, not of
our choosing, how shall the messages be represented so as best to convey the information
over a given system with its inherent physical limitations? To handle this problem
in quite general terms it is necessary to concentrate more on the information
`per se than on the signals, and Shannon's approach was soon rechristened information
`theory.
`Information theory is a mathematical subject dealing with three basic concepts:
`the measure of information, the capacity of a communication channel to transfer
`information, and coding as a means of utilizing channels at full capacity. These
concepts are tied together in what can be called the fundamental theorem of information
theory, as follows.
`
`Given an information source and a communication channel, there exists a
`coding technique such that the information can be transmitted over the channel
at any rate less than the channel capacity and with arbitrarily small frequency of
`errors despite the presence of noise.
`
`The surprising, almost astonishing aspect of this theorem is error-free transmission
`on a noisy channel, a condition achieved through the use of coding. In essence,
`coding is used to match the source and channel for maximum reliable information
`transfer, roughly analogous to impedance matching for maximum power transfer.
But the study of coding is, by and large, tangential to our immediate aims.
Thus, with some reluctance, we limit this chapter primarily to the concepts of information
measure and channel capacity, with emphasis on the latter. By so doing we
`shall eventually arrive at answers to these significant questions:
`
`2
`
`J Precisely how do the fundamental physical limitations (i.e., bandwidth
`and noise) restrict information transmission?
`Is there such a thing as an ideal communication system, and, if so, what are
`its characteristics?
`3 How well do existing communication systems measure up to the ideal, and
`how can their performance be improved?
`
`Answers to these questions are certainly germane to electrical communication.
`
`342
`
`INFORMATION THEORY AND COMMUNICATION SYSTEMS
`
`9. 1
`
`INFORMATION MEASURE: ENTROPY 343
`
`They will be explored in some detail at the close of the chapter. But we must begin
`with information theory.
`
`9.1 INFORMATION MEASURE: ENTROPY
The crux of information theory is the measure of information. Here we are using information
as a technical term, not to be confused with its more conventional interpretations.
In particular, the information of information theory has little to do with
`knowledge or meaning, concepts which defy precise definition, to say nothing of
`quantitative measurement. In the context of communication, information is simply
`that which is produced by the source for transfer to the user. This implies that
`before transmission, the information was not available at the destination; otherwise
`the transfer would be zero. Pursuing this line of reasoning, consider the following
`somewhat contrived situation.
`A man is planning a trip to Chicago. To determine what clothes he should
`pack, he telephones the Chicago weather bureau and receives one of the following
`forecasts:
`
`The sun will rise.
`It will rain.
`There will be a tornado.
`
`Clearly, the amount of information gained from these messages is quite different.
`The first contains virtually no information, since we are reasonably sure in advance
that the sun will rise; there is no uncertainty about this, and the call has been wasted.
But the forecast of rain does provide information not previously available to the
traveler, for rain is not an everyday occurrence. The third forecast contains even more
`information, tornadoes being relatively rare and unexpected events.
`Note that the messages have been listed in order of decreasing likelihood and
`increasing information. The less likely the message, the more information it conveys
to the user. We are thus inclined to say that information measure is related to uncertainty,
the uncertainty of the user as to what the message will be. Moreover, the
`amount of information depends only on the message uncertainty, rather than its
`actual content or possible interpretations. Had the Chicago weather forecast been
`"The sun will rain tornadoes," it would convey information, being quite unlikely, but
`not much meaning.
Alternately, going to the transmitting end of a communication system, information
measure is an indication of the freedom of choice exercised by the source in selecting
a message. If the source can freely choose from many different messages, the
`user is highly uncertain as to which message will be selected. But if there is no choice
`at all, only one possible message, there is no uncertainty and hence no information.
`
Whether one prefers the uncertainty viewpoint or the freedom-of-choice interpretation,
it is evident that the measure of information involves probabilities. Messages
`of high probability, indicating little uncertainty on the part of the user or little choice
`on the part of the source, convey a small amount of information, and vice versa.
`This notion is formalized by defining self-information in terms of probability.
`
`Self-Information
`
`
Consider a source that produces various messages. Let one of the messages be
designated A, and let $P_A$ be the probability that A is selected for transmission. Consistent
with our discussion above, we write the self-information associated with A as

$$I_A = f(P_A)$$

where the function f( ) is to be determined. As a step toward finding f( ), intuitive
reasoning suggests that the following requirements be imposed:

$$f(P_A) \geq 0 \qquad \text{where } 0 \leq P_A \leq 1 \tag{1}$$

$$\lim_{P_A \to 1} f(P_A) = 0 \tag{2}$$

$$f(P_A) > f(P_B) \qquad \text{for } P_A < P_B \tag{3}$$

The student should have little trouble interpreting these requirements.
Many functions satisfy Eqs. (1) to (3). The final and deciding factor comes from
considering the transmission of independent messages. When message A is delivered,
the user receives $I_A$ units of information. If a second message B is also delivered,
the total information received should be the sum of the self-informations, $I_A + I_B$.
This summation rule is readily appreciated if we think of A and B as coming from
different sources. But suppose both messages come from the same source; we can
then speak of the compound message C = AB. If A and B are statistically independent,
$P_C = P_A P_B$ and $I_C = f(P_A P_B)$. But the received information is still $I_C = I_A + I_B = f(P_A) + f(P_B)$, and therefore

$$f(P_A P_B) = f(P_A) + f(P_B) \tag{4}$$

which is our final requirement for f( ).
There is one and only one function† satisfying the conditions (1) to (4), namely,
the logarithmic function f( ) = -log_b ( ), where b is the logarithmic base. Thus
self-information is defined as

$$I_A \triangleq -\log_b P_A = \log_b \frac{1}{P_A} \tag{5}$$

† See Ash (1965, chap. 1) for proof.
`
`344
`
`INFORMATION THEORY AND COMMUNICATION SYsnMS
`
`9.1
`
`INFORMATION MEASURE: ENTROPY 345
`
where b is unspecified for the moment. The minus sign in $-\log_b P_A$ is perhaps
disturbing at first glance. But, since probabilities are bounded by $0 \leq P_A \leq 1$, the
negative of the logarithm is positive, as desired. The alternate form $\log_b (1/P_A)$ helps
avoid confusion on this score, and will be used throughout.

Specifying the logarithmic base b is equivalent to selecting the unit of information.
While common or natural logarithms (b = 10 or b = e) seem obvious candidates,
the standard convention of information theory is to take b = 2. The corresponding
unit of information is termed the bit, a contraction for binary digit suggested by
J. W. Tukey. Thus

$$I_A = \log_2 \frac{1}{P_A} \qquad \text{bits}$$

The reasoning behind this rather strange convention goes like this. Information is a
measure of choice exercised by the source; the simplest possible choice is that between
two equiprobable messages, i.e., an unbiased binary choice. The information unit is
therefore normalized to this lowest-order situation, and 1 bit of information is the
amount required or conveyed by the choice between two equally likely possibilities,
i.e., if $P_A = P_B = 1/2$, then $I_A = I_B = \log_2 2 = 1$ bit.

Binary digits enter the picture simply because any two things can be represented
by the two binary digits 0 and 1. Note, however, that one binary digit may convey
more or less than 1 bit of information, depending on the probabilities. To prevent
misinterpretation, binary digits as message elements are called binits in this chapter.

Since tables of base 2 logarithms are relatively uncommon, the following
conversion relationship is needed:

$$\log_2 v = \log_2 10 \log_{10} v \approx 3.32 \log_{10} v \tag{6}$$

Thus, if $P_A = 1/10$, $I_A = 3.32 \log_{10} 10 = 3.32$ bits. In the remainder of this chapter,
all logarithms will be base 2 unless otherwise indicated.
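The computation in Eqs. (5) and (6) amounts to a single logarithm; the short Python sketch below (the function name self_information is chosen merely for illustration) reproduces the numbers just quoted.

```python
from math import log2

def self_information(p):
    """Self-information I = log2(1/p) in bits, per Eq. (5) with b = 2."""
    if not 0 < p <= 1:
        raise ValueError("probability must lie in (0, 1]")
    return log2(1 / p)

print(self_information(0.1))   # 3.32 bits, the P_A = 1/10 case checked via Eq. (6)
print(self_information(0.5))   # 1.0 bit: an unbiased binary choice
print(self_information(1.0))   # 0.0 bits: a certain message conveys no information
```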
Example 9.1 The Information in a Picture

It has often been said that one picture is worth a thousand words. With a little stretching,
information measure supports this old saying.

For analysis we decompose the picture into a number of discrete dots, or elements,
each element having a brightness level ranging in steps from black to white.
The standard television image, for instance, has about $500 \times 600 = 3 \times 10^5$ elements
and eight easily distinguishable levels. Hence, there are $8 \times 8 \times \cdots = 8^{3 \times 10^5}$ possible
pictures, each with probability $P = 8^{-(3 \times 10^5)}$ if selected at random. Therefore

$$I = \log 8^{3 \times 10^5} = 3 \times 10^5 \log 8 \approx 10^6 \text{ bits}$$
Alternately, assuming the levels to be equally likely, the information per element is
$\log 8 = 3$ bits, for a total of $3 \times 10^5 \times 3 \approx 10^6$ bits, as before.

But what about the thousand words? Suppose, for the sake of argument, that a
vocabulary consists of 100,000 equally likely words. The probability of any one
word is then $P = 10^{-5}$, so the information contained in 1,000 words is

$$I = 1{,}000 \log 10^5 = 10^3 \times 3.32 \log_{10} 10^5 \approx 2 \times 10^4 \text{ bits}$$

or substantially less than the information in one picture.

The validity of the above assumptions is of course open to question; the point
of this example is the method, not the results.    ////
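The arithmetic of the example is easily checked; a minimal Python sketch, using the picture and vocabulary assumptions stated above:

```python
from math import log2

elements = 3e5      # picture elements, about 500 x 600
levels = 8          # distinguishable brightness levels per element
picture_bits = elements * log2(levels)    # 9e5, on the order of 10**6 bits

words = 1000
vocabulary = 1e5    # equally likely words, as assumed in the text
word_bits = words * log2(vocabulary)      # about 1.7e4, on the order of 2 x 10**4 bits

print(picture_bits, word_bits)            # the picture carries far more information
```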
`10
`' possible
`•
`pictures, each with probability P = s- <Jx 10'> if selected at random. Therefore
`
`J = Jog 83 ' 1°' = 3 x 105 log 8::::, 106 bits
`
which, when divided by N, yields the average information per symbol. We therefore
define the entropy of a discrete source as

$$\mathscr{H} \triangleq \sum_{j=1}^{m} P_j I_j = \sum_{j=1}^{m} P_j \log \frac{1}{P_j} \qquad \text{bits/symbol} \tag{7}$$

It should be observed that Eq. (7) is an ensemble average. If the source is nonstationary,
the symbol probabilities may change with time and the entropy is not very meaningful.
We shall henceforth assume that information sources are ergodic, so that time
and ensemble averages are identical.

The name entropy and its symbol $\mathscr{H}$ are borrowed from a similar equation in
statistical mechanics. Because of the mathematical similarity, various attempts have
`
`346
`
`INFORMATION lREORY AND COMMUNICATION SYSTEMS
`
`9.1
`
`INFORMATION MEASURE: ENTROPY 347
`
been made to relate communication entropy with thermodynamic entropy.† However,
`the attempted relationships seem to cause more confusion than illumination, and it is
`perhaps wiser to treat the two entropies as different things with the same name.
For this reason the alternate designation comentropy has been suggested for communication
entropy.

† See Brillouin (1956).
`But what is the meaning of communication entropy as written in Eq. (7)?
`Simply this: although one cannot say which symbol the source will produce next,
on the average we expect to get $\mathscr{H}$ bits of information per symbol or $N\mathscr{H}$ bits in a
message of N symbols, if N is large.
For a fixed alphabet size (fixed m) the entropy of a discrete source depends on the
symbol probabilities but is bounded by

$$0 \leq \mathscr{H} \leq \log m \tag{8}$$
`
These extreme limits are readily interpreted and warrant further discussion. The
lower limit, $\mathscr{H} = 0$, implies that the source delivers no information (on the average),
and hence there is no uncertainty about the message. We would expect this to correspond
to a source that continually produces the same symbol; i.e., all symbol probabilities
are zero, save for one symbol having P = 1. It is easily shown that $\mathscr{H} = 0$ in
this case. At the other extreme, the maximum entropy must correspond to maximum
uncertainty or maximum freedom of choice. This implies that all symbols are equally
likely; there is no bias, no preferred symbol. A little further thought reveals that the
symbol probabilities must be the same; i.e., $\mathscr{H} = \mathscr{H}_{\max} = \log m$ when all $P_j = 1/m$,
which has particular significance for our later work.
The variation of $\mathscr{H}$ between the limits of Eq. (8) is best illustrated by considering
a binary source (m = 2). The symbol probabilities are then related and can be written
as P and 1 - P. Thus

$$\mathscr{H} = P \log \frac{1}{P} + (1 - P) \log \frac{1}{1 - P} \tag{9}$$
`
which is plotted versus P in Fig. 9.1. Note the rather broad maximum centered at
P = 0.5, the equally likely case, where $\mathscr{H} = \log 2 = 1$ bit.

FIGURE 9.1  Entropy of a binary source versus the probability of one symbol.
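Equations (7) to (9) are readily evaluated numerically; the sketch below (the helper entropy is not a standard routine, merely an illustration) confirms the limits of Eq. (8) and tabulates the binary-source entropy plotted in Fig. 9.1.

```python
from math import log2

def entropy(probs):
    """Source entropy, Eq. (7): sum of P_j * log2(1/P_j), in bits/symbol."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

# Eq. (8): 0 <= H <= log2(m), with the maximum for equally likely symbols
print(entropy([0.25] * 4))             # 2.0 = log2(4)
print(entropy([1.0, 0.0, 0.0, 0.0]))   # 0.0, no uncertainty at all

# Eq. (9): binary source; note the broad maximum of 1 bit at P = 0.5
for P in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(P, round(entropy([P, 1 - P]), 3))
```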
`Bringing the time element into the picture, suppose two sources have equal
`entropies but one is "faster" than the other, producing more symbols per unit time.
`In a given period, more information must be transferred from the faster source than
`from the slower, which obviously places greater demands on the communication
`system. Thus, for our purposes, the description of a source is not its entropy alone
`but its entropy rate, or information rate, in bits per second. The entropy rate of a
discrete source is simply defined as

$$\mathscr{R} \triangleq \frac{\mathscr{H}}{\bar{\tau}} \qquad \text{bits/s} \tag{10}$$

where $\bar{\tau}$ is the average symbol duration, namely,

$$\bar{\tau} = \sum_{j=1}^{m} P_j \tau_j \tag{11}$$

so $1/\bar{\tau}$ equals the average number of symbols per unit time.
`
EXERCISE 9.1  Calculate the entropy rate of a telegraph source having $P_{\text{dot}} = 2/3$,
$\tau_{\text{dot}} = 0.2$ s, $P_{\text{dash}} = 1/3$, $\tau_{\text{dash}} = 0.4$ s.  Ans.: $\mathscr{R} = 3.44$ bits/s.
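A quick computational check of this exercise, using Eqs. (10) and (11) with the dot and dash probabilities given above:

```python
from math import log2

# Exercise 9.1: dot/dash telegraph source
probs     = [2/3, 1/3]    # P_dot, P_dash
durations = [0.2, 0.4]    # tau_dot, tau_dash in seconds

H   = sum(p * log2(1 / p) for p in probs)             # about 0.918 bits/symbol, Eq. (7)
tau = sum(p * t for p, t in zip(probs, durations))    # about 0.267 s, Eq. (11)
print(round(H / tau, 2))                              # 3.44 bits/s, Eq. (10)
```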
`
`Example 9.2 Entropy and PCM
`
As an example of entropy applied to our earlier studies, consider a PCM system whose
input is the continuous signal x(t) bandlimited in W = 50 Hz. Suppose x(t) is sampled
at the Nyquist rate $f_s = 2W$, and let there be four quantum levels such that the quantized
values have probabilities 1/2, 1/4, 1/8, and 1/8. Identifying each possible quantized
value as a "symbol," the output of the quantizer then looks like a discrete source
with m = 4 and

$$\mathscr{H} = \tfrac{1}{2} \log 2 + \tfrac{1}{4} \log 4 + \tfrac{1}{8} \log 8 + \tfrac{1}{8} \log 8 = 1.75 \text{ bits/symbol}$$

Since the symbol rate is $1/\bar{\tau} = f_s = 100$,

$$\mathscr{R} = 100 \times 1.75 = 175 \text{ bits/s}$$

Thus, it should be possible to represent this same information by equiprobable
binary digits (binits) generated at a rate of 175 binits/s.
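The numbers of this example are easily verified; a minimal sketch:

```python
from math import log2

fs = 100                        # samples per second: Nyquist rate 2W with W = 50 Hz
probs = [1/2, 1/4, 1/8, 1/8]    # quantum-level probabilities

H = sum(p * log2(1 / p) for p in probs)   # 1.75 bits/symbol, Eq. (7)
print(H, fs * H)                          # 1.75  175.0 bits/s, Eq. (10) with 1/tau = fs
```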
`
`348
`
`INFORMATION THEORY AND COMMUNICATION SYSTEMS
`
`9.1
`
`INFORMATION MEASURE: ENTROPY 349
`
`To check this idea, suppose that the system is in fact binary PCM with the
`quantized samples transmitted as coded binary pulses. Labeling the quantum levels
by the code numbers 0, 1, 2, and 3, let us assume the following coding:
`
Code number    Probability    Binary code
     0             1/2            00
     1             1/4            01
     2             1/8            10
     3             1/8            11
`
`Since there are two binary digits for each sample, the PCM signal has 200 pulses per
second. Now we argued that 175 binary digits per second should be sufficient; yet
200 per second is required with this code.

The anomaly is quickly resolved by noting that the binary digits are not equally
likely; in fact, $P_0 = 11/16$ and $P_1 = 5/16$, as the reader can verify. Clearly, the suggested
`code is not optimum. On the other hand, it is simple and reasonably efficient.
Pressing onward, the binit rate and probabilities given above suggest that the
information rate at the encoder output is

$$\mathscr{R} = 200\left(\tfrac{11}{16} \log \tfrac{16}{11} + \tfrac{5}{16} \log \tfrac{16}{5}\right) = 200 \times 0.897 = 179 \text{ bits/s}$$

Again something is wrong, for the information rate into the encoder is only 175 bits/s.
`Surely direct encoding does not add information!
To explain the discrepancy we must recall that the entropy equation (7) is based
on statistically independent symbols. And, while it may be true that successive
quantized samples are independent, the successive binary pulses are not, because we
have encoded in groups of two. For the case of dependent symbols, we must modify
the measure of information and introduce conditional entropy.    ////
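The binit probabilities and the apparent 179 bits/s are easily verified; a minimal sketch with the code table above built in:

```python
from math import log2

level_probs = [1/2, 1/4, 1/8, 1/8]
codes       = ['00', '01', '10', '11']

# Probability that a transmitted binit is a 0 (each sample contributes two binits)
p0 = sum(p * code.count('0') / 2 for p, code in zip(level_probs, codes))
p1 = 1 - p0
print(p0, p1)                                  # 0.6875 = 11/16 and 0.3125 = 5/16

# Pretending the binits were independent gives the inflated figure of the text
H_binit = p0 * log2(1 / p0) + p1 * log2(1 / p1)
print(round(200 * H_binit, 1))                 # about 179 bits/s, versus the true 175
```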
`
`Conditional Entropy and Redundancy
`
Discrete sources are often constrained by certain rules which limit the choice in selecting
successive symbols. The resulting intersymbol influence reduces uncertainty and
thereby reduces the amount of information produced. We account for this effect by
using conditional probabilities and conditional entropy.
`Written text, being governed by rules of spelling and grammar, is a good
`example of intersymbol influence. On a relative-frequency basis, the probability of
U in printed English is $P_U = 0.02$; but if the previous letter is Q, the conditional
probability of U given Q is $P(U\,|\,Q) \approx 1$, whereas in contrast $P(U\,|\,W) < 0.001$.
The influence may well extend over several symbols, phrases, or even complete
sentences, as illustrated by the fact that in most textbooks, including this one,
$P(\text{THAT}\,|\,\text{IT CAN BE SHOWN}) \approx 1$.
`The expression for conditional entropy therefore is formulated by considering
the entire past history of the source - more precisely, all possible past histories.
Thus, if j represents the next symbol (or group of symbols) and i represents the preceding
sequence, the information conveyed by j given i is $\log\,[1/P(j\,|\,i)]$. Averaging
over all j's and i's gives the conditional entropy

$$\mathscr{H}_c = \sum_i \sum_j P_i\, P(j\,|\,i) \log \frac{1}{P(j\,|\,i)} \tag{12}$$
`
In general $\mathscr{H}_c \leq \mathscr{H}$; the equality applies only when the symbols are independent and
$P(j\,|\,i) = P_j$.
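Equation (12) is easily evaluated once the conditional probabilities are specified. The sketch below uses a hypothetical two-symbol source (the transition probabilities are invented for illustration, not taken from the text) to show the conditional entropy falling below the entropy computed as if the symbols were independent.

```python
from math import log2

# Hypothetical binary source with intersymbol influence:
# P_cond[(i, j)] is the conditional probability P(j | i) of symbol j following symbol i.
P_cond = {('A', 'A'): 0.9, ('A', 'B'): 0.1,
          ('B', 'A'): 0.3, ('B', 'B'): 0.7}
P = {'A': 0.75, 'B': 0.25}      # corresponding stationary symbol probabilities

H  = sum(p * log2(1 / p) for p in P.values())               # Eq. (7), ignoring dependence
Hc = sum(P[i] * P_cond[(i, j)] * log2(1 / P_cond[(i, j)])   # Eq. (12)
         for i in P for j in P)
print(round(H, 3), round(Hc, 3))   # about 0.811 and 0.572: Hc < H, as expected
```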
`A source producing dependent symbols is said to be redundant, meaning that
`symbols are generated which are not absolutely essential to convey the information.
(Is it really necessary to explicitly write the U following every Q in English?) The
`redundancy of English text is estimated to be roughly 50 percent. This implies that
in the long run, half the symbols are unnecessary: yu shld b abl t read ths evn tho
sevrl ltrs r msng. The reader may wish to ponder the observation that without
`redundancy, abbreviation would be impossible, and any two-dimensional array of
`letters would form a valid crossword puzzle.
`For very long passages of printed English, the conditional entropy may be as
`low as 0.5 to 1.0 bits/symbol because of contextual inferences. Thus, with suitable
`coding, printed English theoretically could be transmitted in binary form with an
`average of one binary digit per symbol. Contrast this with existing teletype systems
`that use five binary digits per character.
`From the viewpoint of efficient communication, redundancy in a message is
undesirable; the same information could be sent with fewer nonredundant (independent)
symbols. Thus, coding to reduce intersymbol influence is a method of improving
efficiency. On the other hand, redundancy is a definite aid in resolving ambiguities
`if the message is received with errors, a not uncommon phenomenon in telegraphy,
`for example. Indeed, coding for error protection is based on the insertion of redundant
`symbols.
Optimum transmission therefore entails coding to reduce the inefficient redundancy
of the message, plus coding to add "efficient" redundancy for error control.
Much has been done in the area of error-detecting and error-correcting codes, but
coding to reduce message redundancy is far more difficult, and relatively little has
`been accomplished in this direction. (In retrospect, the bandwidth-reduction feature
`of differential PCM relies on the redundancy of the input signal.)
`
`350
`
`INFORMATION THEORY AND COMMUNICATION SYSTEMS
`
`9.2 CHANNEL CAPACITY AND DISCRETE CKANNELS 35 (
`
`Continuous Information Sources
`
`Having discussed discrete sources at some length, the next logical step would be
`the definition of entropy for continuous sources, sources whose messages are
`continuously varying functions of time. Such a definition is possible but will not be
`presented here. For one reason, the mathematics gets rather complicated and tends
`to obscure physical interpretation; for another, the entropy of a continuous source
turns out to be a relative measure instead of an absolute measure of information.
Fortunately, the goals of this chapter can be achieved by sticking to the discrete
formulation, and most of our conclusions will apply to continuous sources with minor
modification. But more important, we shall find that because of the fundamental
physical limitations, communication is inherently a discrete process regardless of the
source. This striking conclusion is one of Shannon's principal contributions to the
`theory of communication, but it was noted by Hartley as far back as 1928.
`
`9.2 CHANNEL CAPACITY AND DISCRETE CHANNELS
We have seen that it is often convenient to treat the terminal equipment of a communication
system as being perfect (noise-free, distortionless, etc.) and think of all undesired
effects as taking place in the channel. The communication channel is therefore an
abstraction, a model representing the vehicle of transmission plus all phenomena that
tend to restrict transmission. The fact that there are fundamental physical limitations
to information transfer by electrical means leads to the notion of channel capacity.

Just as entropy rate measures the amount of information produced by a source
in a given time, capacity is a measure of the amount of information a channel can
transfer per unit time. Channel capacity is symbolized by $\mathscr{C}$, and its units are bits per
second. Restating the fundamental theorem in terms of $\mathscr{R}$ and $\mathscr{C}$, we have:

Given a channel of capacity $\mathscr{C}$ and a source having entropy rate $\mathscr{R}$, then if
$\mathscr{R} \leq \mathscr{C}$, there exists a coding technique such that the output of the source can
be transmitted over the channel with an arbitrarily small frequency of errors,
despite the presence of noise. If $\mathscr{R} > \mathscr{C}$, it is not possible to transmit without
errors.
`
`Although we shall attempt to make the theorem plausible, complete proof
involves a great deal of coding theory and is omitted here. Instead we shall concentrate
on aspects more pertinent to electrical communication.
`
`Channel Capacity
`
`The fundamental theorem implicitly defines channel capacity as the maximum rate
`at which the channel supplies reliable information to the destination. With this
interpretation in mind we can formulate a general expression for capacity by means
`of the following argument.
`
Consider all the different messages of length T a source might produce. If the
channel is noisy, it will be difficult to decide at the receiver which particular message
was intended and the goal of information transfer is partially defeated. But suppose
`we restrict the messages to only those that are "very different" from each other,
`such that the received message can be correctly identified with sufficiently small
probability of error. Let M(T) be the number of these very different messages of
length T.

Now, insofar as the destination or user is concerned, the source-plus-channel
`combination may be regarded as a new source generating messages at the receiving
`end. With the above message restriction, this equivalent source is discrete and has an
`alphabet of size M(T). Correspondingly, the maximum entropy produced by the
`equivalent source is log M(T), and the maximum entropy rate at the destination is
$(1/T) \log M(T)$. Hence, letting $T \to \infty$ to ensure generality,
`
$$\mathscr{C} = \lim_{T \to \infty} \frac{1}{T} \log M(T) \qquad \text{bits/s} \tag{1}$$

an alternate definition for channel capacity. The following discussion of discrete
channels shows that Eq. (1) is an intuitively meaningful definition.
`
Discrete Noiseless Channels
`
A discrete channel is one that transmits information by successively assuming various
disjoint electrical states - voltage levels, instantaneous frequency, etc. Let $\mu$ be the
number of possible states and r the signaling rate in states per unit time. If the signal-to-noise
ratio is sufficiently large, the error probability at the receiver can be extremely
small, so small that to all intents and purposes the channel is deemed to be noiseless.
Under this assumption, any sequence of symbols will be correctly identified and the
`capacity calculation is straightforward.
A received message of length T will consist of rT symbols, each symbol being
one of the $\mu$ possible states. The number of different messages is thus $M(T) = \mu^{rT}$,
and hence

$$\mathscr{C} = \lim_{T \to \infty} \frac{1}{T} \log \mu^{rT} = \lim_{T \to \infty} \frac{rT \log \mu}{T} = r \log \mu \qquad \text{bits/s} \tag{2}$$

The capacity of a noiseless discrete channel is therefore proportional to the signaling
rate and the logarithm of the number of states. For a binary channel ($\mu = 2$) the
capacity is numerically equal to the signaling rate, that is, $\mathscr{C} = r$.

According to Eq. (2), one can double channel capacity by doubling the signaling
speed (which is certainly reasonable) or by squaring the number of states (somewhat
more subtle to appreciate). In regard to the latter, suppose two identical but independent
channels are operated in parallel so their combined capacity is clearly $2(r \log \mu) =
2r \log \mu = r \log \mu^2$. At the output we can say that we are receiving 2r symbols per
second, each symbol being drawn from an alphabet of size $\mu$; or the output can be
viewed as r compound symbols per second, each being drawn from an alphabet of size
$\mu \times \mu = \mu^2$.
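Equation (2) is a one-line computation; the sketch below (the values r = 1000 and mu = 4 are illustrative only) shows the two ways of doubling capacity just mentioned.

```python
from math import log2

def capacity(r, mu):
    """Noiseless discrete channel, Eq. (2): C = r log2(mu) in bits/s."""
    return r * log2(mu)

r, mu = 1000, 4                 # illustrative numbers only
print(capacity(r, mu))          # 2000.0 bits/s
print(capacity(2 * r, mu))      # doubling the signaling rate doubles the capacity
print(capacity(r, mu ** 2))     # squaring the number of states also doubles it
```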
The following example illustrates a coding technique that achieves $\mathscr{R} = \mathscr{C}$ on a
noiseless discrete channel.
`
`Example 9.3
`
`Consider the PCM source in Example 9.2. In theory a noiseless binary channel will
suffice to convey the information if $r \geq 175$ binits/s. To minimize r we cannot use
`the previous code because it requires two binary digits per source symbol and r = 200.
`A more efficient code is as follows.
`
Code number    Probability    Binary code
     0             1/2            0
     1             1/4            10
     2             1/8            110
     3             1/8            111
`
With this code, a message containing $N \gg 1$ source symbols requires the transmission
of $N/2 + 2(N/4) + 3(N/8) + 3(N/8) = 1.75N$ channel symbols, that is, 1.75
binary digits per source symbol. The required signaling rate is then $r = 1.75/\bar{\tau} = 175$,
and we have achieved transmission at $\mathscr{R} = \mathscr{C}$; the encoding has produced a perfect
`match between source and channel.
`Although this example is admittedly a special case, there are two general hints
`about efficient encoding to be gained from it. First, the code is such that the channel
symbols 0 and 1 are equally likely and statistically independent. (The skeptical reader
should verify this.) Second, the source symbol with the highest probability is assigned
the shortest code word, and so forth down the line to the least probable symbol,
which gets the longest code. Systematic procedures for devising such codes are
described in the literature.†
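The claims of this example are easily checked numerically; a minimal sketch using the code table above:

```python
level_probs = [1/2, 1/4, 1/8, 1/8]
codes       = ['0', '10', '110', '111']

# Average code length: 1.75 binits per source symbol, so r = 1.75 x 100 = 175
avg_len = sum(p * len(c) for p, c in zip(level_probs, codes))
print(avg_len)

# Fraction of 0s among the transmitted binits: 0.5, so 0 and 1 are equally likely
# (independence of successive binits takes a slightly longer argument)
zeros = sum(p * c.count('0') for p, c in zip(level_probs, codes))
print(zeros / avg_len)
```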
`Somewhat parenthetically we might note that assigning shorter code words to
`the more probable symbols is just common sense. Over a century ago, long before
Shannon, Samuel Morse constructed h