`
Communication in the Presence of Noise

CLAUDE E. SHANNON, MEMBER, IRE
`
`Classic Paper
`
`A method is developed for representing any communication
`system geometrically. Messages and the corresponding signals are
`points in two “function spaces,” and the modulation process is a
`mapping of one space into the other. Using this representation, a
number of results in communication theory are deduced concerning
expansion and compression of bandwidth and the threshold
`effect. Formulas are found for the maximum rate of transmission
`of binary digits over a system when the signal is perturbed by
`various types of noise. Some of the properties of “ideal” systems
`which transmit at this maximum rate are discussed. The equivalent
`number of binary digits per second for certain information sources
`is calculated.
`
`I.
`
`INTRODUCTION
A general communications system is shown schematically
in Fig. 1. It consists essentially of five elements.
1) An Information Source: The source selects one message
from a set of possible messages to be transmitted to
`the receiving terminal. The message may be of various
`types; for example, a sequence of letters or numbers, as
in telegraphy or teletype, or a continuous function of time
$f(t)$, as in radio or telephony.
`2) The Transmitter: This operates on the message in
`some way and produces a signal suitable for transmission
`to the receiving point over the channel. In telephony, this
`operation consists of merely changing sound pressure into
a proportional electrical current. In telegraphy, we have
an encoding operation which produces a sequence of dots,
`dashes, and spaces corresponding to the letters of the
`message. To take a more complex example, in the case of
`multiplex PCM telephony the different speech functions
`must be sampled, compressed, quantized and encoded, and
`finally interleaved properly to construct the signal.
`3) The Channel: This is merely the medium used to
`transmit the signal from the transmitting to the receiving
`point. It may be a pair of wires, a coaxial cable, a band
`of radio frequencies, etc. During transmission, or at the
`receiving terminal, the signal may be perturbed by noise
`or distortion. Noise and distortion may be differentiated on
`the basis that distortion is a fixed operation applied to the
signal, while noise involves statistical and unpredictable
perturbations. Distortion can, in principle, be corrected by
applying the inverse operation, while a perturbation due to
noise cannot always be removed, since the signal does not
always undergo the same change during transmission.

Fig. 1. General communications system.

This paper is reprinted from the PROCEEDINGS OF THE IRE, vol. 37, no.
1, pp. 10–21, Jan. 1949.
Publisher Item Identifier S 0018-9219(98)01299-7.
`4) The Receiver: This operates on the received signal
`and attempts to reproduce, from it, the original message.
`Ordinarily it will perform approximately the mathematical
`inverse of the operations of the transmitter, although they
`may differ somewhat with best design in order to combat
`noise.
`5) The Destination: This is the person or thing for whom
`the message is intended.
Following Nyquist¹ and Hartley,² it is convenient to use
a logarithmic measure of information. If a device has $n$
possible positions it can, by definition, store $\log_b n$ units of
information. The choice of the base $b$ amounts to a choice
of unit, since $\log_b n = \log_b c \, \log_c n$. We will use the base
2 and call the resulting units binary digits or bits. A group
of $m$ relays or flip-flop circuits has $2^m$ possible sets of
positions, and can therefore store $\log_2 2^m = m$ bits.
If it is possible to distinguish reliably $M$ different signal
functions of duration $T$ on a channel, we can say that the
channel can transmit $\log_2 M$ bits in time $T$. The rate of
transmission is then $(\log_2 M)/T$. More precisely, the channel
capacity may be defined as

$$C = \lim_{T \to \infty} \frac{\log_2 M}{T}. \tag{1}$$
`
`1 H. Nyquist, “Certain factors affecting telegraph speed,” Bell Syst. Tech.
`J., vol. 3, p. 324, Apr. 1924.
`2 R. V. L. Hartley, “The transmission of information,” Bell Syst. Tech.
J., vol. 3, pp. 535–564, July 1928.
`
`PROCEEDINGS OF THE IEEE, VOL. 86, NO. 2, FEBRUARY 1998
`
`
0018–9219/98$10.00 © 1998 IEEE
`
`IPR2018-01473
`Apple v. INVT
`INVT Exhibit 2001 - Page 1
`
`
`
A precise meaning will be given later to the requirement
of reliable resolution of the $M$ signals.
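
The logarithmic measure and the rate $(\log_2 M)/T$ can be tried out numerically. The following is a minimal sketch; the function names `stored_bits` and `transmission_rate` and the numbers used are illustrative assumptions, not part of the paper.

```python
import math

def stored_bits(positions: int) -> float:
    """Information, in bits, stored by a device with `positions` distinguishable states."""
    if positions < 1:
        raise ValueError("a device needs at least one position")
    return math.log2(positions)

def transmission_rate(num_signals: int, duration_s: float) -> float:
    """Rate log2(M) / T in bits per second for M distinguishable signals of duration T."""
    return math.log2(num_signals) / duration_s

# A group of m relays has 2**m joint positions and therefore stores m bits.
m = 8
print(stored_bits(2 ** m))               # 8.0

# Changing the base only rescales the unit: log_b n = log_b c * log_c n.
n = 1000
print(math.isclose(math.log10(n), math.log10(2) * math.log2(n)))  # True

# 2**20 distinguishable signals, each lasting half a second (assumed values).
print(transmission_rate(2 ** 20, 0.5))   # 40.0
```

The capacity in (1) is the limit of this rate as the duration grows, which a finite computation can only approximate.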
`
II. THE SAMPLING THEOREM
Let us suppose that the channel has a certain bandwidth $W$
in cps starting at zero frequency, and that we are allowed
to use this channel for a certain period of time $T$. Without
any further restrictions this would mean that we can use
as signal functions any functions of time whose spectra lie
entirely within the band $W$, and whose time functions lie
within the interval $T$. Although it is not possible to fulfill
both of these conditions exactly, it is possible to keep the
spectrum within the band $W$, and to have the time function
very small outside the interval $T$. Can we describe in a more
useful way the functions which satisfy these conditions?
One answer is the following.
Theorem 1: If a function $f(t)$ contains no frequencies
higher than $W$ cps, it is completely determined by giving
its ordinates at a series of points spaced $1/(2W)$ seconds
apart.
This is a fact which is common knowledge in the
communication art. The intuitive justification is that, if $f(t)$
contains no frequencies higher than $W$, it cannot change to
a substantially new value in a time less than one-half cycle
of the highest frequency, that is, $1/(2W)$. A mathematical
proof showing that this is not only approximately, but
exactly, true can be given as follows. Let $F(\omega)$ be the
spectrum of $f(t)$. Then

$$f(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} F(\omega) e^{i\omega t} \, d\omega \tag{2}$$

$$= \frac{1}{2\pi} \int_{-2\pi W}^{2\pi W} F(\omega) e^{i\omega t} \, d\omega \tag{3}$$

since $F(\omega)$ is assumed zero outside the band $W$. If we let

$$t = \frac{n}{2W} \tag{4}$$

where $n$ is any positive or negative integer, we obtain

$$f\left(\frac{n}{2W}\right) = \frac{1}{2\pi} \int_{-2\pi W}^{2\pi W} F(\omega) e^{i\omega n/(2W)} \, d\omega. \tag{5}$$

On the left are the values of $f(t)$ at the sampling points. The
integral on the right will be recognized as essentially the
$n$th coefficient in a Fourier-series expansion of the function
$F(\omega)$, taking the interval $-W$ to $+W$ as a fundamental
period. This means that the values of the samples $f(n/(2W))$
determine the Fourier coefficients in the series expansion
of $F(\omega)$. Thus they determine $F(\omega)$, since $F(\omega)$ is zero for
frequencies greater than $W$, and for lower frequencies $F(\omega)$
is determined if its Fourier coefficients are determined. But
$F(\omega)$ determines the original function $f(t)$ completely,
since a function is determined if its spectrum is known.
Therefore the original samples determine the function $f(t)$
completely. There is one and only one function whose
spectrum is limited to a band $W$, and which passes through
given values at sampling points separated $1/(2W)$ seconds
apart. The function can be simply reconstructed from the
samples by using a pulse of the type

$$\frac{\sin 2\pi W t}{2\pi W t}. \tag{6}$$
`
This function is unity at $t = 0$ and zero at $t = n/(2W)$, $n = \pm 1, \pm 2, \ldots$,
i.e., at all other sample points. Furthermore, its spectrum is
constant in the band $W$ and zero outside. At each sample
point a pulse of this type is placed whose amplitude is
adjusted to equal that of the sample. The sum of these pulses
is the required function, since it satisfies the conditions on
the spectrum and passes through the sampled values.
Mathematically, this process can be described as follows.
Let $x_n$ be the $n$th sample. Then the function $f(t)$ is
represented by

$$f(t) = \sum_{n=-\infty}^{\infty} x_n \frac{\sin \pi(2Wt - n)}{\pi(2Wt - n)}. \tag{7}$$
`
A similar result is true if the band $W$ does not start
at zero frequency but at some higher value, and can be
proved by a linear translation (corresponding physically to
single-sideband modulation) of the zero-frequency case. In
this case the elementary pulse is obtained from $\sin x / x$ by
single-sideband modulation.
If the function is limited to the time interval $T$ and the
samples are spaced $1/(2W)$ seconds apart, there will be a total
of $2TW$ samples in the interval. All samples outside will
be substantially zero. To be more precise, we can define a
function to be limited to the time interval $T$ if, and only if,
all the samples outside this interval are exactly zero. Then
we can say that any function limited to the bandwidth $W$
and the time interval $T$ can be specified by giving $2TW$
numbers.
Theorem 1 has been given previously in other forms
by mathematicians³ but in spite of its evident importance
seems not to have appeared explicitly in the literature
of communication theory. Nyquist,⁴,⁵ however, and more
recently Gabor,⁶ have pointed out that approximately $2TW$
numbers are sufficient, basing their arguments on a Fourier
series expansion of the function over the time interval
$T$. This gives $TW$ sine and $(TW + 1)$ cosine terms up to
frequency $W$. The slight discrepancy is due to the fact
that the functions obtained in this way will not be strictly
limited to the band $W$ but, because of the sudden starting
and stopping of the sine and cosine components, contain
some frequency content outside the band. Nyquist pointed
out the fundamental importance of the time interval $1/(2W)$
seconds in connection with telegraphy, and we will call this
the Nyquist interval corresponding to the band $W$.
`
`3 J. M. Whittaker, Interpolatory Function Theory, Cambridge Tracts
`in Mathematics and Mathematical Physics, no. 33. Cambridge, U.K.:
`Cambridge Univ. Press, ch. IV, 1935.
`4 H. Nyquist, “Certain topics in telegraph transmission theory,” AIEE
`Trans., p. 617, Apr. 1928.
`5 W. R. Bennett, “Time division multiplex systems,” Bell Syst. Tech.
`J., vol. 20, p. 199, Apr. 1941, where a result similar to Theorem 1 is
`established, but on a steady-state basis.
`6 D. Gabor, “Theory of communication,” J. Inst. Elect. Eng. (London),
`vol. 93, pt. 3, no. 26, p. 429, 1946.
`
`
`
`
The $2TW$ numbers used to specify the function need not
be the equally spaced samples used above. For example,
the samples can be unevenly spaced, although, if there is
considerable bunching, the samples must be known very
accurately to give a good reconstruction of the function. The
reconstruction process is also more involved with unequal
spacing. One can further show that the value of the function
and its derivative at every other sample point are sufficient.
The value and first and second derivatives at every third
sample point give a still different set of parameters which
uniquely determine the function. Generally speaking, any
set of $2TW$ independent numbers associated with the
function can be used to describe it.
`
`III. GEOMETRICAL REPRESENTATION OF THE SIGNALS
A set of three numbers $x_1$, $x_2$, $x_3$, regardless of their
source, can always be thought of as coordinates of a point in
three-dimensional space. Similarly, the $2TW$ evenly spaced
samples of a signal can be thought of as coordinates of
a point in a space of $2TW$ dimensions. Each particular
selection of these numbers corresponds to a particular point
in this space. Thus there is exactly one point corresponding
to each signal in the band $W$ and with duration $T$.
The number of dimensions $2TW$ will be, in general, very
high. A 5-Mc television signal lasting for an hour would be
represented by a point in a space with
$2 \times 5 \times 10^6 \times 60^2 = 3.6 \times 10^{10}$
dimensions. Needless to say, such a space cannot
be visualized. It is possible, however, to study analytically
the properties of $n$-dimensional space. To a considerable
extent, these properties are a simple generalization of the
properties of two- and three-dimensional space, and can
often be arrived at by inductive reasoning from these cases.
The advantage of this geometrical representation of the
signals is that we can use the vocabulary and the results of
geometry in the communication problem. Essentially, we
have replaced a complex entity (say, a television signal) in
a simple environment [the signal requires only a plane for
its representation as $f(t)$] by a simple entity (a point) in a
complex environment ($2TW$-dimensional space).
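If we imagine the $2TW$ coordinate axes to be at right
angles to each other, then distances in the space have a
simple interpretation. The distance from the origin to a point
is analogous to the two- and three-dimensional cases

$$d = \sqrt{\sum_{n=1}^{2TW} x_n^2} \tag{8}$$

where $x_n$ is the $n$th sample. Now, since

$$f(t) = \sum_{n=1}^{2TW} x_n \frac{\sin \pi(2Wt - n)}{\pi(2Wt - n)} \tag{9}$$

we have

$$\int_{-\infty}^{\infty} f(t)^2 \, dt = \frac{1}{2W} \sum_{n=1}^{2TW} x_n^2 \tag{10}$$

using the fact that

$$\int_{-\infty}^{\infty} \frac{\sin \pi(2Wt - m)}{\pi(2Wt - m)} \, \frac{\sin \pi(2Wt - n)}{\pi(2Wt - n)} \, dt = \begin{cases} 0, & m \neq n \\ \dfrac{1}{2W}, & m = n. \end{cases} \tag{11}$$

Hence, the square of the distance to a point is $2W$ times the
energy (more precisely, the energy into a unit resistance)
of the corresponding signal

$$d^2 = 2WE = 2WTP \tag{12}$$

where $P$ is the average power over the time $T$. Similarly,
the distance between two points is $\sqrt{2WT}$ times the rms
discrepancy between the two corresponding signals.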
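
The step from (9) to (10) is left implicit in the text. One way to spell it out, using only the orthogonality relation (11) and writing $E$ for the signal energy, is sketched below; the double-sum form is an expository assumption about how the algebra goes, not a quotation from the paper.

```latex
\begin{aligned}
E = \int_{-\infty}^{\infty} f(t)^2 \, dt
  &= \sum_{m} \sum_{n} x_m x_n
     \int_{-\infty}^{\infty}
     \frac{\sin \pi(2Wt - m)}{\pi(2Wt - m)} \,
     \frac{\sin \pi(2Wt - n)}{\pi(2Wt - n)} \, dt \\
  &= \frac{1}{2W} \sum_{n} x_n^2
   = \frac{d^2}{2W}.
\end{aligned}
```

Only the terms with $m = n$ survive, each contributing $x_n^2/(2W)$. Multiplying through by $2W$ gives $d^2 = 2WE$, and since the average power over the time $T$ is $P = E/T$, this is the same as $d^2 = 2WTP$.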
If we consider only signals whose average power is less
than $P$, these will correspond to points within a sphere of
radius

$$r = \sqrt{2WTP}. \tag{13}$$
`
`If noise is added to the signal in transmission, it means
`that the point corresponding to the signal has been moved a
`certain distance in the space proportional to the rms value of
`the noise. Thus noise produces a small region of uncertainty
`about each point in the space. A fixed distortion in the
`channel corresponds to a warping of the space, so that each
`point is moved, but in a definite fixed way.
In ordinary three-dimensional space it is possible to
set up many different coordinate systems. This is also
possible in the signal space of $2TW$ dimensions that we
are considering. A different coordinate system corresponds
to a different way of describing the same signal function.
The various ways of specifying a function given above are
special cases of this. One other way of particular importance
in communication is in terms of frequency components.
The function $f(t)$ can be expanded as a sum of sines and
cosines of frequencies $1/T$ apart, and the coefficients used
`as a different set of coordinates. It can be shown that these
`coordinates are all perpendicular to each other and are
`obtained by what is essentially a rotation of the original
`coordinate system.
`Passing a signal through an ideal filter corresponds to
`projecting the corresponding point onto a certain region in
`the space. In fact, in the frequency-coordinate system those
`components lying in the pass band of the filter are retained
`and those outside are eliminated, so that the projection is
`on one of the coordinate lines, planes, or hyperplanes. Any
`filter performs a linear operation on the vectors of the space,
`producing a new vector linearly related to the old one.
`
`IV. GEOMETRICAL REPRESENTATION OF MESSAGES
We have associated a space of $2TW$ dimensions with the
`set of possible signals. In a similar way one can associate
`a space with the set of possible messages. Suppose we are
`considering a speech system and that the messages consist
`
`SHANNON: COMMUNICATION IN THE PRESENCE OF NOISE
`
`
`
`
`decreased. A similar effect can occur through probability
`considerations. Certain messages may be possible, but so
`improbable relative to the others that we can, in a certain
`sense, neglect them. In a television image, for example,
`successive frames are likely to be very nearly identical.
`There is a fair probability of a particular picture element
`having the same light intensity in successive frames. If
`this is analyzed mathematically, it results in an effective
reduction of dimensionality of the message space when $T_1$
is large.
We will not go further into these two effects at present,
but let us suppose that, when they are taken into account, the
resulting message space has a dimensionality $D$, which will,
of course, be less than or equal to $2T_1W_1$. In many cases,
even though the effects are present, their utilization involves
too much complication in the way of equipment. The
system is then designed on the basis that all functions are
different and that there are no limitations on the information
source. In this case, the message space is considered to have
the full $2T_1W_1$ dimensions.
`
`V. GEOMETRICAL REPRESENTATION OF THE
`TRANSMITTER AND RECEIVER
`We now consider the function of the transmitter from
`this geometrical standpoint. The input to the transmitter is
`a message; that is, one point in the message space. Its output
`is a signal—one point in the signal space. Whatever form
`of encoding or modulation is performed, the transmitter
`must establish some correspondence between the points in
`the two spaces. Every point in the message space must
`correspond to a point in the signal space, and no two
`messages can correspond to the same signal. If they did,
`there would be no way to determine at the receiver which
`of the two messages was intended. The geometrical name
`for such a correspondence is a mapping. The transmitter
`maps the message space into the signal space.
`In a similar way, the receiver maps the signal space back
`into the message space. Here, however, it is possible to
`have more than one point mapped into the same point.
`This means that several different signals are demodulated
`or decoded into the same message. In AM, for example,
`the phase of the carrier is lost in demodulation. Different
`signals which differ only in the phase of the carrier are
`demodulated into the same message. In FM the shape of
`the signal wave above the limiting value of the limiter
`does not affect the recovered message. In PCM considerable
`distortion of the received pulses is possible, with no effect
`on the output of the receiver.
`We have so far established a correspondence between a
`communication system and certain geometrical ideas. The
`correspondence is summarized in Table 1.
`
`VI. MAPPING CONSIDERATIONS
`It is possible to draw certain conclusions of a general
`nature regarding modulation methods from the geometrical
picture alone. Mathematically, the simplest types of mappings
are those in which the two spaces have the same
`
`Fig. 2. Reduction of dimensionality through equivalence classes.
`
of all possible sounds which contain no frequencies over a
certain limit $W_1$ and last for a time $T_1$.
Just as for the case of the signals, these messages can
be represented in a one-to-one way in a space of $2T_1W_1$
dimensions. There are several points to be noted, however.
`In the first place, various different points may represent the
`same message, insofar as the final destination is concerned.
`For example, in the case of speech, the ear is insensitive
`to a certain amount of phase distortion. Messages differing
`only in the phases of their components (to a limited extent)
`sound the same. This may have the effect of reducing the
`number of essential dimensions in the message space. All
`the points which are equivalent for the destination can be
`grouped together and treated as one point. It may then
`require fewer numbers to specify one of these “equivalence
`classes” than to specify an arbitrary point. For example, in
`Fig. 2 we have a two-dimensional space, the set of points in
`a square. If all points on a circle are regarded as equivalent,
`it reduces to a one-dimensional space—a point can now be
`specified by one number, the radius of the circle. In the
`case of sounds, if the ear were completely insensitive to
`phase, then the number of dimensions would be reduced
by one-half due to this cause alone. The sine and cosine
components $a_n$ and $b_n$ for a given frequency would not
need to be specified independently, but only $\sqrt{a_n^2 + b_n^2}$; that
is, the total amplitude for this frequency. The reduction in
`frequency discrimination of the ear as frequency increases
`indicates that a further reduction in dimensionality occurs.
`The vocoder makes use to a considerable extent of these
`equivalences among speech sounds, in the first place by
`eliminating, to a large degree, phase information, and in
`the second place by lumping groups of frequencies together,
`particularly at the higher frequencies.
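
The halving of dimensionality under complete phase insensitivity can be illustrated with a few lines of code. This is a sketch under an idealized assumption (an ear that ignores phase entirely); the helper names and the numerical values of the components are hypothetical.

```python
import math

def amplitudes(components):
    """Collapse each (a_n, b_n) sine/cosine pair to its total amplitude sqrt(a_n^2 + b_n^2)."""
    return [math.hypot(a, b) for a, b in components]

def rotate_phases(components, phases):
    """Shift the phase of every frequency component; the amplitudes are left unchanged."""
    return [
        (a * math.cos(p) - b * math.sin(p), a * math.sin(p) + b * math.cos(p))
        for (a, b), p in zip(components, phases)
    ]

message = [(3.0, 4.0), (1.0, 0.0), (0.5, 0.5)]       # hypothetical (a_n, b_n) values
shifted = rotate_phases(message, [0.3, 1.2, -2.0])   # differs from `message` only in phase

# Both messages fall in the same equivalence class: their amplitudes agree.
same_class = all(
    math.isclose(u, v) for u, v in zip(amplitudes(message), amplitudes(shifted))
)
print(same_class)                                    # True
print(2 * len(message), len(amplitudes(message)))    # 6 coordinates reduce to 3
```

Each class is specified by half as many numbers as an arbitrary point, which is the reduction by one-half described above.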
`In other types of communication there may not be any
`equivalence classes of this type. The final destination is
sensitive to any change in the message within the full
message space of $2T_1W_1$ dimensions. This appears to be
the case in television transmission.
`A second point to be noted is that the information source
may put certain restrictions on the actual messages. The
space of $2T_1W_1$ dimensions contains a point for every
function of time $f(t)$ limited to the band $W_1$ and of duration
$T_1$. The class of messages we wish to transmit may be
`only a small subset of these functions. For example, speech
`sounds must be produced by the human vocal system. If
`we are willing to forego the transmission of any other
`sounds, the effective dimensionality may be considerably
`
`
`
`
`Table 1
`
`Fig. 4. Efficient mapping of a line into a square.
`
`forth through the higher dimensional region as indicated
`in Fig. 4, where we have mapped a line into a square. It
`will be noticed that when this is done the effect of noise is
`small relative to the length of the line, provided the noise
`is less than a certain critical value. At this value it becomes
`uncertain at the receiver as to which portion of the line
`contains the message. This holds generally, and it shows
`that any system which attempts to use the capacities of a
`wider band to the full extent possible will suffer from a
`threshold effect when there is noise. If the noise is small,
`very little distortion will occur, but at some critical noise
`amplitude the message will become very badly distorted.
`This effect is well known in PCM.
Suppose, on the other hand, we wish to reduce dimensionality,
i.e., to compress bandwidth or time or both. That
is, we wish to send messages of band $W_1$ and duration $T_1$
over a channel with $TW < T_1W_1$. It has already been
indicated that the effective dimensionality $D$ of the message
space may be less than $2T_1W_1$ due to the properties of the
source and of the destination. Hence we certainly need no
more than $D$ dimensions in the signal space for a good
mapping. To make this saving it is necessary, of course, to
`isolate the effective coordinates in the message space, and
`to send these only. The reduced bandwidth transmission of
`speech by the vocoder is a case of this kind.
`The question arises, however, as to whether further
`reduction is possible. In our geometrical analogy,
`is it
`possible to map a space of high dimensionality onto one of
`lower dimensionality? The answer is that it is possible, with
`certain reservations. For example, the points of a square
`can be described by their two coordinates which could be
written in decimal notation

$$x = .a_1 a_2 a_3 \cdots, \qquad y = .b_1 b_2 b_3 \cdots \tag{14}$$

From these two numbers we can construct one number by
taking digits alternately from $x$ and $y$

$$z = .a_1 b_1 a_2 b_2 a_3 b_3 \cdots \tag{15}$$

A knowledge of $x$ and $y$ determines $z$, and $z$ determines
both $x$ and $y$. Thus there is a one-to-one correspondence
between the points of a square and the points of a line.
`
`Fig. 3. Mapping similar to frequency modulation.
`
number of dimensions. Single-sideband amplitude modulation
is an example of this type and an especially simple one,
since the coordinates in the signal space are proportional
to the corresponding coordinates in the message space. In
double-sideband transmission the signal space has twice the
number of coordinates, but they occur in pairs with equal
values. If there were only one dimension in the message
space and two in the signal space, it would correspond to
mapping a line onto a square so that the point $x$ on the line
is represented by $(x, x)$ in the square. Thus no significant
use is made of the extra dimensions. All the messages go
into a subspace having only $2T_1W_1$ dimensions.
`In frequency modulation the mapping is more involved.
`The signal space has a much larger dimensionality than
`the message space. The type of mapping can be suggested
`by Fig. 3, where a line is mapped into a three-dimensional
`space. The line starts at unit distance from the origin on the
`first coordinate axis, stays at this distance from the origin
`on a circle to the next coordinate axis, and then goes to
`the third. It can be seen that the line is lengthened in this
`mapping in proportion to the total number of coordinates.
`It is not, however, nearly as long as it could be if it wound
`back and forth through the space, filling up the internal
`volume of the sphere it traverses.
`This expansion of the line is related to the improved
`signal-to-noise ratio obtainable with increased bandwidth.
`Since the noise produces a small region of uncertainty about
`each point, the effect of this on the recovered message will
`be less if the map is in a large scale. To obtain as large
`a scale as possible requires that the line wander back and
`
`
`
`
`This type of mapping, due to the mathematician Cantor,
`can easily be extended as far as we wish in the direction of
reducing dimensionality. A space of $n$ dimensions can be
`mapped in a one-to-one way into a space of one dimension.
`Physically, this means that the frequency-time product can
`be reduced as far as we wish when there is no noise, with
`exact recovery of the original messages.
`In a less exact sense, a mapping of the type shown in
`Fig. 4 maps a square into a line, provided we are not too
`particular about recovering exactly the starting point, but
`are satisfied with a nearby one. The sensitivity we noticed
`before when increasing dimensionality now takes a different
form. In such a mapping, to reduce $TW$, there will be a
`certain threshold effect when we perturb the message. As
`we change the message a small amount, the corresponding
`signal will change a small amount, until some critical
`value is reached. At this point the signal will undergo a
`considerable change. In topology it is shown7 that it is
`not possible to map a region of higher dimension into a
`region of lower dimension continuously. It is the necessary
`discontinuity which produces the threshold effects we have
`been describing for communication systems.
`This discussion is relevant to the well-known “Hartley
`law,” which states that “an upper limit
`to the amount
`of information which may be transmitted is set by the
`sum for the various available lines of the product of the
`line-frequency range of each by the time during which
`it is available for use.” There is a sense in which this
`statement is true, and another sense in which it is false.
`It
`is not possible to map the message space into the
`signal space in a one-to-one, continuous manner (this is
`known mathematically as a topological mapping) unless
the two spaces have the same dimensionality; i.e., unless
$D = 2TW$. Hence, if we limit the transmitter and receiver
to continuous one-to-one operations, there is a lower bound
to the product $TW$ in the channel. This lower bound is
determined, not by the product $W_1T_1$ of message bandwidth
and time, but by the number of essential dimensions $D$, as
indicated in Section IV. There is, however, no good reason
`for limiting the transmitter and receiver to topological
`mappings. In fact, PCM and similar modulation systems
`are highly discontinuous and come very close to the type
`of mapping given by (14) and (15). It is desirable, then, to
`find limits for what can be done with no restrictions on the
`type of transmitter and receiver operations. These limits,
`which will be derived in the following sections, depend on
`the amount and nature of the noise in the channel, and on
`the transmitter power, as well as on the bandwidth-time
`product.
It is evident that any system, either to compress $TW$, or
`to expand it and make full use of the additional volume,
`must be highly nonlinear in character and fairly complex
`because of the peculiar nature of the mappings involved.
`
`VII. THE CAPACITY OF A CHANNEL IN THE
`PRESENCE OF WHITE THERMAL NOISE
It is not difficult to set up certain quantitative relations
that must hold when we change the product $TW$. Let us
assume, for the present, that the noise in the system is a
white thermal noise band-limited to the band $W$, and that
it is added to the transmitted signal to produce the received
signal. A white thermal noise has the property that each
sample is perturbed independently of all the others, and the
distribution of each amplitude is Gaussian with standard
deviation $\sigma = \sqrt{N}$ where $N$ is the average noise power.
How many different signals can be distinguished at the
receiving point in spite of the perturbations due to noise?
A crude estimate can be obtained as follows. If the signal
has a power $P$, then the perturbed signal will have a power
$P + N$. The number of amplitudes that can be reasonably
well distinguished is

$$K \sqrt{\frac{P + N}{N}} \tag{16}$$

where $K$ is a small constant in the neighborhood of
unity depending on how the phrase “reasonably well” is
interpreted. If we require very good separation, $K$ will
be small, while toleration of occasional errors allows $K$
to be larger. Since in time $T$ there are $2TW$ independent
amplitudes, the total number of reasonably distinct signals
is

$$M = \left[ K \sqrt{\frac{P + N}{N}} \right]^{2TW}. \tag{17}$$

The number of bits that can be sent in this time is $\log_2 M$,
and the rate of transmission is

$$\frac{\log_2 M}{T} = W \log_2 K^2 \frac{P + N}{N} \quad \text{(bits per second)}. \tag{18}$$
`
`The difficulty with this argument, apart from its general
`approximate character,
`lies in the tacit assumption that
`for two signals to be distinguishable they must differ at
`some sampling point by more than the expected noise.
`The argument presupposes that PCM, or something very
`similar to PCM, is the best method of encoding binary
`digits into signals. Actually, two signals can be reliably
`distinguished if they differ by only a small amount, pro-
`vided this difference is sustained over a long period of
`time. Each sample of the received signal then gives a small
`amount of statistical information concerning the transmitted
`signal; in combination, these statistical indications result in
`near certainty. This possibility allows an improvement of
`about 8 dB in power over (18) with a reasonable definition
`of reliable resolution of signals, as will appear later. We
`will now make use of the geometrical representation to
`determine the exact capacity of a noisy channel.
7 W. Hurewicz and H. Wallman, Dimension Theory. Princeton, NJ:
Princeton Univ. Press, 1941.

Theorem 2: Let $P$ be the average transmitter power, and
suppose the noise is white thermal noise of power $N$ in the
band $W$. By sufficiently complicated encoding systems it
is possible to transmit binary digits at a rate

$$C = W \log_2 \frac{P + N}{N} \tag{19}$$

with as small a frequency of errors as desired. It is not
possible by any encoding method to send at a higher rate
and have an arbitrarily low frequency of errors.
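
For a feel of the magnitudes involved, the rate in (19) can be evaluated next to the crude estimate (18). The sketch below assumes a bandwidth of 3000 cps and a 30 dB power ratio purely for illustration; the function names and the trial values of the constant $K$ are likewise assumptions made here.

```python
import math

def capacity(W: float, P: float, N: float) -> float:
    """Rate of (19): C = W * log2((P + N) / N), in bits per second."""
    return W * math.log2((P + N) / N)

def crude_rate(W: float, P: float, N: float, K: float) -> float:
    """Rate of the rough estimate (18): W * log2(K**2 * (P + N) / N)."""
    return W * math.log2(K ** 2 * (P + N) / N)

W = 3000.0             # bandwidth in cps (assumed)
P, N = 1000.0, 1.0     # signal and noise power, a 30 dB ratio (assumed)

print(round(capacity(W, P, N)))
for K in (1.0, 0.7, 0.4):
    # Smaller K means stricter separation of amplitudes and a lower crude rate.
    print(K, round(crude_rate(W, P, N, K)))
```

With $K = 1$ the two expressions coincide; for $K < 1$ the crude estimate falls below the rate of (19) by $2W \log_2 (1/K)$ bits per second.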
This shows that the rate $W \log_2 (P + N)/N$ measures
in a sharply defined way the capacity of the channel for
transmitting information. It is a rather surprising result,
since one would expect that reducing the frequency of
errors would require reducing the rate of transmission, and
that the rate must approach zero as the error frequency
does. Actually, we can send at the rate $C$ but reduce
`errors by using more involved encoding and longer delays
`at the transmitter and receiver. The transmitter will take
`long sequences of binary digits and represent this entire
`sequence by a particular signal function of long duration.
`The delay is required because the transmitter must wait for
`the full sequence before the signal is determined. Similarly,
`the receiver must wait for the full signal function before
`decoding into binary digits.
We now prove Theorem 2. In the geometrical representation
each signal point is surrounded by a small region
of uncertainty due to noise. With white thermal noise, the
perturbations of the different samples (or coordinates) are
all Gaussian and independent. Thus the probability of a
perturbation having coordinates $x_1, x_2, \cdots, x_n$ (these are
the differences between the original and received signal
coordinates) is the product of the individual probabilities
for the different coordinates

$$\prod_{n=1}^{2TW} \frac{1}{\sqrt{2\pi N}} \exp\left( -\frac{x_n^2}{2N} \right) = \frac{1}{(2\pi N)^{TW}} \exp\left( -\frac{1}{2N} \sum_{n=1}^{2TW} x_n^2 \right).$$

Since this depends only on

$$\sum_{n=1}^{2TW} x_n^2,$$

the probability of a given perturbation depends only on the
distance from the original signal and not on the direction.
In other words, the region of uncertainty is spherical in
nature. Although the limits of this region are not sharply
defined for a small number of dimensions ($2TW$), the
limits become more and more definite as the dimensionality
increases. This is because the square of the distance a
signal is perturbed is equal to $2TW$ times the average
noise power during the time $T$. As $T$ increases, this
average noise power must approach $N$. Thus, for large
$T$, the perturbation will almost certainly be to some point
near the surface of a sphere of radius $\sqrt{2TWN}$ centered
at the original signal point. More precisely, by taking $T$
sufficiently large we can insure (with probability as
near to one as we wish) that the perturbation will lie
within a sphere of radius $\sqrt{2TW(N + \epsilon)}$ where $\epsilon$ is
arbitrarily small. The noise regions can therefore be thought
of roughly as sharply defined billiard balls, when $2TW$ is
very large. The received signals have an average power
$P + N$, and in the same sense must almost all lie on
the surface of a sphere of radius $\sqrt{2TW(P + N)}$. How
many different transmitted signals can be found which will
be distinguishable? Certainly not more than the volume
of the sphere of radius $\sqrt{2TW(P + N)}$ divided by the
volume of a sphere of radius $\sqrt{2TWN}$, since overlap of
the noise spheres results in confusion as to the message
at the receiving point. The volume of an $n$-dimensional
sphere⁸ of radius $r$ is

$$V = \frac{\pi^{n/2}}{\Gamma\left( \frac{n}{2} + 1 \right)} r^n. \tag{20}$$

Hence, an upper limit for the number $M$ of distinguishable
signals is

$$M \leq \left( \sqrt{\frac{P + N}{N}} \right)^{2TW}. \tag{21}$$

Consequently, the channel capacity is bounded by

$$C = \lim_{T \to \infty} \frac{\log_2 M}{T} \leq W \log_2 \frac{P + N}{N}. \tag{22}$$
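
The "billiard ball" picture rests on the noise vector's length settling near $\sqrt{2TWN}$ as the number of dimensions grows. A small simulation makes this visible. It is a sketch only: the seed, the noise power, the trial counts, and the function names are arbitrary choices made here, and `dimensions` stands in for $2TW$.

```python
import math
import random

def noise_radius_ratio(dimensions: int, noise_power: float, rng: random.Random) -> float:
    """Length of one Gaussian noise vector divided by sqrt(dimensions * noise_power)."""
    sigma = math.sqrt(noise_power)
    squared = sum(rng.gauss(0.0, sigma) ** 2 for _ in range(dimensions))
    return math.sqrt(squared / (dimensions * noise_power))

def log2_signal_count_bound(T: float, W: float, P: float, N: float) -> float:
    """log2 of the right-hand side of (21): T * W * log2((P + N) / N)."""
    return T * W * math.log2((P + N) / N)

rng = random.Random(0)            # fixed seed so the run is repeatable
for n in (10, 1000, 20000):       # n plays the role of 2TW
    ratios = [noise_radius_ratio(n, 2.0, rng) for _ in range(10)]
    # The spread between the smallest and largest ratio shrinks as n grows.
    print(n, round(min(ratios), 3), round(max(ratios), 3))

# Dividing the bound on log2(M) by T gives the right-hand side of (22).
print(log2_signal_count_bound(2.0, 1000.0, 3.0, 1.0) / 2.0)   # 2000.0
```

The ratios cluster ever more tightly around 1, which is the sense in which the noise region acquires a sharply defined surface.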
`
`This prove