`
PAPERS

J. Audio Eng. Soc., Vol. 40, No. 5, 1992 May

works best on sounds whose spectral components are
close to those of the masking sound, but also occurs
for components further away. The effect decreases more
quickly toward lower frequencies than toward higher
ones. The same is true for the time behavior: the masking
is greatest for sounds which occur simultaneously, but
can also be perceived in the time intervals shortly before
and after the masking sound is supplied.
As stated, it can be deduced from the masking effect
that there are signals which can be added inaudibly to
an audio signal. The momentary power spectrum of
these signals must remain at all times under
the masked threshold at that point in time. A data flow
(series of bits) can therefore also be added, namely by
constructing a signal of this kind from these bits.
This can be done in the following way (see Fig. 1).
In order to use the masking effect, the signal is first
split into subbands by means of filtering. The samples
in each subband are then grouped into consecutive time
windows (of approximately 10 ms in length). The win-
dows from all subbands which represent the same time
interval form blocks. For each block the power spectrum
is calculated, which is then used to determine the masked
threshold in each subband [6]. From this the maximum
permitted power of a signal to be added can be obtained
per subband, so that this can be constructed from the
data flow. After the addition the subband signals are
joined together again by a reconstruction filter bank to
form a wide-band signal. On the premise that the im-
plemented scheme determines the masked threshold
correctly, the resulting wide-band signal will sound
the same as the original audio signal. In this paper the
masking model used is assumed to be correct; extensive
listening tests have confirmed this [7].

Fig. 1. Basic diagram for data addition. [Block diagram; labels: audio in, data in, filtering, down sampling, blocking, analyzing, masking threshold, constructing, addition, up sampling, filtering, combined out.]

The signal to be added from the data flow and the
set masked threshold is constructed as follows. A certain
number of consecutive bits from the data flow are taken
together to form words. Each word is interpreted as an
address which indicates a unique sample value, as shown
in Fig. 2. The series of bits is therefore converted into
a series of samples via this word series. These data
samples are then grouped into windows and added to
the corresponding samples in the subband window of
the original audio signal.

Fig. 2. Example as illustration of data sample construction
with 3-bit words. [Plot: horizontal axis, address (000 001 011 010 110 111 101 100); vertical axis, sample value in units of $\Delta_b$.]

The number of bits $n_b$ which are used to form one
word depends on the set masked threshold in the subband
and the difference $\Delta_b$ between the consecutive sample
values (see Fig. 2; $\Delta_b$ will be indicated in the following
by the bit step size). By assuming that the incoming
series of bits has a uniform probability density distri-
bution, a power

$$P_b = \frac{(2^{2 n_b} - 1)\,\Delta_b^2}{12} \tag{1}$$
`
`DISH-Blue Spike-246
`Exhibit 1010, Page 0401
`
`
`
`TEN KATE ET AL
`
can be assigned to each window of samples constructed
in this way. With a given bit step size $\Delta_b$, the number
of bits per word is thus obtained from Eq. (1) as the
maximum number $n_b$ that still supplies a power under
the set masked threshold in the corresponding subband.
How the size of $\Delta_b$ is determined is discussed in Sec.
`1.2.
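
The word-to-sample construction and the use of Eq. (1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the assignment of Gray-coded addresses to levels running from the most negative to the most positive value, at odd multiples of $\Delta_b/2$, is assumed from the address sequence of Fig. 2 rather than stated in the text.

```python
def gray_code(index):
    """Binary-reflected Gray code of a non-negative integer."""
    return index ^ (index >> 1)


def build_sample_table(n_b, delta_b):
    """Map each n_b-bit word (as an int) to a data sample value.

    Assumption: level k (k = 0 .. 2**n_b - 1) is addressed by the Gray
    code of k and has the value (k - (2**n_b - 1) / 2) * delta_b, so
    neighbouring levels differ in exactly one address bit.
    """
    levels = 2 ** n_b
    return {gray_code(k): (k - (levels - 1) / 2) * delta_b
            for k in range(levels)}


def bits_to_samples(bits, n_b, delta_b):
    """Group a bit series into n_b-bit words and look up their samples."""
    table = build_sample_table(n_b, delta_b)
    samples = []
    for i in range(0, len(bits) - n_b + 1, n_b):
        word = int("".join(str(b) for b in bits[i:i + n_b]), 2)
        samples.append(table[word])
    return samples


def data_power(n_b, delta_b):
    """Power of the data signal for uniformly distributed bits, Eq. (1)."""
    return (2 ** (2 * n_b) - 1) * delta_b ** 2 / 12


def max_bits_per_word(threshold_power, delta_b, n_max=16):
    """Largest n_b whose power, by Eq. (1), stays under the threshold."""
    n_b = 0
    while n_b < n_max and data_power(n_b + 1, delta_b) < threshold_power:
        n_b += 1
    return n_b
```

With `delta_b = 1.0`, the eight 3-bit levels are visited in the address order 000, 001, 011, 010, 110, 111, 101, 100, and `data_power(3, 1.0)` equals the mean square of those eight levels.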
`The signal constructed in this way will have a power
`spectrum, the height of which is given by Eq. (1), but
`which is extended over the whole frequency range.
`However, the addition of the data signal at subband
`level limits the width of this spectrum to that of the
`subband. The grouping in subband time blocks is thus
`used not only to determine the masking properties of
the audio signal, but also to modify the frequency-time
characteristic of the data signals to be added.
The schematic diagram for retrieving the data added
from the audio signal produced is shown in Fig. 3. The
audio signal is first filtered in subbands and grouped
in time windows, so that the same blocks are formed
again (the filter banks to be used are of the (nearly)
perfect reconstruction type [8]). After the position of
the masked threshold has been determined, the sample
values are extracted from the data signal as they were
constructed during the addition. From the position of
the masked threshold, the number of bits $n_b$ that was
added is again determined using Eq. (1). Finally, by
using the same addressing table as that used during the
addition (Fig. 2), the conversion to bit words can be
made which, by placing them one after the other, again
form the original data flow. The data are thus retrieved.
In order to distinguish between the added data sample
value and the original audio sample value, it is necessary
to apply a reference level in the combined signal. A
level of this kind can be achieved by first quantizing
the audio samples before carrying out the addition. In
this case, quantizing can be described as

$$Q(s) = \Delta_Q \cdot \mathrm{ROUND}(s / \Delta_Q) , \tag{2}$$

where $s$ is the value of the sample to be quantized,
$Q(s)$ its value after quantization, and $\Delta_Q$ the quantization
step size. In order to distinguish between audio sample
value and data sample value, a step size $\Delta_Q$ should be
used which is greater than the range of possible data
sample values:

$$\Delta_Q > 2 \, \frac{2^{n_b} - 1}{2} \, \Delta_b . \tag{3}$$
`
`The data sample value can then be recognized as the
`"quantization noise," which results from quantizing
`the combined sample again (see Fig. 4).
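
The addition and extraction of a single subband sample can be sketched as follows. This is a minimal illustration of Eqs. (2) and (3), not the authors' implementation; the function names are hypothetical, and ROUND is taken here as round-half-up.

```python
import math


def quantize(s, delta_q):
    """Eq. (2): Q(s) = delta_q * ROUND(s / delta_q), rounding half up."""
    return delta_q * math.floor(s / delta_q + 0.5)


def add_data(audio_sample, data_sample, delta_q):
    """Quantize the audio sample, then add the data sample to it."""
    return quantize(audio_sample, delta_q) + data_sample


def extract_data(combined_sample, delta_q):
    """Recover the data sample as the 'quantization noise' that results
    from quantizing the combined sample again."""
    return combined_sample - quantize(combined_sample, delta_q)
```

For 3-bit words with $\Delta_b = 1$ the data samples lie between -3.5 and 3.5, so Eq. (3) requires $\Delta_Q > 7$. With `delta_q = 8.0`, `extract_data(add_data(37.3, 3.5, 8.0), 8.0)` gives back 3.5, because the data sample never moves the combined value across a quantization decision boundary.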
The quantization of the audio signal reduces the ac-
curacy of its representation, and this can be modeled
as an increase in its noise level. Because the quantization
has been used on a time-limited subband signal, this
noise is however masked as long as its power remains
under the masked threshold. (This property is also used
with bit-rate reduction techniques [6].) The noise power
is given as [9]

$$P_Q = \frac{\Delta_Q^2}{12} . \tag{4}$$
`
`Because the quantization noise and the data signal are
`not correlated, the total power to be masked is obtained
`from the sum of their respective powers, given by Eqs.
`(1) and (4). Using Eq. (3), this power can be written
`as
`
$$P_t = P_b + P_Q < \;\cdots \tag{5}$$
`
Fig. 3. Basic diagram for data retrieval. [Block diagram; labels: combined in, filtering, down sampling, blocking, analyzing, masking threshold, extraction, deconstructing, data out.]

SURROUND-STEREO-SURROUND CODING

The addition and retrieval parameters $\Delta_Q$ and $n_b$ can
therefore be determined as follows. After determining
the masked threshold, the maximum possible quanti-
zation step size $\Delta_Q$ is determined using Eq. (5). The
maximum number of bits $n_b$ which can be added is
then obtained from Eq. (3).
`The resulting addition process can also be viewed
`as follows. It is determined for each sample value which
`part of its representation is significant and which part
`is not. This distinction is made possible by the masking
`effect: only a limited accuracy can be detected by the
`human ear. The insignificant part of the signal is then
`replaced by a different value, which indicates the in-
`formation to be added.
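
One way to picture the parameter choice is the search below. It is a sketch under an explicit assumption, not the procedure of the paper: the power budget given by the masked threshold is split so that the data power of Eq. (1) plus the quantization noise power of Eq. (4) equals the budget, and the largest $n_b$ is kept for which the resulting $\Delta_Q$ still satisfies Eq. (3). The function names and the `n_max` limit are hypothetical.

```python
import math


def data_power(n_b, delta_b):
    """Data signal power, Eq. (1)."""
    return (2 ** (2 * n_b) - 1) * delta_b ** 2 / 12


def choose_parameters(threshold_power, delta_b, n_max=16):
    """Return (n_b, delta_q) for one subband window.

    Assumption of this sketch: P_b + P_Q is set equal to the threshold
    power, with P_Q = delta_q**2 / 12 as in Eq. (4).
    """
    # With no data bits the whole budget goes to quantization noise.
    best = (0, math.sqrt(12 * threshold_power))
    for n_b in range(1, n_max + 1):
        remaining = threshold_power - data_power(n_b, delta_b)
        if remaining <= 0:
            break
        delta_q = math.sqrt(12 * remaining)
        # Eq. (3): the step size must exceed the range of data samples.
        if delta_q > (2 ** n_b - 1) * delta_b:
            best = (n_b, delta_q)
    return best
```

For a threshold power of 100 and $\Delta_b = 1$, this search settles on 4 bits per word: a fifth bit would leave too little of the budget for a quantization step larger than the data range.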
`
`1.2 Noise
The starting point is that the processing takes place
with digital audio signals. This means that the combined
signal produced will be quantized after the final filtering
to a wide-band signal (see Fig. 1) to the representation
accuracy of the transmission channel over which it will
be sent. This creates quantization noise with, in the
case of a channel with a linear quantization (PCM), a
flat spectrum (that is, over the whole audio band) and
a power $P_N$ of [9]

$$P_N = \frac{\Delta_{ch}^2}{12} , \tag{6}$$

in which $\Delta_{ch}$ indicates the quantization step size of the
transmission channel.
The audio signal is filtered again in subbands at the
receiver end (see Fig. 3). This affects the channel quan-
tization noise in two ways. First, the probability density
distribution of the noise will change into a Gaussian
one and second, the power in each subband will decrease
in proportion to the bandwidth of this subband. Thus
in the case of a perfect transmission channel and a
filtering in M subbands of equal width, the subband
samples received have a noise component with a prob-
ability density function

$$p(e) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{e^2}{2\sigma^2} \right) , \tag{7a}$$

where $e$ is the magnitude and $\sigma$ the standard deviation.
$\sigma$ is given by

$$\sigma = \frac{\Delta_{ch}}{\sqrt{12M}} . \tag{7b}$$

It is this standard deviation $\sigma$ which determines the
selection of the bit step size $\Delta_b$ in Eq. (1).
The data bits are recovered by converting the data
samples received back to their address bit words ac-
cording to a procedure as shown in Fig. 2. As a result
of the noise, faults may occur in this process. By the
use of a Gray code conversion [9] (Fig. 2) only 1 bit
will toggle in the bit word each time the noise exceeds
a decision threshold. (These thresholds lie in the middle
between the noise-free sample values.)
Using Eq. (7a) an estimate can now be made of the
error probability that $n$ bits will be converted incorrectly
($n \ge 1$):

$$P(n) = \int_{-\infty}^{-(n - 1/2)\Delta_b} p(e)\, de + \int_{(n - 1/2)\Delta_b}^{\infty} p(e)\, de = 1 - \mathrm{erf}\left( \frac{(2n - 1)\,\Delta_b}{2\sqrt{2}\,\sigma} \right) . \tag{8}$$

Thus with $\sigma$ according to Eq. (7b), $\Delta_b$ can be set for
a certain error probability $P(n)$. On the other hand, $\Delta_b$
affects the number of bits $n_b$ that can be added [see
Eq. (3)]. As a result there is a tradeoff between $n_b$ and
$P(n)$.
In fact, the audio signal itself can be regarded as a
"channel" over which the data are transported. A channel
capacity C can then be defined as

$$C = \frac{1}{M} \sum_{m=0}^{M-1} n_{b,m} , \tag{9}$$

where M is the number of subbands and $n_{b,m}$ is the
number of added bits per sample in subband m. Ac-
cording to Eq. (3), $n_{b,m}$ follows as

$$n_{b,m} = \mathrm{TRUNC}\left[ \log_2 \left( \frac{\Delta_{Q,m}}{\Delta_{b,m}} + 1 \right) \right] , \tag{10}$$

in which $\Delta_{Q,m}$ and $\Delta_{b,m}$ are the quantization step size
and bit step size in subband m, respectively. If the
subbands are all of equal width, then the channel noise
$\sigma$ [Eq. (7b)] is of equal strength in each subband and
$\Delta_{b,m}$ can thus be taken the same in each subband [Eq.
(8)], $\Delta_{b,m} = \Delta_b = c_b \sigma$. Eq. (9) can now be written as

$$C = \frac{1}{M} \sum_{m=0}^{M-1} \mathrm{TRUNC}\left[ \log_2 \left( \frac{\sqrt{12M}}{c_b} \, \frac{\Delta_{Q,m}}{\Delta_{ch}} + 1 \right) \right] \approx \log_2 \frac{\sqrt{12M}}{c_b} + \frac{1}{M} \sum_{m=0}^{M-1} \log_2 \Delta_{Q,m} - \log_2 \Delta_{ch} . \tag{11}$$

Fig. 4. Addition and extraction blocks from Figs. 1 and 3 in greater detail. [Block diagram; labels: addition (audio, quantization, data) and extraction (quantization, data).]
`
`The first term reflects the effect of the channel noise.
`An increase in the parameter M, that is, splitting up
`the signal into more subbands, reduces the noise con-
`tribution in each band, which means that more bits can
`be added. This increases the complexity of the system
`and also the delay of the audio signal as a result of the
narrow-band filtering. The coefficient $c_b$ takes into
account the tradeoff between the number of bits added
and the error probability occurring. The second term
indicates the masking effect of the audio signal: the
greater the masking, the greater $\Delta_Q$, and thus the more
information can be added. (As a result of the filtering,
addition is also possible if some bands have $\Delta_{Q,m}$ =
0.) The third term indicates that an increase in the
representation accuracy of the audio signal increases
the channel capacity by approximately the same amount.
For example, representation with 18 bits instead of 16
(linear PCM) means a four-times reduction of $\Delta_{ch}$ and
thus an increase of C by 2 bits. (It is assumed here that
addition has already taken place in each subband.) In
the case of a transmission channel in which the rep-
resentation accuracy varies, such as, for example, in
NICAM [10], it may be useful to normalize $\Delta_{Q,m}$ by
$\Delta_{ch}$ to a new parameter, which means that the varying
`property can then be eliminated.
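
The relations of this section can be evaluated numerically as in the sketch below. It assumes the standard forms of the equations: a Gaussian noise standard deviation $\sigma = \Delta_{ch}/\sqrt{12M}$ for Eq. (7b), a two-sided tail probability for Eq. (8), base-2 logarithms in Eqs. (10) and (11), and $\Delta_b = c_b \sigma$. All function and parameter names are hypothetical.

```python
import math


def subband_noise_sigma(delta_ch, num_subbands):
    """Eq. (7b): channel quantization noise per subband."""
    return delta_ch / math.sqrt(12 * num_subbands)


def error_probability(n, delta_b, sigma):
    """Eq. (8): probability that the noise exceeds (n - 1/2) * delta_b."""
    return 1 - math.erf((2 * n - 1) * delta_b / (2 * math.sqrt(2) * sigma))


def bits_per_sample(delta_q, delta_b):
    """Eq. (10): bits that fit in one subband sample."""
    return math.floor(math.log2(delta_q / delta_b + 1))


def capacity(delta_q_per_band, delta_ch, c_b):
    """Eq. (9) with delta_b = c_b * sigma in every subband."""
    m = len(delta_q_per_band)
    delta_b = c_b * subband_noise_sigma(delta_ch, m)
    return sum(bits_per_sample(dq, delta_b) for dq in delta_q_per_band) / m


def capacity_approx(delta_q_per_band, delta_ch, c_b):
    """Right-hand side of Eq. (11); needs every delta_q to be positive."""
    m = len(delta_q_per_band)
    noise_term = math.log2(math.sqrt(12 * m) / c_b)
    masking_term = sum(math.log2(dq) for dq in delta_q_per_band) / m
    return noise_term + masking_term - math.log2(delta_ch)
```

In this sketch, choosing `c_b = 6` puts the decision thresholds at three standard deviations, which gives `error_probability(1, 6 * sigma, sigma)` of about 0.0027. Dividing `delta_ch` by four in `capacity_approx` raises the result by exactly 2 bits, which is the 18-bit versus 16-bit example in the text.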
As stated, (nearly) perfect reconstruction filter banks
are used [8]. This is necessary to ensure that the (sub-
band) sample values used in the retrieval are (almost)
the same as those which occurred after the addition
(except for the wide-band quantization noise). In the
filter structures used, up and down sampling takes place
(Figs. 1 and 3). This makes the system a multirate
system. For proper functioning the total delay between
the two filters on both sides of the transmission channel
must be an integer multiple of the highest down-
sampling factor (M). In that case the delay at subband
level is also an integer number of sample periods.
Consequently, synchronization is required at the re-
ceiver end (processing in windows also makes this
necessary). Without up and down sampling, this syn-
chronization seems to be no longer required. However,
the perfect reconstruction property will then be lost.
Because of the processing (quantizing and adding) the
spectrum of the subband samples changes over the whole
bandwidth (given by the sampling frequency), while
their filters only pass the part in the corre-
sponding subband. These two only coincide when
sampling at the critical rate, and only then is perfect
reconstruction possible. (The filter sequence for which
the (nearly) perfect reconstruction property must apply
is synthesis-analysis, that is, the reverse of what the
filter banks were designed for [8]. The fact that in this
case the perfect reconstruction property is also valid
can be seen by looking at the analysis-synthesis-
analysis cascade. The first two filters form a perfect
reconstruction pair as they were designed. The signals
at the input of both analysis filters are therefore identical.
Because the analysis filters are the same, it follows
that the synthesis-analysis pair must also be a perfect
reconstruction pair.)
`A different approach to the one stated here is Nyquist's
`first criterion. From this it also follows that with an
`ideal bandpass filter no intersymbol interference occurs
`if the symbols are on (a multiple of) the critical rate
`(and are detected synchronously).
`
`2 COMPATIBLE CODING
`
`2.1 The Principle
Using the technique presented, a surround-stereo-
surround coding system can now be developed which
is very suitable for use in HDTV. Multichannel audio
can be sent over a stereo transmission channel so that
stereo reception is possible without additional modi-
fication, while there is the possibility of surround re-
ception with a receiver equipped with additional elec-
tronics. In the following it will be assumed that the
HDTV audio consists of five audio channels.
Fig. 5 shows the principle of the system. The pro-
grams are supplied with five-channel sound. A down
mix to two-channel stereo is then made from this ver-
sion. There are no restrictions on the way in which this
down mix is made, that is, a signal with an optimum
stereo effect can be produced. In addition to the stereo
signal, a three-channel (audio) signal is also generated
which, together with the stereo signal, contains all the
information on the original five-channel composition.
These information signals are then added to the stereo
signal according to the technique described in Sec. 1
and retrieved at the receiver end.
`
`
`
Fig. 5. Proposed coding scheme. [Block diagram; labels: 5-channel audio L, R, C, SL, SR; MIX; 2-channel stereo L', R'; surround information H1, H2, H3; ADD; transmission.]
`
`Because of the identical format, the signal transmitted
`is compatible and existing receivers can still be used.
`Reproduction of this signal will give the listener the
`stereo sensation as it was optimized during the down
`mix. Of course, the extra information is also reproduced
`but, because of the masking effect, the listener is not
`aware of this. This information is however still available
`by means of the technique described. The receiver must
`be expanded for this with additional electronics. After
`retrieving this information, the down mix carried out
`can be reversed, which means that the reproduction of
the five-channel surround-sound sensation becomes
`possible.
`
`2.2 The System
The original five audio channels are indicated with
L, R, C, SL, and SR. Of these the first two signals are
thought to be supplied to loudspeakers which are on
the left and right of the video screen, respectively, the
third (central) signal to a loudspeaker near the screen,
and the latter two signals (surround) to the loudspeakers
behind the listener (see Fig. 6). A stereo down mix
could be

$$L' := L + \tfrac{1}{2}\sqrt{2}\, C + S_L \tag{12a}$$

$$R' := R + \tfrac{1}{2}\sqrt{2}\, C + S_R . \tag{12b}$$

(Other possibilities are conceivable.) Numerous signals
can store the surround information here, but one pos-
sibility is

$$H_1 := C \tag{12c}$$

$$H_2 := S_L \tag{12d}$$

$$H_3 := S_R . \tag{12e}$$

Fig. 6. Loudspeaker setup for five-channel surround sound. [Diagram: HDTV video screen with the loudspeakers around the listener.]

In this case it is, of course, sensible to first use data
reduction on C, SL, and SR [6]. The L' and R' signals
are processed according to the method described in
Sec. 1, and the information H1, H2, H3 is added. After
retrieving this information, the down mix can be re-
versed and the five-channel sensation can be produced
again:

$$L'' := L' - \left( \tfrac{1}{2}\sqrt{2}\, H_1 + H_2 \right) \tag{13a}$$

$$R'' := R' - \left( \tfrac{1}{2}\sqrt{2}\, H_1 + H_3 \right) \tag{13b}$$

$$C'' := H_1 \tag{13c}$$

$$S_L'' := H_2 \tag{13d}$$

$$S_R'' := H_3 . \tag{13e}$$
`
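The down mix and its reversal can be checked with a short sketch. It covers only the matrixing of Eqs. (12) and (13) for single sample values, leaving out the quantization and data addition of Sec. 1; the function names are hypothetical.

```python
import math

K = math.sqrt(2) / 2  # the factor (1/2) * sqrt(2) applied to the centre channel


def down_mix(l, r, c, sl, sr):
    """Eqs. (12a-e): stereo down mix plus the surround information."""
    l_mix = l + K * c + sl
    r_mix = r + K * c + sr
    return l_mix, r_mix, (c, sl, sr)  # (H1, H2, H3)


def dematrix(l_mix, r_mix, h):
    """Eqs. (13a-e): recover the five channels from L', R' and H1..H3."""
    h1, h2, h3 = h
    l_out = l_mix - (K * h1 + h2)
    r_out = r_mix - (K * h1 + h3)
    return l_out, r_out, h1, h2, h3
```

Without quantization the round trip is exact up to floating-point rounding: `dematrix(*down_mix(1.0, -0.5, 0.25, 0.1, -0.2))` returns the five input values.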
`A problem may occur as a result of this dematrixing.
`During the addition of the information, a quantization
`must be carried out (see Sec. 1.1). This quantization
`is carried out on the subband samples of L' and R' and
`in such a way that the resulting quantization noise is
`masked by these audio signals and thus remains in-
`audible. The stereo signal including the added infor-
`mation thus still creates the same listening experience.
`Dematrixing [Eqs. (13)] can however separate the audio
`signal from the quantization noise, which means that
`the noise could become audible. The effect becomes
`clear by looking at a silent channel (and switching off
`the other loudspeakers when listening). Assume, for
`example, that all channels with the exception of channel
C are silent. In that case L' and R' are both equal to
$\tfrac{1}{2}\sqrt{2}\,C$ [see Eqs. (12a,b)]. These signals are quantized
and H1 (= C), H2 (silent), and H3 (silent) are added.
After retrieval, C, SL, and SR are determined from H1,
H2, and H3. The result is used to reverse the down-
mixing. This dematrixing will remove ($\tfrac{1}{2}\sqrt{2}\,H_1$ +
$H_{2,3}$) = $\tfrac{1}{2}\sqrt{2}\,C$ from L' and R' [see Eqs. (13a,b)].
As a result of this the quantization noise produced during
the addition procedure remains in the left and right
channel L" and R", while the signal that masked this
noise, $\tfrac{1}{2}\sqrt{2}\,C$, is now transmitted to another loud-
speaker, C". Because the audio signal is still present,
`it will still have a masking effect on the quantization
`noise, though this will be less effective than if they
`were both generated by the same loudspeaker.
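
The silent-channel example can be reproduced numerically. The sketch below is a simplification under stated assumptions: it works on plain sample values instead of subband samples, uses the round-half-up quantizer assumed for Eq. (2), and leaves out the added data samples, so that only the quantization error is tracked. The function names are hypothetical.

```python
import math

K = math.sqrt(2) / 2  # the factor (1/2) * sqrt(2) from Eqs. (12a,b)


def quantize(s, delta_q):
    """Eq. (2) with round-half-up."""
    return delta_q * math.floor(s / delta_q + 0.5)


def silent_channel_residual(c_samples, delta_q):
    """L'' when only channel C is active.

    L' = K * C is quantized before the addition; dematrixing by
    Eq. (13a) then subtracts K * H1 = K * C, which leaves only the
    quantization error in the left channel.
    """
    return [quantize(K * c, delta_q) - K * c for c in c_samples]
```

For any input, the residual is bounded by half a quantization step, but it is generally not zero: the left loudspeaker reproduces noise while the signal that masked it comes from the centre loudspeaker.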
A remedy is to expand the information signals H1,2,3
with some extra control information. This information
then indicates which channels are silent, so that after
dematrixing, any residual sound can be removed from
these channels. Possibly the information is given for
every subband separately. In addition, instead of always
coding C, SL, and SR in H1, H2, and H3, it is better to
take the weakest three of L, R, C, SL, and SR. This
ensures that the quantization noise is always in those
signals which give the greatest masking and therefore
that the chance of its audibility is limited. The choice
made is added as control information to H1,2,3 and used
during dematrixing. Informal listening tests on various
types of program material have confirmed the validity of
this procedure. Only when some channels were switched
off could noise in the other channels become
audible, and this happened only with specially
constructed signals. Common audio signals did not re-
veal any problem.
A complete absence of audible quantization noise
is possible by adapting the (audio) input of the masking
model [11]. Instead of the power spectrum of the down-
`mixed stereo signal, that of the signal which will remain
`after dematrixing should be used. For example, in the
`case described by Eqs. (12) and (13) the power spectrum
`of L and R instead of L' and R' should be taken to
`determine the masked threshold.
`A final question is whether there is always sufficient
`room available in the stereo signal to add the infor-
`mation. As explained in Sec. 1 with Eq. (11), this
amount of room depends on two main factors, namely,
the masking power of the audio signal and its repre-
sentation accuracy [$\Delta_Q$ and $\Delta_{ch}$ in Eq. (11)]. It is clear
that a higher representation accuracy simplifies the task
because the amount of information to be added is in-
dependent of it. Experiments have, however, shown
`that the representations currently used offer sufficient
`space for the information required. With regard to the
`masking power of the audio signal, one might naively
`expect there to be problems with low masking power.
`In this application, however, the information to be
added, H1, H2, and H3, is an audio signal which is
`also present in the masking signal itself, L' and R'. In
`other words, if there is limited masking, that is, if little
`room is available, there is also little information to be
`added. In the extreme case of no masking (L, R, C,
`SL, and SR are all silent), for example, there is also no
`need to add information. Another example is given by
`assuming L and R to contain the direct sound and early
reflections and SL and SR to contain the reverberation
`of a concert-hall recording. When the music stops,
`there is still a (decreasing) reverberation. However, in
`the down-mixed stereo signal L' and R', this rever-
`beration is also present and as a result there is still an
`audio signal in order to mask the information to be
`added (which information is that L and R are silent!).
Within the European HDTV project EUREKA-95,
the system is considered as a potential way to transmit
HDTV sound. Its interesting feature is the compatibility
with the two-channel D2MAC transmission standard. After
various informal listening tests, which showed the sys-
tem's potential, a formal listening test on the system's
performance was organized by EU95. These tests were
conducted during the summer of 1990, using specially
constructed critical signals. The tests did not reveal any
significant audible degradation of these signals after
having been mixed into a two-channel NICAM stereo
signal. Further formal listening tests are planned for
early 1992.
`
`
`3 CONCLUSIONS
`
A new surround-stereo-surround coding technique
is presented. The down mix to the stereo signal may
be optimized to give the best stereo effect. The extra
information required to reproduce the original multi-
channel surround sensation using the stereo signal is
added to this stereo signal. Here the masking effect is
`used so that the addition remains inaudible. Compat-
`ibility with current stereo standards is therefore guar-
`anteed. Using the system it is possible to maintain the
`original channel separation.
`
`4 ACKNOWLEDGMENT
`
`The authors would like to express their thanks to Dr.
`W. F. Druyvesteyn, who came up with the idea of using
`the masking effect for information addition, and to Dr.
`R. N. J. Veldhuis, who devised the basic algorithms
`for this addition.
`
`5 REFERENCES
`
[1] E. Stetter, "Mehrkanal-Stereoton zum Bild für
Kino und Fernsehen" (Multichannel Stereo Sound for
Cinema and Television Picture), Rundfunktech. Mitt.,
vol. 35, pp. 1-9 (1991).
`[2] D. J. Meares, "Sound Systems for High Definition
`Television," Acoust. Bull., vol. 15, pp. 6-11 (1990).
[3] W. R. Th. ten Kate, L. M. van de Kerkhof, and
F. F. M. Zijderveld, "Digital Audio Carrying Extra
Information," in Proc. ICASSP90 (Albuquerque, NM,
1990 Apr.), pp. 1097-1100.
`[4] B. C. J. Moore, An Introduction to the Psychology
`of Hearing, 3rd ed. (Academic Press, London, 1989).
[5] E. Zwicker and H. Fastl, Psychoacoustics: Facts
and Models (Springer, Berlin, 1990).
`[6] R. N. J. Veldhuis, M. Breeuwer, and R. van der
`Waal, "Subband Coding of Digital Audio Signals,"
`Philips J. Res., vol. 44, pp. 329-343 (1989).
[7] C. Grewin and T. Ryden, "Subjective Assess-
ments on Low Bit-Rate Audio Codecs," in Proc. 10th
Int. AES Conf. on Images of Audio (London, 1991
Sept.), pp. 91-102.
[8] M. Vetterli and D. LeGall, "Perfect Recon-
struction FIR Filter Banks: Some Properties and Fac-
torizations," IEEE Trans. Acoust., Speech, Signal
Process., vol. ASSP-37, pp. 1057-1071 (1989).
[9] N. S. Jayant and P. Noll, Digital Coding of
Waveforms (Prentice-Hall, Englewood Cliffs, NJ,
1984).
[10] C. R. Caine, A. R. English, and J. W. H.
O'Clarey, "NICAM 3: Near-Instantaneously Com-
panded Digital Transmission System for High-Quality
Sound Programmes," Rad. Elec. Eng., vol. 50, pp.
519-530 (1980).
[11] W. R. Th. ten Kate, P. M. Boers, A. Mäkivirta,
J. Kuusama, E. Sorensen, and K. E. Christensen,
"Matrixing of Bit Rate Reduced Audio Signals," in
Proc. ICASSP92 (San Francisco, CA, 1992 March).
`
`
`THE AUTHORS
`
`
`W. R. Th. ten Kate
`
`F. F. M. Zijderveld
`
`L. M. van de Kerkhof
Warner R. Th. ten Kate was born in Leiden, The
Netherlands, in 1959. He studied electrical engineering
at Delft University of Technology, graduating in 1982
cum laude, and received the 1983 prize awarded by
the Delft University Fund. During the final stages of
his studies his research was directed at solar cells of
amorphous silicon and silicon radiation detectors. He
received the Ph.D. degree in 1987.
Since 1988 Dr. ten Kate has been working in the
Acoustics Group of Philips Research Laboratories. In
1985 he also began studying the French horn at the
Royal Conservatory in The Hague and graduated in
1989 with distinction.

•
Leon M. van de Kerkhof was born in Eindhoven,
The Netherlands, in 1958. In 1978 he joined Philips
Research Laboratories, where he worked on noise con-
trol (including reactive sound absorbers and aerody-
namic noise) and the use of adaptive filters in acoustics.
At the same time he began an evening course in electrical
engineering at the Institute of Technology. After grad-
uating in 1981 he continued his studies at the Eindhoven
University of Technology and received a degree in
1987 cum laude. He then moved to Philips Consumer
Electronics. His activities are in the sphere of digital
audio, in particular audio source coding and HDTV
sound. He is involved in various international projects,
including Eureka 95 (HDTV), Eureka 147 (Digital Au-
dio Broadcasting), JESSI AE14 (JESSI DAB), and ISO/
MPEG Audio.

•
Franc F. M. Zijderveld was born in Helmond, The
Netherlands, on 1961 November 20. In 1985 he com-
pleted his studies in electrical engineering at the Eind-
hoven Institute of Technology, his final project being
the realization of an autofocus system for a CCD video
camera. He then joined Philips Consumer Electronics,
where he worked in the development laboratory for
video equipment and was mainly involved in analog
video signal processing in CCD cameras. In 1987 he
moved to the Audio Signal Processing Group at the
Philips Consumer Electronics Advanced Development
Centre, working on the installation of an experimental
four-channel audio postproduction room and on the
digital 4-2-4 system. His current interest is centered
on digital audio broadcasting.
`
`
`
A High-Rate Buried Data Channel for Audio CD

Preprint 3551 (D3-1)

Michael A. Gerzon
Technical Consultant, Oxford, United Kingdom
Peter G. Craven
Oxon, United Kingdom

Presented at
the 94th Convention
1993 March 16-19
Berlin

This preprint has been reproduced from the author's advance
manuscript, without editing, corrections or consideration by
the Review Board. The AES takes no responsibility for the
contents.

Additional preprints may be obtained by sending request
and remittance to the Audio Engineering Society, 60 East
42nd Street, New York, New York 10165, USA.

All rights reserved. Reproduction of this preprint, or any
portion thereof, is not permitted without direct permission
from the Journal of the Audio Engineering Society.

AN AUDIO ENGINEERING SOCIETY PREPRINT

`
`A High-Rate Buried Data Channel for Audio CD
`
Michael A. Gerzon
Technical Consultant, 57 Juxon St, Oxford OX2 6DJ, UK
Peter G. Craven
11 Wessex Way, Grove, Wantage, Oxon OX12 0BS, UK
`
`Abstract
`
`The paper describes a new proposal for burying a high data rate data
`channel (with up to 360 kbit/s or more) compatibly within the data stream
`of an audio CD without significant impairment of existing CD
`performance. The new data channel may be used for high-quality data-
`reduced related audio channels, or even for data-compressed video or
`computer data, while retaining compatibility with existing audio CD
`players. The theory of the new channel coding technique is described.
`
`0. Introduction
`
`The paper describes a new proposal for burying a high data rate data
`channel (with up to 360 kbit/s or more) compatibly within the data stream of
`an audio CD without significant impairment of existing CD performance. The
`proposal in this paper is to replace a number (up to four per channel) of the
`least significant bits (LSBs) of the audio words by other data, and to use the
`psychoacoustic noise shaping techniques associated with noise shaped
`subtractive dither to reduce the audibility of the resulting added noise down to
`a subjective perceived level equal to that of conventional CD.
`
Simply replacing the LSBs of existing audio data would, of course, cause a
drastic audible modification of the existing audio signal for two reasons:
1) the wordlength of existing signals would be truncated to (say) only
12 bits, which would not only reduce the basic quantization resolution by 24
dB, but also would introduce the problems of added distortion and modulation
noise caused by truncation (e.g. see refs. [1-4]).
`2) Additionally, the replaced last (say) 4 LSBs would themselves
`constitute an added noise signal, which itself may not have a perceptually
`desirable random-noise like quality, and will also add to the perceived noise
`level in the main audio signal, typically increasing the noise by a further 3 dB
`above that due to truncation alone, giving in this case as much as 27 dB
`degradation total in noise performance.
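
The figures quoted for plain LSB replacement can be checked with a small sketch. It shows only the naive replacement discussed above, not the method proposed in the paper; the function names are hypothetical, and the 44.1 kHz two-channel format of the audio CD is used for the data-rate estimate.

```python
import math


def replace_lsbs(sample, data, k):
    """Overwrite the k least significant bits of an integer sample."""
    mask = (1 << k) - 1
    return (sample & ~mask) | (data & mask)


def extract_lsbs(sample, k):
    """Read back the k least significant bits."""
    return sample & ((1 << k) - 1)


def resolution_loss_db(k):
    """Loss in quantization resolution when k bits are given up."""
    return 20 * math.log10(2 ** k)


def buried_data_rate(k, sample_rate=44100, channels=2):
    """Bits per second carried by k LSBs per sample in every channel."""
    return k * sample_rate * channels
```

Giving up 4 LSBs costs `resolution_loss_db(4)`, about 24 dB, as stated in the text, and `buried_data_rate(4)` comes to 352 800 bit/s for a stereo CD.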
`
This paper describes methods of overcoming all these problems in replacing
`the last few LSBs of an audio signal by other data. The new method involves
`the following steps:
`A) Using a pseudo-random encode/decode process, operating only on
`the LSB data stream itself without extra synchronizing signals, to make the
`added LSB data effectively of random noise form, so that the added signal
`becomes truly noise-like.
`B) Using this pseudo-random data signal as a subtractive dither signal
`(e.g. see [1-4]), so that simultaneously it does not add to the perceived noise
`and that it removes all nonlinear distortion and modulation noise effects
`caused by truncation. Remarkably, and unlike in the ordinary subtractive
`dither case [3], this does not require the use of a special subtractive dither
`decoder, so that the process works on a standard off-the-shelf CD player,
and
C) additionally, incorporating at the encoding stage
psychoacoustically optimized noise shaping of the (subtractive) truncation
error, thereby reducing the perceived truncation noise error by around 17 dB
`furthe