SURROUND—STEREO —SURROUND CODING
PAPERS

works best on sounds whose spectral components are close to those of the
masking sound, but also occurs for components further away. The effect
decreases more quickly toward lower frequencies than toward higher ones. The
same is true for the time behavior: the masking is greatest for sounds which
occur simultaneously, but can also be perceived in the time intervals shortly
before and after the masking sound is supplied.

As stated, it can be deduced from the masking effect that there are signals
which can be added inaudibly to an audio signal. The momentary power spectrum
of these signals should therefore remain at all times under the masked
threshold at that point in time. This means that a data flow (a series of
bits) can also be added, namely by constructing a signal of this kind from
these bits. This can be done in the following way (see Fig. 1).

In order to use the masking effect, the signal is first split into subbands by
means of filtering. The samples in each subband are then grouped into
consecutive time windows (of approximately 10 ms in length). The windows from
all subbands which represent the same time interval form blocks. For each
block the power spectrum is calculated, which is then used to determine the
masked threshold in each subband [6]. From this the maximum permitted power of
a signal to be added can be obtained per subband, so that this signal can be
constructed from the data flow. After the addition the subband signals are
joined together again by a reconstruction filter bank to form a wide-band
signal. On the premise that the implemented scheme determines the masked
threshold correctly, the resulting wide-band signal will sound the same as the
original audio signal. In this paper it is assumed that the masking model used
is correct. Extensive listening tests, however, have confirmed this [7].

[Fig. 1. Basic diagram for data addition. Blocks: audio in, filtering and down
sampling, blocking, analyzing masking threshold, constructing (data in),
addition, up sampling and filtering, combined out.]

The signal to be added is constructed from the data flow and the set masked
threshold as follows. A certain number of consecutive bits from the data flow
are taken together to form words. Each word is interpreted as an address which
indicates a unique sample value, as shown in Fig. 2. The series of bits is
therefore converted into a series of samples via this word series. These data
samples are then grouped into windows and added to the corresponding samples
in the subband window of the original audio signal.

[Fig. 2. Example as illustration of data sample construction with 3-bit words.
Horizontal axis: address (000, 001, 011, 010, 110, 111, 101, 100); vertical
axis: sample value, with consecutive values spaced $\Delta_b$ apart.]

The number of bits $n_b$ which are used to form one word depends on the set
masked threshold in the subband and the difference $\Delta_b$ between the
consecutive sample values (see Fig. 2; $\Delta_b$ will be referred to in the
following as the bit step size). By assuming that the incoming series of bits
has a uniform probability density distribution, a power

$$P_b = \frac{(2^{2 n_b} - 1)\,\Delta_b^2}{12} \tag{1}$$

J. Audio Eng. Soc., Vol. 40, No. 5, 1992 May
`
`DISH-Blue Spike-246
`Exhibit 1010, Page 0401
`
`

`

`TEN KATE ET AL
`
can be assigned to each window of samples constructed in this way. With a
given bit step size $\Delta_b$, the number of bits per word is thus obtained
from Eq. (1) as the maximum number $n_b$ that still supplies a power under the
set masked threshold in the corresponding subband. How the size of $\Delta_b$
is determined is discussed in Sec. 1.2.
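The selection of $n_b$ from Eq. (1) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `n_max` cap, and the example numbers are assumptions introduced here, and the masked threshold is taken as an already computed power value.

```python
def data_signal_power(n_b, delta_b):
    """Eq. (1): power of 2**n_b equiprobable sample values spaced delta_b apart."""
    return (2 ** (2 * n_b) - 1) * delta_b ** 2 / 12.0


def max_bits_per_word(masked_threshold_power, delta_b, n_max=16):
    """Largest n_b whose data power stays under the masked threshold.

    n_max is an arbitrary safety cap chosen for this sketch.
    """
    n_b = 0
    while (n_b < n_max
           and data_signal_power(n_b + 1, delta_b) < masked_threshold_power):
        n_b += 1
    return n_b


if __name__ == "__main__":
    # With delta_b = 1 the powers for n_b = 1..4 are 0.25, 1.25, 5.25, 21.25.
    print(max_bits_per_word(6.0, 1.0))
```

A threshold that lies below the power of a single-bit word yields zero bits, which corresponds to a subband in which nothing can be added.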
The signal constructed in this way will have a power spectrum, the height of
which is given by Eq. (1), but which is extended over the whole frequency
range. However, the addition of the data signal at subband level limits the
width of this spectrum to that of the subband. The grouping in subband time
blocks is thus used not only to determine the masking properties of the audio
signal, but also to modify the frequency-time characteristic of the data
signals to be added.
The schematic diagram for retrieving the added data from the audio signal
produced is shown in Fig. 3. The audio signal is first filtered in subbands
and grouped in time windows, so that the same blocks are formed again (the
filter banks to be used are of the (nearly) perfect reconstruction type [8]).
After the position of the masked threshold has been determined, the sample
values are extracted from the data signal as they were constructed during the
addition. From the position of the masked threshold, the number of bits $n_b$
that was added is again determined using Eq. (1). Finally, by using the same
addressing table as that used during the addition (Fig. 2), the conversion to
bit words can be made which, by placing them one after the other, again form
the original data flow. Retrieval is thus obtained.
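The addressing table of Fig. 2 lists the 3-bit addresses in reflected Gray-code order. A minimal sketch of such a table, used in both directions, is given below. It assumes that the sample values are centered on zero and increase with the position in the Gray sequence; that placement, like all function names, is an assumption made for illustration.

```python
def gray_encode(k):
    """Index k (0 .. 2**n_b - 1) to its reflected Gray-code word."""
    return k ^ (k >> 1)


def gray_decode(word):
    """Inverse of gray_encode."""
    k = 0
    while word:
        k ^= word
        word >>= 1
    return k


def word_to_sample(word, n_b, delta_b):
    """Addition side: bit word (address) to data sample value."""
    return (gray_decode(word) - (2 ** n_b - 1) / 2.0) * delta_b


def sample_to_word(sample, n_b, delta_b):
    """Retrieval side: nearest noise-free sample value back to its bit word."""
    k = round(sample / delta_b + (2 ** n_b - 1) / 2.0)
    k = min(max(k, 0), 2 ** n_b - 1)  # clip to the table range
    return gray_encode(k)


if __name__ == "__main__":
    print([format(gray_encode(k), "03b") for k in range(8)])
```

Because neighbouring sample values differ in exactly one address bit, a sample that is pushed over one decision threshold by noise corrupts only a single bit of the word.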
In order to distinguish between the added data sample value and the original
audio sample value, it is necessary to apply a reference level in the combined
signal. A level of this kind can be achieved by first quantizing the audio
samples before carrying out the addition. In this case, quantizing can be
described as

$$Q(s) = \Delta_Q \cdot \mathrm{ROUND}(s/\Delta_Q) \tag{2}$$

where $s$ is the value of the sample to be quantized, $Q(s)$ its value after
quantization and $\Delta_Q$ the quantization step size. In order to
distinguish between audio sample value and data sample value, a step size
$\Delta_Q$ should be used which is greater than the range of possible data
sample values:

$$\Delta_Q > 2 \cdot \frac{2^{n_b} - 1}{2}\,\Delta_b \tag{3}$$

The data sample value can then be recognized as the "quantization noise,"
which results from quantizing the combined sample again (see Fig. 4).
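The addition and extraction steps built on Eqs. (2) and (3) can be sketched on a single subband sample as follows. This is an illustrative sketch with assumed function names and example values, and it ignores the channel noise treated in Sec. 1.2.

```python
def quantize(s, delta_q):
    """Eq. (2): Q(s) = delta_q * ROUND(s / delta_q).

    Python's round() resolves exact ties to the even integer; ties do not
    occur for the data values as long as Eq. (3) holds.
    """
    return delta_q * round(s / delta_q)


def add_data(audio_sample, data_sample, delta_q):
    """Quantize the audio sample, then add the data sample to it."""
    return quantize(audio_sample, delta_q) + data_sample


def extract_data(combined_sample, delta_q):
    """Requantize the combined sample; the residue is the data sample."""
    return combined_sample - quantize(combined_sample, delta_q)


if __name__ == "__main__":
    n_b, delta_b = 3, 1.0
    delta_q = 8.0  # satisfies Eq. (3): 8 > (2**3 - 1) * 1
    levels = [(k - (2 ** n_b - 1) / 2.0) * delta_b for k in range(2 ** n_b)]
    for d in levels:
        combined = add_data(1234.3, d, delta_q)
        print(d, combined, extract_data(combined, delta_q))
```

With $\Delta_Q$ chosen according to Eq. (3), every data value stays within half a quantization step of the quantized audio value, so requantizing the combined sample recovers the same reference level.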
The quantization of the audio signal reduces the accuracy of its
representation, and this can be modeled as an increase in its noise level.
Because the quantization has been used on a time-limited subband signal, this
noise is however masked as long as its power remains under the masked
threshold. (This property is also used with bit-rate reduction techniques
[6].) The noise power is given as [9]

$$P_Q = \frac{\Delta_Q^2}{12} \tag{4}$$
`
Because the quantization noise and the data signal are not correlated, the
total power $P_t$ to be masked is obtained from the sum of their respective
powers, given by Eqs. (1) and (4). Using Eq. (3), this power can be written as

$$P_t = P_b + P_Q < \text{[illegible]} \tag{5}$$
`
The addition and retrieval parameters $\Delta_Q$ and $n_b$ can therefore be
determined as follows. After determining the masked threshold, the maximum
possible quantization step size $\Delta_Q$ is determined using Eq. (5). The
maximum number of bits $n_b$ which can be added is then obtained from Eq. (3).

The resulting addition process can also be viewed as follows. It is determined
for each sample value which part of its representation is significant and
which part is not. This distinction is made possible by the masking effect:
only a limited accuracy can be detected by the human ear. The insignificant
part of the signal is then replaced by a different value, which indicates the
information to be added.

[Fig. 3. Basic diagram for data retrieval. Blocks: combined in, filtering and
down sampling, blocking, analyzing masking threshold, extraction,
deconstructing, data out.]
`
`1.2 Noise
The starting point is that the processing takes place with digital audio
signals. This means that the combined signal produced will be quantized after
the final filtering to a wide-band signal (see Fig. 1) to the representation
accuracy of the transmission channel over which it will be sent. This creates
quantization noise with, in the case of a channel with a linear quantization
(PCM), a flat spectrum (that is, over the whole audio band) and a power $P_N$
of [9]

$$P_N = \frac{\Delta_{ch}^2}{12} \tag{6}$$

in which $\Delta_{ch}$ indicates the quantization step size of the
transmission channel.
The audio signal is filtered again in subbands at the receiver end (see
Fig. 3). This affects the channel quantization noise in two ways. First, the
probability density distribution of the noise will change into a Gaussian one
and second, the power in each subband will decrease in proportion to the
bandwidth of this subband. Thus in the case of a perfect transmission channel
and a filtering in $M$ subbands of equal width, the subband samples received
have a noise component with a probability density function

$$p(e) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{e^2}{2\sigma^2}\right) \tag{7a}$$

where $e$ is the magnitude and $\sigma$ the standard deviation. $\sigma$ is
given by

$$\sigma = \frac{\Delta_{ch}}{\sqrt{12M}} \tag{7b}$$

It is this standard deviation $\sigma$ which determines the selection of the
bit step size $\Delta_b$ in Eq. (1).

The data bits are recovered by converting the data samples received back to
their address bit words according to a procedure as shown in Fig. 2. As a
result of the noise, faults may occur in this process. By the use of a Gray
code conversion [9] (Fig. 2) only 1 bit will toggle in the bit word each time
the noise exceeds a decision threshold. (These thresholds lie in the middle
between the noise-free sample values.)

Using Eq. (7a) an estimate can now be made of the error probability $P(n)$
that $n$ bits will be converted incorrectly ($n \geq 1$):

$$P(n) = \int_{-\infty}^{-(n - 1/2)\Delta_b} p(e)\,de + \int_{(n - 1/2)\Delta_b}^{\infty} p(e)\,de = 1 - \operatorname{erf}\left(\frac{(2n - 1)\,\Delta_b}{2\sqrt{2}\,\sigma}\right) \tag{8}$$

Thus with $\sigma$ according to Eq. (7b), $\Delta_b$ can be set for a certain
error probability $P(n)$. On the other hand, $\Delta_b$ affects the number of
bits $n_b$ that can be added [see Eq. (3)]. As a result there is a tradeoff
between $n_b$ and $P(n)$.

In fact, the audio signal itself can be regarded as a "channel" over which the
data are transported. A channel capacity $C$ can then be defined as

$$C = \frac{1}{M} \sum_{m=0}^{M-1} n_{b,m} \tag{9}$$

where $M$ is the number of subbands and $n_{b,m}$ is the number of added bits
per sample in subband $m$. According to Eq. (3), $n_{b,m}$ follows as

$$n_{b,m} = \mathrm{TRUNC}\left[{}^{2}\!\log\left(\frac{\Delta_{Q,m}}{\Delta_{b,m}} + 1\right)\right] \tag{10}$$

in which $\Delta_{Q,m}$ and $\Delta_{b,m}$ are the quantization step size and
bit step size in subband $m$, respectively. If the subbands are all of equal
width, then the channel noise $\sigma$ [Eq. (7b)] is of equal strength in each
subband and $\Delta_{b,m}$ can thus be taken the same in each subband
[Eq. (8)], $\Delta_{b,m} = \Delta_b = c_b\,\sigma$.

[Fig. 4. Addition and extraction blocks from Figs. 1 and 3 in greater detail.
Addition: the audio sample is quantized and the data sample is added.
Extraction: the combined sample is quantized again and the data sample is
obtained as the difference.]

Eq. (9) can now be written as

$$C = \frac{1}{M} \sum_{m=0}^{M-1} \mathrm{TRUNC}\left[{}^{2}\!\log\left(\frac{\sqrt{12M}}{c_b}\,\frac{\Delta_{Q,m}}{\Delta_{ch}} + 1\right)\right] \approx {}^{2}\!\log\frac{\sqrt{12M}}{c_b} + \frac{1}{M} \sum_{m=0}^{M-1} {}^{2}\!\log \Delta_{Q,m} - {}^{2}\!\log \Delta_{ch} \tag{11}$$
`
The first term reflects the effect of the channel noise. An increase in the
parameter $M$, that is, splitting up the signal into more subbands, reduces
the noise contribution in each band, which means that more bits can be added.
This increases the complexity of the system and also the delay of the audio
signal as a result of the narrow-band filtering. The coefficient $c_b$ takes
into account the tradeoff between the number of bits added and the error
probability occurring. The second term indicates the masking effect of the
audio signal: the greater the masking, the greater $\Delta_Q$, and thus the
more information can be added. (As a result of the filtering, addition is also
possible if some bands have $\Delta_{Q,m} = 0$.) The third term indicates that
an increase in the representation accuracy of the audio signal increases the
channel capacity by approximately the same size. For example, representation
with 18 bits instead of 16 (linear PCM) means a four-times reduction of
$\Delta_{ch}$ and thus an increase of $C$ by 2 bits. (It is assumed here that
addition has already taken place in each subband.) In the case of a
transmission channel in which the representation accuracy varies, such as, for
example, in NICAM [10], it may be useful to normalize $\Delta_{Q,m}$ by
$\Delta_{ch}$ to a new parameter, which means that the varying property can
then be eliminated.
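Eqs. (7b) through (10) can be tried out numerically with the sketch below. The values $M = 32$, $c_b = 8$ and $\Delta_{Q,m} = 64$ (in units of the 16-bit channel step) are arbitrary assumptions chosen for illustration, as are the function names; `math.erfc(x)` is used for $1 - \operatorname{erf}(x)$.

```python
import math


def subband_noise_sigma(delta_ch, M):
    """Eq. (7b): standard deviation of the channel noise per subband."""
    return delta_ch / math.sqrt(12 * M)


def error_probability(n, delta_b, sigma):
    """Eq. (8): probability that the noise exceeds (n - 1/2) * delta_b."""
    return math.erfc((2 * n - 1) * delta_b / (2 * math.sqrt(2) * sigma))


def bits_per_sample(delta_q, delta_b):
    """Eq. (10): TRUNC of the base-2 logarithm of (delta_q / delta_b + 1)."""
    return int(math.floor(math.log2(delta_q / delta_b + 1)))


def capacity(delta_qs, delta_ch, c_b):
    """Eq. (9), with delta_b = c_b * sigma taken equal in all subbands."""
    M = len(delta_qs)
    delta_b = c_b * subband_noise_sigma(delta_ch, M)
    return sum(bits_per_sample(dq, delta_b) for dq in delta_qs) / M


if __name__ == "__main__":
    delta_qs = [64.0] * 32                  # assumed masking-derived step sizes
    print(error_probability(1, 8.0, 1.0))   # c_b = 8: roughly 6.3e-5
    print(capacity(delta_qs, 1.0, 8.0))     # 16-bit channel
    print(capacity(delta_qs, 0.25, 8.0))    # 18-bit channel, delta_ch / 4
```

Under these assumed values the capacity rises from 7 to 9 bits per sample when $\Delta_{ch}$ is reduced by a factor of four, in line with the 2-bit increase described for the third term of Eq. (11).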
As stated, (nearly) perfect reconstruction filter banks are used [8]. This is
necessary to ensure that the (subband) sample values used in the retrieval are
(almost) the same as those which occurred after the addition (except for the
wide-band quantization noise). In the filter structures used, up and down
sampling takes place (Figs. 1 and 3). This makes the system a multirate
system. For proper functioning the total delay between the two filters on both
sides of the transmission channel must be an integer multiple of the highest
down-sampling factor ($M$). In that case the delay at subband level is also an
integer number of sample periods. Consequently, synchronization is required at
the receiver end (processing in windows also makes this necessary). By not up
and down sampling, this synchronization seems to be no longer required.
However, the perfect reconstruction property will then be lost. Because of the
processing (quantizing and adding) the spectrum of the subband samples changes
over the whole bandwidth (given by the sampling frequency), while their
filters only allow through the part in the corresponding subband. These two
only coincide when sampling at the critical rate, and only then is perfect
reconstruction possible. (The filter sequence for which the (nearly) perfect
reconstruction property must apply is synthesis-analysis, that is, the reverse
of what the filter banks were designed for [8]. The fact that in this case the
perfect reconstruction property is also valid can be seen by looking at the
analysis-synthesis-analysis cascade. The first two filters form a perfect
reconstruction pair as they were designed. The signals at the input of both
analysis filters are therefore identical. Because the analysis filters are the
same, it follows that the synthesis-analysis pair must also be a perfect
reconstruction pair.)
`A different approach to the one stated here is Nyquist's
`first criterion. From this it also follows that with an
`ideal bandpass filter no intersymbol interference occurs
`if the symbols are on (a multiple of) the critical rate
`(and are detected synchronously).
`
`2 COMPATIBLE CODING
`
`2.1 The Principle
Using the technique presented, a surround-stereo-surround coding system can
now be developed which is very suitable for use in HDTV. Multichannel audio
can be sent over a stereo transmission channel so that stereo reception is
possible without additional modification, while there is the possibility of
surround reception with a receiver equipped with additional electronics. In
the following it will be assumed that the HDTV audio consists of five audio
channels.

Fig. 5 shows the principle of the system. The programs are supplied with
five-channel sound. A down mix to two-channel stereo is then made from this
version. There are no restrictions on the way in which this down mix is made,
that is, a signal with an optimum stereo effect can be produced. In addition
to the stereo signal, a three-channel (audio) signal is also generated which,
together with the stereo signal, contains all the information on the original
five-channel composition. These information signals are then added to the
stereo signal according to the technique described in Sec. 1 and retrieved at
the receiver end.
`
[Fig. 5. Proposed coding scheme. The 5-channel audio (L, R, C, SL, SR) enters
MIX, which produces the 2-channel stereo signal (L', R') and the surround
information (H1, H2, H3); ADD combines them for transmission.]

`
Because of the identical format, the signal transmitted is compatible and
existing receivers can still be used. Reproduction of this signal will give
the listener the stereo sensation as it was optimized during the down mix. Of
course, the extra information is also reproduced but, because of the masking
effect, the listener is not aware of this. This information is however still
available by means of the technique described. The receiver must be expanded
for this with additional electronics. After retrieving this information, the
down mix carried out can be reversed, which means that the reproduction of the
five-channel surround-sound sensation becomes possible.
`
`2.2 The System
The original five audio channels are indicated with $L$, $R$, $C$, $S_L$, and
$S_R$. Of these the first two signals are thought to be supplied to
loudspeakers which are on the left and right of the video screen,
respectively, the third (central) signal to a loudspeaker near the screen, and
the latter two signals (surround) to the loudspeakers behind the listener (see
Fig. 6). A stereo down mix could be

$$L' := L + \tfrac{1}{2}\sqrt{2}\,C + S_L \tag{12a}$$

$$R' := R + \tfrac{1}{2}\sqrt{2}\,C + S_R \tag{12b}$$

(Other possibilities are conceivable.) Numerous signals can store the surround
information here, but one possibility is

$$H_1 := C \tag{12c}$$

$$H_2 := S_L \tag{12d}$$

$$H_3 := S_R \tag{12e}$$
`
[Fig. 6. Loudspeaker setup for five-channel surround sound, arranged around
the HDTV video screen.]
`
In this case it is, of course, sensible first to use data reduction on $C$,
$S_L$, and $S_R$ [6]. The $L'$ and $R'$ signals are processed according to the
method described in Sec. 1, and the information $H_1$, $H_2$, $H_3$ is added.
After retrieving this information, the down mix can be reversed and the
five-channel sensation can be produced again:

$$L'' := L' - \left(\tfrac{1}{2}\sqrt{2}\,H_1 + H_2\right) \tag{13a}$$

$$R'' := R' - \left(\tfrac{1}{2}\sqrt{2}\,H_1 + H_3\right) \tag{13b}$$

$$C'' := H_1 \tag{13c}$$

$$S_L'' := H_2 \tag{13d}$$

$$S_R'' := H_3 \tag{13e}$$
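The down mix of Eqs. (12) and the dematrixing of Eqs. (13) can be checked sample by sample with the sketch below. The function names and sample values are assumptions, and the information signals are passed through unchanged here, that is, without the data reduction and the quantization of Sec. 1.

```python
import math

A = 0.5 * math.sqrt(2.0)  # weight of the center channel in Eqs. (12a,b)


def downmix(L, R, C, SL, SR):
    """Eqs. (12a-e): stereo pair (L', R') plus information signals (H1, H2, H3)."""
    return L + A * C + SL, R + A * C + SR, (C, SL, SR)


def dematrix(Lp, Rp, H):
    """Eqs. (13a-e): recover the five channels from L', R' and H1, H2, H3."""
    H1, H2, H3 = H
    return Lp - (A * H1 + H2), Rp - (A * H1 + H3), H1, H2, H3


if __name__ == "__main__":
    original = (0.3, -0.2, 0.5, 0.1, -0.4)  # one sample of L, R, C, SL, SR
    Lp, Rp, H = downmix(*original)
    print(dematrix(Lp, Rp, H))
```

Without quantization the five output channels equal the inputs up to floating-point rounding; the text that follows discusses what changes once $L'$ and $R'$ have been quantized.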
`
A problem may occur as a result of this dematrixing. During the addition of
the information, a quantization must be carried out (see Sec. 1.1). This
quantization is carried out on the subband samples of $L'$ and $R'$ and in
such a way that the resulting quantization noise is masked by these audio
signals and thus remains inaudible. The stereo signal including the added
information thus still creates the same listening experience. Dematrixing
[Eqs. (13)] can however separate the audio signal from the quantization noise,
which means that the noise could become audible. The effect becomes clear by
looking at a silent channel (and switching off the other loudspeakers when
listening). Assume, for example, that all channels with the exception of
channel $C$ are silent. In that case $L'$ and $R'$ are both equal to
$\tfrac{1}{2}\sqrt{2}\,C$ [see Eqs. (12a,b)]. These signals are quantized and
$H_1$ ($= C$), $H_2$ (silent), and $H_3$ (silent) are added. After retrieval,
$C$, $S_L$, and $S_R$ are determined from $H_1$, $H_2$, and $H_3$. The result
is used to reverse the down-mixing. This dematrixing will remove
$(\tfrac{1}{2}\sqrt{2}\,H_1 + H_{2,3}) = \tfrac{1}{2}\sqrt{2}\,C$ from $L'$
and $R'$ [see Eqs. (13a,b)]. As a result of this the quantization noise
produced during the addition procedure remains in the left and right channels
$L''$ and $R''$, while the signal that masked this noise,
$\tfrac{1}{2}\sqrt{2}\,C$, is now transmitted to another loudspeaker, $C''$.
Because the audio signal is still present, it will still have a masking effect
on the quantization noise, though this will be less effective than if they
were both generated by the same loudspeaker.
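The center-only case described above can be reproduced with a small numerical sketch. It is simplified on purpose: the quantization is applied directly to full-band samples instead of subband samples, the added data samples are left out, and $H_1$ is assumed to be retrieved without error. The test signal and the step size are arbitrary assumptions.

```python
import math

A = 0.5 * math.sqrt(2.0)  # center-channel weight from Eqs. (12a,b)


def quantize(s, delta_q):
    """Eq. (2): uniform quantization with step size delta_q."""
    return delta_q * round(s / delta_q)


def left_residual(center_samples, delta_q):
    """L'' when only C is active: quantized L' minus the dematrixed center part."""
    residual = []
    for c in center_samples:
        l_prime = quantize(A * c, delta_q)  # L' = (1/2)sqrt(2) C, then quantized
        residual.append(l_prime - A * c)    # Eq. (13a) with H1 = C and H2 = 0
    return residual


if __name__ == "__main__":
    center = [math.sin(0.1 * n) for n in range(50)]
    noise = left_residual(center, 0.05)
    print(max(abs(r) for r in noise))       # bounded by delta_q / 2
```

Although $L$ itself was silent, $L''$ contains the quantization error of $L'$, bounded by half a quantization step, while its masker is reproduced by the center loudspeaker.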
A remedy is to expand the information signals $H_{1,2,3}$ with some extra
control information. This information then indicates which channels are
silent, so that after dematrixing, any residual sound can be removed from
these channels. Possibly the information is given for every subband
separately. In addition, instead of always coding $C$, $S_L$, and $S_R$ in
$H_1$, $H_2$, and $H_3$, it is better to take the weakest three of $L$, $R$,
$C$, $S_L$, and $S_R$. This ensures that the quantization noise is always in
those signals which give the greatest masking and therefore that the chance of
its audibility is limited. The choice made is added as control information to
$H_{1,2,3}$ and used during dematrixing. Informal listening tests on various
types of program material have proven the validity of this procedure. Only by
switching off some channels could it occur that noises in the other channels
became audible. Those cases only happened with specially constructed signals.
Common audio signals did not reveal any problem.
A complete absence of audible quantization noise is possible by adapting the
(audio) input of the masking model [11]. Instead of the power spectrum of the
down-mixed stereo signal, that of the signal which will remain after
dematrixing should be used. For example, in the case described by Eqs. (12)
and (13) the power spectrum of $L$ and $R$ instead of $L'$ and $R'$ should be
taken to determine the masked threshold.
A final question is whether there is always sufficient room available in the
stereo signal to add the information. As explained in Sec. 1 with Eq. (11),
this amount of room depends on two main factors, namely, the masking power of
the audio signal and its representation accuracy [$\Delta_Q$ and $\Delta_{ch}$
in Eq. (11)]. It is clear that a higher representation accuracy simplifies the
task because the amount of information to be added is independent of it.
Experiments have, however, shown that the representations currently used offer
sufficient space for the information required. With regard to the masking
power of the audio signal, one might naively expect there to be problems with
low masking power. In this application, however, the information to be added,
$H_1$, $H_2$, and $H_3$, is an audio signal which is also present in the
masking signal itself, $L'$ and $R'$. In other words, if there is limited
masking, that is, if little room is available, there is also little
information to be added. In the extreme case of no masking ($L$, $R$, $C$,
$S_L$, and $S_R$ are all silent), for example, there is also no need to add
information. Another example is given by assuming $L$ and $R$ to contain the
direct sound and early reflections and $S_L$ and $S_R$ to contain the
reverberation of a concert-hall recording. When the music stops, there is
still a (decreasing) reverberation. However, in the down-mixed stereo signal
$L'$ and $R'$, this reverberation is also present and as a result there is
still an audio signal to mask the information to be added (which information
is that $L$ and $R$ are silent!).
`Within the European HDTV project EUREKA-95,
`the system is considered as a potential way to transmit
`HDTV sound. Its interesting feature is the compatibility
`to the two-channel D2MAC transmission standard. After
`various informal listening tests, which showed the sys-
`tem's potential, a formal listening test on the system's
`performance was organized by EU95. During the sum-
`mer of 1990 these tests have been conducted. Critical
`signals were constructed. The tests did not reveal any
`significant audible degradation of these signals after
`having been mixed into a two-channel NICAM stereo
`signal. Further formal listening tests are planned for
`early 1992.
`
`
`3 CONCLUSIONS
`
A new surround-stereo-surround coding technique
`is presented. The down mix to the stereo signal may
`be optimized to give the best stereo effect. The extra
`information required to reproduce the original multi-
`channel surround sensation using the stereo signal is
`added in this stereo signal. Here the masking effect is
`used so that the addition remains inaudible. Compat-
`ibility with current stereo standards is therefore guar-
`anteed. Using the system it is possible to maintain the
`original channel separation.
`
`4 ACKNOWLEDGMENT
`
`The authors would like to express their thanks to Dr.
`W. F. Druyvesteyn, who came up with the idea of using
`the masking effect for information addition, and to Dr.
`R. N. J. Veldhuis, who devised the basic algorithms
`for this addition.
`
`5 REFERENCES
`
[1] E. Stetter, "Mehrkanal-Stereoton zum Bild für Kino und Fernsehen"
(Multichannel Stereo Sound for Cinema and Television Picture), Rundfunktech.
Mitt., vol. 35, pp. 1-9 (1991).
[2] D. J. Meares, "Sound Systems for High Definition Television," Acoust.
Bull., vol. 15, pp. 6-11 (1990).
[3] W. R. Th. ten Kate, L. M. van de Kerkhof, and F. F. M. Zijderveld,
"Digital Audio Carrying Extra Information," in Proc. ICASSP90 (Albuquerque,
NM, 1990 Apr.), pp. 1097-1100.
[4] B. C. J. Moore, An Introduction to the Psychology of Hearing, 3rd ed.
(Academic Press, London, 1989).
[5] E. Zwicker and H. Fastl, Psychoacoustics: Facts and Models (Springer,
Berlin, 1990).
[6] R. N. J. Veldhuis, M. Breeuwer, and R. van der Waal, "Subband Coding of
Digital Audio Signals," Philips J. Res., vol. 44, pp. 329-343 (1989).
[7] C. Gerwin and T. Ryden, "Subjective Assessments on Low Bit-Rate Audio
Codecs," in Proc. 10th Int. AES Conf. on Images of Audio (London, 1991
Sept.), pp. 91-102.
[8] M. Vetterli and D. LeGall, "Perfect Reconstruction FIR Filter Banks: Some
Properties and Factorizations," IEEE Trans. Acoust., Speech, Signal Process.,
vol. ASSP-37, pp. 1057-1071 (1989).
[9] N. S. Jayant and P. Noll, Digital Coding of Waveforms (Prentice-Hall,
Englewood Cliffs, NJ, 1984).
[10] C. R. Caine, A. R. English, and J. W. H. O'Clarey, "NICAM 3:
Near-Instantaneously Companded Digital Transmission System for High-Quality
Sound Programmes," Rad. Elec. Eng., vol. 50, pp. 519-530 (1980).
[11] W. R. Th. ten Kate, P. M. Boers, A. Mäkivirta, J. Kuusama, E. Sorensen,
and K. E. Christensen, "Matrixing of Bit Rate Reduced Audio Signals," in
Proc. ICASSP92 (San Francisco, CA, 1992 March).
`
`
`THE AUTHORS

W. R. Th. ten Kate, L. M. van de Kerkhof, F. F. M. Zijderveld

Warner R. Th. ten Kate was born in Leiden, The Netherlands, in 1959. He
studied electrical engineering at Delft University of Technology, graduating
in 1982 cum laude, and received the 1983 prize awarded by the Delft University
Fund. During the final stages of his studies his research was directed at
solar cells of amorphous silicon and silicon radiation detectors. He received
the Ph.D. degree in 1987.
Since 1988 Dr. ten Kate has been working in the Acoustics Group of Philips
Research Laboratories. In 1985 he also began studying the French horn at the
Royal Conservatory in The Hague and graduated in 1989 with distinction.

•
Leon M. van de Kerkhof was born in Eindhoven, The Netherlands, in 1958. In
1978 he joined Philips Research Laboratories, where he worked on noise control
(including reactive sound absorbers and aerodynamic noise) and the use of
adaptive filters in acoustics. At the same time he began an evening course in
electrical engineering at the Institute of Technology. After graduating in
1981 he continued his studies at the Eindhoven University of Technology and
received a degree in 1987 cum laude. He then moved to Philips Consumer
Electronics. His activities are in the sphere of digital audio, in particular
audio source coding and HDTV sound. He is involved in various international
projects, including Eureka 95 (HDTV), Eureka 147 (Digital Audio Broadcasting),
JESSI AE14 (JESSI DAB), and ISO/MPEG Audio.

•
Franc F. M. Zijderveld was born in Helmond, The Netherlands, on 1961 November
20. In 1985 he completed his studies in electrical engineering at the
Eindhoven Institute of Technology, his final project being the realization of
an autofocus system for a CCD video camera. He then joined Philips Consumer
Electronics, where he worked in the development laboratory for video equipment
and was mainly involved in analog video signal processing in CCD cameras. In
1987 he moved to the Audio Signal Processing Group at the Philips Consumer
Electronics Advanced Development Centre, working on the installation of an
experimental four-channel audio postproduction room and on the digital 4-2-4
system. His current interest is centered on digital audio broadcasting.
A High-Rate Buried Data Channel for Audio CD

Preprint 3551 (D3-1)

Michael A. Gerzon
Technical Consultant, Oxford, United Kingdom
Peter G. Craven
Oxon, United Kingdom

Presented at the 94th Convention
1993 March 16-19
Berlin

This preprint has been reproduced from the author's advance manuscript,
without editing, corrections or consideration by the Review Board. The AES
takes no responsibility for the contents.

Additional preprints may be obtained by sending request and remittance to the
Audio Engineering Society, 60 East 42nd Street, New York, New York 10165, USA.

All rights reserved. Reproduction of this preprint, or any portion thereof, is
not permitted without direct permission from the Journal of the Audio
Engineering Society.

AN AUDIO ENGINEERING SOCIETY PREPRINT

`A High-Rate Buried Data Channel for Audio CD
`
Michael A. Gerzon
Technical Consultant, 57 Juxon St, Oxford OX2 6DJ, UK
Peter G. Craven
11 Wessex Way, Grove, Wantage, Oxon OX12 0BS, UK
`
`Abstract
`
`The paper describes a new proposal for burying a high data rate data
`channel (with up to 360 kbit/s or more) compatibly within the data stream
`of an audio CD without significant impairment of existing CD
`performance. The new data channel may be used for high-quality data-
`reduced related audio channels, or even for data-compressed video or
`computer data, while retaining compatibility with existing audio CD
`players. The theory of the new channel coding technique is described.
`
`0. Introduction
`
`The paper describes a new proposal for burying a high data rate data
`channel (with up to 360 kbit/s or more) compatibly within the data stream of
`an audio CD without significant impairment of existing CD performance. The
`proposal in this paper is to replace a number (up to four per channel) of the
`least significant bits (LSBs) of the audio words by other data, and to use the
`psychoacoustic noise shaping techniques associated with noise shaped
`subtractive dither to reduce the audibility of the resulting added noise down to
`a subjective perceived level equal to that of conventional CD.
`
Simply replacing the LSBs of existing audio data would, of course, cause a
drastic audible modification of the existing audio signal for two reasons:
1) The wordlength of existing signals would be truncated to (say) only 12
bits, which would not only reduce the basic quantization resolution by 24 dB,
but also would introduce the problems of added distortion and modulation noise
caused by truncation (e.g. see refs. [1-4]).
2) Additionally, the replaced last (say) 4 LSBs would themselves constitute an
added noise signal, which itself may not have a perceptually desirable
random-noise-like quality, and will also add to the perceived noise level in
the main audio signal, typically increasing the noise by a further 3 dB above
that due to truncation alone, giving in this case as much as 27 dB degradation
total in noise performance.
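The 24 dB and 27 dB figures can be reproduced with a short calculation. The sketch below models both the truncation error and the substituted LSB data as uncorrelated, uniformly distributed noise, which is an assumption made here for illustration; the function names are likewise hypothetical.

```python
import math


def truncation_loss_db(bits_removed):
    """Loss of resolution when the quantization step grows by 2**bits_removed."""
    return 20.0 * math.log10(2 ** bits_removed)


def total_degradation_db(bits_removed):
    """Truncation noise plus replaced-LSB data noise, relative to the original.

    Powers are in units of the original LSB squared. The data are modeled as
    2**bits_removed equiprobable integer values, so their variance is
    (step**2 - 1) / 12.
    """
    step = 2 ** bits_removed
    original_noise = 1.0 / 12.0
    truncation_noise = step ** 2 / 12.0
    data_noise = (step ** 2 - 1) / 12.0
    return 10.0 * math.log10((truncation_noise + data_noise) / original_noise)


if __name__ == "__main__":
    print(round(truncation_loss_db(4), 2))    # about 24 dB for 16 -> 12 bits
    print(round(total_degradation_db(4), 2))  # about 27 dB with the data noise
```

Under this model the data noise has almost the same power as the truncation noise, which is why it adds close to 3 dB.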
`
This paper describes methods of overcoming all these problems in replacing the
last few LSBs of an audio signal by other data. The new method involves the
following steps:
A) Using a pseudo-random encode/decode process, operating only on the LSB data
stream itself without extra synchronizing signals, to make the added LSB data
effectively of random noise form, so that the added signal becomes truly
noise-like.
B) Using this pseudo-random data signal as a subtractive dither signal (e.g.
see [1-4]), so that simultaneously it does not add to the perceived noise and
that it removes all nonlinear distortion and modulation noise effects caused
by truncation. Remarkably, and unlike in the ordinary subtractive dither case
[3], this does not require the use of a special subtractive dither decoder, so
that the process works on a standard off-the-shelf CD player, and
C) additionally incorporating, at the encoding stage, psychoacoustically
optimized noise shaping of the (subtractive) truncation error, thereby
reducing the perceived truncation noise error by around 17 dB
