`
Application Issues of MPEG-1/2 Video Coding
`
`
during a transition period, both NTSC and DTV service will be simultaneously broadcast on
different channels, and DTV can only use the taboo channels. This approach allows a smooth
transition to DTV, such that the services of the existing NTSC receivers will remain and gradually
be phased out of existence in the year 2006. The simulcasting requirement causes some technical
difficulties in DTV design. First, the high-quality HDTV program must be delivered in a 6-MHz
channel to make efficient use of spectrum and to fit allocation plans for the spectrum assigned to
television broadcasting. Second, a low-power and low-interference signal must be used so that
simulcasting in the same frequency allocations as current NTSC service does not cause excessive
interference with the existing NTSC receiving, since the taboo channels are generally unsuitable
for broadcasting an NTSC signal due to high interference. In addition to satisfying the frequency
spectrum requirement, the DTV standard has several important features, which allow DTV to
achieve interoperability with computers and data communications. The first feature is the adoption
of a layered digital system architecture. Each individual layer of the system is designed to be
interoperable with other systems at the corresponding layers. For example, the square pixel and
progressive scan picture format should be provided to allow computers access to the compression
layer or picture layer depending on the capacity of the computers, and the ATM-like packet format
for the ATM network to access the transport layer. Second, the DTV standard uses a header/descriptor
approach to provide maximum flexible operating characteristics. Therefore, the layered
architecture is the most important feature of DTV standards. The additional advantage of layering is
that the elements of the system can be combined with other technologies to create new applications.
The system of DTV standard includes four layers: the picture layer, the compression layer, the
transport layer, and the transmission layer.
`
17.2.2.1 Picture Layer
`
At the picture layer, the input video formats have been defined. The Executive Committee of the
ATSC has approved release of a statement regarding the identification of the HDTV and Standard
Definition Television (SDTV) transmission formats within the ATSC DTV standards. There are six
video formats in the ATSC DTV standard, which are "High Definition Television." These formats
are listed in Table 17.1.
The remaining 12 video formats are not HDTV formats. These formats represent some improvements
over analog NTSC and are referred to as "SDTV." These are listed in Table 17.2.
These definitions are fully supported by the technical specifications for the various formats as
measured against the internationally accepted definition of HDTV established in 1989 by the ITU
and the definitions cited by the FCC during the DTV standard development process. These formats
cover a wide variety of applications, which include motion picture film, currently available HDTV
production equipment, the NTSC television standard, and computers such as personal computers
and workstations. However, there is no simple technique which can convert images from one pixel
`
TABLE 17.1
HDTV Formats

Spatial Format (X x Y active pixels)   Aspect Ratio   Temporal Rate (Hz progressive scan)
1920 x 1080 (square pixel)             16:9           23.976/24, 29.97/30, 59.94/60
1280 x 720 (square pixel)              16:9           23.976/24, 29.97/30, 59.94/60
`IPR2021-00827
`Unified EX1008 Page 395
`
`
`
`
Image and Video Compression for Multimedia Engineering
`
TABLE 17.2
SDTV Formats

Spatial Format (X x Y active pixels)   Aspect Ratio   Temporal Rate (Hz progressive scan)
704 x 480 (CCIR601)                    16:9 or 4:3    23.976/24, 29.97/30, 59.94/60
640 x 480 (VGA, square pixel)          4:3            23.976/24, 29.97/30, 59.94/60
`
format and frame rate to another that achieves interoperability among film and the various worldwide
television standards. For example, all low-cost computers use square pixels and progressive scanning,
while current television uses rectangular pixels and interlaced scanning. The video industry
has paid a lot of attention to developing format-converting techniques. Some techniques, such as
deinterlacing and down/up-conversion for format conversion, have already been developed. It should
be noted that the broadcasters, content providers, and service providers can use any one of these
DTV formats. This results in a difficult problem for DTV receiver manufacturers, who have to provide
all kinds of DTV receivers to decode all these formats and then to convert the decoded signal to
its particular display format. On the other hand, this requirement also gives receiver manufacturers
the flexibility to produce a wide variety of products that have different functionality and cost, and
the consumers freedom to choose among them.
`
17.2.2.2 Compression Layer
`
The raw data rate of HDTV of 1920 x 1080 x 30 x 16 (16 bits per pixel corresponds to 4:2:2 color
format) is about 1 Gbps. The function of the compression layer is to compress the raw data from
about 1 Gbps to the data rate of approximately 19 Mbps to satisfy the 6-MHz spectrum requirement.
This goal is achieved by using the main profile and high level of the MPEG-2 video standard.
Actually, during the development of the Grand Alliance HDTV system, many research results were
adopted by the MPEG-2 standard at the same time; for example, the support for interlaced video
format and the syntax for data partitioning and scalability. The ATSC DTV standard is the first and
most important application example of the MPEG-2 standard. The use of MPEG-2 video compression
fundamentally enables ATSC DTV devices to interoperate with MPEG-1/2 computer multimedia
applications directly at the compressed bitstream level.
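
The arithmetic behind the "about 1 Gbps" figure can be checked with a few lines of Python. This is only a sketch of the calculation quoted above; the function name is hypothetical, and the 19 Mbps channel rate is the approximate value given in the text.

```python
def raw_bit_rate(width, height, frames_per_second, bits_per_pixel):
    """Uncompressed video bit rate in bits per second."""
    return width * height * frames_per_second * bits_per_pixel


# 1920 x 1080 pixels, 30 frames/s, 16 bits per pixel (4:2:2), as in the text.
raw_bps = raw_bit_rate(1920, 1080, 30, 16)

# Approximate channel payload rate quoted in the text.
channel_bps = 19_000_000

# Compression ratio the compression layer has to achieve.
ratio = raw_bps / channel_bps

print(f"raw rate: {raw_bps / 1e9:.3f} Gbps")
print(f"required compression ratio: roughly {ratio:.0f}:1")
```

The result shows that the compression layer has to remove all but about one fiftieth of the raw data.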
`
17.2.2.3 Transport Layer
`
The transport layer is another important issue for interoperability. The ATSC DTV transport layer
uses the MPEG-2 system transport stream syntax. It is a fully compatible subset of the MPEG-2
transport protocol. The basic function of the transport layer is to define the basic format of data
packets. The purposes of packetization include:

• Packaging the data into the fixed-size cells or packets for forward error correction (FEC)
encoding to protect against bit errors due to the communication channel noise;
• Multiplexing the video, audio, and data of a program into a bitstream;
• Providing time synchronization for different media elements;
• Providing flexibility and extensibility with backward compatibility.
`
`
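[Figure 17.1: a sequence of fixed-length transport packets, each starting with a 4-byte packet header and carrying a single payload type; the example sequence shows video, audio, video, video, audio, program guide (PGM GD), and video packets.]

FIGURE 17.1 Packet structure of ATSC DTV transport layer.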
`
The transport layer of ATSC DTV uses a fixed-length packet. The packet size is 188 bytes, consisting
of 184 bytes of payload and 4 bytes of header. Within the packet header, the 13-bit packet identifier
(PID) is used to provide the important capacity to combine the video, audio, and ancillary data
streams into a single bitstream as shown in Figure 17.1. Each packet contains only a single type of
data (video, audio, data, program guide, etc.) identified by the PID.
This type of packet structure packetizes the video, audio, and auxiliary data separately. It also
provides the basic multiplexing function that produces a bitstream including video, five-channel
surround-sound audio, and an auxiliary data capacity. This kind of transport layer approach also
provides complete flexibility to allocate channel capacity to achieve any mix among video, audio,
and other data services. It should be noted that the selection of the 188-byte packet length is a trade-off
between reducing the overhead due to the transport header and increasing the efficiency of error
correction. Also, one ATSC DTV packet can be completely encapsulated with its header within
four ATM packets by using 1 AAL byte per ATM header, leaving 47 usable payload bytes times 4,
for 188 bytes. The details of the transport layer are discussed in the chapter on MPEG systems.
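
The packet structure described above can be sketched in Python. The 188-byte size, the 4-byte header, and the 13-bit PID come from the text. The sync byte value 0x47 and the positions of the PID and continuity counter inside the header follow the MPEG-2 systems transport stream syntax and are not stated in this section. The helper names, the PID values, and the 0xFF payload padding are assumptions made only for illustration.

```python
SYNC_BYTE = 0x47                            # from MPEG-2 systems, not from the text
PACKET_SIZE = 188                           # bytes per transport packet
HEADER_SIZE = 4                             # bytes of header
PAYLOAD_SIZE = PACKET_SIZE - HEADER_SIZE    # 184 bytes of payload


def make_packet(pid: int, payload: bytes, continuity_counter: int = 0) -> bytes:
    """Build one simplified transport packet (no adaptation field)."""
    if not 0 <= pid < 1 << 13:
        raise ValueError("PID must fit in 13 bits")
    if len(payload) > PAYLOAD_SIZE:
        raise ValueError("payload exceeds 184 bytes")
    header = bytes([
        SYNC_BYTE,
        (pid >> 8) & 0x1F,                   # top 5 bits of the PID, flag bits left at 0
        pid & 0xFF,                          # low 8 bits of the PID
        0x10 | (continuity_counter & 0x0F),  # "payload only" plus 4-bit counter
    ])
    # Padding short payloads with 0xFF is a simplification for this sketch.
    return header + payload.ljust(PAYLOAD_SIZE, b"\xff")


def packet_pid(packet: bytes) -> int:
    """Extract the 13-bit PID from a transport packet."""
    if len(packet) != PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a valid 188-byte transport packet")
    return ((packet[1] & 0x1F) << 8) | packet[2]


def demultiplex(stream: bytes) -> dict:
    """Group packet payloads by PID, which is how a receiver separates the streams."""
    by_pid = {}
    for offset in range(0, len(stream), PACKET_SIZE):
        packet = stream[offset:offset + PACKET_SIZE]
        by_pid.setdefault(packet_pid(packet), []).append(packet[HEADER_SIZE:])
    return by_pid


VIDEO_PID, AUDIO_PID = 0x31, 0x34   # hypothetical PID values
stream = (
    make_packet(VIDEO_PID, b"video-0")
    + make_packet(AUDIO_PID, b"audio-0")
    + make_packet(VIDEO_PID, b"video-1", continuity_counter=1)
)
streams = demultiplex(stream)
print({hex(pid): len(payloads) for pid, payloads in streams.items()})

# ATM encapsulation arithmetic from the text: 4 cells x 47 usable bytes each.
ATM_USABLE_BYTES = 47
assert 4 * ATM_USABLE_BYTES == PACKET_SIZE
```

Because every packet carries exactly one PID, the multiplexer is free to interleave packets in any proportion, which is the channel-capacity flexibility the text refers to.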
`
17.2.2.4 Transmission Layer

The function of the transmission layer is to modulate the transport bitstream
into a signal that can be transmitted over the 6-MHz analog channel. The ATSC DTV system uses
a trellis-coded eight-level vestigial sideband (8-VSB) modulation technique to deliver approximately
19.3 Mbps in the 6-MHz terrestrial simulcast channel. VSB modulation inherently requires
only processing the in-phase signal sampled at the symbol rate, thus reducing the complexity of
the receiver, and ultimately the cost of implementation. The VSB signal is organized in a data
frame that provides a training signal to facilitate channel equalization for removing multipath
distortion. However, from several field-test results, the multipath distortion is still a serious problem
of terrestrial simulcast receiving. The frame is organized into segments each with 832 symbols.
Each transmitted segment consists of one synchronization byte (four symbols), 187 data bytes, and
20 R-S parity bytes. This corresponds to a 188-byte packet, which is protected by a 20-byte R-S
code. Interoperability at the transmission layer is required by different transmission media applications.
The different media use different modulation techniques now, such as QAM for cable and
QPSK for satellite. Even for terrestrial transmission, European DVB systems use OFDM transmission.
The ATV receivers will not only be designed to receive terrestrial broadcasts, but also the
programs from cable, satellite, and other media.
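
The segment size of 832 symbols can be reproduced from the byte counts given above. The sketch below assumes that each trellis-coded 8-VSB symbol carries 2 data bits (a rate-2/3 trellis code on 3-bit symbols). That figure is not stated in this section and is labeled as an assumption in the code, as are the function and constant names.

```python
SYNC_SYMBOLS = 4          # one synchronization byte sent as four symbols (from the text)
DATA_BYTES = 187          # data bytes per segment (from the text)
PARITY_BYTES = 20         # R-S parity bytes per segment (from the text)
DATA_BITS_PER_SYMBOL = 2  # assumption: rate-2/3 trellis code, 3 coded bits per symbol


def symbols_per_segment(data_bytes=DATA_BYTES, parity_bytes=PARITY_BYTES,
                        sync_symbols=SYNC_SYMBOLS,
                        data_bits_per_symbol=DATA_BITS_PER_SYMBOL):
    """Number of transmitted symbols in one data segment."""
    protected_bits = (data_bytes + parity_bytes) * 8
    return sync_symbols + protected_bits // data_bits_per_symbol


segment_symbols = symbols_per_segment()

# Share of the protected bytes that is R-S parity rather than data.
parity_overhead = PARITY_BYTES / (DATA_BYTES + PARITY_BYTES)

print(segment_symbols)
print(f"R-S parity overhead: {parity_overhead:.1%}")
```

Under that assumption the 207 protected bytes map to 828 symbols, and the four sync symbols bring the total to the 832 symbols quoted in the text.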
`
17.3 TRANSCODING WITH BITSTREAM SCALING

17.3.1 BACKGROUND
`
As indicated in the previous chapters, digital video signals exist everywhere in the format of
compressed bitstreams. The compressed bitstreams of video signals are used for transmission and
storage through different media such as terrestrial TV, satellite, cable, the ATM network, and the
`
`
Internet. The decoding of a bitstream can be implemented in either hardware or software. However,
for high-bit-rate compressed video bitstreams, specially designed hardware is still the major decoding
approach due to the speed limitation of current computer processors. The compressed bitstream
as a new format of video signal is a revolutionary change to the video industry since it enables many
applications. On the other hand, there is a problem of bitstream conversion. Bitstream conversion
or transcoding can be classified as bit rate conversion, resolution conversion, and syntax conversion.
Bit rate conversion includes bit rate scaling and the conversion between constant bit rate (CBR)
and variable bit rate (VBR). Resolution conversion includes spatial resolution conversion and
temporal resolution conversion. Syntax conversion is needed between different compression standards
such as JPEG, MPEG-1, MPEG-2, H.261, and H.263. In this section, we will focus on the
topic of bit rate conversion, especially on bit rate scaling since it finds wide application and readers
can extend the idea to other kinds of transcoding. Also, we limit ourselves to focus on the problem
of scaling an MPEG CBR-encoded bitstream down to a lower CBR. The other kind of transcoding,
down-conversion decoder, will be presented in a separate section.
The basic function of bitstream scaling may be thought of as a black box, which passively
accepts a precoded MPEG bitstream at the input and produces a scaled bitstream, which meets new
constraints that are not known a priori during the creation of the original precoded bitstream. The
bitstream scaler is a transcoder, or filter, that provides a match between an MPEG source bitstream
and the receiving load. The receiving load consists of the transmission channel, the destination
decoder, and perhaps a destination storage device. The constraint on the new bitstream may be bound
by a variety of conditions. Among them are the peak or average bit rate imposed by the communications
channel, the total number of bits imposed by the storage device, and/or the variation of bit
usage across pictures due to the amount of buffering available at the receiving decoder.
While the idea of bitstream scaling has many concepts similar to those provided by the various
MPEG-2 scalability profiles, the intended applications and goals differ. The MPEG-2 scalability
methods (data partitioning, SNR scalability, spatial scalability, and temporal scalability) are aimed
at providing encoding of source video into multiple service grades (that are predefined at the time
of encoding) and multitiered transmission for increased signal robustness. The multiple bitstreams
created by MPEG-2 scalability are hierarchically dependent in such a way that by decoding an
increasing number of bitstreams, higher service grades are reconstructed. Bitstream scaling methods,
in contrast, are primarily decoder/transcoder techniques for converting an existing precoded
bitstream to another one that meets new rate constraints. Several applications that motivate bitstream
scaling include the following:
`
1. Video-On-Demand. Consider a video-on-demand (VOD) scenario wherein a video file
server includes a storage device containing a library of precoded MPEG bitstreams.
These bitstreams in the library are originally coded at high quality (e.g., studio quality).
A number of clients may request retrieval of these video programs at one particular time.
The number of users and the quality of video delivered to the users are constrained by
the outgoing channel capacity. This outgoing channel, which may be a cable bus or an
ATM trunk, for example, must be shared among the users who are admitted to the service.
Different users may require different levels of video quality, and the quality of a respective
program will be based on the fraction of the total channel capacity allocated to each
user. To accommodate a plurality of users simultaneously, the video file server must scale
the stored precoded bitstreams to a reduced rate before they are delivered over the channel
to respective users. The quality of the resulting scaled bitstream should not be significantly
degraded compared with the quality of a hypothetical bitstream obtained by
coding the original source material at the reduced rate. Complexity cost is not such a
critical factor because only the file server has to be equipped with the bitstream scaling
hardware, not every user. Presumably, video service providers would be willing to pay
a high cost for delivering the highest possible quality video at a prescribed bit rate.
`
`
As an option, a sophisticated video file server may also perform scaling of multiple
original precoded bitstreams jointly and statistically multiplex the resulting scaled VBR
bitstreams into the channel. By scaling the group of bitstreams jointly, statistical gains
can be achieved. These statistical gains can be realized in the form of higher and more
uniform picture quality for the same channel capacity. Statistical multiplexing over a
DirecTv transponder (Isnardi, 1993) is one example of an application of video statistical
multiplexing.
2. Trick-Play Track on Digital VTRs. In this application, the video bitstream is scaled
to create a sidetrack on video tape recorders (VTRs). This sidetrack contains very coarse
quality video sufficient to facilitate trick-modes on the VTR (e.g., FF and REW at
different speeds). Complexity cost for the bitstream scaling hardware is of significant
concern in this application since the VTR is a mass consumer item subject to mass
production.
3. Extended-Play Recording on Digital VTRs. In this application, video is broadcast to
users' homes at a certain broadcast quality (~6 Mbps for standard-definition video and
~24 Mbps for high-definition video). With a bitstream scaling feature in their VTRs,
users may record the video at a reduced rate, akin to extended-play (EP) mode on today's
VHS recorders, thereby recording a greater duration of video programs onto a tape at
lower quality. Again, hardware complexity costs would be a major factor here.
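
The rate arithmetic behind the VOD and extended-play scenarios can be sketched as follows. All function names and all numbers other than the 6 Mbps broadcast rate are hypothetical. An equal split of the channel is only one possible allocation policy, and the text itself allows different users to receive different fractions.

```python
def reduction_fraction(original_bps, target_bps):
    """Fraction of the original bit rate that the scaler must remove."""
    if target_bps > original_bps:
        raise ValueError("bitstream scaling only reduces the rate")
    return 1.0 - target_bps / original_bps


def equal_share_targets(channel_bps, stored_rates_bps):
    """Give every admitted user an equal share of the outgoing channel."""
    share = channel_bps / len(stored_rates_bps)
    # A stream already below its share is delivered unchanged.
    return [min(rate, share) for rate in stored_rates_bps]


def recording_hours(tape_capacity_bits, rate_bps):
    """Duration that fits on a tape of the given capacity."""
    return tape_capacity_bits / rate_bps / 3600


# VOD: five users share a hypothetical 45 Mbps channel; programs stored at 15 Mbps.
targets = equal_share_targets(45e6, [15e6] * 5)
fractions = [reduction_fraction(15e6, t) for t in targets]

# Extended play: halving the 6 Mbps broadcast rate doubles the recording time.
tape_bits = 6e6 * 2 * 3600          # hypothetical tape holding 2 hours at 6 Mbps
ep_hours = recording_hours(tape_bits, 3e6)

print(targets[0], round(fractions[0], 3), ep_hours)
```

In the VOD case each program must be scaled from 15 Mbps to 9 Mbps, a 40% reduction, before it is placed on the channel.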
`
17.3.2 BASIC PRINCIPLES OF BITSTREAM SCALING
`
As described previously, the idea of scaling an MPEG-2-compressed bitstream down to a lower
bit rate is initiated by several applications. One problem is the criteria that should be used to judge
the performance of an architecture that can reduce the size or rate of an MPEG-compressed
bitstream. Two basic principles of bitstream scaling are (1) the information in the original bitstream
should be exploited as much as possible, and (2) the resulting image quality of the new bitstream
with a lower bit rate should be as close as possible to a bitstream created by coding the original
source video at the reduced rate. Here, we assume that for a given rate the original source is encoded
in an optimal way. Of course, the implementation of hardware complexity also has to be considered.
Figure 17.2 shows a simplified encoding structure of MPEG encoding in which the rate control
mechanism is not shown.
In this structure, a block of image data is first transformed to a set of coefficients; the coefficients
are then quantized with a quantizer step which is decided by the given bit rate budget, or number
of bits assigned to this block. Finally, the quantized coefficients are coded in variable-length coding
to the binary format, which is called the bitstream or bits.
`
[Figure 17.2: block diagram with the blocks T, Q, VLC, and P between the input source and the output bits.]

FIGURE 17.2 Simplified encoder structure. T = transform, Q = quantizer, P = motion-compensated prediction,
VLC = variable length.
`
`
From this structure it is obvious that the performance of changing the quantizer step will be
better than cutting higher frequencies when the same amount of rate needs to be reduced. In the
original bitstream the coefficients are quantized with finer quantization steps which are optimized
at the original high rate. After cutting the coefficients of higher frequencies, the rest of the
coefficients are not quantized with an optimal quantizer. In the method of requantization all
coefficients are requantized with an optimal quantizer which is determined by the reduced rate; the
performance of the requantization method must be better than the method of cutting high frequencies
to reach the reduced rate. The theoretical analysis is given in Section 17.3.4.
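
The two operations compared above can be illustrated on a single block of quantized coefficients. The sketch uses a plain uniform quantizer with one step size for all coefficients, which is a simplification of the MPEG quantization rules, and the coefficient values and function names are hypothetical. It shows only what each method does to the data and makes no claim about the resulting picture quality.

```python
def cut_high_frequencies(levels, keep):
    """Keep the first `keep` quantized levels in scan order and zero the rest."""
    return levels[:keep] + [0] * (len(levels) - keep)


def requantize(levels, q_old, q_new):
    """Dequantize with the original step q_old, then quantize with a coarser step q_new."""
    out = []
    for level in levels:
        value = level * q_old                              # reconstructed coefficient
        magnitude = (abs(value) + q_new // 2) // q_new     # nearest level, ties away from zero
        out.append(magnitude if value >= 0 else -magnitude)
    return out


def reconstruct(levels, q):
    """Coefficient values a decoder would recover from the levels."""
    return [level * q for level in levels]


# Hypothetical quantized levels of one block, ordered from low to high frequency.
levels = [10, -4, 3, 1, 0, 1, 0, 0]

cut = cut_high_frequencies(levels, keep=3)
coarse = requantize(levels, q_old=4, q_new=8)

print("original     :", reconstruct(levels, 4))
print("cut          :", reconstruct(cut, 4))
print("requantized  :", reconstruct(coarse, 8))
```

Cutting leaves the surviving coefficients at their original fine precision and discards the rest entirely, whereas requantization spreads the loss of precision over all coefficients.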
In the following, several different architectures that accomplish the bitstream scaling are
discussed. The different methods have varying hardware implementation complexities; each has its
own degree of trade-off between required hardware and resulting image quality.
`
17.3.3 ARCHITECTURES OF BITSTREAM SCALING
`
Four architectures for bitstream scaling are discussed. Each of the scaling architectures described
has its own particular benefits that are suitable for a particular application.

Architecture 1: The bitstream is scaled by cutting high frequencies.
Architecture 2: The bitstream is scaled by requantization.
Architecture 3: The bitstream is scaled by reencoding the reconstructed pictures with
motion vectors and coding decision modes extracted from the original high-quality
bitstream.
Architecture 4: The bitstream is scaled by reencoding the reconstructed pictures with
motion vectors extracted from the original high-quality bitstream, but new
coding decisions are computed based on reconstructed pictures.

Architectures 1 and 2 are considered for VTR applications such as trick-play modes and EP
recording. Architectures 3 and 4 are considered for VOD server applications and other applicable StatMux scenarios.
`
17.3.3.1 Architecture 1: Cutting AC Coefficients
`
A block diagram illustrating architecture 1 is shown in Figure 17.3(a). The method of reducing the
bit rate in architecture 1 is based on cutting the higher-frequency coefficients. The incoming
precoded CBR stream enters a decoder rate buffer. Following the top branch leading from the rate
buffer, a VLD is used to parse the bits for the next frame in the buffer to identify all the variable-length
codewords that correspond to AC coefficients used in that frame. No bits are removed from
the rate buffer. The codewords are not decoded, but just simply parsed by the VLD parser to
determine codeword lengths. The bit allocation analyzer accumulates these AC bit counts for every
macroblock in the frame and creates an AC bit usage profile as shown in Figure 17.3(b). That is,
the analyzer generates a running sum of AC DCT coefficient bits on a macroblock basis:
`
$$PV_N = \sum_{N} AC\_BITS, \tag{17.1}$$
`
where $PV_N$ is the profile value of a running sum of AC codeword bits until the macroblock N. In
addition, the analyzer counts the sum of all coded bits for the frame, TB (total bits). After all
macroblocks for the frame have been analyzed, a target value $TV_{AC}$ of AC DCT coefficient bits per
frame is calculated as
$$TV_{AC} = PV_{LS} - a \cdot TB - B_{EX}, \tag{17.2}$$
`
`
[Figure 17.3: (a) block diagram with the input bitstream, a top branch (VLD parser, bit allocation analysis, new bit rate input), and a bottom branch (delay, VLD parser, rate controller with frequency cut) leading to the output bits. (b) plot of cumulative bits used for AC coefficients against block number, showing the profile of original bits and the new target.]

FIGURE 17.3 (a) Architecture 1: cutting high frequencies. (b) Profile map.
`
where $TV_{AC}$ is the target value of AC codeword bits per frame, $PV_{LS}$ is the profile value at the last
macroblock, $a$ is the percentage by which the preencoded bitstream is to be reduced, TB is the
total bits, and $B_{EX}$ is the amount of bits by which the previous frame missed its desired target. The
profile value of AC coefficient bits is scaled by the factor $TV_{AC}/PV_{LS}$. Multiplying each $PV_N$
by that factor generates the linearly scaled profile shown in Figure 17.3(b). Following the
bottom branch from the rate buffer, a delay is inserted equal to the amount of time required for
the top branch analysis processing to be completed for the current frame. A second VLD parser
accesses and removes all codeword bits from the buffer and delivers them to a rate controller. The
rate controller receives the scaled target bit usage profile for the amount of AC bits to be used within
the frame. The rate controller has memory to store all coefficients associated with the current
macroblock it is operating on. All original codeword bits at a higher level than AC coefficients (i.e.,
all fixed-length header codes, motion vector codes, macroblock type codes, etc.) are held in memory
and will be remultiplexed with all AC codewords in that macroblock that have not been excised to
form the outgoing scaled bitstream. The rate controller determines and flags in the macroblock
codeword memory which AC codewords to keep and which to excise. AC codewords are accessed
from the macroblock codeword memory in the order AC11, AC12, AC13, AC14, AC15, AC16,
AC21, AC22, AC23, AC24, AC25, AC26, AC31, AC32, AC33, etc., where ACij denotes the ith AC
codeword from the jth block in the macroblock if it is present. As the AC codewords are accessed
from memory, the respective codeword bits are summed and continuously compared with the scaled
profile value to the current macroblock, less the number of bits for insertion of EOB (end-of-block)
codewords. Respective AC codewords are flagged as kept until the running sum of AC codeword
bits exceeds the scaled profile value less EOB bits. When this condition is met, all remaining AC
codewords are marked for being excised. This process continues until all macroblocks have their
kept codewords reassembled to form the scaled bitstream.
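
The rate control of architecture 1 can be sketched in Python as shown below. Several points are assumptions rather than statements from the text: the target is taken as $TV_{AC} = PV_{LS} - a \cdot TB - B_{EX}$, which is one reading of the symbol definitions above; the EOB allowance is modeled as a fixed number of bits per macroblock; a codeword is kept only if it still fits under the budget; and all names and numbers are hypothetical. Codeword lengths are assumed to be supplied already in the interleaved access order AC11, AC12, and so on.

```python
from itertools import accumulate


def scaled_profile(ac_bits_per_mb, total_bits, a, b_ex=0):
    """Running sum of AC bits per macroblock, scaled linearly to the new target."""
    profile = list(accumulate(ac_bits_per_mb))   # PV_N for every macroblock
    pv_ls = profile[-1]                          # profile value at the last macroblock
    tv_ac = pv_ls - a * total_bits - b_ex        # assumed form of the target value
    factor = tv_ac / pv_ls                       # scaling factor TV_AC / PV_LS
    return [pv * factor for pv in profile]


def select_codewords(codeword_lengths_per_mb, scaled, eob_bits=0):
    """Flag AC codewords as kept (True) or excised (False), macroblock by macroblock."""
    running = 0                                  # kept AC bits so far in the frame
    flags = []
    for lengths, target in zip(codeword_lengths_per_mb, scaled):
        budget = target - eob_bits               # leave room for EOB codewords
        cutting = False
        keep = []
        for n in lengths:
            if not cutting and running + n <= budget:
                running += n
                keep.append(True)
            else:
                cutting = True                   # excise all remaining codewords here
                keep.append(False)
        flags.append(keep)
    return flags, running


# Hypothetical frame: three macroblocks, 400 coded bits in total, 200 of them AC bits.
lengths = [[30, 30, 40], [20, 20, 20], [10, 30]]
ac_bits = [sum(mb) for mb in lengths]            # [100, 60, 40]
profile = scaled_profile(ac_bits, total_bits=400, a=0.25)
flags, kept_bits = select_codewords(lengths, profile)

print(profile)
print(flags, kept_bits)
```

Because the budget is re-evaluated against the scaled profile at every macroblock, bits saved early in the frame remain available to later macroblocks.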
`
`
[Figure 17.4: block diagram with the input bitstream, a top branch (VLD parser, bit allocation analysis, new bit rate input), and a bottom branch (delay, VLD parser, rate controller with requantizer, VLC) leading to the output bits.]

FIGURE 17.4 Architecture 2: increasing quantization step.
`
17.3.3.2 Architecture 2: Increasing Quantization Step
`
Architecture 2 is shown in Figure 17.4. The method of bitstream scaling in architecture 2 is based
on increasing the quantization step. This method requires additional dequantizer/quantizer and
variable-length coding (VLC) hardware over the first method. Like the first method, it also makes
a first VLD pass on the bitstream and obtains a similar scaled profile of target cumulative codeword
bits vs. macroblock count to be used for rate control.
The rate control mechanism differs from this point on. After the second-pass VLD is made on
the bitstream, quantized DCT coefficients are dequantized. A block of finely quantized DCT
coefficients is obtained as a result of this. This block of DCT coefficients is requantized with a
coarser quantizer scale. The value used for the coarser quantizer scale is determined adaptively by
making adjustments after every macroblock so that the scaled target profile is tracked as we progress
through the macroblocks in the frame:
`
$$Q_N = Q_{NOM} + G \left( \sum_{N-1} BU - PV_{N-1} \right), \tag{17.3}$$
`
where $Q_N$ is the quantization factor for macroblock N, $Q_{NOM}$ is an estimate of the new nominal
quantization factor for the frame, $\sum_{N-1} BU$ is the cumulative amount of coded bits up to macroblock
N - 1, and G is a gain factor which controls how tightly the profile curve is tracked through the
picture. $Q_{NOM}$ is initialized to an average guess value before the very first frame, and updated for
the next frame by setting it to $Q_{LS}$ (the quantization factor for the last macroblock) from the frame
just completed. The coarsely requantized block of DCT coefficients is variable-length-coded to
generate the scaled bitstream. The rate controller also has provisions for changing some macroblock-layer
codewords, such as the macroblock-type and coded-block pattern to ensure a legitimate scaled
bitstream that conforms to MPEG-2 syntax.
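
The feedback rule of Equation 17.3 can be sketched as a simple loop. The bit model used here (bits produced by a macroblock equal a hypothetical "complexity" divided by the quantization factor) stands in for the real dequantize, requantize, and VLC steps and is purely an assumption, as are the clamping range of 1 to 31, the function names, and the numbers.

```python
def quantizer_for_macroblock(q_nom, gain, bits_used, profile_prev):
    """Equation 17.3: raise Q when more bits were used than the profile allows."""
    return q_nom + gain * (bits_used - profile_prev)


def requantize_frame(mb_complexity, scaled_profile, q_nom, gain, q_min=1, q_max=31):
    """Track the scaled profile through one frame; returns (Q values, bits used)."""
    bits_used = 0.0
    q_values = []
    for n, complexity in enumerate(mb_complexity):
        # Profile value up to the previous macroblock (zero before the first one).
        profile_prev = scaled_profile[n - 1] if n > 0 else 0.0
        q = quantizer_for_macroblock(q_nom, gain, bits_used, profile_prev)
        q = min(max(q, q_min), q_max)      # keep Q in an assumed legal range
        q_values.append(q)
        bits_used += complexity / q        # toy model of the bits this macroblock costs
    return q_values, bits_used


# Hypothetical frame whose first macroblock is twice as demanding as expected.
q_values, bits_used = requantize_frame(
    mb_complexity=[2000, 1000],
    scaled_profile=[100, 200],
    q_nom=10,
    gain=0.1,
)

# Q_NOM for the next frame is the factor used for the last macroblock.
q_nom_next = q_values[-1]

print(q_values, bits_used, q_nom_next)
```

In this example the first macroblock overshoots its share of the profile, so the rule doubles the quantization factor for the second macroblock; a larger gain G would react more strongly to the same overshoot.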
`
17.3.3.3 Architecture 3: Reencoding with Old Motion Vectors
and Old Decisions
`
The third architecture for bitstream scaling is shown in Figure 17.5. In this architecture, the motion
vectors and macroblock coding decision modes are first extracted from the original bitstream, and
at the same time the reconstructed pictures are obtained from the normal decoding procedure. Then
the scaled bitstream is obtained by reencoding the reconstructed pictures using the old motion
vectors and macroblock decisions from the original bitstream. The benefit obtained from this
architecture compared with full decoding and reencoding is that no motion estimation and decision
computation is needed.
`
`
[Figure 17.5: block diagram with the input bitstream, a top branch (VLD parser, motion vector and coding decision extractor supplying motion vectors and macroblock decision modes), and a bottom branch (delay, VLD and dequantizer, reconstruct, reencoder, new bit rate input) leading to the output bits.]

FIGURE 17.5 Architecture 3.
`
17.3.3.4 Architecture 4: Reencoding with Old Motion Vectors
and New Decisions
`
Architecture 4 is a modified version of architecture 3 in which new macroblock decision modes
are computed during reencoding based on reconstructed pictures. The scaled bitstream created this
way is expected to yield an improvement in picture quality because the decision modes obtained
from the high-quality original bitstream are not optimal for reencoding at the reduced rate. For
example, at higher rates the optimal mode decision for a macroblock is more likely to favor
bidirectional field motion compensation over forward frame motion compensation. But at lower
rates, the opposite decision may be true. In order for the reencoder to have the possibility of
deciding on new macroblock coding modes, the entire pool of motion vectors of every type must
be available. This can be supplied by augmenting the original high-quality bitstream with ancillary
data containing the entire pool of motion vectors during the time it was originally encoded. It could
be inserted into the user data every frame. For the same original bit rate, the quality of an original
bitstream obtained this way is degraded compared with an original bitstream obtained from
architecture 3 because the additional overhead required for the extra motion vectors steals away bits for
actual encoding. However, the resulting scaled bitstream is expected to show quality improvement
over the scaled bitstream from architecture 3 if the gains from computing new and more accurate
decision modes can overcome the loss in original picture quality. Table 17.3 outlines the hardware
complexity savings of each of the four proposed architectures as compared with full decoding and
reencoding.
`
17.3.3.5 Comparison of Bitstream Scaling Methods
`
We have described four architectures for bitstream scaling which are useful for various applications
as described in the introduction. Among the four architectures, architectures 1 and 2 do not require
`
TABLE 17.3
Hardware Complexity Savings over Full Decoding/Reencoding

Coding Method     Hardware Complexity Savings
Architecture 1    No decoding loop, no DCT/IDCT, no frame store memory, no encoding loop, no quantizer/dequantizer, no motion compensation, no VLC, simplified rate control
Architecture 2    No decoding loop, no DCT/IDCT, no frame store memory, no encoding loop, no motion compensation, simplified rate control
Architecture 3    No motion estimation, no macroblock coding decisions
Architecture 4    No motion estimation
`
`
entire decoding and encoding loops or frame store memory for reconstructed pictures, thereby
saving significant hardware complexity. However, video quality tends to degrade through the group
of pictures (GOP) until the next I-picture due to drift in the absence of decoder/encoder loops. For
large scaling, say for rate reduction greater than 25%, architecture 1 produces poor-quality blocky
pictures, primarily because many bits were spent in the original high-quality bitstream on finely
quantizing the DC and other very low-order AC coefficients. Architecture 2 is a particularly good
choice for VTR applications since it is a good compromise between hardware complexity and
reconstructed image quality. Architectures 3 and 4 are suitable for VOD server applications