(12) United States Patent
     Radha et al.

(10) Patent No.: US 6,806,909 B1
(45) Date of Patent: Oct. 19, 2004

(54) SEAMLESS SPLICING OF MPEG-2 MULTIMEDIA DATA STREAMS

(75) Inventors: Hayder Radha, Mahwah, NJ (US);
     [given name illegible in source] Parthasarathy, Ossining, NY (US)

(73) Assignee: Koninklijke Philips Electronics N.V.

( * ) Notice: Subject to any disclaimer, the term of this
     patent is extended or adjusted under 35
     U.S.C. 154(b) by 0 days.

(21) Appl. No.: 09/033,730

(22) Filed: Mar. 3, 1998

Related U.S. Application Data
(60) Provisional application No. 60/039,528, filed on Mar. 3,
     1997.

(51) Int. Cl.7 ........ H04N 7/12
(52) U.S. Cl. ........ 348/384.1; 348/515; 348/722;
     375/240.28
(58) Field of Search ........ 348/384.1, 512,
     348/722, 584, 515; 370/468, 486, 487,
     535, 412, 384; 707/101, 102, 104, 201;
     375/240.28

(56) References Cited
U.S. PATENT DOCUMENTS

5,703,877 A *  12/1997  Nuber et al. ........ 370/395
5,801,781 A *   9/1998  Hiroshima et al. .... 348/441
5,859,660 A *   1/1999  Perkins et al. ...... 348/9
5,917,830 A *   6/1999  Chen et al. ......... 370/487
[further entries illegible in source]
6,181,383 B1 *  1/2001  Fox et al. .......... 348/515

* cited by examiner

Primary Examiner: Victor R. Kostak
(74) Attorney, Agent, or Firm: Michael E. Belk

(57) ABSTRACT

Respective encoders provide first and second encoded
MPEG-2 data streams for a first and second program
respectively. Each stream includes at least video and audio
components. The encoder provides seamless video splice-in and
splice-out points. A play-to-air splicer is commanded to
switch the broadcast output from the first input stream to the
second input stream. The splicer identifies approximately
aligned seamless video splice-in and seamless video
splice-out points in the respective first and second video streams.
The splicer splices the second video stream to the first video
stream, but continues to broadcast the first audio stream. The
splicer identifies corresponding audio splice-in and
splice-out points. The splicer splices the second audio component
to the first audio component. The splicer adjusts the decode
and presentation times in the second stream after the
respective splice-in to be consistent with such times in the first
program. A decoder converts the compressed video and
audio components output from the splicer into
uncompressed form.

17 Claims, 12 Drawing Sheets
`
[Front-page drawing: not recoverable from source; legible labels: ANALOG CAM, DIGITAL CAM, FILM SCANNER, DIGITALIZER, BASEBAND SWITCHER, ENCODER, COMBINER, TRANSPORT, SPLICER, DECODER, PREVIEW DISPLAY, BROADCAST DISPLAY.]
`
`
`
U.S. Patent    Oct. 19, 2004    Sheet 1 of 12    US 6,806,909 B1

[Drawing on sheet 1 not recoverable from source.]
`
`
`
`
Sheet 2 of 12

[FIG. 2 (PRIOR ART): drawing not recoverable from source; legible labels: V1, A1.]

[FIG. 3: drawing not recoverable from source; legible labels: VIDEO and AUDIO bars for the OLD INPUT STREAM, NEW INPUT STREAM, and OUTPUT STREAM; VSSO, VSSI.]
`
`
`
Sheet 3 of 12

[FIG. 4: drawing not recoverable from source; legible labels: OLD PROGRAM PRESENTATION, NEW PROGRAM PRESENTATION, SILENT PERIOD, VSSI.]

[FIG. 5: drawing not recoverable from source; legible labels: OLD PROGRAM PRESENTATION, NEW PROGRAM PRESENTATION, SPLICED PRESENTATION, SILENT PERIOD, VSSO, VSSI.]
`
`
`
Sheet 4 of 12

[FIG. 6: drawing not recoverable from source; legible labels: OLD STREAM, NEW STREAM, SPLICED STREAM, VSSO, VSSI.]

FIG. 7A (flowchart; legible reference numerals: 101 to 106)

PROVIDING UNCOMPRESSED FIRST AND SECOND MULTIMEDIA PROGRAMS WITH
VIDEO COMPONENT, AUDIO COMPONENT, AND CAPTION COMPONENT

ENCODING PROGRAMS USING MPEG-2 VIDEO COMPRESSION WITH
GROUP OF PICTURES ENDING WITH P-FRAME PRESENTATION

PROVIDING SEAMLESS VIDEO SPLICE-IN BEFORE EACH I-FRAME IN STREAM

INDICATING VIDEO SPLICE-IN IN PACKET BEFORE EACH I-FRAME IN STREAM

PROVIDING SEAMLESS VIDEO SPLICE-OUT BEFORE EACH P AND I-FRAME IN STREAM

INDICATING VIDEO SPLICE-OUT IN PACKET
FOLLOWING EACH P AND I-FRAME IN STREAM
`
`
`
Sheet 5 of 12

FIG. 7B (flowchart; legible reference numerals: 107, 109, 110)

PROVIDING SEAMLESS AUDIO SPLICE-IN BEFORE EACH AUDIO FRAME IN STREAM

INDICATING AUDIO SPLICE-IN IN PACKET
BEFORE EACH AUDIO FRAME IN STREAM

PROVIDING SEAMLESS AUDIO SPLICE-OUT AFTER EACH AUDIO FRAME IN STREAM

INDICATING AUDIO SPLICE-OUT IN PACKET FOLLOWING EACH AUDIO FRAME IN STREAM
`
`
`
Sheet 6 of 12

[Flowchart; figure label not legible in source, presumably FIG. 7C by sequence. Legible reference numerals: 126, 127, 128.]

TRANSPORTING DATA STREAMS TO SPLICER

RECEIVING COMMAND TO SPLICE SECOND PROGRAM
TO FIRST PROGRAM AT SPECIFIED TIME

SELECTING VIDEO SPLICE-IN POINT FOR SECOND PROGRAM
WHICH IS CLOSEST TO SPECIFIED TIME IN STREAM

DETERMINING WHETHER ANY VIDEO FRAMES NEED TO BE SKIPPED
AND HOW MANY TO PROVIDE SEAMLESS VIDEO SPLICING

SELECTING VIDEO SPLICE-OUT POINT FOR FIRST PROGRAM DEPENDING
ON DETERMINATION FOR SKIPPING VIDEO FRAMES, AT EQUAL OR
PREVIOUS POSITION IN STREAM TO VIDEO SPLICE-IN POINT

SPLICING VIDEO IN FIRST PROGRAM OUT AT SELECTED VIDEO
SPLICE-OUT POINT AND SPLICING VIDEO OF SECOND
PROGRAM IN AT SELECTED VIDEO SPLICE-IN POINT

CHANGING PROGRAM CLOCK REFERENCE, BEGIN-PRESENTATION TIMES,
AND BEGIN-DECODING TIMES IN VIDEO PACKETS OF SECOND PROGRAM
AFTER SELECTED VIDEO SPLICE-IN POINT TO MAKE CONSISTENT
WITH TIMES IN FIRST PROGRAM

ADJUSTING VIDEO BUFFER OF SPLICER IF REQUIRED SO THAT FIRST
VIDEO FRAME AFTER SELECTED VIDEO SPLICE-IN IMMEDIATELY
FOLLOWS LAST VIDEO FRAME BEFORE SELECTED SPLICE-OUT
`
`
`
Sheet 7 of 12

FIG. 7D (flowchart; legible reference numerals: 129 to 133)

SELECTING AUDIO SPLICE-IN IN SECOND PROGRAM WITH
BEGIN-PRESENTATION TIME EQUAL OR AFTER
BEGIN-PRESENTATION TIME OF VIDEO PICTURE AFTER SPLICE-IN

DETERMINING WHETHER ANY AUDIO FRAMES NEED TO BE
SKIPPED AND HOW MANY TO PREVENT AUDIO DECODING
BUFFER OVERFLOW

SELECTING AUDIO SPLICE-OUT IN FIRST PROGRAM DEPENDING ON
DETERMINATION OF NEED TO SKIP AUDIO FRAMES AND
WITH AN END-PRESENTATION TIME EQUAL OR BEFORE
END-PRESENTATION TIME OF VIDEO PICTURE BEFORE VIDEO SPLICE-OUT

SPLICING AUDIO OF FIRST PROGRAM OUT AT SELECTED AUDIO
SPLICE-OUT AND SPLICING AUDIO OF SECOND PROGRAM
IN AT SELECTED AUDIO SPLICE-IN POINT

CHANGING PROGRAM CLOCK REFERENCE, BEGIN-PRESENTATION TIMES,
AND BEGIN-DECODING TIMES IN AUDIO PACKETS OF SECOND
PROGRAM AFTER SELECTED AUDIO SPLICE-IN POINT TO BE
CONSISTENT WITH TIMES IN FIRST PROGRAM

FIG. 7E (flowchart; legible reference numerals: 141 to 144)

TRANSPORTING DATA STREAM TO DECODER

SELECTING PROGRAM TO DECODE

DECODING VIDEO AND AUDIO FRAMES OF SELECTED PROGRAM INTO
UNCOMPRESSED DIGITAL VIDEO AND DIGITAL AUDIO DATA

THE RESULTING VIDEO PICTURES AND AUDIO SOUNDS
ARE DISPLAYED TO A VIEWER
`
`
`
Sheet 8 of 12

[Drawing on sheet 8 not recoverable from source.]
`
`
`
`
Sheet 9 of 12

[FIG. 9: drawing not recoverable from source.]
`
`
`
Sheet 10 of 12

[Drawing on sheet 10 not recoverable from source; legible labels: SPLICE COMMAND, STREAM, MOD; legible reference numerals: 358, 374, 376, 381, 382.]
`
`
`
Sheet 11 of 12

[FIG. 11: drawing not recoverable from source; legible labels: SELECTION COMMAND, STREAM, OUTPUT STREAM, MEM, INPUT VBUF, INPUT ABUF, OUTPUT ABUF, OUTPUT VBUF, MOD, DECODER; legible reference numerals: 400, 402, 403, 406, 407, 408, 410, 420, 423, 424, 427, 428, 430 to 434.]
`
`
`
Sheet 12 of 12

[Drawing on sheet 12 not recoverable from source.]
`
`
`
`
`SEAMLESS SPLICING OF MPEG-2
`MULTIMEDIA DATA STREAMS
`
`CROSS REFERENCE TO RELATED
`APPLICATIONS
`
`This application is a continuation of provisional applica-
`tion No. 60/039,528 filed Mar. 3, 1997.
This invention was made with United States Government
support under Cooperative Agreement No. 70NANBH1174
awarded by the National Institute For Standards and
Technology (NIST). The United States Government has certain
rights in the invention.
`
`FIELD OF THE INVENTION
`
`The invention is related to the field of digital multimedia
`transmissions and especially to MPEG-2 bit streams.
`
`BACKGROUND
`
`One of the most common operations in TV is switching
`from one program to another. At the studio, cameras and
`microphones are switched and mixed to form a program. At
`the broadcaster (whether broadcast by cable or airwaves),
`programs are regularly switched to commercials and to other
`programs. Finally, the viewer is given a choice of several
`program channels and often switches between the channels,
`especially between programs.
Currently, the switching of analog signals at the studio and
at the broadcaster occurs during vertical intervals. In order
to form a picture on a TV screen, first the odd lines of the
picture are drawn by an electron gun, from the upper left,
across each line, to the lower right side. Then, during a
vertical interval, the aim of the electron gun is moved from
the lower right back to the upper left corner. Then, in a
similar manner, the electron gun draws the even lines of the
picture interlaced with the odd lines. An independent unit of
video such as all the even lines (or all the odd lines) is
usually referred to as a “frame”.
Currently, play-to-air (PTA) switchers are used to switch
analog TV signals. Such switchers include synchronizing
circuits, so that when a switch command is received, the
PTA switcher waits until the next vertical interval and then
switches. When the program is switched during the vertical
interval, there are no resulting flickers, flashes, or other
anomalies in the picture display during switching. This is
known as seamless switching.
`In a typical implementation of a PTA switcher, there are
`two output channels: a program channel and a preview
`channel. The program channel carries the material that is
`being broadcast (“aired”), whereas the preview channel is
`used for viewing only within the studio and it usually carries
`the program to be switched to next (i.e., the next program to
`be aired and transmitted over the program channel). The
`focus herein is on the output stream carried over the program
channel since this is the stream that is received by the
`viewers and has to be displayed seamlessly. Therefore, and
`unless specified differently, output stream refers to the
`stream output over the program channel.
Many broadcasters are considering adding digital channels
to their broadcasts. In the world, colors, brightness, and
sounds have practically infinite variations (i.e., they are
analog). For digital broadcasting, analog scenes and sounds
usually must be converted into digital representations in a
process known as digitalizing or analog-to-digital (A/D)
conversion. Due to the high bandwidth required for
uncompressed digital video signals, it is expected that the video
signals will require compression even in the production
studio. For example, a single channel of uncompressed
standard definition digital video requires transmission of
about 250 Mb/s (million bits per second) of information
(high definition video requires 1.5 Gb/s). In digital video,
pictures may not be interlaced, and the term “video frame” is
used to refer to a complete picture.
The digital compression/decompression system can be
conceived as: multiple encoders, each of which converts an
uncompressed digital signal stream to a compressed stream;
a switcher or splicer which switches between the input
streams from the encoders to an output stream; and a decoder
which decompresses the output stream from the splicer.
The standard for handling digital multimedia data is
known as MPEG-2. In MPEG-2, the digital representations
of elements (e.g. video, 2-4 audio channels, captions) of a
program are compressed (encoded) in a lossy manner (i.e.
some information is lost) and the encoded information is
transmitted as a continuous stream of bits. At the end of
transport, the encoded information is decompressed
(decoded) to approximately reproduce the original
digitalization of the elements, and the decoded elements are
displayed to the viewer.
MPEG-2 streams are organized hierarchically. First, the
digital representations for each element are encoded
(compressed) into a bitstream known as an elementary
stream (ES). Then headers are inserted into each ES to form
a packetized elementary stream (PES). The header of each
PES contains a decode timestamp (DTS), which specifies
when the decoding of the following ES is to be completed,
and a presentation timestamp (PTS), which specifies when
the decoded information for the following ES is to be
presented. For example, a PES header will be inserted before
each picture of a video elementary stream and before each
frame of an audio elementary stream. Each PES stream is
encapsulated (packaged) into a series of transport packets,
each of which is 188 bytes long and includes a header and
a payload such as the bits of a PES stream. A typical PES
stream, such as a picture, requires a large number of packets.
The header of each packet includes flags, a countdown field,
and a 13-bit packet identifier (PID) field which identifies the
portion of the PES that the packet is for. For example, all the
packets for an MPEG group of pictures may have the same
PID. All the packets with the same PID are called a PID
stream.
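
As an illustration of the packet layout just described, the following minimal sketch reads the 13-bit PID from the fixed 4-byte header of a 188-byte transport packet. The bit positions follow the transport packet syntax of ISO/IEC 13818-1 rather than anything spelled out in this text, and the function name and the example packet are hypothetical.

```python
TS_PACKET_SIZE = 188   # bytes per transport packet, as stated in the text
SYNC_BYTE = 0x47       # first byte of every transport packet (ISO/IEC 13818-1)


def parse_ts_header(packet: bytes) -> dict:
    """Decode the fixed 4-byte header of one transport packet."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError("a transport packet must be exactly 188 bytes")
    if packet[0] != SYNC_BYTE:
        raise ValueError("missing sync byte 0x47")
    b1, b2, b3 = packet[1], packet[2], packet[3]
    return {
        # Set on the packet that carries the start of a PES packet.
        "payload_unit_start": bool(b1 & 0x40),
        # 13-bit PID: low 5 bits of byte 1 followed by all of byte 2.
        "pid": ((b1 & 0x1F) << 8) | b2,
        # The optional adaptation field holds the PCR and splice fields.
        "has_adaptation_field": bool(b3 & 0x20),
        "has_payload": bool(b3 & 0x10),
        "continuity_counter": b3 & 0x0F,
    }


# Hypothetical packet: PID 0x100, start of a PES packet, payload only.
example = bytes([0x47, 0x41, 0x00, 0x17]) + bytes(184)
header = parse_ts_header(example)
```

The dictionary return type is only a convenience for the sketch; a real demultiplexer would more likely use a fixed record type.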
`
There are several auxiliary PID streams for each program.
One of these streams is the program clock reference (PCR),
which contains samples of a 27 MHz clock used by the video
and audio encoders and decoders. The PID that carries the
PCR is called the PCR_PID. Another auxiliary PID stream
for each program contains a program map table (PMT),
which lists all the PIDs which belong to the program and
defines which PID streams contain which elements (video,
audio channels, captions, PCR_PID). All the PID streams
for a program are multiplexed together (the packets are
intermixed, but bits of different packets are not intermixed)
so that, for example, the packets for pictures and the packets
for audio frames are mixed together.
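
To make the 27 MHz clock samples concrete, the sketch below decodes one 6-byte PCR field into 27 MHz ticks. The split into a 33-bit base and a 9-bit extension is taken from ISO/IEC 13818-1 and is not described in this text; `encode_pcr` is a hypothetical helper that exists only to build input for the example.

```python
PCR_CLOCK_HZ = 27_000_000  # the 27 MHz system clock named in the text


def decode_pcr(field: bytes) -> int:
    """Return the PCR sample as a count of 27 MHz clock ticks."""
    if len(field) != 6:
        raise ValueError("a PCR field is 6 bytes (48 bits)")
    value = int.from_bytes(field, "big")
    base = value >> 15          # upper 33 bits, counting at 27 MHz / 300
    extension = value & 0x1FF   # lower 9 bits, counting 0..299 at 27 MHz
    return base * 300 + extension


def encode_pcr(ticks: int) -> bytes:
    """Hypothetical inverse of decode_pcr, used only to build test input."""
    base, extension = divmod(ticks, 300)
    value = (base << 15) | (0x3F << 9) | extension  # 6 reserved bits set to 1
    return value.to_bytes(6, "big")


one_second = decode_pcr(encode_pcr(PCR_CLOCK_HZ))
seconds = one_second / PCR_CLOCK_HZ
```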
`An MPEG-2 bit stream may include multiple programs.
`For example, the stream in a cable TV system may include
`hundreds of programs. The packets for different programs
are also multiplexed together so that the decoder has to
select the packets of a program in order to decode a
`particular program. Thus, another auxiliary PID stream is
`provided containing a program association table (PAT)
`which lists the PID streams containing the PMT’s for each
`of the programs. The packets of the PAT stream are all
`identified by a PID=0.
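
The multiplexing described above can be undone by grouping packets on their PID. The following is a minimal sketch under the assumption that the input is a byte string of back-to-back 188-byte packets; it does not parse the PAT or PMT tables themselves, and `make_packet` is a hypothetical helper for building example input.

```python
from collections import defaultdict

TS_PACKET_SIZE = 188
PAT_PID = 0  # the text states that PAT packets carry PID=0


def pid_of(packet: bytes) -> int:
    """13-bit PID from bytes 1 and 2 of the packet header."""
    return ((packet[1] & 0x1F) << 8) | packet[2]


def demux_by_pid(stream: bytes) -> dict:
    """Split a multiplexed stream into PID streams, keeping packet order."""
    if len(stream) % TS_PACKET_SIZE:
        raise ValueError("stream length is not a multiple of 188 bytes")
    pid_streams = defaultdict(list)
    for offset in range(0, len(stream), TS_PACKET_SIZE):
        packet = stream[offset:offset + TS_PACKET_SIZE]
        if packet[0] != 0x47:
            raise ValueError("lost packet alignment (no sync byte)")
        pid_streams[pid_of(packet)].append(packet)
    return dict(pid_streams)


def make_packet(pid: int) -> bytes:
    """Hypothetical helper: a payload-only packet of zeros for a given PID."""
    return bytes([0x47, (pid >> 8) & 0x1F, pid & 0xFF, 0x10]) + bytes(184)


mux = (make_packet(PAT_PID) + make_packet(0x100)
       + make_packet(0x101) + make_packet(0x100))
streams = demux_by_pid(mux)
```

A decoder would start from `streams[PAT_PID]`, find the PMT PID for the wanted program, and then keep only the PID streams that the PMT lists.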
The packets for each program in a multi-program stream
may be referred to as a stream or substream. Similarly, the
packets for each element or component of a program may be
referred to as a stream or substream. Those skilled in the art
are accustomed to this terminology.
`FIG. 1 schematically illustrates a stream of packets with
`a packet identifier in the header and video, audio, PCR or
`PMT data in the payloads. Each packet is actually a con-
`tinuous stream of bits representing one formatted block as
`shown. The packets containing data for a first video picture
`V1 are mixed with packets containing data for a first audio
`frame A1 and packets containing data for a second audio
`frame A2 as well as with packets containing PCR times and
`packets containing PMT information. Note that packets for
`different video frames in the same program are not mixed
`and packets for different audio frames in the same program
are not mixed. However, for multi-program streams, the
packets for a picture of one program would be mixed with
`packets for pictures of another program. Also, note that the
`bits of different packets are not mixed, that is, the stream
`transmits all the bits for one packet sequentially together
`then all the bits for the next packet sequentially together.
`FIG. 2 schematically illustrates the same streams as FIG.
`1 in a different way, by showing a separate bar for each
component (element) of the program with vertical lines
between PES streams for each picture or audio frame. The
`separate packets are not shown. In FIG. 2, the intermixing of
`packets for audio frames 1 and 2 with video picture 1 is
`illustrated by overlapping the PES stream for picture 1 with
`the PES streams for audio frames 1 and 2.
`
In the MPEG-2 standard, switching between programs is
referred to as splicing, and points where splicing may take
place without causing anomalies are referred to as seamless
splice points. In MPEG-2, a new program is spliced onto an
old program in the output stream when switching from the
old program to the new program. In the header of the packet
in the same PES stream most immediately before a splice
point, the MPEG-2 standard specifies that the splice point
may be indicated by setting splicing_point_flag=1 and
splice_countdown=0; if the splice is a seamless splice
point, that may also be indicated by setting
seamless_splice_flag=1.
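
The flags named above live in the adaptation field of a transport packet. The sketch below reads the splice countdown when the splicing point flag is set. The byte offsets follow the adaptation field syntax of ISO/IEC 13818-1 and are not given in this text; the seamless splice flag sits in the adaptation field extension and is not parsed here. The function and helper names are hypothetical.

```python
def read_splice_countdown(packet: bytes):
    """Return splice_countdown as a signed int, or None if not signalled."""
    if len(packet) != 188 or packet[0] != 0x47:
        raise ValueError("not a transport packet")
    has_adaptation_field = bool(packet[3] & 0x20)
    if not has_adaptation_field or packet[4] == 0:
        return None
    length = packet[4]      # adaptation_field_length
    flags = packet[5]
    if not flags & 0x04:    # splicing_point_flag
        return None
    offset = 6
    if flags & 0x10:        # PCR_flag: skip the 6-byte PCR
        offset += 6
    if flags & 0x08:        # OPCR_flag: skip the 6-byte OPCR
        offset += 6
    if offset > 4 + length:
        raise ValueError("adaptation field too short for splice_countdown")
    raw = packet[offset]
    return raw - 256 if raw > 127 else raw  # 8-bit two's complement


def make_splice_packet(countdown: int) -> bytes:
    """Hypothetical helper: adaptation field holding only the splice fields."""
    adaptation = bytes([0x02, 0x04, countdown & 0xFF])  # length, flags, value
    header = bytes([0x47, 0x01, 0x00, 0x30])  # PID 0x100, field plus payload
    return header + adaptation + bytes(188 - 7)


at_splice = read_splice_countdown(make_splice_packet(0))
after_splice = read_splice_countdown(make_splice_packet(-1))
unmarked = read_splice_countdown(bytes([0x47, 0x01, 0x00, 0x10]) + bytes(184))
```

A countdown of 0 marks the last packet before the splice point, which is the case the text describes.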
In MPEG-2 video compression, each picture is first
compressed in a manner similar to JPEG (quantized cosine
intraframe compression), and then sequentially presented
pictures are compressed together (quantized cosine
interframe compression). Essentially, in interframe compression,
only the differences between a picture and the pictures it
depends on are included in the compressed frame. The
decoding of a picture may depend on the decoding of
previously viewed pictures and in some cases on the decoding
of subsequently viewed pictures. In order to minimize
decoding problems, especially errors that may propagate
from an erroneous decoding of one picture to cause the
erroneous decoding of dependent pictures, only a relatively
small group of pictures (GOP) is compressed together (e.g.
9 pictures). The pictures of each GOP are encoded together
independently from the pictures of any preceding GOPs and
can thus be independently decoded (except for trailing
B-frames), and errors cannot propagate from group to
group. The first picture in a GOP (in order of presentation)
is known as an I-frame; it is essentially just a JPEG
encoded (independently compressed) picture and its decoding
can be performed independently (i.e. its decoding does
not depend on any other picture). Some of the subsequent
pictures in the group may be so-called P-frames (prediction
encoded frames), and their decoding depends on the previous
I-frame and any previous P-frames in the GOP. That is, each
P-frame contains only the differences between that picture
and the previously decoded I or P-frame, and the differences
are compressed. Typically in broadcast streams, most of the
pictures in a GOP are so-called B-frames (bidirectionally
encoded frames), and their decoding depends on both the
immediately preceding I or P-frame and the immediately
succeeding I or P-frame (in order of presentation). B-frames
are typically much smaller than P-frames, which are
typically much smaller than I-frames. The size of particular
encoded frames in MPEG-2 varies depending on the
complexity of the picture and on the amount of difference
between the picture and the picture or pictures on which its
decoding depends.
A typical scheme proposed for broadcasting MPEG-2 is a
group of 9 pictures presented sequentially on a display in the
following order:

I1  B2  B3  P4  B5  B6  P7  B8  B9

The decoding of P4 depends on I1 and the decoding of P7
depends on the decoding of P4 (which depends on the
decoding of I1). The decoding of B2 and B3 depends on the
decoding of I1 and P4. The decoding of B5 and B6 depends
on the decoding of P4 and P7. The decoding of the last two
B-frames (B8 and B9) depends on the decoding of P7 and on
the immediately following I-frame (I10) in the following
GOP (not shown).
In the data stream the encoded pictures are not transmitted
or stored in presentation order. They are provided in the
order that they are required for decoding. That is, the
B-frames follow the I and P-frames on which they are
dependent. The pictures in this typical scheme are provided
in stream order, as follows:

I1  B-2  B-1  P4  B2  B3  P7  B5  B6  I10  B8  B9

Note that in stream order B-2 and B-1 of the preceding GOP
and I10 of the succeeding GOP are mixed with the pictures
of this typical GOP.
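
The reordering rule described above (each B-frame follows both pictures it depends on) can be sketched as follows. This is only an illustration of the rule, assuming frames are named by type letter and presentation index as in the text; the function name is hypothetical.

```python
def presentation_to_stream_order(frames):
    """Reorder frame names from presentation order to stream order.

    B-frames are held back until the next I or P-frame (the later picture
    they depend on) has been emitted. Returns the reordered list and any
    B-frames still waiting for a following I or P-frame.
    """
    stream, waiting = [], []
    for name in frames:
        if name.startswith("B"):
            waiting.append(name)
        else:
            stream.append(name)      # the I or P-frame goes first
            stream.extend(waiting)   # then the B-frames that needed it
            waiting = []
    return stream, waiting


# Presentation order around the typical GOP: the last two B-frames of the
# preceding GOP, the nine pictures of this GOP, and I10 of the next GOP.
presentation = ["B-2", "B-1", "I1", "B2", "B3", "P4",
                "B5", "B6", "P7", "B8", "B9", "I10"]
stream, waiting = presentation_to_stream_order(presentation)
```

With this input, `stream` comes out as I1, B-2, B-1, P4, B2, B3, P7, B5, B6, I10, B8, B9, matching the stream order given in the text.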
MPEG-2 defines a video buffer model for a decoder called
the video buffering verifier (VBV). The VBV is a bitstream
constraint, not a decoder specification. The actual decoder
buffer will be designed so that any bitstream that does not
overflow or underflow the VBV model will not overflow or
underflow the actual decoder buffer. The VBV model is a
first-in-first-out (FIFO) buffer in which all the bits of one
picture exit the buffer simultaneously, one picture at a time,
at regular intervals (e.g. every 33 milliseconds (ms)). The
rate at which pictures exit the buffer is the frame rate, and
the interval between exits is the average decode time per
picture.
`When a decoder resets and starts to decode a new stream,
`the VBV buffer is initially empty. The VBV buffer is filled
`at a rate specified in the bit stream for either: a predeter-
`mined period of time for constant bit rate (CBR) mode; or
`until filled to a predetermined level for variable bit rate
`(VBR) mode. The time required to partially fill the VBV
`buffer prior to operation is called the startup delay. The
`startup delay must be carefully adhered to in order to prevent
`overflow or underflow of the VBV buffer during subsequent
`decoder operation.
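
The buffer behavior described above can be illustrated with a small simulation of the constant bit rate case. This is a simplified sketch, not the normative VBV definition: it assumes the buffer fills at a fixed rate for the startup delay, that one whole picture is removed at each frame interval, and that occupancy is checked only at removal instants. All numbers in the example are hypothetical.

```python
def simulate_vbv(picture_bits, bit_rate, frame_rate, buffer_bits, startup_delay):
    """Return buffer fullness (in bits) just before each picture is removed.

    Raises ValueError on overflow (fullness exceeds buffer_bits) or on
    underflow (a picture is not fully in the buffer at its removal time).
    """
    fullness = bit_rate * startup_delay       # bits loaded before decoding starts
    bits_per_interval = bit_rate / frame_rate
    trace = []
    for index, size in enumerate(picture_bits):
        if index > 0:
            fullness += bits_per_interval     # bits arriving during one interval
        if fullness > buffer_bits:
            raise ValueError(f"overflow before picture {index}")
        if size > fullness:
            raise ValueError(f"underflow at picture {index}")
        trace.append(fullness)
        fullness -= size                      # the whole picture leaves at once
    return trace


# Hypothetical stream: one large picture followed by smaller ones.
trace = simulate_vbv([400_000, 50_000, 50_000, 150_000],
                     bit_rate=3_000_000, frame_rate=30,
                     buffer_bits=1_800_000, startup_delay=0.5)
```

Shortening `startup_delay` in this sketch makes underflow more likely, and lengthening it makes overflow more likely, which is why the text says the startup delay must be adhered to.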
When a bit stream terminates, the buffer continues to
deliver pictures to the decoder until the buffer is emptied.
The time required to empty the buffer after the stream ends
is called the ending delay.
Those skilled in the art are directed to the following
publications: (1) Table 3 “Compression Format Constraints”
of Annex A of Doc. A/53, ATSC Digital Television Standard;
(2) ISO/IEC 13818-1, “Generic Coding of Moving Pictures
and Associated Audio: Systems”; (3) Section 5.13 titled
“Concatenated Sequences” in Doc. A/54, “Guide to the use
of the ATSC Digital Television Standard”, 4th Oct. 1995; (4)
ISO/IEC 11172-3 International Standard, “Information
Technology - Coding of Moving Pictures and Associated
Audio for Digital Storage Media at up to about 1.5 Mbit/s -
Part 3: Audio, First edition”, 1993-08-01; (5) ISO/IEC
13818-3 “Draft International Standard, Information
Technology - Generic Coding of Moving Pictures and
Associated Audio: Audio,” ISO/IEC JTC1/SC29/WG11 N0703,
May 10, 1994; (6) Proposed SMPTE Standard PT 20.02/10
“Splice Points for MPEG-2 Transport Streams,” Second
Draft, July 1997.

SUMMARY OF THE INVENTION
`
`It is an object of the invention to provide methods and
`apparatus for carrying out seamless video splicing and to
`avoid disturbing audio anomalies due to the related audio
`splicing of MPEG-2 bit streams that include video and audio
`components.
In the method of the invention for splicing MPEG-2
multimedia programs, in the same or different multimedia
data streams, first and second programs are provided. Each
program includes a first media component of the same first
media (e.g. video) and a second media component of the
same second media (e.g. an audio channel) which is a
different media than the first media. Each media component
of each program has a multitude of splice-in points with
respective begin-presentation times for respective first
portions presented after the splice-in. Each media component
also has a multitude of splice-out points with respective
end-presentation times for a last portion presented before the
splice-out. Such times (associated with splice-in and
splice-out points) are relative to the starting time of the program.
A command is received to splice the second program to the
first program. Then the splicer selects a seamless splice-in
point for the first component in the second program in the
stream and selects a seamless splice-out point for the first
component in the first program in the stream. The position
of the splice-out point in the stream or streams in the splicer
is approximately aligned with the position in the stream of
the splice-in point of the first component of the second
program. Then the splicer cuts the first component of the first
program out at the selected splice-out point for the first
component, and splices in the first component of the second
program at the selected splice-in point for the first
component. Then the presentation times in the second program are
changed so that the first presented portion of the first
component of the second program has a begin-presentation
time which is the same as the end-presentation time of the
last presented portion of the first component of the first
program. Then the splicer selects a splice-in point in the stream
for the second component in the second program at which
(treating the presentation times in the two programs as
consistent) the begin-presentation time of the earliest presented
portion of the second component of the second program
(after the splice-in point for the second component in the
stream) is equal to or after the end-presentation time of the
latest presented portion of the first component (before the splice-out
`point of the first component of the first program in the
`stream). The splicer also selects a splice-out point in the
`stream for the second component of the first program, at
`which the end-presentation time of the latest presented
`portion of the second component of the first program (before
`the splice-out point for the second component in the stream)
`is equal to or before both: the begin-presentation time of the
`earliest presented portion of the first component (after the
`splice-in point of the first component in the stream); and the
`begin-presentation time of the earliest presented portion of
`the second component in the second program (after the
`splice-in point of the second component in the stream). The
splicer then splices the second component of the first
program out at the selected splice-out point of the second
`component and splices the second component of the second
`program in at the selected splice-in point of the second
`component.
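
The selection rules in the paragraph above can be sketched in code. This is one reading of the rules, not the patented implementation: the `Portion` type, the function names, the use of 90 kHz ticks, and the frame durations in the example are all assumptions made for illustration, and buffer management is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Portion:
    begin: int  # begin-presentation time, in clock ticks (assumed 90 kHz)
    end: int    # end-presentation time, in the same ticks


def timestamp_offset(last_old_video: Portion, first_new_video: Portion) -> int:
    """Offset added to second-program times so the new video begins
    exactly where the old video ends."""
    return last_old_video.end - first_new_video.begin


def pick_audio_splice(old_audio, new_audio, last_old_video, first_new_video):
    """Pick the last old audio frame and the first new audio frame.

    Splice-in: earliest new audio frame whose adjusted begin time is equal
    to or after the end time of the last old video picture.
    Splice-out: latest old audio frame whose end time is equal to or before
    both the adjusted begin time of the new video and of the new audio.
    """
    offset = timestamp_offset(last_old_video, first_new_video)
    splice_in = [p for p in new_audio if p.begin + offset >= last_old_video.end]
    if not splice_in:
        raise ValueError("no usable audio splice-in point")
    first_new_audio = min(splice_in, key=lambda p: p.begin)
    limit = min(first_new_video.begin + offset, first_new_audio.begin + offset)
    splice_out = [p for p in old_audio if p.end <= limit]
    if not splice_out:
        raise ValueError("no usable audio splice-out point")
    last_old_audio = max(splice_out, key=lambda p: p.end)
    return last_old_audio, first_new_audio, offset


# Hypothetical timing: 3003-tick video pictures, 2160-tick audio frames.
last_old_video = Portion(27_027, 30_030)
first_new_video = Portion(900_900, 903_903)
old_audio = [Portion(k * 2160, (k + 1) * 2160) for k in range(16)]
new_audio = [Portion(900_000 + k * 2160, 900_000 + (k + 1) * 2160)
             for k in range(5)]
last_old_audio, first_new_audio, offset = pick_audio_splice(
    old_audio, new_audio, last_old_video, first_new_video)
silent_ticks = (first_new_audio.begin + offset) - last_old_audio.end
```

In this example the old audio stops slightly before the video splice and the new audio starts slightly after it, leaving a short gap with no audio, which corresponds to the silent period labeled in FIGS. 4 and 5.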
`In one specific embodiment of the method of the
`invention, the begin-presentation time for the earliest pre-
`sented portion of the second component of the second
`program (after the selected splice-in point for the second
`component in the stream) is equal to or after the begin-
`presentation time for the earliest presented portion of the
`first component of the second program (after the selected
`splice-in point of the first component in the stream). Also,
`the end-presentation time for the latest presented portion of
`the second component of the first program is equal to or
`before the begin-presentation time for the earliest presented
`portion of the second program (following the selected
`splice-in point for the second component in the stream).
In another specific embodiment of the method of the
invention, the end-presentation time for the latest presented
portion of the second component of the first program (before
the splice-out point for the second component in the stream)
is equal to or before the begin-presentation time for the
earliest presented portion of the first component of the
second program (after the selected splice-in point for the
first component in the stream). Also, the begin-presentation
time for the earliest presented portion of the second
component of the second program (after the splice-in point for
the second component in the stream) is equal to or later than
the end-presentation time for the latest presented portion
of the second component in the first program (before the
splice-out point for the second component in the stream).
`In another specific embodiment of the method of the
`invention, the number of audio frames that must be skipped
`to prevent overflowing an audio decoding buffer is deter-
`mined. Then a splice-out point for the second component in
`the first program that is previous to the splice-in point of the
second component in the second program is selected
`depending on the determination in order to prevent over-
`flowing the audio decoder buffer.
In the MPEG-2 data stream of the invention, a first section
of the stream consists essentially of a first media component
of a first program and a second media component of the first
program. A second section of the stream consists essentially
of a first media component of a second program and a second
media component of the second program. A third section of
the stream, between the first section and the second section,
consists essentially of the first media component of the
second program and the second media component of the first
program.
`
`A multimedia encoder of the invention includes a pro-
`cessing unit; a memory communicating with the processing
`unit; one or more buffers in the memory; one or more
`network inputs communicating with the buffers in the
`memory, for receiving uncompressed programs; and at least
`one network output communicating with the buffers in the
`memory, for transmitting a data stream of one or more
`compressed programs from the encoder. The encoder also
`includes apparatus for receiving the uncompressed programs
`from the inputs into the buffers; apparatus for compressing
`the uncompressed portions of the programs in the buffers
`into compressed portions of the programs in the buffers; and
`apparatus for transmitting the compressed programs from
`the buffers onto the network output. The encoder also
`includes video splice-out providing apparatus for providing
`a multitude of seamless splice-out points in at least one of
`the compressed programs; and video splice-in providing
`apparatus for providing a multitude of seamless splice-in
`points in at least another one of the compressed programs.
`The encoder also has apparatus to prevent audio anomalies
`due to splicing the compressed programs.
`A multimedia data stream splicer of the invention
`includes: a processing unit; a memory communicating with
`the processing unit; one or more buffers in the memory; one
`or more network inputs communicating with the buffers in
the memory, for one or more input data streams including at
least first and second programs. Each program includes a
`first media component of the same first media (e.g. video)
`and a second media component of the same second media
`(e.g. audio) which is different than the first media. Each
`media component of each program has a multitude of
`splice-in points, each associated with a portion of the
`component having an earliest begin-presentation time after
`the splice-in; and a multitude of splice-out points, each
`associated with a portion of the component having the latest
`end-presentation time before the splice-out. The splicer
`further includes at least one network output for an output
`data stream with one or more programs, communicating
`with the buffers in the memory. The splicer also includes
apparatus (programmed computer memory) for receiving the
`programs from the input data streams into the buffers;
`apparatus for transmitting the programs from the buffers
`onto the network output as a data stream; and apparatus for
`receiving a splice command to splice the second program to
`the first program. The splicer also includes apparatus for
`selecting a splice-in point of the first component in the
`second program depending on the splice command; appa-
`ratus for selecting a splice-out point of the f