`
`Inventor: David Singer
`
`BACKGROUND
`
In video coding systems, a conventional encoder may code a source video sequence into a coded representation that has a smaller bit rate than the source video and thereby achieve data compression. Coded video may be stored and/or transmitted according to a coding policy, typically conforming to a standard such as MPEG-4 or MP4. A decoder may then invert the coding processes performed by the encoder to retrieve the source video.
`
FIG. 1: Video Coding Basics

[FIG. 1: simplified block diagram of a video coding system — terminals, network, video decoder, display]
FIG. 1 is a simplified block diagram of a video coder/decoder system suitable for use with the present invention. A video system may include:
`
• terminals that communicate via a network. The terminals each may capture video data
`locally and code the video data for transmission to another terminal via the network.
`Each terminal may receive the coded video data of the other terminal from the
`network, decode the coded data and display the recovered video data. Video
`terminals may include personal computers (both desktop and laptop computers),
`tablet computers, handheld computing devices, computer servers, media players
`and/or dedicated video conferencing equipment.
`
• an encoder system that may accept a source video sequence and may code the source video as coded video, which typically has a much lower bit rate than the source video.
`
• a channel that delivers the coded video data output from the coding engine to the decoder.

o The encoder system may output the coded video data to the channel, which may be a storage device, such as an optical, magnetic or electrical storage device, or a communication channel formed by a computer network or a communication network, for example either a wired or wireless network.
`
`
`
• a decoder system that may retrieve the coded video data from the channel, invert the coding operations performed by the encoder system and output decoded video data to an associated display device.
`
`Notes:
`
• As shown, the video communication system supports video coding and decoding in one direction only. For bidirectional communication, an encoder and a decoder may each be implemented at each terminal such that each terminal may capture video data at a local location and code the video data for transmission to the other terminal via the network. Each terminal may receive the coded video data of the other terminal from the network, decode the coded data and display video data recovered therefrom.
`
• Video coding systems initially may separate a source video sequence into a series of frames, each frame representing a still image of the video. A frame may be further divided into blocks of pixels. Each frame of the video sequence may then be coded on a block-by-block basis according to any of a variety of different coding techniques. For example, using predictive coding techniques, some frames in a video stream may be coded independently (intra-coded I-frames) and some other frames may be coded using other frames as reference frames (inter-coded frames, e.g., P-frames or B-frames). P-frames may be coded with reference to a single previously coded frame and B-frames may be coded with reference to a pair of previously coded frames.
`
• For time-based coding techniques, segments of the input video sequence may include various time-based characteristics. For example, each frame may have an associated time length, coding order and display order.
`
• The terminals may exchange information as part of an initial handshake, the information detailing the capabilities of each terminal and potentially identifying the coding format and default parameters in an initialization table.
`
`Page 2 of 11
`
`13316/1466700
`
`
`
SUMMARY OF INVENTION
`
Where bandwidth is rising but latency is staying steady (since it is, to a large extent, a physical characteristic tied to the speed at which signals travel), or even rising due to buffer delays, a file format that is optimized for media delivery is desired, notably for streaming and one-to-many distribution (broadcast, multicast, application-layer multicast or peer-to-peer distribution). Conventionally, the best bandwidth efficiency is achieved when the round-trip delay cost is 'amortized' over larger data objects (such as segments). HTTP streaming is the current best example of an environment that addresses these needs, but the 'segment format' should be delivery-system independent.
`
Accordingly, a new file format is presented, designed to simplify and modernize existing file formats. The data objects need to be incremental, capable of carrying a single stream or a package, re-packageable, and so on. A new format should be designed to re-use existing formats as much as possible, to allow for easy conversion between delivery and storage, and should be optimized for media delivery. A regular MP4 file (using only boxes described in the standard, and no extension boxes) should be 'automatically' convertible into a file of this format.
`
`A presentation can be represented as a packaged single file, or the contents of the
`package stored separately (‘unbundled’, i.e. as a plurality of tracks, or the un-timed data items).
`Presentations can also be fragmented in time, and the fragments collected into separate segments.
`
A media fragment contains a section of the timeline. An initialization fragment contains initialization information. Fragments may be aggregated into segments. The smallest unit capable of being independently typed, managed (e.g. as a file), and so on, is the segment. A segment always has a segment-type box at the beginning (and possibly later on), and then one or more fragments. A segment that starts with an initialization fragment can be called an initialization segment.
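As a rough illustration, the segment structure described above might be modeled as follows; the class layout and the 'styp' box name are assumptions for illustration, not definitions taken from the format itself:

```python
from dataclasses import dataclass, field
from typing import List

# Sketch of the segment layout described above: a segment begins
# with a segment-type box and then carries one or more fragments;
# a segment whose first fragment is an initialization fragment is
# an initialization segment.

@dataclass
class Fragment:
    is_initialization: bool   # True for an initialization fragment
    payload: bytes = b""      # media or initialization data

@dataclass
class Segment:
    segment_type: str         # the leading segment-type box (assumed 'styp')
    fragments: List[Fragment] = field(default_factory=list)

    def is_initialization_segment(self) -> bool:
        # An initialization segment starts with an initialization fragment.
        return bool(self.fragments) and self.fragments[0].is_initialization

init_seg = Segment("styp", [Fragment(True), Fragment(False)])
media_seg = Segment("styp", [Fragment(False)])
```

A packaged single file would then be a sequence of such segments, beginning with an initialization segment.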
`
`
`
`
`DETAILED DESCRIPTION
`
FIG. 2: Functional Block Diagram of Encoder

[FIG. 2: block diagram showing a video encoding engine and a video decoding engine]
FIG. 2 is a simplified block diagram of a media encoder (e.g. video, as in this example) according to an embodiment of the present invention. The encoder may include:
`
• a pre-processor that receives the input video data from the video source, such as a camera or storage device, separates the video data into frames, and prepares the frames for encoding.
`
o The pre-processor performs video processing operations on video frames, including filtering operations such as de-noising filtering, bilateral filtering or other kinds of processing operations that improve the efficiency of coding operations performed by the encoder. Typically, the pre-processor analyzes and conditions the source video for more efficient compression.
`
• an encoding engine that codes processed frames according to a variety of coding modes to achieve bandwidth compression.
`
o The coding engine may select from a variety of coding modes to code the video data, where each different coding mode yields a different level of compression, depending upon the content of the source video.
`
`o The encoding engine may code the processed source video according to a
`predetermined multi-stage coding protocol. For example, common coding
`engines parse source video frames according to regular arrays of pixel data
`(e.g., 8x8 or 16x16 blocks), called “pixel blocks” herein, and may code the
`pixel blocks according to block prediction and calculation of prediction
`residuals, quantization and entropy coding.
`
• a video decoding engine that decodes coded video data generated by the encoding engine.
`
`
`
`
`o The decoding engine generates the same decoded replica of the source
`video data that a decoder system will generate, which can be used as a
`basis for predictive coding techniques performed by the encoding engine.
`The decoding engine may access a reference frame cache to store frame
`data that may represent sources of prediction for later-received frames
`input to the video coding system. Both the encoder and decoder may keep
`reference frames in a buffer. However, constraints in buffer sizes may
`limit the number of reference frames that can be stored in the cache.
`
• a multiplexer (MUX) to store the coded data and combine the coded data into a bit stream to be delivered by a transmission channel to a decoder or terminal.
`
FIG. 3: Functional Block Diagram of Decoder

[FIG. 3: block diagram showing a buffer, video decoding engine, controller and post-processor]
FIG. 3 is a simplified block diagram of a video decoder according to an embodiment of the present invention. The decoder may include:
`
• a buffer to receive and store the coded channel data before processing by the decoding engine.
`
• a decoding engine to receive coded video data and invert coding and compression processes performed by the encoder.
`
`o The decoding engine may parse the received coded video data to recover
`the original source video data for example by decompressing the frames of
`a received video sequence by inverting coding operations performed by
`the encoder.
`
`o The decoding engine may access a reference frame cache to store frame
`data that may represent source blocks and sources of prediction for later-
`received frames input to the video coding system.
`
• a controller to identify the characteristics of the coded video data and select a decoding mode for the coded video data.
`
• a post-processor that prepares the video data for display. This may include filtering, de-interlacing, scaling, de-blocking, sharpening, up-scaling, noise masking, or other post-processing operations.
`
• a display pipeline that represents further processing stages (buffering, etc.) to output the final decoded video sequence to a display device.
`
`
`
`
`Notes:
`
As discussed above, the foregoing embodiments provide a coding system that codes and decodes video data according to a predetermined format. The techniques described above find application in both hardware- and software-based encoders and decoders. In a hardware-based encoder, the functional units within an encoder or decoder may be provided in a dedicated circuit system such as a digital signal processor or a field-programmable logic array, or by a general-purpose processor. In a software-based encoder, the functional units may be implemented on a personal computer system (commonly, a desktop or laptop computer) executing software routines corresponding to these functional blocks. The program instructions themselves also may be provided in a storage system, such as an electrical, optical or magnetic storage medium, and executed by a processor of the computer system. The principles of the present invention find application in hybrid systems of mixed hardware and software designs.
`
Several embodiments of the invention are specifically illustrated and/or described herein. However, it will be appreciated that modifications and variations of the invention are covered by the above teachings and within the purview of the appended claims without departing from the spirit and intended scope of the invention.
`
FIG. 4: Pattern Based Defaults

[FIG. 4: an exemplary fragment of numbered frames (1-20) following a repeating coding pattern]
`
FIG. 4 illustrates an exemplary fragment having a pattern according to an embodiment of the present invention. As shown, the fragment includes a plurality of frames; the first frame (1) may be encoded as an I-frame, and a plurality of subsequent frames (2-10) may be coded as B- or P-frames. Then a subsequent frame (11) may be coded as an I-frame and the plurality of subsequent frames (12-30) may be encoded in a similar pattern of B- and P-frames.
`
• Initialization information exchanged between the encoder and decoder, or otherwise stored or transmitted with the coded video data, may identify the usual setup of data references, sample entries, etc. In fragment initialization information, there can be tables that define a pattern that can be indexed into by a run, or chunk, of media data (video, audio, or other media frames). These
`
`
`
`
patterns include sample size, duration, timing and re-ordering, sample group membership, and other fragment characteristics used to decode and display the coded video data.
`
• The fragment initialization information sets a pattern for all of the characteristics. Each pattern may be defined to have a fixed length, and all patterns may have the same length. The pattern length may be shorter or longer than the total sample count in a fragment. The fragment may indicate an initial offset into the patterns. The pattern is repeated (from the beginning) as needed, to cover all the samples in the runs.
`
• In each run, or chunk, of media frames, there is a sample count. If, for a particular type of characteristic, sequence-specific values are used, then the corresponding table is present in the run or chunk. If the table is absent in a run or chunk, then the implied part of the pattern is used (based on the initial offset into the pattern, given for the fragment, and the sample number in the fragment).
`
• A long pattern may have sub-parts that cover common cases, without repetition among common cases. A short pattern may repeat regularly. A fragment longer than the short pattern may keep looping through the same pattern until the end of the fragment.
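The pattern lookup described above can be sketched as follows; the function name, field layout and pattern values are illustrative assumptions, not taken from the format:

```python
# Sketch of pattern-based defaults: a fixed-length pattern of
# per-sample values (here, durations) is defined in the fragment
# initialization information; the fragment supplies an initial
# offset into the pattern, and the pattern repeats from the
# beginning as needed to cover every sample in the runs.

def default_for_sample(pattern, initial_offset, sample_number):
    # sample_number is the 0-based index of the sample within the
    # fragment; the modulo implements the wrap-around repetition.
    return pattern[(initial_offset + sample_number) % len(pattern)]

# Illustrative duration pattern and fragment offset.
duration_pattern = [1001, 1001, 2002]
offset = 1

# Defaults for the first five samples of the fragment; the pattern
# loops back to its beginning once exhausted.
durations = [default_for_sample(duration_pattern, offset, n) for n in range(5)]
```

A run or chunk that carries an explicit table of its own for some characteristic would override this implied lookup for that characteristic.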
`
`FIG. 5: Method for Selecting Pattern Based Defaults
`
FIG. 5 is a simplified flow diagram illustrating a method for selecting pattern-based defaults for decoding a coded video sequence.
`
[FIG. 5: flow diagram — receive initialization information; receive coded video sequence; is a table present?; identify pattern characteristics / identify characteristic information in table; decode coded video]

Step 1: An initialization table is received at the decoder with a coded video sequence. The initialization table defines the default values of the characteristics for patterns in the sequence.

Step 2: The decoder collects the pattern in the initialization information, and the set of tables in the chunk or run. A need for the pattern may be identified by the lack of a table of a specific type in the chunk or run.

Step 3: The decoder then identifies the characteristic values for the frames in the sequence. A pattern will have a corresponding set of defaults that may be the same for every frame following the pattern.
`
`
`
`
For frames that are not a part of the pattern, the characteristic values may be included in the coded video data. Thus, unless otherwise indicated, the default values for characteristics of the frames corresponding to the pattern may be used.
`
`Step 4: The decoding engine may decode the coded video using the characteristic values
`identified in step 3, including the defaults for each identified pattern.
`
FIG. 6: Indexing Video Files
`
[FIG. 6: a video file laid out as initialization information followed by alternating movie fragments and media data]
`
FIG. 6 illustrates an exemplary video file having a plurality of movie fragments indexed by the initialization information according to an embodiment of the present invention. Each movie fragment may have a defined start time and duration, and they can be stored contiguously and hence have a starting byte and size.
`
Movie fragments may be implemented like samples, such that the same table functionality may be used to access fragments in a segment as is used to access the samples in a fragment. This enables the use of 'sample groups' and all other mechanisms that conventionally apply to samples to apply to fragments.
`
`A table, and marking on the fragments, may allow for random access of a portion of
`media data without requiring the decoding unit to parse the entire movie.
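One way to picture such a table (the entry layout here is a hypothetical illustration, not the format's actual table): each entry records a fragment's start time, duration, starting byte and size, so the fragment covering a requested time can be located without parsing the entire movie:

```python
# Hypothetical fragment index: one (start_time, duration,
# start_byte, size) entry per movie fragment, enabling random
# access by presentation time. Times in seconds, offsets in bytes.

fragment_index = [
    (0.0, 2.0, 1024, 5000),
    (2.0, 2.0, 6024, 4800),
    (4.0, 2.0, 10824, 5100),
]

def locate_fragment(index, time):
    # Return the (start_byte, size) of the fragment covering `time`,
    # or None if the time falls outside the indexed presentation.
    for start, duration, byte_offset, size in index:
        if start <= time < start + duration:
            return (byte_offset, size)
    return None

span = locate_fragment(fragment_index, 3.5)   # (6024, 4800)
```

With byte ranges in hand, a reader can fetch and decode only the marked fragment rather than the whole file.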
`
`
FIG. 7: Method for Accessing Media Segment

[FIG. 7: flow diagram — access initialization information; access movie fragment; display media segment]

FIG. 7 is a simplified flow diagram illustrating a method for randomly accessing a portion of a media file according to an embodiment of the present invention.

Step 1: To display a portion of a media file, a controller may access the initialization information for the file. The initialization information, included at the start of the file, may include characteristic information for the file and include a sync table that identifies the size or duration of each of a plurality of fragments within the file. In order to access a requested fragment, the controller may parse the sync table to identify the start of the relevant media file and any characteristics that carry from the
`
`
`
`
initialization information throughout the file and are applicable to the requested fragment.
`
Step 2: The controller may then access the fragment information included at the start of the requested fragment. Like the initialization information, the fragment information may include characteristic information for the fragment.
`
`Step 3: The requested information may then be displayed using the characteristic
`information accessed in the initialization information and the fragment information.
`
`FIG. 8: Carouseling Initialization Information
`
[FIG. 8: a fragment carrying initialization information with a (major) version field, followed by media data]
`
FIG. 8 illustrates an exemplary fragment of coded video data having carouseling initialization information according to an embodiment of the present invention.
`
With streaming data, the initialization information for the data may be periodically dumped or updated. The decoder displaying the streamed data may then reinitialize the settings and characteristics for the data each time an initialization fragment is received. As shown in FIG. 8, initialization information having carouseling version data may indicate whether re-initialization is required.
`
As shown in FIG. 8, an initialization box in the received initialization information gives a major and a minor version number, documenting whether re-initialization is needed when a new initialization packet is encountered. Both sync and non-sync ('difference') initialization fragments may be included in the initialization information. For example, the initialization version (major) may indicate whether re-initialization is required, for example if the streaming information is switching between different videos coded according to different codecs.
`
The initialization version (minor) may indicate whether an update of the initialization information is required. The initialization fragment may contain an edit list; if an update applies to the whole video, the edit list may be replaced in a carouseled version of the initialization segment. If this new edit list maps the media data preceding in segment order to the same presentation times, this is a minor update to the initialization segment. Otherwise, it is a major (re-initialization) update.
`
For example, for a pair of initialization fragments, if the two version numbers are identical, the initialization information is identical and the initialization fragment is a resend of the known initialization fragment. If the major version is identical but the minor version has changed, then the box is a compatible update (e.g. it includes more meta-data that nonetheless applies to the whole duration of the presentation). If the major version has changed, then this is new initialization information and requires a re-initialization.
`
`
`
`
In another exemplary embodiment, a segment that contains an initialization fragment can be marked as a random access point in the segment index. Then, for a minor version change, the update initialization fragment may contain the difference between the original initialization fragment and the update fragment. The segments may then be defined as either independent (I) segments, when the major version indicates a re-initialization, or predictively (P) coded segments, when the minor version identifies changes to be made to the data of the I segment.
`
FIG. 9: Method for Identifying Initialization Updates

[FIG. 9: flow diagram — receive initialization information; identify version information; is major version different? → re-initialize decoder(s); is minor version different? → update initialization data; otherwise discard received initialization information]

FIG. 9 is a simplified flow diagram illustrating a method for identifying initialization update information in a stream of video data according to an embodiment of the present invention.

Step 1: A decoder may receive initialization information for data being streamed to the decoder. The initialization information may contain version information, major and minor, that identifies the version of the data.
`
Step 2: The version of the received initialization data may then be compared to the version of the currently utilized initialization data. If the version information includes a major identification and a minor identification, and the major identification for the received initialization data is different than the major identification for the currently utilized initialization data, the decoder should be reinitialized using the received initialization data.
`
Step 3: If the major versions are the same but the minor versions are different, the received initialization data indicates that a change to the initialization information should occur, and the edit table or other changed information in the currently utilized initialization data should be replaced by the information in the received initialization data.
`
Step 4: If the major and minor versions of both the received information data and the currently utilized information data are the same, the received information is discarded.
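The decision procedure of Steps 2-4 above can be sketched as follows; the function name and the (major, minor) tuple representation are illustrative assumptions:

```python
# Sketch of the carousel update decision: compare the (major,
# minor) version of a received initialization fragment against the
# currently utilized one and classify the required action.

def classify_update(current, received):
    # current and received are (major, minor) version tuples.
    cur_major, cur_minor = current
    rcv_major, rcv_minor = received
    if rcv_major != cur_major:
        return "re-initialize"   # Step 2: major change, full re-init
    if rcv_minor != cur_minor:
        return "update"          # Step 3: compatible minor update
    return "discard"             # Step 4: identical resend

decision = classify_update((1, 0), (1, 1))   # "update"
```

A decoder in a one-to-many stream can thus join mid-presentation and act only on carouseled initialization fragments that actually change its state.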
`
`
`