`Terms of Use: https://www.spiedigitallibrary.org/terms-of-use
`
MPEG4: coding for content, interactivity, and universal accessibility

Cliff Reader
Samsung Electronics
3655 North First Street
San Jose, California 95134-1713
E-mail: creader@samsung.com
`
Abstract. MPEG4 is a natural extension of audiovisual coding, and yet from many perspectives breaks new ground as a standard. New coding techniques are being introduced, of course, but they will work on new data structures. The standard itself has a new architecture, and will use a new operational model when implemented on equipment that is likely to have innovative system architecture. The author introduces the background developments in technology and applications that are driving or enabling the standard, introduces the focus of MPEG4, and enumerates the new functionalities to be supported. Key applications in interactive TV and heterogeneous environments are discussed. The architecture of MPEG4 is described, followed by a discussion of the multiphase MPEG4 communication scenarios, and issues of practical implementation of MPEG4 terminals. The paper concludes with a description of the MPEG4 workplan. In summary, MPEG4 has two fundamental attributes. First, it is the coding of audiovisual objects, which may be natural or synthetic data in two or three dimensions. Second, the heart of MPEG4 is its syntax: the MPEG4 Syntactic Descriptive Language (MSDL).
`
Subject terms: visual communication and image processing; MPEG4; multimedia; digital video; compression.

Optical Engineering 35(1), 104–108 (January 1996).
`
1 Introduction

MPEG4 introduces a new generation of coding standards for audiovisual data. Fifty years ago, video standardization meant specifying voltage levels, and end-to-end synchronous timing was required. The motivation was not only interoperability, but the need to define practical, economical implementations. Recently video standardization has meant standardizing a coding algorithm, again motivated in part by the economics of mass production that result from "hard-wired" implementation. Coding schemes such as H.320 and MPEG each have defined a bit-precise syntax for efficiency and ease of decoding, but the coded representation is abstracted from the specifics of the implementation; in other words, it is at a higher layer in the protocol stack than the physical layer. These schemes also have been able to include a limited degree of flexibility, most notably in the form of "profiles" in MPEG2. However, they still code "video," i.e., a regular time sequence of two-dimensional video frames and audio samples. See Fig. 1. In contrast, in the future, we will standardize a communication language for audiovisual objects, and MPEG4 is taking the key step in that direction.
This evolution is made possible to a large extent by advances in technology. Digital TV settop boxes may be built upon the architecture of a PC, and PCs may be enhanced by adding TV tuner cards. Clearly the two are fusing, and what matters is the availability of very high-performance processors and high-capacity memories at consumer-level prices.

Paper VIS-09 received May 10, 1995; revised manuscript received July 12, 1995; accepted for publication July 31, 1995.
©1996 Society of Photo-Optical Instrumentation Engineers. 0091-3286/96/$6.00.

104 / OPTICAL ENGINEERING / January 1996 / Vol. 35 No. 1
Today we have standardized specific coding algorithms, recognizing that dedicated implementation in hardware is essential to low cost. But the audiovisual system of tomorrow will be built around an embedded PC, and coding will be done in software. That means a big advance in flexibility, and in particular an openness for future development.
A specific consequence is flexibility in presentation, with a decoupling of the update rate and resolution for the audiovisual data themselves from the refresh rate and resolution of the display or sample rate of the audio output. The opportunity exists to present audiovisual objects each at its native resolution or to scale the resolution as appropriate.
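The decoupling just described can be sketched in a few lines of Python. This is purely illustrative: the object and display types, field names, and the scaling policy are assumptions made here for the sketch, not part of any MPEG4 specification.

```python
from dataclasses import dataclass

@dataclass
class AVObject:
    name: str
    width: int        # native horizontal resolution, pixels
    height: int       # native vertical resolution, pixels
    update_hz: float  # native update rate of this object

@dataclass
class Display:
    width: int
    height: int
    refresh_hz: float

def scale_for_display(obj: AVObject, disp: Display, fraction: float = 1.0):
    """Scale an object to occupy `fraction` of the display area.

    Only the spatial resolution is resampled; the object keeps its own
    native update rate, independent of the display's refresh rate.
    """
    factor = min(disp.width * fraction / obj.width,
                 disp.height * fraction / obj.height)
    return (round(obj.width * factor), round(obj.height * factor), obj.update_hz)

logo = AVObject("logo", 160, 120, 12.5)   # a low-rate synthetic object
tv = Display(720, 576, 50.0)              # a PAL-like display
print(scale_for_display(logo, tv, 0.25))  # → (180, 135, 12.5)
```

Note that the returned update rate is unchanged: the presentation layer, not the coded object, decides how the 12.5-Hz object is shown on a 50-Hz display.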
Computer technology has already had a major influence through creating tools for postproduction. Today, the leading technology is nonlinear editing systems. With workstation processing power and hard-disk capacities unthinkable only a few years ago, today's all-digital suite generates, manipulates, and composes a wide range of audio and video objects: 3-D titles, video clips, audio clips, computer animation, synthetic audio. The authoring environment is thus already object-oriented with a mix of natural and synthetic data that is 2-D or 3-D.
On the presentation side, computer technology has also led to great progress in video games. Memory technology permits storage of databases large enough to support photorealistic scenes in high-end environments, and remarkably complex scenes even on consumer-level products. The interactive nature of the games means the presentation is also "nonlinear" and consists of manipulating and composing the audio and video objects in the database.
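The "nonlinear" presentation style described above can be sketched as follows. The object store, its fields, and the compose step are invented for illustration only; they are not an MPEG4 API.

```python
# Hypothetical sketch: a presentation is a selection and arrangement of
# coded objects pulled from a database, not a fixed sequence of frames.
database = {
    "background": {"kind": "natural video", "dims": "2-D"},
    "spaceship":  {"kind": "synthetic model", "dims": "3-D"},
    "engine_hum": {"kind": "synthetic audio", "dims": None},
}

def compose_scene(names, db):
    """Return the requested objects as an ordered draw/mix list.

    Interactivity, in this view, just means recomputing this list in
    response to user input rather than playing back a fixed frame stream.
    """
    return [(name, db[name]) for name in names if name in db]

scene = compose_scene(["background", "spaceship", "engine_hum"], database)
print([name for name, _ in scene])  # → ['background', 'spaceship', 'engine_hum']
```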
`
`1
`
`SAMSUNG-1038
`
`
`
`Downloaded From: https://www.spiedigitallibrary.org/journals/Optical-Engineering on 24 Mar 2023
`Terms of Use: https://www.spiedigitallibrary.org/terms-of-use
`
Fig. 1 Evolution of audiovisual standards: analog video (signal voltage, presentation format, synchronous timing), to frame-based digital video (coding algorithm, bit-precise syntax, set of profiles), to object-based MPEG4 (syntactic descriptive language, MSDL).

Fig. 2 MPEG4 addresses applications in the shaded area.

Both of these developments, and the projected new capabilities of "interactive TV," suggest a trend from purely linear TV toward "nonlinear TV" from both a content and a technology point of view. MPEG4 is the storage and transmission standard for nonlinear TV.

2 MPEG4 Functionalities and Applications

In the near future, convergence of the worlds of computers and consumer audiovisual products will be accompanied by advances and diversification of communications. High-bandwidth wide-area networks, wireless networks, and huge audiovisual databases offer the potential for an explosion in applications on one hand, and a crisis in data management on the other hand. MPEG4 will tackle the data representation part of this problem and will be an audiovisual coding standard allowing for interactivity, high compression, and universal accessibility. This is illustrated in Fig. 2.

To accomplish this, MPEG4 will address a set of eight new and improved functionalities (in addition to the standard functionalities). These are shown in Table 1, together with examples of applications. This table has been extracted from the MPEG4 Proposal Package Description (PPD) document.1 Within this suite of functionalities, several trends are clear. First, there is a prevalence of content-based capabilities. We want to be able to access audiovisual objects, manipulate them, and present them in a highly flexible way. We want to do this at the coded data level. Second, interactivity is important. Third, we want to have these capabilities for both natural and synthetic audiovisual objects, and fourth, we want these capabilities no matter where we are. A comparison between the conventional production, distribution, and presentation environment and the new MPEG4-enabled environment is shown in Fig. 3.

Table 1 Description of functionalities.

Content-Based Interactivity

Content-Based Multimedia Data Access Tools
Description: MPEG-4 shall provide data access based on the audio-visual content by using various accessing tools such as indexing, hyperlinking, querying, browsing, uploading, downloading, and deleting.
Example uses: content-based retrieval of information from on-line libraries and travel information databases; interactive home shopping.

Content-Based Manipulation and Bitstream Editing
Description: MPEG-4 shall provide an 'MPEG-4 Syntactic Description Language' and coding schemes to support content-based manipulation and bitstream editing without the need for transcoding. The MSDL shall be flexible enough to provide extension for future uses.
Example uses: home movie production and editing; insertion of sign language interpreter or subtitles; digital effects (e.g., fade-ins).

Hybrid Natural and Synthetic Data Coding
Description: MPEG-4 shall support efficient methods for combining synthetic scenes or objects with natural scenes or objects (e.g., text and graphics overlays), the ability to code and manipulate natural and synthetic audio and video data, and decoder-controllable methods of compositing synthetic data with ordinary video and audio, allowing for interactivity.
Example uses: animations and synthetic sound composited with natural audio and video in a game; a viewer translating or removing a graphic overlay to view the video beneath it; graphics and sound 'rendered' from different points of observation.

Improved Temporal Random Access
Description: MPEG-4 shall provide efficient methods to randomly access, within a limited time and with fine resolution, parts (e.g., frames or objects) from an audio-visual sequence. This includes conventional random access at very low bit rates.
Example uses: audio-visual data randomly accessed from a remote terminal over limited-capacity media; a 'fast forward' performed on a single AV object in the sequence.

Compression

Improved Coding Efficiency
Description: For specific applications targeted by MPEG-4, MPEG-4 shall provide subjectively better audio-visual quality at comparable bit rates compared to existing or other emerging standards.
Example uses: efficient transmission of audio-visual data on low-bandwidth channels; efficient storage of audio-visual data on limited-capacity media, e.g., magnetic disks.

Coding of Multiple Concurrent Data Streams
Description: MPEG-4 shall provide the ability to code multiple views/soundtracks of a scene efficiently and provide sufficient synchronization between the resulting elementary streams. For stereoscopic video applications, MPEG-4 shall include the ability to exploit redundancy in multiple viewing or hearing points of the same scene, permitting joint coding solutions that allow compatibility with normal audio and video as well as solutions without the compatibility constraint.
Example uses: multimedia entertainment, e.g., virtual reality games, 3D movies; training and flight simulations; multimedia presentations and education.

Universal Access

Robustness in Error-Prone Environments
Description: MPEG-4 shall provide an error-robustness capability to allow access to applications over a variety of wireless and wired networks and storage media. Sufficient error robustness shall be provided for low bit-rate applications under severe error conditions (e.g., long error bursts).
Example uses: transmitting from a database over a wireless network; communicating with a mobile terminal; gathering audio-visual data from a remote location.

Content-Based Scalability
Description: MPEG-4 shall provide the ability to achieve scalability with a fine granularity in content, quality (e.g., spatial resolution, temporal resolution), and complexity. In MPEG-4, these scalabilities are especially intended to result in content-based scaling of audio-visual information.
Example uses: user or automated selection of decoded quality of objects in the scene; database browsing at different content levels, scales, resolutions, and qualities.

Fig. 3 Coding in the production, distribution, and presentation process: (a) conventional video environment (A/V objects: video/audio, natural/synthetic, 2-D/3-D; fixed script; rigid format); (b) MPEG4 environment (optional interactive script; flexible format).

In the conventional environment, the scene is fully composed before being fixed into a standard video format. Three-dimensional objects are reduced to two-dimensional projections, the script is fixed, and the presentation is essentially constrained to the spatial and temporal format of the input. In contrast, if audio and video objects are coded at the source, together with (optionally) the script, then flexibility is maintained. If desired, the three-dimensional nature of the objects can be preserved, the script may consist only of a behavior model, and the format of the presentation can be independently chosen. In summary, MPEG4 will not code video as 25/30 frames/s of PAL/NTSC resolution, or audio as 48 ksamples/s. MPEG4 will code audio and video objects at their native resolution; then, by selecting objects and scaling their resolution, a wide variety of presentations can be made.

The other important factor in the applications addressed by MPEG4 is the heterogeneous networking environment we can already foresee, and in particular wireless communication. Mobile applications are compelling, whether for videophone communications or remote access to audiovisual databases. This introduces requirements for tolerance of noisy environments, varying bandwidths, and varying degrees of decoder resources, including display resolution, computation resources, and battery power. MPEG4 will address the problem of channels in which the residual noise is not zero, and provide fine-grained scalability for constrained bit rate and decoder resources.

3 Structure of MPEG4

MPEG4 will not standardize a single algorithm. No such algorithm can exist in view of the range of functionalities and applications to be addressed. Also, there is no need to standardize a single algorithm when cost-effective systems can be built to switch between algorithms, or even learn new ones. The latter capability also permits future advances in coding techniques to be included in the standard. So MPEG4 will establish an extendible set of coding tools, which can be combined in various ways to make algorithms, and the algorithms can be customized for specific applications to make profiles. This is illustrated in Fig. 4.

Fig. 4 Structure of MPEG4.

Tools, algorithms, and profiles are coding objects and consist of an independent core and a standard interface. The standard interface guarantees the coding objects can interwork, and the independent core permits proprietary techniques to be invented and made available within the standard. This is analogous to the situation in computer software applications, where independent software vendors (ISVs) can develop and market products that are guaranteed to run, provided they are compatible with the application program interface (API). In this sense, MPEG4 will be the API for the coded representation of audiovisual data.

The "glue" that binds the coding objects together is the MPEG4 Syntactic Descriptive Language (MSDL), which comprises several key components: first, definition of the coding object interface noted above; second, a mechanism to combine coding objects to construct coding algorithms and profiles; and third, a mechanism for downloading new coding objects. The current thinking is for coded data objects to be described by the coding objects themselves. Collectively these components define a syntax for MPEG4, and the fourth component of MSDL is a set of rules for parsing this syntax. The MSDL is described more fully in the document "Requirements for the MPEG4 SDL."2

This structure implies a multiphase transmission of MPEG4 data. At the beginning of an exchange between a user and a database or between two users, there is a configuration phase, during which the coder and decoder determine the coding objects to be used, their configuration, and whether or not both of them have all the required objects. If not, there may be a learning phase, during which coding objects are downloaded. Finally, there is the transmission phase for the communication of the data, which must of course be bit-efficient. The process is illustrated in Fig. 5.

Fig. 5 MPEG4 communication phases.
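The three-phase exchange just described can be sketched as follows. The class, method, and tool names here are invented for illustration; the actual MSDL mechanisms were still under definition at the time of writing.

```python
# Hypothetical sketch of the configuration, learning, and transmission phases.
class Decoder:
    def __init__(self, tools):
        self.tools = set(tools)

    def configure(self, required):
        """Configuration phase: report which coding objects are missing."""
        return [t for t in required if t not in self.tools]

    def learn(self, downloads):
        """Learning phase: install downloaded coding objects (expected to be rare)."""
        self.tools.update(downloads)

    def receive(self, required, payload):
        """Transmission phase: only legal once all required tools are present."""
        assert not self.configure(required), "configuration incomplete"
        return f"decoding {payload} with {sorted(required)}"

dec = Decoder(["mpeg2_video", "celp_audio"])     # illustrative tool names
needed = ["mpeg2_video", "wavelet_still"]
missing = dec.configure(needed)                  # → ['wavelet_still']
dec.learn(missing)                               # download the missing coding object
print(dec.receive(needed, "bitstream"))
```

The point of the sketch is the ordering constraint: the bit-efficient transmission phase can begin only after the (possibly empty) configuration and learning phases have left both ends with the same set of coding objects.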
`
`4
`
`Practical MPEG4 Systems
`
`The general structure of MPEG4 is very flexible, and allows
`extension of capabilities
`in the future, but a generic imple-
`mentation
`of
`the standard would produce
`neither
`a cost-
`effective solution to any specific application,
`nor an efficient
`solution during the setup phases. One would expect
`that any
`specific system would be implemented to address a very small
`number of very well-defined
`applications,
`and that the setup
`phase would consist only of specifying a default configura-
`tion. The learning phase and the full configuration
`phase
`should occur only rarely—for
`example,
`in the way that soft-
`ware upgrades are done today. However,
`some help is needed
`to establish the default configurations;
`otherwise
`there may
`still be too frequent
`and irritating setup scenarios. For ex-
`ample,
`if different geographic
`regions established
`different
`default
`tool sets,
`then establishing
`an MPEG4 dialog across
`the region boundaries would necessitate
`a setup phase.
`In the
`worst case, a failure to agree on which of two competitive
`tools to use could occur. As a result,
`the current
`thinking is
`for MPEG to standardize
`an initial set of tools,
`toward which
`the implementers
`and users will gravitate. Most notably, since
`MPEG2 is definitively
`the highest-performance
`coding al-
`gorithm for linear TV, the MPEG2 tools will be in the MPEG4
`tool set.
`implemen-
`the least expensive
`term,
`in the short
`At least
`to be dedi-
`tation for a dedicated application will continue
`cated,
`fixed-function
`hardware. This is not inconsistent with
`MPEG4, with the proviso that there be a piece of logic capable
`of conducting
`the setup-phase
`dialog. As a practical matter,
`virtually all MPEG decoders
`today are architected with an
`embedded controller
`that parses the syntax and implements
`the higher coding layers in software.
`It is thus conceivable
`that an existing MPEG 1 or MPEG2 VLSI decoder might
`become
`an MPEG4
`decoder merely
`by adding MPEG4
`“front-end” microcode.
`
5 MPEG4 Program of Work

The schedule for MPEG4 is shown in Fig. 6. The principal components of the workplan are development of the syntax, development of the coding tools, and generation of the standard document. The syntax is being developed over the next two years, and currently there is a call for proposals for elements of the MSDL.3 It is planned for a first draft to be available in January 1996, in time to be used with the coding tools selected from a first round of tests.3,4 Experience gained from that use will help define the second version of the MSDL, which will occur in November 1996. The syntax will be frozen in July 1997.

Selection and evaluation of the coding tools will be done using the successful competitive-phase/collaborative-phase
`
approach used in MPEG1 and MPEG2. Tests are the conclusions of the competitive phases, and past experience has been that no outright winner appears; rather, there are several approaches that can contribute to development of the best result. Convergence to this result is performed by the mechanism of verification models: a precisely defined software simulation is used to perform experiments on promising coding techniques. So-called core experiments are conducted, in which multiple parties perform the same experiment under identical conditions to evaluate a particular technique. Candidate tools for conventional video and audio have been developed in 1995. Candidate tools for hybrid natural/synthetic audiovisual data will be developed in 1996, after which a Working Draft will be established. The hybrid work will be conducted by constructing a "playground" comprising a database of 2-D and 3-D natural and synthetic audiovisual objects. Proponents will be invited to run scripted experiments in this playground, with the results being evaluated for coding efficiency. The emerging standard will be frozen in July 1997, and final approval is expected for November 1998.

Fig. 6 MPEG4 workplan: 1995–1998 timeline showing MSDL versions 1 and 2 and the syntax freeze; tests and verification models; the playground; and the collaborative phase leading to the Working Draft, frozen WD/CD, verification tests, DIS, and IS. (VM = verification model.)

6 Conclusions

MPEG has established the world standard for coded audiovisual data, offering the highest available quality and practical implementation. In the emerging MPEG4 standard, this work will be extended to exciting new applications such as interactive TV, and new operational environments such as wireless networks.

References

1. "MPEG4 proposal package description (PPD)," revision 3, ISO/IEC JTC1/SC29/WG11 N0998, Jul. 1995.
2. "Requirements for the MPEG4 SDL" (draft revision 2.0), ISO/IEC JTC1/SC29/WG11 N1022, Jul. 1995.
3. "MPEG4 call for proposals," ISO/IEC JTC1/SC29/WG11 N0997, Jul. 1995.
4. "MPEG4 test evaluation procedures document," ISO/IEC JTC1/SC29/WG11 N0999, Jul. 1995.
`
Cliff Reader has more than 20 years experience in digital video coding, image processing, and real-time digital video systems design. His doctoral thesis was written while in residence at the University of Southern California Image Processing Institute, 1971–1973. His doctorate was awarded by the University of Sussex, England, in 1974. He has worked in the fields of reconnaissance imaging, medical imaging, and earth resources imaging, and most recently has concentrated in the emerging field of consumer digital video. He has published numerous papers. Since 1990 he has been an active member of the ISO/IEC MPEG community, providing technical contributions to MPEG1 and MPEG2. He was the head of the U.S. delegation for two years, was chief editor of the MPEG1 standard, and is now subcommittee chairman for the Applications and Operational Environments group, better known as MPEG4. He is also the technical expert on MPEG intellectual property, assisting in the establishment of the MPEG Patent Pool. Dr. Reader has been with Samsung Electronics for the past two years as Associate Director for Strategic Planning.
`