`
`
`
`Introduction to
`
`MPEG-7
`
`Multimedia Content
`
`Description Interface
`
`Edited by
`
B. S. Manjunath
University of California, Santa Barbara, USA

Philippe Salembier
Universitat Politècnica de Catalunya, Barcelona, Spain

Thomas Sikora
Heinrich-Hertz-Institute (HHI), Berlin, Germany
`
`JOHN WILEY & SONS, LTD
`Apple 1014
`
`
`
`
Copyright © 2002
`
`John Wiley & Sons Ltd,
`The Atrium, Southern Gate, Chichester,
West Sussex PO19 8SQ, England
`
Telephone (+44) 1243 779777
`
`Email (for orders and customer service enquiries): cs-books@wiley.co.uk
`Visit our Home Page on www.wileyeurope.com or www.wiley.co.uk
`
`Reprinted March 2003
`
`All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system or
`transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or
`otherwise, except under the terms of the Copyright, Designs and Patents Act 1988 or under the terms
`of a licence issued by the Copyright Licensing Agency Ltd, 90 Tottenham Court Road, London W1T
`4LP, UK, without the permission in writing of the Publisher. Requests to the Publisher should be
`addressed to the Permissions Department, John Wiley & Sons Ltd, The Atrium, Southern Gate,
Chichester, West Sussex PO19 8SQ, England, or emailed to permreq@wiley.co.uk, or faxed to
`(+44) 1243 770571.
`
`This publication is designed to provide accurate and authoritative information in regard to the subject
`matter covered. It is sold on the understanding that the Publisher is not engaged in rendering
`professional services. If professional advice or other expert assistance is required, the services of a
`competent professional should be sought.
`
`Other Wiley Editorial Offices
`
`John Wiley & Sons Inc., 111 River Street, Hoboken, NJ 07030, USA
`
`Jossey-Bass, 989 Market Street, San Francisco, CA 94103-1741, USA
`
Wiley-VCH Verlag GmbH, Boschstr. 12, D-69469 Weinheim, Germany
`
`John Wiley & Sons Australia Ltd, 33 Park Road, Milton, Queensland 4064, Australia
`
`John Wiley & Sons (Asia) Pte Ltd, 2 Clementi Loop #02-01, Jin Xing Distripark, Singapore 129809
`
`John Wiley & Sons Canada Ltd, 22 Worcester Road, Etobicoke, Ontario, Canada M9W 1L1
`
`British Library Cataloguing in Publication Data
`
`A catalogue record for this book is available from the British Library
`
`ISBN 0 471 48678 7
`
Typeset in 10/12pt Times Roman by Laserwords Private Ltd, Chennai, India
`Printed and bound in Great Britain by Antony Rowe Ltd, Chippenham, Wiltshire
`This book is printed on acid-free paper responsibly manufactured from sustainable forestry
`in which at least two trees are planted for each one used for paper production.
`
`Contributors
`
Adam Lindsay
Computing Department
Lancaster University
Lancaster LA1 4YR
UK
`
Ajay Divakaran, Ph.D.
Mitsubishi Electric Research Laboratories
Murray Hill Laboratory
571 Central Avenue, Suite 115
Murray Hill, NJ 07974
ajayd@merl.com
`
Akio Yamada
Computer & Communication Research, NEC Corp.
Miyazaki 4-1-1, Miyamae,
Kawasaki 216-8555, Japan
a-yamada@da.jp.nec.com
`
Ana Belen Benitez
Electrical Engineering Department
Columbia University
1312 Mudd, #F6, 500 W. 120th St., MC 4712
New York, NY 10027
USA
Tel: +1 212 854-7473
Fax: +1 212 932-9421
ana@ee.columbia.edu
`
Benoit Mory
Laboratoires d'Electronique Philips
22 avenue Descartes, BP 15
94453 Limeil-Brévannes Cedex
France
benoit.mory@philips.com
`
Chi Sun Won
Department of Electrical Engineering
Dongguk University
26 Beon-Ji, 3 Ga, Pil-Dong,
Joong-Gu, Seoul
South Korea
cswon@dgu.ac.kr
`
Claude Seyrat
Expway c/o Acland
18 avenue Georges V
75008 Paris
France
cseyrat@acland.fr
`
Cédric Thiénot
Expway c/o Acland
18 avenue Georges V
75008 Paris
France
cthienot@acland.fr
`
`Dean S. Messing
`Information Systems
`Technologies Dept.
`
Sharp Laboratories of America
5750 N.W. Pacific Rim Blvd.
Camas, WA 98607, USA
deanm@sharplabs.com
`
Fernando Manuel Bernardo Pereira, Professor
Instituto Superior Técnico - Instituto de Telecomunicações
Av. Rovisco Pais, 1049-001
Lisboa, Portugal
Fernando.Pereira@lx.it.pt
`
Francoise Preteux
Institut National des Télécommunications
Unité de Projets ARTEMIS
9, Rue Charles Fourier
91011 Evry Cedex, France
Francoise.Preteux@int-evry.fr
`
Hawley K. Rising III
Sony MediaSoft Lab, USRL
MD# SJ2C4
3300 Zanker Road
San Jose, CA 95134-1901
hawley.rising@am.sony.com
`
Heon Jun Kim
MI Group, Information Technology Lab.
LG Electronics Institute of Technology
16 Woomyeon-Dong, Seocho-Gu
Seoul, Korea 137-724
hjk@lge.co.kr
`
J. P. A. Charlesworth
Reech Capital PLC
1 Undershaft
London EC3P 3DQ, England
jason-charlesworth@reech.com
`
Jane Hunter
DSTC Pty Ltd
Distributed Systems Technology CRC
Level 7, General Purpose South
The University of Queensland
Queensland 4072
Australia
jane@dstc.edu.au
`
Jens-Rainer Ohm
Institute of Communication Engineering
Aachen University of Technology
Melatener Str. 23, D-52072 Aachen, Germany
ohm@ient.rwth-aachen.de
`
John Smith
IBM T. J. Watson Research Center
30 Saw Mill River Road
Hawthorne, NY 10532
USA
jrsmith@watson.ibm.com
`
José M. Martínez
Grupo de Tratamiento de Imágenes
Dpto. Señales, Sistemas y Radiocomunicaciones
E.T.S.Ing. Telecomunicación (C-306)
Universidad Politécnica de Madrid
Ciudad Universitaria s/n
E-28040 Madrid
Spain
jms@gti.ssr.upm.es
`
Jörg Heuer
Siemens AG
CT IC 2
81730 München
Germany
Tel: +49 89 636 52957
Fax: +49 89 636 52393
Joerg.Heuer@mchp.siemens.de
`
Kyoungro Yoon
LG Electronics Institute of Technology
16 Woomyeon-dong, Seocho-gu,
Seoul 137-724
Korea
Tel: +82-2-526-4133
Fax: +82-2-526-4852
yoonk@lg-elite.com
`
Leonardo Chiariglione
Telecom Italia Lab
Via G. Reiss Romoli, 274
I-10148 Torino
Italy
leonardo.chiariglione@tilab.com
`
Leszek Cieplinski
Mitsubishi Electric ITE-VIL
20 Frederick Sanger Road
Guildford
Surrey GU2 7YD
United Kingdom
Leszek.Cieplinski@vil.ite.mee.com
`
Michael Casey
MERL
201 Broadway, 8th Floor
Cambridge, MA 02139
mkc@merl.com
`
Michael Wollborn
Robert Bosch GmbH
FV/SLM
PO Box 777777
D-31132 Hildesheim
Germany
Michael.Wollborn@de.bosch.com
`
Mikio Sasaki
Research Laboratories, DENSO CORPORATION
500-1 Minamiyama,
Komenoki-cho, Nisshin-shi,
Aichi-ken 470-0111, Japan
msasaki@rlab.denso.co.jp
`
Miroslaw Bober
Visual Information Laboratory
Mitsubishi Electric Information Technology Center Europe
20 Frederick Sanger Road
Guildford, Surrey GU2 7YD, UK
miroslaw.bober@vil.ite.mee.com
`
Mufit Ferman
Sharp Laboratories of America
5750 N.W. Pacific Rim Blvd.
Camas, WA 98607
USA
mferman@sharplabs.com
`
Neil Day
MPEG-7 Alliance
Dublin, Ireland
dneil@bluemetrix.com
`
Shun-ichi Sekiguchi
Multimedia Signal Processing Lab,
Multimedia Labs, NTT DoCoMo Inc.
`
Olivier Avaro
France Télécom R&D
38/40 rue du Général Leclerc
92794 Issy-les-Moulineaux Cedex 9
France
Olivier.avaro@francetelecom.com
`
Peter van Beek
Sharp Labs of America
5750 N.W. Pacific Rim Blvd.
Camas, WA 98607
USA
Tel: +1 360-817-7622
Fax: +1 360-817-8436
pvanbeek@sharplabs.com
Philip N. Garner
Canon Research Centre Europe Ltd.
1 Occam Court
Occam Road
Surrey Research Park
Guildford
Surrey GU2 7YJ
United Kingdom
philg@cre.canon.co.uk
`
Philippe Salembier
Universitat Politècnica de Catalunya
Campus Nord, Modulo D5
Jordi Girona, 1-3
08034 Barcelona
Spain
Tel: +34 9 3401 7404
Fax: +34 9 3401 6447
philippe@gps.tsc.upc.es
`
Rob Koenen
InterTrust Technologies Corporation
4750 Patrick Henry Drive
Santa Clara, CA 95054
USA
rkoenen@intertrust.com
`
Santhana Krishnamachari
Philips Research
345 Scarborough Road
Briarcliff Manor, NY 10510, USA
Santhana.krishnamachari@philips.com
`
Schuyler Quackenbush
AT&T Labs, Rm E133
180 Park Avenue, Bldg. 103
Florham Park, NJ 07932, USA
`
Sylvie Jeannin
`Philips Research USA
`345 Scarborough Road, Briarcliff
`Manor NY 10510, USA
`
Thomas Sikora
Heinrich-Hertz-Institute for Communication Technology
Einsteinufer 37, D-10587 Berlin
Germany
Sikora@hhi.de
`
Toby Walker
Media Processing Division
Network and Software Technology Center of America
3300 Zanker Road, MD #SJ2C4
San Jose, California 95134
tobyw@usrl.sony.com
`
Whoi-Yul Yura Kim
`
`School of Electrical and Computer
`Engineering
`Hanyang University
`Korea
`
`wykim@email.hanyang.ac.kr
`
Yanglim Choi
Digital Media R&D Center,
Samsung Electronics Co., Ltd.
416, Maetan 3-Dong, Paldal-Gu,
Suwon, Kyungki-Do, S. Korea
yanglimc@samsung.com
`
Yong Man Ro
School of Engineering
Information and Communications University
Yusong-Gu, P.O. Box 77
Taejon, South Korea
yro@icu.ac.kr
`
`
`Preface
`
This book provides a comprehensive introduction to the new ISO MPEG-7 standard. The
individual chapters are written by experts who have actively participated in and contributed
to the development of the standard. The chapters are organized in an intuitive way, with
clear explanations of the underlying tools and technologies contributing to the standard. A
large number of illustrations and working demonstrations should make this book a valuable
resource for a wide spectrum of readers, from graduate students and researchers interested
in state-of-the-art media analysis technology to practicing engineers interested in
implementing the standard.
`
`SEARCH AND RETRIEVAL OF MULTIMEDIA DATA
`
Multimedia search and retrieval has become a very active research field because of the
increasing amount of audiovisual (AV) data that is becoming available and the growing
difficulty of searching, filtering and managing such data. Furthermore, many new practical
applications, such as large-scale multimedia search engines on the Web, media asset
management systems in corporations, AV broadcast servers, and personal media servers for
consumers, are about to be widely available. This context has led to the development of
efficient processing tools that are able to create the description of AV material or to
support the identification or retrieval of AV documents. Besides the research activity on
processing tools, the need for interoperability between devices has also been recognized,
and several standardization activities have been launched. MPEG-7, also called "Multimedia
Content Description Interface", standardizes the description of multimedia content
supporting a wide range of applications. Standardization activities do not focus so much
on processing tools but concentrate on the selection of the features that have to be
described and on the way to structure and instantiate them with a common language.
As an emerging research area of wide interest, multimedia content description has a large
audience. There are many workshops and conferences related to this topic every year, and
their number is growing. The MPEG-7 technology covers the most recent developments in
multimedia search and retrieval.
`
This book presents a comprehensive overview of the principles and concepts involved in a
complete chain of AV material indexing, metadata description (based on the MPEG-7
standard), information retrieval and browsing. The book offers a practical step-by-step
walkthrough of the components, from systems to schemas to audio-visual descriptors. It
addresses the selection of the multimedia features to be described, the organization and
structuring of the description, the language used to instantiate the description, as well
as the major processing tools used for indexing and retrieval of images and video
sequences. The accompanying electronic documentation will include numerous examples and
working demonstrations of many of these components.
`
Researchers and students interested in multimedia database technology will find this book
a valuable resource covering a broad overview of the current state of the art in search
and retrieval. Practicing engineers in industry will find this book useful in building
MPEG-7 compliant systems, as the only resource outside of the MPEG community available to
the public at the time of publication.
`
`ORGANIZATION
`
The book is organized into six sections: Introduction, Systems, Multimedia Description
Schemes, Visual Descriptors, Audio Descriptors and Applications.
`
`Section I: Introduction
`
This section introduces the MPEG-7 standardization activity and the history behind this
new standard. In Chapter 1, Leonardo Chiariglione, the convenor of MPEG, provides the
motivation for the new standard. Chapter 2, by Pereira and Koenen, outlines the various
activities within MPEG-7 that gained momentum towards the end of 1998, culminating in the
final standard in 2002.
`
`Section II: Systems
`
The systems section covers three major areas: Systems Architecture, Description Definition
Language and the Binary Format for MPEG-7. The chapter on Systems Architecture discusses
the design principles behind MPEG-7 Systems and highlights the most important processing
steps for transport and consumption of MPEG-7 descriptions. The second chapter focuses on
the language used to define the various description elements, called Descriptors or
Description Schemes, that are presented in Sections III, IV and V. Finally, the Binary
Format for MPEG-7 is described in Chapter 5. This format has been designed to efficiently
compress and transport MPEG-7 descriptions.
`
`Section III: Multimedia Description Schemes
`
Section III describes the organization of features that can be described with MPEG-7. The
organization of this section is based on the functionality provided by the various
Description Schemes. Chapter 6 provides an overview of the entire section. Chapter 7
discusses elementary Description Schemes and Descriptors that are used as building blocks
for more complex Description Schemes. The tools available for the description of a single
multimedia document are reviewed in Chapter 8. The most important features related to
content management and description, including low-level as well as high-level features,
are analyzed.
`
Purely audio or visual features are very briefly mentioned in this chapter. A detailed
presentation of the corresponding set of tools is given in Section IV (visual features)
and Section V (audio features). The main functionalities supported by the tools of
Chapter 8 include search, retrieval and filtering. Navigation and browsing are supported
by a specific set of tools described in Chapter 9. Furthermore, the description of
collections of documents or of descriptions is presented in Chapter 10. Finally, for some
applications, it has been recognized that it is necessary to define in a normative way the
user preferences and the usage history pertaining to the consumption of the multimedia
material. This allows, for example, matching between user preferences and MPEG-7 content
descriptions in order to facilitate personalization of the processing. These tools are
described in Chapter 11.
`
`Section IV: Visual Descriptors
`
This section begins with an overview in Chapter 12. Chapter 13 describes color descriptors
that represent different aspects of color distribution in images and video. These include
descriptors for a color histogram of a single image as well as of a collection of images,
color structure, dominant color, and color layout. Chapter 14 presents three texture
descriptors: a homogeneous texture descriptor, a coarse-level browsing descriptor and an
edge histogram descriptor. Chapter 15 presents descriptors that represent contour shape,
region shape and 3-D shapes. The section concludes with motion descriptors in Chapter 16.
`
`Section V: Audio Descriptors
`
`An overview of the audio descriptors is provided in Chapter 17. Chapter 18 describes the
`spoken content technology in more detail. Sound recognition and sound similarity tools
`are outlined in Chapter 19.
`
`Section VI: Applications
`
Finally, we conclude with a section on the potential applications of MPEG-7. The
applications are broadly classified into search- and browsing-related applications and
mobile applications. Chapter 20 covers some interesting search and browsing applications
that include real-time video retrieval, browsing of TV news broadcasts using MPEG-7
tools, and audio and music retrieval. Chapter 21 discusses two interesting mobile
applications.
`
`DVD
`
The accompanying DVD contains additional material, including technical reports, some
working demonstrations and the official MPEG-7 reference software. The demonstrations on
the DVD include video browsing and shot retrieval, and search and browsing of images
using texture. We hope that researchers and graduate students will find this useful in
their work.
`
`ACKNOWLEDGMENTS
`
We would like to express our gratitude and sincere thanks to all the contributors, without
whose dedication and timely contributions this work would not have been possible. Our
special thanks to Leonardo Chiariglione, the convenor of MPEG, for his encouragement and
support throughout the course of this project. We would like to thank the International
Organization for Standardization (ISO), and in particular Jacques-Olivier Chabot and
Keith Brannon, for allowing us to publish the MPEG-7 reference software on the
accompanying DVD.
`
We would also like to thank Dr. Lutz Ihlenburg of the Heinrich-Hertz-Institut, Germany,
for assisting on editorial issues and for providing many valuable comments and
suggestions. Our thanks to Shawn Newsam and Lei Wang for organizing the material for the
DVD. We extend our thanks to the many reviewers who helped edit individual chapters.
`
BSM would also like to thank Samsung Electronics for its support in facilitating his
participation in the MPEG-7 activities. Special thanks to Dr. Hyundoo Shin and
Dr. Yanglim Choi for their support during the past three years. Thanks to Shawn Newsam,
Xinding Sun, Gomathi Sankar, Ying Li, Ashish Agarwal, and Lei Wang for reviewing some of
the chapters. He would like to thank all the members of the Vision Research Laboratory at
UCSB for their help in putting together this manuscript.
`
`B. S. Manjunath
`Philippe Salembier
`Thomas Sikora
`
2 Context, Goals and Procedures

Fernando Pereira and Rob Koenen

Instituto Superior Técnico, Lisbon, Portugal; InterTrust Technologies Corp., CA, USA
`
`2.1 MOTIVATION AND OBJECTIVES
`
`Producing multimedia content today is easier than ever before. Using digital cameras,
`personal computers and the Internet, virtually every individual in the world is a potential
`content producer, capable of creating content that can be easily distributed and published.
The same technologies allow content, which would in the past remain inaccessible, to be
made available on-line.
`
`However, what would seem like a dream can easily turn into an ugly nightmare if no
`means are available to manage the explosion in available content. Content, analogue and
`digital alike, has value only if it can be discovered and used. Content that cannot be easily
`found is like content that does not exist, and potential revenues are directly dependent on
`users finding the content. The easier it becomes to produce content, the faster the amount
`of content grows and the more complex the problem of managing content gets. The same
`digital technology that lowers the thresholds for producing and publishing content can also
`help in analyzing and classifying it, in extracting and manipulating features for specific
`applications and in searching and discovering content. Be it with or without automated
`support, information about content is a prerequisite for being able to find and manage it.
`
To date, people looking for content have used text-based browsers with very moderate
retrieval performance; typically, these search engines yield much noise around the hits.
The fact that they are in widespread use nonetheless indicates that a need exists. These
text-based engines rely on human operators to manually describe the multimedia content
with keywords and free annotations. For two reasons this is increasingly unacceptable.
First, it is a costly process, and the cost increases with the growing amount of content.
Second, these descriptions are inherently subjective and their usage is often confined to
the application domain that the descriptions were created for. Hence, it is necessary to
automatically and objectively describe, index and annotate multimedia information,
notably audiovisual data, using tools that automatically extract (possibly complex)
audiovisual features from the content to substitute or complement manual, text-based
descriptions. These automatically extracted audiovisual features will have three
advantages over human annotations: (1) they will be automatically generated, (2) they can
be more objective and domain-independent and (3) they can be native to the audiovisual
content. Native descriptions would use nontextual data to describe content, using
features such as color, shape, texture, melody and sound envelopes, in a way that allows
the user to search by comparing descriptions. Even though automatically extracted
descriptions will be very useful, it is evident that descriptions, the 'bits about the
bits', will always include textual components. There are many features that can only be
expressed through text, for example, authors and titles.
`
The situation depicted above has been recognized for a number of years now, and much work
has been invested in recent years in researching the relevant technologies. Several
products addressing this problem have already emerged in the market, such as Virage's
VideoLogger [1]. These products, as well as the large number of papers in journals,
conferences and workshops, were an indication that the time was ripe to address the
multimedia content description problem at a much larger scale.
`
The aforementioned problem and the technological situation were recognized by MPEG (the
Moving Picture Experts Group) [2] in July 1996, when it decided, at the Tampere MPEG
meeting, to start a standardization project, generally known as MPEG-7 and formally
called Multimedia Content Description Interface (ISO/IEC 15938) [3]. The MPEG-7 project
has the objective of specifying a standard way of describing various types of multimedia
information: elementary pieces, complete works and repositories, irrespective of their
representation format and storage medium. The objective is to facilitate the quick and
efficient identification of interesting and relevant information and the efficient
management of that information [4]. These descriptions are both textual (annotations,
names of actors etc.) and nontextual (statistical features, camera parameters etc.). Like
the other members of the MPEG family, MPEG-7 defines a standard representation of
multimedia information satisfying a set of well-defined requirements. But MPEG-7 is quite
a different standard from its predecessors. MPEG-1, MPEG-2 and MPEG-4 all represent the
content itself, 'the bits', while MPEG-7 represents information about the content, 'the
bits about the bits'. While the former reproduce the content, the latter describes it.
The requirements for these two purposes are very different [5], although there is also
some interesting overlap in technologies, and sometimes the frontiers are not that sharp.
`
Even without MPEG-7, there are many ways to describe multimedia content in use today in
various digital asset management systems. Such systems, however, generally do not allow a
search across different repositories and do not facilitate content exchange between
different databases using different description systems. These are interoperability
issues, and creating a standard is an appropriate way to address them. A standard way to
describe multimedia content allows content and its descriptions to be exchanged across
different systems. Also, it sets up an environment in which tools from different providers
can work together, creating an infrastructure for transparent management of multimedia
content. The main results of the MPEG-7 standard are this increased interoperability and
the prospect of lower-cost products through the creation of a sizable market with new,
standards-based services and a rapidly growing user base [6]. This agreement (a standard
is no more and no less than an agreement between its users) will stimulate both content
providers and users and simplify the entire content-identification process. Of course,
the standard needs to be technically sound, since otherwise proprietary solutions will
prevail,
`
which will hamper interoperability. The challenge in MPEG-7 was matching the needs with
the available technologies or, in other words, reconciling what is possible with what is
useful.

Participants in the development of MPEG-7 represent broadcasters, equipment and software
manufacturers, digital content creators and managers, telecommunication service
providers, publishers and intellectual property rights managers, as well as university
researchers.
`
`2.2 DRIVING PRINCIPLES
`
The MPEG-7 standardization project was preceded by an exploration phase, in which some
fundamental principles appeared to be generally shared among all participants [4]. These
driving principles are more than just high-level requirements, as they express the vision
behind MPEG-7. This vision has guided the requirements-gathering process [5] and the
subsequent tool development work [4].

The guiding principles, which set the foundations of the MPEG-7 standard, are as
follows [4]:
`
• Wide application base: MPEG-7 shall be applicable to the content associated with any
application domain, real-time generated or not; MPEG-7 shall not be tuned to any specific
type of application. Moreover, the content may be stored, and may be made available
on-line, off-line or streamed.

• Relation with content: MPEG-7 shall allow the creation of descriptions to be used:

  - stand-alone, for example, just providing a summary of the content;
  - multiplexed with the content itself, for example, when broadcast together with the
content;
  - linked to one or more versions of the content, for example, in Internet-based media.

• Wide array of data types: MPEG-7 shall consider a large variety of data types (or
modalities) such as speech, audio, image, video, graphics, 3-D models, synthetic audio
and so on. Since the MPEG-7 emphasis is on audiovisual information, no new description
tools should be developed for textual data. Rather, existing solutions shall be
considered, such as Standard Generalized Markup Language (SGML), Extensible Markup
Language (XML) or Resource Description Framework (RDF) [5].

• Media independence: MPEG-7 shall be applicable independently of the medium that carries
the content. Media can include paper, film, tape, CD, a hard disk, a digital broadcast,
Internet streaming and so on.

• Object-based: MPEG-7 shall allow the object-based description of content. The content
can be represented, in this case described, as a composition of multimedia objects, and
it shall be possible to independently access the descriptive data regarding specific
objects in the content.

• Format independence: MPEG-7 shall be applicable independently of the content
representation format, whether analogue or digital, compressed or uncompressed.
Therefore, audiovisual content could be represented in Phase Alternate Line (PAL),
National Television Standards Committee (NTSC), MPEG-1, MPEG-2 or MPEG-4 and so forth.
There is, however, a special relation with MPEG-4 [7, 8], since both MPEG-7 and
MPEG-4 are multimedia representation standards that are built using an object-based data
model. As such, they are both unique and they complement each other very well, allowing
very powerful applications to be created.

• Abstraction level: MPEG-7 shall include description capabilities with different levels
of abstraction, from low-level, often statistical, features to high-level features
conveying semantic meaning. Often the low-level features can be extracted automatically,
whereas the more semantically meaningful features need to be extracted manually or
semiautomatically. Also, different levels of description granularity shall be possible
within each abstraction level. Note that higher-level conclusions often find evidence in
lower-level features.

• Extensibility: MPEG-7 shall allow the extension of the core set of description tools in
a standard way. It is recognized that a standard such as MPEG-7 can never contain all the
structures needed to address every single application domain, and thus it shall be
possible to extend the standard in a way that guarantees as much interoperability as
possible.
`
These principles not only characterize the MPEG-7 vision but also indicate what sets
MPEG-7 apart from other similar standardization efforts.
`
`2.3 WHAT IS STANDARDIZED?
`
Technology and standards, like so many things in life, may get old and obsolete. In
addition, since the less flexible and dynamic they are, the easier it is for them to
become obsolete, it is essential that standards are as flexible and minimally
constraining as possible, while still serving their fundamental objective:
interoperability. To MPEG, this means that a standard must specify the minimum necessary,
but not more than that. This approach allows industrial competition and further evolution
of the technology in the so-called 'nonnormative' areas, the areas that the standard does
not fix. For MPEG-7, this implies that only the description format, its syntax and
semantics, and its decoding will be standardized. Elements that are explicitly not
specified are techniques for extraction and encoding, and the 'consumption' (description
usage) phase. Although good analysis and retrieval tools will be as essential for a
successful MPEG-7 application as motion estimation and rate control are for MPEG-1 and
MPEG-2 applications, and video segmentation for some MPEG-4 applications, their
standardization is not required for interoperability; in fact, the consumer of the
descriptions does not care that much about the way the descriptions are created, provided
that they can be understood and used. The specification of content analysis tools,
automatic or semiautomatic, is out of the scope of the standard, and so are the programs
and machines that 'consume' MPEG-7 descriptions. Developing these tools will be a task
for the industries that build and sell MPEG-7-enabled products. This approach ensures
that good use can be made of the continuous improvements in the relevant technical areas.
New technological developments can be leveraged to build improved automatic analysis
tools, matching engines and so on, and the descriptions they produce or consume will
remain compliant with the standard. Therefore, progress need not stop at the moment the
standard is frozen. It is possible to rely on technical competition to obtain ever better
results. This is happening for MPEG-2, where improvements in encoding
`
techniques have slashed bit rates for digital television almost in half over the past
four years.
`
The first edition of the MPEG-7 standard is commonly designated as Version 1. The
standard will be extended in the future with additional tools to address more
requirements and provide more functionality. This will happen in the form of amendments
to the standard. It is common to designate as Version N of a part of the standard the set
of tools in Version 1 extended with the tools specified in Amendment N-1 for that part of
the standard; for example, Amendment 1 of a part of the MPEG-4 standard is commonly known
as Version 2 of that part of the standard.
`
`2.4 MPEG STANDARDS DEVELOPMENT PROCESS
`
`
`
`._.w,MA,“NA...A..-“..._
`
`3
`l
`:
`
`ll t
`
`(5
`
When content representation changed from analogue to digital, the technology development
process also changed, if only in terms of the speed of developments and the fact that it
was no longer sufficient to simply designate vertical columns of technology for
well-defined applications. Thus, it is essential for standardization bodies such as MPEG
to take this environment into account in the standards it creates and the way it sets
those standards. For a decade now, it has no longer been possible to employ the 'sys