John R. Smith, Rakesh Mohan and Chung-Sheng Li
IBM T.J. Watson Research Center
30 Saw Mill River Road
Hawthorne, NY 10532
{jrsmith,rakesh,csli}@watson.ibm.com
1.1. Related work
[...] for processing content in the net[...] its accessibility. Recent transcoding efforts have focused on compressing and caching images in the Internet in order to reduce data transmission and speed up delivery. Fox et al. developed a system for lossily compressing Internet content at a proxy in order to deal with client variability and improve end-to-end performance [2]. Ortega et al. investigated an image caching policy that reduces the resolution of infrequently accessed images in order to conserve storage space and bandwidth [3]. Several commercial systems, such as Intel's Quick Web [4] and Spyglass' Prism [5], compress the images at the Internet service provider's proxy to speed up download time.

Figure 1: Image transcoding modifies the images along the dimensions of size, fidelity and color in order to adapt them to the client devices.

0-8186-8821-1/98 $10.00 © 1998 IEEE
We develop a more powerful image transcoding system that analyzes the images, the related text and the Web document context in order to select policies for adapting the images to the client devices. The system transcodes the images along the dimensions of size, fidelity, and color in order to better adapt them to the client device's communication, processing, storage, and display capabilities.
1.2. Outline
In this paper, we present the content-based image transcoder system. In Section 2, we present the image content analysis system that classifies the images into image type and purpose classes. In Section 3, we describe the image transcoding functions and policies. Finally, in Section 4, we examine the potential improvement in accessibility of images for a growing diversity of client devices. We also demonstrate the potential for end-to-end speed-up in image access via a network-based image transcoding proxy.

Authorized licensed use limited to: Finnegan Henderson Farabow Garrett & Dunner. Downloaded on April 09,2021 at 17:12:38 UTC from IEEE Xplore. Restrictions apply.
The image analysis system classifies the images by their content into image type and purpose classes. We define the following image type classes: T = {BWG, BWP, GRG, GRP, SCG, CCG, CP}, where
1. BWG - b/w graphic
2. BWP - b/w photo
3. GRG - gray graphic
4. GRP - gray photo
5. SCG - simple color graphic
6. CCG - complex color graphic
7. CP - color photo
The graphics vs. photographs distinction is loosely intended to separate synthetic images from natural ones. Although the distinction is not always clear for images on the Web [6], the reason for making it is to use transcoding functions that are tuned separately for each of these types. We also define the following image purpose classes: P = {ADV, DEC, BUL, RUL, MAP, INF, NAV, CON}, where
1. ADV - advertisement, e.g., banner ads
2. DEC - decoration, e.g., background textures
3. BUL - bullets, points, balls, dots
4. RUL - rules, lines, separators
5. MAP - maps, i.e., images with click focus
6. INF - information, e.g., icons, logos, mastheads
7. NAV - navigation, e.g., arrows
8. CON - content related, e.g., news photos
We also map the images into subject classes using related text. This semantic information can provide substitute text for the images on client devices that cannot handle images.
2.1. Image type classification
The image type classification system uses a decision tree classifier. The decision tree, depicted in Figure 2, classifies the images along two dimensions: color content (color, gray, b/w) and source (photographs, graphics). Distinguishing between b/w, gray and color is often not trivial because of artifacts introduced during image production and compression. Examples of the seven image type classes are illustrated at the bottom of Figure 2.
The image type decision tree consists of five decision points, each of which uses a set of features extracted from the images. Because transcoding must run in real time and on-line, the features are extracted only as they are needed for the tests, which minimizes processing. The image features are derived from several color and texture measures computed from the images. We obtained the classification parameters for these measures from a training set of 1,282 images retrieved from the Web.
Figure 2: Image type decision tree consisting of five decision points for classifying the images into image type classes.
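The "extract features only as needed" design can be illustrated as a cascade of the five decision points over lazily computed features. What follows is a minimal Python sketch, not the paper's classifier. The feature names mirror the measures defined in the tests below ($\mu_s$, $P_v$, $V_v$, $W_v$, $P_{166}$). Several things are hypothetical: the `LazyFeatures` helper, every threshold value (stand-ins for the parameters the paper learns from its 1,282 training images), and the exact measures used at each split. The splits are a simplification; for example, decision 4 here uses only $W_v$.

```python
from typing import Callable, Dict, List

# Hypothetical thresholds, for illustration only; the paper obtains its
# classification parameters from a training set of 1,282 Web images.
THRESHOLDS: Dict[str, float] = {
    "mu_s": 10.0,      # decision 1: mean saturation (color vs. non-color)
    "P_v": 2.0,        # decision 2: intensity entropy (b/w vs. gray)
    "V_v": 5000.0,     # decision 2: intensity variance (b/w vs. gray)
    "W_v_bw": 0.2,     # decision 3: intensity switches (BWG vs. BWP)
    "W_v_gray": 0.3,   # decision 4: intensity switches (GRG vs. GRP)
    "P_166": 4.0,      # decision 5: HSV entropy (SCG vs. CCG/CP)
    "mu_s_ccg": 80.0,  # decision 5: saturation (CCG vs. CP)
}


class LazyFeatures:
    """Runs a feature extractor only the first time a decision point needs it."""

    def __init__(self, extractors: Dict[str, Callable[[], float]]):
        self._extractors = extractors
        self._cache: Dict[str, float] = {}
        self.computed: List[str] = []  # names of features actually extracted

    def __getitem__(self, name: str) -> float:
        if name not in self._cache:
            self._cache[name] = self._extractors[name]()
            self.computed.append(name)
        return self._cache[name]


def classify_type(f: LazyFeatures, th: Dict[str, float] = THRESHOLDS) -> str:
    if f["mu_s"] < th["mu_s"]:                              # 1: non-color
        if f["P_v"] < th["P_v"] and f["V_v"] > th["V_v"]:   # 2: b/w
            # 3: graphics have few intensity switches, photos many
            return "BWG" if f["W_v"] < th["W_v_bw"] else "BWP"
        return "GRG" if f["W_v"] < th["W_v_gray"] else "GRP"  # 4
    if f["P_166"] < th["P_166"]:                            # 5: color
        return "SCG"
    # Color graphics tend to be more saturated than color photos.
    return "CCG" if f["mu_s"] > th["mu_s_ccg"] else "CP"


# Usage: constant extractors stand in for real feature computations.
bw_graphic = LazyFeatures({
    "mu_s": lambda: 1.0, "P_v": lambda: 1.0, "V_v": lambda: 12000.0,
    "W_v": lambda: 0.05, "P_166": lambda: 5.0,
})
print(classify_type(bw_graphic), bw_graphic.computed)
# BWG ['mu_s', 'P_v', 'V_v', 'W_v']
```

For the b/w graphic, the color feature $P_{166}$ is never extracted, because the cascade stops on the non-color branch.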
Each image X[m, n] has three color components, corresponding to the RGB color channels: $X_{rgb} = (x_r, x_g, x_b)$, where $x_r, x_g, x_b \in [0, 255]$. The decision tree performs the following tests for each image X:
1. Color vs. non-color. The first test distinguishes between color and non-color images using the mean saturation per pixel $\mu_s$. The saturation channel $y_s$ of the image is computed from X as follows:
$$y_s = \max(x_r, x_g, x_b) - \min(x_r, x_g, x_b).$$
Then $\mu_s = \frac{1}{MN} \sum_{m,n} y_s[m, n]$ gives the mean saturation, where M and N are the image width and height, respectively. Table 1 shows the mean $E(\mu_s)$ and standard deviation $\sigma(\mu_s)$ of the saturation measure for the set of 1,282 images. The mean saturation $\mu_s$ discriminates well between color and non-color images, since the presence of color requires $\mu_s > 0$, while strictly non-color images have $\mu_s = 0$. However, because of noise, a small number of saturated colors often appear in non-color images. For example, for the 464 non-color images, $E(\mu_s) = 2.0$.
Test 1    | #   | $E(\mu_s)$ | $\sigma(\mu_s)$
Non-color | 464 | 2.0        | [...]
Color     | 818 | 63.0       | 46.2

Table 1: The color vs. non-color test uses mean saturation per pixel $\mu_s$.
2. B/W vs. gray. The second test distinguishes between b/w and gray images using the entropy $P_v$ and variance $V_v$ of the intensity channel $y_v$. The intensity channel of the image is computed from X as a weighted sum of $x_r$, $x_g$ and $x_b$ [...]. Then, the intensity entropy is given by
$$P_v = -\sum_{k=0}^{255} p[k] \log_2 p[k], \qquad p[k] = \frac{1}{MN} \sum_{m,n} \begin{cases} 1 & \text{if } k = y_v[m, n] \\ 0 & \text{otherwise.} \end{cases}$$
The intensity variance is given by
$$V_v = \frac{1}{MN} \sum_{m,n} \left( y_v[m, n] - \mu_v \right)^2,$$
where $\mu_v = \frac{1}{MN} \sum_{m,n} y_v[m, n]$. Table 2 shows the statistics of $P_v$ and $V_v$ for the 464 non-color images. For b/w images, the expected entropy $E(P_v)$ is low and the expected variance $E(V_v)$ is high. The reverse is true for gray images.
[Table 2: columns Test 2, #, $E(P_v)$, $\sigma(P_v)$, $E(V_v)$, $\sigma(V_v)$; rows B/W and gray. Only scattered cell values survive: 1.1, 300, 4.993 and 11,644.]

4. SCG vs. CCG vs. CP. We use the 166-HSV color entropy $P_{166}$ and the mean color switch per pixel $W_{166}$. In the computation of the 166-HSV color entropy, $p[k]$ gives the frequency of pixels with color index value $k$. The color switch measure is defined as in the third test, except that it is extracted from the 166-HSV color image $y_{166}$. We also use the mean saturation per pixel $\mu_s$. Table 4 shows the statistics for $\mu_s$, $P_{166}$ and $W_{166}$ for the 818 color images. Color graphics have a higher expected saturation $E(\mu_s)$ than color photos. However, color photos and complex color graphics have higher expected entropies $E(P_{166})$ and switch measures $E(W_{166})$ in the quantized HSV color space than simple color graphics.

[Table 4: only a few cells survive: $\sigma(\mu_s) = 50.8$, $E(W_{166}) = 0.24$, $\sigma(W_{166}) = 0.16$; row labels lost.]
Table 4: The SCG vs. CCG vs. CP test uses mean saturation $\mu_s$, HSV entropy $P_{166}$ and HSV switches $W_{166}$.
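The measures used by these tests can be computed directly from their definitions. The following minimal Python sketch works on small images stored as nested lists indexed X[m][n]. It makes these assumptions:

- ITU-R BT.601 luma weights replace the paper's intensity weighting.
- The switch indicator is taken as 0 in the first column (m = 0).
- The 166-color HSV quantizer is not implemented. `entropy` and `mean_switches` apply unchanged to any quantized index image, which would give $P_{166}$ and $W_{166}$.

```python
import math
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (x_r, x_g, x_b), each in [0, 255]
Image = List[List[Pixel]]      # indexed X[m][n]
Channel = List[List[int]]      # indexed y[m][n]


def flat(y: Channel) -> List[int]:
    return [v for column in y for v in column]


def saturation(X: Image) -> Channel:
    """y_s = max(x_r, x_g, x_b) - min(x_r, x_g, x_b) at every pixel."""
    return [[max(p) - min(p) for p in column] for column in X]


def intensity(X: Image) -> Channel:
    """Assumption: ITU-R BT.601 luma weights replace the paper's weighting."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in column]
            for column in X]


def mean(y: Channel) -> float:
    values = flat(y)
    return sum(values) / len(values)   # e.g. mu_s = mean(saturation(X))


def entropy(y: Channel) -> float:
    """P = -sum_k p[k] log2 p[k] over the normalized histogram of y."""
    values = flat(y)
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def variance(y: Channel) -> float:
    mu = mean(y)
    values = flat(y)
    return sum((v - mu) ** 2 for v in values) / len(values)


def mean_switches(y: Channel) -> float:
    """W = (1/MN) * #{[m, n] : y[m-1][n] != y[m][n]}; m = 0 counts as no switch."""
    M, N = len(y), len(y[0])
    switches = sum(1 for m in range(1, M) for n in range(N)
                   if y[m - 1][n] != y[m][n])
    return switches / (M * N)
```

On a 2 x 2 black-and-white checkerboard, $\mu_s = 0$ and the intensity entropy is exactly 1 bit. Half of the pixels register a switch.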
2.2. Image purpose classification
Web documents often contain information related to each image that can be used to infer information about it [8, 9]. The system uses this information, together with the image type, to classify the images into the image purpose classes P. The system makes use of five contexts for the images in the Web documents, C = {BAK, INL, ISM, REF, LIN}, defined in terms of HTML code as follows:
1. BAK - background, i.e., <body background= ... >
2. INL - inline, i.e., <img src= ... >
3. ISM - ismap, i.e., <img src= ... ismap>
4. REF - referenced, i.e., <a href= ... >
5. LIN - linked, i.e., <a href= ... ><img src= ... ></a>
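A single pass over the HTML can detect the five contexts. This Python sketch uses the standard library's `html.parser.HTMLParser`. Three details are assumptions made for illustration: the class name, the rule that an `href` ending in an image file suffix counts as REF, and the choice to let `ismap` take precedence over LIN.

```python
from html.parser import HTMLParser
from typing import List, Tuple

# Assumption: an <a href> pointing at one of these suffixes is a REF image.
IMAGE_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png")


class ImageContextParser(HTMLParser):
    """Collects (image URL, context) pairs for C = {BAK, INL, ISM, REF, LIN}."""

    def __init__(self):
        super().__init__()
        self.in_anchor = False
        self.images: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # boolean attributes such as ismap map to None
        if tag == "body" and a.get("background"):
            self.images.append((a["background"], "BAK"))
        elif tag == "a":
            self.in_anchor = True
            href = a.get("href") or ""
            if href.lower().endswith(IMAGE_SUFFIXES):
                self.images.append((href, "REF"))
        elif tag == "img" and a.get("src"):
            if "ismap" in a:
                context = "ISM"          # assumed to take precedence over LIN
            elif self.in_anchor:
                context = "LIN"
            else:
                context = "INL"
            self.images.append((a["src"], context))

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_anchor = False
```

Feeding a page to `ImageContextParser().feed(html)` leaves one `(url, context)` pair per image in `images`. Those pairs give the context value $c$ that the purpose rules consume.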
The system also uses a dictionary of terms extracted from the text related to the images. The terms are extracted from the 'alt' attribute text, the image URL strings, and the text near the images in the Web documents. The system makes use of terms such as D = {“ad”, “texture”, “bullet”, “map”, “logo”, “icon”}. The system also extracts a number of image attributes, such as image width (w), height (h), and aspect ratio (r = w/h).
Table 2: The b/w vs. gray test uses intensity entropy $P_v$ and variance $V_v$.

[Fragment of the third test: the intensity switch at pixel [m, n] is 1 if $y_v[m-1, n] \neq y_v[m, n]$ and 0 otherwise.]

Table 3: The BWG vs. BWP test uses intensity switches $W_v$. The GRG vs. GRP test uses $W_v$ and intensity entropy $P_v$.

The system classifies the images into the purpose classes using a rule-based decision tree framework described in [10]. The rules map the values for image type $t \in T$, context $c \in C$, terms $d \in D$, and image attributes $a \in \{w, h, r\}$ into the purpose classes. The following examples illustrate some of the image purpose rules:
p = ADV ← t = SCG, c = REF, d = “ad”
p = DEC ← c = BAK, d = “texture”
p = MAP ← t = SCG, c = ISM, w > 256, h > 256
p = BUL ← t = SCG, r > 0.9, r < 1.1, w < 12
p = RUL ← t = SCG, r > 20, h < 12
p = INF ← t = SCG, c = INL, h < 96, w < 96
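The example rules can be transcribed as an ordered rule list. This is a minimal Python sketch, not the framework of [10]. `ImageInfo`, `extract_terms` and `classify_purpose` are hypothetical names, and the tokenization of alt text and URLs is a simple assumption. First-match-wins ordering is also assumed, because the text does not say how overlapping rules are resolved.

```python
import re
from dataclasses import dataclass, field
from typing import Optional, Set

TERMS = {"ad", "texture", "bullet", "map", "logo", "icon"}  # the dictionary D


@dataclass
class ImageInfo:
    t: str                      # image type, e.g. "SCG"
    c: str                      # context: BAK, INL, ISM, REF or LIN
    w: int                      # width in pixels
    h: int                      # height in pixels
    terms: Set[str] = field(default_factory=set)  # terms d from related text

    @property
    def r(self) -> float:       # aspect ratio r = w / h
        return self.w / self.h


def extract_terms(*texts: str) -> Set[str]:
    """Assumed tokenizer: lowercase words from alt text, URL and nearby text."""
    words: Set[str] = set()
    for text in texts:
        words.update(re.findall(r"[a-z]+", text.lower()))
    return words & TERMS


def classify_purpose(x: ImageInfo) -> Optional[str]:
    # The six example rules, tried in the order listed (first match wins).
    if x.t == "SCG" and x.c == "REF" and "ad" in x.terms:
        return "ADV"
    if x.c == "BAK" and "texture" in x.terms:
        return "DEC"
    if x.t == "SCG" and x.c == "ISM" and x.w > 256 and x.h > 256:
        return "MAP"
    if x.t == "SCG" and 0.9 < x.r < 1.1 and x.w < 12:
        return "BUL"
    if x.t == "SCG" and x.r > 20 and x.h < 12:
        return "RUL"
    if x.t == "SCG" and x.c == "INL" and x.h < 96 and x.w < 96:
        return "INF"
    return None  # none of the example rules fires
```

For instance, a 10 x 10 inline simple color graphic is labeled BUL, and a 400 x 4 one is labeled RUL.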
2.3. Image summarizer
To provide feedback about the embedded images to text browsers, the system generates image summary information. The summary information contains the assigned image type and purpose, the Web document context, and related text. The system uses an image subject classification system that maps images into subject categories (s) using key-terms (d), i.e., d → s, which is described in [9]. The summary information is made available to the transcoding engine so that the image can be replaced with text.
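The following minimal Python sketch assembles such a summary into a one-line text stand-in for an image. The output format, the `summarize` function and the small key-term to subject dictionary are all hypothetical. The paper's subject classifier [9] is not reproduced here.

```python
from typing import Dict, Optional, Set

# Hypothetical key-term -> subject dictionary (d -> s); illustrative entries only.
SUBJECTS: Dict[str, str] = {"soccer": "sports", "senate": "politics", "sunset": "nature"}


def summarize(t: str, p: Optional[str], c: str, alt: str, terms: Set[str]) -> str:
    """Build a text substitute from the image type, purpose, context and text."""
    subjects = sorted({SUBJECTS[d] for d in terms if d in SUBJECTS})
    parts = [f"type={t}", f"purpose={p or 'unknown'}", f"context={c}"]
    if subjects:
        parts.append("subject=" + ",".join(subjects))
    if alt:
        parts.append(f'text="{alt}"')
    return "[image: " + "; ".join(parts) + "]"
```

A text browser could show this string in place of the image. Terms without a subject entry are simply ignored.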
The system transcodes the images using a set of transcoding policies. The policies apply the transcoding functions that are appropriate for the client devices.
3.1. Transcoding functions
The system provides a set of transcoding functions that manipulate the images along the dimensions of image size, fidelity, and color, and that substitute the images with text or HTML code. Example transcoding functions include:
Size: minify, crop, and subsample.
Fidelity: JPEG compress, GIF compress, quantize, reduce resolution, enhance edges, contrast stretch, histogram equalize, gamma correct, smooth, sharpen, and de-noise.
Color content: reduce color, map to color table, convert to gray, convert to b/w, threshold, and dither.
Substitution: substitute attributes (a), text (d), type (t), purpose (p), and subject (s), and remove.
3.2. Client device characteristics
The growing number of client devices that access the Internet vary in their communication, processing, storage and display capabilities. Table 5 illustrates some of the variability in bandwidth, display size, display color and storage among devices.
Since many devices are constrained in their capabilities, they cannot simply access image content on the Internet as-is. For example, many PDAs cannot handle JPEG images, regardless of size. HHCs cannot easily display Web pages loaded with images because of screen size limitations. Color PCs often cannot access image content quickly over dial-up connections. Fully saturated red or white images cause distortion on NTSC TV-browser displays. The transcoder framework allows content providers to publish content at the highest fidelity, while the system manipulates the content to adapt to the unique characteristics of each device.

[Table 5 (body lost): devices include TV browser and color PC; display sizes include 320 x 200, 640 x 480, 544 x 384, 1024 x 768 and 1280 x 1024.]
3.3. Transcoding policies
The transcoding system applies the transcoding functions through transcoding policies. Consider the following example transcoding policies based upon image type and client device capabilities:
minify(X) ← type(X) = CP, device = HHC
subsample(X) ← type(X) = SCG, device = HHC
dither(X) ← type(X) = CP, device = PDA
threshold(X) ← type(X) = SCG, device = PDA
JPEG(X) ← type(X) = GRP, bandwidth ≤ 28.8K
GIF(X) ← type(X) = GRG, bandwidth ≤ 28.8K
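These policies read naturally as a table of (function, condition) pairs. The following Python sketch transcribes them directly. It assumes bandwidth in Kbps, and it assumes every matching policy applies, in listed order, since the text does not say how multiple matches combine. `select_functions` is a hypothetical name.

```python
from typing import Callable, List, Tuple

# (transcoding function, condition on type t, device, bandwidth in Kbps)
Policy = Tuple[str, Callable[[str, str, float], bool]]

TYPE_POLICIES: List[Policy] = [
    ("minify",    lambda t, dev, bw: t == "CP" and dev == "HHC"),
    ("subsample", lambda t, dev, bw: t == "SCG" and dev == "HHC"),
    ("dither",    lambda t, dev, bw: t == "CP" and dev == "PDA"),
    ("threshold", lambda t, dev, bw: t == "SCG" and dev == "PDA"),
    ("JPEG",      lambda t, dev, bw: t == "GRP" and bw <= 28.8),
    ("GIF",       lambda t, dev, bw: t == "GRG" and bw <= 28.8),
]


def select_functions(t: str, device: str, bandwidth_kbps: float) -> List[str]:
    """Names of all transcoding functions whose policy condition holds."""
    return [name for name, cond in TYPE_POLICIES if cond(t, device, bandwidth_kbps)]
```

Keeping the policies as data means new device classes can be supported by adding rows, without changing the selection code.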
Notice that two methods of image size reduction are employed: minify and subsample. The difference is that minify performs anti-aliasing filtering before subsampling. Minifying graphics often generates false colors during filtering, and these additional intermediate colors increase the size of the file. Subsampling directly avoids this problem. We also distinguish between graphics and photographs when compressing the images and reducing their color. For compression, JPEG works well for gray photographs but not for graphics, while GIF works well for graphics but not for photographs. When converting color images to b/w, dithering the photographs improves their appearance, while simply thresholding the graphics improves their readability. By performing the image type content analysis, the system is better able to select the appropriate transcoding functions.
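The contrast between the size-reduction functions, and between the b/w-conversion functions, shows up even on toy grayscale images. In this Python sketch, `minify` uses a k x k box average as its anti-aliasing filter, which is an assumption since the paper does not name its filter. `dither` uses Floyd-Steinberg error diffusion, one common dithering method, swapped in for illustration.

```python
from typing import List

Gray = List[List[int]]  # rows of intensities in [0, 255]


def subsample(img: Gray, k: int) -> Gray:
    """Keep every k-th pixel in each direction; no new values are created."""
    return [row[::k] for row in img[::k]]


def minify(img: Gray, k: int) -> Gray:
    """Box-filter each k x k block (assumed anti-aliasing), then subsample."""
    out = []
    for i in range(0, len(img) - k + 1, k):
        out_row = []
        for j in range(0, len(img[0]) - k + 1, k):
            block = [img[i + di][j + dj] for di in range(k) for dj in range(k)]
            out_row.append(round(sum(block) / len(block)))
        out.append(out_row)
    return out


def threshold(img: Gray, t: int = 128) -> Gray:
    """Map each pixel to black or white independently."""
    return [[255 if v >= t else 0 for v in row] for row in img]


def dither(img: Gray) -> Gray:
    """Floyd-Steinberg error diffusion to b/w."""
    h, w = len(img), len(img[0])
    buf = [[float(v) for v in row] for row in img]
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            old = buf[y][x]
            new = 255 if old >= 128 else 0
            out[y][x] = new
            err = old - new
            # Push the quantization error onto unvisited neighbours.
            for dx, dy, wt in ((1, 0, 7), (-1, 1, 3), (0, 1, 5), (1, 1, 1)):
                nx, ny = x + dx, y + dy
                if 0 <= nx < w and 0 <= ny < h:
                    buf[ny][nx] += err * wt / 16
    return out
```

On a 2 x 2 black-and-white checkerboard, `subsample(img, 2)` returns `[[0]]`, while `minify(img, 2)` returns `[[128]]`. That gray level is absent from the original two-color graphic, which is the false-color effect described above. On a flat mid-gray patch, `threshold` yields a single value, while `dither` yields a mix of black and white pixels.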
The transcoding policies also make use of the image purpose analysis. Consider the following example transcoding policies:
fullsize(X) ← purpose(X) = MAP
remove(X) ← purpose(X) = ADV, bandwidth ≤ 14.4K
substitute(X, “<li>”) ← purpose(X) = BUL, device = PDA
substitute(X, t) ← purpose(X) = INF, display size = 320 x 200
The first policy makes sure that map images are not reduced in size, in order to preserve the click focus translation. The second policy removes advertisement images if the bandwidth is low. The third policy substitutes the bullet images with the HTML code “<li>”, which draws a bullet without requiring the image. A similar policy substitutes rule images with “<hr>”. The last policy substitutes the information images, i.e., logos, icons and mastheads, with related text if the device screen is small.
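The purpose-based policies can be sketched the same way. The following Python sketch returns an (action, replacement) pair. `purpose_policy` is a hypothetical name, and three details are assumptions: the PDA condition for the "<hr>" rule-image policy (the text gives no condition), first-match-wins ordering, and the fall-through to type-based policies when nothing matches.

```python
from typing import Optional, Tuple

Action = Tuple[str, Optional[str]]  # (action, replacement HTML or text)


def purpose_policy(purpose: str, device: str, bandwidth_kbps: float,
                   display: Tuple[int, int], related_text: str) -> Action:
    """Apply the example purpose-based policies; first match wins (assumed)."""
    if purpose == "MAP":
        return ("fullsize", None)            # keep click-focus coordinates valid
    if purpose == "ADV" and bandwidth_kbps <= 14.4:
        return ("remove", None)              # drop ads on slow links
    if purpose == "BUL" and device == "PDA":
        return ("substitute", "<li>")        # the browser draws the bullet
    if purpose == "RUL" and device == "PDA":
        return ("substitute", "<hr>")        # assumed condition for rule images
    if purpose == "INF" and display == (320, 200):
        return ("substitute", related_text)  # small screens get text for logos/icons
    return ("default", None)                 # assumed: fall through to type policies
```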
3.4. Image transcoding proxy
The content-based image transcoder is part of a network-based transcoding proxy (see Figure 3). The transcoding proxy handles the requests from the client devices for Web documents and images. The proxy retrieves the documents and images, analyzes, manipulates and transcodes them, and delivers them to the devices.

[...] the client. Reducing the data sizes of the images at the transcoding proxy [...] compression, size and color reduction can result [...] end-to-end delivery, even when accounting for the [...]s introduced by the content analysis [...]. [...] in retrieving the image via [...] by $L_t = D_s/B_p + D_s/B_t + \ldots$ [...] in a net speed-up by a factor [...] compression ratio $D_s/D_t$ is [...]
Consider a relatively high proxy-to-server bandwidth of $B_p = 1000$ Kbps, a client-to-proxy bandwidth of $B_c = 20$ Kbps, and a transcoder bandwidth of $B_t = 2400$ Kbps. A data compression ratio at the proxy of $D_s/D_t \geq 1.03$ results in a net end-to-end speed-up. If the data is compressed by a factor of $D_s/D_t = 8$, the speed-up is by a factor of $L_c/L_t \approx 6.5$. If $B_p = 50$ Kbps, the data compression ratio needs to be increased to $D_s/D_t \geq 1.8$ to obtain a speed-up in delivery. In this case, data compression of $D_s/D_t = 8$ speeds up delivery by a factor of $L_c/L_t \approx 1.9$.
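The latency expression above is incomplete, so the following Python sketch assumes a completion. For the proxy path it takes $L_t = D_s/B_p + D_s/B_t + D_t/B_c$, and for direct retrieval $L_c = D_s/B_c$. Both expressions are assumptions, chosen because they reproduce most of the figures quoted above. Sizes are in kilobits and bandwidths in Kbps.

```python
def latencies(Ds: float, Dt: float, Bp: float, Bc: float, Bt: float):
    """Assumed model; returns (L_c, L_t) in seconds.

    L_c: the client fetches the original image over its own link.
    L_t: the proxy fetches it, transcodes it, and sends the smaller result.
    """
    Lc = Ds / Bc
    Lt = Ds / Bp + Ds / Bt + Dt / Bc
    return Lc, Lt


def speedup(ratio: float, Bp: float, Bc: float, Bt: float) -> float:
    """L_c / L_t for a compression ratio D_s / D_t (independent of D_s)."""
    Lc, Lt = latencies(1.0, 1.0 / ratio, Bp, Bc, Bt)
    return Lc / Lt


def break_even_ratio(Bp: float, Bc: float, Bt: float) -> float:
    """Smallest D_s / D_t for which L_t < L_c under the assumed model."""
    return (1 / Bc) / (1 / Bc - 1 / Bp - 1 / Bt)


for Bp in (1000, 50):
    print(Bp, round(break_even_ratio(Bp, 20, 2400), 2),
          round(speedup(8, Bp, 20, 2400), 1))
# 1000 1.03 6.5
# 50 1.69 1.9
```

This assumed model reproduces the 1.03, 6.5 and 1.9 figures. For $B_p = 50$ Kbps, however, it gives a break-even ratio near 1.7 rather than 1.8, so the paper's exact latency expression may differ from the one assumed here.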
We presented a system for transcoding images in the Internet in order to adapt them to client devices with a wide range of communication, processing, storage and display capabilities. The content-based image transcoder analyzes the images and classifies them into image type and image purpose classes. The system then uses transcoding policies based on the content classes to manipulate, transcode, and adapt the images. The image transcoding system improves access to the images in the Internet for a variety of client devices, including PDAs, HHCs, TV browsers and color PCs.
References
[1] J. R. Smith, R. Mohan, and C.-S. Li. Transcoding Internet content for heterogeneous client devices. In Proc. IEEE Inter. Symp. on Circuits and Syst. (ISCAS), June 1998. Special session on Next Generation [...]
[2] A. Fox, S. D. Gribble, E. A. Brewer, and E. Amir. Adapting to network and client variability via on-demand dynamic distillation. In ASPLOS-VII, Cambridge, MA, October 1996.
[3] A. Ortega, F. Carignano, S. Ayer, and M. Vetterli. Soft caching: Web cache management techniques for images. In Workshop on Multimedia Signal Processing, pages 475-480, Princeton, NJ, June 1997. IEEE.
[4] Intel Quick Web.
[6] V. Athitsos, M. J. Swain, and C. Frankel. Distinguishing photographs and graphics on the World-Wide Web. In Proc. IEEE Workshop on Content-based Access of Image and Video Libraries, June 1997.
[7] J. R. Smith and S.-F. Chang. Tools and techniques for color image retrieval. In Symposium on Electronic Imaging: Science and Technology - Storage & Retrieval for Image and Video Databases IV, volume 2670, pages 426-437, San Jose, CA, February 1996.
[8] N. C. Rowe and B. Frew. Finding photograph captions multimodally on the World Wide Web. Technical Report Code CS/Rp, Dept. of Computer Science, Naval Postgraduate School, 1997.
[9] J. R. Smith and S.-F. Chang. Visually searching the Web for content. IEEE Multimedia Mag., 4(3):12-20, July-September 1997.
[10] S. Paek and J. R. Smith. Detecting image purpose in World-Wide Web documents. In IS&T/SPIE Symposium on Electronic Imaging: Science and Technology - Document Recognition, San Jose, CA, January 1998.