`Environments
`
`Anuj Maheshwari, Aashish Sharma, Krithi Ramamritham
`Center for Intelligent Internet Research
`Department of Computer Science and Engineering.
`Indian Institute of Technology Bombay
`Mumbai, India 400076
`anuj@iitbombay.org, ash@ee.iitb.ac.in
`krithi@cse.iitb.ac.in
`
`Prashant Shenoy
`University of Massachusetts
`Amherst, MA 01003
`shenoy@cs.umass.edu
`
`Abstract
`
`With the advent of the Wireless Internet, the client space
`has become heterogeneous in terms of device capabilities.
`To cater to the needs of these devices in E-Commerce ap-
`plications, smart intermediaries have been developed to in-
`crease the user satisfaction by hiding the inherent weakness
`of some of the small although handy devices like the PDAs
and Web-tops. Transcoding has been a popular technique
to render data for small devices that have smaller displays
and more limited colour capabilities. But transcoding comes
at the cost of caching at the intermediary.
`In this paper, we describe a transcoding and caching
`proxy that caches objects for heterogeneous client spaces
`by maintaining separate caches for different categories of
`clients (PC, PDAs, Mobiles, etc.) and transcoding the lower
`fidelity versions from the high fidelity variants at the prox-
`ies as opposed to fetching the transcoded variants from the
server. To achieve this, the proxy keeps the server-directed
transcoding information (if provided by the host server) as
part of the metadata attached to the cached objects and uses
this information to convert their fidelity and modality, or
falls back on heuristics. Through such an intermediary
architecture that serves a heterogeneous client base, we
exploit the availability of cached high-fidelity variants of
web resources (brought in to serve the requirements of
high-end devices like PCs) to serve low-end devices, and
thereby reduce latency and bandwidth consumption.
`
`1. Introduction
`
With the everyday computer user adopting the Internet,
the opportunity to communicate and interact for business
and personal use has grown to phenomenal levels. As
traditional business steps into the electronic environment,
the e-commerce and m-commerce space has grown rapidly.
The development has gone a step further with the advent of
pervasive devices, such as cell phones and personal digital
assistants (PDAs). All this has led to a greater potential to
do business and to share ideas.
Current mobile subscriptions stand at 170 million and are
forecast to exceed one billion by 2003. According to a study
published by Strategy Analytics [1] within its strategic
advisory service, Wireless Internet Applications, more than
1.5 billion PDAs, handsets and Internet appliances are to be
equipped with wireless capabilities by the end of 2004. In
money terms, the global mobile commerce market will be
worth 200 billion dollars by then, and 130 million customers
will be generating almost 14 billion transactions per annum.
Looking at mobile devices, we can expect 70 percent of new
cellular phones and 80 percent of new PDAs to have some
form of access to the web, and further expect 63 percent of
the transactions to be generated by mobile devices [2].
All of the above rests on the hypothesis that mobile
phones and small handheld devices will be able to provide
the quality of service and features that a common user needs
to carry out day-to-day business activities. An enabling and
essential feature for this is easy access to the Internet at a
reasonable level of user satisfaction (see footnote 1). There
are, however, several challenges in making mobility a
widespread reality. Problems such as lower bandwidth,
higher error rates, frequent disconnections and smaller
displays are common across mobile phones and PDAs.
Another important issue is that of the heterogeneous client
space.
`
1. User satisfaction is the perceived experience of a typical user during
Internet surfing.
`
Proceedings of the 12th Int'l Wrkshp on Research Issues in Data Engineering: Engineering e-Commerce/e-Business Systems (RIDE'02)
1066-1395/02 $17.00 © 2002 IEEE
`
`Heterogeneous client space exists when data is avail-
`able to users across a growing number of destinations -
`laptops, desktops, kiosks, automobile browsers, cellular
`phones, pagers, pocket PCs, Palm OS devices and other
`handheld devices. Current trends suggest that the evolu-
`tion of the Internet is towards more and more heterogene-
`ity. In heterogeneous client environments, each destination
`brings its own unique requirements. Devices present dif-
`ferent graphical, bandwidth and display requirements, and
`may support a variety of data formats (e.g. XML, WML,
`cHTML, SVG). They have different processing capabili-
`ties and constantly evolving standards. Such a heteroge-
`neous client environment is very distinct from the prevailing
`homogeneous client environments in which the only client
`type is a personal computer. Every response in a heteroge-
`neous environment is not only a function of the URL but
also depends significantly on the kind of device making
the request. This additional constraint forces a rethink of
certain prevalent technologies that provided performance
benefits in homogeneous environments but can be rendered
completely useless in heterogeneous ones. One such
technology is the caching of web objects.
`
`There is no doubt that caching has increased user satis-
`faction tremendously by serving web objects locally from
a proxy that is near the client. We believe caching can
`also provide similar benefits in heterogeneous client envi-
`ronments though this requires additional features at the in-
`termediary and standard protocols for the end user to com-
`municate. One of the fundamental problems that appear in
`enabling caching for heterogeneous environments is storing
`of multiple variants of the same object generated because of
`the transcoding operation at the server side or at an interme-
`diary before the caching server. Presently, all transcoding
`engines mark transcoded content as un-cacheable. We be-
`lieve this is unnecessary and leads to wastage of bandwidth
`and increases latency in a response that needs transcoded
`data since the content now travels across the Internet when
`it could very well be served from a location near the client.
`
We present a novel caching and transcoding system
called TransSquid that is an extensible and intelligent
intermediary for heterogeneous client environments.
`TransSquid is a modular framework that enables caching in
`heterogeneous environments by maintaining a multi-level
`cache and linking it with the transcoder.
`
This paper is structured as follows. We first present
our technique and then discuss its implementation in detail.
Then, we look at related techniques that help enhance
user satisfaction on mobile devices like PDAs and mobile
phones, and compare them with our work. We end with a
discussion of the open issues and also present the scope of
TransSquid.
`
`2. Caching in Heterogeneous Client Environ-
`ments
`
To place things in perspective, we first discuss caching
technology as it exists today. Cache technology provides
three primary characteristics: scalability, availability
and responsiveness. The fundamental principle of a cache is
to serve frequently requested content locally and so reduce
the latency with which a request is serviced.
`With the advent and the widespread usage of the Internet,
`caches have been used to store web objects. They could be
`deployed at the end point or as an intermediary between the
`client and the server. The essential idea is still the same, to
`provide faster access to content. On the whole, all caching
`technologies, whether at the end point or at an intermediate
`location revolve around reducing latency by avoiding slow
`links between client and origin server. They try to make
`up for slow connections and network congestion. For ISPs
`and Content Providers, caching further reduces the over-
`all traffic on the networks and allows them to offer high-
`performance distribution at low cost.
For a typical web request, caching functionality is
embedded into the HTTP protocol. An HTTP [9] transaction
begins when the client sends a request for a URL together
with a series of HTTP request headers. This request is
intercepted by an intermediary caching proxy that checks
for the presence of the requested object in its local cache.
If the object is present in the cache, the proxy appends an
If-Modified-Since request header to the request before
forwarding it. The server responds to a typical request by
first sending HTTP response headers that contain information
about the requested content, e.g. date, content type, last
modified time and content length. If an If-Modified-Since
header is present in the request, the server simply sends a
“304 Not Modified” reply when the object at the cache is
the most recent version; otherwise, the web object has to be
downloaded from the end server. When the client request is
served from the cache, it is termed a HIT. If the cache does
not have the latest version of the requested object and the
object is fetched from the end server, it is termed a MISS.
All caching algorithms try to maximize the probability of a
HIT given the limited storage space available to the cache.
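
The validation exchange described above can be sketched in a few lines. This is only an illustration under stated assumptions: the origin server is modelled as an in-memory dictionary, and the names `CachedObject`, `origin_respond` and `proxy_fetch` are hypothetical, not part of Squid or of the HTTP specification.

```python
from dataclasses import dataclass


@dataclass
class CachedObject:
    body: bytes
    last_modified: int  # modification time reported by the origin (seconds)


def origin_respond(current_body, current_mtime, if_modified_since=None):
    """Hypothetical origin: 304 if the object is unchanged, else 200 plus body."""
    if if_modified_since is not None and current_mtime <= if_modified_since:
        return 304, None, current_mtime
    return 200, current_body, current_mtime


def proxy_fetch(cache, url, origin):
    """Serve url through the cache and report whether it was a HIT or a MISS.

    origin maps each URL to a (body, mtime) pair and stands in for the network.
    """
    entry = cache.get(url)
    # Only a cached copy lets the proxy add If-Modified-Since to the request.
    ims = entry.last_modified if entry else None
    status, body, mtime = origin_respond(*origin[url], if_modified_since=ims)
    if status == 304:
        return entry.body, "HIT"      # cached copy is still the latest version
    cache[url] = CachedObject(body, mtime)
    return body, "MISS"               # object had to come from the end server
```

The first request for a URL is a MISS; a repeated request is a HIT until the origin's copy changes, at which point the validation fails and the object is fetched again.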
As discussed earlier, the constraints in a heterogeneous
environment are very different from those of a homogeneous
environment. In the next few paragraphs, we discuss in more
detail the issues and imperatives in such environments from
the angle of caching.
`Transcoding and content negotiation lead to multiple
`variants of a resource or a web object that differ in modality
and fidelity (see footnote 2). For example, a typical PDA client would be
`
2. A media resource can be translated to different modalities, such as text
to audio, or video to images. Different versions within the same modality,
arising for example from image compression, text summarization or video
abstraction, are the fidelity variants of a resource.
`
satisfied with a lower-fidelity variant of a resource than a
workstation client, which would demand rich features.
`
Table 1. Client Capabilities

Client Type                 Bandwidth    Display Size   Color          Device Storage
Work-Station                10 Mbps      1280x1024      24-bit color   20 GB
Color PC                    56 Kbps      1024x768       24-bit color   10 GB
PDA (high-end)              14.4 Kbps    320x200        8-bit color    16 MB
PDA (low-end)               9.6 Kbps     320x200        Greyscale      8 MB
WAP-enabled Mobile Phone    20 bps       132x176        b/w            2 MB

Essential requirements for intermediaries, especially
caching proxies, in such environments are the recognition
and persistence of the different variants of the same
resource. For the former, standards can help in the
identification and resolution of variants through attached
headers. The persistence of multiple variants of the same
resource, however, requires intelligent caches that
categorize variants according to one or more parameters
(namely fidelity, requesting client, content-type) and store
them in a way that makes retrieval fast and simple. This
makes the design of such a cache both interesting and
challenging.
For the identification of the client type in a heterogeneous
environment, it is imperative for the intermediary proxy and
the end server to understand the Composite
Capabilities/Preference Profiles (CC/PP) [4]. CC/PP is a
collection of the capabilities and preferences associated with
a user and the agents used by the user to access the WWW.
For an intermediary caching proxy, understanding CC/PP
helps in classifying clients so that variants of the same
resource can be made available to them appropriately. The
service and content provided by a caching proxy depend
entirely on how well the proxy understands the requirements
of a client and, thereby, on its ability to deliver the closest
available variant. CC/PP thus provides a standard for such
communication.
Intelligent caching proxies in a heterogeneous client
environment could exploit the heterogeneity by converting
high-fidelity variants of a resource into low-fidelity variants
to serve clients with lower capabilities. A typical example is
the conversion of a high-fidelity JPEG image into a
low-fidelity GIF image with reduced colour and half the
dimensions to serve a low-resolution PDA client. A
higher-fidelity variant of a resource could thus be used to
satisfy the needs of clients with lower capabilities, through
local transcoding of content at the caching proxy itself.
For this, a transcoding module that is tightly coupled
with the caching engine is needed at the intermediary. The
above discussion highlights the requirements for a caching
proxy in heterogeneous client environments. To summarize,
a caching proxy can be designed to:
`
`1. Recognize multiple variants of a resource and catego-
`rize them in the cache accordingly.
`
2. Understand the clients' Composite Capabilities/Preference
`Profiles (CC/PP) to facilitate client recognition.
`
`3. Manipulate fidelity or modality (or transcode) of vari-
`ants whenever possible to provide better service.
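
The first requirement amounts to keying the cache on more than the URL. The sketch below is one possible reading, not the authors' implementation: the class `VariantCache` and its methods are hypothetical, and the client category is assumed to have been derived from the client's CC/PP beforehand.

```python
class VariantCache:
    """Toy cache in which a response is a function of URL and client category."""

    def __init__(self):
        # (url, category) -> body; the category string is an assumption,
        # e.g. "high", "medium" or "limited".
        self._variants = {}

    def store(self, url, category, body):
        self._variants[(url, category)] = body

    def lookup(self, url, category):
        """Return the variant cached for this client category, or None."""
        return self._variants.get((url, category))

    def variants_of(self, url):
        """List the categories for which a variant of url is cached."""
        return sorted(c for (u, c) in self._variants if u == url)
```

With such a key, a PDA and a workstation requesting the same URL no longer overwrite each other's cached copy.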
`
`3. TransSquid: A Caching and Transcoding
`Intermediary
`
`Our novel technique is a multi-level caching solution
`for the emerging heterogeneous client environments. The
`TransSquid architecture is designed as a smart intermedi-
`ary that can cache and transcode web objects in an environ-
`ment where the client requests have to be serviced intelli-
`gently according to the client capabilities. TransSquid tries
`to solve the problem of caching in a heterogeneous client
`environment by taking a client centric approach for catego-
`rizing different variants and then storing them in a multi-
`level cache on the basis of fidelity and the modality.
As can be seen in Table 1, there is an ever-increasing
variety of devices for accessing the web. Given this, it
would not be sensible to provide a cache for each type of
device, as the current base of these devices is large and is
growing because of innovation and newer players. If one
cache were provided per device, a cached object that serves
a Windows CE based iPAQ could not serve a Palm device,
even though the two could share the same data given their
capabilities. Separate caches for each device would lead to
low and perhaps ineffective HIT rates in the cache, which
defeats the purpose of the cache itself. TransSquid
therefore provides a caching architecture with a limited
number of levels, dividing the client space into a small
number of (say three) categories based on capabilities. A
client device is a member of one and only one of the three
categories, depending on capabilities such as display size, colors,
`storage and bandwidth of connection. All web objects ac-
`cessed from a client device are stored in the cache that the
`client has membership to.
The three major categories into which we divide the client
space are:
`
`1. High Capability Clients:
`
`The clients that fall in this category are Personal Com-
`puters, Work Stations and Laptop Computers. These
`devices have large storage capacities (typically more
`than 64 MB RAM and 10 GB or more disk space),
large screen size (640 x 480 pixels or above), multimedia
support, and good processing power. These devices can
play video and audio and display high resolution images.
Content available on the Internet is, by default, designed
for this kind of computing device.
`
`2. Medium Capability Clients:
`
Portable computers like PDAs and WebTops fall in
the category of Medium Capability clients. Typical
examples of such devices are the Palm Pilots and iPAQs.
The demand for such portable and handy devices has
grown exponentially. These devices have a smaller screen
size (typically 320 x 200), limited colors and less
processing power (100 MHz), and are connected to the
Internet through slow wireless links. Some devices in this
category also have audio functionality.
`
`3. Limited Capability Clients:
`
These are very low-end devices that are used to surf the
Internet only for score updates, news headlines and other
text-only content, or that have very limited graphics
support. Some models of mobile phones have come up with
larger screens, but data transfer to these devices is
inherently very slow. SMS, a popular technique for data
transfer to mobile phones, has a bandwidth of 20 bytes per
second. Though the Wireless Application Protocol (WAP)
and other efforts from the research community have
improved this, surfing using a mobile phone remains slow.
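
A minimal sketch of how a client might be mapped to exactly one of the three categories is given below. The thresholds are assumptions chosen to match the figures quoted above (640 x 480 for high capability clients, 320 x 200 for typical PDAs); the paper does not state the exact rule, and `ClientProfile` and `categorize` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientProfile:
    """Capabilities a client might advertise through CC/PP (illustrative)."""
    width: int
    height: int
    colour_bits: int
    bandwidth_bps: float


def categorize(profile):
    """Map a client to exactly one category using display size only.

    Assumed thresholds: 640x480 or above is "high", 320x200 or above is
    "medium", anything smaller is "limited".
    """
    if profile.width >= 640 and profile.height >= 480:
        return "high"
    if profile.width >= 320 and profile.height >= 200:
        return "medium"
    return "limited"
```

A fuller rule could also weigh colour depth, storage and bandwidth, which the sketch carries in the profile but does not use.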
`
As can be seen, we take a middle path. We justify our
categorization into a limited number of sets by noting that,
broadly, the requirements of all clients in a particular
category are the same. Though certain differences in
capabilities might exist, we take a broader view of user
satisfaction. The overall picture is represented in Figure 1.
Any necessary modification can be done on the fly. Our
experiments suggest that PDA devices with Windows CE and
the Palm OS (the two major players in the small device OS
market) give almost the same look and feel for content.
Though different transcoding engines have proprietary
techniques for transcoding, we feel that the difference in
rendering for clients that are close to each other in CC/PP
is not very high, and hence the same cached object can
provide the necessary level of user satisfaction.

Figure 1. Basic scenario for TransSquid
[Diagram labels: Host Server; Server Directed Transcoding
Information; TRANS-SQUID (Cache, Transcoding Module, Policy
Engine); Clients; CC/PP.]

If some transcoding needs to be done on an object in a
cache to suit a request made by a client in the same
category, it can be done on the fly or at the client side; the
performance benefit of a cache HIT outweighs the cost of a
less than optimal transcoding.
Communication between the different cache levels in
TransSquid allows low-end users to benefit from the
availability of high-fidelity content. Our assumption is that
a high-fidelity variant can normally be converted into a
lower-fidelity variant, and hence objects in a higher-fidelity
cache can answer a request to a lower-fidelity cache through
local transcoding. This is broadly true for images and
HTML data, whose fidelity can be changed by decreasing
resolution or by removing unnecessary information. We
term this a Partial HIT and discuss the concept in the
following section.
`
`3.1 Notion of Partial HIT
`
HIT and MISS are the two primary events that take place
when a client sends a request for an object to the network.
Depending on attributes like If-Modified-Since and client
preferences, the object is either returned from the cache or
fetched from the server, as seen in our earlier discussion on
caches.
In the TransSquid architecture, when a client with low
fidelity requirements makes a request for which the cache
already contains a higher-fidelity variant, but no object in
the cache level that the client maps to, the cache returns a
Partial HIT. In such a case, the cache sends the object to the
transcoding module of the TransSquid architecture, which
uses the information available in the metadata of the object,
the content type and the characteristics of the object as its
input to return a suitable variant of the resource. If the
metadata contains a directive given by the host server (based
on information semantics), then this information is used for
transcoding. This is called Server Directed Transcoding
[21].
A Partial HIT is more time consuming than a HIT, in
which the object is loaded from the disk and served to
the client. The additional time is consumed primarily in,
first, determining the fidelity and modality of the required
variant from the variant available in the cache and, second,
performing the actual transcoding operation. A Partial HIT
is still considerably cheaper than a MISS, in which the
object is fetched from the server and possibly transcoded at
an intermediary as well. Figure 2 shows the relative times
taken by a MISS, a HIT and a Partial HIT.
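
The three outcomes can be illustrated with a small lookup routine. It is a sketch under assumptions: each cache level is a plain dictionary, the level names are hypothetical, and `reduce_fidelity` is a placeholder that tags the body instead of performing real transcoding.

```python
LEVELS = ["high", "medium", "limited"]  # richest level first (assumed names)


def reduce_fidelity(body, target):
    """Placeholder transcoder: marks the body rather than converting it."""
    return f"{body}->{target}"


def lookup(caches, url, category):
    """Return (body, outcome), where outcome is HIT, PARTIAL_HIT or MISS."""
    if url in caches[category]:
        return caches[category][url], "HIT"
    # No exact variant: scan the richer levels, nearest one first.
    position = LEVELS.index(category)
    for richer in reversed(LEVELS[:position]):
        if url in caches[richer]:
            variant = reduce_fidelity(caches[richer][url], category)
            # The new variant is stored in the requesting client's level.
            caches[category][url] = variant
            return variant, "PARTIAL_HIT"
    return None, "MISS"  # the caller would now fetch from the origin server
```

Because the transcoded variant is stored in the requesting client's level, a repeated request from the same category becomes an ordinary HIT.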
`
Figure 2. Concept of a Partial HIT
[Timeline diagram. Labels: CLIENT, TRANS-SQUID, SERVER,
POLICY ENGINE; REQUEST and REPLY arrows for the MISS,
PARTIAL HIT and HIT cases along a time axis; a bracketed
interval labelled Transcoding.]
`
`In this section, we saw the TransSquid architecture and
`the problem that it tries to solve. Our solution is based on
`certain assumptions that are based on our understanding of
`the WWW and also the heuristics and rules of thumb avail-
`able for the Internet in general. In the next section, we dis-
`cuss the implementation of TransSquid.
`
4. Implementation of TransSquid
`
`4.1 Architecture
`
`The TransSquid is designed as a modular framework
`where each module has a specific function. The major parts
`of the TransSquid are:
`
`1. Multi-level Caching Module
`
`2. Transcoding Module
`
`3. Client Side Module
`
`4. Policy Engine
`
Figure 3. Architectural overview of TransSquid.
[Diagram labels: CACHE LEVELS (HASH 1, HASH 2, HASH 3);
TRANSCODING MODULE; CACHE STORE; StoreMetaData;
HttpReplyHeader; HttpReplyBody; StoreEntry; CLIENT INTERFACE.]

A schematic representation of the architecture is shown
in Figure 3. The Multi-level Caching Module is structured
as three caches that store web objects according to the
requesting client type. Each level functions as a separate
cache, though they share a central persistent store. The
cache replacement policy is FIFO. The choice of cache
replacement policy in such an architecture is itself an
interesting and unexplored research issue.
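
One way to picture three logical levels over a single store with FIFO replacement is sketched below. Whether eviction in TransSquid is applied per level or across the shared store is not stated, so the global FIFO order here is an assumption, as are the class and method names.

```python
from collections import OrderedDict


class MultiLevelFifoCache:
    """Three logical cache levels backed by one central FIFO store (sketch)."""

    def __init__(self, capacity, levels=("high", "medium", "limited")):
        self.capacity = capacity
        # Central store: (level, url) -> body, kept in insertion order.
        self.store = OrderedDict()
        # Per-level index, playing the role of one hash table per level.
        self.index = {level: set() for level in levels}

    def put(self, level, url, body):
        key = (level, url)
        if key not in self.store and len(self.store) >= self.capacity:
            # FIFO: evict the oldest entry regardless of how often it was used.
            (old_level, old_url), _ = self.store.popitem(last=False)
            self.index[old_level].discard(old_url)
        self.store[key] = body
        self.index[level].add(url)

    def get(self, level, url):
        # Reads never reorder entries, which is what distinguishes FIFO from LRU.
        return self.store.get((level, url))
```

Replacing `popitem(last=False)` with a policy that prefers to evict low-fidelity variants, which can be regenerated locally, is one of the directions the open research question above suggests.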
These caches are hierarchical, as the caches serving
high capability clients can also serve requests from devices
like mobile phones and PDAs after passing the objects
through the transcoding module. As discussed later, the
transcoding module renders the objects for lower capability
devices.
The Transcoding Module in our implementation is a
heuristic-based web object processor that recognizes two
types of objects: images and text. For images, the
transcoding module applies functions like reducing the
dimensions, decreasing the number of colors and decreasing
the quality factor of JPEG images. The function to be
performed on an image is chosen by the policy engine or by
using any server-directed information attached to the web
object. For text, we apply simple techniques like removing
unwanted parts such as advertisements, buttons and
unnecessary information, replacing them with simple text links.
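
The image operations named above can be illustrated on a raw pixel grid. This sketch deliberately avoids any image library: an image is assumed to be a list of rows of (r, g, b) tuples, and the function names and the luminance weights (the common 0.299/0.587/0.114 approximation) are choices made for the example, not details taken from TransSquid, which uses libjpeg.

```python
def subsample(pixels, factor=2):
    """Reduce dimensions by keeping every factor-th pixel in each direction."""
    return [row[::factor] for row in pixels[::factor]]


def to_grey(pixels):
    """Convert (r, g, b) pixels to 8-bit grey levels using integer weights."""
    return [[(r * 299 + g * 587 + b * 114) // 1000 for (r, g, b) in row]
            for row in pixels]


def to_bw(grey, threshold=128):
    """Reduce a greyscale image to black and white by thresholding."""
    return [[255 if value >= threshold else 0 for value in row] for row in grey]
```

Chaining the three, for example `to_bw(to_grey(subsample(image)))`, gives the kind of half-size, reduced-colour variant the text describes for a low-end client.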
The Policy Engine provides the Transcoding Module
with information about the current modality and fidelity of
a web object and also suggests the function that should be
carried out to render it for other device types. Again, this
is based on heuristics such as content type and size and, in
the case of images, dimensions.
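
A heuristic policy of this kind might look as follows. The rule set is an assumption assembled from operations mentioned elsewhere in the paper (server directives take precedence, large GIFs are converted to JPEG, colours are reduced for smaller devices); the function name, the operation names and the exact ordering of the rules are hypothetical.

```python
def choose_operation(content_type, size_bytes, target, directive=None):
    """Suggest a transcoding function for an object headed to a target category."""
    if directive is not None:
        return directive                 # server-directed information wins
    if content_type == "text/html":
        return "strip_to_text_links"     # drop adverts and buttons, keep links
    if content_type == "image/gif" and size_bytes > 50 * 1024:
        return "gif_to_jpeg"             # large GIFs compress better as JPEG
    if target == "limited":
        return "convert_to_bw"
    if target == "medium":
        return "reduce_colors"
    return "none"                        # high capability clients get the original
```

Keeping the policy as a pure function of object attributes and target category makes it easy to swap in a different rule table without touching the cache or the transcoder.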
The Client Side Module is the interface between the
client and the proxy. It intercepts client requests and maps
them to the correct cache. The client side maintains
client-specific state so that a client need not advertise its
capabilities and preferences (CC/PP) every
`time it makes a request.
We use the Squid web proxy cache as the basic platform
for our implementation. Squid [5] is an open-source,
high-speed Internet proxy-caching program. We use Squid
version 2.3.STABLE4, freely available on the Internet. Our
code is written in C. We chose Squid as a platform for its
robustness and widespread usage, and also because
hierarchies of Squid proxies arranged in complex
relationships are possible. Our code uses the caching
functionality of Squid to implement the multi-level cache,
while the Transcoding Module and Policy Engine are written
using commonly available libraries like libjpeg [6].
Our contribution through TransSquid is in the form of:

• Enhanced work-flow of a typical request.

• Intelligent storage caches that can communicate with
each other.

• Addition of a Transcoding Module and Policy Engine,
to show a proof-of-concept Caching and Transcoding
Intermediary.

• Interaction between the caching-related processes and
transcoding-related functions for higher optimization.
`
`4.2 Data Structures
`
`This section describes some of the key data structures
`that have been changed from existing Squid implementation
`or that have been added.
The data structures for the caching functionality of
TransSquid need to keep not only the usual caching
information but also, at each point of the request-response
process, the type of the requesting client. This information
is needed to decide which cache to retrieve from or store
into.
In Squid, the request data and the reply data are linked
together using httpStateData, which contains the request_t
data structure. In TransSquid, this data structure is modified
to contain another attribute called device, which holds
information on the type of client making the request.
Additionally, the StoreEntry, which keeps information on the
storage of a web object in the cache, also has an attribute
called device. This attribute keeps track of which cache
(store table) the entry should be linked to, as Squid now
maintains three different caches. Internally, the caches
store all the data in the same central cache.
Transcoding information is stored in a data structure
called transcode_info, which is usually generated by the
Policy Engine. The client registers its CC/PP with the proxy
and then uses a unique key through which TransSquid knows
its CC/PP in later sessions. The Client Side maintains
persistent storage and has appropriate data structures for
keeping such state information.
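
The relationships among these structures can be mirrored in a short model. It is written in Python purely for illustration (the actual structures are C structs inside Squid), and every field other than `device` is an assumption: the paper names the `device` attribute and the transcoding information structure but does not list their contents.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, Optional


@dataclass
class TranscodeInfo:
    operation: str  # e.g. "reduce_colors" (hypothetical value)
    origin: str     # "server" if server-directed, "policy" if from the Policy Engine


@dataclass
class StoreEntry:
    url: str
    device: str     # client category; selects the store table to link to
    transcode_info: Optional[TranscodeInfo] = None


class ClientRegistry:
    """Client-side state: a client registers its CC/PP once and reuses a key."""

    def __init__(self):
        self._profiles: Dict[int, dict] = {}
        self._next_key = count(1)

    def register(self, ccpp: dict) -> int:
        key = next(self._next_key)
        self._profiles[key] = dict(ccpp)
        return key

    def device_for(self, key: int) -> str:
        # Assumes the registered profile already carries a resolved category.
        return self._profiles[key]["category"]
```

A request carrying a previously issued key can then be tagged with its device type without resending the full profile.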
`
`4.3 Flow of Typical Request
`
The flow of a typical request in the TransSquid
architecture is as follows.
`
1. A client connection is accepted by the client-side. The
HTTP request is parsed. The client device is identified
and the clientHttpRequest device flag is set.

2. The access controls are checked. The client-side builds
an ACL (Access Control List) state data structure
and registers a callback function for notification when
access control checking is completed.

3. After the access controls have been verified, the
client-side looks for the requested object in the cache. If it
is a cache HIT, then the client-side registers its interest in
the StoreEntry. Otherwise, Squid needs to forward the
request, perhaps with an If-Modified-Since header. If a
Partial HIT is encountered, Squid calls the Transcoding
Module and transcodes the content of the HttpReply
according to the server-directed transcoding information
stored in the Store Meta Data or according to its Policy
Engine. Once this is done, the reply is forwarded to the
client, the new variant of the StoreEntry is added to the
cache (hash table) of that client type, and the client-side
registers its interest in that StoreEntry.
`
`4. The HTTP module first opens a connection to the ori-
`gin server or cache peer. If there is no idle persistent
`socket available, a new connection request is given to
`the Network Communication module with a callback
`function.
`
`5. When a TCP connection has been established, HTTP
`builds a request buffer and submits it for writing on the
`socket. It then registers a read handler to receive and
`process the HTTP reply.
`
`6. As the reply is initially received, the HTTP reply head-
`ers are parsed and placed into a reply data structure.
`As reply data is read, it is appended to the StoreEn-
`try. Every time data is appended to the StoreEntry,
`the client-side is notified of the new data via a call-
`back function. Meta information from the server side
`like the transcoding details for various devices is added
`to the Store Meta Data. This information is used if a
`client encounters a Partial HIT as discussed in step 3.
`
7. As the client-side is notified of new data, it copies the
data from the StoreEntry and submits it for writing on
the client socket.
`
`
Figure 4. A GIF image of size 3473 bytes
transcoded into a much smaller version
of 1473 bytes. [Images not reproduced.]
`
`8. As data is appended to the StoreEntry, and the client(s)
`read it, the data may be submitted for writing to disk.
`
`9. When the HTTP module finishes reading the reply
`from the upstream server, it marks the StoreEntry as
`complete. The server socket is either closed or given
`to the persistent connection pool for future use.
`
`10. When the client-side has written all of the object data,
`it unregisters itself from the StoreEntry. At the same
`time it either waits for another request from the client,
`or closes the client connection.
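
Steps 6 to 10 describe a callback-driven stream: data appended to the StoreEntry is pushed to every registered client. The sketch below models only that notification pattern; the class `StreamingStoreEntry` and its methods are hypothetical and do not reproduce Squid's actual store client interface.

```python
class StreamingStoreEntry:
    """Entry that notifies registered clients as reply data arrives (sketch)."""

    def __init__(self):
        self.chunks = []
        self.complete = False
        self._clients = []

    def register(self, callback):
        """The client-side registers its interest in this entry."""
        self._clients.append(callback)

    def unregister(self, callback):
        self._clients.remove(callback)

    def append(self, data):
        """Called by the HTTP module each time reply data is read."""
        self.chunks.append(data)
        for callback in list(self._clients):
            callback(data)  # the client-side copies the data to its socket

    def mark_complete(self):
        """Called when the reply from the upstream server has been fully read."""
        self.complete = True
```

Several clients can register on the same entry, so a second request for an object that is still being fetched can be served from the same stream.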
`
`4.4 Transcoding Results
`
`In TransSquid, various transcoding techniques are used
`for changing the modality and fidelity of web objects. For
`images, some of the transcoding techniques used are based
`on:
`
• Size: minify, subsample

• Fidelity: JPEG compress, GIF compress, reduce resolution

• Colour content: reduce colors, convert to grey, convert
to b/w
`
The transcoding for a web object is a function of its
current mode (HTML, image/jpeg, image/gif), the client
category it currently serves and the category it is intended
to serve. For images, it is further a heuristic function of
resolution, colour, size (in bytes), geometry and priority.
`
Figure 5. Transcoding GIF (size > 5 kB) images to JPEG
images. [Plot: cumulative percentage (%) against compression
ratio, with curves for Quality=25, Quality=50 and Quality=75.]
`
For our analysis, we work on a total of 9336 images,
of which 51.2% are GIFs and the rest JPEGs. Further,
we classify images based on size, colour and purpose. Each
such combination has a preassigned function that is used
for conversion between client categories.
A typical transcoding operation is shown in Figure 4.
The transcoding engine converts a GIF image with 256
colours for a PC user into a reduced-colour (b/w) GIF for a
Palm V user. The function executed by the transcoding
module is the simple reduce color operation. The size of the
image falls from 3473 bytes to 1473 bytes without any
significant compromise in quality.
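
The size figures quoted for Figure 4 correspond to a transcoded image that is about 42% of the original size. The short sketch below computes that ratio and shows how a cumulative percentage over many such ratios could be obtained; the function names are hypothetical, and any list of ratios passed to `cumulative_percentage` would be sample data supplied by the reader, not the paper's measurements.

```python
def size_ratio(original_bytes, transcoded_bytes):
    """Transcoded size as a fraction of the original size."""
    return transcoded_bytes / original_bytes


def cumulative_percentage(ratios, threshold):
    """Percentage of images whose size ratio is at most the threshold."""
    return 100.0 * sum(ratio <= threshold for ratio in ratios) / len(ratios)


# Figure 4 example from the text: 3473 bytes reduced to 1473 bytes.
figure4_ratio = size_ratio(3473, 1473)
```

Evaluating `cumulative_percentage` at a range of thresholds yields the kind of cumulative distribution curve that the transcoding plots report.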
Some of the other operations, like conversion of high
colour GIFs above 50 Kbytes into JPEGs, are based on
analysis done on our sample image collections. Figures 5-7
show some of the statistical results obtained from transcoding
operations performed on images. Here, the resulting image
size as a percentage of the original image size is plotted as
a cumulative distribution. We see in Figure 5 that transcoding
from GIF to JPEG provides at least a 50% decrease in size
for 90% of large images (above 50 Kbytes). This comes out
`at hardly any difference in perceivable quality if the resul-
`tant image is viewed through a limited capac