`
`Content Delivery Networks:
`An Introduction
`White paper
`
`Contact:
`
`Networking Products Division
`HCL Technologies Ltd.
`49-50 Nelson Manikkam Road
`Chennai- 600 029, INDIA
© HCL Technologies Ltd.
`All rights reserved.
`
`May 2002
`
`http://cdn.hcltech.com
`
`
` p o w e r i n g i m a g i n a t i o n
`
`
`
`
`
`
`
`
`Contents
`
`
Introduction: What are Content Delivery Networks?
Typical CDN Architecture
CDN service providers
Peering between CDNs
Typical architecture between peering CDNs
Accounting in a CDN
Log collection
Terminology used in content networking
Conclusion: future market trends
About HCL Technologies
`
`
`
`
`
`
`
`
`
`Introduction: What are Content Delivery Networks?
A Content Delivery Network (CDN), as the name suggests, is a network of machines that
delivers content, which may be static or dynamic data on the web. A CDN encompasses
many different technologies, all with the common goal of improving Internet
performance. CDNs require the ability to detect changes to existing data and to detect
the availability of new content at origin servers.
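The paper does not say how changes at the origin are detected. As a minimal sketch, assuming the CDN keeps a fingerprint (here a SHA-256 digest) of every object it has already distributed, new and changed objects can be found by comparing fingerprints. The function names and dictionary layout are hypothetical; real systems may instead rely on HTTP validators such as Last-Modified or ETag, or on notifications pushed by the origin.

```python
import hashlib


def digest(body: bytes) -> str:
    """Fingerprint of an object's content (SHA-256, hex-encoded)."""
    return hashlib.sha256(body).hexdigest()


def detect_changes(known: dict, origin: dict) -> tuple:
    """Compare what the origin holds now with what the CDN has distributed.

    known:  URL -> digest of the copy already pushed to surrogates
    origin: URL -> current body fetched from the origin server
    Returns (new_urls, changed_urls), each sorted.
    """
    new, changed = [], []
    for url, body in origin.items():
        current = digest(body)
        if url not in known:
            new.append(url)            # content the CDN has never seen
        elif known[url] != current:
            changed.append(url)        # content modified at the origin
    return sorted(new), sorted(changed)
```

Only the new and changed URLs then need to be pushed to surrogates; unchanged objects are left alone.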
`
A CDN is necessarily a complex system with many components. These components are
distributed across different nodes of a network in a possibly heterogeneous environment,
and include servers all over the world that hold replicated content. CDNs typically
redirect customer requests to a server placed topologically near the customer, so the
customer gets the requested data much faster, from the nearest server.
`
Two of the major concerns addressed by CDNs are efficient content distribution and the
freshness of the content given to the customer. An enterprise or an ISP would never
accept misses on certain objects. Retrieving large objects across the Internet during
bandwidth-constrained hours could result in unacceptable latencies. Pushing time-sensitive
data (e.g. news, share prices, entertainment, live sporting events) to servers to
ensure fresh content is a challenge that CDN solution providers face. The most important
challenge is the efficient distribution of all content to the respective servers.
`
`Typical CDN Architecture
The essential infrastructure of a CDN reflects the fact that it is necessarily a complex
system with many components. This requires a management entity (i.e. a network
operations center) that intelligently monitors and manages the whole system.
`
`
`
`
`
`
`
Then there are servers (shown as surrogate server 1, surrogate server 2 and surrogate
server 3 in the diagram), each with a cache associated with it. The cache stores data so
that when a customer requests the same content again, it is served quickly. If the
content is not present in the cache, the server associated with that cache is responsible
for getting the content from a source that is topologically near to it.
`
Metrics for redirecting requests could include network proximity derived from network
routing tables (e.g. Border Gateway Protocol), topological proximity (i.e. depending on
region) and load balancing across servers (i.e. finding the least loaded server in a
particular region). If the server finds that the content is not present in nearby
locations, it sends a request to the origin server and gets the content. CDNs also
process the logs from these servers and use them for billing purposes.
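One way to combine the redirection metrics listed above is a weighted score per surrogate. The sketch below is hypothetical: it assumes a measured round-trip time stands in for network proximity, a region label stands in for topological proximity and a utilization figure stands in for load. The weights are arbitrary illustration values, not tuned ones.

```python
from dataclasses import dataclass


@dataclass
class Surrogate:
    name: str
    region: str     # topological proximity: coarse location label
    rtt_ms: float   # network proximity: measured round-trip time to the client
    load: float     # current utilization, 0.0 (idle) to 1.0 (saturated)


def pick_surrogate(surrogates, client_region,
                   w_rtt=1.0, w_load=100.0, region_bonus=50.0):
    """Return the surrogate with the lowest weighted score (lower is better)."""
    def score(s):
        cost = w_rtt * s.rtt_ms + w_load * s.load
        if s.region == client_region:
            cost -= region_bonus   # prefer servers in the client's own region
        return cost
    return min(surrogates, key=score)
```

For example, with surrogates at 20 ms and 90% load, 35 ms and 20% load (both in the client's region) and 15 ms and 10% load (in another region), the default weights pick the second one: it is slightly farther away but far less loaded.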
`
`Server Director/Content Director
The server director intercepts all requests and directs them to the servers topologically
nearest to the client or end-user. It prevents server problems from making content
inaccessible. Assume content is present on three servers in a region (surrogate server 1,
surrogate server 2 and surrogate server 3), and a request for that content comes from a
client topologically nearest to surrogate server 1. When the request arrives, assume
that surrogate server 1 is down.
`
In a situation like this, the server director would not let the request go to surrogate
server 1. Instead, it passes the request on to surrogate server 2 or surrogate server 3,
depending on factors such as topological proximity and network proximity.
`
The server director keeps track of the health of all servers in the server farm, detects
failures immediately and takes appropriate action. It also directs requests to the least
loaded server. If all servers are out of service, requests should be sent to the origin
server. A situation could arise where content is being fetched from one of the origin
servers and, at that instant, a server in the server farm that has the content cached
comes back up. The server director should then be able to redirect the request to that
server. This leads to efficient utilization of bandwidth and faster access to content for
the client.
`
`Mechanism for direct access to origin server
Often a client needs access to up-to-date information (e.g. financial news). In such a
case the server director should have a mechanism to redirect all such requests to the
origin servers.
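The server director behaviour described in the last few subsections (failover when a surrogate is down, least-loaded selection, falling back to the origin when no surrogate is available, and always sending certain content to the origin) can be sketched as follows. This is a minimal illustration under assumptions: health and load are reported to the director by some monitoring process, candidates arrive ordered nearest first, and the `bypass_prefixes` rule for always-fresh content is a hypothetical configuration.

```python
from dataclasses import dataclass, field

ORIGIN = "origin"  # hypothetical label meaning "send the request to the origin server"


@dataclass
class ServerDirector:
    # surrogate name -> current load (0.0-1.0); only healthy servers appear here
    healthy: dict = field(default_factory=dict)
    # URL prefixes that must always be served fresh from the origin (hypothetical)
    bypass_prefixes: tuple = ()

    def mark_up(self, name, load=0.0):
        """Called by health monitoring when a surrogate is (back) in service."""
        self.healthy[name] = load

    def mark_down(self, name):
        """Called by health monitoring when a surrogate fails."""
        self.healthy.pop(name, None)

    def route(self, url, candidates):
        """candidates: surrogates holding the content, ordered nearest first."""
        if url.startswith(self.bypass_prefixes):
            return ORIGIN                      # always-fresh content, e.g. financial news
        live = [c for c in candidates if c in self.healthy]
        if not live:
            return ORIGIN                      # every surrogate is out of service
        # least loaded wins; min() keeps the nearest among equal loads
        return min(live, key=lambda c: self.healthy[c])
```

Because `min` returns the first of several equally loaded servers, proximity order acts as the tie-breaker. Once a failed surrogate is marked up again, the next `route` call can send requests to it instead of the origin.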
`
`Single point of control
`
The server director needs to manage all servers, which may be distributed over great
distances, from a single point of control. All servers in the server farm have an
associated cache, which caches content delivered over HTTP, FTP or NNTP. These
protocols are characterized by carrying some proportion of static content. Essentially, a
server should be able to detect whether the requested content is present in its cache.
If it is not, the server should pass the request on to the origin server. On getting the
response from the origin server, the server should deliver the content to the end-user
and cache a copy of it in its own cache.
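The miss handling described above (check the cache, fetch from the origin on a miss, deliver the response and keep a copy) is the classic cache-aside pattern. A minimal sketch, assuming a hypothetical `fetch_from_origin` callable in place of a real HTTP, FTP or NNTP client and an unbounded in-memory dictionary as the cache:

```python
class CachingSurrogate:
    """Surrogate that serves from its cache and fills misses from the origin."""

    def __init__(self, fetch_from_origin):
        # fetch_from_origin: hypothetical callable, url -> response body (bytes)
        self.fetch_from_origin = fetch_from_origin
        self.cache = {}          # unbounded and in-memory; a real cache would evict
        self.hits = 0
        self.misses = 0

    def get(self, url):
        if url in self.cache:                 # content present: serve it directly
            self.hits += 1
            return self.cache[url]
        self.misses += 1
        body = self.fetch_from_origin(url)    # pass the request on to the origin
        self.cache[url] = body                # keep a copy for later requests
        return body
```

Requesting the same URL twice reaches the origin only once; the second request is a cache hit.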
`
`Efficiency with cache
`
With the day-to-day growth in Internet usage, web content is growing exponentially, and
it is unlikely that cache capacity will grow at the same rate. Efficient utilization of
caches is therefore a must for better performance, and with hundreds or thousands of
caches across the Internet, the performance improvement could be substantial.
`
`
Effective cache management involves deciding which objects should be kept in the cache
and which need not be. Most of the algorithms used here take into account the
probability of the requested object being accessed again, based on how many times it
was accessed in the recent past.
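Least recently used (LRU) replacement is one widely used policy of this kind: it treats recent access as a predictor of future access and evicts the object that has gone unused the longest. The sketch below is a generic LRU cache, not an algorithm the paper prescribes; frequency-based policies such as LFU are a common alternative.

```python
from collections import OrderedDict


class LRUCache:
    """Keeps at most `capacity` objects; evicts the least recently used one."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()   # ordered from least to most recently used

    def get(self, key):
        if key not in self._items:
            return None               # cache miss
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used object
```

With a capacity of two, storing `a` and `b`, reading `a`, then storing `c` evicts `b`, because `a` was used more recently.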
`
`The CDN advantages
`
• Faster response time due to factors such as geographical proximity and network
proximity
• Support for different types of content, including on-demand and streaming media
• Support for secure delivery of content
• Efficient distribution of content to all resources (typically all the servers in the
content delivery network)
• Unique ways to improve network performance, thereby using bandwidth efficiently
`
`CDN service providers
`Some of the service providers who specialize in content delivery networks:
`
Akamai: www.akamai.com
Digital Island: www.digitalisland.com
Globix: www.globix.com
Mirror-Image: www.mirror-image.com
Ibeam: www.ibeam.com
CacheWare: www.cacheware.com
Inktomi: www.inktomi.com
Cache Flow: www.cache-flow.com
`
`Peering between CDNs
Content peering allows multiple content delivery network solution providers to
interoperate with each other. It is quite possible that two CDNs do not share the same
underlying technology or architecture implementation. It is also highly unlikely that a
single CDN solution provider spans all geographies. Content peering, or CDN peering, is
therefore a must given current trends.
`
Assume that two CDNs are peering. Each CDN has its own independent content director,
which handles request routing and load balancing, and its own surrogates, which furnish
data to clients or end-users. For peering to take place, there has to be some form of
communication between the components of the independent CDNs. This requires a
gateway for communication.
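A toy model of the peering arrangement: each CDN's content director serves the regions where it has surrogates and hands any other request to peer CDNs through the gateway. Everything here is hypothetical (the `ContentDirector` class, the region names, and modelling the gateway as a direct method call); real peering involves protocols for request routing, content distribution and accounting between administrative domains.

```python
class ContentDirector:
    """Request router of one CDN; unserved regions are forwarded to peers."""

    def __init__(self, name, regions, peers=None):
        self.name = name
        self.regions = set(regions)        # regions where this CDN has surrogates
        self.peers = list(peers or [])     # peer CDNs reachable via the gateway

    def route(self, client_region, visited=None):
        """Return the name of the CDN that will serve the client, or None."""
        visited = set() if visited is None else visited
        visited.add(self.name)             # guard against peering loops
        if client_region in self.regions:
            return self.name
        for peer in self.peers:
            if peer.name not in visited:
                served_by = peer.route(client_region, visited)   # crosses the gateway
                if served_by is not None:
                    return served_by
        return None                        # no peered CDN covers the region
```

Two CDNs that peer in both directions can each serve the other's clients, which is the combined reach the peering model aims for.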
`
The end result would be a virtual network in which all components of the different CDNs
work in unison to deliver content to end-users. An alliance of service providers and
content providers, the Content Alliance, created a content-peering group. The Content
Alliance was formed to facilitate interoperability of independent CDNs. While the
Content Alliance is focused on standards that would help CDNs peer, the Content Bridge
alliance is focused on new models that offer content providers optimal performance and
network reach.
`
`
`
`
`
`
`
On the industry alliance front, a working group called the Content Distribution
Internetworking (CDI) Working Group was formed, jointly led by representatives from
Content Bridge and Content Alliance. Initial areas of focus include content
distribution/injection, request routing and accounting, with the subsequent aim of
creating a common set of protocols for content peering.
`
`Typical architecture between peering CDNs
`
`Reference:
`
`http://www.ietf.org/internet-drafts/draft-green-cdnp-gen-arch-03.txt (IETF Internet Draft)
`
`
`
`
`
`Accounting in a CDN
`
`Log collection
Accounting information takes the form of log files for each of the individual components
in the CDN environment. There is generally a mechanism by which the logs of all
components are collected, logically broken down into chunks and analyzed. A CDN
solution provides content providers with accounting and billing information, all collected
within the CDN's administrative domain. Peering between CDNs introduces the need to
obtain similar accounting data from a foreign domain. This requirement means that
customers of a peered CDN service (publishers, clients and CDNs) must now have a
generalized or standard means of obtaining accounting information to support current as
well as planned business models.
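The collect, chunk and analyze steps can be sketched as merging log lines from several components and totalling delivered bytes per customer per hour. The log line format below is hypothetical and invented for illustration; real components emit their own formats (for example, web server access logs), each needing its own parser.

```python
from collections import defaultdict


def aggregate_logs(component_logs):
    """Merge logs from several CDN components; total bytes per (customer, hour).

    component_logs: mapping of component name -> iterable of log lines in a
    hypothetical format: "<ISO timestamp> <customer> <url> <bytes>".
    """
    totals = defaultdict(int)
    for lines in component_logs.values():
        for line in lines:
            timestamp, customer, _url, size = line.split()
            hour = timestamp[:13]              # "YYYY-MM-DDTHH" is the chunk key
            totals[(customer, hour)] += int(size)
    return dict(totals)
```

The resulting per-customer, per-hour totals are the kind of raw material a customer-specific accounting report would be built from.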
`
For example, to implement business models such as “Pay Per View”, there must be a
mechanism for authenticating and authorizing clients at the delivery point in a foreign
domain. In a typical CDN environment, a server is required to handle accounting and
billing requirements. This server provides the means for collecting data from the
individual components of a CDN.
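The paper does not specify how a foreign delivery point authorizes a pay-per-view client. One common technique is a signed token: the CDN that sold access signs (client, object, expiry) with a key it shares with the peer, and the peer's surrogate verifies the signature locally. The sketch below assumes such a pre-shared secret and a hypothetical token layout; it uses Python's standard `hmac` and `hashlib` modules.

```python
import hashlib
import hmac
import time


def issue_token(secret: bytes, client_id: str, url: str, expires: int) -> str:
    """Selling CDN: sign (client, object, expiry) once payment is accepted.

    Assumes client_id and url never contain the "|" separator.
    """
    payload = f"{client_id}|{url}|{expires}"
    sig = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}|{sig}"


def authorize(secret: bytes, token: str, url: str, now=None) -> bool:
    """Delivery point in the peer domain: verify the token without a callback."""
    parts = token.split("|")
    if len(parts) != 4:
        return False
    client_id, signed_url, expires, sig = parts
    payload = f"{client_id}|{signed_url}|{expires}"
    expected = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return False                        # forged or tampered token
    now = time.time() if now is None else now
    return signed_url == url and int(expires) > now
```

The constant-time `hmac.compare_digest` avoids leaking signature bytes through timing. Any change to the client, object or expiry invalidates the signature, so the peer can trust the token without contacting the selling CDN.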
`
The CDN accounting and billing mechanism should have access to individual component
logs and statistics, so that customer-specific reports can be generated. This mechanism
generally provides an interface through which customers/content providers access
accounting and billing information.
`
`
`
`
`
`
`
Access to content distribution between the various components, and the management of
these components, is generally protected by authentication mechanisms. Customers (e.g.
content providers, ISPs) are provisioned to have access to their own content and
resources.
`
The CDN solution provider bills the content provider on the basis of the quality of content
delivered, the speed of delivery, the amount of content delivered and performance over
the accounting period. The ISP bills the content provider for the resources used (i.e.
surrogates and caches) and for bandwidth savings.
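As a worked illustration of volume-based billing with a quality component, the sketch below charges per gigabyte delivered and applies a credit when measured availability falls below an SLA target. The rate, target and credit percentage are hypothetical parameters, not figures from this paper, and real contracts would add further terms such as the speed and performance measures mentioned above.

```python
GB = 10**9  # bytes per gigabyte (decimal convention assumed)


def cdn_charge(bytes_delivered, rate_per_gb, availability,
               sla_target=0.999, sla_credit=0.10):
    """Volume charge, reduced by `sla_credit` when the SLA target is missed.

    All rates, targets and credit percentages are hypothetical.
    """
    charge = bytes_delivered / GB * rate_per_gb
    if availability < sla_target:
        charge *= (1 - sla_credit)    # quality shortfall: apply the credit
    return round(charge, 2)
```

For example, 500 GB at a rate of 2.0 per GB costs 1000.0 when the availability target is met and 900.0 when it is missed.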
`
`Terminology used in content networking
`
`Browser cache
`
Comes into play when end-users hit the “Back” button to revisit a page they have already
seen. This content is generally stored in a section of the hard disk and is referred to as
the browser cache.
`
`Cache
`
A cache is a local copy of data accessed over a network, which makes subsequent access
to the same data faster. When data is read from or written to main memory, a copy is
also saved in the cache. When a request comes for content already present in the
cache, it is served to the client immediately. Note that the data is then read not from
main memory but from the cache (in hardware caches, the cache is built from faster
memory chips than main memory).
`
`Content consumers
`
`Users requesting a page.
`
`Content peering
`
No individual CDN can span every geography across the Internet. Content peering allows
diverse CDNs to interoperate, so that the resources of individual CDNs can be combined
for a larger reach.
`
`Content provider
`
The owner of the origin servers from which end clients/users request content. CDN
vendors provide different kinds of solutions to these content providers, who are their
customers.
`
`Domain name
`
A domain name is a portion of the naming-tree hierarchy that refers to general groupings
of networks based on organization type or geography.
`
Domain Name System (DNS)

DNS is a database of host information. A DNS name server answers queries that map
hostnames to IP addresses. Queries may come from DNS clients or from other DNS
name servers.
`
`
`
`
`
`
`
`Forward proxy
`
A forward proxy cache fetches content from origin servers when the content is not
present in its cache. In other words, client requests pass through the proxy cache and
go on to the origin server only when the content is not cached locally. Objects stored in
the cache are delivered faster; objects not present in the cache are fetched from the
origin server.
`
`HTTP
`
Hypertext Transfer Protocol (HTTP) runs over TCP/IP to transfer data over a network.
`HTTP defines how messages are formatted and transmitted and what action web servers
`and browsers should take in response to various commands.
`
`ISP
`
`An Internet Service Provider (ISP) is an entity that provides services on the Internet.
`
`Load balancing
`
Distributing processing load evenly across a computer network so that no single
computer on the network is overloaded.
`
`Proxy cache
`
LANs consist of many interconnected computers that are ultimately connected to the
Internet via a proxy, which can also function as a cache. When a proxy server is used,
computers on the LAN are not connected directly to the Internet; only the proxy server is
directly connected to the Internet.
`
Client software is then configured to connect through the proxy server. One example of
a proxy cache is the Squid proxy cache. A proxy cache can help a web site load faster
because the content is cached. Since a proxy cache has a large number of users behind
it, it reduces the latency involved in getting content. And since it is shared by a large
number of users, it also substantially reduces bandwidth utilization.
`
As the name suggests, a caching proxy is a proxy that also performs caching as one of
its activities. When content is requested from a browser, the caching proxy fetches the
content and saves a copy of it in the cache. If another client requests the same content,
it is delivered directly from the caching proxy instead of being fetched from the
Internet.
`
`Request routing
`
Any client request is redirected to a particular surrogate server depending on factors
such as network proximity, topological proximity and load balancing. This redirection is
performed by the content director and is also called request routing.
`
`Reverse proxy
`
`Reverse proxy is the name for certain alternate uses of a proxy server. It can be used
`outside the firewall to represent a secure content server to outside clients, preventing
`direct, unmonitored external access to the server’s data. It can also be used for
`replication; that is, multiple proxies can be attached in front of a heavily used server for
`load balancing.
`
`
`
`
`
`
`
`SLA
`
Service Level Agreements (SLAs) determine how content is distributed in a CDN
according to the customer’s preferences. An SLA is an agreement between the customer
and the CDN solution provider, which might involve resource provisioning for the
customer.
`
`Surrogate servers
`
Surrogate servers are located at the edge of the network. They are capable of handling
HTTP requests and servicing them through their associated cache; the responses are
served from that cache.
`
`Transparent caching
`
Transparent caching is essentially a form of forward proxy caching in which no
configuration is required on the browser. Transparent caching is often preferred
because it eliminates the need for administrative support.
`
`User agent
`
`End-user tool that sends a request.
`
`Web server
`
A web server is a process running on the operating system that enables users to access
resources published in the form of web pages over a wide area network (e.g. the
Internet).
`
`Conclusion: future market trends
Content delivery networks provide a platform and the capability to manage all forms of
communication. They help in moving towards efficient ways of managing and
distributing content across the edge and of understanding customer needs. Content
delivery solutions can be deployed to avoid the performance bottlenecks that result from
the continuous upsurge of applications on corporate intranets.
`
The goal of content delivery networks is essentially to optimize bandwidth usage and to
improve response times relative to what would exist without content delivery network
solutions in place. However, CDN solution providers now see newer opportunities in
value addition, such as content adaptation, personalization of content, advertisement
banner insertion, virus filtering and language translation. ICAP, an emerging content
adaptation protocol, has the potential to support these value-added services. Much of
content networking will depend on how the components involved are structured.
`
`References
`IETF works-in-progress.
`
`http://www.ietf.org/internet-drafts/draft-ietf-cdi-architecture-00.txt
`
`http://www.ietf.org/internet-drafts/draft-ietf-cdi-aaa-reqs-00.txt
`
`
`
`
`
`
`
`About HCL Technologies
HCL Technologies, with a revenue of US$297 million, is one of India’s leading IT
`services companies, providing a broad range of services to clients worldwide. Services
`include Technology Development, Software Product Engineering, Networking &
`Application Services and Business Process Outsourcing.
`
`HCL Tech focuses on technology as well as research & development outsourcing, with
`the objective of working with clients in areas at the core of their business. The focus on
`such mission critical projects and the ability to provide services throughout the life cycle
`of client products, from conceptualization to ongoing development and maintenance,
`enables HCL Tech to build long-term relationships with customers. These include
`software and hardware companies as well as large and medium sized organizations,
across diverse industries around the world. Market leaders such as Cisco Systems,
Novell, RSA Security and KLA Tencor feature among HCL Tech’s clients.
`
`HCL Tech delivers services through an extensive offshore software development
`infrastructure in India and a vast global marketing and project network that enables
`scalable, flexible and cost-effective delivery. The company’s offshore model involves
`delivery of outsourcing services to clients abroad, by technical professionals located at
`the software development centers in India and may also include onsite work at the client
`site, on a short-term project-by-project basis. As of March 31, 2002, HCL Tech had 5945
`employees including JVs and subsidiaries. The company is thus able to capitalize on the
`advantages inherent to the Indian IT sector, including access to a large pool of skilled
`Indian technical professionals who deliver high-quality, globally competitive services at a
`significantly lower cost than in the United States.
`
`The offshore model fosters strong client relationships because some clients also make
`substantial capital investments in the dedicated offshore development centers set up
`exclusively for them. HCL Tech’s extensive marketing network comprises 21 marketing
`offices in 14 countries. Since inception, HCL Tech has emphasized the importance of
`building skills in emerging technologies by focusing on research and development
`activities for clients. The company’s R&D heritage stems partly from the early efforts of
`several key senior personnel who were actively involved in research and development
`related to the design of computer hardware and systems software products for the Indian
`market in the 1980s. HCL Tech continues to develop its IT services business by
leveraging the unique skills and know-how of these executives and other employees.
`
`
`
`