`
`
`
`
`A Self-Adapting Web Server Architecture: Towards Higher Performance and
`Better Utilization
`
`Khalid Al-Issa
`Farag Azzedin
`Information and Computer Sciences Department
`King Fahd University of Petroleum & Minerals
`Dhahran, Saudi Arabia
`fazzedin@kfupm.edu.sa
`khalid.issa@aramco.com
`
`ABSTRACT
`
The way in which a Web server handles I/O operations has a significant impact on its performance. Servers that allow blocking for I/O operations are easier to implement, but exhibit less efficient utilization and limited scalability. On the other hand, servers that allow non-blocking I/O usually perform and scale better, but are harder to implement and have limited functionality. This paper presents the design of a new, self-adapting Web server architecture that decides how future I/O operations will be handled based on load conditions. The results obtained from our implementation of this architecture indicate that it is capable of providing competitive performance and better utilization than comparable non-adaptive Web servers under different load levels.
`
`KEYWORDS: Operating systems; Internet and Web com-
`puting; Synchronous and asynchronous I/O; Concurrency
`
`1. INTRODUCTION
`
Performance is a vital factor behind the success of Web-based services. As a result, improving the performance of Web servers has become a critical issue. In fact, the tremendous growth of Web-based services and applications over the past several years, the growth of network bandwidth, and the presence of a very demanding, large and growing community of Web users are all expected to put more pressure on Web servers [16] [5] [2] [19] [18].

In general, performance can be viewed as either macro or micro performance [20]. A Web server's macro performance refers to the side of performance observed by clients, including throughput and response time. Micro performance, on the other hand, represents the server's internal performance, including metrics such as clock Cycles Per Instruction (CPI) and cache hit rate. While both classes of enhancement contribute to the overall performance of a Web server, they differ in complexity, significance and effectiveness. For instance, a simple approach to improving overall performance is replication, which is used precisely to give better macro performance by providing multiples of the original throughput. This approach, however, only provides a workaround that still suffers from the same set of issues present in the server architecture [19]. A more effective alternative is to enhance a Web server's micro performance, which not only improves overall performance but also allows major limitations and defects to be eliminated. As an example, earlier work has shown that a server can satisfy its clients with better throughput and response time if its cache locality is improved [11], or if more client requests are taken every time the server accepts new requests [5] [3].
`
The primary task of Web servers is to deliver Web contents in a concurrent fashion. In order to deliver contents, frequent disk I/O operations have to take place, and that hurts concurrency significantly. Concurrency is generally achieved either through asynchronous system calls that avoid blocking the server, or through multiple server instances using threads or sub-processes, in which case the use of synchronous system calls becomes acceptable. The former approach is typically used in the Single-Process Event-Driven (SPED) Web server architecture. As shown in Figure 1, a client request is first accepted by the SPED server process. Then, the server performs all necessary computations as requested by the client. In case I/O operations are needed, the server enqueues these I/O requests against an asynchronous system call like select(), which relieves the server from having to block waiting for the I/O operation to complete. While the I/O is being performed, the server may start processing computations associated with another client request.
`
`978-1-4244-4907-1/09/$25.00 ©2009 IEEE
`
`96
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2151, p. 2
`
`
`
Figure 1. The SPED Web Server Architecture

Figure 2. The Multi-Threaded Web Server Architecture

While the SPED model works well when serving cached contents, it is not efficient when contents must be fetched from disk, in which case the server is expected to interleave the serving of requests with slow I/O operations [16] [13]. In addition, the asynchronous system calls available for implementing concurrency in this model cause the server process to actually block in certain cases [16] [8]. The alternative to achieving concurrency through asynchronous calls is to run multiple server instances, as either multiple processes or multiple threads. As shown in Figure 2, in a multi-threaded server, a thread is responsible for processing all computations as well as any needed I/O associated with the client request. While the multi-process and multi-threaded architectures are rather easier to implement [13] [4], they exhibit relatively low utilization of a server's resources since they allow processes and threads to block. In addition, the fact that every connection gets assigned a unique server thread or process has a negative impact on scalability. Moreover, the introduction of persistent connections in HTTP 1.1, which permits a connection to stay active while different objects are transferred, allows for even less efficient utilization of the server [4].

The fact that both alternatives have significant limitations in concurrently handling I/O operations and client requests has motivated researchers to: (1) come up with hybrid architectures that combine certain features of the two approaches [16] [19] [4], (2) implement libraries to replace available, less efficient system calls [21] [8] [12], and (3) suggest and implement enhancements to these two architectures [11] [2] [5] [1] [9] [13] [3]. What is common to all the proposals we survey is that their technique for handling I/O operations is pre-determined, and cannot adapt to the continuously changing state of the server. However, there are situations in which allowing a server thread to block is more cost-effective. There are also cases in which a server thread is so expensive that it should be allowed to perform only computations, and no I/O. As a result, we believe that a server should be allowed to determine the best I/O scheme to follow based on load conditions.

This research work proposes the algorithm and structure of a self-adapting, multi-threaded server architecture that has the ability to switch between two different I/O schemes depending on load conditions. To the best of our knowledge, this is the first proposed adaptive Web server model that can follow more than one I/O scheme.

1.1. Motivation

Enhancing the way a Web server handles I/O has been an active topic in the literature over the last decade. The key motivation has been to propose ideas for higher concurrency despite the long latency of disk I/O. Hybrid architectures, which rely on combinations of the two original approaches, are among the most interesting proposed ideas. For instance, one of the first hybrid models suggests employing a pool of helper processes whose role is to perform I/O. This relieves SPED servers from the need for inefficient asynchronous I/O, and greatly increases these servers' capacity.

While different hybrid models employ different techniques for achieving the primary goal of improving performance through increased utilization and scalability, they are common in that they enforce a fixed I/O scenario. Hybrid models that allow blocking I/O to take place allow it even under overloaded conditions. Similarly, models that utilize helper processes pass I/O requests to helper processes even under lower load conditions, in which case blocking might be both faster and less of an overhead. This motivates us to propose a self-adapting Web server architecture and evaluate its effectiveness in outperforming non-adaptive architectures in both throughput and response time under different load conditions.

1.2. Objectives

The main objective of this research work is to outline two major limitations present in today's widely used I/O model in Web servers: the synchronous blocking I/O model.
`
`97
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2151, p. 3
`
`
`
Scalability is highly affected by allowing server threads to block for I/O. This consequently has negative effects on a Web server's overall performance, as it limits throughput and increases response time. In addition, allowing server threads to perform blocking I/O operations represents an inefficient utilization of this valuable resource. Utilization becomes a critical issue as the rate of incoming client requests increases while server threads sit idle over I/O.
`
We still believe that blocking for I/O is the right choice under certain circumstances. Therefore, we propose in this paper a Web server architecture that can use both the blocking and the non-blocking I/O models under different load conditions to enhance performance and provide better utilization of server threads.
`
`1.3. Contributions
`
This research work contributes to the literature, first, by providing a survey and classification of current Web server architectures. Over the past several years, a number of architectures have been proposed to overcome limitations in the original models, as well as to improve performance and cope with the increasing popularity of Web-based services. In this research, we present a survey of these models along with a classification based on how they handle I/O requests.

Second, this research work brings to attention the need for adaptability in Web servers. This feature enables the server to choose a more practical work scenario depending on past, current, or foreseeable circumstances.

Third, this research work promotes the use of asynchronous I/O for multi-threaded servers. The greatly improved scalability obtained with this technique, compared to the common synchronous blocking alternative, justifies it very well.

Last, this research work introduces a performance evaluation of an implementation of the proposed Web server architecture.
`
The paper is organized as follows: Section 2 presents a survey and a classification of existing Web server architectures. In Section 3, we explain the I/O models and outline the strengths and weaknesses of each of them. Section 4 describes the advantages of implementing self-adaptability in Web servers. Then, we describe the internals of the self-adapting Web server architecture in Section 5. In Section 6, we present the results obtained from our experiments, in which we compare the performance of an implementation of the self-adapting Web server model to non-adaptive Web servers. We then explain how utilization is enhanced in the self-adapting Web server architecture in Section 7. Finally, Section 8 presents a conclusion of this paper, along with plans for future work.
`
Table 1. Some Existing Web Server Architectures

Year  Contribution       Class  Remarks
1999  AMPED              1      A SPED server that passes I/O requests to
                                helper processes or threads.
2001  Cohort scheduling  3      The order of executing threads is changed in
                                order to execute similar computations
                                consecutively, which reduces cache misses.
2001  SEDA               1      A pipelined server that consists of multiple
                                stages, each associated with a pool of threads.
2001  Multi-Accept       3      Instead of accepting a single incoming
                                connection, a bulk of incoming connections are
                                taken every time accept() is called.
2003  Capriccio          2      A multi-threaded package that uses
                                asynchronous I/O and provides high scalability.
2004  Lazy AIO           2      A new asynchronous I/O library meant to
                                resolve issues with the available asynchronous
                                libraries.
2005  Hybrid             1      A multi-threaded server that employs an
                                event-dispatcher to resolve issues with
                                allowing persistent HTTP connections.
2007  SYMPED             1      This server employs multiple SPED instances.
2008  MEANS              2      A software architecture that uses micro-threads
                                for scheduling event-based tasks to Pthreads.
`
`2. LITERATURE REVIEW
`
The multi-threaded and the event-based architectures are the original approaches for implementing a server. As each of the two has its own limitations and areas for improvement, many proposals over the last several years have suggested and implemented enhancements to overcome limitations and improve performance. We classify these proposals into three classes: (1) proposals for hybrid architectures, (2) proposals that suggest replacement libraries, and (3) proposals that disregard the I/O issue and focus on other aspects to improve performance. Table 1 summarizes our classification of the available Web server architectures.
`
`2.1. Proposals for Hybrid Approaches
`
The first class of these proposals focused on deriving hybrid architectures that combine features from the two original models. One of the early attempts was the asymmetric multi-process event-driven (AMPED) architecture [16], which provides a more effective solution for performing I/O operations in SPED servers, in which asynchronous system calls like select() are used. The AMPED architecture is similar to SPED in that it typically runs a single thread of execution. However, instead of performing asynchronous I/O using descriptors through select(), AMPED passes I/O operations to helper processes. As a result, if blocking for I/O ever takes place, only helper processes have to sit idle, and not the server's process.

A more recent hybrid model is the Staged Event-Driven Architecture (SEDA) [19]. In SEDA, the entire workflow of processing requests is restructured into a sequence of stages, which makes it very similar to a simple pipeline. The main motivation behind introducing SEDA is to provide massive concurrency by allowing pools of threads to handle specific sets of tasks at the same time.

D. Carrera et al. [4] propose a solution for the case in which an idle client continues to hold a server thread, an undesirable consequence of allowing persistent connections in HTTP/1.1 for multi-threaded servers. To resolve this issue, they introduce a hybrid model in which an event-dispatcher is used to identify sockets with readable contents and assign them to a server thread, which then reads and processes the request.

D. Pariag et al. [17] propose the Symmetric Multi-Processor Event Driven (SYMPED) architecture. The SYMPED model consists of multiple SPED instances working together to increase the level of concurrency. Whenever one of these instances blocks for disk accesses, other instances can take over processing client requests.

2.2. Proposals for Replacement Libraries

The second class of proposals started from the fact that available asynchronous system calls are not efficient, and suggested that they should be replaced. For instance, the way select() works requires it to block when used on disk I/O [16] [8]. Proposals in this area focus on providing replacements for these limited libraries. Elmeleegy et al. [8] proposed Lazy Asynchronous I/O (LAIO), an asynchronous I/O interface that better supports non-blocking I/O and is more appropriate for event-driven programming. LAIO basically provides a non-blocking counterpart for each blocking system call. It handles blocking I/O operations for the application sitting on top, while the application is allowed to move on with processing other requests.

Capriccio [1] is a scalable thread package that has the ability to scale up to 100,000 threads. This package was designed to resolve the scalability issue of the multi-threaded architecture. The high scalability of this solution was achieved through the use of epoll(), an asynchronous I/O interface that has proven to perform better than both the select() and the poll() interfaces with the right optimizations [9].

Lei et al. [12] propose MEANS, a micro-thread software architecture that consists of two thread layers sitting between the application and the operating system. An application that makes use of MEANS assigns work to MEANS micro-threads, which in turn assign tasks, in an event-based scenario, to Pthreads interacting directly with the operating system.
`
2.3. Proposals for Modifying Other Architectural Components
`
The last class of proposals focused on enhancing the way the original models operate, while allowing the same I/O scenarios to take place. Chandra et al. [5] and Brecht et al. [3] suggest modifying the way a Web server accepts new connections. In the case of SPED, instead of accepting only a single new connection every time the server checks for incoming connections, they suggest accepting multiple connections [5]. This method increases the rate of accepting new connections, and increases the concurrency of the server by providing more work that is ready to be processed at any instant.

Larus et al. [11] suggest enhancing Web server performance by increasing locality and minimizing cache misses. They propose a server model in which different requests are analyzed to identify similar computations. The order in which these requests are processed is altered to allow similar computations to be processed consecutively. Executing similar computations as a group increases locality, and consequently improves performance.
`
3. I/O MODELS

Whenever an I/O operation needs to be performed by a Web server, the operating system actually takes care of it. This is for many reasons, including maintaining a layer of security through which only privileged applications are granted access to certain files. While the I/O operation is being carried out, a server is either blocked waiting for it to complete, or free to process other requests. This depends entirely on the type of I/O the server initiated.
`
`3.1. Synchronous I/O Operations
`
Synchronous I/O can be blocking or non-blocking to the calling process [10]. While both are performed through the same read and write system calls, the non-blocking variant requires the O_NONBLOCK option to be set when the open() system call is issued. The main issue with the synchronous non-blocking model is that it requires the calling process to issue numerous calls to check the status of the requested I/O. As a result, this model is known to be extremely inefficient [10].
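
For illustration, the following is a minimal sketch of this pattern, assuming a hypothetical FIFO path; note that on Linux, O_NONBLOCK has this effect on descriptors such as pipes and sockets, while reads of regular files may still block:

    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* A minimal sketch of synchronous non-blocking I/O: open with
     * O_NONBLOCK, then repeatedly call read(), which fails with EAGAIN
     * until data is ready. These repeated status checks are exactly the
     * inefficiency described above. The path is a hypothetical example. */
    int main(void) {
        char buf[4096];
        int fd = open("/tmp/example.fifo", O_RDONLY | O_NONBLOCK);
        if (fd < 0) { perror("open"); return 1; }
        for (;;) {
            ssize_t n = read(fd, buf, sizeof(buf));
            if (n >= 0) break;                      /* data read, or EOF */
            if (errno != EAGAIN && errno != EWOULDBLOCK) {
                perror("read");                     /* a real error */
                break;
            }
            /* Not ready yet: the caller must come back and ask again. */
        }
        close(fd);
        return 0;
    }
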
The synchronous blocking model, on the other hand, is widely used in multi-threaded and multi-process Web servers, where the presence of multiple instances of the server makes the undesirable blocking less significant. There are, however, major issues with this I/O model. Particularly for multi-threaded Web servers, allowing threads to block largely limits the server's scalability, leading to degraded performance with low throughput and high response time. In addition, allowing server threads to block introduces idle time and makes inefficient utilization of valuable resources.
`
`3.2. Asynchronous I/O Operations
`
Just like the case with synchronous I/O, asynchronous I/O can be either blocking or non-blocking [10]. A well-known example of the asynchronous blocking I/O scheme is the select() system call [7]. Through this system call, the calling process can add new asynchronous I/O requests and take the ones that are complete for processing. The select() system call keeps a list of file descriptors for every request it is processing. Every time a select() is issued, the calling process is expected to block for a period of time to allow some I/O to complete.
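
As an illustration, here is a minimal sketch of this scheme; the two descriptors are hypothetical placeholders and error handling is omitted:

    #include <sys/select.h>
    #include <unistd.h>

    /* A minimal sketch of the asynchronous blocking scheme built on
     * select(): register the descriptors of interest, block inside
     * select() until at least one is readable, then process it. The
     * descriptors fd1 and fd2 are hypothetical placeholders. */
    void wait_and_serve(int fd1, int fd2) {
        char buf[4096];
        fd_set readable;
        int maxfd = (fd1 > fd2) ? fd1 : fd2;

        FD_ZERO(&readable);
        FD_SET(fd1, &readable);
        FD_SET(fd2, &readable);

        /* Blocks (no timeout) until some registered descriptor is ready. */
        if (select(maxfd + 1, &readable, NULL, NULL, NULL) > 0) {
            if (FD_ISSET(fd1, &readable)) read(fd1, buf, sizeof(buf));
            if (FD_ISSET(fd2, &readable)) read(fd2, buf, sizeof(buf));
        }
    }
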
`
In the asynchronous non-blocking system calls, the calling process returns immediately after initiating an I/O request. This class of I/O operations is performed through system calls present in the Asynchronous I/O (AIO) library, including aio_read() and aio_write(). Once an AIO operation is initiated, it is carried out to completion for the calling process, which then needs to use some notification mechanism to identify completed I/O requests. Two of the widely used notification mechanisms are signals and polling. With signals, the kernel sends a signal to the calling process once its requested I/O operation is complete. With polling, on the other hand, a dedicated thread is typically used to poll the status of a list of I/O requests passed to it by the calling process.
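
For illustration, the following is a minimal sketch of initiating an asynchronous read through the POSIX AIO interface and polling for its completion (linked with -lrt on Linux); the descriptor and buffer are hypothetical:

    #include <aio.h>
    #include <errno.h>
    #include <string.h>
    #include <sys/types.h>

    /* A minimal sketch of asynchronous non-blocking I/O with POSIX AIO:
     * aio_read() returns immediately after queuing the request, and the
     * caller later polls aio_error() to detect completion. */
    ssize_t async_read_then_poll(int fd, char *buf, size_t len) {
        struct aiocb cb;
        memset(&cb, 0, sizeof(cb));
        cb.aio_fildes = fd;            /* hypothetical open descriptor */
        cb.aio_buf    = buf;
        cb.aio_nbytes = len;
        cb.aio_offset = 0;

        if (aio_read(&cb) != 0)        /* initiate; does not block */
            return -1;

        /* The caller is free to do other work here; eventually it polls. */
        while (aio_error(&cb) == EINPROGRESS)
            ;                          /* a real server would serve other
                                          requests instead of spinning */

        return aio_return(&cb);        /* bytes read, or -1 on failure */
    }
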
`
While in synchronous blocking I/O the calling process has a limit on how many requests it can be processing at a given time, asynchronous I/O enables the calling process to accept much more work. This is largely because a worker thread is relieved from having to handle a client request to completion. This provides a boost to the server's scalability, and consequently enables the server to sustain higher throughput with lower response time. In addition, utilization of server threads is kept high since they are devoted to processing and not to sleeping over I/O. There are, however, issues with the asynchronous non-blocking approach. There are situations where blocking is preferable: under lower workloads, as well as for shorter I/O requests, blocking becomes a more convenient alternative and is expected to perform better. In addition, the need for notification mechanisms both complicates the rather simple I/O scenario and introduces an overhead that can have significant drawbacks. Last but not least, there are limits on how many AIO requests a kernel allows at a given time [6].
`
`4. THE NEED FOR SELF-ADAPTABILITY
`
Existing Web servers follow a fixed I/O scheme at all times, despite the fact that different I/O operations have varying costs and are performed under different load conditions. For instance, if we look at a server that allows blocking I/O to take place, we know that concurrent I/O requests cause several threads to block until the I/O is complete. While this is acceptable under lower load conditions, it becomes undesirable as the server gets overloaded, since it highly limits the scalability of the server. Inefficient utilization is also an issue here, since the server is allowed to sit idle over I/O while the request queue is building up.

The basic alternative of allowing only non-blocking I/O to take place has limitations too. While this may significantly enhance utilization and improve performance, it introduces the overhead of notification, which might not be well justified under low load. For instance, polling is a notification technique in which an I/O request, initiated by a server thread, is regularly checked for completion by a polling thread. When the server is under lower load conditions, this I/O scenario becomes quite overwhelming. As a matter of fact, it would be more practical and reasonable to allow threads to block in such a case rather than introduce the overhead of asynchronous I/O.

This brings to attention the need for a Web server to adapt to different work scenarios under different load conditions in order to ensure the best possible performance. More precisely, the use of blocking I/O is very practical whenever incoming requests are within a Web server's capacity. Beyond the server's capacity, the use of non-blocking I/O increases scalability, giving better chances for higher performance, and utilizes server threads more fully and more appropriately.
`
5. THE SELF-ADAPTING WEB SERVER ARCHITECTURE

The self-adapting Web server architecture is similar to the multi-threaded Web server model, with the distinction that it allows for two types of I/O: synchronous blocking and asynchronous non-blocking I/O operations. Synchronous I/O takes less effort from the server itself, but leads to limiting both utilization and scalability. Asynchronous I/O comes with its own overhead, but is very useful at times when we need the best scalability and utilization of server resources.
`
`100
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2151, p. 6
`
`
`
Figure 3. The Self-Adapting Web Server Work Scenario ([a] lower load: worker threads write to clients while the polling thread is idle; [b] higher load: the polling thread is in use and writes completed asynchronous reads back to clients)
`
The criteria of adaptability can follow more than one pattern, as long as they involve well-defined and easily detectable conditions. For example, in our implementation we use the presence of queued requests as an indicator of overload.

In our implementation of the self-adapting server model, we use the pools of threads explained in [15]: a pool of 64 worker threads, and a pool consisting of a single polling thread for detecting completed asynchronous I/O requests. While the use of asynchronous I/O is not very common in multi-threaded servers, it has many advantages. With synchronous I/O, the maximum number of requests being serviced at any time equals the number of threads. Using asynchronous I/O, however, worker threads are able to initiate I/O requests and move on to servicing other client requests, which greatly improves scalability. The default I/O scheme for our server is synchronous blocking, and the server only switches to using asynchronous I/O as load increases (see Figure 3).

As can be seen in Figure 3, client requests are expected to get assigned to worker threads almost instantly under lower load conditions, without having to wait for long periods of time in the queue. As the rate of incoming requests increases, fewer worker threads remain free, and we reach a point at which all worker threads are busy processing client requests. At this point, incoming requests start piling up in the queue of the thread pool; the server notices that and instructs worker threads to perform their next I/O requests as asynchronous non-blocking operations instead. This continues to be the case until all queued requests are assigned to worker threads, after which the server may resume the default working scenario of using the synchronous I/O scheme.

Figure 4 shows a flowchart of the algorithm used to build our server.
`
Figure 4. Adaptability Algorithm in the Self-Adapting Web Server Architecture
`
The upper-most block in the diagram, labeled (a), represents the work done by the main thread. This thread is basically the server's dispatcher, which binds to the listening port and continuously assigns incoming requests to the queue of the thread pool. The block in the middle of the diagram, labeled (b), shows the work done by a worker thread. First, the worker thread dequeues a work item from the queue and starts processing it. When it gets to the point where it needs to perform I/O, the worker thread checks for the presence of outstanding requests in the queue; if there are none, it performs a blocking I/O operation. If the queue is not empty, however, that is a good indicator that all other worker threads are busy and that the server is about to be, or already is, overloaded. As a result, the worker thread initiates an asynchronous I/O request and enqueues a work item associated with it for the polling thread. That moves us to the third block in the diagram, labeled (c), which represents the work done by the polling thread. Once the polling thread dequeues this work item, it is able to identify the I/O request in charge and check on its status. If the I/O is found to be complete, the polling thread responds back to the client; otherwise, it enqueues the work item back for later.
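
To make this decision concrete, the following is a minimal sketch of the worker-thread branch described above. It is not the actual server code; the request structure and the queue_len()/queue_push_for_polling() helpers are hypothetical placeholders:

    #include <aio.h>
    #include <string.h>
    #include <unistd.h>

    /* Hypothetical request descriptor and queue helpers, for illustration. */
    struct request {
        int client_fd;                 /* connected client socket */
        int file_fd;                   /* requested file on disk */
        char buf[8192];
        struct aiocb cb;
    };
    extern int  queue_len(void);                     /* waiting requests */
    extern void queue_push_for_polling(struct request *r);

    /* Sketch of the adaptability branch: block for I/O when the pool's
     * queue is empty (low load); otherwise initiate asynchronous I/O and
     * hand the request to the polling thread (server is overloaded). */
    void serve_io(struct request *r) {
        if (queue_len() == 0) {
            /* No queued requests: blocking here is cheaper than AIO. */
            ssize_t n = read(r->file_fd, r->buf, sizeof(r->buf));
            if (n > 0)
                write(r->client_fd, r->buf, (size_t)n);
        } else {
            /* Requests are piling up: initiate AIO and stay available. */
            memset(&r->cb, 0, sizeof(r->cb));
            r->cb.aio_fildes = r->file_fd;
            r->cb.aio_buf    = r->buf;
            r->cb.aio_nbytes = sizeof(r->buf);
            if (aio_read(&r->cb) == 0)
                queue_push_for_polling(r);  /* polling thread finishes it */
        }
    }
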
`
5.1. Implementation

The self-adapting Web server architecture employs both synchronous and asynchronous system calls for performing I/O. For synchronous I/O, it uses the system's read() and write(). For asynchronous I/O, it uses the Asynchronous I/O (AIO) set of library calls, which is included with the 2.6 Linux kernel. The main reason we elected to use this library is that it includes true non-blocking calls like aio_read(), which our worker threads use to initiate an asynchronous I/O operation before moving on to servicing another request. The programming language used for implementing the server was C, and the code was compiled using GCC.
`
6. PERFORMANCE EVALUATION

In order to evaluate the performance of the self-adapting Web server model, we compare it to two non-adaptive servers we developed. The three Web servers we developed differ only in the way I/O is performed, while everything else is kept the same. In addition, we compare our model to a similarly configured version of the Apache server, which represents the state of the art in Web servers. The goal of comparing to Apache is to provide a sense of how our servers perform against a fully optimized, production-ready server under the same workload.
`
`6.1. Performance Metrics
`
Web server performance is usually measured in terms of both throughput and response time. The throughput of a Web server is given by:

    Throughput = R / T        (1)

where R represents the number of successfully completed requests, and T represents the total time that was needed to complete them. Throughput provides a measure of how many requests a Web server can successfully complete in a unit of time. The other metric, response time, represents the time between the completion of a request's submission and the beginning of its response, and is often measured in milliseconds.
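
As a hypothetical illustration of Equation (1), a server that successfully completes R = 6,000 requests over T = 30 seconds achieves a throughput of 6,000 / 30 = 200 requests per second.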
`
`6.2. Testing Environment
`
Besides the self-adapting Web server, which we explained in Section 5, we implemented a multi-threaded server that is blocking I/O (BIO)-based. This server is composed of a main thread and a pool of worker threads. The main thread binds itself to the listening port and then works as a dispatcher, assigning incoming client requests to worker threads, which are responsible for processing all requests to completion. This processing includes parsing the HTTP header, validating the request, performing blocking I/O requests to obtain the requested Web files, and sending the responses back to clients.
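
As an illustration, a minimal sketch of the main-thread dispatcher just described is shown below; the port number and the pool_enqueue() helper are hypothetical placeholders:

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Hypothetical thread-pool API, for illustration only. */
    extern void pool_enqueue(int client_fd);

    /* Sketch of the BIO-based main thread: bind to the listening port,
     * then dispatch each accepted connection to the worker-thread pool. */
    int main(void) {
        int lfd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr = {0};
        addr.sin_family      = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port        = htons(8080);   /* hypothetical port */

        bind(lfd, (struct sockaddr *)&addr, sizeof(addr));
        listen(lfd, 128);

        for (;;) {
            int cfd = accept(lfd, NULL, NULL);  /* blocks for a client */
            if (cfd >= 0)
                pool_enqueue(cfd);  /* a free worker thread parses the HTTP
                                       header, performs blocking I/O, and
                                       writes the response back */
        }
    }
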
`
We also implemented a multi-threaded server that is asynchronous I/O (AIO)-based. In addition to the main thread and the pool of worker threads present in the BIO-based server, the AIO-based server has a dedicated polling thread whose role is to verify completion of initiated I/O requests. In this server, the main thread works similarly to the one in the BIO-based server, but the way worker threads work differs. The rule of thumb here is that no worker thread is allowed to perform I/O operations. As a result, when an I/O operation is needed, a worker thread only initiates it and is then freed up for processing another client request. Of course, some other component has to follow up on the I/O operation that has just been initiated, and this is the job of the polling thread. More precisely, worker threads use the aio_read() call to initiate an I/O read operation. Since this call is non-blocking, the thread is expected to return from it immediately. Just before it is allowed to go back to the pool, the worker thread enqueues a request for the polling thread to track the initiated I/O operation. After that, the polling thread dequeues the request and checks the status of its associated I/O operation. If the I/O is complete, the polling thread responds back to the client. Otherwise, it enqueues the request back to be re-checked later.
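
For illustration, a minimal sketch of such a polling loop is given below, reusing the hypothetical request structure and queue helpers from the earlier sketch; the actual implementation may differ:

    #include <aio.h>
    #include <errno.h>
    #include <sys/types.h>

    /* Hypothetical helpers, matching the earlier sketch. */
    struct request;
    extern struct request *queue_pop_for_polling(void);   /* blocking pop */
    extern void            queue_push_for_polling(struct request *r);
    extern struct aiocb   *request_aiocb(struct request *r);
    extern void            respond_to_client(struct request *r, ssize_t n);

    /* Sketch of the polling thread: dequeue a tracked AIO request, check
     * its status, respond to the client if complete, else re-enqueue it. */
    void polling_thread_loop(void) {
        for (;;) {
            struct request *r  = queue_pop_for_polling();
            struct aiocb   *cb = request_aiocb(r);
            if (aio_error(cb) == EINPROGRESS) {
                queue_push_for_polling(r);      /* still in flight */
            } else {
                ssize_t n = aio_return(cb);     /* bytes read, or -1 */
                respond_to_client(r, n);        /* write back to client */
            }
        }
    }
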
`
Apache is a well-known Web server that follows the original multi-threaded model, with blocking I/O operations performed as needed. In this experiment, Apache was reco