throbber
J Grid Computing (2007) 5:235–250
`DOI 10.1007/s10723-007-9068-6
`
`Benefits and Drawbacks of Redundant Batch Requests
`
`Henri Casanova
`
`Received: 5 September 2006 / Accepted: 12 January 2007 / Published online: 6 February 2007
`© Springer Science + Business Media B.V. 2007
`
`Abstract Most parallel computing platforms are
`controlled by batch schedulers that place requests
`for computation in a queue until access to com-
`pute nodes is granted. Queue waiting times are
`notoriously hard to predict, making it difficult for
`users not only to estimate when their applications
`may start, but also to pick among multiple batch-
`scheduled platforms the one that will produce
`the shortest turnaround time. As a result, an in-
`creasing number of users resort to “redundant
`requests”: several requests are simultaneously
`submitted to multiple batch schedulers on behalf
`of a single job; once one of these requests is
`granted access to compute nodes, the others are
`canceled. Using simulation as well as experiments
`with a production batch scheduler we evaluate
`the impact of redundant requests on (1) average
`job performance, (2) schedule fairness, (3) sys-
`tem load, and (4) system predictability. We find
`that some of the popularly held beliefs about
`the harmfulness of redundant batch requests are
`unfounded. We also find that the two most crit-
`
`This work was supported by the NSF under Award
`0546688.
`
`H. Casanova (B)
`
`Department of Information and Computer Sciences,
`University of Hawai‘i at Manoa, 1680 East–West Rd.,
`Post 317, Honolulu, HI 96822, USA
`e-mail: henric@hawaii.edu
`
`ical issues with redundant requests are the ad-
`ditional load on current middleware infrastruc-
`tures and unfairness towards users who do not
`use redundant requests. Using our experimental
`results we quantify both impacts in terms of the
`number of users who use redundant requests and
`of the amount of request redundancy these users
`employ.
`Key words job scheduling · batch scheduling ·
`redundant requests
`
`1 Introduction
`
`Most parallel computing platforms are accessed
`via batch schedulers [5] to which users send
`requests specifying how many compute nodes
`they need for how long. Batch schedulers can
`be configured in various ways to implement ad-
`hoc resource management policies and may main-
`tain multiple queues of pending requests. Most
`batch schedulers use “backfilling,” which allows
`some requests to jump ahead in a queue to re-
`duce queue fragmentation. Backfilling may hap-
`pen when a request is submitted, canceled, or
`when a job runs for less time than initially re-
`quested (which is common). The above makes
`queue waiting time difficult to predict. Some batch
`schedulers can provide an estimate of queue wait-
`ing time based on the current state of the queue.
`
`Data Co Exhibit 1050
`Data Co v. Bright Data
`
`

`

`236
`
`H. Casanova
`
`3.
`
`4.
`
`Impact on system load redundant requests
`cause higher load on the batch schedulers, on
`the network, and on the middleware infra-
`structure used to access remote platforms. The
`question is: do redundant requests cause any
`of the system’s component to become a bot-
`tleneck, and if so, which one?
`Impact on predictability submissions and can-
`cellations of redundant requests cause churn
`in batch queues, which likely makes them less
`predictable. The question is: what is the de-
`crease in queue waiting time prediction accu-
`racy when redundant requests are used?
`
`To answer the above questions we use simu-
`lations, real-world experiments, and analysis of
`results obtained by others. We find that several
`popularly held beliefs regarding the negative ef-
`fects of redundant batch requests are unfounded.
`For instance, our experiments show that a batch
`scheduler, even when running on a mere 1 GHz
`Pentium III processor, can most likely handle
`large amounts of request redundancy without be-
`coming a bottleneck. In fact, we find that the two
`main issues with redundant requests are: (1) ad-
`ditional load on the middleware; and (2) fairness
`towards users who do not use redundant requests.
`This paper is organized as follows. Section 2
`presents background on redundant requests and
`discusses related work. Sections 3, 4, 5, and 6 focus
`on the four questions above. Section 7 concludes
`the paper with a summary of our findings and with
`perspectives on future work.
`
`2 Background and Related Work
`
`Redundant requests can be sent to:
`
`Unfortunately, these estimates do not take back-
`filling into account, which makes them pessimistic.
`Conversely, they do not take future submissions
`of high priority requests into account either,
`which makes them optimistic. Although very re-
`cently developed forecasting methods for estimat-
`ing lower or upper bounds on queue waiting time
`with certain levels of confidence are promising [1],
`most users today have at best a fuzzy notion of
`what queue waiting times to expect. At the same
`time many of these users have access to multiple
`batch-scheduled platforms that can be used for
`running their applications, possibly at different
`institutions. As a result, rather than picking one
`target platform based on a poor estimate of queue
`waiting time, if any, users can send a request
`to each platform; when one of these requests is
`granted access to compute nodes the others are
`canceled. This can be easily implemented by hav-
`ing the application send a callback to the user (or
`to the program that submitted the requests) when
`it starts executing.
`The admittedly brute-force strategy described
`above, which we term “redundant requests,” is
`gaining popularity because it obviates the need
`for difficult platform selection. However, there is a
`widespread but not verified notion that if “every-
`body were to use redundant requests” then “bad
`things would happen.” In this paper we attempt
`to determine the effects of redundant requests.
`More specifically, we quantify the four following
`impacts of redundant batch requests:
`
`1.
`
`2.
`
`Impact on average job performance while
`redundant requests may intuitively lead to
`better load balancing across individual plat-
`forms, they may also disrupt the resource
`management policies implemented by batch
`schedulers and thereby decrease overall job
`performance. The question is: by how much
`do redundant requests improve or worsen av-
`erage job performance?
`Impact on schedule fairness redundant re-
`quests give users who use them an advantage
`as they have the luxury to pick the shortest
`queue waiting times. The question is: how
`much of an advantage do redundant request
`provide and how penalized are users who do
`not employ them?
`
`queues
`
`on multiple
`
`(1)
`
`batch
`
`Individual
`platforms;
`(2) Multiple batch queues of multiple platforms;
`(3) Multiple batch queues of a single platform;
`or
`(4) A single batch queue of a single platform.
`
`In (1), (2), and (3), the goal is to avoid selecting
`a batch queue a priori but instead to use the batch
`queue on which the shortest queue waiting time is
`experienced. When using multiple platforms, as in
`(1) and (2), a difficulty may be the heterogeneity
`
`

`

`Benefits and drawbacks of redundant batch requests
`
`237
`
`among these platforms. The computation times
`requested by each redundant request could be
`scaled to reflect platform heterogeneity and differ-
`ent numbers of compute nodes could be requested
`on different platforms. Sophisticated users could
`thus attempt to tailor their requests to achieve the
`best response times on each candidate platform.
`(Note that typical users are not sophisticated,
`request computation times that are gross over-
`estimations of needed computation times [21],
`and may not even have a good understanding
`of the scaling properties of their applications).
`More importantly, the platform with the shortest
`queue waiting time could also be a platform with
`slow compute nodes, meaning that the shortest
`queue waiting time may not lead to the shortest
`turnaround time (i.e., queue waiting time plus
`execution time). Users then face a conundrum:
`should one wait possibly a long time for a faster
`platform? Another conundrum arises when using
`(3) above. Different queues typically correspond
`to higher service unit costs. The question is then
`whether one should wait possibly a long time
`for a cheaper platform. Option (4) above can be
`useful for “moldable” jobs that can accommodate
`various numbers of compute nodes. Moldable jobs
`are common but requesting the optimal number
`of nodes is known to be difficult [18]. Typically, a
`larger number of nodes will lead to a longer queue
`waiting time and to a shorter execution time, while
`a smaller number of nodes will lead to a shorter
`queue waiting time and to a longer execution
`time. One approach is then to send redundant
`requests for different numbers of nodes. With this
`approach, one is faced again with a conundrum
`similar to the one for option (2): should one wait
`possibly a long time for a larger number of nodes?
`Note that option (4) can be combined with the
`other three.
`There is no one-size-fits-all answer to these
`conundrums as the solution strongly depends both
`on the expected application execution times and
`on the system load, forcing users to use heuristics.
`These heuristics could be ad hoc, could use queue
`waiting time statistics and/or forecasting [1], or
`could use real-time status information from the
`batch schedulers that gives a sense of the request’s
`place in the queue (unfortunately many currently
`deployed schedulers do not provide such informa-
`
`tion). Finally, note that the number of redundant
`requests that can be used for (2), (3) and (4)
`can be bounded by each batch scheduler. Indeed,
`batch schedulers can typically be configured so
`that each user can only have a limited number
`of pending requests in the batch queue(s). In this
`paper we study option (1), use a simple model for
`generating redundant requests in a heterogeneous
`environment, and leave options (2), (3), and (4)
`for future work.
`Previous works have explored the use of redun-
`dant requests. Most notably, Subramani et al. [19]
`and Sabin et al. [16] have studied them as a way to
`perform job scheduling in a Grid platform. In their
`works, the redundant requests are not initiated
`by the users but by a metascheduler [2, 7, 15, 17]
`to potentially offload work to remote platforms.
`A metascheduler serves as a (centralized or dis-
`tributed) resource broker and thus controls to
`some extent how a set of individual platforms
`are shared and used by a community of users.
`These works show that using redundant requests
`can lead to better overall performance, and more
`so in systems containing clusters with different
`numbers of nodes. Although related, our work
`studies redundant requests generated by users
`without
`the knowledge of
`the scheduler(s).
`This has two important implications. First, a
`metascheduler can choose remote clusters based
`on some global knowledge about the system (e.g.,
`queue sizes) in order to let redundant requests
`“play nice” with each other. Note that as of today,
`no widely accepted metascheduler is deployed but
`users may resort to redundant requests directly.
`By contrast, we study user-driven redundant re-
`quests that may negatively disrupt the schedule at
`remote clusters. Second, in our study we consider
`that only some users may be using redundant
`requests and thus obtain an unfair advantage over
`users who do not use redundant requests. By con-
`trast, in [16, 19] all users benefit from the same
`benefits from redundant requests. We argue that
`in real systems today this is not the case, partly
`due to the lack of a metaschedulers, but also due
`to the fact that not all users are created equal:
`some may not have accounts on multiple plat-
`forms, some may not be sufficiently sophisticated
`to use redundant requests. Another difference
`between our work and these previous works is
`
`

`

`238
`
`H. Casanova
`
`that we study the impact of redundant requests on
`the load on the batch scheduler, on the load on
`the middleware, and on the predictability of the
`system. Other relevant related work includes the
`“placeholder scheduling” technique [13], which
`allows for a late binding of the application to
`the resources allocated by a batch scheduler. It
`provides a simple way to implement redundant
`requests since a callback is sent to the user when
`the application is ready to execute. At that time
`the request submitter (the user or, more likely, a
`program) may cancel redundant requests.
`
`3 Impact on Average Job Performance
`
`In this section we investigate whether redundant
`requests negatively impact job scheduling in terms
`of average job performance. Before presenting
`our results we detail our experimental methodol-
`ogy and define our metric for job performance.
`
`3.1 Experimental Methodology
`
`3.1.1 Simulation Model
`
`We use simulation because experiments on pro-
`duction systems would be prohibitive both in
`terms of time and money (i.e., service unit al-
`locations on batch-scheduled platforms), because
`they would be limited to a specific configuration,
`and because they would hardly be repeatable. We
`have implemented a simulator using the SimGrid
`[9] toolkit which provides the needed abstrac-
`tions and realistic models for the simulation of
`processes interacting over a network. We simu-
`late a platform that consists of a number of sites,
`where each site holds a parallel platform, say, a
`cluster. Each cluster is managed by its batch sched-
`uler. We also simulate a stream of jobs at each clus-
`ter. Each job requires some number of compute
`nodes for some duration, sends a request to the
`local cluster, and may send redundant requests to
`other clusters. We detail below the components of
`our simulation model.
`
`Clusters and Batch Schedulers We simulate a
`set of N clusters, C1, . . . , CN. Cluster Ci con-
`tains ni identical compute nodes. Different node
`
`speeds could be accounted for by scaling re-
`quested compute times and numbers of nodes (as
`discussed in Section 2), but this is not straight-
`forward to model. Instead, we limit heterogeneity
`to the number of nodes in each cluster and to
`potentially different workloads at these clusters
`(i.e., more or fewer requests per second). Each
`cluster is managed by a batch-scheduler, which
`can use one of three job scheduling algorithms:
`EASY [10], Conservative Backfilling (CBF) [12],
`or First Come First Serve (FCFS). The EASY
`algorithm enables backfilling and is representative
`of algorithms running in deployed batch sched-
`ulers today. Although widely studied, the more
`complex CBF algorithm is, to the best of our
`knowledge, only implemented in the OAR batch
`scheduler [3]. This is also the case for FCFS, but
`it is a simple algorithm that is commonly used
`as a base-line comparator. We model each batch
`scheduler as managing a single queue and we do
`not consider request priorities.
`
`Workload Simulating a stream of jobs can be
`done either by using a workload model or by
`“replaying” traces collected from the logs of real-
`world batch schedulers. The results presented in
`this paper were obtained with the former ap-
`proach. We use the model by Lublin et al. [11],
`which is the latest, most comprehensive, and most
`validated batch workload model in the literature.
`Accordingly, we model request arrival times using
`a Gamma distribution (corresponding to the so-
`called “peak hour” model). Note that [11] goes
`further by providing a “combined” model that
`uses two Gamma distributions: one to model job
`inter-arrival times during peak hours, and one to
`model the fraction of jobs that arrive during each
`of the 48 half-hour periods of the day, so as to re-
`flect nocturnal and diurnal trends in the number of
`submissions. We conducted simulations with the
`combined model, but the results did not change
`our conclusions. For simplicity we only present
`results obtained with the peak hour model. We
`model the requested number of nodes with a
`two-stage log-uniform distribution biased towards
`powers of two. We model the requested compute
`times with a hyper-Gamma distribution whose p
`parameter depends on the requested number of
`nodes.
`
`

`

`Benefits and drawbacks of redundant batch requests
`
`239
`
`Unless specified otherwise, we instantiate the
`parameters of all
`the distributions using the
`“model” parameter values derived in [11], to
`which we refer the reader for all details. We
`conducted some simulations using real-world
`traces made available in the Parallel Workloads
`Archive [4] but, expectedly, did not observe sig-
`nificantly different results. We opted for using
`a workload model rather than using traces for
`the experiments presented in this paper as it is
`straightforward to modify the model’s parameters
`to study different scenarios.
`
`3.1.2 Assumptions
`
`To isolate the effects of redundant requests on
`scheduling we do not simulate any network traffic.
`This includes the cost of sending a request to
`a potentially remote cluster, which is arguably
`small. More importantly, we also ignore the over-
`head for sending application input data, if any,
`to a remote cluster. To use a remote cluster, a
`user must pre-stage input data on that platform
`(unless the application streams or downloads its
`input data directly). However, when using redun-
`dant requests, users usually do not pre-stage input
`data to all remote clusters but wait until nodes
`are allocated on a particular cluster. The typical
`approach is then to request extra computation
`time that will be used to upload application input
`data to the cluster. Therefore, the only direct
`impact of redundant requests on the specifics of
`the workload is that requested computation times
`may be higher than when there are no redundant
`requests. (Note that the workload model in [11],
`which we use in this work, was developed based
`on logs of requests that most likely were not
`redundant.) However, we performed experiments
`in which we increased the requested duration of
`redundant requests by 10 and 50% and observed
`no difference in our results. This showed that the
`results in this paper hold when users request more
`compute time to allow late binding of application
`data to remote platforms. Note that it is shown
`in [19] that the added cost of using redundant
`requests when proactively transferring application
`data to all candidate platforms does not impact the
`effectiveness of using redundant requests.
`
`For the experiments in this section we ignore
`all overheads due to the network, the middleware,
`and the batch scheduler itself so as to isolate the
`effects of redundant request on job performance.
`We study these overheads in Section 5.
`
`3.2 Performance Metric
`
`We use a popular metric to assess the average job
`performance: the average stretch over all jobs in
`the system. The stretch of a job is the job’s turn-
`around time, which is its execution time plus its
`queue waiting time, divided by the job’s execution
`time. The stretch is often called “slowdown” be-
`cause it measures by how much the job execution
`is slowed down when compared to execution on
`a dedicated platform. This metric has been used
`previously, both in practical works to evaluate the
`performance of batch schedulers and in theoreti-
`cal works as objective functions, to be minimized,
`for job scheduling algorithms (which are ironically
`not used by batch schedulers). In this paper we
`often use the average relative stretch, that is the
`stretch relative to that when no redundant re-
`quests are used, averaged over all jobs. A value
`lower than 1 means that the use of redundant
`requests is beneficial, while a value higher than
`1 means that the use of redundant requests is
`harmful.
`We do not use the average turnaround time as
`a metric because it can be skewed by long jobs.
`Furthermore, the stretch makes it possible to eas-
`ily compare results obtained with different work-
`loads, i.e., different job durations. However, in our
`specific simulations, our results were essentially
`unchanged when analyzed with the turnaround
`time metric.
`
`3.3 Simulation Results
`
`We first simulate an environment that consists
`of N identical clusters, for N = 2, 3, 4, 5, 8, 10,
`15, 20. Each cluster contains 128 compute nodes
`and is managed by a scheduler that uses the EASY
`algorithm. Each cluster receives a stream of jobs
`generated according to the model described in
`Section 3.1.1. We simulate 6 h of job submissions
`(around 4,000 jobs given that the mean job inter-
`arrival time for the base model in [11] is roughly
`
`

`

`240
`
`H. Casanova
`
`5 s). We evaluate five redundant request schemes:
`R2, R3, R4, H ALF, ALL, in which a request is
`sent to 2, 3, 4, half, and all clusters, respectively.
`One request is always sent to the local cluster, and
`remote clusters are chosen randomly according to
`a uniform distribution. Other methods for choos-
`ing remote clusters are possible. For instance, in-
`spired by the work in [19], one may select remote
`clusters based on batch queue lengths. However, it
`is not clear why users would go through the trou-
`ble of picking remote clusters as opposed to just
`blindly sending requests to all clusters on which
`they have accounts. Consequently, our method
`of random selection merely reflects the fact that
`different users have accounts on different clus-
`ters. We also report on results obtained with non-
`uniform distributions of accounts across clusters.
`For now we assume that the same redundant
`request scheme is used by all jobs. For each ex-
`periment we generate N random job streams and
`simulate the six schemes above for these streams.
`Figure 1 shows the average relative stretch, av-
`eraged over 50 experiments. Over these 50 sam-
`ples, we measure coefficients of variation ranging
`approximately from 50 to 5% when going from
`N = 2 clusters to N = 20 clusters. The same holds
`true for all experiments that follow. (We do not
`show error bars on the graphs to avoid clutter.)
`Values below 1 in Fig. 1 mean that using redun-
`dant requests is beneficial on average.
`
`R2
`R3
`R4
`HALF
`ALL
`
`1.15
`
`1.1
`
`1.05
`
`1
`
`0.95
`
`0.9
`
`0.85
`
`0.8
`
`Average relative stretch
`
`One can see that in the worst case using redun-
`dant requests leads to an average relative stretch
`10% higher than when not using them, on aver-
`age. Using redundant requests becomes benefi-
`cial regardless of the redundant request scheme
`for N > 5. For N ≤ 5, it seems that a few short
`jobs get penalized due to a few lost opportunities
`for backfilling, thereby increasing their stretch.
`Accordingly, when using the average turnaround
`time as a metric, using redundant requests is al-
`ways beneficial even for N ≤ 5. This phenomenon
`disappears, or at least becomes insignificant, as
`the number of cluster increases. Note that many
`high-performance computing users today have ac-
`counts on five different clusters or more, and that
`this number will increase as the number of de-
`ployed clusters continues to grow.
`Redundant requests are beneficial because they
`allow for a better load-balancing of requests
`across clusters. Using redundant requests leads
`to better relative stretches in more than 95% of
`the experiments for N = 20, more than 90% for
`N = 15, and more than 85% for N = 10. When
`using redundant requests leads to a worse rela-
`tive stretch, it is worse by at most 0.4, 1.7, and
`2.1% for N = 20, N = 15, and N = 10, respec-
`tively. Conversely, when using redundant requests
`leads to a better relative stretch, the stretch is
`better on average by 15 to 25%. We examined
`some of the cases in which using no redundant
`requests is better than using redundant requests.
`These cases seem idiosyncratic and we could not
`derive any general reason why redundant requests
`may not lead to better results beyond particularly
`“unlucky” sequences of request submissions.
`
`Other algorithms and compute time estimates
`The results above were obtained for batch sched-
`ulers using the EASY algorithm and assuming
`that jobs request precisely the amount of compute
`time that they need. Table 1 shows results ob-
`tained for N = 10 clusters and for the H ALF re-
`dundant request scheme for the EASY, CBF, and
`FCFS algorithms, and for exact and conservative
`compute time estimates. We obtained similar re-
`sults (i.e., all average relative metrics under 1) for
`other redundant request schemes and other values
`of N > 5. One can see that the average stretch
`and coefficient of variance of stretches, relative
`
`0.75
`2
`
`4
`
`6
`
`8
`
`14
`
`16
`
`18
`
`20
`
`12
`10
`Number of sites
`Fig. 1 Average relative stretch for redundant request
`schemes relative to the scheme using no redundant re-
`quests, versus the number of clusters, averaged over 50
`experiments
`
`

`

`Benefits and drawbacks of redundant batch requests
`
`241
`
`Table 2 Average relative stretches for non-uniformly dis-
`tributed redundant requests, relative to the scheme using
`no redundant requests, averaged over 50 experiments, for
`N = 10 clusters
`Redundant request scheme
`
`R2 R3 R4 H ALF
`
`Average relative stretch
`
`0.94 0.95 0.88 0.89
`
`given different levels of workload. Figure 2 shows
`average relative stretch for all redundant request
`schemes in a 10-cluster platform, averaged over
`50 experiments, for various job interarrival times
`(in seconds). The base model proposed in [11]
`produces mean interarrival times of 5.01 s, which
`is the mean of a Gamma distribution with para-
`meters α = 10.23 and β = 0.49 (the mean is equal
`to α × β). To simulate other workloads we vary
`the value of α from 4 to 20, leading to interarrival
`times between approximately 2 and 10 s. One can
`see on the figure that using redundant requests
`improves average relative stretch regardless of the
`job interarrival time.
`
`Heterogeneity So far, all our experiments were
`conducted on a homogeneous system with identi-
`cal clusters and statistically identical job streams
`at all clusters. To evaluate whether redundant
`
`R2
`R3
`R4
`HALF
`ALL
`
`1
`
`0.95
`
`0.9
`
`0.85
`
`0.8
`
`0.75
`
`Average relative stretch
`
`0.7
`
`2
`
`3
`
`7
`6
`5
`4
`Mean Job interarrival time
`Fig. 2 Average stretch for redundant request schemes
`relative to the scheme using no redundant requests, versus
`job interarrival times, averaged over 50 experiments, for
`N = 10 clusters
`
`8
`
`9
`
`10
`
`Table 1 Average relative stretch for three different job
`scheduling algorithms, and for conservative compute time
`estimates and exact compute time estimates,
`for the
`H ALF redundant request scheme, relative to the scheme
`using no redundant requests, averaged over 50 experi-
`ments, for N = 10 clusters
`Average relative stretch
`
`Job scheduling
`algorithm
`
`With exact
`estimates
`
`With conservative
`estimates
`
`EASY
`CBF
`FCFS
`
`0.88
`0.90
`0.93
`
`0.83
`0.83
`0.93
`
`to when no redundant requests are used, are all
`below 1 no matter what scheduling algorithm is
`used (EASY, CBF, or FCFS), and whether jobs
`request exactly the compute time they need (“Ex-
`act Estimates”) or more compute time than they
`actually need as is the case in practice (“Conser-
`vative Estimates”). We model requested compute
`times using the “φ model” proposed in [24], with
`φ = 0.10, which leads to a uniformly distributed
`overestimation factor with mean 2.16. A more
`sophisticated model was recently proposed in [21],
`and although we may use it to repeat these experi-
`ments in future work, we do not expect the results
`to be significantly different.
`
`Non-uniform redundant request distribution We
`repeat our first experiment for N = 10 clusters,
`simulating redundant requests schemes that pick
`remote clusters at random but biased towards
`some clusters. We model the probability of pick-
`ing remote cluster C1 is twice as high as the
`probability of picking remote cluster C2, which is
`twice as high as the probability of picking remote
`cluster C3, and so on. This (arbitrary) distribution
`is heavily biased (half of the clusters are each
`picked with only probability 6.25%). Results are
`shown in Table 2, averaged over 50 experiments.
`One can see that on average the use of redundant
`requests is beneficial for performance, even when
`the targets of the redundant requests are not uni-
`formly distributed. In fact, the results are similar
`to those obtained with a uniform distribution.
`
`Job interarrival times One may wonder whether
`using redundant requests is more or less harmful
`
`

`

`242
`
`H. Casanova
`
`Table 3 Average stretch for heterogeneous platforms, rel-
`ative to the scheme using no redundant requests, averaged
`over 50 experiments, for N = 10 clusters
`Redundant request
`scheme
`
`Average
`relative stretch
`
`R2
`R3
`R4
`H ALF
`ALL
`
`0.83
`0.74
`0.71
`0.63
`0.67
`
`requests are harmful in a heterogeneous environ-
`ment we conduct the following experiment. We
`simulate N = 10 clusters, where each cluster con-
`tains a number of nodes picked randomly among
`16, 32, 64, 128, and 256. Each cluster receives
`a job stream with job interarrival times picked
`randomly between 2 and 20 s. Jobs arriving at
`a cluster request a number of nodes between 1
`and the number of nodes available at that cluster.
`Therefore, jobs arriving at a large cluster may not
`be runnable on smaller clusters. Table 3 shows
`results for all redundant request schemes, rela-
`tive to the scheme using no redundant requests,
`averaged over 50 experiments. One can see that
`using more redundant requests is even more ben-
`eficial (higher performance and better fairness)
`than in the homogeneous case across the board.
`This is because the load balancing due to redun-
`dant requests is more effective in a heterogeneous
`system. This result corroborates the findings in
`[16, 19].
`
`3.4 Discussion
`
`Virtually all simulation results presented so far
`indicate that the average job performance is im-
`proved when redundant requests are used. These
`results hold across ranges of job scheduling algo-
`rithms, job interarrival times, number of clusters,
`heterogeneity across clusters, distribution of re-
`dundant requests across clusters, and whether re-
`quested compute times are exact or conservative.
`Essentially, this is because redundant requests im-
`prove load balancing across clusters.
`The reader may wonder whether race condi-
`tions are involved when using redundant requests.
`
`Indeed, two requests may start at approximately
`the same time and one of them may be can-
`celed only after it is started. In our simulation we
`observed that such race conditions where rare
`(occurring for less than 0.005% of the jobs). How-
`ever, for larger latencies of receiving notifications
`from a starting request and of canceling a request,
`race conditions could be slightly more frequent.
`One issue could then be wastage of CPU time.
`Unless latencies are enormous, wastage should
`not be significant and well worth the cost for the
`benefits of redundant requests, as will be seen in
`the next section. For instance, a 5-s delay when
`canceling a job that has just started on 64 proces-
`sors would only waste 0.08 h of CPU time, which
`is a minuscule fraction of typical user allocations.
`Another issue is whether the request submitter
`(user or program) should pay attention to such
`race conditions. Fortunately, a pending request or
`a running job are canceled in exactly the same way
`when using current batch scheduler interfaces.
`The only issue is that the request submitter must
`be prepared to ignore stale notifications from re-
`quests that are in the process of being canceled.
`
`4 Impact on Schedule Fairness
`
`We now investigate the question of schedule
`fairness, that is: do all jobs fare equally when
`redundant requests are used? The experimental
`methodology and assumptions are identical to
`those used in Section 3. We first present results
`obtained when all jobs use identical redundant
`requests schemes, and then turn to the case in
`which only a subset of the jobs use redundant
`requests.
`
`4.1 When All Jobs Use Redundant Requests
`
`In this section we use the coefficient of variation
`of stretches (i.e., the standard deviation divided
`by the average, in percentage, over all jobs in
`the system), as a way to evaluate the fairness
`of a schedule. A lower coefficient of variation
`indicates a fairer schedule since on average all
`jobs achieve a stretch closer to the mean. An-
`other popular metric that is related to the fairness
`of a schedule is the maximum stretch, and the
`
`

`

`Benefits and drawbacks of redundant batch requests
`
`243
`
`R2
`R3
`R4
`HALF
`ALL
`
`Table 5 Coefficient of variation of stretches for non-
`uniformly distributed redundant requests, relative to the
`scheme using no redundant requests, averaged over 50
`experiments, for N = 10 clusters
`Redundant request scheme
`
`R2 R3 R4 H ALF
`
`Relative C.V. of stretches
`
`0.94 0.92 0.88 0.86
`
`on the figure). The improved fairness can be at-
`tributed to the fact that redundant requests lead
`to better load-balancing. Table 4 shows results
`for the three job scheduling algorithms for either
`exact or conservative compute time estimates, and
`here again we see that fairness is improved by
`the use of redundant requests. Finally, Table 5
`shows that the improvement in fairness holds for
`non-uniformly distributed redundant requests, as
`modeled in Section 3.3. These results also hold
`over job interarrival times from 2 to 10 s as well
`as for heterogeneous platforms.
`
`4.2 When Only Some Jobs U

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket