`
`Reference 5
`
`PATENT OWNER DIRECTSTREAM, LLC
`EX. 2117, p. 1
`
`
`
`IEEE TRANSACTIONS ON COMPUTERS, VOL. 40, NO. 8, AUGUST 1991
`
`963
`
`Design and Analysis of Master/Slave Multiprocessors
`
`Albert G. Greenberg, Member, IEEE, and Paul E. Wright
`
Fig. 1. Process modes.
`
`Abstract— In certain computer operating systems, including
`Unix,1 a process alternates between two modes: user mode and
`system mode. The process enters system mode on executing
`a system call, say, to do I/O, and returns to user mode on
`completion of the call. In master/slave multiprocessors, certain
`system calls, which we refer to as kernel calls, can execute only
`on the master processor. All other work can execute on the
`master processor or on any slave processor. As a result, the kernel
`calls from different processes are serialized, which (barring more
`fundamental changes to the operating system) is necessary since
`kernel calls from different processes are not independent; for
`example, the calls might update the same table.
`A simple model of master/slave processors is presented, along
`with two simple, practical scheduling algorithms. An approximate
`analysis of the model yields simple formulas for performance
`measures in terms of the hardware and workload parameters, and
`gives insight into the power and the limitations of master/slave
`systems. In particular, we obtain formulas for the maximal
`processing power (throughput) of the system, a quantity that
`remains bounded as the number of slave processors increases.
Our analysis is also applicable to symmetric multiprocessors,
where performance considerations, such as cache performance,
may dictate asymmetric assignment of system tasks to the processors.
Index Terms—Master/slave, multiprocessors, performance evaluation, scheduling.
`
`I. Introduction
IN certain computer operating systems, including Unix, a
process alternates between two modes: system mode and
`user mode. System mode is entered when the process makes
`a call to the operating system, say, to do I/O. A simple
`way to support several processes running concurrently on a
`multiprocessor, with minimal changes to the operating system
`itself, is to designate one processor as the master and the others
`as the slaves [2], [5]. A subset of the system calls, which we
`refer to as the kernel calls (See Fig. 1), can execute only on the
`master. When a process running on a slave processor makes a
`kernel call, the slave returns the process to the master, rather
`than service the call. As a result, the kernel calls are serialized,
`which (barring more fundamental changes to the operating
`system) is necessary since kernel calls from different processes
`are not independent; for example, the calls might update the
`same table. On the other hand, while not executing kernel calls,
`processes are independent, and can execute on any processor.
We present a simple stochastic model of master/slave multiprocessors, and two simple, practical scheduling algorithms.
`An approximate analysis of the model yields simple formulas
`for performance measures in terms of hardware and workload
`1 Unix® is a registered trademark of AT&T Bell Laboratories.
`Manuscript received April 4, 1989; revised October 17, 1990.
`The authors are with AT&T Bell Laboratories, Murray Hill, NJ 07974.
`IEEE Log Number 9101665.
`
`parameters. Monte Carlo simulation results show that the
`approximations are accurate. The results give insight into the
power and the limitations of master/slave systems. In particular,
we obtain formulas for the maximal processing power
(throughput) of the system in terms of parameters that are possible to
control in system design, though the control may entail difficult
`recoding of some of the kernel calls. Our results quantify the
`benefits of the recoding. Conversely, the results show that in
`many cases, without such recoding, it is fruitless to add more
`slave processors to the system.
`Goble and Marsh [5] appear to have been the first to
`argue that the master/slave approach offers a simple and
cheap way to extend significantly the power of an existing
uniprocessor, in their case the VAX 11/780. A more
`recent example is the AT&T 3B2 1000 model 80, a Unix
`master/slave multiprocessor. Further motivation for the study
`of asymmetric multiprocessors stems somewhat surprisingly
`from symmetric multiprocessors, where all processors have
the same capabilities. In practice, it is often desirable, and
sometimes necessary for performance reasons, to partition the
`operating system functions for these machines so that some
`functions are restricted to execute only on certain processors.
`Doing so can improve the processors’ cache performance,
`which is critical to overall system performance. If all calls
`to a service are restricted to execute on only one processor,
`the corresponding code and data may be partly captured and
`held in this processor’s cache, causing the code to run faster
`and generate less traffic between the processors and main
`memory. For example, assigning one processor to handle
`interrupts from a given peripheral device not only simplifies
`the interrupt handling routine, but also may improve its
`performance because the appropriate code and data are more
`likely to persist in cache between interrupts. The model
`considered here, however, makes it clear that this approach
`is not without pitfalls.
Our analytic approach is well-known in probabilistic modeling
as fixed point analysis [3], [4], [6], [9], [14] or the
`fixed population mean method [16]. Often, a fixed point
`approach culminates in a numerical search for solutions, with
`no guarantee that the solutions are unique. In this paper, the
`approach yields unique, simple closed form solutions.
We believe that this work reveals the essential potentials
and limitations imposed by scheduling constraints on a
`0018-9340/91/0800-0963$01.00 © 1991 IEEE
`
`
`master/slave multiprocessor architecture. However, the model
`presented here, trades off some realism for simplicity:
• We assume that the multiprocessor supports shared memory,
and do not model bandwidth limitations between
`processors and memory. In practice, this bandwidth limits
`the number of processors the multiprocessor can support
`at a high level of utilization per processor. Our model and
`its analysis expose additional limitations (if any) arising
`from scheduling constraints and workload parameters.
`• We assume that the central random variables of interest,
`such as job lengths, are exponentially distributed and
`that there is just one class of job in the system. In
particular, we assume that each kernel mode demand a
job makes is exponentially distributed. A cursory glance
at measurement data collected during the development
of the AT&T 3B2 1000 Model 80 shows that this exponential
assumption is not unreasonable, although a closer
`examination reveals that the tail does indeed contain
`slightly too much mass.
In an influential paper, Leland and Ott published measurement
data on Unix job characteristics [12]. Leland and
Ott considered a Unix job’s total CPU service demand
and found that the probability distribution function that
best fit their data was of the form F(x) = 1 − rx^(−c), and that an
exponential distribution gave a much poorer fit. Krueger
`and Livny [11] later found that a close fit to the Leland
`and Ott data is obtained by using a hyperexponential
`distribution, specifically, a mixture of three exponential
`distributions. This is not surprising since the Leland and
`Ott workload data showed distinct job classes: CPU hogs,
`disk hogs, and the remaining short jobs.
`In short, while it is difficult to fit a single exponential
`distribution to a multiclass Unix workload, a mixture
`of exponentials gives an adequate fit. In this paper, we
`consider an analytic closed model where there is a single
`class of jobs that are never replaced so the total CPU
`demand is infinite. However, as in [12] and [11] if it
`is deemed crucial to model a multiclass workload, we
`recommend adapting the analysis to hyperexponential dis
`tributions. Our analysis is useful in the design phase to see
`how workload parameters impact system performance.
`Though the analytic results presented here do not address
`multiclass workloads, the model described in Section II
`is easily extended to capture multiple classes (replacing
the exponential distributions with hyperexponential
distributions), which can then be studied via simulation and
`via our analytic approach as described in Section VII.
`• The exponential assumptions lead to simple formulas
`that provide useful guides to practical design of master/
slave systems. The formulas identify saturation conditions,
which must be confronted to obtain a balanced
`design. In particular, the formulas give the number of
`slave processors that can be usefully supported, as a
`function of the workload parameters.
`The scheduling algorithms considered here arose during
`the design and implementation of the AT&T 3B2/1000
model 80, a Unix master/slave multiprocessor. Measurement
of the model’s parameters revealed that some kernel
call recoding would be essential. This gave focus to
`the development, and significant performance gains were
`noted during the recoding, which our model and analysis
`helped to explain.
`Some of the formulas derived follow just from flow
`balance conditions, and do not depend on the exponential
`assumptions. In Section VI, we discuss a simple way,
`suggested by our analysis, to extract a key performance
`statistic essentially via measurement alone. Moreover,
the analysis can be adapted to handle other distributions
(including deterministic distributions), at the cost
`of replacing the formulas with numerical solutions.
`• Last, we believe that our modeling approach and analysis
`can be generalized to study scenarios where there is no
`one master processor, but a system function may be
`restricted to execute only on one processor, or only on
`a subset of the processors. This would lead to better
`understanding of asymmetric uses of multiprocessors with
`symmetric capabilities, alluded to above.
`
`II. The Model
We assume that there is one master processor, P ≥ 1 slave
processors, and a fixed number of jobs in the system, N ≥ 1.
The slave processors run continuously at unit speed. Let

μ = (speed of master processor) / (speed of slave processor).

The assignment of jobs to processors depends on the scheduling algorithm.
`Our model of a job is simple. We lump system mode work
`that can be executed on a slave processor (reentrant system
`calls) with real user mode work and refer to all such work as
`user mode work. Henceforth, no mention is made of system
`mode. In our model, a job alternates between user and kernel
`mode. On entering user mode, the job demands to be served
`in user mode for a random period of time. We call this period
`of time a holding time. After obtaining that service, during
`sojourns on master or slave processors, the job switches to
`kernel mode and demands service in that mode for another
`random period of time. This demand can be satisfied only
`during sojourns on the master. After the demand is satisfied,
`the job switches back to user mode. Note that the term holding
`time refers to the random service demand for service in a
`given mode, not the time interval during which this service
`is delivered.
`The key parameters of our job model are the holding times
`in kernel and user mode. The holding time in a given mode
`is defined as a random amount of time that a job requires
`service in this mode before switching to the other mode. Thus,
`a holding time is a demand for CPU service, which is satisfied
`over a number of sojourns on the processors. We assume that
`the holding times are independent, exponentially distributed
`random variables, with means
Tu = E(length of user mode period)

and

Tk = E(length of kernel mode period).
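The alternating user/kernel structure above is easy to exercise numerically. The following Monte Carlo sketch (ours, not the paper's; the function name and parameter values are illustrative) estimates the long-run fraction of a job's total service demand that is kernel-mode demand, which for independent exponential holding times is Tk/(Tu + Tk):

```python
import random

def kernel_demand_fraction(Tu, Tk, cycles=200_000, seed=1):
    """Monte Carlo estimate of the long-run fraction of a job's total service
    demand that is kernel-mode demand, under the exponential holding-time
    assumptions of the model; the analytic value is Tk / (Tu + Tk)."""
    rng = random.Random(seed)
    user = kernel = 0.0
    for _ in range(cycles):
        user += rng.expovariate(1.0 / Tu)    # user-mode holding time, mean Tu
        kernel += rng.expovariate(1.0 / Tk)  # kernel-mode holding time, mean Tk
    return kernel / (user + kernel)

# With Tu = 10 ms and Tk = 1 ms, about 1/11 of total demand is kernel-mode.
est = kernel_demand_fraction(Tu=10.0, Tk=1.0)
```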
`
`
`
`
`
`We assume that each time that a job is started on a processor,
`a context switch cost is paid. Specifically, the job demands
`service for a random period with mean
Tc = E(length of context switch period)
`before demanding the service associated with its mode. We
assume the context switch periods are independent and exponentially
distributed. We emphasize that this is a modeling
`assumption only and is done for the purposes of making the
`analysis tractable. Obviously, the simple model of a job studied
`here is an idealization, but it contains sufficient structure to
`expose scheduling difficulties intrinsic to master/slave systems.
Interval Timeouts: An integral part of all timesharing systems,
such as Unix, is an interval timer, which periodically
generates an interrupt signal that is simultaneously sent to
all processors [2]. The interval timeout period is the time
`quantum used for timesharing. In our model, we assume
`that the sequence of interval timeout periods are independent,
`exponentially distributed random variables, with mean
Tj = E(length of interval timeout period).
In any real system, the interval timeout period has a deterministic,
rather than exponentially distributed, length. However,
`owing to the random mixing arising from our other stochastic
`assumptions, the results do not appear to be very sensitive to
`the shape of this distribution. We have conducted experiments
`in which both the context switch cost and the interval timeout
`are deterministic. These experiments indicate that the trends
`predicted by the exponential assumptions are correct. In ad
`dition, the values of the observed performance measures are
`in close agreement with those predicted by our analysis. This
`issue is taken up again in Section V.
Scheduling Opportunities: There are exactly two classes of
events, which we term scheduling opportunities, that can
trigger a processor to interrupt one job to run another:
`• when its currently running job switches from user to
`kernel mode or from kernel to user mode, and
`• when the system’s interval timer goes off.
`In Unix systems, a switch from user to kernel mode is signaled
`by an exception or illegal instruction trap, which switches
`control to an interrupt handler. The interrupt handler can take
`scheduling actions just before and just after starting the kernel
`mode service.
Parameter Ranges: Reasonable parameter ranges are Tc =
0.4 ms, Tj = 10 ms, Tk = 0.1 to 2 ms, and Tu = 1 to
100 ms. In practice, the interval timeout period Tj is difficult
`to change, because a large number of periodic system tasks are
`set to be performed on fixed multiples of this period. The wide
`range for Tu reflects the range in workloads. Roughly, high
`Tu workloads consist of computationally intensive jobs, and
`low Tu workloads consist of interactive or I/O intensive jobs.
`To achieve a high level of performance, we would like Tk
`to be small and Tu to be large. In most systems, Tk is the
`cost of a single kernel call, and is difficult to reduce. However,
`Tu can be increased by recoding kernel calls so that they can
`execute on the slave processors. (Recall that Tu lumps real
user mode work with system calls executable on the slave
processors.) Typically, recoding a kernel call is difficult since
`data structures and algorithms have to be reworked so that
`several instances of the call can proceed concurrently. Our
analysis shows how large Tu must be to reach a
given level of performance.
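The payoff of such recoding can be quantified by inverting the saturation rates derived in Section IV. A minimal sketch (ours; function names are our own, and the master speed is taken as μ = 1) of how large Tu must be for sampler to reach a target throughput, and of the hard ceiling that no amount of recoding can beat:

```python
def lambda_star_kickto(Tk, mu=1.0):
    """Saturation rate of the master under kickto (Section IV): mu / Tk."""
    return mu / Tk

def lambda_star_sampler(Tu, Tk, Tj, mu=1.0):
    """Saturation rate under sampler: mu*Tu / (Tu*Tk + mu*Tj*(Tu + Tk))."""
    return mu * Tu / (Tu * Tk + mu * Tj * (Tu + Tk))

def required_Tu_sampler(target, Tk, Tj, mu=1.0):
    """Smallest Tu for which sampler's saturation rate reaches `target`;
    None if the target exceeds the Tu -> infinity limit mu/(Tk + mu*Tj)."""
    denom = mu - target * (Tk + mu * Tj)
    if denom <= 0:
        return None
    return target * mu * Tj * Tk / denom

# Example: Tk = 1 ms, Tj = 10 ms.  Kickto saturates at mu/Tk = 1 job/ms,
# while sampler can never exceed 1/11 job/ms however large Tu becomes;
# a target of 0.08 job/ms needs Tu of roughly 6.7 ms.
Tk_ms, Tj_ms = 1.0, 10.0
Tu_needed = required_Tu_sampler(0.08, Tk_ms, Tj_ms)
```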
`
`3)
`
`III. Scheduling Algorithms
`In this section we present two scheduling algorithms: kickto
`and sampler. Roughly, kickto moves jobs from master to slave
`as fast as possible. The idea is to prevent starvation of the
`slaves, though if jobs switch between user and kernel modes
`rapidly kickto may waste an excessive number of cycles due
`to context switching. The idea behind the sampler algorithm
`is to reduce this danger significantly by moving a job from
`master to slave only if the master finds the job in user mode
`on interval timeout. The drawback for sampler is the potential
`for underutilizing the slave processors.
`The kickto and sampler algorithms have the following
`properties in common.
1) Priority is given to jobs in kernel mode. That is, when
the master is in position to start a new job, it always
chooses one in kernel mode if there is one. Though
`this makes good sense from a performance viewpoint,
`it is desirable for more basic reasons. As described in
`[2], jobs in kernel mode are likely to hold scant system
`resources, so Unix systems assign such jobs high priority
`to free those resources as soon as possible.
`2) Jobs not running on some processor wait in one of two
`queues: either the master queue or the slave queue. Jobs
`in the master queue are all in kernel mode and jobs in
`the slave queue are all in user mode. A slave processor
`can take jobs from the slave queue only. The master
`processor can take jobs from either queue.
3) When a job running on a slave processor switches to
kernel mode, the processor stops running the job and
`places it on the master queue. Next, if the slave queue
`is not empty, the processor takes a job from the slave
`queue and runs it. Otherwise, the processor idles until
`a job arrives in the slave queue, whereupon it takes the
`job and runs it.
4) Whenever the master processor is in position to select
a job to run, if the master queue is not empty, then
`the master processor chooses a job from that queue. If
`the master queue is empty but the slave queue is not,
`then the master processor chooses (steals) a job from
`the slave queue.
5) At every interval timeout, every processor stops its
currently running job, and attempts to obtain another,
`whereupon the processor first holds for a context switch
`period and then runs the job. Thus, in our model the
`context switch cost is charged even if it turns out that
`the processor obtains the same job as it had before the
`interval timeout.
We insist on service interruption and context switching
on every interval timeout in order to model the
`overhead of timesharing, though this may be slightly
`pessimistic. In practice, the timeout may trigger system
`
`4)
`
`5)
`
`
`
`
`
`IV. Performance Analysis
In this section, we present an approximate steady-state analysis
of the model, for a workload determined by the parameters
Tu and Tk. The analysis provides simple formulas for the
`processor utilizations and the average queue lengths. (Indeed,
the queue length distribution and higher queue length moments
are readily computable.) The formulas reveal important
`differences between sampler and kickto, which we bring out
`numerically in Section V. Moreover, the formulas allow us
`to determine precisely in terms of the parameters where the
`bottleneck lies: whether it is the master queue or the slave
`queue that saturates as the number of jobs increases. In
`particular, we find exactly how large the number of slave
`processors P must be, with all other parameters held fixed,
`before the system becomes “master-limited,” meaning the
`master queue is overloaded and adding more slave processors
`is fruitless.
`In order to streamline the statement of the results and avoid
`unduly complicated formulas, we shall first state the results
`obtained assuming Tc = 0 (Section IV-A), then describe how
`to scale the parameters and results to account for context
`switching (Section IV-C). The error made by this scaling is
`essentially that of replacing a two stage Erlang distribution (the
`first stage models holding for an exponential period with mean
`Tc and the second holding for another exponential period
`with mean depending on Tu, Tk, Tj) with an exponential
`distribution having the same mean. There is no difficulty in
`handling the two stage Erlang distribution in our analysis, but
`it leads to more complicated formulas without compensating
`benefit.
Our analysis is approximate and consists of analyzing the
master and slave subsystems separately, as if each receives
an external stream of jobs at an unknown Poisson rate λ, which
depart after receiving service. We solve for the average numbers
of jobs Qs(λ) and Qm(λ) in the slave and master subsystems,
observed at a random moment in (continuous) time in steady
state. As is to be expected, Qs(λ) and Qm(λ) increase as λ > 0
increases and diverge at finite (saturation) values of λ. We
determine λ as the smallest positive root λN of the “fixed
point” equation
`
Qs(λ) + Qm(λ) = N,
`
`which states that the average number of jobs in the two
`(approximate) subsystems must equal the given number of
`jobs N in the real system. We also derive simple closed
`forms for the equilibrium distribution of the number of jobs in
each subsystem as functions of λN. Thus, having determined
λN, we can derive simple closed forms for essentially any
`performance measure.
`At steady state, the (unknown) rate at which jobs flow from
`master to slaves must equal the rate in the reverse direction.
This motivates analyzing each subsystem as if each receives
jobs at the same rate λ. However, it does not follow from
`the assumptions of Section II that the flow is Poisson and is
`independent of the detailed state of the other subsystem. This
`is an approximation.
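Concretely, the fixed point λN can be located by bisection, since both mean queue lengths increase with λ. The sketch below (ours, not the paper's) uses the closed forms stated later in this section, (4.6) for the slave subsystem and (4.17) for the master under sampler, with Tc = 0 and μ = 1 by default:

```python
from math import factorial

def qs_mean(lam, Tu, P):
    """Mean slave-subsystem population Qs(lam), per (4.6); needs lam*Tu < P."""
    a = lam * Tu
    norm = sum(a**k / factorial(k) for k in range(P)) \
         + a**P / (factorial(P) * (1.0 - a / P))
    return a * (1.0 + a**P * (1.0 / norm) / (factorial(P - 1) * (P - a) ** 2))

def qm_mean_sampler(lam, Tu, Tk, Tj, mu=1.0):
    """Mean master population Qm(lam) under sampler, per (4.17)."""
    lam_star = mu * Tu / (Tu * Tk + mu * Tj * (Tu + Tk))
    num = lam**2 * Tu * Tk**2 - lam * mu * Tu * Tk + mu**2 * (Tu + Tk)
    return num / (mu**2 * (Tu + Tk) * (1.0 - lam / lam_star))

def solve_lambda_N(N, Tu, Tk, Tj, P, mu=1.0):
    """Bisect for the root lambda_N of Qs(lam) + Qm(lam) = N."""
    lam_star = mu * Tu / (Tu * Tk + mu * Tj * (Tu + Tk))
    lo, hi = 0.0, min(lam_star, P / Tu) * (1.0 - 1e-9)  # below both saturations
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if qs_mean(mid, Tu, P) + qm_mean_sampler(mid, Tu, Tk, Tj, mu) < N:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Example: N = 5 jobs, Tu = 10 ms, Tk = 1 ms, Tj = 10 ms, P = 2 slaves.
lam_N = solve_lambda_N(5.0, 10.0, 1.0, 10.0, 2)
```

Once λN is known, utilizations and the processing power follow from the formulas of Section IV-A.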
`
Fig. 2. Flow of jobs in a master/slave multiprocessor.
`
`maintenance tasks that cause an overhead to be felt even
`when the result of the timeout is to reschedule the job
`that was running before the timeout.
`The flow of jobs in the system is illustrated in Fig. 2. The
`mechanism for selecting one job from a queue holding at least
`one job need not be modeled, since our performance estimates
`deal only with the lengths of the queues.
`
`A. Kickto
`In kickto, whenever the currently running job on the master
`processor goes into user mode, if the master queue is not
`empty, then the master processor stops the job, places it in
`the slave queue and runs another, selected from the master
`queue. Thus, the master processor kicks jobs out to the slave
`queue as soon as possible. If the master queue is empty when
`the currently running job enters user mode, then the master
`processor continues to run the job. If a job subsequently
`migrates from a slave processor to the master processor (to
execute in kernel mode) while the user mode job is still
running on the master, the user mode job is preempted and placed
in the slave queue, and the (newly arrived) kernel mode job is run
on the master.
`
`B. Sampler
`If jobs switch modes rapidly (Tu and Tk are on the order
`of Tc), then under kickto the master processor is swamped
`by context switching. The following algorithm, sampler, tries
`to overcome this problem by not moving jobs to the slave
`queue as readily. Under sampler, the master processor takes
`scheduling actions only on interval timeouts. At an interval
`timeout, if the master queue is not empty and the master
`processor finds its currently running job in user mode, then
`it stops the job, places it in the slave queue, and runs another
`selected from the master queue. Thus, the master processor
`samples the mode of its current job at all interval timeouts.
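In code, the two policies differ only in where the "eject a user-mode job" test sits. A schematic sketch of the decision points (the identifiers and the Job type are our own simplification, not the paper's implementation):

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    mode: str  # "user" or "kernel"

def master_pick(master_q, slave_q):
    """Common rule: the master takes kernel-mode work first, and otherwise
    steals a user-mode job from the slave queue."""
    if master_q:
        return master_q.popleft()
    if slave_q:
        return slave_q.popleft()
    return None

def kickto_on_user_mode(master_q, slave_q, running):
    """kickto: as soon as the running job drops to user mode, kick it to the
    slave queue if kernel-mode work is waiting; otherwise keep running it."""
    if master_q:
        slave_q.append(running)
        return master_q.popleft()
    return running

def sampler_on_timeout(master_q, slave_q, running):
    """sampler: only at an interval timeout, and only if the sampled job is in
    user mode and kernel-mode work is waiting, move it to the slave queue."""
    if running.mode == "user" and master_q:
        slave_q.append(running)
        return master_q.popleft()
    return running
```

kickto calls its test at every mode switch; sampler defers the same test to the timeout, trading slave utilization for fewer context switches.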
`
`
`
`
`
`The close agreement between the analytic results obtained
`via these approximations and simulation results show that this
`error is small. (Of course, the simulation results are for the
`model of Section II, without the approximations introduced
`here.)
`
`A. Analysis I: No Context Switch Cost
The utilization of a processor is the steady-state average
proportion of time the processor spends servicing user or
kernel mode demands, as opposed to idling. Letting Um
`denote the utilization of the master processor and Us the
`utilization of a given slave processor, define the system’s
`processing power to be
W = Um + P·Us
`which is proportional to the steady-state rate at which the
`system executes instructions. Let Qm and Qs denote the
`number of jobs (queued and in service) at the master and at
`the slaves, respectively.
`Suppose that there is no context switch cost, so Tc = 0.
`When the master queue is empty and the master moves its
`currently running job to the slave queue, it steals the job
`back and restarts it immediately without any context switch
`overhead. It is as if the job never left service. It follows that
`the master processor is never idle, so
Um = 1.
A simple drift argument (given below for sampler) shows
that

λ* = μTu / (TuTk + μTj(Tu + Tk)),   if using sampler,
λ* = μ/Tk,   if using kickto      (4.1)

is the maximal throughput of the master subsystem. It follows
from (4.1) that λ* is always smaller for sampler than for kickto.
A renewal theoretic argument shows that λ* = μ/Tk holds for
kickto regardless of the distributions of the random variables.
`The justification of (4.1) for sampler is as follows. Assume
`that the master queue contains an infinite number of jobs. Thus,
`the processor is never starved for work, and outputs jobs at
`the highest possible rate in this case. Let us regard the master
as being in either user mode or kernel mode according to the
mode of the job currently being executed. Under sampler, this
“state” of the master processor evolves approximately as a
Markov chain with two states: kernel and user. Let π^u and π^k
denote the equilibrium distribution of this chain. Since the job
in service goes from user to kernel mode on service completion
(rate μ/Tu) or interval timeout (rate 1/Tj), and from kernel
to user mode on service completion (rate μ/Tk), the balance
equation is
`
π^u (μ/Tu + 1/Tj) = π^k (μ/Tk)      (4.2)

which has the normalized solution, i.e., π^u + π^k = 1,

π^u = μTuTj / (TuTk + μTj(Tu + Tk))      (4.3)
`
`967
`
`(4.4)
`
`7Tfc =
`
`Tk{Tu +
`TuTk + nTi{Tu + Tk)
`Jobs depart only if caught in user mode on an interval timeout.
`Thus, the maximal throughput is txu/Tj. Unraveled, this is the
`expression for A*.
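The balance equation (4.2) and its normalized solution are easy to check numerically. A small sketch (ours), with μ = 1 and parameter values in the ranges of Section II:

```python
def pi_user_kernel(Tu, Tk, Tj, mu=1.0):
    """Equilibrium mode split of a saturated master under sampler, solved
    directly from the balance equation pi_u*(mu/Tu + 1/Tj) = pi_k*(mu/Tk)."""
    out_u = mu / Tu + 1.0 / Tj  # rate of leaving user mode: completion or timeout
    out_k = mu / Tk             # rate of leaving kernel mode: completion
    pi_u = out_k / (out_u + out_k)
    return pi_u, 1.0 - pi_u

Tu, Tk, Tj, mu = 10.0, 1.0, 10.0, 1.0
pi_u, pi_k = pi_user_kernel(Tu, Tk, Tj, mu)
den = Tu * Tk + mu * Tj * (Tu + Tk)  # common denominator of (4.3)-(4.4)
```

pi_u matches (4.3), pi_k matches (4.4), and pi_u/Tj reproduces the sampler saturation rate in (4.1).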
`The detailed analysis of the slave subsystem is classical, and
`is given first. The analysis of the master subsystem depends
`on the scheduling algorithm. We treat sampler first and then
`kickto.
1) Slave Subsystem in Isolation: Assuming that jobs arrive
to the slave queue at Poisson rate λ and depart after receiving
service (at exponential rate 1/Tu), the slave subsystem is a
classical M/M/P system [10]. Letting Xs denote the number
of jobs present at equilibrium, we obtain [10]
`
Pr[Xs = k] = I(λTu, P) (λTu)^k / k!,                0 ≤ k < P,
Pr[Xs = k] = I(λTu, P) (λTu)^k / (P! P^(k−P)),      k ≥ P.      (4.5)
`
The mean population in the system is

Qs(λ) = E[Xs] = λTu [ 1 + (λTu)^P I(λTu, P) / ((P − 1)! (P − λTu)²) ]      (4.6)

where I(λ, P) is given by (4.28). It is intuitively obvious and
is easily demonstrated that Qs(λ) increases with λ > 0.
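Equations (4.5)-(4.6) are the standard M/M/P results and can be evaluated directly. A sketch (ours) in which the normalizing constant plays the role of I(λTu, P) and the distribution is truncated at kmax:

```python
from math import factorial

def mmP_distribution(lam, Tu, P, kmax=60):
    """Equilibrium distribution (4.5) of the M/M/P slave subsystem
    (arrival rate lam, per-slave service rate 1/Tu); requires lam*Tu < P.
    Returns Pr[Xs = k] for k = 0..kmax."""
    a = lam * Tu
    norm = sum(a**k / factorial(k) for k in range(P)) \
         + a**P / (factorial(P) * (1.0 - a / P))
    I = 1.0 / norm  # normalizing constant I(lam*Tu, P)
    return [I * a**k / factorial(k) if k < P
            else I * a**k / (factorial(P) * P**(k - P))
            for k in range(kmax + 1)]

def mmP_mean(lam, Tu, P):
    """Mean slave-subsystem population Qs(lam), per (4.6)."""
    a = lam * Tu
    norm = sum(a**k / factorial(k) for k in range(P)) \
         + a**P / (factorial(P) * (1.0 - a / P))
    return a * (1.0 + a**P * (1.0 / norm) / (factorial(P - 1) * (P - a)**2))
```

Summing k·Pr[Xs = k] over the (truncated) distribution recovers the closed-form mean, a quick consistency check on both formulas.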
`2) Master Subsystem in Isolation—Sampler: In this section,
`we analyze a Markovian model of the master processor in
`isolation, assuming the scheduling algorithm is sampler.
We describe the state of the master processor by two
variables: Xm, the total number of jobs present at the master
(queued and in service), and an “internal” state Ym, which
assumes the values “u” or “k” according to whether the job
presently executing on the master is in user or kernel mode.
The state space consists of all pairs (Xm, Ym) where Ym is
either “u” or “k” and Xm is a (strictly) positive integer. (Under
sampler, the master is never idle.) We assume jobs arrive in
kernel mode at Poisson rate λ. Interval timeouts also “arrive”
as a Poisson stream, but at rate 1/Tj. Since we assume there is
no context switch cost here (Tc = 0), a job may be regarded as
a two-state Markov chain, alternating between user and kernel
modes. Taking into account the speed of the master processor,
the mean holding times in user and kernel modes for a job
executing on the master are Tu/μ and Tk/μ, respectively.
The transition diagram for the Markov chain (Xm, Ym) is
illustrated in Fig. 3. The states (Xm, u) constitute the top row,
while the states (Xm, k) form the bottom row. Transitions
from right to left (rate λ) correspond to job arrivals, vertical
transitions (rates μ/Tk and μ/Tu) to mode switches of the
currently running job, and diagonal transitions (rate 1/Tj) to
job transfers on interval timeout. An analytical proof of the
criterion λ < λ*, for the positive recurrence of the Markov
chain (Xm, Ym), is given in the Appendix.
Now, let π_i^u, π_i^k denote the equilibrium probability that there
are i jobs at the master and its internal state is “u” or “k,”
respectively. Let π_i be the two-dimensional row vector with
entries π_i = [π_i^u, π_i^k], i ≥ 1. The balance equations for the
`
`
`
`968
`
`k :
`Xm :
`
`o A_
`
`*
`
`^7
`I
`
`IEEE TRANSACTIONS ON COMPUTERS, VOL. 40, NO. 8, AUGUST 1991
`
`nr
`
`A_
`
`^7
`I
`
`X_
`
`I
`
`X_
`
`ft
`
`ft
`
`^7
`
`T
`
`• • •
`
`etc.
`• • •
`
`2
`3
`5
`1
`4
`Fig. 3. State transition diagram for master in analysis of sampler.
`
master determined by Fig. 3 may be written in the form

0 = π_{i−1} A + π_i B + π_{i+1} C,    i ≥ 2,      (4.7)
0 = π_1 D + π_2 C,      (4.8)

where the matrices A, B, C, and D account for the transition
rates into and out of Xm = i:

A = [ λ   0
      0   λ ],

B = [ −λ − μ/Tu − 1/Tj        μ/Tu
      μ/Tk                    −λ − μ/Tk ],

C = [ 0    1/Tj
      0    0 ],

D = [ −λ − μ/Tu    μ/Tu
      μ/Tk         −λ − μ/Tk ].

These equations may be solved by setting

π_i = π_1 R^{i−1},    i ≥ 1      (4.9)

where R satisfies the matrix equation

0 = A + RB + R²C.      (4.10)

The vector π_1 is found by solving the boundary equation

0 = π_1 [D + RC].      (4.11)

It turns out this matrix geometric solution [15] may be given
explicitly. Furthermore, (4.11) expresses π_1 as the stationary
vector of a two-state Markov chain. We defer the details of
the solution to the Appendix and instead just state the final
answer here:

R = λTj [ 1    (λ + μ/Tu) Tk/μ
          1    (λ + μ/Tu + 1/Tj) Tk/μ ]      (4.12)

π_1 = γ [ μ/Tk ,  λ + μ/Tu ],      (4.13)

where the constant γ, chosen so that the probabilities sum to one, is

γ = (TuTk / (μ(Tu + Tk))) (1 − λ/λ*).      (4.14)

The matrix R has two positive real eigenvalues. Let σ(R)
denote its spectral radius. It is shown in the Appendix that

σ(R) < 1  ⟺  λ < λ*.      (4.15)

The condition σ(R) < 1 is necessary and sufficient for the
summability of the sequence of vectors π_i, i ≥ 1.
The mean number of jobs on the master is found from

Qm(λ) = E[Xm] = Σ_{i≥1} i (π_i · 1) = π_1 ( Σ_{i≥1} i R^{i−1} ) 1 = π_1 (I − R)^{−2} 1      (4.16)

where 1 denotes the two-dimensional column vector of all
ones. It follows after some algebra that

Qm(λ) = [λ²TuTk² − λμTuTk + μ²(Tu + Tk)] / [μ²(Tu + Tk)(1 − λ/λ*)].      (4.17)

It is intuitively evident that Qm(λ) is an increasing function
of 0 < λ < λ*. A proof is given in the Appendix.
3) Master Subsystem in Isolation—Kickto: In this section,
we analyze a Markovian model of the master processor in
isolation, assuming the scheduling algorithm is kickto.
Jobs in kernel mode are assumed to arrive at Poisson rate
λ. The master processor is never idle. If the number of jobs
present is two or more, then the job in service must be in kernel
mode, and so receives service at exponential rate μ/Tk. If
exactly one job is present, then that job may be either in kernel
mode, receiving service at exponential rate μ/Tk, or in user
mode, receiving service at exponential rate μ/Tu. If exactly
one job is present, the current service is user mode, and a
new job arrives (necessarily in kernel mode), then the current
job is transferred out and the new job enters kernel mode
service. We describe the state of the system in equilibrium by
the nonnegative random variable

Ym = 0,    if one job is present and it is serving user mode,
Ym = k,    if k ≥ 1 jobs are present and serving kernel mode.

The Markov chain Ym evolves as a simple birth/death process,
where if k ≥ 1, the transition Ym : k → k + 1 occurs at rate
λ, the transition Ym : k → k − 1 occurs at rate μ/Tk, and if
k = 0, Ym : 0 → 1 occurs at rate λ + μ/Tu. It is clear that
the system is stable if and only if

λ < λ* = μ/Tk.      (4.18)

The transition diagram is illustrated in Fig. 4.
The number of jobs on the master (in user or kernel mode)
is given by the random variable

Xm = max(1, Ym).      (4.19)
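The explicit sampler solution (4.12)-(4.14) stated above can be checked against the closed form (4.17) by evaluating Qm = π_1 (I − R)^(−2) 1 with plain 2×2 arithmetic. A sketch (ours), with μ = 1 by default:

```python
def qm_sampler_matrix(lam, Tu, Tk, Tj, mu=1.0):
    """Qm(lam) for sampler via the matrix-geometric route (4.12)-(4.16):
    Qm = pi_1 (I - R)^(-2) 1, using explicit 2x2 arithmetic."""
    a = (lam + mu / Tu) * Tk / mu
    b = a + Tk / (mu * Tj)
    R = [[lam * Tj, lam * Tj * a],
         [lam * Tj, lam * Tj * b]]                               # (4.12)
    lam_star = mu * Tu / (Tu * Tk + mu * Tj * (Tu + Tk))
    gamma = Tu * Tk * (1.0 - lam / lam_star) / (mu * (Tu + Tk))  # (4.14)
    pi1 = [gamma * mu / Tk, gamma * (lam + mu / Tu)]             # (4.13)
    m = [[1.0 - R[0][0], -R[0][1]], [-R[1][0], 1.0 - R[1][1]]]   # I - R
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    inv = [[m[1][1] / det, -m[0][1] / det],
           [-m[1][0] / det, m[0][0] / det]]
    v = [inv[0][0] + inv[0][1], inv[1][0] + inv[1][1]]           # (I-R)^-1 1
    w = [inv[0][0] * v[0] + inv[0][1] * v[1],
         inv[1][0] * v[0] + inv[1][1] * v[1]]                    # (I-R)^-2 1
    return pi1[0] * w[0] + pi1[1] * w[1]

def qm_sampler_closed(lam, Tu, Tk, Tj, mu=1.0):
    """Qm(lam) for sampler via the closed form (4.17)."""
    lam_star = mu * Tu / (Tu * Tk + mu * Tj * (Tu + Tk))
    num = lam**2 * Tu * Tk**2 - lam * mu * Tu * Tk + mu**2 * (Tu + Tk)
    return num / (mu**2 * (Tu + Tk) * (1.0 - lam / lam_star))
```

The two routes agree for any λ below the sampler saturation rate.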
`
`
`
`
Fig. 4. State transition diagram for master in analysis of kickto.
`
We readily find that

Pr[Ym = 0] = Tu(μ − λTk) / (μ(Tu + Tk)),
Pr[Ym = k] = [Tk(μ − λTk)(μ + λTu) / (μ²(Tu + Tk))] (λTk/μ)^{k−1},    k ≥ 1.
`
The mean population is found to be

Qm(λ) = E[Xm] = [λ²Tk²Tu − λμTuTk + μ²(Tu + Tk)] / [μ²(Tu + Tk)(1 − λ/λ*)].
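The kickto birth/death chain is equally simple to evaluate: the distribution of Ym above, combined with Xm = max(1, Ym) from (4.19), reproduces the mean-population formula. A sketch (ours), truncating the geometric tail at kmax:

```python
def kickto_distribution(lam, Tu, Tk, mu=1.0, kmax=200):
    """Equilibrium distribution of the kickto chain Ym: up-rate lam
    (lam + mu/Tu out of state 0), down-rate mu/Tk; requires lam < mu/Tk.
    Returns Pr[Ym = k] for k = 0..kmax."""
    rho = lam * Tk / mu
    p0 = Tu * (mu - lam * Tk) / (mu * (Tu + Tk))
    c = Tk * (mu - lam * Tk) * (mu + lam * Tu) / (mu**2 * (Tu + Tk))
    return [p0] + [c * rho**(k - 1) for k in range(1, kmax + 1)]

def qm_kickto(lam, Tu, Tk, mu=1.0):
    """Mean master population under kickto; same form as (4.17), with
    lam* = mu/Tk."""
    num = lam**2 * Tk**2 * Tu - lam * mu * Tu * Tk + mu**2 * (Tu + Tk)
    return num / (mu**2 * (Tu + Tk) * (1.0 - lam * Tk / mu))

probs = kickto_distribution(0.5, 10.0, 1.0)
# Xm = max(1, Ym), so state 0 still counts as one job on the master:
mean_from_probs = sum(max(k, 1) * p for k, p in enumerate(probs))
```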
`
`We note that the formula for the mean population has the
`same form as in the case of sampler. Note, however, th