Live Migration of Virtual Machines
Christopher Clark, Keir Fraser, Steven Hand, Jacob Gorm Hansen†,
Eric Jul†, Christian Limpach, Ian Pratt, Andrew Warfield

University of Cambridge Computer Laboratory
15 JJ Thomson Avenue, Cambridge, UK
firstname.lastname@cl.cam.ac.uk

† Department of Computer Science
University of Copenhagen, Denmark
{jacobg,eric}@diku.dk
Abstract

Migrating operating system instances across distinct physical hosts is a useful tool for administrators of data centers and clusters: It allows a clean separation between hardware and software, and facilitates fault management, load balancing, and low-level system maintenance.

By carrying out the majority of migration while OSes continue to run, we achieve impressive performance with minimal service downtimes; we demonstrate the migration of entire OS instances on a commodity cluster, recording service downtimes as low as 60ms. We show that our performance is sufficient to make live migration a practical tool even for servers running interactive loads.

In this paper we consider the design options for migrating OSes running services with liveness constraints, focusing on data center and cluster environments. We introduce and analyze the concept of writable working set, and present the design, implementation and evaluation of high-performance OS migration built on top of the Xen VMM.

1 Introduction

Operating system virtualization has attracted considerable interest in recent years, particularly from the data center and cluster computing communities. It has previously been shown [1] that paravirtualization allows many OS instances to run concurrently on a single physical machine with high performance, providing better use of physical resources and isolating individual OS instances.

In this paper we explore a further benefit allowed by virtualization: that of live OS migration. Migrating an entire OS and all of its applications as one unit allows us to avoid many of the difficulties faced by process-level migration approaches. In particular the narrow interface between a virtualized OS and the virtual machine monitor (VMM) makes it easy to avoid the problem of ‘residual dependencies’ [2] in which the original host machine must remain available and network-accessible in order to service certain system calls or even memory accesses on behalf of migrated processes. With virtual machine migration, on the other hand, the original host may be decommissioned once migration has completed. This is particularly valuable when migration is occurring in order to allow maintenance of the original host.

Secondly, migrating at the level of an entire virtual machine means that in-memory state can be transferred in a consistent and (as will be shown) efficient fashion. This applies to kernel-internal state (e.g. the TCP control block for a currently active connection) as well as application-level state, even when this is shared between multiple cooperating processes. In practical terms, for example, this means that we can migrate an on-line game server or streaming media server without requiring clients to reconnect: something not possible with approaches which use application-level restart and layer 7 redirection.

Thirdly, live migration of virtual machines allows a separation of concerns between the users and operator of a data center or cluster. Users have ‘carte blanche’ regarding the software and services they run within their virtual machine, and need not provide the operator with any OS-level access at all (e.g. a root login to quiesce processes or I/O prior to migration). Similarly the operator need not be concerned with the details of what is occurring within the virtual machine; instead they can simply migrate the entire operating system and its attendant processes as a single unit.

Overall, live OS migration is an extremely powerful tool for cluster administrators, allowing separation of hardware and software considerations, and consolidating clustered hardware into a single coherent management domain. If a physical machine needs to be removed from service, an administrator may migrate OS instances, including the applications that they are running, to alternative machine(s), freeing the original machine for maintenance. Similarly, OS instances may be rearranged across machines in a cluster to relieve load on congested hosts. In these situations the combination of virtualization and migration significantly improves manageability.
We have implemented high-performance migration support for Xen [1], a freely available open source VMM for commodity hardware. Our design and implementation addresses the issues and tradeoffs involved in live local-area migration. Firstly, as we are targeting the migration of active OSes hosting live services, it is critically important to minimize the downtime during which services are entirely unavailable. Secondly, we must consider the total migration time, during which state on both machines is synchronized and which hence may affect reliability. Furthermore we must ensure that migration does not unnecessarily disrupt active services through resource contention (e.g., CPU, network bandwidth) with the migrating OS.

Our implementation addresses all of these concerns, allowing for example an OS running the SPECweb benchmark to migrate across two physical hosts with only 210ms unavailability, or an OS running a Quake 3 server to migrate with just 60ms downtime. Unlike application-level restart, we can maintain network connections and application state during this process, hence providing effectively seamless migration from a user’s point of view.
We achieve this by using a pre-copy approach in which pages of memory are iteratively copied from the source machine to the destination host, all without ever stopping the execution of the virtual machine being migrated. Page-level protection hardware is used to ensure a consistent snapshot is transferred, and a rate-adaptive algorithm is used to control the impact of migration traffic on running services. The final phase pauses the virtual machine, copies any remaining pages to the destination, and resumes execution there. We eschew a ‘pull’ approach which faults in missing pages across the network since this adds a residual dependency of arbitrarily long duration, as well as providing in general rather poor performance.
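
To make the shape of this approach concrete, the toy simulation below mimics the iterative push rounds followed by a final stop-and-copy. It is a sketch only, with a made-up page count and a stand-in workload; it is not the Xen implementation described later in the paper.

```python
"""Toy simulation of pre-copy migration (an illustrative sketch, not Xen's code)."""
import random

PAGES = set(range(4096))                  # pretend the VM owns 4096 pages (hypothetical size)

def workload_dirties() -> set:
    """Stand-in for the running guest: a small 'hot' set plus some random pages per round."""
    return set(range(64)) | {random.randrange(4096) for _ in range(256)}

def precopy_migrate(max_rounds: int = 4) -> None:
    to_send = set(PAGES)                  # the first round transfers every page
    for rnd in range(1, max_rounds + 1):
        pushed = len(to_send)             # pages pushed while the VM keeps executing
        to_send = workload_dirties()      # pages dirtied during this round must be re-sent
        print(f"round {rnd}: pushed {pushed} pages, {len(to_send)} re-dirtied")
    # Final stop-and-copy: pause the VM, transfer the remaining dirty pages and CPU
    # state, then resume on the destination. Downtime is proportional to this residue.
    print(f"stop-and-copy transfers {len(to_send)} pages")

if __name__ == "__main__":
    precopy_migrate()
```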
Our current implementation does not address migration across the wide area, nor does it include support for migrating local block devices, since neither of these are required for our target problem space. However we discuss ways in which such support can be provided in Section 7.

2 Related Work

The Collective project [3] has previously explored VM migration as a tool to provide mobility to users who work on different physical hosts at different times, citing as an example the transfer of an OS instance to a home computer while a user drives home from work. Their work aims to optimize for slow (e.g., ADSL) links and longer time spans, and so stops OS execution for the duration of the transfer, with a set of enhancements to reduce the transmitted image size. In contrast, our efforts are concerned with the migration of live, in-service OS instances on fast networks with only tens of milliseconds of downtime. Other projects that have explored migration over longer time spans by stopping and then transferring include Internet Suspend/Resume [4] and µDenali [5].

Zap [6] uses partial OS virtualization to allow the migration of process domains (pods), essentially process groups, using a modified Linux kernel. Their approach is to isolate all process-to-kernel interfaces, such as file handles and sockets, into a contained namespace that can be migrated. Their approach is considerably faster than results in the Collective work, largely due to the smaller units of migration. However, migration in their system is still on the order of seconds at best, and does not allow live migration; pods are entirely suspended, copied, and then resumed. Furthermore, they do not address the problem of maintaining open connections for existing services.

The live migration system presented here has considerable shared heritage with the previous work on NomadBIOS [7], a virtualization and migration system built on top of the L4 microkernel [8]. NomadBIOS uses pre-copy migration to achieve very short best-case migration downtimes, but makes no attempt at adapting to the writable working set behavior of the migrating OS.

VMware has recently added OS migration support, dubbed VMotion, to their VirtualCenter management software. As this is commercial software and strictly disallows the publication of third-party benchmarks, we are only able to infer its behavior through VMware’s own publications. These limitations make a thorough technical comparison impossible. However, based on the VirtualCenter User’s Manual [9], we believe their approach is generally similar to ours and would expect it to perform to a similar standard.

Process migration, a hot topic in systems research during the 1980s [10, 11, 12, 13, 14], has seen very little use for real-world applications. Milojicic et al. [2] give a thorough survey of possible reasons for this, including the problem of the residual dependencies that a migrated process retains on the machine from which it migrated. Examples of residual dependencies include open file descriptors, shared memory segments, and other local resources. These are undesirable because the original machine must remain available, and because they usually negatively impact the performance of migrated processes.

For example Sprite [15] processes executing on foreign nodes require some system calls to be forwarded to the home node for execution, leading to at best reduced performance and at worst widespread failure if the home node is unavailable. Although various efforts were made to ameliorate performance issues, the underlying reliance on the availability of the home node could not be avoided. A similar fragility occurs with MOSIX [14] where a deputy process on the home node must remain available to support remote execution.
We believe the residual dependency problem cannot easily be solved in any process migration scheme – even modern mobile run-times such as Java and .NET suffer from problems when network partition or machine crash causes class loaders to fail. The migration of entire operating systems inherently involves fewer or zero such dependencies, making it more resilient and robust.

3 Design

At a high level we can consider a virtual machine to encapsulate access to a set of physical resources. Providing live migration of these VMs in a clustered server environment leads us to focus on the physical resources used in such environments: specifically on memory, network and disk.

This section summarizes the design decisions that we have made in our approach to live VM migration. We start by describing how memory and then device access is moved across a set of physical hosts and then go on to a high-level description of how a migration progresses.

3.1 Migrating Memory

Moving the contents of a VM’s memory from one physical host to another can be approached in any number of ways. However, when a VM is running a live service it is important that this transfer occurs in a manner that balances the requirements of minimizing both downtime and total migration time. The former is the period during which the service is unavailable due to there being no currently executing instance of the VM; this period will be directly visible to clients of the VM as service interruption. The latter is the duration between when migration is initiated and when the original VM may be finally discarded and, hence, the source host may potentially be taken down for maintenance, upgrade or repair.

It is easiest to consider the trade-offs between these requirements by generalizing memory transfer into three phases:

Push phase: The source VM continues running while certain pages are pushed across the network to the new destination. To ensure consistency, pages modified during this process must be re-sent.

Stop-and-copy phase: The source VM is stopped, pages are copied across to the destination VM, then the new VM is started.

Pull phase: The new VM executes and, if it accesses a page that has not yet been copied, this page is faulted in (“pulled”) across the network from the source VM.

Although one can imagine a scheme incorporating all three phases, most practical solutions select one or two of the three. For example, pure stop-and-copy [3, 4, 5] involves halting the original VM, copying all pages to the destination, and then starting the new VM. This has advantages in terms of simplicity but means that both downtime and total migration time are proportional to the amount of physical memory allocated to the VM. This can lead to an unacceptable outage if the VM is running a live service.

Another option is pure demand-migration [16] in which a short stop-and-copy phase transfers essential kernel data structures to the destination. The destination VM is then started, and other pages are transferred across the network on first use. This results in a much shorter downtime, but produces a much longer total migration time; and in practice, performance after migration is likely to be unacceptably degraded until a considerable set of pages have been faulted across. Until this time the VM will fault on a high proportion of its memory accesses, each of which initiates a synchronous transfer across the network.
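
The cost of this pull-based scheme can be seen with a small model: every first touch of a page that has not yet been copied becomes a synchronous network round trip. The sketch below is purely illustrative; the per-fetch latency is an assumed figure, not a measured one.

```python
"""Toy model of pure demand-migration: pages are pulled on first use (illustrative only)."""

FETCH_MS = 0.5                       # assumed cost of one synchronous page fetch over the LAN

def stall_time_after_resume(access_trace: list) -> float:
    """Total stall (ms) caused by faulting pages in from the source on first access."""
    resident = set()                 # only essential kernel pages were copied before resume
    stall_ms = 0.0
    for page in access_trace:
        if page not in resident:     # first touch: synchronously pull the page across
            stall_ms += FETCH_MS
            resident.add(page)
    return stall_ms

# Touching 50,000 distinct pages after resume would cost roughly 25 seconds of stalls here.
print(stall_time_after_resume(list(range(50_000))))
```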

The approach taken in this paper, pre-copy [11] migration, balances these concerns by combining a bounded iterative push phase with a typically very short stop-and-copy phase. By ‘iterative’ we mean that pre-copying occurs in rounds, in which the pages to be transferred during round n are those that are modified during round n − 1 (all pages are transferred in the first round). Every VM will have some (hopefully small) set of pages that it updates very frequently and which are therefore poor candidates for pre-copy migration. Hence we bound the number of rounds of pre-copying, based on our analysis of the writable working set (WWS) behavior of typical server workloads, which we present in Section 4.
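
The concrete bounding heuristics appear later in the paper; purely as an illustration of the rule just described, one plausible stopping test combines a hard cap on rounds with a check that pre-copy is still gaining ground on the dirtying rate:

```python
def should_stop_precopy(pages_dirtied_this_round: int, pages_sent_this_round: int,
                        rnd: int, max_rounds: int = 4) -> bool:
    """Illustrative stopping rule for the bounded push phase (not the paper's exact heuristic).

    Stop when the round bound is reached, or when the workload re-dirties pages at
    least as fast as we can push them, i.e. we are down to the frequently written
    'hot' set (the WWS) and further rounds would mostly resend the same pages.
    """
    return rnd >= max_rounds or pages_dirtied_this_round >= pages_sent_this_round
```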

Finally, a crucial additional concern for live migration is the impact on active services. For instance, iteratively scanning and sending a VM’s memory image between two hosts in a cluster could easily consume the entire bandwidth available between them and hence starve the active services of resources. This service degradation will occur to some extent during any live migration scheme. We address this issue by carefully controlling the network and CPU resources used by the migration process, thereby ensuring that it does not interfere excessively with active traffic or processing.
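
The specific rate-adaptive algorithm is not reproduced in this excerpt; as a generic illustration of controlling the network resources used by the migration process, the sketch below caps the migration stream with a simple token bucket. The 256 Mbit/s figure and the per-page call pattern are arbitrary examples, not the paper's mechanism.

```python
import time

class MigrationRateLimiter:
    """Token-bucket cap on migration traffic (illustrative sketch; constants are examples)."""

    def __init__(self, mbit_per_sec: float):
        self.rate = mbit_per_sec * 1e6 / 8        # permitted bytes per second
        self.tokens = 0.0
        self.last = time.monotonic()

    def wait_to_send(self, nbytes: int) -> None:
        """Block until nbytes may be transmitted without exceeding the configured cap."""
        while True:
            now = time.monotonic()
            self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= nbytes:
                self.tokens -= nbytes
                return
            time.sleep((nbytes - self.tokens) / self.rate)

# Example: throttle the pre-copy stream to 256 Mbit/s by calling wait_to_send(4096)
# before each 4 KB page, so migration cannot starve the services it is moving.
limiter = MigrationRateLimiter(256)
limiter.wait_to_send(4096)
```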

3.2 Local Resources

A key challenge in managing the migration of OS instances is what to do about resources that are associated with the physical machine that they are migrating away from. While memory can be copied directly to the new host, connections to local devices such as disks and network interfaces demand additional consideration. The two key problems that we have encountered in this space concern what to do with network resources and local storage.
For network resources, we want a migrated OS to maintain all open network connections without relying on forwarding mechanisms on the original host (which may be shut down following migration), or on support from mobility or redirection mechanisms that are not already present (as in [6]). A migrating VM will include all protocol state (e.g. TCP PCBs), and will carry its IP address with it.

To address these requirements we observed that in a cluster environment, the network interfaces of the source and destination machines typically exist on a single switched LAN. Our solution for managing migration with respect to network in this environment is to generate an unsolicited ARP reply from the migrated host, advertising that the IP has moved to a new location. This will reconfigure peers to send packets to the new physical address, and while a very small number of in-flight packets may be lost, the migrated domain will be able to continue using open connections with almost no observable interference.

Some routers are configured not to accept broadcast ARP replies (in order to prevent IP spoofing), so an unsolicited ARP may not work in all scenarios. If the operating system is aware of the migration, it can opt to send directed replies only to interfaces listed in its own ARP cache, to remove the need for a broadcast. Alternatively, on a switched network, the migrating OS can keep its original Ethernet MAC address, relying on the network switch to detect its move to a new port¹.

¹Note that on most Ethernet controllers, hardware MAC filtering will have to be disabled if multiple addresses are in use (though some cards support filtering of multiple addresses in hardware) and so this technique is only practical for switched networks.
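
For illustration, an unsolicited (gratuitous) ARP reply of the kind described above can be constructed by hand with a raw packet socket. The sketch below is not the paper's implementation, which runs as part of the migration itself; the interface, MAC and IP values are placeholders.

```python
"""Send an unsolicited (gratuitous) ARP reply advertising that an IP has moved.

Illustrative sketch only; requires Linux and root privileges (AF_PACKET raw sockets).
The interface name, MAC address and IP address used below are hypothetical examples.
"""
import socket
import struct

def send_gratuitous_arp(iface: str, new_mac: bytes, moved_ip: str) -> None:
    broadcast = b"\xff" * 6
    ip_bytes = socket.inet_aton(moved_ip)
    # Ethernet header: destination (broadcast), source (new MAC), EtherType 0x0806 (ARP).
    ether = broadcast + new_mac + struct.pack("!H", 0x0806)
    # ARP reply (opcode 2): sender and target both name the moved IP and the new MAC,
    # prompting peers to update their ARP caches and switches to relearn the port.
    arp = (struct.pack("!HHBBH", 1, 0x0800, 6, 4, 2)
           + new_mac + ip_bytes + broadcast + ip_bytes)
    with socket.socket(socket.AF_PACKET, socket.SOCK_RAW) as sock:
        sock.bind((iface, 0))
        sock.send(ether + arp)

if __name__ == "__main__":
    send_gratuitous_arp("eth0", bytes.fromhex("00163e112233"), "10.0.0.42")
```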

In the cluster, the migration of storage may be similarly addressed: Most modern data centers consolidate their storage requirements using a network-attached storage (NAS) device, in preference to using local disks in individual servers. NAS has many advantages in this environment, including simple centralised administration, widespread vendor support, and reliance on fewer spindles leading to a reduced failure rate. A further advantage for migration is that it obviates the need to migrate disk storage, as the NAS is uniformly accessible from all host machines in the cluster. We do not address the problem of migrating local-disk storage in this paper, although we suggest some possible strategies as part of our discussion of future work.

[Figure 1: Migration timeline. The VM runs normally on Host A through Stage 0 (Pre-Migration: active VM on Host A; alternate physical host may be preselected for migration; block devices mirrored and free resources maintained), Stage 1 (Reservation: initialize a container on the target host) and Stage 2 (Iterative Pre-copy: enable shadow paging; copy dirty pages in successive rounds), incurring only copying overhead. Downtime (VM out of service) is confined to Stage 3 (Stop and copy: suspend VM on Host A; generate ARP to redirect traffic to Host B; synchronize all remaining VM state to Host B). After Stage 4 (Commitment: VM state on Host A is released) and Stage 5 (Activation: VM starts on Host B, connects to local devices, resumes normal operation), the VM runs normally on Host B.]

3.3 Design Overview

The logical steps that we execute when migrating an OS are summarized in Figure 1. We take a conservative approach to the management of migration with regard to safety and failure handling. Although the consequences of hardware failures can be severe, our basic principle is that safe migration should at no time leave a virtual OS more exposed to system failure than when it is running on the original single host. To achieve this, we view the migration process as a transactional interaction between the two hosts involved:
Stage 0: Pre-Migration. We begin with an active VM on physical host A. To speed any future migration, a target host may be preselected where the resources required to receive migration will be guaranteed.

Stage 1: Reservation. A request is issued to migrate an OS from host A to host B. We initially confirm that the necessary resources are available on B and reserve a VM container of that size. Failure to secure resources here means that the VM simply continues to run on A unaffected.

Stage 2: Iterative Pre-Copy. During the first iteration, all pages are transferred from A to B. Subsequent iterations copy only those pages dirtied during the previous transfer phase.

Stage 3: Stop-and-Copy. We suspend the running OS instance at A and redirect its network traffic to B. As described earlier, CPU state and any remaining inconsistent memory pages are then transferred. At the end of this stage there is a consistent suspended copy of the VM at both A and B. The copy at A is still considered to be primary and is resumed in case of failure.

Stage 4: Commitment. Host B indicates to A that it has successfully received a consistent OS image. Host A acknowledges this message as commitment of the migration transaction: host A may now discard the original VM, and host B becomes the primary host.

Stage 5: Activation. The migrated VM on B is now activated. Post-migration code runs to reattach device drivers to the new machine and advertise moved IP addresses.
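
Strung together, the stages above amount to a small transactional driver. The sketch below is an illustration only: every method on the hypothetical vm, host_a and host_b handles stands in for a mechanism described in the text, and any failure simply resumes the still-primary copy on A.

```python
class MigrationAborted(Exception):
    """Migration failed or was refused; the VM continues (or resumes) on host A."""

def migrate(vm, host_a, host_b):
    """Illustrative driver for Stages 0-5; the vm/host methods are hypothetical stand-ins."""
    # Stage 0 (Pre-Migration): an active VM on host A; host B may have been preselected.
    container = host_b.reserve(vm.memory_size)        # Stage 1: Reservation
    if container is None:
        raise MigrationAborted("insufficient resources on B; VM continues on A unaffected")
    try:
        vm.enable_shadow_paging()                     # Stage 2: Iterative Pre-copy
        to_send = vm.all_pages()
        while to_send and not vm.precopy_bound_reached():
            container.copy_pages(to_send)             # VM keeps running during these rounds
            to_send = vm.pages_dirtied_since_last_round()
        vm.suspend()                                  # Stage 3: Stop-and-copy
        host_b.redirect_network(vm)                   # e.g. the unsolicited ARP reply
        container.copy_pages(to_send)
        container.copy_cpu_state(vm)
        container.confirm_consistent_image()          # Stage 4: Commitment
        host_a.release(vm)                            # A may now discard the original VM
        container.activate()                          # Stage 5: reattach devices, advertise IPs
    except Exception as err:
        vm.resume()                                   # the copy at A is still primary on failure
        raise MigrationAborted(str(err)) from err
```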
[Figure 2: WWS curve for a complete run of SPEC CINT2000 (512MB VM). The plot, "Tracking the Writable Working Set of SPEC CINT2000", shows the number of pages dirtied (0–80,000) against elapsed time in seconds (0–12,000), annotated with the sub-benchmarks gzip, vpr, gcc, mcf, crafty, parser, eon, perlbmk, gap, vortex, bzip2 and twolf.]

This approach to failure management ensures that at least one host has a consistent VM image at all times during migration. It depends on the assumption that the original host remains stable until the migration commits, and that the VM may be suspended and resumed on that host with no risk of failure. Based on these assumptions, a migration request essentially attempts to move the VM to a new host, and on any sort of failure execution is resumed locally, aborting the migration.

4 Writable Working Sets

When migrating a live operating system, the most significant influence on service performance is the overhead of coherently transferring the virtual machine’s memory image. As mentioned previously, a simple stop-and-copy approach will achieve this in time proportional to the amount of memory allocated to the VM. Unfortunately, during this time any running services are completely unavailable.

A more attractive alternative is pre-copy migration, in which the memory image is transferred while the operating system (and hence all hosted services) continue to run. The drawback, however, is the wasted overhead of transferring memory pages that are subsequently modified, and hence must be transferred again. For many workloads there will be a small set of memory pages that are updated very frequently, and which it is not worth attempting to maintain coherently on the destination machine before stopping and copying the remainder of the VM.

The fundamental question for iterative pre-copy migration is: how does one determine when it is time to stop the pre-copy phase because too much time and resource is being wasted? Clearly if the VM being migrated never modifies memory, a single pre-copy of each memory page will suffice to transfer a consistent image to the destination. However, should the VM continuously dirty pages faster than the rate of copying, then all pre-copy work will be in vain and one should immediately stop and copy.

In practice, one would expect most workloads to lie somewhere between these extremes: a certain (possibly large) set of pages will seldom or never be modified and hence are good candidates for pre-copy, while the remainder will be written often and so should best be transferred via stop-and-copy – we dub this latter set of pages the writable working set (WWS) of the operating system by obvious extension of the original working set concept [17].

In this section we analyze the WWS of operating systems running a range of different workloads in an attempt to obtain some insight to allow us to build heuristics for an efficient and controllable pre-copy implementation.

4.1 Measuring Writable Working Sets

To trace the writable working set behaviour of a number of representative workloads we used Xen’s shadow page tables (see Section 5) to track dirtying statistics on all pages used by a particular executing operating system. This allows us to determine within any time period the set of pages written to by the virtual machine.
[Figure 3: Expected downtime due to last-round memory copy on traced page dirtying of a Linux kernel compile. "Effect of Bandwidth and Pre-Copy Iterations on Migration Downtime (based on a page trace of Linux Kernel Compile)": three panels for migration throughputs of 128, 256 and 512 Mbit/sec, each overlaying the rate of page dirtying (pages/sec) and the expected downtime (sec) against elapsed time (sec).]

[Figure 4: Expected downtime due to last-round memory copy on traced page dirtying of OLTP. "Effect of Bandwidth and Pre-Copy Iterations on Migration Downtime (based on a page trace of OLTP Database Benchmark)": three panels for 128, 256 and 512 Mbit/sec migration throughputs, with the same axes as Figure 3.]

[Figure 5: Expected downtime due to last-round memory copy on traced page dirtying of a Quake 3 server. "Effect of Bandwidth and Pre-Copy Iterations on Migration Downtime (based on a page trace of Quake 3 Server)": three panels for 128, 256 and 512 Mbit/sec migration throughputs, with the same axes as Figure 3.]

[Figure 6: Expected downtime due to last-round memory copy on traced page dirtying of SPECweb. "Effect of Bandwidth and Pre-Copy Iterations on Migration Downtime (based on a page trace of SPECweb)": three panels for 128, 256 and 512 Mbit/sec migration throughputs, with the same axes as Figure 3.]
Using the above, we conducted a set of experiments to sample the writable working set size for a variety of benchmarks. Xen was running on a dual processor Intel Xeon 2.4GHz machine, and the virtual machine being measured had a memory allocation of 512MB. In each case we started the relevant benchmark in one virtual machine and read the dirty bitmap every 50ms from another virtual machine, cleaning it every 8 seconds – in essence this allows us to compute the WWS with a (relatively long) 8 second window, but estimate it at a finer (50ms) granularity.
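
As a sketch of this measurement loop (not the actual harness used for the experiments), the read/clear calls below stand in for the Xen shadow-page-table dirty-bitmap interface, with the 50ms sampling period and 8 second cleaning window quoted above.

```python
import time

SAMPLE_MS, CLEAN_S = 50, 8          # parameters quoted in the text

def trace_wws(read_dirty_bitmap, clear_dirty_bitmap, duration_s: float):
    """Sample the dirty bitmap every 50 ms, cleaning it every 8 s (illustrative sketch).

    read_dirty_bitmap() is assumed to return the set of pages written since the last
    clean; clear_dirty_bitmap() resets it. Both are hypothetical stand-ins.
    """
    samples = []                                   # (seconds elapsed, pages dirty in window)
    start = time.monotonic()
    next_clean = start + CLEAN_S
    while time.monotonic() - start < duration_s:
        samples.append((time.monotonic() - start, len(read_dirty_bitmap())))
        if time.monotonic() >= next_clean:         # begin a fresh 8-second window
            clear_dirty_bitmap()
            next_clean += CLEAN_S
        time.sleep(SAMPLE_MS / 1000.0)
    return samples
```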

The benchmarks we ran were SPEC CINT2000, a Linux kernel compile, the OSDB OLTP benchmark using PostgreSQL and SPECweb99 using Apache. We also measured a Quake 3 server as we are particularly interested in highly interactive workloads.

Figure 2 illustrates the writable working set curve produced for the SPEC CINT2000 benchmark run. This benchmark involves running a series of smaller programs in order and measuring the overall execution time. The x-axis measures elapsed time, and the y-axis shows the number of 4KB pages of memory dirtied within the corresponding 8 second interval; the graph is annotated with the names of the sub-benchmark programs.

From this data we observe that the writable working set varies significantly between the different sub-benchmarks. For programs such as ‘eon’ the WWS is a small fraction of the total working set and hence is an excellent candidate for migration. In contrast, ‘gap’ has a consistently high dirtying rate and would be problematic to migrate. The other benchmarks go through various phases but are generally amenable to live migration. Thus performing a migration of an operating system will give different results depending on the workload and the precise moment at which migration begins.

4.2 Estimating Migration Effectiveness

We observed that we could use the trace data acquired to estimate the effectiveness of iterative pre-copy migration for various workloads. In particular we can simulate a particular network bandwidth for page transfer, determine how many pages would be dirtied during a particular iteration, and then repeat for successive iterations. Since we know the approximate WWS behaviour at every point in time, we can estimate the overall amount of data transferred in the final stop-and-copy round and hence estimate the downtime.
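
This estimation procedure can be written out as a small replay over the trace. The sketch below is one plausible rendering of it, assuming a per-second dirtying trace, 4KB pages and a 512MB allocation; it approximates the pages dirtied during a copy interval by summing the trace over that interval.

```python
PAGE_BYTES = 4096                                    # assuming 4 KB pages

def estimate_downtime(dirty_trace_pages_per_sec: list, start_sec: int,
                      bandwidth_mbit: float, precopy_rounds: int) -> float:
    """Replay iterative pre-copy against a dirtying trace and return the expected
    downtime (seconds) of the final stop-and-copy round. Illustrative sketch only."""
    bytes_per_sec = bandwidth_mbit * 1e6 / 8
    pages_to_send = 512 * 1024 * 1024 // PAGE_BYTES  # first round sends the whole 512 MB VM
    t = float(start_sec)
    for _ in range(precopy_rounds):
        copy_time = pages_to_send * PAGE_BYTES / bytes_per_sec
        # Pages dirtied while this round was in flight are carried into the next round.
        pages_to_send = sum(dirty_trace_pages_per_sec[int(t):int(t + copy_time) + 1])
        t += copy_time
    # Whatever is still dirty is copied with the VM paused: this is the downtime.
    return pages_to_send * PAGE_BYTES / bytes_per_sec
```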

Figures 3–6 show our results for the four remaining workloads. Each figure comprises three graphs, each of which corresponds to a particular network bandwidth limit for page transfer; each individual graph shows the WWS histogram (in light gray) overlaid with four line plots estimating service downtime for up to four pre-copying rounds.

Looking at the topmost line (one pre-copy iteration), the first thing to observe is that pre-copy migration always performs considerably better than naive stop-and-copy. For a 512MB virtual machine this latter approach would require 32, 16, and 8 seconds downtime for the 128Mbit/sec, 256Mbit/sec and 512Mbit/sec bandwidths respectively. Even in the worst case (the starting phase of SPECweb), a single pre-copy iteration reduces downtime by a factor of four. In most cases we can expect to do considerably better – for example both the Linux kernel compile and the OLTP benchmark typically experience a reduction in downtime of at least a factor of sixteen.
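
For reference, the naive stop-and-copy figures quoted above follow directly from dividing the VM's memory by the available bandwidth:

```latex
t_{\text{stop-and-copy}} = \frac{\text{VM memory}}{\text{bandwidth}}
  = \frac{512\,\text{MB} \times 8\ \text{bit/byte}}{128\,\text{Mbit/s}} = 32\,\text{s},
  \qquad \frac{4096\,\text{Mbit}}{256\,\text{Mbit/s}} = 16\,\text{s},
  \qquad \frac{4096\,\text{Mbit}}{512\,\text{Mbit/s}} = 8\,\text{s}.
```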
The remaining three lines show, in order, the effect of performing a total of two, three or four pre-copy iterations prior to the final stop-and-copy round. In most cases we see an increased reduction in downtime from performing these additional iterations, although with somewhat diminishing returns, particularly in the higher bandwidth cases. This is because all the observed workloads exhibit a small but extremely frequently updated set of ‘hot’ pages. In practice these pages will include the stack and local variables being accessed within the currently executing processes as well as pages being used for network and disk traffic. The hottest pages will be dirtied at least as fast as we can transfer them, and hence must be transferred in the final stop-and-copy phase. This puts a lower bound on the best possible
