US007484208B1

(12) United States Patent                    (10) Patent No.:     US 7,484,208 B1
     Nelson                                  (45) Date of Patent: Jan. 27, 2009

(54) VIRTUAL MACHINE MIGRATION

(76) Inventor: Michael Nelson, 888 Forest La., Alamo, CA (US) 94507

( * ) Notice: Subject to any disclaimer, the term of this patent is extended or
      adjusted under 35 U.S.C. 154(b) by 425 days.

(21) Appl. No.: 10/319,217

(22) Filed: Dec. 12, 2002

(51) Int. Cl.
     G06F 9/455  (2006.01)
     G06F 12/00  (2006.01)
(52) U.S. Cl. ........................... 718/1; 711/6
(58) Field of Classification Search .... 718/1; 709/224; 711/153, 6
     See application file for complete search history.

(56) References Cited

     U.S. PATENT DOCUMENTS

     6,075,938 A *       6/2000  Bugnion et al. ............ 703/27
     6,698,017 B1 *      2/2004  Adamovits et al. .......... 717/168
     2004/0010787 A1 *   1/2004  Traut et al. ............... 718/1

     OTHER PUBLICATIONS

     Theimer, Marvin M., Lantz, Keith A., and Cheriton, David R., "Preemptable
     Remote Execution Facilities for the V-System," Association for Computing
     Machinery, pp. 2-12, Dec. 1985.

     * cited by examiner

Primary Examiner - Li B Zhen

(57) ABSTRACT

A source virtual machine (VM) hosted on a source server is migrated to a destination VM on a destination server without first powering down the source VM. After optional pre-copying of the source VM's memory to the destination VM, the source VM is suspended and its non-memory state is transferred to the destination VM; the destination VM is then resumed from the transferred state. The source VM memory is either paged in to the destination VM on demand, or is transferred asynchronously by pre-copying and write-protecting the source VM memory, and then later transferring only the modified pages after the destination VM is resumed. The source and destination servers preferably share common storage, in which the source VM's virtual disk is stored; this avoids the need to transfer the virtual disk contents. Network connectivity is preferably also made transparent to the user by arranging the servers on a common subnet, with virtual network connection addresses generated from a common name space of physical addresses.

4 Claims, 3 Drawing Sheets
[Representative drawing (FIG. 3): migration of a source VM 1200 on source server 1000 to a destination VM 1202 on destination server 1002, coordinated by a server daemon 1300, source kernel 1600, destination kernel 1602, and migration module 2000; shown in full as Sheet 3 of 3 below.]
[Sheet 1 of 3 - FIG. 1: virtualized server 1000, showing the hardware system 100 (CPU(s) 110, memory 130, disk 140, MMU 150, registers 160, NIC 170) connected to the network 700; the kernel 600 with loadable modules and drivers 610, worlds 612, RPC 614, memory management 616, a scheduler, and an interrupt handler 650; VMMs 300, ..., 300n with device emulators 330, memory management 350, and migration module 360; the console OS (COS) 420; and VMs 200, ..., 200n, each with VCPU(s) 210, guest OS 220 and drivers 224, virtual memory, virtual disk, and virtual device(s).]
[Sheet 2 of 3 - FIG. 2: a server farm in which users 900, 902, 904, ..., 906 access servers 1000, 1002, ..., 1004 over the network 700; each server hosts a plurality of virtual machines, and the drawing labels source and destination kernels (1600, 1602) within the servers.]
[Sheet 3 of 3 - FIG. 3: migration message sequence between the server daemon 1300, destination VM 1202, and destination kernel 1602 on destination server 1002, and the source VM 1200 and source kernel 1600 on source server 1000, coordinated by the migration module 2000. Labeled steps: 1. Migrate prepare; 2. Create VM; 3. Ready; 3A. Wait for migration; 4. Ready; 5. Suspend and migrate; 6. Begin save; 7. Pre-copy memory; 8. Save state; 9. End save; 10. Restore state; 11. End restore; 12. Page in; 13. Done.]
VIRTUAL MACHINE MIGRATION
BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates to a computer architecture, in particular, to an architecture that coordinates the operation of multiple virtual machines.

2. Description of the Related Art

The advantages of virtual machine technology have become widely recognized. Among these advantages is the ability to run multiple virtual machines on a single host platform. This makes better use of the capacity of the hardware, while still ensuring that each user enjoys the features of a "complete," isolated computer.
General Virtualized Computer System

As is well known in the field of computer science, a virtual machine (VM) is a software abstraction - a "virtualization" - of an actual physical computer system. FIG. 1 illustrates, in part, the general configuration of a virtual machine 200, which is installed as a "guest" on a "host" hardware platform 100.

As FIG. 1 shows, the hardware platform 100 includes one or more processors (CPUs) 110, system memory 130, and a storage device, which will typically be a disk 140. The system memory will typically be some form of high-speed RAM, whereas the disk (one or more) will typically be a non-volatile, mass storage device. The hardware 100 will also include other conventional mechanisms such as a memory management unit MMU 150, various registers 160, and any conventional network connection device 170 (such as a network adapter or network interface card - "NIC") for transfer of data between the various components of the system and a network 700, which may be any known public or proprietary local or wide-area network such as the Internet, an internal enterprise network, etc.

Each VM 200 will typically include at least one virtual CPU 210, a virtual disk 240, a virtual system memory 230, a guest operating system (which may simply be a copy of a conventional operating system) 220, and various virtual devices, in which case the guest operating system ("guest OS") will include corresponding drivers 224. All of the components of the VM may be implemented in software using known techniques to emulate the corresponding components of an actual computer.

If the VM is properly designed, then it will not be apparent to the user that any applications 260 running within the VM are running indirectly, that is, via the guest OS and virtual processor. Applications 260 running within the VM will act just as they would if run on a "real" computer, except for a decrease in running speed that will be noticeable only in exceptionally time-critical applications. Executable files will be accessed by the guest OS from the virtual disk or virtual memory, which will simply be portions of the actual physical disk or memory allocated to that VM. Once an application is installed within the VM, the guest OS retrieves files from the virtual disk just as if they had been pre-stored as the result of a conventional installation of the application. The design and operation of virtual machines is well known in the field of computer science.
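By way of illustration only (the patent itself gives no code), the virtual hardware enumerated above can be pictured as a per-VM descriptor maintained by the host-side software; every identifier in this sketch is hypothetical and not taken from the patent.

    /* Hypothetical sketch of the virtual hardware a VM 200 exposes to its
     * guest OS 220; field names are illustrative only. */
    #include <stddef.h>
    #include <stdint.h>

    typedef struct vcpu    { uint64_t regs[32]; }          vcpu_t;     /* virtual CPU 210  */
    typedef struct vdisk   { const char *backing_file; }   vdisk_t;    /* virtual disk 240 */
    typedef struct vdevice { int type; void *emul_state; } vdevice_t;  /* virtual device   */

    typedef struct vm {
        vcpu_t    *vcpus;        /* one or more virtual CPUs                         */
        size_t     num_vcpus;
        uint8_t   *vmem;         /* virtual system memory 230 (guest RAM)            */
        size_t     vmem_bytes;
        vdisk_t    vdisk;        /* virtual disk 240                                 */
        vdevice_t *vdevices;     /* emulated devices the guest's drivers 224 talk to */
        size_t     num_vdevices;
    } vm_t;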
Some interface is usually required between a VM and the underlying host platform (in particular, the CPU), which is responsible for actually executing VM-issued instructions and transferring data to and from the actual memory and storage devices. A common term for this interface is a "virtual machine monitor" (VMM), shown as component 300. A VMM is usually a thin piece of software that runs directly on top of a host, or directly on the hardware, and virtualizes all the resources of the machine. Among other components, the VMM therefore usually includes device emulators 330, which may constitute the virtual devices (230) that the VM 200 addresses. The interface exported to the VM is then the same as the hardware interface of the machine, so that the guest OS cannot determine the presence of the VMM. The VMM also usually tracks and either forwards (to some form of operating system) or itself schedules and handles all requests by its VM for machine resources, as well as various faults and interrupts.

Although the VM (and thus the user of applications running in the VM) cannot usually detect the presence of the VMM, the VMM and the VM may be viewed as together forming a single virtual computer. They are shown in FIG. 1 as separate components for the sake of clarity.
Virtual and Physical Memory

As in most modern computers, the address space of the memory 130 is partitioned into pages (for example, in the Intel x86 architecture) or regions (for example, the Intel IA-64 architecture). Applications then address the memory 130 using virtual addresses (VAs), which include virtual page numbers (VPNs). The VAs are then mapped to physical addresses (PAs) that are used to address the physical memory 130. (VAs and PAs have a common offset from a base address, so that only the VPN needs to be converted into a corresponding PPN.) The concepts of VPNs and PPNs, as well as the way in which the different page numbering schemes are implemented and used, are described in many standard texts, such as "Computer Organization and Design: The Hardware/Software Interface," by David A. Patterson and John L. Hennessy, Morgan Kaufmann Publishers, Inc., San Francisco, Calif., 1994, pp. 579-603 (chapter 7.4 "Virtual Memory"). Similar mappings are used in region-based architectures or, indeed, in any architecture where relocatability is possible.

An extra level of addressing indirection is typically implemented in virtualized systems in that a VPN issued by an application 260 in the VM 200 is remapped twice in order to determine which page of the hardware memory is intended. The first mapping is provided by a mapping module within the guest OS 220, which translates the guest VPN (GVPN) into a corresponding guest PPN (GPPN) in the conventional manner. The guest OS therefore "believes" that it is directly addressing the actual hardware memory, but in fact it is not.

Of course, a valid address to the actual hardware memory must ultimately be generated. A memory management module 350 in the VMM 300 therefore performs the second mapping by taking the GPPN issued by the guest OS 220 and mapping it to a hardware (or "machine") page number PPN that can be used to address the hardware memory 130. This GPPN-to-PPN mapping is typically done in the main system-level software layer (such as the kernel 600 described below), depending on the implementation: From the perspective of the guest OS, the GVPN and GPPN might be virtual and physical page numbers just as they would be if the guest OS were the only OS in the system. From the perspective of the system software, however, the GPPN is a page number that is then mapped into the physical memory space of the hardware memory as a PPN.
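The two-level remapping described above can be sketched roughly as follows. This is only an illustration of the idea, not the patent's implementation, and the table and function names are hypothetical.

    #include <stdint.h>

    #define PAGE_SHIFT 12                           /* 4 KB pages on x86 */
    #define PAGE_OFFSET_MASK ((1u << PAGE_SHIFT) - 1)

    /* Hypothetical lookup tables: the guest OS maps GVPN -> GPPN,
     * and the VMM/kernel maps GPPN -> machine PPN. */
    extern uint64_t guest_page_table[];             /* maintained by guest OS 220        */
    extern uint64_t gppn_to_ppn[];                  /* maintained by VMM 300 / kernel 600 */

    /* Translate a guest virtual address into a machine (hardware) address. */
    static uint64_t translate_gva(uint64_t gva)
    {
        uint64_t gvpn = gva >> PAGE_SHIFT;          /* guest virtual page number  */
        uint64_t gppn = guest_page_table[gvpn];     /* first mapping: guest OS    */
        uint64_t ppn  = gppn_to_ppn[gppn];          /* second mapping: VMM/kernel */
        return (ppn << PAGE_SHIFT) | (gva & PAGE_OFFSET_MASK);  /* offset kept    */
    }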
System Software Configurations in Virtualized Systems

In some systems, such as the Workstation product of VMware, Inc., of Palo Alto, Calif., the VMM is co-resident at system level with a host operating system. Both the VMM and the host OS can independently modify the state of the host processor, but the VMM calls into the host OS via a driver and a dedicated user-level application to have the host OS perform certain I/O operations on behalf of the VM. The virtual computer in this configuration is thus fully hosted in that it runs on an existing host hardware platform and together with an existing host OS.
In other implementations, a dedicated kernel takes the place of and performs the conventional functions of the host OS, and virtual computers run on the kernel. FIG. 1 illustrates a kernel 600 that serves as the system software for several VM/VMM pairs 200/300, ..., 200n/300n. Compared with a system in which VMMs run directly on the hardware platform, use of a kernel offers greater modularity and facilitates provision of services that extend across multiple VMs (for example, for resource management). Compared with the hosted deployment, a kernel may offer greater performance because it can be co-developed with the VMM and be optimized for the characteristics of a workload consisting of VMMs. The ESX Server product of VMware, Inc., has such a configuration.

A kernel-based virtualization system of the type illustrated in FIG. 1 is described in U.S. patent application Ser. No. 09/877,378 ("Computer Configuration for Resource Management in Systems Including a Virtual Machine"), which is incorporated here by reference. The main components of this system and aspects of their interaction are, however, outlined below.
At boot-up time, an existing operating system 420 may be at system level and the kernel 600 may not yet even be operational within the system. In such case, one of the functions of the OS 420 may be to make it possible to load the kernel 600, after which the kernel runs on the native hardware and manages system resources. In effect, the kernel, once loaded, displaces the OS 420. Thus, the kernel 600 may be viewed either as displacing the OS 420 from the system level and taking this place itself, or as residing at a "sub-system level." When interposed between the OS 420 and the hardware 100, the kernel 600 essentially turns the OS 420 into an "application," which has access to system resources only when allowed by the kernel 600. The kernel then schedules the OS 420 as if it were any other component that needs to use system resources.

The OS 420 may also be included to allow applications unrelated to virtualization to run; for example, a system administrator may need such applications to monitor the hardware 100 or to perform other administrative routines. The OS 420 may thus be viewed as a "console" OS (COS). In this case, the kernel 600 preferably also provides a remote procedure call (RPC) mechanism 614 to enable communication between, for example, the VMM 300 and any applications 800 installed to run on the COS 420.
Worlds

The kernel 600 handles not only the various VMM/VMs, but also any other applications running on the kernel, as well as the COS 420 and even the hardware CPU(s) 110, as entities that can be separately scheduled. In this disclosure, each schedulable entity is referred to as a "world," which contains a thread of control, an address space, machine memory, and handles to the various device objects that it is accessing. Worlds, represented in FIG. 1 within the kernel 600 as module 612, are stored in a portion of the memory space controlled by the kernel. Each world also has its own task structure, and usually also a data structure for storing the hardware state currently associated with the respective world.

There will usually be different types of worlds: 1) system worlds, which are used for idle worlds, one per CPU, and a helper world that performs tasks that need to be done asynchronously; 2) a console world, which is a special world that runs in the kernel and is associated with the COS 420; and 3) virtual machine worlds.

Worlds preferably run at the most-privileged level (for example, in a system with the Intel x86 architecture, this will be level CPL0), that is, with full rights to invoke any privileged CPU operations. A VMM, which, along with its VM, constitutes a separate world, therefore may use these privileged instructions to allow it to run its associated VM so that it performs just like a corresponding "real" computer, even with respect to privileged operations.
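A world, as described above, bundles a thread of control, an address space, machine memory, and device handles into one schedulable unit. A minimal sketch of such a descriptor, with entirely hypothetical names and fields, might look like this (the saved hardware state is sketched further below).

    #include <stddef.h>
    #include <stdint.h>

    struct hw_state;                      /* hardware state saved at a world switch */

    /* Hypothetical world types, mirroring the three kinds described above. */
    enum world_type { WORLD_SYSTEM, WORLD_CONSOLE, WORLD_VM };

    struct world {
        enum world_type   type;
        uint64_t          id;
        void             *stack;           /* thread of control                    */
        void             *address_space;   /* page tables for this world           */
        void             *machine_mem;     /* machine memory backing the world     */
        void            **device_handles;  /* handles to device objects in use     */
        size_t            num_devices;
        struct hw_state  *saved_state;     /* state saved/restored at world switch */
    };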
Switching Worlds

When the world that is running on a particular CPU (which may be the only one) is preempted by or yields to another world, then a world switch has to occur. A world switch involves saving the context of the current world and restoring the context of the new world such that the new world can begin executing where it left off the last time that it was running.

The first part of the world switch procedure that is carried out by the kernel is that the current world's state is saved in a data structure that is stored in the kernel's data area. Assuming the common case of an underlying Intel x86 architecture, the state that is saved will typically include: 1) the exception flags register; 2) general purpose registers; 3) segment registers; 4) the instruction pointer (EIP) register; 5) the local descriptor table register; 6) the task register; 7) debug registers; 8) control registers; 9) the interrupt descriptor table register; 10) the global descriptor table register; and 11) the floating point state. Similar state information will need to be saved in systems with other hardware architectures.
After the state of the current world is saved, the state of the new world can be restored. During the process of restoring the new world's state, no exceptions are allowed to take place because, if they did, the state of the new world would be inconsistent upon restoration of the state. The same state that was saved is therefore restored. The last step in the world switch procedure is restoring the new world's code segment and instruction pointer (EIP) registers.
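By way of illustration only (the patent gives no code), the saved hardware state enumerated as items 1-11 above could be captured in a structure like the following, and a world switch then amounts to saving into one instance and restoring from another. The field names and widths are hypothetical simplifications, not the patent's layout.

    #include <stdint.h>

    /* Hypothetical x86 state saved at a world switch (items 1-11 above; widths simplified). */
    struct hw_state {
        uint32_t eflags;          /* 1) exception/flags register             */
        uint32_t gpr[8];          /* 2) general purpose registers            */
        uint16_t seg[6];          /* 3) segment registers                    */
        uint32_t eip;             /* 4) instruction pointer                  */
        uint16_t ldtr, tr;        /* 5) LDT register, 6) task register       */
        uint32_t dr[8];           /* 7) debug registers                      */
        uint32_t cr[5];           /* 8) control registers                    */
        uint64_t idtr, gdtr;      /* 9) IDT register, 10) GDT register       */
        uint8_t  fpu[512];        /* 11) floating point state (FXSAVE area)  */
    };

    /* Conceptual world switch: save the current world's state, then restore the
     * new world's state; no exceptions may be taken while restoring. */
    void world_switch(struct hw_state *current, const struct hw_state *next);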
When worlds are initially created, the saved state area for the world is initialized to contain the proper information such that when the system switches to that world, then enough of its state is restored to enable the world to start running. The EIP is therefore set to the address of a special world start function. Thus, when a running world switches to a new world that has never run before, the act of restoring the EIP register will cause the world to begin executing in the world start function.

Switching from and to the COS world requires additional steps, which are described in U.S. patent application Ser. No. 09/877,378, mentioned above. Understanding this process is not necessary for understanding the present invention, however, so further discussion is omitted.
Memory Management in Kernel-Based System

The kernel 600 includes a memory management module 616 that manages all machine memory that is not allocated exclusively to the COS 420. When the kernel 600 is loaded, the information about the maximum amount of memory available on the machine is available to the kernel, as well as information about how much of it is being used by the COS. Part of the machine memory is used for the kernel 600 itself and the rest is used for the virtual machine worlds.

Virtual machine worlds use machine memory for two purposes. First, memory is used to back portions of each world's memory region, that is, to store code, data, stacks, etc., in the VMM page table. For example, the code and data for the VMM 300 is backed by machine memory allocated by the kernel 600. Second, memory is used for the guest memory of the virtual machine. The memory management module may include any algorithms for dynamically allocating memory among the different VMs 200.
Interrupt Handling in Kernel-Based System

The kernel 600 preferably also includes an interrupt handler 650 that intercepts and handles interrupts for all devices on the machine. This includes devices such as the mouse that are used exclusively by the COS. Depending on the type of device, the kernel 600 will either handle the interrupt itself or forward the interrupt to the COS.

Device Access in Kernel-Based System

In the preferred embodiment of the invention, the kernel 600 is responsible for providing access to all devices on the physical machine. In addition to other modules that the designer may choose to load into the kernel, the kernel will therefore typically include conventional drivers as needed to control access to devices. Accordingly, FIG. 1 shows within the kernel 600 a module 610 containing loadable kernel modules and drivers.

Kernel File System

In the ESX Server product of VMware, Inc., the kernel 600 includes a fast, simple file system, referred to here as the VM kernel file system (VMKFS), that has proven itself to be particularly efficient for storing virtual disks 240, which typically comprise a small number of large (at least 1 GB) files. By using very large file system blocks, the file system is able to keep the amount of metadata (that is, the data that indicates where data blocks are stored on disk) needed to access all of the data in a file to an arbitrarily small size. This allows all of the metadata to be cached in main memory so that all file system reads and writes can be done without any extra metadata reads or writes.

The VMKFS in ESX Server takes up only a single disk partition. When it is created, it sets aside space for the file system descriptor, space for file descriptor information, including the file name, space for block allocation information, and space for block pointer blocks. The vast majority of the partition's space is used for data blocks, whose size is set when the file system is created. The larger the partition size, the larger the block size should be in order to minimize the size of the metadata.

As mentioned earlier, the main advantage of the VMKFS is that it ensures that all metadata may be cached in high-speed, main system memory. This can be done by using large data block sizes, with small block pointers. Since virtual disks are usually at least one gigabyte in size, using large block sizes on the order of 64 Megabytes will cause virtually no wasted disk space and all metadata for the virtual disk can be cached simultaneously in system memory.

Besides being able to always keep file metadata cached in memory, the other key to high performance file I/O is to reduce the number of metadata updates. Note that the only reason why the VMKFS metadata will need to be updated is if a file is created or destroyed, or if it changes in size. Since these files are used primarily for virtual disks (or, for example, for copy-on-write redo logs), files are not often created or destroyed. Moreover, because virtual disks are usually fixed in size upon creation, the file size of a virtual disk does not usually change. In order to reduce the number of metadata updates on a virtual disk to zero, the system may therefore preallocate all data blocks for virtual disks when the file is created.
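To make the sizing argument above concrete, here is a small illustrative calculation, not taken from the patent; the 4-byte block-pointer size and the 100 GB disk size are assumptions chosen only to show why 64 MB blocks keep the per-disk metadata easily cacheable.

    #include <stdio.h>
    #include <stdint.h>

    /* Illustrative only: estimate the block-pointer metadata needed for one
     * preallocated virtual-disk file, assuming 4-byte block pointers. */
    int main(void)
    {
        const uint64_t disk_bytes  = 100ULL << 30;   /* 100 GB virtual disk        */
        const uint64_t block_bytes = 64ULL  << 20;   /* 64 MB file system blocks   */
        const uint64_t ptr_bytes   = 4;              /* assumed block pointer size */

        uint64_t blocks   = (disk_bytes + block_bytes - 1) / block_bytes;
        uint64_t metadata = blocks * ptr_bytes;      /* block pointers only        */

        printf("blocks: %llu, pointer metadata: %llu bytes\n",
               (unsigned long long)blocks, (unsigned long long)metadata);
        /* ~1600 blocks -> ~6.4 KB of pointers: trivially cacheable in RAM. */
        return 0;
    }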
Key VM Features

For the purposes of understanding the advantages of this invention, the salient points of the discussion above are:

1) each VM 200, ..., 200n has its own state and is an entity that can operate completely independently of other VMs;

2) the user of a VM, in particular, of an application running on the VM, will usually not be able to notice that the application is running on a VM (which is implemented wholly as software) as opposed to a "real" computer;

3) assuming that different VMs have the same configuration and state, the user will not know and would have no reason to care which VM he is currently using;

4) the entire state (including memory) of any VM is available to its respective VMM, and the entire state of any VM and of any VMM is available to the kernel 600;

5) as a consequence of the above facts, a VM is "relocatable."

Except for the network 700, the entire multi-VM system shown in FIG. 1 can be implemented in a single physical machine, such as a server. This is illustrated by the single functional boundary 1000. (Of course devices such as keyboards, monitors, etc., will also be included to allow users to access and use the system, possibly via the network 700; these are not shown merely for the sake of simplicity.)

In systems configured as in FIG. 1, the focus is on managing the resources of a single physical machine: Virtual machines are installed on a single hardware platform and the CPU(s), network, memory, and disk resources for that machine are managed by the kernel 600 or similar server software. This represents a limitation that is becoming increasingly undesirable and increasingly unnecessary. For example, if the server 1000 needs to be shut down for maintenance, then the VMs loaded in the server will become inaccessible and therefore useless to those who need them. Moreover, since the VMs must share the single physical memory space 130 and the cycles of the single (or single group of) CPU, these resources are substantially "zero-sum," such that particularly memory- or processor-intensive tasks may cause noticeably worse performance.

One way to overcome this problem would be to provide multiple servers, each with a set of VMs. Before shutting down one server, its VMs could be powered down or checkpointed and then restored on another server. The problem with this solution is that it still disrupts on-going VM use, and even a delay of ten seconds may be noticeable and irritating to users; delays on the order of minutes will normally be wholly unacceptable.

What is needed is a system that allows greater flexibility in the deployment and use of VMs, but with as little disruption to users as possible. This invention provides such a system, as well as a related method of operation.
SUMMARY OF THE INVENTION

In a networked system of computers (preferably, servers), including a source computer and a destination computer and a source virtual machine (VM) installed on the source computer, the invention provides a virtualization method and system according to which the source VM is migrated to a destination VM while the source VM is still powered on. Execution of the source VM is suspended, and then non-memory source VM state information is transferred to the destination VM; the destination VM is then resumed from the transferred non-memory source VM state.
Different methods are provided for transferring the source VM's memory to the destination VM. In the preferred embodiment of the invention, the destination VM may be resumed before transfer of the source VM memory is completed. One way to do this is to page in the source VM memory to the destination VM on demand. Following an alternative procedure, the source VM memory is pre-copied to the destination VM before the non-memory source VM state information is transferred.
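Purely as an illustration of the sequence just summarized (optionally pre-copy memory, suspend the source, transfer its non-memory state, resume the destination, then supply the remaining memory on demand or by retransferring modified pages), a control-flow sketch might look like the following. Every function and type here is a hypothetical placeholder, not an API from the patent.

    #include <stdbool.h>

    /* Hypothetical handles and helpers; none of these names come from the patent. */
    struct vm;
    void precopy_memory(struct vm *src, struct vm *dst);            /* optional pre-copy           */
    void suspend_vm(struct vm *src);
    void transfer_nonmemory_state(struct vm *src, struct vm *dst);  /* CPU, device, disk references */
    void resume_vm(struct vm *dst);
    void send_modified_pages(struct vm *src, struct vm *dst);       /* pages dirtied since pre-copy */
    void serve_pages_on_demand(struct vm *src, struct vm *dst);     /* fault-driven paging          */

    /* Migrate a running source VM to a destination VM without powering it down. */
    void migrate(struct vm *src, struct vm *dst, bool use_precopy)
    {
        if (use_precopy)
            precopy_memory(src, dst);        /* bulk of memory moves while the source still runs */

        suspend_vm(src);                     /* brief suspension of the source VM                */
        transfer_nonmemory_state(src, dst);  /* non-memory state goes across                     */
        resume_vm(dst);                      /* destination resumes from the transferred state   */

        if (use_precopy)
            send_modified_pages(src, dst);   /* only pages written since the pre-copy            */
        else
            serve_pages_on_demand(src, dst); /* remaining memory is paged in on demand           */
    }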
In one refinement of the invention, any units (such as pages) of the source VM memory that are modified (by the source VM or by any other component) during the interval between pre-copying and completing transfer of all pages are retransferred to the destination VM. Modification may be detected in different ways, preferably by write-protecting the source VM memory and then sensing page faults when the source VM attempts to write to any of the protected memory units. An iterative procedure for retransferring modified memory units is also disclosed.
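The write-protect-and-fault technique described in this refinement can be sketched as follows; the fault hook, page-protection calls, and page count are hypothetical stand-ins for whatever the kernel or VMM actually provides.

    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical primitives; the real mechanism would live in the kernel/VMM. */
    void write_protect_page(size_t ppn);
    void unprotect_page(size_t ppn);
    void send_page(size_t ppn);              /* copy one memory unit to the destination */

    #define NUM_PAGES (1u << 20)             /* example: 4 GB of 4 KB pages             */
    static bool dirty[NUM_PAGES];            /* one flag per source VM page             */

    /* Write-fault hook: a write to a protected page marks it dirty and lets it proceed. */
    void on_write_fault(size_t ppn)
    {
        dirty[ppn] = true;
        unprotect_page(ppn);
    }

    /* One retransfer pass; repeated iteratively until few or no pages remain dirty. */
    size_t retransfer_modified_pages(void)
    {
        size_t resent = 0;
        for (size_t ppn = 0; ppn < NUM_PAGES; ppn++) {
            if (dirty[ppn]) {
                dirty[ppn] = false;
                write_protect_page(ppn);     /* catch further writes for the next pass */
                send_page(ppn);
                resent++;
            }
        }
        return resent;                       /* caller loops while this stays large    */
    }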
In the preferred embodiment of the invention, the source VM's non-memory state information includes the contents of a source virtual disk. The contents of the source virtual disk are preferably stored in a storage arrangement shared by both the source and destination computers. The destination VM's virtual disk is then prepared by mapping the virtual disk of the destination VM to the same physical addresses as the source virtual disk in the shared storage arrangement.

In the most commonly anticipated implementation of the invention, communication between a user and the source and destination VMs takes place over a network. Network connectivity is preferably also made transparent to the user by arranging the servers on a common subnet, with virtual network connection addresses generated from a common name space of physical addresses.
BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates the main components of a server that includes one or more virtual machines running on a system-level kernel.

FIG. 2 illustrates a farm of interconnected servers according to the invention, with each server hosting a plurality of virtual machines.

FIG. 3 illustrates the steps the invention takes to migrate a virtual machine from a source to a destination.
DETAILED DESCRIPTION

In broadest terms, the invention provides a farm of servers, each of which may host one or more virtual machines (VMs), as well as mechanisms for migrating a VM from one server (the source server) to another (the destination server) while the VM is still running. There are many reasons why efficient, substantially transparent VM migration is beneficial. Load balancing is mentioned above, as is the possibility that a machine may need to be taken out of service for maintenance.

Another reason may be to add or remove resources from the server. This need not be related to the requirements of the hardware itself, but rather it may also be to meet the desires of a particular user/customer. For example, a particular user may request (and perhaps pay for) more memory, more CPU time, etc., all of which may necessitate migration of his VM to a different server.

The general configuration of the server farm according to the invention is illustrated in FIG. 2, in which a plurality of users 900, 902, 904, ..., 906 access a farm of servers 1000, 1002, ..., 1004 via the network 700. Each of the servers is preferably configured as the server 1000 shown in FIG. 1, and will include at least one and possibly many VMs. Thus, server 1000 is shown in both FIG. 1 and FIG. 2.

In a server farm, all of the resources of all of the machines in the farm can be aggregated into one common resource pool. From the perspective of a user, the farm will appear to be one big machine with lots of resources. As is described below, the invention provides a mechanism for managing the set of resources so that they are utilized as efficiently and as reliably as possible.

Advantages of VM Migration

The ability to quickly migrate VMs while they are running between individual nodes of the server farm has several advantages, among which are the following:

1) It allows the load to be balanced across all nodes in the cluster. If one node is out of resources while other nodes have free resources, then VMs can be moved around between nodes to balance the load.

2) It allows individual nodes of the cluster to be shut down for maintenance without requiring that the VMs that are running on the node be shut down: Instead of shutting the VMs down, the VMs can simply be migrated to other machines in the cluster.

3) It allows the immediate utilization of new nodes as they are added to the cluster. Currently running VMs can be migrated from machines that are over ut
