(19) United States
(12) Patent Application Publication    (10) Pub. No.: US 2005/0132250 A1
     Hansen et al.                     (43) Pub. Date: Jun. 16, 2005

US 20050132250A1

(54) PERSISTENT MEMORY DEVICE FOR BACKUP PROCESS CHECKPOINT STATES

(75) Inventors: Roger Hansen, San Francisco, CA (US); Pankaj Mehra, San Jose, CA (US); Sam Fineberg, Palo Alto, CA (US)

Correspondence Address:
HEWLETT PACKARD COMPANY
P O BOX 272400, 3404 E. HARMONY ROAD
INTELLECTUAL PROPERTY ADMINISTRATION
FORT COLLINS, CO 80527-2400 (US)

(73) Assignee: Hewlett-Packard Development Company, L.P., Houston, TX

(21) Appl. No.: 10/737,374

(22) Filed: Dec. 16, 2003

Publication Classification

(51) Int. Cl.7 ..................................................... G06F 11/00
(52) U.S. Cl. .............................................................. 714/5

(57) ABSTRACT
A system is described that includes a network interface attached to a persistent memory unit. The persistent memory unit is configured to receive checkpoint data from a primary process, and to provide access to the checkpoint data for use in a backup process, which provides recovery capability in the event of a failure of the primary process. The network interface is configured to provide address translation information between virtual and physical addresses in the persistent memory unit. In other embodiments, the persistent memory unit is capable of storing multiple updates to the checkpoint state. The checkpoint state and the updates to the checkpoint state, if any, can be retrieved by the backup process periodically, or all at once upon failure of the primary process.
[Representative drawing: processor nodes running a primary process and a backup process are coupled through network interfaces (NI) to a persistent memory unit that holds a checkpoint state and a series of checkpoint updates, managed by a persistent memory manager (PMM).]
[Drawing Sheet 1 of 5 — FIG. 1A: block diagram of a system with a network-attached persistent memory unit (NPMU) 102 storing a checkpoint state and address translation and protection tables, accessed through network interfaces (NI) by processor nodes running a primary process and a backup process, each with an operating system and access rights and connection contexts, and managed by a PMM. FIG. 1B: diagram of the primary process writing, and the backup process reading, the checkpoint state in persistent memory.]
[Drawing Sheet 2 of 5 — FIG. 1C: block diagram of a system in which the NPMU stores a checkpoint state 120 and multiple checkpoint update areas. FIG. 1D: diagram of the primary process appending (writing) checkpoint data and the backup process reading the checkpoint state and updates.]
[Drawing Sheet 3 of 5 — FIG. 2: block diagram of an NPMU built from non-volatile memory coupled to a network interface. FIG. 3: block diagram of an NPMU using volatile memory with a battery, a processor, and non-volatile secondary storage.]
[Drawing Sheet 4 of 5 — FIG. 4: mapping from persistent memory (PM) virtual addresses (base address through base address + N) to discontiguous PM physical addresses.]
[Drawing Sheet 5 of 5 — FIG. 5: block diagram of an illustrative computer system (processor, memory, operating system, application programs, and input/output ports) on which an NPMU can be implemented.]
PERSISTENT MEMORY DEVICE FOR BACKUP PROCESS CHECKPOINT STATES

BACKGROUND
[0001] Failure of a computer, as well as of application programs executing on a computer, can often result in the loss of significant amounts of data and intermediate calculations. The cause of failure can be either hardware or software related, but in either instance the consequences can be expensive, particularly when data manipulations are interrupted in mid-stream. In the case of large software applications, a failure might require an extensive effort to regenerate the status of the application's state prior to the failure.

[0002] Generally, checkpoint and restoration techniques periodically save the process state during normal execution, and thereafter restore the saved state to a backup process following a failure. In this manner, the amount of lost work is minimized to the progress made by the application process since the restored checkpoint.

[0003] Traditionally, computers have stored the checkpoint data either in system memory coupled to the computer's processor, or on other input/output (I/O) storage devices such as magnetic tape or disk. I/O storage devices can be attached to a system through an I/O bus such as a PCI bus (originally named Peripheral Component Interconnect), or through a network such as Fiber Channel, Infiniband, ServerNet, or Ethernet. I/O storage devices are typically slow, with access times of more than one millisecond. They utilize special I/O protocols such as the small computer systems interface (SCSI) protocol or the transmission control protocol/internet protocol (TCP/IP), and typically operate as block exchange devices (e.g., data is read or written in fixed-size blocks). A feature of these types of storage I/O devices is that they are persistent, such that when they lose power or are re-started they retain the information previously stored on them. In addition, I/O storage devices can be accessed from multiple processors through shared I/O networks, even after some processors have failed.

[0004] As used herein, the term "persistent" refers to a computer memory storage device that can withstand a power reset without loss of the contents in memory. Persistent memory devices have been used to store data for starting or restarting software applications. In simple systems, persistent memory devices are static and not modified as the software executes. The initial state of the software environment is stored in persistent memory. In the event of a power failure to the computer or some other failure, the software restarts its execution from the initial state. One problem with this approach is that all intermediate calculations will have to be recomputed. This can be particularly onerous if large amounts of user data must be reloaded during this process. If some or all of the user data is no longer available, it may not be possible to reconstruct the pre-failure state.
[0005] System memory is generally connected to a processor through a system bus, where such memory is relatively fast with guaranteed access times measured in tens of nanoseconds. Moreover, system memory can be directly accessed with byte-level granularity. System memory, however, is normally volatile, such that its contents are lost if power is lost or if a system embodying such memory is restarted. Also, system memory is usually within the same fault domain as a processor, such that if a processor fails, the attached memory also fails and may no longer be accessed. Metadata, which describes the layout of memory, is also lost when power is lost or when the system embodying such memory is restarted.
[0006] Prior art systems have used battery-backed dynamic random access memory (BBDRAM), solid-state disks, and network-attached volatile memory. Prior BBDRAM, for example, may have some performance advantages over true persistent memory. It is not, however, globally accessible. Moreover, BBDRAM that lies within the same fault domain as an attached CPU will be rendered inaccessible in the event of a CPU failure or operating system crash. Accordingly, BBDRAM is often used in situations where all system memory is persistent so that the system may be restarted quickly after a power failure or reboot. BBDRAM is still volatile during long power outages, such that alternate means must be provided to store its contents before the batteries drain. Importantly, this use of BBDRAM is very restrictive and not amenable for use in network-attached persistent memory applications, for example.
[0007] Battery-backed solid-state disks (BBSSD) have also been proposed for other implementations. These BBSSDs provide persistent memory, but functionally they emulate a disk drive. An important disadvantage of this approach is the additional latency associated with access to these devices through I/O adapters. This latency is inherent in the block-oriented and file-oriented storage models used by disks and, in turn, BBSSDs, which do not bypass the host computer's operating system. While it is possible to modify solid-state disks to eliminate some shortcomings, the inherent latency cannot be eliminated because performance is limited by the I/O protocols and their associated device drivers. As with BBDRAM, additional technologies are required for providing the checkpoint state of an application program in a failed domain to a backup copy of the application program running in an operational domain.
SUMMARY

[0008] In some embodiments, a system includes a network interface attached to a persistent memory unit. The persistent memory unit is configured to receive checkpoint data from a primary process, and to provide access to the checkpoint data for use in a backup process to support recovery capability in the event of a failure of the primary process. The network interface is configured to provide address translation information between virtual and physical addresses in the persistent memory unit. In other embodiments, the persistent memory unit is capable of storing multiple updates to the checkpoint state. The checkpoint state and the updates to the checkpoint state, if any, can be retrieved by the backup process periodically, or all at once upon failure of the primary process.
[0009] In yet other embodiments, a method for recovering the operational state of a primary process includes mapping virtual addresses of a persistent memory unit to physical addresses of the persistent memory unit, and receiving checkpoint data regarding the operational state of the primary process in the persistent memory unit. In some embodiments, the checkpoint data is provided to a backup process. In still other embodiments, the context information regarding the addresses is provided to the primary process and the backup process.
[0010] In other embodiments, the persistent memory unit provides the checkpoint data to the backup process when the primary process fails. Alternatively, in still other embodiments, the persistent memory unit can be configured to store multiple sets of checkpoint data sent from the processor at successive time intervals, or to provide the multiple sets of checkpoint data to the backup process at one time.

[0011] These and other embodiments will be understood by one of ordinary skill in the art to which the present disclosure pertains.
BRIEF DESCRIPTION OF THE DRAWINGS

[0012] The accompanying drawings, which are incorporated in and form a part of this specification, illustrate embodiments of the invention and, together with the description, serve to explain its principles:
[0013] FIG. 1A is a block diagram of an embodiment of a system that includes a network-attached persistent memory unit (NPMU) capable of storing checkpoint state information;

[0014] FIG. 1B is a diagram of an embodiment of one method for accessing checkpoint state information from the NPMU of FIG. 1A;

[0015] FIG. 1C is a block diagram of an embodiment of a system that includes a network-attached persistent memory unit (NPMU) capable of storing multiple sets of checkpoint state information;

[0016] FIG. 1D is a diagram of an embodiment of another method for accessing checkpoint state information from the NPMU of FIG. 1C;

[0017] FIG. 2 is a block diagram of an embodiment of a network-attached persistent memory unit (NPMU);

[0018] FIG. 3 is a block diagram of an embodiment of a network-attached persistent memory unit (NPMU) using battery backup;

[0019] FIG. 4 is a block diagram illustrating mappings from a persistent memory virtual address space to a persistent memory physical address space; and

[0020] FIG. 5 is a block diagram of an illustrative computer system on which a network-attached persistent memory unit (NPMU) can be implemented.
DETAILED DESCRIPTION

[0021] Whereas prior art systems have used persistent memory only in the context of block-oriented and file-oriented I/O architectures with their relatively large latencies, the present teachings describe memory that is persistent like traditional I/O storage devices, but that can be accessed like system memory with fine granularity and low latency. Systems according to the present teachings allow application programs to store one or more checkpoint states, which can be accessed by a backup copy of the application in the event of a hardware or software failure that prevents the primary application program from executing.
[0022] As shown in FIG. 1A, a system 100 using network-attached persistent memory includes a network-attached persistent memory unit (NPMU) 102 that can be accessed by one or more processor nodes 104, 106 through corresponding network interfaces (NI) 108, 110 and a system area network (SAN) 112, such as a Remote Direct Memory Access (RDMA)-enabled SAN. RDMA can be implemented as a feature of NIs 108, 110, and 114 to enable processor nodes 104, 106 to directly store and retrieve information in the memory of NPMU 102. Transferring data directly to or from NPMU 102 eliminates the need to copy data between memory in processor nodes 104, 106 and kernel I/O processes in operating systems 144, 146. RDMA capability thus reduces the number of context switches between primary process 116 and backup process 122, and operating systems 144, 146, while handling memory transfers via SAN 112.
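The direct, kernel-bypassing data path described in paragraph [0022] can be illustrated with a short sketch. The sketch below stands in for an RDMA write over the SAN by mapping a file that simulates the NPMU region directly into the primary process's address space and storing checkpoint bytes into it; the file name, region size, and helper details are illustrative assumptions rather than anything specified in this disclosure, and a real deployment would issue the write through an RDMA-capable network interface instead of a local mapping.

```c
/* Minimal sketch, assuming the NPMU region is simulated by a local file
 * mapped into the process. A real system would issue RDMA writes through
 * an RDMA-enabled NI rather than a local mmap(). */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define NPMU_REGION_SIZE 4096   /* illustrative size of checkpoint state 120 */

int main(void) {
    /* "npmu_region.bin" is a hypothetical stand-in for NPMU memory. */
    int fd = open("npmu_region.bin", O_RDWR | O_CREAT, 0600);
    if (fd < 0 || ftruncate(fd, NPMU_REGION_SIZE) != 0) {
        perror("open/ftruncate");
        return 1;
    }

    /* Map the (simulated) persistent region directly into user space, so
     * stores reach it without an application-level copy through kernel
     * I/O buffers. */
    char *region = mmap(NULL, NPMU_REGION_SIZE, PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, 0);
    if (region == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* Primary process writes its checkpoint state with an ordinary store. */
    const char checkpoint[] = "application state at time t";
    memcpy(region, checkpoint, sizeof checkpoint);

    /* Force the data to the backing store (analogous to waiting for the
     * remote write to complete at the NPMU). */
    msync(region, NPMU_REGION_SIZE, MS_SYNC);

    munmap(region, NPMU_REGION_SIZE);
    close(fd);
    return 0;
}
```

In this sketch the backup process would map the same region and read from it after a failure of the primary, in line with the access pattern of FIG. 1B.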
[0023] SAN 112 accesses NPMU 102 via network interface (NI) 114. NPMU 102 combines the durability and recoverability of storage I/O with the speed and fine-grained access of system memory. Like storage, the contents of NPMU 102 can survive the loss of power or system restart. Like remote memory, NPMU 102 can be accessed across SAN 112. However, unlike directly-connected memory, NPMU 102 can continue to be accessed even after one or more processor nodes 104, 106 have failed.
[0024] Primary process 116 running on processor node 104 can initiate remote commands, for example, a write command to send data for checkpoint state 120 in NPMU 102. Primary process 116 can also provide data for checkpoint state 120 periodically. Backup process 122 running on processor node 106 is configured to perform the functions of primary process 116 in the event of a failure of primary process 116. Backup process 122 can also initiate remote read and write operations to NPMU 102, such as a read command to access checkpoint state 120 periodically and/or upon failure of primary process 116.
[0025] In a write operation initiated by processor node 104, for example, once data has been successfully stored in NPMU 102, the data is durable and will survive a power outage or failure of processor node 104, 106. In particular, memory contents will be maintained as long as NPMU 102 continues to function correctly, even after the power has been disconnected for an extended period of time, or the operating system on processor node 104, 106 has been rebooted. In addition to data transfer operations, NPMU 102 can be configured to respond to various management commands.
[0026] In some embodiments, processor nodes 104, 106 are computer systems that include at least one central processing unit (CPU) and system memory, wherein the CPU is configured to run operating systems 144, 146. Processor nodes 104, 106 can additionally be configured to run one or more application programs of any type, such as primary process 116 and backup process 122. Although system 100 is shown with two processor nodes 104, 106, additional processor nodes (not shown) can communicate with SAN 112 as well as with processor nodes 104, 106 over a network (not shown) via network interfaces 108, 110, 114.
[0027] In some embodiments, SAN 112 is an RDMA-enabled network connecting multiple network interface units (NI), such as NIs 108, 110, and 114, that can perform byte-level memory operations between two processor nodes 104, 106, or between processor nodes 104, 106 and a device such as NPMU 102, without notifying operating systems 144, 146. In this case, SAN 112 is configured to perform virtual-to-physical address translation to map contiguous network virtual address spaces onto discontiguous physical address spaces. This type of address translation allows for dynamic management of NPMU 102. Commercially available SANs 112 with RDMA capability include, but are not limited to, ServerNet, GigaNet, Infiniband, and all Virtual Interface Architecture compliant SANs.
[0028] Processor nodes 104, 106 are generally attached to SAN 112 through respective NIs 108, 110; however, many variations are possible. More generally, a processor node need only be connected to an apparatus for communicating read and write operations. For example, in another implementation of this embodiment, processor nodes 104, 106 include various CPUs on a motherboard that utilize a data bus, for example a PCI bus, instead of SAN 112. It is noted that the present teachings can be scaled up or down to accommodate larger or smaller implementations as needed.
[0029] Network interfaces (NI) 108, 110, 114 are communicatively coupled to NPMU 102 to allow for access to the persistent memory contained within NPMU 102. Any suitable technology can be utilized for the various components of FIG. 1A, including the type of persistent memory. Accordingly, the embodiment of FIG. 1A is not limited to a specific technology for realizing the persistent memory. Indeed, multiple memory technologies, including magnetic random access memory (MRAM), magneto-resistive random access memory (MRRAM), polymer ferroelectric random access memory (PFRAM), ovonics unified memory (OUM), BBDRAM, and FLASH memories of all kinds, may be appropriate. System 100 can be configured to allow high-granularity memory access, including byte-level memory access, compared to BBSSDs, which transfer entire blocks of information.
[0030] Notably, memory access granularity can be adjusted as required in system 100. The access speed of memory in NPMU 102 should also be fast enough to support the transfer rates of the data communication scheme implemented for system 100.
[0031] It should be noted that persistence is provided only to the extent that the persistent memory in use can hold data. For example, in many applications, persistent memory may be required to store data regardless of the amount of time power is lost, whereas in another application, persistent memory may only be required for a few minutes or hours.
[0032] Memory management functionality can be provided in system 100 to create one or more independent, indirectly-addressed memory regions. Moreover, NPMU meta-data can be provided for memory recovery after loss of power or processor failure. Meta-data can include, for example, the contents and layout of the protected memory regions within NPMU 102. In this way, NPMU 102 stores the data as well as the manner of using the data. When the need arises, NPMU 102 can provide the meta-data to backup process 122 to allow system 100 to recover from a power or system failure associated with primary process 116.
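A sketch of what such self-describing meta-data might look like is shown below. The record layout (region name, base offset, length, and a count of checkpoint update areas) is an assumption made for illustration; the disclosure only states that meta-data describes the contents and layout of the protected memory regions.

```c
/* Minimal sketch of NPMU meta-data, assuming a simple region table.
 * Field names and sizes are illustrative, not taken from the disclosure. */
#include <stdint.h>
#include <stdio.h>

struct npmu_region_meta {
    char     name[32];        /* e.g., "checkpoint_state_120"                */
    uint64_t base_offset;     /* where the region starts in PM physical space */
    uint64_t length;          /* size of the region in bytes                  */
    uint32_t update_areas;    /* number of checkpoint update areas that follow */
};

int main(void) {
    struct npmu_region_meta meta = {
        .name = "checkpoint_state_120",
        .base_offset = 0,
        .length = 64 * 1024,
        .update_areas = 3,
    };

    /* The meta-data itself would live in the NPMU alongside the data, so
     * that a backup process can interpret the regions after a failure. */
    printf("region %s: offset=%llu len=%llu updates=%u\n",
           meta.name,
           (unsigned long long)meta.base_offset,
           (unsigned long long)meta.length,
           meta.update_areas);
    return 0;
}
```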
[0033] In the embodiment of system 100 shown in FIG. 1A, each update to checkpoint state 120 can overwrite some or all of the information currently stored for checkpoint state 120. Since there is only one copy of checkpoint state 120, backup process 122 can remain idle until primary process 116 fails, and then read checkpoint state 120 to continue the functions that were being performed by primary process 116. FIG. 1B is a diagram of an embodiment of a method for accessing checkpoint state 120 from NPMU 102. As shown, primary process 116 writes data to the beginning address of checkpoint state 120, and backup process 122 reads data from the beginning of checkpoint state 120. In such an embodiment, only one copy of checkpoint state 120 needs to be maintained.
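The single-copy, overwrite-in-place scheme of FIG. 1B can be reduced to two small helpers, sketched below against an in-memory buffer that stands in for checkpoint state 120. The buffer, sizes, and function names are assumptions for illustration only.

```c
/* Minimal sketch of the FIG. 1B scheme: the primary always writes at the
 * beginning of the checkpoint region, overwriting the previous copy, and
 * the backup reads from the beginning. The static buffer stands in for
 * checkpoint state 120 in the NPMU; names are illustrative. */
#include <stdio.h>
#include <string.h>

#define CKPT_REGION_SIZE 4096

static char checkpoint_region[CKPT_REGION_SIZE];   /* stand-in for state 120 */

/* Primary process: overwrite the single checkpoint copy from offset 0. */
static void checkpoint_write(const void *state, size_t len) {
    if (len > CKPT_REGION_SIZE)
        len = CKPT_REGION_SIZE;
    memcpy(checkpoint_region, state, len);
}

/* Backup process: read the single checkpoint copy from offset 0. */
static size_t checkpoint_read(void *out, size_t len) {
    if (len > CKPT_REGION_SIZE)
        len = CKPT_REGION_SIZE;
    memcpy(out, checkpoint_region, len);
    return len;
}

int main(void) {
    const char state[] = "counter=42";
    checkpoint_write(state, sizeof state);

    char recovered[CKPT_REGION_SIZE];
    checkpoint_read(recovered, sizeof state);
    printf("backup recovered: %s\n", recovered);
    return 0;
}
```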
[0034] FIG. 1C shows another embodiment of NPMU 102 configured with multiple checkpoint update areas 128-132 associated with checkpoint state 120. Checkpoint state 120 can include a full backup state for primary process 116. Each update to checkpoint state 120 can be appended to the previously written information, thereby creating a series of updates to checkpoint state 120 in non-overlapping update areas 128-132 of NPMU 102. When checkpoint state 120 is relatively large, update areas 128-132 provide a benefit of eliminating the need for primary process 116 to write a complete checkpoint state 120 each time information is checkpointed.

[0035] For example, primary process 116 may read in large blocks of data during initialization, and update various segments of the data at different phases of operation. The initial checkpoint state 120 can include a backup of all the data, while update areas 128-132 can be used to store smaller segments of the data as the segments are updated. Backup process 122 can then initialize itself with checkpoint state 120, and apply data from the subsequent update areas 128-132 in the order they were written. Further, backup process 122 does not have to wait until primary process 116 fails to begin initializing itself with data from checkpoint state 120 and update areas 128-132. This is especially true when there is potential to overflow the amount of storage space available for checkpoint state 120 and update areas 128-132. This is also true when it would take a greater amount of time than desired for backup process 122 to recreate the state of primary process 116 after primary process 116 fails.
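Paragraphs [0034] and [0035] describe a base checkpoint plus a log of non-overlapping update areas that the backup replays in write order. The sketch below models that with a base image and an append-only list of (offset, length, payload) records held in ordinary buffers that stand in for the NPMU; the record format, sizes, and names are assumptions chosen for illustration, since the disclosure does not specify an update-record layout.

```c
/* Minimal sketch of base-checkpoint-plus-updates recovery, assuming a
 * simple (offset, length, payload) record format for update areas 128-132.
 * The arrays below stand in for NPMU regions; all names are illustrative. */
#include <stdio.h>
#include <string.h>

#define STATE_SIZE   64          /* size of the full checkpoint image      */
#define MAX_UPDATES  8           /* number of update areas available       */
#define MAX_PAYLOAD  32          /* payload capacity of one update record  */

struct update_record {
    size_t offset;               /* where in the state this update applies */
    size_t length;               /* number of payload bytes                */
    char   payload[MAX_PAYLOAD];
};

static char base_checkpoint[STATE_SIZE];             /* checkpoint state 120 */
static struct update_record updates[MAX_UPDATES];    /* update areas 128-132 */
static size_t update_count;

/* Primary process: write the full state once, then append small updates. */
static void write_base(const char *state) {
    memcpy(base_checkpoint, state, STATE_SIZE);
}

static int append_update(size_t offset, const char *data, size_t length) {
    if (update_count >= MAX_UPDATES || length > MAX_PAYLOAD ||
        offset + length > STATE_SIZE)
        return -1;                      /* update areas exhausted or invalid */
    updates[update_count].offset = offset;
    updates[update_count].length = length;
    memcpy(updates[update_count].payload, data, length);
    update_count++;
    return 0;
}

/* Backup process: start from the base image and replay updates in order. */
static void recover(char *out) {
    memcpy(out, base_checkpoint, STATE_SIZE);
    for (size_t i = 0; i < update_count; i++)
        memcpy(out + updates[i].offset, updates[i].payload, updates[i].length);
}

int main(void) {
    char initial[STATE_SIZE] = "AAAAAAAAAAAAAAAAAAAAAAAA";
    write_base(initial);

    append_update(0, "BB", 2);          /* two incremental checkpoints */
    append_update(4, "CC", 2);

    char recovered[STATE_SIZE];
    recover(recovered);
    printf("recovered state: %.24s\n", recovered);  /* BBAACCAAAA...       */
    return 0;
}
```

The append-only layout also makes it clear why the backup can consume updates lazily: nothing is lost as long as it eventually replays every unread record before taking over.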
[0036] FIG. 1D is a diagram of an embodiment of a method for accessing checkpoint state 120 from NPMU 102 of FIG. 1C. As shown, primary process 116 appends data to the address of checkpoint state 120 and update areas (not shown) following the address where data was last written to NPMU 102. Backup process 122 reads data from the beginning to the end of the last area that was written by primary process 116. In such embodiments, facilities are provided that supply backup process 122 with the starting and ending location of the most current update to checkpoint state 120, as further discussed herein.

[0037] Whether backup process 122 reads checkpoint state 120 and update areas 128-132 periodically, or when primary process 116 fails, backup process 122 can read any previously unread portion of checkpoint state 120 and update areas 128-132 before taking over for primary process 116.
[0038] Utilizing NPMU 102 allows primary process 116 to store checkpoint state 120 regardless of the identity, location, or operational state of backup process 122. Backup process 122 can be created in any remote system that has access to NPMU 102. Primary process 116 can write checkpoint state 120 and/or update areas 128-132 whenever required, without waiting for backup process 122 to acknowledge receipt of messages. Additionally, NPMU 102 allows efficient use of available information technology (IT) resources, since backup process 122 only needs to execute either (1) when primary process 116 fails, or (2) periodically, to read information from checkpoint state 120 and/or update areas 128-132 to avoid overflowing NPMU 102. In contrast, some previously known checkpointing techniques utilize message passing between a primary process and a backup process to communicate checkpoint information. The primary process thus required information regarding the identity and location of the backup process. Additionally, the backup process had to be operational in previously known systems in order to synchronize with the primary process to receive the checkpoint message.
[0039] Further, NPMU 102 can be implemented in hardware, thereby providing fast access for read and write operations. Other previously known checkpointing techniques store checkpoint information on magnetic or optical media, which requires much more time to access than NPMU 102.
[0040] FIG. 2 shows an embodiment of NPMU 102 that uses non-volatile memory 202 communicatively coupled to NI 114 via a communications link 206. Non-volatile memory 202 can be, for example, MRAM or Flash memory. NI 114 typically does not initiate its own requests, but instead NI 114 receives management commands from SAN 112 via communication link 210 and carries out the requested management operations. Specifically, NPMU 200 can translate incoming requests and then carry out the requested operation. Further details on command processing will be discussed below. Communication links 206, 210 can be configured for wired and/or wireless communication. SAN 112 can be any suitable communication and processing infrastructure between NI 114 and other nodes, such as processor nodes 104, 106 in FIG. 1A. For example, SAN 112 can be a local area network and/or a wide area network such as the Internet.
[0041] FIG. 3 shows another embodiment of NPMU 102 using a combination of volatile memory 302 with battery 304 and a non-volatile secondary store 310. In this embodiment, when power fails, the data within volatile memory 302 is preserved using the power of battery 304 until such data can be saved to non-volatile secondary store 310. The non-volatile secondary store can be, for example, a magnetic disk or slow FLASH memory. The transfer of data from volatile memory 302 to non-volatile secondary store 310 can occur without external intervention or any further power other than from battery 304. Accordingly, any required tasks are typically completed before battery 304 completely discharges. As shown, NPMU 102 includes an optional CPU 306 running an embedded operating system. Accordingly, the backup task (i.e., the data transfer from volatile memory 302 to non-volatile secondary store 310) can be performed by software running on CPU 306. NI 114 initiates requests under the control of software running on CPU 306. CPU 306 can receive management commands from the network and carry out the requested management operation.
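The battery-backed variant in paragraph [0041] hinges on one operation: on power failure, copy the volatile contents to the non-volatile secondary store before the battery drains. The sketch below models that flush with a buffer standing in for volatile memory 302 and a local file standing in for secondary store 310; the power-fail trigger is simulated by calling the handler directly, and all names and sizes are illustrative assumptions.

```c
/* Minimal sketch of the battery-backed flush in FIG. 3, assuming volatile
 * memory 302 is modeled by a buffer and secondary store 310 by a file.
 * A real NPMU would trigger this from a power-fail event while running on
 * battery power; here the handler is simply called from main(). */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define VOLATILE_SIZE 4096

static char volatile_memory[VOLATILE_SIZE];   /* stand-in for memory 302 */

/* Copy everything held in volatile memory to the non-volatile secondary
 * store and force it to media before battery power is exhausted. */
static int flush_to_secondary_store(const char *path) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    ssize_t written = write(fd, volatile_memory, VOLATILE_SIZE);
    if (written != VOLATILE_SIZE || fsync(fd) != 0) {  /* ensure durability */
        close(fd);
        return -1;
    }
    return close(fd);
}

int main(void) {
    /* Some checkpoint data currently resident only in volatile memory. */
    strcpy(volatile_memory, "checkpoint state held in volatile memory 302");

    /* Simulated power-fail event: the embedded CPU runs the backup task. */
    if (flush_to_secondary_store("secondary_store_310.bin") != 0) {
        perror("flush_to_secondary_store");
        return 1;
    }
    puts("volatile contents saved to secondary store");
    return 0;
}
```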
[0042] Various embodiments of NPMU 102 can be managed to facilitate resource allocation and sharing. In some embodiments, NPMU 102 is managed by a persistent memory manager (PMM) 140, as shown in FIG. 1A. PMM 140 can be located internal or external to NPMU 102. When PMM 140 is internal to NPMU 102, processor nodes 104, 106 can communicate with PMM 140 via SAN 112 and network interface (NI) 114 to perform requested management tasks, such as allocating or de-allocating regions of persistent memory of NPMU 102, or to use an existing region of persistent memory. When PMM 140 is external to NPMU 102, processor nodes 104, 106 can issue requests to NPMU 102, and NPMU 102 can interface with PMM 140 via NI 114, SAN 112, and NI 141 associated with PMM 140. As a further alternative, processor nodes 104, 106 can communicate directly with PMM 140 via NIs 108, 110, respectively, and SAN 112 and NI 141. PMM 140 can then issue the appropriate commands to NPMU 102 to perform requested management tasks.
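The management path in paragraph [0042] amounts to a small command set exchanged among processor nodes, the PMM, and the NPMU. The sketch below proposes one possible shape for those commands (allocate, de-allocate, open an existing region) together with a toy dispatcher; the command set, fields, and function names are hypothetical, since the disclosure does not define a concrete message format.

```c
/* Minimal sketch of PMM management commands, assuming a hypothetical
 * command set with allocate / de-allocate / open-existing operations.
 * Nothing here is a defined interface; names are illustrative. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

enum pmm_command_type {
    PMM_CMD_ALLOCATE,      /* create a new persistent memory region   */
    PMM_CMD_DEALLOCATE,    /* release a region                        */
    PMM_CMD_OPEN_EXISTING  /* begin using an already allocated region */
};

struct pmm_command {
    enum pmm_command_type type;
    char     region_name[32];   /* e.g., "checkpoint_state_120"        */
    uint64_t length;            /* requested size for PMM_CMD_ALLOCATE */
};

struct pmm_reply {
    int      status;            /* 0 on success                        */
    uint64_t pm_virtual_base;   /* PM virtual address of the region    */
};

/* Toy PMM dispatcher: in a real system this would program the NPMU's NI
 * and update the NPMU meta-data; here it only fabricates a reply. */
static struct pmm_reply pmm_handle(const struct pmm_command *cmd) {
    struct pmm_reply reply = { .status = 0, .pm_virtual_base = 0x1000 };
    if (cmd->type == PMM_CMD_DEALLOCATE)
        reply.pm_virtual_base = 0;
    return reply;
}

int main(void) {
    struct pmm_command cmd = { .type = PMM_CMD_ALLOCATE, .length = 64 * 1024 };
    strcpy(cmd.region_name, "checkpoint_state_120");

    struct pmm_reply reply = pmm_handle(&cmd);
    printf("allocate status=%d base=0x%llx\n",
           reply.status, (unsigned long long)reply.pm_virtual_base);
    return 0;
}
```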
[0043] Note that because NPMU 102 can be durable, and can maintain a self-describing body of persistent data, meta-data related to existing persistent memory regions can be stored on NPMU 102. PMM 140 can perform management tasks that will keep the meta-data on NPMU 102 consistent with the persistent data stored on NPMU 102. In this manner, the NPMU's stored data can always be interpreted using the NPMU's stored meta-data and thereby recovered after a possible system shutdown or failure. NPMU 102 thus maintains in a persistent manner not only the data being manipulated but also the state of the processing of such data. Upon a need for recovery, system 100 using an NPMU 102 is thus able to recover and continue operation from the memory state in which a power failure or operating system crash occurred.
[0044] As described with reference to FIG. 1A, SAN 112 provides basic memory management and virtual memory support. In such an implementation, PMM 140 can program the logic in NI 114 to enable remote read and write operations, while simultaneously protecting the persistent memory from unauthorized or inadvertent accesses by all except a select set of entities on SAN 112. Moreover, as shown in FIG. 4, NPMU 102 can support virtual-to-physical address translation. For example, a continuous virtual address space such as persistent memory (PM) virtual addresses 402-416 can be mapped or translated to discontinuous persistent memory physical addresses 418-448. PM virtual addresses can be referenced relative to a base address through N incremental addresses. Such PM virtual addresses, however, can also correspond to discontiguous PM physical addresses.
[0045] As shown, PM virtual address 402 can actually correspond to PM physical address 436, and so on. Accordingly, NPMU 102 can provide the appropriate translation from the PM virtual address space to the PM physical address space and vice versa. In this way, the translation mechanism allows NPMU 102 to present contiguous virtual address ranges to processor nodes 104, 106, while still allowing dynamic management of the NPMU's physical memory. This can be important because of the persistent nature of the data on an NPMU 102. Due to configuration changes, the number of processes accessing a particular NPMU 102, or possibly the sizes of their respective allocations, may change over time. The address translation mechanism allows NPMU 102 to readily accommodate such changes without loss of data. The address translation mechanism further allows easy and efficient use of persistent memory capacity by neither forcing the processor nodes 104, 106 to anticipate future memory needs in advance of allocation nor forcing the processor nodes 104, 106 to waste persistent memory capacity through pessimistic allocation.
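The translation described in paragraphs [0044] and [0045] and pictured in FIG. 4 is essentially a page-granularity lookup from a contiguous PM virtual range onto scattered PM physical pages. The sketch below implements such a lookup; the page size, table contents, and function names are assumptions chosen for illustration rather than values taken from the disclosure.

```c
/* Minimal sketch of PM virtual-to-physical translation at page granularity,
 * in the spirit of FIG. 4. Page size and table contents are illustrative. */
#include <stdint.h>
#include <stdio.h>

#define PM_PAGE_SIZE    4096u
#define PM_VIRT_BASE    0x10000u        /* base address of the virtual range */
#define PM_NUM_PAGES    4u

/* Each entry maps one contiguous virtual page to a (possibly discontiguous)
 * physical page number within the NPMU. */
static const uint32_t page_table[PM_NUM_PAGES] = { 9, 2, 14, 5 };

/* Translate a PM virtual address into a PM physical address, or return
 * UINT64_MAX if the address falls outside the mapped range. */
static uint64_t pm_translate(uint64_t vaddr) {
    if (vaddr < PM_VIRT_BASE)
        return UINT64_MAX;
    uint64_t offset = vaddr - PM_VIRT_BASE;
    uint64_t page   = offset / PM_PAGE_SIZE;
    if (page >= PM_NUM_PAGES)
        return UINT64_MAX;
    return (uint64_t)page_table[page] * PM_PAGE_SIZE + offset % PM_PAGE_SIZE;
}

int main(void) {
    /* A contiguous virtual range maps onto scattered physical pages. */
    for (uint64_t v = PM_VIRT_BASE; v < PM_VIRT_BASE + 3 * PM_PAGE_SIZE;
         v += PM_PAGE_SIZE) {
        printf("PM virtual 0x%llx -> PM physical 0x%llx\n",
               (unsigned long long)v, (unsigned long long)pm_translate(v));
    }
    return 0;
}
```

Because the table can be rewritten without moving data, the physical pages behind a region can be grown or rearranged while the virtual range handed to processor nodes stays contiguous, which is the property paragraph [0045] emphasizes.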
[0046] With reference again to FIG. 1A, a ServerNet SAN operating in its native access validation and translation/block transfer engine (AVT/BTE) mode is an example of a single address space SAN 112. Each target on such a SAN presents the same, flat network virtual address space to all components that issue requests to SAN 112, such as processor nodes 104, 106. Network virtual address ranges can be mapped by the target from PM virtual addresses to PM physical address ranges with page granularity. Network PM virtual address ranges can be exclusively allocated to a single initiator (e.g., processor node 104), and multiple PM virtual addresses can point to the same physical page.
[0047] When processor node 104 requests PMM 140 to open (i.e., allocate and then begin to use) a region of persistent memory in NPMU 102, the NPMU's NI 114 can be programmed by PMM 140 to allow processor node 104 to access the appropriate region. This programming allocates a block of network virtual addresses and maps (i.e., translates) them to a set of physical pages in physical memory. The range of PM virtual addresses can be contiguous regardless of how many pages of PM physical addresses are to be accessed. The physical pages can, however, be anywhere within the PM physical memory. Upon successful set-up of the translation, NPMU 102 can notify the requesting processor node 104 of the PM virtual address of the contiguous block. Once open, processor node 104 can access NPMU memory pages by issuing read or write operations to NPMU 102. NPMU 102 can also notify subseque
