Gold et al.

US006785786B1

(10) Patent No.: US 6,785,786 B1
(45) Date of Patent: Aug. 31, 2004
`(54) DATA BACKUP AND RECOVERY SYSTEMS
`
`(75)
`
`Inventors: Stephen Gold, Bristol (GB); Jon
`Bathie, Bristol (GB); Peter King,
`Bristol (GB); Ian Peter Crighton,
`Bristol (GB)
`
(73) Assignee: Hewlett-Packard Development Company, L.P., Houston, TX (US)
`
(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.
`
(21) Appl. No.: 09/486,439

(22) PCT Filed: Aug. 28, 1998

(86) PCT No.: PCT/GB98/02603
     § 371 (c)(1), (2), (4) Date: Feb. 28, 2000

(87) PCT Pub. No.: WO99/12098
     PCT Pub. Date: Mar. 11, 1999
`
`(30)
`
`Foreign Application Priority Data
`
`Aug. 29, 1997
`Aug. 29, 1997
`
`(EP) ............................................ 97306628
`(EP) ............................................ 97306629
`
(51) Int. Cl.7 ................................................ G06F 12/16
(52) U.S. Cl. ................................ 711/162; 714/6; 714/4
(58) Field of Search ................... 711/162, 104, 711/114, 161; 714/6
`
`(56)
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`5,644,698 A * 7/1997 Cannon ......................... 714/6
`
`5,649,152 A * 7/1997 Ohran eta!. ................ 711!114
`5,673,381 A * 9/1997 Huai eta!. ..................... 714/1
`5,673,382 A * 9/1997 Cannon eta!. ................ 714/6
`5,758,359 A * 5/1998 Saxon ........................ 707/204
`5,799,147 A * 8/1998 Shannon ........................ 714/6
`5,966,730 A * 10/1999 Zulch ......................... 711!162
`6,035,412 A * 3/2000 Tamer eta!. .................. 714/6
`6,148,412 A * 11/2000 Cannon eta!. ................ 714/6
`
FOREIGN PATENT DOCUMENTS

EP  0 410 630 A2   1/1991
EP  0 541 281 A2   5/1993
EP  0 769 741 A1   4/1997

* cited by examiner
`
`Primary Examiner-Matthew Kim
`Assistant Examiner-Matthew D. Anderson
`
`(57)
`
`ABSTRACT
`
Clients are connected via a LAN to backup apparatus including hard disk and tape drives, which act respectively as primary and secondary backup stores. Each client schedules backup operations on the basis of the time since the last backup, the amount of information generated since the last backup, etc. Prior to a backup, each client sends to the backup apparatus a request including information representing the files to be backed up. The backup apparatus receives backup requests from the clients and accepts or rejects them based on backup and network loading. The backup apparatus eliminates redundant files and indicates to the clients, prior to the file backup, that some files requested to be backed up are already stored and are not to be sent. The backup apparatus enables a client to restore any of its 'lost' data by copying it directly from the disk, without the need for backup administrator assistance.
`
`29 Claims, 9 Drawing Sheets
`
[Representative drawing (FIG. 3): client 210a with its backup agent 215a (dynamic scheduler 310, active file manager 320, file differencing 330, block differencing 340, FDA 345, FDA log 347, data transfer 350, restore 360), client file system 322, client operating system 312, local backup configuration file 312 and DTF cache 332, communicating with the backup apparatus 240.]
`
`
`
[FIG. 1 (Prior Art), drawing sheet 1 of 9: conventional networked environment 100; clients 110a, 110b, ... 110n and servers 130a ... 130m connected by LAN 120, with tape drive 140 attached to server 130a.]
`
`
`
[FIG. 2, drawing sheet 2 of 9: networked environment 200; clients 210a, 210b, ... 210n, each running a backup agent 215a, and servers 130m and 230a, each with a backup agent, connected by LAN 220 to backup apparatus 240 comprising tape drive 242 and hard disk 244.]
`
`
`
[FIG. 3, drawing sheet 3 of 9: client 210a and its backup agent 215a (dynamic scheduler 310, local backup configuration file 312, active file manager 320, client file system 322, file differencing 330, DTF cache 332, block differencing 340, SFP 342, FDA 345, FDA log 347, data transfer 350, restore 360), client operating system 312, and the backup apparatus 240.]
`
`
`
`405
`
`•
`\Jl
`d •
`
`~
`
`~
`
`J
`
`Pointer!
`
`File2
`
`J
`Directory Tree File
`
`Filet
`
`Pointer2
`Pointer!
`
`l
`
`l
`
`410
`
`Figure 4
`
`I
`Backup Directory Filet
`
`Filet
`
`J
`
`Chunk2a
`Chunkla
`
`File2
`
`400
`
`~
`
`It,
`Chunk3a
`Chunk2a
`~ Chunkla
`I
`
`J
`Backup Directory File2
`
`Chunklb
`
`Filet
`
`~.
`
`r--1---_ l
`
`Apple Inc. Exhibit 1009 Page 5
`
`
`
[FIG. 5, drawing sheet 5 of 9: backup apparatus 240 with hard disk 244 and tape drive 242; dynamic scheduler 500, priority file 502, backup config file 504, redundant file elimination 510, RFE index 512, backup storage 520 (baseline 524, delta 526, merge 528), merge control 530, tape backup 540 and disaster recovery 550, serving client 210a.]
`
`
`
[FIG. 6, drawing sheet 6 of 9: flow diagram of a backup operation between the client side and the backup server side; schedule backup 600, request slot 605, request granted? 610, file differencing / build new file list 615, transmit new file list 620, receive list and calculate signatures 623, intermediate steps 625-648, transmit data for backup 650, 655.]
`
`
`
`~
`c
`N c
`'"""' ~
`~
`
`> = ~
`
`= ......
`~
`~ ......
`~
`•
`\Jl
`d •
`
`I nl Signature I
`
`1 2131415 61718191
`
`Date I Time
`
`Filename
`
`File 5
`
`Pointer
`
`Signature
`
`Date I Time
`
`Filename
`
`6 Signature /
`
`Date I Time
`
`Filename
`
`File 4
`Figure 7d File 2
`
`3 7/ Signature J
`
`Date I Time
`
`Filename
`
`File 1
`
`Signature I Pointer I
`
`Filename
`
`File 4
`
`Figure 7c
`
`CRC
`
`CRC
`
`File 5
`Figure 7b File4
`
`New (Unique)
`
`n Signature
`
`1 2 3 4 5 6 7 8 9
`
`Date I Time
`
`Filename
`
`n Signature New ( Common )
`
`1 2 3 4 5 6 7 8 9
`
`Date I Time
`
`Filename
`
`UnModified
`
`n Signature
`
`1 2 3 4 5 6 7 8 9
`
`Date /Time
`
`Filename
`
`Modified
`
`n Signature
`
`1 2 3 4 5 6 7 8 9
`
`Date I Time
`
`Filename
`
`n Signature Modified
`
`1 2 3 4 5 6 7 8 9
`
`Date I Time
`
`Filename
`
`File 5
`
`File 4
`
`File 3
`
`File 2
`
`File 1
`
`Figure 7a
`
`Apple Inc. Exhibit 1009 Page 8
`
`
`
[FIG. 8, drawing sheet 8 of 9: redundant file elimination index (RFE index 512); per-file entries (file, owning client 110a/110b/110n, Size, CRC, Signature) referencing the backed up data 242 for Files 1-5 and Files A-C.]
`
`
`
`.~ 0
`~HEADS IQE
`
`930
`
`v
`
`Figure 9
`
`'i
`
`A
`
`925
`R/W
`
`v
`
`tt
`
`'i
`
`A
`
`v
`J... HARD DISK
`
`920
`
`905 v
`I/F
`l;t.
`
`r
`--
`
`II.
`
`'i
`
`A
`
`~
`
`....
`
`.... ~
`
`.... ~
`
`v MECHANISM 935
`tt
`
`SYSTEM BUS 912
`
`MEMORY
`
`MAIN
`
`::.
`..:.
`913
`
`•
`\Jl
`d •
`
`915
`ROM
`
`r
`
`tt
`
`APPARATUS 900
`TAPE BACKUP
`
`J
`
`r--
`
`..:.
`
`'i
`
`A
`
`CONTROLLER
`
`910
`
`Apple Inc. Exhibit 1009 Page 10
`
`
`
`DATA BACKUP AND RECOVERY SYSTEMS
`
`TECHNICAL FIELD
`
`The present invention relates to computer data backup and
`recovery and particularly, but not exclusively, to apparatus,
`methods and systems for enacting data backup and recovery.
`
`BACKGROUND ART
`
`5
`
`25
`
`2
`and to restore to said clients data from the primary
`storage means; and
`to backup to the tape storage means, in accordance with
`pre-defined criteria, at least some of the data stored in
`the primary storage means and to restore to the primary
`storage means, in accordance with a respective restore
`message received from a client, at least some data
`stored in the tape storage means.
`Preferably, the controller means is programmed to main-
`10 tain stored on the primary storage means at least the most
`current version of all data received from the clients. In this
`way, a data restore operation can be enacted using data
`stored in primary storage, without needing to find and install
`any particular backup tape.
`Preferably, the controller means is programmed to backup
`data stored in the primary storage means to the tape storage
`means independently of any messages from the clients.
`In preferred embodiments, the primary storage means
`comprises a random access storage means such as a hard
`20 disk drive. Alternatively, the primary storage means com(cid:173)
`prises non-volatile random access memory (NV-RAM). For
`the latter case, however, the applicants believe that currently
`NV-RAM would be very prohibitively expensive compared
`to a hard disk drive.
`In preferred embodiments, the tape storage apparatus
`comprises a housing configured specifically to house the
`controller, the interface means, the primary storage means
`and the tape storage means. Thus, the tape storage apparatus
`provides a dedicated and integrated solution to data storage
`and recovery. In other, less preferred embodiments, the
`components of the apparatus may be distributed, for
`example, with some of the components residing on or in
`other apparatus, such as a computer.
`In accordance with a second aspect, the present invention
`provides a method of backing up to a data backup and
`restore apparatus attached to a network data stored in one or
`more clients also attached to the network, the method
`comprising the data backup and restore apparatus storing in
`primary data storage a most recent version of all data
`received from the clients and, from time to time, in accor(cid:173)
`dance with pre-determined criteria, storing in secondary data
`storage at least some of the data stored in the primary data
`storage.
`In accordance with a third aspect, the present invention
`provides a data storage system comprising:
`a network;
`tape storage apparatus; and
`at least one client connected to the apparatus, the (or the
`at least one) client comprising client storage means and
`client processing means, the client processing means being
`programmed in accordance with pre-determined criteria to
`determine when data stored in the client storage means
`should be backed up to the tape backup apparatus.
`Other aspects and embodiments of the invention will
`55 become apparent from the following description and claims.
`
`Typically computer networks in use today have the basic
`topology shown in the diagram in FIG. 1. In FIG. 1, several
`client workstations are connected via a network to several
`servers.
`Users perform work on the client workstations that can 15
`communicate via a network link with the servers. Clients
`typically store unique user data whereas servers typically
`provide a common central point for some function such as
`shared hard disk storage, data backup storage, software
`applications or printer control.
`There are a number of current schemes used for backing
`up data on the network. A first scheme is for each client to
`individually store data to a device such as a tape drive. Tape
`storage has become the preferred method of backing up
`information on computers due to its relatively low cost, high
`capacity, durability and portability. This first scheme
`requires each person, or each administrator, responsible for
`a client to be responsible for backing up the data. Also, each
`client requires its own backup device. A second scheme is
`for each client to store important data on a remote file server,
`where the server data is backed up at regular intervals, for
`example on a daily basis. Thus, if a client fails, potentially
`only less important information is lost. A third scheme,
`which is available, for example, to systems which operate
`under Windows NT4 and are part of an NT Domain, is that
`all 'specified' information on a client is backed up, typically
`overnight, to a tape drive connected to an NT server.
`Known backup schemes offer relatively good protection
`for client data, but with a very high administration overhead. 40
`For example, if data needs to be recovered by a client, for
`example as a result of one or more files being 'lost' or
`destroyed, then the client's owner typically needs to contact
`an administrator of the backup system and request a restore
`procedure. Typically, a restore procedure involves the 45
`backup system administrator tracking down and mounting a
`respective tape, on which the last backup of the lost file(s)
`was made, and initiating the restore procedure. While such
`a procedure is typically very reliable, it can be perceived as
`onerous on client users and backup system administrators 50
`alike.
`
`30
`
`35
`
`DISCLOSURE OF THE INVENTION
`
In accordance with a first aspect, the present invention provides tape storage apparatus, comprising:
interface means for connecting the apparatus to one or more clients;
controller means for controlling the apparatus and for processing messages received from the one or more clients;
primary storage means; and
tape storage means, wherein the controller is programmed:
to process backup and restore messages received from the one or more clients respectively to backup to the primary storage means data received from the clients and to restore to said clients data from the primary storage means; and
to backup to the tape storage means, in accordance with pre-defined criteria, at least some of the data stored in the primary storage means and to restore to the primary storage means, in accordance with a respective restore message received from a client, at least some data stored in the tape storage means.

Preferably, the controller means is programmed to maintain stored on the primary storage means at least the most current version of all data received from the clients. In this way, a data restore operation can be enacted using data stored in primary storage, without needing to find and install any particular backup tape.

Preferably, the controller means is programmed to backup data stored in the primary storage means to the tape storage means independently of any messages from the clients.

In preferred embodiments, the primary storage means comprises a random access storage means such as a hard disk drive. Alternatively, the primary storage means comprises non-volatile random access memory (NV-RAM). For the latter case, however, the applicants believe that NV-RAM would currently be prohibitively expensive compared to a hard disk drive.

In preferred embodiments, the tape storage apparatus comprises a housing configured specifically to house the controller, the interface means, the primary storage means and the tape storage means. Thus, the tape storage apparatus provides a dedicated and integrated solution to data storage and recovery. In other, less preferred embodiments, the components of the apparatus may be distributed, for example, with some of the components residing on or in other apparatus, such as a computer.

In accordance with a second aspect, the present invention provides a method of backing up, to a data backup and restore apparatus attached to a network, data stored in one or more clients also attached to the network, the method comprising the data backup and restore apparatus storing in primary data storage a most recent version of all data received from the clients and, from time to time, in accordance with pre-determined criteria, storing in secondary data storage at least some of the data stored in the primary data storage.

In accordance with a third aspect, the present invention provides a data storage system comprising:
a network;
tape storage apparatus; and
at least one client connected to the apparatus, the (or the at least one) client comprising client storage means and client processing means, the client processing means being programmed in accordance with pre-determined criteria to determine when data stored in the client storage means should be backed up to the tape backup apparatus.

Other aspects and embodiments of the invention will become apparent from the following description and claims.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`An embodiment of the present invention will now be
`described in more detail, by way of example only, with
`reference to the following drawings, of which:
`FIG. 1 is a diagram of a conventional computer network;
`FIG. 2 is a diagram of a computer network modified to
operate in accordance with the present exemplary embodiment;
`FIG. 3 is a diagram illustrating the main functional
`features of a client according to the present exemplary
`embodiment;
`
FIG. 4 is a diagram which illustrates the main file structures on the tape backup apparatus to facilitate a backup operation according to the present exemplary embodiment;

FIG. 5 is a diagram illustrating the main functional features of a backup apparatus according to the present exemplary embodiment;

FIG. 6 is a flow diagram representing a backup operation according to the present exemplary embodiment;

FIG. 7 is a diagram which illustrates file information gathered during the procedure described in relation to FIG. 6;

FIG. 8 is a diagram which represents a redundant file elimination index; and

FIG. 9 is a block diagram of backup apparatus according to an embodiment of the present invention.

BEST MODE FOR CARRYING OUT THE INVENTION, & INDUSTRIAL APPLICABILITY
`
`4
`server 130a. The backup software schedules the backup to
`happen at, for example, 11:00 pm each night, and controls
`the transfer of data from the file server 130a to the backup
`apparatus 900.
`FIG. 2 is a diagram, which illustrates a networked com-
`puter environment 200 modified for operation according to
`the present embodiment. As shown, a plurality of clients
`(designated 210a, 210b, etc) are connected to a computer
`network 220. In this case also, the network is a LAN (local
`10 area network), such as an Ethernet, which supports the
`TCP liP data communications protocol. A number of servers
`(designated 230a, 230b, etc) are connected to the LAN 220.
`The servers, as above, may be file servers, print servers,
`email servers, or any combination thereof. Also connected to
`15 the LAN 220 is a backup apparatus 240 according to the
`present embodiment. The backup apparatus 240 is shown to
`include a hard disk drive 244 in addition to a tape drive 242.
`The operation of the backup apparatus 240 will be described
`in more detail below.
`In general terms, servers can be thought of as more
`powerful clients, or clients with large amounts of disk
`storage. Thus, for the sake of convenience and unless
`otherwise stated, the term "client" when used hereafter shall
`be taken to mean either a server or a client in a network
`environment. Further, for ease of description only, the term
`"client" shall also be taken to include any device or appa(cid:173)
`ratus which stores data locally which can be backed up
`remotely.
`Further, unless otherwise indicated, only one client sys-
`30 tern 210a will be considered, although it will be understood
`that the other clients operate in the same manner.
`The client 210a includes a backup agent 215a, which
`comprises one or more software routines. The main func(cid:173)
`tional modules of the backup agent 215a are illustrated in the
`35 diagram in FIG. 3. Each module comprises one or more
`software routines, written for example in the C++ program(cid:173)
`ming language, which control the client 210a to process data
`and communicate with the backup apparatus 240 as
`described in detail below. The software routines are stored
`on a hard disk drive device (not shown) in the client and are
`loaded into main random access memory (RAM) when they
`are required to operate, and a central processor in the client
`processes the instructions to control the client to operate in
`accordance with the present embodiment. The lines illus-
`45 trated interconnecting the various modules and diagram
`blocks represent communications channels which are open
`some or all of the time, depending on requirement. The
`client 210a is a general purpose computer, such as a PC
`running the Windows NT 4.0 operating system.
`50 Dynamic Scheduler Module
`In FIG. 3, a dynamic scheduler 310 is a module respon(cid:173)
`sible for dynamically initiating a backup cycle from the
`client 210a, based on time since the last backup and the local
`system resources. A local backup configuration file 312
`55 contains details on a general network-wide policy set by the
`network administrator and a local user-defined policy, in
`terms of a target time delay before data is protected. For
`example, one default policy would be to attempt a backup
`once an hour. The dynamic scheduler 310 is a background
`60 process which runs permanently on the client 210a.
`After the target time delay (e.g. 1-hour) has passed, the
`scheduler 310 assesses the local client system 210a
`resources to ensure that the backup can run without seriously
`impacting the system performance. If the local client system
`65 210a is heavily loaded, for example at 95% capacity, the
`scheduler 310 will retry a short period later (e.g. after 5
`minutes) and continue retrying until the system has enough
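The arithmetic behind these figures can be reproduced in a few lines. This is an illustrative check only, using the client and server counts and the approximate 1 Mbyte/s 10-Base-T throughput quoted above:

```python
# Worked check of the example above: total data to protect, and the time a
# full backup would take over a network moving roughly 1 Mbyte per second.
clients, gb_per_client = 10, 2
servers, gb_per_server = 2, 20

total_gbytes = clients * gb_per_client + servers * gb_per_server   # 60 Gbytes
seconds = total_gbytes * 1000 / 1.0                                 # at ~1 Mbyte/s
print(total_gbytes, "Gbytes,", round(seconds / 3600, 1), "hours")   # 60 Gbytes, 16.7 hours
```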
`
The embodiments described herein aim to address at least some of these issues.

FIG. 1 is a diagram which illustrates a general prior art networked computer environment 100. As shown, a plurality of computer systems, or clients (designated 110a, 110b, etc), are connected to a computer network 120. In this example, the network is a LAN (local area network), such as an Ethernet, which supports the TCP/IP data communications protocol. Also connected to the LAN are a number of servers (designated 130a, 130b, etc). The servers may be, for example, file servers, print servers, email servers, or any combination thereof. In the diagram, a tape drive 140 is illustrated connected to the server 130a, where the server 130a is a file server. The data stored on the file server 130a, in on-line storage (where on-line indicates that the storage is accessible at a given instance) such as a hard disk drive, is backed up to the tape drive 140 on a daily basis, typically to a new tape each day. Tape media is termed off-line storage, since when a tape is removed it is no longer accessible. A backup operation is scheduled and controlled by backup software, which runs as a background process on the file server 130a. The backup software schedules the backup to happen at, for example, 11:00 pm each night, and controls the transfer of data from the file server 130a to the backup apparatus 900.

FIG. 2 is a diagram which illustrates a networked computer environment 200 modified for operation according to the present embodiment. As shown, a plurality of clients (designated 210a, 210b, etc) are connected to a computer network 220. In this case also, the network is a LAN (local area network), such as an Ethernet, which supports the TCP/IP data communications protocol. A number of servers (designated 230a, 230b, etc) are connected to the LAN 220. The servers, as above, may be file servers, print servers, email servers, or any combination thereof. Also connected to the LAN 220 is a backup apparatus 240 according to the present embodiment. The backup apparatus 240 is shown to include a hard disk drive 244 in addition to a tape drive 242. The operation of the backup apparatus 240 will be described in more detail below.

In general terms, servers can be thought of as more powerful clients, or clients with large amounts of disk storage. Thus, for the sake of convenience and unless otherwise stated, the term "client" when used hereafter shall be taken to mean either a server or a client in a network environment. Further, for ease of description only, the term "client" shall also be taken to include any device or apparatus which stores data locally and which can be backed up remotely.

Further, unless otherwise indicated, only one client system 210a will be considered, although it will be understood that the other clients operate in the same manner.

The client 210a includes a backup agent 215a, which comprises one or more software routines. The main functional modules of the backup agent 215a are illustrated in the diagram in FIG. 3. Each module comprises one or more software routines, written for example in the C++ programming language, which control the client 210a to process data and communicate with the backup apparatus 240 as described in detail below. The software routines are stored on a hard disk drive device (not shown) in the client and are loaded into main random access memory (RAM) when they are required to operate, and a central processor in the client processes the instructions to control the client to operate in accordance with the present embodiment. The lines illustrated interconnecting the various modules and diagram blocks represent communications channels which are open some or all of the time, depending on requirement. The client 210a is a general purpose computer, such as a PC running the Windows NT 4.0 operating system.

Dynamic Scheduler Module

In FIG. 3, a dynamic scheduler 310 is a module responsible for dynamically initiating a backup cycle from the client 210a, based on the time since the last backup and the local system resources. A local backup configuration file 312 contains details of a general network-wide policy set by the network administrator and a local user-defined policy, in terms of a target time delay before data is protected. For example, one default policy would be to attempt a backup once an hour. The dynamic scheduler 310 is a background process which runs permanently on the client 210a.

After the target time delay (e.g. 1 hour) has passed, the scheduler 310 assesses the local client system 210a resources to ensure that the backup can run without seriously impacting system performance. If the local client system 210a is heavily loaded, for example at 95% capacity, the scheduler 310 will retry a short period later (e.g. after 5 minutes) and continue retrying until the system has enough
`40
`
`Apple Inc. Exhibit 1009 Page 12
`
`
`
`US 6,785,786 Bl
`
`5
`
`15
`
`5
free resources to run the backup. There is an upper time limit to the retry, for example 30 minutes, after which time a backup is forced irrespective of system loading. The upper time limit for retrying is another general policy variable stored locally in the backup configuration file 312.

Once the local client system resources allow, the scheduler 310 communicates with the backup apparatus 240 to request a backup slot. The backup apparatus 240 will allow the backup job to start if the network bandwidth can support it. If there are already other backups running from other clients, which are using up all of the network bandwidth allocated to backups, the backup apparatus 240 communicates with the client 210a to refuse the request. As a result, the scheduler 310 returns to the retry cycle and waits until the backup apparatus 240 gives permission to progress.

Because the backup agent 215a dynamically initiates the backup jobs, the overall network backup scheme is configuration independent. Thus, a client can power-save, or a portable client can be detached from the network, and on return the backups can continue seamlessly. Thus, there is no requirement for the backup apparatus 240 to have any 'knowledge' of which clients are connected, or are not connected, to the network at any time.

The dynamic scheduler 310 also has responsibility for initiating the operation of the other modules when required.
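The scheduling behaviour just described can be summarised in a short sketch. It is illustrative only and not code from the patent: the object and policy names (request_backup_slot, run_backup_cycle, the policy keys) are hypothetical, and the timing values are the examples given in the text (1 hour target delay, 5 minute retries, 30 minute forced-backup limit).

```python
import time

def dynamic_scheduler_loop(policy, client, backup_apparatus):
    """Background loop sketching the behaviour of the dynamic scheduler 310."""
    while True:
        time.sleep(policy["target_delay_s"])            # e.g. attempt a backup once an hour
        waited = 0
        # Retry while the client is heavily loaded, but force the backup once the
        # upper time limit (e.g. 30 minutes) has been reached.
        while client.load() > policy["max_load"] and waited < policy["force_after_s"]:
            time.sleep(policy["retry_s"])                # e.g. retry after 5 minutes
            waited += policy["retry_s"]
        # Ask the backup apparatus 240 for a slot; it refuses the request when other
        # clients are already using the network bandwidth allocated to backups.
        while not backup_apparatus.request_backup_slot(client):
            time.sleep(policy["retry_s"])                # return to the retry cycle
        client.run_backup_cycle()                        # initiate the other agent modules
```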
`Active File Manager Module
An active file manager module (AFM) 320 monitors which files are to be opened by the backup agent 215a for the purpose of backing up. Before a file is opened, the AFM 320 checks the client's file system 322 to see if the file is already in use by another program running on the client 210a. If the file is already in use, the AFM 320 waits until the file is in a "safe" state for the backup agent 215a to back it up. Only when the file is "safe" does the AFM 320 allow the backup agent 215a to open the file. A file can be "safe" even if the file is locked and in use. If the file is written to during the backup operation, the AFM 320 automatically preserves the data to be backed up by sending the respective blocks directly to the backup apparatus 240 over the network. The backup apparatus 240 can then manage or re-order these out-of-order backup blocks, thus preserving the original state of the file from when the backup started.

For example, consider a database of customer addresses (not shown), which is being backed up. During the backup, a user changes one of the entries in a part of the database that has not yet been backed up. The AFM 320 immediately sends the old address to the backup server, and when the backup reaches this point it skips the updated address in the database. This method means that when a database application thinks it has written data to disk, it has indeed been written to disk, and not cached somewhere else. Thus, there is no possibility of data loss or corruption if the server 240 were to crash during the backup.

The AFM 320 can be user-configured to determine when a file is "safe" to back up. For example, the AFM 320 can use a write inactivity period to decide this, which would be one of the general policy values stored in the backup configuration file 312. In order to ensure that a backup copy of a file does not contain a partial transaction, the AFM 320 monitors the period of time that passes without a write taking place. For example, if the time is set to 5 seconds, the file is not "safe" until there is a 5 second period when there are no writes active, and at this point the file can be backed up. There is also a value for the period of time after which the AFM 320 gives up trying to find a safe state. For example, if this time is set to 60 seconds, then the AFM 320 will try for one minute to find a 5 second period with no writes.
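The write-inactivity test can be pictured with the following sketch. It is illustrative only; last_write_time is a hypothetical callback standing in for the AFM's monitoring of writes, and the 5 second and 60 second values are the examples from the text.

```python
import time

def wait_for_safe_state(last_write_time, inactivity_s=5.0, give_up_s=60.0):
    """Return True once the file has seen no writes for `inactivity_s` seconds,
    or False if no such quiet period is found within `give_up_s` seconds.
    `last_write_time()` returns the time of the most recent write to the file."""
    give_up_at = time.monotonic() + give_up_s
    while time.monotonic() < give_up_at:
        if time.monotonic() - last_write_time() >= inactivity_s:
            return True               # file is "safe": the agent may open and back it up
        time.sleep(0.5)               # poll again shortly
    return False                      # gave up trying to find a safe state
```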
`
`6
`Some applications, notably databases, operate a number
`of files simultaneously (e.g. data files and index files) and to
`assure the overall integrity of such files they must be
`configured as a "group". A group defines a number of files
`that must be backed up from a collective "safe" state, and
`only when every file in the group is simultaneously in a
`"safe" state can each file be backed up. This grouping is
`performed automatically by the AFM 320 when it detects
`one of the major database types (e.g. Exchange, Notes, SQL
`Server, and Oracle). Further, the AFM 320 may be config(cid:173)
`ured to treat user-defined list of files as groups, with the
`group definitions being stored in the backup configuration
`file 312.
`File Differencing Module
`A file differencing module (FDM) 330 is a module in the
`backup agent 215a that selects the files to be backed up by
`determining which files have changed or been added since
`the last backup. The module achieves this by reading the
`current directory tree of the local file system 322 and
`checking each file's modified time/date against the entries in
`a cached Directory Tree File (DTF) 332 generated from the
`last backup. Modified files will have different times and
`dates, and new files will have no corresponding entry.
`Modified files are marked as "Modified" and new files are
`marked as "New". Note that for the first backup after
`installation all files will be new files.
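As a concrete illustration of the comparison against the cached DTF, the sketch below models the DTF as a mapping from file path to the modified time recorded at the last backup; that representation is an assumption for illustration, not the patent's file format.

```python
import os

def mark_changed_files(root, dtf_cache):
    """Walk the current directory tree and mark files against the cached DTF.
    `dtf_cache` maps path -> modified time recorded at the last backup."""
    marks = {}
    for dirpath, _subdirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mtime = os.path.getmtime(path)
            if path not in dtf_cache:
                marks[path] = "New"          # no corresponding entry from the last backup
            elif mtime != dtf_cache[path]:
                marks[path] = "Modified"     # time/date differs from the DTF entry
            # files with an unchanged entry are not selected for backup
    return marks
```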
`Before the list of modified or new files is further
`processed, the list is filtered for excluded files (such as
`temporary files, Internet cache files, swap files, etc). The
policy for excluding files is held in the local backup configuration file 312, and is generated from a general network
`policy set by the network administrator and also from a
`user-defined set of excluded files or directories.
`The next stage is to determine which of the new files are
`already held on the backup apparatus 240, and are thus
`redundant. For example, if there has already been a backup
`of a Windows95 workstation, then subsequent backups will
`determine that the Windows95 operating system files are
`redundant. The FDM 330 first sends the list of the selected
`new files to the backup apparatus 240. The list contains for
`each file a 32-bit CRC code (calculated for the respective
`name, date/time stamp and file size information). The
`backup apparatus 240 returns a list of the files that match its
`existing backup file contents, and for each file it also returns
`a signature (in this case a 32-bit CRC checksum calculated
`over the actual file data) and an indication of the location of
`the file on the backup server. For each of the potentially
redundant files in the list, the FDM 330 generates a respective signature value and compares it with the value returned
`by the backup apparatus 240. Where the signatures match,
`the file marking is changed from "New" to "Redundant".
`Thus, the output of the FDM 330 is a list of all files which
`are new or modified since the last backup, marked as:
`"Redundant", copy already held on backup server;
`"New", new file, thus no need for block differencing; or
`"Modified", use block differencing to determine which
`blocks changed.
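The exchange that turns "New" files into "Redundant" ones can be sketched as follows. This is illustrative only: zlib.crc32 stands in for the 32-bit CRC codes and data signatures described above, and query_backup_apparatus is a hypothetical stand-in for the request/response with the backup apparatus 240.

```python
import os
import zlib

def identity_crc(path):
    """32-bit CRC over the file's name, date/time stamp and size."""
    st = os.stat(path)
    return zlib.crc32(f"{os.path.basename(path)}|{st.st_mtime}|{st.st_size}".encode())

def data_signature(path):
    """32-bit CRC checksum calculated over the actual file data."""
    crc = 0
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(64 * 1024), b""):
            crc = zlib.crc32(block, crc)
    return crc

def eliminate_redundant(marks, query_backup_apparatus):
    """Re-mark "New" files as "Redundant" when the backup apparatus already holds them."""
    new_files = {identity_crc(p): p for p, m in marks.items() if m == "New"}
    # The apparatus answers with the data signature it holds for each matching CRC.
    for crc, remote_signature in query_backup_apparatus(list(new_files)).items():
        path = new_files[crc]
        if data_signature(path) == remote_signature:
            marks[path] = "Redundant"        # copy already held on the backup server
    return marks
```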
`As well as files, the FDM 330 identifies any modifications
`to the system information used to rebuild the system in
`disaster recovery. This covers areas such as NetWare NDS
`partitions, disk partition information, file system types and
details (e.g. compressed), bootstrap partitions (e.g. MBR,
`NetWare DOS partition).
`Block Differencing
`A block-differencing module (BDM) 340 determines
`which blocks in each file have changed since the last backup.
`The process of identifying the changed portions (deltas) of
`
`files is performed by two basic processes. The first process
`is a sliding fingerprint (SFP) process 342. In general, a
`fingerprinting process is a probabilistic algorithm, where the
`probability of a failure is made far less than the probability
`of an undetected failure of the underlying storage and
`communication media (for further, detailed information on
`fingerprinting, the reader is referred to the book by Richard
`Karp and Michael Rabin, "Efficient randomised pattern
`matching algorithms", Harvard University Centre for
`Research in Computing Technology, TR-31-81, December
`1981). The second process involves active detection of
`writes to regions of files; this technique requires a process
`called a file delta accelerator (FDA) process 345. The FDA
`process 345 is a background process which operates all the
`time to monitor the client's operating system 312 write calls
and maintain a log 347 of which logical regions of which
files have been modified.
The FDA process 345 is more efficient for files that are updated in place (e.g. databases), while the SFP process 342 is far more efficient for document files that are entirely (or largely) rewritten with each update, although only a small portion of the file may have been modified. As will be described, the present embodiment makes use of a combination of an SFP process 342 and an FDA process 345. As each modified file is opened for backup, the FDA process log 347 is checked to see how much of the file has been modified. If more than a threshold percentage, for example 5-10 percent, has been modified, and if the absolute size of the changes is smaller than a given size (e.g. 2 MB), then the SFP process 342 is selected as the appropriate process to use. Otherwise, the FDA-detected regions are used. Note that if the local client 210a crashes without a 'clean' FDA shutdown, all FDA log 347 information is invalidated, so the BDM 340 must temporarily revert to the SFP process (or a conventional incremental backup) when the next backup is performed.
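The selection between the two processes reduces to a small decision rule, sketched below. It is illustrative only; the log-entry field is an assumption, and the 10 percent threshold and 2 MB bound are the example figures quoted above.

```python
def choose_delta_process(fda_changed_bytes, file_size,
                         threshold_pct=10, small_change_limit=2 * 1024 * 1024):
    """Return "SFP" or "FDA" for a modified file, following the rule described above.
    `fda_changed_bytes` is the amount the FDA log 347 records as modified, or None
    if the log was invalidated (e.g. the client crashed without a clean FDA shutdown)."""
    if fda_changed_bytes is None:
        return "SFP"                                          # log invalid: fall back to fingerprinting
    if (fda_changed_bytes > file_size * threshold_pct / 100   # a large fraction was modified ...
            and fda_changed_bytes < small_change_limit):      # ... yet the delta is still small
        return "SFP"
    return "FDA"                                              # otherwise back up the logged regions
```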
The SFP process 342 divides an updated file into equal-sized "chunks", the size of which varies depending on the file size. Each chunk has a 12-byte fingerprint calculated for it, and the fingerprints are sent with the backup data for the file to be stored by the backup apparatus 240. When a file is to be checked with the SFP process 342, the BDM 340 communicates with the backup apparatus 240 to download the fingerprint set for the file in question. It is also possible to locally cache fingerprint sets for files that are frequently accessed. The SFP process 342 then calculates the fingerprint function for the updated version of the file, starting from the first byte and using a chunk size the same as for the last backup of the file. Then the SFP process 342 compares the resulting new first fingerprint with the previous fingerprint set to find a match. If there is a match, then
`the chunk starting at that byte is