`
`[191
`
`[11] Patent Number:
`
`5,649,196
`
`Woodhill et a1.
`[45] Date of Patent: Jul. 15, 1997
`
`
`
`USOOS649196A
`
`[54]
`
`[75]
`
`SYSTEM AND METHOD FOR DISTRIBUTED
`STORAGE MANAGEMENT ON
`NETWORKED COMPUTER SYSTEMS
`USING BINARY OBJECT IDENTIFIERS
`
`Inventors: James R. Woodhill, Houston; Louis R.
`Woodhill, Richmond; William Russell
`More, Jr.; Jay Harris Berlin, both of
`Houston, all of Tex.
`
`[73]
`
`Assignee: Legent Corporation, Pittsburgh, Pa.
`
`[21]
`
`Appl. No.: 555,376
`
`[22]
`
`Filed:
`
`Nov. 9, 1995
`
`[63]
`
`[51]
`[52]
`
`[5 3]
`
`[56]
`
`Related US. Application Data
`
`Continuation of Ser. No. 85,596, Jul. 1, 1993, abandoned.
`
`
`.............................. G06F 11/00
`Int. Cl.6
`US. Cl. .......................... 395/620; 395/610; 395/618;
`395/200.03; 395/489; 395/497.04; 395/185.01;
`395/183.17; 395/182.04; 395/180; 364/DIG. 1;
`364/285; 364/285.1; 364/284.4
`Field of Search ..................................... 395/610, 618,
`395/620, 200.03, 489, 497.04, 185.01, 183.17,
`182.04, 180; 364/DIG. 1
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`
`4,616,315 10/1986 Logsdon et a1.
`........................ 364/200
`5,133,065
`7/1992 Cheffetz et al.
`..... 395/575
`..
`
`5,202,982
`4/1993 Gramlich et a1.
`.
`395/600
`
`5,235,601
`8/1993 Stallmo et a1.
`371/401
`
`5,239,637
`8/1993 Davis et al. ........... 395/425
`
`..... 395/800
`8/1993 Rudeseal et a1.
`.
`5,239,659
`
`5,274,802 12/1993 Altine ................. 395/600
`5,278,838
`1/1994 Ng et a1.
`371/10.l
`
`..
`5,295,258
`3/1994 Jewett et a1.
`. 395/575
`11/1994 Webber et a1.
`......................... 395/800
`5,367,698
`
`OTHER PUBLICATIONS
`
`The 8th International Conference on Distributed Computing
`Systems, 13 Jun. 1988, San Jose, California pp. 471—479
`(Relevant to claim No. 1, 2, 5, 8, 9, 18, 19) Daniel Barbara
`et a1. ‘Exploiting symmetries for low—cost comparison of
`file copies’ see p. 471, right col., line 29-p. 472, left col., line
`10 (Relevant to claim No. 3, 4, 6, 10—14, 20.
`Computer Technology Review, vol. 12, No. 10, Aug. 1992,
`Los Angeles US pp. 55—60 Daniel Masters ‘Distributed
`Network Processing Speeds Up Network Backup’ see p. 60,
`left col., line 18—line 31 (Relevant claim N0. 10).
`Primary Examiner—Paul V. Kulik
`Assistant Examiner—Jean R. Homere
`Attorney, Agent, or Finn—Kirkpatrick & Lockhart LLP
`[57]
`ABSTRACT
`
`The present invention is directed to a system and method for
`the distributed management of the storage space and data on
`a networked computer system wherein the networked com—
`puter system includes at least two storage devices for storing
`data files comprised of one or more binary objects. The
`distributed storage management system includes a device for
`selectively copying the binary objects stored on one of the
`storage devices to another of the storage devices and another
`device for calculating a current value for a binary object
`identifier for selected binary objects stored on the storage
`devices wherein the calculation of the binary object identi-
`fier is based upon the actual data contents of the associated
`binary object. The distributed storage management system
`further includes a device for storing the current value of the
`binary object identifier as a previous value of the binary
`object identifier, another device for comparing the current
`value of the binary object identifier associated with a par-
`ticular binary object to one or more previous values of the
`binary object identifier associated with that particular binary
`object and a device for commanding the device for selec—
`tively copying binary objects in response to the device for
`comparing.
`
`18 Claims, 14 Drawing Sheets
`
`
`/?09
`Crate
`Agsisn
`5
`Backup
`Lser—
`=
`Queue
`’
`Defined
`
`Record
`Pnori
`/
`/]12
`Update
`Dadrup
`Queue
`
`y
`
`
`
`Chung: mFil
`
`5'
`
`in
`
`Queue
`File
`Backup
`L—R°?°‘d
`SDutItxsed-
`: Queucnccax
`5W m
`N
`N
`
`ma ——‘
`”94““
`g
`Delete —-
`Quwe
`
`[20/
`Create
`——116
`rile _
`Identification
`Record
`
`Delele
`B k
`Q1215
`Record
`I
`‘22
`
`
`
`
`
`refill?
`{133
`C
`ldmrificalion
`_. HinmyObjecl
`Rewrds
`
`Local
`
`\124
`
`
`
`132—
`
`‘
`
`1,
`
`,
`
`128— m:
`ldmtificmian
`E
`
`ATE.
`!
`134\
`Create
`ldcntifv
`Data
`Blckup
`Bing”.
`Sleam>
`130— Instance
`Objeclsfal
`1MB
`_Rfinni_
`—\¢___
`5
`”0
`J
`Y
`[33ka
`Separate
`chmmt Data
`‘1"
`utlpc
`‘1' SI
`”6
`
`Piggo
`SI’rzaIrtlpm
`1‘
`
`Streams
`136/ Binary/Objects
`n
`a" )
`
`=
`
`
`
`;
`‘
`
`GOOG-1003-Page 1 of 27
`
`GOOG-1003-Page 1 of 27
`
`
`
`US. Patent
`
`Jul. 15, 1997
`
`Sheet 1 of 14
`
`5,649,196
`
`1 8
`
`User
`Work Station
`
`
`
`17
`
`1
`
`Local
`Compute
`
`/ 15
`
`User .
`Work Statlon
`
`18
`
`17
`
` _1_Q
`
`Local
`Compute
`
`20
`
`19
`
`.
`
`\ 15
`
`14
`
`12
`
`20
`
`19
`
`Wide
`
`
`
`Network
`
`13
`
`Remote
`
`Fileserver
`
`Backup
`
`FIG. 1
`
`GOOG-1003-Page 2 of 27
`
`GOOG-1003-Page 2 of 27
`
`
`
`US. Patent
`
`Jul. 15, 1997
`
`Sheet 2 of 14
`
`5,649,196
`
`Compressed Storage Files
`
`Free Disk Space
`
`Local Computer Data Files
`
`Operating System Files
`
`Backup Queue Database
`
`File Database
`
`Distributed Storage
`
`Management Program
`
`FIG. 2
`
`GOOG-1003-Page 3 of 27
`
`GOOG-1003-Page 3 of 27
`
`
`
`US. Patent
`
`Jul. 15, 1997
`
`Sheet 3 of 14
`
`5,649,196
`
`_2_§
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`Backup
`Instance
`Record
`42
`
`Binary
`Obj ect
`Ident.
`
`, Record
`
`58
`\
`
`74
`
`Binary
`Object
`Identifier
`
`
`
`
`
`
`
`
`
`
`
`
`
`FIG. 3
`
`GOOG-1003-Page 4 of 27
`
`GOOG-1003-Page 4 of 27
`
`
`
`US. Patent
`
`Jul. 15, 1997
`
`Sheet 4 of 14
`
`5,649,196
`
`Backup
`
`Record
`
`vs
`
`2s
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`Record Type
`
`File Priority
`
`FIG. 4
`
`76
`
`78
`
`80
`
`82
`
`84
`
`86
`
`88
`
`
`
`
`
`
`90
`
`92
`
`
`
`
`
`GOOG-1003-Page 5 of 27
`
`GOOG-1003-Page 5 of 27
`
`
`
`US. Patent
`
`Jul. 15, 1997
`
`Sheet 5 of 14
`
`r
`
`5,649,196
`
`108
`
`102
`
`Assign
`User-
`Defined
`
`Priori
`
`
`
`Create
`Backup
`
`Queue
`Record
`
`
`Change to
`Fil -
`
`112
`
`
`
`
`
`
`
`
`
`
`Queue
`Record
`
`
`@ N
`119 /
`
`Flle
`
`128
`
`130
`
`132
`
`‘
`
`1 22
`
`138
`
`reate
`C
`
`File _ —126
`
`Identification
`
`Record
`
`
`Locate
`
`File
`Identification
`
`Record
`
`134\
`Data
`Steam >
`1 MB
`
`
`
`N
`
`. Create.
`Binary Object
`Identification
`
`
`
`Records
`Identi
`Bin fy
`- a”
`Objects for
`Backu -
`
`Separate
`
`
`File Into
`Data
`
`
`Streams
`
`136
`
`Segment Data
`Stream Into
`Multiple
`Binary Objects
`
`
`
`
`
`140
`
`To Step 116
`
`GOOG-1003-Page 6 of 27
`
`GOOG-1003-Page 6 of 27
`
`
`
`US. Patent
`
`Jul. 15,1997
`
`Sheet 6 of 14
`
`5,649,196
`
`
`
`Compile list
`Of Binary
`Objects To
`Be Backed Up
`
`
`
`
`
`Transmit
`.
`Delete Binaxy
`
`
`Storage File
`Slsiffimen
`Object From
`
`
`
`To Remote
`stagge
`Local
`
`
`
`
`Backup
`p
`Storage
`
`
`
`
`
`
`
`
`
`Locate Highest
`Priority Binary
`Object
`
`300
`
`For Backup
`
`
`
`304
`
`
`
`Routine
`
`Send Message
`To Resource
`Allocation
`
`FIG 5C
`
`
`
`Send List Of
`Wait For
`Message From
`Binary
`
`
`
`
`Objects
`Resource
`Allocation
`
` For Backu u
`
` Routine
`
`
`
`
`306
`
`308
`
`GOOG-1003-Page 7 of 27
`
`GOOG-1003-Page 7 of 27
`
`
`
`US. Patent
`
`Jul.15, 1997
`
`Sheet 7 of 14
`
`5,649,196
`
`Send Message
`
`
`
`To Resource
`
`Allocation Routine
`
`
`
`Message From Backup/
`
`Restore Routine
`
`Wait For Compress
`
`Send Message
`
`To Resource
`
`
`
`
`Allocation Routine
`Allocated
`
`
`Storage
`Filled
`
`
`
`Request Allocation
`
`
`
`Of Compressed
`
`Storage File
`
`
`
`FIG. SD
`
`‘320
`
`Request
`
`Send Message
`To Resource
`Allocation
`Routine
`
`Wait For
`
`Allocation
`
`Allocate
`
`Compressed
`Storage File
`
`FIG. 5B
`
`GOOG-1003-Page 8 of 27
`
`GOOG-1003-Page 8 of 27
`
`
`
`N 350
`
`
`Mess-
`Mess-
`
`Mess-
`
`
`age From
`age From
`
`
`age From Back-
`Compression
`Local Storage
`
`
`
`
`
`Routine
`Routine
`u Restore
`
`outin ‘
`
`
`
`Mess-
`Y
`age From
`Y
`
`
`Compression
`
`
`Routin
`
`
`
`
`dd Compres-
`Save Storage
`
`sion Routine
`
`
`To Space
`Space.
`
`
`_ q , _
`-
`Information
`
`
`
`Mark
`
`
`
`Compression
`
`
`Available
`
`
`
`
`Detrmine
`Higest Prori
`
`Waiting
`
`N Compression
`Routines
`
`
`
`
`Send Message
`
`Storage
`
`
`To
`Space
`
`
`
`Available
`Compression
`
`
`Routine
`
`
`358
`
`U.S. Patent
`
`Jul.15, 1997
`
`Sheet 8 of 14
`
`5,649,196
`
`
`
`
`Wait For
`
`Message Fro
`Prioritization
`
`Routine
`
`
`4
`3 4
`
`Routine As
`
`
`
`3 54
`
`352
`
`
`
`
`
`Binary
`
`Object
`
`
`336
`
`33 8
`
`
`
`Compression
`Routine
`
`- vailabl -
`
`Y
`
`340
`
`Trasmit Mess-
`
`
`
`
`
`age To Backup/
`Restore
`Routine
`
`
`
`N
`
`N9
`
`
`Send Message
`
`
`Any
`
`Lower Priority
`T0 D81ete
`
`
`
`Lower Priority
`Storage
`
`
` File
`Storage Files
`
`
`
`Send Message
`To Delete
`,6,
`
`
`
`FIG. 5F
`
`Storage Files
`Afier Transfer
`
`GOOG-1003-Page 9 of 27
`
`GOOG-1003-Page 9 of 27
`
`
`
`US. Patent
`
`Jul. 15, 1997
`
`Sheet 9 of 14
`
`,
`
`5,649,196
`
`
`
`
`Identity Whether
`
`Binary Object Is
`
`
`
`
`Segment Of
`Database File
`
`
`
`
`
`Create
`Shadow
`
`
`File
`
`
`
`
`
`
`Calculate
`
`Contents
`
`Identifiers
`
`
`
`
`
`
`Calcualte
`
` Identifiers
`
`Contents
`
`
`
`
`
`
`Contents
`Identifiers
`
`Equal
`
`
`
`Last
`
`Granule
`
`
`Processed
`
` FIG. 5G
`
`GOOG-1003-Page 10 of 27
`
`GOOG-1003-Page 10 of 27
`
`
`
`US. Patent
`
`Jul.15, 1997
`
`Sheet 10 of 14
`
`5,649,196
`
`Create Work
`
`Area On
`.
`
`Remote File Server
`
`Locate Most
`
`
`Recent Complete
`
`
`
`
`
`Copy Of
`
`'
`
`Binary Object
`
`420
`
`422
`
`424
`
`430
`
`Create
`
`Bitmap
`
`Obtain
`
`
`
`List
`
`Of
`Granules
`
`
`
`Locate Most
`
`Recent
`
`
`
`
`
`
`
`
`
`
`
`Binary Object
`
`
`GTanularized
`
`Copy Of
`
`426
`
`
`
` Restore
`438
`
`
`Reconstituted
`
`
`Binary
`
`Object
`
`GOOG-1003-Page 11 of 27
`
`GOOG-1003-Page 11 of 27
`
`
`
`US. Patent
`
`Jul. 15, 1997
`
`Sheet 11 of 14
`
`5,649,196
`
`Obtain
`
`OfFile
`
`Identity
`
`443
`
`Compile List Of
`
`Binary Objects In
`Previous Version
`
`Of File
`
`Calculate
`
`
`
`
`Contents
`Identifiers
`
`
`
`44
`
`Transmit
`
`Request
`
`Update
`
`Reconstitute
`
`Object
`
`Binary
`
`. 48
`
`FIG. SI
`
`54
`-
`56
`0
`45
`452
` Transmit
`
`
`Do
`
`
`Contents
`Granules
`
`
`
`Identifiers
`
`To Local
`
`
`Identifiers
`Match
`
`
`
`
`
`Compare
`Contents
`
`Computer
`
` 460
`
`More
` End
`Granules
`
`GOOG-1003-Page 12 of 27
`
`GOOG-1003-Page 12 of 27
`
`
`
`US. Patent
`
`Jul. 15, 1997
`
`Sheet 12 of 14
`
`5,649,196
`
`502
`
`500
`
`Initiate Restore
`
`
`
`
`
`
`
`Binary Object
`
`Of Randomly
`
`Selected
`
` Restore
`. Binaryr
`
`
`
`Object
`
`04
`
`Calculate
`
`Identifier
`
`Binary Object
`
`506
`
`
`
`Binary
`
`Object Identifiers N
`Identical
`
`
`
`
`
`
`510
`
`
`Generate
`Audit
`
` Failure
`
` Log
`
`Successful
`
`
`
`
`Restore
`
`Audit
`
`FIG. SJ
`
`50
`
`8
`
`GOOG-1003-Page 13 of 27
`
`GOOG-1003-Page 13 of 27
`
`
`
`US. Patent
`
`Jul. 15,1997
`
`Sheet 13 0f 14
`
`5,649,196
`
`Obtain
`
` 600
`
`Last Access
`
`Date
` 602
`
`Restore File
`
`Database if
`
`
`
`
`
`Necessary
`
`‘
`
`FIG. 5K
`
` o4
`
`
`
`Read File
`
`Identification
`
`Record
`
`
`
` Put
`
`
`
`Disk Drive
`
`
`
`Online
`
`
`Locate Most
`
`Recent Backup
`
`
`Instance Record
`
`
`
`
` Initiate Restoration
`
`
`
` Set Migration
`
`Of File; Set
`
`Migration Status
`
`To "Normal"
`
`Status To
`
`
`"Migrated"
`
`GOOG-1003-Page 14 of 27
`
`GOOG-1003-Page 14 of 27
`
`
`
`US. Patent
`
`Jul. 15,1997
`
`Sheet 14 of 14
`
`5,649,196
`
`Locate Each
`
`
`
`
`
`File
`
`Identification
`Record
`
`
`
`Create
`
`Retention
`
`710
`
`
`Additional
`
`Working
`
`Backup Instance
`
`List
`
`Records
`
`
`
`
`
`
`Locate Most
`
`
`
`
`Mark Retention
`
`Insert
`
`Date Matches
`Recent Backup
`Working List
`
`
`
`Unused Date
`Instance
`Entries As
`
`
`
`
`Ranges
`
`
`Record
`
`
`Us ed
`
`
`
`GOOG-1003-Page 15 of 27
`
`GOOG-1003-Page 15 of 27
`
`
`
`5,649,196
`
`1
`SYSTEM AND METHOD FOR DISTRIBUTED
`STORAGE MANAGEMENT ON
`NETWORKED COMPUTER SYSTEMS
`USING BINARY OBJECT IDENTIFIERS
`
`This application is a continuation application Ser. No.
`08/085,596 filed on Jul. 1, 1993, now abandoned.
`BACKGROUND OF THE INVENTION
`1. Field of the Invention
`
`The present invention is directed generally to a system
`and method for distributed storage management on a net-
`worked computer system and, more specifically, to a system
`and method for distributed storage management on a net-
`worked computer system including a remote backup file
`server and one or more local area networks in comrnunica—
`tion with the remote backup file server.
`2. Description of the Background of the Invention
`Backup copies of information stored on a computer
`system must be made so that if a failure occurs which causes
`the original copies of the data to be lost, the lost data can be
`recovered as it existed at the time when the last backup copy
`was made. Backup/restore systems have a long history on all
`types of computer systems from mainframes to
`minicomputers, local area network file servers and desktop
`workstations.
`
`Historically, backup systems have operated by making
`copies of a computer system’s files on a special backup
`input/output device such as a magnetic tape drive, floppy
`diskette drive, or optical disk drive. Most systems allow full
`backup, partial backup (e.g., specified drives, directories, or
`files), or incremental backups based on files changed after a
`certain date or time. Copies of files made during a backup
`procedure are stored on these special backup devices and are
`then later retrieved during a restore operation either under
`file names derived from the original file, from the date/time
`of the backup operation or from a serially-incremented
`number. The backup procedure is typically accomplished on
`an individual computer/file server basis, rather than through
`a single coordinated approach encompassing multiple sys-
`tems. That is, the computer resources of two computers at
`most (the one processing the files to be backed up and the
`one with the backup device attached) are employed to effect
`the backup process, regardless of the actual number of
`computers networked together.
`Today, the absolute numbers of computers networked
`together by organizations are increasing rapidly as is the
`number of diiferent types of computers and operating sys-
`tems in use. At the same time, the number of storage devices
`and the capacities incorporated into each of these units is
`growing even more rapidly. In this environment, the backup/
`restore approaches which have been traditionally used have
`become less reliable, more expensive, and more consump-
`tive of human time and attention.
`
`Thus, the need exists for a system designed to overcome
`the limitations of the existing backup/restore systems that
`have the following characteristics: (1) is capable of operat-
`ing on a networked computer system incorporating various
`types of computers and operating systems; (2) is capable of
`accommodating a large array of large capacity storage
`devices; (3) is reliable; (4) is capable of operating with a
`minimum amount of human intervention; and (5) is rela—
`tively inexpensive.
`SUMMARY OF THE INVENTION
`
`The present invention is directed to a system for the
`distributed management of the storage space and data on a
`
`2
`
`networked computer system wherein the networked com-
`puter system includes at least two storage devices for storing
`data files comprised of one or more binary objects. The
`distributed storage management system includes means for
`selectively copying the binary objects stored on one of the
`storage devices to another of the storage devices and means
`for calculating a current value for a binary object identifier
`for selected binary objects stored on the storage devices
`wherein the calculation of the binary object identifier is
`based upon the actual data contents of the associated binary
`object. The distributed storage management system further
`includes means for storing the current value of the binary
`object identifier as a previous value of the binary object
`identifier, means for comparing the current value of the
`binary object identifier associated with a particular binary
`object to one or more previous values of the binary object
`identifier associated with that particular binary object and
`means for commanding the means for selectively copying
`binary objects in response to the means for comparing.
`The present invention is further directed to a method for
`the management of the storage space and data on a computer
`system wherein the computer system includes at least two
`storage area for storing data files comprised of one or more
`binary objects. The storage space management method
`includes the following steps: (1) selectively copying the
`binary objects stored in one of the storage areas to another
`of the storage areas; (2) calculating a current value for a
`binary object identifier for selected binary objects stored in
`the storage areas wherein the calculation of the binary object
`identifier is based upon the actual data contents of the
`associated binary object; (3) storing the current value of the
`binary object identifier as a previous value of the binary
`object identifier; (4) comparing the current value of the
`binary object identifier associated with a particular binary
`object to one or more previous values of the binary object
`identifier associated with that particular binary object; and
`(5) controlling the step for selectively copying binary
`objects in response to the step for comparing.
`The system and method. of the present invention for the
`management of the storage space on a computer system
`provide a backup/restore system that is capable of operating
`on a networked computer system incorporating various
`types of computers and operating systems, is capable of
`accommodating a large array of large capacity storage
`devices, is reliable, is capable of operating with a minimum
`amount of human intervention and is relatively inexpensive.
`These and other advantages and benefits of the present
`invention will become apparent fi‘om the description of a
`preferred embodiment hereinbelow.
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`For the present invention to be clearly understood and
`readily practiced, a preferred embodiment will now be
`described, by way of example only, with reference to the
`accompanying figures wherein:
`FIG. 1 illustrates a simplified representation of a net-
`worked computer system in which the system and method of
`the present invention may be employed;
`FIG. 2 illustrates the manner in which the Distributed
`Storage Manager program of the present invention allocates
`the storage space on each of the storage devices illustrated
`in FIG. 1;
`FIG. 3 illustrates the File Database utilized by the Dis-
`tributed Storage Manager program of the present invention;
`FIG. 4 illustrates the Backup Queue Database utilized by
`the Distributed Storage Manager program of the present
`invention; and
`
`5
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`45
`
`50
`
`55
`
`60
`
`65
`
`GOOG-1003-Page 16 of 27
`
`GOOG-1003-Page 16 of 27
`
`
`
`5,649,196
`
`3
`FIGS. Sa—Sl illustrate flow charts explaining the operation
`of the Distributed Storage Manager program of the present
`invention.
`
`DETAILED DESCRIPTION OF THE
`PREFERRED EMBODIMENT
`
`FIG. 1 illustrates a simplified representation of a typical
`networked computer system 10 in which the system and
`method of the present invention for distributed storage
`management on networked computer systems may be
`employed. A remote backup file server 12 is in
`communication, via data path 13, with a wide area network
`14. The wide area network 14 is, in turn, in communication
`with a plurality of local area networks 16 via data paths 15.
`Those of ordinary skill in the art will recognize that any
`number of wide area networks 14 may be in communication
`with remote backup file server 12 and that any number of
`local area networks 16 (from 1 to more than 100) may be in
`communication with each wide area network 14. Those of
`ordinary skill in the art will also recognize that the means for
`communication between remote backup file server 12, wide
`area network 14 and local area networks 16 over data paths
`13 and 15 is well known.
`
`Each local area network 16 includes multiple user work-
`stations 18 and local computers 20 each in communication
`with their respective local area network 16 via data paths 17.
`Again, those of ordinary skill in the art will recognize that
`the means for communication between user workstations 18,
`local computers 20 and local area networks 16 via data paths
`17 is well known. The storage space on each disk drive 19
`on each local computer 20 in the networked computer
`system 10 is allocated as follows and as is shown in FIG. 2:
`(1) operating system files 22; (2) a Distributed Storage
`Manager program 24 which embodies the system and
`method of the present invention (the operation of which is
`described in detail hereinbelow); (3) a File Database 25 (the
`structure of which is described in detail hereinbelow); (4) a
`Backup Queue Database 26 (the structrn'e of which is
`described in detail hereinbelow); (5) local computer data
`files 28; (6) free disk space 30 and (7) compressed storage
`files 32 (created by the Distributed Storage Manager pro-
`gram 24 of the present invention as is explained more fully
`hereinbelow).
`The Distributed Storage Manager program 24 of the
`present invention builds and maintains the File Database 25
`on one of the disk drives 19 on each local computer 20 in the
`networked computer system 10 according to the structure
`illustrated in FIG. 3. The File Database 25 stores information
`relating to each file that has been backed up by the Distrib-
`uted Storage Manager program 24 since the initialization of
`that program on each local computer 20. The File Database
`25 is comprised of three levels of records organized accord-
`ing to a predefined hierarchy. The top level record, File
`Identification Record 34, includes identification information
`for each file that has been backed up by Distributed Storage
`Manager program 24. File Identification Record 34 contains
`the following elements: (1) Record Type 36 (identifies the
`file as either a directory file or a regular file); (2) File
`Location 38 (name of the directory in which the file resides);
`(3) File Name 40 (name of the file); (4) Migration Status 41
`(explained more fully hereinbelow); and (5) Management
`Class 43 (explained more fully hereinbelow).
`For each File Identification Record 34 in File Database
`25, one or more Backup Instance Records 42 are created that
`contain information about the file (identified by File Iden-
`tification Record 34) at the time that the file is backed up.
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`45
`
`50
`
`55
`
`65
`
`4
`
`Each time that a file is backed up, a Backup Instance Record
`42 is created for that file. Each Backup Instance Record 42
`consists of the following elements: (1) Link to File Identi-
`fication Record 44; (2) Backup Cycle Identifier 46 (identifies
`the particular backup cycle during which the Backup
`Instance Record 42 is created); (3) File Size 48; (4) Last
`Modified Date/Time 50; (5) Last Access Date/Time 52; (6)
`File Attributes 54 (e.g., read-only, system, hidden); (7)
`Delete Date 56 (date on which the file was deleted); and (8)
`Insert Date 57 (date on which the Backup Instance Record
`42 was created).
`Associated with each Backup Instance Record 42 is one
`or more Binary Object Identification Records 58. The Dis-
`tributed Storage Manager program 24 views a file as a
`collection of data streams. A data stream is defined as a
`
`distinct collection of data within the file that may be changed
`independently from other distinct collections of data within
`the file. For example, a file may contain its normal data and
`may also contain extended attribute data. A user may change
`the extended attribute data without modifying any of the
`normal data or vice versa. The Distributed Storage Manager
`program 24 further divides each data stream into one or
`more binary objects. If the size of the data stream is equal
`to or less than a previously defined convenient maximum
`binary object size (currently one (1) megabyte), then a single
`binary object represents the data stream. If the data stream
`is larger than the maximum binary object size, then the
`Distributed Storage Manager program 24 divides the data
`stream into multiple binary objects, all but the last of which
`are equal in size to the maximum binary object size. A
`Binary Object Identification Record 58 is created for each
`binary object that comprises the file which was backed up
`during the backup cycle identified by the Backup Cycle
`Identifier 46 of a particular Backup Instance Record 42.
`Each Binary Object Identification Record 58 includes the
`following components: (1) Link to Backup Instance Record
`60; (2) Binary Object Stream Type 62 (e.g., data, extended
`attributes, security); (3) Binary Object Size 64; (4) Binary
`Object CRC32 66 (explained more fully hereinbelow); (5)
`Binary Object LRC 68 (explained more fully hereinbelow);
`(6) Binary Object Hash 70 (explained more fully
`hereinbelow); and (7) Binary Object Oflset 72 (explained
`more fully hereinbelow). The Binary Object Size 64, Binary
`Object CRC32 66, Binary Object LRC 68 and Binary Object
`Hash 70 comprise the Binary Object Identifier 74 which is
`a unique identifier for each binary object to be backed up and
`is discussed in more detail below.
`
`The Distributed Storage Manager program 24 also builds
`and maintains the Backup Queue Database 26 on one of the
`disk drives 19 on each local computer 20 in the networked
`computer system 10 according to the structure illustrated in
`FIG. 4. Each entry (Backup Queue Record 75) in the Backup
`Queue Database 26 is comprised of the following compo-
`nents: (1) Record Type 76 (identifies the file as either a
`directory file or a regular file); (2) File Location 78 (name of
`the directory in which the file resides); (3) File Name 80
`(name of the file); (4) File Status 82 (“new”, “modified” or
`“deleted”); (5) File Size 84; (6) Last Modified Date/Time 86;
`(7) Last Access Date/Time 88; (8) File Attributes 90 (e.g.,
`read-only, system, hidden); and (9) File Priority 92
`(explained more fully hereinbelow).
`The operation of the Distributed Storage Manager pro~
`gram 24 may be illustrated by way of the flow charts
`depicted in FIGS. 5a through 51. For explanation purposes,
`the Distributed Storage Manager program 24 is divided into
`several distinct functions which will be discussed in turn.
`Those of ordinary skill in the art will recognize, however,
`
`GOOG-1003-Page 17 of 27
`
`GOOG-1003-Page 17 of 27
`
`
`
`5
`
`5,649,196
`
`6
`
`that each of the distinct functions operates in cooperation
`with the other functions to form a unitary computer program
`Those of ordinary skill in the art will also recognize that the
`following discussion illustrates the operation of the Distrib-
`uted Storage Manager program 24 on a single local com—
`puter 20, although it should be understood that the Distrib-
`uted Storage Manager program 24 operates in the same
`fashion on each local computer 20 on the networked com-
`puter system 10. The Distributed Storage Manager program
`24 can either be executed on user demand or can be set to
`execute periodically on a user-defined schedule.
`1. Identification of Binary Objects to be Backed Up
`In the flow chart of FIG. 5a, execution of the Distributed
`Storage Manager program 24 begins at step 100 where the
`Backup Queue Database 26 is built by creating a Backup
`Queue Record 75 for each File Identification Record 34
`found in File Database 25. In this way, a list of files that were
`backed up during the previous backup cycle is established so
`that it can be determined which files need to be backed up
`during the current backup cycle. To create each Backup
`Queue Record 75, the Backup Instance Record 42 repre-
`senting the most recent backup of the file represented by
`each File Identification Record 34 is located. This determi-
`nation is made by examining the Backup Cycle Identifier 46
`in each Backup Instance Record 42. The Backup Cycle
`Identifier 46 may represent either a date (month/day/year) or
`numerical value assigned to a particular backup cycle. The
`Backup Queue Record 75 is comprised of certain ofthe data
`fields of both the File Identification Record 34 and the
`Backup Instance Record 42. During the process of creating
`each Backup Queue Record 75, the File Status field 82 is set
`to “DELETED”. However, if the Delete Date field 56 of the
`most recent Backup Instance Record 42 associated with the
`File Identification Record 34 currently being processed is
`non-zero, indicating that the file has been previously deleted,
`then no Backup Queue Record 75 is created for that File
`Identification Record 34. If the backup that is currently
`being processed for the local computer 20 is not a full
`backup (i.e., all files on all disk drives 19 on the local
`computer 20), then the Distributed Storage Manager pro:
`gram 24 will only create Baclmp Queue Records 75 for those
`files that match the backup specifications. For example, if
`only those files that have a file extension of “.EXE” are to
`be backed up, then only File Identification Records 34 that
`correspond to “Em-E” files will be processed
`Program control then continues with step 102 where the
`Distributed Storage Manager program 24 of the present:
`invention scans all disk drives 19 on the local computer 20
`that are to be backed up. This operation consists of scanning
`the directory hierarchy on each disk drive 19 on the local
`computer 20 and returning to the Distributed Storage Man-
`ager program 24 certain file block information for each of
`the directory files and regular files that are stored on the disk
`drives 19 to be backed up. A typical computer operating
`system maintains a file block for each file stored on the
`system which includes information such as file location, file
`type, user-assigned file name, file size, creation date and
`time, modify date and time, access date and time and file
`attributes. This operation may be controlled by some param-
`eters that indicate which drives, directories and files are to
`be backed up during a backup operation. However, the
`default operation is to back up all files on all disk drives 19
`on the local computer 20. Program control then continues
`with step 103 where the Distributed Storage Manager pro-
`gram 24 determines whether the file block information for
`an additional file has been located on the disk drives 19. If
`an additional file has been located, program control contin-
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`45
`
`50
`
`55
`
`65
`
`ues with step 104. If an additional file has not been located,
`program control continues with step 116.
`In step 104, the Distributed Storage Manager program 24
`determines whether a Backup Queue Record 75 exists for
`the located file by comparing the file's file block information
`to the information stored in Backup Queue Database 26. If
`such a Backup Queue Record 75 does not exist (i.e., this is
`the first time this file will be backed up), program control
`continues with step 106 where a Backup Queue Record 75
`for the file is created using the information contained within
`the file’s file block. The File Status field 82 for the newly
`created Baclmp Queue Record 75 is set to “NEW”. Program
`control then continues with step 108 where a user-defined
`priority is assigned to the file and stored in the File Priority
`field 92 of the Backup Queue Record 75. This user-defined
`priority may be assigned to the file by methods that are
`well-known to those of ordinary skill in the art The use of
`the File Priority field 92 by the Distributed Storage Manager
`program 24 is discussed in more detail hereinbelow. Pro-
`gram control is then returned to step 102.
`If the Distributed Storage Manager program 24
`determines, in step 104, that a Backup Queue Record 75
`exists in the Backup Queue Database 26 for the located file,
`program control continues with step 110 where it is deter-
`mined whether any change has been made to the file. This
`determination is made by comparing the information in the
`file’s file block with the information stored in the file’s
`Backup Queue Record 75. If any of the values have changed,
`program control continues with step 112 where File Status
`field 82 is set to “MODIFIED” and the fields in the Backup
`Queue Record 75 are updated from the file’s file block
`information. Program control then continues with step 108
`where a user-defined priority is assigned to the file and
`stored in File Priority field 92; program control is then
`returned to step 102. If the determination is made in step 110
`that no change has been made to the file, then, in step 114,
`the Backup Queue Record 75 is deleted from the Backup
`Queue Database 26 since the file does not need to be backed
`up. Following step 114, program control is returned to step
`102.
`
`If the Distributed Storage Manager program 24
`determines, in step 103, that an additional file has not been
`located, program control continues with step 116. In step
`116, the Distributed Storage Manager program 24 reads each
`Backup Queue Record 75 in Backup Queue Database 26,
`one at a time. The Backup Queue Records 75 in Backup
`Queue Database 26 represent all of the files that must be
`backed up by the Distributed Storage Manager program 24
`during the present backup cycle. Program control continues
`with step 117 where the Distributed Storage Manager pro-
`gram 24 determines whether a next Backup Queue Record
`75 has been located in Backup Queue Database 26. If a next
`Backup Queue Record 75 has been located, program control
`continues with step 118; otherwise, program control contin—
`ues with step 119, where the routine illustrated by the flow
`chart of FIG. 5a is terminated. In step 118, the Distributed
`Storage Manager program 24 determines whether the File
`Status field 82 in the Backup Queue Record 75 currently
`being processed is set to “DELETED”. If the File Status
`field 82 is set to “DELETED”, program control continues
`with step 120 where the Delete Date field 56 in the most
`rec