`Bolosky et al.
`
`I 1111111111111111 11111 lllll lllll 111111111111111 lllll 111111111111111 11111111
`US006477544Bl
`US 6,477,544 Bl
`*Nov. 5, 2002
`
`(10) Patent No.:
`(45) Date of Patent:
`
`(54) SINGLE INSTANCE STORE FOR FILE
`SYSTEMS
`
`WO
`
`WO 99/21082
`
`4/1999
`
`OTHER PUBLICATIONS
`
`(75)
`
`Inventors: William J. Bolosky, Issaquah; John R.
`Douceur, Bellevue; Scott M. Cutshall,
`Carnation; Richard F. Rashid,
`Redmond; Nathan P. Myhrvold,
`Bellevue; David A. Goebel, Seattle, all
`ofWA(US)
`
`(73) Assignee: Microsoft Corporation, Redmond, WA
`(US)
`
`( *) Notice:
`
`Subject to any disclaimer, the term of this
`patent is extended or adjusted under 35
`U.S.C. 154(b) by O days.
`
`This patent is subject to a terminal dis(cid:173)
`claimer.
`
`(21) Appl. No.: 09/354,624
`
`(22) Filed:
`
`Jul. 16, 1999
`
`Int. Cl.7 ................................................ G06F 17/30
`(51)
`(52) U.S. Cl. ......................... 707/200; 1/3; 1/10; 1/100;
`1/205
`(58) Field of Search .......................... 707/1, 3, 10, 100,
`707/200, 205
`
`(56)
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`
`5,410,667 A
`5,706,510 A *
`5,778,384 A
`5,778,395 A *
`5,907,673 A
`5,918,229 A *
`6,185,574 Bl
`
`4/1995 Belsan et al.
`1/1998 Burgoon ..................... 395/619
`7/1998 Provino et al.
`7/1998 Whiting et al. ............. 707/204
`5/1999 Hirayama et al.
`6/1999 Davis et al.
`.................. 707 /10
`2/2001 Howard et al.
`
`FOREIGN PATENT DOCUMENTS
`
`EP
`WO
`WO
`
`0 774 715 Al
`WO 99/09480
`WO 99/12098
`
`5/1997
`2/1999
`3/1999
`
`Steere et al., "A Feedback-driven Proportion Allocator for
`Real-Rate Scheduling", Third Symposium on Operating
`Systems Design and Implementation ( OSDI '99), USENIX
`Association, pp. 145-158 (1999).
`LaLonde, Ken, "Batch daemon-README", UNIX Batch
`Command, University of Toronto, pp. 1-3 (Feb. 27, 1997),
`ftp://ftp.cs.toronto.edu/pub/batch.tar.z printed Dec. 8, 2000.
`* cited by examiner
`
`Primary Examiner-----Kim Vu
`Assistant Examiner-Cam Y T Truong
`(74) Attorney, Agent, or Firm-Law Offices of Albert S.
`Michalik, PLLC
`
`(57)
`
`ABSTRACT
`
`A method and system for storing the data of files having
`duplicate content, by maintaining a single instance of the
`data, and providing logically separate links to the single
`instance. Files of duplicate content have their data stored in
`a common store file by a single instance store (SIS) facility,
`which also converts the original file or files to links to that
`common store file and creates additional links thereto as
`needed. The SIS facility may reside above a file system as
`a filter driver. File system requests directed to the link file
`( e.g., open, write, read, close and delete) reach the SIS filter,
`which then transparently handles each request as if the link
`file was a normal file. To preserve logical separation, writes
`to a SIS link file are to the link file, and the written portion
`recorded as dirty. The SIS filter intercepts SIS read requests,
`and reads clean portions from the common store file and any
`dirty portions from the link file. When the link file is closed,
`the common store file also may be closed, and, if the link file
`has been written, the non-dirtied portions of the link file are
`filled in with clean data from the common store file, and the
`link file reconverted to a normal file. Security is provided to
`prevent unauthorized access to the common store files, as is
`a volume check facility that repairs any inconsistencies in
`SIS metadata.
`
`75 Claims, 16 Drawing Sheets
`
`Petitioners Microsoft Corporation and HP Inc. - Ex. 1043, p. 1
`
`
`
`i,-
`~
`,I;;..
`,I;;..
`11.
`-...,l
`-...,l
`~
`O'I
`rJ'J.
`e
`
`'"""' O'I
`'"""' 0 ....,
`~ ....
`'JJ. =(cid:173)~
`
`z 0
`
`N
`0
`0
`N
`~Ul
`~
`
`~ = ......
`~ ......
`~
`•
`r:JJ.
`d •
`
`Device
`Storage
`
`56
`
`'nitorl I
`
`47
`
`Adapter
`
`Host
`
`Adapter
`
`Video
`
`55
`
`48
`
`21
`
`Unit
`
`Processing
`
`20
`
`I
`I
`I
`I
`I
`-------------------------------------------------------------------------,
`
`(ROM)
`--------------
`System Memory
`
`24
`
`26 I
`
`1
`50
`
`UIII
`
`PROGRAMS 37'
`
`1 APPLICATION
`
`<
`
`42
`
`Computer(s)
`
`Remote
`
`,
`
`><
`
`I
`Wide Area Network
`
`49
`
`51
`
`LAN
`
`I
`I
`
`______________ J
`I
`I
`I
`I ~:
`
`Interface
`Network
`
`Interface
`
`Interface
`
`Drive
`Optical
`
`Interface
`Disk Drive
`Magnetic
`
`Interface
`
`Drive
`
`Hard Disk
`
`,,~?:~~;o-/ ~
`---
`
`CJ
`
`"~29 3J
`---1--,-
`
`MODULES
`PROGRAM
`OTHER 38 PROGRAM
`
`39
`
`DATA
`
`' "'"
`
`FIG. 1
`
`FILESYS 36
`SYSTEM/35 PROGRAMS
`OPERATIN~ APPLICATION
`
`37
`
`/// 27
`
`/
`
`___ 7L, __
`11111
`
`/
`
`/
`
`/
`
`/
`
`p~:Tr:M ;!! 11
`
`MODULES 38
`OTHER PROGRAM
`
`: 11
`
`I
`I
`
`.I
`SYSTEM 36
`
`PROGRAMS 37
`APPLICATION
`
`FILE
`
`35
`
`-
`
`SYSTEM
`
`OPERATING
`
`(RAM) 25
`
`22
`
`BIOS
`
`Petitioners Microsoft Corporation and HP Inc. - Ex. 1043, p. 2
`
`
`
`i,,,-.
`~
`,I;;..
`,I;;..
`11.
`-...,l
`-...,l
`~
`O'I
`rJ'J.
`e
`
`'"""' O'I
`0 ....,
`N
`~ ....
`'JJ. =(cid:173)~
`
`z 0
`
`N
`0
`0
`N
`~Ul
`~
`
`~ = ......
`~ ......
`~
`•
`r:JJ.
`d •
`
`Common Store
`
`78_.,_
`
`FIG. 2A
`
`Store\UUIDn
`
`Common
`
`File
`
`Store\UUID2
`
`Common
`
`File
`
`CommonStore(UUID1)
`
`File
`
`Data
`
`Volume
`----I Filesys
`80
`
`75.JJ--f-
`
`--1--
`
`(including Signature)
`Backpointer Stream
`
`94----'--'--
`68 -.....-
`
`~
`
`SIS
`
`8 SIS COPYFILE
`
`60
`
`74
`
`[User Request]
`
`Control
`
`80 _)
`
`I
`-,
`
`:.J
`~ir2\XYZ I
`I
`I
`I
`I
`66 ...-f Dest. -,
`
`-
`
`-
`
`I e
`F"I
`
`-
`
`r:-
`
`Vol~'""
`File~
`
`6°4 I Dir1\XYZ
`rt File
`Source
`
`Petitioners Microsoft Corporation and HP Inc. - Ex. 1043, p. 3
`
`
`
`i--
`~
`,I;;..
`,I;;..
`1J.
`-...,l
`-...,l
`,I;;..
`_,.a-...
`rJ'J.
`e
`
`~
`
`'""" O'I
`0 ....,
`~ ....
`'JJ. =(cid:173)~
`
`z 0
`
`N
`0
`0
`N
`~Ul
`~
`
`~ = ......
`~ ......
`~
`•
`r:JJ.
`d •
`
`106
`
`II
`
`---;,,,
`
`Dir2\XYZ
`72 ,--L., Link File
`
`----
`I -~--_)
`7
`
`Store\UUIDn
`
`Common
`
`File
`
`(cid:127)
`(cid:127)
`
`Store\UUID2
`
`Common
`
`File
`
`•I,
`
`I
`
`Common Store
`
`CommonStore(UUID1)
`
`File
`
`Data
`
`78-i.--
`
`Volume
`-Filesys
`
`~ i-
`
`76
`
`~L----
`
`~ 1--(including Signature)
`Backpointer Stream
`
`94
`
`68-.., II
`
`II
`
`,I
`
`SIS
`
`File 1/0
`
`~
`
`'t-
`I
`
`Dir1\XYZ -~
`
`L... __ ....J
`102 1 Context
`ri SIS
`
`I
`
`Volume
`Filesys
`
`• 80 _j
`
`I
`
`I
`
`r---Link File r 70
`
`Point .,. .. J
`Reparse
`84
`
`i~i¢
`
`66 L...: __ :.J
`I
`(I. I
`
`I
`
`·
`
`,_ Point
`Reparse
`
`yz
`
`-:£
`r
`
`. 1
`
`64
`
`Petitioners Microsoft Corporation and HP Inc. - Ex. 1043, p. 4
`
`
`
`i--
`~
`,I;;..
`,I;;..
`11.
`-...,l
`-...,l
`,I;;..
`_,.a-...
`rJ'J.
`e
`
`'"""' O'I
`0 ....,
`~ ....
`'JJ. =-~
`
`,i;;..
`
`z 0
`
`N
`0
`0
`N
`~Ul
`~
`
`~ = ......
`~ ......
`~
`•
`r:JJ.
`d •
`
`CommonStore\(UUID1)
`
`I
`
`I
`
`File
`
`Data
`
`I
`
`I
`
`(including Signature)
`Backpointer Stream
`
`I I
`
`I
`I I
`94~
`
`FIG. 3
`
`SIS
`
`62_,,I
`
`102
`
`104
`
`File 1/0 I
`
`Context Map
`
`1MB + 10KB to 2MB Clean
`1MB to 1MB + 10KB-1 Dirty
`0 to 1 MB-1
`Clean
`
`-70
`
`__.11
`I
`
`2MB
`
`I
`
`Link File Dir1\XYZ
`
`10KB
`1MB 1MB+
`
`110_]
`
`0
`
`82~ Reparse Point
`
`1081 64KB
`1MB+
`
`92
`
`86 ++-T SIS Reparse Tag
`
`UUID I Signature 7
`
`90
`
`88 m Reparse Data
`
`Petitioners Microsoft Corporation and HP Inc. - Ex. 1043, p. 5
`
`
`
`U.S. Patent
`
`Nov. 5, 2002
`
`Sheet 5 of 16
`
`US 6,477,544 Bl
`
`begin
`SIS Copyfile
`
`FIG. 4
`
`Open
`Source File
`
`400
`
`Yes
`
`Copy Contents of
`Source File to
`Common Store
`
`Convert Source File to
`a Link File Including
`Reparse Point
`
`Create Destination
`as Link to Common
`Store File
`
`Add Backpointer(s)
`to any New Link(s) to
`Common Store File
`
`404
`
`406
`
`408
`
`410
`
`Close Files
`
`412
`
`end
`
`Petitioners Microsoft Corporation and HP Inc. - Ex. 1043, p. 6
`
`
`
`U.S. Patent
`
`Nov. 5, 2002
`
`Sheet 6 of 16
`
`US 6,477,544 Bl
`
`User Level
`
`Kernel Level
`
`Open Request ,___ _____ ___,
`SIS Link
`
`Optional Filter
`Driver{s)
`
`96
`
`0
`
`0
`
`SIS Filter Driver
`
`62'
`
`Optional Filter
`Driver{s)
`
`000
`
`98
`
`NTFS
`
`FT Disk
`
`100
`
`FIG. 5
`
`Petitioners Microsoft Corporation and HP Inc. - Ex. 1043, p. 7
`
`
`
`U.S. Patent
`
`Nov. 5, 2002
`
`Sheet 7 of 16
`
`US 6,477,544 Bl
`
`begin
`SIS Open I Create
`
`Receive NTFS Error =
`STATUS_REPARSE
`
`FIG. 6A
`
`600
`
`Open Common
`Store File
`
`Yes
`
`604
`
`Yes
`
`608
`
`No
`Delete Reparse Point,
`Return Handle to Link
`File to User
`
`610
`
`Close Common
`Store File
`
`end
`
`To/ From
`FIG. 68
`
`Petitioners Microsoft Corporation and HP Inc. - Ex. 1043, p. 8
`
`
`
`U.S. Patent
`
`Nov. 5, 2002
`
`Sheet 8 of 16
`
`US 6,477,544 Bl
`
`From FIG. 6A
`
`F/G. 6B
`
`Instruct NTFS to Open Link
`File (Resubmit Open IRP
`with Reparse Flag Set)
`
`620
`
`622
`
`Receive Success Status from NTFS,
`Attach Context, Mark Any Allocated
`Portions as Dirty in Context Map
`
`628
`
`No
`
`Add Link File to List in
`BackpointerStream of
`Common Store File
`
`Start Volume Check
`(FIG. 13A) in Background
`(Unless Already Running)
`
`Yes
`
`Set Check Bit in
`Backpointer Stream if
`Volume Check Running
`
`632
`
`Return Handle to ____ __,,
`634
`Link File to User
`
`To FIG. 6A
`
`Petitioners Microsoft Corporation and HP Inc. - Ex. 1043, p. 9
`
`
`
`U.S. Patent
`
`Nov. 5, 2002
`
`Sheet 9 of 16
`
`US 6,477,544 Bl
`
`User Level
`
`Kernel Level
`
`SIS Write
`Request
`
`Optional Filter
`Driver(s)
`
`96
`
`0
`
`SIS Filter Driver
`
`62'
`
`0
`
`Optional Filter
`Driver(s)
`
`98
`
`0
`
`NTFS
`
`100
`
`FT Disk
`
`FIG. 7
`
`Petitioners Microsoft Corporation and HP Inc. - Ex. 1043, p. 10
`
`
`
`U.S. Patent
`
`Nov. 5, 2002
`
`Sheet 10 of 16
`
`US 6,477,544 Bl
`
`FIG. 8
`
`begin
`SIS Write
`
`Receive (Link File)
`Write Request, Pass
`to NTFS
`
`Receive Status
`from NTFS
`
`804
`
`800
`
`802
`
`806
`
`Return Error
`to User
`
`808
`
`810
`
`Mark Written
`Region as Dirty in
`Context Map
`
`Return Success
`to User
`
`end
`
`Petitioners Microsoft Corporation and HP Inc. - Ex. 1043, p. 11
`
`
`
`U.S. Patent
`
`Nov. 5, 2002
`
`Sheet 11 of 16
`
`US 6,477,544 Bl
`
`User Level
`
`Kernel Level
`
`SIS Read
`Request
`
`0
`
`Optional Filter
`Driver(s)
`
`96
`
`0
`
`62'
`
`SIS Filter Driver
`
`Optional Filter
`Driver(s)
`
`98
`
`100
`
`NTFS
`
`FT Disk
`
`FIG. 9
`
`Petitioners Microsoft Corporation and HP Inc. - Ex. 1043, p. 12
`
`
`
`U.S. Patent
`
`Nov. 5, 2002
`
`Sheet 12 of 16
`
`US 6,477,544 Bl
`
`begin
`SIS Read
`
`Access Map in
`Context
`
`FIG. 10
`
`1000
`
`1004
`
`Convert Read
`Request to Read
`from Common Store
`File, Pass to NTFS,
`Receive Data (or
`Error) from NTFS
`
`1012
`
`1008
`
`1010
`
`Pass Dirty
`Portion of Read
`Request to NTFS
`
`Pass Read
`Request to NTFS,
`Receive Data ( or
`Error) from NTFS
`
`Convert Clean Portion of
`Read Request to Read
`from Common Store File,
`Pass to NTFS
`
`1014
`
`Receive Data (or Error)
`from NTFS, Merge Dirty
`and Clean Portions
`
`1016
`
`Return Data (or
`Error) to User
`
`end
`
`Petitioners Microsoft Corporation and HP Inc. - Ex. 1043, p. 13
`
`
`
`U.S. Patent
`
`Nov. 5, 2002
`
`Sheet 13 of 16
`
`US 6,477,544 Bl
`
`begin
`SIS Close
`
`1100
`
`FIG. 11
`
`Close Link File Handle
`
`No
`
`1110
`
`Copy Clean Portion(s)
`from Common Store
`File to Link File
`
`1112
`
`No 1116
`
`(Leave Dirty
`Portion
`Allocated)
`
`1114
`
`Convert
`Link File to
`Regular File
`
`To I From FIG. 12
`(Delete Backpointer /
`Common Store File)
`
`1108
`
`Yes
`
`Close Common
`Store File
`
`end
`
`Petitioners Microsoft Corporation and HP Inc. - Ex. 1043, p. 14
`
`
`
`U.S. Patent
`
`Nov. 5, 2002
`
`Sheet 14 of 16
`
`US 6,477,544 Bl
`
`begin Delete Backpointer /
`Common Store File
`
`FIG. 12
`
`Delete Backpointer
`Corresponding to Link File
`
`1200
`
`1202
`
`Yes
`
`Yes
`
`1206
`
`No
`
`.___ ___ ~
`
`Yes
`
`Delete Common
`Store File
`
`1208
`
`end
`
`Petitioners Microsoft Corporation and HP Inc. - Ex. 1043, p. 15
`
`
`
`U.S. Patent
`
`Nov. 5, 2002
`
`Sheet 15 of 16
`
`US 6,477,544 Bl
`
`begin
`Volume Check
`
`FIG. 13A
`
`1300
`
`1302
`
`Open* Next (first)
`Common store File
`
`Reset Check Bit in Each
`Backpointer; Close*
`Common Store File
`
`Yes
`
`1306
`
`No
`
`Open I Close Next
`(first) Link File
`(sets Check Bit)
`
`Yes
`
`To FIG. 138
`
`Petitioners Microsoft Corporation and HP Inc. - Ex. 1043, p. 16
`
`
`
`U.S. Patent
`
`Nov. 5, 2002
`
`Sheet 16 of 16
`
`US 6,477,544 Bl
`
`From FIG. 13A
`
`FIG. 13B
`
`1320
`
`Open* Next (first)
`Common Store File
`
`1322
`
`Select Next (first)
`Backpointer
`
`No
`
`To/ From FIG. 12
`(Delete Backpointer I
`Common Store File)
`
`1326
`
`1328
`
`No
`Close* Common
`Store File
`
`1330
`
`No
`
`end
`Volume Check
`
`Petitioners Microsoft Corporation and HP Inc. - Ex. 1043, p. 17
`
`
`
`US 6,477,544 Bl
`
`1
`SINGLE INSTANCE STORE FOR FILE
`SYSTEMS
`
`TECHNICAL FIELD
`
`The invention relates generally to computer systems and
`data storage, and more particularly to a more efficient way
`to store files of a file system.
`
`BACKGROUND OF THE INVENTION
`
`2
`tion of modified data. While this works well for virtual
`memory and database messaging systems, file systems have
`a number of complexities that are not addressed by these
`prior Copy-On-Write techniques. For example, unlike raw
`5 data, files are renamed, deleted, and may be opened and
`closed by multiple users at different times. Many files also
`have security issues that need to be addressed. Moreover,
`unlike mail messages, there are many different types of files
`in a file system, and not all files in the file system are good
`10 candidates for copy-on-write, such as frequently changed
`application data files. Detecting and carrying out delayed.
`copying for those types of files in a file system would be
`costly and wasteful. In short, there has heretofore not been
`any way in which to represent duplicate file system data as
`15 a single instance thereof, while maintaining a logical dis(cid:173)
`tinction between the user files corresponding to that single
`instance of data, so that the semantics of private files are
`preserved.
`
`The contents of a file of a file system may be identical to
`the contents stored in one or more other files. While some
`file duplication tends to occur on even an individual user's
`personal computer, duplication is particularly prevalent on
`networks set up with a server that centrally stores the
`contents of multiple personal computers. For example, with
`a remote boot facility on a computer network, each user
`boots from that user's private directory on a file server. Each
`private directory thus ordinarily includes a number of files
`that are identical to files on other users' directories. As can
`be readily appreciated, storing the private directories on
`traditional file systems consumes a great deal of disk and
`server file buffer cache space.
`Techniques that have been used to reduce the amount of 25
`used storage space include linked-file or shared memory
`techniques, essentially storing the data only once. However,
`when these techniques are used in a file system, the files are
`not treated as logically separate files by the linked-file or
`shared memory techniques. For example, if one user makes 30
`a change to a linked-file, or if the contents of the shared
`memory change, every other user linked to that file or using
`the shared memory sees the change. This is a significant
`drawback in a dynamic environment where files do change,
`even if not very frequently. For example, in many 35
`enterprises, different users need to maintain different ver(cid:173)
`sions of files at different times, including traditionally read(cid:173)
`only files such as applications. As a result, linked-file or
`shared memory techniques would work well for files that are
`strictly read-only, but these techniques fail to provide the 40
`flexibility needed in a dynamic environment.
`One general concept that has been employed in virtual
`memory and database messaging systems to reduce the
`amount of duplicated data stored is known as a "Copy-on(cid:173)
`Write" technique. With the Copy-on-Write technique, at the 45
`time of a requested copy, a link between a source and
`destination is established, but the copy is not made. Instead,
`the actual copying of the data is postponed and takes place
`only if and when either the source or destination is modified.
`For example, in virtual memory Copy-On-Write, processes 50
`send messages to one another with copy semantics, using the
`virtual memory system to map the same memory into the
`address space of both processes. If a process subsequently
`writes into the memory, a protection fault occurs, and the
`system makes a copy of the page in question and maps the 55
`newly copied page into the address space of the faulting
`process. Copy-on-write is useful when modification is
`expected to be a relatively rare occurrence, wherein the extra
`cost associated with detecting and carrying out the delayed
`copy operation is outweighed by the savings achieved by not 60
`having to make a copy most of the time. For example, in the
`database Copy-On-Write, only one copy of a mail message
`is maintained for a message sent to multiple users. A copy
`is made only if one of the recipients modifies the original
`mail message, a relatively rare occurrence.
`Unlike the linked-file or shared memory techniques, the
`Copy-On-Write concept thus preserves the logical separa-
`
`20
`
`65
`
`SUMMARY OF THE INVENTION
`
`Briefly, the present invention provides a system and
`method for storing the data of files having duplicate data by
`maintaining a single instance of the data, and providing
`logically separate links to the single instance of the data
`representing each file in place of the file. The present
`invention thus manages a single copy of identical files, while
`maintaining the semantics of having separate normal files.
`To accomplish the single instance store (SIS) of the
`present invention, a normal source file is converted to a link
`(file) to a common store file, such as by direct user request.
`A common store file is a file owned by SIS (rather than by
`a user) that is used to contain the data from the files
`represented by SIS links. Logically separate links to the
`same common store file may be created for files having
`duplicate content. The SIS facility may reside above the
`Windows® 2000/NT® file system (NTFS) as a filter driver,
`while the link file may be a sparse file including a reparse
`point identified by a SIS tag. SIS operates transparently, in
`that each file system request directed to the link file ( e.g.,
`open, write, read, close, delete, and so on) ultimately reaches
`the SIS filter, which then handles the request as if the link file
`was a normal file. For example, a file open request opens
`both the link file and the common store file (if not already
`open via another link). Writes to a SIS link file are written
`to the link file, and the SIS filter records the written portion
`of the file as dirty, thus preserving the logical separation of
`link files from one another. Read requests are intercepted by
`the SIS filter, which then reads clean portions from the
`common store file and any dirty portions from the link file.
`When the file is closed, the link file and the common store
`file are closed (unless the common store file is open via
`another link file). In the event that the link file has been
`written, then the sparse portions of the link file are filled in
`with clean data from the common store file, and the link file
`reconverted to a normal file.
`The common store file maintains a backpointer identify(cid:173)
`ing each link file that points to it. When a link file is
`reconverted to a normal file, (i.e., it was written to and then
`closed), the backpointer is removed from the common store
`file. Also, when a link file is deleted, the backpointer at its
`common store file is deleted. When no backpointers remain
`in a common store file, the common store file is deleted,
`since it is no longer needed to store the data for a link file.
`The present invention provides security via a special
`signature to prevent unauthorized access to the common
`store files. A volume check facility that repairs inconsisten-
`
`Petitioners Microsoft Corporation and HP Inc. - Ex. 1043, p. 18
`
`
`
`US 6,477,544 Bl
`
`3
`cies in SIS metadata is also provided. In the volume check
`facility, the backpointers of each common store file are
`checked against the links in the file system, and the back(cid:173)
`pointers or links repaired as necessary.
`Other advantages will become apparent from the follow(cid:173)
`ing detailed description when taken in conjunction with the
`drawings, in which:
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`FIG. 1 is a block diagram representing a computer system
`into which the present invention may be incorporated;
`FIGS. 2A-2B are block diagrams representing the copy(cid:173)
`ing of a source file to a single instance store (SIS) link file
`and SIS common store file over time in accordance with an
`aspect of the present invention;
`FIG. 3 is block diagram representing various components
`of a SIS link file and SIS common store file in accordance
`with an aspect of the present invention;
`FIG. 4 is a flow diagram generally representing the steps
`taken when copying a source file to a SIS link file and SIS
`common store file in accordance with an aspect of the
`present invention;
`FIG. 5 is a representation of a SIS link file open request
`passing through a preferred SIS and file system architecture
`in accordance with an aspect of the present invention;
`FIGS. 6A-6B comprise a flow diagram generally repre(cid:173)
`senting the steps taken by the SIS facility to handle the open
`request represented in FIG. 5;
`FIG. 7 is a representation of a SIS link file write request
`passing through a preferred SIS facility in accordance with
`an aspect of the present invention;
`FIG. 8 is a flow diagram generally representing the steps
`taken by the SIS facility to handle the write request repre(cid:173)
`sented in FIG. 7;
`FIG. 9 is a representation of a SIS link file read request
`passing through a preferred SIS facility in accordance with
`an aspect of the present invention;
`FIG. 10 is a flow diagram generally representing the steps
`taken by the SIS facility to handle the read request repre(cid:173)
`sented in FIG. 9;
`FIG. 11 is a flow diagram generally representing the steps
`taken by the SIS facility to handle a SIS link file close
`request in accordance with an aspect of the present inven(cid:173)
`tion;
`FIG. 12 is a flow diagram generally representing the steps
`taken by the SIS facility to handle a SIS link file delete
`request in accordance with an aspect of the present inven(cid:173)
`tion; and
`FIGS. 13A-13B comprise a flow diagram generally rep(cid:173)
`resenting the steps taken by a SIS volume check facility for
`repairing inconsistencies in the metadata of SIS files in
`accordance with another aspect of the present invention.
`
`DETAILED DESCRIPTION OF THE
`INVENTION
`
`Exemplary Operating Environment
`
`FIG. 1 and the following discussion are intended to
`provide a brief general description of a suitable computing
`environment in which the invention may be implemented.
`Although not required, the invention will be described in the
`general context of computer-executable instructions, such as
`program modules, being executed by a personal computer.
`Generally, program modules include routines, programs,
`
`4
`objects, components, data structures and the like that per(cid:173)
`form particular tasks or implement particular abstract data
`types. Moreover, those skilled in the art will appreciate that
`the invention may be practiced with other computer system
`5 configurations, including hand-held devices, multi(cid:173)
`processor systems, microprocessor-based or programmable
`consumer electronics, network PCs, minicomputers, main(cid:173)
`frame computers and the like. The invention may also be
`practiced in distributed computing environments where
`10 tasks are performed by remote processing devices that are
`linked through a communications network. In a distributed
`computing environment, program modules may be located
`in both local and remote memory storage devices.
`With reference to FIG. 1, an exemplary system for imple-
`15 menting the invention includes a general purpose computing
`device in the form of a conventional personal computer 20
`or the like, including a processing unit 21, a system memory
`22, and a system bus 23 that couples various system com(cid:173)
`ponents including the system memory to the processing unit
`20 21. The system bus 23 may be any of several types of bus
`structures including a memory bus or memory controller, a
`peripheral bus, and a local bus using any of a variety of bus
`architectures. The system memory includes read-only
`memory (ROM) 24 and random access memory (RAM) 25.
`25 A basic input/output system 26 (BI OS), containing the basic
`routines that help to transfer information between elements
`within the personal computer 20, such as during start-up, is
`stored in ROM 24. The personal computer 20 may further
`include a hard disk drive 27 for reading from and writing to
`30 a hard disk, not shown, a magnetic disk drive 28 for reading
`from or writing to a removable magnetic disk 29, and an
`optical disk drive 30 for reading from or writing to a
`removable optical disk 31 such as a CD-ROM, DVD-ROM
`or other optical media. The hard disk drive 27, magnetic disk
`35 drive 28, and optical disk drive 30 are connected to the
`system bus 23 by a hard disk drive interface 32, a magnetic
`disk drive interface 33, and an optical drive interface 34,
`respectively. The drives and their associated computer(cid:173)
`readable media provide non-volatile storage of computer
`40 readable instructions, data structures, program modules and
`other data for the personal computer 20. Although the
`exemplary environment described herein employs a hard
`disk, a removable magnetic disk 29 and a removable optical
`disk 31, it should be appreciated by those skilled in the art
`45 that other types of computer readable media that can store
`data that is accessible by a computer, such as magnetic
`cassettes, flash memory cards, digital video disks, Bernoulli
`cartridges, random access memories (RAMs), read-only
`memories (ROMs) and the like may also be used in the
`50 exemplary operating environment.
`A number of program modules may be stored on the hard
`disk, magnetic disk 29, optical disk 31, ROM 24 or RAM 25,
`including an operating system 35 (preferably Windows®
`2000). The computer 20 includes a file system 36 associated
`55 with or included within the operating system 35, such as the
`Windows NT® File System (NTFS), one or more applica(cid:173)
`tion programs 37, other program modules 38 and program
`data 39. A user may enter commands and information into
`the personal computer 20 through input devices such as a
`60 keyboard 40 and pointing device 42. Other input devices
`(not shown) may include a microphone, joystick, game pad,
`satellite dish, scanner or the like. These and other input
`devices are often connected to the processing unit 21
`through a serial port interface 46 that is coupled to the
`65 system bus, but may be connected by other interfaces, such
`as a parallel port, game port or universal serial bus (USE).
`A monitor 47 or other type of display device is also
`
`Petitioners Microsoft Corporation and HP Inc. - Ex. 1043, p. 19
`
`
`
`US 6,477,544 Bl
`
`5
`
`5
`connected to the system bus 23 via an interface, such as a
`video adapter 48. In addition to the monitor 47, personal
`computers typically include other peripheral output devices
`(not shown), such as speakers and printers.
`The personal computer 20 may operate in a networked
`environment using logical connections to one or more
`remote computers 49. The remote computer (or computers)
`49 may be another personal computer, a server, a router, a
`network PC, a peer device or other common network node,
`and typically includes many or all of the elements described 10
`above relative to the personal computer 20, although only a
`memory storage device 50 has been illustrated in FIG. 1. The
`logical connections depicted in FIG. 1 include a local area
`network (LAN) 51 and a wide area network (WAN) 52. Such
`networking environments are commonplace in offices, 15
`enterprise-wide computer networks, Intranets and the Inter(cid:173)
`net.
`When used in a LAN networking environment, the per(cid:173)
`sonal computer 20 is connected to the local network 51
`through a network interface or adapter 53. When used in a 20
`WAN networking environment, the personal computer 20
`typically includes a modem 54 or other means for establish(cid:173)
`ing communications over the wide area network 52, such as
`the Internet. The modem 54, which may be internal or
`external, is connected to the system bus 23 via the serial port 25
`interface 46. In a networked environment, program modules
`depicted relative to the personal computer 20, or portions
`thereof, may be stored in the remote memory storage device.
`It will be appreciated that the network connections shown
`are exemplary and other means of establishing a communi- 30
`cations link between the computers may be used.
`The present invention is described herein with reference
`to Microsoft Corporation's Windows 2000 (formerly Win(cid:173)
`dows NT®) operating system, and in particular to the
`Windows NT® file system (NTFS). Notwithstanding, there
`is no intention to limit the present invention to Windows®
`2000, Windows NT® or NTFS, but on the contrary, the
`present invention is intended to operate with and provide
`benefits with any operating system, architecture and/or file
`system that needs to store duplicated data.
`
`6
`allows more than two files to be specified for merging into
`a single instance representation thereof. It also may occur
`that the user requests that a SIS file be made from a file that
`is not a SIS link file but already has a single instance
`representation thereof. In such an instance, similar to the
`destination file, the non-SIS link source file may be con(cid:173)
`verted (described below) by the SIS_COPYFILE control to
`a link to the existing single instance.
`As an alternative to the manual SIS copy file operation 60,
`a user level process that seeks identical files may run ( e.g.,
`as a background process) to automatically request merging
`identical files into a single instance store file. The preferred
`user level process, known as a "groveler" 74 (FIG. 2A), uses
`a file system control named SIS_MERGE_FILES as
`described in copending United States patent application
`entitled "Method and System for Automatically Merging
`Files Into a Single Instance Store," assigned to the assignee
`of the present invention, filed concurrently herewith, and
`hereby incorporated by reference herein in its entirety. In
`general, after locating identical files, (possibly only those
`exceeding some threshold size), the result of the automatic
`actions taken by the groveler 74 with respect to the SIS_
`MERGE_FILES control provide a similar result to the
`manual SIS_COPYFILE actions taken by the user, and thus
`for purposes of simplicity, the groveler actions are not
`separately described herein in detail.
`In accordance with one aspect of the present invention,
`FIG. 2B shows the result of the SIS COPYFILE control. In
`FIG. 2B, the source and destination files are SIS link files 70,
`72, while the single instance representation, including the
`file data 76, is maintained as a common store file 68 in a
`common store 78. Each SIS link file 70, 72 is a user file that
`is managed by the SIS facility 62, while the common store
`78 is preferably a file system directory that is not intended
`35 to be visible or accessible to users. The link files 70, 72 are
`preferably on the same file system volume 80, as is the
`common store directory 78. Note that the single instance
`representation need not actually be a file system file in a
`common store directory, but may be stored in some other
`40 data structure. Thus, as used herein, the terms common store
`file and/or single instance file are intended to mean any
`appropriate data structure that can hold at least part of a file's
`contents.
`For efficiency, the SIS facility 62 may be built into the file
`system. However, although not necessary to the present
`invention, primarily for flexibility and to reduce complexity
`it is preferable in the Windows 2000 environment to imple(cid:173)
`ment the SIS facility 62 as a filter driver 62' (FIG. 5). Indeed,
`the present invention was implemented without changing the
`Windows NT® file system (NTFS). Notwithstanding, it will
`be understood that the present invention is not limited to the
`NTFS filter driver model.
`In the NTFS environment, filter drivers are independent,
`loadable drivers through which file system 1/0 (input/
`output) request packets (IRPs) are passed. Each IRP corre(cid:173)
`sponds to a request to perform a specific file system
`operation, such as read, write, open, close or delete, along
`with information related to that request, e.g., identifying the
`file data to read. A filter driver may perform actions to an IRP
`as it passes the