RAIF: Redundant Array of Independent Filesystems
Nikolai Joukov1,2, Arun M. Krishnakumar1, Chaitanya Patti1, Abhishek Rai1,
Sunil Satnur1, Avishay Traeger1, and Erez Zadok1
1 Stony Brook University
2 IBM T.J. Watson Research Center
Abstract

Storage virtualization and data management are well-known problems for individual users as well as large organizations. Existing storage-virtualization systems either do not support a complete set of possible storage types, do not provide flexible data-placement policies, or do not support per-file conversion (e.g., encryption). This results in suboptimal utilization of resources, inconvenience, low reliability, and poor performance.

We have designed a stackable file system called Redundant Array of Independent Filesystems (RAIF). It combines the data survivability and performance benefits of traditional RAID with the flexibility of composition and ease of development of stackable file systems. RAIF can be mounted on top of directories and thus on top of any combination of network, distributed, disk-based, and memory-based file systems. Individual files can be replicated, striped, or stored with erasure-correction coding on any subset of the underlying file systems.

RAIF's performance is similar to RAID's. In configurations with parity, RAIF's write performance is better than that of driver-level and even entry-level hardware RAID systems, because RAIF has better control over data and parity caching.
1 Introduction

The importance of files varies greatly. If lost, some files can be easily regenerated, some can be regenerated only with difficulty, and some cannot be regenerated at all. The loss of some files, such as financial data, can be catastrophic. Therefore, files must be stored with varying levels of redundancy using data replication or erasure-correction codes. File storage is a complicated set of trade-offs between data protection, performance, data-management convenience, and special infrastructure requirements. For example, some files must be shared by many users, whereas some are only available locally on read-only media; some files must be encrypted; and some can be efficiently compressed. At the same time, every possible storage sub-component, such as a disk or a remote server, has a set of unique features. Therefore, an ideal storage-virtualization system must support most physical and logical storage devices, as well as various per-file data-placement policies and per-file special features such as compression and encryption.
Hardware-virtualization systems are limited to specific types of lower storage devices. Driver-level storage-virtualization systems support more device types, but the number of local and SAN device types is still limited. Device-level and driver-level storage-virtualization systems, such as hardware and software RAID [34], operate on raw data blocks without knowledge of meta-data. This means that they use the same storage policies for all files; at most, they use statistical information about data accesses in an attempt to increase performance [47]. It also means that the information on individual disks in an array has no meaning on its own, and little or no information can be extracted if too many disks fail. File-system-level storage-virtualization systems still support only limited types of storage; many support just one. This is because they have to explicitly support specific NAS and other protocols, as well as special features such as compression and encryption.

Stackable file systems are a useful and well-known technique for adding functionality to existing file systems [53, 54]. They allow incremental addition of features and can be dynamically loaded as external kernel modules. A stackable file system overlays another, lower file system; it intercepts file-system events and data bound from user processes to the lower file system, and in turn uses the lower file system's operations and data. Most stackable file systems in the past have assumed a simple one-to-one mapping: the stackable file system was layered on top of one lower directory on a single file system. A different class of stackable file systems that uses a one-to-many mapping (fan-out) was suggested earlier [16, 38] and was recently developed from the FiST [48] templates.

We have developed a fan-out RAID-like stackable file system called Redundant Array of Independent Filesystems (RAIF). RAIF is a storage-virtualization system that imposes virtually no restrictions on the underlying stores and allows versatile per-file and per-directory storage policies. RAIF derives its usefulness from three main features: flexibility of configuration, access to high-level information, and easier administration.

First, because RAIF is stackable, it can be mounted over directories and, thus, over any combination of lower file systems.
For example, it can be mounted over network file systems like NFS and Samba, AFS distributed file systems, and local file systems, all at the same time. RAIF leverages the well-maintained code and protocols of existing file systems. In fact, RAIF can even utilize future file systems and protocols. Stackable file systems can be mounted on top of each other. Existing stackable file systems offer features like encryption [50], data-integrity verification [27], virus checking [32], versioning [33], tracing [3], and compression [51]. These file systems can be mounted over RAIF as well as below it.

Second, because RAIF operates at the file-system level, it has access to high-level file-system meta-data that is not available to traditional block-level storage-virtualization systems. This meta-data allows the optimization of redundancy schemes, data placement, and read-ahead algorithms. For example, RAIF can stripe large multimedia files across different branches for performance, and concurrently use two parity pages for important financial data files that must survive even two failures. Dynamic RAIF-level migration allows one to optimize RAIF reliability, resource utilization, and performance.

Third, administration is easier because files are stored on unmodified lower-level file systems. Therefore, the size of these lower file systems can be changed, and they can be backed up easily using standard software. The data is easier to recover in the case of failure because it is stored in files with the same names as their upper-level counterparts. For example, if the files are replicated on several disks, then every such disk will contain an ordinary file system with the same files intact.

A common limitation of software RAIDs is that data recovery is slow and may require a restart from the very beginning if it is interrupted. RAIF performs storage recovery on a per-file basis and can even do it in the background, concurrently with normal activity. If irreparable inconsistencies are detected, RAIF can identify the specific files affected, which is more user-friendly than the lists of block numbers offered by block-level systems.

Like other data striping and replication systems, RAIF can improve performance for multi-process workloads. Even for I/O-intensive single-process workloads that require many synchronous operations, RAIF's performance is comparable with that of driver-level software RAID systems. For RAIF configurations similar to standard RAID4 and RAID5, RAIF can outperform driver-level and even some hardware implementations thanks to better control over file-system caches. Moreover, most other storage-virtualization systems choose a single data-placement policy, which is not optimal for all files. RAIF can optimize performance for all files because it can use better data-placement policies for individual files or groups of files.

The rest of the paper is organized as follows. Section 2 outlines the design of RAIF. Section 3 describes some interesting implementation details. Section 4 presents an evaluation of RAIF. Section 5 discusses related work. We conclude and outline future directions in Section 6.

2 Design

Existing storage-virtualization systems are limited in three ways: (1) they only virtualize a limited class of storage devices; (2) they support one or a small number of data-placement policies, such as replication on a subset of lower storage devices or striping with parity; and (3) they provide few (usually no) per-file data-management features such as compression, consistency validation, or encryption. Even so, existing storage-virtualization systems are complex, expensive, and hard to maintain. This is because traditional storage-virtualization systems must explicitly support different types of lower storage and features.

Stackable file systems are a technique for layering new functionality on top of existing file systems. As seen in Figure 1, a stackable file system's methods are called by the Virtual File System (VFS) as with other file systems, but in turn it uses another file system instead of performing operations on a backing store such as a disk or an NFS server [35, 53]. Before calling the lower-level file system's methods, a stackable file system can modify the operation or the data. Stackable file systems behave like normal file systems from the perspective of the VFS; from the perspective of the underlying file system, they behave like the VFS. Fan-out stackable file systems differ from linear stackable file systems in that they call multiple underlying file systems, or branches.

[Figure 1: Linear file system stacking. A user process's rename() call enters the kernel as vfs_rename() in the VFS, which calls wrapfs_rename() in the stackable file system Wrapfs, which in turn calls ext2_rename() in the lower Ext2 file system.]
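To make the fan-out structure concrete, the following user-level C sketch shows the delegation pattern a fan-out file system uses: a single operation arriving from above is forwarded to every branch below. The types and names here (struct branch_ops, raif_unlink) are our own illustration, not RAIF's actual kernel interface.

    #include <stdio.h>

    /* Hypothetical per-branch operations table; a real stackable file
     * system would store the lower file system's VFS operations here. */
    struct branch_ops {
        const char *name;
        int (*unlink)(const char *path);
    };

    /* Fan-out unlink: forward the operation to every lower branch and
     * report the first error, if any. */
    static int raif_unlink(struct branch_ops **branches, int nbranches,
                           const char *path)
    {
        int err = 0;
        for (int i = 0; i < nbranches; i++) {
            int rc = branches[i]->unlink(path);
            if (rc && !err)
                err = rc;
        }
        return err;
    }

    /* Stub lower operation so the sketch runs stand-alone. */
    static int lower_unlink(const char *path)
    {
        printf("lower unlink(%s)\n", path);
        return 0;
    }

    int main(void)
    {
        struct branch_ops ext3 = { "ext3", lower_unlink };
        struct branch_ops nfs  = { "nfs",  lower_unlink };
        struct branch_ops *branches[] = { &ext3, &nfs };
        return raif_unlink(branches, 2, "/raif/tmp/file");
    }

A linear stackable file system like Wrapfs is the degenerate case with a single branch; RAIF generalizes the same pattern to N branches and adds per-file policies for choosing which branches participate.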
Redundant Array of Independent Filesystems (RAIF) is a stackable fan-out file system. We designed it with flexibility in mind. RAIF can be mounted over any set of file systems. This means that it does not have to support all possible storage types and protocols explicitly: it can leverage the code of dozens of existing file systems and NAS protocols. RAIF will even be able to reuse future file systems and protocols without modification. If necessary, RAIF can still be mounted over standard RAID arrays or any other storage-virtualization system.
In addition, existing and future stackable file systems can be mounted above RAIF or over any lower branches. This allows one to add important extra features such as compression or encryption in a selective way. Figure 2 shows an example RAIF file system mounted over several different types of lower storage. In this figure, the letters below RAIF denote the branch labels as follows:

C  A read-only CD-ROM file system.
E  A set of Ext3 file systems mounted over local disks.
V  A versioning stackable file system mounted over an NFS client.
N  An encryption stackable file system mounted over an NFS client. All data stored on this remote branch is automatically encrypted.
G  A TmpFS Linux file system that stores data in virtual memory or in the swap area. Because virtual memory may be scarce, the branch data is compressed with a stackable compression file system.

[Figure 2: A possible combination of RAIF fan-out stacking and other file systems stacked linearly. Letters below RAIF denote the branch labels: RAIF sits below the VFS (with AVFS stacked above it) and fans out to C (iso9660 on a CD-ROM), E (Ext3 on local disks), V (VersionFS over an NFS client), N (NCryptFS over an NFS client), and G (GzipFS over TmpFS in memory/swap).]

2.1 RAIF Data Placement

RAIF is designed at a high level of abstraction and leverages well-maintained, external code. Therefore, we were able to concentrate on storage-virtualization issues rather than support for different types of lower storage devices, placement of individual blocks on these devices, and NAS or SAN protocols. RAIF duplicates the directory structure on all of the lower branches. The data files are stored using different RAIF modes that we call levels, analogous to standard RAID levels:

• RAIF0: The file is striped over a set of lower file systems. The striping unit may be different for different files. As we describe below, this level also allows us to distribute entire files across branches.
• RAIF1: The file is replicated over a set of branches. Note that we define RAIF1 slightly differently from the original RAID level 1 [34]; the original RAID level 1 definition corresponds to RAIF01.
• RAIF4: The file is striped as in RAIF0, but a dedicated branch is used for parity. This level is useful for heterogeneous sets of branches, for example if the parity branch is much faster than the others and the workload has many writes, or if the parity branch is much slower and the workload is read-biased.
• RAIF5: This level is similar to RAIF4, but the parity is rotated through the branches.
• RAIF6: Extra parity branches are used to recover from two or more simultaneous failures.
• RAIF01: This level is a mirrored RAIF0 array.
• RAIF10: Similarly to RAIF01, a mirrored array is striped over a set of branches.

Small files may occupy only a portion of a single stripe. To distribute space utilization and accesses among branches, we start the stripes of different files on different branches. We use the term starting branch to denote the branch where the file's first data page is located. By default, the starting branch is derived from the file name using a hash function. Users may also specify starting branches explicitly.

Another use of the starting branch is to randomize the data placement of entire files. In particular, one may use RAIF0 with a large striping-unit size. In this case, every small file will be placed entirely on its starting branch, whereas large files will be striped. For RAIF levels 4, 5, and 6, small files will be stored on their starting branches and their parity will be stored on parity branches. Essentially, small files will be automatically replicated on two or three branches, whereas large files will be striped. It is also possible to specify a special striping-width value (-1) to force files to be stored on their starting branch regardless of their size. This is useful to replicate data on two random branches (level 5) or on one random and one or two fixed branches (levels 4 and 6).
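As a concrete illustration of this placement scheme, the C sketch below derives a starting branch from the file name and maps page indices to data and parity branches for a RAIF5-like layout. The specific hash function, the one-page striping unit, and the rotation formula are our assumptions for illustration; the paper states only that a hash of the name selects the starting branch.

    #include <stdio.h>

    /* Illustrative only: derive the starting branch from the file name
     * with a simple string hash (the paper does not specify which hash
     * RAIF actually uses). */
    static int starting_branch(const char *name, int nbranches)
    {
        unsigned int h = 5381;
        while (*name)
            h = h * 33u + (unsigned char)*name++;
        return (int)(h % (unsigned int)nbranches);
    }

    /* RAIF5-like layout with a striping unit of one page: map a page
     * index to its data branch, rotating the parity branch from stripe
     * to stripe so that the file's first data page falls on the
     * starting branch. */
    static void raif5_map(int start, int nbranches, long page,
                          int *data_branch, int *parity_branch)
    {
        long stripe = page / (nbranches - 1);  /* one parity page per stripe */
        int  slot   = (int)(page % (nbranches - 1));
        int  parity = (int)((start + nbranches - 1 + stripe) % nbranches);

        *parity_branch = parity;
        *data_branch   = (parity + 1 + slot) % nbranches;
    }

    int main(void)
    {
        int n = 4;
        int start = starting_branch("party.avi", n);

        for (long p = 0; p < 6; p++) {
            int d, q;
            raif5_map(start, n, p, &d, &q);
            printf("page %ld -> data on branch %d, parity on branch %d\n",
                   p, d, q);
        }
        return 0;
    }

With a sufficiently large striping unit, a small file never reaches its second stripe unit, so it lives entirely on its starting branch; at levels 4, 5, and 6 its parity page then effectively mirrors it, which is the automatic replication of small files described above.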
RAIF can assign levels and striping-unit sizes to files and directories individually, or by applying regular expressions to file names. For example, if users' home directories are to be stored on a RAIF mounted over a set of branches as shown in Figure 2, one may use the following global rules:

• Use RAIF1 to replicate *.c, *.h, *.doc, and other important files on all local drives (branches E0...EN)
and branch V. This way, important files are replicated locally, stored remotely, and also stored with versions at the same time.
• Use RAIF4 to stripe large multimedia files on all local drives and keep parity data on the remote NFS server (branch N). This allows us to save precious space on the local drives and still keep parity to recover files in case of a single disk failure. Note that multimedia files are usually not changed frequently; therefore, the remote server is seldom contacted for common read operations.
• Use branch G for intermediate files (e.g., *.o). This way, these files are available for repeated compilations but are purged if not used for a while.
• Use RAIF1 over branches C and G to provide the illusion of writing to a read-only file system. As we will describe in Section 2.3, RAIF provides a simplified unification functionality [48]. For example, if a CD-ROM contains source code, one may compile it directly from the CD-ROM; resulting files will be put onto the G branch, and users will see a merged set of files in a single directory.
• Use RAIF0 or RAIF5 on the E branches for all other files, as they are either regenerable or unimportant.

In addition, individual users may be allowed to define special rules on a per-file or per-directory basis.
2.2 RAIF Rules

The meta-information about every file includes the file's starting branch, RAIF level, striping unit, and the set of branches used to store the file. Since the file name is the only information available for a file LOOKUP operation, RAIF needs to maintain a mapping of file names to meta-information to perform LOOKUP operations.

In an early RAIF prototype [23], we used a hash function to find the branch with per-file meta-information. However, this approach is not suitable for a storage-virtualization system in which any subset of branches can be used to store a file: RAIF needs to know the file's storage configuration in order to determine the subset of branches used to store it. RAIF's meta-information storage is based on two assumptions: first, we assume that RAIF meta-information does not change frequently; second, we assume that the number of storage policies is much smaller than the number of files. Therefore, every storage policy is associated with a regular expression to match file names. A regular expression, a RAIF level, a striping-unit size, and a set of branches together define a RAIF rule. A lookup, then, consists of matching a file name with a rule and looking up the file in the returned set of branches. Rules are stored in special files on a subset of lower branches. These files are not visible from above RAIF and are read at directory lookup time. This approach allows RAIF to decrease both system-time and I/O overheads.
Directories with no rules contain no rule files, which is the common case. Also, storing rule files close to the files they relate to increases data locality on the lower branches and thus improves performance. RAIF supports the following three types of rules:

• Global rules are inherited by lower directories. They can be added or removed only manually.
• Local rules are not inherited by lower directories and have higher priority than global rules. Generally, these rules are necessary for exception handling; for example, they can be used to define special policies for an individual file. However, the most common use of local rules is to handle RENAME and rule-change operations, as we will describe in Section 2.2.2.
• Meta rules specify policies for storing the RAIF rule files themselves. This way, these special rule files can be replicated on all or only a subset of branches to balance reliability and performance. Meta rules are also useful if some branches are mounted read-only. Finally, they allow one to dedicate branches for storing RAIF rules. For example, such branches can be networked file systems to allow sharing of the rules between servers. Dedicated branches are thus similar to dedicated meta-data servers in cluster storage systems [1]. Regular expressions in meta rules are matched only against lower directory names.

Let us again consider the example RAIF configuration shown in Figure 2, and assume that we want to store a small directory tree as shown in Figure 3. As we discussed in Section 2.1, we may want to have a global rule for the topmost directory, raif, to stripe all multimedia files on the local hard disks and keep their parity on a remote branch (a RAIF4 configuration). However, one may want to add a special local rule to the dir1 directory to replicate birth.avi on all the local drives and a remote server, because that file contains especially precious data.
2.2.1 Rule Matching

Upon a file LOOKUP operation, RAIF matches the file name against the regular expressions in all relevant rule files. Rules in every directory are arranged sequentially according to their priority. Local rules have the highest priority and are tried first. Global rules in every directory have precedence over the global rules of the parent directory. This way, every rule has a unique priority value. Therefore, the rule-matching process proceeds by scanning the rules sequentially, starting from the highest-priority local rule in the current directory and finishing at the lowest-priority global rule from the RAIF mount-point directory. Because we assume that rules are not changed frequently, RAIF can use finite-state machines (e.g., built with the Aho-Corasick pattern-matching algorithm [2]) to avoid sequential scanning.
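The following C sketch shows the sequential variant of this matching, using POSIX regular expressions over a priority-ordered rule list (local rules first, then global rules from the current directory up to the mount point). The struct layout and the example rule table are our own illustration; a production implementation would compile each pattern once, or build the finite-state machine mentioned above.

    #include <regex.h>
    #include <stdio.h>

    /* One RAIF rule: a pattern plus the storage policy it selects.
     * The fields mirror the paper's description; the layout is our
     * own illustration. */
    struct raif_rule {
        const char *pattern;  /* regular expression over file names */
        int level;            /* RAIF level, e.g., 1, 4, 5 */
        int stripe_unit;      /* striping unit in bytes */
    };

    /* Scan rules in priority order and return the first match, as
     * Section 2.2.1 describes. Compiling the regex on every call is
     * a sketch-level simplification. */
    static const struct raif_rule *match_rule(const struct raif_rule *rules,
                                              int nrules, const char *name)
    {
        for (int i = 0; i < nrules; i++) {
            regex_t re;
            if (regcomp(&re, rules[i].pattern, REG_EXTENDED | REG_NOSUB))
                continue;
            int hit = (regexec(&re, name, 0, NULL, 0) == 0);
            regfree(&re);
            if (hit)
                return &rules[i];
        }
        return NULL;
    }

    int main(void)
    {
        /* Priority-ordered rule list echoing the example in Sec. 2.1. */
        struct raif_rule rules[] = {
            { "\\.(c|h|doc)$", 1, 65536 },  /* replicate important files */
            { "\\.(avi|mp4)$", 4, 65536 },  /* stripe multimedia w/ parity */
            { ".*",            0, 65536 },  /* default: stripe */
        };
        const struct raif_rule *r = match_rule(rules, 3, "birth.avi");
        if (r)
            printf("birth.avi -> RAIF%d\n", r->level);
        return 0;
    }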
Let us now consider the LOOKUP operation in more detail.
[Figure 3: A sample directory tree. The mount point /raif contains top.avi and two subdirectories: dir1, holding birth.avi and bday.avi, and dir2, holding a.txt and party.avi.]
First, RAIF matches the file name against all relevant rules using the finite automaton for the current directory. Out of the returned set of rules, RAIF selects the highest-priority rule and looks up the file on one of the branches according to that rule. After this LOOKUP operation, RAIF knows whether the lower file exists and whether it is a regular file or a directory. If the file exists, RAIF looks up the rest of the branches. In the case of a directory LOOKUP, RAIF reads and parses the directory's rules if they exist. If the file being looked up does not exist, RAIF returns from the LOOKUP operation; the rest of the branches are looked up later, upon calls to file or directory CREATE functions, because this is the earliest opportunity for RAIF to know the type of these files. This allows RAIF to minimize the number of LOOKUP calls sent to the lower file systems.

2.2.2 Rule Management

Rules for groups of files allow RAIF to operate with minimal overheads for common file-system operations, because only a minimal amount of extra meta-information must be maintained in memory and on the lower branches. This is possible because RAIF rules are kept intact most of the time. Let us consider the three situations in which rule changes are required.

Rename operations change the name of the file. Therefore, the rule that matches a file may be different before and after it is renamed. Note that this is not likely, because most renames do not change the logical meaning of the file being renamed. For example, it is unlikely that the source file main.c would be renamed to a video file main.avi. However, the location of the starting branch may change even in the common case, which is important if the file is striped. Therefore, if the old storage policy does not correspond to the new file name, RAIF has two options: (1) add a local rule for the new file (the regular-expression field is the actual new file name), or (2) store the file according to its new name. RAIF adds a local rule for large files and moves the data for small files; the file-size threshold is configurable. Upon an UNLINK operation on a file, the corresponding local rule is deleted if it exists.
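A minimal C sketch of this rename decision follows. The threshold value, the struct policy type, and the helper names are assumptions for illustration; the paper says only that the file-size threshold is configurable.

    #include <stdbool.h>
    #include <stdio.h>

    /* Hypothetical knob; the paper gives no default value. */
    static long migrate_threshold = 1L << 20;   /* 1 MB, an assumption */

    struct policy { int level; int stripe_unit; };

    static bool policy_equal(struct policy a, struct policy b)
    {
        return a.level == b.level && a.stripe_unit == b.stripe_unit;
    }

    /* Decide what a rename must do when the matching rule changes:
     * small files are re-stored under the new name's policy, large
     * files keep their old policy via a local rule naming them
     * exactly. */
    static void handle_rename(const char *newname, long size,
                              struct policy oldp, struct policy newp)
    {
        if (policy_equal(oldp, newp))
            return;                          /* common case: no change */
        if (size <= migrate_threshold)
            printf("migrate %s to RAIF%d\n", newname, newp.level);
        else
            printf("add local rule: '%s' -> RAIF%d\n", newname, oldp.level);
    }

    int main(void)
    {
        struct policy src = { 1, 65536 };    /* *.c files: replicated */
        struct policy avi = { 4, 65536 };    /* *.avi files: striped  */
        handle_rename("main.avi", 4L << 20, src, avi);
        return 0;
    }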
Moving a directory may require an update of the rules for that directory. For example, Figure 4 shows the directory tree of Figure 3 with directory dir2 moved into dir1.
[Figure 4: The same directory tree as Figure 3, with dir2 moved inside dir1.]
We can see that if dir1 has any rules, then these rules could conflict with the placement of the files in the old dir2. In that case, RAIF would add the necessary global rules to dir2. Note that the number of rules to add is usually small; in the worst case, the global rules from the directories above may require an addition to a single directory.

Manual addition, removal, and modification of rules by users is not a frequent operation. However, it may result in conflicts if existing files match the new rule and the new rule does not correspond to the old storage policy. Again, RAIF offers two options: (1) moving the files' data, or (2) adding local rules for the affected files. The first option may be useful for storage reconfiguration. Rule changes usually correspond to administrators' desire to modify the storage configuration. For example, an administrator may want to increase the storage capacity and add another local disk to the RAIF configuration shown in Figure 2. In that case, administrators will change the existing rule to stripe data across more branches, and may want to migrate the data of all existing files as well. The second option, adding local rules, corresponds to the standard technique of lazy storage migration: new files will be stored according to the new rules, whereas old files will be gradually migrated as they are deleted or completely overwritten.

Rename and directory-movement operations both require changes to rules or files in only one directory. Rule additions, removals, and modifications may require recursive operations and are therefore performed partially in user mode. In particular, a user-level utility traverses the directory tree bottom-up and gradually updates it by calling the kernel-level directory rule-update routines. Per-directory updates are not recursive, and RAIF performs them in the kernel to avoid races. As an example rule-update scenario, consider the directory tree shown in Figure 3, and assume that we want to add a global rule to directory raif with the regular expression a.txt. First, the user-level program for rule addition will issue an ioctl call to ask the kernel component to add the rule to directory dir1. Because that directory has no files that match the regular expression a.txt, RAIF will just add that global rule to dir1 and return. Second, the
user-level program will do the same for dir2, which has a file that matches the new regular expression. Depending on the file size or additional program flags, RAIF either adds a local rule for that file or moves its data. (Storage policies must be changed only in that case.) If RAIF moves the data, it first creates local rules for every matched file in the directory to ensure that the files' data is always consistent with the rules; only after doing this does RAIF remove these temporary local rules, add the new global rule to dir2, and return. Third, the user-level program calls RAIF to add the rule to directory raif. Finally, RAIF adds the global rule to raif and removes the temporary global rules from dir1 and dir2.
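The following user-level C sketch shows how such a bottom-up traversal could be written with nftw(), which visits children before their parents when given FTW_DEPTH. The ioctl request number RAIF_IOC_ADD_RULE and its string argument are hypothetical; the paper does not document RAIF's actual ioctl interface.

    #define _XOPEN_SOURCE 700
    #include <fcntl.h>
    #include <ftw.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* Hypothetical ioctl command; RAIF's real request number and
     * argument format are not given in the paper. */
    #define RAIF_IOC_ADD_RULE 0x52414946

    static const char *new_rule = "a\\.txt";

    /* Called for every tree entry; ask the kernel component to add
     * the rule to each directory, children before parents. */
    static int add_rule(const char *path, const struct stat *sb,
                        int type, struct FTW *ftw)
    {
        (void)sb; (void)ftw;
        if (type != FTW_DP)            /* post-order directories only */
            return 0;
        int fd = open(path, O_RDONLY | O_DIRECTORY);
        if (fd < 0)
            return 0;
        if (ioctl(fd, RAIF_IOC_ADD_RULE, new_rule) < 0)
            perror(path);
        close(fd);
        return 0;
    }

    int main(int argc, char **argv)
    {
        /* Walk the tree bottom-up, as the user-level utility in
         * Section 2.2.2 does. */
        return nftw(argc > 1 ? argv[1] : "/raif", add_rule, 16,
                    FTW_DEPTH | FTW_PHYS);
    }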
2.3 Merging and Unification Semantics

RAIF levels that stripe files can be used to merge several small file systems into a single, larger file system. Similar to Chunkfs [19], such merging decreases the total consistency-checking time. RAIF can merge any number of file systems without having to modify their source code or on-disk data layouts. RAIF can also store parity locally or remotely to increase data reliability.

RAIF's design ensures that files stored on several branches are visible as a unified name-space from above. This allows RAIF to join several file systems together, similar to Unionfs [48]. Thus, one can run programs that require read-write access to the directory structure from a read-only CD-ROM or DVD-ROM. For example, one may create rules to store rule files and all new files on the writable branches only. This way, one can compile programs from read-only media directly, or temporarily store useful files on RAM-based file systems.
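As an illustration of the unified name-space, the C sketch below produces a union-style directory listing across two branches, suppressing duplicate names. The branch paths are examples, and the linear duplicate check is a simplification; the paper does not describe RAIF's internal READDIR implementation at this level of detail.

    #include <dirent.h>
    #include <stdio.h>
    #include <string.h>

    #define MAX_NAMES 1024

    int main(void)
    {
        /* Example branch mount points (assumptions for illustration). */
        const char *branches[] = { "/mnt/cdrom", "/mnt/tmpfs" };
        char seen[MAX_NAMES][256];
        int nseen = 0;

        /* List each name once, even when it exists on several
         * branches, mimicking the merged view RAIF presents. */
        for (unsigned b = 0; b < sizeof(branches) / sizeof(branches[0]); b++) {
            DIR *d = opendir(branches[b]);
            if (!d)
                continue;
            struct dirent *e;
            while ((e = readdir(d)) != NULL) {
                int dup = 0;
                for (int i = 0; i < nseen && !dup; i++)
                    dup = (strcmp(seen[i], e->d_name) == 0);
                if (dup)
                    continue;
                if (nseen < MAX_NAMES)
                    snprintf(seen[nseen++], sizeof(seen[0]), "%s", e->d_name);
                printf("%s\n", e->d_name);
            }
            closedir(d);
        }
        return 0;
    }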
2.4 RAIF Administration

RAIF's rule-changing process is useful in many cases, such as when adding or removing RAIF branches, or when increasing or decreasing available storage space. However, the rule-changing process may be lengthy, so it is wise to estimate the storage-space changes beforehand. RAIF has user-level utilities for this purpose.

RAIF has a user-level Perl script that accepts high-level descriptions of file types and information about the relative importance of files. Given this information, the script can change rules and even migrate data to increase or decrease storage-space utilization. For example, an administrator could use the script to free N gigabytes of space when necessary [55]. This script can perform automatic storage reconfiguration if periodically started by a cron job. For example, RAIF can store all files with replication while space is plentiful, but gradually reduce the redundancy of data down to the lowest allowed level as space becomes scarce.

Also, user-mode tools can initiate delayed conversion of file storage policies. For example, RAIF can create local rules for files affected by renaming or rule-changing operations during periods of high system load. Later, the cron daemon can initiate conversion of these files during periods in which the system is underutilized.

This design keeps the kernel part of RAIF simple, operating on clear and deterministic rules, while more sophisticated and abstract policies can be defined and processed at the user level.

2.4.1 RAIF Backup and Security

RAIF stores files on the lower file systems as regular files. This provides three main benefits. First, it allows incremental backups on a per-branch basis; these backups can be restored on individual failed branches. Second, RAIF files have the same owners and attributes as their counterparts on the lower branches. Similarly, rule files have the same owners and access permissions as their parent directories. This means that users who are not allowed to write to a directory are not allowed to create rules in that directory, because they cannot create rule files there. Third, administrators can set up quotas for the lower file systems and control the usage of these individual branches. This could be used to discourage users from setting up rules that replicate all their files.

2.5 Performance

RAIF is a file system, and as such is located above the file-system caches. Therefore, it can efficiently use all available memory for data and meta-data caching. For example, both data and parity are automatically cached in the page cache [13]. This allows RAIF to outperform driver-level and even some hardware RAID controllers (with small caches) under a write-oriented workload.
