On Incremental File System Development

EREZ ZADOK, RAKESH IYER, NIKOLAI JOUKOV, GOPALAN SIVATHANU, AND CHARLES P. WRIGHT

Developing file systems from scratch is difficult and error prone. Layered, or stackable, file systems are a powerful technique to incrementally extend the functionality of existing file systems on commodity OSes at runtime. In this paper, we analyze the evolution of layering from historical models to what is found in four different present-day commodity OSes: Solaris, FreeBSD, Linux, and Microsoft Windows. We classify layered file systems into five types based on their functionality and identify the requirements that each class imposes on the OS. We then present five major design issues that we encountered during our experience of developing over twenty layered file systems on four OSes. We discuss how we have addressed each of these issues on current OSes, and present insights into useful OS and VFS features that would provide future developers more versatile solutions for incremental file system development.

Categories and Subject Descriptors: D.2.7 [Software Engineering]: Distribution, Maintenance, and Enhancement–Portability; D.2.13 [Software Engineering]: Reusable Software–Reuse models; D.4.3 [Operating Systems]: File Systems Management–File organization; D.4.7 [Operating Systems]: Organization and Design

General Terms: Design

Additional Key Words and Phrases: Layered File Systems, Stackable File Systems, VFS, Vnode, I/O Manager, IRP, Extensibility
1. INTRODUCTION
Data management is a fundamental facility provided by the operating system (OS). File systems are tasked with the bulk of data management, including storing data on disk (or over the network) and naming (i.e., translating a user-visible name such as /usr/src into an on-disk object). File systems are complex, and it is difficult to enhance them. Furthermore, OS vendors are reluctant to make major changes to a file system, because file system bugs have the potential to corrupt all data on a machine. Because file system development is so difficult, extending file system functionality in an incremental manner is valuable. Incremental development also makes it possible for a third-party software developer to release file system improvements without developing a whole file system from scratch.

Originally, file systems were thoroughly integrated into the OS, and system calls directly invoked file system methods. This architecture made it difficult to add multiple file systems. The introduction of a virtual node, or vnode, provided a layer of abstraction that separates the core of the OS from file systems [Kleiman 1986]. Each file is represented in memory by a vnode.
A vnode has an operations vector that defines several operations that the OS can call, thereby allowing the OS to add and remove types of file systems at runtime. Most current OSes use something similar to the vnode interface, and the number of file systems supported by the OS has grown accordingly. For example, Linux 2.6 supports over 30 file systems, and many more are maintained outside of the official kernel tree.
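To make the idea concrete, the following minimal sketch shows the heart of a vnode interface with an operations vector. The names (vnodeops, v_ops, VOP_READ, and so on) are illustrative, not taken from any particular kernel.

    #include <stddef.h>
    #include <sys/types.h>

    struct vnode;                    /* in-memory representation of a file */

    struct vnodeops {                /* the operations vector */
        int (*vop_open)(struct vnode *vp, int flags);
        int (*vop_read)(struct vnode *vp, void *buf, size_t len, off_t off);
        int (*vop_write)(struct vnode *vp, const void *buf, size_t len,
                         off_t off);
    };

    struct vnode {
        const struct vnodeops *v_ops;  /* filled in by the owning file system */
        void *v_data;                  /* file-system-private state */
    };

    /* The OS core always calls through the vector, so new file system
     * types can register their own vnodeops at runtime without any
     * change to the core OS. */
    static inline int VOP_READ(struct vnode *vp, void *buf, size_t len,
                               off_t off)
    {
        return vp->v_ops->vop_read(vp, buf, len, off);
    }

    static inline int VOP_WRITE(struct vnode *vp, const void *buf,
                                size_t len, off_t off)
    {
        return vp->v_ops->vop_write(vp, buf, len, off);
    }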
Clearly defining the interface between the OS and file systems makes interposition possible. A layered, or stackable, file system creates a vnode with its own operations vector to be interposed on another vnode. Each time one of the layered file system's operations is invoked, the layered file system maps its own vnode to a lower-level vnode, and then calls the lower-level vnode's operation. To add functionality, the layered file system can perform additional operations before or after the lower-level operation (e.g., encrypting data before a write or decrypting data after a read). The key advantage of layered file systems is that they can change the functionality of a commodity OS at runtime, so hard-to-develop lower-level file systems do not need to be changed. This is important because OS developers often resist change, especially to file systems, where bugs can cause data loss.
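Continuing the sketch above, a hypothetical encryption layer might wrap the write operation as follows. The layer_node structure and the toy XOR cipher are our own illustrations, not code from any real layer.

    #include <errno.h>
    #include <stdlib.h>

    /* Per-vnode state of a hypothetical encryption layer: each of its
     * vnodes remembers the lower-level vnode it interposes on. */
    struct layer_node {
        struct vnode *lower;   /* the interposed-upon (lower-level) vnode */
        unsigned char key;     /* toy key; a real layer would use a real cipher */
    };

    /* Interposed write: transform the data, then call the lower-level op. */
    static int enc_write(struct vnode *vp, const void *buf, size_t len,
                         off_t off)
    {
        struct layer_node *ln = vp->v_data;
        unsigned char *tmp = malloc(len);
        if (tmp == NULL)
            return ENOMEM;
        for (size_t i = 0; i < len; i++)       /* "encrypt" before the write */
            tmp[i] = ((const unsigned char *)buf)[i] ^ ln->key;
        int ret = VOP_WRITE(ln->lower, tmp, len, off);  /* forward downward */
        free(tmp);
        return ret;
    }

The corresponding read operation would be symmetric, decrypting after the lower-level call returns.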
Rosenthal was among the first to propose layering as a method of extending file systems [Rosenthal 1990; 1992]. To enable layering, Rosenthal radically changed the VFS internals of SunOS. Each public vnode field was converted into a method, and all knowledge of vnode types (e.g., directory vs. regular file) was removed from the core OS. Researchers at UCLA independently developed another layering infrastructure [Heidemann and Popek 1991; 1994] that placed an emphasis on light-weight layers and extensibility. The original pioneers of layering envisioned creating building blocks that could be composed together to create more sophisticated and rich file systems. For example, the directory-name lookup cache (DNLC) could simply be implemented as a file system layer, which returns results on a cache hit, but passes operations down on a miss [Skinner and Wong 1993].

Layering has not commonly been used to create and compose building-block file systems, but instead has been widely used to add functionality rapidly and portably to existing file systems. Many applications of layered file systems are features that could be implemented as part of the VFS (e.g., unification), but for practical reasons it is easier to develop them as layered file systems. Several OSes have been designed to support layered file systems, including Solaris, FreeBSD, and Windows. Several layered file systems are available for Linux, even though it was not originally designed to support them. Many users use layered file systems unknowingly as part of antivirus solutions [Symantec 2004; Miretskiy et al. 2004] and Windows XP's system-restore feature [Harder 2001]. On Unix, a null-layer file system is used to provide support for accessing one directory through multiple paths. When the layer additionally modifies the data, useful new functionality like encryption [Corner and Noble 2002; Halcrow 2004] or compression [Zadok et al. 2001] can be added. Another class of layered file systems, called fan out, operates directly on top of several lower-level file systems. For example, unification file systems merge the contents of several directories [Pendry and McKusick 1995; Wright et al. 2006]. Fan-out file systems can also be used for replication, load-balancing, failover, snapshotting, and caching.

The authors of this paper have over fifteen years of combined experience developing layered file systems on four OSes: Solaris, FreeBSD, Linux, and Windows. We have developed more than twenty layered file systems that provide encryption, compression, versioning, tracing, antivirus, unification, snapshotting, replication, checksumming, and more.
The rest of this paper is organized as follows. In Section 2 we survey alternative techniques to enhance file system functionality. In Section 3 we describe four models of layered file system development. We then proceed to describe five broad classes of layered file systems in Section 4. In Section 5 we describe five general problems and their solutions that are useful for all types of layered file systems. We conclude in Section 6 with guiding principles for future OS and layered file system developers.
2. RELATED WORK

In this section we describe alternatives for achieving the extensibility offered by layered file systems. We discuss four classes of related work based on the level at which extensibility is achieved: in hardware, in the device driver, at the system-call level, or in user-level programs. We discuss layered file system infrastructures in detail in Section 3.
Hardware level. Slice [Anderson et al. 2000] is a storage system architecture for high-speed networks with network-attached block storage. Slice interposes a piece of code called a switching filter in the network hardware to route packets among a group of servers. Slice appears to the upper level as a single block-oriented storage device. High-level control information (e.g., files) is unavailable to interposition code at the hardware level, and therefore such code cannot perform optimizations for specific devices.
Semantically-Smart Disk Systems (SDSs) [Sivathanu et al. 2003] attempt to provide file-system-like functionality without modifying the file system. Knowledge of a specific file system is embedded into the storage device, and the device provides additional functionality that would traditionally be implemented in the file system. Such systems are relatively easy to deploy, because they do not require modifications to existing file system code. Layered file systems share a similar goal in terms of reusing and leveraging existing infrastructures. Unlike a layered file system, an SDS is closely tied to the format of the file system running on top of it, so porting SDSs to new file systems is difficult.
Device-driver level. Software RAID and Logical Volume Managers (LVMs) introduce another layer of abstraction between the storage hardware and the file system. They provide additional features such as increased reliability and performance, while appearing to the file system as a simple disk, which makes them easy to deploy in existing infrastructure. For example, on Linux a Cryptoloop device uses a loopback block driver to encrypt data stored on a disk or in a file. A new file system is then created within the Cryptoloop device. Any file system can run on top of a block device-driver extension. However, block device extensions cannot exploit the control information (e.g., names) that is available at the file system level.
System-call level. SLIC [Ghormley et al. 1998] is a protected extensibility system for OSes that uses interposition techniques to enable the addition of a large class of untrusted extensions to existing code. Several OS extensions can be implemented using SLIC, such as encryption file systems and a protected environment for binary execution. The Interposition Agents toolkit [Jones 1993], developed by Microsoft Research, allows a user's code to be written in terms of high-level objects provided by this interface. The toolkit was designed to ease interposing code between the clients and the instances of the system interface, to facilitate adding new functionality like file reference tracing, customizable file system views, etc. to existing systems. Similarly, Mediating Connectors [Balzer and Goldman 1999] is a system call (and library call) wrapper mechanism for Windows NT that allows users to trap API calls.
System-call interposition techniques rely on the communication channel between user space and the kernel, and hence cannot handle operations that bypass that channel (e.g., mmap operations and their associated page faults). Also, interposing at the system-call level results in overhead for all system calls, even if only a subset of kernel components (e.g., the file system) needs to be interposed.
User level. Gray-box Information and Control Layers (ICL) [Arpaci-Dusseau and Arpaci-Dusseau 2001] extend the functionality of OSes by acquiring information about their internal state. ICLs provide OS-like functionality without modifying existing OS code. Dust [Burnett et al. 2002] is a direct application of ICLs that uses gray-box knowledge of the OS's buffer-cache management policy to optimize application performance. For example, if a Web server first services Web pages that are believed to be in the OS buffer cache, then both average response time and throughput can be improved.

Blaze's CFS is a cryptographic file system that is implemented as a user-level NFS server [Blaze 1993]. The OS's unmodified NFS client mounts the NFS server over the loopback network interface. The SFS toolkit [Mazières 2001] aims to simplify Unix file system extensibility by allowing development of file systems at the user level. Using the toolkit, one can implement a simple user-level NFS server and redirect local file system operations into the user-level implementation. The popularity of the SFS toolkit demonstrates that developers have observed the complexity of modifying existing time-tested file systems. SiRiUS [Goh et al. 2003], a file system for securing remote untrusted storage, and Dabek's CFS [Dabek et al. 2001], a wide-area cooperative file system, were built using the SFS toolkit.

Filesystem in Userspace [Szeredi 2005], or FUSE, is a hybrid approach that consists of two parts: (1) a standard kernel-level file system, which passes calls to a user-level daemon, and (2) a library to easily develop file-system-specific FUSE daemons. Developing new file systems with FUSE is relatively simple because the user-level daemon can issue normal system calls (e.g., read) to service a VFS call (e.g., vfs_read). The two main disadvantages of a FUSE file system are that (1) performance is limited by crossing the user-kernel boundary, and (2) the file system can only use FUSE's API, which closely matches the VFS API, whereas kernel file systems may access a richer kernel API (e.g., for process and memory management).
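As a concrete illustration, here is a minimal FUSE daemon written against libfuse's classic high-level API (FUSE 2.x style). It serves a single read-only file and omits readdir and most error handling that a real daemon would include; it is a sketch of the approach, not code from the paper.

    #define FUSE_USE_VERSION 26
    #include <fuse.h>
    #include <errno.h>
    #include <string.h>
    #include <sys/stat.h>

    static const char *msg = "hello from user space\n";

    /* The daemon answers VFS calls (here, stat and read) with ordinary
     * user-level code; the FUSE kernel module forwards them to us. */
    static int hello_getattr(const char *path, struct stat *st)
    {
        memset(st, 0, sizeof(*st));
        if (strcmp(path, "/") == 0) {
            st->st_mode = S_IFDIR | 0755;
            st->st_nlink = 2;
        } else if (strcmp(path, "/hello") == 0) {
            st->st_mode = S_IFREG | 0444;
            st->st_nlink = 1;
            st->st_size = (off_t)strlen(msg);
        } else {
            return -ENOENT;
        }
        return 0;
    }

    static int hello_read(const char *path, char *buf, size_t size,
                          off_t off, struct fuse_file_info *fi)
    {
        size_t len = strlen(msg);
        (void)fi;
        if (strcmp(path, "/hello") != 0)
            return -ENOENT;
        if ((size_t)off >= len)
            return 0;
        if (off + size > len)
            size = len - off;
        memcpy(buf, msg + off, size);
        return (int)size;
    }

    static struct fuse_operations hello_ops = {
        .getattr = hello_getattr,
        .read    = hello_read,
    };

    int main(int argc, char *argv[])
    {
        return fuse_main(argc, argv, &hello_ops, NULL);
    }

Built against libfuse 2 (e.g., with pkg-config fuse for flags), the daemon is started with a mount point argument, after which reads of /mnt/point/hello are serviced entirely in user space.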
Sun designed and developed Spring as an object-oriented microkernel OS. Spring's architecture allowed various components, including file systems, to be transparently extended with user libraries [Khalidi and Nelson 1993]. Spring's design was radically different from current commodity OSes. As it was a research prototype, it was not deployed in real systems. K42 [Appavoo et al. 2002] is a new OS under development at IBM which incorporates innovative mechanisms and modern programming technologies. Various system functionalities can be extended at the user level through libraries by providing a microkernel-like interface. The Exokernel architecture [Engler et al. 1995] implements minimal components in the kernel and allows user-level applications to customize OS functionality using library OSes.

In general, user-level extensions are easy to implement, but their performance is not as good as kernel extensions, because the former involve data copies between the user level and the kernel level, as well as additional context switches.
3. LAYERING MODELS

On Unix, the idea of layering evolved from the vnode interface [Kleiman 1986], initially implemented in SunOS in 1984. Most of today's layered file system models use a vnode interface. Microsoft Windows, on the other hand, uses a message-passing layering model. We describe three Unix models and the Windows model here; these cover the full range of popular or commodity models available. In Section 3.1 we describe Rosenthal's object-oriented model. In Section 3.2 we describe the model contemporaneously developed at UCLA. In Section 3.3 we describe the layering model found in three modern Unix systems (Linux, FreeBSD, and Solaris). In Section 3.4 we describe the Windows 2000 and Windows XP message-passing layering model. We present a summary table of the features offered by each layering model in Section 3.5.
3.1 Rosenthal's Layering Model

In 1990, Rosenthal developed an experimental prototype of a new VFS for SunOS [Rosenthal 1990] with the goal of layering vnodes, so that file systems can be composed of building blocks. Rosenthal identified two distinct types of layering. The first, shown in Figure 1(a), is interposition, in which a higher-level vnode is called before the lower-level vnode, and can modify the lower-level vnode's arguments, operation, and results. Today, this is commonly called linear layering or linear stacking. The second, shown in Figure 1(b), is composition, in which a higher-level vnode performs operations on several lower-level vnodes. Today, this is commonly called fan out.
[Figure 1 shows three diagrams: (a) Linear, with a user above Vnode A, which sits on Vnode B; (b) Fan out, with a user above Vnode A, which sits on Vnodes B and C; (c) Fan in, with a user above Vnode A on Vnode B, and a second path from the user directly to Vnode B.]

Fig. 1. Types of layering. In a linear layer, all operations are intercepted by the top-level vnode, A, and A passes them to a single vnode below. In fan-out, all operations go through the top-level vnode, A, but there are two or more lower-level vnodes. In fan-in, some operations go through the top-level vnode, A, before going to the lower-level vnode, B, but some operations can bypass A and directly access B.
Rosenthal introduced two key abstractions to support vnode layering: push and pop, for inserting and removing vnodes from a stack. All vnode operations go through the highest-level vnode. Rosenthal replaced all visible attributes in the vnode structure with methods so that layered file systems could intercept them. Push inserts a vnode at the top of a stack of vnodes, so that all operations destined for the stack of vnodes go to the new top vnode. The top vnode may in turn call the vnodes that are below it. Pop performs the opposite operation: the top vnode is removed from the stack and operations go directly to the lower-level vnode. Two new fields were added to the vnode structure: v_top and v_above.
The v_above pointer points to the vnode that is directly above this one in the stack. The v_top pointer points to the highest-level vnode in the stack. All vnode operations go through the highest-level vnode; hence every vnode in the stack essentially becomes an alias to the highest-level vnode. Unfortunately, this prevents fan-in access (shown in Figure 1(c)), in which a user process accesses the lower-level file system directly. Fan-in access is necessary when applications need to access unmodified data; for example, a backup program should write the encrypted data (stored on the lower-level file system) to tape, not the unencrypted data (accessible through the top layer). Rosenthal also suggested several layered file system building blocks, but did not implement any of them for his prototype.
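The following sketch shows how push and pop might maintain these fields. The v_above and v_top fields follow Rosenthal's description above; the v_below back link is an extra field we add only to make the sketch self-contained.

    /* A stackable vnode in the style described above (illustrative). */
    struct stk_vnode {
        const struct vnodeops *v_ops;
        struct stk_vnode *v_above;   /* vnode directly above (Rosenthal) */
        struct stk_vnode *v_top;     /* highest vnode in the stack (Rosenthal) */
        struct stk_vnode *v_below;   /* back link added for this sketch */
        void *v_data;
    };

    /* push: interpose 'top' above the current top 'old'; afterwards every
     * vnode in the stack aliases 'top', so all operations route through it. */
    static void vnode_push(struct stk_vnode *old, struct stk_vnode *top)
    {
        old->v_above = top;
        top->v_below = old;
        top->v_above = NULL;
        for (struct stk_vnode *p = top; p != NULL; p = p->v_below)
            p->v_top = top;          /* re-alias the whole stack */
    }

    /* pop: remove the top vnode; operations now go directly to the vnode
     * below it. */
    static struct stk_vnode *vnode_pop(struct stk_vnode *top)
    {
        struct stk_vnode *below = top->v_below;
        if (below == NULL)
            return NULL;             /* nothing left to expose */
        below->v_above = NULL;
        top->v_below = NULL;
        for (struct stk_vnode *p = below; p != NULL; p = p->v_below)
            p->v_top = below;
        return below;
    }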
Rosenthal proposed a set of requirements for a layered vnode interface [Rosenthal 1992]. To support interposition, he concluded that the OS must support replacing a vnode's operations vector and private data for each interposer. Moreover, all vnode operations must be mediated through the operations vector. To support composition, Rosenthal concluded that vnodes must support transactions. The vnode interface assumes that each operation is atomic [Kleiman 1986]. However, to compose two file systems together, two distinct lower-level vnode operations must be called; thus the overall operation is not atomic.
Later, a vnode interface for SunOS based on Rosenthal's interposition, and several example file systems, were developed [Skinner and Wong 1993]. Of particular note is that some existing VFS operations took multiple vnode arguments, making it impossible to determine which vnode's operations vector should handle the operation. For example, link would take two vnode arguments: the parent directory and the vnode to insert into that directory. To make each vnode control all of its operations, Skinner and Wong decomposed these larger vnode operations into smaller operations to manage vnode link counts, to update directory entries, and to create objects. Skinner and Wong's prototype dealt with issues that Rosenthal did not, including concurrency, locking, and per-mount operations. They also developed file system components in a layered manner. Skinner and Wong's implementation of mount and unmount is a particularly elegant example of push and pop. The OS's name-lookup code no longer needs special code for mount points: the root vnode of the file system to be mounted is simply pushed on top of the covered vnode. To unmount a file system, its root vnode is simply popped from the stack. The directory-name-lookup cache was also implemented as an interposition layer.
3.2 UCLA's Layering Model

In the early 1990s, Heidemann and Popek also developed an infrastructure for layered file system development at UCLA [Heidemann and Popek 1991; 1994]. The UCLA model emphasized general VFS extensibility and the ability to compose many small and efficient building-block layers into more sophisticated file system services. The UCLA model suggests that each separate service should be its own layer. For example, UFS should be divided into at least three different layers: (1) managing disk partitions, (2) a file storage layer providing arbitrary-length files referenced by a unique identifier (i.e., inode number), and (3) a hierarchical directory component. Breaking up a file system like UFS would allow interposition at several points. For example, an intrusion-detection layer could be inserted above the directory component so that it has access to names, but a compression layer could be inserted between the naming layer and the file storage layer to avoid the implementation details of hard links.

To provide extensibility, the UCLA model does not have a fixed set of operations.
Instead, each file system provides its own set of operations, and the total set of operations is the union of all file systems' operations. If a file system does not support a given operation, then a generic routine for that file system is called. This routine could, for example, simply return an "Operation not supported" error. However, for a layered file system, a generic bypass routine can be used. To provide operations with arbitrary arguments, the UCLA interface transforms all of an operation's arguments into a single structure of arguments. Along with the structure, meta-data is passed to the operation that identifies each argument's offset and type. Using this meta-data, the bypass routine can generically map its own vnodes to lower-level vnodes and invoke the lower-level operation.
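A generic bypass routine in this style might look like the following sketch. The op_desc meta-data structure and all names here are hypothetical, standing in for the UCLA interface's per-operation argument descriptions.

    /* Argument meta-data: which slots hold vnodes, which hold other data. */
    enum arg_type { ARG_VNODE, ARG_OTHER };

    struct op_desc {                   /* per-operation description */
        int (*lower_op)(void **args);  /* lower-level implementation */
        int nargs;                     /* number of argument slots */
        enum arg_type type[8];         /* type of each slot */
    };

    struct layer_vnode {
        struct layer_vnode *lower;     /* this layer's mapping downward */
        void *v_data;
    };

    /* One routine forwards any operation: map every vnode argument to its
     * lower-level counterpart, invoke the lower-level operation, and then
     * restore the arguments for the caller. */
    static int bypass(const struct op_desc *d, void **args)
    {
        void *saved[8];
        for (int i = 0; i < d->nargs; i++) {
            saved[i] = args[i];
            if (d->type[i] == ARG_VNODE)
                args[i] = ((struct layer_vnode *)args[i])->lower;
        }
        int ret = d->lower_op(args);   /* invoke the lower-level operation */
        for (int i = 0; i < d->nargs; i++)
            args[i] = saved[i];
        return ret;
    }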
The UCLA model uses mount and unmount to instantiate a layered file system. The existing mount operation fits well for two reasons: (1) it creates a new subtree of objects within the file system name space, and (2) creation of subtrees usually requires an existing file system object (e.g., to mount a UFS file system you need a device node). A file-system layer often accesses an existing directory in much the same way that UFS uses a device. For example, to mount an encryption file system, the path to the encrypted data is used.
Caching is essential to provide good performance, but it can pose problems if two or more layers cache an object. For example, if two layers concurrently cache the same page or vnode, and one modifies it, then the other would read stale data. Similarly, if both of them modify the page or vnode, then one of them would lose its update. Heidemann developed a prototype general-purpose cache manager that provided coherent access to all file system layers [Heidemann and Popek 1995]. The UCLA cache manager requires layers to request either shared or exclusive ownership of an object before caching it. When required, the cache manager revokes the existing layers' ownership using a callback. In most cases, the cache manager automatically determines when two objects represent the same data. However, if layers may change the semantics of the data, then the cache manager cannot correlate the upper-level and lower-level file's position. In these cases, the cache manager treats the upper-level and lower-level objects as two distinct objects, and all ownership requests for either object are sent to the layer that changes the semantics. The semantic-changing layer can then invoke the correct callbacks based on its knowledge of the relationship between the original and the transformed data.
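The ownership protocol might be sketched as follows. The types and the cache_acquire function are hypothetical, modeled on the description above rather than on Heidemann's actual code.

    /* A layer must acquire (shared or exclusive) ownership of a cached
     * object before caching it; conflicting owners are revoked via
     * callbacks, which prevents the stale reads and lost updates
     * described above. */
    enum own_mode { OWN_SHARED, OWN_EXCLUSIVE };

    struct cobj;                       /* a cached page or vnode */
    typedef void (*revoke_cb)(struct cobj *obj, void *layer);

    struct owner { void *layer; enum own_mode mode; revoke_cb revoke; };

    struct cobj {
        struct owner owners[8];
        int nowners;
    };

    /* Grant 'layer' ownership of 'obj'. Exclusive requests conflict with
     * every other owner; shared requests conflict only with an exclusive
     * owner. A re-acquire by the same layer simply updates its mode. */
    static int cache_acquire(struct cobj *obj, void *layer,
                             enum own_mode mode, revoke_cb cb)
    {
        int kept = 0;
        for (int i = 0; i < obj->nowners; i++) {
            struct owner w = obj->owners[i];
            int conflict = (w.layer != layer) &&
                           (mode == OWN_EXCLUSIVE ||
                            w.mode == OWN_EXCLUSIVE);
            if (conflict)
                w.revoke(obj, w.layer);   /* callback: drop the cached copy */
            else if (w.layer != layer)
                obj->owners[kept++] = w;  /* non-conflicting owner stays */
        }
        if (kept >= (int)(sizeof(obj->owners) / sizeof(obj->owners[0])))
            return -1;                    /* table full (sketch only) */
        obj->owners[kept++] = (struct owner){ layer, mode, cb };
        obj->nowners = kept;
        return 0;
    }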
The Ficus replicated file system and several course projects (e.g., encryption and compression layers) were developed using the UCLA model [Guy et al. 1990].
3.3 Layering in Modern Unix Systems

In this section we describe layering in three modern Unix-based OSes: Solaris, FreeBSD, and Linux. We discuss their pros and cons in terms of ease of development and performance. All three systems use the vnode interface to enable layering functionality, but they have key implementation differences that influence development ease and performance.

Solaris. The architecture of the Solaris VFS is nearly identical to the classic vnode architecture. Each vnode has a fixed operations vector. Each operation must be implemented by every file system (undefined operations are not permitted). Generic operations for returning "operation not supported" or, in some cases, success are available. Mutable attributes such as size, access time, etc. are not part of vnode fields; instead, the vnode operations include functions for managing such attributes.

The Solaris loopback file system, Lofs [SMCC 1992], is a null pass-through layer that layers only on directory vnodes. Thus, for regular files on Lofs, vnode objects are not duplicated.
Because Lofs does not layer on file vnodes, the data pages of files are not duplicated, resulting in more efficient memory utilization. However, this makes it harder to extend Lofs to provide functionality like encryption, where two different page contents need to be maintained at the lower level and the layered level.

CacheFS [SunSoft 1994] is a layered fan-out file system in Solaris which can mount on top of two lower-level directories: the source and the cache directory. When files in the source directory are accessed through CacheFS, they are copied into the root of the cache directory and indexed using a numeric identifier.
FreeBSD. The FreeBSD vnode interface has extensibility based on the UCLA model. FreeBSD allows dynamic addition of vnode operations. While activating a file system, the kernel registers the set of vnode operations that the file system supports, and builds an operations vector for each file system that contains the union of all operations supported by any file system. File systems provide default routines for operations that they do not support. FreeBSD uses packed arguments so that layers can generically bypass operations, as in the UCLA model.

FreeBSD's version of the loopback file system is called nullfs [Pendry and McKusick 1995]. It is a simple file system layer that makes no transformations on its arguments, just passing the requests it receives to the lower file system. FreeBSD's union mounts [Pendry and McKusick 1995] logically merge the directories that they are mounted on.

Compared to Solaris (and Linux), FreeBSD has the most versatile support for layering at the file system level. Its architecture allows new vnode operations to be created dynamically. Packing function arguments in an argument structure allows layers to easily bypass operations that they do not need to intercept.
Linux. Linux does not have native support for layered file systems. The Linux VFS was designed to make adding file systems relatively simple by separating generic file system code from the individual file system implementations. A file system may choose not to implement a method, and the VFS itself provides a sensible default. For example, a file system can use generic routines for most data-oriented operations (i.e., reading from and writing to files, including via mmap), and only needs to provide two primitive operations to read and write a block from the file. This trend of extracting more and more abstractions is on the rise in new versions of the Linux kernel. Layered file systems are usually not able to use generic methods, because they need to pass the operation to the lower-level file system; but generic operations by themselves do not make it more difficult to develop layered file systems. On the other hand, providing functionality directly within the VFS makes it more difficult to implement layered file systems. As shown in Figure 2, a layered file system must appear just like the VFS to the lower-level file system [Zadok and Bădulescu 1999], so it must replicate this VFS functionality. Worse, as the VFS changes, the layered file system must also change, or subtle bugs could be introduced.

On Linux, the vnode object is divided into two components: a dentry, which represents the name of the file, and an inode, which represents the actual on-disk file. This separation simplifies support for hard links. On Linux, VFS operation vectors are fixed: operations cannot be added, and the prototype of existing operations cannot be changed. The Linux VFS stores mutable values inside the inode object. This makes it more difficult to maintain cache coherency between the inodes of the layered file system and the lower-level inodes. It also causes performance degradation in layered file systems, as the modified values in the lower-level objects need to be copied to the upper-level objects after each file operation.
[Figure 2 shows a layered file system sitting between the VFS and a lower-level file system: toward the VFS it exhibits file-system behavior, and toward the lower-level file system it exhibits VFS-like behavior.]

Fig. 2. On Unix, layered file systems consist of two halves: the top half must behave like a file system, and the bottom half must behave like the VFS.
The existing VFS API requires modifications to avoid this data redundancy. In particular, the data fields should not be directly accessible from outside a file system. Instead, reads and writes of these private fields should be performed by calling file-system methods. Our evaluation, described in Appendix A, shows that the overheads of these method calls would be negligible.
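For example, a layered write on Linux must copy the lower inode's updated size and times back into its own inode. The sketch below is illustrative: vfs_write and the fsstack_copy_* helpers are real Linux kernel interfaces (the latter from <linux/fs_stack.h>, added to the kernel for layered file systems), but the wrap_write function and the use of private_data to find the lower file are our assumptions, not a complete file system.

    #include <linux/fs.h>
    #include <linux/fs_stack.h>

    /* Illustrative layered write (kernel code): forward the call to the
     * lower-level file, then copy the mutable inode values back up, as
     * described in the text. */
    static ssize_t wrap_write(struct file *file, const char __user *buf,
                              size_t count, loff_t *ppos)
    {
        struct file *lower_file = file->private_data; /* assumed set at open */
        ssize_t ret = vfs_write(lower_file, buf, count, ppos);

        if (ret >= 0) {
            struct inode *upper = file->f_path.dentry->d_inode;
            struct inode *lower = lower_file->f_path.dentry->d_inode;
            fsstack_copy_inode_size(upper, lower); /* i_size may have grown */
            fsstack_copy_attr_times(upper, lower); /* times changed below */
        }
        return ret;
    }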
FiST [Zadok 2001] is a high-level layered file system definition language that we developed. FiST includes Wrapfs [Zadok et al. 1999], a layered vnode interface implementation for Linux, FreeBSD, and Solaris. Wrapfs is a simple pass-through layer that intercepts name and data operations. The VFS objects of each entity (file or directory) are duplicated at the Wrapfs level. On Linux, many file systems have been derived from Wrapfs, including ones for encryption [Corner and Noble 2002; Halcrow 2004; Shanahan
