`
`Richard Allen Floyd
`
`
`Technical Report 272
`
`January 1989
`
`
`COMPUTER SCIENCE
`
`
`Oracle Exhibit 1004, page 1
`
`
`
`Oracle Exhibit 1004, page 2
`
`Oracle Exhibit 1004, page 2
`
`
`
`Transparency in Distributed File Systems
`
`by
`
`Richard Allen Floyd
`
`Submitted in Partial Fulfillment
`
`of the
`
`Requirements for the Degree
`
`
`Doctor of Philosophy
`
`Supervised by Carla Schlatter Ellis
`
`Department of Computer Science
`College of Arts and Sciences
`
`University of Rochester
`
`Rochester, New York.
`
`1989
`
`Oracle Exhibit 1004, page 3
`
`
`
`Oracle Exhibit 1004, page 4
`
`Oracle Exhibit 1004, page 4
`
`
`
`Curriculum Vitae
`
`Richard Allen Floyd was born in Rochester, Minnesota on St. Patrick's Day, March 17th, 1953, a
`
`fact that his Irish mother has never let him forget. In 1971 he entered Iowa State University. He
`
`worked his way through school at a number of jobs, the most memorable of which was as a pro
`
`grammer for the High Energy Physics group at Iowa State. It was here that he was first intro
`
`duced to computers and where he developed his distinctive late-night working style. He majored
`
`in Physics (with a minor in Electrical Engineering) and graduated, with distinction and several
`
`achievement awards, in 1977.
`
`He started his graduate career at the University of California, Berkeley in September of 1977, but
`
`quickly discovered that it was the computer aspects of his earlier physics work that had made it so
`
`appealing. He left in 1978 to take a position at the Clinton P. Anderson Meson Physics Facility in
`
`Los Alamos, New Mexico, where he developed data acquisition and analysis software for particle
`
`physics experiments. He also began his education in Computer Science at the University of New
`
`Mexico. In 1981 he entered the University of Rochester's Computer Science program, and in
`
`1982 was awarded a Masters in Computer Science. While at Rochester he was a teaching assis
`
`tant and a research assistant.
`
`ii
`
`Oracle Exhibit 1004, page 5
`
`
`
`Oracle Exhibit 1004, page 6
`
`Oracle Exhibit 1004, page 6
`
`
`
`Acknowledgements
`
`My advisor, Carla Ellis, has put up with me for an extraordinary amount of time. In addition to
`
`being my advisor, she has been a good friend and colleague. Her help has meant more than I can
`
`say. I would not have been able to write this dissertation without her advice, support, and friend
`
`ship. I would also like to thank the other past and present members of my committee, Tom
`
`LeBlanc, Chris Brown, Bezalel Gavish, Michael Scott, and Bruce Arden. Their direction and
`
`encouragement has been of great value.
`
`The efforts of the Computer Science Department's faculty, students, and staff have made the
`
`department's facilities first rate. I would like to acknowledge in particular Liud Bukys, who pro
`
`vided extensive help and advice on various arcane aspects of RIG, and on !PC matters. He also
`
`developed a file access protocol that provided a starting point for the Roe file access protocol.
`
`Mike Dean ported the initial Local File Server to UNIX, Josh Tenenberg ported it to the Alto, and
`
`Doug Ierardi implemented the first Roe Transaction coordinator. In addition, Jill Forster and Rose
`
`Peet provided assistance with administrative details and endless encouragement
`
`Many friends, both in and out of the department, have helped with their encouragement and dis- .
`
`tractions. It's impossible to mention them all. I'd like to give special thanks to Kok Heong, for
`
`her boundless enthusiasm and her proofreading, to Jaii, who supported my fondness for things
`
`New Mexican, to E, for too many things to list, and to Mark, Brux, Jennifer, Liralen, George,
`
`Judy, Lee, Joan, Mike, Beth, Jim, Stuart, Diane, Steve, and Lydia.
`
`iii
`
`Oracle Exhibit 1004, page 7
`
`
`
`This work has been supported in part by the National Science Foundation under grant number
`
`DCR-8320136, in part by the Office of Naval Research under grant number NOOO14-82-K-0193,
`
`and, thanks to the efforts and faith of Steve Vinter, in part by BBN Laboratories.
`
`iv
`
`Oracle Exhibit 1004, page 8
`
`
`
`Abstract
`
`The last few years have seen an explosion in the research and development of distributed file sys
`
`tems. Existing systems provide a limited degree of network transparency. with researchers gen
`
`erally arguing that full network transparency is unachievable. Attempts to understand and address
`
`these arguments have been limited by a lack of understanding of the range of possible solutions to
`
`transparency issues and a lack of knowledge of the ways in which file systems are used.
`
`We address these problems by: 1) designing and implementing a prototype of a highly transparent
`
`distributed file system; 2) collecting and analyzing data on file and directory reference patterns;
`
`and 3) using these data to analyze the effectiveness of our design.
`
`Our distributed file system. Roe. supports a substantially higher degree of transparency than earlier
`
`distributed file systems. and is able to do this in a heterogeneous environment. Roe appears to
`
`users to be a single. globally accessible file system providing highly available. consistent files. It
`
`provides a coherent framework for uniting techniques in the areas of naming, replication. con
`
`sistency control, file and directory placement, and file and directory migration in a way that pro
`
`vides full network transparency. This transparency allows Roe to provide increased availability,
`
`automatic reconfiguration, effective use of resources, a simplified file system model, and important
`
`performance benefits.
`
`Our data collection and analysis work provides detailed information on short-term file reference
`
`panerns in the UNIX environment In addition to examining the overall request behavior, we
`
`break references down by the type of file, owner of file, and type of user. We find significant
`
`v
`
`Oracle Exhibit 1004, page 9
`
`
`
`vi
`
`differences in reference patterns between the various classes that can be used as a basis for place
`
`ment and migration algorithms. Our study also provides, for the first time, information on direc
`
`tory reference patterns in a hierarchical file system. The results provide striking evidence of the
`
`importance of name resolution overhead in UNIX environments.
`
`Using our data collection analysis results, we examine the availability and performance of Roe.
`
`File open overhead proves to be an issue, but techniques exist for reducing its impact.
`
`Oracle Exhibit 1004, page 10
`
`
`
`Table of Contents
`
`1 Introduction
`1.1 The Problem
`1.2 A Range of Solutions
`1.2.1 Utilities for File Transfer
`1.2.2 Transparent Remote Access
`1.2.3 Transparent Distributed File Systems
`1.3 Issues in the Design of a Transparent Distributed File System
`1.4.The Thesis
`1.5 The Remainder of the Dissertation ..
`1.6 The Significance of This Work
`
`2 Previous Work
`2.1 Introduction
`2.2 Terms and Metrics
`2.2.1 Availability.................
`2.2.2 Consistency......................
`2.2.3 Performance
`2.2.4 Reconfigurability
`2.2.5 Resource Utilization
`2.2.6 Transparency
`2.3 Existing Distributed File Systems
`2.3.1 Helix
`2.3.2 NFS
`2.3.3 Sprite
`2.3.4 Andrew
`2.3.5 LOCUS
`2.3.6 IBIS
`2.3.7 MULTIFILE
`2.4 Naming
`2.4.1 R*
`2.4.2 Caching and Hints
`2.4.3 Clearinghouse
`
`vii
`
`1
`
`1
`
`2
`
`2
`
`2
`
`3
`
`4
`
`6
`
`.7
`
`8
`
`
`10
`
`10
`
`11
`
`11
`
`12
`
`12
`
`13
`
`13
`
`14
`
`14
`
`15
`
`16
`
`18
`
`19
`
`21
`
`23
`
`24
`
`24
`
`25
`
`26
`
`26
`
`
`Oracle Exhibit 1004, page 11
`
`
`
`2.4.4 Cronus
`2.4.5 Structure Free Name Distribution
`2.5 Consistency and Replication
`2.5.1 Internal Consistency............
`2.5.2 Mutual Consistency of Replicated Data
`2.5.3 Relaxing Consistency Requirements
`2.6 File System Reference Patterns
`2.6.1 Long-Term Reference Studies
`2.6.2 Short-Term File Reference Studies
`2.6.3 Directory Reference Studies
`2.7 File Assignment an" Migration
`2.7.1 File Assignment
`2.7.2 Migration Algorithms
`2.8 Other Relevant Work:
`2.9 Discussion
`
`,.
`
`3 The Architecture of Roe, A Transparent Distributed File System
`3.1 Introduction
`3.2 Environmental Assumptions
`3.3 Goals of the Roe Distributed File System
`3.4 The Roe Approach
`3.4.1 Overview
`3.4.2 File Replication and Consistency
`3.4.2.1 Motivation
`3.4.2.2 Internal Consistency
`3.4.2.3 Mutual Consistency
`3.4.3 The Global Directory
`3.4.3.1 The Structure of the Global Directory
`3.4.3.2 Replicating the Global Directory
`3.4.4 The Network Model
`3.4.5 Initial Placement
`3.4.6 Migration
`3.4.7 Support for Heterogeneity
`3.5 The Architecture of Roe
`3.5.1 Organization and Distribution
`3.5.2 User Protocol
`3.5.3 Internal Protocols
`3.6 Discussion
`3.6.1 Meeting the Goals of Roe
`3.6.2 Weaknesses of the Roe Approach
`3.6.3 Strengths of the Roe Approach
`3.7 Summary
`
`4 The Roe Implementation .;.............................................................................................
`4.1 Introduction
`
`viii
`
`27
`
`28
`
`28
`
`29
`
`30
`
`31
`
`32
`
`32
`
`33
`
`34
`
`35
`
`35
`
`36
`
`37
`
`39
`
`
`41
`
`41
`
`42
`
`43
`
`46
`
`46
`
`47
`
`47
`
`48
`
`48
`
`52
`
`53
`
`58
`
`61
`
`63
`
`66
`
`69
`
`71
`
`72
`
`76
`
`81
`
`87
`
`87
`
`88
`
`89
`
`90
`
`
`92
`
`92
`
`
`Oracle Exhibit 1004, page 12
`
`
`
`4.2 The Implementation Environment
`4.2.1 The Host Environments
`4.2.2 Interprocess Communication
`4.3 The Roe Implementation
`4.3.1 General Approach
`4.3.2 Roe Server Implementations
`4.3.2.1 Local File Server
`4.3.2.1.1 UNIX
`4.3.2.1.2 RIG
`4.3.2.1.3 AltolMesa
`4.3.2.2 Local Representative
`4.3.2.3 Transaction Coordinator :.......................................................................................
`4.3.2.4 Local Directory Server
`4.3.2.5 Global Directory Server
`4.3.2.5.1 Initialization
`4.3.2.5.2 The Network Model
`4.3.2.5.3 File and Directory Management
`4.3.2.5.4 Initial Placement and Migration
`4.3.3 What the Implementation Provides
`4.3.4 Implementation Weaknesses and Omissions
`4.4 Implementation Difficulties
`4.4.1 Multiplexed Servers
`4.4.2 !PC Problems
`4.4.3 A Solution
`4.4.4 Towards a Better !PC
`4.4.5 Other Difficulties
`4.5 Performance Considerations
`4.5.1 Host Performance
`4.5.2 The Performance of Roe
`4.5.3 Improvements
`4.6 Observations
`4.7 Summary
`
`5 Short-Term File Reference Patterns in a UNIX Environment
`5.1 Introduction
`5.2 Data Collection Environment
`5.3 Data Collection Method
`5.3.1 Static Snapshot
`5.3.2 Logging File System Activity
`5.4 Analysis Method
`5.4.1 Basic Approach
`:...................................................................................................
`5.4.2 Cuts
`5.4.3 Analysis Complications
`5.5 File Reference Patterns
`5.5.1 Overall Open and Read/Write Patterns
`
`ix
`
`93
`
`94
`
`95
`
`97
`
`97
`
`98
`
`99
`
`99
`
`102
`
`102
`
`103
`
`103
`
`104
`
`106
`
`107
`
`108
`
`109
`
`111
`
`112
`
`113
`
`116
`
`117
`
`118
`
`123
`
`125
`
`127
`
`129
`
`129
`
`131
`
`133
`
`135
`
`136
`
`
`138
`
`138
`
`140
`
`140
`
`140
`
`141
`
`145
`
`145
`
`146
`
`149
`
`150
`
`150
`
`
`Oracle Exhibit 1004, page 13
`
`
`
`5.5.1.1 Basic Statistics
`5.5.1.2 Per Open Results
`5.5.1.3 Per File Results
`5.5.2 Execute Patterns
`5.5.3 User File Patterns
`5.5.3.1 Basic Statistics for User Files
`5.5.3.2 Per Open Results for User Files
`5.5.3.3 Per File Results for User Files
`5.6 Summary
`
`6 Directory Reference Patterns in a UNIX Environment
`6.1 Introduction
`6.2 Data Collection Methodology
`6.3 Analysis Method
`6.3.1 Conventions
`6.3.2 Cuts
`6.4 Directory Reference Patterns
`6.4.1 Basic Statistics
`6.4.2 Per Reference Results
`6.4.3 Per Directory Results
`6.4.4 The High Cost of Opens
`6.5 Summary
`
`7 Implications for the Design of Distributed File Systems
`7.1 Introduction
`7.2 Availability
`7.2.1 Availability Model
`7.2.2 Basic Distributed File System Availability................................................................
`7.2.3 Adding ReadlWrite and Semantic Information
`7.3 Reconfigurability: Initial Placement and Migration
`7.4 Performance
`7.4.1 General Considerations
`7.4.2 Performance Issues in Roe
`7.4.3 Implications for Other Distributed File System Approaches
`7.5 Summary
`
`8 Summary and Future Work
`8.1 Summary
`8.2 Future Work
`8.2.1 Extension and Evaluation of Roe
`8.2.2 Reference Data Collection
`8.2.3 Initial Placement and Migration Algorithms
`
`Bibliography
`
`x
`
`151
`
`154
`
`166
`
`173
`
`180
`
`181
`
`182
`
`188
`
`191
`
`
`194
`
`194
`
`195
`
`196
`
`196
`
`197
`
`198
`
`199
`
`203
`
`208
`
`216
`
`223
`
`
`225
`
`225
`
`226
`
`226
`
`228
`
`230
`
`233
`
`236
`
`236
`
`238
`
`243
`
`245
`
`
`247
`
`247
`
`249
`
`249
`
`250
`
`251
`
`
`253
`
`
`Oracle Exhibit 1004, page 14
`
`
`
`List of Tables
`
`4-1 File access times for 4.2BSD UNIX
`4-2 Message and port management costs
`4-3 Time to send a simple message
`4-4 Time to open an existing local Roe file, by phase of open
`4-5 Time to open an existing local Roe file, by type of activity............................................
`4-6 Opening remote and replicated Roe files
`
`~
`
`5-1 Snapshot output
`5-2 Dynamic log structure
`5-3 Records logged
`5-4 Opens, by object type
`5-5 Class and owner of opened regular files
`5-6 Mode of open for open-close sessions
`5-7 Function of opened perm files
`5-8 Bytes read/written for regular files
`5-9 File size distributions
`5-10 File sizes, weighted by number of bytes read
`5-11 Percentage read (read-only opens)
`5-12 Percentage written (write-only opens)
`5-13 Percentage read (read/write opens)
`5-14 Percentage written (read/write opens)
`5-15 Percentage read, by size (read-only opens)
`5-16 Percentage written, by size (write-only opens)
`5-17 Number of opens/file
`5-18 Open distribution (as a function of opens/file)
`5-19 Frequently opened inodes
`5-20 Open time (seconds)
`5-21 File interopen intervals (seconds)
`5-22 File sharing, by file class and owner
`5-23 Readers, writers, users and inversions (no cuts)
`5-24 Basic active executable statistics
`5-25 Executable file sizes (bytes)
`5-26 Number of executes/active executable
`5-27 Execute distribution (as a function of executes/executable)
`
`xi
`
`129
`
`130
`
`130
`
`131
`
`132
`
`132
`
`
`141
`
`142
`
`151
`
`152
`
`153
`
`153
`
`154
`
`158
`
`159
`
`160
`
`163
`
`163
`
`164
`
`164
`
`166
`
`166
`
`167
`
`167
`
`169
`
`170
`
`170
`
`173
`
`174
`
`174
`
`175
`
`175
`
`176
`
`
`Oracle Exhibit 1004, page 15
`
`
`
`5-28 Frequently executed inodes
`5-29 Interexecute intervals (seconds)
`5-30 Process lifetimes (seconds)
`5-31 Executable sharing
`5-32 User opens to user files
`5-33 Modes of open for user open-close sessions to user files
`5-34 Function of opened user penn files
`5-35 Bytes read/written by users to user files
`5-36 User file size distributions
`5-37 Percentage of users files read (read-only opens)
`5-38 Percentage of user files written (write-only opens)
`5-39 Percentage of user files read (read/write opens)
`5-40 Percentage of user files written (read/write opens)
`5-41 Number of user opens/user file
`5-42 User file interopen intervals (seconds)
`5-43 User file sharing
`5-44 Readers, writers, users and inversions (user references to user files)
`
`6-1 Records logged
`6-2 Opens, by object type
`6-3 Path statistics
`6-4 Components/path
`6-5 Reference statistics
`6-6 References/directory .
`6-7 Directory size distributions (in entries)
`6-8 Number of references/directory
`6-9 Reference distribution (as a function of references/directory)
`6-10 Frequently referenced directories (no cut)
`6-11 Frequently referenced directories (owner_USER+ruid_USER cut)
`6-12 Directory interreference intervals (seconds)
`6-13 Reads/directory version
`6-14 Directory sharing
`6-15 Directory sharing (user cuts)
`6-16 Blocks accessed/path resolution (512 byte blocks)
`6-17 Directory overhead (512 byte blocks)
`6-18 Block counts for regular file opens, reads, and writes (512 byte blocks)
`6-19 Blocks accessed/path resolution (4K byte blocks)
`6-20 Directory overhead (4K byte blocks)
`6-21 Block counts for regular file opens, reads, and writes (4K byte blocks)
`
`7-1 Availability based on path length distributions
`7-2 Worst case availability using semantic and read/write information
`7-3 Miss ratio vs. cache size (in nodes)
`7-4 Miss ratio vs. cache size (in bytes)
`
`:.::~........
`
`xii
`
`178
`
`179
`
`179
`
`180
`
`181
`
`181
`
`182
`
`182
`
`183
`
`187
`
`187
`
`187
`
`187
`
`188
`
`189
`
`191
`
`191
`
`
`199
`
`200
`
`201
`
`202
`
`202
`
`203
`
`206
`
`208
`
`210
`
`211
`
`211
`
`213
`
`214
`
`215
`
`215
`
`217
`
`218
`
`219
`
`222
`
`222
`
`222
`
`
`229
`
`232
`
`239
`
`240
`
`
`Oracle Exhibit 1004, page 16
`
`
`
`List of Figures
`
`3-1 A Roe directory entry
`3-2 The network model
`3-3 A machine-dependent file entry
`3-4 Opening and reading a file
`3-5 Roe component location
`
`4-1 U of R Computer Science Department 3MB network (cira 1983)
`4-2 A port in CMU-!PC
`4-3 A multiplexed server
`4-4 Structure of a Roe file on UNIX
`4-5 Structure of a Roe directory
`4-6 Initial placement of file and directory copies
`
`5-1 Average number of regular file opens per second r2 hour resolution)
`5-2 Average number of regular file opens per second r 15 minute resolution)
`5-3 Bytes read from regular files r2 hour resolution)
`5-4 Bytes written to regular files r2 hour resolution)
`
`5-5 Bytes transferred to and from regular files (10 second resolution)
`5-6 Dynamic file size distributions (cumulative, measured at close)
`5-7 Dynamic file size distributions, weighted by bytes read (cumulative, at close)
`5-8 Percent of file read for read-only opens (cumulative)
`5-9 Percent of file written for write-only opens (cumulative)
`5-10 Percent of file read for read/write opens (cumulative)
`5-11 Percent of file written for read/write opens (cumulative)
`5-12 Percent of file read for read-only opens (cumulative, by size)
`5-13 Percent of file written for write-only opens (cumulative, by size)
`5-14 Number of opens per active file (cumulative)
`5-15 Fraction of opens per active file (cumulative)
`5-16 Times from file open to close (cumulative)
`5-17 File interopen intervals (cumulative)
`5-18 File lifetimes (cumulative)
`5-19 Version lifetimes (cumulative)
`5-20 Dynamic executable file size distributions (cumulative)
`
`xiii
`
`56
`
`63
`
`71
`
`75
`
`76
`
`
`93
`
`96
`
`100
`
`101
`
`105
`
`112
`
`
`155
`
`155
`
`156
`
`157
`
`157
`
`158
`
`160
`
`161
`
`162
`
`162
`
`163
`
`165
`
`165
`
`167
`
`168
`
`169
`
`170
`
`171
`
`172
`
`175
`
`
`Oracle Exhibit 1004, page 17
`
`
`
`5-21 Number of executes per active executable (cumulative)
`5-22 Fraction of executes per active executable (cumulative)
`5-23 File interexecute intervals (cumulative)
`5-24 Process lifetimes (cumulative)
`5-25 Average number of file opens per second C2 hour resolution, U cut)
`5-26 Dynamic file size distributions (cumulative, measured at close, U cut)
`5-27 Percent of file read for read-only opens (cumulative, U cut)
`5-28 Percent of file writ.en for write-only opens (cumulative, U cut)
`5-29 Percent of file read for read/write opens (cumulative, U cut)
`5-30 Percent of file written for read/write opens (cumulative, U cut)
`5-31 Number of opens JX:r active file (cumulative, U cut)
`5-32 File interopen intervals (cumulative, U cut)
`5-33 File lifetimes (cumulative, U cut)
`5-34 Version lifetimes (cumulative, U cut)
`6-1 Directory references per second r 2 hour resolution)
`6-2 Directory references per second r 2 hour resolution, user cuts)
`6-3 Size of referenced directories (cumulative, in entries)
`6-4 Size of referenced directories (cumulative, in entries, user cuts)
`6-5 Number of references per directory (cumulative)
`6-6 Fraction of references per directory (cumulative)
`6-7 Number of references per active directory (cumulative, user cuts)
`6-8 Directory interreference intervals (cumulative)
`6-9 Directory version lifetimes (cumulative)
`6-10 Reads per directory version (cumulative)
`6-11 Reads per directory version (cumulative, user cuts)
`6-12 Path resolution cost (cumulative, 512 byte blocks)
`6-13 Name resolution overhead for file opens (cumulative, 512 byte blocks)
`6-14 Path resolution cost (cumulative, 4K byte blocks)
`6-15 Name resolution overhead for file opens (cumulative, 4K byte blocks)
`6-16 Path resolution cost (cumulative, 4K byte blocks, user cuts)
`6-17 Name resolution overhead for opens (cumulative, 4K byte blocks, user cuts)
`
`7-1 Availability based on path length distributions
`7-2 Availability based on path length distributions, a > 0.9
`7-3 Worst case availability using semantic and read/write information
`7-4 Worst case availability using semantic and read/write information, a> 0.9
`7-5 'Whole directory cache effectiveness
`7-6 Whole directory cache effectiveness (user cuts)
`7-7 Whole directory cache effectiveness (byte size limit)
`
`xiv
`
`176
`
`177
`
`178
`
`180
`
`183
`
`184
`
`185
`
`185
`
`186
`
`186
`
`188
`
`189
`
`190
`
`190
`
`
`204
`
`205
`
`205
`
`207
`
`207
`
`209
`
`209
`
`212
`
`212
`
`213
`
`215
`
`217
`
`218
`
`220
`
`220
`
`221
`
`221
`
`
`228
`
`230
`
`231
`
`231
`
`239
`
`240
`
`241
`
`
`Oracle Exhibit 1004, page 18
`
`
`
`Chapter 1
`
`
`Introduction
`
`
`1.1. The Problem
`
`Recent years have seen a dramatic decrease in both the cost and size of computer systems. This
`
`has encouraged the proliferation of local area networks (LANs) of small machines. With this
`
`trend has come the problem of sharing information among users of these machines. The problem
`
`of file sharing, in particular, is one that all such networks must address.
`
`It is not enough to provide users with the means to specify and access files on remote machines.
`
`The complexity introduced by the distribution of computing resources and files will quickly
`
`overwhelm users unless we provide a mechanism that allows simple abstractions of the environ
`
`ment to be constructed. In addition, this mechanism should allow users to make effective use of
`
`the resources available, and to take advantage of the distributed nature of the environment. Our
`
`approach to achieving these goals, a highly network transparent distributed file system, is the sub
`
`ject of this dissertation.
`
`The remainder of this chapter provides an introduction to the problem of sharing files in a LAN, to
`
`transparent distributed file systems, and to this dissertation. Section 1.2 maps out the range of
`
`solutions that can be used to address the problem and argues that a highly transparent distributed
`
`1
`
`
`Oracle Exhibit 1004, page 19
`
`
`
`file system is, in general, the preferred solution. Section 1.3 presents the major issues that arise in
`
`designing such a file system. Sections 1.4 and 1.5 describe the approach used in our work and
`
`summarize the rest of the dissertation. Section 1.6 explains the significance of our results.
`
`2
`
`1.2. A Range of Solutions
`
`1.2.1. Utilities for File Transfer
`
`Perhaps the simplest solution for dealing with file sharing in a network is to provide utilities that
`
`allow users to explicitly copy files from machine to machine. Two examples of file transfer utili
`
`ties are the ttp command (used in the Internet environment [Postel 82]) and the rep command
`
`(used between BSD UNIXl systems [Quarterman 85]).
`
`There are a number of problems with this approach to file sharing. It requires that users know the
`
`hosts on which their files are located and that their applications run on these hosts. It makes it
`
`difficult to organize files, as they may be scattered across a number of hosts. Moving files (to
`
`avoid downtime, load balance, and so on) requires knowledge of the hosts and resources available
`
`on the network. In a heterogeneous environment, users see several different naming conventions
`
`and file transfer protocols. Given the ability to transfer a file, users frequently react by spreading
`
`copies throughout the network to increase the file's availability. This raises the difficult problem
`
`of keeping these copies consistent or, if they do become inconsistent, the problems of locating the
`
`most current one and reconciling conflicting changes.
`
`1.2.2. Transparent Remote Access
`
`A more sophisticated solution to the problem of file sharing in a network is to provide transparent
`
`access to files on remote machines. By transparent access we mean that files on remote machines
`
`are accessed using the same techniques used for local files. One approach for achieving this is to
`
`allow a host or other device name to be explicitly included in a file specification, and to have
`
`lUND( is a trademark of AT&T Bell Laboratories:
`
`Oracle Exhibit 1004, page 20
`
`
`
`3
`
`lower layers route the call to the appropriate host. VAXNMS and RIG [Ball 76] support this
`
`approach.
`
`An alternative commonly used in the UNIX environment is to provide a way to patch a remote
`
`naming tree into the local naming tree. This allows remote files to be accessed by users without
`
`necessarily being aware of the existence of these remote hosts. This approach is used by Sun's
`
`NFS [Lyon 85] and AT&T's RFS [Rifkin 86].
`
`The major advantage of transparent access is that it can, with a careful choice of naming conven
`
`tions, allow a user to run applications and access files independent of relative user and file loca
`
`tions. In the case of systems (such as NFS and RFS) that make the remote naming tree an exten
`
`sion of the local one, it allows users to take advantage of the file resources of a network without
`
`generally needing to know the details of their location.
`
`There are a number of serious drawbacks to this approach, though. Binding sections of the narn
`
`ing tree to a host means that the resources available to users of that tree are limited to those on
`
`that machine. Limitations in storage space and processor capacity crop up in unexpected places,
`
`and reconfiguration to correct problems is difficult. A more serious drawback is decreased avail
`
`ability due to machine crashes and other faults. The availability of an NFS-style system will most
`
`likely be less than that of a host running alone, as users will generally require access to multiple
`
`machines to perform their work. In addition, users lose the ability to replicate important files on a
`
`number of different machines. The ability to replicate critical resources to improve availability is
`
`a key advantage of networks over single site systems. A solution that takes advantage of the
`
`potential benefits of distribution is needed.
`
`1.2.3. Transparent Distributed File Systems
`
`A solution to the problems described in the last two sections is to provide users with a network
`
`transparent distributed file system. By network transparent we mean that the distributed file sys
`
`tem (DFS) allows users to create and access objects with no constraints due to the name, the loca
`
`tion of the user and object, and with no knowledge of the underlying systems. Our basic goal is
`
`Oracle Exhibit 1004, page 21
`
`
`
`4
`
`to preserve the abstraction of a single, shared file system across the network.
`
`A major benefit of this approach is that it frees users from the need to understand the details of
`
`the underlying network, and so allows them to develop a simpler model of their environment.
`
`Uncoupling name from location allows the underlying system to be adapted to changing demands
`
`by transparently adding or removing storage and to improve performance by adjusting the loca
`
`tions of files to match requests. Further, this uncoupling allows files to be replicated and distrib
`
`uted to enhance availability.
`
`1.3. Issues in the Design of a Transparent Distributed File System
`
`There are a number of issues that arise in the design of a distributed file system. For example,
`
`what is the model of the file system that will be presented to the user? If a decision is made to
`
`give the user the view that a single file system exists, how do we support this view? What actions
`
`are taken if failures or resource limitations make it difficult to maintain this view? Are there ways
`
`that we can minimize the occurrence of these problems? How do we support such a file system in
`
`a heterogeneous environment? What is the coupling between names and objects stored in the file
`
`system? How are objects located? What effect do architectural decisions have on availability and
`
`performance? If the DFS supports replication, do we insist on consistency? Are these decisions
`
`dependent on the characteristics of the underlying hosts and network?
`
`The last few years have seen an explosion in the research and development of network transparent
`
`DFSs that address many of the issues we have listed here. These efforts have been hampered by
`
`two factors:
`
`(1) Limited understanding of the range of possible solutions to problems in DFS design and
`
`of the interactions of these solutions.
`
`(2) A lack of understanding of the ways in which file systems are used.
`
`There exist a wide range of solutions to various individual problems that a network transparent
`
`DFS must solve. Examples include numerous algorithms for naming, consistency control, file
`
`Oracle Exhibit 1004, page 22
`
`
`
`5
`
`placement and file migration. For the most part, however, these solutions exist in isolation. There
`
`is little knowledge of how they would interact in a fully transparent DFS. Distributed file systems
`
`that do exist have abandoned one or more of the aspects of transparency in order to simplify
`
`design. For example, LOCUS [Walker 83b] takes the approach of gluing together the name
`
`spaces of existing hosts (with additional mechanisms for limited replication), and so provides
`
`access transparency, but only limited location transparency. The IBIS file system [Tichy 84], on
`
`the other hand, supports its own name space and so can provide location transparency, but does
`
`not guarantee that users at different locations will see consistent versions of files.
`
`Further, most existing systems have been designed for a limited environment. In particular, there
`
`has been little attempt by DFSs to accommodate heterogeneous environments. The rapid evolution
`
`of computer hardware and software, the proliferation of special-purpose machines, and the contin
`
`ued growth in the use of networks are all factors that encourage heterogeneity. DFSs that are
`
`designed for heterogeneous environments are faced with the problem of accommodating the
`
`differing capabilities and needs of the underlying systems.
`
`Because of the modest transparency goals and environmental limitations of current systems, there
`
`is no real understanding of the interactions of solutions that can be used in a transparent DFS, and
`
`of the range of applicability of these solutions. Nor is it clear to what extent the various aspects
`
`of transparency are realizable.
`
`Evaluating the effectiveness of a DFS design and implementation requires knowledge of the ways
`
`in which it will be used. There is, unfortunately, very little detailed information available on the
`
`usage patterns for a DFS, or even of single site file systems. In particular, most DFSs currently in
`
`existence use a hierarchical directory system modeled after that used in UNIX, but there are no
`
`data available on directory reference patterns.
`
`Oracle Exhibit 1004, page 23
`
`
`
`6
`
`1.4. The Thesis
`
`Our thesis is as follows:
`
`Full network transparency in distributed file systems offers significant benefits.
`
`These benefits include increased availability, more effective use of resources, the
`
`ability to adapt to changing demands, transparent reconfiguration to adjust to
`
`changes
`
`in
`
`resources, a greatly simplified
`
`file system model
`
`from
`
`the
`
`user/application point of view and, with careful design, enhanced performance over
`
`other DFS approaches.
`
`Distributed file system designers recognize that there are desirable aspects of network tran
`
`sparency, but have generally argued that heterogeneity, complexity, performance, availability, and
`
`autonomy issues make full network transparency unachievable. Some have used these issues to
`
`argue against any sort of network transparency. The lack of understanding of design choices and
`
`of usage patterns described in the previous section makes resolving these conflicting points of view
`
`difficult. In this dissertation we address this lack by:
`
`(1) Designing and implementing a prototype of a highly transparent DFS.
`
`(2) Collecting and analyzing data on file and directory references from a large UNIX sys
`
`tem.
`
`(3) Using these reference patterns to analyze the effectiveness of our DFS design.
`
`As part of the process of designing a highly transparent DFS we investigate solutions to the vari
`
`ous problems faced by a DFS, and examine the interactions of these solutions. This process
`
`extends our understanding of the range of possible DFS designs and of the tradeoffs between vari
`
`ous designs. The resulting file system design meets the transparency guidelines of our thesis. The
`
`prototype implementation validates the design and provides further insights.
`
`The collection of usage data from.a UNIX system gives us data on file reference patterns and, for
`
`the first time, data on directo