Case 1:20-cv-00034-ADA Document 52-8 Filed 04/27/20 Page 1 of 16

EXHIBIT B
Towards Better Understanding of Black-box Auto-Tuning:
A Comparative Analysis for Storage Systems

Zhen Cao1, Vasily Tarasov2, Sachin Tiwari1, and Erez Zadok1
1Stony Brook University and 2IBM Research—Almaden

Appears in the Proceedings of the 2018 Annual USENIX Technical Conference (ATC'18)
Abstract
Modern computer systems come with a large number of configurable parameters that control their behavior. Tuning system parameters can provide significant gains in performance but is challenging because of the immense number of configurations and complex, non-linear system behavior. In recent years, several studies attempted to automate the tuning of system configurations, but they all applied only one or a few optimization methods. In this paper, for the first time, we apply and then comparatively analyze multiple black-box optimization techniques on storage systems, which are often the slowest components of computing systems. Our experiments were conducted on a parameter space consisting of nearly 25,000 unique configurations and over 450,000 data points. We compared these methods for their ability to find near-optimal configurations, convergence time, and instantaneous system throughput during auto-tuning. We found that optimal configurations differed by hardware, software, and workloads, and that no one technique was superior to all others. Based on the results and domain expertise, we begin to explain the efficacy of these important automated black-box optimization methods from a systems perspective.
1 Introduction
Storage is a critical element of computer systems and key to data-intensive applications. Storage systems come with a vast number of configurable parameters that control the system's behavior. Ext4 alone has around 60 parameters, with a whopping 10^37 unique combinations of values. Default parameter settings provided by vendors are often suboptimal for a specific user deployment; previous research showed that tuning even a small subset of parameters can improve the power and performance efficiency of storage systems by as much as 9x [66].
Traditionally, system administrators pick parameter settings based on their expertise and experience. Due to the increased complexity of storage systems, however, manual tuning does not scale well [87]. Recently, several attempts were made to automate the tuning of computer systems in general and storage systems in particular [71, 78]. Black-box auto-tuning is an especially popular approach thanks to its obliviousness to a system's internals [86]. For example, Genetic Algorithms (GA) were applied to optimize the I/O performance of HDF5-based applications [5], and Bayesian Optimization (BO) was used to find a near-optimal configuration for Cloud VMs [3]. Other methods include Evolutionary Strategies [62], Smart Hill-Climbing [84], and Simulated Annealing [21]. The basic mechanism behind black-box auto-tuning is to iteratively try different configurations, measure an objective function's value, and, based on the previously learned information, select the next configurations to try. For storage systems, objective functions can be throughput, energy consumption, purchase cost, or even a formula combining different metrics [50, 71]. Despite some appealing results, there is no deep understanding of how exactly these methods work, of their efficacy and efficiency, or of which methods are more suitable for which problems. Moreover, previous works evaluated only one or a few algorithms at a time. In this paper, for the first time (to the best of our knowledge), we apply and analytically compare multiple black-box optimization techniques on storage systems.
To demonstrate and compare these algorithms' ability to find (near-)optimal configurations, we started by exhaustively evaluating several storage systems under four workloads on two servers with different hardware and storage devices; the largest system consisted of 6,222 unique configurations. Over a period of 2+ years, we executed 450,000+ experimental runs. We stored all data points in a relational database for query convenience, including hardware and workload details, throughput, energy consumption, running time, etc. In this paper we focused on optimizing for throughput, but our methodology and observations are applicable to other metrics as well. We will release our dataset publicly to facilitate more research into auto-tuning and better understanding of storage systems.
Next, we applied several popular techniques to the collected dataset to find optimal configurations under various hardware and workload settings: Simulated Annealing (SA), Genetic Algorithms (GA), Bayesian Optimization (BO), and Deep Q-Networks (DQN). We also tried Random Search (RS) in our experiments, which showed surprisingly good results in previous research [8]. We compared these techniques from various aspects, such as the ability to find near-optimal configurations, convergence time, and instantaneous system throughput during auto-tuning. For example, we found that several techniques were able to converge to good configurations given enough time, but their efficacy differed a lot. GA and BO outperformed SA and DQN on our parameter spaces, both in terms of convergence time and instantaneous throughput. We also showed that hyper-parameter settings of these optimization algorithms, such as the mutation rate in GA, can affect the tuning results. We further compared the techniques across three behavioral dimensions: (1) Exploration: how much the technique searches the space randomly. (2) Exploitation: how much the technique leverages the "neighborhood" of the current candidate or previous search history to find even better configurations. (3) History: how much data from previous evaluations is kept and utilized in the overall search process. We show that all techniques employ these three key concepts to varying degrees, and that the trade-off among them plays an important role in the effectiveness and efficiency of the algorithms. Based on our experimental results and domain expertise, we provide explanations of the efficacy of such black-box optimization methods from a storage perspective. We observed that certain parameters have a greater effect on system performance than others, and that the set of dominant parameters depends on the file systems and workloads. This allows us to provide more insights into the auto-tuning process.
Auto-tuning storage systems is fairly complex and challenging. We made several necessary assumptions and simplifications while collecting our exhaustive data, which we detail in §3; therefore, some of our observations might differ when applied to production systems. However, the main purpose of this paper is not to provide a complete solution; rather, we focus on comparing and understanding the efficacy of several popular optimization techniques when applied to storage systems. We believe this paves the way for practical auto-tuning of storage systems in real time.
The rest of the paper is organized as follows. §2 explains the challenges of auto-tuning storage systems and provides the necessary background. §3 describes our experimental methodology and environments. In §4 we apply multiple optimization methods and evaluate and explain them from various aspects. §5 covers the limitations of and future plans for our work. §6 lists related work. We conclude and discuss future directions in §7.
2 Background
Storage systems are often a critical component of computer systems, and are the foundation for many data-intensive applications. Usually they come with a large number of configurable options that can affect or even determine the system's performance [12, 74], energy consumption [66], and other aspects [47, 71]. Here we define a parameter as one configurable option, and a configuration as a combination of parameter values. For example, the parameter block size of Ext4 can take 3 values: 1K, 2K, and 4K. Based on this, [journal mode="data=writeback", block size=4K, inode size=4K] is one configuration with 3 specific parameters: journal mode, block size, and inode size. All possible configurations form a parameter space.
When configuring storage systems, users often stick with the default configurations provided by vendors because (1) it is nearly impossible to know the impact of every parameter across multiple layers; and (2) vendors' default configurations are trusted to be "good enough". However, previous studies [66] showed that tuning even a tiny subset of parameters can improve the performance and energy efficiency of storage systems by as much as 9x. As technological progress slows down, it becomes even more important to squeeze every bit of performance out of deployed storage systems.
In the rest of this section we first discuss the challenges of system tuning (§2.1). Then, §2.2 briefly introduces several promising techniques that we explore in this paper. §2.3 explains certain methods that we deem less promising. §2.4 provides a unified view of these optimization methods.
2.1 Challenges
The tuning task for storage systems is difficult due to the following four challenges.
(1) Large parameter space. Modern storage systems are fairly complex and easily come with hundreds or even thousands of tunable parameters. One evaluation of a storage system can take multiple minutes or even hours, which makes exhaustive search impractical. Even human experts cannot know the exact impact of every parameter and thus have little insight into how to optimize. For example, Ext4+NFS results in a parameter space consisting of more than 10^22 unique configurations. IBM's General Parallel File System (GPFS) [64] contains more than 100 tunable parameters, and hence over 10^40 configurations. From the hardware perspective, SSDs [30, 53, 57, 65], shingled drives [1, 2, 32, 45], and non-volatile memory [40, 83] are gaining popularity, and more layers (LVM, RAID) are being added.
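The combinatorial growth is easy to reproduce: the size of a parameter space is the product of the per-parameter value counts, so spaces explode multiplicatively as parameters are added. A minimal sketch (the parameter names and counts below are illustrative, not Ext4's actual ones):

```python
from math import prod

# Illustrative value counts for a handful of hypothetical parameters;
# the space size is the product of the per-parameter counts.
value_counts = {
    "block_size": 3,     # e.g., 1K, 2K, 4K
    "inode_size": 8,
    "journal_mode": 3,
    "io_scheduler": 3,
}

# Each added parameter multiplies the number of unique configurations.
space_size = prod(value_counts.values())  # 3 * 8 * 3 * 3 = 216
```

With around 60 parameters, even a few values each multiply out to the astronomical counts quoted above.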
(2) Non-linearity. A system is non-linear when the output is not directly proportional to the input. Many computer systems are non-linear [16], including storage systems [74]. For example, Figure 1 shows the average operation latency of GPFS under a typical database server workload while changing only the value of the parameter pagepool from 32MB to 128MB and setting all the others to their defaults. Clearly, the average latency is not directly proportional to the pagepool size. In fact, through our experiments we have seen many more parameters with similar behavior. Worse, the parameter space for storage systems is often sparse, irregular, and contains multiple peaks. This makes automatic optimization even more challenging, as it has to avoid getting stuck in local optima [36].

Figure 1: Storage systems are non-linear.

(3) Non-reusable results. Previous studies have shown that evaluation results of storage systems [12, 66] and databases [78] depend on the specific hardware and workloads. One good configuration might perform poorly when the environment changes. Our evaluation results in Section 4 show similar observations.
(4) Discrete and non-numeric parameters. Some storage system parameters can take continuous real values, while many others are discrete and take only a limited set of values. Some parameters are not numeric at all (e.g., I/O scheduler name or file system type). This adds difficulty in applying gradient-based approaches.
Given these challenges, manual tuning of storage systems becomes nearly impossible, and automatic tuning merely difficult. In this paper we focus on automatic tuning and treat it as an optimization problem.
2.2 Applied Methods
Several classes of algorithms have been proposed for similar optimization tasks, including automated tuning of hyper-parameters for machine learning systems [7, 8, 59] and optimization of physical systems [3, 78]. Examples include Genetic Algorithms (GA) [18, 34], Simulated Annealing (SA) [15, 41], Bayesian Optimization (BO) [11, 68], and Deep Q-Networks (DQN) [46, 54, 55]. Although these methods were originally proposed in different scholarly fields, they can all be characterized as black-box optimizations. In this section we introduce several of these techniques that we successfully applied to auto-tuning storage systems.
Simulated Annealing (SA) is inspired by the annealing process in metallurgy, which involves the heating and controlled cooling of a material to reach a state with minimum thermodynamic free energy. When applied to storage systems, a state corresponds to one configuration; neighbors of a state are new configurations obtained by altering only one parameter value of the current state; and the thermodynamic free energy is analogous to the optimization objective. SA works by maintaining the temperature of the system, which determines the probability of accepting a certain move. Instead of always moving towards better states as hill-climbing methods do, SA defines an acceptance probability distribution, which allows it to accept some bad moves in the short run that can lead to even-better moves later on. The system is initialized with a high temperature, and thus has a high probability of accepting worse states in the beginning. The temperature is gradually reduced based on a pre-defined cooling schedule, reducing the probability of accepting bad states over time.

Figure 2: Crossover in Genetic Algorithm (GA).
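A minimal sketch of SA's acceptance rule and a geometric cooling schedule may make this concrete. The parameter space, evaluation function, and hyper-parameter values below are illustrative placeholders, not the paper's actual setup:

```python
import math
import random

# Hypothetical two-parameter space: each parameter and its allowed values.
SPACE = {
    "block_size": ["1K", "2K", "4K"],
    "journal": ["data=journal", "data=ordered", "data=writeback"],
}

def neighbor(config):
    """A neighbor alters exactly one parameter value of the current state."""
    param = random.choice(list(SPACE))
    new = dict(config)
    new[param] = random.choice([v for v in SPACE[param] if v != config[param]])
    return new

def accept(delta, temperature):
    """Always accept improvements; accept worse moves with probability
    exp(delta / T), which shrinks as the temperature T cools."""
    if delta >= 0:                 # new state is no worse
        return True
    if temperature <= 0:
        return False
    return random.random() < math.exp(delta / temperature)

def anneal(evaluate, start, t0=1.0, alpha=0.9, steps=100):
    """Geometric cooling schedule: T is multiplied by `alpha` each step."""
    state, score = start, evaluate(start)
    best, best_score = state, score
    temperature = t0
    for _ in range(steps):
        cand = neighbor(state)
        cand_score = evaluate(cand)
        if accept(cand_score - score, temperature):
            state, score = cand, cand_score
            if score > best_score:
                best, best_score = state, score
        temperature *= alpha
    return best
```

In a real run, `evaluate` would be a slow storage benchmark rather than a cheap function, which is why the number of steps matters so much.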
Genetic Algorithms (GA) were inspired by the process of natural selection [34]. A GA maintains a population of chromosomes (configurations) and applies several genetic operators to them. Crossover takes two parent chromosomes and generates new ones. As Figure 2 illustrates, two parent Nilfs2 configurations are cut at the same crossover point, and the subparts after the crossover point are then exchanged between them to generate two new child configurations. Better chromosomes have a higher probability to "survive" in future selection phases. Mutation randomly picks a chromosome and mutates one or more parameter values, which produces a completely different chromosome.
Reinforcement Learning (RL) [72] is an area of machine learning inspired by behaviorist psychology. RL explores how software agents take actions in an environment to maximize the defined cumulative rewards. Most RL algorithms can be formulated as a model consisting of: (1) a set of environment states; (2) a set of agent actions; and (3) a set of scalar rewards. In the case of storage systems, states correspond to configurations, actions mean changing to a different configuration, and rewards are differences in evaluation results. The agent records its previous experience (history) and makes it available through a value function, which can be used to predict the expected reward of state-action pairs. The policy determines how the agent takes actions, maintaining the exploration-exploitation trade-off. The value function can take a tabular form, but this does not scale well to many dimensions. Function approximation was proposed to deal with high dimensionality, but is still known to be unstable or even divergent. With recent advances in Deep Learning [28], deep convolutional neural networks, termed Deep Q-Networks (DQN), were proposed to parameterize the value function, and have been successfully applied in solving various problems [54, 55]. Many variants of DQN have been proposed [46].

[Figure 2 data: crossover of two Nilfs2 configurations]
            FS      BG   Journal Option
  Parent 1  Nilfs2    8  order=strict
  Parent 2  Nilfs2  256  order=relaxed
  Child 1   Nilfs2    8  order=relaxed
  Child 2   Nilfs2  256  order=strict
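The crossover in Figure 2 amounts to cutting two parameter tuples at the same point and swapping their tails. A small sketch using the figure's parent configurations (the tuple ordering and the mutation helper are illustrative assumptions, not the paper's code):

```python
import random

# A chromosome is an ordered tuple of parameter values: (FS, BG, journal).
PARENT1 = ("Nilfs2", 8, "order=strict")
PARENT2 = ("Nilfs2", 256, "order=relaxed")

def crossover(p1, p2, point):
    """Single-point crossover: exchange the subparts after `point`."""
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def mutate(chromosome, space, rate=0.1):
    """With probability `rate` per gene, replace the gene with a random
    value from its allowed range, possibly yielding a very different child."""
    genes = list(chromosome)
    for i, values in enumerate(space):
        if random.random() < rate:
            genes[i] = random.choice(values)
    return tuple(genes)

# Cutting after the second gene reproduces Figure 2's children.
child1, child2 = crossover(PARENT1, PARENT2, point=2)
# child1 == ("Nilfs2", 8, "order=relaxed")
# child2 == ("Nilfs2", 256, "order=strict")
```

Selection (keeping higher-throughput chromosomes with higher probability) then closes the loop over generations.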
Bayesian Optimization (BO) [11, 68] is a popular framework for solving optimization problems. It models the objective function as a stochastic process, with the argument corresponding to one storage configuration. In the beginning, a set of prior points (configurations) is given to get a fair estimate of the entire parameter space. BO works by computing the confidence interval of the objective function according to previous evaluation results, defined as the range of values that the evaluation result is most likely to fall into (e.g., with 95% probability). The next configuration is selected based on a pre-defined acquisition function. Both the confidence intervals and the acquisition function are updated with each new evaluation. BO has been successfully applied in various areas, including hyper-parameter optimization [17] and system configuration optimization [3]. BO and its variants differ mainly in the form of their probabilistic models and acquisition functions. In this paper we focus mainly on Gaussian priors and an Expected Improvement acquisition function [68].
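Since Expected Improvement (EI) is the acquisition function used here, a minimal sketch of its closed form for a Gaussian posterior may help. It assumes the surrogate model supplies a posterior mean `mu` and standard deviation `sigma` for a candidate (passed in directly below rather than computed by a fitted Gaussian process), and the `xi` exploration margin is an illustrative default:

```python
import math

def expected_improvement(mu, sigma, best, xi=0.01):
    """EI for maximization: how much a candidate with posterior mean `mu`
    and standard deviation `sigma` is expected to beat the incumbent `best`.
    Closed form under a Gaussian posterior: (mu-best-xi)*CDF(z) + sigma*PDF(z)."""
    if sigma == 0.0:
        return max(mu - best - xi, 0.0)
    z = (mu - best - xi) / sigma
    pdf = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)   # standard normal PDF
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))          # standard normal CDF
    return (mu - best - xi) * cdf + sigma * pdf

# A candidate with high posterior variance can out-score one with a
# slightly higher mean but no uncertainty: the exploration side of BO.
risky = expected_improvement(mu=0.9, sigma=0.5, best=1.0)
safe = expected_improvement(mu=1.0, sigma=0.0, best=1.0)
```

This is exactly the mean-versus-variance balance summarized for BO in Table 1.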
Other promising techniques include Tabu Search [27], Particle Swarm Optimization [39], Ant Colony Optimization [20], Memetic Algorithms [52], etc. Due to space limits, we omit comparing all of them in this paper (part of our future work). In fact, as detailed in §2.4, most of these techniques share similar traits.
2.3 Other Methods
Although many optimization techniques have been proposed, we feel that not all of them are good choices for auto-tuning storage systems. For example, since many parameters of storage systems are non-numeric, most gradient-based methods (i.e., based on linear regression) are less suitable for this task [29].
Control Theory (CT). CT was historically used to manage linear system parameters [19, 37, 44]. CT builds a controller for a system so that its output follows a desired reference signal [33, 43]. However, CT has been shown to have the following three problems: (1) CT tends to be unstable in controlling non-linear systems [48, 49]; although some variants were proposed, they do not scale well. (2) CT cannot handle non-numeric parameters. (3) CT requires a lot of data during the learning phase, called identification, to build a good controller.
Supervised Machine Learning (ML). Supervised ML has been successfully applied in various domains [9, 10, 56, 81]. However, the accuracy of ML models depends heavily on the quality and amount of training data [81], which is unavailable or impractical to collect for large parameter spaces such as ours.
Therefore, we feel that neither CT nor supervised ML, in their current state, is the first choice for directly and efficiently auto-tuning storage systems. That said, they constantly evolve, and promising new results keep appearing in the research literature [4, 67, 69, 86]; we plan to investigate them in the future.
2.4 Unified Framework
Most optimization techniques are known to face the exploration-exploitation dilemma [23, 46, 68, 79]. Here we summarize the aforementioned methods by extending this unified framework with a third factor, history. Our unified view thus defines three factors or dimensions: (1) Exploration defines how the technique searches unvisited areas; this often combines pure random search with guided search based on history. (2) Exploitation defines how the technique leverages history to find the next sample. (3) History defines how much data from previous evaluations is kept. History information can be used to help guide both future exploration and exploitation (e.g., avoiding less promising regions, or selecting regions that have never been explored before). Table 1 summarizes how the aforementioned techniques work by maintaining the balance among these three key factors. For example, GA keeps the evaluation results from the last generation, which corresponds to the concept of history. GA then exploits the stored information, applying selection and crossover to search nearby areas and pick the next generation. Occasionally, it also randomly mutates some chosen parameters, which is the idea of exploration. As shown in §4, the trade-off among exploration, exploitation, and history determines the effectiveness and efficiency of these optimization techniques.
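The three factors can be sketched as a single generic search loop. This schematic is not any one of the surveyed algorithms; the epsilon exploration rate and the perturb-the-best exploitation rule are placeholder choices to show where each factor plugs in:

```python
import random

def auto_tune(space, evaluate, rounds=50, epsilon=0.2):
    """Generic skeleton of the unified view: keep History of results,
    Exploit it to pick promising candidates, and sometimes Explore
    unvisited configurations at random."""
    history = {}                                   # History: config -> result

    def random_config():
        return tuple(random.choice(values) for values in space)

    for _ in range(rounds):
        if not history or random.random() < epsilon:
            cand = random_config()                 # Exploration: random sample
        else:
            best = max(history, key=history.get)   # Exploitation: perturb the
            cand = list(best)                      # best configuration seen
            i = random.randrange(len(space))       # by changing one parameter
            cand[i] = random.choice(space[i])
            cand = tuple(cand)
        if cand not in history:                    # History guides both modes
            history[cand] = evaluate(cand)
    return max(history, key=history.get), history
```

Each real algorithm replaces these placeholders: SA's "history" is just the current state, GA's is the current population, and BO's is the full evaluation record behind its probabilistic model.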
3 Experimental Settings
We now describe the details of our experimental environments, parameter spaces, and implementations of the optimization algorithms.
Hardware. We performed experiments on two sets of machines with different hardware, categorized as low-end (M1) and mid-range (M2). We list the hardware details in Table 3. We also used Watts Up Pro ES power meters to measure energy consumption [82].
Workload. We benchmarked storage configurations with four typical macro-workloads generated by Filebench [25, 75]: (1) Mailserver emulates the I/O workload of a multi-threaded email server. (2) Fileserver emulates the I/O workload of a server that hosts users' home directories. (3) Webserver emulates the I/O workload of a typical static Web server with a high percentage of reads. (4) Dbserver mimics the behavior of Online Transaction Processing (OLTP) databases. Before each experiment run, we formatted and mounted the storage devices with the targeted file system.
Algorithm                  | Origin                                   | Exploration                              | Exploitation                                   | History
Simulated Annealing (SA)   | Annealing technology in metallurgy       | Allowing moving to worse neighbor states | Neighbor function                              | N/A
Genetic Algorithms (GA)    | Natural evolution                        | Mutation                                 | Crossover and selection                        | Current population
Deep Q-Networks (DQN)      | Behaviorist psychology and neuroscience  | Taking random actions                    | Taking actions based on action-reward function | Deep convolutional neural network
Bayesian Optimization (BO) | Statistics and experimental design       | Selecting samples with high variances    | Selecting samples with high mean values        | Acquisition function & probabilistic model
Table 1: Comparison and summaries of optimization techniques.

Param                    | Abbr. | Values
File System              | FS    | Ext2, Ext3, Ext4, XFS, Btrfs, Nilfs2, Reiserfs
Block (Leaf) Size        | BS    | 1K, 2K, 4K
Inode Size / Sector Size | IS    | n/a, 128, 256, 512, 1024, 2048, 4096, 8192
Block Group              | BG    | n/a, 2, 4, 8, 16, 32, 64, 128, 256
Journal Option           | JO    | n/a, order=strict, order=relaxed, data=journal, data=ordered, data=writeback
Atime Option             | AO    | relatime, noatime
Special Option           | SO    | n/a, compress, nodatacow, nodatasum, notail
I/O Scheduler            | I/O   | noop, cfq, deadline
Table 2: Details of parameter spaces.

Hardware | M1                                        | M2
Model    | Dell PE SC1425                            | Dell PE R710
CPU      | Intel Xeon single-core 2.8GHz CPU x 2     | Intel Xeon quad-core 2.4GHz CPU x 2
Memory   | 2GB                                       | 24GB
Storage  | HDD1 (73GB Seagate ST373207LW SCSI drive) | HDD2 (147GB SAS), HDD3 (500GB SAS), HDD4 (250GB SATA), SSD (200GB)
Table 3: Details of experiment machines.
The working set size affects the duration of an experiment [74]. Our goal in this study was to explore a large set of parameters and values quickly (although it still took us over two years). We therefore decided to trade the working set size in favor of increasing the number of configurations we could explore in a practical time period. We used the default working set sizes in Filebench, and ran each workload for 100 seconds; this is long enough to get stable evaluation results under this setting. The experiments demonstrate a wide range of performance numbers and are suitable for evaluating different optimization methods.
Parameter Space. Since the main goal of our paper is to compare multiple optimization techniques, we want our storage parameter spaces to be large and complex enough. Alas, evaluations of storage systems take a long time. Considering experimentation on multiple hardware settings and workloads, we decided to experiment with a reasonable subset of the most relevant storage system parameters. We selected parameters in close collaboration with several storage experts who have either contributed to storage stack designs or spent years tuning storage systems in the field. We experimented with 7 Linux file systems that span a wide range of designs and features: Ext2 [13], Ext3 [77], Ext4 [24], XFS [73], Btrfs [61], Nilfs2 [42], and Reiserfs [60].
Our experiments were mainly conducted on two sets of parameters, termed Storage V1 and Storage V2. We started with seven common file system parameters (shown in the first 7 rows of Table 2), and refer to them as Storage V1; Storage V1 was tested on M1 machines. We then extended our search space with one more parameter, the I/O Scheduler, and refer to the result as Storage V2; Storage V2 was evaluated on M2 servers. Note that certain combinations of parameter values can produce invalid configurations. For example, for Ext2 the journaling option makes no sense, because Ext2 does not have a journal. To handle this, we added a value n/a to the existing range of parameters. Any parameter with an n/a value is considered invalid. Invalid configurations always come with evaluation results of zero (i.e., no throughput); this ensures they are purged in an upcoming optimization process. There are 2,074 valid configurations in Storage V1 and 6,222 in Storage V2 for each workload and storage device. We believe our search spaces are large and complex enough to demonstrate the differences in efficiency of various optimization algorithms. Furthermore, many of the chosen parameters are commonly tuned and studied by storage experts; having a basic understanding of such parameters helped us understand the auto-tuning results.
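To illustrate the n/a mechanism, here is a toy sketch over a two-parameter slice of the space. The validity rule shown (the journal option must be n/a exactly when the file system has no journal) is our reading of the scheme, and both the parameter values and the scoring are made-up placeholders:

```python
from itertools import product

# A toy slice of the parameter space; "n/a" marks values that do not
# apply to a given file system (e.g., journal modes for Ext2).
SPACE = {
    "fs": ["Ext2", "Ext4"],
    "journal": ["n/a", "data=ordered", "data=writeback"],
}

def is_valid(config):
    """Assumed rule: Ext2 has no journal, so only journal='n/a' is valid
    for it; journaling file systems must pick a real journal mode."""
    if config["fs"] == "Ext2":
        return config["journal"] == "n/a"
    return config["journal"] != "n/a"

def evaluate(config, measure):
    """Invalid configurations always score zero throughput, so any
    optimizer quickly purges them from its search."""
    return measure(config) if is_valid(config) else 0.0

configs = [dict(zip(SPACE, vals)) for vals in product(*SPACE.values())]
valid = [c for c in configs if is_valid(c)]
# 2 x 3 = 6 raw combinations, of which 3 are valid.
```

The same pruning explains why Storage V1 and V2 have 2,074 and 6,222 valid configurations rather than the full cross-product of all value ranges.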
Experiments and implementations. Our experiments and implementation consist of two parts. First, we exhaustively ran all configurations for each workload and device on the M1 and M2 machines, and stored the results in a relational database. We collected the throughput in terms of I/O operations per second, as reported by Filebench, the running time (including setup time), as well as power and energy consumption. To acquire more accurate and stable results, we evaluated each configuration under the same environment for at least 3 runs, resulting in more than 450,000 total experimental runs. This data collection benefited our evaluation of auto-tuning, as we can simply simulate a variety of algorithms by querying the database for the evaluation results of different configurations, without having to rerun slow I/O experiments. The exhaustive search also lets us know exactly what the global optimal configurations are, so that we can calculate how close each optimization method gets to the global optimum.
Second, we simulated the process of auto-tuning storage systems by running the desired optimization method and querying the database for the average evaluation results from multiple (3+) repeated runs. We focused on optimizing for throughput in this paper; the computation cost of the optimization algorithms is ignored in our experiments. We believe our observations are applicable to other optimization objectives as well. Our implementations of the optimization methods are mostly based on open-source libraries: we use Pyevolve [58] for Genetic Algorithms, Scikit-Optimize [70] for Bayesian Optimization, and TensorFlow [76] for the DQN implementation. We implemented a simple version of Simulated Annealing, with both linear and geometric cooling schedules. (We also fixed bugs in Pyevolve and plan to release our patches.) Most of our implementation work consisted of converting storage-related concepts into algorithm-specific ones. For example, for GA we defined each storage parameter as a gene and each configuration as a chromosome; for DQN we provided storage-specific definitions for states, actions, and rewards. The complete implementation uses around 10,000 lines of Python code.

Hardware-Workload-Device | File System | Block Size | Inode Size | BG Count | Journal Options | Atime Options | Special Options | I/O Scheduler | Throughput (IOPS)
M1-Mail-HDD1             | Nilfs2      | 2K         | n/a        | 256      | order=relaxed   | relatime      | n/a             | -             | 3,677
M2-Mail-HDD3             | Ext2        | 4K         | 256        | 32       | n/a             | relatime      | n/a             | noop          | 18,744
M2-File-HDD3             | Btrfs       | 4K         | 4,096      | n/a      | n/a             | relatime      | compress        | deadline      | 16,549
M2-Mail-SSD              | Ext2        | 4K         | 256        | 8        | n/a             | relatime      | n/a             | noop          | 18,845
M2-DB-SSD                | Ext4        | 1K         | 128        | 2        | data=ordered    | noatime       | n/a             | noop          | 41,948
M2-Web-SSD               | Ext4        | 4K         | 128        | 4        | data=ordered    | noatime       | n/a             | noop          | 16,185
Table 4: Global optimal configurations with different settings and workloads.

4 Evaluations
Our evaluation mainly focuses on comparing the effectiveness and speed of multiple optimization techniques applied to auto-tuning storage systems, and on providing insights into our observations. §4.1 overviews the data sets that we collected over more than two years. §4.2 compares five popular optimization techniques from several aspects. §4.3 uses GA as a case study to show that the hyper-parameters of these methods can also impact the auto-tuning results. §4.4 takes a first step towards explaining these black-box optimization methods, based on our evaluation results and our storage expertise.
4.1 Overview of Data Sets
As per §3, our experimental methodology is to first exhaustively run all configurations under different workloads and test machines, storing the results in a database for future use. This data collection benefits future experiments, as we can simulate a variety of algorithms by querying the database for the evaluation results of different configurations. Due to space limits, in this section we show only 6 representative data sets out of 18: 2 workloads on M1 and 4 devices x 4 workloads on M2. They were picked (1) to show a wide range of hardware and workload impact on optimization results, and (2) to present more SSD results, given SSDs' increasing popularity.
Figure 3 shows the throughput CDF among all configurations for each hardware setting and workload. The Y-axis is normalized by the maximum throughput under each experiment setting; the symbols on each line mark the default Ext4 configurations. As seen, for most settings throughput values vary across a wide range: the ratios of the worst throughput to the best are mostly between 0.2–0.4. In one extreme case, Fileserver on M2 machines with the HDD3 device (abbreviated as M2-Fileserver-HDD3), the worst configuration produces only 1% of the I/O operations per unit time of the global optimum. This underlines the importance of tuning storage systems: an improperly configured system can be remarkably under-utilized, wasting a lot of resources. However, M2-Webserver-SSD shows a much narrower range of throughput, with the worst-to-best ratio close to 0.9. This is attributed mainly to the fact that Webserver consists of mostly sequential read operations, which are processed similarly by different I/O stack configurations. Figure 3 also shows that the default Ext4 configurations are always sub-optimal and, under most settings, rank below the top 40% of configurations. For M1-Mailserver-HDD1, the default Ext4 configuration shows a normalized throughput of 0.39, which means that the optimal configuration performs 2.5 times better.
Table 4 lists optimal configurations for the same six hardware and workload settings. As we can see, optimal configurations depend on the specific hardware as well as the running workload. For M1-Mailserver-HDD1, the global best is a Nilfs2 configuration. However, if
`
`

`

`Case 1:20-cv-00034-ADA Document 52-8 Filed 04/27/20 Page 8 of 16
`
`Figure 3: Throughput CDF with different hardware and work-
`loads, with symbols marking the default Ext4 configurations.
`we fix the workload, change the hardware, and get M2-
`Mailserver-HDD3, the optimum becomes an Ext4 con-
`figuration. Similarly, fixing the hardware to M2-*-SSD
`and experimenting under different workloads leads to
`different optimal configurations. This proves our early
`claim that performance is sensitive to the environment
`(e.g., hardware, configuration, and workloads); this ac-
`tually complicates the problem as results from one envi-
`ronment cannot be directly applied in another.
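As background for the comparison that follows, the storage-to-algorithm conversions described in our implementation, such as storage parameters as GA genes, whole configurations as chromosomes, and linear versus geometric cooling for Simulated Annealing, can be sketched roughly as follows. The parameter space shown is hypothetical, not our actual search space:

```python
import random

# Hypothetical storage parameter space: each parameter is a "gene";
# a full configuration (one value per gene) is a "chromosome".
PARAM_SPACE = {
    "fs_type":    ["ext4", "xfs", "nilfs2", "reiserfs"],
    "block_size": [1024, 2048, 4096],
    "journal":    ["writeback", "ordered", "journal"],
}

def random_chromosome(rng=random):
    """Sample one configuration uniformly at random."""
    return {gene: rng.choice(vals) for gene, vals in PARAM_SPACE.items()}

def mutate(chromosome, rng=random):
    """GA point mutation: re-sample one randomly chosen gene."""
    child = dict(chromosome)
    gene = rng.choice(list(PARAM_SPACE))
    child[gene] = rng.choice(PARAM_SPACE[gene])
    return child

# Simulated Annealing cooling schedules: temperature after k steps.
def linear_cooling(t0, step, k):
    return max(t0 - step * k, 0.0)      # T_k = T0 - step*k, floored at 0

def geometric_cooling(t0, alpha, k):
    return t0 * alpha ** k              # T_k = T0 * alpha^k, 0 < alpha < 1

# One plausible DQN framing (our actual definitions are richer): the state
# is the current configuration, an action changes one parameter, and the
# reward is the resulting change in measured throughput.
```

Given an evaluation function that measures throughput for a configuration, mutate() drives GA-style exploration, while the cooling schedules control how willingly SA accepts worse configurations as the search progresses.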
4.2 Comparative Analysis

Many optimization techniques have been applied to various auto-tuning tasks [71, 78]. However, previous efforts picked algorithms somewhat arbitrarily and evaluated only one algorithm at a time. Here we provide the first comparative study of multiple black-box optimization techniques on auto-tuning storage systems. As discu
