A Case for Flash Memory SSD in Enterprise Database Applications∗

Sang-Won Lee†   Bongki Moon‡   Chanik Park§   Jae-Myung Kim¶   Sang-Woo Kim†

†School of Information & Communications Engr.
Sungkyunkwan University
Suwon 440-746, Korea
{wonlee,swkim}@ece.skku.ac.kr

‡Department of Computer Science
University of Arizona
Tucson, AZ 85721, U.S.A.
bkmoon@cs.arizona.edu

§Samsung Electronics Co., Ltd.
San #16 Banwol-Ri
Hwasung-City 445-701, Korea
ci.park@samsung.com

¶Altibase Corp.
182-13, Guro-dong, Guro-Gu
Seoul, 152-790, Korea
jmkim@altibase.com

ABSTRACT

Owing to its advantages such as low access latency, low energy consumption, light weight, and shock resistance, flash memory has succeeded as a storage alternative for mobile computing devices, and this success has been steadily expanding into the personal computer and enterprise server markets as its storage capacity continues to grow. However, since flash memory exhibits poor performance for small-to-moderate sized writes requested in a random order, existing database systems may not be able to take full advantage of flash memory without elaborate flash-aware data structures and algorithms. The objective of this work is to understand the applicability and potential impact that flash memory SSD (Solid State Drive) has for certain types of storage spaces of a database server where sequential writes and random reads are prevalent. We show empirically that more than an order of magnitude improvement can be achieved in transaction processing by replacing magnetic disk with flash memory SSD for transaction log, rollback segments, and temporary table spaces.

Categories and Subject Descriptors

H.2.2 [Database Management]: Physical Design

General Terms

Design, Algorithms, Performance, Reliability
∗This work was partly supported by the IT R&D program of MIC/IITA [2006-S-040-01] and MIC, Korea under ITRC IITA-2008-(C1090-0801-0046). The authors assume all responsibility for the contents of the paper.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SIGMOD'08, June 9–12, 2008, Vancouver, BC, Canada.
Copyright 2008 ACM 978-1-60558-102-6/08/06 ...$5.00.

Keywords

Flash-Memory Database Server, Flash-Memory SSD

1. INTRODUCTION
Owing to its advantages such as low access latency, low energy consumption, light weight, and shock resistance, flash memory has succeeded as a storage alternative for mobile computing devices, and this success has been steadily expanding into the personal computer and enterprise server markets as its storage capacity continues to grow. As has been witnessed over the past several years, the two-fold annual increase in the density of NAND flash memory is expected to continue until the year 2012 [11]. Flash-based storage devices are now considered to have tremendous potential as a new storage medium that can replace magnetic disk and achieve much higher performance for enterprise database servers [10].

The trend in the market is also very clear. Computer hardware manufacturers have already launched new lines of mobile personal computers that do away with disk drives altogether, replacing them with flash memory SSD (Solid State Drive). Storage system vendors have started lining up terabyte-scale flash-based solutions, targeting large-scale database servers as one of the main applications.

Adoption of a new technology, however, is often deterred by the lack of in-depth analysis of its applicability and cost-effectiveness, and is even considered risky when it comes to mission-critical applications. The objective of this work is to evaluate flash memory SSD as stable storage for database workloads and to identify the areas where flash memory SSD can be best utilized, thereby accelerating its adoption as an alternative to magnetic disk and maximizing the benefit from this new technology.

Most contemporary database systems are configured to have separate storage spaces for database tables and indexes, log data, and temporary data. Whenever a transaction updates a data object, its log record is created and stored in stable storage for recoverability and durability of the transaction execution. Temporary table space stores temporary data required for performing operations such as sorts or joins. If multiversion read consistency is supported, another separate storage area called rollback segments is created to store previous versions of data objects.

For the purpose of performance tuning as well as recoverability, these distinct storage spaces are often created on physically separate storage devices, so that I/O throughput can increase and I/O bottlenecks can be detected and addressed with more ease. While it is commonly known that accessing data stored in secondary storage is the main source of bottlenecks in database processing, high throughput of a database system cannot be achieved by addressing the bottlenecks only in the spaces for tables and indexes; the spaces for log, temporary, and rollback data must be addressed as well.
Recent studies on database availability and architecture report that writing log records to stable storage is almost guaranteed to be a significant performance bottleneck [13, 21]. In on-line transaction processing (OLTP) applications, for example, when a transaction commits, all the log records created by the transaction have to be force-written to stable storage. If a large number of concurrent transactions commit at a rapid rate, the log tail will be requested to be flushed to disk very often. This lengthens the average wait time of committing transactions, further delays the release of locks, and eventually increases the overall runtime overhead substantially.

Accessing data stored in temporary table spaces and rollback segments also takes up a significant portion of total I/O activity. For example, queries performing a table scan, join, sort, or hash operation are very common in data warehousing applications, and processing those queries (except simple table scans) requires a potentially large amount of intermediate data to be written to and read from temporary table spaces. Thus, to maximize the throughput of a database system, it is critical to speed up access to data stored in those areas as well as in the data space for tables and indexes.
Previous work has reported that flash memory exhibits poor performance for small-to-moderate sized writes requested in a random order [2], and that the best attainable performance may not be obtained from database servers without elaborate flash-aware data structures and algorithms [14]. In this paper, in contrast, we demonstrate that flash memory SSD can help improve the performance of transaction processing significantly, particularly as a storage alternative for transaction log, rollback segments, and temporary table spaces. To accomplish this, we trace the quite distinct data access patterns observed in these three types of data spaces, analyze how magnetic disk and flash memory SSD devices handle such I/O requests, and show how the overall performance of transaction processing is affected by them.

While the previous work on in-page logging is targeted at regular table spaces for database tables and indexes where small random writes are dominant [14], the objective of this work is to understand the applicability and potential impact that flash memory SSD has for the other data spaces where sequential writes and random reads are prevalent. The key contributions of this work are summarized as follows.
• Based on a detailed analysis of data accesses traced from a commercial database server, this paper provides an understanding of the I/O behaviors that are dominant in transaction log, rollback segments, and temporary table spaces. It also shows that this I/O pattern is a good match for the dual-channel, super-block design of flash memory SSD as well as for the characteristics of flash memory itself.

• This paper presents a quantitative and comparative analysis of magnetic disk and flash memory SSD with respect to the performance impact they have on transactional database workloads. We observed more than an order of magnitude improvement in transaction throughput and response time by replacing magnetic disk with flash memory SSD as the storage medium for transaction log or rollback segments. In addition, more than a factor of two improvement in response time was observed in processing a sort-merge or hash join query by adopting flash memory SSD instead of magnetic disk for temporary table spaces.

• The empirical study carried out in this paper demonstrates that the low latency of flash memory SSD can drastically alleviate the log bottleneck at commit time and the problem of increased random reads for multiversion read consistency. With flash memory SSD, I/O processing speed may no longer be as serious a bottleneck as it used to be, and the overall performance of query processing can be much less sensitive to tuning parameters such as the unit size of physical I/O. The superior performance of flash memory SSD demonstrated in this work will help accelerate the adoption of flash memory SSD for database applications in the enterprise market, and help us revisit the requirements of database design and tuning guidelines for database servers.

The rest of this paper is organized as follows. Section 2 presents the architecture and a few key features of the Samsung flash memory SSD, and discusses its performance characteristics with respect to transactional database workloads. Section 3 describes the experimental settings that will be used in the following sections. In Section 4, we analyze the performance gain that can be obtained by adopting flash memory SSD as stable storage for transaction log. Section 5 analyzes the patterns in which old versions of data objects are written to and read from rollback segments, and shows how flash memory SSD can take advantage of the access patterns to improve access speed for rollback segments and the average response time of transactions. In Section 6, we analyze the I/O patterns of sort-based and hash-based algorithms, and discuss the impact of flash memory SSD on the algorithms. Lastly, Section 7 summarizes the contributions of this paper.

2. DESIGN OF SAMSUNG FLASH SSD

The flash memory SSD (Solid State Drive) of Samsung Electronics is a non-volatile storage device based on NAND-type flash memory, which is being marketed as a replacement for traditional hard disk drives for a wide range of computing platforms. In this section, we first briefly summarize the characteristics of flash memory as a storage medium for databases. We then present the architecture and a few key features of the Samsung flash memory SSD, and discuss its performance implications for transactional database workloads.
2.1 Characteristics of Flash Memory

Flash memory is a purely electronic device with no mechanically moving parts such as the disk arms of a magnetic disk drive. Therefore, flash memory can provide uniform random access speed. Unlike magnetic disks, whose seek and rotational delay often becomes the dominant cost of reading or writing a sector, the time to access data in flash memory is almost linearly proportional to the amount of data, irrespective of their physical locations in flash memory. The ability of flash memory to quickly perform a sector read or a (clean) sector write located anywhere in flash memory is one of the key characteristics we can take advantage of.

On the other hand, with flash memory, no data item (or the sector containing it) can be updated in place simply by overwriting it. In order to update an existing data item stored in flash memory, a time-consuming erase operation must be performed before overwriting. The erase operation cannot be performed selectively on a particular data item or sector; it can only be done for an entire block of flash memory, called the erase unit, which contains the data item and is much larger (typically 128 KBytes) than a sector. To avoid performance degradation caused by this erase-before-write limitation, some of the data structures and algorithms of existing database systems may well be reconsidered [14].
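Flash translation layers typically avoid paying the erase-before-write penalty on every write by redirecting each logical write to an already-erased location and remapping the logical address, rather than erasing and rewriting the whole 128 KByte erase unit in place. The following is a minimal sketch of such an out-of-place update scheme; the structures, constants, and flash_program primitive are illustrative assumptions, not the actual Samsung FTL.

```c
#include <stdint.h>

#define SECTOR_SIZE    512
#define ERASE_UNIT     (128 * 1024)                 /* 128 KB erase unit (block)  */
#define SECTORS_PER_EU (ERASE_UNIT / SECTOR_SIZE)   /* 256 sectors per erase unit */
#define NUM_EUS        1024                         /* illustrative device size   */

/* Hypothetical device primitive: program (write) one clean, already-erased sector. */
static void flash_program(uint32_t phys_sector, const void *data)
{
    (void)phys_sector; (void)data;                  /* device-specific */
}

static uint32_t map[NUM_EUS * SECTORS_PER_EU];      /* logical sector -> physical sector */
static uint32_t next_clean;                         /* next clean sector to program      */

/*
 * Out-of-place update: instead of erasing the 128 KB unit that holds the old
 * copy of the sector, the FTL programs the new data into the next clean
 * sector and updates the logical-to-physical mapping.  The old physical
 * sector simply becomes obsolete; a background garbage collector later
 * copies out any live sectors and erases whole units to reclaim space.
 */
void ftl_write(uint32_t logical_sector, const void *data)
{
    uint32_t phys = next_clean++;                   /* sequential programming of clean space */
    flash_program(phys, data);
    map[logical_sector] = phys;
}
```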
The read and write speeds of flash memory are asymmetric, simply because it takes longer to write (i.e., inject charge into) a cell until it reaches a stable status than to read the status from a cell. As will be shown later in this section (Table 1), the sustained speed of read is almost twice that of write. This asymmetry should also be considered when reviewing existing techniques for database system implementations.
2.2 Architecture and Key Features

High bandwidth is one of the critical requirements for the design of flash memory SSD. The dual-channel architecture, shown in Figure 1, supports up to 4-way interleaving to hide flash programming latency and to increase bandwidth through parallel read/write operations. Automatic interleaving hardware logic is adopted to maximize the interleaving effect with minimal firmware intervention [18].

Figure 1: Dual-Channel Architecture of SSD
A firmware layer known as the flash translation layer (FTL) [5, 12] is responsible for several essential functions of flash memory SSD, such as address mapping and wear leveling. The address mapping scheme is based on super-blocks in order to limit the amount of information required for logical-to-physical address mapping, which grows larger as the capacity of flash memory SSD increases. This super-block scheme also facilitates interleaved accesses of flash memory by striping a super-block of one MByte across four flash chips. A super-block consists of eight erase units (or large blocks) of 128 KBytes each. Under this super-block scheme, two erase units of a super-block are allocated in the same flash chip.
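As a concrete illustration of the layout just described (a 1 MByte super-block, eight 128 KByte erase units, two erase units per chip across four chips), the sketch below maps an offset within a super-block to a chip and an erase unit on that chip. The round-robin striping order is an assumption made purely for illustration; the actual Samsung mapping may interleave the units differently.

```c
#include <stdint.h>
#include <stdio.h>

#define ERASE_UNIT   (128u * 1024u)          /* 128 KB erase unit                  */
#define EUS_PER_SB   8u                      /* 8 erase units per 1 MB super-block */
#define NUM_CHIPS    4u                      /* super-block striped over 4 chips   */

/*
 * Assumed layout: erase units 0..7 of a super-block are assigned round-robin
 * to chips 0..3, so each chip holds exactly two erase units of the super-block
 * and a 1 MB sequential write keeps all four chips busy (4-way interleaving).
 */
struct sb_location { uint32_t chip; uint32_t eu_in_chip; uint32_t offset_in_eu; };

struct sb_location map_superblock_offset(uint32_t offset)   /* offset < 1 MB */
{
    uint32_t eu = offset / ERASE_UNIT;                  /* which of the 8 erase units */
    struct sb_location loc = {
        .chip         = eu % NUM_CHIPS,                 /* round-robin striping       */
        .eu_in_chip   = eu / NUM_CHIPS,                 /* 0 or 1: two units per chip */
        .offset_in_eu = offset % ERASE_UNIT,
    };
    return loc;
}

int main(void)
{
    for (uint32_t eu = 0; eu < EUS_PER_SB; eu++) {
        struct sb_location l = map_superblock_offset(eu * ERASE_UNIT);
        printf("erase unit %u -> chip %u, unit %u on that chip\n",
               (unsigned)eu, (unsigned)l.chip, (unsigned)l.eu_in_chip);
    }
    return 0;
}
```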
Though flash memory SSD is a purely electronic device without any moving parts, it is not entirely free of latency when accessing data. When a read or write request arrives from a host system, the I/O command must be interpreted and processed by the SSD controller, the referenced logical addresses must be mapped to physical addresses, and, if the mapping information is altered by a write or merge operation, the mapping table must be updated in flash memory. With all these overheads added up, the read and write latencies observed from recent SSD products are approximately 0.2 msec and 0.4 msec, respectively.

In order to reduce energy consumption, the one-chip controller uses a small amount of SRAM for program code, data, and buffer memory.¹ The flash memory SSD drives can be interfaced with a host system through the IDE standard ATA-5.

¹The flash memory SSD drive tested in this paper contains 128 KByte SRAM.

2.3 Flash SSD for Database Workload

Typical transactional database workloads like TPC-C exhibit little locality and sequentiality in data accesses, and a high percentage of those accesses are synchronous writes (e.g., forced writes of log records at commit time). Latency-hiding techniques such as prefetching and write buffering become less effective for this type of workload, and the performance of transactional database applications tends to be limited more closely by disk latency than by disk bandwidth and capacity [24]. Nonetheless, for more than a decade, the latency of disk has improved at a much slower pace than its bandwidth, and this latency-bandwidth imbalance is expected to become even more evident in the future [19].

In this regard, the extremely low latency of flash memory SSD lends itself to being a new storage medium that replaces magnetic disk and improves the throughput of transaction processing significantly. Table 1 shows the performance characteristics of contemporary hard disk and flash memory SSD products. Though the bandwidth of disk is still two to three times higher than that of flash memory SSD, more importantly, the read and write latency of flash memory SSD is smaller than that of disk by more than an order of magnitude.

As briefly mentioned above, the low latency of flash memory SSD can reduce the average transaction commit time and improve the throughput of transaction processing significantly. If multiversion read consistency is supported, rollback data are typically written to rollback segments sequentially in append-only fashion and read from rollback segments randomly during transaction processing. This peculiar I/O pattern is a good match for the characteristics of flash memory itself and for the super-block scheme of the Samsung flash memory SSD. External sorting is another operation that can benefit from the low latency of flash memory SSD, because the read pattern of external sorting is quite random, during the merge phase in particular.

Storage        Average Latency            Sustained Transfer Rate
hard disk†     8.33 ms                    110 MB/sec
flash SSD‡     0.2 ms (read)              56 MB/sec (read)
               0.4 ms (write)             32 MB/sec (write)

†Disk: Seagate Barracuda 7200.10 ST3250310AS; average latency for seek and rotational delay.
‡SSD: Samsung MCAQE32G8APP-0XA drive with K9WAG08U1A 16 Gbits SLC NAND chips.

Table 1: Magnetic disk vs. NAND Flash SSD

3. EXPERIMENTAL SETTINGS

Before presenting the results from our workload analysis and performance study in the following sections, we briefly describe the experimental settings in this section.

In most cases, we ran a commercial database server (one of the most recent editions of its product line) on two Linux systems (kernel version 2.6.22), each with a 1.86 GHz Intel Pentium dual-core processor and 2 GB of RAM. The two computer systems were identical except that one was equipped with a magnetic disk drive and the other with a flash memory SSD drive instead of the disk drive. The disk drive model was a Seagate Barracuda 7200.10 ST3250310AS with 250 GB capacity, 7200 rpm, and a SATA interface. The flash memory SSD model was a Samsung Standard Type MCAQE32G8APP-0XA with 32 GB capacity and a 1.8 inch PATA interface, which internally deploys Samsung K9WAG08U1A 16 Gbits SLC NAND flash chips (shown in Figure 2). These storage devices were connected to the computer systems via SATA and PATA interfaces, respectively.

Figure 2: Samsung NAND Flash SSD

When either magnetic disk or flash memory SSD was used as stable storage for transaction log, rollback segments, or temporary table spaces, it was bound as a raw device in order to minimize interference from data caching by the operating system. This is a common way of binding storage devices adopted by most commercial database servers, which use their own caching schemes. In all the experiments, database tables were cached in memory so that most of the I/O activity was confined to transaction log, rollback segments, and temporary table spaces.

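In our experiments the raw-device binding is done by the database server itself. As a rough illustration of the same idea, bypassing the OS page cache so that every write actually reaches the device, the sketch below opens a block device with O_DIRECT and O_SYNC on Linux. The device path and transfer size are illustrative assumptions; a real DBMS would perform its own raw-device binding and buffering.

```c
#define _GNU_SOURCE            /* for O_DIRECT */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* Hypothetical block device dedicated to the transaction log. */
    int fd = open("/dev/sdb", O_WRONLY | O_DIRECT | O_SYNC);
    if (fd < 0) return 1;

    /* O_DIRECT requires sector-aligned buffers and transfer sizes. */
    void *buf;
    if (posix_memalign(&buf, 512, 4096) != 0) return 1;
    memset(buf, 0, 4096);

    /* This write bypasses the OS page cache and returns only after the
       device has accepted the data (O_SYNC). */
    ssize_t n = pwrite(fd, buf, 4096, 0);

    free(buf);
    close(fd);
    return n == 4096 ? 0 : 1;
}
```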
4. TRANSACTION LOG

When a transaction commits, it appends a commit-type log record to the log and force-writes the log tail to stable storage, up to and including the commit record. Even if a no-force buffer management policy is being used, all the log records kept in the log tail must be force-written to ensure the durability of transactions [22].

As processors become faster and memory capacity increases, the commit-time delay due to forced writes increasingly becomes a serious bottleneck to achieving high transaction processing performance [21]. The response time T_response of a transaction can be modeled as the sum of CPU time T_cpu, read time T_read, write time T_write, and commit time T_commit. T_cpu is typically much smaller than the I/O time. Even T_read and T_write become almost negligible with a large buffer cache, and they can be hidden by asynchronous write operations. On the other hand, the commit time T_commit remains a significant overhead, because every committing transaction has to wait until all of its log records are force-written to the log, which in turn cannot be done until the forced-write operations requested earlier by other transactions are completed. Therefore, the commit-time delay tends to increase with the number of concurrent transactions, and is typically no less than a few milliseconds.
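Written out, the response-time model above is simply

\[
T_{\mathrm{response}} = T_{\mathrm{cpu}} + T_{\mathrm{read}} + T_{\mathrm{write}} + T_{\mathrm{commit}},
\]

where T_commit, the time spent force-writing the log tail, is the dominant term that the rest of this section addresses.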
Group commit may be used to alleviate the log bottleneck [4]. Instead of committing each transaction as it finishes, transactions are committed in batches when enough log records have accumulated in the log tail. Though this group commit approach can significantly improve the throughput of transaction processing, it does not improve the response time of individual transactions, and it does not remove the commit-time log bottleneck altogether.
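To make the per-commit cost concrete, the sketch below contrasts a naive commit, which issues one synchronous flush per transaction, with a group commit that force-writes the accumulated log tail once for a whole batch. It is a minimal single-threaded illustration under assumed record sizes, not the commercial server's implementation.

```c
#include <unistd.h>

#define LOG_RECORD_SIZE 128   /* assumed fixed-size log record */

/* Naive commit: every transaction pays one synchronous flush. */
void commit_one(int log_fd, const char rec[LOG_RECORD_SIZE])
{
    write(log_fd, rec, LOG_RECORD_SIZE);
    fdatasync(log_fd);                 /* force-write: the transaction waits here */
}

/* Group commit: append the records of n committing transactions, flush once.
 * All n transactions become durable after a single forced write, so the
 * per-transaction commit cost shrinks roughly by a factor of n, although
 * each transaction still waits for that one flush to finish. */
void commit_group(int log_fd, const char recs[][LOG_RECORD_SIZE], int n)
{
    for (int i = 0; i < n; i++)
        write(log_fd, recs[i], LOG_RECORD_SIZE);
    fdatasync(log_fd);                 /* one forced write for the whole batch */
}
```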
Log records are always appended to the end of the log. If a separate storage device is dedicated to the transaction log, which is common practice for performance and recoverability reasons, this sequential write pattern favors not only hard disk but also flash memory SSD. With no seek delay, thanks to the sequential accesses, the write latency of disk is reduced to only half a revolution of the disk spindle on average, which is approximately 4.17 msec for disk drives with a 7200 rpm rotational speed.
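The 4.17 msec figure follows directly from the rotational speed:

\[
\frac{1}{2} \times \frac{60\,\mathrm{s/min}}{7200\,\mathrm{rev/min}}
= \frac{8.33\,\mathrm{ms}}{2} \approx 4.17\,\mathrm{ms}.
\]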
In the case of flash memory SSD, however, the write latency is much lower, at about 0.4 msec, because flash memory SSD has no mechanical latency, only a small overhead from the controller, as described in Section 2.2. Even the no-in-place-update limitation of flash memory has no negative impact on the write bandwidth in this case, because log records written to flash memory sequentially do not cause expensive merge or erase operations as long as clean flash blocks (or erase units) are available. Coupled with the low write latency of flash memory, the use of flash memory SSD as a dedicated storage device for transaction log can therefore reduce the commit-time delay considerably.

In the rest of this section, we analyze the performance gain that can be obtained by adopting flash memory SSD as stable storage for transaction log. The empirical results from flash memory SSD drives are compared with those from magnetic disk drives.

4.1 Simple SQL Transactions

To analyze the commit-time performance of hard disk and flash memory SSD drives, we first ran a simple embedded SQL program on a commercial database server, which ran on the two Linux systems described in Section 3: identical except that one was equipped with a magnetic disk drive and the other with a flash memory SSD drive. The embedded SQL program is multi-threaded and simulates concurrent transactions. Each thread updates a single record and commits, and repeats this cycle of update and commit continuously. In order to minimize the wait time for database table updates and increase the frequency of commit-time forced writes, the entire table data were cached in memory. Consequently, the runtime of a transaction excluding the commit time (i.e., T_cpu + T_read + T_write) was no more than a few dozen microseconds in the experiment. Table 2 shows the throughput of the embedded SQL program in terms of transactions per second (TPS).

no. of concurrent      hard disk            flash SSD
transactions           TPS     %CPU         TPS     %CPU
4                      178     2.5          2222    28
8                      358     4.5          4050    47
16                     711     8.5          6274    77
32                     1403    20           5953    84
64                     2737    38           5701    84

Table 2: Commit-time performance of an embedded SQL program measured in transactions per second (TPS) and CPU utilization

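For concreteness, the following stand-alone sketch captures the shape of the commit-intensive client just described: each thread repeatedly updates one row and commits, so almost all of its time is spent waiting for the commit-time forced log write. This is a simulation under assumed record sizes and file paths, not the actual embedded SQL benchmark program.

```c
#include <fcntl.h>
#include <pthread.h>
#include <stdint.h>
#include <unistd.h>

static int log_fd;                     /* shared transaction log */

/* The table update is effectively free because the table is fully cached. */
static void db_update_record(uint64_t key) { (void)key; }

/* Commit appends a log record and force-writes it to stable storage. */
static void db_commit(void)
{
    char rec[128] = {0};               /* assumed commit log record size */
    write(log_fd, rec, sizeof rec);
    fdatasync(log_fd);                 /* commit-time forced write       */
}

static void *worker(void *arg)
{
    for (;;) {                         /* endless update-commit cycle    */
        db_update_record((uint64_t)(uintptr_t)arg);
        db_commit();
    }
    return NULL;
}

int main(void)
{
    log_fd = open("txn.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
    pthread_t tid[16];
    for (int i = 0; i < 16; i++)       /* 16 concurrent "transactions"   */
        pthread_create(&tid[i], NULL, worker, (void *)(uintptr_t)i);
    pthread_join(tid[0], NULL);        /* run until interrupted          */
    return 0;
}
```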
Regarding the commit-time activities, a transaction can be in one of three distinct states. Namely, a transaction (1) is still active and has not yet requested to commit, (2) has already requested to commit but is waiting for other transactions to complete the forced writes of their log records, or (3) has requested to commit and is currently force-writing its own log records to stable storage.

When a hard disk drive was used as stable storage, the average wait time of a transaction was elongated due to the longer latency of disk writes, which resulted in an increased number of transactions being kept in a state of the second or third category. This is why the transaction throughput and CPU utilization were both low, as shown in the second and third columns of Table 2.

On the other hand, when a flash memory SSD drive was used instead of a hard disk drive, much higher transaction throughput and CPU utilization were observed, as shown in the fourth and fifth columns of Table 2. With the much shorter write latency of flash memory SSD, the average wait time of a transaction was shortened, and a relatively large number of transactions were actively utilizing the CPU, which in turn resulted in higher transaction throughput. Note that, in the case of flash memory SSD, CPU utilization was saturated when the number of concurrent transactions was high, and no further improvement in transaction throughput was observed when the number of concurrent transactions was increased from 32 to 64, indicating that the CPU, rather than I/O, was the limiting factor.
4.2 TPC-B Benchmark Performance

In order to evaluate the performance of flash memory SSD as a storage medium for transaction log in a harsher environment, we ran the commercial database server with TPC-B workloads created by a workload generation tool. Although it is obsolete, the TPC-B benchmark was chosen because it is designed to be a stress test on different subsystems of a database server, and its transaction commit rate is higher than that of the TPC-C benchmark [3]. We used this benchmark to stress-test the log storage part of the commercial database server by executing a large number of small transactions that cause significant forced-write activity.

In this benchmark test, the number of concurrent simulated users was set to 20, and the size of the database and the size of the database buffer cache of the server were set to 450 MBytes and 500 MBytes, respectively. Note that this setting allows the database server to cache the entire database in memory, so that the cost of reading and writing data pages is eliminated and the cost of force-writing log records remains the dominant factor on the critical path of overall performance. When either a hard disk or a flash memory SSD drive was used as stable storage for transaction log, it was bound as a raw device. Log records were force-written to the stable storage one or more sectors (of 512 bytes) at a time.

Table 3 summarizes the results from the benchmark test, measured in terms of transactions per second (TPS) and CPU utilization, as well as the average size of a single log write and the average time taken to process a single log write. Since multiple transactions could commit together as a group (by a group commit mechanism), the frequency of log writes was much lower than the number of transactions processed per second. Again, due to the group commit mechanism, the average size of a single log write differed slightly between the two storage media.

                           hard disk    flash SSD
Transactions/sec           864          3045
CPU utilization (%)        20           65
Log write size (sectors)   32           30
Log write time (msec)      8.1          1.3

Table 3: Commit-time performance from TPC-B benchmark (with 20 simulated users)

The overall transaction throughput was improved by a factor of 3.5 by using a flash memory SSD drive instead of a hard disk drive as stable storage for transaction log. Evidently, the main factor responsible for this improvement was the considerably lower log write time of flash memory SSD (1.3 msec on average), compared with the roughly six times longer log write time of disk. With the commit-time delay much reduced by flash memory SSD, the average response time of a transaction was also reduced considerably. This allowed transactions to release resources such as locks and memory quickly, which in turn helped transactions avoid waiting on locks held by other transactions and increased the utilization of the CPU. With flash memory SSD as the logging storage device, the bottleneck of transaction processing now appears to be the CPU rather than the I/O subsystem.
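Both factors quoted above follow directly from Table 3:

\[
\frac{3045\,\mathrm{TPS}}{864\,\mathrm{TPS}} \approx 3.5,
\qquad
\frac{8.1\,\mathrm{ms}}{1.3\,\mathrm{ms}} \approx 6.2.
\]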

4.3 I/O-Bound vs. CPU-Bound

In the previous sections, we have suggested that the bottleneck of transaction processing might shift from I/O to CPU if flash memory SSD replaced hard disk as the logging storage device. In order to put this proposition to the test, we carried out further performance evaluation with the TPC-B benchmark workload.

First, we repeated the same benchmark test as the one described in Section 4.2, but with a varying number of simulated users. The two curves denoted Disk-Dual and SSD-Dual in Figure 3 represent the transaction throughput observed when a hard disk drive or a flash memory SSD drive, respectively, was used as the logging storage device. Not surprisingly, this result matches the one shown in Table 3, and it shows the trend more clearly.

In the case of flash memory SSD, as the number of concurrent transactions increased, transaction throughput increased quickly and then saturated at about 3000 transactions per second without improving beyond this level. As will be discussed further in the following, we believe this was because the processing power of the CPU could not keep up with a transaction arrival rate any higher than that. In the case of disk, on the other hand, transaction throughput increased slowly but steadily in proportion to the number of concurrent transactions until it reached the same saturation level. This clearly indicates that the CPU was not the bottleneck in the case of disk.

When the same benchmark was repeated with a quad-core processor, the trend in transaction throughput in the case of disk remained almost identical to the one previously observed when a dual-core processor was used. In the case of flash memory SSD, the trend of SSD-Quad was also similar to that of SSD-Dual, except that the saturation level was considerably higher, at approximately 4300 transactions per second.

The results from these two benchmark tests speak for themselves: the processing speed of the CPU was the bottleneck in transaction throughput in the case of flash memory SSD, while it was not in the case of disk.

5. MVCC ROLLBACK SEGMENT

Multiversion concurrency control (MVCC) has been adopted by some commercial and open source database systems (e.g., Oracle, PostgreSQL, SQL Server 2005) as an alternative to the traditional concurrency control mechanism based on locks. Since read consistency is supported by providing multiple versions of a data object without any lock, MVCC is intrinsically non-blocking and can arguably minimize the performance penalty on concurrent update activities of transactions. Another advantage of multiversion concurrency control is that it naturally supports snapshot isolation [1] and time travel queries [15, 17].²

To support multiversion read consistency, however, when a data object is updated by a transaction, the original data value has to be recorded in an area known as rollback segments. The rollback segments are typically set aside in stable storage to store old images of data objects, and should not be confused with the undo log, because the rollback segments are used not for recovery but for concurrent execution of transactions. Thus, under multiversion concurrency control, updating a data object requires writing its before-image to a rollback segment in addition to writing undo and redo log records for the change.

Similarly, reading a data object can be somewhat costlier under multiversion concurrency control. When a transaction reads a data object, it needs to check whether the data object has been updated by other transactions, and it needs to fetch an old version from a rollback segment if necessary. The cost of this read operation may not be trivial if the data object has been updated many times and fetching its particular version requires a search through a long list of versions of the data object. Thus, it is essential to provide fast access to data in rollback segments so that the performance of database servers supporting MVCC is not hindered by increased disk I/O activity [16].

In this section, we analyze the patterns in which old versions of data objects are written to and read from rollback segments, and show how flash memory SSD can take advantage of these access patterns to improve the access speed for rollback segments and the average response time of transactions.

²As opposed to the ANSI SQL-92 isolation levels, the snapshot isolation level exhibits none of the anomalies that the SQL-92 isolation levels prohibit. Time travel queries allow one to query a database as of a certain time in the past.

5.1 Understanding the MVCC Write

When a transaction updates tuples, it stores the before images of the updated tuples in a block within a rollback segment
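To make the read-path cost described in Section 5 concrete, the sketch below shows how a reader under MVCC may have to walk back through before-images stored in rollback segments until it finds a version old enough for its snapshot. The structures and helpers are illustrative assumptions, not the commercial server's actual layout; the point is that each hop down the chain is a (typically random) read of a rollback-segment block, which is exactly the access pattern that benefits from the low read latency of flash memory SSD.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative version-chain entry: a before-image kept in a rollback
 * segment, stamped with the transaction that wrote that version. */
struct version {
    uint64_t        writer_txn;       /* txn that created this version           */
    uint64_t        rollback_block;   /* block address within a rollback segment */
    struct version *older;            /* next older before-image, or NULL        */
};

struct tuple {
    uint64_t        last_writer_txn;  /* txn of the current (newest) version */
    const void     *current_data;
    struct version *undo_chain;       /* head = most recent before-image     */
};

/* Hypothetical helpers: fetch a rollback-segment block, and test whether a
 * transaction committed before the reader's snapshot was taken. */
const void *read_rollback_block(uint64_t block);
bool committed_before(uint64_t txn, uint64_t snapshot);

/* Return the newest version visible to the given snapshot.  Every hop down
 * the chain is one read of a rollback-segment block, so a heavily updated
 * tuple can cost several such reads per access. */
const void *mvcc_read(const struct tuple *t, uint64_t snapshot)
{
    if (committed_before(t->last_writer_txn, snapshot))
        return t->current_data;                     /* newest version is visible */

    for (const struct version *v = t->undo_chain; v != NULL; v = v->older)
        if (committed_before(v->writer_txn, snapshot))
            return read_rollback_block(v->rollback_block);

    return NULL;                                     /* no visible version */
}
```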
