`
`(12) United States Patent
`Fallon et al.
`
`(10) Patent No.:
`(45) Date of Patent:
`
`US 8,934,535 B2
`Jan. 13, 2015
`
`(54) SYSTEMS AND METHODS FOR VIDEO AND
`AUDIO DATA STORAGE AND DISTRIBUTION
`
`(71) Applicant: Realtime Data LLC, Armonk, NY (US)
`
`(72) Inventors: James J. Fallon, Armonk, NY (US);
`Stephen J. McErlain, New York, NY
`(US)
`
`(73) Assignee: Realtime Data LLC, Armonk, NY (US)
`
`(*) Notice:
`
`Subject to any disclaimer, the term of this
`patent is extended or adjusted under 35
`U.S.C. 154(b) by 0 days.
`
`(21) Appl. No.: 14/033,245
`
`(22) Filed:
`(65)
`
`Sep. 20, 2013
`Prior Publication Data
`US 2014/OO23135A1
`Jan. 23, 2014
`O
`O
`Related U.S. Application Data
`(63) Continuation of application No. 13/154.239, filed on
`Jun. 6, 2011, now Pat. No. 8,553,759, which is a
`continuation of application No. 12/123,081, filed on
`May 19, 2008, now Pat. No. 8,073,047, which is a
`(Continued)
`
`(2006.01)
`(2006.01)
`
`(51) Int. Cl.
`H04N 7/2
`H03M 7/30
`(52) U.S. Cl.
`CPC .............. H03M 7/6094 (2013.01); H03M 7/30
`(2013.01); H03M 7/3084 (2013.01)
`USPC ..................................................... 375/24O.O1
`(58) Field of Classification Search
`CPC. H03M 7/30; H03M 7/3084; G06F 17/30153:
`G06F 2212/40: G06F 12/0246; G06F
`17/30501; G06F 3/0679; G06F 3/0688;
`H04L 69/04: Y1OS 707/99931; Y1OS
`707/99942: H04W 28/06; H04N1/00236;
`H04N2201/3283; G11C 29/40
`
`USPC ............. 375/240, E7094; 707/E17.001, 736,
`707/781-788; 711/E12.008, 154; 709/247;
`708/203
`See application file for complete search history.
`
`(56)
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`
`3,394,352 A
`3,490,690 A
`
`7, 1968 Wernikoffet al.
`1/1970 Apple et al.
`(Continued)
`
`FOREIGN PATENT DOCUMENTS
`
`DE
`EP
`
`2, 1992
`4127518
`12/1985
`O 164677
`(Continued)
`OTHER PUBLICATIONS
`Realtime's Response in Opposition to the Defendants' Joint Objec
`tions to Report and Recommendation of Magistrate Regarding
`Motion for Partial Summary Judgment of Invalidity for Indefinite
`ness, in Realitime Data, LLC d/b/a/LXO v. Packeteer, Inc. et al., Civil
`Action No. 6:08-cv-00144-LED; U.S. District Court for the Eastern
`District of Texas, dated Jul 27, 2009, 15 pages.
`(Continued)
`
`Primary Examiner — Tesfaldet Bocure
`(74) Attorney, Agent, or Firm — Sterne, Kessler, Gold
`Stein & Fox PL.L.C.
`
`ABSTRACT
`(57)
`Data compression and decompression methods for compress
`ing and decompressing databased on an actual or expected
`throughput (bandwidth) of a system. In one embodiment, a
`controller tracks and monitors the throughput (data storage
`and retrieval) of a data compression system and generates
`control signals to enable/disable different compression algo
`rithms when, e.g., a bottleneck occurs So as to increase the
`throughput and eliminate the bottleneck.
`
`30 Claims, 4 Drawing Sheets
`
`COMPRESSION
`- ALGORTHMS
`- i. S-3
`it -
`f
`--a- --- -2
`p
`CONTROLLER - 'E-CAAto-1.
`SS
`
`or - so
`
`A RFES
`
`
`
`STORAGE --
`| MEDU
`
`1
`
`DISH 1001
`
`
`
`U.S. Patent
`
`Jan. 13, 2015
`
`Sheet 1 of 4
`
`US 8,934,535 B2
`
`COMPRESSION
`- ALGORTHMS
`1 . -3
`
`-Y -
`
`y8
`2
`4- compression
`CONROER
`|COASON F-DAIA 10-? -
`
`iDATAPROFILES
`
`co-si 8.
`
`
`
`STORAGE --
`Eiji
`mutuawed
`
`f
`
`FG.
`
`2
`
`
`
`U.S. Patent
`
`Jan. 13, 2015
`
`Sheet 2 of 4
`
`US 8,934,535 B2
`
`&
`
`R
`
`& ,
`
`firii;
`NTIALZECOPRESSIONSYSTEM
`WTHDEFAULTASY ETRIC r-g
`COMPRESSIONEQUINE
`--arra-a-a-as-exa
`DECORRESSOFERATING SYSTER
`SINGEEFAJASYSTREC is 2
`CRESSENGINE
`Xiaop-oopercept -My
`22- -- N
`YES : -8.
`is
`polskBEADCOSS ANDY
`23-Nui
`y;626
`-
`2nyes 1
`s
`- ECO3FESSESNG >'s -
`s
`i
`-RENTAGORT2
`y S 2 f
`SELECTCO PRESSION" p.
`wars
`i? SEECASERCA
`DECOPRESSION ROUTINE
`DECOMPE
`it. RONE.
`ASSOCAE AIAA
`28
`lyES
`stafa. - S
`-
`1 iss
`woECOPRE
`SSAA
`first ERGYESSEC Route
`r
`NRESL- NEGUSE - NQ
`25
`^^
`s
`i NQ
`3.
`SELECOPRESSION
`ROUTINE PROVINGFASTER
`L. COPRESENRIE is
`G. 2
`
`s
`
`Aaaaaa.
`
`^
`
`
`
`NO
`Y.
`Saar-May
`
`-Y -
`aYaaxxxxxxxxYsraaaasrara Yarkar.arrrrrrr.
`
`axas-a-a-awasara as Yassraraxrrrr.
`
`3
`
`
`
`U.S. Patent
`
`Jan. 13, 2015
`
`Sheet 3 of 4
`
`US 8,934,535 B2
`
`raxasrararasarayara Yaaaaasarassassara-sasasax Yaraassrssserrara
`
`arraaXXaXaXaa
`
`t
`
`v
`
`x-rre
`
`foxey
`
`s i N-- W
`
`s
`Raasa
`assis
`
`N r
`
`-aaa.
`
`******~~~~,- - - ----**&
`
`? * *
`
`º.***************-----**************&&&&&&•••••••••••~~~……….……………………….
`
`xxaaaa
`
`************************)*)*)*)*)*)*)*)*,,+,+,+,+,·,·,·* & %.
`
`Xaxa. x
`
`axaaaaa.
`
`----*******************&&&&&
`·······º·:·º·:·~~~~
`
`
`
`
`
`~~~~~~*~~~~******************~~~~*~~~~~.~~~~~ ~~~~.~~~~ ~~~~,~-------
`
`
`
`4
`
`
`
`U.S. Patent
`
`Jan. 13, 2015
`
`Sheet 4 of 4
`
`US 8,934,535 B2
`
`*YSCA, SK
`
`Superblock se
`
`
`
`•ramasarar-rw-w
`
`Supertiock
`aia
`
`xxxrvaxx-xxxx
`
`warraxxar
`
`
`
`|- Type. 3 is
`3 hits
`8 its
`"Sector Court 8 bits
`LBA
`32 bits
`F.G. (3
`
`Marrrrrrr,
`
`5
`
`
`
`US 8,934,535 B2
`
`1.
`SYSTEMIS AND METHODS FOR VIDEO AND
`AUDIO DATA STORAGE AND DISTRIBUTION
`
`CROSS-REFERENCE TO RELATED
`APPLICATIONS
`
`This application is a continuation of U.S. patent applica
`tion Ser. No. 13/154.239, filed on Jun. 6, 2011, now U.S. Pat.
`No. 8,553,759, which is a continuation of U.S. patent appli
`cation Ser. No. 12/123,081, filed on May 19, 2008, now U.S.
`Pat. No. 8,073,047, which is a continuation of U.S. patent
`application Ser. No. 10/076,013, filed on Feb. 13, 2002, now
`U.S. Pat. No. 7,386,046, which claims the benefit of U.S.
`Provisional Application No. 60/268.394, filed on Feb. 13,
`2001, each of which is fully incorporated herein by reference
`in its entirety.
`
`10
`
`15
`
`BACKGROUND
`
`2
`memory for most applications. Thus, in order to offer suffi
`cient memory for the operating system(s), application pro
`grams, and user data, computers often use various forms of
`popular off-processor high speed memory including static
`random access memory (SRAM), synchronous dynamic ran
`dom access memory (SDRAM), synchronous burst static ram
`(SBSRAM). Due to the prohibitive cost of the high-speed
`random access memory, coupled with their power Volatility, a
`third lower level of the hierarchy exists for non-volatile mass
`storage devices. While mass storage devices offer increased
`capacity and fairly economical data storage, their data storage
`and retrieval bandwidth is often much less in relation to the
`other elements of a computing system.
`Computers systems represent information in a variety of
`manners. Discrete information Such as text and numbers are
`easily represented in digital data. This type of data represen
`tation is known as symbolic digital data. Symbolic digital
`data is thus an absolute representation of data Such as a letter,
`figure, character, mark, machine code, or drawing.
`Continuous information Such as speech, music, audio,
`images and video, frequently exists in the natural world as
`analog information. As is well known to those skilled in the
`art, recent advances in very large scale integration (VLSI)
`digital computer technology have enabled both discrete and
`analog information to be represented with digital data. Con
`tinuous information represented as digital data is often
`referred to as diffuse data. Diffuse digital data is thus a rep
`resentation of data that is of low information density and is
`typically not easily recognizable to humans in its native form.
`Modern computers utilize digital data representation
`because of its inherent advantages. For example, digital data
`is more readily processed, stored, and transmitted due to its
`inherently high noise immunity. In addition, the inclusion of
`redundancy in digital data representation enables error detec
`tion and/or correction. Error detection and/or correction
`capabilities are dependent upon the amount and type of data
`redundancy, available error detection and correction process
`ing, and extent of data corruption.
`One outcome of digital data representation is the continu
`ing need for increased capacity in data processing, storage,
`and transmittal. This is especially true for diffuse data where
`increases in fidelity and resolution create exponentially
`greater quantities of data. Data compression is widely used to
`reduce the amount of data required to process, transmit, or
`store a given quantity of information. In general, there are two
`types of data compression techniques that may be utilized
`either separately or jointly to encode/decode data: lossless
`and lossy data compression.
`Over the last decade, computer processor performance has
`improved by at least a factor of 50. During this same period,
`magnetic disk storage has only improved by a factor of 5.
`Thus one additional problem with the existing art is that
`memory storage devices severely limit the performance of
`consumer, entertainment, office, workStation, servers, and
`mainframe computers for all disk and memory intensive
`operations.
`For example, magnetic disk mass storage devices currently
`employed in a variety of home, business, and Scientific com
`puting applications suffer from significant seek-time access
`delays along with profound read/write data rate limitations.
`Currently the fastest available (15,000) rpm disk drives Sup
`port only a 40.0 Megabyte per second data rate (MB/sec).
`This is in stark contrast to the modern Personal Computers
`Peripheral Component Interconnect (PCI) Bus's input/output
`capability of 512 MB/sec and internal local bus capability of
`1600 MB/sec.
`
`25
`
`30
`
`45
`
`1. Technical Field
`The present invention relates generally to data compres
`sion and decompression and, in particular, to a system and
`method for compressing and decompressing databased on an
`actual or expected throughput (bandwidth) of a system that
`employs data compression. Additionally the present inven
`tion relates to the Subsequent storage, retrieval, and manage
`ment of information in data storage devices utilizing either
`compression and/or accelerated data storage and retrieval
`bandwidth.
`2. Description of the Related Art
`There are a variety of data compression algorithms that are
`currently available, both well-defined and novel. Many com
`pression algorithms define one or more parameters that can be
`varied, either dynamically or a-priori, to change the perfor
`mance characteristics of the algorithm. For example, with a
`35
`typical dictionary based compression algorithm such as Lem
`pel-Ziv, the size of the dictionary can affect the performance
`of the algorithm. Indeed, a large dictionary may be employed
`to yield very good compression ratios but the algorithm may
`take a longtime to execute. If speed were more important than
`40
`compression ratio, then the algorithm can be limited by
`selecting a smaller dictionary, thereby obtaining a much
`faster compression time, but at the possible cost of a lower
`compression ratio. The desired performance of a compression
`algorithm and the system in which the data compression is
`employed, will vary depending on the application.
`Thus, one challenge in employing data compression for a
`given application or system is selecting one or more optimal
`compression algorithms from the variety of available algo
`rithms. Indeed, the desired balance between speed and effi
`ciency is typically a significant factor that is considered in
`determining which algorithm to employ for a given set of
`data. Algorithms that compress particularly well usually take
`longer to execute whereas algorithms that execute quickly
`usually do not compress particularly well.
`Accordingly, a system and method that would provide
`dynamic modification of compression system parameters so
`as to provide an optimal balance between execution speed of
`the algorithm (compression rate) and the resulting compres
`sion ratio, is highly desirable.
`Yet another problem within the current art is data storage
`and retrieval bandwidth limitations. Modern computers uti
`lize a hierarchy of memory devices. In order to achieve maxi
`mum performance levels, modern processors utilize onboard
`memory and on board cache to obtain high bandwidth access
`to both program and data. Limitations in process technologies
`currently prohibit placing a sufficient quantity of onboard
`
`50
`
`55
`
`60
`
`65
`
`6
`
`
`
`US 8,934,535 B2
`
`3
`Another problem within the current art is that emergent
`high performance disk interface standards such as the Small
`Computer Systems Interface (SCSI-3), iSCSI, Fibre Channel,
`AT Attachment UltraDMA/100+, Serial Storage Architec
`ture, and Universal Serial Bus offer only higher data transfer 5
`rates through intermediate data buffering in random access
`memory. These interconnect Strategies do not address the
`fundamental problem that all modern magnetic disk storage
`devices for the personal computer marketplace are still lim
`ited by the same typical physical media restriction. In prac- 10
`tice, faster disk access data rates are only achieved by the high
`cost solution of simultaneously accessing multiple disk
`drives with a technique known within the art as data striping
`and redundant array of independent disks (RAID).
`RAID systems often afford the user the benefit of increased 15
`data bandwidth for data storage and retrieval. By simulta
`neously accessing two or more disk drives, data bandwidth
`may be increased at a maximum rate that is linear and directly
`proportional to the number of disks employed. Thus another
`problem with modern data storage systems utilizing RAID 20
`systems is that a linear increase in data bandwidth requires a
`proportional number of added disk storage devices.
`Another problem with most modern mass storage devices
`is their inherent unreliability. Many modern mass storage
`devices utilize rotating assemblies and other types of electro- 25
`mechanical components that possess failure rates one or more
`orders of magnitude higher than equivalent solid state
`devices. RAID systems employ data redundancy distributed
`across multiple disks to enhance data storage and retrieval
`reliability. In the simplest case, data may be explicitly 30
`repeated on multiple places on a single disk drive, on multiple
`places on two or more independent disk drives. More com
`plex techniques are also employed that Support various trade
`offs between data bandwidth and data reliability.
`Standard types of RAID systems currently available 35
`include RAID Levels 0, 1, and 5. The configuration selected
`depends on the goals to be achieved. Specifically data reli
`ability, data validation, data storage/retrieval bandwidth, and
`cost all play a role in defining the appropriate RAID data
`storage solution. RAID level 0 entails pure data striping 40
`across multiple disk drives. This increases data bandwidth at
`best linearly with the number of disk drives utilized. Data
`reliability and validation capability are decreased. A failure of
`a single drive results in a complete loss of all data. Thus
`another problem with RAID systems is that low cost 45
`improved bandwidth requires a significant decrease in reli
`ability.
`RAID Level 1 utilizes disk mirroring where data is dupli
`cated on an independent disk Subsystem. Validation of data
`amongst the two independent drives is possible if the data is 50
`simultaneously accessed on both disks and Subsequently
`compared. This tends to decrease data bandwidth from even
`that of a single comparable disk drive. In systems that offer
`hot swap capability, the failed drive is removed and a replace
`ment drive is inserted. The data on the failed drive is then 55
`copied in the background while the entire system continues to
`operate in a performance degraded but fully operational
`mode. Once the data rebuild is complete, normal operation
`resumes. Hence, another problem with RAID systems is the
`high cost of increased reliability and associated decrease in 60
`performance.
`RAID Level 5 employs disk data striping and parity error
`detection to increase both data bandwidth and reliability
`simultaneously. A minimum of three disk drives is required
`for this technique. In the event of a single disk drive failure, 65
`that drive may be rebuilt from parity and other data encoded
`on disk remaining disk drives. In systems that offer hot Swap
`
`4
`capability, the failed drive is removed and a replacement drive
`is inserted. The data on the failed drive is then rebuilt in the
`background while the entire system continues to operate in a
`performance degraded but fully operational mode. Once the
`data rebuild is complete, normal operation resumes.
`Thus another problem with redundant modern mass stor
`age devices is the degradation of data bandwidth when a
`storage device fails. Additional problems with bandwidth
`limitations and reliability similarly occur within the art by all
`other forms of sequential, pseudo-random, and random
`access mass storage devices. Typically mass storage devices
`include magnetic and optical tape, magnetic and optical
`disks, and various solid-state mass storage devices. It should
`be noted that the present invention applies to all forms and
`manners of memory devices including storage devices utiliz
`ing magnetic, optical, neural and chemical techniques or any
`combination thereof.
`Yet another problem within the current art is the applica
`tion and use of various data compression techniques. It is well
`known within the current art that data compression provides
`several unique benefits. First, data compression can reduce
`the time to transmit data by more efficiently utilizing low
`bandwidth data links. Second, data compression economizes
`on data storage and allows more information to be stored for
`a fixed memory size by representing information more effi
`ciently.
`For purposes of discussion, data compression is canoni
`cally divided into lossy and lossless techniques. Lossy data
`compression techniques provide for an inexact representation
`of the original uncompressed data Such that the decoded (or
`reconstructed) data differs from the original unencoded/un
`compressed data. Lossy data compression is also known as
`irreversible or noisy compression. Negentropy is defined as
`the quantity of information in a given set of data. Thus, one
`obvious advantage of lossy data compression is that the com
`pression ratios can be larger than that dictated by the negent
`ropy limit, all at the expense of information content, Many
`lossy data compression techniques seek to exploit various
`traits within the human senses to eliminate otherwise imper
`ceptible data. For example, lossy data compression of visual
`imagery might seek to delete information content in excess of
`the display resolution or contrast ratio of the target display
`device.
`On the other hand, lossless data compression techniques
`provide an exact representation of the original uncompressed
`data. Simply stated, the decoded (or reconstructed) data is
`identical to the original unencoded/uncompressed data. Loss
`less data compression is also known as reversible or noiseless
`compression. Thus, lossless data compression has, as its cur
`rent limit, a minimum representation defined by the entropy
`of a given data set.
`A rich and highly diverse set of lossless data compression
`and decompression algorithms exist within the current art.
`These range from the simplest "adhoc approaches to highly
`Sophisticated formalized techniques that span the Sciences of
`information theory, statistics, and artificial intelligence. One
`fundamental problem with almost all modern approaches is
`the compression ratio to encoding and decoding speed
`achieved. As previously stated, the current theoretical limit
`for data compression is the entropy limit of the data set to be
`encoded. However, in practice, many factors actually limit the
`compression ratio achieved. Most modern compression algo
`rithms are highly content dependent. Content dependency
`exceeds the actual statistics of individual elements and often
`includes a variety of other factors including their spatial loca
`tion within the data set.
`
`7
`
`
`
`5
`Of popular compression techniques, arithmetic coding
`possesses the highest degree of algorithmic effectiveness, and
`as expected, is the slowest to execute. This is followed in turn
`by dictionary compression, Huffman coding, and run-length
`coding with respectively decreasing execute times. What is
`not apparent from these algorithms, that is also one major
`deficiency within the current art, is knowledge of their algo
`rithmic efficiency. More specifically, given a compression
`ratio that is within the effectiveness of multiple algorithms,
`the question arises as their corresponding efficiency.
`Within the current art there also presently exists a strong
`inverse relationship between achieving the maximum (cur
`rent) theoretical compression ratio, which we define as algo
`rithmic effectiveness, and requisite processing time. For a
`given single algorithm the effectiveness over a broad class of
`data sets including text, graphics, databases, and executable
`object code is highly dependent upon the processing effort
`applied. Given a baseline data set, processor operating speed
`and target architecture, along with its associated Supporting
`memory and peripheral set, we define algorithmic efficiency
`as the time required to achieve a given compression ratio.
`Algorithmic efficiency assumes that a given algorithm is
`implemented in an optimum object code representation
`executing from the optimum places in memory. This is almost
`never achieved in practice due to limitations within modern
`optimizing software compilers. It should be further noted that
`an optimum algorithmic implementation for a given input
`data set may not be optimum for a different data set. Much
`work remains in developing a comprehensive set of metrics
`for measuring data compression algorithmic performance,
`30
`however for present purposes the previously defined terms of
`algorithmic effectiveness and efficiency should suffice.
`Various solutions to this problem of optimizing algorith
`mic implementation are found in U.S. Pat. Nos. 6,195,024
`and 6,309,424, issued on Feb. 27, 2001 and Oct. 30, 2001,
`respectively, to James Fallon, both of which are entitled
`“Content Independent Data Compression Method and Sys
`tem, and are incorporated herein by reference. These patents
`describe data compression methods that provide content-in
`dependent data compression, wherein an optimal compres
`40
`sion ratio for an encoded stream can be achieved regardless of
`the data content of the input data stream. As more fully
`described in the above incorporated patents, a data compres
`sion protocol comprises applying an input data stream to each
`of a plurality of different encoders to, in effect, generate a
`plurality of encoded data streams. The plurality of encoders
`are preferably selected based on their ability to effectively
`encode different types of input data. The final compressed
`data stream is generated by selectively combining blocks of
`the compressed streams output from the plurality of encoders
`based on one or more factors such as the optimal compression
`ratios obtained by the plurality of decoders. The resulting
`compressed output stream can achieve the greatest possible
`compression, preferably in real-time, regardless of the data
`COntent.
`Yet another problem within the current art relates to data
`management and the use of existing file management sys
`tems. Present computer operating systems utilize file man
`agement systems to store and retrieve information in a uni
`form, easily identifiable, format. Files are collections of
`executable programs and/or various data objects. Files occur
`in a wide variety of lengths and must be stored within a data
`storage device. Most storage devices, and in particular, mass
`storage devices, work most efficiently with specific quantities
`of data. For example, modern magnetic disks are often
`divided into cylinders, heads and sectors. This breakout arises
`from legacy electro-mechanical considerations with the for
`
`50
`
`45
`
`55
`
`60
`
`65
`
`US 8,934,535 B2
`
`10
`
`15
`
`25
`
`35
`
`6
`mat of an individual sector often some binary multiple of
`bytes (512, 1024, ...). A fixed or variable quantity of sectors
`housed on an individual track. The number of sectors permit
`ted on a single track is limited by the number of reliable flux
`reversals that can be encoded on the storage media per linear
`inch, often referred to as linearbit density. In disk drives with
`multiple heads and disk media, a single cylinder is comprised
`of multiple tracks.
`A file allocation table is often used to organize both used
`and unused space on a mass storage device. Since a file often
`comprises more than one sector of data, and individual sec
`tors or contiguous strings of sectors may be widely dispersed
`over multiple tracks and cylinders, a file allocation table
`provides a methodology of retrieving a file orportion thereof.
`File allocation tables are usually comprised of strings of
`pointers or indices that identify where various portions of a
`file are stored.
`In-order to provide greater flexibility in the management of
`disk storage at the media side of the interface, logical block
`addresses have been Substituted for legacy cylinder, head,
`sector addressing. This permits the individual disk to opti
`mize its mapping from the logical address space to the physi
`cal sectors on the disk drive. Advantages with this technique
`include faster disk accesses by allowing the disk manufac
`turer greater flexibility in managing data interleaves and other
`high-speed access techniques. In addition, the replacement of
`bad media sectors can take place at the physical leveland need
`not be the concern of the file allocation table or host computer.
`Furthermore, these bad sector replacement maps are defin
`able on a disk by disk basis.
`Practical limitations in the size of the data required to both
`represent and process an individual data blockaddress, along
`with the size of individual data blocks, governs the type offile
`allocation tables currently in use. For example, a 4096 byte
`logical block size (8 sectors) employed with 32 bit logical
`block addresses. This yields an addressable data space of
`17.59 Terabytes. Smaller logical blocks permit more efficient
`use of disk space. Larger logical blocks Support a larger
`addressable data space. Thus one limitation within the current
`art is that disk file allocation tables and associated file man
`agement systems are a compromise between efficient data
`storage, access speed, and addressable data space.
`Data in a computer has various levels of information con
`tent. Even within a single file, many data types and formats
`are utilized. Each data representation has specific meaning
`and each may hold differing quantities of information. Within
`the current art, computers process data in a native, uncom
`pressed, format. Thus compressed data must often be decom
`pressed prior to performing various data processing functions
`or operations. Modern file systems have been designed to
`work with data in its native format. Thus another significant
`problem within the current art is that file systems are notable
`to randomly access compressed data in an efficient manner.
`Further aggravating this problem is the fact that when data
`is decompressed, processed and recompressed it may not fit
`back into its original disk space, causing disk fragmentation
`or complex disk space reallocation requirements. Several
`solutions exist within the current art including file by file and
`block structured compressed data management.
`In file by file compression, each file is compressed when
`stored on disk and decompressed when retrieved. For very
`Small files this technique is often adequate, however for larger
`files the compression and decompression times are too slow,
`resulting in inadequate system level performance. In addition,
`the ability to access randomly access data within a specific file
`is lost. The one advantage to file by file compression tech
`niques is that they are easy to develop and are compatible with
`
`8
`
`
`
`US 8,934,535 B2
`
`5
`
`10
`
`15
`
`30
`
`35
`
`45
`
`7
`existing file systems. Thus file by file compressed data man
`agement is not an adequate Solution.
`Block structured disk compression operates by compress
`ing and decompressing fixed block sizes of data. Block sizes
`are often fixed, but may be variable in size. A single file
`usually is comprised of multiple blocks, however a file may
`be so Small as to fit within a single block. Blocks are grouped
`together and stored in one or more disk sectors as a group of
`Blocks (GOBs). A group of blocks is compressed and decom
`pressed as a unit, thus there exists practical limitations on the
`size of GOBs. Most compression algorithms achieve a higher
`level of algorithmic effectiveness when operating on larger
`quantities of data. Restated, the larger the quantity of data
`processed with a uniform information density, the higher the
`compressions ratio achieved. If GOBs are Small compression
`ratios are low and processing time short. Conversely, when
`GOBS are large compression ratios are higher and processing
`time is longer. Large GOBS tend to perform in a manner
`analogous to file by file compression. The two obvious ben
`efits to block structured disk compression are pSuedo-random
`data access and reduced data compression/decompression
`processing time.
`Several problems exist within the current art for the man
`agement of compressed blocks. One method for storage of
`25
`compressed files on disk is by contiguously storing all GOBS
`corresponding to a single file. However as files are processed
`within the computers, files may grow or shrink in size. Inef
`ficient disk storage results when a Substantial file size reduc
`tion occurs. Conversely when a file grows Substantially, the
`additional space required to store the data may not be avail
`able contiguously. The result of this process is substantial
`disk fragmentation and slower access times.
`An alternate method is to map compressed GOBs into the
`next logical free space on the disk. One problem with this
`method is that average file access times are substantially
`increased by this technique due to the random data storage.
`Peak access delays may be reduced since the statistics behave
`with a more uniform white spectral density, however this is
`not guaranteed.
`40
`A further layer of complexity is encountered when com
`pressed information is to be managed on more than one data
`storage device. Competing requirements of data access band
`width, data reliability/redundancy, and efficiency of storage
`space are encountered.
`These and other limitations within the currentart are solved
`with the present invention.
`
`SUMMARY OF THE INVENTION
`
`The present invention is directed to a system and method
`for compressing and decompressing based on the actual or
`expected throughput (bandwidth) of a system employing data
`compression and a technique of optimizing based upon
`planned, expected, predicted, or actual usage.
`In one aspect of the present invention, a system for provid
`ing bandwidth sensitive data compression comprises:
`a data compression system for compressing and decom
`pressing data input to the system;
`a plurality of compression routines selectively utilized by
`the data compression system; and
`a controller for tracking the throughput of the system and
`generating a control signal to select a compression rou
`tine based on the system throughput. In a preferred
`embodiment, when the controller determines that the
`system throughput falls below a predetermined through
`put threshold, the controller commands the data com
`
`50
`
`55
`
`60
`
`65
`
`8
`pression engine to use a compression routine providing
`a faster rate of compression so as to increase the through
`put.
`In another aspect, a system for providing bandwidth sen
`sitive data compression comprises a plurality of access pro
`files, operatively accessible by the controller that enables the
`controller to determine a compression routine that is associ
`ated with a data type of the data to be compressed. The access
`profiles comprise information that enables the controller to
`select a suitable compression algorithm that provides a
`desired balance between execution speed (rate of compres
`sion) and efficiency (compression ratio).
`In yet another aspect, a system comprises a data storage
`controller for controlling the compression and storage of
`compressed data to a storage device and the retrieval and
`decompression of compressed data from the storage device.
`The system throughput tracked by the controller preferably
`comprises a number of pending access requests to a storage
`device.
`In another aspect, the system comprises a data transmission
`controller for