`
`USOOS809295A
`
`5,809,295
`[11] Patent Number:
`[19]
`UnIted States Patent
`
`Straub et al.
`[45] Date of Patent:
`Sep. 15, 1998
`
`[54] METHOD AND APPARATUS FOR STORING
`COMPRESSED IFII_.EI)A1A()NAI)ISK
`,
`,
`,
`WHLRE EACH MDFA1 DATA STRUCTURE
`INCLUDES AN EXTRA BYTE
`Inventors: Eric John Slrztul); Richard P.
`Jemigan, IV, both 01‘ Kirkland; Scott
`1). Quinn, Issaquah; Peter Blackburn
`Stewart, Redmond, all of Wash.
`
`[75]
`
`OTHER PUBLICATIONS
`,
`-.
`.
`-
`.
`Peter Norton, (nude to DOS Updated to (.over (12, pp.
`4541440,“, ’450451, 1994
`Stac Electronic’s Stacker, Data Sources Report, Computer
`Select, 1994.
`Christopher Barr, “Slacker 3.1 for Windows and DOS", PC
`Magzine, vol. 12, No. 15, Sep. 14, 1993, pp. 124—126.
`
`Primary Examiner—Wayne Amsbury
`Assistant Hrmm'm’r—Greta 1,. Robinson
`
`[73
`
`[22
`
`[51
`[52
`
`
`
`58‘.
`
`.
`[56'
`
`
`
`[21- App}. No; 506,971
`‘
`
`Assigncc: Micmsolt Corporation, Redmond,
`Wash
`
`linemen Agent, or Firm—Jones & Askew, LLP
`[57]
`ABSTRACT
`
`I-‘iled:
`
`Sep. 26, 1995
`
`. 5
`Int: ('1'
`U.S. (.1.
`
`:
`,
`_ (70“ ”’80
`.......................... 395K601; 3933x612; 3953'602;
`395E439; 36492183
`Field of Search ..................................... 393430, 600,
`395,401, 601, 602, 612, 439; 364,222.81
`
`a.
`References Cited
`US. PATENT DOCUMENTS
`
`S,23?,o?5
`5,500,530
`5_,55I,(12tt
`5,5?4,€Kt?
`
`8:19.03 Hannon,Jr.
`395.:‘425
`4,“!996 Whiting:l cl ul.
`..... 341.."51
`
`8,!1990 Flax et at.
`.........
`395K312
`11,-1996 Jernigan, [V ct al.
`395tottt
`
`10
`
`
`
`\ 12
`
`A method and apparatus for storing compressed lile data
`stored on a disk. the method includes an improved format
`[or
`the Compressed Volume I-‘ile
`(CV13), and more
`specifically,
`If) an improved format for the MDFAT data
`structure stored within the CVF. The improved format
`includes using an additional byte for maintaining each entry
`in the MDFAT data structure which increases the number of
`sectors within the Sector llea
`that can be addressed and
`P
`accessed. The improved formal further allows compressed
`clusters to be stored in fragments in various portions of
`vacant Storage space located throughout the Sector ”cap.
`The new format for the MDFAT data structure includes a bit
`that identifies whether each cluster is being stored in frag-
`ments. The method stores the locations of each of the
`
`fragments in a Fragment Pointer List
`SCCIUI‘ of [he 111-51 fragmenL
`
`located in the first
`
`44 Claims, 13 Drawing Sheets
`
`16
`
`‘17
`
`
`
`OPERATING SYSTEM
`
`COMPRESSION DEVICE
`DRIVER
`
`‘18
`
`15
`
`COMPRESSED
`VOLUME FILE
`
`MAGNETIC
`DISK DRIVE
`
`14
`
`1
`
`Apple v. Realtime
`Proceeding No. |PR2016-01737
`APPLE 1037
`
`Apple v. Realtime
`Proceeding No. IPR2016-01737
`APPLE 1037
`
`1
`
`
`
`SU
`
`ma
`
`m,
`
`3
`
`5,809,295
`
`mII
`
`(TO—u.
`
`“.0m
`
`ammDZD
`
`
`
`Nn-m©«mugawar:aNmumw
`
`
`tug?EHmmOHUm—mmIIIIIIIIIEma“.
`
`
`
`
`
`a.>E_2m_emNFEENEEExamNEEEE«En&0UUUw.__n_mm:=n_mm.__u_<m_I__n_<man.DmmwmmnnzOUZD
`
`
`1\
`
`mm._.m3._0m>mOhom—ED.5".av
`IEI>m._.Zm_ml=m>mk2m.
`
`2
`
`
`
`
`
`US. Patent
`
`Sep. 15,1998
`
`Sheet 2 0f 13
`
`5,809,295
`
`tag—bums
`
`IE
`
`II
`
`III
`
`vm
`
`m—.O_..._
`
`bin—2
`
`35,4541
`
`Iguana-II
`
`Hanan-Hal
`
`Hagan-HE!
`
`laaflaafil
`
`Hanan-Hal
`
`luau-i=5:
`
`Ian-IEEEI
`
`Inna-i=5!
`
`IIIIII'I
`
`mmmmhmmm
`Nme.vmmNoV.vmme.vNm
`mewe5mmmmanmmcm.2.m_.
`
`Ommmmmgu
`
`42.49
`
`“*0LL!
`
`0Lu
`0')
`
`I)
`
`
`>m._.Zm$5.930
`
`
`0L
`
`L
`
`IJJ
`
`3
`
`
`
`
`
`
`US. Patent
`
`Sep. 15,1998
`
`Sheet 3 0f 13
`
`5,809,295
`
`
`
`IHOZMJmOFQmm
`
`
`
`
`
`0.4m“.OZFmEb06.x".
`
`
`
`3DD
`
`D
`
`HERE!
`
`0—.0Eaem
`
`IIn-EI:al-aIEanII!myII
`aam5530mo".4.:mIVbmm
`
`>E_2m_>55EE.Sad
`Q.mma-Bam?S3..mmmm-m“a-2EHmmohmm
`>E§m>E2m3mumm88N45ammwmmnsoU
`mmmmmmmwom:\
`
`>m20w§n
`
`
`
`{{rill{{{{{r1111.[Mlt
`
`mm
`
`Elfilfilllfig:J
`
`
`DU
`
`J
`
`4
`
`
`
`
`
`
`
`US. Patent
`
`Sep. 15,1998
`
`Sheet 4 0f 13
`
`5,809,295
`
`OPERATING SYSTEM
`
`COMPRESSION DEVICE
`DRIVER
`
`DISK DRIVE
`
`COMPRESSED
`VOLUME FILE
`
`MAGNETIC
`
`FIG.2
`
`
`
`
`
`
`
`
`
`
`
`
`FAT
`
`ROOT DIRECTORY
`
`SECTOR
`HEAP
`
`COMPRESSED
`VOLUME
`FILE
`
`
`
`FIGJ
`
`5
`
`
`
`US. Patent
`
`Sep. 15, 1998
`
`Sheet 5 of 13
`
`5,809,295
`
`
`
`momamflnzoomE9:29,9:5523$50me52536.6:BEBEm_mE30>
`
`
`
`
`
`
`E90033u592m9:mEEBumD90mm:m_:22;65:U905m_“:69.:ME>:omamuE:E_xm§m5.552thE.:o_EE..OE_oEomambomamszmin.mmmm00-m_2cm.mm::23;932:5<
`
`
`
`
`9,:U6.mnEzc:29?9:Gm?.3039.
`
`
`
`5:95.me
`
`.9303.
`€283
`
`25
`
`EmtO
`
`
`
`
`
`503..#582EmE20>nommEQEoo
`
`
`
`.8w.@moo-m§9:20.)958925M3mmioufigUcmNoEwmOQ-w_2.9.229
`
`
`
`
`
`
`
`
`
`#5529:V63903@555as....u_>um5.6EomqmuEaEimS.9.:U555;:
`
`
`
`
`
`ammIBEmw9:E83032939.0.3...man»::22;$5523mto295<
`
`
`m223$£5.5mam9:.62.6ummmmanQ.2:cosumac
`
`
`
`
`
`comm5.,035m2:52:5as229:..5530a5..SanEmmmEQEOo2:2639:53:852:3
`
`
`
`55mm9:U6mamE250m5:09.6535.3?EBmm:E8QOon...E6m5
`*03_om%oE:E_xm_29:”—o5:83a2._.<u=_mor:.2338238QO9:.
`
`dam;
`
`98.2«53mm333a9:32g“563.é:.3958”.
`
`
`
`
`
`.02»,m_553050532.50m5.3o2tea.3:E255896532509:tFE:n<‘323$3mam:55mm9:E553some5..:nmwas._.<n=_mmptr
`
`.aman:@829mmomnmgtom6
`
`8:9-5%
`
`—<v.0_u_
`
`
`
`.N«GMIDOE0.»m2:20.3%:
`
`.mmmos.m5.6on“mateum6:28n5632_,mEEom>9938m_
`
`$553352..2xmficmommBonEg>9.5Damn
`
`
`_.u“m..__u..§>
`
`hon—06m
`
`6
`
`
`
`
`
`
`
`US. Patent
`
`Sep. 15, 1998
`
`Sheet 6 of 13
`
`5,809,295
`
`._.«6$503.0....m2......10.9.3.
`
`.mmmoz9:.6mmmatea2582a58%"at
`
`
`
`mg“mateum0:30”.“a.3039.355.Cam9::_mm:929...“.Eav.65$3.83
`
`£meQESm3ES.8555,mmomnwmsto5"..552$5U5530m“E:or:”B
` «(1.0:—
`
`
`
`
`
`
`tEwEm_QEBmmomamgto5rarc..”_>u05E583m_a_::EL3ummcwgmmvmE332392.9.5393:Louomm055:589mfimmxm$5..3233»
`.85392$5.6639:225.mE:_o>UmmwfiaES9.:E0:053mmmUmEémwmmaco8:939mmmEEEmEn.m:29?mOo-w_>_ofEnoncommuE93025m3...
`
`
`
`
`
`
`
`mBOUEE5:2,..09:3mmBEESmmQEUmomnwgto9.:t:96658025*6mm:5:2$5.?.925momnwgtom55...633502m2...
`
`ammo—29:”—05“metaE6:30”..w.303mEtEm635mumqmgtom5.6».3202630..9:
`
`
`
`
`
`.meo.318$me2:.mmE3_o>N:29?momamwzhoBu.xmwxo.85
`
`.mn_mn=29:96
`
`7
`
`
`
`US. Patent
`
`Sep. 15, 1998
`
`Sheet 7 of 13
`
`5,809,295
`
`
`
`vammaqutOhm2:IUth:_.mwo—”—
`
`
`
`Ea9.908
`
`gunk...
`
`I a
`
`
`
`n.2903
`
`
`
`
`
`EmmODwE9t>222mm:908mm353:5:E:E_me
`
`touatomou@5me$16
`
`momammzho>335:9.2
`
`RmEats990mm.5E300hum—0350mm:
`
`
`
`
`
`
`
`
`
`
`
`
`
`252:o.280th52m_Em:25-3855cam:m_990...ammo-cam:55.39..ammoI
`
`
`=Eon;9:26:me9E
`once
`
`cozatomma
`
`
` :58c3389mes:.mn_>u520c:9063 .9was25.m08,159525E.
`
`
`39:::6&6mmomamgto:_.tEmao26:a9:on«mam0:333:335520
`a:0$5“com8“comcan._.<u_283Sn2@55qu:03:26:m:2.2m2...
`
`acme3van5293Fc8339
`
`8829muéwxoc:«m
`
`AEGSNnew
`
`mamas.
`
`as9gm
`
`
`
`I
`In
`
`«mmtO
`
`an923
`
`.998
`
`Ba9903
`
`.383
`
`gimme
`
`9.0“,me
`
`mg
`
`ab22:9)h....589:5:
`
`390.85
`
`motEm
`
`50m
`
`8
`
`
`
`
`
`
`
`
`
`
`
`US. Patent
`
`Sep. 15, 1998
`
`Sheet 8 of 13
`
`5,809,295
`
`
`
`
`
`om><>3<-xmficmom>203mg?625953633E:E_me>9umEELEm—u
`
`EVmmbwt0....mZD10.3.3
`
`momamgtn—>9590:9o
`
`
`
`.5>:omamoEzE_xm_29.:mm2:....026mODmEE2908Est;.5LEE—E
`
`
`
`
`
`5.20E.3308E62605:5.EH:2me:o_mwm.anuumumEzmm93>2Um__a_::Edam:.9039:.5mflm623%.m52Exammm?$5.930modimkn
`
`
`
`
`
`
`
`
`
`
`
`
`
`rm:0:3959.m_38::+v*um:_oiu_oio£
`
`x2....9ommihoalmmgndnalmounbaalhg
`
`
`uoaummlhqumEB.ninmouA-nQniSo
`
`wam+.5355+3mm5Eu:+bum—gm+6mm:mafia—2tomug
`
`
`
`=035.925xm_o:_>>+39..DE
`
`
`
`
`
`2+tmfim”So3939m:#392v5.tfimm5v5553.350.
`
`
`
`.mnSummumawgtohow220;59.2o:2.:
`
`
`
`toSmmealm9>n.nanlmOUAan_15o
`
`+oJBmaolhoficmnbnnlhg
`
`\muqmmlmalhOOm—Ivmoon
`
`a6.:.633En8:53a.83m2I!
`
`
`
`mmvWEDGEOhMZEIDhqe:
`
`«£10:
`
`
`
`
`
`:OEJEE3309.5Boxas:*055mm.830.I50m
`
`33095
`
`2me
`
`mm,5E38
`
`502:
`
`thHowm
`
`mm96238
`
`905mm
`
`woucsoo
`
`mama:
`
`Nm2.908
`
`.3main
`
`352:.
`
`8%
`
`9
`
`
`
`
`
`US. Patent
`
`Sep. 15, 1998
`
`Sheet 9 of 13
`
`5,809,295
`
`mm»MIDOEE9231053.
`
`
`
`3.015;:2052559.can.55.5239
`
`
`
`
`
`:E:«Edeg?9:B5:339.2%Eta.25be.3929:Me5:823
`
`
`.553502won9:.335:80Em£2233553.8.535bias.5hmnEzc
`nfl—EOE 0259:.525m
`
`55855003663mm9:355:35555309..m..6555E0539:.
`
`5::3)9:59.:5mafia?233am53.55m_mE:_o>55:33.5:chm_
`
`
`
`525$55:25th>3“5:...va60:5.»mecm”5055833EaEceE
`
`:o_mmEanonoumEzmma»:9.$5295555:09..m.we555:5:9:.
`
`
`55513::new;550m32min9:5@5568or:555mm_mo_uo_
`
`8550:.»2.55E8mammE5m0:2.935955.«L«to..35:mm:
`
`
`
`._.<n_:m_as:E5meyaU55.50N-.=_Umo\$mulum._$n"3.0.62:552
`
`In.1'
`
`qua:585m
`
`Ema“at“.
`
`t3m
`
`553.0
`
`E50
`
`harm
`
`mammn
`
`
`
`.v553556262.53385a80535a$8mo.
`
`
`
`
`
`5953..595.o:655:.newmEEEmm?
`
`
`
`.:E__“.38259:5535.cm25.5;$33
`
`0259:525m
`
`has.:n9.A-o.EmtnupA-_.
`
`mv
`
`
`
`man:9:29,
`
`5
`
`:29?"30
`
`hmnEzc
`
`9303Nme.
`
`55:?5Q
`
`10
`
`5.2"So
`
`555:5:mm.
`
`mom
`
`mow
`
`55EEOE5
`
`mm.
`
`10
`
`
`
`US. Patent
`
`Sep. 15, 1998
`
`Sheet 10 0f 13
`
`5,809,295
`
`
`
`
`
`
`
`.8.50.05::5E00:E0_>55.50020.5“m5mEEEnEmam.mcozfimno50.0.02:203:02:3Ez_m.mo<n_m>mo350m:2upmi...
`
`
`
`
`
`050.”—
`
` .mm:
`
`
`
`
`:03mm:300%.;5352:206010555:5500005550:00=mEm«0m£535505B.0£505.,gong—55EummeaEg2Eat05E55%cm000.020m5co5050:0550
`55000“3005«mm3.0355:m590:0UU<9300052005E53...umtnmnEm
`
`mEmEmm:08mcozmuozm0055.00050...>mE2503.00Eow0303m5.550.0
`
`
`
`
`05EOE950m:0001Lowowm05E0200050505.6.LmnEzc55000..:9mmtEwomm.-H552
`
`
`
`
`
`umnEac
`
`
`
`6.5.2:.LO“—Ehwmwm
`
`Eu.
`
`ON
`
`hm
`
`
`
`
`
`.5...coszhaEm59903.000:=maEé0.2000U555:3:05.6E30tom
`
`
`
`
`50:205mBmofiE020.250.00.00080555uoEoEmmiu.05000.qumEmwc
`
`
`
`
`
`20503ucm«m:.5anEmEmmt0535:00:02;.EmEmmt“E:052:0.6
`5930m_.5020mt050200559,5:9:chSE05.5.00905.00.5
`
`
`
`
`
` UmwmmaEgcaE0302.0m5E“—55050.903“02530092000".0.353:05.55:300tom.55UmmmEaEg
`
`.EEB..
`
`
`
`
`
`.3Ummmmanog.0§00000560050.20E005200559.35tmEEEUEamt
`
`
`
`
`
`Eo>m3EmmEQEooEccmo000nm0>tocan;UmmeaEgcs.0930Ememe
`
`
`
`
`
`095.0990.050305252505:05000%05E0:35
`
`
`
`
`
`
`
`.30.:E000;.ESEEm:m.E0..50:0.6509rcou8
`
`
`
`
`
`.mEmEmm:0:A6.5500:EwEmmtmm:5303.0AA55:05.95”:
`
`w
`
`II.
`.NNI
`Ia
`
`E8882
`
`“mm
`
`mm
`
`11
`
`11
`
`
`
`
`
`
`
`
`
`US. Patent
`
`S
`
`w5’1
`
`31anHmChS
`
`5,809,295
`
`
`
`
`
`
`
`3.:gmuEOQ“$59.3mam:Loyoom
`
`533.09:.6EwEmEv.m_£%59,5:58.259:99m9.98:=95E“990$9659:5:9.:“E5:00:9m
`
`
`m....8:93:9..26953mm
`
`tfimomm
`
`
`9:99m30mm:mam:58mm.2::_56mm“2:$595.LmnEsz55$::2mm
`
`
`.5253:55mm"50m5Em2Lag—ca:E529523‘.553«Eu5..Emu
`
`D10.”—
`
`
`
`8:33me£55.35
`
`$ka:m
`
`gamma
`
`33000me
`
`12
`
`12
`
`
`
`US. Patent
`
`Sep. 15, 1998
`
`Sheet 12 of 13
`
`5,809,295
`
`
`
`
`
`53353335333333EEEEEEa—uaanuumammosmm
`
`
`
`HEHEHEIHangaalmgzoé
`
`(3.0.”—
`
`
`
`an:EEO."—22:92“.
`
`AmmOPUm—m
`
`
`Emzoq‘E02:55Em§o<E
`Ihozmjmums—DZ
`
`Ell
`
`Hg:
`
`II...
`
`IIEMIII
`
`HIIH
`
`Eli!Hwoo.rII
`
`
`
`ligll
`
`mm.o_n—
`
`13
`
`13
`
`
`
`
`
`U
`
`S Patent
`
`Sep. 15,1998
`
`Sheet 13 of 13
`
`5,809,295
`
`4.0WW9M09N
`
`_I
`
`0
`
`.q
`
`011$”.
`
`
`
`04m“—O<mu_Odin—O<mmp
`
`>m0_2m_2
`
`>mOEw§
`
`em
`
`1amww
`
`3 SVHd
`NOLLVOO']
`
`
`
`n_<m_ImOhme
`
`14
`
`vmN_.
`
`
`
`
`
`Odin.O<muOdin.059E
`
`
`
`m<mImOkUmm
`
`>m0_2w_>_
`
`14
`
`
`
`
`
`
`One of the media utilized by computers to store data is a
`magnetic disk drive. Magnetic disk drives allow computers
`to alter the data that is stored thereon, such that data can be
`read from and written onto the magnetic disk. The data is
`usually stored on the disk in groups called tiles, with each
`file being independently accessible. The location on the disk
`where each file is stored is identified and stored in a data
`structure so that the computer can quickly access the nec-
`essary data when desired.
`The storage area on the disk is divided into concentric
`circles called tracks. "the number of tracks on a disk depends
`on the size of the disk. It is not uncommon for a disk to have
`over one thousand concentric tracks. Each track on the disk
`is divided into an equal number of sectors. Typically, the
`sectors are arranged in slices, such that the sectors at the
`outer edge, or beginning. of the disk take up more linear
`space along a track than the sectors near the inner edge, or
`end, of the disk. However, the data stored within each sector
`is arranged such that each sector contains an identical
`amount of data. The location of each sector is stored in a
`special data structure on the disk,
`thereby making each
`sector independently accessible.
`The operating system of the computer groups the sectors ‘
`of the disk into clusters.
`It should be ttnderstood that this
`grouping of sectors into clusters is performed by the opcrw
`ating system of the computer, and is thus not a physical
`delimitation. For a disk that stores uncompressed data, every
`cluster on the disk contains the same number of sectors. A
`cluster usually comprises from one to sixty-four sectors,
`depending on the size of the disk, with each sector storing
`512 bytes of data. A special data structure on the disk
`contains a list of all clusters on the disk. thereby making
`each cluster independently accessible.
`The operating system of a computer stores uncompressed
`data in one or more clusters on the disk. A cluster thus
`contains the contents, or a portion thereof, of a single file.
`Large files may require several thousand clusters to hold all
`of the data associated with the file, but extremely small files
`can be stored in a single cluster. Because a cluster only stores
`data from a single file, several sectors of the cluster will not
`be used to store other data if the file does not contain enough
`data to fill all the sectors of the cluster.
`
`3o
`
`40
`
`50
`
`2
`onto a disk, the head is positioned over the appropriate area
`of the disk where the data is to be written. An electrical
`current, supplied to the head. produces a magnetic field
`which magnetizes a small area ofthe disk near the head. This
`small magnetized area represents a digital bit. Similarly,
`when data is read from a disk drive, the head is positioned
`over the appropriate magnetized area of the disk, which
`induces a current in the head. The current induced in the
`head is then decoded into digital data.
`The rigid platters of the disk drive are connected to a
`single spindle. The spindle is connected to a motor, which
`spins the disks in unison. Most disk drives include motors
`that spin the disks at a constant rate. Although the sectors of
`the disk may take up different amounts of space, the amount
`ofdata stored within each sector is identical. This allows the
`disks to spin at a constant rate to retrieve equal amounts of
`data regardless of the location of the sector on the disk.
`When a new cluster on the disk is accessed, two mechani-
`cal operations thst occur before the head actually begins to
`read or write data. First, the head, attached to a pivotable
`arm,
`is moved radially from the location of the current
`cluster to the location of the destination cluster. A certain
`amount of time is required for the arm holding the head to
`overcome the etIects of inertia and friction so as to etIect its
`movement. Additional time is required to allow the head to
`settle in a stationary position after its movement. Second, the
`head must wait for the appropriate cluster of the disk,
`containing the desired data,
`to spin beneath the head.
`Because the disk spins at a constant rate,
`the maximum
`amount of time that
`the head must wait for the desired
`cluster to pass beneath it is the time required for the platter
`to complete one spin. Therefore, each access of a new cluster
`creates an inherent delay due to the mechanical requirements
`of accessing the correct area of the disk.
`The amount of data that one disk can store is limited by
`the size of the disk and the number of tracks thereon.
`However, a technique called data compression allows the
`data in a
`file to be stored in less space on the disk.
`Compression programs, as are well known in the art, typi-
`cally identify frequently occurring strings of letters, words,
`or phrases and replace each string with a respective token.
`Because each token requires a much smaller space for
`storage than does the respective string, the file data requires
`less storage space. A file stored using such a technique is
`called a compressed file. When the compressed data is to be
`read,
`it
`is lirst accessed and then decompressed to be
`returned to its original form. Compressed files not only save
`disk storage space, but also can decrease access time
`because it can be faster [or the computer to read and
`decompress a compressed file than to read an uncompressed
`tile.
`
`The file data contained in a single uncompressed cluster,
`when stored in compressed form,
`is caller] a compressed
`cluster. A compressed cluster requires fewer sectors to store
`data than an uncompressed cluster, and thus, more than one
`compressed cluster may be stored within the sectors required
`to store data of an uncompressed cluster. Unlike disk space
`allocation for uncompressed files, which are stored in cluster
`units, compression programs allocate disk storage space by
`sector units. Compression programs thus support variable—
`length clusters for the storage of compressed data because
`only as many sectors as are necessary to store each cluster
`in compressed form are allocated for
`that purpose.
`Therefore, compression programs may allow more than one
`compressed file to be stored within the sectors that would
`otherwise comprise a single uncompressed cluster. Variable-
`length clusters, used to store compressed data, allow previ-
`
`5,809,295
`
`1
`METHOD AND APPARATUS FOR STORING
`COMPRESSED FILE DATA ON A DISK.
`WHERE EACH MDFAT DATA STRUCTURE
`INCLUDES AN EXTRA BYTE
`
`TECHNICAL FIELD
`
`invention relates to an improved method,
`The present
`apparatus, and format for storing compressed file data on a
`disk.
`
`10
`
`BACKGROUND OF THE INVENTION
`
`15
`
`Magnetic disk drives include a stack of several rigid
`aluminum disks, or platters, coated with magnetic material.
`Each platter stores data on both its top and bottom surfaces.
`Data is encoded on the each of the disks by magnetizing
`areas of the disk surface. Data is retrieved or added to the
`disk by a readtwrite bead. Each head of the magnetic disk
`drive consists of a tiny electromagnet attached to a pivoting
`arm, with the electromagnet being positioned very close to
`the surface of the disks. One such head is provided for each
`recording surface on each platter. The arms pivot to move
`the heads back and forth over the surface of the disks in a
`generally radial path.
`IIead actuators pivot
`the arms to
`control the movement of the heads. When data is written
`
`55
`
`60
`
`65
`
`15
`
`15
`
`
`
`5,809,295
`
`10
`
`15
`
`3
`ously vacant sectors in a cluster to be allocated to store data
`from another compressed cluster. This allows each sector of
`storage space on the disk to be available for storing corn-
`pressed files.
`Disk compression programs increase the amount of data
`that can be stored on a disk by compressing the data stored
`in each cluster so that it can be stored in fewer sectors than
`would be required to store the data in uncompressed form.
`The technique used by these compression programs is to
`compress all files on the disk and store them in a single, large
`file called the Compressed Volume File {CVF}. Once the
`files are compressed and stored in the CVF, any attempt to
`access one of the files will be intercepted by compression
`device driver software. This device driver locates the file’s
`compressed data within the CVF and executes the read or
`write request, decompressing or compressing the file data as
`necessary.
`The location of compressed file data stored on a disk must
`be maintained to allow the operating system of the computer
`to access files when desired. Several data structures exist
`within the CVF that are used for file access.
`It should be
`understood that
`the data structures discussed below are
`maintained by the operating system of the computer and are
`provided merely as an example of the necessary data struc-
`tures. Other data structures may be maintained by alternate “
`operating systems that perform similar functions to achieve
`similar results.
`The first sector in the CVF contains a data structure called
`the Microsoft DoubleSpace BIOS Parameter Block
`(MDBPB). The MDBPB contains information that describes
`the characteristics of the drive, including how many bytes
`are contained per sector, how sectors are contained per
`cluster, and the total number of clusters on the drive.
`
`an
`
`The Boot Sector comprises one sector within the CVF,
`and is identical to the Boot Sector found in the MS-DOS ‘
`operating system, and is well known to those skilled in the
`art.
`
`Each sector on the disk has a unique identifying number,
`based on the location of the sector on the disk, so that it can
`be easily accessed. Similarly, each cluster has a unique
`cluster number. Identifying numbers are assigned such that
`adjacent sectors and clusters, respectively, are consecutively
`numbered. The primary data structure used for determining
`what parts of the disk are being used for tile data storage is
`the File Allocation Table (FAT). The FAT for a compressed
`drive is identical to the FAT for an uncompressed drive. The
`FAT contains one entry for each cluster identified on the
`drive.
`
`For an uncompressed drive, the sectors used to identify
`FAT clusters are sequentially located on the disk, usually
`beginning with the outermost sectors on the disk. Clusters
`are listed in the FM" sequentially by their cluster number,
`beginning with the outermost clusters on the disk. If each
`cluster on an uncompressed drive contains, for example,
`eight sectors, then the first eight contiguous sectors would be
`identified as cluster 1,
`the next eight contiguous sectors
`would be identified as cluster 2, and so on throughout the
`disk. Because clusters for uncompressed drives are sequen-
`tially numbered, the operating system can easily locate the
`sector number for a given cluster by multiplying the cluster
`number by the number of sectors per cluster.
`The FAT entry for each cluster contains the number of the
`cluster in which the next part of that file is contained. The
`FAT entry for the cluster containing the last data of a file
`comprises an End of File {EOF} entry. Therefore, each file
`stored on the disk is represented in the FAT as a chain ofone
`
`4U
`
`50
`
`55
`
`60
`
`65
`
`16
`
`4
`the FAT indicates whether each
`or more clusters. Thus,
`cluster is allocated to a file, but does not tell the operating
`system of the computer to which file a given cluster belongs.
`A compressed drive also maintains a FAT The FAT for a
`compressed drive is identical to the FAT maintained for an
`uncompressed drive in that the FAT contains one entry for
`each cluster on the disk. However, the clusters in a com—
`pressed drive can be variable in length and, furthermore,
`sequentially numbered clusters need not be physically adja-
`cent. Thus, the operating system cannot determine, merely
`by examining the FAT, in what sector number compressed
`data for a given cluster is stored.
`Another data structure, called the Root Directory, here-
`inafter referred to as the Directory, includes a list of all files
`and subdirectories stored on the disk. The Directory differs
`from subdirectories in that the Directory is stored near the
`beginning of the disk prior to sectors used to store file data.
`Subdirectories are stored in sectors on the disk in the same
`manner as are files, and the location of first-level files and
`subdirectories is maintained in the Directory. The Directory
`has an entry, for each file, containing the cluster number in
`which the first part of that
`tile is stored. By storing the
`starting cluster number of each file, the Directory ties each
`file to the l-‘AT.
`
`An additional data structure, called the Sector Heap,
`occupies the majority of the space within the (.‘VF. The
`Sector Heap is an array of sectors in the CV1: in which the
`compressed data is actually stored. The first sector of each
`compressed cluster contains a compression tag that indicates
`the degree of compression that has been applied to the data.
`Another data structure maintained in the CVF is the
`Microsoft DoubleSpace File Allocation Table (MDT-AT).
`Microsoft DoubleSpace is a data compression program
`included in MS—DOS. The MDFAT keeps track of where the
`uncompressed data for each cluster is stored in compressed
`or uncompressed form within the CVF. The MDFAT paral-
`lels the FAT and contains an entry for each respective entry
`in the FAT. For each FAT entry, the corresponding MDFAF
`entry indicates in what sectors within the Sector Heap that
`cluster’s compressed data is stored. Thus, by examining the
`FAT and corresponding MDFAT entries, the operating sys-
`tem can determine in which sectors compressed data for a
`cluster is stored.
`
`A further data structure stored in the CW“ on compressed
`drives is the BitFAT. The BitFAT is a table containing one bit
`for each sector
`in the Sector I'Ieap. Each bit
`indicates
`whether the respective sector is being used to store data or
`whether the sector is unused. Therefore, available sectors
`can be identified by the operating system by scanning the
`BitFAT.
`
`To access a file on the disk, the operating system of the
`computer reads the Directory entry for the file to determine
`the first cluster number in which file data is stored. The
`operating system then reads the entire chain of FAl‘ entries
`starting at the file‘s first cluster. Using the location of the
`sectors used to store data for each cluster, as found in the
`MDFAT, and the chain of clusters from the FAT, the oper-
`ating system can determine every compressed cluster
`belonging to the file and can access all sectors of each
`compressed cluster.
`In some methods of storing uncompressed data, fragmen-
`tation may occur when data on a disk is deleted and new data
`is added. Fragmentation is the storing of portions of a single
`file in non-adjacent locations on the disk. When previously-
`used disk space is made available, the operating system of
`the computer stores new file data in the first available vacant
`
`16
`
`
`
`5,809,295
`
`5
`space. However, the first available vacant space may be too
`small to store all of the new file data to be stored. Therefore,
`a portion of the new file data will be stored in the first
`available vacant space and any leftover data will be stored
`in the next available vacant space. Thus. a file may be stored
`in several non-adjacent groups of clusters on the disk, and is
`thus called a fragmented file.
`Storing uncompressed data in fragmented locations on the
`disk is beneficial in that it allows the vacant spaces through-
`out the disk to be filled by new data being written onto the
`disk. If storing data in fragments is not utilized, and the new
`data is too large to be stored within a single one of the
`smaller portions of vacant space located throughout the disk,
`then the smaller portions of vacant disk space scattered
`among filled clusters are not utilized, and the vacant space
`is wasted.
`
`Compressed file data can become fragmented in the same
`manner as uncompressed file data. Because compression
`programs allocate disk space for storage of compressed data
`by sector rather than by cluster, the groups of sectors in the
`Sector Heap in which the compressed clusters of a file are
`stored may not be adjacent to one another. Additionally. the
`unused sectors within the Sector Heap may become scat—
`tered around the disk, rather than being consolidated at the
`end of the Sector Heap.
`This is demonstrated in FIGS. 1A and 113. FIG. IAshows
`uncompressed data of File A (identified as Al and A2),
`uncompressed data of File I3 (identified as Bl and 132), and
`uncompressed data of File C (identified as C1, C2, C3, and
`C4) stored in adjacent clusters comprising a contiguous run
`of sectors 1 through 64. FIG. 1B shows how the uncom—
`pressed data of FIG. 1A may be stored in compressed form.
`FIG. 13 shows how compressed data may occupy only a few
`of the sectors required to store the data of the cluster in
`uncompressed form, thereby leaving several vacant sectors
`within the Sector Heap. 'l'hus, FIG. 13 shows compressed
`file data stored in groups of sectors interspersed with groups
`of vacant sectors.
`
`Some prior methods of storing compressed data stored
`each compressed cluster only in a contiguous run of sectors.
`but did not store a compressed cluster in fragments in
`separate. non—contiguous groups of sectors on the disk.
`Therefore. some prior implementations of storing com-
`pressed data stored compressed clusters only in an empty
`space on the disk that provided su tficient space to store the
`compressed cluster in its entirety. If the empty space could
`not store the entire compressed cluster, then no portion of the
`compressed cluster would be stored in the empty space and
`the computer would search for another empty space suffi-
`ciently large to store all of the compressed cluster. Thus, in
`the example of FIG. 13, the empty sectors would remain
`vacant unless a new compressed cluster to be stored is small
`enough to fit entirely within the vacant space. This causes
`many of the small empty spaces within the Sector IIeap to
`remain unfilled and thus wastes disk space, adversely alIect-
`ing the goal of complete maximization of the storage space
`on the disk.
`
`Another prior method of storing compressed data allows
`the storage of the data of a compressed cluster in fragments
`scattered throughout the disk. However, this prior method
`u tilizes a different manner of identifying the location of each
`fragment of the compressed cluster. This prior method
`stores, in each fragment, the location of the next fragment.
`Thus,
`this method would include storing the location of
`fragment 2 in fragment 1, storing the location of fragment 3
`in fragment 2, and so forth for all fragments. This method of
`
`10
`
`15
`
`3n
`
`40
`
`50
`
`55
`
`60
`
`65
`
`6
`storing the location of each fragment suffers from the
`disadvantage of requiring access of the fragments based only
`on the fragment number and not on the location of the
`fragments. Such a storage method would necessitate the
`access of the fragments in numerical order, such that frag—
`ment 1
`is accessed first, then fragment 2, then fragment 3,
`and so forth. This method does not provide a method for
`optimizing the access of each fragment because the location
`of each fragment
`is unknown to the computer until
`the
`immediately prior fragment is accessed.
`Another prior method of storing the fragment locations
`involves storing the location of each fragment within the
`MDFAT. However, such a method requires that adequate
`space be allocated for each sector
`in the Sector Heap,
`because each of the sectors in the Sector Heap can poten—
`tially be used to store a fragment of a compressed cluster.
`'t'heret‘ore, space must be allotted for storage of fragment
`locations for each sector, even though only a small percent~
`age of sectors in the Sector IIeap may actually be used to
`store a fragment. Therefore, for sectors that are not used to
`store fragments,
`the space allocated for storage of the
`fragment locations for that sector is wasted. Because of the
`very large number of sectors on a compressed drive, this
`prior method wastes a great deal of storage space when less
`than all of the sectors are used to store a fragment.
`SUMMARY OF THE INVENTION
`
`The present invention provides a method and apparatus
`for storing compressed file data on a disk. The method
`includes an improved format for the Compressed Volume
`File (CVF), and more specifically, an improved format for
`the MDFAT data structure stored in the CVF. The improved
`format
`includes using an additional byte for maintaining
`each entry in the MDPAT data structure. The use of an
`additional byte increases the number of sectors within the
`Sector Heap that can be addressed and accessed and thus
`increases the nu mber of sectors that can be accessed for each
`FAT cluster. The improved format
`further allows com-
`pressed clusters to be stored in fragments in various portions
`of vacant space located throughout the Sector Heap.
`'Ihe
`new format for the MDFA'I' data structure includes a bit that
`identifies whether each compressed cluster is being stored in
`fragments. The method stores the location of each of the
`fragments in a Fragment Pointer List
`located in the first
`sector of the first fragment in the Sector Heap. The storage
`of the entire Fragment Pointer List
`in the first fragment
`allows the computer to obtain the locations of all fragments
`prior to initiating access of any fragment, thus allowing the
`computer to directly access a fragment. The storage of the
`Fragment Pointer List in the first fragment also eliminates
`the location of the subsequent fragment from being inter-
`spersed with data when all fragments are read into memory.
`Thus, it is an object of the present invention to provide a
`method and system for storing compressed file data stored
`on a disk.
`
`It is another object of the present invention to increase the
`amount of compressed data that can be stored on a disk.
`It is a further object of the present invention to increase
`the number of sectors for storing compressed data that can
`be addressed and accessed.
`
`It is a still further object of the present invention to store
`compressor] data in fragments, when necessary, rather than
`in contiguous sectors.
`It is a further object of the present invention to store the
`list of fragments using a minimum of disk storage space.
`It is yet another object of the present invention to store the
`list of the location of each fragment entirely within the first
`fragment of the fragment chain.
`
`17
`
`17
`
`
`
`5,809,295
`
`7
`It is a still further object of the present invention to allow
`access of all fragments while minimizing movement 01' the
`heads of the disk drive.
`
`Other objects, features, and advantages of the present
`invention will become apparent upon reading the following
`specification, when taken in conjunction with the drawings
`and the appended claims.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`10
`
`FIG. 1A is a representation of uncompressed file data
`stored in clusters on a