US008140786B2

(12) United States Patent
Bunte et al.

(10) Patent No.: US 8,140,786 B2
(45) Date of Patent: Mar. 20, 2012

(54) SYSTEMS AND METHODS FOR CREATING COPIES OF DATA, SUCH AS ARCHIVE COPIES

(75) Inventors: Alan Bunte, Monmouth Beach, NJ (US); Anand Prahlad, East Brunswick, NJ (US); Brian Brockway, Shrewsbury, NJ (US)

(73) Assignee: CommVault Systems, Inc., Oceanport, NJ (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 1003 days.

(21) Appl. No.: 11/950,376

(22) Filed: Dec. 4, 2007

(65) Prior Publication Data
US 2008/0229037 A1, Sep. 18, 2008

Related U.S. Application Data

(60) Provisional application No. 60/882,884, filed on Dec. 29, 2006, provisional application No. 60/871,737, filed on Dec. 22, 2006, provisional application No. 60/882,883, filed on Dec. 29, 2006, provisional application No. 61/001,485, filed on Oct. 31, 2007, provisional application No. 60/868,518, filed on Dec. 4, 2006.

(51) Int. Cl.
G06F 12/00 (2006.01)
(52) U.S. Cl. ............ 711/161; 711/E12.06; 711/E12.103
(58) Field of Classification Search ........................ None
See application file for complete search history.

Primary Examiner - Edward Dudek, Jr.
Assistant Examiner - Christopher Birkhimer
(74) Attorney, Agent, or Firm - Perkins Coie LLP

(57) ABSTRACT

A system and method of creating archive copies of data sets is described. In some examples, the system creates an archive copy from an original data set. In some examples, the system creates an archive copy when creating a recovery copy for a data set. In some examples, the system creates a copy without redundant data, and then encrypts the data set.

13 Claims, 20 Drawing Sheets

[Representative drawing: reference numerals 100, 110, 120, 140, 150.]

CSCO-1039
Page 1 of 37
[Drawing sheet 1 of 20. FIG. 1A (100): block diagram of a data archival and data retrieval system. Legible labels: File system, api, Secondary copy, Archive copy, MA, CI, Update, Scan, Metadata. Reference numerals: 110, 117, 120, 121, 122, 124, 126, 130, 132.]

[Drawing sheet 2 of 20. FIG. 1B (160): original data set 110, archive copy 132 (m.a.), CI 117, UI 165, encryption. FIG. 1C (170): previously archived copy 175, archive copy 132, encryption 137.]

[Drawing sheet 3 of 20. FIG. 2A: block diagram of components of a data stream. Drawing text is not legible in this copy.]

[Drawing sheet 4 of 20. FIG. 2B: block diagram of an example data storage system. Drawing text is not legible in this copy.]

[Drawing sheet 5 of 20. FIG. 2C (207): block diagram of components of a server used in data storage operations. Legible labels: client data agent, stream agent, management agent, jobs agent, interface module, other agents, dB. Reference numerals: 111, 210, 211, 212, 213, 214, 221, 222, 310, 320.]

[Drawing sheet 6 of 20. FIG. 3: block diagram of components used to create an archive file and store an archive copy. Drawing text is not legible in this copy.]

[Drawing sheet 7 of 20. FIG. 4: block diagram of the architecture of an archive file. Drawing text is not legible in this copy.]

[Drawing sheet 8 of 20. FIG. 5: schematic diagram of the storage of data chunks on storage components. Drawing text is not legible in this copy.]

[Drawing sheet 9 of 20. FIG. 6 (routine 600): Begin; 610, comprising 612 "Receive request to create archive copy via original data set" and 614 "Receive request to create archive copy via recovery copy"; 620 "Create archive copy"; 630 "Store archive copy in storage component"; Done.]

[Drawing sheet 10 of 20. FIG. 7 (routine 700): Begin; 710 "Receive recovery copy of data"; 720 "Single instance data"; 730 "Content index data"; 740 "Encrypt data"; 750 "Create archive copy"; Done.]

[Drawing sheet 11 of 20. FIG. 8 (routine 800): "Backup file"; "Identify file"; "Determine file uniqueness" (with a "No" branch); "Add file reference"; "Store unique file"; Done. Reference numerals legible: 810, 820, 840.]

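The single-instancing routine of FIG. 8 (identify a file, determine whether it is unique, then store it or add a reference to an instance already stored) might be sketched as follows. This is an illustrative Python sketch, not the patented implementation: the `SingleInstanceStore` class, its method names, and the use of SHA-256 content digests as the uniqueness test are all assumptions made for the example.

```python
import hashlib


class SingleInstanceStore:
    """Hypothetical store that keeps one copy of each distinct file content."""

    def __init__(self):
        self.instances = {}   # digest -> file content (stored once)
        self.references = {}  # file name -> digest of the stored instance

    def backup_file(self, name, content):
        """Store content only if unseen; return True when a new instance is stored."""
        # Identify the file by a digest of its content (assumption: SHA-256).
        digest = hashlib.sha256(content).hexdigest()
        is_unique = digest not in self.instances
        if is_unique:
            # Unique file: store the single instance.
            self.instances[digest] = content
        # In both cases, record a reference from the name to the instance.
        self.references[name] = digest
        return is_unique

    def restore_file(self, name):
        """Return the content that the named file refers to."""
        return self.instances[self.references[name]]


store = SingleInstanceStore()
first = store.backup_file("a/report.doc", b"quarterly numbers")
second = store.backup_file("b/copy-of-report.doc", b"quarterly numbers")
third = store.backup_file("c/notes.txt", b"meeting notes")
```

In this sketch the second call stores nothing new and only records a reference, so three backed-up names occupy two stored instances.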

[Drawing sheet 12 of 20. FIG. 9 (routine 900): "Index Content"; "Select offline copy"; "Identify content"; "Update content index"; Done. Reference numerals legible: 910, 920, 930.]

[Drawing sheet 13 of 20. FIG. 10 (routine 1000): Begin; 1010 "Receive data for encryption"; 1020 "Encrypt data"; 1030 "Transfer data to offsite location"; Done.]

[Drawing sheet 14 of 20. FIG. 11 (1100). Legible labels: Storage policy; Onsite - recovery copy, PIT sets (cycles); Wk 1, Wk 2, Wk 3; RA copy; Onsite - archive copy, single instanced, continuous; Index cycle; Longterm archive. Reference numerals: 1110, 1120, 1130, 1140, 1150.]

[Drawing sheet 15 of 20. FIG. 12 (1200). Legible labels: Onsite - archive copy, single instanced, continuous. Reference numerals: 1215, 1220, 1230.]

[Drawing sheet 16 of 20. FIG. 13 (routine 1300): Begin; 1310 "Retrieve existing set of data"; 1320 "Single instance data"; 1330 "Content index data"; 1340 "Encrypt data"; 1350 "Permanently erase unneeded data"; 1360 "Create archive copy"; Done.]

[Drawing sheet 17 of 20. FIG. 14 (1400): 1410 Collaborative document management system; 1420 Collaborative search system; 1430 Content indexing system; 1440 Security system; 1450 Document retention system. Additional reference numeral: 1405.]

[Drawing sheet 18 of 20. FIG. 15 (1500). Legible labels: HTML page; ASPX page; List View; Script-web port; Configuration database; Schema; XML; View definition; Parser; Doc A; Doc B. Reference numerals: 1510, 1520, 1530, 1540, 1550, 1570.]

[Drawing sheet 19 of 20. FIG. 16 (1600). Legible labels: Content indexing system (1430); Common database; Offline data; Enterprise 1 (Windows), Online; Enterprise 2 (Linux), Online. Reference numerals: 1610, 1620, 1630, 1640, 1650, 1660.]

[Drawing sheet 20 of 20. FIG. 17 (routine 1700): "Retain Document"; 1710 "Receive retention request"; 1720 "Identify relevant documents"; 1730 "Set hold flag"; 1740 "Monitor system changes"; 1750 "Generate report"; Done.]

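The retention routine of FIG. 17 lists five steps: receive a retention request, identify relevant documents, set a hold flag, monitor system changes, and generate a report. A minimal Python sketch of these steps follows. The `Document` type, the keyword-matching rule for relevance, and the function names are hypothetical; the figure does not say how relevance is determined, and monitoring is reduced here to refusing deletion of held documents.

```python
from dataclasses import dataclass


@dataclass
class Document:
    name: str
    text: str
    hold: bool = False  # hold flag: True means the document must be retained


def apply_retention_request(documents, keywords):
    """Set the hold flag on documents matching any keyword; return a report."""
    lowered = [k.lower() for k in keywords]
    held = []
    for doc in documents:
        # Assumed relevance test: case-insensitive keyword match on the text.
        if any(k in doc.text.lower() for k in lowered):
            doc.hold = True
            held.append(doc.name)
    return {"keywords": list(keywords), "held": held}


def delete_document(documents, name):
    """Delete a document unless it is on hold; return True if deleted."""
    for i, doc in enumerate(documents):
        if doc.name == name:
            if doc.hold:
                return False  # a held document is never deleted
            del documents[i]
            return True
    return False


docs = [
    Document("memo.txt", "Notes on the Acme contract dispute"),
    Document("lunch.txt", "Menu for Friday"),
]
report = apply_retention_request(docs, ["acme"])
blocked = delete_document(docs, "memo.txt")
removed = delete_document(docs, "lunch.txt")
```

Enforcing the hold at deletion time, rather than relying on each employee to remember the request, is the design choice this sketch is meant to illustrate.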

SYSTEMS AND METHODS FOR CREATING COPIES OF DATA, SUCH AS ARCHIVE COPIES

CROSS-REFERENCE TO RELATED APPLICATION(S)

This application claims priority to the following patent applications, all of which are incorporated by reference in their entirety: U.S. Provisional Patent Application No. 60/882,884, filed on Dec. 29, 2006, entitled SYSTEMS AND METHOD FOR CREATING COPIES OF DATA, SUCH AS REFERENCE ARCHIVE COPIES, U.S. Provisional Patent Application No. 60/871,737, filed on Dec. 22, 2006, entitled SYSTEM AND METHOD FOR STORING REDUNDANT INFORMATION, U.S. Provisional Patent Application No. 60/882,883, filed on Dec. 29, 2006, entitled SYSTEM AND METHOD FOR ENCRYPTING DATA TO BE ARCHIVED, U.S. Provisional Patent Application No. 61/001,485, filed on Oct. 31, 2007, entitled SYSTEM AND METHOD FOR ENCRYPTING DATA TO BE ARCHIVED, and U.S. Provisional Application No. 60/868,518, filed on Dec. 4, 2006, entitled METHOD AND SYSTEM FOR RETENTION OF DOCUMENTS.

This application incorporates the following applications by reference: U.S. patent application Ser. No. 11/694,869, filed on Mar. 30, 2007, entitled METHOD AND SYSTEM FOR OFFLINE INDEXING OF CONTENT AND CLASSIFYING STORED DATA, and U.S. patent application Ser. No. 11/564,119, filed on Nov. 28, 2006, entitled SYSTEMS AND METHODS FOR CLASSIFYING AND TRANSFERRING INFORMATION IN A STORAGE NETWORK.

BACKGROUND

Corporations and other organizations routinely copy data produced and/or stored by their computer systems in order to retain an archive of the data. For example, a company might retain data from computing systems related to e-commerce, such as databases, file servers, web servers, and so on. The company may also retain data from computing systems used by employees, such as those used by an accounting department, marketing department, engineering, and so on.

Often, such retention and/or archiving amasses large amounts of data. There may be data copied or retained by way of periodic or one-time backups, continuous data protection (CDP) backups, snapshot backups, and so on. The data may include personal data, such as financial data, customer/client/patient contact data, audio/visual data, and other types of data. Organizations may also retain data related to the correct operation of their computer systems, such as operating system files, application files, user settings, and so on.

Once the stored data has aged a certain amount of time, the data storage systems may send the data to a data archive that stores the data for as long as is required. Typical data storage systems create a first storage copy for short term data recovery and after a certain time send the copies to an archive for long term storage. Thus, organizations are storing large amounts of data in their data archives at great expense.

Organizations increasingly rely on computer systems to produce and store critical information and the retention and recovery of data may cause problems in their operation and overall effectiveness. For example, a data storage system may receive an identification of a file location to store and create one or more storage files containing the contents of the stored file and/or location. The data storage system can then restore data from these storage files (such as backup files) should anything happen to the original data.

At times, organizations may want to quickly access data stored in their data archives. For example, an organization may receive a discovery request for a small amount of email data. Although the amount of requested data may be small, the data storage system may need to search many archive files (such as backup tapes) to find the requested data.

Companies are often required to retain documents in archive files in order to comply with various regulations. For example, when a company is in litigation, the company may be required to retain documents related to the litigation. Employees are often asked not to delete any correspondence, emails, or other documents related to the litigation. Recently enacted amendments to the Federal Rules of Civil Procedure (FRCP) place additional document retention burdens on a company. According to Gartner, "Several legal commentators believe that the heart of the proposed changes to FRCP is the formal codification of 'electronically stored information' (ESI) and the recognition that the traditional discovery framework dealing with paper-based documents is no longer adequate." Legal discovery of electronic information has emerged as a key requirement for today's enterprise in recent years, and the new federal rules both strengthen and expand those requirements.

Complying with all of the regulations related to document retention can be difficult, particularly when many employees may have documents stored under their control that are relevant to the issue at hand. Penalties for violation of regulations related to document retention can be steep, and executives and business managers want confidence that employees are taking appropriate steps to comply with the regulations. Employees may forget about requests to retain documents, or may not think that a particular document is relevant when others would disagree.

Companies also need provisions for finding retained documents. Traditional search engines accept a search query from a user, and generate a list of search results. The user typically views one or two of the results and then discards the results. However, some queries are part of a longer-term, collaborative process. For example, when a company receives a legal discovery request, the company is often required to mine all of the company's data for documents responsive to the discovery request. This typically involves queries of different bodies of documents lasting days or even years. Many people are often part of the query, such as company employees, law firm associates, and law firm partners. The search results must often be viewed by more than one of these people in a well-defined set of steps (i.e., a workflow). For example, company employees may provide documents to a law firm, and associates at the law firm may perform an initial reading of the documents to determine if the documents contain relevant information. The associates may flag documents with descriptive classifications such as "relevant" or "privileged." Then, the flagged documents may go to a law firm partner who will review each of the results and ultimately respond to the discovery request with the set of documents that satisfies the request.

Collaborative document management systems exist for allowing multiple users to participate in the creation and revision of content, such as documents. Many collaborative document management systems provide an intuitive user interface that acts as a gathering place for collaborative participants. For example, Microsoft SharePoint Server provides a web portal front end that allows collaborative participants to find shared content and to participate in the creation of new content and the revision of content created by others. In addition to directly modifying the content of a document, collaborative participants can add supplemental information, such as comments, to the document. Many collaborative document management systems also provide workflows for defining sets of steps to be completed by one or more collaborative participants. For example, a collaborative document management system may provide a set of templates for performing common tasks, and a collaborative participant may be guided through a wizard-like interface that asks interview-style questions for completing a particular workflow.

The foregoing examples of some existing problems with data storage, archiving, and restoration are intended to be illustrative and not exclusive. Other limitations will become apparent to those of skill in the art upon a reading of the Detailed Description below.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1A is a block diagram illustrating a data archival and data retrieval system.
FIG. 1B is a block diagram illustrating an alternative data archival system.
FIG. 1C is a block diagram illustrating an alternative data archival system.
FIG. 2A is a block diagram illustrating components of a data stream.
FIG. 2B is a block diagram illustrating an example of a data storage system.
FIG. 2C is a block diagram illustrating components of a server used in data storage operations.
FIG. 3 is a block diagram illustrating components used to create an archive file and store an archive copy.
FIG. 4 is a block diagram illustrating the architecture of an archive file.
FIG. 5 is a schematic diagram illustrating the storage of data chunks on storage components.
FIG. 6 is a flow diagram illustrating an exemplary routine for copying data.
FIG. 7 is a flow diagram illustrating an exemplary routine for creating an archive copy of data.
FIG. 8 is a flow diagram illustrating an exemplary routine for reducing a data set to single instances of data.
FIG. 9 is a flow diagram illustrating an exemplary routine for indexing an archive copy of a data set.
FIG. 10 is a flow diagram illustrating an exemplary routine for encrypting an archive copy of a data set.
FIG. 11 is a block diagram illustrating a storage policy for creating a data archive for an existing archived data set.
FIG. 12 is a block diagram illustrating an alternative data archive and retrieval system.
FIG. 13 is a flow diagram illustrating an exemplary routine for creating an archive copy of data from an archived data set.
FIG. 14 is a block diagram illustrating an example architecture for integrating a collaborative search system with a collaborative document management system.
FIG. 15 is a block diagram illustrating an example integration of a content indexing system to provide access to disparate data sources.
FIG. 16 is a schematic diagram illustrating integration of parsers with a typical collaborative document management system.
FIG. 17 is a flow diagram illustrating typical processing in response to a document retention request.

COPYRIGHT NOTICE

A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosures, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever.

DETAILED DESCRIPTION

Examples of the technology provided below describe systems and methods of creating an archive copy or copies of a data set. Although described in connection with certain examples, the systems described herein are applicable to and may employ any wireless or hard-wired network or data storage system that stores and conveys data and information from one point to another, including communication networks, enterprise networks, storage networks, and so on.

Examples of the technology describe a method and system of creating an archive copy from one or more secondary copies that are created from an original data set, or primary or production copy, such as data from a file system. For example, instead of using certain types of secondary copies, such as recovery copies, snapshot volumes, and so on, to archive data (e.g., waiting until a recovery copy has aged a certain time period and then storing some or all of the recovery copy as an archive copy), the system creates an archive copy of the data during or soon after creating other secondary copies. That is, the system may create a certain type of secondary copy that may be used for long term archival purposes from any data under management by the system. For example, this copy may be single instanced and then encrypted, unlike other secondary copies under management by the system.

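To illustrate the ordering described above, in which an archive copy is single instanced and then encrypted, the following Python sketch removes duplicate items before applying encryption. Everything named here is hypothetical: `create_archive_copy`, the SHA-256 digest used to detect duplicates, and the caller-supplied `encrypt` function, which stands in for whatever cipher a real system would use. The byte-reversing function in the usage lines is a placeholder only, not encryption.

```python
import hashlib


def create_archive_copy(items, encrypt):
    """Build an archive copy: single instance first, then encrypt.

    items   -- iterable of (name, content-bytes) pairs from a secondary copy
    encrypt -- caller-supplied function mapping bytes to bytes (assumption)
    """
    blobs = {}  # digest -> encrypted content, one entry per distinct content
    index = {}  # name -> digest, so every name can still be resolved
    for name, content in items:
        digest = hashlib.sha256(content).hexdigest()
        if digest not in blobs:
            # Duplicates are detected on the plaintext, so encryption
            # runs only once per distinct piece of content.
            blobs[digest] = encrypt(content)
        index[name] = digest
    return {"blobs": blobs, "index": index}


def placeholder_encrypt(data):
    # Stand-in only: reverses the bytes so the example runs without a
    # cryptography library. It provides no confidentiality.
    return data[::-1]


archive = create_archive_copy(
    [("mail/1.eml", b"hello"), ("mail/2.eml", b"hello"), ("db/t.sql", b"rows")],
    placeholder_encrypt,
)
```

One reason the order can matter: duplicates here are compared on plaintext digests, and a cipher that uses a random nonce would generally produce different ciphertexts for identical inputs, which would hide the redundancy if encryption ran first.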
Alternatively, examples of the technology describe a method and system of creating the archive copy directly from the primary copy (i.e., the original data set), such as the primary copy of a file system, an Exchange server, a SQL database, and so on. For example, the system may create an archive copy of data without first creating other secondary copies.

Furthermore, examples of the technology describe a method and system of creating a
