US008140786B2

(12) United States Patent
Bunte et al.

(10) Patent No.: US 8,140,786 B2
(45) Date of Patent: Mar. 20, 2012
`
`(54) SYSTEMS AND METHODS FOR CREATING
`COPIES OF DATA, SUCH AS ARCHIVE
`COPIES
`
(75) Inventors: Alan Bunte, Monmouth Beach, NJ (US);
Anand Prahlad, East Brunswick, NJ
(US); Brian Brockway, Shrewsbury, NJ
(US)

(73) Assignee: CommVault Systems, Inc., Oceanport,
NJ (US)

(*) Notice: Subject to any disclaimer, the term of this
patent is extended or adjusted under 35
U.S.C. 154(b) by 1003 days.
`
`(21) Appl. No.: 11/950,376
`
(22) Filed: Dec. 4, 2007

(65) Prior Publication Data

US 2008/0229037 A1    Sep. 18, 2008
`
`Related U.S. Application Data
`
`(60) Provisional application No. 60/882,884, filed on Dec.
`29, 2006, provisional application No. 60/871,737,
`filed on Dec. 22, 2006, provisional application No.
`60/882,883, filed on Dec. 29, 2006, provisional
`application No. 61/001,485, filed on Oct. 31, 2007,
`provisional application No. 60/868,518, filed on Dec.
`4, 2006.
`
(51) Int. Cl.
G06F 12/00 (2006.01)
(52) U.S. Cl. ............ 711/161; 711/E12.06; 711/E12.103
(58) Field of Classification Search ........................ None
See application file for complete search history.
`
`
`Primary Examiner - Edward Dudek, Jr.
`Assistant Examiner - Christopher Birkhimer
`(74) Attorney, Agent, or Firm - Perkins Coie LLP
`
(57) ABSTRACT
`
`A system and method of creating archive copies of data sets is
`described. In some examples, the system creates an archive
`copy from an original data set. In some examples, the system
`creates an archive copy when creating a recovery copy for a
`data set. In some examples, the system creates a copy without
`redundant data, and then encrypts the data set.
`
`13 Claims, 20 Drawing Sheets
`
`
`
`
`
`
`
[FIG. 1A (Sheet 1 of 20): block diagram of a data archival and data retrieval system 100. Legible labels: File system, api, Metadata, Scan, Update, Secondary copy, CI, MA, Archive copy. Legible reference numerals: 110, 117, 120, 121, 122, 124, 126, 130, 132.]
`
`
`
U.S. Patent    Mar. 20, 2012    Sheet 2 of 20    US 8,140,786 B2

[FIG. 1B: block diagram of an alternative data archival system 160. Labels: 110 Original data set; 132 Archive copy (m.a.); 117 CI; 165 UI; encryption.]

[FIG. 1C: block diagram of an alternative data archival system 170. Labels: 175 Previously archived copy; 132 Archive copy; 137 encryption.]
`
`
`
[FIG. 2A (Sheet 3 of 20): block diagram of components of a data stream. The drawing text is not legible in the scan; reference numerals 200 through 204 appear.]
`
`
`
[FIG. 2B (Sheet 4 of 20): block diagram of an example of a data storage system. The drawing text is not legible in the scan; reference numerals 202, 203, 205, 210, 221, and 222 appear.]
`
`
`
[FIG. 2C (Sheet 5 of 20): block diagram of components of a server used in data storage operations. Legible labels: client, data agent, stream agent, management agent, jobs agent, interface module, other agents, dB. Legible reference numerals: 111, 207, 210, 211, 212, 213, 214, 221, 222, 310, 320.]
`
`
`
[FIG. 3 (Sheet 6 of 20): block diagram of components used to create an archive file and store an archive copy. The drawing text is not legible in the scan; reference numerals in the 300 series appear.]
`
`
`
[FIG. 4 (Sheet 7 of 20): block diagram of the architecture of an archive file. The drawing text is not legible in the scan; reference numerals in the 400 series appear.]
`
`
`
[FIG. 5 (Sheet 8 of 20): schematic diagram of the storage of data chunks on storage components. The drawing text is not legible in the scan; reference numerals 204 and 500 appear.]
`
`
`
[FIG. 6 (Sheet 9 of 20): flow diagram of routine 600. Begin; 610: either 612 Receive request to create archive copy via original data set, or 614 Receive request to create archive copy via recovery copy; 620 Create archive copy; 630 Store archive copy in storage component; Done.]
`
`
`
[FIG. 7 (Sheet 10 of 20): flow diagram of routine 700. Begin; 710 Receive recovery copy of data; 720 Single instance data; 730 Content index data; 740 Encrypt data; 750 Create archive copy; Done.]
`
`
`
[FIG. 8 (Sheet 11 of 20): flow diagram of routine 800, Backup file. 810 Identify file; 820 Determine file uniqueness; a decision branch labeled "No" leads to Add file reference; 840 Store unique file; Done.]
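
The single-instancing flow of FIG. 8 (identify a file, determine whether it is unique, then store it or add a reference to the stored instance) can be illustrated with a minimal Python sketch. The SHA-256 content digest used as the uniqueness test, and the names `SingleInstanceStore` and `backup_file`, are assumptions made for illustration; the figure does not specify how uniqueness is determined.

```python
import hashlib
from typing import Dict, List


class SingleInstanceStore:
    """Hypothetical store that keeps one instance of each distinct file content."""

    def __init__(self) -> None:
        self.instances: Dict[str, bytes] = {}        # digest -> stored content
        self.references: Dict[str, List[str]] = {}   # digest -> names referring to it

    def backup_file(self, name: str, content: bytes) -> bool:
        """Back up one file; return True if a new unique instance was stored."""
        # Identify the file by a digest of its content (assumed uniqueness test).
        digest = hashlib.sha256(content).hexdigest()
        is_unique = digest not in self.instances
        if is_unique:
            # Unique content: store it once.
            self.instances[digest] = content
        # In both cases, record a reference from this file name to the instance.
        self.references.setdefault(digest, []).append(name)
        return is_unique
```

Backing up two files with identical content and one with different content would leave two stored instances, with the duplicate recorded only as an additional reference.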
`
`
`
[FIG. 9 (Sheet 12 of 20): flow diagram of routine 900, Index Content. 910 Select offline copy; 920 Identify content; 930 Update content index; Done.]
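
As a rough illustration of the FIG. 9 steps (select an offline copy, identify its content, update the content index), the sketch below builds a simple word-to-document index. Treating "content" as lowercase words, and the function names `update_content_index` and `search`, are assumptions for the example; the figure does not say what the index contains.

```python
import re
from typing import Dict, Set


def update_content_index(index: Dict[str, Set[str]], offline_copy: Dict[str, str]) -> None:
    """Add every word of every document in the selected offline copy to the index."""
    for doc_name, text in offline_copy.items():
        # Identify content: here, simply the lowercase words in the document text.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            # Update the content index: word -> names of documents containing it.
            index.setdefault(word, set()).add(doc_name)


def search(index: Dict[str, Set[str]], word: str) -> Set[str]:
    """Return the names of indexed documents that contain the word."""
    return index.get(word.lower(), set())
```

Because the index is built from an offline copy, a later search can be answered from the index alone, without reading the copy again.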
`
`
`
[FIG. 10 (Sheet 13 of 20): flow diagram of routine 1000. Begin; 1010 Receive data for encryption; 1020 Encrypt data; 1030 Transfer data to offsite location; Done.]
`
`
`
[FIG. 11 (Sheet 14 of 20): block diagram 1100 of a storage policy. Legible labels: Storage policy; Onsite - recovery copy, PIT sets (cycles), Wk 1, Wk 2, Wk 3; RA copy; Onsite - archive copy, single instanced, continuous; Index cycle; Longterm archive. Legible reference numerals: 1110, 1120, 1130, 1140, 1150.]
`
`
`
[FIG. 12 (Sheet 15 of 20): block diagram 1200 of an alternative data archive and retrieval system. Legible labels: Onsite - archive copy, single instanced, continuous. Legible reference numerals: 1215, 1220, 1230.]
`
`
`
[FIG. 13 (Sheet 16 of 20): flow diagram of routine 1300. Begin; 1310 Retrieve existing set of data; 1320 Single instance data; 1330 Content index data; 1340 Encrypt data; 1350 Permanently erase unneeded data; 1360 Create archive copy; Done.]
`
`
`
[FIG. 14 (Sheet 17 of 20): block diagram of architecture 1400. 1405 (label not legible); 1410 Collaborative document management system; 1420 Collaborative search system; 1430 Content indexing system; 1440 Security system; 1450 Document retention system.]
`
`
`
[FIG. 15 (Sheet 18 of 20): block diagram 1500. Legible labels: HTML page; ASPX page; 1510 List View; 1520 Script-web port; 1530 Configuration database; Schema; XML 1540; 1550 View definition; Parser; Doc A; 1570 Doc B.]
`
`
`
[FIG. 16 (Sheet 19 of 20): schematic diagram 1600. Legible labels: 1430 Content indexing system; Common database; Offline data; 1630 Enterprise 1 (Windows), Online; 1650 Enterprise 2 (Linux), Online. Other legible reference numerals: 1610, 1620, 1640, 1660.]
`
`
`
[FIG. 17 (Sheet 20 of 20): flow diagram of routine 1700, Retain Document. 1710 Receive retention request; 1720 Identify relevant documents; 1730 Set hold flag; 1740 Monitor system changes; 1750 Generate report; Done.]
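
The retention steps of FIG. 17 (receive a retention request, identify relevant documents, set a hold flag, monitor system changes, generate a report) might be sketched as follows. This is only an illustration under stated assumptions: keyword matching as the relevance test, blocking deletion as the monitored change, and the names `Document` and `RetentionSystem` are all hypothetical and do not come from the figure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Document:
    name: str
    text: str
    on_hold: bool = False


@dataclass
class RetentionSystem:
    """Hypothetical retention system; not the patent's implementation."""
    documents: List[Document]
    log: List[str] = field(default_factory=list)

    def retain(self, keyword: str) -> List[str]:
        """Handle a retention request: flag every document mentioning the keyword."""
        held = []
        for doc in self.documents:
            if keyword.lower() in doc.text.lower():  # identify relevant documents
                doc.on_hold = True                   # set hold flag
                held.append(doc.name)
        return held

    def delete(self, name: str) -> bool:
        """Monitor changes: refuse and log any attempt to delete a held document."""
        for doc in self.documents:
            if doc.name == name:
                if doc.on_hold:
                    self.log.append(f"blocked delete of {name}")
                    return False
                self.documents.remove(doc)
                return True
        return False

    def report(self) -> str:
        """Generate a report of held documents and logged events."""
        held = [d.name for d in self.documents if d.on_hold]
        return f"held={held}; events={self.log}"
```

A request for "acme" over a contract mentioning Acme and an unrelated menu would hold only the contract, and a later attempt to delete the contract would be refused and appear in the report.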
`
`
`
`SYSTEMS AND METHODS FOR CREATING
`COPIES OF DATA, SUCH AS ARCHIVE
`COPIES
`
`CROSS-REFERENCE TO RELATED
`APPLICATION(S)
`
`This application claims priority to the following patent
`applications, all of which are incorporated by reference in
`their entirety: U.S. Provisional Patent Application No.
`60/882,884, filed on Dec. 29, 2006, entitled SYSTEMS AND
`METHOD FOR CREATING COPIES OF DATA, SUCH AS
`REFERENCE ARCHIVE COPIES, U.S. Provisional Patent
`Application No. 60/871,737, filed on Dec. 22, 2006, entitled
`SYSTEM AND METHOD FOR STORING REDUNDANT
`INFORMATION, U.S. Provisional Patent Application No.
`60/882,883, filed on Dec. 29, 2006, entitled SYSTEM AND
`METHOD FOR ENCRYPTING DATA TO BE ARCHIVED,
`U.S. Provisional Patent Application No. 61/001,485, filed on
`Oct. 31, 2007, entitled SYSTEM AND METHOD FOR
ENCRYPTING DATA TO BE ARCHIVED, and U.S. Provisional
Application No. 60/868,518, filed on Dec. 4, 2006,
`entitled METHOD AND SYSTEM FOR RETENTION OF
`DOCUMENTS.
`This application incorporates the following applications by
`reference: U.S. patent application Ser. No. 11/694,869, filed
`on Mar. 30, 2007, entitled METHOD AND SYSTEM FOR
OFFLINE INDEXING OF CONTENT AND CLASSIFYING
STORED DATA, and U.S. patent application Ser. No.
`11/564,119, filed on Nov. 28, 2006, entitled SYSTEMS AND
`METHODS FOR CLASSIFYING AND TRANSFERRING
`INFORMATION IN A STORAGE NETWORK.
`
`BACKGROUND
`
Corporations and other organizations routinely copy data
produced and/or stored by their computer systems in order to
retain an archive of the data. For example, a company might
retain data from computing systems related to e-commerce,
such as databases, file servers, web servers, and so on. The
company may also retain data from computing systems used
by employees, such as those used by an accounting department,
marketing department, engineering, and so on.

Often, such retention and/or archiving amasses large
amounts of data. There may be data copied or retained by way
of periodic or one-time backups, continuous data protection
(CDP) backups, snapshot backups, and so on. The data may
include personal data, such as financial data, customer/client/
patient contact data, audio/visual data, and other types of
data. Organizations may also retain data related to the correct
operation of their computer systems, such as operating system
files, application files, user settings, and so on.

Once the stored data has aged a certain amount of time, the
data storage systems may send the data to a data archive that
stores the data for as long as is required. Typical data storage
systems create a first storage copy for short term data recovery
and after a certain time send the copies to an archive for
long term storage. Thus, organizations are storing large
amounts of data in their data archives at great expense.

Organizations increasingly rely on computer systems to
produce and store critical information and the retention and
recovery of data may cause problems in their operation and
overall effectiveness. For example, a data storage system may
receive an identification of a file location to store and create
one or more storage files containing the contents of the stored
file and/or location. The data storage system can then restore
data from these storage files (such as backup files) should
anything happen to the original data.

At times, organizations may want to quickly access data
stored in their data archives. For example, an organization
may receive a discovery request for a small amount of email
data. Although the amount of requested data may be small, the
data storage system may need to search many archive files
(such as backup tapes) to find the requested data.

Companies are often required to retain documents in
archive files in order to comply with various regulations. For
example, when a company is in litigation, the company may
be required to retain documents related to the litigation.
Employees are often asked not to delete any correspondence,
emails, or other documents related to the litigation. Recently
enacted amendments to Federal Rules of Civil Procedure
(FRCP) place additional document retention burdens on a
company. According to Gartner, "Several legal commentators
believe that the heart of the proposed changes to FRCP is the
formal codification of "electronically stored information"
(ESI) and the recognition that the traditional discovery
framework dealing with paper-based documents is no longer
adequate." Legal discovery of electronic information has
emerged as a key requirement for today's enterprise in recent
years, and the new federal rules both strengthen and expand
those requirements.

Complying with all of the regulations related to document
retention can be difficult, particularly when many employees
may have relevant documents stored under their control that
are relevant to the issue at hand. Penalties for violation of
regulations related to document retention can be steep, and
executives and business managers want confidence that
employees are taking appropriate steps to comply with the
regulations. Employees may forget about requests to retain
documents, or may not think that a particular document is
relevant when others would disagree.

Companies also need provisions for finding retained documents.
Traditional search engines accept a search query from
a user, and generate a list of search results. The user typically
views one or two of the results and then discards the results.
However, some queries are part of a longer-term, collaborative
process. For example, when a company receives a legal
discovery request, the company is often required to mine all
of the company's data for documents responsive to the discovery
request. This typically involves queries of different
bodies of documents lasting days or even years. Many people
are often part of the query, such as company employees, law
firm associates, and law firm partners. The search results must
often be viewed by more than one of these people in a well-defined
set of steps (i.e., a workflow). For example, company
employees may provide documents to a law firm, and associates
at the law firm may perform an initial reading of the
documents to determine if the documents contain relevant
information. The associates may flag documents with
descriptive classifications such as "relevant" or "privileged."
Then, the flagged documents may go to a law firm partner that
will review each of the results and ultimately respond to the
discovery request with the set of documents that satisfies the
request.

Collaborative document management systems exist for
allowing multiple users to participate in the creation and
revision of content, such as documents. Many collaborative
document management systems provide an intuitive user
interface that acts as a gathering place for collaborative
participants. For example, Microsoft Sharepoint Server provides
a web portal front end that allows collaborative participants to
find shared content and to participate in the creation of new
content and the revision of content created by others. In
addition to directly modifying the content of a document,
collaborative participants can add supplemental information,
such as comments to the document. Many collaborative document
management systems also provide workflows for defining
sets of steps to be completed by one or more collaborative
participants. For example, a collaborative document management
system may provide a set of templates for performing
common tasks, and a collaborative participant may be guided
through a wizard-like interface that asks interview-style
questions for completing a particular workflow.

The foregoing examples of some existing problems with
data storage, archiving, and restoration are intended to be
illustrative and not exclusive. Other limitations will become
apparent to those of skill in the art upon a reading of the
Detailed Description below.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
FIG. 1A is a block diagram illustrating a data archival and
data retrieval system.
FIG. 1B is a block diagram illustrating an alternative data
archival system.
FIG. 1C is a block diagram illustrating an alternative data
archival system.
FIG. 2A is a block diagram illustrating components of a
data stream.
FIG. 2B is a block diagram illustrating an example of a data
storage system.
FIG. 2C is a block diagram illustrating components of a
server used in data storage operations.
FIG. 3 is a block diagram illustrating components used to
create an archive file and store an archive copy.
FIG. 4 is a block diagram illustrating the architecture of an
archive file.
FIG. 5 is a schematic diagram illustrating the storage of
data chunks on storage components.
FIG. 6 is a flow diagram illustrating an exemplary routine
for copying data.
FIG. 7 is a flow diagram illustrating an exemplary routine
for creating an archive copy of data.
FIG. 8 is a flow diagram illustrating an exemplary routine
for reducing a data set to single instances of data.
FIG. 9 is a flow diagram illustrating an exemplary routine
for indexing an archive copy of a data set.
FIG. 10 is a flow diagram illustrating an exemplary routine
for encrypting an archive copy of a data set.
FIG. 11 is a block diagram illustrating a storage policy for
creating a data archive for an existing archived data set.
FIG. 12 is a block diagram illustrating an alternative data
archive and retrieval system.
FIG. 13 is a flow diagram illustrating an exemplary routine
for creating an archive copy of data from an archived data set.
FIG. 14 is a block diagram illustrating an example architecture
for integrating a collaborative search system with a
collaborative document management system.
FIG. 15 is a block diagram illustrating an example integration
of a content indexing system to provide access to disparate
data sources.
FIG. 16 is a schematic diagram illustrating integration of
parsers with a typical collaborative document management
system.
FIG. 17 is a flow diagram illustrating typical processing in
response to a document retention request.
`
`
`COPYRIGHT NOTICE
`
A portion of the disclosure of this patent document contains
material that is subject to copyright protection. The
copyright owner has no objection to the facsimile reproduction
by anyone of the patent document or the patent disclosures,
as it appears in the Patent and Trademark Office patent
files or records, but otherwise reserves all copyright rights
whatsoever.
`
`DETAILED DESCRIPTION
`
Examples of the technology provided below describe systems
and methods of creating an archive copy or copies of a
data set. Although described in connection with certain
examples, the systems described herein are applicable to and
may employ any wireless or hard-wired network or data storage
system that stores and conveys data and information from
one point to another, including communication networks,
enterprise networks, storage networks, and so on.

Examples of the technology describe a method and system
of creating an archive copy from one or more secondary
copies that are created from an original data set, or primary or
production copy, such as data from a file system. For example,
instead of using certain types of secondary copies, such as
recovery copies, snapshot volumes, and so on, to archive data
(e.g., waiting until a recovery copy has aged a certain time
period and then storing some or all of the recovery copy as an
archive copy), the system creates an archive copy of the data
during or soon after creating other secondary copies. That is,
the system may create a certain type of secondary copy that
may be used for long term archival purposes from any data
under management by the system. For example, this copy
may be single instanced and then encrypted, unlike other
secondary copies under management by the system.
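
The order in this example, single instancing first and encryption second, matters in practice: encryption that uses a fresh random nonce turns identical plaintexts into different ciphertexts, so duplicates can no longer be detected afterward. The sketch below illustrates this under stated assumptions. `toy_encrypt` is a hypothetical stand-in (a hash-derived XOR keystream) used only to keep the example self-contained; it is not secure and is not the encryption the system uses. The SHA-256 digest as the duplicate test is likewise an assumption.

```python
import hashlib
import os
from typing import Dict, List


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Placeholder cipher for illustration only. NOT secure."""
    nonce = os.urandom(16)  # fresh random nonce on every call
    stream = b""
    counter = 0
    while len(stream) < len(plaintext):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    # Prepend the nonce, then XOR the plaintext with the keystream.
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def single_instance(items: List[bytes]) -> Dict[str, bytes]:
    """Keep one copy of each distinct item, keyed by its SHA-256 digest."""
    return {hashlib.sha256(item).hexdigest(): item for item in items}


def archive_copy(items: List[bytes], key: bytes) -> Dict[str, bytes]:
    """Single instance first, then encrypt each remaining unique item."""
    return {digest: toy_encrypt(key, item)
            for digest, item in single_instance(items).items()}
```

With two identical items and one distinct item, `archive_copy` stores two encrypted instances. Reversing the order, encrypting all three items and then single instancing the ciphertexts, would leave three distinct entries because each ciphertext carries its own nonce.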

Alternatively, examples of the technology describe a
method and system of creating the archive copy directly from
the primary copy (i.e., the original data set), such as the
primary copy of a file system, an exchange server, a SQL
database, and so on. For example, the system may create an
archive copy of data without first creating other secondary
copies.

Furthermore, examples of the technology describe a
method and system of creating a