`
`Exhibit L
`
`
`
`Case 4:23-cv-01147-ALM Document 22-15 Filed 05/23/24 Page 2 of 23 PageID #: 550
US006343295B1

(12) United States Patent
MacLeod et al.

(10) Patent No.: US 6,343,295 B1
(45) Date of Patent: Jan. 29, 2002

(54) DATA LINEAGE

(75) Inventors: Stewart P. MacLeod, Redmond; Casey L. Kiernan, Kirkland; Vij Rajarajan, Issaquah, all of WA (US)

(73) Assignee: Microsoft Corporation, Redmond, WA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.

(21) Appl. No.: 09/212,218

(22) Filed: Dec. 16, 1998

(51) Int. Cl.7: G06F 17/00
(52) U.S. Cl.: 707/103; 707/6; 707/10; 707/102; 707/101
(58) Field of Search: 707/1, 2, 3, 4, 707/5, 6, 10, 101, 103, 102; 705/28; 709/316

(56) References Cited

U.S. PATENT DOCUMENTS
5,193,185 A * 3/1993 Lanter ............ 707/101
5,970,476 A * 10/1999 Fahey ............ 705/28
5,991,741 A * 11/1999 Speakman et al. .. 705/30
5,991,751 A * 11/1999 Rivette et al. ... 707/1
6,016,497 A * 1/2000 Suver ............. 707/103
6,230,212 B1 * 5/2001 Morel et al. ..... 709/316

OTHER PUBLICATIONS

Practical lineage tracing in data warehouses by Cui et al., Dept. of Computer Science, Stanford University, Proc.; 16th International Conference, pp. 367-368, Feb. 29-Mar. 3, 2000.*

Exploiting data lineage for parallel optimization in external DBMS by Shek, et al., Inf. Sci. Lab., HRL Lab, Malibu, CA, paper in Data Engineering, 1999, Proc. 15th International Conference, p. 256, Mar. 23-26, 1999.*

* cited by examiner

Primary Examiner: Diane D. Mizrahi
(74) Attorney, Agent, or Firm: Woodcock Washburn Kurtz Mackiewicz & Norris LLP

(57) ABSTRACT
`
A system for tracking the lineage of data in a database. Data within the tables are tracked by attaching lineage information to the data, preferably by adding a lineage identifier to each row in a table. Data that share a common lineage can be identified by virtue of sharing a common lineage identifier. The lineage identifier can then be used to trace the source of the data, i.e., data having a common identifier share a common history. Additionally, the lineage identifier can provide details about transformations undergone by the data. For example, the lineage identifier can act as a pointer to a detailed history file of operations that were performed on the data to transform it into its current form. Preferably, the lineage identifier tracks program modules as well as specific versions of the program modules that transformed the particular data under consideration.
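
The row-level mechanism summarized in the abstract can be illustrated with a small sketch. It is not the patented implementation: the `LineageRecord` fields, the `HISTORY` registry, the source and module names, and the version strings are all hypothetical and chosen only for illustration. The two identifier values are the ones that appear in the Lineage column of FIG. 6.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageRecord:
    """Hypothetical history entry that a lineage identifier points to."""
    source: str          # where the rows originated (assumed field)
    module: str          # program module that transformed the rows
    module_version: str  # specific version of that module
    operations: tuple    # ordered descriptions of the transformations


# Hypothetical registry keyed by lineage identifier.
HISTORY = {
    435492295: LineageRecord("source A", "convert_data_1", "1.0",
                             ("copy columns",)),
    325492277: LineageRecord("source B", "convert_data_2", "2.3",
                             ("reformat names", "copy columns")),
}

# Each row carries its lineage identifier as one extra column.
rows = [
    {"empl_id": 12345, "empl_name": "Tom Smith", "lineage": 435492295},
    {"empl_id": 23456, "empl_name": "Jane Madison", "lineage": 435492295},
    {"empl_id": 9991, "empl_name": "Joe Smith", "lineage": 325492277},
]


def group_by_lineage(table):
    """Rows that share an identifier share a common history."""
    groups = defaultdict(list)
    for row in table:
        groups[row["lineage"]].append(row["empl_id"])
    return dict(groups)


def trace(row):
    """Follow the identifier to the history of operations for this row."""
    return HISTORY[row["lineage"]]
```

Calling `trace(rows[2])` returns the record for identifier 325492277, including the module name and version that are assumed to have produced the row.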
`
`8 Claims, 12 Drawing Sheets
`
[Representative drawing, FIG. 6: employee table 151 with columns empl_id, empl_name, empl_salary, dept_id and Lineage; each row carries a lineage identifier such as 435492295 or 325492277. Drawing not reproduced.]
`
`Cloudera Exhibit 1024 - Page 1 of 22
`
`
`
[Drawing Sheet 1 of 12: FIG. 1, block diagram of a computer system. Drawing not reproduced.]
`
`
`
[Drawing Sheet 2 of 12: FIG. 2A, schematic diagram of a network. Drawing not reproduced.]
`
`
`
[Drawing Sheet 3 of 12: FIG. 2B, tables in an exemplary database: employee table 150 (empl_id, empl_name, empl_salary, dept_id), department table 152 (dept_id, dept_name, dept_location) and sysindexes table 154 (id, rows, dpages, statblob). Drawing not reproduced.]
`
`
`
[Drawing Sheet 4 of 12: FIG. 3, architecture of an exemplary database management system, showing SQL Server Enterprise Manager 92, Applications 93, Tools 94, SQL Namespace 95, SQL Distributed Management Objects 99, Data Transformation Services 100, the UI/Wizards/HTML interface 91, SQL Server Agent 96, SQL Server Engine 97 and Events 98. Drawing not reproduced.]
`
`
`
[Drawing Sheet 5 of 12: FIG. 4, network of database systems depicting the logical flow of data. Drawing not reproduced.]
`
`
`
[Drawing Sheet 6 of 12: FIG. 5, transformation of data as it moves between databases. Two source tables pass through "Convert Application Data 1 and Add Lineage" 202 and "Convert Application Data 2 and Add Lineage" 204 into a destination table with columns empl_id, empl_name, empl_salary, dept_id and Lineage. Drawing not reproduced.]
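
The conversion steps labeled in FIG. 5 can be approximated as follows. This is a sketch under stated assumptions rather than the patent's own procedure: the function names, the dictionary layout, and the rule that the second source stores names as "Last, First" and has no department column are inferred loosely from the figure labels, and the sample values are illustrative.

```python
def convert_source_1(rows, lineage_id):
    """Assumed case: source 1 already matches the destination layout,
    so the only change is the added lineage identifier."""
    return [{**row, "lineage": lineage_id} for row in rows]


def convert_source_2(rows, lineage_id, dept_id):
    """Assumed case: source 2 uses id/name/salary columns with
    'Last, First' names and needs a department assigned."""
    converted = []
    for row in rows:
        last, first = [part.strip() for part in row["name"].split(",", 1)]
        converted.append({
            "empl_id": row["id"],
            "empl_name": f"{first} {last}",
            "empl_salary": row["salary"],
            "dept_id": dept_id,
            "lineage": lineage_id,
        })
    return converted


# Illustrative input rows (values chosen for the example).
source_1 = [{"empl_id": 12345, "empl_name": "Tom Smith",
             "empl_salary": 30000, "dept_id": 65432}]
source_2 = [{"id": 9991, "name": "Smith, Joe", "salary": 32000}]

# Rows from both sources land in one table but stay distinguishable
# by their lineage identifier.
destination = (convert_source_1(source_1, 435492295)
               + convert_source_2(source_2, 325492277, 15791))
```

Because every converted row keeps the identifier of the step that produced it, a questionable row in the merged table can be attributed to one conversion without inspecting the other.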
`
`
`
[Drawing Sheet 7 of 12: FIG. 6, binding of data lineage information to rows of data in database 70 on server computer 20b. Table 151 has columns empl_id, empl_name, empl_salary, dept_id and Lineage, with lineage identifiers 435492295 and 325492277. Drawing not reproduced.]
`
`
`
[Drawing Sheet 8 of 12: FIG. 7, functional diagram of a data transformation package. Drawing not reproduced.]
`
`
`
[Drawing Sheet 9 of 12: FIGS. 8A-8C, graphical interface for attaching data lineage information to data imported into a database. Drawings not reproduced.]
`
`
`
[Drawing Sheet 10 of 12: FIG. 9, ActiveX script for importing data into a database while adding data lineage information (reference numerals 216b, 233, 234). Script text as recovered from the drawing:]

    '**********************************************************************
    ' Visual Basic Transformation Script
    ' Copy each source column to the
    ' destination column
    '**********************************************************************

    Function Main()
        DTSDestination("au_id") = DTSSource("au_id")
        DTSDestination("au_lname") = DTSSource("au_lname")
        DTSDestination("au_fname") = DTSSource("au_fname")
        DTSDestination("phone") = DTSSource("phone")
        DTSDestination("address") = DTSSource("address")
        DTSDestination("city") = DTSSource("city")
        DTSDestination("state") = ucase(DTSSource("state"))
        DTSDestination("zip") = DTSSource("zip")
        DTSDestination("contract") = DTSSource("contract")
        DTSDestination("Lineage_Full") = DTSSource("DTSLineage_Full")
        DTSDestination("Lineage_Short") = DTSSource("DTSLineage_Short")
        Main = DTSTransformStat_OK
    End Function
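
For readers who do not use VBScript, the per-row logic of the FIG. 9 script can be restated in Python. The column names come from the figure. The function name `transform_row` and the use of plain dictionaries for the source and destination rows are assumptions made for this sketch; it does not call any DTS API.

```python
# Columns that the FIG. 9 script copies without modification.
COPIED_COLUMNS = ["au_id", "au_lname", "au_fname", "phone",
                  "address", "city", "zip", "contract"]


def transform_row(source):
    """Build a destination row from a source row, mirroring FIG. 9."""
    destination = {name: source[name] for name in COPIED_COLUMNS}
    # The only value transformation in the script: upper-case the state.
    destination["state"] = source["state"].upper()
    # Lineage values supplied with the source row are stored in
    # ordinary destination columns alongside the data.
    destination["Lineage_Full"] = source["DTSLineage_Full"]
    destination["Lineage_Short"] = source["DTSLineage_Short"]
    return destination


# Hypothetical sample row; all values are made up for the example.
sample = {
    "au_id": "A-1", "au_lname": "Doe", "au_fname": "Pat",
    "phone": "555-0100", "address": "1 Main St", "city": "Redmond",
    "state": "wa", "zip": "98052", "contract": True,
    "DTSLineage_Full": "full-lineage-value", "DTSLineage_Short": 325492277,
}
result = transform_row(sample)
```

The point of interest is that the lineage columns travel through the same copy step as the business columns, so each imported row arrives already bound to its lineage values.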
`
`
`
[Drawing Sheet 11 of 12: FIG. 10, data pump architecture for importing data into a database. Drawing not reproduced.]
`
`
`
[Drawing Sheet 12 of 12: FIG. 11, window showing data lineage information attached to a row of data. Drawing not reproduced.]
`
`
`
`DATA LINEAGE
`
`CROSS REFERENCE TO RELATED
`APPLICATIONS
`
This application is related by subject matter to the inventions disclosed in commonly assigned U.S. patent application Ser. No. 09/212,238, filed on Dec. 16, 1998, entitled "DATA LINEAGE DATA TYPE" and pending U.S. patent application Ser. No. 09/213,069, filed on Dec. 16, 1998, entitled "GRAPHICAL QUERY ANALYZER."

TECHNICAL FIELD

The present invention relates generally to database systems, and more particularly to a system for maintaining lineage information for data stored in a database.
`
BACKGROUND OF THE INVENTION

A relational database is a collection of related data that is organized in related two-dimensional tables of columns and rows wherein information can be derived by performing set operations on the tables, such as join, sort, merge, and so on. The data stored in a relational database is typically accessed by way of a user-defined query that is constructed in a query language such as Structured Query Language ("SQL"). A SQL query is non-procedural in that it specifies the objective or desired result of the query in a language meaningful to a user but does not define the steps to be performed, or the order of the steps, in order to accomplish the query.

Moreover, very large conventional database systems provide a storehouse for data generated from a variety of locations and applications (often referred to as data warehouses or data marts). The quality and reliability of the storehouse is greatly affected by the quality and reliability of its underlying data. Because the data can originate from a variety of sources, the quality and reliability of data will often depend on the quality and reliability of the source. Moreover, the matter is further complicated because individual rows of data within a single table can originate from different sources.

Currently, if the data in a database is questionable, there is no easy way to track the history of the data to determine where it originated or how it may have been changed. As such, it would be advantageous to users of a database to have tools that allow the users to trace aspects of the history (i.e., where the data originated and how the data has been transformed) of the data in a database.

The task of tracing aspects of the history of data in a database is further complicated in enterprise-wide databases (such as data warehouses) where data may flow into the database from direct as well as indirect data sources (i.e., the data may have been collected from another database that itself directly or indirectly derived the data). In other words, the data may have made multiple "hops" before reaching the destination database of interest.

As such, there is a need for providing a method and apparatus for determining information about the history (i.e., lineage) of data contained within a database.

SUMMARY OF THE INVENTION

Briefly, the present invention is directed toward database technology that provides users with powerful tools necessary to manage and exploit data. The present invention provides a system and method for tracking the lineage of data within database tables. According to an aspect of the invention, data within the tables are tracked by attaching lineage information to the data, preferably by adding a lineage identifier to each row in a table. Data that share a common lineage can be identified by virtue of sharing a common lineage identifier.

The lineage identifier can then be used to trace the source of the data, i.e., data having a common identifier share a common history. Additionally, the lineage identifier can provide details about transformations undergone by the data. For example, the lineage identifier can act as a pointer to a detailed history file of operations that were performed on the data to transform it into its current form. Preferably, the lineage identifier tracks program modules as well as specific versions of the program modules that transformed the particular data under consideration.

As a result of the data lineage mechanism, users can trace the history of data in a table, even when that data has made several hops among databases, where the data has undergone one or more transformations, or where the transforming program modules have themselves undergone revision. This provides users with a powerful mechanism to have higher confidence in the quality and reliability of data in a database and to quickly trace and correct errors in the data.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
Other features of the invention are further apparent from the following detailed description of presently preferred exemplary embodiments of the invention taken in conjunction with the accompanying drawings, of which:
FIG. 1 is a block diagram representing a computer system in which aspects of the present invention may be incorporated;
FIG. 2A is a schematic diagram representing a network in which aspects of the present invention may be incorporated;
FIG. 2B is a diagram representing tables in an exemplary database;
FIG. 3 is an architecture of an exemplary database management system;
FIG. 4 is a network of database systems depicting the logical flow of data;
FIG. 5 is a diagram showing the transformation of data as it moves between databases;
FIG. 6 is a diagram of the binding of data lineage information to rows of data in a database;
FIG. 7 is a functional diagram of a data transformation package;
FIGS. 8A-8C are depictions of a graphical interface for attaching data lineage information to data imported into a database;
FIG. 9 is an ActiveX® script for importing data into a database while adding data lineage information;
FIG. 10 is a data pump architecture for importing data into a database; and
FIG. 11 is a window showing data lineage information attached to a row of data.
`
`DETAILED DESCRIPTION OF THE
`INVENTION
`
`Overview
`
The present invention provides a database management system that provides for tracking and tracing the lineage of data stored in a database. The present exemplary embodiments described herein describe the invention in connection with row level lineage. However, the invention is by no means limited to row level lineage, as the invention could be applied on a column basis or a table basis as well.

Exemplary Operating Environment

1. A Computer Environment

FIG. 1 and the following discussion are intended to provide a brief general description of a suitable computing environment in which the invention may be implemented. Although not required, the invention will be described in the general context of computer-executable instructions, such as program modules, being executed by a computer, such as a workstation or server. Generally, program modules include routines, programs, objects, components, data structures and the like that perform particular tasks or implement particular abstract data types. Moreover, those skilled in the art will appreciate that the invention may be practiced with other computer system configurations, including hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers and the like. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.

With reference to FIG. 1, an exemplary system for implementing the invention includes a general purpose computing device in the form of a conventional personal computer 20 or the like, including a processing unit 21, a system memory 22, and a system bus 23 that couples various system components including the system memory to the processing unit 21. The system bus 23 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. The system memory includes read-only memory (ROM) 24 and random access memory (RAM) 25. A basic input/output system 26 (BIOS), containing the basic routines that help to transfer information between elements within the personal computer 20, such as during start-up, is stored in ROM 24. The personal computer 20 may further include a hard disk drive 27 for reading from and writing to a hard disk, not shown, a magnetic disk drive 28 for reading from or writing to a removable magnetic disk 29, and an optical disk drive 30 for reading from or writing to a removable optical disk 31 such as a CD-ROM or other optical media. The hard disk drive 27, magnetic disk drive 28, and optical disk drive 30 are connected to the system bus 23 by a hard disk drive interface 32, a magnetic disk drive interface 33, and an optical drive interface 34, respectively. The drives and their associated computer-readable media provide non-volatile storage of computer readable instructions, data structures, program modules and other data for the personal computer 20. Although the exemplary environment described herein employs a hard disk, a removable magnetic disk 29 and a removable optical disk 31, it should be appreciated by those skilled in the art that other types of computer readable media which can store data that is accessible by a computer, such as magnetic cassettes, flash memory cards, digital video disks, Bernoulli cartridges, random access memories (RAMs), read-only memories (ROMs) and the like may also be used in the exemplary operating environment. Further, as used herein, the term "computer readable medium" includes one or more instances of a media type (e.g., one or more floppy disks, one or more CD-ROMs, etc.).

A number of program modules may be stored on the hard disk, magnetic disk 29, optical disk 31, ROM 24 or RAM 25, including an operating system 35, one or more application programs 36, other program modules 37 and program data 38. A user may enter commands and information into the personal computer 20 through input devices such as a keyboard 40 and pointing device 42. Other input devices (not shown) may include a microphone, joystick, game pad, satellite disk, scanner or the like. These and other input devices are often connected to the processing unit 21 through a serial port interface 46 that is coupled to the system bus, but may be connected by other interfaces, such as a parallel port, game port or universal serial bus (USB). A monitor 47 or other type of display device is also connected to the system bus 23 via an interface, such as a video adapter 48. In addition to the monitor 47, personal computers typically include other peripheral output devices (not shown), such as speakers and printers.

The personal computer 20 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 49. The remote computer 49 may be another personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the personal computer 20, although only a memory storage device 50 has been illustrated in FIG. 1. The logical connections depicted in FIG. 1 include a local area network (LAN) 51 and a wide area network (WAN) 52. Such networking environments are commonplace in offices, enterprise-wide computer networks, Intranets and the Internet.

When used in a LAN networking environment, the personal computer 20 is connected to the local network 51 through a network interface or adapter 53. When used in a WAN networking environment, the personal computer 20 typically includes a modem 54 or other means for establishing communications over the wide area network 52, such as the Internet. The modem 54, which may be internal or external, is connected to the system bus 23 via the serial port interface 46. In a networked environment, program modules depicted relative to the personal computer 20, or portions thereof, may be stored in the remote memory storage device. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.

2. A Network Environment

FIG. 2 illustrates an exemplary network environment in which the present invention may be employed. Of course, actual network and database environments can be arranged in a variety of configurations; however, the exemplary environment shown here provides a framework for understanding the type of environment in which the present invention operates.

The network may include client computers 20a, a server computer 20b, data source computers 20c, and databases 70, 72a, and 72b. The client computers 20a and the data source computers 20c are in electronic communication with the server computer 20b via communications network 80, e.g., an Intranet. Client computers 20a and data source computers 20c are connected to the communications network by way of communications interfaces 82. Communications interfaces 82 can be any one of the well-known communications interfaces such as Ethernet connections, modem connections, and so on.

Server computer 20b provides management of database 70 by way of database server system software, described more fully below. As such, server 20b acts as a storehouse of data from a variety of data sources and provides that data to a variety of data consumers.
`
`
In the example of FIG. 2, data sources are provided by data source computers 20c. Data source computers 20c communicate data to server computer 20b via communications network 80, which may be a LAN, WAN, Intranet, Internet, or the like. Data source computers 20c store data locally in databases 72a, 72b, which may be relational database servers, Excel spreadsheets, files, or the like. For example, database 72a shows data stored in tables 150, 152, and 154. The data provided by data sources 20c is combined and stored in a large database such as a data warehouse maintained by server 20b.

Client computers 20a that desire to use the data stored by server computer 20b can access the database 70 via communications network 80. Client computers 20a request the data by way of SQL queries (e.g., update, insert, and delete) on the data stored in database 70.

3. Database Architecture

A database is a collection of related data. In one type of database, a relational database, data is organized in a two-dimensional column and row form called a table. FIG. 2B illustrates tables such as tables 150, 152, and 154 that are stored in database 72a. A relational database typically includes multiple tables. A table may contain zero or more records and at least one field within each record. A record is a row in the table that is identified by a unique numeric called a record identifier. A field is a subdivision of a record to the extent that a column of data in the table represents the same field for each record in the table.

A database typically will also include associative structures. An example of an associative structure is an index, typically, but not necessarily, in a form of B-tree or hash index. An index provides for seeking to a specific row in a table with a near constant access time regardless of the size of the table. Associative structures are transparent to users of a database but are important to efficient operation and control of the database management system. A database management system (DBMS), and in particular a relational database management system (RDBMS), is a control system that supports database features including, but not limited to, storing data on a memory medium, retrieving data from the memory medium and updating data on the memory medium.

As shown in FIG. 2B, the exemplary database 72a comprises employee table 150, department table 152, and sysindexes table 154. Each table comprises columns 156 and rows 158 with fields 160 formed at the intersections. Exemplary employee table 150 comprises multiple columns 158 includi

[...]

several types of server software in a network and provides a primary interface for users who are administering copies of SQL Server on the network; (2) an Applications Interface 93 that allows integration of a server interface into user applications such as Distributed Component Object Modules (DCOM); and (3) a Tools Interface 94 that provides an interface for integration of administration and configuration tools developed by Independent Software Vendors (ISV).

Layer two opens the functionality of the SQL server to other applications by providing three application programming interfaces (API): SQL Namespace 95, SQL Distributed Management Objects 99, and Data Transformation Services 100. A user interface 91 is provided by Wizards, HTML, and so on. SQL Namespace API 95 exposes the user interface (UI) elements of SQL Server Enterprise Manager 92. This allows applications to include SQL Server Enterprise Manager UI elements such as dialog boxes and wizards.

SQL Distributed Management Objects API 99 abstracts the use of DDL, system stored procedures, registry information, and operating system resources, providing an API to all administration and configuration tasks for the SQL Server.

Data Transformation Services API 100 exposes the services provided by SQL Server to aid in building data warehouses and data marts. As described more fully below, these services provide the ability to transfer and transform data between heterogeneous OLE DB and ODBC data sources. Data from objects or the result sets of queries can be transferred at regularly scheduled times or intervals, or on an ad hoc basis.

Layer three provides the heart of the SQL server. This layer comprises an SQL Server Engine 97 and a SQL Server Agent 96 that monitors and controls SQL Server Engine 97 based on Events 98 that inform SQL Server Agent of the status of the SQL Server Engine 97. The Server Engine processes SQL statements, forms and optimizes query execution plans, and so on.

Logical Database Application

The above description focused on physical attributes of an exemplary database environment in which the present invention operates. FIG. 4 logically illustrates the manner in which data moves among a number of database servers, which may simultaneously be data sources for other database servers, to the destination database.

Here, Database server 20b provides management of database 70. Data for database 70 is provided by data sources
