`[11] Patent Number:
`[19]
`United States Patent
`
`D0kt0r
`[45] Date of Patent:
`Oct. 20, 1998
`
`U8005826259A
`
`[54] EASILY EXPANDABLE DATA PROCESSING
`SYSTEM AND METHOD
`
`[75]
`
`Inventor: Karol Doktor, Wheelers Hill, Australia
`
`[73] Assignee: Financial Systems Technology Pty.
`Ltd., Melbourne, Australia
`
`[21] Appl. No.: 862,176
`.
`Flledi
`
`[22]
`
`May 22: 1997
`D t
`t'
`R l t d U S A l'
`a a
`e a e
`'
`'
`pp lca 1011
`11 1995 Pat No
`[60] Continuation of Sci N0 439 207 Ma
`5,675,779, which is a division of Ser. 3190283361), Jun. 28,
`1993, abandoned, which is a continuation of Ser. No.
`526’424’ May 21’ 1990’ abandoned
`Int. Cl.6 ...................................................... G06F 17/30
`[51]
`[52] us. Cl.
`...................................................... 707/4, 707/1
`[58] Field Of Search .................................... 707/1, 2, 3, 4,
`707/5, 6, 7, 8, 9, 10, 100, 101, 102, 103,
`104, 200; 345/352, 335
`
`[56]
`
`References Cited
`
`
`
`U~S- PATENT DOCUMENTS
`11/1971 Feng ..................................... 340/1725
`3,618,027
`6/1972 Bharwani et a1.
`340/1725
`3,670,310
`12/1978 Lin et a].
`..........
`364/900
`4,128,891
`1/1985 Kitakami et a1.
`.
`364/900
`4,497,039
`2/1985 Baker et a1.
`......
`364/900
`4,498,145
`3/1986 LlndStlom et al-
`364/300
`45759798
`12/1986 BaChman """""
`" 364/200
`476317664
`29%??? 1313:;
`:1};me """"""""""""""""" 323%?)
`
`2/1989 Baba ..
`4,807,122
`364/200
`5/1989 Green ............
`4,829,427
`364/300
`
`“1990 Shimaoka et a1.
`.. 364/200
`4,893,232
`
`.......................... 364/200
`4,901,229
`2/1990 Tashiro et al.
`4,918,593
`4/1990 Huber ...................................... 364/200
`4,930,071
`5/1990 Tou et a1.
`......
`364/300
`499309072
`5/1990 Aglawal et al-
`364/300
`rus e a .
`273337333
`31338 gadeflf elt a1~
`322/383
`4,967,341
`10/1990 Yamamoto et al.
`.................... 364/200
`
`--
`
`
`
`OTHER PUBLICATIONS
`,
`,
`,
`El—Sharkaw1 et al., “The Architecture and Implementation
`0f ENLI: Example—Based Natural Language—Assisted Inter-
`face”, 1990 IEEE, pp. 430—432.
`Kiefer et al., “SYGRAF: Implenting Logic Programs in a
`Database Style”, IEEE Transaction on Software Engineer-
`ing, vol. 14, No. 7, Jul. 1988, pp. 922—935.
`
`Korth et al., Database System Concepts, McGraw—Hill
`Book Company, Copyright 1986, pp. 45—323.
`Yu et al., “Automatic Knowledge Acquisition and Mainte-
`nance for Sematic Query Optimization”, IEEE Transactions
`on Knowledge and Data Engineering, vol. 1, No. 3, Sep.
`1989’ pp' 362—375
`(List continued on next page.)
`
`Primary Examiner—Thomas G. Black
`Assistant Examiner—Buay Lian Ho
`Attorney) Age/tn or Firm—Skjerven, Merrill, MacPherson,
`Franklin & Friel LLP; Edward C Kwok
`[57]
`ABSTRACT
`
`Machine automated techniques are described for a method
`of data processing called Relationships Processing. A com-
`puting system is disclosed which provides for the high speed
`recording and extraction of data objects (entities) and for the
`deVelOPmem Ola.” representing a queried relationShiP
`between the entities. The system is expandable to handle the
`relatively voluminous data bases of large, commercial data
`repositories. A user defines set of entities and allowed
`relationships between the entities. The user can expand this
`set of allowed entities and relationships at any time during
`the life of the system without reprogramming or compiling
`of computer program code or disrupting concurrent opera-
`tional use of the system. Large systems cannow be built that
`are no longer limited to a scope of design requirements
`known during initial systems development. For a given set
`of defined relationships the system allows the user to per-
`form complex inquiries (again without programming at the
`code level) that would normally require multiple nested
`inquiries to be coded programmatically and would not
`achieve the performance levels of the Relationships Proces-
`SOL
`
`(List continued on next page.)
`
`18 Claims, 19 Drawing Sheets
`
`ELATION R-l
`
`mam
`E.
`
`_—a"‘"8;ii;3?%Al"'EXéei3nlaalyimm
`
`Mandate——upling=Yes/No
`]1/E2
`‘i/m
`s busnw H I
`
`[NSTANCE
`
`1/E1
`
`= ”CustomerB“
`
`
`
`
`
`
`IBM EX. 1014
`
`001
`
`001
`
`
`
`5,826,259
`
`Page 2
`
`US. PATENT DOCUMENTS
`
`5,539,870
`5,542,073
`5,548,749
`5,581,785
`5,664,177
`
`.......................... 345/352
`7/1996 Conrad et al.
`........................ 395/600
`7/1996 Schiefer et a1.
`395/600
`8/1996 Kroenke et al.
`
`707/103
`12/1996 Nakamura etal.
`9/1997 Lowry ..................................... 707/100
`
`OTHER PUBLICATIONS
`
`Wilschut et a1., “Pipelining in Query Execution”, 1990
`IEEE Copyright 1990 p. 562.
`.
`’
`.
`.
`.
`’
`“Extended DISJuHCnVe Normal Form for Effluent Process—
`ing of Recursive Logic Quires”, IBM Technical Disclosure
`Bulletin, V01. 30, N0. 1, Jun. 1987, pp. 360—366.
`
`5,133,068
`5,168,565
`5,226,158
`5,239,663
`5,369,761
`5,379,419
`
`7/1992 Cru5 et a1.
`12/1992 Morlta ~~~~~
`7/1993 Horn etal-
`8/1993 Faudemay et a1.
`11/1994 Conley et a1.
`1/1995 Heffernan et a1.
`
`.............................. 395/600
`~395/600
`- 395/600
`. 395/800
`..... 707/2
`. 395/600
`
`
`
`
`- 395/600
`“1995 Boykln etal-
`573867557
`. 395/600
`1/1995 Eisenberg et a1.
`5,386,559
`. 395/600
`..
`4/1995 Bigelow etal.
`5,408,657
`.......................... 707/101
`5,459,860 10/1995 Burnett et al.
`5,488,722
`1/1996 Potok ...................................... 395/600
`5,504,885
`4/1996 Alashqur ................................. 395/600
`
`002
`
`002
`
`
`
`US. Patent
`
`Oct. 20, 1998
`
`Sheet 1 0f 19
`
`5,826,259
`
`
`
`
`‘ ccess Control
`Program
`(Source Code)
`
`
`
`
`
`BULK STORAGE
`(Binary Code Strings)
`
`r—______________________________
`
`
`Aboasacontrol
`Erogram.
`(Object Code)
`
`
`
`
`UERY INPUT:
`(SOL)
`
`
`"Please find
`all books
`having xxx“
`
`
`
`
`UPDATE INPUT:
`(SQL)
`
`
`"Please add
`new books
`yyy-zzz"
`
`
`
`60
`
`FIG. 1A
`
`(Prior An)
`
`003
`
`003
`
`
`
`US. Patent
`
`Oct. 20, 1998
`
`Sheet 2 0f 19
`
`5,826,259
`
`55528
`
`mmm?
`
`$82
`
`33
`
`my
`
`
`
` 355Vm:.OE
`
`$55$052
`
`m:2
`
`053
`
`223
`
`004
`
`004
`
`
`
`
`
`US. Patent
`
`Oct. 20, 1998
`
`Sheet 3 0f 19
`
`5,826,259
`
`_.-<N.GE
`
`:3EE
`
`v
`
`mkoom
`
`8:83
`
`m_._o£:<
`
`
`8:83
`
`968E:
`
`5:8:
`
`8:
`
`05:85:
`
`8:8;
`
`982
`
`9.685.
`
`Egon
`
`005
`
`988
`
` 8:83
`
`
`@852
`
`982
`
`005
`
`
`
`
`
`
`
`US. Patent
`
`Oct. 20, 1998
`
`Sheet 4 0f 19
`
`5,826,259
`
`N-<N.0_n_
`
`C:88v
`
`8:83
`
`888::
`
`8:8;
`
`25
`
`@885.
`
`888m
`
`lllllll
`
`882
`
`Fm.—mn_
`
`582.28%_
`
`mmm
`
`888
`
`8:83
`
`8:83
`
`@888:
`
`8:8;
`
`25
`
`888$.
`
`8:8;
`
`282
`
`988E:
`
`8:8;
`
`88,:
`
`282
`
`omm
`
`006
`
`lllllllllllllllllIL4..oz88m
`
`006
`
`
`
`
`
`
`
`
`
`
`US. Patent
`
`m,
`
`whS
`
`5,826,259
`
`86828:2.Em
`
`
`
`8:83952In:.oz
`
`007
`
` 9
`
`Tmm._®_n_
`
`€<5i
`
`aHokmvOHzO_.EN_z<mVEum._m<_.-m_>_._.<.m_m
`
`
`
`%8:83282IFI.02w86839:3Em
`
`5mxoommLo£s<Em
`m8:839&24.oz
`
`007
`
`
`
`pm_
`
`01,_N-m_N.6_n_
`
`_€<55V
`
`5,826,259
`
`_
`
`mm.G_n_
`
`US. Patent
`
`m.
`
`8w
`
`6whS
`
`2£259552Em
`M,8:839&2Im.02
`
`
`EmES
`
`Em
`
`081a
`
`008
`
`008
`
`
`
`
`US. Patent
`
`Oct. 20, 1998
`
`Sheet 7 0f 19
`
`5,826,259
`
`wagSM*02an5m
`
`
`
`mm:€<horn:
`
`Tm.GE
`
`Semi
`
`=V=HZD—
`
`“*EII9884I
`
`mEmz
`
`009
`
`009
`
`
`
`
`
`
`US. Patent
`
`Oct. 20, 1998
`
`Sheet 8 0f 19
`
`5,826,259
`
`wrm.GE
`
`92a:
`
`Eoccmm
`
`88E30
`
`~bomr
`
`S\</\
`
`22
`
`‘__——————.I
`l
`
`.6min...
`
`.m..=io._.fi..oio.._.
`
`awm
`
`010
`
`010
`
`
`
`
`
`US. Patent
`
`Oct. 20, 1998
`
`Sheet 9 0f 19
`
`5,826,259
`
`FIG. 4A
`
`4m
`
`
`
`RELATION R-i
`
`Head Attribute + Tail Attribute
`Meanin =
`{ gar
`dinality = one to many}
`
`-Mandatory Coupling = Yes/No
`
`
`
`
`
`
`
`INSTANCE
`'1/E1
`
`Address
`
`
`
`
`II -: I“ fin-Iii}!
`= "Customer-B"
`
`
`
`
`
`
`/
`
`/
`
`
`
` W
`
`
`
`011
`
`011
`
`
`
`US. Patent
`
`Oct. 20, 1998
`
`Sheet 10 0f 19
`
`5,826,259
`
`FIG. 4B
`
`.501 O
`
`/
`
` Account
`
`Telephone
`
`/ -
`
`/
`
`/
`
`012
`
`012
`
`
`
`US. Patent
`
`Oct. 20, 1998
`
`Sheet 11 0f 19
`
`5,826,259
`
`FIG. 5
`
`Name of
`Single Table
`where instances
`are stored
`
`.Addresse
`
`T.Companie
`
`A
`
`Entity
`Type
`NO'
`2
`Slot No.
`
`.1.
`
`Entity
`Class Name
`Abbreviation
`(Full Name)
`
`CU
`(Customer)
`
`AD
`
`(Address)
`
`AC
`
`(Account )
`
`SU
`(Supplier)
`
`EA
`
`013
`
`013
`
`
`
`US. Patent
`
`()cL 20,1998
`
`Sheet12 0f19
`
`5,826,259
`
`FIG. 6A
`
`130-RP
`
`600
`
`RELDEF Table
`#6
`
`Name of
`Relation Relation
`First
`Single Table
`Tail Entity Cardrnalrty
`Class Name where instances Head Entity
`6001
`of Relation
`Type Number Type Number
`(Full Name)
`(Full Name)
`are stored
`
`
`
`.1
`
`(Customer)
`
`.2
`(Address)
`
`.3
`(Account)
`
`,1
`(Customer
`
`1
`
`3-
`(Account)
`
`2
`(Address)
`
`1
`(Customer)
`
`2
`(Address)
`
`{1:m}
`
`{1:1}
`
`{1 21}
`
`{1:1}
`
`I
`
`('s Owning)
`
`.3.
`
`.4.
`
`-SM-
`
`('5 St.
`Mailing)
`
`-HQ-
`('3 Main
`Headqtrs)
`
`Mandatory
`Head
`Coupling
`00g
`
`Optional
`Second thru Fifth
`Tail Entity
`Names
`(type)
`
`600e
`
`014
`
`014
`
`
`
`US. Patent
`
`Oct. 20, 1998
`
`Sheet 13 0f 19
`
`5,826,259
`
`130-RP
`
`699
`
`RELDEF Table
`
`Optional
`Second thru Fifth
`Tail Entity
`Names
`Region
`6_0Qe
`
`(
`
`.4
`(Supplier
`
`.5
`(Area)
`
`T5
`
`Tail-Activation
`Mask
`(606)
`
`No.
`
`.6
`.8
`(Phone (Contact
`No.)
`Person)
`
`-3
`(Account)
`
`Entity
`Type Number
`(and Name)
`
`Relation
`Type
`
`015
`
`015
`
`
`
`US. Patent
`
`Oct. 20, 1998
`
`Sheet 14 0f 19
`
`5,826,259
`
`______
`
`r
`
`________w_
`
`/\/
`
`_—_._..
`
`._—_
`
`ESEm____
`
`r
`
`m8392::E.m_ov
`_WNE-Dm-
`
`________-________4
`
`be.2_
`
`623.83E.:5.
`
`
`memcom-
`£3E
`
`
`HmmDOmm
`
`:2
`
`9:225
`
`”3:80:399:
`
`lam:
`
`016
`
`016
`
`
`
`
`
`
`
`US. Patent
`
`Oct. 20, 1998
`
`Sheet 15 0f 19
`
`5,826,259
`
`!
`
`Illlll—lllllll
`
`;_________
`
`mammengg9.”NE
`a.lllllllllll
`
` .~...
`SE8:5829255p8;
`
`.58:82$9:memamm.
`
`83855cm;mmm
`.wmmgvAll...
`
`NS.GE
`
`017
`
`"Tbm
`
`
`
`'dlmecmEoop
`
`o:
`
`
`
`mm__n_oEoS<m.:m=<
`
`.2:860mwhom
`
`017
`
`
`
`
`
`
`
`
`
`
`
`US. Patent
`
`Oct. 20, 1998
`
`Sheet 16 0f 19
`
`5,826,259
`
`00
`
`
` BULK STORAGE
`
`(Binary Code Strings)
`
`130* j INQ.DEF
`
`
`
`
`REITDEF
`RELDEF
`t
`
`/-RiT-1\ /-RiT-2'\ /—
`
`
`ENT.DEF
`l
`
`EiT-1
`
`
`
`
`
`
`
`
`ENT.DEF
`|
`EiT-2
`
`ENTDEF
`1
`EiT-3
`
`Access Control
`Program
`(Source Code
`
`Access Control
`Program
`
`
`
`Ob
`
`
`UERY INPUT:
`
`W UERY OUTPU
`
`
`
`FIG' 8
`
`140
`
`150
`
`"Please find
`
`
`all books
`
`
`having xxx"
`
`
`
`
`HEW
`Restructure
`
`SEQST?
`"Please add
`
`
`
`
`new bootlfs
`"Please add
`
`
`
`YW‘ZZZ
`new REL
`Rnn-Rmm"
`
`
`870
`
`018
`
`018
`
`
`
`US. Patent
`
`Oct. 20, 1998
`
`Sheet 17 0f 19
`
`5,826,259
`
`EDBIDU
`
`oom
`
`_.m.m:“—Emao9.2%N8?
`
`
`
`05sac“.512E507:
`
`8m
`
`DDDIDD
`
`EEEIEE2E.
`
`.................N.-F2E-Fzm+25
`
`mzfis.mosoE50;
`
`5a
`
`onCI
`
`
`
`33II.03I33I
`
`
`
` /I.|.|I:
`
`IIIENE/tawm33
`
`28m/III
`
`
`
`
`
`mzfi:Iomfimoz<moEEmzoEjmmENE«mm
`
`Ill
`
`magi(II
`
`019
`
`019
`
`
`
`
`
`
`
`US. Patent
`
`Oct. 20, 1998
`
`Sheet 18 0f 19
`
`5,826,259
`
`N-m.GE
`
`m.GE
`
`om.
`
`
`
`295mgmEUmEEE
`
`gm:588253
`
`m_20£33m@25me moi
`
`
`
`0:3555%;mmm---mo_co:oo_mmequw
`
`sea9:52:SN........9:28m98m
`
`
`
`55830m2..........moS<.822
`
`2650
`
`
`
`mo<mokm>_._._.Zm_
`
`E
`
`55m
`
`mz<m=2
`
`NucmTtm
`
`020
`
`020
`
`
`
`
`
`
`US. Patent
`
`Oct. 20, 1998
`
`Sheet 19 0f 19
`
`5,826,259
`
`Iago:m.IER_188mm;$8:ng38:32.Imocgwg
`
`
`$24.:IHmm;E.
`
`
`
`SufiAIIIIIII058,595A||ln|938255Allllllxllmgtsm
`520:m..
`
`m-_m>m.__.-_w>m._
`
`@955,55352:m_
`Ego:m.|
`
`.,
`
`_LuQo.
`
`6201m.
`
`_.BEEmE2Enw:
`
`\\I')/.w,3,.$901w..éEUao_.$3.
`
`
`E\c.e___M8%0_NEE
`
`021
`
`
`
`EN>x_a-_-
`EESEE%o_Iu
`
`EN_>_x_
`
`
`\I/).2/.gH620:w..5388,6855;
`
`
`
`
`
`
`
`E?>3E;9:3599%28E2:82Ebdf.2.,E..E.6:9,6952:0
`
`021
`
`
`
`
`
`
`5,826,259
`
`1
`EASILY EXPANDABLE DATA PROCESSING
`SYSTEM AND METHOD
`
`This application is a continuation of application Ser. No.
`08/439,207, filed May 11, 1995, now US. Pat. No. 5,675,
`779 which is a divisional of Ser. No. 08/083,361, filed Jun.
`28, 1993 now issued; which is a continuation of Ser. No.
`07/526,424, filed May 21, 1990, now abandoned.
`
`BACKGROUND OF THE INVENTION
`
`Cross Reference to Microfiche Appendix
`
`This application includes a plurality of computer program
`listings (modules) in the form of a Microfiche Appendix
`which is being filed concurrently herewith as 1162 frames
`(not counting target and title frames) distributed over 20
`sheets of microfiche in accordance with 37 C.F.R. § 1.96.
`The disclosed computer program listings are incorporated
`into this specification by reference but it should be noted that
`the source code and/or the resultant object code of the
`disclosed program modules are subject to copyright protec-
`tion. The copyright owner has no objection to the facsimile
`reproduction by anyone of the patent document (or the
`patent disclosure as it appears in the files or records of the
`US. Patent and Trademark Office) for the sole purpose of
`studying the disclosure but otherwise reserves all other
`rights to the disclosed computer program modules including
`the right to reproduce said computer program modules in
`machine-executable form.
`
`Field of Invention
`
`The present invention relates generally to computer data-
`base management systems and more specifically to appara-
`tus and methods for modifying and searching through large
`scale databases at high speed.
`
`Description of Related Art
`
`Modern computer systems are capable of storing volumi-
`nous amounts of information in bulk storage means such as
`magnetic disk banks. The volume of stored information can
`be many times that of the textual information stored in a
`conventional encyclopedia or in the telephone directory of a
`large city. Moreover, modern computer systems can sift
`through the contents of their bulk storage means at
`extremely high speed, accessing as many as one million
`bytes of information or more per second (a byte is a string
`of eight bits, equivalent to approximately one character of
`text in layman’s terms). Despite this capability, it may take
`an undesirably long time (i.e., hours or days) to retrieve
`desired pieces of information. In commercial settings such
`as financial data storage facilities, there will be literally
`billions of pieces of information that could be sifted through
`before the right one or more pieces of information are found.
`Thus, even at speeds of one million examinations per
`second, it can take thousands of seconds (many hours) to
`retrieve a desired piece of information. Efficient organiza-
`tion of the stored information is needed in order to minimize
`retrieval time.
`
`The methods by which pieces of information are orga-
`nized within a computer, searched through or reorganized,
`often parallel techniques used by older types of manual
`information processing systems. A well known example of
`a manual system is the index card catalog found in public
`libraries. Such a card catalog consists of a large number of
`uniformly dimensioned paper cards which are serially
`stacked in one or more trays. The cards are physically
`
`5
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`2
`positioned such that each card is directly adjacent to no more
`than two others (for each typical examination there is a
`preceding card, the card under examination and a following
`card in the stack). On the front surface of each index card a
`librarian enters, in left to right sequence; the last name of an
`author, the first name of the author, the title of a single book
`which the author wrote and a shelf number indicating the
`physical location within the library where the one book may
`be found. Each of these four entries may be referred to as a
`“column” entry. Sufficient surface area must be available on
`each card to contain the largest of conceivable entries.
`After the entries are made, the index cards are stacked one
`after the next in alphabetical order, according to the author’s
`last name and then according to the author’s first name and
`then by title. This defines a “key-sequenced” type of data-
`base whose primary sort key is the author’s name. The
`examination position of each card is defined relative to the
`contents of preceding and following cards in the stack. That
`is, when cards are examined, each intermediate card is
`examined immediately after its alphabetically preceding
`card and immediately before its alphabetically succeeding
`card. When a new book is acquired,
`the key-sequenced
`database is easily “updated” by inserting a new card between
`two previously created cards. Similarly, if a book is removed
`from the collection, its card is simply pulled from the card
`stack to reflect the change.
`If a library user has an inquiry respecting the location of
`a particular book or the titles of several books written by a
`named author, the librarian may quickly search through the
`alphabetically ordered set of index cards and retrieve the
`requested information. However, if a library user has an
`inquiry which is not keyed to an author’s name, the search
`and retrieval process can require substantially more time; the
`worst case scenario being that for each inquiry the librarian
`has to physically sift through and examine each card in the
`entire catalog. As an example of such a scenario, suppose
`that an inquiring reader asks for all books in the library
`where the author’s first name is John and the title of the book
`
`contains the word “neighbor” or a synonym thereof.
`Although it is conceptually possible to answer this inquiry
`using the information within the catalog, the time for such
`a search may be impractically long, and hence, while the
`information is theoretically available, it is not realistically
`accessible.
`
`To handle the more common types of inquiries, libraries
`often keep redundant sets of index cards. One set of cards is
`sorted according to author names and another set is sorted
`according to the subject matter of each book. This form of
`redundant storage is disadvantageous because the size of the
`card catalog is doubled and hence, the cost of information
`storage is doubled. Also, because two index cards must be
`generated for each new book added to the collection the cost
`of updating the catalog is also doubled.
`The size of a library collection tends to grow over time as
`more and more books are acquired. During the same time,
`more and more index cards are added to the catalog. The
`resulting stack of cards, which may be viewed as a kind of
`“database”, therefore grows both in size and in worth. The
`“worth” of the card-based system may be defined in part as
`the accumulated cost of all work that is expended in creating
`each new index card and in inserting the card into an
`appropriate spot in the stack.
`As time goes by, not only does the worth and size of the
`database grow, but new technologies, new rules, new
`services, etc., begin to emerge and the information require-
`ments placed on the system change. Some of these changes
`
`022
`
`022
`
`
`
`5,826,259
`
`3
`may call for a radical reorganization of the card catalog
`system. In such cases, a great deal of work previously
`expended to create the catalog system may have to be
`discarded and replaced with new work.
`For the sake of example, let it be supposed that the library
`acquires a new microfilm machine which stores copies of a
`large number of autobiographies. The autobiographies dis-
`cuss the life and literary works of many authors whose books
`are kept in the library. Let it further be supposed that the
`original, first card catalog system is now required to cross
`reference each book to the microfilm location (or plural
`locations) of its author’s (or plural authors’) autobiogra-
`phies. In such a case, the card catalog system needs to be
`modified by adding at least one additional column of infor-
`mation to each index card to indicate the microfilm storage
`locations of the relevant one or more autobiographies.
`We will assume here that there is not enough surface area
`available on the current index cards for adding the new
`information. Larger cards are therefore purchased, the infor-
`mation from the old cards is copied to the new cards, and
`finally, the new microfilm cross referencing information is
`added to the larger cards. This type of activity will be
`referred to here as “restructuring” the database.
`Now let us suppose,
`that as more time goes by, an
`additional but previously unanticipated, cross indexing cat-
`egory is required because of the introduction of a newer
`technology or a new government regulation. It might be that
`the just revised and enlarged second card system does not
`have the capacity to handle the demands of the newer
`technology or regulation. In such a situation, a third card
`system has to be constructed from scratch. The value of
`work put into the creation of the just-revised second system
`is lost. As more time passes and further changes emerge in
`technology, regulations, etc., it is possible that more major
`organizational changes will have to be made to the catalog
`system. Time after time, a system will be built up only to be
`later scrapped because it fails to anticipate a new type of
`information storage and retrieval operation. This is quite
`wasteful.
`
`Although computerized database systems are in many
`ways different from manual systems,
`the computerized
`information storage and retrieval systems of the prior art are
`analogous to manual systems in that
`the computerized
`databases require similar restructuring every time a new
`category of information relationships or a new type of
`inquiry is created.
`At a fundamental level, separate pieces of information are
`stored within a computerized database system as a large
`number of relatively short strings of binary bits where each
`string has finite length. The bit strings are distributed spa-
`cially within a tangible medium of data storage such as an
`array of magnetic disks, optical devices or other information
`representing means capable of providing mass storage. Each
`bit is represented by a magnetic flux reversal, an optical
`perturbation and/or some other variance in the physical
`attributes of a data storage medium. A transducer or ampli-
`fier means converts these variances into signals (e.g.,
`electrical, magnetic, or optical) which can be processed on
`a digital data processing machine. Each string of bits is often
`uniquely identified by its physical location or by a logical
`storage address. Some bit strings may function as address
`pointers, rather than as the final pieces of “real” information
`which a database user wishes to obtain. The address pointers
`are used to create so-called “threaded list” organizations of
`data wherein logical
`links between a first
`informational
`“object” (first piece of real data) and a second informational
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`4
`
`“object” (second piece of real data) are established by a
`chain of direct or indirect address pointers. The user-desired
`objects of real information themselves can be represented by
`a collection of one or more physically or logically connected
`strings.
`Typically, “tables” of information are created within the
`mass storage means of the computerized system. Ahorizon-
`tal “row” of related objects, which is analogous to a single
`card in a card catalog system, may be defined by placing the
`corresponding bit strings of the objects in physical or
`address proximity with each other. Logical interconnections
`may be defined between different rows by using ancillary
`pointers (which are not considered here as the “real” data
`sought by a database user). A serial sequence of “rows”
`(analogous to a stack of cards) is then defined by linking one
`row to another according to a predefined sorting algorithm
`using threaded list techniques.
`linking “threads” may be
`A vast number of different
`defined in this way through a database table having millions
`or billions of binary information bits. Unlike manual
`systems, the same collection of rows (which replaces the
`manual stack of cards) can be simultaneously ordered in
`many different ways by utilizing a multiplicity of threaded
`paths so that redundant data storage is not necessary.
`Searches and updates may be performed by following a
`prespecified thread from one row to the next until a sought
`piece of information (or its address) is found within a table.
`A threaded-list type of table can be “updated” in a manner
`similar to manual card systems by breaking open a logical
`thread within the list, at a desired point, and inserting a new
`row (card) or removing an obsolete row at the opened spot.
`Tables are often constructed according to a “key-
`sequenced” approach. One column of a threaded-list table is
`designated as the sort-key column and the entries in that
`column are designated as “sort keys”. Address pointers are
`used to link one row of the table to another row according
`to a predefined sequencing algorithm which orders the
`entries (sort-keys) of the sort column as desired (i.e.,
`alphabetically, numerically or otherwise). Once a table is so
`sorted according to the entries of its sort column, it becomes
`a simple task to search down the sort column looking for an
`alphabetically, numerically or otherwise ordered piece of
`data. Other pieces of data which are located within the row
`of each sort key can then be examined in the same sequence
`that each sort key is examined. Any column can serve as the
`sort column and its entries as the sort keys. Thus a table
`having a large plurality of columns can be sorted according
`to a large number of sorting algorithms.
`The key-sequencing method gives tremendous flexibility
`to a computerized database but not without a price. Each
`access to the memory location of a list-threading address
`pointer or to the memory location of a sort-key or to the
`memory area of “real” data which is located adjacent to a
`sort-key takes time. As more and more accesses are required
`to fetch pointers and keys leading to the memory location of
`a piece of sought-after information (“real data”),
`the
`response time to an inquiry increases and system perfor-
`mance suffers.
`
`There is certain class of computerized databases which
`are referred to as “relational databases”. Such database
`
`systems normally use threaded list techniques to define a
`plurality of key-sequenced “tables”. Each table contains at
`least two columns. One column serves as the sort column
`while a second or further columns of the table store either
`
`the real data that is being sought or additional sort-key data
`which will ultimately lead to a sought-after piece of real
`
`023
`
`023
`
`
`
`5,826,259
`
`5
`data. The rows of the table are examined in an ordered
`fashion according to the contents of the sort column. Target
`data is located by first threading down the sort column and
`thus moving through the chain of rows within a table
`according to a prespecified sort algorithm until a specific
`sort-key is found. Then the corresponding row is examined
`horizontally and the target data (real data or the next key) is
`extracted from that row.
`
`An example of “real” data would be the full-legal names
`of unique persons such as in the character strings, “Mr.
`Harry W. Jones”, “Mrs. Barbara R. Smith”, etc. The sort-key
`can be a number which is stored adjacent to the full name
`and which sequences the names (real data) according to any
`of a wide variety of ordering patterns including by age, by
`height, by residential address, alphabetically, etc. Because
`the real data (e.g., full name of a person) is stored in a
`separate column, it is independent from the sort key data. A
`large variety of different relations can therefore be estab-
`lished between a first piece of real data (e.g., a first person’s
`name) and a second piece of real data (e.g., a second
`person’s name) simply by changing the sort keys that are
`stored in the separate sort column (e.g., who is older than
`whom, who is taller, etc.). Plural orderings of the real data
`can be obtained at one time by providing many columns in
`one table, by storing alternate keys in the columns and by
`choosing one or more of these columns as the primary sort
`key column.
`Relational database systems often include tables that do
`not store real data in a column adjacent to their sort-key
`column, but rather store a secondary key number which
`directs a searcher to a row in another key-sequenced table
`where a matching key number is held together with either a
`piece of sought-after real data or yet another forward refer-
`encing key number (e.g., an entry which in effect says “find
`the row which holds key number x of yet another table for
`further details”). With this indirect key-sequenced approach,
`a large number of tables can be simultaneously updated by
`changing one entry in a “base” table.
`Relational database tables are normally organized to
`create implied set and subset “relations” between their
`respective items of pre-stored information. The elements of
`the lowest level subsets are stored in base tables and higher
`level sets are built by defining, in other tables, combinations
`of keys which point to the base tables. The implied relations
`between elements cannot be discerned by simply inspecting
`the raw data of each table. Instead, relations are flushed out
`only with the aid of an access control program which
`determines in its randomly-distributed object code, which
`table to examine first and what column to look at before
`
`beginning to search down the table’s column for a key
`number and, when that key number is found, what other
`column to look at for the real data or a next key number.
`Relations between various “entities” of a relational database
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`are implied by the sequence in which the computer accesses
`them.
`
`55
`
`By way of a concrete example, consider a first relational
`table (Names-Table) which lists the names of a large number
`of people in telephone directory style. Each name (each
`separate item of real data) is paired to a unique key number
`and the rows of this Names-Table are sorted sequentially
`according to the key number. A second relational table may
`be provided in the database (Cars-table) which lists auto-
`mobile (vehicle) identification numbers (VIN) each paired in
`its row with a second key number. If the second key number
`is matched by a corresponding key number in the first table,
`then a relationship might be implied between the entries of
`the two separate tables (Names-Table and Cars-Table). The
`
`60
`
`65
`
`6
`“implied” relationship might be one of an infinite set of
`possibilities. The relationship could be, for example, that the
`car listed in the second table is “owned” by the person whose
`name is found next to a matching key in the first table. On
`the other hand, it might be implied that the matched person
`in the first table “drives” the car, or “cleans” the car or has
`some other relation to the car. It is left to the access control
`
`program to define what the relationship is between entities
`in the first table and entities in the second table.
`
`It can be seen that relational database systems offer users
`a great deal of flexibility since an infinite number of relations
`may be defined (implied). Economy in maintaining
`(updating) the database is also provided-since a change to a
`base table propagates through all other tables which refer-
`ence the base table. The access control program of the
`database system can include information-updating modules
`which, for example, change the key number in the second
`table (Cars-Table) whenever ownership of a car changes. If
`the name of the new owner is already in the first table
`(Names-Table), it does not have to be typed a second time
`into a new storage area and thus, extra work and storage
`redundancy are avoided. The vehicle identification number
`(VIN) remains unchanged. Minimal work is thus expended
`on updating the database.
`Despite these advantages, relational database systems
`suffer from expandability and restructuring problems similar
`to those of the above-described manual system. Sometimes
`the rows within a particular table have to be altered to add
`additional columns. This is not easily done. Suppose for
`example, that a new government regulation came into being,
`mandating that vehicles are to always be identified not only
`by a vehicle identification number (VIN) but also by the
`name and location of the factory where the vehicle was
`assembled. If spare columns are not available in the Cars-
`Table, the entire database may have to be restructured to
`create extra room in the storage means (i.e. the disk bank)
`for adding the newly required columns. New key numbers
`will have to be entered into the new columns of each row
`
`(e.g., a new “factory of assembly” key number) and sorted
`in order to comply with the newly mandated regulation. New
`search and inquiry routines will have to be written for
`handling the newly structured tables.
`In the past, much of this restructuring work was done by
`reprogramming the computer at the object code or source
`code level. This process relied heavily on an expert pro-
`gramming staff. It was time consuming, costly and prone to
`programming errors. Worst of all, it had to be redone time
`and again as new informational requirements emerged just
`after a last restructuring project was completed. There is a
`need in the industry for a database management system
`which provides quick responses to inquiries and which can
`also be continuously updated or restructured without repro-
`gramming at the source or object code level.
`SUMMARY OF THE INVENTION
`
`It is an objective of the present invention to provide a
`database system which is capable of storing voluminous
`amounts of information, sifting through the information at
`high speed, and is at the same time easily expandable or
`restructurable to take on new forms of entities and relation-
`
`ships.
`In accordance with a first aspect of the invention, an entity
`definition table (ENT.DEF) is defined within the memory
`means of a computer system to store the name of an allowed
`entity typ