`
(12) United States Patent
Mourra et al.

(10) Patent No.: US 7,873,625 B2
(45) Date of Patent: Jan. 18, 2011

(54) FILE INDEXING FRAMEWORK AND SYMBOLIC NAME MAINTENANCE FRAMEWORK

(75) Inventors: John Mourra, Toronto (CA); Vladimir Klicnik, Oshawa (CA); Lok Tin Loi, Toronto (CA); Hiroshi Tsuji, Stouffville (CA)

(73) Assignee: International Business Machines Corporation, Armonk, NY (US)

( * ) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 304 days.

(21) Appl. No.: 11/532,804

(22) Filed: Sep. 18, 2006

(65) Prior Publication Data
US 2008/0071805 A1    Mar. 20, 2008

(51) Int. Cl.
G06F 7/00 (2006.01)
G06F 17/30 (2006.01)
(52) U.S. Cl. ....................................... 707/711; 707/741
(58) Field of Classification Search ....................... None
See application file for complete search history.
`
(56) References Cited

U.S. PATENT DOCUMENTS

4,893,232 A *    1/1990  Shimaoka et al. ............. 707/1
5,481,683 A      1/1996  Karim
5,787,435 A *    7/1998  Burrows ..................... 707/102
5,806,058 A *    9/1998  Mori et al. ................. 707/2
5,933,820 A *    8/1999  Beier et al. ................ 707/1
6,366,956 B1     4/2002  Krishnan
6,460,178 B1    10/2002  Chan et al.
6,640,225 B1 *  10/2003  Takishita et al. ............ 707/5
6,654,758 B1    11/2003  Teague
6,741,988 B1     5/2004  Wakefield et al.
6,760,694 B2     7/2004  Al-Kazily et al.

(Continued)

FOREIGN PATENT DOCUMENTS

EP 0405829    1/1999

OTHER PUBLICATIONS

Kendrick et al., “Method for Supporting Non-Text Objects in an
Index Structure that Originally Supported Only Text Records”,
Technical Disclosure Bulletin Jan. 1988, pp. 155-162.

(Continued)
`
Primary Examiner: Pierre M Vital
Assistant Examiner: Augustine Obisesan
(74) Attorney, Agent, or Firm: Scott D. Paul, Esq.; Carey
Rodrigues Greenberg & Paul LLP
`
(57) ABSTRACT
`
`The present invention provides a file management system that
`includes a file indexing framework that allows third parties to
`contribute index handlers that are responsible for populating
`index entries for the artifacts they own and/or generate. The
`framework manages the creation, maintenance, and update of
`the index, and calls the index handlers at appropriate times so
`they can parse files that they understand for values that need
`to be stored in the index. The framework also provides APIs
`for querying the standardized fields of the index, so applica-
`tions can search for standard types of data contributed for any
`of the indexed files. The present invention also provides a
`mechanism to keep track of symbolic name associations for
`every file/entity in the system. Specifically, the present inven-
`tion provides a session-based and transient shadow table of
`symbolic names previously used by the files (even beyond the
`lifetime of the files themselves).
`
`19 Claims, 3 Drawing Sheets
`
`
`
`
[Representative drawing: FIG. 1, file management system]
`
Adobe Inc. v. Express Mobile, Inc., IPR2021-XXXXX
Exhibit 1028
U.S. Pat. 9,471,287
`
`
`
`
`
U.S. PATENT DOCUMENTS

7,783,615 B1 *       8/2010  Compton et al. ............ 707/694
2003/0101171 A1 *    5/2003  Miyamoto et al. ........... 707/3
2003/0144990 A1 *    7/2003  Benelisha et al. .......... 707/1
2003/0182297 A1 *    9/2003  Murakami et al. ........... 707/100
2004/0117362 A1      6/2004  Nunez
2004/0220919 A1 *   11/2004  Kobayashi ................. 707/3
2004/0225865 A1     11/2004  Cox et al.
2005/0010452 A1      1/2005  Lusen
2005/0050537 A1      3/2005  Thompson et al.
2005/0086269 A1      4/2005  Chen et al.
2005/0097452 A1 *    5/2005  Eross ..................... 715/513
2005/0203957 A1      9/2005  Wang et al.
2006/0041606 A1 *    2/2006  Sawdon .................... 707/205
2006/0112141 A1      5/2006  Morris
2006/0136391 A1      6/2006  Morris
2006/0206535 A1 *    9/2006  Ogawa et al. .............. 707/200
2007/0016546 A1 *    1/2007  De Vorchik et al. ......... 707/1

OTHER PUBLICATIONS

Harold Henke, “Methodology for Searching Adobe Acrobat Portable
Data Format Files Based on Content Relevance”, Research Disclosure
No. 432, Article 141, p. 756, Apr. 2000.
Hedgemon et al., “Two-Level Index Conversion of Text to Composite
Document”, Technical Disclosure Bulletin Aug. 1987, pp. 990-993.

* cited by examiner
`
`
`
[FIG. 1 (Sheet 1 of 3): file management system]

[FIG. 2 (Sheet 2 of 3): more detailed depiction of the file indexing framework of FIG. 1]

[FIG. 3 (Sheet 3 of 3): illustrative symbolic name scenario]
`
`
`
`FILE INDEXING FRAMEWORK AND
`SYMBOLIC NAME MAINTENANCE
`FRAMEWORK
`
`BACKGROUND OF THE INVENTION
`
`1. Field of the Invention
`
`In general, the present invention relates to file manage-
`ment. Specifically, the present invention provides a system
`for indexing files and for tracking symbolic names associated
`with files.
`2. Related Art
`
`Many computer applications persist information by storing
`data in files. For example, a software tool that allows a user to
`create different kinds of data objects may store representa-
`tions of each into a separate file. However, an application
`trying to find specific information that has been stored in one
`or more files faces two problems: (1) searching for data in
`files becomes time-consuming as the number and size of files
`increases; and (2) only files with well-known formats can be
searched. The first problem can be addressed for the most part
`by indexing the contents of files: parsing each file once to
`store the relevant contents of the file into an index, and after-
`wards searching the index (which presumably provides effi-
`cient search characteristics) instead of each file individually.
`However, this technique also can only be used for files with
`well-known formats. Without a well-known format, the con-
`tents of a file cannot be parsed. A significant hurdle thus faces
`applications that wish to search or index arbitrary files with
`arbitrary formats. A trend in software is to support pluggable
extensions, which allows clients to provide their own specialized
behavior to the software. This trend leads to the use of file
formats understood only by a specific extension, or the introduction
of extension-specific data into extensible file formats.
Unfortunately, no existing system is capable of treating
extension-specific data on par with the file data it is
hard-coded to understand.
`
`Additional problems exist in the current art with respect to
`symbolic names for files. Specifically, consider an interde-
`pendent environment, where references are symbolic and not
`direct, examples of which may include an XML-based sys-
`tem. In such an environment, references between entities are
`described through symbolic names, where one entity refer-
`ences a symbolic name while another entity associates itself
`with the same symbolic name. At runtime, the referenced
`symbolic name is resolved to the respective entity. The advan-
`tages of such an environment are clear and documented,
`ranging from flexibility to pluggability. Nonetheless, one
`common problem encountered in such a system is the issue of
dangling references, where a reference is unresolvable due to
`a missing dependency. This may occur when an entity being
`referenced symbolically is deleted or its associated symbolic
`name is modified. When the entity being referenced is
`deleted, its association with the symbolic name is deleted
`with it. Even if the change was simply a modification of the
`associated symbolic name, then the association with the origi-
`nal symbolic name is deleted. Accordingly, the referencing
`entity is no longer capable of associating its dangling refer-
`ence with the specific entity that was deleted. This means that
`there is no way to identify to a user of such an environment the
reason that caused the dangling reference (e.g., the name of the
`specific file that was deleted). Looking at the same issue from
`a different perspective, it also means that when an entity’s
`association with a symbolic name is deleted (i.e. the entity is
`either deleted or simply associated with a different symbolic
`name), there is no way to find all referencing entities that are
`affected after the fact (since they only reference the deleted
`symbolic name, and not the entity itself).
`
`SUMMARY OF THE INVENTION
`
`The invention introduces an indexing and search frame-
`work that allows third parties to contribute indexing exten-
`sions (referred to herein as index handlers) that are respon-
`sible for populating index entries for the artifacts they own
`and/or generate. Each entry in the index represents a file and
`its contents. The index entries include a standardized list of
`fields, such as file name and defined elements. These stan-
`dardized fields allow similar data to be stored in a consistent
`
`way for each file, which therefore essentially allows file con-
`tent searches (via searching of the index) to be done in a
`consistent way for all files, regardless of the format of those
`files. The framework manages the creation, maintenance, and
`update of the index, and calls the index handlers at appropri-
`ate times so they can parse files that they understand for
`values that need to be stored in the index. The framework
`
`provides APIs for querying the standardized fields of the
`index, so applications can search for standard types of data
`contributed for any of the indexed files.
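By way of illustration only, the standardized index entry described above can be sketched in Python as follows. The class name `IndexEntry`, its field names, and the helper `files_defining` are hypothetical and chosen for this sketch; the description itself names only "file name" and "defined elements" as example fields.

```python
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    """One index entry per file, holding the same fields for every format."""
    file_name: str
    defined_elements: list[str] = field(default_factory=list)


def files_defining(index: dict[str, IndexEntry], element: str) -> list[str]:
    """Return the names of all indexed files that define `element`."""
    return sorted(
        entry.file_name
        for entry in index.values()
        if element in entry.defined_elements
    )


# Entries for two files with unrelated formats share the same fields,
# so a single query covers both.
index = {
    "orders.wsdl": IndexEntry("orders.wsdl", ["OrderService", "OrderPort"]),
    "billing.custom": IndexEntry("billing.custom", ["OrderService"]),
}
matches = files_defining(index, "OrderService")
```

Because the query touches only the standardized fields, it does not need to know how either file was parsed.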
`The present invention also provides a mechanism to keep
`track of symbolic name associations for every file/entity in
`the system. Specifically, the present invention provides a
`session-based and transient shadow table of symbolic names
`previously used by the files (even beyond the lifetime of the
`files themselves). By maintaining symbolic name association
`information beyond the lifetime of the entity itself or its
`association, the door is opened for other features that can use
`this information to track referencing entities and validate or
even repair them as necessary (even if the referenced entity no
`longer exists or is no longer associated with the specific
`symbolic name).
`In one aspect of the present invention, a method for index-
`ing a file is provided. The method comprises: registering an
`index handler corresponding to a particular file type; calling
`the index handler when a change to a file having the particular
`file type is detected; parsing the file with the index handler to
`obtain index information; and writing the index information
`to an index entry corresponding to the file using an index
`writer.
`
`In another aspect of the present invention, the method
`further comprises: passing the index handler the file after the
`change is detected; and determining that the index handler is
`capable of parsing the file.
`In another aspect of the present invention, the method
`further comprises: responsive to determining that the index
`handler is capable of parsing the file, calling the index han-
`dler; passing the index handler an index writer object with the
`file; and the index handler calling the index writer using the
`index writer object to store the index information.
`In another aspect of the present invention, the method
`further comprises: receiving a query for the file; and search-
`ing the index entry based on the query.
`In another aspect of the present invention, a system for
`indexing a file is provided. The system comprises: a handler
`registration system for registering an index handler corre-
`sponding to a particular file type; a change notification system
`for receiving a notification of a change to a file having the
`particular file type; a handler interface system for calling the
`index handler based on the notification, wherein the index
`handler parses the file to obtain index information; and an
`index writer that is called by the index handler for writing the
`index information to an index entry.
`
`In another aspect of the present invention, the handler
`interface system passes the index handler an index writer
`object.
`In another aspect of the present invention, the system fur-
`ther comprises a query processing system for receiving a
`query on the file, and for searching the index entry based on
`the query.
`In another aspect of the present invention, the system fur-
`ther comprises an index containing the index entry, wherein
`the index entry contains a standardized list of fields that
`contain the file information.
`
`In another aspect of the present invention, a program prod-
`uct stored on at least one computer readable medium for
`indexing a file is provided. The at least one computer readable
`medium comprises program code for causing a computer
`system to perform the following: register an index handler
`corresponding to a particular file type; call the index handler
`when a change to a file having the particular file type is
`detected; parse the file with the index handler to obtain index
`information; and write the index information to an index entry
`corresponding to the file using an index writer.
`In another aspect of the present invention, the at least one
`computer readable medium further comprises program code
`for causing the computer system to perform the following:
`communicate with the index handler; and pass the index
`handler the file after the change is detected.
`In another aspect of the present invention, the at least one
`computer readable medium further comprises program code
`for causing the computer system to perform the following:
`call the index handler if the index handler is capable of pars-
`ing the file; and pass the index handler an index writer object
`with the file.
`
`In another aspect of the present invention, the at least one
`computer readable medium further comprises program code
`for causing the computer system to perform the following:
`receive a query for the file; and search the index entry based
`on the query.
`In another aspect of the present invention, a method for
deploying a system for indexing a file is provided. The method
`comprises: providing a computer infrastructure being oper-
`able to: register an index handler corresponding to a particular
`file type; call the index handler when a change to a file having
`the particular file type is detected; parse the file with the index
`handler to obtain index information; and write the index
`information to an index entry corresponding to the file using
`an index writer.
`
`In another aspect of the present invention, a method for
`maintaining symbolic name integrity in a dynamic environ-
`ment is provided. The method comprises: detecting a new
`symbolic name associated with a file; populating a primary
`table with the association of the new symbolic name to the
`file; populating a shadow table with an association of a pre-
`vious symbolic name to the file; and using the shadow table to
`resolve a dangling reference to the file.
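By way of illustration only, the primary table and shadow table of this aspect might be sketched in Python as two in-memory dictionaries. The class and method names are hypothetical, and the sketch assumes that each file carries a single symbolic name at a time, which the description does not require.

```python
class SymbolicNameTables:
    """Primary table of current names plus a shadow table of previous names."""

    def __init__(self) -> None:
        self.primary: dict[str, str] = {}  # symbolic name -> file
        self.shadow: dict[str, str] = {}   # previous symbolic name -> file

    def associate(self, name: str, file: str) -> None:
        """Record that `file` is now associated with symbolic name `name`."""
        for old_name, owner in list(self.primary.items()):
            if owner == file and old_name != name:
                # Move the superseded association into the shadow table.
                self.shadow[old_name] = self.primary.pop(old_name)
        self.shadow.pop(name, None)  # the name is live again
        self.primary[name] = file

    def remove_file(self, file: str) -> None:
        """Handle deletion of `file`: its names survive in the shadow table."""
        for name, owner in list(self.primary.items()):
            if owner == file:
                self.shadow[name] = self.primary.pop(name)

    def explain(self, name: str) -> str:
        """Describe how a reference to `name` resolves."""
        if name in self.primary:
            return f"resolved to {self.primary[name]}"
        if name in self.shadow:
            return f"dangling: previously used by {self.shadow[name]}"
        return "dangling: never seen"


tables = SymbolicNameTables()
tables.associate("CustomerType", "customer.xsd")
tables.associate("ClientType", "customer.xsd")  # symbolic name is modified
tables.remove_file("customer.xsd")              # file is later deleted
```

After both changes, a reference to either name no longer resolves through the primary table, but the shadow table still identifies the file that used to own it.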
`In another aspect of the present invention, the method for
`maintaining symbolic name integrity in the dynamic environ-
`ment further comprises using the shadow table to resolve a
`dangling reference to the file.
`In another aspect of the present invention, the method for
`maintaining symbolic name integrity in the dynamic environ-
`ment further comprises deleting the association of the previ-
`ous symbolic name to the file from the shadow table after a
`predetermined time period.
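By way of illustration only, deleting shadow table associations after a predetermined time period might be sketched as follows. The function name `prune_shadow`, the timestamp layout, and the use of seconds are assumptions of this sketch; the current time is passed in explicitly so the behavior is easy to follow.

```python
def prune_shadow(
    shadow: dict[str, tuple[str, float]],
    now: float,
    max_age_seconds: float,
) -> dict[str, tuple[str, float]]:
    """Keep only shadow entries recorded within the last `max_age_seconds`."""
    return {
        name: (file, recorded_at)
        for name, (file, recorded_at) in shadow.items()
        if now - recorded_at <= max_age_seconds
    }


# previous symbolic name -> (file, time the association entered the shadow table)
shadow = {
    "CustomerType": ("customer.xsd", 100.0),
    "InvoiceType": ("invoice.xsd", 950.0),
}
remaining = prune_shadow(shadow, now=1000.0, max_age_seconds=600.0)
```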
`In another aspect of the present invention, a system for
`maintaining symbolic name integrity in a dynamic environ-
`ment is provided. The system comprises: a name detection
`system for detecting a new symbolic name associated with a
`file; a table population system for populating a primary table
`with the association of the new symbolic name to the file, and
`for populating a shadow table with an association of a previ-
`ous symbolic name to the file; and a reference resolution
`system for using the shadow table to resolve a dangling ref-
`erence to the file.
`
`In another aspect of the present invention, the system for
`maintaining symbolic name integrity in a dynamic environ-
`ment further comprises a reference resolution system for
`using the shadow table to resolve a dangling reference to the
`file.
`
`In another aspect of the present invention, the table popu-
`lation system further deletes the association of the previous
`symbolic name to the file from the shadow table after a
`predetermined time period.
`In another aspect of the present invention, a program prod-
`uct stored on at least one computer readable medium for
`maintaining symbolic name integrity in a dynamic environ-
`ment is provided. The at least one computer readable medium
`comprises program code for causing a computer system to
`perform the following: detect a new symbolic name associ-
`ated with a file; populate a primary table with the association
`of the new symbolic name to the file; and populate a shadow
`table with an association of a previous symbolic name to the
`file.
`
`In another aspect of the present invention, the at least one
computer readable medium further comprises program code
`for causing the computer system to perform the following:
`use the shadow table to resolve a dangling reference to the
`file.
`
`In another aspect of the present invention, the at least one
computer readable medium further comprises program code
`for causing the computer system to perform the following:
`delete the association of the previous symbolic name to the
`file from the shadow table after a predetermined time period.
`In another aspect of the present invention, a method for
`deploying a system for maintaining symbolic name integrity
`in a dynamic environment is provided. The method com-
`prises: providing a computer infrastructure being operable to:
`detect a new symbolic name associated with a file; populate a
`primary table with the association of the new symbolic name
`to the file; and populate a shadow table with an association of
`a previous symbolic name to the file.
`In another aspect of the present invention, the computer
`infrastructure is further operable to use the shadow table to
`resolve a dangling reference to the file.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
These and other features of this invention will be more
readily understood from the following detailed description of
the various aspects of the invention taken in conjunction with
the accompanying drawings in which:
`FIG. 1 depicts a file management system according to an
`aspect of the present invention.
FIG. 2 depicts the file indexing framework of FIG. 1 in more
detail.
`
`FIG. 3 depicts an illustrative symbolic name scenario
`according to an aspect of the present invention.
`The drawings are not necessarily to scale. The drawings are
`merely schematic representations, not intended to portray
`specific parameters of the invention. The drawings are
intended to depict only typical embodiments of the invention,
`and therefore should not be considered as limiting the scope
`of the invention. In the drawings, like numbering represents
`like elements.
`
`DETAILED DESCRIPTION OF THE INVENTION
`
`Referring now to FIG. 1, a file management system 10
`according to the present invention is shown. As depicted,
`system 10 includes computer system 14 deployed within a
`computer infrastructure/environment 12. This is intended to
`demonstrate, among other things, that some or all of the
`teachings of the present invention could be implemented
`within a network environment (e.g., the Internet, a wide area
network (WAN), a local area network (LAN), a virtual private
network (VPN), etc.), or on a stand-alone computer system.
In the case of the former, communication throughout the
network can occur via any combination of various types of
communications links. For example, the communication
links can comprise addressable connections that may utilize
any combination of wired and/or wireless transmission methods.
Where communications occur via the Internet, connectivity
could be provided by conventional TCP/IP sockets-based
protocol, and an Internet service provider could be used
to establish connectivity to the Internet. Still yet, computer
`infrastructure 12 is intended to demonstrate that some or all of
`
`the components of system 10 could be deployed, managed,
`serviced, etc. by a service provider who offers to manage files
`according to the present invention.
`As shown, computer system 14 includes a processing unit
`16, a memory 18, a bus 20, and input/output (I/O) interfaces
`22. Further, computer system 14 is shown in communication
`with external I/O devices/resources 24 and index 26. In gen-
`eral, processing unit 16 executes computer program code,
`such as applications 50, file indexing framework 30, and
`symbolic name system 42, which are stored in memory 18.
`While executing computer program code, processing unit 16
`can read and/or write data to/from memory 18, index 26,
`and/or I/O interfaces 22. Bus 20 provides a communication
`link between each of the components in computer system 14.
External devices 24 can comprise any devices (e.g., keyboard,
`pointing device, display, etc.) that enable a user to interact
`with computer system 14 and/or any devices (e.g., network
`card, modem, etc.) that enable computer system 14 to com-
`municate with one or more other devices.
`
`Computer infrastructure 12 is only illustrative of various
`types of computer infrastructures for implementing the inven-
`tion. For example, in one embodiment, computer infrastruc-
`ture 12 comprises two or more devices (e.g., a server cluster)
`that communicate over a network to perform the various
processes of the invention. Moreover, computer system 14 is
only representative of various possible computer systems that
`can include numerous combinations of hardware. To this
`
`extent, in other embodiments, computer system 14 can com-
`prise any specific purpose article of manufacture comprising
`hardware and/or computer program code for performing spe-
`cific functions, any article of manufacture that comprises a
`combination of specific purpose and general purpose hard-
`ware/software, or the like. In each case, the program code and
`hardware can be created using standard programming and
`engineering techniques, respectively. Moreover, processing
`unit 16 may comprise a single processing unit, or be distrib-
`uted across one or more processing units in one or more
`locations, e.g., on a client and server. Similarly, memory 18
`and/or index 26 can comprise any combination of various
`types of data storage and/or transmission media that reside at
`one or more physical locations. Further, I/O interfaces 22 can
`comprise any system for exchanging information with one or
`more external devices 24. Still further, it is understood that
`one or more additional components (e.g., system software,
`math co-processing unit, etc.) not shown in FIG. 1 can be
`included in computer system 14. However, if computer sys-
`tem 14 comprises a handheld device or the like, it is under-
`stood that one or more external devices 24 (e.g., a display)
`and/or index 26 could be contained within computer system
`14, not externally as shown.
`Index 26 can be any type of system (e.g., a database)
capable of providing storage for file information and/or tables
54 and 56 of symbolic names under the present invention. To
this extent, index 26 could include one or more storage
`devices, such as a magnetic disk drive or an optical disk drive.
`In another embodiment, index 26 includes data distributed
`across, for example, a local area network (LAN), wide area
`network (WAN) or a storage area network (SAN) (not
`shown). In a typical embodiment, index 26 includes one or
`more index entries that each correspond to a particular file.
`Along these lines, each index entry includes a standardized
list of fields that contains file information as extracted and
indexed under the present invention.
`Shown in memory 18 of computer system 14 (among other
`systems and tables) are file indexing framework 30, symbolic
`name system 42, application(s) 50 and index handler(s) 52. It
`should be understood that file indexing framework 30 and
`symbolic name system 42 could be provided independent of
`one another. For example, they do not need to both be pro-
`vided on a single computer system 14 within the scope of the
`present invention. In addition, it should be understood that file
`indexing framework 30 and symbolic name system 42 could
`be realized as multiple computer programs (as shown), or as
`a single computer program (not shown). It should also be
`understood that the various systems and their sub-systems of
`FIG. 1 are shown as such for illustrative purposes only and
`that the same functionality could be implemented with a
`different configuration of systems and sub-systems. In any
`event, the functions of file indexing framework 30 will be
`described first in conjunction with FIGS. 1 and 2.
`
`A. File Indexing Framework
`As depicted in FIG. 1, file indexing framework 30 includes
`handler registration system 32, change notification system
`34, handler interface system 36, index writer 38, and query
`processing system 40. Among other things, file indexing
framework 30 provides the following capabilities: (1) a registration
mechanism for index handlers; (2) infrastructure to
initialize the contents of the index and update its contents as
files are added, changed, and deleted; (3) a delegation mechanism
for calling index handlers to parse files they understand
for relevant content (this delegation provides the support for
indexing arbitrary files with arbitrary formats); (4) establishment
of standardized fields in the index entry for a file; (5) a
set of index writer APIs to be called by index handlers to store
`standardized index values for a file; and (6) another set of
`APIs 60 shown in FIG. 2 (provided as part of query process-
`ing system 40) to be called by applications 50 to query the
`standardized index values. (The term “API” is an acronym for
`“application programming interface”, which enables a com-
`puter program to access a set of functions (typically provided
`by another application or library) without a detailed under-
`standing of the internal workings of the functions being
`accessed.)
`The index handlers are responsible for identifying files
they understand, parsing the contents of those files, and identifying
appropriate data to be stored in the index entries for
`those files. Some advantages over a non-extensible file index-
`ing system include: (1) support for new file formats can be
`added by the provider of the file format, without any changes
`to the indexing framework or applications querying the index;
(2) files storing data formats with built-in extensibility
`mechanisms, (e.g., WSDL and SCDL) can be indexed with-
`out changes to file indexing framework 30, the code that
indexes the base elements of the data standard, or applications
`querying the index (support for a particular extension can be
`added by the extension provider in the form of an index
`handler that only processes that extension); and (3) applica-
`tions 50 can essentially search the contents of arbitrary files
`with arbitrary formats in a standard manner, by searching
`standardized fields of index 26, without knowledge of file
`formats or the runtime extensions for which they may be
`targeted.
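By way of illustration only, the second advantage (an extension provider contributing a handler that processes only its own extension data within an extensible format) might be sketched as follows. The namespaces, the sample document, and the function names are hypothetical; the sketch uses Python's standard `xml.etree.ElementTree` module rather than any WSDL or SCDL tooling.

```python
import xml.etree.ElementTree as ET

BASE_NS = "http://example.org/base"      # hypothetical base format namespace
EXT_NS = "http://example.org/extension"  # hypothetical extension namespace

DOCUMENT = f"""
<definitions xmlns="{BASE_NS}" xmlns:ext="{EXT_NS}">
  <service name="Orders">
    <ext:binding name="OrdersQueue"/>
  </service>
</definitions>
"""


def names_in_namespace(root: ET.Element, namespace: str) -> list[str]:
    """Collect the `name` attribute of every element in `namespace`."""
    return [
        el.attrib["name"]
        for el in root.iter()
        if el.tag.startswith(f"{{{namespace}}}") and "name" in el.attrib
    ]


def base_handler(root: ET.Element, entry: dict[str, list[str]]) -> None:
    """Indexes only the base elements of the format."""
    entry.setdefault("defined_elements", []).extend(names_in_namespace(root, BASE_NS))


def extension_handler(root: ET.Element, entry: dict[str, list[str]]) -> None:
    """Indexes only the elements contributed by the extension."""
    entry.setdefault("defined_elements", []).extend(names_in_namespace(root, EXT_NS))


root = ET.fromstring(DOCUMENT.strip())
entry: dict[str, list[str]] = {}
for handler in (base_handler, extension_handler):
    handler(root, entry)
```

Neither handler needs to change when the other is added, and both write into the same standardized field of the one entry for the file.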
`A general description of the file indexing framework 30
`will now be given in conjunction with FIG. 1. Thereafter, a
`more specific illustrative example will be given in FIG. 2. In
`general, handler registration system 32 allows third parties to
`register index handlers 52. Each index handler 52 is typically
`associated with a particular file format. As changes to a file
`having a particular file format occur, a notification will be
`received by change notification system 34. Under the present
`invention such a notification can come directly from applica-
`tions 50 or from an operating system level. In any event, when
`such a notification is received, handler interface system 36
`will call the associated index handler 52 to determine whether
`
`index handler 52 is capable of parsing the affected file. If so,
`handler interface system 36 will pass index handler 52 the file
`along with an index writer object. Upon receipt, index handler
`52 will parse the file to obtain index information. Such infor-
`mation can be any type of information that will identify the
`file and its contents. Once the index information is deter-
`mined/extracted, index handler 52 will call index writer 38 to
`store the index information. Specifically, index writer 38 will
`write the index information to an index entry corresponding
`to the file. As indicated above, each file typically has its own
`index entry in index 26. Each entry includes a standardized
`list of fields such as file name and elements defined in the file.
`
`Should a user later desire to search/query index 26, the user
`can submit a query that will be received and processed by
`query processing system 40. Specifically, query processing
`system 40 includes a set of query APIs 60 (FIG. 2) that is
`adapted to receive queries from applications 50.
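By way of illustration only, the flow just described (registration, change notification, the capability check, delegation with an index writer object, and querying) might be sketched in Python as follows. Every class, method, and field name here is hypothetical, and the `.kv` file format is invented for the sketch.

```python
from typing import Protocol


class IndexWriter:
    """Stores standardized field values in the index entry for one file."""

    def __init__(self, index: dict[str, dict[str, list[str]]], file_name: str) -> None:
        self._entry = index.setdefault(file_name, {})
        self._entry.clear()  # re-indexing replaces the previous entry

    def add(self, field_name: str, value: str) -> None:
        self._entry.setdefault(field_name, []).append(value)


class IndexHandler(Protocol):
    def can_handle(self, file_name: str) -> bool: ...
    def index(self, file_name: str, content: str, writer: IndexWriter) -> None: ...


class KeyValueHandler:
    """Toy handler for '.kv' files whose lines look like 'name = value'."""

    def can_handle(self, file_name: str) -> bool:
        return file_name.endswith(".kv")

    def index(self, file_name: str, content: str, writer: IndexWriter) -> None:
        for line in content.splitlines():
            if "=" in line:
                writer.add("defined_elements", line.split("=", 1)[0].strip())


class Framework:
    def __init__(self) -> None:
        self.handlers: list[IndexHandler] = []
        self.index: dict[str, dict[str, list[str]]] = {}

    def register(self, handler: IndexHandler) -> None:
        self.handlers.append(handler)

    def on_file_changed(self, file_name: str, content: str) -> None:
        """Change notification: delegate to every handler that accepts the file."""
        capable = [h for h in self.handlers if h.can_handle(file_name)]
        if not capable:
            return  # no registered handler understands this file
        writer = IndexWriter(self.index, file_name)
        for handler in capable:
            handler.index(file_name, content, writer)

    def query(self, field_name: str, value: str) -> list[str]:
        """Query API: files whose entry holds `value` in `field_name`."""
        return sorted(f for f, e in self.index.items() if value in e.get(field_name, []))


framework = Framework()
framework.register(KeyValueHandler())
framework.on_file_changed("settings.kv", "timeout = 30\nretries = 5")
framework.on_file_changed("notes.txt", "timeout = 1")  # ignored: no handler
hits = framework.query("defined_elements", "timeout")
```

The framework never parses file contents itself; it only decides which handlers to call and hands each one a writer bound to the entry for the changed file.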
`Referring now to FIG. 2, a more specific example of these
`functions is shown. The example shown in FIG. 2 is an index-
`ing and search framework used to implement WebSphere
Integration Developer (WID) 6.0, which is commercially
`available from IBM Corp. of Armonk, NY. In general, tool-
`ing in WID requires the following information, which can be
`scattered across all files generated and referenced by WID:
`(1) all element definitions of a certain type; (2) all files that
`reference a specified file; (3) all elements that reference a
`specified element; (4) all elements referenced by a specified
`element; and (5) all files that define a specified namespace.
`Implementing these searches without the use of an index 26 is
`not a viable option from a performance standpoint, because of
`the potential number of files that would need to be searched.
`More importantly, WID is based on the Service Component
Architecture (SCA), which is highly pluggable. SCA component,
import, and export objects (which are saved as files)
all contain extensible data that can be defined by a runtime
`extension. Similarly, the implementation files that each com-
`ponent can reference in its extensible data are also specific to
`a runtime extension. Correctly indexing new component,
`import or export types introduced by new runtime extensions
`would not be possible without an extensible indexing mecha-
`nism. The mechanism in this case is file indexing framework
`30.
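By way of illustration only, the five kinds of information listed above might be answered from index entries as follows. The `Entry` layout, the sample file names, and the function names are hypothetical; they are not the WID index schema, which this description does not spell out.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    namespace: str = ""
    definitions: dict[str, str] = field(default_factory=dict)        # element -> type
    file_refs: set[str] = field(default_factory=set)                 # files referenced
    element_refs: dict[str, set[str]] = field(default_factory=dict)  # element -> elements it references


index = {
    "types.xsd": Entry("http://example.org/types", {"Address": "type"}),
    "customer.xsd": Entry(
        "http://example.org/customer",
        {"Customer": "type"},
        {"types.xsd"},
        {"Customer": {"Address"}},
    ),
}


def definitions_of_type(kind: str) -> list[str]:            # item (1)
    return sorted(n for e in index.values() for n, t in e.definitions.items() if t == kind)


def files_referencing(file_name: str) -> list[str]:         # item (2)
    return sorted(f for f, e in index.items() if file_name in e.file_refs)


def elements_referencing(element: str) -> list[str]:        # item (3)
    return sorted(
        src for e in index.values() for src, dst in e.element_refs.items() if element in dst
    )


def elements_referenced_by(element: str) -> list[str]:      # item (4)
    return sorted(d for e in index.values() for d in e.element_refs.get(element, ()))


def files_defining_namespace(namespace: str) -> list[str]:  # item (5)
    return sorted(f for f, e in index.items() if e.namespace == namespace)
```

Each query reads only the index, so its cost depends on the number of entries rather than on the size or format of the underlying files.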
`
`As mentioned above, file indexing framework 30 provides
`a registration mechanism that allows third parties to define
`index handlers 52A-D. An index handler 52A-D is respon-
`
`sible for generating index information for a file, and is a class
`that implements a Java interface provided by file indexing
`framework 30. An index handler 52A-D is generally specific
to a single file type and is provided by the domain owner of the
file type. An index handler 52A-D must understand how to
parse a file for meaning, to identify element definitions and
`references that are relevant to the WID tooling that will be
`querying index 26.
`When creating an index entry for a file, file indexing frame-
`work 30 calls each index handler 52A-D, passing it the file
`being i