`(12)
`(10) Patent No.:
`US 7,873,625 B2
`Mourraetal.
`(45) Date of Patent:
`Jan. 18, 2011
`
`
`US007873625B2
`
`(54) FILE INDEXING FRAMEWORK AND
`SYMBOLIC NAME MAINTENANCE
`FRAMEWORK
`
`(75)
`
`Inventors: John Mourra, Toronto (CA); Vladimir
`Klienik, Oshawa (CA); Lok Tin Loi,
`Toronto (CA): Iliroshi Tsuji, Stouffville
`(CA)
`
(73) Assignee: International Business Machines Corporation, Armonk, NY (US)
`
(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 304 days.
`
(21) Appl. No.: 11/532,804
(22) Filed: Sep. 18, 2006
`
`(65)
`
`(51)
`
`Prior Publication Data
`US 2008/0071805 Al
`Mar.20, 2008
`Int. Ch
`(2006.01)
`GO6F 7/00
`(2006.01)
`G06F 170
`707/711: 707/741
`:
`(32) US.CI
`,
`~ eeDerrensteerer ss tees
`(58) Field of Classification Search ...............sere None
`See application file for complete searchhistory.
`Primary Examiner—Pierre M Vital
`Assistant Examiner—Augustine Obisesan
`(74) Attorney, Agent, or Firm—Scott D. Paul, Esq.; Carey
`Rodrigues Greenberg & Paul LLP
`
(57) ABSTRACT
The present invention provides a file management system that includes a file indexing framework that allows third parties to contribute index handlers that are responsible for populating index entries for the artifacts they own and/or generate. The framework manages the creation, maintenance, and update of the index, and calls the index handlers at appropriate times so they can parse files that they understand for values that need to be stored in the index. The framework also provides APIs for querying the standardized fields of the index, so applications can search for standard types of data contributed for any of the indexed files. The present invention also provides a mechanism to keep track of symbolic name associations for every file/entity in the system. Specifically, the present invention provides a session-based and transient shadow table of symbolic names previously used by the files (even beyond the lifetime of the files themselves).
`
`19 Claims, 3 Drawing Sheets
`
`
`
`
`
[Front-page drawing (FIG. 1): file management system with applications, external devices, a file indexing framework (handler registration system, change notification system, handler interface system, index handlers, index writer, query processing interface), and a symbolic name system (name detection system, table population system, reference resolution system, primary table, shadow table)]
`
`Page 1 of 12
`
GOOGLE EXHIBIT 1017
`
`
`
`
`
`
`
U.S. Patent    Jan. 18, 2011    Sheet 1 of 3    US 7,873,625 B2

FIG. 1: computer system with processing unit, memory, and I/O interfaces; memory holds applications, the file indexing framework (handler registration system, change notification system, handler interface system, index writer, query processing interfaces), index handlers, and the symbolic name system (name detection system, table population system, reference resolution system); index with primary and shadow tables; external devices.
`
`
`
`
`
`
Sheet 2 of 3

FIG. 2: file indexing framework with change notification, index writer, four index handlers (52A-52D), and applications.
`
`
`
`
`
`
`
Sheet 3 of 3

FIG. 3: illustrative symbolic name scenario.
`
`
`
`
FILE INDEXING FRAMEWORK AND
SYMBOLIC NAME MAINTENANCE
FRAMEWORK

BACKGROUND OF THE INVENTION

1. Field of the Invention

In general, the present invention relates to file management. Specifically, the present invention provides a system for indexing files and for tracking symbolic names associated with files.

2. Related Art

Many computer applications persist information by storing data in files. For example, a software tool that allows a user to create different kinds of data objects may store a representation of each in a separate file. However, an application trying to find specific information that has been stored in one or more files faces two problems: (1) searching for data in files becomes time-consuming as the number and size of files increase; and (2) only files with well-known formats can be searched. The first problem can be addressed for the most part by indexing the contents of files: parsing each file once to store its relevant contents in an index, and afterwards searching the index (which presumably provides efficient search characteristics) instead of each file individually. However, this technique also can only be used for files with well-known formats. Without a well-known format, the contents of a file cannot be parsed. A significant hurdle thus faces applications that wish to search or index arbitrary files with arbitrary formats. A trend in software is to support pluggable extensions, which allow clients to provide their own specialized behavior to the software. This trend leads to the use of file formats understood only by a specific extension, or to the introduction of extension-specific data into extensible file formats. Unfortunately, none of the existing systems are capable of treating extension-specific data on par with the file data they are hard-coded to understand.
Additional problems exist in the current art with respect to symbolic names for files. Specifically, consider an interdependent environment, where references are symbolic and not direct, examples of which may include an XML-based system. In such an environment, references between entities are described through symbolic names, where one entity references a symbolic name while another entity associates itself with the same symbolic name. At runtime, the referenced symbolic name is resolved to the respective entity. The advantages of such an environment are clear and well documented, ranging from flexibility to pluggability. Nonetheless, one common problem encountered in such a system is the issue of dangling references, where a reference is unresolvable due to a missing dependency. This may occur when an entity being referenced symbolically is deleted or its associated symbolic name is modified. When the entity being referenced is deleted, its association with the symbolic name is deleted with it. Even if the change was simply a modification of the associated symbolic name, the association with the original symbolic name is deleted. Accordingly, the referencing entity is no longer capable of associating its dangling reference with the specific entity that was deleted. This means that there is no way to identify to a user of such an environment the reason that caused the dangling reference (e.g., the name of the specific file that was deleted). Looking at the same issue from a different perspective, it also means that when an entity's association with a symbolic name is deleted (i.e., the entity is either deleted or simply associated with a different symbolic name), there is no way to find all referencing entities that are affected after the fact (since they only reference the deleted symbolic name, and not the entity itself).

SUMMARY OF THE INVENTION

The invention introduces an indexing and search framework that allows third parties to contribute indexing extensions (referred to herein as index handlers) that are responsible for populating index entries for the artifacts they own and/or generate. Each entry in the index represents a file and its contents. The index entries include a standardized list of fields, such as file name and defined elements. These standardized fields allow similar data to be stored in a consistent way for each file, so that file content searches (via searching of the index) can be done in a consistent way for all files, regardless of the format of those files. The framework manages the creation, maintenance, and update of the index, and calls the index handlers at appropriate times so they can parse files that they understand for values that need to be stored in the index. The framework provides APIs for querying the standardized fields of the index, so applications can search for standard types of data contributed for any of the indexed files.
The present invention also provides a mechanism to keep track of symbolic name associations for every file/entity in the system. Specifically, the present invention provides a session-based and transient shadow table of symbolic names previously used by the files (even beyond the lifetime of the files themselves). Because symbolic name association information is maintained beyond the lifetime of the entity itself or of its association, other features can use this information to track referencing entities and validate or even repair them as necessary (even if the referenced entity no longer exists or is no longer associated with the specific symbolic name).
In one aspect of the present invention, a method for indexing a file is provided. The method comprises: registering an index handler corresponding to a particular file type; calling the index handler when a change to a file having the particular file type is detected; parsing the file with the index handler to obtain index information; and writing the index information to an index entry corresponding to the file using an index writer.
`
In another aspect of the present invention, the method further comprises: passing the index handler the file after the change is detected; and determining that the index handler is capable of parsing the file.
In another aspect of the present invention, the method further comprises: responsive to determining that the index handler is capable of parsing the file, calling the index handler; passing the index handler an index writer object with the file; and the index handler calling the index writer using the index writer object to store the index information.
In another aspect of the present invention, the method further comprises: receiving a query for the file; and searching the index entry based on the query.
In another aspect of the present invention, a system for indexing a file is provided. The system comprises: a handler registration system for registering an index handler corresponding to a particular file type; a change notification system for receiving a notification of a change to a file having the particular file type; a handler interface system for calling the index handler based on the notification, wherein the index handler parses the file to obtain index information; and an index writer that is called by the index handler for writing the index information to an index entry.
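The indexing method and system summarized above (register a handler for a file type, react to a change notification, ask the handler whether it can parse the file, then hand it the file together with an index writer object) can be sketched in Java, the language in which the detailed description says index handlers are implemented. This is a minimal in-memory illustration under stated assumptions, not the framework's actual API: the names `IndexHandler`, `canIndex`, `index`, `IndexWriter.addField`, `IndexingFramework`, and `IndexingDemo`, and the use of the file extension as the "file type", are all hypothetical. The boolean returned by the second call decides whether the handler's data is kept, mirroring the save-or-discard behavior described for index handlers 52A-D.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Usage example, declared first so the single-file `java` launcher finds main().
class IndexingDemo {
    public static void main(String[] args) {
        IndexingFramework framework = new IndexingFramework();
        // A trivial handler for ".wsdl" files that records only the file name.
        framework.register("wsdl", new IndexHandler() {
            public boolean canIndex(Path file) {
                return true;
            }

            public boolean index(Path file, IndexWriter writer) {
                writer.addField("fileName", file.getFileName().toString());
                return true;
            }
        });
        framework.fileChanged(Path.of("orders.wsdl"));
        System.out.println(framework.entryFor(Path.of("orders.wsdl"))); // {fileName=[orders.wsdl]}
    }
}

/** Hypothetical contract for a contributed index handler. */
interface IndexHandler {
    /** First call: does this handler understand the file? */
    boolean canIndex(Path file);

    /** Second call: parse the file and feed the writer; return false to discard the data. */
    boolean index(Path file, IndexWriter writer);
}

/** Collects the field values that one handler contributes for one file. */
class IndexWriter {
    private final Map<String, List<String>> fields = new HashMap<>();

    void addField(String name, String value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    Map<String, List<String>> fields() {
        return fields;
    }
}

/** Registration, change notification, and delegation to handlers (in memory only). */
class IndexingFramework {
    private final Map<String, List<IndexHandler>> handlersByType = new HashMap<>();
    private final Map<Path, Map<String, List<String>>> index = new HashMap<>();

    /** Handler registration; the "file type" is assumed to be the file extension. */
    void register(String fileType, IndexHandler handler) {
        handlersByType.computeIfAbsent(fileType, k -> new ArrayList<>()).add(handler);
    }

    /** Change notification: rebuild the file's entry from every willing handler. */
    void fileChanged(Path file) {
        Map<String, List<String>> entry = new HashMap<>();
        for (IndexHandler handler : handlersByType.getOrDefault(typeOf(file), List.of())) {
            if (!handler.canIndex(file)) {
                continue; // this handler cannot parse the file
            }
            IndexWriter writer = new IndexWriter();
            if (handler.index(file, writer)) { // keep the data only if the handler says so
                writer.fields().forEach((name, values) ->
                        entry.computeIfAbsent(name, k -> new ArrayList<>()).addAll(values));
            }
        }
        index.put(file, entry);
    }

    Map<String, List<String>> entryFor(Path file) {
        return index.get(file);
    }

    private static String typeOf(Path file) {
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1);
    }
}
```

In this sketch each handler gets a fresh writer and the framework merges only accepted fields into one entry per file; persistence and handling of deleted files are deliberately left out.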
`
`
`
`
In another aspect of the present invention, the handler interface system passes the index handler an index writer object.
In another aspect of the present invention, the system further comprises a query processing system for receiving a query on the file, and for searching the index entry based on the query.
In another aspect of the present invention, the system further comprises an index containing the index entry, wherein the index entry contains a standardized list of fields that contain the file information.
In another aspect of the present invention, a program product stored on at least one computer readable medium for indexing a file is provided. The at least one computer readable medium comprises program code for causing a computer system to perform the following: register an index handler corresponding to a particular file type; call the index handler when a change to a file having the particular file type is detected; parse the file with the index handler to obtain index information; and write the index information to an index entry corresponding to the file using an index writer.
In another aspect of the present invention, the at least one computer readable medium further comprises program code for causing the computer system to perform the following: communicate with the index handler; and pass the index handler the file after the change is detected.
In another aspect of the present invention, the at least one computer readable medium further comprises program code for causing the computer system to perform the following: call the index handler if the index handler is capable of parsing the file; and pass the index handler an index writer object with the file.
In another aspect of the present invention, the at least one computer readable medium further comprises program code for causing the computer system to perform the following: receive a query for the file; and search the index entry based on the query.
In another aspect of the present invention, a method for deploying a system for indexing a file is provided. The method comprises: providing a computer infrastructure being operable to: register an index handler corresponding to a particular file type; call the index handler when a change to a file having the particular file type is detected; parse the file with the index handler to obtain index information; and write the index information to an index entry corresponding to the file using an index writer.
In another aspect of the present invention, a method for maintaining symbolic name integrity in a dynamic environment is provided. The method comprises: detecting a new symbolic name associated with a file; populating a primary table with the association of the new symbolic name to the file; populating a shadow table with an association of a previous symbolic name to the file; and using the shadow table to resolve a dangling reference to the file.
In another aspect of the present invention, the method for maintaining symbolic name integrity in the dynamic environment further comprises using the shadow table to resolve a dangling reference to the file.
In another aspect of the present invention, the method for maintaining symbolic name integrity in the dynamic environment further comprises deleting the association of the previous symbolic name to the file from the shadow table after a predetermined time period.
In another aspect of the present invention, a system for maintaining symbolic name integrity in a dynamic environment is provided. The system comprises: a name detection system for detecting a new symbolic name associated with a file; a table population system for populating a primary table with the association of the new symbolic name to the file, and for populating a shadow table with an association of a previous symbolic name to the file; and a reference resolution system for using the shadow table to resolve a dangling reference to the file.
In another aspect of the present invention, the system for maintaining symbolic name integrity in a dynamic environment further comprises a reference resolution system for using the shadow table to resolve a dangling reference to the file.
In another aspect of the present invention, the table population system further deletes the association of the previous symbolic name to the file from the shadow table after a predetermined time period.
In another aspect of the present invention, a program product stored on at least one computer readable medium for maintaining symbolic name integrity in a dynamic environment is provided. The at least one computer readable medium comprises program code for causing a computer system to perform the following: detect a new symbolic name associated with a file; populate a primary table with the association of the new symbolic name to the file; and populate a shadow table with an association of a previous symbolic name to the file.
In another aspect of the present invention, the at least one computer readable medium further comprises program code for causing the computer system to perform the following: use the shadow table to resolve a dangling reference to the file.
In another aspect of the present invention, the at least one computer readable medium further comprises program code for causing the computer system to perform the following: delete the association of the previous symbolic name to the file from the shadow table after a predetermined time period.
In another aspect of the present invention, a method for deploying a system for maintaining symbolic name integrity in a dynamic environment is provided. The method comprises: providing a computer infrastructure being operable to: detect a new symbolic name associated with a file; populate a primary table with the association of the new symbolic name to the file; and populate a shadow table with an association of a previous symbolic name to the file.
In another aspect of the present invention, the computer infrastructure is further operable to use the shadow table to resolve a dangling reference to the file.

BRIEF DESCRIPTION OF THE DRAWINGS

These and other features of this invention will be more readily understood from the following detailed description of the various aspects of the invention taken in conjunction with the accompanying drawings in which:
FIG. 1 depicts a file management system according to an aspect of the present invention.
FIG. 2 depicts the file indexing framework of FIG. 1 in more detail.
FIG. 3 depicts an illustrative symbolic name scenario according to an aspect of the present invention.
The drawings are not necessarily to scale. The drawings are merely schematic representations, not intended to portray specific parameters of the invention. The drawings are intended to depict only typical embodiments of the invention, and therefore should not be considered as limiting the scope of the invention. In the drawings, like numbering represents like elements.
`
`
`
`
`
`
`DETAILED DESCRIPTION OF THE INVENTION
`
Referring now to FIG. 1, a file management system 10 according to the present invention is shown. As depicted, system 10 includes computer system 14 deployed within a computer infrastructure/environment 12. This is intended to demonstrate, among other things, that some or all of the teachings of the present invention could be implemented within a network environment (e.g., the Internet, a wide area network (WAN), a local area network (LAN), a virtual private network (VPN), etc.), or on a stand-alone computer system. In the case of the former, communication throughout the network can occur via any combination of various types of communications links. For example, the communication links can comprise addressable connections that may utilize any combination of wired and/or wireless transmission methods. Where communications occur via the Internet, connectivity could be provided by a conventional TCP/IP sockets-based protocol, and an Internet service provider could be used to establish connectivity to the Internet. Furthermore, computer infrastructure 12 is intended to demonstrate that some or all of the components of system 10 could be deployed, managed, serviced, etc. by a service provider who offers to manage files according to the present invention.
As shown, computer system 14 includes a processing unit 16, a memory 18, a bus 20, and input/output (I/O) interfaces 22. Further, computer system 14 is shown in communication with external I/O devices/resources 24 and index 26. In general, processing unit 16 executes computer program code, such as applications 50, file indexing framework 30, and symbolic name system 42, which are stored in memory 18. While executing computer program code, processing unit 16 can read and/or write data to/from memory 18, index 26, and/or I/O interfaces 22. Bus 20 provides a communication link between each of the components in computer system 14. External devices 24 can comprise any devices (e.g., keyboard, pointing device, display, etc.) that enable a user to interact with computer system 14 and/or any devices (e.g., network card, modem, etc.) that enable computer system 14 to communicate with one or more other devices.
Computer infrastructure 12 is only illustrative of various types of computer infrastructures for implementing the invention. For example, in one embodiment, computer infrastructure 12 comprises two or more devices (e.g., a server cluster) that communicate over a network to perform the various processes of the invention. Moreover, computer system 14 is only representative of various possible computer systems that can include numerous combinations of hardware. To this extent, in other embodiments, computer system 14 can comprise any specific purpose article of manufacture comprising hardware and/or computer program code for performing specific functions, any article of manufacture that comprises a combination of specific purpose and general purpose hardware/software, or the like. In each case, the program code and hardware can be created using standard programming and engineering techniques, respectively. Moreover, processing unit 16 may comprise a single processing unit, or be distributed across one or more processing units in one or more locations, e.g., on a client and server. Similarly, memory 18 and/or index 26 can comprise any combination of various types of data storage and/or transmission media that reside at one or more physical locations. Further, I/O interfaces 22 can comprise any system for exchanging information with one or more external devices 24. Still further, it is understood that one or more additional components (e.g., system software, math co-processing unit, etc.) not shown in FIG. 1 can be included in computer system 14. However, if computer system 14 comprises a handheld device or the like, it is understood that one or more external devices 24 (e.g., a display) and/or index 26 could be contained within computer system 14, not externally as shown.
Index 26 can be any type of system (e.g., a database) capable of providing storage for file information and/or tables 54 and 56 of symbolic names under the present invention. To this extent, index 26 could include one or more storage devices, such as a magnetic disk drive or an optical disk drive. In another embodiment, index 26 includes data distributed across, for example, a local area network (LAN), wide area network (WAN) or a storage area network (SAN) (not shown). In a typical embodiment, index 26 includes one or more index entries that each correspond to a particular file. Along these lines, each index entry includes a standardized list of fields that contains file information as extracted and indexed under the present invention.
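The symbolic name tables kept alongside the file information can be illustrated with a small Java sketch of the primary/shadow scheme from the summary: when a file acquires a new symbolic name, or is deleted, its old name moves from the primary table into a shadow table, so a later dangling reference can still be traced to the file it used to denote until the shadow entry is deleted after a predetermined time period. This is a sketch under assumptions, not the patented implementation: the class and method names, the use of plain strings for files and names, and the one-hour retention in the usage example are hypothetical, and the session-based, transient nature of the shadow table is approximated by a simple retention window.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Optional;

// Usage example, declared first so the single-file `java` launcher finds main().
class SymbolicNameDemo {
    public static void main(String[] args) {
        Instant t0 = Instant.parse("2006-09-18T00:00:00Z");
        SymbolicNameTables names = new SymbolicNameTables(Duration.ofHours(1));
        names.associate("Orders.wsdl", "OrderService", t0);
        names.associate("Orders.wsdl", "PurchaseService", t0.plusSeconds(60)); // renamed
        System.out.println(names.resolve("OrderService"));         // Optional.empty
        System.out.println(names.explainDangling("OrderService")); // Optional[Orders.wsdl]
        names.purgeExpired(t0.plus(Duration.ofHours(2)));
        System.out.println(names.explainDangling("OrderService")); // Optional.empty
    }
}

/** Hypothetical primary table of live symbolic names plus a transient shadow table. */
class SymbolicNameTables {
    /** A retired association: the file a name used to denote and when it was retired. */
    static final class ShadowEntry {
        final String file;
        final Instant retiredAt;

        ShadowEntry(String file, Instant retiredAt) {
            this.file = file;
            this.retiredAt = retiredAt;
        }
    }

    private final Map<String, String> primary = new HashMap<>();     // live name -> file
    private final Map<String, ShadowEntry> shadow = new HashMap<>(); // previous name -> file
    private final Duration retention; // stands in for the "predetermined time period"

    SymbolicNameTables(Duration retention) {
        this.retention = retention;
    }

    /** A new symbolic name was detected: shadow the file's old name(s), record the new one. */
    void associate(String file, String newName, Instant now) {
        retire(file, now);
        primary.put(newName, file);
        shadow.remove(newName); // the name is live again, so it is no longer a previous name
    }

    /** The file was deleted: every name it held moves to the shadow table. */
    void delete(String file, Instant now) {
        retire(file, now);
    }

    private void retire(String file, Instant now) {
        Iterator<Map.Entry<String, String>> it = primary.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, String> e = it.next();
            if (e.getValue().equals(file)) {
                shadow.put(e.getKey(), new ShadowEntry(file, now));
                it.remove();
            }
        }
    }

    /** Ordinary resolution through the primary table. */
    Optional<String> resolve(String name) {
        return Optional.ofNullable(primary.get(name));
    }

    /** For a dangling reference, report which file last held the name, if still remembered. */
    Optional<String> explainDangling(String name) {
        if (primary.containsKey(name)) {
            return Optional.empty(); // the reference resolves, so it is not dangling
        }
        ShadowEntry entry = shadow.get(name);
        return entry == null ? Optional.empty() : Optional.of(entry.file);
    }

    /** Drop shadow entries retired longer ago than the retention period. */
    void purgeExpired(Instant now) {
        shadow.values().removeIf(s -> s.retiredAt.plus(retention).isBefore(now));
    }
}
```

Keeping the shadow table separate from the primary table means ordinary resolution never sees retired names, while tooling that finds a dangling reference can still report which file (for example, a deleted or renamed one) it used to point to.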
Shown in memory 18 of computer system 14 (among other systems and tables) are file indexing framework 30, symbolic name system 42, application(s) 50 and index handler(s) 52. It should be understood that file indexing framework 30 and symbolic name system 42 could be provided independently of one another. For example, they do not both need to be provided on a single computer system 14 within the scope of the present invention. In addition, it should be understood that file indexing framework 30 and symbolic name system 42 could be realized as multiple computer programs (as shown), or as a single computer program (not shown). It should also be understood that the various systems and their sub-systems of FIG. 1 are shown as such for illustrative purposes only and that the same functionality could be implemented with a different configuration of systems and sub-systems. In any event, the functions of file indexing framework 30 will be described first in conjunction with FIGS. 1 and 2.
`
A. File Indexing Framework
As depicted in FIG. 1, file indexing framework 30 includes handler registration system 32, change notification system 34, handler interface system 36, index writer 38, and query processing system 40. Among other things, file indexing framework 30 provides the following capabilities: (1) a registration mechanism for index handlers; (2) infrastructure to initialize the contents of the index and update its contents as files are added, changed, and deleted; (3) a delegation mechanism for calling index handlers to parse files they understand for relevant content (this delegation provides the support for indexing arbitrary files with arbitrary formats); (4) establishment of standardized fields in the index entry for a file; (5) a set of index writer APIs to be called by index handlers to store standardized index values for a file; and (6) another set of APIs 60, shown in FIG. 2 (provided as part of query processing system 40), to be called by applications 50 to query the standardized index values. (The term “API” is an acronym for “application programming interface”, which enables a computer program to access a set of functions (typically provided by another application or library) without a detailed understanding of the internal workings of the functions being accessed.)
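Capabilities (4) through (6) above (standardized fields in each index entry, writer APIs for handlers, and query APIs for applications) can be pictured with a small Java sketch. Everything here is hypothetical: the field set (`FILE_NAME`, `ELEMENT_DEFINED`, `FILE_REFERENCED`, `NAMESPACE`), the `handlerId:field` key scheme for handler-specific data, and the class names `IndexEntry`, `EntryWriter`, and `IndexQuery` are illustrative choices, not the framework's published API. The point the sketch makes is that an application queries standardized fields without knowing anything about the file's format.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Usage example, declared first so the single-file `java` launcher finds main().
class QueryDemo {
    public static void main(String[] args) {
        Map<String, IndexEntry> index = new HashMap<>();

        IndexEntry wsdl = new IndexEntry("Orders.wsdl");
        new EntryWriter(wsdl).addStandard(StdField.ELEMENT_DEFINED, "OrderService");
        IndexEntry component = new IndexEntry("Shop.component");
        EntryWriter w = new EntryWriter(component);
        w.addStandard(StdField.FILE_REFERENCED, "Orders.wsdl");
        w.addHandlerSpecific("sca", "implementationType", "java"); // not visible to standard queries
        index.put(wsdl.file, wsdl);
        index.put(component.file, component);

        IndexQuery query = new IndexQuery(index);
        System.out.println(query.filesWhere(StdField.FILE_REFERENCED, "Orders.wsdl")); // [Shop.component]
    }
}

/** Hypothetical standardized fields shared by every index entry. */
enum StdField { FILE_NAME, ELEMENT_DEFINED, FILE_REFERENCED, NAMESPACE }

/** One entry per file: standardized fields plus handler-specific fields. */
class IndexEntry {
    final String file;
    final Map<StdField, List<String>> standard = new HashMap<>();
    final Map<String, List<String>> handlerSpecific = new HashMap<>(); // key: "handlerId:field"

    IndexEntry(String file) {
        this.file = file;
    }
}

/** Writer API that index handlers call to populate an entry. */
class EntryWriter {
    private final IndexEntry entry;

    EntryWriter(IndexEntry entry) {
        this.entry = entry;
    }

    void addStandard(StdField field, String value) {
        entry.standard.computeIfAbsent(field, k -> new ArrayList<>()).add(value);
    }

    void addHandlerSpecific(String handlerId, String field, String value) {
        entry.handlerSpecific
                .computeIfAbsent(handlerId + ":" + field, k -> new ArrayList<>())
                .add(value);
    }
}

/** Query API for applications: searches standardized fields only, whatever the file format. */
class IndexQuery {
    private final Map<String, IndexEntry> entries;

    IndexQuery(Map<String, IndexEntry> entries) {
        this.entries = entries;
    }

    /** Names of files whose given standardized field contains the value, sorted. */
    Set<String> filesWhere(StdField field, String value) {
        Set<String> result = new TreeSet<>();
        for (IndexEntry e : entries.values()) {
            if (e.standard.getOrDefault(field, List.of()).contains(value)) {
                result.add(e.file);
            }
        }
        return result;
    }
}
```

Because `filesWhere` reads only the standardized map, the same call works no matter which contributed handler produced the entry; handler-specific fields remain available to code that knows the handler's key scheme.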
The index handlers are responsible for identifying files they understand, parsing the contents of those files, and identifying appropriate data to be stored in the index entries for those files. Some advantages over a non-extensible file indexing system include: (1) support for new file formats can be added by the provider of the file format, without any changes to the indexing framework or applications querying the index; (2) files storing data formats with built-in extensibility mechanisms (e.g., WSDL and SCDL) can be indexed without changes to file indexing framework 30, the code that indexes the base elements of the data standard, or applications querying the index (support for a particular extension can be added by the extension provider in the form of an index handler that only processes that extension); and (3) applications 50 can essentially search the contents of arbitrary files with arbitrary formats in a standard manner, by searching standardized fields of index 26, without knowledge of file formats or the runtime extensions for which they may be targeted.
A general description of the file indexing framework 30 will now be given in conjunction with FIG. 1. Thereafter, a more specific illustrative example will be given in FIG. 2. In general, handler registration system 32 allows third parties to register index handlers 52. Each index handler 52 is typically associated with a particular file format. As changes to a file having a particular file format occur, a notification will be received by change notification system 34. Under the present invention, such a notification can come directly from applications 50 or from an operating system level. In any event, when such a notification is received, handler interface system 36 will call the associated index handler 52 to determine whether index handler 52 is capable of parsing the affected file. If so, handler interface system 36 will pass index handler 52 the file along with an index writer object. Upon receipt, index handler 52 will parse the file to obtain index information. Such information can be any type of information that will identify the file and its contents. Once the index information is determined/extracted, index handler 52 will call index writer 38 to store the index information. Specifically, index writer 38 will write the index information to an index entry corresponding to the file. As indicated above, each file typically has its own index entry in index 26. Each entry includes a standardized list of fields such as file name and elements defined in the file. Should a user later desire to search/query index 26, the user can submit a query that will be received and processed by query processing system 40. Specifically, query processing system 40 includes a set of query APIs 60 (FIG. 2) that is adapted to receive queries from applications 50.
Referring now to FIG. 2, a more specific example of these functions is shown. The example shown in FIG. 2 is an indexing and search framework used to implement WebSphere Integration Developer (WID) 6.0, which is commercially available from IBM Corp. of Armonk, N.Y. In general, tooling in WID requires the following information, which can be scattered across all files generated and referenced by WID: (1) all element definitions of a certain type; (2) all files that reference a specified file; (3) all elements that reference a specified element; (4) all elements referenced by a specified element; and (5) all files that define a specified namespace. Implementing these searches without the use of an index 26 is not a viable option from a performance standpoint, because of the potential number of files that would need to be searched. More importantly, WID is based on the Service Component Architecture (SCA), which is highly pluggable. SCA component, import, and export objects (which are saved as files) all contain extensible data that can be defined by a runtime extension. Similarly, the implementation files that each component can reference in its extensible data are also specific to a runtime extension. Correctly indexing new component, import or export types introduced by new runtime extensions would not be possible without an extensible indexing mechanism. The mechanism

[...] responsible for generating index information for a file, and is a class that implements a Java interface provided by file indexing framework 30. An index handler 52A-D is generally specific to a single file type and is provided by the domain owner of the file type. An index handler 52A-D must understand how to parse a file for meaning, to identify element definitions and references that are relevant to the WID tooling that will be querying index 26.
When creating an index entry for a file, file indexing framework 30 calls each index handler 52A-D, passing it the file being indexed. The index handler 52A-D first determines if it is a file it understands how to parse. If so, it returns true; otherwise it returns false. If index handler 52A-D returns true, file indexing framework 30 calls it again, also passing it an index writer object. Index handler 52A-D should then open the file, parse it, and call index writer 38 with any data that should be stored in the index entry for the file. The index writer class 38 provides convenient methods for storing information in standardized index fields, as well as a method for storing data in handler-specific index fields. Again, index handler 52A-D can return true or false, this time to indicate whether the data it passed to index writer 38 should be saved or discarded. File indexing framework 30 calls all handlers 52A-D in the same manner. Index writer 38 manages the merging of data supplied by multiple index handlers 52A-D and stores all of the contributed index fields into an index