(12) United States Patent
`Bushee
`
(10) Patent No.: US 6,711,569 B1
(45) Date of Patent: Mar. 23, 2004
`
(54) METHOD FOR AUTOMATIC SELECTION OF DATABASES FOR SEARCHING
`
(75) Inventor: William J. Bushee, Sioux Falls, SD (US)

(73) Assignee: Bright Planet Corporation, Sioux Falls, SD (US)

(*) Notice: Subject to any disclaimer, the term of this patent is
    extended or adjusted under 35 U.S.C. 154(b) by 239 days.

(21) Appl. No.: 09/911,452

(22) Filed: Jul. 24, 2001

(51) Int. Cl.7 ................ G06F 17/30
(52) U.S. Cl. ................. 707/5; 707/6; 707/10; 707/104.1;
                                715/501.1; 715/513
(58) Field of Search .......... 707/3, 5, 6, 9, 707/10, 101, 102, 103 R,
                                2, 4, 100, 104.1; 715/501.1, 513; 709/233
`
`(56)
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`
5,257,185 A  * 10/1993  Farley et al. ............. 707/100
5,321,833 A  *  6/1994  Chang et al. .............. 707/5
5,338,976 A  *  8/1994  Anwyl et al. .............. 704/2
5,446,891 A  *  8/1995  Kaplan et al. ............. 707/2
5,721,902 A  *  2/1998  Schultz ................... 707/4
5,778,363 A  *  7/1998  Light ..................... 707/5
5,826,031 A  * 10/1998  Nielsen ................... 709/233
5,835,905 A  * 11/1998  Pirolli et al. ............ 707/3
6,112,202 A  *  8/2000  Kleinberg ................. 707/5
6,418,433 B1 *  7/2002  Chakrabarti et al. ........ 707/5
6,510,427 B1 *  1/2003  Bossemeyer et al. ......... 707/6

`FOREIGN PATENT DOCUMENTS
`
JP  11224292 A    *  8/1999  ........... G06F/17/60
JP  411224256 A   *  8/1999  ........... G06F/17/30
WO  WO 9204681 A1 *  3/1992  ........... G06F/15/40
WO  WO 9712333 A1 *  4/1997  ........... G06F/17/27

`* cited by examiner
`Primary Examiner-Shahid Alam
(74) Attorney, Agent, or Firm-Kaardal & Leonard, LLP
`(57)
`ABSTRACT
`
A method for automatic selection of databases for improving
the efficiency of data capture and management systems. The
method for automatic selection of databases includes obtaining
a candidate database listing providing a uniform
resource locator (URL) for each one of a plurality of
candidate databases to be considered during selection,
obtaining a query from a user, matching a subset of candidate
databases to said query, and storing a listing of selected
databases to be used for retrieving information relative to
said query.
`
2 Claims, 2 Drawing Sheets
`
[Representative drawing on the front page: the block diagram of Figure 1.]

U.S. Patent    Mar. 23, 2004    Sheet 1 of 2    US 6,711,569 B1

[Figure 1: block diagram of the system. Labeled elements: Network;
Databases; Document 70; Computer 20 containing Storage Means 22 and
Communication Means 24; Query Input Means; Evaluation Portion 40;
Candidate Database Listing 50; Selected Database Portion 60 containing
a Database Location Number Segment, an Averaged Score Segment, Score
Segments, and Record Segments.]

Sheet 2 of 2

[Figure 2: flow diagram of the method. Boxes, read down the left column
and then the right column: Obtain Query; Compare Query to Categorization
of Database in Pool; Select Databases; Pass Query to Selected; Collect
Results from Database; Pull First N Results from Each Database; Score
Each of N Results; Average Score of N Results for Each Database; Assign
Average Score; Rank Databases by Average Score; Present Databases and
Results in Ranked Order.]

`METHOD FOR AUTOMATIC SELECTION
`OF DATABASES FOR SEARCHING
`
`INCORPORATION BY REFERENCE
`
`This patent application discloses an invention which may
`optionally form a portion of a larger system. Other portions
`of the larger system are disclosed and described in the
`following patent applications, all of which are subject to an
`obligation of assignment to the same person. The disclosures
`of these applications are herein incorporated by reference in
`their entireties.
METHOD AND SYSTEM FOR AUTOMATIC HARVESTING AND QUALIFICATION
OF DYNAMIC DATABASE CONTENT, William J. Bushee, Thomas
W. Tiahrt, and Michael K. Bergman, and Filed Jul. 24,
2001, application Ser. No. 09/911,522 now pending.
AUTOMATIC SYSTEM FOR CONFIGURING TO
DYNAMIC DATABASE SEARCH FORMS, William
J. Bushee, Filed Jul. 24, 2001, application Ser. No.
09/911,435 now pending.
SYSTEM AND METHOD FOR EFFICIENT CONTROL
AND CAPTURE OF DYNAMIC DATABASE
CONTENT, William J. Bushee and Thomas W. Tiahrt,
Filed Jul. 24, 2001, application Ser. No. 09/911,434
now pending.
SYSTEM FOR AUTOMATICALLY CATEGORIZING
CONTENT IN HIERARCHICAL SUBJECT
STRUCTURES, Thomas W. Tiahrt, Michael K.
Bergman, and William J. Bushee, Filed Jul. 24, 2001,
application Ser. No. 09/911,433 now pending.
SYSTEM AND METHOD FOR FLEXIBLE INDEXING
OF DOCUMENT CONTENT, Thomas W. Tiahrt, Filed
Jul. 24, 2001, application Ser. No. 09/911,432 now
pending.
`
BACKGROUND OF THE INVENTION

1. Field of the Invention
The present invention relates to search engines and more
particularly pertains to a new method for automatic selection
of databases for improving the efficiency of data capture and
management systems.
2. Description of the Prior Art
The Internet is a worldwide system of computer networks
in which users at any one computer may get information
located on virtually any other computer with appropriate
authorization. The Internet uses a set of protocols called
Transmission Control Protocol/Internet Protocol or TCP/IP.
The World Wide Web (often abbreviated as WWW) is a
portion of the Internet using hypertext as a method for rapid
cross-referencing that links one document or site to another.
A database is a collection of data, which is organized in
a manner that allows its contents to be easily accessed,
managed, and updated. Given this definition an Internet site
can be viewed as a database with a collection of data that can
be viewed as pages, or accessible documents. Similarly, any
network for accessing documents can be considered a
database, including intranets and extranets. These network
databases can be either static or dynamic. A static network
database provides the same set of documents or pages to
every user. A dynamic network database presents unique
documents or pages to different users, typically as a response
to the users' queries. Because of the similarity between web
sites specifically and databases in general the terms document
and web page are used synonymously throughout this
document unless otherwise distinguished by context.
Similarly, the terms search engine and database are also used
synonymously throughout this document unless otherwise
distinguished by context.
Many enterprises, whether business, governmental, or
other coordinated undertakings, require large amounts of
"current" information to be analyzed and available for use in
the daily execution of their activities. The Internet has made
the availability information in near real time a reality.
However, this very current information is distributed across
several thousand, if not millions, of computer systems linked
to the Internet. Additionally, this information may be stored
in various different formats, such as documents, web pages,
and other machine readable formats. Locating information
relevant to a specific query posed by a user often requires
specific knowledge of the information's location, sophisticated
search strategies and even professional researchers.
The use of search engines to locate information related to a
user's query is well known and has to some extent sped up
the process of locating related information.
A significant portion of related information returned by
search engines may not be considered truly relevant to a
user's query. The resources required to evaluate all of the
information identified by a search engine in order to filter out
non-relevant information can be more than substantial. The
resources used may include, by way of example and not
limitation, transmission bandwidth, data storage, and time
(both of system usage and of personnel) required to filter out
related but not relevant information. The need to capture and
organize relevant information can be overwhelming, and an
automated system is required to effectively solve this problem.
In these respects, the method for automatic selection of
databases according to the present invention substantially
departs from the conventional concepts and designs of the
prior art, and in so doing provides a system primarily
developed for the purpose of improving the efficiency of
data capture and management systems.

SUMMARY OF THE INVENTION

In view of the foregoing disadvantages inherent in the
known types of search engines now present in the prior art,
the present invention provides a new method for automatic
selection of databases construction wherein the same can be
utilized for improving the efficiency of data capture and
management systems.
The invention contemplates a method of selection and
characterization of search engines and databases which
includes obtaining a candidate database listing providing a
uniform resource locator (URL) for each one of a plurality
of candidate databases to be considered during selection,
obtaining a query from a user, matching a subset of candidate
databases to said query, and storing a listing of selected
databases to be used for retrieving information relative to
said query.
There has thus been outlined, rather broadly, the more
important features of the invention in order that the detailed
description thereof that follows may be better understood,
and in order that the present contribution to the art may be
better appreciated. There are additional features of the
invention that will be described hereinafter and which will
form the subject matter of the claims appended hereto.
In this respect, before explaining at least one embodiment
of the invention in detail, it is to be understood that the
invention is not limited in its application to the details of
construction and to the arrangements of the components set
forth in the following description or illustrated in the drawings.
The invention is capable of other embodiments and of
being practiced and carried out in various ways. Also, it is
to be understood that the phraseology and terminology
employed herein are for the purpose of description and
should not be regarded as limiting.
As such, those skilled in the art will appreciate that the
conception, upon which this disclosure is based, may readily
be utilized as a basis for the designing of other structures,
methods and systems for carrying out the several purposes
of the present invention. It is important, therefore, that the
claims be regarded as including such equivalent constructions
insofar as they do not depart from the spirit and scope
of the present invention.
The objects of the invention, along with the various
features of novelty which characterize the invention, are
pointed out with particularity in the claims annexed to and
forming a part of this disclosure. For a better understanding
of the invention, its operating advantages and the specific
objects attained by its uses, reference should be made to the
accompanying drawings and descriptive matter in which
there are illustrated preferred embodiments of the invention.
`
BRIEF DESCRIPTION OF THE DRAWINGS

The invention will be better understood and objects other
than those set forth above will become apparent when
consideration is given to the following detailed description
thereof. Such description makes reference to the annexed
drawings wherein:
FIG. 1 is a schematic functional interconnect view of a
new system for automatic selection of databases according
to the present invention.
FIG. 2 is a schematic flow diagram of a method aspect of
the present invention.

DESCRIPTION OF THE PREFERRED EMBODIMENT

With reference now to the drawings, and in particular to
FIGS. 1 and 2 thereof, a new method for automatic selection
of databases embodying the principles and concepts of the
present invention will be described.
As best illustrated in FIG. 1, the system 10 for the
automatic selection of databases generally comprises a computer
system 20, a query input means 30, and an evaluation
portion 40.
The computer system 20 includes a storage means 22 for
facilitating the retention and recall of dynamic database
content and a communications means 24 for performing
bi-directional communication between the computer system
20 and a network 2.
The query input means 30 of the system 10 is used for
receiving a plurality of queries from a user and transferring
the plurality of queries to a plurality of databases 4.
The evaluation portion 40 of the system 10 is used for
capturing, storing and scoring a plurality of responsive
documents 70 (such as, for example, web pages) returned by
each one of the plurality of databases 4 in response to the
user's query.
A candidate database listing 50 may provide an index of
uniform resource locators (URLs) for each database 4 to be
considered for selection in response to the user's query.
The evaluation portion 40 determines a page score for a
numerical representation of each one of the responsive
documents 70 associated with each one of the plurality of
databases 4. The range score is a numerical representation of
the relative relevancy of the document 70 to the user's query.
The evaluation portion further determines an averaged score
for each one of the plurality of databases 4 based upon an
average of each one of the page scores. The averaged score
is used to evaluate the relevancy of the database 4 to the
user's query.
A selected database portion 60 provides information
related to each one of the plurality of databases 4 being
selected as relevant to the user's query. The selected database
portion 60 may provide a plurality of fields for storing
this information about each of the databases.
In one embodiment of the invention, the plurality of fields
include a database location number segment 62, an averaged
score segment 64, a plurality of score segments 66, and a
plurality of record segments 68. The database location
number segment 62 provides a cross-reference to a location
in the candidate database listing 50 of a URL associated with
the database 4. The averaged score segment 64 records the
averaged score for the database 4 for the user's query. Each
one of the plurality of score segments 66 record the page
score for one of the responsive documents 70 used to
determine the averaged score for the database 4. Each one of
the record segments 68 provides a cross-reference to a
location of each one of the responsive documents 70 in a
storage medium.
Each of the database location segments 62 and the record
segments 68 may comprise a 64-bit representation of location
for facilitating access to more than 4.3 billion discrete
locations.
A listing of candidate databases is provided to the system
for consideration with respect to a query or series of queries
provided by a user. The queries may be provided directly by
the user, or may be passed to the system through a file
transfer or file access process.
The subject query being processed is forwarded or passed
to each of the candidate databases (e.g. from the listing) and
waits for the databases to provide responsive web pages.
Typically these responsive web pages will provide URLs for
responsive documents. Each URL may be followed to the
document and a copy of the associated document is captured
for evaluation.
An evaluation parameter may be used to define a maximum
number of responsive documents to be captured from
each one of the plurality of databases. In a preferred embodiment
the evaluation parameter may be set and adjusted by
the user to a maximum number of responsive documents.
The evaluation parameter preferably may have a value in the
range between 2 and 20 (inclusive) documents. More
preferably, the evaluation parameter has a value falling in
the range between 4 and 10 documents (inclusive). Most
preferably, the evaluation parameter has a value of approximately
5 documents.
A database providing the documents (such as a search
engine) may also indicate relative scores or rankings for the
relevancy of each of the documents with respect to the query
based upon various factors determined by the entity operating
the database. In a preferred embodiment, the documents
captured for storage and analysis are the documents
with the highest associated scores or rankings determined by
the source database.
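
The selection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Hit` record, its field names, and the choice to clamp out-of-range values are hypothetical and are not given in the patent. Only the 2 to 20 range and the preferred value of about 5 come from the text.

```python
from __future__ import annotations

from dataclasses import dataclass

# Bounds taken from the description: 2 to 20 inclusive, about 5 most preferred.
MIN_DOCS, MAX_DOCS, DEFAULT_DOCS = 2, 20, 5


@dataclass(frozen=True)
class Hit:
    """One result reported by a source database (hypothetical shape)."""
    url: str
    source_score: float  # score or ranking value supplied by the source


def clamp_evaluation_parameter(requested: int = DEFAULT_DOCS) -> int:
    """Keep a user-supplied value inside the 2..20 range (an assumed policy)."""
    return max(MIN_DOCS, min(MAX_DOCS, requested))


def select_hits(hits: list[Hit], evaluation_parameter: int = DEFAULT_DOCS) -> list[Hit]:
    """Return at most N hits, highest source score first."""
    n = clamp_evaluation_parameter(evaluation_parameter)
    # sorted() is stable, so hits with equal scores keep their original order.
    return sorted(hits, key=lambda h: h.source_score, reverse=True)[:n]
```

Clamping is one possible design; an implementation could equally reject out-of-range values with an error.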
Each of the captured documents copies is stored on the
system for recall and analysis without having to return to the
source databases of the documents.
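
A minimal sketch of this capture-once, analyze-locally idea follows. The `DocumentStore` class, the `capture` helper, and the injected `fetch` callable are all hypothetical names introduced for illustration; the patent does not describe a storage layout, and a real system would persist copies to disk rather than keep them in memory.

```python
from __future__ import annotations

from typing import Callable


class DocumentStore:
    """In-memory stand-in for the storage means (illustrative only)."""

    def __init__(self) -> None:
        self._copies: list[bytes] = []

    def save(self, content: bytes) -> int:
        """Store a copy and return its record location (here, a list index)."""
        self._copies.append(content)
        return len(self._copies) - 1

    def load(self, record_location: int) -> bytes:
        """Recall a stored copy without contacting the source database."""
        return self._copies[record_location]


def capture(urls: list[str], fetch: Callable[[str], bytes],
            store: DocumentStore) -> list[int]:
    """Fetch each URL once and return the record locations of the copies."""
    return [store.save(fetch(url)) for url in urls]
```

Passing `fetch` in as a parameter keeps the sketch free of any particular HTTP library and makes it easy to exercise with a stub.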
Each of the documents (e.g. web pages) is then evaluated
for the number of occurrences of the term or terms of the
query in the document and the title of the document. The
length of the document may also be determined for evaluating
relevancy. This information is used to determine a
numerical score for each document. The numerical scores
for each document retrieved from a database are averaged
together, and this averaged score is then assigned to the
database as an indication of relevance of that database to the
user's query.
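
The patent names the inputs to the page score (query-term occurrences in the body and title, and document length) but gives no formula. The sketch below therefore uses an assumed formula: term hits, with title hits weighted more heavily, divided by document length in words. The weight `TITLE_WEIGHT`, the tokenizer, and both function names are hypothetical choices made only to show how a page score and an averaged score might fit together.

```python
from __future__ import annotations

import re
from statistics import mean

TITLE_WEIGHT = 3.0  # assumption: the source does not specify any weights


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def page_score(query: str, title: str, body: str) -> float:
    """Score one document from term occurrences, title hits, and length."""
    terms = set(tokenize(query))
    body_words = tokenize(body)
    title_hits = sum(1 for word in tokenize(title) if word in terms)
    body_hits = sum(1 for word in body_words if word in terms)
    length = max(len(body_words), 1)  # avoid dividing by zero on empty pages
    return (body_hits + TITLE_WEIGHT * title_hits) / length


def averaged_score(page_scores: list[float]) -> float:
    """Average the page scores of one database; 0.0 if nothing was captured."""
    return mean(page_scores) if page_scores else 0.0
```

For example, `page_score("deep web", "Deep Web search", "the deep web is large")` counts two title hits and two body hits over five body words, giving (2 + 3.0 × 2) / 5 = 1.6 under these assumed weights.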
The databases may be sorted or ranked by averaged score
such that databases with relatively higher averaged scores
are presented to the user before databases with relatively
lower averaged scores.
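
As a small illustration of this ranking step, the following Python sketch sorts databases by averaged score in descending order. The function name, the use of a dictionary keyed by database identifier, and the alphabetical tie-break are assumptions; the patent only requires that higher-scoring databases come first.

```python
from __future__ import annotations


def rank_databases(averaged_scores: dict[str, float]) -> list[tuple[str, float]]:
    """Return (database, averaged score) pairs, highest score first.

    Ties are broken by database identifier so the order is deterministic
    (an assumed policy, not something the source specifies).
    """
    return sorted(averaged_scores.items(), key=lambda item: (-item[1], item[0]))
```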
An information stream may be created which contains
multiple information portions. Each information portion is
associated with each one of the databases still under consideration
after initial screening or filtering.
In a preferred embodiment, the information portions may
include a database location number segment, an average
score segment, a plurality of score segments, and a plurality
of record segments. The database location number segments
provide a cross-reference to a location of the database in the
candidate database listing. Each of the score segments
provides the numerical scores determined for each of the
responsive pages used to develop the averaged score. Each
of the record segments provides a cross-reference to a
location of each of the captured copies of the responsive
pages.
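
One way to picture an information portion is as a small binary record. The sketch below is an assumed layout, not the patent's: little-endian byte order, a 16-bit count field, and 64-bit floats for scores are all illustrative choices. The 64-bit unsigned location fields follow the description's 64-bit representation of location; a 32-bit field tops out at 2^32 = 4,294,967,296 values, which is the "4.3 billion" figure the text refers to.

```python
from __future__ import annotations

import struct
from dataclasses import dataclass

HEADER = "<QdH"  # 64-bit location, 64-bit float average, 16-bit segment count


@dataclass
class InformationPortion:
    database_location: int        # index into the candidate database listing
    averaged_score: float
    page_scores: list[float]      # one score segment per captured page
    record_locations: list[int]   # one record segment per captured page

    def pack(self) -> bytes:
        n = len(self.page_scores)
        if n != len(self.record_locations):
            raise ValueError("one record location is needed per page score")
        # Q = unsigned 64-bit integer, d = 64-bit float, H = unsigned 16-bit.
        return struct.pack(f"{HEADER}{n}d{n}Q", self.database_location,
                           self.averaged_score, n,
                           *self.page_scores, *self.record_locations)

    @classmethod
    def unpack(cls, data: bytes) -> "InformationPortion":
        location, averaged, n = struct.unpack_from(HEADER, data, 0)
        offset = struct.calcsize(HEADER)
        scores = struct.unpack_from(f"<{n}d", data, offset)
        offset += struct.calcsize(f"<{n}d")
        records = struct.unpack_from(f"<{n}Q", data, offset)
        return cls(location, averaged, list(scores), list(records))


def build_stream(portions: list[InformationPortion]) -> bytes:
    """Concatenate portions so higher-scoring databases appear earlier."""
    ordered = sorted(portions, key=lambda p: p.averaged_score, reverse=True)
    return b"".join(p.pack() for p in ordered)
```

The explicit count field lets a reader walk the stream portion by portion without a separate index; other framings, such as fixed-width records sized by the evaluation parameter, would work as well.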
Therefore, the foregoing is considered as illustrative only
of the principles of the invention. Further, since numerous
modifications and changes will readily occur to those skilled
in the art, it is not desired to limit the invention to the exact
construction and operation shown and described, and
accordingly, all suitable modifications and equivalents may
be resorted to, falling within the scope of the invention.
`I claim:
1. A method for the automatic selection and characterization
of search engines and databases comprising:
obtaining a candidate database listing providing a uniform
resource locator (URL) for each one of a plurality of
candidate databases to be considered during selection;
obtaining a query from a user;
submitting the query from the user to each one of said
plurality of candidate databases;
obtaining an evaluation parameter providing a predetermined
number of responsive documents to capture;
selecting a number of URLs associated with responsive
documents corresponding to said evaluation parameter,
said responsive documents being selected according to
a score provided by said database such that higher
scoring responsive documents are selected over lower
scoring responsive documents;
collecting a document associated with each one of said
URLs;
storing each one of said documents for analysis;
evaluating each responsive documents for occurrence of
the query term, length of said responsive documents,
and title of said responsive documents;
creating a page score for each one of said responsive
documents associated with each one of said plurality of
databases;
calculating an averaged score for each one of said databases
based upon an average of all of said pages scores
associated with each one of said databases;
associating said averaged score with said database and the
user's query;
sorting said candidate listing of databases by said averaged
score such that relatively higher scoring databases
are presented substantially before relatively lower scoring
databases;
storing said listing of selected databases associated with
the user's query;
creating an information stream having a plurality of
information portions, each one of said information
portions being associated with one of said plurality of
selected databases;
creating a plurality of fields within each one of said
plurality of information portions, said plurality of fields
including a database location number segment providing
a cross-reference to a location of said database in
said candidate database listing, an average score segment
for storing said averaged score for said database
associated with the user's query, a plurality of score
segments for storing each one of said page scores
associated with said database, a plurality of record
segments for storing a location of each one of said
responsive documents associated with said database;
sorting said plurality of information portions such that
information portions associated with relatively higher
scoring databases are positioned earlier in said information
stream than information portions associated
with relatively lower scoring databases; and
writing said information stream to a storage medium to
provide a selected listing of databases associated with
the user's query to be polled for relative information.
2. A system for the automatic selection of websites
comprising:
a computer system having a storage means for facilitating
the retention and recall of dynamic database content,
said computer system having a communications means
for performing bi-directional communication between
said computer system and a network;
a query input means for receiving a plurality of queries
from a user and transferring the plurality of queries to
a plurality of databases;
an evaluation portion for capturing, storing and scoring a
plurality of responsive documents returned by said
databases;
an evaluation parameter defining a maximum number of
responsive documents to be captured for each one of
said plurality of databases;
a candidate database listing providing an index of uniform
resource locators (URLs) for each database to be considered
for selection in response to the user's query;
said evaluation portion determines a page score as a
numerical representation of each one of said responsive
documents associated with each one of said plurality of
databases, said evaluation portion further determining
an averaged score for each one of said plurality of
databases based upon an average of each one of said
page scores, said averaged score being used to evaluate
relevancy of said database to the user's query;
a selected database portion providing information related
to each one of said plurality of databases being selected
as relevant to the user's query, said selected database
portion providing a plurality of fields;
wherein said plurality of fields further comprises:
a database location number segment providing a cross-reference
to a location of a URL associated with said
database in said candidate database listing;
an averaged score segment recording said averaged
score for said database associated with the user's
query;
a plurality of score segments, each one of said plurality
of score segments recording said page score for each
one of said responsive documents used to determine
said averaged score;
a plurality of record segments, each one of said record
segments providing a cross-reference to a location of
each one of said responsive documents stored for
determining said page scores;
wherein each one of said database location segments
and said record segments comprise 32-bit representations
of location; and
wherein each one of said database location segments
and said record segments comprise 64-bit representations
of location for facilitating accessing more
than 4.3 billion discrete locations.
`
`* * * * *
`