US005109439A
United States Patent [19]
Froessl

[11] Patent Number: 5,109,439
[45] Date of Patent: Apr. 28, 1992

[54] MASS DOCUMENT STORAGE AND RETRIEVAL SYSTEM

Inventor: Horst Froessl, Gutenbergstrasse 2-4, D-6944 Hemsbach, Fed. Rep. of Germany

[21] Appl. No.: 536,769
[22] Filed: Jun. 12, 1990

[51] Int. Cl.5: G06K 9/00
[52] U.S. Cl.: 382/[illegible]
[58] Field of Search: [illegible]

[56] References Cited
U.S. PATENT DOCUMENTS
4,358,824  11/1982  Glickman et al.
4,553,261  11/1985  Froessl
4,672,683  6/1987  Matsueda
4,743,618  5/1988  Takeda et al.  382/61
4,758,380  7/1988  Tsunekawa et al.  382/61
4,760,606  7/1988  Lesnick et al.  382/61
4,811,166  3/1989  Gonzalez et al.
4,933,979  6/1990  Suzuki et al.  382/61

Primary Examiner: Leo Boudreau
Assistant Examiner: David Fox
Attorney, Agent, or Firm: Walter C. Farley

[57] ABSTRACT
A sequence of documents is delivered to an optical scanner in which each document is scanned to form a digital image representation of the content of the document. In one embodiment, the image representation is converted into code (ASCII) and is automatically examined by data processing apparatus to select search words which meet predetermined criteria and by which the document can subsequently be located. In another embodiment, the image is not converted. The search words are stored in a nonvolatile memory in code form and the entire document content is stored in mass storage, either in code or image form. Techniques for selecting the search words are disclosed.

21 Claims, 7 Drawing Sheets
`
`
`
[Front-page drawing: block diagram of the system, showing user stations, input devices (keyboard, mouse, etc.), a communications link or network server, and a computer with volatile and non-volatile memory (RAM, tables, hard disk), software, and hardware for search and character conversion. Reference numerals 144 to 147 appear in the drawing.]
`Page 1 of 17
`
`FIS Exhibit 1018
`
`

`
U.S. Patent    Apr. 28, 1992    Sheet 1 of 7    5,109,439

[Drawing sheet 1 of 7. The figure text is not recoverable from this copy.]

`
U.S. Patent    Apr. 28, 1992    Sheet 2 of 7    5,109,439

[Drawing sheet 2 of 7. The figure text is not recoverable from this copy.]

`
U.S. Patent    Apr. 28, 1992    Sheet 3 of 7    5,109,439

[Drawing sheet 3 of 7. The figure text is not recoverable from this copy.]

`
U.S. Patent    Apr. 28, 1992    Sheet 4 of 7    5,109,439

[Drawing sheet 4 of 7. The figure text is not recoverable from this copy.]

`
U.S. Patent    Apr. 28, 1992    Sheet 5 of 7    5,109,439

[FIG. 3: flow diagram of the search word selection process. Legible box labels: START SEARCH WORD SELECTION; CONVERT TEXT; CHECK EACH WORD IN TEXT FOR INITIAL CAPITAL LETTER; CAPS VOCAB. TABLE; a decision on whether the word is preceded by a full stop; WORD IN CAPS VOCAB. TABLE AS WORD NOT TO SELECT; COMPARE OTHER (NON-CAP) WORDS WITH LANGUAGE TABLE (DICTIONARY) AND SELECT AS SEARCH WORDS THOSE MARKED, INCLUDING SPECIAL AND "MUST" SELECTIONS FOR SPECIFIC BUSINESS; DO NOT STORE; STORE AS SEARCH WORD AND CORRELATE WITH DOCUMENT ID; MORE TEXT IN THE DOCUMENT?; GO TO START SEARCH WORD SELECTION; NEXT DOCUMENT. The connections between boxes are not recoverable from this copy.]

`
U.S. Patent    Apr. 28, 1992    Sheet 6 of 7    5,109,439

[Drawing sheet 6 of 7. The figure text is not recoverable from this copy.]

`
U.S. Patent    Apr. 28, 1992    Sheet 7 of 7    5,109,439

[FIG. 5: flow diagram of the retrieval method. Legible box labels: ENTER SEARCH WORDS; COMPARE SEARCH WORDS WITH STORED SEARCH WORDS; SEARCH WORDS FOUND?; DISPLAY LIST OF SEARCH WORDS; CALL UP LIST OF SEARCH WORDS MEETING CRITERIA; SELECT SEARCH WORDS FROM LIST; DISPLAY NO. OF DOCUMENTS HAVING SEARCH WORDS CHOSEN; TOO MANY TO REVIEW VISUALLY?; CHOOSE ADDED CRITERIA TO REDUCE NUMBER OF DOCUMENTS; DISPLAY DOCUMENTS; PRINT OR QUIT. The connections between boxes are not recoverable from this copy.]

`
MASS DOCUMENT STORAGE AND RETRIEVAL SYSTEM

This invention relates to a system for the mass storage of documents and to a method for automatically selecting search words by which the documents can be retrieved on the basis of the document content.
`
BACKGROUND OF THE INVENTION

Various systems are used for the mass storage and retrieval of the contents of documents, including systems such as those disclosed in my earlier U.S. Pat. Nos. 4,273,440; 4,553,261; and 4,276,065. While these systems are indeed quite usable and effective, they generally require considerable human intervention. Other systems involve storage techniques which do not use the available technology to its best advantage and which have serious disadvantages as to speed of operation and efficiency. In this context, the term "mass storage" is used to mean storage of very large quantities of data in the order of, e.g., multiple megabytes, gigabytes or terabytes. Storage media such as optical disks are suitable for such storage although other media can be used.

Generally speaking, prior large-quantity storage systems employ one of the following approaches:
A. The content of each document is scanned by some form of optical device involving character recognition (generically, OCR) so that all or major parts of each document are converted into code (ASCII or the like), which code is then stored. Systems of this type allow full-text code searches to be conducted for words which appear in the documents. An advantage of this type of system is that indexing is not absolutely required because the full text of each document can be searched, allowing a document dealing with a specific topic or naming a specific person to be located without having to be concerned with whether the topic or person was named in the index. Such a system has the disadvantages that input tends to be rather slow because of the conversion time required and input also requires human supervision and editing, usually by a person who is trained at least enough to understand the content of the documents for error-checking purposes. Searching has also been slow if no index is established and, for that reason, indexing is often done. Also, the question of how to deal with non-word images (graphs, drawings, pictorial representations) must be dealt with in some way which differs from the techniques for handling text in many OCR conversion systems. Furthermore, such systems have no provision for offering for display to the user a list of relevant search words, should the user have need for such assistance.
B. The content of each document is scanned for the purpose of reducing the images of the document content to a form which can be stored as images, i.e., without any attempt to recognize or convert the content into ASCII or other code. This type of system has the obvious advantage that graphical images and text are handled together in the same way. Also, the content can be displayed in the same form as the original document, allowing one to display and refer to a reasonably faithful reproduction of the original at any time. In addition, rather rapid processing of documents and storage of the contents is possible because no OCR conversion is needed and it is not necessary for a person to check to see that conversion was proper. The disadvantages of such a system are that some indexing technique must be used. While it would be theoretically possible to conduct a pattern search to locate a specific word "match" in the stored images of a large number of documents, success is not likely unless the "searched for" word is presented in a font or typeface very similar to that used in the original document. Since such systems have had no way of identifying which font might have been used in the original document, a pattern search has a low probability of success and could not be relied upon. Creating an index has traditionally been a rather time consuming, labor-intensive task. Also, image storage systems (i.e., storing by using bit-mapping or line art or using Bezier models) typically require much more memory than storing the equivalent text in code, perhaps 25 times as much.
Various image data banks have come into existence but acceptance at this time is very slow, mainly due to input and retrieval problems. Because of the above difficulties, mass storage systems mainly have been restricted to archive or library uses wherein retrieval speed is of relatively little significance or wherein the necessary human involvement for extensive indexing can be cost justified. There are, however, other contexts in which mass storage could be employed as a component of a larger and different document handling system if the above disadvantages could be overcome.

SUMMARY OF THE INVENTION

An object of the present invention is to provide a method of handling input documents, storing the contents of the documents and automatically creating a selection of search words for the stored documents with little or no human intervention.

A further object is to provide a method of machine-indexing contents of documents which are to be stored in image form in such a way that the documents can be retrieved.

Another object is to provide a method to display search words to users in an indexed or a non-indexed system.

Briefly described, the invention comprises a method of retrievably storing contents of a plurality of documents having images imprinted thereon comprising optically scanning the documents to form a representation of the images on the documents. A unique identification number can be assigned to each document and to the image representation of each document. Search words are automatically selected from each document to be used in locating the document from mass storage. The selected search words are converted to code, correlating the converted search words with the unique identification number of the document from which the search words were selected. The search words are stored in code, and the image representation of each document is stored in mass storage or the entire text is converted into ASCII or other code with the search words being retained in separate storage for display to users when desired.
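
The correlation between coded search words and document identification numbers described above can be pictured as a small lookup structure. The following is a minimal sketch only, not the patent's implementation: the class name `SearchWordIndex`, the in-memory dictionary, and the case-insensitive matching are all assumptions made for illustration, and the document images themselves would live elsewhere in mass storage.

```python
from collections import defaultdict


class SearchWordIndex:
    """Hypothetical index: search word (in code form) -> set of document IDs."""

    def __init__(self):
        self._ids_by_word = defaultdict(set)

    def add(self, doc_id, words):
        # Correlate every selected search word with the document's ID.
        for word in words:
            self._ids_by_word[word.lower()].add(doc_id)

    def find(self, *words):
        # Return the IDs of documents that carry all of the given words.
        if not words:
            return []
        id_sets = [self._ids_by_word.get(w.lower(), set()) for w in words]
        return sorted(set.intersection(*id_sets))


index = SearchWordIndex()
index.add("0791060083000", ["Acme", "invoice", "Hamburg"])
index.add("0791060083002", ["Acme", "contract"])
print(index.find("acme"))             # both IDs
print(index.find("Acme", "invoice"))  # only the first ID
```

Keeping the index separate from the stored images mirrors the split the text draws between search words held in readily accessible storage and document content held in mass storage.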
It should be kept in mind that the invention contemplates three possible approaches which have their own advantages and disadvantages. In one approach, the text is "read" by a scanner or the like and kept in a bit-mapped or similar digital form, as it emerges from the scanner, rather than being converted into ASCII or other code. Search words are extracted and converted into code but the main body of the text is stored (in mass storage) as an image. In the second approach, the entire document (to the extent possible) is converted, search words are selected and stored in code form, and the entire text is stored in code. In the third approach, the document is also entirely converted (to the extent possible) and search words are selected but the document is finally stored in image form. Except for the search words, the converted text is not saved in mass storage.
`
BRIEF DESCRIPTION OF THE DRAWINGS

In order to impart full understanding of the manner in which these and other objects are attained in accordance with the invention, particularly advantageous embodiments thereof will be described with reference to the accompanying drawings, which form part of this specification, and wherein:

FIGS. 1A and 1B, taken together, constitute a flow diagram illustrating the overall steps of a first embodiment of a document processing method in accordance with the invention;

FIGS. 2A and 2B, taken together, constitute a flow diagram illustrating the steps of a second embodiment of a document processing method in accordance with the invention;

FIG. 3 is a flow diagram illustrating a search word selection process in accordance with the invention;

FIG. 4 is a block diagram of a system in accordance with the invention; and

FIG. 5 is a flow diagram illustrating a retrieval method in accordance with the invention.
`
DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention will be described in the context of a system for handling incoming mail in an organization such as a corporation or government agency which has various departments and employees and which receives hundreds or thousands of pieces of correspondence daily. At present, such mail is commonly handled manually because there is no practical alternative. Either of two approaches is followed, depending on the size and general policies of the organization: in one approach, mail is distributed to departments, and perhaps even to individual addressees, before it is opened, to the extent that its addressee can be identified from the envelope; and in the other approach, the mail is opened in a central mail room and then distributed to the addressees. In either case, considerable delay exists before the mail reaches the intended recipient. In addition, there is very little control over the tasks which are to be performed in response to the mail because a piece of mail may go to an individual without his or her supervisor having any way to track the response. Copying (i.e., making a paper copy) of each piece of mail for the supervisor is, of course, unnecessarily wasteful. The present system can be used to store and distribute such incoming mail documents.

Referring first to FIG. 1, at the beginning of the process of the present invention, each incoming document 20 is delivered 21 to a scanner and is automatically given a distinctive identification (ID) number which can be used to identify the document in both the hard copy form and in storage. The ID number can be printed on the original of the document, in case it becomes necessary to refer to the original in the future. Preferably, the ID number is a 13 digit number of which two digits represent the particular scanner (in the event that the organization has more than one) or the department in which or for which the incoming documents are being processed, two digits represent the current year, three digits represent the day of the year and six digits represent the time (hour, minute and second).

The number is automatically provided by a time clock as each document is fed into the system. For reasons which will be discussed below, it is anticipated that most documents will be processed in a time of about two seconds each, which means that the time-based ID number will be unique for each document. As the number is being printed on the document, it is supplied to non-volatile storage, such as a hard disk, for cross reference use with other information about the document.
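
The 13 digit ID described above can be illustrated with a short sketch. The text lists the four components but does not state the order in which they are concatenated, so the order used here (unit, year, day of year, time) is an assumption, as are the function name `make_document_id` and the use of a software clock in place of the time clock.

```python
from datetime import datetime


def make_document_id(unit: int, when: datetime) -> str:
    """Build a 13-digit ID: 2 digits for the scanner or department,
    2 for the year, 3 for the day of the year, 6 for the time (HHMMSS).
    The field order is assumed, not taken from the source."""
    if not 0 <= unit <= 99:
        raise ValueError("unit code must fit in two digits")
    return f"{unit:02d}{when:%y%j%H%M%S}"


# Scanner/department 7, 1 March 1991 at 08:30:00 (day 60 of the year).
doc_id = make_document_id(7, datetime(1991, 3, 1, 8, 30, 0))
print(doc_id)  # 0791060083000
```

Because the time field has one-second resolution, uniqueness per scanner relies on documents being fed no faster than one per second, which is consistent with the roughly two seconds per document anticipated in the text.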
`
While use of the ID number is clearly preferred, it would be possible to group documents, as by week or month received, and rely on other criteria to locate specific documents within each group. In such a case, the ID number would not be unique to each individual document but some other form of identification can enable reference to a specific document.

In order for the processing to be reliable, there are certain prerequisites for the documents, systems and procedures to allow the documents to be processed. Most of these are common to all conversion systems, not only those of the present invention. Currently available hardware devices are capable of performing these functions. The criteria are:
a. Each document should be easily readable, i.e., have reasonably good printing.
b. The print should be on one side of the page only. For documents having printing on both sides, it should be standard practice to use one side only.
c. The scanner should have a document feeder.
d. A copying machine should be available for either
   copying documents darker when the original is too light, or
   copying damaged or odd-size documents not suitable for feeder input.
e. Character recognition software used with the system must be powerful and able to convert several different fonts appearing on one page.
f. Preferably the software should also be able to convert older type fonts and must be able to separate text and graphics appearing on the same page.

At this preliminary stage, pre-run information 22 can also be supplied to the apparatus to set, for example, the two-digit portion indicating the department for which documents are being processed. This is helpful if a single scanner is to be used for more than one department or if a scanner in one department is temporarily inoperative and one for another department is being used.

The documents are fed into the scanner, after or concurrently with assignment of the ID number, the scanner being of a type usable in optical character recognition (OCR) but without the usual recognition hardware or software. The scanner thus produces an output which is typically an electrical signal comprising a series of bits of data representing successive lines taken from the image on the document. Each of the successive lines consists of a sequence of light and dark portions (without gray scales) which can be thought of as equivalent to pixels in a video display. Several of these "pixel lines" form a single line of typed or printed text on the document, the actual number of pixel lines (also referred to as "line art") needed or used to form a single line of text being a function of the resolution of the scanner.
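
As a worked illustration of the last point, the number of pixel lines per text line is simply the vertical scanner resolution divided by the text line density. The figures below (300 dots per inch, six text lines per inch) are assumed example values, not values given in the source, and `pixel_lines_per_text_line` is a hypothetical helper.

```python
def pixel_lines_per_text_line(scanner_dpi: float, text_lines_per_inch: float) -> float:
    """Pixel lines that span one line of text, including its line spacing."""
    return scanner_dpi / text_lines_per_inch


# Assumed example: a 300 dpi scanner and typewritten text at 6 lines per inch.
print(pixel_lines_per_text_line(300, 6))  # 50.0
```

Doubling the resolution doubles the number of pixel lines per text line, which is the dependence on scanner resolution noted in the text.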
`
In conventional OCR, software is commonly used to analyze immediately the characteristics of each group of pixel lines making up a line of text in an effort to "recognize" the individual characters and, after recognition, to replace the text line with code, such as ASCII code, which is then stored or imported into a word processing program. In one aspect of the present invention (FIG. 1), recognition of the full text is not attempted at this stage. Rather, the data referred to above as pixel lines is stored in that image form without conversion. In the other approach (FIG. 2), the full text is converted into code and is then stored in mass storage (e.g., optical disk) while the converted search words are stored, as suggested above, in a readily accessible form of non-volatile memory such as a hard disk. In this connection, memory such as random access memory, buffer storage and similar temporary forms of memory are referred to herein as either RAM or volatile memory and read/write memory such as hard disk, diskette, tape or other memory which can be relied upon to survive the deenergization of equipment is referred to as non-volatile memory.

The pixel line image is stored in a temporary memory such as RAM 26 and the ID number, having been generated in a code such as ASCII by the time clock or the like concurrently with the printing, is stored in code form and correlated in any convenient fashion with its associated document image.

As will be recognized, the image which is stored in this fashion includes any graphical, non-text material imprinted on the document as well as unusually large letters or designs, in addition to the patterns of the text. Commonly, incoming correspondence will include a letterhead having a company logo or initials thereon. At this stage 26 of the process, the image can be searched to determine if patterns indicative of a logo or other distinctive letterhead (generically referred to herein as a "logo") are present. This can be automatically performed by examining the top two to three inches of the document for characters which are larger than normal document fonts or have other distinctive characteristics. By "automatically" it is meant that the step can be performed by machine, i.e., by a suitably constructed and programmed computer of which examples are readily available in the marketplace. The term "automatically" will be used herein to mean "without human intervention" in addition to meaning that the step referred to is done routinely.

If such a logo is found, 28, a comparison 30 can be made to see if the sender's company logo matches a known logo from previous correspondence. This information can be useful in subsequent retrieval. For this purpose, a data table 32 including stored patterns of known logos is maintained correlated with the identification of the sending organization, the pattern information in the table 32 being in the same form as the signals produced by the scanner so that the scanner output can be compared with the table to see if a pattern match exists.

To seek a pattern match, a comparison is performed preferably using a system of the type produced by Benson Computer Research Corporation, McLean, Va., which utilizes a search engine employing parallel processing and in-memory data analysis for very rapid pattern comparison. If the letterhead/logo on a document is recognized, 34, an identification of the sender, including address, is attached, 36, to the ID number for that particular document for subsequent use as a search word. If no pattern match is found, a flag can be attached to the ID number for that document to indicate that fact, allowing human intervention to determine whether the logo pattern should be added to the existing table.

As will be discussed, the ID number and any additional information which is stored with that number, as well as search words to be described, are ultimately stored in code rather than image form. Such code is preferably stored on a hard disk while the images are ultimately stored in a mass store such as a WORM (write once, read many times) optical disk. Meanwhile, all such data is held in RAM.

At this stage, the system enters into a process of selecting search words and other information from the remaining parts of the document to allow immediate electronic distribution as well as permanent storage of the documents which have specifically designated addressees and to permit subsequent retrieval on the basis of information contained in the document. Some of the techniques for doing these tasks are language- and custom-dependent, as will be discussed, and the techniques must thus be tailored to the languages and customs for the culture in which the system is intended to be used. A general principle in this embodiment is to attempt to recognize portions of the document which are likely to contain information of significance to subsequent retrieval before the document is converted into code and to then convert into code only specific search words within those recognized portions.

It is customary in many countries to have the date of the letter and information about the addressee isolated at the top of a letter following a logo, or in a paragraph which is relatively isolated from the remainder of the text. This part of the letter easily can be recognized from the relative proportion of text space to blank space without first converting the text into code. Once recognized, 38, this portion can be converted, identified as "date" and "addressee" information 40 and stored with the document ID. All known arrangements for writing a date can be stored in a data table for comparison with the document so that the date and its characteristics can be recognized.

If the date and addressee information cannot be recognized in a specific document, the ID for that document is flagged 42 for human intervention so that the date is manually added to the extent that it is available. In this context, the "addressee" would normally be either a specifically named person or a department within the overall organization. To facilitate identifying the addressee, a table can be maintained with individual and department names for comparison.

At this stage of the process, normally about two seconds or less after the document has been introduced into the scanner, enough information will have been determined (in most cases) for the system to send to the individual addressee, as by a conventional E-mail technique, notification 44 that a document has been received, from whom, and that the text is available from mass storage under a certain ID number. If desired, the image of the entire document can be transmitted to the addressee but a more efficient approach is to send only notification, allowing the intended recipient to access the image from mass storage.

In a similar fashion, the name of the individual sender, as distinguished from a company with which the individual might be employed, is usually readily recognizable, 46, near the end of the document page on which it appears. If recognizable, the sender's name and/or title is chosen routinely, 48, as one of the search words. Additionally, it will be recognized that the presence of the sender's name at the end is an indication that the page on which it appears is the last page of that specific document, while the presence of the addressee's name near the top indicates that the page is the first page. An indication of attachments at the bottom can also be chosen to show that there is more to be associated with the letter.

Multiple page documents can be recognized by the absence of letterhead information on the second and subsequent pages and by the presence of a signature on a page other than the one with address information. It is important to correlate all subsequent pages with the first page so that when a multiple page document is found in a search, the first page is displayed and the user can then "leaf through" the document by sequentially displaying the subsequent pages.
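
The page-grouping rule just described can be sketched as follows. This is an illustrative assumption about how the two cues might be combined, not the patent's procedure: the `Page` record, its boolean flags (which would come from earlier recognition steps), and the function `group_pages` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Page:
    page_id: str
    has_letterhead: bool  # letterhead/address information found on the page
    has_signature: bool   # sender's name or signature found near the end


def group_pages(pages):
    """Group scanned pages into documents, in scanning order.

    A page with letterhead starts a new document; a page with a
    signature closes the current one.
    """
    documents, current = [], []
    for page in pages:
        if page.has_letterhead and current:
            documents.append(current)
            current = []
        current.append(page.page_id)
        if page.has_signature:
            documents.append(current)
            current = []
    if current:
        documents.append(current)
    return documents


pages = [
    Page("p1", True, False),
    Page("p2", False, False),
    Page("p3", False, True),
    Page("p4", True, True),
]
print(group_pages(pages))  # [['p1', 'p2', 'p3'], ['p4']]
```

Each inner list begins with the first page of a document, so a search hit can display that page and let the user leaf through the rest. Attachments that follow a signature page would need an additional cue, which this sketch does not handle.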
If a specific document exhibits any problems with character recognition, 50, the search words and related material are stored and the ID flagged for human attention, 52. The human review 56 is for the purpose of determining the reasons for the problem, correcting them if possible and either retrying the machine processing or manually entering the desired information.

The next task, 54, is to identify by machine those words in the text of the document which are significant to the meaning of the document and which can be used as search words, apart from identification of the sender, addressee, etc. The manner in which this task will be accomplished is more language-dependent than the above. A more complete discussion of the text search word selection process follows with reference to FIG. 3. The chosen search words are converted to code, 58, stored with, or correlated with, the ID number and the image itself is transferred to the mass store. If more documents are to be processed, 60, the method starts again at 21.

To summarize, the documents received by a company are analyzed to identify and store important words from various parts of each such document. In the example of a business letter, such information should include the following:
Sending organization (letterhead information)
Date of the letter
Addressee (company, organization)
Reference
Individual addressee (Dear Mr. ---)
Search words chosen from text
Presence of enclosure/annex
Individual sender

FIG. 2 shows an alternative embodiment in which the input document text is converted, to the extent possible, at the beginning of the process while the scanning is being performed. This difference leads to a number of other changes throughout the process, although many of the steps are the same. The process of FIG. 2 will be briefly discussed with emphasis on the differences from FIG. 1.
`
To begin with, the feeding of documents 60 to scanner 61 and the insertion of pre-run information 62 is the same. However, after or concurrently with scanning, the entire document is converted, 63, to code by suitable conventional character recognition equipment and software and stored in volatile memory. As in FIG. 1, the image of the document is stored in RAM, 64, even though the conversion is accomplished. If there are any OCR conversion problems, 65, the ID number is flagged for human review, 66, and correction or manual entry, 67.

`The image is searched for a logo pattern, 70, and if a
`logo is found, 74, its pattern is compared, 75, with patterns
`stored in a logo table 76. If found, 78, the information
`stored therein about the sender is added, 80, to the
`ID data stored. If not, it can be added manually, 82.
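The logo lookup just described can be sketched roughly as follows. This is an assumption-heavy illustration, not the method of the specification: the text does not say how patterns are represented or compared, so the flattened binary bitmap, the mismatch threshold, and the names `LOGO_TABLE`, `match_logo` and `add_sender` are all hypothetical.

```python
from typing import Dict, Optional, Tuple

# Assumed representation: a logo region flattened to a tuple of 0/1 pixels.
Pattern = Tuple[int, ...]

# Hypothetical logo table mapping stored patterns to sender information.
LOGO_TABLE: Dict[Pattern, str] = {
    (1, 1, 0, 0, 1, 1, 0, 0): "Acme GmbH",
    (0, 1, 0, 1, 0, 1, 0, 1): "Example Corp.",
}


def match_logo(pattern: Pattern, max_mismatch: float = 0.2) -> Optional[str]:
    """Return sender info for the closest stored pattern, or None if none is close enough."""
    best_sender, best_score = None, max_mismatch
    for stored, sender in LOGO_TABLE.items():
        if len(stored) != len(pattern):
            continue  # only compare patterns of equal size
        # Fraction of pixels that differ between the scanned and stored patterns.
        mismatch = sum(a != b for a, b in zip(stored, pattern)) / len(pattern)
        if mismatch <= best_score:
            best_sender, best_score = sender, mismatch
    return best_sender


def add_sender(id_data: dict, pattern: Pattern) -> dict:
    """Add sender info to the ID data if the logo is known, else mark it for manual entry."""
    sender = match_logo(pattern)
    if sender is not None:
        id_data["sender"] = sender
    else:
        id_data["needs_manual_sender"] = True
    return id_data
```

A tolerance is used here rather than exact equality because scanned bitmaps of the same logo rarely match pixel for pixel; the 0.2 threshold is an arbitrary illustrative value.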
`The system can be arranged to search for addressee
`and date information in either the image in RAM or the
`converted code in RAM, but the preferred method is to
`search in code, 72. If found, 84, these data are chosen,
`86, as search words. If not, the document is flagged for
`human review, 87. Notification of the receipt of a document,
`or the entire document, can then be sent to the
`addressee, 88.
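A search of the converted code for date and addressee information might look like the sketch below. It is a minimal example under stated assumptions: the specification does not give the search rules, so the English month-name date format, the salutation pattern, and the function name `find_header_words` are hypothetical choices made for illustration.

```python
import re
from typing import Dict, List

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

# Assumed date format, e.g. "June 11, 1990".
DATE_RE = re.compile(rf"\b(?:{MONTHS})\s+\d{{1,2}},\s+\d{{4}}\b")

# Assumed salutation format, e.g. "Dear Mr. Smith" at the start of a line.
SALUTATION_RE = re.compile(r"^Dear\s+(?:Mr\.|Mrs\.|Ms\.|Dr\.)\s+([A-Z][\w-]+)",
                           re.MULTILINE)


def find_header_words(text: str) -> Dict[str, object]:
    """Pick date and addressee as search words; flag the document if either is missing."""
    date = DATE_RE.search(text)
    addressee = SALUTATION_RE.search(text)
    search_words: List[str] = []
    if date:
        search_words.append(date.group(0))
    if addressee:
        search_words.append(addressee.group(1))
    return {"search_words": search_words,
            "needs_review": not (date and addressee)}


letter = "Acme GmbH\nJune 11, 1990\n\nDear Mr. Smith,\nThank you for your order."
result = find_header_words(letter)
```

Searching the converted code rather than the image, as the text prefers, is what makes simple pattern matching of this kind possible.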
`If date and sender information has been found, 90, it
`is added as search words, 92. The search word selection
`from the text is performed, 94, chosen words are stored
`and correlated with the ID number, 96, and the converted
`image data are stored in WORM or other mass
`store. As before, the ID and search word information is
`stored in a non-volatile, rewritable form of memory
`such as a hard disk. In this approach, storage of the
`image is possible together with full text conversion or
`conversion in part as well as conversion of search
`words into code. On the other hand, total conversion
`can be used only for the search for, and extraction of,
`search words with, possibly, editing being performed to
`only the search words or only to the capital letters of
`the search words. The search in code in this case includes,
`e.g., date, addressee and sender.
`Using this approach, the remainder of the converted
`text is not stored but is deleted.
`Correction of incorrectly converted search words
`and/or rejections (words which cannot be recognized
`and converted) can also be reduced to two errors per
`rejection, or more for any characters following a capital
`letter. The capital letter itself would have to be correct
`for later ease and reliability of searching.
`FIG. 3 illustrates a process for selecting search words
`from the text of a document automatically, i.e., without
`human intervention in the case of most documents,
`which is a very important part of the present
