United States Patent
Froessl

[11] Patent Number: 5,396,588
[45] Date of Patent: Mar. 7, 1995

US005396588A

[54] DATA PROCESSING USING DIGITIZED IMAGES

[76] Inventor: Horst Froessl, Gutenbergstrasse 2-4, 6944 Hemsbach, Germany

[21] Appl. No.: 547,190

[22] Filed: Jul. 3, 1990

[51] Int. Cl.6: G06F 15/62
[52] U.S. Cl.: 395/145; 395/150; 382/69; 364/419.19; 364/DIG. 2; 364/963
[58] Field of Search: 395/145, 150, 600; 382/11, 48, 69; 364/419, 963, 419.19

[56] References Cited

U.S. PATENT DOCUMENTS

3,634,822   1/1972   Chappaq             382/69
3,964,591   6/1976   [name illegible]    197/1 R
4,028,674   6/1977   Chuang              340/146.3
4,273,440   6/1981   [name illegible]    355/40
4,553,261   11/1985  Froessl             382/57
[one or more entries illegible]
4,726,065   2/1988   [name illegible]    381/41
4,748,673   5/1988   Takeda et al.       382/56
4,758,980   7/1988   Tsunekawa et al.    364/900
4,760,606   7/1988   Lesnick et al.      382/48
4,887,304   12/1989  Terzian             382/30
4,907,283   3/1990   Tanaka et al.       382/40
4,933,979   6/1990   Suzuki et al.       382/61
4,944,022   7/1990   Yasujima et al.     382/14
4,985,863   1/1991   Fujisawa et al.     364/900
4,989,042   1/1991   Muramatsu et al.    355/244
5,038,392   8/1991   Morris et al.       382/61
5,048,113   9/1991   Yamagata et al.     382/57
5,051,925   9/1991   Kadono et al.       364/519
5,060,146   10/1991  Chang et al.        364/900
5,109,439   4/1992   Froessl             382/61

Primary Examiner: Mark [initial illegible] Zimmerman
Assistant Examiner: Joseph Feud
Attorney, Agent, or Firm: Walter C. Farley

[57] ABSTRACT

A method of manipulating information is disclosed in which the data is stored as a digitized image and is kept in image form for various data processing manipulations. A font table is formed having a matrix of fonts correlated with characters and symbols in code form such as ASCII. Desired material in the stored documents is located using pattern match searching with a parallel processor search engine.

8 Claims, 4 Drawing Sheets

[Representative drawing (same block diagram as FIG. 4; reference numerals 86 and 88 visible): INPUT (KEYBOARD, MOUSE, ETC.); DOCUMENT FEED, PRINT & SCANNER; COMPUTER; VOLATILE & NON-VOLATILE MEMORY: RAM, TABLES, HD; SEARCH PROCESSOR; SOFTWARE, HARDWARE FOR CHAR. CONV.]

`SONY — Ex.-1012
`Sony Corporation — Petitioner
`
`
`
U.S. Patent    Mar. 7, 1995    Sheet 1 of 4    5,396,588

[FIG. 1: font table matrix with columns 10a, 10b, 10c and rows 12a, 12b, 12c.]

U.S. Patent    Mar. 7, 1995    Sheet 2 of 4    5,396,588

[FIG. 2: flow diagram, first part. Box labels: ESTABLISH FONT TABLE, ASCII TO PATTERN, WITH MULTIPLE FONTS AND SIZES (15); ENTER FONT AND SEARCH AREA DESIRED (16); GO TO DOCUMENTS IN IMAGE PATTERN STORAGE (17); MATCH PATTERN OF FIRST CHARACTER FOUND WITH PATTERN IN FONT TABLE; EXAMINE SECOND CHARACTER; DELETE NUMBERS; STOP PROCESS, ERROR MESSAGE TO OPERATOR; MATCH PATTERN OF NEXT CHARACTER WITH PATTERNS IN FONT TABLE (1ST HEADING LTR); STORE ALL CHARACTERS IN HEADING IN ALPHABET GROUP OF FIRST CHARACTER; LAST HEADING IN DOCUMENT? (Y/N); GO TO NEXT DOCUMENT IN STORAGE.]

U.S. Patent    Mar. 7, 1995    Sheet 3 of 4    5,396,588

[FIG. 3: flow diagram, second part. Box labels: GO TO FIRST LETTER STORAGE, FIRST HEADING; COMPARE NTH CHARACTER WITH TABLE, START AT N = 2; RELOCATE HEADING IN ACCORDANCE WITH ALPHABET POSITION; COMPARE NTH CHARACTER OF NEXT HEADING WITH FONT TABLE; CHARACTER OF SAME HEADING; RELOCATE HEADING IN ACCORDANCE WITH ALPHABET POSITION AND FLAG: COMPLETED; LAST HEADING IN THIS LETTER STORE?; LAST HEADING IN LAST LETTER STORE?; SET N = 2, GO TO NEXT LETTER STORE.]

U.S. Patent    Mar. 7, 1995    Sheet 4 of 4    5,396,588

[FIG. 4: system block diagram. Blocks: INPUT (KEYBOARD, MOUSE, ETC.); DOCUMENT FEED, PRINT & SCANNER; COMPUTER; VOLATILE & NON-VOLATILE MEMORY: RAM, TABLES, HD; SEARCH PROCESSOR; SOFTWARE, HARDWARE FOR CHAR. CONV.]

DATA PROCESSING USING DIGITIZED IMAGES

CROSS REFERENCE TO RELATED APPLICATION

Reference is made to application Ser. No. 536,769, filed Jun. 12, 1990, the entire content of which is hereby incorporated by reference.

SPECIFICATION

This invention relates to a method and apparatus for using image searching and manipulation techniques to store and retrieve information in such a way that the input of material from documents to mass storage is facilitated and the retrieval of desired information is not impeded.

BACKGROUND OF THE INVENTION

Mass storage is becoming a much more interesting tool than it has been in the past for a larger number of applications because of the introduction of relatively new mass storage media such as optical disks. However, it is still necessary to find efficient ways of putting the data into mass storage and retrieving it.

Certainly the most efficient technique for inputting the contents of typed or printed documentation is with the use of optical scanning techniques. Methods and apparatus for handling incoming mail and the like in large quantities are disclosed in my copending application Ser. No. 536,769. In that application, the technique is used of optically scanning each document, identifying by data processing techniques “search words” which can subsequently be used to retrieve the documents and then storing the documents in a mass store, either in image form or in a data processing code such as ASCII. By “image form” it is meant that a digitized representation of the image of the document is stored in a form which is sometimes referred to as “bit mapped”. While image storage requires much more memory, it has the advantage of speed over converting everything into data processing (dp) code, which necessarily requires human editing to assure accuracy of conversion, and also has the advantage of being able to reproduce a replica of the original on a display or with a suitable printer, including signatures, letterhead “logos” and other non-text or unconvertible features such as drawings or graphics.

Retrieval has always been regarded as a requirement which necessitated conversion into dp code of all or a significant part of each document. Even in the system disclosed in Ser. No. 536,769, some conversion is used in connection with search words and the like, and that system is regarded as representing a minimum of conversion, and probably the most efficient system for bridging the gap between hard copy (paper) and mass electronic or optical storage. It would, however, be advantageous for many circumstances if the speed of putting information from documents into digital storage could be further increased so that the time for putting a page of printed or typed material into digital form in mass storage could be, in essence, not significantly longer than the time required for the page to be physically scanned by an optical scanning device.

SUMMARY OF THE INVENTION

An object of the present invention is to provide a method of retrievably storing contents of documents rapidly using the full content of the document as search criteria for both text and graphics.

A further object is to provide such a method wherein document contents can be manipulated and processed without converting the alphanumeric characters in the document into code.

Yet another object is to provide a system which includes a method of retrieving image content by pattern matching, with or without indexing.

A further object is to provide a system which can convert existing paper documents, such as technical manuals, into a form suitable for interactive electronic display.

Briefly described, the invention comprises a method of data processing including storing digitized images of document contents, establishing in non-volatile memory a font table including code values of alphanumeric characters and symbols and images of characters and symbols in each font used in the documents, each of the characters and symbols in each font being correlated with the code values for the character or symbol, locating digitized images of selected portions of stored documents which are to be manipulated, and manipulating the selected portions in the form of digitized images.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to impart full understanding of the manner in which these and other objects are attained in accordance with the invention, particularly advantageous embodiments thereof will be described with reference to the accompanying drawings, which form part of this specification, and wherein:

FIG. 1 is a schematic illustration of a font table showing the organization of a matrix usable to construct a correlation between characters and their equivalents in a plurality of fonts;

FIGS. 2 and 3 are parts of a flow diagram illustrating the steps of the method of the invention as applied to a specific storage and retrieval problem; and

FIG. 4 is a schematic diagram of a system for performing the method.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

In most storage systems which deal with large amounts of data, the data is converted into dp code, ASCII being the most common, and stored in code form. When one wishes to retrieve some part of the stored data, various techniques can be used, depending on how the system is designed to operate. Some use index techniques while others rely on full-text searching for selected search words.

In accordance with the present invention, information is stored in image form, the word “information” being used to mean the content of documents which are being or have been transferred from a typed, printed or written form to digital storage. The stored information is preferably not indexed as it is entered into the system because any indexing system adds time to the input process.

While it would theoretically have been possible in prior art systems using image storage to conduct a pattern search to locate a specific word “match” in the stored images of a large number of documents, success would not have been likely unless the “searched for” word were presented in a font or typeface very similar to that used in the original document. Since such systems have had no way of identifying which font might have been used in the original document, a pattern search has had a low probability of success and could not be relied upon.

In order to overcome this problem, the present invention uses what will be referred to as a “font table”. The font table is a matrix of patterns organized in such a way that the alphanumeric characters and other symbols in a specific style of font or typeface are correlated with the ASCII (or other code system) values for those symbols. A schematic representation of a font table is shown in FIG. 1. When represented on paper, the table has a plurality of columns 10a, 10b, 10c, . . . and a plurality of rows 12a, 12b, 12c, . . . . Each column contains a list of the various patterns of characters which go to make up a font set, each character being in the particular style of that font. Each row contains the various forms of a single character or symbol, one for each of the various fonts. At the intersection of a row and column will be found a specific character pattern in a selected font. The fonts can be identified in any convenient way, such as by numbers, and the characters can also be identified in various ways although one of the most desirable is to use the ASCII value for each character.

The present invention does not use a font table imprinted on paper but, rather, uses a table in the form of a non-volatile memory such as a hard disk, i.e., a memory medium which is not erased when power is removed. As such, it may not physically have columns and rows, but can have any convenient equivalent form of organization which has characteristics similar to the written form, i.e., a font set can be located and recognized, the members of a set of fonts representing any single character can be recognized and the “intersection” patterns can be quickly located, e.g., the pattern which represents the letter “H” in font number 3 can be located in the memory.
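
By way of illustration only, the "equivalent form of organization" just described might be sketched in Python as a lookup keyed by font number and character code. The class name `FontTable`, its method names, and the tiny 3x3 bitmaps are assumptions made for this sketch; they do not appear in the specification, which leaves the storage layout open.

```python
from typing import Dict, Tuple

Bitmap = Tuple[Tuple[int, ...], ...]  # rows of 0/1 pixels


class FontTable:
    """Maps (font number, character code) to a stored character image."""

    def __init__(self) -> None:
        self._cells: Dict[Tuple[int, int], Bitmap] = {}

    def add_glyph(self, font: int, char: str, image: Bitmap) -> None:
        # The character is identified by its ASCII value, as the text suggests.
        self._cells[(font, ord(char))] = image

    def glyph(self, font: int, char: str) -> Bitmap:
        """The 'intersection' pattern: one character in one font."""
        return self._cells[(font, ord(char))]

    def font_set(self, font: int) -> Dict[str, Bitmap]:
        """A 'column': every character pattern stored for one font."""
        return {chr(c): img for (f, c), img in self._cells.items() if f == font}

    def variants(self, char: str) -> Dict[int, Bitmap]:
        """A 'row': the same character in every stored font."""
        return {f: img for (f, c), img in self._cells.items() if c == ord(char)}


# Hypothetical 3x3 bitmaps standing in for scanned character images.
H_FONT_3: Bitmap = ((1, 0, 1), (1, 1, 1), (1, 0, 1))
H_FONT_5: Bitmap = ((1, 0, 1), (1, 1, 1), (1, 1, 1))

table = FontTable()
table.add_glyph(3, "H", H_FONT_3)
table.add_glyph(5, "H", H_FONT_5)
h_in_font_3 = table.glyph(3, "H")  # the letter "H" in font number 3
```

A dictionary is used here because it gives direct access to any intersection without physical rows and columns; any other organization with the same three lookups would serve equally well.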

The form of memory used is preferably revisable, i.e., additions of new fonts can be made and corrections can also be made. If this is considered unimportant for a specific application, the table can be stored in some form of read-only-memory (ROM) such as in one or more PROM chips.

As mentioned above, each member of each font set is stored in the table as an image. Thus, when one wishes to locate a specific word, printed in a specific font, in the mass store which contains all of the document text, the word is entered in code form as by keyboard and the character pattern equivalent to each letter of the word is copied from the font table into volatile memory (RAM) in the selected font. A pattern search is then conducted to find a string of patterns in the stored text which matches the string of patterns which has been constructed from the table. The system can require a full match but, more practically, a match of some accuracy less than 100% is used in order to be reasonably sure of finding the desired word and avoid the problem of missing the word because of a typographical error or a small defect in the stored image.
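
The search just described can be sketched as follows, purely as an illustration. The sketch assumes that every glyph in a font has the same height, that a stored text line is a small 0/1 bitmap, and that "accuracy" means the fraction of agreeing pixels; the function names, the 3x3 glyphs and the 0.9 threshold are all hypothetical and are not taken from the specification.

```python
from typing import Dict, List

Bitmap = List[List[int]]  # rows of 0/1 pixels


def build_pattern(word: str, glyphs: Dict[str, Bitmap]) -> Bitmap:
    """Join the glyph image of each letter, left to right, into one pattern."""
    height = len(glyphs[word[0]])
    rows: Bitmap = [[] for _ in range(height)]
    for ch in word:
        for r in range(height):
            rows[r].extend(glyphs[ch][r])
    return rows


def match_score(pattern: Bitmap, image: Bitmap, col: int) -> float:
    """Fraction of pixels that agree with the pattern placed at column col."""
    agree = total = 0
    for r, pattern_row in enumerate(pattern):
        for c, pixel in enumerate(pattern_row):
            total += 1
            agree += int(image[r][col + c] == pixel)
    return agree / total


def find_word(pattern: Bitmap, image: Bitmap, threshold: float) -> List[int]:
    """Column offsets at which the match accuracy reaches the threshold."""
    last = len(image[0]) - len(pattern[0])
    return [c for c in range(last + 1) if match_score(pattern, image, c) >= threshold]


# Hypothetical 3x3 glyphs for one font, and a stored text line containing "AB"
# at column 2 with one defective pixel (row 0, column 3 should be 1).
GLYPHS = {"A": [[0, 1, 0], [1, 1, 1], [1, 0, 1]],
          "B": [[1, 1, 0], [1, 1, 1], [1, 1, 0]]}
LINE = [[0, 0, 0, 0, 0, 1, 1, 0, 0, 0],
        [0, 0, 1, 1, 1, 1, 1, 1, 0, 0],
        [0, 0, 1, 0, 1, 1, 1, 0, 0, 0]]

pattern = build_pattern("AB", GLYPHS)
tolerant_hits = find_word(pattern, LINE, threshold=0.9)  # tolerates the defect
exact_hits = find_word(pattern, LINE, threshold=1.0)     # full match required
```

With the defective pixel, 17 of the 18 pattern pixels agree at the true position, so the tolerant search finds the word while the full-match search misses it, which is the situation the text warns about.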

A pattern search of the type described can be performed very quickly using a very fast computer search engine such as that developed and marketed by the Benson Computer Research Corporation, McLean, Va. The Benson systems employ multiple processors in a parallel architecture arrangement to conduct comparisons at a high rate of speed. Once located, the words or the documents which contain the word images can be copied into RAM or disk for sorting or other manipulation. Alternatively, the contents of storage such as optical disk (WORM) can be copied to RAM for pattern match searching.

It is also possible to automatically provide a string of all fonts for each search word entered by a user for an individual search. However, for the most efficient searching, as suggested above, it is desirable to be able to specify the font or fonts in which the searched-for text appears. This may not always be possible, but with proper system arrangement it is possible in a large number of situations. When dealing with incoming correspondence, as in the system of application Ser. No. 536,769, it is quite helpful to also maintain a record of fonts used by a specific company and to add new fonts, when they are encountered, with the identification of the sender.

Consider, for example, the situation in which the mass store is dealing with correspondence and the Volkswagen Company is known to have used 5 specific fonts, and further assume that retrieval of a document received from the Volkswagen Company containing the term “rear axle” is needed. It is known that the letter was sent by a Mr. Wagner in May, 1989. The operator enters as search criteria the search words “rear axle, Wagner, May 1989” and “Volkswagen=company”. The program goes to a company list and, under “Volkswagen”, identifies the fonts associated with Volkswagen and renders the search words “rear axle, Wagner, May 1989” in image form in each of the 5 different fonts, since the letter might have been written in any one of those. The search engine, in a relatively short time, searches the image files and extracts the desired letter, or possibly several letters meeting the criteria, ready for display and/or printout. This approach permits adding even old, outdated fonts for filing old documents, eliminating the requirement of warehouses stacked with old files.
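
As a rough sketch of the company-list step in this example, the search words can be paired with every font recorded for the sender before any image patterns are built. The font numbers, the dictionary layout and the function name `expand_query` are assumptions for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical company list: sender name -> font numbers seen in its letters.
company_fonts: Dict[str, List[int]] = {"Volkswagen": [1, 4, 7, 9, 12]}


def expand_query(words: List[str], company: str,
                 fonts_by_company: Dict[str, List[int]]) -> List[Tuple[str, int]]:
    """Pair every search word with every font recorded for the company."""
    fonts = fonts_by_company.get(company, [])
    return [(word, font) for font in fonts for word in words]


queries = expand_query(["rear axle", "Wagner", "May 1989"],
                       "Volkswagen", company_fonts)
```

Each (word, font) pair would then be turned into an image pattern from the font table and handed to the pattern search, so three search words and five fonts give fifteen patterns to look for.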
It is a relatively simple matter to maintain the font record. In received correspondence, the font (or fonts) used in a newly received letter is compared with the font table and, if the font is recognized as being in the table, the font is listed with the name of the sender. If a font is not recognized, the document image can be flagged to bring it to the attention of an operator for addition to the table, generally a partly manual process.
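
The bookkeeping described here might look like the following minimal sketch. It assumes that the comparison of a letter's typeface against the font table has already produced a font identifier for each typeface found; how that comparison is made is not shown. The function name `record_fonts` and the font identifiers are hypothetical.

```python
from typing import Dict, List, Set


def record_fonts(sender: str, fonts_in_letter: List[int], known_fonts: Set[int],
                 fonts_by_company: Dict[str, List[int]]) -> List[int]:
    """List recognized fonts under the sender; return the unrecognized ones
    so the document can be flagged for an operator."""
    unrecognized: List[int] = []
    for font in fonts_in_letter:
        if font in known_fonts:
            listed = fonts_by_company.setdefault(sender, [])
            if font not in listed:
                listed.append(font)
        else:
            unrecognized.append(font)
    return unrecognized


# Hypothetical data: fonts 1, 4 and 7 are in the font table; font 30 is not.
font_record: Dict[str, List[int]] = {"Volkswagen": [1]}
needs_operator = record_fonts("Volkswagen", [1, 4, 30], {1, 4, 7}, font_record)
```

In this run font 4 is added to the sender's record, while font 30 is returned for operator attention rather than being added automatically, reflecting the partly manual step in the text.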

As a more detailed and specific illustration of the method of the present invention, consider the case of a naval organization which has a large number of handbooks and instruction manuals all of which are needed for the routine maintenance of a ship. These manuals, or their equivalents, must be carried by the ship so that the personnel in various specialties can refer to them for routine maintenance, or non-routine repair, of any system aboard, whether of a mechanical, hydraulic, electrical or other nature. Such manuals are typically printed in a small number of type styles or fonts which are relatively standard. The presence of a full set of such manuals for an aircraft carrier has been estimated to be responsible for altering the draft of the ship by about three feet.

In accordance with the present invention, all of the manuals with their associated diagrams and illustrations can be stored in image form on a reasonable number of optical disks and can then be searched to locate the information of interest. Because of the limited number of fonts which are known in advance, the search speed is maximized and the weight associated with the printed documents is replaced by the comparatively trivial weight of several computers (which are already available on the ship in the form of personal computers) along with the optical disks and disk readers for cooperating with the computers.

Referring now to FIG. 2, the following example will involve the review, by the computer equipment, of the stored text of the manuals for the purpose of extracting (copying into RAM) the chapter headings of all chapters, and then putting them into alphabetical order. Although each chapter heading is preceded by a one- or two-digit reference number, for the present purpose those numbers are to be discarded and only the chapter names are to be retained. As indicated above, three fonts are used in the manuals, a large font for chapter headings, a medium font for subheadings and a “normal” font for the text. The first step 15 in the method thus is the establishment of a font table which cross-references the ASCII values for each letter, number and other symbol to the equivalent characters in the fonts used. For this example, only the largest font would be needed, but the table is established with all fonts when the mass storage files are created so that any manipulation can be done.

The identification of the font which is to be searched for is then entered, 16, along with the search area desired, i.e., any limitation on the area of the manuals which are to be searched. The search engine, such as the Benson system mentioned above, then examines the full text of the documents in image storage, 17, and attempts to match the pattern of the first character of each heading with the pattern in the table, 18. If there is a match, the character is examined, 20, to see if it is a number. If it is a number, the heading is further examined, 22, to see if the second character is a number or a blank space, 23. If the second character is found to be a number, there is a check to see if there is a blank space, 24, following the second number. Location of a blank in either place completes the pattern which identifies a heading. Thus, in addition to the font size, the material being examined has been confirmed as a heading and the numbers preceding the heading name have been isolated. Those numbers, having served their purpose for this search, are deleted, 25. Failure to find a match in the appropriate font size or a blank in the proper place, 26, is inconsistent with the known format of the stored documents. In such a case, the process is stopped and the operator is informed with an appropriate error message, 27.
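
The checks at steps 18 through 27 can be sketched as below. The sketch assumes that each character image of a candidate heading has already been compared with the heading font, giving either the matched character or `None` where nothing matched. The names `strip_heading_number` and `HeadingFormatError` are hypothetical, and treating a heading that does not begin with a number as an error is an assumption, since the text does not spell out that branch.

```python
from typing import List, Optional


class HeadingFormatError(Exception):
    """Raised where the text says the process stops and the operator is informed."""


def strip_heading_number(chars: List[Optional[str]]) -> List[Optional[str]]:
    """Confirm the number-then-blank pattern and return the heading name only.

    chars holds one entry per character image: the matched character, or
    None where no pattern of the heading font matched.
    """
    if len(chars) < 3 or chars[0] is None:
        raise HeadingFormatError("first character does not match the heading font")
    if not chars[0].isdigit():
        # Assumption: the text does not say what happens here; treated as an error.
        raise HeadingFormatError("heading does not start with a number")
    if chars[1] == " ":
        return chars[2:]  # one-digit number followed by a blank
    if chars[1] is not None and chars[1].isdigit() and chars[2] == " ":
        return chars[3:]  # two-digit number followed by a blank
    raise HeadingFormatError("no blank in the expected place")


one_digit = strip_heading_number(list("7 PUMPS"))
two_digit = strip_heading_number(list("12 HYDRAULIC SYSTEMS"))
```

Working on matched characters rather than raw pixels keeps the sketch short; in the method itself the heading stays in image form and only the match results drive the decisions.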

The following character is compared, 28, with the font table to determine which letter of the alphabet begins the first word of the heading. If there is no match, 30, then the method is stopped and the operator is informed, 27. If there is a match, all of the words in the heading are then copied and stored together, 32, as being a heading and they are stored in an alphabet group which will contain other headings starting with the same letter, i.e., if the heading is “HYDRAULIC SYSTEMS”, then it is stored in a memory area reserved for headings starting with H. The first letter of the stored heading will now be referred to as the “first character”.

The mass store is then examined to see if there is another document with a heading in storage, 34. If so, 35, the above steps beginning at 18 are repeated for each as indicated by the circled recirculation numeral 1. If the heading is found to be the last heading in the last document, then the process of arranging the documents within the alphabet groups begins as shown in FIG. 3. Actually, the rearranging process can be accomplished while the above steps are being repeated for the second and subsequent documents, but for simplicity, it is described herein as being a totally serial process.

The rearranging process begins, 36, with the first alphabet group in which the next, or Nth, character of each heading is examined, 38, for a match with a pattern in the font table. For this next character, N=2. If a match, 40, is found, the heading is relocated, 42, to a position in the memory consistent with that second letter position in the alphabet. The Nth character of the next heading is examined, 44, and if a match is found, 46, that heading is also relocated, 48. If no match is found, the character is checked, 50, to see if it is a blank. If it is not a blank, the process is stopped and an error message given, 27. If it is a blank, then N is increased by one, 52, and the next character of that same heading is checked, 54. If that also is a blank, 56, it is assumed that the heading has ended and the heading is relocated, 57. In addition, that heading is flagged as having been completed insofar as alphabetizing is concerned and it is passed by in subsequent operations. If it is not a blank, a match is sought, 58. A failure to find a match stops the process. If there is a match, then it means only that there was a space in the heading and it is relocated, 59. After either relocation 57 or 59, a check is made, 60, to see if this was the last heading in this first letter store. If not, N is reduced by one, 61, and the process continues from 44. If so, a check is made, 62, to see if this is the last heading in the last first letter store. If so, the entire alphabetizing process is ended. If not, N is reset to two, 64, and the process is continued with the next first letter store from 38.

Returning to step 48, after relocation, the process checks to see if that was the last heading, 66. If not, the process is recirculated to 44. If so, a check, 68, is made to see if this was the last heading in the last letter store. If so, the process has been completed and the headings are ready to be printed out or displayed in the desired order. If not, N is increased by one, 70, and the process repeats from 38.
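
The overall effect of FIGS. 2 and 3 is a character-by-character ordering: headings are first grouped by their first letter and each group is then refined on the second, third and later characters. The sketch below does not reproduce the flow chart's relocation loop step for step; it swaps in a recursive bucket ordering that reaches the same result, and it uses the matched character codes as stand-ins for the heading images. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_first_letter(headings: List[str]) -> Dict[str, List[str]]:
    """Place each heading in the 'alphabet group' of its first character."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for heading in headings:
        groups[heading[0]].append(heading)
    return dict(groups)


def order_group(group: List[str], n: int = 1) -> List[str]:
    """Order headings that agree on their first n characters.

    n is a 0-based index, so n = 1 is the text's N = 2 (the second character).
    """
    if len(group) <= 1:
        return list(group)
    # A heading with no character at position n has ended and needs no more work.
    ordered = [h for h in group if len(h) <= n]
    buckets: Dict[str, List[str]] = defaultdict(list)
    for heading in group:
        if len(heading) > n:
            buckets[heading[n]].append(heading)
    for ch in sorted(buckets):  # code-value order; a blank sorts before letters
        ordered += order_group(buckets[ch], n + 1)
    return ordered


def alphabetize(headings: List[str]) -> List[str]:
    groups = group_by_first_letter(headings)
    return [h for letter in sorted(groups) for h in order_group(groups[letter])]


titles = ["HYDRAULIC SYSTEMS", "ELECTRICAL", "HULL", "HEATING", "ENGINES"]
in_order = alphabetize(titles)
```

Because characters are compared by code value, a blank inside a heading sorts ahead of any letter, so a heading such as "HEAT PUMPS" would precede "HEATING" in this sketch.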

In the above process, it will be apparent that instead of being relocated each time the next character is identified, an index can be built up to identify the storage locations of words having certain character values. Then, when a printout or display is desired, the images are read out on the basis of the index information.
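
A minimal sketch of this index alternative follows, assuming the character values of each heading are available from the font-table matching and that storage locations are simple list positions. The placeholder strings stand in for stored images, and `build_index` is a hypothetical name.

```python
from typing import List


def build_index(codes: List[str]) -> List[int]:
    """Storage positions in alphabetical order; no image is moved."""
    return sorted(range(len(codes)), key=lambda i: codes[i])


# Placeholders for heading images, in the order they sit in storage.
stored_images = ["<image of HULL>", "<image of ENGINES>", "<image of HEATING>"]
codes = ["HULL", "ENGINES", "HEATING"]  # character values found by matching

index = build_index(codes)
read_out = [stored_images[i] for i in index]  # images read out via the index
```

The images stay where they were first stored; only the small list of positions is ordered, which avoids repeated relocation of image data.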

In the approach described above, the text is stored totally in image form, i.e., without conversion to ASCII or other code. In a modified version of that approach, special use can be made of the printed index which generally accompanies documents such as these to facilitate searching for and displaying desired portions of text or illustrations. In this modified approach, the printed index (as distinguished from any index created by the computer system) is not only stored in image but is also converted into code, using conventional character recognition equipment and software, either when the material is first scanned into mass storage or subsequently. Then, when one wishes to locate those parts of the stored text relating to a specific index item, the index is displayed from the stored code, the desired item is selected from the display and image search words are constructed from the font table in each of the fonts used in the document. Those image search words are then used in a pattern search, as discussed above, to locate the relevant parts of the text. The image store of the index need not be maintained after conversion. This approach retains the advantages of image storage for most of the material but facilitates retrieval by providing a more direct technique for finding relevant search

various components. As such, it is provided with vola-
til