DOCUMENT IMAGE ANALYSIS

Lawrence O'Gorman
Rangachar Kasturi

IEEE Computer Society

Executive Briefing

Document Image Analysis

Lawrence O'Gorman
Rangachar Kasturi

IEEE Computer Society
Los Alamitos, California • Washington • Brussels • Tokyo

Library of Congress Cataloging-in-Publication Data

O'Gorman, Lawrence.
  Executive briefing: document image analysis / Lawrence O'Gorman, Rangachar Kasturi.
    p. cm.
  Includes bibliographical references.
  ISBN 0-8186-7802-X
  1. Document imaging systems. 2. Image processing—Digital techniques. 3. Optical character recognition devices. I. Kasturi, Rangachar, 1949– . II. Title.
  HF5737.O37 1997
  006.4'2—dc21                                                  97-17283

Copyright © 1997 by The Institute of Electrical and Electronics Engineers, Inc. All rights reserved.
Copyright and Reprint Permissions: Abstracting is permitted with credit to the source. Libraries are permitted to photocopy isolated pages beyond the limits of US copyright law for private use of their patrons. Other copying, reprint, or republication requests should be addressed to: IEEE Copyrights Manager, IEEE Service Center, 445 Hoes Lane, P.O. Box 1331, Piscataway, NJ 08855-1331.

IEEE Computer Society Press Order Number BR07802
Library of Congress Number 97-17283
ISBN 0-8186-7802-X

IEEE Computer Society Press
Customer Service Center
10662 Los Vaqueros Circle
P.O. Box 3014
Los Alamitos, CA 90720-1314
Tel: +1-714-821-8380
Fax: +1-714-821-4641
Email: cs.books@computer.org

Additional copies may be ordered from:

IEEE Service Center
445 Hoes Lane
P.O. Box 1331
Piscataway, NJ 08855-1331
Tel: +1-908-981-1393
Fax: +1-908-981-9667
mis.custserv@computer.org

IEEE Computer Society
13, Avenue de l'Aquilon
B-1200 Brussels
BELGIUM
Tel: +32-2-770-2198
Fax: +32-2-770-8505
euro.ofc@computer.org

IEEE Computer Society
Ooshima Building
2-19-1 Minami-Aoyama
Minato-ku, Tokyo 107
JAPAN
Tel: +81-3-3408-3118
Fax: +81-3-3408-3553
tokyo.ofc@computer.org

Publisher: Matt Loeb
Acquisitions Editor: Bill Sanders
Developmental Editor: Cheryl Smith
Advertising/Promotions: Tom Fink
Production Editor: Lisa O'Conner
Printed in the United States of America by KNI, Incorporated
Preface

In the late 1980s, the prevalence of fast computers, large computer memory, and inexpensive scanners fostered an increasing interest in document image analysis. With many paper documents being sent and received via fax machines and stored digitally in large document databases, interest grew in doing more than simply viewing and printing these images. Research was undertaken and commercial systems built to read text on a page, to find fields on a form, and to locate lines and symbols on a diagram. Today, we see the results of this research and development in document processing and optical character recognition (OCR). OCR is used by post offices to automatically route mail; engineering diagrams are extracted from paper for computer storage and modification; handheld computers recognize symbols and handwriting for use in niche markets such as inventory control. In the future, applications such as these will be improved, and other document applications will be added. For instance, the millions of paper volumes now in libraries will be replaced by computer files of page images that can be searched for by content and accessed by many people simultaneously—and they will never be misshelved. Businesspeople will carry their file cabinets in their portable computers; paper copies of new product literature, receipts, or other random notes will be instantly filed and accessed in the computer; and signatures will be analyzed by the computer for verification and security access.

This book describes some of the technical methods and systems used for document processing of text and graphics images. The methods have grown out of the fields of digital signal processing, digital image processing, and pattern recognition.
The objective is to give the reader an understanding of what approaches are used for application to documents and how these methods apply to different situations. Since the field of document processing is relatively new, it is also dynamic; in other words, current methods have room for improvement, and innovations are still being made. In addition, there are rarely definitive techniques for all cases of a certain problem.

The intended audience is executives, managers, and other decision makers whose businesses require some acquaintance with or understanding of document processing. (We call this group "executives" in accordance with the Executive Briefing series.) Some rudimentary knowledge of computers and computer images will be helpful background for these readers. We begin with basic principles (such as defining pixels), but the emphasis is on an understanding of how each technique operates, not necessarily on knowledge of picture processing. A grasp of the terminology goes a long way toward aiding the executive in understanding the technology and processes discussed in each chapter. For this reason, each section begins with a list of keywords. With knowledge of the terminology and whatever depth of method or system understanding that he or she decides to take from the text, the executive should be well equipped to deal with document-processing issues.

In each chapter, we attempt to identify major problem areas and to describe more than one method applied to each problem, as well as the advantages and disadvantages of each method. This gives an understanding of the problems and also the nature of trade-offs that so often must be made in choosing a method. With this understanding of the problem and a knowledge of the methodology options, an executive will have the technical background and context with which to ask questions, judge recommendations, weigh options, and make decisions.

We include technology descriptions and references to the technical papers that best give details on the techniques presented in the book. The technology descriptions are detailed enough for one to understand the methods—if implementation is desired, the references will facilitate this. Popular as well as accepted methods are presented so that the executive can compare a variety of options; in many cases, some of the options are advanced methods not currently used in commercial products. The book also describes systems for document processing from the applications viewpoint, to give concrete examples of where and how the methods are implemented.

The book is organized in the sequence that document images are usually processed. After document input by digital scanning, pixel processing is first performed. This level of processing includes operations that are applied to all image pixels. These include noise removal, image enhancement, and segmentation of image components into text and graphics (lines and symbols). Feature-level analysis treats groups of pixels as entities and includes line and curve detection and shape description. The last step is recognition and description, in which components are assigned semantic labels and the document as a whole is described.
Chapter 1

What Is a Document Image and What Do We Do with It?

[Figure 1.1 Hierarchy of document processing subareas, listing the types of document components in each subarea. Textual processing includes optical character recognition (text) and processing of skew, text lines, text blocks, and paragraphs; graphical processing includes line processing (straight lines, corners, and curves) and region and symbol processing (filled regions).]
Through such analysis techniques, the megabytes of initial data are culled to yield a concise semantic description of the document.

It is not difficult to find examples of the need for document analysis. Look around the workplace and you will see stacks of paper documents. Some may be computer generated, though invariably by different computers and software, and their electronic formats may be incompatible. Some will include both formatted text and tables, as well as handwritten entries, and they differ in size, from 3.5 × 2 in. (8.89 × 5.08 cm) business cards to 34 × 44 in. (86 × 111 cm) engineering drawings. In many businesses today, imaging systems are used to store images of pages so that storage and retrieval are more efficient. Future document analysis systems will recognize types of documents, enable the extraction of their functional parts, and be able to translate from one computer-generated format to another. Many other examples exist of the use of and need for document systems. Glance behind the counter in a post office at the mounds of letters and packages. In some U.S. post offices, over a million pieces of mail must be handled each day. Machines to perform sorting and address recognition have been used for several decades, but there is still the need to process more mail, more quickly, and more accurately.
Examine the stacks of a library, where row after row of paper documents are stored. Loss of material, misfiling, limited numbers of each copy, and even degradation of materials are common problems and may be improved by document analysis techniques. All of these examples serve as applications ripe for the potential solutions of document image analysis.

Though document image analysis has been in use for a couple of decades (especially in the banking business for computer reading of numeric check codes), only in the late 1980s and early 1990s has the field grown rapidly. The predominant reason for this is the greater speed and lower cost of hardware now available. Since fax machines have become ubiquitous, the cost of optical scanners for document input has dropped to the level that these are affordable to even small businesses and individuals. Although document images contain a relatively large amount of data, even personal computers now have adequate speed to process them. Computer main memory also is now adequate for large document images; more importantly, however, optical memory is now available for mass storage of large amounts of data. This improvement in hardware, and the increased use of computers for storing paper documents, has led to increasing interest in improving the technology of document processing and recognition. The advancements being made in document analysis software and algorithms are an essential complement to these hardware improvements. With OCR recognition rates now in the mid to high 90 percent range, and other document processing methods achieving similar improvements, these advances in research have also driven document image analysis forward.

As improvements in technology continue, document systems will become increasingly more common. For instance, OCR systems will be more widely used to store, search, and excerpt from paper-based documents. Page layout analysis techniques will recognize a particular form or page format and allow its duplication. Diagrams will be entered from pictures or by hand and logically edited. Pen-based computers will translate handwritten entries into electronic documents. Archives of paper documents in libraries and engineering companies will be electronically converted for more efficient storage and instant delivery to a home or office computer. Although it will be increasingly the case that documents are produced and reside on a computer, because there are many different systems and protocols and because paper is a very comfortable medium for us to deal with, paper documents will be with us to some degree for many decades to come. The difference will be that they will finally be integrated into our computerized world.
1.1 Hardware Advancements and the Evolution of Document Image Analysis

The history of document image analysis can be traced through a computer lineage that includes digital signal processing and digital image processing; these fields led in turn to computer vision for processing images of three-dimensional scenes used in robotics.
In the mid- to late-1980s, document image analysis began to grow rapidly; this was predominantly due to hardware advancements enabling processing to be performed at a reasonable cost and speed. Whereas a speech signal is typically handled in frames of 256 samples, a document image ranges from 2,550 × 3,300 pixels for a business letter digitized at 300 dots per inch (dpi), or 12 dots per millimeter, up to 34,000 × 44,000 pixels for a 34 × 44 in. engineering drawing. Document analysis systems are now available for storing business forms, performing OCR on typewritten text, and compressing engineering drawings. Document analysis research continues to pursue more intelligent handling of documents, better compression, especially through component recognition, and faster processing.
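As a rough illustration of these data volumes (our own back-of-the-envelope sketch, not the authors'), the snippet below computes the pixel counts from the page sizes and resolutions quoted above; the 1,000 dpi figure for the drawing is inferred from the quoted 34,000 × 44,000 pixel count, and the bits-per-pixel storage assumptions are ours.

```python
# Back-of-the-envelope sizes for scanned document images.
# Page sizes and the 300 dpi figure come from the text; the 1,000 dpi value is
# inferred from the quoted 34,000 x 44,000 pixel count, and the storage
# assumptions (1 bit/pixel binary, 8 bits/pixel gray-scale) are illustrative.

def pixels(width_in: float, height_in: float, dpi: int) -> int:
    """Pixel count for a page of the given size scanned at the given resolution."""
    return round(width_in * dpi) * round(height_in * dpi)

letter = pixels(8.5, 11, 300)      # 2,550 x 3,300 = 8,415,000 pixels
drawing = pixels(34, 44, 1000)     # 34,000 x 44,000 = 1,496,000,000 pixels

for name, n in [("business letter", letter), ("34 x 44 in. drawing", drawing)]:
    print(f"{name}: {n:,} pixels, "
          f"~{n / 8 / 1e6:.1f} MB binary, ~{n / 1e6:.1f} MB gray-scale")
```

Even the single-page letter is about a megabyte in binary form, which is why the hardware advances described above mattered so much.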
The image of a document contains only raw data that must be further analyzed to glean the information. For example, Figure 1.3 shows the image of the letter e. This is a pixel array of ON or OFF values whose shape is known to humans as the letter e; however, to a computer it is just a string of bits in computer memory.

1.2.1 Pixel-Level Processing (Chapter 2)

This stage of document image analysis includes binarization, noise reduction, signal enhancement, and segmentation. For gray-scale images with information that is inherently binary, such as text or graphics, binarization is usually performed first.
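To make this "just bits to the computer" point concrete, the toy sketch below (our illustration; the 7 × 5 pattern is invented and is not Figure 1.3) builds a small ON/OFF pixel array shaped roughly like a letter and shows the same data as raw bytes.

```python
import numpy as np

# A made-up 7x5 binary pixel array, vaguely shaped like a letter "e".
# A human reads the shape; the computer sees only a string of bits.
letter = np.array([
    [0, 1, 1, 1, 0],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
], dtype=np.uint8)

print(letter)                           # the "image" as ON/OFF values
packed = np.packbits(letter.ravel())    # the same pixels packed into raw bytes
print(packed.tobytes().hex())           # meaningless hex unless we know it is a 7x5 image
```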
[Figure 1.2 A typical sequence of steps for document analysis, with examples of intermediate and final results and data size: Document Page → Data Capture (10^7 pixels) → Pixel-Level Processing (7,500 character boxes, each about 15 × 20 pixels; 500 line and curve segments ranging from 20 to 2,000 pixels long; 10 filled regions ranging from 20 × 20 to 200 × 200 pixels) → Feature-Level Analysis (7,500 × 10 character features; 10 × 5 region features) → Text Analysis and Recognition / Graphics Analysis and Recognition → Document Description (1,500 words, 10 paragraphs, 1 title, 2 subtitles, and so on; 2 line diagrams, 1 company logo, and so on).]
Segmentation occurs on two levels. On the first level, segmentation occurs if the document contains both text and graphics—these are separated for subsequent processing by different methods. On the second level, segmentation is performed on text by locating columns, paragraphs, words, titles, and captions; and on graphics by separating symbol and line components. For instance, in a page containing a flow chart with an accompanying caption, text and graphics are first separated. Then the text is separated into that of the caption and that of the chart. The graphics are separated into rectangles, circles, connecting lines, and so on.
1.2.2 Feature-Level Analysis (Chapter 3)

In a text image, the global features describe each page and consist of skew (the tilt at which the page has been scanned), line lengths, line spacing, and so on. Local features describe individual characters and consist of font size, number of loops in a character, number of crossings, accompanying dots, and so on.

In a graphical image, global features describe the skew of the page, the line widths, range of curvature, minimum line lengths, and so on. Local features describe each corner, curve, and straight line, as well as the rectangles, circles, and other geometric shapes.
1.2.3 Recognition of Text and Graphics (Chapters 4 and 5)

The final step in document image analysis is recognition and description: components are assigned a semantic label and the entire document is described as a whole. Domain knowledge is used most extensively at this stage. The result is a description of a document as a human would give it. For a text image, we refer, for example, not to pixel groups or blobs of black on white, but to titles, subtitles, bodies of text, and footnotes. Depending on the arrangement of these text blocks, a page of text may be a title page of a paper, a table of contents of a journal, a business form, or the face of a mail piece. For a graphical image, an electrical circuit diagram for instance, we refer not to lines joining circles and triangles and other shapes, but to connections between AND gates, transistors, and other electronic components. The components and their connections describe a particular circuit that has a purpose in the known domain. It is this semantic description that is most efficiently stored and most effectively used for common tasks, such as indexing and modifying particular document components.
Chapter 2

Preparing the Document Image

2.1 Introduction

Data capture of documents by optical scanning or by digital video yields a file of picture elements, or pixels, that is the raw input to document analysis. These pixels are samples of intensity values taken in a grid pattern over the document page, where the intensity values may be OFF (0) or ON (1) for binary images, 0 to 255 for gray-scale images, and three channels of 0 to 255 color values for color images. The first step in document analysis is to perform processing on this image to prepare it for further analysis. Such processing includes thresholding to reduce a gray-scale or color image to a binary image, reduction of noise to reduce extraneous data, and thinning and region detection to enable easier subsequent detection of pertinent features and objects of interest. This pixel-level processing (also called preprocessing and low-level processing in other literature) is the subject of this chapter.
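As a small illustration of these conventions (our sketch, using the Pillow and NumPy libraries rather than anything from the book), the snippet below loads a scanned page and shows how binary, gray-scale, and color data differ only in the number of channels and the range of values per pixel; the filename and the fixed threshold of 128 are placeholders.

```python
import numpy as np
from PIL import Image

page = Image.open("page.png")              # placeholder filename for a scanned page

gray = np.asarray(page.convert("L"))       # gray-scale: one 0..255 value per pixel
color = np.asarray(page.convert("RGB"))    # color: three 0..255 channels per pixel
binary = (gray < 128).astype(np.uint8)     # binary: ON (1) for dark ink, OFF (0) for light paper

print(gray.shape, color.shape)             # e.g. (3300, 2550) and (3300, 2550, 3)
print(binary.min(), binary.max())          # 0 and 1
```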
2.2 Thresholding

Keywords: thresholding, binarization, global thresholding, adaptive thresholding, intensity histogram

In this treatment of document processing, we deal with images containing text and graphics of binary information—that is, these images contain a single foreground level that is the text and graphics of interest and a single background level upon which the foreground contrasts.
We will also call the foreground objects, regions of interest, or components. (Of course, documents may also contain true gray-scale [or color] information, such as in photographic figures; however, except for recognizing the presence of a gray-scale picture in a document, we leave the analysis of pictures to the more general fields of image analysis and machine vision.) Although the information is binary, the data—in the form of pixels with intensity values—are not likely to have only two levels; they, instead, have a range of intensities. This may be due to non-uniform printing or non-uniform reflectance from the page, or a result of intensity transitions at the region edges that are located between foreground and background regions. The objective in binarization is to mark pixels that belong to true foreground regions with a single intensity (ON) and background regions with a different intensity (OFF). Figure 2.1 illustrates the results of binarizing a document image at different threshold values. The ON values are shown in black in Figure 2.1, and the OFF values are in white.
For documents with a good contrast of components against a uniform background, binary scanners are available that combine digitization with thresholding to yield binary data; however, for the many documents that have a wide range of background and object intensities, this fixed threshold level often does not yield images with clear separation between the foreground components and the background. For instance, when a document is printed on differently colored paper, when the foreground components are faded due to photocopying, or when different scanners have different light levels, the best threshold value will also be different. For these cases, there are two alternatives. One is to empirically determine the best binarization setting on the scanner (most binary scanners provide this adjustment) and to do this each time an image is poorly binarized. The other alternative is to start with gray-scale images (having a range of intensities, usually from 0 to 255) from the digitization stage and then to use methods for automatic threshold determination to better perform binarization. Although the latter alternative requires more input data and processing, the advantage is that a good threshold level can be found automatically, ensuring consistently good images, and precluding the need for time-consuming manual adjustment and repeated digitization. The following discussion presumes initial digitization to gray-scale images.
If the pixel values of the components and those of the background are fairly consistent in their respective values over the entire image, a single threshold value can be found for the image. This use of a single threshold for all image pixels is called global thresholding. Processing methods will be described that automatically determine the best global threshold value for different images. For many documents, however, a single global threshold value cannot be used even for a single image due to non-uniformities within foreground and background regions. For example, for a document containing white background areas as well as highlighted areas of a different background color, the best thresholds will change by area. For this type of image, different threshold values are required for different local areas; this is adaptive thresholding and will also be described.
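For concreteness, here is a minimal sketch (our own, in NumPy) of binarizing a gray-scale page with one global threshold; the value 128 is an arbitrary starting point, and the following sections describe how a good value can be chosen automatically or varied locally.

```python
import numpy as np

def binarize_global(gray: np.ndarray, threshold: int) -> np.ndarray:
    """Mark pixels above the threshold as foreground (ON = 1), the rest as background (OFF = 0).

    Follows the intensity convention of Figure 2.1 (0 = white background, 255 = black ink);
    flip the comparison for scanners that store white as 255.
    """
    return (gray > threshold).astype(np.uint8)

# Example: gray is a 2-D uint8 array from the digitization stage.
# binary = binarize_global(gray, 128)   # 128 is an arbitrary starting threshold
```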
[Figure 2.1 Image binarization. (a) Histogram of original gray-scale image; the horizontal axis shows markings for threshold values of the images below. The lower peak is for the white background pixels, and the upper peak is for the black foreground pixels. Image binarized with: (b) threshold value too low, (c) good threshold value, and (d) threshold value too high.]
2.2.1 Global Thresholding

The most straightforward way to automatically select a global threshold is to use a histogram of the pixel intensities in the image. The intensity histogram plots the number of pixels with values at each intensity level. See Figure 2.1 for a histogram of a document image. For an image with well-differentiated foreground and background intensities, the histogram will have two distinct peaks. The valley between these peaks can be found as the minimum between two maxima, and the intensity value there is chosen as the threshold that best separates the two peaks.
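A minimal sketch of this valley-seeking idea, assuming a NumPy gray-scale image (our illustration, not code from the book), might look like the following; practical implementations smooth the histogram and guard the peak detection more carefully, for the reasons given in the next paragraph.

```python
import numpy as np

def valley_threshold(gray: np.ndarray, smooth: int = 5) -> int:
    """Pick the threshold at the deepest valley between the two largest histogram peaks."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    kernel = np.ones(smooth) / smooth
    hist = np.convolve(hist, kernel, mode="same")      # light smoothing to suppress noise

    # Take the global maximum as one peak and the best peak on the other side as the second.
    p1 = int(np.argmax(hist))
    left, right = hist[:p1], hist[p1 + 1:]
    if left.max(initial=0) > right.max(initial=0):
        p2 = int(np.argmax(left))
    else:
        p2 = p1 + 1 + int(np.argmax(right))

    lo, hi = sorted((p1, p2))
    return lo + int(np.argmin(hist[lo:hi + 1]))          # valley = minimum between the two peaks
```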
There are a number of drawbacks to global threshold selection based on the shape of the intensity distribution. The first drawback is that images do not always contain well-differentiated foreground and background intensities because of poor contrast and noise. A second drawback is that, especially for an image of sparse foreground components, such as for most graphics images, the peak representing the foreground will be much smaller than the peak of the background intensities. This often makes it difficult to find the valley between the two peaks. In addition, reliable peak and valley detection are separate problems unto themselves. One way to improve this approach is to compile a histogram of pixel intensities that are weighted by the inverse of their edge-strength values [1]. Region pixels with low edge values will be weighted more highly than boundary and noise pixels with higher edge values, thus sharpening the histogram peaks due to these regions and facilitating threshold detection between them. An analogous technique is to highly weight intensities of pixels with high edge values, then choose the threshold at the peak of this histogram corresponding to the transition between regions [2]. This requires peak detection of a single maximum, and this is often easier than valley detection between two peaks. This approach also reduces the problem of large size discrepancy between foreground and background region peaks because edge pixels are accumulated on the histogram instead of region pixels; the difference between a small and large size area is a linear quantity for edges versus a much larger squared quantity for regions. A third method uses a Laplacian weighting. The Laplacian is the second derivative operator, which highly weights transitions from regions into edges (the first derivative highly weights edges). This will highly weight the border pixels of both foreground regions and their surrounding backgrounds, and because of this the histogram will have two peaks of similar area. Although these histogram-shape techniques offer the advantage that peak and valley detection are intuitive, peak detection is still susceptible to error due to noise and poorly separated regions. Furthermore, when the foreground or background region consists of many narrow regions, such as for text, edge and Laplacian measurement may be poor due to very abrupt transitions (narrow edges) between foreground and background.
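As an illustration of the edge-weighted variant described above (the one that weights pixel intensities by edge strength and looks for a single histogram peak [2]), a rough NumPy sketch follows; the finite-difference gradient here stands in for whatever edge operator an actual implementation would use.

```python
import numpy as np

def edge_weighted_threshold(gray: np.ndarray) -> int:
    """Choose the threshold at the peak of an edge-strength-weighted intensity histogram."""
    g = gray.astype(float)
    gy, gx = np.gradient(g)                      # simple finite-difference edge strength
    edge = np.hypot(gx, gy)

    # Each pixel contributes to its intensity bin in proportion to its edge strength,
    # so the histogram peak marks the foreground/background transition intensity.
    hist = np.bincount(gray.ravel(), weights=edge.ravel(), minlength=256)
    return int(np.argmax(hist))
```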
A number of methods determine foreground and background classes by using formal pattern recognition techniques that optimize some measure of separation. One method is minimum error thresholding [3, 4] (Figure 2.2). Here, the foreground and background intensity distributions are modeled as normal (Gaussian or bell-shaped) probability density functions. For each intensity value (from 0 to 255, or a smaller range if the threshold is known to be limited to it), the means and variances are calculated for the foreground and background classes, and the threshold is chosen such that the misclassification error between the two classes is minimized.
Minimum error thresholding is classified as a parametric technique because of the assumption that the gray-scale distribution can be modeled as a probability density function. This is a popular method for many computer vision applications, but some experiments indicate that documents do not adhere well to this model; thus, results with this method are poorer than nonparametric approaches [5]. One nonparametric approach is Otsu's method [6, 7]. Calculations are first made of the ratio of between-class variance to within-class variance for each potential threshold value. The classes here are the foreground and background pixels, and the purpose is to find the threshold that maximizes the variance of intensities between the two classes and minimizes them within each class. This ratio is calculated for all potential threshold levels, and the level at which the ratio is maximum is the chosen threshold. An approach similar to Otsu's employs an information theory measure, entropy, which is a measure of the information in the image expressed as the average number of bits required to represent the information [5, 8]. Here, the entropy for the two classes is calculated for each potential threshold, and the threshold where the sum of the two entropies is largest is chosen as the best threshold. Moment preservation is another thresholding approach [9]. This method is less popular than the preceding ones; however, we have found it to be more effective in binarizing document images containing text. In the moment preservation method, a threshold is chosen that best preserves moment statistics in the resulting binary image as compared with the initial gray-scale image. These moments are calculated from the intensity histogram—the first four moments are required for binarization.
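A compact sketch of Otsu's criterion (our own NumPy version, not the authors' code) is shown below; it scans every candidate threshold and keeps the one that maximizes the between-class variance, which is equivalent to maximizing the between-to-within-class ratio described above because the total variance is fixed.

```python
import numpy as np

def otsu_threshold(gray: np.ndarray) -> int:
    """Return the threshold that maximizes between-class variance (Otsu's method)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    levels = np.arange(256)

    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()          # class probabilities
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (levels[:t] * prob[:t]).sum() / w0          # class means
        mu1 = (levels[t:] * prob[t:]).sum() / w1
        between = w0 * w1 * (mu0 - mu1) ** 2              # between-class variance
        if between > best_var:
            best_var, best_t = between, t
    return best_t

# Usage: binary = (gray >= otsu_threshold(gray)).astype(np.uint8)
```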
Many images have more than two levels. For instance, magazines often employ boxes to highlight text; the background of the box has a different color than the white background of the page. In this case, the image has three levels: background, foreground text, and background of highlight box. To properly threshold an image of this type, multithresholding must be performed. There are fewer multithresholding methods than binarization methods. Most require that the number of levels is known (for example, [6]). For the cases where the number of levels is not known beforehand, one method [16] will determine the number of levels automatically and perform appropriate thresholding. This added level of flexibility may sometimes lead to unexpected results; for example, a magazine cover with three intensity levels may be thresholded to four levels because of the presence of an address label that is thresholded at a separate level.

2.2.2 Adaptive Thresholding
A common way to perform adaptive thresholding is by analyzing gray-level intensities within local windows across the image to determine local thresholds [10, 11]. White and Rohrer [12] describe an adaptive thresholding algorithm for separating characters from background. The threshold is continuously changed through the image by estimating the background level as a two-dimensional running average of local pixel values taken for all pixels in the image (Figure 2.3).
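A rough sketch of this local-window idea follows (our illustration, using SciPy's uniform filter as the moving average; the cited algorithm differs in detail, and both parameter defaults are invented): each pixel is compared against the background level estimated in its neighborhood.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def adaptive_threshold(gray: np.ndarray, window: int = 51, offset: float = 20.0) -> np.ndarray:
    """Binarize by comparing each pixel against a locally estimated background level.

    The background is approximated by a two-dimensional moving average over a
    window x window neighborhood; a pixel is marked ON (foreground) when it exceeds
    that estimate by more than 'offset', using the 0 = white, 255 = black convention
    of Figure 2.1. Both parameter defaults are illustrative, not values from [12].
    """
    background = uniform_filter(gray.astype(float), size=window)
    return (gray.astype(float) > background + offset).astype(np.uint8)
```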
[Figure 2.2 Illustration of misclassification error in thresholding. Left, intensity histogram showing foreground and background peaks; right, the tails of the foreground and background populations have been extended to show the intensity overlap of the two populations. This overlap makes it impossible to correctly classify all pixels using a single threshold. The minimum-error method of threshold selection minimizes the total misclassification error.]
Mitchell and Gillies [13] describe a similar thresholding method where background white-level normalization is first done by estimating the white level and subtracting this level from the raw image. Segmentation of characters is accomplished by applying a range of thresholds and selecting the resulting image with the least noise content. Noise content is measured as the sum of areas occupied by components that are smaller and thinner than empirically determined parameters. From the results of binarization for different thresholds shown in Figure 2.1, one can see that the best threshold selection yields the least visible noise. The main problem with any adaptive binarization technique is the choice of window size. The window size should be large enough to guarantee that enough background pixels are included to obtain a good estimate of average value, but not so large as to average over non-uniform background intensities. Often, however, the features in the image vary in size, causing problems with fixed window size. To remedy this, domain-dependent information can be used to ensure that the results of binarization give the expected features (a large blob of an ON-valued region is not expected in a page of smaller symbols, for instance). If the result is
