`(12) Patent Application Publication (10) Pub. No.: US 2002/0037097 A1
`Hoyos et al.
`(43) Pub. Date:
`Mar. 28, 2002
`
`US 20020037097A1
`
`(54) COUPON RECOGNITION SYSTEM
`
`Related US. Application Data
`
`(76) Inventors; Hector Hoyos, Guaynabo, PR (Us);
`Alex Rivera, San Juan, PR (Us);
`Miguel Berrios, Guaynabo, PR (Us);
`Inaki Olivares, San Juan, PR (US);
`Michelle Vlera-Vera, Carolina, PR
`(Us)
`
`Correspondence Address:
`Patent Law O?ices 0f
`Health W. Hoglund
`391 Juan A_ Davila Street
`San Juan, PR 00918 (Us)
`
`(21) Appl, No;
`
`09/855,830
`
`(22) Filed:
`
`May 15, 2001
`
`1 1 0
`
`/ 1 12
`
`(63) Non-provisional of provisional application No.
`60/204,440, ?led on May 15, 2000. Non-provisional
`of provisional application No. 60/204,170, ?led on
`May 15’ 2000'
`Publication Classi?cation
`
`(51) Int. Cl.7 ..................................................... .. G06K 9/00
`(52) US. Cl. ............................................................ .. 382/137
`
`(57)
`
`ABSTRACT
`
`An automated transaction machine includes a scanner con
`?gured to receive a bill or coupon. The coupon is processed
`by application of connected component analysis, segmenta
`tion, coupon matching, and data extraction to determine an
`associated vendor and customer account information. This
`information is used to complete a payment transaction.
`
`114
`
`118
`
`PaperBm >{EiJ‘
`
`Binary
`mm 6 > Cou on Engine
`
`Coupon
`Database
`
`1 16
`
`">
`
`\
`
`Bill TYPEI
`
`Coupon ID
`Fields of Interest:
`OCR Line
`Account
`Amount
`DemographiCs
`Others
`
`Page 1 of 25
`
`ROTHSCHILD EXHIBIT 1001
`
`
`
`Patent Application Publication Mar. 28, 2002 Sheet 1 0f 12
`
`US 2002/0037097 A1
`
`QQBQEO
`
`c0960
`
`A
`
`565m
`
`H .3
`
`
`
`0 ES . 623m A Em .?mdm
`
`N _. _. O f w
`
`cosmomzoumm
`
`coEwcoU
`
`2m 2m
`
`
`
`‘Ali gig/Q 0mg:
`
`EN NFN
`
`0PM
`
`@258?
`
`538mm
`
`Page 2 of 25
`
`ROTHSCHILD EXHIBIT 1001
`
`
`
`Patent Application Publication Mar. 28, 2002 Sheet 2 0f 12
`
`US 2002/0037097 A1
`
`$235 “6 wEoE Q 09? Em
`
`kowmuuohrmom A
`
`m5 9
`
`\Mmi
`
`
`
`Q0960 2682
`
`w 5 N 5
`
`o5
`
`QQEEBQEH
`
`wEcoN
`
`m .5
`
`3356a
`
`m5
`
`Q3“ 3%
`
`N; O;
`
`m c minoocm
`
`zosumboU
`
`33m
`
`
`
`
`
`kommwooavi AI. 565m
`
`Page 3 of 25
`
`ROTHSCHILD EXHIBIT 1001
`
`
`
`Patent Application Publication Mar. 28, 2002 Sheet 3 0f 12
`
`US 2002/0037097 A1
`
`F," Barcode’?!
`
`Li
`
`.
`
`7
`
`526
`
`\lertica\
`Region?
`
`ti
`Tl L’
`
`528
`
`Text Area?
`
`‘J
`
`N 530
`
`OGRA?
`
`F’ Table? j L
`il
`
`W520
`
`Horizontal
`Region’?
`
`A
`
`1
`
`*w 522
`
`Logo’?
`
`2
`
`"ll
`
`Text LineiPij
`T,’
`
`Fig. 5A
`
`Page 4 of 25
`
`ROTHSCHILD EXHIBIT 1001
`
`
`
`P t
`
`.
`
`.
`
`a cut Appllcatlon Publlcatlon Mar. 28, 2002 Sheet 4 0f 12
`
`.
`
`.
`
`US 2002/00370 97 A1
`
`mnoemm
`
`6226
`
`E2310:
`
`cowmmm
`
`nmyomccoO
`
`EmEOQEoO
`
`8m
`
`EOE?
`
`co?mm
`
`mmZ pxmk
`
`Sn
`
`.w:
`
`wmm
`
`Nmm i 922m
`
`Page 5 of 25
`
`ROTHSCHILD EXHIBIT 1001
`
`
`
`°
`9
`66
`0
`Patent Application Publication Mar 28 2002 Sh t5 f 12
`
`US 2002/0037097 A1
`
`coo
`
`445.9<18:
`09¢oEm..imomoo«j_mS.z<3<>mofisomngqmoo<oE9.:<
`.093amzooOEd.2o..<...Emu¢__._._oz_mammmmaoumq
`ommeufmcoE2%..2%$2:xomon
`
`._JH...:.3an
`
`ommoei_wfisszuu>m:O.a<2
`
`D
`
`_m.2<u:3.
`
`__:____________._,_,__E____
`
`.mcmmmrmuaanaSnmmfimmua
`
`
`
`fi.zm:omm955::._momfi.0mnamxu.5z
`
`mpuqZ_U_u._O
`
`<oma
`
`Page 6 of 25
`
`ROTHSCHILD EXHIBIT 1001
`
`Page 6 of 25
`
`ROTHSCHILD EXHIBIT 1001
`
`
`
`
`Patent Application Publication Mar. 28, 2002 Sheet 6 0f 12
`
`US 2002/0037097 A1
`
`m ummm
`
`mmu??u Q Bum mmaammu
`
`Page 7 of 25
`
`ROTHSCHILD EXHIBIT 1001
`
`
`
`Patent Application Publication Mar. 28, 2002 Sheet 7 of 12
`
`Us 2002/0037097 A1
`
`coovac
`
`....es\,320
`
`MEMHHHHMH
`
`.§.I.=u4
`
`
`
`iifi'urunIu_=IIE_l_m~hr.m—m=—o
`
`....2._.!__.,m.iEi._”§.l§.._
`F....4!rd5.ciliaadmwmzmflma
`
`L
`
`owe«mo
`
`Emmomnou
`mwfimzzou
`
`gnu-u532UU
`
`=_i______.
`
`MO0mus.D2
`
`4P2MDU
`
`one
`
`mmmaHMMF
`
`mmmmmmmmm
`
`M20
`
`Page 8 of 25
`
`ROTHSCHILD EXHIBIT 1001
`
`Page 8 of 25
`
`ROTHSCHILD EXHIBIT 1001
`
`
`
`
`
`
`Patent Application Publication Mar. 28, 2002 Sheet 8 of 12
`
`US 2002/0037097 A1
`
`
`
`‘-
`cu
`%——-+9-N
`~
`%
`
`o
`§—>
`
`3
`
`ooooooo
`oooooooo
`
`Page90f25
`
`ROTHSCHILD EXHIBIT 1001
`
`Page 9 of 25
`
`ROTHSCHILD EXHIBIT 1001
`
`
`
`Patent Application Publication Mar. 28, 2002 Sheet 9 of 12
`
`US 2002/0037097 A1
`
`Page 10 of 25
`
`ROTHSCHILD EXHIBIT 1001
`
`Page 10 of 25
`
`ROTHSCHILD EXHIBIT 1001
`
`
`
`Patent Application Publication Mar. 28, 2002 Sheet 10 0f 12
`
`US 2002/0037097 A1
`
`810
`
`OCR Line
`& Barcode Matcher
`
`/*\
`<Unique ID?
`
`F816
`
`‘
`
`818
`
`Text Matcher
`
`Unique 1D’?
`
`fwszo
`
`i
`
`822
`
`824
`
`826
`
`L¢______,_2
`
`Logo Matcher
`
`Unique ID?
`
`i" Exit
`
`L?’
`
`Fig. 8
`
`Page 11 of 25
`
`ROTHSCHILD EXHIBIT 1001
`
`
`
`Patent Application Publication Mar. 28, 2002 Sheet 11 0f 12
`
`US 2002/0037097 A1
`
`Page 12 of 25
`
`ROTHSCHILD EXHIBIT 1001
`
`
`
`Patent Application Publication Mar. 28, 2002 Sheet 12 0f 12 US 2002/0037097 A1
`
`v2:
`
`{0362
`
`$262
`
`BSQEQO
`
`2 .9“.
`
`W\\\\\\\J
`
`T\\\\\\|
`
`
`
`6532 gwmo
`
`NF:
`
`hmnmwm Emu
`
`
`
`ltlliulil?uii we?
`
`653w
`
`é moor
`
`Page 13 of 25
`
`ROTHSCHILD EXHIBIT 1001
`
`
`
`US 2002/0037097 A1
`
`Mar. 28, 2002
`
`COUPON RECOGNITION SYSTEM
`
`FIELD OF THE INVENTION
`
`[0001] The present invention relates generally to methods
`of automatically recognizing a document and more speci?
`cally to recognizing a document used in the sale or purchase
`of goods and services, commonly referred to as a bill or a
`coupon.
`
`BACKGROUND OF THE INVENTION
`
`[0002] In their efforts to ?nd better Ways to manage and
`support the increasing demand for products and services at
`?nancial institutions, the banking industry has turned to the
`implementation of automated systems that enable faster
`transaction processing While providing customers With a
`broader and more accessible variety of services on a “self
`service” basis. The ?exibility of extended branch hours and
`multiple transaction processing available at most automated
`teller machines (“ATM’s”) have dramatically altered the
`Way in Which customers interact With banks, and have
`become an additional and almost indispensable convenience
`to everyday living. Recent improvements to ATM-related
`machines Will alloW a customer to pay a bill using a debit or
`credit card. The bill is scanned and automatically recog
`niZed. The customer can then make payment by providing a
`debit or credit card.
`[0003] Although various recognition algorithms may be
`used to identify the product or service provider, the customer
`and the amount associated With a bill or coupon, invariably
`such systems include some degree of error. That is, virtually
`any system Will make some errors in identifying the product
`or service provider, the customer and the amount associated
`With a bill or coupon. The possibility for errors may con
`tribute to the unWillingness of banks and other ?nancial
`institutions to offer automated bill payment on a large-scale
`basis. LikeWise, the uncertainty of these transactions may
`feed consumer apprehension in using such systems. Accord
`ingly, a more robust system is desired.
`
`SUMMARY OF THE INVENTION
`
`[0004] According to one aspect of the invention a cus
`tomer enters a paper bill into a scanner. The resulting image
`data is provided to an associated computer. The computer
`extracts prominent features from the image in order to
`determine (1) the company that issued the bill, and (2) the
`customer’s account number and the amount to pay. The ?rst
`goal is a one-to-many matching problem. The system deter
`mines the closest match betWeen the input coupon and a
`library of coupons each associated With a company. If the
`coupon does not match any coupon in the database, it returns
`the paper bill to the customer and alerts the customer that the
`paper bill does not match any template in its library. Thus,
`the computer performs both matching and authentication.
`The second goal is an optical character recognition (OCR)
`problem. After a bill type has been recogniZed, a customer
`?eld and an amount ?eld may be extracted. The text in such
`?elds are provided to an OCR program that transforms the
`pixel data into machine-readable code.
`
`[0005] According to another aspect of the invention, after
`a bill or a number of bills from a customer have been
`recogniZed, the customer is provided With a number of
`payment options. These include any combination of credit
`
`card, debit card, smart card, cash, check or other means of
`payment. If the customer elects to pay by cash, check or
`other paper document, the customer enters the paper docu
`ment into a scanner. The paper document is identi?ed and
`authenticated. For example, in the case of a check, the
`computer isolates the amount ?eld as Well as the unique
`account identi?er. The text in such ?elds are provided to an
`OCR program that transforms the pixel data into machine
`readable code.
`[0006] In the case of cash, the paper bill is accepted by a
`separate scanner and associated authentication processor.
`The authentication processor performs various checks on the
`paper bill to determine both its authenticity and denomina
`tion. The result is passed to the computer so that the
`customer may be credited a corresponding amount. This
`payment, in turn, may be applied by the customer against
`any outstanding bills.
`[0007] According to another aspect of the invention, a
`method of operating an automated transaction machine
`includes recogniZing a coupon by scanning the coupon to
`generate an electronic representation. Segments of the elec
`tronic representation are compared With a de?ned category
`of patterns. Any segments that match one of the patterns is
`eliminated as noise. Connected segments are identi?ed
`Within the electronic representation. A barcode search is
`applied to the connected segments and any additional seg
`ments proximate thereto to determine Whether the connected
`segments form a portion of a barcode sequence. If so the
`alphanumeric characters associated With the barcode
`sequence are determined. An optical character recognition
`search is applied to the connected segments and any addi
`tional segments proximate thereto to determine Whether the
`connected segments form a portion of a text string. If so, the
`alphanumeric characters associated With the text string are
`determined. A table search is applied to the connected
`segments to determine Whether the connected segments
`form any portion of a table. If so the boundaries and position
`of the table on the coupon are determined. The alphanumeric
`characters associated With the barcode sequence, the alpha
`numeric characters associated With the text string, and the
`boundaries and position of the table are compared With a
`database of coupon data to determine Whether the electronic
`representation matches a coupon type in the database of
`coupon data.
`[0008] According to a further aspect of the invention,
`connected segments are run-length encoded so that each roW
`of is represented by a plurality of start and end points that
`represent the start and end of a continuous run of elements.
`The start and end points of adjacent roWs are compared to
`determine Whether any start or end points fall betWeen the
`start and end points of the adjacent roWs.
`[0009] According to a further aspect of the invention,
`segments of the electronic representation are compared With
`a de?ned category of patterns. The central bit of the seg
`ments are eliminated When the comparison generates a
`match, provided that the elimination of the central bit Will
`not disconnect otherWise connected components.
`[0010] According to a further aspect of the invention, the
`match is detected if the location and value of the barcode
`sequence or the character strings match an entry in the listing
`of vendor data.
`[0011] According to a further aspect of the invention, a
`customer account and an account balance are determined
`
`Page 14 of 25
`
`ROTHSCHILD EXHIBIT 1001
`
`
`
`US 2002/0037097 A1
`
`Mar. 28, 2002
`
`after determining a coupon type. The customer account and
`the account balance are read from the table of coupon data.
`
`[0012] According to another aspect of the invention, a
`method of identifying a vendor, a customer and an account
`balance based upon the representation of a coupon begins by
`grouping image data into a plurality of interconnected
`segments. The interconnected segments are then grouped to
`form objects of various types that include text lines, bar
`codes and OCR lines. Barcode recognition is applied to the
`interconnected segments to detect any barcode character
`sequences. Optical character recognition is applied to the
`interconnected segments to determine an optical character
`sequence. Text character recognition is applied to the inter
`connected segments to determine a text character sequence.
`A table stores the barcode character sequence, the optical
`character sequence, and the text character sequence. At least
`one of the barcode character sequence, the optical character
`sequence, and/or the text character sequence are compared
`to a database of vendor data to detect a match that deter
`mines a vendor. An expected location of a customer iden
`ti?er and an expected location of an account balance are
`determined based upon the vendor. The customer identi?er
`and the account balance are determined based upon the
`expected location.
`
`[0013] According to a further aspect of the invention, a
`plurality of bounding boxes are determined, each of Which
`de?ne the limits of one of the plurality of interconnected
`segments.
`[0014] According to a further aspect of the invention, the
`bounding boxes are compared to a plurality of thresholds to
`identify interconnected segments comprising noise and to
`identify interconnected segments comprising an OCR char
`acter sequence.
`
`[0015] According to another aspect of the invention, the
`automated transaction machine is implemented on a com
`puter system especially suitable for determining vendor,
`customer and account data associated With a coupon. The
`computer system includes a scanner, a card acceptor, and a
`netWork connection.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`[0016] FIG. 1 is a block diagram shoWing one preferred
`system for determining a coupon type and extracting rel
`evant ?elds from the coupon. The system includes a scanner
`112, a database of coupon data 116, and a coupon engine
`114. The coupon engine 114 compares a coupon image
`received from the scanner 112 With the database of coupon
`data 116 to determine its type and to extract the relevant
`?elds.
`
`[0017] FIG. 2 is a block diagram shoWing one preferred
`system for establishing the database of coupon data 116.
`
`[0018] FIG. 3 is a block diagram shoWing further details
`of one preferred coupon engine. It includes a preprocessor
`310, a segmentator 312, a match engine 314, an extraction
`engine 316, and a post processor 318.
`
`[0019] FIG. 4 is a block diagram shoWing further details
`of one preferred preprocessor 310.
`
`[0020] FIG. 5A is a block diagram shoWing one preferred
`method of performing segmentation of the coupon image
`data.
`
`[0021] FIG. 5B is a block diagram shoWing one preferred
`database structure suitable for use With method of segmen
`tation of FIG. 5A.
`
`[0022] FIG. 6A shoWs one example of a black-and-White
`scanned image of a coupon.
`
`[0023] FIG. 6B shoWs the example coupon of FIG. 6A
`along With one preferred connected component analysis
`associated thereWith.
`
`[0024] FIG. 6C shoWs the example coupon of FIG. 6A
`along With one preferred segmentation analysis associated
`thereWith.
`
`[0025] FIG. 7A shoWs one preferred connected compo
`nent table generated by performing connected component
`analysis on the coupon image of FIG. 6A.
`
`[0026] FIG. 7B shoWs one preferred segmentation table
`generated by performing segmentation on the coupon image
`of FIG. 6A.
`
`[0027] FIG. 8 is a block diagram shoWing one preferred
`method of determining the coupon type based upon a
`comparison With a coupon database.
`
`[0028] FIG. 9 shoWs one preferred set of patterns that are
`applied to a coupon image in the preprocessor 310 of FIGS.
`3 and 4 to reduce noise in the coupon image.
`
`[0029] FIG. 10 is a block diagram shoWing a computer
`system suitable for implementing the preferred system of
`FIG. 1.
`
`DETAILED DESCRIPTION OF THE
`PREFERRED EMBODIMENTS
`
`[0030] In one preferred embodiment of the invention, a
`paper bill or coupon is scanned and compared to a database
`of coupon data. The comparison is used to determine the
`coupon type and associated vendor. After making this deter
`mination, various ?elds of interest are extracted from the
`coupon such as account name, account balance, billing
`address, etc.
`
`[0031] Turning to FIG. 1, the process of identifying a
`coupon and extracting various ?elds is further described. At
`block 110, a customer presents a coupon. Typically, the
`coupon includes various forms of data such as a barcode, an
`OCRA text line, a logo, text, and others. These various forms
`of data are used to determine the vendor that issued the
`coupon, as Well as an associated customer account identi?er,
`an account balance, and related account data.
`
`[0032] At block 112, the coupon is passed through a
`scanner such as are Widely available commercially. The
`scanner passes the coupon over an opto-electronic trans
`ducer to generate an electronic representation of the coupon.
`Preferably, the scanner is con?gured to provide a black-and
`White image of the coupon, that is a binary bitmap of the
`coupon. In practice, 200 dpi resolution is suf?cient for most
`coupon types and preferred because the relatively loW
`resolution reduces data processing requirements. Nonethe
`less, some barcode images require ?ner scanning to distin
`guish adjacent lines. When coupons With ?ne barcodes are
`used, the resolution is set to 300 dpi, or the loWest resolution
`capable of resolving the lines of the barcode or other feature
`of the coupon.
`
`Page 15 of 25
`
`ROTHSCHILD EXHIBIT 1001
`
`
`
`US 2002/0037097 A1
`
`Mar. 28, 2002
`
`[0033] At block 114, information is extracted from the
`electronic representation of the coupon. For example, the
`siZe of the coupon is determined. Various data ?elds are
`identi?ed, such as barcodes, OCR lines, text lines, table
`boundaries, and others. As appropriate, the symbols in these
`?elds are passed to a recognition program that decodes the
`symbols into alphanumeric strings. These are compared to
`the coupon database 116 to determine Whether the incoming
`coupon matches the type of an entry in the coupon database
`116. The criteria for making this determination are further
`described beloW. Where the coupon generates a match, the
`coupon database Will identify certain areas of interest in the
`coupon, such as an OCR line With an associated account
`number and balance due.
`
`[0034] On many coupons, the same data is repeated in
`multiple formats. For example, the customer account num
`ber may be listed as a text string and as a barcode or OCR
`line. If one generates an error, the other may be used as an
`alternative source of information. Likewise, the tWo may be
`checked against each other to ensure that no errors Were
`made in converting the underlying image object into an
`alphanumeric string.
`[0035] Finally, at block 118, the results of the coupon
`analysis are provided. Typically, this includes a coupon ID
`that identi?es the vendor. Where a particular vendor uses
`more than one coupon layout, then more than one coupon ID
`Will be associated With the particular vendor. The results Will
`also include a number of additional ?elds that vary by
`coupon type. In most instances, these Will include an OCR
`line that includes the vendor’s ID, an account number, an
`amount due, and name and address information.
`
`[0036] Turning to FIG. 2, the process of establishing the
`database of coupon data 116 is described. The process
`begins at block 210 by providing a number of sample
`coupons from the same vendor having the same type. Where
`a vendor uses more than one coupon type, the different types
`are added in separate sessions. Preferably, at least ten sample
`coupons are provided.
`
`[0037] Then, at block 212, the sample coupons are
`scanned and processed to remove skeW and noise. The
`output provides a black-and-White bitmap for each of the
`underlying coupons. This data is used to establish the
`location, siZe and variation of the relevant ?elds.
`[0038] Next, at block 214, the bitmap is processed to
`determine the location and siZe of various ?elds. This
`processing includes both connected component analysis and
`segmentation, Which are further described beloW. The result
`is a listing of the type of elements in the coupon that is
`automatically generated by softWare engines. The listing
`includes position and type information for each element of
`the coupon image.
`
`[0039] Next, at block 216, a user speci?es ?elds of inter
`est. For example, a particular coupon type Will include an
`account name and number, an amount due, and an issue or
`due date. The user may select ?elds that should be extracted
`from a coupon image for processing payment. The selected
`?elds (also termed ?elds of interest) Will depend upon the
`information provided on the coupon and upon the processing
`needs of the vendor issuing the coupon.
`
`[0040] For example, a particular vendor may include an
`OCR line along the bottom of their coupons. This OCR line
`
`may include the account number and amount due. For this
`coupon, the user Would specify the expected location of the
`OCR line along With the format for receiving the account
`number and amount due. When this type of coupon is
`identi?ed by the coupon engine, the ?eld of interest infor
`mation is used to extract the account number and amount
`due.
`
`[0041] Next, at block 218, a user speci?es the set of
`suf?cient conditions for identifying a coupon. For example,
`some vendors include a unique reference number as part of
`an OCR line to identify themselves. In such cases, an OCR
`line containing the unique reference number may be suf?
`cient to identify a particular coupon type and associated
`vendor. In other cases, a barcode, text line, coupon layout or
`even a logo may be used to identify the coupon and
`associated vendor. The user speci?es Which of these ele
`ments or combination of elements shall be conclusive in
`determining the type of a coupon. The user may specify
`more than one condition for making this determination. For
`example, Where a coupon includes a barcode and also
`includes the vendor’s name and logo the user may specify
`that the vendor’s barcode sequence Will prove conclusive in
`determining the vendor. If a barcode match is not found,
`possibly because of a damaged coupon, the vendor’s name
`and logo Will prove conclusive in determining the vendor.
`These conditions are speci?ed by the user.
`
`[0042] Next, at block 220, the ?eld speci?cation and
`condition speci?cation are saved in the coupon database.
`This database is used to determine a coupon type and to
`extract ?elds of interest. This process is further described
`beloW.
`
`[0043] Turning to FIG. 3, one preferred method of oper
`ating a coupon engine, shoWn as block 114 of FIG. 1 is
`described. The process begins at block 310 Where the binary
`image data is received from a scanner. Here the data is
`preprocessed to reduce noise and to reformat the bit data
`information into a map of connected components. A con
`nected component is any combination of one or more bits
`that are connected to one or more other bits. For example, an
`individual letter in a text line consists of a group of inter
`connected bits. The connected component analysis Will
`identify that group of bits together. The connected compo
`nent analysis also identi?es the coordinates of the minimal
`bounding box for the connected components. This provides
`the coordinates for the upper, loWer, left and right bound
`aries of the bounding box.
`
`[0044] The preprocessing is further described beloW With
`reference to FIG. 4. A coupon image shoWn divided into
`bounding boxes each surrounding one connected component
`is described beloW With reference to FIG. 6A. The associ
`ated table of bounding box information is described beloW
`With reference to FIG. 7A.
`
`[0045] After completing the connected component analy
`sis, the data is passed to a segmentator at block 312. The
`segmentator operates upon the connected components and
`associated bounding boxes to determine their type. Prefer
`ably, tWelve symbol types are identi?ed. These include: (1)
`barcode, (2) line, (3) frame, (4) MICR line, (5) table, (6)
`horiZontal region (or text Word), (7) logo, (8) text line, (9)
`vertical region, (10) text area, (11) OCR line, and (12)
`connected component types. Each connected component is
`classi?ed into one of these types depending upon its under
`
`Page 16 of 25
`
`ROTHSCHILD EXHIBIT 1001
`
`
`
`US 2002/0037097 A1
`
`Mar. 28, 2002
`
`lying characteristics. These components are classi?ed in
`accordance With rules that are applied to the connected
`components and described below With reference to FIGS.
`5A and 5B.
`
`[0046] Next, at block 314, the information from the seg
`mentation process is used to determine the coupon type.
`Speci?cally, the information from the segmentation process
`is compared With information from the coupon database
`315. If the information from the coupon matches a set of
`conditions in the coupon database 315 the coupon type is
`determined. OtherWise, the coupon is rejected as not an
`acceptable coupon type. The process of generating a match
`is further described beloW With reference to FIG. 8.
`
`[0047] After identifying the coupon type, the process
`proceeds to extract customer information including account
`number, amount due and similar information, at block 316.
`The coupon database 315 identi?es the areas or Zones Where
`this information may be found. These areas are provided to
`the appropriate recognition engine for processing. For
`example, Where the coupon database 315 directs extraction
`of a customer name from a text line, the identi?ed area is
`passed to the optical character recognition engine. There the
`text is processed and the customer name returned as a
`character sequence. After extracting the desired ?elds, the
`process proceeds to perform post-processing operations at
`block 318.
`
`[0048] In practice, the recognition engines achieve a high
`degree of accuracy. Nonetheless, errors may occur during
`the process of extracting data. Post-processing is applied to
`minimize these errors. For example, spell checking, Zip code
`checking and other standard checks can be applied as
`post-processing at block 318.
`
`[0049] After completion of the post-processing, the result
`ing coupon type and ?elds of interest are provided by the
`computer. This information is used to process the coupon.
`
`[0050] Turning to FIG. 4, one preferred preprocessor
`suitable for use as the preprocessor 310 of FIG. 3 is
`described. The preprocessor includes a skeW correction
`block 410, a noise reduction block 412, a run length encod
`ing block 414, and a connected components block 416.
`Document skeW results from imperfections in the scanning
`process. Preferably, the skeW correction is performed in the
`scanner (shoWn as scanner 112 in FIG. 1). HoWever, if the
`scanner does not provide this functionality, then it is imple
`mented in the preprocessor 310.
`
`[0051] Next, noise reduction is applied at block 412.
`Preferably this includes the morphological operations of
`erosion and dilation. This reduces or eliminates noise in the
`image, Which is introduced by the scanning process and by
`background design patterns present in some coupons.
`
`[0052] The morphological erosion is performed by com
`paring three by three image segments With a prede?ned
`group of patterns. If an image segment matches the pattern,
`then the center bit of the image is treated as noise and
`eliminated. One preferred set of templates used in this
`operation is shoWn in FIG. 9.
`
`[0053] Turning brie?y to that ?gure, templates 901-921
`are used in the erosion process. Although the templates are
`shoWn graphically, they may also be represented as a string
`
`of bits. For example, template 901 may be represented as:
`[100,110,100], template 902 may be represented as: [001,
`110,100], and so on.
`[0054] When applying the templates 901-921, a bit is ?rst
`detected. The templates are applied by aligning the center of
`the template With the detected bit. The center bit for each
`template is alWays black. That is, using the above notation,
`the templates all folloW the form: [XXX,X1X,XXX], Where
`an “X” denotes a surrounding bit, and the “1” identi?es the
`center bit. Since the center bit is alWays set and alWays
`compared to a bit that is also set, the comparison betWeen
`these bits Will alWays generate a match. Accordingly, after
`detecting a bit, the template is compared only to the sur
`rounding bits to determine a match. This provides a com
`putational bene?t as one feWer comparisons are made.
`[0055] The templates 901-921 are chosen to reduce noise
`and at the same time to avoid the possibility that a connected
`component is split by the application of the templates. For
`example the template [101,010,000] is not included even
`though the template 916, [111,010,000] is included. The
`template [101,010,000] Would act to split an otherWise
`connected component.
`[0056] Returning to FIG. 4, after performing noise reduc
`tion, the remaining data is run-length encoded. Since the
`image typically includes long stretches of White space. Each
`bit is not encoded, rather the transition from a White bit to
`black bit is encoded. For coupon documents, this tends to
`reduce the bit requirements. Thus, the run-length encoding
`algorithm traverses the image roW-Wise and encodes con
`tinuous runs of pixels storing only its roW and the columns
`Where the run starts and ends.
`[0057] Next, the run-length encoded image data is pro
`vided to a connected component block 416. Any tWo adja
`cent runs that overlap or any tWo adjacent runs that end and
`begin Within one bit are grouped as a connected component.
`For example a run in the ?rst roW beginning at pixel 10 and
`extending to pixel 20 Would be joined With a run in the
`second roW beginning at pixel 15 and extending to pixel 25.
`LikeWise, a run in the third roW beginning at pixel 10 and
`extending to pixel 20 Would be joined With a run in the
`fourth roW beginning at pixel 21 and extending to pixel 31.
`Thus, When applying this algorithm to a pixel, another pixel
`is adjacent thereto if it lies in any of the eight surrounding
`locations (also termed eight-connected). One preferred
`method of determining the connected components is
`described in “Data Structures and Problem Solving using
`C++,” M. A. Weiss, 2nd Ed., Addison Wesley Longman, Inc.,
`Reading, Mass., 2000, at pages 845 through 863, Which is
`incorporated herein by reference.
`[0058] Turning to FIG. 5, the process of applying the
`segmentation analysis is further described. The segmenta
`tion analysis applies rules and conditions as explained beloW
`to the connected components to group them into the tWelve
`symbol types. Again, these include: (1) barcode, (2) line, (3)
`frame, (4) MICR line, (5) table, (6) horiZontal region (or text
`Word), (7) logo, (8) text line, (9) vertical region, (10) text
`area, (11) OCR line, and (12) connected component types.
`Where speci?c reference is made to a pixel threshold or
`comparison, the scanning resolution is set to 200 dpi. For
`other scanning resolutions, the pixel thresholds are simply
`adjusted proportionally.
`[0059] Beginning at block 510, the segmentator searches
`the connected components to ?nd a candidate for a barcode.
`
`Page 17 of 25
`
`ROTHSCHILD EXHIBIT 1001
`
`
`
`US 2002/0037097 A1
`
`Mar. 28, 2002
`
`The search begins by ?nding a connected component having
`a linear shape such as the individual lines of a barcode.
`Speci?cally, the segmentator searches for a connected com
`ponent having a density greater than 0.5 and an aspect ratio
`less than 0.25 or greater than 4. The density is de?ned as the
`number of (black) pixels in the connected component
`divided by the number of pixels in the bounding box
`associated With the connected component. The aspect ratio
`is de?ned as the Width divided by the height. The height and
`Width are determined by the bounding box associated With
`a connected component.
`
`[0060] After ?nding one connected component that meets
`these conditions, the segmentator tries to extend the barcode
`area by ?nding another line adjacent to the ?rst line that also
`meets the conditions for a barcode element. After ?nding
`such an element, the overlap betWeen the tWo is determined.
`At least eighty percent of the ?rst line must overlap the
`second line, and vice versa. For example, suppose that the
`?rst line begins at an uppermost pixel