US008081209B2

(12) United States Patent
     Ji et al.

(10) Patent No.:     US 8,081,209 B2
(45) Date of Patent: Dec. 20, 2011
(54) METHOD AND SYSTEM OF SPARSE CODE BASED OBJECT CLASSIFICATION WITH SENSOR FUSION

(75) Inventors: Zhengping Ji, Lansing, MI (US); Danil V. Prokhorov, Canton, MI (US)

(73) Assignee: Toyota Motor Engineering & Manufacturing North America, Inc., Erlanger, KY (US)
( * ) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by ____ days.
OTHER PUBLICATIONS

Optimal In-Place Learning and the Lobe Component Analysis; Juyang Weng and Nan Zhang.
(21) Appl. No.: 12/147,001

(22) Filed: Jun. 26, 2008

(65) Prior Publication Data

    US 2009/0322871 A1    Dec. 31, 2009
(51) Int. Cl.
    H04N 7/18   (2006.01)
    G06F 15/16  (2006.01)
(52) U.S. Cl. ........................................... 348/115
(58) Field of Classification Search ............ 348/113-118
    See application file for complete search history.

(56) References Cited

    U.S. PATENT DOCUMENTS

    5,963,653  A     10/1999  McNary et al.
    6,834,232  B1    12/2004  Malhotra
    6,856,873  B2     2/2005  Breed et al.
    6,862,537  B2     3/2005  Skrbina et al.
    6,885,968  B2     4/2005  Breed et al.
    6,889,171  B2     5/2005  Skrbina et al.
    7,049,945  B2     5/2006  Breed et al.
    7,069,130  B2     6/2006  Yopp
    7,085,637  B2     8/2006  Breed et al.
    7,110,880  B2     9/2006  Breed et al.
    7,202,776  B2     4/2007  Breed
    7,343,265  B2*    3/2008  Andarawis et al. ......... 702/188
    7,359,782  B2*    4/2008  Breed ..................... 701/45
 2005/0038573  A1     2/2005  Goudy
 2005/0273212  A1    12/2005  Hougen
 2007/0064242  A1*    3/2007  Childers ................. 356/601
 2007/0233353  A1    10/2007  Kade
 2007/0282506  A1    12/2007  Breed et al.
 2008/0046150  A1*    2/2008  Breed .................... 701/45
 2010/0148940  A1*    6/2010  Gelvin et al. ........ 340/286.02

    * cited by examiner
Primary Examiner: Zami Maung
(74) Attorney, Agent, or Firm: Gifford, Krass, Sprinkle, Anderson & Citkowski, P.C.
(57)                          ABSTRACT

A system and method for object classification based upon the fusion of a radar system and a natural imaging device using sparse code representation. The radar system provides a means of detecting the presence of an object within a predetermined path of a vehicle. Detected objects are then fused with the image gathered by the camera and then isolated in an attention window. The attention window is then transformed into a sparse code representation of the object. The sparse code representation is then compared with known sparse code representations of various objects. Each known sparse code representation is given a predetermined variance, and subsequent sparse code represented objects falling within said variance will be classified as such. The system and method also include an associative learning algorithm wherein classified sparse code representations are stored and used to help classify subsequent sparse code representations.

4 Claims, 5 Drawing Sheets
[Front-page figure: network layers from Layer 0 (Sparse Representation) and Layer 1 (Recognition) to Layer 2 (Association) and Layer 3 (Classes), with "Vehicle" and "Non-Vehicle" class outputs.]
[FIG. 1 (operational architecture): the camera, with camera calibration, and the radar(s), with Kalman filtering, feed a target point projection; an image window is extracted, sparse coded, and passed to the MILN.]
[FIG. 2 (system development interface): camera image with attention windows 32, a top-down view of radar returns plotted on a coordinate grid (approximately -20 to 20 m), and controls for image number, training process, and grid display.]
[FIG. 3: projection 28 from the actual vehicle size (maximum width and height) in the world-center coordinate system to the projected window size in the image plane of the camera-center coordinate system.]

[FIG. 4: orientation-selective filters 26 generated from randomly selected subpatches of natural images.]
[Drawing: window images (56×56) and the corresponding sparse representations (36×431).]
[FIG. 7: neurons arranged over Layer 0 (Sparse Representation), Layer 1 (Recognition), Layer 2 (Association), and Layer 3 (Classes), with "Vehicle" and "Non-Vehicle" outputs.]
[FIG. 8: the method 12 for object classification using sparse code representation.
Step 1: Create an attention window for each object detected that is associated with a radar return.
Step 2: Generate an orientation-selective filter of the image of each attention window.
Step 3: Transform each orientation-selective filter into a sparse code representation.
Step 4: Classify each sparse code representation.
Step 5: Store each classified sparse code representation in a database.
Step 6: Compare each stored classified sparse code representation with subsequent sparse code representations of images of objects captured in said attention windows to classify said subsequent images.]
METHOD AND SYSTEM OF SPARSE CODE BASED OBJECT CLASSIFICATION WITH SENSOR FUSION

BACKGROUND OF THE INVENTION
1. Field of the Invention
A system and method for object classification based upon the fusion of a radar system and a natural imaging device, and a sparse code representation of an identified object.
2. Description of the Prior Art
Sensor fusion object classification systems are known and well documented. Such systems will gather information from an active and a passive sensor and associate the two data to provide the user with information relating to the data, such as whether the object is a vehicle or a non-vehicle. Such an association is commonly referred to as fusion and is referred to as such herein. In operation, fusion relates the return of a natural image captured by the passive sensor to the detection of an object by the active sensor. Specifically, an active sensor such as a radar system will be paired to a passive sensor such as a video camera, and the objects detected by the radar will be mapped to the video image taken by the video camera. The fusion of such data may be done using algorithms which map the radar return to the video image. The fused data may then be further processed for relevant information such as object detection and classification using some form of visual graphic imaging interpretation. However, visual graphic imaging interpretation requires sufficient memory to store the visual graphic data, and sufficient processing speed to interpret the visual data in a timely manner. For example, U.S. Pat. No. 6,834,232 to Malhotra teaches the use of a multiple sensor data fusion architecture to reduce the amount of image processing by processing only selected areas of an image frame as determined in response to information from electromagnetic sensors. Each selected area is given a centroid, and the center of reflection for each detected object is identified. A set of vectors is determined between the centers of reflection and the centroid. The differences between the centers of reflection and the centroids are used to classify objects. However, Malhotra does not teach the use of orientation-selective filters for object classification.
U.S. Pat. No. 6,889,171 to Skrbina et al. discloses a system fusing radar returns with visual camera imaging to obtain environmental information associated with the vehicle, such as object classification. Specifically, a radar is paired with a camera, and the information received from each is time-tagged and fused to provide the user with data relating to object classification, relative velocity, and the like. This system requires the data to be processed through an elaborate and complicated algorithm and thus requires a processor with the ability to process a tremendous amount of data in a relatively short period of time in order to provide the user with usable data.
U.S. Pat. No. 7,209,221 to Breed et al. discloses a method of obtaining information regarding a vehicle blind spot using an infrared emitting device. Specifically, the method uses a trained pattern recognition technique or a neural network to identify a detected object. However, Breed et al. is dependent upon the trained pattern recognition technique, whereby the number of patterns and processes may place a huge burden on the system.
Accordingly, it is desirable to have a system for object classification which does not require the processing capabilities of the prior art, and which can refine and improve its classification over time. One form of object recognition and classification is known as sparse code representation. It is understood that sparse coding is how the human visual system efficiently codes images. This form of object recognition produces a limited response to any given stimulus, thereby reducing the processing requirements of the prior art systems. Furthermore, the use of sparse code recognition allows the system to be integrated with current systems having embedded radar and camera fusion capabilities.
SUMMARY OF THE INVENTION AND ADVANTAGES
A system and method for object classification based upon the fusion of a radar system and a natural imaging device using sparse code representation is provided. Specifically, a radar system is paired with a camera. The radar system provides a means of detecting the presence of an object within a predetermined path of a vehicle. Objects identified within this predetermined path are then fused with the image gathered by the camera, and the detected image is provided to the user in separate windows referred to hereafter as "attention windows." These attention windows are the natural camera image of the environment surrounding the radar return. The attention window is then transformed into a sparse code representation of the object. The sparse code representation is then compared with known sparse code representations of various objects. Each known sparse code representation is given a predetermined variance, and subsequent sparse code representations falling within said variance will be classified as such. For instance, a known sparse code representation having a value of Y will be given a variance of Y+/-V, and subsequent sparse code representations of an image of an object falling within Y+/-V will be classified as the object with Y. The system and method also include an associative learning algorithm wherein classified sparse code representations are stored and used to help classify subsequent sparse code representations, thus providing the system with the capabilities of associative learning.
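By way of illustration only, the Y+/-V variance test described above might be sketched as follows in Python; the vector layout, the use of a Euclidean distance as the tolerance V, and all names are assumptions for the sketch and are not details taken from the disclosure.

    import numpy as np

    def classify_by_variance(code, known_codes, tolerance):
        """Return the label of the first known sparse code representation Y
        such that the new code falls within Y +/- V (here interpreted as a
        Euclidean distance no greater than `tolerance`), or None if no
        known class matches."""
        for label, known in known_codes.items():
            if np.linalg.norm(code - known) <= tolerance:
                return label
        return None

    # Hypothetical usage: one stored representation per class.
    known = {"vehicle": np.array([0.9, 0.1, 0.0]),
             "non-vehicle": np.array([0.1, 0.8, 0.3])}
    print(classify_by_variance(np.array([0.85, 0.15, 0.05]), known, tolerance=0.2))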
BRIEF DESCRIPTION OF THE DRAWINGS
Other advantages of the present invention will be readily appreciated, as the same becomes better understood by reference to the following detailed description when considered in connection with the accompanying drawings wherein:
FIG. 1 is a schematic of the operational architecture of the system;
FIG. 2 is a schematic of the system development interface;
FIG. 3 shows the projection from the actual vehicle size to the window size in the image plane;
FIG. 4 shows the generation of orientation-selective filters using Lobe Component Analysis;
FIG. 5 shows the arrangement of local receptive fields over the attention window;
FIG. 6 shows the transformation of the natural image into a sparse code representation;
FIG. 7 shows the placement of neurons on different levels in the network; and
FIG. 8 shows the method for object classification using sparse code representation.
DETAILED DESCRIPTION OF THE INVENTION
With reference to FIG. 1, a system 10 and method 12 for object classification based upon the fusion of a radar system 14 and a natural imaging device 16 using sparse code representation 18 is provided. Specifically, the system 10 fuses the radar return 20 with the associated camera imaging 22, and the fused data 24 is then transformed into a sparse code representation 18 and classified. The camera image of the radar return 20 is isolated, extracted and filtered with the help of orientation-selective filters 26. Processing of local receptive fields embedded onto the extracted image by the orientation-selective filters 26 results in a sparse code for the image. The sparse code representation 18 of the object on the image is compared to known sparse code representations 18 of a particular object class. The processing and comparison of any subsequent sparse code representations 18 of images is done in an associative learning framework.
Thus, the video image of the object detected by the radar system 14 is transformed to a sparse code representation 18, wherein the sparse code representation 18 of the detected object is used for higher-level processing such as recognition or categorization. A variety of experimental and theoretical studies indicate that the human visual system efficiently codes retinal activation using sparse coding, thus for any given stimulus there are only a few active responses. Accordingly, sparse coding representation is more independent than other forms of imaging data. This feature allows object learning and associative learning using sparse code representation 18 to become a compositional problem and improves the memory capacity of associative memories. For illustrative purposes, the system 10 described will classify an object as being either a vehicle or a non-vehicle. However, it is anticipated that the object classification system 10 is capable of making other sub-classifications. For example, if an object is classified as a non-vehicle, the system 10 may further classify the object as being a human, or an animal such as a deer; likewise, if the system 10 classifies an object as being a vehicle, the system 10 can further classify the object as being a motorcycle, bike, or an SUV.
In the first preferred embodiment of the system 10 for object classification, a radar system 14 is paired with a video camera for use in an automobile. Thus the specifications and characteristics of the radar system 14 must be suitable for detecting objects within the customary driving environment of an automobile. The specifications of a suitable radar system 14 include a range between 2 and 150 m with a tolerance of either +/-5% or +/-1.0 m; an angle of at least 15 degrees, with a tolerance of either +/-0.3 degrees or +/-0.1 m; and a speed of +/-56 m/s with a tolerance of +/-0.75 m/s. An example of a radar system 14 having the qualities described above is the F10 mm-wave radar. The characteristics and specifications of the video camera paired with such a radar system 14 include a refresh rate of 15 Hz, a field of view of 45 degrees, and a resolution of 320×240 pixels. An example of a video camera system 16 having the qualities described above is a Mobileye camera system.
In operation, the radar system 14 provides a return for objects detected within 15 degrees and out to 150 meters of the path of the vehicle. The radar return 20 can be processed temporally to determine the relative speed of each object. As the object classification system 10 is directed at classifying objects within the path of the vehicle, radar returns 20 outside of a predetermined parameter will be discarded. This has the benefit of reducing the computational load of subsequent processing in the system 10, thereby increasing the efficiency of said system 10. For instance, radar returns 20 more than eight meters to the right or left of the vehicle may be discarded, as the object is considered out of the vehicle's path. However, it is understood that the parameters for discarding radar returns 20 disclosed above are for illustrative purposes only and are not limiting to the disclosure presented herein.
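As a rough illustration of this gating step, the sketch below discards radar returns whose lateral offset exceeds a configurable limit; the eight-meter figure is used only because it is the example given above, and the data layout (a dict per return) is assumed rather than specified by the patent.

    def gate_radar_returns(returns, max_lateral_m=8.0):
        """Keep only returns considered to be in the vehicle's path.
        Each return is assumed to be a dict with 'x' (lateral offset, m)
        and 'range' (longitudinal distance, m)."""
        return [r for r in returns if abs(r["x"]) <= max_lateral_m]

    returns = [{"x": 1.5, "range": 42.0}, {"x": 11.0, "range": 60.0}]
    print(gate_radar_returns(returns))   # the second return is discarded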
The radar returns 20 are fused with the real-time video image provided by the video camera system 16. Specifically, the radar returns 20 are projected onto an image system reference plane using a perspective mapping transformation 28. Thus, a processor 30 is provided for fusing the radar return 20 with the real-time visual of the natural imaging device 16 using an algorithm which performs the necessary perspective mapping transformation 28. The perspective mapping transformation 28 is performed using calibration data that contain the intrinsic and extrinsic parameters of each camera. An example of such a perspective mapping transformation 28 can be found in the U.S. patent to Skrbina et al.
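A minimal sketch of such a perspective mapping is given below, assuming a standard pinhole camera model with a known intrinsic matrix K and extrinsic rotation/translation (R, t); the patent does not give the calibration values, so the numbers used here are placeholders only.

    import numpy as np

    def project_radar_point(point_world, K, R, t):
        """Project a 3-D radar target point (world coordinates, meters)
        onto the image plane using intrinsics K and extrinsics (R, t)."""
        p_cam = R @ np.asarray(point_world) + t        # world -> camera frame
        u, v, w = K @ p_cam                            # camera frame -> pixel coordinates
        return u / w, v / w                            # perspective divide

    K = np.array([[800.0, 0.0, 160.0],
                  [0.0, 800.0, 120.0],
                  [0.0, 0.0, 1.0]])                    # placeholder intrinsics for a 320x240 image
    R, t = np.eye(3), np.array([0.0, 1.2, 0.0])        # placeholder extrinsics
    print(project_radar_point([2.0, 0.0, 30.0], K, R, t))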
With reference to FIG. 2, attention windows 32 are created for each radar return 20. The attention windows 32 are the video images of the radar return 20, and will be given a predetermined size in which to display the detected object. Specifically, as each radar return 20 is fused with the video camera image, the environment surrounding the radar return 20 is shown in an isolated view referred to as an attention window 32. The video image of each attention window 32 correlates to the expected height and width of the detected object. Accordingly, if the system 10 is concerned with only classifying the object as either a vehicle or a non-vehicle, the video image presented in each attention window 32 is designed to the expected height of a vehicle. However, as stated above, the system 10 can be used to provide a broad possibility of classifications, and thus the dimensions of the video images contained within the attention window 32 will be tailored accordingly.
With reference to FIG. 3, the attention windows 32 provide an extracted image of the detected object from the video camera. As the video images are designed to capture the predetermined area surrounding the object as described above, each video image may differ in size from the others depending upon the distance between the detected object and the vehicle. These images are normalized within the attention window 32 to prevent the video images from being deformed. FIG. 2 shows normalized images within their respective attention windows 32, wherein the pixel intensities of each attention window 32 unoccupied by an image are set to zero.
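The normalization into a fixed-size attention window can be sketched as follows; the 56×56 output size matches the window dimensions reported later for the pixel-space experiments, while the aspect-preserving resize and zero-padding details are assumptions consistent with, but not dictated by, the description above.

    import numpy as np

    def make_attention_window(image, box, out_size=56):
        """Crop the detected object (box = (row0, row1, col0, col1)) and place it,
        aspect ratio preserved, into a fixed out_size x out_size window whose
        unoccupied pixels are set to zero intensity."""
        r0, r1, c0, c1 = box
        patch = image[r0:r1, c0:c1].astype(float)
        scale = out_size / max(patch.shape)            # preserve aspect ratio
        new_h = max(1, int(round(patch.shape[0] * scale)))
        new_w = max(1, int(round(patch.shape[1] * scale)))
        # nearest-neighbour resize, kept dependency-free for this sketch
        rows = (np.arange(new_h) / scale).astype(int).clip(0, patch.shape[0] - 1)
        cols = (np.arange(new_w) / scale).astype(int).clip(0, patch.shape[1] - 1)
        resized = patch[rows][:, cols]
        window = np.zeros((out_size, out_size))        # zero-intensity background
        window[:new_h, :new_w] = resized
        return window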
With reference now to FIG. 4, the generation of orientation-selective filters 26 is provided. In the preferred embodiment, the generation of orientation-selective filters 26 is achieved using Lobe Component Analysis (hereafter "LCA"). Specifically, neuron layers are developed from the natural images using LCA, wherein the neurons with update totals less than a predetermined amount are discarded and the neurons with an update total of a predetermined amount or more (winning neurons) are retained. The retained (winning) neurons are then used as orientation-selective filters 26.
As shown in FIG. 6, a pool of orientation-selective filters 26 may be developed to provide a specimen against which subsequent classification of objects may be made. The developed filter pool is comprised of winning neurons as derived from the video image of an example of an object as captured within each attention window 32. For instance, the developed filter pool may consist of winning neurons from hundreds of natural images that have been processed through the LCA. The developed filter pool is then embedded into the system 10 to provide the system 10 with a basis for subsequent generation of sparse codes for object classification.
Before images are processed by the LCA (FIG. 4), they are whitened. Whitening of images is a well known statistical preprocessing procedure which is intended to decorrelate images. In the preferred embodiment, whitening is achieved by selecting a predetermined area x of each attention window 32, from a predetermined number N of random locations (randomly selected patches in FIG. 4), where the predetermined area x is further defined by an area of pixels d, and d = P_length × P_width. The attention window 32 is thereby related in matrix form, whereby x is the attention window 32, and x = {x1, x2, ..., xn}. Each xn is further represented in vector form xi = (xi,1, xi,2, ..., xi,d). The whitening matrix W is generated by taking the matrix of principal components V = {v1, v2, ..., vk} and dividing each by its standard deviation (the square root of variance). The matrix D is a diagonal matrix where the matrix element at row and column i is 1/√λi, and λi is the eigenvalue of vi. Then, W = VD. To obtain a whitened attention window 32 Y, multiply the response vector(s) X by W: Y = WX = VDX. However, the image can be transformed into a sparse code representation 18 without whitening.
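The construction of W = VD with D having diagonal entries 1/√λi can be sketched as below; this is a minimal NumPy sketch that assumes the principal components come from an eigendecomposition of the patch covariance (the patent does not say how V is obtained), and the function and variable names are illustrative only.

    import numpy as np

    def whitening_matrix(patches, k=None, eps=1e-8):
        """patches: N x d matrix of randomly selected sub-patches (rows).
        Returns W = V D with D_ii = 1 / sqrt(lambda_i), following the text."""
        X = patches - patches.mean(axis=0)             # zero-mean the patches
        cov = X.T @ X / len(X)
        lam, V = np.linalg.eigh(cov)                   # eigenvalues in ascending order
        lam, V = lam[::-1], V[:, ::-1]                 # principal components first
        if k is not None:
            lam, V = lam[:k], V[:, :k]
        D = np.diag(1.0 / np.sqrt(lam + eps))
        return V @ D                                   # W = V D

    # Whitened samples, per the text Y = W X (applied here to each patch row).
    patches = np.random.rand(500, 256)                 # e.g. 16x16 sub-patches as rows
    W = whitening_matrix(patches, k=64)
    Y = patches @ W                                    # N x k whitened responses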
Upon obtaining the whitened attention window 32 Y, Y is further processed to develop neuron layers through the LCA algorithm. The LCA algorithm incrementally updates c corresponding neurons, represented by the column vectors v1(t), v2(t), ..., vc(t), from whitened input samples y(1), y(2), ..., y(i), where y(i) is a column vector extracted from the matrix Y. At time t, the output of the layer is the response vector z(t) = (z1(t), z2(t), ..., zc(t)). The LCA algorithm z(t) = LCA(y(t)) is as follows:
ALGORITHM 1
Lobe Component Analysis

1: Sequentially initialize c cells using the first c observations: vt(t) = y(t) and set the cell-update age n(t) = 1, for t = 1, 2, ..., c.
2: for t = c+1, c+2, ... do
3:   Compute the output (response) for all neurons:
4:   for 1 ≤ i ≤ c do
5:     Compute the response:
         zi(t) = gi( y(t) · vi(t-1) / ( ||vi(t-1)|| ||y(t)|| ) ),
       where gi is a sigmoidal function.
6:   Simulating lateral inhibition, decide the winner: j = arg max(1≤i≤c) { zi(t) }.
7:   Update only the winner neuron vj using its temporally scheduled plasticity:
         vj(t) = w1 vj(t-1) + w2 zj y(t),
     where the scheduled plasticity is determined by its two age-dependent weights:
         w1 = (n(j) - 1 - μ(n(j))) / n(j),   w2 = (1 + μ(n(j))) / n(j),
     with w1 + w2 ≡ 1. Update the number of hits (cell age) n(j) only for the winner: n(j) ← n(j) + 1. The plasticity parameters are t1 = 20, t2 = 200, c = 2, and r = 2000 in our implementation.
8:   All other neurons keep their ages and weights unchanged: for all 1 ≤ i ≤ c, i ≠ j, vi(t) = vi(t-1).
   end for
   end for
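The following is a minimal NumPy sketch of Algorithm 1 under the stated plasticity schedule. The amnesic function μ(n) used here is one common piecewise form from the LCA literature (parameterized by t1, t2, c, r) and is an assumption, since the patent's rendering of that formula is partially garbled; the sigmoidal response function gi is also omitted for brevity.

    import numpy as np

    def mu(n, t1=20, t2=200, c=2, r=2000):
        """Amnesic average function (assumed piecewise form from the LCA literature)."""
        if n <= t1:
            return 0.0
        if n <= t2:
            return c * (n - t1) / (t2 - t1)
        return c + (n - t2) / r

    def lca(samples, c_cells):
        """samples: sequence of whitened column vectors y(t); returns neuron vectors and ages."""
        samples = list(samples)
        v = [np.array(samples[t], dtype=float) for t in range(c_cells)]  # step 1: initialize cells
        age = [1] * c_cells
        for y in samples[c_cells:]:
            # step 5: responses z_i = y . v_i / (||v_i|| ||y||)  (sigmoid omitted)
            z = [float(y @ vi) / (np.linalg.norm(vi) * np.linalg.norm(y) + 1e-12)
                 for vi in v]
            j = int(np.argmax(z))                      # step 6: lateral inhibition, pick winner
            m = mu(age[j])
            w1 = (age[j] - 1 - m) / age[j]             # age-dependent weights, w1 + w2 = 1
            w2 = (1 + m) / age[j]
            v[j] = w1 * v[j] + w2 * z[j] * y           # step 7: update the winner only
            age[j] += 1
        return v, age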
FIG. 4 shows the result when using LCA on neurons ("c") and whitened input samples ("n"). Each neuron's weight vector is identical in dimension to the input, and thus is able to be displayed in each grid as a row and column image by applying a de-whitening process, which is achieved by restoring the original input vector using the following equation: x = VD⁻¹y, where y is the whitened vector. In FIG. 4, the lobe component with the most wins is at the top left of the image grid, progressing through each row to the lobe component with the least wins, which is located at the bottom right. As stated before, the neurons having at least a predetermined amount of wins (update totals) were kept because said neurons showed a localized orientation pattern. Accordingly, the retained winning neurons are designated as orientation-selective filters 26. Thus, as the real world image samples are collected and processed through the above described LCA, the winning neurons are kept and designated as orientation-selective filters 26 and used for further processing. The benefit of using orientation-selective filters 26 is that said orientation-selective filters 26 eliminate reliance upon problem domain experts and traditional image processing methods 12.
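The de-whitening used for displaying each neuron as an image patch can be sketched directly from the relation above; the eigenvector matrix V and the eigenvalues are assumed to come from the same decomposition used for whitening, and the inverse of D_ii = 1/√λi is simply √λi.

    import numpy as np

    def dewhiten(v_whitened, V, lam, eps=1e-8):
        """Map a neuron weight vector from whitened space back to pixel space,
        per the text's de-whitening relation x = V D^-1 y."""
        D_inv = np.diag(np.sqrt(lam + eps))            # inverse of D_ii = 1/sqrt(lambda_i)
        return V @ D_inv @ v_whitened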
With reference to FIG. 5, square receptive fields are applied over the entire attention window 32 such that some of the receptive fields overlap each other. Each receptive field has a predetermined area defined by a pixel length and width. In the preferred embodiment, these receptive fields are a predetermined number of pixels in the attention window 32 image plane. FIG. 5 shows the receptive fields staggered from each other by a fixed number of pixels, and in an overlapping relationship. In this manner a plurality of local receptive fields is held within the attention window 32 plane.
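The overlapping tiling can be sketched as below. The field size and stagger are not recoverable from the text (the numbers were lost in extraction), so 16×16 fields with an 8-pixel stagger are used purely as placeholders consistent with d = 16×16 mentioned later; note that the experiments reported below use 431 local fields per 56×56 window, so the actual tiling in the patent is evidently denser or otherwise different.

    def receptive_field_boxes(window_size=56, field=16, stride=8):
        """Return (row, col) top-left corners of overlapping square receptive
        fields covering the attention window. field and stride are placeholders."""
        starts = list(range(0, window_size - field + 1, stride))
        return [(r, c) for r in starts for c in starts]

    print(len(receptive_field_boxes()))   # number of local receptive fields m for this tiling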
With reference to FIG. 6, the orientation-selective filters 26 may be transformed into a sparse code representation 18 S by taking the orientation-selective filters 26 F, transposed, and multiplying them by a non-negative matrix L, where L represents the local receptive fields, as shown in the following equation: S = Fᵀ × L. Each receptive field is represented by a stacked vector having a correlating pixel length and width; accordingly, in the preferred embodiment, the stacked vector dimension equals the number of pixels in the receptive field. Thus, a non-negative matrix L is provided, where L = {l1, l2, ..., lm} and m is the number of local receptive fields. The set of orientation-selective filters 26 is represented as F = {f1, f2, ..., fc′}, where c′ represents the number of neurons retained from the win selection process, and each fi is a non-negative orientation-selective filter 26 from neuron i represented as fi = (fi,1, fi,2, ..., fi,d), where d = 16×16.
As represented by the S equation above, the orientation-selective filters 26 are multiplied by the local receptive fields to generate the sparse representation of the image. The matrix S is non-negative and, by operation of the formula S = Fᵀ × L, is reshaped into a vector having a total dimension of i × m, where i represents the total number of neurons kept from the winning process of FIG. 4 and m represents the total number of local receptive fields. Thus, the matrix S maps the raw pixel representation of the attention window 32 to a higher-dimensional, sparsely encoded space, which leads to a sparse code representation 18 of the input as opposed to the standard pixel representation of the image. The sparse code representation 18 has the advantage of achieving better efficiency in both recognition rate and learning speed than classification methods 12 based upon the original pixel input, or image-based classification systems 10 using image recognition.
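A sketch of the S = Fᵀ × L mapping is given below, assuming F holds the retained orientation-selective filters as columns (d × c′) and L holds the stacked local receptive fields as columns (d × m); flattening S then gives the sparse code vector. The filter count, field size, and tiling used in the example are placeholders, not values taken from the patent.

    import numpy as np

    def sparse_code(window, filters, boxes, field=16):
        """window: attention-window image; filters: d x c' non-negative matrix F
        (columns are retained orientation-selective filters, d = field*field);
        boxes: (row, col) corners of the local receptive fields.
        Returns the flattened c'*m sparse code vector S = F^T x L."""
        d = field * field
        L = np.column_stack([window[r:r + field, c:c + field].reshape(d)
                             for r, c in boxes])        # d x m matrix of stacked fields
        S = filters.T @ L                                # c' x m responses, S = F^T x L
        return S.reshape(-1)

    # Hypothetical numbers: 36 filters and an 8-pixel tiling, for illustration only.
    boxes = [(r, c) for r in range(0, 41, 8) for c in range(0, 41, 8)]
    F = np.abs(np.random.rand(16 * 16, 36))
    code = sparse_code(np.random.rand(56, 56), F, boxes)
    print(code.shape)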
An associative learning framework such as a Multi-layer In-place Learning Network (MILN), Neural Network (NN), Incremental Support Vector Machines (I-SVM), or Incremental Hierarchical Discriminant Regression (IHDR) is then used to classify the sparse code representation 18 of the input. The associative learning framework utilizes both incremental and online learning methods 12 to recognize and classify inputs. Accordingly, the use of a developed filter pool enhances the system 10 due to the various known properties of objects provided. However, the advantage of the associative learning framework is that the system 10 can make subsequent object classifications based upon a developed filter pool having only one sparse code representation 18 of each object within a particular classification. For example, where the system 10 only makes two object classifications, vehicle or non-vehicle, a sparse code representation 18 of a vehicle and of a non-vehicle will allow the system 10 to make subsequent object classifications based upon the two known sparse code representations. In the preferred embodiment, the MILN is used as it provides better performance.
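As a loose stand-in for this associative behaviour (and explicitly not the patented MILN, whose in-place learning rules appear in Algorithm 2 below), the following sketch stores one prototype sparse code per class, classifies new codes by nearest prototype, and folds each classified code back into the winning prototype, which mirrors the incremental, bootstrapped-from-one-example operation described here.

    import numpy as np

    class PrototypeAssociator:
        """Not the patented MILN: a simple incremental nearest-prototype learner
        used only to illustrate associative classification of sparse codes."""
        def __init__(self):
            self.prototypes = {}   # label -> (running mean code, sample count)

        def learn(self, code, label):
            mean, n = self.prototypes.get(label, (np.zeros_like(code), 0))
            self.prototypes[label] = ((mean * n + code) / (n + 1), n + 1)

        def classify(self, code):
            return min(self.prototypes,
                       key=lambda lab: np.linalg.norm(code - self.prototypes[lab][0]))

    assoc = PrototypeAssociator()
    assoc.learn(np.array([1.0, 0.0]), "vehicle")        # one example per class suffices
    assoc.learn(np.array([0.0, 1.0]), "non-vehicle")
    label = assoc.classify(np.array([0.8, 0.1]))
    assoc.learn(np.array([0.8, 0.1]), label)            # store the classified code as well
    print(label)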
A comparative study of four different classification methods was conducted where each method was used in an open-ended autonomous development where an efficient (memory controlled), real-time (incremental and timely), and extendable (the number of classes can increase) architecture is desired. Tables showing the results of the study are provided for reference. The four different classification methods are: K-Nearest Neighbor ("NN"), with k = 1 and using an L1 distance metric for baseline performance; Incremental-SVM ("I-SVM"); Incremental Hierarchical Discriminant Regression ("IHDR"); and the MILN as discussed herein. A linear kernel for I-SVM was used for the high-dimensional problems. A summary of the results of the comparative study using two different input types, non-transformed "pixel" space having input dimensions of 56×56 and sparse-coded space having dimensions of 36×431 for the MILN Layer-1, is provided in Tables 1 and 2. Table 1 shows the average performance of the learning methods over 10-fold cross validation for pixel inputs, and Table 2 shows the average performance of the learning methods over 10-fold cross validation for sparse coded inputs.

The study shows that the Nearest Neighbor method performed fairly well, but was prohibitively slow. IHDR performed faster than NN, but required a lot of memory as IHDR automatically develops an overlaying tree structure that organizes and clusters data. Furthermore, while IHDR does allow sample merging, it saves every training sample and thus does not use memory efficiently. I-SVM performed well with both types of input, and uses the least memory in terms of the number of support vectors automatically determined by the data, but its training time is the worst. Another potential problem with I-SVM is its lack of extendibility to situations where the same data may later be expanded from the original two to more than two classes. As general purpose regressors, IHDR and MILN are readily extendable. IHDR uses too much memory and does not represent information efficiently and selectively, as shown in Tables 1 and 2. The MILN, however, allows the system to focus analysis on sub-parts of the attention window to improve generalization (e.g., recognize a vehicle as a combination of license plate, rear window and two tail lights). Table 2 shows that the overall accuracy of object classification is highest for the MILN when sparse coded inputs are used.
TABLE 1
AVERAGE PERFORMANCE & COMPARISON OF LEARNING METHODS OVER 10-FOLD CROSS VALIDATION FOR PIXEL INPUTS

Learning   "Overall"        "Vehicle"        "Other Objects"   Training Time     Test Time          Final # Storage
Method     Accuracy         Accuracy         Accuracy          Per Sample        Per Sample         Elements
NN         93.89 ± 1.84%    94.32 ± 1.42%    93.38 ± 2.04%     N/A               432.0 ± 20.3 ms    621
I-SVM      94.38 ± 2.24%    97.08 ± 1.01%    92.10 ± 6.08%     134.3 ± 0.4 ms    2.2 ± 0.1 ms       44.5 ± 2.3
IHDR       95.87 ± 1.02%    96.36 ± 0.74%    95.62 ± 2.84%     2.7 ± 0.4 ms      4.7 ± 0.6 ms       689
MILN       94.58 ± 2.34%    97.12 ± 1.60%    91.20 ± 5.31%     8.8 ± 0.6 ms      8.8 ± 0.6 ms       100
TABLE 2
AVERAGE PERFORMANCE & COMPARISON OF LEARNING METHODS OVER 10-FOLD CROSS VALIDATION FOR SPARSE CODED INPUTS

Learning   "Overall"        "Vehicle"        "Other Objects"   Training Time      Test Time           Final # Storage
Method     Accuracy         Accuracy         Accuracy          Per Sample         Per Sample          Elements
NN         94.32 ± 1.24%    95.43 ± 1.02%    91.28 ± 1.86%     N/A                2186.5 ± 52.5 ms    621
I-SVM      96.79 ± 1.17%    97.23 ± 1.20%    99.40 ± 3.27%     324.5 ± 22.4 ms    7.6 ± 0.3 ms        45.2 ± 2.6
IHDR       96.54 ± 1.83%    96.79 ± 1.04%    96.31 ± 2.05%     12.2 ± 1.3 ms      21.5 ± 1.7 ms       689
MILN       97.14 ± 1.27%    97.93 ± 1.63%    95.46 ± 2.54%     109.1 ± 3.2 ms     42.6 ± 0.4 ms       100
The generated sparse code representations 18 are processed through the MILN. The advantage of a MILN is that there are no global rules for learning, such as the minimization of mean square error for a pre-collected set of inputs and outputs. Thus with MILN, each neuron learns on its own, as a self-contained entity using its own internal mechanisms. As shown in Algorithm 2, a top-down input is provided. Initially, the top-down input is an external signal that is labeled as being in a particular classification. Subsequent signals gathered from the system's sensors are then compared with the initial signal for classification. The mechanisms contained in the MILN, along with each neuron's stimulation, affect the neuron's features over time, thus enabling a learning process. As shown in FIG. 7, the MILN network includes two layers to perform learning: a first layer which recognizes the sparse code representation 18, and a second layer which associates the recognized layer with the classes. Obviously, additional layers may be provided for further classifying the input in each class into sub-classes. In operation, the inputs are processed through the following algorithm, which classifies the sparse code representations 18 into predetermined categories:
ALGORITHM 2
MILN

1: For l = 1, ..., L-1, set the output at layer l at time t = 0 to be
