`(12) Patent Application Publication (10) Pub. No.: US 2004/0120581 A1
`Ozer et al.
(43) Pub. Date: Jun. 24, 2004
`
`
`(54) METHOD AND APPARATUS FOR
`AUTOMATED VIDEO ACTIVITY ANALYSIS
`
(52) U.S. Cl. .......................... 382/224; 382/190; 382/173
`
`(76)
`
`Inventors:
`
`I. Bllrak Ozer, Plainsboro, NJ (US);
`Wayne H. Wolf, Princeton, NJ (US);
`Tiehan Lu, Princeton, NJ (US)
`
Correspondence Address:
Richard C. Woodbridge,
Synnestvedt Lechner & Woodbridge, LLP
P.O. Box 592
Princeton, NJ 08542-0592 (US)
`
(21) Appl. No.: 10/649,418
`
`(22)
`
`Filed:
`
`Aug. 27, 2003
`
`Related U.S. Application Data
`
`(60)
`
`Provisional application No. 6U;’406,567, filed on Aug.
`27, 2002.
`
`Publication Classification
`
`(51)
`
`lnt.C|." ........................... .. coax 9:45; c;ots1< W62;
`coax 9:34
`
`(57)
`
`ABSTRACT
`
`The invention is a new method and apparatus that can be
`used to detect,
`recogniwe, and analyze people or other
`objects in security checkpoints, public—places, parking lots,
`or in similar environments under surveillance to detect the
`presence of certain objects ol‘ interests (e.g., people), and to
`identify their activities for security and other purposes in
`real-time. The system can detect a wide range of activities
`for different applications. The method detects any new
`object introduced into a known environment and then clas-
`silies the object regions to human body parts or to other
`non-rigid and rigid objects. By comparing the detected
`objects with the graphs from a database in the system, the
`methodology is able to identify object parts and to decide on
`the presence of the object of interest (human, bag, dog, etc.)
`in video sequences. The system tracks the movement of
`dilferent object parts in order to combine them at a later
`stage to high—level semantics. For example,
`the motion
`pattern of each human body part is compared to the motion
`pattern of the known activities. The recogniired movements
`ol‘ the body parts are combined by a classilier to recognize
`the overall activity of the human body.
`
[Representative drawing: reproduction of the FIG. 4 block diagram: 401 Video input; 402 Background elimination & color transformation; 403 Segmentation; 404 Contour generation; 405 Ellipse fitting; 408 Multiple frame classification]

Page 1 of 12
RPX Exhibit 1005
RPX v. MD Security
`
`
`
`Patent Application Publication
`
`Jun. 24, 2004 Sheet 1 of 5
`
US 2004/0120581 A1
`
Figure 1

[Drawing: object parts with fitted closed curves; labels: head region 101, hand regions 100]
`
`
`
`
`
Figure 2

[Drawing: examples of recognized activities: walking right, walking left, moving down, moving up, pointing right, pointing left, opening arms, closing arms (reference numerals 203, 205, 207, 209, 210, 211, 212)]
`
`
`
`
`
Figure 3

[Drawing: camera viewing region of interest 302, connected to a host computer]
`
`
`
`
`
`
`
Figure 4

[Drawing: block diagram: 401 Video input; 402 Background elimination & color transformation; 403 Segmentation; 404 Contour generation; multiple frame classification]
`
`
`
`
`
Figure 5

[Drawing: example of contour generation: input frame 501 and output frame with contours 502]
`
`
`
`
US 2004/0120581 A1
`
`Jun. 24, 2004
`
`METHOD AND APPARATUS FOR AUTOMATED
`VIDEO ACTIVITY ANALYSIS
`
CROSS REFERENCE TO RELATED APPLICATIONS
`
[0001] This application claims the priority of provisional U.S. Application Serial No. 60/406,567, filed on Aug. 27, 2002 and entitled "A System For Object Detection And Motion Classification In Compressed And Uncompressed Domains" by I. Burak Ozer and Wayne H. Wolf, the entire contents and substance of which are hereby incorporated in total by reference.
`
BACKGROUND OF THE INVENTION
`
`[0002]
`
`1. liield of Invention
`
[0003] The invention is a new method and apparatus to detect the presence of articulated objects, e.g. the human body, and rigid objects, and to identify their activities in compressed and uncompressed domains and in real time. The invention is used in a multiple-camera system that is designed for use in indoor and outdoor environments. Possible applications of the invention include law enforcement (e.g., security checkpoints), home security, public places, experimental social sciences, entertainment (e.g., virtual rooms, smart rooms), monitoring of interiors (e.g., of a plane, car, or train), and monitoring of outdoor environments (e.g., streets, bus stops, roadsides).
`
`[0004]
`
`2. Description of Related Art
`
`BACKGROUND OF THE INVENTION
`
`gestures in the database. Unlike the invention described
`herein, this approach is not modular as it recognizes the
`gesture of the whole human body figure. The same gesture,
`e.g. arm flapping can be performed by different subjects,
`birds, human, etc., where the subject of interest
`is not
`identified by the system. Another drawback of such a system
`is that it can easily fail when the subject figure is occluded.
`[0009] As described in a patent entitled “Method and
`Apparatus for Detecting and Quantifying Motion of a Body
`Part”, U.S. Pat. No. 5,148,477, a system for body part
`motion is invented. Unlike the invention described herein,
`this approach is adapted to analyze facial movement, e.g.
`movement of eyebrows. The system does not classify dif-
`ferent body parts of the human, it assumes that the object of
`interest
`is face. Unlike the system described herein,
`the
`system is purely dependent on the pixel change between two
`frames without using any classification and recognition
`information and any high level semantics.
`[0010] U.S. Pat. No. 6,249,606 describes a system for
`computer input using a cursor device in which gestures
`made by a person controlling the cursor are recognized. In
`contrast, our system is not limited to use with a cursor device
`or to computer input applications. U.S. Pat. No. 6,222,465,
`U.S. Pat. No. 6,147,678, U.S. Pat. No. 6,204,852, and U.S.
`Pat. No. 5,454,043 describe computer input systems that
`recognize hand gestures;
`in contrast, our system is not
`limited to computer control of a virtual environment or to
`hand gestures. U.S. Pat. No. 6,057,845 and U.S. Pat. No.
`5,796,406 are also directed to computer input devices and
`not the more general case of activity analysis solved by our
`invention.
`
`[0005] Recent advances in camera and storage systems are
`main factors driving the increased popularity of video sur-
`veillance. Prices continue to drop on components e.g.
`(.‘M()S cameras while manufacturers have added more fea-
`
`tures. Furthermore, the evolution of digital video especially
`in digital video storage and retrieval systems is another
`leading factor. Besides the expensive surveillance systems,
`today’s PC—based, easy plug—in surveillance systems are
`directed at home users and small business owners who
`cannot afford the expense of investing thousand of dollars
`for a security system. Real time monitoring from anywhere,
`anytime enable keeping a watchful eye on security areas,
`offices, stores, houses, pools or parking garages.
`
`[0006] Although these surveillance systems are powerful
`with new advances in camera and storage systems, auto-
`matic information retrieval from the sequences, e.g. rigid
`and non-rigid object detection and activity recognition in
`compressed and uncompressed domains, is not mature yet.
`These topics are still open areas for many research groups in
`industry, government, and academy.
`
`[0007] Early activity recognition systems used beacons
`carried by the subjects. Ilowever, a system that uses video
`avoids the need for beacons and allows the system to
`recognize activities that can be used to command the opera-
`tion of the environment.
`
`[0011] As described in the patent application entitled
`"Method of detecting and tracking groups of people” by
`Myron D.
`Flickner, U.S.
`patent
`application No.
`20030107649, a human tracking and detection system is
`invented that compares objects to “silhouette” templates to
`identify human and then uses tracking algorithm to deter-
`mine the trajectory of people. This system does not try to
`understand the activity of the people, nor does it try to find
`the human-object interaction as our invention can do.
`[0012] As described in a paper delivered at the VVorkshop
`on Artificial
`Intelligence for Web Search 2000 entitled
`“Wsual Event Classification via Force Dynamics” authored
`by Siskind, a system, which classifies simple motion events,
`e.g. pick up and put down by using single camera input is
`presented. The system uses “force-dynamic” relations to
`distinguish between event types. A human hand performs
`pick—up and put—down gesture. The system works for stable
`background and colored objects. However,
`the system
`doesn’t identify hand or other objects in the scene.
`[0013] As reported in the IEEE Computer Vision and
`Pattern Recognition Proceedings 1997, entitled "Coupled
`Ilidden Markov Models {IIMM} for Complex Action Rec-
`ognition” by Matthew Brand, Nuria Oliver, and Alex Pent-
`land, a hand gesture recognition system is described. The
`system recognizes certain Chinese martial art movemenLs.
`Ilowever, the hands are assumed to be recognimed a-priori.
`The system doesn’t detect and classify hands before gesture
`recognition step. The movement of one hand depends on the
`[0008] As described in patents entitled “Method and
`movement of the second hand, where freedom of motion of
`Apparatus for real—time gesture recognition” by Katerina H.
`the hands is limited by the martial art movements.
`Nguyen, U.S. Pat. Nos. 6,072,494 and 6,256,033, a gesture
`[0014]
`Parameterimed-IIMM, as reported in |I:ll£.I;' Trans-
`recognition system is invented that compares the input
`gesture of the subject e.g. human figure with the known
`actions on Pattern Recognition and Machine Intelligence,
`Page7of12
`Page 7 of 12
`
`
`
`US 2004/0120581 A]
`
`Jun. 24, 2004
`
`Volume 21, No 9, September 1999, entitled, "Parametric
`Hidden Markov Models for Gesture Recognition” authored
`by Wilson and Bobick, can recognize complex events e.g. an
`interaction of two mobile objects, gestures made with two
`hands (eg. so big, so small), etc. One of the drawbacks ol‘
`the parameterized HMM is that for complex events (e.g. a
`combination of sub—events) parameter training space may
`become very large.
`
[0015] In summary, most activity recognition systems are suitable only for a specific application type. The invention described herein can detect a wide range of activities for different applications. For this reason, the scheme detects different object parts and their movements in order to combine them at a later stage that connects to high-level semantics. Each object part has its own freedom of motion, and the activity recognition for each part is achieved by using several HMMs in parallel.
`
`SUMMARY OF THE INVENTION
`
[0016] Explanation of some terms and abbreviations that are used throughout the text:

[0017] Parametric representation: Using abstract shapes with several parameters (typically fewer than 50) to represent a complex shape object such as human body parts.

[0018] Video sequence: A sequence of images that shows one or more activities recognizable by a human being. A video sequence can be any video recording or medium (e.g., an MPEG-1 file, a video tape, a video disc, a DVD, etc.).

[0019] FPGA: An FPGA, or Field Programmable Gate Array, is a type of programmable or configurable circuit.

[0020] Platform FPGA: High-density FPGAs used to provide the core function of a system instead of just being used as "glue logic" that coordinates the main functional units.

[0021] Special-purpose hardware: Any single-purpose hardware unit, including but not limited to one or more FPGAs or other configurable logic, ASIC(s), or custom chip(s).

[0022] Video signal processor: A programmable computer used for video processing.

[0023] TriMedia processor: A series of video processors produced by TriMedia Inc. (now part of Philips).

[0024] PC: General personal computer (including desktop computers, servers, and/or laptop computers).

[0025] The present system can recognize activities in the compressed and uncompressed domains. Depending on the application and image resolution, either a compact or a modular representation of the object is used.

[0026] First, in order to recognize the overall human body posture in the compressed domain, an eigenspace representation of human silhouettes obtained from the AC values of the Discrete Cosine Transform (DCT) coefficients is used.

[0027] The system of the present invention can use AC-DCT coefficient differences and compare them in order to classify the up/down and left/right movements of the human body silhouette in real time.

[0028] The posture recognition result helps the system decide between two possibilities: storing the frames with a particular posture, e.g. pointing to the right, in a database, and/or processing input frames, starting with this particular posture, for activity recognition purposes for a certain number of frames or until another particular posture is detected. The system is capable of using stored compressed-domain inputs (e.g., MPEG inter-frames) as well as real-time uncompressed-domain video inputs.

[0029] The system of the present invention can detect non-rigid (e.g., human body) and rigid object parts and recognize their activities in compressed and uncompressed domains. To achieve this, a method with two levels, namely a low level and a high level, is used. The low-level part performs object detection and extracts parameters for the abstract graph representation of the image being processed. The high-level part uses dynamic programming to determine the activities of the object parts, and uses a distance classifier to detect specific activities.

[0030] The low-level part performs object detection and extracts parameters for the abstract graph representation of the frame being processed in real time. Local consistency based on low-level features and geometrical characteristics of the object regions is used to group object parts. Furthermore, higher-order shape metrics are needed for the representation of complex objects. The object is decomposed so that it is represented as a combination of component shapes. The result will be unaffected by a partial occlusion of the object.

[0031] The system is capable of managing the segmentation process by using object-based knowledge in order to group the regions according to a global consistency, and of introducing a new model-based segmentation algorithm that uses feedback from the relational representation of the object. The major advantages of the model-based segmentation can be summarized as improving the object extraction by reducing the dependence on the low-level segmentation process and combining the boundary and region properties. Furthermore, the features used for segmentation are also attributes for object detection in the relational graph representation.

[0032] The system is also capable of 2D approximation of object parts by fitting closed curves with shape-preserving deformations, which provides satisfactory results. This helps to disregard the deformations due to clothing.

[0033] The selected unary and binary attributes are further extended for application-specific algorithms.

[0034] Object detection is achieved by matching the relational graphs of objects with the reference model. This method maps the attributes, interprets the match, and checks the conditional rules in order to index the parts correctly. This method improves object extraction accuracy by reducing the dependency on the low-level segmentation process and combining the boundary and region properties. Furthermore, the features used for segmentation are also attributes for object detection in the relational graph representation. This property makes it possible to adapt the segmentation thresholds by a model-based training system.

[0035] After the detection of the object parts, the system is ready to recognize the activities of each object part and the overall activity of the object.
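Paragraph [0026] names an eigenspace representation of silhouettes built from AC-DCT coefficients but gives no formulas. The Python sketch below is one possible reading under stated assumptions, not the patented method: silhouettes are binary images, every 8x8 block is transformed with an orthonormal DCT-II, the 63 AC coefficients of all blocks are concatenated into one vector, the eigenspace is obtained by principal component analysis (an SVD of the centered training vectors), and a new silhouette receives the posture label of its nearest training projection. The helper names, the nearest-neighbour rule and the number of components are hypothetical.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis: for a block B, C @ B @ C.T gives its 2-D DCT."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

DCT8 = dct_matrix(8)

def ac_features(silhouette):
    """Concatenate the AC coefficients (everything except DC) of each 8x8 block."""
    img = np.asarray(silhouette, dtype=float)
    h, w = img.shape
    feats = []
    for r in range(0, h - h % 8, 8):
        for c in range(0, w - w % 8, 8):
            coeffs = DCT8 @ img[r:r + 8, c:c + 8] @ DCT8.T
            feats.append(coeffs.ravel()[1:])  # index 0 is the DC coefficient
    return np.concatenate(feats)

def build_eigenspace(silhouettes, n_components=2):
    """PCA of the AC feature vectors: returns mean, basis rows and training projections."""
    X = np.stack([ac_features(s) for s in silhouettes])
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = vt[:n_components]
    return mean, basis, (X - mean) @ basis.T

def classify_posture(silhouette, eigenspace, labels):
    """Label of the nearest training silhouette in the eigenspace."""
    mean, basis, projections = eigenspace
    y = (ac_features(silhouette) - mean) @ basis.T
    return labels[int(np.argmin(np.linalg.norm(projections - y, axis=1)))]
```

Because the basis rows are orthonormal, projecting never increases distances, so a silhouette that differs from a training posture by a few pixels stays close to that posture in the eigenspace. In the compressed domain the AC coefficients would be taken from the bitstream rather than recomputed from pixels.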
`
[0036] For example, if the object of interest is a human body, the system will first detect different object parts, e.g. hands, head, arms, legs, and torso, and compare these part attributes with the human model attributes via graph matching. If the object of interest is a rigid object, the system will detect object parts and compare the attributes of these parts with the object model via graph matching. The high-level part uses a pattern classifier, namely Hidden Markov Models, which classifies activity patterns of the body parts in space-time and determines the movements of the object parts. It also uses a distance classifier to detect specific gestures and activities. For articulated objects like the human body, the system will find the activities of each body part independently of each other and, at a later stage, combine these activities by using a quadratic distance classifier to find the gesture and overall activity of the human body.
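The document relies on quadratic classifiers at several points (the quadratic distance classifier above, the piecewise quadratic Bayesian classifier with discriminant g(X) in the graph-matching step, and the quadratic Mahalanobis classifier in the claims) without giving formulas. The sketch below assumes the textbook Gaussian quadratic discriminant g_i(X) = -1/2 (X - mu_i)^T Sigma_i^-1 (X - mu_i) - 1/2 ln|Sigma_i| + ln P(w_i), with the mean mu_i, covariance Sigma_i and prior P(w_i) of each class estimated from training feature vectors. The function names, the class labels in the usage note and the small covariance regularizer are illustrative assumptions, not the patented implementation.

```python
import numpy as np

def fit_classes(samples_by_class, reg=1e-6):
    """Estimate mean, inverse covariance, log-determinant and log-prior per class."""
    total = sum(len(s) for s in samples_by_class.values())
    models = {}
    for label, samples in samples_by_class.items():
        X = np.asarray(samples, dtype=float)
        mu = X.mean(axis=0)
        # A small ridge keeps the covariance invertible when samples are few.
        cov = np.atleast_2d(np.cov(X, rowvar=False)) + reg * np.eye(X.shape[1])
        _, logdet = np.linalg.slogdet(cov)
        models[label] = (mu, np.linalg.inv(cov), logdet, np.log(len(X) / total))
    return models

def discriminant(x, model):
    """g_i(x) = -0.5 * (x - mu)^T Sigma^-1 (x - mu) - 0.5 * ln|Sigma| + ln P(w_i)."""
    mu, inv_cov, logdet, log_prior = model
    d = np.asarray(x, dtype=float) - mu
    return -0.5 * d @ inv_cov @ d - 0.5 * logdet + log_prior

def classify(x, models):
    """Assign x to the class with the largest quadratic discriminant."""
    return max(models, key=lambda label: discriminant(x, models[label]))
```

For example, after `models = fit_classes({"hand": [[1.0, 0.2], [1.2, 0.1], [0.9, 0.3], [1.1, 0.25]], "head": [[3.0, 1.0], [3.2, 1.1], [2.9, 0.9], [3.1, 1.05]]})`, the feature vector `[1.05, 0.2]` is assigned to "hand". The first term of g_i is the squared Mahalanobis distance, so with equal covariances and priors this reduces to a Mahalanobis nearest-mean rule.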
`
[0037] Note that each object part has its own freedom of motion. The activity recognition for each part is achieved by using several HMMs in parallel.
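The per-part activity recognition of [0037] can be sketched with discrete hidden Markov models: the centre-of-gravity track of one part is quantized into direction symbols, each known activity has its own HMM, and the scaled forward algorithm gives the log-likelihood of the symbol sequence under every model, the most likely model being chosen. The four-symbol alphabet, the two-state models and all probabilities below are made up for illustration; in practice the parameters would be trained (e.g., with Baum-Welch), and one set of models would run per object part in parallel.

```python
import numpy as np

# Hypothetical direction alphabet for the centre-of-gravity movement of one part.
RIGHT, LEFT, UP, DOWN = range(4)

def directions(centroids):
    """Quantize frame-to-frame centroid displacements (x right, y down) into symbols."""
    symbols = []
    for (x0, y0), (x1, y1) in zip(centroids, centroids[1:]):
        dx, dy = x1 - x0, y1 - y0
        if abs(dx) >= abs(dy):
            symbols.append(RIGHT if dx >= 0 else LEFT)
        else:
            symbols.append(DOWN if dy > 0 else UP)
    return symbols

def log_likelihood(obs, start, trans, emit):
    """log P(obs | model) for a discrete HMM, using the scaled forward algorithm."""
    if not obs:
        raise ValueError("need at least one observation")
    alpha = start * emit[:, obs[0]]
    log_p = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ trans) * emit[:, o]
        scale = alpha.sum()
        log_p += np.log(scale)
        alpha = alpha / scale  # rescale to avoid numerical underflow
    return log_p

def toy_model(dominant):
    """Made-up two-state HMM that mostly emits one direction symbol."""
    start = np.array([0.5, 0.5])
    trans = np.array([[0.9, 0.1], [0.1, 0.9]])
    emit = np.full((2, 4), 0.05)
    emit[:, dominant] = 0.85
    return start, trans, emit

def recognize(centroids, models):
    """Return the activity whose HMM gives the highest log-likelihood."""
    obs = directions(centroids)
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))
```

For example, with `models = {"walking right": toy_model(RIGHT), "walking left": toy_model(LEFT)}`, the track `[(10, 50), (14, 50), (19, 51), (23, 50)]` is quantized to three RIGHT symbols and recognized as "walking right".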
`
[0038] Combining activities for different rigid and non-rigid object parts and generating scenarios are purely application-dependent issues. For each type of application, the part activities are combined with different weights to generate different scenarios for that particular application.
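Paragraph [0038] leaves the weighting application-dependent. A minimal sketch, assuming each part-level recognizer outputs a probability per (part, activity) pair and each application defines hypothetical weights for the scenarios it cares about; the scenario with the highest weighted sum is reported. Part names, activity names and weights are invented for illustration.

```python
def scenario_scores(part_probs, scenarios):
    """Weighted sum of part-activity probabilities for each scenario."""
    return {
        name: sum(w * part_probs.get(key, 0.0) for key, w in weights.items())
        for name, weights in scenarios.items()
    }

def best_scenario(part_probs, scenarios):
    """Name of the scenario with the highest weighted score."""
    scores = scenario_scores(part_probs, scenarios)
    return max(scores, key=scores.get)

part_probs = {
    ("right arm", "pointing right"): 0.8,
    ("torso", "still"): 0.9,
    ("legs", "walking"): 0.1,
}
# Hypothetical weights for a checkpoint-monitoring application.
scenarios = {
    "pointing while standing": {("right arm", "pointing right"): 0.6, ("torso", "still"): 0.4},
    "walking past": {("legs", "walking"): 1.0},
}
print(best_scenario(part_probs, scenarios))  # pointing while standing
```

A different application would reuse the same part-level outputs with its own weight table, which is how the same recognizers can serve different scenarios.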
`
[0039] Other advantages of the invention are that the system is fast and robust, and has very low latency and a high accuracy rate for rigid and non-rigid object part detection and activity classification. Additionally, compressed-domain methods reduce computational complexity, avoid dependency on correct segmentation, and reduce storage and bandwidth requirements. Finally, a multi-camera/multi-processor system using a PC as a host allows the algorithms to be evaluated on real-time data.
`
[0040] FIG. 3 is a schematic view of the object detection and activity recognition apparatus. The apparatus includes a monitor, analog or digital cameras, and a personal computer (PC) including a database for objects, a database for activities, a video capturer, a central processor, buffers, and other memory units.

[0041] The video frames are sent from the camera to the video capturer on the PC. The video capturer converts the frames sent from the camera from analog format to digital format and stores the color components of the current frame in three different buffers.

[0042] The color components stored in the buffers are further processed by the central processor to classify the object regions. Region parameters are compared with the object parameters in the object database, and the region is classified.

[0043] Spatial information of the classified object regions is stored in the memory unit.

[0044] After a certain number of frames, the sequential spatial information stored in the memory unit is compared with the activity database by the central processor for each object region in parallel.

[0045] The output activities of the object parts are further processed by the central processor to find the overall activity of the object.

BRIEF DESCRIPTION OF THE DRAWINGS

[0046] FIG. 1 is a pictorial view of the object parts and fitted closed curves (the object of interest is "human").

[0047] FIG. 2 is a diagrammatic, schematic view of some of the recognized activities.

[0048] FIG. 3 is a schematic view of the object detection and activity recognition apparatus.

[0049] FIG. 4 is a block diagram of our system.

[0050] FIG. 5 shows an example of contour generation.

DETAILED DESCRIPTION OF THE INVENTION

[0051] Overview

[0052] Most existing commercial video security systems are based only on simple motion detection algorithms and are unable to classify the detected object or to detect the occurrence of events, e.g. suspicious movement. On the other hand, although a great deal of research has been done on object detection and tracking, the existing solutions are not scalable and their real-time applicability is limited. Since personnel cannot pay constant attention to the monitors, the effectiveness of traditional surveillance systems is limited. Systems equipped with the proposed algorithm would increase effectiveness by calling attention to the cameras that capture unattended objects, such as, but not limited to, an unattended bag in a metro station or an unattended package next to a public building; by identifying intrusion and detecting suspicious activities; by reducing the need for monitors and supervisory personnel; and by transferring low-level video data to other security points in real time for immediate response. Our invention is also useful for applications other than security, such as tracking the activities of employees for human factors studies or identifying the activity of customers for marketing studies. As shown in FIG. 1, the system can identify each object part separately (in this case hands 100, head 101, and torso 102) after comparing the object attributes with the model database via a graph-matching algorithm.

[0053] In this system, the user can determine the actions taken when the system helps the user identify suspicious activity. The system can easily be set up to classify several objects of interest, such as, but not limited to, humans, bags, and dogs, and to recognize a wide variety of activities, such as gestures ranging from pointing a gun to waving arms, leaving unattended objects, entering a prohibited area, tailgating at security doors, or spending too much or too little time in an area, or to detect the direction of movement of a rigid object, such as, but not limited to, a truck speeding towards the security gate of a nuclear facility.
`
[0054] Libraries of activities determine what events cause the system to set an alarm. The user can add to the library and turn actions in the library on and off at will. Most video analysis systems use simple methods such as motion detection. However, since motion detection and tracking do not know what is moving, they can easily generate false alarms as well as miss important events. In contrast, this system builds a model of the object of interest on each video frame. It tracks only objects that fit the model of the user-defined subject, such as a human, any rigid object, or a dog. This makes the system more accurate because it easily rejects many elements in the scene that may be moving but are not objects of interest.
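The library behaviour described in [0054] (users add activities and switch them on or off, and only enabled activities raise alarms) can be sketched as a small data structure. The class and method names are hypothetical; the detected activity names would come from the recognition stages described in this document.

```python
from dataclasses import dataclass, field

@dataclass
class ActivityLibrary:
    """Hypothetical alarm library: activity name -> whether it raises an alarm."""
    entries: dict = field(default_factory=dict)

    def add(self, activity, enabled=True):
        """Add an activity to the library (enabled by default)."""
        self.entries[activity] = enabled

    def set_enabled(self, activity, enabled):
        """Turn an existing library entry on or off."""
        if activity not in self.entries:
            raise KeyError(f"unknown activity: {activity}")
        self.entries[activity] = enabled

    def alarms(self, detected):
        """Detected activities that are in the library and switched on."""
        return [a for a in detected if self.entries.get(a, False)]
```

For example, after adding "leaving unattended object" and "tailgating at security door" and switching the second one off, `alarms(["walking", "leaving unattended object", "tailgating at security door"])` returns only `["leaving unattended object"]`; activities that are not in the library never trigger an alarm.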
`
[0055] FIG. 3 illustrates our overall system in use. In this figure, camera 301 views a region of interest 302. Camera 301 is connected to a computer 303. The computer 303 may keep a database of graphs, HMM models, and other information used during video analysis.
`
[0056] FIG. 4 shows a block diagram of our invention.
`The video input 401 may be from an analog camera whose
`video data has been suitably digitized or from a digital
`camera. A variety of video input formats can be used. The
`various elements in the block diagram will be described in
`more detail below.
`
`[0057] This patent describes exemplary implementations
`of our invention but
`the invention is not limited to the
`components and details described here.
`
`[0058] Early Stage Analysis
`
[0059] Background elimination and color transformation: The first step (402) is the transformation of pixels into another color space chosen according to the application. Background elimination is performed by using these transformed pixel values for the current and background images. The foreground-background separation is achieved by comparing the DC coefficients of the foreground object with the DC coefficients of the background object via a statistical method.
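Paragraph [0059] compares DC coefficients statistically but does not state the statistic. A minimal sketch, assuming a per-block Gaussian background model in the spirit of the mean and standard deviation models of claim 2: frames are converted to a luminance channel (full-range BT.601 YCbCr is assumed here as the "other color space"), the DC coefficient of every 8x8 block is computed from the pixels (for an orthonormal 8x8 DCT-II it equals 8 times the block mean, so no transform is needed), and a test block is marked as foreground when its DC value departs from the background mean by more than max(k*sigma, min_dev). In a compressed stream the DC values would be read from the bitstream instead; the thresholds k and min_dev are hypothetical.

```python
import numpy as np

BLOCK = 8  # DCT block size used by JPEG/MPEG coders

def rgb_to_ycbcr(rgb):
    """Full-range BT.601 (JFIF) RGB -> YCbCr; rgb has shape (H, W, 3)."""
    m = np.array([[0.299, 0.587, 0.114],
                  [-0.168736, -0.331264, 0.5],
                  [0.5, -0.418688, -0.081312]])
    ycc = np.asarray(rgb, dtype=float) @ m.T
    ycc[..., 1:] += 128.0
    return ycc

def block_dc(channel):
    """DC coefficient of each 8x8 block (orthonormal 2-D DCT-II: DC = 8 * block mean)."""
    h, w = channel.shape
    hb, wb = h // BLOCK, w // BLOCK
    blocks = channel[: hb * BLOCK, : wb * BLOCK].reshape(hb, BLOCK, wb, BLOCK)
    return BLOCK * blocks.mean(axis=(1, 3))

def background_model(frames):
    """Per-block mean and standard deviation of the DC values over background frames."""
    dcs = np.stack([block_dc(f) for f in frames])
    return dcs.mean(axis=0), dcs.std(axis=0)

def foreground_blocks(frame, model, k=3.0, min_dev=10.0):
    """True where a block's DC departs from the background by more than max(k*std, min_dev)."""
    mean_dc, std_dc = model
    return np.abs(block_dc(frame) - mean_dc) > np.maximum(k * std_dc, min_dev)
```

The min_dev floor keeps nearly noise-free blocks, whose standard deviation is close to zero, from flagging tiny fluctuations as foreground. The resulting block mask is coarse (one decision per 8x8 block) and would be refined by the segmentation step.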
`
`[0060] Segmentation: In 403, the foreground regions are
`extracted and the object of interest is segmented hierarchi-
`cally into its smaller unique parts based on the combination
`of color components and statistical shape features after
background elimination. The meaningful adjacent segments
`are combined and used as the input of the following algo-
`rithm steps.
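The hierarchical segmentation of [0060] combines color and shape features and is not specified further. As a simplified stand-in, the sketch below assumes that each foreground pixel has already been assigned a small integer color class (0 meaning background) and groups 4-connected pixels of the same class into regions with a breadth-first search; merging meaningful adjacent segments would happen afterwards. The function name and the input encoding are assumptions.

```python
from collections import deque

import numpy as np

def label_regions(classes):
    """Group 4-connected pixels that share the same non-zero colour class.

    classes: 2-D int array, 0 = background, 1..K = quantized colour classes.
    Returns (labels, count): labels is 0 for background and 1..count for regions.
    """
    classes = np.asarray(classes)
    rows, cols = classes.shape
    labels = np.zeros((rows, cols), dtype=int)
    count = 0
    for r in range(rows):
        for c in range(cols):
            cls = classes[r, c]
            if cls == 0 or labels[r, c]:
                continue  # background, or already part of a region
            count += 1
            labels[r, c] = count
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of one region
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and labels[ny, nx] == 0 and classes[ny, nx] == cls):
                        labels[ny, nx] = count
                        queue.append((ny, nx))
    return labels, count
```

Each labeled region can then be passed, as a binary mask `labels == i`, to the contour-following and ellipse-fitting steps.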
`
[0061] Contour following: Contour points of the segmented regions are extracted and stored (404). FIG. 5 gives an example of contour following: the frame 501 given to the contour following algorithm results in the output frame with the contour 502.
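A common way to implement the contour-following step of [0061] is Moore-neighbour boundary tracing, sketched below under the assumption that the input is a binary mask of one segmented region. This is a substitute technique: claim 7 describes a scheme with several 3 by 3 windows shifted in independent directions, which this sketch does not reproduce. The tracer starts at the first foreground pixel in raster order, walks the outer boundary clockwise, and stops when it is about to repeat its first move.

```python
import numpy as np

# 8-neighbour offsets (row, col) in clockwise order starting at West.
DIRS = [(0, -1), (-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1)]

def trace_contour(mask):
    """Moore-neighbour tracing of the outer boundary of the first region in raster order."""
    rows, cols = mask.shape

    def is_fg(r, c):
        return 0 <= r < rows and 0 <= c < cols and bool(mask[r, c])

    start = next(((r, c) for r in range(rows) for c in range(cols) if mask[r, c]), None)
    if start is None:
        return []

    def step(p, k):
        """Scan p's neighbours clockwise from backtrack direction k; return next pixel and backtrack."""
        for i in range(1, 9):
            d = (k + i) % 8
            q = (p[0] + DIRS[d][0], p[1] + DIRS[d][1])
            if is_fg(*q):
                b = DIRS[(k + i - 1) % 8]  # last background neighbour checked
                back = (p[0] + b[0] - q[0], p[1] + b[1] - q[1])
                return q, DIRS.index(back)
        return None, None  # isolated pixel

    contour = [start]
    first, k = step(start, 0)  # start's West neighbour is background by construction
    if first is None:
        return contour
    p = first
    while p != start or step(p, k)[0] != first:
        contour.append(p)
        p, k = step(p, k)
    return contour
```

For a 3x3 square of foreground pixels at rows and columns 1 to 3 of a 5x5 mask, the tracer returns the eight boundary pixels clockwise from (1, 1). One-pixel-wide parts are visited twice (out and back), which is the expected behaviour of boundary tracing.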
`
[0062] Ellipse fitting: This step (405) fits ellipses to the contours. Even when the object of interest is not occluded by another object, an object part can be occluded in different ways because of the possible positions of non-rigid parts. In this case, 2D approximation of parts by fitting ellipses with shape-preserving deformations provides more satisfactory results. It also helps to discard such deformations.
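The document does not say how the ellipses are fitted. The sketch below swaps in a simple, well-known choice as an assumption: the moment-based (equivalent) ellipse of a segmented region, which has the same centroid and second-order moments as the region's pixels. For a filled ellipse the variance along each axis equals (semi-axis / 2)^2, so the semi-axes follow from the eigenvalues of the pixel covariance and the orientation from the principal eigenvector. A least-squares conic fit to the contour points would be an alternative.

```python
import numpy as np

def fit_ellipse(mask):
    """Moment-based ellipse of a binary region: centre (x, y), semi-axes, orientation."""
    ys, xs = np.nonzero(mask)
    pts = np.column_stack([xs, ys]).astype(float)
    center = pts.mean(axis=0)
    cov = np.cov(pts, rowvar=False, bias=True)
    evals, evecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    evals = np.clip(evals, 0.0, None)        # guard against tiny negative round-off
    minor, major = 2.0 * np.sqrt(evals)      # filled ellipse: variance = (semi-axis / 2)^2
    vx, vy = evecs[:, 1]                     # direction of the major axis
    theta = float(np.arctan2(vy, vx))        # radians, image coordinates (y down)
    return (float(center[0]), float(center[1])), float(major), float(minor), theta
```

Because the fit uses all pixels of the region rather than only its outline, small notches from clothing or partial self-occlusion shift the parameters only slightly.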
`
[0063] Object modeling by invariant shape attributes: For object detection, it is necessary to select part attributes which are invariant to two-dimensional transformations and are maximally discriminating between objects (406).
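Paragraph [0063] does not list the attributes. As an assumption for illustration, the sketch below derives unary and binary attributes from ellipse parameters ((cx, cy), semi-major a, semi-minor b, orientation theta) that are invariant to 2D translation, rotation and uniform scaling: the axis ratio of each part, the area ratio and centre distance (normalized by the first part's major axis) of a pair of parts, and the undirected angle between their major axes. The attribute names and the `similarity` helper, used only to check invariance, are hypothetical.

```python
import math

def unary_attributes(ellipse):
    """Similarity-invariant attributes of one part."""
    _, major, minor, _ = ellipse
    return {"axis_ratio": minor / major}

def binary_attributes(e1, e2):
    """Similarity-invariant attributes of a pair of parts."""
    (x1, y1), a1, b1, t1 = e1
    (x2, y2), a2, b2, t2 = e2
    angle = abs(t1 - t2) % math.pi
    return {
        "area_ratio": (a2 * b2) / (a1 * b1),
        "distance": math.hypot(x2 - x1, y2 - y1) / a1,
        "relative_angle": min(angle, math.pi - angle),  # axes have no direction
    }

def similarity(ellipse, angle, scale, shift):
    """Rotate, scale and translate an ellipse (used to check invariance)."""
    (x, y), a, b, t = ellipse
    c, s = math.cos(angle), math.sin(angle)
    return ((scale * (c * x - s * y) + shift[0], scale * (s * x + c * y) + shift[1]),
            scale * a, scale * b, t + angle)
```

For a head ellipse `((50.0, 20.0), 10.0, 8.0, 0.1)` and a torso ellipse `((50.0, 60.0), 30.0, 15.0, 1.5)`, the binary attributes stay the same after both are rotated by 0.7 rad, scaled by 2.5 and shifted, which is the property the graph-matching step needs.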
`
[0064] Graph matching: In this step (407), we compare the object model with a set of stored models. Each extracted region modeled with ellipses corresponds to a node in the graphical representation of the object of interest. Each object part and each meaningful combination of parts represents a class ω, where the combination of binary and unary features is represented by a feature vector X and computed off-line. The combination of segments is controlled by the reference model and by the rule generator. If the graph-matching algorithm cannot find a meaningful correspondence of the combined segments in the reference model, the combination is rejected and a new combination is generated. To determine the class of these feature vectors, a piecewise quadratic Bayesian classifier with discriminant function g(X) is used. The generality of the reference model attributes allows the detection of different kinds of models for the same object type, while the conditional rule generation decreases the rate of false alarms. The computations needed for each node matching are then a function of the feature size and the previously matched nodes of the branch under consideration. The marked regions are tracked by using ellipse parameters for consecutive frames, and the graph-matching algorithm is applied to new objects appearing in the other regions.

[0065] Classifying Over Multiple Frames

[0066] The output of the graph-matching algorithm is the set of classified object parts. The movements of the object parts are described as a spatio-temporal sequence of feature vectors that consist of the direction of the object part movement. The system checks the direction of the movements of the object parts for a number of frames, calculates the probabilities of the activities with respect to the known activities by using Hidden Markov Models, and chooses the pattern with the highest probability as the recognized activity in these frames (408). The activities of the parts are then combined by a quadratic distance classifier to find the overall activity of the object of interest. As shown in FIG. 2, frame 201 shows walking right and frame 202 shows walking left.

[0067] The basis of this invention is the combination of modular parts that form logical semantics at each stage. The lowest level consists of low-level regions. The combination of regions corresponds to object parts. The combination of object parts defines the object, while the combination of the movements of different parts determines the gestures and activities. The combination of the activities defines the event in the scene. The same approach is then used to teach the system different objects, activities, and events. Therefore, the user-defined semantics are entered in a top-down manner. For example, the event of pointing a gun can be decomposed into different levels, such as the relative location of the arms, hands, and torso and the combined movements of the arms and hands.

[0068] The attached Exhibit A and Exhibit B include disclosure materials directed to various aspects of the invention.

[0069] It will be understood that the foregoing description of the invention is by way of example only, and variations will be evident to those skilled in the art without departing from the scope of the invention, which is as set out in the appended claims.

1. A method for detecting the parts of a non-rigid object such as, but not limited to, a human body, and for recognizing gestures and activities of the object of interest in real-time video sequences, comprising:

a) Eliminating the background to obtain the foreground objects;

b) Detecting different regions of the foreground objects by using color and/or shape information;

c) Finding the contours of the areas cited in 1b);
`
`
`
`
d) Fitting closed curves to the contours cited in 1c);
`
`6. The method of claim 4 further comprising:
`
e) Computing unary and binary attributes for the closed curves cited in 1d);
`
f) Comparing the attributes cited in 1e) with the object attributes in the training data set via a matching algorithm and determining object parts after matching;
`
g) Combining adjacent segments and repeating steps 1c) through 1f);
`
`h) Storing 2D center of gravity coordinates of each object
`part in the buffers for certain number of frames;
`
i) Comparing the change of center of gravity coordinates with time for each object part cited in 1h) with the templates in the training data set and recognizing the activity of each part separately;
`
j) Combining the activities of the object parts cited in 1i) and recognizing the overall activity of the object of interest in the scene; and,
`
`k) Combining the gestures and activities of different
`objects to detect the event in the scene.
`2. The method of claim 1 wherein step 1151) further
`comprises:
`
a) Assigning a number to pixels contained in the segmented foreground regions and assigning another number to the non-foreground regions and generating a binary image; and,
`
b) Determining and storing contour coordinates of the segmented regions cited in claims 1b) and 6a) in a buffer.
`
`7. The method of claim 6 wherein step 6b) further
`comprises:
`
a) Initializing multiple 3 by 3 pixel windows;
`
b) Shifting the windows through the binary image cited in 6a) in different and independent directions; and,
`
c) Finding the next neighboring point on the outer boundary of the foreground object segment cited in 4a) after one of the window centers overlaps with the foreground object segment.
8. The method of claim 1 wherein step 1d) further comprises:
`
Approximating contours by fitting closed curves with shape-preserving deformations for minimizing the effect of occlusion and local deformations.
`
`a) Grabbing and digitizing several video frames under
`dijferent lighting changes;
`
9. The method of claim 1 wherein step 1e) further comprises:
`
b) Converting input color components into a single color space or a combination of color spaces such as, but not limited to, red-green-blue color space, luminance-chrominance color space, hue-saturation color space, etc., or a combination of them;
`
`Determining geometric invariant attributes for closed
`curves that are maximally discriminating between
`objects.
10. The method of claim 1 wherein step 1f) further comprises:
`
c) Generating a statistical model for background frames by using the mean and standard deviation of frame blocks and color components cited in 2b);
`
`a) Generating attribute feature vectors for each closed
`curve where each object part and meaningful combi-
`nations represent a class;
`
d) Grabbing and digitizing a test video frame and repeating step 2b) for the test frame;
`
b) Determining the class of the multi-dimensional feature vector by using a quadratic Mahalanobis classifier; and,
`
e) Generating a statistical model for the test frame by using the mean and standard deviation of frame blocks and color components cited in 2d); and,
`
f) Comparing the mean and/or standard deviation of the frame blocks of the test frame cited in 2e) with the mean and/or standard deviation of the background frames cited in 2c).
`3. The method of claim 2 wherein step 2f) further com-
`prises:
`
`eliminating the blocks with similar mean and standard
`deviation that are below a threshold from the test frame
`and generating a foreground region.
4. The method of claim 3 further comprising:
`
`Segmenting foreground regions cited in 3 of the object of
`interest hierarchica