(19) United States
(12) Patent Application Publication
     Ozer et al.
(10) Pub. No.: US 2004/0120581 A1
(43) Pub. Date: Jun. 24, 2004

(54) METHOD AND APPARATUS FOR AUTOMATED VIDEO ACTIVITY ANALYSIS

(76) Inventors: I. Burak Ozer, Plainsboro, NJ (US); Wayne H. Wolf, Princeton, NJ (US); Tiehan Lu, Princeton, NJ (US)

Correspondence Address:
Richard C. Woodbridge, Esq.
Synnestvedt Lechner & Woodbridge, LLP
P.O. Box 592
Princeton, NJ 08542-0592 (US)

(21) Appl. No.: 10/649,418
(22) Filed: Aug. 27, 2003

Related U.S. Application Data

(60) Provisional application No. 60/406,567, filed on Aug. 27, 2002.

Publication Classification

(51) Int. Cl.7: G06K 9/46; G06K 9/62; G06K 9/34
(52) U.S. Cl.: 382/224; 382/190; 382/173

(57) ABSTRACT
The invention is a new method and apparatus that can be used to detect, recognize, and analyze people or other objects in security checkpoints, public places, parking lots, or in similar environments under surveillance, to detect the presence of certain objects of interest (e.g., people), and to identify their activities for security and other purposes in real-time. The system can detect a wide range of activities for different applications. The method detects any new object introduced into a known environment and then classifies the object regions as human body parts or as other non-rigid and rigid objects. By comparing the detected objects with the graphs from a database in the system, the methodology is able to identify object parts and to decide on the presence of the object of interest (human, bag, dog, etc.) in video sequences. The system tracks the movement of different object parts in order to combine them at a later stage into high-level semantics. For example, the motion pattern of each human body part is compared to the motion patterns of the known activities. The recognized movements of the body parts are combined by a classifier to recognize the overall activity of the human body.
[Figure 1 (Sheet 1 of 5): pictorial view of the object parts with fitted closed curves — hand regions 100, head region 101, torso region 102.]

[Figure 2 (Sheet 2 of 5): diagrams 201-212 of some recognized activities — walking right, walking left, moving down, moving up, pointing right, pointing left, opening arms, closing arms.]

[Figure 3 (Sheet 3 of 5): schematic of the apparatus — region of interest 302 for Camera 1; host computer 304.]

[Figure 4 (Sheet 4 of 5): block diagram — 401 Video input, 402 Background elimination & color transformation, 403 Segmentation, 404 Contour generation, 405 Ellipse fitting, 408 Multiple frame classification.]

[Figure 5 (Sheet 5 of 5): example of contour generation — 501 regions, 502 contours.]
METHOD AND APPARATUS FOR AUTOMATED VIDEO ACTIVITY ANALYSIS

CROSS REFERENCE TO RELATED APPLICATIONS

[0001] This application claims the priority of provisional U.S. Application Serial No. 60/406,567, filed on Aug. 27, 2002 and entitled "A System For Object Detection And Motion Classification In Compressed And Uncompressed Domains" by I. Burak Ozer and Wayne H. Wolf, the entire contents and substance of which are hereby incorporated in total by reference.
BACKGROUND OF THE INVENTION

[0002] 1. Field of Invention

[0003] The invention is a new method and apparatus to detect the presence of articulated objects, e.g. the human body, and rigid objects, and to identify their activities in compressed and uncompressed domains in real-time. The invention is used in a multiple camera system that is designed for use in indoor and outdoor environments. Possible applications of the invention include applications in law enforcement, e.g. security checkpoints, home security, and public places; experimental social sciences; entertainment, e.g. virtual rooms and smart rooms; monitoring, e.g. the interior of a plane, car, or train; and monitoring outdoor environments, e.g. streets, bus stops, road-sides, etc.

[0004] 2. Description of Related Art
[0005] Recent advances in camera and storage systems are the main factors driving the increased popularity of video surveillance. Prices continue to drop on components, e.g. CMOS cameras, while manufacturers have added more features. Furthermore, the evolution of digital video, especially in digital video storage and retrieval systems, is another leading factor. Besides the expensive surveillance systems, today's PC-based, easy plug-in surveillance systems are directed at home users and small business owners who cannot afford the expense of investing thousands of dollars in a security system. Real-time monitoring from anywhere, anytime enables keeping a watchful eye on security areas, offices, stores, houses, pools, or parking garages.
[0006] Although these surveillance systems are powerful with new advances in camera and storage systems, automatic information retrieval from the sequences, e.g. rigid and non-rigid object detection and activity recognition in compressed and uncompressed domains, is not yet mature. These topics are still open areas for many research groups in industry, government, and academia.
[0007] Early activity recognition systems used beacons carried by the subjects. However, a system that uses video avoids the need for beacons and allows the system to recognize activities that can be used to command the operation of the environment.
[0008] As described in the patents entitled "Method and Apparatus for real-time gesture recognition" by Katerina H. Nguyen, U.S. Pat. Nos. 6,072,494 and 6,256,033, a gesture recognition system is described that compares the input gesture of the subject, e.g. a human figure, with the known gestures in a database. Unlike the invention described herein, this approach is not modular, as it recognizes the gesture of the whole human body figure. The same gesture, e.g. arm flapping, can be performed by different subjects (birds, humans, etc.), yet the subject of interest is not identified by the system. Another drawback of such a system is that it can easily fail when the subject figure is occluded.
[0009] As described in the patent entitled "Method and Apparatus for Detecting and Quantifying Motion of a Body Part", U.S. Pat. No. 5,148,477, a system for body part motion is described. Unlike the invention described herein, that approach is adapted to analyze facial movement, e.g. movement of the eyebrows. The system does not classify different body parts of the human; it assumes that the object of interest is a face. Unlike the system described herein, it is purely dependent on the pixel change between two frames, without using any classification and recognition information or any high-level semantics.
[0010] U.S. Pat. No. 6,249,606 describes a system for computer input using a cursor device in which gestures made by a person controlling the cursor are recognized. In contrast, our system is not limited to use with a cursor device or to computer input applications. U.S. Pat. Nos. 6,222,465, 6,147,678, 6,204,852, and 5,454,043 describe computer input systems that recognize hand gestures; in contrast, our system is not limited to computer control of a virtual environment or to hand gestures. U.S. Pat. Nos. 6,057,845 and 5,796,406 are also directed to computer input devices and not the more general case of activity analysis solved by our invention.
[0011] As described in the patent application entitled "Method of detecting and tracking groups of people" by Myron D. Flickner, U.S. patent application No. 20030107649, a human tracking and detection system is described that compares objects to "silhouette" templates to identify humans and then uses a tracking algorithm to determine the trajectory of people. This system does not try to understand the activity of the people, nor does it try to find the human-object interaction, as our invention can do.
[0012] As described in a paper delivered at the Workshop on Artificial Intelligence for Web Search 2000 entitled "Visual Event Classification via Force Dynamics", authored by Siskind, a system is presented which classifies simple motion events, e.g. pick up and put down, by using single-camera input. The system uses "force-dynamic" relations to distinguish between event types. A human hand performs the pick-up and put-down gestures. The system works for a stable background and colored objects. However, the system does not identify the hand or other objects in the scene.
[0013] As reported in the IEEE Computer Vision and Pattern Recognition Proceedings 1997, in a paper entitled "Coupled Hidden Markov Models (HMM) for Complex Action Recognition" by Matthew Brand, Nuria Oliver, and Alex Pentland, a hand gesture recognition system is described. The system recognizes certain Chinese martial art movements. However, the hands are assumed to be recognized a priori; the system does not detect and classify hands before the gesture recognition step. The movement of one hand depends on the movement of the second hand, where the freedom of motion of the hands is limited by the martial art movements.
[0014] The parameterized HMM, as reported in IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 21, No. 9, September 1999, in a paper entitled "Parametric Hidden Markov Models for Gesture Recognition" authored by Wilson and Bobick, can recognize complex events, e.g. an interaction of two mobile objects or gestures made with two hands (e.g. "so big", "so small"). One of the drawbacks of the parameterized HMM is that for complex events (e.g. a combination of sub-events) the parameter training space may become very large.
[0015] In summary, most activity recognition systems are suitable only for a specific application type. The invention described herein can detect a wide range of activities for different applications. For this reason, the scheme detects different object parts and their movement in order to combine them at a later stage that connects to high-level semantics. Each object part has its own freedom of motion, and the activity recognition for each part is achieved by using several HMMs in parallel.

SUMMARY OF THE INVENTION
[0016] Explanation of some terms and abbreviations that are used throughout the text:

[0017] Parametric representation: Using abstract shapes with several parameters (typically fewer than 50) to represent a complex-shaped object such as human body parts.

[0018] Video sequence: A sequence of images that shows one or more activities recognizable by a human being. A video sequence can be any video recording or media (e.g. an MPEG-1 file, a video tape, a video disc, a DVD, etc.).

[0019] FPGA: FPGA or Field Programmable Gate Array is a type of programmable or configurable circuit.

[0020] Platform FPGA: High-density FPGAs used to provide the core function of a system instead of just being used as 'glue logic' that coordinates the main functional units.

[0021] Special-purpose hardware: Any single-purpose hardware unit, including but not limited to one or more FPGAs or other configurable logic, ASIC(s), or custom chip(s).

[0022] Video signal processor: A programmable computer used for video processing.

[0023] TriMedia processor: A series of video processors produced by TriMedia Inc. (now part of Philips).

[0024] PC: A general personal computer (including desktop computers, servers, and/or laptop computers).

[0025] The present system can recognize activities in the compressed and uncompressed domains. Depending on the application and image resolution, a compact and modular representation of the object is used.

[0026] First, in order to recognize the overall human body posture in the compressed domain, an eigenspace representation of human silhouettes, obtained from the AC values of the Discrete Cosine Transform (DCT) coefficients, is used.

[0027] The system of the present invention can use AC-DCT coefficient differences and compare them in order to classify the up/down and left/right movements of the human body silhouette in real-time.

[0028] The posture recognition result helps the system to decide between two possibilities: storing the frames with a particular posture, e.g. pointing to the right, in a database, and/or processing the input frames, starting with this particular posture, for activity recognition purposes for a certain number of frames or until another particular posture is detected. The system is capable of using compressed-domain, stored inputs, e.g. MPEG Inter-frames, as well as uncompressed-domain real-time video inputs.
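As an illustration of the compressed-domain comparison in [0027], the following minimal Python sketch classifies silhouette movement from frame-to-frame AC-DCT coefficient differences. It is a hedged sketch, not the patented implementation: the coefficient ordering, the sign-to-direction mapping, and the toy input values are assumptions, and the eigenspace projection of [0026] is omitted.

    import numpy as np

    def silhouette_features(ac_coeffs):
        # Flatten the AC-DCT coefficients of a frame's macroblocks into one
        # feature vector (a stand-in for the eigenspace projection).
        return np.asarray(ac_coeffs, dtype=float).ravel()

    def classify_motion(prev_ac, curr_ac):
        # Toy up/down/left/right decision from AC coefficient differences.
        # AC(0,1) responds to horizontal detail and AC(1,0) to vertical
        # detail, so the sign of their frame-to-frame change is used here
        # as a crude hint of silhouette motion.
        d = silhouette_features(curr_ac) - silhouette_features(prev_ac)
        horiz, vert = d[0], d[1]  # assumed ordering: [AC(0,1), AC(1,0), ...]
        if abs(horiz) >= abs(vert):
            return "right" if horiz > 0 else "left"
        return "down" if vert > 0 else "up"

    # Hypothetical two-frame example with a horizontal coefficient shift.
    print(classify_motion([0.5, 0.1, 0.0], [0.9, 0.2, 0.0]))  # -> "right"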
[0029] The system of the present invention can detect non-rigid (e.g. human body) and rigid object parts and recognize their activities in compressed and uncompressed domains. To achieve this, a method with two levels, namely low and high levels, is used. The low-level part performs object detection and extracts parameters for the abstract graph representation of the image being processed. The high-level part uses dynamic programming to determine the activities of the object parts, and uses a distance classifier to detect specific activities.

[0030] The low-level part performs object detection and extracts parameters for the abstract graph representation of the frame being processed in real time. Local consistency based on low-level features and geometrical characteristics of the object regions is used to group object parts. Furthermore, higher-order shape metrics are needed for the presentation of complex objects. The object is decomposed for its presentation as a combination of component shapes. The result will be unaffected by a partial occlusion of the object.

[0031] The system is capable of managing the segmentation process by using object-based knowledge in order to group the regions according to a global consistency, and introduces a new model-based segmentation algorithm that uses feedback from the relational representation of the object. The major advantages of the model-based segmentation can be summarized as improving the object extraction by reducing the dependence on the low-level segmentation process and combining the boundary and region properties. Furthermore, the features used for segmentation are also attributes for object detection in the relational graph representation.

[0032] The system is also capable of 2D approximation of object parts by fitting closed curves with shape-preserving deformations, which provides satisfactory results. This helps to disregard the deformations due to the clothing.
[0033] The selected unary and binary attributes are further extended for application-specific algorithms.
[0034] Object detection is achieved by matching the relational graphs of objects with the reference model. This method maps the attributes, interprets the match, and checks the conditional rules in order to index the parts correctly. This method improves object extraction accuracy by reducing the dependency on the low-level segmentation process and combining the boundary and region properties. Furthermore, the features used for segmentation are also attributes for object detection in the relational graph representation. This property enables the segmentation thresholds to be adapted by a model-based training system.

[0035] After the detection of the object parts, the system is ready to recognize the activities of each object part and the overall activity of the object.
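To make the relational-graph representation of [0034] concrete, here is a minimal Python sketch: nodes carry unary attributes of ellipse-modeled parts, edges carry binary attributes, and a greedy matcher assigns candidate nodes to reference-model nodes. The attribute choices, distance measure, and threshold are illustrative assumptions; the patent's matcher additionally interprets binary attributes and conditional rules before accepting a match.

    import math
    from dataclasses import dataclass, field
    from typing import Dict, Tuple

    @dataclass
    class PartNode:
        # Unary attributes of one ellipse-modeled region (illustrative set).
        area: float
        eccentricity: float
        orientation: float  # radians

    @dataclass
    class RelationalGraph:
        nodes: Dict[str, PartNode] = field(default_factory=dict)
        # Binary attributes between parts, e.g. (distance ratio, relative angle).
        edges: Dict[Tuple[str, str], Tuple[float, float]] = field(default_factory=dict)

    def node_distance(a: PartNode, b: PartNode) -> float:
        # Euclidean distance over unary attributes, used when matching a
        # candidate node against a reference-model node.
        return math.dist((a.area, a.eccentricity, a.orientation),
                         (b.area, b.eccentricity, b.orientation))

    def match(candidate: RelationalGraph, reference: RelationalGraph,
              threshold: float) -> Dict[str, str]:
        # Greedy node assignment; combinations that find no meaningful
        # correspondence in the reference model are rejected.
        assignment = {}
        for cname, cnode in candidate.nodes.items():
            rname, rnode = min(reference.nodes.items(),
                               key=lambda kv: node_distance(cnode, kv[1]))
            if node_distance(cnode, rnode) < threshold:
                assignment[cname] = rname
        return assignment

    ref = RelationalGraph(nodes={"head": PartNode(40.0, 0.3, 0.0),
                                 "torso": PartNode(120.0, 0.7, 0.1)})
    cand = RelationalGraph(nodes={"region1": PartNode(42.0, 0.32, 0.05)})
    print(match(cand, ref, threshold=5.0))   # -> {'region1': 'head'}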
[0036] For example, if the object of interest is a human body, the system will first detect the different object parts, e.g. hands, head, arms, legs, and torso, and compare these part attributes with the human model attributes via graph matching. If the object of interest is a rigid object, the system will detect the object parts and compare the attributes of these parts with the object model via graph matching. The high-level part uses a pattern classifier, namely Hidden Markov Models, which classifies activity patterns of the body parts in space-time and determines the movements of the object parts. It also uses a distance classifier to detect specific gestures and activities. For articulated objects like the human body, the system will find the activities of each body part independently from each other and combine the activities at a later stage, by using a quadratic distance classifier, to find the gesture and overall activity of the human body.
[0037] Note that each object part has its own freedom of motion. The activity recognition for each part is achieved by using several HMMs in parallel.
[0038] Combining activities for different rigid and non-rigid object parts and generating scenarios are purely application-dependent issues. For each type of application, the part activities are combined with different weights to generate different scenarios for these particular applications.
[0039] Other advantages of the invention are that the system is fast and robust, with very low latency and a high accuracy rate for rigid and non-rigid object part detection and activity classification. Additionally, the compressed-domain methods reduce computational complexity, avoid dependency on correct segmentation, and reduce storage area and bandwidth requirements. Finally, a multi-camera/multi-processor system using a PC as a host allows evaluating the algorithms running on real-time data.
[0040] FIG. 3 is a schematic view of the object detection and activity recognition apparatus. The apparatus includes a monitor, analog or digital cameras, and a personal computer (PC) including a database for objects, a database for activities, a video capturer, a central processor, buffers, and other memory units.
[0041] The video frames are sent from the camera to the video capturer on the PC. The video capturer converts the frames sent from the camera from analog format to digital format and stores the color components of the current frame in three different buffers.

[0042] The color components stored in the buffers are further processed by the central processor to classify the object regions. Region parameters are compared with the object parameters in the object database and the region is classified.

[0043] Spatial information of the classified object regions is stored in the memory unit.

[0044] After a certain number of frames, the sequential spatial information stored in the memory unit is compared with the activity database by the central processor for each object region in parallel.

[0045] The output activities of the object parts are further processed by the central processor to find the overall activity of the object.
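The dataflow of paragraphs [0041]-[0045] can be summarized in a short sketch. Every interface here is a hypothetical stand-in, not a component named by the patent: camera is any frame iterator, object_db.classify returns per-part centers, activity_db.match maps a spatial history to a part activity, N_FRAMES is an assumed window length, and combine merges the part activities into the overall activity.

    from collections import defaultdict, deque

    N_FRAMES = 16  # frames buffered before activity matching (assumed value)

    def run_pipeline(camera, object_db, activity_db, combine):
        # Memory unit: a bounded spatial history per classified object part.
        history = defaultdict(lambda: deque(maxlen=N_FRAMES))
        for i, frame in enumerate(camera, start=1):
            # Classify regions against the object database ([0042]) and
            # store their spatial information ([0043]).
            for part, center in object_db.classify(frame).items():
                history[part].append(center)
            # Every N_FRAMES frames, compare each part's history with the
            # activity database in parallel ([0044]) and combine ([0045]).
            if i % N_FRAMES == 0:
                part_acts = {p: activity_db.match(list(h))
                             for p, h in history.items()}
                yield combine(part_acts)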
BRIEF DESCRIPTION OF THE DRAWINGS

[0046] FIG. 1 is a pictorial view of the object parts and fitted closed curves (the object of interest is "human").

[0047] FIG. 2 is a diagrammatic, schematic view of some of the recognized activities.

[0048] FIG. 3 is a schematic view of the object detection and activity recognition apparatus.

[0049] FIG. 4 is a block diagram of our system.

[0050] FIG. 5 shows an example of contour generation.
DETAILED DESCRIPTION OF THE INVENTION

[0051] Overview
[0052] Most of the existing commercial video security systems are based only on simple motion detection algorithms, and such systems are unable to classify the detected object or to detect the occurrence of events, e.g. suspicious movement. On the other hand, although a great deal of research has been done on object detection and tracking, the existing solutions are not scalable and their real-time applicability is limited. Since personnel cannot pay constant attention to the monitors, the effectiveness of traditional surveillance systems is limited. Systems equipped with the proposed algorithm would increase effectiveness by calling attention to the cameras that capture unattended objects, such as, but not limited to, an unattended bag in a metro station or an unattended package next to a public building; identifying intrusion and detecting suspicious activities; reducing the need for monitors and supervisory personnel; and transferring low-level video data to other security points in real-time for immediate response. Our invention is also useful for applications other than security, such as tracking the activities of employees for human factors studies or identifying the activity of customers for marketing studies. As shown in FIG. 1, the system can identify each object part separately (in this case hands 100, head 101, and torso 102) after comparing the object attributes with the model database via a graph-matching algorithm.
[0053] In this system, the user can determine the actions taken when the system helps the user identify suspicious activity. The system can easily be set up to classify several objects of interest, such as, but not limited to, human, bag, and dog, and to recognize a wide variety of activities, such as gestures ranging from pointing a gun to waving arms, leaving unattended objects, entering a prohibited area, tailgating at security doors, or spending too much or too little time in an area, or to detect the direction of movement of a rigid object, such as, but not limited to, a truck speeding towards the security gate at a nuclear facility.
[0054] Libraries of activities determine what events cause the system to set an alarm. The user can add to the library and turn actions in the library on and off at will. Most video analysis systems use simple methods such as motion detection. However, since motion detection and tracking do not know what is moving, they can easily generate false alarms as well as miss important events. In contrast, this system builds a model of the object of interest on each video frame. It tracks only objects that fit the model of the user-defined subject, such as a human, any rigid object, or a dog. This makes the system more accurate, because it easily rejects many elements in the scene that may be moving but are not objects of interest.
[0055] FIG. 3 illustrates our overall system in use. In this figure, camera 301 views a region of interest 302. Camera 301 is connected to a computer 303. The computer 303 may keep a database of graphs, HMM models, and other information used during video analysis.
[0056] FIG. 4 shows a block diagram of our invention. The video input 401 may be from an analog camera whose video data has been suitably digitized or from a digital camera. A variety of video input formats can be used. The various elements in the block diagram are described in more detail below.
[0057] This patent describes exemplary implementations of our invention, but the invention is not limited to the components and details described here.
[0058] Early Stage Analysis

[0059] Background elimination and color transformation: The first step (402) is the transformation of pixels into another color space according to the application. Background elimination is performed by using these transformed pixel values for the current and background images. The foreground-background separation is achieved by comparing the DC coefficients of the foreground object with the DC coefficients of the background object via a statistical method.
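A minimal numpy sketch of such a statistical comparison follows, using per-block means as a stand-in for DC coefficients and a k-sigma test against a background model; the block size, the threshold k, and the exact test statistic are assumptions, since the text does not pin them down here.

    import numpy as np

    def block_stats(frame, block=8):
        # Per-block mean and standard deviation; the 8x8 block mean plays
        # the role of the DC coefficient in the compressed-domain variant.
        h, w = frame.shape
        f = frame[:h - h % block, :w - w % block]
        blocks = f.reshape(h // block, block, w // block, block).swapaxes(1, 2)
        return blocks.mean(axis=(2, 3)), blocks.std(axis=(2, 3))

    def foreground_mask(bg_frame, test_frame, k=3.0, eps=1.0):
        # Mark blocks whose mean deviates from the background model by more
        # than k background standard deviations (a common choice, assumed here).
        bg_mean, bg_std = block_stats(bg_frame)
        test_mean, _ = block_stats(test_frame)
        return np.abs(test_mean - bg_mean) > k * (bg_std + eps)

    rng = np.random.default_rng(0)
    background = rng.normal(100.0, 2.0, (64, 64))
    test = background.copy()
    test[16:32, 16:32] += 60.0                 # synthetic new object
    print(foreground_mask(background, test).sum(), "foreground blocks")  # -> 4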
[0060] Segmentation: In 403, the foreground regions are extracted and the object of interest is segmented hierarchically into its smaller unique parts, based on the combination of color components and statistical shape features, after background elimination. The meaningful adjacent segments are combined and used as the input to the following algorithm steps.
[0061] Contour following: Contour points of the segmented regions are extracted and stored (404). FIG. 5 gives an example of contour following: the frame given to the contour following algorithm 501 results in the output frame with the contour 502.
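For illustration, the sketch below extracts and stores the outer contour points of each segmented region. OpenCV's built-in border following is used as a stand-in; the claims describe a scheme based on shifting 3-by-3 pixel windows over the binary image, which this sketch does not reproduce.

    import numpy as np
    import cv2

    def region_contours(binary_mask):
        # Outer contour point lists of each foreground region (step 404).
        contours, _ = cv2.findContours(binary_mask.astype(np.uint8),
                                       cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_NONE)
        return contours

    mask = np.zeros((64, 64), np.uint8)
    cv2.circle(mask, (32, 32), 12, 1, -1)      # one synthetic region
    print(len(region_contours(mask)[0]), "contour points stored")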
[0062] Ellipse fitting: This step (405) fits ellipses to the contours. Even when the object of interest is not occluded by another object, an object part can be occluded in different ways due to the possible positions of the non-rigid parts. In this case, 2D approximation of the parts by fitting ellipses with shape-preserving deformations provides more satisfactory results. It also helps to discard the deformations.
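A corresponding one-call sketch for step 405, with a synthetic contour standing in for a real part boundary; cv2.fitEllipse is a least-squares fit and is assumed here as a stand-in for the patent's fitting procedure.

    import numpy as np
    import cv2

    mask = np.zeros((64, 64), np.uint8)
    cv2.ellipse(mask, (32, 32), (20, 8), 30, 0, 360, 1, -1)  # synthetic part
    contour = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_NONE)[0][0]
    # ((cx, cy), (major axis, minor axis), angle in degrees)
    print(cv2.fitEllipse(contour))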
[0063] Object modeling by invariant shape attributes: For object detection, it is necessary to select part attributes which are invariant to two-dimensional transformations and are maximally discriminating between objects (406).
[0064] Graph matching: In this step (407), we compare the object model with a set of stored models. Each extracted region modeled with ellipses corresponds to a node in the graphical representation of the object of interest. Each object part and each meaningful combination represents a class ω, where the combination of binary and unary features is represented by a feature vector X computed off-line. The combination of segments is controlled by the reference model and by the rule generator. If the graph-matching algorithm cannot find a meaningful correspondence of the combined segments in the reference model, the combination is rejected and a new combination is generated. For the purpose of determining the class of these feature vectors, a piecewise quadratic Bayesian classifier with discriminant function g(X) is used. The generality of the reference model attributes allows the detection of different kinds of models for the same object type, while the conditional rule generation decreases the rate of false alarms. The computations needed for each node matching are then a function of the feature size and the previously matched nodes of the branch under consideration. The marked regions are tracked by using the ellipse parameters for the consecutive frames, and the graph-matching algorithm is applied to new objects appearing in the other regions.
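The sketch below shows one conventional form of such a quadratic discriminant: a per-class Gaussian g(X) built from a Mahalanobis term, a log-determinant term, and a log-prior (claim 10 likewise mentions a quadratic Mahalanobis classifier). The Gaussian assumption, the regularization, and the priors are illustrative, not the patent's exact classifier.

    import numpy as np

    class QuadraticClassifier:
        # Piecewise quadratic Bayesian classifier sketch with discriminant
        # g(X) = -0.5 (X-mu)' S^-1 (X-mu) - 0.5 log|S| + log P(class).
        def fit(self, X, y):
            self.classes_ = np.unique(y)
            self.params_ = {}
            for c in self.classes_:
                Xc = X[y == c]
                mu = Xc.mean(axis=0)
                S = np.cov(Xc, rowvar=False) + 1e-6 * np.eye(X.shape[1])
                self.params_[c] = (mu, np.linalg.inv(S),
                                   np.log(np.linalg.det(S)),
                                   np.log(len(Xc) / len(X)))
            return self

        def g(self, x, c):
            mu, S_inv, log_det, log_prior = self.params_[c]
            d = x - mu
            return -0.5 * d @ S_inv @ d - 0.5 * log_det + log_prior

        def predict(self, x):
            return max(self.classes_, key=lambda c: self.g(x, c))

    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
    y = np.array([0] * 50 + [1] * 50)
    clf = QuadraticClassifier().fit(X, y)
    print(clf.predict(np.array([3.5, 4.2])))   # -> 1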
[0065] Classifying Over Multiple Frames

[0066] The output of the graph-matching algorithm is the classified object parts. The movements of the object parts are described as a spatio-temporal sequence of feature vectors that consist of the direction of the object part movement. The system checks the direction of the movements of the object parts for a number of frames, calculates the probabilities of the observed movements against the known activities by using Hidden Markov Models, and chooses the pattern with the highest probability as the recognized activity in these frames (408). The activity of each part is then combined by a quadratic distance classifier to find the overall activity of the object of interest. As shown in FIG. 2, frame 201 shows walking right and frame 202 shows walking left.
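The per-part scoring in step 408 can be sketched as follows: each known activity gets its own discrete HMM over movement-direction symbols, the standard forward algorithm scores the observed sequence under every model in parallel, and the most likely activity wins. The direction alphabet and all model parameters below are illustrative assumptions.

    import numpy as np

    DIRS = {"up": 0, "down": 1, "left": 2, "right": 3}  # assumed encoding

    def forward_log_prob(obs, start, trans, emit):
        # Log-likelihood of an observation sequence under a discrete HMM
        # (standard forward algorithm in log space).
        alpha = np.log(start) + np.log(emit[:, obs[0]])
        for o in obs[1:]:
            alpha = (np.logaddexp.reduce(alpha[:, None] + np.log(trans), axis=0)
                     + np.log(emit[:, o]))
        return np.logaddexp.reduce(alpha)

    def recognize(obs, models):
        # One HMM per known activity, evaluated in parallel; pick the best.
        return max(models, key=lambda name: forward_log_prob(obs, *models[name]))

    # Two toy 2-state activity models for a hand (all numbers illustrative).
    waving = (np.array([0.5, 0.5]),
              np.array([[0.1, 0.9], [0.9, 0.1]]),
              np.array([[0.05, 0.05, 0.80, 0.10],    # state 0: mostly "left"
                        [0.05, 0.05, 0.10, 0.80]]))  # state 1: mostly "right"
    raising = (np.array([0.9, 0.1]),
               np.array([[0.8, 0.2], [0.2, 0.8]]),
               np.array([[0.80, 0.10, 0.05, 0.05],   # state 0: mostly "up"
                         [0.60, 0.20, 0.10, 0.10]]))
    models = {"waving": waving, "raising hand": raising}

    sequence = [DIRS[d] for d in ("left", "right", "left", "right")]
    print(recognize(sequence, models))   # -> "waving"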
[0067] The basis of this invention relies on the combination of the modular parts that form logical semantics at each stage. The lowest level consists of low-level regions. The combination of regions corresponds to object parts. The combination of object parts defines the object, while the combination of the movements of different parts determines the gesture and activities. The combination of the activities defines the event in the scene. The same approach is then used to teach the system different objects, activities, and events. Therefore, the user-defined semantics are entered in a top-down fashion. For example, the event of pointing a gun can be decomposed into different levels, such as the relative location of the arms, hands, and torso and the combined movements of the arms and hands.
[0068] The attached Exhibit A and Exhibit B include disclosure materials directed to various aspects of the invention.
[0069] It will be understood that the foregoing description of the invention is by way of example only, and variations will be evident to those skilled in the art without departing from the scope of the invention, which is as set out in the appended claims.
1. A method for detecting the parts of a non-rigid object, such as, but not limited to, a human body, and for recognizing the gestures and activities of the object of interest in real-time video sequences, comprising:

a) Eliminating the background to obtain the foreground objects;

b) Detecting different regions of the foreground objects by using color and/or shape information;

c) Finding the contours of the areas cited in 1b);

d) Fitting closed curves to the contours cited in 1c);

e) Computing unary and binary attributes for the closed curves cited in 1d);

f) Comparing the attributes cited in 1e) with the object attributes in the training data set via a matching algorithm and determining object parts after matching;

g) Combining adjacent segments and repeating steps 1c) through 1f);

h) Storing 2D center of gravity coordinates of each object part in the buffers for a certain number of frames;

i) Comparing the change of center of gravity coordinates with time for each object part cited in 1h) with the templates in the training data set and recognizing the activity of each part separately;

j) Combining the activities of the object parts cited in 1h) and recognizing the overall activity of the object of interest in the scene; and,

k) Combining the gestures and activities of different objects to detect the event in the scene.

2. The method of claim 1 wherein step 1a) further comprises:

a) Grabbing and digitizing several video frames under different lighting changes;

b) Converting the input color components into a single color space, such as, but not limited to, red-green-blue color space, luminance-chrominance color space, or hue-saturation color space, or a combination of such color spaces;

c) Generating a statistical model for the background frames by using the mean and standard deviation of frame blocks and color components cited in 2b);

d) Grabbing and digitizing a test video frame and repeating step 2b) for the test frame;

e) Generating a statistical model for the test frame by using the mean and standard deviation of frame blocks and color components cited in 2a); and,

f) Comparing the mean and/or standard deviation of the frame blocks of the test frame cited in 2e) with the mean and/or standard deviation of the background frames cited in 2c).

3. The method of claim 2 wherein step 2f) further comprises:

Eliminating from the test frame the blocks with similar mean and standard deviation that are below a threshold, and generating a foreground region.

4. The method of claim 3 further comprising:

Segmenting the foreground regions cited in claim 3 of the object of interest hierarchically into its smaller unique parts based on the combination of color components and statistical shape features.

5. The method of claim 4 wherein the segmenting step based on statistical shape features further comprises:

Comparing the curvature, mean, deviation

6. The method of claim 4 further comprising:

a) Assigning a number to pixels contained in the segmented foreground regions, assigning another number to the non-foreground regions, and generating a binary image; and,

b) Determining and storing contour coordinates of the segmented regions cited in claims 1b) and 6a) in a buffer.

7. The method of claim 6 wherein step 6b) further comprises:

a) Initializing multiple 3 by 3 pixel windows;

b) Shifting the windows through the binary image cited in 6a) in different and independent directions; and,

c) Finding the next neighboring point on the outer boundary of the foreground object segment cited in 6a) after one of the window centers overlaps with the foreground object segment.

8. The method of claim 1 wherein step 1d) further comprises:

Approximating contours by fitting closed curves with shape-preserving deformations for minimizing the effect of occlusion and local deformations.

9. The method of claim 1 wherein step 1e) further comprises:

Determining geometric invariant attributes for closed curves that are maximally discriminating between objects.

10. The method of claim 1 wherein step 1f) further comprises:

a) Generating attribute feature vectors for each closed curve, where each object part and meaningful combinations represent a class;

b) Determining the class of the multi-dimensional feature vector by using a quadratic Mahalanobis classifier; and,
