US005969755A

United States Patent [19]
Courtney

[11] Patent Number:  5,969,755
[45] Date of Patent:  Oct. 19, 1999

[54] MOTION BASED EVENT DETECTION SYSTEM AND METHOD

[75] Inventor:  Jonathan D. Courtney, Dallas, Tex.

[73] Assignee:  Texas Instruments Incorporated, Dallas, Tex.

[21] Appl. No.: 08/795,432

[22] Filed:     Feb. 5, 1997

Related U.S. Application Data

[60] Provisional application No. 60/011,106, Feb. 5, 1996.

[51] Int. Cl.6 ............................................. H04N 7/18
[52] U.S. Cl. ......................... 348/143; 348/135; 348/155
[58] Field of Search ............. 348/135, 142, 152, 154, 155,
                                 348/143, 171, 172; 382/103, 107, 236

[56] References Cited

U.S. PATENT DOCUMENTS

5,243,418   9/1993  Kuno et al. .................... 358/108
5,428,774   6/1995  Takahashi et al. ............... 395/600
5,550,965   8/1996  Gabbe et al. ................... 395/154
5,666,157   9/1997  Aviv ........................... 348/152
5,721,692   2/1998  Nagaya et al. .................. 364/516
5,734,737   3/1998  Chang et al. ................... 382/107

FOREIGN PATENT DOCUMENTS

0 318 039  11/1988  Japan

OTHER PUBLICATIONS

Lee, S., et al., "Video Indexing-An Approach Based on Moving Object and Track," in Wayne Niblack, editor, Storage and Retrieval for Image and Video Databases, Proc. SPIE 1908, 25-36 (1993).
Ioka, M., et al., "A Method for Retrieving Sequences of Images on the Basis of Motion Analysis," in Image Storage and Retrieval Systems, Proc. SPIE 1662, 35-46 (1992).
Day, Y. F., et al., "Object-Oriented Conceptual Modeling of Video Data," supplied by Applicant, 402-408, Mar. 6, 1995.
Abe, S., et al., "Scene Retrieval Method Using Temporal Condition Changes," supplied by Applicant, whole document, Jan. 1, 1993.
Ueda, Hirotada, et al., "Automatic Structure Visualization for Video Editing," supplied by Applicant, whole document, Apr. 24, 1993.
Orkisz, M., "Moving Objects Location in Complex Scenes Filmed by a Fixed Camera," supplied by Applicant, 325, 327-328, Jan. 1, 1992.

Primary Examiner-Tommy P. Chin
Assistant Examiner-John Voisinet
Attorney, Agent, or Firm-Robert L. Troike; Richard L. Donaldson

[57] ABSTRACT
A method to provide automatic content-based video indexing from object motion is described. Moving objects in video from a surveillance camera 11 are detected in the video sequence using motion segmentation methods by a motion segmentor 21. Objects are tracked through the segmented data in an object tracker 22. A symbolic representation of the video is generated in the form of an annotated graph describing the objects and their movement. A motion analyzer 23 analyzes the results of object tracking and annotates the motion graph with indices describing several events. The graph is then indexed using a rule-based classification scheme to identify events of interest such as appearance/disappearance, deposit/removal, entrance/exit, and motion/rest of objects. Clips of the video identified by spatio-temporal, event-, and object-based queries are recalled to view the desired video.

22 Claims, 11 Drawing Sheets
[Front-page drawing: block diagram of the AVI system 10 showing camera 11, digitization into temporary storage 20, motion segmentor 21, object tracker 22, motion analyzer 23, recorder 24, compressor 25, database 15, decompressor 30, query engine 27, graphical user interface 28, playback 29, and monitor 19.]

`

`U.S. Patent
`
`Oct. 19, 1999
`
`Sheet 1 of 11
`
`5,969,755
`
`11
`
`CAMERA
`
`13
`
`VISION
`SUBSYSTEM
`
`21
`11
`~ PJ~ITIZAT~~~ MOTION
`~ 7 :
`
`
`SEGMENTOR
`
`I
`
`I
`I
`I
`I
`
`15
`
`17
`
`19
`
`USER
`INTERFACE
`
`10
`
`Fig. 1
`
`22
`
`23
`
`2 4
`
`25
`
`OBJECT
`TRACKER
`
`MOTION
`ANALYZER
`
`RECORDER
`
`COMPRESSOR
`
`:TEMPORARY: 10
`STORAGE
`:
`L-----------J
`
`20
`
`i
`
`30
`
`DE COMPRESSOR
`
`28
`GRAPHICAL
`USER INTERFACE
`
`27
`
`QUERY
`ENGINE
`
`15
`
`29
`
`19
`
`Fig. 5
`
`IMAGE
`Th
`Dn
`In .-------. r----------1 .---~-__, ~-~-~ CONNECTED
`IQ
`DIFFERENCING
`; LO~LiEiss; THRESHOLDING MORP~foLi?CAL
`COMPONENTS
`ANALYSIS
`L __________ J .___ __ __.
`.__ ___ __.
`
`h
`
`k
`
`REFERENCE
`IMAGE
`
`Fig. 6
`
`Cn
`
`101
`
`CAMERA
`
`100
`
`WATCHPOINTS
`Fig. 27
`
`

`

`U.S. Patent
`
`Oct. 19, 1999
`
`Sheet 2 of 11
`
`5,969,755
`
`flO]
`
`[20}
`
`r,~.r-·11
`t·d. J
`
`[,:o,
`
`():;
`'
`
`[001
`,, J
`
`~
`
`1 nc··
`iodj:
`
`[
`
`[l 10]
`
`[40]
`
`[80]
`.
`'
`
`[120]
`
`[130]
`
`)'50)
`-~
`
`._
`
`f ., «("!]
`l.h3 ..
`
`[170]
`
`[200]
`
`[220j
`
`Fig. 2
`
`

`

`U.S. Patent
`
`Oct. 19, 1999
`
`Sheet 3 of 11
`
`5,969,755
`
`~
`·················•·❖.•:•::::::,:j;:::
`
`::1
`
`~j
`~-1
`·····
`
`· ....... ·.·.·.·r
`i
`
`Flg. 3
`
`,.-··········"''l
`ii f
`: ~.
`... .
`,
`
`. .. .. .
`
`. .....
`
`·,
`
`. ~'. , .... ~.
`Removal
`
`...
`
`..
`.
`: ~
`···········r
`~- - - --
`
`.... · .. · ... '
`
`• •:! L1~
`:,: 1;:rIT :r,,,,.··.····
`
`Motirni Segrne.ntafam
`
`.\,fotion Grnph
`
`Fig. 4
`
`•
`
`. ,·,,
`
`k)
`
`REGION 3
`l
`c9
`
`lr·-.........
`
`..-, REGlmi
`(fl
`\.·.,
`
`{b)
`
`•
`
`{e)
`
`Fig. 7
`
`

`

`U.S. Patent
`
`Oct. 19, 1999
`
`Sheet 4 of 11
`
`5,969,755
`
`Fig. 8
`
`14·
`,:.
`(]
`
`Fig, 24b
`
`Fig,
`
`,,
`tlG -✓•
`
`n4 ,
`L'·,a
`
`

`

`U.S. Patent
`
`Oct. 19, 1999
`
`Sheet 5 of 11
`
`5,969,755
`
`FO
`
`F1
`
`F2
`
`F3
`
`FS
`
`F4
`Fig. 9
`
`F6
`
`F7
`
`FB
`
`FD
`
`F1
`
`F2
`
`F3
`
`F4
`
`FS
`
`F6
`
`F9
`
`F1D F11 F12 F13 F14
`
`FB
`F7
`Fig. 10
`
`FD
`
`F1
`
`F2
`
`F3
`
`F4
`
`F5
`
`F6
`
`F9
`
`F1D F11 F12 F13 F14
`
`FB
`F7
`Fig. 11
`
`FD
`
`F1
`
`F2
`
`F3
`
`F4
`
`F5
`
`F6
`
`F9
`
`F1D F11 F12 F13 F14
`
`FB
`F7
`Fig. 12
`
`

`

`U.S. Patent
`
`Oct. 19, 1999
`
`Sheet 6 of 11
`
`5,969,755
`
`FO
`
`F1
`
`F2
`
`F3
`
`F4
`
`FS
`
`F9
`
`F10 F11 F12 F13 F14
`
`F8
`F7
`Fig. 13
`
`FO
`
`F1
`
`F2
`
`F3
`
`F4
`
`FS
`
`F6
`
`F9
`
`F10 F11 F12 F13 F14
`
`F8
`F7
`Fig. 14
`
`FO
`
`F1
`
`F2
`
`F3
`
`F4
`
`FS
`
`F6
`
`F9
`
`F10 F11 F12 F13 F14
`
`F8
`F7
`Fig. 15
`
`

`

`U.S. Patent
`
`Oct. 19, 1999
`
`Sheet 7 of 11
`
`5,969,755
`
`ENTRANCE
`
`START
`
`EXIT
`
`APPEAR
`
`DISAPPEARANCE
`
`I
`I
`I
`I
`I
`I
`I
`
`I
`
`I
`
`I
`
`Fl / F2 / F3
`FO
`DEPOSITOR/ DEPOSIT
`
`\
`
`F4 \ FS
`EXIT
`
`F9
`
`\
`
`\
`
`F6\ F7
`FB
`ENTRANCE
`Fig. 16
`
`I
`
`I
`
`I
`I
`I
`I
`I
`I
`I
`
`I
`I
`I
`I
`
`•
`
`I
`
`F10/F11 F12/ F13 /F14
`REMOVAL /REMOVER EXIT
`
`E=EXIT
`
`F9
`FB
`F7
`F6
`FS
`F4
`F3
`F2
`Fl
`FO
`- - - - - - - -T - - - - - - - - - -
`Fig. 17
`
`FlO F11 F12 F13 F14
`
`FO
`
`Fl
`
`F2
`
`F3
`
`F4
`
`FS
`
`F6
`F7
`FB
`Fig. 18
`
`F9
`
`FlO F11 F12 F13 F14
`
`

`

`U.S. Patent
`
`Oct. 19, 1999
`
`Sheet 8 of 11
`
`5,969,755
`
`FO
`
`Fl
`
`F2
`
`F3
`
`F4
`
`FS
`
`F6
`F7
`FB
`Fig. 19
`
`F9
`
`F10 F11 F12 F13 F14
`
`FO
`
`F1
`
`F2
`
`F3
`
`F4
`
`F5
`
`F6
`F7
`FB
`Fig. 20
`
`F9
`
`F10 F11 F12 F13 F14
`
`FO
`
`Fl
`
`F2
`
`F3 F4\ F5
`EXIT
`
`F6
`
`F7
`
`FB
`
`F9
`
`F10 F11 F12 F13 F14
`
`Fig. 21
`
`

`

`U.S. Patent
`U.S. Patent
`
`.i,, ......
`
`Sheet 9 of 11
`Sheet 9 of 11
`
`5,969,755
`
`Oct. 19, 1999
`Oct. 19, 1999
`
`5,969,755
`
`•,•,•,•,•,•,•,•,•,•,•,•,•,•,•,•,•,•,•,•,•,•,•,•,•,•,•,•,•·········································································
`
`❖:❖:•:❖:❖:❖:•:•:•:•:•:•:•:•:•:•:•:•:•:•:•:•:•:•:•:•:•:•:•:
`
`Axis Exhibit 1021, Page 10 of 21
`
`

`

`U.S. Patent
`
`Oct. 19, 1999
`
`Sheet 10 of 11
`
`5,969,755
`
`[(}1
`J
`l
`
`[72]
`
`[84]
`
`!()fil
`~ ·-
`,:
`
`[J ')i}'
`,. ~,)
`
`C-J 'l))
`~ -~.Ji-·:,
`
`rJ.4· .SJ
`!. .. '4!"
`
`[1.56]
`
`[240]
`
`[2,52]
`
`[26{]
`
`Fig. 25
`
`'
`
`I)'
`
`l27nj
`
`

`

`U.S. Patent
`
`Oct. 19, 1999
`
`Sheet 11 of 11
`
`5,969,755
`
`[26]
`
`[D1}
`
`[.,.,..,~]
`
`I. ~;t,)
`
`[247]
`
`[104]
`
`[117J
`
`[ 1 u t->'
`.. ih)j
`
`(169'
`
`[260)
`
`,-·· .)
`f?86.j
`
`r,2· 0,9,1
`l
`,J, '
`
`Fig. 26
`
`

`

MOTION BASED EVENT DETECTION SYSTEM AND METHOD
This application claims priority under 35 U.S.C. §119(e)(1) of provisional application No. 60/011,106, filed Feb. 5, 1996. This application is related to co-pending application Ser. No. 08/795,434 (TI-22548), entitled "Object Detection Method and System for Scene Change Analysis in TV and IR Data," of Jonathan Courtney, et al., filed Feb. 5, 1997, which is incorporated herein by reference.
TECHNICAL FIELD OF THE INVENTION

This invention relates to motion event detection as used, for example, in surveillance.

BACKGROUND OF THE INVENTION
Advances in multimedia technology, including commercial prospects for video-on-demand and digital library systems, have generated recent interest in content-based video analysis. Video data offers users of multimedia systems a wealth of information; however, it is not as readily manipulated as other data such as text. Raw video data has no immediate "handles" by which the multimedia system user may analyze its contents. Annotating video data with symbolic information describing its semantic content facilitates analysis beyond simple serial playback.

Video data poses unique problems for multimedia information systems that text does not. Textual data is a symbolic abstraction of the spoken word that is usually generated and structured by humans. Video, on the other hand, is a direct recording of visual information. In its raw and most common form, video data is subject to little human-imposed structure, and thus has no immediate "handles" by which the multimedia system user may analyze its contents.

For example, consider an on-line movie screenplay (textual data) and a digitized movie (video and audio data). If one were analyzing the screenplay and interested in searching for instances of the word "horse" in the text, many text-searching algorithms could be employed to locate every instance of this symbol as desired. Such analysis is common in on-line text databases. If, however, one were interested in searching for every scene in the digitized movie where a horse appeared, the task is much more difficult. Unless a human performs some sort of pre-processing of the video data, there are no symbolic keys on which to search. For a computer to assist in the search, it must analyze the semantic content of the video data itself. Without such capabilities, the information available to the multimedia system user is greatly reduced.

Thus, much research in video analysis focuses on semantic content-based search and retrieval techniques. The term "video indexing" as used herein refers to the process of marking important frames or objects in the video data for efficient playback. An indexed video sequence allows a user not only to play the sequence in the usual serial fashion, but also to "jump" to points of interest while it plays. A common indexing scheme is to employ scene cut detection to determine breakpoints in the video data. See H. Zhang, A. Kankanhalli, and Stephen W. Smoliar, Automatic Partitioning of Full Motion Video, Multimedia Systems, 1, 10-28 (1993). Indexing has also been performed based on camera (i.e., viewpoint) motion, see A. Akutsu, Y. Tonomura, H. Hashimoto, and Y. Ohba, Video Indexing Using Motion Vectors, in Petros Maragos, editor, Visual Communications and Image Processing, Proc. SPIE 1818, 1522-1530 (1992), and object motion, see M. Ioka and M. Kurokawa, A Method for Retrieving Sequences of Images on the Basis of Motion Analysis, in Image Storage and Retrieval Systems, Proc. SPIE 1662, 35-46 (1992), and S. Y. Lee and H. M. Kao, Video Indexing-An Approach Based on Moving Object and Track, in Wayne Niblack, editor, Storage and Retrieval for Image and Video Databases, Proc. SPIE 1908, 25-36 (1993).

Using breakpoints found via scene cut detection, other researchers have pursued hierarchical segmentation to analyze the logical organization of video sequences. For more on this, see the following: G. Davenport, T. Smith, and N. Pincever, Cinematic Primitives for Multimedia, IEEE Computer Graphics & Applications, 67-74 (1991); M. Shibata, A Temporal Segmentation Method for Video Sequences, in Petros Maragos, editor, Visual Communications and Image Processing, Proc. SPIE 1818, 1194-1205 (1992); D. Swanberg, C.-F. Shu, and R. Jain, Knowledge Guided Parsing in Video Databases, in Wayne Niblack, editor, Storage and Retrieval for Image and Video Databases, Proc. SPIE 1908, 13-24 (1993). In the same way that text is organized into sentences, paragraphs and chapters, the goal of these techniques is to determine a hierarchical grouping of video sub-sequences. Combining this structural information with content abstractions of segmented sub-sequences provides multimedia system users a top-down view of video data. For more details see F. Arman, R. Depommier, A. Hsu, and M. Y. Chiu, Content-Based Browsing of Video Sequences, in Proceedings of the ACM International Conference on Multimedia (1994).

Closed-circuit television (CCTV) systems provide security personnel a wealth of information regarding activity in both indoor and outdoor domains. However, few tools exist that provide automated or assisted analysis of video data; therefore, the information from most security cameras is under-utilized.

Security systems typically process video camera output by either displaying the video on monitors for simultaneous viewing by security personnel and/or recording the data to time-lapse VCR machines for later playback. Serious limitations exist in these approaches:

Psycho-visual studies have shown that humans are limited in the amount of visual information they can process in tasks like video camera monitoring. After a time, visual activity in the monitors can easily go unnoticed. Monitoring effectiveness is additionally taxed when output from multiple video cameras must be viewed.

Time-lapse VCRs are limited in the amount of data that they can store in terms of resolution, frames per second, and length of recordings. Continuous use of such devices requires frequent equipment maintenance and repair.

In both cases, the video information is unstructured and un-indexed. Without an efficient means to locate visual events of interest in the video stream, it is not cost-effective for security personnel to monitor or record the output from all available video cameras.

Video motion detectors are the most powerful of available tools to assist in video monitoring. Such systems detect visual movement in a video stream and can activate alarms or recording equipment when activity exceeds a pre-set threshold. However, existing video motion detectors typically sense only simple intensity changes in the video data and cannot provide more intelligent feedback regarding the occurrence of complex object actions such as inventory theft.
SUMMARY OF THE INVENTION

In accordance with one embodiment of the present invention, a method is provided to perform video indexing from object motion. Moving objects are detected in a video sequence using a motion segmentor. Segmented video objects are recorded and tracked through successive frames. The paths of the objects and their intersections with the paths of other objects are determined to detect the occurrence of events. An index mark is placed to identify events of interest such as appearance/disappearance, deposit/removal, entrance/exit, and motion/rest of objects.

These and other features of the invention will be apparent to those skilled in the art from the following detailed description of the invention, taken together with the accompanying drawings.
DESCRIPTION OF THE DRAWINGS

FIG. 1 is an overview diagram of a system for automatically indexing pre-recorded video in accordance with one embodiment of the present invention;

FIG. 2 is a sequence of frames of video (test sequence 1) with the frame numbers below each image;

FIG. 3 illustrates points in the video sequence that satisfy the query "show all deposit events";

FIG. 4 illustrates the relation between video data, motion segmentation and video meta-information;

FIG. 5 illustrates the Automatic Video Indexing system architecture;

FIG. 6 illustrates the motion segmentor;

FIG. 7 illustrates a motion segmentation example where (a) is the reference image I0; (b) is the image In; (c) is the absolute difference |Dn|=|In-I0|; (d) is the thresholded image Th; (e) is the result of the morphological close operation; and (f) is the result of connected components analysis;

FIG. 8 illustrates a reference image from a TV camera modified to account for the exposed background region;

FIG. 9 illustrates the output of the object tracking stage for a hypothetical sequence of 1-D frames, where vertical lines labeled "Fn" represent frame numbers n and where primary links are solid lines and secondary links are dashed;

FIG. 10 illustrates an example motion graph for a sequence of 1-D frames;

FIG. 11 illustrates stems;

FIG. 12 illustrates branches;

FIG. 13 illustrates trails;

FIG. 14 illustrates tracks;

FIG. 15 illustrates traces;

FIG. 16 illustrates indexing rules applied to FIG. 10;

FIG. 17 illustrates a graphical depiction of the query Y=(C,T,V,R,E);

FIG. 18 illustrates processing of temporal constraints;

FIG. 19 illustrates processing of object-based constraints;

FIG. 20 illustrates processing of spatial constraints;

FIG. 21 illustrates processing of event-based constraints;

FIG. 22 illustrates a picture of the "playback" portion of the GUI;

FIG. 23 illustrates the query interface;

FIG. 24 illustrates video content analysis using advanced queries with video clips a, b, c and d;

FIG. 25 illustrates frames from test sequence 2;

FIG. 26 illustrates frames from test sequence 3; and

FIG. 27 illustrates video indexing in a real-time system.

DESCRIPTION OF THE PREFERRED EMBODIMENTS OF THE PRESENT INVENTION

FIG. 1 shows a high-level diagram of the Automatic Video Indexing (AVI) system 10 according to one embodiment of the present invention. In this view, a camera 11 provides input to a vision subsystem 13 including a programmed computer which processes the incoming video, which has been digitized, to populate a database storage 15. The term camera as used herein may refer to a conventional television (TV) camera or an infrared (IR) camera. A user may then analyze the video information using an interface 17, including a computer, to the database 15 via spatio-temporal, event-, and object-based queries. The user interface 17 plays video subsequences which satisfy the queries to a monitor 19.

FIG. 2 shows frames from a video sequence with content similar to that found in security monitoring applications. In this sequence, a person enters the scene, deposits a piece of paper, a briefcase, and a book, and then exits. He then re-enters the scene, removes the briefcase, and exits again. The time duration of this example sequence is about 1 minute; however, the action could have been spread over a number of hours. By querying the AVI database 15, a user can jump to important events without playing the entire sequence front-to-back. For example, if a user formed the query "show all deposit events in the sequence", the AVI system 10 would respond with sub-sequences depicting the person depositing the paper, briefcase and book. FIG. 3 shows the actual result given by the AVI system in response to this query, where the system points to the placement of the paper, briefcase and book, and boxes highlight the objects contributing to the event.

In processing the video data, the AVI vision subsystem 13 employs motion segmentation techniques to segment foreground objects from the scene background in each frame. For motion segmentation techniques, see S. Yalamanchili, W. Martin, and J. Aggarwal, Extraction of Moving Object Descriptions via Differencing, Computer Graphics and Image Processing, 18, 188-201 (1982); R. Jain, Segmentation of Frame Sequences Obtained by a Moving Observer, IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 624-629 (1984); A. Shio and J. Sklansky, Segmentation of People in Motion, in IEEE Workshop on Visual Motion, 325-332 (1991); and D. Ballard and C. Brown, Computer Vision, Prentice-Hall, Englewood Cliffs, New Jersey (1982). The vision subsystem then analyzes the segmented video to create a symbolic representation of the foreground objects and their movement. This symbolic record of video content is referred to as the video "meta-information" (see FIG. 4). FIG. 4 shows the progression of the video data frames, the corresponding motion segmentation and the corresponding meta-information. This meta-information is stored in the database in the form of an annotated directed graph appropriate for later indexing and search. The user interface 17 operates upon this information rather than the raw video data to analyze semantic content.

The vision subsystem 13 records in the meta-information the size, shape, position, time-stamp, and image of each object in every video frame. It tracks each object through successive video frames, estimating the instantaneous velocity at each frame and determining the path of the object and its intersection with the paths of other objects. It then classifies objects as moving or stationary based upon velocity measures on their path.
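The meta-information just described amounts to a per-frame set of object descriptors plus the links that form the annotated directed graph. The following is a minimal sketch of one way such a record might be laid out; the class and field names (ObjectRecord, MetaInformation, speed_threshold) are illustrative assumptions, not structures defined in this application.

# Illustrative sketch (not from the patent): one possible in-memory layout for
# the per-object, per-frame meta-information described above.
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

@dataclass
class ObjectRecord:
    label: int                          # region label from the motion segmentor
    frame: int                          # frame number n
    timestamp: float                    # t_n
    centroid: Tuple[float, float]       # (row, col) position
    bbox: Tuple[int, int, int, int]     # (min_row, min_col, max_row, max_col)
    area: int                           # size in pixels
    velocity: Tuple[float, float] = (0.0, 0.0)    # estimated instantaneous velocity
    image_chip: Optional[object] = None           # cropped image of the object
    event_tags: Set[str] = field(default_factory=set)  # e.g. {"deposit"}, filled in later

    def is_moving(self, speed_threshold: float = 1.0) -> bool:
        """Classify the object as moving or stationary from its velocity magnitude."""
        vy, vx = self.velocity
        return (vy * vy + vx * vx) ** 0.5 >= speed_threshold

@dataclass
class MetaInformation:
    # One list of object records per frame; links between records across frames
    # (the annotated directed graph) are kept as ((frame, label), (frame, label)) pairs.
    records: List[List[ObjectRecord]] = field(default_factory=list)
    links: List[Tuple[Tuple[int, int], Tuple[int, int]]] = field(default_factory=list)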
Finally, the vision subsystem 13 scans through the meta-information and places an index mark at each occurrence of eight events of interest: appearance/disappearance, deposit/removal, entrance/exit, and motion/rest of objects. This indexing is done using heuristics based on the motion of the objects recorded in the meta-information. For example, a moving object that "spawns" a stationary object results in a "deposit" event. A moving object that intersects and then removes a stationary object results in a "removal" event.
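These heuristics operate on the tracked objects rather than on pixels. Below is a much-simplified sketch of the "deposit" and "removal" rules, assuming the illustrative ObjectRecord layout above and classifying each track as moving or stationary from its first record; the actual rule set applied to the motion graph is more elaborate.

# Simplified sketch of two of the eight event heuristics.
# A "track" here is the time-ordered list of records for one real-world object.
from typing import Dict, List, Tuple

def overlaps(a: "ObjectRecord", b: "ObjectRecord") -> bool:
    ar0, ac0, ar1, ac1 = a.bbox
    br0, bc0, br1, bc1 = b.bbox
    return not (ar1 < br0 or br1 < ar0 or ac1 < bc0 or bc1 < ac0)

def tag_deposit_removal(tracks: Dict[int, List["ObjectRecord"]]) -> List[Tuple[str, int, int]]:
    """Tag records with "deposit"/"removal" marks; return (event, frame, track_id) tuples."""
    events = []
    stationary = {tid: recs for tid, recs in tracks.items()
                  if recs and not recs[0].is_moving()}
    moving = {tid: recs for tid, recs in tracks.items()
              if recs and recs[0].is_moving()}
    for sid, srecs in stationary.items():
        first, last = srecs[0], srecs[-1]
        # Deposit: a moving object coincides with the stationary object in the
        # frame where the stationary object first appears (it was "spawned").
        if any(m.frame == first.frame and overlaps(m, first)
               for mrecs in moving.values() for m in mrecs):
            first.event_tags.add("deposit")
            events.append(("deposit", first.frame, sid))
        # Removal: a moving object coincides with the stationary object in the
        # frame where the stationary object is last seen.
        if any(m.frame == last.frame and overlaps(m, last)
               for mrecs in moving.values() for m in mrecs):
            last.event_tags.add("removal")
            events.append(("removal", last.frame, sid))
    return events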
The system stores the output of the vision subsystem (the video data, motion segmentation, and meta-information) in the database 15 for retrieval through the user interface 17. The interface allows the user to retrieve a video sequence of interest, play it forward or backward, and stop on individual frames. Furthermore, the user may specify queries on a video sequence based upon spatio-temporal, event-based, and object-based parameters.

For example, the user may select a region in the scene and specify the query "show me all objects that are removed from this region of the scene between 8 am and 9 am." In this case, the user interface searches through the video meta-information for objects with timestamps between 8 am and 9 am, then filters this set for objects within the specified region that are marked with "removal" event tags. This results in a set of objects satisfying the user query. From this set, it then assembles a set of video "clips" highlighting the query results. The user may select a clip of interest and proceed with further video analysis using playback or queries as before.
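Such a query reduces to a chain of filters over the meta-information: a temporal filter, a spatial filter, and an event filter. A rough sketch follows, reusing the illustrative record layout above; the rectangular region and helper names are assumptions for illustration, not the query representation actually used by the AVI system.

# Sketch of the "removal events in a region between 8 am and 9 am" query as a
# filter chain over object records.  Times are plain floats here.
from typing import Iterable, List, Tuple

Region = Tuple[int, int, int, int]   # (min_row, min_col, max_row, max_col)

def in_region(rec: "ObjectRecord", region: Region) -> bool:
    r, c = rec.centroid
    r0, c0, r1, c1 = region
    return r0 <= r <= r1 and c0 <= c <= c1

def query_removals(records: Iterable["ObjectRecord"],
                   region: Region,
                   t_start: float,
                   t_end: float) -> List["ObjectRecord"]:
    hits = []
    for rec in records:
        if not (t_start <= rec.timestamp <= t_end):
            continue                          # temporal constraint
        if not in_region(rec, region):
            continue                          # spatial constraint
        if "removal" not in rec.event_tags:
            continue                          # event-based constraint
        hits.append(rec)
    return hits

The records returned would then be expanded into video clips around their frame numbers for playback.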
The following is a description of some of the terms and notation used in the remainder of this application.

A sequence S is an ordered set of N frames, denoted S={F0, F1, ..., FN-1}, where Fn is frame number n in the sequence.

A clip is a 4-tuple e=(S, f, s, l), where S is a sequence with N frames, and f, s, and l are frame numbers such that 0 ≤ f ≤ s ≤ l ≤ N-1. Here, Ff and Fl are the first and last valid frames in the clip, and Fs is the current frame. Thus, a clip specifies a sub-sequence with a state variable to indicate a "frame of interest".

A frame F is an image I annotated with a timestamp t. Thus, frame number n is denoted by the pair Fn=(In, tn).

An image I is an r by c array of pixels. The notation I(i,j) indicates the pixel at coordinates (row i, column j) in I, where i=0, ..., r-1 and j=0, ..., c-1. For purposes of this discussion, a pixel is assumed to be an intensity value between 0 and 255.
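These definitions map directly onto small container types. The sketch below is one such mapping, written for illustration only; the application defines the notation mathematically rather than as code.

# Sketch of the sequence/clip/frame notation as small container types.
import numpy as np
from dataclasses import dataclass
from typing import List

@dataclass
class Frame:
    image: np.ndarray        # r x c array of intensities in [0, 255]  (I_n)
    timestamp: float         # t_n

@dataclass
class Sequence:
    frames: List[Frame]      # F_0 ... F_{N-1}

@dataclass
class Clip:
    seq: Sequence
    first: int               # f, first valid frame
    current: int             # s, the "frame of interest"
    last: int                # l, last valid frame

    def __post_init__(self):
        assert 0 <= self.first <= self.current <= self.last < len(self.seq.frames)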
FIG. 5 shows the AVI system in detail. Note that the motion segmentor 21, object tracker 22, motion analyzer 23, recorder 24, and compressor 25 comprise the vision subsystem 13 of FIG. 1. Likewise, the query engine 27, graphical user interface 28, playback device 29 and decompression module 30 comprise the user interface 17. The subsequent paragraphs describe each of these components in detail.

The current implementation of the AVI system supports batch, rather than real-time, processing. Therefore, frames are digitized into a temporary storage area 20 before further processing occurs. A real-time implementation would bypass the temporary storage 20 and process the video in a pipelined fashion.

FIG. 6 shows the motion segmentor in more detail. For each frame Fn in the sequence, the motion segmentor 21 computes the segmented image Cn as

    Cn = ccomps(Th·k),

where Th is the binary image resulting from thresholding the absolute difference of images In and I0 at h, Th·k is the morphological close operation on Th with structuring element k, and the function ccomps(·) performs connected components analysis, resulting in a unique label for each connected region in image Th·k. The image Th is defined as

    Th(i,j) = 1 if |Dn(i,j)| ≥ h, and 0 otherwise,

for all pixels (i,j) in Th, where Dn is the difference image of In and I0 such that

    Dn(i,j) = In(i,j) - I0(i,j).

For noisy data (such as from an infrared camera), the image Dn may be smoothed via a low-pass filter to create a more consistent difference image.

Finally, the close operation a·k is defined as

    a·k = (a ⊕ k) ⊖ k,

where ⊕ is the morphological dilation operator and ⊖ is the morphological erosion operator.
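The per-frame computation above (difference, optional smoothing, thresholding, morphological close, connected components) can be sketched with standard array and morphology routines. The following is a rough illustration assuming grayscale frames and an arbitrary 5 by 5 structuring element; it is not the AVI implementation.

# Sketch of the motion segmentation step: C_n = ccomps(T_h . k).
# Uses numpy/scipy for illustration; the patent does not prescribe a library.
import numpy as np
from scipy import ndimage

def segment_frame(frame: np.ndarray,
                  reference: np.ndarray,
                  h: float = 25.0,
                  k: np.ndarray = np.ones((5, 5), dtype=bool),
                  smooth_sigma: float = 0.0) -> np.ndarray:
    """Return a labeled image C_n: 0 = background, 1..P = foreground regions."""
    d = frame.astype(float) - reference.astype(float)        # D_n = I_n - I_0
    if smooth_sigma > 0:                                      # optional low-pass filter
        d = ndimage.gaussian_filter(d, smooth_sigma)          # for noisy (e.g. IR) data
    t_h = np.abs(d) >= h                                      # T_h: threshold |D_n| at h
    closed = ndimage.binary_closing(t_h, structure=k)         # T_h . k (dilate, then erode)
    labels, num_regions = ndimage.label(closed)               # connected components analysis
    return labels                                             # C_n

# Example use (hypothetical arrays):
# c_n = segment_frame(current_image, reference_image, h=30)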
FIG. 7 shows an example of this process. FIG. 7a is the reference image I0; FIG. 7b is the image In; FIG. 7c is the absolute difference |Dn|=|In-I0|; FIG. 7d is the thresholded image Th, which highlights motion regions in the image; FIG. 7e is the result of the morphological close operation, which joins together small regions into smoothly shaped objects; FIG. 7f is the result of connected components analysis, which assigns each detected object a unique label such as regions 1-4. This result is Cn, the output of the motion segmentor.

Note that the technique uses a "reference image" for processing. This is nominally the first image from the sequence, I0. For many applications, the assumption of an available reference image is not unreasonable; video capture is simply initiated from a fixed-viewpoint camera when there is limited motion in the scene. Following are some reasons why this assumption may fail in other applications:

1. Gradual lighting changes may cause the reference frame to grow "out of date" over long video sequences, particularly in outdoor scenes. Here, more sophisticated techniques involving cumulative differences of successive video frames must be employed.

2. The viewpoint may change due to camera motion. In this case, camera motion compensation must be used to "subtract" the moving background from the scene.

3. An object may be present in the reference frame and move during the sequence. This causes the motion segmentation process to incorrectly detect the background region exposed by the object as if it were a newly-appearing stationary object in the scene.

A straightforward solution to problem 3 is to apply a test to non-moving regions detected by the motion segmentation process to determine if a given region is the result of either (1) a stationary object present in the foreground or (2) background exposed by a foreground object present in the reference image.

In the case of video data from a TV camera, this test is implemented based on the following observation: if the region detected by the segmentation of image In is due to the motion of an object present in the reference image (i.e., due to "exposed background"), a high probability exists that the boundary of the segmented region will match intensity edges detected in I0. If the region is due to the presence of an object in the current image, a high probability exists that the region boundary will match intensity edges in In. The test is implemented by applying an edge detection operator (see D.
Ballard and C. Brown, Computer Vision, Prentice-Hall, Englewood Cliffs, N.J., 1982) to the current and reference images and checking for coincident boundary pixels in the segmented region of Cn.

In the case of video data from an IR camera, foreground objects may not have easily detectable edges due to heat diffusion and image blurring. In data from some cameras, however, objects exhibit a contrasting halo due to opto-mechanical image sharpening. See A. Rosenfeld and A. Kak, Digital Picture Processing, 2nd ed., Volume 1, Academic Press, New York, N.Y., 1982. Thus, the test may be implemented by comparing the variance of pixel intensities within the region of interest in the two images. Since background regions tend to exhibit constant pixel intensities, the variance will be highest for the image containing the foreground object.

The object detection method for scene change analysis in TV and IR data is described in the above-cited application of Courtney, et al., incorporated herein by reference, and in Appendix A.

If either test supports the hypothesis that the region in question is due to exposed background, the reference image is modified by replacing the object with its exposed background region (see FIG. 8).
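Both forms of the exposed-background test compare simple statistics of the candidate region in In against I0. A rough sketch of the two comparisons follows; the gradient-based edge detector, thresholds, and decision rules are stand-ins chosen for illustration, not the exact test of this application or its Appendix A.

# Sketch of the exposed-background test for a single non-moving region.
# region_mask is a boolean array marking the region in question.
import numpy as np
from scipy import ndimage

def edge_map(image: np.ndarray, threshold: float = 30.0) -> np.ndarray:
    """Simple gradient-magnitude edge detector (stand-in for the operator cited above)."""
    gy = ndimage.sobel(image.astype(float), axis=0)
    gx = ndimage.sobel(image.astype(float), axis=1)
    return np.hypot(gx, gy) >= threshold

def looks_like_exposed_background_tv(region_mask, current, reference) -> bool:
    """TV data: does the region boundary coincide more with edges in I_0 than in I_n?"""
    boundary = region_mask ^ ndimage.binary_erosion(region_mask)
    hits_ref = np.sum(boundary & edge_map(reference))
    hits_cur = np.sum(boundary & edge_map(current))
    return hits_ref > hits_cur

def looks_like_exposed_background_ir(region_mask, current, reference) -> bool:
    """IR data: the image containing the foreground object has the higher variance."""
    return np.var(current[region_mask]) < np.var(reference[region_mask])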
`No known motion segmentation technique is perfect. The 25
`following are errors typical of many motion segmentation
`techniques:
`1. True objects will disappear temporarily from the
`motion segmentation record. This occurs when there is
`insufficient contrast between an object and an occluded 30
`background region, or if an object is partially occluded
`by a "background" structure (for instance, a tree or
`pillar present in the scene).
`2. False objects will appear temporarily in the motion
`segmentation record. This is caused by light fluctua- 35
`tions or shadows cast by moving objects.
`3. Separate objects will temporarily join together. This
`typically occurs when two or more objects are in close
`proximity or one object occludes another object.
`4. Single objects will split into two regions and then 40
`rejoin. This occurs when a portion of an object has
`insufficient contrast with the background it occludes.
`Instead of applying incremental improvements to relieve
`the shortcomings of motion segmentation, the AVI technique
`addresses these problems at a higher level where informa- 45
`tion about the semantic content of the video data is more
`readily available. The object tracker and motion analyzer
`units described later employ object trajectory estimates and
`domain knowledge to compensate for motion segmentation
`inaccuracies and thereby construct a more accurate record of
`the video content.
`The motion segmentor 21 output is processed by the
`object tracker 22. Given a segmented image Cn with P
`uniquely-labeled regions corresponding to foreground
`objects in the video, the system generates a set of features to
`represent each region. This set of features is named a
`"V-object" (video-object), denoted V ,;, p=l, . . . ,P. A
`V-object contains the label, centroid, bounding box, and
`shape mask of its corresponding region, as well as object
`velocity and trajectory information by the tracking process. 60
`V-objects are then tracked through the segmented video
`sequence. Given segmented images Cn and Cn+i with
`V-objects Vn={VnP; p=l, . . . ,P} and vn+1={Vn+lq;
`q=l, ... ,Q}, respectively, the motion tracking process
`"links" V-objects V,; and V n+l q if their position and esti(cid:173)
`mated velocity indicate that they correspond to the same
`real-world object appearing in frames Fn and Fn+i· This is
`
`5,969,755
`
`8
`determined using linear predicti
