Pergamon

Pattern Recognition, Vol. 30, No. 4, pp. 607-625, 1997
© 1997 Pattern Recognition Society. Published by Elsevier Science Ltd
Printed in Great Britain. All rights reserved
0031-3203/97 $17.00+.00

PII: S0031-3203(96)00107-0

AUTOMATIC VIDEO INDEXING VIA OBJECT MOTION ANALYSIS

JONATHAN D. COURTNEY*
Texas Instruments, Incorporated, 8330 LBJ Freeway, MS 8374, Dallas, Texas 75243, U.S.A.
* E-mail: courtney@csc.ti.com.

(Received 12 June 1996; received for publication 30 July 1996)

Abstract-To assist human analysis of video data, a technique has been developed to perform automatic,
content-based video indexing from object motion. Moving objects are detected in the video sequence using
motion segmentation methods. By tracking individual objects through the segmented data, a symbolic
representation of the video is generated in the form of a directed graph describing the objects and their
movement. This graph is then annotated using a rule-based classification scheme to identify events of interest,
e.g., appearance/disappearance, deposit/removal, entrance/exit, and motion/rest of objects. One may then use an
index into the motion graph instead of the raw data to analyse the semantic content of the video. Application of
this technique to surveillance video analysis is discussed. © 1997 Pattern Recognition Society. Published by
Elsevier Science Ltd.

Video indexing        Object tracking        Motion analysis        Content-based retrieval

1. INTRODUCTION

Advances in multimedia technology, including commercial prospects for video-on-demand and digital library
systems, have generated recent interest in content-based video analysis. Video data offers users of multimedia
systems a wealth of information; however, it is not as readily manipulated as other data such as text. Raw video
data has no immediate "handles" by which the multimedia system user may analyse its contents. By annotating
video data with symbolic information describing the semantic content, one may facilitate analysis beyond
simple serial playback.
To assist human analysis of video data, a technique has been developed to perform automatic, content-based
video indexing from object motion. Moving objects are detected in the video sequence using motion
segmentation methods. By tracking individual objects through the segmented data, a symbolic representation
of the video is generated in the form of a directed graph describing the objects and their movement. This graph
is then annotated using a rule-based classification scheme to identify events of interest, e.g., appearance/
disappearance, deposit/removal, entrance/exit, and motion/rest of objects. One may then use an index into the
motion graph instead of the raw data to analyse the semantic content of the video.
We have developed a system that demonstrates this indexing technique in assisted analysis of surveillance
video data. The Automatic Video Indexing (AVI) system allows the user to select a video sequence of interest,
play it forward or backward and stop at individual frames. Furthermore, the user may specify queries on video
sequences and "jump" to events of interest to avoid tedious serial playback. For example, the user may select
a person in a video sequence and specify the query "show me all objects that this person removed from the
scene". In response, the AVI system assembles a set of video "clips" highlighting the query results. The user
may select a clip of interest and proceed with further video analysis using queries or playback as before.
The remainder of this paper is organized as follows: Section 2 discusses content-based video analysis.
Section 3 presents a video indexing technique based on object motion analysis. Section 4 describes a system
which implements this video indexing technique for scene monitoring applications. Section 5 presents
experimental results using the system. Section 6 concludes the paper.

2. CONTENT-BASED VIDEO ANALYSIS

Video data poses unique problems for multimedia information systems that text does not. Textual data is
a symbolic abstraction of the spoken word that is usually generated and structured by humans. Video, on the
other hand, is a direct recording of visual information. In its raw and most common form, video data is subject
to little human-imposed structure, and thus has no immediate "handles" by which the multimedia system user
may analyse its contents.
For example, consider an online movie screenplay (textual data) and a digitized movie (video and audio
data). If one were analysing the screenplay and interested in searching for instances of the word "horse" in the
text, various text searching algorithms could be employed to locate every instance of this symbol as desired.
Such analysis is common in online text databases. If, however, one were interested in searching for every scene
in the digitized movie where a horse appeared, the task is much more difficult. Unless a human performs some
sort of
pre-processing of the video data, there are no symbolic keys on which to search. For a computer to assist in the
search, it must analyse the semantic content of the video data itself. Without such capabilities, the information
available to the multimedia system user is greatly reduced.
Thus, much research in video analysis focuses on semantic content-based search and retrieval techniques.
Video indexing refers to the process of identifying important frames or objects in the video data for efficient
playback. An indexed video sequence allows a user not only to play the sequence in the usual serial fashion, but
also to "jump" to points of interest while it plays. A common indexing scheme is to employ scene cut
detection(1) to determine breakpoints in the video data. Indexing has also been performed based on camera
(i.e. viewpoint) motion(2) and object motion.(3,4)
Using breakpoints found via scene cut detection, other researchers have pursued hierarchical segmentation(5-7)
to analyse the logical organization of video sequences. In the same way that text is organized into sentences,
paragraphs, and chapters, the goal of these techniques is to determine a hierarchical grouping of video
sub-sequences. Combining this structural information with content abstractions of segmented sub-sequences(8)
provides multimedia system users a top-down view of video data.
The indexing technique described in this paper (the "AVI technique") performs video indexing based on
object motion analysis. Unlike previous work, it forms semantically high-level interpretations of object actions
and interactions from the object motion information. This allows multimedia system users to search for
object-motion "events" in the video sequence (such as object entrance or exit) rather than features related to
object velocity alone (such as "northeast movement").

3. VIDEO INDEXING VIA OBJECT MOTION ANALYSIS

Given a video sequence, the AVI technique analyses the motion of foreground objects in the data and indexes
the objects to indicate the occurrence of several events of interest. It outputs a symbolic abstraction of the video
content in the form of an annotated directed graph containing the indexed objects. This symbolic data may then
be read by a user interface to perform content-based queries on the video data.
The AVI technique processes the video data in three stages: motion segmentation, object tracking, and motion
analysis. First, motion segmentation methods(9,10) are used to segment moving foreground objects from the
scene background in each frame. Next, each object is tracked through successive video frames, resulting in a
graph describing object motion and path intersections. Then the motion graph is scanned for the occurrence of
several events of interest. This is performed using a rule-based classifier which employs knowledge concerning
object motion and the output of the previous stages to characterize the activity of the objects recorded in the
graph. For example, a moving object that occludes another object results in a "disappear" event; a moving
object that intersects and then removes a stationary object results in a "removal" event. An index is then created
which identifies the location of each event in the video sequence.
Figure 1 depicts the relation between the video data, motion segmentation information, and the motion graph.
Note that for each frame of the video, the AVI technique creates a corresponding symbolic "frame" to describe it.

3.1. Terminology and notation

The following is a description of some of the terms and notation used in the subsequent sections:

• A sequence S is an ordered set of N frames, denoted S = {F_0, F_1, ..., F_{N-1}}, where F_n is frame
  number n in the sequence.
• A clip is a 4-tuple (S, f, s, l), where S is a sequence with N frames, and f, s, and l are frame numbers
  such that 0 ≤ f ≤ s ≤ l ≤ N - 1. Here, F_f and F_l are the first and last valid frames in the clip, and F_s is
  the "start" frame. Thus, a clip specifies a sub-sequence and contains a state variable to indicate a "frame
  of interest".
Fig. 1. Relation between video data, motion segmentation information, and the symbolic motion graph.
• A frame F is an image I annotated with a timestamp t. Thus, frame number n is denoted by the pair
  F_n = (I_n, t_n).
• An image I is an r × c array of pixels. The notation I(i, j) indicates the pixel at coordinates (row i, column j).
  For purposes of this discussion, a pixel is assumed to be an intensity value between 0 and 255.
• A timestamp records the date and time that an image was digitized.
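
As a concrete illustration of the notation above, the following is a minimal sketch of how frames and clips
might be represented in code. It assumes NumPy arrays for images; the class and field names are illustrative
rather than taken from the AVI system.

    # Illustrative sketch (not from the paper): one way to represent the
    # terminology above in Python, assuming 8-bit grayscale NumPy images.
    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class Frame:
        image: np.ndarray   # r x c array of intensity values in [0, 255]
        timestamp: float    # date/time the image was digitized (seconds)

    @dataclass
    class Clip:
        sequence: list      # list of Frame objects F_0 ... F_{N-1}
        f: int              # first valid frame number
        s: int              # "start" frame of interest
        l: int              # last valid frame number

        def __post_init__(self):
            assert 0 <= self.f <= self.s <= self.l <= len(self.sequence) - 1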

3.2. Motion segmentation

For each frame F_n in the sequence, the motion segmentation stage computes segmented image C_n as

    C_n = ccomps(T_h • k),

where T_h is the binary image resulting from thresholding the absolute difference of images I_n and I_0 at h,
T_h • k is the morphological close operation(12) on T_h with structuring element k, and the function ccomps(·)
performs connected components analysis,(11) resulting in a unique label for each connected region in image
T_h • k. The image T_h is defined as

    T_h(i, j) = 1 if |I_n(i, j) - I_0(i, j)| ≥ h, and 0 otherwise,

for all pixels (i, j) in I_n.
Figure 2 shows an example of this process. Absolute differencing and thresholding [Fig. 2(c) and (d)] detect
motion regions in the image. The morphological close operation shown in Fig. 2(e) joins together small regions
into smoothly-shaped objects. Connected components analysis assigns each detected object a unique label, as
shown in Fig. 2(f). Components smaller than a given size threshold are discarded. The result is C_n, the output
of the motion segmentation stage.
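
The segmentation step maps directly onto standard image-processing primitives. Below is a minimal sketch of
the C_n = ccomps(T_h • k) computation, assuming grayscale NumPy images and using SciPy's morphology and
labeling routines as stand-ins for the paper's unspecified implementations; the threshold h, structuring element
size, and minimum-area value are illustrative.

    # Sketch of the motion segmentation stage described above (assumptions:
    # grayscale uint8 images, SciPy morphology/labeling as stand-ins).
    import numpy as np
    from scipy import ndimage

    def segment_motion(I_n, I_0, h=30, close_size=5, min_area=50):
        # Threshold the absolute difference of I_n and the reference image I_0 at h.
        T_h = np.abs(I_n.astype(int) - I_0.astype(int)) >= h

        # Morphological close with a square structuring element k.
        k = np.ones((close_size, close_size), dtype=bool)
        closed = ndimage.binary_closing(T_h, structure=k)

        # Connected components analysis: unique label per connected region.
        C_n, num_labels = ndimage.label(closed)

        # Discard components smaller than the size threshold.
        for label in range(1, num_labels + 1):
            if np.sum(C_n == label) < min_area:
                C_n[C_n == label] = 0
        return C_n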
The motion segmentation technique described here is best suited for video sequences containing object motion
within an otherwise static scene, such as in surveillance and scene monitoring applications. Note that the
technique uses a "reference image" for processing. This is nominally the first image from the sequence, I_0.
For many applications, the assumption of an available reference image is not unreasonable; video capture is
simply initiated from a fixed-viewpoint camera when there is limited motion in the scene. Following are some
reasons why this assumption may fail in other applications:

1. Sudden lighting changes may render the reference frame invalid. However, techniques such as scene cut
   detection(1) may be used to detect such occurrences and indicate when a new reference image must be
   acquired.
2. Gradual lighting changes may cause the reference image to slowly grow "out of date" over long video
   sequences, particularly in outdoor scenes. Here, more sophisticated techniques involving cumulative
   differences of successive video frames(13) must be employed.
3. The viewpoint may change due to camera motion. In this case, camera motion compensation(14) must be
   used to offset the effect of an apparent moving background.
4. An object may be present in the reference frame and move during the sequence. This causes the motion
   segmentation process to incorrectly detect the background region exposed by the object as if it were a
   newly-appearing stationary object in the scene.

A straightforward solution to problem 4 is to apply a test to non-moving regions detected by the motion
segmentation process based on the following observation: if
Fig. 2. Motion segmentation example. (a) Reference image I_0. (b) Image I_n. (c) Absolute difference |I_n - I_0|.
(d) Thresholded image T_h. (e) Result of morphological close operation. (f) Result of connected components
analysis.
the region detected by the segmentation of image I_n is due to the motion of an object present in the reference
image (i.e. due to "exposed background"), a high probability exists that the boundary of the segmented region
will coincide with intensity edges detected in I_0. If the region is due to the presence of a foreground object in
the current image, a high probability exists that the region boundary will coincide with intensity edges in I_n.
The test is implemented by applying an edge detection operator to the current and reference images and
checking for coincident boundary pixels in the segmented region of C_n.(9) Figure 3 shows this process. If the
test supports the hypothesis that the region in question is due to exposed background, the reference image is
modified by replacing the object with its exposed background region (see Fig. 4).
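
The exposed-background test can be sketched as a comparison of edge-coincidence counts. The following is a
rough illustration under stated assumptions: Sobel edge magnitudes thresholded to binary edge maps, and the
region boundary extracted by erosion; the thresholds and helper names are illustrative, not the paper's.

    # Rough sketch of the exposed-background test (assumptions: Sobel edges
    # thresholded to binary maps; boundary taken as region minus its erosion).
    import numpy as np
    from scipy import ndimage

    def edge_map(image, edge_thresh=100):
        gx = ndimage.sobel(image.astype(float), axis=0)
        gy = ndimage.sobel(image.astype(float), axis=1)
        return np.hypot(gx, gy) >= edge_thresh

    def is_exposed_background(region_mask, I_n, I_0):
        # Boundary pixels of the segmented region in C_n.
        boundary = region_mask & ~ndimage.binary_erosion(region_mask)

        # Count boundary pixels coincident with edges in the reference vs. current image.
        coincident_ref = np.sum(boundary & edge_map(I_0))
        coincident_cur = np.sum(boundary & edge_map(I_n))

        # More coincidence with reference-image edges supports the
        # "exposed background" hypothesis.
        return coincident_ref > coincident_cur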
No motion segmentation technique is perfect. The following are errors typical of many motion segmentation
techniques:

1. True objects will disappear temporarily from the motion segmentation record. This occurs when there is
   insufficient contrast between an object and an occluded background region, or if an object is partially
   occluded by a "background" structure (for instance, a tree or pillar present in the scene).
2. False objects will appear temporarily in the motion segmentation record. This is caused by light
   fluctuations or shadows cast by moving objects.
3. Separate objects will temporarily join together. This typically occurs when two or more objects are in
   close proximity or when one object occludes another object.
4. Single objects will split into multiple regions. This occurs when a portion of an object has insufficient
   contrast with the background it occludes.
`
`Instead of applying incremental improvements to re(cid:173)
`lieve the shortcomings of motion segmentation, the AVI
`technique addresses these problems at a higher level
`where information about the semantic content of the
`video data is more readily available. The object tracking
`and motion analysis stages described in Sections 3.3 and
`3.4 employ object trajectory estimates and knowledge
`concerning object motion and typical motion segmenta(cid:173)
`tion errors to construct a more accurate representation of
`the video content.

3.3. Object tracking

The motion segmentation output is processed by the object tracking stage. Given a segmented image C_n with
P uniquely-labeled regions corresponding to foreground objects in the video, the system generates a set of
features to represent each region. This set of features is named a "V-object" (video-object), denoted V_n^p,
p = 1, ..., P. A V-object contains the label, centroid, bounding box, and shape mask of its corresponding
region, as well as object velocity and trajectory information generated by the tracking process.
V-objects are then tracked through the segmented video sequence. Given segmented images C_n and C_{n+1}
with V-objects V_n = {V_n^p; p = 1, ..., P} and V_{n+1} = {V_{n+1}^q; q = 1, ..., Q}, respectively, the motion
tracking process "links" V-objects V_n^p and V_{n+1}^q if their position and estimated velocity indicate that
they correspond to the same real-world object appearing in frames F_n and F_{n+1}. This is determined using
linear prediction of V-object positions and a "mutual nearest neighbor" criterion via the following procedure:

1. For each V-object V_n^p in V_n, predict its position in the next frame using

       μ̂_n^p = μ_n^p + v_n^p · (t_{n+1} - t_n),

   where μ̂_n^p is the predicted centroid of V_n^p in C_{n+1}, μ_n^p the centroid of V_n^p measured in C_n,
   v_n^p the estimated (forward) velocity of V_n^p, and t_{n+1} and t_n are the timestamps of frames F_{n+1}
   and F_n, respectively. Initially, the velocity estimate is set to v_n^p = (0, 0).
2. For each V_n^p in V_n, determine the V-object in the next frame with centroid nearest μ̂_n^p. This "nearest
   neighbor" is denoted N(V_n^p). Thus,

       N(V_n^p) = V_{n+1}^r such that ||μ̂_n^p - μ_{n+1}^r|| ≤ ||μ̂_n^p - μ_{n+1}^q|| for all q ≠ r.

3. For every pair (V_n^p, N(V_n^p) = V_{n+1}^r) for which no other V-objects in V_n have V_{n+1}^r as a
   nearest neighbor, estimate v_{n+1}^r, the (forward) velocity of V_{n+1}^r, as

       v_{n+1}^r = (μ_{n+1}^r - μ_n^p) / (t_{n+1} - t_n);                                   (1)

   otherwise, set v_{n+1}^r = (0, 0).
These steps are performed for each C_n, n = 0, 1, ..., N - 2. Steps 1 and 2 find nearest neighbors in the
subsequent frame for each V-object. Step 3 generates velocity estimates for V-objects that can be
unambiguously tracked; this information is used in step 1 to predict V-object positions for the next frame.
Next, steps 1-3 are repeated for the reverse sequence, i.e. C_n, n = N - 1, N - 2, ..., 1. This results in a new
set of predicted centroids, velocity estimates, and nearest neighbors for each V-object in the reverse direction.
Thus, the V-objects are tracked both forward and backward through the sequence. The remaining steps are
then performed:
4. V-objects V_n^p and V_{n+1}^q are mutual nearest neighbors if N(V_n^p) = V_{n+1}^q and
   N(V_{n+1}^q) = V_n^p. (Here, N(V_n^p) is the nearest neighbor of V_n^p in the forward direction, and
   N(V_{n+1}^q) is the nearest neighbor of V_{n+1}^q in the reverse direction.) For each pair of mutual nearest
   neighbors (V_n^p, V_{n+1}^q), create a primary link from V_n^p to V_{n+1}^q.
5. For each V_n^p in V_n without a mutual nearest neighbor, create a secondary link from V_n^p to N(V_n^p)
   if the predicted centroid μ̂_n^p is within ε of N(V_n^p), where ε is some small distance.
6. For each V_{n+1}^q in V_{n+1} without a mutual nearest neighbor, create a secondary link from
   N(V_{n+1}^q) to V_{n+1}^q if the predicted centroid μ̂_{n+1}^q is within ε of N(V_{n+1}^q).
The object tracking procedure uses the mutual nearest neighbor criterion (step 4) to estimate frame-to-frame
V-object trajectories with a high degree of confidence. Pairs of mutual nearest neighbors are connected using a
"primary" link to indicate that they are highly likely to represent the same real-world object in successive
video frames.
Fig. 3. Exposed background detection. (a) Reference image I_0. (b) Image I_n. (c) Region to be tested.
(d) Edge image of (a), found using the Sobel(11) operator. (e) Edge image of (b). (f) Edge image of (c), showing
boundary pixels. (g) Pixels coincident in (d) and (f). (h) Pixels coincident in (e) and (f). The greater number of
coincident pixels in (g) versus (h) supports the hypothesis that the region in question is due to exposed
background.
Fig. 4. Reference image modified to account for the exposed background region detected in Fig. 3.
Steps 5 and 6 associate V-objects that are tracked with less confidence but display evidence that they might
result from the same real-world object. Thus, these objects are joined by "secondary" links. These steps are
necessary to account for the "split" and "join" type motion segmentation errors as described in Section 3.2.
The object tracking process results in a list of V-objects and connecting links that form a directed graph
(digraph) representing the position and trajectory of foreground objects in the video sequence. Thus, the
V-objects are the nodes of the graph and the connecting links are the arcs. This motion graph is the output of
the object tracking stage.
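
As a rough illustration of the tracking pass, the sketch below predicts V-object centroids, finds mutual nearest
neighbors, and creates primary or secondary links. It is an assumption-laden simplification (a single forward
pass with the reverse check folded in, Euclidean distances, illustrative class names and thresholds) rather than
the AVI implementation.

    # Sketch of one tracking pass between consecutive frames (simplified:
    # "mutual" here means the nearest-neighbor relation holds both ways
    # between the two frames; names and thresholds are illustrative).
    import numpy as np

    class VObject:
        def __init__(self, label, centroid, timestamp):
            self.label = label
            self.centroid = np.asarray(centroid, dtype=float)
            self.velocity = np.zeros(2)      # initially (0, 0)
            self.timestamp = timestamp
            self.links = []                  # (target VObject, "primary" | "secondary")

    def predict(v, t_next):
        # Step 1: linear prediction of the centroid at time t_next.
        return v.centroid + v.velocity * (t_next - v.timestamp)

    def track_frame_pair(V_n, V_next, t_next, eps=15.0):
        if not V_n or not V_next:
            return

        # Step 2: nearest neighbor in the next frame for each V-object.
        nn = {v: min(V_next, key=lambda w: np.linalg.norm(predict(v, t_next) - w.centroid))
              for v in V_n}

        # Nearest neighbor in the current frame for each next-frame V-object
        # (stand-in for the backward pass).
        rnn = {w: min(V_n, key=lambda v: np.linalg.norm(predict(v, t_next) - w.centroid))
               for w in V_next}

        for v in V_n:
            w = nn[v]
            if rnn[w] is v:
                # Step 4: mutual nearest neighbors get a primary link, and
                # step 3: an unambiguous forward velocity estimate for w.
                v.links.append((w, "primary"))
                w.velocity = (w.centroid - v.centroid) / (t_next - v.timestamp)
            elif np.linalg.norm(predict(v, t_next) - w.centroid) <= eps:
                # Steps 5-6: weaker evidence gets a secondary link.
                v.links.append((w, "secondary"))

Running this over successive frame pairs yields a digraph whose nodes are V-objects and whose arcs are the
primary and secondary links, in the spirit of the motion graph described above.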
Figure 5 shows a motion graph for a hypothetical sequence of one-dimensional frames. Here, the system
detects the appearance of an object at A and tracks it to the V-object at B. Due to an error in motion
segmentation, the object splits at D and E, and joins at F. At G, the object joins with the object tracked from C
due to occlusion. These objects split at H and I. Note that primary links connect the V-objects that were most
reliably tracked.

3.4. Motion analysis

The motion analysis stage analyses the results of object tracking and annotates the motion graph with tags
describing several events of interest. This process proceeds in two parts: V-object grouping and V-object
indexing. Figure 6 shows an example motion graph for a hypothetical sequence of 1-D frames discussed in the
following sections.
Fig. 5. The output of the object tracking stage for a hypothetical sequence of 1-D frames. The vertical lines
labeled "F_n" represent frame number n. Primary links are shown as solid arcs; secondary links are shown as
dashed arcs.
Fig. 6. An example motion graph for a sequence of 1-D frames.
3.4.1. V-object grouping. First, the motion analysis stage hierarchically groups V-objects into structures
representing the paths of objects through the video data. Using graph theory terminology,(15) five groupings
are defined for this purpose:
A stem M = {V_i : i = 1, 2, ..., N_M} is a maximal-size, directed path (dipath) of two or more V-objects
containing no secondary links, meeting all of the following conditions:

• outdegree(V_i) = 1 for 1 ≤ i < N_M,
• indegree(V_i) = 1 for 1 < i ≤ N_M, and
• either

      ||μ_{i+1} - μ_i|| ≤ h_s for 1 ≤ i < N_M,                                   (2)

  or

      ||μ_{i+1} - μ_i|| > h_s for 1 ≤ i < N_M,                                   (3)

where μ_i is the centroid of V-object V_i in M and h_s is a small distance threshold.
Thus, a stem represents a simple trajectory of an object through two or more frames. Figure 7 labels V-objects
from Fig. 6 belonging to separate stems with the letters "A" through "K".
Stems are used to determine the motion "state" of real-world objects, i.e. whether they are moving or
stationary. If equation (2) is true, then the stem is classified as stationary; if equation (3) is true, then the stem
is classified as moving. Figure 7 highlights stationary stems B, C, F, and H; the remainder are moving.
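
A stem's motion state therefore reduces to a test on its centroid displacements. The fragment below sketches
that classification under the assumption (from the reconstruction of equations (2) and (3) above) that successive
centroid displacements are compared against a threshold h_s; the function name and threshold value are
illustrative.

    # Sketch of stem motion-state classification (assumes the stationary/moving
    # test compares successive centroid displacements against a threshold h_s).
    import numpy as np

    def classify_stem(centroids, h_s=3.0):
        steps = [np.linalg.norm(np.subtract(b, a))
                 for a, b in zip(centroids[:-1], centroids[1:])]
        if all(d <= h_s for d in steps):
            return "stationary"    # equation (2) holds
        if all(d > h_s for d in steps):
            return "moving"        # equation (3) holds
        return None                # neither condition holds over the whole stem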
A branch B = {V_i : i = 1, 2, ..., N_B} is a maximal-size dipath of two or more V-objects containing no
secondary links, for which outdegree(V_i) = 1 for 1 ≤ i < N_B and indegree(V_i) = 1 for 1 < i ≤ N_B. Figure 8
labels V-objects belonging to branches with the letters "L" through "T". A branch represents a highly reliable
trajectory estimate of an object through a series of frames.
If a branch consists entirely of a single stationary stem, then it is classified as stationary; otherwise, it is
classified as moving. Branches "N" and "Q" in Fig. 8 (highlighted) are stationary; the remainder are moving.
A trail L is a maximal-size dipath of two or more V-objects that contains no secondary links. This grouping
represents the object tracking stage's best estimate of an object trajectory using the mutual nearest neighbor
criterion. Figure 9 labels V-objects belonging to trails with the letters "U" through "Z".
Fig. 7. Stems. Stationary stems are highlighted.
Fig. 8. Branches. Stationary branches are highlighted.
Fig. 9. Trails.
A trail and the V-objects it contains are classified as stationary if all the branches it contains are stationary,
and moving if all the branches it contains are moving. Otherwise, the trail is classified as unknown. Trail W in
Fig. 9 is stationary; the remainder are moving.
A track K = {L_1, G_1, ..., L_{N_K - 1}, G_{N_K - 1}, L_{N_K}} is a dipath of maximal size containing trails
{L_i : 1 ≤ i ≤ N_K} and connecting dipaths {G_i : 1 ≤ i < N_K}. For each G_i in K there must exist a dipath

    H = {V_i^l, G_i, V_{i+1}^f}

(where V_i^l is the last V-object in L_i, and V_{i+1}^f is the first V-object in L_{i+1}), such that every V_j in H
meets the requirement

    ||μ_i^l + v_i^l · (t_j - t_i) - μ_j|| ≤ h_K,                                   (4)

where μ_i^l is the centroid of V_i^l, v_i^l the forward velocity of V_i^l, (t_j - t_i) the time difference between
the frames containing V_j and V_i^l, μ_j the centroid of V_j, and h_K a distance threshold. Thus, equation (4)
specifies that the object must maintain a constant velocity through path H.
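
In code, this track-joining test amounts to extrapolating the last V-object of one trail at constant velocity and
checking that every V-object along the connecting dipath stays close to the extrapolated position. The sketch
below assumes the reconstruction of equation (4) given above; the threshold h_K and data layout are illustrative.

    # Sketch of the constant-velocity test (equation (4) as reconstructed above)
    # used to decide whether a connecting dipath H may join two trails.
    import numpy as np

    def satisfies_track_join(last_vobj, path_H, h_K=10.0):
        mu_l = np.asarray(last_vobj.centroid, dtype=float)
        v_l = np.asarray(last_vobj.velocity, dtype=float)
        t_l = last_vobj.timestamp
        for vj in path_H:
            predicted = mu_l + v_l * (vj.timestamp - t_l)
            if np.linalg.norm(predicted - vj.centroid) > h_K:
                return False
        return True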
A track represents the trajectory estimate of an object that may cause or undergo occlusion one or more times
in a sequence. The motion analysis stage uses equation (4) to attempt to follow an object through frames where
an occlusion occurs. Figure 10 labels V-objects belonging to tracks with the letters "α", "β", "χ", "δ" and "ε".
Note that track δ joins trails X and Y.
A track and the V-objects it contains are classified as stationary if all the trails it contains are stationary, and
moving if all the trails it contains are moving. Otherwise, the track is classified as unknown. Track χ in Fig. 10
is stationary; the remaining tracks are moving.
A trace is a maximal-size, connected digraph of V-objects. A trace represents the complete trajectory of an
object and all the objects with which it intersects. Thus, the motion graph in Fig. 6 contains two traces: one
trace extends from F_2 to F_7; the remaining V-objects form a second trace. Figure 11 labels V-objects on
these traces with the numbers "1" and "2".
Note that the preceding groupings are hierarchical, i.e. for every trace E, there exists at least one track K, trail
L, branch B, and stem M such that E ⊇ K ⊇ L ⊇ B ⊇ M. Furthermore, every V-object is a member of exactly
one trace.
The motion analysis stage scans the motion graph generated by the object tracking stage and groups
V-objects into stems, branches, trails, tracks, and traces.
Fig. 10. Tracks. The dipath connecting trails X and Y from Fig. 9 is highlighted.
Fig. 11. Traces.
Thus, these five definitions are used to characterize object trajectories in various portions of the motion
graph. This information is then used to index the video according to its object motion content.

3.4.2. V-object indexing. Eight events of interest are defined to designate various object-motion events in a
video sequence:

Appearance: An object emerges in the scene.
Disappearance: An object disappears from the scene.
Entrance: A moving object enters the scene.
Exit: A moving object exits from the scene.
Deposit: An inanimate object is added to the scene.
Removal: An inanimate object is removed from the scene.
Motion: An object at rest begins to move.
Rest: A moving object comes to a stop.
`
`These eight events are sufficiently broad for a video
`indexing system to assist the analysis of many sequences.
`For example, valuable objects such as inventory boxes,
`tools, computers, etc., can be monitored for theft (i.e.
`removal) in a security monitoring application. Likewise,
`the traffic patterns of automobiles can be analysed (e.g.,
`entrance/exit and motion/rest), or the shopping patterns
`of retail customers recorded (e.g., motion/rest and re(cid:173)
`moval).
After the V-object grouping process is complete, the motion analysis stage has all the semantic information
necessary to identify these eight events in a video sequence. For each V-object V in the graph, the following
rules are applied to annotate the nodes of the motion graph with event tags:

1. If V is moving, the first V-object in a track (i.e. the "head"), and indegree(V) > 0, place a tag designating
   an appearance event at V.
2. If V is stationary, the head of a track, and indegree(V) = 0, place a tag designating an appearance event
   at V.
3. If V is moving, the last V-object in a track (i.e. the "tail"), and outdegree(V) > 0, place a disappearance
   event tag at V.
4. If V is stationary, the tail of a track, and outdegree(V) = 0, place a disappearance event tag at V.
5. If V is non-stationary (i.e. moving or unknown), the head of a track, and indegree(V) = 0, place an
   entrance event tag at V.
6. If V is non-stationary, the tail of a track, and outdegree(V) = 0, place an exit event tag at V.
7. If V is stationary, the head of a track, and indegree(V) = 1, place a deposit event tag at V.
8. If V is stationary, the tail of a track, and outdegree(V) = 1, place a removal event tag at V.
`
`Rules 1-8 use track groupings to annotate the video at
`the beginning and end of individual object trajectories.
`Note, however, that rules 7 and 8 only account for the
`object deposited or removed from the scene; they do not
`tag the V-object that caused the deposit or remove event
`
`to occur. For this purpose, we define two additional
`events-
`
`Depositor: A moving object adds an inanimate object
`to the scene.
`Remover: A moving object removes an inanimate
`object from the scene.
`
`-and apply two more rules:
`
`9. If V is moving and adjacent to a V-object with a
`deposit event tag, place a depositor event tag at V.
`10. If Vis moving and adjacent from a V-object with a
`removal event tag, place a remover event tag at V.
`
`The additional events depositor and remover are used
`to provide a distinction between the subject and object of
`deposit/removal events. These events are only used when
`the actions of a specific moving object must be analysed.
`Otherwise, their deposit/removal counterparts are suffi(cid:173)
`cient indication of the occurrence of the event.
`Finally, two additional rules are applied to account for
`the motion and rest events:
`
11. If V is the tail of a stationary stem M_i and the head of a moving stem M_j for which |M_i| ≥ h_M and
    |M_j| ≥ h_M, then place a motion event tag at V. Here, h_M is a lower size limit of stems to consider.
12. If V is the tail of a moving stem M_i and the head of a stationary stem M_j for which |M_i| ≥ h_M and
    |M_j| ≥ h_M, then place a rest event tag at V.
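
The rule set above is essentially a pattern match over each V-object's motion state, its position within a track,
and its in/out degree in the motion graph. The fragment below sketches a few representative rules (1, 5, 7, and
8) to show the shape of such a classifier; the attribute names are illustrative, not those of the AVI system.

    # Sketch of the rule-based event tagging for a few representative rules.
    # Assumed per-V-object attributes (illustrative names): state in
    # {"moving", "stationary", "unknown"}, is_track_head/is_track_tail flags,
    # and indegree/outdegree counts taken from the motion graph.
    def tag_events(v):
        tags = []
        if v.state == "moving" and v.is_track_head and v.indegree > 0:
            tags.append("appearance")                       # rule 1
        if v.state in ("moving", "unknown") and v.is_track_head and v.indegree == 0:
            tags.append("entrance")                         # rule 5
        if v.state == "stationary" and v.is_track_head and v.indegree == 1:
            tags.append("deposit")                          # rule 7
        if v.state == "stationary" and v.is_track_tail and v.outdegree == 1:
            tags.append("removal")                          # rule 8
        return tags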
Table 1 summarizes the conditions under which rules 1-12 apply event tags to V-objects with moving,
stationary, and unknown motion states. Figure 12 shows all the event annotation rules applied to the example
motion graph of Fig. 6.
As the annotation rules are applied to the motion graph, each identified event is recorded in an index table for
later lookup. This event index takes the form of an array of lists of V-objects (one list for each event type) and
indexes V-objects in the motion graph according to their event tags.
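
The event index itself is simple: one list of V-objects per event type, filled in as the annotation rules fire. A
minimal sketch, assuming the tag_events helper sketched above:

    # Minimal sketch of the event index: one list of V-objects per event type.
    EVENT_TYPES = ["appearance", "disappearance", "entrance", "exit",
                   "deposit", "removal", "motion", "rest",
                   "depositor", "remover"]

    def build_event_index(v_objects):
        index = {event: [] for event in EVENT_TYPES}
        for v in v_objects:
            for tag in tag_events(v):        # tag_events as sketched earlier
                index[tag].append(v)
        return index

    # A query such as "show all removal events" then reduces to index["removal"].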
The output of the motion analysis stage is an annotated directed graph describing the motion of foreground
objects and an event index indicating events of interest in the video stream. Thus, the motion analysis stage
generates from the object tracking output a symbolic abstraction of the actions and interactions of foreground
objects in the video. This approach enables content-based analysis of video sequences that wo
