`US008004563B2
`
(12) United States Patent
Talmon et al.

(10) Patent No.: US 8,004,563 B2
(45) Date of Patent: Aug. 23, 2011
`
`(54) METHOD AND SYSTEM FOR EFFECTIVELY
`PERFORMING EVENT DETECTION USING
`FEATURE STREAMS OF IMAGE
`SEQUENCES
`
`(75)
`
`Inventors: Gad Talmon, Givat Slnnuel (IL); Zvi
`Ashani, Ganei Tikva (IL)
`
`(73) Assignee: Agent Vi, Rosh-Haayin (IL)
`
( *) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 846 days.
`
(21) Appl. No.: 10/501,949

(22) PCT Filed: Jul. 3, 2003

(86) PCT No.: PCT/IL03/00555

§ 371 (c)(1), (2), (4) Date: Jul. 21, 2004

(87) PCT Pub. No.: WO 2004/006184

PCT Pub. Date: Jan. 15, 2004
`
`(65)
`
`Prior Publication Data
`
`US 2005/0036659 Al
`
`Feb. 17, 2005
`
`(51)
`
`Int. Cl.
`H04N 7118
`(2006.01)
`H04N 7112
`(2006.01)
`(52) U.S. Cl. ................................... 348/155; 375/240.08
`(58) Field of Classification Search .................. 348/154,
`348/155; 375/240.08
`See application file for complete search history.
`
`(56)
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`6,069,655 A * 512000 Seeley et al. .................. 348/154
`10/2000 Koller et al.
`6,130,707 A
`
`6,188,381 Bl
`6,266,369 Bl *
`6,349,114 Bl
`
`2/2001 van derWal et al.
`7/2001 Wang et al. ................... 375/240
`212002 Mory
`(Continued)
`
FOREIGN PATENT DOCUMENTS

DE    38 42 356 A1    6/1990
EP    1 107 609 A1    6/2001
EP    1 173 020 A2    1/2002
(Continued)
`
`OTHER PUBLICATIONS
`
XP-010199874: Meyer, M., et al., "A New System for Video-Based Detection of Moving Objects and its Integration into Digital Networks", IEEE, pp. 105-110, (1996).
`
Primary Examiner - Jayanti K Patel
Assistant Examiner - Richard Torrente
(74) Attorney, Agent, or Firm - Spilman Thomas & Battle, PLLC
`
(57) ABSTRACT
Method and system for performing event detection and object tracking in image streams by installing in field a set of image acquisition devices, where each device includes a local programmable processor for converting the acquired image stream, which consists of one or more images, to a digital format, and a local encoder for generating features from the image stream. These features are parameters that are related to attributes of objects in the image stream. The encoder also transmits a feature stream whenever the motion features exceed a corresponding threshold. Each image acquisition device is connected to a data network through a corresponding data communication channel. An image processing server that determines the threshold and processes the feature stream is also connected to the data network. Whenever the server receives features from a local encoder through its corresponding data communication channel and the data network, the server provides indications regarding events in the image streams by processing the feature stream and transmitting these indications to an operator.
`58 Claims, 5 Drawing Sheets
`
[Front-page representative figure: excerpt of FIG. 1 showing ENCODER 1 with threshold detection 207, WAN 101, threshold determination 203, and data-bus 103.]
`
`
`
U.S. PATENT DOCUMENTS

6,384,862 B1         5/2002  Brusewitz et al.
6,996,275 B2 *       2/2006  Edanami ....................... 382/218
2002/0041626 A1      4/2002  Yoshioka et al.
2002/0054210 A1      5/2002  Glier et al.
2002/0062482 A1 *    5/2002  Bolle et al. .................... 725/105
2005/0146605 A1 *    7/2005  Lipton et al. .................. 348/143

FOREIGN PATENT DOCUMENTS

WO    00/01140 A1     1/2000
WO    01/63937 A2     8/2001
WO    01/91353 A2    11/2001
WO    01/91353 A3    11/2001
WO    02/37429 A1     5/2002

* cited by examiner
`
`
`
[Drawing Sheet 1 of 5 (Aug. 23, 2011, US 8,004,563 B2): FIG. 1, the overall system diagram: encoders ENCODER 1, ENCODER 2, ..., ENCODER N with threshold detection 207 and feature stream 201, channels (CHANNEL N) over WAN 101, data-bus 103, MCIP SERVER 102 with threshold determination 203 and event detection 209, NVR 104, operator station 105, and indications 205.]
`
`
[Drawing Sheet 2 of 5: FIG. 2, an AOI polygon with rectangles (e.g., D#3) indicating estimated object sizes at various distances from the camera.]
`
`
`
[Drawing Sheet 3 of 5: FIG. 3A.]
`
`
`
[Drawing Sheet 4 of 5: FIG. 3B.]
`
`
`
[Drawing Sheet 5 of 5: FIG. 3C.]
`
`
`
`METHOD AND SYSTEM FOR EFFECTIVELY
`PERFORMING EVENT DETECTION USING
`FEATURE STREAMS OF IMAGE
`SEQUENCES
`
`FIELD OF THE INVENTION
`
The present invention relates to the field of video processing. More particularly, the invention relates to a method and system for obtaining meaningful knowledge, in real time, from a plurality of concurrent compressed image sequences, by effective processing of a large number of concurrent incoming image sequences and/or features derived from the acquired images.
`
`BACKGROUND OF THE INVENTION
`
Many efforts have been spent to improve the ability to extract meaningful data out of images captured by video and still cameras. Such abilities are being used in several applications, such as consumer, industrial, medical, and business applications. Many cameras are deployed in the streets, airports, schools, banks, offices and residences, as standard security measures. These cameras are used either for allowing an operator to remotely view security events in real time, or for recording and analyzing a security event at some later time.
The introduction of new technologies is shifting the video surveillance industry into new directions that significantly enhance the functionality of such systems. Several processing algorithms are used both for real-time and offline applications. These algorithms are implemented on a range of platforms, from pure software to pure hardware, depending on the application. However, these platforms are usually designed to simultaneously process a relatively small number of incoming image sequences, due to the substantial computational resources required for image processing. In addition, most of the common image processing systems are designed to process only uncompressed image data, such as the system disclosed in U.S. Pat. No. 6,188,381. Modern networked video environments require efficient processing capability of a large number of compressed video streams, collected from a plurality of image sources.
Increasing operational demands, as well as cost constraints, created the need for automation of event detection. Such event detection solutions provide a higher detection level, save manpower, replace other types of sensors and lower false alarm rates.
Although conventional solutions are available for automatic intruder detection, license plate identification, facial recognition, traffic violations detection and other image based applications, they usually support few simultaneous video sources, using expensive hardware platforms that require field installation, which implies high installation, maintenance and upgrade costs.
Conventional surveillance systems employ digital video networking technology and automatic event detection. Digital video networking is implemented by the development of Digital Video Compression technology and the availability of IP based networks. Compression standards, such as MPEG-4 and similar formats, allow transmitting high quality images with a relatively narrow bandwidth.
A major limiting factor when using digital video networking is bandwidth requirements. Because it is too expensive to transmit from all the cameras all the time, networks are designed to concurrently transmit data only from a few cameras. The transmission of data only from cameras that are capturing important events at any given moment is crucial for establishing an efficient and cost effective digital video network.
Automatic video-based event detection technology becomes effective for this purpose. This technology consists of a series of algorithms that are able to analyze the camera image in real time and provide notification of a special event, if it occurs. Currently available event-detection solutions use conventional image processing methods, which require heavy processing resources. Furthermore, they allocate a fixed processing power (usually one processor) per each camera input. Therefore, such systems either provide poor performance due to resources limitation or are extremely expensive.
As a result, the needs of large-scale digital surveillance installations (namely reliable detection, effective bandwidth usage, flexible event definition, large-scale design and cost) cannot be met by any of the current automatic event detection solutions.
Video Motion Detection (VMD) methods are disclosed, for example, in U.S. Pat. No. 6,349,114, in WO 02/37429, in U.S. Patent Application Publication No. 2002/0041626, in U.S. Patent Application Publication No. 2002/0054210, in WO 01/63937, in EP 1 107 609, in EP 1 173 020, in U.S. Pat. No. 6,384,862, in U.S. Pat. No. 6,188,381, in U.S. Pat. No. 6,130,707, and in U.S. Pat. No. 6,069,655. However, all the methods described above have not yet provided satisfactory solutions to the problem of effectively obtaining meaningful knowledge, in real time, from a plurality of concurrent image sequences.
It is an object of the present invention to provide a method and system for obtaining meaningful knowledge, from a plurality of concurrent image sequences, in real time.
It is another object of the present invention to provide a method and system for obtaining meaningful knowledge, from a plurality of concurrent image sequences, which are cost effective.
It is a further object of the present invention to provide a method and system for obtaining meaningful knowledge, from a plurality of concurrent image sequences, with reduced amount of bandwidth resources.
It is still another object of the present invention to provide a method and system for obtaining meaningful knowledge, from a plurality of concurrent image sequences, which is reliable, and having high sensitivity in noisy environments.
It is yet another object of the present invention to provide a method and system for obtaining meaningful knowledge, from a plurality of concurrent image sequences, with reduced installation and maintenance costs.
Other objects and advantages of the invention will become apparent as the description proceeds.
`
`SUMMARY OF THE INVENTION
`
While these specifications discuss primarily video cameras, a person skilled in the art will recognize that the invention extends to any appropriate image source, such as still cameras, computer generated images, prerecorded video data, and the like, and that image sources should be equivalently considered. Similarly, the terms video and video stream should be construed broadly to include video sequences, still pictures, computer generated graphics, or any other sequence of images provided or converted to an electronic format that may be processed by a computer.
The present invention is directed to a method for performing event detection and object tracking in image streams. A set of image acquisition devices is installed in field, such that each device comprises a local programmable processor for converting the acquired image stream, that consists of one or
`
`
`
more images, to a digital format, and a local encoder, for generating features from the image stream. The features are parameters that are related to attributes of objects in the image stream. Each device transmits a feature stream, whenever the number and type of features exceed a corresponding threshold. Each image acquisition device is connected to a data network through a corresponding data communication channel. An image processing server connected to the data network determines the threshold and processes the feature stream. Whenever the server receives features from a local encoder through its corresponding data communication channel and the data network, the server obtains indications regarding events in the image streams by processing the feature stream and transmitting the indications to an operator.
The local encoder may be a composite encoder, which is a local encoder that further comprises circuitry for compressing the image stream. The composite encoder may operate in a first mode, during which it generates and transmits the features to the server, and in a second mode, during which it transmits to the server, in addition to the features, at least a portion of the image stream in a desired compression level, according to commands sent from the server. Preferably, each composite encoder is controlled by a command sent from the server, to operate in its first mode. As long as the server receives features from a composite encoder, that composite encoder is controlled by a command sent from the server, to operate in its second mode. The server obtains indications regarding events in the image streams by processing the feature stream, and transmitting the indications and/or their corresponding image streams to an operator.
Whenever desired, one or more compressed image streams containing events are decoded by the operator station, and the decoded image streams are transmitted to the display of an operator, for viewing. Compressed image streams obtained while their local encoder operates in its second mode may be recorded.
Preferably, additional image processing resources, in the server, are dynamically allocated to data communication channels that receive image streams. Feature streams obtained while operating in the first mode may comprise only a portion of the image.
A graphical polygon that encompasses an object of interest, being within the frame of an image or an AOI (Area Of Interest) in the image, may be generated by the server and displayed to an operator for viewing. In addition, the server may generate and display a graphical trace indicating the history of movement of an object of interest, being within the frame of an image or an AOI in the image.
The image stream may be selected from the group of images that comprises video streams, still images, computer generated images, and pre-recorded digital or analog video data, or video streams compressed using MPEG format. The encoder may use different resolution and frame rate during operation in each mode.
Preferably, the features may include motion features, color, portions of the image, edge data and frequency related information.
The server may perform, using a feature stream received from the local encoder of at least one image acquisition device, one or more of the following operations and/or any combination thereof:
License Plate Recognition (LPR);
Facial Recognition (FR);
detection of traffic rules violations;
behavior recognition;
fire detection;
traffic flow detection;
smoke detection.
The present invention is also directed to a system for performing event detection and object tracking in image streams, that comprises:
a) a set of image acquisition devices, installed in field, each of which includes:
a.1) a local programmable processor for converting the acquired image stream to a digital format;
a.2) a local encoder, for generating, from the image stream, features, being parameters related to attributes of objects in the image stream, and for transmitting a feature stream, whenever the motion features exceed a corresponding threshold;
b) a data network, to which each image acquisition device is connected through a corresponding data communication channel; and
c) an image processing server connected to the data network, the server being capable of determining the threshold, of obtaining indications regarding events in the image streams by processing the feature stream, and of transmitting the indications to an operator.
The system may further comprise an operator display, for receiving and displaying one or more image streams that contain events, as well as a network video recorder for recording one or more image streams, obtained while their local encoder operates in its first mode.

BRIEF DESCRIPTION OF THE DRAWINGS

The above and other characteristics and advantages of the invention will be better understood through the following illustrative and non-limitative detailed description of preferred embodiments thereof, with reference to the appended drawings, wherein:
FIG. 1 schematically illustrates the structure of a surveillance system that comprises a plurality of cameras connected to a data network, according to a preferred embodiment of the invention;
FIG. 2 illustrates the use of AOI's (Area of Interest) for designating areas where event detection will be performed and for reducing the usage of system resources, according to a preferred embodiment of the invention; and
FIGS. 3A to 3C illustrate the generation of an object of interest and its motion trace, according to a preferred embodiment of the invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

A significant saving in system resources can be achieved by applying novel data reduction techniques, proposed by the present invention. In a situation where thousands of cameras are connected to a single server, only a small number of the cameras actually acquire important events that should be analyzed. A large-scale system can function properly only if it has the capability of identifying the inputs that may contain useful information and perform further processing only on such inputs. Such a filtering mechanism requires minimal processing and bandwidth resources, so that it is possible to apply it concurrently on a large number of image streams. The present invention proposes such a filtering mechanism, called Massively Concurrent Image Processing (MCIP) technology, that is based on the analysis of incoming image sequences and/or feature streams, derived from the acquired images, so as to fulfill the need for automatic image detection capabilities in a large-scale digital video network environment.
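The filtering idea above can be sketched in a few lines of code. This is an illustrative sketch only; the function and camera names are assumptions of this sketch, not elements disclosed by the invention:

```python
# Illustrative sketch of the MCIP-style filtering step: each camera
# reports a lightweight count of significant features per frame, and
# only the streams whose count exceeds a server-set threshold are
# selected for further (heavier) processing.

def select_streams_for_processing(feature_counts, threshold):
    """Return ids of the inputs that may contain useful information.

    feature_counts: dict mapping camera id -> number of significant
    features reported for the current frame.
    """
    return [cam for cam, count in feature_counts.items() if count > threshold]

# Example: six cameras, only two exceed the threshold and are processed.
counts = {"cam1": 0, "cam2": 3, "cam3": 41, "cam4": 1, "cam5": 77, "cam6": 2}
active = select_streams_for_processing(counts, threshold=10)
```

In a thousand-camera installation of the kind described, the heavy image processing would then run only on the handful of selected inputs.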
`
`
`
MCIP technology combines diverse technologies such as large scale data reduction, effective server design and optimized image processing algorithms, thereby offering a platform that is mainly directed to the security market and is not rivaled by conventional solutions, particularly with vast numbers of potential users. MCIP is a networked solution for event detection in distributed installations, which is designed for large scale digital video surveillance networks that concurrently support thousands of camera inputs, distributed in an arbitrarily large geographical area and with real time performance. MCIP employs a unique feature transmission method that consumes narrow bandwidth, while maintaining high sensitivity and probability of detection. MCIP is a server-based solution that is compatible with modern monitoring and digital video recording systems and carries out complex detection algorithms, reduces field maintenance and provides improved scalability, high availability, low cost per channel and backup utilities. The same system concurrently provides multiple applications, such as VMD, LPR and FR. In addition, different detection applications may be associated with the same camera.
MCIP is composed of a server platform with various applications, camera encoders (either internal or external to the camera), a Network Video Recorder (NVR) and an operator station. The server contains a computer that includes proprietary hardware and software components. MCIP is based on the distribution of image processing algorithms between low-level feature extraction, which is performed by the encoders, which are located in field (i.e., in the vicinity of a camera), and high-level processing applications, which are performed by a remote central server that collects and analyzes these features.
The MCIP system described hereafter solves not only the bandwidth problem but also reduces the load on the server, uses a unique type of data stream (not a digital video stream), and performs an effective process for detecting events in real time, in a large scale video surveillance environment.
A major element in MCIP is data reduction, which is achieved by the distribution of the image processing algorithms. Since all the video sources which require event detection transmit concurrently, the required network bandwidth is reduced by generating a reduced bandwidth feature stream 201 in the vicinity of each camera. In order to detect, track, classify and analyze the behavior of objects present in video sources, there is no need to transmit full video streams, but only partial data, which contains information describing basic attributes of each video scene.
By doing so, a significantly smaller data bandwidth is used, which reduces the demands for both the network bandwidth and the event detection processing power. Furthermore, if only the shape, size, direction of movement and velocity should be detected, there is no need to transmit data regarding their intensity or color, and thus, a further bandwidth reduction is achieved. Another bandwidth optimization may be achieved if the encoder in the transmitting side filters out all motions which are under a motion threshold, determined by the remote central server 203. Such a threshold may be the amount of motion in pixels between two consecutive frames, and may be determined and changed dynamically, according to the attributes of the acquired image, such as resolution, AOI, compression level, etc. Areas of movement which are under the threshold are considered either as noise, or non-interesting motions.
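As a hedged illustration of such a motion threshold, the sketch below counts changed pixels between two consecutive frames (represented here as flat lists of intensities) and suppresses transmission when the count falls below the server-determined threshold; the names and the fixed intensity delta are assumptions of the sketch, not of the invention:

```python
# Illustrative encoder-side motion threshold: count how many pixels
# changed between two consecutive frames and transmit the feature
# stream only when the count exceeds the server-set threshold.

def changed_pixels(prev_frame, curr_frame, intensity_delta=16):
    """Count pixels whose intensity changed by more than intensity_delta."""
    return sum(
        1
        for p, c in zip(prev_frame, curr_frame)
        if abs(p - c) > intensity_delta
    )

def should_transmit(prev_frame, curr_frame, motion_threshold):
    """Transmit only when inter-frame motion, measured in changed
    pixels, exceeds the threshold; smaller motion is treated as noise."""
    return changed_pixels(prev_frame, curr_frame) > motion_threshold

# Two 100-pixel frames (flattened): 30 pixels change strongly.
prev = [10] * 100
curr = [10] * 70 + [200] * 30
```

The threshold itself could be re-sent by the server whenever resolution, AOI or compression level changes, matching the dynamic adjustment described above.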
One method for extracting features at the encoder side is by slightly modifying and degrading existing temporal-based video compressors, which were originally designed to transmit digital video. The features may also be generated by a specific feature extraction algorithm (such as any motion vector generating algorithm) that is not related to the video compression algorithm. When working in this reduced bandwidth mode, the output streams of these encoders are definitely not a video stream, and therefore cannot be used by any receiving party to produce video images.
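A motion vector generating algorithm of the kind mentioned above can be illustrated by a minimal exhaustive block-matching search. This is an assumption-laden sketch of the general technique, not the encoder modification described in the text; frames are small 2-D lists of pixel intensities:

```python
# Minimal block-matching motion vector search: for a block in the
# current frame, find the displacement (dx, dy) whose block in the
# reference frame has the lowest sum of absolute differences (SAD).

def block_cost(ref, cur, rx, ry, cx, cy, size):
    """SAD between the current block at (cx, cy) and a candidate
    reference block at (rx, ry)."""
    return sum(
        abs(ref[ry + i][rx + j] - cur[cy + i][cx + j])
        for i in range(size)
        for j in range(size)
    )

def best_motion_vector(ref, cur, cx, cy, size=2, search=2):
    """Exhaustively search a small window for the best displacement."""
    h, w = len(ref), len(ref[0])
    best = (0, 0)
    best_cost = float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            if 0 <= rx <= w - size and 0 <= ry <= h - size:
                cost = block_cost(ref, cur, rx, ry, cx, cy, size)
                if cost < best_cost:
                    best_cost, best = cost, (dx, dy)
    return best

# A bright 2x2 block at column 1 of the reference moves one pixel to
# the right in the current frame, so the best vector for the current
# block points one pixel back to the left.
ref = [[0] * 5 for _ in range(5)]
cur = [[0] * 5 for _ in range(5)]
for i in range(2):
    for j in range(2):
        ref[1 + i][1 + j] = 255
        cur[1 + i][2 + j] = 255
```

A stream of such vectors, rather than pixels, is what makes the reduced bandwidth mode possible.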
FIG. 1 schematically illustrates the structure of a surveillance system that comprises a plurality of cameras connected to a data network, according to a preferred embodiment of the invention. The system 100 comprises n image sources (in this example, n cameras, CAM1, ..., CAMn), each of which is connected to a digital encoder ENCj, for converting the images acquired by CAMj to a compressed digital format. Each digital encoder ENCj is connected to a digital data network 101 at point Pj and is capable of transmitting data, which may be a reduced bandwidth feature stream 201 or a full compressed video stream, through its corresponding channel Cj. The data network 101 collects the data transmitted from all channels and forwards them to the MCIP server 102, through data-bus 103. MCIP server 102 processes the data received from each channel and controls one or more cameras which transmit any combination of the reduced bandwidth feature stream and the full compressed video stream, which can be analyzed by MCIP server 102 in real time, or recorded by NVR 104 and analyzed by MCIP server 102 later. An operator station 105 is also connected to MCIP server 102, for real time monitoring of selected full compressed video streams 205. Operator station 105 can manually control the operation of MCIP server 102, whenever desired.
The MCIP (Massively Concurrent Image Processing) server is connected to the image sources (depicted as cameras in the drawing, but they may also be any image source, such as taped video, still cameras, video cameras, computer generated images or graphics, and the like) through data-bus 103 and network 101, and receives features or images in a compressed format. In the broadest sense this is any type of network, wired or wireless. The images can be compressed using any type of compression. Practically, IP based networks are used, as well as compression schemes that use DCT, VideoLAN Client (VLC, which is a highly portable multimedia player for various audio and video formats as well as Digital Versatile Discs (DVDs), Video Compact Discs (VCDs), and various streaming protocols, disclosed in WO 01/63937) and motion estimation techniques such as MPEG.
The system 100 uses an optional load-balancing module that allows it to easily scale the number of inputs that can be processed and also creates the ability to remove a single point of failure, by creating backup MCIP servers. The system 100 also has a configuration component that is used for defining the type of processing that should be performed for each input and the destination of the processing results. The destination can be another computer, an email address, a monitoring application, or any other device that is able to receive textual and/or visual messages.
The system can optionally be connected to an external database to assist image processing. For example, a database of license plate numbers of suspect or stolen cars can be used for identifying vehicles.
FIG. 2 illustrates the use of AOI's (Area of Interest) for reducing the usage of system resources, according to a preferred embodiment of the invention. An AOI is a polygon (in this Fig., a hexagon) that encloses the area where detection will occur. The rectangles indicate the estimated object size at various distances from the camera. In this example, the scene of interest comprises detection of movement of a person in a
`
`
`
field (shown in the first rectangle). It may be used in the filtering unit to decide if further processing is required. In this case, the filtering unit examines the feature data. The feature stream is analyzed to determine if enough significant features lie within the AOI. If the number of features that are located inside the AOI and comprise changes exceeds the threshold 207, then this frame is designated as possibly containing an event and is transferred for further processing. Otherwise, the frame is dropped and no further processing is performed.
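The AOI test described above can be sketched as a point-in-polygon count. The ray-casting helper and all names here are illustrative assumptions, not the patent's implementation:

```python
# Count the changed features that fall inside the AOI polygon and pass
# the frame on for further processing only if the count exceeds the
# threshold; otherwise the frame is dropped.

def point_in_polygon(x, y, polygon):
    """Standard ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            if x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
                inside = not inside
    return inside

def frame_has_event(feature_points, aoi, threshold):
    """Designate the frame as possibly containing an event only when
    enough features lie inside the AOI."""
    inside = sum(1 for x, y in feature_points if point_in_polygon(x, y, aoi))
    return inside > threshold

# Square AOI from (0, 0) to (10, 10); three features inside, two outside.
aoi = [(0, 0), (10, 0), (10, 10), (0, 10)]
features = [(1, 1), (5, 5), (9, 9), (20, 20), (-3, 4)]
```

Because the test runs on the compact feature stream rather than on pixels, it costs almost nothing per frame, which is the point of the filtering unit.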
The MCIP server receives the reduced bandwidth feature stream (such a feature stream is not a video stream at all, and hence, no viewable image can be reconstructed thereof) from all the video sources which require event detection. When an event is detected 209 within a reduced bandwidth stream that is transmitted from a specific video source, the central server may instruct this video source to change its operation mode to a video stream mode, in which that video source may operate as a regular video encoder and transmit a standard video stream, which may be decoded by the server or by any receiving party for observation, recording, further processing or any other purpose. Optionally, the video encoder also continues transmitting the feature stream at the same time.
Working according to this scheme, most of the video sources remain in the reduced bandwidth mode, while transmitting a narrow bandwidth data stream, yet sufficient to detect events with high resolution and frame rate at the MCIP server. Only a very small portion of the sources (in which an event is detected) are controlled to work concurrently in the video stream mode. This results in a total network bandwidth which is significantly lower than the network bandwidth required for concurrently transmitting from all the video sources.
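A minimal sketch of this dual-mode control scheme, with assumed names and a simplified rule (a source is in video stream mode exactly while an event is detected in its feature stream):

```python
# Server-side mode control: every source starts in the reduced-bandwidth
# feature mode, and is switched to full video-stream mode only while an
# event is detected in its feature stream.

FEATURE_MODE = "feature"   # reduced bandwidth mode: feature stream only
VIDEO_MODE = "video"       # video stream mode: standard video (and features)

def update_modes(modes, detected_events):
    """Return new per-camera modes given the set of cameras with events."""
    return {
        cam: (VIDEO_MODE if cam in detected_events else FEATURE_MODE)
        for cam in modes
    }

# Five sources; an event is detected only in cam2's feature stream.
modes = {f"cam{i}": FEATURE_MODE for i in range(1, 6)}
modes = update_modes(modes, detected_events={"cam2"})
```

Under this rule most sources stay in the narrow-bandwidth mode at any moment, which is what keeps the total network load low.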
For example, in a conventional video surveillance installation that uses 1000 cameras, a bandwidth of about 500 Kbp/s is needed by each camera, in order to transmit at an adequate quality. In the reduced bandwidth mode, only about 5 Kbp/s is required by each camera for the transmission of information regarding moving objects at the same resolution and frame rate. Therefore, all the cameras working in this mode are using a total bandwidth of 5 Kbp/s times 1000 = 5 Mbp/s. Assuming that at steady state suspected objects appear in 1% of the cameras (10 cameras) and they are working in video stream mode, an extra bandwidth of 10 times 500 Kbp/s = 5 Mbp/s is required. Thus, the total required network bandwidth using the solution proposed by the present invention is 10 Mbp/s. A total required network bandwidth of 500 Mbp/s would be consumed by conventional systems, if all the 1000 cameras would concurrently transmit video streams.
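The arithmetic of this example can be reproduced directly, using the figures given in the text:

```python
# Bandwidth example from the text: 1000 cameras at 5 Kbp/s in the
# reduced bandwidth mode, plus 1% of them (10 cameras) concurrently
# transmitting 500 Kbp/s video streams.

CAMERAS = 1000
FEATURE_KBPS = 5          # reduced bandwidth feature stream per camera
VIDEO_KBPS = 500          # full video stream per camera
EVENT_FRACTION = 0.01     # share of cameras with suspected objects

feature_total = CAMERAS * FEATURE_KBPS                     # 5,000 Kbp/s
video_total = int(CAMERAS * EVENT_FRACTION) * VIDEO_KBPS   # 5,000 Kbp/s
mcip_total_mbps = (feature_total + video_total) / 1000     # 10 Mbp/s

# Conventional system: all 1000 cameras stream video concurrently.
conventional_mbps = CAMERAS * VIDEO_KBPS / 1000            # 500 Mbp/s
```

The fifty-fold reduction (10 Mbp/s versus 500 Mbp/s) follows directly from keeping 99% of the sources in the feature mode.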
The proposed solution may be applicable not only for high-level moving objects detection and tracking in live cameras but also in recorded video. Huge amounts of video footage are recorded by many surveillance systems. In order to detect interesting events in this recorded video, massive processing capabilities are needed. By converting recorded video, either digital or analog, to a reduced bandwidth stream according to the techniques described above, event detection becomes much easier, with lower processing requirements and faster operation.
The system proposed in the present invention comprises the following components:
1. One or more MCIP servers.
2. One or more dual mode video encoders, which may be operated at reduced bandwidth mode or at video stream mode, according to remote instructions.
3. A digital network, LAN or WAN, IP or other, which establishes communication between the system components.
4. One or more operator stations, by which operators may define event criteria and other system parameters and manage events in real time.
5. An optional Network Video Recorder (NVR), which is able to record and play, on demand, any selected video source which is available on the network.
Implementation for Security Applications:
Following is a partial list of types of image processing applications which can be implemented very effectively using the method proposed by the present invention:
Video Motion Detection: for both indoor and outdoor applications. Such an application is commonly used to detect intruders in protected zones. It is desired to ignore nuisances such as moving trees, dust and animals. In this embodiment, the present invention manipulates input images at the stream level in order to filter out certain images and image changes. Examples of such filtering are motion below a predetermined threshold, and size or speed related filtering, all preferably applied within the AOIs, thus significantly reducing the amount of system resources required for further processing. Since the system is server-based and there is no need for installation of equipment in the field (except the camera), this solution is very attractive for low budget applications, such as in the residential market.
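Such stream-level nuisance filtering can be sketched as follows; the thresholds and field names are assumptions chosen for illustration, not values from the patent:

```python
# Illustrative size/speed nuisance filter for video motion detection:
# keep only detections whose size and speed fall inside configured
# bounds, so small animals and fast-swaying foliage are dropped before
# any heavier processing runs.

def filter_detections(objects, min_size, max_speed):
    """objects: list of dicts with 'size' (pixels) and 'speed'
    (pixels/frame). Returns only detections worth further processing."""
    return [
        obj
        for obj in objects
        if obj["size"] >= min_size and obj["speed"] <= max_speed
    ]

detections = [
    {"id": "bird", "size": 12, "speed": 30},    # too small: dropped
    {"id": "person", "size": 400, "speed": 4},  # kept
    {"id": "tree", "size": 900, "speed": 90},   # sways too fast: dropped
]
kept = filter_detections(detections, min_size=100, max_speed=20)
```

Applied within an AOI, a filter of this kind discards most nuisance motion before it consumes server resources.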
Exceptional static objects detection: this application is used to detect static objects that may require an alarm. By way of example, such objects may comprise an unattended bag at the airport, a stopped car on a highway, a person stopped at a protected location and the like. In this embodiment, the present invention manipulates the input images at the stream level and examines the motion vectors at