`
`Y. Berviller, E. Tisserand, C. Bataille, H. Guermoud, S. Weber
`
Laboratoire d'Instrumentation Electronique de Nancy

B.P. 239, 54506 Vandoeuvre Cedex, France
`
Abstract
`
We describe a system for detecting moving elements from a moving camera using a simplified background constraint technique. The method is implemented in hardware in the form of an intelligent camera. In order to keep the design small and able to operate at video frame rate, the algorithm is for a large part implemented in F.P.G.A.'s. The camera moves in a locally planar environment and is tilted enough to exclude the horizon line from the image. In this paper we assume that the camera moves only in translation and that the objects to detect move in the same direction as the camera (and obviously at higher speed). If the instantaneous speed of the camera is known and is constant between two successively processed images, it is easy to predict (for a large part) the next image to process by using the reverse perspective transform. In order to turn such a prediction into a simple translation of the image, and in order to obtain a uniform spatial resolution in the observed scene, we re-sample the lines and apply to each re-sampled line a specific horizontal scale factor. From each transformed image we « compute » a predicted image. The predicted and true images are subtracted, and objects moving in the scene correspond to a noticeable difference area. The direction and speed of the displacement are estimated with one-dimensional correlation functions between vertical windows of the subtracted images. A study of the limitations and experimental results obtained by simulation with real images are presented.
`
Keywords: motion detection, collision avoidance, real-time processing, FPGA, intelligent camera
`
`1. Introduction
`
The detection of moving elements is an important task in autonomous vehicle systems and, in recent years, in the driving-aid features that future vehicles will possess. Among the various concepts, the detection of proximity (all around the vehicle but forward) aims to avoid rear-lateral collisions. Microwave radar-based2,3,4 or ultrasound-based5,6 systems have already been proposed. Currently, thanks to the continuous cost reduction of C.C.D. cameras as well as of their associated processing electronics, this domain is increasingly explored for on-board applications. The main interest of this technique is its ability to recognize, even roughly, the shape of the detected object. This feature is not yet available, for a reasonable cost, with the other mentioned techniques1. If we look at motion detection in the image processing area, three types of methods are mainly used: those based on optical flow, those based on depth from focus or depth from defocus, and those based on a background constraint7. The first are computationally expensive8, even in their simplified form that uses only one-dimensional correlation9. One variant is the active vision12 principle, which does not require the optical flow computation but needs a dynamically pan- and tilt-adjustable camera because of its underlying target-tracking algorithm. The second are inadequate for dynamic distance measurements. They take at least ten images10 when depth from focus is used, while depth from defocus, which needs only two to three images, is less accurate and more sensitive to noise because it relies on second-order differentiating operators11. Furthermore, the latter needs defocused images and a camera with a small depth of field, which makes the image difficult to use for a human observer.
`
Thus we choose the third type of technique. The basic idea is that vehicles generally move in a planar environment13. With this assumption it becomes relatively easy, if the speed and direction of the camera platform are known, to estimate the next image in the sequence from the current one. Objects that move relative to the scene do not respect this prediction constraint and are consequently detectable. In the following sections we describe the optical configuration that we use and some of its limitations, then we expose the global principle we retained. After this, we show the hardware implementation in an intelligent camera system, which is mainly constituted of F.P.G.A. circuits. This solution contributes to building an economical and small-sized device that can be used, for example, in vehicle active mirror applications. We finish with the presentation of some preliminary experimental results.
`
`174
`
`SPIE Vol. 2950 • 0-8194-2354-8/96/$6.00
`
`Mercedes-Benz USA, LLC, Petitioner- Ex. 1011
`Mercedes-Benz USA, LLC v. American Vehicular Sciences LLC
`IPR2014-00644
`
`1
`
`
`
2. Preliminary

2.1. Optical and geometric configuration
`
Since the objective is to detect the presence of an object in an area near the camera, we found it judicious to tilt the camera in order to restrict the observed field, as shown in figure 1. Thus the supervised area takes up almost the entire image and the spatial resolution is enhanced. This can also reduce the problem of over-illumination. We assume that the moving elements to detect approach by following the trajectory of the camera. This trajectory is considered to be collinear with the optical axis.
`
`· . . . )
`
`0
`
`·ct . 0
`
`0
`
`h
`
`~floor ~
`
`0~
`
`Z.min
`
`...
`
`Zsmax
`
`crj::
`
`2.2. Perspective equations
`
`Figure 1: Optical configuration
`
We use the following indices to refer to the coordinates in the various coordinate systems:

i: coordinates in the image coordinate system; c: coordinates in the camera coordinate system (O_c, X_c, Y_c, Z_c); s: coordinates in the scene coordinate system (O_s, X_s, Y_s, Z_s).

Let k_x and k_y be the scale factors along the axes X_c and Y_c respectively (in pixels per meter), and f_x = f·k_x, f_y = f·k_y.

Thus, by using the pinhole camera model, we have:

$$x_i = f_x \cdot \frac{x_c}{z_c}, \qquad y_i = f_y \cdot \frac{y_c}{z_c} \qquad (1)$$
`
`175
`
`2
`
`
`
We tilt the camera in such a manner that the extreme values of y_i correspond to finite values of Z_s. This is given by the following equations:

$$Z_{s\,min} = h \cdot \tan\!\left(\frac{\pi}{2} - \phi - \alpha\right), \qquad Z_{s\,max} = h \cdot \tan\!\left(\frac{\pi}{2} - \phi + \alpha\right) \qquad (2)$$

where φ is the tilt angle of the optical axis and α the vertical half field-of-view angle.
`
The relation between the coordinates in the scene coordinate system and in the camera coordinate system is given, using homogeneous coordinates, by:

$$\begin{pmatrix} x_c \\ y_c \\ z_c \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\phi & \sin\phi & -h\cos\phi \\ 0 & -\sin\phi & \cos\phi & h\sin\phi \\ 0 & 0 & 0 & 1 \end{pmatrix} \cdot \begin{pmatrix} x_s \\ y_s \\ z_s \\ 1 \end{pmatrix} \qquad (3)$$
`
thus, by substituting in (1), we obtain:

$$y_i = f_y \cdot \frac{(y_s - h)\cos\phi + z_s \sin\phi}{(-y_s + h)\sin\phi + z_s \cos\phi} \qquad (4)$$
`
and, in the particular case of points belonging to the floor:

$$\frac{y_i}{f_y} = \frac{\dfrac{z_s}{h}\tan\phi - 1}{\dfrac{z_s}{h} + \tan\phi} \qquad (5)$$

$$\frac{x_i}{f_x} = \frac{x_s}{h\sin\phi + z_s\cos\phi} \qquad (5')$$
`
`2.3. Effect of the « thickness » of the objects
`
Let L_k be the k-th line of the image (the first being at the top).

Equation (5) shows that if an object approaches along Z_s (Z_s decreasing), y_i decreases and consequently L_k increases. The motion detection in the scene becomes a motion detection along L_k. Furthermore, if the parameters α, φ, h and N are known, it is possible to find Z_s by inverting relation (5). Normally, this relation is only true for points belonging to the floor plane. But if the front face (assumed perpendicular to the floor plane) of the element to detect possesses points belonging to the floor, equation (4) shows that these points have the smallest value of y_i and thus the largest value of L_k. Thus these points will be detected first and satisfy the planar hypothesis.
`
`176
`
`3
`
`
`
`2.4. Camera resolution
`
Let N_min be the minimum number of lines needed in order to keep the resolution in the scene lower than or equal to Δz along Z_s. Then N_min is given by:

$$N_{min} = \frac{2}{1 - \dfrac{y_i(z_{s\,max} - \Delta z)}{y_i(z_{s\,max})}} + 1 = 1 + \frac{2\,(m - n + p)\,(m\,p - 1)}{n\,(1 + p^2)} \qquad (6)$$

with:

$$m = \frac{Z_{s\,max}}{h}, \qquad n = \frac{\Delta z}{h}, \qquad p = \tan\phi = \frac{1 + m\tan\alpha}{m - \tan\alpha}$$
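A minimal numerical sketch of equation (6) follows (Python); the example values of h, Z_smax, Δz and α are purely illustrative and are not taken from the experiments:

```python
import numpy as np

def n_min_lines(h, z_s_max, delta_z, alpha):
    """Minimum number of image lines keeping the resolution along Z_s below
    delta_z at the far end of the supervised zone (equation (6))."""
    m = z_s_max / h
    n = delta_z / h
    p = (1.0 + m * np.tan(alpha)) / (m - np.tan(alpha))   # p = tan(phi)
    return 1.0 + 2.0 * (m - n + p) * (m * p - 1.0) / (n * (1.0 + p * p))

# Illustrative example only: camera 1 m high, supervised zone up to 5 m,
# 10 cm resolution, vertical half field of view of 15 degrees.
print(int(np.ceil(n_min_lines(h=1.0, z_s_max=5.0, delta_z=0.10, alpha=np.radians(15.0)))))
```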
`
`3. Principle
`
`3.1. General overview
`
Figure 2 shows the global processing flow of the algorithm that we have developed.
`
Figure 2: Data flow of the proposed algorithm. Successive images are linearized; the image at time t+1 is predicted from the image at time t (and the image at time t+2 from the image at time t+1) and subtracted from the true image; vertical bands are extracted from the differences, cross-correlated and analysed.

3.2. Linearization
`
We wish to resample the image in order to obtain a homogeneous spatial resolution in the scene. Thus we can give the vertical linearization algorithm:

for z_s increasing from Z_smin to Z_smax by steps of Δz:
`
`177
`
`4
`
`
`
calculate $y_i(z_s) = f_y \cdot \dfrac{\tan\phi - \dfrac{h}{z_s}}{1 + \dfrac{h}{z_s}\tan\phi}$

calculate $k = E\!\left(\left(\dfrac{y_i(z_s)}{y_i(z_{s\,max})} + 1\right)\cdot\dfrac{N}{2}\right)$, where N is the number of lines in the image and E(x) is the integer part of x;

copy the line L_k into the resulting image.
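A possible software rendering of this vertical resampling is sketched below (Python); the orientation of the line index (top or bottom of the image) and the clipping of k are our assumptions:

```python
import numpy as np

def linearize_vertical(image, h, phi, z_min, z_max, delta_z, f_y):
    """Copy, for each z_s step on the floor, the corresponding image line,
    so that consecutive output lines are delta_z apart in the scene."""
    n_lines = image.shape[0]
    y_max = f_y * (np.tan(phi) - h / z_max) / (1.0 + (h / z_max) * np.tan(phi))
    out = []
    for z_s in np.arange(z_min, z_max + delta_z / 2.0, delta_z):
        y_i = f_y * (np.tan(phi) - h / z_s) / (1.0 + (h / z_s) * np.tan(phi))
        k = int((y_i / y_max + 1.0) * n_lines / 2.0)   # E(x): integer part
        k = min(max(k, 0), n_lines - 1)                # assumed clipping; orientation may need flipping
        out.append(image[k])
    return np.array(out)
```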
`
On the other hand, as shown by equation (5'), the lines are spatially more compressed as they correspond to farther areas in the scene. Thus, to linearize along x without reducing the supervised zone, it is necessary to compress the lines corresponding to near distances. We keep the same scale as that given by the points at Z_smax (first line of the image).
`
Thus we can give the expression of a linearized line:

$$\left(\frac{x_i}{f_x}\right)_{lin} = \frac{x_i}{f_x} \cdot \frac{h\sin\phi + z_s\cos\phi}{h\sin\phi + Z_{s\,max}\cos\phi} \qquad (7)$$
`
And we have the following algorithm:

For each vertically linearized line:

calculate $a = \dfrac{h\sin\phi + Z_{s\,max}\cos\phi}{h\sin\phi + z_s\cos\phi}$

calculate $x_{0\,lin} = \dfrac{X_{max}}{2} \cdot \dfrac{a - 1}{a}$, where X_max is the number of points per line

set $x_{i\,lin} = x_{0\,lin}$

for n increasing from 0 to $E\!\left(\dfrac{X_{max}}{a}\right)$:

if $E[n \cdot a + E(a)] < X_{max}$

then $I(x_{i\,lin}) = \dfrac{1}{a}\left[(a - E(a)) \cdot I(E(a \cdot n) + E(a)) + \sum_{k=0}^{E(a)-1} I(E(a \cdot n) + k)\right]$

where I(x) is the gray level of the pixel of abscissa x in the current line and E(a) is the integer part of a.
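In software, the same line compression can be sketched as a box average with a fractional ratio a (Python); advancing the output abscissa by one pixel per iteration is our assumption, since the original pseudo-code only gives its initial value x_0lin:

```python
import numpy as np

def compress_line(line, a):
    """Compress one image line by the factor a >= 1 (equation (7)), averaging
    a (possibly fractional) input pixels for each output pixel."""
    line = np.asarray(line, dtype=float)
    x_max = len(line)
    e_a = int(a)                                    # E(a): integer part of a
    x_lin = int((x_max / 2.0) * (a - 1.0) / a)      # x_0lin: centering offset of the compressed line
    out = np.zeros(x_max)
    for n in range(int(x_max / a)):
        start = int(a * n)                          # E(a*n)
        if start + e_a < x_max:                     # E[n*a + E(a)] < X_max
            whole = line[start:start + e_a].sum()   # the E(a) whole pixels
            frac = (a - e_a) * line[start + e_a]    # fractional weight of the next pixel
            out[x_lin] = (whole + frac) / a
            x_lin += 1                              # assumption: one output pixel per iteration
    return out
```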
`
`178
`
`5
`
`
`
An image processed by this algorithm is equivalent to an image obtained with the optical axis of the camera perpendicular to the floor (provided all the objects are planar).
`
`3.3. Prediction
`
In the linearized image, a translation of the camera implies a translation of the image. Of course, there is a loss, which corresponds to the new part of the image, and it grows as the speed of the camera platform increases.

Let V be the speed of the camera and assume it is constant during the prediction time Δt. The corresponding movement V·Δt implies a translation of E(V·Δt/Δz) lines and the loss of the same number of lines. The most critical hypothesis is the planarity of the whole scene.
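Under the full-planarity hypothesis the prediction is just a shift of whole lines; a minimal sketch of the prediction-subtraction step follows (Python, with an assumed orientation of the linearized image, line index growing with z_s):

```python
import numpy as np

def predict_and_subtract(lin_prev, lin_curr, v, dt, delta_z):
    """Predict the current linearized image from the previous one by a shift of
    E(v*dt/delta_z) lines, then subtract it from the true current image."""
    shift = int(v * dt / delta_z)              # camera displacement expressed in linearized lines
    pred = lin_prev.astype(int)
    if shift > 0:
        pred[:-shift] = lin_prev[shift:]       # assumed orientation: line index grows with z_s
        pred[-shift:] = lin_curr[-shift:]      # the 'new' lines cannot be predicted: ignore them
    diff = lin_curr.astype(int) - pred
    return diff                                # large |diff| areas correspond to moving objects
```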
`
`3.4. Prediction gap
`
A simple subtraction between the linearized image and its predicted part makes it possible to detect a moving object. But some conditions must be met:
* The illumination of the scene must not vary significantly during the prediction time, in order to eliminate the immobile part of the scene. Even if there is no illumination variation, there is a variation of intensities along the Z axis (the effect used in shape from shading). This can be neglected because, with the optical configuration that we impose, the dynamic range along Z is low.
* The gray level of the detected object must be significantly different from that of the floor.
* The displacement of the object to detect must be two to three times Δz during Δt, in order to discriminate it from the residual prediction error.

Note that small repetitive patterns may cause false alarms through a stroboscopic effect.
`
`3.5. Cross-correlation function
`
We use a function that associates an averaged intensity profile with a vertical window. This approach has already been used with non-moving cameras and without distance linearization1. For the moment, we use three windows horizontally centered in the image.

The detection of an object moving at the same speed as the camera is obtained by analysing both the mean value of the profile function and the cross-correlation function. If we used only the cross-correlation function we could make false detections, because spurious peaks can appear when no moving object is present.

We use a cross-correlation function based on Pearson's correlation coefficient. Thus the effects of intensity variations and of a non-locally-stationary « signal » are reduced. Furthermore, to prevent errors due to border effects, we assume that the profiles do not vary outside the observed interval.
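A sketch of the Pearson-normalized cross-correlation between two averaged vertical profiles, repeating the border values outside the observed interval as assumed above (Python; the function and variable names are illustrative):

```python
import numpy as np

def pearson_xcorr(p1, p2, max_shift):
    """Pearson correlation coefficient between profile p1 and profile p2
    shifted by -max_shift .. +max_shift lines; border values are repeated
    outside the observed interval."""
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    n = len(p1)
    coeffs = []
    for s in range(-max_shift, max_shift + 1):
        idx = np.clip(np.arange(n) + s, 0, n - 1)   # constant extrapolation at the borders
        a = p1 - p1.mean()
        b = p2[idx] - p2[idx].mean()
        denom = np.sqrt((a * a).sum() * (b * b).sum())
        coeffs.append((a * b).sum() / denom if denom > 0 else 0.0)
    return np.array(coeffs)                          # index max_shift corresponds to zero shift
```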
`
4. Hardware implementation
`
`4.1. Algorithm adaptation
`
`In order to match the requirements of such a system (small size, cost, power consumption), the following adaptations must be
`made:
`
`179
`
`6
`
`
`
* The arithmetic computations, with the exception of the linearization parameters, are made with integers; this reduces the silicon area without decreasing the processing speed.
* The averaged profiles are taken from windows whose width is an integer power of two.
* Only the information in the averaged profiles is stored.
* The system can operate with cameras whose shutter operates at frame rate, without the need for a de-interlacing device.

By making simulations with real images, we have noted the good immunity of the cross-correlation function with respect to quantization noise. Thus we use a video signal quantized on 6 bits.
`
`4.2. General architecture
`
Figure 3 shows the functional decomposition of the proposed method, which is the following:
* Computation of the linearization and prediction parameters.
* Linearization and extraction of vertically averaged profiles.
* Prediction-subtraction and cross-correlation of the differences of the extracted profiles.

The data exchanges between these three phases are made through the memory.

The device output consists of the periodic readout of the cross-correlation functions.
`
Figure 3: General overview of the architecture. The digital video signal and the synchronization feed the linearization and averaged vertical band extraction block; the speed, tilt angle and height feed the linearization and prediction parameter computation; a memory connects these blocks to the prediction, subtraction and cross-correlation block.

4.3. Processing block implementation
`
The computation of the linearization and prediction parameters, which does not require a large amount of computing power, is made by a microcontroller. The remaining computing load is distributed as follows: the linearization, the profile extraction, and the prediction-subtraction and cross-correlation operations are implemented in F.P.G.A.'s, which are well suited to such fast processing. The organization of this implementation is shown in figure 4.
`
`180
`
`7
`
`
`
Figure 4: Organization of the two processing blocks implemented in F.P.G.A. The linearization and averaged vertical band extraction block contains an accumulator fed by the video signal, line and column counters driven by the synchronization signals, the selection of the lines and pixels to preserve, and memory access management. The prediction, subtraction and cross-correlation block contains memory access management, the predictive subtraction, the cross-correlation and a synchronous read-out.
`
We have not yet fully optimized all the critical paths, but the sizes of the two F.P.G.A. functions described above are respectively 1930 and 2560 equivalent gates.

The RAM size needed is about 4 kwords of 12 bits.

There are F.P.G.A.'s that allow on-chip RAM of up to 20 kbits, but then there is not much room left for logic; furthermore, we use the RAM for inter-process communication. We therefore use an external static RAM with an access time of 20 ns, which is not critical.

The complete system is built with a classical C.C.D. camera, a digitizing and synchronization interface, two small-density F.P.G.A.'s, one microcontroller and one static RAM.
`
5. Preliminary results

5.1. Experimental conditions
`
We give the results obtained by software simulation of the architecture, that is, with the same adaptations of the algorithm. As these simulations are made off-line, we took the image sequences with an 8 mm camcorder and processed the images one by one on a personal computer. Figure 5 shows the effect of the distance linearization; one can see the errors caused by non-planar objects.
`
`181
`
`8
`
`
`
`5.2. Real scene
`
We made this sequence with two cars. The car that precedes the other carries the camera in the middle of its rear end. Figure 6 shows cross-correlation functions obtained with and without the presence of the car in the field of view.

Figure 5: Effect of the linearization on a real scene. The black object corresponds to the shadow and the front spoiler of the car.
`
Figure 6: Examples of cross-correlation functions obtained in a real traffic scene (cross-correlation value versus number of lines shifted in the cross-correlation function). The value of Δz was 50 cm and the speed of the first vehicle was approximately 30 km/h. The real speed of the approaching vehicle was unknown. The curve with the box points was taken at a moment when no car appeared in the field of view, in contrast to the other curve.
`
`182
`
`9
`
`
`
`5.3. Calibrated scene
`
Although the primary goal of the system is not the measurement of relative speed, we made several tests to evaluate its possibilities in this domain. We simulated such a scene by taking the images one by one and moving both the camera and the object to detect by a measured distance at each image.

In the example whose results are shown on the left side of figure 7, we moved the camera by 5 cm and the object by 10 cm at each image. The curve with the box points is the average of ten consecutive cross-correlation functions, and its maximum value is located between 2 and 3 shifted lines. This corresponds to 5 cm, since we linearized the images with a Δz of 2 cm, and agrees with the real relative motion. The graph on the right side of figure 7 shows the results when we use a profile that contains an identical object which does not move in the scene. In this case the maximum value lies on a negative line shift, because the object moves away from the camera by 5 cm per image. For the graph on the right side as well, the curve with the box points is the average with the same characteristics, but the result is a little less good, since the maximum value does not lie exactly midway between -2 and -3.
`
Figure 7: Cross-correlation functions (cross-correlation value versus number of lines shifted in the cross-correlation function) obtained with a scene where the camera moved 5 cm and one object 10 cm at each image. The linearization was made with a Δz of 2 cm. In both graphs the curve with the box points is the mean of ten consecutive functions. The graph on the left side corresponds to profiles containing the moving object; the graph on the right side shows the case of a profile containing an immobile object.
`
`183
`
`10
`
`
`
6. Conclusion and future work

We propose a method for detecting approaching elements from a moving platform with a C.C.D. camera. This method is relatively simple and thus allows integration in the form of an intelligent camera. The entire device can be integrated in a standard industrial camera housing.

The preliminary results make us think that an intelligent camera system can advantageously replace competing technologies (ultrasound, microwave, etc.). Indeed, even if the accuracy in terms of distance measurements is not yet comparable with that of these methods, the false detection risk is quite low and can still be reduced by further processing. In addition, with the results that we expect to obtain soon, we will verify the robustness and most likely enhance the accuracy by averaging the cross-correlation functions over time. We also expect to integrate the entire processing stage in one F.P.G.A. of medium density, by using the dynamic reconfiguration capability of these circuits.
`
`7. References
`
1. G. Najm, « Comparison of alternative crash avoidance sensor technologies », Proceedings of the SPIE vol. 2344, Intelligent Vehicle Highway Systems, 1994.

2. Brooke, « Radar Beams », Automotive Industries, December 1992.

3. Scott, « Radar Mirror Has OE-Potential », Automotive Engineer, August-September 1993.

4. E. Jackson, J. A. Himelick and C. D. Wright, « Commercial Vehicle Applications of Object Detection Systems », Int. Congress on Trans. Elect., Dearborn, October 1992.

5. Polaroid Corp., « Ultrasonic Ranging System. For Accurate Distance Measurement and Object Detection », Cambridge, MA.

6. T. Clemence and G. W. Hurlbut, « The Application of Acoustic Ranging to the Automatic Control of a Ground Vehicle », IEEE Trans. on Vehicle Tech., vol. VT-32, n°3, August 1983.

7. Facon, « EDOSI : un système d'étude du déplacement d'objets à partir de séquences d'images », Ph.D. thesis, Université de Technologie de Compiègne, December 1987.

8. Ens and Z. N. Li, « Real Time Motion Stereo », Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, New York, 15-18 June 1993.

9. Acona and T. Poggio, « Optical Flow from 1D Correlation: Application to a Simple Time-to-Crash Detector », Proceedings of the IEEE Fourth Int. Conf. on Computer Vision, Berlin, 11-14 May 1993.

10. Xiong and S. A. Shafer, « Depth from Focusing and Defocusing », Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, New York, 15-18 June 1993.

11. Surya and M. Subbarao, « Depth from Defocus by Changing Camera Aperture: A Spatial Domain Approach », Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, New York, 15-18 June 1993.

12. Aloimonos, I. Weiss and A. Bandyopadhyay, « Active Vision », Int. J. of Computer Vision, vol. 1, 1988.

13. Elnagar and A. Basu, « Motion detection using background constraint », Pattern Recognition, vol. 28, n°10, 1995.

14. L. Duckworth, M. L. Frey, C. E. Remer, S. Ritter and G. Vidaver, « Comparative study of non-intrusive traffic monitoring sensors », Proceedings of the SPIE vol. 2344, Intelligent Vehicle Highway Systems, 1994.
`
`184
`
`11