`Multiview Video Coding Using View Interpolation
`and Color Correction
`Kenji Yamamoto, Member, IEEE, Masaki Kitahara, Hideaki Kimata, Tomohiro Yendo,
`Toshiaki Fujii, Member, IEEE, Masayuki Tanimoto, Senior Member, IEEE, Shinya Shimizu, Kazuto Kamikura, and
`Yoshiyuki Yashima, Member, IEEE
(Invited Paper)
`Abstract—Neighboring views must be highly correlated in mul-
`tiview video systems. We should therefore use various neighboring
`views to efficiently compress videos. There are many approaches
`to doing this. However, most of these treat pictures of other views
`in the same way as they treat pictures of the current view, i.e.,
`pictures of other views are used as reference pictures (inter-view
`prediction). We introduce two approaches to improving compres-
`sion efficiency in this paper. The first is by synthesizing pictures
`at a given time and a given position by using view interpolation
`and using them as reference pictures (view-interpolation predic-
`tion). In other words, we tried to compensate for geometry to ob-
`tain precise predictions. The second approach is to correct the lu-
`minance and chrominance of other views by using lookup tables to
`compensate for photoelectric variations in individual cameras. We
`implemented these ideas in H.264/AVC with inter-view prediction
`and confirmed that they worked well. The experimental results re-
`vealed that these ideas can reduce the number of generated bits by
`approximately 15% without loss of PSNR.
`Index Terms—Color correction, H.264/AVC, inter-view predic-
`tion, multiview video coding, view interpolation prediction.
AS THE capabilities of personal computers and camera
manufacturing continue to increase dramatically,
multiple-camera systems that have around one hundred
cameras have emerged [1], [2], which makes it possible to
generate free-viewpoint video and to perform 3-D recognition.
Our group's multiple-camera system, which is one of them, and
the sequences captured with it have been described previously
[2]. These systems have
`Manuscript received December 13, 2006. This work was presented in part
`at the Picture Coding Symposium, Beijing, China, April 2006. This paper was
`recommended by Guest Editor Prof. Y. He.
`Color versions of one or more of the figures in this paper are available online
`Digital Object Identifier 10.1109/TCSVT.2007.903802
`Fig. 1. Multiview video coding: purpose of multiview video coding is not to
`encode panoramic pictures made from multiple cameras but to efficiently en-
`code pictures taken by all cameras.
many fields in which they can be applied, such as the
preservation of cultural heritage and traditional dance,
free-viewpoint television, educational applications, face
certification/authentication, security systems, and
entertainment. To become commonly used instead of single-video
systems, they need technological advances in various fields
such as video coding. This paper
`focuses on compression for multiview videos. Fig. 1 shows an
`overview of multiview video coding, whose purpose is not to
`encode panoramic pictures made by multiple cameras but to
`efficiently encode all the pictures taken by the cameras.
`There have recently been many groups studying multiview
`video coding. One of these is the 3DAV ad-hoc group at the
`MPEG/JVT meeting, which has been discussing this topic
`since December 2001 to achieve an international standard
`[3]–[5]. This group has discussed compression efficiency,
`viewpoint interpolation, low delay, the consistency of quality
`between views, and temporal random access as requirements
`for multiview video coding, and has attempted to establish a
standard that will fulfill these requirements. Our approach is
aimed at improving compression efficiency while taking the
above requirements into account. The basis of our approach is
the latest coding standard, H.264/AVC, together with inter-view
prediction. We added the following two approaches to it.
The first uses view-interpolation technology, which synthesizes
view-interpolated pictures at a given time and a given position
and applies them as reference pictures. The main advantage of
this approach is that, to create novel reference pictures, only
a small amount of information is sent from the encoder to the
decoder instead of the view-interpolated pictures themselves.
some decoded pictures of other views not only at the same
instant but also at different instants. In addition, it was
possible to only decode local views because references were
restricted in a GoGOP.1 For example, when a multiview video
consists of 15 views and each GoGOP consists of five views
(three GoGOPs in total), decoding one view requires decoding
only the five views of its GoGOP rather than all 15 views. In
contrast to Kimata et al., Oka et al. suggested referring only
to decoded pictures at the same instant for inter-view
prediction because pictures at different instants were rarely
referred to in their study [7]. Droese et al. proposed a
`scalable approach based on the H.264/AVC Scalable Extension
`[8]. Mueller et al. proposed a hierarchical prediction structure
`that was modified for multiview video coding [9]. Oh et al.
`proposed a lattice-like pyramid prediction structure [10]. All
`of them independently discussed prediction structures for
`inter-view prediction. As will be described later, we imple-
`mented our approach on Kimata’s prediction structure instead
`of the others to take local decoding into consideration.
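The local-decoding example above (15 views, five views per GoGOP) can be illustrated with a minimal sketch. It assumes that views are numbered from 0 and that GoGOPs are contiguous, non-overlapping groups of views; the function name and the numbering are ours and are not taken from [6].

```python
def views_to_decode(target_view, num_views=15, gogop_size=5):
    """Return the view indices that may have to be decoded for target_view.

    Assumption (illustrative only): views are numbered 0..num_views-1 and
    each GoGOP is a contiguous, non-overlapping block of gogop_size views.
    """
    if not 0 <= target_view < num_views:
        raise ValueError("target_view out of range")
    first = (target_view // gogop_size) * gogop_size
    last = min(first + gogop_size, num_views)
    # References are restricted to the GoGOP, so no view outside it is needed.
    return list(range(first, last))


# View 7 belongs to the second GoGOP, so at most five views are decoded.
print(views_to_decode(7))
```

The returned list is an upper bound: depending on the prediction structure inside the GoGOP, fewer views may actually be referred to.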
`Taguchi et al. suggested first synthesizing a free-viewpoint
`video and its depth-map video and then encoding them. They
`then encoded residual errors between the 3-D warped free-view-
`point video and target multiview videos [11]. They concurrently
suggested an approach to scalable decoding. Shimizu et al.'s
approach was similar to Taguchi's. They argued that the coding
of a depth map should be studied in detail because there is a
trade-off between the quality of the map and the number of bits
assigned to it [12]. Martinian et al. proposed view-synthesis
prediction that used a disparity map [13]. Yoon et al. proposed layered
`depth images, which divided a picture into various layers and re-
`constructed a predicted picture from them [14]. The encoder in
`these methods makes a depth map and sends it to the decoder.
`The predicted picture is created by 3-D warping.
`B. Color Correction
`Lee et al. proposed MB-based illumination change-adaptive
`motion estimation/motion compensation (ICA ME/MC) [15].
`NewSAD in this method was defined to cancel out the difference
`in illumination between different views. Sohn et al. proposed
`a group of multiviews (GOMV) that was an instantaneous set
`of pictures, inter-view balanced display estimation (IBDE) that
`modified luminance for inter-view prediction, and a rate control
`scheme that treated a GOMV as a picture in conventional 2-D
`video coding [16]. Su et al. proposed illumination compensation
`(IC) and color compensation (CC), which added offsets to the
`YUV channels of the compensated block [17]. The offsets were
predicted from neighboring blocks. Fecker et al. modified the
neighboring views' videos by using lookup tables calculated
from the histograms of two cameras [18]. The histogram approach
cannot handle occlusions, so the quality of correction is
limited by the occlusion areas. Chen et al. used a linear
transformation of the YUV channels, the coefficients of which
were searched for iteratively [19].
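To make the histogram-based lookup table concrete, the following is a minimal sketch of textbook cumulative-histogram matching for one channel. It is not claimed to be the exact procedure of [18]; the function names are ours. Because every pixel of both pictures contributes to the histograms, regions visible in only one camera also shift the table, which is the occlusion weakness noted above.

```python
def cumulative_histogram(pixels, levels=256):
    """Normalized cumulative histogram of a flat list of integer intensities."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = float(len(pixels))
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running / total)
    return cdf


def histogram_matching_lut(other_view, target_view, levels=256):
    """Lookup table that maps other_view intensities toward target_view's histogram."""
    cdf_other = cumulative_histogram(other_view, levels)
    cdf_target = cumulative_histogram(target_view, levels)
    lut, u = [], 0
    for v in range(levels):
        # Advance u to the first target level whose CDF reaches cdf_other[v].
        while u < levels - 1 and cdf_target[u] < cdf_other[v]:
            u += 1
        lut.append(u)
    return lut


# Toy data: the target camera is uniformly brighter by 2 levels.
lut = histogram_matching_lut([10, 10, 20, 30], [12, 12, 22, 32])
print(lut[10], lut[20], lut[30])
```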
`Color-correction methods for general purposes have also
`been developed. Ilie et al. introduced three approaches [20],
`i.e., linear least squares matching that modifies the captured
`picture by linear transformation, RGB to RGB transformation
`1This function is called “local decoding.”
`Fig. 2. GoGOP and GOP: GoGOP consisting of multiple GOPs.
The second approach corrects the luminance and chrominance of
the other views for inter-view prediction and
view-interpolation prediction (color correction). Correction is
needed because all cameras have photoelectric variations. The
advantage of this approach is similar to that of the first,
i.e., the predictions become more accurate while only a small
amount of information is sent.
`A multiview video system generally has the following
`1) All cameras have the same color filters.
`2) It is very difficult to make the apertures of all cameras ex-
`actly the same. Gain control is used to compensate for this.
`However, some cameras only have one gain for RGB chan-
`nels or have roughly discrete gains, which is insufficient for
`3) Gain and brightness levels are not linear and are not the
`same on all cameras. As a result, no camera’s response is
`exactly the same.
`We tried to develop color correction working under the condi-
`tions above. Because of 1), we treated RGB channels indepen-
`dently. 2) and 3) made us consider non-linear transformation.
`The major characteristics of this color correction are that the
`problem is treated as an energy-optimization problem and its
`cost function is designed to work even if corresponding colors
`include some incorrect matches.
This paper is organized as follows. We review the related work
on prediction and color correction in Section II. The first
approach, view-interpolation prediction, and our preliminary
experiments on it are described in Section III. The second
approach, color correction, and our preliminary experiments on
it are described in Section IV. Experiments that confirmed the
coding efficiency are described in Section V. We conclude the
paper in Section VI.
Approaches to multiview coding can be classified into
prediction and color correction. Many of the approaches
mentioned below have studied them; however, additional
approaches to obtaining effective compression are still
`A. Prediction
Kimata et al. introduced the group of group of pictures
(GoGOP) for inter-view prediction. Fig. 2 shows an overview
of GoGOP. They only allowed decoded pictures in the GoGOP
to be referred to [6]. This meant that a picture could refer to
`Authorized licensed use limited to: Laura Stein. Downloaded on October 01,2020 at 16:57:36 UTC from IEEE Xplore. Restrictions apply.


Fig. 3. Overview of first approach's decoder: All tools except for view interpolation and reference picture memory are same as those for H.264/AVC and inter-view
prediction. Some parameters in this figure are transmitted from encoder to decoder. Decoder produces the interpolated pictures M by itself.
that simultaneously modifies the RGB data at pixels by using
a 3 × 3 matrix, and general polynomial transformation that is
similar to RGB to RGB transformation but uses a polynomial
instead of the 3 × 3 matrix. Joshi et al. first carried out linear
transformation in all RGB channels, and later did RGB to RGB
transformation by using a 3 × 4 matrix [21]. In both Ilie et al.'s
and Joshi et al.'s approaches, the color pattern board needs to
be captured before or after the objects are captured.
`As far as we know, no previous work has simultaneously
`treated the following conditions:
`1) independent correction of RGB channels;
`2) transformation using non-linear function;
`3) no use of color pattern board and handling of occlusions.
If we use cameras that have different color filters,
cross-RGB-channel correction by using a 3 × 3 matrix is
theoretically required; otherwise it is not needed. It is not
needed in our approach because we assume that all cameras use
the same color filters. As the photoelectric-transfer
characteristics of a camera are not linear, we should use
non-linear correction. The influence of occlusion areas should
not be ignored, especially when cameras are far apart.
`A. Overview
This approach was based on the latest coding standard,
H.264/AVC [22], and Kimata's inter-view prediction structure
[6]. Our idea was to add view interpolation so that the
interpolated pictures can be referred to as well as the decoded
pictures of the other views. All tools except for
view-interpolation processing and reference-picture memory are
the same as those for H.264/AVC and the inter-view prediction
structure in the decoder and local decoder (Fig. 3). Fig. 3
distinguishes decoded pictures, each belonging to one view at
one time, from interpolated pictures M, which are numbered up
to their total. An interpolated picture is produced from the
decoded picture of the left view and the decoded picture of the
right view at the same instant. Here, left and right do not
necessarily stand for the immediately neighboring views on the
left and right sides. The distance between the interpolated
position and the left view also does not need to be the same as
that between the interpolated position and the right view.
Fig. 2 shows an outline of the GoGOP. As mentioned in
Section II, no picture inside the GoGOP can refer to the
decoded pictures outside the GoGOP, which accomplishes local
decoding. In other words, the reference is restricted to inside
the GoGOP. Fig. 4 shows example references. The notations of
picture types used after this are as follows.
• I, P, and B pictures are the same as the I, P, and B pictures
in H.264/AVC;
• V pictures refer to the other view's decoded pictures in
addition to the functions of the corresponding H.264/AVC
picture types. They represent conventional prediction, i.e.,
inter-view prediction;
Fig. 4. Example references. (a) V picture and (b) W picture.
Fig. 5. View-interpolation process: View interpolation is carried out in both encoder and decoder. It is, therefore, not necessary to send both M(c, t) and its
disparity map, N.
`technique in our approach to create reference pictures that
`would present a picture at a given viewpoint and a given time.
3-D warping can also create such a picture, but the quality of
view-interpolated pictures is better than that obtained by 3-D
warping. This is because view interpolation uses two pictures
for rendering while 3-D warping only uses one. In addition,
view interpolation does not need a depth map while 3-D warping
does. These are the reasons we chose view interpolation for view
Many view-interpolation approaches assume that aligned cameras
are used; they first calculate a disparity map and then render
a view-interpolated picture. A disparity map was calculated by
Droese et al. [23] from the top left pixel to the bottom right
pixel in order by using an energy function that took the
smoothness of disparity transition into account. Droese et al.
later calculated a disparity map by loopy belief propagation
[24]; although this needs a long time, the quality of the
disparity map was better than that of the earlier study [23]. A
`disparity map at the left picture and then at the right picture
`was first calculated by Yaser et al. [25] to address the occlusion
`problem. After that, a view-interpolated picture was rendered
`between these two pictures. Because this method calculated two
`disparity maps, it needed longer than Droese et al.’s first calcu-
`lation [23] did. A disparity map at the left picture was calcu-
`lated by Boykov and Kolmogorov [26] from the left and right
`pictures using the novel graph-cut algorithm. Yamanaka et al.
`advanced this idea toward creating a view-interpolated picture
`[27]. Turning their attention to dynamic programming, Crim-
`inisi et al. considered their original four-plane algorithm to ad-
`dress the problem of occlusion [28]. Fukushima et al. advanced
`the concept of dynamic programming to two dimensions to cal-
`culate adequate minimum energy for two dimensional data [29].
`We chose Droese et al. [23] for our approach as they took both
`quality and computational complexity into account. According
`to Droese et al. [23], view interpolation with their method was
`carried out at 2160 ms/frame for a picture of 640x480 pixels
`with a 2.4-GHz Pentium 4 without hardware-specific optimiza-
`tion. This is currently not suitable for decoding processes, but
`rapid developments in hardware technology and software opti-
`mization should make it suitable in the near future. As Droese et
`al. [23] only used luminance information to estimate disparity,
`we modified it to use color information to obtain better disparity
`estimates and a better interpolated picture.
The calculation procedure to synthesize an interpolated picture
is outlined in Fig. 7. The rectangles in this figure stand for
steps and the pairs of trapezoids stand for loops. Assume that
the view-interpolated picture M(c, t) and its
`Fig. 6. Examples of coding order: Side numbers represent coding order. (a)
`Similar to IBP structure. (b) Similar to hierarchical B structure.
• W pictures refer to the view-interpolated picture M in
addition to the function of V pictures. This is our proposed
prediction, i.e., view-interpolation prediction. The conditions
for using W pictures are the same as the previously mentioned
conditions for producing M, because W pictures refer to M.
Because the decoder can produce M(c, t) by only using some
parameters,2 it is not necessary to send both M(c, t) and its
disparity map from the encoder to the decoder (Fig. 5). The
data to be sent are only the parameters listed in the footnote.
This is one of the advantages of this approach.
`B. Coding Order
Fig. 6 shows two examples of GoGOP coding designed with five
views for the view axis and five pictures for the time axis.
The side numbers represent the order of coding. Fig. 6(a) and
(b) show orders in which the view axis is coded similarly to
the IBP structure and the hierarchical B structure of
H.264/AVC, respectively. Each W picture in Fig. 6(b) refers to
the view-interpolated picture produced from the next
neighboring views.
`C. View Interpolation
`View interpolation, in other words view synthesis, has been
`extensively studied in computer vision and computer graphics.
`Almost all these studies have aimed at creating new-viewpoint
`or focused pictures from the captured pictures. We used this
`2They are the following parameters to be described at the later section: (1) the
`position of the left camera and its homography matrix, (2) the position of the
`right camera and its homography matrix, and (3)J, K,  , , , and E
We have assumed that all cameras are aligned in our method of interpolation,
and that the distance between the target and the left camera and the distance
between the target and the right camera are not always the same. Intrinsic and
extrinsic camera parameters are not needed in this case. The previously mentioned
parameters are sufficient to produce M(c, t).
`Fig. 7. Calculation procedure to synthesize view-interpolated picture: Rectangles stand for steps and pairs of trapezoids for loops.
disparity map N are created from the left decoded picture and
the right decoded picture. The left and right do not
necessarily stand for the immediately neighboring views. N is
first calculated from the top left pixel to the bottom right
pixel in order by using an error energy function. The error
energy is calculated at each pixel, and this determines the
disparity there. After that, a view-interpolated picture is
synthesized by using N. The error energy function is formulated
as in (1). It contains constant coefficients that determine the
influence of each energy term, the block size for block
matching, and a parameter that represents the position of the
view-interpolated picture. For example, the position parameter
must be close to 0 when we want to make a view-interpolated
picture close to the left camera, or must be equal to 0.5 when
we want to make a view-interpolated picture in the middle
between the left and right cameras. The first three terms
represent differences between blocks. Fig. 8(a) shows the
positions of the pixels used to calculate them when the
position parameter equals 1/3. The last term represents the
spatial disparity continuity between a pixel and its upper and
left pixels on N [Fig. 8(b)]. The remaining terms are defined
in the same way.
The disparity that minimizes the error energy, or that at the
left, is used as the estimated disparity, and the
view-interpolated picture is synthesized as in (2).
Fig. 8. Error function terms. (a) Positions of pixels used to calculate the first
three error terms: a position parameter between 0 and 1 represents the position of
the view-interpolated picture. For example, when it is close to 0, a view-interpolated
picture close to the position of the left camera is produced, and when it equals 0.5,
one in the middle between the left and right cameras is produced. (b) Spatial
disparity continuity term.
`Fig. 9. Xmas test sequence.
We prepared another function to address the occlusion problem
very simply. Its terms are defined in the same way.
We assumed in (1) and (2) that all cameras have the same
direction for the optical axis and the same intrinsic camera
parameters. If this is not so for the input pictures, a
homography needs to be applied to the decoded pictures before
the calculation above is done. Homography works as geometrical
compensation except for the position of the optical center,
which we have to ignore.
The range of the disparity should differ from sequence to
sequence, and this value influences the quality of the
view-interpolated picture. Therefore, the range must be
determined by the encoder and sent to the decoder.
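The following is a minimal sketch of disparity estimation and synthesis on a single scanline for aligned cameras. It is a simplified stand-in and not the energy function (1) or the synthesis rule (2): the block term is a plain sum of absolute differences between the left and right pictures, the continuity term looks only at the left neighbor, and the names alpha (position parameter, 0 at the left camera and 1 at the right camera), half_block, and smooth are our own assumptions.

```python
def interpolate_scanline(left, right, alpha, d_min, d_max,
                         half_block=1, smooth=0.5):
    """Estimate a per-pixel disparity and synthesize one interpolated scanline.

    Assumed geometry: a point seen at x + alpha * d in the left picture is
    seen at x - (1 - alpha) * d in the right picture.
    """
    width = len(left)

    def sample(row, x):
        # Round to the nearest pixel and clamp to the picture border.
        return row[min(max(int(round(x)), 0), width - 1)]

    disparity, out = [], []
    for x in range(width):
        best_d, best_e = d_min, float("inf")
        for d in range(d_min, d_max + 1):  # range sent by the encoder
            # Block difference between the two pictures at this candidate.
            e = sum(abs(sample(left, x + k + alpha * d) -
                        sample(right, x + k - (1 - alpha) * d))
                    for k in range(-half_block, half_block + 1))
            # Continuity with the disparity already chosen for the left pixel.
            if disparity:
                e += smooth * abs(d - disparity[-1])
            if e < best_e:
                best_d, best_e = d, e
        disparity.append(best_d)
        # Blend the two matched pixels, weighting the nearer camera more.
        out.append((1 - alpha) * sample(left, x + alpha * best_d) +
                   alpha * sample(right, x - (1 - alpha) * best_d))
    return out, disparity


# Toy scanlines: the right picture is the left one shifted by 4 pixels.
left_row = [(x * 37) % 101 for x in range(40)]
right_row = [left_row[min(x + 4, 39)] for x in range(40)]
middle, disp = interpolate_scanline(left_row, right_row, 0.5, 0, 8, smooth=0.0)
```

The search range (d_min, d_max) plays the role of the disparity range that the encoder determines and transmits; both encoder and decoder would run the same routine on decoded pictures, so neither the interpolated picture nor the disparity map needs to be sent.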
`D. Preliminary Experiments
We conducted experiments to confirm the characteristics
of view-interpolated pictures. Before conducting these, we
captured a test sequence called Xmas [30] and created Blurred
Xmas. The properties of Xmas are as follows.
1) Xmas consisted of 101 pictures;
2) We named these 101 pictures in order from left to right
(Fig. 9);
3) All pictures were captured with a single camera;
4) The camera was moved sideways very slightly and accurately
stopped at 101 positions by a mechanical shifting
5) The maximum disparity between pictures is 70 pixels;
6) Five pictures were captured at each position and an average
picture was produced from these. We used the average pictures
as Xmas. Therefore, Xmas was free from noise;
7) The light conditions or other environments were not
changed during capture.
The Blurred Xmas pictures were produced from the Xmas pictures
by applying a Gaussian filter to each of them.
The experiments were carried out as follows. We first chose a
picture from the sequence as the middle picture. The left and
right pictures were automatically determined once the
difference in camera position was determined. We measured the
PSNR of the Y channel between the middle picture and the
picture synthesized from the left and right pictures while
changing this difference. To obtain stable results, we repeated
the measurement for several middle pictures at each difference
and used the average as the result for that difference. After
measuring all results for Xmas, we processed Blurred Xmas the
same way.
Fig. 10. Experimental results.
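The measurement described above can be sketched as follows. The sketch uses the standard PSNR definition for 8-bit samples and a plain arithmetic mean over several middle pictures; the function names are ours, and the Y channels are assumed to be given as flat lists of integers.

```python
import math


def psnr_y(reference, synthesized, peak=255.0):
    """PSNR in dB between two Y channels given as equal-length flat lists."""
    if len(reference) != len(synthesized) or not reference:
        raise ValueError("pictures must be non-empty and of equal size")
    mse = sum((a - b) ** 2 for a, b in zip(reference, synthesized)) / len(reference)
    if mse == 0:
        return float("inf")  # identical pictures
    return 10.0 * math.log10(peak * peak / mse)


def average_psnr(pairs):
    """Mean PSNR over (middle picture, synthesized picture) pairs."""
    values = [psnr_y(ref, syn) for ref, syn in pairs]
    return sum(values) / len(values)


# Toy pictures that differ by one gray level everywhere (MSE = 1).
print(round(psnr_y([100] * 4, [101] * 4), 2))
```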
`The experimental results are plotted in Fig. 10. From a qual-
`itative evaluation, we can say the following.
`1) The quality of the synthesized picture depends on the dis-
`tance between the left and right cameras. The quality is
`better when they are closer;
`2) The quality of the synthesized picture will be quite good
`when pictures are synthesized from blurred pictures;
3) Because the corresponding measured value is 32.3 dB, we
expect that this view interpolation is capable of synthesizing
pictures with a PSNR of more than about 32.3 dB when the
maximum disparity between two pictures is about 30 pixels.
The PSNR decreases in coding because decoded pictures
containing distortion are used as the left and right pictures
for view-interpolation prediction.
`A. Overview
`We applied color correction to other view’s decoded pictures
`before the prediction scheme to use color-corrected pictures for
`inter-view prediction and view-interpolation prediction. Color-
`corrected decoded pictures were used as if they were the original
`decoded pictures. All tools except the processing for lookup-
`table creation and reference-picture memory were the same as
`those for H.264/AVC and the inter-view prediction structure.
`Fig. 11 shows an overview of this approach. The encoder cre-
`ates lookup tables, corrects the other views’ decoded pictures,
`and sends the lookup tables to the decoder. The corrected pic-
`tures are used for inter-view prediction and view-interpolation
`prediction. The decoder receives the lookup tables and corrects
`the other views’ decoded pictures. We express color-relation in-
`formation with lookup tables in this approach, which should be
`created as precisely as possible for efficient coding. In this sec-
`tion, we will describe how we created them in the encoder in
There are some approaches that need a color pattern board
to obtain the color-relation information between cameras. The
main advantages of the color pattern board are as follows.
• It is possible to capture the data for the entire hue;
• It is easy to detect corresponding colors.
`Fig. 11. Overview of color correction: Decoder receives lookup tables and uses
`them for color correction.
`There are some approaches, on the other hand, that obtain color-
`relation information without the use of a color pattern board.
The disadvantages of using a color pattern board are as follows.
• Providing a color pattern board is not easy when capturing
a large area, such as that of a soccer game in a stadium;
• It is possible to capture approximately the whole range of
intensities without using a color pattern board because
corresponding colors are chosen from the whole picture. This is
impossible with a color pattern board in some cases because
corresponding colors are only chosen from the board. For
example, if the board is placed in a room and there is a bright
window near it, low intensities are not chosen from it [31];
• If the quality of color correction is the same with and
without a color pattern board, the latter approach is less
difficult.
`Considering these advantages, we tried to develop a method of
`obtaining color-relation information by using correspondences
`instead of a color pattern board. This approach corrects the lu-
`minance and chrominance of the other views’ decoded pictures.
`The other views’ decoded pictures will be corrected to a color
`similar to the target view’s pictures.3 YUV channels are first
`converted to RGB channels. After that, they are corrected inde-
`pendently and non-linearly by using R, G, and B lookup tables.
`Finally, RGB channels are re-converted to YUV channels.
Although these two conversions, YUV to RGB and RGB to
YUV, appear wasteful, they are necessary. If these were not
done, a bigger lookup table to directly change a YUV data set to
a YUV data set would conceptually be necessary. Its size would
be 256 × 256 × 256 entries when each of Y, U, and V had 256
degrees of intensity.
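The correction path described above (YUV to RGB, three independent lookup tables, RGB back to YUV) can be sketched as follows. The conversion coefficients are the full-range BT.601 (JFIF) ones, which is our assumption because the conversion actually used is not stated here; the function names and the toy table are also illustrative only.

```python
def clip8(value):
    """Round to the nearest integer and clip to the 8-bit range."""
    return min(max(int(round(value)), 0), 255)


def yuv_to_rgb(y, u, v):
    # Full-range BT.601 (JFIF) coefficients: an assumption of this sketch.
    r = y + 1.402 * (v - 128)
    g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
    b = y + 1.772 * (u - 128)
    return clip8(r), clip8(g), clip8(b)


def rgb_to_yuv(r, g, b):
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    v = 0.5 * r - 0.418688 * g - 0.081312 * b + 128
    return clip8(y), clip8(u), clip8(v)


def correct_pixel(yuv, lut_r, lut_g, lut_b):
    """Correct one YUV pixel with independent R, G, and B lookup tables."""
    r, g, b = yuv_to_rgb(*yuv)
    return rgb_to_yuv(lut_r[r], lut_g[g], lut_b[b])


identity = list(range(256))
brighter = [min(v + 10, 255) for v in range(256)]  # toy non-identity table

# Three per-channel tables versus one direct YUV-to-YUV table.
entries_per_channel_tables = 3 * 256
entries_direct_yuv_table = 256 ** 3

print(correct_pixel((120, 128, 128), brighter, brighter, brighter))
```

The two entry counts show why the detour through RGB is worthwhile: three 256-entry tables are sent instead of one table indexed by every (Y, U, V) combination.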
`Fig. 12 outlines the entire procedure for creating the lookup
`tables. The rectangles in this figure stand for steps and the pairs
`of trapezoids stand for loops. As we can see from the figure,
`3The word “target view” is generally used in color-correction studies for a
`view to be corrected. However, we have to correct other views to have a similar
`color to the “target view” that is to be encoded. We will use “target view” and
`“other view” in this paper to conform with multiview video-coding studies.
`Fig. 12. Calculation procedure for lookup tables.
the following calculation is carried out for each “other view.”
We detect correspondences between the picture for the other
view and that for the target view at a given time in Step 1. The
corresponding colors are detected by using the correspondences
in Step 2. We calculate the lookup tables for all RGB channels
in Step 3, where we presume a Lambertian condition, i.e., that
light rays from a point in the 3-D world are the same in all
directions; in other words, the corresponding colors will be
the same.
`The RGB channels in this approach are corrected inde-
`pendently and non-linearly because of channel-independent
`lookup tables. This approach detects correspondences and
`corresponding colors instead of using a color pattern board. It
`is therefore not necessary to capture a color pattern board and
`it is possible to handle occlusions.
`We will describe Steps 1–3 in detail in Section IV-B–D.
`B. Step 1: Detection of Correspondences
The purpose of this step is to create a list of
correspondences between the other view picture and the target
view picture. Fig. 13 shows example correspondences, which are
drawn as lines in the figure.
We used the scale invariant feature transform (SIFT) [32]
without modifications. SIFT detects correspondences between the
other view picture and the target view picture. Each
correspondence consists of a position in the other view picture
and the corresponding position in the target view picture, and
the list also gives the total number of correspondences. SIFT
has some variable parameters that influence the total number
and quality of correspondences, which have been evaluated in
detail by Lowe [32]. We used correspondences in this approach
to detect corresponding colors, which should
`Fig. 13. Example SIFT results.
`Fig. 14. Gaussian filtered pictures for selection.
`spread as much as possible from dark to light in all RGB chan-
`nels. We should therefore vary these parameters to detect these
`C. Step 2: Detection of Corresponding Colors
`The purpose of this step is to create a corresponding-colors
`, from the correspondences list
`. Table I lists example
`corresponding-color lists where very small portions of corre-
`sponding colors are presented.
It is simple to treat the colors at correspondences in the
input pictures as corresponding colors. However, we produced
some Gaussian-filtered pictures from the input pictures and
selected colors from these at a given time. Fig. 14 shows
examples of two Gaussian-filtered pictures and the selection of
corresponding colors. When we use input pictures, the
corresponding colors have high spatial resolution but also
include noise. Gaussian-filtered pictures are the opposite;
they have neither high spatial resolution nor noise. To take
advantage of both, we used some Gaussian-filtered pictures.4 It
was important to use many plausible corresponding colors in
Section IV-D. We therefore decided to use all the corresponding
colors from some Gaussian-filtered pictures at a given time.
Another reason we used some Gaussian-filtered pictures was that
the best standard deviation varied from case to case.
The corresponding-colors list was formulated as follows.
4The picture filtered with a very small standard deviation is the same as the
input picture. The term “Gaussian-filtered picture” therefore includes an input
picture here.
`, and
`are the RGB channels
`Assume that
`All these calculations are do

