IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 17, NO. 11, NOVEMBER 2007
Multiview Video Coding Using View Interpolation and Color Correction

Kenji Yamamoto, Member, IEEE, Masaki Kitahara, Hideaki Kimata, Tomohiro Yendo, Toshiaki Fujii, Member, IEEE, Masayuki Tanimoto, Senior Member, IEEE, Shinya Shimizu, Kazuto Kamikura, and Yoshiyuki Yashima, Member, IEEE

(Invited Paper)
Abstract—Neighboring views in multiview video systems are highly correlated, so these correlations should be exploited to compress the videos efficiently. There are many approaches to doing this; however, most of them treat pictures of other views in the same way as pictures of the current view, i.e., pictures of other views are used as reference pictures (inter-view prediction). In this paper we introduce two approaches to improving compression efficiency. The first synthesizes pictures at a given time and a given position by view interpolation and uses them as reference pictures (view-interpolation prediction); in other words, it compensates for geometry to obtain precise predictions. The second corrects the luminance and chrominance of other views by using lookup tables to compensate for the photoelectric variations of individual cameras. We implemented these ideas on H.264/AVC with inter-view prediction and confirmed that they worked well. The experimental results revealed that these ideas can reduce the number of generated bits by approximately 15% without loss of PSNR.

Index Terms—Color correction, H.264/AVC, inter-view prediction, multiview video coding, view-interpolation prediction.
I. INTRODUCTION

AS THE capabilities of personal computers and the manufacturing technology of cameras continue to advance dramatically, multiple-camera systems that have around one hundred cameras have emerged [1], [2], which makes it possible to generate free-viewpoint video and 3-D recognition. Our group's multiple-camera system, one such system, and the sequences captured with it have been described previously [2].
Manuscript received December 13, 2006. This work was presented in part at the Picture Coding Symposium, Beijing, China, April 2006. This paper was recommended by Guest Editor Prof. Y. He.
K. Yamamoto was with Nagoya University, Nagoya 464-8603, Japan. He is now with the Universal Media Research Center, National Institute of Information and Communications Technology, Tokyo 184-8795, Japan (e-mail: k.yamamoto@nict.go.jp).
M. Kitahara is with NTT Advanced Technology Corporation, Kanagawa 239-0847, Japan (e-mail: kitahara@hama.ntt-at.co.jp).
T. Yendo, T. Fujii, and M. Tanimoto are with Nagoya University, Nagoya 464-8603, Japan (e-mail: yendo@nuee.nagoya-u.ac.jp; fujii@nuee.nagoya-u.ac.jp; tanimoto@nuee.nagoya-u.ac.jp).
H. Kimata, S. Shimizu, and K. Kamikura are with NTT Cyber Space Laboratories, NTT Corporation, Yokosuka 239-0847, Japan (e-mail: kimata.hideaki@lab.ntt.co.jp; shimizu.shinya@lab.ntt.co.jp; kamikura.kazuto@lab.ntt.co.jp).
Y. Yashima is with NTT Cyber Space Laboratories, NTT Corporation, Yokosuka 239-0847, Japan, and also with Tokyo Institute of Technology, Tokyo 152-8550, Japan (e-mail: yashima.yoshiyuki@lab.ntt.co.jp).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCSVT.2007.903802
Fig. 1. Multiview video coding: the purpose of multiview video coding is not to encode panoramic pictures made from multiple cameras but to efficiently encode the pictures taken by all cameras.
These systems have many fields in which they can be applied, such as preserving our cultural heritage and traditional dance, free-viewpoint television, educational applications, face certification/authentication, security systems, and entertainment. For them to become as commonly used as single-video systems, they need technological advances in various fields, such as video coding. This paper focuses on compression for multiview videos. Fig. 1 shows an overview of multiview video coding, whose purpose is not to encode panoramic pictures made from multiple cameras but to efficiently encode all the pictures taken by the cameras.

Many groups have recently been studying multiview video coding. One of these is the 3DAV ad-hoc group at the MPEG/JVT meeting, which has been discussing this topic since December 2001 with the aim of achieving an international standard [3]-[5]. This group has discussed compression efficiency, viewpoint interpolation, low delay, consistency of quality between views, and temporal random access as requirements for multiview video coding, and it has attempted to establish a standard that fulfills these requirements. Our approach aims at improving compression efficiency while taking these requirements into account. Its basis is the latest standard coding, H.264/AVC, with inter-view prediction, to which we added the following two approaches.

The first uses view-interpolation technology, which synthesizes view-interpolated pictures at a given time and a given position and applies them as reference pictures. The main advantage of this approach is that, to create these novel reference pictures, only a small amount of side information is sent from the encoder to the decoder instead of the view-interpolated pictures themselves.
The second approach corrects the luminance and chrominance of the other views for inter-view prediction and view-interpolation prediction (color correction). They can be corrected because all cameras have photoelectric variations. The advantage of this approach is similar to that of the first: as only small amounts of information are sent, these predictions become more accurate.

A multiview video system generally has the following properties.
1) All cameras have the same color filters.
2) It is very difficult to make the apertures of all cameras exactly the same. Gain control is used to compensate for this. However, some cameras have only one gain for the RGB channels or have roughly discrete gains, which is insufficient for compensation.
3) Gain and brightness levels are not linear and are not the same on all cameras. As a result, no two cameras' responses are exactly the same.
We tried to develop color correction that works under the conditions above. Because of 1), we treated the RGB channels independently; 2) and 3) made us consider non-linear transformation. The major characteristics of this color correction are that the problem is treated as an energy-optimization problem and that its cost function is designed to work even if the corresponding colors include some incorrect matches.

This paper is organized as follows. We review related work on prediction and color correction in Section II. The first approach, called view-interpolation prediction, and our preliminary experiments on it are described in Section III. The second approach, called color correction, and our preliminary experiments on it are described in Section IV. Experiments confirming that coding was done efficiently are described in Section V. We conclude the paper in Section VI.
II. RELATED WORK

There have been many approaches to multiview video coding, and they can be divided into prediction approaches and color-correction approaches. Many of them are reviewed below; however, additional approaches to obtaining effective compression are still needed.

A. Prediction

Kimata et al. introduced the group of group of pictures (GoGOP) for inter-view prediction; Fig. 2 shows an overview of the GoGOP. They only allowed decoded pictures in the GoGOP to be referred to [6]. This meant that a picture could refer to some decoded pictures of other views not only at the same instant but also at different instants. In addition, it was possible to decode only local views because references were restricted to a GoGOP.1 For example, when a multiview set consists of 15 views and a GoGOP consists of five views (a total of three GoGOPs), it is not necessary to decode all 15 views to obtain one view; only five views need to be decoded. Contrary to Kimata, Oka et al. suggested referring only to decoded pictures at the same instant for inter-view prediction because, in their study, different instants were not referred to much [7]. Droese et al. proposed a scalable approach based on the H.264/AVC Scalable Extension [8]. Mueller et al. proposed a hierarchical prediction structure modified for multiview video coding [9]. Oh et al. proposed a lattice-like pyramid prediction structure [10]. All of them independently discussed prediction structures for inter-view prediction. As will be described later, we implemented our approach on Kimata's prediction structure rather than the others to take local decoding into consideration.

1This function is called "local decoding."

Fig. 2. GoGOP and GOP: a GoGOP consists of multiple GOPs.

Taguchi et al. suggested first synthesizing a free-viewpoint video and its depth-map video and then encoding them; the residual errors between the 3-D warped free-viewpoint video and the target multiview videos were then encoded [11]. They concurrently suggested an approach to scalable decoding. Shimizu et al.'s approach was similar to Taguchi's; they insisted that the coding of a depth map should be studied in detail because the quality of the map and the number of bits assigned to it are a trade-off [12]. Martinian et al. proposed view-synthesis prediction that uses a disparity map [13]. Yoon et al. proposed layered depth images, which divide a picture into various layers and reconstruct a predicted picture from them [14]. The encoder in these methods makes a depth map and sends it to the decoder, and the predicted picture is created by 3-D warping.

B. Color Correction

Lee et al. proposed MB-based illumination change-adaptive motion estimation/motion compensation (ICA ME/MC) [15]; the NewSAD in this method was defined to cancel out the difference in illumination between different views. Sohn et al. proposed a group of multiviews (GOMV), which is an instantaneous set of pictures; inter-view balanced display estimation (IBDE), which modifies luminance for inter-view prediction; and a rate-control scheme that treats a GOMV as a picture in conventional 2-D video coding [16]. Su et al. proposed illumination compensation (IC) and color compensation (CC), which add offsets to the YUV channels of the compensated block [17]; the offsets are predicted from neighboring blocks. Fecker et al. modified the neighboring views' videos by using lookup tables calculated from the histograms of two cameras [18]. It is impossible to handle occlusions with the histogram approach, so the quality of correction is limited by the occlusion areas. Chen et al. used a linear transformation of the YUV channels, the coefficients of which were found by iterative linear transformation [19].

Color-correction methods for general purposes have also been developed. Ilie et al. introduced three approaches [20]: linear least-squares matching, which modifies the captured picture by a linear transformation; RGB-to-RGB transformation, which simultaneously modifies the RGB data at each pixel by using a 3 x 3 matrix; and general polynomial transformation, which is similar to RGB-to-RGB transformation but uses a polynomial instead of the 3 x 3 matrix.
Joshi et al. first carried out a linear transformation in all RGB channels and then did an RGB-to-RGB transformation by using a 3 x 4 matrix [21]. In both Ilie et al.'s and Joshi et al.'s approaches, a color pattern board needs to be captured before or after the objects are captured.

As far as we know, no previous work has simultaneously treated the following conditions:
1) independent correction of the RGB channels;
2) transformation using a non-linear function;
3) no use of a color pattern board and handling of occlusions.
If cameras with different color filters are used, cross-RGB-channel correction by using a 3 x 3 matrix is theoretically required. It is not needed in our approach because we assume that all cameras use the same color filters. As the photoelectric-transfer characteristics of a camera are not linear, we should stick to non-linear correction. Finally, the influence of occlusion areas should preferably not be ignored, especially when the cameras are far apart.
III. VIEW INTERPOLATION PREDICTION

A. Overview

This approach is based on the latest-standard coding, H.264/AVC [22], and Kimata's inter-view prediction structure [6]. Our idea is to add view interpolation so that the interpolated pictures, as well as the decoded pictures of the other views, can be referred to. All tools except the view-interpolation processing and the reference-picture memory are the same as those for H.264/AVC with the inter-view prediction structure, in both the decoder and the local decoder (Fig. 3). In Fig. 3, M(c, t) represents the decoded picture of view c at time t, and M^n(c, t) represents the nth view-interpolated picture at that position and time. The interpolated pictures are produced from the decoded picture of the left view at the same instant t and the decoded picture of the right view at t; here "left" and "right" do not stand for the immediate left-side and right-side neighbors, and the distance to the left view does not need to be the same as that to the right view. Fig. 2 shows an outline of the GoGOP. As mentioned in Section II, the pictures inside a GoGOP cannot refer to decoded pictures outside it, which accomplishes local decoding; in other words, references are restricted to the inside of the GoGOP. Fig. 4 shows example references. The picture types referred to hereafter are denoted as follows.

Fig. 3. Overview of the first approach's decoder: all tools except view interpolation and the reference-picture memory are the same as those for H.264/AVC with inter-view prediction. Some parameters in this figure are transmitted from the encoder to the decoder; the decoder produces M^1, M^2, ... by itself.

Fig. 4. Example references. (a) V picture and (b) W picture.

- I, P, and B pictures are the same as I, P, and B pictures in H.264/AVC.
- V pictures refer to the other views' decoded pictures in addition to having the functions of I, P, and B pictures; they represent conventional prediction between views, i.e., inter-view prediction.
- W pictures refer to the view-interpolated pictures M^n(c, t) in addition to having the functions of V pictures. This is our proposed prediction, i.e., view-interpolation prediction. The conditions for using M^n(c, t) are the same as the previously mentioned conditions for producing it, because a W picture needs M^n(c, t).

Because the decoder can produce M^n(c, t) by only using some parameters,2 it is not necessary to send both M^n(c, t) and its disparity map from the encoder to the decoder (Fig. 5); the data to be sent are only the parameters mentioned in the footnote. This is one of the advantages of this approach.

Fig. 5. View-interpolation process: view interpolation is carried out in both the encoder and the decoder. It is therefore not necessary to send both M^n(c, t) and its disparity map, N.

2These are the following parameters, described in later sections: (1) the position of the left camera and its homography matrix; (2) the position of the right camera and its homography matrix; and (3) the block size J x K, the lambda coefficients, and the range of the disparity d. We have assumed in our method of interpolation that all cameras are aligned, while the distance between the target and the left camera and the distance between the target and the right camera are not always the same. Intrinsic and extrinsic camera parameters are not needed in this case. The previously mentioned parameters are sufficient to produce M^n(c, t).
B. Coding Order

Fig. 6 shows two examples of GoGOP coding designed with five views along the view axis and five pictures along the time axis; the side numbers represent the order of coding. Fig. 6(a) and (b) show orders in which the view axis is treated similarly to the IBP structure and to the hierarchical B structure of H.264/AVC, respectively. Each W picture in Fig. 6(b) refers to the view-interpolated picture produced by its next neighboring views.

Fig. 6. Examples of coding order: side numbers represent the coding order. (a) Similar to the IBP structure. (b) Similar to the hierarchical B structure.
C. View Interpolation

View interpolation, in other words view synthesis, has been studied extensively in computer vision and computer graphics. Almost all of these studies have aimed at creating new-viewpoint or refocused pictures from the captured pictures. We use this technique in our approach to create reference pictures that present a picture at a given viewpoint and a given time. 3-D warping can also create such a picture, but the quality of view-interpolated pictures is better than that obtained by 3-D warping, because view interpolation uses two pictures for rendering while 3-D warping uses only one. In addition, view interpolation does not need a depth map while 3-D warping does. These are the reasons we chose view interpolation for view synthesis.

Many view-interpolation approaches have assumed aligned cameras; they first calculate a disparity map and then render a view-interpolated picture. Droese et al. [23] calculated a disparity map from the top-left pixel to the bottom-right pixel in order by using an energy function that took the smoothness of the disparity transition into account. Droese et al. [24] later calculated a disparity map by loopy belief propagation, which needs a long time, although the quality of the disparity map was better than in the earlier study [23]. Yaser et al. [25] first calculated a disparity map at the left picture and then one at the right picture to address the occlusion problem; after that, a view-interpolated picture was rendered between these two pictures. Because this method calculates two disparity maps, it needs longer than Droese et al.'s first method [23]. Boykov and Kolmogorov [26] calculated a disparity map at the left picture from the left and right pictures using their novel graph-cut algorithm, and Yamanaka et al. advanced this idea toward creating a view-interpolated picture [27]. Turning their attention to dynamic programming, Criminisi et al. devised an original four-plane algorithm to address the problem of occlusion [28]. Fukushima et al. extended dynamic programming to two dimensions to calculate an adequate minimum energy for two-dimensional data [29].

We chose Droese et al.'s method [23] for our approach because it takes both quality and computational complexity into account. According to Droese et al. [23], view interpolation with their method runs at 2160 ms/frame for a picture of 640 x 480 pixels on a 2.4-GHz Pentium 4 without hardware-specific optimization. This is currently not suitable for decoding processes, but rapid developments in hardware technology and software optimization should make it suitable in the near future. As Droese et al. [23] used only luminance information to estimate disparity, we modified the method to use color information to obtain better disparity estimates and a better interpolated picture.

The calculation procedure to synthesize an interpolated picture is outlined in Fig. 7; the rectangles in the figure stand for steps and the pairs of trapezoids stand for loops. Assume that we are synthesizing an interpolated picture M^n(c, t) together with its disparity map, N.

Fig. 7. Calculation procedure to synthesize a view-interpolated picture: rectangles stand for steps and pairs of trapezoids for loops.
The disparity map N is created from the left decoded picture M(c_L, t) and the right decoded picture M(c_R, t); again, "left" and "right" do not stand for the immediate left and right neighbors. N is first calculated from the top-left pixel to the bottom-right pixel in order by using an error-energy function E(x, y, d). E is calculated for each candidate disparity d at each pixel, and the minimizing disparity determines the disparity there. After that, the view-interpolated picture is synthesized by using M(c_L, t), M(c_R, t), and N. E is formulated as

E(x, y, d) = \lambda_1 E_1(x, y, d) + \lambda_2 E_2(x, y, d) + \lambda_3 E_3(x, y, d) + \lambda_4 E_S(x, y, d)    (1)

where \lambda_1, ..., \lambda_4 are constant coefficients that determine the influence of each energy term, J and K indicate the block size for block matching, and \alpha (0 < \alpha < 1) represents the position of the view-interpolated picture. For example, \alpha must be close to 0 when we want to make a view-interpolated picture close to the left camera, or equal to 0.5 when we want one in the middle between the left and right cameras. The first three terms, E_1, E_2, and E_3, represent the differences between the blocks of the left, right, and view-interpolated pictures sampled according to the candidate disparity; Fig. 8(a) shows the positions of the pixels used to calculate them when \alpha equals 1/3. The last term, E_S, represents the spatial disparity continuity between a pixel and its upper and left pixels on N [Fig. 8(b)]; its components are defined in the same way.

The disparity that minimizes E, or the disparity of the pixel at the left, is used as the estimated disparity and, writing M_L = M(c_L, t) and M_R = M(c_R, t), the view-interpolated picture is synthesized as

M^n(x, y) = (1 - \alpha) M_L(x - \alpha d, y) + \alpha M_R(x + (1 - \alpha) d, y).    (2)

Fig. 8. Error-function terms. (a) Positions of the pixels used to calculate E_1, E_2, and E_3: \alpha (0 < \alpha < 1) represents the position of the view-interpolated picture. For example, when \alpha is close to 0, a view-interpolated picture close to the position of the left camera is produced, and when \alpha equals 0.5, one in the middle between the left and right cameras is produced. (b) Spatial disparity continuity E_S.
We prepared another function for the synthesis to very simply address the occlusion problem; its left and right components are defined in the same way.
We assumed in (1) and (2) that all cameras have the same direction for the optical axis and the same intrinsic camera parameters. If this is not so for the input pictures, the decoded pictures need a homography before the calculation above is done. The homography works as geometrical compensation except for the position of the optical center, which we reluctantly ignore.
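Where this compensation is needed, it can be implemented as a single perspective warp. The following is a minimal sketch assuming OpenCV; the function name and the convention that the 3 x 3 homography maps the decoded picture into the common aligned geometry are our assumptions.

```python
import cv2
import numpy as np

def apply_homography(picture, H):
    """Warp a decoded picture with its 3 x 3 homography matrix as
    geometric compensation before disparity estimation (the offset
    of the optical center is not compensated)."""
    h, w = picture.shape[:2]
    return cv2.warpPerspective(picture, np.asarray(H, dtype=np.float64), (w, h))
```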
The range of the disparity d should differ from sequence to sequence, and this value influences the quality of the view-interpolated picture. Therefore, the range of d must be determined by the encoder and sent to the decoder.
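To make the disparity search and blending concrete, the following is a minimal sketch of (1) and (2), assuming grayscale numpy arrays. It collapses the three block-difference terms of (1) into a single left-right data term with one smoothness weight, so that simplification and the parameter names (block, lam, d_max) are ours, not the paper's.

```python
import numpy as np

def interpolate_view(left, right, alpha, d_max, block=4, lam=0.1):
    # Minimal sketch of (1)-(2): raster-order disparity search with a
    # block-difference data term and a smoothness term against the
    # already-estimated upper and left neighbors, then alpha-blending.
    h, w = left.shape
    half = block // 2
    disp = np.zeros((h, w), dtype=int)
    out = np.zeros_like(left, dtype=np.float64)
    for y in range(half, h - half):
        for x in range(half, w - half):
            best_e, best_d = np.inf, 0
            for d in range(d_max + 1):
                xl = x - int(round(alpha * d))        # sample in left view
                xr = x + int(round((1 - alpha) * d))  # sample in right view
                if xl - half < 0 or xr + half >= w:
                    break
                bl = left[y - half:y + half + 1, xl - half:xl + half + 1]
                br = right[y - half:y + half + 1, xr - half:xr + half + 1]
                e_data = np.abs(bl.astype(np.float64) - br).mean()
                e_smooth = abs(d - disp[y - 1, x]) + abs(d - disp[y, x - 1])
                e = e_data + lam * e_smooth           # simplified E(x, y, d)
                if e < best_e:
                    best_e, best_d = e, d
            disp[y, x] = best_d
            xl = x - int(round(alpha * best_d))
            xr = x + int(round((1 - alpha) * best_d))
            # Eq. (2): weight the nearer camera more heavily
            out[y, x] = (1 - alpha) * left[y, xl] + alpha * right[y, xr]
    return out, disp
```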
D. Preliminary Experiments

We conducted experiments to confirm the characteristics of view-interpolated pictures. Before conducting these, we captured a test sequence called Xmas [30] and created Blurred Xmas from it. The properties of Xmas are as follows.
1) Xmas consists of 101 pictures.
2) We call these 101 pictures I(0), ..., I(100) from left to right (Fig. 9).
3) All pictures were captured with a single camera.
4) The camera was moved sideways very slightly and stopped accurately at 101 positions by a mechanical shifting machine.
5) The maximum disparity between I(0) and I(100) is 70 pixels.
6) Five pictures were captured at each position and an average picture was produced from them. We used the average pictures as Xmas; Xmas is therefore free from noise.
7) The lighting conditions and other environmental factors were not changed during capture.

Fig. 9. Xmas test sequence.

The Blurred Xmas pictures B(n) were produced from I(n) as

B(n) = G_\sigma(I(n)),    n = 0, ..., 100    (3)

where G_\sigma(.) denotes that a Gaussian filter with standard deviation \sigma has been applied.

The experiments were carried out as follows. We first chose a picture from Xmas as the middle picture; the left and right pictures were then determined automatically once the difference in camera positions was fixed. For example, when I(n) was the middle picture and the difference was Delta, I(n - Delta) was the left picture and I(n + Delta) was the right picture. We measured the PSNR of the Y channel between the middle picture and the picture synthesized from the left and right pictures while changing the difference in camera positions. We measured this PSNR for many middle pictures at each difference to obtain stable results and used the average as the result for that difference. After measuring all the results for Xmas, we processed Blurred Xmas in the same way.
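These measurements are straightforward to reproduce. A minimal sketch follows, assuming 8-bit numpy arrays for the Y channel and SciPy for the Gaussian filter of (3); interpolate_view refers to the sketch given earlier.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def psnr_y(reference, test, peak=255.0):
    # PSNR of the Y channel between the middle and synthesized pictures
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak * peak / mse)

def blurred_xmas(picture, sigma):
    # Eq. (3): Blurred Xmas is Xmas after a Gaussian filter with sigma
    return gaussian_filter(picture.astype(np.float64), sigma=sigma)

# e.g., synthesize the middle picture I(n) from I(n - Delta) and I(n + Delta)
# with alpha = 0.5 and report psnr_y(middle, synthesized).
```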
Fig. 10. Experimental results.

The experimental results are plotted in Fig. 10. From a qualitative evaluation, we can say the following.
1) The quality of the synthesized picture depends on the distance between the left and right cameras; the quality is better when they are closer.
2) The quality of the synthesized picture is quite good when pictures are synthesized from blurred pictures.
3) Because the PSNR measured for Xmas at the camera distance where the maximum disparity is about 30 pixels is 32.3 dB, we expect this view interpolation to be capable of synthesizing pictures of more than about 32.3 dB when the maximum disparity between the two pictures is about 30 pixels.
In actual coding this PSNR decreases somewhat, because decoded pictures containing distortion are used as the left and right pictures for view-interpolation prediction.
IV. COLOR CORRECTION

A. Overview

We apply color correction to the other views' decoded pictures before the prediction scheme, so that color-corrected pictures are used for inter-view prediction and view-interpolation prediction. The color-corrected decoded pictures are used as if they were the original decoded pictures. All tools except the processing for lookup-table creation and the reference-picture memory are the same as those for H.264/AVC with the inter-view prediction structure. Fig. 11 shows an overview of this approach. The encoder creates lookup tables, corrects the other views' decoded pictures, and sends the lookup tables to the decoder; the corrected pictures are used for inter-view prediction and view-interpolation prediction. The decoder receives the lookup tables and corrects the other views' decoded pictures in the same way. In this approach we express the color-relation information with lookup tables, which should be created as precisely as possible for efficient coding. In this section, we describe in detail how we create them in the encoder.

Fig. 11. Overview of color correction: the decoder receives the lookup tables and uses them for color correction.

Some approaches need a color pattern board to obtain the color-relation information between cameras. The main advantages of a color pattern board are as follows.
- It is possible to capture the data for the entire hue.
- It is easy to detect corresponding colors.
On the other hand, some approaches obtain the color-relation information without using a color pattern board. The disadvantages of using a color pattern board are as follows.
- Providing a color pattern board is not easy when capturing a large area, such as a soccer game in a stadium.
- Without a color pattern board it is possible to approximately capture the whole range of intensities, because corresponding colors are chosen from the whole picture. This is impossible in some cases with a color pattern board, because the colors are chosen only from the board; for example, if the board is placed in a room and there is a bright window near it, low intensities are not chosen from it [31].
- If the quality of color correction is the same with and without a color pattern board, the latter approach is less difficult.
Considering these points, we tried to develop a method of obtaining the color-relation information by using correspondences instead of a color pattern board. This approach corrects the luminance and chrominance of the other views' decoded pictures, so that they take on colors similar to those of the target view's pictures.3 The YUV channels are first converted to RGB channels; these are then corrected independently and non-linearly by using R, G, and B lookup tables; finally, the RGB channels are converted back to YUV channels. Although the two conversions, YUV to RGB and RGB to YUV, appear wasteful, they are necessary: without them, a bigger lookup table that directly maps a YUV data set to a YUV data set would conceptually be necessary. Its size would be 256 x 256 x 256 entries when each of Y, U, and V has 256 degrees of intensity, whereas the three channel-independent tables need only 3 x 256 entries in total.

3The word "target view" is generally used in color-correction studies for the view to be corrected. However, we have to correct the other views so that they have colors similar to those of the "target view" that is to be encoded. We use "target view" and "other view" in this paper to conform with multiview video-coding studies.
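As an illustration of this conversion chain, the sketch below applies three 256-entry tables in RGB space with OpenCV's color conversions; the exact YUV variant is an assumption here, not a detail from the paper.

```python
import cv2
import numpy as np

def correct_picture(yuv, lut_r, lut_g, lut_b):
    # YUV -> RGB, three channel-independent 256-entry lookup tables
    # (3 x 256 values in total instead of a 256^3-entry YUV-to-YUV table),
    # then RGB -> YUV again. lut_r/g/b are uint8 arrays of length 256.
    rgb = cv2.cvtColor(yuv, cv2.COLOR_YUV2RGB)
    rgb[..., 0] = lut_r[rgb[..., 0]]
    rgb[..., 1] = lut_g[rgb[..., 1]]
    rgb[..., 2] = lut_b[rgb[..., 2]]
    return cv2.cvtColor(rgb, cv2.COLOR_RGB2YUV)
```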
Fig. 12 outlines the entire procedure for creating the lookup tables; the rectangles in the figure stand for steps and the pairs of trapezoids stand for loops. As the figure shows, the following calculation is carried out for each "other view." We detect correspondences between the picture of the other view and that of the target view at a given time in Step 1. The corresponding colors are detected by using these correspondences in Step 2. We calculate the lookup tables for all RGB channels in Step 3, where we presume a Lambertian condition, i.e., that the light rays from a point in the 3-D world are the same in all directions; in other words, the corresponding colors should be the same.

Fig. 12. Calculation procedure for lookup tables.

The RGB channels in this approach are corrected independently and non-linearly because the lookup tables are channel-independent. The approach detects correspondences and corresponding colors instead of using a color pattern board; it is therefore not necessary to capture a color pattern board, and it is possible to handle occlusions. We describe Steps 1-3 in detail in Sections IV-B-IV-D.
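Putting the three steps together, one encoder-side pass over the other views could look like the following sketch; detect, collect, and fit stand for the Step 1-3 procedures (hypothetical helpers along the lines of the sketches in the following subsections).

```python
def create_lookup_tables(target_rgb, other_views_rgb, detect, collect, fit):
    # Fig. 12 loop: build one (R, G, B) lookup-table set per other view.
    # detect(other, target)        -> correspondences (Step 1)
    # collect(other, target, corr) -> corresponding-color pairs (Step 2)
    # fit(src_levels, dst_levels)  -> one 256-entry table (Step 3)
    tables = {}
    for view, other_rgb in other_views_rgb.items():
        corr = detect(other_rgb, target_rgb)
        colors = collect(other_rgb, target_rgb, corr)
        tables[view] = tuple(
            fit([c_other[ch] for c_other, _ in colors],
                [c_target[ch] for _, c_target in colors])
            for ch in range(3))
    return tables
```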
B. Step 1: Detection of Correspondences

The purpose of this step is to create a correspondences list P from the other-view picture and the target-view picture. Fig. 13 shows example correspondences; the lines in the figure represent corresponding points.

Fig. 13. Example SIFT results.

We used the scale-invariant feature transform (SIFT) [32] without modification. SIFT detects correspondences between the other-view picture and the target-view picture,

P = {(p_j, q_j) | j = 1, ..., J}

where p_j refers to the corresponding position in the other-view picture, q_j refers to that in the target-view picture, and J is the total number of correspondences. SIFT has some variable parameters that influence the total number and the quality of the correspondences, which have been evaluated in detail by Lowe [32]. We used the correspondences in this approach to detect corresponding colors, which should spread as much as possible from dark to light in all RGB channels; we should therefore vary these parameters to detect such correspondences.
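A minimal Step 1 sketch with OpenCV's SIFT follows; the ratio test and its threshold are our choices for illustration rather than details from the paper.

```python
import cv2

def detect_correspondences(other_view, target_view, ratio=0.75):
    # Step 1: SIFT keypoints and descriptor matching between the two
    # views; returns [(p_j, q_j)] position pairs, j = 1, ..., J.
    sift = cv2.SIFT_create()
    kp_o, des_o = sift.detectAndCompute(other_view, None)
    kp_t, des_t = sift.detectAndCompute(target_view, None)
    pairs = []
    for match in cv2.BFMatcher().knnMatch(des_o, des_t, k=2):
        if len(match) < 2:
            continue
        m, n = match
        if m.distance < ratio * n.distance:  # Lowe's ratio test
            pairs.append((kp_o[m.queryIdx].pt, kp_t[m.trainIdx].pt))
    return pairs
```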
C. Step 2: Detection of Corresponding Colors

The purpose of this step is to create a corresponding-colors list, C, from the correspondences list P. Table I lists example corresponding-colors lists, in which a very small portion of the corresponding colors is presented.

TABLE I
EXAMPLE CORRESPONDING-COLORS LISTS

It would be simple to treat the correspondences in the input pictures directly as corresponding colors. Instead, we produced several Gaussian-filtered pictures from the input pictures and selected colors from all of them at a given time. Fig. 14 shows two examples of Gaussian-filtered pictures and the selection of corresponding colors. When we use the input pictures alone, the corresponding colors not only have high spatial resolution but also include noise; Gaussian-filtered pictures are the opposite, with neither high spatial resolution nor noise. To take advantage of both, we used several Gaussian-filtered pictures.4 It is important to have many plausible corresponding colors in Section IV-D; we therefore decided to use all the colors corresponding to the several Gaussian-filtered pictures at a given time. This is also why we used several Gaussian-filtered pictures: the best sigma varied from case to case.

Fig. 14. Gaussian-filtered pictures for selection.

4A picture filtered with a very small sigma is the same as the input picture. The term "Gaussian-filtered picture" therefore includes the input picture here.
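A minimal Step 2 sketch under the same assumptions: colors are read at the matched positions in several Gaussian-filtered versions of both pictures, with sigma = 0 standing for the input pictures themselves; the sigma values are placeholders.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def corresponding_colors(other_rgb, target_rgb, pairs, sigmas=(0.0, 1.0, 2.0)):
    # Step 2: collect (other, target) RGB pairs at every blur level.
    colors = []
    for s in sigmas:
        # blur the spatial axes only, keeping the three channels separate
        o = gaussian_filter(other_rgb.astype(np.float64), sigma=(s, s, 0))
        t = gaussian_filter(target_rgb.astype(np.float64), sigma=(s, s, 0))
        for (xo, yo), (xt, yt) in pairs:
            colors.append((tuple(o[int(yo), int(xo)]),
                           tuple(t[int(yt), int(xt)])))
    return colors
```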
The corresponding-colors list, C, was formulated as follows. Assume that R_o, G_o, B_o and R_t, G_t, B_t are the RGB channels of the other-view picture and the target-view picture, respectively. All these calculations are do
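Step 3 (Section IV-D) treats lookup-table creation as an energy-optimization problem whose cost tolerates incorrect matches, as stated in Section I. As a rough illustration of that idea only, not the paper's actual cost function, the sketch below fits each channel's 256-entry table by iteratively reweighted averaging with a Huber-style down-weighting of outliers, fills unobserved levels by interpolation, and enforces monotonicity.

```python
import numpy as np

def fit_lut(src, dst, iters=5, delta=10.0):
    """Illustrative Step 3 sketch: fit one 256-entry lookup table.

    src, dst: 1-D arrays of corresponding intensities (other view,
    target view) for one channel; assumes at least one observation.
    """
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    w = np.ones_like(src)
    lut = np.zeros(256)
    for _ in range(iters):
        # weighted average of target intensities for each source level
        sums = np.bincount(src.astype(int), weights=w * dst, minlength=256)
        cnts = np.bincount(src.astype(int), weights=w, minlength=256)
        filled = cnts > 0
        lut[filled] = sums[filled] / cnts[filled]
        # interpolate levels with no observations
        lut = np.interp(np.arange(256), np.arange(256)[filled], lut[filled])
        # down-weight likely incorrect matches (Huber-style reweighting)
        r = np.abs(dst - lut[src.astype(int)])
        w = np.where(r <= delta, 1.0, delta / np.maximum(r, 1e-9))
    return np.clip(np.maximum.accumulate(lut), 0, 255)  # monotone table
```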
