`
`BEFORE THE PATENT TRIAL AND APPEAL BOARD
`
`UNIFIED PATENTS, LLC,
`Petitioner,
`
`v.
`
`INTERDIGITAL VC HOLDINGS, INC.,
`Patent Owner.
`
`Case No. IPR2021-00102
`U.S. Patent No. 8,363,724
`
`PATENT OWNER’S PRELIMINARY RESPONSE
`PURSUANT TO 37 C.F.R. § 42.107(a)
`
`
`
TABLE OF CONTENTS

I. Summary
II. Technology Primer
    A. Video Compression & Decompression
        1. Video Compression and Decompression
        2. Redundant Information
            a. Compressing video data by reducing redundancies
            b. Using a typical “reference picture” to reduce redundancies
        3. Motion Information
            a. Compressing video data using motion information
            b. Using reference blocks with motion information
        4. Typical Reference Pictures
        5. Virtual Reference Pictures
            a. Golden Frames
            b. Detail-Obscured Frames
    B. Synthesized Pictures
        1. Multi-View Video
        2. “Synthesized” Picture Frames Are Created To Provide Additional Viewpoints For Multi-View Videos
        3. Synthesized Reference Pictures Are Not Virtual Reference Pictures
III. The ’724 Patent
    A. The ’724 Patent Discloses Novel Techniques For Managing Virtual Reference Pictures
    B. The Challenged Claims All Require Virtual Reference Pictures
IV. Unified’s Petition
    A. Each Ground Relies on Xin For The Claimed “Virtual Reference Picture” Limitations
    B. Xin Does Not Discuss or Concern Virtual Reference Pictures
V. All Grounds Fail Because Petitioner Identifies No Evidence That Xin Describes a Virtual Reference Picture
    A. There Is No Evidence That Virtual Reference Pictures Are “Also Known As Synthesized Reference Pictures”
    B. There Is No Evidence That Xin’s Synthesized Pictures Are Designed Solely For Use As Reference Pictures And Are Not Designed And Available For Display As A Video Frame of The Multi-Video Display
VI. Conclusion

TABLE OF AUTHORITIES

Cases
U.S. v. Arthrex, Inc., 141 S. Ct. 549 (Oct. 13, 2020)

Statutes
35 U.S.C. § 102
35 U.S.C. § 103
35 U.S.C. § 325(d)

TABLE OF EXHIBITS

Exhibit   Description
2001      Declaration Of Robert Louis Stevenson, Ph.D. Regarding Patent Owner Preliminary Response For Inter Partes Review Of U.S. Patent No. 8,363,724
2002      Wien, Mathias, “High Efficiency Video Coding”, Signals and Communication Technology, Springer 2015
2003      Smolic, “Interactive 3-D Video Representation and Coding Technologies”, Proceedings of the IEEE, Vol. 93, No. 1, January 2005
2004      Curriculum Vitae of Dr. Robert Louis Stevenson
2005      US Patent Publication 2006/0146143

I. Summary

The Board should deny Unified’s Petition because it fails to establish the required reasonable likelihood of success under any of the asserted grounds.
`
Each challenged claim of Interdigital’s U.S. Patent No. 8,363,724 concerns the use of a “virtual reference picture” to compress or decompress video data. But Xin, the sole reference relied on in the Petition’s grounds as purportedly disclosing such virtual reference pictures, does not even mention virtual reference pictures. Instead, Xin concerns synthesized pictures and references, which are completely different and distinct from virtual reference pictures.
`
Unable to identify relevant prior art that actually concerns virtual reference pictures, the Petition simply pretends that virtual and synthesized reference pictures are the same. They are not. Indeed, Dr. Robert Stevenson is a co-author of the article that the Petition cites and uses to explain what virtual reference pictures are, and he explains in the attached declaration that Xin’s synthesized reference pictures are in fact not virtual reference pictures.
`
And the Petition offers no evidence to the contrary. Instead, Petitioner Unified and its expert merely assert, without explanation or support, that (1) Xin supposedly teaches that virtual reference pictures are also known as synthesized reference pictures; and (2) Xin’s synthesized pictures are virtual reference pictures because Xin supposedly discloses that its synthesized pictures are intended to be used solely as reference pictures and not to be shown as part of a video. (E.g., Pet., 8-9, 22, 24; EX1005, ¶¶58, 61, 67). But Petitioner and its expert identify no such teaching or disclosure in Xin. They cannot, because there is none.
`
None of the Xin passages cited and relied on by Unified and its expert (or any other Xin passage) even mentions virtual reference pictures, much less states that they are the same as synthesized reference pictures. (Id.). And Unified and its expert do not even try to cite a Xin passage to support their wholly unsupported and unexplained assertion that Xin describes its synthesized pictures as not being used as part of the disclosed multi-view video, and thus as virtual reference pictures. They cannot, because Xin repeatedly states that those synthesized pictures are used as part of the disclosed multi-view video; indeed, the primary purpose of Xin is to use those synthesized pictures to create a “synthesized multiview video,” and Xin is even titled “Method And System For Synthesizing Multiview Video.”1 (EX1003, Title, emphasis added).
`
Unified’s entire Petition hinges on its unsupported and unexplained assertion that virtual and synthesized reference pictures are the same thing, but the only evidence of record shows that they are not. Because Xin does not disclose or concern the “virtual reference pictures” that are the subject of the challenged claims, the Petition should be denied.2

1 Unless otherwise noted, all emphasis herein is added.
`
II. Technology Primer

The Petition raises two distinct areas of technology: (1) the use of a virtual reference picture to compress and decompress digital video data, which is the subject of the challenged ’724 Patent; and (2) the use of a synthesized picture and reference picture to create and transmit video images that were not actually captured by a video camera, which is the subject of the Petition’s primary reference, Xin. As established and explained below, a “virtual reference picture” is a picture frame that is created solely for purposes of serving as a reference picture and is not designed for display as a video frame of the video. In contrast, a synthesized reference picture is created and designed so that it may be both used as a reference picture and displayed as a video frame in the video. Xin’s synthesized reference picture is therefore not the claimed virtual reference picture.
`
Because the Petition confuses and conflates the two, each is discussed and explained in the following technology primer.

2 Additionally, the Supreme Court is currently considering Appointments Clause issues concerning IPRs. See U.S. v. Arthrex, Inc., No. 19-1434, 141 S. Ct. 549 (Oct. 13, 2020). To the extent the Court determines that the Appointments Clause has not been complied with, Patent Owner objects to this proceeding on that basis.
`
A. Video Compression & Decompression

The challenged ’724 Patent generally concerns methods for compressing (and then decompressing) digital video data. In particular, the ’724 Patent discloses and claims improved methods for using virtual reference pictures to achieve such compression and decompression.
`
The ’724 Patent’s new method is so technically significant that it was incorporated into the leading video compression standard, HEVC/H.265, as well as into the VP9 and AV1 formats. Accordingly, the ’724 Patent is a Standard-Essential Patent for HEVC, VP9, and AV1. The HEVC standard directly competes with the VP9 and AV1 video compression formats developed and advocated by Google (which is a member of Petitioner Unified, but is not named as a real-party-in-interest in the Petition).
`
The following addresses and explains several concepts important to understanding the compression and decompression methods of the ’724 Patent (and the fundamental differences between it and Xin), including:

• the use of compression to reduce the amount of data needed to transmit video;
• exemplary methods for achieving such video compression, including reducing redundant information and using motion information;
• the general use of a reference picture to aid compression; and
• the use of the specific type of reference picture that is the subject of the ’724 Patent, a virtual reference picture.
`
1. Video Compression and Decompression

In its original raw form, video contains an enormous amount of data. (EX2002, 5-6). Because raw video files are so large, transmitting them in raw form requires a great deal of time and resources. (EX2002, 5-6). It is therefore highly desirable to reduce the amount of data that must be transmitted to adequately reproduce the original video. (EX2002, 5-6). This reduction in the amount of data needed to transmit video information is known as “compression.” (EX2002, 5-6).
`
Compression (also referred to as “encoding”) is thus the modification of raw data into a more compact form that can adequately represent the original video information. (EX2002, 23-28). Once compressed and then transmitted, the received compressed data can be decompressed (or “decoded”) for display or other use. (EX2002, 23-28). This processing pipeline is conceptually illustrated below.

(Figure adapted from EX2002, 23-24).
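
To put the size point in concrete terms, the following Python sketch computes the raw data rate of a hypothetical 1080p, 8-bit RGB, 30 frames-per-second video and then runs a toy encode/decode round trip. The frame dimensions and frame rate are illustrative assumptions, and Python’s general-purpose zlib compressor merely stands in for an encoder and decoder; zlib is lossless and is not a video codec.

```python
import zlib


def raw_bytes_per_second(width: int, height: int, bytes_per_pixel: int, fps: int) -> int:
    """Uncompressed data rate: every sample of every frame is sent as-is."""
    return width * height * bytes_per_pixel * fps


def encode(raw: bytes) -> bytes:
    """Toy 'encoder' (assumption): zlib stands in for a real video encoder."""
    return zlib.compress(raw)


def decode(compressed: bytes) -> bytes:
    """Toy 'decoder': recovers the raw bytes for display or other use."""
    return zlib.decompress(compressed)


# Hypothetical 1920x1080 frames, 3 bytes per pixel (8-bit RGB), 30 frames per second.
rate = raw_bytes_per_second(1920, 1080, 3, 30)
print(f"raw rate: {rate:,} bytes/s")  # raw rate: 186,624,000 bytes/s (about 1.5 Gbit/s)

# A synthetic, highly redundant frame: one flat gray value everywhere.
frame = bytes([128]) * (1920 * 1080 * 3)
packed = encode(frame)
assert decode(packed) == frame  # the round trip reproduces the input exactly
print(len(frame), "->", len(packed), "bytes")
```

Real video encoders are typically lossy and exploit the picture-to-picture redundancies discussed next; the sketch shows only the encode, transmit, and decode shape of the pipeline.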
`
2. Redundant Information

One way to reduce the amount of video data is to reduce or eliminate the redundant information being transmitted. (EX2002, 23-28).
`
a. Compressing video data by reducing redundancies

Video is a series of still pictures (frames) that, when seen one after another, result in the appearance of a moving picture. (EX2002, 23-28). To illustrate, the figure below includes two consecutive pictures (top) and the difference between them (also referred to as the residual) (bottom). The individual in the pictures opens her eyes between the first picture (eyes closed) and the second picture (eyes open). Because the background and most of the foreground are static, the difference (residual) between the two pictures is relatively small. In other words, most of the information needed to reproduce the second picture is the same as the information needed to reproduce the first picture, and sending that repetitive information twice would therefore be redundant. (EX2002, 37-39). Thus, one way to compress (reduce) the transmitted video data is to send that redundant data just once and then send only the residual information necessary to modify the first image to reproduce the second. (EX2002, 37-39).
`
b. Using a typical “reference picture” to reduce redundancies

One way to reduce the transmission of redundant information is to use a “reference picture” (sometimes alternatively called a “reference frame”). If the encoding or decoding of one picture frame uses (“references”) information present in another picture frame, that other picture frame is called a “reference” picture or frame. (EX2002, 41-44).
`
For example, again consider the figure above, which includes first and second pictures (top) and the difference between them (the residual) (bottom). As illustrated above, the individual in the pictures opens her eyes between the first picture (eyes closed) and the second picture (eyes open). The first picture (eyes closed) could be used as a reference picture for reproducing the second picture using only the residual information.
`
In this simplified scenario, all of the data for the first picture is transmitted. However, not all of the data for the second picture is transmitted; instead, only the residual is transmitted. The residual and the first picture can then be summed to recreate the second picture, without the need to transmit twice the redundant information shared between the first and second pictures (i.e., everything but the data representing the eyes).
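
The residual approach described above can be sketched in a few lines of Python. The tiny 3x4 “pictures” are hypothetical grayscale arrays chosen for illustration, and the sketch sends raw differences; actual codecs typically transform and quantize the residual, which is omitted here.

```python
def residual(current, reference):
    """Encoder side: sample-by-sample difference between the current picture and its reference."""
    return [[c - r for c, r in zip(cur_row, ref_row)]
            for cur_row, ref_row in zip(current, reference)]


def reconstruct(reference, res):
    """Decoder side: sum the reference picture and the residual."""
    return [[r + d for r, d in zip(ref_row, res_row)]
            for ref_row, res_row in zip(reference, res)]


# Hypothetical 3x4 grayscale pictures; only the middle row (the "eyes") changes.
first = [[50, 50, 50, 50],
         [20, 20, 20, 20],   # eyes closed
         [50, 50, 50, 50]]
second = [[50, 50, 50, 50],
          [20, 90, 90, 20],  # eyes open
          [50, 50, 50, 50]]

res = residual(second, first)
nonzero = sum(1 for row in res for v in row if v != 0)
print(res)      # [[0, 0, 0, 0], [0, 70, 70, 0], [0, 0, 0, 0]]
print(nonzero)  # 2 of the 12 residual values are nonzero
assert reconstruct(first, res) == second
```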
`
`3. Motion Information
Another way to reduce the amount of redundant video data (and thus the overall amount of data that needs to be transmitted) is to rely on motion information. Such motion information can be used to reduce the transmission of redundant data concerning persistent video elements that change position from one video frame to the next. (EX2002, 37-39).
`
a. Compressing video data using motion information

A video sequence may feature elements that move during the sequence, such as a person’s face rising from the bottom of the frame to the top, or a bouncing ball moving from the bottom left corner to the top right corner. (EX2002, 37-39). The net result over time is a significant change in the image displayed, but, from video frame to frame, there are only small differences between the content observed, and these differences are mostly due to motion. (EX2002, 37-39).
`
Under these circumstances, another means for compressing video data is to send the data representing the moving element just once. (EX2002, 37-44). Thereafter, to correctly reproduce the element in a later video picture frame, only the element’s motion information needs to be transmitted, instead of the complete data concerning that element. (EX2002, 37-44). The motion information can then be used to calculate the element’s position in the later video picture frame. (EX2002, 37-44).
`
b. Using reference blocks with motion information

In the previous compression example (concerning the blinking eyes), an entire picture frame was used as a reference. However, it may also be helpful to break up a frame into sections (“blocks”) and use an individual block as a reference. (EX2002, 41-44). Such reference blocks are often used in connection with motion information.
`
Consider the previously mentioned bouncing ball moving from the bottom left corner to the top right corner in a sequence of picture frames. The data representing the appearance of that ball could be a block of data within a frame, and motion information could be used to reposition that block in subsequent frames.
`
For example, in the illustration below, an initial frame (labeled “Reference”) includes a number of blocks (sometimes alternatively called predictors). (EX2002, 41-44). In a later frame of the video sequence (labeled “Predicted”), four of the blocks have the same content (visual appearance) as in the earlier Reference frame, but their positions within the frame have moved. (EX2002, 41-44). For each such block, a motion vector can be used to describe how its position has changed from the Reference frame to the Predicted frame. (EX2002, 41-44).

(EX2002, 42).
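
The block-and-motion-vector idea can be illustrated with a minimal Python sketch: an encoder-side full search that picks a motion vector by minimizing the sum of absolute differences (SAD), and a decoder-side step that copies the block the vector points to. The 6x6 frames, the 2x2 block, the SAD cost, and the search range are all illustrative assumptions, not details drawn from EX2002 or from any particular standard.

```python
def get_block(frame, top, left, size):
    """Return the size x size block whose top-left sample is at (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def find_motion_vector(reference, current, top, left, size, search):
    """Encoder side: search a +/- `search` window of the reference frame for the
    block that best matches the current block. Returns (dy, dx), the offset from
    the current block's position to the matching block in the reference."""
    target = get_block(current, top, left, size)
    best_cost, best_mv = None, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if 0 <= y <= len(reference) - size and 0 <= x <= len(reference[0]) - size:
                cost = sad(get_block(reference, y, x, size), target)
                if best_cost is None or cost < best_cost:
                    best_cost, best_mv = cost, (dy, dx)
    return best_mv


def motion_compensate(reference, top, left, size, mv):
    """Decoder side: rebuild the current block by copying the reference block
    that the motion vector points to."""
    dy, dx = mv
    return get_block(reference, top + dy, left + dx, size)


def make_frame(width, ball_top, ball_left, value=9, ball=2):
    """Hypothetical frame: zero background with a small square 'ball' of `value`."""
    frame = [[0] * width for _ in range(width)]
    for r in range(ball_top, ball_top + ball):
        for c in range(ball_left, ball_left + ball):
            frame[r][c] = value
    return frame


reference = make_frame(6, 3, 1)  # ball near the bottom-left
current = make_frame(6, 1, 3)    # same ball, moved up and to the right
mv = find_motion_vector(reference, current, top=1, left=3, size=2, search=2)
print(mv)  # (2, -2)
assert motion_compensate(reference, 1, 3, 2, mv) == get_block(current, 1, 3, 2)
```

Only the vector (2, -2) needs to be sent for this block; the decoder recovers the ball’s appearance from the reference frame it already holds.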
`
4. Typical Reference Pictures

As illustrated above, a typical reference picture (in contrast to the virtual reference pictures discussed in more detail below) is a previously decoded or decompressed picture frame that forms part of the output video, and is thus displayable. (EX2002, 102-105). Picture frames are often classified as P, B, or I frames, depending on whether and how they are encoded and decoded in reference to one or more such reference pictures.
`
For example, picture frames that are reproduced (“predicted”) using a single reference picture are referred to as P frames (P is for Prediction). (EX2002, 102-105). Picture frames that are predicted based on two reference pictures are referred to as B frames (B is for Bi-Prediction). (EX2002, 102-105). And frames that are predicted without a reference picture are referred to as I frames (I is for Intra-Prediction). (EX2002, 102-105).
`
The figure below illustrates the reference pictures of each picture in a sequence of video frames, where each frame is labeled as an I frame, P frame, or B frame. (EX2002, 102-105).

(Figure adapted from EX2002, Figure 4.3, 105).
`
The I frame (or Intra-Prediction picture frame) does not rely on any other picture frame as a reference and can thus be decoded first. (EX2002, 102-105). The decoded I frame can be stored in a buffer, sometimes called the decoded picture buffer, for later use as a reference picture by the decoding process. (EX2002, 102-105). The decoded I frame is also included in the output video as a frame of the video. (EX2002, 102-105).
`
The P1 picture frame (or Prediction frame) can then be decoded using the stored I frame as a reference picture. (EX2002, 102-105). A prediction can be made based on the stored I frame, and the prediction can be combined with a residual, which is included in the transmission, to result in the decoded P1 frame. (EX2002, 102-105). Once the P1 frame is decoded, it too can be placed into a buffer for later use as a reference picture. (EX2002, 102-105). The decoded P1 frame is also included in the output video as another frame of the video. (EX2002, 102-105).
`
The B1 frame (or Bi-Prediction frame) can then be decoded using both the I frame and the P1 frame as reference pictures. (EX2002, 102-105). Two reference frames can be utilized in the prediction process, for example, by taking the average of the two frames; that average, combined with any motion information, forms the predictor, which is then combined with a residual included in the bit stream to produce the decoded B1 frame. (EX2002, 102-105). The decoded B1 frame is included in the output video as a second frame of the video. (EX2002, 102-105). In the illustrated example, B1 is not used by another frame as a reference frame and therefore need not be stored in a buffer as a reference. (EX2002, 102-105). But the decoded B1 frame is still included in the output video as a frame of the video. (EX2002, 102-105).
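
The averaging step described for B1 can be sketched as follows. The sketch assumes zero motion (each predictor sample comes from the co-located sample in each reference) and uses a single row of hypothetical sample values; the “+ 1” rounding is an illustrative choice, not a quotation of any standard’s exact rule.

```python
def bi_predict(ref0, ref1):
    """Average two reference pictures sample by sample (halves round up)."""
    return [(a + b + 1) // 2 for a, b in zip(ref0, ref1)]


def decode_b_frame(ref0, ref1, residual):
    """Form the bi-predictor, then add the residual carried in the bit stream."""
    predictor = bi_predict(ref0, ref1)
    return [p + r for p, r in zip(predictor, residual)]


i_frame = [100, 100, 40, 40]   # hypothetical decoded I frame (one row of samples)
p1_frame = [100, 120, 60, 40]  # hypothetical decoded P1 frame
residual_b1 = [0, -2, 1, 0]    # hypothetical residual for B1

b1 = decode_b_frame(i_frame, p1_frame, residual_b1)
print(b1)  # [100, 108, 51, 40]
```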
`
The remaining frames can be similarly processed, with B2 using I and P1 as reference frames, P2 using P1 as a reference frame, and B3 and B4 using P1 and P2 as reference frames. (EX2002, 102-105). Each decoded frame may be included in a buffer, and may serve as a reference frame for another to-be-decoded frame. (EX2002, 102-105). All decoded frames are available for display and included in the output video as frames of the video. (EX2002, 102-105).
`
Accordingly, each (non-virtual) reference frame is a previously decoded frame that is available for display to form part of the output video. (EX2002, 102-105).
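
The decoding walk-through above can be modeled as a short Python sketch in which each frame lists the reference frames it needs. The dependencies follow the text; the display positions, and the simplifications of an unbounded buffer with no eviction and frames represented only by name, are assumptions made for illustration. What the sketch captures is that, in this conventional scheme, every decoded frame, including every frame kept in the buffer as a reference, is also placed in the output video.

```python
# (name, reference frames needed, kept as a reference?, display position)
# Frames are listed in decoding order; display positions are assumed for illustration.
FRAMES = [
    ("I",  [],           True,  0),
    ("P1", ["I"],        True,  3),
    ("B1", ["I", "P1"],  False, 1),
    ("B2", ["I", "P1"],  False, 2),
    ("P2", ["P1"],       True,  6),
    ("B3", ["P1", "P2"], False, 4),
    ("B4", ["P1", "P2"], False, 5),
]


def decode_sequence(frames):
    buffer = set()  # decoded picture buffer: frames kept for later reference
    output = []     # (display position, name) of every decoded frame
    for name, refs, keep_as_ref, display_pos in frames:
        missing = [r for r in refs if r not in buffer]
        if missing:
            raise ValueError(f"{name} needs {missing} before it can be decoded")
        if keep_as_ref:
            buffer.add(name)
        output.append((display_pos, name))  # every decoded frame is displayable
    return buffer, [name for _, name in sorted(output)]


buffer, display_order = decode_sequence(FRAMES)
print(sorted(buffer))  # ['I', 'P1', 'P2']
print(display_order)   # ['I', 'B1', 'B2', 'P1', 'B3', 'B4', 'P2']
```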
`
5. Virtual Reference Pictures

On the other hand, a virtual reference picture is a picture frame that is created solely for purposes of serving as a reference picture and is not designed for use as a frame of video. (EX1008, 897-898; EX2001, ¶14). Thus, in contrast to a typical, non-virtual reference picture, a virtual reference picture is not available for display as a video frame of the video. (Id.).
`
Examples of virtual reference pictures include golden frames and detail-obscured frames, which are explained in the following sections.
`
a. Golden Frames

One example of a virtual reference picture is a “golden frame.” (EX2001, ¶15). A golden frame is created solely as a reference picture; it includes high-quality pixel information, but is not itself a picture frame from the video and is not available for inclusion as a frame of the video. (EX2001, ¶15). Rather, when the golden frame is created by the encoder and transmitted to the decoder, it includes an indication that the golden frame is not available for display in the output video. (EX2001, ¶15).
`
To better understand a “golden frame” and its benefits, first consider a compression approach that does not use a “golden frame.” In this example, a video scene has a static background and a dynamic foreground, such as a video of a ball (foreground) moving across a landscape (background). (EX2001, ¶16). As the ball moves across the scene from frame to frame, the ball covers a portion of the background and reveals another portion of the background that was previously covered. (EX2001, ¶16). If a video compression scheme used adjacent frames as (non-virtual) reference frames (as in the foregoing eye-blinking example), then the portion of the background that was previously covered would not be included in the reference frame and would therefore need to be included in the residual (along with motion information for the ball “block”). (EX2001, ¶16). This concept is illustrated below. As shown, if the first picture serves as a reference picture for the second picture, then the residual contains the information for the background that the ball was obscuring in the first frame.3

3 Depending on the specific compression technology used, the residual may also include some additional information, which is ignored here for the purposes of illustrating the concept of a golden frame. (EX2001, ¶16).
`
But instead of using the first picture as a reference frame, greater compression could be achieved by using a virtual reference frame. Specifically, a picture frame could be created comprising solely the static background of the scene, with no foreground. (EX2001, ¶17). Because this “golden frame” contains the complete background information, the portion of the background that was previously covered by the ball from one frame to the next would be available in the golden frame. (EX2001, ¶17). As a result, as illustrated below, this information would not need to be included in the residual, and compression of the video using a golden frame is more efficient than using merely the adjacent frames for prediction. (EX2001, ¶17).
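
The efficiency point can be checked with a deliberately tiny one-row example. In this hypothetical Python sketch, the “ball” is motion-compensated from the previous frame in both cases; the only difference is whether the background is predicted from the previous displayed frame or from a background-only golden frame. The pixel values, the two-pixel ball, and the use of a count of nonzero residual samples as a proxy for cost are illustrative assumptions, not a model of any actual codec.

```python
BACKGROUND = [10, 20, 30, 40, 50, 60, 70, 80]  # hypothetical static landscape (one row)
BALL = [99, 99]                                # hypothetical two-pixel "ball"


def render(ball_pos):
    """Draw one displayed frame: the background with the ball on top."""
    frame = BACKGROUND[:]
    frame[ball_pos:ball_pos + len(BALL)] = BALL
    return frame


def predict(background_ref, prev_frame, old_pos, new_pos):
    """Copy the background from `background_ref`, then move the ball block
    from the previous frame to its new position (motion compensation)."""
    pred = background_ref[:]
    pred[new_pos:new_pos + len(BALL)] = prev_frame[old_pos:old_pos + len(BALL)]
    return pred


def nonzero_residual(current, prediction):
    """Number of samples the residual would have to correct."""
    return sum(1 for c, p in zip(current, prediction) if c != p)


prev, cur = render(1), render(3)  # the ball moves two pixels to the right
golden = BACKGROUND[:]            # background only, no ball: never displayed

with_prev = nonzero_residual(cur, predict(prev, prev, 1, 3))
with_golden = nonzero_residual(cur, predict(golden, prev, 1, 3))
print(with_prev, with_golden)  # 2 0
```

With the previous frame as the background source, the two uncovered background samples must be sent in the residual; with the golden frame, they are already available to the decoder.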
`
`
But although this “golden frame” is created for use as a reference picture, it is not available for inclusion as a video frame in the displayed video itself. (EX2001, ¶18). Continuing the example from above, in the displayed video the ball is always present in the foreground; in contrast, the golden frame contains only background and cannot be displayed as part of the video because it is missing the required ball image. (EX2001, ¶18). To this end, when an encoder transmits a golden frame to a decoder for use as a reference picture, the golden frame is designated as not available for display in the output video. (EX2001, ¶18).
`
Because this golden frame is created solely for purposes of serving as a reference picture and is not available for display as a video frame of the video, it is a virtual reference picture. (EX2001, ¶19).
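
The distinction drawn here, a picture that is available as a reference but designated as not available for display, can be sketched as a per-picture flag that a decoder consults before output. The `show` field and the decoder loop below are hypothetical names used only for illustration; they are not taken from the ’724 Patent, EX2001, or any codec specification.

```python
from dataclasses import dataclass


@dataclass
class DecodedPicture:
    name: str
    samples: list
    show: bool  # hypothetical flag: may this picture be output for display?


def run_decoder(pictures):
    """Every decoded picture is stored for reference; only pictures marked
    `show` are appended to the output video."""
    reference_buffer, output_video = [], []
    for pic in pictures:
        reference_buffer.append(pic.name)
        if pic.show:
            output_video.append(pic.name)
    return reference_buffer, output_video


stream = [
    DecodedPicture("golden", [10, 20, 30, 40], show=False),  # background-only reference
    DecodedPicture("frame1", [10, 99, 99, 40], show=True),
    DecodedPicture("frame2", [10, 20, 99, 99], show=True),
]
refs, shown = run_decoder(stream)
print(refs)   # ['golden', 'frame1', 'frame2']
print(shown)  # ['frame1', 'frame2']
```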
`
`
b. Detail-Obscured Frames

A detail-obscured (or filtered) reference frame is another example of a virtual reference picture and is the subject of the “Zhang” article co-authored by Dr. Stevenson and relied on in the Petition to explain virtual reference pictures. (Pet. at 7-8; EX1008; EX2001, ¶20).
`
As Dr. Stevenson explained in the Zhang article, certain transmission methods (such as certain wireless transmissions) may be prone to introducing errors into the compressed video file. (EX1008, 896; EX2001, ¶21). And because compression using motion compensation encodes video information as changes or differences between frames, small errors in a video sequence can lead to very large and noticeable errors in the decoded output video. (EX1008, 896; EX2001, ¶21).
`
As described in his article, Dr. Stevenson investigated using virtual reference pictures to address this problem, specifically the use of a detail-obscured or filtered frame as a virtual reference frame in order to reduce the impact of errors in the compressed data. (EX1008, 897-898; EX2001, ¶22). To create one example of a virtual reference picture, Dr. Stevenson describes decoding several actual frames of the transmitted video, then applying a filter to those frames, which would in effect “obscure details” of the content of the frames. (EX1008, 897-898; EX2001, ¶22). The resulting detail-obscured frame is used as a virtual reference to predict the next frame to be decoded. (EX1008, 897-898; EX2001, ¶22). The virtual reference picture appears to include undesirable artifacts and is not designed or available for viewing in a video, unlike non-virtual reference frames, which are video frames designed and available for display. (EX2001, ¶22).
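
The general recipe described above (decode several frames, then filter them) can be sketched in Python as follows. Averaging the decoded frames and then applying a simple box blur are assumptions chosen for illustration; this is not the filter specified in EX1008, and the single-row “frames” are hypothetical.

```python
def average_frames(frames):
    """Sample-wise mean of several previously decoded frames."""
    n = len(frames)
    return [sum(values) / n for values in zip(*frames)]


def box_blur(row, radius=1):
    """Replace each sample with the mean of its neighborhood (window clipped at the edges)."""
    out = []
    for i in range(len(row)):
        window = row[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out


def detail_obscured_reference(decoded_frames, radius=1):
    """Build a filtered picture intended only for prediction, not for display."""
    return box_blur(average_frames(decoded_frames), radius)


decoded = [[0, 0, 120, 0, 0],
           [0, 0, 60, 0, 0],   # hypothetical frame with a corrupted peak
           [0, 0, 90, 0, 0]]
ref = detail_obscured_reference(decoded)
print(ref)  # [0.0, 30.0, 30.0, 30.0, 0.0]
```

The sharp peak is spread across its neighbors, which illustrates why such a picture serves as a prediction source but would not look like a normal frame of the video.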
`
B. Synthesized Pictures

The Petition’s Xin reference does not concern or describe the use of virtual reference pictures, but instead concerns a different and distinct concept, synthesized pictures. (EX2001, ¶23). Synthesized pictures (and synthesized pictures that are used as reference pictures) are created so that they may be displayed as part of the video, specifically, a multi-view video. (EX2001, ¶23).
`
`1. Multi-View Video
Multi-view video refers to video sequences captured simultaneously from multiple camera angles and represented in a single video stream. (EX2003, 99; EX1003, [0003]-[0005], [0036], [0067]-[0068]; EX2001, ¶24). Each camera angle or view can be considered a separate single-view video. (EX2003, 99; EX1003, [0003]-[0005]; EX2001, ¶24). In other words, multi-view video is a collection of video formed from two or more camera views. (EX2003, 99; EX1003, [0003]-[0005]; EX2001, ¶24).
`
Consider, for example, the figure below from Xin illustrating four cameras pointed toward a common scene. Each camera generates a video from its respective viewing position and angle. (EX1003, [0003]-[0005]; EX2001, ¶25; See also, EX2003, 105-107). The video from each camera, or view, can be encoded, and the collection of encoded video is called a multi-view video. (EX1003, [0003]-[0005]; EX2001, ¶25; See also, EX2003, 99, 105-107).
`
Multi-view video can also be used for free viewpoint television (FTV), which is a system for allowing the user to interactively control the viewpoint and generate new views of a dynamic scene from any 3D position. (EX2003, 105-107; EX1003, [0003]; EX2001, ¶26).
`
2. “Synthesized” Picture Frames Are Created To Provide Additional Viewpoints For Multi-View Videos

A synthesized picture frame is a picture frame created to provide an additional camera angle for a multi-view video. In other words, instead of limiting the available views and angles for a multi-view video to those created by actual cameras, synthesized picture frames are created to provide additional angles and views.4 (EX1003, [0036], [0067]; EX2001, ¶27).
`
Xin illustrates the synthesis of new views in the figure below. Two videos (or views) are acquired by two cameras 2 and 3 at respective viewing locations and angles. (EX1003, [0067]; EX2001, ¶28; See also, EX2003, 99, 105-107). A third video can be synthesized from the two acquired videos by, for example, interpolation, to create a video appearing to be from the view of (non-existent) camera 800. (EX1003, [0067]; EX2001, ¶28; See also, EX2003, 99, 105-107). Although this synthesized video was not acquired from a camera, the synthesized video, along with the two acquired videos, may be shown as part of the multi-view video. (EX1003, [0003]-[0005], [0036], [0067]-[0068]; EX2001, ¶28; See also, EX2003, 99, 105-107).

4 These additional angles and views correspond to a non-existent camera, sometimes referred to as a virtual camera, and should not be confused with a virtual reference picture, which is a different concept entirely. (EX2001, ¶27).
`
As mentioned, synthesizing can be performed by interpolation. (EX1003, [0068]; EX2003, 99, 105-107; EX2001, ¶29). For example, a synthesized picture view residing between a first view and a second view can be generated by interpolating the pixel values of the first view and the second view for each frame. (EX1003, [0068]; EX2003, 99, 105-107; EX2001, ¶29). This interpolation can generate synthesized video picture frames that appear to a viewer to have been acquired by a camera positioned between the camera of the first view and the camera of the second view. (EX1003, [0067]-[0068]; EX2003, 99, 105-107; EX2001, ¶29).
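
A minimal Python sketch of the pixel-value interpolation described above follows. Plain weighted averaging of co-located pixels is a simplifying assumption that ignores, for example, differences in depth between objects in the scene. The camera names echo the figure discussion, but the frame data and the `alpha` weighting parameter are hypothetical.

```python
def interpolate_view(view_a, view_b, alpha):
    """Blend two co-timed camera frames sample by sample.
    alpha = 0 reproduces view_a, alpha = 1 reproduces view_b, and values in
    between approximate a viewpoint between the two cameras."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return [[(1 - alpha) * a + alpha * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(view_a, view_b)]


def synthesize_video(video_a, video_b, alpha=0.5):
    """Interpolate each pair of co-timed frames, producing a synthesized video."""
    return [interpolate_view(fa, fb, alpha) for fa, fb in zip(video_a, video_b)]


camera2 = [[[100, 200]], [[110, 210]]]  # hypothetical 2-frame video, 1x2 pixels per frame
camera3 = [[[300, 400]], [[310, 410]]]
camera800 = synthesize_video(camera2, camera3)
print(camera800)  # [[[200.0, 300.0]], [[210.0, 310.0]]]
```

The result is an ordinary sequence of frames, structured exactly like the two captured videos, so it can be presented as one more view of the multi-view video.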
`
3. Synthesized Reference Pictures Are Not Virtual Reference Pictures

Just as an ordinary picture frame (one created by an actual camera) may be used as a reference picture, so may a synthesized picture frame. (EX2001, ¶30). For example, just as an ordinary picture frame from a specific camera angle can be used as a reference picture to predict other picture frames from that camera, a synthesized picture frame can be used to predict the contents of other synthesized picture frames. (EX2001, ¶30). As Xin describes: “The synthesized frames serve as reference pictures from which a current synthesized frame can be predicted.” (EX1003, [0071]).
`
But critically, a “synthesized reference picture” is not a “virtual reference picture.” (EX2001, ¶31). Whereas a synthesized reference picture is itself a synthesized picture frame created and designed so that it may be viewed as part of the available multi-view video, a virtual reference picture is not created and designed to be viewed. (EX2001, ¶31). Unlike a synthesized reference picture, a virtual reference picture is created for the sole purpose of serving as a