`____________
`
`BEFORE THE PATENT TRIAL AND APPEAL BOARD
`____________
`
`APPLE INC.
`Petitioner
`
`v.
`
`GESTURE TECHNOLOGY PARTNERS LLC
`Patent Owner
`_________________
`
`Inter Partes Review Case No. IPR2021-00921
`U.S. Patent No. 8,878,949
`
`SUPPLEMENTAL DECLARATION OF DR. BENJAMIN B. BEDERSON
`
`IPR2021-00921
`Apple EX1018 Page 1
`
`
`
`
`I, Benjamin B. Bederson, hereby declare the following:
`
1.    My name is Benjamin B. Bederson, Ph.D., and I am over 21 years of age
`
`and otherwise competent to make this Declaration. I make this Declaration based on
`
`facts and matters within my own knowledge and on information provided to me by
`
`others.
`
`2.
`
`I submitted an initial declaration in support of Apple’s petition for Inter
`
`Partes Review of U.S. Patent No. 8,878,949 (“the ’949 Patent”). I understand the
`
`PTAB instituted the requested review and that the proceeding involves the full scope
`
`of the proposed grounds addressed in my initial declaration. I have been asked to
`
`address a few additional issues in response to Patent Owner’s Response (Paper 10)
`
`and Patent Owner’s expert’s declaration (Ex. 2002).
`
I.    Patent Owner identifies no technical barriers to implementing a
      combination of Numazaki’s gesture recognition and videoconference
      functionalities
`3.
`In my original declaration, I described the reflected light extraction unit
`
`of Numazaki’s first embodiment, and I explained why a PHOSITA would have
`
understood that both the third and fifth embodiments use it. To start, I described the
`
`reflected light extraction unit with reference to Numazaki’s Fig. 2, describing it as a
`
`“two-camera structure for detecting a user’s gestures.” Ex. 1003, ¶ 34. I proceeded
`
`to describe how the reflected light extraction unit relies on a timing control unit to
`
`turn lighting unit 101 on when the first camera 109 is active and off when the second
`
`
`
`2
`
`IPR2021-00921
`Apple EX1018 Page 2
`
`
`
`IPR2021-00921
`U.S. Patent No. 8,878,949
`camera unit 110 is active. The difference of these images is obtained by difference
`
`calculation unit 111 and used by feature data generation unit 103 to determine
`
gestures. Id. at ¶¶ 34-35 (citing Ex. 1004, Fig. 2, 11:9-39, 11:43-51, 10:57-66).
`
`
`
`Id. at ¶ 34 (citing Ex. 1004, Fig. 2).
`
`4.
`
`Next, I described Numazaki’s third embodiment and its use of the same
`
`feature data generation unit as the first embodiment to convert gestures into
`
`commands. Ex. 1003, ¶ 31. I detailed how this embodiment uses a shape memory
`
`unit 332 to identify when the user has performed a pre-registered gesture and
`
`instructs the device to implement a command corresponding to it, such as
`
`“instructing the power ON/OFF of the TV, the lighting equipment, etc.” Id. at ¶ 36
`
`(citing Ex. 1004 at 31:3-10).
`
`
`
`3
`
`IPR2021-00921
`Apple EX1018 Page 3
`
`
`
`IPR2021-00921
`U.S. Patent No. 8,878,949
5.    I then described how the fifth embodiment uses image capture for

videoconference applications with reference to Numazaki’s Fig. 46. This
`
`embodiment pairs the same reflected light extraction unit 102 of the first
`
`embodiment with a visible light photo-detection array 351, using these two imaging
`
`components to subtract all extraneous, background image information from the
`
captured video of the subject to arrive at image information containing only the subject.
`
`Ex. 1003, ¶ 38 (citing Ex. 1004, 39:6-16, 39:12-60, 40:32-35).
`
Id. (citing Ex. 1004, Fig. 46).
`
`6.
`
`Next, my original declaration stated several reasons why the third and
`
`fifth embodiments both rely upon the same reflected light extraction unit 102 from
`
`
`
`
`
`4
`
`IPR2021-00921
`Apple EX1018 Page 4
`
`
`
`IPR2021-00921
`U.S. Patent No. 8,878,949
`the first embodiment. Id. at ¶¶ 40-42. I then discussed Nonaka’s image capture
`
`gesture teachings (id. at ¶¶ 46-47) and explained why a PHOSITA would have been
`
`motivated by Nonaka to combine Numazaki’s third and fifth embodiments in its
`
`eighth embodiment laptop “such that a user could perform a gesture command
`
`(pursuant to its third embodiment) that causes video capture to initiate (pursuant to
`
its fifth embodiment).” Id. at ¶¶ 48-51.
`
`7.
`
`I understand Patent Owner raises an issue related to the proposed
`
`combination requiring the output of reflected light extraction unit 102 be processed
`
`by (1) the third embodiment’s feature data generation unit when detecting gestures
`
`and (2) the fifth embodiment’s feature data generation unit when implementing
`
`videoconference functionality. Paper 10, 12-16. Specifically, citing its expert, Patent
`
`Owner argues that the petition “does not explain how these specialized units would
`
`operate simultaneously or whether different units would operate at different times or
`
`what that timing functionality would require.” Paper 10, 16 (citing Ex. 2002, ¶ 58).
`
`8.
`
`Patent Owner and its expert misread the combination, which expressly
`
`defines the sequential nature in which the third and fifth embodiment units separately
`
`process unit 102’s output. The petition states “a PHOSITA would have been
`
`motivated to implement this gesture recognition as a means of allowing the user to
`
`initiate (or turn on) the fifth embodiment’s videoconferencing functionality.” Paper
`
`1, 31. The combination proposes the gesture detecting functionality of the third
`
`
`
`5
`
`IPR2021-00921
`Apple EX1018 Page 5
`
`
`
`IPR2021-00921
`U.S. Patent No. 8,878,949
embodiment is employed until an image capture gesture is detected. Then, the

system transitions into Numazaki’s fifth embodiment videoconferencing

functionality. Accordingly, the proposed combination utilizes the gesture detecting
`
`processing of the third embodiment and the videoconference processing of the fifth
`
`embodiment separately and sequentially. Indeed, the output of reflected light
`
`extraction unit 102 would be processed by the third embodiment’s gesture detecting
`
`block until an image capture gesture is detected and would then be processed by the
`
`fifth embodiment’s videoconferencing block. The output of reflected light extraction
`
`unit 102 would not be processed by both the third embodiment and the fifth
`
`embodiment blocks at the same time because the first acts as a trigger for the second
`
`to occur.
`
`9.
`
`Given the separate, but sequential processing tasks Numazaki assigns
`
`to these two embodiments, it would be well within the capabilities of a PHOSITA
`
to process the same output with two separate processing blocks to implement the
`
`proposed combination. A PHOSITA would understand there are no technical
`
`barriers to arranging multiple distinct processing units that separately process the
`
`same output of a single unit. Indeed, I understand neither Patent Owner nor its expert
`
`identify any technical barriers to arranging Numazaki’s features such that the output
`
`of reflected light extraction unit 102 is processed separately by (1) the third
`
`embodiment’s feature data generation unit when detecting gestures and (2) the fifth
`
`
`
`6
`
`IPR2021-00921
`Apple EX1018 Page 6
`
`
`
`IPR2021-00921
`U.S. Patent No. 8,878,949
`embodiment’s feature data generation unit when implementing videoconference
`
`functionality.
`
II.   Patent Owner ignores the benefit of initiating video capture from
      beyond the reach of Numazaki’s laptop
`10.
`I understand Patent Owner first takes issue with the Petition’s premise
`
`that Numazaki’s videoconference functionality “would be improved by allowing
`
`users to position themselves in place before the video camera and initiate video
`
`capture through a gesture, rather than a physical input or timer mechanism.” Paper
`
`10, 18-19 (quoting Paper 1, 31). It argues that a user of Numazaki’s videoconference
`
`system would be “within reach of Numazaki’s laptop before and during a
`
`videoconference” and would, accordingly, have no need for an image capture
`
`gesture to initiate image capture. Id. (emphasis added). Patent Owner’s expert argues
`
`that because a user has to physically interact with the laptop to dial a telephone
`
`number, that “user would already be positioned ‘in place’ for the videoconference”
`
`and would have no need for the proposed gesture initiation. Ex. 2002, ¶ 62. Mr.
`
`Occhiogrosso concludes that because the user must physically interact with the
`
`laptop before the videoconference begins, that user would never move and would
`
`remain “within reach of Numazaki’s laptop before and during a videoconference.”
`
`Id. at ¶ 63 (emphasis added). I disagree.
`
`11. A PHOSITA would have understood there are countless examples of
`
`videoconference scenarios in which the user would desire to not sit in front of the
`
`
`
`7
`
`IPR2021-00921
`Apple EX1018 Page 7
`
`
`
`IPR2021-00921
`U.S. Patent No. 8,878,949
`laptop and within arm’s reach. Take, for example, a lecture in which the lecturer is
`
`standing, rather than seated. This common scenario benefits from allowing the user
`
`to stand some distance from the camera such that they are fully captured in the
`
`camera’s field of view. A PHOSITA would have understood that capturing a
`
standing lecturer in this scenario would generally require the lecturer to stand

far enough from the camera that the user is not within reach of the camera and

thus unable to manually control it. Another example is a tutorial in which the speaker
`
`is demonstrating the use of a product that requires a broader field of view than is
`
provided by the laptop camera were the speaker seated before it. Indeed, the
`
`videoconference scenario Mr. Occhiogrosso envisions—a user seated before the
`
`laptop and able to physically control the laptop—is most commonly associated with
`
`the camera capturing only the user’s upper body (e.g., shoulders and head). This
`
`field of view is adequate for many videoconference applications, but it would be
`
`inadequate for many others. As I discuss above, many common scenarios remove
`
the speaker from close proximity to Numazaki’s laptop such that the user requires
`
`some ability to remotely initiate image capture. Indeed, Numazaki’s Fig. 48
`
`illustrates the fifth embodiment videoconference functionality and depicts a
`
`videoconference participant that appears to be removed from the camera by some
`
`distance:
`
`
`
`8
`
`IPR2021-00921
`Apple EX1018 Page 8
`
`
`
`IPR2021-00921
`U.S. Patent No. 8,878,949
`
`
`
`Ex. 1004, Fig. 48. A PHOSITA would have understood that the person illustrated in
`
`Fig. 48 is not seated before the camera. The participant is visible from the waist up,
`
`which is a larger field of view than is generally possible with a participant seated
`
`within arm’s reach of the camera. In sum, the proposed combination assumes the
`
`user needs to remotely trigger image capture, and a PHOSITA would have
`
`recognized there are numerous scenarios in which such remote trigger functionality
`
`is required. Accordingly, I disagree with Mr. Occhiogrosso that simply because a
`
`user may initially physically interact with the laptop to dial a number or otherwise
`
`
`
`9
`
`IPR2021-00921
`Apple EX1018 Page 9
`
`
`
`IPR2021-00921
`U.S. Patent No. 8,878,949
`prepare the laptop for a videoconference, that user would necessarily remain within
`
`arm’s reach when image capture is initiated.
`
`III. PATENT OWNER IGNORES THAT NONAKA EXPRESSLY
`TEACHES THAT GESTURES PROVIDE GREATER FLEXIBILITY
`THAN TIMERS
`12.
`I understand Patent Owner challenges whether gesture-based image
`
`capture initiation provides greater freedom than timers. Paper 10, 20-21. In my
`
`original declaration, I discussed the many known ways to remotely initiate image
`
`capture functionality, such as self-timer mechanisms. Ex. 1003, ¶ 49. I discussed
`
`Nonaka’s express teachings that gestures improve upon traditional timer-based
`
`remote initiation. Id. A PHOSITA would have agreed with Nonaka’s conclusion that
`
`gesture-based initiation is better in many scenarios and would have been motivated
`
`to implement Nonaka’s gesture-based solution with or in lieu of timers. As I
`
`explained in my initial declaration, the goal of initiating image capture from a remote
`
`position is “ensuring the user is able to get in position and prepared before the video
`
`capture begins.” Ex. 1003, ¶ 49. By initiating image capture with a gesture, a user
`
`retains complete control over image capture by conditioning it upon the user’s own
`
`preparation and positioning. Using a self-timer mechanism strips some level of
`
control from the user, forcing the user to predict the amount of time required to get
`
positioned and penalizing imprecise predictions. For example, were the
`
`user to set the timer for too short a period, image capture would initiate before the
`
`
`
`10
`
`IPR2021-00921
`Apple EX1018 Page 10
`
`
`
`IPR2021-00921
`U.S. Patent No. 8,878,949
`user is in position and ready to begin. To avoid this scenario, self-timers force the
`
`user to over-estimate the required time, leading to frustrating waiting periods in
`
`which the user is ready, but is forced to wait out the remainder of the timer period.
`
`Nonaka’s gesture-based initiation allows the user to take all the time necessary to
`
`get in position and permits initiating image capture as soon as the user is ready,
`
`avoiding the frustrating waiting period associated with timers.
`
`IV. NUMAZAKI’S UNIT 102 IS FIXED IN RELATION TO CAMERA 351
`13.
`I understand the Patent Owner challenges the Petition’s evidence that
`
`unit 102 is fixed in relation to camera 351 in the proposed combination, satisfying
`
`Claim 4. Paper 10, 23-24. The Petition set forth that Numazaki’s reflected light
`
`extraction unit 102 and digital camera 351 are arranged “side-by-side such that they
`
`have overlapping fields of view.” Paper 1, 38 (further citing Ex. 1004 at 39:4-44 for
`
`its teaching that they are “arranged in parallel”) (emphasis added). I understand
`
`Patent Owner and its expert focus on the “arranged in parallel” teaching, ignoring
`
`entirely the Petition’s argument that unit 102 and camera 351 have “overlapping
`
`fields of view.” Paper 10, 23-24; Ex. 2002, ¶¶ 70-71. Patent Owner and its expert
`
`are wrong to ignore this important aspect of the fifth embodiment. That it requires
`
`overlapping fields of view is key to concluding Numazaki’s unit 102 and camera
`
`351 are fixed in relation to each other.
`
`
`
`11
`
`IPR2021-00921
`Apple EX1018 Page 11
`
`
`
`IPR2021-00921
`U.S. Patent No. 8,878,949
14.   In my initial declaration, I referred to Fig. 46 of Numazaki to explain that

its fifth embodiment uses a two-camera reflected light extraction unit 102 in
`
`conjunction with a separate visible light photo-detection array 351:
`
`
`
`Ex. 1003, ¶ 52 (citing Ex. 1004, Fig. 46). I explained that the functionality of the
`
`fifth embodiment requires defining which portions of video captured by camera 351
`
are retained; therefore, a PHOSITA would have understood components 102 and 351
`
`“are both forward facing and have overlapping fields of view” because if they did
`
`not, “the output of the first unit could not be used to define which portions of the
`
`second unit’s output should be retained.” Id. In other words, to perform the basic
`
`function of the fifth embodiment, unit 102 and camera 351 must have and maintain
`
`
`
`12
`
`IPR2021-00921
`Apple EX1018 Page 12
`
`
`
`IPR2021-00921
`U.S. Patent No. 8,878,949
`overlapping fields of view. Were one to move in relation to the other, it would no
`
`longer be possible for the output of unit 102 to define which portions of the camera
`
`351 output should be retained.
`
`15.
`
`I understand Patent Owner’s expert admitted that unit 102 and camera
`
`351 must retain overlapping fields of view in order to “satisfy the intended purpose”
`
`of Numazaki’s fifth embodiment. Ex. 1019, 23:21-24:22. I understand Mr.
`
`Occhiogrosso also admitted that fixing unit 102 and camera 351 in relation to one
`
`another ensures they retain overlapping fields of view. Id. at 25:7-14. Finally, I
`
`understand Mr. Occhiogrosso confirmed that he was not aware of any teaching in
`
Numazaki that suggests unit 102 and camera 351 are not fixed in relation to one
`
`another. Id. at 25:18-26:2. Accordingly, because (1) the fifth embodiment requires
`
`unit 102 and camera 351 to retain overlapping fields of view, (2) fixing them retains
`
`overlapping fields of view, and (3) there is no suggestion that they are not fixed, a
`
`PHOSITA would have considered unit 102 and camera 351 fixed in relation to each
`
`other.
`
`V. CONCLUSION
`16.
`I declare that all statements made herein of my knowledge are true, and
`
`that all statements made on information and belief are believed to be true, and that
`
`these statements were made with the knowledge that willful false statements and the
`
`
`
`13
`
`IPR2021-00921
`Apple EX1018 Page 13
`
`
`
`IPR2021-00921
`U.S. Patent No. 8,878,949
`like so made are punishable by fine or imprisonment, or both, under Section 1001 of
`
`Title 18 of the United States Code.
`
Date:
`By: _______________________________
`Dr. Benjamin B. Bederson
`
`
`
`14
`
`
`
`
`
`
`IPR2021-00921
`Apple EX1018 Page 14
`
`