`An objective video quality assessment system based on human perception
`
Arthur A. Webster, Coleen T. Jones, Margaret H. Pinson,
Stephen D. Voran, Stephen Wolf

Institute for Telecommunication Sciences
National Telecommunications and Information Administration
325 Broadway, Boulder, CO 80303

ABSTRACT
`
The Institute for Telecommunication Sciences (ITS) has developed an objective video quality assessment system that emulates human perception. The system returns results that agree closely with quality judgements made by a large panel of viewers. Such a system is valuable because it provides broadcasters, video engineers and standards organizations with the capability for making meaningful video quality evaluations without convening viewer panels. The issue is timely because compressed digital video systems present new quality measurement questions that are largely unanswered.
`
The perception-based system was developed and tested for a broad range of scenes and video technologies. The 36 test scenes contained widely varying amounts of spatial and temporal information. The 27 impairments included digital video compression systems operating at line rates from 56 kbits/sec to 45 Mbits/sec with controlled error rates, NTSC encode/decode cycles, VHS and S-VHS record/play cycles, and VHF transmission. Subjective viewer ratings of the video quality were gathered in the ITS subjective viewing laboratory that conforms to CCIR Recommendation 500-3. Objective measures of video quality were extracted from the digitally sampled video. These objective measurements are designed to quantify the spatial and temporal distortions perceived by the viewer.
`
This paper presents the following: a detailed description of several of the best ITS objective measurements, a perception-based model that predicts subjective ratings from these objective measurements, and a demonstration of the correlation between the model's predictions and viewer panel ratings. A personal computer-based system is being developed that will implement these objective video quality measurements in real time. These video quality measures are being considered for inclusion in the Digital Video Teleconferencing Performance Standard by the American National Standards Institute (ANSI) Accredited Standards Committee T1, Working Group T1A1.5.

1. INTRODUCTION
`
The need to measure video quality arises in the development of video equipment and in the delivery and storage of video and image information. Although the work described in this paper is concerned specifically with NTSC video (the distribution television standard in the United States), the principles presented can be applied to other types of motion video and even still images. The methods of video quality assessment can be divided into two main categories: subjective assessment (which uses human viewers) and objective assessment (which is accomplished by use of electrical measurements). While we believe that assessment of video quality is best accomplished by the human visual system, it is useful to have objective methods available which are repeatable, can be standardized, and can be performed quickly and easily with portable equipment. These objective methods should give results that correlate closely with results obtained through human perception.

Objective measurement of video quality was accomplished in the past through the use of static video test scenes such as resolution charts, color bars, multi-burst patterns, etc., and by measuring the signal to noise ratio of the video signal.1 These objective methods address the spatial and color aspects of the video imagery as well as overall signal distortions present in traditional analog systems. With the development of digital compression technology, a large number of new video services have become available. The savings in transmission and/or storage bandwidth made possible with digital compression technology depends upon the amount of information present in the original (uncompressed) video signal, as well as how much quality the user is willing to sacrifice. Impairments may result when the information present in the video signal is larger than the transmission channel capacity. However, users may be willing to sacrifice quality to achieve a substantial reduction in transmission and
`
`Page 1 of 12
`
`MINDGEEK EXHIBIT 1013
`
`
`
storage costs. But, how much quality is sacrificed for how much cost savings? We propose a set of measurements that offers a way to begin to answer this question. New impairments can be present in digitally compressed video and these impairments include both spatial and temporal artifacts.2 The old objective measurement techniques are not adequate to assess the impact on quality of these new artifacts.3
`
After some investigation of compressed video, it becomes clear that the perceived quality of the video after passing through a given digital compression system is often a function of the input scene. This is particularly true for low bit-rate systems. A scene with little motion and limited spatial detail (such as a head and shoulders shot of a newscaster) may be compressed to 384 kbits/sec and decompressed with relatively little distortion. Another scene (such as a football game) which contains a large amount of motion as well as spatial detail will appear quite distorted at the same bit rate. Therefore, we directed our efforts toward developing perception-based objective measurements which are extracted from the actual sampled video. These objective measurements quantify the perceived spatial and temporal distortions in a way that correlates as closely as possible with the response of a human visual system. Each scene was digitized (at 4 times sub-carrier frequency) to produce a time sequence of images sampled at 30 frames per second (in time) and 756 x 486 pixels (in space).

2. DEVELOPMENT METHODOLOGY
`
Figure 1 presents a graphical depiction of the development process for the ITS quality assessment algorithm. A set of video scene pairs (each consisting of the original and a degraded version) was used in a subjective test. These scene pairs were also processed on a computer that extracted a large number of features. Statistical analysis was used to select an optimal set of quality parameters (obtained from features) that correlated well with the viewing panel results. This optimal set of parameters was then used to develop a quality assessment algorithm that gives results that agree closely with viewing panel results.
`
[Figure 1 block diagram: a Library of Test Scenes supplies Original Video to Impairment Generators, which produce Degraded Video; Objective Testing yields Objective Test Results (Features) and Subjective Testing yields Viewing Panel Results; both feed Statistical Analysis, which produces the Parameters used by the Quality Assessment Algorithm.]

Figure 1. Development Process for Video Quality Assessment Algorithm
`
2.1 Library of test scenes

Several scenes, exhibiting various amounts of spatial and temporal information content, are needed to characterize the performance of a video system. Even more scenes are needed to guard against viewer boredom during the subjective testing. A set of 36 test scenes was chosen for the experiment. The test scenes spanned a wide range of user applications including still scenes, limited motion graphics, and full motion entertainment video.
`
`
`
`
2.2 Impairment generators

Twenty-seven video systems (plus the 'no impairment' system) were used to produce the degraded video that was used in the tests. The original video for this test was component analog video. The digital video systems included 11 video codecs (coder-decoders) from 7 manufacturers operating at bit rates from 56 kbits/sec to 45 Mbits/sec including bit error rates of 10^-6 and 10^-5. Also included were analog video systems such as VHS and S-VHS recording and playback, and noisy RF transmission. All video systems except the 'no impairment' system included NTSC encoding and decoding.
`
2.3 Objective testing

Both the original video and the degraded video were digitized and processed to extract a large number of features. The processing included Sobel filtering, Laplace filtering, fast Fourier transforms, first-order differencing, color distortion measurements4, and moment calculations. Typically, features were calculated from each original and degraded frame of the video sequence to produce time histories. Some features required the entire original and degraded video image (e.g., the variance of the error image calculated from the difference between the original and the degraded images). Other features required only the statistics of the original and degraded video images (e.g., the change in image energy obtained from the differences between the original and the degraded image variances). The time histories of the features were collapsed by various methods, e.g., maximum (MAX), root mean square (RMS), standard deviation (STD), etc., to produce a single scalar value (or parameter) for each test scene. These parameters defined the objective measurements and were used in the statistical analysis step shown in Figure 1.
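The collapsing step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation; it assumes NumPy and uses the collapsing-method names given in the text (MAX, RMS, STD).

```python
import numpy as np

def collapse_time_history(values, method):
    """Collapse a per-frame feature time history into a single scalar
    parameter, in the manner described in Section 2.3.  The method names
    mirror those in the text: maximum (MAX), root mean square (RMS),
    and standard deviation (STD)."""
    v = np.asarray(values, dtype=float)
    if method == "MAX":
        return float(np.max(v))
    if method == "RMS":
        return float(np.sqrt(np.mean(v ** 2)))
    if method == "STD":
        return float(np.std(v))
    raise ValueError(f"unknown collapsing method: {method}")
```

Each test scene's feature time history, collapsed this way, yields one candidate objective parameter for the statistical analysis step.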
`
2.4 Subjective testing

The subjective test was conducted in accordance with CCIR Recommendation 500-3.5 A panel of 48 viewers was selected from the U.S. Department of Commerce Laboratories phone book in Boulder, Colorado. Each viewer completed four viewing sessions during a single week, attending one session per day. Each session lasted approximately 25 minutes and required viewing of 38 or 40 30-second test clips. A clip is defined as a test scene pair consisting of the original video and the degraded video. The viewer was first shown the original video for 9 seconds, followed by 3 seconds of grey, and then 9 seconds of the degraded video. Nine seconds were then allowed to rate the impairment on a 5-point scale before the next clip was presented. The viewer was asked to rate the difference between the original video and the degraded video as either (5) Imperceptible, (4) Perceptible but Not Annoying, (3) Slightly Annoying, (2) Annoying, or (1) Very Annoying. This scale covers a wide range of impairment levels and is specified as one of the standard scales in CCIR Recommendation 500-3. Impairment testing was used since we were interested in measuring the change in video quality due to a video system. A mean opinion score was generated by averaging the viewer ratings.
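The rating scale and mean opinion score computation can be sketched as below; a simple illustration of the averaging step, not the paper's own software.

```python
# The five-grade impairment scale from CCIR Rec. 500-3 used in the test.
IMPAIRMENT_SCALE = {
    5: "Imperceptible",
    4: "Perceptible but Not Annoying",
    3: "Slightly Annoying",
    2: "Annoying",
    1: "Very Annoying",
}

def mean_opinion_score(ratings):
    """Average a panel's integer ratings (1-5) for one clip into a
    mean opinion score."""
    ratings = list(ratings)
    if any(r not in IMPAIRMENT_SCALE for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    return sum(ratings) / len(ratings)
```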
`
The selection of 158 clips used in the test (out of 972 clips available) was made both deterministically and randomly. Random selections were made from a distribution table that paired video teleconferencing systems with more video teleconferencing scenes than entertainment scenes, and entertainment systems with more entertainment scenes than video teleconferencing scenes. The viewers rated 132 unique clips from the 158 actually viewed because some were used for training and consistency checks.
`
2.5 Statistical analysis and quality assessment system

This stage of the development process utilized joint statistical analysis of the subjective and objective data sets. This step identifies a subset of the candidate objective measurements that provides useful and unique video quality information. The best measurement was selected by exhaustive search. Additional measurements were selected to reduce the remaining objective-subjective error by the largest amount. Selected measurements complement each other. For instance, a temporal distortion measure was selected to reduce the objective-subjective error remaining from a previous selection of a spatial distortion measure. When combined in a simple linear model, this subset of measurements provides predicted scores that correlate well with the true scores obtained in the subjective tests. In constructing the linear model we looked for p measurements {mi} and p + 1 constants {ci} that allowed us to estimate the subjective mean opinion score. The estimated subjective mean opinion score is
`
`
`
`
given by

    ŝ = c0 + Σ_{i=1}^{p} ci mi ,    (1)

where s is the true subjective mean opinion score and ŝ is the estimated score.
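The linear model of Equation (1) amounts to a weighted sum of the selected objective measurements plus an offset. A minimal sketch (the constants ci would come from the regression against the subjective data; the example values below are purely illustrative):

```python
def predict_score(m, c):
    """Estimate the subjective mean opinion score s-hat from p objective
    measurements m = [m1, ..., mp] and p + 1 constants c = [c0, c1, ..., cp],
    following the linear model of Equation (1)."""
    if len(c) != len(m) + 1:
        raise ValueError("need p + 1 constants for p measurements")
    return c[0] + sum(ci * mi for ci, mi in zip(c[1:], m))

# Illustrative only: two measurements, made-up constants.
s_hat = predict_score([1.0, 2.0], [0.5, 2.0, 3.0])  # 0.5 + 2.0 + 6.0 = 8.5
```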
`
3. RESULTS

For the results presented here, three complementary video quality measurements (p = 3) were selected. These three complementary measures (m1, m2, and m3) have been used to explain most of the variance in subjective video quality that resulted from the impairments used in this experiment. The investigations and research that produced the m1, m2, and m3 video quality metrics also provided insight into how the human perceives the spatial and temporal information of a video scene.
`
3.1 Spatial and temporal information features

The difficulty in compressing a given video sequence depends upon the perceived spatial and temporal information present in that video sequence. Perceived spatial information is the amount of spatial detail in the video scene that is perceived by the viewer. Likewise, perceived temporal information is the amount of perceived motion in the video scene. Thus, it would be useful to have approximate measures of perceived spatial and temporal information. These information measures could be used to select test scenes that appropriately stress the video compression system being designed or tested. Two different test scenes with the same spatial and temporal information should produce similar perceived quality at the output of the transmission channel. Measures of distortion could also be obtained by comparing the perceived information content of the video before and after passing through a video system. Although it is recognized that spatial and temporal aspects of vision perception cannot be completely separated from each other, we have found spatial and temporal features that correlate with human quality perception of spatial detail and motion. Both of these features require pixel differencing operations, which seem to be basic attributes of the human visual system. The spatial information (SI) feature differences pixels across space while the temporal information (TI) feature differences pixels across time. Here, both the SI and TI features have been applied to the luminance portion of the video.
`
`
`
3.1.1 Spatial information (SI)

The spatial information feature is based on the Sobel filter.6 At time n, the video frame Fn is filtered with the Sobel operators. The standard deviation over the pixels in each Sobel-filtered frame is then computed. This operation is repeated for each frame in the video sequence and results in a time series of spatial information values. Thus, the spatial information feature, SI[Fn], is given by
`
`[
`]
`
`S= TDspace Sobel Fn{ [
`]
`}
`
`,
`SI Fn
`SI[F,J= STD space { Sobel[F,J} , (cid:9)
`
`(2)
`(2)
`
where STDspace is the standard deviation operator over the horizontal and vertical spatial dimensions in a frame, and Fn is the nth frame in the video sequence. Figure 2 shows a time sequence of 3 contiguous video frames for an original scene (top row) and a degraded version of that scene (second row). These images were sampled at the NTSC frame rate of approximately 30 frames per second. The degraded version of the scene was obtained from a 56 kbits/sec codec. The third row of Figure 2 shows the Sobel filtered version of the original scene and the fourth row shows the Sobel filtered version of the degraded scene. The highly localized, clearly focussed edges in the third row produce a large STDspace since the standard deviation is a measure of the spread in pixel values. On the other hand, the non-localized, blurred edges shown in the fourth row produce a smaller STDspace, demonstrating that spatial detail has been lost. This is particularly evident for the images in the third column.
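The SI computation of Equation (2) can be sketched as follows. This is an illustrative NumPy implementation, not the authors' code; for simplicity the Sobel magnitude is computed only over the frame interior, and border handling is left unspecified.

```python
import numpy as np

def sobel_magnitude(frame):
    """Apply the horizontal and vertical Sobel operators to a luminance
    frame and return the gradient magnitude image (frame interior only)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = frame.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    # Direct 3x3 neighborhood sum: output pixel (r, c) covers
    # frame[r:r+3, c:c+3].
    for i in range(3):
        for j in range(3):
            window = frame[i:i + h - 2, j:j + w - 2]
            gx += kx[i, j] * window
            gy += ky[i, j] * window
    return np.sqrt(gx ** 2 + gy ** 2)

def spatial_information(frame):
    """SI[Fn] = STDspace{Sobel[Fn]}, per Equation (2)."""
    return float(np.std(sobel_magnitude(frame)))
```

A flat frame gives SI of zero; a frame with sharp, localized edges gives a large SI, matching the behavior described for the third row of Figure 2.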
`
Figure 2. Video Processed to Demonstrate Perceived Spatial and Temporal Information
`
`
3.1.2 Temporal information (TI)

The temporal information feature is based upon the motion difference image, ΔFn, which is composed of the differences between pixel values at the same location in space but at successive times or frames. ΔFn, as a function of time n, is defined as
`
    ΔFn = Fn − Fn−1 .    (3)
`
The temporal information feature, TI[Fn], is defined as the standard deviation of ΔFn over the horizontal and vertical spatial dimensions, and is given by
`
`[
`]
`=
`[
`]
`D Fn
`.
`TI[ Fn ] = STDspace [ AFn] • (cid:9)
`TI Fn
`STDspace
`
`(4)
`(4)
`
More motion in adjacent frames will result in higher values of TI[Fn]. The fifth row of Figure 2 shows the ΔFn