`audiences
`
`Alan Lippman
`
`Alan Lippman, "Video coding for multiple target audiences," Proc. SPIE 3653,
`Visual Communications and Image Processing '99, (28 December 1998); doi:
`10.1117/12.334729
`Event: Electronic Imaging '99, 1999, San Jose, CA, United States
`
`
`Downloaded From: https://www.spiedigitallibrary.org/conference-proceedings-of-spie on 2/12/2019 Terms of Use: https://www.spiedigitallibrary.org/terms-of-use
`
`PROCEEDINGS OF SPIE
`
`SPIEDigitalLibrary.org/conference-proceedings-of-spie
`
`Video Coding for Multiple Target Audiences
`
`Alan Lippman, RealNetworks
`
`Abstract
`We explain some of the mechanisms used by the SureStream™ method in RealSystem™ G2
`software for streaming video over the Internet. We focus on the dynamic behavior of the system under
`changing Internet bandwidth conditions. Our approach measures available bandwidth and switches between
separate (non-layered) video encodings to match the channel capacity. The choice of bitrates, appropriate
`rate control methods, and details of switching between each bitrate will be the main topics of this paper.
`Our goal is to present one approach to this problem and the rationale behind some of the decisions we
`made, in the hopes of encouraging progress in the development of the best possible video streaming
`experience over the Internet. To avoid complexity we have left out discussion of audio and the interaction
`between audio streaming and video streaming.
`
`Introduction
Streaming video is the term commonly used for one-way, packet-based transmission of video over
the Internet. Packets, typically no larger than one thousand bytes, are sent from our server to the
player at a steady rate. The player collects a minimum number of packets (specified by a pre-roll
amount) and then starts to decode the packets and combine the decoded results into a multimedia
presentation. Once the presentation begins, it should continue seamlessly until the player is stopped or
the content finishes. To accomplish this, the system must be robust both to missing packets (including
packets that arrive too late to be used) and to changing bandwidth conditions.
`
The path between a server sending out packets and a player receiving them has several
statistics relevant to this paper. The first is the maximum sustainable bandwidth; the second is the amount
of packet loss. Up to a point these are independent statistics, although as the bandwidth that a
server throws at the player increases, so does the potential for bottlenecks along the path. For the
purposes of this discussion we will use the following model. First, we possess an estimate of the channel
capacity. Second, packet loss is constant up to the estimated channel capacity, and packet losses
are uncorrelated. (Although packet loss is correlated in regular UDP transmission, G2 uses a packet resend
method that effectively eliminates this correlation.) Our approach to maximizing the quality of the streamed
video experience is a balancing act between maximizing bandwidth usage and not creating
excessive loss by exceeding the estimated channel capacity.
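The channel model above can be made concrete with a small sketch. It is only an illustration: the function names are hypothetical, and the rule that everything sent above the estimated capacity is dropped is an assumption added here, not something the paper states.

```python
def delivered_rate(send_kbps: float, capacity_kbps: float, base_loss: float) -> float:
    """Rate (kbps) that reaches the player under the simplified channel model."""
    if not 0.0 <= base_loss < 1.0:
        raise ValueError("base_loss must be in [0, 1)")
    # Up to the estimated capacity, loss is a constant fraction of what is sent.
    # Assumption (not from the paper): traffic above capacity is simply dropped.
    carried = min(send_kbps, capacity_kbps)
    return carried * (1.0 - base_loss)


def loss_fraction(send_kbps: float, capacity_kbps: float, base_loss: float) -> float:
    """Fraction of sent data that never arrives."""
    if send_kbps <= 0:
        return 0.0
    return 1.0 - delivered_rate(send_kbps, capacity_kbps, base_loss) / send_kbps
```

Under these assumptions, sending below capacity leaves the loss fraction at its base level, while sending above capacity delivers nothing extra and only raises the loss fraction. That is the balancing act described above.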
`
The rest of this paper is organized as follows. We first discuss rate control from a content
creator's perspective; in essence we will try to describe what we would like the video to look like at each
bandwidth. We then describe encoding issues and the mechanisms we use to dynamically adjust the
streamed video to the available bandwidth.
`
`Rate Control
Rate control governs the trade-off between temporal resolution (more frames being higher
resolution) and spatial resolution (sharper images being higher resolution), and has a strong effect on the
subjective quality of coded video. In the RealProducer™ we provide the content creator a choice of three
modes for rate control: sharpest image, normal motion and smooth motion. In sharpest image mode we encode
fewer frames, resulting in sharper images; in smooth motion mode we encode many frames, although the individual
frames may be blurrier. We designed normal motion mode as a compromise that is robust across a variety of
content.
`
As bitrate increases, it is our goal to maintain the compromise between spatial and temporal
quality that the content creator specified. This is accomplished by increasing spatial and temporal quality in
the proportions determined by the smooth/normal/sharp setting. Note that once every frame is being
encoded, the only place left to spend bits is on spatial quality; for high-action
video, however, this may not occur until very high bitrates.
`
`Part of the IS&T/SPIE Conference on Visual Communications
and Image Processing '99 • San Jose, California • January 1999
`SPIE Vol. 3653 • 0277-786X/98/$10.00
`
In addition to obeying author controls, RealSystem software has two further complications in
its rate control. The first results from our desire to exploit the additional flexibility we have as a one-way,
higher-delay transmission system. This has several positive results: packet sizes can be increased,
leading to lower overhead for packet headers, and quality can be kept more constant by coding higher-action
sequences at higher than the allowed bitrate, provided that the content also contains lower-action
sequences which can be coded at a lower bitrate.
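The following sketch shows one way to check when coding some sequences above the allowed bitrate is safe. It is not the RealSystem rate control: the function name, the one-second granularity, and the conservative rule that all data for a second of content must arrive before that second starts playing are assumptions made for illustration.

```python
from itertools import accumulate


def fits_channel(kbits_per_second, channel_kbps, preroll_s):
    """Return True if the player never runs out of data.

    kbits_per_second[i] is the encoded size (kbit) of content second i.
    Data arrives at a constant channel_kbps from time 0, and playback
    starts after preroll_s seconds of buffering.
    """
    for i, needed in enumerate(accumulate(kbits_per_second)):
        # Content second i starts playing at time preroll_s + i.
        arrived = channel_kbps * (preroll_s + i)
        if needed > arrived:
            return False
    return True
```

For example, on a 20 kbps channel with 4 seconds of pre-roll, the schedule `[30, 30, 10, 10, 20, 20]` passes the check because the low-action seconds repay the early overshoot, whereas a sustained 30 kbit per second eventually drains the buffer.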
`
The second complication arises from the introduction of forced key frames. Since these frames do
not require any previous information, a video presentation can always begin at a key frame. Key frames
allow users to join live content midstream. Similarly, for pre-recorded content, we use key frames to
provide our users with the ability to seek ahead (or behind) in the content. A third benefit of
key frames is that they periodically refresh the video in case any loss has occurred in previous segments. Finally,
the ability to start playing a stream at any of the regularly occurring key frames is how we accomplish
bandwidth scalability: the player will switch to a lower or higher bandwidth stream when conditions are
appropriate.
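As a rough illustration of switching at key frames, the sketch below picks a stream for a given bandwidth estimate and finds the next key frame at which the switch could take effect. The selection rule (highest bitrate that fits, otherwise the lowest stream) and both function names are assumptions; the paper does not specify the actual decision logic.

```python
import bisect


def pick_stream(stream_kbps, bandwidth_kbps):
    """Highest-bitrate stream that fits the estimate, else the lowest stream."""
    fitting = [rate for rate in stream_kbps if rate <= bandwidth_kbps]
    return max(fitting) if fitting else min(stream_kbps)


def next_switch_point(keyframe_times_s, now_s):
    """First key frame time at or after now_s, or None if none remains.

    keyframe_times_s must be sorted in ascending order.
    """
    i = bisect.bisect_left(keyframe_times_s, now_s)
    return keyframe_times_s[i] if i < len(keyframe_times_s) else None
```

A player following this sketch would keep decoding its current stream until the returned key frame time and begin decoding the newly selected stream from that key frame onward.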
`
`SureStream™
Content creators producing SureStream content first select which audiences are going to be
watching their content. Available target audiences are 28.8k modem, 56k modem, Single ISDN, Dual
ISDN, Corporate LAN, and xDSL/Cable Modem. For each target audience there is an associated target bitrate
for the combined audio and video stream. The default bitrates corresponding to the above audiences are
20 kbps, 34 kbps, 45 kbps, 80 kbps, 150 kbps, 218 kbps and 436 kbps. If only a single target audience is chosen,
three separate video streams are created: one at the target bitrate, one at approximately 75% of the target
rate, and one that contains only key frames at no more than 50% of the target rate. The video coded at
75% of the target bitrate provides an alternate video experience for those suffering from a bad Internet
connection. The keyframe-only stream is meant for users experiencing a very bad connection; in addition to
its low bitrate, the server will subsample the keyframe-only stream to meet the available bandwidth, so
that no matter what the bandwidth, at least the occasional frame will get through. In the event that multiple
target audiences are selected, the RealProducer will attempt to nest the bitrates created so that the fallback
stream for one target audience is the target bitrate for another.
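The stream selection described above can be sketched as follows. The 75% and 50% factors come from the text; the function names are hypothetical, and the nesting rule (reuse the next lower selected target as the fallback) is only one plausible reading of "nest the bitrates", not a description of what the RealProducer actually does.

```python
def single_audience_streams(target_kbps):
    """Streams created when only one target audience is selected."""
    return {
        "target": target_kbps,
        "fallback": 0.75 * target_kbps,           # "approximately 75%"
        "keyframe_only_max": 0.50 * target_kbps,  # upper bound, "no more than 50%"
    }


def nested_ladder(targets_kbps):
    """(target, fallback) pairs when several audiences are selected.

    Assumed rule: the fallback for each target is the next lower selected
    target; the lowest target falls back to 75% of its own rate.
    """
    targets = sorted(set(targets_kbps))
    ladder = []
    for i, target in enumerate(targets):
        fallback = targets[i - 1] if i > 0 else 0.75 * target
        ladder.append((target, fallback))
    return ladder
```

With the default 28.8k modem target of 20 kbps, the single-audience case gives a 15 kbps fallback and a keyframe-only ceiling of 10 kbps.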
`
Switching between video encodings produced at different bitrates introduces several new
complications. One is that we don't want the switch to be a distracting or unpleasant experience. Our
solution has been to use rate control schemes that increase both the frame rate and the clarity as
bitrates increase, so that the improvement in the appearance of the content from one bitrate to the next
is more gradual. Another complication is maintaining the proper amount of pre-roll data in the player. As
mentioned above, our rate control algorithm is more flexible at each bitrate to better exploit the high-delay
nature of our one-way system; switching between streams therefore requires some coordination between the
separate encodings so that the presentation as a whole doesn't violate the channel limitations.
`
`Examples
We present some aggregate statistics for two standard MPEG clips, "Foreman" and "Grandma".
They were created with the "normal motion" and "voice only" settings of the RealProducer. The Buffer
Time is the amount of time that the player would need to wait (at the bandwidth specified by the target total
bitrate) before a presentation could begin. The reader should notice the wide variation in frame rates between
the clips at different bandwidths. The reader should also notice that our system requires six separate
encodings to cover the bulk of the Internet audience. The CPU requirements of the encoding system are light
enough to easily enable real-time encoding of QCIF video and audio on a P2 233 MHz system.
`
For those interested in experimenting further, we encourage the download of our freely available
RealProducer and RealPlayer™ from www.real.com.
`
Target Audience           28.8 Modem   28.8 Modem   28.8 Modem   56k Modem    Single ISDN  Dual ISDN
                          High Duress  Duress

Target Total Bitrate      12 kbps      15 kbps      20 kbps      34 kbps      45 kbps      80 kbps
Target Video Bitrate      7 kbps       10 kbps      15 kbps      29 kbps      36.5 kbps    71.5 kbps

Video Bitrate - Foreman   10.4 kbps    10.6 kbps    15 kbps      32.7 kbps    39.7 kbps    72.5 kbps
Buffer Time - Foreman     5.6 secs     4.3 secs     3.8 secs     3.4 secs     2.6 secs     1.7 secs
Average fps - Foreman     0.2 fps      2.7 fps      4.2 fps      10.6 fps     12.5 fps     14.6 fps

Video Bitrate - Grandma   7.5 kbps     10.5 kbps    15.9 kbps    30.6 kbps    37.9 kbps    74.5 kbps
Buffer Time - Grandma     5.8 secs     4.8 secs     4.9 secs     3.3 secs     2.9 secs     2.7 secs
Average fps - Grandma     0.1 fps      10.5 fps     14 fps       15 fps       15 fps       15 fps
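
Since the Buffer Time is defined as a wait at the target total bitrate, it can be converted into an amount of pre-roll data with simple arithmetic. The sketch below does this conversion; the function names are hypothetical, and it assumes 1 kbps = 1000 bits per second and packets of exactly 1000 bytes (the Introduction only says packets are typically no bigger than that).

```python
import math


def preroll_kbits(buffer_time_s, total_kbps):
    """Data (kbit) received while waiting buffer_time_s at total_kbps."""
    return buffer_time_s * total_kbps


def preroll_packets(buffer_time_s, total_kbps, packet_bytes=1000):
    """Whole packets needed to hold the pre-roll data.

    Assumes 1 kbit = 1000 bits and a fixed packet payload size.
    """
    bits = preroll_kbits(buffer_time_s, total_kbps) * 1000
    return math.ceil(bits / (8 * packet_bytes))
```

For the "Foreman" high duress stream, 5.6 seconds at 12 kbps works out to about 67.2 kbit, or roughly 8.4 kilobytes of pre-roll data.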
`
`Conclusions
We have a strong belief in creating easily scalable content and a dynamic player/server
combination that maximizes the quality of a streaming multimedia experience regardless of the vagaries of
a user's Internet connection. We have presented here some of the issues we encountered. We plan to
continue our work in this direction with the goal of seamlessly providing the best possible
experience available to each individual user, across a wide spectrum of users, machines, and connections.
`