(12) Patent Application Publication
Kim et al.
(10) Pub. No.: US 2004/0172656 A1
(43) Pub. Date: Sep. 2, 2004
`
`(54) TWO-WAY AUDIO/VIDEO CONFERENCING
`SYSTEM
`
`(76) Inventors: Myong Gi Kim, Long Grove, IL (US);
`Arthur Yerkes, Chicago, IL (US)
`Correspondence Address:
`WELSH & KATZ, LTD
`120 S RIVERSIDE PLAZA
`22ND FLOOR
`CHICAGO, IL 60606 (US)
(21) Appl. No.: 10/758,883
(22) Filed: Jan. 16, 2004
Related U.S. Application Data
(63) Continuation-in-part of application No. 10/376,866, filed on Feb. 28, 2003.
`
`Publication Classification
`
(51) Int. Cl. ........ H04N 7/173; H04H 9/00; H04N 7/16
(52) U.S. Cl. ........ 725/109; 725/111; 725/112; 725/13
`
(57) ABSTRACT
A method and apparatus are provided for exchanging audio/visual information between a caller and a called party through the Internet. The method includes the steps of setting up a session link between the caller and called party using a tunneled real time control protocol and collecting audio and video information from the caller and called party. The method further includes the steps of forming the audio and video portions into data objects, attaching a time stamp to each formed data object and exchanging the formed audio and video data objects as real time packets using a transport control protocol between the caller and called party through the session link.
`Comcast, Ex. 1135
`
`
`
Patent Application Publication    Sep. 2, 2004    Sheet 1 of 15    US 2004/0172656 A1

[Sheet 1 of 15, FIG. 1: block diagram. Legible labels: Internet co-location site; SpeedCast audio server; company's remote office(s); SpeedCast audio server; firewall; SpeedCast audio encoder/server; company's office, bond floor. Legible reference numerals: 100, 108.]
`
[Sheet 2 of 15, FIG. 2: block diagram. Legible labels: Internet co-location site; SpeedCast audio server; laptop and desktop PCs, each with a SpeedCast audio player; wireless microphone; firewall; SpeedCast audio encoder/server; telephone; company's office, exchange floor. Legible reference numerals: 200, 202, 204.]
`
[Sheet 3 of 15: rotated block diagram, apparently FIG. 3. The labels did not survive extraction.]
`
[Sheet 4 of 15: two block diagrams, the second labeled FIG. 5. Legible labels in the first diagram: SpeedCast encoding station; SpeedCast player; workstation; computer; firewall. Legible reference numerals: 400, 402, 408, 410. Legible labels in FIG. 5: SpeedCast encoding station; SpeedCast player; CRT projector; workstation; laptop; firewall; reflector.]
`
[Sheet 5 of 15: block diagram, apparently FIG. 6. Legible labels: four SpeedCast end points, each with a computer; conference server and reflector; data. Legible reference numerals: 602, 604, 606, 608.]
`
[Sheet 6 of 15: rotated block diagram, apparently FIG. 7. Legible labels: SpeedCast end point; videophone.]
`
FIG. 8 (Sheet 7 of 15)

ENCODER WAITS FOR THE PHONE TO RING

WHEN CALL IS MADE, THE MODEM PROGRAM OF THE ENCODER PICKS UP THE PHONE

RECORD 8 kHz PCM (PULSE CODE MODULATION) SAMPLES FROM THE SPEECH INPUT GENERATED FROM MODEM

DIVIDE AUDIO SIGNALS INTO 20 ms LONG FRAMES

USING THE GSM (GLOBAL SYSTEM FOR MOBILE COMMUNICATIONS) CODEC, COMPRESS THE 20 ms FRAME INTO DATA PACKET REPRESENTING PARTICULAR EXCITATION SEQUENCE AND AMPLITUDE BY USING SHORT-TERM AND LONG-TERM PREDICTORS

TIMESTAMP THE ENCODED PACKET WITH THE CURRENT TIME

(Reference numerals legible in the extraction: 802, 804, 808, 810.)
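
The framing and time-stamping boxes of FIG. 8 can be sketched as follows. This is a minimal illustration only, assuming the PCM samples are already available as a Python list; the names `frame_pcm` and `stamp_frames` are hypothetical, and the GSM compression box is omitted because it depends on a codec implementation that the figure does not specify.

```python
import time

SAMPLE_RATE_HZ = 8000   # 8 kHz PCM, per FIG. 8
FRAME_MS = 20           # 20 ms frames, per FIG. 8
SAMPLES_PER_FRAME = SAMPLE_RATE_HZ * FRAME_MS // 1000  # 160 samples


def frame_pcm(samples):
    """Split a list of PCM samples into complete 20 ms frames.

    A trailing partial frame is dropped. That is an assumption: the
    figure does not say how a partial frame is handled.
    """
    count = len(samples) // SAMPLES_PER_FRAME
    return [
        samples[i * SAMPLES_PER_FRAME:(i + 1) * SAMPLES_PER_FRAME]
        for i in range(count)
    ]


def stamp_frames(frames, clock=time.time):
    """Pair each frame with the current time, as in the last box of FIG. 8."""
    return [(clock(), frame) for frame in frames]


one_second = [0] * SAMPLE_RATE_HZ   # one second of silence
frames = frame_pcm(one_second)      # 50 frames of 160 samples each
packets = stamp_frames(frames)
```

At 8 kHz, a 20 ms frame holds 160 samples, so one second of speech yields 50 frames.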

FIG. 9 (Sheet 8 of 15)

DEPENDING ON THE NETWORK CONFIGURATION OF THE NETWORK NODE THE PLAYER RESIDES IN, DETERMINE THE TYPE OF NETWORK TRANSPORT (RTP/UDP OR TCP/TUNNELED HTTP) AND ROUTING METHOD (MULTICAST OR UNICAST) FOR THE PLAYER

SEND THE DATA PACKETS TO ALL THE PLAYERS THAT ARE CONNECTED

FIG. 10 (Sheet 9 of 15)

1000: EACH RECEIVED AUDIO FRAME IS PLACED IN A SORTED QUEUE, AND THE PACKET (AUDIO FRAME) WITH THE EARLIEST TIMESTAMP OR THE SMALLEST SEQUENCE NUMBER WILL BE THE FIRST DATA PACKET IN THE QUEUE

1002: THE PLAYER PICKS THE FIRST PACKET OUT OF THE QUEUE, AND PROCESSES IT IN THE FOLLOWING MANNER: IF THE SLEEP TIME IS 10 ms OR LESS, PROCESS THE SAMPLE IMMEDIATELY; IF THE SLEEP TIME IS GREATER THAN 50 ms, PROCESS THE SAMPLE AFTER A 50 ms WAIT (IN THIS CASE, SOME PACKETS WILL BE LOST); IF THE SLEEP TIME IS BETWEEN 10 ms AND 50 ms, SLEEP FOR THE INDICATED NUMBER OF MILLISECONDS AND THEN PROCESS THE SAMPLE

1004: EACH RECEIVED FRAME IS THEN DECODED, A RING BUFFER ADDING A SMALL AUDIO LEAD TIME, NEW AUDIO FRAME CAUSING THE RING BUFFER TO BE CLEARED WHEN IT IS FULL

1006: EXCITATION SIGNALS IN THE FRAMES ARE FED THROUGH THE SHORT-TERM AND LONG-TERM SYNTHESIS FILTERS TO RECONSTRUCT THE AUDIO STREAMS

1008: DECODED AUDIO STREAMS ARE FED TO DIRECTX TO BE PLAYED BACK THROUGH A SOUND CARD
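
The sorted queue and the sleep-time rule of FIG. 10 can be illustrated with the sketch below. It is an assumption-laden example rather than the player's actual code: the class `SortedPacketQueue` and the function `wait_before_processing` are hypothetical names, and the figure does not say how the sleep time itself is computed, so it is taken here as an input in milliseconds.

```python
import heapq


def wait_before_processing(sleep_ms):
    """Return the wait (in ms) before processing a sample, per FIG. 10.

    10 ms or less: process immediately.
    More than 50 ms: wait only 50 ms (some packets will be lost).
    Otherwise: sleep for the indicated time.
    """
    if sleep_ms <= 10:
        return 0
    if sleep_ms > 50:
        return 50
    return sleep_ms


class SortedPacketQueue:
    """Queue that always yields the packet with the earliest timestamp."""

    def __init__(self):
        self._heap = []
        self._counter = 0  # tie-breaker so payloads are never compared

    def put(self, timestamp, payload):
        heapq.heappush(self._heap, (timestamp, self._counter, payload))
        self._counter += 1

    def get(self):
        timestamp, _, payload = heapq.heappop(self._heap)
        return timestamp, payload

    def __len__(self):
        return len(self._heap)


queue = SortedPacketQueue()
for ts, frame in [(40, "c"), (0, "a"), (20, "b")]:  # arrives out of order
    queue.put(ts, frame)
first = queue.get()  # the packet stamped 0 comes out first
```

A heap keeps insertion and removal cheap while preserving timestamp order, which suits packets that arrive slightly out of sequence.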

FIG. 11 (Sheet 10 of 15)

1100: RECEIVE VIDEO FRAMES VIA A VIDEO CAPTURE CARD (INPUT VIDEO SIGNALS ARE FED THROUGH S-VIDEO INPUT (ANALOG), IEEE 1394 (FIREWIRE) OR USB PORT. AUDIO SIGNAL FROM MICROPHONE IS FED THROUGH AUDIO INPUT)

1102: USING DIRECTX CAPTURE LAYER, RECEIVE A NUMBER OF PCM (PULSE CODE MODULATION) SAMPLES AND A VIDEO FRAME SAMPLE

1104: FOR EACH ENCODER, ENCAPSULATE THE SAMPLED AUDIO AND VIDEO INTO DATA OBJECTS RESPECTIVELY, ALONG WITH THE CAPTURE CHARACTERISTICS LIKE SAMPLE RATE, BITS AND CHANNELS FOR AUDIO AND X, Y AND COLOR SPACE FOR VIDEO, FOR EXAMPLE

1106: ENCODE THE CONVERTED DATA (EACH ENCODER PRODUCES A VIEW OF THE SAMPLE COMPATIBLE WITH ITS INPUT BY CONVERTING AND RE-SAMPLING THE INPUT DATA)

1108: PARTITION THE ENCODED DATA INTO SMALLER DATA PACKETS

1110: CREATE THE TIMESTAMP AND ATTACH IT TO DATA PACKET (DEPENDING ON THE TRANSPORT MODE, CREATE UNICAST RTP/UDP OR TCP PACKETS OR MULTICAST PACKETS FOR TRANSMISSION)
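
Boxes 1104, 1108 and 1110 of FIG. 11 can be sketched as plain data structures. This is only an illustration: the class and function names are hypothetical, the payload size is an arbitrary assumption, and the per-packet sequence number is added here because FIG. 13 refers to one, not because FIG. 11 shows it.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AudioObject:
    """Sampled audio plus its capture characteristics (box 1104)."""
    samples: bytes
    sample_rate: int
    bits: int
    channels: int


@dataclass
class VideoObject:
    """One video frame plus its capture characteristics (box 1104)."""
    pixels: bytes
    width: int        # "X" in the figure
    height: int       # "Y" in the figure
    color_space: str


def partition(encoded: bytes, max_payload: int) -> List[bytes]:
    """Split encoded data into smaller packets (box 1108)."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    return [
        encoded[i:i + max_payload]
        for i in range(0, len(encoded), max_payload)
    ]


def stamp(chunks: List[bytes], timestamp_ms: int) -> List[Tuple[int, int, bytes]]:
    """Attach a timestamp and a sequence number to each packet (box 1110)."""
    return [(timestamp_ms, seq, chunk) for seq, chunk in enumerate(chunks)]


audio = AudioObject(samples=b"\x00" * 320, sample_rate=8000, bits=16, channels=1)
packets = stamp(partition(audio.samples, max_payload=100), timestamp_ms=0)
```

Keeping the capture characteristics alongside the raw samples lets each encoder convert or re-sample the input without consulting the capture layer again.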

FIG. 12 (Sheet 11 of 15)

1200: DEPENDING ON THE NETWORK CONFIGURATION OF THE NETWORK NODE ON WHICH THE PLAYER IS RUNNING, DETERMINE THE TYPE OF NETWORK TRANSPORT (RTP/UDP OR TCP/TUNNELED HTTP) AND ROUTING METHOD (MULTICAST OR UNICAST) FOR THE PLAYER

1202: SEND THE DATA PACKETS TO ALL THE PLAYERS THAT ARE CONNECTED

FIG. 13 (Sheet 12 of 15)

1300: EACH RECEIVED PACKET IS PLACED IN A SORTED QUEUE (THE PACKET WITH THE EARLIEST TIMESTAMP OR THE SMALLEST SEQUENCE NUMBER WILL BE THE FIRST DATA PACKET IN THE QUEUE)

1302: THE PLAYER PICKS THE FIRST PACKET OUT OF THE QUEUE, COPIES IT TO THE SYNCH BUFFER, AND PROCESSES IT IN THE FOLLOWING MANNER: IF THE SLEEP TIME IS 10 ms OR LESS, PROCESS THE SAMPLE IMMEDIATELY; IF THE SLEEP TIME IS GREATER THAN 50 ms, PROCESS THE SAMPLE AFTER A 50 ms WAIT (IN THIS CASE, SOME PACKETS WILL BE LOST); IF THE SLEEP TIME IS BETWEEN 10 ms AND 50 ms, SLEEP FOR THE INDICATED NUMBER OF MILLISECONDS AND THEN PROCESS THE SAMPLE

1304: EACH RECEIVED FRAME IS THEN DECODED (A RING BUFFER ADDS A SMALL AUDIO LEAD TIME), AND KEEP EXACTLY ONE VIDEO FRAME IN A BUFFER FOR REPAINT

1306: NEW AUDIO FRAMES CAUSES THE RING BUFFER TO CLEAR WHEN IT IS FULL AND A NEW VIDEO FRAME REPLACES THE OLD ONE

1308: DECODED FRAMES ARE FED TO DIRECTX TO BE PLAYED BACK

1310: UPDATE (REPAINT) THE VIDEO FRAMES AND PLAY BACK THE AUDIO STREAM

1312: WHEN AND IF THERE ARE IRC (INTERNET RELAY CHAT) MESSAGE TO BE SENT, SEND IT TO THE IRC SERVER, AND WHEN AND IF THERE ARE IRC MESSAGES TO BE RECEIVED, DISPLAY THEM
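
The buffering described in boxes 1304 and 1306 of FIG. 13 can be read as follows. The sketch is one possible interpretation, not the player's actual code: `ClearingRingBuffer` and `VideoSlot` are hypothetical names, and "clear when full" is taken to mean that a new frame arriving at a full buffer discards the backlog before being stored.

```python
from collections import deque


class ClearingRingBuffer:
    """Audio buffer that is cleared when a frame arrives while it is full."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._frames = deque()

    def push(self, frame):
        # A full buffer means playback has fallen behind: drop the backlog
        # so the newest audio is heard with minimal delay.
        if len(self._frames) >= self.capacity:
            self._frames.clear()
        self._frames.append(frame)

    def pop(self):
        """Return the oldest buffered frame, or None if the buffer is empty."""
        return self._frames.popleft() if self._frames else None

    def __len__(self):
        return len(self._frames)


class VideoSlot:
    """Holds exactly one video frame for repaint; a new frame replaces it."""

    def __init__(self):
        self.frame = None

    def update(self, frame):
        self.frame = frame


audio = ClearingRingBuffer(capacity=3)
for n in (1, 2, 3, 4):      # the fourth frame arrives while the buffer is full
    audio.push(n)

video = VideoSlot()
video.update("frame A")
video.update("frame B")     # replaces frame A
```

Discarding stale audio instead of queueing it trades a brief dropout for lower latency, which matches the figure's emphasis on a small lead time.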

[Sheet 13 of 15: diagram; the labels did not survive extraction.]

[Sheet 14 of 15: block diagram(s); the only legible label is "LAN or WAN".]

[Sheet 15 of 15: drawing; the only legible reference numeral is 104.]
`
`US 2004/0172656 A1
`
`Sep. 2, 2004
`
`TWO-WAY AUDIO/VIDEO CONFERENCING
`SYSTEM
`
`BACKGROUND OF THE INVENTION
[0001] The field of the invention relates to Internet communication and the near-instantaneous delivery and playback of digitally encoded audio and video. Internet broadcasting or web casting allows many people to listen to radio stations or to view news programs over the internet. However, internet broadcasting or web casting has an average latency of 5-20 seconds. That is, from the time the internet radio station starts the music or talk radio program, listeners will actually hear it 5-20 seconds later. The source of this latency comes from, for example, encoding, internet transport (distribution), and decoding.
[0002] While this kind of latency may be acceptable for some applications (e.g., listening to music, talk shows and any pre-recorded program), there are time-critical applications for which a 5-20 second delay is unacceptable. For example, real-time market updates, emergency broadcasts (fire, natural or manmade disasters), military, police or 911 dispatches may not be able to tolerate such a delay.
[0003] One obstacle to internet broadcasting is the high cost of the encoding station, both for hardware and software. The complexity associated with setting up the encoding station, as well as the required maintenance, makes it even more difficult to establish and operate such an encoding station. Another obstacle is the lack of a standard in audio as well as video players. Presently, there are three major media players, Microsoft's Windows Media™, RealNetworks' Real One™ and Apple's QuickTime Media Player™, that can play back digital multimedia streams. Each of these players requires different ways of broadcasting over the internet. The variety of network protocols, routing methods and security rules governing the usage of the internet also make internet broadcasting difficult.
[0004] One method of broadcasting over the internet is termed streaming. Microsoft®, RealNetworks®, and Apple® Computer are the three largest companies offering streaming products. However, streams from each of their systems are generally incompatible with one another. Streams encoded by Microsoft's Windows Media™ Server only work with Windows Media Player or Real One player; those encoded by RealNetworks' Real Server™ can only be played by RealPlayer™; and those encoded by Apple's QuickTime only work with the QuickTime Media Player™ or Real One player.
[0005] At nearly the same time that Microsoft, RealNetworks and Apple Computer developed their proprietary streaming systems, the Motion Pictures Experts Group (MPEG), a trade organization concerned with setting broadcast standards for the motion picture industry, released the MPEG-1 standard for encoding and compressing digital audio and video. A subset of this specification, MPEG-1 layer 3 audio (commonly referred to as MP3), quickly became the most popular compressed digital audio format because of its superior compression ratios and audio fidelity. Further contributing to the popularity of the MP3 format was the widespread availability of inexpensive (and in many cases, free) authoring and playback tools made possible by the presence of an open, published standard. Driven by overwhelming public support for the MP3 format, many such media players, including RealPlayer, Windows Media Player, and QuickTime, quickly added support for the MP3 standard.
[0006] Seizing on the popularity of the MP3 audio format, On-Demand Technologies™ ("ODT") developed the AudioEdge™ server, which simultaneously serves a single MP3 audio stream to all major players. Prior to AudioEdge™, broadcasters wishing to stream to their widest possible audience were required to encode and serve streams using multiple proprietary platforms. With AudioEdge™, one MP3 encoder and one serving platform reach all popular players. In this manner, AudioEdge™ saves bandwidth, hardware, and maintenance costs. Additionally, because AudioEdge™ supports Windows Media (the most popular proprietary streaming media format) and MP3 (the most popular standard-based streaming media format) streams, the AudioEdge™ system eliminates the risk of technology lock-in, which is associated with many proprietary platforms.
[0007] Multimedia streaming is defined as the real-time delivery and playback of digitally encoded audio and/or video. The advantages of streaming compared to alternative methods of distributing multimedia content over the internet are widely documented, among the most important of which is the ability for immediate playback instead of waiting for the complete multimedia file to be downloaded.
[0008] Two types of streaming are common today on the internet: on-demand and live. ODT AudioEdge™ delivers both live and on-demand (archived file) streams encoded in MP3 or Windows Media (WMA) format, which can be played using the major media players. Additionally, AudioEdge™ is capable of delivering both archived Apple QuickTime and RealNetworks encoded media files on-demand.
[0009] On-demand streaming delivers a prerecorded (e.g., an archived) multimedia file for playback by a single user upon request. For on-demand streaming, an archived file must be present for each user to select and view. An example of on-demand streaming would be a television station that saves each news broadcast into an archived file and makes this archived file available for streaming at a later time. Interested users would then be able to listen to and/or view this archived broadcast when it is so desired.
[0010] Live streaming involves the distribution of digitized multimedia information by one or more users as it occurs in real-time. In the above example, the same news station could augment its prerecorded archived content with live streaming, thus offering its audience the ability to watch live news broadcasts as they occur.
[0011] Live streaming involves four processes: (1) encoding, (2) splitting, (3) serving, and (4) decoding/playback. For successful live streaming, all processes must occur in real-time. Encoding involves turning the live broadcast signal into compressed digital data suitable for streaming. Splitting, an optional step, involves reproducing the original source stream for distribution to servers or other splitters. The splitting or reflecting process is typically used during the live streaming of internet broadcasts (webcasts) to many users when scalability is important.
[0012] Serving refers to the delivery of a live stream to users who wish to receive it. Often, serving and splitting functions can occur simultaneously from a single serving device. Last, decoding is the process of decompressing the encoded stream so that it can be heard and/or viewed by an end user. The decoding and playback process is typically handled by player software such as RealNetworks' Real One Player, Microsoft's Windows Media Player, or Apple's QuickTime player. All further uses of the term "streaming" refer to live streaming over the internet, and further uses of the term "server" refer to a device capable of serving and splitting live streams.
[0013] As noted earlier, three major software players are available; however, they are not compatible with each other. In other words, a proprietary RealNetworks-encoded audio stream can only be served by a RealNetworks server and played with the RealNetworks Real One Player. RealNetworks claims that its new Real One player, made available in late 2002, can play back Windows Media streams as well as Apple QuickTime's MPEG-4 format. However, in all practicality, the broadcaster would have to choose one of the three proprietary streaming formats, knowing that certain listeners will be excluded from hearing and/or viewing the stream, or simultaneously encode and stream in all three formats.
[0014] Unfortunately, existing streaming audio and/or video technologies, although termed live, still exhibit a time delay from when an audio or video signal is encoded to when the encoded signal is decoded to produce an audio or video output signal. For person-to-person conversation, for example, this delay of as much as 20 seconds is simply unacceptable.
[0015] In general, the internet broadcasting of video and audio introduces an average latency of 5-20 seconds. That is, the time from when live video and audio frames are captured to when viewers can actually hear and view the frames is about 5-20 seconds. The sources of this latency for audio and video are similar, and are generally a result of encoding (e.g., video/audio capture and compression of data), delivery (e.g., splitting, serving and transport over IP), and decoding (e.g., buffering, data decompression and playback).
[0016] Thus, there exists a need for an improved system for sending and receiving audio and video over a network, such as the internet, with minimal delay. Such a minimal delay may be one that is not perceptible to a user. Such minimal delay may also be referred to as "real-time", "no delay" or "zero delay".
`
`BRIEF SUMMARY OF THE INVENTION
[0017] To overcome the obstacles of known streaming systems, there is provided a method and apparatus for exchanging audio/visual information between a caller and a called party through the Internet. The method includes the steps of setting up a session link between the caller and called party using a tunneled transmission control protocol and collecting audio and video information from the caller and called party. The method further includes the steps of forming the audio and video portions into data objects, attaching a time stamp to each formed data object and exchanging the formed audio and video data objects as real time packets using a transport control protocol between the caller and called party through the session link.
`
`BRIEF DESCRIPTION OF THE SEVERAL
`VIEWS OF THE DRAWINGS
[0018] The features of the present invention, which are believed to be novel, are set forth with particularity in the appended claims. The invention may best be understood by reference to the following description taken in conjunction with the accompanying drawings. In the several figures like reference numerals identify like elements.
[0019] FIG. 1 is a block diagram of an example of a digital audio streaming system;
[0020] FIG. 2 is a block diagram of another example of a digital audio streaming system with a different configuration;
[0021] FIG. 3 is a software block diagram of SpeedCast Video digital multimedia streaming system;
[0022] FIG. 4 is a block diagram of another example of a digital multimedia streaming system;
[0023] FIG. 5 is a block diagram of another example of a digital multimedia streaming system;
[0024] FIG. 6 is a block diagram of an example of a bi-directional (multipoint 2-way) digital multimedia streaming system;
[0025] FIG. 7 is a block diagram of another example of a bi-directional (multipoint 2-way) digital multimedia streaming system;
[0026] FIG. 8 is a flowchart depicting one embodiment of encoder data flow for SpeedCast Audio system (low-latency audio only system);
[0027] FIG. 9 is a flowchart depicting one embodiment of server data flow for SpeedCast Audio system;
[0028] FIG. 10 is a flowchart depicting one embodiment of player data flow for SpeedCast Audio system;
[0029] FIG. 11 is a flowchart depicting one embodiment of encoder data flow for SpeedCast Video system (low-latency audio and video system);
[0030] FIG. 12 is a flowchart depicting one embodiment of server data flow for SpeedCast Video system;
[0031] FIG. 13 is a flowchart depicting one embodiment of player data flow for SpeedCast Video system;
[0032] FIG. 14 is a software block diagram of a two-way conferencing system;
[0033] FIG. 15 is a block diagram of a two-way conferencing system using a direct connection method;
[0034] FIG. 16 is a block diagram of a two-way conferencing system using a connection method including a server; and
[0035] FIG. 17 depicts a graphical user interface screen of a two-way conferencing system.
`
`DETAILED DESCRIPTION OF THE
`INVENTION
[0036] While the present invention is susceptible of embodiments in various forms, there is shown in the drawings and will hereinafter be described some exemplary and non-limiting embodiments, with the understanding that the present disclosure is to be considered an exemplification of the invention and is not intended to limit the invention to the specific embodiments illustrated.
[0037] It should be further understood that the title of this section of this specification, namely, "Detailed Description Of The Invention", relates to a requirement of the United States Patent Office, and does not imply, nor should be inferred to limit the subject matter disclosed herein.
[0038] The internet, as used herein, includes the world wide web (web) and other systems for storing and retrieving information using the internet. To view a web site, a user typically points to a web address, referred to as a uniform resource locator (URL), associated with the web site.
[0039] At least one embodiment of the system provides a method by which thousands of users can listen to an audio stream simultaneously and economically with very little delay. The typical latency may be 500 ms within the public internet. Also, by connecting the encoding station with a generic telephone line, an audio stream may be broadcast from any wired or wireless phone. Other embodiments may not require special hardware or media players. Any internet-ready Windows-based computer with a standard sound card and speaker allows users to listen to the broadcast audio stream.
[0040] The present audio system provides faster voice broadcasting over IP than prior art systems using at least an encoder, a server and a player. Various reasons for this improvement have been observed.
[0041] For example, one reason is auto-negotiation of the internet transport layer. Depending on the network configuration between the server and player, the audio broadcast can be accomplished via one of three methods: multicast, unicast user datagram protocol (UDP), and tunneled real-time transport protocol (RTP). If the network configuration for the player (client) is capable of accepting multicast packets, the server will transmit multicast packets. If not, unicast UDP or tunneled RTP transport methods will be used. Multicasting is preferred over unicast UDP or tunneled RTP because it uses less bandwidth than unicast, and will have less latency than tunneled RTP. Regardless of the network protocols chosen, each audio packet is time-stamped in every 20 ms frame. This time-stamp is used later to reconstruct the packets.
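
The transport selection just described can be sketched as a simple preference list. This is an illustrative example only: the names `Transport`, `PREFERENCE` and `choose_transport` are hypothetical, and placing unicast UDP ahead of tunneled RTP is an assumption, since the paragraph only states that multicast is preferred over both.

```python
from enum import Enum


class Transport(Enum):
    MULTICAST = "multicast"
    UNICAST_UDP = "unicast UDP"
    TUNNELED_RTP = "tunneled RTP"


# Most preferred first. Multicast first is from the text; the relative
# order of the other two is an assumption made for this sketch.
PREFERENCE = [Transport.MULTICAST, Transport.UNICAST_UDP, Transport.TUNNELED_RTP]


def choose_transport(supported):
    """Return the most preferred transport the player's network supports."""
    for transport in PREFERENCE:
        if transport in supported:
            return transport
    raise ValueError("player supports none of the known transports")


# A player behind a router that blocks multicast but allows UDP:
selected = choose_transport({Transport.UNICAST_UDP, Transport.TUNNELED_RTP})
```

Keeping the order in one list means the policy can be changed without touching the selection logic.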
[0042] Next are client and server buffering techniques, which typically maintain a dynamically sized buffer that responds to network and central processing unit (CPU) conditions. In general, these buffers are kept as small as possible, because this reduces the time between the voice sample being encoded and the transmitted voice sample being decoded. Each voice sample may be transmitted every 20 ms, and the system may hold a minimum of one sample and a maximum of 50 samples. The current setting is designed for the worst case latency of one second. Usually this dynamic buffer will hold no more than 10 samples.
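
The relationship between buffer depth and latency in this paragraph is simple arithmetic, sketched below. The function names are hypothetical; only the 20 ms interval and the limits of 1 and 50 samples come from the text.

```python
SAMPLE_INTERVAL_MS = 20   # one voice sample every 20 ms
MIN_SAMPLES = 1
MAX_SAMPLES = 50


def clamp_depth(samples):
    """Keep a requested buffer depth within the stated limits."""
    return max(MIN_SAMPLES, min(MAX_SAMPLES, samples))


def buffer_latency_ms(samples):
    """Latency contributed by a buffer holding the given number of samples."""
    return clamp_depth(samples) * SAMPLE_INTERVAL_MS


worst_case_ms = buffer_latency_ms(MAX_SAMPLES)   # 50 x 20 ms = 1000 ms
typical_ms = buffer_latency_ms(10)               # 10 x 20 ms = 200 ms
```

A full buffer of 50 samples therefore accounts for the one-second worst case, while the usual ceiling of 10 samples corresponds to 200 ms.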
[0043] The third reason is the choice of audio encoding. The audio system may be tuned to operate at peak efficiency when delivering a broadcast of the human voice. Parameters taken into account when choosing the audio encoding mechanism for the system may include, for example: a high compression ratio for encoding while preserving audio quality; the ability of the data stream to be multiplexed; avoidance of forward or backward temporal dependency in encoding (that is, the data packets produced must be represented as independent blocks which represent a certain slice of time of the original recording delta, and most of the waveform represented by that block may be recovered without reference to adjacent packets, some of which may be lost); and encoding and decoding that need not require top-of-the-line CPUs for their respective computers. Preferably, however, the encoding station is at least a 1.5 GHz Intel CPU or the equivalent, and the decoding station is at least a 500 MHz Intel CPU to run the player.
[0044] For clear voice quality, the global system for mobile communications (GSM) codec was chosen for the audio system designed for human voice. This codec filters out background noise from the surrounding environment. Since the psycho-acoustic model is specially tuned for human voice processing, the types of errors in the audio will be limited to errors that sound more natural to human speakers (e.g., switching the "F" sound with the "TH" sound). The usual static or "garbled robot-like voice" typical in direct analog (non-psycho-acoustic) or digital reproductions is unlikely to occur.
[0045] For low bandwidth per stream, each audio stream is set for 13 kbits/sec (kbps). Many streaming radio stations use between 24 and 128 kbps. The tradeoff is that generic streaming radio may carry a wide variety of audio types (e.g., rock, jazz, classic and voice) while the audio system is specifically tuned to human voice reproduction. Grouping GSM packets into UDP packets further saves bandwidth.
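
The bandwidth effect of grouping GSM frames into UDP packets can be estimated with the sketch below. The 13 kbps rate and the 20 ms frame are from the text; everything else is an assumption made for illustration: frames are rounded up to whole bytes, only IPv4 (20 bytes) and UDP (8 bytes) headers are counted, and any RTP or link-layer overhead is ignored.

```python
CODEC_BPS = 13_000         # 13 kbps per stream, from the text
FRAME_MS = 20              # one GSM frame every 20 ms
FRAMES_PER_SECOND = 1000 // FRAME_MS                 # 50
BITS_PER_FRAME = CODEC_BPS * FRAME_MS // 1000        # 260 bits
FRAME_BYTES = (BITS_PER_FRAME + 7) // 8              # 33 bytes when byte-aligned
HEADER_BYTES = 20 + 8      # assumed IPv4 + UDP headers per packet


def wire_bitrate_bps(frames_per_packet):
    """Approximate on-the-wire bit rate for a given grouping factor."""
    packets_per_second = FRAMES_PER_SECOND / frames_per_packet
    packet_bytes = HEADER_BYTES + frames_per_packet * FRAME_BYTES
    return packets_per_second * packet_bytes * 8


ungrouped = wire_bitrate_bps(1)   # one frame per packet
grouped = wire_bitrate_bps(5)     # five frames per packet
```

Under these assumptions, one frame per packet costs about 24.4 kbps on the wire, while five frames per packet costs about 15.4 kbps, at the price of 80 ms of additional packetization delay.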
[0046] For secure communication, log-in, data encryption and user authentication may be implemented in the speech broadcasting system.
[0047] User and data encryption can be performed using the industry-standard SSL (Secure Socket Layer). The algorithm used may be changed on a per-socket basis, and by the "amount" of encryption (number of bits used in keys). Using SSL also allows the system to interface with a common web browser, making different types of media applications easy. For example, the same server may serve both real-time live streaming media and pre-recorded (archived or on-demand) media files. Their usage may be accurately accounted for by a user authentication system. Accounting coupled with authentication gives the operator of the system an easy way to facilitate billing.
[0048] User authentication can be layered on top of the encryption layer and is independent of the encryption layer. This form of authentication performs secure authentication, without exposing the system to potential forgery or circumvention. This permits the use of any method to store user names and passwords (e.g., UNIX password file, htaccess database, extensible markup language (XML) document, traditional database and flat file).
[0049] The client software can run on Windows 2000 and XP as MS ActiveX controls, compatible with MS Internet Explorer (IE). The server supports multicast for the most efficient bandwidth utilization within intranets. It also supports unicast, the most commonly used transport over current IPv4 networks. For those users that are protected by tight firewalls, tunneled hypertext transfer protocol (HTTP) transport may be used.
[0050] The system is easy to use for those listening to audio streams. All that is required is a web browser, such as Internet Explorer, that can instantiate ActiveX controls. Once the user visits the appropriate web site, the program is downloaded, installs itself, fetches its configuration files, and attempts to start the most efficient stream type. If the player detects problem(s), it tries an alternative transport type and/or a different codec. It does so in the order of preference until a stream with desirable transport (e.g., multicast, unicast and tunneled HTTP) is established at an appropriate bandwidth. As such, the end user does not have to configure the player to circumvent any firewall restrictions that may be in place.
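
The player's fallback behavior can be pictured as a loop over candidates in order of preference. The sketch below is hypothetical throughout: `establish_stream`, the candidate list and the stand-in probe `strict_firewall` are illustrative names, and a real player would attempt actual network connections where the probe is called.

```python
def establish_stream(candidates, try_connect):
    """Return the first (transport, codec) pair that connects, else None.

    `candidates` must already be in order of preference; `try_connect`
    is a caller-supplied probe that returns True on success.
    """
    for transport, codec in candidates:
        if try_connect(transport, codec):
            return transport, codec
    return None


# Hypothetical preference list: most efficient transport first.
CANDIDATES = [
    ("multicast", "gsm"),
    ("unicast", "gsm"),
    ("tunneled-http", "gsm"),
]


def strict_firewall(transport, codec):
    """Stand-in probe: simulates a firewall that only passes HTTP."""
    return transport == "tunneled-http"


chosen = establish_stream(CANDIDATES, strict_firewall)
```

Because the probing happens inside the player, a listener behind a restrictive firewall ends up on tunneled HTTP without changing any settings.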
[0051] In one embodiment of the system, the audio encoding station contains elements necessary for listening to many audio broadcasts. It can also have the following software: Linux RedHat 7.x; Apache web server; GSM encoder; auto-answering modem software; audio streaming server; and Streaming Server Administrator (SSA), a Java program used to set up and administer the audio system. In this embodiment, the audio encoding station can be bundled with an audio streaming server. This server can be, for example, a Linux-based internet "appliance" equipped with GSM encoder, voice capture modem (or wireless microphone) and low latency audio. This appliance is a 1U high rack-mountable server with the following specifications: 1 GHz Pentium processor; 256 MB memory; 20 GB hard drive; Red Hat Linux 7.1 operating system; dual 100Base-T Ethernet NIC; high quality data/fax/voice internal modem; multimedia sound card; and optional wireless microphone and receiving station.
[0052] Referring now to FIG. 1, there is shown Scenario "A" in which the broadcast origination point may be the floor of a major securities exchange 100. To initiate the broadcast, the individual providing the audio content dials the telephone number corresponding to a dedicated phone line 102 connected to the system. A modem 106 (with voice capture) answers the call and passes the signal to the encoder 104. The encoder 104, in turn, passes the digitally encoded signal to the server 106 for the distribution of the signal via a streaming server 108 within the local area network (LAN), e.g., an intranet, or via a streaming server 110 over the internet. A player residing in any desktop PC connected to one of the streaming servers, for example, will decode the digital signal and play back the voice data.
[0053] FIG. 2 illustrates Scenario "B" in which the broadcaster ("squawker") speaks into a wireless microphone 200 linked directly to the server 202 equipped with a wireless station. Encoder/server 202 captures the voice, encodes the audio signals and transmits them to server 204 for distribution. A player residing in any desktop PC, for example PC 206, decodes the digital signal and plays back the voice data. These system concepts can also be applied to video and audio for multimedia systems.
[0054] An exemplary embodiment of a multimedia system includes up to about eight (8) logical software subsystems: encoder, slide presenter, whiteboard (collaboration tools), IRC server, reflector, conference server or multipoint control unit (MCU) and player. An optional conference gateway can handle packet-level translation of H.323 and session initiation protocol (SIP) based conferencing to make the SpeedCast Video system interoperable with these types of systems.
[0055] The encoding station is responsible for encoding the video/audio channels, packetizing audio/video channels, and transmitting the packetized streams to a reflector. The slide presenter provides a series of static images, such as Joint Photographic Experts Group (JPEG) or Portable Network Graphics (PNG) format, that are generated using MS PowerPoint. This is part of the logically independent data channel. Therefore, other data channels, such as a spreadsheet, Word file and the like, can be channeled through accordingly. Internet Relay Chat (IRC) handles standard chat functions. It consists of an IRC server residing on the conference server or reflectors and IRC client residing on every desktop computer where a player ru