(12) United States Patent
Azriel et al.

(10) Patent No.: US 7,286,652 B1
(45) Date of Patent: Oct. 23, 2007

(54) FOUR CHANNEL AUDIO RECORDING IN A PACKET BASED NETWORK

(75) Inventors: Gad Azriel, Holon (IL); Yackov Sfadya, Kfar Saba (IL)

(73) Assignee: 3Com Corporation, Marlborough, MA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 1606 days.

(21) Appl. No.: 09/584,581

(22) Filed: May 31, 2000

(51) Int. Cl.
H04M 1/64 (2006.01)
H04L 12/66 (2006.01)
G06F 15/173 (2006.01)

(52) U.S. Cl. ........ 379/88.22; 370/352; 709/224

(58) Field of Classification Search ........ None
See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS
5,710,591 A * 1/1998 Bruno et al. ........ 348/14.09
6,100,882 A * 8/2000 Sharman et al. ........ 704/235
6,122,665 A * 9/2000 Bar et al. ........ 709/224
6,487,196 B1 * 11/2002 Verthein et al. ........ 370/352
6,614,781 B1 * 9/2003 Elliott et al. ........ 370/352
6,850,609 B1 * 2/2005 Schrage ........ 379/202.01
2001/0043571 A1 * 11/2001 Jang et al. ........ 370/260

OTHER PUBLICATIONS
International Telecommunication Union, H.225.0, Annex A, RTP/RTCP, Feb. 1998, pp. 73-106.
International Telecommunication Union, H.323, Draft V4, Aug. 1999, Chapters 6 & 7, pp. 13-52.
Pocket Telephony Primer, 3COM Corporation, Mar. 1998.

* cited by examiner

Primary Examiner: Fan Tsang
Assistant Examiner: Joseph T Phan

(57) ABSTRACT

An apparatus for and a method of audio recording in packet based telephony systems. Using the present invention, the equivalent of four audio channels is recorded utilizing only two recording channels. Each recorded channel comprises the stream of packets generated and transmitted by each endpoint to the other side. The RTP packets include the samples generated by the particular endpoint in addition to the timestamp of the samples received from the other side that were actually played by the endpoint. The recording device has knowledge of what was played at the other endpoint in order to accurately play back the audio samples generated by and received from the other endpoint. The recording device receives a packet stream containing the audio generated on each endpoint and the timestamp of the packet from the other side that was played on the endpoint. From this data, the recording device can reconstruct the audio signal that was actually played on each endpoint.
`
`28 Claims, 11 Drawing Sheets
`
U.S. Patent    Oct. 23, 2007    Sheet 1 of 11    US 7,286,652 B1

FIG. 1 (PRIOR ART)

FIG. 2

FIG. 3

FIG. 4 (PRIOR ART): 4 CHANNEL IP RECORDING DEVICE

FIG. 5

FIG. 6: ENDPOINT A, IP PACKET NETWORK, 2 CHANNEL IP RECORDER 126
FIG. 7: ENDPOINT A, IP PACKET NETWORK, TA(n), 1 CHANNEL IP RECORDING DEVICE - A, 1 CHANNEL IP RECORDING DEVICE - B

FIG. 8A (RECORDING METHOD: ENDPOINT)
INITIALIZATION
ADDITIONAL SAMPLES TO BE PLAYED IN THE CURRENT RECEIVED RTP PACKET? (YES / NO)
GET A SAMPLE FROM THE CURRENT RECEIVED RTP PACKET POINTED TO BY RX_OFFSET
INCREMENT RX_OFFSET BY ONE
RX_TIMESTAMP_COUNTER = RX_PACKET_TIMESTAMP + RX_OFFSET × (ENDPOINT B TIMESTAMP CLOCK RATE / ENDPOINT B SAMPLING CLOCK RATE)
PLAY THE SAMPLE
TX_OFFSET = 0? (YES / NO)

FIG. 8B
UPDATE THE ENDPOINT A TIMESTAMP COUNTER AND PUT THE TX_SEQUENCE AND ENDPOINT A TIMESTAMP IN THE RTP HEADER
PUT THE (RX_TIMESTAMP_COUNTER)/(RX_SEQUENCE AND RX_OFFSET) IN THE RTP HEADER EXTENSION
RECORD A SAMPLE
PLACE THE SAMPLE IN THE RTP PACKET FOR TRANSMISSION AT OFFSET TX_OFFSET
INCREMENT TX_OFFSET BY ONE
RTP TRANSMISSION PACKET FULL? (YES / NO)
SEND THE RTP PACKET TO THE ENDPOINT
SEND COPY OF THE RTP PACKET TO THE RECORDING DEVICE
ALLOCATE EMPTY BUFFER FOR THE NEXT RTP PACKET; SET TX_OFFSET TO ZERO; INCREMENT TX_SEQUENCE BY ONE
SYNCHRONIZATION FLAG SET? (YES / NO)
RESET SYNCHRONIZATION FLAG

FIG. 9 (RECORDING METHOD: RECORDING DEVICE)
RECEIVE TRANSMIT RTP PACKET FROM ENDPOINT
BUFFER RECEIVE PACKETS
STORE PACKETS IN SEQUENCE ORDER IN MEMORY
END

FIG. 10 (PLAYBACK METHOD: RECORDING DEVICE)
TIMESTAMP INDICATION USED? (YES / NO)
NO: EXTRACT THE SEQUENCE NUMBER AND THE OFFSET FROM THE RTP HEADER EXTENSION AND SAVE THEM IN RX_B_SEQUENCE AND RX_B_OFFSET, RESPECTIVELY
YES: EXTRACT TIMESTAMP FROM THE RTP HEADER EXTENSION AND SAVE IT IN RX_PACKET_B_TIMESTAMP
SET NUMBER_OF_SAMPLES TO THE NUMBER OF SAMPLES IN THE RTP PACKET PAYLOAD
SILENCE INDICATION DETECTED? (YES / NO)
NO: GET A VECTOR OF SAMPLES TO REPLAY
YES: APPEND A VECTOR OF ZEROS TO THE PA(n) VECTOR
PLAY RECONSTRUCTED AUDIO
END

`FOUR CHANNEL AUDIO RECORDING IN A
`PACKET BASED NETWORK
`
`FIELD OF THE INVENTION
`
`The present invention relates generally to voice over IP
`networks and more particularly relates to four-channel audio
`recording for use in a packet-based network.
`
`BACKGROUND OF THE INVENTION
`
Separate Voice and Data Networks

Currently, there is a growing trend to converge voice and data networks so that both utilize the same network infrastructure. The currently available systems that combine voice and data have limited applications and scope. An example is Automatic Call Distribution (ACD), which permits service agents in call centers to access customer files in conjunction with incoming telephone calls. ACD centers, however, remain costly and difficult to deploy, requiring custom systems integration in most cases. Another example is the voice logging/auditing system used by emergency call centers (e.g., 911) and financial institutions. Deployment has been limited due to the limited scalability of the system, since voice is on one network and data is on another, both tied together by awkward database linkages.

The aim of IP telephony is to provision voice over IP based networks in both the local area network (LAN) and the wide area network (WAN). Currently, voice and data generally flow over separate networks; the goal is to transmit them both over a single medium and on a single network.

A block diagram illustrating example separate prior art data and voice networks is shown in FIG. 1. The LAN portion, generally referenced 10, comprises the LAN cabling infrastructure, routers, switches and gateways 12 and one or more network devices connected to the LAN. Examples of typical network devices include servers 14, workstations 16 and printers. The voice portion, generally referenced 20, has at its core a private branch exchange (PBX) 24 which comprises one or more trunk line interfaces and one or more telephone and/or facsimile extension interfaces. The PBX is connected to the public switched telephone network (PSTN) 22 via one or more trunk lines 28, e.g., analog, T1, E1, T3, ISDN, etc. A plurality of user telephones 26 and one or more facsimile machines 27 are also connected directly to the PBX via phone line extensions 29.

The paradigm currently in widespread use consists of a circuit switched fabric 20 for voice networks and a completely separate LAN infrastructure 10 for data. Most enterprises today use proprietary PBX equipment for voice traffic.

Voice and Data Over a Shared Network

An increasingly common IP telephony paradigm consists of telephone and data tightly coupled on IP packet-based, switched, multimedia networks where voice and data share a common transport mechanism. It is expected that this paradigm will spur the development of a wealth of new applications that take advantage of the simultaneous delivery of voice and data over a single unified fabric.

A block diagram illustrating a voice over an IP network where voice and data share a common infrastructure is shown in FIG. 2. The IP telephony system, generally referenced 30, comprises a LAN infrastructure represented by an Ethernet switch 32, a router, one or more telephones 36, workstations 34, a gateway 42, a gatekeeper 46, a PBX 33 with a LAN interface port and a Layer 3 switch 38. The key components of an IP telephony system 30 are the modified desktop, gatekeeper and gateway entities. For the desktop, users may have an Ethernet phone 36 that plugs into an Ethernet RJ-45 jack or a handset or headset 35 that plugs into a PC 37.

Today, all LAN based telephony systems need to connect to the PSTN 44. The gateway is the entity that is specifically designed to convert voice from the IP domain to the PSTN domain. The gatekeeper is primarily the IP telephony equivalent of the PBX in the PSTN world.

Typically, the IP telephony traffic is supported by a packet-based infrastructure such as an Ethernet network, but a circuit-based infrastructure can be used as well with some provisions (e.g., ATM LAN emulation on ATM networks). Telephony calls traversing the intranet may pass through a Layer 3 switch 38 or a router (not shown) connecting a corporate intranet 40. The Layer 3 switch and the router should support Quality of Service (QoS) features such as IEEE 802.1p and 802.1Q and the Resource Reservation Protocol (RSVP).

ITU-T Recommendation H.323

The International Telecommunication Union (ITU-T) Telecommunication Standardization Sector has issued a number of standards related to telecommunications. The Series H standards deal with audiovisual and multimedia systems and describe standards for systems and terminal equipment for audiovisual services. The H.323 standard is an umbrella standard that covers various audio and video encoding standards. Related standards include H.225.0, which covers media stream packetization and call signaling protocols, and H.245, which covers audio and video capability exchange, management of logical channels and transport of control and indication signals. Details describing these standards can be found in ITU-T Recommendation H.323 (Draft 4, Aug. 1999), ITU-T Recommendation H.225.0 (February 1998) and ITU-T Recommendation H.245 (Jun. 3, 1999).

A block diagram illustrating example prior art H.323 compliant terminal equipment is shown in FIG. 3. The H.323 terminal 50 comprises a video codec 52, audio codec 54, system control 56 and H.225.0 layer 64. The system control comprises H.245 control 58, call control 60 and Registration, Admission and Status (RAS) control 62. Attached video equipment 66 includes any type of video equipment, such as cameras and monitors including their control and selection, and various video processing equipment. Attached audio equipment 70 includes devices such as those providing voice activation sensing, microphones, loudspeakers, telephone instruments and microphone mixers. Data applications and associated user interfaces 72 include those that use the T.120 real time audiographics conferencing standard or other data services over the data channel. The attached system control and user interface 74 provides the human user interface for system control. The network interface 68 provides the interface to the IP based network.

The video codec 52 functions to encode video signals from the video source (e.g., video camera) for transmission over the network and to decode the received video data for output to a video display. If a terminal incorporates video communications, it must be capable of encoding and decoding video information in accordance with H.261. A terminal may also optionally support encoding and decoding video in accordance with other recommendations such as H.263.

The audio codec 54 functions to encode audio signals from the audio source (e.g., microphone) for transmission over the network and to decode the received audio data for output to a loudspeaker. All H.323 audio terminals must be capable of encoding and decoding speech in accordance with G.711, including both A-law and μ-law encoding. Other types of audio that may be supported include G.722, G.723, G.728 and G.729.
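To make the G.711 requirement concrete, the following Python sketch shows the classic segment-based μ-law companding step (16-bit signed linear PCM to an 8-bit code and back), modeled on the widely circulated reference C routines. The function names are illustrative, A-law is omitted, and the sketch is not a conformance-tested codec.

```python
BIAS = 0x84    # 132, added to the magnitude before the segment search
CLIP = 32635   # largest magnitude that still fits in 15 bits after adding BIAS


def linear_to_ulaw(pcm: int) -> int:
    """Encode a 16-bit signed linear sample as an 8-bit G.711 mu-law code."""
    sign = 0
    if pcm < 0:
        pcm = -pcm
        sign = 0x80
    pcm = min(pcm, CLIP) + BIAS
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not pcm & mask:  # find the segment (position of the top bit)
        exponent -= 1
        mask >>= 1
    mantissa = (pcm >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF  # mu-law codes are bit-inverted


def ulaw_to_linear(code: int) -> int:
    """Decode an 8-bit G.711 mu-law code to a 16-bit signed linear sample."""
    code = ~code & 0xFF
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if code & 0x80 else magnitude
```

Each code keeps a sign, a 3-bit segment and a 4-bit step within the segment, so the quantization error grows with amplitude (at most 512 in the loudest segment) while quiet speech keeps fine resolution.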
`
The data channel supports telematic applications such as electronic whiteboards, still image transfer, file exchange, database access, real time audiographics conferencing (T.120), etc. The system control unit 56 provides services as defined in the H.245 and H.225.0 standards. For example, the system control unit provides signaling for proper operation of the H.323 terminal, call control, capability exchange, signaling of commands and indications, and messaging to describe the content of logical channels. The H.225.0 layer 64 is operative to format the transmitted video, audio, data and control streams into messages for output to the network interface. It also functions to retrieve the received video, audio, data and control streams from messages received from the network interface 68.
`
The gateway functions to convert voice from the IP domain to the PSTN domain. In particular, it converts IP packetized voice to a format that can be accepted by the PSTN. The actual format depends on the type of media and protocol used for connecting to the PSTN (e.g., T1, E1, ISDN BRI, ISDN PRI, analog lines, etc.). The gateway provides the appropriate translation between different video, audio and data transmission formats and between different communications procedures and media.

Note that since the digitization format for voice on the IP packet network is often different from that on the PSTN, the gateway needs to provide this type of conversion, known as transcoding. Gateways also function to pass signaling information such as dial tone, busy tone, etc. Typical connections supported by the gateway include analog, T1, E1, ISDN, frame relay and ATM at OC-3 and higher rates. Additional functions performed by the gateway include call setup and clearing on both the network side and the PSTN side. The gateway may be omitted if communication with the PSTN is not required.
`The gatekeeper functions to provide call control services,
`address translation services, call routing services, call autho-
`rization services, billing, bandwidth management and tele-
`phony supplementary services like call forwarding and call
`transfer to terminal endpoints on the network. It is primarily
`designed to be the IP telephony equivalent of the PBX.
`Logical endpoints register themselves with the gatekeeper
`before attempting to bring up a session. The gatekeeper may
`deny a request to bring up a session or may grant the request
`at a reduced data rate. This is particularly relevant to video
`connections that typically consume huge amounts of band-
`width for a high quality connection.
Call control signaling through the gatekeeper is optional: the gatekeeper may choose to complete the call signaling with the H.323 endpoints and process the call signaling itself, or it may direct the endpoints to connect the call signaling channel directly to each other, in which case the gatekeeper avoids handling the H.225.0 call control signals.
`Through the use of H.225.0 signaling, the gatekeeper may
`reject calls from a terminal due to authorization failure. The
`reasons for rejection may include restricted access to or from
`particular terminals or gateways, or restricted access during
`certain time periods.
Bandwidth management entails controlling the number of H.323 terminals that are allowed to simultaneously access the network. Via H.225.0 signaling, the gatekeeper may reject calls from a terminal due to bandwidth limitations. This may occur if the gatekeeper determines that there is not sufficient bandwidth available on the network to support the call.
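The admission decision described above can be sketched as a small Python class. Everything here is hypothetical: the kbit/s budget, the `BandwidthManager` name, and the rule of granting whatever bandwidth remains when it is at least a caller-supplied minimum, which stands in for the grant "at a reduced data rate" mentioned earlier.

```python
from typing import Dict, Optional


class BandwidthManager:
    """Hypothetical gatekeeper admission control over a fixed bandwidth budget."""

    def __init__(self, capacity_kbps: int) -> None:
        self.capacity_kbps = capacity_kbps
        self.active: Dict[str, int] = {}  # currently active calls -> granted kbit/s

    def available(self) -> int:
        return self.capacity_kbps - sum(self.active.values())

    def admit(self, call_id: str, requested_kbps: int, minimum_kbps: int) -> Optional[int]:
        """Grant the requested rate, a reduced rate, or None to reject the call."""
        free = self.available()
        if free >= requested_kbps:
            granted = requested_kbps
        elif free >= minimum_kbps:
            granted = free            # admit at a reduced data rate
        else:
            return None               # not enough bandwidth: reject the call
        self.active[call_id] = granted
        return granted

    def disengage(self, call_id: str) -> None:
        """Release the bandwidth held by a call that has ended."""
        self.active.pop(call_id, None)
```

The `active` dictionary doubles as the list of currently active calls that the call management function uses to report a terminal as busy.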
`
The call management function performed by the gatekeeper includes maintaining a list of currently active H.323 calls. This information is used to indicate that a terminal is busy and to provide information for the bandwidth management function.
`
`The gatekeeper also provides address translation whereby
`an alias address is translated to a Transport Address. This is
`performed using a translation table that is updated using
`Registration messages, for example.
`
`Real-Time Transport Protocol
`
The H.225.0 standard dictates the usage of the Real-Time Transport Protocol (RTP), which is defined by the IETF in RFC 1889, for conveying the data between the call endpoints and for monitoring network congestion. The RTP protocol defines the RTP packet structure, which includes two parts: the RTP packet header and the RTP packet payload. The RTP packet header includes several fields, among them the payload type identification field, the sequence numbering field and the time stamping field. Typically, applications encapsulate RTP in a UDP packet. UDP/IP is an unreliable transport mechanism and therefore there is no guarantee that an RTP packet will reach its destination. RTP may, however, be used with other suitable underlying network or transport protocols.
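The header fields named above can be illustrated with a short Python sketch of the 12-byte RTP fixed header defined in RFC 1889 (the layout is unchanged in its successor, RFC 3550). The class and function names are illustrative only, not part of any standard API.

```python
from __future__ import annotations

import struct
from dataclasses import dataclass

RTP_VERSION = 2
_FIXED = struct.Struct("!BBHII")  # V/P/X/CC, M/PT, sequence, timestamp, SSRC


@dataclass
class RtpHeader:
    payload_type: int   # 7-bit payload type identification field
    sequence: int       # 16-bit sequence number, wraps at 65536
    timestamp: int      # 32-bit timestamp in units of the media clock
    ssrc: int           # 32-bit synchronization source identifier
    marker: bool = False


def pack_rtp(header: RtpHeader, payload: bytes) -> bytes:
    """Build a packet with the fixed header only (no padding, extension or CSRCs)."""
    byte0 = RTP_VERSION << 6
    byte1 = (int(header.marker) << 7) | (header.payload_type & 0x7F)
    return _FIXED.pack(byte0, byte1, header.sequence & 0xFFFF,
                       header.timestamp & 0xFFFFFFFF, header.ssrc & 0xFFFFFFFF) + payload


def parse_rtp(packet: bytes) -> tuple[RtpHeader, bytes]:
    """Split a packet into its header fields and its payload."""
    if len(packet) < _FIXED.size:
        raise ValueError("shorter than the 12-byte RTP fixed header")
    byte0, byte1, seq, ts, ssrc = _FIXED.unpack_from(packet)
    if byte0 >> 6 != RTP_VERSION:
        raise ValueError("not an RTP version 2 packet")
    offset = _FIXED.size + 4 * (byte0 & 0x0F)      # skip CSRC identifiers
    if byte0 & 0x10:                                # X bit: skip the header extension
        _profile, words = struct.unpack_from("!HH", packet, offset)
        offset += 4 + 4 * words
    end = len(packet)
    if byte0 & 0x20:                                # P bit: last byte counts padding
        end -= packet[-1]
    header = RtpHeader(byte1 & 0x7F, seq, ts, ssrc, bool(byte1 & 0x80))
    return header, packet[offset:end]
```

For 20 ms of 8 kHz G.711 audio, the payload is 160 bytes and the packet is 172 bytes before UDP/IP headers are added.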
`RTP does not itself provide any mechanism to ensure
`timely delivery or other QoS guarantees, but relies on lower
`layer services to do so. It also does not guarantee delivery,
`nor does it assume that the underlying network is reliable
`and delivers packets in sequence. RTP includes sequence
`numbers and timestamps in the packet to allow the receiver
`to reconstruct the sender’s packet sequence and timing.
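Reconstructing the sender's packet order requires comparing 16-bit sequence numbers that wrap around. A minimal Python sketch, assuming that all packets being sorted lie within half the sequence space (32768 packets) of the first one; the function names are illustrative.

```python
from __future__ import annotations

SEQ_MOD = 1 << 16  # RTP sequence numbers are 16 bits wide


def extend_seq(seq: int, reference: int) -> int:
    """Map a 16-bit sequence number onto the integer closest to `reference`."""
    delta = (seq - reference) % SEQ_MOD
    if delta >= SEQ_MOD // 2:   # more than half the space ahead: treat it as behind
        delta -= SEQ_MOD
    return reference + delta


def reorder(packets: list[tuple[int, bytes]]) -> list[bytes]:
    """Return payloads in the sender's order, given (sequence, payload) pairs as they arrived."""
    if not packets:
        return []
    reference = packets[0][0]
    ordered = sorted(packets, key=lambda p: extend_seq(p[0], reference))
    return [payload for _, payload in ordered]
```

For example, packets arriving as 65534, 0, 65535, 1 are returned in the order 65534, 65535, 0, 1.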
RTP is intended to be flexible so as to provide the information required by a particular application. Unlike conventional protocols, in which additional functions may be accommodated by making the protocol more general or by adding an option mechanism that requires parsing, RTP can be tailored through modifications and/or additions to the headers.
`
`The RTP Control Protocol (RTCP) functions to periodi-
`cally transmit control packets to all participants in a session.
`The primary function of RTCP is to provide feedback on the
`quality of the data distribution that is useful for monitoring
`network congestion. The RTCP protocol
`is designed to
`monitor the quality of service and to convey information
`about the participants in an on-going session. RTCP also
`carries a transport level identifier for an RTP source called
`the canonical name or CNAME. Receivers require the
`CNAME to associate multiple data streams from a given
`participant in a set of related RTP sessions. The RTCP
`protocol can also be used to convey session control infor-
`mation such as participant identification. Each RTCP packet
`begins with a fixed header followed by structured elements
`of variable length. Note that the signaling/control informa-
`tion carried in the RTCP packets is transmitted using TCP/IP
`reliable protocol.
Also under the H.323 protocol umbrella are a number of standards for voice codecs including, for example, G.711, G.729, G.729.1 and G.723.1.
`
`Call Signaling
`
Call signaling encompasses the messages and procedures used to establish a call, request changes in bandwidth of the call, get the status of the endpoints in the call and disconnect the call. Call signaling uses messages defined in the H.225.0 standard. In particular, the RAS signaling function uses H.225.0 messages to perform registration, admissions, bandwidth changes, status and disengage procedures between endpoints and Gatekeepers. The RAS Signaling Channel is independent from the Call Signaling Channel and the H.245 Control Channel.
`
Each H.323 entity has at least one network address that uniquely identifies the H.323 entity on the network. For each network address, each H.323 entity may have several TSAP identifiers that enable the multiplexing of several channels sharing the same network address. Endpoints have one well-known TSAP identifier known as the Call Signaling Channel TSAP Identifier. In addition, Gatekeepers have one well-known TSAP identifier known as the RAS Channel TSAP Identifier, and one well-known multicast address known as the Discovery Multicast Address. Endpoints and H.323 entities use dynamic TSAP Identifiers for the H.245 Control Channel, Audio Channels, Video Channels and Data Channels, while the Gatekeeper uses a dynamic TSAP Identifier for Call Signaling Channels.

Further, an endpoint may have one or more alias addresses associated with it. An alias address represents the endpoint and provides an alternate method of addressing the endpoint. It is important to note that an endpoint may have more than one alias address that translates to the same TSAP. The alias may comprise, for example, private telephone numbers, E.164 numbers, any alphanumeric string that may represent a name, e-mail address, etc. In addition, the alias may comprise a MAC address, IP address, ATM address, access token, DNS address, TSAP as IP address concatenated with port number, or name alias. Note that alias addresses are unique within a zone and that gatekeepers do not have alias addresses.
`
When there is a Gatekeeper in the network, the calling endpoint addresses the called endpoint by its Call Signaling Channel Transport Address or by its alias address. The Gatekeeper translates the latter into a Call Signaling Channel Transport Address.

An endpoint joins a zone via the registration process, whereby it informs the Gatekeeper of its Transport Addresses and one or more associated alias addresses. Note that registration must take place before any calls are attempted. When endpoints are powered up, they look on the network for the Gatekeeper and, once it is found, register their TSAP and one or more aliases with it.
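The registration and alias translation described above amount to a lookup table kept per zone. The Python sketch below is hypothetical (the `Gatekeeper` class and its methods are not H.225.0 RAS messages); it only illustrates registration before calls, zone-unique aliases, and several aliases resolving to one transport address.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

TransportAddress = Tuple[str, int]  # IP address and port (a TSAP)


@dataclass
class Gatekeeper:
    """Toy zone registry mapping alias addresses to Call Signaling Channel addresses."""
    table: Dict[str, TransportAddress] = field(default_factory=dict)

    def register(self, call_signaling: TransportAddress, aliases: List[str]) -> None:
        # Alias addresses must be unique within a zone, so reject any clash up front.
        for alias in aliases:
            owner = self.table.get(alias)
            if owner is not None and owner != call_signaling:
                raise ValueError(f"alias {alias!r} is already registered in this zone")
        for alias in aliases:
            self.table[alias] = call_signaling

    def translate(self, alias: str) -> TransportAddress:
        """Resolve an alias; raises KeyError if no endpoint has registered it."""
        return self.table[alias]
```

In a usage such as `gk.register(("10.0.0.5", 1720), ["4005", "alice@example.com"])`, port 1720 is the well-known H.323 call signaling port; the IP address and aliases are made up.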
`
Prior Art Four Channel Audio Recording

In LAN telephony applications, the voice samples generated are packed within RTP packets that are then encapsulated within UDP/IP packets. The UDP packets that travel over an IP network may, however, be delayed, dropped or arrive out of order from their original transmission sequence, depending on the degree of network congestion. Therefore, the rate at which packets arrive at the receive side is not constant.
`
In order to combat the delay problems, many devices implement a jitter buffer on the receive side. If packets are only delayed on the network, arriving at the receiver before the jitter buffer underflows, the receive side will hear the sound as it was originally transmitted by the transmitting endpoint. If, however, packets are dropped or are delayed so much that the jitter buffer underflows (i.e., becomes empty), the receiving device either (1) replays the last packet received or (2) injects silence. Thus, in the event packets are dropped or are delayed excessively, causing jitter buffer underflow, the sound that is played on the receive side is not the original sound that was transmitted.
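The two underflow behaviors above can be illustrated with a toy jitter buffer in Python. It is a sketch under simplifying assumptions: sequence numbers are already unwrapped, there is no adaptive play-out delay, and a packet arriving after its play-out slot is discarded. The second element of each result records what was actually played, which is the information a recorder cannot infer from the received stream alone.

```python
from typing import Dict, List, Optional, Tuple


class JitterBuffer:
    """Toy receive-side jitter buffer keyed by (already unwrapped) sequence number."""

    def __init__(self, frame_len: int, replay_on_underflow: bool = True) -> None:
        self.frame_len = frame_len
        self.replay_on_underflow = replay_on_underflow
        self.frames: Dict[int, List[int]] = {}
        self.next_seq: Optional[int] = None       # sequence number due for play-out
        self.last_frame: Optional[List[int]] = None

    def put(self, seq: int, samples: List[int]) -> None:
        if self.next_seq is None:
            self.next_seq = seq
        if seq >= self.next_seq:                  # too-late packets are discarded
            self.frames[seq] = list(samples)

    def get(self) -> Tuple[List[int], str]:
        """Return the next frame to play and whether it was received, replayed or silence."""
        frame = None
        if self.next_seq is not None:
            frame = self.frames.pop(self.next_seq, None)
            self.next_seq += 1
        if frame is not None:
            self.last_frame = frame
            return list(frame), "received"
        if self.replay_on_underflow and self.last_frame is not None:
            return list(self.last_frame), "replayed"   # option (1): repeat the last packet
        return [0] * self.frame_len, "silence"         # option (2): inject silence
```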
Many audio applications, including voice, require that the audio (or voice) be recorded at one or both ends of a conversation. A block diagram illustrating a prior art packet-based four channel audio recorder is shown in FIG. 4. The system, generally referenced 80, comprises a packet network 88 to which are connected a plurality of endpoints 82, such as endpoints A and B. Each endpoint comprises a loudspeaker (not shown) for generating audio and a microphone for converting audio, i.e., voice, to an electrical signal. Each endpoint is operative to receive an Rx signal 90 from the other endpoint and to generate a Tx signal 92 to the other side.
`
The system further comprises a four channel IP recorder device 94 that is adapted to receive a plurality of digitized audio channels and record them on storage media such as a hard disk, flash memory disk, RAM, NVRAM, magnetic tape, etc. Each endpoint sends two separate channels of audio to the recording device: (1) a played audio channel and (2) a transmitted audio channel. Endpoint A is adapted to send a separate played audio signal PA(n) 96 and a transmitted audio signal TA(n) 98 to the recording device. Note that the received signal (Rx 90) is not forwarded to the recorder, as this signal is not necessarily the signal that is played by the endpoint. Similarly, endpoint B is adapted to send a separate played audio signal PB(n) 100 and a transmitted audio signal TB(n) 102 to the recording device.
A requirement of any accurate recording system is to be able to faithfully play back the sound that was originally recorded. In a packet telephony system, a recorder must be able to play back the sound that was generated on the side of the talking endpoint (i.e., sent by the transmitter) in addition to the sound that was played at the listening endpoint (i.e., the playback signal sent to the loudspeaker). Therefore, each endpoint must forward two separate audio streams: the audio that is played through the speaker and the audio that is transmitted to the other side.
`
In addition, the recording device must synchronize the four channels of audio it receives from the two endpoints. It must synchronize not only playback and transmit audio between the two endpoints, but also transmit and playback audio from the same endpoint.
`
`SUMMARY OF THE INVENTION
`
The present invention provides an apparatus for and a method of audio recording in packet-based telephony systems. Using the present invention, the equivalent of four audio channels is recorded utilizing only two recording channels. Each recorded channel comprises the stream of packets (e.g., RTP packets) generated and transmitted by each endpoint to the other side. The RTP packets include the samples generated by the particular endpoint in addition to an indication (e.g., a pointer) of the samples received from the other side that were actually played by the endpoint. Note that the audio played on an endpoint is not necessarily the samples received from the other side.
`
The transmit data generated by each side of a connection, including the indication of the samples played, is sent to the recording device. The recording device is operative to store the received packet stream on some type of storage media such as a hard disk drive, a flash memory disk, RAM, NVRAM, magnetic tape, etc. The recording device comprises means for synchronizing the audio stream of one endpoint to the audio stream from the other endpoint. The recording device must know what was played at the endpoint in order to accurately play back the audio samples generated by and received from the other endpoint. Thus, the recording device is effectively provided knowledge of the actual audio played on both ends of the connection.
In one embodiment, a two channel IP recording device is adapted to receive a single packet stream generated by each side of a connection. The packet stream is transported from each endpoint to the recording device over a reliable connection, using either a reliable protocol such as TCP/IP, a point-to-point connection, or a circuit based connection. Note that the reliable connection need not be a real time connection. The packet stream includes the digital audio data generated on the endpoint, e.g., voice from a microphone, and an indication, e.g., a pointer, of the packet from the other side that was played on the endpoint. In a second embodiment, each endpoint comprises recording means for recording the transmit packet stream sent to the other side. A subsequent offline process combines and synchronizes the two recorded packet streams using the indications that were added to the RTP packets.
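The patent requires only that the endpoint-to-recorder link be reliable. One common way to carry discrete RTP packets over a TCP byte stream, shown here as an assumption rather than as the patent's method, is the 16-bit length prefix defined in RFC 4571. The `frame` and `Deframer` names are illustrative.

```python
import struct
from typing import List


def frame(packet: bytes) -> bytes:
    """Prefix an RTP packet with its 16-bit length so it can travel over a TCP stream."""
    if len(packet) > 0xFFFF:
        raise ValueError("packet too long for a 16-bit length prefix")
    return struct.pack("!H", len(packet)) + packet


class Deframer:
    """Recover whole packets from TCP reads of arbitrary size."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, data: bytes) -> List[bytes]:
        self._buf += data
        packets = []
        while len(self._buf) >= 2:
            (length,) = struct.unpack_from("!H", self._buf)
            if len(self._buf) < 2 + length:
                break                 # wait for the rest of this packet
            packets.append(bytes(self._buf[2:2 + length]))
            del self._buf[:2 + length]
        return packets
```

Because TCP delivers bytes in order and retransmits losses, the recorder receives every packet the endpoint sent, even if not in real time.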
`Since the recorder receives the audio signal that was
`generated and transmitted from each endpoint, it can recon-
`struct the audio signal
`that was actually played on the
`endpoint. To playback an audio signal, the recording device
`needs to know the samples that were actually played on each
`endpoint. The recorder is provided knowledge of the audio
`played on the other end via information transmitted in the
`data sample packets it receives. Each endpoint is adapted to
`include an indication of the audio that it played, with the
`packet of data samples sent to the recorder.
To perform accurate playback, the recording device needs to know, for each sample an endpoint recorded, which sample the endpoint played at that time. The recording device is provided knowledge of the audio played on the endpoint via information transmitted in the header and header extension portions of the RTP packets and via knowledge of the number of samples in the payload portion of the RTP packet.
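A recorder-side sketch in Python of the reconstruction described above, following the playback flowchart of FIG. 10 (get a vector of samples to replay, or append a vector of zeros when a silence indication is detected). It assumes the timestamp form of the indication, one timestamp tick per sample by default, and that the samples endpoint A played during one of its packets are consecutive in endpoint B's stream; `None` is a hypothetical encoding of the silence indication.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def index_far_end(b_packets: Sequence[Tuple[int, Sequence[int]]],
                  ticks_per_sample: int = 1) -> Dict[int, int]:
    """Map every sample endpoint B transmitted to its own RTP timestamp."""
    index: Dict[int, int] = {}
    for packet_ts, samples in b_packets:
        for i, sample in enumerate(samples):
            index[(packet_ts + i * ticks_per_sample) & 0xFFFFFFFF] = sample
    return index


def reconstruct_played(a_packets: Sequence[Tuple[Optional[int], int]],
                       b_packets: Sequence[Tuple[int, Sequence[int]]],
                       ticks_per_sample: int = 1) -> List[int]:
    """Rebuild PA(n), the audio endpoint A played, from A's indications and B's stream.

    a_packets holds (played timestamp or None for silence, number of payload samples)
    for each of A's packets, in sequence order.
    """
    index = index_far_end(b_packets, ticks_per_sample)
    pa: List[int] = []
    for played_ts, n_samples in a_packets:
        if played_ts is None:                     # silence was played
            pa.extend([0] * n_samples)
            continue
        for i in range(n_samples):
            pa.append(index.get((played_ts + i * ticks_per_sample) & 0xFFFFFFFF, 0))
    return pa
```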
`There are two methods by which an endpoint informs the
`recording device which samples were played when the
`samples in the data packet were recorded: the first method
`uses timestamps and the second method uses the RTP packet
`sequence numbers and offset pointers into the RTP packets.
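The two kinds of indication can be modeled as separate Python types that a recorder resolves against the far end's packets. The type and function names are hypothetical; the sketch assumes one timestamp tick per sample and ignores timestamp wraparound.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class TimestampIndication:
    timestamp: int   # far-end RTP timestamp of the sample being played


@dataclass(frozen=True)
class SequenceOffsetIndication:
    sequence: int    # far-end RTP sequence number of the packet being played
    offset: int      # index of the sample within that packet's payload


Indication = Union[TimestampIndication, SequenceOffsetIndication]


def resolve(indication: Indication, far_end: Dict[int, Tuple[int, List[int]]]) -> int:
    """Return the far-end sample an indication points at.

    far_end maps sequence number -> (RTP timestamp, samples).
    """
    if isinstance(indication, SequenceOffsetIndication):
        _timestamp, samples = far_end[indication.sequence]
        return samples[indication.offset]
    for timestamp, samples in far_end.values():
        if timestamp <= indication.timestamp < timestamp + len(samples):
            return samples[indication.timestamp - timestamp]
    raise KeyError(indication.timestamp)
```

Both forms identify the same sample: with packet 8 starting at timestamp 104, the pair (sequence 8, offset 2) and timestamp 106 resolve identically.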
`In the timestamp method, each endpoint is adapted to
`include the timestamp of the packet of audio that is played,
`with the packet of data samples sent to the recording device.
`Thus, two timestamps are sent in the RTP packet including
`(1) a first timestamp of the data samples generated by the
`endpoint
`(this timestamp value is taken when the first
`sample in the packet is taken) and (2) a second timestamp of
`the packet received from the other endpoint and played at a
`point in time when the first sample of the local endpoint
`packet is generated.
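RFC 1889 allows an RTP header to carry one extension: a 16-bit profile-defined value, a 16-bit length counting 32-bit words, and then the extension data. The Python sketch below places the second (played) timestamp in such an extension. The profile value and the one-word layout are assumptions for illustration; the text above does not fix them.

```python
import struct
from typing import NamedTuple, Optional

PLAYED_TS_PROFILE = 0x0001          # hypothetical "defined by profile" value
_FIXED = struct.Struct("!BBHII")    # V/P/X/CC, M/PT, sequence, timestamp, SSRC
_EXT = struct.Struct("!HHI")        # profile value, length in 32-bit words, played timestamp


class RecordedPacket(NamedTuple):
    sequence: int
    tx_timestamp: int       # timestamp of the first locally generated sample
    played_timestamp: int   # far-end timestamp being played when that sample was taken
    payload: bytes


def build_packet(seq: int, tx_timestamp: int, played_timestamp: int, ssrc: int,
                 payload: bytes, payload_type: int = 0) -> bytes:
    byte0 = (2 << 6) | 0x10         # RTP version 2 with the X (extension) bit set
    header = _FIXED.pack(byte0, payload_type & 0x7F, seq & 0xFFFF,
                         tx_timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    extension = _EXT.pack(PLAYED_TS_PROFILE, 1, played_timestamp & 0xFFFFFFFF)
    return header + extension + payload


def read_packet(packet: bytes) -> Optional[RecordedPacket]:
    """Extract both timestamps, or None if the packet carries no played indication."""
    byte0, _pt, seq, tx_ts, _ssrc = _FIXED.unpack_from(packet)
    if not byte0 & 0x10:
        return None
    offset = _FIXED.size + 4 * (byte0 & 0x0F)      # skip any CSRC identifiers
    profile, words, played = _EXT.unpack_from(packet, offset)
    if profile != PLAYED_TS_PROFILE or words < 1:
        return None
    return RecordedPacket(seq, tx_ts, played, packet[offset + 4 + 4 * words:])
```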
`Each endpoint is operative to track the timestamp of the
`data samples received encapsulated in RTP packets sent
`from the other endpoint. These data samples are subse-
`quently played by the endpoint
`through its associated
`speaker. The data samples generated by the endpoint are
`timestamped and placed in RTP packets. In addition, the
`timestamp of the data samples played by the other endpoint
`at that moment in time is also placed in the extension portion
`of the header of the RTP packet sent to the recording device.
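The tracking described above corresponds to the formula in the flowchart of FIG. 8A: RX_TIMESTAMP_COUNTER = RX_PACKET_TIMESTAMP + RX_OFFSET × (timestamp clock rate / sampling clock rate). The Python sketch below evaluates it for the sample at offset RX_OFFSET, counted from zero. The function names and the 8 kHz defaults (typical for G.711) are assumptions, and integer division truncates when the two clock rates are not multiples of each other.

```python
from typing import List


def rx_timestamp_counter(rx_packet_timestamp: int, rx_offset: int,
                         timestamp_clock_hz: int, sampling_clock_hz: int) -> int:
    """Far-end timestamp of the sample at index rx_offset within a received packet."""
    ticks = rx_offset * timestamp_clock_hz // sampling_clock_hz
    return (rx_packet_timestamp + ticks) & 0xFFFFFFFF   # RTP timestamps wrap at 2**32


def played_timestamps(rx_packet_timestamp: int, n_samples: int,
                      timestamp_clock_hz: int = 8000, sampling_clock_hz: int = 8000) -> List[int]:
    """Timestamp counter value for each sample of one received packet as it is played."""
    return [rx_timestamp_counter(rx_packet_timestamp, offset, timestamp_clock_hz, sampling_clock_hz)
            for offset in range(n_samples)]
```

When the endpoint starts a new outgoing packet, the counter value at that moment is the one copied into the header extension.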
`
`If the last packet received was replayed, an indication is
`placed in the header extension of the packet that comprises
`the timestamp of the most recently received RTP packet. If
`a silence is played, a zero is placed in the header extension.
`The completed RTP packet is then sent over a real time
`connection (e.g., UDP/IP) to the remote endpoint for