`
`(12) United States Patent
`Talwar
`
(10) Patent No.: US 8,111,326 B1
(45) Date of Patent: Feb. 7, 2012
`
(54) POSTCAPTURE GENERATION OF SYNCHRONIZATION POINTS FOR AUDIO TO SYNCHRONIZE VIDEO PORTIONS CAPTURED AT MULTIPLE CAMERAS
`(75) Inventor: Abhishek Talwar, New Delhi (IN)
(73) Assignee: Adobe Systems Incorporated, San Jose, CA (US)
`
(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 1294 days.
`
`(21) Appl. No.: 11/805,830
(22) Filed: May 23, 2007
`
(51) Int. Cl.
    H04N 5/04   (2006.01)
    H04N 5/932  (2006.01)
    H04N 5/93   (2006.01)
    G06F 1/12   (2006.01)
    G01C 21/00  (2006.01)
(52) U.S. Cl. .......... 348/500; 348/14.08; 348/423.1; 725/67; 713/400; 701/215; 386/201; 386/203; 386/282
(58) Field of Classification Search .......... None
`See application file for complete search history.
`
(56) References Cited

U.S. PATENT DOCUMENTS
`5,479,351 A 12/1995 Woo et al.
`
6,430,361 B2      8/2002  Lee
7,015,954 B1      3/2006  Foote et al.
7,024,575 B2      4/2006  Lienhart et al.
7,057,663 B1*     6/2006  Lee .......... 348/423.1
7,126,630 B1     10/2006  Lee et al.
2001/0036356 A1  11/2001  Weaver et al.
2002/0018124 A1   2/2002  Mottur et al.
2005/0024488 A1   2/2005  Borg
2005/0206720 A1*  9/2005  Cheatle et al. .......... 348/14.08
2006/0002690 A1   1/2006  Stwertka et al.
2006/0248559 A1  11/2006  Michener et al.
2007/0035632 A1   2/2007  Silvernail et al.
FOREIGN PATENT DOCUMENTS

EP 1199892 4/2002

* cited by examiner
`
`Primary Examiner — Jefferey Harold
`Assistant Examiner — Sean Haiem
(74) Attorney, Agent, or Firm - Robert C. Kowert; Meyertons, Hood, Kivlin, Kowert & Goetzel, P.C.
`
(57) ABSTRACT
Embodiments of the invention relate generally to computing devices and systems, software, computer programs, applications, and user interfaces, and more particularly, to synchronizing portions of video as a function of the post-capture generation of synchronization points for audio, the portions of video being captured at multiple cameras.
`
`23 Claims, 12 Drawing Sheets
`
`
`
[Representative figure: an audio synchronization point generator produces post-capture synchronization point(s) 211 (SP1 ... SPn) from videos under analysis (i.e., VUA1 ... VUAn)]
`
`
`
[Sheet 1 of 12: FIG. 1A (Prior Art) - multi-camera arrangement capturing a subject amid ambient noise; FIG. 1B (Prior Art) - pre-production, production, and post-production work flow]
`
[Sheet 2 of 12: FIG. 2A - audio synchronization point generator producing post-capture synchronization point(s) from videos under analysis (i.e., VUA1 ... VUAn)]
`
`
[Sheet 3 of 12: FIG. 2B - audio synchronization point generator with a candidate synchronization point generator, including reference audio selector 235, band selector/generator 236, vicinity determinator 237, sound attribute analyzer 238, candidate synch point detector 239, and aligner 241; also shown: reference audio 231, alignment points 233a and 233b, vicinity range 243, specimen audio 244, and halves 245a/245b]
`
`
[Sheet 4 of 12: FIG. 2C - example operation: reference length RL = 30 min (1800 sec); band at alignment point 233a; vicinity range 243 (e.g., 40% of RL, or 720 sec) about alignment point 233b; specimen audio samples S0, S1 ... Sn]
`
`
`
[Sheet 5 of 12: FIG. 2D - amplitude (A0-A9) of samples S0-S10 versus time units t0-t10, with tolerance ranges 266 (e.g., 5% deviation); FIG. 2E - amplitude (e.g., dB), levels N0-N8, versus frequency f0-f10]
`
`
`
[Sheet 6 of 12: FIG. 3A - audio synchronization point generator 300 with candidate synch point generator, synchronization point certifier (including a confirmatory candidate synchronization point generator), and synchronizer 306; FIG. 3B - certifier comparing reference audio and specimen audio to produce match/no match signals]
`
`
`
[Sheet 7 of 12: FIG. 4 - audio-based synchronization panel with panel title bar, menu and/or toolbar, synchronization parameters, a reference audio track, and specimen audio tracks]
`
`
`
[Sheet 8 of 12: FIG. 5 - flow 500:
504: generate a post-capture synchronization point as a function of subject audio;
analyze an attribute of sound associated with the subject audio;
508: determine whether the attribute of sound for subsets of the subject audio is substantially equivalent;
510: generate a candidate synchronization point;
512: certify that the candidate synchronization point is a post-capture synchronization point;
514: synchronize at least two portions of video substantially at the post-capture synchronization point;
516: end]
`
`
`
[Sheet 9 of 12: FIG. 6 - flow 600:
602: begin;
604: designate that a content file from a plurality of content files includes a reference audio as a function of an amount of data;
606: extract the reference audio and a specimen audio from content files in a repository;
608: characterize a portion of the reference audio in terms of an audio attribute;
610: determine whether the specimen audio includes the characterized portion of the reference audio;
612: search for a pattern of the audio attribute associated with the characterized portion of the reference audio within a subset of the specimen audio;
614: synchronize video portions at a synchronization point that is based on the reference audio and the specimen audio;
616: end]
`
`
`
[Sheet 10 of 12: FIG. 7A - panel presentation application with I/F module 704, display module 706, rendering engine 708, logic module 712 (including an audio synchronization point generator), and panel generator 714, coupled to/from an application, operating system, or display; FIG. 7B - alternative arrangement with logic module 724, I/F module 726, display module 728, and rendering engine 730]
`
`
`
[Sheet 11 of 12: FIG. 8 - exemplary computer system: display, I/O device 816, cursor control 818, processor 804, disk drive 810, communication interface 812, and memory containing an operating system, an application, and an audio synchronization point generation module]
`
`
`
[Sheet 12 of 12: FIG. 9 - panel presentation system for editing video clips associated with post-capture synchronization points]
`
`
`
POSTCAPTURE GENERATION OF SYNCHRONIZATION POINTS FOR AUDIO TO SYNCHRONIZE VIDEO PORTIONS CAPTURED AT MULTIPLE CAMERAS
`
`FIELD OF THE INVENTION
`
Embodiments of the invention relate generally to computing devices and systems, software, computer programs, applications, and user interfaces, and more particularly, to synchronizing portions of video as a function of the post-capture generation of synchronization points for audio, the portions of video being captured at multiple cameras.
`
`BACKGROUND OF THE INVENTION
`
When editing audio and video captured by multiple cameras, traditional media editing applications typically operate on the premise that audio portions captured at different camera angles are coextensive with the captured video and align at a common point in time. But this is often not the case. In practice, the spatial arrangement of the multiple cameras, as well as the environment, contributes to deviations in audio relative to some point in time. These deviations, which can be as small as a fraction of a second, can lead to two or more captured audio portions being out of synchronization as perceived, for example, by a human listener.
FIG. 1A illustrates a multi-camera arrangement 100 for capturing video and audio of a subject 108 at different angles and positions. As shown, capture devices 102a, 102b, and 102c, which are typically cameras, are arranged at different angles A1, A2, and A3 relative to reference 110. Further, these capture devices are positioned at different distances, D1, D2, and D3, in space from subject 108. In this typical multi-camera arrangement 100, these angles and distances, as well as various other factors, such as the occurrence of ambient noise 104 near capture device 102a, affect the synchronization (and/or the quality) of the audio portions as they are captured.
One common technique for synchronizing the video captured at capture devices 102a, 102b, and 102c is to implement time codes associated with each video (or otherwise use some sort of global synchronization signal) to synchronize both the video and audio portions. In particular, a user is usually required to manually adjust the different videos to bring their time codes into agreement. A time code normally describes the relative progression of video images in terms of an hour, minute, second, and frame (e.g., HH:MM:SS:FR). But a drawback to using time codes to synchronize audio is that they require the user to synchronize different video portions to a particular frame before synchronizing the audio portions. The effort to synchronize the audio is further exacerbated by the number of samples of audio sound that are captured relative to the number of video frames. Typically, for each frame of video (e.g., 30 frames per second), there are 1,600 samples of audio (e.g., 48,000 samples per second). As such, audio portions for capture devices 102a, 102b, and 102c are typically synchronized based on the video portions and their time codes, which can contribute to undesired sound delays and echoing effects.
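To illustrate the scale of this disparity, consider the following minimal sketch (illustrative Python; the helper name and structure are assumptions, not part of the patent), which maps an HH:MM:SS:FR time code to a frame index and the corresponding audio sample offset using the 30 frames-per-second and 48,000 samples-per-second figures above:

    # Illustrative only: assumes 30 fps video and 48,000 Hz audio,
    # i.e., 1,600 audio samples per video frame.
    FPS = 30
    SAMPLE_RATE = 48_000
    SAMPLES_PER_FRAME = SAMPLE_RATE // FPS  # 1,600

    def timecode_to_samples(hh: int, mm: int, ss: int, fr: int) -> tuple[int, int]:
        """Map an HH:MM:SS:FR time code to (frame index, audio sample offset)."""
        frame_index = ((hh * 60 + mm) * 60 + ss) * FPS + fr
        return frame_index, frame_index * SAMPLES_PER_FRAME

    print(timecode_to_samples(0, 1, 30, 12))  # (2712, 4339200)

At these rates, a misalignment of even a single frame spans 1,600 audio samples, which is consistent with the audible delays and echoing effects described here.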
Another common technique for synchronizing the audio (and the video) captured at capture devices 102a, 102b, and 102c is to use a clapper to generate a distinctive sound during the capture of the audio and video. A clapper creates an audible sound, as a reference sound, to synchronize audio during the capture of the audio. The clapper sound is used for editing purposes and is discarded during editing. Consider that a clapper (not shown) generates a sound ("noise") 104 for capture by capture devices 102a, 102b, and 102c. Thus, clapper noise 104 can be used to synchronize the audio. A drawback to using clapper noise 104 to synchronize audio is that the distance between the noise and capture devices 102a, 102b, and 102c can cause delays that hinder synchronization of the audio relating to scene 108.
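The delay can be estimated from geometry alone. The following sketch (illustrative Python; it assumes sound travels at roughly 343 m/s in air, a figure not stated in the patent) shows how a difference in the distances from the clapper to two capture devices skews the arrival time of the clapper sound:

    # Illustrative only: arrival-time skew of a clapper sound between two
    # capture devices at different distances, assuming ~343 m/s in air.
    SPEED_OF_SOUND_M_S = 343.0
    SAMPLE_RATE = 48_000

    def arrival_skew(d1_m: float, d2_m: float) -> tuple[float, int]:
        """Return the skew in seconds and in 48 kHz audio samples."""
        delay_s = abs(d1_m - d2_m) / SPEED_OF_SOUND_M_S
        return delay_s, round(delay_s * SAMPLE_RATE)

    # Devices 2 m and 12 m from the clapper: ~0.029 s, about 1,399 samples.
    print(arrival_skew(2.0, 12.0))

Under these assumptions, a 10-meter difference in distance already amounts to roughly 29 ms, well over a video frame's worth of audio samples.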
FIG. 1B illustrates a typical work flow to integrate indicia, such as time codes and clapper sounds, within the audio or video for synchronization purposes prior to and/or during the capture of video using a multi-camera arrangement. As shown, a typical work flow to film a scene 108 (FIG. 1A) includes the stages of pre-production 140 (i.e., prior to capturing video and audio), production 142 (i.e., the capturing of video and audio), and post-production 144 (i.e., subsequent to capturing video and audio). In a pre-production stage 140 of capturing video, common synchronization techniques usually require that a user procure either time code generation hardware or a clapper, or both, before the video and audio are captured. In a production stage 142, common synchronization techniques usually require that a user implement time codes or a clapper to introduce points at which to synchronize video during the capture of the video and audio. In a post-production stage 144, a user normally uses the common synchronization techniques of the pre-production 140 and production 142 stages to synchronize the video. The time codes and clapper sounds require removal, as they are intended for editing purposes and are distracting to an audience if time codes remain visible and clapper sounds remain audible in the final product.
It would be desirable to provide improved computing devices and systems, software, computer programs, applications, and user interfaces that minimize one or more of the drawbacks associated with conventional techniques for synchronizing either audio or video, or both.
`
`BRIEF DESCRIPTION OF THE FIGURES
`
The invention and its various embodiments are more fully appreciated in connection with the following detailed description taken in conjunction with the accompanying drawings, in which:

FIG. 1A illustrates a multi-camera arrangement for capturing video and audio of a subject at different angles and positions;

FIG. 1B illustrates a typical work flow to integrate indicia in connection with the audio or video for synchronization purposes prior to and/or during the capture of video using a multi-camera arrangement;

FIG. 2A is a diagram of an audio synchronization point generator that is configured to generate points at which to synchronize video and/or audio that are captured at multiple capture devices, according to at least one embodiment of the invention;

FIG. 2B is a diagram depicting an audio synchronization point generator that is configured to generate candidate synchronization points, according to at least one embodiment of the invention;

FIG. 2C is a diagram depicting an example of the operation of a sound attribute analyzer and a candidate synchronization point detector to determine one or more candidate synchronization points, according to at least one specific embodiment of the invention;

FIGS. 2D and 2E are diagrams depicting different attributes of sound that can be used to determine candidate synchronization points, according to various embodiments of the invention;
`
`
`
FIG. 3A illustrates another example of an audio synchronization point generator that includes a synchronization point certifier, according to at least one embodiment of the invention;

FIG. 3B illustrates the functionality of a synchronization point certifier, according to at least one embodiment of the invention;

FIG. 4 is a diagram depicting the presentation of an audio-based synchronization panel, according to at least one embodiment of the invention;

FIG. 5 is a flow diagram depicting one example of a method for synchronizing video and/or audio based on the generation of post-capture synchronization points, according to one embodiment of the invention;

FIG. 6 is a flow diagram depicting another example of a method for synchronizing video and/or audio based on the generation of post-capture synchronization points, according to another embodiment of the invention;

FIG. 7A illustrates an example of a panel presentation application for implementing a multi-camera panel that presents audio tracks synchronized with the use of post-capture synchronization points, according to various embodiments of the invention;

FIG. 7B illustrates an alternative example of a panel presentation application for implementing a multi-camera panel that presents audio tracks synchronized with the use of post-capture synchronization points, according to at least one embodiment of the invention;

FIG. 8 illustrates an exemplary computer system suitable for implementing an interactive panel for an interface to modify the operation of an audio synchronization point generator, according to at least one embodiment of the invention; and

FIG. 9 illustrates an example of a panel presentation system for editing video clips associated with post-capture synchronization points for a reference audio and a specimen audio, according to various embodiments of the invention.

Like reference numerals refer to corresponding parts throughout the several views of the drawings. Note that most of the reference numerals include one or two left-most digits that generally identify the figure that first introduces that reference number.
`
DETAILED DESCRIPTION

FIG. 2A is a diagram 200 of an audio synchronization point generator that is configured to generate points at which to synchronize video and/or audio that are captured at multiple capture devices, according to at least one embodiment of the invention. Audio synchronization point generator 210 can be configured to analyze different portions of content, such as content portions 202, 212, and 222, for synchronizing audio portions, such as audio portions 206, 216, and 226, with portions of video, such as video portions 204, 214, and 224. Different content portions can be captured by different cameras in a multiple-camera arrangement. Audio synchronization point generator 210 can be further configured to generate synchronization points 211 (e.g., SP1, SP2, ..., SPn) that identify at least one portion of audio at which content portions 202, 212, and 222 are in (or are substantially in) synchronicity. Thus, synchronization points 211 facilitate the synchronization of audio portions 206, 216, and 226, which, in turn, facilitates the synchronization of video portions 204, 214, and 224. In one embodiment, audio synchronization point generator 210 can be configured to analyze an attribute of sound associated with, for example, a subset 208 of audio portion 206, and to determine whether the attribute of sound for subsets 218 and 228 is equivalent (or substantially equivalent). Equivalent attributes for audio subsets 208, 218, and 228, therefore, can represent a synchronization point for at least audio portions 206, 216, and 226. In at least one embodiment, audio synchronization point generator 210 can be configured to select a portion of video 204 as a reference video ("Vref"), and to analyze audio subset 208 against audio subsets 218 and 228 to synchronize with video portions 214 and 224, respectively, both of which represent specimen video under analysis ("VUA"). Thus, audio synchronization point generator 210 can provide for the generation of post-capture synchronization points for the purposes of synchronizing at least two portions of audio and/or video that can be captured at, for example, multiple capture devices (e.g., multiple cameras).

In view of the foregoing, audio synchronization point generator 210 can implement post-capture synchronization points to automatically identify synchronization points for at least two portions of audio, according to at least one embodiment of the invention. This can reduce the manual identification of equivalent portions of audio and/or video for synchronization purposes. Further, audio synchronization point generator 210 can generate post-capture synchronization points as a function of subject audio, which can include sounds generated by the subject and/or scene for which audio and/or video was captured. Subject audio is sought to be captured as content rather than for production (e.g., editing) purposes. As such, the generation of post-capture synchronization points for audio to synchronize video reduces the necessity to rely on external synchronization information to synchronize audio. Examples of external synchronization information include time codes and clapper sounds, as well as other synchronization signals and artificially-inserted sounds (i.e., non-subject sounds) that provide for synchronization points either prior to, or during, the capture of audio and video, or both. In addition, the implementation of audio synchronization point generator 210 can conserve resources and computational overhead by reducing the need to implement hardware to create the external synchronization information.

FIG. 2B is a diagram 230 depicting an audio synchronization point generator that is configured to generate candidate synchronization points, according to at least one embodiment of the invention. Audio synchronization point generator 234 can operate to identify a reference audio with which to compare other specimen audio(s) to determine equivalent (or substantially equivalent) portions of audio that might qualify as post-capture synchronization points. In this example, audio synchronization point generator 234 includes a candidate synchronization point generator 240 that can be configured to generate at least one candidate synchronization point for a reference audio 231 and a specimen audio 244. In various embodiments, reference audio 231 and specimen audio 244 can either include subject audio or exclude external synchronization information, such as audible external synchronization information, or both. As shown, candidate synchronization point generator 240 can include a reference audio selector 235, a vicinity determinator 237, a sound attribute analyzer 238, a candidate synchronization ("synch") point detector 239, and aligner 241.

In operation, reference audio selector 235 can be configured to analyze content files to identify audio content within each content file for synchronization purposes. For example, reference audio selector 235 can be configured to extract audio portions from content files stored in a repository (not shown), while preserving associations to the corresponding video portions to synchronize both the audio and video. Reference audio selector 235 can also be configured to designate
an audio portion from a content file as reference audio 231. In a specific embodiment, an audio portion can be designated as reference audio 231 as a function of an amount of data associated with the content file or its audio portion. Specifically, reference audio selector 235 can determine which content file has the largest amount of data for either the content or the audio, or both, and then designate the audio portion from that content file as reference audio 231. In at least one specific embodiment, reference audio 231 can have the longest duration relative to other audio portions from other content files. In one embodiment, reference audio selector 235 can also select one of the other audio portions as specimen audio 244.
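As a minimal sketch of this selection rule (illustrative Python; the class and function names are assumptions, not part of the patent), the extracted audio portion with the longest duration, standing in for the largest amount of data, is designated the reference and the remaining portions become specimen audio:

    # Illustrative only: designate the longest audio portion as the reference.
    from dataclasses import dataclass

    @dataclass
    class AudioPortion:
        name: str
        samples: list[float]      # amplitude values
        sample_rate: int = 48_000

        @property
        def duration_s(self) -> float:
            return len(self.samples) / self.sample_rate

    def select_reference(portions):
        """Return (reference, specimens), choosing the longest portion."""
        reference = max(portions, key=lambda p: p.duration_s)
        specimens = [p for p in portions if p is not reference]
        return reference, specimens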
FIG. 2B shows reference audio selector 235 including a band selector/generator 236 for generating and managing bands 232 of reference audio 231. In one embodiment, band selector/generator 236 can be configured to subdivide reference audio 231 into bands 232. As such, bands 232 can represent subsets of reference audio 231. Further, consider the following example in which reference audio 231 has a duration expressed in terms of time (i.e., a reference length, or "RL"), such as 30 minutes. In operation, band selector/generator 236 can generate any number of bands 232 having the same or different sizes. In one example, each band 232 can represent one percent (i.e., 1%) of the reference length. As such, each band 232 can represent 18 seconds (i.e., 1% of 30 minutes). Further, band selector/generator 236 can be configured to select a band for determining whether specimen audio 244 contains an equivalent (or substantially equivalent) attribute of sound, such as the attribute of sound for a subset 246 of specimen audio 244.
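The band subdivision can be sketched as follows (illustrative Python; the function is an assumption). Using the example above, a 30-minute reference at 48 kHz yields 100 bands of 864,000 samples (18 seconds) each:

    # Illustrative only: subdivide reference audio into equal bands, each a
    # fixed fraction (here 1%) of the reference length RL.
    def band_bounds(total_samples: int, fraction: float = 0.01):
        """Return (start, end) sample indices for consecutive bands."""
        band_len = max(1, int(total_samples * fraction))
        return [(start, min(start + band_len, total_samples))
                for start in range(0, total_samples, band_len)]

    bands = band_bounds(30 * 60 * 48_000)
    print(len(bands), bands[0])  # 100 (0, 864000)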
Vicinity determinator 237 can be configured to generate a vicinity range 243 within which band 232a is compared to specimen audio 244. Vicinity determinator 237 can be configured to size vicinity range 243 to any duration. For example, vicinity determinator 237 can size vicinity range 243 to 40% of RL (i.e., 12 minutes if RL is 30 minutes). Aligner 241 can be configured to align vicinity range 243 with an alignment point 233b that is coincident (or substantially coincident) with alignment point 233a for band 232a. In one embodiment, a time reference, such as a time code, can constitute alignment points 233a and 233b. While aligner 241 can be configured to position vicinity range 243 in any relation to alignment point 233b, aligner 241 has centered vicinity range 243 in this example such that a first half 245a and a second half 245b each includes 20% of RL. In some embodiments, vicinity range 243 can extend up to 100% of reference audio 231.
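A centered vicinity window can be computed as in the sketch below (illustrative Python; the names and the clamping to the specimen's bounds are assumptions):

    # Illustrative only: center a vicinity range (e.g., 40% of the reference
    # length RL) on an alignment point; each half covers 20% of RL.
    def vicinity_range(align_sample: int, rl_samples: int,
                       specimen_len: int, fraction: float = 0.40):
        half = int(rl_samples * fraction) // 2
        start = max(0, align_sample - half)
        end = min(specimen_len, align_sample + half)
        return start, end

    # RL = 30 min at 48 kHz: the window spans ~12 minutes (720 seconds).
    print(vicinity_range(50_000_000, 30 * 60 * 48_000, 80_000_000))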
Sound attribute analyzer 238 and candidate synchronization point detector 239 can be respectively configured to analyze an attribute of sound associated with reference audio 231, and determine whether the attribute of sound for subsets of specimen audio 244 is substantially equivalent. In at least one embodiment, sound attribute analyzer 238 can be configured to characterize a portion of reference audio 231, such as band 232a, in terms of at least one audio attribute to form a characterized portion (not shown) of reference audio 231. In one embodiment, sound attribute analyzer 238 can be configured to form the characterized portion of reference audio 231 to identify a pattern, and to search specimen audio 244 to find other matching patterns. Candidate synchronization point detector 239 can be configured to determine that specimen audio 244 includes the characterized portion of reference audio 231, and to generate at least a candidate synchronization point. In one embodiment, candidate synchronization point detector 239 can be configured to detect a matching pattern in specimen audio 244 to establish a candidate synchronization point. In at least one embodiment, band selector/generator 236 is configured to select another band 232 should candidate synchronization point detector 239 fail to detect a matching pattern, and is further configured to continue selecting other bands 232 until either at least one candidate synchronization point is detected or none is. In the latter case, audio synchronization point generator 234 can so indicate a state of no match to a user via, for example, a user interface (not shown). Then, reference audio selector 235 can select another specimen audio (not shown) for performing similar analysis.
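The band-retry behavior amounts to a simple scan, sketched below (illustrative Python; matches() is a hypothetical stand-in for the sound-attribute comparison described next):

    # Illustrative only: try successive reference bands until one yields a
    # candidate synchronization point; None signals "no match" to the caller.
    from typing import Optional

    def find_candidate(bands, matches) -> Optional[int]:
        for band in bands:
            hit = matches(band)   # specimen sample offset, or None
            if hit is not None:
                return hit
        return None               # caller can report "no match" to the user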
In one embodiment, the attribute of sound can be the amplitude of an audio waveform, which can be expressed in percentages, decibels, and the like, relative to time. As such, sound attribute analyzer 238 can be configured to analyze the audio waveform amplitude in band 232a of reference audio 231 to identify a pattern of waveform amplitudes, as shown in band 232a of FIG. 2B. As such, sound attribute analyzer 238 can compare waveform amplitudes of band 232a with waveform amplitudes of specimen audio 244. To determine whether a candidate synchronization point exists, candidate synchronization point detector 239 determines whether the waveform amplitudes for band 232a match the waveform amplitudes for one or more subsets of specimen audio 244. In the example shown, candidate synchronization point detector 239 is configured to detect that the waveform amplitudes for band 232a are equivalent to the waveform amplitudes for subset 246. Subsequently, candidate synchronization point detector 239 can generate a candidate synchronization point for band 232a and subset 246. In at least one embodiment, the candidate synchronization point for band 232a and subset 246 can be located at or near band 232a and subset 246 so as to provide for the alignment of band 232a and subset 246 relative to each other. In a specific embodiment, a candidate synchronization point can be implemented as a post-capture synchronization point. In various embodiments, the attribute of sound can represent any characteristic of audio with which to compare and match portions of reference audio 231 and specimen audio 244. For example, the attribute of sound can also be the amplitude of an audio waveform relative to frequency. As such, sound attribute analyzer 238 can be configured to analyze the spectral frequencies and audio waveform amplitude in band 232a to identify a pattern for a frequency spectrum, which can be compared against subsets of specimen audio 244, including subset 246.
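One plausible realization of this pattern search (an assumption; the patent does not mandate a particular algorithm) is a sliding normalized cross-correlation of the band's amplitude values over the vicinity window:

    # Illustrative only: score every offset in the vicinity window by the
    # normalized cross-correlation between the band's amplitudes and the
    # specimen's amplitudes; the best-scoring offset is the candidate match.
    import numpy as np

    def best_match_offset(band: np.ndarray, specimen: np.ndarray,
                          start: int, end: int) -> tuple[int, float]:
        window = specimen[start:end]
        assert len(window) >= len(band), "vicinity window shorter than band"
        b = (band - band.mean()) / (band.std() + 1e-9)
        scores = []
        for off in range(len(window) - len(band) + 1):
            seg = window[off:off + len(band)]
            seg = (seg - seg.mean()) / (seg.std() + 1e-9)
            scores.append(float(np.dot(b, seg)) / len(b))
        best = int(np.argmax(scores))
        return start + best, scores[best]  # absolute offset and its score

A production implementation would likely use FFT-based correlation rather than this naive loop, but the principle, scoring offsets by amplitude-pattern similarity, is the same.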
Candidate synchronization point detector 239 can be configured to provide a tolerance among values for the audio attribute to reduce false negatives (i.e., improper indications of mismatches between reference audio 231 and specimen audio 244) due to differences in tone, background noise, volume, and the like, that manifest in different audio portions that are captured by different cameras at different angles and spatial locations, according to at least one embodiment. In one embodiment, candidate synchronization point detector 239 can be configured to establish a deviation from a pattern for band 232a to form a deviated pattern (not shown) within which a subset of specimen audio 244 is deemed to match the pattern. As such, candidate synchronization point detector 239 can be configured to generate a candidate synchronization point if a portion of specimen audio 244, such as vicinity range 243, includes the deviated pattern, such as in subset 246. In one instance, if the amplitudes of the audio waveforms for both band 232a and a particular subset of specimen audio 244 deviate by less than an amount defined as a tolerance, such as 5%, then band 232a and that particular subset of specimen audio 244 can be deemed equivalent (or substantially equivalent).
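The tolerance test reduces to an element-wise relative comparison, as in this sketch (illustrative Python; numpy's allclose is used as a convenient stand-in for the 5% deviation example above):

    # Illustrative only: two amplitude patterns are deemed substantially
    # equivalent when |band - subset| <= tolerance * |subset| element-wise.
    import numpy as np

    def substantially_equivalent(band: np.ndarray, subset: np.ndarray,
                                 tolerance: float = 0.05) -> bool:
        return bool(np.allclose(band, subset, rtol=tolerance, atol=0.0))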
`
`
`
`
As used herein, the term "synchronization point" refers generally, at least in one embodiment, to a point at which portions of two or more audio waveforms, such as those captured by multiple capture devices at different angles and/or positions, are in synchronization. As an example, consider that matching shapes for portions of two or more audio waveforms relative to a point in time can constitute a synchronization point. As such, a synchronization point can indicate that a part of two or more audio waveforms is in synchronicity. In one embodiment, a synchronization point can represent matching portions of multiple audio waveforms. In another embodiment, a synchronization point can also refer to a point in time relative to the matching portions. As used herein, the term "post-capture" refers generally, at least in one embodiment, to post-production activity that occurs after capturing video and audio, and includes the process of editing content. As used herein, the term "subject audio" refers generally, at least in one embodiment, to the audio generated by something, such as a person, an object, or a scene, that is captured by multiple capture devices for purposes of producing either a video with audio, or an audio recording alone, such as a movie or music. An example of subject audio is the sounds produced by actors, such as their voices.

As used herein, the term "post-capture synchronization point" refers generally, at least in one embodiment, to a synchronization point that can be generated as part of post-production activity based on the subject audio. As such, the audio used to generate synchronization points includes sounds that are intended to remain within the final product, such as a movie or music, in accordance with at least one embodiment. In various embodiments, post-capture synchronization points may or may not result from certifying candidate synchronization points. As used herein, the term "candidate synchronization point" refers generally, at least in one embodiment, to a synchronization point that has yet to be certified, and thus is not confirmed as being a post-capture synchronization point with sufficient certainty. As used herein, the term "confirmatory candidate synchronization points" refers generally, at least in one embodiment, to additional candidate synchronization points that are examined to certify the candidate synchronization point.
`As used herein, the term “audio” refers generall