Case 2:21-cv-00040-JRG Document 94-3 Filed 10/12/21 Page 1 of 195 PageID #: 2229

Exhibit D
Part 1 of 2

A Visual Computing Environment for Very Large Scale
Biomolecular Modeling

M. Zeller, J.C. Phillips, A. Dalke, W. Humphrey, K. Schulten
R. Sharma, T.S. Huang, V.I. Pavlović, Y. Zhao, Z. Lo, S. Chu
Beckman Institute for Advanced Science and Technology
University of Illinois at Urbana-Champaign
405 N. Mathews Avenue, Urbana, IL 61801
Department of Computer Science and Engineering
Pennsylvania State University
University Park, PA 16802-6106

Abstract

Knowledge of the complete molecular structures of living cells is being accumulated at a tremendous
rate. Key technologies enabling this success have been high performance computing and powerful
molecular graphics applications, but the technology is beginning to seriously lag behind challenges
posed by the size and number of new structures and by the emerging opportunities in drug design
and genetic engineering. A visual computing environment is being developed which permits
interactive modeling of biopolymers by linking a 3D molecular graphics program with an efficient
molecular dynamics simulation program executed on remote high-performance parallel computers.
The system will be ideally suited for distributed computing environments, by utilizing both local 3D
graphics facilities and the peak capacity of high-performance computers for the purpose of interactive
biomolecular modeling. To create an interactive 3D environment three input methods will be explored:
(1) a six-degree-of-freedom “mouse” for controlling the space shared by the model and the user; (2)
voice commands monitored through a microphone and recognized by a speech recognition interface;
(3) hand gestures, detected through cameras and interpreted using computer vision techniques.

Controlling 3D graphics connected to real time simulations and the use of voice with suitable
language semantics, as well as hand gestures, promise great benefits for many types of problem
solving environments. Our focus on structural biology takes advantage of existing sophisticated
software, provides concrete objectives, defines a well-posed domain of tasks and offers a well-developed
vocabulary for spoken communication.

1: Introduction

The biomedical sciences are presently undergoing a revolutionary development following the
determination of a rapidly increasing number of complex molecular structures of the living cell.
Structures encompassing many thousands of atoms, with integral biological functions and of great
medical significance, are being discovered at an increasing pace. This development came about only
with the advent of high performance computers, powerful molecular graphics software, and a host of
program packages regularly used by thousands of researchers. Computational methods play an
essential role in structure determination and structure refinement, in graphical interpretations of
complex structures, in modeling of biopolymers for drug design and genetic engineering of proteins,
and in the physical, mechanistic and dynamic analysis of resolved structures with the goal of
understanding the principles underlying biopolymer architecture and function.

1063-6862/97 $10.00 © 1997 IEEE

Computational technology, unfortunately, now lags behind the rapid progress of structural biology
and cannot adequately handle the size and number of structures discovered today. Furthermore,
current user interfaces suffer from numerous shortcomings, e.g., a poor adaptation to the three-
dimensional character of biopolymer structures and a complexity that requires long training periods.
Also, gross limitations in locally available computer power restrict the interactive modeling of
biopolymers, and source codes are, with rare exceptions, inaccessible or poorly documented. Finally,
the present paradigm for the rendering of biomolecular structures simply displays positions and
types of atoms and bonds, neglecting topological and physical characteristics such as surfaces,
voids, pockets, or electrostatic potentials.

We have taken an innovative, collaborative approach to address the above impediments. A visual
computing environment, MDScope, has been developed which permits interactive modeling [1, 4, 6].
MDScope connects a powerful molecular graphics program, VMD, with a fast and efficient modeling
program, NAMD, specifically designed for parallel computers. MDScope is ideally suited for high
performance, distributed computing environments. The program presently links VMD to NAMD,
the latter running on a remote high performance parallel computer. Three input methods have
been explored in the framework of the VMD program to create an interactive 3D environment: a
six (translation and rotation) degrees of freedom, electromagnetically tracked “mouse” has been
integrated into VMD for manipulations of 3D objects; voice commands monitored through a
microphone and recognized by a speech recognition interface have been introduced to replace
cumbersome keyboard commands; hand gestures to manipulate the displayed model are detected
through stereo cameras and interpreted using computer vision techniques. A large screen stereo
graphics facility, expected to become the standard for biomolecular graphics, has been implemented
and is frequently used by researchers.

A long-term goal of VMD is to enable multiple users to interact with the model simultaneously.
This interaction and collaboration is expected to significantly shorten the problem-solving cycle in
biomolecular modeling since users can guide searches, e.g., for optimal designs, without resorting to
a time consuming, non-intuitive cycle of batch jobs carrying out automated searches. Incorporating
voice commands will allow the user to be free of the keyboard, and hand gestures will permit the
user to easily manipulate the displayed model and to explore different molecular configurations. The
combination of both speech and hand gestures will be far more powerful than either individually, and
its success will apply to interactive environments for structural biologists as well as to a wide range
of other scientific and engineering applications. To accomplish this interaction, highly robust
automatic speech recognition (ASR) and automatic gesture recognition (AGR) techniques are
necessary. These techniques will be required for such problems as differentiation between commands
and casual speech, and distinction between meaningful and meaningless gestures and hand movements.

The primary infrastructure for our project is a large-screen stereographic projection facility,
developed in the Theoretical Biophysics Group and shared by a large group of biomedical researchers.
The facility employs cost-effective and space-saving display hardware which is added to a high end
graphics workstation and can be easily duplicated at other sites. It produces 8' x 6' x 6' 3D models
in a 120 square foot area. The facility consists of a projector which displays alternating left- and
right-eye views onto the screen at nearly twice the rate of ordinary projectors. The images, when
viewed through special eyewear, produce a stereo display. In addition, a spatial tracker is used as
a 3D input device for the development of a three-dimensional user interface. The program VMD
has been designed for this environment as well as for standard monitors. Figure 1 attempts to
convey an impression of the system through a photomontage of an actual image (nuclear hormone
receptor-DNA complex) and of the actual space.

The key goal of our work is to simplify model manipulation and rendering to such a degree
that biomolecular modeling assumes a playful character; this will allow researchers to explore
variations of their model and concentrate on biomolecular aspects of their task without undue
distraction by computational aspects. The ultimate goal, illustrated in Fig. 1, will focus on wide
ranging improvements of the graphical user interface of the existing program MDScope, which will
become possible through the combined expertise of researchers in computational structural biology,
parallel computing, numerical algorithms, and intelligent human-computer interaction technology.

Figure 1. The program VMD, coupled with the program NAMD, and speech and
gesture user-interface components. These facilities comprise MDScope, a problem-
solving environment for structural biology.

2: MDScope, a Visual Environment for Structural Biology

MDScope is a set of integrated software components which provides an environment for simulation
and visualization of biomolecular systems in structural biology. It consists of three separate packages
which may be used individually, or together to constitute the MDScope environment [5]. These
packages are:

1. The program NAMD, a molecular dynamics program which runs in parallel on a wide variety
of architectures and operating systems.
2. The program VMD, a molecular visualization program which displays both static molecular
structures and dynamic molecular motion as computed by programs such as NAMD.
3. The MDCOMM software, which provides an efficient means of communication between VMD
and NAMD, and allows VMD to act as a graphical user interface to NAMD. Using MDCOMM,
VMD provides an interface for interactive setup and display of a molecular dynamics simulation
on a remote supercomputer or high-performance workstation, using NAMD as a computational
“engine”.

Molecular dynamics calculations are computationally very expensive, and require large amounts
of memory to store the molecular structure, coordinates, and atom-atom interaction lists. The
challenge of efficient calculation of the inter-atomic forces by using high performance computing is
addressed in NAMD through the use of parallel computation and incorporation of the Distributed
Parallel Multipole Tree Algorithm (DPMTA) [2]. NAMD uses a spatial decomposition algorithm to
partition the task of computing the force on each atom among several processors. This algorithm
subdivides the volume of space occupied by the molecule into uniform cubes (or patches), as shown
in Figure 2, which are distributed among the processors in a parallel computer. The motions of the
atoms in each patch are computed by the processor to which each patch is assigned; as atoms move
they are transferred between the patches, and patches are reassigned to different processors in order
to maintain a uniform computational load.

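As an illustration of this decomposition, the following Python sketch bins atoms into uniform cubes and deals the cubes out to processors. It is not NAMD code: the patch size, the atom representation, and the round-robin assignment (a stand-in for NAMD's load balancer) are simplifying assumptions.

# Sketch of spatial decomposition: atoms are binned into uniform cubes ("patches") and
# the patches are distributed over processors. Illustration only; NAMD's real data
# structures and load balancing are more involved.
import numpy as np

def build_patches(coords, patch_size):
    """Map each atom index to the integer index of the cube (patch) containing it."""
    min_corner = coords.min(axis=0)
    cell = np.floor((coords - min_corner) / patch_size).astype(int)
    patches = {}
    for atom_id, idx in enumerate(map(tuple, cell)):
        patches.setdefault(idx, []).append(atom_id)
    return patches

def assign_patches(patches, n_procs):
    """Round-robin assignment of patches to processors (placeholder for real load balancing)."""
    return {patch_idx: rank % n_procs
            for rank, patch_idx in enumerate(sorted(patches))}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    coords = rng.uniform(0.0, 40.0, size=(1000, 3))   # synthetic coordinates
    patches = build_patches(coords, patch_size=10.0)  # roughly cutoff-sized cubes
    owners = assign_patches(patches, n_procs=8)
    print(len(patches), "patches distributed over 8 processors")
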
Figure 2. Spatial decomposition of a small polypeptide. Each cube represents a
patch in NAMD.

The key functions of the program VMD are to visualize biomolecular systems, to allow direct
interaction between a user and a molecule being simulated on another computer, and to provide
an intuitive user interface for controlling the visual display and remote simulation. VMD uses the
MDCOMM software mentioned above to enable it to initiate, display, and control a simulation
using NAMD. As the trajectory of a molecular system is calculated, the coordinates of each atom
are sent from NAMD to VMD. Current network technology provides the necessary bandwidth to
communicate the atomic coordinate data; the use of a high-performance dynamics program is crucial
in order to furnish new data at the speed required for interactive display.

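The sketch below only illustrates the kind of per-timestep coordinate streaming MDCOMM performs between the simulation and the display; the wire format (a frame-size header followed by packed doubles) and the helper names are invented for this example and are not the actual MDCOMM protocol.

# Hypothetical per-frame coordinate streaming between a simulation process and a display
# process over TCP, mimicking the role MDCOMM plays between NAMD and VMD.
import socket
import struct

def send_frame(sock, coords):
    """coords: list of (x, y, z) tuples for one timestep."""
    payload = b"".join(struct.pack("!3d", x, y, z) for x, y, z in coords)
    sock.sendall(struct.pack("!I", len(coords)) + payload)

def _recv_exact(sock, n):
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf

def recv_frame(sock):
    """Read one timestep worth of coordinates sent by send_frame."""
    (n_atoms,) = struct.unpack("!I", _recv_exact(sock, 4))
    data = _recv_exact(sock, n_atoms * 24)
    return [struct.unpack("!3d", data[i:i + 24]) for i in range(0, len(data), 24)]
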
VMD implements many different forms of user interfaces — users may control the program through
keyboard commands, a mouse, and a graphical user interface. VMD also implements a mechanism
for external programs to serve as user interface components, by allowing them to communicate
with VMD through standard network communication channels. This makes it possible for new user
interface methods, such as the speech- and gesture-recognition systems discussed in Section 3, to be
developed in parallel with VMD.

Development of MDScope is an ongoing project in the Theoretical Biophysics Group at the
University of Illinois [1]. All three components of MDScope (NAMD, VMD, and MDCOMM) may be
obtained via anonymous ftp (ftp.ks.uiuc.edu), or World Wide Web (http://www.ks.uiuc.edu).
The components may be used individually or in concert, and include the complete source code for
the packages as well as extensive documentation describing how to use and modify the programs.
Currently, NAMD is available for a wide variety of architectures and operating systems, including
clusters of high-performance Hewlett-Packard, Silicon Graphics, and IBM RS/6000 Unix
workstations, as well as the Cray T3D and Convex Exemplar parallel systems. We are in the process
of making NAMD available on the IBM SP-2 and SGI Power Challenge architectures. NAMD should
be compilable on any system with a C++ compiler and PVM version 3.3.11.

In addition to the full source and SGI binary distributions (IRIX 5.x and IRIX 6.x), the program
VMD is already available for Hewlett-Packard (HP-UX 9 and HP-UX 10 using Mesa emulated
OpenGL) and Linux workstations. Ports to other platforms, most notably IBM RS/6000 (AIX),
will be available soon. VMD displays a graphical rendering of the molecular systems under study,
and provides both a text-based console interface and a complete graphical interface for the user.
These controls are used to modify the appearance of the molecules and display, to control the
display of structural features of the molecules, and to access remote computers running molecular
dynamics simulations. Multiple structures may be viewed simultaneously, and a flexible atom
selection mechanism enables the user to easily select subsets of atoms for display. VMD includes
an extensive text-command processing capability through the use of the Tcl library, a popular and
widely available package for script parsing and interpreting. The use of Tcl makes it possible for users
to write scripts including such features as variable substitution, control loops, and function calls.
The current rendering of a molecule may also be saved in an image file or in a format suitable for use
by several image-processing packages. Also, by connecting directly to a remote computer running
a molecular dynamics simulation, VMD offers users the capability to interactively participate in an
ongoing simulation, e.g., the option to apply perturbative forces to individual atoms.

Speech and hand gestures are fundamental methods of human communication, and their use for
interaction with and control of the display of VMD will greatly improve the utility of the program.
Speech- and gesture-recognition user interfaces are being added to VMD to provide a “natural”
working environment for researchers.

3: Speech and Gesture Interface

To fully exploit the potential that visual computing environments offer, there is a need for a
“natural” interface that allows the manipulation of such displays without cumbersome attachments.
In this section we describe the use of visual hand gesture analysis and speech recognition for
developing a speech/gesture interface to VMD. The free hand gestures are used for manipulating the
3-D graphical display together with a set of speech commands. We describe the visual gesture
analysis and the speech analysis techniques used in developing this interface. The dual modality of
speech/gesture is found to greatly aid the interaction capability.

The communication mode that seems most relevant to the manipulation of physical objects is
hand motion, also called hand gestures. We use it to act on the world, to grasp and explore objects,
and to express our ideas. Now virtual objects, unlike physical objects, are under computer control.
To manipulate them naturally, humans would prefer to employ hand gestures as well as speech.
Psychological experiments, for example, indicate that people prefer to use speech in combination
with gestures in a virtual environment, since it allows the user to interact without special training
or special apparatus and allows the user to concentrate more on the virtual objects and the tasks at
hand [3]. We explore this multimodal nature of HCI involved in manipulating virtual objects using
speech and gesture.

To keep the interaction natural, it is desirable to have as few devices attached to the user as
possible. Motivated by this, we have been developing techniques that will enable spoken words
and simple free-hand gestures to be used while interacting with 3D graphical objects in a virtual
environment. The voice commands are monitored through a microphone and recognized using
automatic speech recognition (ASR) techniques. The hand gestures are detected through a pair of
strategically positioned cameras and interpreted using a set of computer vision techniques that we
term automatic gesture recognition (AGR). These computer vision algorithms are able to extract the
user's hand from the background, extract positions of the fingers, and distinguish a meaningful
gesture from unintentional hand movements using the context. We use the context of the VMD
environment to place the necessary constraints to make the analysis robust and to develop a
command language that attempts to optimally combine speech and gesture inputs.

VMD uses a keyboard and a magnetically tracked pointer as the interface. This is particularly
inconvenient since the system is typically used by multiple (6-8) users, and the interface hinders
the interactive nature of the visualization system. Thus incorporating voice command control in
MDScope would enable the users to be free of keyboards and to interact with the environment in
a natural manner. The hand gestures would permit the users to easily manipulate the displayed
model and “play” with different spatial combinations of the molecular structures. The integration
of speech and hand gestures as a multi-modal interaction mechanism would be more powerful than
using either mode alone, motivating the development of the speech/gesture interface. Further, the
goal was to minimize the modifications needed to the existing VMD program for incorporating the
new interface. The experimental prototypes that we built for both the speech (ASR) and hand
gesture analysis (AGR) required the following additions to the VMD environment.

Software. In order to reduce the complexity and increase the flexibility of the program design, a
communications layer was added so external programs can be written and maintained independently
from the VMD code. These use the VMD text language to query VMD for information or to send
new commands. The VMD text language is based on the Tcl scripting language. Since all the
capabilities of VMD are available at the script level, an external program can control VMD in any
way. Both the ASR and AGR programs interact with VMD using this method. For a simple voice
command, such as “rotate left 90”, the ASR converts the phrase into the VMD text command
“rotate y 90” and sends that to VMD. Similarly, when the AGR is being used as a pointing device,
it sends the commands to change the current position and vector of VMD's graphical 3D pointers.

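To make this division of labor concrete, here is a minimal sketch of an external interface client in the spirit described above. It is not the actual ASR front end: the phrase-to-command table (beyond the “rotate left 90” to “rotate y 90” example from the text), the port number, and the helper names are assumptions for illustration.

# Hypothetical external user-interface client: it maps a recognized phrase to a VMD text
# command and delivers it over a plain TCP connection, as the communications layer allows.
import socket

PHRASE_TO_COMMAND = {
    "rotate left 90": "rotate y 90",    # example taken from the text
    "rotate right 90": "rotate y -90",  # assumed analogue
}

def send_vmd_command(command, host="localhost", port=5555):
    """Send one VMD text-language command terminated by a newline (port is illustrative)."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall((command + "\n").encode("ascii"))

def on_recognized_phrase(phrase):
    command = PHRASE_TO_COMMAND.get(phrase)
    if command is not None:
        send_vmd_command(command)

if __name__ == "__main__":
    # Requires a listener on localhost:5555 playing the role of VMD's command channel.
    on_recognized_phrase("rotate left 90")   # would deliver "rotate y 90"
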
Setup for visual gesture analysis. To facilitate the development of AGR algorithms, we
designed an experimental platform shown in Figure 3 that was used for gesture recognition
experiments. In addition to the uniformly black background, there is a lighting arrangement that
shines red light on the hand without distracting the user from the main 3D display. The setup has
the additional advantage that it can be transported easily and is relatively unobtrusive.

Figure 3. The experimental setup with two cameras used for gesture recognition
(left) and overview of the AGR subsystems (right).

Setup for speech analysis. A prototype ASR system has been implemented and integrated
into VMD. The system consisted of two blocks: a recorder front-end followed by the recognizer unit.
The recorder employed a circularly-buffered memory to implement its recording duties, sending
its output to the recognizer unit in blocks. A digital volume meter accompanied this to provide
feedback to the user by indicating an acceptable range of loudness. The recognizer that followed
was developed by modifying HTK software. This unit performed feature extraction and time-
synchronous Viterbi decoding on the input blocks, sending the decoded speech directly via Tcl-dp
commands to an SGI Onyx workstation where the VMD process resided.

Speech/gesture command language. In order to effectively utilize the information input
from the user in the form of spoken words and simple hand gestures, we have designed a command
language for MDScope that combines speech with gesture. This command language uses the basic
syntax of <action> <object> <modifier>. The <action> component is spoken (e.g.,
“rotate”) while the <object> and <modifier> are specified by a combination of speech and
gesture. An example is speaking “this” while pointing, followed by a modifier to clarify what
is being pointed to, such as “molecule”, “helix”, “atom”, etc., followed by speaking “done” after
moving the hand according to the desired motion. Another example of the desired speech/gesture
capability is the voice command “engage”, which queries VMD for the molecule that is nearest to
the tip of the pointer, makes the molecule blink to indicate that it was selected, and saves a
reference to that molecule for future use. Once engaged, the voice command “rotate” converts the
gesture commands into rotations of the chosen molecule, and the command “translate” converts
them into translations. When finished, the command “release” deselects the molecule and allows
the user to manipulate another molecule. The ASR and AGR techniques that made the above
interaction possible are described next.

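The engage/rotate/translate/release behavior described above amounts to a small state machine. The sketch below is a toy dispatcher illustrating that flow; the class name, the injected "vmd" object, and its query/blink/rotate/translate methods are assumptions, not the MDScope implementation.

# Toy dispatcher for the <action> <object> <modifier> command language: it keeps the
# currently engaged molecule as state and routes gesture input to rotations or translations.
class SpeechGestureController:
    def __init__(self, vmd):
        self.vmd = vmd            # object assumed to expose query_nearest, blink, rotate, translate
        self.molecule = None      # currently engaged molecule
        self.mode = None          # "rotate" or "translate" while engaged

    def on_voice(self, word, pointer_tip=None):
        if word == "engage":
            self.molecule = self.vmd.query_nearest(pointer_tip)
            self.vmd.blink(self.molecule)          # visual confirmation of the selection
        elif word in ("rotate", "translate") and self.molecule is not None:
            self.mode = word
        elif word == "release":
            self.molecule, self.mode = None, None

    def on_gesture(self, delta):
        """delta: hand motion reported by the AGR subsystem for the current frame."""
        if self.molecule is None or self.mode is None:
            return
        if self.mode == "rotate":
            self.vmd.rotate(self.molecule, delta)
        else:
            self.vmd.translate(self.molecule, delta)
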
3.1: Speech input using ASR

In the integration of speech and gesture within the MDScope environment, a real-time decoding
of the user's commands is required in order to keep pace with the hand gestures. Thus there is
a need for “word spotting”, which is defined as the task of detecting a given vocabulary of words
embedded in unconstrained continuous speech. It differs from conventional large-vocabulary
continuous speech recognition (CSR or LVCSR) systems in that the latter seeks to determine an
optimal sequence of words from a prescribed vocabulary. A direct mapping between spoken
utterances and the recognizer's vocabulary is implied with a CSR, leaving no room for the
accommodation of non-vocabulary words in the form of extraneous speech or unintended background
noise. The basis for word spotting, also termed keyword spotting (KWS), is dictated by real world
applications. Real users of a spoken language system often embellish their commands with
supporting phrases and sometimes even issue conversation absent of valid commands. In response
to such natural language dialogue and its implications for robust human-computer interaction,
standard CSR systems were converted into spotters by simply adding filler or garbage models to
their vocabulary. The recognition output stream would then consist of a sequence of keywords and
fillers constrained by a simple syntactical network. In other words, recognizers operated in a
“spotter” mode. While early techniques emphasized a template-based dynamic time warping (DTW)
slant, current approaches are typically armed with the statistical clout of hidden Markov models
(HMMs) [8, 9, 12], and recently with the discriminatory abilities of neural networks (NN). These
were typically word-based and used an overall network which placed the keyword models in parallel
with the garbage models.

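Because the spotter's raw output is a mixed stream of keyword and filler labels, extracting commands is a simple filtering step. The sketch below illustrates that post-processing; the keyword list comes from Table 1 later in this section, while the filler and silence label names are invented for the example.

# Post-processing of a spotter's output stream: keep keyword labels, drop fillers and silence.
KEYWORDS = {"translate", "rotate", "engage", "release", "pick"}

def extract_commands(decoded_labels):
    """Return the recognized keywords in order, discarding filler and silence labels."""
    return [label for label in decoded_labels if label in KEYWORDS]

decoded = ["sil", "garbage_vowel_front", "engage", "garbage_nasal", "rotate", "sil"]
print(extract_commands(decoded))   # -> ['engage', 'rotate']
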
Keywords. Table 1 lists the keywords and their phonetic transcriptions chosen for the
experiment. These commands allowed the VMD user to manipulate the molecules and polymeric
structures selected by hand gestures.

Keyword     Transcription
translate   t-r-ae-n-s-l-ey-t
rotate      r-ow-t-ey-t
engage      eh-n-g-ey-jh
release     r-ih-l-iy-s
pick        p-ih-k

Table 1. Keywords.

In modeling the acoustics of the speech, the HMM system was based on phones rather than words
for large vocabulary flexibility in the given biophysical environment. A word-based system, though
invariably easier to implement, would be inconvenient to retrain if and when the vocabulary changed.

Fillers. Filler models are more varied. In LVCSR applications, these fillers may be represented
explicitly by the non-keyword portion of the vocabulary, as whole words for example. In other tasks,
non-keywords are built by a parallel combination of either keyword “pieces” or phonemes, whether
they be context-independent (CI) monophones or context-dependent (CD) triphones or diphones [9].
Twelve fillers or garbage models were used to model extraneous speech in our experiment. Instead
of being monophones or states of keyword models as used in prior experiments in the literature, the
models that were used covered broad classes of basic sounds found in American English. There are
several things to note. First, the class of “consonant affricates” was not used due to the brevity of
occurrence in both the prescribed vocabulary and training data. As observed by [12] and many other
researchers, varying or increasing the number of models does not gain much in spotting performance.
Second, a model for background silence was included in addition to the twelve garbage models listed.
Such a model removed the need for an explicit endpoint detector by modeling the interword pauses
in the incoming signal. Note also that the descriptors for the vowel class correspond to the position
of the tongue hump in producing the vowel.

Recognition Network. The recognition syntactical network placed the keywords in parallel to
a set of garbage models which included a model for silence. These models followed a null grammar,
meaning that every model may precede or succeed any other model. A global grammar scale factor
(s) and transition probability factor (p) were used to optimize the recognition accuracy and adjust
the operating point of the system.

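The null grammar is easy to express as data: every model may follow every other, and s and p reweight each word-boundary transition, shifting the spotter's operating point. The sketch below is a schematic log-domain scoring rule, not HTK code; it interprets p as a per-transition penalty, and the numeric values of s and p are placeholders.

# Schematic of the null-grammar spotting network: any model may follow any model, and a
# hypothesis score is its summed acoustic log likelihood plus a grammar-scaled transition
# term at every model boundary. Values of s and p are placeholders for tuned settings.
import math

KEYWORD_MODELS = ["translate", "rotate", "engage", "release", "pick"]
GARBAGE_MODELS = ["garbage_%d" % i for i in range(12)] + ["silence"]
MODELS = KEYWORD_MODELS + GARBAGE_MODELS

# Null grammar: the successor set of every model is the full model list.
SUCCESSORS = {m: list(MODELS) for m in MODELS}

def hypothesis_score(segments, s=1.0, p=0.0):
    """segments: list of (model_name, acoustic_log_likelihood) in time order."""
    flat_transition = -math.log(len(MODELS))       # uniform null-grammar transition probability
    acoustic = sum(ll for _, ll in segments)
    n_transitions = max(len(segments) - 1, 0)
    return acoustic + n_transitions * (s * flat_transition + p)
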
Features and Training. After sampling speech at 16 kHz and filtering to prevent aliasing,
the speech samples were preemphasized with a first order digital filter using a preemphasis factor
of 0.97 and blocked into frames of 25 ms in length with a shift between frames of 10 ms. Each
frame of speech was weighted by a Hamming window, and then mel-frequency cepstral coefficients
of sixteenth order were derived and weighted by a liftering factor of 22. Cepstral coefficients were
chosen as they have been shown to be more robust and discriminative than linear predictive coding
coefficients or log area ratio coefficients. Normalized log energy and first order temporal regression
coefficients were also included in the feature vector.

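The front end just described can be sketched directly from its stated parameters: 16 kHz sampling, preemphasis factor 0.97, 25 ms frames with a 10 ms shift, Hamming windowing, 16 cepstral coefficients, and a liftering factor of 22. The NumPy sketch below follows those numbers; the mel filterbank size (26) and FFT length (512) are assumptions, and the log energy and temporal regression (delta) terms are omitted for brevity.

# Sketch of the MFCC front end described in the text (illustrative, not the HTK-derived code).
import numpy as np

def mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def inv_mel(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_frames(signal, fs=16000, frame_len=0.025, frame_shift=0.010,
                n_ceps=16, n_filters=26, n_fft=512, lifter=22):
    # Preemphasis with a first-order filter, factor 0.97.
    emph = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])

    # Frame blocking (25 ms / 10 ms shift) and Hamming windowing.
    flen, fshift = int(fs * frame_len), int(fs * frame_shift)
    if len(emph) < flen:
        emph = np.pad(emph, (0, flen - len(emph)))
    n_frames = 1 + (len(emph) - flen) // fshift
    window = np.hamming(flen)
    frames = np.stack([emph[i * fshift:i * fshift + flen] * window
                       for i in range(n_frames)])

    # Power spectrum and triangular mel filterbank.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    edges = inv_mel(np.linspace(mel(0), mel(fs / 2), n_filters + 2))
    bins = np.floor((n_fft + 1) * edges / fs).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for j in range(n_filters):
        lo, mid, hi = bins[j], bins[j + 1], bins[j + 2]
        fbank[j, lo:mid] = (np.arange(lo, mid) - lo) / max(mid - lo, 1)
        fbank[j, mid:hi] = (hi - np.arange(mid, hi)) / max(hi - mid, 1)
    log_energies = np.log(power @ fbank.T + 1e-10)

    # DCT-II to 16 cepstra, then sinusoidal liftering with L = 22.
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(1, n_ceps + 1), n + 0.5) / n_filters)
    ceps = log_energies @ dct.T
    lift = 1.0 + (lifter / 2.0) * np.sin(np.pi * np.arange(1, n_ceps + 1) / lifter)
    return ceps * lift

if __name__ == "__main__":
    print(mfcc_frames(np.random.randn(16000)).shape)   # one second of noise -> (frames, 16)
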
The topology of the HMMs for both keyword-phones and garbage models consisted of five states,
the three internal states being emitting states. Following a left-to-right traversal, each state was
described by a mixture of five continuous density Gaussians with diagonal covariance matrices.
Three iterations of the Baum-Welch reestimation procedure were used in training.

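This five-state, left-to-right topology can be written down directly as a transition matrix. The sketch below assumes the HTK-style convention that the first and last states are non-emitting entry and exit states (consistent with the three emitting states mentioned above); the self-loop probability is an arbitrary initial value, not a trained one.

# Left-to-right five-state HMM topology: states 0 and 4 are non-emitting entry/exit states,
# states 1-3 are emitting. Each emitting state would carry a 5-component diagonal-covariance
# Gaussian mixture (not shown). Transition values are arbitrary initial guesses.
import numpy as np

def left_to_right_transitions(stay=0.6):
    A = np.zeros((5, 5))
    A[0, 1] = 1.0                       # entry state jumps into the first emitting state
    for i in (1, 2, 3):
        A[i, i] = stay                  # self-loop
        A[i, i + 1] = 1.0 - stay        # advance to the next state
    return A                            # state 4 (exit) has no outgoing transitions

print(left_to_right_transitions())
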
In training the set of fifteen keyword-phones, forty sentences were developed as follows. Each
of the five keywords was individually paired with the remaining four. This was then doubled to
provide a sufficient number of training tokens. In sum, the sentences were composed of pairs of
keywords such as “engage translate” and “rotate pick” which were arranged in such an order as to
allow each of the keywords to be spoken sixteen times. Each VMD user proceeded with this short
recording session.

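The arithmetic behind the forty sentences is easy to check: five keywords each paired with the remaining four gives twenty ordered pairs, and doubling gives forty two-word prompts in which every keyword occurs sixteen times. The sketch below only verifies those counts; the actual prompt ordering used in the recording sessions is not specified in the text.

# Reconstructing the size of the keyword training script described above.
from collections import Counter
from itertools import permutations

keywords = ["translate", "rotate", "engage", "release", "pick"]
pairs = list(permutations(keywords, 2)) * 2        # 20 ordered pairs, doubled -> 40 sentences
counts = Counter(word for pair in pairs for word in pair)

print(len(pairs))          # 40 training sentences
print(counts["engage"])    # 16 occurrences of each keyword
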
In training the garbage models, a much more extensive database of training sentences was required
to provide an adequate amount of training data since the twelve broad classes cover nearly the entire
spectrum of the standard 48 phones. The TIMIT database was subsequently employed to provide
an initial set of bootstrapped models. Retraining was then performed once for one VMD user who
had recorded a set of 720 sentences of commonly used VMD commands. These sentences spanned
the scope of the VMD diction, including a more detailed set of commands, numbers, and modifiers.
This was necessary to provide data normalized to the existing computational environment. Note
that the garbage models were trained only once for this experiment. Hence, VMD users only needed
to go through the short training procedure detailed above.

Performance. Upon testing the system as a whole with fifty test sentences that embedded the
keywords within bodies of non-keywords, word-spotting accuracies ranged up to 98% on the trained
speaker. The trained speaker refers to each user who trained the keyword-phones, regardless of
who trained the garbage models. This performance was considered very good by the VMD users for
the given biophysics environment, supporting the techniques that were used. In general, false alarms
occurred only for those situations where the user embedded a valid keyword within another word. For
example, if one says ‘translation’ instead of ‘translate’, the spotter will still recognize the command
as ‘translate’.

3.2: Hand gesture input using AGR

The general AGR problem is hard, because it involves analyzing the human hand, which has a
very high degree of freedom, and because the use of the hand gesture is not so well understood
(see [7] for a survey on vision-based AGR). However, we use the context of the particular virtual
environment to develop an appropriate set of gestural “commands”. The gesture recognition is
done by analyzing the sequence of images from a pair of cameras positioned such that they facilitate
robust analysis of the hand images. The background is set to be uniformly black to further help
with the real-time analysis without using any specialized image-processing hardware.

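The per-camera analysis described in the next paragraph reduces to computing image moments of a thresholded silhouette and reading off a centroid and an orientation. The sketch below illustrates that moment computation in plain NumPy; the threshold value is arbitrary, and the separate finger bounding-box and segmentation steps are simplified away.

# Moment-based 2D pointing estimate of the kind used per camera: threshold the gray-level
# image, compute first and second image moments of the silhouette, and derive a centroid
# and a major-axis direction.
import numpy as np

def centroid_and_direction(gray, threshold=60):
    mask = (gray > threshold).astype(float)        # silhouette of hand/arm on black background
    m00 = mask.sum()
    if m00 == 0:
        return None, None
    ys, xs = np.indices(mask.shape)
    cx, cy = (xs * mask).sum() / m00, (ys * mask).sum() / m00   # first moments -> centroid

    # Central second moments give the orientation of the major axis (pointing direction).
    mu20 = ((xs - cx) ** 2 * mask).sum() / m00
    mu02 = ((ys - cy) ** 2 * mask).sum() / m00
    mu11 = ((xs - cx) * (ys - cy) * mask).sum() / m00
    angle = 0.5 * np.arctan2(2.0 * mu11, mu20 - mu02)
    return (cx, cy), angle

if __name__ == "__main__":
    frame = np.zeros((120, 160))
    frame[40:45, 20:100] = 255.0                   # a synthetic bright "finger"
    center, theta = centroid_and_direction(frame)
    print(center, np.degrees(theta))               # roughly horizontal major axis
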
Finger as a 3D pointer. The AGR system consists of two levels of subsystems (see Figure 3).
First-level subsystems are used to extract a 2D pointing direction from single camera images. The
second-level subsystem combines the information obtained from the outputs of the first-level
subsystems into a 3D pointing direction. To obtain the 2D pointing direction, the first-level
subsystems perform a sequence of operations on the input image data. The gray-level image is first
thresholded in order to extract a silhouette of the user's lower arm from the background. Next, first
and second image moments are calculated and then used to form a bounding box for extraction of
the index finger. Once the finger is segmented from the hand, another set of image moments is
calculated, this time for the finger itself. Finally, based on these moments, the 2D finger centroid
and finger direction are found. The 3D pointing direction is finally determined in the second-level
subsystem using knowledge of the setup geometry and the 2D centroids and pointing directions.
This information is then forwarded to the central display manager, which displays a cursor at an
appropriate screen position. Our implementation produced a tracking rate of about 4 frames per
second, mainly limited by the