Laurie Gerber
`David Farwell
`Eduard Hovy (Eds.)
`
`Machine Translation
`and the
`Information Soup
`
`Third Conference of the Association
`for Machine Translation in the Americas
`AMTA'98
`Langhorne, PA, USA, October 28-31, 1998
`Proceedings
`
`Springer
`
`AOL Ex. 1022
`Page 1 of 18
`
`

`

Series Editors
Jaime G. Carbonell, Carnegie Mellon University, Pittsburgh, PA, USA
Jörg Siekmann, University of Saarland, Saarbrücken, Germany
`
`Volume Editors
`David Farwell
New Mexico State University, Computing Research Laboratory
Box 30001/3CRL, Las Cruces, NM 88003, USA
`E-mail: david@crl.nmsu.edu
`
`Laurie Gerber
`SYSTRAN Inc.
`7855 Fay Avenue, Suite 300
P.O. Box 907, La Jolla, CA 92037, USA
E-mail: lgerber@systransoft.com
`
`Eduard Hovy
`University of Southern California, Information Sciences Institute
`4676 Admiralty Way, Marina del Rey, CA 90292-6695, USA
E-mail: hovy@isi.edu
`
`Cataloging-in-Publication Data applied for
`
`Die Deutsche Bibliothek - CIP-Einheitsaufnahme
`
Machine translation and the information soup: proceedings / Langhorne, PA,
USA, October 28-31, 1998. David Farwell ... (ed.). - Berlin; Heidelberg;
New York; Barcelona; Hong Kong; London; Milan; Paris; Singapore; Tokyo:
Springer, 1998
(... Conference of the Association for Machine Translation in the Americas, AMTA ... ; 3)
(Lecture notes in computer science; Vol. 1529 : Lecture notes in artificial
intelligence)
`ISBN 3-540-65259-0
`
CR Subject Classification (1998): I.2.7, H.3, F.4.3, H.5, I.5
`
`ISBN 3-540-65259-0 Springer-Verlag Berlin Heidelberg New York
`
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law.
© Springer-Verlag Berlin Heidelberg 1998
Printed in Germany
Typesetting: Camera-ready by author
06/3142 - 5 4 3 2 1 0
`SPIN 106921)06
`
Printed on acid-free paper
`
`
`

`

Integrating Query Translation and Document Translation
in a Cross-Language Information Retrieval System
`
`Guo-Wei Bian and Hsin-Hsi Chen
`Department of Computer Science and Information Engineering
`National Taiwan University
`Taipei, Taiwan, R.O.C.
Email: gwbian@nlg.csie.ntu.edu.tw, hh_chen@csie.ntu.edu.tw
`http://nlg3.csie.ntu.edu.tw
`
Abstract. Due to the explosive growth of the WWW, very large multilingual textual resources have motivated research in Cross-Language Information Retrieval and online Web Machine Translation. In this paper, the integration of language translation and text processing systems is proposed to build a multilingual information system. A distributed English-Chinese system on the WWW is introduced to illustrate how to integrate query translation, search engines, and a web translation system. Since July 1997, more than 46,000 users have accessed our system and about 250,000 English web pages have been translated to pages in Chinese or bilingual English-Chinese versions. The average satisfaction degree of users at the document level is 67.47%.
`
1 Introduction

In the past few years, the World Wide Web (WWW) has grown explosively and has become the most useful and powerful information retrieval and accessing system on the Internet. The WWW breaks the boundaries of countries and provides very large online documents (more than 10 million documents) in multiple languages. These multilingual textual resources have motivated research in Cross-Language Information Retrieval (CLIR) and online Machine Translation (MT) to build the [...] locate interesting and relevant information, the major barrier becomes the language problem for people to search, retrieve, and understand WWW documents in different languages. That decreases the dissemination power of the WWW to some extent.

To alleviate this barrier, some information providers and WWW servers keep multiple copies of their information in different languages for multilingual service. Due to the dynamic nature of the WWW environment, the provided information is updated frequently. This approach therefore suffers from the data inconsistency problem and the management problem of multilingual documents. How to incorporate the capability of language translation into the WWW becomes indispensable for multilingual service. Recently, several online machine translation systems [1-4] have been
`
`
presented. Traditional MT systems cannot be applied to the WWW directly because they are usually used to translate written documents in the off-line batch mode, where translation quality is the most important criterion. In on-line and real-time applications, speed performance is also an important factor.

In this paper, we will focus on the following problems to alleviate the language barrier on the WWW:
1. Language translation techniques
2. The integration of language translation system and text processing system

The language translation system is proposed to be incorporated with different kinds of text processing systems (e.g., searching engines, text summarization systems, etc.). A system integrating MT and IR technologies for the WWW (abbreviated as MTIR) is introduced to illustrate our solutions for the mentioned problems. Section 2 describes a general model of the multilingual information system and introduces the architecture of our bilingual English-Chinese system for the WWW. We discuss the Chinese-English query translation in Section 3. Section 4 specifies how to integrate the query translation of CLIR with several searching engines on the WWW. Section 5 describes the online and real-time web translation. Section 6 evaluates such a multilingual information system from different users' viewpoints. Section 7 gives concluding remarks.
2 Multilingual Information System

Some of the multilingual requirements for computer systems are as follows:
1. Data Representation: character sets and coding systems
2. Data Input: input methods and transliterated input
3. Data Display and Output: font mapping
4. Data Manipulation: the application must be able to handle the different coding characters
5. Query Translation: to translate the information need of users
6. Document Translation: to translate documents using Machine Translation (MT)

The first three requirements have been resolved by system applications in several computer operating systems. Some applications and packages can also handle both single-byte and multiple-byte coding systems for Indo-European and Eastern-Asian languages. However, the language barrier becomes the major problem for people to access the multilingual documents. How to incorporate the capability of language translation to meet requirements 5 and 6 becomes indispensable for multilingual systems.
`
2.1 Four-Layer Multilingual Information System (MLIS)

Fig. 1 shows a four-layer multilingual information system. We put the different processing systems on the four layers:
Layer 1: Language Identification (LI)
Layer 2: Text Processing Systems
Layer 3: Language Translation Systems
Layer 4: User Interface (UI)
`
`
Fig. 1. A Four-Layer Model of Multilingual Information System (MLIS)
[Figure labels: Multilingual Resources (Multiple Languages), Text Processing, Language Translation, Native Language(s)]

Fig. 2. The Overall Architecture of MTIR System
`
`
Because most natural language processing techniques (e.g., lexical analysis, parsing, etc.) are dependent on the language of the processed document, layer 1 resolves the language identification problem before text processing. The language identification system employs cues from the different character sets and coding systems of languages. At layer 2, the systems may perform information extraction, information filtering, information retrieval, text classification, text summarization, or other text processing tasks. Some of the text processing systems may interact with one another. For example, the relevant documents retrieved by an IR system can be summarized for users. Additionally, a multilingual text processing system should be able to handle the different coding characters to match requirement 4 (data manipulation). Several searching engines (e.g., AltaVista, Infoseek, etc.) have the ability to index documents in multiple languages. The language translation systems at layer 3 are used to translate the information need of users for text processing systems and to translate the resultant documents from text processing systems for users into their native languages. The user interface is the closest layer to users. It gets the user's information need (including parameters, query and user profile) and displays the resultant document to the user.
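The coding-system cue used at layer 1 can be illustrated with a minimal sketch. It assumes only two candidate languages and that Chinese pages are encoded in Big5; the function name `identify_language` and the two-way decision are illustrative assumptions, not a description of the actual identifier in the system.

```python
def identify_language(raw: bytes) -> str:
    """Guess 'en' or 'zh' from coding-system cues alone (illustrative only)."""
    # Pure 7-bit ASCII bytes are taken as a cue for English text.
    if all(byte < 0x80 for byte in raw):
        return "en"
    # Assumption: multi-byte text that decodes cleanly as Big5 is Chinese.
    try:
        raw.decode("big5")
        return "zh"
    except UnicodeDecodeError:
        return "unknown"
```

A real identifier would need to separate several multi-byte encodings from each other, which this sketch does not attempt.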
`
2.2 Bilingual English-Chinese Information System for WWW

On the WWW, distinct systems can be easily integrated into a larger distributed system using the HTTP protocol. Each system can be invoked using the URL of a CGI program. First, the CGI program gets input data from the caller. Then the caller gets the resultant document from the server system. Fig. 2 shows the basic architecture of the MTIR system. Users express their intention by inputting URLs of web pages or queries in Chinese/English. A Chinese query is translated into its English counterpart using the query translation mechanism. The translations of query terms are disambiguated using word co-occurrence relationships. Then the system sends the translated query to the searching engine that is selected by the user in the user interface. The query subsystem takes care of the user interface part.

The subsequent navigation on the WWW is under the control of the communication subsystem. To minimize Internet traffic, a caching module is provided in this subsystem and some proxy systems are used to process the request. The objects in the cache are checked when a request is received. If the requested object is not found, the communication system fetches the HTML file (.htm or .html file) or text file (.txt or .text file) from the neighboring proxy systems or the original server.
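The cache check described above can be sketched as follows. This is a minimal illustration under stated assumptions: the class name `PageCache` and the injected `fetch` callable (standing in for "ask a neighboring proxy or the original server") are hypothetical, and no eviction or expiry policy is modeled because the text does not describe one.

```python
from typing import Callable, Dict


class PageCache:
    """Check the cache first; fetch from upstream only on a miss."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch              # hypothetical upstream fetcher
        self._store: Dict[str, str] = {}
        self.misses = 0

    def get(self, url: str) -> str:
        if url not in self._store:       # requested object not found
            self.misses += 1
            self._store[url] = self._fetch(url)
        return self._store[url]
```

Passing the fetcher in as a parameter keeps the sketch free of network access and makes the hit/miss behavior easy to observe.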
`
The HTML analyzer examines the retrieved file. It divides the whole file into several translation segments for the machine translation subsystem. The HTML tags such as title, headings, unordered lists, ordered lists, definition lists, forms and tables play roles similar to those of punctuation marks like full stop, question mark and exclamation mark. In contrast to the above tags, the font style elements, e.g., bold, italic, superscripts, subscripts and font styles, may produce many unknown words because a whole word is split into several parts. Thus these font style elements should be hidden from the attributed words during translation processing. After receiving the first translated document, users may access other information
`
`
through the hyperlinks. We attach our system's URL to those URLs that link to HTML files or text files. This guarantees that successive browsing stays linked with our system. The other URLs, including inline images and external MIME objects, are changed into their absolute URLs. In other words, the non-textual information is received from the original servers. Our experimental system is accessible with the following URL:
http://mtir.csie.ntu.edu.tw
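
The hyperlink handling described above can be sketched as follows. The gateway address `GATEWAY` and its `url=` parameter are hypothetical placeholders (the actual request format of the system is not given in the text), and deciding by file suffix is an assumption based on the file types named earlier.

```python
from urllib.parse import quote, urljoin, urlparse

GATEWAY = "http://gateway.example/translate?url="   # hypothetical prefix
TEXT_SUFFIXES = (".htm", ".html", ".txt", ".text")  # file types named in the text


def rewrite_link(page_url: str, href: str) -> str:
    """Route HTML/text links through the gateway; make all others absolute."""
    absolute = urljoin(page_url, href)
    path = urlparse(absolute).path.lower()
    if path.endswith(TEXT_SUFFIXES):
        # Successive browsing stays linked with the translation system.
        return GATEWAY + quote(absolute, safe="")
    # Images and other MIME objects come from the original server.
    return absolute
```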
3 Query Translation

Several approaches have been proposed for CLIR recently. There are four main approaches for query translation:
1. Dictionary-based approach [5-8]
2. Corpus-based approach [9-10]
3. Hybrid approach (combined dictionary-based and corpus-based) [6]
4. Machine Translation based approach (MT-based) [11]

Because large parallel Chinese-English corpora are not available, the dictionary-based approach is adopted in our system. The query translation for Chinese-English CLIR consists of three major steps:
1. Word segmentation: To identify the word boundaries of the input stream of Chinese characters.
2. Query translation: To construct the translated English query using the bilingual dictionary. The translation disambiguation is done using the monolingual corpus.
3. Monolingual IR: To search the relevant documents using the translated queries.

The segmentation and the query translation use the same bilingual dictionary in this design. That speeds up the dictionary lookup and avoids the inconsistencies resulting from two dictionaries (i.e., segmentation dictionary and transfer dictionary). This bilingual dictionary has approximately 90,000 terms. The longest-matching method is adopted in Chinese segmentation. The segmentation processing searches for a dictionary entry corresponding to the longest sequence of Chinese characters from left to right. After identification of Chinese terms, the system selects some of the translation equivalents for each query term from the bilingual dictionary. The terms of a query can be translated at two different levels of dictionary translation: word-level (word-by-word) and phrase-level translations. Those terms missing from the transfer dictionary are passed unchanged to the final query.
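
The longest-matching segmentation can be sketched as below. This is a minimal illustration, not the system's implementation: the function name `segment`, the set-based dictionary, and the fallback of emitting an unmatched single character as its own term are assumptions made for the example.

```python
from typing import List, Optional, Set


def segment(text: str, dictionary: Set[str], max_len: Optional[int] = None) -> List[str]:
    """Left-to-right longest-match segmentation over a term dictionary."""
    if max_len is None:
        max_len = max((len(term) for term in dictionary), default=1)
    terms: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so unknown text passes through.
            if size == 1 or candidate in dictionary:
                terms.append(candidate)
                i += size
                break
    return terms
```

With only word-level entries the concept 奇異值分解 is split into three terms; once the phrase itself is in the dictionary, the same input yields a single term, which is the difference between word-level and phrase-level translation.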
3.1 Selection Strategies

When there is more than one translation equivalent in a dictionary entry, the following selection strategies are explored.

(1) Select-All (SA): The system looks up each term in the bilingual dictionary and constructs a translated query by concatenating all the senses of the terms.

(2) Select-Highest-Frequency (SHF): The system selects the sense with the highest frequency in the target language corpus for each term. Because the translation probabilities of senses for each term are unavailable without a large-scale word-aligned bilingual corpus, the translation probabilities are reduced to the probabilities of senses in the target language corpus. So, the frequently-used transferring sense of a term is used instead of the frequently-translated sense.

(3) Select-N-POS-Highest-Frequency (SNHF): This strategy selects the highest-frequency sense of each POS candidate of the term. If the term has N POS candidates, the system will select N translation senses. Compared to this strategy, strategy (2) always selects only one sense for each term.
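
Strategies (1) to (3) can be sketched as three small functions. The dictionary entry layout (POS mapped to a list of senses) and all frequency counts used in the usage note are hypothetical; only the selection logic follows the descriptions above.

```python
from typing import Dict, List

Entry = Dict[str, List[str]]  # POS -> translation senses (assumed layout)


def select_all(entry: Entry) -> List[str]:
    """SA: keep every sense of the term."""
    return [sense for senses in entry.values() for sense in senses]


def select_highest_frequency(entry: Entry, freq: Dict[str, int]) -> List[str]:
    """SHF: keep the single sense most frequent in the target corpus."""
    return [max(select_all(entry), key=lambda sense: freq.get(sense, 0))]


def select_n_pos_highest_frequency(entry: Entry, freq: Dict[str, int]) -> List[str]:
    """SNHF: keep the most frequent sense for each POS candidate."""
    return [max(senses, key=lambda sense: freq.get(sense, 0))
            for senses in entry.values()]
```

For an entry `{"N": ["oddity", "singularity"], "ADJ": ["singular"]}` and made-up counts `{"oddity": 1, "singularity": 3, "singular": 12}`, SA keeps all three senses, SHF keeps only `singular`, and SNHF keeps `singularity` and `singular`.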
(4) Word co-occurrence (WCO): This method classifies words on the basis of their co-occurrence with other words. The translation of a query term can be disambiguated with the co-occurrence of its translation equivalents and other words' equivalents. The mutual information (MI) of word pairs reflects the word association norms in one language. If two words x and y have probabilities P(x) and P(y), their mutual information [12] is defined to be

$$I(x, y) = \log_2 \frac{P(x, y)}{P(x)\,P(y)}$$

This method considers the content around the translation equivalents within the text collection to decide the best target equivalent. The mutual information of word pairs is trained using a window size of 3 in the CACM text collection [13]. In total, there are 247,864 word pairs.
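
The mutual information formula and its use for disambiguation can be sketched as follows. The scoring rule (choose the combination of equivalents with the largest summed pairwise MI, found by exhaustive search) is an assumption made for illustration, since the text does not spell out the exact decision procedure; the MI values in the usage note are invented.

```python
import math
from itertools import product
from typing import Dict, FrozenSet, List, Tuple


def mutual_information(p_xy: float, p_x: float, p_y: float) -> float:
    """I(x, y) = log2( P(x, y) / (P(x) P(y)) )."""
    return math.log2(p_xy / (p_x * p_y))


def disambiguate(candidates: List[List[str]],
                 mi: Dict[FrozenSet[str], float]) -> Tuple[str, ...]:
    """Pick one equivalent per term, maximizing the summed pairwise MI."""
    def score(combo: Tuple[str, ...]) -> float:
        # Pairs with no co-occurrence relation contribute nothing.
        return sum(mi.get(frozenset((a, b)), 0.0)
                   for i, a in enumerate(combo) for b in combo[i + 1:])
    return max(product(*candidates), key=score)
```

With a toy table that gives `singular`/`value` a score of 6.0, `value`/`decomposition` 4.0 and `singular`/`decomposition` 2.0, the candidates for the three terms resolve to `("singular", "value", "decomposition")`. Exhaustive search is only practical for short queries; a longer query would need a greedy or dynamic-programming variant.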
Table 1 illustrates an example of different translations. The Chinese concept '奇異值分解' (jiyi zhi fenjie) and its phrase-level translation 'singular value decomposition' are employed. Four translated representations using different selection strategies on the word-level translation are shown in Table 1(a). Column 3 shows the translation equivalents in the transfer dictionary for the query terms at word-level. Table 1(b) lists the mutual information of some word pairs of translation equivalents. Most word pairs have no co-occurrence relations. Considering the example, the equivalent 'singular' of the term '奇異' (jiyi) has the largest MI score with all translation equivalents of the other two words.
3.2 Experiments and Evaluations

In the following experiments, the word-level translations and the phrase-level translations are examined to demonstrate the problems arising from missing terminology and multi-term concepts. In addition, we will evaluate these selection strategies with the long and the short versions of queries. The short queries are used to simulate the behavior of our methods for the WWW. The SMART information retrieval system [14] is utilized to measure the similarity of the query and each document using the vector space model. The query weights are multiplied by the traditional IDF factor. The test collection CACM is used to evaluate the performance of different selection strategies. This collection contains 3204 texts and 64 queries in English. Each query has relevance judgements. The average number of words in a query is approximately 20.

In order to test the effectiveness of query translation, we create the Chinese queries by manually translating the original English queries to Chinese ones. The Chinese queries are regarded as the input queries later. Each Chinese query is translated to four target queries using different selection strategies. The following experiments compare the retrieval performances of the four translated versions of
`
Chinese queries to the results of the original English queries. One example of the original English query, the human translated Chinese version, and the translated queries is shown in Table 2. It gives the segmented Chinese string and four automatically translated representations for the CACM Q1. Parentheses surround the English multi-term concepts and brackets surround the translation equivalents of each term.

To compare the performances of the word-level translation and phrase-level translation, the CACM English queries are manually checked to find the multi-term concepts that are not contained in our bilingual dictionary. These concepts and their translations are added into the bilingual dictionary for the phrase-level experiments. In total, 102 multi-word concepts (e.g., remote procedure call ([...]), singular value decomposition (奇異值分解), etc.) are identified in the CACM queries.
`
Table 1. Different translations of Chinese concept '奇異值分解' (singular value decomposition)

Table 1(a). Translated representations based on different strategies

Term | POS | SA | SHF | SNHF | WCO
奇異 (jiyi) | N | oddity singularity | - | singularity | -
奇異 (jiyi) | ADJ | singular | singular | singular | singular
值 (zhi) | N | value worth | value | value | value
分解 (fenjie) | N | decomposition analysis dissociation cracking disintegration | - | decomposition | decomposition
分解 (fenjie) | V | analyze anatomize decompose decompound disassemble dismount resolve | analyze | analyze | -
分解 (fenjie) | XV | (split up) (break up) | - | (split up) | -

Table 1(b). The mutual information for some word pairs

Word | Equivalents
奇異 (jiyi) | oddity (w11), singular (w12), singularity (w13)
值 (zhi) | value (w21), worth (w22)
分解 (fenjie) | analysis (w31), decomposition (w32), analyze (w33), decompose (w34), decompound (w35), resolve (w36)

The matrix over w11 ... w36 is symmetric; most cells are '-' (no co-occurrence relation). The MI scores that appear in it are 6.099, 4.115, 6.669, 1.823 and 4.377.
`
`
Table 2. The Chinese query and four translated representations for CACM Q1

Original Query: What articles exist which deal with TSS 'Time Sharing System', an operating system for IBM computers?
Chinese Query: [...]
1. Segmentation: [...]
2.1 SA: those article [be yes yah yep] about TSS '[minute cent apportion deal dissever sharing] time [formation lineage succession system]', [a ace mono] [class seed] IBM [computer computing] of [(operating system) (operation system) OS]
2.2 SHF: those article be about TSS 'deal time system', a class IBM computer of (operating system)
2.3 SNHF: those article [be yes] about TSS '[minute deal] time system', [a mono] class IBM computer of [(operating system) OS]
2.4 WCO: those article be about TSS 'sharing time system', a class IBM computer of (operating system)
`
Over a wide range of operational environments, the average length of user-supplied queries is 1.5 - 2 words and rarely more than 4 words. Hull and Grefenstette [7] work with the short versions of queries (average length of seven words) from French to English in TREC experiments. But no comparison of the short and long queries is available. To evaluate the behavior of users' short queries, we make additional experiments to compare with the results of the original long queries. Three researchers helped us to create the English and Chinese versions of short queries from the original English queries of CACM. For example, the short version of CACM Q1 is "TSS Timing Sharing System". On average, a short query has nearly 4 words, including single-word terms and multi-term concepts. The short version of the English queries is regarded as the baseline to compare the results of translated queries of the short Chinese queries.

The overall results are shown in Fig. 3. The 11-point average precision [15] of the monolingual short English queries is 29.85%. It achieves 83.42% of the performance of the original English queries. In word-level experiments, the best WCO (word co-occurrence) strategy gets 72.96% of the performance of the monolingual English short version and 65.18% of the monolingual original English version. At phrase-level, WCO achieves 87.14% and 74.71% respectively. The SHF, SNHF, and WCO selection strategies perform better in the long queries than in the short ones. However, the simple SA strategy has the opposite result. Because users give more specific terms in short queries, the SA strategy introduces fewer extraneous terms into the query. Alternatively, the phrase-level translation improves the performance by 14~31% over the word-level translation for Chinese-English CLIR. Combining the phrase dictionary and co-occurrence disambiguation can bring the performance up to 87% of monolingual retrieval in short queries. Recall that the multi-word concepts and their translations are added to the dictionary in our experiments after the domain experts have examined the queries. Hence the coverage of the bilingual phrasal dictionary will affect the performance of CLIR. Even though the bilingual dictionary does not contain these multi-word concepts, the WCO method still achieves near 70% monolingual effectiveness for different lengths of query at word-level translation.
`
`
Fig. 3. The comparison of retrieval performances of query translations for the long queries and short queries in different levels of translations

11-point average precision (%), as plotted:

Series | Monolingual | SA | SHF | SNHF | WCO
word-level | 35.78 | 16.39 | 21.89 | 19.33 | 23.32
phrase-level | 35.78 | 20.45 | 26.41 | 23.62 | 26.73
word-level (short query) | 29.85 | 18.28 | 19.57 | 17.42 | 21.78
phrase-level (short query) | 29.85 | 23.36 | 24.93 | 22.92 | 26.01
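
The percentages quoted in the discussion of Fig. 3 follow from simple ratios of the plotted precision values. The sketch below recomputes them; the dictionary layout and the helper names `relative_to_monolingual` and `phrase_gain` are illustrative, while the numbers are the ones plotted in Fig. 3.

```python
# 11-point average precision (%) as plotted in Fig. 3.
PRECISION = {
    "word-level": {"Monolingual": 35.78, "SA": 16.39, "SHF": 21.89, "SNHF": 19.33, "WCO": 23.32},
    "phrase-level": {"Monolingual": 35.78, "SA": 20.45, "SHF": 26.41, "SNHF": 23.62, "WCO": 26.73},
    "word-level (short query)": {"Monolingual": 29.85, "SA": 18.28, "SHF": 19.57, "SNHF": 17.42, "WCO": 21.78},
    "phrase-level (short query)": {"Monolingual": 29.85, "SA": 23.36, "SHF": 24.93, "SNHF": 22.92, "WCO": 26.01},
}


def relative_to_monolingual(series: str, strategy: str) -> float:
    """Cross-language precision as a percentage of the monolingual baseline."""
    row = PRECISION[series]
    return 100.0 * row[strategy] / row["Monolingual"]


def phrase_gain(strategy: str, short: bool = False) -> float:
    """Percentage improvement of phrase-level over word-level translation."""
    suffix = " (short query)" if short else ""
    word = PRECISION["word-level" + suffix][strategy]
    phrase = PRECISION["phrase-level" + suffix][strategy]
    return 100.0 * (phrase / word - 1.0)
```

For instance, `relative_to_monolingual("phrase-level (short query)", "WCO")` gives about 87.14, and `phrase_gain` ranges from about 14.6 (WCO, long queries) to about 31.6 (SNHF, short queries), matching the 14~31% range stated in the text.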
`
4 Search Engines

Six popular search engines are integrated with language translation in our MTIR system. The user inputs a query and selects one of the search engines in the user interface. The Chinese query terms will be translated to English ones. After the processing of query translation, our system will send an HTTP request composed of the translated query to the chosen search engine. The retrieved results from the search engine will be translated to the user's native language (Chinese). In general, the CGI program of the searching engine processes the HTTP request of the query. For instance, assume that "machine translation" is the translated query of the Chinese query "機器翻譯" (jiqi fanyi). The HTTP requests for the CGI programs of several search engines are listed in Table 3. The query words should be separated with the symbol '+' for the standard URL encoding.

Hull and Grefenstette [7] give five different definitions for multilingual information retrieval. Type 4 is "IR on a multilingual document collection, where queries can retrieve documents in multiple languages". How to merge and rank the retrieved documents in different languages is a problem in CLIR. Among these systems, AltaVista and Infoseek have indexed both English and Chinese web pages. If a bilingual query ("機器翻譯+machine+translation") is invoked, the two systems will list the relevant documents of both languages. However, the ranking for documents in different languages does not seem good. It is still a problem for multilingual IR.
`
`
Table 3. HTTP Requests for the CGI Programs of Searching Engines

Search Engine | Chinese Indexing | HTTP Request for the CGI Program
AltaVista | Yes | http://www.altavista.digital.com/cgi-bin/query?pg=q&what=web&kl=XX&q=machine+translation&search.x=35&search.y=9
Excite | No | http://search.excite.com/search.gw?search=machine+translation
Infoseek | Yes | http://www.infoseek.com/Titles?qt=machine+translation&col=WW&sv=IS&lk=noframes&nh=10
Lycos | No | http://www.lycos.com/cgi-bin/pursuit?matchmode=and&cat=lycos&query=machine+translation&x=30&y=4
MetaCrawler | No | http://www.metacrawler.com/crawler?general=machine+translation&method=0&target=&region=0&rpp=20&timeout=5&hpe=10
Yahoo | No | http://search.yahoo.com/bin/search?p=machine+translation
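
Composing such a request from a translated query can be sketched as below. The two templates are taken from the Excite and Yahoo rows of Table 3 and reflect those services as of 1998; the `TEMPLATES` mapping and the function name `build_request` are illustrative, and the sketch only builds the URL string without sending anything.

```python
from urllib.parse import quote_plus

# Request templates from Table 3 (historical endpoints, shown for illustration).
TEMPLATES = {
    "Excite": "http://search.excite.com/search.gw?search={q}",
    "Yahoo": "http://search.yahoo.com/bin/search?p={q}",
}


def build_request(engine: str, translated_query: str) -> str:
    """Join query words with '+' and place them in the engine's URL template."""
    # quote_plus turns spaces into '+' and escapes other reserved characters.
    return TEMPLATES[engine].format(q=quote_plus(translated_query))
```

Calling `build_request("Yahoo", "machine translation")` yields the Yahoo request listed in Table 3.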
`
5 Document Translation

The requirement for an online machine translation system for users to navigate on the WWW is different from that for traditional off-line batch MT systems. An assisted MT system should help users quickly understand the Web pages and find the documents of interest during navigation over very large information resources. That is, different users' behaviors affect the requirements of machine translation systems.

From the users' viewpoint, a high-quality and high-speed online machine translation is required. However, several steps should be performed after a query is issued. It takes time for the transfer of the query, the query translation, the retrieval of the document satisfying the query, the transfer of the retrieved document and the document translation. How to find the tradeoff between the speed performance and the translation performance on the WWW is an important issue. Besides this issue, our previous work [1] addressed four other issues, including which material is translated, what roles the HTML tags play in translation, what form the translated result is presented in, and where the translation capability is implemented, to design online machine translation systems for the WWW.

Many different approaches to machine translation design have been proposed [16-21]. These include rule-based, example-based, statistics-based, knowledge-based, and glossary-based approaches. A hybrid approach [22] integrates the advantages of these approaches and tries to get rid of their disadvantages. A rule-based partial parsing method is adopted and the translation process is performed chunk by chunk. We follow this design strategy and consider the characteristics of web translation. The following sections depict the details of the analysis, transfer and synthesis modules.
5.1 Analysis Module

At first, we identify the sentence types of source sentences using sentence delimiters. Some structural transfer rules can only be applied to some types of sentences. Then, we perform a morphological analysis. The words in morphological forms (e.g., +ed, +ing, +ly, +s, etc.) are tagged with the morphological tags, which are useful for part-of-speech tagging, word sense disambiguation and the generation of the target sense using the sense of the root word.

After morpheme processing, the words in root forms are searched from various
`
`
dictionaries using the longest-matching strategy. There are about 67,000 word entries in an English-Chinese general dictionary and 5,500 idioms in a phrasal dictionary. In addition, some domain specific dictionaries are required for better translation performance. After dictionary lookup, the idioms and the compound words are treated as complete units for POS tagging and sense translation.

For consideration of the speed and robustness issues, a three-stage hybrid method is adopted to deal with part-of-speech tagging. It treats the certain cases using heuristic rules, and disambiguates the uncertain cases using a statistical model. At stage 1, the words with specific morphological tags can be tagged without ambiguities. For example, a word of the pattern ADJ+ly is tagged with RB. The tagging of some morphological words depends on the morphological tag and the POS of its root form. For example, if the dictionary tag of the root of a word (root-er) is JJ, then this word is an adjective. Otherwise, it is a noun. Besides, if a word does not have any morphological tags and has only one POS candidate in the dictionary, then the unique POS is assigned to this word. At stage 2, a pattern matching method that considers the morphological tags of the current and the next words, as well as the POS of the next word, is employed to do the POS tagging. Stage 3 deals with the remaining words, which have not been tagged up to now. A statistical bigram HMM model is used to resolve the uncertain cases.
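
The stage 1 heuristics can be sketched as follows. The tiny `LEXICON`, the function name `stage1_tag`, and the naive suffix stripping (no spelling changes such as "happily" to "happy") are assumptions for illustration; stages 2 and 3 are not modeled, so unresolved words simply return `None`.

```python
from typing import Dict, Optional, Set

# Hypothetical dictionary: root form -> POS candidates.
LEXICON: Dict[str, Set[str]] = {
    "quick": {"JJ"},
    "teach": {"VB"},
    "page": {"NN"},
    "run": {"VB", "NN"},
}


def stage1_tag(word: str, lexicon: Dict[str, Set[str]] = LEXICON) -> Optional[str]:
    """Tag only the unambiguous cases; return None to defer to later stages."""
    # ADJ+ly is tagged as an adverb (RB).
    if word.endswith("ly") and "JJ" in lexicon.get(word[:-2], set()):
        return "RB"
    # root-er: adjective if the root is JJ in the dictionary, otherwise a noun.
    if word.endswith("er") and word[:-2] in lexicon:
        return "JJ" if "JJ" in lexicon[word[:-2]] else "NN"
    # No morphological tag and a single POS candidate: assign it.
    candidates = lexicon.get(word, set())
    if len(candidates) == 1:
        return next(iter(candidates))
    return None
```

Here `stage1_tag("quickly")` gives `RB`, `stage1_tag("teacher")` gives `NN`, and the ambiguous `run` is left for the pattern-matching and HMM stages.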

To reduce the cost of full parsing in a real-time service, we adopt a partial parser to get the skeletons of sentences. An NP/ADJP finite state machine (FSM) is used to segment the source sentence into a sequence of chunks. This FSM analyzes the tag sequence and recognizes the fundamental noun phrases and adjective phrases in linear time. A predicate-argument detector then analyzes the skeleton of the sentence [23]. PP attachment is determined by rule templates [24].
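
A chunker of this kind might look like the sketch below. The chunk patterns are assumptions made for illustration (NP as an optional determiner, any adjectives, then one or more nouns; ADJP as a run of adjectives with no noun following); the actual patterns of the FSM are not given in the text. Each token is scanned at most twice, so the pass stays linear in the length of the tag sequence.

```python
def chunk(tags):
    """Segment a POS-tag sequence into (label, start, end) chunks.

    'end' is exclusive. Tokens outside NP/ADJP are emitted one by one as 'O'.
    """
    chunks, i, n = [], 0, len(tags)
    while i < n:
        start = i
        if tags[i] == "DT":                        # optional determiner
            i += 1
        while i < n and tags[i] == "JJ":           # any number of adjectives
            i += 1
        adj_end = i
        while i < n and tags[i].startswith("NN"):  # one or more nouns
            i += 1
        if i > adj_end:
            chunks.append(("NP", start, i))
        elif adj_end > start and tags[start] == "JJ":
            chunks.append(("ADJP", start, adj_end))
        else:
            # Lone determiner or any other tag: single-token chunk.
            i = start + 1
            chunks.append(("O", start, i))
    return chunks
```

For the tag sequence `["DT", "JJ", "NN", "VBZ", "JJ"]`, the sketch yields one NP covering the first three tags, a single-token chunk for the verb, and an ADJP for the final adjective.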

5.2 Transfer Module

The structural transfer, the tense transfer and the lexical selection address the differences between the source and target languages. The major structural transfers occur in comparative clauses, question sentences and the modification of noun phrases. The structure of noun phrases is left-recursive in English but right-recursive in Chinese. Because of this recursion, the transferred target structure is treated as a whole chunk in subsequent processing. For different tenses, the words "have" and "be" have different senses in Chinese.

Phrases and idioms are treated as complete units during lexical selection. A bilingual phrase dictionary is employed to produce phrase-by-phrase translation. For the remaining words, several word selection algorithms, such as select-first, select-the-highest-frequency-word and the mutual information method, may be adopted to select the target sense. The select-first method always selects the first translation sense from the candidates with the matched POS. The second method chooses the target sense with the highest occurrence probability, trained from a large-scale corpus of the target language. The mutual information model considers the context around the words to decide the best combination of target words. Different models access various training tables. The larger the table is, the more time it takes. Section 6 will discuss the time complexity, the table space and the translation accuracy.
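
The three selection strategies can be contrasted with a small sketch. All tables here are hypothetical toy data, and scoring a combination by summing pairwise mutual information between adjacent target words is an assumption; the text does not specify how the model combines its scores.

```python
from itertools import product

# Hypothetical tables: (source word, POS) -> ordered target candidates,
# target-word probabilities, and pairwise mutual information scores.
CANDIDATES = {("bank", "NN"): ["河岸", "銀行"], ("money", "NN"): ["錢"]}
FREQ = {"河岸": 0.001, "銀行": 0.01, "錢": 0.02}
MI = {("銀行", "錢"): 3.2, ("河岸", "錢"): 0.1}


def select_first(word, pos):
    """Always take the first sense among candidates with the matched POS."""
    return CANDIDATES[(word, pos)][0]


def select_most_frequent(word, pos):
    """Take the candidate with the highest target-language probability."""
    return max(CANDIDATES[(word, pos)], key=lambda t: FREQ.get(t, 0.0))


def select_by_mi(tagged_words):
    """Pick the combination of senses with the highest total pairwise MI."""
    options = [CANDIDATES[wp] for wp in tagged_words]

    def score(combo):
        return sum(MI.get((a, b), 0.0) + MI.get((b, a), 0.0)
                   for a, b in zip(combo, combo[1:]))

    return list(max(product(*options), key=score))
```

The exhaustive search over `product(*options)` grows quickly with sentence length, which is consistent with the remark that larger tables and richer models cost more time.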
`
`

5.3 Synthesis Module

The synthesis module deals with word insertion, deletion and word order refinement. For example, if the source word with morpheme tag YJB is tagged as an adverb (RB) and derived from the adjective (JJ) word form, the target sense is generated by deleting the character "的" (de) and appending "地" (di). The character "的" (de) always appears at the end of Chinese adjectives, and the character "地" (di) at the end of adverbs. In addition, if the present participle and the past participle are tagged as adjectives, the character "的" (de) is inserted into the target sense.
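
The two character-level rules can be sketched as small string operations. The function names are hypothetical, and placing "的" at the end of the participle sense is an assumption based on the statement that it appears at the end of Chinese adjectives; the text does not say where it is inserted.

```python
def synthesize_adverb(adjective_sense):
    """Adverb derived from an adjective: drop a final 的 (de), append 地 (di)."""
    if adjective_sense.endswith("的"):
        adjective_sense = adjective_sense[:-1]
    return adjective_sense + "地"


def synthesize_participle(sense):
    """Participle tagged as adjective: make sure the sense ends in 的 (de)."""
    return sense if sense.endswith("的") else sense + "的"
```

For instance, `synthesize_adverb("快樂的")` gives "快樂地", and `synthesize_participle("破碎")` gives "破碎的".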

Our previous work [1] introduced the generation of bilingual aligned document for web translation s
