CROSS-LANGUAGE
`INFORMATION RETRIEVAL
`
`edited by
`
`Gregory Grefenstette
`Xerox Research Centre Europe
`Grenoble, France
`
`KLUWER ACADEMIC PUBLISHERS
Boston / Dordrecht / London
`
`AOL Ex. 1021
`Page 1 of 14
`
`

`

Distributors for North America:
`Kluwer Academic Publishers
`101 Philip Drive
`Assinippi Park
`Norwell, Massachusetts 02061 USA
`
Distributors for all other countries:
`Kluwer Academic Publishers Group
`Distribution Centre
`Post Office Box 322
`3300 AH Dordrecht, THE NETHERLANDS
`
Library of Congress Cataloging-in-Publication Data
`
`A C.I.P. Catalogue record for this book is available
`from the Library of Congress.
`
`The publisher offers discounts on this book when ordered in bulk quantities. For
`more information contact:
`Sales Department, Kluwer Academic Publishers,
`101 Philip Drive, Assinippi Park, Norwell, MA 02061
`
`Copyright © 1998 by Kluwer Academic Publishers
`
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher, Kluwer Academic Publishers, 101 Philip Drive, Assinippi Park, Norwell, Massachusetts 02061
`
`Printed on acid-free paper.
`
`Printed in the United States of America
`
CHAPTER 8
A LANGUAGE CONVERSION FRONT-END FOR
CROSS-LANGUAGE INFORMATION RETRIEVAL
YAMABANA Kiyoshi, MURAKI Kazunori,
DOI Shinichi, KAMEI Shin-ichiro
`
`Information Technology Research Laboratories
`NEC Corporation
`Miyazaki 4-1-1, Miyamae-ku, Kawasaki 216, Japan
`
`ABSTRACT
`
We present a new disambiguation method for query translation in cross-language information retrieval. The method is an application of ambiguity resolution methods developed for machine translation systems, and is a combination of a statistics-based word selection method, where statistical information is extracted from non-parallel corpora, and an interactive user interface to improve the translation quality. We describe its implementation as a front-end to IR systems.
`
1 INTRODUCTION
With the rapid expansion of the Internet and the increasing amount of online documents available in foreign languages, cross-language information retrieval (CLIR) is getting more focus and interest than ever. In CLIR, users are allowed to build a query in their native language to search documents written in another language, and to utilize the retrieved result effectively.
`
A straightforward approach to this problem would be to apply machine translation (MT), translating either the query or the documents to reduce the task to monolingual retrieval. However, the usefulness of an MT system for this purpose is not obvious, because translation quality becomes most important in this case. Generally speaking, high quality automatic translation can be obtained only when the applicable domain is limited and sufficient domain knowledge is incorporated. In addition, its disambiguation capability might not be sufficient in a query translation approach, since queries tend to be short, sometimes even without internal structure. Many of the disambiguation processes in MT that depend on syntactic analysis will be ineffective. A document translation approach may partly avoid this problem, but this approach brings other shortcomings, such as the increased cost inherent in duplicating databases. Since a large lexicon having a sufficient number of multi-word terms, and proper disambiguation in translation, are two important factors in obtaining a good result in lexicon-based query translation [HG96a], it is worthwhile to pursue a disambiguation method that does not depend on syntactic analysis.
`
In this chapter, we present a lexicon-based query translation method in which a statistical word selection method [DM92, DM93] is combined with an interactive user interface [MASA94, YDK+95] for disambiguation, both originally developed for machine translation. In this method, the translation equivalent is determined from among the possible candidates in the bilingual lexicon, using a statistics-based method, which we call the DMAX method, in which co-occurrence frequency from non-parallel corpora and information from a bilingual lexicon are combined. Then, an interactive user interface helps the user to change the selection, edit words, and obtain an explanation of translated words in the user language, at the user's will.
`
Section 2 discusses problems in translation equivalent selection, and shortcomings of a simple statistics-based method using word co-occurrence information, motivating our method. Section 3 describes the details of our method. Section 4 describes an implementation as a language conversion front-end to IR systems. Section 5 shows an experimental result. Section 6 is for discussion, and Section 7 concludes this chapter.
`
`2 PROBLEMS IN TRANSLATION
`EQUIVALENT SELECTION
`
Ambiguity resolution in translation or word sense selection has been one of the largest problems not only in query translation but also in general machine translation systems. These systems, which are put into practical use nowadays, have mainly adopted rule-based disambiguation methods utilizing linguistic restrictions between modifiers and modifyees described in a lexicon and grammar to select a suitable equivalent translation. Although these methods are effectively used in machine translation systems, they have many problems: difficulty in describing restrictions on all dependencies in advance, inability to select suitable translations if the input expression meets two or more contradictory restrictions or meets no restrictions, etc. These problems are especially conspicuous in query translation, since queries are usually segments of sentences or phrases or even a mere sequence of words without an internal syntactic structure. As a result, in query translation, current machine translation systems can work only as a lexicon consultation tool without enough disambiguation capability.
`
In order to overcome these difficulties of rule-based systems, translation methods utilizing corpus-based knowledge such as translation examples (pairs of a source text and its translation) and statistical or probabilistic information extracted from large corpora have been proposed in recent years. Among them, methods based on statistical information such as (co-)occurrence frequency are effective for query translation because they can be applied independently of structural analysis.
`
In statistics-based word selection methods, suitable translation equivalents are selected using statistical or probabilistic information extracted from language texts [BDDM93, Nom91, Yar92]. These methods are applicable to converting queries without syntactic analysis, while the statistical information reflects the context in which each word occurs and can imply the logical restrictions based on indirect structural dependencies. Moreover, because a lot of machine readable texts have already been collected, it is not difficult to extract statistical information for expressions in the texts (semi-)automatically. However, a method based on statistics extracted from bilingual corpora cannot fully benefit from this merit, since it is difficult and expensive to collect sufficiently large bilingual corpora, divide the sentences of such corpora into fragments and align them automatically, especially for languages whose linguistic structures are not similar, such as English and Japanese.
`
The simplest statistics-based translation method to utilize word co-occurrence information is to select the target word pair with the highest co-occurrence frequency. It is calculated in advance how frequently each expression occurs or each pair of expressions co-occurs in the target language text, and the most frequent translation is selected using this calculated data. This method is simple and easy to apply. However, it suffers from errors caused by highly frequent co-occurrence between target words that are equivalent translations of the source words but are not suitable in the given context.
`
Kotori-no   kago-ni           mizu-o               ireta    booru-o              oita.
bird of     cage or basket    water                filled   bowl or ball         put
            (dative marker)   (objective marker)            (objective marker)

I put a bowl filled with water in the bird cage.

Figure 1 Word-to-word English correspondence for a sample Japanese sentence.
`
Suppose that the Japanese words kotori, kago, booru appear in the input sentence: Kotori-no kago-ni mizu-o ireta booru-o oita¹. The Japanese word kago has two equivalent English translations, 'cage' and 'basket', and booru has two translations, 'ball' and 'bowl'. Because the pair of 'basket' and 'ball' may co-occur most frequently, the method using only the statistics of the target language might translate the sentence into 'I put a ball filled with water in the bird basket.'
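The failure described above can be reproduced with a small sketch. Everything in it is an assumption made for illustration: the three-entry lexicon, the co-occurrence counts, and the function name `select_target_only` are invented here and are not taken from the authors' system. The sketch greedily fixes the candidate pair with the highest target-language co-occurrence frequency.

```python
from itertools import combinations, product

# Hypothetical lexicon entries for the nouns of the example sentence.
LEXICON = {
    "kotori": ["bird"],
    "kago": ["cage", "basket"],
    "booru": ["bowl", "ball"],
}

# Invented target-language co-occurrence counts (illustrative only).
TARGET_COF = {
    frozenset(("basket", "ball")): 50,
    frozenset(("bird", "cage")): 30,
    frozenset(("cage", "bowl")): 5,
    frozenset(("bird", "basket")): 4,
}


def target_cof(x, y):
    """Symmetric lookup of a target-language co-occurrence count."""
    return TARGET_COF.get(frozenset((x, y)), 0)


def select_target_only(nouns, lexicon):
    """Fix translations using target-language statistics alone."""
    fixed = {}
    while len(fixed) < len(nouns):
        best = None  # (count, noun_a, word_a, noun_b, word_b)
        for a, b in combinations(nouns, 2):
            if a in fixed and b in fixed:
                continue
            # A word that is already fixed keeps its translation.
            cands_a = [fixed[a]] if a in fixed else lexicon[a]
            cands_b = [fixed[b]] if b in fixed else lexicon[b]
            for wa, wb in product(cands_a, cands_b):
                count = target_cof(wa, wb)
                if best is None or count > best[0]:
                    best = (count, a, wa, b, wb)
        if best is None:  # fewer than two nouns: take the first entry
            for n in nouns:
                fixed.setdefault(n, lexicon[n][0])
            break
        _, a, wa, b, wb = best
        fixed[a], fixed[b] = wa, wb
    return fixed


print(select_target_only(["kotori", "kago", "booru"], LEXICON))
```

With these invented counts the frequent pair 'basket' and 'ball' is fixed first, so kago and booru are both mistranslated even though 'bird' and 'cage' co-occur often.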
`
Thus, a method using only single language statistics has an obvious limitation by its nature. This fact motivated the idea to correlate independent language statistics through translation equivalent word pairings in a bilingual lexicon.
`
`3 DESCRIPTION OF THE METHOD
`
Our method is a combination of automatic disambiguation capability and an interactive user interface. The query translation is based on a bilingual lexicon, which is essentially a collection of equivalent word pairs in two languages. Because of word meaning ambiguity, the correspondence is usually many to many. Statistical information extracted from non-parallel corpora of the domain is used to choose one translation from among the possible ones. An interactive interface, a main translation window, is provided so that the user can interactively improve the translation result. The window receives input, shows the converted query and a list of possible translations, and gives a means to access resources such as electronic human-readable bilingual dictionaries.
`
Figure 2 is the configuration diagram of the method. There are three resources provided for query translation and automatic disambiguation: the bilingual lexicon, word co-occurrence statistics drawn from source language corpora, and word co-occurrence statistics drawn from target language corpora². User's keyboard input to a retrieval system is captured by the main translation window, which integrates the query translation function, the automatic disambiguation function, and the interactive disambiguation function. The input word sequence undergoes morphological analysis, and the words are automatically translated to equivalent words in the target language by consulting the bilingual lexicon. In the translation process, co-occurrence information independently gathered from corpora in the source language and the target language is combined through the word pairing information in the bilingual lexicon for translation equivalent selection. The translated query is immediately presented to the user in the same window, allowing interactive operation for improvement. When the user signals the end of interaction, the converted query is sent to the text retrieval system. A machine translation system is integrated to help the user to screen out irrelevant documents by examining rough translations of the retrieved result.

[Diagram not reproduced. Recoverable labels: Source Language Query; Automatic Translation; Interactive Ambiguity Resolution; Target Language Query; An Information Retrieval System; Cooccurrence Statistics; Dual Corpora.]

Figure 2 Block Diagram of the Method.

¹ I put a bowl filled with water in the bird cage. The subject phrase watashi-wa = 'I' is omitted in this sentence. Detailed word correspondence is shown in Figure 1.
² Here "source language" is used as an equivalent to "user's language", and "target language" to "documents' language".
`
3.1 DMAX Method for Translation Equivalent Selection
`
To obtain good quality of translation equivalent selection, we adopt a DMAX (Double MAXimize) method [DM92, DM93], in which co-occurrence frequencies among source words and those among target words are combined through the link in the bilingual lexicon. In this method, first the source language words which maximize the co-occurrence frequency of the source language are selected, then the equivalent translations of them which maximize the co-occurrence frequency of the target language are selected. The dependency relation in the source language is reflected in the translated text through this method, so it overcomes the difficulty of the method using only the frequency data of target language text. This DMAX method is also tractable because the resource of the required linguistic statistics is dual corpora of source and target languages, not bilingual corpora, i.e. the target language text does not have to be the translation of the source language text.
`
The procedure of this method is summarized as follows:

(A) Preparation of corpus statistics
1. Select domain X (X: politics/medicine/computer science/etc.)
2. Prepare the source and target language texts of domain X (the target language text need not be a translated text of the source language text)
3. Calculate co-occurrence frequency of every noun in source language text
4. Calculate co-occurrence frequency of every noun in target language text

(B) Translation equivalent selection
1. Extract nouns from the source language text to translate and the source language-target language bilingual dictionary of the nouns that appear in the text

NOTATIONS
$S_a \ (a = 1 \cdots m)$ : source language noun
$T_{ai} \ (i = 1 \cdots n_a)$ : $n_a$ target language equivalent nouns of $S_a$
$\mathrm{COF}(S_a, S_b)$ : co-occurrence frequency between source language nouns $S_a$, $S_b$
$\mathrm{COF}(T_{ai}, T_{bj})$ : co-occurrence frequency between target language nouns $T_{ai}$, $T_{bj}$

2. Select $S_p, S_q \mid \max_{p,q} \mathrm{COF}(S_p, S_q)$, where the target language equivalent noun of $S_p$ or $S_q$ has not been fixed
3. Select $T_{pr}, T_{qs} \mid \max_{r,s} \mathrm{COF}(T_{pr}, T_{qs})$
4. Fix the target language equivalent noun of $S_p$ as $T_{pr}$
5. Fix the target language equivalent noun of $S_q$ as $T_{qs}$
6. Repeat steps 2-5 until every target language equivalent noun of $S_a \ (a = 1 \cdots m)$ is fixed
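Steps 2 to 6 of part (B) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `dmax_select` and `make_cof`, the toy lexicon, and all counts are invented here. It also assumes that a noun fixed in an earlier round keeps its translation when it is paired again, which the procedure does not state explicitly.

```python
from itertools import combinations, product


def make_cof(table):
    """Build a symmetric co-occurrence lookup from {(x, y): count}."""
    counts = {frozenset(pair): n for pair, n in table.items()}
    return lambda x, y: counts.get(frozenset((x, y)), 0)


def dmax_select(nouns, lexicon, source_cof, target_cof):
    """Double-maximize selection: source pair first, then target pair."""
    fixed = {}
    while len(fixed) < len(nouns):
        # Step 2: source pairs in which at least one noun is still unfixed.
        pairs = [(p, q) for p, q in combinations(nouns, 2)
                 if p not in fixed or q not in fixed]
        if not pairs:  # a single-noun query has no pair to score
            for n in nouns:
                fixed.setdefault(n, lexicon[n][0])
            break
        p, q = max(pairs, key=lambda pq: source_cof(*pq))
        # Step 3: best target pair among the candidates of p and q
        # (assumption: an already fixed noun keeps its translation).
        cands_p = [fixed[p]] if p in fixed else lexicon[p]
        cands_q = [fixed[q]] if q in fixed else lexicon[q]
        t_p, t_q = max(product(cands_p, cands_q),
                       key=lambda rs: target_cof(*rs))
        # Steps 4 and 5: fix both translations, then repeat (step 6).
        fixed[p], fixed[q] = t_p, t_q
    return fixed


# Invented data for the example sentence (illustrative only).
LEXICON = {"kotori": ["bird"], "kago": ["cage", "basket"],
           "booru": ["bowl", "ball"]}
SOURCE_COF = make_cof({("kotori", "kago"): 40, ("kago", "booru"): 3,
                       ("kotori", "booru"): 1})
TARGET_COF = make_cof({("basket", "ball"): 50, ("bird", "cage"): 30,
                       ("cage", "bowl"): 5, ("bird", "basket"): 4})

print(dmax_select(["kotori", "kago", "booru"], LEXICON,
                  SOURCE_COF, TARGET_COF))
```

In this toy data 'basket' and 'ball' still have the highest target-language count, but they are never compared in isolation: the source statistics pick kotori and kago first, which fixes 'cage' before booru is considered.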
`
The above-mentioned example sentence Kotori-no kago-ni mizu-o ireta booru-o oita. can be translated correctly by this DMAX method. In this example, the word pair which co-occurs most frequently in Japanese may be kotori and kago. And because the pair of 'bird' and 'cage' may co-occur more frequently than the pair of 'bird' and 'basket' among the English equivalent translations of kotori and kago, kago is translated into 'cage'. Repeating the procedure, booru is translated into 'bowl' and the whole sentence is translated correctly into 'I put a bowl filled with water in the bird cage.'
`
3.2 User Interface for Interactive Disambiguation
`
Since the quality of the converted query is crucial to obtain a good retrieval result, it is a natural idea to provide a user interface so that the user can improve the translation quality through interactive operation. Since IR is usually an interactive process, interactivity in translation can be naturally integrated with the system. In addition, necessary interaction is essentially limited to choosing appropriate target words. These circumstances, different from MT cases, are favorable for introducing interactivity.
`
Our approach is to provide a layer that works as a query translator, between the user's keyboard operation and an information retrieval system. The layer captures keyboard input, recognizes word boundaries, translates each word into the target language based on the bilingual lexicon and the DMAX method, and sends the resulting string to the IR system.
The design of interactive operation is borrowed from that of the Kana-Kanji conversion method, which converts Japanese phonetic character (kana) input to an expression mixed with characters of Chinese origin (kanji), suitable for written documents. Although the Kana-Kanji conversion method offers conversion only within a language, we expanded the method into conversion functionality between different languages as an interactive machine translation system [MASA94, YDK+95].
The conversion layer is realized as a main translation window of the software, on which the input and its translation are shown simultaneously. As soon as the user enters a query to this window in the user's language, the translation process is automatically invoked and the translated query appears on it. The window offers both an easy translation equivalent selection operation as in the Kana-Kanji conversion method, and a normal editing functionality as in an editor. The translated word at the cursor position receives the focus, and it can be changed to another word simply by moving the cursor to the left or right. Equivalent translation candidates for the focused word are shown in a nearby separate window. The selection can be changed by a cursor movement to, or a mouse click on, the newly selected word. The window's normal editing operations allow the user to edit the resulting string freely. If the meaning or usage of translated words is not clear, a separate online dictionary can be accessed by a hotkey or a mouse click.
`
Figure 3 shows a snapshot from a current implementation: it is converting a Japanese query jouhou kensaku, a Japanese equivalent to "information retrieval", into an English equivalent. The input (in Kanji) is shown in the upper small window, and the corresponding translation is shown in the lower window. The cursor is placed just at the tail of the first English word, specifying the current target word. The first Japanese word is highlighted, showing it is the original word for the target word. The larger window below shows a list of possible translation equivalents for the original word jouhou, extracted from the bilingual lexicon. The first line is highlighted to show that it is the current selection. At the rightmost place of this window, useful information describing the usage or meaning of each English word is shown to help the user to choose an appropriate one. If the user feels this information is not enough, a bilingual dictionary on a CD-ROM can be accessed and the content of the specified word is shown in a separate window. When the user moves the cursor to the right, the window is replaced with one that contains translation equivalents for kensaku (retrieval). The highlighted Japanese word also changes. In addition, the user is free to edit in the window. If another Japanese word is entered, the morphological analysis and translation process automatically start and update the displayed contents. When the return key is pushed, the result is sent to the retrieval system and used as if it were an input query directly built by the user.
`
`3.3 Translating Retrieved Documents
`
A translation facility for retrieved documents is provided to help the user to choose from among retrieved results. The user can freely choose between dictionary reference functionality and machine translation functionality, according to the quality of results obtained by these functionalities.
`
[Screenshot not reproduced. Recoverable content: the translated query reads "information retrieval"; the candidate list for jouhou includes knowledge, observations, tip-off, news, and report.]
`
`Figure 3 Snapshot of Translation Window.
`
4 IMPLEMENTATION
`
The current implementation runs on Windows 95, and works as a language translation front-end to arbitrary IR systems, where the user language is Japanese, and the document language is English. The lexicon is derived from one for a machine translation system, and contains about a hundred thousand Japanese entries.
`
Since word boundaries are not explicitly marked in Japanese, determination of word boundaries is necessary before absorbing inflections. When an input query is given to the main translation window, a morphological analyzer is started. Ambiguities in word boundaries are resolved by preference rules, based on information such as word length, the number of morphemes, the number of content words, etc. The connectivity matrix between morphemes also serves to abandon inadequate morpheme sequences. An inflection table that contains word inflection rules is used to absorb inflections, instead of stemming.
`
The current system works as a front-end to an arbitrary retrieval system. The system receives Japanese, converts each word into an English equivalent, then sends the resulting English expression to the other application to which the user started keyboard input. The function of sending characters to an arbitrary application is realized using a standard protocol of the IME (Input Method Editor) of the operating system. Therefore, it is portable in the sense that it can be combined with any retrieval system.
`
`
A general-purpose MT system is also integrated for the screening purpose. The MT module accepts input from the cut buffer of the operating system. The user has only to copy sentences to the buffer to invoke the translation/dictionary consultation function. The results are displayed in a separate window.
`
`5 EXPERIMENTAL RESULTS
`
Because the system is still in an early stage of development, and it is not easy to objectively evaluate an interactive system, evaluation of the method as a whole is not available at this time. In this section, we present the result of a preliminary experiment that measured the capability of the statistical disambiguation method adopted here.
`
First, we prepared Japanese newspapers for one year and English journals published in the same year, and calculated inner-sentence co-occurrence frequencies (how frequently two words co-occur in the same sentence) between nouns that appeared in the texts. The number of sentences was about 600 thousand for Japanese, and 200 thousand for English. Nouns were recognized by longest match with entries in the bilingual lexicon.
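The inner-sentence co-occurrence count described above can be sketched as follows. This is an illustration under assumptions rather than the authors' code: the function name `cooccurrence_counts` is invented, the input is assumed to be sentences whose nouns have already been recognized, and each noun pair is counted at most once per sentence, which the text does not specify.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count, for every unordered noun pair, the sentences containing both.

    `sentences` is an iterable of noun lists, one list per sentence.
    Pairs are stored as alphabetically sorted tuples so that (a, b)
    and (b, a) share one entry.
    """
    counts = Counter()
    for nouns in sentences:
        # set() makes a repeated noun count once per sentence (assumption).
        for a, b in combinations(sorted(set(nouns)), 2):
            counts[(a, b)] += 1
    return counts


# Toy corpus with nouns already extracted (illustrative only).
corpus = [
    ["bird", "cage", "water"],
    ["bird", "cage"],
    ["ball", "basket"],
]
counts = cooccurrence_counts(corpus)
print(counts[("bird", "cage")], counts[("ball", "basket")])
```

The same function would be run once on the source language corpus and once on the target language corpus, since the two sets of statistics are gathered independently.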
`
Then we randomly chose 70 Japanese-to-English bilingual sentences from Eigo Kaiwa Hyougen Jiten (dictionary of expressions for English conversation: Obunsha, Japan) and extracted pairs of corresponding Japanese and English nouns. Each sentence contained 3 to 10 corresponding noun pairs. We compared the outputs of translation with the DMAX method and translation with the method using only co-occurrence frequency (COF) between target language nouns. The experiment was performed in both Japanese-to-English and English-to-Japanese directions using the same statistics. We evaluated the outputs of the translation on three levels, agree, correct and incorrect: agree means that the selected equivalent translation agrees strictly with the word that appeared in the original sentence, correct means that the equivalent translation does not agree with the original word but its meaning agrees with that of the original word, and incorrect means that the meaning of the equivalent translation does not agree with that of the original word.
`
The result of this experiment is shown in Figure 4. The frequencies are totals over the 70 sentences. Words which have only one equivalent translation are not included in this result, but we used such words in other words' equivalent translation selection. This result indicates that, with the DMAX method, we can select equivalent translations more accurately than with the latter method.

translation   translation                      evaluation              correct
direction     method                    agree   correct   incorrect    percentage
J⇒E           DMAX method               193     56        42           85.6%
J⇒E           only with English COF     186     49        56           80.8%
E⇒J           DMAX method               167     80        82           75.1%
E⇒J           only with Japanese COF    157     71        101          69.3%

Figure 4 Result of the translation experiment.
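The correct percentage column is not defined in the text. The reported values are reproduced when both agree and correct outcomes are counted as correct and divided by the total of all three outcomes. The short check below rests on that inferred reading; the function name `correct_percentage` is introduced here for illustration.

```python
# Rows of Figure 4: (direction, method, agree, correct, incorrect).
ROWS = [
    ("J=>E", "DMAX method", 193, 56, 42),
    ("J=>E", "only with English COF", 186, 49, 56),
    ("E=>J", "DMAX method", 167, 80, 82),
    ("E=>J", "only with Japanese COF", 157, 71, 101),
]


def correct_percentage(agree, correct, incorrect):
    """Share of acceptable selections, counting 'agree' and 'correct'."""
    total = agree + correct + incorrect
    return 100.0 * (agree + correct) / total


for direction, method, agree, correct, incorrect in ROWS:
    pct = correct_percentage(agree, correct, incorrect)
    print(f"{direction}  {method:<24} {pct:.1f}%")
```

Under this reading each direction has the same total for both methods (291 noun selections for Japanese-to-English, 329 for English-to-Japanese), as expected when both methods are evaluated on the same ambiguous words.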
`
`6 DISCUSSION
`
A major characteristic of the DMAX method used here is that it does not require parallel corpora. Statistics on word co-occurrence are obtained independently from a corpus of each language, and they are correlated by means of the bilingual lexicon. Therefore it can be easily applied to domains or language pairs for which a large quantity of parallel corpora is not available. In addition, translation from target language text to source language text can be achieved using the same statistics, because the statistics of the two texts are independent. On the other hand, the quality of the bilingual lexicon is crucial. Although building the lexicon and keeping it up-to-date is an expensive task, its cost may be justified since bilingual lexicons are a resource that can be shared with other NLP applications such as machine translation systems. For this problem, automatic thesaurus construction might be expected as a future direction.
`
The interface can be made highly independent of the retrieval system itself. The current implementation has its own resources including the bilingual lexicon, and can be used with essentially any retrieval system as long as it accepts query input from the keyboard. This modularity is important from a practical point of view. Users can start cross-language retrieval anytime they find a window for query input while browsing the internet. In this respect, the method and implementation diverge from previous work on interfaces for multilingual information retrieval (e.g. [LPS92]), in which the interface and operations are determined by the organization of the controlled vocabulary lexicon for the IR system.

Another important characteristic of the interactive method is that the user can freely choose the level of interaction. Since it is always optional, the user may completely omit the interaction (except for sending in the result). On the contrary, if the user uses interaction intensively, it works as a support tool for interactive query translation. In this way the system and the user work cooperatively, and interactive operations are utilized essentially to improve the automatic translation result.
`
As for the usefulness of MT in information retrieval, we believe currently available machine translation systems are useful when integrated as a tool for screening purposes [OD96]. Although current machine translation systems have a limited translation capability in general, they can give a rough translation for a large amount of documents in a short time, at small cost. Since this kind of support is difficult to obtain by any other means, machine translation systems can play an important role in a step to screen out irrelevant results. The recent increase in sales of English to Japanese MT systems in Japan for internet browsing might give indirect evidence for a common recognition that MT systems can be useful for information gathering and screening purposes³.
`
We have to mention one obvious shortcoming in the method and current implementation, which comes from the fact that it is a direct offspring of machine translation technology. In machine translation, the translation equivalent must always be uniquely determined. The statistics-based word selection method and the user interface adopted here reflect this fact. However, in CLIR, there is no need to uniquely determine the translation; rather, appropriate expansion by synonyms is crucial for good results. We are working to expand the method in this direction.
`
`7 CONCLUSION
`
We presented a lexicon-based query translation method in which a statistical word selection method is combined with an interactive user interface for lexicon consultation. The statistical method, the DMAX method, can be applied without a parallel corpus, and is effective even for a simple keyword sequence. An experiment showed its superiority to a simple statistical method using only target language statistics. The interactive interface can help the user to improve the query translation in proportion to the degree he/she participates in the process. An automatic translation facility is integrated for screening purposes. We described its implementation as a language translation front-end to an arbitrary IR system, which is a useful form for cross-language information retrieval on the internet.
³ According to a survey [NBP96], more than 100,000 copies of (English to Japanese) MT software were sold in Japan during fiscal year 1995, excluding pre-installed or bundled software, which amounts to almost 10 times more.
`