`INFORMATION RETRIEVAL
`
`edited by
`
`Gregory Grefenstette
`Xerox Research Centre Europe
`Grenoble, France
`
`KLUWER ACADEMIC PUBLISHERS
Boston / Dordrecht / London
`
`AOL Ex. 1021
`Page 1 of 14
`
`
`
Distributors for North America:
`Kluwer Academic Publishers
`101 Philip Drive
`Assinippi Park
`Norwell, Massachusetts 02061 USA
`
Distributors for all other countries:
`Kluwer Academic Publishers Group
`Distribution Centre
`Post Office Box 322
`3300 AH Dordrecht, THE NETHERLANDS
`
Library of Congress Cataloging-in-Publication Data
`
`A C.I.P. Catalogue record for this book is available
`from the Library of Congress.
`
`The publisher offers discounts on this book when ordered in bulk quantities. For
`more information contact:
`Sales Department, Kluwer Academic Publishers,
`101 Philip Drive, Assinippi Park, Norwell, MA 02061
`
`Copyright © 1998 by Kluwer Academic Publishers
`
All rights reserved. No part of this publication may be reproduced, stored in a
retrieval system or transmitted in any form or by any means, mechanical,
photo-copying, recording, or otherwise, without the prior written permission of
the publisher, Kluwer Academic Publishers, 101 Philip Drive, Assinippi Park,
Norwell, Massachusetts 02061
`
`Printed on acid-free paper.
`
`Printed in the United States of America
`
`
`
`
CHAPTER 8

A LANGUAGE CONVERSION FRONT-END FOR CROSS-LANGUAGE INFORMATION RETRIEVAL

YAMABANA Kiyoshi, MURAKI Kazunori, DOI Shinichi, KAMEI Shin-ichiro
`
`Information Technology Research Laboratories
`NEC Corporation
`Miyazaki 4-1-1, Miyamae-ku, Kawasaki 216, Japan
`
`ABSTRACT
`
We present a new disambiguation method for query translation in cross-language
information retrieval. The method is an application of ambiguity resolution
methods developed for machine translation systems, and is a combination of a
statistics-based word selection method, where statistical information is
extracted from non-parallel corpora, and an interactive user interface to
improve the translation quality. We describe its implementation as a front-end
to IR systems.
`
1 INTRODUCTION
With the rapid expansion of the Internet and the increasing amount of online
documents available in foreign languages, cross-language information retrieval
(CLIR) is attracting more focus and interest than ever. In CLIR, users are
allowed to build a query in their native language to search documents written
in another language, and to utilize the retrieved results effectively.
`
A straightforward approach to this problem would be to apply machine
translation (MT), translating either the query or the documents to reduce the
task to monolingual retrieval. However, the usefulness of an MT system for this
purpose is not obvious, because translation quality becomes most important in
this case. Generally speaking, high-quality automatic translation can be
obtained only when the applicable domain is limited and sufficient domain
knowledge is incorporated. In addition, its disambiguation capability might not
be sufficient in a query translation approach, since queries tend to be short,
sometimes even without internal structure. Many of the disambiguation processes
in MT that depend on syntactic analysis will be ineffective. A document
translation approach may partly avoid this problem, but it brings other
shortcomings, such as the increased cost inherent in duplicating databases.
Since a large lexicon with a sufficient number of multi-word terms, and proper
disambiguation in translation, are two important factors for obtaining a good
result in lexicon-based query translation [HG96a], it is worthwhile to pursue a
disambiguation method that does not depend on syntactic analysis.
`
In this chapter, we present a lexicon-based query translation method in which
statistical word selection [DM92, DM93] is combined with an interactive user
interface [MASA94, YDK+95] for disambiguation, both originally developed for
machine translation. In this method, a translation equivalent is determined
from among the possible candidates in the bilingual lexicon, using a
statistics-based method, which we call the DMAX method, in which co-occurrence
frequencies from non-parallel corpora and information from a bilingual lexicon
are combined. Then, an interactive user interface helps the user to change the
selection, edit words, and obtain an explanation of translated words in the
user's language, at the user's will.
`
Section 2 discusses problems in translation equivalent selection, and
shortcomings of a simple statistics-based method using word co-occurrence
information, motivating our method. Section 3 describes the details of our
method. Section 4 describes an implementation as a language conversion
front-end to IR systems. Section 5 shows an experimental result. Section 6 is
for discussion, and Section 7 concludes this chapter.
`
`2 PROBLEMS IN TRANSLATION
`EQUIVALENT SELECTION
`
Ambiguity resolution in translation, or word sense selection, has been one of
the largest problems not only in query translation but also in general machine
translation systems. These systems, which are in practical use nowadays, have
mainly adopted rule-based disambiguation methods that utilize linguistic
restrictions between modifiers and modifyees, described in a lexicon and
grammar, to select a suitable translation equivalent. Although these methods
are used effectively in machine translation systems, they have many problems:
difficulty in describing restrictions on all dependencies in advance, inability
to select suitable translations if the input expression meets two or more
contradictory restrictions or meets no restrictions, etc. These problems are
especially conspicuous in query translation, since queries are usually segments
of sentences or phrases, or even a mere sequence of words without an internal
syntactic structure. As a result, in query translation, current machine
translation systems can work only as a lexicon consultation tool without enough
disambiguation capability.
`
In order to overcome these difficulties of rule-based systems, translation
methods utilizing corpus-based knowledge, such as translation examples (pairs
of a source text and its translation) and statistical or probabilistic
information extracted from large corpora, have been proposed in recent years.
Among them, methods based on statistical information such as (co-)occurrence
frequency are effective for query translation because they can be applied
independently of structural analysis.
`
In statistics-based word selection methods, suitable translation equivalents
are selected using statistical or probabilistic information extracted from
language texts [BDDM93, Nom91, Yar92]. These methods are applicable to
converting queries without syntactic analysis, while the statistical
information reflects the context in which each word occurs and can imply the
logical restrictions based on indirect structural dependencies. Moreover,
because a lot of machine-readable texts have already been collected, it is not
difficult to extract statistical information for expressions in the texts
(semi-)automatically. However, a method based on statistics extracted from
bilingual corpora cannot fully benefit from this merit, since it is difficult
and expensive to collect sufficiently large bilingual corpora, divide the
sentences of such corpora into fragments, and align them automatically,
especially for languages whose linguistic structures are not similar, such as
English and Japanese.
`
The simplest statistics-based translation method to utilize word co-occurrence
information is to select the target word pair with the highest co-occurrence
frequency. It is calculated in advance how frequently each expression occurs,
or each pair of expressions co-occurs, in the target language text, and the
most frequent translation is selected using this calculated data. This method
is simple and easy to apply. However, it can suffer from a highly frequent
co-occurrence between target words that are equivalent translations of the
source words but are not suitable in context.
`
`
`
`
Kotori-no   kago-ni         mizu-o             ireta    booru-o            oita.
bird of     cage/basket     water              filled   bowl/ball          put
            dative marker   objective marker            objective marker

"I put a bowl filled with water in the bird cage."

Figure 1 Word-to-word English correspondence for a sample Japanese sentence.
`
Suppose that the Japanese words kotori, kago, booru appear in the input
sentence: "Kotori-no kago-ni mizu-o ireta booru-o oita"¹. The Japanese word
kago has two equivalent English translations, 'cage' and 'basket', and booru
has two translations, 'ball' and 'bowl'. Because the pair of 'basket' and
'ball' may co-occur most frequently, the method using only the statistics of
the target language might translate the sentence into 'I put a ball filled with
water in the bird basket.'
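
The failure mode just described can be illustrated with a minimal sketch. The co-occurrence counts below are invented for illustration and are not taken from the chapter's corpora; the function name `target_only_select` is likewise hypothetical.

```python
from itertools import product

# Hypothetical English co-occurrence counts, invented for illustration only.
TARGET_COF = {
    frozenset(("basket", "ball")): 60,
    frozenset(("basket", "bowl")): 7,
    frozenset(("cage", "bowl")): 4,
    frozenset(("cage", "ball")): 2,
}


def target_only_select(candidates_a, candidates_b, target_cof):
    """Pick the candidate pair with the highest target-language co-occurrence."""
    return max(
        product(candidates_a, candidates_b),
        key=lambda pair: target_cof.get(frozenset(pair), 0),
    )


# kago -> cage / basket, booru -> ball / bowl (candidates from the example).
choice = target_only_select(["cage", "basket"], ["ball", "bowl"], TARGET_COF)
print(choice)  # ('basket', 'ball') with the counts assumed above
```

With these assumed counts the frequent but contextually wrong pair wins, because nothing in the target-language statistics ties the choice back to the source sentence.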
`
Thus, a method using only single language statistics has an obvious limitation
by its nature. This fact motivated the idea to correlate independent language
statistics through translation equivalent word pairings in a bilingual lexicon.
`
`3 DESCRIPTION OF THE METHOD
`
Our method is a combination of automatic disambiguation capability and an
interactive user interface. The query translation is based on a bilingual
lexicon, which is essentially a collection of equivalent word pairs in two
languages. Because of word meaning ambiguity, the correspondence is usually
many to many. Statistical information extracted from non-parallel corpora of
the domain is used to choose one translation from among the possible ones. An
interactive interface, a main translation window, is provided so that the user
can interactively improve the translation result. The window receives input,
shows the converted query and a list of possible translations, and gives a
means to access resources such as electronic human-readable bilingual
dictionaries.
`
Figure 2 is the configuration diagram of the method. There are three resources
provided for query translation and automatic disambiguation: the bilingual
lexicon, word co-occurrence statistics drawn from source language corpora, and
word co-occurrence statistics drawn from target language corpora². The user's
keyboard input to a retrieval system is captured by the main translation
window, which integrates the query translation function, the automatic
disambiguation function, and the interactive disambiguation function. The input
word sequence undergoes morphological analysis, and the words are automatically
translated to equivalent words in the target language by consulting the
bilingual lexicon. In the translation process, co-occurrence information
independently gathered from corpora in the source language and the target
language is combined through the word pairing information in the bilingual
lexicon for translation equivalent selection. The translated query is
immediately presented to the user in the same window, allowing interactive
operation for improvement. When the user signals the end of interaction, the
converted query is sent to the text retrieval system. A machine translation
system is integrated to help the user to screen out irrelevant documents by
examining rough translations of the retrieved results.

[Figure 2: block diagram. Legible labels: Source Language Query, Automatic
Translation, Interactive Ambiguity Resolution, Target Language Query, An
Information Retrieval System, Cooccurrence Statistics, Dual Corpora.]

Figure 2 Block Diagram of the Method.

¹ I put a bowl filled with water in the bird cage. The subject phrase
watashi-wa = 'I' is omitted in this sentence. Detailed word correspondence is
shown in Figure 1.
² Here "source language" is used as an equivalent to "user's language", and
"target language" to "documents' language".
`
3.1 DMAX Method for Translation Equivalent Selection
`
To improve the quality of translation equivalent selection, we adopt a DMAX
(Double MAXimize) method [DM92, DM93], in which co-occurrence frequencies among
source words and those among target words are combined through the link in the
bilingual lexicon. In this method, first the source language words which
maximize the co-occurrence frequency of the source language are selected, then
the equivalent translations of them which maximize the co-occurrence frequency
of the target language are selected. The dependency relation in the source
language is reflected in the translated text through this method, so it
overcomes the difficulty of the method using only the frequency data of the
target language text. The DMAX method is also tractable because the resource
for the required linguistic statistics is dual corpora of the source and target
languages, not bilingual corpora, i.e. the target language text does not have
to be the translation of the source language text.
`
The procedure of this method is summarized as follows:

(A) Preparation of corpus statistics
1. Select domain X (X: politics / medicine / computer science / etc.)
2. Prepare the source and target language texts of domain X (the target
   language text need not be a translation of the source language text)
3. Calculate the co-occurrence frequency of every noun in the source language
   text
4. Calculate the co-occurrence frequency of every noun in the target language
   text

(B) Translation equivalent selection
1. Extract nouns from the source language text to translate, and the source
   language-target language bilingual dictionary of the nouns that appear in
   the text

   NOTATIONS
   $S_a\ (a = 1 \cdots m)$       : source language noun
   $T_{ai}\ (i = 1 \cdots n_a)$  : $n_a$ target language equivalent nouns of $S_a$
   $COF(S_a, S_b)$               : co-occurrence frequency between source language nouns $S_a$, $S_b$
   $COF(T_{ai}, T_{bj})$         : co-occurrence frequency between target language nouns $T_{ai}$, $T_{bj}$

2. Select $S_p, S_q \mid \max_{p,q} COF(S_p, S_q)$, where the target language
   equivalent noun of $S_p$ or $S_q$ has not been fixed
3. Select $T_{pr}, T_{qs} \mid \max_{r,s} COF(T_{pr}, T_{qs})$
4. Fix the target language equivalent noun of $S_p$ as $T_{pr}$
5. Fix the target language equivalent noun of $S_q$ as $T_{qs}$
6. Repeat steps 2-5 until every target language equivalent noun of
   $S_a\ (a = 1 \cdots m)$ is fixed
`
The above-mentioned example sentence "Kotori-no kago-ni mizu-o ireta booru-o
oita." can be translated correctly by this DMAX method. In this example, the
word pair which co-occurs most frequently in Japanese may be kotori and kago.
And because the pair of 'bird' and 'cage' may co-occur more frequently than the
pair of 'bird' and 'basket' among the English equivalent translations of kotori
and kago, kago is translated into 'cage'. Repeating the procedure, booru is
translated into 'bowl' and the whole sentence is translated correctly into 'I
put a bowl filled with water in the bird cage.'
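
The selection steps (B) 2-6 and the example above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the data layout, and all counts are hypothetical. The procedure does not say how step 3 treats a noun that was fixed in an earlier round, nor what to do with a single-noun query, so both behaviours are labelled as assumptions in the comments.

```python
from itertools import combinations, product


def cof(table, a, b):
    """Look up a symmetric co-occurrence count; unseen pairs count as 0."""
    return table.get(frozenset((a, b)), 0)


def dmax_select(source_nouns, lexicon, source_cof, target_cof):
    """Return {source noun: chosen target noun} following steps 2-6."""
    fixed = {}
    while len(fixed) < len(source_nouns):
        # Step 2: source pair with maximal COF, at least one side still open.
        open_pairs = [
            (p, q) for p, q in combinations(source_nouns, 2)
            if p not in fixed or q not in fixed
        ]
        if not open_pairs:
            # Assumption: a lone noun has no pair, so take its first candidate.
            for s in source_nouns:
                fixed.setdefault(s, lexicon[s][0])
            break
        p, q = max(open_pairs, key=lambda pair: cof(source_cof, *pair))
        # Step 3: target pair with maximal COF. Assumption: a noun fixed in an
        # earlier round keeps its translation and only the other side varies.
        cands_p = [fixed[p]] if p in fixed else lexicon[p]
        cands_q = [fixed[q]] if q in fixed else lexicon[q]
        t_p, t_q = max(product(cands_p, cands_q),
                       key=lambda pair: cof(target_cof, *pair))
        # Steps 4-5: fix both translations.
        fixed[p], fixed[q] = t_p, t_q
    return fixed


# Hypothetical lexicon entries and counts for the kotori/kago/booru example.
lexicon = {"kotori": ["bird"], "kago": ["cage", "basket"],
           "mizu": ["water"], "booru": ["ball", "bowl"]}
source_cof = {frozenset(("kotori", "kago")): 50,
              frozenset(("mizu", "booru")): 30,
              frozenset(("kago", "booru")): 10}
target_cof = {frozenset(("bird", "cage")): 40,
              frozenset(("bird", "basket")): 5,
              frozenset(("water", "bowl")): 25,
              frozenset(("water", "ball")): 3,
              frozenset(("basket", "ball")): 60}

result = dmax_select(["kotori", "kago", "mizu", "booru"],
                     lexicon, source_cof, target_cof)
print(result)
```

With these assumed counts, the frequent English pair 'basket'/'ball' is never considered, because the source-language statistics pair kago with kotori and booru with mizu before any English candidates are compared.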
`
3.2 User Interface for Interactive Disambiguation
`
Since the quality of the converted query is crucial to obtain a good retrieval
result, it is a natural idea to provide a user interface so that the user can
improve the translation quality through interactive operation. Since IR is
usually an interactive process, interactivity in translation can be naturally
integrated with the system. In addition, necessary interaction is essentially
limited to choosing appropriate target words. These circumstances, different
from MT cases, are favorable for introducing interactivity.
`
Our approach is to provide a layer that works as a query translator, between
the user's keyboard operation and an information retrieval system. The layer
captures keyboard input, recognizes word boundaries, translates each word into
the target language based on the bilingual lexicon and the DMAX method, and
sends the resulting string to the IR system.
`
The design of the interactive operation is borrowed from that of the Kana-Kanji
conversion method, which converts Japanese phonetic character (kana) input to
an expression mixed with characters of Chinese origin (kanji), suitable for
written documents. Although the Kana-Kanji conversion method offers conversion
only within a language, we expanded the method into a conversion functionality
between different languages as an interactive machine translation system
[MASA94, YDK+95].
`
The conversion layer is realized as a main translation window of the software,
on which the input and its translation are shown simultaneously. As soon as the
user enters a query to this window in the user's language, the translation
process is automatically invoked and the translated query appears on it. The
window offers both an easy translation equivalent selection operation, as in
the Kana-Kanji conversion method, and a normal editing functionality, as in an
editor. The translated word at the cursor position receives the focus, and it
can be changed to another word simply by moving the cursor to the left or
right. Equivalent translation candidates for the focused word are shown in a
nearby separate window. The selection can be changed by a cursor movement to,
or a mouse click on, the newly selected word. The window's normal editing
operations allow the user to edit the resulting string freely. If the meaning
or usage of translated words is not clear, a separate online dictionary can be
accessed by a hotkey or a mouse click.
`
Figure 3 shows a snapshot from a current implementation: it is converting a
Japanese query jouhou kensaku, a Japanese equivalent to "information
retrieval", into an English equivalent. The input (in Kanji) is shown in the
upper small window, and the corresponding translation is shown in the lower
window. The cursor is placed just at the tail of the first English word,
specifying the current target word. The first Japanese word is highlighted,
showing it is the original word for the target word. The larger window below
shows a list of possible translation equivalents for the original word jouhou,
extracted from the bilingual lexicon. The first line is highlighted to show
that it is the current selection. At the rightmost place of this window, useful
information describing the usage or meaning of each English word is shown to
help the user to choose an appropriate one. If the user feels this information
is not enough, a bilingual dictionary on a CD-ROM can be accessed and the
content for the specified word is shown in a separate window. When the user
moves the cursor to the right, the window is replaced with one that contains
translation equivalents for kensaku (retrieval). The highlighted Japanese word
also changes. In addition, the user is free to edit in the window. If another
Japanese word is entered, morphological analysis and the translation process
automatically start and update the displayed contents. When the return key is
pushed, the result is sent to the retrieval system and used as if it were an
input query directly built by the user.
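
The state behind such a window can be modelled with a small sketch. The class and method names below are hypothetical and only illustrate the behaviour described above (a focused word, a candidate list, and a changeable selection); they are not the interface of the actual implementation. The candidate list follows the English words legible in Figure 3.

```python
from dataclasses import dataclass, field


@dataclass
class TranslatedWord:
    source: str
    candidates: list      # translation equivalents from the bilingual lexicon
    selected: int = 0     # index of the current choice

    @property
    def target(self):
        return self.candidates[self.selected]


@dataclass
class QueryBuffer:
    words: list = field(default_factory=list)
    focus: int = 0        # index of the word under the cursor

    def move_focus(self, step):
        """Move the focus left (negative step) or right, staying in range."""
        self.focus = max(0, min(len(self.words) - 1, self.focus + step))

    def choose(self, candidate_index):
        """Change the selected equivalent of the focused word."""
        self.words[self.focus].selected = candidate_index

    def render(self):
        """Return the string that would be sent to the retrieval system."""
        return " ".join(w.target for w in self.words)


buffer = QueryBuffer([
    TranslatedWord("jouhou", ["information", "knowledge", "observations",
                              "tip-off", "news", "report"]),
    TranslatedWord("kensaku", ["retrieval"]),
])
before = buffer.render()   # automatic choice: first candidate of each word
buffer.choose(4)           # the user picks "news" for the focused word
after = buffer.render()
```

Keeping the automatic choice as the default selection means the user can send the query unchanged, which matches the optional nature of the interaction.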
`
`3.3 Translating Retrieved Documents
`
A translation facility for retrieved documents is provided to help the user to
choose from among the retrieved results. The user can freely choose between the
dictionary reference functionality and the machine translation functionality,
according to the quality of the results obtained by these functionalities.
`
`
[Figure 3: screenshot of the translation window. The translated query reads
"Information retrieval". Legible entries in the candidate list: knowledge,
observations, tip-off, news, report. The Japanese annotations are not
recoverable.]

Figure 3 Snapshot of Translation Window.
`
4 IMPLEMENTATION
`
The current implementation runs on Windows 95, and works as a language
translation front-end to arbitrary IR systems, where the user language is
Japanese and the document language is English. The lexicon is derived from one
for a machine translation system, and contains about a hundred thousand
Japanese entries.
`
Since word boundaries are not explicitly marked in Japanese, determination of
word boundaries is necessary before absorbing inflections. When an input query
is given to the main translation window, a morphological analyzer is started.
Ambiguities in word boundaries are resolved by preference rules, based on
information such as word length, the number of morphemes, the number of content
words, etc. The connectivity matrix between morphemes also serves to discard
inadequate morpheme sequences. An inflection table that contains word
inflection rules is used to absorb inflections, instead of stemming.
`
The current system works as a front-end to an arbitrary retrieval system. The
system receives Japanese, converts each word into an English equivalent, then
sends the resulting English expression to the application to which the user
started keyboard input. The function of sending characters to an arbitrary
application is realized using a standard protocol of the IME (Input Method
Editor) of the operating system. Therefore, it is portable in the sense that it
can be combined with any retrieval system.
`
`
A general-purpose MT system is also integrated for the screening purpose. The
MT module accepts input from the cut buffer of the operating system. The user
has only to copy sentences to the buffer to invoke the translation/dictionary
consultation function. The results are displayed in a separate window.
`
`5 EXPERIMENTAL RESULTS
`
Because the system is still in an early stage of development, and it is not
easy to objectively evaluate an interactive system, an evaluation of the method
as a whole is not available at this time. In this section, we present the
result of a preliminary experiment that measured the capability of the
statistical disambiguation method adopted here.
`
First, we prepared Japanese newspapers for one year and English journals
published in the same year, and calculated inner-sentence co-occurrence
frequencies (how frequently two words co-occur in the same sentence) between
nouns that appeared in the texts. The number of sentences was about 600
thousand for Japanese, and 200 thousand for English. Nouns were recognized by
longest match with entries in the bilingual lexicon.
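
The two preparation steps described here, longest-match noun recognition and inner-sentence co-occurrence counting, can be sketched as below. This is a simplified illustration under stated assumptions: the function names are hypothetical, the text is treated as an unsegmented romanized string, and the greedy left-to-right matching is one common reading of "longest match", not necessarily the procedure used in the experiment.

```python
from collections import Counter
from itertools import combinations


def longest_match_nouns(text, lexicon_entries):
    """Greedy left-to-right longest match of lexicon entries in raw text."""
    nouns, i = [], 0
    max_len = max(map(len, lexicon_entries), default=0)
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + n] in lexicon_entries:
                nouns.append(text[i:i + n])
                i += n
                break
        else:
            i += 1  # no entry starts here; skip one character
    return nouns


def sentence_cooccurrence(sentences):
    """Count, for each unordered noun pair, the sentences containing both."""
    counts = Counter()
    for nouns in sentences:
        for a, b in combinations(sorted(set(nouns)), 2):
            counts[frozenset((a, b))] += 1
    return counts


# Toy data, invented for illustration.
nouns = longest_match_nouns("jouhoukensaku", {"jou", "jouhou", "kensaku"})
counts = sentence_cooccurrence([
    ["bird", "cage", "water"],
    ["bird", "cage"],
    ["ball", "basket"],
])
```

Using a set per sentence counts each pair at most once per sentence, which matches the "same sentence" definition given above.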
`
Then we randomly chose 70 Japanese-to-English bilingual sentences from Eigo
Kaiwa Hyougen Jiten (dictionary of expressions for English conversation;
Obunsha, Japan) and extracted pairs of corresponding Japanese and English
nouns. Each sentence contained 3 to 10 corresponding noun pairs. We compared
the outputs of translation with the DMAX method and translation with the method
using only the co-occurrence frequency (COF) between target language nouns. The
experiment was performed in both Japanese-to-English and English-to-Japanese
directions using the same statistics. We evaluated the outputs of the
translation on three levels, agree, correct and incorrect: agree means that the
selected equivalent translation agrees strictly with the word that appeared in
the original sentence, correct means that the equivalent translation does not
agree with the original word but its meaning agrees with that of the original
word, and incorrect means that the meaning of the equivalent translation does
not agree with that of the original word.
`
The result of this experiment is shown in Figure 4. The frequencies are totals
over the 70 sentences. Words which have only one equivalent translation are not
included in this result, but we used such words in the equivalent translation
selection of other words. This result indicates that, with the DMAX method, we
can select equivalent translations more accurately than with the method using
only the target language COF.

translation   translation               evaluation                     correct
direction     method                    agree   correct   incorrect    percentage
J⇒E           DMAX method               193     56        42           85.6%
J⇒E           only with English COF     186     49        56           80.8%
E⇒J           DMAX method               167     80        82           75.1%
E⇒J           only with Japanese COF    157     71        101          69.3%

Figure 4 Result of the translation experiment.
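
The "correct percentage" column of Figure 4 appears to be the share of agree plus correct judgements among all judged words. The chapter does not state the formula, so this is an inference; the short check below reproduces the reported figures from the counts in the table.

```python
# Counts from Figure 4: (direction, method, agree, correct, incorrect).
rows = [
    ("J=>E", "DMAX method", 193, 56, 42),
    ("J=>E", "only with English COF", 186, 49, 56),
    ("E=>J", "DMAX method", 167, 80, 82),
    ("E=>J", "only with Japanese COF", 157, 71, 101),
]


def correct_percentage(agree, correct, incorrect):
    """Share of acceptable selections (agree + correct), in percent."""
    total = agree + correct + incorrect
    return round(100 * (agree + correct) / total, 1)


percentages = [correct_percentage(a, c, i) for _, _, a, c, i in rows]
for (direction, method, *_), pct in zip(rows, percentages):
    print(f"{direction}  {method:<24} {pct}%")
```

The totals also show that both methods were judged on the same words within each direction: 291 for Japanese-to-English and 329 for English-to-Japanese.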
`
`6 DISCUSSION
`
A major characteristic of the DMAX method used here is that it does not require
parallel corpora. Statistics on word co-occurrence are obtained independently
from a corpus of each language, and they are correlated by means of the
bilingual lexicon. Therefore it can be easily applied to domains or language
pairs for which a large quantity of parallel corpora is not available. In
addition, translation from target language text to source language text can be
achieved using the same statistics, because the statistics of the two texts are
independent. On the other hand, the quality of the bilingual lexicon is
crucial. Although building the lexicon and keeping it up-to-date is an
expensive task, its cost may be justified since bilingual lexicons are a
resource that can be shared with other NLP applications such as machine
translation systems. For this problem, automatic thesaurus construction might
be expected as a future direction.
`
The interface can be made highly independent of the retrieval system itself.
The current implementation has its own resources, including the bilingual
lexicon, and can be used with essentially any retrieval system as long as it
accepts query input from the keyboard. This modularity is important from a
practical point of view. Users can start cross-language retrieval anytime they
find a window for query input while browsing the internet. In this respect, the
method and implementation diverge from previous work on interfaces for
multilingual information retrieval (e.g. [LPS92]), in which the interface and
operations are determined by the organization of the controlled vocabulary
lexicon for the IR system.
Another important characteristic of the interactive method is that the user can
freely choose the level of interaction. Since interaction is always optional,
the user may completely omit it (except for sending in the result). On the
contrary, if the user uses interaction intensively, the system works as a
support tool for interactive query translation. In this way the system and the
user work cooperatively, and interactive operations are utilized essentially to
improve the automatic translation result.
`
As for the usefulness of MT in information retrieval, we believe currently
available machine translation systems are useful when integrated as a tool for
screening purposes [OD96]. Although current machine translation systems have a
limited translation capability in general, they can give a rough translation
for a large amount of documents in a short time, at small cost. Since this kind
of support is difficult to obtain by any other means, machine translation
systems can play an important role in a step to screen out irrelevant results.
The recent increase in sales of English to Japanese MT systems in Japan for
internet browsing might give side evidence for a common recognition that MT
systems can be useful for information gathering and screening purposes³.
`
We have to mention one obvious shortcoming in the method and the current
implementation, which comes from the fact that it is a direct offspring of
machine translation technology. In machine translation, a translation
equivalent must always be uniquely determined. The statistics-based word
selection method and the user interface adopted here reflect this fact.
However, in CLIR, there is no need to uniquely determine the translation;
rather, appropriate expansion by synonyms is crucial for good results. We are
working to expand the method in this direction.
`
`7 CONCLUSION
`
We presented a lexicon-based query translation method in which a statistical
word selection method is combined with an interactive user interface for
lexicon consultation. The statistical method, the DMAX method, can be applied
without a parallel corpus, and is effective even for a simple keyword sequence.
An experiment showed its superiority to a simple statistical method using only
target language statistics. The interactive interface can help the user to
improve the query translation in proportion to the degree to which he/she
participates in the process. An automatic translation facility is integrated
for screening purposes. We described its implementation as a language
translation front-end to arbitrary IR systems, which is a useful form for
cross-language information retrieval on the internet.
³ According to a survey [NBP96], more than 100,000 copies of (English to
Japanese) MT software were sold in Japan during fiscal year 1995, excluding
pre-installed or bundled software, which amounts to almost 10 times more.
`
`
`