MURAX: A Robust Linguistic Approach For Question Answering Using An On-Line Encyclopedia

Julian Kupiec
Xerox Palo Alto Research Center
3333 Coyote Hill Road, Palo Alto, CA 94304
Abstract

Robust linguistic methods are applied to the task of answering closed-class questions using a corpus of natural language. The methods are illustrated in a broad domain: answering general-knowledge questions using an on-line encyclopedia.

A closed-class question is a question stated in natural language, which assumes some definite answer typified by a noun phrase rather than a procedural answer. The methods hypothesize noun phrases that are likely to be the answer, and present the user with relevant text in which they are marked, focussing the user's attention appropriately. Furthermore, the sentences of matching text that are shown to the user are selected to confirm phrase relations implied by the question, rather than being selected solely on the basis of word frequency.

The corpus is accessed via an information retrieval (IR) system that supports boolean search with proximity constraints. Queries are automatically constructed from the phrasal content of the question, and passed to the IR system to find relevant text. Then the relevant text is itself analyzed; noun phrase hypotheses are extracted and new queries are independently made to confirm phrase relations for the various hypotheses.

The methods are currently being implemented in a system called MURAX and although this process is not complete, it is sufficiently advanced for an interim evaluation to be presented.

1 Introduction

The paper is organized as follows. First the motivation for the question-answering task is given, with a description of the kind of questions that are its concern and their characteristics. A description of the system components is given in Section 3. These include the encyclopedia and the IR system for accessing it. Shallow linguistic analysis is done using a part-of-speech tagger and finite-state recognizers for matching lexico-syntactic patterns.

Section 4 describes the analysis of a question by considering an example, and the system output is illustrated. Analysis proceeds in two stages. The first, primary query construction, finds articles that are relevant to the question. The second stage (called answer extraction) analyzes these articles to find noun phrases (called answer hypotheses) that are likely to be the answer. Both stages require searching the encyclopedia. Queries made during the first stage are called primary queries, and only involve phrases from the question. The second stage creates secondary queries which are generated by MURAX to verify specific phrase relations. Secondary queries involve both answer hypotheses and phrases from the question. Primary query construction is explained in Section 5, followed by a complete description of answer extraction in Section 6. An informal evaluation and discussion are then presented.

2 Task Selection
The task is concerned with answering general-knowledge questions using Grolier's on-line encyclopedia. The task is motivated by several criteria and goals. Robust analysis is needed because the encyclopedia is composed of a significant quantity of unrestricted text. General knowledge is a broad domain, which means that it is impractical to manually provide detailed lexical or semantic information for the words of the vocabulary (the encyclopedia contains over 100,000 word stems). The methods demonstrate that shallow syntactic analysis can be used to practical advantage in broad domains, where the types of relations and objects involved are not known in advance, and may differ for each new question. The analysis must capitalize on the information available in a question, and profit from treating the encyclopedia as a lexical resource.

The methods also demonstrate that natural language analysis can add to the quality of the retrieval process, providing text to the user which confirms phrase relations and not just word matches. The task also serves as a practical focus for the development of linguistic tools for content analysis and reveals what kind of grammar development should be done to improve performance.

The use of closed-class questions means that performance can be evaluated in a straightforward way by using a set of questions and correct answers. Given a correct noun phrase answer, it is generally easy to judge whether a noun phrase hypothesized by the system is correct or not. Thus relevance judgments are simplified, and if one correct hypothesis is considered as good as any other, recall measurements are not required and performance can be considered simply as the percentage of correctly hypothesized answers.
1. What U.S. city is at the junction of the Allegheny and Monongahela rivers?

2. Who wrote "Across the River and into the Trees"?

3. Who married actress Nancy Davis?

4. What's the capital of the Netherlands?

5. Who was the last of the Apache warrior chiefs?

6. What chief justice headed the commission that declared: "Lee Harvey Oswald ... acted alone."?

7. What famed falls are split in two by Goat Island?

8. What is November's birthstone?

9. Who's won the most Oscars for costume design?

10. What is the state flower of Alaska?

Figure 1: Example Questions
2.1 Question Characteristics

A closed-class question is a direct question whose answer is assumed to lie in a set of objects and is expressible as a noun phrase. Such questions are exemplified in Figure 1. These questions appear in the general-knowledge "Trivial Pursuit"[1] game and typify the form of question that is the concern of the task. They have the virtue of being created independently of the retrieval task (i.e. they are unbiased) and have a consistent and simple stylized form; yet they are flexible in their expressive power.

The interrogative words that introduce a question are an important source of information. They indicate particular expectations about the answer and some of these are illustrated in Table 1. Notable omissions are the words why and how, which expect a procedural answer rather than a noun phrase[2] (e.g. "How do you make a loaf of bread?").

Who/Whose:    Person
What/Which:   Person, Thing, Location
Where:        Location
When:         Time
How Many:     Number

Table 1: Question Words and Expectations

These expectations can be used to filter various answer hypotheses. The answers to questions beginning with the word "who" are likely to be people's names. This fact can be used to advantage because various heuristics can be applied to verify whether a noun phrase is a person's name.

A question introduced by "what" may or may not refer to a person; however, other characteristics can be exploited. Consider the following sentence fragments, where NP symbolizes a noun phrase: "What is the NP ..." and "What NP ...". The noun phrase at the start of such questions is called the question's type phrase and it indicates what type of thing the answer is. The encyclopedia can be searched to try to find evidence that an answer hypothesis is an instance of the type phrase (details are in Section 6.1.1). The verbs in a question are also a useful source of information as they express a relation that exists between the answer and other phrases in the question.

The answer hypotheses for questions beginning "Where ..." are likely to be locations, which often appear with locative prepositions or as arguments to verbs of motion. Questions of the form "When ..." often expect answer hypotheses that are dates or times, and the expectation of questions beginning "How many ..." is numeric expressions.
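To make the use of these expectations concrete, the following sketch encodes Table 1 as a lookup that filters answer hypotheses. This code is illustrative rather than MURAX source; the type_of classifier, mapping a noun phrase to a type label, is an assumed helper.

    # A sketch of the Table 1 expectations (illustrative; not from the paper).
    EXPECTATIONS = {
        "who": {"Person"}, "whose": {"Person"},
        "what": {"Person", "Thing", "Location"},
        "which": {"Person", "Thing", "Location"},
        "where": {"Location"},
        "when": {"Time"},
        "how many": {"Number"},
    }

    def filter_hypotheses(question, hypotheses, type_of):
        """Keep hypotheses whose type matches the question word's expectation.
        type_of is a hypothetical classifier: noun phrase -> type label."""
        q = question.lower()
        for word, expected in EXPECTATIONS.items():
            if q.startswith(word):
                return [h for h in hypotheses if type_of(h) in expected]
        return hypotheses  # "why"/"how" expect procedural answers; no filter applies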
Closed-class questions are also addressed by a system [Wendlandt and Driscoll, 1991] for accessing public information documents at NASA Kennedy Space Center (e.g. "What are the dimensions of the cargo area in the shuttle?"). In the system, conventional word-based similarity measures are augmented with terms for thematic roles, obtained from a manually constructed lexicon.

[1] "Trivial Pursuit" is a Registered Trademark of Horn Abbot Ltd., Copyright Horn Abbot Ltd.
[2] Questions requiring procedural answers are not considered unimportant, but of more concern after initial goals have been attained.
3 Components

An on-line version of Grolier's Academic American Encyclopedia [Grolier, 1990] was chosen as the corpus for the task. It contains approximately 27,000 articles, which are accessed via the Text Database (TDB) [Cutting et al., 1991], which is a flexible platform for the development of retrieval system prototypes and is structured so that additional functional components (e.g. search strategies and text taggers [Cutting et al., 1992]) can be easily integrated.
The components responsible for linguistic analysis are a part-of-speech tagger and a lexico-syntactic pattern matcher. The tagger is based on a hidden Markov model (HMM). HMM's are probabilistic and their parameters can be estimated by training on a sample of ordinary untagged text. Once trained, the Viterbi algorithm is used for tagging. To assess performance, an HMM tagger [Kupiec, 1992b] was trained on the untagged words of half of the Brown corpus [Francis and Kučera, 1982] and then tested against the manually assigned tags of the other half. This gave an overall error rate of 4% (corresponding to an error rate of 11.2% on words that can assume more than one part-of-speech category). The percentage of tagger errors that affect correct recognition of noun phrases is much lower than 4%. The tagger uses both suffix information and local context to predict the categories of words for which it has no lexicon entries.
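The decoding step can be illustrated with a minimal Viterbi sketch. The states and probabilities below are toy values invented for the example, and the handling of unknown words via suffixes and local context is omitted; this is not the actual TDB tagger.

    def viterbi(obs, states, start_p, trans_p, emit_p):
        """Return the most likely tag sequence for a sequence of words."""
        V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]  # step-0 scores
        back = [{}]
        for t in range(1, len(obs)):
            V.append({})
            back.append({})
            for s in states:
                # best predecessor state leading into s
                prev, p = max(((r, V[t - 1][r] * trans_p[r][s]) for r in states),
                              key=lambda x: x[1])
                V[t][s] = p * emit_p[s][obs[t]]
                back[t][s] = prev
        # trace back from the best final state
        last = max(V[-1], key=V[-1].get)
        path = [last]
        for t in range(len(obs) - 1, 0, -1):
            last = back[t][last]
            path.insert(0, last)
        return path

    # Toy example with invented probabilities:
    states = ("NOUN", "VERB")
    start = {"NOUN": 0.6, "VERB": 0.4}
    trans = {"NOUN": {"NOUN": 0.3, "VERB": 0.7}, "VERB": {"NOUN": 0.8, "VERB": 0.2}}
    emit = {"NOUN": {"dogs": 0.5, "run": 0.1}, "VERB": {"dogs": 0.1, "run": 0.6}}
    print(viterbi(["dogs", "run"], states, start, trans, emit))  # ['NOUN', 'VERB']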
The HMM used for tagging the encyclopedia text was also trained using the encyclopedia. A benefit of such training is that the tagger can adapt to certain characteristics of the domain. An observation in this regard was made with the word "I". The text of the encyclopedia is written in an impersonal style and the word is most often used in phrases like "King George I" and "World War I". The tagger trained on encyclopedia text assigned "I" appropriately (as a proper noun) whereas the tagger trained on the Brown corpus (a mixture of different kinds of text) assigned such instances as a pronoun.
Given a sentence of text, the tagger produces a sequence of pairs of words with associated part-of-speech categories. These enable phrase recognition to be done. Phrases are specified by regular expressions in the finite-state calculus [Hopcroft and Ullman, 1979]. Noun phrases are identified solely by part-of-speech categories, but more generally categories and words are used to define lexico-syntactic patterns against which text is matched. This kind of pattern matching has also been exploited by others (e.g. [Jacobs et al., 1991, Hearst, 1992]). Initially, only simple noun phrases are identified because they are recognized with the greatest reliability. Analysis involving prepositional phrases or other coordination is applied subsequently as part of more detailed matching procedures.

Word-initial capitalization was found to be useful for splitting a noun phrase appropriately; thus "New York City borough" is split into "New York City" and "borough". Such splitting improves the efficiency of boolean query construction (enabling direct phrase matches, rather than requiring several words to be successively dropped from the phrase).
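As an illustration of recognizing phrases from category sequences (a sketch, not the actual finite-state machinery; the tagset and the noun-phrase pattern are simplified assumptions), a simple noun-phrase recognizer can be written as a regular expression over tags:

    import re

    # Assumed simple NP: optional determiner, any adjectives, one or more nouns.
    NP_PATTERN = re.compile(r"(DET )?(ADJ )*(NOUN )+")

    def simple_noun_phrases(tagged):
        """tagged: list of (word, tag) pairs; returns the simple NPs found."""
        tags = "".join(tag + " " for _, tag in tagged)
        phrases = []
        for m in NP_PATTERN.finditer(tags):
            # each tag is followed by one space, so spaces count tokens
            start = tags[:m.start()].count(" ")
            end = tags[:m.end()].count(" ")
            phrases.append(" ".join(w for w, _ in tagged[start:end]))
        return phrases

    tagged = [("the", "DET"), ("last", "ADJ"), ("Anglo-Saxon", "ADJ"),
              ("king", "NOUN"), ("of", "PREP"), ("England", "NOUN")]
    print(simple_noun_phrases(tagged))  # ['the last Anglo-Saxon king', 'England']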
3.1 Title Phrases

A multi-word phrase that is the title of a film, book, play, etc., is usefully treated as a single unit. Furthermore, it may not be a simple noun phrase (e.g. Play Misty for Me). Such phrases are readily identified when marked typographically by enclosing quotes or italics. However, title phrases may be marked only by word-initial capitalized letters; furthermore, some words (such as short function words) may not be capitalized. Thus, the correct extent of the phrase may be ambiguous and alternative possibilities must be accommodated. The most likely alternative is chosen after phrase matching has been done and the alternatives compared, based on the matches and frequency of the alternative interpretations.
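A sketch of how the alternative extents might be enumerated from capitalization alone follows (illustrative only; the function-word list is an assumption):

    def title_phrase_alternatives(words):
        """Enumerate candidate title-phrase extents marked only by capitals."""
        function_words = {"a", "and", "for", "in", "of", "the", "to"}  # assumed
        alternatives = set()
        for i, w in enumerate(words):
            if not w[:1].isupper():
                continue
            alternatives.add(w)  # the single capitalized word itself
            j = i
            while j + 1 < len(words) and (words[j + 1][:1].isupper()
                                          or words[j + 1].lower() in function_words):
                j += 1
                if words[j][:1].isupper():  # an extent must end on a capitalized word
                    alternatives.add(" ".join(words[i:j + 1]))
        return alternatives

    print(sorted(title_phrase_alternatives("Play Misty for Me".split())))
    # ['Me', 'Misty', 'Misty for Me', 'Play', 'Play Misty', 'Play Misty for Me']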
4 Operational Overview

This section presents an informal description of the operation of the system, by tracing the analysis steps for an example question, shown in Figure 2.

"Who was the Pulitzer Prize-winning novelist that ran for mayor of New York City?"

Pulitzer Prize winning novelist    mayor    New York City

Figure 2: Example Question and Component NP's
4.1 Primary Document Matches

Simple noun phrases and main verbs are first extracted from the question, as illustrated in the figure. These question phrases are used in a query construction/refinement procedure that forms boolean queries with associated proximity constraints (Section 5). The queries are used to search the encyclopedia to find a list of relevant articles from which primary document matches are made. These are sentences containing one or more of the question phrases.
Primary document matches are heuristically scored according to the degree and number of matches with the question phrases. Matching head words in a noun phrase receive double the score of other matching words in a phrase. Words with matching stems but incompatible part-of-speech categories are given minimal scores. Primary document matches are then ranked according to their scores.
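The scoring heuristic can be sketched as follows. The numeric weights standing in for "double" and "minimal" are assumptions, and stemming and tagging are taken as given:

    def score_sentence(sentence_words, question_phrases):
        """Score one primary document match against the question phrases.
        sentence_words: set of (stem, tag) pairs in the candidate sentence;
        question_phrases: list of phrases, each a list of (stem, tag) pairs,
        with the head word of a simple noun phrase taken to be its last word."""
        stems = {s for s, _ in sentence_words}
        score = 0.0
        for phrase in question_phrases:
            head = phrase[-1][0]
            for stem, tag in phrase:
                if (stem, tag) in sentence_words:
                    score += 2.0 if stem == head else 1.0  # head words count double
                elif stem in stems:
                    score += 0.1  # matching stem, incompatible category: minimal
        return score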
4.2 Extracting Answers

It is assumed that primary document matches contain answer hypotheses, so answer extraction begins by finding all simple noun phrases contained in them. Each noun phrase is an answer hypothesis distinguished by its component words, and the article and sentence in which it occurs. Answer hypotheses are themselves scored on a per-article basis according to the sum of the scores of the primary document matches in which they occur. The purpose of this is to minimize the probability of overlooking the correct answer hypothesis if a subsequent non-exhaustive search is performed using the hypotheses.

For each answer hypothesis the system tries to verify phrase relations implied by the question. For the question in Figure 2, we note that the answer is likely to be a person (indicated by "who"). The type phrase indicates the answer is preferably a "Pulitzer Prize winning novelist", or at least a "novelist" as indicated by the head noun of the type phrase. The relative pronoun indicates that the answer also "ran for mayor of New York City". Phrase matching procedures (detailed in Section 6) perform the verification using the answer hypotheses and the primary document matches, but the verification is not limited to primary document matches.

It can happen that a pertinent phrase relation is not present in the primary document matches although it can be confirmed elsewhere in the encyclopedia. This is because too few words are involved in the relation in comparison to other phrase matches, so the appropriate sentence does not rank high enough to be in the selected primary document matches. It is also possible that the appropriate information is not expressed in any primary document match and depends only on the answer hypothesis. This is the case with one heuristic that the system uses to try and verify that a noun phrase represents a person's name. The heuristic involves looking for an article that has the noun phrase in its title; thus if the article does not share any phrases with the question, it would not be part of any primary document match. Secondary queries are used as an alternative means to confirm phrase relations.
The best matching phrase for this question is: Mailer, Norman

The following documents were most relevant:

Document Title: Mailer, Norman
Relevant Text:

• "The Armies of the Night (1968), a personal narrative of the 1967 peace march on the Pentagon, won Mailer the Pulitzer Prize and the National Book Award."

• "In 1969 Mailer ran unsuccessfully as an independent candidate for mayor of New York City."

Document Title: novel
Relevant Text:

• "Among contemporary American novelists, Saul Bellow, John Dos Passos, John Hawkes, Joseph Heller, Norman Mailer, Bernard Malamud, Thomas Pynchon, and J. D. Salinger have reached wide audiences."

Next best: Edith Wharton, William Faulkner

Figure 3: Example Output
A secondary query may consist solely of an answer hypothesis (as for the heuristic just mentioned) or it may also include other question phrases such as the question's type phrase. To find out whether an answer hypothesis is a "novelist", the two phrases are included in a query and a search yields a list of relevant articles. Sentences which contain co-occurrences are called secondary document matches. The system analyzes secondary document matches to see if answer hypotheses can be validated as instances of the type phrase via lexico-syntactic patterns.
4.3 System Output

For the given question the system produces the output shown in Figure 3. The presentation is different from extant IR systems. Answer hypotheses are shown to the user to focus his attention on likely answers and how they relate to other phrases in the question. The text presented is not necessarily from documents that have high similarity scores, but those which confirm phrase relations that lend evidence for an answer. This behaviour is readily understood by users, even though they have not been involved in the tedious intermediate work done by the system.
In Figure 3, the first two sentences are from primary document matches. The last sentence confirming Norman Mailer as a novelist is a secondary document match. It was confirmed by a lexico-syntactic pattern which identifies the answer hypothesis as being in a list-inclusion relationship with the type phrase.

We next consider this approach in contrast to a common alternative, vector-space search. Vector-space search using full-length documents is not as well suited to the task. For the example question, a search was done using a typical similarity measure and the bag of content words of the question. The most relevant document (about Norman Mailer) was ranked 37th. Somewhat better results could be expected if sentence or paragraph level matching was done (cf. [Salton and Buckley, 1991]). However the resulting text matches do not have the benefit of being correlated in terms of a particular answer and they muddle information for different answer hypotheses.
5 Primary Query Construction

This section describes how phrases from a question are translated into boolean queries with proximity constraints. These are passed to an IR system which searches the encyclopedia and returns a list of matching documents (or hits). The following functionality is assumed of the IR system:

1. The boolean AND of terms, denoted here as: [term1, term2, ... termn]

2. Proximity of a strict sequence of terms, separated by up to p other terms, denoted here as: {p term1, term2, ... termn}

3. Proximity of an unordered list of terms, separated by up to p other terms, denoted here as: (p term1, term2, ... termn)

The overall process is again illustrated via an example question:

"Who shot President Lincoln?"

The question is first tagged and the noun phrases and main verbs are found. In the above case the only noun phrase is President Lincoln and the main verb is shot. Boolean terms are next constructed from the phrases. At the outset a strict ordering is imposed on the component words of phrases. For the preceding question, the first query is:

    {0 president lincoln}

The IR system is given this boolean query and searches for documents that match. Depending on the number of hits, new boolean queries may be generated with the purpose of:

1. Refining the ranking of the documents.

2. Reducing the number of hits (Narrowing).

3. Increasing the number of hits (Broadening).

Iterative narrowing and broadening has been investigated for the situation where phrase structure is not considered [Salton et al., 1983].

5.1 Narrowing

Items (1) and (2) above are performed by using title phrases (Section 3.1) rather than the noun phrases, or by adding extra query terms such as the main verbs and performing a new search in the encyclopedia. Including the main verb in the example gives:

    [{0 president lincoln} shot]

Narrowing is done to try to reduce the number of hits. It also involves reducing the co-occurrence scope of terms in the query and constrains phrases to be closer together (and thus indirectly there is a higher probability of them being in some syntactic relation with each other). A sequence of queries with increasingly smaller scope are made, until there are fewer hits than some predetermined threshold. A narrowed version of the previous example is shown below:

    (10 {0 president lincoln} shot)
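The query notation and the narrowing loop can be sketched as follows. The classes are hypothetical stand-ins (the paper does not specify the TDB interface at this level), and the scope-halving schedule is an assumption; the paper requires only an increasingly smaller scope.

    from dataclasses import dataclass

    @dataclass
    class And:   # [t1 t2 ... tn] : boolean AND of terms
        terms: list

    @dataclass
    class Seq:   # {p t1 ... tn} : strict sequence, up to p intervening terms
        p: int
        terms: list

    @dataclass
    class Near:  # (p t1 ... tn) : unordered co-occurrence, up to p intervening terms
        p: int
        terms: list

    # The queries from this section:
    first_query = Seq(0, ["president", "lincoln"])                   # {0 president lincoln}
    with_verb = And([Seq(0, ["president", "lincoln"]), "shot"])      # [{0 president lincoln} shot]
    narrowed = Near(10, [Seq(0, ["president", "lincoln"]), "shot"])  # (10 {0 president lincoln} shot)

    def narrow(query, search, threshold):
        """Shrink co-occurrence scope until fewer hits than the threshold remain.
        search (query -> list of hits) stands in for the IR system."""
        hits = search(query)
        while len(hits) >= threshold and query.p > 0:
            query = Near(query.p // 2, query.terms)  # assumed narrowing schedule
            hits = search(query)
        return query, hits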
5.2 Broadening

Broadening is done to try and increase the number of hits for a boolean query. It is achieved in three ways:

1. Increasing the co-occurrence scope of words within phrases, while jointly dropping the requirement for strict ordering of the words. E.g. (5 president lincoln) would match the phrase "President Abraham Lincoln". A sequence of queries with increasingly larger scope are made until some threshold on either the proximity or the resulting number of hits is reached.

2. Dropping one or more whole phrases from the boolean query. Query terms, each corresponding to a phrase, are dropped to get more hits. It is efficient to drop them in an order that corresponds to decreasing number of overall occurrences in the encyclopedia (see the sketch after this list).

3. Dropping one or more words from within multiple-word phrases in a query to produce a query that is composed of sub-phrases of the original. In the previous example, to increase the number of hits president could be dropped, and so might lincoln.
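Way (2) can be sketched as follows (corpus_freq is an assumed table of overall phrase occurrence counts):

    def drop_most_frequent_phrase(query_phrases, corpus_freq):
        """Drop the phrase with the highest overall occurrence count first."""
        keep = sorted(query_phrases, key=lambda p: corpus_freq[p])  # ascending
        return keep[:-1]

    freq = {"president": 12000, "lincoln": 900}  # invented counts
    print(drop_most_frequent_phrase(["president", "lincoln"], freq))  # ['lincoln']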
5.3 Control Strategy

The initial boolean query comprises all the noun phrases derived from the user's question. Broadening and/or narrowing are then applied. Although a strict prioritization of operations does not seem necessary, the following partial order is effective:

1. Co-occurrence scope is increased before terms are dropped.

2. Single phrases are dropped from a query before two phrases are dropped.

3. Higher frequency phrases are dropped before lower frequency ones.

4. Complete phrases are used before their sub-phrases.

5. Title phrases are tried before any of their component noun phrases.

The iterative process of broadening and/or narrowing terminates when either a threshold on the number of hits has been reached, or no further useful queries can be made. Upon termination the hits are ranked. In practice it is not necessary to provide elaborate ranking criteria and documents are ranked simply by the number of terms they have in common with the user's question.
6 Answer Extraction

This section completes the description of how the most likely answer hypotheses are found from the sentences in the various relevant hits. Phrase matching operations are considered first, followed by the procedure for constructing secondary queries to get secondary document matches. Generally several hypotheses may represent the same answer, so they must be linked together and their various phrase matches combined. They can then be ranked in order of likelihood.
6.1 Phrase Matching

Phrase matching is done with lexico-syntactic patterns which are described using regular expressions. The expressions are translated into finite-state recognizers, which are determinized and minimized [Hopcroft and Ullman, 1979] so matching is done efficiently and without backtracking. Recognizers are applied to primary and secondary document matches, and the longest possible match is recorded. An example pattern and text match is shown in Figure 4. For convenience, copies of expressions can be included by naming them in other expressions. In the figure, the expression NP1 refers to a noun phrase, whose pattern is defined elsewhere.
Regular Expression Operators:

?      Zero or one instances
+      One or more instances
{...}  sequence of instances
(...)  inclusive-or of instances

Lexico-Syntactic pattern:

{ NP1 (are were include {such as}) {NP2 ,}+ NP3? {and NP4}? }

Example match:

"Countries such as Egypt, Sudan, and Israel ..."
  NP1             NP2    NP2        NP4

Figure 4: Example Pattern and Document Match
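The Figure 4 pattern translates roughly into the following Python regular expression. This is an approximation: an NP is crudely taken here to be a single capitalized word, whereas MURAX matches genuine noun phrases found by the tagger.

    import re

    NP = r"[A-Z][a-z]+"  # crude stand-in for a noun-phrase subexpression
    LIST_PATTERN = re.compile(
        rf"({NP}) (?:are|were|include|such as) "  # NP1 and the connective
        rf"((?:{NP}, )+)"                         # one or more "NP2, "
        rf"(?:({NP}) )?"                          # optional NP3 (no comma)
        rf"(?:and ({NP}))?"                       # optional "and NP4"
    )

    m = LIST_PATTERN.search("Countries such as Egypt, Sudan, and Israel ...")
    print(m.group(1), m.group(2), m.group(4))
    # NP1='Countries', NP2s='Egypt, Sudan, ', NP4='Israel'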
For robustness, phrase matching is layered on top of co-occurrence matching, so if the input is not a question (or is a question beginning with "how" or "why") the system provides output that is typical of co-occurrence based search methods.

A large corpus mitigates some of the problems inherent in using simple language modelling. In a document match, a relation may not be verified because it requires more sophisticated analysis than is feasible with a finite-state grammar. However, the relation may be expressed in several places in the encyclopedia and thus more simply in some places, improving the chances of verifying it.

Likewise it happens that spurious matches are also made by simple phrase matching. Other things being equal, an answer hypothesis having more instances of the match is preferred. It is less likely that spurious matches for an answer hypothesis occur for several different phrase relations, so many of these errors don't propagate far enough to cause an erroneous answer.
6.1.1 Verifying Type Phrases

The following relations are used to try to verify answer hypotheses as instances of type phrases:
Apposition

This is exemplified by the match between the type phrase of the following question and the document match below it:

"Who was the last Anglo-Saxon king of England?"

1) "The last Anglo-Saxon king of England, Harold, b. c. 1022, was defeated and killed at ..."
The IS-A Relation

This is demonstrated by the following document match:

2) "Saint Edward the Confessor, b. between 1002 and 1005, d. Jan. 5, 1066, was the next to last Anglo-Saxon king of England (1042-66)."
List Inclusion

Lists are often used to enumerate objects of the same type. Examples are shown in Figures 3 and 4.

Noun Phrase Inclusion

Type phrases are often related to answer hypotheses by being included in them. In the question and corresponding document match shown below, the type phrase river is in the same noun phrase as the answer hypothesis Colorado River:

"What river does the Hoover Dam dam?"

"... the Hoover Dam, on the Colorado River ..."
6.1.2 Predicate/Argument Match

This operation associates answer hypotheses and other noun phrases in a document match that satisfy a verb relation implied in a question. Currently verbs are assumed simply to be monotransitive and patterns accounting for active and passive alternation are applied. This is illustrated by the question and document match shown below:

"Who succeeded Shastri as prime minister?"

"... Shastri was succeeded by Indira Gandhi as Indian prime minister ..."

6.1.3 Minimum Mismatch

When comparing type phrases with answer hypotheses, the minimum degree of mismatch is considered best. This is illustrated by considering the first question in Section 6.1.1 and the associated document matches (1) and (2). Both "Harold" and "Saint Edward the Confessor" match equally well with the type phrase "last Anglo-Saxon king of England". However, "Harold" is (correctly) preferred because the match is exact, whereas a longer match is involved for "Saint Edward the Confessor" (namely, he was the "next to last Anglo-Saxon king of England").
6.1.4 Person Verification

The confirmation of an answer hypothesis as a person's name is important. In the encyclopedia, a reliable property of peoples' names is that they have word-initial capital letters. This simple consideration significantly reduces the number of answer hypotheses that require further consideration. Many different multi-national names are present and exhaustive manual enumeration is impractical. However there are indirect clues that can be used. Articles about people generally have their name as the title and in such cases there is often a mention at the beginning of the article of birth and/or death dates which are easily identified. Usually there is also a higher percentage of words that are male or female pronouns than in other articles. Thus to try and confirm an answer hypothesis as a person's name, a secondary query is made to see if it is present as a title, and then it is decided whether the article is about a person. This heuristic is simple, yet robust (and of course is open to improvement by more sophisticated analysis).
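The heuristic can be sketched as follows (find_article and pronoun_ratio are assumed helpers, and the pronoun threshold is an invented value):

    import re

    def looks_like_person(hypothesis, find_article, pronoun_ratio):
        """find_article(title) -> article text or None (a secondary query);
        pronoun_ratio(text) -> fraction of gendered pronouns in the text."""
        # Reliable property: people's names have word-initial capital letters.
        if not all(w[:1].isupper() for w in hypothesis.split()):
            return False
        article = find_article(hypothesis)
        if article is None:
            return False
        # Articles about people usually mention birth/death dates near the start.
        has_dates = re.search(r"\bb\.|\bd\.|\b\d{3,4}\s*-\s*\d{2,4}", article[:200])
        return bool(has_dates) or pronoun_ratio(article) > 0.02  # assumed threshold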
6.2 Secondary Queries

Secondary document matches are a supplementary means of confirming phrase relations and are found via secondary queries which are constructed by MURAX and passed to the IR system. Broadening is applied as necessary to secondary queries, but terms are never dropped because they are required in the resulting matches. For person verification, only an answer hypothesis is used in a secondary query, but other relations require other question phrases to be included. These are considered next.
6.2.1 Type Phrase Queries

For reliable identification, simple noun phrases are extracted from primary document matches. For the question in Figure 2, the phrase "mayor of New York City" is first considered as two simpler and independent noun phrases. Exact matching of the overall noun phrase is done after all document matches are found. Answer hypotheses are included verbatim in a query, but when trying to verify a type phrase, only the head word of the phrase is included. This provides the minimal necessary constraint on secondary document matches. The detailed matching of all words in the type phrase is done by considering the degree of mismatch (Section 6.1.3). When the type phrase cannot be matched against an answer hypothesis using any lexico-syntactic pattern, the fact of their co-occurrence in a sentence is still recorded, as it may serve as a means of ranking alternative hypotheses in the absence of any better information (the relation may still be implied by the document match, but cannot be inferred from the simple matching operations that are used).
6.2.2 Co-Occurrence Queries

It is expedient to include other question phrases in secondary queries. As mentioned in Section 4.2, a relevant phrase match may not be found because the primary document match in which it occurs has too low a score in comparison to other primary document matches. Creating secondary queries with individual question phrases allows the relevant phrase match to be found.

Secondary queries are also used to find co-occurrences of answer hypotheses and question phrases that extend beyond the context of a single sentence. This can be useful for ranking alternative answer hypotheses in the absence of other differentiating phrase matches. It is illustrated in the following question and primary document matches:

"What film pits Humphrey Bogart against gangsters in the Florida Keys?"

"... Bacall and Bogart became a famous romantic couple in films such as The Big Sleep (1946) and Key Largo (1948)."

"Some of his most popular films were The Maltese Falcon (1941); Casablanca (1942), with Ingrid Bergman; The Big Sleep (1946) costarring his wife, Lauren Bacall; The Treasure of Sierra Madre (1948); ..."

Secondary co-occurrence queries determine that the answer hypothesis Key Largo co-occurs with the Florida Keys, but the other "film" hypotheses do not; in the absence of stronger evidence to the contrary, Key Largo receives a preference.
