`
(12) United States Patent
     Blackwell et al.

(10) Patent No.: US 7,003,781 B1
(45) Date of Patent: Feb. 21, 2006
`
`(54) METHOD AND APPARATUS FOR
`CORRELATION OF EVENTS IN A
`DISTRIBUTED MULTI-SYSTEM
`COMPUTING ENVIRONMENT
`
(75) Inventors: Aaron Kenneth Blackwell, Redding, CT (US);
     Aage Bendiksen, Hamden, CT (US);
     Benny Tseng, Danbury, CT (US);
     Zhongliang Lu, Danbury, CT (US);
     Amal Shah, Danbury, CT (US)

(73) Assignee: Bristol Technology Inc., Danbury, CT (US)
`
( * ) Notice: Subject to any disclaimer, the term of this
     patent is extended or adjusted under 35
     U.S.C. 154(b) by 0 days.

(21) Appl. No.: 09/564,929

(22) Filed: May 5, 2000
`
(51) Int. Cl.
     G06F 9/00 (2006.01)
(52) U.S. Cl. ................ 719/327
(58) Field of Classification Search ................ 703/328; 719/328
     See application file for complete search history.
`
(56) References Cited

U.S. PATENT DOCUMENTS

5,583,761 A      12/1996  Chou .................. 395/798
5,737,393 A *     4/1998  Wolf .................. 379/88.13
5,768,577 A *     6/1998  Kleewein et al. ....... 707/10
5,857,190 A *     1/1999  Brown ................. 707/10
5,889,518 A *     3/1999  Poreh et al. .......... 345/804
5,941,996 A *     8/1999  Smith et al. .......... 714/47
5,956,507 A       9/1999  Shearer, Jr. et al. ... 395/674
6,181,364 B1 *    1/2001  Ford .................. 725/32
6,381,606 B1 *    4/2002  Carpenter et al. ...... 707/100
6,484,150 B1 *   11/2002  Blinn et al. .......... 705/26
6,625,117 B1 *    9/2003  Chen et al. ........... 370/227
`
`OTHER PUBLICATIONS
`
Chalmers, Message system, Sep. 8, 1997, p. 1.*
IBM, An Introduction to Messaging and Queuing, 1993, 1995.*
Borland, API guide, 1999.*
Marc Verhiel, MQSeries Standards and Guidelines, Oct. 1, 1999.*
System Engineering, MQSeries Integrator, Jun. 7, 1999.*
"An Introduction to Messaging and Queuing", IBM MQSeries, Jun. 1995, pp. III-VIII, 1-35.
IBM Technical Disclosure Bulletin, Method of Tracing Events in Multi-threaded OS/2 Applications, Vol. 36, No. 09A, Sep. 1993, entire article.
`
`* cited by examiner
`
Primary Examiner: Meng-Ai T. An
Assistant Examiner: LeChi Truong
(74) Attorney, Agent, or Firm: Ohlandt, Greeley, Ruggiero & Perle, L.L.P.
`
(57) ABSTRACT

A method and system is disclosed for monitoring an operation
of a distributed data processing system. The system can
include a plurality of applications running on a plurality of
host processors and communicating with one another, such
as through a message passing technique. The method
includes steps executed in individual ones of the plurality of
applications, of (a) examining individual ones of generated
API calls to determine if a particular API call meets predetermined
API call criteria; (b) if a particular API call meets
the predetermined API call criteria, storing all or a portion
of the content of the API call as a stored event; (c) processing
a plurality of the stored events to identify logically correlated
events, such as those associated with a business
transaction; and (d) displaying all or a portion of the stored
API call content data for the logically correlated events.
`
`36 Claims, 18 Drawing Sheets
`
YYZ Ex. 2001
`
U.S. Patent    Feb. 21, 2006    Sheet 1 of 18    US 7,003,781 B1

[FIG. 1: block diagram of an exemplary monitoring environment. Recoverable labels: user application #1 and #2; sensor #1 and #2 (14); queue manager #1, #2 and #3 (18); analyzer #1 and #2 (12); database #1 and #2 (20); analyzer system 10; communication network 15; paths 101 to 107.]
`
`
`
[Sheet 2 of 18. FIG. 2: sensor work flow. Recoverable labels: application (210); call tricoder function; manage configuration queue (214); manage internal filters (216); matching filter exists?; send event (220); 222; dashed path 226.]
`
`
`
[Sheet 3 of 18. FIG. 3: analyzer data logic model. Recoverable labels: program (310); host (312); program instance; resource (316); standard event information (318); technology specific event information (320); event relationship (322); lookup table (324).]
`
`
`
[Sheet 4 of 18. FIG. 4: analyzer logic model. Recoverable labels: method/function (410); analyzer data type (412, shown twice); display string generator #2 through #n (414), with a first generator label that is garbled; data locator (416).]
`
`
`
[Sheet 5 of 18. FIG. 5: analyzer new event handling work flow. Recoverable block labels: get event from queue (label garbled); process standard event information; create new standard entities and resources; add event to database; process technology specific event information; create new technology specific resources; generate event relations; perform data analysis. Reference numerals visible: 510, 512, 514, 516, 518, 520, 522, 524.]
`
`
`
[Sheet 6 of 18. FIG. 6: analyzer event relation generation flow. Recoverable block labels: find potential matching events based on lookup table; a decision block on the potential matches (label garbled); add this event to current event relationship record (624).]
`
`
`
[Sheet 7 of 18. FIG. 7: COM model interface. Figure text is not recoverable from the extraction.]
`
`
`
`
`
`
`
`
`
`
`
`
`
`
[Sheet 8 of 18. FIG. 8: presentation data filtering operation. Recoverable labels: script generated by user interface; script edited by user; user-defined script; filter manager; script engine; results; user interface. Reference numerals visible: 722, 810, 812, 812A, 814, 818.]
`
`
`
[Sheet 9 of 18. Figure text is not recoverable from the extraction.]
`
`
`
`
`
`
`
[Sheet 10 of 18. Figure text is not recoverable from the extraction.]
`
`
`
`
`
`
`
`
[Sheet 11 of 18. FIG. 10: second embodiment of transaction correlation (flowchart). Recoverable blocks, in flow order:
  START: user specifies an event of interest.
  Create empty list of related events.
  Add the event to the list of related events.
  Remove an event, E, from the list of related events.
  Decision: has E already been added to the set of transaction events?
  If no: find all events in the same local transaction as event E, including E itself (the block refers to FIG. 9).
  Add each of these events to the set of transaction events.
  For each of these events, find all other events that share the message path event relationship with these events, and add those other events to the list of related events.
  END: all events in the business transaction have been found.
Reference numerals visible: 1010, 1012, 1014, 1022, 1026, 1028.]
`
`
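The FIG. 10 flowchart reads as a work-list traversal, and a minimal Python sketch of that reading follows. It is an illustration under stated assumptions, not the patented implementation: the function name `business_transaction` and the two callables are hypothetical, events are represented by plain string identifiers, and the sketch assumes the loop ends when the list of related events is empty, which the flowchart text implies but the extraction does not show explicitly.

```python
from typing import Callable, Iterable, List, Set


def business_transaction(
    start: str,
    local_transaction: Callable[[str], Iterable[str]],
    message_path_peers: Callable[[str], Iterable[str]],
) -> Set[str]:
    """Collect every event reachable from `start` (hypothetical helper).

    local_transaction(e)  -> events in the same local transaction as e
    message_path_peers(e) -> events sharing a message path relation with e
    Both callables are assumptions; how they are computed is not shown here.
    """
    related: List[str] = [start]      # list of related events (the work list)
    transaction: Set[str] = set()     # set of transaction events (the result)

    while related:                    # assumed stop condition: list is empty
        event = related.pop()         # remove an event E from the list
        if event in transaction:      # E already handled: take the next one
            continue
        # All events in E's local transaction, including E itself.
        local = set(local_transaction(event)) | {event}
        transaction |= local
        # Follow message path relations out of this local transaction.
        for member in local:
            related.extend(message_path_peers(member))
    return transaction
```

Each pass through the loop either skips an event that is already in the result or adds at least one new event, so the traversal terminates on any finite event set.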
`
[Sheet 12 of 18. FIG. 11: table of exemplary standard event attributes. Table text is not recoverable from the extraction.]
`
`
`
[Sheet 13 of 18. FIG. 12: relationship between a sensor, an application, and a call library emulated by the sensor. Recoverable labels: call library (13); sensor library (14B); application; APIs; commands; filter configuration.]
`
`
`
[Sheet 14 of 18. FIG. 13: exemplary distributed enterprise middleware-based system. Figure text is not recoverable from the extraction.]
`
`
`
`
`
`
[Sheet 15 of 18. FIG. 14: conceptual block diagram of the analyzer console and its interface with sensors. Figure text is not recoverable from the extraction.]
`
`
`
`
`
`
`
[Sheet 16 of 18. FIG. 15: exemplary content of a log file used to record message traffic. Log text is not recoverable from the extraction.]
`
`
`
[Sheet 17 of 18. FIG. 16: exemplary dynamic transaction visualization of message flow and API calls. Figure text is not recoverable from the extraction.]
`
`
`
`
`
[Sheet 18 of 18. FIG. 17: event details view. Recoverable content: window title "Event Details"; Process ID 23698; an IP address and a Thread ID whose values are garbled; Program "assets"; Host name; call MQGET(Hconn; Hobj; MsgDesc; GetMsgOpts; BufferLength; Buffer; DataLength; CompCode; Reason); Entry; Date: 00-00-2000, Time: 23:38:43:972453 UTC; a field-name table whose rows include a queue manager name, ASSET.VERIFICATION.QUEUE, Feedback and Encoding.]
`
`
`
`METHOD AND APPARATUS FOR
`CORRELATION OF EVENTS IN A
`DISTRIBUTED MULTI-SYSTEM
`COMPUTING ENVIRONMENT
`
`FIELD OF THE INVENTION
`
`The invention relates generally to methods and apparatus
`for correlating events attributable to computer programs
`residing on different computer systems in a distributed
`network, and more particularly relates to techniques and
`systems for tracing problem events to their source and
`facilitating their resolution.
`
`BACKGROUND OF THE INVENTION
`
As the complexity of computer systems and networks of
computer systems increases, it becomes more complex and
`time consuming to trace and resolve problems. This is
`especially true in large distributed systems where multiple
`computer programs are concurrently running in multiple
`computer systems.
`Typically, experienced software developers are used to
`monitor each of these systems and combine the individual
analyses in order to obtain a coherent, global view of the
`operation of the distributed data processing system.
In accordance with current methodologies, this is a very
manual and labor-intensive process, and requires unique
skills in the various computer operating environments that
`make up the distributed system. Furthermore, the inputs to
`the analysis, such as event and message tracing data, are not
`in common formats across the various systems. These fac-
`tors combine to make it a very tedious, error prone, slow and
`costly process to attempt to correlate these various disparate
`data traces into a coherent model of the operation of the
`distributed data processing system.
Furthermore, the traditional error diagnosis processes
`typically employ a debugger, which is intrusive, or an
`embedded error logging facility, which normally requires
`that source code modifications be made.
`The deficiencies of the prior art approach to problem
`identification and resolution have become more prominent
`as large scale distributed business enterprise systems have
`been developed, wherein a plurality of different applications
`running on different hosts and under different operating
`systems all cooperate via message passing techniques to
`process input data related to independent and asynchronous
`transactions. A type of management software known as
`“middleware” has been developed to control and manage the
`message flow and processing, and employs message queues
`to temporally isolate the various applications from one
`another. In such a system several thousand transactions may
`be simultaneously in process, resulting in corresponding
`thousands of Application Program Interface (API) calls and
`messages being concurrently generated and routed through
`the system.
`As can be appreciated, identifying a cause of a failure or
`error condition occurring in one or a few of these transac-
`tions can be very complex, time consuming and, because of
`the significant amount of human operator analysis required,
`error prone.
`
`OBJECTS AND ADVANTAGES OF THE
`INVENTION
`
It is a first object and advantage of this invention to
provide a method and system for providing logical diagnostic
information for events, such as API calls, call arguments
`and return values, for a distributed data processing system
`wherein transactions occur over a plurality of hosts and
`applications.
`It is another object and advantage of this invention to
`provide a method and system for sensing and capturing, in
a distributed manner, an occurrence of events including API
`calls, call arguments and return values, for automatically
`correlating captured events relating to a particular distrib-
`uted transaction, and for displaying the correlated events to
`a human operator in a logically consistent manner.
`
`SUMMARY OF THE INVENTION
`
`The foregoing and other problems are overcome and the
`foregoing objects and advantages are realized by methods
`and apparatus in accordance with embodiments of this
`invention.
`
The teachings of this invention solve the above-mentioned
problems by providing a uniform framework for
capturing, managing, and correlating events from heterogeneous
environments. In a presently preferred, but not limiting,
embodiment the teachings of this invention support the
automatic correlation of IBM™ MQSeries™ (IBM and
MQSeries are trademarks of the International Business
Machines Corporation) API events, as well as a human
user-assisted correlation of similar events, through an event
modelling scheme and user management interface.
`More specifically, this invention provides the following
`novel processes, systems and sub-systems.
In a first aspect this invention provides a design and
implementation of an infrastructure for intercepting function
calls, such as API calls, and generating events representing
the corresponding function calls from different computer
programs in a distributed computing environment. This
process is conducted in a non-intrusive manner. The infrastructure
supports the conditional collection of a subset of
event data through a data collection filter mechanism.
`In a second aspect this invention provides a set of data
`structures for modeling function calls and data structures,
`software programs, and miscellaneous computer system
resources (e.g., IBM™ MQSeries™ queue managers) of
`heterogeneous technologies. These data structures expose
`the event internals through a uniform set of interfaces.
`In a third aspect this invention provides for the develop-
`ment and realization of the concept of event relations for
`modeling a message path relation between a send and
`receive event, which is an important element in an event
correlation algorithm. An algorithm for the systematic
`examination of events and the generation of corresponding
`event relations is also provided.
`In a fourth aspect this invention provides an interface built
`on top of an internal event model for exposing internal
`details of collected events through, for example, Microsoft
`COM object models.
`In a fifth aspect this invention provides an algorithm for
the automatic correlation of IBM™ MQSeries™ events
from different software programs that are involved in the
`same local and/or business transactions.
`
`In a further aspect this invention provides a mechanism to
`allow a human user to select a subset of collected events
`according to a set of evaluation criteria based on the event
`internal data. The user can achieve this selection through the
`use of a scripting language, such as Microsoft Visual
Basic™ scripts, and a human interface.
`
`These various aspects of the invention provide a unique
`perspective to manage the collection and correlation of
`events in a distributed computing environment in the fol-
`lowing manner.
`First, event collection is handled in a non-intrusive man-
`ner. That is, no additional work (source code modification,
recompilation, linking, etc.) is needed on the monitored
`software programs for event generation. Moreover, a human
`user need not have any knowledge of the internals of the
`software programs that he/she is monitoring. This contrasts
`favorably with the traditional diagnosis process, including
`those that use the debugger (intrusive) or the embedded
`logging (through source code modifications) approaches.
`Second, event collection can be triggered by the fulfill-
`ment of a set of criteria based on, for example, software
`program running states and computing environments. In
`other words, event collection is in general “disabled” for
`avoiding any interruption of normal program execution, and
`then automatically enabled for responding to an error con-
`dition or a change in program states or environments. When
`enabled by the triggering event(s), the sensor can send all
`event data that satisfies a specific data collection filter.
`Third, an amount of data to be collected from the software
`programs can be decided both statically (through pre-pro-
`grammed filtering conditions) and dynamically (such as
`from certain environment and program states).
Fourth, the human user can control the monitoring activities
in a distributed computing environment from one central
`console.
`Fifth, event correlations for transaction analysis can be
`accomplished using an automatic correlation mechanism,
`thereby eliminating or reducing the involvement of highly
`skilled software programmers.
`Sixth, a user interface is provided for enabling a human
`user or operator to visualize and analyze subset(s) of events
`selected by user-defined selection criteria. In the presently
`preferred embodiment these selection criteria are defined
through the use of Microsoft Visual Basic™ scripts. The
`operator has the ability to modify and customize the scripts
`to tailor the presentation to a desired format and content. The
`script may also be automatically generated by entry of data
`into a few fields in a presentation filter dialogue box.
`A method and system is therefore disclosed for monitor-
`ing an operation of a distributed data processing system. The
system is a type of system that includes a plurality of
`applications running on a plurality of host processors and
`communicating with one another, such as through a message
`passing technique. The method has steps executed in the
`plurality of applications for: (a) examining individual ones
`of generated Application Program Interface (API) calls to
`determine if a particular API call meets predetermined API
`call criteria; (b) if a particular API call meets the predeter-
`mined API call criteria, storing all or a portion of the content
`of the API call as a stored event; (c) processing a plurality
`of the stored events to identify logically correlated events,
`such as those associated with a business transaction; and (d)
`displaying all or a portion of the stored API call content data
`for the logically correlated events. The API call criteria can
`include, by example, system entity identity, the API name,
`timing data and/or restrictions on parameter values to the
`API call. The step of displaying preferably includes a step of
`processing the stored API call content data for the logically
`correlated events using a script (pre-programmed, automati-
`cally generated, or operator-defined). The step of examining
includes initial steps of: installing a sensor between an
output of the application and a function call library for
emulating, relative to the application, the interface to the
`function call library; and storing the predetermined API call
`criteria in a memory that is accessible by the sensor. The step
`of examining then further includes steps of intercepting with
`the sensor an API call output from the application; deter-
`mining if the intercepted API call fulfills the stored prede-
`termined API call criteria; and, if a match occurs, capturing
`data representing all or a portion of the content of the API
`call and transmitting the captured data to a database for
`storage as the stored event.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`The above set forth and other features of the invention are
`made more apparent in the ensuing Detailed Description of
`the Invention when read in conjunction with the attached
`Drawings, wherein:
FIG. 1 is a block diagram illustrating an exemplary monitoring
environment in accordance with the teachings herein;
FIGS. 2-10 are each a logic flow diagram or a logic
model, wherein
`FIG. 2 depicts sensor work flow;
`FIG. 3 depicts an analyzer data logic model;
`FIG. 4 depicts an analyzer logic model;
`FIG. 5 depicts analyzer new event handling work flow;
`FIG. 6 depicts analyzer event relation generation flow;
`FIG. 7 illustrates a COM model interface;
`FIG. 8 illustrates a presentation data filtering operation;
`FIG. 9 illustrates a first embodiment of transaction cor-
`relation;
`FIG. 10 illustrates a second embodiment of transaction
`correlation;
`FIG. 11 is a table that illustrates a number of exemplary
`standard event attributes, and is referenced below in the
`description of the data model of FIG. 3;
`FIG. 12 is a simplified block diagram illustrating a
`relationship between a sensor, an application, and a call
`library emulated by the sensor;
`FIG. 13 is a block diagram of an exemplary distributed
`enterprise middleware-based system that includes the ana-
`lyzer and related components in accordance with the teach-
`ings herein;
`FIG. 14 is a conceptual block diagram of the analyzer
`console and its interface with sensors;
`FIG. 15 shows an exemplary content of a log file used to
`record message traffic after a tracing facility is enabled;
`FIG. 16 is an exemplary dynamic transaction visualiza-
`tion of message flow and API calls in the distributed
`enterprise middleware-based system of FIG. 13; and
`FIG. 17 illustrates how the captured event data can be
`visualized in an event details mode.
`
`DETAILED DESCRIPTION OF THE
`INVENTION
`
`FIG. 1 illustrates an exemplary analyzer monitoring envi-
`ronment. An analyzer system 10 in accordance with the
`teachings herein comprises two major sub-systems: an ana-
`lyzer 12 (also referred to herein as an analyzer console) and
`a plurality of sensors 14. The sensors 14 may be considered
`as agents that reside in the space of a monitored process, and
operate to collect information on calls of the particular
`technology that a particular sensor 14 is monitoring.
Referring briefly to FIG. 12, for Microsoft and UNIX™
platforms (UNIX is a trademark of X/Open Company,
`Limited) a sensor 14 library 14B implements all of the API
`entry points for the technology that the particular sensor 14
`monitors. The sensor library 14B is named exactly as a
`
`standard call library 13, and is installed in a manner such that
`any monitored process or application 16 will interface at
`runtime with the sensor library 14B, instead of the standard
`library 13. This process is conducted in a non-intrusive
`manner and does not require any additional recompilation or
`relinking of the user application.
For an OS/390™ platform (OS/390 is a trademark of the
International Business Machines Company), in particular for
the MQSeries™, a different approach makes use of the
crossing exit mechanism provided by CICS™ (CICS is a
trademark of the International Business Machines Company).
This approach also maintains the non-intrusive manner
of the sensor 14 injection process.
`Referring also to FIG. 1, during the execution of the user
`application 16, control is passed via path 101 to the asso-
`ciated sensor 14 whenever a monitored API is invoked. In
response, the sensor 14 performs the necessary work to
`generate an event representing the API call state. The
`generation of the event is triggered by the API fulfilling
`requirements stored in a sensor configuration filter 14A
`(FIG. 12), which is programmed with configuration com-
`mands or messages by the analyzer 10.
`A human operator employs the analyzer console 12, also
`referred to as the analyzer user interface (UI), for controlling
`the activities of the sensors 14, for visualizing the collected
`event data, and for performing data analysis. The analyzer
`console 12 sends out the sensor 14 configuration messages
through an MQSeries™-based asynchronous communication
`network 15. This process is illustrated by path 104 (analyzer
`to Queue Manager/Queue 18) and path 102 (Queue Man-
`ager/Queue 18 to sensor 14) in FIG. 1. The sensor 14 also
`makes use of the same communication network 15 to pass
captured event(s) to the analyzer console 12 via paths 103
`and 105. The collected events are stored in a local event
`database 20 associated with the analyzer 12, via paths 106
`and 107.
`FIG. 2 illustrates the control flow of the sensor 14. At step
`210 an application 16 makes a function call belonging to the
`set of functions monitored by the associated sensor 14. In the
`preferred embodiment, at step 212, a tricoder function is
invoked instead of the standard function. A tricoder function
`yields program control to the sensor 14 via path 201 for
`analyzer 10 related processing.
`In step 214, the sensor 14 first manages the configuration
`database 14A, also referred to herein as a configuration
queue, in the analyzer communication network 15. This
management function includes examining received configuration
messages on the configuration queue, removing
`expired messages, and retrieving newly arrived messages. At
`step 216 the sensor 14 examines each of the newly arrived
`messages retrieved from step 214 and updates the internal
`data structures. Each configuration message contains a set of
`data collection filter rules. These rules determine the con-
`ditions which trigger event generation/reporting, as well as
`an amount of information to be collected from the event data
`packet. The filter rule conditions are preferably based on
`system entity identity (e.g., software program name, host
machine name, queue manager name, etc.), API name,
`timing information, and/or restrictions on parameter values
`to the API call, as described in further detail below.
`At step 218 the sensor 14 determines if any of the existing
filter rules match the current program state. If a matching
filter rule exists, the sensor 14 generates the event, thereby
capturing the state of the triggering function call (step 220).
If no filter rule matches, at step 222 the sensor 14
`instead invokes the standard API. The sensor 14 subse-
`quently returns control to the application 16.
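The control flow of steps 210 to 222 can be sketched in Python as a wrapper that stands in for the standard function. This is a minimal illustration under assumptions, not the patented sensor: `FilterRule`, `Sensor`, `pending_config`, and `sent_events` are hypothetical names, the in-memory lists stand in for the configuration queue and the event path to the analyzer, and the sketch assumes the standard API is invoked whether or not a rule matched so that the application's behavior is unchanged.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class FilterRule:
    """Hypothetical data collection filter rule (field names are assumptions)."""
    api_name: str                 # API the rule applies to, e.g. "MQPUT"
    program_name: str = "*"       # "*" matches any monitored program

    def matches(self, api_name: str, program_name: str) -> bool:
        return (self.api_name == api_name
                and self.program_name in ("*", program_name))


@dataclass
class Sensor:
    program_name: str
    rules: List[FilterRule] = field(default_factory=list)
    # Stand-in for configuration messages waiting on the configuration queue.
    pending_config: List[FilterRule] = field(default_factory=list)
    # Stand-in for events sent toward the analyzer.
    sent_events: List[Dict[str, Any]] = field(default_factory=list)

    def wrap(self, api_name: str, standard_fn: Callable[..., Any]) -> Callable[..., Any]:
        """Return a replacement for `standard_fn` that reports matching calls."""
        def tricoder(*args: Any) -> Any:
            # Steps 214 and 216: take newly arrived configuration messages
            # and update the internal filter rules.
            self.rules.extend(self.pending_config)
            self.pending_config.clear()
            # Step 218: does any rule match the current call?
            if any(r.matches(api_name, self.program_name) for r in self.rules):
                # Step 220: generate an event capturing the call state.
                self.sent_events.append({"api": api_name, "args": args})
            # Step 222 (assumed to run in both cases): invoke the standard
            # API, then return control to the application.
            return standard_fn(*args)
        return tricoder
```

In use, the application calls the wrapped function exactly as it would call the original; only calls that match a rule delivered through the configuration stand-in produce events.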
`
`The amount of information contained in the generated
`event depends on the filter rule specification. The filter rule
`specification determines whether function call parameters
`are to be sent, and the range of user data to be carried along
`with the event packet. For example, a particular packet may
`include some thousands of bytes of user message, and the
`filter rule specification may cause only the first 16 bytes to
`be captured and stored as part of the event, or may specify
`that none of the user message data be captured and saved.
`The filter rule specification(s) thus controls the type and
`amount of data that is captured and stored upon the filter rule
`matching the current program state.
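The effect of a filter rule on the amount of captured user data can be illustrated with a short Python sketch. The function name `capture_user_data` and the `capture_bytes` parameter are hypothetical, chosen only to mirror the example of keeping the first 16 bytes or none of a user message; the patent text does not name such a parameter.

```python
from typing import Optional


def capture_user_data(message: bytes, capture_bytes: Optional[int]) -> bytes:
    """Return the part of the user message to store with the event.

    capture_bytes is a hypothetical rule setting:
      None -> keep the whole message
      0    -> keep none of the message
      n    -> keep only the first n bytes
    """
    if capture_bytes is None:
        return message
    return message[:max(capture_bytes, 0)]
```

A rule that specifies 16 would therefore store a 16-byte prefix of a multi-kilobyte message, while a rule that specifies 0 stores an empty buffer alongside the remaining event data.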
`In some cases the amount of captured data may be made
`dynamic, e.g., as a function of the current environment or
`operating state of the system/processor being monitored.
`It is also possible to repeat steps 218 and 220 after the
`standard API call returns control to the sensor 14, in order
`to generate an event representing the post-call state. This
`recursion is indicated by the dashed line 226.
`FIG. 3 illustrates the data model used by the analyzer
`system 10 to store and represent the function call states and
`monitored environment in a hierarchical/networked manner.
`The program 310, host 312, and program instance 314 data
`types represent the system entities in a monitored environ-
`ment, where an entity is any object in the monitored system
`that exists for a certain length of time. Note that a program
`instance 314 is always associated with a program 310 and a
host machine 312. The program instance 314 can be considered
as a process and thread of execution in a UNIX™/Microsoft
Windows™ environment (Windows is a trademark
of the Microsoft Corporation), and as a region-transaction-task
in the OS/390™ CICS™ environment.
`A resource 316 is an entity that is specific to a particular
`technology monitored by the analyzer 10. For example, for
the MQSeries™, the queue manager and the queues are
considered to be a resource 316. One type of resource 316
can be associated with another (e.g., Queue Manager and the
`associated Queue, shown collectively as 18 in FIG. 1).
`An event entry represents the captured state of a function
`call collected by one of the sensors 14 in the system 10. That
`is, it is the internal storage for the event packets collected
`from different sensors 14. An event entry is associated with
`a program instance and optionally one or more resources.
`The event data can be divided into two groups: standard or
`technology neutral event information 318 and technology
`specific event information 320. The former includes infor-
`mation that is common among different technologies. FIG.
`11 is a table that illustrates a number of exemplary standard
`event attributes. It should be noted that the entity origin
`information including host name, program name, program
`instance identifier, and resource name (level 1 and level 2)
`can be accessed through the entity and resource entries
`associated with the respective event entry.
`The technology specific event information 320 contains
`function call parameters and a user data buffer. User data
`refers to the information particular to the application 16, and
`not the technology and function set. The technology specific
event information 320 is divided into two sections: one
`covers the data captured before the standard function call
`(entry data), and one covers the data captured after the
`standard function call (exit data).
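The event entry described above can be pictured with a small Python data model. This is a sketch under assumptions rather than the analyzer's actual schema: the class and field names (`ProgramInstance`, `EventEntry`, `entry_data`, `exit_data`, `origin`) are hypothetical, resources are reduced to plain names, and dictionaries stand in for the standard and technology-specific attribute sets.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(frozen=True)
class ProgramInstance:
    """A program instance is always tied to one program and one host."""
    program: str
    host: str
    instance_id: str


@dataclass
class EventEntry:
    """Captured state of one function call (hypothetical layout)."""
    instance: ProgramInstance
    # Standard, technology-neutral event information.
    standard: Dict[str, Any]
    # Technology-specific information captured before the standard call.
    entry_data: Dict[str, Any] = field(default_factory=dict)
    # Technology-specific information captured after the standard call.
    exit_data: Dict[str, Any] = field(default_factory=dict)
    # Optional associated resources, e.g. a queue manager and a queue.
    resources: List[str] = field(default_factory=list)

    def origin(self) -> Dict[str, Any]:
        """Origin details reached through the entity and resource entries."""
        return {
            "host": self.instance.host,
            "program": self.instance.program,
            "instance": self.instance.instance_id,
            "resources": list(self.resources),
        }
```

Keeping the origin details on the associated entity and resource records, rather than copying them into each event, matches the statement that they are accessed through those entries.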
`Each event entry is associated with a group of event
`relationships 322. There can be different types of relation-
`ships defined for events. One important type of relationship
`considered by the analyzer 10 of this invention is the
`message path relation. The message path relation associates
`events that serve as the source and destination of a message
`
`transaction between two entities in the monitored system.
`The concept of message path relation is generic for different
`technologies, and is realized by a specific relationship type
`for each technology monitored by the analyzer 10. As an
example, for the MQSeries™ it is realized by the
MQPUT-MQGET type relation that associates MQPUT/MQPUT1
and MQGET calls dealing with the same message. In
`general, an MQPUT call puts data on a queue, while the
`MQGET call takes data from a queue.
`A lookup table 324, similar to a hash table, is used for
`storing key-value mapping. Each entry in the lookup table
`324 contains at least a technology name, a key type, a key
`value, and value list. The value list contains a set of events
that bear the same key value. For the MQSeries™ example,
the key type is based on a combination of Message ID,
Correlation ID, and Message Time. This allows the analyzer
10 to group MQPUT/MQPUT1/MQGET events bearing the
same message ID, correlation ID, and message time, and to
then look up the event in an efficient manner. This is
`particularly useful for deriving a message path relation.
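A minimal Python sketch of such a lookup table, and of deriving message path relations from it, is given below. The names `Event`, `build_lookup_table`, and `message_path_relations`, the key-type string, and the use of a dictionary are assumptions for illustration; the text only states that the table resembles a hash table and which fields form the key.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Event(NamedTuple):
    """Hypothetical, reduced view of a stored MQSeries event."""
    event_id: int
    api: str             # "MQPUT", "MQPUT1" or "MQGET"
    message_id: str
    correlation_id: str
    message_time: str


# (technology name, key type, message ID, correlation ID, message time)
Key = Tuple[str, str, str, str, str]


def build_lookup_table(events: Iterable[Event]) -> Dict[Key, List[Event]]:
    """Group events that bear the same key value into one value list."""
    table: Dict[Key, List[Event]] = defaultdict(list)
    for ev in events:
        key = ("MQSeries", "MsgId+CorrelId+MsgTime",
               ev.message_id, ev.correlation_id, ev.message_time)
        table[key].append(ev)
    return table


def message_path_relations(table: Dict[Key, List[Event]]) -> List[Tuple[int, int]]:
    """Pair each put event with each get event that shares its key."""
    relations: List[Tuple[int, int]] = []
    for bucket in table.values():
        puts = [e for e in bucket if e.api in ("MQPUT", "MQPUT1")]
        gets = [e for e in bucket if e.api == "MQGET"]
        relations.extend((p.event_id, g.event_id) for p in puts for g in gets)
    return relations
```

Because events with the same key land in the same bucket, finding the receive event for a given send event touches only that bucket instead of the whole event store.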
`FIG. 4 illustrates the logic model 718 (see FIG. 7) defined
`for the analyzer 10. Recalling first that event data can be
`divided into a standard and technology specific section, the
`data format for the technology-specific section is different
`for different technologies. The analyzer 10 logic model
`provides a uniform way for exposing the technology specific
`data to different components of the analyzer 10.
`As was indicated previously,
`the technology-specific
`event data section in the data model cove