`
`
US 20080297513A1

(19) United States
(12) Patent Application Publication    (10) Pub. No.: US 2008/0297513 A1
     Greenhill et al.                  (43) Pub. Date: Dec. 4, 2008

(54) METHOD OF ANALYZING DATA
`
(75) Inventors: Stewart Ellis Smith Greenhill, Hilton (AU); Svetha Venkatesh, Winthrop (AU); Peter Leslie Lee, Wattle Park (AU); Geoffrey Alec William West, Kalamunda (AU); Chiou Peng Lam, Karawara (AU)

Correspondence Address:
EDELL, SHAPIRO & FINNAN, LLC
1901 RESEARCH BOULEVARD, SUITE 400
ROCKVILLE, MD 20850 (US)

(73) Assignee: IPOM PTY LTD, Bentley (AU)

(21) Appl. No.: 12/102,502

(22) Filed: Apr. 14, 2008

Related U.S. Application Data

(63) Continuation of application No. PCT/AU2005/001595, filed on Oct. 14, 2005.

Publication Classification

(51) Int. Cl.
     G06T 11/20   (2006.01)
     G09G 5/02    (2006.01)
     G06F 3/048   (2006.01)
(52) U.S. Cl. .......... 345/440; 345/589; 715/771
(57) ABSTRACT

A computer assisted method of analysis suitable for process control comprises the steps of: receiving first data streams representing values from a process; receiving second data streams representing states of the process; recording meta-data about the data streams; calculating relationships between pairs of the data streams; and recording relationship data resulting from the calculating step together with an association between at least one relationship datum and its corresponding meta-data.
`
[Representative drawing (FIG. 1): data 12, comprising process data 14 and event data 16, is imported into the databases; correlation produces the correlation database 26; tag grouping produces the tag group set 28; neural net training produces the SOM model 30; classification produces the state labels 32; the visualizations 34 comprise the correlation matrix view 36, signals view 38, lags view 40, process view 42 and model view 44.]
`
`
`APPLE 1009
`
`
`
`
Patent Application Publication    Dec. 4, 2008    Sheets 1 to 11 of 11    US 2008/0297513 A1

[Drawing sheets 1 to 11, carrying FIGS. 1 to 11 as listed under BRIEF DESCRIPTION OF THE DRAWINGS. The drawings themselves are not recoverable from this text.]
`
`
`METHOD OF ANALYZING DATA
`
CROSS REFERENCE TO RELATED APPLICATIONS

[0001] This application is a continuation of International Application No. PCT/AU2005/001595, filed on Oct. 14, 2005, entitled "Method of Analysing Data," which claims priority under 35 U.S.C. § 119 to Application No. AU 2004905955 filed on Oct. 15, 2004, entitled "Method of Analysing Data," the entire contents of which are hereby incorporated by reference.
`
`FIELD OF THE INVENTION
`
[0002] This invention concerns a computer assisted method of analysis suitable for process control. In further aspects the invention concerns a computer system for performing the method and computer software for performing the method. The invention has particular utility in the control of industrial processes.
`
`BACKGROUND
`
[0003] Industrial processes involve large and complex systems. Typically, an industrial process involves many thousands of variables which are controlled in part by automatic processes, and in part by human operators. In the operation of these processes large amounts of information are collected by process control and monitoring systems.
[0004] Most tools currently available for process analysis are complex mathematical analysis tools that are general in nature, require an understanding of their language, and are expensive and time consuming to use. Tools such as Matlab, Excel, or Mathcad are routinely used in process engineering environments. However, they require that the data all be stored in memory, limiting the complexity of the problems that can be analyzed or visualized.
`
`SUMMARY
`
[0005] The invention is a computer assisted method of analysis suitable for process control, comprising the steps of:
[0006] receiving first data streams representing values from a process;
[0007] receiving second data streams representing states of the process;
[0008] recording metadata about the data streams;
[0009] calculating relationships between pairs of the data streams; and
[0010] recording relationship data resulting from the calculating step together with an association between at least one relationship datum and its corresponding meta-data.
[0011] By recording relationship data between the data streams together with corresponding metadata, the process engineer is able to gain insight about the process and its control in relation to aspects of the process described by the metadata.
`
[0012] The data streams may be continuous streams, or they may be discontinuous, discontiguous or even a succession of blocks of data.

[0013] The values of the first data streams may be measurements from the process. The values of the first data streams may be sampled over time. The states of the second data streams may be events or conditions in the process.

[0014] There may be one or more third data streams representing statistics calculated from the first or second data streams, or both.

[0015] The metadata may concern the origins of the data streams; for instance, it may comprise tags that identify the location of origin of each respective data stream. The association may link each datum to its respective locations of origin. There may be more than one location depending on the origins of the data streams. The meta-data may include flow charts or plant diagrams. The chart or diagram may display the value of each datum at the location of its source.
`
[0016] The calculating step may involve calculating correlations of the data streams. The calculating step may involve calculating, for a range of different time lags, autocorrelations of the data streams. Alternatively, or in addition, the calculating step may involve calculating, for a range of different time lags, cross-correlations of pairs of data streams.
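As an illustration only, the lagged correlation described in paragraph [0016] might be sketched as follows. The use of the Pearson coefficient, the function names and the convention that a positive lag shifts the second stream later in time are all assumptions; the specification does not name a particular correlation measure.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    if n < 2:
        return 0.0
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def lagged_correlation(xs, ys, lag):
    """Correlate xs[t] with ys[t + lag]; a negative lag shifts ys the other way."""
    if lag >= 0:
        a, b = xs[:len(xs) - lag], ys[lag:]
    else:
        a, b = xs[-lag:], ys[:len(ys) + lag]
    m = min(len(a), len(b))
    return pearson(a[:m], b[:m])

def correlation_by_lag(xs, ys, max_lag):
    """Map each lag in [-max_lag, max_lag] to its correlation value."""
    return {lag: lagged_correlation(xs, ys, lag)
            for lag in range(-max_lag, max_lag + 1)}

def best_lag(xs, ys, max_lag):
    """Lag with the largest absolute correlation."""
    table = correlation_by_lag(xs, ys, max_lag)
    return max(table, key=lambda lag: abs(table[lag]))
```

Passing the same stream as both arguments, as in `correlation_by_lag(xs, xs, max_lag)`, gives the autocorrelation over the same range of lags.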
[0017] Sub-sets may be created within the relationship data, and each sub-set may comprise data having a value within the same predetermined range of values. For instance, each sub-set may comprise data having a correlation value within the same predetermined range of values. Where the metadata involves tags that label the locations of origins, a sub-set is designated a 'tag group'.
[0018] The predetermined range of values is a user selectable parameter, so for instance the user may select a sub-group, or tag group, made up of data streams that are correlated to better than 90%. The degree of correlation may be changed by the user and this may automatically flow through to a change in the composition of the group. A similar result may automatically be achieved when making other changes, such as changing the amount of lag in correlation.
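A minimal sketch of selecting a tag group by a user-selectable threshold is given below. It assumes the relationship data is held as a mapping from tag pairs to correlation values, and that "correlated to better than 90%" is judged on the magnitude of the correlation; both are assumptions made for illustration, and the tag names in the usage note are hypothetical.

```python
def tag_group(correlations, selected, threshold=0.9):
    """Tags whose correlation with `selected` is at least `threshold` in magnitude.

    `correlations` maps (tag_a, tag_b) pairs to a correlation in [-1, 1].
    The selected tag is always a member of its own group.
    """
    group = {selected}
    for (a, b), r in correlations.items():
        if abs(r) >= threshold:
            if a == selected:
                group.add(b)
            elif b == selected:
                group.add(a)
    return sorted(group)
```

With `{("TI101", "TI102"): 0.95, ("TI101", "FI200"): 0.5}`, the call `tag_group(c, "TI101")` returns the first two tags only; lowering `threshold` to 0.4 changes the composition of the group without recomputing any correlations.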
[0019] As time passes and more data is received, the calculating step may be performed again to update the relationship data. The step may even be performed repeatedly in real time.
[0020] The relationship data may be displayed in a first form as a matrix with a single datum in each cell of the matrix. The relationship data calculated for each data stream will appear in both a row and a column of the matrix. The matrix may be convertible directly to raster.
[0021] The rows and columns may be grouped according to the value of the relationship data; in other words, the tag groups may automatically be collected together.
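One way to picture the matrix display of paragraphs [0020] and [0021] is to map each cell to one pixel. The sketch below assumes a grayscale mapping in which brightness follows the magnitude of the correlation, and writes the binary PGM image format; the mapping, the output format and the function names are illustrative choices, not taken from the specification.

```python
def reorder(matrix, order):
    """Permute rows and columns together so that members of a group sit side by side."""
    return [[matrix[i][j] for j in order] for i in order]

def matrix_to_pgm(matrix):
    """Render a square correlation matrix (values in [-1, 1]) as a binary PGM image."""
    size = len(matrix)
    pixels = bytearray()
    for row in matrix:
        for r in row:
            clamped = max(-1.0, min(1.0, r))
            # One pixel per cell: a stronger correlation gives a brighter pixel.
            pixels.append(int(abs(clamped) * 255 + 0.5))
    header = f"P5\n{size} {size}\n255\n".encode("ascii")
    return header + bytes(pixels)
```

Because every cell becomes exactly one pixel, a matrix of several hundred variables remains a small image, and reordering by group turns each tag group into a bright square block on the diagonal.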
[0022] The relationship data may be displayed in a second form as a diagram of metadata having locations marked according to their corresponding relationship datum. The location of the source of each data stream may be indicated in the diagram of metadata.
[0023] The relationship data may be displayed in a third form as a list.

[0024] The data streams may also be displayed in the form of time-series data.

[0025] Historical values of the relationship data or data streams may be displayed.
[0026] Correlations between a pair of data streams may be displayed as a function of lagged time.
[0027] Coding may be used to identify different sub-sets in the display, and this coding may survive when a different view is selected so that a tag group highlighted in one view is still highlighted when the view is changed. The coding may be color coding or shading. A user may be able to select a sub-set by:
[0028] clicking on a cell in the matrix;
[0029] clicking on a marked location in the meta-data diagram; or,
[0030] clicking on a datum in the list.
[0031] A neural network may be trained to model the state space of the process.
[0032] In another aspect the invention is a computer system for performing the method.
[0033] A further aspect of the invention is computer software for performing the method.
[0034] In the claims of this application and in the description of the invention, except where the context requires otherwise due to express language or necessary implication, the words "comprise" or variations such as "comprises" or "comprising" are used in an inclusive sense, i.e. to specify the presence of the stated features but not to preclude the presence or addition of further features in various embodiments of the invention.
`
BRIEF DESCRIPTION OF THE DRAWINGS

[0035] In order to provide a better understanding of the present invention, preferred embodiments will be described below, by way of example only, with reference to the accompanying drawings, in which:
[0036] FIG. 1 is a schematic view of information flow between parts of an embodiment of the present invention.
[0037] FIG. 2 is a large scale visualization of a cross-correlation matrix (717x717 variables).
[0038] FIG. 3 is a small scale visualization of the cross-correlation matrix of FIG. 2 (approx 40x40 variables).
[0039] FIG. 4 is a process view showing tag grouping. The selected tag is displayed as a filled square. The related tags are displayed as filled circles.
[0040] FIG. 5 is a process view showing tag similarity. The selected tag is displayed as a filled square. Other tags are displayed as filled circles, with the shading indicating the degree of correlation according to the currently defined shading mapping.
[0041] FIG. 6 is a signal view showing changes over time for process variables and alarms in a tag group.
[0042] FIG. 7 is a signal view showing signal amplitude using shading rather than plotting on the vertical axis. This is useful for visually identifying patterns in large sets of tags.
[0043] FIG. 8 is a signal view showing a small set of variables with scale information.
[0044] FIG. 9 is a signal view showing all alarm events over a two month period.
[0045] FIG. 10 is a lags view showing cross-correlation between a pair of variables as a function of time.
[0046] FIG. 11 is a state space view labeled according to key performance indicators.
`
DETAILED DESCRIPTION

[0047] The embodiment described here is used as a Process Data Management System (PDMS), which deals with data from industrial processes. It will be appreciated that the present invention may be used to analyze data from other sources.

[0048] Due to the amount of data produced by a typical industrial process, and the speed at which it must be handled, specialized data structures have been developed to represent this information. An industrial process is intended to mean a non-trivial process in which one or more raw materials are converted into a product. Typically some of the variables in the process may be controlled, such as for example temperature, pressure, flow rate, or amount of a raw material. Some of the variables may not be able to be controlled, such as for example ambient temperature, or purity of a raw material. Some examples of industrial processes include an ore refining process, a production line process, a mining process and a construction process. These lists are exemplary and are not intended to be limiting.
[0049] FIG. 1 shows a schematic overview of a process of producing visualizations from imported data according to an embodiment of the present invention. As will be described below, the visualizations allow the data from the process to be analyzed to gain an understanding of the process or characteristics of the process. Data 12 is provided from a number of sources. The data 12 is divided into process data 14 and event data 16.
`
[0050] Process data 14 is regularly-sampled time-series data collected from sensors in the process. The characteristic being measured by a sensor is referred to as a variable, and the value(s) of the variable at a given moment in time form an element of data. Typically, the signals are sampled continuously, with averages being recorded every minute. For a process with 1000 variables, this equates to approximately 1.5 million data elements per day. Occasionally, there are problems with sensors, or with the collection of data from the process historian. This means that data may not be available continuously, and may have "holes". Process data 14 is obtained from an Excel spreadsheet, a text file, an OPC-HDA server or an SQL database. (OPC stands for "OLE for Process Control".) OLE is a Microsoft protocol for communicating between application processes. OPC is a set of communication protocols used by the process industry, based on OLE communication mechanisms. OPC protocols include OPC-DA (OPC Data Access), for real-time access to the values of process variables, and OPC-HDA (OPC Historical Data Access).
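The figure of roughly 1.5 million data elements per day follows from one recorded average per variable per minute: 1000 variables times 1440 minutes is 1,440,000 elements. The short sketch below reproduces that arithmetic; the helper names are hypothetical, and the 4-byte sample size is the per-sample float size mentioned later in the description.

```python
MINUTES_PER_DAY = 24 * 60

def elements_per_day(variables, samples_per_minute=1):
    """Number of data elements recorded per day across all variables."""
    return variables * samples_per_minute * MINUTES_PER_DAY

def bytes_per_day(variables, bytes_per_sample=4):
    """Raw storage per day if each element is stored as one fixed-size value."""
    return elements_per_day(variables) * bytes_per_sample
```

For 1000 variables this gives 1,440,000 elements and 5,760,000 bytes per day before any of the savings from quantization or run-length encoding.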
[0051] Event data 16 is irregular data generated to describe events or exceptional conditions. An example of event data is an alarm which is triggered when a certain condition or conditions is/are met. Event data 16 may be obtained from an SQL database or text file.
[0052] The process will usually have process meta-data. The meta-data is data about the process, rather than data collected by operation of the process. It may include descriptions of the structure of the process (for example plant drawings) and the meaning of process variables etc.
[0053] The process data 14 and event data 16 are collected into databases 18. The databases 18 include a process database 20, an event database 22 and a meta-data database 24. These databases 18 are used to produce dependent databases.
[0054] Correlation techniques are applied to the process data 14 in the process database 20 and event data 16 in the event database 22 to find similarities between variables. The resulting correlation data is saved in a correlation database 26.
[0055] The correlation database 26 can then be used to tag variables that are similar to one another. Such similar variables are stored in a tag group set 28.
[0056] The process data 14 in the process database 20 and event data 16 in the event database 22 may also be used to train a neural network to generate a model of the process. In this example a self organizing map (SOM) model 30 is generated. The SOM model can be used to classify the state of the process and to produce state labels 32.
`
`
[0057] The resulting information can then be used to visualize various aspects of the process. Visualizations 34 can be produced from this information to determine different aspects about the process. The visualizations 34 are useful to show a user, such as a process engineer, what the process is actually doing, as opposed to what the process ought to be doing. The visualizations 34 aim to improve the insight of the engineer into the workings of the process. The visualizations can reveal unexpected relationships, confirm that relationships that were thought to exist do in fact exist, and reveal relationships that should have been obvious as a logical consequence of the process design but for which the engineer may not have made the required deductive link.
[0058] Examples of the visualizations 34 include: a correlation matrix view 36, which uses information from the correlation database 26 and the tag group set 28; a signals view 38, which uses information from the tag group set 28, the process database 20 and the event database 22; a lags view 40, which uses information from the correlation database 26 and the process database 20; a process view 42, which uses information from the tag group set 28, the correlation database 26 and the process meta-data 24; and a model view 44, which may also be visualized as will be described further below. Other visualizations are possible.
`
`Data
`
[0059] The process data 14 is imported and stored in the process database 20. The process database 20 holds the process data 14 as a set of values over time for each of the variables in the process. It is important that process data 14 be represented in a way that is both compact and efficient to access. For rapid visualization, it is important to be able to quickly retrieve samples based on a given time range. While general purpose databases are useful in many applications, they impose an additional layer of software and processing between the application and its data. In the PDMS, this may not be acceptable because of the required speed at which information must be processed. Therefore, specialized representations may be used that use domain information to improve speed and reduce the size of the stored data.
[0060] Each process variable may define a series of components to its value over time. For example, each sample may have the following components:
[0061] Time (32-bit integer).
[0062] Duration (32-bit integer). Together with start time, this indicates the time interval over which the sample is valid.
[0063] Value (32-bit float).
[0064] Range (2x32-bit floats). For samples that have been derived from a number of other samples, the system optionally stores a maximum and minimum in addition to the value. This allows (for example) a visualization of a decimated time series to display the full range of the signal for each sample.
[0065] Extra Attributes (8- or 32-bit integer). Each sample may be tagged with one or more additional Boolean or integer attributes packed into integer bit-fields. The main system-defined attribute is Quality, which is defined for data imported from OPC-HDA data sources. Other tags may be defined by the user, and applied on a per-sample basis to stored data.
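The full set of sample components listed above could be laid out as a fixed-size binary record. The sketch below is one possible layout using Python's `struct` module; the field order, the little-endian byte order, the 32-bit attribute word and the choice of bit 0 for Quality are assumptions made for illustration only.

```python
import struct

# time, duration (unsigned 32-bit), value, range min, range max (32-bit floats),
# attribute bit-field (unsigned 32-bit): 24 bytes per fully specified sample.
SAMPLE = struct.Struct("<IIfffI")
QUALITY_GOOD = 0x01  # hypothetical bit assignment for the Quality attribute

def pack_sample(time, duration, value, lo=None, hi=None, attrs=0):
    """Pack one sample; a missing range defaults to the value itself."""
    lo = value if lo is None else lo
    hi = value if hi is None else hi
    return SAMPLE.pack(time, duration, value, lo, hi, attrs)

def unpack_sample(raw):
    """Decode a packed sample into a dictionary of its components."""
    time, duration, value, lo, hi, attrs = SAMPLE.unpack(raw)
    return {"time": time, "duration": duration, "value": value,
            "range": (lo, hi), "good": bool(attrs & QUALITY_GOOD)}
```

At 24 bytes per sample this is the uncompressed upper bound; it is the starting point against which the storage reductions for periodic, averaged and attribute-free data can be compared.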
[0066] There is usually a certain amount of redundancy in the process data 14, which means that not all of the components need to be stored. The PDMS can use information about this redundancy to reduce the size of the stored data, and improve retrieval time.
`
[0067] Time: Most data is periodic, so a stream can be represented as a sequence of periodic regions. Each region is defined by a start time, sampling period (duration), and a number of evenly spaced, contiguous samples. Time and duration are not explicitly stored for each sample, but are calculated from the region header. Provided the number of holes (i.e. breaks in the periodicity) is small, this representation roughly halves the storage per sample.
[0068] Range: Most data that has been imported from a Distributed Control System ("DCS") is averaged, but does not define the range of the original values. For this data, the range is not stored but is defined to be equivalent to the value.
[0069] Attributes: If a quality measure is not available and no user-defined attributes are defined then there are no additional attributes to be stored, and this field is omitted in the data. If quality is defined, the user may choose to filter out "bad" values in pre-processing, in which case all samples in the time-series are implicitly "good" and again, the attribute field is omitted.
[0070] Quantization: With the above considerations, most time-series data can be represented using a 4-byte float data type per sample. If less than 32 bits of precision is required, it is possible to quantize the data using a per-stream scale and offset factor to map between 32-bit floats and 8- or 16-bit integers. Repeats: When consecutive periodic samples have the same values for the attributes that are defined (i.e. value, range, and extra), a run-length encoding is used. Values are stored just once along with a repeat count.
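The quantization and repeat-count steps can be sketched together. The example below assumes the per-stream scale and offset are derived from the minimum and maximum of the stream, which the description does not specify, and uses hypothetical function names.

```python
from itertools import groupby

def quantize(values, bits=8):
    """Map floats to unsigned integers using a per-stream scale and offset."""
    levels = (1 << bits) - 1
    offset = min(values)
    span = max(values) - offset
    scale = span / levels if span else 1.0
    codes = [round((v - offset) / scale) for v in values]
    return codes, scale, offset

def dequantize(codes, scale, offset):
    """Recover approximate floats; the error is at most half a quantization step."""
    return [c * scale + offset for c in codes]

def run_length_encode(codes):
    """Collapse consecutive repeats into (value, repeat_count) pairs."""
    return [(value, len(list(run))) for value, run in groupby(codes)]

def run_length_decode(pairs):
    """Expand (value, repeat_count) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]
```

Quantization is lossy by at most half a step of `scale`, whereas the run-length stage is exactly reversible, so the two can be applied independently depending on the precision a stream requires.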
[0071] For periodic data, samples can be rapidly located using a computable offset from the start of each region. For aperiodic data, a binary search allows a given sample to be located in O(log(N)) time, for N samples.
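Both lookup strategies can be illustrated briefly. In this sketch the `Region` class and the function names are hypothetical; a region stores only its header (start time and period) plus the sample values, and a time that falls between regions is reported as a hole.

```python
from bisect import bisect_right

class Region:
    """A run of evenly spaced samples described only by its header."""
    def __init__(self, start, period, values):
        self.start, self.period, self.values = start, period, values

    def sample_at(self, t):
        """Value whose interval [time, time + period) contains t, or None in a hole."""
        index = (t - self.start) // self.period  # computable offset, no search
        if 0 <= index < len(self.values):
            return self.values[index]
        return None

def lookup(regions, t):
    """Pick the region by binary search on start times, then index by offset.

    The list of start times is rebuilt here for brevity; a real store would
    keep it alongside the regions so that each lookup stays logarithmic.
    """
    starts = [r.start for r in regions]
    i = bisect_right(starts, t) - 1
    return regions[i].sample_at(t) if i >= 0 else None

def lookup_aperiodic(times, values, t):
    """Latest sample at or before t in an irregular, time-ordered stream."""
    i = bisect_right(times, t) - 1
    return values[i] if i >= 0 else None
```

Within a region the index is pure arithmetic, so the cost of a lookup depends on the number of regions (that is, the number of holes) rather than on the number of samples.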
[0072] When process data 14 is imported into the process database 20, certain statistics of the data 14 are calculated and stored in the process database 20 with the data stream. These include: mean, standard deviation, various central moments (skewness, kurtosis), maximum, minimum, and frequency distribution (represented as a histogram using a pre-set number of frequency bins). This information is used during visualization to provide an appropriate scaling for display. The frequency distribution is also used for display, and for certain types of normalization.
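A minimal sketch of the import-time statistics follows. It assumes population (divide-by-N) moments and equal-width histogram bins spanning the minimum to the maximum; the description does not state which conventions the PDMS uses, and the function name is hypothetical.

```python
import math

def stream_statistics(values, bins=10):
    """Summary statistics and an equal-width histogram for one data stream."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4 (population form).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    std = math.sqrt(m2)
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid a zero bin width for constant data
    histogram = [0] * bins
    for v in values:
        # The maximum value is folded into the last bin.
        histogram[min(int((v - lo) / width), bins - 1)] += 1
    return {
        "mean": mean, "std": std,
        "skewness": m3 / std ** 3 if std else 0.0,
        "kurtosis": m4 / std ** 4 if std else 0.0,
        "min": lo, "max": hi, "histogram": histogram,
    }
```

Storing these values with the stream means a display can choose its vertical scale from `min`, `max` or the histogram without rescanning the samples.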
[0073] Compression of the process database 20 is not preferred. Many well-known techniques of compression exist, including boxcar, backward slope, and straight line interpolation methods. These techniques are lossy (i.e. they discard information) so the reconstructed data may be inaccurate in ways that could be statistically significant. However, it is anticipated that some versions of the PDMS may incorporate data compression as an option.
[0074] A facility to decimate time-series data (i.e. to reduce the sampling rate) after filtering out high frequency components may be included. In doing so, it preserves the range information in the resulting data stream because this is an important indicator of variability. This makes it possible to pre-compute a representation of each signal at a number of pre-defined time scales (e.g. 1 minute, 10 minutes, 1 hour, 1 day). This technique (similar to "MIP maps" in 3D graphics) can be used to further accelerate the display of data over long time-scales.
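The idea of decimating while keeping the range can be sketched as below. Block averaging stands in for the low-pass filter, which is an assumption; the description does not say which filter is used. The function names and the default factors (1 minute to 10 minutes to 1 hour to 1 day) are illustrative.

```python
def decimate(samples, factor):
    """Reduce the sampling rate by `factor`, keeping (mean, min, max) per block.

    `samples` is a list of (value, lo, hi) triples; raw data uses lo == hi == value.
    """
    out = []
    for i in range(0, len(samples), factor):
        block = samples[i:i + factor]
        mean = sum(v for v, _, _ in block) / len(block)
        lo = min(lo for _, lo, _ in block)
        hi = max(hi for _, _, hi in block)
        out.append((mean, lo, hi))
    return out

def build_pyramid(values, factors=(10, 6, 24)):
    """Precompute successively coarser levels from one-minute values."""
    levels = [[(v, v, v) for v in values]]
    for factor in factors:
        levels.append(decimate(levels[-1], factor))
    return levels
```

Because each level carries the minimum and maximum forward, a one-day sample still shows the full excursion of the signal even though its value is an average; a display can then pick the coarsest level that matches its pixel resolution.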
`
[0075] The PDMS includes utilities for importing process data from a number of sources:
[0076] Spreadsheet files.
[0077] Text files.
[0078] Databases.
[0079] OPC-HDA servers.
[0080] Spreadsheet files are typically encoded using Microsoft Excel data formats. Many tools shipped with DCS or process historians allow data to be exported in this format. However, there are many limitations on what data can be represented in spreadsheets. Typically, worksheets can have at most 255 columns and 65535 rows. To overcome these limitations, the import system allows process data to be distributed across multiple directories, spreadsheets, and worksheets. An import "wizard" may be used to allow the user to specify what data to import, and how the different sample attributes and meta-data attributes are encoded.
`
[0081] OPC-HDA is a Distributed Component Object Model ("DCOM") based protocol for importing historical data from process historians. DCOM is a Microsoft protocol for communicating between application programs that may be running on different machines. Typically, a process historian (e.g. Pi) collects data in real-time from a DCS system and stores it in a specialized database, usually with the aid of various compression techniques. The OPC-HDA protocols allow clients to retrieve the stored data. This includes:
[0082] Time
[0083] Value
[0084] Quality
[0085] Process data 14 may be imported directly from OPC-HDA servers.
[0086] One problem with certain import methods is that process meta-data is not available. For example, OPC-HDA servers often do not support tag browsing. Therefore, a mechanism to separately import meta-data from text files (in CSV format) may be implemented.
[0087] Events 16 are conditions with well defined time and duration. Events are usually related to alarm conditions. Change in alarm state is described by several types of events. Alarm events indicate the time at which an alarm started. Return events indicate when the alarm stopped. Other events indicate how the operators respond to the alarms, for example Enable, Disable, and Acknowledge. Other kinds of operator actions may also be recorded, for example changes to operating set points and operating modes.
[0088] Typically, event streams are used for visualization or alarm analysis. However, for visualization it is important that the event data be efficiently accessible, so the visualization tools generally require a fast binary representation to be used.
`
[0089] The Event Database 22 is a stream of events 16 defined for a number of event variables. In this context, an event variable corresponds to a state of a DCS tag. Events are defined by the following attributes:
[0090] Time.
[0091] Tag.
[0092] Event Type (alarm, return, acknowledge, operator action).
[0093] Subtype (HI, HIHI, etc).
[0094] Priority (high, low, emergency, diagnostic, etc).

[0095] Events are stored in a compact binary representation. Times are strictly ordered, so that the closest event to a given time can be located in O(log(N)) time, where N is the number of events. Most attributes are of enumerated types (tag, event type, subtype, and priority) and are represented using small integers (8- or 16-bits). Small look-up tables are used to map these integers to/from string tags. This also ensures that event records have a fixed size, which makes indexing simpler. Each event record also contains a pointer to the next and previous event of the same type, so it is quite efficient to enumerate all of the events of a given type, or to find (for example) the next return event corresponding to a given alarm event.
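The fixed-size event record can be sketched with Python's `struct` module. Everything concrete here is an assumption for illustration: the 18-byte layout, the use of record indices as the next/previous "pointers", the contents of the look-up tables, and the function names.

```python
import struct

# time (u32), tag (u16), type, subtype, priority (u8 each), one pad byte,
# next and previous record index of the same event type (i32, -1 for none).
EVENT = struct.Struct("<IHBBBxii")
EVENT_TYPES = ["alarm", "return", "acknowledge", "operator action"]
SUBTYPES = ["HI", "HIHI", "LO", "LOLO"]          # hypothetical table
PRIORITIES = ["high", "low", "emergency", "diagnostic"]

def pack_events(events, tags):
    """events: time-ordered (time, tag, type, subtype, priority) string tuples."""
    last_of_type = {}
    rows = []
    for i, (time, tag, etype, subtype, priority) in enumerate(events):
        code = EVENT_TYPES.index(etype)
        prev = last_of_type.get(code, -1)
        if prev >= 0:
            rows[prev][5] = i  # link the previous event of this type forward
        rows.append([time, tags.index(tag), code, SUBTYPES.index(subtype),
                     PRIORITIES.index(priority), -1, prev])
        last_of_type[code] = i
    return b"".join(EVENT.pack(*row) for row in rows)

def read_event(buf, i):
    """Decode record i; fixed-size records make this a direct offset."""
    return EVENT.unpack_from(buf, i * EVENT.size)

def first_at_or_after(buf, t):
    """Binary search over the time-ordered records; returns a record index."""
    lo, hi = 0, len(buf) // EVENT.size
    while lo < hi:
        mid = (lo + hi) // 2
        if read_event(buf, mid)[0] < t:
            lo = mid + 1
        else:
            hi = mid
    return lo

def events_of_type(buf, start):
    """Follow next-links from record `start` through later events of its type."""
    i = start
    while i >= 0:
        yield i
        i = read_event(buf, i)[5]
```

Finding the closest event to a given time then amounts to comparing the record at the returned index with its predecessor, and enumerating one event type never touches records of other types.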
[0096] Event streams may originate from a number of sources:

[0097] Event logs (e.g. text printed by a DCS).
[0098] Event databases, stored in database tables or spreadsheets.
[0099] Normally, events are generated by the DCS, and are logged in an external system. This may be an external process historian, or a customized system like an IMAC logger.
[0100] The PDMS imports event streams from text streams, or from databases. For database import, the user specifies which columns of the input correspond to the event attributes listed above. The user can also define specific mappings between the values of these fields and the resulting enumeration value (e.g. there may be more than one string used to represent an event type, or sub-type). This allows the conversion and the event model to be customized for a particular site.
[0101] Process meta-data 24 is information about the process, as distinct from information collected from the process. This includes:
[0102] Descriptions of the variables and events in a process. This information is used in the analysis and visualization of data. It includes the DCS name, description, measurement units, and any other information about the measurement (e.g. sensor type, precision, etc).
[0103] Descriptions of the relationships between the variables. For example, a measurement point may be associated with more than one process variable. A variable that is controlled automatically may have, in addition to its value, a set-point and a controller output.
[0104] Descriptions of the structure of the process. Normally, a process is logically divided into separate units. This defines specific physical and functional relationships between variables.
[0105] Drawings of the process structure. This includes process and instrumentation drawings (P&ID).
[0106] Meta-data is used for visualization, and during analysis to select variables based on criteria that are meaningful in the domain.
[0107] Several types of meta-data may be represented within PDMS. Each stream of process data is associated with the following attributes:
[0108] Tag Name
[0109] Description
[0110] Units
[0111] Precomputed statistics and frequency distribution.

[0112] This information is stored in the process meta-data database 24.
`
[0113] Certain types of visualization in the PDMS make use of process drawings. The drawings are stored as image files (e.g. using GIF format). These files can be produced by exporting the data from a CAD system, or by scanning printed drawings. They can be annotated by the user to indicate the position of important process variables. The annotation is stored using an XML data format. The process database may include a drawing database comprising multiple drawings, each with an associated image and XML annotation.
[0114] Most existing tools require that data be memory resident. That is, they assume they can hold all the relevant data in memory. This limits the quantity of data that can be analyzed. The PDMS uses data structures that are usually stored on disk, and hence do not rely upon the availability of adequate computer memory. The PDMS can deal with large data vectors collected over long time intervals. This leads to datasets that are very large, and can exceed the available memory in any typical high end computer. Indexing methods are included that allow fast retrieval of data from disk and fast manipulation in memory. Recursive decomposition of data to optimize data for the time-scale of interest avoids using sub-second data for a year's analysis but also avoids data loss that is common in process data c