(19) United States
(12) Patent Application Publication    (10) Pub. No.: US 2008/0297513 A1
Greenhill et al.    (43) Pub. Date: Dec. 4, 2008
`
`(54) METHOD OF ANALYZING DATA
`
(75) Inventors: Stewart Ellis Smith Greenhill, Hilton (AU); Svetha Venkatesh, Winthrop (AU); Peter Leslie Lee, Wattle Park (AU); Geoffrey Alec William West, Kalamunda (AU); Chiou Peng Lam, Karawara (AU)

Correspondence Address:
EDELL, SHAPIRO & FINNAN, LLC
1901 RESEARCH BOULEVARD, SUITE 400
ROCKVILLE, MD 20850 (US)

(73) Assignee: IPOM PTY LTD, Bentley (AU)

(21) Appl. No.: 12/102,502

(22) Filed: Apr. 14, 2008

Related U.S. Application Data

(63) Continuation of application No. PCT/AU2005/001595, filed on Oct. 14, 2005.
Publication Classification

(51) Int. Cl.
G06T 11/20 (2006.01)
G09G 5/02 (2006.01)
G06F 3/048 (2006.01)
(52) U.S. Cl. 345/440; 345/589; 715/771
(57) ABSTRACT

A computer assisted method of analysis suitable for process control comprises the steps of: receiving first data streams representing values from a process; receiving second data streams representing states of the process; recording meta-data about the data streams; calculating relationships between pairs of the data streams; and recording relationship data resulting from the calculating step together with an association between at least one relationship datum and its corresponding meta-data.

[Front-page drawing (FIG. 1): visualizations 34, comprising a correlation matrix view 36, signals view 38, lags view 40, process view 42 and model view 44; other labels include correlation, classification and state labels]

`APPLE 1009

[Front-page drawing (FIG. 1), continued: process data 14 (Excel, text, OPC and SQL DB sources) and event data 16 (SQL DB and text sources); correlation database 26; tag grouping producing a tag group set; neural net training producing a SOM model]

Patent Application Publication    Dec. 4, 2008 Sheet 1 of 11    US 2008/0297513 A1

[FIG. 1: schematic view of information flow between parts of an embodiment of the invention]
[Sheet 2 of 11, FIG. 2: large scale visualization of a cross-correlation matrix (717x717 variables)]
[Sheet 3 of 11, FIG. 3: small scale visualization of the cross-correlation matrix of FIG. 2 (approx. 40x40 variables)]
[Sheet 4 of 11, FIG. 4: process view showing tag grouping]
[Sheet 5 of 11, FIG. 5: process view showing tag similarity]
[Sheet 6 of 11, FIG. 6: signal view of process variables and alarms in a tag group]
[Sheet 7 of 11, FIG. 7: signal view showing signal amplitude by shading]
[Sheet 8 of 11, FIG. 8: signal view of a small set of variables with scale information]
[Sheet 9 of 11, FIG. 9: signal view of all alarm events over a two month period]
[Sheet 10 of 11, FIG. 10: lags view of cross-correlation between a pair of variables]
[Sheet 11 of 11, FIG. 11: state space view labeled according to key performance indicators]

`
US 2008/0297513 A1
`
`Dec. 4, 2008
`
`METHOD OF ANALYZING DATA
`
CROSS REFERENCE TO RELATED APPLICATIONS

[0001] This application is a continuation of International Application No. PCT/AU2005/001595, filed on Oct. 14, 2005, entitled “Method of Analysing Data,” which claims priority under 35 U.S.C. § 119 to Application No. AU 2004905955 filed on Oct. 15, 2004, entitled “Method of Analysing Data,” the entire contents of which are hereby incorporated by reference.

FIELD OF THE INVENTION

[0002] This invention concerns a computer assisted method of analysis suitable for process control. In further aspects the invention concerns a computer system for performing the method and computer software for performing the method. The invention has particular utility in the control of industrial processes.

BACKGROUND

[0003] Industrial processes involve large and complex systems. Typically, an industrial process involves many thousands of variables which are controlled in part by automatic processes, and in part by human operators. In the operation of these processes large amounts of information are collected by process control and monitoring systems.

[0004] Most tools currently available for process analysis are complex mathematical analysis tools that are general in nature, require an understanding of their language, and are expensive and time consuming to use. Tools such as Matlab, Excel, or Mathcad are routinely used in process engineering environments. However, they require that the data all be stored in memory, limiting the complexity of the problems that can be analyzed or visualized.

SUMMARY

[0005] The invention is a computer assisted method of analysis suitable for process control, comprising the steps of:
[0006] receiving first data streams representing values from a process;
[0007] receiving second data streams representing states of the process;
[0008] recording metadata about the data streams;
[0009] calculating relationships between pairs of the data streams; and
[0010] recording relationship data resulting from the calculating step together with an association between at least one relationship datum and its corresponding meta-data.

[0011] By recording relationship data between the data streams together with corresponding metadata, the process engineer is able to gain insight about the process and its control in relation to aspects of the process described by the metadata.

[0012] The data streams may be continuous streams, or they may be discontinuous, discontiguous or even a succession of blocks of data.

[0013] The values of the first data streams may be measurements from the process. The values of the first data streams may be sampled over time. The states of the second data streams may be events or conditions in the process.

[0014] There may be one or more third data streams representing statistics calculated from the first or second data streams, or both.

[0015] The metadata may concern the origins of the data streams; for instance, it may comprise tags that identify the location of origin of each respective data stream. The association may link each datum to its respective locations of origin. There may be more than one location depending on the origins of the data streams. The meta-data may include flow charts or plant diagrams. The chart or diagram may display the value of each datum at the location of its source.

[0016] The calculating step may involve calculating correlations of the data streams. The calculating step may involve calculating, for a range of different time lags, autocorrelations of the data streams. Alternatively, or in addition, the calculating step may involve calculating, for a range of different time lags, cross-correlations of pairs of data streams.

[0017] Sub-sets may be created within the relationship data, and each sub-set may comprise data having a value within the same predetermined range of values. For instance, each sub-set may comprise data having a correlation value within the same predetermined range of values. Where the metadata involves tags that label the locations of origin, a sub-set is designated a ‘tag group’.

[0018] The predetermined range of values is a user selectable parameter, so for instance the user may select a sub-group, or tag group, made up of data streams that are correlated to better than 90%. The degree of correlation may be changed by the user and this may automatically flow through to a change in the composition of the group. A similar result may automatically be achieved when making other changes, such as changing the amount of lag in correlation.

[0019] As time passes and more data is received, the calculating step may be performed again to update the relationship data. The step may even be performed repeatedly in real time.

[0020] The relationship data may be displayed in a first form as a matrix with a single datum in each cell of the matrix. The relationship data calculated for each data stream will appear in both a row and a column of the matrix. The matrix may be convertible directly to raster.

[0021] The rows and columns may be grouped according to the value of the relationship data; in other words, the tag groups may automatically be collected together.

[0022] The relationship data may be displayed in a second form as a diagram of metadata having locations marked according to their corresponding relationship datum. The location of the source of each data stream may be indicated in the diagram of metadata.

[0023] The relationship data may be displayed in a third form as a list.
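
The correlation and grouping steps summarized in [0016] to [0018] can be illustrated with a short Python sketch. It is not the PDMS implementation: it assumes equal-length, gap-free, evenly sampled streams held in memory, uses the Pearson correlation, and takes as each pair's relationship datum the correlation of largest magnitude over a window of lags. The function names and the tag names (`FT101.PV` and so on) are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(x, y, lag):
    """Correlate x[t] with y[t + lag]; a negative lag shifts the pairing the other way."""
    if lag >= 0:
        return pearson(x[: len(x) - lag], y[lag:])
    return pearson(x[-lag:], y[: len(y) + lag])


def relationship_matrix(streams, max_lag):
    """Map each ordered pair of tags to (correlation, lag): the correlation of largest
    magnitude over lags -max_lag..max_lag, and the lag at which it occurs."""
    matrix = {}
    for a, b in combinations(sorted(streams), 2):
        r, lag = max(
            ((lagged_correlation(streams[a], streams[b], k), k)
             for k in range(-max_lag, max_lag + 1)),
            key=lambda pair: abs(pair[0]),
        )
        matrix[(a, b)] = (r, lag)
        matrix[(b, a)] = (r, -lag)  # same relationship seen from the other tag
    return matrix


def tag_group(matrix, selected, threshold=0.9):
    """Tags whose |correlation| with the selected tag reaches the user-chosen threshold."""
    return sorted(
        other
        for (tag, other), (r, _lag) in matrix.items()
        if tag == selected and abs(r) >= threshold
    )


if __name__ == "__main__":
    n = 200
    streams = {
        "FT101.PV": [math.sin(0.1 * i) for i in range(n)],
        "FT102.PV": [math.sin(0.1 * (i - 5)) for i in range(n)],  # FT101.PV delayed 5 samples
        "TI201.PV": [((i * 37) % 11) / 11.0 for i in range(n)],  # unrelated pattern
    }
    m = relationship_matrix(streams, max_lag=10)
    print(m[("FT101.PV", "FT102.PV")])  # correlation of about 1.0 at lag 5
    print(tag_group(m, "FT101.PV"))     # ['FT102.PV']
```

Changing `threshold` or `max_lag` and recomputing changes the group membership, in the spirit of the user-adjustable grouping described in [0018].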
`
[0024] The data streams may also be displayed in the form of time-series data.

[0025] Historical values of the relationship data or data streams may be displayed.
[0026] Correlations between a pair of data streams may be displayed as a function of lagged time.
[0027] Coding may be used to identify different sub-sets in the display, and this coding may survive when a different view is selected, so that a tag group highlighted in one view is still highlighted when the view is changed. The coding may be color coding or shading. A user may be able to select a sub-set by:
[0028] clicking on a cell in the matrix;
`
[0029] clicking on a marked location in the meta-data diagram; or,
[0030] clicking on a datum in the list.
[0031] A neural network may be trained to model the state space of the process.
[0032] In another aspect the invention is a computer system for performing the method.
[0033] A further aspect of the invention is computer software for performing the method.
[0034] In the claims of this application and in the description of the invention, except where the context requires otherwise due to express language or necessary implication, the words “comprise” or variations such as “comprises” or “comprising” are used in an inclusive sense, i.e. to specify the presence of the stated features but not to preclude the presence or addition of further features in various embodiments of the invention.
`
BRIEF DESCRIPTION OF THE DRAWINGS

[0035] In order to provide a better understanding of the present invention, preferred embodiments will be described below, by way of example only, with reference to the accompanying drawings, in which:
[0036] FIG. 1 is a schematic view of information flow between parts of an embodiment of the present invention.
[0037] FIG. 2 is a large scale visualization of a cross-correlation matrix (717x717 variables).
[0038] FIG. 3 is a small scale visualization of the cross-correlation matrix of FIG. 2 (approx. 40x40 variables).
[0039] FIG. 4 is a process view showing tag grouping. The selected tag is displayed as a filled square. The related tags are displayed as filled circles.
[0040] FIG. 5 is a process view showing tag similarity. The selected tag is displayed as a filled square. Other tags are displayed as filled circles, with the shading indicating the degree of correlation according to the currently defined shading mapping.
[0041] FIG. 6 is a signal view showing changes over time for process variables and alarms in a tag group.
[0042] FIG. 7 is a signal view showing signal amplitude using shading rather than plotting on the vertical axis. This is useful for visually identifying patterns in large sets of tags.
[0043] FIG. 8 is a signal view showing a small set of variables with scale information.
[0044] FIG. 9 is a signal view showing all alarm events over a two month period.
[0045] FIG. 10 is a lags view showing cross-correlation between a pair of variables as a function of time.
[0046] FIG. 11 is a state space view labeled according to key performance indicators.

DETAILED DESCRIPTION

[0047] The embodiment described here is used as a Process Data Management System (PDMS), which deals with data from industrial processes. It will be appreciated that the present invention may be used to analyze data from other sources.

[0048] Due to the amount of data produced by a typical industrial process, and the speed at which it must be handled, specialized data structures have been developed to represent this information. An industrial process is intended to mean a non-trivial process in which one or more raw materials are converted into a product. Typically some of the variables in the process may be controlled, such as for example temperature, pressure, flow rate, amount of a raw material. Some of the variables may not be able to be controlled, such as for example ambient temperature, or purity of a raw material. Some examples of industrial processes include an ore refining process, a production line process, a mining process and a construction process. These lists are exemplary and are not intended to be limiting.

[0049] FIG. 1 shows a schematic overview of a process of producing visualizations from imported data according to an embodiment of the present invention. As will be described below, the visualizations allow the data from the process to be analyzed to gain an understanding of the process or characteristics of the process. Data 12 is provided from a number of sources. The data 12 is divided into process data 14 and event data 16.

[0050] Process data 14 is regularly-sampled time-series data collected from sensors in the process. The characteristic being measured by the sensor is referred to as a variable, and the value(s) of the variable at a given moment in time forms an element of data. Typically, the signals are sampled continuously, with averages being recorded every minute. For a process with 1000 variables, this equates to approximately 1.5 million data elements per day. Occasionally, there are problems with sensors, or with the collection of data from the process historian. This means that data may not be available continuously, and may have “holes”. Process data 14 is obtained from an Excel spreadsheet, a text file, an OPC-HDA server or an SQL database. (OPC stands for “OLE for Process Control”.) OLE is a Microsoft protocol for communicating between application processes. OPC is a set of communication protocols used by the process industry, based on OLE communication mechanisms. OPC protocols include: OPC-DA (or OPC Data Access) for real-time access to the values of process variables and OPC-HDA (or OPC Historical Data Access).

[0051] Event data 16 is irregular data generated to describe events or exceptional conditions. An example of event data is an alarm which is triggered when a certain condition or conditions is/are met. Event data 16 may be obtained from an SQL database or text file.

[0052] The process will usually have process meta-data. The meta-data is data about the process, rather than data collected by operation of the process. It may include descriptions of the structure of the process (for example plant drawings) and the meaning of process variables etc.

[0053] The process data 14 and event data 16 are collected into databases 18. The databases include a process database 20, an event database 22 and a meta-data database 24. These databases 18 are used to produce dependent databases.

[0054] Correlation techniques are applied to the process data 14 in the process database 20 and event data in the event database 22 to find similarities between variables. The resulting correlation data is saved in a correlation database 26.

[0055] The correlation database 26 can then be used to tag variables that are similar to one another. Such similar variables are stored in a tag group set 28.

[0056] The process data 14 in the process database 20 and event data 16 in the event database 22 may also be used to train a neural network to generate a model of the process. In this example a self organizing map (SOM) model 30 is generated. The SOM model can be used to classify the state of the process and to produce state labels 32.
`
`
[0057] The resulting information can then be used to visualize various aspects of the process. Visualizations 34 can be produced from this information to determine different aspects about the process. The visualizations 34 are useful to show a user, such as a process engineer, what the process is actually doing, as opposed to what the process ought to be doing. The visualizations 34 aim to improve the insight of the engineer into the workings of the process. The visualizations can reveal unexpected relationships, confirm that relationships that were thought to exist do in fact exist, and also reveal relationships that should have been obvious as a logical consequence of the process design but for which the engineer may not have made the required deductive link.

[0058] The examples of the visualizations 34 include: a correlation matrix view 36, which uses information from the correlation database 26 and the tag group set 28; a signals view 38, which uses information from the tag group set, the process database and the event database; a lags view 40, which uses information from the correlation database and the process database; a process view 42, which uses information from the tag group set 28, the correlation database 26 and the process meta-data 24; and a model view 44, which may also be visualized as will be described further below. Other visualizations are possible.

Data

[0059] The process data 14 is imported and stored in the process database 20. The process database 20 holds the process data 14 as a set of values over time for each of the variables in the process. It is important that process data 14 be represented in a way that is both compact and efficient to access. For rapid visualization, it is important to be able to quickly retrieve samples based on a given time range. While general purpose databases are useful in many applications, they impose an additional layer of software and processing between the application and its data. In the PDMS, this may not be acceptable because of the required speed at which information must be processed. Therefore, specialized representations may be used that use domain information to improve speed and reduce the size of the stored data.
[0060] Each process variable may define a series of components to its value over time. For example, each sample may have the following components:
[0061] Time (32-bit integer).
[0062] Duration (32-bit integer). Together with start time, this indicates the time interval over which the sample is valid.
[0063] Value (32-bit float).
[0064] Range (2x32-bit floats). For samples that have been derived from a number of other samples, the system optionally stores a maximum and minimum in addition to the value. This allows (for example) a visualization of a decimated time series to display the full range of the signal for each sample.
[0065] Extra Attributes (8- or 32-bit integer). Each sample may be tagged with one or more additional Boolean or integer attributes packed into integer bit-fields. The main system-defined attribute is Quality, which is defined for data imported from OPC-HDA data sources. Other tags may be defined by the user, and applied on a per-sample basis to stored data.
[0066] There is usually a certain amount of redundancy in the process data 14 that means that not all of the components need to be stored. The PDMS can use information about this redundancy to reduce the size of the stored data, and improve retrieval time.
[0067] Time: Most data is periodic, so a stream can be represented as a sequence of periodic regions. Each region is defined by a start time, sampling period (duration), and a number of evenly spaced, contiguous samples. Time and duration are not explicitly stored for each sample, but are calculated from the region header. Providing the number of holes (i.e. breaks in the periodicity) is small, this representation roughly halves the storage per sample.
[0068] Range: Most data that has been imported from a Distributed Control System (“DCS”) is averaged, but does not define the range of the original values. For this data, the range is not stored but is defined to be equivalent to the value.
[0069] Attributes: If a quality measure is not available and no user-defined attributes are defined, then there are no additional attributes to be stored, and this field is omitted in the data. If quality is defined, the user may choose to filter out “bad” values in pre-processing, in which case all samples in the time-series are implicitly “good” and, again, the attribute field is omitted.
[0070] Quantization: with the above considerations, most time-series data can be represented using a 4-byte float data type per sample. If less than 32-bit precision is required, it is possible to quantize the data using a per-stream scale and offset factor to map between 32-bit floats and 8- or 16-bit integers. Repeats: when consecutive periodic samples have the same values for the attributes that are defined (i.e. value, range, and extra), a run-length encoding is used. Values are stored just once along with a repeat count.
[0071] For periodic data, samples can be rapidly located using a computable offset from the start of each region. For aperiodic data, a binary search allows a given sample to be located in O(log(N)) time, for N samples.
[0072] When process data 14 is imported into the process database 20, certain statistics of the data 14 are calculated and stored in the process database 20 with the data stream. These include: mean, standard deviation, various central moments (skewness, kurtosis), maximum, minimum, and frequency distribution (represented as a histogram using a pre-set number of frequency bins). This information is used during visualization to provide an appropriate scaling for display. The frequency distribution is also used for display, and for certain types of normalization.
[0073] Compression of the process database 20 is not preferred. Many well-known techniques of compression exist, including boxcar, backward slope, and straight line interpolation methods. These techniques are lossy (i.e. they discard information), so the reconstructed data may be inaccurate in ways that could be statistically significant. However, it is anticipated that some versions of the PDMS may incorporate data compression as an option.
[0074] A facility to decimate time-series data (i.e. to reduce the sampling rate) after filtering out high frequency components may be included. In doing so, it preserves the range information in the resulting data stream because this is an important indicator of variability. This makes it possible to pre-compute a representation of each signal at a number of pre-defined time scales (e.g. 1 minute, 10 minutes, 1 hour, 1
`
`
day). This technique (similar to “MIP maps” in 3D graphics) can be used to further accelerate the display of data over long time-scales.
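
The decimation facility of [0074], combined with the per-sample range of [0064], can be pictured as a small multi-resolution pyramid. The Python sketch below is illustrative only: levels are plain in-memory lists, the low-pass filter is assumed to be a simple box-car average (the text does not name one), and `Sample`, `decimate` and `build_pyramid` are hypothetical names. It shows why keeping the range matters: a one-minute spike nearly vanishes from the daily mean but still sets the daily maximum.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    value: float  # averaged value over the interval
    lo: float     # minimum of the original values in the interval
    hi: float     # maximum of the original values in the interval


def from_raw(values):
    """Raw samples carry no separate range, so lo and hi equal the value (cf. [0068])."""
    return [Sample(v, v, v) for v in values]


def decimate(samples, factor):
    """Reduce the sampling rate by an integer factor.

    Each output sample is the mean of a block of `factor` input samples (a
    box-car low-pass filter, assumed here) and keeps the block's full range.
    A trailing partial block is averaged over the samples it contains.
    """
    out = []
    for start in range(0, len(samples), factor):
        block = samples[start:start + factor]
        out.append(Sample(
            value=sum(s.value for s in block) / len(block),
            lo=min(s.lo for s in block),
            hi=max(s.hi for s in block),
        ))
    return out


def build_pyramid(raw_values, factors=(10, 6, 24)):
    """Pre-compute coarser levels: 1 min -> 10 min -> 1 hour -> 1 day for minute data."""
    levels = [from_raw(raw_values)]
    for factor in factors:
        levels.append(decimate(levels[-1], factor))
    return levels


if __name__ == "__main__":
    minutes = [0.0] * 1440          # one day of one-minute averages
    minutes[700] = 100.0            # a single one-minute spike
    day = build_pyramid(minutes)[-1][0]
    print(day.value, day.lo, day.hi)  # the spike barely moves the mean but sets hi
```

A viewer would pick the coarsest level whose sample spacing still fits the screen resolution, much as a renderer picks a MIP-map level.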
`
[0075] The PDMS includes utilities for importing process data from a number of sources:
[0076] Spreadsheet files.
[0077] Text files.
[0078] Databases.
[0079] OPC-HDA servers.
[0080] Spreadsheet files are typically encoded using Microsoft Excel data formats. Many tools shipped with DCS or process historians allow data to be exported in this format. However, there are many limitations on what data can be represented in spreadsheets. Typically, worksheets can have at most 255 columns and 65535 rows. To overcome these limitations, the import system allows process data to be distributed across multiple directories, spreadsheets, and worksheets. An import “wizard” may be used to allow the user to specify what data to import, and how the different sample attributes and meta-data attributes are encoded.
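
An importer that accepts data split across several worksheets, as [0080] describes, might stitch the pieces back into one series per tag roughly as follows. This sketch assumes each worksheet has already been read into a list of rows (for example by a spreadsheet library) with a header row of tag names and a timestamp in the first column; that layout and the helpers `merge_worksheets` and `sheets_needed` are hypothetical. Only the 255-column and 65535-row limits come from the text.

```python
from collections import defaultdict

MAX_COLUMNS = 255   # per-worksheet limits quoted in [0080]
MAX_ROWS = 65535


def merge_worksheets(worksheets):
    """Combine worksheets into one time-ordered list of (time, value) pairs per tag.

    Each worksheet is assumed to be a list of rows: a header row
    ["Time", tag1, tag2, ...] followed by data rows whose first cell is a
    timestamp. Worksheets may split the data by tag (columns) or by time (rows).
    """
    series = defaultdict(dict)  # tag -> {time: value}
    for sheet in worksheets:
        if len(sheet) > MAX_ROWS or any(len(row) > MAX_COLUMNS for row in sheet):
            raise ValueError("worksheet exceeds the spreadsheet size limits")
        header, *rows = sheet
        tags = header[1:]
        for row in rows:
            time = row[0]
            for tag, value in zip(tags, row[1:]):
                if value is not None:  # blank cells are treated as holes
                    series[tag][time] = value
    return {tag: sorted(points.items()) for tag, points in series.items()}


def sheets_needed(n_tags, n_samples):
    """Worksheets needed, reserving one header row and one time column per sheet."""
    column_blocks = -(-n_tags // (MAX_COLUMNS - 1))  # ceiling division
    row_blocks = -(-n_samples // (MAX_ROWS - 1))
    return column_blocks * row_blocks
```

Under this layout, a year of one-minute data (525,600 samples) for 1000 tags needs 4 column blocks times 9 row blocks, i.e. 36 worksheets, which is why the import system has to accept data spread over many of them.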
`
[0081] OPC-HDA is a Distributed Component Object Model (“DCOM”) based protocol for importing historical data from process historians. DCOM is a Microsoft protocol for communicating between application programs that may be running on different machines. Typically, a process historian (e.g. PI) collects data in real-time from a DCS system and stores it in a specialized database, usually with the aid of various compression techniques. The OPC-HDA protocols allow clients to retrieve the stored data. This includes:
[0082] Time
[0083] Value
[0084] Quality
[0085] Process data 14 may be imported directly from OPC-HDA servers.
`
[0086] One problem with certain import methods is that process meta-data is not available. For example, OPC-HDA servers often do not support tag browsing. Therefore, a mechanism to separately import meta-data from text files (in CSV format) may be implemented.
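
A minimal sketch of the separate meta-data import in [0086], using Python's standard `csv` module. The column names `Tag`, `Description` and `Units` are assumptions chosen to match the attributes listed in [0107] to [0110]; a real site file may use different headings, and `TagMeta` and `load_metadata` are hypothetical names.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class TagMeta:
    tag: str
    description: str
    units: str


def load_metadata(csv_text):
    """Read tag meta-data from CSV text with an assumed header: Tag,Description,Units."""
    meta = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        tag = (row.get("Tag") or "").strip()
        if not tag:
            continue  # skip rows without a tag name
        meta[tag] = TagMeta(
            tag=tag,
            description=(row.get("Description") or "").strip(),
            units=(row.get("Units") or "").strip(),
        )
    return meta
```

Keying the result by tag name lets it be joined to streams imported from an OPC-HDA server, which identify variables by the same tag.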
[0087] Events 16 are conditions with well defined time and duration. Events are usually related to alarm conditions. Change in alarm state is described by several types of events. Alarm events indicate the time at which an alarm started. Return events indicate when the alarm stopped. Other events indicate how the operators respond to the alarms, for example Enable, Disable, and Acknowledge. Other kinds of operator actions may also be recorded, for example changes to operating set points and operating modes.
[0088] Typically, event streams are used for visualization or alarm analysis. However, for visualization it is important that the event data be efficiently accessible, so the visualization tools generally require a fast binary representation to be used.
`
[0089] The Event Database 22 is a stream of events 16 defined for a number of event variables. In this context, an event variable corresponds to a state of a DCS tag. Events are defined by the following attributes:
[0090] Time.
[0091] Tag.
[0092] Event Type (alarm, return, acknowledge, operator action).
[0093] Subtype (HI, HIHI, etc).
[0094] Priority (high, low, emergency, diagnostic, etc).
`
[0095] Events are stored in a compact binary representation. Times are strictly ordered, so that the closest event to a given time can be located in O(log(N)) time, where N is the number of events. Most attributes are of enumerated types (tag, event type, subtype, and priority) and are represented using small integers (8- or 16-bits). Small look-up tables are used to map these integers to/from string tags. This also ensures that event records have a fixed size, which makes indexing simpler. Each event record also contains a pointer to the next and previous event of the same type, so it is quite efficient to enumerate all of the events of a given type, or to find (for example) the next return event corresponding to a given alarm event.
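
The fixed-size event records of [0095] can be sketched with Python's `struct` module. The 17-byte layout, the integer codes for subtype and priority, and the class `EventStore` are assumptions for illustration, not the PDMS file format. The sketch shows the properties the text relies on: records in time order so a binary search finds an event near a given time in O(log N) probes, small integers mapped to strings through look-up tables, and previous/next links between events of the same type, used here to find the return event that closes an alarm.

```python
import struct

# Assumed fixed-size record, little-endian, no padding (17 bytes):
# time (uint32), tag id (uint16), event type, subtype, priority (uint8 each),
# index of previous and next event of the same type (int32, -1 = none).
RECORD = struct.Struct("<IHBBBii")
EVENT_TYPES = ["alarm", "return", "acknowledge", "operator action"]  # look-up table


class EventStore:
    """In-memory stand-in for an event file of fixed-size binary records."""

    def __init__(self, events, tags):
        # events: (time, tag, event_type, subtype_code, priority_code) tuples
        self.tags = list(tags)                      # tag id -> tag string
        tag_id = {t: i for i, t in enumerate(self.tags)}
        rows = sorted(events, key=lambda e: e[0])   # records are kept in time order
        self.count = len(rows)
        prev, nxt, last = [-1] * self.count, [-1] * self.count, {}
        for i, event in enumerate(rows):            # link events of the same type
            j = last.get(event[2])
            if j is not None:
                prev[i], nxt[j] = j, i
            last[event[2]] = i
        self.buf = bytearray(RECORD.size * self.count)
        for i, (time, tag, etype, subtype, prio) in enumerate(rows):
            RECORD.pack_into(self.buf, i * RECORD.size, time, tag_id[tag],
                             EVENT_TYPES.index(etype), subtype, prio, prev[i], nxt[i])

    def record(self, i):
        """Decode record i as (time, tag, type, subtype, priority, prev, next)."""
        time, tag, etype, subtype, prio, prev, nxt = RECORD.unpack_from(
            self.buf, i * RECORD.size)
        return time, self.tags[tag], EVENT_TYPES[etype], subtype, prio, prev, nxt

    def first_at_or_after(self, t):
        """Binary search over the time-ordered records: O(log N) probes."""
        lo, hi = 0, self.count
        while lo < hi:
            mid = (lo + hi) // 2
            if RECORD.unpack_from(self.buf, mid * RECORD.size)[0] < t:
                lo = mid + 1
            else:
                hi = mid
        return lo

    def next_return(self, alarm_index):
        """Index of the first return event for the alarm's tag, or None."""
        time, tag = self.record(alarm_index)[:2]
        i = self.first_at_or_after(time)
        while i < self.count and self.record(i)[2] != "return":
            i += 1                                  # step to the first return event
        while 0 <= i < self.count:
            rec = self.record(i)
            if rec[1] == tag:
                return i
            i = rec[6]                              # follow the next-of-same-type link
        return None
```

Because every record has the same size, record i starts at byte `i * RECORD.size`, so the same code would work unchanged over a memory-mapped file instead of a `bytearray`.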
`[0096] Event streams may originate from a number of
`sources:
`
`[0097] Event logs (e.g. text printed by a DCS)
`[0098] Event databases, stored in database tables or
`spreadsheets.
[0099] Normally, events are generated by the DCS, and are logged in an external system. This may be an external process historian, or a customized system like an IMAC logger.
[0100] The PDMS imports event streams from text streams, or from databases. For database import, the user specifies which columns of the input correspond to the event attributes listed above. The user can also define specific mappings between the values of these fields and the resulting enumeration value (e.g. there may be more than one string used to represent an event type, or sub-type). This allows the conversion and the event model to be customized for a particular site.
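
The user-defined mappings of [0100] amount to two small tables: one from event attributes to source columns, and one from site-specific strings to enumeration values. The sketch below assumes hypothetical column names (`EVT_TIME`, `POINT` and so on) and spellings (`ALM`, `RTN`); only the attribute list itself comes from [0089] to [0094].

```python
from enum import IntEnum


class EventType(IntEnum):
    ALARM = 0
    RETURN = 1
    ACKNOWLEDGE = 2
    OPERATOR_ACTION = 3


# Hypothetical site-specific mapping: several raw strings may denote one event type.
SITE_EVENT_TYPES = {
    "ALM": EventType.ALARM,
    "ALARM": EventType.ALARM,
    "RTN": EventType.RETURN,
    "RETURN": EventType.RETURN,
    "ACK": EventType.ACKNOWLEDGE,
    "OP": EventType.OPERATOR_ACTION,
}

# Hypothetical column assignment chosen by the user for one database table.
COLUMN_MAP = {"time": "EVT_TIME", "tag": "POINT", "type": "EVT_KIND",
              "subtype": "COND", "priority": "PRIO"}


def convert_row(row, column_map=COLUMN_MAP, type_map=SITE_EVENT_TYPES):
    """Map one source row (dict of column name -> string) onto the event attributes."""
    raw_type = row[column_map["type"]].strip().upper()
    try:
        event_type = type_map[raw_type]
    except KeyError:
        raise ValueError(f"unmapped event type {raw_type!r}") from None
    return {
        "time": row[column_map["time"]],
        "tag": row[column_map["tag"]].strip(),
        "type": event_type,
        "subtype": row[column_map["subtype"]].strip().upper(),
        "priority": row[column_map["priority"]].strip().lower(),
    }
```

Keeping both tables as data rather than code is what lets the same importer be configured per site.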
[0101] Process meta-data 24 is information about the process, as distinct from information collected from the process. This includes:
[0102] Descriptions of the variables and events in a process. This information is used in the analysis and visualization of data. It includes the DCS name, description, measurement units, and any other information about the measurement (e.g. sensor type, precision, etc).
[0103] Descriptions of the relationships between the variables. For example, a measurement point may be associated with more than one process variable. A variable that is controlled automatically may have, in addition to its value, a set-point and a controller output.
[0104] Descriptions of the structure of the process. Normally, a process is logically divided into separate units. This defines specific physical and functional relationships between variables.
[0105] Drawings of the process structure. This includes process and instrumentation drawings (P&ID).
[0106] Meta-data is used for visualization, and during analysis to select variables based on criteria that are meaningful in the domain.
[0107] Several types of meta-data may be represented within PDMS. Each stream of process data is associated with the following attributes:
[0108] Tag Name
[0109] Description
[0110] Units
[0111] Precomputed statistics and frequency distribution.
`
`[0112] This information is stored in the process meta-data
`database 24.
`
[0113] Certain types of visualization in the PDMS make use of process drawings. The drawings are stored as image files (e.g. using GIF format). These files can be produced by
`
`
exporting the data from a CAD system, or by scanning printed drawings. They can be annotated by the user to indicate the position of important process variables. The annotation is stored using an XML data format. The process database may include a drawing database comprising multiple drawings, each with an associated image and XML annotation.
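
The XML annotation in [0113] is not specified further, so the format below is an assumption: one `<tag>` element per annotated variable giving its pixel position on the drawing image. The sketch parses it with the standard library's `xml.etree.ElementTree` and turns it into process-view markers in the style of FIGS. 4 and 5 (the selected tag as a square, other tags as circles shaded by the magnitude of their correlation). `load_annotation`, `process_view_markers` and the file and tag names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation format; the text says only that annotations are XML.
EXAMPLE = """
<drawing image="unit3_pid.gif">
  <tag name="FT101.PV" x="120" y="340"/>
  <tag name="TI201.PV" x="410" y="95"/>
</drawing>
"""


def load_annotation(xml_text):
    """Return the drawing's image file and a map of tag name -> (x, y) pixel position."""
    root = ET.fromstring(xml_text.strip())
    positions = {
        el.get("name"): (int(el.get("x")), int(el.get("y")))
        for el in root.iter("tag")
    }
    return root.get("image"), positions


def process_view_markers(positions, correlation_with_selected, selected):
    """Markers (x, y, shape, shade): the selected tag as a square, other tags as
    circles shaded by |correlation| with the selected tag (0 = unrelated)."""
    markers = []
    for tag, (x, y) in sorted(positions.items()):
        if tag == selected:
            markers.append((x, y, "square", 1.0))
        else:
            shade = abs(correlation_with_selected.get(tag, 0.0))
            markers.append((x, y, "circle", shade))
    return markers
```

Keeping positions separate from the image means the same annotated drawing can be re-shaded whenever the user selects a different tag.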
[0114] Most existing tools require that data be memory resident. That is, they assume they can hold all the relevant data in memory. This limits the quantity of data that can be analyzed. The PDMS uses data structures that are usually stored on disk, and hence do not rely upon the availability of adequate computer memory. The PDMS can deal with large data vectors collected over long time intervals. This leads to datasets that are very large, and can exceed the available memory in any typical high end computer. Indexing methods are included that allow fast retrieval of data from disk and fast manipulation in memory. Recursive decomposition of data to optimize data for the time-scale of interest avoids using sub-second data for a year's analysis, but also avoids the data loss that is common in the process data compression algorithms used in most historical visualization tools.
[0115] The PDMS deals with data from both batch and continuous processes. There are very few tools available for batch processes. This is because of the complexity of the description of batch processes. Batch processes require two time dimensions to handle both elapsed time and time in a process state. They also require a description of the actual process equipment associated with any particular batch, because multiple processing paths may exist through a typical batch process. They also require a representation of the state of the process and the current process step being employed to be recorded in the datasets.
`
`Correlation
`
[0116] The correlation database 26 comprises correlation data. Correlation data measures the similarity between process variables. The PDMS computes the lagged correlations for all pairs of variables, up to a defined time lag.
Given a data series $x_i$, the mean $\bar{x}$ is:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

For two data series $x_i$ and $y_i$, the covariance $s_{xy}$ is:

$$s_{xy} = \frac{1}{n}\sum_{i=1}^{n}\left(x_i y_i - \bar{x}\,\bar{y}\right)$$

The simple variance $s_x$ of $x_i$ is:

$$s_x = s_{xx}$$

The correlation $R_{xy}$ of two series $x_i$ and $y_i$ is the covariance normalized by the product of the variances of the two serie