`
`Journal of Electrocardiology xx (2012) xxx – xxx
`
`www.jecgonline.com
`
`Signal quality and data fusion for false alarm reduction
`in the intensive care unit
`Qiao Li, PhD, a, b Gari D. Clifford, PhD b,⁎
`
`a Institute of Biomedical Engineering, School of Medicine, Shandong University, Jinan, Shandong, China
`b Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, UK
`Received 30 March 2012
`
`Abstract
`
Due to a lack of integration between different sensors, false alarms (FA) in the intensive care unit
(ICU) are frequent and can lead to a reduced standard of care. We present a novel framework for FA
`reduction using a machine learning approach to combine up to 114 signal quality and physiological
`features extracted from the electrocardiogram, photoplethysmograph, and optionally the arterial
`blood pressure waveform. A machine learning algorithm was trained and evaluated on a database of
`4107 expert-labeled life-threatening arrhythmias, from 182 separate ICU visits. On the independent
`test data, FA suppression results with no true alarm (TA) suppression were 86.4% for asystole, 100%
`for extreme bradycardia and 27.8% for extreme tachycardia. For the ventricular tachycardia alarms,
`the best FA suppression performance was 30.5% with a TA suppression rate below 1%. To reduce
`the TA suppression rate to zero, a reduction in FA suppression performance to 19.7% was required.
`© 2012 Elsevier Inc. All rights reserved.
`
`Keywords:
`
`False alarm reduction; Signal quality assessment; Genetic algorithm; Relevance vector machine; Intensive care unit
`
`Introduction
`
False alarm rates of cardiac monitors in the intensive care
unit (ICU) are extremely high, reaching up to 95% for some
types of alarms. 1 Since the publication by Lawless 2 on the
`“crying wolf” phenomenon in 1994, the unfortunate reality is
`that not much has changed over the intervening 15 years. 3
`There are two main reasons for the high false alarm rate. One
`is that physiological data can be severely corrupted by
`artifacts, noise and missing values. The other reason is that
`univariate alarm algorithms and simple numeric thresholds
`are predominantly used in current clinical bedside monitors. 4
`Moreover, alarm thresholds are often adjusted in an ad hoc
`manner, based on how annoying the alarm is perceived to be
`to the clinical team in attendance. There is little evidence that
`alarm thresholds are optimized for any population, particu-
`larly in any multivariate manner.
`Various strategies have been employed to deal with the
`false alarm problem including median filtering, 5 conven-
`tional statistical signal processing and filtering, 6 multivari-
`able fuzzy temporal profile modeling, 7 multi-parametric
`
`⁎ Corresponding author. Institute of Biomedical Engineering, Old Road
`Campus Research Building, Off Roosevelt Drive, OX3 7DQ Oxford, UK.
`E-mail address: gari.clifford@eng.ox.ac.uk
`
`0022-0736/$ – see front matter © 2012 Elsevier Inc. All rights reserved.
`http://dx.doi.org/10.1016/j.jelectrocard.2012.07.015
`
`analysis 1,8-11 and signal quality assessment techniques. 10-12
Most of these studies, however, used a small number of alarms
and patients. Two studies used a large database and a robust
study design, splitting the data into training and test sets
to develop and evaluate their algorithms. Aboukhalil et al. 1
used the arterial blood pressure (ABP) waveform and signal
quality indices (SQIs) to suppress electrocardiogram (ECG)
arrhythmia false alarms. Across five alarm categories and
5386 critical ECG arrhythmia alarms, the false alarm (FA)
reduction rates ranged from 93.5% down to 33.0%, and the true
alarm (TA) reduction rates were 0%, except for ventricular
tachycardia (VT) alarms (9.4%). Deshmane 10 used a signal
quality assessment scheme for the pulse oximetry or
photoplethysmogram (PPG) waveform as well as ABP and
ECG to suppress false ECG critical arrhythmia alarms.
Among 4012 critical ECG arrhythmia alarms, the FA
reduction rates ranged from 68.2% down to 1.6%, with TA
reduction rates of 4.0% (asystole), 0% (extreme bradycardia, EB),
0.8% (extreme tachycardia, ET) and 0.2% (VT). The main problem
Aboukhalil et al. 1 and Deshmane 10 faced was that the VT
alarm had a high TA reduction rate but a low FA reduction rate,
as ABP and PPG waveforms did not always manifest low
cardiac output or pulse pressure, or sometimes even particularly
abnormal beats, during VT. Sayadi and Shamsollahi 13 have
`
Table 1
Gold standard data sets and subsets of critical ECG arrhythmia alarms: relative frequency of true and false alarms on a per-alarm basis.

Subset 1 (ECG and PPG available)

            Total    Subset 1                True alarms                        False alarms
Alarm type  alarms   total alarms  % of all  N     % of all  % of alarm type    N     % of all  % of alarm type
Asystole     644      564          15.1        36   1.0       6.4                528  14.1      93.6
EB           360      327           8.8       240   6.4      73.4                 87   2.3      26.6
ET           843      729          19.5       644  17.2      88.3                 85   2.3      11.7
VT          2260     2114          56.6      1133  30.3      53.6                981  26.3      46.4
All         4107     3734                    2053  55.0                         1681  45.0

Subset 2 (ECG, ABP and PPG available)

            Total    Subset 2                True alarms                        False alarms
Alarm type  alarms   total alarms  % of all  N     % of all  % of alarm type    N     % of all  % of alarm type
Asystole     644      285          12.4        25   1.1       8.8                260  11.3      91.2
EB           360      233          10.1       171   7.4      73.4                 62   2.7      26.6
ET           843      257          11.2       220   9.6      85.6                 37   1.6      14.4
VT          2260     1525          66.3       849  36.9      55.7                676  29.4      44.3
All         4107     2300                    1265  55.0                         1035  45.0

EB: extreme bradycardia, ET: extreme tachycardia, VT: ventricular tachycardia.
`
`recently applied a model-based filtering approach to detect-
`ing the above listed alarms in the MIMIC II database. They
`also quote superior FA suppression rates (except for
`bradycardia). However, it should be noted that all three
`waveform signals need to be present plus a central venous
`pressure (CVP) or pulmonary arterial pressure (PAP)
`waveform, which significantly limits the application of the
`system to a small subset of the population and only when the
`signals exhibit high quality. Moreover, the authors treated all
`alarms together, rather than dividing the data into indepen-
`dent training and testing sets.
In this work, we describe a framework that learns the
relationship between the occurrences of noise and signals
across all the cardiovascular signals in the ICU during
life-threatening ventricular arrhythmias. Features extracted from
the ECG, ABP and PPG (including heart rate (HR), pulse
oxygen saturation (SpO2), signal quality indices and rates of
change in parameters) were combined in a novel data fusion
framework to suppress the false arrhythmia alarms.
As the ABP is an invasive measurement, present in only
about two thirds of a typical ICU population, we compared
the algorithms with and without ABP. First, we
developed a novel PPG signal quality assessment method
using a dynamic time warping algorithm 14 and used it to
suppress the false alarms, following the framework that
Aboukhalil et al. 1 and Deshmane 10 used. We then estimated
the HR from the ECG, ABP and PPG separately,
fused the results using a Kalman filter and SQIs, 12,14,15 and
used the fused estimate to suppress the false alarms. These
traditional methods showed good performance on asystole and
extreme bradycardia (EB) alarms, modest performance on extreme
tachycardia (ET) alarms, but poor performance on VT alarms.
To improve the VT alarm performance, in this work we extracted
114 variables from the ECG, ABP and PPG signals, including signal
features and SQIs. We then used a feature selection
technique, a genetic algorithm (GA), to select the optimal
variable combination. The GA mimics the principles of
natural selection to “breed” possibly successful combinations
of parameters and “kill off” poorly performing combinations.
The best feature combination is then
presented to a nonlinear classifier known as a relevance
vector machine (RVM), to label the alarms as true or false.
`
`Materials and methods
`
`Data sets
`
`We used the same data sets as described by Deshmane 10
`with minor adjustments, drawn from the multi-parameter
`ICU database (PhysioNet's MIMIC II database), 16-18
`containing simultaneous ECG, ABP, and PPG recordings
`with 4107 multiple expert-annotated alarms (asystole, EB,
ET, and VT) on 182 ICU admissions. The adjustments
include adding one case to the data sets, eliminating the
alarms for which PPG was unavailable at the time the alarm
occurred, and revising 41 annotations from true to false which we
considered to have been labeled inaccurately. (These labels were
changed in the prototyping stage, before any machine
learning was applied.) By eliminating alarms when the
`
Table 2
Distribution of alarms in training and test sets of subset 1.

            Training                               Test
Alarm type  False  True  Total  FA rate (%)        False  True  Total  FA rate (%)
Asystole     293     19    312  93.9                235     17    252  93.3
EB            63    123    186  33.9                 24    117    141  17.0
ET            29    401    430   6.7                 56    243    299  18.7
VT           483    672   1155  41.8                498    461    959  51.9
All          868   1215   2083  41.7                813    838   1651  49.2
`
`PPG was unavailable, 3734 alarms remained. The false
`alarm rates were 93.6% for asystole, 26.6% for EB, 11.7%
`for ET, and 46.4% for VT respectively, and 45.0% overall.
The ICU visits were divided into two separate sets for testing
and training, ensuring that the frequency of alarms in each
category was roughly equal through frequency ranking and
separation of odd- and even-numbered signals. The data were
`divided into two further subsets based on signal availability:
`subset 1 with ECG and PPG available for 30 s before and 10 s
`after each alarm; and subset 2 with ECG, ABP and PPG
`available in the same temporal window. Table 1 details the
`relative frequency of each alarm category and their
`associated true and false rates. Tables 2 and 3 show the
`distribution of alarms in training, test, and combined sets of
`subset 1 and subset 2. Three examples of false VT alarms and
`one true VT alarm are shown in Fig. 1.
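
The rank-and-alternate split described above can be sketched as follows. This is a minimal illustration only: the text does not give the exact procedure, so the function name `split_visits_by_rank`, the use of a single overall alarm count per visit (rather than a per-category count), and the tie-breaking by visit identifier are all assumptions.

```python
from collections import Counter

def split_visits_by_rank(alarm_visit_ids):
    """Rank visits by alarm count and alternate them between two sets.

    alarm_visit_ids: one visit identifier per alarm (hypothetical input format).
    Returns (train_visits, test_visits) as lists of visit identifiers.
    """
    counts = Counter(alarm_visit_ids)
    # Sort by alarm count (descending); break ties by visit id for determinism.
    ranked = sorted(counts, key=lambda v: (-counts[v], v))
    train = ranked[0::2]  # 1st, 3rd, 5th, ... ranked visits
    test = ranked[1::2]   # 2nd, 4th, 6th, ... ranked visits
    return train, test

# Toy data: visit "a" raised 5 alarms, "b" 4, "c" 2 and "d" 1.
visits = ["a"] * 5 + ["b"] * 4 + ["c"] * 2 + ["d"] * 1
train, test = split_visits_by_rank(visits)
```

Because whole visits are assigned to one set or the other, no patient visit contributes alarms to both training and test data.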
`We took three approaches to false alarm reduction, which
`are now described.
`
`False alarm reduction based on PPG
`We developed a novel PPG SQI using the Dynamic
`Time Warping (DTW), multiple-template matching, and a
`heuristic fusion algorithm, which is described in Li and
`Clifford. 14 A PPG beat dynamic template was built by
`detecting and averaging the regular beats in a 30-s PPG
`signal window and segmenting each beat from the onset of
`the beat to the onset of the next beat. Beat detection was
`performed using wabp.c (an open source ABP beat detector
`available at www.physionet.org) with a time and amplitude
`threshold adjustment to fit PPG beat width and height. If no
`beat was found 3 s after the onset of any given beat, then
`the end of the beat window was truncated to 3 s. The
`correlation coefficient between each PPG beat and the
template was calculated. However, because a beat may
change in length due to changes in heart rate or
cardiac output, three methods were used to fit each PPG
beat to the template: (1) a direct correlation (no beat
morphology changes), (2) linear interpolation of the beat
with resampling to match the template, and (3) DTW,
which nonlinearly stretches the time base and traces an
optimal path to minimize the cumulative distance between
the beat and the template. We also applied a clipping
detection algorithm to quantify the percentage of samples
which were saturated (at the maximum or minimum values)
within the beat window. Taking SQIi, i=1,2,3,4, as the SQIs
derived from direct correlation, linear interpolation, DTW,
and clipping detection, these four measures were then
fused heuristically to form qSQI, which classifies each beat as
excellent (E), acceptable (A), or unacceptable (U) according to Eq. (1).
The percentage of good beats (E and A) in a 17-s analysis
window (13 s prior to the alarm onset and 4 s after the
alarm, which was also used by Aboukhalil et al. 1 and
Deshmane 10) was set as the SQI of PPG.
\[
\mathrm{qSQI} =
\begin{cases}
\text{Excellent (E)} & \text{if all of the 4 } \mathrm{SQI}_i \ge 0.9 \\
\text{Acceptable (A)} & \text{if 3 of the 4 } \mathrm{SQI}_i \ge 0.9 \ \text{OR} \\
 & \text{if all of the 4 } \mathrm{SQI}_i \ge 0.7 \ \text{OR} \\
 & \text{if } \operatorname{median}(\mathrm{SQI}_1, \mathrm{SQI}_2, \mathrm{SQI}_3) \ge 0.8 \text{ and } \mathrm{SQI}_1 \ge 0.5 \text{ and } \mathrm{SQI}_4 \ge 0.7 \\
\text{Unacceptable (U)} & \text{otherwise}
\end{cases}
\tag{1}
\]
where the coefficients 0.9, 0.8, 0.7 and 0.5 are arbitrary and
empirically determined.
We set a PPG SQI threshold (SQIth) for each type of alarm
to accept or reject the information in the PPG. The PPG
signals with SQI≥SQIth (where the PPG was of sufficiently
high quality) were used to suppress the alarms. In order to
avoid TA suppression, SQIth was at first set strictly to 1.
Subsequently, SQIth was gradually decreased, ensuring
that the TA suppression was always minimized.
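
The heuristic fusion of Eq. (1) and the window-level PPG SQI can be sketched as below. This is an illustrative reading of the rule, not the authors' code: the function names `fuse_beat_sqi` and `window_sqi` are hypothetical, and returning 0.0 for a window with no detected beats is an assumption the text does not address.

```python
from statistics import median

def fuse_beat_sqi(sqi1, sqi2, sqi3, sqi4):
    """Classify one beat as 'E', 'A' or 'U' following Eq. (1).

    sqi1..sqi4: direct correlation, linear interpolation, DTW and clipping SQIs.
    """
    sqis = (sqi1, sqi2, sqi3, sqi4)
    if all(s >= 0.9 for s in sqis):
        return "E"
    # Excellent was handled above, so ">= 3" here means exactly 3 of the 4.
    if sum(s >= 0.9 for s in sqis) >= 3:
        return "A"
    if all(s >= 0.7 for s in sqis):
        return "A"
    if median((sqi1, sqi2, sqi3)) >= 0.8 and sqi1 >= 0.5 and sqi4 >= 0.7:
        return "A"
    return "U"

def window_sqi(beat_labels):
    """Fraction of good beats (E or A) in the analysis window."""
    if not beat_labels:
        return 0.0  # assumption: no detected beats means unusable signal
    good = sum(label in ("E", "A") for label in beat_labels)
    return good / len(beat_labels)

labels = [
    fuse_beat_sqi(0.95, 0.95, 0.95, 0.95),  # all four above 0.9
    fuse_beat_sqi(0.95, 0.95, 0.95, 0.20),  # three of four above 0.9
    fuse_beat_sqi(0.60, 0.85, 0.90, 0.70),  # median rule
    fuse_beat_sqi(0.40, 0.85, 0.90, 0.70),  # direct correlation too low
]
ppg_sqi = window_sqi(labels)
```

The resulting `ppg_sqi` is the value that would be compared with SQIth for the alarm type in question.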
`
`False alarm reduction based on HR and SQI derived from
`PPG, ABP, and ECG
`Following our previous study, 12 we estimated the HRs
`and SQIs from PPG, ABP, and ECG to suppress false alarms.
`
Table 3
Distribution of alarms in training and test sets of subset 2.

            Training                               Test
Alarm type  False  True  Total  FA rate (%)        False  True  Total  FA rate (%)
Asystole     166     14    180  92.2                 94     11    105  89.5
EB            58    108    166  34.9                  4     63     67   6.0
ET            19    116    135  14.1                 18    104    122  14.8
VT           305    478    783  39.0                371    371    742  50.0
All          548    716   1264  43.4                487    549   1036  47.0

`
`described in our earlier work. 12 The maximum, minimum,
`and mean HR were also calculated for each of the seven HRs
`over a window that was centered on the current beat and
`included both neighboring beats. The resulting 21 HRs and
`corresponding SQIs were then used to suppress false alarms
`by varying the SQI thresholds to decide if the source data are
`trustworthy or not. Subset 2 was used to evaluate the
`algorithm at this step.
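
The count of 21 HR variables follows from simple enumeration: three source signals give seven non-empty combinations, and each fused HR yields a maximum, minimum and mean. The sketch below only enumerates the variable names; the naming pattern is a hypothetical convention modelled on the labels used in Table 5, and the Kalman-filter fusion itself is not reproduced here.

```python
from itertools import combinations

SOURCES = ("ECG", "ABP", "PPG")

def hr_source_combinations(sources=SOURCES):
    """All non-empty combinations of the source signals (7 for 3 sources)."""
    combos = []
    for r in range(1, len(sources) + 1):
        combos.extend(combinations(sources, r))
    return combos

def hr_feature_names(sources=SOURCES, stats=("max", "min", "mean")):
    """One name per (source combination, statistic) pair."""
    return [
        "HR_" + "_".join(combo) + "_" + stat
        for combo in hr_source_combinations(sources)
        for stat in stats
    ]

combos = hr_source_combinations()
names = hr_feature_names()
```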
`
`Machine learning for false alarm reduction
`A machine learning algorithm approach was used to learn
`the noise and signal relationships in each true and false VT
`alarm condition, which are the most difficult false alarms to
`suppress. Therefore, an extensive set of features were
`defined and a genetic algorithm (GA) was used to select
`pertinent features which were then presented to an RVM to
`classify VT alarms as true or false.
`
Variable choice. In total 114 variables (including 87
features and 27 SQI metrics) were extracted from the ECG,
ABP, PPG, and SpO2 signals within the 20-s analysis
window. The features included HR (extracted from ECG,
ABP, and PPG), systolic, diastolic, mean, and pulse blood
pressure, SpO2, amplitude of PPG, and area difference of
beat (ADB) with the mean area under the waveform of each
beat in the 20-s window of the ECG, ABP, and PPG. Each
feature except ADB has five sub-features calculated over
the 20-s window: maximum, minimum, median,
variance, and gradient (derived from a robust least squares
fit over the entire window). The ADB has only four
sub-features: the mean ADB of the five beats with the shortest
beat-to-beat intervals (ADBmean_top5), the maximum of the mean
ADB of five consecutive beats (ADBmax_mean5), the
variance (ADBvariance), and the robust least squares gradient
(ADBgradient) of beats in the 20-s window. The SQI metrics
of ECG included two metrics of inter-channel and
inter-algorithm comparisons of two QRS detectors, kurtosis of
ECG, spectral distribution of ECG and a fusion of these four
metrics. 12 The ABP SQI metrics included a signal
abnormality index with its nine sub-metrics 19 and the
DTW-based SQI fusion with its four sub-metrics, 14 which
was discussed above and was applied to the ABP signal as
well. The PPG SQI metrics included the DTW-based SQIs 14
and two Hjorth parameters 10 which estimated the
dominant frequency and half-bandwidth of the spectral
distribution of PPG.
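
The five window sub-features can be sketched as follows. The text specifies a robust least squares fit for the gradient but not which estimator; the Theil-Sen slope (median of pairwise slopes) used here is a swapped-in robust alternative, not the authors' method. The function names and the use of the population variance are likewise assumptions.

```python
import numpy as np

def theil_sen_slope(t, x):
    """Median of all pairwise slopes; a robust stand-in for the gradient."""
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    i, j = np.triu_indices(len(t), k=1)  # all index pairs with i < j
    dt = t[j] - t[i]
    valid = dt != 0                      # skip pairs with identical time stamps
    return float(np.median((x[j] - x[i])[valid] / dt[valid]))

def window_subfeatures(t, x):
    """Maximum, minimum, median, variance and gradient over one window."""
    x = np.asarray(x, dtype=float)
    return {
        "max": float(x.max()),
        "min": float(x.min()),
        "median": float(np.median(x)),
        "variance": float(np.var(x)),    # population variance (assumption)
        "gradient": theil_sen_slope(t, x),
    }

# Toy heart-rate series: a steady rise of 0.5 per second plus one artifact.
t = np.arange(20)
hr = 60 + 0.5 * t
hr[10] = 200.0
features = window_subfeatures(t, hr)
```

In this toy series the single artifact dominates the maximum and the variance but leaves the gradient at the underlying trend, which is the reason for preferring a robust fit.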
`
`Feature selection. Since it is unlikely that all 114 para-
`meters are useful (and in fact some may end up lowering the
`performance) a variable selection technique is required.
`Moreover, with a limited number of patterns from which to
`learn, it is important to keep the number of free parameters
`which we need to learn as low as possible. A genetic
`algorithm (GA) 20,21 was therefore used to select the optimal
subset of variables for true/false alarm classification. A genetic
algorithm is a general adaptive optimization search methodology
based on a direct analogy to the Darwinian principle of natural
selection (“survival of the fittest”) and to genetics in
biological systems. For the feature selection using GA, a
`
Fig. 1. Examples of false and true ventricular tachycardia alarms. Note the
vertical line marks the time the alarm sounded. (A and B) False alarms that
the algorithm failed to suppress. (C) A false alarm that was suppressed
correctly. (D) A true alarm that was accepted correctly by the algorithm.
`
`A 20-s analysis window (prior to the alarm onset) was used
`to calculate the HR and SQI. Seven beat-by-beat HRs were
`estimated by fusing all possible combinations of signals from
`the three source signals using SQIs and a Kalman filter as
`
Table 4
Performance of the PPG-based false alarm suppression algorithm.

Alarm type  Data set  #True  #False  TA suppression  FA suppression  SQI threshold
Asystole    Training     19     293  0 (0%)          236 (80.5%)     0.1
            Test         17     235  0 (0%)          203 (86.4%)
            Total        36     528  0 (0%)          439 (83.1%)
EB          Training    123      63  0 (0%)          59 (93.7%)      0.1
            Test        117      24  0 (0%)          20 (83.3%)
            Total       240      87  0 (0%)          79 (90.8%)
ET          Training    401      29  0 (0%)          3 (10.3%)       1
            Test        243      56  0 (0%)          15 (26.8%)
            Total       644      85  0 (0%)          18 (21.2%)
VT          Training    672     483  1 (0.15%)       8 (1.66%)       0.1
            Test        461     498  1 (0.21%)       10 (2.01%)
            Total      1133     981  2 (0.18%)       18 (1.83%)

`
`Fig. 2. ROC curves of 56 selected variables (with η=12) for training data
`using the RVM algorithm. The circle marks the operating point where no
`true alarm suppression occurs.
`
`of a multivariate logistic regression. The training set of subset
`2 was used and was split further into training and validation
`sets to train and evaluate the algorithm. A bootstrapping
`procedure was performed by running the logistic regression
`on the training set and evaluating the rMSE on the validation
`set. The GA selection was repeated 100 times and the selected
`variables were sorted by the frequency of selection. This
`ranking was then used as the order of priority in the machine
`learning module. The process was repeated with and without
`ABP features in order to indicate the performance of the
`algorithm on patients when the ABP line is not required.
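
The GA loop with the stated settings (population of 50, 2% mutation rate, 10% cloning rate, 45% cull rate, 100-generation limit) might look like the sketch below. It is a simplified illustration under several assumptions: the function `run_ga` and its selection scheme are hypothetical, and `toy_fitness` is a made-up stand-in for the validation rMSE of the logistic regression, which is not reproduced here.

```python
import numpy as np

def run_ga(fitness, n_genes, pop_size=50, n_generations=100,
           mutation_rate=0.02, clone_rate=0.10, cull_rate=0.45, seed=0):
    """Minimize `fitness` over binary chromosomes (1 = feature selected)."""
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, 2, size=(pop_size, n_genes), dtype=np.int8)
    n_clone = max(1, int(clone_rate * pop_size))
    n_keep = max(2, pop_size - int(cull_rate * pop_size))
    history = []
    for _ in range(n_generations):
        scores = np.array([fitness(c) for c in pop])
        order = np.argsort(scores)                  # lower score is better
        pop = pop[order]
        history.append(float(scores[order[0]]))
        survivors = pop[:n_keep]                    # the worst are culled
        children = [c.copy() for c in pop[:n_clone]]  # clones pass unchanged
        while len(children) < pop_size:
            pa, pb = survivors[rng.choice(n_keep, size=2, replace=False)]
            locus = rng.integers(1, n_genes)        # single crossover point
            child = np.concatenate([pa[:locus], pb[locus:]])
            flip = rng.random(n_genes) < mutation_rate
            children.append(np.where(flip, 1 - child, child))
        pop = np.array(children, dtype=np.int8)
    scores = np.array([fitness(c) for c in pop])
    return pop[np.argmin(scores)], float(scores.min()), history

# Toy fitness (assumption): fraction of genes that differ from a hidden mask.
target = np.random.default_rng(1).integers(0, 2, size=114)

def toy_fitness(chromosome):
    return float(np.mean(chromosome != target))

best, best_score, history = run_ga(toy_fitness, n_genes=114)
```

Repeating such a run many times with different seeds and counting how often each gene is set in the best chromosome gives the kind of selection-frequency ranking described above.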
`
`Machine learning algorithm choice. A Relevance Vector
`Machine is a sparse Bayesian model that provides probabi-
`listic predictions through Bayesian inference. 22-24 The
`central idea of RVM is to map a set of input data to a
`high-dimensional feature space through kernel functions and
`construct decision boundaries that separate the labeled data
`into their constituent classes by predicting the posterior
`probabilities of their class membership. Given a training data
`
set composed of N samples $\{x_i, y_i\}_{i=1}^{N}$ with input $x_i \in \mathbb{R}^M$ and
output $y_i \in \mathbb{R}$, the RVM algorithm aims at constructing a
function as shown in equation (2).
\[
y = w^{T}\phi(x) \tag{2}
\]
`
chromosome was defined to be a binary vector with the same
length as the number of features (114 elements long in our
scenario), each element (gene) representing one of the
features (with a “1” indicating that a feature is selected). A set
of randomly created chromosomes made up the
original generation, called a population. Then three operations,
called selection, crossover and mutation, were iterated
to generate the next generations until acceptable results were
obtained or a fixed number of generations had elapsed. In the
selection operation, a fitness function was used to pick the
chromosomes with better performance. In the crossover
operation, pairs of chromosomes (parents) were chosen
randomly to swap parts of their information (binary string) at
a randomly selected locus to produce their children.
Mutation was used to randomly flip the value of some single
bits within individual strings. An operator called clone was also
used to copy some well-performing parents to
the next generation without crossover or mutation. Because it
combines exploitation and exploration in its search, a GA can
deal with large search spaces efficiently, and
hence is less likely than other algorithms to settle on a locally
optimal solution. In our study, with a population of 50
chromosomes with 114 genes each, a 2% mutation rate, a 10%
cloning rate, a 45% cull rate for crossover and a
100-generation limit, the search space of possible variable
combinations was rapidly explored. The fitness function
that was minimized was the root mean squared error (rMSE)
Table 5
Performance and variable selections based on HR and SQI derived from PPG, ABP and ECG of subset 2.

Alarm type  Data set  No. of true alarms  No. of false alarms  Variable selections        SQI threshold  TA suppression  FA suppression
Asystole    Training    14                 166                 HRABP_mean                 0.9            0 (0%)          123 (74.1%)
            Test        11                  94                                                           0 (0%)          66 (70.2%)
            Total       25                 260                                                           0 (0%)          189 (72.7%)
EB          Training   108                  58                 HRECG_ABP_PPG_mean         0.1            0 (0%)          55 (94.8%)
            Test        63                   4                                                           0 (0%)          4 (100%)
            Total      171                  62                                                           0 (0%)          59 (95.2%)
ET          Training   116                  19                 HRECG_min, HRABP_PPG_min   0.6, 0.5       0 (0%)          12 (63.2%)
            Test       104                  18                                                           0 (0%)          5 (27.8%)
            Total      220                  37                                                           0 (0%)          17 (46.0%)
VT          Training   478                 305                 HRABP_PPG_mean             1.0            0 (0%)          8 (2.6%)
            Test       371                 371                                                           0 (0%)          16 (4.3%)
            Total      849                 676                                                           0 (0%)          24 (3.6%)

`
The posterior probability of model parameters,
$p(w, \alpha, \sigma^2 \mid t)$, can be decomposed into two components:
\[
p(w, \alpha, \sigma^2 \mid t) = p(w \mid t, \alpha, \sigma^2)\, p(\alpha, \sigma^2 \mid t) \tag{6}
\]
In-depth discussion concerning the calculation of these
probabilities can be found in Tipping 22,23 and Bishop. 24
To solve the two-class problem, the logistic sigmoid function,
$\sigma(y) = 1/(1 + e^{-y})$, is applied to $y(x)$.
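
Once the weights have been learned, prediction reduces to evaluating Eq. (2) on kernel basis functions and passing the result through the sigmoid. The sketch below assumes a Gaussian kernel of the form exp(-||x - x'||^2 / k^2) and an explicit bias term; the text only states that a Gaussian kernel of width k was used, so the exact kernel form, the function names and the toy weights are all assumptions. Training (the iterative estimation of the hyperparameters) is not shown.

```python
import numpy as np

def gaussian_kernel(X, centers, width):
    """Kernel matrix with entries exp(-||x - c||^2 / width^2) (assumed form)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / width ** 2)

def rvm_predict_proba(X, relevance_vectors, weights, bias, width):
    """Posterior probability of the positive class for each row of X."""
    phi = gaussian_kernel(X, relevance_vectors, width)  # basis functions
    y = phi @ weights + bias                            # linear model, Eq. (2)
    return 1.0 / (1.0 + np.exp(-y))                     # logistic sigmoid

# Toy model with two relevance vectors carrying opposite weights.
relevance_vectors = np.array([[0.0, 0.0], [10.0, 0.0]])
weights = np.array([2.0, -2.0])
X = np.array([[0.0, 0.0], [10.0, 0.0]])
proba = rvm_predict_proba(X, relevance_vectors, weights, bias=0.0, width=5.0)
```

Only the training samples that keep a non-zero weight (the relevance vectors) enter the sum, which is why a sparse model is cheap to evaluate.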
The main advantage of the RVM is that the inferred
predictors of RVM are sparse (they contain only relatively
few non-zero parameters) and provide a good generalization
performance. The majority of parameters are automatically
set to zero during the learning process, giving a procedure
that is extremely effective at discerning those basis functions
which are relevant for making good predictions. In
classification, the RVM outputs probabilities of class
membership rather than point estimates like the more
conventional support vector machine. The RVM therefore
provides a conditional distribution that allows the expression
of uncertainty in the prediction. 22,23 Providing a continuous
output also allows for the construction of a receiver operating
characteristic (ROC) curve, which can be important if
clinical staff wish to adjust the trade-off between TA and FA
suppression in a systematic way. (Current approaches to
adjusting alarms are rather ad hoc and the user is often
unaware of just how they are moving the alarm performance
around on the ROC curve.)
`Training the RVM was performed using an open-source
`algorithm. 25 Variables were ranked in order of frequency of
`selection by the GA and presented to the RVM for training
`and testing, beginning with the most frequently selected
`variable and then adding each less frequently selected
variable one by one. To minimize true alarm
suppression, a weighting factor η (η=1,2,3,…,30) was used
to weight the estimation error of true alarms during training;
each value was tried in turn to find the setting that gave the
best result while avoiding TA suppression. ROC curves were
generated for each combination of the number of variables
(from 1 to 114), η (from 1 to 30)
and the Gaussian kernel width k (between 50 and 60). To
generate the ROC curve, a TA acceptance threshold was
raised from the lowest output of the RVM to the highest, and
any output value above the threshold was accepted as a TA.
The area under the ROC curve (AUROC) and classification
accuracy were used to select the best model. The operating
point at which the sensitivity was 1 (no TA
suppression) and the specificity was maximal (maximum
FA suppression) was then used to classify the test set.
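
The threshold sweep and the choice of the zero-TA-suppression operating point can be sketched as below. This is an illustration under stated assumptions: the function names are hypothetical, outputs at or above the threshold are treated as accepted (the text says "above"), and the toy scores are invented.

```python
import numpy as np

def roc_points(scores, labels):
    """(threshold, sensitivity, specificity) for each distinct output value.

    labels: True for a true alarm, False for a false alarm.
    An alarm is accepted as true when its score is >= the threshold.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    points = []
    for th in np.unique(scores):                     # ascending thresholds
        accepted = scores >= th
        sens = (accepted & labels).sum() / labels.sum()       # TAs kept
        spec = (~accepted & ~labels).sum() / (~labels).sum()  # FAs suppressed
        points.append((float(th), float(sens), float(spec)))
    return points

def zero_ta_suppression_point(scores, labels):
    """Operating point with sensitivity 1 and the highest specificity."""
    candidates = [p for p in roc_points(scores, labels) if p[1] == 1.0]
    return max(candidates, key=lambda p: p[2])

# Toy classifier outputs for three false and three true alarms.
scores = [0.05, 0.2, 0.3, 0.4, 0.6, 0.9]
labels = [False, False, True, False, True, True]
threshold, sensitivity, specificity = zero_ta_suppression_point(scores, labels)
```

Here the lowest-scoring true alarm sets the threshold, so every false alarm scoring below it is suppressed while no true alarm is lost on the data used to choose the point.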
`
`Fig. 3. The sensitivities and specificities for all variable selections using the
`RVM algorithm.
`
where $w = (w_1, \ldots, w_M)^T$ is the weight vector and $\phi(x)$ is a
non-linear mapping function (basis function).
When attempting to calculate $w$ from the training set, we
assume that each target $t_i$ is representative of the true model
$y_i$, but with the addition of noise $\varepsilon_i$:
\[
t_i = y_i + \varepsilon_i = w^{T}\phi(x_i) + \varepsilon_i \tag{3}
\]
where $\varepsilon_i$ is assumed to be independent Gaussian distributed
with zero mean and variance $\sigma^2$, $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$.
The likelihood function of the observed data set can be
written as:
\[
p(t \mid w, \sigma^2) = (2\pi\sigma^2)^{-N/2} \exp\left\{ -\frac{1}{2\sigma^2} \lVert t - \Phi w \rVert^2 \right\} \tag{4}
\]
where $\Phi$ is the $N \times M$ design matrix whose $i$th row
represents the vector $\phi(x_i)$.
Under a Bayesian perspective, model parameters $w$ and
$\sigma^2$ can be estimated by first assigning prior distributions to
the parameters and then estimating their posterior distribution
using the likelihood of the observed data. In the
formulation of RVMs, Tipping 22,23 proposed a prior
conditional distribution for each free parameter of the form:
\[
p(w \mid \alpha) = \prod_{i=0}^{N} \mathcal{N}(0, \alpha_i^{-1}) \tag{5}
\]
where $\alpha = [\alpha_0, \ldots, \alpha_i, \ldots, \alpha_N]^T$ is the vector of the RVM
hyperparameters, each $\alpha_i$ being the inverse variance of the
corresponding $w_i$, which should be iteratively estimated from the data.

`
Table 6
Machine learning alarm suppression results.

Alarm type  Data set  Threshold  Weighting factor (η)  No. of variables  TA suppression  FA suppression
VT          Training  0.01       25                     96               3 (0.63%)       91 (29.84%)
                      0          24                     69               0 (0%)          79 (25.90%)
                      0          25                    110               0 (0%)          66 (21.64%)
            Test      0.01       25                     96               3 (0.81%)       113 (30.46%)
                      0.01       24                     69               3 (0.81%)       82 (22.10%)
                      0          25                    110               0 (0%)          73 (19.68%)

`
`Results
`
`False alarm reduction based on PPG
`
`Table 4 details the best performance of applying our PPG
`SQI algorithm to suppressing false alarms in subset 1 after
`varying SQIth between 0 and 1 in increments of 0.1. The aim
`was to minimize TA suppression (ideally zero) and afterward
maximize FA suppression. Only one TA was suppressed in each of
the training and test sets (both for VT alarms). The overall FA
suppression rates were 83.1% for asystole (80.5% on the
`training set and 86.4% on the test set), 90.8% for EB (93.7%
`on training and 83.3% on test), and 21.2% for ET (10.3% on
`training and 26.8% on test). However, FA suppression for
`VT was low, only 1.83% (1.66% on training and 2.01% on
`test), making the algorithm of marginal use for VT false
`alarm suppression.
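
The threshold search described above (SQIth varied from 0 to 1 in steps of 0.1, minimizing TA suppression first and then maximizing FA suppression) can be sketched as follows. The suppression rule itself is not spelled out in the text, so the per-alarm flag `contradicts` (the PPG disagrees with the alarm) is a hypothetical placeholder, as are the function name and the toy data. Ties are resolved in favor of the lowest threshold, which is also an assumption.

```python
def choose_sqi_threshold(alarms, thresholds=None):
    """Pick SQIth that minimizes suppressed TAs, then maximizes suppressed FAs.

    alarms: list of (sqi, contradicts, is_true_alarm) tuples, where
    `contradicts` is a hypothetical flag meaning the PPG disagrees with the
    alarm. An alarm is suppressed when contradicts is set and sqi >= SQIth.
    Returns (threshold, n_true_suppressed, n_false_suppressed).
    """
    if thresholds is None:
        thresholds = [i / 10 for i in range(11)]   # 0.0, 0.1, ..., 1.0
    best = None
    for th in thresholds:
        ta = sum(1 for sqi, contra, true in alarms
                 if true and contra and sqi >= th)
        fa = sum(1 for sqi, contra, true in alarms
                 if not true and contra and sqi >= th)
        key = (ta, -fa)                            # fewer TAs first, then more FAs
        if best is None or key < best[0]:
            best = (key, th, ta, fa)
    _, th, ta, fa = best
    return th, ta, fa

# Toy alarms: (PPG SQI, PPG contradicts alarm, alarm is true).
alarms = [
    (1.0, True, False),
    (0.8, True, False),
    (0.5, True, True),    # a true alarm that a low threshold would suppress
    (0.9, False, True),
    (0.3, True, False),
]
result = choose_sqi_threshold(alarms)
```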
`
`False alarm reduction based on HR and SQI derived from
`PPG, ABP, and ECG
`
Table 5 provides the best performance and the HR
variable selections and SQI thresholds on the training set
of subset 2 which provided minimal (i.e., no) TA
suppression for all types of alarms. The mean HR from
the ABP (HRABP_mean) with SQIth=0.9 gave 74.1%
asystole FA suppression. The fused mean HR from all
three waveforms (HRECG_ABP_PPG_mean), with SQIth=0.1,
gave a 94.8% EB FA suppression rate. The minimum HR
from ECG (HRECG_min) and fused ABP and PPG
(HRABP_PPG_min) gave an ET FA suppression of 63.2%
with SQIth=0.6 and 0.5 respectively. The fused mean HR
from the ABP and PPG (HRABP_PPG_mean) with SQIth=1.0
provided the best FA suppression result for VT (2.6%).
Using the selected HRs and SQI thresholds, we tested the
performance of false alarm suppression on the test set. The
performance on the test set is shown in Table 5 as well. There
was no TA suppression for any type of alarm. The FA
suppression rate was 70.2%, 100%, 27.8%, and 4.3% for
asystole, EB, ET, and VT respectively.
`
`False alarm reduction based on machine learning
`
Four variables were selected by the GA every time it was run.
These were the maximum of the mean ADB of five
consecutive beats from the ECG (ADBmax_mean5_ECG), the
variance of ADB from ECG (ADBvariance_ECG) and two SQI
features from the ABP and PPG. There were 23 variables that
were selected in more than 50% of the GA runs and 41
variables that were selected in fewer than 10% of the runs.

ROC curves for different variable selections and weighting
factors (η) were created by the RVM on the training set. From
Fig. 2, we can see that, for example, when 56 input variables
were selected and η=12, the sensitivity was 1.0 (no TA
suppression) and the specificity was 0.54 (54% FA
suppression) on the training set. The corresponding test set
sensitivity was 0.97 and specificity was 0.48. The sensitivity
and specificity curves of all variable selections are
shown in Fig. 3, obtained by selecting the model with the maximum
AUROC for each specific variable selection.
`We compared the false alarm suppression rate for
`different thresholds of acceptable true alarm suppression
`rate. The best performance was achieved, as shown in
`Table 6, with a FA suppression rate of more than 30%
`(29.84% on training, 30.46% on test data) with a TA
`suppression rate below 1%. To reduce the TA suppression
`rate to zero, a reduction in FA performance to 20% was
`required (21.64% on training, 19.68% on test data).
To assess the variability of the performance of the
algorithm, a 10-fold cross validation was performed on the data
set (combining the training and test sets). Selecting the model
with 110 features, η=25 and k=57, with the operating point of
no TA suppression on each training set, the average FA
suppression rate on the training set was 21.34%±5.81%. The
average TA and FA suppression rates were 0.59%±0.62% and
20.30%±8.6% respectively on the validation set.
When the ABP features were removed, 60 variables
remained. The results (shown in Table 7) reveal that 33 selected
variables provided the highest FA suppression (21% for both
training and test sets) and the least TA suppression (0.2% and
0.8% for training and test sets respectively).
`
`Discussion
`
`The best alarm suppression performances achieved were
`as follows: asystole—FA suppression rate of 83.1% (80.5%
`on training, 86.4% on test) using the PPG only; EB—95.2%
`(94.8% training, 100% test) and ET—46.0% (63.2%
`training, 27.8% test) using the fusion of ECG, ABP and
`PPG features. No TA suppression was found in training or
testing for these results. However, it should be noted that the
number of EB false alarms in the test set of subset 2 is low (only
four), so the EB result should be regarded with caution. For
`VT, the best performance on the independent test set was
`achieved by the RVM, with a FA suppression rate of over
`30% and a TA suppression rate below 1% using 96 features
`derived from the ECG, ABP and PPG. To reduce the TA
`suppression rate to zero, a reduction in the FA suppression
`
Table 7
Machine learning alarm suppression results without ABP features.

Alarm type  Data set  Threshold  Weighting factor (η)  No. of variables  TA suppression  FA suppression
VT          Training  0.01       16                    33                1 (0.21%)       63 (20.66%)
                      0          27                    58                0 (0%)          48 (15.74%)
                      0          29                    42                0 (0%)          15 (4.92%)
            Test      0.01       16                    33                3 (0.81%)       77 (20.75%)
                      0.01       27                    58                3 (0.81%)       55 (14.82%)
                      0          29                    42                0 (0%)          19 (5.12%)

`
rate to 20% was required. It should be noted that if small
increases in the TA suppression rate are allowed (perhaps
3%), then our approach can achieve FA suppression rates
above 60% for VT alarms (see Fig. 3). However, VT
false alarms are the most difficult to suppress without
causing any true alarm suppression. This is generally
because low rate VT appears to have fairly normal pump
action. However, one might argue that if the pump function
is fine, then perhaps suppression of such alarms is
acceptable. Conversely, if the noise on the ABP and PPG
is coincident with the VT-like noise on the ECG, which is
often the case, then it is impossible to suppress such alarms.
Fig. 1(A) and (B) illustrate examples where our algorithm
failed to suppress the false alarms because the signal quality
of the ABP and PPG was too low to be used to suppress the
false alarm. Despite these limitations, our results represent
the best reported so far in the literature on VT alarms (in
terms of the trade-off between TA and FA suppression). We
note that the best previously reported results on VT alarms
were by Aboukhalil et al. 1 and Sayadi and Shamsollahi 13
who achieved VT FA suppression rates of 33.0% and 66.7%
respectively. However, their TA suppression rates (9.4% and
3.8%) are clearly too high to make their algorithms useful for
this category of alarm. Moreover, the latter work 13 also
required additional signals not often recorded in the intensive
care unit, such as the central venous pressure. In contrast, the
approach described in the current work is immediately
applicable to real-time monitoring on the entire ICU
population. Additionally, the work described by Sayadi
and Shamsollahi 13 is computationally intensive and requires
a powerful modern desktop computer to run in near real time.
The latency of their approach would exceed the recommended
10-s period.
`We note however, that the latter work app