`
`quality assessment of pulsatile signals
`
`Q Li1,2 and G D Clifford2
`
`1Institute of Biomedical Engineering, School of Medicine, Shandong University,
`
`Jinan, Shandong, 250012, China
`2 Institute of Biomedical Engineering, Department of Engineering Science,
`
`University of Oxford, Oxford, OX1 3PJ, UK
`
`E-mail: gari@robots.ox.ac.uk
`
`Abstract
`
`In this work we describe a beat-by-beat method for assessing the clinical utility of
`
`pulsatile waveforms, primarily recorded from cardiovascular blood volume or
`
`pressure changes, concentrating on the photoplethysmogram (PPG). Physiological
`
`blood flow is nonstationary, with pulses changing in height, width and morphology
`
`due to changes in heart rate, cardiac output, sensor type and hardware or software
`
pre-processing requirements. Moreover, considerable inter-individual and
sensor-location variability exists. Simple template matching methods are therefore
inappropriate, and a patient-specific adaptive initialization is required. We
`
`introduce dynamic time-warping (DTW) to stretch each beat to match a running
`
`template and combine it with several other features related to signal quality, including
`
`correlation and the percentage of the beat that appeared to be clipped. The features
`
`were then presented to a multi-layer perceptron (MLP) neural network to learn the
`
`relationships between the parameters in the presence of good and bad quality pulses.
`
`An expert-labelled database of 1055 segments of PPG, each 6 seconds long, recorded
`
`from 104 separate critical care admissions during both normal and verified
`
`arrhythmic events, was used to train and test our algorithms. An accuracy of 97.5%
`
on the training set and 95.2% on the test set was found. The algorithm could be deployed
`
`as a stand-alone signal quality assessment algorithm for vetting the clinical utility of
`
`PPG traces or any similar quasi-periodic signal.
`
`Keywords: artificial neural network, dynamic time warping, machine learning,
`
`multi-layer perceptron, photoplethysmograph, pulsatile signal, signal quality
`
`assessment.
`
`
`APPLE 1074
`Apple v. AliveCor
`IPR2021-00972
`
`
`
`
`
`1. Introduction
`
`
`
The photoplethysmogram (PPG) is used not only to derive arterial oxygen
saturation (SaO2) and heart rate (HR), but also as a simple, low-cost means of detecting
blood volume changes in the microvascular bed of tissue, estimating blood pressure, cardiac
output and respiration rate, and assessing vascular function (Allen 2007). However, the
`
`PPG signal is easily disturbed by poor blood perfusion, ambient light and motion artefact
`
`(Hayes and Smith 1998, 2001). Such artefacts give rise to errors in interpretation of the PPG
`
`signals in clinical physiological measurements, and can lead to numerous false alarms. In a
`
recent study by Monasterio et al (2012), apnea-related false desaturation alarm rates were
`
`shown to be as high as 85%.
`
`Many signal processing methods have been used to suppress the artefacts, such as
`
`moving average filtering (Lee et al 2007), adaptive filtering (Graybeal and Petterson 2004,
`
`Chan and Zhang 2002, Relente and Sison 2002), wavelet transform (Sukanesh and Harikumar
`
`2010, Addison and Watson 2010, Lee and Zhang 2003), independent component analysis
`
`(Kim and Yoo 2006, Yao and Warren 2005, Krishnan et al 2008a), high order statistics
`
`(Krishnan et al 2008b) and singular value decomposition (Reddy and Kumar 2007). However,
`
`the signal processing methodologies suffer from a lack of generality imposed by the implicit
`
`assumption that artefact corruption manifests itself as an additional signal component
`
`unrelated to the physiology either in the time, frequency or statistical domains (Hayes and
`
Smith 2001). An alternative approach is to assess the signal quality of the PPG waveform and
analyze only good quality pulses. (Of course, the presence of poor quality
`
`waveforms can be considered useful information, such as a metric of physical activity, but the
`
`associated physiological information cannot be trusted.) Sukor et al (2011) used a waveform
`
`morphology analysis method to evaluate PPG signal quality when induced motion artefact
`
`occurred. By comparing with a manually annotated gold standard, the mean sensitivity,
`
`specificity, and accuracy for beat detection were 89 ± 11%, 77 ± 19%, and 83 ± 11%
`
`respectively on 104 fingertip PPG signals, acquired from 13 healthy people, conducted in a
`
`laboratory environment, containing varying degrees of purposely induced motion artefact. Gil
`
`et al (2010) and Monasterio et al (2012) used Hjorth parameters to assess PPG signal quality
`
and Deshmane (2009) applied this to the suppression of false electrocardiogram (ECG)
arrhythmia alarms in intensive care monitors. Although the Hjorth parameters provided an adequate
`
`method for identifying high quality data segments, during arrhythmias the Hjorth parameters
`
`often identified PPG data associated with an arrhythmia as poor quality PPG. Moreover, the
`
`Hjorth parameters require a window much larger than a single beat, so temporal resolution is
`
`limited.
`
In this article, we describe a novel beat-by-beat PPG signal quality metric which uses a
multilayer perceptron (MLP) neural network to combine several individual signal quality
metrics and physiological context to provide a probability of a pulse being acceptable for
monitoring. One important component of our approach is the construction of an
individual-specific template of an average beat. Dynamic time warping (DTW) (Keogh and
Ratanamahatana 2005) was used to cope with the normal short-term nonstationary and
`
`nonlinear changes in height, width and overall morphology of each pulse due to changes in
`
`
`
`
`
`
`heart rate, cardiac output, manufacturer-specific hardware responses of sensors or software
`
`pre-processing requirements. (In the latter case, automatic changes in light intensity, amplifier
`
`gain or averaging may cause unusual distortions.) Furthermore, differences in individual
`
recording modalities (such as sensor location or method of attachment to the patient) and intra-
`
`and inter-individual variability in skin and cardiovascular state can lead to large differences in
`
`initial morphologies and dynamic changes. Simple template matching methods are therefore
`
`inappropriate, and an adaptive method of initializing on a given recording set-up, and tracking
`
`the changes over time is therefore required. For this reason, DTW has previously been
`
`employed in ECG segmentation and classification (Vullings et al 1998, Huang and Kinsner
`
2002). In this work, we use DTW in a similar way to apply a nonlinear temporal
`
`stretching to fit the changing PPG beat with a dynamic beat template.
`
`
`
`2. Methods
`
A database of 1055 expert-labelled PPG segments drawn from 104 separate critical care recordings
was used to develop the algorithm described in this work. For each recording, a template was
first formed from the average of the beats in a 30 second window of the PPG waveform. The template
was then updated by each new beat that was accepted (i.e. had an SQI above a given threshold). The
`
`degree of similarity between a given beat and a running template was then used as an index of
`
`signal quality.
`
` However, since the DTW can fail in unexpected ways, it is not sufficient to just use this
`
`approach. A direct beat matching method without any preprocessing and also a matching
`
`based on linear resampling of the beat (to stretch or compress the beat to fit the length of the
`
`template) were also used. The correlation coefficient between the beat and the template was
`
`used as the signal quality index (SQI). Although the correlation coefficient can give a general
`
`match, it is insensitive to amplitudes, and indiscriminately accepts random square-wave noise.
`
`A clipping detection algorithm was therefore employed to detect the percentage of saturation
`
`to maximum or minimum value within each beat. These four measures of quality were then
`
`combined using a machine learning algorithm approach, which is described by Clifford et al
`
`(2011). Essentially, we learn the relationship between each of the signal quality measures by
`
`presenting the machine learning algorithm with hundreds of examples of high and low quality
`
beats, and training the algorithm to classify the beats as high or low quality. This leads to a
multivariate threshold that is determined empirically from the labelled examples.
`
`
`
`2.1. Beat detection
`
`Beat detection was performed using wabp.c (an open source ABP beat detector (Zong et al
`
`2003) from www.physionet.org) with a time and amplitude threshold adjustment to fit PPG
`
beat width and height. Specifically, we changed the slope width of the rising edge of the beat from
130 ms to 170 ms and extended the eye-closing period after each detected beat from 250 ms to
340 ms to avoid double-detection of the possible secondary peak of a PPG beat. The length of
`
`a PPG beat was delimited by the fiducial marks at the onset of the current beat and the onset
`
`of the next beat. If no beat was found 3 seconds after the onset of any given beat, then the end
`
`of the beat window was truncated to 3 seconds.
`
`
`
`
`
`
`
`
`2.2. Initial template generation
`
`A PPG beat template was initially generated by averaging every beat in a window of 30
`
`seconds. The PPG signals are assumed to be quasi-periodic, and so autocorrelation of each 30
`
`seconds of data was taken and the length (L) between two main peaks of the autocorrelation
`
`sequence was used to determine the average period of PPG beats. The length of the PPG
`
template was then set to be L. To derive the first template (T1) we averaged all the beats in the
30 s window, with each beat beginning at the fiducial mark (onset of the beat) and ending at
the length of the template. The correlation coefficients (C) between T1 and each beat in the
30 s window were then calculated (Clifford 2002). Any beat with C < 0.8 was removed from
the template, and the average beat was recalculated from the remaining beats to generate the
second template (T2). If more than half of the beats were removed by this process, T2 was
deemed untrustworthy, and the template from the previous window was used instead. If no
previous window was available, the next 30 seconds were used. Template updating can then be
`
`performed on a beat-by-beat basis, but only after classification of a new incoming beat is
`
`performed, which requires several other beat analysis metrics first as described below.
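The two-pass template construction described above can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in this work: it assumes that the beat onsets and the template length L (from the autocorrelation step) are already available, and the function names `pearson`, `average_beats` and `build_template` are hypothetical.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a flat segment is treated as uncorrelated
    return sxy / (sxx * syy) ** 0.5


def average_beats(beats):
    """Sample-by-sample mean of equal-length beats."""
    return [sum(col) / len(col) for col in zip(*beats)]


def build_template(signal, onsets, length, c_min=0.8):
    """Return (template, kept); template is None when T2 is untrustworthy."""
    # Cut one beat of `length` samples at every onset that leaves enough room.
    beats = [signal[o:o + length] for o in onsets if o + length <= len(signal)]
    t1 = average_beats(beats)                        # first template, T1
    kept = [pearson(t1, b) >= c_min for b in beats]  # drop beats with C < 0.8
    good = [b for b, k in zip(beats, kept) if k]
    if 2 * len(good) < len(beats):                   # more than half removed
        return None, kept
    return average_beats(good), kept                 # second template, T2
```

When `build_template` returns `None`, the caller would fall back to the template from the previous window or move on to the next 30 seconds, as described in the text.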
`
`
`
`2.3. Dynamic time warping of PPG beat
`
`As described earlier, a nonlinear time-base stretching of each beat is sometimes required
`
`before correlating to the beat template, in order to allow for nonlinear and nonstationary
`
`changes in the beat morphology. This was achieved through DTW. Suppose we have two time
`
`series, T and B, of length n and m, respectively, where
`

$$T = t_1, t_2, \ldots, t_i, \ldots, t_n \qquad (1)$$

$$B = b_1, b_2, \ldots, b_j, \ldots, b_m \qquad (2)$$

`
To align two sequences using DTW, an n-by-m distance matrix (D) is constructed where
the (i, j)th element of the matrix contains the distance $d(t_i, b_j)$ between the two points $t_i$ and $b_j$.
Each matrix element (i, j) corresponds to the alignment between the points $t_i$ and $b_j$. The aim
of DTW is to find an optimal path from (0, 0) to (n, m) that minimizes the cumulative distance
`
`of the path.
`
`Defining T as the template of PPG and B as a PPG beat, we first transform the template
`
`and the beat to short line sequences using a piecewise linear approximation (PLA) algorithm
`
(Koski 1996). The distance between each short line pair, $d(t_i, b_j)$, is then defined as the
absolute difference between the slopes of each short line. A cumulative distance up to lines i
and j, $c_{i,j}$, is then defined by:

$$c_{i,j} = \min \left\{ \begin{array}{l} c_{i-1,j-1} + d(t_i, b_j)\,\bigl(l(t_i) + l(b_j)\bigr) \\ c_{i-1,j} + d(t_i, b_j)\, l(t_i) \\ c_{i,j-1} + d(t_i, b_j)\, l(b_j) \end{array} \right. \qquad (3)$$

where $l(t_i)$ and $l(b_j)$ are the durations of lines $t_i$ and $b_j$ in the time series. The optimal path can be
`achieved by selecting the path with the minimum cumulative distance. Figure 1 shows an
`
`example of the PPG template and beat sequences, optimal warping path and the resulting
`
`alignment.
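As an illustration of the cumulative distance recursion, the following Python sketch fills the cost matrix and backtracks the optimal warping path. It assumes each short line from the piecewise linear approximation has already been reduced to a (slope, duration) pair, and that a diagonal step is weighted by both durations while a step along one sequence is weighted by that sequence's duration only. The function names and the 1-based indexing with a boundary row and column are our own choices, not taken from the original implementation.

```python
def dtw_cost(t_lines, b_lines):
    """Cumulative-distance matrix for lists of (slope, duration) pairs."""
    inf = float("inf")
    n, m = len(t_lines), len(b_lines)
    # c[i][j] uses 1-based line indices; row 0 and column 0 are the boundary.
    c = [[inf] * (m + 1) for _ in range(n + 1)]
    c[0][0] = 0.0
    for i in range(1, n + 1):
        slope_t, len_t = t_lines[i - 1]
        for j in range(1, m + 1):
            slope_b, len_b = b_lines[j - 1]
            d = abs(slope_t - slope_b)  # distance between the two short lines
            c[i][j] = min(
                c[i - 1][j - 1] + d * (len_t + len_b),  # advance both
                c[i - 1][j] + d * len_t,                # advance template only
                c[i][j - 1] + d * len_b,                # advance beat only
            )
    return c


def warping_path(c):
    """Backtrack from (n, m) to (1, 1) along the minimum cumulative distance."""
    i, j = len(c) - 1, len(c[0]) - 1
    path = [(i, j)]
    while (i, j) != (1, 1):
        steps = [
            (c[i - 1][j - 1], i - 1, j - 1),
            (c[i - 1][j], i - 1, j),
            (c[i][j - 1], i, j - 1),
        ]
        # Only predecessors inside the matrix are valid.
        _, i, j = min(s for s in steps if s[1] >= 1 and s[2] >= 1)
        path.append((i, j))
    return path[::-1]
```

The returned path lists which template line is aligned with which beat line; resampling the beat along this path gives the warped beat that is later correlated with the template.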
`
`
`
`
`
`
Figure 1. An example of the DTW procedure. (a) The PPG beat template (T, bold line) and a
PPG beat (B, soft line). (b) To align T and B, a warping matrix was constructed; the
optimal warping path is shown with solid squares. (c) The resulting alignment.
`
`
`
`
`
`2.4. Signal quality metrics for PPG
`
`
`
`Four individual SQIs were initially defined as follows.
`
`
`
2.4.1. Direct matching SQI. We selected the samples of each beat within the 30 s
window, beginning at the fiducial mark and ending at the length of the template (L). We then
calculated the correlation coefficient with the template as the direct matching SQI (SQI1). We
set any negative correlation coefficient to zero, so the value of
SQI1 ranges between 0 and 1 inclusive.

2.4.2. Linear resampling SQI. We selected each beat between two fiducial marks and linearly
stretched (if the beat was shorter than L) or compressed (if it was longer) the beat to the
length of the template. We then calculated the correlation coefficient as the linear resampling SQI
(SQI2). Again, negative values were set to zero.

2.4.3. Dynamic time warping SQI. Using DTW, we resampled the beat to length L and
calculated the correlation coefficient as the dynamic time warping SQI (SQI3). Negative
values were again set to zero.
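The correlation-based SQIs share the same final step: correlate a length-L version of the beat with the template and set negative values to zero. The Python sketch below illustrates this for the linear resampling case. The helper names are hypothetical, and simple linear interpolation is assumed for the stretching and compression, since the interpolation scheme is not specified here.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a flat segment is treated as uncorrelated
    return sxy / (sxx * syy) ** 0.5


def resample_linear(beat, length):
    """Linearly interpolate `beat` onto `length` evenly spaced points."""
    if length == 1 or len(beat) == 1:
        return [beat[0]] * length
    scale = (len(beat) - 1) / (length - 1)
    out = []
    for k in range(length):
        pos = k * scale
        lo = min(int(pos), len(beat) - 2)  # left neighbour of the new sample
        frac = pos - lo
        out.append(beat[lo] * (1 - frac) + beat[lo + 1] * frac)
    return out


def linear_resampling_sqi(template, beat):
    """Correlation of the resampled beat with the template, floored at 0."""
    warped = resample_linear(beat, len(template))
    return max(0.0, pearson(template, warped))
```

The direct matching SQI would use the same `pearson` and flooring step on the first L samples of the beat, and the DTW SQI would replace `resample_linear` with the DTW-based resampling.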
`
`
`
2.4.4. Clipping detection SQI. Periods of saturation to a maximum or a minimum value
were determined within each beat. A hysteresis threshold (of 1 normalized unit) was defined as
the smallest fluctuation that should be ignored. Samples within such periods of saturation are
defined to be ‘clipped’. The percentage of the beat that is not clipped is defined to be the
clipping detection SQI (SQI4).
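A minimal Python sketch of a clipping measure of this kind is given below. It is only one possible reading of the description: it assumes that the saturation levels (`lo_rail`, `hi_rail`) are known in advance, treats any sample within the hysteresis tolerance of either level as clipped, and returns the unclipped proportion as a fraction between 0 and 1 so that it is on the same scale as the correlation-based SQIs. All names are hypothetical.

```python
def clipping_sqi(beat, lo_rail, hi_rail, tol=1.0):
    """Fraction of samples in `beat` that are not saturated.

    A sample counts as clipped when it lies within `tol` of either rail
    (assumed reading of the hysteresis threshold).
    """
    clipped = sum(1 for x in beat if x - lo_rail <= tol or hi_rail - x <= tol)
    return 1.0 - clipped / len(beat)
```

For example, a beat of eight samples in which five sit on either rail gives an SQI of 0.375.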
`
`
`
`
`
`
`
`2.5. Data Sources
`
As no annotated PPG database has been published, we trained and evaluated our algorithm
`
`using an annotated PPG dataset developed by the PhysioNet team (Goldberger et al 2000)
`
`taken from the MIMIC II database (Saeed et al 2002). The dataset includes 1437 signal
`
`quality annotations of each channel including ECG, arterial blood pressure (ABP) and PPG
`
from 104 independent adult critical care stays. Two independent annotators graded the signal
quality based on the waveform around the time at which the monitor's arrhythmia alarm occurred.
`
`Disagreements were adjudicated by a third expert. There are two types of arrhythmia alarm in
`
`the dataset: asystole and ventricular tachycardia (VT). The types of annotation for signal
`
`quality were: good (1), bad (0) and uncertain (other). We selected only the annotations with a
`
`value of 1 (good) or 0 (bad) to be used in this study. The distribution of these annotations for
`
`the dataset is shown in table 1.
`
Data were then split into separate training and test sets. Patients in the dataset were
sorted in ascending order of the number of annotations they possessed, and every odd
numbered patient (in the sorted list) was placed in the training set and every even numbered
`
`patient in the test set. Each set therefore had an equal number of patients (52) and an
`
`approximately equal number of annotations, as shown in table 2.
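The patient-wise split can be sketched in Python as follows. The function name and the use of the patient identifier to break ties between patients with equal annotation counts are assumptions made for the example; the tie-breaking rule is not specified in this work.

```python
def split_patients(annotation_counts):
    """Split patient IDs into (train, test) by alternating over a sorted list.

    `annotation_counts` maps a patient ID to its number of annotations.
    """
    # Ascending number of annotations; ties broken by ID (assumption).
    ordered = sorted(annotation_counts, key=lambda p: (annotation_counts[p], p))
    train = ordered[0::2]  # 1st, 3rd, 5th, ... patient in the sorted list
    test = ordered[1::2]   # 2nd, 4th, 6th, ...
    return train, test
```

Splitting by patient rather than by annotation keeps all segments from one admission in the same set, so the test set remains unseen at the patient level.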
`
`
`
Table 1. Summary of the expert annotations in the dataset.

                        PPG annotations
Alarm type   Patients   Good   Bad   Uncertain   Total   Used (Good + Bad)
Asystole     54         177    75    97          349     252
VT           88         648    155   285         1088    803
Total        104        825    230   382         1437    1055

`
Table 2. Summary of the annotations in training and test datasets.

Dataset    Good quality   Bad quality   Total
Training   427            127           554
Test       398            103           501
Total      825            230           1055
`
`
`
`2.6. Data fusion approaches
`
`
`
`Two methods for fusing the signal quality information were compared; one based on simple
`
`logic, and one using an optimized multivariate classifier (the MLP).
`
`
`
2.6.1. Simple heuristic fusion of the SQI metrics. The four signal quality indices were fused
`
`into one (qSQI) and used to classify each beat in the dataset. The fusion equation was
`
`constructed in an ad hoc manner as follows:
`

$$\mathrm{qSQI} = \left\{ \begin{array}{ll} \text{Excellent (E)} & \text{if all of the 4 } \mathrm{SQI}_i \ge 0.9 \\ \text{Acceptable (A)} & \text{if 3 of the 4 } \mathrm{SQI}_i \ge 0.9 \ \text{OR} \\ & \text{all of the 4 } \mathrm{SQI}_i \ge 0.7 \ \text{OR} \\ & \operatorname{median}(\mathrm{SQI}_1, \mathrm{SQI}_2, \mathrm{SQI}_3) \ge 0.8 \text{ and } \mathrm{SQI}_1 \ge 0.5 \text{ and } \mathrm{SQI}_4 \ge 0.7 \\ \text{Unacceptable (U)} & \text{otherwise} \end{array} \right. \qquad (4)$$

`
`where the coefficients 0.9, 0.8, 0.7 and 0.5 are arbitrary and set empirically through trial and
`
`error. Although these coefficients could be optimized, it is unlikely that the logic is optimal,
`
`and so an exhaustive search of possible logical combinations and thresholds was not
`
`performed. Rather, qSQI was defined to provide a baseline for a more principled approach. To
`
`convert the categorical outputs to numerical outputs, we mapped E or A to a value of unity,
`
`and U to a value of zero.
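The heuristic fusion can be sketched in Python as below. The rule set encoded here is our reading of eq. (4): a beat is excellent if all four SQIs are at least 0.9, and acceptable if three of the four are at least 0.9, or all four are at least 0.7, or the median of SQI1 to SQI3 is at least 0.8 with SQI1 at least 0.5 and SQI4 at least 0.7. The grouping of the last condition is an assumption, and the function names are hypothetical.

```python
from statistics import median


def q_sqi(sqi1, sqi2, sqi3, sqi4):
    """Return 'E', 'A' or 'U' for one beat from its four SQI values."""
    sqis = [sqi1, sqi2, sqi3, sqi4]
    if all(s >= 0.9 for s in sqis):
        return "E"
    acceptable = (
        sum(s >= 0.9 for s in sqis) >= 3
        or all(s >= 0.7 for s in sqis)
        or (median([sqi1, sqi2, sqi3]) >= 0.8 and sqi1 >= 0.5 and sqi4 >= 0.7)
    )
    return "A" if acceptable else "U"


def q_sqi_value(label):
    """Map the categorical output to a number: E or A to 1, U to 0."""
    return 1.0 if label in ("E", "A") else 0.0
```

Averaging `q_sqi_value` over the beats in an analysis window gives the mean qSQI that is thresholded in the evaluation.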
`
`To evaluate the performance of the algorithm, we chose an analysis window of six
`
`seconds, beginning at five seconds before the asystole or VT alarm onset. (This was
`
`approximately the segment of data which was used to make the SQI annotation by the
`
`experts.) An extra window of 30 seconds before the alarm fiducial mark was used to generate
`
the ‘normal’ beat template. The mean qSQI (qSQImean) of all the beats within the analysis
window was calculated. At the training stage, we selected a good quality threshold (qSQIth) to
achieve the best classification accuracy on the training set. If qSQImean ≥ qSQIth, we set
the SQI to 1, otherwise we set the SQI to 0, in order to compare with the gold standard expert
annotations and calculate the accuracy. To select the best qSQIth, we varied its value between
0 and 1 in steps of 0.01 and calculated the classification accuracy at each point. The best
qSQIth, which resulted in the highest accuracy, was then used to classify the test set.
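The threshold search amounts to a one-dimensional sweep, which can be sketched in Python as follows. The function name is hypothetical, and keeping the lowest threshold when several give the same accuracy is an assumption, since the tie-breaking rule is not stated.

```python
def best_threshold(mean_qsqi, labels):
    """Sweep the threshold from 0 to 1 in steps of 0.01; return (th, accuracy).

    `mean_qsqi` holds the mean qSQI of each analysis window and `labels`
    the expert annotations (1 = good, 0 = bad).
    """
    best_th, best_acc = 0.0, -1.0
    for step in range(101):
        th = step / 100
        correct = sum((q >= th) == bool(y) for q, y in zip(mean_qsqi, labels))
        acc = correct / len(labels)
        if acc > best_acc:  # strict: the lowest best threshold is kept
            best_th, best_acc = th, acc
    return best_th, best_acc
```

The threshold returned on the training set is then applied unchanged to the test set.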
`
`
`2.6.2. Machine learning for quality estimation. We selected two groups of input variables to
`
`present to the MLP. The first group included the four SQI metrics (SQI1, SQI2, SQI3 and SQI4).
`For each SQI metric, we calculated the mean SQI of the beats within the six second analysis
`
window. The second group used six variables, including the four SQI metrics, the simple
`
`fusion (qSQI), and the number of beats detected within the window (Nbeats). The rationale for
`adding the number of beats as an input was that we expect the noise and abnormality of the
`
`signal to manifest differently at different heart rates. The rationale for including qSQI as a
`
`feature is that, if it proves to be a useful approach, then the highly nonlinear structure of the
`
`metric’s logic would be difficult to reproduce without much larger numbers of training
`
`patterns.
`
`Therefore, the architecture of the MLP was 4-N-1 or 6-N-1, where the number of hidden
`
`nodes, N, had to be optimized, and the input was fixed to the number of features as described
`
`above. The output was simply a single node providing an estimate of the class (1 or 0). A
`
`sigmoid activation function was used on the hidden layer and the MLP neural network
`
training used the Levenberg-Marquardt algorithm (Moré 1978). The stopping criteria were: a
maximum of 200 epochs, an error ≤ 10^-5, or a gradient ≤ 10^-5. Since the MLP requires an
independent validation set to prevent over-training, the training set was further divided at
random into subsets of 70% for training, 25% for validation and 5% for pre-testing. The validation
set was used to select the optimal number of nodes in the hidden layer. This was chosen to be
`
`
`
`
`
`
`the number which provided the highest accuracy within the range of N = 2 to 20. (Using more
`
`than 20 hidden nodes would likely lead to extreme over-fitting for our given dataset).
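To make the 4-N-1 and 6-N-1 architectures concrete, the forward pass of such a network can be sketched in Python as below. This shows only how a trained network would map an input vector to a quality estimate; it does not implement Levenberg-Marquardt training. The sigmoid on the output node is an assumption (only the hidden-layer activation is specified above), and the function names and weight layout are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic activation."""
    return 1.0 / (1.0 + math.exp(-z))


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Output of an I-N-1 perceptron for one input vector `x`.

    `w_hidden` has N rows of I weights, `b_hidden` N biases,
    `w_out` N output weights and `b_out` one output bias.
    """
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    # Assumed sigmoid output so the estimate lies between 0 and 1.
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
```

An output at or above 0.5 would be read as class 1 (good quality) under this assumption. Selecting N then consists of training one such network for each N from 2 to 20 and keeping the one with the highest validation accuracy.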
`
`
`
`3. Results
`
`
`
`3.1. SQI metrics of PPG
`
`The four SQI metrics quantify different characteristics and the simple fusion of the SQI
`
metrics (qSQI) classifies the signal quality of each PPG beat into three levels: extremely
high quality (E), moderate quality (A), and untrustworthy (U). Figure 2 shows two segments of
PPG from the evaluation dataset with the four SQI metrics and the simple fusion classification.
`
`Each PPG beat onset is marked by a dotted line and the alarm onset is marked by a solid line
`
`at the 5th second.
`
`
`
`
`
Figure 2. An example of the SQI metrics and their simple fusion for PPG from the evaluation dataset: (a)
annotated as E or A (good quality), (b) annotated as U (bad quality). Each plot shows two
channels of signal, PPG (PLETH) and ECG (ECG V). The ECG is provided for visual
reference only and is not used. Each detected PPG beat is marked by a dotted line and
accompanied by a column of five annotations corresponding to the individual beat's values of
qSQI (categorical: E, A or U) and the numerical values of SQI1, SQI2, SQI3 and SQI4
respectively. Note that eq. 4 was applied to SQI1 through SQI4 to determine qSQI.
`
`
`3.2. Evaluation results
`
`
`
`3.2.1. Result of qSQI. Using the training set, we varied the value of qSQImean above which data
`was considered to be good quality and calculated the receiver operating characteristic (ROC)
`
`curve (Figure 3). The qSQIth which gave the best classification accuracy was qSQIth=0.36,
`which resulted in an accuracy of 88.1% (488 correctly classified out of 554) on the training
`
`set. Using this threshold the accuracy on the test set was found to be 91.8% (460 correctly
`
`classified out of 501).
`
`
`
`
`
`
`
`
`Figure 3. ROC curve of qSQI algorithm derived by varying qSQIth across the training set.
`The circle indicates the position of maximum accuracy (88.1% in training set).
`
`
`
`
`
`3.2.2. Results of machine learning for classifying quality. In contrast to thresholding on qSQI,
`
`the machine learning algorithm approach provides a multivariate threshold. Figure 4 shows
`
the ROC curves of the MLP algorithm. The MLP neural network with 6 inputs gives the best
performance, with an accuracy of 97.5% (540 of 554) on the training set and 95.2% (477 of
501) on the test set.
`
`The full performances of different quality estimation methods are shown in table 3.
`
`
`
`
`
`
`
Table 3. Performances of heuristic and ML approaches.

                       Training performance (%)    Test performance (%)
Method   # of inputs   Acc    Se     SP     PPV    Acc    Se     SP     PPV    Notes
qSQI     1             88.1   88.3   87.4   95.9   91.8   94.7   80.6   95.0   qSQIth = 0.36
MLP      6             97.5   98.4   94.5   98.4   95.2   99.0   80.6   95.2   Hidden nodes: 10
MLP      4             97.1   98.6   92.1   97.7   92.4   96.7   75.7   93.9   Hidden nodes: 10
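The four performance measures follow directly from the counts of correctly and incorrectly classified segments. As a worked check, the Python sketch below recomputes the test-set figures for qSQI. The counts used (377 true positives, 21 false negatives, 83 true negatives, 20 false positives) are not reported in this work; they are inferred here from the reported percentages and the test-set totals of 398 good and 103 bad annotations in table 2.

```python
def metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity and PPV in percent."""
    return {
        "Acc": 100 * (tp + tn) / (tp + fn + tn + fp),
        "Se": 100 * tp / (tp + fn),
        "SP": 100 * tn / (tn + fp),
        "PPV": 100 * tp / (tp + fp),
    }


# Inferred counts for qSQI on the test set (good quality is the positive class).
qsqi_test = metrics(tp=377, fn=21, tn=83, fp=20)
rounded = {name: round(value, 1) for name, value in qsqi_test.items()}
```

With these counts, 460 of the 501 test segments are classified correctly, and the rounded values agree with the qSQI test performance in table 3.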
`
`
`
`
`
`
`
`
`
`
`Figure 4. ROC curves of MLP algorithms for training set with operating points of maximal
`
`accuracy indicated.
`
`
`
`
`
Table 4. Performances of the MLP algorithm for each possible combination of five inputs.

                                     # of hidden   Training performance (%)    Test performance (%)
Inputs                               nodes         Acc    Se     SP     PPV    Acc    Se     SP     PPV
qSQI, SQI1, SQI2, SQI3, SQI4         13            97.3   99.3   90.1   97.3   91.2   98.0   65.1   91.6
qSQI, SQI1, SQI2, SQI3, Nbeats       14            97.7   98.8   93.7   98.1   94.6   97.0   85.4   96.3
qSQI, SQI1, SQI2, SQI4, Nbeats       6             97.1   99.3   89.8   97.0   94.6   98.0   81.6   95.4
qSQI, SQI1, SQI3, SQI4, Nbeats       19            98.4   99.1   96.1   98.8   93.6   97.2   79.6   94.9
qSQI, SQI2, SQI3, SQI4, Nbeats       19            98.7   99.8   95.3   98.6   92.0   96.5   74.8   93.7
SQI1, SQI2, SQI3, SQI4, Nbeats       18            98.6   98.6   98.4   99.5   94.0   96.7   83.5   95.8
`
`
`
`
`
Finally, in order to test the marginal information contributed by each input variable in a multivariate sense,
we retrained the MLP algorithm for all combinations of five of the six input variables. Table 4
shows the performance of each of these combinations. The highest accuracy on test data was
94.6% with variables qSQI, SQI1, SQI2, SQI3 and Nbeats, which is marginally lower than the
best performance of 95.2%, with a small drop in sensitivity (Se), from 99% to 97%, but a
large increase in specificity (SP) and a marginal increase in positive predictivity (PPV). We
note that the number of hidden nodes found for this performance is relatively high (14). A
similar performance was found using only six hidden nodes with inputs qSQI, SQI1, SQI2, SQI4 and Nbeats,
indicating that much complementary information exists between the metrics.
`
`
`
`4. Discussion
`
`
`
`The multivariate ‘voting’ threshold provided by the machine learning approach is clearly
`
`superior to the single parameter thresholding on the SQI metrics, although only if a good
`
`choice of ML algorithm is made. Although other ML algorithms could be used, the flexibility
`
`of the neural network, and its simple on-line implementation make it a good choice if large
`
`numbers of training patterns are available (and in fact, in tests not published here, a support
`
`vector machine produced marginally worse results). Of the tested approaches, the MLP using
`
`all six quality measures provided the best performance, with 95% accuracy on an independent
`
`(unseen) test set. Although this is an impressive accuracy, and similar to recent results on
`
`ECG quality analysis we performed with a paradigmatically similar approach (Clifford et al
`
`2011), it must be noted that the weights of our trained MLP are specific to the type of data on
`
`which it was trained. In other words, to extend this system to other data and rhythms (outside
`
`of asystole and ventricular tachycardia) the MLP must be retrained. This of course, is not an
`
`issue as long as accurately labelled data is available. It should also be noted that there is some
`
`ambiguity in interpreting the 95% accuracy of our system in as much as it is not known what
`
`level of accuracy would be needed in a particular circumstance or application. For example,
`
`such an accuracy may be entirely sufficient to detect heart rates (and reduce false alarms such
`
`as bradycardia, asystole and tachycardia), but may not be sufficient to determine if we could
`
`trust an apnea alarm resulting from an analysis of a respiratory trace derived from the PPG, or
`
`a desaturation alarm. In subsequent studies we will attempt to assess such questions.
`
`By systematically removing each of the six input features, we see that the accuracy
`
`always drops, by between 0.6% and 3.8% from the six-input performance of 95%. This shows
`
that every quality metric provides some improvement in a multivariate sense, with Nbeats
providing the most additional marginal information and SQI4 providing the least. This is as
we would expect, since Nbeats (which is proportional to heart rate) is the most independent
input parameter and a measurement of saturation (SQI4) may be redundant compared to the
template matching. Moreover, the interpretation of each of the SQIs should be heart rate
dependent.
`
`A final note concerns the choice of features in this study, which were based on intuition
`
`and experience. However, the features are not exhaustive and a much wider variety of features
`
`could be tested as described in this work, or by adding in a feature selection approach such as
`
`a genetic algorithm.
`
`
`
`5. Conclusion
`
`
`
`We have described an effective system (with 95% accuracy on unseen test data) which could
`
`be deployed as a stand-alone signal quality assessment algorithm for vetting the clinical utility
`
`of PPG signals. Applications range from false alarm suppression to improving estimates of
`
`derived physiological parameters such as heart rate, respiration, oxygen saturation, pulse
`
`
`
`
`
`
`transit time and peripheral circulatory changes. Moreover, the algorithm presented here is
`
`quite general and could be retrained and applied to any periodic or quasi-periodic signal such
`
`as continuous blood pressure.
`
`
`
`Acknowledgments
`
`The authors gratefully acknowledge funding for this research from Mindray North America.
`
`The authors would also like to thank the Laboratory for Computational Physiology at MIT for
`
`providing the annotated data for this study.
`
`
`
`References
`
`
`
`Addison P S and Watson J N 2010 Signal processing techniques for determining signal quality
`
`using a wavelet transform ratio surface US Patent App 12/469,498, 2009, Publication
`
`number: US 2010/0298728 A1
`
`Allen J 2007 Photoplethysmography and its application in clinical physiological measurement
`
`Physiol. Meas. 28 R1–39
`
`Chan K W and Zhang Y T 2002 Adaptive reduction of motion artifact from
`
`photoplethysmographic recordings using a variable step-size LMS filter. Proc. IEEE
`
`Sensors vol 2 1343–6
`
Clifford G D, Lopez D, Li Q and Rezek I 2011 Signal quality indices and data fusion for
determining acceptability of electrocardiograms collected in noisy ambulatory
environments Comput. Cardiol. 38 285–8
`
`Clifford G D 2002 Signal processing methods for heart rate variability D.Phil. Thesis Oxford
`
`University, Oxford, UK
`
`Deshmane A V 2009 False arrhythmia alarm suppression using ECG, ABP, and
`
`photoplethysmogram M.S. Thesis MIT, Cambridge, USA
`
`Gil E, Bailon R, Vergara J, and Laguna P 2010 PTT variability for discrimination of sleep
`
`apnea related decreases in the amplitude fluctuations of PPG signal in children IEEE
`
`Trans. Biomed. Eng. 57 1079–88
`
`Goldberger A L, Amaral L A N, Glass L, Hausdorff J M, Ivanov P C, Mark R G, Mietus J E,
`
`Moody G B, Peng C K and Stanley H E 2000 PhysioBank, PhysioToolkit, and PhysioNet:
`
`Components of a new research resource for complex physiologic signals Circulation 101
`
`e215–20
`
`Graybeal J M and Petterson M T 2004 Adaptive filtering and alternative calculations
`
`revolutionizes pulse oximetry sensitivity and specificity during motion and low perfusion
`
`Proc. 26th Annu. Int. Conf. IEEE EMBS 5363–6
`
`Hayes M J and Smith P R 1998 Artifact reduction in photoplethysmography Applied Optics
`
`37 7437–46
`
`Hayes M J and Smith P R 2001 A new method for pulse oximetry possessing inherent
`
`insensitivity to artifact IEEE Trans. Biomed. Eng. 48 452–61
`
`Huang B and Kinsner W 2002 ECG frame classification using dynamic time warping Proc.
`
`2002 IEEE Canadian Conf. on Electrical and Computer Engineering 1105–10
`
`Keogh E and Ratanamahatana C A 2005 Exact indexing of dynamic time warping Knowledge
`
`and Information Systems vol 7 (London: Springer) 358–86
`
`
`
`
`
`
`Kim B S and Yoo S K 2006 Motion artifact reduction in photoplethysmography using
`
`independent component analysis IEEE Trans. Biomed. Eng. 53 566–8
`
`Koski A 1996 Segmentation of digital signals based on estimated compression ratio IEEE
`
`Trans. Biomed. Eng. 43 928–38
`
Krishnan R, Natarajan B and Warren S 2008a Motion artifact reduction in
photoplethysmography using magnitude-based frequency domain independent
component analysis Proc. of 17th Int. Conf. on Computer Communications and Networks
ICCCN '08 1–5
`
`Krishnan R, Natarajan B and Warren S 2008b Analysis and detection of motion artifact in
`
`photoplethysmographic data using higher order statistics IEEE Int. Conf. on Acoustics,
`
`Speech and Signal Processing 613–6
`
`Lee C M and Zhang YT 2003 Reduction of motion artifacts from photoplethysmographic
`
`recordings using a wavelet denoising approach IEEE EMBS Asian-Pacific Conf. on
`
`Biomed. Eng. 194–5
`
`Lee H W, Lee J W, Jung W G and Lee G K 2007 The periodic moving average filter for
`
`removing motion artifacts from PPG signals Int. J. Control Automation Systems 5 701–6
`
`Monasterio V, Burgess F and Clifford G D 2012 Robust neonatal apnoea-related desaturation
`
`classification Physiol. Meas. Accepted for publication, May 2012
`
`Moré J J 1978 The Levenberg-Marquardt algorithm: Implementation and theory Numerical
`
`Analysis (Lecture Notes in Mathematics vol. 630) ed G A Watson (Springer Verlag) pp
`
`105–16
`
`Reddy K A and Kumar V J 2007 Motion artifact reduction in photoplethysmographic signals
`
`using singular value decomposition Proc. IEEE Instrumentation and Measurement
`
`Technology Conf. IMTC 1–4