`
`http://www.intl.elsevierhealth.com/journals/aiim
`
`Support vector machine-based arrhythmia
`classification using reduced features of
`heart rate variability signal
`
`Babak Mohammadzadeh Asl a,*, Seyed Kamaledin Setarehdan b,
`Maryam Mohebbi a
`
`a Department of Biomedical Engineering, Tarbiat Modares University, Tehran, Iran
`b Control and Intelligent Processing Center of Excellence, Faculty of Electrical and
`Computer Engineering, University of Tehran, P.O. Box 14395/515, Tehran, Iran
`
`Received 14 July 2007; received in revised form 24 April 2008; accepted 28 April 2008
`
`KEYWORDS
`Arrhythmia
`classification;
`Generalized
`discriminant analysis;
`Heart rate variability;
`Nonlinear analysis;
`Support vector
`machine
`
`Summary
`
`Objective: This paper presents an effective cardiac arrhythmia classification algo-
`rithm using the heart rate variability (HRV) signal. The proposed algorithm is based on
`the generalized discriminant analysis (GDA) feature reduction scheme and the support
`vector machine (SVM) classifier.
`Methodology: Initially 15 different features are extracted from the input HRV signal
`by means of linear and nonlinear methods. These features are then reduced to only
`five features by the GDA technique. This not only reduces the number of the input
`features but also increases the classification accuracy by selecting most discriminat-
`ing features. Finally, the SVM combined with the one-against-all strategy is used to
`classify the HRV signals.
`Results: The proposed GDA- and SVM-based cardiac arrhythmia classification algo-
`rithm is applied to input HRV signals, obtained from the MIT-BIH arrhythmia database,
`to discriminate six different types of cardiac arrhythmia. In particular, the HRV signals
`representing the six different types of arrhythmia classes including normal sinus
`rhythm, premature ventricular contraction, atrial fibrillation, sick sinus syndrome,
`ventricular fibrillation and 28 heart block are classified with an accuracy of 98.94%,
`98.96%, 98.53%, 98.51%, 100% and 100%, respectively, which are better than any other
`previously reported results.
`Conclusion: An effective cardiac arrhythmia classification algorithm is presented. A
`main advantage of the proposed algorithm, compared to the approaches which use
`the ECG signal itself is the fact that it is completely based on the HRV (R—R interval)
`signal which can be extracted from even a very noisy ECG signal with a relatively high
`
`* Corresponding author. Tel.: +98 912 4715235; fax: +98 21 88633029.
`E-mail address: b.mohammadzadeh@modares.ac.ir (B.M. Asl).
`
`0933-3657/$ — see front matter # 2008 Elsevier B.V. All rights reserved.
`doi:10.1016/j.artmed.2008.04.007
`
`APPLE 1039
`
`1
`
`
`
`52
`
`B.M. Asl et al.
`
`accuracy. Moreover, the usage of the HRV signal leads to an effective reduction of the
`processing time, which provides an online arrhythmia classification system. A main
`drawback of the proposed algorithm is however that some arrhythmia types such as
`left bundle branch block and right bundle branch block beats cannot be detected using
`only the features extracted from the HRV signal.
`# 2008 Elsevier B.V. All rights reserved.
`
`1. Introduction
`
`Heart diseases are a major cause of mortality in the
`developed countries. Many different instruments
`and methods are developed and being daily used
`to analyze the heart behavior. One of the relatively
`new methods to assess the heart activity and to
`discriminate different cardiac abnormalities is to
`analyse the so-called heart rate variability (HRV)
`signal. HRV signal, which is generated from electro-
`cardiogram (ECG) by calculating the inter-beat
`intervals, is a nonlinear and nonstationary signal
`that represents the autonomic activity of the ner-
`vous system and the way it influences the cardio-
`vascular system. Hence, measurement and analysis
`of the heart rate variations is a non-invasive tool for
`assessing both the autonomic nervous system and
`the cardiovascular autonomic regulatory system.
`Furthermore, it can provide useful
`information
`about the current and/or the future heart deficien-
`cies [1]. Therefore, HRV analysis can be considered
`as an important diagnostic tool in cardiology.
`Several methods have been proposed in the lit-
`erature for automatic cardiac arrhythmia detection
`and classification. Some examples of the techniques
`used include:
`threshold-crossing intervals
`[2],
`neural networks [3—10], wavelet transforms [11],
`wavelet analysis combined with radial basis function
`neural networks [12], support vector machines [13],
`Bayesian classifiers [14], fuzzy logic combined with
`the Markov models [15], fuzzy equivalence relations
`[16], and the rule-based algorithms [17]. Most of
`these studies [2—6,11—13] are based on the analysis
`of the ECG signal itself. In most methods, the various
`features of the ECG signal including the morpholo-
`gical features are extracted and used for classifica-
`tion of the cardiac arrhythmias. This is a time-
`consuming procedure and the results are very sen-
`sitive to the amount of the noise.
`An alternative approach would be to extract the
`HRV signal from the ECG signal first by recording the
`R—R time intervals and then processing the HRV
`signal instead. This is a more robust method since
`the R—R time intervals are less affected by the
`noise. Different HRV signal analysis methods for
`cardiac arrhythmia detection and classification
`were introduced in the past. Tsipouras and Fotiadis
`[8] proposed an algorithm based on both time and
`
`time—frequency analysis of the HRV signal using a
`set of neural networks. Their method could only
`classify the input ECG segments as ‘‘normal’’ or
`‘‘arrhythmic’’ segments without the ability to iden-
`tify the type of the arrhythmia. Acharya et al. [16]
`employed a multilayer perceptron (MLP) together
`with a fuzzy classifier for arrhythmia classification
`using HRV signal. They could classify the input ECG
`segments into one of the four different arrhythmia
`classes. In [17], Tsipouras et al. proposed a knowl-
`edge-based method for arrhythmia classification
`into four different categories. The main drawback
`of their algorithm was the fact that the atrial fibril-
`lation, which is an important
`life-threatening
`arrhythmia, was excluded from the ECG database.
`In this paper a new arrhythmia classification
`algorithm is proposed which is able to effectively
`identify six different and more frequently occurring
`types of cardiac arrhythmia. These arrhythmias are
`namely the normal sinus rhythm (NSR), the prema-
`ture ventricular contraction (PVC), the atrial fibril-
`lation (AF), the sick sinus syndrome (SSS), the
`ventricular fibrillation (VF) and the 28 heart block
`(BII). The proposed algorithm is based on the two
`kernel learning machines of the generalized discri-
`minant analysis (GDA) and the support vector
`machine (SVM). By cascading SVM with GDA, the
`input features will be nonlinearly mapped twice
`by radial basis function (RBF). As a result, a linear
`optimal separating hyperplane can be found with
`the largest margin of separation between each pair
`of arrhythmia classes in the implicit dot product
`feature space.
`GDA is a data transformation technique which
`was first introduced by Baudat and Anouar [18]. It
`can be considered as a kind of generalization to the
`well-known linear discriminant analysis (LDA) algo-
`rithm and has become a promising feature extrac-
`tion scheme [19—24] in recent years. The main steps
`in GDA are to map the input data into a convenient
`higher dimensional feature space F first and then to
`perform the LDA algorithm on the F instead of the
`original input space. By GDA therefore, both dimen-
`sionality reduction of the input feature space and
`selection of the useful discriminating features can
`be achieved simultaneously.
`SVM, which was first proposed by Vapnik [25], has
`been considered as an effective classification
`
`2
`
`
`
`Support vector machine-based arrhythmia classification using reduced features
`
`53
`
`scheme in many pattern recognition problems
`recently [22—24,26,27]. It is often reported that
`SVM provides better classification results than other
`widely used methods such as the neural network
`classifiers [28,29]. This is partly because SVM aims to
`obtain the optimal answer using the available infor-
`mation and in the same time it shows better gen-
`eralization ability on the unseen data.
`In continue the details of the proposed algorithm
`for cardiac arrhythmia classification from the HRV
`signal is presented. Section 2 provides the overall
`block diagram of the proposed algorithm together
`with the details of each block. The results of the
`application of the proposed algorithm to the MIT-BIH
`arrhythmia database are presented in Section 3.
`Section 4, compares the results obtained by the
`proposed algorithm to those obtained by the other
`previously reported techniques. This is followed by a
`discussion on the results and the methods. Finally,
`Section 5 concludes the paper.
`
`2. Materials and methods
`
`2.1. Database
`
`The HRV data used in this work is generated from the
`ECG signals provided by the MIT-BIH Arrhythmia
`Database [30]. The database was created in 1980
`as a reference standard for serving all those who are
`conducting a research on the cardiac arrhythmia
`detection and classification problem [31].
`The MIT-BIH Arrhythmia Database includes 48 ECG
`recordings each of length 30 min with a total of about
`109,000 R—R intervals. The ECG signals were band-
`pass-filtered in the frequency range of 0.1—100 Hz
`and were sampled with a sampling frequency of
`360 Hz. Each of the about 109,000 beats was manu-
`ally annotated by at least two cardiologists working
`independently. Their annotations were compared,
`consensus on disagreements was obtained, and the
`reference annotation files were prepared [31]. The
`reference annotation files include beat, rhythm, and
`signal quality annotations. Due to the lack of the VF
`data in the MIT-BIH arrhythmia database, which is
`needed in the current study, the Creighton University
`Ventricular Tachyarrhythmia Database was added to
`the MIT-BIH data as the VF arrhythmia class after
`resampling it at a rate of 360 Hz.
`
`Finally, a total number of 1367 ECG segments
`each with 32 R—R intervals were selected from
`the above-mentioned database and used in this
`work, which contains all six different arrhythmia
`classes considered in this study. The specialists
`defined rhythm annotations for each segment were
`also considered along with the segments.
`
`2.2. The proposed algorithm
`
`The block diagram of the proposed algorithm is
`demonstrated in Fig. 1. As seen, it comprises the
`four steps of preprocessing, feature extraction,
`GDA-based feature reduction and SVM-based
`arrhythmia classification. In continue, each block
`is described in more details.
`
`2.2.1. Preprocessing
`As a first step, it is necessary to extract the HRV
`signals from the ECG signals within the database. In
`general, this process can be affected by many inter-
`fering signals such as the mains 50 Hz, the interfer-
`ences from electromyogram (EMG) signals and also
`the baseline wandering. The interfering signals are
`effectively eliminated from the input ECG signal
`using a 5—15 Hz bandpass filter. Furthermore, the
`cubic splines are used for baseline approximation,
`which is then subtracted from the signal [32].
`Next, the tachograms are extracted from the
`filtered ECG signals as follows. Initially, using the
`Hamilton and Tompkins algorithm [33,34], a point
`within the QRS complex is detected (QRS point).
`Afterwards, the main wave of the QRS complex, i.e.,
`the R wave, is identified by locating the maximum
`absolute value of the signal within the time window
`[QRS—280 ms, QRS + 120 ms]. The HRV signal is then
`constructed by measuring the time intervals
`between the successive R waves (R—R intervals).
`Plotting the R—R intervals against the time indices
`provides the so-called tachograms. The tachograms
`are then divided into small segments each contain-
`ing 32 R—R intervals and characterized using the
`database rhythm annotation. It must be noted that
`the resulting tachograms are sequences of unevenly
`sampled beat-to-beat intervals. Therefore, for the
`case of the frequency domain analysis in the forth-
`coming Section 2.2.2, the cubic spline interpolation
`method is used at a sampling rate of 4 Hz to produce
`an evenly sampled data. This resampling procedure
`
`Figure 1
`
`Block diagram of the proposed arrhythmia classification algorithm.
`
`3
`
`
`
`54
`
`B.M. Asl et al.
`
`eceptor control and is mediated by both vagal and
`sympathetic systems [1,7]. In this work, the ratio of
`the LF and HF bands power (LF/HF) is used as the
`frequency domain feature of the HRV signal.
`
`is necessary prior to using the well-known methods
`of power spectral density (PSD) estimation which
`are only applicable to the evenly sampled signals.
`
`2.2.2. Feature extraction
`The next step in the block diagram is the feature
`extraction step. In general, the cardiovascular sys-
`tem, hence the HRV signals, demonstrates both
`linear and nonlinear behavior. Different linear and
`nonlinear parameters are defined and used for HRV
`signal description. In this work, a combination of
`both linear and nonlinear features of the HRV signal
`is considered. Time and frequency domain features
`are among the standard linear measures of the HRV
`signals which are strongly recommended in a special
`report published by the Task Force of the European
`Society of Cardiology and North American Society of
`Pacing and Electrophysiology in 1996 [1]. As in most
`previous works, these features are used in the cur-
`rent study.
`
`2.2.2.1. Linear analysis: time domain feature-
`s. Seven commonly used time domain parameters
`of the HRV signal which are also considered in this
`work are as follows:
`
`Mean: This refers to the mean value of the 32 R—R
`intervals within each segment.
`RMSSD: This refers to the root mean square suc-
`cessive difference of the 32 R—R intervals in each
`segment.
`SDNN: This refers to the standard deviation of the
`32 R—R intervals within each segment.
`SDSD: This refers to the standard deviation of
`differences between the adjacent R—R intervals
`within each segment.
`pNN50, pNN10, pNN5: These refer to the number
`of successive difference of intervals which differ
`by more than 50, 10 and 5 ms, respectively,
`divided by 32, the total number of the R—R
`intervals within each segment.
`
`2.2.2.2. Linear analysis: frequency domain fea-
`tures. Although the time domain parameters are
`computationally effective but they lack the ability
`to discriminate between the sympathetic and para-
`sympathetic contents of the HRV signal [35]. As the
`most popular linear technique used in the HRV signal
`analysis, however, the frequency domain analysis of
`the HRV signal has the ability to discriminate
`between the two. In fact, it is generally accepted
`that the spectral power in the high-frequency (HF)
`band (0.15—0.4 Hz) of the HRV signal reflects the
`respiratory sinus arrhythmia (RSA) and thus cardiac
`vagal activity. On the other hand, the low-frequency
`(LF) band (0.04—0.15 Hz), is related to the baror-
`
`2.2.2.3. Nonlinear analysis. Seven different non-
`linear parameters of the HRV signal are used in this
`work, which are listed and described in conti-
`nue.SD1/SD2: Let us consider the HRV signal as a
`time series of the R—R intervals which is denoted by
`RR(i). Now, if each interval RR(n + 1) is plotted as a
`function of the previous interval RR(n), then the
`resulting plot is known as the Poincare´ plot, which is
`a relatively new tool for HRV signal analysis. A useful
`feature of this tool is that it does not require the HRV
`to be considered as a stationary signal [36]. Poincare´
`plot can be seen as a graphical representation of the
`correlation between the successive R—R intervals.
`This plot can be quantitatively analyzed by calcu-
`lating the standard deviations of the distances of the
`points RR(i) from the lines y = x and y = x + 2RRm,
`where RRm is the mean of all RR(i) values. These
`standard deviations are denoted by SD1 and SD2,
`respectively. In fact, SD1 represents the fast beat-
`to-beat variability, while SD2 describes the rela-
`tively long-term variability in the HRV signal [37].
`The ratio SD1/SD2 is usually used to describe the
`relation between the two components.
`ApEn: Approximate entropy (ApEn) is a measure of
`unpredictability of the fluctuations in a time series,
`and reflects the likelihood that particular patterns of
`observations will not be followed by additional simi-
`lar observations. A time series containing many repe-
`titive patterns has a relatively small ApEn, while a
`more complex (i.e., less predictable) process has a
`relatively high ApEn [38]. We have used the method
`proposed in [39] for calculating ApEn for each HRV
`segment by setting the pattern length m = 2 and the
`measure of similarity r = 20% of the standard devia-
`tion of the segment, as proposed in [40].
`SpEn: Similar to ApEn, spectral entropy (SpEn)
`quantifies the complexity of the input time series
`(HRV segment) but in the frequency domain [41].
`The Shannon’s channel entropy is used in this work
`to obtain an estimate of the spectral entropy of the
`process as
`H ¼
`
`p f logð p fÞ
`
`X f
`
`(1)
`
`where pf is the value of the probability density
`function (PDF) of the process at frequency f [7].
`Heuristically, the entropy can be interpreted as a
`measure of uncertainty about the event at fre-
`quency f.
`LLE: Lyapunov exponent is a measure of how fast
`two initially nearby points on a trajectory will
`
`4
`
`
`
`Support vector machine-based arrhythmia classification using reduced features
`
`55
`
`dow size n on a log—log scale. Typically, F(n)
`increases with the window size. The fluctuation in
`small windows can be characterized by a scaling
`exponent (self similarity factor), a, which repre-
`sents the slope of the line relating log F(n) to log n
`[35].
`Sequential trend analysis: Sequential trend ana-
`lysis of the HRV signal evaluates not only the
`sympathetic—parasympathetic balance but also
`provides the spectral analysis of the signal without
`the necessity to consider the signal is stationary. To
`perform the sequential trend analysis, it is neces-
`sary to plot DRR(n) against DRR(n + 1) and divide
`the plane into four quadrants. The points located
`in the +/+ quadrant indicate two consecutive inter-
`val
`increments, which means the heart rate is
`decreasing and the ones in the / quadrant
`indicate two consecutive interval decrements,
`which means the heart rate is increasing. In this
`work the density of the points within the / and
`+/+ quadrants are used as two features that mea-
`sure the sympathetic and parasympathetic activ-
`ities, respectively [36].
`
`2.2.3. Feature dimension reduction by GDA
`Having defined the above-mentioned linear and
`nonlinear features, due to the large variations in
`the HRV patterns of various arrhythmia classes,
`there is usually a considerable overlap between
`some of these classes in the feature space. For
`example, the SSS and NSR classes demonstrate a
`large overlap with each other making it difficult to
`distinguish between the two. In this situation, a
`feature transformation mechanism that can mini-
`mize the within-class scatter and maximize the
`between-class scatter will be very beneficial. GDA
`[18] is such a transform which is employed in this
`work.
`GDA is a nonlinear extension to the ordinary
`LDA. The input training data is mapped by a kernel
`function to a high-dimensional feature space,
`where different classes of objects are supposed
`to be linearly separable. The LDA scheme is then
`applied to the mapped data, where it searches for
`those vectors that best discriminate among the
`classes
`rather
`than those vectors
`that best
`describe the data [47]. In fact, the goal of the
`LDA is to seek a transformation matrix that max-
`imizes the ratio of the between-class scatter to the
`within-class scatter. Furthermore, given a number
`of independent features which describe the data,
`LDA creates a linear combination of the features
`that yields the largest mean differences of the
`desired classes [48]. As a result, if there are N
`classes in the data set, the dimension of feature
`space can be reduced to N 1.
`
`diverge from each other as the system evolves,
`hence providing useful information about the sys-
`tem’s dependency on initial conditions [42]. A posi-
`tive Lyapunov exponent strongly indicates that the
`system is a chaotic one [43,44]. Although an m-
`dimensional system has m Lyapunov exponents,
`however, in most applications it is sufficient to
`obtain only the average largest Lyapunov exponent
`(LLE) as follows. First, a starting point is selected
`within the reconstructed phase space of the system
`and then all those points residing within a neighbor-
`hood of a predetermined radius e from the starting
`point are determined. Next, the mean distances
`between the trajectory of the initial point and
`the trajectories of the neighboring points are cal-
`culated as the system evolves. By plotting the loga-
`rithm of the above-mentioned mean values against
`the time index, the slope of the resulting line
`provides the LLE. To remove the dependency of
`the calculated values to the starting point, this
`procedure is repeated for different starting points
`and the average is taken as the average LLE [45]
`used as a feature to quantify the chaotic behavior of
`the HRV signal.
`DFA: The detrended fluctuation analysis (DFA) is a
`useful parameter to quantify the fractal scaling
`properties of the R—R intervals. This technique is
`a modification to the root-mean-square analysis of
`the random walks applied to nonstationary signals
`[46]. For the detrended fluctuation analysis of the
`HRV signal, the R—R time series (of total length N) is
`integrated using the following equation first:
`ðRRðiÞ RRmÞ
`yðkÞ ¼
`
`Xk
`
`(2)
`
`i¼1
`where y(k) is the kth value of the integrated series,
`RR(i) is the ith inter-beat interval and RRm is the
`average inter-beat interval over the entire series.
`Then, the integrated time series is divided into
`windows of equal length n. In each window of length
`n, a least-square line is fitted to the R—R interval
`data (representing the trend in that window). The y
`coordinate of the straight line segments are denoted
`by yn(k). Next, the integrated time series within
`each window n is detrended. The root-mean-square
`fluctuation of this integrated and detrended time
`series is then calculated as
`
`XN
`
`vuut
`ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
`½yðkÞ ynðkÞ2
`
`k¼1
`
`(3)
`
`1 N
`
`FðnÞ ¼
`
`This computation is repeated over all window
`sizes
`(time scales)
`to obtain the relationship
`between F(n) and the window size n (i.e., the
`number of beats within the observation window).
`F(n) is usually plotted against the observation win-
`
`5
`
`
`
`B.M. Asl et al.
`
`T
`
`(5)
`
`!
`fðx prÞ
`
`Xn
`
`p
`
`r¼1
`
`
`fðx pqÞ
`
`Xn
`0@
`
`p
`
`q¼1
`
`n p
`
`XN
`
`p¼1
`
`B ¼ 1
`M
`
`The purpose of the GDA is to find the projection
`vector v such that the inter-classes inertia is max-
`imized and the intra-classes inertia is minimized in
`space F, which is equivalent to solving the following
`maximization problem:
`vT Bv
`v ¼ arg max
`vT Vv
`The projection vector v is the eigenvector of the
`matrix V 1B associated with the eigenvalue l = vTBv/
`vTVv. All solutions v lie in the span of f(x). So, there
`exist expansion coefficients ai such that
`
`(6)
`
`v
`
`(4)
`
`Let us assume that the training data set X con-
`tains M feature vectors out of N classes. Let xpq
`denotes the qth HRV feature vector in the pth class,
`np is the class size of the pth class, and f is a
`nonlinear mapping function so that the space X is
`mapped into a higher dimensional feature space: f:
`xi 2 Rf ! f(xi) 2 RF, F f.
`The observations f(xi) are assumed to be centered
`in space F. Before projecting the training data set X
`into a new set Y by means of the GDA, two matrices of
`the within-class scatter matrix V and the between-
`class scatter matrix B in space F are defined as
`V ¼ 1
`fðx pqÞfTðx pqÞ
`M
`
`Xn
`XN
`
`p
`
`p¼1
`
`q¼1
`
`56
`
`Box-plots of the new five features for different arrhythmia classes (1 = NSR, 2 = PVC, 3 = AF, 4 = SSS, 5 = VF,
`Figure 2
`6 = BII). For more information see the text.
`
`6
`
`
`
`XM
`
`i¼1
`By using the kernel function k(xi, xj) = kij =
`f(xi)f(xj) and performing the eigenvectors decom-
`position on the kernel matrix K = (kij)i=1,. . .,M;j=1,. . .,M,
`M normalized expansion coefficients for each pro-
`jection vector a = a/(aTKa)1/2 is obtained.
`Now, for a feature vector x from the test HRV data
`set, the projection on the ith eigenvector vi can be
`computed as
`yi ¼ viT
`fðxÞ ¼
`
`jkðx j; xÞ
`ai
`
`(8)
`
`XM
`
`j¼1
`
`jfðx jÞfðxÞ ¼
`ai
`
`XM
`
`j¼1
`
`where ai
`j denotes the jth expansion coefficient of
`the ith eigenvector. For the purpose of feature
`dimension reduction, N 1 eigenvectors associated
`with the first largest nonzero N 1 eigenvalues are
`selected to form the transformation matrix WT =
`(v1, . . ., vN 1). So, each HRV feature vector is pro-
`jected into a new coordinates using the N 1 pro-
`jection vectors.
`
`Support vector machine-based arrhythmia classification using reduced features
`
`57
`
`It is worth to notice that the optimal number of
`eigenvectors for the data transformation is gener-
`ally equal to N 1 [22]. In this paper, assuming the
`number of classes (different arrhythmia to be iden-
`tified) is 6, the number of original 15 features was
`designed to be reduced to 5 by GDA with a guarantee
`that the performance is comparable to that of the
`non-reduced feature set.
`The box-plots and the feature space plots of the
`new five features for different arrhythmia classes,
`which are generated by this process, are presented
`in Figs. 2 and 3, respectively. As seen the patterns
`related to the different arrhythmia classes are
`located close to each other and relatively well
`separated from the other classes within the fea-
`ture space. Therefore the new reduced feature set
`not only increases the classification procedure in
`the next step but also provides an appropriate tool
`for a better discrimination of the different arrhyth-
`mia classes. To demonstrate the usefulness of the
`GDA technique, a comparative study is carried out
`in the next section comparing GDA to the com-
`monly used feature dimension reduction techni-
`ques of the principal component analysis (PCA) and
`the LDA.
`
`2.2.4. SVM-based arrhythmia classifier
`The next step in the block diagram is the classifica-
`tion of the HRV segments by considering their
`reduced features. Different classification methods
`have been used for cardiac arrhythmia classification
`in the past [2—17]. In this work the SVM scheme is
`used for classification. SVM is a machine-learning
`technique which has established itself as a powerful
`tool in many classification problems [22—24,26,27].
`Simply stated, the SVM identifies the best separating
`hyperplane (the plane with maximum margins)
`between the two classes of the training samples
`within the feature space by focusing on the training
`cases placed at the edge of the class descriptors. In
`this way, not only an optimal hyperplane is fitted,
`but also less training samples are effectively used;
`thus high classification accuracy is achieved with
`small training sets [49].
`Given a training set (xi, yi), i = 1, 2, . . ., l, where
`xi 2 Rn and yi 2 { 1, 1}, the traditional SVM algo-
`
`!
`(
`)
`rithm is summarized as the following optimization
`problem:
`wT w þ C
`min
`ji
`i¼1
`w;b;j
`subject to : yiðwT fðxiÞ þ bÞ 1 ji;
`
`Xl
`
`1 2
`
`zi > 0 8 i
`
`v ¼
`
`aifðxiÞ
`
`(7)
`
`Feature space plots of (a) first, second, and
`Figure 3
`third new features; (b) third, fourth, and fifth new fea-
`, PVC:
`, AF:
`tures for different arrhythmia classes (NSR:
`, SSS:
`, VF:
`, BII:
`).
`
`where f(x) is a nonlinear function that maps x into a
`higher dimensional space [50]. w, b and ji are the
`
`(9)
`
`7
`
`
`
`58
`
`B.M. Asl et al.
`
`weight vector, bias and slack variable, respectively.
`C is a constant and determined a priori. Searching
`for the optimal hyperplane in (9) is a quadratic
`programming problem, which can be solved by con-
`structing a Lagrangian and transforming it into a
`dual maximization problem of the function Q(a),
`defined as follows:
`ai 1
`max QðaÞ ¼
`2
`
`Xl
`
`Xl
`
`i¼1
`
`aia jyiy jKðxi; x jÞ
`
`Xl
`Xl
`
`i¼1
`
`j¼1
`
`subject to :
`
`aiyi ¼ 0;
`i¼1
`for i ¼ 1; 2; . . .; l
`0 ai C;
`(10)
`where K(xi, xj) = f(xi)Tf(xj) is the kernel function
`and, a = (a1, a2, . . ., al) is the vector of nonnegative
`Lagrange multipliers.
`the
`Assuming that
`the optimum values of
`Lagrange multipliers are denoted as ao,i(i = 1, 2,
`. . ., l), it is then possible to determine the corre-
`sponding optimum value of the linear weight vector
`wo and the optimal hyperplane as in (11) and (12),
`respectively:
`wo ¼
`ao;iyifðxiÞ
`
`(11)
`
`(12)
`
`(13)
`
`Xl
`
`Xl
`
`i¼1
`ao;iyiKðx; xiÞ þ b ¼ 0:
`
`
`!
`i¼1
`The decision function can be written as
`ao;iyiKðx; xiÞ þ b
`fðxÞ ¼ sign
`
`i¼1
`
`:
`
`Xl
`
`Although SVM separates the data only into two
`classes, classification into additional classes is pos-
`sible by applying either the one against all (OAA) or
`one against one (OAO) methods. In the OAA method,
`a set of binary classifiers (k parallel SVMs, where k
`denotes the number of classes) is trained to be able
`to separate each class from all others. Then each
`data object is classified to the class for which the
`largest decision value has been determined. The
`OAO method constructs k(k 1)/2 parallel SVMs
`where each SVM is trained on the data from two
`classes [51]. Then, the voting strategy [51] aggre-
`gates the decisions and predicts that each data
`object is in the class with the largest vote. A com-
`parative study is carried out in the next section
`comparing the performances of the OAO and OAA
`decomposition methods in training the proposed
`multiclass SVMs. In continue the results of the
`application of the proposed algorithm to the data
`set are presented.
`
`3. Results
`
`To evaluate the performance of the proposed
`arrhythmia classification algorithm, a total num-
`ber of 1367 HRV segments are used which includes
`835 NSR segments, 57 PVC segments, 322 AF seg-
`ments, 50 SSS segments, 78 VF segments, and 25
`BII segments. The relatively high percentage of the
`NSR segments in the data set is not far from reality
`as ECG recordings usually have a higher percen-
`tage of normal beats compared to arrhythmic
`segments. The HRV signals at each class are ran-
`domly divided into the train and test sets in an
`approximate ratio of 2/3 and 1/3. The exact num-
`ber of the train and test segments for each class is
`shown in Table 1.
`
`In this work, the radial basis function (RBF) is used
`as the kernel function and the parameters — kernel
`width s and regularization constant C — were
`experimentally defined to achieve the best classifi-
`cation result.
`
`Table 1 The correct classification results on the test set for each class by the proposed algorithm together with the
`spread of the erroneous classifications into the other classes (these are the average of 100 train and test procedures)
`
`8
`
`
`
`Support vector machine-based arrhythmia classification using reduced features
`
`59
`
`Table 2 Performance analysis of the SVM classifier on the original features (ORG) and the reduced features (by GDA)
`in terms of the average values of the four commonly used measures in % (numbers inside the parenthesis are the
`standard deviations)
`
`The 15 aforementioned linear and nonlinear
`arrhythmia features are then calculated for each
`HRV segment in the train and test sets. Next, the 15
`original features are reduced to only 5 new features
`by means of the GDA algorithm. A radial basis
`function (RBF) is used as the kernel where a kernel
`width of 7 was chosen empirically. Afterwards, the
`SVM classifier is trained using the reduced feature
`vectors of the training set. To optimize the learning
`cost and the classification performance, the SVM
`classifier parameters, the kernel width s and the
`regularization constant C, have to be chosen appro-
`priately. For this purpose, the training data set itself
`is divided into the train and validation sets. The
`optimum values of the parameters s and C are then
`chosen such that the minimum error is achieved on
`the validation data set. The resulting optimum
`values were 0.08 and 30 for s and C, respectively.
`The test data for each class is then used for perfor-
`mance analysis of the classifier.
`The whole procedure including randomly dividing
`the data set into the train and test sets, training the
`
`classifier and testing it by the test data set was
`repeated 100 times. The average correct classifica-
`tion results over the 100 runs on the test set for each
`class together with the spread of the erroneous
`classifications into the other classes are shown in
`Table 1. As seen, for the NSR, in average only 2.1
`segments are misclassified as the SSS (0.75%). For
`the PVC, in average only one segment is misclassi-
`fied as the AF (5.26%). For the AF, in average 3.8
`segments are misclassified as the PVC (3.52%) and 2
`segments as the SSS (1.11%). For the SSS, in average
`2.8 segments are misclassified as the NSR (14%). For
`the VF and the BII, there is no misclassification to
`any other classes.
`For a m