`
`891
`
`A Patient-Adaptable ECG Beat Classifier
`Using a Mixture of Experts Approach
`
`Yu Hen Hu,* Senior Member, IEEE, Surekha Palreddy, and Willis J. Tompkins, Fellow, IEEE
`
`Abstract—We present a “mixture-of-experts” (MOE) approach
`to develop customized electrocardigram (ECG) beat classifier in
`an effort to further improve the performance of ECG processing
`and to offer individualized health care. A small customized
`classifier is developed based on brief, patient-specific ECG data.
`It is then combined with a global classifier, which is tuned to a
`large ECG database of many patients, to form a MOE classifier
`structure. Tested with MIT/BIH arrhythmia database, we observe
`significant performance enhancement using this approach.
`
`Index Terms— ECG beat classification, MIT/BIH database,
`mixture of experts, neural network, patient adaptation.
`
`I. INTRODUCTION
`
`COMPUTERIZED electrocardiography is now a well-
`
`established practice, after several years of significant
`progress. Many algorithms have been proposed over years for
`electrocardiogram (ECG) beat detection and classification. In
`a clinical setting, such as an intensive care unit, it is essential
`for automated systems to accurately detect and classify elec-
`trocardiographic signals on a real-time basis. Since several
`arrhythmia are potentially dangerous and life threatening, if
`not detected within a few seconds to a few minutes of its
`onset, automated electrocardiographic monitoring assumes a
`challenging role. Several algorithms have been proposed in
`the literature for detection and classification of ECG beats and
`reported results, that leave room for improvement. They in-
`clude signal processing techniques; such as frequency analysis,
`template matching, and other parameter extraction methods.
`Artificial neural networks were also employed to exploit their
`natural ability in pattern-recognition tasks for successful clas-
`sification of ECG beat [2], [3], [6]–[8], [23]–[25], [28]–[31].
`One major problem faced by today’s automatic ECG anal-
`ysis machine is the wild variations in the morphologies of
`ECG waveforms of different patients and patient groups.
`An ECG beat classifier which performs well for a given
`training database often fails miserably when presented with
`a different patient’s ECG waveform. Such an inconsistency
`in performance is a major hurdle preventing highly reliable,
`fully automated ECG processing systems to be widely used
`clinically.
`
`Manuscript received September 13, 1995; revised May 5, 1997. Asterisk
`indicates corresponding author.
`*Y. H. Hu is with the Department of Electrical and Computer
`Engineering, University of Wisconsin, Madison, WI 53706 USA (e-mail:
`hu@engr.wisc.edu).
`S. Palreddy and W. J. Tompkins are with the Department of Electrical and
`Computer Engineering, University of Wisconsin, Madison, WI 53706 USA.
`Publisher Item Identifier S 0018-9294(97)06116-8.
`
`One obvious approach to alleviate this problem is to use as
`much training data as possible to develop the ECG classifier.
`This is the approach taken by all the vendors of ECG pro-
`cessing devices: A large in-house ECG database is developed
`and maintained to test each ECG processing algorithm to be
`incorporated into the product. However, such an approach
`suffers several pitfalls.
`1) No matter how large this database may be, it is not
`possible to cover every ECG waveform of all potential
`patients. Hence, its performance is inherently limited.
`2) The complexity of the classifier grows as the size of the
`training database grows. When a classifier is designed
`to correctly classify ECG from millions of patients
`(if it ever becomes possible), it has to take numerous
`exceptions into account. The result is a complicated
`classifier which is costly to develop, maintain, and
`update.
`3) It is practically impossible to make the classifier learn to
`correct errors during normal clinical use. Thus, it may be
`rendered useless if it fails to recognize a specific type of
`ECG beats which occurs frequently in certain patient’s
`ECG records.
`The answer, we believe, is to allow the classifier to be
`“patient-adaptable.” That is, to let the classification algorithm
`adaptable to the special characteristics of each patient’s ECG
`records. For example, we may include the training algorithm
`and the database used to develop the classifier to be delivered
`to the users, so that the classification algorithm can be fine-
`tuned to each patient. Unfortunately, this is impractical for
`several reasons.
`• While it is possible to turn over training algorithms and
`databases to the users in an academic environment, it
`is unlikely that any commercial ECG machine vendor
`is willing to risk revealing their proprietary information
`to their competitors. Moreover, in-house database often
`contains millions of ECG records which could be costly
`to distribute.
`• Users often do not want to be bothered by implementation
`details of an ECG algorithm. Thus, few users will be able
`to take advantage of this patient-adaptation feature even
`if it is available.
`• Even if a user is willing to perform the patient cus-
`tomization, he or she still have to provide sufficient
`number of patient-specific training data in order to per-
`form patient-adaptation. Manually editing ECG record is
`a time consuming, labor intensive task. Hence, the size of
`
`0018–9294/97$10.00 ª
`
`1997 IEEE
`
`APPLE 1049
`
`1
`
`
`
`892
`
`IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. 44, NO. 9, SEPTEMBER 1997
`
`patient-specific training data must be tightly controlled.
`In this study, we propose a novel approach to patient-
`adaptation while avoiding these difficulties: 1) We do not
`require the factory-trained ECG classifier to provide training
`algorithms or training databases. Instead, all we need is that
`this classifier gives both its classification results, as well as
`an estimate of posterior probability of the feature vector as is
`drawn from each particular class. Hence, no company propri-
`etary information is needed. 2) A patient-specific classifier will
`be developed using an automated procedure, without human
`supervision. 3) Only a brief manually edited patient ECG
`record (2–5 min) is needed to achieve significant performance
`improvement.
`This proposed approach is based on three popular artificial
`neural network (ANN)-related algorithms, namely, the self-
`organizing maps (SOM), learning vector quantization (LVQ)
`algorithms, along with the mixture-of-experts (MOE) method.
`SOM and LVQ together are used to train the patient-specific
`classifier, and MOE is a paradigm which facilitates the com-
`bination of the two classifiers (original and patient-specific)
`to realize patient-adaptation. In MOE, the two classifiers are
`modeled as two experts on ECG beat classification. The
`original classifier, called the Global expert (GE) in this work,
`knows how to classify ECG beats for many other patients
`whose ECG records are part of the in-house,
`large ECG
`database. The patient-specific classifier, called the local expert
`(LE) in this work, is trained specifically with the ECG record
`of the patient. A gating function, based on the feature vector
`presented, dynamically weights the classification results of the
`GE’s and the LE’s to reach a combined decision. The process
`is analogous to two human experts arriving at a consensus
`based on their own expertise.
`literature survey and
`Section II
`reports the results of
`Section III discusses data acquisition with preprocessing.
`Section IV discusses
`the proposed algorithms and the
`development of experts. Section V reports the results of the
`classifier on the database records and discusses the results.
`Section VI is a summary of the findings of this paper.
`
`II. PRELIMINARIES
`
`A. ECG Beat Classification Techniques
`Automated ECG beat classification was traditionally per-
`formed using a decision-tree-like approach, based on various
`features extracted from an ECG beat [1], [4], [5], [13], [20],
`[22]. The features used include the width and height of QRS
`complex, RR interval, QRS complex area, etc. One of the
`difficulties is that these features are susceptible to variations of
`ECG beat morphology and temporal characteristics. As such,
`the classification rate reported in these earlier efforts are rather
`moderate.
`Artificial neural networks (ANN’s) have been widely ac-
`cepted for pattern recognition tasks. Their abilities to learn
`from examples and extract the statistical properties of the
`examples presented during the training sessions, make them
`an ideal choice for an automated process that imitates human
`logic. Several efforts have been made to apply ANN’s for
`
`the purpose of ECG beat detection and classification. Previ-
`ous reported efforts include [2], [3], [6]–[8], [23]–[25], and
`[28]–[31].
`Hu et al. [7] reported the development of an adaptive
`multilayer perceptron (MLP) for classification of ECG beats.
`They have achieved an average recognition accuracy of 90%
`in classifying the beats into two groups; normal and abnormal.
`In an attempt to classify the beats into 13 groups according to
`the MIT Database annotations, they have reported an average
`recognition accuracy rate of 65%. An hierarchical system of
`the MLP networks which first classify the beat into normal
`or abnormal, and then classify it into the specific beat type, is
`developed, which improved the recognition accuracy to 84.5%.
`
`B. Self-Organization Map (SOM) and Learning
`Vector Quantization (LVQ)
`SOM and LVQ are both clustering based algorithms pro-
`posed by Kohonen [14], [15]. SOM is an unsupervised on-line
`clustering technique. In SOM, each cluster center (prototype
`or code word) is represented by the weights of a neuron which
`is assigned to a coordinate in the feature map. The SOM
`training algorithm forces adjacent neurons in the feature map
`to respond to similar feature (input) vectors. In a way, this
`feature map is analogous to the spatial organization of sensory
`processing areas in the brain. Let
`be denoted as the
`weights (code word) or the th neuron in SOM during the time
`instant
`, the weights of SOM then are updated according to
`the following simple formula:
`
`(1)
`
`is the so-called neighborhood kernel, which determine
`the size of neighborhood of the th neuron within which all
`neighboring neurons will be updated in response to the present
`feature vector
`. Initially, the neighborhood is large. The
`size reduces as clustering converges, until no neighboring
`neurons will get updated.
`LVQ is a supervised, clustering-based classification tech-
`nique which classifies a feature vector
`according to the
`label of the cluster prototype (code word) into which
`is
`clustered. Classification error occurs when the feature vectors
`within the same cluster (hence, assigned to the same class
`label) are actually drawn from different classes. To minimize
`classification error, the LVQ algorithm fine tunes the clustering
`boundary between clusters of different class labels by modi-
`fying the position of the clustering center (prototype or code
`word). This method is called “learning vector quantization”
`because this clustering based classification method is similar to
`the “vector quantization” method used for signal compression
`in the areas of communication and signal processing.
`According to Kohonen, there are three different LVQ algo-
`rithms, called LVQ1, LVQ2, and LVQ3 developed at subse-
`quent stages to handle classification problems with different
`natures. In this study, the optimized learning-rate LVQ1 and
`LVQ3 algorithms were used for the training and fine-tuning of
`the code book respectively. In LVQ1, for a given input vector
`, a code word
`is found such that
`
`(2)
`
`2
`
`
`
`HU et al.: PATIENT-ADAPTABLE ECG BEAT CLASSIFIER
`
`893
`
`The code word is then updated as follows:
`
`(3)
`
`and
`
`if the classification is correct [i.e.,
`where
`have the same class label] and
`, otherwise.
`is a time-varying learning rate. Other code words in the code
`book remain unchanged. LVQ3 differs from LVQ1 in how
`falls within
`the code words are updated: Assuming that
`a window between two adjacent clusters with corresponding
`code words
`and
`. Suppose that
`and
`belong to the
`same class, and
`and
`belong to different classes, then both
`these code words will be updated in LVQ3:
`
`(4a)
`(4b)
`
`belong to the same class
`and
`On the other hand, if both
`as
`, and
`fall in a window centered at the cluster
`boundary of these two classes, then
`
`(5)
`
`depends on the size of the window,
`The optimal value of
`being smaller for narrower windows. This algorithm is self-
`stabilizing, and optimal placement of the
`does not change
`in continual training.
`Software packages of both SOM and LVQ are available
`in the public domain,1 and the application of these packages
`to the ECG beat classification problem is straight forward.
`The adaptation parameters in these packages (SOM PAK and
`LVQ PAK) were carefully fine tuned while developing the
`classifiers. As such, the development of the code book and
`eventually decision boundary can be made completely trans-
`parent to the user. Moreover, performance obtained using these
`package is very competitive compared to other approaches. In
`this research work, we first apply SOM to a set of training
`feature vectors. The resulting code book (prototypes) then will
`be submitted to the LVQ PAK to facilitate fine tuning and
`classification.
`
`III. MIXTURE OF EXPERTS (MOE)
`This user adaptation problem bears certain resemblance to
`the incremental
`learning problem in that new data are to
`be incorporated to improve existing classifier’s performance.
`However, the black-box model of the existing classifier pre-
`vents us from directly modifying the classifier structure as
`incremental learning algorithms do. Instead, we propose a
`different method called the MOE, to circumvent this problem.
`The MOE approach was proposed by Jacobs et al. [9]–[12],
`[16], [26], [27]. The basic notion is that linear combinations
`of several statistical estimates can perform better than any
`individual estimate. This strategy is not new. It is a well
`known fact that a panel of experts often arrive at a better
`diagnosis than any single expert, because each expert is able
`to contribute from his/her own expertise.
`
`1 University of Helsinki, Finland, URL: ftp://cochlea.hut.fi/pub/
`
`The basic idea is to leave the existing black-box classifier
`intact. Instead, we use the given small, user-specific training
`data set to develop a LE classifier. Then we invoke a modified
`MOE approach to combine these two classifiers, hoping to
`achieve better performance.
`To apply the MOE approach to solve the customization
`problem, we employ two experts: a GE and a LE. The GE
`represents the ECG beat classifier developed in factory. Thus,
`it is trained to classify all types of ECG beats present in the
`in-house ECG database. The LE represents a specialized ECG
`beat classifier, trained on a small segment of annotated ECG
`beats taken from the specific patient. As such, the GE and the
`LE are endowed with complementary knowledge bases, and
`can work together to reach a better decision than any one can
`reach individually.
`The expert network is a combination of the GE and LE
`classifiers. Let
`and
`be the output (row) vectors of
`the two respective GE and LE classifiers. Each element of each
`vector indicates the degree of proximity of an unknown ECG
`beat to a predefined ECG beat class (category). In the MOE
`method, the combined th output vector of both the experts
`is given by
`
`(6)
`
`are the
`,
`is the input feature vector,
`where
`weighting vectors for each expert from a gating network and
`are defined by
`
`(7)
`
`’s are the weight vectors of the gating network. Note
`.
`
`where
`that
`, and
`Theorem 1: Define
`,
`,
`to be the subregion in the feature space where
`the classifier
`makes correct classification of
`and let
`be defined the same way. Assume
`and
`, then
`
`(8)
`
`and
`Proof: We need only to prove that if both
`cannot
`misclassify a given feature vector
`, then
`give correct classification on . Since the correct classifica-
`tion output
`, the combined output
`, and individual
`classifier output
`and
`are all binary vectors of the
`same dimension, if both classifiers misclassify a given feature
`vector which belongs to class
`, we must have, for the th
`elements of these binary vectors
`
`where “ ” is the “exclusive-OR” operator in Boolean algebra.
`Since from (7),
`, we conclude
`, if
`, if
`.
`Hence,
`. In other words,
`must also
`misclassify the same feature vector
`regardless the choice
`
`, and
`
`3
`
`
`
`894
`
`IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. 44, NO. 9, SEPTEMBER 1997
`
`of
`
`, and if
`
`. This is to say, if
`and
`.
`, then
`The implication of Theorem 1 is that the maximum per-
`formance enhancement of a MOE approach occurs when
`(empty set). An example is to
`designate each classifier to be responsible for classifying
`a particular class. The assumption that
`is
`essential in this theorem. If
`(interval between
`zero and one), it is possible to find a counter example. Let
`,
`, and
`
`. Then
`
`, then
`
`. If
`which yields correct
`
`classification.
`On the other hand, whether
`takes binary values or
`not, if both classifiers make correct classification, so will the
`combined classifier.
`Theorem 2: With the same definitions as in Theorem 1, and
`
`Proof: Assume
`
`, and
`
`[class
`. Then
`
`(9)
`
`,
`
`(10)
`
`is correctly classified.
`Thus, the output
`From Theorem 2, it is clear that if both classifiers #1 and #2
`correctly classify a pattern , then the combined classifier will
`also correctly classify the same pattern. Hence, this pattern can
`be excluded from the user-adaptation training set as it will not
`affect the result.
`indicated in
`Adaptation Algorithm: Based on the result
`Theorems 1 and 2, the design objective of the MOE network in
`(3) is to devise a training algorithm to estimate the parameter
`vectors
`. Given that
`and
`are fixed
`classifiers, this problem can be solved by a gradient procedure
`as follows: Let us assume
`be a set
`of training data used for searching the optimal gating functions
`and
`, such that the square error at the output
`is minimized.
`A gradient search algorithm can be devised as follows:
`
`(11)
`
`The initial values of
`are set to be the centroids of
`and
`the regions
`, respectively, for
`in the
`and
`user-specific training data set. The gradient of
`with respect
`to
`can be calculated as
`
`(12)
`
`(13)
`
`. In (13), we
`where
`assumed the transfer function
`is a differentiable threshold
`function, and is applied to the vector, element by element.
`Finally, with (13), we have as shown in (14) at the bottom of
`the page. Hence, for
`, we have
`
`(15)
`
`is accumulated
`Note that in above derivation, the error
`over the entire epoch (
`feature vectors). The summation
`over
`may be removed if we use on-line update of
`’s
`for each sample. This yields the following expression for
`:
`
`diag
`
`(16)
`
`. This is not surprising
`Clearly, we have
`with two parameter vectors arriving at a decision hyperplane
`.
`Until now, we have assumed that the user-specific ECG
`is readily available. However, in reality
`beat classifier
`it needs to be trained with the user-specific training data set.
`Also, the combined classifier
`needs to be trained by
`the same data set in order to determine the gating network
`parameters. Therefore, if
`is trained to 100% accuracy
`
`(14)
`
`4
`
`
`
`HU et al.: PATIENT-ADAPTABLE ECG BEAT CLASSIFIER
`
`895
`
`on the user-specific data set, then the gating network of choice
`may be
`and
`. In light of the results
`of Theorems 1 and 2, we devised the following strategy to
`alleviate this problem: First, we construct the user-specific
`training data set to contain only those feature vectors which
`the original classifier misclassified. We further partition this
`training data set into two subsets: one for the training of the
`user-specific classifier
`, and the other for estimating
`and
`.
`
`IV. EXPERIMENT
`The purpose of this experiment is to demonstrate the useful-
`ness of the proposed user-adaptation procedure. In particular,
`we will show that an ECG beat classifier trained on general
`patient records does not perform well when presented with
`patient records which contain rare beat types. Moreover, we
`show that the performance of the MOE classifier is able to gain
`significant performance enhancement with a small amount of
`annotated patient specific training data.
`
`A. Data Preparation
`In this study, we concentrate on the classification of ven-
`tricular ectopic beats (VEB’s). The 48 records (tapes) from
`MIT/BIH ECG arrhythmia database [17], [19] are used for the
`development and evaluation of the classifier. The availability
`of annotated MIT/BIH database has enabled the evaluation of
`performance of the proposed beat classification algorithm. The
`American Association of Medical Instrumentation (AAMI)-
`recommended practice [18] has provided a protocol for a
`reproducible test with realistic clinical requirements, empha-
`sizing tape-by-tape presentation of results that estimate an
`algorithm’s ability to detect events of clinical significance.
`Accompanying each tape in the MIT/BIH database is an
`annotation file in which each ECG beat has been identified
`by expert cardiologist annotators. These labels are referred to
`as “truth” annotations and are used in training (developing)
`the classifiers and also to evaluate the performance of the
`classifiers (experts) in testing phase. According to the AAMI-
`recommended practice, records containing the paced beats
`(four records) can be excluded from the reporting require-
`ments. Since this study is to evaluate the performance of a
`classifier that can identify a premature ventricular contraction
`(PVC), certain records in the database with no PVC’s (11
`records) were excluded from the study, leaving 33 records of
`interest. These excluded records are listed in Table I. Data
`from channel 1, down-sampled to 180 samples/s were used in
`this study. The selected files consist of 13 records (numbered
`from 100–124, inclusive, with some numbers missing) and
`20 records (numbered from 200–234, inclusive, with some
`numbers missing). The first group is intended to serve as a
`representative sample of a variety of waveforms and artifacts
`which an arrhythmia detector might encounter in routine
`clinical use. Records in the second group include complex
`ventricular, junctional, and supraventricular arrhythmias and
`conduction abnormalities. Several of these records are ex-
`pected to present significant difficulty to arrhythmia detectors
`because of the features of the rhythm, QRS morphology
`
`TABLE I
`RECORDS OF MIT/BIH DATABASE THAT WERE EXCLUDED FROM THE STUDY
`
`TABLE II
`FOUR CATEGORIES OF INTEREST INTO WHICH THE
`ECG BEATS OF THIS STUDY ARE CLASSIFIED
`
`variation, and signal quality. These records were reported to
`have gained considerable notoriety among database users [18].
`In this experiment, we use the first group of files as the
`training data to develop a GE classifier which is able to
`classify typical ECG beats. The second group of 20 records
`is used to simulate the ECG records of 20 patients, which
`are to be classified by the GE classifier. Since these records
`consist of less-frequently seen beats, it is expected that the
`GE classifier will not perform well. If this GE classifier were
`a commercial device, it will be deemed not-applicable (due to
`low performance) to many of these 20 test records. However,
`with the MOE approach, we will adapt this GE classifier with
`a LE classifier to gain significant performance enhancement
`at low cost.
`The beats in the MIT/BIH database are of several different
`types. In this study, we are interested in identifying four
`different categories, as indicated in Table II. Each of the
`four categories included beats of several types as shown in
`Table III. The AAMI convention was used to combine the
`beats into four classes of interest.
`
`B. Training and Testing Procedure
`In this study, a GE classifier was developed with SOM and
`LVQ algorithms using the data from the records of the first
`group (100–124). Before testing the records, a LE classifier
`was developed for each of the records in the second group
`using the first 2.5 min of data. The rest of the record is
`then tested using the mixture of global and LE’s as explained
`before. Since each record in the MIT/BIH database is of
`length 30 min, the 2.5 min segment account for 1/12th of total
`available patient specific data and contains approximately 150
`ECG beats. In practice, the attending cardiologist or any expert
`in ECG beat annotation will have to annotate a brief segment
`of patient-specific ECG in order to take advantage of the
`MOE approach. We believe that this is a reasonably small cost
`compared to the potential gain in performance enhancement.
`In future, we will explore a more effective method to further
`reduce the amount of required annotated patient-specific data.
`
`5
`
`
`
`896
`
`IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. 44, NO. 9, SEPTEMBER 1997
`
`Fig. 1. Record by record comparison of sensitivity of three methods: GE, LE, and MOE.
`
`TABLE III
`BEATS OF MIT/BIH DATABASE CLUBBED INTO FOUR
`CATEGORIES BASED ON AAMI-RECOMMENDED PRACTICE
`
`The GE and LE classifiers were developed using the cluster-
`ing algorithm implemented in SOM PAK, and the fine-tuning
`algorithm implemented in LVQ PAK. The MOE algorithm
`was implemented in MATLAB. The SOM’s developed using
`all the data available in the training files had many of the
`nodes tuned to the normal beats providing a greater detail
`to the normal beats than to the abnormal ones. This lead
`to a successful recognition of most normal ECG beats and
`suboptimal recognition accuracies of abnormal beats. This bias
`was introduced due to the amount of data that falls into the
`category of normals was about ten times more than the data for
`other rhythms. Since the detail of the map is dependent upon
`the amount of data falling into that category, it is essential
`to provide equal amounts of data for each class. Therefore,
`normal beats were clustered (using SOM) and the prototype
`vectors developed were added to the dataset of beats from
`
`other categories forming sensitized data. The sensitized data
`was then used for developing the GE.
`1) Preprocessing: The objective of this paper is to classify
`the QRS beats into one of the four different categories. The
`QRS beats are obtained as 29 point templates. The position
`of annotation labels is used to identify the peak of the QRS
`waveform and 14 points on either side of the peak were picked
`up to form the template.
`The 29-dimensional template is then reduced to a nine-
`dimensional vector using principal-component analysis, also
`known as the Karhunen–Loeve transformation. It is designed
`such that the data set may be represented by a reduced number
`of “effective” features and yet retain most of the intrinsic
`information content of the data. We may reduce the number of
`features needed for effective data representation by discarding
`those linear combinations that have small variances and retain
`only those terms that have large variances. The data vector
`is then approximated with the
`largest eigenvalues of the
`, introducing an approximating error.
`correlation matrix
`Temporal parameters such as the instantaneous RR interval,
`average RR interval, and the width of the QRS complex were
`also extracted. The instantaneous RR interval is calculated as
`the difference between the QRS peak of the present beat and
`the previous beat. The average RR interval is calculated as the
`average RR interval over the previous ten beats. The width of
`the QRS complex is calculated according to the Pan–Hamilton
`algorithm [21].
`The information of each beat is stored as a 13-element
`vector, with the first nine elements representing the trans-
`formed morphological template, and the next three elements
`representing the temporal parameters. This leads to a 12-
`dimensional feature vector. The thirteenth element
`is the
`“label” of the beat from the annotation file, after suitable
`translation as described in Table III.
`Several preprocessing steps were performed on the raw data
`to study their effects upon the performance of the classifiers.
`Specifically, subtracting the mean value from each template
`showed a remarkable improvement in the performance of the
`
`6
`
`
`
`HU et al.: PATIENT-ADAPTABLE ECG BEAT CLASSIFIER
`
`897
`
`TABLE IV
`IDENTIFICATION OF TP, FP, TN, AND FN IN THIS STUDY.
`N(n): NORMAL BEATS, V(v): PREMATURE VENTRICULAR
`CONTRACTIONS, F(f): FUSION BEATS, Q(q): UNCLASSIFIABLE BEATS
`
`TABLE V
`COMPARISON OF PERFORMANCE BETWEEN THE GE, LE, AND MOE
`CLASSIFIERS. ALL ENTRIES ARE IN PERCENT (%). FOR THOSE RECORDS
`WHERE FP = TP = 0, POSITIVE PREDICTIVITY IS ASSIGNED TO
`NAN (NOT A NUMBER) BECAUSE ITS DENOMINATOR IS ZERO
`
`LVQ classifier. Even though the morphology of the beats
`belonging to the same category is similar, a baseline change
`can represent the data differently in the signal space. To avoid
`this problem, the mean value of the templates is subtracted.
`1 and
`1 before
`Templates were also scaled linearly between
`the expert classifiers are developed. Temporal
`information
`of the beats such as instantaneous RR interval, average RR
`interval over the past ten beats, and the width of QRS complex
`showed improvement in the classification of PVC beats.
`2) Training of the Global and Local Expert Classifiers: For
`the GE classifier, the sensitized data from 13 MIT/BIH data-
`base tapes (#100–124) is used to develop a SOM of size
`10 neurons. This is accomplished using SOM PAK. The
`15
`weights of each neuron form a code word in the code book
`of 150 code words. Each code word, or equivalently the
`associated neuron, then is labeled using annotated data. The
`label of the code word is assigned based on the label of
`annotated feature vectors assigned to that cluster.
`Another classifier of 150 code words, based on LVQ al-
`is developed using LVQ PAK. The classification
`gorithm,
`performance of the classifier developed using LVQ is superior
`for classes 1 and 3, whereas, the performance of the classifier
`developed using SOM is superior for classes 2 and 4. There-
`fore, the code books generated by LVQ and SOM were edited
`manually to select and combine those code words which yield
`superior performance. The resulting code book constitutes the
`GE classifier.
`To enable the “soft combination” of the classifier output,
`it is desired that the outputs of each classifier be an estimate
`of the a posterior probability of the feature vector belonging
`to that class. To facilitate this requirement, we assume that
`the posterior probability is a mixture of Gaussian distribution
`with each code word in the class being the mean of a
`Gaussian distribution with unity variance. This is a reasonable
`assumption since each code word is obtained using the SOM
`clustering algorithm based on the L norm distance measure.
`Therefore, for large amount of samples, the posterior proba-
`bility distribution of each class will converge to a Gaussian
`distribution asymptotically. For small samples such as those
`used for training a LE, a Gaussian distribution assumption
`seems to be an adequate approximation. Next the distance
`between a feature vector
`denoted by
`and the nearest code word of class
`,
`is computed.
`output of this GE classifier then is computed
`The class
`which is proportional to the Gaussian density
`as
`.
`function
`
`The LE classifier is developed in exactly the same manner
`as the global classifier, except that it uses only the first two
`and half minutes in the tape, and is constructed separately for
`each particular “patient tape” (tape #200–234) in the MIT/BIH
`database. We choose the first 2.5 min for training LE’s and
`the next 2.5 min of data to training the gating network of
`the MOE classifier. This practice is conformed to the AAMI-
`recommended procedure which allows to use of the first 5 min
`of data in each tape to fine tune the classifier. During testing
`with the combined MOE classifiers, only the last 25 min in
`each tape are used. Hence the testing data are never part of
`any training data through the entire process.
`3) Mixture of Experts (MOE) Classifier: A gating network
`provides the scaling factors (
`’s) for each class of both
`experts. The output of the gating network is a 2
`4 matrix,
`with each row forming a scaling factor vector for each expert.
`The weights of the gating network are simply determined as
`the centroids of the regions as indicated by the code-book
`vectors of the corresponding expert.
`The output of the classifier
`is calculated as given by
`(6). Each input vector is classified into the class which has
`maximum output in the output vector
`. Through extensive
`experimentation, we further modified the computation of the
`gating network output so that
`[i.e.,
`],
`if
`regardless of what was calculated from the
`gating network. This is intuitively convincing because it yields
`a decision for the LE when the LE classifier is certain about
`its diagnosis. We found that this modification improves the
`accuracy of the combined classifier and also improves the
`sensitivity.
`
`7
`
`
`
`898
`
`IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. 44, NO. 9, SEPTEMBER 1997
`
`TABLE VI
`BEAT-BY-BEAT, RECORD-BY-RECORD TESTING RESULTS OF THE EXP