TY - JOUR
T1 - ECG Beat classification: Impact of linear dependent samples
AU - Hintermüller, Christoph
AU - Hirnschrodt, Michael
AU - Blessberger, Hermann
AU - Steinwender, Clemens
PY - 2023
Y1 - 2023
N2 - The electrocardiogram (ECG) is a very valuable clinical tool to assess the electric function of the heart. It provides insight into the different phases of the heart beat and the various kinds of disorders which may affect them. The impact of linear dependency between feature signals on the classification outcome, and how to reduce it, has been widely investigated and discussed in the literature. This study focuses on linear dependency between samples of imbalanced data sets, its relation to the observed overfitting with respect to majority classes, and how to reduce it. A set of 58 feature signals is used to train several LDA classifiers discriminating either 3 classes (normal, artefact, arrhythmic) or 5 classes (normal, artefact, atrial and ventricular premature contractions, and bundle branch blocks). The training data set is preprocessed using four sample reduction approaches and a nearest neighbour clustering method. For 5 classes, accuracies of 96.82 % in the imbalanced case and 97.44 % for the data preprocessed with the QR or SVD methods were obtained. For 3 classes, accuracies of 97.68 % and 98.12 % were achieved. With the nearest neighbour clustering method, accuracies of only 96.00 % for 5 classes and 97.37 % for 3 classes could be achieved. The results clearly show that imbalanced ECG data does contain linearly dependent samples. These cause a bias towards the majority class, which is then overfitted by the classifier. Sample reduction methods and algorithms that are not aware of the presence of linearly dependent samples, such as the nearest neighbour clustering approach, increase this bias even further or, even worse, destroy relevant information by merging samples that encode distinct aspects of the beat class.
AB - The electrocardiogram (ECG) is a very valuable clinical tool to assess the electric function of the heart. It provides insight into the different phases of the heart beat and the various kinds of disorders which may affect them. The impact of linear dependency between feature signals on the classification outcome, and how to reduce it, has been widely investigated and discussed in the literature. This study focuses on linear dependency between samples of imbalanced data sets, its relation to the observed overfitting with respect to majority classes, and how to reduce it. A set of 58 feature signals is used to train several LDA classifiers discriminating either 3 classes (normal, artefact, arrhythmic) or 5 classes (normal, artefact, atrial and ventricular premature contractions, and bundle branch blocks). The training data set is preprocessed using four sample reduction approaches and a nearest neighbour clustering method. For 5 classes, accuracies of 96.82 % in the imbalanced case and 97.44 % for the data preprocessed with the QR or SVD methods were obtained. For 3 classes, accuracies of 97.68 % and 98.12 % were achieved. With the nearest neighbour clustering method, accuracies of only 96.00 % for 5 classes and 97.37 % for 3 classes could be achieved. The results clearly show that imbalanced ECG data does contain linearly dependent samples. These cause a bias towards the majority class, which is then overfitted by the classifier. Sample reduction methods and algorithms that are not aware of the presence of linearly dependent samples, such as the nearest neighbour clustering approach, increase this bias even further or, even worse, destroy relevant information by merging samples that encode distinct aspects of the beat class.
U2 - 10.1515/cdbme-2023-1207
DO - 10.1515/cdbme-2023-1207
M3 - Article
VL - 9
SP - 23
EP - 26
JO - Current Directions in Biomedical Engineering
JF - Current Directions in Biomedical Engineering
IS - 2
ER -