Open source

The code can be found at my GitHub repo. If you are already familiar with the models, just read the code. The code is based on my understanding of the research papers (one in Nature and one other) and of the open source. The host and main contributors of the linked repo are co-authors of the original research papers. The two related papers are easy to understand; if you do not have much time to read them, see their blog post about this research. Their open code is also well-organized and clean, but it skips the MIT-BIH data, and it might be a bit complex for beginners. I did my best to make the model architecture easier to understand and fed the MIT-BIH dataset to the model.

I also thank the host of this GitHub repo. They used a slightly different model1 to join the CinC competitions, and their open code was helpful to me. In fact, I found this repo before the code of the original research papers.

The model

The model is mainly made of three blocks: an input-plus-resnet block, a resnet loop-block, and an output block. One of the 1-D convolution layers in the loop-block halves the size of the layer every other loop. Thus, over 15 loops, it shrinks the input signal down to 1/256 of its size. The number of filters doubles (with zero-padding) every 4 loops: the filter count grows from 1 to 32 in the input block and enlarges from 32 to 256 in the loop-block.
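That size arithmetic can be traced with a small sketch. The function and the exact loop schedule below are my reading of the architecture, not the authors' code: halve the length on alternate loops (8 halvings in 15 loops) and double the filter count every 4 loops (3 doublings), and a 256-sample input with 32 filters ends up as a (1, 256) tensor:

```python
def loop_block_shapes(length=256, filters=32, loops=15):
    """Trace (length, filters) through the resnet loop-block (my reading)."""
    shapes = [(length, filters)]
    for i in range(loops):
        if i % 2 == 0:            # stride-2 convolution on alternate loops
            length //= 2
        if i > 0 and i % 4 == 0:  # zero-padded filter doubling every 4 loops
            filters *= 2
        shapes.append((length, filters))
    return shapes

shapes = loop_block_shapes()
print(shapes[0], "->", shapes[-1])  # (256, 32) -> (1, 256)
```

This is why the output block needs no flattening layer: after the loops the spatial dimension is already 1.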


Cleaning data

As I mentioned in the previous posts, I used the MIT-BIH dataset. Compared to the roughly 8,000 CINC records, MIT-BIH contains only 47 signals; however, each signal is long, with 650,000 samples each. Moreover, the signals carry a sample-level annotation for each peak; in other words, we have many labels per signal. Thus, I collected all the high peaks of the heartbeat signals and matched each peak to a 256-sample slice centered on it. This 256-sample slice then fits into the model without a flattening layer.2
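A minimal sketch of that slicing step (the function name and toy data are mine; in practice the peak indices come from the MIT-BIH annotation files, e.g. via the wfdb package): each annotated peak index becomes one 256-sample window centered on the peak, paired with that peak's beat label.

```python
import numpy as np

def slice_beats(signal, peak_samples, labels, width=256):
    """Turn one long record into fixed-width slices, one per annotated peak."""
    half = width // 2
    X, y = [], []
    for peak, label in zip(peak_samples, labels):
        start = peak - half
        if start < 0 or start + width > len(signal):
            continue  # drop peaks too close to the record edges
        X.append(signal[start:start + width])
        y.append(label)
    return np.array(X), np.array(y)

# toy record: 2,000 samples with three annotated "peaks";
# the last peak sits too close to the edge and is dropped
sig = np.random.randn(2000)
X, y = slice_beats(sig, [200, 500, 1990], ["N", "V", "N"])
print(X.shape, list(y))  # (2, 256) ['N', 'V']
```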

Ironically, the large amount of N-labeled data causes trouble for this model. Normal sinus rhythm data is easy to obtain, so most of the collected data is N-labeled. During training, to reduce the cost (loss) of the model, finding weights that make the N predictions more precise is far more efficient than finding weights for a balanced classifier, so the model ends up with good recall and F1 scores only on the dominant label.


Let us look at the dataset the Stanford research group used, in the image above. It looks well-balanced: normal sinus signals are still the majority, but not by as much as in the MIT-BIH data. Thus, I decided to randomly remove 85 percent of the sinus signals from the MIT-BIH dataset. I also reduced the number of categories from more than 15 to only 5 by dropping the minor categories; these five kinds of heartbeat signals each have enough data for a qualified classifier. I removed the L and R labels as well. These are bundle branch block beats, which were originally labeled as normal beats. They seem hard to distinguish from normal beats in this kind of classifier without some specific digital signal processing that I have not yet studied in detail; the Stanford research group also excluded these bundle branch block beats from their label categories.

The MIT-BIH dataset does not contain many unrecognized beats or noise, so I took as much noise as the other kinds of beats from the CINC2017 dataset and resampled it to a matched sampling rate3, which gives six categories. Why did Stanford have 12 labels while we have only six? Their extra labels are fine classifications among ventricular or paced beats, which the CNN classifier can distinguish very well. Now we have normal: 9,460, ventricular: 5,951, paced: 2,074, A: 2,092, F: 761, and noise: 2,428, for a total of 22,766 256-sample heartbeat slices. This is about 1/4 the size of the data the Stanford group used.
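The rebalancing step above can be sketched in a few lines (the function name and toy data are illustrative): randomly keep only 15 percent of the N-labeled slices while all other labels are kept whole.

```python
import numpy as np

def drop_normals(X, y, frac_removed=0.85, seed=0):
    """Randomly remove frac_removed of the N-labeled slices."""
    rng = np.random.default_rng(seed)
    keep = np.ones(len(y), dtype=bool)
    n_idx = np.flatnonzero(y == "N")
    drop = rng.choice(n_idx, size=int(len(n_idx) * frac_removed), replace=False)
    keep[drop] = False
    return X[keep], y[keep]

X = np.zeros((1000, 256))
y = np.array(["N"] * 900 + ["V"] * 100)
Xb, yb = drop_normals(X, y)
print((yb == "N").sum(), (yb == "V").sum())  # 135 100
```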

Some of you might ask why I do not apply a bandpass or low-cut filter to the signals. We do not need the small noises in the signal for classification. The image below was made with a rough low-pass filter and looks good. However, the convolutional layers already play the role of those various filters, and the optimizer finds the best filter configurations automatically. With a proper preprocessing filter we might be able to use smaller neural network layers, which could be more efficient, but I won't do that here because we want to see how well the CNN classifier works on its own.
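For comparison, a rough low-pass filter like the one behind the image can be as simple as a moving average. This is only a sketch of what such preprocessing would look like; the model itself is fed unfiltered slices.

```python
import numpy as np

def moving_average(signal, window=9):
    """A crude low-pass filter: average each sample with its neighbors."""
    kernel = np.ones(window) / window
    return np.convolve(signal, kernel, mode="same")

t = np.linspace(0, 1, 360)                        # one second at 360 Hz
clean = np.sin(2 * np.pi * 2 * t)                 # slow "heartbeat" component
noisy = clean + 0.3 * np.sin(2 * np.pi * 90 * t)  # high-frequency noise
smoothed = moving_average(noisy)
# the smoothed signal is closer to the clean component than the noisy one
print(np.abs(smoothed - clean).mean() < np.abs(noisy - clean).mean())  # True
```

A learned `Conv1D` kernel can represent exactly this kind of averaging kernel, which is why the network can absorb the filtering step.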


There are five lead types in the MIT-BIH dataset: MLII, V1, V2, V4, and V5. I mainly trained and tested on the MLII and V1 leads, since the other leads do not have much data. The signal is recorded with different magnitudes and shapes depending on the device and where it is measured. V1 has the potential to classify the bundle branch block beats discussed above; I do not know much about it and leave it for you to think about.


As a result, I got very good validation F1 scores for normal, ventricular, and paced beats and for noise. If I increase the amount of data for the other labels and make the data more balanced, the metrics will improve further. I did not tune the hyperparameters carefully, so you might be able to improve the scores.





Using the trained model, we can predict on the CINC2017 competition training set and compare the results with its true labels. Try it by cloning the open source I uploaded.
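A sketch of that prediction loop, reporting only non-normal slices. The label ordering, the function name, and the stub probabilities are my assumptions; in the real script the `probs` array would come from `model.predict(slices)` on the trained Keras model.

```python
import numpy as np

LABELS = ["N", "V", "P", "A", "F", "noise"]  # my ordering of the six categories

def report_abnormal(probs):
    """probs: (num_slices, 6) softmax outputs; list the non-normal predictions."""
    lines = []
    for i, p in enumerate(probs):
        k = int(np.argmax(p))
        if LABELS[k] != "N":  # skip slices predicted as normal beats
            lines.append(f"slice {i}: {LABELS[k]} ({100 * p[k]:.1f}%)")
    return lines

# stub standing in for model.predict: slice 0 normal, slice 1 ventricular
probs = np.array([[0.90, 0.05, 0.01, 0.02, 0.01, 0.01],
                  [0.20, 0.70, 0.03, 0.03, 0.02, 0.02]])
print(report_abnormal(probs))  # ['slice 1: V (70.0%)']
```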


Then you’ll see the probabilities and annotations only when the predicted labels are not normal beats. There are only two abnormal slices among the 41 total slices in the image above. The model predicted a normal sinus beat with 93.6 percent probability, and the true label matches.

However, we can think of a more subtle situation: when 30% of the peaks are predicted as ventricular and 70% as normal, the person the signal belongs to might have a heart problem. If only a few peaks are predicted as abnormal, and with low probability, then there may be nothing to worry about. In other words, when averaging the probabilities over the slices, the second-highest predicted label might be the more realistic label for the signal, unless the probability of “N” is very high, say over 85 percent.
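The averaging rule above can be sketched like this. The label ordering, the function name, and the exact 85-percent threshold are my assumptions following the text: average the per-slice softmax outputs over the record, and fall back to the second-highest label when “N” wins without a dominant share.

```python
import numpy as np

LABELS = ["N", "V", "P", "A", "F", "noise"]  # my ordering of the six categories

def record_label(slice_probs, n_threshold=0.85):
    """Aggregate per-slice probabilities into one label for the whole record."""
    mean = np.asarray(slice_probs).mean(axis=0)
    order = np.argsort(mean)[::-1]       # labels sorted by mean probability
    top = LABELS[order[0]]
    if top == "N" and mean[order[0]] < n_threshold:
        return LABELS[order[1]]          # second-best label is more telling
    return top

# 70% of slices confidently normal, 30% confidently ventricular:
probs = [[0.95, 0.05, 0, 0, 0, 0]] * 7 + [[0.1, 0.9, 0, 0, 0, 0]] * 3
print(record_label(probs))  # "N" averages 0.695 < 0.85, so "V" is returned
```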


The image above shows part of the predictions over the slices (it was too long to capture all of it), and there are many abnormal-beat labels. However, the most predicted label is still “N” with 55.6 percent, and the second predicted label matches the true label, “A”, with 25.9 percent. The third probability, 8.2 percent, is much smaller than the second.

I have realized that the ‘O’ (other) label of the CINC data is not well predicted by this trainer. I thought the trainer’s ‘V’ label would correspond to CINC’s ‘O’, but it did not. Maybe the categories I removed for lack of numbers would match this ‘O’ label. I do not want to spend too much time on it, since this trainer is limited anyway by the small size of its dataset. We might be allowed to include CINC data or the open iRhythm data, but those are different from the noise I added: the signals are influenced by the device and where it is attached, and mixing different types of data might ruin the balance of the data. I won’t do that before learning more about signal analysis.

  1. It is hard to see the difference between the two models in the model figure introduced in this post. However, looking into the code, the numbers of dropout and max-pooling layers and the kernel sizes of the convolution layers differ between the two. The CINC model still achieved a good accuracy score in the CINC2017 competition. I followed the model of the Stanford research group (the original). [return]
  2. keras.Model.summary shows that the output shapes match up well without a flattening layer. The layer after the loop has a size-1 input with many filters, (1, number of filters), and the shape of the target label layer is designed to be (1, number of categories of the result). [return]
  3. I randomly picked 28.4 percent of the noise signals from the CINC2017 dataset. The sampling rates of the MIT-BIH, CINC2017, and iRhythm datasets are 360 Hz, 300 Hz, and 200 Hz, respectively. They should be matched in a trainer, even for recorded noise. [return]
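The rate matching in footnote 3 can be sketched with plain linear interpolation (the function name is mine; `scipy.signal.resample` would also work): a one-second CINC2017 noise segment at 300 Hz becomes 360 samples at MIT-BIH's 360 Hz.

```python
import numpy as np

def resample(signal, fs_in=300, fs_out=360):
    """Resample a 1-D signal from fs_in to fs_out by linear interpolation."""
    n_out = int(round(len(signal) * fs_out / fs_in))
    t_in = np.arange(len(signal)) / fs_in
    t_out = np.arange(n_out) / fs_out
    return np.interp(t_out, t_in, signal)

x = np.random.randn(300)  # one second of "noise" at 300 Hz
y = resample(x)
print(len(y))             # 360 samples: one second at 360 Hz
```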