1 Introduction

The application of Machine Learning (ML) techniques to medical biosignal data and sensors has received tremendous attention from researchers in recent decades. However, the advent of the Internet of Medical Things (IoMT), such as implantable medical devices and personal wearable devices, has led to data of large volume and variety, known as big data (Belle et al. 2015; Corradi et al. 2019; Konan and Patel 2018), which traditional ML and feature engineering are not suited to process. Moreover, ML techniques depend largely on hand-crafted features, which require deep domain knowledge to extract effectively and are hence time consuming to produce (Hammad et al. 2019; Pyakillya et al. 2017; Taherisadr et al. 2018). ML models also suffer from vanishing gradients and overfitting, which degrade the performance of the trained models (Z.-J. Yao et al. 2018).

In response to these challenges facing shallow ML techniques, Deep Learning (DL) algorithms emerged. DL algorithms perform better as datasets become larger, and the vanishing gradient problem can be mitigated within DL architectures. DL can extract clinically relevant information hidden in large volumes of healthcare data and consequently help in treatment, decision making and the prevention of health conditions (Alom et al. 2019). DL, as a sub-class of ML, has become a buzzword that has dominated Artificial Intelligence (AI) applications and research in recent years.

The popularity of DL is a result of its success in handling complex data analytics problems, such as computer vision (Voulodimos et al. 2018), object recognition, detection and segmentation (Wang 2016), face recognition (Shepley 2019), speech recognition (Nassif et al. 2019), image classification and localization (Pak and Kim 2017), natural language processing (Young et al. 2018), robotics (Pierson and Gashler 2017) and medical imaging (Lee et al. 2017), better than traditional ML techniques and feature engineering (Ince et al. 2017; Kwon et al. 2020; Tobore et al. 2019; Voulodimos et al. 2018). Other major differences between DL and traditional ML lie in the presentation of data during model training (Miotto et al. 2018) and the way features are extracted (Alom et al. 2019). The early years of the 21st century marked the “age of deep learning” (Voulodimos et al. 2018).

However, DL rose to popularity in the last decade, when Krizhevsky et al. achieved a major performance improvement with their convolutional neural network model (AlexNet), which reduced the top-5 error by roughly 10 percentage points and won the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC-2012) (Krizhevsky et al. 2012). Other reasons for the rise of DL include the wide availability of data and advances in computing power. DL models generalize better when trained with huge amounts of data, at a computational cost handled by Graphics Processing Units (GPUs), special processors that allow massive parallel computing. Moreover, the development of software frameworks such as Keras, TensorFlow, Theano and PyTorch (Voulodimos et al. 2018) has allowed researchers to focus on DL structures. Another common reason is that DL does not need hand-crafted features or feature engineering processes to extract features for training; instead, it performs feature extraction, feature selection and classification automatically (Li et al. 2020). DL is therefore considered a promising technology for the big data generated by medical applications. DL models have also produced state-of-the-art performances, and more domains are gravitating towards them.

The electrocardiogram (ECG) is one of the most successful heart disease diagnostic tools and has attracted tremendous attention from researchers for decades. ECG signals are biosignals that can help detect heartbeat irregularities by recording the bioelectrical and muscle activity of the heart (Shankar and Babu 2020). Biosignals are electrical, thermal, mechanical or other signals measured over time from the human body (Ganapathy et al. 2018). The ECG instrument records heartbeat activity continuously over time, which can be used for diagnosis. It records the electrical activity of the heart through noninvasive electrodes (leads) placed on the subject’s body, typically the chest and limbs (Assodiky et al. 2017; Isin and Ozdalili 2017; Wang et al. 2019). These leads measure the voltage variations triggered by the involuntary impulses of the cardiac cells as they make the heart contract. In effect, these variations form the heartbeats that are seen as a series of waves. Timing information of the electrical activity can be traced from the morphologies of the waves (Abdeldayem and Bourlai 2019) and used to detect heart and coronary related pathologies. The application of DL to ECG signals for various medical and healthcare applications has gained tremendous attention from the research community in the last decade. Such applications are essential for automatically diagnosing patients’ cardiovascular diseases where there is a shortage of experienced and qualified medical doctors to interpret the ECG signals, or even to complement the work of cardiologists (Ribeiro et al. 2020).

This review is focused on related studies that use DL in ECG signals for various analyses from different domains.

Fig. 1

Google Trends for “electrocardiogram + ECG” compared with “deep learning + deep neural networks” from January 2010 to May 2020: (a) occurrence timeline; (b) occurrence by country, sorted by interest in “deep learning + deep neural networks”

Figure 1 illustrates the Google Trends results for DL compared with ECG. The search was performed using web search, for all categories, worldwide, from January 2010 to May 2020. The graph in Fig. 1(a) shows interest vs. time on a scale of 0 to 100, where a score of 0 means there was insufficient data for the term and a score of 100 signifies that the term was at its peak popularity at that time. Figure 1(a) shows a steady rise of interest in both ECG and DL, with ECG reaching its peak value of 100 in February 2020. DL, on the other hand, drew minimal interest from 2010 to 2012, but shows growing interest from 2012 onward. This may be a result of the renewed interest and popularity gained by DL when the CNN-based model AlexNet reduced the top-5 error by roughly 10 percentage points and won ILSVRC-2012 (Krizhevsky et al. 2012). Figure 1(b) shows the top 5 countries where the search terms were most popular, sorted by interest in “deep learning + deep neural networks”, with China, South Korea, Japan, Germany and Taiwan topping the list.

There are a number of review papers in the literature on the applications of DL to ECG signals from different perspectives (Bote-Curiel et al. 2019; Faust et al. 2018; Ganapathy et al. 2018; Hong et al. 2020; Rim et al. 2020; Tobore et al. 2019; Z.-J. Yao et al. 2018). Most of these papers address the applications of DL to physiological signal analysis as a whole, including ECG, and for various medical and/or healthcare applications. However, the literature also reveals applications of DL to ECG-based biometric systems for human identification (Bajare and Ingale 2019; Byeon et al. 2020; P.-L. Hong et al. 2019) and authentication (Hammad et al. 2018, 2019; Hammad and Wang 2019). There are also applications of DL to ECG-based driver drowsiness detection (Abbas 2020), stress level classification (Rastgoo et al. 2019) and pilot workload prediction towards mitigating the risk of accidents (Xi et al. 2019). Hong et al. (2020) conducted a systematic review focused on applications of DL to ECG data, considering the model architecture, the source of data and the task perspectives, but it was not sufficiently extensive.

This study improves on (Hong et al. 2020) by presenting a comprehensive systematic review and meta-data analysis of the applications of DL to ECG signals with respect to domains that were not covered in the previous survey.

The subsequent sections of this study are structured as follows (Fig. 2 shows the general organization of the complete review): Sect. 2 presents the evolution of DL. Section 3 gives an overview of ECG. Section 4 reviews related works. Section 5 outlines the systematic literature review process. Section 6 presents the general discussion and meta-data analysis. Section 7 discusses challenges and future research directions, and the conclusion is presented in Sect. 8.

The contributions of this study are summarized as follows: The study presents a taxonomy of domains for DL in ECG-based applications: medical/healthcare, biometric/security and driving. The paper highlights unresolved challenges and points out future directions. To the best of the authors’ knowledge, this is the first review to study the applications of DL to ECG signals with respect to different domains. We carried out a meta-data analysis of the DL approaches in ECG signals based on their application domains, ECG preprocessing methods, application areas, DL application tasks, DL models, DL model performances, dataset sources and training architectures.

Fig. 2

General Organization of the Complete Review

2 The evolution of Deep Learning

DL is the new generation of Artificial Neural Networks (ANNs), a subset of ML, which is in turn a subset of AI (Alom et al. 2019; Nguyen et al. 2019). The emergence of AI can be traced back to the 1950s, when scientists thought computers could do things at the level of human intelligence. Notably, in 1950, Alan Turing asked the question, “Can machines think?” This question led to a journey of inventions, from knowledge-based systems (also called symbolic AI) to ML models (Chollet 2018; Mohammed et al. 2016). In 1956, John McCarthy, together with colleagues from IBM and Claude Shannon, organized the first AI conference at Dartmouth College, USA, where the term “Artificial Intelligence” was first coined and later used in the second conference. AI is the simulation of human intelligence on computers (Jiang et al. 2017; Nguyen et al. 2019); it involves making computers perform tasks as well as, or even better than, a human would. AI therefore deals with the automation of human intelligence so that machines can perform tasks as intelligently as humans (Chollet 2018).

The ML paradigm came in handy to curtail the limitations of symbolic AI, whose explicit rules could not handle more complex and fuzzy problems such as speech recognition, computer vision, image processing, text classification, natural language processing and pattern recognition. Instead of following a predefined set of rules mapping input data to output data, as in symbolic AI, an ML system is “trained” to learn (just as humans learn by experience) by supplying input data with labeled answers to the system. ML enables computers to learn without being explicitly programmed (Chollet 2018; Mohammed et al. 2016). Learning refers to a procedure of tuning the model parameters such that the learned model can perform a specific task (Alom et al. 2019). This enables an AI system to sieve information from raw data and draw inferences based on input-output relationships, thereby learning from experience for better generalization to new raw data (McBee et al. 2018). ML algorithms that have gained popularity include the support vector machine (SVM), k-nearest neighbors (KNN), decision trees (DT), the multilayer perceptron (MLP) and so on. ML therefore provides a glimpse of hope towards achieving the ultimate goal of AI: the automation of human intelligence. Figure 3 shows the relationship between AI, ML, ANN and DL.
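To make the idea of training on labeled examples concrete, the following is a minimal, hypothetical sketch using scikit-learn and its bundled Iris data (not any dataset from this review); the model and split are arbitrary illustrative choices:

```python
# A shallow supervised model: labeled examples are supplied to the model,
# which learns an input-output mapping instead of following explicit rules.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)                 # hand-crafted features + labels
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

clf = SVC(kernel="rbf")                           # a classic shallow model (SVM)
clf.fit(X_tr, y_tr)                               # "training": tune parameters from labeled data
print("test accuracy:", clf.score(X_te, y_te))    # generalization to unseen data
```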

Fig. 3

A Venn diagram showing the relationship between AI, ML, ANN and DL

When it comes to performing work, intelligence is the central difference between humans and machines. Humans can learn from experience to take decisions, but machines cannot; they are built to perform specific, predefined sets of tasks. ML aims at bridging this gap, and the advances made in bridging it were furthered by the introduction of the ANN.

An ANN is a software simulation of how the biological brain functions: an extremely simplified model of the animal brain. From the literature, ANNs have witnessed three developmental waves so far: the introduction of the perceptron in the 1950s, the development of backpropagation for multi-layer networks in the 1970s, and the introduction of DL in the 1990s. The first ANN model was introduced by the neurophysiologist Warren McCulloch and the mathematician Walter Pitts in 1943. Since then, scientists have made further contributions, with Frank Rosenblatt inventing the first perceptron in 1957 (Mohammed et al. 2016). The perceptron is the simplest representation of a biological neuron in an ANN.

Fig. 4

 A Neuron vs. Perceptron

Figure 4 shows a typical representation of a perceptron (right) and a biological neuron (left). The inputs (x1, x2, x3, …, xn) represent the dendrites carrying input data. Each input is multiplied by a randomly generated weight (w1, w2, w3, …, wn). The dot product of (x1, x2, x3, …, xn) and (w1, w2, w3, …, wn) is summed and a pre-determined value, called the bias, is added; this represents the body of a biological neuron. The activation f, which represents the axon, is then computed (originally using what is called a step function). A step function can only estimate linear relations in the data; more recent activation functions, such as the sigmoid, the Rectified Linear Unit (ReLU) and the hyperbolic tangent (tanh), allow estimating complex, nonlinear relations in the input data and provide a normalization effect on the output. The output y is one (1) if the result exceeds a certain threshold value and zero (0) otherwise. The output y is computed using Eq. (1).

$$y=f\left(\sum_{k=0}^{n} w_k x_k + \mathrm{bias}\right)$$
(1)
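As an illustration, here is a minimal NumPy sketch of Eq. (1); the input, weight and bias values are made up for demonstration:

```python
import numpy as np

def step(z):
    """Step activation f: outputs 1 above the threshold (0 here), else 0."""
    return 1 if z > 0 else 0

def perceptron(x, w, bias):
    """Eq. (1): weighted sum of inputs plus bias, passed through the activation f."""
    return step(np.dot(w, x) + bias)

x = np.array([0.5, -1.2, 3.0])      # inputs x1..xn (arbitrary example values)
w = np.array([0.4, 0.7, -0.2])      # randomly generated weights w1..wn
print(perceptron(x, w, bias=0.1))   # 1 if the weighted sum exceeds the threshold
```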

In 1969, Marvin Minsky and Seymour Papert published a book called “Perceptrons”, which pointed out the limitations of the perceptron: it could not solve more complex functions such as XOR logic and could not tackle nonlinearity in the input data. The authors argued that the single-perceptron approach could not be translated effectively into a multi-layered ANN. As a result, ANN projects suffered cuts in funding from organizations. However, in 1981, Paul Werbos proposed the first efficient ANN with backpropagation. The backpropagation algorithm works by fine-tuning the weights of an ANN based on the error computed in the previous iteration, in order to reduce the difference between the actual output and the desired output (the error), thus increasing the network’s generalization ability (Fig. 5). In 1986, the work of Rumelhart et al. propagated the use of backpropagation and introduced hidden layers (Mohammed et al. 2016; Schmidhuber 2015). Figure 5 depicts a simple representation of an ANN with backpropagation. It may consist of tens to hundreds of neurons arranged in layers, with each layer connected to the layers on both sides. It has three parts: the input unit (left), the hidden layer(s) (middle) and the output layer (right), which generates the results.

Fig. 5

Typical artificial neural network with backpropagation processes

Figure 5 represents the working principle of backpropagation. First, input data X is supplied to the network (1). A set of randomly generated weights W is multiplied with each X and summed with a bias value (2). The output layer (3) computes the output of the model being trained. The loss function is calculated (4) and backpropagation is applied (5) to backtrack through the hidden layers and fine-tune the weights such that the loss is reduced. This process continues until the model is well trained or the set number of epochs is reached.
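The following is a minimal, illustrative NumPy sketch of this loop, using the XOR problem mentioned earlier as toy data; the layer sizes, learning rate and epoch count are arbitrary assumptions:

```python
import numpy as np

# Forward pass, loss, backpropagation of the error, weight updates (Fig. 5).
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # input data (1)
y = np.array([[0], [1], [1], [0]], dtype=float)              # XOR targets

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # random weights + biases (2)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 1.0

for epoch in range(5000):
    h = sigmoid(X @ W1 + b1)          # hidden layer
    out = sigmoid(h @ W2 + b2)        # output layer (3)
    loss = np.mean((out - y) ** 2)    # loss function (4)

    # Backpropagation (5): chain rule from the output error back to each weight.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

print(out.round(3))  # after training, outputs should approach the XOR targets [0, 1, 1, 0]
```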

DL is based on ANNs that apply linear and nonlinear transformations to the input data, from the input layer through multiple hidden layers of processing units to the output layer (Ganapathy et al. 2018). These multiple processing layers imitate how the human brain represents data with multiple levels of abstraction. This helps the network understand multimodal information, implicitly capturing the salient structures of large volumes of data (Voulodimos et al. 2018). A paper published by Hinton et al. in 2006 (Hinton et al. 2006) is credited as the first to introduce the concept and method of DL. DL is a form of ML that enables machines to learn from experience and from a hierarchy of concepts of the world (Kim 2016). Conversely, the form of ML that makes machines learn using only three layers of representation (input, hidden and output) is sometimes called “shallow” learning or a shallow model (Z.-J. Yao et al. 2018). The term “deep” in DL connotes the idea of hierarchical layers of representation, with modern DL having tens or hundreds of hierarchical layers (Chollet 2018; Faust, Hagiwara, et al. 2018). The major differences between DL and traditional ANNs lie in the number of hidden layers (Fig. 6), their connections and the capability to learn meaningful abstractions of the input data (Miotto et al. 2018; Tobore et al. 2019). However, performance depends largely on the nature of the data representation presented to the model (Kim 2016). Since the 1990s, DL has successfully been used in different applications and has grown to become probably the best-known field of AI (Bote-Curiel et al. 2019). For a more detailed timeline of the history and evolution of DL, the reader is referred to (Emmert-Streib et al. 2020), as such detail is not within the scope of this work. DL is sometimes called the universal learning approach because it has been found applicable in almost all areas; in this sense, DL is task-independent (Alom et al. 2019). However, the potentials and opportunities of DL architectures are still being explored.

Fig. 6

Comparison between an ANN and DL. A typical ANN has three layers: input, hidden and output. A DL model has an input layer, two or more hidden layers and an output layer

Figure 6 depicts the comparison between ANNs and DL. ANNs usually have three layers through which learning takes place towards the output. DL consists of many hidden layers, from tens to hundreds, in which the network learns and fine-tunes errors using the backpropagation algorithm to extract the salient features and structures of the inputs for a more generalizable model. Learning approaches in ML and DL are classified into supervised learning, unsupervised learning, semi-supervised learning and reinforcement learning (Alom et al. 2019; Mohammed et al. 2016).

2.1 Deep learning architectures

In this section, we discuss the DL architectures commonly used in ECG signal processing.

2.1.1 Deep neural networks

Deep Neural Networks (DNNs) are ML techniques with deeper networks than the traditional ANN. Although DL models in general can be referred to as DNNs, for the sake of clarity in this review a DNN is considered to be a conventional ANN with at least two hidden layers. Hence, a DNN is an ANN with at least 4 layers, comprising the input layer, the hidden layers and the output layer (Sannino and De Pietro 2018). DNNs suffer from the vanishing gradient problem and are, moreover, difficult to train. Over the years, many researchers have tried to solve these challenges, which led to the introduction of the different kinds of DL architectures discussed below (Z.-J. Yao et al. 2018). Figure 7 shows the structure of the DNN, and a minimal code sketch is given after the figure.

Fig. 7

The typical DNN Structure
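As a hedged illustration of this definition, here is a minimal Keras sketch of a DNN in the above sense, with two hidden layers; the input length (a flattened 1-D signal of 187 samples), the layer widths and the 5-class output are arbitrary assumptions, not taken from any reviewed study:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# A DNN in the sense used here: input, at least two hidden layers, output.
model = models.Sequential([
    layers.Input(shape=(187,)),             # e.g. one flattened ECG segment (assumed length)
    layers.Dense(64, activation="relu"),    # hidden layer 1
    layers.Dense(32, activation="relu"),    # hidden layer 2
    layers.Dense(5, activation="softmax"),  # output layer (e.g. 5 assumed classes)
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```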

2.1.2 Convolutional neural networks

Convolutional Neural Networks (CNNs) are inspired by the human visual cortex, whose organization was characterized in 1962 by Hubel and Wiesel (Hubel and Wiesel 1962), and are the most utilized neural networks for computer vision and video recognition (Shrestha and Mahmood 2019). The CNN is a DL-based technique designed to automatically and adaptively learn features from input images/data and classify the input into the desired classes. The CNN structure has three building blocks: the convolution layers, the pooling layers and the fully connected layers. The convolution and pooling layers are normally employed to extract features, and the fully connected layer is used for classification (Voulodimos et al. 2018; Yamashita et al. 2018). The concept of the CNN was first proposed by Fukushima (Fukushima 1995), but it was not widely used until 1998, when LeCun et al. designed a CNN for document analysis and recorded good results on handwritten digit classification (LeCun et al. 1998). The CNN then rose to popularity about 14 years later, when Krizhevsky et al. made a high performance improvement with their model known as AlexNet, which won ILSVRC-2012 (Krizhevsky et al. 2012). The year after, in 2013, ZF Net was developed as an enhancement of AlexNet (Zeiler and Fergus 2014). It was followed by GoogLeNet, a 22-layer deep network developed by researchers from Google Inc. (Szegedy et al. 2015).

GoogLeNet won the ILSVRC 2014 challenge. Simonyan and Zisserman (Simonyan and Zisserman 2014) proposed VGGNet (also called VGG), which won first and second place in ILSVRC 2014 for the localization and classification tasks respectively. ResNet, another powerful CNN architecture referred to as a residual learning framework, was proposed by He et al. (He et al. 2016); the model won first place in the ILSVRC 2015 classification task. There are other variants of ResNet, such as the ResNet50, ResNet34 and ResNeXt architectures. SqueezeNet (Iandola et al. 2016) is a compact model with roughly 510 times smaller memory requirements, achieved through a deep compression technique. The densely connected CNN (DenseNet) (Huang et al. 2017) was proposed based on the deep CNN model. A DenseNet is designed in such a way that every layer is connected to every other layer in a feed-forward fashion, which gives it an edge over previous CNNs: it reduces the vanishing-gradient problem, strengthens feature propagation, encourages feature reuse and substantially reduces the number of parameters required. The CNN is considered the most used DL architecture for classification (Pourbabaee et al. 2018; Schmidhuber 2015; Shi et al. 2020). CNNs have been applied in computer vision (Khan et al. 2018; Voulodimos et al. 2018), language translation (Yin et al. 2017), image segmentation (Kayalibay et al. 2017), object recognition (Kulik and Shtanko 2020) and so on. The architecture of the CNN (see Fig. 8) shows the different components of the network.

Fig. 8

The architecture of CNN (Sengupta et al. 2020)

Input data, e.g. an image of size m × m × r, where m is the height and width of the image and r is the number of channels, passes from the input layer through the convolution and pooling layers. The features of the input image (also called the input tensor) are extracted, and classification takes place at the fully connected layers, where the softmax function is applied to assign the input to a class with probabilistic values in [0, 1] (Sengupta et al. 2020).

The convolutional layer: This is the first layer, which receives the input data. The kernel k, of size n × n × q, where n is smaller than the input image dimension m and q can be equal to r, is an array of numbers, called a tensor. The dimension of the resulting output is m − n + 1. An element-wise multiplication between k and the input tensor is computed at each location as the kernel slides over the input at a given stride length (the distance between two successive kernel positions), and the products are summed to obtain the value at the corresponding position of the output tensor, called the feature map. The kernel hops using the same stride value and repeats the process until the entire image is traversed (see Fig. 9). The convolution operation can be performed over multiple convolutional layers, thereby capturing higher-level features for better performance. A key feature of the convolution operation is weight sharing: kernels are shared across all image positions. In cases where the kernel does not fit perfectly on the input, valid padding or zero padding is applied: with valid padding, the convolved feature is reduced in dimensionality compared with the input, since the positions where the kernel does not fit are dropped; zero padding pads the input with zeros so that the kernel fits. In the convolution operation, the size of the kernels (typically 3 × 3), the number of kernels, the padding and the stride value are decided prior to training. These parameters are likewise set for the pooling layer. The convolution operation is expressed mathematically in Eq. (2) (Alom et al. 2019):

$$x_j^l = f\left(\sum_{i \in M_j} x_i^{l-1} * k_{ij}^l + b_j^l\right),$$
(2)

where:

\(x_j^l\) = current layer output

\(x_i^{l-1}\) = previous layer output

\(k_{ij}^l\) = current layer kernel

\(b_j^l\) = bias of the current layer

\(M_j\) = selection of the input maps

A nonlinear operation, the activation function, is applied to the output of the convolution operation. Traditionally the sigmoid function was used, but the most common nonlinear activation function at present is the rectified linear unit (ReLU), which simply computes f(x) = max(0, x).
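The following is a minimal NumPy sketch of this valid (unpadded) convolution followed by a ReLU; the 5 × 5 image and 3 × 3 kernel values are arbitrary toy inputs:

```python
import numpy as np

def relu(z):
    """ReLU activation: f(x) = max(0, x)."""
    return np.maximum(0.0, z)

def conv2d_valid(image, kernel, stride=1, bias=0.0):
    """Valid 2-D convolution as described above: the kernel slides over the
    image, and the element-wise products are summed at each position to build
    the feature map. As in most DL frameworks, the kernel is applied without
    flipping (cross-correlation)."""
    m, n = image.shape[0], kernel.shape[0]
    out = (m - n) // stride + 1          # m - n + 1 positions when stride = 1
    fmap = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            patch = image[i*stride:i*stride+n, j*stride:j*stride+n]
            fmap[i, j] = np.sum(patch * kernel) + bias
    return relu(fmap)

image = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 single-channel input
kernel = np.array([[1., 0., -1.]] * 3)             # toy 3x3 kernel (assumed values)
print(conv2d_valid(image, kernel))                 # 3x3 feature map, as in Fig. 9
```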

Fig. 9

Illustration of a convolution operation with a stride of 1, a kernel size of 3 × 3 and no padding (Yamashita et al. 2018)

The pooling layer

This layer performs a sub-sampling operation on the convolved features (the feature map) before the next convolutional layer. The aim is to decrease the computational power required to process the data through dimensionality reduction and to help reduce the chance of overfitting, while retaining the important information. There are different pooling operations, such as max, average and sum pooling, with max and average pooling being the most popular (Alom et al. 2019; Voulodimos et al. 2018). The sub-sampling operation is computed using Eq. (3).

$$x_j^l = \mathrm{down}\left(x_j^{l-1}\right)$$
(3)
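A minimal NumPy sketch of the down(·) operation in Eq. (3), here instantiated as 2 × 2 max pooling on a toy feature map:

```python
import numpy as np

def max_pool2d(fmap, size=2, stride=2):
    """Max pooling: keep the largest value in each size x size window,
    shrinking the feature map while retaining the strongest activations."""
    h = (fmap.shape[0] - size) // stride + 1
    w = (fmap.shape[1] - size) // stride + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = fmap[i*stride:i*stride+size, j*stride:j*stride+size].max()
    return out

fmap = np.array([[1., 3., 2., 4.],
                 [5., 6., 7., 8.],
                 [3., 2., 1., 0.],
                 [1., 2., 3., 4.]])
print(max_pool2d(fmap))   # 2x2 output: [[6., 8.], [3., 4.]]
```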

The fully connected (FC) layer

This layer flattens the output feature maps of the last convolutional or pooling layer from 2-dimensional (2D) feature maps into a 1-dimensional (1D) array of numbers (a vector) and feeds it into a fully connected layer for classification. Every input is connected to every output by a learnable weight. Each fully connected layer is followed by a nonlinear function such as ReLU; finally, an activation function such as softmax or sigmoid is used to classify the input image (Yamashita et al. 2018).

2.1.3 Deep recurrent neural networks

A deep recurrent neural network is a variant of the recurrent neural network (RNN) with at least two hidden layers, in which the current output depends on computations performed previously, hence the name recurrent (Andersen et al. 2019). An RNN carries out training by remembering the previous step at the current step. This differs from a feed-forward neural network, which computes only in the forward direction and is therefore inefficient for sequential data with dependencies, such as time series prediction, speech recognition and voice semantic recognition. An RNN is said to possess a kind of “memory” that remembers the computation done before the current state, giving the RNN a form of contextual information. Another important feature of the RNN is that it shares the same parameters from the input layer through the hidden layers to the output layer, thereby reducing the complexity of parameters in contrast with other ANNs. Training an RNN is preferably performed on data with interdependencies, so as to maintain information about what occurred in the previous interval (Tobore et al. 2019). However, the advantages of the RNN come with the disadvantage of the vanishing gradient problem (Bengio et al. 1994; Shrestha and Mahmood 2019). To this effect, the long short-term memory (LSTM) unit was proposed, which effectively handles this challenge (Hochreiter and Schmidhuber 1997). Another RNN-based architecture is the gated recurrent unit (GRU), a special case of the LSTM; it has performance equivalent to the LSTM but is faster (Lynn et al. 2019; Wang et al. 2018). In this review, we only present the architecture of the LSTM, as the most used variant of the RNN (Yildirim 2018). The architecture of the LSTM is depicted in Fig. 10.

Fig. 10

Architecture of LSTM (Tobore et al. 2019)

The LSTM extends the memory retention capability of the RNN, retaining values as a function of the input data, because the plain RNN suffers from lags. The LSTM structure has three gates (input, forget and output) that direct the flow of information from the previous to the current state, into the memory cell and on to the next memory cell (Sharma et al. 2020). The input gate establishes when the flow of new information into the memory should take place. The forget gate controls how long the stored information should be retained, freeing space for new data. The output gate decides when the stored information in the cell is used in the output (Hernández-Blanco et al. 2019). Figure 10 depicts the LSTM representation, where It, Ot and Mt are the current input, output and memory state of the LSTM cell; It−2 and It−1, Ot−2 and Ot−1, and Mt−2 and Mt−1 are the inputs, outputs and memory states for the previous time steps; Ot+1 and Mt+1 represent the output and memory state for the subsequent time step with input It+1; I, O and M represent the recurrent input, output and memory state for a simplified LSTM cell operation; and Wr is the weight for the computation in the cell. The LSTM’s temporal information processing ability makes it popular and widely used (Alom et al. 2019). The memory cell \({c}_{t}\) is updated while, at the same time, the output vector \({h}_{t}\) is produced, based on the following equations (Antczak 2018):

$$f_t=\sigma_g\left(W_f x_t + U_f h_{t-1} + b_f\right)$$
(4)
$$i_t=\sigma_g\left(W_i x_t + U_i h_{t-1} + b_i\right)$$
(5)
$$o_t=\sigma_g\left(W_o x_t + U_o h_{t-1} + b_o\right)$$
(6)
$$c_t=\sigma_c\left(W_c x_t + U_c h_{t-1} + b_c\right)\circ i_t + f_t \circ c_{t-1}$$
(7)
$$h_t=o_t \circ \sigma_h\left(c_t\right)$$
(8)

where:

\(x_t\) = input vector

\(f_t\), \(i_t\) and \(o_t\) = activation vectors of the forget, input and output gates

\(\sigma_g\), \(\sigma_c\) and \(\sigma_h\) = activation functions; typically \(\sigma_g\) is the sigmoid function, while \(\sigma_c\) and \(\sigma_h\) are tanh

\(\circ\) = the Hadamard (element-wise) product
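To make Eqs. (4)-(8) concrete, here is a minimal NumPy sketch of a single LSTM cell step; the parameter shapes and the random toy sequence are assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step implementing Eqs. (4)-(8); W, U, b hold per-gate
    parameters, and * is the Hadamard (element-wise) product."""
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])   # Eq. (4): forget gate
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])   # Eq. (5): input gate
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])   # Eq. (6): output gate
    c_t = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"]) * i_t + f_t * c_prev  # Eq. (7)
    h_t = o_t * np.tanh(c_t)                                 # Eq. (8)
    return h_t, c_t

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4                                   # assumed input/hidden sizes
W = {g: rng.normal(size=(n_hid, n_in)) for g in "fioc"}
U = {g: rng.normal(size=(n_hid, n_hid)) for g in "fioc"}
b = {g: np.zeros(n_hid) for g in "fioc"}

h, c = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.normal(size=(5, n_in)):               # a toy 5-step input sequence
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h)                                             # final output vector h_t
```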

2.1.4 Restricted Boltzmann Machines

Boltzmann machines (BMs) are a kind of bidirectionally connected network with symmetrically coupled stochastic visible and hidden units. The visible units represent the first layer of the network and correspond to the components of an observation, whereas the hidden units model the dependences between these components. Each unit updates its state over time in a probabilistic manner depending on the states of the neighboring units, which makes the learning of BMs computationally intensive. The restricted BM (RBM) is a parameterized generative stochastic ANN with undirected interactions between pairs of visible and hidden units. The RBM was designed to impose restrictions on the network topology of the BM, thereby reducing the learning complexity of its parameters (Fischer and Igel 2012, 2014; Upadhya and Sastry 2019). The RBM is thus a variation of the BM obtained by restricting the intra-layer connections between the visible and hidden units, forming a bipartite graph structure, hence the name RBM (Sengupta et al. 2020). Paul Smolensky was the first to introduce the RBM, in 1986, although he called it the Harmonium (Smolensky 1986).

The RBM has the ability to learn the input probability distribution in both supervised and unsupervised settings, hence its popularity as a DL building block (Sengupta et al. 2020). There are two main DL architectures that incorporate the RBM as a learning module, namely Deep Belief Networks (DBNs) and Deep Boltzmann Machines (DBMs); both are considered to belong to the “Boltzmann family” (Voulodimos et al. 2018). Special emphasis is given to the DBN in this study. The DBN is a model based on the combination of two different types of ANN: specifically, DBNs amalgamate RBMs as the input unit with Deep Feedforward Neural Networks (D-FFNNs), which form the output unit (Emmert-Streib et al. 2020). Figure 11 depicts the structure of the DBN.

Fig. 11

Deep Belief Network Structure

The DBN has undirected connections between its top two layers and directed connections between all its subsequent layers (Voulodimos et al. 2018) (see Fig. 11). The DBN is initialized by greedy layer-wise training of the RBMs (Tobore et al. 2019; Voulodimos et al. 2018; Z.-J. Yao et al. 2018). Hinton and Salakhutdinov (Hinton and Salakhutdinov 2006) first proposed the DBN and introduced the greedy layer-by-layer unsupervised learning algorithm that allows efficient training of these deep, hierarchical models. The DBN, as a generative graphical model, learns to extract a deep hierarchical representation of the training data. It models the joint distribution between the observed vector x and the hidden layers as in Eq. (9).

$$p\left(x,{h}^{1},\dots ,{h}^{l}\right)=\left(\prod_{k=0}^{l-2} p\left({h}^{k} \mid {h}^{k+1}\right)\right) p\left({h}^{l-1},{h}^{l}\right)$$
(9)

where \(x={h}^{0}\); \(p\left({h}^{k} \mid {h}^{k+1}\right)\) is the conditional distribution for the visible units at level \(k\) conditioned on the hidden units of the RBM at level \(k+1\); and \(p\left({h}^{l-1},{h}^{l}\right)\) is the visible-hidden joint distribution of the top-level RBM (Voulodimos et al. 2018).
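For intuition about how an individual RBM layer is trained during the greedy layer-wise procedure, here is a minimal NumPy sketch of one contrastive divergence (CD-1) update, the standard RBM training rule in the literature; the layer sizes, learning rate and binary toy observation are assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_vis, n_hid, lr = 6, 3, 0.1                 # assumed layer sizes / learning rate
W = rng.normal(scale=0.1, size=(n_vis, n_hid))
b_v, b_h = np.zeros(n_vis), np.zeros(n_hid)  # visible / hidden biases

v0 = rng.integers(0, 2, size=n_vis).astype(float)   # one toy binary observation

# One contrastive divergence (CD-1) step. The bipartite structure means the
# conditionals factorize: hidden units depend only on visible units, and vice versa.
p_h0 = sigmoid(v0 @ W + b_h)                 # p(h=1 | v)
h0 = (rng.random(n_hid) < p_h0).astype(float)
p_v1 = sigmoid(h0 @ W.T + b_v)               # reconstruct visible units: p(v=1 | h)
p_h1 = sigmoid(p_v1 @ W + b_h)

# Update: data statistics minus reconstruction statistics.
W += lr * (np.outer(v0, p_h0) - np.outer(p_v1, p_h1))
b_v += lr * (v0 - p_v1)
b_h += lr * (p_h0 - p_h1)
```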

2.1.5 Autoencoders

An autoencoder (AE) is an unsupervised DL approach originally proposed by LeCun in 1987 (LeCun et al. 1998). It involves dimensionality reduction of the input data and reconstruction of the input at the output layer (Shrestha and Mahmood 2019). An AE is a network of three layers; it becomes a deep AE when it has multiple hidden layers. The input layer and output layer have the same number of units (the same dimensionality), while the hidden layers typically have fewer units, encoding the inputs in a more compressed form (Sengupta et al. 2020; Tobore et al. 2019). The AE architecture is presented in Fig. 12. An AE has two parts, the encoder and the decoder, and the network is trained using backpropagation. During the encoding phase, the inputs are encoded into a hidden representation using the weight matrices of the lower half of the network; in the decoding phase, the network tries to reconstruct the input from this encoded representation using the matrices of the upper half. The encoding and decoding phases can be expressed mathematically as in Eqs. (10) and (11) respectively.

$$y'=f\left(wx+b\right)$$
(10)
$$x'=f\left(w'y'+c\right)$$
(11)

where \(x\) and \(x'\) represent the input vector and the reconstructed input at the output layer respectively; \(w\) and \(b\) are the parameters to be tuned; \(w'\), the transpose of \(w\), and \(c\) are the weights and bias of the output layer respectively; \(y'\) is the hidden representation; and \(f\) is the activation function. The parameters are updated using the following equations:

$$w_{new}=w-\eta \,\partial E/\partial w$$
(12)
$$b_{new}=b-\eta \,\partial E/\partial b$$
(13)

where \(w_{new}\) and \(b_{new}\) stand for the updated values of \(w\) and \(b\) respectively, \(\eta\) is the learning rate, and \(E\) is the reconstruction error of the input at the output layer (Sengupta et al. 2020).
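As a hedged illustration, the following Keras sketch wires up Eqs. (10) and (11); the 784-dimensional input (e.g. a flattened 28 × 28 image) and the 32-unit code are arbitrary assumptions, and the gradient updates of Eqs. (12) and (13) are performed by the framework's optimizer:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Input and output have the same dimensionality; the hidden code is narrower.
inputs = layers.Input(shape=(784,))
code = layers.Dense(32, activation="relu")(inputs)        # encoder, Eq. (10)
outputs = layers.Dense(784, activation="sigmoid")(code)   # decoder, Eq. (11)

autoencoder = models.Model(inputs, outputs)
# Minimizing the reconstruction error E drives the parameter updates of
# Eqs. (12)-(13), here via the Adam variant of gradient descent.
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(x, x, epochs=10)   # note: the input is also the target
```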

Fig. 12

Architecture of Autoencoder (Sengupta et al. 2020)

2.1.6 Generative adversarial networks

The Generative Adversarial Network (GAN) is a DL architecture with an unsupervised learning approach, proposed by Goodfellow et al. in 2014 (Goodfellow et al. 2014). A GAN has two networks, a generator and a discriminator, which compete against each other simultaneously in a zero-sum game (Alom et al. 2019). The generative model tries to capture the data distribution, whereas the discriminative model learns to estimate the probability that a sample came from the training data rather than from the distribution captured by the generative model. This can be viewed as a minimax two-player game between the two models: the generative model produces adversarial examples while the discriminative model tries to identify them correctly, and both keep improving until the adversarial examples are indistinguishable from the original ones (Sengupta et al. 2020). Figure 13 illustrates the flow of information in a GAN, and a minimal sketch of the training loop is given after the figure.

Fig. 13

The Architecture of GAN
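To make the minimax game concrete, here is a minimal, hypothetical PyTorch sketch of the alternating GAN training loop; the network sizes, noise dimension, hyperparameters and toy "real" data are assumptions for illustration, not taken from any reviewed study:

```python
import torch
from torch import nn, optim

# Toy generator (noise -> 2-D sample) and discriminator (2-D sample -> real/fake).
G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
D = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
opt_g = optim.Adam(G.parameters(), lr=1e-3)
opt_d = optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

real_data = torch.randn(64, 2) * 0.5 + 2.0        # stand-in "real" distribution

for step in range(1000):
    # 1) Train D: learn to assign high probability to real and low to generated samples.
    z = torch.randn(64, 8)
    fake = G(z).detach()                          # freeze G while updating D
    loss_d = bce(D(real_data), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # 2) Train G: fool D into labeling generated samples as real (the minimax game).
    z = torch.randn(64, 8)
    loss_g = bce(D(G(z)), torch.ones(64, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```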

For a comprehensive comparison between the CNN and RNN, DBM and DBN, and AE and RBM architectures, the reader is referred to (Tobore et al. 2019), as this is beyond the scope of the present work.

3 The synopsis of Electrocardiogram

An electrocardiogram, usually abbreviated “ECG” or “EKG”, is a test that measures the electrical signals generated by heartbeat activity. It shows the condition of the heart and indicates the status of various cardiovascular diseases (CVDs). The ECG is a non-invasive and inexpensive tool, efficient in diagnosing cardiac disorders such as arrhythmia through continuous monitoring. The signals provide information that can aid in analyzing and understanding a person’s cardiac activity, such as heart rate, rhythm and morphology (Al Rahhal et al. 2016; Apandi et al. 2018; Park et al. 2019). Typically, an ECG test shows how long it takes for the electrical wave to pass through the heart, by measuring time intervals on the ECG; this can help doctors determine whether the electrical signal passing through the heart is normal, slow, fast or irregular. Secondly, measuring the amount of electrical activity passing through the heart muscle can help a cardiologist diagnose whether a part of the heart is overworked or enlarged. Electrocardiography is a century-old method with an established role in the care of patients with documented or suspected CVDs (Ribeiro et al. 2019). The ECG is one of several biosignals, or physiological signals, studied in the literature; others include electromyography (EMG, which concerns changes in the skeletal muscles), electroencephalography (EEG, which concerns changes in the brain measured from the scalp), electrooculography (EOG, which concerns changes in the corneo-retinal potential between the front and the back of the human eye) (Rim et al. 2020), photoplethysmography (PPG, which records the volumetric changes of an organ over time through changes in light absorption), phonocardiography (PCG), blood pressure and so on.

The ECG happened to be the first such diagnostic tool: developed in 1895 by Willem Einthoven, it found application in medical diagnosis (Ganapathy et al. 2018) and rose to become one of the most widely used tests for CVD disorders (Mincholé et al. 2019). The ECG has also found application in other areas: ECG-based biometric systems (both single and multimodal) have been proposed for human identification and authentication using the ECG as the physiological trait, and ECG-based driver drowsiness and stress level detection systems have been proposed in the literature to help reduce the rate of accidents. The ECG has further been used to estimate the size and position of the heart, to locate injury in the heart, and to ascertain the effectiveness of a drug (Byeon et al. 2020).

3.1 Types of ECG Machine

There are different types of ECG machines suitable for different situations and conditions. They measure ECG signals using in-the-person, on-the-person or off-the-person methods. The following are some of the commonly used types of ECG machines:

  • The resting 12-lead ECG machine: This is considered the standard ECG for measuring heartbeats. With 12 leads, more comprehensive signals can be obtained during the measurement (Ribeiro et al. 2020). Al Rahhal et al. (2018) demonstrated that the 12-lead ECG gives better performance in detecting premature ventricular contractions (PVCs) compared with ECGs with fewer leads. The test with this machine is carried out while the subject lies still, with the 12 leads placed on the chest, arms and legs to sense the electrical activity of the heart.

  • The ambulatory (Holter monitor) ECG machine: This type of ECG can record signals continuously for one to two days. Some arrhythmia abnormalities occur rarely and may not be detected during standard ECG testing; these kinds of arrhythmias are often tracked for 1 to 2 days using a Holter monitor (Takalo-Mattila et al. 2018). The electrodes may be connected to a small portable machine worn around the waist, or around the wrist like a wristwatch, to enable monitoring of the heart from home. These machines are suitable for healthcare applications involving remote, real-time monitoring of patients.

  • The exercise ECG machine: This is a special type of ECG machine used for stress tests, usually during exercise and stress activity. During the exercise test, breathing and blood pressure rates can also be monitored. This ECG test may be used to detect coronary artery disease and to establish safe levels of exercise following a heart attack or heart surgery.

3.2 ECG wave morphology

In the standard 12-lead ECG machine, measurement is taken by placing the leads on the body. The leads are the channels of recording: lead I, lead II, lead III, aVR, aVL, aVF, V1, V2, V3, V4, V5 and V6. Among them, lead II is most commonly used to evaluate the behavior of the five waves because it shows a clearer signal than the other leads (Amrani et al. 2018; Luo et al. 2017; Sugimoto et al. 2018). The leads are placed on the person’s skin, usually six of them on the chest, and every electrode produces a recording from a different angle. The resting 12-lead ECG is considered the most accurate tool for recording heartbeat rhythm (Assodiky et al. 2017). However, the configuration of the ECG system used to extract signals is application-dependent (Abdeldayem and Bourlai 2019): some applications may require lying down on a bed (in the case of the resting 12-lead ECG), while others may require long-term monitoring for hours to days (using a Holter monitor). Methods to measure the ECG are classified as in-the-person, on-the-person and off-the-person. On-the-person measurement is done using electrodes attached to the patient’s skin, which measure the electrical activity of the heart. In in-the-person measurement, a device is implanted into the patient’s body. Off-the-person methods measure the ECG without the need for skin contact, for example through capacitive measurement (da Silva et al. 2015; Muhammed and Aravinth 2019).

Fig. 14
The ECG signal components (source: https://www.cvphysiology.com/Arrhythmias/A009)

Figure 14 shows the components of the ECG waveform recorded by an ECG machine. The ECG comprises five (5) waves, called the PQRST waves. These waves give information about the electrical activities of the heart and can be used for the diagnosis of various heart disorders. The heartbeat originates as an electric pulse from the sinoatrial (SA) node situated in the right atrium of the heart. The SA node fires, causing the atria to contract and pump blood into the lower chambers of the heart (the ventricles). The P wave represents normal atrial (upper heart chamber) depolarization; it shows how the electrical impulse (excitation) spreads across the two atria of the heart. The Q, R and S waves, together called the QRS complex, represent a single heartbeat and correspond to the depolarization of the right and left ventricles (lower heart chambers). This occurs when the atria contract (squeeze), pumping blood into the ventricles, and then immediately relax; the electrical pulse generated by the SA node travels through the atrioventricular (AV) node, which electrically connects the atria and the ventricles, activating the ventricles and causing them to contract. The T wave represents the repolarization (or recovery) of the ventricles; it shows that the electrical impulse has stopped spreading and the ventricles relax once again (Antczak 2018; Banerjee et al. 2019; Swapna et al. 2018).

Fig. 15
The structure of the human heart (source: https://www.nhlbi.nih.gov/health-topics/how-heart-works)

3.3 Measurement of ECG waveforms and diagnoses

The reading of the normal ECG signal is done using the intervals between waves. The ECG signal is presented as P-QRS-T intervals: the P wave starts and ends before the QRS complex, with a duration of 0.06 to 0.12 seconds; the PR interval has a duration of 0.12 to 0.20 seconds, and an extended PR interval may indicate heart blockage; the QRS complex follows the PR interval with a duration of 0.06 to 0.10 seconds; the ST segment extends from the S wave to the start of the T wave; and the QT interval usually lasts 0.36 to 0.44 seconds (Konan and Patel 2018). Any extension of these intervals may indicate certain heart pathologies such as arrhythmias. The American Heart Association defines arrhythmia as any change from the normal sequence of electrical impulses. Arrhythmias include very fast heartbeats (tachycardia), such as supraventricular tachycardia, atrial tachycardia (fibrillation and flutter) and ventricular tachycardia; slow heartbeats (bradycardia), associated with AV heart blocks, bundle branch blocks and tachy-brady syndrome; irregular contraction of the upper heart chambers (atrial fibrillation); abnormal heartbeat conduction (conduction disorders); early heartbeats (premature contractions); and so on. Symptoms may include fainting, dizziness, weakness and, usually, pain in the chest; sometimes, people feel no symptoms at all (Assodiky et al. 2017; Kusuma and Udayan 2020).
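As a toy illustration of this interval-based reading, here is a hypothetical Python helper that flags intervals falling outside the textbook ranges quoted above; the thresholds are illustrative only, and real diagnostic software is far more involved:

```python
# Interval name -> (min, max) normal duration in seconds, as quoted above.
NORMAL_RANGES_S = {
    "P":   (0.06, 0.12),
    "PR":  (0.12, 0.20),
    "QRS": (0.06, 0.10),
    "QT":  (0.36, 0.44),
}

def flag_intervals(measured):
    """Return the intervals that fall outside the quoted normal ranges."""
    flags = {}
    for name, duration in measured.items():
        lo, hi = NORMAL_RANGES_S[name]
        if not (lo <= duration <= hi):
            flags[name] = f"{duration:.2f}s outside [{lo:.2f}, {hi:.2f}]s"
    return flags

# Example: a prolonged PR interval (0.24 s), which the text notes may
# indicate heart blockage.
print(flag_intervals({"P": 0.08, "PR": 0.24, "QRS": 0.08, "QT": 0.40}))
```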

However, the manual reading of ECG strips for diagnosis is time consuming, dependent on the proficiency of the cardiologist or physiologist and, moreover, often prone to human error due to fatigue (Apandi et al. 2018; Qayyum et al. 2019). DL has shown promising results in automatically extracting and analyzing the features of raw ECG data (Singh et al. 2018; Takalo-Mattila et al. 2018), enhancing the productivity of cardiologists by helping them make fast and accurate decisions. Other methods used to detect infections and conditions of the heart include the chest X-ray; chest X-ray images have been used with DL to detect the emerging respiratory infectious disease known as coronavirus disease 2019 (COVID-19) (Ouchicha et al. 2020; Polat et al.; X. Zhang et al.). Although COVID-19 is considered an illness of the lungs, a study revealed that 1 in 5 patients with COVID-19 show signs of heart injury (T. Guo et al. 2020).

4 Reviews of previous surveys

It is necessary to ascertain the need for a review before conducting it; a study has to start by identifying any existing reviews, based on appropriate evaluation criteria for the research area (Kitchenham and Charters 2007). In order to avoid duplication of review work, we made a general search based on the title and keywords of the current review paper and found a number of literature reviews that survey DL applications in ECG from different perspectives (Bote-Curiel et al. 2019; Faust et al. 2018; Ganapathy et al. 2018; S. Hong et al. 2020; Rim et al. 2020; Tobore et al. 2019; Z.-J. Yao et al. 2018). Ganapathy et al. (2018) conducted a review of DL applied to 1D biosignals such as ECG, EMG, PPG, PCG, EOG and others. Their study searched papers from the PubMed, Scopus and ACM databases and sampled 71 studies from 2010 to 2017 inclusive. The authors classified the DL models according to their origin, dimension, input biosignal, goal of the application, type of ground truth data, topology of the model and scheduling of learning. A study in (Faust et al. 2018) investigated works that used DL on physiological signals in healthcare applications. The survey focused on physiological signals such as EMG, EEG, ECG and EOG, and extracted parameters such as the specific application area, DL model, system performance and type of dataset used to develop each model; it considered 53 papers published from 2008 to 2017 inclusive. In another study, Yao et al. (Z.-J. Yao et al. 2018) conducted a survey of the applications of DL in healthcare, focusing on 7 application areas: electronic health records (EHR), ECG, EEG, community healthcare, data from wearable devices, drug analysis and genomics analysis. The survey discussed the merits and demerits of the identified studies and the existing challenges before proposing future directions, and concluded by highlighting the robustness and interesting features that make DL suitable for clinical and healthcare data. In another survey, Tobore et al. (2019) pointed out some biomedical domains for DL intervention in healthcare challenges, capturing papers from the PubMed and IEEE Xplore databases published between 2012 and 2017 inclusive. The study presented the applications of DL in healthcare classified into biological systems, e-health records, medical images and physiological signals, and proposed research directions for enhancing health management through physiological signal applications. In the review conducted by S. Hong et al. (2020), a systematic review is presented of the opportunities and challenges of DL techniques on ECG data, focusing on the model architectures, applications and datasets; the study included 191 papers from the Google Scholar, PubMed and DBLP databases published from January 2010 to February 2020 inclusive, and the authors concluded by presenting challenges such as interpretability, scalability and efficiency as potential areas for further study. A review by Bote-Curiel et al. (2019) first presented an overview of big data and DL (two related fields of data science) in light of their applications in the healthcare domain; in a two-fold review, the authors then surveyed applications of DL in healthcare from biomedical information, with emphasis on ECG. They searched the PubMed, IEEE Xplore, Google Scholar and Science Direct electronic databases, covering papers published in 2017 and 2018. A recent review paper by Rim et al. (2020) conducted an extensive survey of DL on physiological signal data such as EMG, ECG, EEG and EOG. They identified 147 papers published between 2018 and 2019 inclusive, searched from the PubMed database, and extracted parameters such as the input data type, task model, training architecture and dataset sources of the DL approaches, with the objective of comprehending, categorizing and comparing performance as applied in physiological signal analysis for various medical applications.

5 Systematic literature review

To ensure the advancement of knowledge, a literature review is necessary to harmonize the body of knowledge, with the aim of understanding, summarizing, analyzing and synthesizing a group of related literature (Xiao and Watson 2019). This involves evaluating, analyzing, criticizing and/or identifying missing links or research gaps. The Systematic Literature Review (SLR) utilizes the evidence-based practices of the evidence-based software engineering paradigm, which help in a rigorous understanding of the problem domain (Babar et al. 2014). This review adopted the SLR guidelines proposed by Kitchenham and Charters (Kitchenham and Charters 2007). Carrying out an SLR involves three major phases, namely (1) planning the review, (2) conducting the review and (3) reporting the review.

5.1 Planning the review

Planning an SLR starts with defining the protocol, a plan that details the procedure and process of carrying out the review. It is pertinent that the review protocol be meticulously checked before executing it (Brereton et al. 2007). This review protocol was pilot tested by the student and validated by the supervisors. The review protocol comprises (a) the research questions; (b) the search strategy; (c) the inclusion and exclusion criteria (study selection criteria); and (d) the data extraction and synthesis of results.

5.1.1 Research questions (RQ)

The following questions were formulated to guide the SLR study:

RQ1: In what domains have DL applications to ECG signals been presented?

RQ2: What are the DL techniques that have been applied for ECG signal analysis?

RQ3: In what application areas were the proposed DL models presented?

RQ4: What are the application tasks performed by the proposed DL models?

RQ5: What are the sources of datasets utilized in the studies to model the DL?

RQ6a: What ECG preprocessing methods and training architectures were used?

RQ6b: Which of the training architectures produced the best performance?

5.1.2 Search Strategy

The search strategy defines how materials for the review were found, including the channels for the literature search, the keywords used, the sampling strategy, the stopping rule and other restrictions (Xiao and Watson 2019). This review searched electronic databases for published papers that applied DL to ECG signals. As suggested by Brereton et al. (Brereton et al. 2007), we selected 8 electronic databases for our literature search: IEEE Xplore digital library, ACM digital library, Science Direct, Springer Link, DBLP, PubMed, and two interdisciplinary research databases, Scopus and Web of Science (WoS). Selective and representative sampling was adopted in this review, covering only peer-reviewed studies on the application of DL to ECG. Only papers written in English were included. The search covered January 1st, 2010 to April 30th, 2020 (the date the search was conducted) inclusive. The following keywords were used: “deep learning”, “deep neural network”, “deep neural networks”, “convolutional neural network”, “Electrocardiogram”, “ECG”, “EKG” (S. Hong et al. 2020) and “deep learning electrocardiogram ECG” (Rim et al. 2020). Boolean operators were used to concatenate the keywords: the “OR” operator joins synonymous keywords, while the “AND” operator combines the main terms in the search string (Brereton et al. 2007). We used Boolean operators because most of the databases support them (Xiao and Watson 2019). These keywords were integrated to form the search string shown below.

(“deep learning” OR “deep neural network” OR “deep neural networks”) AND (“Electrocardiogram” OR “ECG” OR “EKG”) AND “deep learning electrocardiogram ECG”

As validation, the initial search string was tested on two databases, Science Direct and IEEE Xplore; the string fetched relevant primary studies, both already known and previously unknown. In some cases, the search string had to be modified to suit a particular database.
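To illustrate the Boolean concatenation described above, the following minimal Python sketch (a hypothetical helper, not part of the review protocol) assembles a search string of this form from the keyword groups:

# Hypothetical illustration of assembling the Boolean search string.
dl_terms = ['"deep learning"', '"deep neural network"', '"deep neural networks"']
ecg_terms = ['"Electrocardiogram"', '"ECG"', '"EKG"']

# "OR" joins synonymous keywords; "AND" combines the main term groups.
query = "({}) AND ({})".format(" OR ".join(dl_terms), " OR ".join(ecg_terms))
print(query)
# ("deep learning" OR "deep neural network" OR "deep neural networks")
#   AND ("Electrocardiogram" OR "ECG" OR "EKG")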

5.1.3 Study Selection Criteria

In view of the research questions and the SLR objectives, the following inclusion and exclusion criteria were defined and applied to the retrieved published papers.

A. Inclusion criteria

Primary studies that meet all of the following were included:

  • Study that presents evidence of the use of a DL architecture to model ECG; AND

  • Study based on empirical evidence; AND

  • Study reported in a peer-reviewed workshop, conference, or journal; AND

  • Study written in the English language; AND

  • Study published between January 1st, 2010 and April 30th, 2020, inclusive.

B. Exclusion criteria

A primary study meeting any of the following was excluded:

  • Study on a deep model architecture without evidence of application to ECG; OR

  • Study not accessible electronically; OR

  • Study that is incomplete (in content or results); OR

  • Encyclopedias, posters, books, book chapters, keynotes, and editorials; OR

  • Study that is a duplicate (if two versions of a paper were found, the less complete version was excluded).

C. Quality Assessment (QA)

The QA was applied to the papers that passed the inclusion criteria. This is the final stage in preparing the selected papers for the data extraction and analysis stage (Xiao & Watson, 2019). The full texts of the papers were read and checked against the QA checklist presented in Table 1.

Table 1 QA Checklist

The QA checklist was formulated based on the research questions and as suggested by Kitchenham and Charters (Kitchenham & Charters, 2007). Each criterion is scored on the following scale: Yes (Y) = 1 point, No (N) = 0 points, and Partial (P) = 0.5 points. The scores were summed to determine whether a paper was fit for inclusion in the data extraction and analysis stage; the score x for each study lies in the range 0 ≤ x ≤ 10 points. The cutoff point was computed as the quartile of the maximum score (10/4 = 2.5), meaning that only studies scoring x > 2.5 were included in our final list of primary studies. In the end, 150 papers passed the threshold and were included for data extraction and analysis (see Fig. 16).
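As a minimal sketch of the scoring rule just described (the checklist answers below are hypothetical), the computation can be expressed as:

# QA scoring: Y = 1, N = 0, P = 0.5 per checklist item; cutoff = 10/4.
POINTS = {"Y": 1.0, "N": 0.0, "P": 0.5}
CUTOFF = 10 / 4  # quartile of the 10-point maximum, i.e., 2.5

def qa_score(answers):
    """Sum the points for one paper's ten checklist answers."""
    return sum(POINTS[a] for a in answers)

# Hypothetical paper: scores 6.5 > 2.5, so it is retained.
answers = ["Y", "Y", "P", "N", "Y", "P", "Y", "N", "Y", "P"]
print(qa_score(answers), qa_score(answers) > CUTOFF)  # 6.5 True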

5.1.4 Strategies for data extraction and synthesis

To aid data extraction and synthesis, a form was designed to capture the publication details (title, authors, and year), with further fields for the application domain, ECG preprocessing method, application task, DL model, DL model performance, training architecture, datasets, and number of subjects/records used. These fields correspond to the research questions and consequently helped in answering them.
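One way to picture the form is as a simple record structure; in the Python sketch below, the field names paraphrase the form described above and are not the authors' exact schema:

from dataclasses import dataclass

@dataclass
class ExtractionRecord:
    """One row of the data extraction form (field names are illustrative)."""
    title: str
    authors: str
    year: int
    application_domain: str    # RQ1, RQ3
    preprocessing_method: str  # RQ6a
    application_task: str      # RQ4
    dl_model: str              # RQ2
    performance: str           # RQ6b
    training_architecture: str # RQ6a, RQ6b
    datasets: str              # RQ5
    num_subjects: int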

5.2 Conducting the review

This section reports how the search for papers for this SLR was conducted. The search strings were applied to the databases to obtain potential primary studies for inclusion. Figure 16 illustrates the stages followed to obtain the final set of papers.

Fig. 16: Literature retrieval, selection and evaluation for inclusion processes

The selection of papers from all identified search results (n = 2886), in light of the inclusion and exclusion criteria, involved two phases. In the first phase, the title, keywords, and abstract of each paper were read. To avoid researcher bias, two researchers were involved: the selection was performed by the first author and checked by the supervisor. At the end of this screening, n = 873 papers were retained. The screened papers were then checked for duplicates, leaving n = 492 papers for the next phase. In the second phase, the first author read the full text of each remaining paper to assess eligibility; discrepancies in the assessment results were discussed between the first author and the supervisor and resolved accordingly. Finally, n = 150 papers were included for data extraction, including 5 papers added through manual search (see Fig. 16).

Figure 17(a) shows a steady rise in the number of papers, with 2019 having the highest count at 55 (36.7%), followed by 2018 (47, 31.3%) and 2017 (23, 15.3%). In 2020, 17 (11.2%) papers were published (the low count for 2020 reflects the fact that the search was conducted in April 2020). There were 6 (4%) papers from 2016, and 1 (0.8%) each from 2015 and 2014. Among the included papers, none reported an application of DL to ECG signals between 2010 and 2013, likely because DL was not yet in popular use, as illustrated by the Google Trends data in Fig. 1.

Figure 17(b) shows the venues in which the papers were published, categorized as conferences, journals, and workshops or symposiums: 58 (38.7%) papers were published in conferences, 89 (59.3%) in journals, and 3 (2%) in workshops.

Fig. 17: (a) Distribution of papers across years (2010–2020); (b) type of publication

5.3 Reporting findings

This section reports the results of the SLR: the findings extracted from the papers are presented and the research questions formulated in Section 5.1.1 are answered.

5.3.1 Application of DL in ECG signals

In this section, we categorize the applications of DL to ECG data by application domain.

RQ1: In what domains are DL applications in ECG signals presented?

Research question 1 seeks to identify the domains of DL application to ECG data as reported in the included papers. Based on the systematic review, the applications of DL to ECG signals fall into three domains: the medical/healthcare domain, the biometric/security domain, and the driving domain.

5.3.1.1 Medical/Healthcare domain

DL has become an active research area, vital in many medical and healthcare applications for enhancing the quality of diagnosis. The major role of healthcare personnel is to diagnose disease and find the most suitable treatment, a task that has carried challenges and responsibilities for healthcare practitioners for years. A medical doctor is a professional who makes intelligent decisions based on symptoms and test results; to achieve good decisions, some sort of knowledge is necessary (Faust, Hagiwara, et al., 2018). The ECG plays an essential role in diagnosing and screening CVDs, helping medical doctors and cardiologists make more informed decisions. DL is being applied to physiological signals to discover hidden relationships that help medical practitioners and healthcare providers make fast, informed decisions and predict numerous clinical events (Yang, Islam, & Li, 2018).

DL has shown the potential to improve the accuracy of CVD diagnosis, for example in arrhythmia detection from ECG signals (Ding et al., 2019). A discussion of the applications of DL in cardiology is presented in (Bizopoulos & Koutsouris, 2018), and a review by Kusuma and Udayan (Kusuma & Udayan, 2020) reveals that DL is an attractive approach for CVD diagnosis. This section summarizes the papers found to apply DL to ECG modeling in medical and healthcare applications, with emphasis on the application models, application tasks, data sources, performances achieved by the proposed DL models, and limitations of the studies.

I. Deep Neural Networks

This section discusses DNN-based models proposed in the literature for automatic classification of ECG signals.
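As context for the studies surveyed below, a minimal Keras sketch of a fully connected DNN beat classifier is given here; the input length, layer sizes, and five-class output are our illustrative assumptions, not the architecture of any cited paper:

import tensorflow as tf

# Illustrative DNN: a fixed-length ECG beat (assumed 180 samples)
# mapped to one of five beat classes. All sizes are assumptions.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(180,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(5, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(X_train, y_train, epochs=10, validation_data=(X_val, y_val))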

Arrhythmia detection using a Genetic Algorithm (GA) and Particle Swarm Optimization (PSO) as feature extractors from ECG signals, with classification performed by a DL model, was proposed in (Assodiky et al., 2017). Using 452 data samples from the UCI-ML Repository, the results indicated that PSO yielded the highest accuracy, 76.51%; PSO also reduced the 261 features to 23 attributes, whereas GA reduced them to 31. However, the normal ECG class was larger than the other classes. Studies in (Jeon, Chae, Han, & Lee, 2019) and (Jun, Park, Minh, Kim, & Kim, 2016) proposed DNN models for arrhythmia classification and premature ventricular contraction (PVC) beat detection, respectively, reporting accuracies of 98.07% and 99.41% on the MIT-BIH Arrhythmia Database (MITDB); however, the datasets for training and testing the models were limited. In a study by Sannino and De Pietro (Sannino & De Pietro, 2018), the authors proposed a DNN model that utilized temporal features to classify ECG signals, obtaining a good accuracy of 99.68%, sensitivity of 99.48%, and specificity of 99.83% on the MITDB, though considerable expert involvement was required in the ECG beat annotation process. Arrhythmia classification was performed with a DNN model proposed by Xu et al. (shensheng Xu, Mak, & Cheung, 2017); the study investigated the effect of preprocessing methods, and the Fisher discriminant ratio (FDR) produced the best performance, with 82.96% accuracy on the UCI cardiac arrhythmia dataset, though the data were limited and unbalanced. In a study by Xu et al. (S. S. Xu, Mak, & Cheung, 2018), an end-to-end DNN model to classify ECG beats was proposed and evaluated with and without expert intervention; better results were achieved without expert intervention on the S and V beat classes of the MITDB, but limited and imbalanced data remained a problem.

Table 2 Summary of DL applications in ECG using DNNs

Note: Acc = Accuracy, Spe = Specificity, Sen = Sensitivity, Pre = Precision, Ppr = Positive predictivity

A study by Cai et al. (Cai et al., 2020) proposed a model based on deep densely connected neural networks (DDNN) to detect Atrial Fibrillation (AF). Only the AF class was considered, owing to insufficient data in the other classes; the accuracy, specificity, and sensitivity all exceeded 99% using three 12-lead ECG data sources (the China Physiological Signal Challenge 2018, the Chinese PLA General Hospital, and wearable ECG devices from CardioCloud Medical Technology (Beijing) Co. Ltd) with 11,994 unique subjects. Prenatal detection of Congenital Heart Disease (CHD) was proposed by Vullings (Vullings, 2019). The study used a DNN model to detect CHD with 76% accuracy using a private dataset containing fetal ECG measurements from 266 healthy and 120 CHD subjects; the performance may improve with a larger population or dataset. Table 2 presents the summary of the studies on DNNs for ECG signal processing.

II. Convolutional Neural Networks

In this section, we discuss CNN-based models proposed for ECG classification. CNNs have shown promising performance, especially in image recognition (Kulik & Shtanko, 2020) and computer vision (Khan et al., 2018; Voulodimos et al., 2018), and are considered the most widely used DL architecture for medical time series and imaging data (Pourbabaee et al., 2018; Shi, Wang, Qin, Zhao, & Liu, 2020). Table 3 summarizes the applications of CNN-based models for ECG heartbeat signal analysis.
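For reference, a minimal 1D-CNN of the kind summarized below can be sketched in Keras as follows; the segment length, filter counts, and class count are illustrative assumptions:

import tensorflow as tf

# Illustrative 1D-CNN: a 2 s ECG segment at 360 Hz (720 samples, 1 channel)
# classified into five beat classes. All hyperparameters are assumptions.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(720, 1)),
    tf.keras.layers.Conv1D(16, kernel_size=5, activation="relu"),
    tf.keras.layers.MaxPooling1D(2),
    tf.keras.layers.Conv1D(32, kernel_size=5, activation="relu"),
    tf.keras.layers.MaxPooling1D(2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(5, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])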

A study by (Acharya, Fujita, Lih, Hagiwara, et al., 2017) proposed a CNN model to detect different arrhythmias. The study proposed two CNN architectures (Net A and Net B) with 500 and 1250 input samples, respectively. The performance of both proposed CNNs in terms of accuracy, specificity, and sensitivity exceeded 90%, with the exception of Net B, which achieved 81.44% accuracy, using the Creighton University Ventricular Tachyarrhythmia database (CUDB), the MIT-BIH Atrial Fibrillation database (AFDB), and the MIT-BIH Arrhythmia database (MITDB). The architectures demonstrated good performance with 2 s and 5 s ECG segments, respectively. However, training data were limited and training time was long; data augmentation and bagging could mitigate the limited data problem. The classification of shockable and non-shockable life-threatening ventricular arrhythmias using 2 s ECG segments with a CNN architecture was proposed by (Acharya et al. 2018). For immediate treatment of shockable ventricular arrhythmias, cardiopulmonary resuscitation (CPR) and defibrillation are used and highly recommended; to improve the efficiency of automated external defibrillators (AEDs), accurate diagnosis of shockable and non-shockable ventricular arrhythmias is necessary. The CNN model achieved 93.18% accuracy evaluated on the MITDB, VFDB, and CUDB, though limited data and long training times affected performance. In a study by (Acharya, Oh, et al., 2017), a CNN model to classify heartbeats obtained an accuracy of 94.03% (Set B) on the MITDB. A study by (Al-Huseiny et al. 2020) proposed a 2D-CNN model for arrhythmia recognition; the ECG signals were first converted into 2D ECG images that served as input to the training model, which achieved an accuracy of 96.69% on the MITDB.

There are CNN-based models that have been trained on large-scale data; such models can be reused with little or no tuning on either the same (source) data or a different (target) data type. They are called pre-trained models (and underpin transfer learning) because they have already been trained on large-scale data (Ghaffari and Madani 2019), and they are used largely when the data and hardware available for training on the target data are insufficient (Byeon et al. 2020). A study in (Alquran et al. 2019) utilized two pre-trained CNN models, GoogLeNet and AlexNet, to classify ECG signals. Higher-order spectral estimation was first applied to the ECG signals to extract features, which were fed into the pre-trained models for classification; an accuracy of 97.8% was achieved using third cumulants + GoogLeNet on the MITDB. Another pre-trained CNN, VGG-Net, was proposed by (Mohamad M Al Rahhal, Bazi, Al Zuair, Othman, & BenJdira, 2018) for end-to-end classification of ECG signals; good performance was recorded on ventricular ectopic beats (VEB) and supraventricular ectopic beats (SVEB) evaluated on the MITDB. (Amrani et al. 2018) proposed a very deep CNN model as a feature extractor using a fusion technique called multi-canonical correlation analysis (MCCA), with the extracted features classified by a Q-Gaussian multi-class SVM (QG-MSVM). The fusion technique outperformed the compared methods with 97.37% accuracy for arrhythmia detection, although it may introduce additional training time. A study in (Brito et al. 2019) proposed a pre-trained CNN model, ResNet, optimized using stochastic gradient descent (SGD) and adaptive moment estimation (Adam) with a 0.1 learning rate; the experiments show that SGD attained a higher accuracy of 96%, compared with 83% for Adam. However, the data imbalance could be investigated further to improve performance. The effect of adding a convolutional block to a baseline CNN of three convolutional layers (network A), yielding a multi-scale fusion CNN (network B), was proposed in (Dang, Sun, Zhang, Zhou, et al., 2019); the proposed network B achieved average accuracy, sensitivity, and specificity of 95.48%, 96.53%, and 87.74%, respectively, on the MITDB. A study by (Diker and Engin 2019) proposed a model combining a CNN and an Extreme Learning Machine (ELM) for ECG classification. Although the accuracy was below 90% (88.33%) on the Physikalisch-Technische Bundesanstalt Diagnostic database (PTBDB), it was better than traditional models such as K-NN, decision trees, and SVM. A study by (Dokur and Ölmez 2020) proposed a heartbeat classification model based on a CNN without the fully connected (FCNN) part of the basic CNN model; Walsh functions were applied to maintain performance during training, and the drawbacks of converting one-dimensional (1D) ECG signals to two-dimensional (2D) images were investigated. Average accuracies of 99.45% and 98.7% for 1D ECG signals and 2D ECG images, respectively, were achieved on the MITDB. A study by Fujita and Cimr (Fujita and Cimr 2019) proposed ECG rhythm recognition based on a CNN: the model first used the continuous wavelet transform (CWT) to extract features, which were fed into the CNN architecture for automatic classification of ECG rhythms, achieving an accuracy of 97.78%.

However, knowledge of the features is required before training the model. A multi-channel CNN model was proposed by Hao et al. (Hao et al. 2019) for ECG beat classification using both beat-to-beat and single-beat information. The ECG signals were first converted using the short-time Fourier transform (STFT) and the wavelet transform to obtain spectro-temporal images, which were fed into the model for training; the model obtained comparable detection performance in terms of sensitivity and positive predictive value (PPV). Also, a study by Huang et al. (Huang et al. 2019) proposed a 2D-CNN model trained on ECG images transformed using the STFT; simulated on the MITDB, it obtained 99.00% accuracy for the ECG classification task, outperforming a 1D-CNN.
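The spectro-temporal transformation these studies rely on can be sketched with SciPy as follows; the sampling rate, window length, and overlap are illustrative assumptions:

import numpy as np
from scipy.signal import stft

fs = 360                         # assumed sampling rate in Hz
ecg = np.random.randn(fs * 10)   # stand-in for a 10 s ECG segment

# The STFT turns the 1D signal into a 2D time-frequency image that a
# 2D-CNN can consume; nperseg and noverlap are assumed values.
f, t, Z = stft(ecg, fs=fs, nperseg=128, noverlap=64)
spectrogram = np.abs(Z)          # magnitude spectrogram, shape (freq, time)
print(spectrogram.shape)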

A cardiac arrhythmia detection study was conducted by Isin and Ozdalili (Isin and Ozdalili 2017) using a pre-trained CNN model (AlexNet) as a feature extractor, with classification by a back-propagation neural network (BPNN); a recognition rate of 98.51% was achieved on the MITDB. A 2D-CNN model was proposed by Izci et al. (Izci et al. 2019) for arrhythmia detection, achieving 97.42% accuracy on the MITDB. A deep residual CNN model was proposed by Kachuee et al. (Kachuee et al. 2018) for arrhythmia classification and myocardial infarction (MI) prediction; it achieved 93.4% accuracy for arrhythmia classification and 95.9% accuracy for MI prediction on the PTBDB. A CNN model was proposed by Kaouter et al. (Kaouter et al. 2019) for ECG classification. Compared with an ensemble of fine-tuned CNNs, VGG Net-16, ResNet-50, and a fully trained CNN, a 144-layer GoogLeNet produced the best accuracy, 93.75%, evaluated on the MITDB, the MIT-BIH Normal Sinus Rhythm database (NSRDB), and the Beth Israel Deaconess Medical Center (BIDMC) Congestive Heart Failure database; however, the diagnostic value of the model could be improved by including patient parameters other than ECG signals. A real-time ECG classification task was proposed by Kiranyaz (Kiranyaz et al. 2015); the model was evaluated on the MITDB and obtained good performance on VEB and SVEB. However, characterization of crucial beats such as the S beat was limited, and the proposed model does not support learning.
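The fixed-feature-extractor pattern used in several of these studies can be sketched as below. We use Keras' bundled ImageNet-pretrained ResNet50 purely for illustration (the cited papers used AlexNet, which Keras does not ship), and the five-class head is an assumption:

import numpy as np
import tensorflow as tf

# Illustrative transfer learning: a frozen, ImageNet-pretrained CNN
# produces features from ECG spectrogram "images" for a small classifier.
base = tf.keras.applications.ResNet50(weights="imagenet", include_top=False,
                                      pooling="avg", input_shape=(224, 224, 3))
base.trainable = False  # use the network as a fixed feature extractor

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(5, activation="softmax"),  # assumed 5 classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Spectrograms must be resized/tiled to the 3-channel input the base expects.
dummy = np.random.rand(2, 224, 224, 3).astype("float32")
print(model.predict(dummy).shape)  # (2, 5)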

Also, (Li et al. 2017) conducted an ECG classification study using a 1D-CNN; the model achieved 97.5% accuracy on the MITDB, and more ECG leads could be investigated for better detection. A pre-trained model (ResNet-31) was proposed by Li et al. (Li et al. 2020) for ECG classification and tested on 2-lead and single-lead ECG data. The 2-lead data achieved the best performance, with 99.38% accuracy versus 99.06% for single-lead data, although training took longer to converge on the 2-lead dataset. A study by Pandey and Janghel (Pandey and Janghel 2019) proposed a CNN-based model for arrhythmia detection; because of the class imbalance in the MITDB, the Synthetic Minority Oversampling Technique (SMOTE) was used to balance the classes, and the model achieved good performance, with 98.30% accuracy, 86.06% precision, 95.51% recall, and an 89.87% F1-score. A study by Pyakillya et al. (Pyakillya et al. 2017) proposed a 1D-CNN model for ECG signal classification, evaluated on the PhysioNet/Computing in Cardiology Challenge 2017 (PhysioNet/CinC 2017) dataset with an accuracy of 86%; the dataset was imbalanced, and the authors proposed using GANs to address the imbalance. A CNN model proposed by Rajkumar et al. (Rajkumar et al. 2019) for arrhythmia classification obtained 93.6% accuracy on the MITDB. A recent study by Ribeiro et al. (Ribeiro et al. 2020) proposed a residual network (ResNet) for effective ECG diagnosis using a private dataset of 2,322,513 ECG records collected from 1,676,384 patients; using short-duration, standard 12-lead ECGs, the model achieved an F1-score greater than 80% and specificity greater than 99%. ECG classification using a deep CNN and a two-stage deep CNN was proposed by Shaker et al. (Shaker et al. 2020), who also demonstrated the effectiveness of heartbeat augmentation using GANs, which yielded better performance than the unbalanced data. The models achieved overall accuracy > 98.0%, precision > 90.0%, specificity > 97.4%, and sensitivity > 97.7%; post-processing such as smoothing filters could enhance heartbeat quality, and outlier removal could improve precision. A CNN model proposed by Xu and Liu (Xu and Liu 2020) achieved an average accuracy of 99.43% on the SVEB and VEB beats of the MITDB. A Multi-Scale CNN (MCNN) was proposed by Yao and Chen (Z. Yao & Chen, 2018); although the study used single-lead ECG as input without rhythm information, the model outperformed methods using hand-crafted features, with 88.66% overall accuracy on SVEB and VEB. Landmark information could be utilized to improve the model's performance.
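The SMOTE balancing step mentioned for Pandey and Janghel can be sketched with the imbalanced-learn library; the feature matrix and labels below are placeholders:

import numpy as np
from imblearn.over_sampling import SMOTE

# Placeholder beat features: 1000 majority-class and 50 minority-class beats.
X = np.random.randn(1050, 180)
y = np.array([0] * 1000 + [1] * 50)

# SMOTE synthesizes new minority-class samples by interpolating between
# nearest neighbours, balancing the classes before the CNN is trained.
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X, y)
print(np.bincount(y_bal))  # [1000 1000]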

A study by Yıldırım et al. (Yıldırım et al. 2018) used a 1D-CNN for arrhythmia detection; the model classified the MITDB ECGs with 91.33% accuracy, 83.91% sensitivity, and 99.41% specificity. A CNN model proposed by Yu et al. (Zhang et al. 2018) for ECG classification achieved 98.92% accuracy, 98.37% sensitivity, and 99.19% specificity on the MITDB. A CNN-based architecture was utilized for feature extraction in a study by Zhou and Tan (Zhou and Tan 2020), with an extreme learning machine (ELM) for classification, achieving 98.77% accuracy on the MITDB. A study by Li et al. (Li et al. 2018) proposed a novel approach that fuses the morphology and rhythm of heartbeats as input to a CNN model for ECG classification. The authors used a one-hot encoding technique to convert the 1D ECG signals into 2D images to speed up convergence and improve accuracy. Tested on the MITDB, the model achieved more than 90% accuracy on both SVEB and VEB; however, the specificity and accuracy on the V beat were lower than those of the compared methods because raw ECG was used without representation extraction. A study by Takalo-Mattila et al. (Takalo-Mattila et al. 2018) proposed a 1D-CNN model to classify ECG, tested on the MITDB with competitive performance on the Normal, SVEB, and VEB classes.

Automatic classification of heartbeat diseases such as myocardial infarction disease (MID), anterior myocardial infarction (AMI), congestive heart failure (CHF), and atrial fibrillation (AF) has been performed with CNN-based models. MID, also known as a heart attack, occurs when heart muscle is damaged by decreased blood flow (Yufei Chen, Chen, He, Yang, & Cao, 2018). A study by Acharya et al. (Acharya, Fujita, Oh, et al., 2017) proposed a CNN model for automatic MI detection. The study demonstrated the effect of noise by removing baseline wander from the ECG signals in one dataset and keeping the noise in the other. With the noisy dataset, the performance was 93.53% accuracy, 93.71% sensitivity, and 92.83% specificity; the de-noised dataset performed better, with 95.22% accuracy, 95.49% sensitivity, and 94.19% specificity, all evaluated on the PTBDB. Another study by Acharya et al. (Acharya et al. 2019) proposed a CNN model for CHF diagnosis, evaluated on combinations of databases: Set A (NSRDB, BIDMC), Set B (Fantasia, BIDMC), Set C (NSRDB, BIDMC), and Set D (Fantasia, BIDMC); the CNN achieved accuracy, sensitivity, and specificity greater than 90% in each scenario. A study by Ahmed et al. (Ahmed et al. 2019) proposed a CNN model to predict MID, optimized using Ant Colony Optimization (CNN-ACO); an accuracy of 95.78% was recorded on the UCI-ML Repository, though the CNN-ACO model consumed more memory than the basic CNN, likely because of the additional ACO step. Automatic MI detection was proposed by Alghamdi et al. (Alghamdi et al. 2019) using a pre-trained deep CNN (VGG-Net) in two configurations: VGG-MI1, the VGG-Net model with little fine-tuning, and VGG-MI2, VGG-Net used as a fixed feature extractor with QG-MSVM as the classifier, which improved accuracy by 2%. VGG-MI2 obtained the best accuracy of 99.22% on the PTBDB. A study by Baloglu et al. (Baloglu et al. 2019) used a deep CNN model to classify MI, achieving 99.78% accuracy on the PTBDB; however, the model did not localize the MI.

A Multi-Channel Lightweight CNN (MCL-CNN) was proposed by Chen et al. (Chen et al. 2018); tested on the PTBDB, it achieved 96.18% accuracy for detecting anterior MI (AMI). A study by Erdenebayar et al. (Erdenebayar et al. 2019) proposed a 1D-CNN model for automatic AF prediction; 98.7% accuracy, 98.6% sensitivity, and 98.7% specificity were recorded on the AFDB, the Paroxysmal Atrial Fibrillation Prediction Challenge Database (PAF-DB), and the NSRDB. However, the model could not detect the starting point of AF, and only potential AF subjects were analyzed. An AF identification study was put forward by Ghaffari and Madani (Ghaffari and Madani 2019) using pre-trained CNN models (AlexNet, VGG-16, ResNet-152, and AlexNet trained from scratch) and a 1D-CNN as feature extractors, with MLP and SVM as classifiers. The AlexNet + MLP model achieved the best performance, with accuracy of 87.6% (87.9%), sensitivity of 81.1% (85.7%), and specificity of 94.3% (92.7%) (noisy classes in parentheses), evaluated on the PhysioNet/CinC Challenge 2017. Another study, by Hsieh et al. (Hsieh et al. 2020), proposed a 1D-CNN for AF diagnosis, tested on the PhysioNet/CinC Challenge 2017 with 90.7% accuracy for AF detection. Classification of ECG signals into AF and normal sinus rhythm was proposed by Huang and Wu (M.-L. Huang and Wu 2020) using a 2D-CNN trained on the AFDB and NSRDB; evaluated on filtered ECG signals (experiment 1) and non-filtered ECG signals (experiment 2), experiment 1 achieved the best performance, with 99.23% accuracy, 99.71% sensitivity, and 98.66% specificity. A study by Kamaleswaran et al. (Kamaleswaran et al. 2018) proposed a deep CNN model for AF detection, achieving 85.99% accuracy and an F1-score of 0.83 on the PhysioNet/CinC Challenge 2017. A study by Li et al. (Li et al. 2018) proposed a CNN model for feature extraction with an MLP for classification, achieving 93.14% accuracy, 83.5% sensitivity, and 95.99% specificity. Because no public dataset with detailed postoperative information for AF patients was available at the time, the authors acquired their own dataset from 14 AF patients using a 128-lead BSPM system at West China Hospital. Also, (Li et al. 2019) combined a CNN and an SVM for AF classification: the SVM classified the ECG signals using features extracted by the CNN architecture. Although the model achieved 96% accuracy, 88% sensitivity, and 96% specificity, the dataset used was very small.

An MI detection and localization study was proposed by Liu et al. (Liu et al. 2018), who introduced a novel Multiple-Feature-Branch CNN (MFB-CNN) in which each branch corresponds to a particular lead. Tested on the PTBDB, the class-based scheme achieved 99.95% accuracy on detection and 99.81% on localization, while the patient-specific scheme achieved 98.79% on detection and 94.82% on localization. A study on fetal ECG detection was proposed by Lo and Tsai (Lo and Tsai 2018); with non-invasive abdominal ECG recording, fetal ECG (fECG) is easier to monitor. An STFT was first applied to the fECG signals to obtain a time-frequency representation, which was fed into a 2D-CNN for automatic detection; on abdominal ECG recordings from the PhysioNet database, the model achieved 92.65% accuracy. A combination of a CNN and conventional classifiers (KNN, MLP, and SVM) was proposed by Pourbabaee et al. (Pourbabaee et al. 2018) for screening paroxysmal atrial fibrillation (PAF); using the PAF Prediction Challenge database, the CNN + KNN combination achieved the best performance. AF classification was proposed by Qayyum et al. (Qayyum et al. 2018) using pre-trained models (AlexNet, VGG16, VGG19, GoogLeNet, and ResNet-101) as feature extractors combined with an ensemble classifier and SVM; ResNet-101, which was instead fine-tuned and used as an end-to-end classifier, achieved the best performance, with 97.89% accuracy, 97.12% sensitivity, and 96.99% specificity on the PhysioNet/CinC Challenge 2017 dataset. DenseNet, a pre-trained CNN-based model, was proposed by Rubin et al. (Rubin et al. 2018) for AF detection, obtaining an F1-score of 0.82 on the PhysioNet/CinC Challenge 2017 dataset. A study by Xia et al. (Xia et al. 2018) proposed CNN models for end-to-end AF classification, receiving ECG segments transformed by the STFT and the stationary wavelet transform (SWT), respectively. Evaluated on the AFDB, the STFT-CNN achieved 98.34% sensitivity, 98.24% specificity, and 98.29% accuracy, while the SWT-CNN achieved 98.79% sensitivity, 97.87% specificity, and 98.63% accuracy. AF detection was studied by Xiong et al. (Xiong et al. 2017) using a CNN model compared against an RNN and spectrogram learning; the proposed CNN outperformed the compared algorithms with 82% overall accuracy, though imbalanced ECG data and varying segment lengths affected performance.

Health challenges associated with sleep disorders have also been investigated, among them obstructive sleep apnoea (OSA), also called sleep apnoea syndrome (SAS). Polysomnography (PSG) is the standard method for sleep apnoea diagnosis; however, because of the cost and time it involves, recent studies have moved toward classifying sleep apnoea from ECG signals. Sleep apnoea classification based on a CNN model was proposed by Dey et al. (Dey et al. 2018); using the Apnea-ECG dataset from PhysioNet, the model achieved 98.91% accuracy, though expert observation is mostly done offline. A study by Urtnasan et al. (Urtnasan et al. 2018) applied a CNN model to multiclass classification of obstructive sleep apnea/hypopnea, with ECG signals measured by a single-lead transducer at lead II during nocturnal PSG; the test results showed 90.8% accuracy, 87.0% sensitivity, and 87.0% specificity, with precision, recall, and F1-score of 87.0. In another study, OSA detection was proposed by Urtnasan et al. (Urtnasan et al. 2018) using a CNN model, producing accuracy, sensitivity, and specificity of 96.0% as well as precision, recall, and F1-score of 0.99. However, these studies did not provide cause-based classification of sleep apnea into central and mixed apnea; the ECG recordings were not completely independent of other events such as snoring, movement, and airflow; and only one specialist performed the annotation and segmentation, without cross-checking of events. Another study applied a CNN model to the classification of Coronary Artery Disease (CAD) (Acharya, Fujita, Lih, Adam, et al., 2017). The authors divided the data into 2 s ECG segments (Net A) and 5 s segments (Net B) to classify two ECG classes (normal and CAD); the two proposed CNN structures achieved 95% (Net A) and 95.1% (Net B) accuracy, evaluated on the Fantasia database and the St. Petersburg Institute of Cardiological Technics database.

The classification of stress into different levels has been a challenging task. A study by Giannakakis et al. (Giannakakis et al. 2019) proposed stress recognition using a 1D Deep Wide CNN (DWNet1D) model, utilizing 4 different stressors in a bid to present different stress scenarios. The dataset was collected during a data acquisition campaign (SRD'15) from 24 subjects, and the model achieved 99.1% accuracy. However, the study used limited data, and inter-subject variability could affect the model's performance. The summary of the studies that apply CNNs to ECG is presented in Table 3.

Table 3 Summary of DL application in ECG using CNNs

III. Recurrent Neural Networks

Applications of RNN-based models to ECG have been proposed in the literature. A study presented in (Banerjee et al. 2019) proposed an RNN-based model combining two LSTM networks, whose output was merged with hand-crafted ECG features to classify AF. The proposed model obtained sensitivity of 0.93, specificity of 0.98, and an F1-score of 0.89, though it relied on hand-crafted features. MI classification was proposed by Darmawahyuni and Nurmaini (Darmawahyuni and Nurmaini 2019) using an LSTM model for binary MI classification; the model achieved precision of 0.91, sensitivity of 0.91, an F1-score of 0.90, and a BAcc of 0.83 on the PTBDB. In another study, Darmawahyuni et al. (Darmawahyuni et al. 2019) proposed RNN, LSTM, and GRU models for MI classification; among them, the LSTM produced the best results, with 98.49% sensitivity, 97.97% specificity, 95.67% precision, 96.32% F1-score, 97.56% BAcc, and 95.32% MCC on the PTBDB.

Detection of AF using a bidirectional LSTM (bi-LSTM) was proposed by Faust et al. (Faust, Shenfield, et al., 2018); the model yielded accuracy, specificity, and sensitivity greater than 98% for both cross-validation and blind validation. A study by Sujadevi et al. (Sujadevi et al. 2017) proposed RNN, LSTM, and GRU models for AF detection; without any preprocessing, the models achieved accuracies of 0.950, 1.000, and 1.000, respectively, on the AFDB and NSRDB from PhysioNet. An LSTM model was proposed by Chang et al. (Chang et al. 2020) for the detection and classification of cardiac arrhythmias. The model was competitive with cardiologists, emergency physicians, and internal-medicine doctors, achieving 90% accuracy on ECG signals collected at the China Medical University Hospital (CMUH) with a GE Marquette MAC 5500; however, the ST-T change, which is important for diagnosing acute myocardial infarction, was not detected. A study by Deshmane and Madhe (Deshmane and Madhe 2018) proposed an LSTM model for detection, recording precision of 0.91, sensitivity of 0.91, and an F1-score of 0.90; the performance could be improved with a more robust preprocessing method. Arrhythmia recognition was proposed by Pandey and Janghel (Pandey and Janghel 2020): morphological, statistical, R-R interval, and wavelet features were fed into an LSTM for classification, achieving 96.73% precision, 99.37% accuracy, 99.14% specificity, 95.77% F-score, and 94.89% sensitivity on the MITDB, though the model used hand-crafted features. A study by Sharma et al. (Sharma et al. 2020) proposed an LSTM model for arrhythmia classification; a Fourier-Bessel expansion was used to derive an intelligent series from the RR-intervals, which was fed into the LSTM, achieving accuracies of 90.07% and 89.04% on the MITDB and a private dataset, respectively.
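For reference, a minimal bidirectional LSTM classifier of the kind used in these studies can be sketched as follows; the sequence length, layer sizes, and binary AF output are illustrative assumptions:

import tensorflow as tf

# Illustrative bi-LSTM: a sequence of 100 RR intervals (1 feature each)
# classified as AF vs. non-AF. All hyperparameters are assumptions.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(100, 1)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])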

A study by Saadatnejad et al. (Saadatnejad et al. 2019) proposed an LSTM model for ECG classification, fed with RR and wavelet features for training; evaluated on the MITDB, it achieved 99.2% and 98.3% accuracy for VEB and SVEB, respectively. A study by Singh et al. (Singh et al. 2018) presented RNN-based models (RNN, GRU, and LSTM), achieving accuracies of 85.4% (RNN), 82.5% (GRU), and 88.1% (LSTM) on the MITDB; however, hand-crafted features were used, and it is unclear whether the features would work on a new disease. A study by Yildirim (Yildirim 2018) proposed a deep bidirectional LSTM-based model for ECG classification, with a wavelet-based layer using wavelet sequences (WS) to improve classification performance; the proposed DBLSTM-WS obtained its best performance with a WS layer of 3, reaching 99.39% on the MITDB, though hardware limitations prevented the use of the full MITDB. An LSTM-based study was conducted for OSA detection (Cheng et al. 2017); tested on the Apnea-ECG database of 70 ECG records, it obtained a detection accuracy of 97.80%. The summary of the studies that applied RNNs and their variants is presented in Table 4.

Table 4 Summary of DL application in ECG using RNNs

IV. Restricted Boltzmann Machines

In this section, studies based on RBMs for ECG modeling are presented; only a few such studies were found in our review. A study on arrhythmia detection was proposed by Altan et al. (Altan et al. 2018) using a multi-stage DBN model, an RBM-based DL trained with greedy layer-wise unsupervised and supervised learning. Techniques such as higher-order statistics, morphology, wavelet packet decomposition, and the discrete Fourier transform were used for feature extraction. The model achieved 94.15% accuracy, 92.64% sensitivity, and 93.38% selectivity on the MITDB; however, comparison was difficult because of the differing numbers of heartbeats. An RBM-DBN model was proposed by Mathews et al. (Mathews et al. 2018) for ECG classification, evaluated on the MITDB: at a sampling rate of 360 Hz it achieved accuracies of 93.78% (SVEB) and 96.94% (VEB), and at 114 Hz accuracies of 93.63% (VEB) and 95.57% (SVEB). A study by Wu et al. (Wu et al. 2016) proposed an RBM model for arrhythmia classification in which two types of RBM, Bernoulli-Bernoulli and Gaussian-Bernoulli, were stacked to form a DBN; the model achieved 99.3% accuracy on the S class and 97.9% on the V class on the MITDB. Another RBM model was proposed by Mostafa et al. (Mostafa et al. 2017) for sleep apnea detection, achieving 85.26% accuracy on the UCD database and 97.64% on the Apnea-ECG database, though it was tested on an imbalanced dataset. A study in (Huanhuan and Yue 2014) proposed two DBN-based models, DBN + NN and DBN + SVM; the best performance, 98.49% accuracy, was achieved by DBN + SVM with a Gaussian kernel, although the recognition performance for PVC and APB was poor.
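The greedy layer-wise pattern these studies describe can be approximated with scikit-learn; the sketch below (a rough stand-in, with assumed sizes and inputs scaled to [0, 1]) stacks two RBMs as unsupervised feature learners under a supervised logistic-regression head:

from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Rough DBN stand-in: each RBM is trained greedily on the output of the
# previous stage; the logistic-regression layer is trained supervised.
dbn_like = Pipeline([
    ("rbm1", BernoulliRBM(n_components=128, learning_rate=0.05, n_iter=20)),
    ("rbm2", BernoulliRBM(n_components=64, learning_rate=0.05, n_iter=20)),
    ("clf", LogisticRegression(max_iter=1000)),
])
# dbn_like.fit(X_train, y_train); dbn_like.score(X_test, y_test)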

The summary of the studies on RBMs for ECG is presented in Table 5.

Table 5 Summary of DL application in ECG using RBMs

V. Autoencoders

This section presents the applications of DL to ECG signals based on the AE models found in this literature review. An AE-based deep network was proposed by Debnath et al. (Debnath, Biswas, Ashik, & Dash, 2018) for the classification of cardiac arrhythmia groups, achieving 92.1% accuracy on the MITDB. A study by Al Rahhal et al. (Mohamad Mahmoud Al Rahhal et al., 2016) proposed a stacked denoising autoencoder (SDAE) model to extract features from ECG signals for active ECG classification; evaluated on the benchmark arrhythmia database (MITDB) as well as INCART and SVDB, it achieved very good performance on VEB and SVEB. However, experts were asked to label the most relevant and uncertain ECG beats in the test record during model training. A study by Luo et al. (Luo et al., 2017) proposed an SDAE model for patient-specific ECG classification: the raw ECG signals were first converted into time-frequency images using a modified frequency slice wavelet transform (MFSWT) and fed into the SDAE for feature extraction. The encoder layers of the SDAE and a softmax layer formed a DNN for classification, producing 97.5% classification accuracy on the MITDB.
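A minimal denoising-autoencoder sketch shows the pattern these SDAE studies build on; the segment length, noise level, and layer sizes are illustrative assumptions:

import tensorflow as tf

# Illustrative denoising AE: corrupt a 180-sample beat with Gaussian noise
# and train the network to reconstruct the clean beat. After training, the
# encoder output serves as a learned feature representation.
inputs = tf.keras.layers.Input(shape=(180,))
noisy = tf.keras.layers.GaussianNoise(0.1)(inputs)   # corruption (train only)
encoded = tf.keras.layers.Dense(32, activation="relu")(noisy)
decoded = tf.keras.layers.Dense(180, activation="linear")(encoded)

autoencoder = tf.keras.Model(inputs, decoded)
encoder = tf.keras.Model(inputs, encoded)
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(X_clean, X_clean, epochs=20)  # targets are the clean beats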

Classification of AF based on a stacked AE model was proposed by Farhadi et al. (Farhadi, Attarodi, Dabanloo, Mohandespoor, & Eslamizadeh, 2018); a statistical test, analysis of variance (ANOVA), was used to evaluate the extracted features, and the model achieved 93.6% accuracy on the MITDB. Another SAE-based study was proposed by (Yuan, Yan, Zhou, Bai, & Wang, 2016) for AF detection; tested on the AFDB, the NSRDB, the MIT-BIH Long-Term Database, and a 24-h ambulatory ECG database with arrhythmia and normal sinus rhythm, the model achieved accuracy, sensitivity, and specificity above 96%. A study by Al Rahhal et al. (Mohamad Mahmoud Al Rahhal et al., 2018) also proposed an SDAE network for learning feature representations from ECG data, with a softmax regression layer added on top for automatic classification of Premature Ventricular Contractions (PVC); an overall accuracy of 98.6% was achieved on the INCART dataset. The summary of autoencoder applications to ECG signals is presented in Table 6.

Table 6 Summary of DL application in ECG using AEs

VI. Generative Adversarial Networks

This section discusses the only paper found in this review that applied a GAN-based model to ECG signal analysis: a GAN with an auxiliary classifier for ECG, called ACE-GAN, for arrhythmia classification (Z. Zhou, Zhai, & Tin, 2020). The model addresses problems faced by most DL architectures, such as CNNs, in handling imbalanced data and the poor performance caused by limited labeled ECG data. These problems have been investigated in the literature, but the proposed methods still face challenges. For instance, labeling patient-specific heartbeats has been proposed to enhance classification performance, but such systems are not fully automatic and involve substantial expert effort. To handle imbalanced data, some studies have proposed cost-sensitive learning, random re-sampling, and random under-sampling. Random re-sampling and cost-sensitive learning, which assign different weights to each class, can lead to model overfitting and poor generalization (S. S. Xu et al., 2018), while random under-sampling selects a subset of beats from the whole dataset and thus reduces the amount of data available for effective CNN training (Kiranyaz et al., 2015). A GAN, on the other hand, can provide data augmentation and relieve the class imbalance problem by learning to generate relevant conditional samples. The proposed ACE-GAN used the generator network for data augmentation and the discriminator network for classification in a transfer learning manner. Evaluated on the MITDB, the model achieved competitive performance, with 99% accuracy on both the SVEB and VEB classes.
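The generator/discriminator split described above can be sketched as follows; the latent size, beat length, and layer sizes are illustrative assumptions, and this is a plain GAN rather than the paper's auxiliary-classifier variant:

import tensorflow as tf

latent_dim, beat_len = 32, 180  # assumed sizes

# Generator: maps random noise to a synthetic 180-sample beat.
generator = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(latent_dim,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(beat_len, activation="tanh"),
])
# Discriminator: scores a beat as real (1) or synthetic (0).
discriminator = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(beat_len,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
discriminator.compile(optimizer="adam", loss="binary_crossentropy")

# Combined model: the generator is trained to fool the frozen discriminator.
discriminator.trainable = False
gan = tf.keras.Sequential([generator, discriminator])
gan.compile(optimizer="adam", loss="binary_crossentropy")
# Training alternates discriminator.train_on_batch(...) on real/fake beats
# with gan.train_on_batch(noise, ones) to update the generator.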

The summary of the study is presented in Table 7.

Table 7 Summary of DL application in ECG using GANs

VII. Hybrid of DL Techniques

In this section, we discuss papers that combined different DL algorithms for effective classification/analysis of ECG data. A 1D-CNN (ResNet-34) combined with an LSTM was proposed for extracting temporal-relation features (Y.-J. Chen, Liu, Tseng, Hu, & Chen, 2019); the model learns discriminative feature representations from the ECG data and captures temporal relations through the LSTM. Tested on ECG data collected from the Division of Cardiology of the Taipei Veterans General Hospital, it yielded 80% arrhythmia accuracy, 82% sensitivity, and 97% specificity. A study by Chen et al. (C. Chen, Hua, Zhang, Liu, & Wen, 2020) proposed automated arrhythmia classification using a CNN + LSTM model, trained on the MITDB with 99.32% accuracy.
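The CNN-LSTM pattern shared by most studies in this subsection can be sketched in Keras as follows; the segment length, filter counts, and class count are illustrative assumptions:

import tensorflow as tf

# Illustrative hybrid: convolutional layers extract local morphology
# features, the LSTM models their temporal relations, and a dense
# softmax layer classifies the segment.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(720, 1)),
    tf.keras.layers.Conv1D(32, kernel_size=5, activation="relu"),
    tf.keras.layers.MaxPooling1D(2),
    tf.keras.layers.Conv1D(64, kernel_size=5, activation="relu"),
    tf.keras.layers.MaxPooling1D(2),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(5, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])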

In another arrhythmia classification study, a hybrid CNN and LSTM model was proposed for feature extraction, combined with traditionally extracted features (Chu, Wang, & Lu, 2019); a binary particle swarm optimization (BPSO) algorithm selected discriminative features, which were fed into an SVM for classification. The model was trained on the MITDB and achieved 97.749% accuracy on the INCART database. The inter-patient ECG classification problem was investigated by Guo et al. (L. Guo, Sim, & Matuszewski, 2019), who combined a pre-trained CNN (DenseNet) with an RNN-based model (GRU); evaluated on the MIT-BIH Arrhythmia and Supraventricular (SVDB) databases, it achieved classification accuracies of 93.61% and 93.71% on the SVEB and VEB classes, respectively. Research by Ma et al. (Ma, Liu, & Chen, 2018) proposed a hybrid of CNN and RNN (CRNN) for ECG classification, evaluated on the MITDB with 98.81% accuracy. A study by Murugesan et al. (Murugesan et al., 2018) on arrhythmia classification proposed three models: an LSTM, a CNN, and a hybrid of the two (CLSTM). Among the three, the CLSTM achieved the best performance, with 97.6% accuracy on the MITDB; however, the model was trained on an imbalanced dataset, and the authors proposed training with noisy data to improve real-world performance.

A study by Oh et al. (Oh, Ng, San Tan, & Acharya, 2018) proposed a hybrid CNN-LSTM for ECG arrhythmia diagnosis using variable-length ECG segments from the MITDB. However, the model requires detection of the first R peak of the segment, is not robust in distinguishing APBs from normal ECG segments, and was evaluated on imbalanced data. The proposed model produced 98.10% accuracy, 97.50% sensitivity, and 98.70% specificity. Another study, by Shi et al. (Shi, Qin, et al., 2020), proposed a novel system for automatic heartbeat classification combining CNN and LSTM architectures. Its novelty lies in the use of multiple input layers, as opposed to most models that use only one: the heartbeat is divided into three regions serving as three input layers, which are passed through convolutional layers with different kernel strides; the resulting time series are fed into LSTMs and through fully connected layers, and the output is concatenated with a fourth input layer consisting of RR-interval features. Evaluated on the MITDB under both class-oriented and subject-oriented schemes, the model achieved 99.26% (class-oriented) and 94.20% (subject-oriented) accuracy.

Detection of cardiac arrhythmia was proposed by Swapna et al. (Swapna, Soman, & Vinayakumar, 2018), who evaluated a CNN and combinations of a CNN with RNN variants (RNN, LSTM, and GRU); the hybrid CNN-LSTM achieved the best accuracy of 83.4% on the MITDB. A combination of a CNN and a bidirectional RNN, called CBRNN, was proposed by Wang et al. (E. K. Wang et al., 2019) for ECG diagnosis, achieving an 87.69% accuracy rate on the Chinese Cardiovascular Disease Database (CCDD). A model based on a hybrid of a convolutional AE (CAE) and an LSTM for arrhythmia classification was proposed by Yildirim et al. (O. Yildirim, Baloglu, Tan, Ciaccio, & Acharya, 2019): the CAE compresses large ECG signals with minimal loss, reducing storage cost and improving recognition performance, while the LSTM classifies the compressed ECG signals. Evaluated on the MITDB, the model achieved 99.11% accuracy using coded features and 99.23% using raw data.

Real-time detection of AF was investigated by Andersen et al. (Andersen et al., 2019) using a combination of a CNN and a bidirectional LSTM, tested on three databases (MITDB, AFDB, and NSRDB); 98.98% sensitivity and 96.95% specificity were achieved on the AFDB, although the automatic R-peak segmentation algorithm failed to detect R-peaks in multiple segments because of noisy ECG signals. Another study, by Dang et al. (Dang, Sun, Zhang, Qi, et al., 2019), proposed a combination of a deep CNN and a bidirectional LSTM for AF classification, achieving a training accuracy of 99.94% on the MITDB. A hybrid CNN-LSTM model was proposed by Ivanovic et al. (Ivanovic, Atanasoski, Shvilkin, Hadzievski, & Maluckov, 2019) for detecting AF and atrial flutter; trained on 1097 thirty-second ECG recordings, it achieved an average accuracy of 88.28%. A study by Li et al. (Dengao Li, Li, Zhao, & Bai, 2019) proposed automatic staging of heart failure diseases, presenting a CNN-RNN model for feature extraction, feature selection, and classification: features were extracted by the CNN structure, combined with clinical data, and fed into the RNN structure for classification. Evaluated on data collected from the chest pain centers (CPCs) of the Shanxi Academy of Medical Sciences, the model achieved 97.6% accuracy. Early detection of coronary artery disease (CAD) is essential to avert the occurrence of MI and, ultimately, CHF. Another CNN-LSTM model was proposed by Lih et al. (Lih et al., 2020) for the classification of MI, CAD, and CHF, trained on data from the PTBDB, the Fantasia database, INCART, and the BIDMC Congestive Heart Failure database; it achieved a best accuracy of 98.51%, although limited populations were used for the CAD and CHF classes.

A study by Lui and Chow (Lui & Chow, 2018) proposed a CNN-LSTM model with stacking decoding to improve classification accuracy; because it uses single-lead ECG recordings, the model is suitable for portable and wearable medical and healthcare devices. Evaluated on the PTBDB, it achieved 92.4% accuracy. A study by Picon et al. (Picon et al., 2019) addressed the automatic detection of lethal ventricular (LV) arrhythmia with a model combining a CNN and an LSTM to automatically extract features for classification. Two sets of databases were used for evaluation: a public set comprising the MIT-BIH Malignant Ventricular Arrhythmia Database (VFDB), the CUDB, and the American Heart Association ECG Database (AHADB), and an out-of-hospital cardiac arrest (OHCA) database. The evaluation yielded accuracies of 99.3% and 98.0% on the public and OHCA databases, respectively, though gathering annotated OHCA data was difficult. Detection of paroxysmal AF, a manifestation of AF that is often very difficult to notice, was investigated by Shashikumar et al. (Shashikumar, Shah, Clifford, & Nemati, 2018). The authors combined a CNN with a bidirectional RNN: the ECG data were first transformed into wavelet spectrograms and fed into the CNN for feature extraction, then passed to the BRNN to capture temporal patterns. Evaluated by area under the curve (AUC), the model reached 0.94 on a private dataset collected from 2850 patients at the University of Virginia (UVA) Heart Station. Another CNN + LSTM model was proposed by Verma and Agarwal (Verma & Agarwal, 2018) for AF detection, yielding an F1-score of 91.11% for AF classification on the PhysioNet Challenge 2017.

Manual detection of CAD is challenging because of its low amplitude, so an automatic system is essential to capture the abnormal ECG morphology for accurate identification. A CAD classification study was put forward by Tan et al. (Tan et al., 2018), who hybridized a CNN and an LSTM for automatic feature extraction and classification of CAD ECG morphology. The model achieved 99.85% accuracy, 99.85% sensitivity, and 99.84% specificity; however, it was trained on imbalanced and limited data, and it failed to achieve optimal diagnostic performance on subject-specific data because of the limited number of CAD subjects.

Heart rate variability (HRV) derived from ECG signals has been found effective for detecting diabetes. A study by Swapna et al. (Swapna, Kp, et al., 2018) put forward a novel study on diabetes detection, proposing a CNN model and a combined CNN-LSTM structure, evaluated on a dataset collected from diabetic and normal groups of 20 people each; the hybrid CNN-LSTM achieved the best performance, with 95.1% accuracy. In another study, Swapna et al. (Swapna, Vinayakumar, & Soman, 2018) proposed a DL-based feature extraction model for diabetes detection, applying a combined LSTM-CNN feature extractor with an SVM to classify HRV for binary diabetes classification; tested on a private ECG dataset from diabetic and normal groups of 20 people each, it achieved 95.7% accuracy. However, these studies require larger training and testing datasets to improve generalization.

A study by Banluesombatkul et al. (Banluesombatkul, Rakthanmanon, & Wilaiprasitporn, 2018) proposed a 1D-CNN + LSTM model for automatic feature and temporal-information extraction, with a fully connected DNN for classification; evaluated on the MrOS Sleep Study (Visit 1) database, it achieved 79.45% accuracy, though the model needs improvement to detect different levels of OSA. A study by Erdenebayar et al. (Erdenebayar, Kim, Park, Joo, & Lee, 2019) compared six DL algorithms (DNN, 1D-CNN, 2D-CNN, RNN, LSTM, and GRU) for automatic detection of sleep apnea events; the GRU model achieved the best accuracy of 99.0%, and the 1D-CNN and GRU models achieved the best recall rates of 99.0%. However, the ECG signals were distorted by position changes, snoring, and coughing during sleep. The summary of the studies applying hybrid DL algorithms is presented in Table 8.

Table 8 Summary of DL application in ECG using hybrid DL algorithms

5.3.1.2 Biometric/Security domain

ECG-based biometric systems have drawn researchers' attention in recent years. They use the variability of the human heartbeat, as captured by the ECG, to identify and authenticate individuals. Though ECG-based biometric models are mostly less accurate than existing physiological traits such as fingerprint and iris, the ECG is considered effective because of its uniqueness and specificity (Muhammed & Aravinth, 2019). It is also considered more difficult to compromise through spoofing attacks than existing biological traits (Abdeldayem & Bourlai, 2019; Lynn et al., 2019). Another advantage of the ECG signal is that it is the only biometric feature that certifies the liveness of the subject (Chamatidis, Katsika, & Spathoulas, 2017; Hammad et al., 2019). Biometric systems can be based on physiological characteristics, such as fingerprints (Bansal, Sehgal, & Bedi, 2011), iris (Bowyer, Hollingsworth, & Flynn, 2008), and hand veins (Sarkar, Alisherov, Kim, & Bhattacharyya, 2010), or on behavioural traits, such as keystroke dynamics (Monrose & Rubin, 2000) and signature (Hafemann, Sabourin, & Oliveira, 2017). The ECG was found applicable as a biometric trait for human identification (Biel, Pettersson, Philipson, & Wide, 2001; Irvine, Israel, Wiederhold, & Wiederhold, 2003) and falls under physiological characteristics (Bajare & Ingale, 2019); the features extracted from ECG signals were found to be unique to each individual. ECG for biometric systems was first introduced in 1977 by the US military (Forsen, Nelson, & Staron Jr, 1977).

A biometric system either identifies or authenticates an individual based on unique features (Lynn et al., 2019; Pinto, Cardoso, & Lourenço, 2018). In identification, the model is supplied with input data and outputs the identity of the unknown subject, while in authentication, the model accepts or rejects the claim of identity supplied with the input data (Salloum & Kuo, 2017). However, the variability of ECG signals may present challenges in biometric systems. Inter-subject (uniqueness) and intra-subject (permanence) variability may be affected by many factors, such as heart geometry, individual attributes, medication, cardiac condition, posture, age, emotion, fatigue, and electrode characteristics and placement. These challenges remain open issues (Abdeldayem & Bourlai, 2019; Carreiras, Lourenço, Silva, Fred, & Ferreira, 2016; Pinto et al., 2018). These psycho-physiological factors should be considered during the acquisition of ECG for biometric systems (Pinto et al., 2018). Biometric identification based on ECG can be categorized into two major methods: fiducial and non-fiducial approaches (Chamatidis et al., 2017). The former involves describing the peaks, boundaries and intervals of the three main ECG waves: the P wave, the QRS complex and the T wave. These methods depend on ECG signal segmentation and feature engineering, which makes their generalization ability poor and inefficient (Bajare & Ingale, 2019). The latter, instead of applying fiducial localization, which is computationally costly, processes the complete signal; these methods extract features based on waveform and morphology and usually process the signal in the frequency domain (J. S. Kim, Kim, & Pan, 2020). A study in (Abdeldayem & Bourlai, 2019) identified hybrid ECG biometric identification that combines both fiducial and non-fiducial approaches. By utilizing robust DL architectures, ECG data can be passed through the hidden layers to extract unique features that can be utilized to identify and authenticate individuals. There are already studies in the literature that utilized DL architectures in ECG-based biometric systems, and this is attracting the research community because of its unique benefits. The subsequent sub-sections discuss the various studies that used DL in ECG-based biometric systems.
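To make the distinction concrete, the following minimal Python sketch (ours, not drawn from any of the cited studies) contrasts a fiducial feature vector computed from already-detected R-peaks with a simple non-fiducial frequency-domain representation of the whole window; the sampling rate, window length and feature choices are illustrative assumptions.

```python
import numpy as np
from scipy.fft import rfft

def fiducial_features(r_peaks, fs):
    """Fiducial descriptors: statistics of the RR intervals (in seconds)."""
    rr = np.diff(r_peaks) / fs
    return np.array([rr.mean(), rr.std(), rr.min(), rr.max()])

def non_fiducial_features(ecg_window, n_coeffs=64):
    """Non-fiducial descriptors: magnitude spectrum of the whole window."""
    return np.abs(rfft(ecg_window))[:n_coeffs]

fs = 360                               # e.g. MITDB is sampled at 360 Hz
ecg = np.random.randn(10 * fs)         # placeholder for a real 10 s recording
r_peaks = np.arange(fs, 10 * fs, fs)   # placeholder R-peak sample indices
print(fiducial_features(r_peaks, fs))
print(non_fiducial_features(ecg).shape)
```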

I. Deep Neural Networks for ECG Biometric authentication

DNNs have been applied to ECG in the domain of biometric systems to identify and authenticate humans. A DL network was proposed by Chamatidis et al. (Chamatidis et al. 2017) for human authentication. The model first converted the ECG signals using three transformations, the Discrete Cosine Transform (DCT), the Fourier Transform (FT) and the Discrete Wavelet Transform (DWT), and then fed them into the DNN for authentication. However, the model performed poorly compared with the KNN, MLP, Radial Basis Function Network (RBFN) and Random Forest classifiers, largely due to insufficient training data. The authors proposed to incorporate a cancelable mechanism to protect the system against security attacks. A study performed in 2015 by Page et al. (Page et al. 2015) proposed a DNN model for user identification. The study utilized 90 subjects from the ECG-ID database and obtained 99.54%, 99.85% and 99.96% accuracy, sensitivity and specificity, respectively. However, the ECG signals were collected while the subjects were in a resting state; other conditions, such as during exercise or standing, could be investigated. A DNN model was proposed by Wieclaw et al. (Wieclaw et al. 2017) for human identification. The study was evaluated on the Lviv Biometric Data Set, which contains recordings from 18 subjects, and produced an accuracy of 88.97%. However, distortion of the ECG signals and limited data affected the performance; data augmentation based on GANs could be employed. The summary of the studies is presented in Table 9.
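As an illustration of the transform-then-classify idea described above, the sketch below feeds DCT coefficients of a heartbeat window into a small fully-connected network for a binary accept/reject decision; the layer sizes, coefficient count and data are our own assumptions, not the configuration used by Chamatidis et al.

```python
import numpy as np
from scipy.fft import dct
from tensorflow import keras

def beat_to_dct(beat, n_coeffs=100):
    # keep the low-order DCT coefficients as a compact representation
    return dct(beat, norm="ortho")[:n_coeffs]

model = keras.Sequential([
    keras.layers.Input(shape=(100,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),  # accept (1) / reject (0)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

beats = np.random.randn(256, 360)        # placeholder heartbeat windows
X = np.stack([beat_to_dct(b) for b in beats])
y = np.random.randint(0, 2, size=256)    # placeholder genuine/impostor labels
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
```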

Table 9 Summary of the application of DNN in ECG for biometrics authentication
II. Convolutional Neural Networks for ECG Biometric authentication

CNN-based ECG biometric systems have been proposed in the literature for identification and authentication problems, and are considered effective and hard to break by spoofing attacks. A 2D-CNN model was proposed for human identification (Abdeldayem & Bourlai, 2019). The model was tested on eight different databases and obtained the best average identification rate (IR) of 95.6% on the combined databases. However, the study suggested employing a multi-session scenario to improve the model performance. A study by Bajare and Ingale (Bajare & Ingale, 2019) used a 1D-CNN model for human identification. The model achieved an accuracy of 96.93% using NSRDB and 100% using the ECG-ID database. A study by Byeon et al. (Byeon, Pan, & Kwak, 2019) used three pre-trained CNN models, AlexNet, GoogLeNet, and ResNet, for human identification. The ResNet showed the highest performance among the pre-trained CNN models on both the PTBDB and Chosun University (CU-ECG) databases. However, the performance could be improved by employing ECG representations that are robust to noise. In another study, (Byeon, Pan, & Kwak) proposed preconfigured CNN models using various time-frequency representations as inputs for human identification. The VGGNet, ResNet, DenseNet, and Xception models were evaluated, and Xception was found to be better than the compared algorithms. Also, the best among the time-frequency representations was the mel frequency cepstrum coefficient (MFCC), which was found better than the spectrogram, log spectrogram, mel spectrogram, and scalogram. An ensemble of Xception, ResNet, and DenseNet models was proposed in (Byeon et al., 2020). The authors used the spectrogram, mel spectrogram, log-spectrogram, scalogram and MFCC to obtain time-frequency representations that were fed into the pre-trained CNN models. Experimental results showed that the best accuracy of the ensembles by time-frequency representation was achieved by Xception with 99.05%, using the average of scores from log spectrogram to spectrogram; the best accuracy of the ensembles by CNN model was achieved by MFCC with 99.04%, using the average of scores from Xception to DenseNet.
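A hedged sketch of the pre-trained-CNN approach used by these studies is shown below: an ImageNet-trained backbone (ResNet50 here, standing in for AlexNet, GoogLeNet, ResNet or Xception) is frozen and topped with a softmax head for subject identification over 2D time-frequency images; the input size and number of enrolled subjects are assumptions.

```python
from tensorflow import keras

n_subjects = 20                                   # assumed number of enrolled users
base = keras.applications.ResNet50(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False                            # freeze the pre-trained features

model = keras.Sequential([
    base,
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(n_subjects, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(spectrogram_images, subject_ids, ...)  # inputs prepared beforehand
```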

A study by (Deshmane & Madhe, 2018) proposed a 1D-CNN model for human identification and tested it over four databases, obtaining accuracies of 81.33% (MITDB), 96.95% (Fantasia database), 94.73% (NSRDB) and 92.85% (QT database). Another pre-trained CNN model, Inception-v3, was proposed by (P.-L. Hong et al., 2019) for the identity recognition task. The model was tested on PTBDB and achieved an IR of 97.84%; however, the approach could not be easily scaled. A study on the personal identification task was proposed by Kim et al. (J. S. Kim et al., 2020). The model achieved the best accuracy of 99.2% on NSRDB. However, ECG signal acquisition under different activities was proposed to be integrated for better recognition ability. A parallel ensemble CNN model was proposed by Kim et al. (M.-G. Kim, Choi, & Pan, 2020) for user recognition. The authors proposed this model to avoid the overfitting that arises when data obtained from different sources, reflecting changed user states, is registered as enrollment data, which may weaken generalization and consequently degrade recognition performance on new data. Therefore, the ECG signals acquired from different states were passed into the ensemble networks and the outputs were fused into one database for re-training. The model achieved an accuracy of 98.5%. However, the ECG signals were collected using only one device, which may affect the model's generalizability. A study conducted in (Labati, Muñoz, Piuri, Sassi, & Scotti, 2019) proposed a CNN model for user recognition called deep-ECG. The model achieved 100% accuracy on PTBDB. However, to protect the system from security attacks, deep-ECG binary features could be employed for template protection. User ECG recognition based on a 1D-CNN was proposed by Lei et al. (Lei, Zhang, & Lu, 2016). The model was tested on PTBDB and achieved 99.33% accuracy using ECG data from 100 subjects. A novel study was proposed by (Y. Li, Pang, Wang, & Li, 2020): a CNN-based model called Cascaded CNN (F-CNN + M-CNN), in which the F-CNN was used as a feature extractor and the M-CNN for the identification task; the two trained models were cascaded for the final identification. The model's performance was found to be better than existing approaches, with accuracies of 94.3% (3 s, 18–40 subjects) and 97.1% (7 s, with 5 datasets). However, the model still suffers from intra-subject variability. A study by Zhang et al. (Q. Zhang, Zhou, & Zeng, 2017) proposed a 2D-CNN model for human identification. The study used single-arm-ECG signal data acquired from 10 subjects and achieved an IR of 98.4%. In another study, (Q. Zhang & Zhou, 2018) proposed a 2D-CNN model for human identification, tested on datasets acquired from single-arm-ECG and ear-ECG; results show it achieved IRs of 98.4% and 91.1%, respectively. However, the ECG signals acquired in these studies were weak. A study by Zhang (Q. Zhang, 2018) proposed a CNN model for human identification; the experiment was carried out using NSRDB and an IR of 99.0% was achieved. The effect of more leads on the model performance was proposed as a future investigation.

The human authentication task was investigated using a 1D-CNN model proposed in (Ying Chen & Chen, 2018). The ECG signals of 50 healthy subjects were recorded with a BMD_Starter_Kit with a built-in BMD101 chip (NeuroSky Japan Inc.). The model yielded a 2.0% EER over 12,000 beats, and the false acceptance rate (FAR) and false rejection rate (FRR) were less than 10.00%. A study proposed by Hammad et al. (Hammad et al., 2019) used a CNN model for authentication. The features were first preprocessed and extracted using manual feature extraction, and a 12-layer CNN was used for authentication. They introduced a new database, called the MWM-HIT database, for training the model, which was then tested on records collected from PTBDB, CYBHi and MITDB. In the first scenario, EERs of 5.97% and 12.69% were recorded on PTBDB and CYBHi, respectively; the second scenario produced EERs of 1.63% and 4.47% on PTBDB and CYBHi, respectively. In another study by Hammad and Wang (Hammad & Wang, 2019), a pre-trained CNN-based model was built: VGG-Net was used for feature extraction, combined with QG-MSVM for authentication, in a parallel score fusion of ECG and fingerprint. The model was tested on the PTBDB and LivDet2015 databases and an accuracy of 99.99% was achieved. Another study presented by Hammad et al. (Hammad et al., 2018) combined VGG-Net and QG-MSVM for human identification. The pre-trained model (VGG-Net) was used as the feature extractor and the features were passed into QG-MSVM for authentication. The model fused ECG and fingerprint features for an effective biometric system. They also implemented a cancelable method to seal the biometric template, which enhanced the security of the system. The model was tested on different databases, including PTBDB, the CYBHi database, the FVC 2004 fingerprint database and LivDet2015. While the model showed better performance on both the ECG and fingerprint databases, the overall performance on the multimodal database achieved EERs of 0.14% (MDB1) and 0.10% (MDB2). The performance could be improved by introducing different levels of fusion; moreover, the biometric feature size could be reduced to speed up training.

Two CNN models were proposed by Hammad et al. (Hammad, Pławiak, Wang, & Acharya, 2020) for human authentication: a 1D-CNN and a pre-trained model (ResNet) with an attention mechanism, called ResNet-Attention. The proposed ResNet-Attention model achieved the better performance, with accuracies of 98.85% and 99.27% on PTB and CYBHi, respectively. However, ECG template protection was not used. A study presented by (Ranjan, 2019) for user authentication investigated the permanence of the biometric and the impact of day-to-day variations in ECG on system accuracy. The system achieved an EER of 2.0% (minimum EER of 0.9%) when tested on PTBDB. However, the authors observed that accuracy degrades as days pass, due to single-session enrollment. Table 10 presents the summary of applications of CNN in ECG for biometric authentication.

Table 10 Summary of the applications of CNN in ECG for biometrics
III. Recurrent Neural Networks for ECG Biometric authentication

RNN-based models have found application in human identification and authentication. A study by Salloum and Kuo (Salloum & Kuo, 2017) used RNN, GRU and LSTM models for identification/authentication. The LSTM model achieved the best performance with 100% accuracy on both MITDB and the ECG-ID database; however, there were variations in the segmented ECG windows. Another study (Lynn et al., 2019) presented a human identification task using bidirectional GRU and bidirectional LSTM models, achieving accuracies of 98.55% (BGRU) and 96.4% (BLSTM). Table 11 presents the summary of the applications of RNN in biometric authentication using ECG signals.
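In the spirit of the BGRU/BLSTM models above, a minimal bidirectional recurrent identifier over fixed-length ECG segments might look as follows; all hyperparameters are illustrative assumptions, not those of the cited studies.

```python
from tensorflow import keras

n_subjects, seg_len = 20, 720   # assumed: 20 enrolled users, 2 s segments @ 360 Hz
model = keras.Sequential([
    keras.layers.Input(shape=(seg_len, 1)),            # one-channel ECG segment
    keras.layers.Bidirectional(keras.layers.GRU(64)),  # reads the segment both ways
    keras.layers.Dense(n_subjects, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```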

Table 11 Summary of application of RNN in ECG for biometric
IV. Autoencoders for ECG Biometric authentication

An ECG biometric system based on an AE was proposed by Eduardo et al. (Eduardo et al. 2017). The proposed method was tested on a private dataset of 709 subjects collected from a local hospital using a Philips PageWriter Trim III device, and achieved a good performance, which could be further improved by employing more robust preprocessing methods. Table 12 presents the summary.
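The representation-learning idea behind such AE-based systems can be sketched as follows: an autoencoder is trained to reconstruct raw beats, and its bottleneck code then serves as the biometric feature vector. The dimensions below are assumptions, not those of Eduardo et al.

```python
from tensorflow import keras

beat_len, code_len = 360, 32
inp = keras.layers.Input(shape=(beat_len,))
code = keras.layers.Dense(code_len, activation="relu")(inp)     # encoder bottleneck
out = keras.layers.Dense(beat_len, activation="linear")(code)   # decoder
autoencoder = keras.Model(inp, out)
autoencoder.compile(optimizer="adam", loss="mse")
# After training on raw beats, keras.Model(inp, code) yields the biometric features.
```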

Table 12 Summary of DL application in ECG for biometric authentication using AEs
V. Hybrid of DL Techniques for ECG Biometric authentication

From our review, only a single study was found to apply a hybrid DL model to ECG-based biometrics. The study, presented by Muhammed and Aravinth (Muhammed & Aravinth, 2019), proposed an off-the-person biometric human authentication system using a CNN and a hybrid of CNN and LSTM models. The CNN + LSTM model achieved the best performance, with an accuracy of 84.63% on the CYBHi dataset. However, noise and variability factors can affect the performance of the model. Table 13 presents the summary.

Table 13 Summary of DL application in ECG for biometrics authentication using hybrid DL
5.3.1.3 Driving domain

Due to the alarming rate of road accidents, ML-based systems have been proposed in the literature for the automatic detection of the driver's state, alerting the driver when needed so that accidents can be averted. Road and airspace accidents have many causes. For instance, an accident can occur as a result of a fault developed by the vehicle, a collision with another vehicle, or a human being or an animal straying in front of a moving vehicle. On another front, there are accidents caused by the driver's inattention, distraction, stress or drowsiness. Proposed driver drowsiness detection systems may be categorized into three approaches (Abbas, 2020; Jabbar et al., 2018). The first approach is based on driving patterns, drawing on vehicle characteristics, road conditions, and driving skills. The second approach is based on physiological sensor data, including ECG, EOG, and EEG; its performance depends largely on attaching sensors to the driver's body, which is intrusive. The third approach uses computer vision to detect the driver's activities, such as yawning duration, head movement, gaze or facial expression, and eye closure. The most accurate detection relies on the driver's physiological signals, such as brain waves and heartbeats (Abbas, 2020).

These systems utilize the human heartbeat (ECG signals) to detect certain conditions of a driver, triggering an alarm so that necessary actions can be taken to prevent accidents. Various methodological techniques for the detection and prediction of driver fatigue have been developed through advanced image processing, machine learning, and computational intelligence algorithms. Real-time, accurate driver fatigue and drowsiness detection can reduce accident rates (Bhardwaj, Natrajan, & Balasubramanian, 2018). Other applications classify pilots' mental state (Han, Kwak, Oh, & Lee, 2020); these methods detect a pilot's mental status, such as distraction, workload and fatigue, which could help prevent airplane crashes due to the deteriorated cognitive state of pilots. The following sub-sections discuss the papers that applied DL to ECG signals with the aim of reducing the rate of accidents while driving.

I. Deep Neural Networks

In the driving domain, DL has been used to detect a driver's stress from the driver's ECG signals. A study by Cho et al. (Cho, Park, Dong, & Youn, 2019) proposed a DNN model to detect stress using driving data and mental data, recording an accuracy of 90.19%. The data acquired in this study came from specific stressful tasks; this could be extended to daily stress monitoring. The model could also be used to detect anxiety, which is a potential stressor for many health problems, such as high blood pressure. Table 14 presents the summary.

Table 14 Summary of DL applications in ECG using DNNs
II. Convolutional Neural Networks (CNNs)

CNN-based models have found application in the driving domain for inattention identification, stress detection and workload prediction. A study on inattention identification was presented by Taherisadr et al. (Taherisadr et al., 2018) to provide an early distraction detection system that could help prevent road accidents while driving. They first extracted the Mel-frequency spectral coefficient (MFSC) representation of the raw ECG signals and fed the 2D spectrogram images into the proposed deep CNN, achieving an accuracy of about 95.51%, which was better than using a time-frequency (TF) representation. The model was tested with ECG signals acquired from naturalistic driving by 10 persons; the limited population could be increased to obtain more training data. In another study, pilot workload prediction was proposed (Xi et al., 2019). Two visual representations, spectrograms and scalograms, were computed from the acquired ECG; a pre-trained CNN model (AlexNet) was used as an "off-the-shelf" feature extractor, and the well-known linear classifier SVM was used for prediction. Compared with a fine-tuned deep CNN, AlexNet + SVM achieved the best performance, with an accuracy of 51.35% (scalograms + AlexNet-SVM). However, the dataset was collected from only 2 qualified test pilots performing a target tracking task on the National Research Council of Canada's (NRC) Bell 205 helicopter, which is very limited for a real setting. Table 15 presents the summary.

Table 15 Summary of DL applications in ECG using CNNs
III. Autoencoders

From our review, a single study was found to have proposed an AE-based model for ECG signals in this domain. The study proposed driver fatigue classification using a deep AE (Bhardwaj et al. 2018). Data were collected from 10 subjects recorded in a driving simulator, and the model achieved an accuracy of 96.6%. A multimodal system integrating biosignals (ECG, EEG and EMG) was suggested for a robust driver fatigue system. Table 16 presents the summary.

Table 16 Summary of DL application in ECG using AEs
IV. Hybrid of DL Techniques

A Driver's Drowsiness Detection (DDD) system called "HybridFatigue" was proposed by Abbas (Abbas, 2020). The author proposed a transfer learning approach employing a CNN and a DBN to detect driver fatigue from hybrid features in real time. The model utilized both visual information and physiological signals, through a camera and ECG sensors respectively, to obtain features for training. The proposed model was evaluated using three online databases: the Columbia gaze dataset (CAVE-DB), DROZY, and the closed eyes in the wild (CEW) dataset. The pre-trained CNN + DBN achieved the best performance, with an accuracy of 94.50%, compared with the basic CNN model. The performance of the model could be improved by integrating a multi-camera approach to improve recognition. A study conducted by Rastgoo et al. (Rastgoo et al., 2019) proposed the detection and classification of driver stress levels using a multimodal fusion model based on a hybrid of CNN and LSTM structures. The proposed model learned highly correlated representations from a multimodal fusion of ECG, vehicle data and contextual data. A private dataset collected from an advanced driving simulator was used for evaluation; the model achieved an accuracy of 92.8%, sensitivity of 94.13%, specificity of 97.37% and precision of 95.00%. However, the study utilized a very limited population. Table 17 presents the summary of the studies.

Table 17 Summary of DL applications in ECG using hybrid DL techniques

6 General analysis and discussions

This section presents a discussion and analysis of the applications of DL to ECG signals across the different application domains reviewed in the previous sections. We reviewed and analyzed primary studies based on the domain of application, DL models, application areas, application tasks, dataset sources and processing methods.

6.1 Discussion of the Deep Learning model

In this sub-section, we discuss the summary of the studies that applied DL to model ECG data, organized by the DL technique applied.

RQ2: What are the DL techniques that have been applied for ECG signal analysis?

Question 2 seeks to find out the DL techniques/models that have been used for ECG signal analysis in the studies included in this review. Based on the review, different DL algorithms have been proposed in the literature for ECG modeling in different domains (see Tables 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 and 17).

Figure 18 shows the number of DL algorithms proposed for ECG data modeling as reviewed in this study. The CNN-based models show consistency in application across the three domains, having the highest number of papers. This may be because CNN models are suitable for ECG image classification, since 1D ECG signals can be transformed into 2D image representations. In addition, it may be attributed to the success of CNN-based models such as AlexNet, ResNet, VGG-Net and GoogLeNet in image classification. The RNN-based models have so far found application only in the medical/healthcare and biometric/security domains; no paper was found applying an RNN-based model in the driving domain. Also, the RBM-based models are only applied in the medical/healthcare domain. The AE-based models found application in all three domains; however, only a single paper was found in each of the biometric and driving domains. The GAN-based models have not been given much attention for ECG signal modeling: based on this review, only a single study was proposed, for ECG arrhythmia classification (Z. Zhou et al., 2020). However, GANs have potential for addressing dataset imbalance (Pyakillya et al., 2017; Shaker et al., 2020; Z. Zhou et al., 2020). We also identified studies that combined different DL techniques to model ECG signals, across all the presented domains. As revealed in this review, the medical/healthcare domain has received a considerable amount of research based on hybrid DL algorithms in ECG. Moreover, hybrids of CNN-based and RNN-based models found the most applications for modeling ECG signals, especially combinations of CNN and LSTM; Figure 18 attests to that. Studies based on DL for ECG data in the driving domain are minimal; new and novel research is needed to help curtail the problem of road accidents. Overall, it is evident that DL techniques have found application in ECG data, and the rate of research keeps increasing as the popularity of DL grows. Therefore, more research is needed to investigate the potential of using DL to model ECG data, and the existing studies can be improved for better performance.

Fig. 18 The number of DL applications in ECG

6.2 Discussion of the application area

RQ3: In what application areas were the proposed DL models presented?

Question 3 seeks to find out in what application areas the proposed DL models were presented.

Figure 19 shows the taxonomy of application areas in which the proposed DL models were presented for ECG data analysis. The application areas follow the domains identified in this review. Those in the driving domain are categorized into workload/stress, drowsiness, fatigue and distraction/inattention classification (see Tables 14, 15, 16 and 17); yet only a few papers were found in these areas. The biometric/security application area is either identification or authentication. As shown in Tables 9, 10, 11, 12 and 13, identification received more attention than authentication; nevertheless, a single study was found to address both areas, that is, identification and authentication (Salloum and Kuo 2017). In the medical/healthcare domain, the application areas are categorized into coronary artery disease, heartbeat disease (including specific diseases such as AF, MI and CHF), general arrhythmia diagnosis, sleep stage, stress and diabetes. Based on the presentations in Tables 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 and 12, general arrhythmia diagnosis received the highest number of papers, followed by heartbeat disease and then sleep stage. Only a few papers were found for the coronary artery disease, stress and diabetes classification areas.

Fig. 19 Domain and application area of the proposed DL models for ECG

6.3 Discussion of the Classification Task

RQ4: What are the application tasks performed by the proposed DL models?

Question 4 seeks to investigate the tasks performed by the DL models. Figure 20 depicts the different application tasks performed by the DL models surveyed in this study. Each of the domains (medical/healthcare, biometric, and driving) has its own set of tasks. A biometric system either identifies or authenticates an individual based on unique features (Lynn et al. 2019; Pinto et al. 2018); accordingly, ECG biometric systems perform binary classification, either accepting or rejecting an individual based on physiological or behavioural patterns. From the presentations in Tables 9, 10, 11, 12 and 13, it can be deduced that the human identification or recognition task received more attention than the authentication task. The application tasks proposed for driving-based systems include driver drowsiness, stress level, pilot workload, driver fatigue, and driver distraction and inattention detection.

All these tasks have not been investigated fully; only a few papers were found to have implemented them. The application tasks performed by DL models for medical diagnosis include AF and flutter detection, PAF detection, CAD classification, MI detection, CHF detection, diabetes detection, lethal ventricular arrhythmia classification, OSA detection, fetal ECG classification, ventricular arrhythmia identification, stress classification and general diagnostic arrhythmia classification. Based on our survey, most of the studies addressed heartbeat signal and disease classification, with AF classification being the most investigated heartbeat disease. AF has been reported as the most investigated CVD (Xia et al. 2018). Also, classification is considered one of the machine learning problems most used in healthcare and bioinformatics, notably for arrhythmia detection (Sannino and De Pietro 2018). Figure 20 shows the taxonomy of the application tasks performed, based on the three domains found in the literature review.

Fig. 20 Taxonomy of the DL model application tasks

6.4 Discussion of the dataset sources

RQ5: What are the sources of the datasets utilized in the proposed deep learning models?

Question 5 seeks to present the different sources of the datasets utilized in the studies on ECG modeling using DL. Based on our review, we observed three different types of dataset sources: private, public and hybrid. The private datasets were acquired by the authors themselves, mostly when no publicly available dataset existed for the task they intended to model. The public datasets are publicly accessible on the internet. The hybrid datasets are formed from a combination of private and public datasets (Rim et al., 2020). In this review, we categorized these datasets based on the domain of collection, that is, the medical/healthcare, biometric/security and driving domains. We also present the number of datasets by usage count. Figure 21 shows a bar chart depicting the number of dataset sources. The bar labelled "Others" groups dataset sources that could not be represented individually (they are private dataset sources).

Fig. 21 Number of dataset sources used in the proposed DL models

From Fig. 21, in the medical/healthcare domain, the top 5 dataset sources mostly used by researchers for modeling ECG are: MITDB (60), PTBDB (12), AFDB (12), PhysioNet/CinC Challenge 2017 (12) and NSRDB (8). In the biometric/security domain, the top 5 dataset sources are: PTBDB (12), MITDB (6), NSRDB (6), ECG-IDDB (5) and CYBHi-DB (4). In the driving domain, there are mainly three dataset sources, CAVE-DB, DROZY and CEW-DB, each with a single usage. Other dataset sources are private and are grouped in the "Others" category.

6.5 Discussion of the Preprocessing Method

RQ6a: What ECG preprocessing methods and training architectures were used by the proposed DL model?

Research question 6a seeks to investigate the ECG preprocessing methods and training architectures used by DL to model ECG. After the acquisition of data from ECG devices, there are 4 stages that can be employed to analyze the input data for classification purposes (Jeon et al., 2019; S. Zhou & Tan, 2020). First, the preprocessing stage: the input data is prepared by removing baseline wander, powerline interference and noise; this improves the quality of the raw data and consequently enhances classification accuracy (Wu et al., 2016; Serhani, T El Kassabi, Ismail, & Nujum Navaz, 2020). Second, heartbeat segmentation: the ECG signals are split into segments or partitions following the detection of the QRS peaks; the QRS complex carries important features relevant to separating different types of heartbeat (Jun et al., 2016). The third stage is feature learning, where features are extracted and selected for automatic classification. The performance of classification models is significantly dependent on the quality of the features extracted (Hsieh et al., 2020; Jeon et al., 2019; Wu et al., 2016); in most cases, if relevant and important features are not extracted, even the best classifier's performance suffers (Luo et al., 2017). The final stage involves the classification, detection or prediction of the desired classes.
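The four stages can be summarized as the following pipeline skeleton (a structural sketch only; each placeholder would be filled with one of the concrete methods surveyed in this section):

```python
def preprocess(raw_ecg):        # stage 1: remove baseline wander, powerline, noise
    ...

def segment(clean_ecg):         # stage 2: split into beats around detected QRS peaks
    ...

def learn_features(segments):   # stage 3: hand-crafted or DL-learned features
    ...

def classify(features):         # stage 4: final detection / prediction
    ...

def run_pipeline(raw_ecg):
    return classify(learn_features(segment(preprocess(raw_ecg))))
```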

During ECG data acquisition with different machines and scenarios, the signal is often accompanied by distortions or noise contamination. These noises and artifacts are caused by many factors, including muscle contraction, poor electrode contact, snoring, coughing, respiratory movements and even the presence of other external devices (Isin and Ozdalili 2017). These distortions can be categorized into three types, namely baseline wander, powerline interference (which introduces 50 to 60 Hz components into the normal 0.5 to 80 Hz band) and electromyographic noise (Deshmane and Madhe 2018; Zhang 2006). Such deviations from the normal ECG signal can drastically lower a model's performance. Therefore, there is a need to clean and remove these distortions; hence the introduction of pre-processing methods to process the signals before feeding them into the network for classification.

Based on our review, we observed that some studies applied pre-processing to the raw ECG signals while others did not consider pre-processing at all; the latter trained their models on the raw ECG signals. On the other hand, some studies applied time-frequency analysis techniques to transform the original 1D ECG signal into 2D image representations, which were then fed into the models for training. We also indicated the studies that applied segmentation, partitioning the ECG signals into segments of different lengths and durations following (or without) the preprocessing stage, with the segments fed into the models for learning. This segmentation follows signal peak detection, such as detection of the QRS complex.

Tables 18, 19 and 20 present the different pre-processing methods, signal peak detection, segmentation and time-frequency techniques applied to the raw ECG signals before inputting the processed signals into the models for training. To remove noise, baseline wander and powerline interference, filtering methods were used, such as the median filter, 12-order low-pass filter, 6-order Butterworth bandpass filter, band pass filter, notch filter, second-order Butterworth bandpass filter, fourth-order Butterworth band pass filter, finite impulse response filter, average filter, Daubechies wavelet 6 filter, Daubechies wavelet (db8), 150th-order bandpass Finite Impulse Response (FIR) filter, high pass filter, band reject filter, Savitzky-Golay filter, two median filters, 12-tap low-pass FIR filter and so on. Also, some studies employed sampling techniques such as up-sampling, down-sampling and resampling of the ECG signals. Moreover, normalization methods have been applied to the ECG signals; normalization refers to scaling the ECG signal samples to the same level as another sample (Pandey and Janghel 2020).
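As a hedged example of such preprocessing, the sketch below applies a Butterworth band-pass filter for baseline wander and high-frequency noise, a 50 Hz notch for powerline interference, and Z-score normalization; the cut-off frequencies and sampling rate are typical values assumed for illustration, not prescriptions from any single study.

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch

fs = 360.0
b, a = butter(4, [0.5, 40.0], btype="bandpass", fs=fs)  # 4th-order band-pass
bn, an = iirnotch(50.0, Q=30.0, fs=fs)                  # 50 Hz powerline notch

def preprocess(ecg):
    x = filtfilt(b, a, ecg)          # zero-phase filtering avoids peak distortion
    x = filtfilt(bn, an, x)
    return (x - x.mean()) / x.std()  # Z-score normalization

ecg = np.random.randn(10 * int(fs))  # placeholder 10 s recording
clean = preprocess(ecg)
```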

In our view, Z-score normalization is the most widely used normalization method, based on evidence from the review. In the studies that used segmentation, signal peak detection (such as of the R and QRS peaks) was mostly employed, with the Pan and Tompkins algorithm as the most utilized method for detecting the QRS components. The Pan and Tompkins algorithm processes ECG signals using filtering, differentiation, squaring, averaging and peak detection (Debnath et al. 2018), and is considered one of the most popular methods for detecting the QRS complex (Wieclaw et al. 2017), as sketched below. On the other hand, different frequency analysis techniques have been used by some studies to transform the 1D ECG signals into 2D image representations, such as the STFT, discrete wavelet transform (DWT), FT, WT and CWT. The popularly used methods for the frequency analysis of signals are the DCT, CWT and DWT (Byeon and Kwak 2019).
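A condensed Pan-Tompkins-style detector following the stages named above (band-pass filtering, differentiation, squaring, moving-window integration and peak picking) is sketched here; the thresholding is simplified relative to the original algorithm's adaptive rules.

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

def detect_qrs(ecg, fs=360):
    b, a = butter(3, [5.0, 15.0], btype="bandpass", fs=fs)  # QRS-band filter
    x = filtfilt(b, a, ecg)
    x = np.diff(x)                                   # differentiation stage
    x = x ** 2                                       # squaring stage
    win = int(0.150 * fs)                            # 150 ms integration window
    x = np.convolve(x, np.ones(win) / win, mode="same")
    # simple peak picking with a physiological refractory period (~200 ms)
    peaks, _ = find_peaks(x, height=x.mean() + x.std(), distance=int(0.2 * fs))
    return peaks
```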

6.6 Discussion of the Learning Architecture

In a review paper, Rim et al. (Rim et al., 2020) identified two approaches by which input data can be learned.

First approach: feature extraction is performed on the input data and the features are fed into the network for classification.

Second approach: The raw data is inputted directly into the network for automatic classification.

Accordingly, DL has been involved in training in three ways: as a feature extractor or as a classifier (the first approach), or as an automatic end-to-end classifier (the second approach). The following sections discuss these training architectures in detail.

I. Deep Learning as Feature Extractor

There are studies that presented architectures using DL as the feature extractor and a traditional method as the classifier. These architectures aim to avoid the cumbersome hand-crafted feature engineering: the DL network is trained on the dataset in an unsupervised manner, and its outputs are fed into traditional statistical and ML methods such as SVM, MLP, KNN, binary trees and ELM, as sketched below. Table 18 presents the summary of these DL models.
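A minimal sketch of this architecture, assuming a trained 1D-CNN whose penultimate layer supplies features to an SVM, is given below; the network, layer names and hyperparameters are ours, chosen for illustration only.

```python
from tensorflow import keras
from sklearn.svm import SVC

def build_cnn(seg_len=720, n_classes=5):
    return keras.Sequential([
        keras.layers.Input(shape=(seg_len, 1)),
        keras.layers.Conv1D(16, 7, activation="relu"),
        keras.layers.MaxPooling1D(4),
        keras.layers.Flatten(),
        keras.layers.Dense(64, activation="relu", name="features"),
        keras.layers.Dense(n_classes, activation="softmax"),
    ])

cnn = build_cnn()
# ... after cnn.fit(...) on labelled segments ...
extractor = keras.Model(cnn.input, cnn.get_layer("features").output)
svm = SVC(kernel="rbf")
# svm.fit(extractor.predict(X_train), y_train)   # the SVM learns on deep features
```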

Table 18 Summary of DL as feature extractor
II. Deep Learning as a Classifier

Some studies proposed models using a traditional method as the feature extractor and DL as the classifier, with the aim of enhancing classification accuracy by transforming the raw data into feature data with better discriminative characteristics; these features are then fed into the DL model for automatic classification. Given that ECG signals are 1D signals, some studies applied transformation methods to convert them into 2D images, and these new representations were then fed into the DL networks for classification. Table 19 presents the summary of the proposed models that applied this approach.
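As an illustration of the 1D-to-2D conversion step, the sketch below computes a log-magnitude STFT spectrogram of an ECG segment, yielding an image-like array that a 2D CNN could classify; the STFT parameters are typical values, not those of any particular study.

```python
import numpy as np
from scipy.signal import stft

fs = 360
segment = np.random.randn(10 * fs)            # placeholder 10 s ECG segment
f, t, Z = stft(segment, fs=fs, nperseg=128, noverlap=64)
spectrogram = np.log1p(np.abs(Z))             # log-magnitude "image"
image = spectrogram[..., np.newaxis]          # add a channel axis for a 2D CNN
print(image.shape)                            # (freq_bins, time_frames, 1)
```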

Table 19 Summary of DL as a classifier
III. End-to-End Learning

In some approaches, the raw ECG data is fed directly into the network for automatic classification. Here, a single DL model or a hybrid of DL models is used for more robust classification. This approach demonstrates the ability of DL to automatically extract features, perform feature selection and classify. For instance, combinations of CNN and RNN-based models have been proposed (Andersen et al., 2019; Banluesombatkul et al., 2018; C. Chen et al., 2020; Chu et al., 2019; L. Guo et al., 2019). These studies used a CNN, which is fundamentally designed for feature extraction, to extract salient features from the data, and an LSTM for temporal analysis during modelling; this way, the model can exploit the characteristics of the data for better performance. Table 20 presents the summary of studies that applied this learning approach.
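A minimal end-to-end sketch in the spirit of these CNN + LSTM hybrids is shown below: convolutions extract local waveform features, an LSTM models their temporal dynamics, and a softmax head classifies the raw segment directly; all hyperparameters are illustrative assumptions, not those of the cited studies.

```python
from tensorflow import keras

seg_len, n_classes = 3600, 4       # assumed: 10 s @ 360 Hz, 4 rhythm classes
model = keras.Sequential([
    keras.layers.Input(shape=(seg_len, 1)),
    keras.layers.Conv1D(32, 7, strides=2, activation="relu"),
    keras.layers.MaxPooling1D(4),
    keras.layers.Conv1D(64, 5, activation="relu"),
    keras.layers.MaxPooling1D(4),
    keras.layers.LSTM(64),                    # temporal modelling of CNN features
    keras.layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(raw_segments, labels, ...)        # trained directly on raw ECG
```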

Table 20 Summary of DL using end-to-end learning
Fig. 22 Number of deep neural networks' role in training architecture

Figure 22 depicts the role of DL in the learning architectures of the proposed models as reviewed in this study. The roles of DL architectures have been categorized into feature extractor, classifier and end-to-end learning. In the feature extractor category, CNN-based models received the highest application (15 papers, 10%), followed by RNN-based models with 2 papers; RBM-based models and hybrid models had a single paper each. The AE and GAN-based models were not found in this category. In the classifier category, the DNN, CNN and RNN-based models had 3 papers each, while the RBMs and AEs had a single paper each; GAN and hybrid networks were not found in this category. Lastly, the end-to-end learning category found applications across all the discussed DL architectures: the CNN-based models had the highest application with 64 papers (about 42%), the hybrid models received the second highest number (25 papers), followed by RNN-based models (11 papers) and DNN-based models (9 papers). The AE-based models had 7 papers, and the RBM and GAN-based models had 3 and 1 papers, respectively. Based on our review, only a single paper used a GAN to model ECG data. It can also be deduced that the application of DL in ECG is gaining attention through end-to-end learning models.

7 Challenges and future research directions

This section highlights the challenges considering the reviewed papers and discusses potential opportunities for further research. We categorized the challenges into domain, model, application task, dataset and DL architecture. Consequently, opportunities are highlighted for improvements.

7.1 Domain Challenges

Based on our review, we categorized the application of DL in ECG into three domains: medical/healthcare, biometric/security and driving, even though biometric applications can often be considered healthcare applications, since most ECG-based biometric applications arise in healthcare settings. These domains have received a lot of interesting research over the years. However, the nature of life does not guarantee that a patient can be present at the hospital for diagnosis at all times; some abnormalities, like cardiac arrest, surface at irregular times, when the patient is away from the hospital or not under a physician's examination. Therefore, the development of healthcare applications that monitor patients in real time, alerting physicians to provide better assistance at the right time, is highly necessary. Health and medical care applications are considered among the most fascinating applications that can fully benefit from IoT deployment. There are studies that have implemented DL to process ECG signals for edge computing and IoT deployment (Azimi et al., 2018; Konan & Patel, 2018; Farahani, Barzegari, & Aliee, 2019; Granados, Chu, Zou, & Zheng, 2019); however, the requirements of low latency, low power and knowledge extraction from large volumes of physiological data remain challenges for real-time applications. An edge computing environment, with modern AI techniques combined with 5G speeds, would best meet the latency, accuracy and energy-efficiency requirements for the real-time collection and analysis of health data (Hartmann, Farooq, & Imran, 2019). Therefore, DL applications on ECG-based systems can be developed and deployed on cloud computing environments, such as edge computing, software-defined computing, fog computing, mobile edge computing, serverless computing and volunteer computing, to enable ubiquitous and remote health monitoring (Buyya et al., 2018; Varghese & Buyya, 2018).

7.2 Model Challenges

Figure 18 presented the different DL models used to model ECG data for different analysis tasks. It revealed that DNN, CNN, RNN, RBM, AE and GAN-based models and their hybrids have been proposed in the literature for ECG data analysis. There are different CNN architectures, including AlexNet (Krizhevsky et al. 2012), ZF Net (Zeiler and Fergus 2014), GoogLeNet (Szegedy et al. 2015), VGGNet (Simonyan and Zisserman 2014), ResNet (He et al. 2016), SqueezeNet (Iandola et al. 2016), DenseNet (Huang et al. 2017), Inception V3 (Szegedy et al. 2015), Xception (Chollet 2017), LeNet-5 (LeCun et al. 1998) and their variants. But as shown in this review, CNN architectures such as ZF Net, SqueezeNet and LeNet-5 may not have been given serious attention for ECG modeling; AlexNet, ResNet and GoogLeNet have been used the most. Also, there are different RNN architectures, including LSTM (Hochreiter and Schmidhuber 1997), GRU and bidirectional RNNs; the review found that LSTM models have been the most used for ECG signal analysis. The DBN and DBM are both considered part of the "Boltzmann family" (Voulodimos et al. 2018); however, DBM models were not found applied to ECG data analysis, to the best of our knowledge. There are also different AE architectures (Zhai et al. 2018), including sparse AEs (SAEs), denoising AEs (DAEs), convolutional AEs (CAEs), variational AEs (VAEs), adversarial AEs (AAEs) and so on; only SAEs and DAEs have been found applied to ECG signal analysis. Also, there are studies that combined different DL structures, such as CNN and RNN networks. Based on our review, the combination of CNN architectures and LSTM has been researched the most and has produced better performance compared with single DL and other hybrid DL models.

The DL models have proven superior to conventional ML models, with better classification performance. However, the "black box" nature of DL models poses the challenge of interpretability. It is important that a more accurate diagnosis is achieved, but it adds less value if it cannot be easily explained, especially in medical diagnosis. In a bid to maintain interpretability in DL models, some studies have combined deep features with traditional features (Chu et al. 2019). A good discussion of the problem of DL interpretability can be found in (Miotto et al. 2018; Hong et al. 2020). Attempts are being made by researchers to provide interpretable and transparent models: a computing paradigm called eXplainable Artificial Intelligence (XAI) (Doran, Schulz & Besold, 2017; Lipton, 2017; Mathews, 2019; Miller, 2018) is gaining attention in the AI research community; it tries to provide reasons for, or explain, the decisions and actions of a model, making models transparent while providing good accuracy. To build systems that can be trusted by humans, especially for medical diagnosis, future ML and DL models must be explainable, linking causes to the effects or actions taken by the systems (Doran, Schulz & Besold, 2017). Another paradigm based on Active Learning (AL) that applies XAI has been proposed in the literature, called eXplainable Active Learning (XAL) (Ghai et al., 2017). It was proposed to teach ML models by having the model selectively query a machine teacher, while allowing the teacher to understand the model's reasoning and adjust their input, resembling a "teaching" experience. The XAI and XAL paradigms can be suitably applied to ECG data for various analyses and applications.

In the field of biomedical signal processing, the accuracy of a model's classification depends largely on the quality of the signals and the features extracted. ECG feature extraction and classification are critical in diagnosing CVDs. Dictionary learning algorithms have been successfully used for ECG compression, noise elimination and feature extraction to effectively classify ECG, especially for medical diagnosis (Lee, Luan & Chou, 2014; Majumdar & Ward, 2017; Liu et al., 2016; Balouchestani, Sugavaneswaran & Krishnan, 2014; Mathews, 2017; Ceylan, 2018). More studies that leverage dictionary learning and DL for ECG signal classification are potential avenues for future investigation.

7.3 Application Task Challenges

There are different tasks executed by the DL models discussed in this review, and we considered the models based on their respective tasks. The models in each of the discussed domains have different targets to achieve during modelling. Figure 20 presented the taxonomy of these tasks, categorized by domain. For example, in ECG-based biometric systems, the model performs authentication for a user, either allowing or denying access (see Tables 9, 10, 11, 12 and 13). However, there are still challenges of inter-subject and intra-subject ECG variability in ECG-based biometric systems. Inter-subject variability concerns uniqueness and intra-subject variability concerns permanence; both are affected by factors such as heart geometry, individual attributes, medication, cardiac condition, posture, emotion, age, fatigue, and electrode characteristics and placement (Abdeldayem and Bourlai 2019; Carreiras et al. 2016; Pinto et al. 2018).

These challenges remain open issues for ECG-based biometric systems. Also, some studies performed classification on whether an instance exists or not. For example, (Swapna, Kp, et al., 2018; Swapna, Soman, et al., 2018) classified ECG signals as diabetic or non-diabetic and normal or arrhythmic, respectively; (Shashikumar et al. 2018) classified PAF vs. normal; (Yuan et al. 2016) classified AF vs. normal; (Wang et al. 2019) classified ECG as normal or abnormal; (Vullings 2019) performed prenatal detection of CHD as CHD or healthy; (Tan et al. 2018) classified CAD as normal or CAD; (Mostafa et al. 2017) classified apnea as apnea or not apnea; and (Taherisadr et al. 2018) detected driver inattention as distracted or not distracted. Other studies classified outputs into groups or levels of instances: for instance, (Abbas 2020) performed driver drowsiness detection as drowsy, sleepy or normal; (Luo et al. 2017) classified ECG beats into 5 classes; and (Urtnasan et al. 2018) performed apnea/hypopnea detection as normal, hypopnea or apnea.

7.4 Dataset challenges

Based on our review, we identified more than 40 datasets that have been used with DL to analyze ECG signals for different classification tasks. It can be deduced that MITDB, PTBDB, AFDB, PhysioNet/CinC Challenge 2017, NSRDB, ECG-IDDB and CYBHi-DB have received the highest usage (Table 21). All these datasets are available online and can be accessed via the internet. However, the number of private datasets indicates their non-availability on the internet for public use, forcing authors to acquire data themselves with additional resources. Moreover, the most commonly used dataset is MITDB (Moody et al. 2001), which is over 40 years old, contains records from only 47 subjects, and is grossly imbalanced. In addition, it is a single-session database, which is not adequate for experiments on biometrics (Hammad et al. 2019). The more recent datasets are the PhysioNet/CinC Challenge 2017 (Clifford et al. 2017) and the 2018 China Physiological Signal Challenge (Liu et al. 2018), but these databases contain only short-duration recordings.

DL is data-driven and requires large-scale data for training (Qayyum et al. 2019). Tables 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 and 17 presented the limitations of each of the studies; generally, the datasets used to train the models were not large enough to give good generalization. In a bid to address the problem of imbalanced and small datasets in most of the existing databases, techniques such as data augmentation (Giannakakis et al. 2019; Hammad et al. 2018; Hammad and Wang 2019; D. Li et al. 2019; Shaker et al. 2020) and transfer learning (Pan and Yang 2009; Weiss et al. 2016) have been proposed. Techniques such as SMOTE (Pandey and Janghel 2019; Chu et al. 2019) and GANs (Shaker et al. 2020) were used to address dataset imbalance, as sketched below.
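For instance, re-balancing minority heartbeat classes with SMOTE before training can be done in a few lines with the imbalanced-learn package; the data below is a synthetic placeholder standing in for real beat windows and labels.

```python
import numpy as np
from imblearn.over_sampling import SMOTE

X = np.random.randn(1000, 360)                # placeholder beat windows
y = np.array([0] * 950 + [1] * 50)            # heavily imbalanced labels
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print(np.bincount(y_res))                     # classes are now balanced
```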

Therefore, it is recommended that large scale datasets should be produced for all the domains discussed in this paper. These datasets should be made available online to encourage researches and ease the data acquisition process.

7.5 Training Architecture Challenges

Based on our review, and as evidenced in (Rim et al. 2020), there are three different learning architecture approaches presented in the literature that use DL to model physiological signals. In this review, we categorized the proposed DL models for ECG signal analysis based on their role in the learning architecture (Fig. 22).

RQ6b: Which of the training architectures produced the best performance?

Question 6b seeks to find out the best architecture among the three presented in Tables 18, 19 and 20. However, we could not identify a study that implemented all three learning architectures using the same dataset and/or pre-processing method for modelling ECG signals; as such, the best architecture could not be ascertained. Novel research could be conducted to close this gap.

Other inherent challenges, such as security, latency, power consumption, speed and efficiency, should be considered when designing models, especially in medical and healthcare applications. DL is application independent (Alom et al., 2019), and its application has been reported in many areas, such as computer vision, natural language processing, finance, remote sensing, transportation, education, and marketing and advertising (Bote-Curiel et al., 2019). However, further improvements on these systems are necessary to achieve high acceptability in real-world settings.

8 Conclusions

This study presented a systematic literature review of the applications of DL to ECG data across different application domains. The study revealed the superiority of DL models over traditional ML methods for modeling ECG data. The paper discussed ECG-based biometric systems and analyzed empirical studies that adopted DL for ECG signal processing based on the domain of application, application area and task, DL model, dataset source, preprocessing method, and training architecture. The study showed an increasing interest in the application of DL to ECG over the last decade, justifiably so, especially for medical and healthcare applications; this is expected to grow as DL architectures become more popular. DL applied to ECG data has produced state-of-the-art performance, in some cases more accurate than experienced cardiologists. However, apart from the well-known challenges of limited datasets and the computational cost of DL applications, other challenges, such as application security, latency, low-power operation, knowledge extraction from large volumes of physiological data in real time, and the "black box" nature of DL models, were also highlighted to leave room for future development. This study can serve as a benchmark for new researchers seeking to further improve the performance of existing DL models for ECG signals.