Background
Rapid response systems (RRS) have been shown to prevent in-hospital cardiac arrest (IHCA) or unplanned intensive care unit transfer (UIT) by enabling early detection and proper intervention in patients exhibiting signs of clinical deterioration [1, 2]. Track-and-trigger systems are part of the afferent limb of the RRS for monitoring patients, detecting deterioration, and activating the RRS [3]. In general, these can be categorized as single- (SPTTS) or multiple-parameter track-and-trigger systems (MPTTS). SPTTS activate the RRS using single abnormal vital signs or laboratory findings. However, while these systems can be intuitive and sensitive, the rapid response team (RRT) can be exhausted by many false alarms [4]. Early warning scores (EWS) derived from a combination of several physiological parameters are typical examples of MPTTS [5]. The modified early warning score (MEWS) and national early warning score (NEWS) are the most widely used MPTTS [6], both of which have better predictive values for IHCA and are more efficient in detecting clinical deterioration than SPTTS [7, 8].
The deep learning-based cardiac arrest risk management system (DeepCARS™) was first developed in 2018 and approved as a medical device in 2021 by the Ministry of Food and Drug Safety (MFDS). Using basic vital signs (blood pressure [BP], heart rate [HR], body temperature [BT], respiratory rate [RR]), patient age, and the recorded time of each vital sign, the DeepCARS™ has demonstrated higher accuracy in predicting IHCA, compared with the MEWS, with higher sensitivity and a lower false alarm rate [7, 8]. However, the value and safety of this system in real-world practice remain to be determined, given that previous validation studies have been retrospective.
Therefore, we aimed to investigate the predictive accuracy of the DeepCARS™ for IHCA or UIT in general ward patients, compared with that of conventional methods in real-world practice.
Methods
Study design and population
We conducted a prospective multicenter cohort study over 3 months (October 18, 2021–January 17, 2022) at four tertiary academic hospitals in South Korea: Inha University Hospital (925 beds), Seoul National University Bundang Hospital (1,324 beds), Dong-A University Medical Center (999 beds), and Seoul National University Hospital (1,793 beds). All hospitals had been operating mature RRS for at least 5 years. This study was registered at ClinicalTrials.gov (NCT04951973) on June 30, 2021. Throughout the study period, the RRS of each hospital screened and monitored patients with the DeepCARS™, MEWS, NEWS, and SPTTS running simultaneously, and the RRT continued to provide interventions as in routine practice. As vital signs or laboratory data were entered into the electronic medical record, the prediction score for each method was automatically computed. When an alarm was triggered by any of the above methods, the RRT reviewed and confirmed the alarm and decided whether to intervene. Importantly, the alarms generated by each method did not require any mandatory action, as each method served primarily as a screening tool.
All patients aged ≥ 18 years who had been admitted to the general wards during the study period were included. Patient data were excluded in the following cases: admission date outside of the study period, admission within 24 h before the end of the study period among those who did not experience IHCA or UIT, no vital signs recorded in the 24 h before IHCA or UIT, no vital signs recorded during the entire study period, and DNR orders without the occurrence of any event (Additional file 1: Fig. S1). The Ethics Committee and Institutional Review Board of each hospital approved the study protocol as minimal-risk research using data collected for routine clinical practice, and they waived the requirement of informed consent.
Outcomes
The primary outcome of interest was the composite of IHCA (loss of circulation prompting resuscitation with chest compression, defibrillation, or both) and UIT (admission to the intensive care unit [ICU] due to unanticipated deterioration in patients from general wards rather than from the operating room or emergency department) [9–11]. We compared the predictive accuracy of the DeepCARS™ with that of the conventional triggering systems (MEWS, NEWS, and SPTTS) in terms of whether the primary outcome occurred within 24 h of the system alarm being triggered. We also compared each score in terms of alarm performance and the timeliness of prediction. In addition, subgroup analyses were conducted according to department of admission, age group, sex, hospital, and surgical status.
Data collection and preprocessing
We collected data on age, sex, occurrence of events (IHCA and UIT), recorded time of vital signs, five time-stamped vital sign values (BP [systolic and diastolic], HR, RR, and BT), consciousness level, oxygen saturation, oxygen supplementation, five time-stamped laboratory test values (pH, PaO2, PaCO2, TCO2, and lactic acid), scores derived using each triggering system, DNR code status, and RRT intervention.
Deep learning-based cardiac arrest risk score
The detailed architecture of the DeepCARS™ has been described previously [7, 8].
Deployment of the DeepCARS™
We deployed the DeepCARS™ and dashboard software in all participating hospitals. The design and interface choices for the dashboard were made in collaboration with the RRT from all participating hospitals and were refined based on the initial draft. The deployment was conducted in two steps. First, the RRT from the site and development team of the DeepCARS™ met with clinicians and the information system team to explain the features of the system, share the integration specifications, and discuss how to integrate the product within the hospital. Next, we set up the implementation phase to verify system integration at each site. The dashboard was used to display alerts and values for each prediction model and record the final intervention performed. We designed a dashboard for the RRT to click a button to categorize alerts into four types of events: cardiopulmonary resuscitation (CPR), UIT, DNR suggestion, and borderline intervention. Alerts that occurred in all hospitals after activation were included in the analysis.
Key aspect 1: How accurate is the DeepCARS™ in predicting IHCA or UIT, compared with conventional methods?
We evaluated predictive performance by measuring the area under the receiver operating characteristic curve (AUROC), one of the most commonly used metrics, which summarizes the trade-off between sensitivity and the false positive rate. Additionally, we calculated the F1 score (2 × [precision × recall]/[precision + recall]), positive predictive value (true positive/[true positive + false positive]), negative predictive value (true negative/[true negative + false negative]), net reclassification index, and number needed to examine (NNE) [12, 13]. We also compared predictive performance according to the timeline in the prediction window (24, 12, 6, 3, and 0.5 h before the primary event).
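As an illustration of the threshold-based metrics defined above, the following minimal Python sketch computes them from a confusion matrix. The counts are hypothetical and are not study results. The definition NNE = 1/PPV is an assumption based on common usage, as the text above does not state the formula, and the function name is illustrative only.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # alarms followed by the primary outcome
    fp: int  # alarms not followed by the primary outcome
    tn: int  # no alarm and no outcome
    fn: int  # no alarm but the outcome occurred


def alarm_metrics(c: ConfusionCounts) -> dict:
    """Return PPV, recall, NPV, F1, and NNE for one alarm threshold."""
    if c.tp == 0:
        raise ValueError("metrics are undefined without any true positives")
    ppv = c.tp / (c.tp + c.fp)       # positive predictive value (precision)
    recall = c.tp / (c.tp + c.fn)    # sensitivity
    npv = c.tn / (c.tn + c.fn)       # negative predictive value
    f1 = 2 * (ppv * recall) / (ppv + recall)
    nne = 1 / ppv                    # ASSUMPTION: NNE taken as 1/PPV
    return {"ppv": ppv, "recall": recall, "npv": npv, "f1": f1, "nne": nne}


# Hypothetical counts for illustration only.
example = alarm_metrics(ConfusionCounts(tp=20, fp=80, tn=880, fn=20))
```

Under this assumed definition, the NNE can be read as the number of alarmed patients the RRT would need to examine to find one patient who goes on to experience the outcome.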
Key aspect 2: Does the DeepCARS™ lead to a lower total alarm count and higher appropriate alarm rate, compared with conventional methods?
We compared alarm performance by measuring the total alarm count and the rate of appropriate alarms. The total alarm count was expressed as the mean alarm count per day (MACPD)/1,000 beds and calculated by dividing the total number of alarms by the study period and the total number of beds and multiplying it by 1,000. Lower MACPD indicates better alarm performance.
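The MACPD calculation described above can be sketched as follows. The study dates and bed counts are taken from the Methods, whereas the alarm total is a hypothetical placeholder. Counting both the first and last day of the study period, and using total hospital beds as the denominator, are assumptions of this sketch rather than confirmed details of the analysis.

```python
from datetime import date


def macpd_per_1000_beds(total_alarms: int, study_days: int, total_beds: int) -> float:
    """Mean alarm count per day per 1,000 beds."""
    if study_days <= 0 or total_beds <= 0:
        raise ValueError("study_days and total_beds must be positive")
    return total_alarms / study_days / total_beds * 1000


# Study period from the Methods; inclusive day counting is an ASSUMPTION.
study_days = (date(2022, 1, 17) - date(2021, 10, 18)).days + 1

# Bed counts of the four participating hospitals as reported in the Methods.
total_beds = 925 + 1324 + 999 + 1793

# HYPOTHETICAL alarm total, for illustration only.
hypothetical_alarms = 50_000
macpd = macpd_per_1000_beds(hypothetical_alarms, study_days, total_beds)
```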
We triaged the interventions performed by the RRT according to the A/B/C categories used by critical care response teams in Ontario [14], with minor modifications. We divided patients into the following categories: category A (admission to the ICU); category B (borderline), which included patients who required further assessment (typically investigations or monitoring of response to therapy); and category Cp (CPR), which included patients with loss of circulation prompting resuscitation with chest compression, defibrillation, or both. We added category D (do not resuscitate [DNR]), which included patients whose DNR orders were initiated by the RRT in the ward [15]. All other alarms were categorized as Z. An alarm that activated the RRT and was connected to clinical intervention category A, B, Cp, or D was defined as an appropriate alarm.
The rate of appropriate alarms was calculated by dividing the number of appropriate alarms by the total alarm count. We compared the appropriate alarm count at MEWS and NEWS values of 5 points, the most commonly used triggering threshold, with that at the equivalent DeepCARS™ score of 95 points.
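A minimal sketch of the appropriate alarm rate, assuming each alarm has already been tagged by the RRT with one of the intervention categories described above. The category labels follow the text, while the list of alarms is hypothetical.

```python
from collections import Counter

# Categories counted as appropriate, following the definitions above.
APPROPRIATE_CATEGORIES = {"A", "B", "Cp", "D"}


def appropriate_alarm_rate(alarm_categories: list) -> float:
    """Fraction of alarms linked to an RRT intervention category."""
    counts = Counter(alarm_categories)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no alarms recorded")
    appropriate = sum(n for cat, n in counts.items() if cat in APPROPRIATE_CATEGORIES)
    return appropriate / total


# HYPOTHETICAL alarm log: "Z" marks alarms without a linked intervention.
alarms = ["A", "B", "Z", "Z", "Cp", "D", "Z", "Z"]
rate = appropriate_alarm_rate(alarms)
```

In this hypothetical log, four of the eight alarms fall into an intervention category, so the rate is 0.5.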
Key aspect 3: Does the DeepCARS™ predict more cases of IHCA or UIT earlier than conventional systems do at the same specificity level?
Delayed RRT intervention is associated with poor prognosis [16]. When the RRT has sufficient preparation time before a patient's condition becomes critical, the team is better able to respond appropriately to the deteriorating patient. Therefore, the ability to predict more events in a timely manner is an important feature of the RRS. We analyzed this performance by comparing the cumulative percentages of patients with composite primary outcomes from 24 h to 0.5 h before the event.
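One way to picture the cumulative percentage described above is sketched below. It assumes that each patient with a primary outcome is represented by the lead time, in hours, between the first alarm and the event (`None` if no alarm was raised), and that a patient counts as detected at a given time point if the first alarm occurred at least that many hours before the event. Both the data and this reading of the analysis are assumptions of the sketch.

```python
from typing import Optional, Sequence

# Time points (hours before the event) taken from the prediction window in the Methods.
HORIZONS_H = (24, 12, 6, 3, 0.5)


def cumulative_detection(lead_times_h: Sequence[Optional[float]],
                         horizons: Sequence[float] = HORIZONS_H) -> dict:
    """Percentage of event patients already alarmed at each time point."""
    n = len(lead_times_h)
    if n == 0:
        raise ValueError("no event patients supplied")
    result = {}
    for h in horizons:
        detected = sum(1 for lt in lead_times_h if lt is not None and lt >= h)
        result[h] = 100 * detected / n
    return result


# HYPOTHETICAL lead times for eight event patients; None means no alarm.
lead_times = [24, 18, 10, 5, 2, 0.4, None, 7]
curve = cumulative_detection(lead_times)
```

Computing one such curve per triggering system at a matched specificity level would allow the systems to be compared at each time point.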
Key aspect 4: How robust is the DeepCARS™ in various cohorts when compared with conventional methods?
We calculated the predictive performance of the DeepCARS™ in various cohorts in terms of department of admission. The cohort was also divided according to age, sex, hospital, and surgical status.
Additionally, we assessed the calibration of each DeepCARS™ prediction model by plotting calibration curves against the ideal curve and calculating the average absolute error between the actual and estimated outcomes. We performed all statistical analyses using scikit-learn (version 0.23.1; community-driven project sponsored by BCG GAMMA), pandas (version 1.0.5; community-driven project sponsored by NumFOCUS), and R (version 3.6.1; R Core Team 2021).
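The average absolute calibration error mentioned above could be computed along the following lines. This is a sketch under stated assumptions: predictions are expressed as probabilities between 0 and 1, they are grouped into equal-width bins, and the error is the unweighted mean over non-empty bins of the absolute difference between the mean predicted probability and the observed event rate. The binning scheme actually used in the study is not described in this section.

```python
from typing import Sequence


def mean_abs_calibration_error(probs: Sequence[float],
                               outcomes: Sequence[int],
                               n_bins: int = 10) -> float:
    """Unweighted mean |mean predicted - observed rate| over non-empty bins."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and equal in length")
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    errors = []
    for members in bins:
        if not members:
            continue
        mean_pred = sum(p for p, _ in members) / len(members)
        observed = sum(y for _, y in members) / len(members)
        errors.append(abs(mean_pred - observed))
    return sum(errors) / len(errors)


# HYPOTHETICAL predictions and outcomes, for illustration only.
error = mean_abs_calibration_error([0.1, 0.1, 0.9, 0.9], [0, 0, 1, 1], n_bins=2)
```

A perfectly calibrated model would lie on the ideal curve, where the predicted probability equals the observed rate in every bin, giving an error of zero.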
Discussion
Our study indicated that the predictive performance of the DeepCARS™ for IHCA or UIT was superior to that of the MEWS, NEWS, and SPTTS in patients admitted to general wards. At the same sensitivity level, the total alarm count was significantly reduced using the DeepCARS™, which also increased the relative number of appropriate alarms leading to real activation of RRT interventions. In addition, the DeepCARS™ predicted the outcomes of patients earlier, and its predictive performance remained superior to that of conventional methods, regardless of department of admission, patient age, sex, hospital, or surgical status. Therefore, better predictions with fewer alarm counts and earlier predictions indicate that the DeepCARS™ is an effective alternative screening tool to conventional triggering systems for the RRS.
The main strength of our study was that we clearly distinguished true alarms that led to actual RRT interventions from all alarms in a prospective manner. To our knowledge, this is the first study to prospectively collect alarms from each triggering system and triage them according to RRT intervention. In our study, borderline interventions included fluid therapy, prescription of antibiotics or other medications, oxygen therapy, and recommendation for further specific evaluation by the RRT. Although these interventions are not as dramatic as UIT or IHCA, they account for the majority of RRT actions and improve clinical course, thereby helping to avoid potentially severe outcomes [1, 17]. By defining borderline interventions and analyzing them according to alarms, we were able to calculate the exact number of appropriate alarms for patients at risk of IHCA or UIT. In addition, DNR recommendations by the RRT are relatively common in clinical practice, such as in patients with terminal cancer or no further possibility of resuscitation [18, 19]. However, a retrospective study design can make it difficult to identify and tag which alarms are associated with borderline interventions or DNR suggestions by the RRT. Our prospective study design enabled a more accurate validation by preventing the misclassification of appropriate alarms, providing stronger evidence of the clinical practicality and efficacy of the DeepCARS™.
Numerous studies have developed machine learning-based algorithms for predicting IHCA [7, 8, 20–24]. Churpek et al. revealed that the random forest algorithm was more accurate than the MEWS in predicting IHCA, ICU admission, and death in wards for patients who experienced attempted resuscitation [20]. The Mayo Clinic EWS and electronic cardiac arrest risk triage score also exhibited better performance in predicting IHCA or ICU transfer than did the NEWS [23, 25]. These algorithms rely on a large number of variables and require complex calculations based on a combination of demographics, vital signs, and laboratory test results. Therefore, lack of demographic data and time lags between events and laboratory tests can lower their predictive performance and make them difficult to apply in real-world settings. In 2022, a time-series early warning score (TEWS) for predicting IHCA using only basic vital signs was validated [21]. The predictive performance of the TEWS for IHCA was superior to that of the MEWS. The TEWS and DeepCARS™ differ in several aspects, including their model architectures, training methods, preprocessing methods, and exclusion criteria. The main differences between them are their inputs and outputs: in addition to vital signs, the DeepCARS™ uses age and recorded time as predictor variables to predict cardiac arrest within 24 h, whereas the TEWS relies solely on vital signs to predict cardiac arrest within 48 h. Age was added as a predictor variable to the DeepCARS™ to provide basic patient information for the model to cluster patients according to age and vital signs. Age is important because associations among vital signs can differ by age group. Additionally, the recorded time provides critical information regarding the length of stay and monitoring intensity, providing greater insight into the severity of the patient’s condition, compared with vital sign values alone. Finally, the DeepCARS™ has an advantage over the TEWS in that the latter was developed and validated only in a single-center retrospective study.
Delays in RRS initiation and ICU transfer have been associated with increased mortality and morbidity [26]. Although vital signs are usually monitored continuously in the ICU, nurses in general wards measure vital signs three or four times daily. Thus, early detection of clinical deterioration by EWS and suitable interventions by the RRT are crucial for patient prognosis [27, 28]. In our study, the DeepCARS™ provided more time to intervene, compared with the other traditional triggering systems. In addition, DeepCARS™ performance was sustained regardless of department of admission, age, sex, hospital, or surgical status. The current results indicate that the DeepCARS™ may be superior to or at least not inferior to conventional triggering systems in the RRS, highlighting its potential as an effective system for screening high-risk patients in general wards.
This study had some limitations. First, we did not examine the relationship between RRS activation by the DeepCARS™ and IHCA reduction. Although alarms triggered by the DeepCARS™ led to more adequate RRT interventions, compared with those triggered by other methods, the study period was too short for the evaluation of long-term prognosis. Second, we did not evaluate the appropriateness of every RRT intervention, as we assumed that the detection of clinical deterioration by the EWS would result in appropriate intervention. However, in real-world clinical practice, the judgment of the RRT may influence the decision to intervene and the quality of the intervention. Therefore, guidelines for appropriate standard interventions should be developed and verified. Third, selection bias may have occurred given that all hospitals included in this study had university affiliations. In addition, all four hospitals have mature RRS, and it is necessary to evaluate DeepCARS™ performance in hospitals that have recently implemented RRS and those without an established RRS, as the incidence and reduction of IHCA may depend on the maturity of the RRS. Finally, the DeepCARS™ was evaluated only in South Korea, necessitating further studies among other ethnic groups.