Mortality risk prediction for patients in Intermediate Respiratory Care Units (IRCUs) can facilitate optimal treatment of high-risk patients. While Intensive Care Units (ICUs) have long-term experience in using algorithms for this purpose, the same strategies are not applicable to IRCUs because of their special features. The aim of this study is to develop an IRCU-specific mortality prediction tool using machine learning methods.
Methods
Vital signs were recorded from 1966 patients admitted from 2007 to 2017 to the IRCU of the Jiménez Díaz Foundation University Hospital. A neural network was used to select the variables that best predict mortality status. Multivariate logistic regression provided the cut-off points that best discriminated mortality status for each of the parameters. A new guideline for risk assessment was applied, and mortality was recorded for one year.
Results
Our algorithm shows that thrombocytopenia, metabolic acidosis, anemia, tachypnea, age, sodium levels, hypoxemia, leukocytopenia and hyperkalemia are the parameters most relevantly associated with mortality. In the first year under this decision scenario, the failure rate decreased by 50%.
Conclusions
We have generated a neural network model capable of identifying and classifying mortality predictors in the IRCU of a general hospital. Combined with multivariate regression analysis, it has provided us with a useful tool for real-time monitoring of patients to detect specific mortality risks. The overall algorithm can be scaled to any type of unit, offering personalized results, and its accuracy will increase over time as more patients are included in the cohorts.
Respiratory diseases are the leading cause of mortality in general hospitals.1 Severely ill patients with acute respiratory failure (ARF), acute exacerbation of chronic obstructive pulmonary disease (AECOPD), weaning procedures or community-acquired pneumonia (CAP) have historically been admitted to Intensive Care Units (ICUs). However, due to the limited availability and high cost of ICU beds, these patients are often admitted primarily to other units, chiefly Intermediate Respiratory Care Units (IRCUs).
One of the most relevant outcomes of IRCUs is patient mortality. Indeed, several studies focus on independent factors related to mortality in patients receiving noninvasive ventilation in various diseases, such as COPD,2–4 pneumonia,5,6 cardiac failure,7 Acute Respiratory Distress Syndrome,8 asthma,9 immunocompromised patients,10,11 elderly patients,12 and interstitial lung diseases.13
IRCUs have traditionally used various scoring systems for predicting mortality inherited from ICUs, such as the APACHE system,14 the Acute Physiology Score (APS) III,15 the Simplified Acute Physiology Score (SAPS),16 SAPS II,17 the Sequential Organ Failure Assessment (SOFA) score,18 the Logistic Organ Dysfunction Score (LODS),19 and the Oxford Acute Severity of Illness Score (OASIS).20 It has been generally accepted that these models tend to lack sufficient calibration to be used at the individual level,21 and research goals have shifted toward quantifying ICU and hospital performance in aggregate.
Predictive tools can be developed using a variety of techniques from clinical judgment to statistical modeling,22,23 including some based on machine-learning algorithms such as Neural Networks (NN).24,25 The ability to predict the risk of mortality for ICU and IRCU patients could facilitate optimal allocation of staff and resources to high-risk patients and ensure timely interventions.
We aimed to build a statistical framework able to adapt to specific features of a hospital in order to identify and classify mortality predictors of an IRCU.
Material and Methods
Study Population
The IRCU of the Fundación Jiménez Díaz University Hospital is a high-complexity unit in which 15% of patients require complicated postoperative weaning procedures and the prevalence of vasoactive treatment is approximately 26%. The unit has 5 beds (1:5 patient-nurse ratio) and can reach 8 beds in periods of greatest need for care (1:4 patient-nurse ratio in this scenario). The unit is an open room, which guarantees global supervision of the patients, all of whom are fully telemonitored. Supplementary Figures 1–3 describe the unit's material and human resources, as well as its complexity according to its average GRD value. The study was approved by our Ethics Committee. De-identified medical records were confidentially collected from a total of 1966 patients admitted to the FJD-Hospital IRCU (Madrid, Spain) from January 2007 to December 2017. A second cohort of 230 new patients with confidential records, recruited from October 2018 to September 2019 in the same IRCU, was used to measure the performance of the new IRCU protocol based on the conclusions of this work. This is an observational, retrospective study. All patients admitted to our IRCU were selected for the study, and no exclusion criteria were imposed.
Data Collection and Variables Included in the Study
Vital signs were recorded upon arrival at the hospital, before admission to the IRCU: respiratory rate (RR), temperature (T), systolic blood pressure (SBP), and diastolic blood pressure (DBP). Laboratory findings including hemoglobin (Hb), platelets (PL), leucocytes (Leucos), the International Normalized Ratio (INR), blood glucose (GLU), blood potassium (K), sodium (NA) and creatinine (CREAT) were also registered. Blood gases at the time of hospital admission were recorded (FiO2 21%), including partial oxygen pressure (PO2), partial carbon dioxide pressure (PCO2), pH, bicarbonates (SBC) and base excess (BE). Age and individual mortality or survival upon discharge from the IRCU were extracted from medical histories. The data were collected following standard protocols by the same personnel over the whole time period and in the two cohorts included in the study.
Missing Data Imputation
A criterion for variable exclusion was defined to avoid introducing statistical noise that could bias the final results. Parameters with a high percentage (>25%) of missing values were excluded from the analysis. For the remaining parameters, we performed K-nearest neighbors (KNN) imputation as implemented in the DMwR R package.
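
The two steps described here (dropping sparse variables, then KNN imputation) can be sketched in plain Python as follows. This is a simplified illustration, not the DMwR routine used in the study: the function names `drop_sparse_columns` and `knn_impute`, the value of `k`, the unscaled Euclidean distance and the toy values are all assumptions.

```python
import math

def drop_sparse_columns(rows, max_missing=0.25):
    """Keep only columns whose fraction of missing values (None) is <= max_missing."""
    n_rows, n_cols = len(rows), len(rows[0])
    keep = [j for j in range(n_cols)
            if sum(r[j] is None for r in rows) / n_rows <= max_missing]
    return [[r[j] for j in keep] for r in rows], keep

def knn_impute(rows, k=2):
    """Fill each missing value with the column mean over the k nearest complete rows.

    Distance is Euclidean over the columns observed in the incomplete row
    (a simplification; no scaling or distance weighting is applied).
    """
    complete = [r for r in rows if None not in r]
    imputed = []
    for r in rows:
        if None not in r:
            imputed.append(list(r))
            continue
        observed = [j for j, v in enumerate(r) if v is not None]
        nearest = sorted(
            complete,
            key=lambda c: math.sqrt(sum((r[j] - c[j]) ** 2 for j in observed)),
        )[:k]
        imputed.append([v if v is not None else sum(c[j] for c in nearest) / len(nearest)
                        for j, v in enumerate(r)])
    return imputed

# Toy matrix (made-up values): the third column is 60% missing and is dropped.
toy = [
    [70.0, 12.0, None],
    [72.0, 11.5, None],
    [71.0, None, 1.2],
    [80.0, 9.0, None],
    [69.0, 12.5, 1.1],
]
filtered, kept_columns = drop_sparse_columns(toy)
completed = knn_impute(filtered, k=2)
print(kept_columns, completed[2])
```

In practice the variables would usually be put on a common scale before computing distances, so that no single variable dominates the neighbor search.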
Principal Component Analysis
Principal component analysis (PCA) was performed on the imputed data matrix using the pca3d R library.
Sample Classification Based on Variables, Cut-off Values and Logistic Regression Model
Using ROC curves and Youden's J statistic, we obtained the cut-off point that best discriminates mortality status for each of the parameters. Based on these cut-off points, we defined a binary classification for each parameter (P-value <.2) to feed a multivariate logistic regression model. Area Under the Curve (AUC), sensitivity and specificity were calculated using a ROC curve. An internal validation of the model was performed by the bootstrap method to correct for optimism in prediction. A total of 1000 bootstrap samples were drawn with replacement, and the differences in AUC between the bootstrap samples and the original sample were calculated. The goodness of fit of the model was evaluated by the Hosmer and Lemeshow test.
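
To make the cut-off selection concrete, the sketch below scans every observed value of a single parameter and keeps the one that maximizes Youden's J (sensitivity + specificity - 1). It is a minimal illustration in Python rather than the authors' R code; the function name `youden_cutoff`, the `higher_is_risk` flag and the respiratory-rate values are hypothetical.

```python
def youden_cutoff(values, died, higher_is_risk=True):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A patient is flagged as high risk when value >= cutoff
    (or value <= cutoff when higher_is_risk is False).
    `died` holds 1 for death and 0 for survival.
    """
    positives = sum(died)
    negatives = len(died) - positives
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(values)):
        if higher_is_risk:
            flagged = [v >= cutoff for v in values]
        else:
            flagged = [v <= cutoff for v in values]
        true_pos = sum(1 for f, d in zip(flagged, died) if f and d == 1)
        true_neg = sum(1 for f, d in zip(flagged, died) if not f and d == 0)
        j = true_pos / positives + true_neg / negatives - 1
        if j > best_j:  # ties keep the lowest cutoff found first
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j

# Made-up respiratory rates and outcomes, for illustration only.
resp_rate = [16, 18, 19, 20, 24, 26, 28, 30]
outcome = [0, 0, 0, 0, 0, 1, 0, 1]
cutoff, j_stat = youden_cutoff(resp_rate, outcome)

# Binary feature of the kind that would feed the logistic regression model.
high_rr = [int(v >= cutoff) for v in resp_rate]
print(cutoff, high_rr)
```

For parameters where low values indicate risk (for example platelet count in thrombocytopenia), the same function would be called with `higher_is_risk=False`.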
Neural Network Implementation and Data Re-sampling
A NN model was fitted using the caret R package. The data matrix was normalized using min-max scaling. Ten re-sampled matrices were built, each containing all samples with mortality=1 and a random selection of the same number of samples with mortality=0. Every re-sampled matrix was then subjected to a 10-fold cross-validation process (90% training and 10% testing). Thus, a total of 100 neural network models were fitted using the “nnet” method with automatic selection of: (i) the optimal number of units per hidden layer between 1 and 5, and (ii) the optimal value for the regularization parameter (between 0.1 and 0.5 with increments of 0.1). The “twoClassSummary” function was used to compute sensitivity, specificity and the AUC. A ROC curve was calculated for the set of predictions and real values within the loop, using the 10-fold cross-validation process. The final AUC, accuracy, sensitivity and specificity were calculated as the mean over the 100 neural network models. Our R script is available at https://github.com/pminguez/MachineLearning4UnbalancedData.
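
The re-sampling scheme can be outlined in Python as below. This sketch covers only min-max scaling, the ten balanced re-samples and the 10-fold splits; model fitting is omitted, the folds are not stratified, and it is not a port of the authors' R script. The function names and seeds are assumptions, while the class sizes (33 deaths among 1966 patients) come from the cohort described in this study.

```python
import random

def min_max_scale(column):
    """Rescale a list of numbers linearly to the [0, 1] interval."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)
    return [(v - lo) / (hi - lo) for v in column]

def balanced_resamples(labels, n_resamples=10, seed=0):
    """Index lists holding every label==1 sample plus an equal-sized random draw of label==0."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    return [pos + rng.sample(neg, len(pos)) for _ in range(n_resamples)]

def k_fold_splits(indices, k=10, seed=0):
    """Shuffle the indices and yield k (train, test) pairs, roughly 90%/10% when k=10."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

labels = [1] * 33 + [0] * (1966 - 33)
all_splits = [
    split
    for sample in balanced_resamples(labels)
    for split in k_fold_splits(sample)
]
print(len(all_splits))  # one model would be trained per (train, test) pair
```

Undersampling the majority class in this way trades data volume for class balance; repeating it ten times reduces the dependence of the averaged metrics on any single random draw of survivors.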
Using the KNN-imputed data matrix, we calculated the Spearman correlation between every pair of variables. To build the network, we selected the pairs with an absolute correlation coefficient >0.3 and a P-value <.01. The selected pairs of variables represent the edges of the correlation network.
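
A minimal Python sketch of the edge selection follows. It computes Spearman's coefficient as the Pearson correlation of tie-averaged ranks and keeps pairs whose absolute coefficient exceeds the threshold. The P-value filter described in the text is deliberately left out of this sketch, and the function names and toy values are hypothetical.

```python
from itertools import combinations

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

def correlation_edges(table, threshold=0.3):
    """Return (var_a, var_b, rho) for every pair with |rho| > threshold."""
    edges = []
    for a, b in combinations(table, 2):
        rho = spearman(table[a], table[b])
        if abs(rho) > threshold:
            edges.append((a, b, rho))
    return edges

# Made-up values for three of the recorded variables.
toy_table = {
    "SBP": [100, 110, 120, 130, 140],
    "DBP": [60, 65, 70, 80, 85],
    "T": [37.0, 36.5, 37.2, 37.1, 36.8],
}
edges = correlation_edges(toy_table)
print([(a, b) for a, b, _ in edges])
```

Using the absolute value keeps both strong positive and strong negative associations as edges, which is how the threshold in the text is read here.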
Comparison of Failure
To compare failure before and after the new criteria for detecting high-risk patients were implemented, we fitted Poisson regression models to calculate the annual trend of failure risk in the two periods. The relative risks (RR) were compared using the Wald test, based on the coefficients of the models (logarithm of RR) and their standard errors.
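
One common form of this comparison is a two-sided Wald z-test on the difference between two log relative risks, sketched below in Python. It assumes that the two coefficients come from independent models, so that their variances add; whether the authors used exactly this form is not stated. The function name `wald_compare` and the numeric inputs are hypothetical and are not results from this study.

```python
import math

def wald_compare(log_rr_1, se_1, log_rr_2, se_2):
    """Two-sided Wald z-test for the difference of two independent log relative risks."""
    z = (log_rr_1 - log_rr_2) / math.sqrt(se_1 ** 2 + se_2 ** 2)
    # Two-sided normal P-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical annual trends: RR = 1.10 before and RR = 0.80 after the new criteria.
z_stat, p_val = wald_compare(math.log(1.10), 0.05, math.log(0.80), 0.10)
print(round(z_stat, 2), p_val < 0.01)
```

Working on the log scale is what makes the normal approximation reasonable, since Poisson regression coefficients are estimated as logarithms of the relative risks.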
Results
Cohort and Dataset Description
A total of 1966 patients were included in the analysis, with an overall in-IRCU mortality rate of 1.68% (33 patients) and an overall failure rate (IRCU mortality+ICU transfer+delayed mortality outside the IRCU) of about 5.39% (106 patients). Table 1 describes the study population, with epidemiologic data and the treatments used, and Table 2 shows the mean and standard deviation for each of the variables used (INR was excluded). A PCA of the patients’ features did not show observable differences between patient classifications according to mortality status (Fig. 1).
Table 1. Global Profile of the Cohort Included in the Study. NIV: noninvasive ventilation; HFO: high-flow oxygen treatment.
Feature | Value |
---|---|
Years of study | 10 |
Number of patients | 1966 |
Mortality rate (%) | 1.68 |
Age (mean) | 75±14 |
Gender ratio (male %/female %) | 57/63 |
NIV treatment (%) | 83 |
HFO treatment (%) | 2 |
Weaning procedures (%) | 15 |
Vasoactive drugs (%) | 26 |
Table 2. Description of the Variables and Their Values Introduced in the Analysis. %NA indicates the percentage of missing (not available) values for each variable.
Variable | Mean | SD | %NA |
---|---|---|---|
AGE | 75.3 | 14.8 | 7.38 |
SBP | 127.43 | 20.35 | 0 |
DBP | 70.42 | 23.19 | 0 |
T | 37.34 | 0.54 | 0 |
RR | 19.41 | 2.38 | 0 |
LEUCOS | 12.25 | 9.79 | 0.15 |
Hb | 12.14 | 2.66 | 0.15 |
PL | 258.37 | 130.19 | 0.20 |
INR | 1.55 | 1.32 | 40.18 |
CREA | 1.01 | 1.57 | 0.15 |
NA | 138.35 | 5.43 | 0.15 |
K | 4.34 | 1.92 | 0.31 |