The aim of this study was to evaluate the quality of diagnosis and treatment of COPD using Big Data methodology on the Savana Manager 2.1 clinical platform.
Materials and methods
A total of 59,369 patients with a diagnosis of COPD were included from a population of 1,219,749 adults over 40 years of age.
Results
In total, 78% were men. Spirometry data were available for only 26,453 (43.5%) subjects. Disease severity was classified in 18,172 patients: 4396 mild, 7100 moderate, and 6676 severe, although only 27%, 34%, and 28%, respectively, presented obstructive spirometry. The clinical management of COPD is mainly the responsibility of the primary care and pulmonology departments, while internal medicine and, to a lesser extent, geriatrics also participate. Drug treatment was based on bronchodilators and inhaled corticosteroids (ICS). A marked decline in the use of long-acting beta-2 agonists (LABA) in monotherapy and a slight reduction in ICS/LABA combinations, associated with a long-acting anticholinergic (LAMA) in 74% of cases, were observed. All-cause in-hospital mortality among the overall population was 5.6% compared to 1% of the general population older than 40 years. In total, 35% were admitted to hospital, with an average stay of 6.6 days and an in-hospital mortality rate in this group of 10.74%.
Discussion
This study identifies the main features of an unselected COPD population and the main errors made in the management of the disease.
Chronic obstructive pulmonary disease (COPD) is the fourth leading cause of death in Spain; it impacts negatively on the quality of life of patients and generates a significant burden of disability.1,2
Spirometry is crucial for establishing diagnosis and for classifying the functional severity of the disease. Despite the simplicity and low cost of this procedure, many patients are diagnosed with COPD solely on the basis of their medical history and physical examination.3 This can lead to inappropriate medical prescriptions, a delay in the treatment of other possible causes of symptoms, and high healthcare costs. It is important to underline the economic impact of COPD, and to remember that any intervention that will improve the evaluation and management of this disease will have a significant impact in both clinical and economic terms.4
Attempts have been made to blame the high rates of incorrect diagnoses on the limitations of primary care services. However, the results of the AUDIPOC study and its subsequent European extension have shown that the same mistakes are common in specialized care, leading to serious problems in the management of the disease in hospitalized patients, in whom the clinical impact of any error is magnified.5,6 Regional audits have confirmed this situation, revealing deficiencies in diagnoses and associated management in the hospital setting of over 70%.7 Without a correct diagnosis, it is difficult to provide correct treatment. For this reason, before implementing new care plans and even new clinical guidelines, the real situation of COPD management in our environment must be determined.
The strongest scientific information on the quality of COPD management comes from researchers in the United Kingdom, where there are institutions that promote and fund evaluative research on clinical practice.6,8 The main limitation of audit studies, including the AUDIPOC study itself, and case series is that they often start from a selection bias, as centers or doctors who are most interested in the matter are the ones who most often participate. Another limitation of these studies is that they are difficult to repeat periodically, so it is difficult to make a dynamic assessment of the impact of different healthcare measures, such as integrated care projects and even clinical practice guidelines (CPG). In fact, although numerous CPGs currently provide recommendations for managing COPD, scant data are available on their impact on the quality of diagnosis and treatment. In this setting, data collection should be increasingly common, both for care models and comparative evaluation programs, and for resource allocation.9
Big data applications in the health sector and, specifically, the application of new technologies to manage and extract the value of complex data generated in large volumes from electronic medical records (EMRs) are a reality. Most of the information contained in electronic medical records appears in an unstructured form, as free text, but this can be analyzed using big data techniques and artificial intelligence. Savana Manager is a clinical platform that can analyze free text and interpret the content of EMRs, regardless of the management system used in hospitals. In this way, the main indicators of a given clinical process can be evaluated, avoiding selection biases beyond the existence of the registry itself. Savana has developed EHRead technology,10 which can be used to read, process, and order unstructured free text from EMRs. Once this process is completed, the information in the EMR is transformed into structured data, which can be stored, consulted, and analyzed for research purposes in a simple and quick manner.
In view of all this, the objective of this study was to determine in our setting, under clinical practice conditions, the quality of COPD diagnosis and treatment, and the main health indicators, using big data methodology on the Savana Manager 2.1 clinical platform.
Materials and methods
This was an observational, retrospective, non-interventional study using secondary data captured from the free text of EMRs. This study was carried out in Castilla-La Mancha in a catchment area of 2,030,807 inhabitants where the health service (SESCAM) uses the Savana Manager 2.1 tool, which can analyze data collected since 2011.
The study population included all patients over the age of 40 with a diagnosis of COPD. The COPD concept used for the search comprised all the terms listed in Table 1.
Table 1. Inclusion criteria: patients aged ≥ 40 years with a clinical diagnosis of COPD. The selected concept also includes the following terms.
Acute exacerbation of COPD |
Pulmonary emphysema |
Chronic obstructive pulmonary disease |
Severe chronic obstructive pulmonary disease |
End-stage chronic obstructive pulmonary disease |
Chronic obstructive pulmonary disease with acute lower respiratory tract infection |
Chronic obstructive airway disease with asthma |
Emphysema-type chronic obstructive pulmonary disease |
Stable chronic obstructive pulmonary disease |
Savana Manager is a data extraction system based on artificial intelligence (natural language processing [NLP]) and big data techniques. This technology can be used to extract unstructured clinical information (natural language or free text) from EMRs and transform it into reusable and ordered information for research purposes,11 maintaining patient anonymity at all times. Comprehensive clinical contents are also detected and scientifically validated with the application of computational linguistic techniques (SNOMED CT),12 using data from the EMRs of the SESCAM specialized care network (hospitalization, emergency and outpatient consultations) and primary care consultations. As for the study variables, it should be noted that, as a big data-based study, the potential number of variables that can be included is limited to the information contained in the EMRs.
The study period ran from January 1, 2011 to December 31, 2018. Initially, this period was evaluated overall, and then 3 cut-off points were established (2011−2012; 2014−2015; 2017−2018), in order to determine not only the status of the disease in those periods, but its evolution over time. Significant events during this period were the publication of the GOLD recommendations13 and the Spanish COPD (GesEPOC) clinical guidelines.14
Data management and protection
The IT departments of each hospital are responsible for processing and anonymizing the data, which are then uploaded to Savana in such a way that Savana never receives any identifiable data. In addition, during data extraction, an algorithm is used that randomly enters confounding information for each patient while simultaneously retrieving only part of the individual information. The end result of this methodology is the creation of a fully dissociated and anonymous patient database, so that all study reports contain only aggregated data and neither patients nor physicians can be identified. According to the European Data Protection Authority, once an anonymous medical record no longer contains personal data, General Data Protection Regulations no longer apply to it. The study was approved by the Research Ethics Committee of the coordinating site.
Evaluation of data extraction
Using EHRead technology, the free text contained in the EMRs was analyzed and processed using NLP techniques. Medical concepts were detected through the use of computational linguistic techniques and comprehensive clinical contents. These unstructured data were processed as big data.
As this methodological approach is new, we complemented our clinical findings with an evaluation of Savana's performance. The aim of this analysis was to verify the precision of the system in identifying records containing mentions of COPD and related variables. The lack of coded clinical data in Spain meant that an annotated corpus, known as the gold standard, had to be developed to carry out this evaluation. This gold standard consists of a set of clinical documents in which the appearance of entities/concepts related to COPD is manually verified by experts. The corpus used in this evaluation was a set of 560 documents reviewed by 3 experts to ensure the reliability of the manual annotation/revision.
Savana's performance was automatically calculated using the gold standard created by the experts as an evaluation resource. This means that the precision of Savana in identifying records in which the presence of the disease under study and related variables had been detected was measured with respect to the gold standard. The system evaluation metric was calculated in terms of the standard precision (P), recall (R) and F-measure metrics.15
Precision (P) = $\frac{tp}{tp + fp}$. This parameter gives us an indicator of the reliability with which the system retrieves the information.
Recall (R) = $\frac{tp}{tp + fn}$. This parameter gives us an indicator of the amount of information the system retrieves.
F-measure = $\frac{2 \times P \times R}{P + R}$. This parameter gives us an indicator of the overall performance of the system, as the harmonic mean of precision and recall.
In all cases, we defined a true positive (tp) as a correctly identified record, a false positive (fp) as a misidentified record, and a false negative (fn) as a record that should have been identified but was not.
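The three metrics can be illustrated with a minimal sketch. The function names and the counts below are hypothetical and are not taken from the study or from Savana; they only show how the standard definitions of precision, recall, and F-measure combine tp, fp, and fn.

```python
def precision(tp: int, fp: int) -> float:
    """Share of retrieved records that are correct: tp / (tp + fp)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of relevant records that were retrieved: tp / (tp + fn)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f_measure(p: float, r: float) -> float:
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts for illustration only (not study data).
tp, fp, fn = 90, 10, 5

p = precision(tp, fp)
r = recall(tp, fn)
f = f_measure(p, r)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")
```

With these illustrative counts, a system that misses few records but retrieves some wrong ones has a recall higher than its precision, and the F-measure falls between the two.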
Statistical analysis
For the purposes of this study, the statistical approach to the data collected included a descriptive analysis of all the variables evaluated. We used the usual descriptive statistics. Qualitative variables are presented as absolute frequencies and percentages, and quantitative variables as means ± standard deviations. The Student's t-test for independent samples or variance analysis was used for the analysis of the numerical variables. The Chi-squared test was used to measure the association and to compare proportions between qualitative variables. In all cases, differences with a p-value associated with the comparison test of less than 0.05 were considered significant.
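As an illustration of how two proportions can be compared with a Chi-squared test, the following sketch computes the Pearson statistic for a 2×2 table using only the standard library. It is an assumption-laden example: the study does not state which software or which variant of the test was used, no continuity correction is applied here, and the counts are hypothetical.

```python
import math


def chi_squared_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-squared test (no continuity correction) for the table
        [[a, b],
         [c, d]]
    Returns the statistic and the p-value for 1 degree of freedom.
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = numerator / denominator
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts (not study data): event / no event in two groups.
chi2, p_value = chi_squared_2x2(10, 20, 20, 10)
significant = p_value < 0.05  # threshold stated in the text
print(f"chi2={chi2:.3f} p={p_value:.4f} significant={significant}")
```

The closed-form p-value only holds for a 2×2 table; comparisons across three departments, as in the tables of this study, would need the general Chi-squared distribution with more degrees of freedom.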
Results
Overall, 2,173,665 subjects were evaluated, of whom 2,030,807 were registered in the regional health system; the rest correspond mostly to floating populations from neighboring health areas. For the purposes of this study, only 1,219,749 subjects over 40 years of age were included; mean age was 62 years and 47% were men. Data analysis was based on 33,182,804 documents.
During the period 2011−2018, the cumulative number of patients over 40 years of age who had a diagnosis of COPD was 59,369; mean age was 73 years and 78% were men. Only 26,453 (43.5%) had spirometry performed. Disease severity had been classified in 18,172 patients: 4396 mild, 7100 moderate, and 6676 severe. This classification was made at the discretion of the treating physician, but we were unable to identify any element to establish that patients had been classified using standardized criteria. In fact, only 27%, 34%, and 28% of cases, respectively, had obstructive spirometry. Table 2 lists the main associated comorbidities.
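The counts in this paragraph can be cross-checked with simple arithmetic. The sketch below uses only figures reported in the text; the derived percentages (share of adults with a recorded COPD diagnosis and distribution of severity among classified patients) are recomputed here for illustration and are not figures reported by the authors.

```python
# Counts reported in the Results section.
adults_over_40 = 1_219_749
copd_patients = 59_369
severity_counts = {"mild": 4_396, "moderate": 7_100, "severe": 6_676}


def percentage(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100 * part / whole


# The three severity groups should add up to the 18,172 classified patients.
classified = sum(severity_counts.values())

# Derived values (illustrative, not reported in the article).
diagnosed_share = percentage(copd_patients, adults_over_40)
severity_share = {
    level: percentage(count, classified)
    for level, count in severity_counts.items()
}

print(f"classified patients: {classified}")
print(f"adults >40 with a COPD diagnosis: {diagnosed_share:.1f}%")
for level, share in severity_share.items():
    print(f"{level}: {share:.1f}% of classified patients")
```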
Table 2. Most common diseases in COPD patients.
AHT | 67% |
Dyslipidemia | 45% |
Hyperglycemia | 36% |
Heart failure | 28% |
Obesity | 24% |
Atrial fibrillation | 23% |
Ischemic heart disease | 21% |
BPH | 20% |
Sleep apnea syndrome | 17% |
Depression | 15% |
Chronic respiratory failure | 13% |
“Asthma” | 13% |
Chronic renal failure | 12% |
Osteoporosis | 7% |
Hiatus hernia | 6% |
Fig. 1 describes the specialties that most often treated COPD patients (Fig. 1a) and the use of spirometry in the diagnostic process in each of them (Fig. 1b), as well as their evolution at the 3 cut-off points selected during the study period.
As can be seen in Fig. 2, pharmacological treatment is based on the use of bronchodilators and inhaled corticosteroids (ICS). During the follow-up period, there was a marked decline in the use of long-acting beta-2-agonists (LABA) in monotherapy and, to a lesser extent, long-acting anticholinergics (LAMA), and a contrasting increase in dual bronchodilation, occasionally with a triple therapy strategy, combined with a simultaneous ICS. A slight downward trend in ICS/LABA combinations was confirmed, 74% being associated with a LAMA (triple open therapy). The study period does not allow us to determine the impact of triple therapy in a single device. Fig. 3 shows prescription profiles by specialties for the period 2011−2018.
All-cause in-hospital mortality in the COPD population was 5.6% compared to 1% in the general population over 40 years of age. Overall, 35% of patients were hospitalized, with a mean stay of 6.6 days and an in-hospital mortality rate in this group of 10.74% (Table 3). Although there were marked differences in mortality by hospital department, the differences did not reach statistical significance (P = .058), and the populations presented significant differences in mean age and associated comorbidities (Table 4).
Table 3. Healthcare parameters of patients requiring hospitalization for any reason.
Parameter | 2011−2012 | 2014−2015 | 2017−2018 | 2011−2018 | P-value |
---|---|---|---|---|---|
Hospital admission | | | | 35% | |
Age (SD) | 77 (11) | 76 (11) | 76 (12) | 76 (11) | |
Sex (men) | 89% | 87% | 86% | 86% | |
Mean stay, days | 6.6 | 6.6 | 6.6 | 6.6 | |
Mean stay RM, days | 6.8 | 6.4 | 6.6 | 6.6 | |
Mean stay IM, days | 6.8 | 6.8 | 6.7 | 6.8 | |
Mean stay GER, days | 6.7 | 7.5 | 7.5 | 7.4 | |
Re-admission within 72 h | 0.77 | 0.38 | 0.50 | 0.48 | |
Hospital death | 9.51% | 7.72% | 7.37% | 10.74% | .77 |
Hospital death RM | 3.49% | 2.11% | 1.66% | 2.47% | .69 |
Hospital death IM | 8.63% | 9.18% | 8.43% | 11.44% | 1 |
Hospital death GER | 15.23% | 6.45% | 6.23% | 8.21% | .05 |
P-value (death among departments) | .02 | .09 | .10 | .06 | |
GER: geriatrics; IM: internal medicine; RM: respiratory medicine.
Table 4. Age, sex, and most common diseases in patients admitted for COPD. Diseases present with a frequency greater than 10% (2011–2018).
Variable | RM | IM | GER | P-value |
---|---|---|---|---|
Age-years (SD) | 73 (10) | 78 (10) | 87 (5) | |
Sex (men) | 88% | 87% | 83% | |
Respiratory infection | 43% | 52% | 45% | .41 |
Pneumonia | 22% | 17% | 25% | .42 |
Chronic respiratory failure | 20% | 12% | 13% | .26 |
BPH | 16% | 22% | 27% | .20 |
Bronchial hyperresponsiveness | 16% | 14% | 15% | .97 |
Obesity | 16% | 16% | 11% | .55 |
SAHS | 11% | – | – | |
Congestive heart failure | 10% | 22% | 33% | <.001 |
Respiratory acidosis | 10% | – | – | |
Anemia | 10% | 13% | 18% | .30a |
Atrial fibrillation | – | 15% | 19% | .57a |
Heart failure | – | 16% | 14% | .85a |
Ischemic heart disease | – | 13% | 11% | .84a |
Chronic renal failure | – | 11% | 17% | .31a |
Cognitive impairment | – | – | 14% | |
BPH: benign prostate hypertrophy; GER: geriatrics; IM: internal medicine; PC: primary care; RM: respiratory medicine; SAHS: sleep apnea-hypopnea syndrome.
The results of the evaluation of Savana's performance in identifying mentions of COPD and related variables are shown in Table 5. Savana obtained F-measure values of 0.926, 0.895, and 0.912 for COPD, spirometry, and treatments, respectively.
Table 5. Savana performance in terms of precision, recall, and F-measure.
Variable | Precision | Recall | F-measure |
---|---|---|---|
COPD | 0.888 (0.847−0.921) | 0.968 (0.939−0.985) | 0.926 (0.891−0.952) |
Spirometry | 0.944 (0.875−0.982) | 0.850 (0.765−0.914) | 0.895 (0.816−0.946) |
Treatments | 0.917 (0.887−0.942) | 0.907 (0.875−0.932) | 0.912 (0.881−0.937) |
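As a consistency check, the reported F-measure values can be recomputed from the reported precision and recall using the harmonic mean. This is a sketch based on the point estimates in the table; the helper name and the tolerance are assumptions, and the values in parentheses are not reproduced because the method used to obtain them is not described.

```python
# Point estimates from the table: (precision, recall, reported F-measure).
reported = {
    "COPD": (0.888, 0.968, 0.926),
    "Spirometry": (0.944, 0.850, 0.895),
    "Treatments": (0.917, 0.907, 0.912),
}


def f_from_pr(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Absolute gap between the recomputed and the reported F-measure.
gaps = {
    name: abs(f_from_pr(p, r) - f_reported)
    for name, (p, r, f_reported) in reported.items()
}

# Tolerance chosen to absorb rounding of the published values (an assumption).
consistent = all(gap < 0.001 for gap in gaps.values())

for name, gap in gaps.items():
    print(f"{name}: gap = {gap:.5f}")
print("consistent within rounding:", consistent)
```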
This is the first observational, descriptive study carried out in Spain to analyze the situation of COPD using big data methodology, based on data captured from EMRs. The study period was 8 years. The main conclusion of this study, possibly the one that best reflects the actual situation in a given healthcare setting, is the persistence of serious errors in the diagnostic process, little modification of pharmacological treatments in a decade marked by changes in CPGs, and low in-hospital mortality, despite the high number of comorbidities presented by patients and the considerable differences in mortality among the specialties, which is particularly low in respiratory medicine. Our overall conclusion is that CPGs, whether GesEPOC or the GOLD recommendations, have had little impact on patient care. These findings should be taken into account when developing CPGs or care models, because the popularization of this technology makes it feasible to simultaneously implement projects that help improve clinical practice based on continuous monitoring of outcomes.
The linguistic evaluation demonstrated Savana's high yield in the identification of records containing mentions of COPD and its related variables, with F-measure values close to or above 0.90 for all the variables analyzed (Table 5). For this reason, we can conclude that the clinical findings obtained are reliable and robust for the variables evaluated in this study.
Twenty years ago, the IBERPOC study showed that 78.2% of COPD patients had no prior diagnosis of their disease.16 More recently, the EPISCAN I study found that the prevalence of COPD (defined by the GOLD criteria) in the Spanish population between 40 and 80 years of age was 10.2%; this was higher in men (15.1%) than in women (5.6%), and significantly higher in patients aged ≥70 years (22.9%).17 Although in the last 2 decades, the diagnostic problem of COPD has focused on the high rates of underdiagnosis that were recently confirmed in the EPISCAN II study,18,19 attention has shifted in recent years to the problem of overdiagnosis.20–22 Spirometry is crucial for establishing a diagnosis and for classifying the functional severity of this disease. Despite the simplicity and low cost of the procedure, many patients are still diagnosed with COPD solely on the basis of their medical history and physical examination.23,24 Our data confirm that there are serious issues surrounding the diagnosis of COPD, and scant improvements have been noted in the past 8 years, despite the extensive numbers of scientific publications and the publication of the GesEPOC clinical guidelines and the successive GOLD recommendations. The objective of this study was to analyze the quality of the diagnostic process, treatment characteristics, and the impact of COPD in our environment. Our aim in this first analysis was not to confirm whether the diagnosis of COPD is correct or not, but to highlight that if the basic tools are not applied, any success in diagnosis is a matter of chance or intuition, and not the result of a correct care process.
To analyze which specialists treat patients with COPD, Cho et al. recently conducted a cross-sectional, population-based study using the administrative databases of the state of Ontario (Canada). In this population, primary care played a dominant role in the clinical management of COPD. Only 10.7% of patients were seen by respiratory medicine experts compared to 82.3% who were seen by other specialists, including 24.5% by cardiologists. These data underline the lack of specialized care received by patients with COPD compared to other chronic diseases.25 In our setting, the clinical management of COPD rests primarily with primary care and respiratory medicine, while internal medicine and, to a lesser extent, geriatrics, are also significantly involved. Fortunately, recent years show that the role of respiratory medicine is becoming more important, although pulmonologists still see less than 35% of patients. Regardless of which specialist treats COPD, it is essential that minimum requirements are met, especially in diagnosis, using procedures that are not currently available outside the field of respiratory medicine (although even this specialty shows room for improvement). In contrast to the findings of Cho et al., specialties other than primary care, respiratory medicine, internal medicine and geriatrics play a currently negligible role.
During the last decade, there have been significant modifications in CPGs regarding drug treatments. Despite these changes, real-life treatment has changed little, and the mainstays are still bronchodilators and ICS. The increase in dual bronchodilation in recent years is due mainly to changes in treatments that were formerly administered separately. On the other hand, despite messages urging more restricted use of ICS, use of these products has fallen by only 10% in the last 8 years, and they continue to be used by 68% of patients, mostly in triple therapy. These percentages correspond to the different periods analyzed, so they may vary individually within each period, although these changes were negligible and did not affect the observed trend. This means that if a change was noted during 1 of the analyzed periods, both treatments were computed (for this reason the percentages may not add up to 100). At the same time, a more accurate analysis was conducted using 1-year cut-off points, but the situation barely changed, so we chose to evaluate 2-year periods, so that the comparisons were more representative of trend changes between the different periods. Surprisingly, the greatest use of ICS was in primary care, where it can be assumed that, in the absence of additional assessment by other specialists, less severe patients are treated (Fig. 3). These data are consistent with both national and international case series26,27 in which the use of ICS in primary care exceeded 80% of cases.
Clinical experience and data from the literature suggest that hospitalization rates are falling, and patients who are admitted are older, and often have several associated comorbidities,28,29 all in a setting of lower COPD mortality.30,31 Our study confirms these findings. In the cumulative period of 2011−2018, only 35% of patients were admitted to hospital for any cause. In this subgroup of patients with at least 1 admission, all-cause in-hospital mortality was 10.74%, but only 2.47% occurred in respiratory medicine, where admissions were mainly due to respiratory causes. Although there were large differences in mortality among the various specialties, differences in age and associated comorbidities prevented us from establishing a direct relationship between the specialty treating the patient and higher or lower mortality. The AUDIPOC study showed varying inter-hospital mortality but, just as in our series, it was difficult to assess the impact of certain variables, such as comorbidities5 or the organization of the care system itself (death in geriatric centers, etc.) when relating the different mortality figures to the level of care specialization.32
The main limitation of a study of these characteristics could be the lack of documented information. In Castilla-La Mancha, EMRs are used extensively. The implementation of this tool began a decade ago, and has become practically universal in the last 5 years, so this limitation applies to the early years of the study. Moreover, the results in some variables will be conditioned by the level of quality of the clinical reports, which in many cases do not collect all patient information. Since this study was not based on the strict recording of variables, some may not be adequately documented, and consequently could not be analyzed. In this study we have only included information that we can confidently claim to be of high quality and significant clinical relevance. As reading systems and data collection improve, it will be possible in coming years to evaluate other variables that can support quality analyses in other aspects of COPD.
Conclusion
The advancement of new technologies, the accessibility of the Internet, and the possibility of performing mass data analyses (big data) can help us determine the situation of COPD in real-life situations that, due to different biases, cannot always be correctly assessed with other methodologies. This study identifies the main characteristics of an unselected COPD population and the main errors in disease management. This information will help guide effective care strategies and improve the COPD situation in our setting, while offering in all likelihood the opportunity to monitor such measures on a continuous basis.
Conflicts of interest
José Luis Izquierdo has received honoraria for consultancy, projects, and talks from AstraZeneca, Bayer, Boehringer Ingelheim, Chiesi, Glaxo Smith Kline, Grifols, Menarini, Novartis, Orion, Pfizer, Sandoz, and Teva.
Diego Morena reports no conflict of interest.
Yolanda González is a full-time employee of SAVANA.
José Manuel Paredero reports no conflict of interest.
Bernardino Pérez reports no conflict of interest.
Desiré Graziani reports no conflict of interest.
Matilde Gutiérrez reports no conflict of interest.
José Miguel Rodríguez has received honoraria for consultancy, projects, and talks from AstraZeneca, Bayer, Boehringer Ingelheim, Chiesi, Glaxo Smith Kline, FAES, Grifols, Menarini, Novartis, Orion, Pfizer, Roche and Teva.
Please cite this article as: Izquierdo JL, Morena D, González Y, Paredero JM, Pérez B, Graziani D, et al. Manejo clínico de la EPOC en situación de vida real. Análisis a partir de big data. Arch Bronconeumol. 2021;57:94–100.