Large variation in diagnostic procedures and treatment recommendations may hinder the management of obstructive sleep apnea (OSA) and compromise the correct interpretation of multicenter clinical trial results, especially in subjects with non-severe OSA. The aim of this study was to analyze therapeutic decision-making among different sleep physicians in patients with an apnea-hypopnea index (AHI) <40 events/h.
Methods
Six experienced senior sleep specialists from different sleep centers in Spain were asked to make a therapeutic decision (whether to recommend continuous positive airway pressure, CPAP) based on anonymized records of patients with suspected OSA who had previously undergone a sleep study. The clinical data were shown in an online database and included anthropometric features, clinical questionnaires, comorbidities, physical examination and sleep study results. Intra- and inter-observer agreement in decision-making was analyzed with the Fleiss' Kappa statistic (Kappa).
Results
A total of 720 medical decisions were analyzed to assess the agreement between sleep professionals. Overall intra-observer reliability was almost perfect (Kappa=0.83, 95% CI 0.75–0.90, p<0.001). However, overall inter-observer concordance dropped to moderate agreement (Kappa=0.46, 95% CI 0.42–0.51, p<0.001), and it was especially low for AHI <15 events/h.
Conclusions
This study demonstrates good intra-observer concordance in the therapeutic decision-making of different sleep physicians treating patients with mild to moderate OSA. However, inter-observer agreement was considerably worse. These findings underline the importance of developing improved consensus management protocols.
Obstructive sleep apnea (OSA) is one of the most common sleep disorders. It is a chronic condition secondary to complete or partial upper airway obstruction that results in daytime sleepiness and fatigue, which negatively affect patients' quality of life. OSA is also considered an important risk factor for cardiovascular, metabolic and neurological comorbidities.1,2
OSA is diagnosed using different sleep tests, from conventional polysomnography (PSG), which is currently the gold standard diagnostic procedure, to the most simplified single-channel home devices.3 Despite the proven efficacy of simplified devices and their agreement with PSG, the clinical management of OSA can sometimes be hampered by different interpretations of sleep study data. This is critical when dealing with patients with non-severe OSA, in whom the diagnosis and the subsequent therapeutic decision can be especially difficult.
There are several guidelines, consensus statements and research articles with various indications for CPAP treatment in OSA based on AHI severity,4 excessive daytime sleepiness,5,6 or improved driving performance.7 Other recommendations also include hypertension and cardiovascular comorbidities, regardless of OSA symptoms,8 impaired cognition, insomnia or mood disorders.9 In Spain, the criteria for recommending CPAP, as stated in the Spanish Sleep Network guidelines,10 are an AHI between 5 and 30 events/h with significant symptoms or OSA-associated pathologies, or an AHI greater than 30 events/h, with less importance given to symptoms or pathologies.
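As a rough illustration, the Spanish criteria summarized above can be written as a simple rule. This is a minimal sketch and not a clinical tool: the function name `cpap_recommended` and its boolean inputs are hypothetical, an AHI above 30 events/h is simplified to an indication regardless of symptoms, and treating an AHI of exactly 30 as part of the lower band is an assumption, since the text does not specify the boundary.

```python
def cpap_recommended(ahi: float, significant_symptoms: bool,
                     associated_pathology: bool) -> bool:
    """Simplified reading of the criteria described in the text.

    Hypothetical helper for illustration only; boundary handling at
    exactly 30 events/h is an assumption.
    """
    if ahi > 30:
        # Above 30 events/h, symptoms and pathologies carry less weight;
        # here that is simplified to "always indicated".
        return True
    if 5 <= ahi <= 30:
        # Between 5 and 30 events/h, the indication depends on significant
        # symptoms or OSA-associated pathologies.
        return significant_symptoms or associated_pathology
    # Below 5 events/h no indication is derived from these criteria.
    return False
```

Even with a rule this compact, what counts as "significant" symptoms is left to the evaluator, which is one place where physicians can plausibly diverge.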
Therefore, since different professionals may vary in interpreting the results of sleep studies, CPAP indication could be particularly complex when dealing with non-severe patients and different diagnostic methodologies. Clinical decision-making in this group of OSA patients may thus be discordant and affect their management. This is especially relevant because the interpretation of results can be compromised in clinical trials, particularly those involving several centers and/or countries. Given the high number of multicenter clinical trials currently under way, it is worth analyzing the consistency of decision-making on CPAP recommendation among different sleep physicians, especially for patients with AHI <30 events/h, when using PSG or home respiratory polygraphy (RP), as usually occurs in a sleep unit.
The aim of this study was to compare the intra- and inter-observer variability in decision-making on treatment recommendation in patients with AHI <40 events/h among different Spanish sleep physicians.
Methods
To evaluate intra- and inter-observer agreement in therapeutic decision-making, 6 sleep professionals were asked to make an online therapeutic decision on whether or not to recommend CPAP treatment, based on anonymized records from 40 patients displayed in an encrypted online database. These records included anthropometric data, clinical questionnaires, comorbidities, physical examination and the sleep study results, from either PSG or RP. The 6 professionals involved were experienced sleep physicians, aged 36–60 years, working in different Sleep Units of Spanish University Hospitals. All were advised to follow the SEPAR guidelines for therapeutic decision-making.
The study was carried out with the approval of the Hospital's Ethics Committee. All data were obtained from patients recruited from May 2015 to May 2017 in the Sleep Unit of Hospital Clinic (Barcelona, Spain) and included in a clinical trial (trial number NCT02779894 in http://www.clinicaltrials.gov/). Forty patients were randomly selected from those in the study database with AHI <40 events/h. All subjects were aged 18–75 years and, before their sleep test, presented suspicion of OSA (heavy snoring with breathing pauses during the night, non-restful sleep and daytime somnolence or fatigue not explained by other pathologies) and/or refractory hypertension. None of them presented disabling somnolence (according to medical criteria), any unstable disease, previous use of CPAP, uvulopalatopharyngoplasty, or a high-risk occupation.
The sleep studies corresponded to in-hospital PSG, in-hospital RP or home RP. In in-hospital PSG (Grael/Somté PSG, Compumedics Limited 2006, Abbotsford, Victoria, Australia), the following signals were recorded: electroencephalogram (EEG; leads F4-M1, C4-M1 and O2-M1), electrooculogram (EOG), chin and leg electromyogram (EMG), electrocardiogram (ECG), nasal and oronasal flow (cannula and thermistor), respiratory effort, oxygen saturation (SpO2), body position, and snoring (tracheal microphone). All patients were also video monitored. In-hospital RP (Somté PSG, Compumedics Limited 2006, Abbotsford, Victoria, Australia) consisted of recording nasal flow (cannula), snoring, respiratory effort, SpO2 and body position, as well as video monitoring. In home RP (portable type 3 ApneaLink Air, ResMed, Australia), the recorded variables were nasal flow (cannula), snoring, respiratory effort, SpO2, pulse rate, and body position. Sleep scoring was performed using the standardized AASM criteria11: apnea was defined as an absence of flow for more than 10 s, and hypopnea as a discernible reduction in the amplitude of the airflow signal from the pre-event baseline for at least 10 s, associated with an oxygen desaturation ≥3% in both PSG and RP, and also associated with arousal in PSG.
Online procedure
To make the therapeutic decision, all professionals logged in to a specially designed website with an individual username and password. Professionals could see a list of patients identified by an anonymous alphanumeric code. To evaluate intra-individual agreement, each patient was presented 3 independent times, each time with a different code, in random order and without the evaluators' knowledge. Therefore, each professional evaluated 120 cases. To avoid fatigue, each evaluator was allowed to analyze a maximum of 15 cases per day over a 3-month period.
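The blinded repetition scheme described above can be sketched as follows. This is only an illustration under stated assumptions: the function `build_worklist`, the code format (`C` plus five digits) and the seed are hypothetical, and the study does not report how its website actually generated the codes or the random order.

```python
import math
import random

def build_worklist(patient_ids, repeats=3, seed=None):
    """Return a shuffled list of (anonymous_code, patient_id) pairs.

    Each patient appears `repeats` times, every time under a different
    code. The mapping back to patient_id stays hidden from the evaluator.
    """
    rng = random.Random(seed)
    # One entry per presentation of each patient.
    presentations = [pid for pid in patient_ids for _ in range(repeats)]
    rng.shuffle(presentations)
    # Draw distinct numbers so that no two presentations share a code.
    numbers = rng.sample(range(10000, 100000), len(presentations))
    return [(f"C{num}", pid) for num, pid in zip(numbers, presentations)]

# 40 patients shown 3 times each to one evaluator.
worklist = build_worklist(range(1, 41), repeats=3, seed=2017)
cases_per_evaluator = len(worklist)
# With at most 15 cases per day, this is the minimum number of sessions.
min_days = math.ceil(cases_per_evaluator / 15)
# Six evaluators working through the same design.
total_decisions = 6 * cases_per_evaluator
```

Under these assumptions each evaluator receives 120 coded cases, needs at least 8 working days at the 15-case daily cap, and the 6 evaluators together produce the 720 decisions reported in the study.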
On opening each case, the data provided for therapeutic decision-making were displayed. They included anthropometric features (age, gender, body mass index (BMI), and systolic and diastolic blood pressure), clinical questionnaires (Epworth sleepiness scale (ESS), EuroQol-5D and EuroQol-VAS), comorbidities (high blood pressure, cardiac disease, neurological or respiratory diseases, diabetes, dyslipidemia, depression, anxiety or neoplasms), and physical examination data (micrognathia, retrognathia, Friedman tonsil and palate class, nasal obstruction or ORL surgery). To help in analyzing the sleep history of the case, the following data were also displayed: ASDA sleepiness scale, daily hours of sleep, presence of snoring, choking attacks, nocturia, witnessed apneas, morning headache, restless sleep, daytime sleepiness, restless legs syndrome, aggressive behavior during sleep, and muscle weakness associated with intense emotions or sleepwalking. The results of the sleep studies included recorded time, AHI, central apneas, total respiratory events, Cheyne–Stokes breathing, postural predominance, basal and mean SpO2, ODI 3%, and CT90 for all tests. For PSG, sleep efficiency, sleep staging, arousal index, and postural ODI were also included. Based on all the information provided, sleep professionals were asked to choose a therapeutic decision (CPAP/non-CPAP) if an OSA diagnosis was attributed to the case.
Statistical analysis
To assess observer reliability in therapeutic decision-making (CPAP treatment or not) within the same physician and between different physicians (intra- and inter-individual agreement), the Fleiss' Kappa (Kappa) statistic for categorical variables12 was calculated. Data are presented as mean±standard deviation for variables measured on a numerical scale and as percentages for variables measured on a nominal scale.
The percentage of CPAP indications across the three therapeutic decisions made by each evaluator for each patient (Fig. 1) was also analyzed by 2-way ANOVA, taking “time of therapeutic decision” and “sleep physician” as factors.
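For readers unfamiliar with the agreement statistic named above, the standard Fleiss' Kappa formula can be sketched in a few lines. This is an illustrative implementation, not the software used in the study (which the text does not specify): the function name `fleiss_kappa`, the input layout and the example counts are assumptions, and confidence intervals and p-values are not computed here.

```python
def fleiss_kappa(counts):
    """Fleiss' Kappa for a table of rating counts.

    counts[i][j] is the number of ratings that placed subject i in
    category j. Every subject must have the same number of ratings.
    """
    n_subjects = len(counts)
    n_ratings = sum(counts[0])
    if n_ratings < 2 or any(sum(row) != n_ratings for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Overall proportion of ratings falling in each category.
    p_cat = [sum(row[j] for row in counts) / (n_subjects * n_ratings)
             for j in range(n_categories)]
    # Observed agreement per subject: share of rater pairs that agree.
    p_subj = [(sum(c * c for c in row) - n_ratings) / (n_ratings * (n_ratings - 1))
              for row in counts]

    p_observed = sum(p_subj) / n_subjects
    p_expected = sum(p * p for p in p_cat)
    if p_expected == 1.0:
        # All ratings in a single category: Kappa is undefined.
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)

# Hypothetical example: 5 cases, 6 raters, columns = [CPAP, non-CPAP].
example_counts = [[6, 0], [5, 1], [3, 3], [0, 6], [1, 5]]
example_kappa = fleiss_kappa(example_counts)
```

For an inter-observer analysis the rows would be patient cases and each row would sum to the number of physicians. For an intra-observer analysis one plausible layout is one row per case with the three repeated decisions of a single physician, but the study does not describe how its tables were arranged.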
Results
Table 1 summarizes the demographics and comorbidities of the patients whose sleep studies were analyzed. Of the 40 sleep studies, 57.5% were from male patients; mean age was 51.07±11.70 years and mean BMI was 28.13±8.04 kg/m2. Mean AHI was 18.42±11.32 events/h (range 0–40). Concerning daytime somnolence, patients presented a mean ESS of 8.02±4.56 (range 0–22). Quality of life questionnaires showed a mean EuroQol-5D of 0.84±0.21 and a mean EuroQol-VAS of 71.37±19.18, ranging 0.17–1 and 10–100, respectively. Therefore, the sleep studies analyzed corresponded to a population of patients with wide ranges of sleepiness and quality of life.
Table 1. General patient characteristics (n=40).
Male gender | 23 (57.5) |
Mean age (years) | 51.07±11.70 |
Neck circumference (cm) | 38.87±3.79 |
BMI (kg/m2) | 28.13±8.04 |
Nasal obstruction | 15 (37.5) |
ORL surgery | 7 (17.5) |
Smokers | 10 (25.0) |
Alcohol intake | 21 (52.5) |
Comorbidities | |
Hypertension | 10 (25.0) |
Diabetes mellitus | 7 (17.5) |
Dyslipidemia | 23 (57.5) |
Cardiovascular disease | 3 (7.5) |
Neurological disease | 6 (15.0) |
Respiratory disease | 7 (17.5) |
Depression | 9 (22.5) |
Anxiety | 8 (20.0) |
Cancer | 4 (10.0) |
AHI (events/h) | 18.42±11.32 |
CT90 (%) | 5.05±8.43 |
ODI3% | 18.90±14.27 |
EuroQol-5D | 0.84±0.21 |
EuroQol-VAS | 71.37±19.18 |
ESS | 8.02±4.56 |
Data are expressed as mean±SD or number of patients (%). BMI: body mass index. AHI: apnea-hypopnea index. ODI: oxygen desaturation index. QoL: quality of life. VAS: visual analog scale. ESS: Epworth sleepiness scale.
As every actual sleep study was blindly analyzed 3 times by the 6 physicians, each case was analyzed 18 times, and a total of 720 medical decisions were made for assessing the concordance among health professionals. Table 2 shows the diagnosis and medical decisions chosen by all the health professionals that participated in the study.
Table 2. Online diagnosis and CPAP indication.
Total medical decisions | 720 |
Diagnosis | |
OSA | 70.9% |
Snoring | 21.5% |
Other* | 7.6% |
CPAP indication | 36.0% |
Fleiss' Kappa agreement for intra-observer decision-making is shown in Table 3. Overall intra-observer reliability of sleep professionals was good, showing almost perfect agreement (Kappa=0.83, 95% CI: 0.75–0.90, p<0.001). Concordance was also assessed in two AHI categories. For AHI <15 events/h, sleep physicians again showed almost perfect agreement (Kappa=0.81, 95% CI: 0.70–0.93, p<0.001), while for AHI >15 events/h the agreement was substantial (Kappa=0.78, 95% CI: 0.68–0.87, p<0.001).
As indicated in Table 4, Fleiss' Kappa clearly decreased when assessing inter-observer agreement. Considering all AHI values, inter-observer reliability was moderate (Kappa=0.46, 95% CI: 0.42–0.51, p<0.001). The worst concordance was found for AHI <15 events/h, where agreement between sleep physicians could not be demonstrated (Kappa=0.06, 95% CI: −0.01 to 0.14, p=0.095). For AHI >15 events/h, inter-observer agreement was somewhat higher and was found to be fair (Kappa=0.37, 95% CI: 0.31–0.43, p<0.001).
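The verbal labels attached to the Kappa values above (almost perfect, substantial, moderate, fair) match the widely used Landis and Koch scale. The text does not name the scale, so the mapping below is an assumption offered only as a reading aid; the function name `agreement_label` is hypothetical.

```python
def agreement_label(kappa: float) -> str:
    """Map a Kappa value to a Landis and Koch style label.

    Assumed scale: <0 poor, 0-0.20 slight, 0.21-0.40 fair,
    0.41-0.60 moderate, 0.61-0.80 substantial, 0.81-1.00 almost perfect.
    """
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"

# Kappa values reported in the text for intra- and inter-observer agreement.
reported = {"intra, all AHI": 0.83, "intra, AHI<15": 0.81, "intra, AHI>15": 0.78,
            "inter, all AHI": 0.46, "inter, AHI<15": 0.06, "inter, AHI>15": 0.37}
labels = {name: agreement_label(k) for name, k in reported.items()}
```

On this scale the inter-observer value of 0.06 for AHI <15 events/h would be labeled "slight"; the text instead describes it as agreement that could not be demonstrated, because its confidence interval includes zero.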
Fig. 1 shows the percentage of CPAP indications across the 3 therapeutic decisions made by each evaluator for each patient. The results of the 2-way ANOVA were consistent with the intra- and inter-observer agreement analysis: whereas the time of analysis factor was not significant (p=0.346), a significant effect of physician was observed (p<0.001).
Discussion
This study analyzed intra- and inter-observer agreement in therapeutic decision-making among Spanish sleep physicians treating patients with AHI <40 events/h. The results, obtained from 720 evaluations, demonstrated good intra-observer concordance. However, inter-observer agreement was lower: overall concordance was moderate, and for less severe patients (AHI <15 events/h) it worsened considerably.
Despite the existence of a number of clinical guidelines and consensus statements for the management of OSA, these results show a level of concordance below what is desirable. Most available reports have not analyzed the concordance of therapeutic decision-making between different sleep physicians, but rather between different diagnostic procedures, usually comparing simplified devices with the gold standard, in-hospital PSG. Indeed, Masa et al.13 analyzed the agreement in therapeutic decision-making between home RP and in-hospital PSG among 348 patients. They reported that home RP was adequate for high AHI but insufficient for mild or moderate AHI, and concluded that this diagnostic method was effective only in patients with a high pretest probability of OSA. Similar results were obtained in a clinical study carried out in a pediatric population using in-lab RP versus PSG,14 whose authors concluded that clinical decision-making in children with mild and moderate OSA may be difficult. In an adult population, Guerrero et al.15 went a step further and analyzed the agreement in therapeutic decision-making after three consecutive nights of home RP versus PSG in patients with a mild to moderate pretest probability of OSA or with associated comorbidities that could mask OSA symptoms. The authors also assessed the concordance between different specialists, comparing the therapeutic decisions made by sleep physicians and respiratory physicians. They showed that this diagnostic method is useful for managing patients without a high pretest probability of OSA, or with comorbidities, only when evaluated by a qualified sleep specialist.
It is important to bear in mind that these and other clinical studies have also shown that these diagnostic approaches, in addition to effectively enabling correct decision-making when prescribing therapy for OSA in certain patient populations, also reduced costs.16 A different study17 assessed agreement in both diagnosis and therapeutic decision between personnel from sleep reference centers using PSG and personnel from non-reference centers using a simplified device. The authors found a substantial level of concordance between the different professionals, but each group of professionals employed a different diagnostic method.
Our study is unique since, as far as we know, no published studies have analyzed CPAP therapeutic decision-making in OSA among sleep specialists who all assess the same patient cases, obtained with different real-life diagnostic methods. In fact, the available data focus on comparing how respiratory events or sleep stages are scored.18–21 The causes of the lack of concordance we found are not well known. However, we could speculate that, although evaluators were advised to follow the SEPAR guidelines, each of them may also have applied their own criteria by taking into account other circumstances (gender, quality of life, different pathologies, or only sleepiness). In addition, the lack of a face-to-face clinical interview could have conditioned the therapeutic decision proposed by the 6 professionals involved in the study. Moreover, it is noteworthy that current consensus documents are general and have little impact on defining the management of mild/moderate cases. In order to simplify the study and to focus on the most relevant therapeutic decision (CPAP prescription), other aspects common in clinical practice, such as the decision to refer patients to other specialists (e.g. otorhinolaryngologist or neurologist), were not analyzed.
By reporting these striking and worrying results, especially given that our 6 sleep physicians had solid experience in the field, we intend to draw the attention of the sleep community. This relatively poor agreement between sleep professionals has several possible consequences. On the one hand, OSA management could differ, especially for patients with low AHI. On the other hand, the interpretation of the many multicenter studies currently ongoing may be hampered by the lack of agreement we report here. Studies of agreement between specialist physicians in decision-making in other medical specialties have also revealed the need to seek strategies to improve inter-observer reliability.22–25 Thus, it seems advisable to revise the clinical guidelines for the management of OSA, in particular the recommendations for managing OSA patients with low AHI.
Funding
Spanish Ministry of Economy and Competitiveness (PI14/00416 and PI17/01068).
Conflict of interest
The authors declare no conflict of interest.
The authors would like to thank Mr A. Gabarrús for his statistical assistance, Mr R. Pereira for implementing the online database, and Ms G. Guerrero for her participation in the preparation of the patients' database.