Early diagnosis of lung cancer (LC) is crucial to improve survival rates. Radiomics models hold promise for enhancing LC diagnosis. This study assesses the impact of integrating a clinical and a radiomic model based on deep learning to predict the malignancy of pulmonary nodules (PN).
Methodology
Prospective cross-sectional study of 97 PNs from 93 patients. Clinical data included epidemiological risk factors and pulmonary function tests. The region of interest of each chest CT containing the PN was analysed. The radiomic model employed a pre-trained convolutional network to extract visual features. From these features, 500 with a positive standard deviation were chosen as inputs for an optimised neural network. The clinical model was estimated by a logistic regression model using clinical data. The malignancy probability from the clinical model was used as the best estimate of the pre-test probability of disease to update the malignancy probability of the radiomic model using a nomogram for Bayes’ theorem.
Results
The radiomic model had a positive predictive value (PPV) of 86%, an accuracy of 79% and an AUC of 0.67. The clinical model identified DLCO, obstruction index and smoking status as the most consistent clinical predictors associated with the outcome. Integrating the clinical features into the deep-learning radiomic model achieved a PPV of 94%, an accuracy of 76% and an AUC of 0.80.
Conclusions
Incorporating clinical data into a deep-learning radiomic model improved PN malignancy assessment, boosting predictive performance. This study supports the potential of combined image-based and clinical features to improve LC diagnosis.
Lung cancer (LC) remains a major health problem, causing the highest mortality in Europe and worldwide.1 This is largely because more than 70% of cases are diagnosed at an advanced stage, when survival is low; early diagnosis is therefore key, because early-stage treatment improves LC prognosis.2 Recently, a shift has been occurring due to the increasingly widespread use of computed tomography across various settings, which has led to a rise in the number of incidentally detected pulmonary nodules (PN). The shift is also attributed to the implementation of lung cancer screening (LCS). In both cases, the outcome is early-stage diagnosis. Accordingly, the 2022 American Cancer Society annual report shows a decline in the incidence of advanced-stage cancer, with a simultaneous annual increase of 4.5% in localised-stage disease, primarily linked to the initiation of LCS in the US a decade ago.3
Despite the known advantages of LCS in reducing mortality,4–6 not all PN detected are LC. In a systematic review, 21% of trial screens yielded a false positive result (range 1–42%) in baseline LDCT,6 requiring follow-up images or even biopsy. These false positives can harm patients and generate high costs for health services.7–9 The same problem arises with the growing number of incidentally detected PN, many of which are benign yet require close follow-up, similar to the approach taken with PN identified through LCS.10–13 In this context, whether the PN is detected incidentally or through LCS, it is crucial to identify high-risk cases and rule out those with low risk.14,15
Fortunately, applying artificial intelligence (AI) techniques to radiomics might make it possible to discriminate between benign and malignant PN.16 Radiomics is an emerging area based on the high-throughput mining of quantitative features from medical images, which allows these data to be applied in clinical decision-making to improve diagnostics and prognostics. It is mainly applied to cancer research.17,18 Image analysis with a structured step workflow19 is used to extract complementary features from multiple views (the shape, intensity and texture of features, among other things), which encapsulate different aspects of the lesion. Machine learning is a subfield of AI that encompasses the approaches that allow computers to learn from various types of data. Deep learning, in turn, is a part of machine learning based on multi-layered artificial neural networks that is used to solve highly complex problems.20,21 It has been applied in radiomics to model such multi-view features and predict clinical outcomes.
The usefulness of radiomics in the early diagnosis of LC is promising.22,23 Although many of these studies yield favourable outcomes, most are based solely on imaging data. Incorporating clinical data related to LC can enhance these findings by improving their application in daily clinical practice, yet there are still few studies with this approach. Among these, Zhang et al., using retrospective data, developed a radiomic and deep learning model to predict the malignancy of PN, achieving an AUC of 0.819 (95% CI: 0.76–0.88).24 Employing a multi-omics approach could represent a significant advancement in applying AI models to the clinical prognosis of LC patients.25
The Radiolung project aims to design an algorithm based on a radiomic signature that, associated with clinical data, can accurately discriminate between LC and benign tumours.
Methods
Design and recruitment
This was a prospective, cross-sectional, comparative, and experimental study investigating the radiomic signatures of malignant and benign resected PNs, along with clinical risk factors. Recruitment took place from December 2019 to September 2023 at a tertiary care hospital. The inclusion criteria were patients aged 35–85 years with a clearly identifiable PN detected incidentally by chest CT or in LCS that qualified for surgery according to a multidisciplinary tumour board. The exclusion criteria were a slice thickness greater than 2.5 mm in the chest CT, a PN larger than 3 cm in diameter, or lung metastasis. The patients underwent lobectomy, segmentectomy, or atypical resection according to thoracic surgery criteria.
In each case, clinical and demographic data were collected. Pre-operative results of pulmonary function tests were obtained. Pre-surgery chest CT images, PN characteristics, and the pathological results of the lung tissue obtained during surgery were also collected.
This study was performed in accordance with the principles of the Declaration of Helsinki. The research protocol was approved by the regional ethics committee (reference PI-19-169). All the patients gave their written informed consent.
Image data acquisition
Patients underwent multislice chest CT scans using regular radiation or low-dose radiation according to the protocols. Most of these scans were performed without intravenous contrast. Images were acquired with a GE Revolution scanner (General Electric Healthcare, Milwaukee, WI, USA), a SOMATOM Drive (Siemens Healthineers AG, Forchheim, Germany) and a Philips Incisive scanner (Philips Healthcare, Best, the Netherlands) equipped with 128mm×0.625mm, 128mm×0.6mm and 64mm×0.64mm detector collimation, respectively. They all used automatic tube voltage and automatic tube current modulation. Chest CT image reconstructions were performed on a 512×512 matrix using a high spatial frequency algorithm and thin slice thickness applying the lung window setting (WW: 1600 and WL: −600) for the lung series. All chest CTs were interpreted by thoracic radiologists with over 10 years of experience.
The images were extracted from the picture archiving and communication system (PACS) in the Digital Imaging and Communications in Medicine (DICOM) format and were anonymized to guarantee patient confidentiality. Subsequently, the anonymized DICOMs were uploaded to a website hosted at the Autonomous University of Barcelona (UAB) for processing by the Computer Vision Center (CVC) to construct the radiomic model.
Data analysis
The study analysis followed a three-step strategy. In the first step, a radiomic prediction model was fitted to estimate a PN's malignancy probability based on the chest CT image. In the second step, a clinical prediction model was fitted to estimate a patient's malignancy probability based on their clinical profile. In the third step, the malignancy probability predicted by the clinical model was used as the best estimate of the pre-test probability of disease to update the radiomic model's nodule malignancy probability using a nomogram for Bayes’ theorem.26
Deep radiomic model
An AI system designed to diagnose PN based on radiomic analysis of chest CT scans has the three main steps sketched in Fig. 1 (pipeline). These steps have the following goals:
1. Nodule detection. The first step is to identify the position and volumetric region (volume of interest, VOI) in the CT scan that contains the lesion of interest.
2. Nodule representation. Features are extracted by computing or using intensity values from the volume (3D) or slices (2D) of the VOI, thereby characterising the visual appearance of nodules. These features define a crucial representation space for the malignancy characterisation of the nodules. The representation space of nodules is given by visual features describing the content of textural volumes extracted from the intensity VOI. The textural volumes are given by 3D GLCM (Grey Level Co-occurrence Matrix) textural descriptors derived from co-occurrence matrices.27 The visual features describing the content of textural volumes are extracted using a pre-trained convolutional neural network and concatenated for each slice. Finally, the most discriminant features are used as the input to a fully connected neural network for the classification of slices as benign or malignant.
3. Nodule diagnosis. In this step, a fully connected neural network is trained to determine the values of the nodule representation space that best discriminate malignancy. Several methods are used throughout the nodule representation and diagnosis steps, and their hyperparameters can be optimised.
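To make the textural descriptors in the representation step more concrete, the following is a minimal sketch of a grey level co-occurrence matrix and one descriptor derived from it (contrast). It is a deliberate simplification and not the pipeline used in the study: it works on a single 2D patch with one pixel offset, whereas the study describes 3D GLCM textural volumes. The function names, the 4-level toy patch and the choice of contrast as the descriptor are assumptions made for illustration.

```python
from collections import Counter


def glcm(image, dx=1, dy=0, levels=4):
    """Normalised co-occurrence matrix of grey levels at pixel offset (dx, dy)."""
    rows, cols = len(image), len(image[0])
    counts = Counter()
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[(image[r][c], image[r2][c2])] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("offset leaves no valid pixel pairs")
    # Convert pair counts to joint probabilities p(i, j)
    return [[counts[(i, j)] / total for j in range(levels)] for i in range(levels)]


def contrast(p):
    """Contrast descriptor: sum over (i, j) of (i - j)^2 * p(i, j)."""
    n = len(p)
    return sum((i - j) ** 2 * p[i][j] for i in range(n) for j in range(n))


# Hypothetical 4x4 patch already quantised to 4 grey levels (0..3)
patch = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(patch)            # horizontal neighbour, one pixel to the right
print(round(contrast(p), 4))
```

In a textural volume, a descriptor of this kind would be computed in a local window around every voxel, producing a new volume that can then be passed to the pre-trained network.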
The probabilities of the deep radiomic model were calculated using a nodule k-fold (K=10) validation scheme in order to mitigate overfitting of the deep learning approach.
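A nodule-level k-fold scheme can be sketched as follows. The idea, as we read it, is that all slices from a given nodule are assigned to the same fold, so that no nodule contributes to both training and testing within a split. The function name `nodule_kfold`, the random assignment of nodules to folds and the toy identifiers are assumptions for illustration; the study's actual partitioning is described in its supplementary material.

```python
import random


def nodule_kfold(slice_nodule_ids, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs where every slice of a nodule shares a fold."""
    nodules = sorted(set(slice_nodule_ids))
    random.Random(seed).shuffle(nodules)
    # Round-robin assignment of shuffled nodules to folds
    fold_of = {nodule: i % k for i, nodule in enumerate(nodules)}
    for fold in range(k):
        test = [i for i, n in enumerate(slice_nodule_ids) if fold_of[n] == fold]
        train = [i for i, n in enumerate(slice_nodule_ids) if fold_of[n] != fold]
        yield train, test


# Hypothetical data: 30 nodules with 3 slices each
slice_nodule_ids = [nodule for nodule in range(30) for _ in range(3)]
for train_idx, test_idx in nodule_kfold(slice_nodule_ids, k=10):
    held_out = {slice_nodule_ids[i] for i in test_idx}
    print(len(train_idx), len(test_idx), sorted(held_out))
```

Each slice then receives its out-of-fold probability from the model trained on the other nine folds, which is what limits optimistic bias in the reported performance.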
For further implementation details of the deep radiomic model, see the supplementary material.
Clinical model
A descriptive analysis of potential diagnostic factors for PN malignancy was performed. The set of diagnostic factors included age, sex, educational level, body mass index (BMI), smoking status, living in an area with air pollution, family history of cancer, personal history of cancer, chronic obstructive pulmonary disease (COPD), and pulmonary function parameters: forced vital capacity (FVC), forced expiratory volume in one second (FEV1), diffusing capacity for carbon monoxide (DLCO) and FEV1/FVC or obstruction index.
Variables were selected according to the Akaike information criterion (AIC): the initial dataset was resampled with replacement 2000 times (bootstrap),28 and a logistic regression model was fitted in each sample. Malignancy diagnosis (yes/no) was used as the outcome. Variables that were retained in more than 70% of the models were candidates for the final model. Non-linearity in the relationship between age and the log odds of the outcome was assessed, with no relevant findings. The final set of variables included in the model was then approved by a pulmonologist with more than 10 years of experience. Internal validation of the resulting model was performed on the whole cohort and was based on discrimination, calibration, and bootstrap validation.29
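The bootstrap selection procedure can be sketched as a loop that resamples the cohort, runs a variable selector in each resample and records how often each variable is retained. In the sketch below, `select_variables` is a placeholder for the AIC-based logistic regression selection, which is not reimplemented here; `toy_selector`, its cutoff and the toy rows are hypothetical and serve only to make the example runnable.

```python
import random
from collections import Counter
from statistics import mean


def inclusion_frequency(rows, select_variables, n_boot=2000, seed=0):
    """Fraction of bootstrap resamples in which each variable is selected."""
    rng = random.Random(seed)
    counts = Counter()
    n = len(rows)
    for _ in range(n_boot):
        sample = [rows[rng.randrange(n)] for _ in range(n)]  # with replacement
        counts.update(set(select_variables(sample)))
    return {var: counts[var] / n_boot for var in counts}


def retained(freqs, threshold=0.70):
    """Variables selected in more than `threshold` of the resamples."""
    return sorted(var for var, f in freqs.items() if f > threshold)


def toy_selector(sample, cutoff=5.0):
    """Hypothetical stand-in for AIC-based selection: keeps a variable when
    its group means differ by more than `cutoff`."""
    malignant = [r for r in sample if r["malignant"] == 1]
    benign = [r for r in sample if r["malignant"] == 0]
    if not malignant or not benign:
        return []
    chosen = []
    for var in ("dlco", "noise"):
        diff = mean(r[var] for r in malignant) - mean(r[var] for r in benign)
        if abs(diff) > cutoff:
            chosen.append(var)
    return chosen


# Hypothetical toy cohort: DLCO differs between groups, "noise" does not
rows = [{"malignant": 1, "dlco": 70 + i % 5, "noise": i % 2} for i in range(10)]
rows += [{"malignant": 0, "dlco": 85 + i % 5, "noise": i % 2} for i in range(10)]
freqs = inclusion_frequency(rows, toy_selector, n_boot=200)
print(retained(freqs))
```

Keeping the selector as a callable means the same loop could wrap a real stepwise AIC routine from a statistics package without changing the resampling logic.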
Discrimination was assessed by estimating the area under the receiver operating characteristic (ROC) curve (AUC). Calibration was assessed by the Brier score and by graphically comparing the observed versus expected probabilities of malignancy diagnoses by deciles of predicted risk. Due to the imbalance in the malignancy distribution, a precision-recall curve was estimated and used to complement the model performance analysis. Bootstrap validation was performed to account for model overfitting and to correct the model performance on the development data for optimism. Due to a lack of data, external validation was not performed.
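For reference, both summary measures can be computed directly from the observed outcomes and predicted probabilities. The sketch below uses the standard definitions: the Brier score as the mean squared difference between prediction and outcome, and the AUC as the probability that a randomly chosen malignant case receives a higher prediction than a randomly chosen benign one (ties counted as one half). The five toy cases are hypothetical.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (0/1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true, y_prob):
    """AUC via pairwise comparison of positive and negative cases."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute the AUC")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 3 malignant (1) and 2 benign (0) nodules
y_true = [1, 1, 1, 0, 0]
y_prob = [0.9, 0.8, 0.4, 0.5, 0.2]
print(round(roc_auc(y_true, y_prob), 3), round(brier_score(y_true, y_prob), 3))
```

A lower Brier score indicates better calibrated and sharper predictions, while an AUC of 0.5 corresponds to no discrimination.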
Integrative model – Bayes update
Bayes’ theorem underlies the Fagan nomogram.26 This method allowed us to update the probability that a patient had the condition of interest, given the probability that the subject had the condition before the test was performed (pre-test probability) and the likelihood ratio of the test. In our study, the probability from the clinical model was used as the pre-test probability, and the likelihood ratio of the test was based on the deep radiomic model. The resulting probability, obtained with the Fagan nomogram, estimates the probability that the PN is malignant based on the patient's clinical probability and the deep radiomic model test result. The update operates on odds rather than on probabilities: given a positive result in the radiomic model, the final odds of malignancy are the patient's clinical odds of malignancy multiplied by the positive likelihood ratio of the deep radiomic model; given a negative result, they are the clinical odds multiplied by the negative likelihood ratio. The final odds are then converted back to a probability.
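The calculation that the Fagan nomogram performs graphically is Bayes' theorem in odds form: post-test odds = pre-test odds × likelihood ratio, with odds = p / (1 − p). A minimal sketch follows. The function names are ours, the clinical probability of 0.80 is a hypothetical patient, and the use of a single sensitivity and specificity pair (here the values reported for the radiomic model) to derive the likelihood ratios is an assumption about how the update could be implemented.

```python
def likelihood_ratios(sensitivity, specificity):
    """Positive and negative likelihood ratios of a dichotomous test."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


def bayes_update(pretest_prob, likelihood_ratio):
    """Post-test probability from a pre-test probability and a likelihood ratio."""
    if not 0 < pretest_prob < 1:
        raise ValueError("pretest_prob must lie strictly between 0 and 1")
    pretest_odds = pretest_prob / (1 - pretest_prob)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


def integrative_probability(clinical_prob, radiomic_positive, sensitivity, specificity):
    """Update the clinical probability with the radiomic test result."""
    lr_pos, lr_neg = likelihood_ratios(sensitivity, specificity)
    return bayes_update(clinical_prob, lr_pos if radiomic_positive else lr_neg)


# Hypothetical patient with a clinical (pre-test) malignancy probability of 0.80
p_if_positive = integrative_probability(0.80, True, sensitivity=0.87, specificity=0.52)
p_if_negative = integrative_probability(0.80, False, sensitivity=0.87, specificity=0.52)
print(round(p_if_positive, 3), round(p_if_negative, 3))
```

Working in odds keeps the result between 0 and 1 for any likelihood ratio, which multiplying a probability directly by a likelihood ratio would not guarantee.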
All analyses were conducted using Python and R software version 4.1.0.30
Results
Demographic and clinical data
The demographic and clinical characteristics of the patients are presented in Table 1. The mean diameter of the PNs was 17.98 mm (18.60 mm in malignant, 15.76 mm in benign) with a median of 18.00 mm. Seventy-three were malignant, and 20 were benign. In terms of the histological type of malignant PN, 57 were adenocarcinoma, and 15 were squamous cell carcinoma. Among benign PN, the majority corresponded to fibrosis/inflammation processes (80%), and a smaller proportion was attributed to infectious causes such as aspergillosis and tuberculosis. In both benign and malignant nodules, over 50% were located in the upper lobes. The flowchart of patients and PN is shown in Fig. 2.
Table 1. Demographic and clinical characteristics of the patients.

| Characteristic | Malignant (N=73) | Benign (N=20) | Total (N=93) |
|---|---|---|---|
| Age, mean (SD) | 69.22 (8.5) | 65.91 (10.6) | 68.50 (9.0) |
| Sex, n (%) | | | |
| Woman | 23 (31.5%) | 10 (50.0%) | 33 (35.0%) |
| BMI, mean (SD) | 27.41 (3.8) | 27.18 (6.6) | 27.36 (4.5) |
| Education, n (%) | | | |
| College studies | 4 (5.5%) | 1 (5.0%) | 5 (5.4%) |
| Post-high school training | 31 (42.5%) | 12 (60.0%) | 43 (46.2%) |
| High school or less | 38 (52.0%) | 7 (35.0%) | 45 (48.4%) |
| Air pollution, n (%) | | | |
| Yes | 17 (23.3%) | 3 (15.0%) | 20 (22.0%) |
| Smoking, n (%) | | | |
| Never | 8 (11.0%) | 4 (20.0%) | 12 (12.9%) |
| Current | 28 (38.4%) | 8 (40.0%) | 36 (38.7%) |
| Former | 37 (50.6%) | 8 (40.0%) | 45 (48.4%) |
| Pack-year index, mean (SD) | 39.1 (26.2) | 27.6 (23.6) | 42.1 (23.4) |
| Family history of cancer, n (%) | | | |
| Yes | 29 (39.7%) | 8 (40.0%) | 37 (39.8%) |
| Yes, lung cancer | 13 (44.8%) | 2 (25.0%) | 15 (40.5%) |
| Yes, other cancers | 16 (55.2%) | 6 (75.0%) | 22 (59.5%) |
| Personal history of cancer, n (%) | | | |
| Yes | 21 (28.8%) | 5 (25.0%) | 26 (28.0%) |
| COPD, n (%) | | | |
| Yes | 24 (32.9%) | 5 (25.0%) | 29 (31.2%) |
| FVC (%), mean (SD) | 92.2 (18.4) | 95.0 (17.0) | 92.8 (18.0) |
| FEV1 (%), mean (SD) | 84.4 (19.2) | 88.5 (21.5) | 85.3 (19.7) |
| Obstruction index, FEV1/FVC (%), mean (SD) | 72.7 (10.1) | 77.5 (15.5) | 73.7 (11.6) |
| DLCO (%), mean (SD) | 76.6 (16.3) | 83.2 (27.6) | 77.9 (19.1) |
The deep radiomic model included data from 90 PN that met all the technical requirements for the analysis, 69 (77%) of which were malignant. The sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were 87% (95% CI: 0.77–0.94), 52% (95% CI: 0.30–0.74), 86% (95% CI: 0.75–0.93) and 55% (95% CI: 0.32–0.77), respectively, with a Brier score of 0.16 (95% CI: 0.1–0.23), an accuracy of 79% (95% CI: 0.59–0.79) and an AUC of 0.67 (95% CI: 0.51–0.83) (Fig. 3).
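As a reading aid, the five classification metrics follow directly from a 2×2 confusion matrix. The counts used below (60 true positives, 10 false positives, 9 false negatives, 11 true negatives) are not reported in the text; they are back-calculated by us as one set of counts for 90 PN with 69 malignant that reproduces the rounded percentages above, and should be treated as an assumption.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard diagnostic accuracy measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Assumed counts, back-calculated from the reported percentages (not from the paper)
metrics = diagnostic_metrics(tp=60, fp=10, fn=9, tn=11)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Because PPV and NPV depend on the proportion of malignant cases, values obtained in a surgical cohort with a high prevalence of malignancy would not transfer directly to a screening population.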
Clinical model
The model was fitted using the information from the 90 patients with complete data (Fig. 4), 72 (80%) of whom had a malignant diagnosis. Based on the AIC and considering only variables selected in more than 50% of the models estimated in the bootstrap samples, the optimal clinical model is displayed in Table 2.
In this clinical model, the sensitivity, specificity, PPV, and NPV were 86% (95% CI: 0.76–0.93), 61% (95% CI: 0.36–0.83), 90% (95% CI: 0.80–0.96) and 52% (95% CI: 0.30–0.74), respectively, with a Brier score of 0.14 (95% CI: 0.10–0.19), an accuracy of 81% (95% CI: 0.71–0.89) and an AUC of 0.71 (95% CI: 0.55–0.87).
Integrative model
The integrative model includes data from 89 PN that met all requirements for the radiomic and clinical models, 69 (78%) of which presented a malignant diagnosis. Integrating selected clinical features into the radiomic deep learning model yielded a sensitivity, specificity, PPV and NPV of 74% (95% CI: 0.62–0.84), 85% (95% CI: 0.62–0.97), 94% (95% CI: 0.85–0.99) and 49% (95% CI: 0.31–0.66), respectively. The Brier score was 0.14 (95% CI: 0.09–0.19), the accuracy was 76% (95% CI: 0.66–0.85), and the AUC was 0.80 (95% CI: 0.67–0.92) (Fig. 5). Fig. 6 plots the final predicted PN malignancy by risk decile against the observed incidence of PN malignancy in each decile. The convergence of the two curves indicates good model calibration. The precision-recall curve (supplemental fig. 5) rises sharply and plateaus at high precision, with a precision-recall AUC of 0.91. This suggests that the estimated model can achieve high precision (positive predictive value) without sacrificing recall (sensitivity).
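The decile-based calibration comparison can be sketched as follows: cases are sorted by predicted probability, split into ten groups of near-equal size, and the mean predicted probability in each group is set against the observed proportion of malignant cases. The function name and the toy data are hypothetical, and the exact binning used for Fig. 6 is not specified in the text.

```python
def calibration_by_decile(y_true, y_prob, n_bins=10):
    """Return (mean predicted, observed proportion) for each risk group."""
    order = sorted(range(len(y_prob)), key=lambda i: y_prob[i])
    n = len(order)
    table = []
    for b in range(n_bins):
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        if not idx:  # fewer cases than bins
            continue
        predicted = sum(y_prob[i] for i in idx) / len(idx)
        observed = sum(y_true[i] for i in idx) / len(idx)
        table.append((predicted, observed))
    return table


# Hypothetical toy data: 20 cases with evenly spaced predictions
toy_prob = [i / 20 + 0.025 for i in range(20)]
toy_true = [1 if i >= 10 else 0 for i in range(20)]
for predicted, observed in calibration_by_decile(toy_true, toy_prob):
    print(f"predicted {predicted:.2f}  observed {observed:.2f}")
```

With only 89 PN, each decile holds roughly nine cases, so the observed proportions in a plot of this kind are necessarily coarse.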
Discussion
Radiomics has revolutionised medical imaging, and there are currently many working groups dedicated to this research in different fields.18,31 Other AI techniques, such as deep learning, allow us to associate these radiomic characteristics with other data, such as clinical information. Few working groups apply this integrative approach to LC and PN diagnosis using radiomics and clinical data together; among them, the retrospective study by Lui et al.32 stands out. They conducted an analysis using 20 radiomic features, in addition to age, gender, and PN location, and developed a predictive model that achieved an AUC of 0.81 (95% CI: 0.75–0.87). Similar results were published in 2023 by Lin et al.33 With a similar approach, our study yields promising results: integrating selected clinical features with radiomic image data based on deep learning techniques produces a predictive model that significantly improves results compared with relying solely on image analysis, achieving a positive predictive value of 94% (95% CI: 0.85–0.99) and an AUC of 0.80 (95% CI: 0.67–0.92). It is worth noting that the accuracy showed a slight but non-significant decrease in the integrative model, which could be related to the imbalance in the number of benign and malignant cases in our cohort.
Other studies such as Marmon et al.34 and Kammer et al.35 have developed models based on blood biomarkers, clinical data, and radiomics with favourable results. However, it is important to note that these studies use blood biomarkers that are not specific to LC, which may add an additional cost without necessarily improving performance compared to models using only clinical and radiomic data. Furthermore, technically, they use classic machine learning hand-crafted features for radiomic descriptors without incorporating information from surrounding lung tissue. While these studies are relevant, our research stands out for its innovative deep learning approach and its ability to integrate multiple data sources.
Regarding validated predictive models based on clinical data, there are those designed for incidental PN such as the Mayo model,36 while others focus on LCS such as the Brock model.37 Please see the comparative table in the supplementary material. We applied the Mayo model in our cohort: it showed a similar PPV of 88% (95% CI: 0.75–0.95), but a worse accuracy of 63% (95% CI: 0.52–0.74) and an AUC of 0.54 (95% CI: 0.36–0.72) (see supplemental material).
A strength of this study is that it employs prospective patient data for individuals who have undergone PN surgery, allowing for the collection of clinical data, chest CT images and histology. Another point to highlight is the novelty of this exploratory study, as it tests various clinical characteristics to identify the most representative ones for association with a deep radiomic model. This emphasizes the importance of integrating tools from daily medical practice with new technologies to enable early cancer diagnosis. Finally, we have compared our model with a conventional and validated clinical model such as the Mayo model, allowing us to demonstrate its validity in our cohort.
As for the limitations of this study, the most significant one is the need to expand the number of cases to improve both the clinical and radiomic models. Therefore, we are considering making the next study multicentric in the upcoming year to validate our exploratory analysis. Another limitation is the low number of benign cases, as these are real-life cases involving PN that have undergone surgery, which are inherently more diagnostically challenging. On the other hand, this situation helps to train the model to detect PN that are suspicious for malignancy but ultimately prove benign.
Radiomics and AI models focused on LC are evolving across different phases, encompassing aspects such as PN management, diagnosis, treatment, and relapse of LC.38,39 Integrative models with clinical data will be crucial to optimise their effectiveness. These advancements will positively impact patients, particularly in managing PN, because these models will aid in more accurately predicting the likelihood of malignancy, thereby avoiding unnecessary follow-ups, biopsies or surgeries. This situation not only reduces patient anxiety but also minimises potential harm, decreases waiting lists and reduces healthcare system costs.
In conclusion, we firmly believe that this integrative model, combining clinical data and radiomics through deep learning, can aid in diagnosing and managing PN. Although external validation is necessary in a subsequent phase, in the near future it may be directly useful in LCS programmes, thus optimising the early detection of LC and bringing technological advances closer to clinical practice.
Funding
This project is supported by the Ministerio de Economía, Industria y Competitividad, Gobierno de España grant number PID 2021-126776OB-C21; Barcelona Respiratory Network (BRN) Fundació Ramon Pla i Armengol; “Clinical project” grant, Fundació Acadèmia Ciències Mèdiques de Catalunya i de Balears; “Talents” grant 2020, Fundació La Pedrera and Hospital Universitari Germans Trias i Pujol; Lung Ambition Alliance Grant; and the JMC Legacy Research Fund of Germans Trias i Pujol University Hospital.
Financial/non-financial disclosures
The authors declare that they have no conflicts of interest that may be considered to influence, directly or indirectly, the content of the manuscript.
Authors’ contributions
Sonia Baeza: conceptualization, methodology, data curation, formal analysis, validation, writing-original draft, writing-review and editing, visualisation. Debora Gil: conceptualization, methodology, software, formal analysis, validation, writing-review and editing, supervision. Carles Sanchez: methodology, software, data curation, formal analysis, validation, writing-review and editing. Guillermo Torres: methodology, software, data curation, formal analysis, validation, writing-review and editing. Cristian Tebé: validation, formal analysis, writing-review and editing. João Carmezim: validation, formal analysis, writing-review and editing. Ignasi Guasch: methodology, writing-review and editing. Isabel Nogueira: methodology, writing-review and editing. Samuel Garcia-Reina: methodology, writing-review and editing. Carlos Matínez-Barenys: methodology, writing-review and editing. Jose Luis Mate: methodology, writing-review and editing. Felipe Andreo: conceptualization, writing-review and editing. Antoni Rosell: conceptualization, methodology, formal analysis, writing-review and editing, supervision.
Artificial intelligence involvement
The text is original, and no artificial intelligence techniques have been used to write its content.
We thank Adela González, our nurse navigator, and Laia Ruiz, a UAB medical student, for their contribution and support.