To develop a blood-based algorithm using cell-free DNA (cfDNA) methylation profiles to diagnose and stratify patients with COPD according to the Global Initiative for Chronic Obstructive Lung Disease (GOLD) classification.
Methods
Between 2021 and 2023, 166 participants were enrolled, including patients with COPD (N=80) and healthy controls (N=86) from three hospitals in Andalusia. Genome-wide cfDNA methylation analysis was performed in 128 individuals to identify CpG sites discriminating COPD from healthy individuals and to stratify COPD patients by GOLD categories. A hierarchical support vector machine (SVM) classifier was implemented, and functional enrichment analyses were conducted to assess biological relevance.
Results
Differential methylation analysis identified 354 genes associated with COPD, 496 with GOLD classification, and 125 overlapping genes. The SVM classifier trained on cfDNA methylation features achieved complete separation between COPD and control groups, enabling a hard-margin model. This cfDNA-based classifier (EPIMETRIC) also distinguished GOLD A, B, and D subgroups. Enrichment analysis revealed pathways including glutamatergic synapse, nicotine addiction, tight junction, and Hippo signaling.
Conclusions
EPIMETRIC represents the first blood-based cfDNA methylation signature for COPD diagnosis and GOLD stratification. It provides a minimally invasive and highly accurate tool for early detection, personalized treatment, and identification of high-risk patients.
As highlighted in the 2025 GOLD update, COPD is increasingly recognized not as a static respiratory condition but as a complex, heterogeneous syndrome requiring precise stratification [1]. This evolving definition underscores a major public health challenge, as recent projections estimate that COPD prevalence will exceed 600 million individuals worldwide within the next three decades [2].
Despite being preventable and treatable, COPD remains underdiagnosed and frequently misdiagnosed, largely due to the limited sensitivity and accessibility of conventional diagnostic methods such as spirometry [3–5]. Early and accurate diagnosis is essential to improve patient outcomes, guide therapeutic interventions, and reduce long-term healthcare costs.
Although COPD etiology is multifactorial, cigarette smoking remains the predominant risk factor in high-income countries and the primary environmental driver of the pathological remodeling investigated in this study [6].
Current GOLD guidelines (2023 onward) use the ABE classification system, prioritizing exacerbation history to guide pharmacological management. However, from a pathobiological perspective, this system may obscure molecular heterogeneity, as Group E includes patients with widely variable symptom burden (previously Groups C and D). Because symptom severity often correlates with systemic inflammation and tissue remodeling, this grouping may mask distinct molecular signatures. Therefore, evaluating biomarkers using stratification systems that retain phenotypic granularity (e.g., ABCD classification) remains essential to fully capture disease biology [1,7,8]. The pathogenesis of COPD involves cumulative environmental and genetic influences throughout life, conceptualized as GETomics (gene–environment–time interactions) [6]. This complexity highlights the need for precision medicine approaches tailored to individual molecular and environmental profiles [9].
Among emerging biomarkers, DNA methylation—an epigenetic modification—has shown promise for early COPD detection and stratification [10]. Cell-free DNA (cfDNA), derived from apoptotic or necrotic cells in circulation, retains tissue-specific methylation patterns, making it a valuable biomarker source [11–13].
Liquid biopsy enables the analysis of cfDNA from peripheral blood, offering a minimally invasive approach to detect epigenetic alterations. Although previous studies have evaluated DNA methylation in COPD using tissue-derived DNA, validated blood-based methylation signatures for accurate diagnosis and stratification remain lacking [14–16]. This study addresses this gap by developing a cfDNA-based algorithm for COPD diagnosis and GOLD stratification.
Methods
This observational, prospective, multicenter study included patients with COPD and healthy controls. Participants were recruited between 2021 and 2023 from three hospitals in Spain: Hospital Universitario Torrecárdenas (Almería), Hospital Universitario HLA Inmaculada (Granada), and Hospital de Alta Resolución de Loja (Loja). The study was approved by the hospitals’ ethics committees (CEIM/CEI No. 10/20) and conducted in accordance with the Declaration of Helsinki. All participants provided written informed consent.
Detailed inclusion and exclusion criteria and clinical variables are provided in the Supplementary Appendix. In accordance with GOLD guidelines at the time of recruitment, COPD patients were stratified using the ABCD classification system. The newer ABE system, although clinically useful, does not specifically address molecular biomarkers such as cfDNA methylation.
Peripheral blood samples were collected in EDTA tubes. cfDNA was extracted from 200 μL of plasma using the QIAcube Connect platform (Qiagen), quantified using the Qubit 4 Fluorometer (Thermo Fisher Scientific) with the High Sensitivity dsDNA assay, and assessed using the High Sensitivity D5000 ScreenTape assay (Agilent TapeStation system).
Genome-wide cfDNA methylation
For genome-wide methylation analysis, 500 ng of cfDNA underwent bisulfite conversion using the EZ-96 DNA Methylation Kit (Zymo Research). Methylation profiling was performed using the Infinium MethylationEPIC BeadChip v1 (Illumina), covering more than 850 000 CpG sites, including RefSeq genes, CpG islands, ENCODE regulatory regions, and FANTOM5 enhancers.
Statistical analysis
The study was designed with an α level of 0.05 to detect a minimum absolute difference in methylation of 0.05, assuming an SD of 0.1 for both COPD and healthy control groups. Given that methylation profiles follow a beta distribution, sample size estimation was performed by simulation to achieve a target power close to 0.8 or greater, but not less than 0.7, yielding a minimum of 60 individuals per group (see Supplementary Appendix for a detailed description). Categorical variables were compared using Fisher's exact test or the chi-square test and are presented as absolute frequencies and percentages. Continuous variables were analyzed using analysis of variance (parametric or nonparametric, depending on the Shapiro–Wilk normality test) and are reported as mean (SD) or median (IQR), respectively.
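The simulation-based sample size reasoning can be illustrated with a minimal sketch. It is not the authors' procedure (that is described in the Supplementary Appendix); it assumes a hypothetical baseline mean methylation of 0.50 in controls, converts mean and SD to beta shape parameters by the method of moments, and uses a two-sided z-test with a normal approximation as the test statistic. The function names `beta_shape` and `simulated_power` are illustrative only.

```python
import math
import random
from statistics import NormalDist


def beta_shape(mu, sd):
    """Method-of-moments beta shape parameters for a given mean and SD."""
    nu = mu * (1.0 - mu) / sd ** 2 - 1.0
    if nu <= 0:
        raise ValueError("SD is too large for a beta distribution with this mean")
    return mu * nu, (1.0 - mu) * nu


def mean_and_variance(values):
    """Sample mean and unbiased sample variance."""
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / (len(values) - 1)
    return m, v


def simulated_power(n_per_group, mu_ctrl=0.50, delta=0.05, sd=0.10,
                    alpha=0.05, n_sim=2000, seed=1):
    """Fraction of simulated studies in which a two-sided z-test rejects H0.

    mu_ctrl is an assumed baseline; delta, sd and alpha follow the text.
    """
    rng = random.Random(seed)
    a0, b0 = beta_shape(mu_ctrl, sd)
    a1, b1 = beta_shape(mu_ctrl + delta, sd)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    rejections = 0
    for _ in range(n_sim):
        ctrl = [rng.betavariate(a0, b0) for _ in range(n_per_group)]
        copd = [rng.betavariate(a1, b1) for _ in range(n_per_group)]
        m0, v0 = mean_and_variance(ctrl)
        m1, v1 = mean_and_variance(copd)
        z = (m1 - m0) / math.sqrt(v0 / n_per_group + v1 / n_per_group)
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_sim


# Estimated power for the 60-per-group design reported in the text.
power_60 = simulated_power(60)
```

Under these assumptions the estimate for 60 individuals per group should fall within the 0.7 to 0.8 region targeted in the study design; a different baseline mean or test statistic would shift it.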
All statistical tests were 2-tailed, with statistical significance set at P<.05 unless otherwise specified. Multiple testing was controlled using the false discovery rate (FDR) procedure [17] with a target q=.05, indicating that up to 5% of significant findings are expected to be false positives. This threshold was selected to ensure robust findings while preserving adequate statistical power to detect biologically relevant effects. All analyses were performed using R version 4.2.2. Visualization included beta methylation density plots per sample for normalization assessment; hierarchical clustering dendrograms and 3-dimensional principal component analysis (PCA) scatterplots for multivariate bias evaluation; Venn diagrams for overlap between conditions; Circos plots for genomic probe distribution; network diagrams for functional enrichment relationships; and stem plots for pathway enrichment. Genome-wide methylation data were deposited in the NCBI Gene Expression Omnibus database [18] under accession number GSE289742.
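As an illustration of FDR control at q=.05, the sketch below implements the Benjamini–Hochberg step-up adjustment. That reference [17] denotes this specific procedure is an assumption here, and the function name and the example P values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return (adjusted P values, rejection flags) in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    rejected = [value <= q for value in adjusted]
    return adjusted, rejected


# Hypothetical P values for four CpG islands.
adjusted, rejected = benjamini_hochberg([0.03, 0.001, 0.5, 0.02])
```

In this toy input, three of the four tests remain significant after adjustment, and the expected proportion of false positives among them is bounded by q.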
Methylation quality control
Methylation matrices were inspected for technical artifacts, including overexposed images and CpG probes with high noise levels (>0.20% of the matrix), using the ChAMP pipeline [19]. Beta values were normalized using ChAMP, and batch effects were corrected using ARSyN [19,20]. The experimental design (COPD vs healthy groups) was validated by hierarchical clustering.
Primary analysis: differential methylation
Probes lacking Entrez Gene IDs or located in open-sea regions were excluded. Principal component analysis (PCA) was used for quality control and to assess potential bias related to demographic variables (sex, obesity, age). Differential methylation analysis was performed using a 2-level hierarchical beta regression model based on CpG island annotation, implemented with the betareg R package [21]. The first model identified CpG islands differentiating COPD from controls, and the second classified COPD patients into GOLD subgroups.
Secondary analysis: classifier algorithm for clinical diagnosis of COPD
A hierarchical support vector machine (SVM) classifier with a linear kernel was developed using 10-fold cross-validation (e1071 R package) [22]. The first level classified COPD vs control status, and the second level stratified GOLD subgroups using a one-vs-all approach. Backward feature selection was applied to optimize the model and reduce dimensionality.
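The two-level decision logic can be sketched as follows. This is an illustration of the prediction step only, not the trained EPIMETRIC model: the weights, biases, and the two-feature input are hypothetical placeholders, and fitting (done by the authors with the e1071 R package) is not shown.

```python
def decision_value(weights, bias, features):
    """Signed output of a linear SVM: w . x + b."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def classify(sample, copd_model, gold_models):
    """Level 1: COPD vs control. Level 2: one-vs-all over GOLD groups."""
    weights, bias = copd_model
    if decision_value(weights, bias, sample) < 0:
        return "CTRL"
    # Each GOLD model scores "this group vs the rest"; the highest score wins.
    scores = {
        label: decision_value(w, b, sample)
        for label, (w, b) in gold_models.items()
    }
    return "GOLD " + max(scores, key=scores.get)


# Hypothetical linear models over two methylation features (beta values).
COPD_MODEL = ([2.0, -2.0], 0.0)
GOLD_MODELS = {
    "A": ([-4.0, 0.0], 2.0),
    "B": ([0.0, 4.0], -1.0),
    "D": ([4.0, 0.0], -3.0),
}

labels = [classify(s, COPD_MODEL, GOLD_MODELS)
          for s in ([0.2, 0.8], [0.4, 0.1], [0.6, 0.5], [0.9, 0.2])]
```

Because the second level is only reached after a positive first-level call, a control sample is never assigned a GOLD label in this scheme.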
Exploratory analysis: functional enrichment
Functional enrichment analysis of selected gene sets was performed using the DAVID database with KEGG pathway annotation [23]. Protein–protein interaction networks were constructed using the STRING database, limiting the number of interactors to 10 per gene to preserve interpretability [24].
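The idea behind pathway over-representation testing can be shown with a plain hypergeometric upper-tail probability. This is a generic sketch and not DAVID's own statistic, which is a more conservative variant of Fisher's exact test; the function name and the toy counts are hypothetical.

```python
from math import comb


def enrichment_p_value(overlap, list_size, pathway_size, background_size):
    """P(X >= overlap) when list_size genes are drawn without replacement
    from a background containing pathway_size pathway genes."""
    upper = min(list_size, pathway_size)
    total = comb(background_size, list_size)
    favourable = sum(
        comb(pathway_size, x) * comb(background_size - pathway_size, list_size - x)
        for x in range(overlap, upper + 1)
    )
    return favourable / total


# Toy example: 3 of 5 selected genes fall in a 5-gene pathway, background of 20.
p_toy = enrichment_p_value(3, 5, 5, 20)
```

A small P value indicates that the gene list overlaps the pathway more than expected by chance given the background; in practice the resulting P values would also be corrected for the number of pathways tested.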
Results
From the initial cohort of 166 individuals, 128 participants (64 COPD and 64 controls) were included in the final analysis (Fig. 1). Significant differences were observed among GOLD groups in clinical phenotype distribution, total exacerbations, and modified Medical Research Council (mMRC) dyspnea scores (P<.001; Table 1).
Table 1. Characteristics of patients with COPD stratified by GOLD criteria.*
| Characteristics | GOLD A (N=28, 44%) | GOLD B (N=16, 25%) | GOLD C (N=3, 5%) | GOLD D (N=17, 26%) |
|---|---|---|---|---|
| Age, median (range), y | 63 (41–80) | 70 (45–82) | 68 (61–78) | 68 (53–80) |
| Male sex, No. (%) | 22 (79) | 14 (88) | 3 (100) | 13 (77) |
| Active smoker, No. (%) | 14 (50) | 7 (44) | 2 (67) | 4 (23) |
| Obesity, No. (%) | 11 (39) | 2 (12) | 2 (67) | 4 (23) |
| COTE score, mean (range) | 1 (0–8) | 1 (0–6) | 1 (0–2) | 1 (0–6) |
| CTI, mean (range) | 43 (7–94) | 53 (0–126) | 44 (36–53) | 30 (12–120) |
| CAT>10, No. (%) | 11 (39) | 11 (69) | 3 (100) | 15 (88) |
| Symptomatic patients, No. (%) | 0 (0) | 16 (100) | 0 (0) | 17 (100) |
| Clinical phenotypes*, No. (%) | | | | |
| Non-exacerbator | 20 (71) | 8 (50) | 0 (0) | 0 (0) |
| Frequent exacerbator with chronic bronchitis | 5 (18) | 5 (31) | 3 (100) | 7 (41) |
| Frequent exacerbator with emphysema | 1 (4) | 3 (19) | 0 (0) | 10 (59) |
| Asthma–COPD overlap | 2 (7) | 0 (0) | 0 (0) | 0 (0) |
| Total exacerbations, median (range) | 1 (0–2) | 1 (0–4) | 2 (2–2) | 2 (1–6) |
| mMRC score, No. (%) | | | | |
| 0 | 4 (14) | 1 (6) | 0 (0) | 0 (0) |
| 1 | 23 (82) | 3 (19) | 3 (100) | 0 (0) |
| 2 | 1 (4) | 11 (69) | 0 (0) | 12 (70) |
| 3 | 0 (0) | 1 (6) | 0 (0) | 5 (30) |
| Pulmonary function decline, mL/y, median (range) | 11 (−45 to 145) | −7 (−93 to 20) | 14 (−11 to 50) | −7 (−85 to 33) |
COPD, chronic obstructive pulmonary disease; GOLD, Global Initiative for Chronic Obstructive Lung Disease; COTE, comorbidity test; CTI, cumulative tobacco index; CAT, COPD Assessment Test; mMRC, modified Medical Research Council dyspnea scale.
Statistically significant differences were observed for clinical phenotypes, total exacerbations, and mMRC score (P<.001). All patients had a history of smoking. Additional details on exacerbations are provided in the Supplementary Appendix.
Comparisons between COPD patients and controls showed significant differences in age, sex distribution, and obesity status (P<.001; Table S1). Given the potential influence of these variables on DNA methylation, PCA was performed to evaluate confounding effects. Visual inspection (Figs. S1–S3) showed no clustering by sex, obesity, or age, indicating minimal contribution to variance. No significant difference was observed between COPD and control groups regarding smoking status (P=.587).
Methylation quality control
Methylation probes were assessed (Fig. 2A), and technical artifacts were removed, including low-quality signals (defective signal or bead count <3 in ≥5% of samples) and probes identified by annotation (non-CG, SNP-related, multi-hit probes, and sex chromosome probes). Multivariate bias was corrected through iterative normalization of beta-value densities using ChAMP and ARSyN, primarily addressing batch effects associated with GOLD classification [19,20]. Hierarchical clustering confirmed group separation, and outliers were excluded (Fig. 1). Final beta distributions and clustering dendrograms are shown in Fig. 2B and C.
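The bead-count criterion (bead count <3 in ≥5% of samples) can be expressed as a simple filter. The sketch below is an illustration in plain Python rather than the ChAMP implementation used in the study; the function name, the data layout, and the probe identifiers are hypothetical.

```python
def low_bead_count_probes(bead_counts, min_beads=3, max_fail_fraction=0.05):
    """Return probe IDs whose bead count is below min_beads in at least
    max_fail_fraction of samples.

    bead_counts maps a probe ID to a list with one bead count per sample.
    """
    flagged = []
    for probe_id, counts in bead_counts.items():
        n_fail = sum(1 for count in counts if count < min_beads)
        if n_fail / len(counts) >= max_fail_fraction:
            flagged.append(probe_id)
    return flagged


# Hypothetical bead counts for three probes across 20 samples.
example_counts = {
    "probe_ok": [12] * 20,                # no sample below threshold
    "probe_borderline": [12] * 19 + [2],  # 1 of 20 samples (5%) fails
    "probe_bad": [1] * 10 + [15] * 10,    # 50% of samples fail
}
to_remove = low_bead_count_probes(example_counts)
```

Probes flagged this way would be dropped before normalization, together with the annotation-based exclusions listed above.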
Fig. 2. Overview of methylation analysis. (A) Quality control, including signal retention and multivariate bias removal. (B) Density plots of methylation beta values per sample for raw intensities, normalized data, and data after multivariate bias removal. (C) Hierarchical clustering dendrogram. (D) Three-dimensional principal component (PC) analysis scatterplot. Samples are color-coded according to Global Initiative for Chronic Obstructive Lung Disease (GOLD) groups A, B, C, and D and healthy controls. Colored ellipses were added to facilitate visualization of dispersion across groups. Panels B, C, and D use the same color legend. CTRL denotes healthy controls.
COPD subgroups formed distinct clusters; however, the 3 GOLD C samples showed heterogeneous distribution (one clustering with controls and two with GOLD A). Given the small sample size (N=3), these findings should be interpreted cautiously.
PCA of the first 3 components (Fig. 2D; Fig. S4) supported separation between COPD and controls and among COPD subgroups based on ABCD classification. GOLD C samples again clustered closer to controls and GOLD A. In contrast, PCA using the ABE classification separated COPD from controls but did not distinguish between subgroups (Fig. S5). Therefore, subsequent analyses were conducted using the ABCD classification.
Primary analysis: differential methylation
A total of 354 genes distinguished COPD from controls, 496 genes were associated with GOLD classification, and 125 genes overlapped (Fig. 3A, Feature Selection panel), supporting the presence of a distinct methylation signature in COPD.
Fig. 3. Classifier algorithm for clinical diagnosis of COPD. (A) Feature selection consisted of annotation-based filtering followed by differential methylation analysis, yielding a Venn diagram of cis-expressed genes from CpG islands. To determine COPD or GOLD status, the hierarchical support vector machine classifier selected the appropriate model to identify COPD and assign GOLD status using a one-vs-all strategy, yielding the genes shown in the Venn diagram. (B) Relationship between the genes used in the clinical classifier and their genomic locations. The outer Circos plot shows gene locations and is color-coded according to the Venn diagram in panel A, as well as CpG island and probe density. The inner Circos plot shows probe-to-gene annotation (right) and probe-to-CpG island annotation (left), whereas the connecting lines link the same methylation probe according to both annotation systems. COPD denotes chronic obstructive pulmonary disease; CTRL, healthy controls; and GOLD, Global Initiative for Chronic Obstructive Lung Disease criteria.
A hierarchical support vector machine (SVM) model was developed, with the first layer distinguishing COPD status and the second assigning Global Initiative for Chronic Obstructive Lung Disease (GOLD) classification. The 3 GOLD C samples were excluded because of their limited representation, which could compromise statistical robustness. A linear SVM was selected because it identifies the optimal separating hyperplane by maximizing the margin between classes. This margin is inversely related to the cost parameter, requiring careful hyperparameter tuning. Under a hard-margin setting, no observations lie within the margin, and classes are perfectly separated, yielding 100% accuracy, 100% sensitivity, 100% specificity, and 100% positive and negative predictive values (Table S2). In contrast, a soft-margin approach allows limited misclassification to accommodate class overlap. For clinical COPD diagnosis, classification approximated a hard-margin scenario at a cost parameter of 1000, indicating clear separation between groups. Backward feature selection identified 44 genes as the optimal diagnostic signature for COPD and 61 genes for GOLD stratification, distributed across comparisons (Fig. 3A, Support Vector Machine panel; Table S2). Further reduction in the number of genes resulted in a non-hard-margin scenario and decreased classification performance, as reflected in the receiver operating characteristic curves (Fig. S6). Only 4 genes were shared between both models. Fig. 3B illustrates the relationship between classifier genes, their genomic locations, and corresponding probe annotations.
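The hard-margin condition described above can be checked numerically for any fitted linear SVM: with labels coded as +1 and −1, every training sample must satisfy y(w·x + b) ≥ 1, and the geometric margin width is 2/‖w‖. The sketch below uses hypothetical weights and one-feature toy data, not values from the study.

```python
import math


def functional_margins(weights, bias, samples, labels):
    """y * (w . x + b) for each sample; labels are +1 (COPD) or -1 (control)."""
    return [
        y * (sum(w * x for w, x in zip(weights, features)) + bias)
        for features, y in zip(samples, labels)
    ]


def is_hard_margin(weights, bias, samples, labels, tol=1e-9):
    """True when no sample lies inside the margin or on the wrong side."""
    return all(m >= 1.0 - tol
               for m in functional_margins(weights, bias, samples, labels))


def margin_width(weights):
    """Distance between the two margin hyperplanes: 2 / ||w||."""
    return 2.0 / math.sqrt(sum(w * w for w in weights))


# Hypothetical separating hyperplane: 4 * x0 - 2 = 0.
W, B = [4.0, 0.0], -2.0
separable_x = [[0.10, 0.5], [0.25, 0.4], [0.75, 0.6], [0.90, 0.3]]
separable_y = [-1, -1, 1, 1]

hard = is_hard_margin(W, B, separable_x, separable_y)
# Adding a control that falls on the COPD side breaks the hard margin.
overlap = is_hard_margin(W, B, separable_x + [[0.60, 0.5]], separable_y + [-1])
width = margin_width(W)
```

When the check fails, as in the second call, a soft-margin fit with a finite cost parameter is needed, which is the situation the text describes after further reducing the number of genes.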
The final model, EPIMETRIC (EPIgenetic METhylation-based algorithm for Respiratory Illness Classification–COPD), was registered with Safe Creative S.L. (Spain) under accession No. 2504041362649.
Post hoc analysis of GOLD C samples
The SVM classifiers were trained using GOLD A, B, and D groups. When the 3 GOLD C samples were introduced into EPIMETRIC, the first SVM classification step assigned the COPD label to all samples with a probability of 0.89. After confirming COPD status, the second SVM classification step applied a one-vs-all voting scheme across GOLD classes (A, B, and D), assigning the GOLD A label with probabilities of 0.971, 0.996, and 0.989 (see Supplementary Appendix). These findings are consistent with Fig. 2C and D. In the hierarchical clustering dendrogram (Fig. 2C), 2 of the GOLD C samples clustered within the GOLD A group, whereas the remaining sample clustered with healthy controls. Similarly, in the 3-dimensional principal component analysis (PCA) scatter plot (Fig. 2D), all 3 GOLD C samples were located closer to the centroid of GOLD A and healthy control groups than to GOLD D, based on the first 3 principal components. These components represent linear combinations in the β-methylation feature space.
Exploratory analysis: functional enrichment
Functional enrichment and network analyses of genes included in the EPIMETRIC classifier identified several key hub genes, including LEF1, CTNNA2, TJP1, and GRIA2. These genes were associated with multiple biologically relevant pathways, supporting the pathogenic significance of the identified methylation alterations (Fig. 4A). Complementary protein-protein interaction analysis using the STRING database further characterized the relationships among classifier-derived proteins and their association with key pathways, including glutamatergic synapse, nicotine addiction, tight junction, and Hippo signaling (Fig. 4B and C). These results provide additional insight into the molecular mechanisms underlying the observed epigenetic changes.
Fig. 4. Functional annotation analysis revealed key pathways in COPD. (A) Interactive descriptive plot of DAVID functional analysis showing genes and pathways. Genes are shown as green circles and pathways as blue rounded rectangles. Edge colors are based on differential comparisons from the classifier. Edge width represents the confidence (P value) of the KEGG pathway association; thicker edges indicate higher P values. (B) STRING protein-protein interaction network. Protein colors indicate the pathways represented in panel C. (C) Enriched pathways from the KEGG functional analysis of the protein-protein interaction network. Pathways of interest are highlighted in color and ordered by group similarity.
Despite substantial advances in the understanding and treatment of COPD, underdiagnosis and misdiagnosis remain major barriers to effective management [7]. COPD is a common, preventable, and treatable disease; however, diagnostic tools such as spirometry are often underused or lack sufficient sensitivity, resulting in a substantial proportion of patients being misdiagnosed or left untreated [4,5]. Given the availability of interventions that can reduce morbidity and mortality, there is a pressing need to improve early and accurate diagnostic strategies [25].
In addition to smoking, environmental exposures such as air pollution and biomass fuel are major contributors to COPD pathogenesis. Early identification of at-risk individuals is essential to initiate timely interventions that may attenuate disease progression [4,26]. Nevertheless, currently available tools lack the sensitivity and specificity required for this purpose, underscoring the need for novel, noninvasive biomarkers.
Liquid biopsy has emerged as a promising noninvasive approach for capturing systemic molecular alterations, and cfDNA methylation analysis offers a powerful strategy for detecting disease-associated epigenetic changes [9,10]. In this exploratory study, we showed that cfDNA methylation signatures can differentiate patients with COPD from healthy controls and stratify them according to GOLD criteria with high internal accuracy. Although previous studies have reported associations between DNA methylation and COPD, our findings extend this evidence by demonstrating the feasibility of detecting these epigenetic signatures in a clinically accessible biospecimen, peripheral blood [16].
Our findings do not contradict the 2023 and 2025 GOLD updates, which merged GOLD C and D into the unified E category [8,27]. Rather, our analyses showed that GOLD C samples exhibited methylation profiles closer to those of healthy controls and GOLD A patients, suggesting possible biological heterogeneity. However, given the very small sample size (N=3), these findings should be interpreted cautiously and may not reflect the broader COPD population. Even so, this pattern may indicate underlying biological differences with potential clinical implications, particularly regarding the mechanisms driving frequent exacerbations in COPD. Grouping these patients together could obscure biologically relevant differences, emphasizing the need for further evaluation in larger cohorts [7,28].
In the first phase of the study, we found that 125 genes were shared between the COPD vs control and GOLD classifiers, suggesting that these genes may represent a core molecular program associated with both disease presence and severity stratification. These results further support the biological relevance of the methylation findings and provide mechanistic insights into COPD pathobiology, as well as into the development of the SVM classifier.
In the second phase, we developed EPIMETRIC, a hierarchical SVM-based algorithm that used cfDNA methylation features identified in the first phase to diagnose COPD and assign GOLD status. EPIMETRIC showed no classification errors during cross-validation within the analyzed cohort while using a reduced set of genes. These findings support the feasibility of cfDNA methylation-based classification; however, external validation is required to confirm generalizability and to minimize the risk of overfitting.
Evidence from epigenetic studies has shown that baseline DNA methylation patterns may predict corticosteroid responsiveness in COPD, reinforcing the potential of molecular profiling to guide treatment decisions [29]. In this context, although spirometry and blood eosinophil counts remain affordable cornerstones of current management, both primarily reflect functional impairment or a limited inflammatory dimension of the disease rather than its underlying biology. By contrast, cfDNA methylation profiling provides an objective, effort-independent biological readout that reflects systemic regulatory alterations associated with disease presence and severity. This approach may enable biologically informed patient stratification beyond symptoms alone [30], highlighting the importance of incorporating molecular markers into future COPD classification systems to better inform treatment decisions.
With regard to early diagnosis, pre-COPD and PRISm populations were not included in the present study; therefore, the value of cfDNA methylation in these settings remains undetermined. Future longitudinal studies are needed to evaluate the prognostic utility of cfDNA methylation and its potential role in monitoring treatment response.
Functional analysis of the methylation signature genes derived from EPIMETRIC revealed enrichment in pathways such as glutamatergic synapse, nicotine addiction, tight junction, and Hippo signaling [31–33]. Central proteins included LEF1, CTNNA2, GRIA2, and TJP1, which functioned as key regulatory hubs. These proteins provide a literature-based framework for improving understanding of disease mechanisms, even when not directly linked to COPD in all contexts, and may help identify potential therapeutic targets aimed at restoring lung tissue homeostasis and barrier integrity.
Study limitations
The initial cohort of 166 samples was reduced by 7 because of incomplete clinical information and by 6 because of insufficient biological material for analysis. During methylation analysis, technical artifacts were excluded by removing 5 samples owing to matrix image defects and 4 samples because of failure to meet CpG probe detection thresholds. During normalization, 2 COPD samples with atypical clustering values and 14 additional samples identified during clustering (13 healthy controls and 1 COPD sample) were excluded. Thus, strict quality-control procedures reduced the final number of samples available for analysis.
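The sample attrition described above can be tallied step by step to confirm that it accounts for the 128 samples analyzed. The counts are taken from the text; the variable and function names are illustrative.

```python
INITIAL_SAMPLES = 166

# (reason for exclusion, number of samples removed), in the order reported.
EXCLUSIONS = [
    ("incomplete clinical information", 7),
    ("insufficient biological material", 6),
    ("matrix image defects", 5),
    ("CpG probe detection thresholds not met", 4),
    ("atypical clustering values during normalization", 2),
    ("outliers identified during clustering", 14),
]


def attrition_flow(initial, exclusions):
    """Return (reason, samples remaining) after each exclusion step."""
    remaining = initial
    flow = []
    for reason, removed in exclusions:
        remaining -= removed
        flow.append((reason, remaining))
    return flow


flow = attrition_flow(INITIAL_SAMPLES, EXCLUSIONS)
final_samples = flow[-1][1]
total_excluded = sum(removed for _, removed in EXCLUSIONS)
```

In total, 38 samples were excluded, leaving the 128 participants included in the final analysis.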
In conclusion, cfDNA methylation profiling from peripheral blood shows strong potential for the diagnosis and stratification of patients with COPD. To our knowledge, EPIMETRIC is the first algorithm specifically developed using cfDNA methylation signatures for COPD diagnosis and GOLD classification. Although external validation is still required, these findings provide a solid proof of concept for future studies focused on early diagnosis, risk stratification, and the potential clinical utility of cfDNA-based molecular profiling.
Authors’ contributions
Conceptualization: MJS. Data curation: AGD, MPMV. Formal analysis: AGD, CF. Funding acquisition: MJS. Methodology: VD, CF. Project administration: MJS, VD. Resources: PJRP, BAN, JJCR, MEAR. Software: AGD, CF. Supervision: MJS, BAN. Visualization: AGD, CF, VD. Writing – original draft: AGD, CF. Writing – review and editing: all authors.
Ethics approval and consent to participate
The study was approved by the hospitals’ ethics committees (CEIM/CEI No. 10/20) and conducted in accordance with the Declaration of Helsinki. All participants provided written informed consent.
Artificial intelligence involvement
Artificial intelligence tools were used solely to enhance clarity and readability; they were not used to generate, analyze, or interpret scientific results.
Funding
Regional Ministry of Health and Consumer Affairs of the Government of Andalusia and GlaxoSmithKline Spain (PIP-0192-2020).
Conflicts of interest
None declared.
The authors thank all patients who participated in this study. This work was supported by the Regional Ministry of Health and Consumer Affairs of the Government of Andalusia and GlaxoSmithKline Spain (PIP-0192-2020).













