Vol. 58. Issue 10. Pages 679-680 (October 2022)
Is the Systematic Review and Meta-Analysis the Gold Standard for Scientific Evidence?
Gonzalo Labarca a (corresponding author), Luz M. Letelier b
a Division of Sleep and Circadian Disorders, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, United States
b Departamento de Medicina Interna, Escuela de Medicina, Pontificia Universidad Católica de Chile, Chile

Systematic reviews (SRs) and meta-analyses aim to summarize a substantial body of evidence regarding a specific clinical question and provide a high-quality, valuable summary for clinicians and policymakers.1 However, SRs can be double-edged swords when their methodological steps are misunderstood or their results are misinterpreted.

The first concept regarding SRs is the difference between SRs and narrative reviews, minireviews, scoping reviews, and other review types.1 An SR follows a well-defined, prespecified method to answer a PICOS question (population, intervention, comparison, outcome, study design). These questions focus on, but are not limited to, interventions (drugs, devices, others), diagnostic accuracy, and prognosis.2,3 Additionally, the protocol of each SR should be prospectively registered in an open-access database, such as PROSPERO or the Cochrane Library.4

Second, the literature search and study selection process in SRs must be broad and rigorous, including at least two different databases and a combination of keywords and syntax to identify potential studies without language restriction. Several databases, including PubMed (MEDLINE), Embase, LILACS, ClinicalTrials.gov, the Cochrane Central Register of Controlled Trials, ScienceDirect, Google Scholar, and the Directory of Open Access Journals, should be explored. Additionally, a manual search of the reference lists of included studies and an examination of meeting abstracts related to the PICOS question, covering both industry-funded and nonfunded research, increase the quality of and confidence in SRs.1–4 For respiratory and sleep medicine, a search of the last five years of meeting abstracts from the American Thoracic Society (ATS), American College of Chest Physicians (CHEST), and European Respiratory Society (ERS) is strongly encouraged.

Two independent reviewers perform the literature search and select the studies. The title and abstract screening process has essentially two steps: a broad database search using a syntax that aims to capture every study ever conducted, followed by a precise screening process to reject all the studies that do not address the PICOS question. Title and abstract screening should be done by two independent experts who systematically (using forms or spreadsheets) ascertain whether an article contains original data and whether it is relevant to the intended PICOS question. The final study selection for qualitative and quantitative analysis should then follow prespecified inclusion and exclusion criteria, and concordance between reviewers should be estimated with the kappa (κ) statistic.5 This process follows the PRISMA statement for SRs of interventions,6 the PRISMA-DTA guidelines for diagnostic accuracy,7 and the MOOSE8 guidelines for SRs of observational studies. These data should be easy to find in Fig. 1, a PRISMA flowchart. Accurate knowledge of this process underpins the transparency and reproducibility of the SR: an SR aims to include every study ever conducted that responds to the PICOS question, and to do so in a replicable and rigorous manner.
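As an illustration (not part of the original editorial), inter-reviewer concordance on screening decisions can be quantified with Cohen's kappa. The following Python sketch uses hypothetical include/exclude decisions from two reviewers:

```python
# Cohen's kappa for inter-reviewer agreement on study screening.
# Decisions are hypothetical: 1 = include, 0 = exclude.

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two reviewers."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement by chance, from each rater's marginal inclusion rates.
    p_a = sum(rater_a) / n
    p_b = sum(rater_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    return (observed - expected) / (1 - expected)

reviewer_1 = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
reviewer_2 = [1, 1, 0, 0, 1, 0, 0, 1, 1, 0]
print(round(cohens_kappa(reviewer_1, reviewer_2), 2))  # → 0.8
```

A kappa of 0.8 here reflects one disagreement out of ten records after correcting for chance agreement; values above roughly 0.6 are conventionally read as substantial agreement.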

Third, the risk of bias (RoB) should also be assessed independently by two reviewers, with a third review author consulted to resolve any discrepancy. There are different objective and subjective tools to assess RoB: the Cochrane RoB tool is recommended for SRs of interventions1; QUADAS-2 for diagnostic test accuracy studies9; and the Newcastle-Ottawa Scale (NOS) and ROBINS-I for observational studies and nonrandomized trials.10,11 A reasonable interpretation of the RoB of the included studies is relevant because these decisions affect confidence in the results and, therefore, the Grading of Recommendations Assessment, Development and Evaluation (GRADE) recommendations.12

Assessing publication bias is a way to verify that all relevant information was found and included in the SR. It is typically evaluated through visual inspection of a funnel plot and/or a statistical test, such as Egger's test, reported in the meta-analysis. Asymmetry in the distribution of the studies raises concerns about publication bias.1
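To make the idea behind Egger's test concrete, here is an illustrative Python sketch (with hypothetical effects and standard errors): the standardized effect of each study is regressed on its precision, and an intercept far from zero signals funnel-plot asymmetry.

```python
def egger_intercept(effects, std_errors):
    """Egger's regression: standardized effect vs. precision.
    An intercept far from zero suggests funnel-plot asymmetry."""
    y = [e / s for e, s in zip(effects, std_errors)]  # standardized effects
    x = [1 / s for s in std_errors]                   # precision
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx  # ordinary least-squares intercept

# Hypothetical data: identical true effects across studies produce a
# symmetric funnel, so the intercept is (essentially) zero.
intercept = egger_intercept([0.5, 0.5, 0.5, 0.5], [0.1, 0.2, 0.3, 0.4])
```

In practice, the intercept's statistical significance is tested, and the test has low power with few studies; it complements, rather than replaces, visual inspection of the funnel plot.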

After a correct description of the structure of any SR, understanding the results involves a two-step process: (1) a detailed report of the SR process, described above, and (2) a pooled analysis of the included studies. For this purpose, if the SR identified more than two studies suitable for combination, a meta-analysis is an acceptable method.13 Otherwise, the SR should be restricted to a qualitative report of the included studies, without meta-analysis.

In brief, a meta-analysis is a statistical method that combines the individual results of the included studies into one pooled effect, increasing the sample size and, therefore, the precision of the estimate. The main components of a meta-analysis are a pooled weighted effect measure and its precision (confidence interval).1–3 The individual trial results are commonly pooled using the DerSimonian and Laird method,14 and the results are shown in a forest plot for each comparison. Continuous data are pooled and analyzed as the mean difference (MD) or weighted mean difference (WMD), and categorical data are combined as the relative risk (RR) or odds ratio (OR). Each analysis can be done using a random-effects model or a fixed-effect model. All results should include 95% confidence intervals (CIs) to show the precision of the results.1
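As an illustrative sketch (with hypothetical data, not from any real SR), the DerSimonian-Laird random-effects pooling described above can be written in a few lines of Python:

```python
import math

def dersimonian_laird(effects, variances):
    """Pool per-study effects with the DerSimonian-Laird random-effects model."""
    w = [1 / v for v in variances]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    # Cochran's Q and the between-study variance tau^2.
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)
    # Random-effects weights add tau^2 to each study's variance.
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * e for wi, e in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se)

# Hypothetical log odds ratios and their variances from five trials.
log_or = [-0.5, -0.3, -0.8, -0.1, -0.4]
variances = [0.04, 0.09, 0.16, 0.05, 0.08]
pooled, ci = dersimonian_laird(log_or, variances)
```

The pooled estimate always lies within the range of the individual study effects, and its 95% CI is narrower than that of any single study when the studies are reasonably consistent.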

Interstudy heterogeneity (the percentage of total variation across studies not due to chance) is assessed by visual inspection of the forest plot and the I² statistic. There is no consensus on the threshold above which heterogeneity should be considered significant. In cases of heterogeneity in the meta-analysis, prespecified subgroup analyses can be used to evaluate possible explanations for the differences between groups, considering variables such as RoB, different clinical conditions, or others as potential causes.14
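For illustration only (hypothetical numbers), I² can be computed from Cochran's Q, as in this Python sketch; consistent studies yield an I² near 0%, widely divergent studies an I² near 100%:

```python
def i_squared(effects, variances):
    """I^2: percentage of total variation across studies not due to chance."""
    w = [1 / v for v in variances]  # inverse-variance weights
    pooled = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    q = sum(wi * (e - pooled) ** 2 for wi, e in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    return max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

# Hypothetical effect estimates: nearly identical vs. widely spread.
consistent = i_squared([0.50, 0.52, 0.51], [0.01, 0.01, 0.01])  # ~0%
divergent = i_squared([0.1, 1.0, 2.0], [0.01, 0.01, 0.01])      # close to 100%
```

Because Q grows with the number and precision of studies while df does not, I² conveniently expresses heterogeneity on a 0-100% scale independent of the effect metric.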

Additionally, another form of data analysis is meta-regression, which explores the potential association between the outcome and prespecified variables. However, the absence of an association in a meta-regression, especially one using study-level covariates, does not warrant the assertion that the covariate is irrelevant at the individual level.15
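At its core, a meta-regression is a weighted regression of study effects on a study-level covariate. The following minimal Python sketch (hypothetical data; real analyses also model residual heterogeneity) shows the weighted least-squares slope:

```python
def meta_regression_slope(effects, variances, covariate):
    """Weighted least-squares slope of study effects on a study-level
    covariate, using inverse-variance weights -- a minimal meta-regression."""
    w = [1 / v for v in variances]
    sw = sum(w)
    mx = sum(wi * c for wi, c in zip(w, covariate)) / sw   # weighted means
    my = sum(wi * e for wi, e in zip(w, effects)) / sw
    num = sum(wi * (c - mx) * (e - my)
              for wi, c, e in zip(w, covariate, effects))
    den = sum(wi * (c - mx) ** 2 for wi, c in zip(w, covariate))
    return num / den

# Hypothetical example: effect size rising with a study-level covariate
# (e.g., mean participant age per study).
slope = meta_regression_slope([2.0, 4.0, 6.0, 8.0],
                              [0.1, 0.2, 0.3, 0.4],
                              [1.0, 2.0, 3.0, 4.0])
```

Note that the slope relates study-level averages, not individual patients; this is precisely why a null study-level association cannot rule out an individual-level effect (the ecological fallacy warned about above).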

The main results of the SR and meta-analysis should be reported in a summary of findings (SoF) table. According to the GRADE approach, the SoF table should predefine the clinically relevant outcomes regarding the PICOS question, providing data on the relative and absolute effects, study characteristics, and quality of the evidence. The main domains of GRADE include (1) the RoB of the included studies; (2) inconsistency, or substantial heterogeneity (I²) that remains unexplained after subgroup analysis; (3) imprecision of the results (the confidence interval from the meta-analysis), considering the minimal clinically important difference for each outcome; (4) the risk of publication bias, assessed through the funnel plot and statistical analysis; and (5) indirectness related to the study intervention or population.12

An SR that complies with the described standards should always be more informative than the individual studies that compose it. Nevertheless, the final step in the critical analysis of any SR and meta-analysis is an assessment of its applicability to the "real-world" scenario, including patient values and preferences and cost-effectiveness.1–3

After a good understanding of these methodological considerations, clinicians and researchers should be aware of the limitations of every SR. The main limitations of any SR relate to the PICOS question (too general or too specific); an inaccurate literature search and study selection (pooling together dissimilar studies, like "apples and oranges"); the quality of the included studies (measured by the RoB); publication bias; and heterogeneity among the included studies. Finally, in many cases, especially in translational and basic science research (where study size is restricted to prespecified clinical and environmental conditions and experiments, with consequently high interstudy variability in experiments and methods), SRs and meta-analyses do not provide the best scientific evidence.


Funding

The authors declare no funding for this study.

Conflict of interest

The authors declare no competing interests and no financial support.

References

1. Cochrane Handbook for Systematic Reviews of Interventions, version 6.2 (updated February 2021). Cochrane.
2. L.M. Letelier, J.J. Manríquez, G. Rada. Revisiones sistemáticas y metaanálisis: ¿son la mejor evidencia? [Systematic reviews and meta-analysis: are they the best evidence?]. Rev Med Chil, 133 (2005), pp. 246-249.
3. A.P. Siddaway, A.M. Wood, L.V. Hedges. How to do a systematic review: a best practice guide for conducting and reporting narrative reviews, meta-analyses, and meta-syntheses. Annu Rev Psychol, 70 (2019), pp. 747-770.
4. D. Moher, L. Shamseer, M. Clarke, D. Ghersi, A. Liberati, M. Petticrew, et al.; PRISMA-P Group. Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015 statement. Syst Rev, 4 (2015), p. 1.
5. T. McGinn, P.C. Wyer, T.B. Newman, S. Keitz, R. Leipzig, et al. Tips for learners of evidence-based medicine: 3. Measures of observer variability (kappa statistic). CMAJ, 171 (2004), pp. 1369-1373.
6. A. Liberati, D.G. Altman, J. Tetzlaff, C. Mulrow, P.C. Gøtzsche, J.P. Ioannidis, et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. BMJ, 339 (2009), b2700.
7. M.D.F. McInnes, D. Moher, B.D. Thombs, T.A. McGrath, P.M. Bossuyt; PRISMA-DTA Group. Preferred reporting items for a systematic review and meta-analysis of diagnostic test accuracy studies: the PRISMA-DTA statement. JAMA, 319 (2018), pp. 388-396.
8. D.F. Stroup, J.A. Berlin, S.C. Morton, I. Olkin, G.D. Williamson, D. Rennie, et al. Meta-analysis of observational studies in epidemiology: a proposal for reporting. Meta-analysis Of Observational Studies in Epidemiology (MOOSE) group. JAMA, 283 (2000), pp. 2008-2012.
9. P.F. Whiting, A.W. Rutjes, M.E. Westwood, S. Mallett, J.J. Deeks, J.B. Reitsma, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med, 155 (2011), pp. 529-536.
10. J.M. Hootman, J.B. Driban, M.R. Sitler, K.P. Harris, N.M. Cattano. Reliability and validity of three quality rating instruments for systematic reviews of observational studies. Res Synth Methods, 2 (2011), pp. 110-118.
11. J.A. Sterne, M.A. Hernán, B.C. Reeves, J. Savović, N.D. Berkman, M. Viswanathan, et al. ROBINS-I: a tool for assessing risk of bias in non-randomised studies of interventions. BMJ, 355 (2016), i4919.
12. G.H. Guyatt, A.D. Oxman, G.E. Vist, R. Kunz, Y. Falck-Ytter, P. Alonso-Coello, et al. GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ, 336 (2008), pp. 924-926.
13. R. DerSimonian, N. Laird. Meta-analysis in clinical trials. Control Clin Trials, 7 (1986), pp. 177-188.
14. X. Sun, M. Briel, S.D. Walter, G.H. Guyatt. Is a subgroup effect believable? Updating criteria to evaluate the credibility of subgroup analyses. BMJ, 340 (2010), c117.
15. S.G. Thompson, J.P. Higgins. How should meta-regression analyses be undertaken and interpreted? Stat Med, 21 (2002), pp. 1559-1573.
Copyright © 2021. SEPAR
Archivos de Bronconeumología
