Statistical and mathematical modeling in the coronavirus epidemic: some considerations to minimize biases in the results

Matabuena, Marcos; Padilla, Oscar Hernan Madrid; Gonzalez-Barcala, Francisco-Javier

doi:10.1016/j.arbr.2020.04.006

Archivos de Bronconeumología

ISSN: 0300-2896


The new coronavirus (SARS-CoV-2) [1,2] has demonstrated the heavy health and socioeconomic impact that an epidemic can have worldwide. In the face of such pandemics, governments and health authorities must act quickly [3] and implement policies that aim to limit the transmission of the virus, avoid the collapse of the health system, and reduce the associated morbidity and mortality. All of these strategies are driven by the need to prioritize resources in settings where they are scarce. In this respect, mathematical models can play a key role in supporting decision-making. These tools are potentially useful for explaining and predicting the speed and manner in which the virus spreads, supporting health planning, identifying and stratifying patient risk, and establishing prognosis from electronic records.

A crucial consideration in mathematical modeling is that the data collected are usually observational. This may lead to significant bias when conventional statistical techniques are applied routinely [4]. Another important factor is incomplete information [5], such as censored and missing data. Because many people are never tested, their infection status remains unknown. In addition, for many patients endpoints such as recovery or death have not yet been reached when the data are analyzed. Moreover, patients with no symptoms or mild symptoms are the least likely to visit a doctor or have a diagnostic test. Ignoring the effects of missing or censored data may therefore introduce significant bias into the conclusions reached [5].
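
The effect of outcomes that have not yet been observed can be illustrated with a toy calculation. The sketch below assumes an epidemic whose daily reported cases grow exponentially, a single fixed delay between case report and death, and a known true fatality risk. All function names and numbers are hypothetical and serve only to illustrate the mechanism; they are not taken from the cited studies.

```python
def daily_cases(day, initial=10.0, growth=1.15):
    """Hypothetical expected number of new cases reported on a given day."""
    return initial * growth ** day


def cumulative_cases(last_day):
    """Expected cumulative cases reported on days 0..last_day (0 if last_day < 0)."""
    if last_day < 0:
        return 0.0
    return sum(daily_cases(d) for d in range(last_day + 1))


def cumulative_deaths(last_day, true_risk, delay):
    """Expected deaths observed by last_day.

    Assumption: each case dies with probability true_risk exactly
    `delay` days after being reported.
    """
    return true_risk * cumulative_cases(last_day - delay)


def naive_fatality(last_day, true_risk, delay):
    """Deaths so far divided by all cases so far (ignores pending outcomes)."""
    return cumulative_deaths(last_day, true_risk, delay) / cumulative_cases(last_day)


def delay_adjusted_fatality(last_day, true_risk, delay):
    """Deaths so far divided by the cases old enough to have reached the endpoint."""
    return cumulative_deaths(last_day, true_risk, delay) / cumulative_cases(last_day - delay)


if __name__ == "__main__":
    print("naive:", naive_fatality(30, 0.02, 14))
    print("delay-adjusted:", delay_adjusted_fatality(30, 0.02, 14))
```

Under these assumptions the naive ratio falls well below the true risk while the epidemic is growing, because most recent cases have not yet had time to reach an endpoint. In real data the delay is a distribution rather than a constant, so this adjustment should be read as an illustration only.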

From a statistical point of view, the study design may be more important than the amount of data collected. However, in a health emergency, governments may be overwhelmed and data may be collected from severe cases only. To determine the actual extent of the pandemic, random population sampling is necessary. Clear exceptions in the SARS-CoV-2 crisis are South Korea and Singapore, where the population was tested systematically, allowing outbreaks to be isolated sooner and the effects of the virus to be mitigated more quickly than in other countries.
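
The contrast between counting only severe cases and testing a random sample can be sketched with a small simulation. The population size, the 10% infection probability and the 20% probability that an infection is severe are hypothetical values chosen for illustration, as are the function names.

```python
import random


def simulate_population(n, seed=0):
    """Return a list of (infected, severe) flags for n hypothetical people."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        infected = rng.random() < 0.10            # assumed infection probability
        severe = infected and rng.random() < 0.20  # assumed share of severe infections
        people.append((infected, severe))
    return people


def prevalence_from_random_sample(people, sample_size, seed=1):
    """Estimate prevalence by testing a simple random sample of the population."""
    rng = random.Random(seed)
    sample = rng.sample(people, sample_size)
    return sum(infected for infected, _ in sample) / sample_size


def prevalence_from_severe_cases_only(people):
    """Estimate prevalence when only severe cases are ever detected."""
    return sum(severe for _, severe in people) / len(people)


if __name__ == "__main__":
    population = simulate_population(100_000)
    truth = sum(infected for infected, _ in population) / len(population)
    print("true prevalence:", truth)
    print("random sample:", prevalence_from_random_sample(population, 5_000))
    print("severe cases only:", prevalence_from_severe_cases_only(population))
```

In this sketch a random sample of a few thousand people recovers the prevalence to within sampling error, whereas a registry of severe cases understates it regardless of how many records it contains.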

From an epidemiological point of view, it is important to highlight the need to identify variables that indicate patient risk and prognosis. The most popular indicator is undoubtedly the mortality risk, which measures the likelihood that a patient with the disease will die. Precise estimation is not simple and, as indicated above, biases are common given the observational nature of the recorded data. According to Lipsitch et al. [6], biases occur because of delays in recording information or because patients at higher risk are over-represented in the database. A potential solution is to stratify patients into groups based on their severity and prognosis, which can improve the statistical inference drawn for the patients in each stratum. The use of specific techniques for causal inference or missing data, such as propensity scores or doubly robust estimators, is also recommended [7].
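
As a minimal sketch of how such a correction might look, the code below compares a complete-case estimate of mortality with an augmented inverse-probability-weighted (AIPW) estimate, one common form of doubly robust estimator. It assumes a single binary covariate (severe or not) and that whether a patient's outcome is recorded depends only on that covariate. The simulated probabilities and all function names are hypothetical.

```python
import random


def simulate(n, seed=0):
    """Rows of (severe, recorded, died); died is None when the outcome is not recorded."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        severe = 1 if rng.random() < 0.2 else 0
        died = 1 if rng.random() < (0.10 if severe else 0.01) else 0
        # Assumption: severe patients are far more likely to enter the database.
        recorded = 1 if rng.random() < (0.9 if severe else 0.2) else 0
        rows.append((severe, recorded, died if recorded else None))
    return rows


def complete_case_mean(rows):
    """Mortality among recorded patients only (biased toward severe cases)."""
    observed = [y for _, r, y in rows if r]
    return sum(observed) / len(observed)


def aipw_mean(rows):
    """AIPW estimate of population mortality.

    pi[x] estimates P(recorded | severity x), the propensity of being observed;
    m[x] estimates E[died | severity x, recorded], the outcome model.
    Both are estimated here by simple stratum frequencies.
    """
    pi, m = {}, {}
    for x in (0, 1):
        stratum = [(r, y) for s, r, y in rows if s == x]
        observed = [y for r, y in stratum if r]
        pi[x] = len(observed) / len(stratum)
        m[x] = sum(observed) / len(observed)
    total = 0.0
    for x, r, y in rows:
        weighted_outcome = (y / pi[x]) if r else 0.0
        augmentation = (r - pi[x]) / pi[x] * m[x]
        total += weighted_outcome - augmentation
    return total / len(rows)


if __name__ == "__main__":
    data = simulate(100_000)
    print("complete cases:", complete_case_mean(data))
    print("AIPW:", aipw_mean(data))
```

With a single binary covariate and frequency-based models, the AIPW estimate coincides with a severity-stratified estimate reweighted to the whole population, which ties it to the stratification idea described above. With continuous covariates, the propensity and outcome models would have to be fitted by regression, and the estimator remains consistent if either of the two models is correctly specified.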

The large discrepancies in the reported proportion of asymptomatic patients and in the mortality risk associated with SARS-CoV-2 underline the need to adopt these approaches. On March 5, 2020, the percentage of asymptomatic patients reported by the European Centre for Disease Prevention and Control was 80%. However, in a study of patients from the Diamond Princess cruise ship, this figure was 20% [9]. In the latter case, the study sample comprised a greater proportion of older patients with a higher probability of presenting symptoms, making it difficult to extrapolate the conclusions to the general population. Similarly, estimates of the fatality rate vary widely (between 0.4% and 15% [10]), partly because of the problems mentioned above. Precise characterization of these variables according to the epidemiological profile of the population is essential to understand the transmission mechanisms of the virus [11] and predict future care demands.
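
One way to see why a sample with an unusual age structure cannot be extrapolated directly is direct standardization: stratum-specific proportions are reweighted to the age distribution of the target population. The sketch below uses two age groups with entirely hypothetical proportions and weights; none of these numbers come from the Diamond Princess study or from any other cited source.

```python
def weighted_proportion(stratum_proportions, weights):
    """Average stratum-specific proportions using the given population weights."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(stratum_proportions[k] * weights[k] for k in stratum_proportions)


# Hypothetical proportion of asymptomatic infections in each age group.
asymptomatic_by_age = {"under_60": 0.50, "60_plus": 0.15}

# Hypothetical age structures: an older study sample and a younger general population.
sample_age_weights = {"under_60": 0.25, "60_plus": 0.75}
population_age_weights = {"under_60": 0.75, "60_plus": 0.25}

crude_in_sample = weighted_proportion(asymptomatic_by_age, sample_age_weights)
standardized = weighted_proportion(asymptomatic_by_age, population_age_weights)

if __name__ == "__main__":
    print("crude proportion in the study sample:", crude_in_sample)
    print("standardized to the general population:", standardized)
```

Even though the age-specific proportions are identical in both calculations, the overall figure changes substantially with the age structure, which is one reason why crude estimates from different settings can disagree.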

A basic criticism of epidemic modeling is that parameters are frequently fitted to government-provided statistics on infected subjects, even though very few countries can provide clear evidence that these figures reflect the real situation, given the unknown percentage of asymptomatic patients and the lack of widespread testing in the population. In fact, asymptomatic patients may be the main transmitters of the virus [11].
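
The gap between reported and actual infections can be made concrete with a simple compartmental model. The sketch below runs a discrete-time SIR model with a one-day step and then assumes that only a constant fraction of infections is ever reported. The transmission rate, recovery rate, population size and ascertainment fraction are hypothetical, and a constant ascertainment fraction is itself a strong simplifying assumption.

```python
def simulate_sir(beta, gamma, population, initial_infected, days):
    """Discrete-time (one-day step) SIR model; returns daily new infections."""
    susceptible = float(population - initial_infected)
    infected = float(initial_infected)
    new_infections_per_day = []
    for _ in range(days):
        new_infections = beta * susceptible * infected / population
        recoveries = gamma * infected
        susceptible -= new_infections
        infected += new_infections - recoveries
        new_infections_per_day.append(new_infections)
    return new_infections_per_day


def reported_counts(daily_infections, ascertainment):
    """Daily counts that reach official statistics if only a fixed fraction is detected."""
    return [ascertainment * x for x in daily_infections]


def attack_rate(daily_counts, population):
    """Cumulative share of the population infected (initial cases excluded)."""
    return sum(daily_counts) / population


if __name__ == "__main__":
    n = 1_000_000
    true_daily = simulate_sir(beta=0.3, gamma=0.1, population=n, initial_infected=10, days=300)
    official = reported_counts(true_daily, ascertainment=0.2)
    print("true attack rate:", attack_rate(true_daily, n))
    print("attack rate implied by reported cases:", attack_rate(official, n))
```

A model calibrated to the reported series as if it were complete would conclude that most of the population is still susceptible, while in this simulated scenario the opposite is true. This is the kind of discrepancy that population-level testing is meant to detect.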

Mathematical models can be an important tool for anticipating future developments and supporting decision-making. However, if data are inaccurate and specific techniques to correct for the observational nature of the recorded data are not used, conclusions may be biased. In this regard, all relevant institutions should make an effort to provide high-quality data openly [12], so that scientists can find the solutions most beneficial to society. At the same time, in the current era of big data [13], collaboration between different stakeholders (health management, care, research, etc.) is essential. The use of big data would facilitate the construction of more complex models that take advantage of all the data recorded from individual patient monitoring [14] and thereby provide more agile responses to current epidemics [15].

Funding

This work has received financial support from the Consellería de Cultura, Educación e Ordenación Universitaria (accreditation 2019–2022 ED431G-2019/04) and the European Regional Development Fund (ERDF), which acknowledges the CiTIUS-Research Center in Intelligent Technologies of the University of Santiago de Compostela as a Research Center of the Galician University System.

Conflict of interests

The first two authors state that they have no conflicts of interest.

Francisco-Javier Gonzalez-Barcala has received honoraria for consultancy, projects or presentations from Chiesi, Menarini, Rovi, Bial, GlaxoSmithKline, Laboratorios Esteve, Teva, Gebro Pharma, ALK, Roxall, Stallergenes-Greer, Boehringer Ingelheim, Mundipharma and Novartis.

References

[1]

C. Huang, Y. Wang, X. Li, L. Ren, J. Zhao, Y. Hu, et al.

Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China.

Lancet, 395 (2020),

[2]

N. Zhu, D. Zhang, W. Wang, X. Li, B. Yang, J. Song, et al.

A novel coronavirus from patients with pneumonia in China, 2019.

N Engl J Med, 382 (2020), pp. 727-733

http://dx.doi.org/10.1056/NEJMoa2001017

[3]

I. Kickbusch, G. Leung.

Response to the emerging novel coronavirus outbreak.

BMJ, 368 (2020),

http://dx.doi.org/10.1136/bmj.l6968

[4]

S. Greenland.

Multiple-bias modelling for analysis of observational data.

J R Stat Soc Ser A Stat Soc, 168 (2005), pp. 267-306

[5]

A. Tsiatis.

Semiparametric theory and missing data.

Springer Science & Business Media, (2007),

[6]

M. Lipsitch, C. Donnelly, C. Fraser, I. Blake, A. Cori, I. Dorigatti, et al.

Potential biases in estimating absolute and relative case-fatality risks during outbreaks.

PLoS Negl Trop Dis, 9 (2015),

http://dx.doi.org/10.1371/journal.pntd.0004151

[7]

H. Bang, J. Robins.

Doubly robust estimation in missing data and causal inference models.

Biometrics, 61 (2006), pp. 962-973

http://dx.doi.org/10.1111/j.1541-0420.2005.00377.x

[8]

R.M. Anderson, H. Heesterbeek, D. Klinkenberg, T.D. Hollingsworth.

How will country-based mitigation measures influence the course of the COVID-19 epidemic?

Lancet, 395 (2020), pp. 931-934

http://dx.doi.org/10.1016/S0140-6736(20)30567-5

[9]

K. Mizumoto, K. Kagaya, A. Zarebski, G. Chowell.

Estimating the asymptomatic proportion of coronavirus disease 2019 (COVID-19) cases on board the Diamond Princess cruise ship, Yokohama, Japan, 2020.

Euro Surveill, 25 (2020),

http://dx.doi.org/10.2807/1560-7917.ES.2020.25.32.2001410

[10]

D.D. Rajgor, M.H. Lee, S. Archuleta, N. Bagdasarian, S.C. Quek.

The many estimates of the COVID-19 case fatality rate.

Lancet Infect Dis, (2020),

[11]

Y. Bai, L. Yao, T. Wei, F. Tian, D.-Y. Jin, L. Chen, et al.

Presumed asymptomatic carrier transmission of COVID-19.

JAMA, (2020),

http://dx.doi.org/10.1001/jama.280.3.292

[12]

S.P. Layne, J.M. Hyman, D.M. Morens, J.K. Taubenberger.

New coronavirus outbreak: framing questions for pandemic prevention.

Sci Transl Med, 12 (2020),

[13]

N.G. Reich, L.C. Brooks, S.J. Fox, S. Kandula, C.J. McGowan, E. Moore, et al.

A collaborative multiyear, multimodel assessment of seasonal influenza forecasting in the United States.

Proc Natl Acad Sci U S A, 116 (2019), pp. 3146-3154

http://dx.doi.org/10.1073/pnas.1812594116

[14]

X. Li, J. Dunn, D. Salins, G. Zhou, W. Zhou, S.M. Schüssler-Fiorenza Rose, et al.

Digital health: tracking physiomes and activity using wearable biosensors reveals useful health-related information.

PLoS Biol, 15 (2017), pp. e2001402

http://dx.doi.org/10.1371/journal.pbio.2001402

[15]

C. Viboud, A. Vespignani.

The future of influenza forecasts.

Proc Natl Acad Sci U S A, 116 (2019), pp. 2802-2804

http://dx.doi.org/10.1073/pnas.1822167116

Please cite this article as: Matabuena M, Padilla OHM, Gonzalez-Barcala FJ. Modelado estadístico y matemático en la epidemia del coronavirus: algunas consideraciones para minimizar los sesgos en los resultados. Arch Bronconeumol. 2020;56:601–602.