Covid-19 origins: Weighing the odds

The Covid-19 origins debate has been contentious from the start. The only certainty is that something very unlikely happened in Wuhan at the end of 2019.

Since the beginning of the Covid-19 pandemic, the question of the virus’s origins has often been dominated by arguments about whether the SARS-CoV-2 virus was engineered or not — a compelling debate but an acrimonious one, and one that’s only part of a larger question about whether the virus (in natural form or otherwise) was being stored and researched at any of several laboratories in Wuhan. Those labs are the biosafety level-2 (BSL-2), BSL-3, and BSL-4 labs at the Wuhan Institute of Virology (WIV) and the labs at the CDC facilities in Wuhan.

Putting aside completely the matter of whether the virus was engineered, there’s another question that could have been asked even before the pandemic began: What are the chances of a novel coronavirus outbreak emerging naturally out of Wuhan, versus the chances of a coronavirus outbreak emerging from one of the Wuhan labs storing and researching coronaviruses?

My estimates suggest that neither scenario is extremely unlikely compared to the other:

  1. For a lab-related incident, an outbreak of 1,000 or more coronavirus infections from one of the Wuhan labs might happen once in 500 to 50,000 years.
  2. For natural spillover in Wuhan, an outbreak of 1,000 or more coronavirus infections might happen once in 1,000 to 14,000 years.

In the absence of direct evidence, the only thing we know for sure is that one of these two low-probability events happened at the end of 2019.

What was the risk of an outbreak from one of the Wuhan labs?

If the SARS-CoV-2 virus was inside one of the Wuhan labs, what was the risk of it escaping and causing a significant outbreak — an outbreak that was more likely than not to spread out of control as local, national, and then international authorities all failed to contain it?

Ideally, we would answer this question by looking at the actual risk assessments for the labs in Wuhan. But in the absence of those risk assessments, we’ll turn to other resources — such as risk assessments prepared for other labs.

In the early 2000s, plans were announced for infectious-disease labs at Boston University. An initial risk assessment for the labs was reviewed by the National Research Council and found to have serious shortcomings including a failure to address worst-case scenarios. When the issue came before a Massachusetts court, the court requested a more rigorous study of the labs’ potential risks.

The full, final risk assessment¹ ran to 2,716 pages and looked at the risks associated with 13 pathogens. After its publication, the National Institutes of Health, which had commissioned the risk assessment, expressed a high degree of confidence in the Boston labs based on “the safeguards built into the facility, the low amounts of pathogens that will be present, and the culture of biosafety and training that will be integrated into everyday practice”.

The assessment found that the risk to the public was extremely low with the exception of infections involving 1918 H1N1 influenza and SARS. As noted on the final page of the risk-assessment summary, infections from a release of 1918 H1N1 influenza or SARS might occur over 500–5,000 years of the labs’ operation. (The planned operational life of a typical lab is only a few decades, but risk assessments call for us to imagine labs where researchers store and experiment on viruses year after year, century after century, millennium after millennium.)

If we’re to use the Boston labs’ risk assessment as a reference as we estimate the risks for the Wuhan labs, we’ll first have to assume that the Wuhan labs have been built, maintained, and operated to the same standards as the Boston labs. We’ll have to assume that the storage procedures, research activities, waste-disposal practices, and virus-collecting field trips are all comparable. But we’ll have to concede that none of the pathogens covered by the Boston labs’ risk assessment (including 1918 H1N1 influenza and SARS) is exactly equivalent to SARS-CoV-2.

For a lab leak of the 1918 H1N1 flu virus, the Boston labs’ risk assessment summary reported that one or more people in the community might be accidentally infected once in 100–10,000 years. (Note that once in 100–10,000 years is not the risk of a 1918 H1N1 flu pandemic emerging from the Boston labs. Although pandemics start with one person, the infection of just one person is not likely to lead to a pandemic. In most cases, one person or several will be infected and then the outbreak will peter out or be contained.)

As the Boston labs’ risk assessment notes, predicting larger outbreaks is more difficult because the uncertainties are much greater. In the risk assessment summary for the Boston labs, an outbreak of 1,000 or more infections of 1918 H1N1 flu virus is estimated to occur once in 10,000–1 million years.

Yet there’s only so far this comparison can take us in terms of determining the probability of a large coronavirus outbreak (an outbreak large enough to have pandemic potential) emerging from one of the Wuhan labs. The Boston University biolabs are not Wuhan’s labs. And 1918 H1N1 influenza and SARS are not SARS-CoV-2. Even so, the risk assessment for the Boston labs does provide us with some sense of the magnitude of risk at a well-run biolab studying dangerous pathogens. And with no access to the actual risk assessment for the Wuhan labs, we might use once in 10,000–1 million years as the starting point for estimating the risk of an outbreak of 1,000 or more coronavirus infections from one of the Wuhan labs collecting and experimenting on coronaviruses.

Whether the Wuhan Institute of Virology was as well run as the Boston labs is questionable. It has been confirmed by Shi Zhengli, Director of the WIV’s Centre for Emerging Infectious Diseases, that the WIV was doing coronavirus research in BSL-2 labs rather than exclusively at BSL-3 labs (the biosafety level used at the Boston labs for research into H1N1 flu and SARS viruses). It has also been alleged that WIV field teams were conducting risky virus collecting in bat caves without appropriate precautions, that they were using inadequate protective clothing and equipment. Together these vulnerabilities would have increased the risk of a coronavirus outbreak related to lab research. But by how much? A factor of 10? A factor of 20? Could the risk of 1,000 or more coronavirus infections in the community have been as high as once in 500 to 50,000 years?
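The scaling in the paragraph above is simple division, and it can be sketched in a few lines. This is only an illustration: it assumes the Boston baseline of once in 10,000 to 1 million years quoted earlier, and it treats the risk multiplier as a free parameter. The function name and the candidate multipliers are hypothetical and are not taken from the risk assessment itself.

```python
def scale_recurrence_interval(baseline_years, risk_factor):
    """Shorten a (low, high) recurrence interval when annual risk is multiplied by risk_factor."""
    low, high = baseline_years
    # A risk that is N times higher means the event is expected N times as often.
    return (low / risk_factor, high / risk_factor)


# Baseline from the Boston risk assessment summary, as quoted in the text.
BOSTON_BASELINE = (10_000, 1_000_000)

for factor in (10, 20):  # illustrative multipliers only
    low, high = scale_recurrence_interval(BOSTON_BASELINE, factor)
    print(f"risk x{factor}: once in {low:,.0f} to {high:,.0f} years")
```

Under these assumptions, a factor of 20 turns the Boston baseline into the once in 500 to 50,000 years range used in this article.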

Coronavirus pandemics — historically few and far between

In March 2021, NPR published a story about coronavirus pandemics yet to come. NPR interviewed researchers who warned that another pandemic caused by a different coronavirus might strike very soon.

Edward Holmes, an evolutionary biologist at the University of Sydney, described to NPR how coronaviruses are under our feet in rodents, above our heads in bats, how we live in “a kind of coronavirus world”. NPR mentioned that coronaviruses are carried by many other animals including dogs, cats, birds, chickens, and pigs.

NPR also spoke with Peter Daszak, a virus researcher and president of the non-profit EcoHealth Alliance (an infectious disease research NGO). Dr Daszak estimated that more than a million people in Southeast Asia are infected with unknown coronaviruses from bats each year. Obviously, these infections don’t all lead to pandemics, but as NPR pointed out, every infection is an opportunity for a coronavirus to adapt and spread.

From reading NPR’s article, it seems remarkable that we went so long without a coronavirus pandemic. With coronaviruses all around us and more than a million people infected with unidentified coronaviruses each year, maybe it was just a run of good luck that stopped coronavirus pandemics from striking humanity more often.

Yet coronavirus pandemics seem to have been uncommon in the past. Before the SARS-CoV-2 virus, there were only four known endemic coronaviruses in the human population. Still circulating after past pandemics, these are the common-cold viruses HCoV-229E, -OC43, -NL63, and -HKU1.

Ralph Baric, a coronavirus researcher at the University of North Carolina at Chapel Hill, spoke with The Irish Times in February 2021 about the history of the four known endemic coronaviruses. Looking back over the past millennium, Dr Baric told The Irish Times that three of the four coronaviruses jumped from bats to humans between 200 and 800 years ago, and that the fourth — OC43 — was likely the cause of the “Russian flu” pandemic in the 1890s.

Five coronavirus pandemics in the past thousand years might suggest they’ve been happening once every 200 years or so. That isn’t to say that there won’t be another coronavirus pandemic this year or next year, but on average they’ve been occurring once every 200 years.

The fact that there have been just four (now five) known coronavirus pandemics over the past thousand years shouldn’t lead us to underestimate the likelihood of future ones. For one thing, we don’t know if there were additional human coronaviruses that emerged as pandemics over the past millennium, coronaviruses for which researchers have no record. It should be noted, too, that the world’s population today is many times larger than it was a thousand years ago or even a hundred years ago, and more people means many more opportunities for viruses to jump from animals to human beings and cause pandemics.

A novel coronavirus outbreak in Wuhan — a 1-in-1,000-year event?

A coronavirus pandemic emerging somewhere in the world was probably going to happen at some point this century — experts had long predicted one. But a coronavirus pandemic emerging in any one specific location (including Wuhan) is a low-probability event.

We won’t try to estimate the likelihood of a pandemic emerging out of Wuhan. Instead, as we did when estimating the lab-related risk, we’ll look at how often a novel coronavirus outbreak of 1,000 or more infections might occur in Wuhan, a city of 11 million, in a country of 1.4 billion, in a wider region of around 4 billion (China, South Asia, and Southeast Asia), in a world of close to 8 billion.

As a starting point we’ll assume that globally there will be five outbreaks per century (on average) of 1,000 or more infections of novel coronavirus. So far this century we’ve had two such outbreaks — SARS and Covid-19. Arguably MERS too, although no single MERS outbreak has been close to 1,000 cases.

If we assume that the chances of an outbreak in Wuhan are the same as everywhere else in the world, then the chances are roughly once in 14,000 years: one outbreak somewhere in the world every 20 years, with Wuhan’s 11 million people making up about 1 in 730 of the world’s population (we’ll use this as the upper end of our range). But because coronavirus outbreaks are more likely in the region adjacent to Wuhan, more likely to spread in a big city like Wuhan, and more likely to be detected in a large modern city like Wuhan, we’ll assign a lower end of once in 1,000 years (the number is arbitrary, but if there’s an argument for why it’s likely to be more often than once in 1,000 years, it would be good to hear it).
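The population-share arithmetic behind the once-in-14,000-years figure can be written out as a short sketch. It assumes, as the text does, five outbreaks per century worldwide and risk spread evenly per person. The population figures are the rounded ones quoted above, and the function name is made up for illustration.

```python
WORLD_POPULATION = 8_000_000_000   # "close to 8 billion", rounded
WUHAN_POPULATION = 11_000_000      # city population quoted in the text
GLOBAL_OUTBREAKS_PER_CENTURY = 5   # the article's starting assumption


def local_recurrence_years(outbreaks_per_century, local_population, world_population):
    """Years between outbreaks in one place if global risk is spread evenly per person."""
    global_rate_per_year = outbreaks_per_century / 100
    local_rate_per_year = global_rate_per_year * (local_population / world_population)
    return 1 / local_rate_per_year


years = local_recurrence_years(
    GLOBAL_OUTBREAKS_PER_CENTURY, WUHAN_POPULATION, WORLD_POPULATION
)
print(f"once in about {years:,.0f} years")
```

The result is a little over 14,500 years, which the article rounds down to 14,000.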

So there we have it: a back-of-an-envelope calculation for natural spillover in Wuhan that estimates an outbreak of 1,000 or more novel coronavirus infections might happen in the city once in 1,000 to 14,000 years.

Natural origins? Or via a lab? Dismissing either hypothesis serves no good

In January and February 2021, an international team selected by the World Health Organization joined Chinese experts in Wuhan to look into the origins of the Covid-19 pandemic.

At the end of their mission — following a discussion and a vote — the WHO team reached consensus on the likelihood of each origins hypothesis. The team used a scale of relative terms — “extremely unlikely”, “unlikely”, “possible”, “likely”, and “very likely”. It’s not clear (to this writer) what percentages the WHO team assigned to these relative terms. For example, does “extremely unlikely” mean less than 1% probability (we’ll assume it does), or does it mean less than 5% probability?

The WHO team dismissed laboratory origins as “extremely unlikely” and said that there was no merit in investigating lab origins any further (months later, Peter Ben Embarek, the WHO team’s leader, seemed to walk back this assertion). Instead, they went all in on the natural origins hypotheses. The WHO team judged origins through an intermediate animal host to be “likely to very likely” and they judged direct zoonotic spillover to be “possible to likely”. Origins through frozen food was judged to be “possible”. (Two — maybe even three — of the four scenarios were given likelihoods in the upper half of the WHO’s scale, suggesting they each had a probability of 50% or more, which is perplexing because surely the sum of all probabilities should equal 100%.)

Earlier on, we looked at the likelihood of a novel coronavirus outbreak emerging naturally out of Wuhan and wondered whether it might be as high as a 1-in-1,000-year event. A 1-in-1,000-year event could qualify as a “very likely” cause of the pandemic (the WHO team’s assessment of natural origins) as long as all other possible pathways are much less likely.

What of the lab-leak scenario? How would we define this “extremely unlikely” event (the WHO team’s assessment) when comparing it to a “very likely” 1-in-1,000-year event?

Clearly, the lab-leak scenario can’t be a 1-in-10,000-year event because that would mean there’s nearly a 10% chance of Covid-19 having lab origins — a 10% chance falls short of “extremely unlikely”.

Nor can the lab-leak scenario be a 1-in-20,000-year event because that would still give lab origins just under a 5% chance, which also falls short of “extremely unlikely”.

So what would be defined as “extremely unlikely” according to the WHO team’s scale? A 1-in-100,000-year event?
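The percentages in the last few paragraphs come from comparing two annual rates. A minimal sketch of that comparison follows. It assumes that natural spillover and a lab incident are the only two pathways, and that the chance of each being the cause is proportional to its annual rate. The 1% cut-off for “extremely unlikely” is the assumption made earlier in this article, not a figure published by the WHO team, and the function names are hypothetical.

```python
def lab_share(natural_interval_years, lab_interval_years):
    """Fraction of the combined annual outbreak rate that comes from the lab pathway."""
    natural_rate = 1 / natural_interval_years
    lab_rate = 1 / lab_interval_years
    return lab_rate / (natural_rate + lab_rate)


def lab_interval_for_share(natural_interval_years, share):
    """Lab recurrence interval (years) at which the lab share equals `share`."""
    # Rearranged from share = natural_interval / (natural_interval + lab_interval).
    return natural_interval_years * (1 - share) / share


NATURAL_INTERVAL = 1_000  # the "very likely" 1-in-1,000-year natural event

for lab_interval in (10_000, 20_000, 100_000):
    share = lab_share(NATURAL_INTERVAL, lab_interval)
    print(f"lab event once in {lab_interval:,} years -> lab share {share:.1%}")

threshold = lab_interval_for_share(NATURAL_INTERVAL, 0.01)
print(f"lab share falls below 1% beyond roughly {threshold:,.0f} years")
```

Under these assumptions the three lab intervals give shares of about 9%, just under 5%, and just under 1%, and the 1% line is crossed at roughly once in 99,000 years.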

It is bold to claim that a coronavirus outbreak would not be expected to emerge from a coronavirus research lab in 10,000 to 100,000 years of the lab’s continuous operation. But this, it seems, is what those who dismiss the lab-leak hypothesis are implying. This is an extraordinary amount of confidence to have in the security and safety procedures of labs that we don’t work at, labs that very few of us have ever visited, labs whose full suite of research activities we can’t be certain of and that we certainly can’t vouch for.

Here again are my own estimates for the natural spillover and lab-leak scenarios. Two low-probability events, one of which happened, neither of which seems extremely unlikely compared to the other:

  1. For a lab-related incident, an outbreak of 1,000 or more coronavirus infections might happen once in 500 to 50,000 years.
  2. For natural spillover in Wuhan, an outbreak of 1,000 or more coronavirus infections might happen once in 1,000 to 14,000 years.
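To see how the two ranges compare, one can pair each end of the lab range with each end of the natural range. The sketch below does that, assuming these are the only two pathways and that the chance of each being the cause is proportional to its annual rate. It is a rough illustration built on the back-of-an-envelope ranges above, not a formal probability model, and the names are made up for the example.

```python
LAB_RANGE_YEARS = (500, 50_000)        # lab-related estimate from the list above
NATURAL_RANGE_YEARS = (1_000, 14_000)  # natural-spillover estimate from the list above


def lab_fraction(lab_interval_years, natural_interval_years):
    """Lab share of the combined annual rate, given the two recurrence intervals."""
    return natural_interval_years / (natural_interval_years + lab_interval_years)


# Evaluate every pairing of range endpoints.
fractions = {
    (lab, natural): lab_fraction(lab, natural)
    for lab in LAB_RANGE_YEARS
    for natural in NATURAL_RANGE_YEARS
}

for (lab, natural), fraction in fractions.items():
    print(f"lab 1-in-{lab:,}, natural 1-in-{natural:,}: lab share {fraction:.1%}")
```

Even the pairing least favourable to the lab scenario (lab once in 50,000 years, natural once in 1,000 years) leaves the lab share at about 2%, above the 1% reading of “extremely unlikely” assumed in this article.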

On the Covid-19 origins issue, the most sensible position — from the beginning of 2020 right through to the present moment — has never been to dismiss lab origins as “extremely unlikely”. From the start, the most sensible position has been to acknowledge that lab origins are quite possible. Until there’s direct evidence, it serves no good purpose to rule out either line of inquiry.

But direct evidence may be a long time coming. Perhaps an animal source for the virus will eventually be found, proving natural zoonosis. But if the virus was the result of a lab incident, it may take many years or even decades for the facts to emerge. In time, scientists, journalists, and others may uncover conclusive evidence. Or a whistleblower in China may get the word out. Or, as happens from time to time with authoritarian regimes, future Chinese leaders may reveal the truth if they think it’s in their interests to discredit their predecessors. But until then, the only thing we can be sure of is that one of two very low-probability events happened at the end of 2019.

¹ Page number references for the risk assessment for the infectious disease labs at Boston University: pages 1587–1607 for Chapter 9, detailing the risk assessments of secondary transmissions of initial infections to members of the public; pages 1593–94 (9.3.4) for the risk assessment of 1918 H1N1 flu infections of members of the public; pages 1594–95 (9.3.5) for the risk assessment of SARS infections of the public; page 556 (4.2.3) for a definition of a laboratory needlestick event; page 1798 (11.6) for why a needlestick event was used as the basis of risk assessments of infections of the public (rather than needlestick risk and centrifuge risk being assessed individually). A 23-page risk assessment summary is also available.

A shorter version of this article was published on Medium in May 2021.


