The aim of this post is to raise awareness of the underlying biases in the mortality rate metric commonly used by several institutions. I do not claim that the calculations presented here are accurate, and I would be delighted if you pointed out any mistakes. I have tried to acknowledge sources with links wherever needed. The conclusions presented here may change as the pandemic progresses, so I recommend not taking this blog as your primary source for argument but as a starting point for your own research into deriving meaning from the numbers we are constantly bombarded with.
When a pandemic is ongoing, three questions arise in everyone’s mind:
- How likely is it that I get infected with the disease?
- How likely is it that I die once I get infected?
- How well is my country doing at containing the disease?
We might think that simple calculations would give us estimates for each of these questions:
- Number of infected cases / Total population
- Number of death cases due to the disease / Number of infected cases
- The trend in the number of people recovering daily
Suppose you see a graph with the three terms above, each labeled with a name and a value:
- Infection incidence rate = some percentage
- Case fatality rate = some percentage
- Recovery rate = some percentage
Interpreting them raises numerous questions:
- Do the reported infected cases account for all the ‘actual’ infected cases in the population?
- How do I infer anything when the inputs to these terms keep changing? Could I use, for example, the case fatality rate to compare nations with different populations?
- How does this statistic change when age is taken into account?
- How do daily deaths stack up against the recovery rate?
In this post, I will talk a bit about the case fatality rate (CFR).
Strictly speaking, CFR is not a rate. Most sources I have seen agree that it is the proportion of deaths among confirmed infected cases. To see how CFR changes day by day, we need a time series of this calculation: take each day’s cumulative deaths and express them as a percentage of that day’s cumulative confirmed cases. The black curve (the bottom one) does exactly that. Ignore the other two curves for now.
Dataset: Johns Hopkins University COVID-19 dataset (country-wise time series of confirmed, death, and recovered cases)
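The black-curve calculation is simple enough to sketch. This assumes the cumulative counts for one country have already been pulled out of the dataset into two equal-length Python lists; the function name `naive_cfr_series` and the numbers below are hypothetical, not taken from the Johns Hopkins data.

```python
def naive_cfr_series(cum_deaths, cum_confirmed):
    """Each day's cumulative deaths as a percentage of that day's
    cumulative confirmed cases (the black-curve calculation).

    Days with zero confirmed cases give None, since the ratio is undefined.
    """
    if len(cum_deaths) != len(cum_confirmed):
        raise ValueError("series must have the same length")
    series = []
    for deaths, confirmed in zip(cum_deaths, cum_confirmed):
        series.append(None if confirmed == 0 else 100.0 * deaths / confirmed)
    return series


# Hypothetical cumulative counts for five consecutive days (not real data).
cum_confirmed = [0, 100, 400, 1000, 2000]
cum_deaths = [0, 5, 10, 20, 30]
print(naive_cfr_series(cum_deaths, cum_confirmed))  # → [None, 5.0, 2.5, 2.0, 1.5]
```

Because both inputs are cumulative, every day's value mixes long-closed cases with cases confirmed that morning, which is the root of the biases discussed below.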
Could this measure alone be used to evaluate action plans and policies? Does a lower CFR mean we are “bending the curve”?
One immediate observation is that, because CFR is a ratio, it says nothing on its own about the absolute number of infected cases or deaths. For example:
If one country has 100 cases and five deaths, its CFR is 5 percent. If another country has 100,000 cases and 1,000 deaths, its CFR is 1 percent.
Now, what about CFR as an instrument for measuring the risk of death given that one is infected? It’s a ratio expressed as a percentage, so it should give me that probability accurately, right?
No. While a pandemic is ongoing, CFR cannot accurately tell the risk of death for an infected person. Why?
- It could be overestimated: the denominator, the number of confirmed cases, may not include all the infected cases. The actual number of infected cases might be much larger because many infected people have mild or no symptoms and are never tested, or are tested only once symptoms appear. This could also explain why we see high values at the beginning of the pandemic.
- It could be underestimated: the numerator could be too low because not all confirmed cases are ‘closed’ on a particular day (see the next point).
- Risk should be calculated with the number of ‘closed’ cases in the denominator. Closed cases on a particular day are those whose outcome is known (either death or recovery). The plot above cannot be used to infer risk because some of the currently active cases may later move into the numerator (unfortunately, as deaths), thereby increasing it. We just don’t know that yet. Consider, for example, that on 15 August the reports say 2 out of 100 cases have died so far. Does that mean the risk of death is 0.02? No, we cannot naively conclude that, because we don’t know how many of the remaining 98 cases may die later. Thus CFR is unreliable for estimating the risk of death while the epidemic is ongoing.
- A typical example of underestimating CFR happened during the 2003 SARS-CoV outbreak: the CFR was initially estimated at about 4%, but the final CFR ended up at 9.6%. The rise caused panic that the virus had become deadlier, when in fact the deaths were simply catching up with the earlier cases.
- Apart from all this, we also assume that all deaths have been reported (there is still a question of the reach and focus of health care in underdeveloped parts of a country) and that comorbidity has been ruled out (for example, an infected individual already suffering from another terminal illness who dies of a cause other than the virus).
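The 15 August example can be turned into numbers: with open cases still outstanding, the eventual CFR of that group of 100 is only known to lie within a range. A minimal sketch, assuming no new cases join the group and every open case ends as either a death or a recovery; `eventual_cfr_bounds` is a hypothetical helper, not from any library.

```python
def eventual_cfr_bounds(deaths, recovered, confirmed):
    """Range of the final CFR for a fixed group of confirmed cases.

    Assumes no new cases join the group and every open case eventually
    ends as either a death or a recovery.
    """
    open_cases = confirmed - deaths - recovered  # outcome not yet known
    if confirmed <= 0 or open_cases < 0:
        raise ValueError("need confirmed > 0 and deaths + recovered <= confirmed")
    lower = deaths / confirmed                  # none of the open cases die
    upper = (deaths + open_cases) / confirmed   # every open case dies
    return lower, upper


# The 15 August example: 2 deaths among 100 cases, the other 98 still open.
print(eventual_cfr_bounds(deaths=2, recovered=0, confirmed=100))  # → (0.02, 1.0)
```

If, hypothetically, 60 of the 98 had already recovered, the range would narrow to (0.02, 0.4). Either way the naive 2% sits at the lower end, which is why it tends to underestimate while cases are still open.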
To account for these biases, there are two more estimators of CFR:
- Blue curve (top): number of deaths / (number of deaths + number of recoveries)
- Green curve (second from top): number of deaths / number of confirmed cases T days earlier
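The three curves differ only in their denominators. A minimal sketch that computes all three per day, assuming cumulative deaths, recoveries and confirmed counts for one country are already in equal-length lists; the function name `cfr_estimators` and the toy numbers are hypothetical.

```python
def cfr_estimators(cum_deaths, cum_recovered, cum_confirmed, lag_days=7):
    """Per-day CFR estimates in percent, one dict per day.

    naive:  deaths / confirmed                    (black curve)
    closed: deaths / (deaths + recoveries)        (blue curve)
    lagged: deaths / confirmed lag_days earlier   (green curve)
    A value is None when its denominator is zero or the lag reaches
    before the start of the series.
    """
    n = len(cum_confirmed)
    if len(cum_deaths) != n or len(cum_recovered) != n:
        raise ValueError("all series must have the same length")
    if lag_days < 0:
        raise ValueError("lag_days must be non-negative")

    def pct(num, den):
        return None if den == 0 else 100.0 * num / den

    results = []
    for t in range(n):
        d = cum_deaths[t]
        lagged = pct(d, cum_confirmed[t - lag_days]) if t >= lag_days else None
        results.append({
            "naive": pct(d, cum_confirmed[t]),
            "closed": pct(d, d + cum_recovered[t]),
            "lagged": lagged,
        })
    return results


# Toy cumulative counts (hypothetical); lag_days=2 only because the series is short.
est = cfr_estimators(
    cum_deaths=[1, 3, 6, 12, 20],
    cum_recovered=[0, 10, 40, 100, 300],
    cum_confirmed=[100, 200, 400, 800, 1000],
    lag_days=2,
)
print(est[-1])  # → {'naive': 2.0, 'closed': 6.25, 'lagged': 5.0}
```

The default `lag_days=7` mirrors the T = 7 days assumed in this post. Even on toy data the three estimates disagree noticeably on the same day.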
But each of these estimators still has flaws:
- Blue curve: had we taken a group of N people, waited until the outcomes of all N were known, and then applied the calculation, we would have obtained the mortality rate of that group. The denominator here looks like ‘closed cases’, doesn’t it? It is, but it does not tell the whole story: it ignores the many cases still active on a particular day, whose outcomes will be known only in the future. In particular, it neglects the fact that recovery takes longer, so recoveries enter the denominator later than deaths enter the numerator. (Note: when a patient counts as recovered also depends on the treatment; some hospitals mark a patient recovered after two symptom-free days, others after three, which is again something we have not considered.)
- Green curve: the flaw here is easier to explain. We have assumed T = 7 days as the average lag between confirmation and outcome, a value that requires further analysis, though it is definitely not T = 0.
All these estimators appear to converge because, towards the end of the pandemic, the outcomes (death or recovery) of nearly all confirmed cases become known. Provided all cases were reported when they showed symptoms, we will learn the real fatality rate at the end.
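To see why the curves disagree early and agree late, here is a toy expected-value simulation. Everything in it is a hypothetical assumption: a true fatality probability of 2%, a fixed 7-day delay from confirmation to death, a fixed 14-day delay to recovery, and new cases growing 25% a day for 30 days before stopping. Real delays are spread out, not fixed, so this is a sketch of the mechanism, not a model of COVID-19.

```python
def simulate(new_cases, p_death=0.02, death_lag=7, recovery_lag=14):
    """Expected cumulative (confirmed, deaths, recoveries) per day in a toy model.

    Hypothetical assumptions: a case confirmed on day s dies on day
    s + death_lag with probability p_death, otherwise it recovers on day
    s + recovery_lag. No new cases arrive after the last entry of new_cases.
    """
    days = len(new_cases) + max(death_lag, recovery_lag) + 1  # long enough for all cases to close
    new_deaths = [0.0] * days
    new_recoveries = [0.0] * days
    for s, n in enumerate(new_cases):
        new_deaths[s + death_lag] += p_death * n
        new_recoveries[s + recovery_lag] += (1 - p_death) * n

    cum_c, cum_d, cum_r = [], [], []
    c = d = r = 0.0
    for t in range(days):
        c += new_cases[t] if t < len(new_cases) else 0.0
        d += new_deaths[t]
        r += new_recoveries[t]
        cum_c.append(c)
        cum_d.append(d)
        cum_r.append(r)
    return cum_c, cum_d, cum_r


def estimates_on_day(t, cum_c, cum_d, cum_r, lag_days):
    """The three estimators as fractions (assumes t >= lag_days and nonzero denominators)."""
    d = cum_d[t]
    return {
        "naive": d / cum_c[t],
        "closed": d / (d + cum_r[t]),
        "lagged": d / cum_c[t - lag_days],
    }


p = 0.02
cum_c, cum_d, cum_r = simulate([100 * 1.25**k for k in range(30)], p_death=p)
print(estimates_on_day(29, cum_c, cum_d, cum_r, lag_days=7))              # last day of growth
print(estimates_on_day(len(cum_c) - 1, cum_c, cum_d, cum_r, lag_days=7))  # after every case has closed
```

Under these assumptions, on the last day of growth the naive estimate sits well below 2% and the closed-case estimate well above it. The lagged estimate lands on 2% only because `lag_days` was set equal to the assumed death delay. Once every case has closed, all three equal 2%.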
The true severity of a disease is better described by the infection fatality ratio (IFR), which has all infected cases in the denominator, but we do not know that number while a pandemic is ongoing. If every case is reported once symptoms show up, CFR approaches IFR towards the end of the pandemic. While a pandemic is ongoing, IFR can only be estimated with statistical models such as SEIR models.
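The link between the two ratios can be written down under simplifying assumptions of our own: D counts the deaths in a cohort whose outcomes are all known, every death occurs among the confirmed cases, and only a fraction a of all infections I is ever confirmed (C = aI).

```latex
\mathrm{IFR} = \frac{D}{I}, \qquad \mathrm{CFR} = \frac{D}{C}, \qquad C = aI, \quad 0 < a \le 1
\\
\mathrm{CFR} = \frac{D}{aI} = \frac{\mathrm{IFR}}{a} \;\ge\; \mathrm{IFR}
```

With purely hypothetical values a = 0.2 and IFR = 0.5%, the CFR would be 2.5%. The two coincide only when a = 1, i.e. when every infection is confirmed.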
Can CFR be compared across countries without taking anything else into account? A country’s CFR could come down simply because it is detecting more mild cases, typically among younger people. Since testing is the “hidden variable” in CFR, one cannot simply compare CFRs between countries that differ in testing and other settings.
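One way testing acts as a hidden variable is through the mix of cases that get detected. A minimal sketch of this case-mix effect, using loudly hypothetical numbers: two countries with identical age-specific CFRs, where country A’s wider testing finds proportionally more younger, milder cases. The overall CFR is a case-weighted average, so A’s comes out far lower even though no age group in A is any safer.

```python
def overall_cfr(cases_by_group, cfr_by_group):
    """Overall CFR as a case-weighted average of group-specific CFRs."""
    total_cases = sum(cases_by_group.values())
    if total_cases == 0:
        raise ValueError("no cases")
    expected_deaths = sum(n * cfr_by_group[g] for g, n in cases_by_group.items())
    return expected_deaths / total_cases


# Hypothetical age-specific CFRs, identical in both countries.
cfr_by_age = {"under_60": 0.002, "60_plus": 0.05}

# Hypothetical detected cases: country A tests widely and finds many mild,
# younger cases; country B mostly tests people sick enough to seek care.
country_a = {"under_60": 9000, "60_plus": 1000}
country_b = {"under_60": 5000, "60_plus": 5000}

print(f"A: {100 * overall_cfr(country_a, cfr_by_age):.2f}%")  # → A: 0.68%
print(f"B: {100 * overall_cfr(country_b, cfr_by_age):.2f}%")  # → B: 2.60%
```

Comparing the two headline numbers would suggest A is nearly four times safer, while by construction the risk within each age group is the same.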
Acknowledging all these biases, CFR can still be a good metric for assessing the effectiveness of treatment. But good treatment helps only as long as the country’s health infrastructure is not overburdened. In the recent viral interview with Jonathan Swan, President Trump pointed to the low CFR, but as discussed above, that does not tell the whole picture. One Reddit user gave this tangible analogy: “Imagine a 50-chamber revolver that you’re playing russian roulette with. Given that you play the game, there’s a 1/50 chance you die. That’s CFR. If only 50 people play, then one person is expected to die. Some countries play with a 100-chamber revolver (1/100 chance of death from the game), others play with a 10-chamber revolver (1/10 chance of death). The US has a low chance of dying from the game. However, the US has a far greater proportion of people playing it, and recruiting other people to play it with them. The p(Playing russian roulette) is therefore greater in the US. Even if the US has decent odds of surviving given that you play, you are more likely to play in the first place than if you lived elsewhere. That is why CFR is not a good measure. I’d rather not play the game at all, than to have a good chance of surviving the game. In the US, you have a decent chance of surviving the game, but you’re more likely to be playing, and you don’t want to play in the first place.”
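The analogy boils down to multiplying two probabilities: the chance of playing at all (being infected) and the chance of dying given that you play (the CFR-like factor). A minimal sketch with exact fractions; the function name and both countries’ numbers are hypothetical, chosen only to show that better odds per game can still mean a higher overall risk.

```python
from fractions import Fraction


def risk_of_dying_from_game(p_play, p_die_given_play):
    """Unconditional risk for a random resident: P(play) * P(die | play).

    In the analogy, P(die | play) plays the role of CFR and P(play) the role
    of how likely you are to be infected in the first place.
    """
    return p_play * p_die_given_play


# Hypothetical populations: the shares of people "playing" are made up.
country_x = risk_of_dying_from_game(Fraction(30, 100), Fraction(1, 50))  # better odds, many players
country_y = risk_of_dying_from_game(Fraction(2, 100), Fraction(1, 10))   # worse odds, few players

print(country_x, country_y)  # → 3/500 1/500
```

Country X has five times better odds per game, yet a resident of X is three times as likely to die from the game, because so many more people play.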
In summary:
- CFR is unreliable if the question is the risk of dying, since the numerator and/or the denominator could be incomplete.
- To calculate the risk of death once infected, we want IFR, not CFR, but IFR is difficult to estimate because we don’t know the actual number of infected people. Model-based (simulated) IFR estimates are much lower than CFR while the pandemic is ongoing; CFR converges towards IFR by the end of the pandemic.
- CFR can decrease or increase over time as responses change, and it can vary by location and by the characteristics of the infected population, such as age, sex, or existing illness. Simply put, if someone or some government puts forward a single CFR percentage, question it further and get more context. An epidemic cannot be represented by a single CFR.
- Differences between countries do not necessarily reflect real differences in the risk of dying from COVID-19. Instead, they may reflect differences in the extent of testing, or in the stage a country has reached in its trajectory through the outbreak.
- Breaking the numbers down by more parameters (such as age, sex, or existing illness) gives you a better idea of the severity of the disease in each group.
I do not mean to undermine any country’s or government’s efforts to eradicate the virus. I wrote this short post as an experiment to test my own understanding of the topic and to raise awareness in the process. If you find my wording vague, I recommend taking a little more time to skim the excellent resources below.
- http://ondata.blog/articles/coronavirus-mortality/
- https://ourworldindata.org/covid-mortality-risk – Interactive data visualizations against multiple parameters. I found this site better than any other because it does not just put up charts but also discusses the possible biases in detail.
- https://www.reddit.com/r/AskStatistics/comments/i71mnm/case_fatality_rate_vs_infection_fatality_rate_vs/ – STEPHEN R. MARTIN‘s Russian roulette example
- https://youtu.be/2qdd7kirwIk – Vox – Understanding CFR visually, and why the numbers are unreliable in the middle of a pandemic
- https://youtu.be/O-3Mlj3MQ_Q – Vox – How a simple switch from linear to logarithmic axes can give a misleading impression