Data analysis

I wanted to track the progress of the COVID-19 pandemic over time. This took some work …

Areas of work

  1. Extraction: Pull data from source(s) and keep it clean over time

  2. Analysis: Devise useful metrics to follow

  3. Visualization: View metrics in interesting ways

Of the above, I have made some progress on Extraction and Analysis, whereas Visualization remains to be done.

Here are the data sources used:

Metrics

After extraction and cleanup, the data was used to generate some interesting metrics.

sum

This is the simplest metric, merely the cumulative sum of a time-series.
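As a minimal sketch of this metric (the function name and the sample numbers are made up for illustration and are not taken from the project code), a cumulative sum over a list of daily counts could look like this:

```python
from itertools import accumulate


def cumulative_sum(daily_counts):
    """Return the running total of a daily time series."""
    return list(accumulate(daily_counts))


# Made-up daily new-case counts, for illustration only.
daily_new_cases = [3, 0, 5, 2]
print(cumulative_sum(daily_new_cases))  # [3, 3, 8, 10]
```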

dgr (daily growth rate)

This is the daily growth rate of a daily time series.
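The exact formula is not spelled out here, so the sketch below assumes the common definition: the fractional change from the previous day, (today - yesterday) / yesterday. The function name is hypothetical, and returning None when the previous value is zero is a choice made for this example.

```python
def daily_growth_rate(series):
    """Fractional day-over-day change of a daily time series.

    Assumed definition: (today - yesterday) / yesterday.
    The first day has no predecessor, so its rate is None;
    the rate is also None when yesterday's value is zero.
    """
    if not series:
        return []
    rates = [None]
    for prev, curr in zip(series, series[1:]):
        rates.append((curr - prev) / prev if prev else None)
    return rates


# Made-up cumulative case counts, for illustration only.
print(daily_growth_rate([100, 110, 121]))  # [None, 0.1, 0.1]
```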

agr5 (average growth rate over 5 days)

This is the average growth rate over 5 days of a daily time series.
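One plausible reading, sketched below, is a trailing arithmetic mean of the last 5 daily growth rates. This is an assumption: the project code may average differently (for example geometrically). The function name and the None handling are likewise hypothetical.

```python
def average_growth_rate(rates, window=5):
    """Trailing arithmetic mean of daily growth rates.

    Assumption: "average over 5 days" means the mean of the
    last `window` daily rates. A day gets None until a full
    window of non-missing rates is available.
    """
    out = []
    for i in range(len(rates)):
        chunk = rates[max(0, i - window + 1): i + 1]
        if len(chunk) < window or any(r is None for r in chunk):
            out.append(None)
        else:
            out.append(sum(chunk) / window)
    return out


# Made-up daily growth rates; the first is missing by construction.
rates = [None, 0.10, 0.20, 0.30, 0.20, 0.20, 0.50]
print(average_growth_rate(rates))
```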

icfr (incremental case fatality ratio)

This metric is specific to COVID-19: it measures the incremental case fatality ratio of the disease.

As the outstanding cases on any day get resolved, icfr calculates the fatality rate of that group of patients. A declining icfr indicates that there has been a greater percentage of recoveries than deaths from the outstanding patient pool on each subsequent day over the last month.

An algorithmic description defines three terms: cfr, days to resolve, and icfr.

cfr: The Case Fatality Ratio, i.e. the proportion of deaths among resolved cases (recovered or dead). The formula is simply: deaths / (deaths + recovered). Multiply by 100 to express it as a percentage.
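The cfr formula translates directly into code. In this sketch the function name is hypothetical, and returning None when no cases have been resolved yet is a choice made for the example:

```python
def cfr(deaths, recovered):
    """Case fatality ratio: deaths / (deaths + recovered).

    Returns a fraction between 0 and 1, or None when no
    cases have been resolved yet.
    """
    resolved = deaths + recovered
    if resolved == 0:
        return None
    return deaths / resolved


# Made-up counts: 5 deaths and 95 recoveries.
print(cfr(5, 95))  # 0.05
```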

days to resolve: Take any date on the calendar and mark the number of cases outstanding. Walk down the dates and find a date when the number of resolutions roughly equals the number of cases. This measures the days to resolve. Note that this metric can be calculated for any date on the calendar, and in fact it should be, since it is an important indicator of the state of the health care infrastructure. A longer days to resolve indicates longer stays in the hospital, as well as reporting delays caused by overload.
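The walk described above can be sketched as follows. Several things here are assumptions rather than the project's actual method: the inputs are cumulative daily series, outstanding cases are computed as confirmed minus deaths minus recovered, the walk moves forward in time, and "roughly equals" is read as "first reaches or exceeds". All names are hypothetical.

```python
def days_to_resolve(confirmed, deaths, recovered, start):
    """Days until resolutions after `start` match the cases outstanding on `start`.

    All three inputs are cumulative daily series of equal length.
    Returns None when the data ends before enough cases resolve.
    """
    resolved = [d + r for d, r in zip(deaths, recovered)]
    outstanding = confirmed[start] - resolved[start]
    for day in range(start, len(resolved)):
        # Resolutions (deaths + recoveries) accumulated since `start`.
        if resolved[day] - resolved[start] >= outstanding:
            return day - start
    return None


# Made-up cumulative series, for illustration only.
confirmed = [10, 20, 30, 40, 50]
deaths = [0, 1, 2, 3, 4]
recovered = [0, 2, 8, 15, 30]
print(days_to_resolve(confirmed, deaths, recovered, start=0))  # 2
print(days_to_resolve(confirmed, deaths, recovered, start=3))  # None
```

Returning None for recent dates reflects that their outstanding cases have not yet had time to resolve within the available data.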

icfr: Take the additional deaths and additional recoveries since the beginning of the days to resolve period for any date. Find the cfr based on incremental deaths and recoveries. This is the incremental cfr (icfr), and is a more sensitive measurement of where the disease is going. In essence, it answers the question: “how bad was the disease for cases as they stood on this date?”
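Putting the three terms together, a sketch of icfr for a single date might look like the code below. It rests on assumptions not confirmed by the text: cumulative input series, a days to resolve period that runs forward from the chosen date, and a period that ends on the first day when new resolutions reach the number of cases outstanding at the start. The function name and sample data are hypothetical.

```python
def icfr(confirmed, deaths, recovered, start):
    """Incremental cfr for the cases outstanding on day `start`.

    Walks forward until deaths plus recoveries since `start`
    reach the outstanding count, then applies the cfr formula
    to those incremental deaths and recoveries only.
    Returns None if the period is incomplete or nothing resolved.
    """
    outstanding = confirmed[start] - deaths[start] - recovered[start]
    for day in range(start, len(confirmed)):
        new_deaths = deaths[day] - deaths[start]
        new_recovered = recovered[day] - recovered[start]
        newly_resolved = new_deaths + new_recovered
        if newly_resolved >= outstanding:
            return new_deaths / newly_resolved if newly_resolved else None
    return None


# Made-up cumulative series, for illustration only.
confirmed = [10, 20, 30, 40, 50]
deaths = [0, 1, 2, 3, 4]
recovered = [0, 2, 8, 15, 30]
print(icfr(confirmed, deaths, recovered, start=0))  # 0.2
```

Computing this for every date yields an icfr time series, whose trend can then be followed like any other metric.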

Datasets

All datasets can be accessed through the dropdown selection UI on the datasets page.

Next Steps

Automated updates

The datasets linked above are intended to be updated daily by an automated procedure. This is under development so slips might occur. Bug reports are welcome!

Data Visualization

A visualization frontend that works with multiple time series is planned. This is a work in progress.

Availability of code

The Extraction and Analysis (metrics computation) code is available on GitHub.


Acknowledgements

I gratefully acknowledge contributions by Ambuj Jain toward discussion, clarification, and validation of the icfr metric. Ambuj is a batchmate from IIMA-1989. He follows global events closely to predict trends and is an active financial market forecaster. His predictions made during Zoom presentations to my batch group have been accurate right from the start of 2020.