Visualizing Covid-19 trends

Many websites and news outlets provide coverage and analysis of Covid-19 trends. Graphs and maps are among the best ways to convey them, but I can never find the combination of data most interesting to me. Google, for example, has great tools to show changes over time at the county, state, and national levels, but comparing regions against each other is limited and clunky. The Johns Hopkins Covid Dashboard, meanwhile, has heatmaps for easily seeing how regions compare against each other, but the data are only a single, recent snapshot. One of NPR’s most interesting graphs overlays the new cases in all 50 states, but does so with logarithmic scales that show daily new cases as a function of total cases rather than calendar date, and only one state at a time may be compared against New York.

I set out to improve the situation for myself by rendering the raw CDC data, updated daily, with my own choice of axes and metrics. The publicly available result is an interactive combination of a line graph and heat map, both powered by amCharts, which mutually update each other’s state. The user may select which core data set to draw from, how to aggregate that data, and then either which date to focus on or which two states to compare. Most significantly, the line graph plots the median and the top- and bottom-five states — key percentiles — for a given data aggregation to show how outlier states change in both value and identity over time.

The default view shows the rolling 7-day average of new cases per 100,000 residents in each state. The top graph shows the daily top- and bottom-5 states as well as the median state sandwiched between them. The heat map below shows all states on a given day.
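For readers who want to reproduce the percentile lines, here is a minimal Python sketch of how one day's values could be reduced to the top five, the median, and the bottom five states. It is not the code behind the visualization. The names `DaySummary` and `summarize_day` are hypothetical, and picking the lower of the two middle states when the count is even is my assumption.

```python
from typing import Dict, List, NamedTuple


class DaySummary(NamedTuple):
    top: List[str]     # k highest states, highest first
    median: str        # the state in the middle of the ranking
    bottom: List[str]  # k lowest states, lowest last


def summarize_day(values: Dict[str, float], k: int = 5) -> DaySummary:
    """Rank one day's per-state values and pick out the outliers and the median."""
    if len(values) < 2 * k + 1:
        raise ValueError("need at least 2k + 1 states")
    # Sort descending by value; break ties by name so the result is stable.
    ordered = sorted(values, key=lambda state: (-values[state], state))
    return DaySummary(
        top=ordered[:k],
        # Assumption: with an even count this is the lower of the two middle states.
        median=ordered[len(ordered) // 2],
        bottom=ordered[-k:],
    )


# Made-up values for eleven placeholder states "A".."K", purely for illustration.
example = {chr(ord("A") + i): float(i) for i in range(11)}
summary = summarize_day(example)
```

Running this once per date gives the eleven series drawn in the line graph, and it also shows why the identity of an outlier state can change from one day to the next even when the curve itself looks smooth.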

Analysis to date

The key question in my mind is essentially the Covid “mood” of the US over time. That mood is defined by how hotspots develop and evolve. Critically, it should be normalized by regional population to enable a relative comparison. The best metric then, in my view, is the per capita new cases. As is common in many reported visualizations, the best results are usually had from rolling averages across a few days or weeks due to choppiness in recording, reporting, and aggregation from across the country.

The widely reported spikes in April and July are clearly visible from the rise and fall of new cases in the top-five states over those periods. One key finding is that the median state rose with those outbreaks but didn’t substantially fall back along with them afterward. This tells me that regional outbreaks never really recovered or subsided, but simply migrated. As Dr. Fauci said in late October, the first wave never ended.

Median state per capita new cases continued to rise as the first spike peaked and remained flat as it fell.
Same story after that summertime sadness, except this time even the bottom five states began to see nontrivial new cases that didn’t subside.

Of course the most recent spike dwarfs the first two to a shocking degree: by a factor of about four. The experts had all been predicting for months that the winter would be rough, but it was visually clear from per capita new cases by the first week in October that even the fall was headed for trouble. The dire warnings of Christmastime funerals don’t seem so hyperbolic and might have staved off disaster: those trends slackened only about a week before Thanksgiving.

Going into December the top five states have reversed their November explosion, but we have the starkest median state plateau yet; and even the bottom five states are worse off now than the majority of states were during the July peak.

So, what will happen in the wake of Thanksgiving? Maybe we’ll be spared another huge spike after this November wakeup call: most of the country has now been through a local wringer at one point or another.

The April peak mostly hit the denser parts of New England and, bizarrely, Louisiana.
The July peak was about as pronounced in magnitude as the first but was much broader in scope: much of the South and Mountain West were affected.
The November peak was focused in the Midwest, but take note of the scale: even remote places like upper New England, Alaska, and Hawaii are seeing five times the new cases per capita as compared with this summer.

But there’s a new warning sign now: per capita deaths are on the rise. Deaths were high in the spring before much was known about how to treat Covid-19 at scale, but by the summer that situation had become more manageable. Now reports are increasing of hospitals becoming overwhelmed. While that isn’t very surprising given the characteristics of the fall surge, it does make clear the stark risk of the winter ahead.

After a long flat plateau, median state new deaths began to rise in mid-October. Even the bottom five states were ticking up in November.

Anticipate the news

The most useful aspect of these graphs is seeing trends as they’re unfolding. Often they’ve helped me anticipate news stories by making some states’ trends visually obvious. For example, the Dakotas were clearly off the rails by September, long before this was widely reported after it got really bad in November. And it was easy to see that Iowa was heading in the same direction in the weeks before Thanksgiving, a notion The Atlantic has just covered. Rhode Island surged past the median state for per capita new cases before Halloween, so I was unsurprised to hear they were opening field hospitals after Thanksgiving.

Another good way to spot outliers is by how fast new cases are changing week over week. For this, choose to graph the change from some number of days ago: positive values indicate an increase in the value, negative ones a drop. NPR reported in mid-November that cases were “growing at record speed”, and this view makes it easy to see that we reached that condition in late October. By the first of November all states had surpassed the July surge in new cases; by the time of that reporting, no state had posted a drop in the new case rate for three weeks, a new record for the pandemic.

Smoothing the curves is even more important when comparing rolling averages against themselves. I find a 14-day delta is the tightest spread that doesn’t introduce overmuch noise to the curves, but the choice is yours. Here, the heatmap ranges between red (positive) and blue (negative) with shades of white straddling no net change.
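To make the red-white-blue idea concrete, here is a small Python sketch of a diverging color scale. This is not how amCharts computes its heat rules. The function name `delta_color`, the linear fade, and the clamping at `max_abs` are all assumptions made for illustration.

```python
from typing import Tuple


def delta_color(delta: float, max_abs: float) -> Tuple[int, int, int]:
    """Map a change value to an RGB color: red if rising, blue if falling, white near zero."""
    if max_abs <= 0:
        raise ValueError("max_abs must be positive")
    # Scale into [-1, 1], clamping values beyond the chosen extreme.
    t = max(-1.0, min(1.0, delta / max_abs))
    # The two channels that are not the dominant hue fade out as |t| grows.
    fade = round(255 * (1 - abs(t)))
    if t >= 0:
        return (255, fade, fade)  # white -> red
    return (fade, fade, 255)      # white -> blue
```

Using the same magnitude for both extremes keeps white pinned to zero change, so a state only looks pale when its rate is genuinely flat.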

Play with the data yourself

Finally, some general usability notes for the visualization:

Use the slider bar at the top of the graph to zoom in on a time range. Mouse over the graph to highlight the date at your pointer, including tooltips to call out values for the top five states (A through E), the median state (M), and the bottom five states (V through Z).

Mouse over states in the heat map to plot the graph for that state against the dynamic percentiles. Click a state to pin that state’s graph to facilitate state-to-state comparisons. Click the same state again to dismiss the pin. A pinned state will also have a tooltip value when mousing over the graph.

Here’s the July surge and its correction, comparing Washington against the rest of the nation: against just the percentiles throughout, and against all the states on one day in early August.

And an overview of the available data views:

  • New / Total – Switch between new data (reported daily) and total data (a cumulative total from the beginning of record keeping).
  • Absolute / Per Capita – Switch between absolute numbers and state-by-state per capita normalizations. Per capita is per 100,000 residents as estimated by the Census Bureau this year. (Update: the screenshots were all based on the earlier 2019 estimates before the 2020 ones became available in late December.)
  • Cases / Deaths – Switch between data about infections and deaths due to Covid-19.
  • Over Rolling Average Days – Each data point is an average of this many days before and including that date. A rolling average of 1 day is just the daily raw data (no averaging).
  • Change Since Days Ago – Each data point is actually the difference between that date’s value and the value from this many days ago. That includes when using a rolling average to render data. For example, the 14-day rolling average with the change since 14 days ago is actually the average of the past two weeks less the average of the two weeks before that. Dismiss this “delta” view by setting the drop down to the empty value.
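The per capita, rolling average, and change-since options above compose in a fixed order, which a short Python sketch can make explicit. The function names are hypothetical and this is not the visualization's own code. Returning `None` for dates without enough history is my assumption about how to handle the start of the series.

```python
from typing import List, Optional


def per_capita(values: List[float], population: int, per: int = 100_000) -> List[float]:
    """Normalize raw counts to a rate per `per` residents."""
    return [v * per / population for v in values]


def rolling_average(values: List[float], window: int) -> List[Optional[float]]:
    """Trailing mean of the `window` days up to and including each date."""
    if window < 1:
        raise ValueError("window must be at least 1")
    out: List[Optional[float]] = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the day that left the window
        # Assumption: no value is reported until a full window is available.
        out.append(running / window if i >= window - 1 else None)
    return out


def change_since(values: List[Optional[float]], days_ago: int) -> List[Optional[float]]:
    """Difference between each date's value and the value `days_ago` earlier."""
    out: List[Optional[float]] = []
    for i, v in enumerate(values):
        prev = values[i - days_ago] if i >= days_ago else None
        out.append(v - prev if v is not None and prev is not None else None)
    return out


# Made-up daily counts, purely for illustration.
daily = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
averaged = rolling_average(daily, 2)
delta = change_since(averaged, 2)
```

With a window of 14 and a change since 14 days ago, `delta` is the average of the past two weeks less the average of the two weeks before that, matching the example in the list. A window of 1 returns the raw daily data unchanged.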

Once again, the source of this data is the CDC’s daily tracker data. Other data sets exist, for example that of the Covid Tracking Project. That dataset would be particularly interesting to consume, if it were robust. It includes hospitalization and test positivity rates, for example. But the data are spotty: each day’s data come with a varying letter grade for their relative quality (e.g. C-); and entire states are consistently missing from some of the measures. At last check, there are no data for hospitalizations on any day for Alaska, California, Delaware, Iowa, Illinois, Louisiana, Michigan, Missouri, North Carolina, Nevada, Pennsylvania, Texas, Vermont, West Virginia, and the District of Columbia. That’s more than two fifths of the entire country by population.

Happy sleuthing, and may robust science and personal responsibility — both informed by clear and consistent data — save us all.
