Nick Cage is Not Your Lifeguard: Correlations Can Be Risky

In this time of pandemic, I have a huge favor to ask. 

Please do not share research-y visuals of correlations that seem to say something large and profound about our current situation. Even if these visuals come from a reputable news source. But especially if it is a comparison visual that takes two or more reputable data visualizations and throws them side by side.

I have seen some of you share a pair of maps: one shows cell phone data on travel distances at a single point in time during this pandemic, and the other shows how far people have to travel to their nearest grocery store to get food. AHA!! Many have said, this proves that people in these areas travel farther because they have no closer access to groceries, and that this cell phone data is just being used to shame the poor.

These are two of the comparison images I have seen. There are others.

As you know, I am always on the lookout for poverty-shaming, and always eager to think about structural inequalities and seek out solutions. However, I really, really beg my readers NOT to make policy suggestions on social media based on random visualizations.

This pandemic is about a month old in the US, and social science research takes a lot longer than that. Data are available, but researchers have not yet had time to clean them, determine the best control variables and add those in, figure out what the limitations are, and so on. The situation is fluid and everyone wants data now, so they share.

Data dead spots

Some of the miles-traveled data come from cell phone tracking. Have you ever driven through any of our fine western states? For how much of that time did you have a solid data connection? I don't know how the data tracking sites are accounting for that, or whether the GPS signal is solid throughout.

Data standardization

None of these data comparisons use standardized data. Depending on the scales used, a relatively small change can show up quite dramatically. Are they using the smallest unit of area? Are they taking essential workers into consideration? What about population density? Doesn't that play a role here?
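To make the standardization point concrete: miles traveled per day and miles to the nearest grocery store are different measures on different scales, so two maps colored by their raw values can look alike (or unalike) for reasons that have nothing to do with each other. One common way to put them on a shared footing is to convert each measure to z-scores (how many standard deviations a place sits from the average). The sketch below assumes that approach, and the numbers for five ZIP codes are made up purely for illustration; they are not real data from any of the maps.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("all values are identical; z-scores are undefined")
    return [(v - mu) / sigma for v in values]

# Hypothetical values for five made-up ZIP codes (illustration only, not real data).
avg_miles_traveled = [2.1, 3.4, 8.9, 1.2, 5.0]  # miles per day, from phone data
miles_to_grocery = [0.5, 1.0, 12.0, 0.3, 4.0]   # miles to the nearest grocery store

# Side by side, both columns are now in "standard deviations from average" units.
for travel_z, grocery_z in zip(z_scores(avg_miles_traveled), z_scores(miles_to_grocery)):
    print(f"travel {travel_z:+.2f}   grocery {grocery_z:+.2f}")
```

Standardizing doesn't answer the other questions above (essential workers, population density); those would have to be handled as separate variables.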

Reasons for longer drives

Some people have longer drives because they live in impoverished food deserts. And some people have longer drives because they live in rural areas. I notice the West doesn't show up as an area of longer driving distances on the national map. Perhaps this is because the overall population there is so very low. But you'd better believe that people are not adhering to a 2-mile driving radius in rural Montana. Two miles barely gets you to your neighbor's house, let alone a grocery store!

Can you find a correlation between these cows and those mountains?

More granular data did show up in a map of Oregon, which was written about in Willamette Week. The paper noted that eastern Oregonians were traveling more than western Oregonians, leading to suggestions that eastern Oregonians weren't taking social distancing seriously. In response, Cody Mastrude writes, "We are ranchers and farmers who have to work cattle and land that spans acres and miles every day. We can't feed cattle via the web. In general, we are spread out over miles." And this is where I make a plug for the importance of qualitative data as a way to understand the nuances of quantitative data. Unfortunately, until such data are available, people will speculate.

Do Nick Cage films lead to more drownings?

Establishing a correlation does not mean eyeballing two national maps and seeing that they look similar. You have to get ZIP-code-level data and do the comparison at that level.

But even if you just eyeball the two maps and look at the ZIP code level instead of the national swaths, you can see a whole lot of areas that don't fit this convenient narrative.

To determine a correlation, researchers analyze all of the data and calculate a correlation coefficient. You have to start there. Then, if the coefficient is high, you can think about all the confounding factors that might produce it.
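Here is what "calculating a correlation coefficient" actually involves, as a minimal sketch. It assumes the most common choice, the Pearson coefficient r, which runs from -1 (perfect negative relationship) through 0 (no linear relationship) to +1 (perfect positive relationship). The function name and the ZIP-code values are hypothetical and invented for illustration; Python 3.10+ also ships this calculation as statistics.correlation.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much x and y move together, relative to how much each varies on its own.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a list with no variation has no defined correlation")
    return cov / sqrt(var_x * var_y)

# Hypothetical ZIP-code-level values (made up, not real data).
miles_traveled = [2.1, 3.4, 8.9, 1.2, 5.0, 4.4]
miles_to_grocery = [0.5, 1.0, 12.0, 0.3, 4.0, 0.8]
print(f"r = {pearson_r(miles_traveled, miles_to_grocery):.2f}")
```

The point is that every ZIP code goes into the calculation, including the ones that don't fit the story, rather than whichever regions happen to catch your eye on a map. And a high r is only where the thinking about confounders begins.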

Correlation is not causation

What is the independent variable?

It seems that the implication of the driving and food deserts comparison is that the food desert is the thing that explains the extra driving. Let's set aside what I said about visualizations and assume that there is a strong, mathematically calculated correlation. This still doesn't tell us for sure that one leads to the other. Both could be outcomes of a different independent variable. Several of the states in question have governors or legislative bodies that oppose government regulation and espouse a "bootstraps" mentality, which leads to policies that neither protect the impoverished nor ensure access to food.

At the same time, these state governments have been much more hesitant to enact social distancing orders and in some cases have actively downplayed the risks. This can lead residents to assume that normal movement is acceptable and warranted, so they don't change their habits. Thus, the attitudes and laws of the governments would be the independent variable, and food deserts and longer trips would EACH be dependent variables.

What is the point of all of this? Why am I posting this? Because now more than ever, science matters. Logic matters. Critical thinking matters a lot. Nick Cage also matters, of course!

We should always be thinking creatively and trying to figure things out. But armchair analysis should always be approached with caution. When you see a source, click through to the main article. Learn about the author and any researchers who might be involved. Listen to the critics, and pay attention to where they come from as well. We are living in really perilous times. We need to be very responsible with the news we consume and disseminate. Treat each article as if our lives depended on it. Because this time, our lives might depend on it.

Nick Cage and this cat want you to know....