Author: Cindy Schneider
What do we do when an invisible menace, transmitted by the most run-of-the-mill acts, forces us to hunker down behind barricades of toilet paper? As it turns out, while some of us hit the treadmill, play Skyrim, and check the internet for news and analysis, some of us just can’t resist the urge to mess with the data ourselves. Because the COVID-19 pandemic is affecting human society in so many increasingly complex ways, never has it been more tempting to use available data to predict the future (and boy is it bleak!). But, in the spirit of BuzzFeed’s generalized eye roll at the COVID-19 dilettantes, titled “I’m Not An Epidemiologist, But…”, it’s worth noting that most of the non-epidemiologists probably aren’t data scientists, either.
But instead of taking umbrage and writing a shrill screed against citizen data scientists, I’m going to show you, sensible-person-who-wouldn’t-miss-exponential-curves-if-they-disappeared-from-human-consciousness-tomorrow, a few ways to separate signal from noise for yourself. No data science required, bring only your common sense:
- Sourcing. This is an easy place to start. In a hyperlinked world, there is absolutely no reason for anyone citing a data set, statistics, or anyone else’s research results not to provide a link. No MLA-approved bibliography necessary – just point us to where it came from. In the case of COVID-19, authoritative sources are EXTREMELY easy to find: the CDC, the WHO, the NIH, and various renowned medical journals. If you read scientific claims made without sourcing, or that simply refer you to other blogs, posts or news reports, find your information somewhere else.
- Breathless wording. Where would the blogosphere be today without the use of hyperbole? When it comes to data science, an over-the-top title or wording can be a strong indicator of a weak result, or of the writer’s lack of confidence that you (the reader) will see what they see. The internet’s current favorite example of this is here (many of the original errors and bad conclusions have been corrected since the author’s first posting, but, alas, the tone has not been recalibrated, and still runs roughly like “people will die and it will be all your fault”).
- Charts and Graphs. Study them. By now we’ve been so inundated with exponential curves that we’re unlikely to give special notice to one more. Scroll down to the one on this page, and ask yourself whether it tells you as much as you thought at first glance (why?). Always do a basic check – are the axes labeled clearly? Does the picture show what the words say it does?
- Predicting the future. To be sure, data science illuminates trends and patterns and extends the world of probability and statistics, and many of its methods are easily accessible as bits of code and DIY (do-it-yourself) machine learning packages. The allure of drawing meaningful conclusions can be overwhelming to talented coders (who are also non-epidemiologists!). Here’s an example of a poor attempt to derive social impact from an otherwise solid application of the author’s skills. The data science is good – but compare the author’s certainty that he has categorically defined the U.S. value for R0 with how you think the scientific community should determine how contagious a pandemic-causing virus is.
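If you’d like to see for yourself why a single confident number deserves a raised eyebrow, here is a minimal sketch in Python. It assumes the most naive model possible (pure exponential growth at a fixed daily rate), and every input is hypothetical and chosen only for illustration: the starting count, the two growth rates, and the 30-day horizon are not real COVID-19 figures, and the function names are made up for this example.

```python
def project_cases(initial_cases: float, daily_growth_rate: float, days: int) -> float:
    """Project a case count under pure exponential growth (a deliberately naive model)."""
    return initial_cases * (1.0 + daily_growth_rate) ** days


def projection_range(initial_cases: float, low_rate: float, high_rate: float, days: int):
    """Return (low, high) projections for two assumed daily growth rates."""
    return (
        project_cases(initial_cases, low_rate, days),
        project_cases(initial_cases, high_rate, days),
    )


if __name__ == "__main__":
    # Hypothetical inputs for illustration only; not real COVID-19 data.
    initial = 100
    horizon = 30
    low, high = projection_range(initial, low_rate=0.10, high_rate=0.20, days=horizon)
    print(f"10% daily growth: {low:,.0f} cases after {horizon} days")
    print(f"20% daily growth: {high:,.0f} cases after {horizon} days")
    print(f"High estimate is {high / low:.1f}x the low estimate")
```

Two growth rates that both sound plausible produce forecasts more than tenfold apart after a month. That is why a projection without a stated range of assumptions should make you suspicious.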
The well-intentioned movement to make data science accessible to a copy/paste world belies some of the characteristics that make data science powerful: intricately layered applications of principles across a broad swath of mathematics, and rigorous practices of conclusion-drawing that are heavily leveraged with prior assumptions, error bars, and measures of confidence or uncertainty. Fields like probability and statistics are fraught with slippery concepts and subtle ideas. In short, there’s a yawning gap between knowing how to wrangle data or apply and debug code, and being able to map results to valid insights, forecasts or sound advice.
So, look past all the data science humblebrags (“I just want to help – look at all the famous people who have re-posted me!”), and apply your own critical thinking skills. It’s OK to come up with questions you can’t answer – your doubt can be a very good guide to better sources of information!