Talk of “big data” has gone out of fashion, and perhaps has even acquired a somewhat unsavory reputation. This isn’t because the practice of mining and processing large datasets has faded, but because the practice has become so prevalent that it is no longer particularly remarkable. We take for granted that datasets contain billions of observations and that sophisticated algorithms can extract amazing, deep insights from these samples.
A recent article in Slate magazine reflects a sense that the early promises of “Big Data” remain unfulfilled. Moreover, the haste to deploy “data-driven decision-making” has resulted in several high-profile mistakes, e.g., the Google Photos snafu in which the company’s A.I. mistook black people for gorillas due to a lack of diversity in its training data, or the recent Russian influence in the US election. More subtle but still important outstanding issues revolve around implicit and explicit biases in relatively opaque models and systems.
There are, it turns out, no easy solutions. More data is better, but then again sometimes “small data” is better. We need to push forward the development of more accurate and more efficient algorithms, but then again we need to avoid the “gold rush” mentality that neglects caution and prudence.