One of the classic stories of distorted conclusions drawn from out-of-context data involves a big data project that gathered Twitter feeds and other social media posts to predict unemployment rates in the United States.
The researchers used sentiment analysis to determine whether there was a correlation between keywords such as “classifieds,” “unemployment,” and “jobs” and the rise or fall of the monthly unemployment rate in America.
The researchers noticed a huge spike in the number of tweets containing one of their keywords. However, the spike had nothing to do with unemployment. “What they hadn’t noticed was that Steve Jobs died,” said Gary King, Harvard University professor and director of the Institute for Quantitative Social Science.
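To make the failure mode concrete, here is a minimal sketch, in Python, of the kind of naive keyword counting such a project might rely on. The sample tweets are invented for illustration; the point is that a bare keyword match treats a tweet about Steve Jobs exactly like a tweet about employment.

```python
# Naive keyword tracking: count tweets containing unemployment-related
# keywords, the way a simple social-media signal might be built.
# The sample tweets below are invented for illustration.
KEYWORDS = {"classifieds", "unemployment", "jobs"}

tweets = [
    "Just lost my job, checking the classifieds again",
    "Unemployment office lines are brutal this morning",
    "RIP Steve Jobs, a true visionary",               # matches "jobs" spuriously
    "Steve Jobs changed how we think about design",   # matches "jobs" spuriously
]

def keyword_hits(tweet: str) -> set[str]:
    """Return the tracked keywords that appear in a tweet (case-insensitive)."""
    words = set(tweet.lower().replace(",", " ").split())
    return KEYWORDS & words

signal = sum(1 for t in tweets if keyword_hits(t))
print(f"{signal} of {len(tweets)} tweets matched")  # 4 of 4; half are about Steve Jobs
```

Without context, the counter registers all four tweets as unemployment chatter, even though two of them are tributes to a CEO.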
People who work in analytics and big data very likely have other examples similar to this story. Increasingly, data experts are realizing that big data does not automatically yield good data.
If data is incomplete, out of context or contaminated, it can produce flawed answers and lead to decisions that could do more harm than good.
In the example above, King said, curated keywords work fine in the short run but tend to “fail catastrophically over the long run.” The results can be remedied, but only with a great deal of human labour.
And yet, in our everyday lives, we are likely to encounter numerous situations where decisions are made based on flawed data.
In his blog post titled “A Brave New World: Big Data’s Big Dangers,” Adam Frank wrote that in some cases banks will deny a person a loan based in part on their contacts on the professional networking site LinkedIn or the social networking site Facebook.
If your “friends” turn out to be deadbeats, your creditworthiness is likely to be called into question as well.
Credit card companies also sometimes lower a client’s credit limit based on the repayment history of the customers of stores where that client shops, Frank said, quoting Jay Stanley, senior policy analyst for the American Civil Liberties Union (ACLU).
Such methods might be called economic guilt by association: they base statistical inferences about a person on things over which that person has little or no control, or even any awareness.
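A toy sketch of what such an inference could look like in code may help. The blending rule and the numbers below are invented purely for illustration; this is not any lender’s actual model, just the general shape of scoring a person partly by their contacts.

```python
# Illustrative only: a toy "guilt-by-association" credit adjustment.
# The weight and scores are invented; no real lender's model is shown.
def adjusted_score(base_score: float, contact_scores: list[float],
                   association_weight: float = 0.2) -> float:
    """Blend a person's own score with the average score of their contacts."""
    if not contact_scores:
        return base_score
    contacts_avg = sum(contact_scores) / len(contact_scores)
    return (1 - association_weight) * base_score + association_weight * contacts_avg

# A solid applicant whose online "friends" have poor repayment records:
print(adjusted_score(720, [540, 580, 510]))  # about 685: penalized purely for associations
```

The applicant’s own record never changed; the score dropped because of data about other people.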
The point is that big data is only a tool and should not be considered the solution, according to Kim Jones, senior vice-president of Vantiv, a payment processing solutions provider.
Taking the human element out of the equation and relying solely on data analytics creates a higher error rate, he said.