The promise of analyzing big datasets is that organizations will be able to get more useful insight than they could from the numbers they have been able to crunch until recently.
One of the classic examples Google likes to cite is the ability to predict flu trends in real time by analyzing the timing and geography of searches for medical information, such as how to bring down a fever or treat other flu-like symptoms.
Makes sense, right? If there’s a sudden outbreak of searches, there must be an outbreak of the flu. Google has even built a Web flu tracker.
However, the latest issue of Science magazine has an article putting a dent in this particular shining example.
As reported by Ars Technica, researchers have found two instances where Google Flu Trends got it particularly wrong: during the global pandemic in 2009 and at the start of this last flu season in 2013.
Why? Apparently because the flu was in the news, so the number of searches was up. That skewed the predicted number of cases away from the real numbers found later by the U.S. Centers for Disease Control.
In fact, the researchers believe Flu Trends has consistently overestimated the number of actual cases.
This is not to say real-time big data analysis is a pipe dream. But there’s an old saying about data: garbage in, garbage out (GIGO).
It’s something those analyzing real-time data subject to public mood swings, like social media feeds, need to keep in mind.