Companies large and small are rushing to corral the abundant, cheap data available from social media outlets and dig into the readily available information about what people are thinking, feeling, doing and even intending to do, in the hope of improving corporate decisions and campaigns and making more money.
Two computer scientists, one from Canada’s McGill University and the other from Carnegie Mellon University in the United States, are warning that these huge datasets can be “misleading.”
McGill’s Derek Ruths and Juergen Pfeffer of Carnegie Mellon cautioned in an article they published in the Nov. 28 issue of the journal Science that big data users need to figure out how to correct for biases inherent in information gathered from Facebook posts, tweets, and other social media output.
Ruths is an assistant professor in the School of Computer Science at McGill.
Pfeffer is an assistant research professor at the Institute for Software Research in the School of Computer Science at Carnegie Mellon.
The two pointed out that thousands of research papers based on data collected from social media are published each year.
“Many of these papers are used to inform and justify decisions and investments among the public and in industry and government,” Ruths said in an article on the McGill Web site.
“Not everything that can be labelled as Big Data is automatically great,” said Pfeffer, who was quoted in an article appearing on the Carnegie Mellon Web site. “… the old adage of behavioural research still applies: Know Your Data.”
Their research highlighted several issues with using big data. The McGill article listed some of these issues and ways of addressing them:
- Different social media platforms attract different users – Pinterest, for example, is dominated by females aged 25-34 – yet researchers rarely correct for the distorted picture these populations can produce (a simple reweighting sketch follows this list).
- Publicly available data feeds used in social media research don’t always provide an accurate representation of the platform’s overall data – and researchers are generally in the dark about when and how social media providers filter their data streams.
- The design of social media platforms can dictate how users behave and, therefore, what behaviour can be measured. For instance, on Facebook the absence of a “dislike” button makes negative responses to content harder to detect than positive “likes”.
- Large numbers of spammers and bots, which masquerade as normal users on social media, get mistakenly incorporated into many measurements and predictions of human behaviour.
- Researchers often report results for groups of easy-to-classify users, topics, and events, making new methods seem more accurate than they actually are. For instance, efforts to infer political orientation of Twitter users achieve barely 65 per cent accuracy for typical users – even though studies (focusing on politically active users) have claimed 90 per cent accuracy.
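The demographic skew described in the first point can, in principle, be corrected with standard survey-style post-stratification weighting. The following is a minimal, hypothetical Python sketch of that idea, assuming you already know a demographic group for each sampled user and have reference proportions for those groups (for example, from census data); the names and numbers are invented for illustration and do not come from the Science article.

```python
from collections import Counter

# Hypothetical sample skewed toward one demographic group (as a platform
# like Pinterest might be), with an opinion field we want to estimate
# for the wider population.
sample = [
    {"user": "u1", "group": "f_25_34", "supports_policy": True},
    {"user": "u2", "group": "f_25_34", "supports_policy": True},
    {"user": "u3", "group": "f_25_34", "supports_policy": False},
    {"user": "u4", "group": "m_25_34", "supports_policy": False},
]

# Assumed reference shares for each group in the target population
# (e.g. taken from census data, not from the platform itself).
population_share = {"f_25_34": 0.5, "m_25_34": 0.5}

# Post-stratification weight = population share / sample share.
group_counts = Counter(u["group"] for u in sample)
n = len(sample)
weights = {g: population_share[g] / (group_counts[g] / n) for g in group_counts}

# The unweighted estimate reflects the platform's skew ...
unweighted = sum(u["supports_policy"] for u in sample) / n

# ... while the weighted estimate is pulled back toward the reference population.
weighted = sum(weights[u["group"]] * u["supports_policy"] for u in sample) / sum(
    weights[u["group"]] for u in sample
)

print(f"unweighted support: {unweighted:.2f}")  # 0.50
print(f"weighted support:   {weighted:.2f}")    # 0.33
```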
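The last point is, at bottom, an evaluation problem: a single headline accuracy figure computed on easy-to-classify users overstates performance on typical users. Below is a hedged sketch of per-group reporting, again with made-up labels and predictions rather than data from any actual study.

```python
from typing import Dict, List, Tuple

def accuracy_by_group(
    examples: List[Tuple[str, str, str]]  # (group, true_label, predicted_label)
) -> Dict[str, float]:
    """Compute classification accuracy separately for each user group."""
    totals: Dict[str, int] = {}
    correct: Dict[str, int] = {}
    for group, truth, predicted in examples:
        totals[group] = totals.get(group, 0) + 1
        correct[group] = correct.get(group, 0) + (truth == predicted)
    return {g: correct[g] / totals[g] for g in totals}

# Invented predictions: fairly good for politically active users,
# much noisier for typical users.
examples = [
    ("politically_active", "left", "left"),
    ("politically_active", "right", "right"),
    ("politically_active", "left", "left"),
    ("politically_active", "right", "left"),
    ("typical", "left", "right"),
    ("typical", "right", "right"),
    ("typical", "left", "left"),
    ("typical", "right", "left"),
]

# Reporting both numbers makes the gap visible instead of hiding it
# behind an average dominated by the easy group.
for group, acc in accuracy_by_group(examples).items():
    print(f"{group}: {acc:.0%}")
# politically_active: 75%
# typical: 50%
```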