One of the problems with big data is that it’s big — if you can’t get a handle on it, then the value is negated. That’s one of the themes that came from several speakers at this week’s Strata+Hadoop World Conference in New York.
As a report in ComputerWorld U.S. detailed, organizations have a lot to think about when they move to big data.
For example, Sharmila Shahani-Mulligan, CEO and co-founder of startup ClearStory Data, suggested that interactive storytelling is a better way to get insights from piles of data. That’s what her company says it does with disparate sources of data, creating graphical representations of the data while leaving room on the screen for collaboration.
Dashboards, she said, are only good at framing key performance indicators, not analyzing data.
For those thinking visualization is the answer, University of Utah School of Computing assistant professor Miriah Meyer said off-the-shelf solutions may not be enough. That’s why she helped create a tool called MizBee, a “multiscale synteny browser for exploring conservation relationships in comparative genomics data.” Here’s a link to what was done and why.
Google may be known for creating magical — and to some, frustrating — algorithms that rank Web sites for searches, but M.C. Srivas, chief technology officer of Hadoop distributor MapR Technologies, advised that its ability to process data is more important than its math. Google’s lesson is “the company that can process the most data will have an advantage over everybody else in the future,” he said.
Finally, Pinterest chief data scientist John Rauser said that while most big data analysis is based on statistics, a lot of data scientists are “faking it.” But, he added, instead of getting bogged down in statistical methods, BI people should translate the questions being asked into simple computational methods.
“If you can program a computer, you have direct access to the deepest and most fundamental ideas in statistics,” he was quoted as saying.
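Rauser’s point is easier to picture with an example. The sketch below is our own illustration, not code from his talk, and the numbers are made up: it answers a textbook significance question (“did one group really do better than another?”) with a permutation test, which is just a shuffle-and-count loop rather than a statistical formula.

```python
import random

# Hypothetical question: did a new feature lift clicks per 1,000 impressions?
# The figures below are invented purely for illustration.
control = [32, 28, 35, 30, 27, 33, 29, 31]
treatment = [36, 34, 38, 33, 35, 37, 32, 39]

observed_diff = sum(treatment) / len(treatment) - sum(control) / len(control)

# If the feature made no difference, the group labels are arbitrary.
# Shuffle the labels many times and count how often chance alone produces
# a difference at least as large as the one actually observed.
pooled = control + treatment
n_treat = len(treatment)
trials = 10_000
at_least_as_extreme = 0

for _ in range(trials):
    random.shuffle(pooled)
    resampled_diff = (sum(pooled[:n_treat]) / n_treat
                      - sum(pooled[n_treat:]) / len(control))
    if resampled_diff >= observed_diff:
        at_least_as_extreme += 1

p_value = at_least_as_extreme / trials
print(f"Observed lift: {observed_diff:.2f}, permutation p-value: {p_value:.4f}")
```

The whole test is a dozen lines of looping and counting, which is the kind of “simple computational method” a programmer can reach for directly instead of memorizing the matching textbook procedure.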
So, just do it.