A big data ‘aha’ moment

It may be a buzz phrase, the cloud computing of 2012, but I do find big data analytics fascinating. It's just the way my mind works; give me a big enough survey sample, and I can entertain myself with pivot tables for hours on end. But I felt I needed a better grounding in the concept, so I asked the folks at SAS Canada for a schooling. They connected me with Paul Kent of SAS Institute Inc. in Cary, N.C. Kent is the vice-president of platform research and development for the company.
 
(I also spoke to Pat Finerty of SAS Canada about the evolution of analysis, from data mining to big data, in this video.)
 
It's a given that technology changes everything, but that's particularly true in the big data analytics field. The ability to analyze billions of lines of data in memory, innovations like the Hadoop MapReduce framework for distributed computing, and high-performance computing grids make it possible to perform analytics on ever-increasing amounts of data in near-real time.
 
On the other side of the equation, we're collecting more and more data to analyze. The evolution of data analysis is inextricably linked to the evolution of data collection. In the early days of computing, data was part of the application itself. Move along to the transactional database model, and data is collected from outside the application, but it must comply with a specific structure of fields. Now, the sources of data aren't so structured: we're dealing with documents, images, and media files, often without the appropriate metadata; geo-location data that may or may not be associated with a transaction; social media feeds wherein context is everything; metering data from electrical grids; all manner of telematics from vehicles, production machinery, etc.
 
I remember a story from the days of yore, when data mining was a fresh concept. A colleague of mine called out a representative of one of the vendors over the beer and diapers issue: analyze enough transactional data, and you'll find a pattern that suggests people who buy diapers also buy beer, so a retailer can organize the shelves accordingly. Said colleague's complaint was that the company rep was presenting this as a fact, rather than a theoretical example of the patterns that data mining can unlock, and factually, it wasn't true. It's an item of small relevance, but for the fact that it lodged the beer-and-diapers model of data mining in my head for the ensuing 15 years.
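
For the curious, the kind of pattern in the beer-and-diapers story can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the baskets below are invented toy data, the function name `pair_stats` is hypothetical, and the three measures (support, confidence and lift) are the usual ones from market basket analysis, not anything a particular vendor confirmed using.

```python
def pair_stats(baskets, a, b):
    """Return (support, confidence, lift) for the rule 'buys a -> buys b'."""
    n = len(baskets)
    count_a = sum(1 for basket in baskets if a in basket)
    count_b = sum(1 for basket in baskets if b in basket)
    count_ab = sum(1 for basket in baskets if a in basket and b in basket)

    # Support: share of all baskets that contain both items.
    support = count_ab / n if n else 0.0
    # Confidence: share of baskets with a that also contain b.
    confidence = count_ab / count_a if count_a else 0.0
    # Lift: confidence relative to how common b is overall (>1 suggests association).
    lift = confidence / (count_b / n) if count_b else 0.0
    return support, confidence, lift


# Invented toy transactions, for illustration only.
BASKETS = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "wipes"},
    {"beer", "chips"},
    {"milk", "bread"},
    {"diapers", "beer", "milk"},
    {"bread", "chips"},
    {"milk", "wipes"},
]

if __name__ == "__main__":
    support, confidence, lift = pair_stats(BASKETS, "diapers", "beer")
    print(f"support={support:.3f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 in made-up data like this says nothing about real shoppers, which was rather my colleague's point: the pattern is only as true as the transactions behind it.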
 
And it's a handy model to have when the skeptical say that big data analytics is just a jumped-up version of data mining. It highlights the fundamental difference, and my discussion with Kent crystallized it: data mining is transaction-focused, teasing patterns out of information of limited scope, whereas big data analytics has a behavioural focus. We're not concerned with the transaction, according to Kent, but with the behaviour that leads to the transaction. Of those many new types of data outlined a couple of paragraphs ago, almost all are related to behaviour.
 
That was my big data “aha” moment, and it fundamentally changes my understanding of analytics.

Dave Webb
Dave Webb is a freelance editor and writer. A veteran journalist of more than 20 years' experience (15 of them in technology), he has held senior editorial positions with a number of technology publications. He was honoured with an Andersen Consulting Award for Excellence in Business Journalism in 2000, and several Canadian Online Publishing Awards as part of the ComputerWorld Canada team.
