If there’s a trend in big data analytics, it has to be speed. Every vendor is clamoring to offer the most rapid insights into the bulk of data your enterprise carries. It’s no longer acceptable, by and large, to speak of hours or days. It’s about minutes and seconds.
The release of Enterprise R 6.0 by Revolution Analytics Inc. last week brought a few tools into the market for companies wanting to model certain kinds of data on a grand scale, and faster. “Certain kinds” here means data whose distribution doesn’t look like the simple bell curve most of us are familiar with.
Sometimes, you may need a Gamma, Poisson or Tweedie model to understand such data. The new version of Enterprise R supports these models, which belong to a family known as Generalized Linear Models, and runs them on the type of high-power hardware Revolution builds its software for: parallelized and distributed systems.
While open-source R supports these models, it isn’t practical to use them for huge amounts of data, explained David Smith, chief executive officer at Revolution Analytics. But Enterprise R 6.0 can now be used on Linux boxes running IBM LSF grid software and Microsoft’s High Performance Computing (HPC) distributed computing servers (meaning that you can now use R for analytics on Azure).
But what are these models actually used for?
Poisson
“Counts is a good example,” said Smith. “If you’re on the manufacturing line and they’re counting the number of failures that happen every day,” he said, you might need a Poisson distribution.
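To make the counting example concrete, here is a minimal Python sketch of the Poisson probabilities for a line that averages two failures a day. The figure of two per day and the `poisson_pmf` helper are assumptions made for illustration; neither comes from Smith or from Revolution's software.

```python
import math


def poisson_pmf(k, rate):
    """Probability of seeing exactly k events when the average count is `rate`."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


# Hypothetical production line averaging 2 failures per day (illustrative only).
RATE = 2.0

# Probability of 0, 1, 2, ... 10 failures on any given day.
probabilities = [poisson_pmf(k, RATE) for k in range(11)]

for k, p in enumerate(probabilities):
    print(f"{k:2d} failures: {p:.3f}")
```

The counts are whole numbers and can never drop below zero, which is why a bell curve is a poor fit for this kind of data.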
Gamma
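Trying to understand a very large initial public offering or, perhaps, company failures on a balance sheet? This is where a Gamma distribution comes in: it describes amounts that are always positive and skewed, with a few very large values sitting far above the typical one.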
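A quick way to see that lopsided shape is to draw some Gamma-distributed amounts and compare the average with the midpoint. This is only a sketch using Python's standard library; the shape and scale values are arbitrary assumptions, not figures from any real balance sheet.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Arbitrary illustrative parameters: the theoretical mean is SHAPE * SCALE.
SHAPE, SCALE = 2.0, 50.0

amounts = [rng.gammavariate(SHAPE, SCALE) for _ in range(10_000)]

mean_amount = statistics.mean(amounts)
median_amount = statistics.median(amounts)

print(f"smallest amount: {min(amounts):.2f}")
print(f"median amount:   {median_amount:.2f}")
print(f"mean amount:     {mean_amount:.2f}")
print(f"largest amount:  {max(amounts):.2f}")
```

The mean lands above the median because a handful of very large draws pull the average up, which is the long right tail a bell curve does not have.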
Tweedie
“Then there are some really weird distributions, and the one that comes up a lot in the insurance industry, it’s called a Tweedie distribution,” Smith said. “It’s a bit like the bell curve distribution but it’s truncated: it’s all cut off on one side because insurance claims can’t be less than zero.”
Weird though it may be, Tweedie can be very useful in looking into capital reserves or insurance premiums, said Smith.
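One common way to picture insurance data of this kind is as a compound Poisson-Gamma process, which is the form the Tweedie family takes when its power parameter lies between 1 and 2: each policy has a Poisson number of claims, and each claim has a Gamma-distributed size. The sketch below simulates that with Python's standard library. The claim rate, the Gamma parameters and the helper names are all assumptions for illustration, and it only simulates data; it is not the regression Revolution's software performs.

```python
import math
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical portfolio parameters (illustrative only).
CLAIM_RATE = 0.5                       # average number of claims per policy
CLAIM_SHAPE, CLAIM_SCALE = 2.0, 500.0  # Gamma parameters for one claim's size


def sample_poisson(rate, rng):
    """Draw a Poisson count using Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_policy_total(rng):
    """Total claimed on one policy: a Poisson number of Gamma-sized claims."""
    n_claims = sample_poisson(CLAIM_RATE, rng)
    return sum(rng.gammavariate(CLAIM_SHAPE, CLAIM_SCALE) for _ in range(n_claims))


totals = [sample_policy_total(rng) for _ in range(10_000)]
zero_share = sum(1 for t in totals if t == 0) / len(totals)

print(f"policies with no claims: {zero_share:.1%}")
print(f"largest total claim:     {max(totals):,.2f}")
```

Most simulated policies claim exactly nothing, the rest claim a positive and skewed amount, and no total is ever negative.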
Revolution has posted a video of a Tweedie regression performed on millions of insurance claims, if you’re interested in seeing it in action. The company says the results came back within minutes (compared to the hours it took before).