Data from the Internet of Things can come at an enterprise fast and furious, and in many cases, it’s what’s called “time series” data. With time series data, data location matters: the data needs to be easy to retrieve using range queries, and write volumes are higher.
Basho Technologies is specifically addressing IoT and time series data with its new Riak TS, a distributed NoSQL database architected to aggregate and analyze massive amounts of sequenced, unstructured data. Peter Coppola, Basho’s VP of products and marketing, said that while Hadoop is often the choice for analyzing big data, it is usually crunching historical data rather than real-time operational data.
“IoT is dealing with what’s happening right now, rather than looking at stuff hours after it happened,” he says.
A frequent use case for Riak TS in an IoT scenario is sensors generating a great deal of information. Time series data is not limited to IoT, Coppola added. Other examples include financial trading data, such as a transaction that happened at a specific point in time.
Other applications for Riak TS include utilities gathering and processing data from smart meters, or insurance companies with people in the field collecting claims data or even processing information from devices on vehicles to monitor driving behaviors that affect insurance rates.
For time series data, including IoT scenarios, enterprises would likely want to group things by time, location or device IP, said Coppola. “We like to say data location matters.”
Riak TS lays related data out together on disk to make retrieval faster. “Time series applications have higher write volumes. There’s lots of data coming in and being acted on.”
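To make the idea of grouping by device and time concrete, here is a minimal Python sketch. It is not Riak TS code and says nothing about how Basho implements storage: the `TimeSeriesStore` class, the `partition_key` function and the 15-minute `QUANTUM_SECONDS` bucket are all hypothetical names and values chosen for illustration. It shows only the general technique of keeping readings from the same device and time window together so that a range query touches a small number of partitions.

```python
from bisect import bisect_left, insort
from collections import defaultdict

QUANTUM_SECONDS = 15 * 60  # hypothetical bucket width: 15 minutes


def partition_key(device_id, timestamp):
    """Group readings by device and by the time bucket they fall into."""
    return (device_id, timestamp // QUANTUM_SECONDS)


class TimeSeriesStore:
    """Toy in-memory store: one sorted list of (timestamp, value) per partition."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def write(self, device_id, timestamp, value):
        # insort keeps each partition ordered by timestamp (values must be comparable).
        insort(self.partitions[partition_key(device_id, timestamp)], (timestamp, value))

    def range_query(self, device_id, start, end):
        """Return readings with start <= timestamp < end for one device."""
        results = []
        first = start // QUANTUM_SECONDS
        last = (end - 1) // QUANTUM_SECONDS
        for bucket in range(first, last + 1):
            rows = self.partitions.get((device_id, bucket), [])
            lo = bisect_left(rows, (start,))  # first row with timestamp >= start
            hi = bisect_left(rows, (end,))    # first row with timestamp >= end
            results.extend(rows[lo:hi])
        return results


store = TimeSeriesStore()
store.write("sensor-1", 0, 20.5)
store.write("sensor-1", 600, 21.0)
store.write("sensor-1", 900, 21.5)
store.write("sensor-1", 1800, 22.0)
store.write("sensor-2", 600, 5.0)

# Only the two buckets covering [600, 1800) for sensor-1 are read.
window = store.range_query("sensor-1", 600, 1800)
```

In this toy version a query for one device over a half-hour window reads two partitions and ignores everything written by other devices, which is the kind of locality the "data location matters" remark points at.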
Riak TS builds on technology from Basho’s Riak KV, its distributed NoSQL database, and is specifically designed to store and retrieve time series data with better read and write performance, said Coppola. The Riak TS database is able to make sure IoT or time series applications are always available for read and write operations, he added, but at the same time, one of its key features is the ability to filter at a low level, where the data is stored, to reduce the burden on client applications.
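The benefit of filtering where the data is stored can be illustrated with a small Python sketch. This is a generic illustration of pushing a filter down to the storage layer, not the Riak TS API: `StorageNode`, `scan` and the sample `readings` are hypothetical and exist only for this example. The sketch counts how many rows leave the storage node when the client filters after the fact versus when the node applies the predicate itself.

```python
class StorageNode:
    """Toy storage node that counts how many rows it sends to clients."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.rows_sent = 0

    def scan(self, predicate=None):
        # When a predicate is supplied, it is applied here, before rows are returned.
        matched = [r for r in self.rows if predicate is None or predicate(r)]
        self.rows_sent += len(matched)
        return matched


# Made-up sample readings: temperatures cycle through 20..24.
readings = [{"device": "sensor-1", "time": t, "temp": 20 + t % 5} for t in range(10)]


def is_hot(row):
    return row["temp"] >= 23


# Client-side filtering: every row is shipped, then most are discarded.
node_a = StorageNode(readings)
client_filtered = [r for r in node_a.scan() if is_hot(r)]

# Filtering at the storage layer: only matching rows are shipped.
node_b = StorageNode(readings)
pushed_down = node_b.scan(is_hot)
```

Both approaches return the same four readings, but the first ships all ten rows to the client while the second ships only the four that match.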
One of Basho’s customers is The Weather Company, which manages 20 terabytes of new data a day, including real-time forecasting data from more than 130,000 sources. It uses Riak TS to store and query time series data. In time series scenarios, Coppola explained, data location matters.
“You want to group things together and store them together on disk.” SQL is better suited for time series data, said Coppola, although it’s not an either/or proposition; most enterprises would employ a technology such as Basho’s as well as Hadoop depending on the nature of the data.
Not only is SQL better suited for operational data than Hadoop, said Coppola, “people see faster ROI on the operational side.”
Basho is demonstrating Riak TS at AWS re:Invent in Las Vegas this week.