Storage has never been in more demand than it is today. But it is an open question whether storage hardware and software makers can keep up.
“It seems to me there’s an insatiable appetite for gathering more and more data,” says Mike Cardy, global CTO at Toronto system integrator OnX Enterprise Solutions Inc.
Most IT managers would agree. A ravenous hunger for so-called “big data” has organizations and their IT departments scrambling to keep up. The proliferation of managed services and clouds only compounds the problem.
There are three elements inextricably tied to big data storage: compression, de-duplication and encryption. But first, there is a simpler question: is storage technology advancing as quickly as big data is growing?
“There isn’t an easy answer,” says Mark Peters, a senior analyst at the Enterprise Strategy Group Inc. “I think we live in a world where the demand for storage, whether from a capacity or performance standpoint, seems to at least match and at times be greater than the ability of the industry to provide it. And the reason I say it in such convoluted terms is that every time we come up with a leap in capacity or performance, new applications, new uses, new reasons to store things for longer seem to appear as well.”
“And I’m not sure, quite frankly, which is chicken and which is egg. I don’t think anyone knows.”
Peters says from an engineering perspective, storage has plenty of room to improve. Further technological advances are possible because we’ve begun to look critically at the way we’ve handled our storage infrastructure up until now.
“There is a realization that we have been very profligate with our use of storage, and when I say that I’m not talking about the last 10 minutes or 10 months, I’m talking about the last three decades. We have only concentrated on the effective side of the equation, getting the job done. We’ve always talked about efficiency, you know, doing it with as few resources as possible, but it has become much more of a focus in the last few years. Well, that’s where I really think we are now.”
Peters says while we’ve increased capacity, we often haven’t been using the full potential of storage available to us. “Just to give a quick example, in a typical user, let’s say they buy a lot of spinning disk, they will probably only use anywhere from 30 to 70 per cent of its capacity. Therefore, phrased differently, they are wasting 70 to 30 per cent of its capacity. We are concentrating at the moment on finding ways in which we could use a much greater percentage of that capacity.”
Solid state has the potential to change things in a big way. But of course, there’s quite a hefty price tag attached.
“You’re going to see applications built where my cache memory, rather than disk, will go to solid state on the way to spinning disk,” says Jeff Goldstein, general manager of storage array manufacturer NetApp Inc. in Canada. “Trick is, with solid state it’s still very expensive, relatively speaking.”
“Our biggest frame today has 4,400 disk drives sitting in a frame, equaling about 4 PB of storage in one frame. That would be pretty expensive to replace with solid state.”
The real usefulness of solid state, says Peters, is in its capacity to do the heavy lifting, not necessarily to store huge amounts of data.
“If you can take the performance side, the actual I/O, the actual work, and get it onto solid state in some form, that’s an ideal area where it’s a relatively small amount of storage that accounts for a huge amount of the actual work that’s done. Quite often, the reason people waste so much of their storage space is that they bought it because each spinning disk can only do so many I/Os per minute, per hour, per day.”
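To put rough numbers to that point, consider the back-of-the-envelope sizing sketch below. The per-drive figures and workload targets are illustrative assumptions, not vendor specifications; the point is simply that when the number of spindles is driven by I/O requirements rather than by capacity, much of the purchased capacity goes unused.

```python
import math

# Illustrative, assumed figures for one 7,200 RPM nearline drive.
DISK_IOPS = 100          # rough random-I/O ceiling per spindle (assumption)
DISK_CAPACITY_TB = 4     # usable capacity per drive, in TB (assumption)

# Hypothetical workload targets.
required_iops = 15_000
required_capacity_tb = 200

drives_for_iops = math.ceil(required_iops / DISK_IOPS)                    # 150 drives
drives_for_capacity = math.ceil(required_capacity_tb / DISK_CAPACITY_TB)  # 50 drives

drives_bought = max(drives_for_iops, drives_for_capacity)
capacity_bought_tb = drives_bought * DISK_CAPACITY_TB
utilization = required_capacity_tb / capacity_bought_tb

print(f"Drives needed for performance: {drives_for_iops}")
print(f"Drives needed for capacity:    {drives_for_capacity}")
print(f"Bought {capacity_bought_tb} TB to hold {required_capacity_tb} TB "
      f"({utilization:.0%} utilization)")
```

Move that I/O-heavy working set onto solid state and the spinning-disk tier can be sized for capacity alone, which is where the efficiency gains Peters describes come from.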
Solid state is making major inroads in the IT marketplace, he says, and it will have its day in the sun, in part, because of our expectations. “Every single storage vendor offers solid state in some form or fashion. To some degree that makes it a self-fulfilling prophecy because they’re all out there, telling the story.”
Going down the storage chain from solid state to spinning disk, we finally end up at tape. Once upon a time, tape was synonymous with backup. Now, more and more, its role has been taken over by spinning disk, says Goldstein.
“It used to be you’d back up to tape and recover from tape. Tape was very much in the data centre. If you had a disk problem you’d go back to tape. As data has grown, though, tape gets moved further and further away from, kind of, the mainstream. People back up disk-to-disk, replicate to disk, and then archive to tape.”
But not everyone does, says Peters. He says he’s witnessed confusion among some companies about where the dividing line is between backup and archive.
“Frankly, many people do view their backups as their archives. I think that’s poor management—well, when it happens unintentionally, it’s poor management. You will find quite a lot of what I would regard as long-term permanent archival storage sitting on disks.
“But across the market as a whole,” he adds, “most of it still goes to tape at some point.” Library and Archives Canada, for example, is migrating its enormous collection of analogue audio and video to SAN. Ultimately, it will archive the material in “an ever-growing library of LTO (Linear Tape-Open) tapes,” according to Douglas Smalley, video conservator in LAC’s preservation branch.
“All those LTO tapes will need to be refreshed on a regular cycle to ensure the content’s continued existence for the long term. The good news is that the next migration will be file-to-file and can be highly automated, not requiring a human video conservator to monitor the transfer of analogue tapes one at a time.”
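LAC has not published its tooling, but the kind of automation Smalley describes can be sketched generically: copy each file to the new media and verify a checksum, so nobody has to watch each transfer. The paths below are hypothetical and the script is a minimal illustration, not LAC’s workflow.

```python
import hashlib
import shutil
from pathlib import Path


def sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MB chunks so large video files never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate(source_dir: Path, target_dir: Path) -> None:
    """Copy every file to the new media and confirm the copy is bit-for-bit identical."""
    for src in source_dir.rglob("*"):
        if not src.is_file():
            continue
        dst = target_dir / src.relative_to(source_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        if sha256(src) != sha256(dst):  # the fixity check that replaces manual monitoring
            raise RuntimeError(f"Checksum mismatch after copying {src}")


if __name__ == "__main__":
    # Hypothetical mount points; an LTO target would normally sit behind LTFS or archive software.
    migrate(Path("/archive/current_generation"), Path("/archive/next_generation"))
```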
Tape is showing its age, but it remains the cheapest form of storage, says Goldstein. At the same time, he says, its reliability is increasingly being called into question.
“People are recognizing that tape is not a very reliable medium: tapes break, tapes stretch. When you want to go to recover from tape, often the data’s not all there and you don’t know it until you need it.” That doesn’t mean tape is going away, says Peters, not unless we can find something that can compete with it on price.
“The amount of data stored on tape still is huge and there is no sign of any technology replacing it,” says Peters. “At the end of the day, frankly, all storage, doesn’t matter whether you’re talking about putting it on DRAM, flash storage, spinning disk, whatever, it’s all about economics. People think it’s about capacity and performance but we only talk about those things because we can’t afford to put everything on DRAM or main memory.
“There is nothing as economically attractive as tape for long-term storage.”
Finally, archiving big data comes with its own set of problems. One particularly bothersome one is keeping it all compact, ordered and secure. PKWARE, famous for launching PKZip in 1989, is now finding its niche in catering to enterprises that want to save money by compressing their data before sending it to the cloud.
NetApp pursues similar savings through de-duplication, says Goldstein. “Our storage systems will look for the same bit strings, and if it’s the same we only keep it once. You can eliminate a lot of storage by not having to keep 500 or 5,000 copies of data.”
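Goldstein does not spell out NetApp’s implementation, and the sketch below is not it; it is a minimal, generic illustration of block-level de-duplication, in which identical chunks are detected by hashing and stored only once.

```python
import hashlib


class DedupStore:
    """Toy block-level de-duplication: identical 4 KB blocks are stored only once."""

    BLOCK_SIZE = 4096

    def __init__(self):
        self.blocks = {}  # block hash -> block bytes (unique blocks only)
        self.files = {}   # file name -> ordered list of block hashes

    def write(self, name: str, data: bytes) -> None:
        hashes = []
        for i in range(0, len(data), self.BLOCK_SIZE):
            block = data[i:i + self.BLOCK_SIZE]
            key = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(key, block)  # keep only the first copy of each block
            hashes.append(key)
        self.files[name] = hashes

    def read(self, name: str) -> bytes:
        return b"".join(self.blocks[h] for h in self.files[name])


store = DedupStore()
payload = b"the same report, mailed to everyone in the company\n" * 1000
for n in range(500):  # 500 logically separate copies of the same file...
    store.write(f"copy_{n}", payload)

unique_bytes = sum(len(b) for b in store.blocks.values())
print(f"{unique_bytes:,} bytes of unique blocks hold {500 * len(payload):,} logical bytes")
```

Production systems add collision verification, variable-length chunking and metadata handling, but the saving comes from the same principle.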
Yet it is encryption that PKWARE is really focused on today. Not only must data be compressed before it is encrypted (dictionary-based compression algorithms cannot meaningfully compress the random-looking output of encryption), but data security is also becoming increasingly important for enterprises that trust their data to the cloud.
“It’s kind of rare, especially with regulation and compliance, that you’re able to either archive data or move data that doesn’t contain some sensitive data,” says Joe Sturonas, CTO of PKWARE.
“When you move data from one platform to another, especially cloud storage, which is not yours, it’s another service; if it’s not encrypted, the administrators of that service have access to all that data. And you may or may not want that.”
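As a generic illustration of both points (why the compression has to happen before encryption, and why data should be encrypted before it reaches someone else’s administrators), the sketch below compresses a payload with zlib and then encrypts it with the third-party cryptography package. The library choice and the figures are assumptions for illustration, not anything PKWARE ships; reversing the order leaves the compressor almost nothing to work with, because ciphertext is indistinguishable from random data.

```python
import os
import zlib

from cryptography.hazmat.primitives.ciphers.aead import AESGCM  # pip install cryptography

key = AESGCM.generate_key(bit_length=256)
aead = AESGCM(key)

data = b"quarterly sales figures, row after repetitive row\n" * 10_000

# Right order: compress first, then encrypt before handing the blob to a cloud provider.
right = aead.encrypt(os.urandom(12), zlib.compress(data), None)

# Wrong order: the ciphertext looks random, so dictionary compression gains nothing
# (the output can even grow slightly because of zlib's framing overhead).
wrong = zlib.compress(aead.encrypt(os.urandom(12), data, None))

print(f"original:              {len(data):>9,} bytes")
print(f"compress then encrypt: {len(right):>9,} bytes")
print(f"encrypt then compress: {len(wrong):>9,} bytes")
```

Only the holder of the key can read what lands in the provider’s storage, which is exactly the exposure Sturonas is describing.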
Cardy says certain OnX clients demand particularly stringent security. “Depending on whether it’s healthcare data or financial services data, where there are strict regulatory requirements, we will encrypt data in transport as well as on the storage media,” he says.
It seems that the main challenge in big data storage will not be the storage media itself, or the way we organize our backups and archives. The challenge will be deciding whether to place the data in somebody else’s hands.
The cloud has given enterprises unprecedented freedom to store massive amounts of data. But as Sturonas puts it, “with that freedom comes some exposure.”