Network Appliance Inc.’s NearStore ushered in the era of using inexpensive Advanced Technology Attachment (ATA) disk arrays for disk-to-disk backup or secondary, near-line storage. The product, launched in March 2002, offers faster backup and recovery times at a cost per megabyte that’s competitive with tape backup systems. Now vendors are rushing to add application-specific intelligence to ATA-based storage appliances, with the aim of reducing application server workloads while offering more efficient ways to store and retrieve data.
Perhaps the best example is Centera, EMC Corp.’s system for indexing, storing and retrieving “fixed content” files. In Centera’s Content Addressed Storage scheme, the client application bypasses the server’s file system by making calls to a proprietary application programming interface (API). Centera intercepts each file storage request, strips off the metadata (such as date and time stamps) and runs a hashing algorithm on the content to create a unique, 27-character content ID. It then returns a content descriptor file (CDF) to the client application that points to both the stored object and its metadata. Thereafter, the application need only present the content ID to retrieve the stored object. Abstracted from the storage media in this way, the application needn’t worry about disk I/O, tracking the file path or keeping up with changes in the back-end storage configuration.
The bottom line: “You should need less of a server . . . and the applications should run more efficiently on lower-cost compute platforms,” said Steve Duplessie, an analyst at Milford, Mass.-based Enterprise Storage Group.
Centera’s technology also eliminates redundant file storage by creating multiple references that point to a single instance of the stored file. For example, to store an archived e-mail file attachment sent to 1,000 users, Centera would create 1,000 CDF references to a single content ID, which in turn would reference a single, stored file.
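The content-addressing and single-instance behavior described above can be illustrated with a small Python sketch. It is a toy model only, not Centera's API: the `ContentStore` class, its `put` and `get` methods, the integer descriptor handle and the use of SHA-256 are all assumptions made for illustration, since the hashing algorithm and interface are not specified here.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ContentStore:
    """Toy content-addressed store (hypothetical, not Centera's actual API)."""
    blobs: dict = field(default_factory=dict)        # content ID -> stored bytes
    descriptors: list = field(default_factory=list)  # one descriptor per reference

    def put(self, data: bytes, metadata: dict) -> int:
        # The content ID is derived from the bytes alone, so identical
        # content always maps to the same ID (SHA-256 is an assumption).
        content_id = hashlib.sha256(data).hexdigest()
        # Keep the bytes only if this content has not been stored before.
        self.blobs.setdefault(content_id, data)
        # The descriptor ties this reference's metadata to the content ID.
        self.descriptors.append({"content_id": content_id, "metadata": metadata})
        return len(self.descriptors) - 1  # hypothetical descriptor handle

    def get(self, handle: int) -> bytes:
        # Retrieval needs only the descriptor, never a file path.
        return self.blobs[self.descriptors[handle]["content_id"]]


store = ContentStore()
attachment = b"contents of an archived e-mail attachment"
# The same attachment archived for 1,000 recipients.
handles = [store.put(attachment, {"recipient": f"user{i}"}) for i in range(1000)]
```

In this sketch the 1,000 calls leave 1,000 descriptors but only one stored copy of the attachment, which mirrors the e-mail example in the preceding paragraph.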
Startup Avamar Technologies Inc. takes this technology one step further to address the problem of backup inefficiencies. While Centera’s CDF technology can eliminate storage of redundant files, Avamar’s Axion backup appliance indexes the individual data blocks that make up those files on disk in order to eliminate both file and partial file redundancies. When a sentence changes in a document, for example, Axion updates only the affected blocks within that file.
“We’re so much more efficient [that] we can store 10 to 100 times the amount of daily backups that you could on a [disk-to-disk backup system that is] mirroring tape backup,” said Jed Yueh, Avamar’s executive vice-president. The result is a system that requires less space for backups, can restore faster and can efficiently back up distributed systems over a wide-area network, he said.
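The block-level indexing idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Avamar's implementation: the fixed 8-byte block size, the SHA-256 digest and the `backup` and `restore` helpers are all hypothetical choices made to keep the example small.

```python
import hashlib

BLOCK_SIZE = 8      # deliberately tiny, for illustration only (assumption)
block_index = {}    # block digest -> block bytes, shared across all backups


def backup(data: bytes):
    """Return the file's list of block digests and the count of new blocks stored."""
    recipe, new_blocks = [], 0
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in block_index:
            # Only blocks never seen before consume new storage.
            block_index[digest] = block
            new_blocks += 1
        recipe.append(digest)
    return recipe, new_blocks


def restore(recipe):
    """Rebuild a file by concatenating its indexed blocks in order."""
    return b"".join(block_index[digest] for digest in recipe)


original = b"AAAAAAAABBBBBBBBCCCCCCCCDDDDDDDD"
edited = b"AAAAAAAABBBBBBBBCCCCXCCCDDDDDDDD"  # one byte changed in the third block

recipe_v1, stored_v1 = backup(original)
recipe_v2, stored_v2 = backup(edited)
```

Here the first backup stores four blocks and the second stores only the one block that changed, yet both versions can be restored in full. Fixed-size blocks work in this example because the edit overwrites a byte in place; an edit that inserts or deletes bytes would shift every later block boundary, which is one reason a real product might choose a different chunking strategy.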
Another startup, Netezza Corp., has taken the intelligent storage concept the furthest by embedding parallel processing power with individual disk drives. It designed the Netezza Performance Server as a “data appliance” that optimizes business intelligence queries against very large databases, replacing the traditional Oracle database running on high-end Unix servers and EMC storage arrays. CEO and co-founder Jit Saxena said disk I/O is a bottleneck when querying such databases. Netezza’s parallel processing architecture pairs what it calls a Snippet Processing Unit (SPU) with each disk drive – up to 450 per appliance – and integrates those with a symmetric multiprocessing front end that can accept SQL queries from any application that supports the Open Database Connectivity (ODBC) interface. Each SPU has dedicated memory and communicates over a Gigabit Ethernet connection.
“We have deployed huge amounts of intelligence right next to each drive,” said Saxena. By keeping all drives processing in parallel, he said, “we provide 10 to 20 times the performance of a [traditional] system at half to one-third the cost.” And because the system is read-intensive and application-specific, Saxena said ATA-based drives work well.
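The division of labor Saxena describes, with filtering done next to each drive and a front end merging the results, can be modeled with a short Python sketch. It is a conceptual illustration only: the `drives` data, the `snippet` and `query_total` functions and the use of a thread pool to stand in for per-drive processors are assumptions, not Netezza's design.

```python
from concurrent.futures import ThreadPoolExecutor

# Each inner list stands in for the rows held on one drive (made-up data).
drives = [
    [("east", 120), ("west", 80), ("east", 40)],
    [("west", 200), ("east", 10)],
    [("north", 55), ("east", 30), ("west", 5)],
]


def snippet(rows, region):
    """Work done beside one drive: filter rows and aggregate them locally."""
    return sum(amount for row_region, amount in rows if row_region == region)


def query_total(region):
    """Front end: send the snippet to every drive, then merge the partial sums."""
    with ThreadPoolExecutor(max_workers=len(drives)) as pool:
        partials = list(pool.map(lambda rows: snippet(rows, region), drives))
    return sum(partials)


east_total = query_total("east")
west_total = query_total("west")
```

Only one small partial sum per drive travels back to the front end, rather than every row, which is the point of placing the processing beside the disk. The thread pool here models the structure of the computation; it does not demonstrate the speedup of dedicated hardware.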
By using smart, inexpensive ATA-based storage appliances that offload I/O processing for application-specific tasks, vendors may eventually change how users view the traditional server’s role, said Duplessie.
“What we’re doing is taking distributed computing to the next level by ‘appliance-izing’ the intelligence in the server,” he said. But even big-name products like Centera are still in the early stages of acceptance. “It will take some time for people to make the best use of this,” said Jamie Gruener, an analyst at The Yankee Group in Boston.