Permabits and Petabytes

OEM Data Optimization Solutions for Next Generation Storage

Dedupe 2.0: What exactly is it?

Published by Wayne Salpietro | Filed under Wayne Salpietro, Director PMM

In my earlier blog post, I mentioned Dedupe 2.0 as the next generation of deduplication beyond backup. Dedupe 1.0 is the backup use case of deduplication, which many vendors deploy today. It seems, however, that some vendors are trying to pass off their dedupe solutions as Dedupe 2.0 when in fact they’re just tweaks to Dedupe 1.0 implementations. For example, a recent white paper and now a few blog posts by Nexsan are doing exactly that. What they describe is a backup solution (including an OEM dedupe implementation from FalconStor) that is a little faster than their previous version because they added an extra processor and now include MAID drives. (We saw how well MAID technology did for Copan!) How does that do anything to solve the broader storage management challenges that Dedupe 2.0 is really designed for? Dedupe 2.0 means applying deduplication in storage environments such as archiving, tier 2 through tier n storage, and primary storage.

For vendors to implement Dedupe 2.0, they must address each of these use cases. Archiving is mostly about data volume: the amount of data is increasing rapidly, and it is being stored for longer and longer periods of time. In tiered storage, the concerns are data volume and, to some degree, the speed of ingest and retrieval. Primary storage is the most difficult because it’s really all about implementing dedupe with zero impact on storage performance during ingest and retrieval.

Dedupe 2.0 advances are being enabled by two technologies that are converging today. The first is processing capability (multiple quad-core processors). The second, and probably the more important, is the ability to query an index very rapidly to determine whether incoming data is a duplicate.

Rapidly querying an index of digital fingerprints consumes processor cycles, and until recently that cost inhibited near-real-time deployment. The solution required the development of indexing techniques that keep the index in memory. With the index in memory, the query is fast enough that dedupe does not slow ingest or rehydration and does not reduce system or storage throughput. Over the last decade, Permabit developed and patented indexing techniques that can answer index queries in microseconds. We have been shipping this technology for the last several years in our Enterprise Archive product.
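
To make the idea of a memory-resident fingerprint index concrete, here is a minimal sketch in Python. It is an illustration only and not Permabit's patented technique: the class name InMemoryDedupeStore, the fixed 4 KB chunk size, the choice of SHA-256 as the fingerprint, and the plain dictionary used as the index are all assumptions made for the example.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size, chosen only for illustration


class InMemoryDedupeStore:
    """Toy dedupe store: a dict held in memory maps fingerprints to chunks."""

    def __init__(self, chunk_size=CHUNK_SIZE):
        self.chunk_size = chunk_size
        self.index = {}   # fingerprint (bytes) -> slot number in self.chunks
        self.chunks = []  # unique chunk payloads, stored once each

    def write(self, data: bytes) -> list:
        """Ingest data and return a recipe (list of slots) to rebuild it."""
        recipe = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            fingerprint = hashlib.sha256(chunk).digest()
            slot = self.index.get(fingerprint)  # in-memory index query
            if slot is None:
                # First time this content is seen: store it and index it.
                slot = len(self.chunks)
                self.chunks.append(chunk)
                self.index[fingerprint] = slot
            recipe.append(slot)
        return recipe

    def read(self, recipe) -> bytes:
        """Rehydrate the original data from its recipe."""
        return b"".join(self.chunks[slot] for slot in recipe)
```

Every chunk costs one hash computation and one dictionary lookup, which is why the speed of the index query decides whether dedupe can sit in the ingest path. A production index would also have to cope with indexes too large for a simple dictionary, persistence, and concurrency, none of which this sketch attempts.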

So the answer is that Dedupe 2.0 is the deployment of deduplication technology higher up in the storage stack: in primary, tier 2 and archive storage. It addresses duplicates before the data ever becomes a backup problem! Basically, dedupe everywhere.

March 15th, 2010
